Approximate computing is one approach to the energy-efficient design of digital systems in many domains, including Machine Learning (ML). The use of specialized data formats in Deep Neural Networks (DNNs), the dominant ML algorithm, can yield substantial improvements in processing time and power efficiency.
The focus is on applying variable-precision formats to ML algorithms. These formats make it possible to set different precisions for different operations and to tune the precision of individual layers of the neural network to obtain higher power efficiency.
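As an illustration only (Python, with hypothetical layer names and bit-widths; no specific format from this work is implied), the sketch below tunes per-layer precision by quantizing each layer's weights to a different bit-width and measuring the resulting error:

    import numpy as np

    def quantize(x, bits):
        # Uniform symmetric quantization of x to the given bit-width,
        # then dequantization, to simulate the reduced precision.
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax
        return np.round(x / scale) * scale

    # Hypothetical per-layer precision assignment: early layers tolerate
    # fewer bits, the final layer keeps a higher precision.
    layer_bits = {"conv1": 4, "conv2": 6, "fc": 8}
    weights = {name: np.random.randn(64, 64).astype(np.float32)
               for name in layer_bits}

    for name, w in weights.items():
        wq = quantize(w, layer_bits[name])
        print(f"{name}: {layer_bits[name]} bits, "
              f"max abs error {np.max(np.abs(w - wq)):.4f}")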
Increasingly sophisticated and computationally intensive algorithms are required for applications running on mobile devices and on the embedded processors constituting the Internet-of-Things (IoT). These applications include audio and image recognition, machine learning, and security. Heavy computations are typically offloaded to servers (the cloud), but in the Edge-computing paradigm it is desirable to perform the computation locally to decrease latency and network traffic and to reduce the overall energy footprint.
In this context, Application Specific Processors (ASPs) are used to accelerate software applications in portable systems and at the Edge. FPGA-based accelerators can be designed and fine-tuned to match the algorithm exactly, and FPGAs can be reconfigured at run-time, making the system adaptable to the specific workload.
Moreover, given a library of ASPs, specific ASPs can be loaded on-the-fly into the FPGA, an approach called Dynamically-Loaded Hardware Libraries (HLL), and execution is transferred from the CPU to the FPGA-based accelerator.
The main objective is to implement traditional Digital Signal Processing (DSP) with low-power methods to obtain significant reductions in power consumption.
Residue Arithmetic
The Residue Number System (RNS) allows the decomposition of a given dynamic range (bit-width) into slices of smaller range, on which the computation can be carried out in parallel at higher speed.
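A minimal Python sketch of the idea, with the example moduli (3, 5, 7) chosen for illustration (an assumption, not a modulus set prescribed here): an operand is decomposed into small, independent residues, a multiplication runs channel-by-channel, and the result is recovered with the Chinese Remainder Theorem (CRT).

    from math import prod

    MODULI = (3, 5, 7)   # pairwise coprime; dynamic range = 3*5*7 = 105

    def to_rns(x):
        # Decompose x into its residues, one per (small) modulus.
        return tuple(x % m for m in MODULI)

    def rns_mul(a, b):
        # Multiply channel-by-channel: each residue is computed
        # independently, so the channels can run in parallel in hardware.
        return tuple((ra * rb) % m for ra, rb, m in zip(a, b, MODULI))

    def from_rns(residues):
        # Recover the integer via the Chinese Remainder Theorem.
        M = prod(MODULI)
        x = 0
        for r, m in zip(residues, MODULI):
            Mi = M // m
            x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m): modular inverse
        return x % M

    a, b = 17, 5
    assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % prod(MODULI)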
Imprecise Arithmetic
Precision in arithmetic operations is traded off for reduced power dissipation. In signal processing, acceptable quality can often be maintained even when some errors are introduced.
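A common textbook example of this trade-off is a lower-part OR adder, sketched below in Python (an illustration, not a design from this work): carry propagation is suppressed in the k least significant bits, shortening the carry chain and saving power at the cost of a small, bounded error.

    def truncated_add(a, b, k, width=16):
        # Approximate adder: the k least significant bits are combined with
        # a bitwise OR (no carry propagation); only the upper part adds
        # exactly. Shorter carry chain -> less switching activity and power.
        mask = (1 << k) - 1
        low = (a | b) & mask                   # approximate lower part
        high = ((a >> k) + (b >> k)) << k      # exact upper part
        return (high | low) & ((1 << width) - 1)

    a, b = 0x1234, 0x0ABC
    approx = truncated_add(a, b, k=4)
    exact = (a + b) & 0xFFFF
    print(f"exact={exact:#06x} approx={approx:#06x} error={exact - approx}")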
This area of research concerns hardware algorithms for numerical computations and their efficient implementation in terms of execution speed, area, and energy.
Emphasis is given to complex operations such as division and square root, and to the implementation of operations in decimal arithmetic.
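For reference, the sketch below shows the basic restoring digit-recurrence that such division algorithms refine (binary, one quotient bit per iteration; the higher radices, redundant digit sets, and decimal variants studied in this area are beyond this minimal version).

    def restoring_div(n, d, bits=8):
        # Digit-recurrence (restoring) division: one quotient bit per step.
        # Hardware designs raise the radix or select digits speculatively
        # to cut latency; this shows only the basic recurrence.
        assert 0 <= n and 0 < d
        q, r = 0, 0
        for i in reversed(range(bits)):
            r = (r << 1) | ((n >> i) & 1)   # shift next dividend bit into r
            if r >= d:                      # trial subtraction succeeds
                r -= d
                q |= 1 << i
        return q, r                         # quotient and remainder

    assert restoring_div(200, 7) == (200 // 7, 200 % 7)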
When it is not possible to reduce the power dissipation any further, the rise in chip temperature can be mitigated by changing the power density of the system: by reorganizing the floorplan (statically), or by thermal-aware scheduling of the SoC operations (dynamically).
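As a toy illustration of the dynamic approach (the greedy policy and task names are assumptions, not a method claimed here), a scheduler can place each operation on the currently coolest core to spread the power density over the die:

    import heapq

    def thermal_aware_schedule(task_heats, num_cores=4):
        # Greedy sketch: assign each task to the core with the lowest
        # accumulated heat, spreading power density across the die.
        cores = [(0.0, c) for c in range(num_cores)]  # (heat, core id)
        heapq.heapify(cores)
        plan = []
        for task, heat in task_heats:
            h, c = heapq.heappop(cores)               # coolest core
            plan.append((task, c))
            heapq.heappush(cores, (h + heat, c))
        return plan

    tasks = [("fft", 3.0), ("fir", 1.0), ("mac", 2.5), ("dct", 2.0)]
    print(thermal_aware_schedule(tasks))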