List of Figures
List of Tables
1 Background
1.1 Metrics
1.2 Energy Dissipation in CMOS
1.3 Approaches to Energy Dissipation Reduction
1.4 Asynchronous Systems
1.5 Tools for Low-Power Design
1.5.1 Transistor Level
1.5.2 Gate Level
1.5.3 Architectural Level
1.6 Floating-Point Division and Square Root
1.6.1 IEEE Floating-Point Standard
1.6.2 Division and Square Root
2 Algorithms
2.1 Division Algorithm
2.2 Conversion and Rounding Algorithm
2.3 Example of Division
2.4 Division by Overlapping Stages
2.5 Very High Radix Division
2.6 Square Root Algorithm
2.7 Combined Division and Square Root Algorithm
3 Techniques to Reduce Energy Dissipation
3.1 Radix-4 Division Algorithm and Basic Implementation
3.2 Classification of Techniques
3.3 Retiming the Recurrence
3.3.1 Reducing the Transitions in the Multiplexer
3.4 Changing the Redundant Representation
3.5 Using Gates with Lower Drive Capability
3.6 Dual Voltage
3.7 Equalizing the Paths to Reduce Glitches
3.8 Partitioning and Disabling the Selection Function
3.9 Glitch Filtering and Suppression
3.10 Reductions in Conversion and Rounding
3.10.1 On-the-fly Conversion Algorithm Modification
3.10.2 Disabling the Clock
3.10.3 Gating the Trees
3.10.4 Dual Voltage
3.11 Switching Off Inactive Blocks
3.12 Optimization by Synthesis for Low-Power
4 Implementations
4.1 Design Flow, Tools and Libraries
4.1.1 Design Flow and Tools
4.1.2 Standard Cell Libraries
4.1.3 Presentation of Results
4.2 Radix-4 Division
4.2.1 Algorithm and Basic Implementation
4.2.2 Low-Power Implementation
4.2.3 Dual Voltage Implementation
4.2.4 Optimization with Synopsys Power Compiler
4.2.5 Summary of Results for Radix-4
4.3 Radix-8 Division
4.3.1 Algorithm and Basic Implementation
4.3.2 Low-Power Implementation
4.3.3 Dual Voltage Implementation
4.3.4 Optimization with Synopsys Power Compiler
4.3.5 Summary of Results for Radix-8
4.3.6 Comparison with Scheme with Overlapped Radix-2 Stages
4.4 Radix-16 Division
4.4.1 Algorithm and Implementation
4.4.2 Low-Power Implementation
4.4.3 Dual Voltage Implementation
4.4.4 Optimization with Synopsys Power Compiler
4.4.5 Summary of Results for Radix-16
4.5 Radix-512 Division
4.5.1 Algorithm and Basic Implementation
4.5.2 Low-Power Implementation
4.5.3 Dual Voltage Implementation
4.5.4 Summary of Results for Radix-512
4.6 Radix-4 Combined Division and Square Root
4.6.1 Algorithm and Implementation
4.6.2 Low-Power Implementation
4.6.3 Dual Voltage Implementation
4.6.4 Optimization with Synopsys Power Compiler
4.6.5 Summary of Results for Combined Unit
4.6.6 Energy Comparison with Radix-4 Divider
4.7 Summary of Estimation Error
5 Evaluation of the Designs
5.1 Impact of the Energy Reduction Techniques
5.2 Results and Comparisons among Radices
A Implementation of Blocks Common to Most Radices
A.1 Register
A.2 Carry-Save Adder
A.3 Selection Function
A.4 Multiple Generator
A.5 Sign-and-Zero Detection Unit (SZD)
A.6 Voltage Level Shifter
B CAD Tools
B.1 PET: Power Evaluation Tool
B.1.1 PET Energy and Power Models
B.1.2 PET Implementation
B.1.3 PET Testing
B.2 ACC: Automatic Cell Characterization
B.2.1 ACC Energy Views
B.2.2 ACC Implementation
B.3 Synopsys Power Compiler
B.3.1 Gate Transistor Dimensions
B.3.2 Composition
B.3.3 Pin Swapping
B.3.4 Sizing and Buffering