From article in proceedings:
The results in Table V were obtained from an implementation in a 90 nm standard-cell library at a clock rate of 100 MHz. The errors are computed with respect to a floating-point software implementation (for r4-mult, the error is the quantization error).
| Unit (MULT) | delay [ps] | area [μm²] | uma P_ave [μW] | uma e_ave | uma e_max | huse P_ave [μW] | huse e_ave | huse e_max | power ratio |
|---|---|---|---|---|---|---|---|---|---|
| r4-mult | 1398 | 7702 | 208 | 3.7 | 9 | 284 | 3.8 | 10 | 1.00 |
| r4-trunc-6 | 1254 | 5778 | 163 | 5.1 | 22 | 224 | 8.1 | 24 | 0.78 |
| r4-trunc-8 | 1244 | 5197 | 143 | 24.2 | 115 | 194 | 42.9 | 129 | 0.68 |
| sloppy-row-2 | 1286 | 7003 | 189 | 4.2 | 40 | 255 | 5.1 | 47 | 0.90 |
| sloppy-row-3 | 1286 | 6839 | 180 | 11.3 | 157 | 239 | 14.7 | 189 | 0.85 |
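As an illustration of how average and maximum errors like those in Table V could be computed, the following is a minimal sketch. It assumes the error is the absolute per-pixel difference between the floating-point reference output and the hardware output, which the excerpt does not state explicitly. The function name `error_stats` and the flattened-list input format are hypothetical.

```python
def error_stats(reference, approx):
    """Return (average, maximum) absolute error between two pixel sequences.

    Assumption: the error is the absolute per-pixel difference with respect
    to the floating-point reference. Both inputs are flat sequences of equal,
    non-zero length (e.g. an image flattened row by row).
    """
    if len(reference) != len(approx) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(r - a) for r, a in zip(reference, approx)]
    return sum(diffs) / len(diffs), max(diffs)


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    ref = [10.0, 20.0, 30.0, 40.0]
    hw = [10, 22, 29, 45]
    e_ave, e_max = error_stats(ref, hw)
    print(f"average error = {e_ave}, maximum error = {e_max}")
```

The same per-pixel differences would also provide the data for an error map and an error histogram of a decompressed image.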
The results show that the largest reduction in power is obtained for the radix-4 truncated multipliers. This is largely explained by the smaller area required by the accumulate circuitry (accumulate-path: CSA 4:2, two registers and final adder), which for the truncated schemes is reduced by up to 33% (16-bit vs. 24-bit accumulate-path). For the multiplier itself, as shown in Fig. 8, the smaller sloppy rows in the sloppy scheme compensate for the larger tree when compared to the truncated multipliers.
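The figures quoted above can be cross-checked with a short calculation. The sketch below is an assumption-laden illustration: it normalizes the summed average power of each unit over the two test images (uma and huse) to that of r4-mult, and computes the relative narrowing of the accumulate-path. How the paper actually combines or rounds the two power values is not stated in this excerpt, so the computed ratios should only be expected to land close to the table's power ratio column. The names `PAVE_UW`, `normalized_power`, and `width_reduction` are hypothetical.

```python
# Average power [uW] from Table V as (uma, huse) pairs.
PAVE_UW = {
    "r4-mult": (208, 284),
    "r4-trunc-6": (163, 224),
    "r4-trunc-8": (143, 194),
    "sloppy-row-2": (189, 255),
    "sloppy-row-3": (180, 239),
}


def normalized_power(pave, baseline="r4-mult"):
    """Power of each unit relative to the baseline.

    Assumption: the two per-image powers are summed before normalizing.
    """
    base_total = sum(pave[baseline])
    return {unit: sum(values) / base_total for unit, values in pave.items()}


def width_reduction(full_bits, reduced_bits):
    """Fractional reduction in datapath width."""
    return (full_bits - reduced_bits) / full_bits


if __name__ == "__main__":
    for unit, ratio in normalized_power(PAVE_UW).items():
        print(f"{unit:13s} {ratio:.3f}")
    # 16-bit vs. 24-bit accumulate-path
    print(f"accumulate-path narrowing: {width_reduction(24, 16):.1%}")
```

Under this assumption the ratios fall within about 0.01 of the tabulated values, and the 16-bit versus 24-bit accumulate-path corresponds to a one-third reduction in width.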
The complete visual results of the IDCT test are reported below.
[Figures not reproduced: ten sets of panels, each consisting of a Decompressed Image, an Error Map, and an Error Histogram.]
Alberto Nannarelli