Tunable Floating Point Integration with RISC-V Core
RISC-V K Version:
RISC-V (32-bit) core plus TFP Unit developed at DTU Compute
Authors:
K. Hesse, N. F. Frandsen, S. B. Lindengren, W. K. Mathiassen
Implementation: STM 45 nm Standard Cells. Clock rate 500 MHz
- Standard 5-stage RISC-V pipeline with two separate 32-entry register files, one for integer (X) and one for FP (F)
- 32-bit Base Integer Instruction Set RV32I
- F and Zicsr extensions
TFP32 Unit:
Clock rate 500 MHz
- TFP32 add/sub [1]
- TFP32 mul [2]
- TFP16 div/mul [3]
TFP Unit Control:
Set m, e, and bias through CSRs (control and status registers)
- csrwi 0x800, 0x3 # exponent width
- e=8 (encoded as "3")
- csrwi 0x801, 23 # mantissa width
- m=23
- li t0, 127; csrw 0x802, t0 # custom bias (127 exceeds the 5-bit csrwi immediate; t0 is an arbitrary scratch register)
- bias=127 (binary32)
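To make the roles of e, m, and bias concrete, here is a minimal Python sketch of how a bit pattern could be interpreted under a given setting. It assumes an IEEE 754-like layout (sign, then e exponent bits, then m fraction bits, with an implicit leading one), which this page does not spell out. The function name `decode_tfp` is hypothetical, and only zero and normal values are modeled.

```python
def decode_tfp(bits: int, e: int, m: int, bias: int) -> float:
    """Decode a (1 + e + m)-bit pattern laid out as sign | exponent | fraction.

    Assumption: IEEE 754-like encoding with an implicit leading one.
    Subnormals, infinities and NaNs are NOT modeled; any nonzero pattern
    is treated as a normal number.
    """
    sign = (bits >> (e + m)) & 0x1
    exponent = (bits >> m) & ((1 << e) - 1)
    fraction = bits & ((1 << m) - 1)

    # All-zero exponent and fraction encodes (signed) zero.
    if exponent == 0 and fraction == 0:
        return -0.0 if sign else 0.0

    significand = 1.0 + fraction / (1 << m)
    value = significand * 2.0 ** (exponent - bias)
    return -value if sign else value


if __name__ == "__main__":
    # binary32 setting from the CSR example: e=8, m=23, bias=127
    print(decode_tfp(0x3F800000, e=8, m=23, bias=127))
    # binary16 setting: e=5, m=10, bias=15
    print(decode_tfp(0x3C00, e=5, m=10, bias=15))
    # bfloat16 setting: e=8, m=7, bias=127
    print(decode_tfp(0x4040, e=8, m=7, bias=127))
```

The same function covers every format by changing only the three parameters, which mirrors how the three CSR writes retune the unit without changing the instruction stream.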
Case Study: SAXPY
SAXPY (Sum of A times X Plus Y)
To increase throughput, the loop was unrolled by a factor of 4 (4 SAXPY iterations per pass).
Figure: assembly code
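SAXPY updates each element as y[i] = a * x[i] + y[i]. The unrolling idea can be sketched in Python as follows. This is an illustration of the technique only, not the assembly code referred to above; the function name `saxpy_unroll4` and the remainder loop for lengths that are not a multiple of 4 are assumptions.

```python
def saxpy_unroll4(a, x, y):
    """Compute y[i] = a * x[i] + y[i] in place, 4 elements per loop pass."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")

    limit = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        # Four independent multiply-add updates per pass: fewer branch and
        # index-update instructions per element than a rolled loop.
        y[i] = a * x[i] + y[i]
        y[i + 1] = a * x[i + 1] + y[i + 1]
        y[i + 2] = a * x[i + 2] + y[i + 2]
        y[i + 3] = a * x[i + 3] + y[i + 3]
        i += 4

    # Remainder loop for the last n % 4 elements.
    for j in range(limit, n):
        y[j] = a * x[j] + y[j]
    return y


if __name__ == "__main__":
    print(saxpy_unroll4(2.0, [1.0, 2.0, 3.0, 4.0, 5.0],
                        [10.0, 20.0, 30.0, 40.0, 50.0]))
```

Because the four updates in a pass do not depend on one another, a pipelined core can issue them back to back, which is the usual motivation for unrolling.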
SAXPY unroll-4 execution - VCS/Verdi screenshot
The TFP unit is active in 25% of the cycles.
SAXPY: Power Efficiency
SAXPY Execution - Estimated Power in TFPU
| Format     | binary32 | binary16 | Bfloat16 | TFPe5m3 |
|------------|----------|----------|----------|---------|
| P(active)  | 6.71     | 5.41     | 4.66     | 4.34    |
| P(idle)    | 0.12     | 0.10     | 0.10     | 0.08    |
| P(average) | 1.72     | 1.39     | 1.21     | 1.12    |
| Ratio      | 1.00     | 0.81     | 0.70     | 0.65    |
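The Ratio row can be reproduced by dividing each P(average) by the binary32 value. The sketch below does that and, under the assumption that P(average) is an activity-weighted mix of P(active) and P(idle) (a model this page does not state), also derives the activity factor implied by the table. It comes out at a little over 0.24 for every format, close to the quoted 25% of active cycles. The dictionary and function names are hypothetical.

```python
# Values copied from the power table above (units as given there).
POWER = {
    "binary32": {"active": 6.71, "idle": 0.12, "average": 1.72},
    "binary16": {"active": 5.41, "idle": 0.10, "average": 1.39},
    "Bfloat16": {"active": 4.66, "idle": 0.10, "average": 1.21},
    "TFPe5m3": {"active": 4.34, "idle": 0.08, "average": 1.12},
}


def ratio_to_baseline(fmt, baseline="binary32"):
    """Average power of fmt relative to the baseline format."""
    return POWER[fmt]["average"] / POWER[baseline]["average"]


def implied_activity(fmt):
    """Solve P(average) = a * P(active) + (1 - a) * P(idle) for a.

    Assumption: average power is a simple duty-cycle-weighted mix.
    """
    p = POWER[fmt]
    return (p["average"] - p["idle"]) / (p["active"] - p["idle"])


if __name__ == "__main__":
    for fmt in POWER:
        print(f"{fmt:9s} ratio={ratio_to_baseline(fmt):.2f} "
              f"activity={implied_activity(fmt):.3f}")
```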
SAXPY: RISC-V Power Breakdown
References
[1] A. Nannarelli, "Tunable Floating-Point Adder," IEEE Transactions on Computers, vol. 68, no. 10, pp. 1553-1560, Oct. 2019.
[2] A. Nannarelli, "Tunable Floating-Point for Energy Efficient Accelerators," Proc. of 25th IEEE Symposium on Computer Arithmetic (ARITH-25), pp. 33-40, Amherst, USA, 25-27 June 2018.
[3] A. Nannarelli, "Variable Precision 16-bit Floating-Point Vector Unit for Embedded Processors," Proc. of 27th IEEE Symposium on Computer Arithmetic (ARITH 2020), pp. 96-102, Portland, USA, 7-10 June 2020.
Modified by Alberto Nannarelli on Wednesday December 04, 2024 at 17:53