Tunable Floating Point Integration with RISC-V Core

RISC-V K Version: RISC-V (32-bit) core plus TFP Unit developed at DTU Compute


Authors: K. Hesse, N. F. Frandsen, S. B. Lindengren, W. K. Mathiassen

Implementation: STM 45 nm standard cells, clock rate 500 MHz

TFP32 Unit:
Clock rate 500 MHz
  • TFP32 add/sub [1]
  • TFP32 mul [2]
  • TFP16 div/mul [3]
TFP Unit Control:
Set the mantissa width m, the exponent width e and the bias through CSRs (control and status registers):
  • csrwi 0x800, 0x3 # exponent width  -  e=8 (encoded as "3")
  • csrwi 0x801, 23  # mantissa width  -  m=23
  • li t0, 127        # csrwi takes only a 5-bit immediate (0-31), so load the bias into a register first
  • csrw 0x802, t0    # custom bias  -  bias=127 (binary32)
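The CSR settings above select a number format by its exponent width e, mantissa width m and bias. The hypothetical Python helper `quantize` below is a minimal sketch of what such a format can represent: it rounds a value to the nearest number with e exponent bits, m fraction bits and the given bias. It assumes an IEEE 754-style encoding (hidden leading 1, subnormals, all-ones exponent reserved for infinities and NaN) and round-to-nearest-even; the TFP Unit's actual handling of rounding, subnormals and overflow may differ. Here `e` is the plain bit count, not the CSR encoding written to register 0x800.

```python
import math


def quantize(x: float, e: int, m: int, bias: int) -> float:
    """Round x to the nearest value with e exponent bits, m fraction bits
    and the given bias (assumed IEEE 754-style, round-to-nearest-even)."""
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    emax = (2 ** e - 2) - bias   # largest normal exponent (all-ones field reserved)
    emin = 1 - bias              # smallest normal exponent
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    # Exponent of a written as f * 2**exp with 1 <= f < 2.
    exp = math.frexp(a)[1] - 1
    # Below emin the spacing stays fixed: this models subnormals.
    exp = max(exp, emin)
    ulp = 2.0 ** (exp - m)       # spacing of representable values near a
    q = round(a / ulp) * ulp     # Python's round() rounds halves to even
    max_finite = (2.0 - 2.0 ** -m) * 2.0 ** emax
    if q > max_finite:           # overflow after rounding goes to infinity
        return sign * math.inf
    return sign * q
```

For example, `quantize(1.1, 5, 3, 15)` gives 1.125 in an e5m3 format (a bias of 15 is assumed here), while `quantize(0.1, 8, 23, 127)` reproduces the binary32 value of 0.1.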

Case Study: SAXPY

SAXPY (Single-precision A times X Plus Y)

$\displaystyle Y[k] = A \times X[k] + Y[k], \qquad k = 0, 1, \ldots, n-1$

To increase throughput, the assembly code is loop-unrolled by 4 (four SAXPY iterations per loop pass).
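The unroll-by-4 transformation can be sketched in Python as follows. This only illustrates the loop structure; it is not the authors' assembly code, which is not reproduced here, and it gives no speed-up in Python itself. The function names and the remainder loop for n not divisible by 4 are assumptions of this sketch.

```python
from typing import List


def saxpy_ref(a: float, x: List[float], y: List[float]) -> None:
    """Reference SAXPY, in place: y[k] = a*x[k] + y[k] for k = 0, ..., n-1."""
    for k in range(len(x)):
        y[k] = a * x[k] + y[k]


def saxpy_unroll4(a: float, x: List[float], y: List[float]) -> None:
    """Same computation with the loop body replicated four times."""
    n = len(x)
    k = 0
    # Main loop: four independent SAXPY iterations per pass. Loading all
    # operands first mirrors how an unrolled assembly loop can issue loads
    # and FP operations back to back instead of waiting on each result.
    while k + 4 <= n:
        x0, x1, x2, x3 = x[k], x[k + 1], x[k + 2], x[k + 3]
        y0, y1, y2, y3 = y[k], y[k + 1], y[k + 2], y[k + 3]
        y[k] = a * x0 + y0
        y[k + 1] = a * x1 + y1
        y[k + 2] = a * x2 + y2
        y[k + 3] = a * x3 + y3
        k += 4
    # Remainder loop (assumption): handles n that is not a multiple of 4.
    while k < n:
        y[k] = a * x[k] + y[k]
        k += 1
```

Both functions perform the same multiply and add per element in the same order, so they produce identical results; unrolling also cuts the loop-control overhead (index update and branch) to one per four elements.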


SAXPY unroll-4 execution - VCS/Verdi screenshot

The TFP Unit is active in 25% of the cycles.

SAXPY: Power Efficiency


SAXPY Execution - Estimated Power in TFPU

Format        binary32   binary16   bfloat16   TFPe5m3
P(active)       6.71       5.41       4.66       4.34
P(idle)         0.12       0.10       0.10       0.08
P(average)      1.72       1.39       1.21       1.12
Ratio           1.00       0.81       0.70       0.65

Ratio: P(average) normalized to binary32.
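The P(average) row appears consistent with a simple activity-weighted model, P(average) = α·P(active) + (1 − α)·P(idle), where α is the fraction of cycles in which the TFP Unit is active. The source does not state how P(average) was derived, so this is an assumption; the sketch below back-solves α from the table values and recomputes the Ratio row.

```python
# Values transcribed from the power table (units are not given in the source).
formats = ["binary32", "binary16", "bfloat16", "TFPe5m3"]
p_active = [6.71, 5.41, 4.66, 4.34]
p_idle = [0.12, 0.10, 0.10, 0.08]
p_average = [1.72, 1.39, 1.21, 1.12]


def implied_activity(pa: float, pi: float, pavg: float) -> float:
    """Solve pavg = alpha*pa + (1 - alpha)*pi for alpha (assumed model)."""
    return (pavg - pi) / (pa - pi)


alphas = [implied_activity(a, i, v) for a, i, v in zip(p_active, p_idle, p_average)]
ratios = [v / p_average[0] for v in p_average]  # normalized to binary32

for f, al, r in zip(formats, alphas, ratios):
    print(f"{f:9s} alpha = {al:.3f}  ratio = {r:.2f}")
```

With the table values, α comes out at roughly 0.243 to 0.244 for all four formats, in line with the rounded 25% activity figure, and the computed ratios match the table's last row.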

SAXPY: RISC-V Power Breakdown



References

  1. A. Nannarelli, "Tunable Floating-Point Adder," IEEE Transactions on Computers, vol. 68, no. 10, pp. 1553-1560, Oct. 2019.

  2. A. Nannarelli, "Tunable Floating-Point for Energy Efficient Accelerators," Proc. 25th IEEE Symposium on Computer Arithmetic (ARITH-25), Amherst, USA, pp. 33-40, 25-27 June 2018.

  3. A. Nannarelli, "Variable Precision 16-bit Floating-Point Vector Unit for Embedded Processors," Proc. 27th IEEE Symposium on Computer Arithmetic (ARITH 2020), Portland, USA, pp. 96-102, 7-10 June 2020.



Modified by Alberto Nannarelli on Wednesday December 04, 2024 at 17:53