
Power Characterization of Digital Filters Implemented on FPGA

Gian Carlo Cardarilli, Andrea Del Re, Alberto Nannarelli and Marco Re
Department of Electrical Engineering
University of Rome "Tor Vergata" - Italy

http://dspvlsi.uniroma2.it/

Abstract

The evaluation of power consumption in complex digital systems is a hard task that normally requires long simulation times and complicated models. In this work, we obtain power consumption estimates by measuring the average current absorbed by digital filters mapped on a Field Programmable Gate Array (FPGA). We also compare these measurements with the results previously obtained for a standard-cell implementation of the same filters. Moreover, we explore the possibility of measuring other electrical parameters on hardware to extract information on a system, instead of simulating its behavior with complicated models.

1 Introduction

In recent years, Field Programmable Gate Arrays (FPGAs) have gained increasing importance in several areas of VLSI design. Today, FPGAs, which offer a large number of equivalent gates and can be clocked at high frequencies, are becoming serious competitors of ASICs for some applications. Furthermore, because of their reconfigurability, FPGAs provide the quickest approach to system prototyping and a reliable tool to emulate in hardware the behavior of systems too complicated to be simulated in software. For these reasons, it can make sense to extract information on a system by implementing it on an FPGA and taking measurements on the hardware, instead of relying on complicated models and long simulations.

In this work, we evaluate the possibility of obtaining power consumption estimates from an FPGA implementation of the system being designed. For this purpose, we measure the average power dissipation of circuits mapped on an FPGA and compare the measurements with the results obtained by software simulation.

Previous work demonstrated that FIR filters implemented in the Residue Number System (RNS) offer better performance than filters realized in the traditional binary system in terms of area and power dissipation [1]. Those results were obtained for a standard-cell implementation of the filters. In this work, we want to verify whether consistent results are obtained for an FPGA implementation, and to comment on the trade-offs between the simulation-estimation and implementation-measurement approaches.

The measurements carried out on the FPGA implementation confirm that RNS filters consume less power than the corresponding two's complement (TCS) filters.

2 Previous work

In [1], we presented the implementation of digital FIR filters

$$y(n) = \sum_{k=0}^{N} a_k \, x(n-k)$$

realized with several architectures: FIR filters in both direct and transposed form, filters with constant and variable coefficients, error-free and truncated. The filters were implemented in the conventional two's complement arithmetic (TCS) and in Residue Number System (RNS).
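As a plain software reference, the FIR sum above can be evaluated directly as follows. This is a minimal sketch assuming integer samples and a zero initial state (x(n) = 0 for n < 0); it models the arithmetic only, not any of the hardware architectures of [1].

```python
def fir_direct(x, a):
    """Direct evaluation of y(n) = sum_{k=0}^{N} a[k] * x(n-k).

    `a` holds the coefficients a_0..a_N, `x` the input samples;
    samples before n = 0 are taken as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, a_k in enumerate(a):
            if n - k >= 0:  # zero initial state
                acc += a_k * x[n - k]
        y.append(acc)
    return y


print(fir_direct([1, 2, 3, 4], [1, 1, 1]))  # [1, 3, 6, 9]
```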

The use of the RNS allows the decomposition of a given dynamic range in slices of smaller range on which the computation can be implemented in parallel [2], [3]. Therefore, a FIR filter can be decomposed, as shown in Figure 1, into P filters of smaller dynamic range (P is the number of moduli) working in parallel. The overhead introduced by the input and output conversions (binary-RNS and RNS-binary) can be reduced by using efficient conversion techniques [4,].
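The decomposition of Figure 1 can be mimicked in software: each residue channel runs the same FIR filter with all arithmetic done modulo its own modulus, and the Chinese Remainder Theorem (CRT) rebuilds the binary output. The sketch below is a minimal illustration assuming non-negative inputs whose outputs stay inside the dynamic range; it uses the moduli set adopted later in this section, and a textbook CRT rather than the efficient converters of [4] or any signed-output mapping.

```python
from math import prod

MODULI = (3, 5, 7, 11, 17, 64)  # moduli set used for the 20-bit RNS filters


def fir_mod(x, a, m):
    """One residue channel: the FIR sum computed entirely modulo m.
    The `% m` on the operands plays the role of the binary-to-RNS conversion."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, a_k in enumerate(a):
            if n - k >= 0:  # zero initial state
                acc = (acc + (a_k % m) * (x[n - k] % m)) % m
        y.append(acc)
    return y


def crt(residues, moduli):
    """Textbook Chinese Remainder Theorem: value in [0, M) from its residues."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    return total % M


def fir_rns(x, a, moduli=MODULI):
    """P independent channel filters followed by RNS-to-binary conversion."""
    channels = [fir_mod(x, a, m) for m in moduli]      # filters working in parallel
    return [crt(rs, moduli) for rs in zip(*channels)]  # output conversion, per sample


print(fir_rns([3, 1, 4, 1, 5], [2, 7, 1]))  # [6, 23, 18, 31, 21]
```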

For the implementation of TCS filters, we chose to keep the product in carry-save (CS) format in order to speed up the operations. The conversion from the CS representation to two's complement is performed by a carry-propagate adder (CPA) in the last stage of the filter.
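The carry-save principle can be illustrated at bit level with a 3:2 carry-save adder: a running total is kept as a (sum, carry) pair with no carry propagation, and a single carry-propagate addition at the end recovers the two's complement result. This is a minimal sketch assuming a 20-bit word (matching the dynamic range used in this work) and a plain list of terms to add; it is not the product-level carry-save organization of [1].

```python
WIDTH = 20                    # word length, matching the 20-bit dynamic range
MASK = (1 << WIDTH) - 1


def csa(x, y, z):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z (mod 2**WIDTH).
    Each bit position is handled independently, so no carry ripples across the word."""
    s = (x ^ y ^ z) & MASK
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK
    return s, c


def cpa(s, c):
    """Final carry-propagate adder: collapses the (sum, carry) pair into one word."""
    return (s + c) & MASK


def to_signed(v):
    """Interpret a WIDTH-bit word as a two's complement integer."""
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v


def accumulate_cs(terms):
    """Add signed terms keeping the running total in carry-save form;
    only one carry-propagate addition is performed, at the end."""
    s, c = 0, 0
    for t in terms:
        s, c = csa(s, c, t & MASK)  # t & MASK: two's complement encoding of t
    return to_signed(cpa(s, c))


print(accumulate_cs([300, -1200, 45, 7]))  # -848
```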

We summarize some of the results of [1], limiting our discussion to the power dissipation of error-free FIR filters realized in transposed form, because they are modular with respect to the number of taps N (i.e. adding extra taps does not alter the filter architecture), and the results can easily be extended to any N. In addition, we consider the implementation of RNS filters that takes advantage of a redundant (carry-save) representation of the residues to reduce the cycle time (i.e. to increase the throughput). This scheme, which we abbreviate as CS-RNS, is discussed in detail in [6].
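A transposed-form filter can be modeled behaviorally: every tap multiplies the current input by its coefficient and adds the partial sum registered by the next tap, so adding a tap only extends the register chain. The sketch below assumes integer samples and zero initial registers; under those assumptions it produces the same outputs as the direct evaluation of the FIR sum.

```python
def fir_transposed(x, a):
    """Transposed-form FIR with one register per tap.

    regs[k] holds the partial sum passed to tap k; it is updated as
    regs[k] = a[k+1] * x(n) + regs[k+1] (old value), and y(n) = a[0] * x(n) + regs[0].
    """
    n_regs = len(a) - 1
    regs = [0] * n_regs  # zero initial state
    y = []
    for xn in x:
        y.append(a[0] * xn + (regs[0] if n_regs > 0 else 0))
        # Ascending order: regs[k + 1] still holds the value of the previous cycle.
        for k in range(n_regs):
            nxt = regs[k + 1] if k + 1 < n_regs else 0
            regs[k] = a[k + 1] * xn + nxt
    return y


print(fir_transposed([3, 1, 4, 1, 5], [2, 7, 1]))  # [6, 23, 18, 31, 21]
```

Because each stage only needs its own coefficient, the current input and the register of the next stage, an extra tap is just one more multiply-add stage appended to the chain.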

For all the filters (TCS, RNS, CS-RNS) we set a 20-bit dynamic range. To cover this range, we chose the following set of moduli for the RNS and CS-RNS filters

{ 3, 5, 7, 11, 17, 64 }
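A quick check of this moduli set: RNS representations are unique only if the moduli are pairwise coprime, and the product of the moduli must cover the 20-bit range. The snippet below verifies both and lists the number of bits each residue channel needs, which shows how narrow the individual channels are.

```python
from itertools import combinations
from math import gcd, log2, prod

MODULI = (3, 5, 7, 11, 17, 64)

# RNS representations are unique only if the moduli are pairwise coprime.
assert all(gcd(p, q) == 1 for p, q in combinations(MODULI, 2))

M = prod(MODULI)                                       # number of representable values
channel_bits = [(m - 1).bit_length() for m in MODULI]  # width of each residue

print(M, M >= 2**20, round(log2(M), 2))  # 1256640 True 20.26
print(channel_bits, sum(channel_bits))   # [2, 3, 3, 4, 5, 6] 23
```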

which gave us the best delay/area/power tradeoffs. In [6], the filters were realized in standard cells and characterized in terms of delay, area, and power dissipation. Table 1 reports the power dissipation as a function of the number of taps N for all the FIR filters with variable coefficients realized in transposed form. Column N^* gives the number of taps beyond which the RNS (or CS-RNS) filter dissipates less power than the corresponding TCS filter. Table 1 shows that, in standard cells, RNS filters consume less power than the corresponding TCS filters when the number of taps is larger than 4, while CS-RNS filters consume more than plain RNS filters.

Figure 1: RNS FIR filter.

Filter    T_c       Power @ 100 MHz [mW]    N^*
TCS       5.0 ns    15.5·N + 6              -
RNS       5.0 ns    7.8·N + 25              4
CS-RNS    3.0 ns    10.5·N + 27             6

Table 1: Summary of results of [6].
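The linear models of Table 1 can be evaluated for any number of taps. The helper below simply plugs N into the expressions of the table (power in mW at 100 MHz, standard cells); the function and dictionary names are illustrative. For example, at N = 16 the models give 254 mW (TCS), 149.8 mW (RNS) and 195 mW (CS-RNS).

```python
# Power models of Table 1 (standard cells, 100 MHz): P(N) = P1 * N + P0  [mW]
TABLE1 = {
    "TCS": (15.5, 6.0),
    "RNS": (7.8, 25.0),
    "CS-RNS": (10.5, 27.0),
}


def power_mw(arch, n_taps):
    """Evaluate the linear power model of one architecture for an N-tap filter."""
    p1, p0 = TABLE1[arch]
    return p1 * n_taps + p0


for n in (8, 16, 32):
    print(n, {arch: round(power_mw(arch, n), 1) for arch in TABLE1})
```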

3 Experimental set-up

The experimental set-up we used for the characterization of the power consumption in the FPGA is shown in Figure 2. We used the Xilinx AFXPQ240-110 development board equipped with a Virtex V600E HQ240 FPGA [7]. The FPGA can easily be reconfigured by writing directly into its internal RAM or into the 4 Mbit flash memory provided on the board. The programming bit-stream is downloaded from a PC using the parallel download cable supplied by Xilinx. The development board has separate power supplies for the FPGA core (V_CCINT) and for the I/O banks (V_CCO), so the current absorbed by the core and by the I/O pads can easily be measured separately.

The main target of the proposed test bed is the core power consumption; consequently, we measure the mean value of the current absorbed by the core. The filters under test were configured with a low-pass frequency mask and stimulated with a sequence of random samples with a uniform probability density function. The correct behavior of each filter was checked by acquiring its impulse response with the logic state analyzer. A picture of the Xilinx development board is shown in Figure 4.
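The functional check described above relies on the fact that the response of an FIR filter to a unit impulse is its coefficient sequence. A minimal software counterpart of this test bench is sketched below; the coefficient values and the 8-bit input width are hypothetical, since the text specifies neither, and the seeded generator only stands in for the random stimulus.

```python
import random


def fir(x, a):
    """Reference FIR model: y(n) = sum_k a[k] * x(n-k), zero initial state."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def impulse_response(a, length):
    """Drive the filter with a unit impulse; the output should reproduce the taps."""
    return fir([1] + [0] * (length - 1), a)


def uniform_stimulus(n_samples, bits=8, seed=0):
    """Signed samples drawn from a uniform distribution (bits is hypothetical)."""
    rng = random.Random(seed)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [rng.randint(lo, hi) for _ in range(n_samples)]


coeffs = [1, 3, 5, 3, 1]  # hypothetical symmetric low-pass taps
assert impulse_response(coeffs, 8) == coeffs + [0, 0, 0]
response = fir(uniform_stimulus(1000), coeffs)  # random-stimulus run
```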

Figure 2: Experimental set-up.

4 Results

The main purpose of the measurements is to evaluate whether the expressions of Table 1, found for standard cells, are also valid for an FPGA implementation. Based on measurements of FIR filters with a given number of taps, we want to find an expression for the power dissipated by filters of the same structure with any number of taps. For this reason, we fit the measured values with a curve of the type

$$P = P_1 \cdot N + P_0 \qquad (1)$$

which gives the average power dissipation of the N-tap filter.

To measure the average current consumption in the FPGA, we implemented six different filters: 8-tap and 16-tap TCS, 8-tap and 16-tap RNS, and 8-tap and 16-tap CS-RNS, all with a dynamic range of 20 bits. The RT-level VHDL descriptions of the filters were synthesized and mapped on the FPGA device using the Xilinx Foundation suite of tools.

Table 2 shows the average power dissipation and area occupation of the implemented circuits. The measurements were performed at different clock periods T_c. The average power dissipation was computed as $\bar{P} = V_{DD} \cdot \bar{I}$, where $V_{DD}$ is the FPGA core supply voltage and $\bar{I}$ is the measured average current. In CMOS, dynamic power dissipation scales with the clock frequency f = 1/T_c, but the power values of Table 2 show some anomalies due to inaccuracies in the measurement system. To improve the accuracy of the measurement, we decided to average the values obtained for different T_c by converting the average power dissipation into the energy consumed in one cycle:

$$E_c = \bar{P} \cdot T_c \quad [\mathrm{nJ}]$$

and then averaging E_c. Values of E_c and their average are also reported in Table 2. Therefore, instead of fitting Eq. (1), we fit

$$E_c(N) = E_1 \cdot N + E_0 \qquad (2)$$
To better evaluate the term E₀, we implemented, for the RNS and CS-RNS filters, a circuit containing only the two converters (binary-RNS and RNS-binary) connected in cascade. We obtained an average value of E_c(0) = 5.8 nJ.

                TCS 8-tap       TCS 16-tap      RNS 8-tap       RNS 16-tap      CS-RNS 8-tap    CS-RNS 16-tap
T_c [ns]        P_ave   E_c     P_ave   E_c     P_ave   E_c     P_ave   E_c     P_ave   E_c     P_ave   E_c
                [mW]    [nJ]    [mW]    [nJ]    [mW]    [nJ]    [mW]    [nJ]    [mW]    [nJ]    [mW]    [nJ]
1000            20.5    20.5    41.2    41.2    12.4    12.4    20.2    20.2    10.6    10.6    16.7    16.7
500             40.7    20.3    81.2    40.6    24.3    12.2    40.3    20.2    21.6    10.8    33.5    16.7
250             80.3    20.0    160.0   40.0    49.1    12.3    81.0    20.3    42.8    10.7    66.6    16.7
200             99.9    19.9    198.5   39.7    60.8    12.2    101.0   20.2    53.3    10.6    83.0    16.6
100             197.6   19.7    387.9   38.8    121.5   12.1    198.7   19.9    105.8   10.6    164.7   16.5
average         -       20.1    -       40.1    -       12.2    -       20.1    -       10.7    -       16.6
Area [slices]   1240            2440            1364            2310            1358            2274
Area [%]        17%             35%             19%             33%             19%             32%

Table 2: Measurements of average power and E_c.
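The conversion from average power to energy per cycle can be reproduced from Table 2. The sketch below uses the TCS 8-tap column; note that mW × ns gives pJ, hence the division by 1000 to obtain nJ. Working from the unrounded products, the average comes out at about 20.13 nJ, consistent with the 20.1 nJ reported in the table.

```python
# TCS 8-tap column of Table 2: clock period T_c [ns] -> measured average power [mW]
tcs8 = {1000: 20.5, 500: 40.7, 250: 80.3, 200: 99.9, 100: 197.6}


def energy_per_cycle_nj(p_mw, tc_ns):
    """E_c = P_ave * T_c; mW * ns = pJ, so divide by 1000 to obtain nJ."""
    return p_mw * tc_ns / 1000.0


ec = [energy_per_cycle_nj(p, tc) for tc, p in tcs8.items()]
ec_avg = sum(ec) / len(ec)
print(round(ec_avg, 1))  # 20.1
```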

Finally, we fit the average values of E_c for the filters to obtain the expressions of Table 3 and the curves of Figure 3.
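The text does not state how the fit was performed. The sketch below assumes an ordinary least-squares line through the averaged E_c values of Table 2, with the converter-only measurement E_c(0) = 5.8 nJ added as an extra point at N = 0 for the RNS and CS-RNS filters. Under these assumptions the fit gives roughly 0.9·N + 5.6 for RNS, 0.7·N + 5.6 for CS-RNS and 2.5·N + 0.1 for TCS, close to but not identical with Table 3, so the procedure actually used may differ (for example, in rounding).

```python
def fit_line(points):
    """Ordinary least-squares fit of E = E1 * N + E0 to a list of (N, E) points."""
    n = len(points)
    mean_n = sum(p[0] for p in points) / n
    mean_e = sum(p[1] for p in points) / n
    sxx = sum((N - mean_n) ** 2 for N, _ in points)
    sxy = sum((N - mean_n) * (E - mean_e) for N, E in points)
    e1 = sxy / sxx
    e0 = mean_e - e1 * mean_n
    return e1, e0


# Averaged E_c [nJ] from Table 2; the N = 0 points (converters only, 5.8 nJ)
# are included for the RNS-based filters (assumption).
data = {
    "TCS": [(8, 20.1), (16, 40.1)],
    "RNS": [(0, 5.8), (8, 12.2), (16, 20.1)],
    "CS-RNS": [(0, 5.8), (8, 10.7), (16, 16.6)],
}
for arch, pts in data.items():
    e1, e0 = fit_line(pts)
    print(f"{arch}: E_c = {e1:.2f}*N + {e0:.2f} nJ")
```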

From Figure 3 we can see that for filters with more than 4 taps (N^* = 4) the RNS filter consumes less power. This result is similar to that of [6] (N^* = 4 in Table 1). However, the results of [6] were computed on the synthesized netlist and did not take into account the contribution of interconnections. The value of N^* obtained here seems to confirm that, owing to the small dynamic range of the residues, RNS filters have shorter (or at least not longer) interconnection wires, and their routing is more local than that of TCS filters.

The result obtained for CS-RNS filters is even more interesting: in a standard cells implementation (Table 1) CS-RNS filters consume more energy than the corresponding plain RNS filters, while in the FPGA implementation CS-RNS filters consume less. In carry-save RNS filters, the modular sum

$$s_k = \left\langle s_{k-1} + a_k \, x(n-k) \right\rangle_{m_i}$$

is not computed in every tap; instead, s_k (kept in carry-save format) is reduced modulo m_i every 8 taps. Because additional registers are required to keep the carry-save representation of the partial sums s_k, there is a tradeoff between combinational logic (adders) and flip-flops. In standard cells, flip-flops consume considerably more energy than simple logic gates; therefore, the carry-save representation, which replaces adders with registers, leads to an increase in power dissipation. In FPGAs, combinational functions are implemented with look-up tables (LUTs), which apparently consume more energy than flip-flops. Therefore, in an FPGA implementation CS-RNS filters are not only faster than plain RNS (and TCS) filters, but also occupy less area and consume less power.
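The arithmetic behind the deferred reduction can be sketched as follows: accumulating the per-tap products and reducing modulo m_i only every 8 taps yields the same residue as reducing at every tap, at the cost of a wider intermediate sum. This is a minimal sketch assuming the per-tap products are unreduced products of two residues (each at most (m_i - 1)^2); it models only the values, not the carry-save storage or the actual register widths of [6].

```python
import random


def modsum_every_tap(products, m):
    """Plain RNS: reduce the running sum modulo m after every tap."""
    s = 0
    for p in products:
        s = (s + p) % m
    return s


def modsum_deferred(products, m, period=8):
    """Deferred reduction: accumulate freely and reduce modulo m every `period`
    taps; Python ints hide the wider intermediate word the hardware must store."""
    s = 0
    for k, p in enumerate(products, start=1):
        s += p
        if k % period == 0:
            s %= m
    return s % m  # final reduction when the tap count is not a multiple of period


def max_unreduced_bits(m, period=8):
    """Worst-case width of the unreduced sum: a reduced value (< m) plus `period`
    products of two residues, each at most (m - 1)**2 (an assumption)."""
    return (m - 1 + period * (m - 1) ** 2).bit_length()


rng = random.Random(1)
for m in (3, 5, 7, 11, 17, 64):
    prods = [rng.randrange(m) * rng.randrange(m) for _ in range(20)]
    assert modsum_deferred(prods, m) == modsum_every_tap(prods, m)
print(max_unreduced_bits(17))  # 12
```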

Filter    E_c [nJ]       N^*
TCS       2.5·N + 0.2    -
RNS       0.9·N + 5.6    4
CS-RNS    0.7·N + 5.7    3

Table 3: Expressions of E_c for the filters.
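The crossover point implied by two linear energy models can be computed by equating them. The sketch below does this for the expressions of Table 3 (the function name is illustrative): RNS and CS-RNS break even with TCS at about 3.4 and 3.1 taps respectively, and the integer N^* values listed in the table lie close to these points.

```python
# Energy models of Table 3: E_c(N) = E1 * N + E0  [nJ per cycle]
TABLE3 = {"TCS": (2.5, 0.2), "RNS": (0.9, 5.6), "CS-RNS": (0.7, 5.7)}


def crossover(arch, ref="TCS"):
    """Real-valued N at which `arch` and `ref` consume the same energy per cycle:
    E1a*N + E0a = E1r*N + E0r  =>  N = (E0a - E0r) / (E1r - E1a)."""
    e1a, e0a = TABLE3[arch]
    e1r, e0r = TABLE3[ref]
    return (e0a - e0r) / (e1r - e1a)


for arch in ("RNS", "CS-RNS"):
    print(arch, round(crossover(arch), 2))
```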

Figure 3: Curves of E_TCS, E_RNS and E_CSRNS.

5 Conclusions and future work

In this work, we measured the average power dissipation of FIR filters mapped on an FPGA and compared the measurements with the results previously obtained for a standard-cell implementation. The measurements confirm that RNS filters are smaller and consume less power than the corresponding TCS filters for filters with more than 4 taps. Moreover, the FPGA implementation of these filters seems to favor RNS even more than the standard-cell implementation does. Carry-save RNS filters offer the best delay/area/power tradeoff for FPGA implementations.

Furthermore, we wanted to explore the possibility of measuring electrical parameters on hardware to extract information on a system, instead of simulating its behavior with complicated models. This measurement approach is made possible by the easy, fast and inexpensive reprogrammability of FPGAs. Direct measurement requires an initial effort to set up the measuring environment and to establish a robust testing and measurement methodology. Once the environment is set up, the results are easy to obtain and quite accurate, within the uncertainty of the measuring environment. The advantage of direct measurement is that, no matter how complicated the system under test is, the results are obtained in a short time and with good accuracy. However, the case of the CS-RNS filter showed that results obtained with FPGAs cannot be extended to standard cells without a careful study of the characteristics of the different technologies and design styles.

In future work, we plan to extend the power analysis of circuits mapped on FPGAs to the measurement of the peak current and the current profile in a clock cycle. At the same time, we are looking into ways of understanding which design style is more appropriate to obtain power savings in circuits mapped on FPGAs.

Figure 4: Picture of the Xilinx development board in the measuring environment.

References

[1]: A. Nannarelli, M. Re, and G. C. Cardarilli, "Tradeoffs between Residue Number System and Traditional FIR Filters," Proc. of IEEE International Symposium on Circuits and Systems, vol. II, pp. 305-308, May 2001.
[2]: M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing, New York: IEEE Press, 1986.
[3]: M. A. Soderstrand and K. Al Marayati, "VLSI implementation of very high-order FIR filters," IEEE International Symposium on Circuits and Systems (ISCAS'95), vol. 2, pp. 1436-1439, 1995.
[4]: S. Piestrak, "A high-speed realization of a residue to binary number system converter," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, pp. 661-663, Oct. 1995.
[5]: M. Re, A. Nannarelli, G. C. Cardarilli, and R. Lojacono, "FPGA Implementation of RNS to Binary Signed Conversion Architecture," Proc. of IEEE International Symposium on Circuits and Systems, vol. IV, pp. 350-353, May 2001.
[6]: A. Del Re, A. Nannarelli, and M. Re, "Implementation of Digital Filters in Carry-Save Residue Number System," Proc. of 35th Asilomar Conference on Signals, Systems, and Computers, pp. 1309-1313, Nov. 2001.
[7]: Xilinx Inc., http://www.xilinx.com/.
