Appendix B

CAD Tools

Introduction

In this appendix we describe the features of some CAD tools used in the realization of this work. A brief description of COMPASS tools is given in Chapter 4. First, the two tools developed in our laboratory (PET and ACC) are presented. Then, the main features of the commercial tool Synopsys Power Compiler are summarized.

B.1 PET: Power Evaluation Tool

PET belongs to the category of power estimators loosely coupled with the simulator. It is coupled with COMPASS Qsim and was developed internally for two main reasons:

to have a flexible tool that could be tailored to specific issues;
because, when the project started, no commercial tool could be adapted to COMPASS without considerable effort.

PET computes the energy and power dissipation by reading the energy views for the cells in the library, the layout-extracted netlist and the trace file generated by Qsim. The energy views are computed once for a given library, by characterization using ACC (Section B.2), and then stored in a database.

B.1.1 PET Energy and Power Models

As discussed in Section 1.2, the energy consumption in a cell is proportional to the output load, the supply voltage, the number of output transitions in a given time window and the energy dissipated internally. This is summarized by expression (1.5), which is rewritten below

\[ E_i = \left( \frac{1}{2} V_{DD}^2 C_L + E^{int} \right) n_i \]

where:

V_DD is the power supply voltage.
C_L is the total load applied to the output.
E^int is the internal energy dissipated in the cell during one transition.
n_i is the number of transitions at the output of the i-cell in the time window.

The term in parentheses

\[ E_{tran} = \frac{1}{2} V_{DD}^2 C_L + E^{int} \quad [\mathrm{J}] \]

represents the energy per transition. The average power dissipated in a cell can be computed from the energy, by introducing the following quantities:

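f₀ is the circuit main frequency (clock frequency),

a_i is the activity factor:

\[ a_i = \frac{\text{nr. of output transitions (in time window)}}{\text{nr. of clock cycles (in time window)}} = \frac{n_i}{n_T} \]

The average power is then

\[ P_i = \left( \frac{1}{2} V_{DD}^2 C_L + E^{int} \right) a_i f_0 = E_i \frac{f_0}{n_T} \quad [\mathrm{W}] \]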
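As a numerical illustration of the energy and power expressions for a single cell, the following Python sketch evaluates both quantities. The function names and the example values (3.3 V supply, 50 fF load, 0.1 pJ internal energy, 100 MHz clock) are assumptions chosen for illustration and are not taken from the library; the factor 1/2 follows expression (1.5).

```python
def cell_energy(v_dd, c_load, e_int, n_transitions):
    """Energy [J] dissipated by one cell: (1/2 V_DD^2 C_L + E_int) * n_i."""
    e_tran = 0.5 * v_dd ** 2 * c_load + e_int  # energy per transition [J]
    return e_tran * n_transitions


def cell_power(v_dd, c_load, e_int, n_transitions, n_cycles, f0):
    """Average power [W]: P_i = E_i * f0 / n_T."""
    return cell_energy(v_dd, c_load, e_int, n_transitions) * f0 / n_cycles


# Hypothetical example: 40 output transitions in a window of 100 clock cycles
energy = cell_energy(3.3, 50e-15, 0.1e-12, 40)
power = cell_power(3.3, 50e-15, 0.1e-12, 40, 100, 100e6)
```

With these values the activity factor is a_i = 40/100 = 0.4, and computing the power as E_tran a_i f₀ gives the same result as E_i f₀ / n_T.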

In a sequential cell, internal switching that does not affect the cell's output also dissipates energy. To take this contribution into account, we write the energy and power expressions as follows:

\[ E_i = \left( \frac{1}{2} V_{DD}^2 C_L + E^{int} \right) n_i + E^{cl} n_i^{cl} \quad [\mathrm{J}] \]

\[ P_i = \left( \frac{1}{2} V_{DD}^2 C_L + E^{int} \right) a_i f_0 + E^{cl} f_i^{cl} \quad [\mathrm{W}] \]

where:

E^cl is the energy dissipated internally per transition due to clock switching.
f_i^cl = (n_i^cl / n_T) f₀ is the frequency of the transitions of the cell's clock⁵.

Now we consider a large circuit containing N cells, N_S of which are sequential. The total energy consumption in the time window is given by:

\[ E_{total} = \sum_{i=1}^{N} \left( \frac{1}{2} V_{DD}^2 C_{Li} + E_i^{int} \right) n_i + \sum_{i=1}^{N_S} E_i^{cl} n_i^{cl} \quad [\mathrm{J}] \tag{B.1} \]

In summary, to calculate the energy dissipated, given by expression (B.1), we need to determine the values of the following parameters:

V_DD is the power supply voltage.
C_Li is the load at the output of the i-cell.
E_i^int is the energy per transition dissipated inside the i-cell.
n_i is the number of transitions seen at the output of the i-cell.
E_i^cl is the energy dissipated internally in the sequential i-cell due to clock switching.
n_i^cl is the number of the clock transitions seen at the input of the sequential i-cell.

To compute the power dissipation

\[ P_{total} = \frac{f_0}{n_T} \sum_{i=1}^{N} \left( \frac{1}{2} V_{DD}^2 C_{Li} + E_i^{int} \right) n_i + \frac{f_0}{n_T} \sum_{i=1}^{N_S} E_i^{cl} n_i^{cl} = \frac{f_0}{n_T} E_{total} \quad [\mathrm{W}] \tag{B.2} \]

we need two additional values:

f₀: the clock frequency.
n_T: the number of clock cycles in the time window we are considering.

The quantities V_DD, E^int and E^cl depend on the library that we are using. C_L depends on the design and layout (type of cell connected and wire capacitance) and the number of transitions depends on the design and on the set of input vectors used.

The procedure to determine the energy and power dissipation is the following:

For the chosen library determine the quantities E^int and E^cl for each cell. These values can be provided directly by the silicon vendors or obtained by cell characterization.
From the layout, extract the capacitance (output load plus interconnection capacitance) at each node and associate them as output load (C_L) to each cell.
Run a simulation on a set of randomly chosen test vectors using a tool that is able to detect transitions (e.g. a logic-level simulator).
Calculate energy and power using expression (B.1) and expression (B.2).
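The last step of the procedure can be sketched in Python as follows. This is only an illustration of expressions (B.1) and (B.2), not the PET code: the `Cell` record, the function names and all numerical values are hypothetical, and the factor 1/2 is taken from expression (1.5).

```python
from dataclasses import dataclass


@dataclass
class Cell:
    c_load: float        # C_Li: output load [F]
    e_int: float         # E_i^int: internal energy per transition [J]
    n_out: int           # n_i: output transitions in the time window
    e_clk: float = 0.0   # E_i^cl: clock energy per transition [J], sequential cells only
    n_clk: int = 0       # n_i^cl: clock transitions seen at the cell input


def total_energy(cells, v_dd):
    """Expression (B.1): switching, internal and clock energy summed over all cells."""
    return sum(
        (0.5 * v_dd ** 2 * c.c_load + c.e_int) * c.n_out + c.e_clk * c.n_clk
        for c in cells
    )


def total_power(cells, v_dd, f0, n_cycles):
    """Expression (B.2): P_total = (f0 / n_T) * E_total."""
    return f0 / n_cycles * total_energy(cells, v_dd)


# Hypothetical two-cell circuit observed over n_T = 10 clock cycles
cells = [
    Cell(c_load=20e-15, e_int=0.05e-12, n_out=10),                           # combinational
    Cell(c_load=30e-15, e_int=0.08e-12, n_out=4, e_clk=0.02e-12, n_clk=20),  # flip-flop
]
e_total = total_energy(cells, v_dd=2.0)
p_total = total_power(cells, v_dd=2.0, f0=50e6, n_cycles=10)
```

Combinational cells simply keep the default clock terms at zero, so a single sum covers both sums of expression (B.1).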

B.1.2 PET Implementation

The procedure described above was implemented in PET. It consists of three C routines (analyze, ttgen and calpot) and the use of two COMPASS tools: Qsim (logic-level simulator) and extract (COMPASS Interconnect layout to netlist extractor) [38]. The latter is used to determine the capacitance (including wires) at each node of the circuit while Qsim is used to determine the logic values of the nodes used later to determine the number of transitions. PET is structured as depicted in Figure B.1.

Figure B.1: Structure of PET.

analyze reads the extracted netlist and determines the output load for each cell of the circuit. It also provides to Qsim the labels of the nodes to monitor. The files read are:

a configuration file containing general parameters such as: power supply voltage (V_DD), clock frequency, time window of the simulation.
the netlist [nle] extracted by extract.
a file containing the mapping of cell's pins for the library. It is needed to associate the capacitance of node x to the output of cell i.

The files produced are:

a list of the labels (file [mon]) corresponding to the nodes to be monitored by Qsim.
the reference capacitance-node (file [cap]).
the reference cell's output-node (file [acn]).
the reference label-node (file [lab]).

All these references are resolved later by calpot. The [mon] file is incorporated with the input stimuli in the simulation file [sim] to be used along with the netlist [nle] in the simulator.

ttgen (transitions table generator) reads the simulation output file [trc] and creates a transitions table [trn]. In this table each label/node is associated with the number of transitions that occurred at that node during the simulation.
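The counting performed by ttgen can be illustrated with a short Python sketch. The format of the Qsim [trc] file is not described here, so the sketch assumes the trace has already been parsed into one list of sampled logic values per monitored label; this in-memory layout and the function names are hypothetical.

```python
def count_transitions(samples):
    """Count the changes between consecutive logic values of one node."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def transition_table(trace):
    """Map each monitored label to its number of transitions."""
    return {label: count_transitions(values) for label, values in trace.items()}


# Hypothetical parsed trace: one list of sampled logic values per label
trace = {
    "clk": [0, 1, 0, 1, 0, 1],
    "q":   [0, 0, 1, 1, 1, 0],
    "en":  [1, 1, 1, 1, 1, 1],
}
table = transition_table(trace)
```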

Finally, calpot calculates energy and power dissipation according to expression (B.1) and expression (B.2). The files read are:

the three files generated by analyze: [cap], [acn], [lab].
the transitions table file [trn] produced by ttgen.
the configuration file containing library parameters.
a file containing the values E^int and E^cl (energy views) for each cell of the library.

B.1.3 PET Testing

PET was tested on a limited set of benchmarks by comparing its results with those obtained using SPICE, where the power was calculated as the product of the voltage and the average current over a time window of the same size as that used for PET [46]. The error was never greater than 10% (the largest benchmark circuit contained about 3,000 transistors).

The main drawback of PET is that it accounts for a fixed amount of short-circuit current for each cell, determined independently of the transition time. This can lead to a lack of accuracy in some situations, for example in the power dissipation of blocks not in the critical path, where signals could have slow ramps. An approach to include a more accurate evaluation of the short-circuit current is described in [47]. However, the improvement in the results obtained is not large enough to justify a significantly greater modeling effort.

B.2 ACC: Automatic Cell Characterization

As an increasing number of transistors is packed in a single chip, the design tools (CAD tools) have to handle larger circuits. Because it is unrealistic to simulate the behavior of a complete system with an electrical-level simulator, such as SPICE, design tools are shifting toward higher levels of abstraction. These levels of abstraction are organized in a hierarchical structure with the circuit/electrical level at the bottom of the hierarchy. Circuit characterization is necessary to provide information on the electrical properties of small functional parts of the system to higher hierarchical levels. In general, cell characterization provides capacitance, timing and power values for all the cells in the library to CAD tools operating at gate level. In our specific case, we characterize the standard cell library to extract the energy views necessary for PET.

B.2.1 ACC Energy Views

ACC (Automatic Cell Characterization) is a tool that performs library characterization by automatically running several SPICE simulations on all the cells of the library. It is derived from the tool presented in [48], and can characterize cells for timing, capacitance and energy. However, in this appendix, we only focus on characterization for energy.

As described in Section B.1, the PET energy model for a single cell is

\[ E = \left( \frac{1}{2} V_{DD}^2 C_L + E^{int} \right) n_i + E^{cl} n_i^{cl} \]

where:

V_DD is the power supply voltage.
C_L is the total load applied to the output.
E^int is the internal energy dissipated in the cell during one transition.
n_i is the number of output transitions in the time window.
E^cl is the energy dissipated internally due to clock switching.
n_i^cl is the number of clock transitions, if the cell is sequential.

Of all the quantities indicated in the above expression, the ones obtained by characterization are E^int and E^cl (energy views).

It is convenient to characterize a cell over a period of time in which two output transitions occur (one low-to-high and one high-to-low). The value of energy is computed as the product of V_DD and the value obtained by numerical integration of the current i(t) over a time window [t₁, t₂] in which two transitions occur:

\[ E_{cy} = \int_{t_1}^{t_2} v(t)\, i(t)\, dt \cong V_{DD} \sum_{k=0}^{N} i(t_1 + k \Delta t)\, \Delta t \quad \text{with} \quad \Delta t = \frac{t_2 - t_1}{N} \]

The graph of the current i(t₁ + kΔt) is obtained by SPICE simulation with resolution step Δt. By simulating the cell with different loads we determine different values of E_cy. The value of E^int can then be obtained as follows:

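By linear curve fitting of the values of C_L and E_cy, we obtain the two coefficients x₁ and x₀:

\[ E_{cy} = x_1 C_L + x_0 \]
From expression (1.5), with n_i = 2 output transitions in the window, we get:

\[ E_{cy} = \left( \frac{1}{2} V_{DD}^2 C_L + E^{int} \right) n_i = 2 \left( \frac{1}{2} V_{DD}^2 C_L + E^{int} \right) \]
By combining the two expressions above:

\[ V_{DD}^2 C_L + 2 E^{int} = x_1 C_L + x_0 \]

we obtain:

\[ V_{DD}^2 = x_1 \quad \text{and} \quad E^{int} = \frac{x_0}{2} \]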

Note that the value of x₁ can be used to evaluate the accuracy of the linear curve fitting, since the actual value of V_DD is known.
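The fitting step can be sketched in Python as follows. This is a minimal illustration using an ordinary least-squares line, not the ACC code; the function names are hypothetical and the data are synthetic, generated with assumed values V_DD = 3.0 V and E^int = 0.2 pJ so that the recovered coefficients can be checked.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = x1 * x + x0; returns (x1, x0)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    x1 = sxy / sxx
    x0 = mean_y - x1 * mean_x
    return x1, x0


def internal_energy(loads, e_cy):
    """Return (E_int, fitted V_DD) from E_cy values measured at several loads."""
    x1, x0 = linear_fit(loads, e_cy)
    return x0 / 2.0, x1 ** 0.5   # E_int = x0 / 2 and V_DD = sqrt(x1)


# Synthetic measurements: E_cy = V_DD^2 * C_L + 2 * E_int for four loads
loads = [10e-15, 20e-15, 40e-15, 80e-15]
e_cy = [3.0 ** 2 * c + 2 * 0.2e-12 for c in loads]
e_int, v_dd_fit = internal_energy(loads, e_cy)
```

Comparing `v_dd_fit` with the supply voltage used in the simulations gives the accuracy check mentioned above.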

For sequential cells, the contribution due to clock switching, E^cl, is measured independently of the output load by applying an input pattern that causes no output transitions (i.e. n_i = 0).
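A possible way to turn such a simulation into a value of E^cl is sketched below in Python. It assumes that the supply current is available as equally spaced samples and that the window energy is divided by the number of clock transitions in the window; the function names and the sample values are hypothetical, and the integral is approximated by a simple rectangle-rule sum.

```python
def supply_energy(v_dd, current, dt):
    """Approximate E = V_DD * integral of i(t) dt by a rectangle-rule sum."""
    return v_dd * sum(current) * dt


def clock_energy(v_dd, current, dt, n_clock_transitions):
    """E_cl: energy per clock transition when the output does not switch."""
    return supply_energy(v_dd, current, dt) / n_clock_transitions


# Hypothetical supply-current samples [A] taken every 1 ns over two clock
# transitions, with the data input held so that the output never toggles
current = [0.0, 2e-6, 4e-6, 2e-6, 0.0, 0.0, 2e-6, 4e-6, 2e-6, 0.0]
e_cl = clock_energy(3.0, current, 1e-9, 2)
```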

Note that the internal energy includes the energy due to short-circuit current, which depends on the slope of the transitions. In our characterization for PET, we assumed the input slope to be constant for the library and chose it as the response time of a gate with drive strength of one [43], [49]. This assumption leads to accurate energy values when the circuit is optimized for timing, because longer transition times result in longer delays. More detailed information on the characterization of energy due to the short-circuit current is provided in [47].

B.2.2 ACC Implementation

The structure of ACC is shown in Figure B.2. ACC reads three databases containing the SPICE netlists of the cells in the library, a set of loads (CapLib), and different waveforms to be applied as input stimuli (WaveLib). In addition, ACC reads three files containing the simulation specifications, the global parameters for SPICE, and the SPICE models for the transistors.

Figure B.2: Structure of ACC.

ACC was implemented with routines written in C and scripts in UNIX C-shell; for further details see [50]. The flow of ACC is described in Table B.1.

Source configuration file containing library paths and global parameters.
For each cell in library
{
  Create a working directory $CELLNAME.
  Copy into $CELLNAME the simulation specifications (sim.specs).
  Copy into $CELLNAME the SPICE subcircuit ($CELLNAME.sub).
  For each line in sim.specs (i.e. each specification)
  {
    Create SPICE netlist ($CELLNAME.spi).
    Write file containing simulation variables (var).
    For each capacitance value C_L in var
    {
      For each input stimulus set specified in var
      {
        Run SPICE.
        Extract value (e.g. E_tran) specified in var.
      }
    }
    Elaborate results (polynomial fitting).
  }
  Write energy view.
}

Table B.1: ACC working flow.
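The nested sweep of Table B.1 can be sketched in Python as follows. This is not the ACC implementation (which uses C routines and C-shell scripts): file handling is omitted, and the SPICE run plus value extraction is replaced by a caller-supplied function. All names, including `run_spice` and `fake_spice`, are hypothetical.

```python
def characterize_cell(cell, loads, stimuli, run_spice):
    """Sweep loads and stimuli for one cell and collect the measured values.

    run_spice(cell, c_load, stimulus) stands in for the SPICE run and the
    extraction of the requested value; it must return one number.
    """
    results = []
    for c_load in loads:            # for each capacitance value C_L
        for stimulus in stimuli:    # for each input stimulus set
            results.append((c_load, stimulus, run_spice(cell, c_load, stimulus)))
    return results


def characterize_library(cells, loads, stimuli, run_spice):
    """Outer loop of the flow: one result list per cell in the library."""
    return {cell: characterize_cell(cell, loads, stimuli, run_spice) for cell in cells}


def fake_spice(cell, c_load, stimulus):
    """Stand-in simulator returning E_cy for assumed V_DD = 3 V, E_int = 0.1 pJ."""
    return 3.0 ** 2 * c_load + 2 * 0.1e-12


views = characterize_library(["nand2", "inv"], [10e-15, 20e-15], ["rise_fall"], fake_spice)
```

Passing the simulator in as a function keeps the sweep logic testable without a SPICE installation; the collected results are what the fitting step would then process.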

B.3 Synopsys Power Compiler

We summarize below the main features of Synopsys Power Compiler. In particular we discuss the power model, the cost function and some techniques used to reduce the power dissipation. Most of the information and data are derived from those presented in [15].

Power Compiler is built on the synthesis environment of Design Compiler and allows power optimization to be performed together with delay and area optimization. Power Compiler obtains its power estimates from Design Power. The power dissipated is divided into three contributions:

Switching power: (1/2) C V² f. It depends on pin and wire capacitance, whose values are available in the synthesis technology libraries, and on transition counts described by toggle rates obtained either from Design Power's probabilistic estimation algorithm or from gate-level simulation.
Internal power: power consumed inside the gate. The internal power model is nonlinear and is provided by ASIC vendors as a look-up table derived from SPICE characterization. This energy table is indexed by the cell's input edge rates (slopes) and output loads, and yields an energy value that is then multiplied by the toggle rate of the output.
Static or leakage power: a single constant value for the cell, specified by the ASIC vendor.
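The three contributions can be combined as in the following Python sketch. It is not Power Compiler's algorithm: the description above does not say how the energy table is interpolated, so bilinear interpolation is an assumption made here, and the function names, the table layout and all numerical values are hypothetical.

```python
from bisect import bisect_right


def switching_power(cap, v_dd, toggle_rate):
    """Switching power [W]: 1/2 * C * V^2 * f, with f taken as the toggle rate."""
    return 0.5 * cap * v_dd ** 2 * toggle_rate


def _bracket(axis, x):
    """Index i of the table interval [axis[i], axis[i + 1]] used for x (clamped)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_energy(table, slopes, loads, slope, load):
    """Bilinear interpolation in an energy table indexed by slope and load."""
    i, j = _bracket(slopes, slope), _bracket(loads, load)
    u = (slope - slopes[i]) / (slopes[i + 1] - slopes[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])


def cell_power(table, slopes, loads, slope, load, v_dd, toggle_rate, leakage):
    """Sum of switching, internal and leakage power for one cell [W]."""
    internal = lookup_energy(table, slopes, loads, slope, load) * toggle_rate
    return switching_power(load, v_dd, toggle_rate) + internal + leakage


# Hypothetical 2x2 energy table [J]: rows = input slope [s], columns = load [F]
slopes = [0.1e-9, 1.0e-9]
loads = [10e-15, 100e-15]
table = [[1.0e-13, 2.0e-13],
         [3.0e-13, 4.0e-13]]
e_mid = lookup_energy(table, slopes, loads, 0.55e-9, 55e-15)
p_cell = cell_power(table, slopes, loads, 0.55e-9, 55e-15, 2.0, 1e6, 1e-9)
```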

In Power Compiler the cost function is prioritized as follows:

maximum delay
minimum delay
maximum dynamic power
maximum leakage power
maximum area.

This means that timing constraints will not be violated to save power, but available time slack will be used to reduce it. A transformation is accepted if it decreases one of the cost functions without increasing any higher-priority cost.
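The acceptance rule can be read as a lexicographic comparison of cost vectors, as in the following Python sketch. This is an interpretation of the rule stated above, not the tool's actual implementation; the cost names and the numerical values are hypothetical.

```python
# Costs ordered from highest to lowest priority, following the list above
PRIORITY = ["max_delay", "min_delay", "dynamic_power", "leakage_power", "area"]


def accept(before, after):
    """Accept a transformation if it lowers some cost without raising a
    higher-priority one. Both arguments map cost names to cost values."""
    for name in PRIORITY:
        if after[name] > before[name]:
            return False   # a cost got worse before any improvement was found
        if after[name] < before[name]:
            return True    # the first cost that differs is an improvement
    return False           # nothing changed


before = {"max_delay": 0.0, "min_delay": 0.0, "dynamic_power": 5.0,
          "leakage_power": 0.2, "area": 100.0}
saves_power = dict(before, dynamic_power=4.5, area=110.0)
breaks_timing = dict(before, max_delay=0.3, dynamic_power=3.0)
```

Here `saves_power` is accepted even though the area grows, because area has lower priority than dynamic power, while `breaks_timing` is rejected despite its larger power saving.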

The circuit transformations described next each try to reduce one of the main factors contributing to the power dissipation: gate transistor dimensions, net switching activity, net transition times and net capacitive loading.

B.3.1 Gate transistor dimensions

The dimensions of the transistors that compose a CMOS gate can influence a number of factors that determine the power consumption of a design. Sizing of a cell is done by choosing different implementations of the same logic function. These implementations might differ in their parasitic capacitance and internal power.

B.3.2 Composition

In order to reduce the switching power, Power Compiler merges or composes sets of cells into a more complex one. The switching power of the enclosed net is completely eliminated; however, the internal power of the new cell is higher because of the increased gate size.

B.3.3 Pin swapping

Some cells can have input pins that are symmetric with respect to the logic function (for example, in a 2-input NAND gate the two input pins are symmetric), but have different capacitance values. Power can be reduced by assigning a higher switching rate net to a lower capacitance pin.
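The idea can be illustrated with a small Python sketch that pairs the most active nets with the lowest-capacitance pins of a set of symmetric inputs. The function names, pin names, capacitances and toggle rates are hypothetical, and the greedy sort-and-pair strategy is an assumption made for illustration rather than the tool's documented method.

```python
def assign_pins(net_toggle_rates, pin_caps):
    """Pair the most active nets with the lowest-capacitance symmetric pins."""
    nets = sorted(net_toggle_rates, key=net_toggle_rates.get, reverse=True)
    pins = sorted(pin_caps, key=pin_caps.get)
    return dict(zip(pins, nets))


def switched_capacitance(assignment, net_toggle_rates, pin_caps):
    """Sum of pin capacitance times toggle rate, proportional to switching power."""
    return sum(pin_caps[p] * net_toggle_rates[n] for p, n in assignment.items())


# Hypothetical 2-input NAND in which pin B has the lower input capacitance
pin_caps = {"A": 12e-15, "B": 8e-15}
rates = {"n1": 1e6, "n2": 20e6}
best = assign_pins(rates, pin_caps)
swapped = {"A": "n2", "B": "n1"}
```

With these values the chosen assignment connects the fast net n2 to pin B and yields a smaller capacitance-times-rate sum than the swapped assignment.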

B.3.4 Sizing and buffering

The power due to the net transition time can be reduced by decreasing the transition times at the inputs. Power Compiler replaces the driver of a net with one of higher drive strength to sharpen the edge of the transition. Alternatively, buffers can be used to reduce the transition time. The drawback is that the added capacitance (larger transistors in the driver, or extra gates to implement buffers) might offset the reductions obtained.

Footnotes:

⁵ There are 2 transitions per clock period. Therefore, f_i^cl is twice the frequency of the cell's clock.

Last modified: Fri Jul 9 11:14:41 PDT 1999