VLSI Implementation of a 4 × 4-bit Multiplier in a Two Phase Drive Adiabatic Dynamic CMOS Logic

Yasuhiro TAKAHASHI, Toshikazu SEKINE, Members, and Michio YOKOYAMA, Nonmember

1. Introduction

With the spread of personal digital assistant (PDA) units, VLSI circuits are required to be smaller, to operate at lower voltages, and to consume less power. Power consumption determines the battery life of a PDA unit, so reducing the power consumption of VLSI circuits is important. Adiabatic (or energy-recovering) logic is a promising approach originally developed for low-power digital circuits [1]–[7]. In Ref. [7], we proposed a new adiabatic circuit topology called the Two Phase drive Adiabatic Dynamic CMOS Logic (2PADCL) circuit. The 2PADCL achieves ultra-low energy dissipation by restricting current flow to devices with a low voltage drop and by recycling the energy stored in the internal (i.e., gate-source) capacitance.

In this paper, we describe a VLSI implementation of a 4 × 4-bit multiplier using the 2PADCL circuit technology. The basis of the logic is presented in Sect. 2. In Sect. 3, we describe the design, trial fabrication, and evaluation of a 4 × 4-bit multiplier integrated circuit built with 2PADCL. In Sect. 4, the performance of the proposed adiabatic multiplier is compared with that of a static CMOS multiplier. The conclusions are summarized in Sect. 5.

2. Adiabatic Logic

2.1 Adiabatic Switching

The main idea of adiabatic switching is that transitions are made sufficiently slowly that little heat is dissipated. This is made possible by replacing the DC power supply with a resonant LC driver or oscillator. If a constant current source delivers the charge \( Q = CV_{DD} \) (where \( C \) is a load or internal capacitance and \( V_{DD} \) is the DC power supply voltage) during the time period \( \Delta T \), the energy dissipation in the channel resistance \( R \) is given by

\[
E_{\text{diss}} = \xi P \Delta T = \xi I^2 R \Delta T = \xi \left( \frac{CV_{DD}}{\Delta T} \right)^2 R \Delta T,
\]

where \( \xi \) is a shape factor that depends on the shape of the clock edges [8]. It takes its minimum value \( \xi_{\text{min}} = 1 \) if the load capacitor is charged by a constant (DC) current. For a sinusoidal current, \( \xi = \pi^2/8 \approx 1.23 \). The above equation indicates that, in theory, the energy dissipation approaches zero as the charging period \( \Delta T \) becomes indefinitely long. This is called adiabatic switching [1].
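As a rough numerical illustration of the expression above, the following Python sketch compares the adiabatic dissipation with the \( \frac{1}{2}CV_{DD}^2 \) dissipated by conventional switching. The component values (`C`, `VDD`, `R`) and the function names are assumptions chosen only for illustration; they are not taken from the fabricated chip.

```python
import math

def adiabatic_energy(c, vdd, r, dt, xi=1.0):
    """Energy dissipated in the channel resistance: xi * I^2 * R * dT."""
    current = c * vdd / dt  # constant charging current I = C * VDD / dT
    return xi * current ** 2 * r * dt

def conventional_energy(c, vdd):
    """Energy dissipated when charging C abruptly from a DC supply."""
    return 0.5 * c * vdd ** 2

# Assumed values, for illustration only.
C = 0.1e-12    # load capacitance [F]
VDD = 5.0      # supply voltage [V]
R = 10e3       # channel resistance [ohm]

XI_SINE = math.pi ** 2 / 8  # shape factor for a sinusoidal current

for dt in (1e-9, 1e-8, 1e-7, 1e-6):
    ratio = adiabatic_energy(C, VDD, R, dt, XI_SINE) / conventional_energy(C, VDD)
    print(f"dT = {dt:.0e} s  ->  E_diss / E_conventional = {ratio:.4g}")
```

The ratio equals \( 2\xi RC/\Delta T \), so each tenfold increase in the charging period reduces the dissipation tenfold; the benefit over conventional switching appears only when \( \Delta T \) is long compared with \( RC \).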

2.2 2PADCL

The 2PADCL inverter is shown at the top of Fig. 1(a); the inverter is driven by complementary phases of the power supply signal. The supply waveform consists of two modes, “evaluation” and “hold,” as shown at the bottom of Fig. 1(a). Let us consider the adiabatic mode. When \( V_p \) and \( \overline{V}_p \) are in evaluation mode, there is a conducting path through either the PMOS devices or the NMOS devices. The output node may change from low to high or from high to low, or remain unchanged, which resembles the behavior of a CMOS circuit. Thus, there is no need to restore the node voltage to 0 (or \( V_{DD} \)) every cycle. When \( V_p \) and \( \overline{V}_p \) are in hold mode, the output node holds its value even though \( V_p \) and \( \overline{V}_p \) are changing*. This can be seen by observing the function of the diodes and the fact that the inputs of a gate have a different phase from the output. Circuit nodes are not

---

SUMMARY  Adiabatic logic is a technique for designing low-power digital VLSI circuits. This paper describes the design and VLSI implementation of a multiplier using a two phase drive adiabatic dynamic CMOS logic (2PADCL) circuit. Circuit operation and performance have been evaluated using a 4 × 4-bit 2PADCL multiplier fabricated in a 1.2 µm CMOS process. The experimental results show that the multiplier operates at clock frequencies up to 800 kHz. The total power dissipation of the 4 × 4-bit 2PADCL multiplier was 5.19 mW at a DC power supply voltage of 1.5 V.

\* The stored input values are held in the “hold” phase for the next stage, so the hold mode is equivalent to an adiabatic latch function.

where $C_L$ is the load capacitance and $V_d$ is the threshold voltage of the diode. In the above equation, the second term on the right-hand side is nearly equal to zero because the diodes used in ADCL are implemented by MOSFETs (i.e., $V_t \approx V_d$); the energy dissipation $E_{ADCL}$ is then approximated by

$$E_{ADCL} \approx 2C_L(V_p - 2V_d)V_d. \quad (3)$$

On the other hand, energy dissipation of 2PADCL can be calculated as follows:

$$E_{2PADCL} = 2C_{gs}(V_p - 2V_d)V_d + C_{gs}(V_t - V_d)^2 \approx 2C_{gs}(V_p - 2V_d)V_d, \quad (4)$$

where $C_{gs}$ is the gate-source capacitance of the next stage. Assuming $C_L = 0.1\,\text{pF}$, $C_{gs} = 0.02\,\text{pF}$, $V_t = V_d = 0.8\,\text{V}$, and $V_p = 5\,\text{V}$, we have

$$E_{ADCL} = 0.54 \text{pJ/cycle},$$

$$E_{2PADCL} = 0.11 \text{pJ/cycle}. \quad (5)$$
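The figures in Eq. (5) can be checked by evaluating Eqs. (3) and (4) directly. The short Python sketch below does this with the parameter values stated in the text; the function names are illustrative only.

```python
def e_adcl(c_load, v_p, v_d):
    """Eq. (3): approximate energy per cycle of an ADCL gate [J]."""
    return 2.0 * c_load * (v_p - 2.0 * v_d) * v_d

def e_2padcl(c_gs, v_p, v_d, v_t):
    """Eq. (4): energy per cycle of a 2PADCL gate [J], including the (V_t - V_d)^2 term."""
    return 2.0 * c_gs * (v_p - 2.0 * v_d) * v_d + c_gs * (v_t - v_d) ** 2

# Parameter values quoted in the text.
C_L = 0.1e-12    # load capacitance [F]
C_GS = 0.02e-12  # gate-source capacitance of the next stage [F]
V_T = 0.8        # MOSFET threshold voltage [V]
V_D = 0.8        # diode threshold voltage [V]
V_P = 5.0        # power supply amplitude [V]

adcl_pj = e_adcl(C_L, V_P, V_D) * 1e12
two_phase_pj = e_2padcl(C_GS, V_P, V_D, V_T) * 1e12
print(f"E_ADCL   = {adcl_pj:.2f} pJ/cycle")
print(f"E_2PADCL = {two_phase_pj:.2f} pJ/cycle")
print(f"ratio    = {adcl_pj / two_phase_pj:.1f}")
```

Because both expressions share the factor $2(V_p - 2V_d)V_d$ when $V_t = V_d$, the ratio of the two energies reduces to $C_L / C_{gs}$, which is 5 for the values assumed here.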

Since a 2PADCL gate can maintain its output voltage without a load capacitor, its energy consumption can be reduced further.

2.3.2 Data Timing Analysis

The minimum energy consumption in a single inverter is reached when the phase difference between the power supply and the input data is $\pi/2$ rad, so that the data lags the power supply. When the phase difference is between 0 rad and $\pi/2$ rad, the output transition of the toggling gate takes place partly in CMOS mode and partly in adiabatic mode. If the phase difference is regarded as being determined by a stochastic process, the functionality of the 2PADCL gate is guaranteed only when $f_{clk} > f_s$, where $f_{clk}$ and $f_s$ are the frequency of the power supply and the frequency of the input signal, respectively. In practice, the clock timing ratios $f_{clk}/f_s \in \{5, 7, 9, 11, 13\}$ provide about 80 to 90% adiabatic operation. Therefore, the energy dissipation of 2PADCL, taking the clock timing into account, is as follows:

$$E_{2PADCL} = \frac{0.11 \times \xi}{0.8} \approx 0.17 \text{pJ/cycle}. \quad (6)$$
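The arithmetic of Eq. (6) can be reproduced as follows. This sketch assumes that $\xi$ is the sinusoidal shape factor $\pi^2/8$ from Sect. 2.1 and that the divisor 0.8 is the lower end of the 80 to 90% adiabatic operation range; both readings are inferred from the surrounding text rather than stated explicitly next to the equation.

```python
import math

E_IDEAL_PJ = 0.11           # ideal 2PADCL energy per cycle from Eq. (5) [pJ]
XI_SINE = math.pi ** 2 / 8  # shape factor for a sinusoidal supply (assumed reading)

def e_with_timing(e_ideal_pj, xi, adiabatic_fraction):
    """Scale the ideal energy by the shape factor and the fraction of adiabatic operation."""
    return e_ideal_pj * xi / adiabatic_fraction

for fraction in (0.8, 0.9):
    energy = e_with_timing(E_IDEAL_PJ, XI_SINE, fraction)
    print(f"adiabatic fraction {fraction:.1f}: {energy:.3f} pJ/cycle")
```

With a fraction of 0.8 the result rounds to the 0.17 pJ/cycle quoted in Eq. (6); the 0.9 case is shown only to indicate how the estimate moves across the stated range.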

3. 4×4-bit Multiplier Fabrication of 2PADCL Technology

The simplest parallel multiplier is built by laying out pairs of an AND gate and a 1-bit full adder repetitively and connecting them in sequence to form an $n \times n$ array [9]. All the partial products are computed in parallel and then collected through a cascade of carry-ripple adders (or carry-save adders). The completion time is limited by the depth of the carry-ripple array (or carry-save array) and by the carry propagation in the adder.

The array multiplier is composed of rows of adders for recursive shift-addition operations. Sum and carry signals generated in the previous rows are transferred to the next rows as two of the three inputs. Therefore, the power consumption increases if transitions of these signals occur frequently. Basic array multipliers (e.g., the Braun [9] and Baugh-Wooley [10] schemes) consume low power and exhibit relatively good performance. In this study, we use Braun's parallel array multiplier, as shown in Fig. 2.
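The structure of an unsigned array multiplier of this kind can be illustrated with a bit-level software model. The Python sketch below follows one common textbook formulation of Braun's array (AND gates for the partial products, rows of carry-save adder cells, and a final carry-ripple row); it is a behavioral illustration only, and the exact cell wiring of Fig. 2 is not reproduced. All function names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b

def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))

def braun_multiply(x, y, n=4):
    """Bit-level model of an n x n unsigned array multiplier (n >= 2)."""
    a = [(x >> k) & 1 for k in range(n)]
    b = [(y >> k) & 1 for k in range(n)]
    # AND-gate array: pp[i][j] has weight 2**(i + j).
    pp = [[a[j] & b[i] for j in range(n)] for i in range(n)]

    sums = pp[0][:]        # after row i, sums[j] has weight 2**(i + j)
    carries = None         # after row i, carries[j] has weight 2**(i + j + 1)
    product = [sums[0]]    # product bits, least significant first

    for i in range(1, n):
        new_sums, new_carries = [], []
        for j in range(n - 1):
            if carries is None:   # first adder row: no incoming carries
                s, c = half_adder(pp[i][j], sums[j + 1])
            else:
                s, c = full_adder(pp[i][j], sums[j + 1], carries[j])
            new_sums.append(s)
            new_carries.append(c)
        new_sums.append(pp[i][n - 1])  # top partial product passes straight through
        sums, carries = new_sums, new_carries
        product.append(sums[0])

    # Final carry-ripple row merges the remaining sum and carry bits.
    carry = 0
    for j in range(n - 1):
        if j == 0:
            s, carry = half_adder(sums[1], carries[0])
        else:
            s, carry = full_adder(sums[j + 1], carries[j], carry)
        product.append(s)
    product.append(carry)

    return sum(bit << k for k, bit in enumerate(product))

# Exhaustive check against integer multiplication for 4-bit operands.
assert all(braun_multiply(x, y) == x * y for x in range(16) for y in range(16))
```

For $n = 4$ this model uses 16 AND gates, 4 half adders, and 8 full adders, and the sum and carry outputs of each row feed the next row as described above.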

3.1 Implementation of 4 × 4-bit Multiplier

In order to validate the functionality of the 2PADCL logic and evaluate its performance, a 4 × 4-bit parallel carry-ripple array multiplier has been implemented in a 1.2 μm double-metal double-poly n-well CMOS technology. The chip size is 2.3 × 2.3 mm², and the chip is mounted in a 52-pin SQFP. The transistor size W/L is 5.0 μm/1.2 μm for both the PMOS and the NMOS transistors. Conventional CMOS circuits are used in the input and output buffers to provide CMOS interface compatibility; these interface circuits are therefore non-adiabatic. The multiplier occupies an area of 926 × 704 μm², and its photomicrograph is shown in Fig. 3.

3.2 Operational Speed

Figure 4 displays the measurement results obtained with a digital oscilloscope (Tektronix Co., TDS410A). In the experiment, the supply voltages and the clock signal are as follows:

supply voltage $V_{DD} = 5\,\text{V}$, DC,

supply voltage $V_p = 5\,\text{V}$, 10 MHz, sine wave,

clock $f = 800\,\text{kHz}$, 5 V, square wave.

![Figure 2: Structure of a 4 × 4-bit multiplier. HA and FA are half adders and full adders, respectively.](image)

![Figure 3: Photomicrograph of the 4 × 4-bit 2PADCL multiplier.](image)

![Figure 4: Measurement results of 4×4-bit 2PADCL multiplier (multiply 15 × 15). Vertical scale: 5 V/div. Horizontal scale: 0.5 μs/div.](image)

In this figure, the critical-path delay includes the delay of the input and output buffers. The experimental results show that the multiplier operates at clock frequencies up to 800 kHz.

3.3 Power Dissipation

To measure the power dissipation of the proposed 4×4-bit multiplier, we used the power supply circuit illustrated in Fig. 5. This circuit is based on the Clapp oscillator, which generates a sinusoidal voltage, and is built from discrete components. The MOS transistor 2SK241 is a normally-on type, which allows the DC power supply voltage to be reduced. The current flowing out of the circuit shown in Fig. 5 is set by adjusting the 100 Ω source resistance so as to generate a sinusoidal power supply of 5 V peak-to-peak at the output terminals $V_p$ and $\overline{V}_p$. The frequency of $V_p$ is set near 10 MHz by adjusting the values of L or C.
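How the oscillation frequency depends on L and C can be illustrated with the idealized Clapp oscillator formula, $f = 1/(2\pi\sqrt{L C_{eq}})$, where $C_{eq}$ is the series combination of the three tank capacitors. The component values in the Python sketch below are hypothetical, since the element values of Fig. 5 are not given in the text, and the formula ignores the capacitive load presented by the multiplier chip.

```python
import math

def series_capacitance(*caps):
    """Equivalent capacitance of capacitors connected in series [F]."""
    return 1.0 / sum(1.0 / c for c in caps)

def clapp_frequency(inductance, c1, c2, c3):
    """Ideal Clapp oscillator frequency [Hz]; loading and parasitics are ignored."""
    c_eq = series_capacitance(c1, c2, c3)
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * c_eq))

# Hypothetical component values chosen to land near 10 MHz.
L_TANK = 10e-6   # [H]
C1 = 1e-9        # [F]
C2 = 1e-9        # [F]
C3 = 26e-12      # [F] tuning capacitor in series with the inductor

f_osc = clapp_frequency(L_TANK, C1, C2, C3)
print(f"f = {f_osc / 1e6:.2f} MHz")
```

When $C_3$ is much smaller than $C_1$ and $C_2$, it dominates $C_{eq}$, which is why adjusting a single capacitor (or the inductor) is enough to tune the frequency.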

The oscillator circuit shown in Fig. 5 is connected to the 4×4-bit 2PADCL multiplier to supply power, forming the 2PADCL system. The 2PADCL system has been experimentally confirmed to operate correctly. The dependence of the system power dissipation on the DC power supply voltage is shown in Fig. 6. From the figure, we find that the 2PADCL system can operate with a sub-1 V DC power supply. The power dissipation is 5.19 mW at a DC power supply voltage of 1.5 V; this operating point is chosen on the assumption that the 2PADCL system will be used as the processor of a PDA. Finally, Table 1 summarizes the main specifications of the 2PADCL multiplier.
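Using the breakdown listed in Table 1 (5.00 mW for the Clapp oscillator and 190 μW for the multiplier at 800 kHz and 1.5 V DC), the share of the total power consumed by the multiplier itself and its energy per clock cycle can be estimated as follows. The variable names are illustrative.

```python
P_OSCILLATOR_W = 5.00e-3   # Clapp oscillator power from Table 1 [W]
P_MULTIPLIER_W = 190e-6    # multiplier power from Table 1 [W]
F_CLOCK_HZ = 800e3         # clock frequency [Hz]

p_total_w = P_OSCILLATOR_W + P_MULTIPLIER_W
multiplier_share = P_MULTIPLIER_W / p_total_w
energy_per_cycle_j = P_MULTIPLIER_W / F_CLOCK_HZ  # energy used by the multiplier per clock period

print(f"total power       = {p_total_w * 1e3:.2f} mW")
print(f"multiplier share  = {multiplier_share * 100:.1f} %")
print(f"energy per cycle  = {energy_per_cycle_j * 1e12:.1f} pJ")
```

The two components sum to the reported 5.19 mW, and the oscillator built from discrete components accounts for most of the system power.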

4. Power Delay Product of the Multiplier with Other Logics

The parameter used to compare the performance of the multipliers is the power-delay product (PDP), that is, the energy required to carry out one multiplication. Simulation results for the power dissipation and delay time are summarized in Table 2. These simulations cover the power consumption of the 4×4-bit multiplier only and do not include the dissipation due to clock generation. From Table 2, the main results can be summarized as follows:

1. When the power supply voltage for these logic circuits is 5 V, the PDP of the 2PADCL multiplier is the best of the three logic circuits.
2. As the power supply voltage decreases, the PDP of the CMOS multiplier decreases. As a result, the PDP of the CMOS multiplier at a 2.0 V power supply voltage is about four times better than the PDP of the multipliers implemented in the adiabatic logics.
3. In every case, the power dissipation of the 2PADCL multiplier is the lowest of the three logic circuits.

<table>
<caption>Table 1: Summary of the $4 \times 4$-bit 2PADCL multiplier chip.</caption>
<thead>
<tr>
<th>Feature</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Maximum clock frequency</td>
<td>800 kHz</td>
</tr>
<tr>
<td>Technology</td>
<td>1.2 μm n-well CMOS, 2 metal, 2 poly</td>
</tr>
<tr>
<td>Transistors</td>
<td>844</td>
</tr>
<tr>
<td>Transistor size W/L</td>
<td>5.0 μm/1.2 μm</td>
</tr>
<tr>
<td>Multiplier area</td>
<td>$926 \times 704\,\mu\text{m}^2$</td>
</tr>
<tr>
<td>Die area</td>
<td>$2.3 \times 2.3\,\text{mm}^2$</td>
</tr>
<tr>
<td>Total power dissipation</td>
<td>5.19 mW @ 800 kHz, 1.5 V DC</td>
</tr>
<tr>
<td></td>
<td>(Clapp oscillator: 5.00 mW; multiplier: 190 μW)</td>
</tr>
</tbody>
</table>

<table>
<caption>Table 2: Comparison of different $4 \times 4$-bit multipliers.</caption>
<thead>
<tr>
<th>Logic</th>
<th>Power consumption</th>
<th>Delay time</th>
</tr>
</thead>
<tbody>
<tr>
<td>CMOS</td>
<td>22.3 mW</td>
<td></td>
</tr>
<tr>
<td></td>
<td>5.91 mW</td>
<td></td>
</tr>
<tr>
<td></td>
<td>1.03 mW</td>
<td></td>
</tr>
<tr>
<td>ADCL [4],[5]</td>
<td>331 μW</td>
<td></td>
</tr>
<tr>
<td>2PADCL</td>
<td>304 μW</td>
<td></td>
</tr>
</tbody>
</table>
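The PDP used in this comparison is simply the product of power and delay, which has the dimension of energy. The Python sketch below shows the calculation; the power values are those listed in Table 2, while the delay passed in is a hypothetical placeholder chosen only to demonstrate the arithmetic and is not a simulated or measured result.

```python
def power_delay_product(power_w, delay_s):
    """Energy per operation [J] = power [W] x delay [s]."""
    return power_w * delay_s

# Power values from Table 2.
P_ADCL_W = 331e-6
P_2PADCL_W = 304e-6

# Hypothetical delay, for illustration only (not taken from the paper).
HYPOTHETICAL_DELAY_S = 100e-9

for name, power in (("ADCL", P_ADCL_W), ("2PADCL", P_2PADCL_W)):
    pdp_pj = power_delay_product(power, HYPOTHETICAL_DELAY_S) * 1e12
    print(f"{name}: PDP = {pdp_pj:.1f} pJ at the assumed delay")
```

Because the PDP scales linearly with both factors, a logic family with lower power can still have a worse PDP if its delay is proportionally longer, which is why power and PDP are reported as separate findings.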

5. Conclusions

In this paper, a 4×4-bit 2PADCL multiplier has been implemented in a 1.2 μm CMOS process technology with an area of 926 × 704 μm². The experimental results have shown that the multiplier operates at clock frequencies up to 800 kHz. The total power dissipation of the 4×4-bit 2PADCL multiplier is 5.19 mW, including the power supply.

Acknowledgment

The 4×4-bit 2PADCL multiplier chip in this letter has been fabricated in the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with On-Semiconductor, Nippon Motorola LTD., HOYA Corporation, and KYOCERA Corporation.

References