Optimizing CMOS Circuits for Performance Improvements Using Adiabatic Logic

Information Technology Journal

Year: 2007 | Volume: 6 | Issue: 3 | Page No.: 325-331
DOI: 10.3923/itj.2007.325.331

P. Vijayakumar, M. Shanthanalakshmi and K. Gunavathi

Abstract: Retiming is an efficient technique for redistributing latches in circuits in order to improve circuit performance. However, retiming inherently increases power dissipation because latches are added to pipeline the circuit. We present an Improved Positive Feedback Adiabatic Logic (IPFAL), which exploits the inherent pipelining of adiabatic circuits with the help of a four-phase power clock. IPFAL efficiently recovers the stored charge from the load capacitance and hence reduces the non-adiabatic losses in the circuits. The proposed technique is validated by applying it to a number of ISCAS benchmark circuits, and the results are compared with those of circuits pipelined with the retiming algorithm. The experimental results show that circuits implemented in IPFAL achieve a reduction in power dissipation of up to 89% compared with those using the retiming algorithm.


How to cite this article

P. Vijayakumar, M. Shanthanalakshmi and K. Gunavathi, 2007. Optimizing CMOS Circuits for Performance Improvements Using Adiabatic Logic. Information Technology Journal, 6: 325-331.

Keywords: dynamic power dissipation, Retiming, adiabatic logic and glitching

INTRODUCTION

As battery-powered products, especially mobile applications using integrated circuits, become more important, severe constraints are imposed on the power that these circuits can consume. In addition, the integration of an increasing number of transistors on a single chip causes more power dissipation, which leads to reliability and IC packaging problems. These considerations reveal the need to reduce the power dissipation of integrated circuits.

A decrease in power dissipation can be achieved at different levels of abstraction of the IC design (Weste, 2002). Some of the power optimization techniques at the logic level reported in the literature are voltage scaling, gate resizing and precomputation. All these techniques optimize power at the expense of the speed of the circuit (Malik et al., 1990; Monteria et al., 1993; Favalli and Benini, 1995). However, a considerable reduction in circuit speed is not acceptable. Hence this paper aims at the concurrent optimization of both power and speed.

Retiming, a very promising sequential optimization technique, moves latches across combinational logic blocks in order to pipeline the circuit without changing the logic function inside the blocks (Even et al., 1996; Eckl and Legl, 1998). Pipelining decomposes a circuit into a number of segments, with each segment operating concurrently with all other segments, thus increasing the speed of the circuit. Intermittent latches separate the segments from one another. The retiming algorithm is used to place these latches judiciously. This is important because the power reduction is greater only if the latches are placed at the nodes having a higher cost function. The cost function is estimated by taking into account various factors such as glitching, fan-in, fan-out and the probability of a transition at a gate propagating through its transitive fan-out (Leiserson and Saxe, 1991). As the number of pipeline stages increases, the latch count also increases, which in turn increases the power dissipation and the silicon area.

This study proposes a new power optimization technique called Improved Positive Feedback Adiabatic Logic (IPFAL). Adiabatic switching is an attractive approach for reducing power dissipation in digital logic. When adiabatic switching is used, the signal energies stored on the circuit capacitance may be recycled instead of being dissipated as heat. A number of adiabatic logic architectures have been proposed for low power VLSI design, such as Efficient Charge Recovery Logic (ECRL), 2N-2N2P logic and Positive Feedback Adiabatic Logic (PFAL) (Moon and Jeong, 1996; Oklobdzija and Maksimovic, 1997). An exhaustive comparison of these logic families can be found in Amirante et al. (2001). Each adiabatic logic family has some advantage over the others, but all of them suffer from non-adiabatic loss. The proposed IPFAL overcomes this by completely recovering the stored charge during the recovery phase through an additional recovery path. The other advantage of adiabatic logic is that it utilizes the inherent pipelining of the circuits by using four-phase trapezoidal power clock lines. This eliminates the latches used in the retimed circuits and hence gives a greater reduction in power dissipation.

A comparative study is made between conventional retiming and IPFAL after applying them to a set of ISCAS benchmark circuits. From the simulation results it is observed that the circuits in IPFAL dissipate up to 89% less power than the ones with pipelining implemented using the retiming algorithm. The rest of the paper covers retiming, adiabatic logic including the proposed IPFAL, simulation results and conclusions.

RETIMING

Retiming is an algorithm to pipeline a given circuit into a number of stages so that the overall glitching activity in the circuit is reduced. This section explains the concept of pipelining and then presents a working algorithm for retiming.

Pipelining: Pipelining is a technique by which a given circuit is divided into a number of segments separated by latches, so that each of these segments functions independently. Thus the throughput, and hence the speed of the circuit, is increased. It is characteristic of pipelines that several computations can be in progress in different segments simultaneously. This concurrent operation is achieved by associating a latch with each segment, which provides isolation between segments in the pipeline.

The pipelined structure of N combinational logic blocks can be represented as shown in Fig. 1. Each segment consists of an input latch followed by a combinational circuit. The input latch holds the input data and the combinational circuit performs its operation in the particular segment. The output of this segment is given as the input to the input latch of the next segment. A common clock is applied to all the latches after enough time has elapsed to perform the sub-operations in all the segments.

Glitch reduction: In CMOS circuits, glitches contribute more than 60% of the total power dissipation even with glitch-free inputs (Raghunathan et al., 1999). For effective low power circuit design, glitch reduction is a must. Retiming is a technique to reduce the glitches and hence the glitch power. It involves repositioning the latches in a pipelined circuit so that the glitches are minimized. In other words, retiming places the intermittent latches of the pipelined circuit judiciously such that glitch reduction is achieved.


Fig. 1:	Pipeline structure


Fig. 2:	a) Gate and b) Gate with latch

Consider a gate G as shown in Fig. 2a. Assume N_G is the average switching activity at the output of the gate G and C is the load capacitance. The power P₁ dissipated at the output of this gate is proportional to N_G·C. Figure 2b shows the gate G along with the latch F. Assume C_F is the capacitance at the input of the latch F, and let the average switching activity at the output of the latch F be N_F. Then the power P₂ dissipated at the output of the latch is proportional to N_G·C_F + N_F·C. It is known that C_F << C and N_F << N_G, so the power at the output of the latch, P₂, is less than the power at the output of the gate, P₁. Thus, by placing a latch at the output of a gate, glitch power dissipation is reduced. This is possible only if N_F << N_G. For this to happen, the node selected must be at a position where the number of glitches is high. Hence, by placing the latches at those nodes where the probability of occurrence of glitches is higher, a greater reduction in dissipated power can be realized.
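The comparison between P₁ and P₂ can be checked numerically. The following is a minimal sketch in Python; the function names and all numerical values are hypothetical and chosen only for illustration, and the results are relative quantities because the common proportionality factor is omitted.

```python
def gate_power(n_g, c_load):
    """Relative power at the output of gate G: P1 ~ N_G * C."""
    return n_g * c_load


def latched_power(n_g, c_f, n_f, c_load):
    """Relative power with latch F after gate G: P2 ~ N_G * C_F + N_F * C."""
    return n_g * c_f + n_f * c_load


# Hypothetical values, not taken from the paper.
C_LOAD = 20e-15   # load capacitance C (F)
C_F = 2e-15       # latch input capacitance C_F (F), much smaller than C
N_G = 0.8         # switching activity at the gate output (glitchy node)

p1 = gate_power(N_G, C_LOAD)
# Case 1: the latch output switches far less than the gate output (N_F << N_G).
p2_low_activity = latched_power(N_G, C_F, n_f=0.2, c_load=C_LOAD)
# Case 2: the latch output switches as often as the gate output (N_F = N_G).
p2_same_activity = latched_power(N_G, C_F, n_f=0.8, c_load=C_LOAD)

print("latch helps when N_F << N_G:", p2_low_activity < p1)
print("latch helps when N_F = N_G:", p2_same_activity < p1)
```

With these assumed numbers the latch lowers the relative power only in the first case, which matches the condition N_F << N_G stated above.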

Algorithm: In the retiming algorithm, weights or costs are assigned to each node in the circuit depending upon the glitching, the number of fan-ins and fan-outs, and the probability of glitches at the gate propagating through its transitive fan-out. Then a latch is placed at the output of the node with maximum cost or weight, by tracing the path from each primary input to each primary output (Even et al., 1996). The flow of the algorithm is as follows.

1. Find the power, capacitance and switching probability at each node for the unit delay model (P₁, C₁, S₁) and the zero delay model (P₀, C₀, S₀).
2. The power dissipated at any given node is

$$P = \frac{1}{2} C V^2 f_{clock} N \qquad (3)$$

where C = load capacitance, V = supply voltage, f_clock = clock frequency and N = switching activity. From this, the switching activity for the unit delay model and for the zero delay model is given by:

$$N_1 = \frac{P_1}{\frac{1}{2} V^2 f_{clock} C_1} \qquad (4)$$

$$N_0 = \frac{P_0}{\frac{1}{2} V^2 f_{clock} C_0} \qquad (5)$$

3. The amount of glitching is the difference between the switching activity of the circuit for the unit delay model and that for the zero delay model. Therefore

$$N_{glitch} = N_1 - N_0$$

4. Let fi(x) and fo(x) be the fan-in and fan-out respectively at a node x. Then the corresponding weight function W(x) is given as

(6)

Hence

(7)

5. Trace all paths from each primary input to each primary output.
6. If the weights of all nodes are zero, place no latches and go to step 10; otherwise go to step 7.
7. Place a latch at the output of the node having the maximum weight among all nodes on all paths in the circuit.
8. Set to zero the weight of every node in the fan-in or fan-out of the chosen node. This ensures that no path has more than one latch.
9. Check whether all paths have at least one latch. If yes, go to step 10; otherwise go to step 6.
10. Stop the process.
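The flow above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the node weights are taken as given inputs (the weight function itself is not reproduced here), a circuit is represented simply as a list of input-to-output paths, and "fan-in or fan-out of the chosen node" is read as every node that shares a path with the chosen node. All function and node names are hypothetical.

```python
def switching_activity(power, cap, vdd, f_clock):
    """Switching activity N = P / (0.5 * V^2 * f_clock * C), from Eq. 3."""
    return power / (0.5 * vdd ** 2 * f_clock * cap)


def glitch_activity(p1, c1, p0, c0, vdd, f_clock):
    """N_glitch = N1 - N0 (unit delay model minus zero delay model)."""
    n1 = switching_activity(p1, c1, vdd, f_clock)
    n0 = switching_activity(p0, c0, vdd, f_clock)
    return n1 - n0


def place_latches(paths, weights):
    """Greedy latch placement following the listed flow (assumed reading).

    paths   -- list of paths, each a list of node names from a primary
               input to a primary output
    weights -- dict mapping node name to its precomputed weight W(x)
    Returns the nodes chosen for latch placement, in order.
    """
    weights = dict(weights)            # work on a copy
    covered = [False] * len(paths)     # does path i already hold a latch?
    latched = []
    # Stop when all weights are zero or every path has a latch.
    while not all(covered) and any(w > 0 for w in weights.values()):
        node = max(weights, key=weights.get)   # node with maximum weight
        latched.append(node)
        for i, path in enumerate(paths):
            if node in path:
                covered[i] = True
                # Zero the weights along this path so that no second
                # latch is placed on it (assumption: includes the node).
                for n in path:
                    if n in weights:
                        weights[n] = 0.0
    return latched


# Hypothetical three-path circuit with made-up weights.
example_paths = [["a", "g1", "g3", "y"],
                 ["b", "g2", "g3", "y"],
                 ["c", "g4", "z"]]
example_weights = {"g1": 2.0, "g2": 1.0, "g3": 5.0, "g4": 3.0}
print(place_latches(example_paths, example_weights))
```

In this made-up example the shared node g3 is picked first because it has the largest weight and covers two paths at once; the remaining path then receives its latch at g4.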

ADIABATIC LOGIC

The schematic of the ECRL inverter and the timing diagram of its I/O signals are shown in Fig. 3. It consists of two cross-coupled PMOS transistors, m1 and m2, and two NMOS transistors in the N-functional blocks for the inverter.


Fig. 3:	Schematic and timing diagram of ECRL inverter


Fig. 4:	Schematic of 2N-2N2P inverter

An AC power supply ‘pwr’ is used so as to recover and reuse the supplied energy. Both out and outbar are generated so that the power clock generator can always drive a constant load capacitance independent of the input signal (Moon and Jeong, 1996). Full output swing is obtained in both the precharge and recovery phases because of the cross-coupled PMOS transistors. However, due to the threshold voltage of the PMOS transistors, the circuit suffers from non-adiabatic loss in both the precharge and recovery phases.

The second adiabatic logic is the 2N-2N2P logic shown in Fig. 4. The advantage of 2N-2N2P logic over ECRL is that it uses a cross-coupled latch of two PMOS transistors and two NMOS transistors (m1-m4) instead of only two PMOS transistors. The N-functional block is in parallel with the NMOS transistors of the latch and thus occupies additional area. This logic reduces the coupling effect that is present in ECRL (Moon and Jeong, 1996).

The PFAL shown in Fig. 5 has two major advantages compared to ECRL. It uses a cross coupled latch of two PMOS transistors and two NMOS transistors (m1-m4) instead of only two PMOS transistors as in ECRL. Here the functional blocks are in parallel with the PMOS transistors and hence their equivalent resistance is smaller.


Fig. 5:	Schematic of PFAL inverter


Fig. 6:	IPFAL inverter

So there is a reduction in power dissipation at higher frequencies when compared to the earlier two logic families.

The energy dissipation can be divided into two parts: adiabatic loss and non-adiabatic loss. Adiabatic loss is generated by the switching resistance of the transistor when current flows through it. This loss is low in all three logic families above because the switching resistance offered by the charging path is low, as the functional block is in parallel with the charging MOS transistor. The non-adiabatic loss occurs due to the threshold voltage of the transistors used in the charging path. During the recovery phase, the energy is only partially recovered from the output load capacitor C_L. When the output node goes below the threshold voltage of the PMOS transistors (|V_tp|) used in the recovery path, the PMOS transistors are switched off, blocking further recovery. During the evaluation phase this unrecovered charge is dissipated as loss when the new inputs become valid. The non-adiabatic energy loss is given by the equation (Fischer et al., 2004):

$$E_{LOSS} = C_L |V_{tp}|^2$$

In adiabatic logic the energy consumed is many times lower than in conventional CMOS. In conventional CMOS the energy consumption is given as (Weste, 2002)

$$E_{CMOS} = C_L V_{dd}^2$$

But in adiabatic logic the energy consumed is determined by the formula (Fischer et al., 2004)

$$E_{ADIABATIC} = \left[\frac{R_{CHARGE} C_L}{T_{CHARGE}}\right] C_L V_{dd}^2$$

where R_CHARGE is the resistance of the charging path and T_CHARGE is the charging time. Energy dissipation in adiabatic logic can be reduced by increasing T_CHARGE. Hence, the efficiency of all adiabatic logic families is prominent only over a certain range of frequencies.
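To see how the three energy expressions compare, the following Python sketch evaluates them for one set of values. The supply voltage (3.3 V) and load capacitance (20 fF) follow the simulation setup mentioned in this paper; the charging resistance, threshold voltage and charging times are hypothetical values assumed only for illustration, as are the function names.

```python
def cmos_energy(c_load, vdd):
    """Conventional CMOS energy: E_CMOS = C_L * Vdd^2."""
    return c_load * vdd ** 2


def adiabatic_energy(r_charge, c_load, t_charge, vdd):
    """Adiabatic energy: E = (R_CHARGE * C_L / T_CHARGE) * C_L * Vdd^2."""
    return (r_charge * c_load / t_charge) * c_load * vdd ** 2


def non_adiabatic_loss(c_load, v_tp):
    """Non-adiabatic loss: E_LOSS = C_L * |V_tp|^2."""
    return c_load * abs(v_tp) ** 2


C_L = 20e-15      # load capacitance (F)
VDD = 3.3         # supply voltage (V)
R_CHARGE = 1e3    # charging path resistance (ohm), hypothetical
V_TP = -0.5       # PMOS threshold voltage (V), hypothetical

e_cmos = cmos_energy(C_L, VDD)
e_loss = non_adiabatic_loss(C_L, V_TP)
print(f"E_CMOS = {e_cmos:.3e} J, E_LOSS = {e_loss:.3e} J")

# A longer charging time gives a proportionally lower adiabatic energy.
for t_charge in (1e-9, 10e-9):    # hypothetical charging times (s)
    e_adiabatic = adiabatic_energy(R_CHARGE, C_L, t_charge, VDD)
    print(f"T_CHARGE = {t_charge:.0e} s: E_ADIABATIC = {e_adiabatic:.3e} J")
```

Under these assumed values the adiabatic term scales inversely with T_CHARGE, while E_LOSS does not depend on T_CHARGE at all, which is why the non-adiabatic loss limits the achievable saving.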

The main disadvantage of the basic quasi-adiabatic circuit is that it suffers from non-adiabatic loss and loses its adiabaticity at frequencies greater than 200 MHz, so full charge recovery is not achieved. Therefore an improved PFAL is proposed that includes an additional charge recovery path in parallel with the cross-coupled PMOS transistors. This path overcomes the inefficient recovery from the load capacitances, avoids the non-adiabatic loss and increases the range of operating frequencies.

To provide complete recovery, a new path is introduced through NMOS transistors. The NMOS transistor can be driven by the logic gate in the next phase, as given in Fischer et al. (2004), so that it conducts fully in the recovery phase. This is possible only if the output of the next gate depends exclusively on the output of the current gate. Therefore it may demand an otherwise unnecessary inverter in the next phase to drive this NMOS transistor. Hence this logic is modified, and the proposed IPFAL inverter is shown in Fig. 6.

When IN is high and INBAR is low, the corresponding input transistors MN5 and MN6 are turned on and off, respectively. Since the power supply is given in the form of a trapezoidal clock, the output follows the power clock. When OUT rises above the threshold voltage V_tn, the transistor MN2 is turned on and OUTBAR is clamped to ground.


Fig. 7:	Four phase power clock

As OUT rises above |V_tp|, the transistor MP1 is also turned on. So during the precharge phase both the NMOS and PMOS transistors help in full charge recovery and hence there is no non-adiabatic loss. During the recovery phase the output ramps down along with the power clock. When OUT goes below V_tn of the NMOS, MN3 turns off and OUTBAR goes to a negative voltage due to the coupling effect between the gate capacitance of MP1 and C_L; this voltage difference helps to turn on MP1 and attain full charge recovery during the recovery phase. Thus MP1 and MN3 help to attain full charge recovery in the recovery phase. Hence the non-adiabatic loss given by E_LOSS = C_L|V_tp|² is completely eliminated.

For implementing large circuits in IPFAL, the technique uses a four-phase power clock. The structure of the four-phase power clocks is shown in Fig. 7. The four phases of the power clock are (a) Wait phase, (b) Precharge and evaluate phase, (c) Hold phase and (d) Recover phase. This clock provides inherent pipelining to the circuit and demonstrates the cascading capability of the adiabatic logic. Each power clock is 90 degrees in advance of the previous power clock. It eliminates the use of latches in the pipelined circuit and hence power dissipation is reduced further.
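The shape of such a clock can be sketched in Python. This is an idealized model under assumptions that are not stated in the paper: the four phases are taken to be of equal length (one quarter of the period each), the wait level is 0 V, the ramps are linear, and "90 degrees in advance" is modeled as a lead of one quarter period per clock index. The function name and parameters are hypothetical.

```python
def power_clock(t, period, vdd, k=0):
    """Idealized trapezoidal power clock number k (k = 0, 1, 2, 3) at time t.

    Assumed phase order within one period:
    wait (0 V) -> precharge/evaluate (ramp up) -> hold (Vdd) -> recover
    (ramp down). Clock k leads clock 0 by k quarter periods (assumption).
    """
    quarter = period / 4.0
    local = (t + k * quarter) % period          # position inside the period
    phase = min(int(local // quarter), 3)       # 0..3, guarded against rounding
    frac = (local - phase * quarter) / quarter  # progress within the phase
    if phase == 0:                              # wait
        return 0.0
    if phase == 1:                              # precharge and evaluate
        return vdd * frac
    if phase == 2:                              # hold
        return vdd
    return vdd * (1.0 - frac)                   # recover


# Sample the four clocks at one instant of a 400 MHz (2.5 ns) power clock.
PERIOD = 2.5e-9
for k in range(4):
    print(k, round(power_clock(0.3e-9, PERIOD, 3.3, k), 3))
```

At any instant the four clocks sit in four different phases, which is what lets each stage evaluate while the stage before it holds its output, without intermediate latches.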

SIMULATION RESULTS

All the adiabatic logic families are implemented in T-Spice in a 3.3 V, 0.18 μm CMOS technology and simulated, and the results are summarized. A comparison using benchmark circuits is made to show that adiabatic logic is advantageous over static retimed circuits.

IPFAL inverter: The IPFAL inverter reduces non-adiabatic loss and operates at frequencies greater than those of PFAL. Its simulated output at 400 MHz is shown in Fig. 8.

The performance improvements of the adiabatic circuits can be inferred from their simulation results summarized in Table 1.

Table 1:	Comparison of power dissipation for inverter

F-Frequency of power clock, *-Circuits are not functional at these frequencies

Table 2:	Power results of CM42 circuit, VDD = 3.3 V and C_load = 20 fF

P_WOP-Power without pipelining the circuit, P_W1P-Power of the conventionally pipelined circuit with one stage pipeline, P_W2P-Power of the conventionally pipelined circuit with two stage pipeline, P_PFAL-Power in PFAL circuits, P_IPFAL-Power in IPFAL circuits, % Gain-Percentage power reduction in IPFAL circuits over conventionally retimed circuits with two stage pipeline

Table 3:	Simulation results of ISCAS Benchmark circuits

N-Maximum number of pipeline stages possible, P_CONV-Power dissipation in conventionally retimed circuits with N stages of pipeline, P_IPFAL-Power dissipation in IPFAL circuits, % Gain-Percentage power reduction in IPFAL circuits over conventionally retimed circuits, F-Frequency at which % Gain is maximum for respective circuits

It is observed that the power dissipation is considerably lower for ECRL at lower frequencies. As the frequency increases, the increase in power dissipation is more pronounced in ECRL than in PFAL and IPFAL. Figure 9 compares the various adiabatic logic families in terms of their power dissipation. The maximum power clock frequency of operation with glitch-free outputs is 200 MHz for ECRL and 2N-2N2P and 400 MHz for PFAL. Even at 200 MHz, IPFAL shows a power reduction of more than 50% over ECRL and 2N-2N2P logic. At 400 MHz, the power dissipation in IPFAL is 15.5% lower than in PFAL. The simulation also shows that IPFAL is functional up to a frequency of 800 MHz, whereas all the other logic families fail to function beyond 400 MHz.


Fig. 8:	Simulation waveform of IPFAL inverter at 400 MHz


Fig. 9:	Power dissipation vs frequency curve for inverter

Comparison of adiabatic circuits with retimed benchmark circuits: Conventional retimed CMOS circuits are compared with adiabatic circuits to show the effectiveness of adiabatic logic. Using the SIS software, the conventional benchmark circuits are represented in BLIF form and are retimed using the retiming algorithm. The retiming principle places the latches in the conventional pipelined circuit based mainly on two factors:

•	The weight assigned to each node in the circuit
•	The capacitance, power and switching probability obtained at each node using the SIS software

This algorithm is repeated on the pipelined structure until the weights of all the nodes become zero and no more latches need to be placed in the circuit. Thus the algorithm aims at reducing both the power and the delay, and hence the power-delay product is minimized. This algorithm is compared with adiabatic logic, which has inherent pipelining through its four-phase power clocks. Since there is no need to place any latches, the circuit and its operation are simpler, and the implementation with IPFAL also recovers the full stored charge. The area and complexity of the circuit are also lower than those of conventional retimed CMOS circuits. Therefore, the power is reduced by a large amount in the case of adiabatic logic.

Table 2 gives the power dissipation for the benchmark CM42 when simulated without a pipeline, pipelined with different numbers of stages and implemented with adiabatic logic. In Table 2 the column ‘P_WOP’ shows the power dissipation of the static CMOS circuit at various frequencies. The power dissipation at different frequencies for the retimed circuit with different numbers of pipeline stages is also given in the table. For this benchmark circuit the number of pipeline stages possible is two. PFAL and IPFAL implementations of this circuit do not require intermittent latches, as adiabatic logic utilizes the inherent pipelining provided by the four-phase trapezoidal clock. Subsequent columns in Table 2 give the power dissipation and the improvement of adiabatic logic over conventional retimed circuits. The maximum saving in power dissipation for this benchmark circuit is 86%, which occurs at 200 MHz.

Similar simulations are carried out for other benchmark circuits and their performance improvements are compared in Table 3. In the table, ‘N’ indicates the maximum number of pipeline stages possible for the circuit. The P_CONV and P_IPFAL columns give the power dissipation of the conventional retimed circuit and of the circuit implemented with IPFAL respectively, and the corresponding ‘F’ column shows the power clock frequency at which the IPFAL circuit gives the maximum % Gain. From Tables 2 and 3 it is observed that the power saving in IPFAL circuits is up to 89% when compared to conventional retimed circuits.
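The % Gain figure is described in the table legends as the percentage power reduction of the IPFAL circuit over the conventionally retimed circuit. A minimal Python sketch of that calculation follows, assuming the usual definition (P_CONV - P_IPFAL) / P_CONV × 100; the function name and the sample power values are hypothetical and are not taken from the tables.

```python
def percent_gain(p_conv, p_ipfal):
    """Percentage power reduction of IPFAL over the retimed circuit.

    Assumed definition: (P_CONV - P_IPFAL) / P_CONV * 100.
    """
    if p_conv <= 0:
        raise ValueError("p_conv must be positive")
    return (p_conv - p_ipfal) / p_conv * 100.0


# Hypothetical power values in microwatts, for illustration only.
print(percent_gain(p_conv=100.0, p_ipfal=11.0))
```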

CONCLUSIONS

Basic adiabatic logic families suffer from the disadvantage of incurring non-adiabatic losses. The proposed IPFAL technique overcomes this by completely recovering the stored charge from the load during the recovery phase of the power clock. Conventional retimed circuits have better performance, but at the cost of increased power dissipation because the circuit is pipelined with the help of intermittent latches. Since adiabatic logic utilizes the inherent pipelining of the four-phase power clock, it does not require intermittent latches for circuit implementation. Hence, circuits in adiabatic logic perform better with reduced power dissipation. Simulations were carried out for ISCAS benchmark circuits implemented with the proposed IPFAL in 3.3 V, 0.25 μm CMOS technology. Results show that the circuits in the proposed adiabatic logic have a reduction in power dissipation of up to 89% when compared to the conventional retimed circuits.

REFERENCES

Amirante, E., A. Bargagli-Stoffi, J. Fischer, G. Iannaccone and D. Schmitt-Landsiedel, 2001. Variations of the power dissipation in adiabatic logic gates. Proceedings of the 11th International Workshop on Power and Timing Model, Optimization and Simulation, September 26-28, 2001, Yverdon-les-Bains, Switzerland, pp: 1-10.

Eckl, K. and C. Legl, 1998. A new retiming approach for sequential circuits with multiple flip-flop classes. Technical Report, Institute of Electronic Design automation, Technical University of Munich, TUM-LRE., pp: 98-100.

Even, G., I.Y. Spillinger and L. Stok, 1996. Retiming revisited and reversed. IEEE Trans. Comput. Aided Design Integrated Circuits Syst., 15: 348-357.

Favalli, M. and L. Benini, 1995. Analysis of glitch power dissipation in CMOS ICs. Proceedings of the International Symposium on Low Power Design, April 23-26, 1995, Dana Point, California, pp: 123-128.

Fischer, J., E. Amirante, A. Bargagli-Stoffi and D. Schmitt-Landsiedel, 2004. Improving the Positive feedback adiabatic logic family. Adv. Radio Sci., 2: 221-225.

Leiserson, C.E. and J.B. Saxe, 1991. Retiming Synchronous circuitry. Algorithmica, 6: 5-35.

Malik, S., E. Sentovich, R.K. Brayton and A. Sangiovanni-Vincentelli, 1990. Retiming and resynthesis: Optimization of sequential networks with combinational techniques. Proceedings of the Hawaii International Conference on System Sciences, January 2-5, 1990, Hawaii, pp: 397-406.

Monteria, J., S. Devados and A. Ghosh, 1993. Retiming sequential circuits for low power. Proceedings of the International Conference on Computer Aided Design, November 7-11, 1993, Santa Clara, CA., USA., pp: 398-402.

Moon, Y. and D.K. Jeong, 1996. An efficient charge recovery logic circuit. IEEE J. Solid State Circuits, 31: 514-522.

Oklobdzija, V.G., D. Maksimovic and F. Lin, 1997. Pass-transistor adiabatic logic using single power-clock supply. IEEE Trans. Circuits Syst. II: Analog Digital Signal Process., 44: 842-846.

Raghunathan, A., S. Dey and N.K. Jha, 1999. Register transfer level power optimization with emphasis on glitch analysis and reduction. IEEE Trans. Comput. Aided Design Integrated Circuits Syst., 18: 1114-1131.

Weste, H.E., 2002. Principles of CMOS VLSI Design. Pearson Education Pvt. Ltd., Singapore
