# Reducing Hardware Complexity of Wallace Multiplier Using High Order Compressors Based on CNTFET

# Saeed Sam Daliri<sup>1\*</sup>, Javad Javidan<sup>2</sup> and Ali Bozorgmehr<sup>3</sup>

<sup>1</sup>Technical Engineering Department, University of Mohaghegh Ardabili, Ardabil, Iran <sup>2</sup>Faculty of Technical Engineering Department, University of Mohaghegh Ardabili, Ardabil, Iran <sup>3</sup>Nano technology and Quantum Computing Lab, Shahid Beheshti University, GC, Tehran, Iran

(\*) Corresponding author: s.samdaliri@chmail.ir

(Received: 05 November 2015 and Accepted: 04 August 2016)

## Abstract

Multipliers are important components in many systems such as digital filters, digital processors and data encryption. Improving the speed and area of multipliers has an impact on the performance of the larger arithmetic circuits of which they are a part. The Wallace algorithm is one of the best-known architectures; it uses a tree of half adders and full adders to increase the speed and reduce the area of multipliers. Compressors are adders that can be used to perform the partial product addition in a Wallace tree. In addition, emerging technologies such as Carbon Nanotube Field Effect Transistors (CNTFETs) enable faster and smaller circuit implementations. This paper presents a new method to simplify the Wallace tree design using high order compressors based on carbon nanotube technology. These compressors use a high-speed full adder cell based on CNTFETs for low-voltage and high-frequency applications. The proposed method reduces the number of gates and transistors, the critical path length and the complexity of the Wallace tree hardware.

Keywords: Carbon Nanotube Field Effect Transistor, Compressor, Full adder, Multiplier, Wallace Tree.

# 1. INTRODUCTION

Multiplication is one of the main arithmetic operations and is used as a basic computational unit in microprocessor systems such as crypto processors and in Digital Signal Processing (DSP) algorithms [1, 2]. Multipliers are used in DSP processors to implement filtering algorithms, convolution and the Fast Fourier Transform (FFT). They lie on the critical path of DSP circuits, so the efficiency of multipliers has a significant impact on the performance of these algorithms. On the other hand, multipliers consume considerable power and occupy a large area [1, 2]. The goal is therefore to reduce the hardware complexity of the multiplier circuit. The multiplication process is carried out in three phases: partial product (PP) generation, partial product reduction and final addition. Compressors are important circuits that are employed to speed up the partial product reduction tree and to decrease the power dissipation of multipliers [3]. The adder cell is one of the major elements in the design of arithmetic circuits, specifically as the most basic element in the design of compressors.

Many structures have been proposed for multiplication. One of them is the Wallace architecture, which is composed of several layers of full adders and half adders that reduce the number of partial product additions on the critical path of the multiplier. The adders of each layer act simultaneously, while those in separate layers work sequentially. The partial products are grouped into sets of three and are summed using full adders. Half adders are used when there are two partial products in a particular column. The resulting sum and carry signals of each column pass to the next stage. This reduction continues in each column until all the partial products are summed. The resulting sum of the last stage is added by a carry propagation adder [4, 5]. The goal is to reduce the number of stages of the Wallace tree, which accelerates the partial product addition and improves the regularity of the whole circuit. This paper presents a new method for reducing the stages of the Wallace tree by using compressors of higher order. These compressors use a high-speed full adder cell based on the Carbon Nanotube Field Effect Transistor (CNTFET) for low-voltage and high-frequency applications, which was presented in [6]. The presented method leads to higher speed and a reduction in the area complexity of multipliers.

As shown in Figure 1, the transistors used are CNTFETs. The CNTFET is considered the most promising alternative to conventional silicon-based transistors in the near future because of its inherent similarities with them. In recent years, CNTFET-based circuits have been presented in the literature [7-13]. Carbon Nanotubes (CNTs) are cylindrical nanostructures made of rolled graphene sheets. CNTs are called single-walled when they have one cylinder and multi-walled when they have more than one cylinder. A Single-Walled Carbon Nanotube (SWCNT) [14,15] can be metallic or semiconducting according to its chiral numbers $(n_1, n_2)$. If $n_1-n_2\neq 3k$ $(k\in \mathbb{Z})$, the SWCNT is semiconducting; otherwise it is metallic. One or more semiconducting SWCNTs can be used as the channel of a transistor [13,16,17].

The main property of a CNTFET that makes it suitable for designing multiple-Vt circuits is the capability of setting the desired threshold voltage by changing the diameter of the nanotube. The threshold voltage of a CNTFET is estimated as the half band gap and is an inverse function of the carbon nanotube diameter ($D_{CNT}$). It is obtained by the following equation [16, 17]:

$$V_{th} \approx \frac{E_g}{2e} = \frac{\sqrt{3}}{3} \frac{a\,V_{\pi}}{e\,D_{CNT}} \tag{1}$$

where $a \approx 2.49 \text{ Å}$ indicates the carbon to carbon atom distance, $V_{\pi} \approx 3.033 \text{ eV}$ is the carbon $\pi$-$\pi$ bond energy in the tight-binding model, $e$ signifies the unit electron charge and $D_{CNT}$ is calculated by the following equation [14, 15]:

$$D_{CNT} = \frac{a\sqrt{n_1^2 + n_2^2 + n_1 n_2}}{\pi} \tag{2}$$

In Figure 1, each transistor is marked by two numbers, the diameter of CNTs under the gate terminal (D) and the number of CNTs (Tube) under the gate.
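Equations (1) and (2) can be evaluated numerically to see how the chiral numbers set the threshold voltage. The following is a minimal sketch, not part of the original design flow: the function names and the example chiralities are hypothetical and chosen only for illustration, while the constants $a$ and $V_{\pi}$ are the values quoted above.

```python
import math

A_NM = 0.249      # a ~ 2.49 Angstrom (value quoted in the text), expressed in nm
V_PI_EV = 3.033   # V_pi ~ 3.033 eV (value quoted in the text)


def is_semiconducting(n1: int, n2: int) -> bool:
    """An SWCNT is semiconducting when n1 - n2 is not a multiple of 3."""
    return (n1 - n2) % 3 != 0


def cnt_diameter_nm(n1: int, n2: int) -> float:
    """Eq. (2): nanotube diameter in nm."""
    return A_NM * math.sqrt(n1 * n1 + n2 * n2 + n1 * n2) / math.pi


def threshold_voltage_v(n1: int, n2: int) -> float:
    """Eq. (1): threshold voltage in volts.

    V_pi is given in eV, so dividing by the electron charge e
    turns the energy directly into a voltage in volts.
    """
    return (math.sqrt(3) / 3) * A_NM * V_PI_EV / cnt_diameter_nm(n1, n2)


if __name__ == "__main__":
    # Hypothetical chiralities, used only as examples.
    for n1, n2 in [(10, 0), (13, 0), (19, 0)]:
        print(
            (n1, n2),
            "semiconducting" if is_semiconducting(n1, n2) else "metallic",
            f"D = {cnt_diameter_nm(n1, n2):.3f} nm",
            f"Vth = {threshold_voltage_v(n1, n2):.3f} V",
        )
```

For a (19, 0) tube, for instance, this gives a diameter of about 1.51 nm and a threshold voltage of about 0.29 V. Tubes with smaller diameters yield higher thresholds, in line with the inverse dependence in Eq. (1).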



Figure 1. CNTFET-based full adder cell [6]

# 2. A REVIEW ON COMPRESSORS

An m:n compressor is a combinational circuit that takes m equally weighted input bits and generates n outputs with different bit positions (n < m). The full adder is the simplest compressor; it has 3 inputs and 2 outputs and is known as the 3:2 compressor.

It is used for designing high order compressors. For example, the design of the 4:2 compressor is shown in Figure 2 [18]. This design is composed of two cascaded 3:2 compressors and has four equally weighted primary inputs and a carry input from the previous step. All the inputs have the weight of 2<sup>i</sup>. It has two outputs. The sum output has the same weight as the inputs. The carry output has the weight of 2<sup>i+1</sup> and is fed to the next stage. The critical path of this compressor is marked by the dotted line.
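The behaviour of a 4:2 compressor built from two cascaded full adders can be checked exhaustively. The sketch below follows the usual textbook wiring, which is an assumption here and may differ in detail from Figure 2: the first full adder also produces an intermediate carry, exposed in the code as a third return value `cout`, that is meant to feed the carry input of the neighbouring compressor. All function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: returns (sum, carry) of three equally weighted bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """4:2 compressor from two cascaded full adders (assumed wiring).

    Returns (sum, carry, cout): sum has weight 2^i, while carry and
    cout both have weight 2^(i+1). cout does not depend on cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def check_4_2() -> bool:
    """Verify x1+x2+x3+x4+cin == sum + 2*(carry + cout) for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        total, carry, cout = compressor_4_2(*bits)
        if sum(bits) != total + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print("4:2 compressor preserves the input count:", check_4_2())
```

Because `cout` is computed from the primary inputs only, a carry entering one compressor does not ripple further along the row in this wiring.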

Although the use of 3:2 and 4:2 compressors seems ideal for creating a regular Wallace tree structure with low complexity [18, 19], high order compressors are effective in reducing the number of partial product stages and the power consumption of multipliers [18]. Different compressor structures have been implemented in the literature [3, 21, 22]. The implementations of 5:2, 6:2 and 7:2 compressors are shown in Figure 3. The critical path of each compressor is marked by the dotted line. In all these designs, two carry input signals are taken from the previous stage and two carry output signals are passed to the next stage.

In parallel multipliers, all the inputs arrive at the same time. For example, suppose that two 6:2 compressors of Figure 3 are cascaded as shown in Figure 4. The input signals are applied to these compressors at the same time. The second compressor, located in the second stage, must wait until the output carry of the previous stage has been calculated, so the whole circuit has a critical path delay of $3t_{FA} + t_{Cout}$, marked by the dotted line.

The critical path delay of m:2 compressors can be further reduced by using m:3 compressors. The structures of the 6:3 and 7:3 compressors are depicted in Figure 5. In these designs, no carry input is taken from the previous stage and no carry output is passed to the next stage. It should be noted that m:3 compressors have a lower gate count and consequently occupy less area than m:2 compressors. For further investigation, the implementations of the 8:3, 9:3 and 10:3 compressors are depicted in Figure 6.
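An m:3 compressor with m up to 7 behaves as a counter: its three outputs, with weights 2<sup>i</sup>, 2<sup>i+1</sup> and 2<sup>i+2</sup>, encode the number of ones among the inputs, so no carry input or carry output is required. The sketch below builds a 7:3 compressor from four full adders. This is one common construction and an assumption here; it is not necessarily the wiring of Figure 5, and the function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: returns (sum, carry) of three equally weighted bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_7_3(bits) -> tuple[int, int, int]:
    """7:3 compressor from four full adders (assumed wiring).

    Returns (out0, out1, out2) with weights 1, 2 and 4 relative to
    the common weight of the seven inputs.
    """
    x1, x2, x3, x4, x5, x6, x7 = bits
    s_a, c_a = full_adder(x1, x2, x3)
    s_b, c_b = full_adder(x4, x5, x6)
    out0, c_c = full_adder(s_a, s_b, x7)    # collects the weight-1 bits
    out1, out2 = full_adder(c_a, c_b, c_c)  # collects the weight-2 carries
    return out0, out1, out2


def compressor_6_3(bits) -> tuple[int, int, int]:
    """6:3 compressor obtained by tying the seventh input to 0."""
    return compressor_7_3(tuple(bits) + (0,))


def check_counter(compressor, width: int) -> bool:
    """Exhaustively verify that the outputs encode the number of ones."""
    for bits in product((0, 1), repeat=width):
        out0, out1, out2 = compressor(bits)
        if sum(bits) != out0 + 2 * out1 + 4 * out2:
            return False
    return True


if __name__ == "__main__":
    print("7:3 correct:", check_counter(compressor_7_3, 7))
    print("6:3 correct:", check_counter(compressor_6_3, 6))
```

Since seven ones still fit in three bits, the compressor in this sketch exchanges no carries with its neighbours, which is why cascading such cells does not create the carry-dependent path discussed for the m:2 designs.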



*Figure 2. Implementation of 4-2 Compressor.* 



Figure 3. Implementation of (a) 5:2 Compressor, (b) 6:2 Compressor, (c) 7:2 Compressor.



Figure 4. Cascade connection of two 6:2 Compressors.



Figure 5. Implementation of (a) 6:3 Compressor, (b) 7:3 Compressor.



Figure 6. Implementation of (a) 8:3 Compressor, (b) 9:3 Compressor, (c) 10:3 Compressor.

# 3. THE PROPOSED WALLACE MULTIPLIER BY USING M:3 COMPRESSORS

The proposed architecture of the $8 \times 8$-bit multiplier cell is presented in Figure 7. It uses m:3 compressors. In this architecture, the entire process is completed in three steps. The partial products are marked with different signs. This structure uses 4:3, 5:3, 6:3, 7:3 and 8:3 compressors for partial product reduction. The 4:3, 5:3, 6:3 and 7:3 compressors have no carry output passed to the next stage. The 8:3 compressor, however, generates a single output carry signal that is transferred to the stage of 2i+2 and yields a 4:3 compressor in that stage, but this has no effect on the critical path. Finally, the Wallace tree is reduced to two rows, and a carry lookahead adder is used to sum these rows. The CNTFET-based full adder cell presented in [6], named CNTFA, is used as the building block of the m:3 compressors.

Correspondingly, the same approach is applied to produce the 16×16-bit multiplier cell. In this multiplier, as shown in Figure 8, the longest column at the first stage has 10 partial products, so the highest order compressor used is 10:3. This structure also uses 4:3, 5:3, 6:3, 7:3 and 8:3 compressors for partial product reduction. The critical path of each multiplier is shown as a dotted line.

# 4. ANALYSIS OF AREA OPTIMIZED WALLACE TREE MULTIPLIER

As mentioned in the previous section, the proposed Wallace tree architecture, named WMCNTFA, uses CNTFA as the building block of its compressors. The Wallace tree architecture in [23], named WMEECFA, uses the EECFA of [24] as its building block.

The transistor level simulations of CNTFA and EECFA are carried out using Synopsys HSPICE simulator with the standard 32 nm CNTFET technology [14, 15]. The simulations were based on 0.65 V supply voltage at room temperature with 1 GHz operating frequency. The transition points of the transistors are adjusted by setting suitable threshold voltages for CNTFETs specified by the diameter of the nanotubes.

Gate count, transistor count, delay, average power dissipation, energy consumption or power-delay product (PDP) and total area of each design are shown in Table 1, where the best results are marked in boldface. The total width of each full adder cell is obtained from Eq. 3 and Eq. 4 as a reasonable criterion of area efficiency [25].

$$W_{gate} \approx \mathrm{Min}(W_{min}, \#Tube \times Pitch) \tag{3}$$

$$\text{Total Cell Width} = \sum_{i} \text{Width}(T_{i}) \tag{4}$$

where $W_{gate}$ is the width of the gate of a CNTFET, *Pitch* is the distance between the centers of two adjacent CNTs and $\#Tube$ is the number of CNTs under the gate.

It can be inferred from Table 1 that CNTFA performs better than EECFA in terms of gate count, transistor count, delay, PDP and total area.

Three multiplier cells, with 8 × 8, 16 × 16 and 32 × 32 bits, have been considered and compared with the multipliers presented in [23] in terms of gate and transistor count. Moreover, as shown in Table 2, WMCNTFA has fewer stages and fewer gates and transistors in total than WMEECFA. It can be concluded from the results in Table 1 and Table 2 that the proposed method reduces the area complexity and critical path length of multipliers.
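For reference, the number of stages of a tree built only from full and half adders can be estimated with the textbook row-count recurrence, in which every group of three rows is reduced to two rows per stage. The sketch below is a simplified model under that assumption, with a hypothetical function name; it does not reproduce the scheduling of [23] or of the proposed m:3 design. Under this model, 8, 16 and 32 rows need 4, 6 and 8 stages, which coincides with the stage counts listed for WMEECFA in Table 2.

```python
def wallace_stages(rows: int) -> int:
    """Stages needed to reduce `rows` partial product rows to two rows
    when each stage turns every group of three rows into two
    (3:2 compression) and passes leftover rows through unchanged."""
    stages = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(f"{n} x {n}: {wallace_stages(n)} stages")
```

The stage count grows with the logarithm of the operand width in this model, which is why replacing 3:2 cells by higher order compressors can shorten the tree.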



*Figure 7.* The proposed $8 \times 8$-bit multiplier cell using m:3 compressors.



*Figure 8.* The proposed  $16 \times 16$ -bit multiplier cell using m:3 compressors.

**Table 1.** Comparison of full adder designs.

| Full Adder | Gate Count | Transistor Count | Delay (ps) | Average Power (nW) | Energy Consumption (aJ) | Area (nm<sup>2</sup>) |
|------------|------------|------------------|------------|--------------------|-------------------------|------------|
| CNTFA [6]  | **2**      | **12**           | **5.797**  | 5006               | **29.02**               | **3840**   |
| EECFA [24] | 6          | 26               | 22.074     | **3190**           | 70.42                   | 108800     |
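The energy column of Table 1 is the power-delay product, so it can be cross-checked from the other two columns. The short sketch below assumes that the delays are 5.797 ps for CNTFA and 22.074 ps for EECFA, which is the reading under which the tabulated energies of 29.02 aJ and 70.42 aJ are reproduced; the names used in the code are hypothetical.

```python
PS = 1e-12   # seconds per picosecond
NW = 1e-9    # watts per nanowatt
AJ = 1e-18   # joules per attojoule

# (delay in ps, average power in nW); delays read as 5.797 ps and
# 22.074 ps, an assumption about the decimal placement in Table 1.
CELLS = {
    "CNTFA": (5.797, 5006.0),
    "EECFA": (22.074, 3190.0),
}


def pdp_aj(delay_ps: float, power_nw: float) -> float:
    """Power-delay product in attojoules."""
    return (delay_ps * PS) * (power_nw * NW) / AJ


if __name__ == "__main__":
    for name, (delay_ps, power_nw) in CELLS.items():
        print(f"{name}: PDP = {pdp_aj(delay_ps, power_nw):.2f} aJ")
```

With these values CNTFA draws more average power than EECFA, yet its much shorter delay gives it the lower energy per operation.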

*Table 2.* Comparison of the reduction in complexity of multipliers.

| Number of bits         | 8       | 8       | 16      | 16      | 32      | 32      |
|------------------------|---------|---------|---------|---------|---------|---------|
| Design                 | WMCNTFA | WMEECFA | WMCNTFA | WMEECFA | WMCNTFA | WMEECFA |
| Number of stages       | 3       | 4       | 3       | 6       | 5       | 8       |
| Total gate count       | 114     | 240     | 536     | 1824    | 2314    | 5488    |
| Total transistor count | 684     | 1053    | 3216    | 5343    | 13884   | 23881   |
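The relative savings implied by Table 2 can be computed directly from the tabulated counts. The sketch below uses only the numbers in the table; the dictionary layout and function name are hypothetical. It also checks that every WMCNTFA entry has six transistors per gate, which matches the 12 transistors and 2 gates of the CNTFA cell in Table 1.

```python
# (WMCNTFA, WMEECFA) counts taken from Table 2, keyed by operand width.
TABLE2 = {
    8:  {"gates": (114, 240),   "transistors": (684, 1053)},
    16: {"gates": (536, 1824),  "transistors": (3216, 5343)},
    32: {"gates": (2314, 5488), "transistors": (13884, 23881)},
}


def reduction_percent(proposed: int, reference: int) -> float:
    """Percentage by which `proposed` is smaller than `reference`."""
    return 100.0 * (1.0 - proposed / reference)


if __name__ == "__main__":
    for bits, row in TABLE2.items():
        gates = reduction_percent(*row["gates"])
        transistors = reduction_percent(*row["transistors"])
        per_gate = row["transistors"][0] / row["gates"][0]
        print(
            f"{bits}-bit: gates -{gates:.1f}%, "
            f"transistors -{transistors:.1f}%, "
            f"WMCNTFA transistors per gate = {per_gate:.0f}"
        )
```

For the 8-bit case this gives a gate count reduction of 52.5% and a transistor count reduction of about 35%.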

## 5. CONCLUSION

This paper presents a new method to simplify the Wallace tree design using high order compressors based on carbon nanotube technology. These compressors use a high-speed full adder cell based on CNTFETs for low-voltage and high-frequency applications. The proposed method reduces the number of gates and transistors, the critical path length and the complexity of the Wallace tree hardware.

## **ACKNOWLEDGEMENT**

The authors would like to thank Nano technology and Quantum Computing Lab, Shahid Beheshti University.

# REFERENCES

- 1. Khatibzadeh, A., Raahemifar, K. (2005). "A novel pipelined multiplier for high-speed DSP applications", *International Symposium on Signals, Circuits and Systems*, 2005. ISSCS 2005., IEEE.
- 2. Al-Khaleel, O., Chris, P., Frank, W., Kiamal, P. (2006). "A large scale adaptable multiplier for cryptographic applications", *First NASA/ESA Conference on Adaptive Hardware and Systems (AHS'06)*, IEEE.
- 3. Rouholamini, M., Mahnoush., Omid, K., Amir-Pasha, M., Somaye, J., Keivan., N. (2007). "A new design for 7: 2 compressors", 2007 IEEE/ACS International Conference on Computer Systems and Applications, IEEE.
- 4. Oklobdzija, V. G., David, V., Simon, S. Liu (1996). "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach", *IEEE Transactions on Computers* 45(3): 294-306.
- 5. Sureka, N., Porselvi, R., Kumuthapriya, K. (2013). "An efficient high speed Wallace tree Multiplier", *Information Communication and Embedded Systems (ICICES)*, 2013 International Conference on, IEEE.
- 6. Samdaliri, S., Javidan, J., Sam, M., Navi, K. (2015). "Design of a new high-speed and high-performance Full Adder Cell Based on Carbon Nanotube FETs", *Quantum Matter*, 5: 524-528.
- 7. Moaiyeri, M. H., Faghih Mirzaee, R., Navi, K., Momeni, A. (2012). "Design and analysis of a high-performance CNFET-based Full Adder", *International Journal of Electronics*, 99(1): 113-130.
- 8. Reshadinezhad, M. R., Moaiyeri, M. H., Navi, K. (2012). "An energy-efficient full adder cell using CNFET technology", *IEICE transactions on electronics* 95(4): 744-751.
- 9. Navi, K., Sharifi Rad, R., Moaiyeri, M. H., Momeni, A. (2010). "A low-voltage and energy-efficient full adder cell based on carbon nanotube technology", *Nano-Micro Letters*, 2(2): 114-120.
- 10. Moaiyeri, M. H., Navi, K., Hashemipour, O. (2012). "Design and evaluation of CNFET-based quaternary circuits", *Circuits, Systems, and Signal Processing*, 31(5): 1631-1652.

- 11. Sajedi, H. H., Sam, M., Navi, K., Jalali, A. (2015). "High Performance and Low Power Half-Adder Cells in Carbon Nanotube Field Effect Transistor Technology", *Journal of Computational and Theoretical Nanoscience*, **12**(8): 1756-1760.
- 12. Sam, M., Navi, K., Moaiyeri, M.H. (2016). "A New 5-Input Molecular Exclusive-OR Gate Based on Benzene Ring and Carbon Nanotube FETs", *Quantum Matter*, **5**(1): 99-102.
- 13. Lin, S., Yong-Bin, K., Lombardi, F. (2011). "CNTFET-based design of ternary logic gates and arithmetic circuits", *IEEE transactions on nanotechnology*, 10(2): 217-225.
- 14. Farhadian, N. (2013). "Investigating the Ibuprofen Chiral Forms Interactions with Single Wall Carbon Nanotube", *International Journal of Nanoscience and Nanotechnology*, 9(3): 127-138.
- 15. Farhadian, N., Shariaty-Niassar, M. (2009). "Molecular Dynamics Simulation of Water in Single Wall Carbon Nanotube", *International Journal of Nanoscience and Nanotechnology*, 5(1): 53-62.
- 16. Deng, J., Wong, H.-S. P. (2007). "A compact SPICE model for carbon-nanotube field-effect transistors including nonidealities and its application—Part I: Model of the intrinsic channel region", *IEEE Transactions on Electron Devices*, 54(12): 3186-3194.
- 17. Deng, J., Wong, H.-S. P. (2007). "A compact SPICE model for carbon-nanotube field-effect transistors including nonidealities and its application—Part II: Full device model and circuit performance benchmarking", *IEEE Transactions on Electron Devices*, 54(12): 3195-3205.
- 18. Chang, C.-H., Gu, J., Zhang, M. (2004). "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits", *IEEE Transactions on Circuits and Systems, I: Regular Papers*, 51(10): 1985-1997.
- 19. Radhakrishnan, D., Preethy, A. (2000). "Low power CMOS pass logic 4-2 compressor for high-speed multiplication", *PROCEEDINGS OF THE IEEE MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, LIDA RAY TECHNOLOGIES INC.*
- 20. Rebala, N. R., krishna Tirumala, B. (2014). "High speed multipliers using nested higher order compressors", 2014 International Conference on Computer Communication and Informatics.
- 21. Koren, I. (2002). "Computer arithmetic algorithms", Universities Press.
- 22. Akoushideh, A., Ardalan, Najafi., Babak, Mazloom-nezhad Maybodi. (2012). "Modified Architecture for 27: 2 Compressor", *Canadian Journal on Electrical and Electronics Engineering*.
- 23. Khan, S., Sandeep, Kakde., Yogesh, Suryawanshi. (2013). "VLSI implementation of reduced complexity wallace multiplier using energy efficient CMOS full adder", *Computational Intelligence and Computing Research (ICCIC)*, 2013 IEEE International Conference on, IEEE.
- 24. Aguirre-Hernandez, M., Linares-Aranda, M. (2011). "CMOS full-adders for energy-efficient arithmetic applications", *IEEE transactions on very large scale integration (VLSI) systems*, 19(4): 718-721.
- 25. Mirzaee, R. F., Keivan, N. (2014). "Optimized adder cells for ternary ripple-carry addition", *IEICE TRANSACTIONS on Information and Systems*, 97(9): 2312-2319.