
ASIC design of low power-delay product carry pre-computation based multiplier

February 2019
Indonesian Journal of Electrical Engineering and Computer Science 13(2):845-852


DOI:10.11591/ijeecs.v13.i2.pp845-852

License
CC BY-NC 4.0

Authors:

Sundaaresaan Chi

Manipal Academy of Higher Education

Keerthana Prasad

Manipal Academy of Higher Education

© 2019 Institute of Advanced Engineering and Science. All rights reserved.


Indonesian Journal of Electrical Engineering and Computer Science

Vol. 13, No. 2, February 2019, pp. 845~852

ISSN: 2502-4752, DOI: 10.11591/ijeecs.v13.i2.pp845-852

Journal homepage: http://iaescore.com/journals/index.php/ijeecs

ASIC design of low power-delay product carry pre-computation based multiplier

Chaitanya CVS¹, Sundaresan C², P R Venkateswaran³, Keerthana Prasad⁴

¹,²,⁴ School of Information Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India.

³ Bharat Heavy Electricals Limited, Tiruchirappalli, Tamil Nadu, India.

Article Info

Article history:
Received Oct 6, 2018
Revised Dec 7, 2018
Accepted Dec 21, 2018

ABSTRACT

High-speed and efficient multipliers are essential components of today’s computational circuits, such as digital signal processors, cryptographic algorithms and high-performance processors. Almost all processing units contain hardware multipliers based on an algorithm that fits the application requirements. Advances in VLSI technology over the past several years have increased the need for high-speed multipliers and compelled designers to make trade-offs among speed, power consumption and area. Among the various methods of multiplication, Vedic multipliers are gaining ground because of their expected improvement in performance. This paper presents a novel multiplier design for high-speed VLSI applications based on the Urdhva-Tiryagbhyam sutra of Vedic multiplication. The proposed architecture was modeled in Verilog HDL, simulated using Cadence NCSIM and synthesized using Cadence RTL Compiler with a 65nm TSMC library. The proposed multiplier architecture is compared with existing multipliers, and the results show significant improvements in speed and power dissipation.

Keywords:

Binary Multiplication

Carry Pre-Computation

Multiplier Architecture

Operand Decomposition

Vedic Multiplier

Corresponding Author:

Chaitanya CVS,

School of Information Sciences,

Manipal Academy of Higher Education,

Manipal 576104, Karnataka, India.

Email: chaitanya.cvs@manipal.edu

1. INTRODUCTION

Processors are an important part of integrated circuits (ICs). Thanks to the tremendous growth in integration density in recent times, a large number of functions can be packed into a single IC, and as the number of functions increases, so does the need for computation. New process technologies, shrinking feature sizes and modern CAD tools make it possible to develop complex integrated circuits for applications such as digital signal processing [1,2], mobile computing and communications, multimedia, and scientific computing. The speed and efficiency of the processor in such an IC are crucial for meeting the requirements of the applications the IC supports, and they in turn depend on the arithmetic logic unit [3], which is considered the main computational unit of the processor.

Moreover, multiplier units [4] are among the most important hardware structures in a complex arithmetic unit. They operate on operands of various data types, for example when calculating a running sum of products. Because multiplication is a crucial arithmetic operation in processors [5] and digital computer systems, multipliers are a core building block of many algorithms in a wide variety of computing applications. Although multipliers are the main arithmetic components used for processing scientific data, their excessive power consumption and delay have attracted attention from the research community. Usually, multiple arithmetic cores working in parallel are used to process large amounts of data with relatively low power and delay.

Various algorithms have been proposed for the hardware implementation of multipliers. Shift-and-add is a common algorithm for designing multipliers [6]. In parallel multipliers, an important parameter that determines performance is the number of partial products that must be added. The Modified Booth algorithm [7], for example, reduces the number of partial products, which in turn increases the performance of the multiplier. The Wallace tree algorithm reduces the number of addition stages and thereby improves the speed of multiplication. Some efficient multiplier architectures combine the Modified Booth and Wallace tree algorithms. However, increasing parallelism increases the number of shifts between the intermediate sums and the partial products, which results in reduced speed, increased power consumption and, because of the irregular structure, increased area. Therefore, low-power and compact multiplier architectures are in some cases implemented using serial multiplication algorithms. Serial multipliers [8] offer better power consumption and area at the cost of higher delay. Depending on the application, either parallel or serial multipliers are selected.
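To make the partial-product argument concrete, here is a minimal Python sketch; the function names are illustrative and not taken from the paper. Shift-and-add generates one partial product per multiplier bit, whereas radix-4 Modified Booth recoding, shown here for two's-complement operands of even width, needs only half as many, each being 0, ±x or ±2x. Tree reduction such as the Wallace scheme is not modeled.

```python
def shift_add_multiply(a, b, n=8):
    """Unsigned shift-and-add: one (possibly zero) partial product per bit of b."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:           # partial product a << i is added only when b_i = 1
            product += a << i
    return product


def booth_radix4_digits(y, n=8):
    """Radix-4 Modified Booth recoding of an n-bit two's-complement y (n even).

    Digit d_k = y[2k-1] + y[2k] - 2*y[2k+1], with y[-1] = 0, lies in {-2..2},
    so only n/2 partial products (0, +-x or +-2x, shifted) are needed.
    """
    digits = []
    prev = 0                       # implicit bit y[-1]
    for i in range(0, n, 2):
        low, high = (y >> i) & 1, (y >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def booth_multiply(x, y, n=8):
    """Add the n/2 Booth partial products d_k * x, each weighted by 4**k."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, n)))


print(shift_add_multiply(200, 37))          # 7400
print(len(booth_radix4_digits(-77)))        # 4 partial products instead of 8
print(booth_multiply(-77, 93) == -77 * 93)  # True
```

The recoding only changes how many rows must be summed; how those rows are added (array, tree or serial) is a separate design choice, which is why the two techniques are often combined.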

However, in high-speed processors operating at high clock frequencies, existing multipliers take considerable time to execute instructions, and multiplier units with high power consumption are not suitable for processors used in wireless and portable devices. Power savings are therefore an important area for improvement.

To achieve low-power computation together with high performance, a new approach to multiplier design based on ancient Vedic mathematics has been explored. Mathematical operations based on Vedic mathematics are fast and require less hardware, and this property can be used to increase the computational speed of multipliers. This paper describes the design and implementation of a Vedic multiplier based on the Urdhva-Tiryagbhyam sutra [9]-[11]. The number of steps required to perform a multiplication using the Urdhva-Tiryagbhyam sutra is considerably smaller than with conventional multiplication techniques. In this paper, we further explore a novel method to enhance the speed of a Vedic multiplier by pre-computing the carries used during the summation of partial products. Implementing the pre-computation with multiplexer-based carry look-ahead logic and XOR logic reduced the delay, and combining the proposed multiplier with the operand decomposition technique reduced the power consumption, which in turn reduced the power-delay product of the multiplier.
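Urdhva-Tiryagbhyam ("vertically and crosswise") forms each column of the result directly as the sum of the crosswise bit products whose indices add up to that column, and then passes carries from column to column. The following minimal Python sketch illustrates the idea for unsigned binary operands; the function name is hypothetical, and carries are modeled arithmetically rather than with gates.

```python
def urdhva_multiply(a, b, n):
    """Column-wise ("vertically and crosswise") multiplication of n-bit unsigned ints."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result, carry = 0, 0
    for k in range(2 * n - 1):
        # Crosswise products a_i * b_(k-i) that share weight 2**k.
        lo, hi = max(0, k - n + 1), min(k, n - 1)
        column = sum(a_bits[i] & b_bits[k - i] for i in range(lo, hi + 1))
        total = column + carry
        result |= (total & 1) << k    # column bit of the product
        carry = total >> 1            # everything else moves to column k + 1
    return result | (carry << (2 * n - 1))


print(urdhva_multiply(13, 11, 4))  # 143
```

In hardware the per-column sums are formed in parallel; the carry chain between columns is what the carry pre-computation logic of this paper targets.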

The rest of the paper is organized as follows: the methodology and the architecture of the proposed multipliers are described in Section 2, the results are presented in Section 3, and Section 4 concludes the paper.

2. RESEARCH METHOD

2.1. Carry pre-computation based binary multiplier

An 8-bit binary Vedic multiplier is proposed, with A and B as the inputs and P as the final 16-bit product. The block diagram of the 8-bit multiplication is shown in Figure 1. In the proposed multiplier, operands A and B are each divided into a higher and a lower 4-bit part:

A = {AH, AL} (1)

B = {BH, BL} (2)
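Splitting each operand into 4-bit halves gives A*B = 2^8 (AH*BH) + 2^4 (AH*BL + AL*BH) + AL*BL, so four 4-bit products and shifted additions suffice. A minimal Python sketch of this recombination follows; the hypothetical parameter mult4 stands in for the 4-bit carry pre-computation multiplier and defaults to ordinary multiplication.

```python
def split_halves(x):
    """Return the (high, low) 4-bit halves of an 8-bit operand, i.e. X = {XH, XL}."""
    return x >> 4, x & 0xF


def multiply8_by_halves(a, b, mult4=lambda x, y: x * y):
    """Combine four 4-bit sub-products; mult4 is a stand-in for the 4-bit multiplier."""
    ah, al = split_halves(a)
    bh, bl = split_halves(b)
    return ((mult4(ah, bh) << 8)                        # weight 2**8
            + ((mult4(ah, bl) + mult4(al, bh)) << 4)    # weight 2**4
            + mult4(al, bl))                            # weight 2**0


print(split_halves(0xB7))               # (11, 7)
print(multiply8_by_halves(0xB7, 0x5C))  # 16836
```

Any correct 4-bit multiplier can be passed as mult4, which is a convenient way to test a 4-bit block inside the 8-bit structure.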

Figure 1. Block Diagram of 8-bit Multiplication (the sub-products AL*BL, AH*BL, AL*BH and AH*BH are combined into the Product)

In this multiplier, the 8-bit binary multiplication is realized using 4-bit binary Vedic multipliers with carry pre-computation logic, shown in Figure 2, where A3, A2, A1, A0 and B3, B2, B1, B0 are the 4-bit binary inputs and P7, P6, P5, P4, P3, P2, P1, P0 are the binary output bits.


Figure 2. Carry Pre-Computation Based Multiplier (partial products pp1-pp16 in groups pp4-pp1, pp8-pp5, pp12-pp9 and pp16-pp13; pre-computed carries c31, c32, c41, c42, c51, c52, c61, c62 and c71)

The architecture of the 4-bit multiplier can be understood from the block diagram shown in Figure 3.

Figure 3. Architecture of Carry Pre-Computation based Multiplier (blocks: Partial Products Generator, Pre-Carry Logic and XOR Logic; signals: inputs A[3:0] and B[3:0], partial products PP[15:0], Pre-Computed Carries, output Product[7:0])

The partial product generator is the first block of the multiplier; it receives the 4-bit multiplicand and multiplier as inputs and applies the Urdhva-Tiryagbhyam multiplication technique. A 4-bit multiplication produces a total of 16 partial products (pp1-pp16). The product of two binary bits is either zero or one, which is simply the logical AND of the two bits.
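The sketch below generates the sixteen AND-gate partial products and groups them into columns of equal weight. The indexing pp(4i+j+1) = A_j & B_i is an assumption: it is chosen because it reproduces the column grouping used by the carry equations (3)-(20) (pp2 and pp5 in one column, pp3, pp6 and pp9 in the next, and so on). The helper names are hypothetical.

```python
def partial_products(a, b):
    """pp[1..16] for 4-bit a, b, assuming pp(4i + j + 1) = A_j & B_i."""
    pp = [0] * 17                                  # index 0 unused
    for i in range(4):                             # bit B_i selects the row
        for j in range(4):                         # bit A_j within the row
            pp[4 * i + j + 1] = ((a >> j) & 1) & ((b >> i) & 1)
    return pp


def column_indices():
    """Map each column weight to the indices of the partial products in it."""
    cols = {}
    for k in range(1, 17):
        i, j = divmod(k - 1, 4)
        cols.setdefault(i + j, []).append(k)       # pp(4i + j + 1) has weight 2**(i+j)
    return cols


print(column_indices())
# {0: [1], 1: [2, 5], 2: [3, 6, 9], 3: [4, 7, 10, 13], 4: [8, 11, 14], 5: [12, 15], 6: [16]}
```

Column 3 collects four partial products and column 2 three, which is why those columns can produce two carries each in the equations that follow.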

The products AL*BL, AH*BL, AL*BH and AH*BH are computed using the 4-bit carry pre-computation based multiplier described above, and the results of the four sub-multipliers are added to obtain the final product. The block diagram of the 8-bit multiplier is shown in Figure 4.


Figure 4. Block Diagram of 8-bit Multiplier Using 4-bit Carry Pre-Computation Based Multiplier (blocks: four 4-bit Carry Pre-Computation Based Multipliers for AH*BH, AH*BL, AL*BH and AL*BL, a Carry Save Adder and two Carry Look Ahead Adders; signals: A[7:4], A[3:0], B[7:4], B[3:0], P1[7:0] to P4[7:0], P1[7:4], P4[3:0], Carry[7:0], Sum[7:0], 4'b0000, Product[15:12], Product[11:4], Product[3:0])
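A sketch of the Figure 4 combination step in Python. The wiring is an assumption inferred from the signal names in the figure: P1 to P4 are taken as AH*BH, AH*BL, AL*BH and AL*BL; a carry save adder compresses the three middle-weight terms {P1[3:0], P4[7:4]}, P2 and P3 into Sum and Carry; one carry look-ahead addition (modeled here with +) yields Product[11:4]; and its overflow is added to P1[7:4] to form Product[15:12]. Function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compression of three bit vectors: x + y + z == s + (c << 1)."""
    s = x ^ y ^ z                                  # bitwise sum
    c = (x & y) | (x & z) | (y & z)                # bitwise majority = carries
    return s, c


def multiply8_figure4(a, b, mult4=lambda x, y: x * y):
    """Assumed Figure 4 wiring; mult4 stands in for the 4-bit multiplier."""
    ah, al, bh, bl = a >> 4, a & 0xF, b >> 4, b & 0xF
    p1, p2, p3, p4 = mult4(ah, bh), mult4(ah, bl), mult4(al, bh), mult4(al, bl)
    # Middle-weight terms (weight 2**4): {P1[3:0], P4[7:4]}, P2 and P3.
    s, c = carry_save_add(((p1 & 0xF) << 4) | (p4 >> 4), p2, p3)
    mid = s + (c << 1)                             # carry look-ahead adder, modeled as +
    low = p4 & 0xF                                 # Product[3:0]
    high = (p1 >> 4) + (mid >> 8)                  # Product[15:12], plus overflow of mid
    return (high << 12) | ((mid & 0xFF) << 4) | low


print(multiply8_figure4(0xB7, 0x5C))  # 16836
```

The carry save adder removes carry propagation from the three-operand addition, leaving a single carry-propagating addition for the middle byte.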

The second stage in the block diagram of Figure 3 is the carry generation circuit, in which the pre-computation logic is integrated with the Urdhva-Tiryagbhyam multiplication technique. The carry equations are derived separately for each column of partial products, and the inputs to these equations are taken from the previous column. The equations for the pre-computed carries are given below.

c2 = pp5 & pp2; (3)

c3t1 = (pp6 & pp3) | (pp9 & (pp3 | pp6)); (4)

c3t2 = (pp9 & ~pp6)| (pp3 & ~pp9) | (~pp3 & pp6); (5)

c31 = c2?c3t2:c3t1; (6)

c32 = pp2 & pp5 & pp3 & pp6 & pp9; (7)

c41t1 = pp13?((pp10 & ~pp7)| (pp4 & ~pp10) | (~pp4 & pp7)):((pp7 & pp4) | (pp10 & (pp4 | pp7))); (8)

c41t2 = pp13?((~pp7 & ~pp4)| (~pp10 & (~pp4 | ~pp7))):((~pp7 & pp4) | (pp10 & ~pp4) | (~pp10 & pp7)); (9)

c41 = c31?c41t2:c41t1; (10)

c42 = ((c31 & pp13) & ((pp10 & (pp7 | pp4)) | (pp7 & pp4))) | ((pp10 & pp7 & pp4) & (c31 | pp13)); (11)

c51t1 = c32?((pp14 & ~pp11)| (pp8 & ~pp14) | (~pp8 & pp11)):((pp11 & pp8) | (pp14 & (pp8 | pp11))); (12)

c51t2 = c32?((~pp11 & ~pp8)| (~pp14 & (~pp8 | ~pp11))):((~pp11 & pp8) | (pp14 & ~pp8) | (~pp14 & pp11)); (13)

c51 = c41?c51t2:c51t1; (14)


c52 = ((c41 & c32) & ((pp14 & (pp11 | pp8)) | (pp11 & pp8))) | ((pp14 & pp11 & pp8) & (c41 | c32)); (15)

c6t1 = (pp12 & pp15) | (c42 & (pp12 | pp15)); (16)

c6t2 = (c42 & ~pp15)| (pp12 & ~c42) | (pp15&~pp12); (17)

c61 = c51?c6t2:c6t1; (18)

c62 = c51 & c42 & pp12 & pp15; (19)

c71 = (c52 & pp16) | (c61 & (c52 | pp16)); (20)

The third stage in the block diagram applies XOR logic to the partial products and the carries generated in each column. Its output is the final 8-bit product (Product[7:0] in Figure 3), which is obtained in parallel rather than sequentially.
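To check the logic, the following Python sketch evaluates equations (3)-(20) bit by bit and compares the result with ordinary multiplication. Two parts are assumptions not spelled out in the text: the partial-product indexing pp(4i+j+1) = A_j & B_i, and the XOR stage, where each product bit is taken as the XOR of its column's partial products and of the carries that land in that column (for example P3 = pp4 ^ pp7 ^ pp10 ^ pp13 ^ c31). The ~ operator is modeled as a 1-bit NOT, and the function name is hypothetical.

```python
def cpc_multiply4(a, b):
    """Evaluate equations (3)-(20) and an assumed XOR stage for 4-bit a, b."""

    def n(x):                      # Verilog ~ on a 1-bit signal
        return x ^ 1

    # pp[4i + j + 1] = A_j & B_i (assumed indexing); pp[0] is unused.
    pp = [0] + [((a >> j) & 1) & ((b >> i) & 1) for i in range(4) for j in range(4)]

    c2 = pp[5] & pp[2]                                                      # (3)
    c3t1 = (pp[6] & pp[3]) | (pp[9] & (pp[3] | pp[6]))                      # (4)
    c3t2 = (pp[9] & n(pp[6])) | (pp[3] & n(pp[9])) | (n(pp[3]) & pp[6])     # (5)
    c31 = c3t2 if c2 else c3t1                                              # (6)
    c32 = pp[2] & pp[5] & pp[3] & pp[6] & pp[9]                             # (7)
    c41t1 = (((pp[10] & n(pp[7])) | (pp[4] & n(pp[10])) | (n(pp[4]) & pp[7])) if pp[13]
             else ((pp[7] & pp[4]) | (pp[10] & (pp[4] | pp[7]))))           # (8)
    c41t2 = (((n(pp[7]) & n(pp[4])) | (n(pp[10]) & (n(pp[4]) | n(pp[7])))) if pp[13]
             else ((n(pp[7]) & pp[4]) | (pp[10] & n(pp[4])) | (n(pp[10]) & pp[7])))  # (9)
    c41 = c41t2 if c31 else c41t1                                           # (10)
    c42 = (((c31 & pp[13]) & ((pp[10] & (pp[7] | pp[4])) | (pp[7] & pp[4])))
           | ((pp[10] & pp[7] & pp[4]) & (c31 | pp[13])))                   # (11)
    c51t1 = (((pp[14] & n(pp[11])) | (pp[8] & n(pp[14])) | (n(pp[8]) & pp[11])) if c32
             else ((pp[11] & pp[8]) | (pp[14] & (pp[8] | pp[11]))))         # (12)
    c51t2 = (((n(pp[11]) & n(pp[8])) | (n(pp[14]) & (n(pp[8]) | n(pp[11])))) if c32
             else ((n(pp[11]) & pp[8]) | (pp[14] & n(pp[8])) | (n(pp[14]) & pp[11])))  # (13)
    c51 = c51t2 if c41 else c51t1                                           # (14)
    c52 = (((c41 & c32) & ((pp[14] & (pp[11] | pp[8])) | (pp[11] & pp[8])))
           | ((pp[14] & pp[11] & pp[8]) & (c41 | c32)))                     # (15)
    c6t1 = (pp[12] & pp[15]) | (c42 & (pp[12] | pp[15]))                    # (16)
    c6t2 = (c42 & n(pp[15])) | (pp[12] & n(c42)) | (pp[15] & n(pp[12]))     # (17)
    c61 = c6t2 if c51 else c6t1                                             # (18)
    c62 = c51 & c42 & pp[12] & pp[15]                                       # (19)
    c71 = (c52 & pp[16]) | (c61 & (c52 | pp[16]))                           # (20)

    # Assumed XOR stage: each product bit is the parity of its column's inputs.
    bits = [
        pp[1],
        pp[2] ^ pp[5],
        pp[3] ^ pp[6] ^ pp[9] ^ c2,
        pp[4] ^ pp[7] ^ pp[10] ^ pp[13] ^ c31,
        pp[8] ^ pp[11] ^ pp[14] ^ c41 ^ c32,
        pp[12] ^ pp[15] ^ c42 ^ c51,
        pp[16] ^ c52 ^ c61,
        c71 ^ c62,
    ]
    return sum(bit << k for k, bit in enumerate(bits))


print(cpc_multiply4(15, 15))  # 225
```

Each carry pair such as c31 and c32 encodes bits 1 and 2 of its column sum, and each t1/t2 pair is the carry computed for both values of the late-arriving carry, which the multiplexer then selects. Because a 4-bit product never exceeds 225, c71 and c62 are never both set, so P7 needs no further carry. Comparing against a * b for all 256 input pairs is a quick way to confirm a transcription of the equations before writing the HDL.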

2.2. Carry pre-computation based binary multiplier using operand decomposition

In operand decomposition [12], the operands X and Y are decomposed into four numbers A, B, C and D to reduce the number of ones in the partial products. The decomposition is chosen so that the decomposed operands contain more zeros than ones. With more zeros, the switching activity of the circuit is reduced, which in turn reduces the dynamic power consumption of the architecture.

Assume that the two operands X and Y have n bits:

$X = [X_{n-1} X_{n-2} \ldots X_1 X_0], \quad Y = [Y_{n-1} Y_{n-2} \ldots Y_1 Y_0] \qquad (21)$

The four decomposed operands are defined as follows:

A = ~X ∧ ~Y,

B = X ∧ Y,

C = ~X ∧ Y, and

D = X ∧ ~Y (22)

where ∧ denotes the bitwise AND operation and ~ denotes the bitwise (one's) complement.
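A minimal Python sketch of the decomposition itself, assuming that ~ is the n-bit bitwise complement, as it is used in equations (3)-(20); the helper names are hypothetical. Under that reading, every bit position is 1 in exactly one of A, B, C and D, so the four n-bit operands together contain exactly n ones among 4n bits, and X and Y are recovered as B | D and B | C. The sketch illustrates only the decomposition step.

```python
def decompose(x, y, n=8):
    """A = ~X & ~Y, B = X & Y, C = ~X & Y, D = X & ~Y for n-bit operands.

    ~ is modeled as the n-bit bitwise complement (assumption)."""
    mask = (1 << n) - 1
    not_x, not_y = ~x & mask, ~y & mask
    return not_x & not_y, x & y, not_x & y, x & not_y


def ones(v):
    """Number of 1 bits in a non-negative integer."""
    return bin(v).count("1")


x, y = 0b10110110, 0b11010011
parts = decompose(x, y)
print([format(v, "08b") for v in parts])  # ['00001000', '10010010', '01000001', '00100100']
print(ones(x) + ones(y), sum(ones(v) for v in parts))  # 10 8
```

In this example the original operands carry 10 ones in 16 bits, while the four decomposed operands carry 8 ones in 32 bits, which is the sparsity the switching-activity argument relies on.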

The final product is determined by using equation 23.

X*Y = (C * D) - (A * B); (23)

The products C*D and A*B are computed using the 8-bit carry pre-computation based multiplier, and the final partial sums and carries of both products are then combined using a carry save adder and a carry look-ahead adder. The block diagram of this multiplier is shown in Figure 5.

3. RESULTS AND ANALYSIS

The proposed architecture was modeled in Verilog HDL, simulated using Cadence NCSIM and synthesized using Cadence RTL Compiler with a 65nm TSMC library. For comparison, the existing multiplier designs from the references were implemented and simulated in the same technology environment, and their performance parameters were computed with the same MOSFET technology file. Input data were applied in a regular fashion for the experiments. Delay and power were measured using the worst-case pattern, at the output with the maximum delay.

It is observed that the proposed carry pre-computation based multiplier and the carry pre-computation based multiplier with operand decomposition offer substantial reductions in propagation delay and total power consumption. From Table 1 (8-bit designs), the proposed carry pre-computation based multiplier reduces the power-delay product by ~23%, ~64%, ~57%, ~83% and ~94% compared with the array, Wallace, column based, Nikhilam based and compressor based multipliers, respectively, and the carry pre-computation based multiplier with operand decomposition reduces it by ~41%, ~72%, ~67%, ~87% and ~95%, respectively.
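The percentages quoted above can be reproduced from the power-delay product column of Table 1. In this small Python sketch the values are copied as printed, and the reductions appear to be truncated rather than rounded (for example, 1 - 24.23/57.6 is about 57.9%, quoted as ~57%). The sketch also notes that a delay in ns multiplied by a power in uW is an energy in femtojoules (1e-9 s x 1e-6 W = 1e-15 J).

```python
# Power-delay products of the 8-bit designs in Table 1, as printed.
PDP_8BIT = {
    "array": 31.63,
    "wallace": 67.42,
    "column": 57.6,
    "nikhilam": 149.95,
    "compressor": 410.92,
}
PROPOSED = 24.23      # carry pre-computation based multiplier
PROPOSED_OD = 18.5    # carry pre-computation based multiplier with operand decomposition


def reduction_percent(new, old):
    """Percentage reduction of `new` relative to `old`, truncated to an integer."""
    return int(100 * (1 - new / old))


def pdp_fj(delay_ns, power_uw):
    """Delay (ns) times power (uW) gives energy directly in femtojoules."""
    return delay_ns * power_uw


print([reduction_percent(PROPOSED, v) for v in PDP_8BIT.values()])     # [23, 64, 57, 83, 94]
print([reduction_percent(PROPOSED_OD, v) for v in PDP_8BIT.values()])  # [41, 72, 67, 87, 95]
print(round(pdp_fj(1.5, 21.09), 2))  # array multiplier row: 31.63 (approximately)
```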

Figure 5. Carry Pre-Computation Based Multiplier Using Operand Decomposition (blocks: Operand Decomposer, two 8-bit Carry Pre-Computation Based Multipliers and a Carry Save Adder; signals: X[7:0], Y[7:0], A[7:0], B[7:0], C[7:0], D[7:0], Prod1[15:0], Prod2[15:0], Product[15:0])

Table 1. Summary of Synthesis Results of 8-Bit Multiplier Architectures

S.No | Architecture (8-bit) | Delay (ns) | Dynamic Power (uW) | Static Power (uW) | Total Power (uW) | Power-Delay Product (fJ)
1 | Array Based Multiplier [6] | 1.5 | 15.09 | - | 21.09 | 31.63
2 | Wallace Based Multiplier [2] | 1.2 | 6.27 | 49.913 | 56.184 | 67.42
3 | Column Based Multiplier [9] | 1.95 | 26.74 | 2.8 | 29.54 | 57.6
4 | Nikhilam Based Multiplier [10] | 3.2 | 42.56 | 4.3 | 46.86 | 149.95
5 | Compressor Based Multiplier [11] | 4.02 | 95.2 | 6.79 | 101.99 | 410.92
6 | Pre-Computation Based Multiplier | 0.75 | 25.77 | 7.45 | 33.23 | 24.23
7 | Pre-Computation Based Multiplier with Operand Decomposition | 1.02 | 3.36 | 14.808 | 18.172 | 18.5

Table 2. Summary of Synthesis Results of 16-Bit Multiplier Architectures

S.No | Architecture (16-bit) | Delay (ns) | Dynamic Power (uW) | Static Power (uW) | Total Power (uW) | Power-Delay Product (fJ)
1 | Array Based Multiplier [6] | 2.89 | 30.18 | - | 42.18 | 121.90
2 | Wallace Based Multiplier [2] | 2.46 | 12.54 | 99.826 | 112.366 | 276.42
3 | Column Based Multiplier [9] | 3.82 | 52.48 | 5.4 | 57.88 | 221.10
4 | Nikhilam Based Multiplier [10] | 5.96 | 80.65 | 8.1 | 88.75 | 528.95
5 | Compressor Based Multiplier [11] | 8.04 | 190.4 | 13.58 | 203.98 | 1639.99
6 | Pre-Computation Based Multiplier | 1.4 | 51.54 | 14.9 | 66.44 | 93.016
7 | Pre-Computation Based Multiplier with Operand Decomposition | 1.96 | 6.72 | 29.616 | 36.336 | 71.218

Tables 1 and 2 show that the carry pre-computation based multiplier with operand decomposition consumes less power than the carry pre-computation based multiplier, at the cost of higher delay. The proposed carry pre-computation based multiplier with operand decomposition gives a better power-delay product than both the proposed carry pre-computation based multiplier and the existing multipliers from the literature.


4. CONCLUSION

In this paper, a Vedic mathematics based multiplier has been proposed that uses carry pre-computation and operand decomposition. The proposed architecture combines the benefits of the Vedic method, parallel pre-computation of carries and operand decomposition, thereby reducing the power-delay product. The propagation delays of the carry pre-computation based multiplier for 8-bit and 16-bit multiplication were 0.75 ns and 1.4 ns, and its power consumption was 33.23 uW and 66.44 uW, respectively. With operand decomposition, the propagation delays for 8-bit and 16-bit multiplication were 1.02 ns and 1.96 ns, and the power consumption was 18.17 uW and 36.34 uW, respectively. For 8-bit multiplication, the operand decomposition variant reduced the delay by ~68% and the power consumption by ~61% compared with the Nikhilam based Vedic multiplier.

REFERENCES

[1] Xiangui Kang, Anjie Peng, Xianyu Xu, Xiaochun Cao, “Performing Scalable Lossy Compression on Pixel Encrypted Images”, EURASIP Journal on Image and Video Processing, pp. 1-6, 2013.

[2] Nikolay Ponomarenko, Sergey Krivenko, Vladimir Lukin, Karen Egiazarian, Jaakko T. Astola, “Lossy Compression of Noisy Images Based on Visual Quality: A Comprehensive Study”, EURASIP Journal on Advances in Signal Processing, pp. 1-13, 2010.

[3] L.-K. Wang, M. A. Erle, C. Tsen, E. M. Schwarz, and M. J. Schulte, “A survey of hardware designs for decimal arithmetic”, IBM Journal of Research and Development, vol. 54, no. 2, pp. 8:1-8:15, 2010.

[4] M. Jeevitha, R. Muthaiah, P. Swaminathan, “Efficient Multiplier Architecture in VLSI Design”, Journal of Theoretical and Applied Information Technology, vol. 38, no. 2, pp. 196-201, 2012.

[5] J. R. Boddie, G. T. Daryanani, I. I. Eldumiati, R. N. Gadenz, J. S. Thompson, S. M. Walters, “Digital Signal Processor: Architecture and Performance”, Bell System Technical Journal, vol. 60, no. 7, pp. 1449-1462, 1981.

[6] Ko-Chi Kuo, Chi-Wen Chou, “Low Power and High Speed Multiplier Design with Row Bypassing and Parallel Architecture”, Microelectronics Journal, vol. 41, pp. 639-650, 2010.

[7] Constantinos Efstathiou, N. Moshopoulos, N. Axelos, K. Pekmestzi, “Efficient Modulo 2^n+1 Multiply and Multiply-Add Units Based on Modified Booth Encoding”, Integration, the VLSI Journal, vol. 47, pp. 140-147, 2014.

[8] Manas Ranjan Meher, Ching Chuen Jong, and Chip-Hong Chang, “A High Bit Rate Serial-Serial Multiplier With On-the-Fly Accumulation by Asynchronous Counters”, IEEE Transactions on VLSI Systems, vol. 19, no. 10, pp. 1733-1745, October 2011.

[9] Bharati Krsna Tirthaji, V. S. Agrawala, “Vedic Mathematics”, 13th Edition, Motilal Banarsidass, 2010.

[10] P. Saha, A. Banerjee, A. Dandapat, and P. Bhattacharyya, “ASIC design of a high speed low power circuit for factorial calculation using ancient Vedic mathematics”, Elsevier Microelectronics Journal, vol. 42, no. 12, pp. 1343-1352, December 2011.

[11] MD. Belal Rashid, Balaji B.S and Prof. M.B. Anandaraju, “VLSI Design and Implementation of Binary Multiplier based on UrdhvaTiryagbhyam Sutra with reduced Delay and Area”, International Journal of Engineering Research and Technology, vol. 6, no. 2, pp. 269-278, March 2013.

[12] Rizwan Mudassir, Mohab Anis, and Javid Jaffari, “Switching Activity Reduction in Low Power Booth Multiplier”, IEEE Symposium on Circuits and Systems, Seattle, vol. 1, pp. 3306-3309, May 2008.

BIOGRAPHIES OF AUTHORS

Chaitanya CVS received his bachelor’s degree in Electronics and Communication Engineering from JNTU, Hyderabad in 2006 and his MS degree in VLSI-CAD from Manipal University in 2007. In 2010, he started his career as Assistant Professor in the School of Information Sciences, Manipal. He is currently pursuing a Ph.D. at Manipal University. His research interests include High Performance Computer Arithmetic, Advanced Computer Architecture, Low-power VLSI Design, Electronic Design Automation, and Parallel Algorithms/Architectures.

Dr. C Sundaresan received his bachelor’s degree in Electronics and Communication from Madurai Kamaraj University in 2000, his MS degree in VLSI CAD from Manipal University in 2003, and his PhD from Manipal Academy of Higher Education in 2018. He started his career as an R & D engineer at Aplab Ltd. He is currently working as an Assistant Professor in the School of Information Sciences. His research interests include Computer Arithmetic, Low-Power VLSI Design, Logic Synthesis and Static Timing Analysis.


Dr. P. R. Venkateswaran obtained his bachelor’s degree in Electronics and Instrumentation Engineering from National Engineering College, Kovilpatti in 1998 and his master’s degree in Instrumentation and Control Engineering from Technical Teachers’ Training Institute, Chandigarh in 2002. He completed his doctoral research at Manipal University, Manipal in 2008. He started his career as a faculty member at Sethu Institute of Technology, Madurai, and continued teaching at Technical Teachers’ Training Institute, Chandigarh and later at Manipal Institute of Technology, Manipal. He is presently working as Senior Engineer (Control and Instrumentation) at the Welding Research Institute, BHEL, Tiruchirappalli, where he works in the areas of welding automation and welding power sources. His areas of interest are linear control theory, electronic instrumentation and soft computing techniques. He has been a reviewer for journals such as IEEE SMC, Elsevier and AMSE, and he is a member of the professional bodies ISTE, IWS and IE.

Dr. Keerthana Prasad is a Professor in the School of Information Sciences, a constituent institution of Manipal University. Her research interests are image analysis and its applications in medicine, and high performance computing approaches for image processing.

Pipelined vedic multiplier with manifold adder complexity levels

Article

Full-text available

Jun 2020
IJECE

Recently, the increased use of portable devices, has driven the research world to design systems with low power-consumption and high throughput. Vedic multiplier provides least delay even in complex multiplications when compared to other conventional multipliers. In this paper, a 64-bit multiplier is created using the Urdhava Tiryakbhyam sutra in Vedic mathematics. The design of this 64-bit multiplier is implemented in five different ways with the pipelining concept applied at different stages of adder complexities. The different architectures show different delay and power consumption. It is noticed that as complexity of adders in the multipliers reduce, the systems show improved speed and least hardware utilization. The architecture designed using 2 x 2 – bit pipelined Vedic multiplier is, then, compared with existing Vedic multipliers and conventional multipliers and shows least delay.

ASIC Design and Implementation of a Power-Delay Product Optimized Arithmetic Operational Unit

Conference Paper

Full-text available

Dec 2023

Power optimization of binary division based on FPGA

Article

Full-text available

Dec 2021

In modern very large scale integrated (VLSI) digital systems, power consumption has become a critical concern of VLSI designers. As size shrinks and density increases in chips, it will be a challenge to design high performance and low-power digital systems. Therefore, VLSI designers are trying to reduce power dissipation in these systems by using power optimization techniques. Different mathematical operations can be found in the architectures of most digital systems. The focus of this paper is division. In comparison to other basic computational operations, division requires more iterations, takes a long time, covers a large area, and consumes more power from the digital system. As a result, the system's design requires high speed and a low-power divider in order to improve its overall performance. This paper focuses on dynamic power dissipation. In order to determine which design consumes the lowest dynamic power, different system designs of digit-recurrence division algorithms, such as restoring division and non-restoring division are suggested. An innovative power-optimization technique, the very hardware descriptions language (VHDL) technique, is utilized to the suggested system designs. The VHDL technique achieved the higher optimization in dynamic power, at 93.66% for non-restoring division with internal-loop iteration, than traditional approaches.

Performing scalable lossy compression on pixel encrypted images

Article

Full-text available

Dec 2013
Int J Image Video Process

Compression of encrypted data draws much attention in recent years due to the security concerns in a service-oriented environment such as cloud computing. We propose a scalable lossy compression scheme for images having their pixel value encrypted with a standard stream cipher. The encrypted data are simply compressed by transmitting a uniformly subsampled portion of the encrypted data and some bitplanes of another uniformly subsampled portion of the encrypted data. At the receiver side, a decoder performs content-adaptive interpolation based on the decrypted partial information, where the received bit plane information serves as the side information that reflects the image edge information, making the image reconstruction more precise. When more bit planes are transmitted, higher quality of the decompressed image can be achieved. The experimental results show that our proposed scheme achieves much better performance than the existing lossy compression scheme for pixel-value encrypted images and also similar performance as the state-of-the-art lossy compression for pixel permutation-based encrypted images. In addition, our proposed scheme has the following advantages: at the decoder side, no computationally intensive iteration and no additional public orthogonal matrix are needed. It works well for both smooth and texture-rich images.

Switching activity reduction in low power Booth multiplier

Conference Paper

Full-text available

May 2008

A new low power multiplication algorithm for reducing the switching activity through operand decomposition for Radix-8 Booth multiplier is proposed. The proposed algorithm incorporates our proposed Redundant Binary Signed Digit (RBSD) Modified Booth-3 (Radix-8) encoding scheme to generate RBSD partial product rows and low power RB Adder unit designed for accumulation and thereby circumventing the need to generate hard multiples and sign extension. Experimental results show a reduction of 21% in dynamic power consumption and at least 44% reduction in Energy Delay Product (EDP) with a penalty of 4% in area.

Lossy Compression of Noisy Images Based on Visual Quality: A Comprehensive Study

Article

Full-text available

Dec 2010

This paper concerns lossy compression of images corrupted by additive noise. The main contribution of the paper is that analysis is carried out from the viewpoint of compressed image visual quality. Several coders for which the compression ratio is controlled in different manner are considered. Visual quality metrics that are the most adequate for the considered application (WSNR, MSSIM, PSNR-HVS-M, and PSNR-HVS) are used. It is demonstrated that under certain conditions visual quality of compressed images can be slightly better than quality of original noisy images due to image filtering through lossy compression. The "optimal" parameters of coders for which this positive effect can be observed depend upon standard deviation of the noise. This allows proposing automatic procedure for compressing noisy images in the neighborhood of optimal operation point, that is, when visual quality either improves or degrades insufficiently. Comparison results for a set of grayscale test images and several variances of noise are presented.

Review article: Efficient multiplier architecture in VLSI design

Article

Apr 2012

Designing high-speed multipliers with low power and regular in layout have substantial research interest. The analysis is done on the basis of certain performance parameters i.e. Area, Speed and Power consumption and dissipation. Multipliers are considered to be an important component in DSP applications like filters. Therefore, the low power multiplier is a necessity for the design and implementation of efficient power-aware devices. In this paper we have analyzed and reviewed a few multiplier architectures based on their working principle, speed and power efficiency.

Efficient modulo 2 +1 multiply and multiply-add units based on modified Booth encoding

Article

Jan 2014
INTEGRATION

In this work a new efficient modulo 2n+1 modified Booth multiplication algorithm for both operands in the weighted representation is proposed. Furthermore, the same algorithm is extended to realize modulo 2n+1 multiply-add units. The derived partial products are reduced by an inverted end around carry-save adder tree to two operands, which are finally added by a modulo 2n+1 adder. The performance and efficiency of the proposed multipliers are evaluated and compared against the earlier modulo 2n+1 multipliers, based on a single gate level model. Comparisons based on experimental CMOS implementations for both the multiply and multiply-add units are also given. The proposed multipliers yield area and power savings by an average of 15% and 10% respectively, while the corresponding area and power savings of the proposed multiply-add units are 14% and 21% respectively.

A survey of hardware designs for decimal arithmetic

Article

May 2010
IBM J RES DEV

Decimal data and decimal arithmetic operations are ubiquitous in daily life. Although microprocessors normally use binary arithmetic for computations, decimal arithmetic is often required in financial and commercial applications. Due to the increasing importance of and demand for decimal arithmetic, decimal floating-point (DFP) formats and operations are specified in the revised IEEE Standard for Floating-Point Arithmetic (IEEE 754-2008). This paper provides a survey of hardware designs for decimal arithmetic. It gives an overview of DFP arithmetic in IEEE 754-2008, describes processors that provide hardware and instruction set support for decimal arithmetic, and provides a survey of hardware designs for decimal addition, subtraction, multiplication, and division. Finally, it describes potential areas for future research.
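As one illustration of the hardware decimal addition the survey covers, the following is a minimal sketch of the textbook BCD (binary-coded decimal) digit adder: two 4-bit digits are added in binary, and 6 is added as a correction whenever the binary sum exceeds 9, skipping the six unused 4-bit codes. This is a generic technique, not a design taken from the survey, and the function names are hypothetical.

```python
def bcd_digit_add(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Add two BCD digits plus a carry-in; return (sum_digit, carry_out)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and cin in (0, 1)):
        raise ValueError("inputs must be BCD digits 0..9 and a 0/1 carry")
    s = a + b + cin              # plain binary sum, range 0..19
    if s > 9:                    # not a valid BCD digit: apply the +6 correction
        s += 6                   # skips the unused codes 1010..1111
        return s & 0xF, 1        # low 4 bits are the digit, carry out is 1
    return s, 0


def bcd_add(x: str, y: str) -> str:
    """Ripple-carry decimal addition of two unsigned digit strings."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry = 0
    digits = []
    for dx, dy in zip(reversed(x), reversed(y)):  # least-significant digit first
        d, carry = bcd_digit_add(int(dx), int(dy), carry)
        digits.append(str(d))
    if carry:
        digits.append("1")
    return "".join(reversed(digits))
```

For example, `bcd_add("999", "1")` ripples a carry through every digit and yields `"1000"`. The carry chain here is the same serial dependency that faster decimal adder designs aim to shorten.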

Low power and high speed multiplier design with row bypassing and parallel architecture

Article

Oct 2010
MICROELECTRON J

This paper presents a low-power, high-speed row-bypassing multiplier. The primary power reductions are obtained by turning off MOS components through multiplexers when the multiplier's operands are zero. Analysis of conventional DSP applications shows that the average proportion of zero inputs in the multiplier operands is 73.8 percent, so the proposed bypassing multiplier can reduce power consumption significantly. The proposed multiplier adopts a ripple-carry adder with fewer additional hardware components. In addition, the proposed bypassing architecture enhances operating speed through an additional parallel architecture that shortens the delay of the multiplier. Both unsigned and signed versions of the multiplier are developed. Post-layout simulations are performed in standard TSMC 0.18 μm CMOS technology at a 1.8 V supply voltage using Cadence Spectre. Simulation results show that the proposed design reduces power consumption and delay compared with its counterparts. For a 16×16 multiplier, the proposed design achieves 17 and 36 percent reductions in power consumption and delay, respectively, at the cost of a 20 percent increase in chip area compared with conventional array multipliers. In addition, compared with its counterparts, the proposed design achieves average reductions of 11 and 38 percent in power consumption and delay, with 46 percent less chip area, for both unsigned and signed multipliers. The proposed design is suitable for low-power, high-speed arithmetic applications.
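The row-bypassing idea in the abstract above can be illustrated with a simplified behavioral model of an unsigned shift-and-add array multiplier. Each row corresponds to one bit of the multiplier operand; when that bit is zero, the row's adder is assumed to be bypassed and the partial sum passes through unchanged. Counting the enabled rows gives a rough, hypothetical proxy for switching activity. This is only a sketch under those assumptions, not the authors' multiplexer-level circuit or their parallel architecture, and the function name is illustrative.

```python
def row_bypass_multiply(a: int, b: int, width: int) -> tuple[int, int]:
    """Shift-and-add product of two unsigned `width`-bit operands.

    Returns (product, active_rows). A row whose multiplier bit is 0 is
    bypassed: its adder is not used and the partial sum is forwarded as-is.
    """
    mask = (1 << width) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in `width` bits")
    partial_sum = 0
    active_rows = 0
    for row in range(width):
        if (b >> row) & 1:
            partial_sum += a << row   # row adder enabled: add the shifted multiplicand
            active_rows += 1
        # otherwise the row is bypassed and partial_sum is forwarded unchanged
    return partial_sum, active_rows
```

For instance, `row_bypass_multiply(13, 0b1010, 4)` returns `(130, 2)`: only two of the four rows do any addition. Operands with many zero bits, as the abstract reports for typical DSP workloads, leave most rows idle.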

ASIC design of a high speed low power circuit for factorial calculation using ancient Vedic mathematics

Article

Dec 2011
MICROELECTRON J

ASIC design of a high-speed, low-power circuit for calculating the factorial of a number is reported in this paper. The factorial of a number can be calculated by iterative multiplication with an incrementing or decrementing operand, and the iterative multiplication can be computed through a parallel implementation methodology. Combining parallel implementation with the Vedic multiplication methodology for factorial calculation significantly reduces propagation delay and switching power consumption, owing to fewer stages in the multiplication process, compared with conventionally used Vedic multiplication methodologies such as ‘Urdhva-tiryakbyham’ (UT) and ‘Nikhilam Navatascaramam Dasatah’ (NND) based implementations. Transistor-level implementation was carried out using Spectre (SPICE) simulation with standard 90 nm CMOS technology, and the results were compared with the above-mentioned conventional methodologies. The propagation delay for the 4-bit factorial calculation was only ∼42.13 ns, while its power consumption was ∼58.82 mW for a layout area of ∼6 mm². Compared with the UT- and NND-based implementations, speed improved by ∼33% and ∼24%, and power consumption was reduced by ∼34.48% and ∼24%, respectively.
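The ‘Urdhva-tiryakbyham’ (vertically and crosswise) method mentioned above forms each column of the product from all crosswise digit products whose indices sum to that column, then resolves carries. The sketch below models this in software and uses it for factorial calculation by iterative multiplication. It is an assumption-laden illustration: in hardware the column sums would be formed in parallel, whereas here they are computed sequentially, and it does not reproduce the paper's parallel factorial architecture. All function names are hypothetical.

```python
def urdhva_tiryagbhyam(a: list[int], b: list[int], base: int = 2) -> list[int]:
    """Multiply two digit vectors (least-significant digit first) by
    vertical-and-crosswise column sums followed by one carry-propagation pass."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    # Column k collects every crosswise product a[i] * b[j] with i + j == k.
    columns = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            columns[i + j] += a[i] * b[j]
    # Resolve carries from the least-significant column upward.
    result, carry = [], 0
    for col in columns:
        total = col + carry
        result.append(total % base)
        carry = total // base
    while carry:
        result.append(carry % base)
        carry //= base
    return result


def to_digits(x: int, base: int = 2) -> list[int]:
    """Non-negative integer -> digit list, least-significant digit first."""
    digits = []
    while x:
        digits.append(x % base)
        x //= base
    return digits or [0]


def from_digits(d: list[int], base: int = 2) -> int:
    """Digit list (least-significant digit first) -> integer."""
    return sum(v * base**k for k, v in enumerate(d))


def factorial_ut(n: int) -> int:
    """n! by iterative multiplication, each step done with the UT sketch above."""
    acc = 1
    for k in range(2, n + 1):
        acc = from_digits(urdhva_tiryagbhyam(to_digits(acc), to_digits(k)))
    return acc
```

With `base=2` the digits are bits, matching a binary hardware multiplier. Passing `base=10` reproduces the familiar pencil-and-paper form of the sutra, for example multiplying 123 by 45 to obtain 5535.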

J. R. Boddie, G. T. Daryanani, I. I. Eldumiati, R. N. Gadenz, J. S. Thompson, and S. M. Walters, "Digital Signal Processor: Architecture and Performance," Bell System Technical Journal, vol. 60, no. 7, pp. 1449-1462, 1981.

A High Bit Rate Serial-Serial Multiplier With On-the-Fly Accumulation by Asynchronous Counters

Manas Ranjan Meher, Ching Chuen Jong, and Chip-Hong Chang, "A High Bit Rate Serial-Serial Multiplier With On-the-Fly Accumulation by Asynchronous Counters," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 10, pp. 1733-1745, October 2011.