Indonesian Journal of Electrical Engineering and Computer Science
Vol. 13, No. 2, February 2019, pp. 845~852
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v13.i2.pp845-852
Journal homepage: http://iaescore.com/journals/index.php/ijeecs
ASIC design of low power-delay product carry pre-computation
based multiplier
Chaitanya CVS1, Sundaresan C2, P R Venkateswaran3, Keerthana Prasad4
1,2,4School of Information Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India.
3Bharat Heavy Electricals Limited, Tiruchirappalli, Tamil Nadu, India.
Article Info
ABSTRACT
Article history:
Received Oct 6, 2018
Revised Dec 07, 2018
Accepted Dec 21, 2018
High speed and efficient multipliers are essential components in today’s
computational circuits like digital signal processing, algorithms for
cryptography and high performance processors. Invariably, almost all
processing units will contain hardware multipliers based on some algorithm
that fits the application requirement. Tremendous advances in VLSI
technology over the past several years resulted in an increased need for high
speed multipliers and compelled the designers to go for trade-offs among
speed, power consumption and area. Amongst various methods of
multiplication, Vedic multipliers are gaining ground due to their expected
improvement in performance. A novel multiplier design for high speed VLSI
applications using Urdhva-Tiryagbhyam sutra of Vedic Multiplication has
been presented in this paper. The proposed architecture is modeled using
Verilog HDL, simulated using Cadence NCSIM and synthesized using
Cadence RTL Compiler with a 65nm TSMC library. The proposed multiplier
architecture is compared with existing multipliers, and the results show
significant improvement in speed and power dissipation.
Keywords:
Binary Multiplication
Carry Pre Computation
Multiplier Architecture
Operand Decomposition
Vedic Multiplier
Copyright © 2019 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Chaitanya CVS,
School of Information Sciences,
Manipal Academy of Higher Education,
Manipal 576104, Karnataka, India.
Email: chaitanya.cvs@manipal.edu
1. INTRODUCTION
Processors are an important part of integrated circuits (ICs). A large number of functions can be packed
into an IC thanks to the tremendous growth in integration density in recent times. As the number of functions
increases, the need for computation also grows. With the advent of new process technologies, shrinking
feature sizes and the availability of modern CAD tools, the development of complex integrated circuits for various
applications is possible. Examples of such applications include digital signal processing [1,2], mobile
computing and communications, multimedia applications and scientific computing.
The speed and efficiency of the processor in such an IC are crucial for meeting the
requirements of the applications supported by the IC. The speed and efficiency of the processor in
turn depend upon the arithmetic logic unit [3], which is considered the main computational unit of the
processor.
Moreover, the multiplier units [4] are the most important hardware structures in a complex
arithmetic unit. The multiplier units can operate on operands of various data
types and support operations such as calculating a running sum of products. As multiplication is a crucial arithmetic operation in
processors [5] and digital computer systems, multipliers are the core building block for many algorithms in a
wide variety of computing applications. Although multipliers are the main arithmetic components used for
processing scientific data, their excessive power consumption and delay attract attention from the research
community. Usually, multiple arithmetic cores working in parallel are used so as to process large amounts of
data with relatively low power and delay.
Various algorithms have been proposed in the past for the hardware implementation of multipliers.
Add and Shift is the most common algorithm used in multiplier design [6]. In parallel multipliers, the
key parameter that determines performance is the number of partial products that
need to be added. One such algorithm is the Modified Booth algorithm [7], which reduces the number of partial
products during the multiplication and in turn increases the performance of the multiplier. Another
is the Wallace tree based algorithm, which reduces the number of addition stages and thereby improves the
speed of multiplication. In some implementations, an efficient multiplier architecture is designed by combining
the Modified Booth algorithm and the Wallace tree algorithm. However, increasing parallelism increases
the number of shifts between the intermediate sum and the partial products, which results in reduced speed,
increased power consumption and increased area because of the irregular structure. Thus, in some cases, low
power and compact multiplier architectures are implemented using a serial multiplication algorithm. Serial
multipliers [8] offer better power consumption and area at the cost of delay. Depending
upon the application, either parallel or serial multipliers are selected to perform the operation.
However, in high speed processors operating at higher clock frequencies, existing
multipliers incur too much delay in executing instructions. Existing multiplier units that consume more
power are also unsuitable for processors used in wireless and portable devices.
Thus, power saving is an important area for improvement.
In order to address low power computation along with high performance, a new approach to
multiplier design based on ancient Vedic Mathematics has been explored. Mathematical operations using
Vedic mathematics are very fast and require less hardware. This aspect of Vedic mathematics can be utilized
to increase the computational speed of multipliers. This paper describes the design and implementation of a
Vedic multiplier based on the Urdhva-Tiryagbhyam Sutra [9]-[11]. The number of steps required to perform a
multiplication using the Urdhva-Tiryagbhyam Sutra is considerably smaller than in
conventional multiplication techniques. In this paper, we further explore a novel method to enhance
the speed of a Vedic multiplier by pre-computing the carries that are used during the summation of partial
products. Implementing the pre-computation logic using multiplexer based carry look-ahead logic and
XOR logic resulted in a reduction of delay. The proposed multiplier, combined with the operand decomposition
technique, resulted in a reduction of power consumption, which in turn reduced the power-delay product of the
multiplier.
The paper is organized as follows: the methodology and the architecture of the
proposed multipliers are given in section 2. Results are presented in section 3. Finally, the conclusion is given in
section 4.
2. RESEARCH METHOD
2.1. Carry pre-computation based binary multiplier
An 8-bit binary Vedic multiplier is proposed with A and B as inputs and P as the final 16-bit
product. The block diagram for 8-bit multiplication is shown in Figure 1. In the proposed multiplier, the
operands A and B are each divided into higher and lower parts of 4 bits each.
A = {AH, AL} (1)
B = {BH, BL} (2)
Figure 1. Block Diagram of 8-bit Multiplication
In this type of multiplier, an 8-bit binary multiplication is realized using the 4-bit binary Vedic
multiplication with carry pre-computation logic shown in Figure 2, where A3, A2, A1, A0 and B3, B2,
B1, B0 are the 4-bit binary inputs and P7, P6, P5, P4, P3, P2, P1, P0 are the binary output bits.
Figure 2. Carry Pre-Computation Based Multiplier (inputs A3-A0 and B3-B0, partial products pp1-pp16, pre-computed carries c2, c31, c32, c41, c42, c51, c52, c61, c62, c71, product bits labelled P1-P8)
The architecture of the 4-bit multiplier can be understood from the block diagram shown in Figure 3.
Figure 3. Architecture of Carry Pre-Computation based Multiplier (blocks: Partial Products Generator, Pre-Carry Logic, XOR Logic; inputs A[3:0] and B[3:0]; internal signals PP[15:0] and the pre-computed carries; output Product[7:0])
The partial product generator is the first block of the multiplier; it takes the 4-bit multiplicand and
multiplier as inputs. The multiplication technique used at this stage is Urdhva-Tiryagbhyam.
The 4-bit multiplication yields a total of 16 partial products (pp1-pp16). Multiplying one
binary bit by another gives either a zero or a one, which is simply the logical AND of the two bits.
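The column-wise ("vertical and crosswise") grouping of these AND terms can be illustrated with a short Python sketch. It is not the authors' Verilog: the function names `partial_product_columns` and `product_from_columns` are hypothetical, and the grouping rule (the term a_i AND b_j belongs to the column of weight i + j) is the usual positional-weight convention, assumed here.

```python
def partial_product_columns(a, b, width=4):
    """Group the AND terms a_i & b_j by column weight i + j (assumed convention)."""
    columns = [[] for _ in range(2 * width - 1)]
    for j in range(width):          # bit index of the multiplier b
        for i in range(width):      # bit index of the multiplicand a
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            columns[i + j].append(bit)
    return columns


def product_from_columns(columns):
    """Add the number of ones in each column at that column's binary weight."""
    return sum(sum(col) << k for k, col in enumerate(columns))


cols = partial_product_columns(0b1011, 0b0110)
print([len(c) for c in cols])        # column heights: 1, 2, 3, 4, 3, 2, 1
print(product_from_columns(cols))    # same value as 11 * 6
```

For 4-bit operands this gives 16 AND terms spread over seven columns, which is why the later carry logic has to handle up to four partial products (plus incoming carries) in the middle column.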
The products AL*BL, AH*BL, AL*BH and AH*BH are determined using the 4-bit carry pre-
computation based multiplier described above, and the results of all sub-multipliers are added to obtain the final product.
The block diagram of the 8-bit multiplier is shown in Figure 4.
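The way the four 4-bit products combine can be sketched in Python under stated assumptions. The name `mul8_from_4bit` and its `mul4` parameter are hypothetical, ordinary integer addition stands in for the carry save and carry look ahead adders of the hardware, and the default `mul4` is plain multiplication rather than the carry pre-computation circuit.

```python
def mul8_from_4bit(a, b, mul4=lambda x, y: x * y):
    """Build an 8x8 product from four 4x4 products (arithmetic view only)."""
    ah, al = (a >> 4) & 0xF, a & 0xF      # A = {AH, AL}
    bh, bl = (b >> 4) & 0xF, b & 0xF      # B = {BH, BL}
    p_hh = mul4(ah, bh)   # AH*BH, weight 2**8
    p_hl = mul4(ah, bl)   # AH*BL, weight 2**4
    p_lh = mul4(al, bh)   # AL*BH, weight 2**4
    p_ll = mul4(al, bl)   # AL*BL, weight 2**0
    # Plain addition replaces the adder tree used in hardware.
    return (p_hh << 8) + ((p_hl + p_lh) << 4) + p_ll


print(mul8_from_4bit(0xB7, 0x5C) == 0xB7 * 0x5C)
```

Because the low four bits of AL*BL are not affected by any other term, they can be forwarded directly as the low nibble of the result, which is consistent with the separate Product[3:0] output in the figure.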
Figure 4. Block Diagram of 8-bit Multiplier Using 4-bit Carry Pre-Computation Based Multiplier (four 4-bit carry pre-computation based multipliers take A[7:4], A[3:0], B[7:4] and B[3:0] and compute AH BH, AH BL, AL BH and AL BL as P1[7:0]-P4[7:0]; a carry save adder and carry look ahead adders combine them into Product[15:12], Product[11:4] and Product[3:0])
The second stage in the block diagram is the carry generation circuit. Here, we have integrated pre-
computation logic along with the Urdhva-Tiryagbhyam multiplication technique. The carry equations are
generated separately for each column of partial products and the inputs for these equations are taken from the
previous column. The equations for pre-computed carries are given below.
c2 = pp5 & pp2; (3)
c3t1 = (pp6 & pp3) | (pp9 & (pp3 | pp6)); (4)
c3t2 = (pp9 & ~pp6)| (pp3 & ~pp9) | (~pp3 & pp6); (5)
c31 = c2?c3t2:c3t1; (6)
c32 = pp2 & pp5 & pp3 & pp6 & pp9; (7)
c41t1 = pp13?((pp10 & ~pp7)| (pp4 & ~pp10) | (~pp4 & pp7)):((pp7 & pp4) | (pp10 & (pp4 | pp7))); (8)
c41t2 = pp13?((~pp7 & ~pp4)| (~pp10 & (~pp4 | ~pp7))):((~pp7 & pp4) | (pp10 & ~pp4) | (~pp10 & pp7)); (9)
c41 = c31?c41t2:c41t1; (10)
c42 = ((c31 & pp13) & ((pp10 & (pp7 | pp4)) | (pp7 & pp4))) | ((pp10 & pp7 & pp4) & (c31 | pp13)); (11)
c51t1 = c32?((pp14 & ~pp11)| (pp8 & ~pp14) | (~pp8 & pp11)):((pp11 & pp8) | (pp14 & (pp8 | pp11))); (12)
c51t2 = c32?((~pp11 & ~pp8)| (~pp14 & (~pp8 | ~pp11))):((~pp11 & pp8) | (pp14 & ~pp8) | (~pp14 & pp11)); (13)
c51 = c41?c51t2:c51t1; (14)
c52 = ((c41 & c32) & ((pp14 & (pp11 | pp8)) | (pp11 & pp8))) | ((pp14 & pp11 & pp8) & (c41 | c32)); (15)
c6t1 = (pp12 & pp15) | (c42 & (pp12 | pp15)); (16)
c6t2 = (c42 & ~pp15)| (pp12 & ~c42) | (pp15&~pp12); (17)
c61 = c51?c6t2:c6t1; (18)
c62 = c51 & c42 & pp12 & pp15; (19)
c71 = (c52 & pp16) | (c61 & (c52 | pp16)); (20)
The third stage in the block diagram applies XOR logic to the partial products and the carries
generated in each column. The output of this stage gives the final product (Product[7:0] in Figure 3), which is obtained in a
parallel mechanism instead of a sequential mechanism.
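Equations (3)-(20) can be exercised with a small Python model. This is a sketch under assumptions, not the authors' RTL: the partial-product numbering (pp1-pp4 for the row of B0, pp5-pp8 for B1, and so on) is inferred from the column groupings in the equations, the XOR expressions for the product bits are not printed in the text and are assumed, and the names `inv` and `mul4_precarry` are hypothetical.

```python
def inv(x):
    """Complement of a single bit (0 or 1); plays the role of ~ in the equations."""
    return x ^ 1


def mul4_precarry(a, b):
    """4x4 product from carry equations (3)-(20) plus an assumed XOR sum stage."""
    # Assumed numbering: pp[4*j + i + 1] = a_i & b_j.
    pp = [0] * 17
    for j in range(4):
        for i in range(4):
            pp[4 * j + i + 1] = ((a >> i) & 1) & ((b >> j) & 1)
    (pp1, pp2, pp3, pp4, pp5, pp6, pp7, pp8,
     pp9, pp10, pp11, pp12, pp13, pp14, pp15, pp16) = pp[1:]

    c2 = pp5 & pp2                                                   # (3)
    c3t1 = (pp6 & pp3) | (pp9 & (pp3 | pp6))                         # (4)
    c3t2 = (pp9 & inv(pp6)) | (pp3 & inv(pp9)) | (inv(pp3) & pp6)    # (5)
    c31 = c3t2 if c2 else c3t1                                       # (6)
    c32 = pp2 & pp5 & pp3 & pp6 & pp9                                # (7)
    c41t1 = (((pp10 & inv(pp7)) | (pp4 & inv(pp10)) | (inv(pp4) & pp7)) if pp13
             else ((pp7 & pp4) | (pp10 & (pp4 | pp7))))              # (8)
    c41t2 = (((inv(pp7) & inv(pp4)) | (inv(pp10) & (inv(pp4) | inv(pp7)))) if pp13
             else ((inv(pp7) & pp4) | (pp10 & inv(pp4)) | (inv(pp10) & pp7)))  # (9)
    c41 = c41t2 if c31 else c41t1                                    # (10)
    c42 = (((c31 & pp13) & ((pp10 & (pp7 | pp4)) | (pp7 & pp4)))
           | ((pp10 & pp7 & pp4) & (c31 | pp13)))                    # (11)
    c51t1 = (((pp14 & inv(pp11)) | (pp8 & inv(pp14)) | (inv(pp8) & pp11)) if c32
             else ((pp11 & pp8) | (pp14 & (pp8 | pp11))))            # (12)
    c51t2 = (((inv(pp11) & inv(pp8)) | (inv(pp14) & (inv(pp8) | inv(pp11)))) if c32
             else ((inv(pp11) & pp8) | (pp14 & inv(pp8)) | (inv(pp14) & pp11)))  # (13)
    c51 = c51t2 if c41 else c51t1                                    # (14)
    c52 = (((c41 & c32) & ((pp14 & (pp11 | pp8)) | (pp11 & pp8)))
           | ((pp14 & pp11 & pp8) & (c41 | c32)))                    # (15)
    c6t1 = (pp12 & pp15) | (c42 & (pp12 | pp15))                     # (16)
    c6t2 = (c42 & inv(pp15)) | (pp12 & inv(c42)) | (pp15 & inv(pp12))  # (17)
    c61 = c6t2 if c51 else c6t1                                      # (18)
    c62 = c51 & c42 & pp12 & pp15                                    # (19)
    c71 = (c52 & pp16) | (c61 & (c52 | pp16))                        # (20)

    # Assumed XOR stage: each product bit is the XOR of its column's inputs.
    bits = [
        pp1,
        pp2 ^ pp5,
        pp3 ^ pp6 ^ pp9 ^ c2,
        pp4 ^ pp7 ^ pp10 ^ pp13 ^ c31,
        pp8 ^ pp11 ^ pp14 ^ c41 ^ c32,
        pp12 ^ pp15 ^ c42 ^ c51,
        pp16 ^ c52 ^ c61,
        c71 ^ c62,
    ]
    return sum(bit << k for k, bit in enumerate(bits))


print(all(mul4_precarry(a, b) == a * b for a in range(16) for b in range(16)))
```

In this reading, each column produces at most two carries: one of weight two (c31, c41, c51, c61, c71) and, where the column can sum to four or more, one of weight four (c32, c42, c52, c62). The conditional selections in (6), (10), (14) and (18) correspond to the multiplexer based selection mentioned in the introduction.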
2.2. Carry pre-computation based binary multiplier using operand decomposition
In operand decomposition [12], the operands X and Y are decomposed into four numbers A, B, C
and D to reduce the number of ones in the partial products. The operands are decomposed in such a way that
the decomposed operands contain more zeros than ones. Because zeros dominate,
the switching activity of the circuit is reduced, which in turn reduces the dynamic
power consumption of the architecture.
Assume that the two operands X and Y have n bits each:
X = [X_{n-1} X_{n-2} ... X_1 X_0], and
Y = [Y_{n-1} Y_{n-2} ... Y_1 Y_0] (21)
The four decomposed operands are given by
A = ~X Λ ~Y,
B = X Λ Y,
C = ~X Λ Y, and
D = X Λ ~Y (22)
where Λ is the AND operation and ~ is the two's complement.
The final product is determined using equation (23).
X*Y = (C * D) - (A * B) (23)
The products C*D and A*B are determined using the 8-bit carry pre-computation based multiplier.
The final partial sum and carry from both products are then combined using a carry save adder and a carry look
ahead adder. The block diagram of this multiplier is shown in Figure 5.
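A numerical check of operand decomposition can be sketched in Python, with one substitution that is an assumption made here and not taken from the text. Read as plain unsigned bitwise operations, equation (23) as printed could not be reproduced (for example with X = 3 and Y = 5), so the sketch checks the closely related identity X*Y = (X OR Y)*(X AND Y) + (~X AND Y)*(X AND ~Y), in which the term X OR Y is the bitwise complement of A from (22). The names `decompose` and `mul_decomposed` are hypothetical, and ~ is treated as the n-bit bitwise complement.

```python
def decompose(x, y, n=8):
    """Return (X|Y, B, C, D); X|Y is the n-bit complement of A in (22)."""
    mask = (1 << n) - 1
    not_x, not_y = ~x & mask, ~y & mask   # n-bit bitwise complements (assumed reading of ~)
    a_bar = x | y          # assumed substitute for A = ~X & ~Y
    b = x & y              # B in (22)
    c = not_x & y          # C in (22)
    d = x & not_y          # D in (22)
    return a_bar, b, c, d


def mul_decomposed(x, y, n=8):
    """Product via two smaller-activity multiplications and one addition."""
    a_bar, b, c, d = decompose(x, y, n)
    return a_bar * b + c * d


print(mul_decomposed(3, 5) == 15)
print(all(mul_decomposed(x, y) == x * y for x in range(256) for y in range(256)))
```

The identity follows from B, C and D having no set bit in common: X = B + D, Y = B + C and X OR Y = B + C + D, so (B + C + D)*B + C*D = (B + D)*(B + C). C and D never have a one in the same bit position, which is the property the decomposition relies on to reduce switching activity.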
3. RESULTS AND ANALYSIS
The proposed architecture was modeled using Verilog HDL, simulated using Cadence NCSIM and
synthesized using Cadence RTL Compiler with a 65nm TSMC library. Different implementation methodologies
were implemented in the same technological environment, and their performance
parameters were then compared. For the purpose of comparison, the designs were taken from the references,
simulated, and their performance parameters computed using the same MOSFET technology file. Input data
were applied in a regular fashion for experimental purposes. The delay and the power were measured using the worst-
case pattern and at the output where the delay is maximum.
It is observed that the proposed carry pre-computation based multiplier and the carry pre-computation
based multiplier with operand decomposition offer a substantial reduction in propagation delay and total
power consumption. From Table 1 and Table 2, it can be observed that the proposed carry pre-computation
based multiplier design reduces the power-delay product by ~23%, ~64%, ~57%, ~83% and ~94% when compared with the array multiplier,
Wallace multiplier, column based multiplier, Nikhilam based multiplier and compressor based multiplier respectively,
and the carry pre-computation based multiplier with operand decomposition reduces it by ~41%, ~72%, ~67%, ~87% and
~95% when compared with the same multipliers respectively.
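The quoted percentages can be reproduced from the 8-bit power-delay product column of Table 1 with a few lines of Python. This is only an arithmetic check: the dictionary and function names are hypothetical, the values are copied from the table as printed, and the percentages are truncated rather than rounded, which is an assumption about how the "~" figures were obtained. Note that a product of uW and ns is dimensionally fJ, so the sketch compares numeric values only.

```python
# Power-delay products of the reference designs, 8-bit, as printed in Table 1.
PDP_8BIT_REFERENCE = {
    "Array": 31.63,
    "Wallace": 67.42,
    "Column": 57.6,
    "Nikhilam": 149.95,
    "Compressor": 410.92,
}
PDP_PRECOMP = 24.23       # Pre-Computation Based Multiplier
PDP_PRECOMP_OD = 18.5     # ... with Operand Decomposition


def reduction_percent(proposed, baseline):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - proposed / baseline)


for name, baseline in PDP_8BIT_REFERENCE.items():
    print(name,
          int(reduction_percent(PDP_PRECOMP, baseline)),
          int(reduction_percent(PDP_PRECOMP_OD, baseline)))
```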
Figure 5. Carry Pre-Computation Based Multiplier Using Operand Decomposition (an operand decomposer turns X[7:0] and Y[7:0] into A[7:0], B[7:0], C[7:0] and D[7:0]; two 8-bit carry pre-computation based multipliers produce Prod1[15:0] and Prod2[15:0]; a carry save adder combines them into Product[15:0])
Table 1. Summary of Synthesis Results of 8-Bit Multiplier Architectures
| S.No | Architecture (8-bit) | Delay (ns) | Dynamic Power (uW) | Static Power (uW) | Total Power (uW) | Power-Delay Product (pJ) |
|---|---|---|---|---|---|---|
| 1 | Array Based Multiplier [6] | 1.5 | 15.09 | 6 | 21.09 | 31.63 |
| 2 | Wallace Based Multiplier [2] | 1.2 | 6.27 | 49.913 | 56.184 | 67.42 |
| 3 | Column Based Multiplier [9] | 1.95 | 26.74 | 2.8 | 29.54 | 57.6 |
| 4 | Nikhilam Based Multiplier [10] | 3.2 | 42.56 | 4.3 | 46.86 | 149.95 |
| 5 | Compressor Based Multiplier [11] | 4.02 | 95.2 | 6.79 | 101.99 | 410.92 |
| 6 | Pre-Computation Based Multiplier | 0.75 | 25.77 | 7.45 | 33.23 | 24.23 |
| 7 | Pre Computation Based Multiplier with Operand Decomposition | 1.02 | 3.36 | 14.808 | 18.172 | 18.5 |
Table 2. Summary of Synthesis Results of 16-Bit Multiplier Architectures
| S.No | Architecture (16-bit) | Delay (ns) | Dynamic Power (uW) | Static Power (uW) | Total Power (uW) | Power-Delay Product (pJ) |
|---|---|---|---|---|---|---|
| 1 | Array Based Multiplier [6] | 2.89 | 30.18 | 12 | 42.18 | 121.90 |
| 2 | Wallace Based Multiplier [2] | 2.46 | 12.54 | 99.826 | 112.366 | 276.42 |
| 3 | Column Based Multiplier [9] | 3.82 | 52.48 | 5.4 | 57.88 | 221.10 |
| 4 | Nikhilam Based Multiplier [10] | 5.96 | 80.65 | 8.1 | 88.75 | 528.95 |
| 5 | Compressor Based Multiplier [11] | 8.04 | 190.4 | 13.58 | 203.98 | 1639.99 |
| 6 | Pre-Computation Based Multiplier | 1.4 | 51.54 | 14.9 | 66.44 | 93.016 |
| 7 | Pre Computation Based Multiplier with Operand Decomposition | 1.96 | 6.72 | 29.616 | 36.336 | 71.218 |
From Table 1 and Table 2, it can be observed that the carry pre-computation based multiplier with
operand decomposition consumes less power than the carry pre-computation based multiplier, at the cost of
increased delay. The proposed carry pre-computation based multiplier with operand decomposition gives a
better power-delay product than both the proposed carry pre-computation based multiplier and the existing
multipliers from the literature.
4. CONCLUSION
In this paper, a Vedic mathematics based multiplier has been proposed which uses carry pre-
computation and operand decomposition. The proposed architecture combines the benefits of the
Vedic method, parallel pre-computation of carries, and operand decomposition, thereby reducing
the power-delay product. The propagation delay of the carry pre-computation based multiplier for 8-bit
and 16-bit multiplication was 0.75 ns and 1.4 ns, while power consumption was 33.23 uW and 66.44 uW.
The propagation delay of the carry pre-computation based multiplier with operand decomposition for
8-bit and 16-bit multiplication was 1.02 ns and 1.96 ns, while power consumption was 18.17 uW and 36.34
uW. For 8-bit multiplication with operand decomposition, the delay was decreased by ~68% and power consumption was reduced by ~61%
when compared to the Nikhilam based Vedic multiplier.
REFERENCES
[1] Xiangui Kang, Anjie Peng, Xianyu Xu, Xiaochun Cao, Performing Scalable Lossy Compression On Pixel Encrypted
Images, EURASIP Journal on Image and Video Processing, (2013), pp. 1-6.
[2] Nikolay Ponomarenko, Sergey Krivenko, Vladimir Lukin, Karen Egiazarian, Jaakko T. Astola, Lossy Compression
of Noisy Images Based on Visual Quality: A Comprehensive Study, EURASIP Journal on Advances in Signal
Processing, (2010), pp. 1-13.
[3] L.-K. Wang, M. A. Erle, C. Tsen, E. M. Schwarz, and M. J. Schulte, A survey of hardware designs for decimal
arithmetic, IBM Journal of Research and Development, 54 (2) (2010), pp. 8:1-8:15.
[4] M. Jeevitha, R. Muthaiah, P. Swaminathan, Efficient Multiplier Architecture in VLSI Design, Journal of
Theoretical and Applied Information Technology, 38 (2) (2012), pp. 196-201.
[5] J. R. Boddie, G. T. Daryanani, I. I. Eldumiati, R. N. Gadenz, J. S. Thompson, S. M. Walters, Digital Signal
Processor: Architecture and Performance, Bell System Technical Journal, 60 (7) (1981), pp. 1449-1462.
[6] Ko-Chi Kuo, Chi-Wen Chou, Low Power And High Speed Multiplier Design With Row Bypassing And Parallel
Architecture, Microelectronics Journal, 41 (2010), pp. 639-650.
[7] Constantinos Efstathiou, N. Moshopolous, N. Axelos, K. Pekmestzi, Efficient Modulo 2^n+1 Multiply And
Multiply-Add Units Based On Modified Booth Encoding, Integration, the VLSI Journal, 47 (2014), pp. 140-147.
[8] Manas Ranjan Meher, Ching Chuen Jong, and Chip-Hong Chang, “A High Bit Rate Serial-Serial Multiplier With
On-the-Fly Accumulation by Asynchronous Counters”, IEEE trans. On VLSI systems, Vol. 19, No. 10, pp. 1733-
1745, October, 2011.
[9] Bharati Krsna Tirthaji, V. S. Agrawala, “Vedic Mathematics”, 13th Edition, Motilal Banarsidass, 2010.
[10] P. Saha, A. Banerjee, A. Dandapat, and P. Bhattacharyya, “ASIC design of a high speed low power circuit for
factorial calculation using ancient Vedic mathematics”, ELSEVIER Microelectronics Journal, vol. 42, issue 12, pp.
1343-1352, December, 2011.
[11] MD. Belal Rashid, Balaji B.S and Prof. M.B. Anandaraju, “VLSI Design and Implementation of Binary Multiplier
based on UrdhvaTiryagbhyam Sutra with reduced Delay and Area”, International Journal of Engineering Research
and Technology, vol. 6, no. 2, pp. 269-278, March, 2013.
[12] Rizwan Mudassir, Mohab Anis, and Javid Jaffari, “Switching Activity Reduction in Low Power booth Multiplier”,
IEEE Symposium on Circuits and Systems, Seattle, vol. 1, pp. 3306-3309, May, 2008.
BIOGRAPHIES OF AUTHORS
Chaitanya CVS received his Bachelor's degree in Electronics and Communication Engineering in
2006 from JNTU, Hyderabad and his MS degree in VLSI-CAD from Manipal University in 2007.
In 2010, he started his career as Assistant Professor in the School of Information Sciences, Manipal.
Currently, he is pursuing a Ph.D. at Manipal University. His research interests include High Performance
Computer Arithmetic, Advanced Computer Architecture, Low-power VLSI Design, Electronic
Design Automation, and Parallel Algorithms/Architectures.
Dr. C Sundaresan completed his Bachelor's degree in Electronics and Communication in 2000
from Madurai Kamaraj University, his MS degree in VLSI CAD in 2003 from Manipal
University and his PhD in 2018 from Manipal Academy of Higher Education. He started his
career as an R & D engineer at Aplab Ltd. Currently he is working as Assistant Professor in the
School of Information Sciences. His research interests include Computer Arithmetic,
Low-Power VLSI Design, Logic Synthesis and Static Timing Analysis.
Dr. P. R. Venkateswaran obtained his bachelor’s degree in Electronics and Instrumentation
Engineering from National Engineering College, Kovilpatti in 1998 and Masters in Instrumentation
and Control Engineering from Technical Teachers’ Training Institute, Chandigarh in 2002. He
completed his doctoral research in 2008 from Manipal University, Manipal. He started his career as
teaching faculty at Sethu Institute of Technology, Madurai and continued his teaching career with
Technical Teachers’ Training Institute, Chandigarh and later at Manipal Institute of Technology,
Manipal. Presently, he is working as Senior Engineer (Control and Instrumentation) at Welding
Research Institute, BHEL, Tiruchirappalli and is associated in the areas of Welding Automation
and Welding Power Sources. His areas of interest are linear Control theory, Electronic
Instrumentation and Soft Computing Techniques. He has been a reviewer for journals like IEEE
SMC, Elsevier, AMSE etc. He is a member of professional bodies of ISTE, IWS and IE.
Dr. Keerthana Prasad is working as Professor in School of Information Sciences, a constituent
institution of Manipal University. Her research interests are image analysis and its applications in
medicine and high performance computing approach for image processing.