Fig 8 - uploaded by Sergio Bampi
Example of an 8-bit-wide Modified Booth multiplication.

Source publication
Conference Paper
In this work, we present a design of a radix-2^m Hybrid array multiplier using Carry Save Adder (CSA) circuits in the partial product lines in order to speed up the carry propagation along the array. The Hybrid multiplier architecture was previously presented in the literature using Ripple Carry Adders (RCA) in the partial product lines. In our work...
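
The carry-save idea mentioned in the abstract can be sketched in a few lines. This is only an illustrative bit-level model, not the circuit from the paper: the function names `carry_save_add` and `sum_with_csa` are hypothetical, and operands are assumed to be non-negative integers. Each CSA stage reduces three operands to a sum word and a carry word without propagating carries; a single carry-propagating addition is needed only at the end.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """One carry-save stage: reduce three operands to (sum word, carry word).

    Each bit position acts as an independent full adder, so no carry
    ripples between positions inside this stage.
    """
    sum_word = a ^ b ^ c                              # per-bit sum
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # per-bit carry, weighted x2
    return sum_word, carry_word


def sum_with_csa(operands: list[int]) -> int:
    """Accumulate operands through CSA stages, then do one final addition."""
    s, c = 0, 0
    for x in operands:
        # Invariant: s + c equals the sum of all operands consumed so far.
        s, c = carry_save_add(s, c, x)
    return s + c  # the only carry-propagating addition


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))
    print(sum_with_csa([13, 7, 200, 1]))
```

In an array multiplier, the operands fed to such stages would be the partial product rows; a ripple-carry design instead propagates carries across every row.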

Contexts in source publication

Context 1
... sign extension technique is used in order to keep the regularity of the operation. Figure 8 shows the Modified Booth multiplier architecture using carry save adders in the partial product lines. As can be observed in Figure 8, for an 8-bit architecture, 4 operand circuits are necessary to calculate the partial product terms. ...
Context 2
... 8 shows the Modified Booth multiplier architecture using carry save adders in the partial product lines. As can be observed in Figure 8, for an 8-bit architecture, 4 operand circuits are necessary to calculate the partial product terms. These circuits are composed of an encoder and a multiplexer, which produces the multiplicand term according to the 3-bit group in the multiplier term (MR). ...
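
As a rough software model of the encoder and multiplexer step described above, the sketch below applies the standard radix-4 (Modified Booth) recoding, which may differ in detail from the circuit in the paper. The function names are hypothetical, and the multiplier is assumed to fit in `width` bits of two's complement. Each overlapping 3-bit group of the multiplier selects one multiple of the multiplicand from {-2, -1, 0, +1, +2}, so an 8-bit multiplier yields 4 partial products.

```python
def booth_radix4_digits(multiplier: int, width: int = 8) -> list[int]:
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns width/2 digits, least significant first, each in {-2,...,+2}.
    """
    if width % 2:
        raise ValueError("width must be even")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    padded = bits << 1                      # implicit 0 to the right of the LSB
    digits = []
    for i in range(0, width, 2):
        group = (padded >> i) & 0b111       # overlapping 3-bit group
        low = group & 1
        mid = (group >> 1) & 1
        high = (group >> 2) & 1
        digits.append(low + mid - 2 * high) # encoder: group -> digit
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Sum the selected multiples, each weighted by a power of 4."""
    digits = booth_radix4_digits(multiplier, width)
    # The "multiplexer" picks d * multiplicand; 4**i is a 2*i-bit left shift.
    return sum(d * multiplicand * (4 ** i) for i, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(7))   # 4 digits for an 8-bit multiplier
    print(booth_multiply(-25, 7))
```

For the multiplier 7 (00000111), the groups recode to the digits -1 and +2 followed by two zeros, that is, 7 = -1 + 2·4.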

Similar publications

Article
Inspired by the biology of human tactile perception, a hardware neuromorphic approach is proposed for spiking model of mechanoreceptors to encode the input force. In this way, a digital circuit is designed for a slowly adapting type I (SA-I) and fast adapting type I (FA-I) mechanoreceptors to be implemented on a low-cost digital hardware, such as f...

Citations

... This architecture leads to an improvement in performance at the cost of an area penalty. In [12], another approach for radix-2^m is presented by using carry save adder (CSA) in the partial products, to improve the corresponding modified Booth circuits. Dedicated 16 and 256 radices by using CSA in the partial products are presented in [27], which improve the generic solutions presented in [12,13]. ...
... In [12], another approach for radix-2^m is presented by using carry save adder (CSA) in the partial products, to improve the corresponding modified Booth circuits. Dedicated 16 and 256 radices by using CSA in the partial products are presented in [27], which improve the generic solutions presented in [12,13]. Finally, a reconfigurable High-radix Booth multiplier for video- and image-processing applications is presented in [15]; however, this is a ROM-based architecture. ...
Article
The multiplication of two signed inputs, , can be accelerated by using the iterative Booth algorithm. Although high radix multipliers require summing a smaller number of partial products, and consume less power, its performance is restricted by the generation of the required hard multiples of B ( terms). Mixed radix architectures are presented herein as a method to exploit the use of several radices. In order to implement efficient multipliers, we propose to overlap the computation of the terms for higher radices with the addition of the partial products associated to lower radices. Two approaches are presented which have different advantages, namely a combinatory design and a synchronous design. The best solutions for the combinatory mixed radix multiplier for bits require and less area and delay in comparison to its counterpart radix-4 multiplier, whereas the synchronous solution for bits is almost smaller in comparison with the combinatory solution, although at the cost of about slowdown. Moreover, we propose to extend this technique to further improve the multipliers for residue number systems. Experimental results demonstrate that best proposed modulo and multiplier designs for the same width, bits, provide an Area-Delay-Product similar for the case of the combinatory approach and reduction for the synchronous design, when compared to their respective counterpart radix-4 solutions.
... In 2005, Fonseca, M.; da Costa, E. et al. presented a design of a radix-2^m hybrid array multiplier to handle operands in 2's-complement form by using a Carry Save Adder in each partial product line. The results showed that the multiplier architecture with CSA gives better performance in terms of area, speed and power consumption as compared to the architecture with RCA [10]. Further, in 2008, Hasan Krad and Aws Yousif Al-Taie worked on the performance analysis of a 32-bit unsigned data multiplier with CLA logic and a 32-bit multiplier with an RCA using VHDL. ...
Article
In this paper, the designs of two different array multipliers are presented, one using carry-look-ahead (CLA) logic for the addition of partial product terms and another introducing a Carry Save Adder (CSA) in the partial product lines. The multipliers presented in this paper were all modeled using VHDL (Very High Speed Integrated Circuit Hardware Description Language) for 32-bit unsigned data. The comparison is done on the basis of three performance parameters, i.e. area, speed and power consumption. Designing an integrated circuit that is efficient in terms of area, power and speed has become a challenging task in the modern VLSI design field. Previously in the literature, a performance analysis was carried out between a multiplier using a Ripple Carry Adder (RCA) and one using CLA. In this work, the same multiplier is designed using CSA logic and its performance is compared with that of the multiplier designed using CLA logic. The multiplier with CSA gives better results in terms of speed (78.3% improvement), area (reduced by 4.2%) and power consumption (decreased by 1.4%).
Article
The extended GCD (XGCD) calculation, which computes Bézout coefficients ba, bb such that ba ∗ a0 + bb ∗ b0 = GCD(a0, b0), is a critical operation in many cryptographic applications. In particular, large-integer XGCD is computationally dominant for two applications of increasing interest: verifiable delay functions that square binary quadratic forms within a class group and constant-time modular inversion for elliptic curve cryptography. Most prior work has focused on fast software implementations. The few works investigating hardware acceleration build on variants of Euclid’s division-based algorithm, following the approach used in optimized software. We show that adopting variants of Stein’s subtraction-based algorithm instead leads to significantly faster hardware. We quantify this advantage by performing a large-integer XGCD accelerator design space exploration comparing Euclid- and Stein-based algorithms for various application requirements. This exploration leads us to an XGCD hardware accelerator that is flexible and efficient, supports fast average and constant-time evaluation, and is easily extensible for polynomial GCD. Our 16nm ASIC design calculates 1024-bit XGCD in 294ns (8× faster than the state-of-the-art ASIC) and constant-time 255-bit XGCD for inverses in the field of integers modulo the prime 2^255 − 19 in 85ns (31× faster than state-of-the-art software). We believe our design is the first high-performance ASIC for the XGCD computation that is also capable of constant-time evaluation. Our work is publicly available at https://github.com/kavyasreedhar/sreedhar-xgcd-hardware-ches2022.
Conference Paper
Increasing demand for the mobile, low energy systems has laid emphasis on the development of low power processors. Low power design has to be incorporated into fundamental computation units, such as multipliers. The optimization of the energy-delay product in such low power multipliers will enable energy efficient computation. This study proposes a power estimation tool to analyze different array multiplier architectures, which are most commonly used in such applications. Gate level library design parameters are utilized to derive energy-delay performance for any given set of input vector patterns, and multiplier size. Vector and size dependent factors are therefore clearly identified. Examples are provided from carry save array multiplier (CSAM) and ripple carry array multiplier (RCAM) to demonstrate the capabilities for the tool.
Article
In this paper, we propose a partitioning and gating technique for the design of a high performance and low-power multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams such as zero, repeated values or repeated bit combinations are detected and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce dynamic power consumption in the 2D convolution operations up to 33%.
Conference Paper
In this paper, we propose a neighborhood dependent approach (NDA) for the design of a high performance and low power radix-4 Booth multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams such as zero, repeated values or repeated bit combinations are detected and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching in arithmetic units and data buses. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce operations and switching activities in the 2D convolution operations by up to 46% of the switching activity rate, which results in significant power reduction with low hardware overhead.