Example of an 8-bit wide Modified Booth multiplication.

Source publication

Design of a Radix-2^m Hybrid Array Multiplier Using Carry Save Adder

Conference Paper

Oct 2005

In this work, we present a design of a radix-2^m Hybrid array multiplier using a Carry Save Adder (CSA) circuit in the partial product lines in order to speed up the carry propagation along the array. The Hybrid multiplier architecture was previously presented in the literature using Ripple Carry Adders (RCA) in the partial product lines. In our work...

Context 1

... sign extension technique is used in order to keep the regularity of the operation. Figure 8 shows the Modified Booth multiplier architecture using carry save adders in the partial product lines. As can be observed in Figure 8, for an 8-bit architecture, 4 operand circuits are necessary to calculate the partial product terms. ...

Context 2

... 8 shows the Modified Booth multiplier architecture using carry save adders in the partial product lines. As can be observed in Figure 8, for an 8-bit architecture, 4 operand circuits are necessary to calculate the partial product terms. These circuits are composed of an encoder and a multiplexer, which produce the multiplicand term according to a 3-bit group of the multiplier term (MR). ...
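
The recoding described in this context can be illustrated in software. The following is a minimal sketch of radix-4 (Modified) Booth recoding, assuming two's-complement operands; it is not the paper's circuit. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical. Each overlapping 3-bit group of the multiplier selects one of the multiples {-2, -1, 0, +1, +2} of the multiplicand, so an 8-bit multiplier yields 4 partial products.

```python
def booth_radix4_digits(multiplier: int, width: int = 8) -> list[int]:
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; there are width // 2 digits,
    least significant first.
    """
    if width % 2:
        raise ValueError("width must be even for radix-4 recoding")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    shifted = bits << 1                      # append the implicit 0 below the LSB
    digits = []
    for i in range(0, width, 2):
        group = (shifted >> i) & 0b111       # overlapping 3-bit group
        b_low = group & 1                    # y[2k-1]
        b_mid = (group >> 1) & 1             # y[2k]
        b_high = (group >> 2) & 1            # y[2k+1]
        digits.append(b_low + b_mid - 2 * b_high)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Sum the Booth partial products; each digit k carries weight 4**k."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(-3))   # four digits for an 8-bit multiplier
    print(booth_multiply(57, -3))
```

In hardware, the digit selection corresponds to the encoder and the multiple selection to the multiplexer; here both are folded into plain integer arithmetic for clarity.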

A cross section of the glabrous skin which shows individual type of...

The model for Merkel Cells (SA-I) and Meissner's Corpuscle (FA-I)...

TABLE 3 | Device utilization summary of the ZedBoard.

Scheduling diagram for spiking part of the (A) Merkel cell (SA-I), (B)...

The time response of the Merkel Cells (SA-I) mechanoreceptor in mV. (A)...

A Digital Hardware Realization for Spiking Model of Cutaneous Mechanoreceptor

Article

Jun 2018

Inspired by the biology of human tactile perception, a hardware neuromorphic approach is proposed for spiking model of mechanoreceptors to encode the input force. In this way, a digital circuit is designed for a slowly adapting type I (SA-I) and fast adapting type I (FA-I) mechanoreceptors to be implemented on a low-cost digital hardware, such as f...

Method for Designing Efficient Mixed Radix Multipliers

Article

Oct 2014
CIRC SYST SIGNAL PR

The multiplication of two signed inputs, , can be accelerated by using the iterative Booth algorithm. Although high radix multipliers require summing a smaller number of partial products, and consume less power, their performance is restricted by the generation of the required hard multiples of B ( terms). Mixed radix architectures are presented herein as a method to exploit the use of several radices. In order to implement efficient multipliers, we propose to overlap the computation of the terms for higher radices with the addition of the partial products associated with lower radices. Two approaches are presented which have different advantages, namely a combinatory design and a synchronous design. The best solutions for the combinatory mixed radix multiplier for bits require and less area and delay in comparison to its counterpart radix-4 multiplier, whereas the synchronous solution for bits is almost smaller in comparison with the combinatory solution, although at the cost of about slowdown. Moreover, we propose to extend this technique to further improve the multipliers for residue number systems. Experimental results demonstrate that best proposed modulo and multiplier designs for the same width, bits, provide an Area-Delay-Product similar for the case of the combinatory approach and reduction for the synchronous design, when compared to their respective counterpart radix-4 solutions.

Performance Analysis of 32Bit Array Multiplier with a Carry Save Adder and with a Carry-Look-Ahead Adder

Article

In this paper, the designs of two different array multipliers are presented: one using carry-look-ahead (CLA) logic for the addition of partial product terms, and another introducing a Carry Save Adder (CSA) in the partial product lines. The multipliers presented in this paper were all modeled using VHDL (Very High Speed Integrated Circuit Hardware Description Language) for 32-bit unsigned data. The comparison is based on three performance parameters: area, speed, and power consumption. Designing an integrated circuit that is efficient in terms of area, power, and speed has become a challenging task in modern VLSI design. Previously in the literature, performance analysis was carried out between multipliers using a Ripple Carry Adder (RCA) and using CLA. In this work, the same multiplier is designed using CSA logic and its performance is compared with that of the multiplier designed using CLA logic. The multiplier with CSA gives better results in terms of speed (78.3% improvement), area (reduced by 4.2%), and power consumption (decreased by 1.4%).
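
To illustrate why carry-save addition helps in the partial product lines, here is a minimal bit-level sketch in Python. It is an assumption-laden software model, not the VHDL design from the paper. The names `carry_save_add` and `csa_array_multiply` are hypothetical. Each row compresses three operands into a sum word and a carry word without propagating carries, and a single carry-propagate addition is performed only at the end.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression of non-negative integers: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                            # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, weighted by 2
    return sum_word, carry_word


def csa_array_multiply(x: int, y: int, width: int = 32) -> int:
    """Unsigned multiplication that accumulates partial products in carry-save form."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    sum_word, carry_word = 0, 0
    for i in range(width):
        # Partial product for bit i of the multiplier.
        partial = (x << i) if (y >> i) & 1 else 0
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    # Final carry-propagate addition merges the two words.
    return sum_word + carry_word


if __name__ == "__main__":
    print(csa_array_multiply(123456789, 987654321))
```

The loop keeps the invariant that `sum_word + carry_word` equals the sum of the partial products processed so far, which is the property a hardware carry-save array relies on.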

A Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion

Article

Aug 2022

The extended GCD (XGCD) calculation, which computes Bézout coefficients b_a, b_b such that b_a ∗ a_0 + b_b ∗ b_0 = GCD(a_0, b_0), is a critical operation in many cryptographic applications. In particular, large-integer XGCD is computationally dominant for two applications of increasing interest: verifiable delay functions that square binary quadratic forms within a class group and constant-time modular inversion for elliptic curve cryptography. Most prior work has focused on fast software implementations. The few works investigating hardware acceleration build on variants of Euclid’s division-based algorithm, following the approach used in optimized software. We show that adopting variants of Stein’s subtraction-based algorithm instead leads to significantly faster hardware. We quantify this advantage by performing a large-integer XGCD accelerator design space exploration comparing Euclid- and Stein-based algorithms for various application requirements. This exploration leads us to an XGCD hardware accelerator that is flexible and efficient, supports fast average and constant-time evaluation, and is easily extensible for polynomial GCD. Our 16nm ASIC design calculates 1024-bit XGCD in 294ns (8× faster than the state-of-the-art ASIC) and constant-time 255-bit XGCD for inverses in the field of integers modulo the prime 2^255 − 19 in 85ns (31× faster than state-of-the-art software). We believe our design is the first high-performance ASIC for the XGCD computation that is also capable of constant-time evaluation. Our work is publicly available at https://github.com/kavyasreedhar/sreedhar-xgcd-hardware-ches2022.
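
For readers unfamiliar with the subtraction-based approach, the sketch below shows a textbook binary (Stein-style) extended GCD in Python. It is a plain software illustration, assuming positive inputs, and does not reproduce the paper's hardware algorithm or its constant-time variant. The function name `binary_xgcd` is hypothetical. Only shifts, parity tests, and subtractions are used, with no division by arbitrary values.

```python
def binary_xgcd(a0: int, b0: int) -> tuple[int, int, int]:
    """Return (ba, bb, g) with ba * a0 + bb * b0 == g == gcd(a0, b0).

    Textbook binary extended GCD; inputs must be positive integers.
    """
    if a0 <= 0 or b0 <= 0:
        raise ValueError("inputs must be positive")
    x, y, shift = a0, b0, 0
    # Factor out common powers of two.
    while x % 2 == 0 and y % 2 == 0:
        x //= 2
        y //= 2
        shift += 1
    u, v = x, y
    # Invariants: A*x + B*y == u and C*x + D*y == v.
    A, B, C, D = 1, 0, 0, 1
    while u != 0:
        while u % 2 == 0:
            u //= 2
            if A % 2 == 0 and B % 2 == 0:
                A //= 2
                B //= 2
            else:
                # Adjust the coefficients so both become even before halving.
                A = (A + y) // 2
                B = (B - x) // 2
        while v % 2 == 0:
            v //= 2
            if C % 2 == 0 and D % 2 == 0:
                C //= 2
                D //= 2
            else:
                C = (C + y) // 2
                D = (D - x) // 2
        # Subtract the smaller value from the larger one.
        if u >= v:
            u, A, B = u - v, A - C, B - D
        else:
            v, C, D = v - u, C - A, D - B
    return C, D, v << shift


if __name__ == "__main__":
    ba, bb, g = binary_xgcd(240, 46)
    print(ba, bb, g, ba * 240 + bb * 46 == g)
```

The coefficients returned are valid Bézout coefficients but are not necessarily the same pair that Euclid's algorithm would produce, since Bézout coefficients are not unique.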

PETAM: Power estimation tool for array multipliers

Conference Paper

Dec 2012

Increasing demand for mobile, low-energy systems has placed emphasis on the development of low-power processors. Low-power design has to be incorporated into fundamental computation units, such as multipliers. The optimization of the energy-delay product in such low-power multipliers will enable energy-efficient computation. This study proposes a power estimation tool to analyze different array multiplier architectures, which are most commonly used in such applications. Gate-level library design parameters are utilized to derive energy-delay performance for any given set of input vector patterns and multiplier size. Vector- and size-dependent factors are therefore clearly identified. Examples are provided from the carry save array multiplier (CSAM) and ripple carry array multiplier (RCAM) to demonstrate the capabilities of the tool.

Partitioning and gating technique for low-power multiplication in video processing applications

Article

Nov 2009
MICROELECTRON J

In this paper, we propose a partitioning and gating technique for the design of a high-performance and low-power multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams, such as zero, repeated values, or repeated bit combinations, are detected, and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce dynamic power consumption in the 2D convolution operations by up to 33%.

Design of a radix-4 Booth multiplier with Neighborhood Dependent Approach for video processing applications

Conference Paper

Sep 2007
Conf Proc

In this paper, we propose a neighborhood dependent approach (NDA) for the design of a high-performance and low-power radix-4 Booth multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams, such as zero, repeated values, or repeated bit combinations, are detected, and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching in arithmetic units and data buses. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce operations and switching activities in the 2D convolution operations by up to 46% in switching activity rate, which results in significant power reduction with low hardware overhead.
