Overview of the architecture of the proposed high-speed unsigned 32-bit multiplier design.

Source publication

A high-speed unsigned 32-bit multiplier based on booth-encoder and wallace-tree modifications

Conference Paper

Oct 2014

The delay of the multiplier plays a critical role in many high-speed implementations and processors, such as RISC, DSP, and image-processing cores. In this paper, an unsigned 32-bit multiplier design is proposed that aims to achieve the best timing performance with an appropriate area. The proposed architecture consists of a modified Radix-4 Bo...
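The abstract is cut off before it describes the Booth-encoder modifications, so the sketch below shows only conventional radix-4 (modified) Booth recoding as a point of reference, not the paper's design. Because the multiplier operand is unsigned, it is assumed to be zero-extended to 34 bits so that the top group never reads bit 31 as a sign bit; this yields 17 digits in {−2, −1, 0, +1, +2}. The function names are hypothetical, and the partial products are summed with ordinary integer addition where hardware would use a Wallace tree.

```python
def booth_radix4_digits(y: int, width: int = 32) -> list[int]:
    """Recode an unsigned `width`-bit multiplier into radix-4 Booth digits.

    Each digit comes from the overlapping triplet (y[2i+1], y[2i], y[2i-1]),
    with y[-1] = 0. The operand is zero-extended to an even width of at
    least width + 1 bits, so an unsigned value is never treated as negative.
    """
    if not 0 <= y < (1 << width):
        raise ValueError("multiplier out of range")
    ext = width + 1 + ((width + 1) % 2)   # 32 -> 34 bits -> 17 digits
    digits = []
    prev = 0                               # implicit bit y[-1]
    for i in range(0, ext, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev) # Booth digit in {-2, ..., +2}
        prev = b1
    return digits


def booth_multiply(x: int, y: int, width: int = 32) -> int:
    """Multiply two unsigned ints by summing radix-4 Booth partial products."""
    if not 0 <= x < (1 << width):
        raise ValueError("multiplicand out of range")
    product = 0
    for k, d in enumerate(booth_radix4_digits(y, width)):
        # Each partial product is 0, +/-x, or +/-2x, shifted left by 2k bits.
        # In hardware this is a mux plus optional inversion; here, plain ints.
        product += (d * x) << (2 * k)
    return product
```

For example, `booth_radix4_digits(0xFFFFFFFF)` recodes 2^32 − 1 as −1 + 1·4^16, so only two of the 17 partial products are nonzero.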

Figure 10: Dataflow diagram of different CNN implementations on...

Figure 14: NoC traffic of different CNN implementations with single...

Number of best-performance instances of different CNN implementations on...

RISC-NN: Use RISC, NOT CISC as Neural Network Hardware Infrastructure

Preprint

Mar 2021

Neural networks (NNs) have proven to be powerful tools for analyzing big data. However, traditional CPUs cannot achieve the desired performance and/or energy efficiency for NN applications. Therefore, numerous NN accelerators have been used or designed to meet these goals. These accelerators all fall into three categories: GPGPUs, ASIC NN Accelera...

Optimizing Encoder and Decoder Blocks for a Power-Efficient Radix-4 Modified Booth Multiplier

Conference Paper

Aug 2021

Application of ameliorated Harris Hawks optimizer for designing of low-power signed floating-point MAC architecture

Article

Jul 2021
NEURAL COMPUT APPL

The recently established Harris Hawks optimization (HHO) algorithm naturally searches for an optimum solution in the global search space without getting trapped by premature convergence. However, the exploitation phase of the current HHO algorithm is weak. This research proposes an improved version, named the ameliorated Harris Hawks optimizer algorithm, which combines HHO with Particle Swarm Optimization (PSO) to solve various optimization problems, such as nonlinear, non-convex, and highly constrained engineering design problems. PSO is used to improve the exploitation phase of the existing HHO algorithm, and the algorithm's performance is tested on the CEC2005, CEC2017, and CEC2018 benchmark problems. Discrete algorithms such as FFT, convolution, and image-processing algorithms also use the multiply and accumulate (MAC) unit as a critical component. The efficiency of a MAC depends mainly on its operating speed, power dissipation, and chip area, along with the complexity of the circuit. In this paper, a power-efficient signed floating-point MAC (SFMAC) is proposed using a universal compressor-based multiplier (UCM). Instead of a complex architecture, a simple multiplexer-based circuit is used to produce a signed floating-point output. The 8 × 8 SFMAC accepts an 8-bit mantissa and a 3-bit exponent; therefore, its input range is −(7.96875)₁₀ to +(7.96875)₁₀. The proposed architecture is designed and implemented with the Cadence Spectre tool in GPDK 90 nm and TSMC 130 nm technologies. The analysis shows that the proposed SFMAC architecture consumes less power than recent MAC architectures reported in the literature.
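The abstract says PSO is used to strengthen HHO's exploitation phase but does not give the hybrid update rules. For orientation only, here is a minimal global-best PSO with an inertia weight, minimizing the sphere function. It is plain PSO, not the ameliorated HHO, and the parameter values (w = 0.7, c1 = c2 = 1.5), the search bounds, and the test function are all assumptions.

```python
import random


def sphere(x: list[float]) -> float:
    """Simple convex benchmark: f(x) = sum(x_i^2), minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(f, dim: int, n_particles: int = 30, iters: int = 200,
        lo: float = -5.0, hi: float = 5.0, w: float = 0.7,
        c1: float = 1.5, c2: float = 1.5, seed: int = 0):
    """Global-best PSO with inertia weight; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The c1 and c2 terms are the part that the hybrid reportedly borrows for exploitation: they pull each candidate toward the best points found so far.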

A MUX based signed-floating-point MAC architecture using UCM algorithm

Article

Jul 2020

Digital-system algorithms such as FFT, convolution, and image-processing algorithms deploy the Multiply and Accumulate (MAC) unit as a key component. The efficiency of a MAC typically depends on its operating speed, power dissipation, and chip area, along with the complexity of the circuit. In this research paper, a power-delay-efficient signed-floating-point MAC (SFMAC) is proposed using a Universal Compressor based Multiplier (UCM). Instead of a complex architecture, a simple multiplexer-based circuit is used to produce a signed floating-point output. The 8 × 8 SFMAC accepts an 8-bit mantissa and a 3-bit exponent; therefore, its input range is −(7.96875)₁₀ to +(7.96875)₁₀. The proposed architecture is designed and implemented with the Cadence Spectre tool in GPDK 90 nm and TSMC 130 nm CMOS, and the results show it to be power- and delay-efficient.
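The abstract quotes the input range ±(7.96875)₁₀ without stating the operand encoding. One reading that reproduces the bound, and it is only an assumption, is a sign bit plus an 8-bit magnitude with 3 integer and 5 fractional bits, so the largest magnitude is 111.11111₂ = 255/32 = 7.96875. The role of the 3-bit exponent is not modeled. The sketch also shows, hypothetically, how a 2:1 selection driven by the XOR of the sign bits can choose between adding and subtracting an unsigned product magnitude.

```python
from fractions import Fraction

FRAC_BITS = 5   # assumed: 8-bit magnitude = 3 integer + 5 fractional bits
MAG_BITS = 8


def decode(sign: int, mag: int) -> Fraction:
    """Decode an assumed sign-magnitude fixed-point operand exactly."""
    if sign not in (0, 1) or not 0 <= mag < (1 << MAG_BITS):
        raise ValueError("operand out of range")
    value = Fraction(mag, 1 << FRAC_BITS)
    return -value if sign else value


def mac(acc: Fraction, a: tuple[int, int], b: tuple[int, int]) -> Fraction:
    """Return acc + a*b for (sign, magnitude) operands."""
    (sa, ma), (sb, mb) = a, b
    # Unsigned magnitude multiply; the product carries 2 * FRAC_BITS fraction bits
    prod_mag = Fraction(ma * mb, 1 << (2 * FRAC_BITS))
    # XOR of the sign bits acts as the mux select: subtract if signs differ
    return acc - prod_mag if sa ^ sb else acc + prod_mag


MAX_MAG = decode(0, (1 << MAG_BITS) - 1)
print(float(MAX_MAG))  # 7.96875
```

With this encoding, `mac(Fraction(0), (0, 32), (1, 64))` computes (+1.0) × (−2.0) and returns −2.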

An Efficient Design Approach of ROI Based DWT Using Vedic and Wallace Tree Multiplier on FPGA Platform

Article

Aug 2019
IJECE

In digital image processing, compression is used to improve visual perception and reduce storage cost. Reconstructing medical images in hardware, especially the region of interest (ROI), from lossy image compression is a challenging task. In this paper, ROI-based discrete wavelet transform (DWT) designs are built using, separately, a Wallace-tree multiplier (WM) and a modified Vedic multiplier (VM). A lifting-based DWT is used for ROI compression and reconstruction, and the 9/7 filter coefficients are multiplied using the WM and the VM. The Wallace-tree multiplier operates in parallel with a pipelined architecture, which optimizes hardware resources, and the 8×8 Vedic multiplier design improves ROI reconstruction image quality and computation speed. To compare the ROI-based DWT-WM and DWT-VM on the FPGA platform, PSNR and MSE are calculated for different brain MRI images, and hardware results, including area, delay, maximum operating frequency, and power, are tabulated. The proposed model is designed on the Xilinx platform in Verilog HDL, simulated with ModelSim, and implemented on an Artix-7 FPGA device.
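The abstract names a lifting-based DWT with 9/7 filter coefficients but gives neither the coefficients nor the boundary handling. The sketch below assumes the standard CDF 9/7 lifting factorization of Daubechies and Sweldens (coefficients rounded to 10 significant digits) with symmetric boundary extension, computed in floating point. The hardware multiplies these coefficients with the Wallace-tree or Vedic multiplier in fixed point, which is not modeled, and the final scaling by K follows one common software convention; other conventions exist.

```python
# CDF 9/7 lifting coefficients (assumed standard values) and scaling constant
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def _predict(s, d, c):
    """d[i] += c * (s[i] + s[i+1]); mirrors s at the right edge."""
    m = len(s)
    for i in range(m):
        right = s[i + 1] if i + 1 < m else s[i]
        d[i] += c * (s[i] + right)


def _update(s, d, c):
    """s[i] += c * (d[i-1] + d[i]); mirrors d at the left edge."""
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[0]
        s[i] += c * (left + d[i])


def dwt97_forward(x):
    """One level of the 1-D 9/7 lifting DWT: returns (low-pass, high-pass)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("even length >= 2 required")
    s, d = list(x[0::2]), list(x[1::2])   # split into even and odd samples
    _predict(s, d, ALPHA)
    _update(s, d, BETA)
    _predict(s, d, GAMMA)
    _update(s, d, DELTA)
    return [v * K for v in s], [v / K for v in d]


def dwt97_inverse(s, d):
    """Undo the lifting steps in reverse order with negated coefficients."""
    s, d = [v / K for v in s], [v * K for v in d]
    _update(s, d, -DELTA)
    _predict(s, d, -GAMMA)
    _update(s, d, -BETA)
    _predict(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d               # merge even and odd samples
    return x
```

A constant input gives high-pass coefficients that are essentially zero and low-pass coefficients near √2, and the inverse reconstructs any input up to floating-point rounding. Because every lifting step is undone exactly, reconstruction does not depend on how precisely the coefficients are rounded, which is one reason lifting suits fixed-point hardware.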

UCM: A Novel Approach for Delay Optimization

Article

Apr 2019

In digital signal processing applications such as graphics and computation systems, multiplication is one of the primary operations. A multiplier is a key component of digital systems such as Multiply-Accumulate (MAC) units and various FFT algorithms. The efficiency of a multiplier depends mainly on its operating speed and power dissipation, along with its complexity. This paper presents the Universal Compressor based Multiplier (UCM), which achieves high-speed operation with comparable power dissipation. The UCM design is analyzed with the Cadence Spectre tool in 90 nm CMOS technology and is then implemented on a Nexys-4 Artix-7 FPGA board. The UCM demonstrates a significant improvement in delay, which this paper explores.

A 0.75-V 32-MHz 181-µW SOTB-65nm Floating-point Twiddle Factor Using Adaptive CORDIC

Conference Paper

Feb 2019

VLSI Design of Floating-Point Twiddle Factor Using Adaptive CORDIC on Various Iteration Limitations

Conference Paper

Sep 2018

An efficient floating-point FFT twiddle factor implementation based on adaptive angle recoding CORDIC algorithm

Article

Dec 2017

In this paper, a single-precision floating-point FFT twiddle factor (TF) implementation is proposed. The architecture is based on the Adaptive Angle Recoding CORDIC (AARC) algorithm. The TF design was built and verified on an Altera Stratix IV FPGA and through 65 nm SOTB synthesis. The FPGA implementation achieved a maximum frequency of 103.9 MHz, a throughput of 16.966 mega-samples per second (MSps), and a resource utilization of 7,747 ALUTs and 625 registers. The SOTB synthesis result has 16,858 standard cells on an area of 298 × 291 μm², a maximum frequency of 166 MHz, and a throughput of 27.107 MSps. The accuracy results were a mean-square error (MSE) of 1.133 × 10⁻¹⁰ and a maximum error of about 26 parts per million (ppm).
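AARC reduces the number of CORDIC iterations by recoding the rotation angle, and the abstract does not give its recoding rules. The sketch below therefore shows only conventional rotation-mode CORDIC, with the fixed elementary angles arctan(2^−i), computing a twiddle factor W_N^k = e^(−j2πk/N) in double precision. The function names and the iteration count of 32 are assumptions, and the quadrant folding is one simple way to keep the angle inside CORDIC's convergence range (about ±1.74 rad).

```python
import math


def cordic_cos_sin(theta: float, iters: int = 32) -> tuple[float, float]:
    """Rotation-mode CORDIC with fixed angles atan(2^-i): returns (cos, sin)."""
    # Reduce to [-pi, pi], then fold into [-pi/2, pi/2] by rotating through pi
    theta = math.remainder(theta, 2 * math.pi)
    flip = False
    if theta > math.pi / 2:
        theta -= math.pi
        flip = True
    elif theta < -math.pi / 2:
        theta += math.pi
        flip = True
    # Pre-scale by 1/K_n, where K_n = prod sqrt(1 + 2^-2i) is the CORDIC gain
    gain = 1.0
    for i in range(iters):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iters):
        sigma = 1.0 if z >= 0 else -1.0   # rotate toward the residual angle
        # Shift-and-add micro-rotation (uses the old x and y on the right)
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i) # a small lookup table in hardware
    return (-x, -y) if flip else (x, y)


def twiddle(k: int, n: int, iters: int = 32) -> complex:
    """FFT twiddle factor W_n^k = exp(-2j*pi*k/n), computed with CORDIC."""
    c, s = cordic_cos_sin(-2 * math.pi * k / n, iters)
    return complex(c, s)
```

Each iteration resolves roughly one more bit of the angle, so a fixed-sequence CORDIC needs about as many iterations as output bits; cutting that count is what angle recoding schemes such as AARC target.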

A floating-point FFT Twiddle Factor Implementation Based on Adaptive Angle Recoding CORDIC

Conference Paper

Jan 2017

In this paper, a single-precision floating-point FFT Twiddle Factor (TF) implementation is proposed. The architecture is based on the Adaptive Angle Recoding CORDIC (AARC) algorithm. The TF design is built and verified on an Altera Stratix IV FPGA and through 65 nm SOTB synthesis. The FPGA implementation achieves a maximum frequency of 103.9 MHz, a throughput of 16.966 mega-samples per second (MSps), and a resource utilization of 7,747 ALUTs and 625 registers. The SOTB synthesis result has 16,858 standard cells on an area of 86,718 μm², a maximum frequency of 166 MHz, and a throughput of 27.107 MSps. The accuracy results are a mean-square error (MSE) of 1.133 × 10⁻¹⁰ and a maximum error ratio of about 26 parts per million (ppm).

Design of an algorithmic Wallace multiplier using high speed counters

Conference Paper

Dec 2015

Wallace tree multipliers provide a power-efficient strategy for high-speed multiplication. Using high-speed 7:3 counters in the Wallace tree reduction can further improve multiplier speed. This paper presents an algorithmic approach for constructing counter-based Wallace tree multipliers. The proposed algorithm can be used to implement efficient counter-based Wallace multipliers of any size, suitable for FPGA or ASIC synthesis tools. The designs are synthesized in Synopsys Design Compiler using 90 nm CMOS technology. A detailed comparison of traditional and counter-based Wallace multipliers shows that the counter-based Wallace multiplier is up to 22% faster than the traditional Wallace multiplier.
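The abstract does not spell out the construction algorithm, so the following is a hypothetical sketch of a column-wise counter-based reduction. Each stage groups up to seven same-weight bits into a 7:3 counter (falling back to a 3:2 full adder for groups of three), emits the count bits into columns j, j+1, and j+2, and repeats until every column holds at most two bits, which a final carry-propagate adder sums. The grouping policy is an assumption; the sketch checks arithmetic correctness only and models no delay.

```python
from collections import defaultdict


def counter(bits: list[int]) -> list[tuple[int, int]]:
    """Count the ones in 3..7 same-weight bits.

    Groups of 4..7 use a 7:3 counter (unused inputs tied to 0); a group of
    3 uses a 3:2 full adder. Returns (bit, column_offset) pairs.
    """
    total = sum(bits)
    width = 2 if len(bits) <= 3 else 3
    return [((total >> k) & 1, k) for k in range(width)]


def wallace_multiply(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Multiply unsigned n-bit ints via counter-based Wallace reduction.

    Returns (product, number_of_reduction_stages).
    """
    # Partial-product bit matrix: column j holds every AND term of weight 2^j
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    stages = 0
    while any(len(c) > 2 for c in cols.values()):
        nxt = defaultdict(list)
        for j, bits in cols.items():
            k = 0
            while len(bits) - k >= 3:
                group = bits[k:k + 7]          # up to 7 bits per counter
                for bit, off in counter(group):
                    nxt[j + off].append(bit)
                k += len(group)
            nxt[j].extend(bits[k:])            # 0-2 leftover bits pass through
        cols = nxt
        stages += 1
    # Final carry-propagate addition of the (at most) two remaining rows
    product = sum(bit << j for j, bits in cols.items() for bit in bits)
    return product, stages
```

Every counter preserves the weighted sum of its inputs while emitting fewer bits than it consumes, so the loop always terminates and the final two rows add up to a·b.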
