Figure 2 - uploaded by Per Gunnar Kjeldsberg
Gate level representation of half-adder and full-adder


Source publication
Conference Paper
We propose an interconnect reorganization algorithm for reduction stages in parallel multipliers. It aims at minimizing power consumption for given static probabilities at the primary inputs. In typical signal processing applications the transition probability varies between the most and least significant bits. The same is the case for individu...

Contexts in source publication

Context 1
... and full-adders are basic elements which are frequently used in parallel multipliers, especially in the partial products reduction stage (Fig. 2). The functional representation of CARRY and SUM is shown in Table 1, where ⊕ represents a boolean XOR, + represents a boolean OR and · represents a boolean AND function. The probability of one at the output of these blocks is a function of the probability of one at the inputs [13] [14]. Static probabilities given in Table 1 can be ...
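Table 1 itself is not reproduced on this page. As a minimal sketch, assuming the standard equations (half-adder: SUM = A ⊕ B, CARRY = A · B; full-adder: SUM = A ⊕ B ⊕ C, CARRY = A · B + C · (A ⊕ B)) and statistically independent inputs, the static output probabilities can be propagated as below. The function names are hypothetical, and correlation introduced by reconvergent fanout inside a real reduction tree is ignored.

```python
from itertools import product


def half_adder_probs(pa: float, pb: float) -> tuple[float, float]:
    """P(SUM=1), P(CARRY=1) for SUM = A xor B, CARRY = A and B (independent inputs)."""
    p_sum = pa * (1 - pb) + pb * (1 - pa)
    p_carry = pa * pb
    return p_sum, p_carry


def full_adder_probs(pa: float, pb: float, pc: float) -> tuple[float, float]:
    """P(SUM=1), P(CARRY=1) for SUM = A xor B xor C, CARRY = A.B + C.(A xor B).

    A.B and A xor B are never true at the same time, so the two CARRY terms add.
    """
    p_xor, p_and = half_adder_probs(pa, pb)
    p_sum = p_xor * (1 - pc) + pc * (1 - p_xor)
    p_carry = p_and + pc * p_xor
    return p_sum, p_carry


def full_adder_probs_brute(pa: float, pb: float, pc: float) -> tuple[float, float]:
    """Cross-check: weight every truth-table row by its probability."""
    p_sum = p_carry = 0.0
    for a, b, c in product((0, 1), repeat=3):
        w = (pa if a else 1 - pa) * (pb if b else 1 - pb) * (pc if c else 1 - pc)
        p_sum += w * (a ^ b ^ c)
        p_carry += w * ((a & b) | (c & (a ^ b)))
    return p_sum, p_carry


print(full_adder_probs(0.5, 0.5, 0.5))  # (0.5, 0.5)
print(full_adder_probs(0.1, 0.3, 0.8))
```

The closed-form version is what an optimizer would evaluate repeatedly; the brute-force enumeration only confirms the algebra.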
Context 2
... path of the Carry-in input of a full-adder (C in Fig. 2) is shorter than those of the other two inputs. A transition on this input will therefore result in less activity. The input with the highest transition activity among the three inputs of the full-adder should therefore be connected to the Carry-in input. For a full-adder with inputs (A, B, C), $\max(\alpha_A, \alpha_B) < \alpha_C$ ...
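The carry-in rule can be sketched as a simple sort over three interchangeable signals. The sketch assumes each signal's transition activity is already known; the helper `transition_activity` uses α = 2p(1-p), which holds only if successive values are temporally independent. All names are hypothetical.

```python
def transition_activity(p: float) -> float:
    """Toggle probability for a signal with static probability p, assuming
    successive values are independent (no temporal correlation)."""
    return 2 * p * (1 - p)


def assign_full_adder_inputs(activities: dict[str, float]) -> dict[str, str]:
    """Map three interchangeable signals to full-adder ports A, B and C.

    The most active signal goes to C (carry-in, the shortest internal path),
    so that max(alpha_A, alpha_B) <= alpha_C.
    """
    if len(activities) != 3:
        raise ValueError("a full adder takes exactly three inputs")
    ordered = sorted(activities, key=activities.get)  # least active first
    return {"A": ordered[0], "B": ordered[1], "C": ordered[2]}


signals = {
    "s1": transition_activity(0.1),  # about 0.18
    "s2": transition_activity(0.3),  # about 0.42
    "s3": transition_activity(0.5),  # 0.5
}
print(assign_full_adder_inputs(signals))  # {'A': 's1', 'B': 's2', 'C': 's3'}
```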

Similar publications

Preprint
Message brokers enable asynchronous communication between data producers and consumers in distributed environments by assigning messages to ordered queues. Message broker systems often provide mechanisms to parallelize tasks between consumers to increase the rate at which data is consumed. The consumption rate must exceed the production rate o...

Citations

... For many natural signals, the static probability of the MSB signal is smaller than that of the LSB signal. In these cases, it can be very beneficial to optimize the circuits by creating different paths for transferring the LSB or MSB to the output of the gates [40,41]. Hence, to minimize the effective capacitance, the switching activity of nodes with higher capacitance can be kept to a minimum. ...
Article
Different digital multipliers have resulted from various algorithms and hardware designs. This article presents a high-performance multiplier based on a novel AND gate and a modified hybrid full adder (FA) cell. The AND gate is designed using the pass transistor logic (PTL) technique and a speed-up transistor, while the FA is based on the transmission gate (TG). The low power, high speed, low power-delay product (PDP), and suitability of both circuits for use in sophisticated structures such as multipliers are confirmed by mathematical relations. The proposed 4-bit array multiplier circuit, including the pad, has a total area of 2.87 mm² and is investigated under different conditions, including VDD, frequency, load capacitance, and process-voltage-temperature (PVT) variations, using the Monte Carlo method (MCM) with the HSPICE tool and 90 nm technology. The efficiency of the multiplier in image processing applications is demonstrated by average improvements of 12.61% and 32.045% in peak signal-to-noise ratio (PSNR) and PDP, respectively, compared to state-of-the-art designs. The overall results confirm the multiplier's suitability for digital signal processors (DSPs).
... Many researchers have proposed low power multiplier architectures by reducing power consumption in the partial product reduction stage (Oskuii, 2007; Ohban, 2002; Ito et al., 2003; Chen et al., 2003). Historically, the partial product reduction stage was implemented using carry save adders based on Wallace or Dadda rules (Parhami, 2010). ...
... The carry save adders used are either Full Adders (FA) or Half Adders (HA). To illustrate this, a 6×6 unsigned multiplier using a modified Dadda reduction tree is shown in Fig. 1 (Oskuii, 2007). Stage 1 is the rearranged 6×6 unsigned partial product array obtained by the partial product generator of a multiplier. ...
... Also, the extra circuitry consumes additional power. Oskuii (2007) proposed a heuristic algorithm to reduce power consumption in the partial product reduction stage based on static probabilities at the primary inputs. At every reduction stage, the bits with the same order of magnitude (bits in a column) are grouped together and connected to the adder cells in a Dadda tree. ...
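For reference, the classic Dadda rule fixes a maximum column height for each reduction stage using the sequence d_1 = 2, d_{j+1} = ⌊1.5·d_j⌋. The sketch below computes those per-stage targets. It illustrates the unmodified rule and may differ from the modified Dadda tree used by Oskuii (2007); the function name is hypothetical. For the 6×6 array the tallest column holds 6 bits, which gives stage targets 4, 3 and 2.

```python
def dadda_heights(max_height: int) -> list[int]:
    """Per-stage target column heights of a classic Dadda reduction.

    Builds d_1 = 2, d_{j+1} = floor(1.5 * d_j) and returns, from the first
    stage to the last, every d_j smaller than the initial tallest column.
    """
    seq = [2]
    while seq[-1] < max_height:
        seq.append(seq[-1] * 3 // 2)
    return [d for d in reversed(seq) if d < max_height]


print(dadda_heights(6))   # [4, 3, 2]  (6x6 multiplier: three stages)
print(dadda_heights(16))  # [13, 9, 6, 4, 3, 2]
```

Within each stage, any bits of the same column may feed any adder input, which is the freedom that the interconnect-ordering methods above exploit.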
Article
In this study we present an energy efficient multiplier design based on effective capacitance minimization. Only the partial product reduction stage in the multiplier is considered in this research. The effective capacitance at a node is defined as the product of capacitance and switching activity at that node. Hence, to minimize the effective capacitance, we decided to ensure that the switching activity of nodes with higher capacitance is kept to a minimum. This is achieved by wiring the higher switching activity signals to nodes with lower capacitance and vice versa, for the 4:2 compressor and adder cells. This reduced the overall switching capacitance, thereby reducing the total power consumption of the multiplier. Power analysis was done by synthesizing our design on a Spartan-3E FPGA. The dynamic power for our 16×16 multiplier was measured as 360.74 mW and the total power as 443.31 mW. This is 17.4% less compared to the most recent design. We also found that our design has the lowest power-delay product compared to the multipliers presented in the literature.
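The wiring rule in this abstract amounts to minimizing Σ α_i·C_j over signal-to-node assignments. If every signal may drive every node (in a real adder cell only interchangeable inputs can be swapped), pairing the highest activities with the lowest capacitances is optimal by the rearrangement inequality. A minimal sketch with hypothetical names and values, cross-checked against brute force:

```python
from itertools import permutations


def switched_capacitance(activities, capacitances, mapping):
    """Sum of alpha_i * C_mapping[i], where signal i drives node mapping[i]."""
    return sum(a * capacitances[n] for a, n in zip(activities, mapping))


def pair_signals_to_nodes(activities, capacitances):
    """Send the most active signals to the least capacitive nodes."""
    signals = sorted(range(len(activities)), key=lambda i: -activities[i])
    nodes = sorted(range(len(capacitances)), key=lambda j: capacitances[j])
    mapping = [0] * len(activities)
    for s, n in zip(signals, nodes):
        mapping[s] = n
    return mapping


def brute_force_minimum(activities, capacitances):
    """Exhaustive check over all assignments (small inputs only)."""
    return min(
        switched_capacitance(activities, capacitances, p)
        for p in permutations(range(len(capacitances)))
    )


alphas = [0.5, 0.1, 0.3]  # hypothetical switching activities
caps = [2.0, 1.0, 3.0]    # hypothetical node capacitances (fF)
mapping = pair_signals_to_nodes(alphas, caps)
print(mapping)  # [1, 2, 0]
print(switched_capacitance(alphas, caps, mapping), brute_force_minimum(alphas, caps))
```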
... Alternative heuristics to determine the configuration of the CSA tree were evaluated including a transition-minimizing scheme suggested by Oskuii, et al. [12]. They select inputs at the upper levels of the CSA tree that are predicted to have the lowest likelihood of seeing a transition. ...
Conference Paper
Booth Encoding is a common technique utilized in the design of high-speed multipliers. These multipliers typically encode just one operand of the multiplier, and this asymmetry results in different power characteristics as each input transitions to the next value in a pipelined design. Relative to the non-encoded input, changes on the Booth-encoded input induce more signal transitions requiring ∼73% more multiplier array energy. This paper proposes low-overhead approaches to take advantage of this asymmetric behavior to reduce the energy of multiplication operations in pipelined SIMD architectures like GPUs. Compiler-based approaches that apply constant or uniform inputs to the Booth-encoded input of the multiplier can save 4.8% of multiplier energy on average. An additional 1.5% savings can be achieved with dynamic detection and steering of uniform inputs.
... Since dynamic power consumption in a CMOS VLSI circuit depends on the number of signal transitions at its capacitive nodes, accurate estimation of the bit-level switching activity at the primary inputs is a key requirement in various power estimation techniques. The input switching activity can subsequently be used directly or indirectly to calculate the number of transitions of all nodes in the circuit [1]. The Dual Bit Type method introduced in [2] aims at characterizing the bit-level switching activity in a data word, using the word-level statistics of data, i.e., mean, μ, variance, σ², and temporal correlation, ρ. The method is based on the assumption that the binary representation of real-world signals can be divided into a few regions, with well-defined switching activity for the bits in each region. ...
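A rough sketch of the DBT region model follows. The breakpoint formulas, BP1 = log2(|μ| + 3σ) and BP0 = log2 σ + log2(√(1 - ρ²) + |ρ|/8), are given in the form commonly cited for [2] and should be verified against that paper. The sign-bit activity arccos(ρ)/π assumes a zero-mean Gaussian signal, and the linear interpolation between the breakpoints is one simple choice. Function names are hypothetical.

```python
import math


def dbt_breakpoints(mu: float, sigma: float, rho: float) -> tuple[float, float]:
    """Return (BP0, BP1) as bit positions, in the commonly cited DBT form."""
    bp1 = math.log2(abs(mu) + 3 * sigma)
    delta_bp0 = math.log2(math.sqrt(1 - rho ** 2) + abs(rho) / 8)
    bp0 = math.log2(sigma) + delta_bp0
    return bp0, bp1


def msb_activity_gaussian(rho: float) -> float:
    """Sign-bit toggle probability of a zero-mean Gaussian signal with
    lag-one correlation rho (arcsine law)."""
    return math.acos(rho) / math.pi


def dbt_bit_activities(n_bits: int, mu: float, sigma: float, rho: float) -> list[float]:
    """Per-bit switching activity, bit 0 = LSB, bit n_bits-1 = sign bit."""
    bp0, bp1 = dbt_breakpoints(mu, sigma, rho)
    a_msb = msb_activity_gaussian(rho)
    activities = []
    for i in range(n_bits):
        if i < bp0:
            activities.append(0.5)  # LSB region: behaves like uniform white noise
        elif i > bp1:
            activities.append(a_msb)  # sign region: bits follow the sign bit
        else:
            # intermediate region: linear interpolation between the two models
            t = (i - bp0) / (bp1 - bp0) if bp1 > bp0 else 1.0
            activities.append(0.5 + t * (a_msb - 0.5))
    return activities


print([round(a, 3) for a in dbt_bit_activities(16, 0.0, 256.0, 0.99)])
```

For an uncorrelated signal (ρ = 0) every bit comes out at 0.5, while a strongly correlated signal shows the low-activity sign region that the multiplier-ordering work exploits.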
Conference Paper
Input switching activity is one of the deciding factors for power consumption in digital signal processing components. For accurate power estimation, it is essential to have knowledge about the switching activity in the input signal, including how this activity changes in different environments, e.g., in the presence of noise. The dual bit type (DBT) method aims at characterizing the bit-level switching activity in a signal, using signal statistics. However, the DBT method requires that the correlation coefficient and switching activity for the most significant bit of the signal are available. In this paper we give an expression for direct calculation of the correlation coefficient for the most significant bit in a signal, using the word-level correlation coefficient. Using simulation results we examine the accuracy of the given method to calculate the switching activity and correlation coefficient for the most significant bit. Furthermore, we derive expressions for accurately calculating the variance and word-level correlation coefficient for a correlated signal, when an additional noise of a given variance is added to the signal. This can be used to estimate the bit-level switching activity in a signal in the presence of noise. Finally, based on this we study the impact the additional noise has on the switching activity of the resulting signal.
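As an illustration only, and not necessarily the expressions derived in the paper, consider a signal with variance σ_x² and lag-one correlation ρ_x to which independent, zero-mean white noise of variance σ_n² is added. The variances add, and since the noise contributes nothing to the lag-one covariance, ρ_y = ρ_x·σ_x²/(σ_x² + σ_n²). For a zero-mean Gaussian signal, the arcsine law then gives the sign-bit correlation (2/π)·arcsin(ρ_y) and the sign-bit switching activity arccos(ρ_y)/π. Names are hypothetical.

```python
import math


def add_white_noise(sigma_x: float, rho_x: float, sigma_n: float) -> tuple[float, float]:
    """Std. deviation and lag-one correlation of y = x + n, where n is
    zero-mean white noise independent of x."""
    var_y = sigma_x ** 2 + sigma_n ** 2
    rho_y = rho_x * sigma_x ** 2 / var_y  # noise adds variance, not covariance
    return math.sqrt(var_y), rho_y


def msb_activity_gaussian(rho: float) -> float:
    """Sign-bit switching activity for a zero-mean Gaussian signal."""
    return math.acos(rho) / math.pi


sigma_y, rho_y = add_white_noise(sigma_x=1.0, rho_x=0.9, sigma_n=1.0)
print(sigma_y, rho_y)  # about 1.414 and 0.45
print(msb_activity_gaussian(0.9), msb_activity_gaussian(rho_y))  # about 0.144, 0.351
```

Noise of equal power thus more than doubles the sign-bit activity in this example, which is the kind of effect the paper quantifies.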
... Static power consists of all kinds of leakage currents, of which subthreshold current and gate leakage current are the main contributors [98,99]. Dynamic power is subdivided into switching power (currents flowing to charge the load capacitances), short-circuit power (currents flowing from power supply to ground when both the NMOS and PMOS are conducting) and glitches (gates making several transitions before settling), sometimes also referred to as toggle power [100], although some authors consider glitches to be part of the switching power [101] or consider short-circuit power separately [102]. ...
... Several estimation procedures exist [100,103,104], many of which rely on the determination of circuit activity (the fraction of clock cycles in which gates switch). The switching power consumption can be written as $P_{\text{switch}} = \alpha f C V^2$, where $f$ is the clock frequency, $C$ the load capacitance, $V$ the supply voltage and $\alpha$ the activity [98][99][100][101][102]. Empirical measurements showed that activity on true data ranges from 0.01 to 0.25 [102]. ...
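A worked example of the switching-power formula, using the activity range quoted above. The clock frequency, load capacitance and supply voltage are hypothetical values; some texts also write the formula with a factor ½, depending on how α counts transitions.

```python
def switching_power(alpha: float, f_hz: float, c_farad: float, v_volt: float) -> float:
    """P_switch = alpha * f * C * V**2, in watts."""
    return alpha * f_hz * c_farad * v_volt ** 2


# Hypothetical operating point: 100 MHz clock, 1 pF switched load, 1.2 V supply.
for alpha in (0.01, 0.25):
    print(alpha, switching_power(alpha, 100e6, 1e-12, 1.2))
# roughly 1.44e-05 W and 3.6e-05 W
```

Because power is linear in α, the 0.01 to 0.25 activity range alone spans a factor of 25 in switching power at a fixed operating point.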
... The model is synthesized using the Synopsys Design Compiler for a 90 nm TSMC-process operating at 1.2 V. Because power consumption of digital circuits heavily depends on toggle rate (activity) of internal nodes [100,101,103,104], three input streams are created. The first stream contains a slowly varying sine with a period of slightly more than 5215 samples, the second a fast varying sine with a period of slightly more than 31 samples. ...
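The toggle-rate measurement that motivates the two input streams can be sketched in a few lines: quantize a sine to two's-complement words and count how often each bit changes between consecutive samples. The 12-bit word length and the periods 5215.5 and 31.5 samples are hypothetical stand-ins for the "slightly more than" values in the text.

```python
import math


def sine_stream(n_samples: int, period: float, n_bits: int = 12) -> list[int]:
    """Full-scale sine quantized to signed n_bits integers."""
    amp = 2 ** (n_bits - 1) - 1
    return [round(amp * math.sin(2 * math.pi * k / period)) for k in range(n_samples)]


def bit_toggle_rates(samples: list[int], n_bits: int = 12) -> list[float]:
    """Fraction of consecutive sample pairs in which each bit changes
    (bit 0 = LSB, bit n_bits-1 = sign bit of the two's-complement word)."""
    mask = (1 << n_bits) - 1
    toggles = [0] * n_bits
    for prev, cur in zip(samples, samples[1:]):
        diff = (prev & mask) ^ (cur & mask)  # masking yields the two's-complement pattern
        for b in range(n_bits):
            toggles[b] += (diff >> b) & 1
    pairs = len(samples) - 1
    return [t / pairs for t in toggles]


slow = bit_toggle_rates(sine_stream(20000, 5215.5))
fast = bit_toggle_rates(sine_stream(20000, 31.5))
print("slow:", [round(r, 3) for r in slow])
print("fast:", [round(r, 3) for r in fast])
```

The slowly varying sine toggles mainly its low-order bits, while the fast one also toggles the upper bits and the sign bit far more often, so the same circuit sees very different internal activity.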
Thesis
Spectrum Analyzers (SAs) are measurement instruments able to decompose a time signal into its frequency components. Due to non-idealities, SAs add noise and distort the signal to be measured. The ratio between the largest signal and the noise floor level in a measured spectrum, without any distortion components rising above the noise floor, is called the Spurious-Free Dynamic Range (SFDR). In a CMOS-integrated SA the SFDR is limited to around 60 dB by technology, while it needs to be 70 dB (at a frequency resolution of 1 MHz) to be competitive with commercial SAs. A method called crosscorrelation is introduced to lower the noise floor at the cost of measurement time. It relies on two equivalent measurement paths in which the noise produced in one path is uncorrelated with the noise produced in the other path, such that the noise in the final spectrum tends to cancel out. Although the noise level is only lowered by 1.5 dB if measurement time is doubled, it allows the SA to be designed for high linearity. This design involves the use of digital hardware to compute the crosscorrelation. Consequently, Analog-to-Digital Converters (ADCs) are required, but they also limit the SFDR due to the non-linear effect of quantization. New approximations to the relation between the number of quantization levels and the SFDR are found. These approximations show that every additional bit improves the SFDR by 8 dB. A simulator of a concept architecture from Recore Systems is used to implement the digital correlation. It achieves an SFDR of 87 dB. An RF-frontend with a frequency range of 0 GHz to 6 GHz is designed for maximum linearity by moving amplification to IF. It provides impedance matching, variable attenuation and mixing. Its performance figures are a Noise Figure (NF) of 14 dB and a Third Order Input-referred Intermodulation Intercept Point (IP3) of +23 dBm, which gives a theoretical SFDR of 82 dB.
In order to obtain estimates of the feasibility of an integrated SA, other parts, such as the IF-circuitry and local oscillators, are briefly reviewed. The estimated power consumption of the entire correlation SA is 0.5 W at a sample rate of 200 MS/s, and the estimated chip area is 6.5 mm². The largest power consumers are the VCO (0.2 W), followed by the IF-circuitry (0.1 W) and the ADCs and digital correlator (each 0.08 W). Chip area is dominated by SRAM memory (36%), ADCs (25%) and the VCO (20%).
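Two of the figures above can be checked with textbook relations, as a sketch rather than the thesis's own derivation. Assuming a thermal noise density of -174 dBm/Hz, a 1 MHz resolution bandwidth and the usual third-order limit SFDR = (2/3)(IIP3 - N_floor), NF = 14 dB and IIP3 = +23 dBm give 82 dB, consistent with the stated value. Likewise, if the residual uncorrelated noise in the averaged cross-spectrum falls as 1/√K with K averages, doubling the measurement time lowers it by 5·log10(2) ≈ 1.5 dB. Names are hypothetical.

```python
import math

KT_DBM_PER_HZ = -174.0  # thermal noise density at about 290 K


def noise_floor_dbm(nf_db: float, bandwidth_hz: float) -> float:
    """Input-referred noise floor in the given bandwidth."""
    return KT_DBM_PER_HZ + nf_db + 10 * math.log10(bandwidth_hz)


def sfdr_db(nf_db: float, iip3_dbm: float, bandwidth_hz: float) -> float:
    """Third-order-limited SFDR: (2/3) * (IIP3 - noise floor)."""
    return 2.0 / 3.0 * (iip3_dbm - noise_floor_dbm(nf_db, bandwidth_hz))


def xcorr_noise_reduction_db(time_factor: float) -> float:
    """Residual noise falling as 1/sqrt(K): 5*log10 of the averaging-time ratio."""
    return 5 * math.log10(time_factor)


print(round(sfdr_db(14, 23, 1e6), 1))         # 82.0
print(round(xcorr_noise_reduction_db(2), 2))  # 1.51
```

The slow 1/√K improvement is why the abstract presents crosscorrelation as a trade of measurement time for noise floor.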
... Also, the method in [10] has some similarities, given that the ROMs in [10] are only using one input-bit each. Also, in another line of work the authors have looked at reducing the power consumption in parallel multipliers by utilizing knowledge about the switching activity of the input data [11], [12]. Based on this knowledge the interconnect of the summation tree is reordered to minimize the total switching activity in the multiplier. ...
... Indeed, the freedom of choosing among the numerous permutations of equal-weight partial products can be used to reduce the computation delay, dynamic power, static power and other measures of the circuit. These possibilities are considered in several works [11], [12], [17]–[19]. In this work, we apply the progressive reduction-tree design method in [12] to the summation-tree of the elementary function generators. ...
Conference Paper
In this paper we propose a method for lowering the power consumption in our previously proposed method for approximating elementary functions. By rearranging the interconnect ordering in the summation tree, we show that it is possible to lower the power consumption by 5.4% to 25.6% compared to a random ordering. The reduction tree is progressively designed and the interconnect ordering is decided based on the transition activities of the partial products. The reduction in power consumption comes with no overhead in performance or area compared to the random ordering.
... As long as the partial products belong to the same column, they can be interchanged and used as inputs to any of the half-adders and full-adders without any change in functionality. Related previous work includes [13]–[17]. We will come back to this in the following sections. ...
... Therefore, the pure sorting approach does not necessarily result in a good solution. By incorporating a probabilistic power estimator in the optimization algorithm, the method in [17] increased power saving further. The proposed algorithm in [17] starts with a complete multiplier where the connections are initialized to a random permutation. Then, for each stage and each column in the reduction tree, a number of useful permutations are specified and tested one at a time. ...
Conference Paper
When designing the reduction tree of a parallel multiplier, we can exploit a large intrinsic freedom for the interconnection order of partial products. The transition activities vary significantly for different internal partial products. In this work we propose a method for generation of power-efficient parallel multipliers in such a way that its partial products are connected to minimize activity. The reduction tree is designed progressively. A simulated annealing optimizer uses power cost numbers from a specially implemented probabilistic gate-level power estimator and selects a power-efficient solution for each stage of the reduction tree. VHDL simulation using ModelSim shows a significant reduction in the overall number of transitions. This reduction ranges from 15% up to 32% compared to randomly generated reduction trees and is achieved without any noticeable area or performance overhead.
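The following is a toy sketch of annealing over interconnect permutations, not the authors' tool. It optimizes a single column at a single stage, groups consecutive triples into full adders, and uses as its cost the summed output activities 2p(1-p) under independence assumptions instead of a gate-level power estimator. The input probabilities, schedule parameters and names are hypothetical.

```python
import math
import random
from itertools import permutations


def fa_output_probs(pa: float, pb: float, pc: float) -> tuple[float, float]:
    """Static P(SUM=1), P(CARRY=1) of a full adder with independent inputs."""
    x = pa + pb - 2 * pa * pb  # P(A xor B)
    return x + pc - 2 * x * pc, pa * pb + pc * x


def column_cost(order: list[int], probs: list[float]) -> float:
    """Summed output activity 2p(1-p) of the full adders fed by consecutive
    triples of `order`; leftover bits (len % 3) pass to the next stage."""
    cost = 0.0
    for k in range(0, len(order) - len(order) % 3, 3):
        a, b, c = (probs[i] for i in order[k:k + 3])
        for p in fa_output_probs(a, b, c):
            cost += 2 * p * (1 - p)
    return cost


def anneal(probs, steps=5000, t0=0.05, cooling=0.999, seed=1):
    """Simulated annealing over permutations, starting from a random one."""
    rng = random.Random(seed)
    order = list(range(len(probs)))
    rng.shuffle(order)
    cost = column_cost(order, probs)
    best, best_cost, t = order[:], cost, t0
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]  # propose a swap
        new_cost = column_cost(order, probs)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # undo a rejected swap
        t *= cooling
    return best, best_cost


def brute_force_cost(probs):
    """Exhaustive minimum, for checking small columns only."""
    return min(column_cost(list(p), probs) for p in permutations(range(len(probs))))


probs = [0.05, 0.2, 0.5, 0.6, 0.9, 0.95]  # hypothetical partial-product probabilities
best, best_cost = anneal(probs)
print(best, best_cost, brute_force_cost(probs))
```

In a progressive design, a step like this would be repeated stage by stage, with the output probabilities of one stage becoming the input probabilities of the next.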
... Hence, this aspect is not included in the results. In cases where a custom multiplier is implemented, rather than an existing one in a DSP, an FPGA, or a macro library, it is worth noting that the power consumption of the multiplier can be optimized based on the expected switching probability [14]. ...
Conference Paper
In this work we consider coefficient reordering for low power realization of FIR filters on fixed-point multiply-accumulate (MAC) based architectures, such as DSP processors. Compared to previous work we consider the input data correlation in the ordering optimization. For this we model the input data using the dual bit type approach. Results show that compared with just optimizing the number of switches between coefficients, the proposed method works better when the input data is correlated, which can be assumed for most applications.
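As a baseline for "just optimizing the number of switches between coefficients", a greedy nearest-neighbour ordering by Hamming distance between consecutive two's-complement coefficients can be sketched as below. It is a hypothetical heuristic that ignores input correlation (which the paper models with the dual bit type approach), and each input sample must be reordered together with its coefficient so that the accumulated sum is unchanged.

```python
def hamming(a: int, b: int, n_bits: int) -> int:
    """Number of differing bits between two n_bits two's-complement words."""
    mask = (1 << n_bits) - 1
    return bin((a & mask) ^ (b & mask)).count("1")


def order_cost(coeffs: list[int], n_bits: int) -> int:
    """Bit switches on the coefficient bus when coeffs are fed in this order."""
    return sum(hamming(a, b, n_bits) for a, b in zip(coeffs, coeffs[1:]))


def greedy_order(coeffs: list[int], n_bits: int) -> list[int]:
    """Nearest-neighbour ordering: always pick the closest remaining coefficient."""
    remaining = list(coeffs)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda c: hamming(order[-1], c, n_bits))
        remaining.remove(nxt)
        order.append(nxt)
    return order


taps = [-3, 12, 5, -8, 7, 1]  # hypothetical 8-bit filter coefficients
ordered = greedy_order(taps, 8)
print(order_cost(taps, 8), order_cost(ordered, 8), ordered)
```

Since the MAC sequence repeats for every output sample, a fuller cost would also count the wrap-around from the last coefficient back to the first.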
... In the first attempt at optimizing the partial product reduction tree (PPRT), the optimization of the complete PPRT is proposed. Most of the material in this section is published in [181]. The proposed optimization algorithm is summarized in Figure 4.3. ... Table 4.1, this is obviously impossible because of the huge number of possibilities. ...
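To give a sense of how quickly the search space grows, the sketch below counts only the orderings of bits within each column of the first reduction stage of an n×n unsigned partial-product array (the product of h! over the column heights h). Many of these orderings are functionally equivalent, so this is a loose upper bound for one stage, not the count used in the thesis; names are hypothetical.

```python
from math import factorial, prod


def column_heights(n: int) -> list[int]:
    """Bits per column of an n x n unsigned partial-product array."""
    return [min(k + 1, 2 * n - 1 - k) for k in range(2 * n - 1)]


def first_stage_orderings(n: int) -> int:
    """Product of h! over all columns: orderings of bits within each column."""
    return prod(factorial(h) for h in column_heights(n))


for n in (4, 8, 16):
    count = first_stage_orderings(n)
    print(f"{n}x{n}: about 10^{len(str(count)) - 1} orderings in the first stage")
```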
... Therefore, a larger number of iterations can be afforded, giving better results. The material of this chapter is mostly published in [180,181]. In this approach, the design of the PPRT is combined with the optimization phase. ...
Research
A multiplier is one of the key hardware blocks in most digital and high-performance systems, for example FIR filters, digital signal processors and microprocessors. This project presents an efficient implementation of high-speed multipliers using the shift-and-add method and the radix-2 and radix-4 modified Booth multiplier algorithms. We compare the operation of the three multipliers by implementing each of them separately in an FIR filter. Parallel multipliers such as the radix-2 and radix-4 modified Booth multipliers perform the computation using fewer adders and fewer iterative steps. As a result, they occupy less area than the serial multiplier. This is an important criterion, because chip fabrication and high-performance systems require components that are as small as possible. When we compare the power consumption of all the multipliers, we find that serial multipliers consume more power. Therefore, where power is a critical criterion, parallel multipliers such as Booth multipliers should be preferred over serial multipliers. The low power consumption of the Booth multiplier makes it a preferred choice in designing various circuits. In this project we first designed three different types of multipliers using the shift-and-add method and the radix-2 and radix-4 modified Booth multiplier algorithms. We used different types of adders, such as a sixteen-bit full adder, in designing these multipliers. We then designed a 4-tap delay FIR filter and, in place of its multiplications and additions, used the various multiplier and adder components. Finally, we compared the different multipliers by comparing the power consumed by each of them. The results of our project help in choosing between serial and parallel multipliers when building different systems.
Multipliers are among the most essential components of many systems. Studying how different multipliers operate therefore helps in designing better systems with lower power consumption and smaller area. The results of our project help in choosing appropriate multipliers when building arithmetic units, and appropriate adders in different digital applications, according to the requirements.
Index Terms: Finite Impulse Response, radix_2 and radix_4 Booth multiplier, Shift and add multiplier.
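The radix-4 modified Booth recoding mentioned above can be sketched in software. Overlapping bit triplets of the multiplier are mapped to digits in {-2, -1, 0, 1, 2}, which halves the number of partial products. This shows only the recoding arithmetic, not the hardware evaluated in the project; names are hypothetical.

```python
def booth_radix4_digits(y: int, n_bits: int) -> list[int]:
    """Radix-4 (modified Booth) digits of an n_bits two's-complement multiplier y.

    Scans overlapping triplets (y[2i+1], y[2i], y[2i-1]) with y[-1] = 0 and maps
    each to -2*y[2i+1] + y[2i] + y[2i-1]. Digits are returned LSB first.
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    bits = y & ((1 << n_bits) - 1)
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    digits = []
    prev = 0  # implicit y[-1] = 0
    for i in range(0, n_bits, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(table[(b1 << 2) | (b0 << 1) | prev])
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n_bits: int) -> int:
    """Sum of the n_bits/2 partial products d_i * x * 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n_bits)))


print(booth_radix4_digits(7, 4))  # [-1, 2], i.e. 7 = -1 + 2*4
print(booth_multiply(-13, 57, 8))
```

Each digit needs only a shift (for ±2) and an optional negation of x, which is why the recoded multiplier needs half as many partial-product rows as a plain array multiplier.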