Fig 1 - uploaded by Swaroop Ghosh
Ripple carry adder [5].

Source publication
Article
Full-text available
Design considerations for robustness with respect to variations and low-power operations typically impose contradictory design requirements. Low-power design techniques such as voltage scaling, dual-V_th, etc., can have a large negative impact on parametric yield. In this paper, we propose a novel paradigm for low-power variation-tolerant circuit desi...

Contexts in source publication

Context 1
... the sake of simplicity, we choose a 4-b ripple carry adder, as shown in Fig. 1. Signals P_0 to P_3 (G_0 to G_3) are the propagate (generate) signals, whereas C_{i,0} (C_{o,1} to C_{o,3}) are the carry-in (carry-out) signals [5]. As evident, the path from carry-in to carry-out is critical and determines the frequency of operation of the adder. However, note that the critical path is activated only when C_{i,0} = 1, and at ...
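The propagate/generate formulation in Context 1 can be sketched in a few lines of Python. This is an illustrative model only, not the circuit from the source publication: the function names are hypothetical, and the all-propagate condition checked in `full_chain_active` is the standard textbook condition for a carry-in to ripple through every stage, not something taken from the truncated excerpt.

```python
def ripple_carry_add(a, b, carry_in=0, width=4):
    """Add two unsigned integers bit by bit using propagate/generate signals.

    Returns (sum_bits, carry_out) for a `width`-bit ripple carry adder.
    """
    carry = carry_in
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        p_i = a_i ^ b_i                 # propagate: stage passes its carry-in along
        g_i = a_i & b_i                 # generate: stage creates a carry by itself
        total |= (p_i ^ carry) << i     # sum bit of stage i
        carry = g_i | (p_i & carry)     # carry-out of stage i feeds stage i + 1
    return total, carry


def full_chain_active(a, b, carry_in, width=4):
    """True when the carry-in has to ripple through every stage.

    Assumption (textbook condition): carry-in is 1 and every P_i is 1.
    """
    all_propagate = all(((a >> i) ^ (b >> i)) & 1 for i in range(width))
    return bool(carry_in) and all_propagate
```

For example, `ripple_carry_add(0b1111, 0b0000, carry_in=1)` exercises the full carry chain, whereas most operand pairs stop the carry after a stage or two, which is why the longest path is exercised only occasionally.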
Context 2
... circuit design is again based on CRISTA, i.e., making the possible delay errors (under single-cycle operations) predictable and rare under parametric variations. We follow a partitioning technique similar to the one discussed in Section III-A. However, the sizing routine is modified to attain the path delay distribution, as shown by a cartoon in Fig. 10. It is easy to observe that this kind of delay distribution can allow the circuit to operate at two different lower voltages with small performance overhead. On obtaining this delay distribution, a temperature-adaptive DVS can be performed with the following conditions: 1) at normal temperatures, the circuit operates in single cycle ...
Context 3
... temperatures, the circuit operates in single cycle with nominal supply; 2) if the temperature exceeds threshold T_REF1, a lower supply V_DDL1 is applied to push the critical cofactor to two-cycle operations; and 3) if the temperature crosses threshold T_REF2 (> T_REF1), supply V_DDL2 (< V_DDL1) is applied so that the first noncritical cofactor (Fig. 10) also operates in two cycles. A thermal sensor can automatically decide which supply (i.e., V_DDH, V_DDL1, or V_DDL2) is applied to the circuit. The activation of critical and first noncritical cofactors is predicted by ...
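The three temperature conditions quoted in Contexts 2 and 3 amount to a small decision function. The sketch below is a hedged illustration, not the authors' controller: the function name `select_supply` and the threshold arguments are hypothetical, and the supply values are the {1.0 V, 0.8 V, 0.75 V} set quoted elsewhere on this page.

```python
# Supply levels in volts, as quoted on this page:
# {V_DDH, V_DDL1, V_DDL2} = {1.0 V, 0.8 V, 0.75 V}.
V_DDH, V_DDL1, V_DDL2 = 1.0, 0.8, 0.75


def select_supply(temp_c, t_ref1, t_ref2):
    """Return (supply_voltage, cofactors_in_two_cycle_mode) for a sensed temperature.

    t_ref1 and t_ref2 are hypothetical threshold values supplied by the caller;
    the excerpts only require t_ref2 > t_ref1.
    """
    if not t_ref1 < t_ref2:
        raise ValueError("t_ref2 must be greater than t_ref1")
    if temp_c > t_ref2:
        # Condition 3: lowest supply; the first noncritical cofactor also takes two cycles.
        return V_DDL2, ("critical", "first noncritical")
    if temp_c > t_ref1:
        # Condition 2: lower supply; only the critical cofactor takes two cycles.
        return V_DDL1, ("critical",)
    # Condition 1: normal temperature, nominal supply, single-cycle operation.
    return V_DDH, ()
```

A real controller would presumably also need hysteresis around the thresholds to avoid oscillating between supplies; the excerpts do not say how this is handled, so it is left out here.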
Context 4
... the temperature-adaptive design, the critical cofactor is downsized to meet slack S_1 (Fig. 10) with respect to the clock period T_c. Furthermore, one of the noncritical cofactors (i.e., the first noncritical cofactor) is sized to meet a slack of S_2. All other cofactors are upsized to meet the slack S_3. The remaining steps are similar to Table I so the details are ...
Context 5
... architecture of CRISTA-based temperature-adaptive pipeline (with three voltage domains) is shown in Fig. 11. Decoders D_11, D_21, and D_31 determine whether the critical cofactor of one of the stages is activated. Similarly, decoders D_12, D_22, and D_32 determine the activation of first noncritical cofactor of the stages. These two sets of decoders are ORed to predict the activation of critical cofactor and the first noncritical ...
Context 6
... failures are predicted ahead of time, and corrective action (i.e., two-cycle operation) is taken. However, the delay failures can occur when the supply is brought back from V_DDL2 to V_DDL1 or from V_DDL1 to V_DDH. To avoid such situations, we place a few-bit counter as a delay element, which extends the freeze signal for a few more cycles (as shown in Fig. 11). The proposed DVS technique can also be extended for an N-stage pipeline (at the cost of extra AND and OR ...
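The counter-based extension of the freeze signal described in Context 6 behaves like a pulse stretcher. The following cycle-level sketch illustrates that idea under stated assumptions: `extend_freeze` and `extra_cycles` are hypothetical names, and the reload-on-assert behavior is an assumption, since the excerpt does not give the counter's exact design.

```python
def extend_freeze(raw_freeze, extra_cycles):
    """Stretch each asserted freeze pulse by `extra_cycles` additional cycles.

    raw_freeze is a per-cycle sequence of truthy/falsy values; the result is a
    list of booleans of the same length.
    """
    counter = 0
    extended = []
    for asserted in raw_freeze:
        if asserted:
            counter = extra_cycles      # reload the counter while freeze is asserted
            extended.append(True)
        elif counter > 0:
            counter -= 1                # count down: keep freeze high a little longer
            extended.append(True)
        else:
            extended.append(False)
    return extended
```

With `extra_cycles=2`, a single-cycle freeze pulse stays high for three cycles in total, giving the supply time to settle before single-cycle operation resumes.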
Context 7
... target delay of the pipeline was 100 ps, and {V_DDH, V_DDL1, V_DDL2} were found to be {1.0 V, 0.8 V, 0.75 V}. The thermal simulation result for our example pipeline circuit (Fig. 11) is shown in Fig. 12. It can be observed that the temperature rises from the ambient value and saturates after some time. At t = 4 × 10^5 ns, the temperature of the circuit crosses T_REF1, and the supply voltage is reduced to V_DDL1 (i.e., 0.8 V) based on the sensor's output. As expected, the temperature reduces to approximately 100.7 °C after some ...

Similar publications

Conference Paper
Full-text available
This paper proposes path mapping, a method of delay estimation for technology independent combinational circuits. Path mapping provides fast and accurate delay estimation using the common ideas of tree covering technology mapping. First, path mapping performs technology mapping for all paths in the circuit with minimum delay. Then, it finds the mos...
Conference Paper
Full-text available
The complexity of timing optimization has been increasing rapidly in proportion to the shrinking CMOS device size, due to the increased number of channel-connected transistors in a path, and the rising magnitude of process variations. These significant challenges can be addressed through the implementation of designs with an optimal balance between...
Conference Paper
Full-text available
This paper introduces an approximate adder architecture based on a digital quasi-feedback technique called Carry Cut-Back in which high-significance stages can cut the carry propagation chain at lower-significance positions. This lightweight approach prevents activation of the critical path, improving energy efficiency while guaranteeing low worst-...

Citations

... Existing CPR techniques can be divided into two types: one is to repeat specific units as replica path circuits, such as the inverter chain [22,23] in Figure 1(b), and the UDL structure [24] in Figure 1(c), as well as the mixed-gate CPR [25,26]. These techniques require additional complex controlling logic to monitor the replica path delay. ...
Article
Critical path replica (CPR) is a widely used technique in synchronous digital circuit design. However, the existing CPR technique cannot accurately reflect the timing of the circuit due to local process variations (LPV). An improved CPR technique based on load capacitance matching (LCM) is proposed in this paper, which can track critical path delay across a wide voltage range. The impact of LPV is simulated under a wide voltage range, and a configurable delay line is designed to eliminate the effect of LPV. Furthermore, a low-overhead mixed-threshold transition detector (TD) circuit is also proposed to monitor timing violations of the replica path; it generates an ‘error’ signal used to dynamically regulate the chip’s operating voltage. The proposed techniques are implemented on a CORDIC chip using the 55-nm CMOS process. Simulation results show that in the near-threshold voltage (NTV) region, the supply voltage can be reduced from 0.8 to 0.6 V, enabling a maximum of 42.6% power saving at the TT corner, 25°C, with less than 1% area overhead compared to the baseline design.
... In this sense, we need to find a better trade-off among the minimum supply voltage under VOS, power, and area. To pursue a better trade-off, this work targets active CPs, i.e., paths that actually cause timing errors; a similar consideration is found in the literature [5], [12], [18], [19]. Also, we refer to [12] and adopt FF-based CPI, which assigns manipulated setup delay constraints to FFs and re-synthesizes the design as an engineering change order (ECO) process. ...
Article
This work proposes a design methodology that reduces power dissipation under voltage over-scaling (VOS) operation. The key idea of the proposed design methodology is to combine critical path isolation (CPI) and bit-width scaling (BWS) under the constraint of computational quality, e.g., Peak Signal-to-Noise Ratio (PSNR) in the image processing domain. Conventional CPI inherently cannot reduce the delay of intrinsic critical paths (CPs), which may significantly restrict the power saving effect. On the other hand, the proposed methodology tries to reduce both intrinsic and non-intrinsic CPs. Therefore, our design dramatically reduces the supply voltage and power dissipation while satisfying the quality constraint. Moreover, to reduce the co-design exploration space, the proposed methodology utilizes the exclusiveness of the paths targeted by CPI and BWS, where CPI aims at reducing the minimum supply voltage of non-intrinsic CPs, and BWS focuses on intrinsic CPs in arithmetic units. Based on this exclusiveness, the proposed design splits the simultaneous optimization problem into three sub-problems: (1) the determination of bit-width reduction, (2) the timing optimization for non-intrinsic CPs, and (3) investigating the minimum supply voltage of the BWS and CPI-applied circuit under the quality constraint, for reducing power dissipation. Thanks to the problem splitting, the proposed methodology can efficiently find a quality-constrained minimum-power design. Evaluation results show that CPI and BWS are highly compatible, and they significantly enhance the efficacy of VOS. In a case study of a GPGPU processor, the proposed design reduces power dissipation by 42.7% with an image processing workload and by 51.2% with a neural network inference workload.
... Voltage scaling is an effective and well-known approach to enhance efficiency at the expense of timing errors and degraded robustness. Adaptive timing techniques have been developed in previous work to permit aggressive voltage scaling [1]-[3]. Low voltage operation exhibits unique characteristics for deep learning applications due to the inherent resilience of these applications (such as image processing and speech recognition) to errors [4]-[7]. ...
Article
Full-text available
Energy efficiency is a critical design objective in deep learning hardware, particularly for real-time machine learning applications where the processing takes place on resource-constrained platforms. The inherent resilience of these applications to error makes voltage scaling an attractive method to enhance efficiency. Timing error probability models are proposed in this article to better understand the effects of voltage scaling on error rates and power consumption of multiply-accumulate units. The accuracy of the proposed models is demonstrated via Monte Carlo simulations. These models are then used to quantify the related tradeoffs without relying on time-consuming hardware-level simulations. Both modern FinFET and emerging tunneling field-effect transistor (TFET) technologies are considered to explore the dependence of the effects of voltage scaling on these two technologies.
... The most representative technique among this class of AVS powered by timing speculation is proposed in [50]. The authors introduce a design paradigm, called CRISTA, that implements AVS under a desired frequency constraint. ...
Thesis
Full-text available
Energy efficiency has become the main constraint for most of today’s information and communication technologies, from those involving high-performance computing (e.g., cloud services) to those deployed on low-power applications (e.g., portable systems for the Internet-of-Things). In the past decades, the pursuit of energy efficiency was mainly supported through the advance of the underlying CMOS technology. Moving to a new node used to guarantee energy savings of more than 90%. However, as soon as CMOS entered the nanometric regime, the improvements brought by a technology shift shrank substantially, falling to 20% and then lower still, generation by generation. To make matters worse, production costs rose dramatically due to the technological impediments imposed by physical geometries below the 28 nm mark. This made technology scaling impractical for many cost-sensitive applications. New sophisticated energy-aware design practices were then introduced to compensate for slow technology scaling. Very soon, low-power and energy-management techniques became the core of any design and optimization flow. Unfortunately, design techniques are not fully renewable either; namely, their effectiveness degrades as technology nodes advance. This is the case of voltage scaling, for instance, which encountered the 1.0 V plateau that still holds today, but also of other architectural-level techniques, such as multi-core/many-core solutions, which have been seriously limited by stringent dark-silicon constraints. The end of Moore’s law is not just a technology issue; it is also the prelude to a design crisis that will soon require rethinking the optimization and integration strategy of digital circuits and systems. A radical solution to all these concerns has yet to come. However, the recent growth of data-centric applications is opening the way to new design paradigms that alleviate the pressure. 
Indeed, there is much room at the application level, where alternative energy-management knobs are available. The basic idea is that of integrating the quality-of-results as a new dimension in the design space. Leveraging the intrinsic error-resilience of data-centric applications, it is thereby possible to implement an Energy-Accuracy Scaling (EAS) which is orthogonal to the technology adopted and the low-power design strategy deployed. At the basis of this concept, there is the simple intuition that an application whose output can be degraded without affecting the quality perceived by the user may require lower energy consumption for the same amount of work. The broad objective of this dissertation is to introduce advanced design solutions that improve the way the EAS paradigm is implemented. Two new strategies are presented which reduce the design overhead of classical approximate solutions; according to the revisited taxonomy introduced in this thesis, one of the proposed strategies belongs to the class of Adaptive EAS, while the second falls under the label of Static EAS. With Adaptive EAS, the optimal energy-accuracy tradeoff is achieved by measuring some quality metrics directly on-chip, at run-time, establishing a feedback loop that drives the energy minimization. These metrics can be obtained by explicitly measuring the output accuracy, or by indirect measurements, e.g., through the output error rate. With Static EAS, the energy-accuracy tradeoff is fixed at design-time by functional speculation, i.e., a modification of the logic functionality through algorithmic or circuit simplifications which induce energy savings for a worst-case accuracy loss. The Adaptive solution encompasses the extension of the conventional Error Detection-Correction techniques for data-driven voltage scaling in order to trade system accuracy for energy reduction. 
The new mechanism, called Approximate Error Detection-Correction (AED-C), is built upon in-situ elastic timing monitors which allow a lightweight error management scheme to be implemented. The AED-C implements EAS using the error detection coverage as a knob: a low error coverage accelerates supply voltage over-scaling, thus achieving more significant energy savings at the cost of quality-of-result; a high error coverage lessens the voltage scaling, leading to higher accuracy at the cost of lower energy savings. As EAS does not have to ensure full error coverage, the traditionally large area/energy overhead of conventional techniques is drastically reduced. Simulations over a representative set of applications/circuits, e.g., Multiply-Accumulate (MAC) unit, Discrete Cosine Transform (DCT), FIR and IIR filters, provide a comparative analysis with the state-of-the-art techniques. The collected results show that AED-C substantially reduces the average energy-per-operation and the area overhead, still guaranteeing reasonable accuracy. The static EAS strategy, instead, is developed by exploiting Machine Learning theories which suggest alternative forms to represent relationships among data. Such theories find their application in the Boolean domain, where logic functions can be described as inference rules. The novel paradigm, named Inferential Logic, leverages the concept of statistical inference for the design of combinational logic circuits that are able to mimic Boolean functions to a certain degree of accuracy. These inferential logic circuits run quasi-exact computations, trading accuracy for energy efficiency in error-resilient applications. The figures-of-merit of an Inferential Multiplier are quantified using representative image processing applications as a case study. A comparative analysis against a state-of-the-art Booth Multiplier shows that the inferential logic representation simplifies the circuit complexity, reducing the overall area/delay. 
As a result, the inferential multiplier can exploit latency reduction for power optimization guaranteeing a fixed average accuracy.
... The existing CPR design methods mainly include inverter line [13,21,22], mixed-gate CPR [23,24] and UDL (Universal Delay Line) [25]. The inverter line is the most commonly used CPR and its design is simple [26,27]. ...
Article
In this paper, we propose an improved design method of critical path replica (CPR) for wide voltage design. The timing accuracy of CPR over a wide operating voltage range is improved by applying load matching and transistor-level static timing analysis (TSTA). We applied the proposed method to 100 critical paths of iscas’95 benchmark circuits; the results of simulation experiments in SMIC 55 nm show that the CPR designed by the proposed method can operate between 0.3 V and 1.2 V with only 0.25% delay error (DE).
... Conservative approaches for improving circuit robustness are ones that provide enough timing margins for uncertainties, such as process, voltage, temperature, and ageing variations. One efficient alternative option is critical path isolation for timing adaptiveness (CRISTA) [74]. The core idea includes the following: 1) isolate the set of possible critical paths under process variations and render them predictable; 2) guarantee that these paths are activated rarely; and 3) dynamically convert them to a two-cycle operation when they are activated. ...
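The third point of the CRISTA summary above (converting rarely activated critical paths to two-cycle operation) has a simple throughput consequence that can be sketched as follows. This is a back-of-the-envelope model, not an implementation of CRISTA: the function names are hypothetical, and perfect prediction of critical-path activation is assumed.

```python
def run_cycles(critical_path_hits):
    """Count clock cycles when each predicted critical-path activation takes two cycles.

    critical_path_hits is a per-operation sequence of booleans (assumed to come
    from a perfect activation predictor).
    """
    return sum(2 if hit else 1 for hit in critical_path_hits)


def cycle_overhead(critical_path_hits):
    """Fractional cycle overhead relative to always-single-cycle operation."""
    n = len(critical_path_hits)
    if n == 0:
        return 0.0
    return (run_cycles(critical_path_hits) - n) / n
```

In this model the overhead equals the activation rate of the isolated paths, which is why the second point (keeping those activations rare) matters for performance.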
... Path delay distribution required for CRISTA [74]. ...
... 36: A case study of the Pipeline processor with CRISTA in [74]. ...
Thesis
The Internet of Things is a rapidly emerging technology that allows people and things to be connected anywhere and at any time, using any path or network. The “things” in IoT applications need external power to perform any function. As a result, IoT devices must offer superior power efficiency compared with previous digital applications. In terms of power-saving techniques, voltage scaling is regarded as one of the most efficient methods. However, the aggressive scaling of CMOS size and supply voltage has raised increasing concern about circuit reliability. This thesis studies the effects of process variation on low-voltage digital signal processing circuits, with the aim of developing efficient methods to achieve further power saving at the circuit level and providing corresponding cost-effective error mitigation techniques. This thesis presents three major contributions. First, the effect of process variation on the adiabatic logic circuit (an energy recovery logic) is analysed, and bit-serial implementation is proposed as an efficient method to improve energy efficiency and robustness against variation. After that, the research emphasis of this thesis shifts to low-power and reliable serial computing. The second contribution demonstrates the advantages of serial computing on an FFT implementation in terms of area minimisation, power saving and energy-scalable operation. The third contribution proposes several novel error-mitigation techniques for serial computing via path isolation, variable latency or time borrowing. The FIR evaluation results show that the proposed techniques can effectively mask or prevent timing errors with very low hardware cost and make serial circuits a promising candidate for power-efficient and reliable computing in IoT devices.
... There is a minimum supply voltage Vdd_min at which timing paths violate the set-up time; below this threshold, timing faults arise and logic errors propagate. A further reduction of Vdd can only be accomplished with a modification of the circuit, e.g., reshaping the paths distribution via gate re-sizing [11] and/or timing-driven Boolean restructuring [12], or a relaxation of the timing constraint, e.g., by frequency scaling. Both solutions might have a negative impact on the energy efficiency. ...
Article
Full-text available
Adaptive Voltage Over-Scaling can be applied at run-time to reach the best tradeoff between quality of results and energy consumption. This strategy encompasses the concept of timing speculation through some level of approximation. How and on which part of the circuit to implement such approximation is an open issue. This work introduces a quantitative comparison between two complementary strategies: Algorithmic Noise Tolerance and Approximate Error Detection. The first implements timing speculation by means of approximate computing, while the latter exploits a more sophisticated approach that is based on the approximation of the error detection mechanism. The aim of this study was to provide both a qualitative and quantitative analysis on two real-life digital circuits mapped onto a state-of-the-art 28-nm CMOS technology.
... In voltage overscaling (VOS) [26] the voltage is lowered into a region where the (expected) error-rate is acceptable, thus achieving power savings. Adaptive concepts like Razor [27] or CRISTA [28] are not approximate since they correct all occurring errors; however, skipping or relaxing the error-correction leads to quite efficient adaptive approximate schemes [21] or better-than-worst-case ...
Conference Paper
Approximate computing promises significant advantages over more traditional computing architectures with respect to circuit area, performance, power efficiency, flexibility, and cost. Its use is suitable in applications where limited and controlled inaccuracies are tolerable or where uncertainty is intrinsic in the inputs or their processing, as happens, e.g., in (deep-) machine learning, image and signal processing. This paper discusses a dimension of approximate computing that has been neglected so far, even though it is a major asset nowadays: security. A number of hardware-related security threats are considered, and the implications of approximate circuits or systems designed to address these threats are discussed.
... Operating below nominal voltage allows for reductions in energy consumption at the cost of time-induced errors. These errors cannot be rigorously bounded, and so extra error-compensation circuits need to be incorporated [19], [20]. Since timing errors are caused by long carry chains, i.e., impact the most significant bit of the final product, it is necessary to quantify the impact of timing violation by modifying the conventional multiplier to allow for graceful degradation [21]. ...
Article
Approximate arithmetic has recently emerged as a promising paradigm for many imprecision-tolerant applications. It can offer substantial reductions in circuit complexity, delay and energy consumption by relaxing accuracy requirements. In this paper, we propose a novel energy-efficient approximate multiplier design using a significance-driven logic compression (SDLC) approach. Fundamental to this approach is an algorithmic and configurable lossy compression of the partial product rows based on their progressive bit significance. This is followed by the commutative remapping of the resulting product terms to reduce the number of product rows. As such, the complexity of the multiplier in terms of logic cell counts and lengths of critical paths is drastically reduced. A number of multipliers with different bit-widths (4-bit to 128-bit) are designed in SystemVerilog and synthesized using Synopsys Design Compiler. Post-synthesis experiments showed that energy savings of up to an order of magnitude, and reductions of 65% in critical delay and almost 45% in silicon area, can be achieved for a 128-bit multiplier, compared to an accurate equivalent. These gains are achieved with low accuracy losses estimated at less than 0.0028 mean relative error. Additionally, we demonstrate the performance-energy-quality (PEQ) trade-offs for different degrees of compression, achieved through configurable logic clustering. To evaluate the effectiveness of the proposed approach, three case studies were set up. First, a Gaussian blur filter was designed, which demonstrated up to 80% energy reduction with a meagre loss of image quality. Second, we evaluated our approach in a machine learning application using a perceptron classifier, which showed up to 74% energy reduction with a negligible error rate. Third, the proposed multiplier designs were used in a power-constrained image processing application. 
We showed that SDLC can achieve 60x improvement in computation capability, with potential to be employed in ubiquitous systems.
... Prediction based elastic-clocking. The authors of [13] propose a design paradigm, called CRISTA, that implements AVOS under a desired frequency constraint. The basic idea is to isolate the most critical paths through a custom re-synthesis stage that reshapes the original paths distribution. ...
Article
Full-text available
This paper introduces Approximate Error Detection-Correction (AED-C), an error management scheme suited to adaptive power management on error resilient applications. Inspired by the working principle of Approximate Computing, AED-C implements energy-accuracy scaling using the error detection coverage as a knob: a low error coverage accelerates supply voltage scaling thus to achieve larger energy savings at the cost of quality-of-result (QoR); a high error coverage lessens the voltage scaling leading to high QoR at the cost of weaker energy savings. The AED-C mechanism is built upon elastic timing monitors, Razor flip-flops augmented with a tunable detection window and hardened with the aid of a dynamic short-path padding technique. Simulations over a representative set of circuits provide a comparative analysis with the state-of-art. The collected results show AED-C substantially reduces the average energy-per-operation (up to 44.7% savings w.r.t. Razor-driven Adaptive Voltage Over-Scaling) and the area overhead (3.3% vs. 62.0%), still guaranteeing reasonable QoR. When applied to a real-life application, i.e., Forward Discrete Cosine Transform Unit (FDCT) integrated into a JPEG compressor, AED-C shows 51.9% energy savings (w.r.t. a baseline FDCT implementation) and a PSNR of 48.45 dB (w.r.t. baseline JPEG images).