Fig 2 - uploaded by Sudhakar Reddy
An n-bit ripple-carry adder. 

Source publication
Article
Full-text available
Transient or soft errors caused by various environmental effects are a growing concern in micro and nanoelectronics. We present a general framework for modeling and mitigating the logical effects of such errors in digital circuits. We observe that some errors have time-bounded effects; the system's output is corrupted for a few clock cycles, after...

Context in source publication

Context 1
... the n-bit ripple-carry (RC) adder of Fig. 2. It is constructed from a full adder FA_i, an RTL element realizing two functions, the sum z_i and the carry-out c_i. It has n FA_i stages, 2n + 1 inputs, and n + 1 outputs. There are 4n + 1 lines that can be faulty, so the total number of STFs is (4n + 1) * 2^(2n+2). We can compute the output error probabilities p_err(z_i) by ...
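The adder structure and fault-count arithmetic above can be sketched in a few lines (a minimal model; the function names are illustrative, not from the paper):

```python
def full_adder(a, b, cin):
    """One FA_i stage: returns (sum bit z_i, carry-out c_i)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a_bits, b_bits, c0=0):
    """n-bit ripple-carry adder over little-endian bit lists.
    2n + 1 inputs (a, b, c0) and n + 1 outputs (z, c_n), as in the text."""
    carry = c0
    z = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        z.append(s)
    return z, carry

def num_stfs(n):
    """Total STF count from the text: 4n + 1 faultable lines
    times the 2^(2n+2) factor, i.e. (4n + 1) * 2**(2n + 2)."""
    return (4 * n + 1) * 2 ** (2 * n + 2)
```

For example, `ripple_carry_add([1, 0, 1], [1, 1, 0])` adds 5 and 3 (little-endian) and yields sum bits `[0, 0, 0]` with final carry 1, i.e. 8.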

Similar publications

Conference Paper
Full-text available
The linear deterministic interference channel (LD-IC) with partial feedback is considered. Partial feedback for the LD-IC models a scenario in which the top l most-significant-bits of the channel output of receiver j are received as feedback at transmitter j, for j = 1, 2. The rationale for studying the LD-IC with partial feedback comes from the fa...
Article
Full-text available
Verification is one of the core steps in integrated circuits (ICs) manufacturing due to the multifarious defects and malicious hardware Trojans (HTs). In most cases, the effectiveness of the detection relies on the quality of the sample images of ICs. However, the high-precision and noiseless images are hard to capture due to the mechanical precisi...

Citations

... Soft errors caused by transient faults are the main cause of circuit failure, so researchers have focused on modeling transient faults and calculating the FPC [13,30,34,36,41]. The most common method is to simulate the operation of the circuit by injecting a large number of faults and then to calculate the FPC by determining whether each fault propagates to the outputs of the circuit. ...
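The fault-injection procedure described in that excerpt can be illustrated with a toy Monte Carlo sketch on a single full-adder stage (the line names and circuit model here are hypothetical illustrations, not taken from any of the cited papers):

```python
import random

# Hypothetical line-level model of one full-adder stage.
LINES = ['a', 'b', 'cin', 'x1', 'sum', 'a1', 'a2', 'cout']

def eval_fa(a, b, cin, flip=None):
    """Evaluate the full adder; if `flip` names a line, invert its
    value (a single transient fault) before it feeds downstream gates."""
    v = {}
    def drive(name, val):
        v[name] = val ^ 1 if name == flip else val
    drive('a', a)
    drive('b', b)
    drive('cin', cin)
    drive('x1', v['a'] ^ v['b'])
    drive('sum', v['x1'] ^ v['cin'])
    drive('a1', v['a'] & v['b'])
    drive('a2', v['x1'] & v['cin'])
    drive('cout', v['a1'] | v['a2'])
    return v['sum'], v['cout']

def estimate_fpc(trials=10_000, seed=0):
    """Monte Carlo FPC estimate: draw random inputs, inject one fault
    on a random line, and count trials where the faulty outputs
    differ from the golden (fault-free) outputs."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        a, b, cin = (rng.randint(0, 1) for _ in range(3))
        if eval_fa(a, b, cin, flip=rng.choice(LINES)) != eval_fa(a, b, cin):
            fails += 1
    return fails / trials
```

Real FPC estimation targets whole circuits and far larger fault lists; this only shows the inject-then-compare loop the excerpt describes.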
Article
Full-text available
As the feature size of integrated circuits decreases to the nanometer scale, process fluctuations, aging effects, and particle radiation have an increasing influence on the Failure Probability of Circuits (FPC), which brings severe challenges to chip reliability. The accurate and efficient estimation of logic circuit failure probability is a prerequisite for high-reliability design. It is difficult to calculate the FPC due to the large number of reconvergent fanout structures and the resulting signal correlation, particularly for Very Large-Scale Integrated (VLSI) circuits. Accordingly, this paper presents a Correlation Separation Approach (COSEA) that aims to efficiently and accurately estimate the FPC. The proposed COSEA divides the circuit into several different fanout-relevant and fanout-irrelevant circuits. Moreover, the error probability of the nodes is expressed as the result of interactions between different structures. As a result, the problem of signal correlation can be efficiently solved. Because the computational complexity of COSEA is linearly related to the scale of the circuit, it has good scalability. Compared with the Probabilistic Transfer Matrices (PTM) method, Monte Carlo simulation (MC), and other failure probability calculation methods in the literature, the experimental results show that our approach not only achieves fast speed and good scalability, but also maintains high accuracy.
... Since this self-healing architecture is designed to realize the functionality of critical systems operating in harsh environments, radiation-induced transient faults that can occur at unpredictable times are the most prevalent fault type [24]. Hardening the digital device at the circuit level [25] is more effective, and subsequent fault tolerance can occur even with an uncovered error. If such errors are not tolerated at the block level, their wrong values will be sensitized at the output signals of the cell level, which is directly connected to the external world through a network of digital I/O ports and can impact the safety of the public or the environment. ...
Article
Full-text available
Digital embedded systems in safety-critical cyber-physical-systems (CPSs) require high levels of resilience and robustness against different fault classes. In recent years, self-healing concepts based on biological physiology have received attention for the design and implementation of reliable systems. However, many of these approaches have not been architected from the outset with safety in mind, nor have they been targeted for the safety-related automation industry where a significant need exists. This study presents a new self-healing hardware architecture inspired by integrating biological concepts, fault tolerance techniques, and IEC 61131-3 operational schematics to facilitate adaption in automation and critical infrastructure. The proposed architecture is organised in two levels: the critical functions layer used for providing the intended service of the application, and the healing layer that continuously monitors the correct execution of that application and generates health syndromes to heal any failure occurrence inside the functions layer. Finally, two industrial applications have been mapped on this architecture to date, and the authors believe the nexus of its concepts can positively impact the next generation of critical CPSs in industrial automation.
... The hybrid unit represents a first line of defense against discovered transient faults, defined as temporary deviations in register values. The proposed architecture is designed to realize the functionality of safety-critical digital embedded systems operating in harsh environments, so radiation-induced transient faults that can occur at unpredictable times are the most prevalent fault type [20], [21], [22], and the process of correcting them at the functional block layer is most effective. If they are not tolerated at the block level, their wrong values will be sensitized at the output signals of the cell level, which is directly connected to the external world and can impact the safety of the public or the environment. ...
Preprint
Digital Embedded Devices of next-generation safety-critical industrial automation systems require high levels of survivability and resilience against hardware and software failures. One concept for achieving this requirement is the design of resilient and survivable digital embedded systems. In the last two decades, the development of self-healing digital systems based on molecular and cellular biology has received attention for the design of robust digital systems. However, many of these approaches have not been architected from the outset with safety in mind, nor have they been targeted for the applications of the automation community, where a significant need exists. This paper presents a new self-healing hardware architecture inspired by the way nature responds, defends, and heals: the stem cells in the immune system of living organisms, the life cycle of the living cell, and the pathway from deoxyribonucleic acid (DNA) to protein. The proposed architecture integrates cellular-based biological concepts, traditional fault tolerance techniques, and operational schematics for the international standard IEC 61131-3 to facilitate adoption in the automation industry and safety-critical applications. To date, two industrial applications have been mapped on the proposed architecture; they are capable of tolerating a significant number of faults that can stem from harsh environmental changes and external disturbances, and we believe the nexus of its concepts can positively impact the next generation of critical systems in the automation industry.
... The BioSymPLe architecture is designed to realize the functionality of critical systems operating in harsh environments, and radiation-induced transient faults that can occur at unpredictable times are the most prevalent fault type [50], [19], [20]. The process of detecting/correcting the transient faults at the register level inside the functional block is most effective and represents the first line of defense for ...
Thesis
Full-text available
Digital Instrumentation and Control (I&C) systems in safety-related applications of next-generation industrial automation systems require high levels of resilience against different fault classes. One of the more essential concepts for achieving this goal is the notion of resilient and survivable digital I&C systems. In recent years, self-healing concepts based on biological physiology have received attention for the design of robust digital systems. However, many of these approaches have not been architected from the outset with safety in mind, nor have they been targeted for the automation community where a significant need exists. This dissertation presents a new self-healing digital I&C architecture called BioSymPLe, inspired by the way nature responds, defends, and heals: the stem cells in the immune system of living organisms, the life cycle of the living cell, and the pathway from deoxyribonucleic acid (DNA) to protein. The BioSymPLe architecture integrates biological concepts, fault tolerance techniques, and operational schematics for the international standard IEC 61131-3 to facilitate adoption in the automation industry. BioSymPLe is organized into three hierarchical levels: the local function migration layer at the top, the critical service layer in the middle, and the global function migration layer at the bottom. The local layer is used to monitor the correct execution of functions at the cellular level and to activate healing mechanisms at the critical service level. The critical layer allocates a group of functional B cells, which represent the building block that executes the intended functionality of the critical application based on the expression of DNA genetic codes stored inside each cell. The global layer uses the concept of embryonic stem cells, differentiating these types of cells to repair faulty T cells and supervising all repair mechanisms. 
Finally, two industrial applications have been mapped on the proposed architecture; they are capable of tolerating a significant number of faults (transient, permanent, and hardware common cause failures, CCFs) that can stem from environmental disturbances, and we believe the nexus of its concepts can positively impact the next generation of critical systems in the automation industry.
... Recently, [Biswas, 2008; Wang, 2013] discuss the vulnerability behavior of faulty structures at the architectural level. [Polian, 2011] suggests a transient error model and analyzes the probability that soft errors are reflected at the output. ...
... As a result, all flip-flops (FFs) and input ports are sorted based on their significance to the final output. It analyzes the algorithmic effects of errors (error magnitude), in contrast to [Polian, 2011], where only the probability of detectable errors was covered. ...
Article
Full-text available
With the increasing digital services demand, performance and power-efficiency become vital requirements for digital circuits and systems. However, the enabling CMOS technology scaling has been facing significant challenges of device uncertainties, such as process, voltage, and temperature variations. To ensure system reliability, worst-case corner assumptions are usually made in each design level. However, the over-pessimistic worst-case margin leads to unnecessary power waste and performance loss as high as 2.2x. Since optimizations are traditionally confined to each specific level, those safe margins can hardly be properly exploited. To tackle the challenge, it is therefore advised in this Ph.D. thesis to perform a cross-layer optimization for digital signal processing circuits and systems, to achieve a global balance of power consumption and output quality. To conclude, the traditional over-pessimistic worst-case approach leads to huge power waste. In contrast, the adaptive voltage scaling approach saves power (25% for the CORDIC application) by providing a just-needed supply voltage. The power saving is maximized (46% for CORDIC) when a more aggressive voltage over-scaling scheme is applied. These sparsely occurred circuit errors produced by aggressive voltage over-scaling are mitigated by higher level error resilient designs. For functions like FFT and CORDIC, smart error mitigation schemes were proposed to enhance reliability (soft-errors and timing-errors, respectively). Applications like Massive MIMO systems are robust against lower level errors, thanks to the intrinsically redundant antennas. This property makes it applicable to embrace digital hardware that trades quality for power savings.
... As technology feature size shrinks, increased frequency and reduced supply voltage result in more transient faults being propagated to latches or primary outputs of circuits and create more failures than before [2,3]. Therefore, reliability evaluation approaches and soft error rate (SER) analysis of logic circuits are needed to meet the increasing demand for reliable design [4,5]. The general definition of reliability is the probability of the correct functioning of a circuit, while the SER has been used as a measure of a circuit's vulnerability under the influence of soft errors. ...
Article
Full-text available
Reliability has been an important consideration in designing modern circuits due to the nanometric scaling of CMOS technology. This paper proposes a reliability evaluation approach for logic circuits based on transient faults propagation metrics (TFPMs). In this approach, the TFPMs of each node are calculated through reverse topological traversal of the target circuit using Boolean operations in parallel. Using these fault propagation features, the reliability of combinational circuits and full-scan sequential circuits is evaluated efficiently. Experimental results and statistical analysis show the proposed approach is about three orders of magnitude faster than Monte Carlo simulation (MCS) while maintaining accuracy.
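As a rough illustration of evaluating Boolean metrics "in parallel," machine words can pack many random samples so that one bitwise operation processes 64 trials at once. This is a sketch of the general bit-parallel idea only, not the paper's TFPM algorithm; all names here are made up for the example:

```python
import random

W = 64  # samples packed per machine word

def rand_word(p, rng, w=W):
    """Pack w Bernoulli(p) samples into one integer, one per bit."""
    word = 0
    for i in range(w):
        if rng.random() < p:
            word |= 1 << i
    return word

def popcount(x):
    """Count the set bits in x."""
    return bin(x).count('1')

def parallel_and_prob(pa, pb, trials=200, seed=0):
    """Estimate P(a AND b) for independent inputs: each Boolean
    `&` evaluates 64 sample pairs at once, instead of 64 scalar
    gate evaluations."""
    rng = random.Random(seed)
    ones = total = 0
    for _ in range(trials):
        a, b = rand_word(pa, rng), rand_word(pb, rng)
        ones += popcount(a & b)
        total += W
    return ones / total
```

With `pa = pb = 0.5`, the estimate converges to 0.25, the exact AND-gate output probability for independent inputs.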
... Reliability issues have become a vital concern for Very Large Scale Integration (VLSI) design due to the continued scaling of VLSI technology and supply voltage. Among these reliability issues, transient faults caused by environmental effects, such as electrical noise, particle strikes, and electromagnetic coupling, are primary failure mechanisms [1]. In contrast to permanent faults, transient faults are temporary deviations of a circuit's state from its correct or reference state. ...
Article
Full-text available
The reliability of Very Large Scale Integration (VLSI) circuits has become increasingly susceptible to transient faults induced by environmental noise with the scaling of technology. Some commonly used fault tolerance strategies require statistical methods to accurately estimate the fault rate in different parts of the logic circuit, and Monte Carlo (MC) simulation is often applied to complete this task. However, the MC method suffers from impractical computation costs due to the size of the circuits. Furthermore, circuit aging effects, such as negative bias temperature instability (NBTI), will change the characteristics of the circuit during its lifetime, leading to a change in the circuit’s noise margin. This change will increase the complexity of transient fault rate estimation tasks. In this paper, an NBTI-aware statistical analysis method based on probability voltage transfer characteristics is proposed for combinational logic circuits. This method can acquire accurate fault rates using a discrete probability density function approximation process, thus resolving the computation cost problem of the MC method. The proposed method can also consider aging effects and analyze statistical changes in the fault rates. Experimental results demonstrate that, compared to the MC simulation, our method can achieve computation times that are two orders of magnitude shorter while maintaining an error rate of less than 9%.
... To meet this increasing demand on reliable design, several analytical approaches have been proposed for the reliability evaluation [5][6][7][8][9][10][11][12][13][14][15] and soft error rate (SER) analysis of logic circuits [16][17][18][19][20][21][22][23][24][25][26]. Soft errors are typically caused by temporary environmental phenomena, such as external radiation or power supply noise [23]. ...
Article
Reliability is fast becoming a major concern due to the nanometric scaling of CMOS technology. Accurate analytical approaches for the reliability evaluation of logic circuits, however, have a computational complexity that generally increases exponentially with circuit size. This makes the reliability analysis of large circuits intractable. This paper initially presents novel computational models based on stochastic computation; using these stochastic computational models (SCMs), a simulation-based analytical approach is then proposed for the reliability evaluation of logic circuits. In this approach, signal probabilities are encoded in the statistics of random binary bit streams, and non-Bernoulli sequences of random permutations of binary bits are used for initial input and gate error probabilities. By leveraging the bit-wise dependencies of random binary streams, the proposed approach takes into account signal correlations and evaluates the joint reliability of multiple outputs. Therefore, it accurately determines the reliability of a circuit; its precision is only limited by the random fluctuations inherent in the stochastic sequences. Based on both simulation and analysis, the SCM approach offers both ease of implementation and accuracy of evaluation. The use of non-Bernoulli sequences as initial inputs further increases the evaluation efficiency and accuracy compared to the conventional use of Bernoulli sequences, so the proposed stochastic approach is scalable for analyzing large circuits. It can further account for various fault models as well as calculate the soft error rate (SER). These results are supported by extensive simulations and detailed comparison with existing approaches.
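The stream-encoding idea can be sketched as follows (an illustrative toy, not the authors' SCM implementation): a probability becomes a random bit stream, gates operate bitwise with an injected gate error rate, and reusing the same stream for a reconvergent signal preserves its correlation automatically.

```python
import random

def bitstream(p, n, rng):
    """Encode signal probability p as a length-n random bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def noisy_gate(fn, inputs, eps, rng):
    """Apply Boolean gate fn bit-by-bit across the input streams;
    each output bit flips with probability eps (the gate error rate)."""
    out = []
    for bits in zip(*inputs):
        v = fn(*bits)
        if rng.random() < eps:
            v ^= 1
        out.append(v)
    return out

def estimate(stream):
    """Decode a bit stream back to a probability."""
    return sum(stream) / len(stream)
```

Feeding the same stream into both inputs of a fault-free AND gate returns the stream's own probability rather than its square, which is how this representation captures reconvergent-fanout correlation rather than assuming independence.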