Fig 2 - uploaded by Sudhakar Reddy
An n-bit ripple-carry adder. 

Source publication
Article
Full-text available
Transient or soft errors caused by various environmental effects are a growing concern in micro and nanoelectronics. We present a general framework for modeling and mitigating the logical effects of such errors in digital circuits. We observe that some errors have time-bounded effects; the system's output is corrupted for a few clock cycles, after...

Context in source publication

Context 1
... the n-bit ripple-carry (RC) adder of Fig. 2. It is constructed from a full adder FA_i, an RTL element realizing two functions, the sum z_i and the carry-out c_i. It has n FA_i stages, 2n + 1 inputs, and n + 1 outputs. There are 4n + 1 lines that can be faulty, so the total number of STFs is (4n + 1) * 2^(2n+2). We can compute the output error probabilities p_err(z_i) by ...
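The adder structure and fault-count arithmetic above can be sketched in a few lines (a minimal model; the function names are illustrative, not from the paper):

```python
def full_adder(a, b, cin):
    """One FA_i stage: returns (sum bit z_i, carry-out c_i)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a_bits, b_bits, c0=0):
    """n-bit ripple-carry adder over little-endian bit lists.
    2n + 1 inputs (a, b, c0) and n + 1 outputs (z, c_n), as in the text."""
    carry = c0
    z = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        z.append(s)
    return z, carry

def num_stfs(n):
    """Total STF count from the text: 4n + 1 faultable lines
    times the 2^(2n+2) factor, i.e. (4n + 1) * 2**(2n + 2)."""
    return (4 * n + 1) * 2 ** (2 * n + 2)
```

For example, `ripple_carry_add([1, 0, 1], [1, 1, 0])` adds 5 and 3 (little-endian) and yields sum bits `[0, 0, 0]` with final carry 1, i.e. 8.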

Similar publications

Conference Paper
Full-text available
The linear deterministic interference channel (LD-IC) with partial feedback is considered. Partial feedback for the LD-IC models a scenario in which the top l most-significant-bits of the channel output of receiver j are received as feedback at transmitter j, for j = 1, 2. The rationale for studying the LD-IC with partial feedback comes from the fa...
Article
Full-text available
Verification is one of the core steps in integrated circuits (ICs) manufacturing due to the multifarious defects and malicious hardware Trojans (HTs). In most cases, the effectiveness of the detection relies on the quality of the sample images of ICs. However, the high-precision and noiseless images are hard to capture due to the mechanical precisi...

Citations

... Soft errors caused by transient faults are the main cause of circuit failure, so researchers have focused on modeling transient faults and calculating the FPC [13,30,34,36,41]. The most common method is to simulate the operation of the circuit by injecting a large number of faults and then to calculate the FPC by determining whether each fault propagates to the outputs of the circuit. ...
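The fault-injection procedure described in that excerpt can be illustrated with a toy Monte Carlo sketch on a single full-adder stage (the line names and circuit model here are hypothetical illustrations, not taken from any of the cited papers):

```python
import random

# Hypothetical line-level model of one full-adder stage.
LINES = ['a', 'b', 'cin', 'x1', 'sum', 'a1', 'a2', 'cout']

def eval_fa(a, b, cin, flip=None):
    """Evaluate the full adder; if `flip` names a line, invert its
    value (a single transient fault) before it feeds downstream gates."""
    v = {}
    def drive(name, val):
        v[name] = val ^ 1 if name == flip else val
    drive('a', a)
    drive('b', b)
    drive('cin', cin)
    drive('x1', v['a'] ^ v['b'])
    drive('sum', v['x1'] ^ v['cin'])
    drive('a1', v['a'] & v['b'])
    drive('a2', v['x1'] & v['cin'])
    drive('cout', v['a1'] | v['a2'])
    return v['sum'], v['cout']

def estimate_fpc(trials=10_000, seed=0):
    """Monte Carlo FPC estimate: draw random inputs, inject one fault
    on a random line, and count trials where the faulty outputs
    differ from the golden (fault-free) outputs."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        a, b, cin = (rng.randint(0, 1) for _ in range(3))
        if eval_fa(a, b, cin, flip=rng.choice(LINES)) != eval_fa(a, b, cin):
            fails += 1
    return fails / trials
```

Real FPC estimation targets whole circuits and far larger fault lists; this only shows the inject-then-compare loop the excerpt describes.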
Article
Full-text available
As the feature size of integrated circuits decreases to the nanometer scale, process fluctuations, aging effects, and particle radiation have an increasing influence on the Failure Probability of Circuits (FPC), which brings severe challenges to chip reliability. The accurate and efficient estimation of logic circuit failure probability is a prerequisite for high-reliability design. It is difficult to calculate the FPC due to the large number of reconvergent fanout structures and the resulting signal correlation, particularly for Very Large-Scale Integrated (VLSI) circuits. Accordingly, this paper presents a Correlation Separation Approach (COSEA) that aims to efficiently and accurately estimate the FPC. The proposed COSEA divides the circuit into several different fanout-relevant and fanout-irrelevant circuits. Moreover, the error probability of the nodes is expressed as the result of interactions between different structures. As a result, the problem of signal correlation can be efficiently solved. Because the computational complexity of COSEA is linearly related to the scale of the circuit, it has good scalability. Compared with the Probabilistic Transfer Matrices (PTM) method, Monte Carlo simulation (MC), and other failure probability calculation methods in the literature, the experimental results show that our approach not only achieves fast speed and good scalability, but also maintains high accuracy.
... Since this self-healing architecture is designed to realize the functionality of critical systems operating in harsh environments, radiation-induced transient faults that can occur at unpredictable times are the most prevalent fault type [24]. Hardening the digital device at the circuit level [25] is more effective, and subsequent fault tolerance can occur even with an uncovered error. If such errors are not tolerated at the block level, their wrong values will be sensitized at the output signals of the cell level, which is directly connected to the external world through a network of digital I/O ports and can impact the safety of the public or the environment. ...
Article
Full-text available
Digital embedded systems in safety-critical cyber-physical-systems (CPSs) require high levels of resilience and robustness against different fault classes. In recent years, self-healing concepts based on biological physiology have received attention for the design and implementation of reliable systems. However, many of these approaches have not been architected from the outset with safety in mind, nor have they been targeted for the safety-related automation industry where a significant need exists. This study presents a new self-healing hardware architecture inspired by integrating biological concepts, fault tolerance techniques, and IEC 61131-3 operational schematics to facilitate adaption in automation and critical infrastructure. The proposed architecture is organised in two levels: the critical functions layer used for providing the intended service of the application, and the healing layer that continuously monitors the correct execution of that application and generates health syndromes to heal any failure occurrence inside the functions layer. Finally, two industrial applications have been mapped on this architecture to date, and the authors believe the nexus of its concepts can positively impact the next generation of critical CPSs in industrial automation.
... The hybrid unit represents a first line of defense against discovered transient faults, defined as temporary deviations in register values. The proposed architecture is designed to realize the functionality of safety-critical digital embedded systems operating in harsh environments, so radiation-induced transient faults that can occur at unpredictable times are the most prevalent fault type [20], [21], [22], and the process of correcting them at the functional block layer is most effective. If they are not tolerated at the block level, their wrong values will be sensitized at the output signals of the cell level, which is directly connected to the external world and can impact the safety of the public or the environment. ...
Preprint
Digital Embedded Devices of next-generation safety-critical industrial automation systems require high levels of survivability and resilience against hardware and software failures. One concept for achieving this requirement is the design of resilient and survivable digital embedded systems. In the last two decades, the development of self-healing digital systems based on molecular and cellular biology has received attention for the design of robust digital systems. However, many of these approaches have not been architected from the outset with safety in mind, nor have they been targeted for the applications of the automation community, where a significant need exists. This paper presents a new self-healing hardware architecture inspired by the way nature responds, defends, and heals: the stem cells in the immune system of living organisms, the life cycle of the living cell, and the pathway from deoxyribonucleic acid (DNA) to protein. The proposed architecture integrates cellular-based biological concepts, traditional fault tolerance techniques, and operational schematics for the international standard IEC 61131-3 to facilitate adoption in the automation industry and safety-critical applications. To date, two industrial applications have been mapped on the proposed architecture; they are capable of tolerating a significant number of faults that can stem from harsh environmental changes and external disturbances, and we believe the nexus of its concepts can positively impact the next generation of critical systems in the automation industry.
... The BioSymPLe architecture is designed to realize the functionality of critical systems operating in harsh environments, and radiation-induced transient faults that can occur at unpredictable times are the most prevalent fault type [50], [19], [20]. The process of detecting/correcting the transient faults at the register level inside the functional block is most effective and represents the first line of defense for ...
Thesis
Full-text available
Digital Instrumentation and Control (I&C) systems in safety-related applications of next-generation industrial automation systems require high levels of resilience against different fault classes. One of the more essential concepts for achieving this goal is the notion of resilient and survivable digital I&C systems. In recent years, self-healing concepts based on biological physiology have received attention for the design of robust digital systems. However, many of these approaches have not been architected from the outset with safety in mind, nor have they been targeted for the automation community where a significant need exists. This dissertation presents a new self-healing digital I&C architecture called BioSymPLe, inspired by the way nature responds, defends, and heals: the stem cells in the immune system of living organisms, the life cycle of the living cell, and the pathway from deoxyribonucleic acid (DNA) to protein. The BioSymPLe architecture integrates biological concepts, fault tolerance techniques, and operational schematics for the international standard IEC 61131-3 to facilitate adoption in the automation industry. BioSymPLe is organized into three hierarchical levels: the local function migration layer at the top, the critical service layer in the middle, and the global function migration layer at the bottom. The local layer is used to monitor the correct execution of functions at the cellular level and to activate healing mechanisms at the critical service level. The critical layer allocates a group of functional B cells, which represent the building block that executes the intended functionality of the critical application based on the expression of DNA genetic codes stored inside each cell. The global layer uses the concept of embryonic stem cells, differentiating these types of cells to repair faulty T cells and supervising all repair mechanisms. 
Finally, two industrial applications have been mapped on the proposed architecture; they are capable of tolerating a significant number of faults (transient, permanent, and hardware common cause failures, CCFs) that can stem from environmental disturbances, and we believe the nexus of its concepts can positively impact the next generation of critical systems in the automation industry.
... Recently, [Biswas, 2008; Wang, 2013] discuss the vulnerability behavior of faulty structures at the architectural level. [Polian, 2011] suggests a transient error model and analyzes the probability that soft errors are reflected at the output. ...
... As a result, all flip-flops (FFs) and input ports are sorted based on their significance to the final output. It analyzes the algorithmic effects of errors (error magnitude), in contrast to [Polian, 2011], where only the probability of detectable errors was covered. ...
Article
Full-text available
With the increasing digital services demand, performance and power-efficiency become vital requirements for digital circuits and systems. However, the enabling CMOS technology scaling has been facing significant challenges of device uncertainties, such as process, voltage, and temperature variations. To ensure system reliability, worst-case corner assumptions are usually made in each design level. However, the over-pessimistic worst-case margin leads to unnecessary power waste and performance loss as high as 2.2x. Since optimizations are traditionally confined to each specific level, those safe margins can hardly be properly exploited. To tackle the challenge, it is therefore advised in this Ph.D. thesis to perform a cross-layer optimization for digital signal processing circuits and systems, to achieve a global balance of power consumption and output quality. To conclude, the traditional over-pessimistic worst-case approach leads to huge power waste. In contrast, the adaptive voltage scaling approach saves power (25% for the CORDIC application) by providing a just-needed supply voltage. The power saving is maximized (46% for CORDIC) when a more aggressive voltage over-scaling scheme is applied. These sparsely occurred circuit errors produced by aggressive voltage over-scaling are mitigated by higher level error resilient designs. For functions like FFT and CORDIC, smart error mitigation schemes were proposed to enhance reliability (soft-errors and timing-errors, respectively). Applications like Massive MIMO systems are robust against lower level errors, thanks to the intrinsically redundant antennas. This property makes it applicable to embrace digital hardware that trades quality for power savings.
... As technology feature size shrinks, increased frequency and reduced supply voltage result in more transient faults being propagated to latches or primary outputs of circuits and create more failures than before [2,3]. Therefore, reliability evaluation approaches and soft error rate (SER) analysis of logic circuits are needed to meet the increasing demand for reliable design [4,5]. The general definition of reliability is the probability of the correct functioning of a circuit, while the SER has been used as a measure of a circuit's vulnerability under the influence of soft errors. ...
Article
Full-text available
Reliability has been an important consideration in designing modern circuits due to the nanometric scaling of CMOS technology. This paper proposes a reliability evaluation approach for logic circuits based on transient faults propagation metrics (TFPMs). In this approach, the TFPMs of each node are calculated through reverse topological traversal of the target circuit using Boolean operations in parallel. Using these fault propagation features, the reliability of combinational circuits and full-scan sequential circuits is evaluated efficiently. Experimental results and statistical analysis show the proposed approach is about three orders of magnitude faster than Monte Carlo simulation (MCS) while maintaining accuracy.
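As a rough illustration of evaluating Boolean metrics "in parallel," machine words can pack many random samples so that one bitwise operation processes 64 trials at once. This is a sketch of the general bit-parallel idea only, not the paper's TFPM algorithm; all names here are made up for the example:

```python
import random

W = 64  # samples packed per machine word

def rand_word(p, rng, w=W):
    """Pack w Bernoulli(p) samples into one integer, one per bit."""
    word = 0
    for i in range(w):
        if rng.random() < p:
            word |= 1 << i
    return word

def popcount(x):
    """Count the set bits in x."""
    return bin(x).count('1')

def parallel_and_prob(pa, pb, trials=200, seed=0):
    """Estimate P(a AND b) for independent inputs: each Boolean
    `&` evaluates 64 sample pairs at once, instead of 64 scalar
    gate evaluations."""
    rng = random.Random(seed)
    ones = total = 0
    for _ in range(trials):
        a, b = rand_word(pa, rng), rand_word(pb, rng)
        ones += popcount(a & b)
        total += W
    return ones / total
```

With `pa = pb = 0.5`, the estimate converges to 0.25, the exact AND-gate output probability for independent inputs.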
... Reliability issues have become a vital concern for Very Large Scale Integration (VLSI) design due to the continued scaling of VLSI technology and supply voltage. Among these reliability issues, transient faults caused by environmental effects, such as electrical noise, particle strikes, and electromagnetic coupling, are primary failure mechanisms [1]. In contrast to permanent faults, transient faults are temporary deviations of a circuit's state from its correct or reference state. ...
Article
Full-text available
The reliability of Very Large Scale Integration (VLSI) circuits has become increasingly susceptible to transient faults induced by environmental noise with the scaling of technology. Some commonly used fault tolerance strategies require statistical methods to accurately estimate the fault rate in different parts of the logic circuit, and Monte Carlo (MC) simulation is often applied to complete this task. However, the MC method suffers from impractical computation costs due to the size of the circuits. Furthermore, circuit aging effects, such as negative bias temperature instability (NBTI), will change the characteristics of the circuit during its lifetime, leading to a change in the circuit’s noise margin. This change will increase the complexity of transient fault rate estimation tasks. In this paper, an NBTI-aware statistical analysis method based on probability voltage transfer characteristics is proposed for combinational logic circuits. This method can acquire accurate fault rates using a discrete probability density function approximation process, thus resolving the computation cost problem of the MC method. The proposed method can also consider aging effects and analyze statistical changes in the fault rates. Experimental results demonstrate that, compared to the MC simulation, our method can achieve computation times that are two orders of magnitude shorter while maintaining an error rate of less than 9%.
... To meet this increasing demand on reliable design, several analytical approaches have been proposed for the reliability evaluation [5][6][7][8][9][10][11][12][13][14][15] and soft error rate (SER) analysis of logic circuits [16][17][18][19][20][21][22][23][24][25][26]. Soft errors are typically caused by temporary environmental phenomena, such as external radiation or power supply noise [23]. ...
Article
Reliability is fast becoming a major concern due to the nanometric scaling of CMOS technology. Accurate analytical approaches for the reliability evaluation of logic circuits, however, have a computational complexity that generally increases exponentially with circuit size. This makes the reliability analysis of large circuits intractable. This paper initially presents novel computational models based on stochastic computation; using these stochastic computational models (SCMs), a simulation-based analytical approach is then proposed for the reliability evaluation of logic circuits. In this approach, signal probabilities are encoded in the statistics of random binary bit streams, and non-Bernoulli sequences of random permutations of binary bits are used for initial input and gate error probabilities. By leveraging the bit-wise dependencies of random binary streams, the proposed approach takes into account signal correlations and evaluates the joint reliability of multiple outputs. Therefore, it accurately determines the reliability of a circuit; its precision is only limited by the random fluctuations inherent in the stochastic sequences. Based on both simulation and analysis, the SCM approach offers both ease of implementation and accuracy of evaluation. The use of non-Bernoulli sequences as initial inputs further increases the evaluation efficiency and accuracy compared to the conventional use of Bernoulli sequences, so the proposed stochastic approach is scalable for analyzing large circuits. It can further account for various fault models as well as calculate the soft error rate (SER). These results are supported by extensive simulations and detailed comparison with existing approaches.
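The stream-encoding idea can be sketched as follows (an illustrative toy, not the authors' SCM implementation): a probability becomes a random bit stream, gates operate bitwise with an injected gate error rate, and reusing the same stream for a reconvergent signal preserves its correlation automatically.

```python
import random

def bitstream(p, n, rng):
    """Encode signal probability p as a length-n random bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def noisy_gate(fn, inputs, eps, rng):
    """Apply Boolean gate fn bit-by-bit across the input streams;
    each output bit flips with probability eps (the gate error rate)."""
    out = []
    for bits in zip(*inputs):
        v = fn(*bits)
        if rng.random() < eps:
            v ^= 1
        out.append(v)
    return out

def estimate(stream):
    """Decode a bit stream back to a probability."""
    return sum(stream) / len(stream)
```

Feeding the same stream into both inputs of a fault-free AND gate returns the stream's own probability rather than its square, which is how this representation captures reconvergent-fanout correlation rather than assuming independence.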