Workflow of OVF computation

Source publication
Article
Full-text available
As process technology scales, electronic devices become more susceptible to soft errors induced by radiation. Silent data corruption (SDC) is considered the most severe outcome incurred by soft errors. The effects of faulty variables on producing SDC vary widely. Without profiling the vulnerability of variables, the derived detectors often incur low...

Context in source publication

Context 1
... the calculation of OVF is based on the eDDG. The whole process has three steps (see Fig. 2). ...
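
Neither OVF nor the eDDG is defined in this excerpt, so the following Python sketch only illustrates the general idea of a dependence-graph-based vulnerability score: a variable's score is taken to be the fraction of program outputs its value can reach, one plausible ingredient of such a metric. All names and data are illustrative.

    # Hypothetical vulnerability score over a dynamic dependence graph
    # (DDG): edges point from a definition to its uses, and a fault in
    # a node can only cause an SDC in the outputs it reaches.
    def reachable_outputs(ddg, node, outputs):
        """Return the set of output nodes reachable from `node` in `ddg`."""
        seen, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(ddg.get(n, ()))
        return seen & outputs

    def vulnerability_scores(ddg, outputs):
        """Share of outputs a fault in each variable can reach."""
        nodes = set(ddg) | {m for succs in ddg.values() for m in succs}
        return {n: len(reachable_outputs(ddg, n, outputs)) / len(outputs)
                for n in nodes}

    # Toy DDG: b feeds both outputs, so a fault in b is the most exposed.
    ddg = {"a": ["c"], "b": ["c", "d"], "c": ["out1"], "d": ["out2"]}
    print(vulnerability_scores(ddg, outputs={"out1", "out2"}))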

Citations

... Another reason for the discrepancy is the masking effect of logical instructions [15,35]. Assume that the result of a logical AND of two input registers is stored in the destination register. ...
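
The masking effect mentioned here is easy to quantify for AND: a single-bit fault in one source register propagates to the destination only where the other operand holds a 1. A minimal sketch:

    # Fraction of single-bit faults in one AND input that are masked by
    # the other operand: a flipped bit survives r1 & r2 only if the
    # other operand has a 1 in that position.
    def and_masking_rate(other_operand, width=32):
        masked = sum(1 for i in range(width) if not (other_operand >> i) & 1)
        return masked / width

    r2 = 0x0000FFFF
    print(and_masking_rate(r2))  # 0.5: faults in the upper 16 bits vanish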
Article
Full-text available
With aggressive technology scaling, soft errors have become a major threat in modern computing systems. Several techniques have been proposed in the literature and implemented in actual devices as countermeasures to this problem. However, their effectiveness in ensuring error-free computing cannot be ascertained without an accurate reliability estimation methodology. This can be achieved by using the vulnerability metric: the probability of system failure as a function of the time the program data are exposed to transient faults. In this work, we present the gemV-tool, a comprehensive toolset for estimating system vulnerability, based on the cycle-accurate gem5 simulator. The three main characteristics of the gemV-tool are: (i) fine-grained modeling: vulnerability modeling at a fine-grained granularity through the use of RTL abstraction; (ii) accurate modeling: accurate vulnerability calculation of speculatively executed instructions; and (iii) comprehensive modeling: vulnerability estimation of all the sequential elements in the out-of-order processor core. We validated our vulnerability models through extensive fault injection campaigns with <3% correlation error and 90% statistical confidence. Using the gemV-tool, we made the following observations: (i) the vulnerability of two microarchitectural configurations with similar performance can differ by 82%; (ii) the vulnerability of a processor can vary by more than 10×, depending on the implemented algorithm; and (iii) the vulnerability of each component in the processor varies significantly, depending on the ISA of the processor.
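
The vulnerability metric described above is commonly computed in bit-cycles: each datum contributes its width times the cycles during which a fault in it could still reach the program outcome. The sketch below shows that generic formulation; it is an illustration, not necessarily gemV-tool's exact model.

    # Generic bit-cycle vulnerability (a sketch, not gemV-tool's exact
    # model): sum the exposure windows of each datum, weighted by its
    # width, and normalize by the total bit-cycles of the structure.
    def vulnerability(windows, width, total_cycles, total_bits):
        """windows: (write_cycle, last_read_cycle) exposure intervals."""
        vulnerable_bit_cycles = sum((last - first) * width
                                    for first, last in windows)
        return vulnerable_bit_cycles / (total_cycles * total_bits)

    # One 32-bit register, live for cycles 10-50 and 70-80 out of 1000:
    print(vulnerability([(10, 50), (70, 80)], width=32,
                        total_cycles=1000, total_bits=32))  # 0.05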
... The detection probability of assertion detection (P_Mdet) can be calculated according to the concept of the error masking parameter [20], which refers to the error detection rate of the detection method. An example of P_Mdet calculation is shown here using the detection relationship of equality. ...
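
The paper's exact P_Mdet formula is not given in this excerpt, but the equality case can be illustrated by Monte Carlo: a fault in x is detected iff it breaks the asserted relation x == y. A hedged sketch:

    # Monte Carlo estimate of the detection probability of an equality
    # assertion: flip one random bit of x and count how often the
    # assertion x == y fires. (Illustration only; not the paper's
    # P_Mdet derivation.)
    import random

    def p_det_equality(x, y, width=32, trials=10_000):
        detected = 0
        for _ in range(trials):
            x_faulty = x ^ (1 << random.randrange(width))
            if x_faulty != y:          # the assertion catches the fault
                detected += 1
        return detected / trials

    # When x and y start equal, any single-bit flip breaks equality:
    print(p_det_equality(x=42, y=42))  # 1.0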
... When dor is 2 and dop is 0, the average values of reliability and overhead are 95.75% and 215.72%, respectively. S represents the average energy efficiency of each BB (see Eq. (20)). The higher S is, the better the trade-off effect is. ...
Article
Full-text available
The high-energy particles in the space environment can perturb integrated circuits, resulting in system errors or even failures, a phenomenon known as single event effects (SEE). To ensure the normal operation of space systems, it is first necessary to detect these errors. However, detection algorithms also bring additional overhead to the system and reduce its performance. Therefore, we aim to find a trade-off between reliability and performance. To this end, we propose a quantitative evaluation model for detection methods that evaluates the reliability gain of different detection methods under the same overhead. Our method allocates the optimal detection method to the corresponding code segment based on the quantitative results, thereby achieving a trade-off between reliability and performance. Experimental results show that the average energy efficiency of our trade-off method is 91.34%, which is 21.49% higher than that of the other methods.
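
A minimal sketch of the allocation idea in this abstract, assuming each code segment comes with candidate detection methods annotated with a reliability gain and an overhead (the method names and numbers below are made up): pick, per segment, the method with the best gain per unit overhead.

    # Per-segment allocation: choose the detection method with the best
    # reliability gain per unit overhead (illustrative data only).
    def allocate(segments):
        """segments: {name: [(method, reliability_gain, overhead), ...]}"""
        return {seg: max(cands, key=lambda m: m[1] / m[2])[0]
                for seg, cands in segments.items()}

    segments = {
        "BB1": [("duplication", 0.98, 1.9), ("assertion", 0.80, 1.1)],
        "BB2": [("duplication", 0.95, 2.1), ("assertion", 0.90, 1.2)],
    }
    print(allocate(segments))  # {'BB1': 'assertion', 'BB2': 'assertion'}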
Article
Full-text available
The recent trend in most processor manufacturing technologies has significantly increased the vulnerability of embedded systems operating in harsh environments against soft errors. These errors can cause Silent Data Corruptions (SDCs) that produce erroneous execution results silently, disturbing the system’s execution and potentially leading to severe financial, human or environmental disasters. The use of fault tolerance techniques that take into account the performance and constraints of safety-critical systems is therefore essential to improve system reliability efficiently. Given the significant overhead imposed by conventional techniques, e.g., performance loss, increased memory usage, and additional hardware costs, researchers have developed cost-effective software-based techniques for fault tolerance. However, as detection rates grow, these techniques can increase code size and execution time significantly, which creates a challenge. This paper proposes an automated framework for selective fault tolerance of SDCs in software running on different architectures. The framework comprises a sequence of several consecutive techniques executed automatically. It offers a software-based technique that operates at the microarchitecture level and evaluates the vulnerability of program instructions against SDC errors. The framework conducts vulnerability assessment at the binary code level using a non-intrusive, runtime fault injection mechanism. It can inject faults at different granularity levels to maximize fault activation, including fine-grained injection at specific instruction fields or encoding bits, and coarse-grained injection into the entire software system. The framework makes minor modifications to the software being tested, enabling it to run at near-native speed. When SDC vulnerable instructions are identified, the framework selectively protects them automatically using a compiler extension, achieving a more appropriate trade-off between SDC detection and overhead by avoiding overprotection. Our framework was evaluated by conducting a large number of fault injection-based experiments on real-world benchmark programs using the cycle-accurate Gem5 simulator. Leveraging the accurate vulnerability assessment results provided by our framework, the proposed selective technique reduces SDC errors by up to 99% by selectively protecting only 45% of the program’s static instructions, with a performance overhead ranging from 8% to 35%.
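
The selection step this abstract describes can be pictured as a simple ranking: order static instructions by the SDC rate observed in the fault-injection campaign and protect the top fraction (the abstract reports that protecting about 45% of static instructions removed up to 99% of SDC errors). A sketch with made-up rates:

    # Rank instructions by observed SDC rate and protect the top
    # `fraction` of them (data illustrative only).
    def select_for_protection(sdc_rates, fraction=0.45):
        ranked = sorted(sdc_rates, key=sdc_rates.get, reverse=True)
        return set(ranked[:max(1, int(len(ranked) * fraction))])

    rates = {"i1": 0.30, "i2": 0.02, "i3": 0.25, "i4": 0.00, "i5": 0.10}
    print(select_for_protection(rates))  # {'i1', 'i3'}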
Article
Full-text available
Radiation-induced soft errors, though rare, pose a significant threat to the reliability of systems. Assessing the intrinsic resilience of software to soft errors is therefore essential for building fault-tolerant systems cost-effectively. Analytical models, while fast, can be imprecise. In contrast, Fault Injection (FI) has been successfully applied as a mature method for reliability assessment. While high-level FI offers less accuracy, existing low-level techniques can enhance resilience assessment accuracy at the cost of desirable features such as fault coverage and low intrusiveness. Furthermore, these techniques are often driven by random FI campaigns, making it challenging to establish a clear correlation between application characteristics and resilience. This paper presents BiGResi, a versatile software-based framework for assessing software resilience. BiGResi overcomes the limitations of random, instruction type-agnostic FI techniques by evaluating resilience at a low-level granularity, considering instruction type and bit location. Furthermore, it targets the instruction set architecture (ISA), enhancing assessment accuracy by revealing architecturally visible faults. BiGResi employs a timing-based FI mechanism with negligible modifications to the target software, minimizing intrusiveness and ensuring near-native speed. BiGResi’s accuracy is empirically evaluated through many FI campaigns targeting different benchmarks with diverse characteristics. We observed that instruction types, ISA encoding bits, and bit location are key factors to consider when assessing software resilience. Finally, BiGResi’s effectiveness is demonstrated by selectively applying instruction protection, resulting in an average reduction of silent data corruptions (SDCs) by 73.80%, with a performance overhead of 15.46%. Furthermore, allowing a slightly higher overhead of 22% can improve the SDC detection rate by up to 93.83%.
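
BiGResi's actual mechanism is timing-based injection into native code, but the bookkeeping of instruction- and bit-aware FI can be mimicked on a toy interpreter: flip one bit of one instruction's result and compare the run against a golden output. Everything below is illustrative.

    # Toy instruction- and bit-aware fault injection: flip one bit of
    # one instruction's result and classify the run against the golden
    # output (SDC vs. masked).
    import operator

    def run(program, inject_at=None, bit=0):
        env = {}
        for idx, (dst, op, a, b) in enumerate(program):
            val = op(env.get(a, a), env.get(b, b))
            if idx == inject_at:
                val ^= 1 << bit        # single-bit upset in the result
            env[dst] = val
        return env["out"]

    prog = [("t", operator.add, 2, 3), ("out", operator.and_, "t", 1)]
    golden = run(prog)
    for idx in range(len(prog)):
        for bit in (0, 7):
            outcome = run(prog, inject_at=idx, bit=bit)
            print(idx, bit, "SDC" if outcome != golden else "masked")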
Article
Silent Data Corruption (SDC) is one of the most difficult data-flow errors to detect among those caused by single-event upsets in space radiation. To address the problem of multi-bit upsets causing program SDC, an instruction-level multi-bit SDC vulnerability prediction model based on one-class support vector machine classification is built from SDC vulnerability analysis, providing more accurate identification of vulnerable instructions. We then propose a multi-bit data-flow error detection method (SDCVA-OCSVM) that hardens the program with selective instruction redundancy, aiming to protect the data in the memory and registers used by the program. Comparative experiments verify that the method achieves a higher error detection rate with lower code-size and time overhead.
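
The abstract does not list the model's features, so the sketch below only shows the mechanics of one-class classification with scikit-learn's OneClassSVM: train on feature vectors of instructions known to be SDC-prone, then flag candidates that fall inside the learned region. The feature set (fan-out, live cycles, masking score) is an assumption for illustration.

    # One-class classification of SDC-vulnerable instructions; the
    # features and data are illustrative, not the paper's.
    import numpy as np
    from sklearn.svm import OneClassSVM

    # Rows: [fan_out, live_cycles, masking_score] of SDC-prone instructions.
    X_sdc = np.array([[4, 120, 0.10], [3, 90, 0.20], [5, 150, 0.05],
                      [4, 110, 0.15], [6, 140, 0.10]])
    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_sdc)

    candidates = np.array([[5, 130, 0.10],   # resembles the SDC-prone set
                           [1, 10, 0.90]])   # short-lived, heavily masked
    print(clf.predict(candidates))  # +1 = predicted SDC-prone, -1 = not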
Article
Silent data corruptions (SDCs) have always been regarded as a serious effect of radiation-induced faults. Traditional solutions based on redundancy are very expensive in terms of chip area, energy consumption, and performance. Consequently, providing low-cost and efficient approaches to cope with SDCs has received researchers’ attention more than ever. On the other hand, identifying SDC-prone data and instructions in a program is a very challenging issue, as it requires time-consuming fault injection into different parts of a program. In this article, we present a cost-efficient approach to detecting and mitigating SDCs in the whole program in the presence of multibit faults, without a fault injection process. This approach uses a combination of machine learning and a metaheuristic algorithm that predicts the SDC event rate of each instruction. The evaluation results show that the proposed approach provides a high detection accuracy of 99% while incurring a low performance overhead of 58%.
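
The two-stage structure described here (a learned per-instruction SDC-rate predictor plus a metaheuristic) can be sketched with a stand-in model and random hill climbing over the protection set under an overhead budget; the features, weights, and costs below are all invented for illustration.

    # Stage 1: a stand-in for a trained SDC-rate predictor.
    import random

    def predicted_sdc(features):
        fan_out, live_cycles, masking = features
        return 0.05 * fan_out + 0.001 * live_cycles - 0.1 * masking

    # Stage 2: random hill climbing over which instructions to protect,
    # maximizing predicted SDC coverage under an overhead budget.
    def hill_climb(instrs, costs, budget, steps=2000):
        def score(sel):
            if sum(costs[i] for i in sel) > budget:
                return -1.0                  # infeasible selection
            return sum(predicted_sdc(instrs[i]) for i in sel)
        best = set()
        for _ in range(steps):
            cand = set(best)
            cand ^= {random.choice(list(instrs))}  # toggle one instruction
            if score(cand) > score(best):
                best = cand
        return best

    instrs = {"i1": (4, 120, 0.1), "i2": (1, 10, 0.9), "i3": (5, 150, 0.0)}
    costs = {"i1": 0.2, "i2": 0.05, "i3": 0.3}
    print(hill_climb(instrs, costs, budget=0.4))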