Article

Hybrid Lockstep Technique for Soft Error Mitigation


Abstract

This work presents the evaluation of a new dual-core lockstep hybrid approach aimed at improving fault tolerance in microprocessors. Our approach takes advantage of modern multicore processor resources to combine software-based lockstep with a custom hardware observer. The former duplicates data and instruction flows, while the latter is in charge of control-flow monitoring. The proposal has been implemented in a dual-core ARM microprocessor and validated with low-energy proton irradiation and emulated fault injection campaigns. The results show an improvement of one order of magnitude in the cross section of the benchmarks tested, even in the worst-case scenario.
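As a rough illustration of how the two mechanisms divide the work, the C sketch below shows only the software half: the critical computation is executed twice and the results compared, while a hypothetical observer_checkpoint() call marks each block boundary that a hardware control-flow monitor could track. Everything here (function names, the MMIO stub) is our assumption, not the authors' code.

```c
#include <stdio.h>
#include <stdint.h>

/* Critical computation to be protected. */
static int64_t critical_sum(const int32_t *v, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Stub for reporting a block ID to the hardware observer; on real
 * hardware this would be a volatile MMIO write the monitor checks. */
static void observer_checkpoint(uint32_t block_id)
{
    (void)block_id;
}

int main(void)
{
    int32_t data[4] = {1, 2, 3, 4};

    observer_checkpoint(1);
    int64_t r1 = critical_sum(data, 4);   /* first redundant execution  */
    observer_checkpoint(2);
    int64_t r2 = critical_sum(data, 4);   /* second redundant execution */
    observer_checkpoint(3);

    if (r1 != r2) {                       /* data-flow comparison point */
        fprintf(stderr, "lockstep mismatch detected\n");
        return 1;
    }
    printf("result = %lld\n", (long long)r1);
    return 0;
}
```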


... In this investigation, a portable software-based technique is presented that improves multi-core COTS processors by enabling programmed soft-error fault tolerance capabilities. It is an extension and improvement of our previous bare-metal approaches [3] and [4], which follow a two-thread duplication scheme with comparison and re-execution (multi-thread DWC-R). The novelty of the current work resides in re-targeting and adapting them to COTS devices with a higher number of cores, which better matches the current COTS landscape. ...
... A single-thread DWC version of the benchmark was added to the radiation campaign to evaluate the differences between them, as will be discussed in Section V. A multi-threaded version of this technique (i.e., multi-thread DWC-R) was studied at different protection granularity levels in [3]. Furthermore, the authors in [4] explored the combination of a two-thread DWC-R version with a custom IP for monitoring the control flow. The technique produced a performance overhead of 2.5× and offered improvements of one order of magnitude in the cross section of total errors. ...
... The benchmark was also tested in three different versions, depending on the hardening technique applied: original (mm), Duplication With Comparison and Re-execution (mmDWC), and Redundant Multi-Threading (mmRMT). Note that the DWC-R described in [4] and [6] used only one core to perform two identical executions of the critical section. A third execution was performed only if a mismatch was detected. ...
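The single-core DWC-R pattern described in this excerpt can be summarized in a few lines of C: run the critical section twice, compare, and trigger a third arbitration run only on mismatch. This is a hedged sketch; dwc_r() and dot_self() are illustrative names, not code from [4] or [6].

```c
#include <stdio.h>
#include <stdint.h>

typedef int64_t (*critical_fn)(const int32_t *, int);

/* DWC-R: execute twice, compare, and re-execute only on mismatch,
 * then take the 2-of-3 majority (if all three differ, b is returned
 * and a real system would flag an unrecoverable error instead). */
static int64_t dwc_r(critical_fn f, const int32_t *in, int n)
{
    int64_t a = f(in, n);
    int64_t b = f(in, n);
    if (a == b)
        return a;               /* fast path: replicas agree      */
    int64_t c = f(in, n);       /* third, arbitrating execution   */
    return (c == a) ? a : b;    /* majority vote over a, b and c  */
}

static int64_t dot_self(const int32_t *v, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)v[i] * v[i];
    return acc;
}

int main(void)
{
    int32_t data[4] = {1, 2, 3, 4};
    printf("%lld\n", (long long)dwc_r(dot_self, data, 4));
    return 0;
}
```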
... For instance, in [15], DCLS is successfully implemented using an ARM Cortex-A9 microprocessor and tested with heavy ions, resulting in a significant cross-section reduction of one order of magnitude. Another hybrid technique that combines a dual-core microprocessor with thread replication and a trace Intellectual Property (IP) observer is proposed and tested with protons in [16]. Both approaches rely on microsynchronization to achieve system reliability. ...
Article
Full-text available
In various fields, such as those with high reliability requirements, there is a growing demand for high-performance microprocessors. Whereas commercial microprocessors offer a good trade-off between cost, size, and performance, they often need to be adapted to meet the reliability demands of safety-critical applications. To address this challenge, a Supervised Triple Macrosynchronized Lockstep architecture for multicore processors is presented in this work. Multiple recovery mechanisms, including rollback and roll-forward, have been implemented to harden the system. By integrating these mechanisms, the microprocessor becomes more robust and capable of mitigating potential errors or failures that may occur during operation. A quad-core ARM Cortex-A53 processor has been used as a case study, and an extensive fault injection campaign in the register file has been conducted to evaluate the effectiveness of our proposed approach. The results show that the hardened system exhibits high reliability, with 100% error coverage and error correction capabilities of up to 86.40%.
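A minimal sketch of the rollback/roll-forward decision that such a triple lockstep makes at each macro-synchronization point might look as follows in C. The paper's actual mechanism operates on full architectural state; here a single 32-bit value stands in for a replica's state, and macro_sync() is an invented name.

```c
#include <stdio.h>
#include <stdint.h>

/* Vote over three replica states: with a 2-of-3 majority, repair the
 * outlier (roll-forward, returns 0); with no majority, restore every
 * replica from the last checkpoint (rollback, returns -1). */
static int macro_sync(uint32_t s[3], uint32_t checkpoint)
{
    uint32_t good;
    if (s[0] == s[1] || s[0] == s[2])
        good = s[0];
    else if (s[1] == s[2])
        good = s[1];
    else {
        s[0] = s[1] = s[2] = checkpoint;   /* rollback      */
        return -1;
    }
    s[0] = s[1] = s[2] = good;             /* roll-forward  */
    return 0;
}

int main(void)
{
    uint32_t state[3] = {7, 7, 9};         /* one corrupted replica */
    macro_sync(state, 0);
    printf("%u %u %u\n", state[0], state[1], state[2]);  /* 7 7 7 */
    return 0;
}
```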
Article
Full-text available
CdZnTe (CZT) is a II–VI compound semiconductor with a zinc blende structure and has a wide range of applications in nuclear radiation detectors. As device sizes reach the nanoscale, the displacement damage caused by high-energy proton irradiation in the space environment has become one of the main factors affecting the electrical properties of components. In this paper, the Monte Carlo toolkit Geant4 is used to simulate the types and proportions of physical processes generated by different particles incident on CZT crystals, as well as the proportion and depth distribution of non-ionizing energy loss (NIEL) for particles of different energies and incidence angles. The results show that the type, energy, and angle of the incident particles affect the proportion of energy deposition and its distribution with depth during irradiation, and that the physical effects differ under different irradiation conditions. The simulation results are a useful reference for studying the displacement damage law of CZT irradiated by particles under different conditions and the stability of CZT detectors in radiation environments.
Article
A software technique is presented to protect commercial multi-core microprocessors against radiation-induced soft errors. The large time overheads associated with conventional software redundancy techniques limit the feasibility of advanced critical electronic systems. In our approach, redundant bare-metal threads are used, so that critical computation is distributed over the different microprocessor cores. In doing so, software redundancy can be applied to Commercial Off-The-Shelf (COTS) microprocessors without incurring high performance penalties. The proposed technique was evaluated using a low-cost single-board computer (Raspberry Pi 4) under neutron irradiation. The results showed that the Redundant Multi-Threading versions detected and recovered from all the Silent Data Corruption (SDC) events, and only increased HANG sensitivity with respect to the unhardened original versions. In addition, higher Mean Work To Failure (MWTF) estimations are achieved with our bare-metal technique than with state-of-the-art bare-metal software-based techniques that only implement temporal redundancy.
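The core idea, distributing redundant executions over separate cores and comparing their outputs, can be sketched with POSIX threads standing in for the paper's bare-metal threads. This is an assumption-laden illustration (no core pinning, trivial workload), not the evaluated implementation.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>

static int32_t input[4] = {1, 2, 3, 4};

/* One redundant replica of the critical computation. */
static void *replica(void *out)
{
    int64_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (int64_t)input[i] * input[i];
    *(int64_t *)out = acc;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int64_t r1 = 0, r2 = 0;

    /* Run the two replicas concurrently on (ideally) different cores. */
    pthread_create(&t1, NULL, replica, &r1);
    pthread_create(&t2, NULL, replica, &r2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    if (r1 != r2) {             /* SDC detected: recover by re-executing */
        int64_t r3 = 0;
        replica(&r3);
        r1 = (r3 == r2) ? r2 : r1;
    }
    printf("result = %lld\n", (long long)r1);
    return 0;
}
```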
Article
Reliability requirements are critical in applications used in harsh environments. Although commercial microprocessors offer a good trade-off between cost, size, and performance, they must be tailored to meet tight reliability requirements. This work focuses on the reliability of a real data-intensive space application. As a case study we have selected an ESA space benchmark that processes images of a near-infrared detector (NIR-Hawaii), involving a large quantity of data with a high computational load. The reliability of the system has been accomplished through an improved macro-synchronized lockstep hardening technique that takes into account the specific needs of data-intensive applications. The implementation platform is a commercial dual-core ARM Cortex-A9 microprocessor. Extensive fault injection campaigns have been carried out in both memory and the register file to evaluate the proposed approach. Experimental results demonstrate the high reliability of the proposed hardened system, with error detection capabilities of 100% and improved system recovery capabilities.
Article
Full-text available
This article presents a software protection technique against radiation-induced faults that is based on a multi-threaded strategy. Data triplication and instruction-flow duplication or triplication are used to improve system reliability and thus ensure correct system operation. To achieve this objective, a relaxed lockstep model is defined to synchronize the execution of the redundant threads and of the variables under protection on different processing units. The evaluation was performed by means of simulated fault injection campaigns in a multi-core ARM system. Results show that, although these techniques (Duplication With Comparison and Re-execution, DWC-R, and Triple Modular Redundancy, TMR) imply an evident overhead in memory and instructions, spreading the replicas across different instruction flows not only produces results similar to those of the classic techniques, but also improves computational and recovery time in the presence of soft errors. In addition, this paper highlights the importance of protecting memory-allocated data, since instruction-flow triplication alone is not enough to improve overall system reliability.
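Data triplication of the kind this abstract describes is often realized by storing each protected variable three times and voting on every read. The following self-contained C sketch (our own names and layout, not the paper's) shows a bitwise-majority read that corrects a single corrupted replica.

```c
#include <stdio.h>
#include <stdint.h>

/* A protected variable stored as three replicas. */
typedef struct { int32_t r[3]; } tmr_int;

static void tmr_write(tmr_int *v, int32_t x)
{
    v->r[0] = v->r[1] = v->r[2] = x;
}

/* Bitwise majority: each bit of the result is the vote of the three
 * replicas, so any single corrupted replica is masked. */
static int32_t tmr_read(tmr_int *v)
{
    return (v->r[0] & v->r[1]) | (v->r[0] & v->r[2]) | (v->r[1] & v->r[2]);
}

int main(void)
{
    tmr_int x;
    tmr_write(&x, 42);
    x.r[1] ^= 0x4;                    /* emulate a bit-flip in replica 1 */
    printf("%d\n", tmr_read(&x));     /* still prints 42                 */
    return 0;
}
```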
Conference Paper
Full-text available
This paper introduces the ARM Triple Core Lock-Step (TCLS) architecture, which builds on the industry success of the ARM Cortex-R5 Dual-Core Lock-Step (DCLS) processor currently used in safety-critical real-time applications. The TCLS architecture adds a third redundant CPU to the DCLS Cortex-R5 system to achieve fail-functional capabilities and hence increase the availability of the system. The TCLS architecture allows for transparent, quicker, and more reliable resynchronization of the CPUs in the event of an error, as the erroneous CPU can be identified by comparing its outputs and the correct architectural state can be restored from one of the other two functionally correct CPUs. The quick resynchronization is also possible because there is no need to correct the state of the cache memories, which are shared and isolated from the CPUs. As the TCLS architecture provides reliability at the system level, individual CPUs do not need to be fault-tolerant and can be implemented using a commercial process technology that provides higher performance and better energy and cost efficiency than rad-hard process technology. The expectation is that TCLS could increase reliability in the industrial applications where ARM processors are mainstream (e.g., automotive), as well as in new applications where there is currently no presence of ARM technology (e.g., space).
Article
Full-text available
In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-oblivious nature of current HPC programming paradigms and execution environments, HPC applications are insufficiently equipped to deal with errors. We believe that HPC applications should be enabled with capabilities to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. However, the use of complete redundancy incurs significant overhead to the application performance. In this paper we present RedThreads, an interface that provides application-level fault detection and correction based on RMT, but applies the thread-level redundancy adaptively. We describe the RedThreads syntax and semantics, and the supporting compiler infrastructure and runtime system. Our approach enables application programmers to scope the extent of redundant computation. Additionally, the runtime system permits the use of RMT to be dynamically enabled, or disabled, based on the resiliency needs of the application and the state of the system. Our experimental results demonstrate how adaptive RMT exploits programmer insight and runtime inference to dynamically navigate the trade-off space between an application’s resilience coverage and the associated performance overhead of redundant computation.
Article
Full-text available
This paper presents and justifies an open benchmark suite named BEEBS, targeted at evaluating the energy consumption of embedded processors. We explore the possible sources of energy consumption, then select individual benchmarks from contemporary suites to cover these areas. Version one of BEEBS is presented here and contains 10 benchmarks that cover a wide range of typical embedded applications. The benchmark suite is portable across diverse architectures and is freely available. The benchmark suite is extensively evaluated, and the properties of its constituent programs are analysed. Using real hardware platforms we show case examples which illustrate the difference in power dissipation between three processor architectures and their related ISAs. We observe significant differences in the average instruction dissipation between the architectures of 4.4×, specifically 170 µW/MHz (ARM Cortex-M0), 65 µW/MHz (Adapteva Epiphany), and 88 µW/MHz (XMOS XS1-L1).
Article
Full-text available
The growing availability of embedded processors inside FPGAs provides unprecedented flexibility for system designers. The use of such devices for space or mission-critical applications, however, is being delayed by the lack of effective low-cost techniques to mitigate radiation-induced errors. This paper presents a non-invasive approach for implementing fault-tolerant systems based on COTS processors embedded in FPGAs, using lockstep in conjunction with checkpoint and rollback recovery. The proposed approach does not require modifications to the processor architecture or the application software. The experimental validation of this approach through fault injection is described, the corresponding results are discussed, and the addition of a write history table as a means to reduce the performance overhead imposed by previous implementations is proposed and evaluated.
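Checkpoint and rollback recovery of the sort combined with lockstep here can be miniaturized with setjmp()/longjmp(): committed state is saved at each checkpoint, and a mismatch between the duplicated executions of an interval restores it and retries. The fault injection and the variables involved are purely illustrative assumptions.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf rollback_point;
static int saved_i = 0, saved_acc = 0;    /* checkpointed state           */
static int injected = 1;                  /* inject one fault for the demo */

int main(void)
{
    int i, acc;

    setjmp(rollback_point);               /* checkpoint: rollback target  */
    i = saved_i;                          /* restore checkpointed state   */
    acc = saved_acc;

    for (; i < 8; i++) {
        int a = acc + i;                  /* first execution of interval  */
        int b = acc + i;                  /* duplicated execution         */
        if (injected && i == 5) { b ^= 1; injected = 0; }  /* bit-flip    */
        if (a != b)
            longjmp(rollback_point, 1);   /* mismatch: roll back, retry   */
        acc = a;
        saved_i = i + 1;                  /* commit as the new checkpoint */
        saved_acc = acc;
    }
    printf("acc = %d\n", acc);            /* prints acc = 28              */
    return 0;
}
```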
Article
Full-text available
Embedded processors, such as the processor macros inside modern FPGAs, are becoming widely used in many applications. As soon as these devices are deployed in radiation environments, designers need hardening solutions to mitigate radiation-induced errors. When low-cost applications have to be developed, the traditional hardware redundancy-based approaches exploiting m-way replication and voting are no longer viable because they are too expensive, and new mitigation techniques have to be developed. In this paper we present a new approach, based on processor duplication, checkpoint, and rollback, to detect and correct soft errors affecting the memory elements of embedded processors. Preliminary fault injection results obtained on a PowerPC-based system confirm the efficiency of the approach.
Conference Paper
Full-text available
To improve performance and reduce power, processor designers employ advances that shrink feature sizes, lower voltage levels, reduce noise margins, and increase clock rates. However, these advances make processors more susceptible to transient faults that can affect correctness. While reliable systems typically employ hardware techniques to address soft errors, software techniques can provide a lower-cost and more flexible alternative. This paper presents a novel, software-only, transient-fault-detection technique called SWIFT. SWIFT efficiently manages redundancy by reclaiming unused instruction-level resources present during the execution of most programs. SWIFT also provides a high level of protection and performance with an enhanced control-flow checking mechanism. We evaluate an implementation of SWIFT on an Itanium 2 which demonstrates exceptional fault coverage with a reasonable performance cost. Compared to the best known single-threaded approach utilizing an ECC memory system, SWIFT demonstrates a 51% average speedup.
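Two of SWIFT's ingredients, duplicated computation with comparison before stores and signature-based control-flow checking, can be imitated at the source level as below. Real SWIFT is a compiler transformation evaluated on Itanium 2; this C sketch with an ENTER_BLOCK/CHECK_BLOCK macro pair is only a conceptual analogue with invented names.

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t sig;                      /* runtime control-flow signature */

#define ENTER_BLOCK(id)  (sig ^= (id))    /* signature update at block entry */
#define CHECK_BLOCK(id)  do { if (sig != (id)) fault(); } while (0)

static void fault(void)
{
    fprintf(stderr, "error detected\n");
}

int main(void)
{
    int x = 7, x_dup = 7;                 /* original and shadow copies     */

    sig = 0;
    ENTER_BLOCK(0xA1);
    CHECK_BLOCK(0xA1);                    /* confirms control reached here  */

    int y     = x * 3 + 1;                /* original computation           */
    int y_dup = x_dup * 3 + 1;            /* duplicated computation         */

    if (y != y_dup)                       /* compare before the "store"     */
        fault();
    else
        printf("y = %d\n", y);
    return 0;
}
```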
Article
Full-text available
Full system simulation seeks to strike a balance between accuracy and performance. Many of its possibilities have been obvious to practitioners in both academia and industry for quite some time, perhaps decades, but Simics supports more of these possibilities within a single framework than other tools do. Simics is a platform for full system simulation that can run actual firmware and completely unmodified kernel and driver code. It is sufficiently abstract to achieve tolerable performance levels, and it provides both functional accuracy for running commercial workloads and sufficient timing accuracy to interface to detailed hardware models. Simics can also run a heterogeneous network of systems from different vendors within the same framework. Exceptionally fast, Simics makes it easy to add new components and leverage older ones at a practical abstraction level. It offers a platform with a rich API and a powerful scripting environment for use in a broad range of applications.
Article
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce higher soft error rates. This trend makes reliability a primary design constraint for computer systems. Redundant multithreading (RMT) makes use of parallelism in modern systems by employing thread-level time redundancy for fault detection and recovery. RMT can detect faults by running identical copies of the program as separate threads in parallel execution units with identical inputs and comparing their outputs. In this article, we present a survey of RMT implementations at different architectural levels with several design considerations. We explain the implementations in seminal papers and their extensions and discuss the design choices employed by the techniques. We review both hardware and software approaches by presenting the main characteristics and analyze the studies with different design choices regarding their strengths and weaknesses. We also present a classification to help potential users find a suitable method for their requirement and to guide researchers planning to work on this area by providing insights into the future trend.
Article
This paper presents a solution for error detection in ARM microprocessors based on the use of the trace infrastructure. This approach uses the Program and Instrumentation Trace Macrocells that are part of ARM's CoreSight architecture to detect control-flow and data-flow errors, respectively. The proposed approach has been tested with low-energy protons. Experimental results demonstrate high accuracy with up to 95% of observed errors detected in a commercial microprocessor with no hardware modification. In addition, it is shown how the proposed approach can be useful for further analysis and diagnosis of the cause of errors.
Conference Paper
Soft errors are one of the significant design technology challenges at smaller technology nodes and especially in radiation environments. This paper presents a particular class of approaches to provide reliability against radiation-induced soft errors. The paper provides a review of the lockstep mechanism across different levels of design abstraction: processor design, architectural level, and the software level. This work explores techniques providing modifications in the processor pipeline, techniques allied with FPGA dynamic reconfiguration strategies and different types of spatial redundancy.
Article
This work proposes a methodology to diagnose radiation-induced faults in a microprocessor using the hardware trace infrastructure. The diagnosis capabilities of this approach are demonstrated for an ARM microprocessor under neutron and proton irradiation campaigns. Experimental results demonstrate that the execution status in the precise moment that the error occurred can be reconstructed, so that error diagnosis can be achieved.
Article
This work presents a hybrid error detection architecture that uses ARM PTM trace interface to observe ARM microprocessor behaviour. The proposed approach is suitable for COTS microprocessors because it does not modify the microprocessor architecture and is able to detect errors thanks to the reuse of its trace subsystem. Validation has been performed by proton irradiation and fault injection campaigns on a Zynq AP SoC including a Cortex-A9 ARM microprocessor and an implementation of the proposed hardware monitor in programmable logic. Experimental results demonstrate that a high error detection rate can be achieved on a commercial microprocessor.
Article
This paper presents a Dual-Core LockStep (DCLS) implementation to protect hard-core processors against radiation-induced soft errors. The proposed DCLS is applied to an ARM Cortex-A9 embedded processor. Different software optimizations were evaluated to assess their impact on performance and fault tolerance. Heavy-ion experiments and fault injection emulation were performed to analyze the system susceptibility to errors and the DCLS performance. Results show that the approach is able to decrease the system cross-section and achieve high protection against errors. The DCLS successfully protects the system from up to 78% of the injected faults. The execution performance analysis shows that by reducing the number of verifications and augmenting the block partition execution time it is possible to increase the system reliability with minimal performance losses.
Article
This work analyzes the suitability of the SIMD (Single Instruction Multiple Data) extensions of current microprocessors for radiation environments. SIMD extensions are intended for software acceleration, focusing mostly on applications that require high computational effort, which are common in many fields such as computer vision. SIMD extensions use a dedicated coprocessor that makes it possible to pack several operations into one single extended instruction. Applications that require high performance could benefit from the use of SIMD coprocessors, but their reliability needs to be studied. In this work NEON™, the SIMD coprocessor of ARM microprocessors, has been selected as a case study to explore the behavior of SIMD extensions under radiation. Radiation experiments on ARM Cortex™-A9 microprocessors have been carried out with the objective of determining how the use of this kind of coprocessor can affect system reliability.
Article
This paper presents an analysis of the efficiency of traditional fault tolerance methods in parallel systems running on top of Linux. It starts by studying the occurrence of software errors in systems presenting different levels of complexity, from sequential bare-metal to parallel Linux applications. Then two traditional fault tolerance mechanisms (TMR and a DWC variant) are applied to the applications and their efficiency is analyzed. All cases were tested on the single- and dual-core versions of an ARM Cortex-A9 processor, which is embedded in many commercial SoCs. The OVP simulator platform is used to instantiate the processor model and to inject faults into the system. Faults are modeled as bit-flips in the processor registers. Results show that traditional fault tolerance algorithms are not efficient enough to protect a whole parallel system running on top of an operating system, given that the operating system itself is a major source of errors.
Conference Paper
Relentless technology scaling has made transistors more vulnerable to soft, or transient, errors. To keep systems robust against these, current error detection techniques use different types of redundancy at the hardware or the software level. A consequence of these additional protection mechanisms is that these systems tend to become slower. In particular, software error-detection techniques degrade performance considerably, limiting their uptake. This paper focuses on software redundant multi-threading error detection, a compiler-based technique that makes use of redundant cores within a multi-core system to perform error checking. Implementations of this scheme feature two threads that execute almost the same code: the main thread runs the original code and the checker thread executes code to verify the correctness of the original. The main thread communicates the values that require checking to the checker thread to use in its comparisons. We identify a major performance bottleneck in existing schemes: poorly performing inter-core communication and the generated code associated with it. Our study shows this is a major performance impediment within existing techniques since the two threads require extremely fine-grained communication, on the order of every few instructions. We alleviate this bottleneck with a series of code generation optimisations at the compiler level. We propose COMET (Communication-Optimised Multi-threaded Error-detection Technique), which improves performance across the NAS parallel benchmarks by 31.4% (on average) compared to the state-of-the-art, without affecting fault-coverage.
Article
Commercially available microprocessors could be useful to the space community for noncritical computations. There are many possible components that are smaller, lower-power, and less expensive than traditional radiation-hardened microprocessors. Many commercial microprocessors have issues with single-event effects (SEEs), such as single-event upsets (SEUs) and single-event transients (SETs), that can cause the microprocessor to calculate an incorrect result or crash. In this paper we present the Trikaya technique for masking SEUs and SETs through software mitigation techniques. Test results show that this technique can be very effective at masking errors, making it possible to fly these microprocessors for a variety of missions.
Article
Software-implemented fault tolerance (SIFT) mechanisms make it possible to tolerate transient hardware faults in commercial off-the-shelf (COTS) systems without using specialized resilient hardware. Unfortunately, existing SIFT methods at both the compiler and the operating system levels are often restricted to single-threaded applications and hence do not apply to multithreaded software on modern multicore platforms. We present RomainMT, an operating system service that provides replication for unmodified multithreaded applications. Replicating these programs is challenging, because scheduling-induced non-determinism may cause replicated threads to execute different valid code paths. This complicates the distinction between valid behavior and the effects of hardware errors. RomainMT solves these problems by transparently making multithreaded execution deterministic. We present two alternative mechanisms that differ in the assumptions made about the respective applications and investigate their performance implications. Our evaluation using the SPLASH-2 benchmark suite shows that the overhead for triple-modular redundancy (TMR) is 24% for applications with two application threads and 65% for four application threads.
Conference Paper
To protect processor logic from soft errors, multicore redundant architectures execute two copies of a program on separate cores of a chip multiprocessor (CMP). Maintaining identical instruction streams is challenging because redundant cores operate independently, yet must still receive the same inputs (e.g., load values and shared-memory invalidations). Past proposals strictly replicate load values across two cores, requiring significant changes to the highly-optimized core. We make the key observation that, in the common case, both cores load identical values without special hardware. When the cores do receive different load values (e.g., due to a data race), the same mechanisms employed for soft error detection and recovery can correct the difference. This observation permits designs that relax input replication, while still providing correct redundant execution. In this paper, we present Reunion, an execution model that provides relaxed input replication and preserves the existing memory interface, coherence protocols, and consistency models. We evaluate a CMP-based implementation of the Reunion execution model with full-system, cycle-accurate simulation. We show that the performance overhead of relaxed input replication is only 5% and 6% for commercial and scientific workloads, respectively.
Article
In this paper, we propose a new approach to implementing a reliable softcore processor on SRAM-based FPGAs, which can mitigate radiation-induced temporary faults (single-event upsets, SEUs) at moderate cost. A new Enhanced Lockstep scheme built using a pair of MicroBlaze cores is proposed and implemented on a Xilinx Virtex-5 FPGA. Unlike the basic lockstep scheme, ours can detect and eliminate internal temporary configuration upsets without interrupting normal functioning. Faults are detected and eliminated using a Configuration Engine built on the basis of the PicoBlaze core which, to avoid a single point of failure, is implemented as fault-tolerant using triple modular redundancy (TMR). The softcore processor can recover from configuration upsets through partial reconfiguration combined with roll-forward recovery. SEUs affecting logic, which are significantly less likely than those affecting configuration, are handled by checkpointing and rollback. Finally, to handle permanent faults, a tiling technique is also proposed. The new Enhanced Lockstep scheme requires significantly shorter error recovery time than the conventional lockstep scheme and uses a significantly smaller number of slices than known TMR-based designs (although at the cost of longer error recovery time). The efficiency of the proposed approach was validated through fault injection experiments.
Article
Nowadays, a number of processor cores are available, either as soft intellectual property (IP) cores or as hard macros that can be employed in developing new systems on a chip. Developers of applications targeting harsh environments like the atmospheric or space radiation environments may benefit from the computing power of processor cores, provided that suitable techniques are available for guaranteeing their correct operation in the presence of the ionizing radiation that abounds in such environments. In this paper, we describe a design flow and a hardware/software architecture to successfully deploy processor IP cores in harsh environments. Experimental data are provided that confirm the robustness of the presented architecture with respect to transient errors induced by radiation and suggest the possibility of employing such architectures in deep-space exploration missions.
Conference Paper
An FPGA-based Linux test-bed was constructed for the purpose of measuring its sensitivity to single-event upsets. The test-bed consists of two ML410 Xilinx development boards connected using a 124-pin custom connector board. The Design Under Test (DUT) consists of the "hard core" PowerPC, running the Linux OS, and several peripherals implemented in "soft" (programmable) logic. Faults were injected via the Internal Configuration Access Port (ICAP). The experiments performed here demonstrate that the Linux-based system was sensitive to 92,542 upsets, less than 0.7 percent of all tested bits. Each sensitive bit in the bit-stream is mapped to the resource and user module that it configures. A density metric for comparing the reliability of modules within the system is presented.
Article
Higher transistor counts, lower voltage levels, and reduced noise margins increase the susceptibility of multicore processors to transient faults. Redundant hardware modules can detect such faults, but software techniques are more appealing for their low cost and flexibility. Recent software proposals have not achieved widespread acceptance because they either increase register pressure, double memory usage, or are too slow in the absence of hardware extensions. This paper presents DAFT, a fast, safe, and memory-efficient transient fault detection framework for commodity multicore systems. DAFT replicates computation across multiple cores and schedules fault detection off the critical path. Where possible, values are speculated to be correct and only communicated to the redundant thread at essential program points. DAFT is implemented in the LLVM compiler framework and evaluated using SPEC CPU2000 and SPEC CPU2006 benchmarks on a commodity multicore system. Evaluation results demonstrate that speculation allows DAFT to improve the performance of software redundant multithreading by 2.17× with no degradation of fault coverage.
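DAFT's leading/trailing thread arrangement can be caricatured with a semaphore-paced queue: the leading thread publishes values at essential program points, and a checker thread recomputes and compares them off the critical path. DAFT's real channel is compiler-generated and lock-free; the names and structure below are our assumptions.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 16
static int queue[N];
static sem_t filled;                      /* counts values ready to check */

static void *checker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N; i++) {
        sem_wait(&filled);                /* wait for the leading thread  */
        int expected = i * i;             /* redundant recomputation      */
        if (queue[i] != expected)
            fprintf(stderr, "soft error detected at %d\n", i);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    sem_init(&filled, 0, 0);
    pthread_create(&t, NULL, checker, NULL);
    for (int i = 0; i < N; i++) {         /* leading (main) computation   */
        queue[i] = i * i;
        sem_post(&filled);                /* publish value to the checker */
    }
    pthread_join(t, NULL);
    sem_destroy(&filled);
    puts("done");
    return 0;
}
```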
Conference Paper
Aggressive CMOS scaling will make future chip multiprocessors (CMPs) increasingly susceptible to transient faults, hard errors, manufacturing defects, and process variations. Existing fault-tolerant CMP proposals that implement dual modular redundancy (DMR) do so by statically binding pairs of adjacent cores via dedicated communication channels and buffers. This can result in unnecessary power and performance losses in cases where one core is defective (in which case the entire DMR pair must be disabled), or when cores exhibit different frequency/leakage characteristics due to process variations (in which case the pair runs at the speed of the slowest core). Static DMR also hinders power density/thermal management, as DMR pairs running code with similar power/thermal characteristics are necessarily placed next to each other on the die. We present dynamic core coupling (DCC), an architectural technique that allows arbitrary CMP cores to verify each other's execution while requiring no static core binding at design time or dedicated communication hardware. Our evaluation shows that the performance overhead of DCC over a CMP without fault tolerance is 3% on SPEC2000 benchmarks, and is within 5% for a set of scalable parallel scientific and data mining applications with up to eight threads (16 processors). Our results also show that DCC has the potential to significantly outperform existing static DMR schemes.
Transient fault detection via simultaneous multithreading
  • S. K. Reinhardt
  • S. S. Mukherjee