Figure - available from: Journal of Electronic Testing
Example of different memory maps with and without program section replication. The first configuration shows the most widely used memory section distribution. The second configuration shows how the sections are replicated for three processing units. The last configuration shows how the replicas can be isolated in different memory sections so that computations can run over them in parallel.

Source publication
Article
This article presents a software protection technique against radiation-induced faults that is based on a multi-threaded strategy. Data triplication and instruction flow duplication or triplication techniques are used to improve system reliability and thus ensure correct system operation. To achieve this objective, a relaxed lockstep model to...
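The data triplication mentioned in the abstract is commonly paired with a majority voter. The following is a minimal sketch of that general idea, not the article's implementation: the type `tmr_u32` and the functions `tmr_write`, `tmr_vote`, and `tmr_read` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Three replicas of one protected 32-bit value (hypothetical layout). */
typedef struct {
    uint32_t copy[3];
} tmr_u32;

/* Store the same value in all three replicas. */
void tmr_write(tmr_u32 *v, uint32_t value) {
    assert(v != NULL);
    for (size_t i = 0; i < 3; i++) {
        v->copy[i] = value;
    }
}

/* Bitwise majority vote: each result bit takes the value that
 * at least two of the three replicas agree on. */
uint32_t tmr_vote(const tmr_u32 *v) {
    assert(v != NULL);
    uint32_t a = v->copy[0], b = v->copy[1], c = v->copy[2];
    return (a & b) | (a & c) | (b & c);
}

/* Read with repair: vote, rewrite the replicas if any of them
 * disagrees, and report whether a mismatch was found. */
bool tmr_read(tmr_u32 *v, uint32_t *out) {
    assert(v != NULL && out != NULL);
    uint32_t voted = tmr_vote(v);
    bool mismatch = v->copy[0] != voted ||
                    v->copy[1] != voted ||
                    v->copy[2] != voted;
    if (mismatch) {
        tmr_write(v, voted);
    }
    *out = voted;
    return mismatch;
}
```

A single corrupted replica is masked and repaired on the next read. In a real hardened build the replicas would likely need to be protected from compiler optimizations (for example with `volatile`) and placed in separate memory sections, which this sketch does not attempt.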

Similar publications

Conference Paper
This paper presents a high-throughput fault-resilient hardware implementation of the AES S-box, called HFS-box. We propose a deeply pipelined S-box at the gate level in which a novel DMR technique is used for fault correction. The proposed fault-resilient technique is based on fault correction in DMR implementation (FC-DMR) of each S-box's combined with...
Article
As soft errors are important design concerns in embedded systems, several schemes have been presented to protect embedded systems against them. Embedded systems can be protected by hardware redundancy; however, hardware-based protections cannot provide flexible protection due to hardware-only protection modifications. Further, they incur significan...
Article
Modular multilevel converters (MMCs) have excellent performance when operated with a high modulation index, as is the case for grid-connected converters in high-voltage direct current (HVDC) systems. Reliable and fault-tolerant operation (FTO) of the system is very important in HVDC. A novel FTO algorithm under sub module failure without using any...
Article
In recent years, approximate computing (AC) has attracted attention owing to its tradeoff between the exactness of computations and performance gains. AC has also been explored for triple modular redundancy (TMR). TMR is a well-known fault masking methodology, with associated overheads, widely used in systems of different nature and...
Article
The expected k-coverage, prolonged network lifetime, and fault-tolerance capabilities play a vital role in the success of various application operations in Wireless Sensor Networks (WSNs), as they are the key performance indicators of WSNs. The k-coverage protocol ensures that the entire target region of interest (R) is k-covered, where R...

Citations

... In Commercial Off-The-Shelf (COTS) microprocessor-based systems, where hardware modifications are impossible, different approaches can be found at the system level, and specifically at the software level. In this regard, researchers have proposed several system-level fault tolerance techniques for this type of microprocessor [4,5,6]. Moreover, researchers have recently directed their efforts towards two other types of processors: ARM-based architectures [7,8] and RISC-V-based microprocessors [9,10,4]. On the one hand, the main attraction of ARM is that it is a low-power architecture, which is why it has become the dominant architecture in the mobile electronics sector, e.g., smartphones. ...
Article
Approximate Computing techniques have been successfully used to reduce the overhead associated with redundancy in fault-tolerant system designs. This paper presents a fault tolerance method that reduces the execution time overhead of the well-known Time Redundancy technique by improving the software-based Approximate Computing technique known as loop perforation. Time Redundancy is a software-based fault tolerance technique that involves executing replicas of a task at different times. We propose to approximate the tasks to be executed using a new approximate computing technique based on loop perforation, i.e., simplified iterations. The novelty of this method is the combination of the Time Redundancy fault tolerance technique with the newly proposed Approximate Computing technique, simplified iterations. The proposal is validated through simulation-based fault injection campaigns on several test programs for the ARM and RISC-V microprocessor architectures. Experimental results verified not only the applicability of the proposal in different architectures, but also its effectiveness, showing a good trade-off between reliability, error and overhead. Results showed that, using the proposed method, a normalized mean work to failure (MWTF) of up to 5.28× was obtained with approximation errors lower than those obtained using the traditional loop perforation technique.
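To make the two ingredients named in this abstract concrete, here is a minimal sketch of traditional loop perforation combined with a simple time-redundant check. It is an illustration under stated assumptions, not the paper's method: the abstract does not describe how "simplified iterations" work, so only plain perforation (skipping iterations with a fixed stride) is shown, and the function names `perforated_mean` and `redundant_mean` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Traditional loop perforation: visit only every `stride`-th element
 * and average what was visited. A stride of 1 gives the exact mean. */
double perforated_mean(const double *x, size_t n, size_t stride) {
    assert(x != NULL && n > 0 && stride > 0);
    double sum = 0.0;
    size_t visited = 0;
    for (size_t i = 0; i < n; i += stride) {
        sum += x[i];
        visited++;
    }
    return sum / (double)visited;
}

/* Time redundancy: run the same task twice, one run after the other,
 * and compare the results. Returns true when both runs agree.
 * Assumption: both runs use the same perforation stride, so a
 * fault-free pair matches exactly. A real implementation would also
 * have to stop the compiler from merging the two identical calls. */
bool redundant_mean(const double *x, size_t n, size_t stride, double *out) {
    assert(out != NULL);
    double first = perforated_mean(x, n, stride);
    double second = perforated_mean(x, n, stride);
    *out = first;
    return first == second;
}
```

With a stride of 2 each run does roughly half the work of the exact loop, which is where the reduction in time-redundancy overhead would come from; the price is an approximation error that depends on the data.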
Article
A software technique is presented to protect commercial multi-core microprocessors against radiation-induced soft errors. Significant time overheads associated with conventional software redundancy techniques limit the feasibility of advanced critical electronic systems. In our approach, redundant bare-metal threads are used, so that critical computation is distributed over the different microprocessor cores. In doing so, software redundancy can be applied to Commercial Off-The-Shelf (COTS) microprocessors without incurring high performance penalties. The proposed technique was evaluated using a low-cost single-board computer (Raspberry Pi 4) under neutron irradiation. The results showed that the Redundant Multi-Threading versions detected and recovered from all the Silent Data Corruption (SDC) events, and only increased HANG sensitivity with respect to the unhardened original versions. In addition, higher Mean Work to Failure (MWTF) estimations are achieved with our bare-metal technique than with the state-of-the-art bare-metal software-based techniques that only implement temporal redundancy.
Article
This work presents the evaluation of a new dual-core lockstep hybrid approach aimed at improving fault tolerance in microprocessors. Our approach takes advantage of modern multicore processor resources to combine software-based lockstep with a custom hardware observer. The former duplicates the data and instruction flows, while the latter is in charge of control-flow monitoring. The proposal has been implemented in a dual-core ARM microprocessor and validated with low-energy proton irradiation and emulated fault injection campaigns. The results show an improvement of one order of magnitude in the cross section of the benchmarks tested, even considering the worst-case scenario.