Block diagram of the hardware monitor [39].

Source publication
Article
Full-text available
Soft-core processors implemented in SRAM-based FPGAs are an attractive option for applications deployed in radiation environments due to their flexibility, relatively low application-development costs, and reconfigurability, which enables them to adapt to evolving mission needs. Despite the advantages soft-core processors possess, they...

Similar publications

Preprint
Full-text available
High-Level Synthesis has introduced reconfigurable logic to a new world -- that of software development. The newest wave of HLS tools has been successful, and the future looks bright. But is HLS the end-all-be-all to FPGA acceleration? Is it enough to allow non-experts to program FPGAs successfully, even when dealing with troublesome data structure...
Article
Full-text available
This paper presents a power-oriented monitoring of clock signals that is designed to avoid synchronization failure in computer systems such as FPGAs. The proposed design reduces power consumption and increases the power-oriented checkability in FPGA systems. These advantages are due to improvements in the evaluation and measurement of corresponding...
Chapter
Full-text available
Approximate computing is a design paradigm for error-tolerant applications. By relaxing the accuracy requirement, it can significantly reduce circuit area and power consumption. There are many approximate logic synthesis (ALS) methods, but few of them target FPGA designs. In this work, we propose an ALS method for FPGAs based on decomposition. I...
Article
Full-text available
Field Programmable Gate Arrays (FPGAs) are a special type of processor that the end user can configure directly. This paper investigates the design of low-power reconfigurable asynchronous FPGA cells. The proposed design combines four-phase dual-rail encoding and LEDR (Level-Encoded Dual-Rail) encoding with a sleep controller. Four-phase dual-rail en...
Preprint
Full-text available
The online reconstruction of muon tracks in High Energy Physics experiments is a highly demanding task, typically performed with programmable logic boards, such as FPGAs. Complex analytical algorithms are executed in a quasi-real-time environment to identify, select and reconstruct local tracks in often noise-rich environments. A novel approach to...

Citations

... These radiation effects are well known, as are solutions for mitigating them [1]-[4]. An example of such radiation-hardening mitigation techniques are those applied, on an FPGA, to the LEON3 processor [5], which is widely used in space. Such mitigations focus on single-bit flips. ...
... Adopting fault mitigation or fault tolerance techniques is vital if FPGAs are used in radiation environments. Fault tolerance techniques that enhance embedded processor reliability can be categorized as hardware-, software- and hybrid-based techniques [8]. ...
Preprint
Full-text available
The emergence of new nanoscale technologies has imposed significant challenges on designing reliable electronic systems in radiation environments. A few types of radiation effects, like Total Ionizing Dose (TID), often cause permanent damage to such nanoscale electronic devices, and current state-of-the-art approaches to tackling TID rely on expensive radiation-hardened devices. This paper focuses on a novel and different approach: using machine learning algorithms on consumer-grade Field Programmable Gate Array (FPGA) boards to monitor TID effects so the boards can be replaced before they stop working. The research challenge is anticipating when a board will fail completely due to TID effects. We observed internal measurements of the FPGA boards under gamma radiation and used three different anomaly detection machine learning (ML) algorithms to detect anomalies in the sensor measurements in a gamma-irradiated environment. The statistical results show a highly significant relationship between the gamma radiation exposure levels and the board measurements. Moreover, our anomaly detection results show that a One-Class Support Vector Machine with a Radial Basis Function kernel has an average recall score of 0.95. Also, all anomalies can be detected before the boards stop working.
... Over the years, many efficient techniques have been used to mitigate radiation effects [2], often making use of some notion of spatial or temporal redundancy [3]- [7]. Triple Modular Redundancy (TMR), one of the most commonly employed solutions, is a technique that employs three instances of a module and adds a majority voter at their outputs. ...
Preprint
The interplay between security and reliability is poorly understood. This paper shows how triple modular redundancy affects a side-channel attack (SCA). Our counterintuitive findings show that modular redundancy can increase SCA resiliency.
... AC has been explored for processor components, graphical processing units, and field-programmable gate arrays (FPGAs) [3]. Based on the implementation level, the work of [8] grouped approximate computing techniques into software, architecture, and hardware techniques. ...
... The processing power of computers is exhausted by the incessant increase in the data to be processed, and shrinking technology has added its own intrinsic challenges. Computer systems are susceptible to soft errors caused by high-energy particles or other types of radiation, leading to bit flips in logic values [3,4,11]. Typically, these soft errors are rectified through the redundancy technique of fault masking. ...
... Typically, these soft errors are rectified through the redundancy technique of fault masking. In triple modular redundancy (TMR), the original circuit is replicated three times, and the output of each replica is fed as an input to a majority voter that provides the final output [3,4]. The triplication of the original module leads to an area overhead of 200% and related overheads. ...
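The TMR scheme described in the excerpt above — three replicas feeding a majority voter — can be sketched in a few lines. This is a minimal software illustration (the `module` callable and sample values are hypothetical); in hardware the replicas and voter are parallel logic, not sequential calls:

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three replica outputs: any bit flipped
    in a single replica is outvoted by the other two."""
    return (a & b) | (a & c) | (b & c)

def tmr(module, x):
    """Run three copies of `module` on the same input and vote."""
    return majority_vote(module(x), module(x), module(x))

# A single-replica upset is masked by the voter:
good = 0b1011
assert majority_vote(good, good, good ^ 0b0100) == good
```

The `(a & b) | (a & c) | (b & c)` expression is the standard two-out-of-three voter equation applied bitwise, which is why the area cost of TMR is roughly the tripled module plus a small voter.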
Article
Full-text available
In recent years, approximate computing (AC) has attracted attention owing to its tradeoff between the exactness of computations and performance gains. AC has also been explored for triple modular redundancy (TMR). TMR is a well-known fault-masking methodology, with associated overheads, widely used in systems of different natures and at different levels, e.g., layout level, gate level, hardware-module level, and software. At the hardware level, exploiting AC can reduce the 200% area overhead caused by triplicating the original module in TMR. By approximating the modules of TMR while ensuring that, for every input vector, at least two of the approximate modules do not differ from the original module, fault masking is preserved and the overhead can be reduced. Hence, approximate TMR (ATMR) aims to achieve cost-effective reliability. Nevertheless, due to the extensive search space, computational complexity, and the fault-masking requirement of ATMR, designing an ATMR is a challenging task. An ATMR technique must be scalable so that it can be easily adopted by circuits with a large number of inputs while keeping the extraction of ATMR modules computationally inexpensive. Compared with TMR, the inclusion of approximations makes ATMR more vulnerable to errors, and hence the design technique must be aware of input criticality. To the best of the authors' knowledge, none of the existing survey articles on AC has reported on ATMR. Therefore, in this work, ATMR design techniques are thoroughly surveyed and qualitatively compared. Moreover, design considerations and challenges for designing ATMR are discussed.
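The ATMR validity condition stated in the abstract — for every input vector, at least two of the three approximate modules must match the exact module — can be checked exhaustively for small circuits. A sketch under assumed names (the `exact`/`approx` functions here are toy examples, not any surveyed technique):

```python
from itertools import product

def atmr_valid(exact, approx_modules, n_inputs):
    """Check the ATMR fault-masking condition: on every input
    vector, at least two of the three approximate modules must
    produce the exact output, so a majority voter over the
    approximate replicas still yields the exact result."""
    for bits in product([0, 1], repeat=n_inputs):
        matches = sum(m(*bits) == exact(*bits) for m in approx_modules)
        if matches < 2:
            return False
    return True

# Toy 2-input example: exact AND, with two replicas approximated
# so that each is wrong on a *different* input vector.
exact_and = lambda a, b: a & b
replicas = [lambda a, b: a & b,  # exact replica
            lambda a, b: a,      # wrong only on (1, 0)
            lambda a, b: b]      # wrong only on (0, 1)
assert atmr_valid(exact_and, replicas, 2)
```

The exhaustive loop also shows why the abstract stresses scalability: this check is exponential in the number of inputs, so practical ATMR extraction needs cheaper analysis for large circuits.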
Article
The limitations of scaling in CMOS technology pose challenges in meeting the requirements of future applications. To address these challenges, researchers are exploring various design techniques, including Approximate Computing (AC), which leverages the inherent error resilience of applications to achieve high performance and energy gains with desired quality. AC has gained popularity as a computer paradigm for error-resilient applications, and many researchers have studied AC across computing layers and developed tools for implementing these techniques. This paper provides a comprehensive survey of AC techniques at the abstraction levels of software and hardware and discusses the tools to implement AC in hardware and software, quality evaluation tools and comparison points. The paper also covers existing frameworks for AC, potential applications, future research directions, challenges and limitations. This information can guide researchers in identifying promising avenues for further advancements and innovations in this domain. Additionally, this paper compares state-of-the-art surveys of AC and highlights the unique features and contributions of this work that distinguish our work from previous surveys.
Article
Full-text available
The emergence of new nanoscale technologies has imposed significant challenges on designing reliable electronic systems in radiation environments. A few types of radiation effects, like Total Ionizing Dose (TID), can cause permanent damage to such nanoscale electronic devices, and current state-of-the-art approaches to tackling TID rely on expensive radiation-hardened devices. This paper focuses on a novel and different approach: using machine learning algorithms on consumer-grade Field Programmable Gate Array (FPGA) boards to monitor TID effects so the boards can be replaced before they stop working. The research challenge is anticipating when a board will fail completely due to TID effects. We observed internal measurements of FPGA boards under gamma radiation and used three different anomaly detection machine learning (ML) algorithms to detect anomalies in the sensor measurements in a gamma-irradiated environment. The statistical results show a highly significant relationship between the gamma radiation exposure levels and the board measurements. Moreover, our anomaly detection results show that a One-Class SVM with a Radial Basis Function kernel has an average recall score of 0.95. Also, all anomalies can be detected before the boards become entirely inoperative, i.e. voltages drop to zero, as confirmed with a sanity check.
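The paper's detector is a One-Class SVM with an RBF kernel trained on sensor telemetry. As a dependency-free sketch of the same pipeline shape — fit a model of nominal sensor behavior, then flag readings that fall outside it — here is a simple z-score detector; the sensor values and the 3-sigma threshold are illustrative assumptions, not the paper's method:

```python
from statistics import mean, stdev

def fit_baseline(nominal_readings):
    """Learn a per-sensor (mean, std) model from pre-irradiation data."""
    return mean(nominal_readings), stdev(nominal_readings)

def is_anomaly(reading, baseline, k=3.0):
    """Flag a reading more than k standard deviations from the
    nominal baseline (a stand-in for the one-class decision)."""
    mu, sigma = baseline
    return abs(reading - mu) > k * sigma

# Hypothetical core-voltage samples (volts) before irradiation:
nominal = [1.00, 1.01, 0.99, 1.00, 1.02, 0.98, 1.00, 1.01]
base = fit_baseline(nominal)
assert not is_anomaly(1.01, base)   # within nominal spread
assert is_anomaly(0.80, base)       # voltage droop flagged
```

A one-class SVM replaces the (mean, std) model with a learned boundary in a kernel feature space, which handles multi-sensor, non-Gaussian telemetry that a per-sensor threshold cannot.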
Article
This paper proposes a novel one-port passive circuit topology consisting of a two-dimensional network of resistors and capacitors, which can be used as a fault-tolerant building block for analog circuit design. Through an analytical procedure, the network is shown to follow simple first-order admittance dynamics. A Monte Carlo method is employed to describe the effect of simultaneous faults (short or open circuit) in random network elements in terms of confidence bounds on the frequency-domain admittance profile. Faults in 10% of the elements resulted in only minor changes to the frequency response (up to 3.9 dB in magnitude and 12.5° in phase in 95% of the cases). An example is presented to illustrate the use of the proposed RC network in the fault-tolerant design of a low-pass filter.
Article
Full-text available
All-Programmable System-on-Chips (APSoCs) constitute a compelling option for deploying applications in radiation environments thanks to their high-performance computing and power-efficiency merits. Despite these advantages, APSoCs are sensitive to radiation like any other electronic device. Processors embedded in APSoCs therefore have to be adequately hardened against ionizing radiation to make them a viable design choice for harsh environments. This paper proposes a novel lockstep-based approach to harden the dual-core ARM Cortex-A9 processor in the Xilinx Zynq-7000 APSoC against radiation-induced soft errors by coupling it with a MicroBlaze TMR subsystem in the programmable logic (PL) layer of the Zynq. The proposed technique uses checkpointing along with roll-back and roll-forward mechanisms at the software level (i.e. software redundancy), as well as processor replication and checker circuits at the hardware level (i.e. hardware redundancy). Results of fault injection experiments show that the proposed approach achieves high levels of protection against soft errors, mitigating around 98% of bit-flips injected into the register files of both ARM cores while keeping the timing performance overhead as low as 25% if block and application sizes are adjusted appropriately. Furthermore, incorporating the roll-forward recovery operation in addition to roll-back improves the Mean Workload Between Failures (MWBF) of the system by up to ≈19%, depending on the nature of the running application: when a fault occurs, the application proceeds faster if treated with the roll-forward operation rather than the roll-back operation, so relatively more data can be processed before the next error occurs in the system.
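The checkpointing with roll-back and roll-forward described in this abstract can be sketched conceptually. This is a simplified software model under assumed interfaces (the block structure, fault detector, and recompute hook are illustrative, not the paper's Zynq implementation, where a MicroBlaze TMR subsystem performs the checking in hardware):

```python
import copy

def run_with_checkpoints(blocks, state, detect_fault, recompute):
    """Process work blocks, saving a checkpoint of the architectural
    state before each one. On a detected fault, either roll forward
    (adopt a recomputed correct state and continue) or roll back
    (restore the checkpoint and re-execute the block)."""
    for block in blocks:
        checkpoint = copy.deepcopy(state)
        state = block(state)
        if detect_fault(state):
            forward = recompute(checkpoint, block)
            if forward is not None:
                state = forward            # roll-forward: keep going
            else:
                state = block(checkpoint)  # roll-back: retry the block
    return state

# Hypothetical demo: three blocks each double a counter; a checker
# that never flags a fault leaves the result untouched.
final = run_with_checkpoints(
    blocks=[lambda s: {"x": s["x"] * 2}] * 3,
    state={"x": 1},
    detect_fault=lambda s: False,
    recompute=lambda cp, blk: None,
)
assert final == {"x": 8}
```

The MWBF gain reported above follows from this structure: roll-forward skips the re-execution of the faulty block, so the workload resumes sooner than with roll-back alone.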