About
63
Publications
13,383
Reads
782
Citations
Introduction
Joaquín Gracia-Morán holds a Ph.D. in computer engineering from the Universitat Politècnica de València (UPV), Valencia, Spain.
He is currently an Associate Professor at the Department of Computer Engineering and a member of the Fault-Tolerant Systems (STF) research line within the Instituto ITACA. His current research interests include the design and implementation of digital systems, the design and validation of STF, especially those based on Error Correction Codes, and VHDL-based fault injection.
Additional affiliations
January 2000 - present
Publications (63)
Current memory systems provide large storage capacity thanks to the integration scale achieved in CMOS technology. This increase in storage capacity comes with an increase in their fault rate. As a result, the probability of experiencing Single or Multiple Cell Upsets has risen. Error Correction Codes (ECC) are a fault-tolerant mechanism broad...
Deploying convolutional neural networks (CNNs) in image classification systems requires balancing conflicting goals, like throughput, power consumption, and silicon area. In safety-critical environments, ensuring acceptable levels of robustness against faults is also of utmost importance. The robustness gains promoted by quantised CNNs entail a loss...
MBUs are an increasing challenge in SRAM memory, due to the large chip area occupied by SRAM and the supply power scaling applied to reduce static consumption. Powerful ECCs can cope with random MBUs, but at the expense of complex encoding/decoding circuits and high memory redundancy. Alternatively, radiation-hardened cells are a technique that can...
The size reduction of CMOS technology has made it possible to increase the performance of embedded systems. However, this size reduction implies an increase in the fault rate. In this context, protecting the data processed by these systems in a simple and fast way becomes very important. One of the most important aspects...
The combined use of high-level programming languages and electronic design automation tools eases the development of neural network models that can be synthesized on programmable logic. Although specific, the HW accelerators produced in this way can be optimized to offer a good balance between performance, silicon area...
In recent years, technological development has led to an increase in the performance of digital systems, but at the cost of reducing their dependability. For example, thanks to the continuous size reduction of CMOS technology, memory systems today provide a large storage capacity at the cost of an incr...
In recent years, the use of embedded systems has grown exponentially, mainly due to the expansion of the Internet of Things (IoT). Data collected by IoT devices are sent to the cloud to be processed in datacenters. The Edge Computing philosophy aims to change this “passive” behavior of IoT devices. The basic idea is to process data produced by...
The increase in the integration scale of CMOS circuits has enabled the implementation of memory systems with a large storage capacity, but at the cost of increasing their fault rate. One possible solution is the inclusion of Error Correction Codes (ECCs). This fault-tolerance mechanism makes it possible to protect the syst...
With the continuous size reduction of CMOS technology, the probability of suffering both single and multiple faults in memory systems increases. Therefore, Fault Tolerance Mechanisms (FTMs) are needed to protect them. Traditionally, different Error Correction Codes (ECCs) have been used for this purpose. When it comes to...
Nowadays, the integration scale of CMOS technology has enabled memory systems with a large storage capacity. However, it has also caused an increase in their fault rate.
One possible solution is the use of Error Correction Codes (ECCs). New ECCs are continually being proposed. These proposals consider a multitude of factors, such as redundancy, or different...
With the continuous size reduction of CMOS technology, faults in RAM memory systems are becoming more likely. Thus, the probability of occurrence of Multiple Cell Upsets (MCUs), in addition to Single Cell Upsets (SCUs), increases. Traditionally, Error Correction Codes (ECCs) are a family of Fault Tolerance Mechanisms (FTMs) that have been used to pr...
The continuous increase in the integration scale of CMOS technology has caused an increase in the fault rate. In particular, a single particle hit in a storage element (such as memory or registers) can cause a single error in a memory cell (known as a Single Cell Upset or SCU), as well as simultaneous errors in more than one memory cell (known as...
Nowadays, the integration scale of CMOS technology has made it possible to design memory systems with a large storage capacity. However, it has also caused an increase in their fault rate.
One possible solution is the use of Error Correction Codes (ECCs), as can be seen in the scientific literature...
Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can dete...
The Bose-Chaudhuri-Hocquenghem (BCH) codes are a well-known class of powerful error correction cyclic codes. BCH codes can correct multiple errors with minimal redundancy. Primitive BCH codes only exist for some word lengths, which do not frequently match those employed in digital systems. This paper focuses on double error correction (DEC) codes f...
Reliable computer systems employ error control codes (ECCs) to protect information from errors. For example, memories are frequently protected using single error correction-double error detection (SEC-DED) codes. ECCs are traditionally designed to minimize the number of redundant bits, as they are added to each word in the whole memory. Nevertheles...
In recent years, technological development has made it possible to increase the integration scale of integrated circuits. In particular, this increase has enabled the creation of large-capacity memory systems. However, it has also caused an increase in their fault rate, raising the probability of occurrence of Singl...
Due to the increasing defect rates in highly scaled complementary metal-oxide-semiconductor (CMOS) devices, and the emergence of alternative nanotechnology devices, reliability challenges are of growing importance. Understanding and controlling the fault mechanisms associated with new materials and structures for both transistors and interconnectio...
Nowadays, the probability of occurrence of Single Cell Upsets (SCUs) or Multiple Cell Upsets (MCUs) has increased due to the continuous increase in the integration scale of CMOS technology, which has raised the fault rate. SCUs and MCUs are particularly common in computer memory systems. To tolerate errors, a common approach is the use of E...
Currently, due to the continuous increase in the integration scale, the fault rate in computer memory systems has risen. Thus, the probability of occurrence of Single Cell Upsets (SCUs) or Multiple Cell Upsets (MCUs) increases. A common solution is the use of Error Correction Codes (ECCs). However, when...
Due to the continuous increase in the integration scale, the fault rate in computer memory systems has risen. Thus, the probability of occurrence of Single Cell Upsets (SCUs) or Multiple Cell Upsets (MCUs) also increases. A common solution is the use of Error Correction Codes (ECCs). However, when using ECCs, a good balance between the error c...
Currently, faults suffered by SRAM memory systems have increased due to the aggressive CMOS integration density. Thus, the probability of occurrence of single-cell upsets (SCUs) or multiple-cell upsets (MCUs) increases. One of the main causes of MCUs in space applications is cosmic radiation. A common solution is the use of error correction codes (E...
New fault-tolerant methods are needed to cope with the increase in the fault rate of memory systems. Traditionally, Error Correction Codes (ECCs) have been used. This fault-tolerance method works well with single faults. Nevertheless, the increase of the integration density in current deep submicron chips, as well as the decrease of the energy needed to pr...
Due to technology scaling, register protection against soft errors remains a major concern for deep sub-micron systems. Error Correction Codes (ECC) improve protection at the price of data redundancy. In memories, reducing this redundancy is a really important issue. However, it is less important in the case of registers. A major req...
As scaling becomes more and more aggressive, intermittent faults are gaining importance in current deep submicron complementary metal-oxide-semiconductor (CMOS) technologies. This work shows the dependability assessment of a fault-tolerant computer system against intermittent faults. The applied methodology relies on VHDL-based fault injection, w...
Traditionally, Error Correction Codes (ECC) work with codeword digits exposed to the same error rates. Nevertheless, with the current rise of intermittent faults, it would be interesting to divide a codeword according to possible different error rates. This work summarizes Flexible Unequal Error Control (FUEC) codes. This new family of codes divides...
Error correction codes are used in semiconductor memories to protect information against errors. Simple error correction codes are preferred due to their low redundancy and encoding/decoding latency. Hamming codes are simple and can be easily built for any word length. They only allow single error correction, so a multiple error can lead to a wrong...
With the scaling of complementary metal-oxide-semiconductor (CMOS) technology to the submicron range, designers have to deal with a growing number and variety of fault types. In this way, intermittent faults are gaining importance in modern very large scale integration (VLSI) circuits. The presence of these faults is increasing due to the complexit...
Unequal Error Control (UEC) codes provide means for handling errors where the codeword digits may be exposed to different error rates, like in two-dimensional optical storage media, or VLSI circuits affected by intermittent faults or different noise sources. However, existing UEC codes are quite rigid in their definition. They split codewords in on...
The reduction of transistor dimensions in new technologies has led to the appearance of new fault types. Among them, intermittent faults present a great challenge, as they are expected to be more and more common. In this work, the effects of intermittent faults in the behavior of a Fault-Tolerant microprocessor are studied. To carry out thi...
Intermittent faults, being serious concerns for deep-submicron integrated circuits, are not well studied in the literature. This paper performs fault injection simulation to analyze the impact of intermittent faults, which is an important step towards the development of mitigation techniques for such threats.
Transfer to industry is typically understood by academia as a transfer of methodologies and technologies, thus neglecting transfer of knowledge. However, academia is very well placed to improve industry competitiveness through continuous training. Coping with transfer of knowledge is not only a matter of providing courses, but also of considering t...
As technology shrinks, higher operating frequencies, reduced feature sizes and lower supply voltages allow greater performance, but the reliability has been affected negatively. Smaller devices and wire spacing lead to an increase in the occurrence of multiple adjacent faults. Thus, the system reliability is seriously affected. Error correction cod...
As CMOS technology scales to the nanometer range, designers have to deal with a growing number and variety of fault types. Particularly, intermittent faults are expected to be an important issue in modern VLSI circuits. The complexity of manufacturing processes, producing residues and parameter variations, together with special aging mechanisms, ma...
Today, security and safety competences required by professionals of these disciplines in highly competitive business domains, such as aerospace, transport, energy, health and banking, are not properly addressed by graduate, master or postgraduate courses. Such challenging training requirement can be addressed by enriching work-based learning approac...
Abstract: Despite the importance that oral presentation has gained as an instrumental competence under the European convergence guidelines, it remains one of the least practiced competences in engineering degrees in general, and in Computer Engineering in particular. This work presents a comparison of the oral presentations...
Intermittent faults are expected to be a great challenge in VLSI circuits. The complexity of manufacturing processes, provoking residues and process variations, and special wear out mechanisms, may increase the presence of such faults. This work presents a case study of the effects of intermittent faults on the behavior of a commercial micro contro...
As technologies shrink, new kinds of faults arise. Intermittent faults are part of these new faults. They are expected to be an increasing challenge in modern VLSI circuits. Up to now, transient and permanent faults used to be injected for the experimental validation of fault tolerance mechanisms. The main objective of this work is to improve the d...
Abstract:
This work describes the experience carried out in a basic computer networks course at the Universidad Politécnica de Valencia, in which a competence-based assessment was carried out during the 2007/2008 academic year, also studying the time load that this assessment entailed, both for the student and for the pr...
It is expected that intermittent faults will be a great challenge in modern VLSI circuits. In this work, we present a case study of the effects of intermittent faults on the behavior of a commercial microcontroller. The methodology used relies on the VHDL-based fault injection technique, which allows a systematic and exhaustive analysis of the influence...
Deep submicrometer devices are expected to be increasingly sensitive to physical faults. For this reason, fault-tolerance mechanisms are increasingly required in VLSI circuits. Validating their dependability is therefore a primary concern in the design process. Fault injection techniques based on the use of hardware description languages offer important a...
Nowadays, new submicron technologies have allowed increasing processor performance while decreasing processor size. However, as a side effect, their reliability has been negatively affected. Although mainly permanent and transient faults have been studied, intermittent faults are expected to be a big challenge in modern VLSI circuits. Usually, intermi...
Abstract With the arrival of the European Higher Education Area (EHEA), teaching strategies must change to focus on student learning, making the student an active element in their own learning and encouraging their participation, so that they feel an active part of the teaching-learning process. Another c...
Abstract Tutorial action is justified within the significant changes governing the implementation of the new European Higher Education Area, which force a change in the orientation of work within the University. In this sense, the goal of education takes on new connotations that allow the comprehensive development of students,...
This work shows that faults affecting the combinational logic embedded in a microcontroller can propagate to register elements and may have an important impact on applications, even in the most favourable case of short transient faults. Using VHDL-based fault injection techniques, we have observed that the percentage of propagated faults, and...
Fault injection techniques based on the use of VHDL as design language offer important advantages with regard to other fault injection techniques. First, as they can be applied during the design phase of the system, they allow reducing the time-to-market. Second, this type of techniques presents high controllability and reachability. Among the diff...
Modern processors tend to increase the number of registers, some of which are not accessible through the instruction set. Traditionally, the effect of faults in these hidden registers has not been considered during system validation using fault injection. In this paper, a study of the importance of faults in hidden registers is performed. Firstly, we h...
Fault injection is a technique used for the experimental validation of Fault-Tolerant Systems. Three broad categories are distinguished: physical fault injection (also called hardware implemented fault injection), software implemented fault i...
In recent years, the time-triggered architecture (TTA) has been gaining acceptance as a generic architecture for highly dependable real-time systems. It is now being used to implement the "x-by-wire" concept. A problem for this kind of system is validation. Fault injection has achieved a great acceptance among designers for the experiment...
This chapter presents an overview of some principal VHDL simulation-based fault injection techniques. Significant designs and tools, as well as their advantages and drawbacks, are shown. Also, VFIT, a VHDL simulation-based fault injection tool developed by the GSTF (Fault Tolerant Systems Group, Polytechnic University of Valencia) to run on a PC p...
Nowadays, the use of dependable systems is becoming widespread, and diagnosis is an important step during their design. A diagnosis in early phases of the design cycle saves time and money. Fault injection can be used during the design process of the system, and using Hardware Description Languages, particularly VHDL, it is possible to accomplish...
As the use of dependable systems is becoming widespread, their study in early phases of the design cycle is more and more important in order to save time and money. In this work, using a generic VHDL-based fault injection tool, called VFIT (VHDL-Based Fault Injection Tool), we have validated the dependability of a real Fault-Tolerant System using its VHDL...
In this work different VHDL-based fault injection techniques (simulator commands, saboteurs and mutants) have been compared and applied in the validation of a fault-tolerant system. Some extensions and implementation designs of these techniques have been introduced. As a complement of these injection techniques, a wide set of fault models (includin...
This paper presents the prototype of an automatic and model-independent fault injection tool, to be used on an IBM-PC (or compatible) platform. The tool has been built around a commercial VHDL simulator and it is designed to implement different fault injection techniques. With this tool, a wide range of transient and permanent faults can be injected...
Fault injection techniques are frequently used for validating dependable systems. VHDL-based techniques are good resources that support fault injection with many advantages such as a high level of accessibility, controllability and precision. This paper presents the results obtained with a VHDL-based tool (VFIT) injecting single and multiple faults...
This work compares different VHDL-based fault injection techniques (simulator commands, saboteurs and mutants) for the validation of fault-tolerant systems. Some extensions and implementation designs of these techniques have been introduced. Also, a wide set of unusual fault models have been implemented. As an application, a fa...
Three different VHDL-based fault injection techniques have been compared to validate a fault-tolerant microcomputer system. We have studied the error pathology, their detection and recovery coverages and their latencies.
This work presents a campaign of fault injection to validate the dependability of a fault tolerant microcomputer system. The system is duplex with cold stand-by sparing, parity detection and a watchdog timer. The faults have been injected on a chip-level VHDL model, using an injection tool designed for this purpose. We have carried out a set of inj...
This paper presents the prototype of an automatic and model-independent fault injection tool, to be used on an IBM-PC (or compatible) platform. The tool has been built around a commercial VHDL simulator. With this tool, both transient and permanent faults, of a wide range of types, can be injected into medium-complexity models. Another remarkable aspec...
Questions
Question (1)
This work is a copy of a previously published work. In fact, it is a cut and paste of the work:
Title: "Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications"
Authors: Joaquín Gracia-Morán, Luis J. Saiz-Adalid, Daniel Gil-Tomás, and Pedro J. Gil-Vicente
Journal: IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 10, OCTOBER 2018