The initialization procedure.

Source publication

Fully Distributed Initialization Procedure for a 2D-Mesh NoC, Including Off Line BIST and Partial Deactivation of Faulty Components

Conference Paper

Full-text available

Aug 2010

In this paper, we present an embedded, at-speed, off-line, and fully distributed initialization procedure for a 2D-Mesh Network-on-Chip (NoC). This procedure is executed at power boot, and targets the detection and deactivation of faulty routers and/or faulty communication channels. The final objective is fault tolerance. The proposed procedu...

Context 1

... A timeout is attached to the initialization procedure, in case one FSM is blocked in an intermediate state. These FSMs are activated by the global RESET signal to execute the algorithm described in Fig. 3. There are two levels of parallelism in this distributed algorithm: − The router is tested first, without any interaction with the neighbor routers. ...


Dynamic decentralized mapping of tree-structured applications on NoC architectures

Conference Paper

Full-text available

Jun 2011

This paper presents a novel application-driven and resource-aware mapping methodology for tree-structured streaming applications onto NoCs. This includes strategies for mapping the source of streaming applications (seed point selection), as well as embedding strategies so that each process autonomously embeds its own succeeding tasks. The proposed...

Fault-tolerant Router with Built-in Self-test/Self-diagnosis and Fault-isolation Circuits for 2D-mesh Based Chip Multiprocessor Systems

Conference Paper

Full-text available

May 2009

A fault-tolerant router design (20-path router, 20PR) is proposed to reduce the impact of faulty routers in 2D-mesh based chip multiprocessor systems. In our experiments, OCNs using 20PRs reduce unreachable packets by 75.65% to 85.01% and latency by 7.78% to 26.59% in comparison with OCNs using generic XY routers.

FASHION: Fault-Aware Self-Healing Intelligent On-chip Network

Article

Full-text available

Feb 2017

To avoid packet loss and deadlock scenarios that arise due to faults or power gating in multicore and many-core systems, the network-on-chip needs to possess resilient communication and load-balancing properties. In this work, we introduce the Fashion router, a self-monitoring and self-reconfiguring design that allows for the on-chip network to dyn...

FIGURE 3. Particle Structure for MPEG ACG shown in Fig. 1

FIGURE 8. Router addressing scheme for 4x4 Torus topology

FIGURE 9. DIP Switch SW11 configuration on KC705 FPGA board, (a) No...

FIGURE 10. Experimental flow of the fault-tolerant application mapping...

Flexible Spare Core Placement in Torus Topology based NoCs and its Validation on an FPGA

Article

Full-text available

Mar 2021

In the nano-scale era, the Network-on-Chip (NoC) interconnection paradigm has gained importance in addressing the communication challenges in Chip Multi-Processors (CMPs). With increased integration density on CMPs, NoC components, namely cores, routers, and links, are susceptible to failures. Therefore, to improve system reliability, there is a need for e...

FIGURE 7: Channel Dependency Graph (CDG) for the fault-tolerant topology...

FIGURE 8: Mapping of MP3Encoder Application before generating the...

FIGURE 12: Percentage of improvements in dynamic performance metrics...

NoC router specifications [36] used in this work

Comparison of Dynamic simulation results for any link fault in the...

Fault-Tolerant Application-Specific Topology based NoC and its Prototype on an FPGA

Article

Full-text available

May 2021

Application-Specific Networks-on-Chips (ASNoCs) are suitable communication platforms for meeting current application requirements. Interconnection links are the primary components involved in communication between the cores of an ASNoC design. The integration density in ASNoC increases with continuous scaling down of the transistor size. Excessive...

Hiérarchie mémoire dans les systèmes intégrés multiprocesseurs construits autour de réseaux sur puce

Thesis

Oct 2017

Hela Belhadj Amor

Parallel multi-core and many-core systems that deliver high computing power at low energy cost are now a reality. However, exploiting the performance of these architectures depends on how efficiently the system manages data accesses. The goal of our work is to improve the efficiency of these accesses by exploiting the characteristics of the hardware architecture. In the first part, we propose a new organization of the cache memory hierarchy that maximizes the use of the storage space available at each level. This solution, based on non-uniform cache access (NUCA) architectures, supports inter-level and intra-level transfers within the hierarchy. It requires a cache coherence protocol adapted to its specifications. Data transfer within the hierarchy is also a determinant of system performance. In the second part, we take into account the specific communication needs of the protocol. We propose a virtualized network as an ad hoc communication medium to handle coherence traffic at lower cost. This network connects the caches of a given level to support intra-level transfers, which are specific to our protocol, in order to reduce the average access latency.

Collaborative Routing Algorithm for Fault Tolerance in Network on Chip CRAFT NoC

Article

Full-text available

Jan 2017

Fault-Tolerance Mechanisms for Permanent Failures in a Coherent Shared-Memory Many-Core Architecture

Article

Jun 2014

Article for GDR SoC-SIP 2014

Localization of Damaged Resources in NoC Based Shared-Memory MP2SOC, using a Distributed Cooperative Configuration Infrastructure

Conference Paper

Full-text available

Jun 2011

In this paper, we present a software approach for the localization of faulty components in a 2D-mesh Network-on-Chip, targeting fault tolerance in a shared-memory MP2SoC architecture. We use a pre-existing, distributed hardware infrastructure supporting self-test and deactivation of the faulty components (routers and communication channels), which are turned into “black holes”. We detail the software method used to localize these “black holes” and to centralize the information in a single point, where a modified global routing function can be defined. This embedded software makes extensive use of a distributed fault-tolerant configuration firmware assisted by a Distributed Cooperative Configuration Infrastructure (DCCI), which is also presented. Finally, “black hole” detection and localization coverage is evaluated.

Cellular Automata based Built-In-Self Test implementation for Star Topology NoC

Conference Paper

Jan 2017

Built-In Self-Test (BIST), a well-known technique for providing on-chip testability, is widely used in today's System-on-Chip (SoC) designs. With the evolution of Network-on-Chip (NoC) communication for complex SoCs, the need for fault-tolerant systems has increased rapidly. In an attempt to design a good BIST architecture, this paper proposes a Cellular Automata Rule 45 based BIST architecture for star topology NoC. Power, resource utilization and timing reports are generated for the proposed architecture and compared against the popular and widely used LFSR-based BIST architecture. The results and discussion in this paper put forward the advantages of the proposed architecture over its counterpart.
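
For readers unfamiliar with Rule 45, the following is a minimal Python sketch of how an elementary cellular automaton can serve as a pseudo-random pattern source. It is not taken from the paper: the function names `rule45_step` and `generate_patterns`, the wrap-around (ring) boundary, and the software form itself are assumptions made for illustration, whereas the paper describes a hardware BIST architecture.

```python
RULE = 45  # Wolfram rule number: bit i is the next state for neighborhood i


def rule45_step(cells):
    """Advance a ring of 0/1 cells by one step of elementary CA Rule 45.

    Assumption: periodic (wrap-around) boundary; other boundary
    conditions are possible and would change the sequence.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as 0..7
        nxt.append((RULE >> index) & 1)
    return nxt


def generate_patterns(seed, count):
    """Yield `count` successive patterns, starting with the seed itself."""
    state = list(seed)
    for _ in range(count):
        yield state
        state = rule45_step(state)
```

Each cell's next value depends only on itself and its two neighbors, which is the local-connectivity property usually cited when cellular automata are compared with LFSR-based generators.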

Development of HW/SW Fault Tolerant and Self-Configuring Architectures for 3D Integrated Technologies

Article

Jan 2013

Vladimir Pasca

3D technology promises energy-efficient heterogeneous integrated systems, which may open the way to chips with thousands of cores. Silicon dies containing processing elements are stacked and connected by vertical wires called Through-Silicon Vias (TSVs). In 3D chips, interconnecting an increasing number of processing elements requires a scalable high-performance interconnect solution: the 3D Network-on-Chip. Despite the advantages of 3D integration, testing, reliability and yield remain the major challenges for 3D NoC-based systems. In this thesis, the TSV interconnect test issue is addressed by an off-line Interconnect Built-In Self-Test (IBIST) strategy that detects both structural faults (i.e. opens, shorts) and parametric faults (i.e. delays and delays due to crosstalk). The IBIST circuitry implements a novel algorithm based on the aggressor-victim scenario and alleviates limitations of existing strategies. The proposed Kth-aggressor fault (KAF) model assumes that the aggressors of a victim TSV are neighboring wires within a distance given by the aggressor order K. Using this model, TSV interconnect tests of inter-die 3D NoC links may be performed for different aggressor orders, reducing test times and circuitry complexity. In 3D NoCs, TSV permanent and transient faults can be mitigated at different abstraction levels. In this thesis, several error resilience schemes are proposed at data link and network levels. For transient faults, 3D NoC links can be protected using error correction codes (ECC) and retransmission schemes using error detection (Automatic Retransmission Query) and correction codes (i.e. Hybrid error correction and retransmission). For transients along a source-destination path, ECC codes can be implemented at network level (i.e. Network-level Forward Error Correction). Data link solutions also include TSV repair schemes for defects due to fabrication processes (i.e. TSV-Spare-and-Replace and Configurable Serial Links) and aging (i.e. Interconnect Built-In Self-Repair and Adaptive Serialization). At network level, the faulty inter-die links of 3D mesh NoCs are repaired by implementing a TSV fault-tolerant routing algorithm. Although single-level solutions can achieve the desired yield / reliability targets, error mitigation can be realized by a combination of approaches at several abstraction levels. To this end, multi-level error resilience strategies have been proposed. Experimental results show that there are cases where this multi-layer strategy pays off both in terms of cost and performance. Unfortunately, a one-size-fits-all solution does not exist, as each strategy has its advantages and limitations. For system designers, it is very difficult to assess early in the design stages the costs and the impact on performance of error resilience. Therefore, an error resilience exploration (ERX) methodology is proposed for 3D NoCs.

Fault Tolerance on NoCs

Conference Paper

Mar 2013

Multi-Processor Systems-on-Chip (MPSoCs) are increasingly popular in embedded systems, and also in high-performance systems. In such systems, the data bandwidth requirements keep increasing as the number of processing elements increases. Therefore, a Network-on-Chip (NoC) communication architecture is usually preferred over communication based on shared buses, because it provides higher communication performance. The probability of failure increases in these systems, due to advances in integration scale and the increasing number of components on chip. Therefore, fault tolerance will become a key aspect of designing near-future VLSI SoCs, and especially their interconnection Network-on-Chip (NoC). This paper focuses on describing the particular aspects of NoCs and the fault-tolerant strategies proposed for them.

Cooperative Built-in Self-Testing and Self-Diagnosis of NoC Bisynchronous Channels

Conference Paper

Sep 2012

This paper proposes a built-in self-test/self-diagnosis procedure at start-up of an on-chip network (NoC) for bisynchronous communication channels. Concurrent BIST operations are carried out after reset at each switch, so test application time scales with network size. The key principle consists of exploiting the inherent structural redundancy of the NoC architecture in a cooperative way for effective diagnosis and error detection. At-speed testing of stuck-at faults can be performed in less than 4000 cycles regardless of network size, with a hardware overhead of less than 30%.

Optimising pseudo-random built-in self-testing of fully synchronous as well as multisynchronous networks-on-chip

Article

Mar 2013
IET COMPUT DIGIT TEC

Most built-in self-test architectures use pseudo-random test pattern generators. However, whenever this technique has been applied to on-chip interconnection networks, overly large testing latencies have been reported. On the other hand, alternative approaches either suffer from large area penalties (like scan-based testing or the use of deterministic test patterns) or poor fault coverage in the control path (functional testing). Moreover, the recent proliferation of clock domains on a chip makes testing overly challenging. This manuscript presents the optimisation of a built-in self-testing framework based on pseudo-random test patterns to the microarchitecture of network-on-chip switches. As a result, fault coverage and testing latency approach those achievable with deterministic test patterns while materialising relevant area savings and enhanced flexibility. Finally, the authors implement the extension of the proposed testing methodology to multisynchronous systems, thus making it compliant with the relaxation of synchronisation assumptions in nanoscale designs.
