Figure 3 - uploaded by Mounir Benabdenbi
The initialization procedure.

Source publication
Conference Paper
Full-text available
In this paper, we present an embedded, at speed, off-line, and fully distributed initialization procedure for 2D-Mesh Network-on-Chip (NoC). This procedure is executed at power boot, and targets the detection and the deactivation of the faulty routers and/or faulty communication channels. The final objective is fault tolerance. The proposed procedu...

Context in source publication

Context 1
... A timeout is attached to the initialization procedure, in case one FSM is blocked in an intermediate state. These FSMs are activated by the global RESET signal to execute the algorithm described in Fig. 3. There are two levels of parallelism in this distributed algorithm: - The router is tested first, without any interaction with the neighbor routers. ...
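The reset-activated, timeout-guarded FSM described in this excerpt can be illustrated with a minimal Python sketch. It is not the authors' design: the state names, the `self_test` callback, and the choice to deactivate a router whose FSM times out are all assumptions made for illustration, and only the first (router-local) test phase mentioned in the excerpt is modeled.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # before RESET
    SELF_TEST = auto()  # router tested alone, no neighbor interaction
    ACTIVE = auto()     # test passed
    DISABLED = auto()   # test failed or FSM timed out (assumed policy)


class RouterInitFSM:
    """Hypothetical per-router initialization FSM with a timeout."""

    def __init__(self, self_test, timeout_cycles):
        # self_test(cycle) returns True (pass), False (fail) or None (still running)
        self.self_test = self_test
        self.timeout_cycles = timeout_cycles
        self.state = State.IDLE
        self.cycles = 0

    def reset(self):
        # The global RESET signal activates the FSM.
        self.state = State.SELF_TEST
        self.cycles = 0

    def step(self):
        if self.state is not State.SELF_TEST:
            return self.state
        self.cycles += 1
        result = self.self_test(self.cycles)
        if result is True:
            self.state = State.ACTIVE
        elif result is False:
            self.state = State.DISABLED
        elif self.cycles >= self.timeout_cycles:
            # FSM blocked in an intermediate state: the timeout ends the procedure.
            self.state = State.DISABLED
        return self.state


def run_init(fsms, max_cycles):
    """Reset all routers together and step them in lockstep (in parallel)."""
    for fsm in fsms:
        fsm.reset()
    for _ in range(max_cycles):
        for fsm in fsms:
            fsm.step()
    return [fsm.state for fsm in fsms]


if __name__ == "__main__":
    healthy = RouterInitFSM(lambda c: True if c >= 3 else None, timeout_cycles=5)
    faulty = RouterInitFSM(lambda c: False if c >= 2 else None, timeout_cycles=5)
    stuck = RouterInitFSM(lambda c: None, timeout_cycles=5)
    print(run_init([healthy, faulty, stuck], max_cycles=10))
```

Each router runs its own FSM with no shared state, which mirrors the "fully distributed" character of the procedure; how the real design reacts to a timeout is not stated in the excerpt.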

Similar publications

Conference Paper
Full-text available
This paper presents a novel application-driven and resource-aware mapping methodology for tree-structured streaming applications onto NoCs. This includes strategies for mapping the source of streaming applications (seed point selection), as well as embedding strategies so that each process autonomously embeds its own succeeding tasks. The proposed...
Conference Paper
Full-text available
A fault-tolerant router design (20-path router) is proposed to reduce the impacts of faulty routers for 2D-mesh based chip multiprocessor systems. In our experiments, the OCNs using 20PRs can reduce 75.65% ~ 85.01% unreachable packets and 7.78% ~ 26.59% latency in comparison with the OCNs using generic XY routers.
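For readers unfamiliar with the baseline named in this abstract, the following Python sketch shows generic dimension-ordered XY routing on a 2D mesh and why a single faulty router makes some packets unreachable. It does not model the 20-path router; the function names and the coordinate convention are assumptions for illustration.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited by XY routing from src to dst.

    The packet first travels along the X dimension, then along Y.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def is_deliverable(src, dst, faulty):
    """A packet is lost if its fixed XY path crosses any faulty router."""
    return all(hop not in faulty for hop in xy_route(src, dst))


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(is_deliverable((0, 0), (2, 1), faulty={(1, 0)}))
```

Because the XY path between two routers is unique, plain XY routing has no alternative when a router on that path fails, which is the limitation fault-tolerant router designs aim to reduce.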
Article
Full-text available
To avoid packet loss and deadlock scenarios that arise due to faults or power gating in multicore and many-core systems, the network-on-chip needs to possess resilient communication and load-balancing properties. In this work, we introduce the Fashion router, a self-monitoring and self-reconfiguring design that allows for the on-chip network to dyn...
Article
Full-text available
In the nano-scale era, Network-on-Chip (NoC) interconnection paradigm has gained importance to abide by the communication challenges in Chip Multi-Processors (CMPs). With increased integration density on CMPs, NoC components namely cores, routers, and links are susceptible to failures. Therefore, to improve system reliability, there is a need for e...
Article
Full-text available
Application-Specific Networks-on-Chips (ASNoCs) are suitable communication platforms for meeting current application requirements. Interconnection links are the primary components involved in communication between the cores of an ASNoC design. The integration density in ASNoC increases with continuous scaling down of the transistor size. Excessive...

Citations

... The problem to solve when a node (link or router) fails is the loss of connectivity and regularity in the network. Several works at the routing level reconfigure the routers neighboring the faulty one to create detours around faulty zones [ZGT08,ZGB10]. The regularity of these zones is a necessary condition, which means sacrificing healthy routers. ...
Thesis
Parallel multi-core and many-core systems, which deliver high computing power at low energy cost, are now a reality. Nevertheless, exploiting the performance of these architectures depends on how efficiently the system manages data accesses. The goal of our work is to improve the efficiency of these accesses by exploiting the characteristics of the hardware architecture. In the first part, we propose a new organization of the cache memory hierarchy that maximizes the use of the storage space available at each level. This solution, based on non-uniform cache access (NUCA) architectures, supports inter-level and intra-level transfers within the hierarchy. It requires a cache coherence protocol adapted to its specifications. Data transfer within the hierarchy is also a determinant of system performance. In the second part, we take into account the specific communication needs of the protocol. We propose a virtualized network as an ad hoc communication medium to handle coherence traffic at lower cost. This network links the caches of a given level to support intra-level transfers, which are specific to our protocol, in order to reduce the average access latency.
... Redundancy is the best-known fault tolerance technique and the simplest way to achieve reliability. The technique, proposed in [9], [10], [16], [11], [12], is used specifically to tolerate faults in links or routers: when a component fails, it is simply replaced by its copy. The disadvantage of this solution is its higher cost. ...
... By applying the reconfiguration mechanism [16], [17], [18], a new topology is discovered and the components of the network are updated to compute the new routing path. The solution proposed by Zhang et al. [8] relies on this mechanism. ...
... During this stage, a fully distributed hardware BIST is used to deactivate all faulty routers and/or all inter-router communication channels in the various NoCs of the TSAR architecture, as described in [4]. This BIST mechanism is implemented in the routers and executed in parallel to detect faulty channels/routers in a 2D-mesh NoC. ...
... Thus, the global routing function and the NoC itself must be reconfigured to support the new topology. In two previous works [3], [4], we presented a self-testable & cleanable, reconfigurable 2D-mesh NoC. The two key features are summarized below: 1) Self-testable & cleanable [4]: A fully distributed & decentralized hardware built-in self-test (BIST) mechanism is integrated into the NoC. ...
... In two previous works [3], [4], we presented a self-testable & cleanable, reconfigurable 2D-mesh NoC. The two key features are summarized below: 1) Self-testable & cleanable [4]: A fully distributed & decentralized hardware built-in self-test (BIST) mechanism is integrated into the NoC. At power-on or system reboot, all NoC components are tested in isolation and in parallel. ...
... The black hole model (proposed in [4]) is actually a functional fault model in which the faulty components can be detected by means of a dedicated BIST approach and deactivated prior to any localization. Several papers present solutions for localization [5], [6], [7], which rely on the use of ATE (Automatic Test Equipment) and TAM (Test Access Mechanism) to feed NoC inputs with external packets as the test vectors and to analyze NoC outputs. ...
Conference Paper
Full-text available
In this paper, we present a software approach for localization of faulty components in a 2D-mesh Network-on-Chip, targeting fault tolerance in a shared memory MP2SoC architecture. We use a pre-existing, distributed hardware infrastructure supporting self-test and deactivation of the faulty components (routers and communication channels), which are transformed into “black holes”. We detail the software method used to localize these “black holes” and to centralize the information in a single point, where a modified global routing function can be defined. This embedded software makes extensive use of a distributed fault-tolerant configuration firmware assisted by a Distributed Cooperative Configuration Infrastructure (DCCI), which is also presented. Finally, “black hole” detection and localization coverage is evaluated.
Conference Paper
Built-In Self-Test (BIST) is a well-known technique for providing on-chip testability, which makes it attractive for today's System-on-Chip (SoC) designs. With the evolution of Network-on-Chip (NoC) communication for complex SoCs, the need for fault-tolerant systems has increased rapidly. In an attempt to design a good BIST architecture, this paper proposes a Cellular Automata Rule 45 based BIST architecture for star-topology NoCs. Power, resource utilization and timing reports are generated for the proposed architecture and compared against the most popular and widely used LFSR-based BIST architecture. The results and discussion in this paper put forward the advantages of the proposed architecture over its counterpart.
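The two pattern-generator styles compared in this abstract can be contrasted with a short Python sketch. It shows only the generators, not the paper's BIST architecture: the 4-bit width, the periodic boundary of the cellular automaton, and the LFSR feedback polynomial x^4 + x^3 + 1 are assumptions chosen for illustration.

```python
RULE = 45  # binary 00101101: bit i gives the next state for neighborhood value i


def ca_step(cells, rule=RULE):
    """One step of an elementary cellular automaton with periodic boundary.

    Each cell's next value is looked up from (left, center, right).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


def lfsr_step(state):
    """One step of a 4-bit Fibonacci LFSR with feedback x^4 + x^3 + 1."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def lfsr_period(seed):
    """Count steps until the LFSR returns to its non-zero seed."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


if __name__ == "__main__":
    cells = [0, 1, 0, 0]
    for _ in range(4):
        print(cells)
        cells = ca_step(cells)
    print("LFSR period:", lfsr_period(0b0001))
```

Both generators need only a register plus a small amount of combinational logic per bit; the difference is that the automaton uses local neighbor connections while the LFSR uses a global feedback path.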
Article
3D technology promises energy-efficient heterogeneous integrated systems, which may open the way to chips with thousands of cores. Silicon dies containing processing elements are stacked and connected by vertical wires called Through-Silicon Vias (TSVs). In 3D chips, interconnecting an increasing number of processing elements requires a scalable high-performance interconnect solution: the 3D Network-on-Chip. Despite the advantages of 3D integration, testing, reliability and yield remain the major challenges for 3D NoC-based systems. In this thesis, the TSV interconnect test issue is addressed by an off-line Interconnect Built-In Self-Test (IBIST) strategy that detects both structural faults (i.e. opens, shorts) and parametric faults (i.e. delays and delay due to crosstalk). The IBIST circuitry implements a novel algorithm based on the aggressor-victim scenario and alleviates limitations of existing strategies. The proposed Kth-aggressor fault (KAF) model assumes that the aggressors of a victim TSV are neighboring wires within a distance given by the aggressor order K. Using this model, TSV interconnect tests of inter-die 3D NoC links may be performed for different aggressor orders, reducing test times and circuitry complexity. In 3D NoCs, TSV permanent and transient faults can be mitigated at different abstraction levels. In this thesis, several error resilience schemes are proposed at data link and network levels. For transient faults, 3D NoC links can be protected using error correction codes (ECC) and retransmission schemes using error detection (Automatic Retransmission Query) and correction codes (i.e. Hybrid error correction and retransmission). For transients along a source-destination path, ECC codes can be implemented at network level (i.e. Network-level Forward Error Correction). Data link solutions also include TSV repair schemes for faults due to fabrication processes (i.e. TSV-Spare-and-Replace and Configurable Serial Links) and aging (i.e. Interconnect Built-In Self-Repair and Adaptive Serialization) defects. At network level, the faulty inter-die links of 3D mesh NoCs are repaired by implementing a TSV fault-tolerant routing algorithm. Although single-level solutions can achieve the desired yield and reliability targets, error mitigation can also be realized by a combination of approaches at several abstraction levels. To this end, multi-level error resilience strategies have been proposed. Experimental results show that there are cases where this multi-layer strategy pays off both in terms of cost and performance. Unfortunately, a one-size-fits-all solution does not exist, as each strategy has its advantages and limitations. For system designers, it is very difficult to assess the costs and the performance impact of error resilience early in the design stages. Therefore, an error resilience exploration (ERX) methodology is proposed for 3D NoCs.
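The Kth-aggressor idea from this abstract, where the aggressors of a victim TSV are the neighboring wires within a distance set by the order K, can be sketched in Python as follows. The regular grid layout, the (row, column) coordinates, and the use of Chebyshev distance are assumptions for illustration; the abstract does not specify the distance metric or the TSV arrangement.

```python
def aggressors(victim, rows, cols, k):
    """Return the TSVs within Chebyshev distance k of the victim.

    victim is a (row, col) position in a rows x cols TSV array;
    the victim itself is excluded from the result.
    """
    vr, vc = victim
    return sorted(
        (r, c)
        for r in range(max(0, vr - k), min(rows, vr + k + 1))
        for c in range(max(0, vc - k), min(cols, vc + k + 1))
        if (r, c) != victim
    )


if __name__ == "__main__":
    # Center TSV of a 3x3 array with aggressor order 1: all 8 neighbors.
    print(aggressors((1, 1), rows=3, cols=3, k=1))
    # Corner TSV with the same order: only 3 neighbors exist.
    print(aggressors((0, 0), rows=3, cols=3, k=1))
```

Under these assumptions, a smaller K means fewer aggressor wires to toggle per victim, which is consistent with the abstract's point that the aggressor order trades test time and circuitry complexity against the neighborhood considered.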
Conference Paper
Multi-Processor Systems-on-Chip (MPSoCs) are increasingly popular in embedded systems, and also in high-performance systems. In such systems, the data bandwidth requirements keep increasing as the number of processing elements increases. Therefore, a Network-on-Chip (NoC) communication architecture is usually preferred to communication based on shared buses, because it provides higher communication performance. The probability of failure increases in these systems, due to the great advances in integration scale and the increasing number of components on chip. Therefore, fault tolerance will become a key aspect of designing near-future VLSI SoCs, and especially their interconnection Network-on-Chip (NoC). This paper focuses on describing the particular aspects of NoCs and the fault-tolerant strategies proposed for them.
Conference Paper
This paper proposes a built-in self-test/self-diagnosis procedure at start-up of an on-chip network (NoC) for bisynchronous communication channels. Concurrent BIST operations are carried out after reset at each switch, resulting in a test application time that scales with network size. The key principle consists of exploiting the inherent structural redundancy of the NoC architecture in a cooperative way for effective diagnosis and error detection. At-speed testing of stuck-at faults can be performed in less than 4000 cycles regardless of network size, with a hardware overhead of less than 30%.
Article
Most built-in self-test architectures use pseudo-random test pattern generators. However, whenever this technique has been applied to on-chip interconnection networks, overly large testing latencies have been reported. On the other hand, alternative approaches either suffer from large area penalties (like scan-based testing or the use of deterministic test patterns) or poor fault coverage in the control path (functional testing). Moreover, the recent proliferation of clock domains on a chip makes testing overly challenging. This manuscript presents the optimisation of a built-in self-testing framework based on pseudo-random test patterns to the microarchitecture of network-on-chip switches. As a result, fault coverage and testing latency approach those achievable with deterministic test patterns while materialising relevant area savings and enhanced flexibility. Finally, the authors implement the extension of the proposed testing methodology to multisynchronous systems, thus making it compliant with the relaxation of synchronisation assumptions in nanoscale designs.