Alain Greiner's research while affiliated with Université de la Sorbonne Nouvelle Paris 3 and other places

Publications (80)

Article
In the context of high-performance computing, the integration of more computing capabilities with generic cores or dedicated accelerators for artificial intelligence (AI) applications is raising more and more challenges. Due to the increasing costs of advanced nodes and the difficulties of shrinking analog and circuit input/output signals (IOs), alt...
Article
This paper addresses the important issue of fault tolerance in network-on-chip (NoC) and presents an on-the-field test and configuration infrastructure for a 2-D-mesh NoC, which can be used in many generic shared-memory many-core tiled architectures and MPSoCs. This paper also details all the hardware and software means needed to: 1) initialize the...
Conference Paper
New fine-grained 3D cache architectures have recently been proposed to embed more memory on-chip and thus reduce off-chip memory accesses. These 3D architectures provide a high access bandwidth thanks to wide vertical links. In this paper, we analyze the performance of such caches in a manycore context. We first propose to improve the microarchite...
Conference Paper
With the emergence of many-core architectures, the need for on-chip memories such as caches grows faster than the number of cores. Moreover, the bandwidth to off-chip memories is saturating. Big memory caches can alleviate the pressure on off-chip accesses. In this paper, we present an adaptive 3D cache architecture taking advantage of dense vertical...
Conference Paper
In this paper, we present a software approach for localization of faulty components in a 2D-mesh Network-on-Chip, targeting fault tolerance in a shared memory MP2SoC architecture. We use a pre-existing and distributed hardware infrastructure supporting self-test and de-activation of the faulty components (routers and communication channels), that a...
Article
Simulation speed is a key issue in virtual prototyping of Multi-Processors System on Chip (MPSoCs). SystemC TLM2.0 (Transaction Level Modeling) is now commonly used to accelerate the simulation. However, the standard SystemC simulation engine uses a centralized scheduler that is clearly a bottleneck to parallelize the simulation of architectures co...
Conference Paper
The presentation will describe a use case that consists in the modeling and simulation of a genuine heterogeneous system composed of individually powered Wireless Sensor Network nodes. The models are written in SoCLib and SystemC-AMS, an open-source C++ extension to the OSCI SystemC Standard dedicated to the description of AMS designs containing di...
Conference Paper
In this paper, we present an embedded, at speed, off-line, and fully distributed initialization procedure for 2D-Mesh Network-on-Chip (NoC). This procedure is executed at power boot, and targets the detection and the deactivation of the faulty routers and/or faulty communication channels. The final objective is fault tolerance. The proposed procedu...
Conference Paper
The simulation speed is a key issue in virtual prototyping of Multi-Processors System on Chip (MPSoCs). The SystemC TLM2.0 (Transaction Level Modeling) approach accelerates the simulation by using Interface Method Calls (IMC) to implement the communications between hardware components. Another source of speedup can be exploited by parallel simulati...
Conference Paper
Multi-compartment is a flexible, lightweight architecture for embedded systems that allows multiple protection domains (compartments) to securely share processing, memory and other system resources. Compartments run in physical address space and enjoy direct access to security-critical initiator devices, such as DMA devices, while remaining protect...
Conference Paper
Streaming applications, such as packet switching or video and multimedia processing, require high throughput, which can be obtained by exploiting the application's coarse-grain parallelism and mapping the parallel multitask application onto a multiprocessor system on chip (MPSoC). The seamless migration of a task from software to hardware implementat...
Conference Paper
This paper presents a method for designing SystemC-compliant Instruction Set Simulators (ISS) that address three of the major problems system designers are faced with when modeling MP-SoCs architectures: the multiple levels of abstraction of the simulation models supporting the design space exploration, the simulation speed, and the debug of the mu...
Article
Networks on chips constitute a new design paradigm for communication infrastructures in large multiprocessor SoCs. NoCs can use the GALS technique to address the difficulty of distributing a synchronous clock signal on the entire chip area. This article describes two approaches to implementing a distributed NoC in a GALS environment.
Conference Paper
In this paper we present a reconfigurable routing algorithm for a 2D-mesh network-on-chip (NoC) dedicated to fault-tolerant, massively parallel multi-processors systems on chip (MP2-SoC). The routing algorithm can be dynamically reconfigured, to adapt to the modification of the micro-network topology caused by a faulty router. This algorithm has b...
Conference Paper
This paper presents a physical implementation of the DSPIN network-on-chip in the FAUST architecture. FAUST is a stream-oriented multi-application SoC platform for telecommunications addressing IEEE 802.11a and MC-CDMA standards. The original asynchronous network-on-chip (ANOC) of FAUST has been replaced by the multi-synchronous DSPIN network-on-c...
Article
This paper presents two high-throughput, low-latency converters that can be used to convert synchronous communication protocol to asynchronous one and vice versa. We have designed these two hardware components to be used in a Globally Asynchronous Locally Synchronous clusterized Multi-Processor System-on-Chip communicating by a fully asynchronous N...
Conference Paper
This paper presents principles and tools to facilitate multi-processor system on chip (MPSoC) design and modeling, and to speed up cycle-accurate SystemC simulation. We describe an effective way to build a hardware architecture virtual prototype, using a library of SystemC simulation models based on communicating synchronous finite state machine...
Article
This paper presents two high-throughput, low-latency converters that can be used to convert synchronous communication protocol to asynchronous one and vice versa. These two hardware components have been designed to be used in Multi-Processor System on Chip respecting the GALS (Globally Asynchronous Locally Synchronous) paradigm and communicating by...
Article
This paper presents the physical design methodology of the VCI/SPIN wrappers. The challenge was the validation of the wrappers' specification with Alliance CAD Tools. A wrapper is a standard gateway used by a subscriber to access an interconnect. Because there are as many wrappers as subscribers, it is relevant to have an estimation of the wrapper'...
Conference Paper
In SoC designs, limited test access to internal cores, together with the inaccuracy and low frequencies of low-cost external testers, makes the application of at-speed tests impractical. Therefore, this paper presents an embedded micro-tester for testing IEEE1500-compliant SoCs. In the proposed approach, the test program is no longer executed by the external tester but...
Conference Paper
The distribution of a synchronous clock in a system-on-chip (SoC) has become a problem because of wire length and process variation. Novel approaches such as the globally asynchronous, locally synchronous (GALS) paradigm try to solve this issue by partitioning the SoC into isolated synchronous islands. This paper describes the bisynchronous FIFO used on the DSPIN n...
Conference Paper
In this paper we present a systematic comparison between two different implementations of a distributed Network on Chip: fully asynchronous and multi-synchronous. The NoC architecture has been designed to be used in a Globally Asynchronous Locally Synchronous clusterized Multi Processors System on Chip. The 5 relevant parameters are Silicon Area, N...
Conference Paper
This paper presents three high-throughput low-latency FIFOs that can be used as efficient and reliable interfaces between different domains in hybrid-timing systems. These three hardware components have been designed to be used in a Globally Asynchronous Locally Synchronous clusterized Multi-Processor System-on-Chip communicating by a Multi-Synchro...
Conference Paper
The paper presents the DSPIN micro-network, that is an evolution of the SPIN architecture. DSPIN is a scalable packet switching micro-network dedicated to GALS (globally asynchronous, locally synchronous) clustered, multi-processors, systems on chip. The DSPIN architecture has a very small footprint and provides to the system designer both guarante...
Conference Paper
This paper presents two high-throughput, low-latency converters that can be used to convert synchronous communication protocol to asynchronous one and vice versa. These two hardware components have been designed to be used in Multi-Processor System on Chip respecting the GALS (Globally Asynchronous Locally Synchronous) paradigm and communicating by...
Conference Paper
Early energy estimation is increasingly important in MultiProcessor System-On-Chip (MPSoC) design. Applying traditional approaches, which consist in delaying the estimation until the architectural layout has been produced, is inefficient and prevents the rapid exploration of alternative architectures. In this paper, we present a framework for arc...
Conference Paper
In this paper, we present the implementation of a multi-threaded software application for pre-crash obstacle detection, using stereo vision, and the "V-disparity" algorithm, that requires intensive computation. This application runs on a generic, low cost, massively parallel, multi-processor system-on-chip (MP-SoC). This hardware architecture is su...
Article
Evaluating system-on-chip architectures requires test boards and applications. For the comparison between the packet-switched micro-network SPIN and the traditional Pi-Bus, we have developed SystemC components and multi-threaded programs. Instead of using a traffic generator/analyzer, a component flooding the network with packets, we use a true appl...
Conference Paper
This paper presents a hardware/software communication mechanism, well suited for telecommunication-oriented multi-processor systems-on-chip (MP-SoC). It allows the system designer to map a parallel, multi-threaded software application onto a generic multi-processor architecture. This hardware architecture can contain a variable number of progra...
Conference Paper
The paper presents an innovative simulation scheme to speed up simulations of multi-cluster, multi-processor SoCs at the TLM/T (transaction level model with time) abstraction level. The hardware components of the SoC architecture are written in standard SystemC. The goal is to describe the dynamic behavior of a given software application running o...
Conference Paper
The concept of network on chip (NoC) is a recent breakthrough in the system on chip (SoC) design area. A lot of work has been done to define efficient NoC architectures and implementations. In this paper, our goal is twofold. Firstly, we want to outline that the use of a NoC based shared-memory multiprocessor SoC challenges the application integrat...
Conference Paper
This paper presents a software-based approach for testing IEEE1500-compliant SoCs. In the proposed approach, the test program is no longer executed by the traditional external tester but by the SoC itself. The novel feature is the use of a dedicated test processor called T-Proc embedded onto the SoC to test the components. Under the control of the em...
Conference Paper
Abstract: This paper presents the DSPIN micro-network, which is an evolution of the SPIN architecture. DSPIN is a packet-switched micro-network for multiprocessor system-on-chip (MPSoC) architectures using the GALS (Globally Asynchronous, Locally Synchronous) approach. The DSPIN architecture has a very small silicon area and...
Conference Paper
Architectural exploration and application development for digital Systems on Chip demand more and more performance from the simulator. Today, the standard design flow uses a unified modeling language and only one simulator for every development step. SystemC-based simulators are efficient for validating hardware specifications, but their performance is not...
Conference Paper
This paper presents STEPS, an innovative software-based approach for testing P1500-compliant SoCs. STEPS is based on the concept that the ATE is not considered as an initiator applying vectors to the SoC test pins but rather as a target, a huge repository of 32-bit test data and control commands. The ATE is connected to the functional SoC external...
Article
The micro network SPIN (Scalable Programmable Integrated Network) is a packet-switched system-on-chip interconnection. This technology provides a very general communication mechanism between the different virtual components connected in the system. Moreover, the bandwidth increases linearly with the number of embedded processors. This paper descr...
Article
We present a physical implementation of a 32-port SPIN micro-network. For a 0.13 micron CMOS process, the total area is 4.6 mm², for a cumulated bandwidth of about 100 Gbits/s. In a 6 metal process, all the routing wires can be routed on top of the switching components. The SPIN32 macro-cell will be fabricated by ST Microelectronics, but this macroc...
Conference Paper
This paper presents the SPIN micro-network that is a generic, scalable interconnect architecture for system on chip. The SPIN architecture relies on packet switching and point-to-point bi-directional links between the routers implementing the micro-network. SPIN gives the system designer the simple view of a single shared address space and provides...
Conference Paper
We present a physical implementation of a 32-port SPIN micro-network. For a 0.13 micron CMOS process, the total area is 4.6 mm², for a cumulated bandwidth of about 100 Gbits/s. In a 6 metal process, all the routing wires can be routed on top of the switching components. The SPIN32 macro-cell will be fabricated by ST Microelectronics, but this mac...
Conference Paper
This paper presents an architectural study of a scalable system-level interconnection template. We explain why the shared bus, which is today's dominant template, will not meet the performance requirements of tomorrow's systems. We present an alternative interconnection in the form of switching networks. This technology originates in parallel compu...
Article
In this paper we describe a full-custom register file generator based on a tiling approach to achieve high-density integration. The leaf cells are designed using symbolic layout, providing a high degree of technology independence and portability. Co-simulation and formal proof ensure the validity of the tool. Introduction Register files are main...
Conference Paper
We present a simple technique for efficient cycle precise core based system simulator implementation. We first examine the current communication mechanisms in state-of-the-art digital embedded systems, and notice that few signals depend on signals set during the same cycle. Using a system model based on communicating finite state machines, we build...
Conference Paper
As integration of a whole system on a chip becomes possible, the need for fast and precise simulation tools increases. We present a high speed cycle precise simulation environment dedicated to core based embedded systems. We show that for the type of systems we aim at simulating, a correct simulation is obtained without event propagation. This rest...
Chapter
This paper presents a High Level Synthesis (HLS) method for specialized coprocessors in embedded systems. In recent years, the synthesis of hardware systems has moved to a higher level of abstraction, but the existing tools leave very little initiative to the designer. With User Guided High Level Synthesis (UGH), we introduce the notion of Draft Da...
Conference Paper
RCube is a reconfigurable network switching element that supports any topology and routes data packets of any length. It provides a simple but efficient support for adaptivity. Its architecture is based upon an 8×8 cross-bar with 8 on-chip bidirectional one Gbit/s asynchronous serial links. The serial link technology is developed within the OMI-HIC...
Conference Paper
This paper presents the design methodology used to implement a test chip for a multiport register file, as well as, the motivation of the project. The register file contains 6 read buses, 4 write buses, and 64 words of 32 bits. A built-in-self test scheme has been used to validate the register file. The final test chip contains 65000 transistors. T...
Conference Paper
We describe a CMOS Read Only Memory architecture designed for high performance and low power consumption using domino logic. Short read delays are achieved using hierarchical evaluation of the read busses, at the price of some additional hardware. Partial block evaluation allows power consumption to be greatly reduced for blocks with an important number...
Article
A complete course on teaching logical testing at university Paris 6 is dedicated to postgraduate students and focuses on the test problem during design process and after fabrication on an ASIC (Application Specific Integrated Circuit). It takes place after an initiation course on CMOS VLSI design where students are required to design and implement...
Conference Paper
We present a new approach for FSM synthesis on the two most popular FPGA architectures: Actel and Xilinx. This approach deals with state assignment, optimization and mapping problems. Very fast FPGA mapping techniques based on multi-ROBDD representation (Shared, Reduced and Ordered BDDs) have been defined that allow the target architecture to be re...
Conference Paper
This paper presents the design methodology for a Superscalar 128-bit Very Long Instruction Word (VLIW) processor. A full set of portable cell libraries, macro-block generators associated with complex and advanced tools, such as logic synthesis, functional abstractor, formal proof tool and data-path compiler have been used in order to achieve a fast...
Conference Paper
We describe a high performance portable read only memory generator tool for CMOS circuits. The layout strategy uses a tiling placement approach to ensure good density. The leaf cells are designed using symbolic layout, providing a high degree of technology independence. The tiler is written using the general purpose C language to ensure software po...
Conference Paper
Presents a new view of routing messages in interconnection networks based on the known compact interval labeling. The authors propose simple algorithms, encapsulating networks and routing, suitable for a large class of topologies. They define a floating rule that unifies the notions of virtual channels and multiple intervals labeling. The introduce...
Conference Paper
The symbolic simulation approach is well suited to checking VLSI design-for-testability rules. Unfortunately, the existing techniques are not extensible, and therefore the verification tools based on them cannot keep up with the evolution of the rules. This paper discusses the need for this kind of method and then describes a new symbolic simulation in which...
Conference Paper
The authors describe the design steps of portable generators for CMOS circuits. Starting from a parameterized behavioral description to obtain a layout, the methodology indicates how to specify, build and validate the layout view of parameterized module generators. This methodology uses Alliance CAD system tools for both generation and validation....
Conference Paper
A new symbolic simulation technique for design for testability (DFT) rules checking is discussed. With this method, the symbolic values and transfer functions of gates can be redefined, so that the checker can be adapted to different sets of rules.
Conference Paper
The symbolic simulation approach is well suited to checking VLSI design for testability (DFT) rules. Unfortunately, the existing techniques are not extensible, and therefore the verification tools based on them cannot be parametrized to follow the evolution of the rules. As a solution, a concept of symbolic simulator generator is proposed: both symbolic va...
Conference Paper
This paper presents an approach based on ROBDD representation for multilevel logic synthesis. This approach makes it possible to handle very high complexity circuits that cannot be synthesized by the classical factorization algorithms. Two important algorithms are used. The former generates a multilevel expression from an ROBDD. The latter builds a...
Conference Paper
This paper presents the design flow for a superscalar VLIW microprocessor using the 0.8 μ CMOS portable ASIC library developed in the framework of the ESPRIT2 IDPS project. A full set of cell libraries and macro-block generators have been used, in order to achieve fast design cycle and to maintain a high level of integration and performance. The fi...
Conference Paper
The traditional approaches for multilevel logic optimization involve representing Boolean functions in Sum-of-Product forms that are minimized and then factorized in multilevel expressions. We have investigated an alternative approach called graphic synthesis that is based on a set of Reduced and Ordered BDDs, namely a multi-ROBDD for representi...
Conference Paper
DESB is included in a set of tools for hierarchical verification of custom VLSI circuits. These tools include the layout extractor DAX, DESB, the electrical rule checker VERTEC, and the timing analyzer TAS. The functional abstractor DESB is the key point in a hierarchical verification process. A functional abstractor for CMOS VLSI circuits is prese...
Conference Paper
An effective layout design method for VLSI macrocell is presented. The method describes a way to write, to test and to validate efficient full-custom generators and tilers. First, after a brief overview of the design methodology of tilers, the GENLIB C-library of procedural design functions is described. Second, GENVIEW, a portable and graphic layo...
Conference Paper
A CMOS timing analyser using accurate delay models is presented. Switch-level analytic delays are derived from I/V characteristics of short-channel MOSFETS. A significant improvement in accuracy is obtained from the analysis of pertinent capacitances, modeling conflicts and slope effects in CMOS gates. The program handles large-scale circuits and g...
Conference Paper
NOISY is a part of an electrical rule checker with emphasis on noise computation. NOISY analyzes all types of noise, computes worst-case conditions using a relaxation algorithm, and draws a map of noise distribution in a chip. Its hierarchical organization allows verification of a high-complexity chip (more than 100000 transistors). Developed for C...
Article
This paper describes the architecture and the design of a high performance single chip Radix-2 FFT Butterfly. The architecture is optimized for very efficient bit-serial data processing and for pipeline-parallel hardware structures. The design is based on a standard cell approach including LSSD test capabilities. The circuit has been fabricated a...
Article
ABSTRACT: Increasingly complex integrated systems require, for an identical development time, reusable components as well as a high-performance simulator that facilitates architectural exploration. An interesting simulation level is the cycle level, which makes it possible to model exactly the behavior of a component with...
Article
This paper presents STESI, a software-based approach for testing SoCs containing wrapped IP cores. In the proposed approach, the test program is no longer executed by the traditional ATE but by the SoC itself. The novel feature of the STESI approach is the use of a dedicated test coprocessor embedded on the SoC to test the remaining components. Using...
Article
ABSTRACT: In a field where integration capacity doubles every two years and now makes it possible to integrate several tens of millions of transistors on a single chip, the challenge for the semiconductor sector in the coming years is to integrate on a single chip systems containing several tens of processors or coprocessors...
Article
ABSTRACT: One of the essential steps in the design of VLSI integrated circuits is logic synthesis. A logic synthesis tool takes a behavioral description as input and generates as output a network of interconnected logic gates. We propose a logic synthesis method using a cell compiler that...
Article
Software-based test of SoCs consists in testing IP cores using embedded processor cores. Previously proposed solutions are usually ad-hoc. Therefore, this paper presents STESOC, a software-based Test Access Mechanism for SoCs containing standard-wrapped IP cores. Under the control of the embedded microprocessor, a dedicated test coprocessor test...

Citations

... So we keep at most 2 mm² for TSV in 3D architecture, which is enough for both signal and power supply [47]. We use the values shown in Tables 3 and 4 in our throughput, cost, and energy model to calculate the cost function of the design points. ...
... Chiplet technology is fundamentally an advanced packaging solution. It integrates multiple dies (chiplets) onto a substrate, which could be an organic substrate [4], [36] or a silicon interposer [19], [59]. This integration is achieved through high-density interconnections, allowing the dies to function collectively as a single chip. ...
... The size of each operator can also be optimized for the exact precision of the operations to be executed [LLCHM10]. This is done, for example, in the UGH software [AP08]. ...
... Moreover, dense 3D interconnects enable high bandwidth density with parallel signaling. Such a communication scheme enables fully modular and scalable cache-coherent architecture, for offering large many-cores [20][28] [29]. ...
... During the past decade, processor designers started working on expanding the multi-core systems to many-core systems where the latter contains a large number, tens to hundreds, of cores [4][5][6] and they developed several many-core systems over the last few years [7][8][9]. Because the Interconnection networks architecture is at the core of such systems, various architectures have been proposed in the literature. ...
... The controller uses a list of defective tiles to redirect traffic initially directed at these tiles to known working tiles. More detail on L3 micro-architecture and performance can be found in [28] [29]. ...
... TLM-DT is presented in [7,31]. It is compliant with the TLM2 standard, but it needs to shift from a global time to a distributed time, which requires modifying all the timing information in the model. ...
... In this case, it provides optimal flow control [Rad04] or task scheduling and synchronization [Cle09] functionality. The latter functionality is generally provided in software, through an operating system for example, in networks such as SPIN [Cha06] and STNoC [Cop08]. ...
... All these theories assume that a message arriving at its destination is eventually consumed. As an example, in the SPIN/VCI [9] network this condition is not satisfied: some deadlocks may occur due to the dependencies between different kinds of messages. The VCI protocol defines two types of messages: requests and responses. ...
... The advantage of FIFO is that they do not affect the locally synchronous domain's operation. Many FIFO based designs have been published recently [6], [7] and [8]. ...