Figure 1 - uploaded by Frank Hannig
Netlist representation: (a) schematic graphical representation, (b) BLIF textual representation, and (c) netgraph representation of the given netlist.

Source publication
Conference Paper
In this paper we present a new approach for generating high-speed optimized event-driven register transfer level (RTL) compiled simulators. The generation of the simulators is part of our BUILDABONG(7) framework, which aims at architecture and compiler co-generation for special purpose processors. The main focus of the paper is on the transformatio...

Contexts in source publication

Context 1
... how a given netlist is transformed into such a graph representation. basic logic elements. A unidirectional net $f \in F$ that interconnects $n + m$ elements is represented as $f = (\{v_1, \dots, v_n\}, \{u_1, \dots, u_m\})$, where $v_1, \dots, v_n \in V$ are the source nodes and $u_1, \dots, u_m \in V$ are the target nodes of net $f$. Example 1. In Fig. 1, a netlist is shown with $|V| = 10$ elements. Net $f_1$ is given by $f_1 = (\{r_4\}, \{c_2, c_3\})$. Nodes named $r_i$ denote sequential elements, whereas nodes named $c_i$ denote combinational (i.e., state-free) logic ...
Context 2
... 2. In Fig. 1, a netlist and its netgraph $G = (V, E)$ are shown. The subset of combinational elements $V_c = \{c_2, c_3, c_4, c_6\}$ is shown as circles, and the subset of registers (sequential vertices) $V_r = \{r_3, r_4, r_5, r_6, r_9, r_{10}\}$ is represented by ...
Context 3
... Furthermore, all of the nets are represented as directed edges $e \in E$ of a netgraph $G$. When a net contains an $n : m$ connection, it is transformed into $n \times m$ directed edges of the netgraph. (In Example 1, $f_1$ is a $1 : 2$ connection, which is transformed into the edges $(r_4, c_2)$ and $(r_4, c_3)$ of the graph $G$ in Fig. 1 (c).) Given such a netgraph, a simple procedure to determine the initial sensitivity-update-mappings could be as follows: if there exists a directed path from one register $v_{r1}$ to another register $v_{r2}$, and no other sequential elements lie on the path between these registers, then if the value of $v_{r1}$ changes, $v_{r2}$ has to ...
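The n : m expansion described in Context 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name net_to_edges and the representation of a net as a pair of lists are assumptions made here. Only the net f1 = ({r4}, {c2, c3}) is taken from Example 1.

```python
from itertools import product


def net_to_edges(net):
    """Expand one net (sources, targets) into its n x m directed edges.

    `net` is a hypothetical representation: a pair of lists holding the
    source nodes and the target nodes of the net.
    """
    sources, targets = net
    # Every source is connected to every target.
    return list(product(sources, targets))


# Net f1 from Example 1: one source (r4), two targets (c2, c3).
f1 = (["r4"], ["c2", "c3"])
print(net_to_edges(f1))  # [('r4', 'c2'), ('r4', 'c3')]
```

Applying net_to_edges to every net and concatenating the results would give the edge set E of the netgraph.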
Context 4
... on the path between these registers are no other sequential elements, then if the value of $v_{r1}$ changes, $v_{r2}$ has to be updated by evaluating the path of combinational elements in between. A set of such initial sensitivity-update-mappings and evaluation paths can be obtained with a search algorithm such as depth-first search (DFS). For the example in Fig. 1 (c), the initial sensitivity-update-mappings are extracted in (a): In a technical implementation, for instance, registers $r_6$ and $r_9$ would have to be updated twice if $r_3$ and $r_4$ had changed their values compared to the previous simulation ...
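The DFS-based extraction mentioned in Context 4 could look roughly like the sketch below. It is an illustration under stated assumptions, not the procedure from the paper: the function name sensitivity_mappings is invented here, and the edge list is a hypothetical toy netgraph. Only the edges (r4, c2) and (r4, c3) come from Example 1; the remaining edges were chosen so that both r3 and r4 reach r6 and r9, in line with the remark above. The sketch collects only the target registers and does not record the evaluation paths.

```python
def sensitivity_mappings(edges, registers):
    """Map each register to the registers reachable from it through
    combinational nodes only (no other register on the path)."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    mappings = {}
    for start in sorted(registers):
        reached, visited = set(), set()
        stack = list(successors.get(start, []))
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in registers:
                reached.add(node)  # a path ends at the first register it hits
                continue
            if node in visited:
                continue
            visited.add(node)
            stack.extend(successors.get(node, []))
        mappings[start] = reached
    return mappings


# Hypothetical toy netgraph; only the two r4 edges are from Example 1.
edges = [("r4", "c2"), ("r4", "c3"), ("r3", "c2"), ("r3", "c3"),
         ("c2", "r6"), ("c3", "c4"), ("c4", "r9")]
registers = {"r3", "r4", "r6", "r9"}

result = sensitivity_mappings(edges, registers)
for reg in sorted(result):
    print(reg, sorted(result[reg]))
# r3 ['r6', 'r9']
# r4 ['r6', 'r9']
# r6 []
# r9 []
```

In this toy graph, a change of r3 and a change of r4 each trigger r6 and r9, which is the situation in which a naive implementation would update those two registers twice.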

Citations

... Remark. Our previous simulator for TCPAs was event-driven and had instructions precompiled instead of decoding them at runtime like the hardware does [4]. We have since found that a more accurate simulator does indeed facilitate prototyping and validating new compiler and accelerator features. ...
Conference Paper
To quickly prototype accelerator/compiler co-designs, fast and highly accurate architectural simulators are indispensable. They must be fast to keep design iteration times low; they must be highly accurate to make simulation results meaningful. In this paper, we describe how to construct such fast, cycle-accurate simulators from an architectural model by using C++ templates. Not only are templates fully resolved at compile time, thus offering ample opportunity for optimization, they also aptly mirror synthesis-time parameterization of accelerators. For each hardware component, we encode these architecture parameters in a C++ type and construct a class templated on this type. Hierarchically composing the component classes then yields the overall simulator. To demonstrate our constructed simulators' speedup, we constructed two simulators for a lightweight VLIW processor, one with and one without templates, and measured their performance: the templated simulator is about 4.85 times faster. Their execution speed makes our simulators well-suited for compiler validation and prototyping accelerator features.
... A prominent feature of massively parallel processor architectures is a high number of registers. On the one hand, the RT-level high-speed simulation approach presented in [KHT04b] can be applied here, as it enables high-speed RTL simulation of complex architectures with a large number of registers, such as those found in processor arrays. This simulation methodology provides direct automatic generation of the simulator from the given RTL netlist. ...
... On the other hand, instruction-set-level (ISL) simulation can typically perform at much higher speeds. Therefore, to enable trading off the speed of ISL simulation against the precision of RTL simulation, the simulation approach presented in [KHT04b] should be extended to efficient ISL simulation. Details on how to efficiently and accurately simulate complete processor array architectures will not be touched on here. ...
... Bringing the ARCHITECTURECOMPOSER framework to the higher level of System-on-Chip and multi-core interactive design required two things. First, faster simulations than the ASM approach described above were needed; thus we recently developed the so-called RasimSimulator [19], [20], which can automatically generate a stand-alone, cycle-accurate, bit-true, C++-compiled fast simulator that executes the simulation up to approximately two orders of magnitude faster than the Modelsim [9] VHDL-RTL simulation. Second, the integration of Wishbone OCB and several peripheral IP, as presented in this paper, is necessary to bring the framework to the higher level of SoC design. ...
... With the help of several optimization techniques based on graph-theoretic approaches [19], a speedup of 6 was achieved compared to Modelsim; see Table 1. Because of the asynchronous communication mode, the high potential of these techniques for synchronous designs [19], [20] could not be fully exploited in this case. ...
Conference Paper
The integration of different Intellectual Property (IP) cores into modern System-on-Chip (SoC) designs is becoming an increasingly important topic because of its benefits for overall system performance and design costs. In this paper we present a new generic framework consisting of a graphical user interface with an extendable, highly parameterizable IP component library for convenient SoC architecture entry, as well as software tools that automatically generate fast cycle-accurate simulators for verification purposes and synthesizable HDL code for hardware synthesis. Because the communication between the individual IP cores also plays an important role, our IP core library includes an open-source bus component, which is used in a case study design. Topics: System-on-a-chip: design and methodology, Case studies, FPGA-based design
... A prominent feature of massively parallel processor architectures is a high number of registers. The RT-level high-speed simulation approach presented in [18] is applied here, as it enables high-speed RTL simulation of complex architectures with a large number of registers, such as those found in processor arrays. This simulation methodology provides direct automatic generation of the simulator from the given RTL netlist. ...
... This paper is based on our work presented in [7], where we proposed a mixed register-transfer level compiled simulation technique in which the simulator is automatically extracted from an RTL description and the application program is compiled prior to the simulator run. In this paper we present a novel strategy to derive optimal simulators. ...
... In [7], we presented a graph decomposition algorithm. The algorithm transforms a graph representing a given RTL circuit into subgraphs, denoting the minimal subsets of sequential elements which have to be reevaluated during each simulation cycle. ...
... We performed a number of tests to measure the number of comparisons and the number of updates when using the proposed IF-THEN-ELSE tree extraction algorithm. Applying our new algorithm reduces the number of comparisons by up to 30% while achieving the same optimal number of register updates as the decomposition algorithm presented in [7]. As a realistic case study, a simulator for the MIPS R3000 32-bit processor has also been generated and evaluated. ...
Article
In this paper we focus on the derivation of optimal code when generating high-speed event-driven compiled simulators for processor architectures described at the register transfer level (RTL). The generation of the simulators is part of a framework that aims at architecture and compiler co-generation for special-purpose processors. The main contribution of this paper is an efficient algorithm to generate optimal if-then-else structures in order to perform the update cycle during the event-driven simulation process. Our approach guarantees that during one simulation cycle a possible change of each register content is checked exactly once and that each register is updated at most once. Additionally, the proposed technique minimizes the code size of the generated simulator. The simulator's superior performance compared to an existing commercial simulator is shown. Finally, we demonstrate the pertinence of our approach by simulating a MIPS processor.
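The two guarantees stated in this abstract (each register is checked for a change exactly once per cycle, and each register is updated at most once) can be illustrated with a simple set-based update cycle in Python. This is only a sketch of the property, not the if-then-else generation algorithm of the paper: the function simulation_cycle, the mapping table, and the next-state functions for r6 and r9 are all hypothetical.

```python
def simulation_cycle(state, prev, mappings, next_value):
    """Run one event-driven update cycle and return (new_state, updated).

    state, prev -- register values in the current and previous cycle
    mappings    -- register -> set of registers sensitive to its change
    next_value  -- register -> function computing its new value from state
    """
    to_update = set()
    for reg in state:  # each register is checked for a change exactly once
        if state[reg] != prev[reg]:
            to_update |= mappings.get(reg, set())

    new_state = dict(state)
    for reg in to_update:  # a set holds each register once: at most one update
        new_state[reg] = next_value[reg](state)
    return new_state, to_update


# Hypothetical example: r3 and r4 both changed, and both drive r6 and r9.
mappings = {"r3": {"r6", "r9"}, "r4": {"r6", "r9"}}
next_value = {
    "r6": lambda s: s["r3"] ^ s["r4"],  # assumed combinational logic
    "r9": lambda s: s["r3"] & s["r4"],  # assumed combinational logic
}
prev = {"r3": 0, "r4": 0, "r6": 0, "r9": 0}
state = {"r3": 1, "r4": 1, "r6": 0, "r9": 0}

new_state, updated = simulation_cycle(state, prev, mappings, next_value)
print(sorted(updated))  # ['r6', 'r9']
print(new_state)        # {'r3': 1, 'r4': 1, 'r6': 0, 'r9': 1}
```

Collecting the targets in a set before updating is one simple way to avoid updating r6 and r9 twice when both r3 and r4 change. According to the abstract, the paper instead generates if-then-else structures that provide the same guarantee.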
Chapter
Optimization is the process of finding the best input variable values from among all possibilities without explicitly evaluating each possibility.