
LYCOS: the Lyngby Co-Synthesis System

Abstract and Figures

This paper describes the LYCOS system, an experimental co-synthesis environment. We present the motivation and philosophy of LYCOS and, after an overview of the entire system, describe its individual parts. The target architecture consists of a single CPU and a single ASIC, and we describe the techniques used to estimate hardware, software and communication metrics for this architecture. Finally, we present a novel partitioning technique called PACE, which has been shown to produce excellent results, and we demonstrate how partitioning is used for design space exploration.
A CDFG representing the computation z := x + x * y

A CDFG is a hierarchical directed hypergraph consisting of nodes and edges. The semantics is based on a token passing mechanism, similar to colored Petri nets [33], [36]. The edges are entities on which tokens (i.e. values) can flow between nodes. Nodes can remove tokens from their input edges and place tokens on their output edges according to certain firing rules. There are different kinds of nodes. The nodes shown in figure 6 are all infix nodes, which have two input edges, one output edge and an associated infix operator. When an infix node, op, has tokens, say v1 and v2, on its input edges and no token on its output edge, it can fire by placing the token, v1 op v2, on its output edge, and removing v1 and v2 from its input edges. Other kinds of nodes are prefix nodes, constant nodes, control nodes to express conditionals, iteration nodes to express loops, void nodes to absorb tokens from edges, etc. For more details on these nodes, see [6], [41]. A graph is executed by placing tokens on its input edges and letting the nodes fire until no more firing rules can be satisfied.

It is possible for a collection of CDFGs to communicate with each other through shared variables. Special interface nodes are used for this: import and export nodes for sampling and updating the contents of shared variables, respectively, and wait nodes for synchronization to global events (i.e. certain contents of shared variables). Figure 7 gives an example of two CDFGs, Graph1 and Graph2, which communicate through two shared variables, s and ok. Graph1 has two export nodes, E1 and E2, which can update s and ok with the value of a token on the x edge and the b edge, respectively. Graph2 has a wait node, W1, which requires ok to contain the value true in order to fire. In addition, Graph2 has an import node, I1, which can sample the value of s and place it on the y edge.
A common feature of the interface nodes is that they all require a token on their vertical input edge
… 
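The firing rule for infix nodes described above can be illustrated with a small simulation. This is a minimal sketch, not the LYCOS implementation: the class `InfixNode`, the function `run` and the dictionary-based edge store are hypothetical names introduced here. The example graph takes the computation to be z := x + x * y, as the Add and Mult nodes of the figure suggest, and it sidesteps hyperedge fan-out by placing a copy of x on two separate edges, `x1` and `x2`, which is a simplification.

```python
import operator


class InfixNode:
    """Infix node: two input edges, one output edge, one binary operator."""

    def __init__(self, op, in1, in2, out):
        self.op, self.in1, self.in2, self.out = op, in1, in2, out

    def can_fire(self, edges):
        # Firing rule: a token on each input edge and no token on the output edge.
        return (edges[self.in1] is not None
                and edges[self.in2] is not None
                and edges[self.out] is None)

    def fire(self, edges):
        # Remove v1 and v2 from the inputs and place v1 op v2 on the output.
        v1, v2 = edges[self.in1], edges[self.in2]
        edges[self.in1] = edges[self.in2] = None
        edges[self.out] = self.op(v1, v2)


def run(nodes, edges):
    """Let nodes fire until no firing rule can be satisfied any more."""
    fired = True
    while fired:
        fired = False
        for node in nodes:
            if node.can_fire(edges):
                node.fire(edges)
                fired = True
    return edges


# Edges hold at most one token; None means "no token".
edges = {"x1": 3, "x2": 3, "y": 4, "a": None, "z": None}
nodes = [
    InfixNode(operator.add, "x1", "a", "z"),   # z := x + a
    InfixNode(operator.mul, "x2", "y", "a"),   # a := x * y
]
run(nodes, edges)
```

With x = 3 and y = 4, the Mult node fires first because the Add node is still waiting for a token on `a`; after the second pass the only remaining token is the result on `z`. The order in which nodes are listed does not matter, since execution is driven purely by token availability.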
Figure (image not recovered): target architecture with a processor (SW) and an ASIC (HW) connected by a communication channel.
Figure (image not recovered): LYCOS system overview. Recoverable labels: Specification, C and VHDL translators, Quenya, HW, SW and communication estimators with their libraries, Func. Lib., Interf. Lib., Allocation, Scheduling, Assignment, Partitioning, Analysis, Interface, Power Est., Code Gen., VHDL, Assembler, Synopsys.
Figure (image not recovered): CDFG examples, including Graph1 and Graph2 communicating through the shared variables s and ok by means of the export nodes E1 and E2, the wait node W1 and the import node I1.
Figure (image not recovered): CDFG and BSB hierarchy. A) Original hierarchy, seven leaf BSBs. B) Cond BSB collapsed, six leaf BSBs. C) Test and Body BSBs collapsed, five leaf BSBs.
Figure (image not recovered): controller and datapath arrangements for blocks B1 and B2.
Figure (image not recovered): A) Simple data flow graph with nodes U, V, W, X, Y, Z. B) Three different topological sortings over time steps T = 1 to T = 6.
Technology files for the 8086 and the 68020 (figure text). Each file maps a generic instruction to processor instructions; the numbers in parentheses are as printed in the figure.

Generic instruction: dmem3 <-- dmem1 + dmem2

8086 instructions:
  mov ax, word ptr[bp+offset1]    (10)
  add ax, word ptr[bp+offset2]    (9+EA1)
  mov word ptr[bp+offset3], ax    (10)

68020 instructions:
  mov a6@(offset1), d0            (7)
  add a6@(offset2), d0            (2+EA2)
  mov d0, a6@(offset3)            (5)

Each technology file entry also has "execution time" and "size" columns; the values 35 and 22 appear in these columns in the figure.
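The technology-file idea shown in the figure, where one generic instruction expands to a processor-specific instruction sequence with per-instruction costs, could be used for software time estimation roughly as follows. This is only a sketch under assumptions: the dictionary layout and the functions `instruction_cycles` and `program_cycles` are hypothetical, the parenthesised numbers are read here as clock-cycle costs, and the effective-address terms EA1 and EA2 are not given in the extracted text, so they are passed in as an explicit parameter.

```python
# Hypothetical in-memory form of the technology-file entries from the figure.
# Each processor instruction is (text, fixed_cycles, uses_effective_address);
# fixed_cycles are the parenthesised numbers printed in the figure.
GENERIC_ADD = "dmem3 <-- dmem1 + dmem2"

TECHNOLOGY_FILES = {
    "8086": {
        GENERIC_ADD: [
            ("mov ax, word ptr[bp+offset1]", 10, False),
            ("add ax, word ptr[bp+offset2]", 9, True),    # printed as 9+EA1
            ("mov word ptr[bp+offset3], ax", 10, False),
        ],
    },
    "68020": {
        GENERIC_ADD: [
            ("mov a6@(offset1), d0", 7, False),
            ("add a6@(offset2), d0", 2, True),            # printed as 2+EA2
            ("mov d0, a6@(offset3)", 5, False),
        ],
    },
}


def instruction_cycles(processor, generic_instruction, ea_cycles):
    """Cycles for one generic instruction; ea_cycles is an assumed EA cost."""
    expansion = TECHNOLOGY_FILES[processor][generic_instruction]
    return sum(fixed + (ea_cycles if uses_ea else 0)
               for _text, fixed, uses_ea in expansion)


def program_cycles(processor, generic_program, ea_cycles):
    """Sum the per-instruction estimates over a list of generic instructions."""
    return sum(instruction_cycles(processor, generic, ea_cycles)
               for generic in generic_program)
```

Keeping the processor-specific data in a table separate from the estimation code mirrors the figure: retargeting the estimate to another processor means supplying another table, while the generic program stays the same.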
Figure (image not recovered): panels A) and B) showing blocks B1 to B8 distributed between SW and HW, with the labels 3,4 and 6,7 next to blocks B3, B4 and B6, B7.

PACE example (figure text). Four blocks A, B, C and D, each with area a = 1 and speedups s(A) = 5, s(B) = 10, s(C) = 2, s(D) = 10. The figure also attaches the values 2, 2 and 4 to the adjacent pairs AB, BC and CD.

Sequence    Area a    Speedup s
A           1         5
B           1         10
C           1         2
D           1         10
AB          2         17
BC          2         14
CD          2         16
ABC         3         21
BCD         3         28
ABCD        4         35

Candidate speedups per group for available area 1 to 4:

Group A:  A: 5, 5, 5, 5
Group B:  B: 10, 10 + 5 = 15, 10 + 5 = 15, 10 + 5 = 15;  AB: 17, 17, 17
Group C:  C: 2, 2 + 10 = 12, 2 + 17 = 19, 2 + 17 = 19;  BC: 14, 14 + 5 = 19, 14 + 5 = 19;  ABC: 21, 21
Group D:  D: 10, 10 + 10 = 20, 10 + 17 = 27, 10 + 21 = 31;  CD: 16, 16 + 10 = 26, 16 + 17 = 33;  BCD: 28, 28 + 5 = 33;  ABCD: 35

Best speedup per group:

Area:       1     2     3     4
Group A:    5     5     5     5
Group B:    10    17    17    17
Group C:    10    17    21    21
Group D:    10    20    28    35

The final entries are labelled BestSpeedup[D, 4] and BestChoice[D, 4]. BestChoice entries name a sequence by its first and last block (A,A; A,B; B,B; A,C; B,C; C,C; A,D; B,D; C,D; D,D).
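The example values above are consistent with a dynamic-programming recurrence over sequences of adjacent blocks: the best speedup for a group and an area budget is either the best speedup of the previous group, or the speedup of a sequence ending at the current block plus the best speedup of the blocks before that sequence with the remaining area. The following is a sketch of that recurrence, not the authors' PACE implementation; the function name `best_speedups`, the 0-based block indices and the data layout are hypothetical, and only the (area, speedup) pairs come from the example.

```python
# (first_block, last_block) -> (area, speedup); blocks 0..3 stand for A..D.
SEQUENCES = {
    (0, 0): (1, 5),  (1, 1): (1, 10), (2, 2): (1, 2),  (3, 3): (1, 10),
    (0, 1): (2, 17), (1, 2): (2, 14), (2, 3): (2, 16),
    (0, 2): (3, 21), (1, 3): (3, 28),
    (0, 3): (4, 35),
}


def best_speedups(sequences, n_blocks, max_area):
    """best[j][a]: best total speedup using the first j blocks within area a."""
    best = [[0] * (max_area + 1) for _ in range(n_blocks + 1)]
    for j in range(1, n_blocks + 1):
        for a in range(max_area + 1):
            # Option 1: keep the last block of this group in software.
            value = best[j - 1][a]
            # Option 2: move a sequence i..j-1 to hardware and combine it
            # with the best result for the blocks before i.
            for i in range(j):
                area, speedup = sequences[(i, j - 1)]
                if area <= a:
                    value = max(value, speedup + best[i][a - area])
            best[j][a] = value
    return best


best = best_speedups(SEQUENCES, n_blocks=4, max_area=4)
for name, row in zip("ABCD", best[1:]):
    print(f"Group {name}:", row[1:])
```

For the example data this prints the rows 5 5 5 5, 10 17 17 17, 10 17 21 21 and 10 20 28 35, matching the best-speedup table. Recording which option produced each maximum would give a table corresponding to BestChoice, from which the chosen hardware sequences can be traced back.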
Figure (image not recovered): three panels, Instantaneous Communication, Simple Communication and Adjacent Block Communication, each showing blocks B1 to B4 distributed between SW and HW.
Figure (image not recovered): resulting clock cycles and total chip area for three curves: knapsack algorithm with instantaneous communication, knapsack algorithm with simple communication, and PACE algorithm with adjacent block communication.
Figure (image not recovered): resulting clock cycles and total chip area for Allocation A, Allocation B and Allocation C.
Figure (image not recovered): speedup and hardware cycles (all-HW execution time) against the number of combinatorial multipliers (mul-comb).
... There are mainly two categories for partitioning approaches: exact solutions and heuristic solutions. The family of exact solutions includes dynamic programming (DP) [20,34,43,48], integer linear programming (ILP) [3,38,50,55], and branch and bound [7,21,36]. An exact algorithm performs an exhaustive search in the solution space and finds the optimal result. ...
... A substantial amount of research has been conducted on HW/SW partitioning. For small-scale HW/SW partitioning problems, some exact algorithms have been proposed that obtain an exact solution via exhaustive search of the feasible space [3,7,20,21,34,36,38,43,48,50,55]. In [38], an algorithm that solves the HW/SW partitioning problem using integer programming is presented. ...
Article
Hardware/software (HW/SW) partitioning is a crucial step in HW/SW co-design, which can significantly reduce the time-to-market and improve the performance of an embedded system. Because the majority of previous works have long exploration times and often generate low-quality solutions for large-scale systems, we propose a fast HW/SW partitioning approach based on a graph convolution network (GCN) to address this problem. To the best of our knowledge, it is a new partitioning method based on GCN, which is a gradient-based optimization approach. It can aggressively speed up the partitioning process. To quantify the quality of solutions, scheduling is integrated into the partitioning process. The experimental results show that our proposed method not only outperforms existing metaheuristic approaches in terms of efficiency (e.g., 18× faster than the Kernighan–Lin algorithm for task graphs with 1000 nodes), but also improves the quality of HW/SW partitioning (e.g., more than 10% acceleration ratio (AR) improvement for the 1000-node graphs).
... Finding a software/hardware partition of a CPS is an ongoing challenge. Automation of this problem has already been researched, and exact partitioning [5], [22], [20] or heuristic partitioning models [30], [6], [7] have been deployed. ...
... The existing models and algorithms for partitioning can be broadly divided into exact and heuristic partitioning models. The exact algorithms comprise branch-and-bound [5], integer linear programming [22] and dynamic programming [16], [20], whereas the heuristic algorithms include simulated annealing [9], [21], [27], [26] and genetic algorithms. ...
Conference Paper
Cyber-physical systems (CPSs) are an example of software and hardware components working in symphony. The greatest challenge in CPS design and verification is to design a CPS to be reliable while encountering various uncertainties from the environment and its constituent subsystems. Cost, delay, and reliability of a CPS are functions of software–hardware partitioning of the CPS design. Hence, one of the key challenges in CPS design is to achieve reliability maximization while factoring in uncertainty in cost and delay. This work leverages the problem formulation developed in recent research (Jiang et al., Uncertainty theory based reliability-centric cyber-physical system design, in 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 208–215 (2019)), which poses CPS design as an optimization problem for reliability assurance while factoring in uncertainty in cost and delay. In this formulation, cost and delay are modeled as variables with uncertainty distributions under uncertainty theory, and the reliability requirement becomes an optimization objective. The authors of Uncertainty theory based reliability-centric cyber-physical system design, in 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 208–215 (2019) also show that heuristic solutions of this optimization problem can produce hardware/software partitioning which has potential to offer greater reliability under uncertainty. 
The novel contribution of this work is the exploration of alternate heuristics to genetic algorithm used in Uncertainty theory based reliability-centric cyber-physical system design, in 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 208–215 (2019) by Jiang et al. to solve the optimization problem. We conclude that treating the optimization problem as a 0–1 integer quadratic programming problem is feasible and then explore a few heuristics to solve such problems. Next, we solve this problem with a heuristic method. Preliminary results suggest that this solution method can achieve better reliability.
Book
Cyber-physical systems (CPS) are an example of software and hardware components working in symphony. The greatest challenge in CPS design and verification is to design a CPS to be reliable while encountering various uncertainties from the environment and its constituent subsystems. Cost, delay and reliability of a CPS are functions of software-hardware partitioning of the CPS design. Hence, one of the key challenges in CPS design is to achieve reliability maximization while factoring in uncertainty in cost and delay. This work leverages the problem formulation developed in recent research [13], which poses CPS design as an optimization problem for reliability assurance while factoring in uncertainty in cost and delay. In this formulation, cost and delay are modeled as variables with uncertainty distributions under uncertainty theory, and the reliability requirement becomes an optimization objective. The authors of [13] also show that heuristic solutions of this optimization problem can produce hardware/software partitioning which has potential to offer greater reliability under uncertainty. The novel contribution of this work is the exploration of alternate heuristics to the genetic algorithm used in [13] to solve the optimization problem. We conclude that treating the optimization problem as a 0-1 integer quadratic programming problem is feasible and then explore a few heuristics to solve such problems. Next, we solve this problem with a heuristic method. Preliminary results suggest that this solution method can achieve better reliability.
... Several projects currently in progress are trying to integrate both HW and SW into the same design process: COSMOS [5], SpecSyn [6], Ptolemy [7], LYCOS [8], Chinook [9] and PISH [10]. Fischer [4] proposes an integrated HW/SW design to meet high performance in distributed systems, such as multimedia systems. ...
... Several research groups have been developing design environments based on the codesign methodology. Among them are Ptolemy [3], LYCOS [4] and PISH [5]. These codesign environments differ in the specification language, the partitioning method, the target architecture and the validation methods used. ...
... Because the HW/SW partitioning problem is NP-hard [2], determinate algorithms [3][4][5][6] are often very slow when the problem scale is large. Hence, it is difficult for determinate algorithms to satisfy the requirement of a fast solution in practical applications. ...
Article
Hardware/software partitioning (HW/SW) is a significant problem in hardware-software co-design, and it is also an NP-hard problem. Large-scale partitioning problems are difficult and time-consuming to solve. In order to solve HW/SW quickly and efficiently, this paper proposes a novel idea for solving HW/SW based on evolutionary algorithms (EAs). First, because infeasible solutions degrade the performance of EAs on HW/SW, a greedy repair optimization method (GROM) for handling infeasible solutions is proposed. Then, a general framework for solving HW/SW based on EAs is given. Finally, genetic algorithm (GA), binary particle swarm optimization (BPSO), binary differential evolution algorithm with hybrid encoding (HBDE) and group theory-based optimization algorithm (GTOA) are used to solve large-scale HW/SW instances based on the above framework. The feasibility and effectiveness of the proposed method are verified by comparing the calculation results, which indicate that GTOA and BPSO perform better than HBDE and GA on the HW/SW problem.
Chapter
Data flow graph (DFG) is a popular model for software and its execution. It consists of a list of arithmetic operations without conditionals and their dependencies. Completion time and energy consumption are two main objectives for DFG optimization. In this chapter, we discuss approximation methods at different levels of DFG that can reduce energy consumption with a guaranteed quantity of results. First, we consider a probabilistic design framework that approximates the application by intentionally terminating certain DFG executions before reaching the deadline. Second, we demonstrate a real-time estimation-and-recomputing approach that executes the non-critical parts of the DFG with approximation. Finally, we use the floating-point logarithmic operation as an example to show how to optimize data bit width based on the DFG model.
Chapter
Embedded systems are the principal element in modern electronic devices and in intelligent systems. An Embedded system (ES) is generally composed of hardware blocks (ASIC, FPGA) and software blocks that run on a microprocessor. The hardware (HW) and the software (SW) execute in collaboration to achieve specific functionalities of the system. The non-functional requirements have a big impact on the design of modern ES. The objective of new design methodologies such as Co-design is to meet the functional specifications and to achieve the best possible balance between the non-functional requirements. Hardware Software Partitioning (HSP) is a key step in this Co-design process. For each block of the system, the HSP decides whether it is more advantageous to assign it to the hardware part or to the software part. The most important metrics involved in the HSP process are the cost of the hardware area and the execution time. The majority of previous works study the optimization of one metric with respect to a given constraint on the other metric. In this paper, we propose a novel approach that aims to simultaneously optimize the hardware area and the execution time of the system. The approach is inspired by the game of Go and based on the Minimax algorithm. Experimental results show that the proposed approach leads to better solutions than the Genetic Algorithm (GA).
Conference Paper
Embedded systems (ES) represent the most important elements in modern intelligent systems. An ES is a mix of hardware blocks (HW) and software blocks (SW), executing in collaboration to achieve specific functionalities. Designing a good ES is driven by several factors related to non-functional requirements. The most influential factors are the cost of the hardware area and the execution time. Co-design is one of the most widely used design methodologies for optimizing those factors while meeting the functional specifications. Hardware Software Partitioning (HSP) is a major step in this Co-design process. The HSP decides, for each block, whether it is more advantageous to assign it to the hardware part or to the software part. Most previous works study the optimization of one factor with respect to a given constraint on the other factor. In this paper, we propose a novel approach that aims to simultaneously optimize the hardware area and the execution time of the system. The approach is inspired by the game of Go and based on the Minimax algorithm. Experimental results show that the proposed approach leads to better solutions than the Genetic Algorithm (GA).
Article
Synthesis of circuits containing application-specific as well as re-programmable components such as off-the-shelf microprocessors provides a promising approach to realization of complex systems using a minimal amount of application-specific hardware while still meeting the required performance constraints. We formulate the synthesis problem of complex behavioral descriptions with performance constraints as a hardware-software co-design problem. The target system architecture consists of a software component as a program running on a re-programmable processor assisted by application-specific hardware components. System synthesis is performed by first partitioning the input system description into hardware and software portions and then by implementing each of them separately. We consider the problem of identifying potential hardware and software components of a system described in a high-level modeling language. Partitioning approaches are presented based on decoupling of data and control flow, and based on communication/synchronization requirements of the resulting system design. Synchronization between various elements of a mixed system design is one of the key issues that any synthesis system must address. We present software and interface synchronization schemes that facilitate communication between system components. We explore the relationship between the non-determinism in the system models and the associated synchronization schemes needed in system implementations. The synthesis of dedicated hardware is achieved by hardware synthesis tools [1], while the software component is generated using software compiling techniques. We present tools to perform synthesis of a system description into hardware and software components. The resulting software component is assumed to be implemented for the DLX machine, a load/store microprocessor. We present design of an ethernet based network coprocessor to demonstrate the feasibility of mixed system synthesis.
Article
We present a broad extension of the conventional formalism of state machines and state diagrams, that is relevant to the specification and design of complex discrete-event systems, such as multi-computer real-time systems, communication protocols and digital control units. Our diagrams, which we call statecharts, extend conventional state-transition diagrams with essentially three elements, dealing, respectively, with the notions of hierarchy, concurrency and communication. These transform the language of state diagrams into a highly structured and economical description language. Statecharts are thus compact and expressive (small diagrams can express complex behavior) as well as compositional and modular. When coupled with the capabilities of computerized graphics, statecharts enable viewing the description at different levels of detail, and make even very large specifications manageable and comprehensible. In fact, we intend to demonstrate here that statecharts counter many of the objections raised against conventional state diagrams, and thus appear to render specification by diagrams an attractive and plausible approach. Statecharts can be used either as a stand-alone behavioral description or as part of a more general design methodology that deals also with the system's other aspects, such as functional decomposition and data-flow specification. We also discuss some practical experience that was gained over the last three years in applying the statechart formalism to the specification of a particularly complex system.
Book
Preface. 1. Formal Design Methods. 2. Designing with Transitions. 3. Formal Verification. 4. Synchronous Designs. 5. Synchronous Realizations. 6. Refinement. 7. Self-Timed Circuits. 8. Towards Larger Designs. 9. Epilog. A: Synchronized Transitions Report. Index.