LYCOS: the Lyngby Co-Synthesis System

March 1997
Design Automation for Embedded Systems 2(2):195-235

DOI:10.1023/A:1008884219274

Source
DBLP

Authors:

Jan Madsen

Technical University of Denmark

Jesper Grode

VIA University College

This paper describes the LYCOS system, an experimental co-synthesis environment. We present the motivation and philosophy of LYCOS and, after an overview of the entire system, describe its individual parts. We use a single-CPU, single-ASIC target architecture and describe the techniques we use to estimate metrics concerning hardware, software and communication in this architecture. Finally, we present a novel partitioning technique called PACE, which has been shown to produce excellent results, and we demonstrate how partitioning is used for design space exploration.

Hardware execution time estimates for the Straight example

…

Obtaining performance metrics by the use of estimators.

…

Grouping of sequences.

…

A CDFG representing the computation z := x + x * y

A CDFG is a hierarchical directed hypergraph consisting of nodes and edges. The semantics is based on a token passing mechanism, similar to colored Petri nets [33], [36]. The edges are entities on which tokens (i.e. values) can flow between nodes. Nodes can remove tokens from their input edges and place tokens on their output edges according to certain firing rules.

There are different kinds of nodes. The nodes shown in figure 6 are all infix nodes, which have two input edges, one output edge and an associated infix operator. When an infix node, op, has tokens, say v1 and v2, on its input edges and no token on its output edge, it can fire by placing the token, v1 op v2, on its output edge, and removing v1 and v2 from its input edges. Other kinds of nodes are prefix nodes, constant nodes, control nodes to express conditionals, iteration nodes to express loops, void nodes to absorb tokens from edges, etc. For more details on these nodes, see [6], [41]. A graph is executed by placing tokens on its input edges and letting the nodes fire until no more firing rules can be satisfied.

It is possible for a collection of CDFGs to communicate with each other through shared variables. Special interface nodes are used for this: import and export nodes for sampling and updating the contents of shared variables, respectively, and wait nodes for synchronization to global events (i.e. certain contents of shared variables). Figure 7 gives an example of two CDFGs, Graph1 and Graph2, which communicate through two shared variables, s and ok. Graph1 has two export nodes, E1 and E2, which can update s and ok with the value of a token on the x edge and b edge, respectively. Graph2 has a wait node, W1, which requires ok to contain the value true in order to fire. In addition, Graph2 has an import node, I1, which can sample the value of s and place it on the y edge.
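The firing rule for infix nodes can be illustrated with a minimal Python sketch. It is not the LYCOS implementation: the class `InfixNode`, the helper `run`, and the edge names `x_add`, `x_mul` and `p` are hypothetical, the fan-out of x is modeled as two separate edges for simplicity, and the example computation is read as z = x + x * y (one Add node and one Mult node).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# An edge holds at most one token; None means the edge is empty.
Edges = Dict[str, Optional[int]]


@dataclass
class InfixNode:
    """Hypothetical infix node: two input edges, one output edge, one operator."""
    name: str
    op: Callable[[int, int], int]
    left: str
    right: str
    out: str

    def can_fire(self, edges: Edges) -> bool:
        # Firing rule: tokens on both inputs and no token on the output.
        return (edges[self.left] is not None
                and edges[self.right] is not None
                and edges[self.out] is None)

    def fire(self, edges: Edges) -> None:
        v1, v2 = edges[self.left], edges[self.right]
        edges[self.left] = None          # remove v1 and v2 from the inputs
        edges[self.right] = None
        edges[self.out] = self.op(v1, v2)  # place v1 op v2 on the output


def run(nodes: List[InfixNode], edges: Edges) -> Edges:
    """Let nodes fire until no firing rule can be satisfied."""
    fired = True
    while fired:
        fired = False
        for node in nodes:
            if node.can_fire(edges):
                node.fire(edges)
                fired = True
    return edges


def compute_z(x: int, y: int) -> Optional[int]:
    # Assumption: x feeds both nodes, modeled here as two edges.
    edges: Edges = {"x_add": x, "x_mul": x, "y": y, "p": None, "z": None}
    nodes = [
        InfixNode("Add", lambda a, b: a + b, "x_add", "p", "z"),
        InfixNode("Mult", lambda a, b: a * b, "x_mul", "y", "p"),
    ]
    return run(nodes, edges)["z"]
```

Calling `compute_z(3, 4)` first fires Mult (Add has no token on `p` yet) and then Add, leaving the result token on the `z` edge. The order in which nodes are listed does not matter, because a node only fires when its rule is satisfied.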
A common feature of the interface nodes is that they all require a token on their vertical input edge
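The Graph1/Graph2 example with shared variables s and ok can be sketched along the same lines. This is only an illustration under stated assumptions: the function names, the control edge names `c0`, `c1`, `c2`, and the idea that a fired interface node moves its vertical (control) token to a control output edge are hypothetical and not taken from the source. The sketch also assumes that E1 is sequenced before E2 by those control tokens.

```python
from typing import Any, Dict, List, Optional, Tuple

Edges = Dict[str, Optional[Any]]   # edge name -> token, None means empty
Shared = Dict[str, Any]            # shared variables visible to all graphs


def _pass_control(edges: Edges, ctrl_in: str, ctrl_out: str) -> None:
    # Assumption: the vertical token moves from the control input to the control output.
    edges[ctrl_out], edges[ctrl_in] = edges[ctrl_in], None


def export_node(edges: Edges, shared: Shared, ctrl_in: str, ctrl_out: str,
                data_in: str, var: str) -> bool:
    """Update shared variable `var` with the token on `data_in`."""
    if edges[ctrl_in] is None or edges[data_in] is None or edges[ctrl_out] is not None:
        return False
    shared[var] = edges[data_in]
    edges[data_in] = None
    _pass_control(edges, ctrl_in, ctrl_out)
    return True


def wait_node(edges: Edges, shared: Shared, ctrl_in: str, ctrl_out: str,
              var: str, expected: Any) -> bool:
    """Fire only when shared variable `var` contains `expected`."""
    if edges[ctrl_in] is None or edges[ctrl_out] is not None:
        return False
    if shared.get(var) != expected:
        return False
    _pass_control(edges, ctrl_in, ctrl_out)
    return True


def import_node(edges: Edges, shared: Shared, ctrl_in: str, ctrl_out: str,
                data_out: str, var: str) -> bool:
    """Sample shared variable `var` and place its value on `data_out`."""
    if (edges[ctrl_in] is None or edges[ctrl_out] is not None
            or edges[data_out] is not None):
        return False
    edges[data_out] = shared[var]
    _pass_control(edges, ctrl_in, ctrl_out)
    return True


def simulate(x_value: Any) -> Tuple[Any, List[str]]:
    shared: Shared = {"s": None, "ok": False}
    g1: Edges = {"c0": "go", "c1": None, "c2": None, "x": x_value, "b": True}
    g2: Edges = {"c0": "go", "c1": None, "c2": None, "y": None}
    # Graph2's nodes are tried first to show that W1 blocks until ok is true.
    nodes = [
        ("W1", lambda: wait_node(g2, shared, "c0", "c1", "ok", True)),
        ("I1", lambda: import_node(g2, shared, "c1", "c2", "y", "s")),
        ("E1", lambda: export_node(g1, shared, "c0", "c1", "x", "s")),
        ("E2", lambda: export_node(g1, shared, "c1", "c2", "b", "ok")),
    ]
    trace: List[str] = []
    progress = True
    while progress:
        progress = False
        for name, try_fire in nodes:
            if try_fire():
                trace.append(name)
                progress = True
    return g2["y"], trace
```

With `simulate(42)`, W1 cannot fire in the first pass because ok is still false; it fires only after E1 and E2 have updated s and ok, after which I1 places the sampled value of s on the y edge.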

…

Modules and corresponding area for each of the three allocations.

…

Figure: Target architecture and LYCOS system overview. Target architecture: processor and ASIC connected by a communication channel. System flow labels: specification, translator, Quenya, VHDL, analysis, allocation, partitioning, scheduling, assignment, interface, code generation, assembler, Synopsys; estimators and libraries for software, hardware and communication; functional and interface libraries; power estimation.

Figure: CDFG examples. Node labels: Add, Mult, Sub, NOP, Export, Import, Wait. Graph labels: Graph1, Graph2, Graph(e), Graph(s), Graph(s1), Graph(s2).

Figure: ConGIF CDFG and BSB hierarchy (MAIN, Wait, Loop with Test and Body, Cond with Branch1 and Branch2; the leaves are DFGs). A) Original hierarchy, seven leaf BSBs. B) Cond BSB collapsed, six leaf BSBs. C) Test and Body BSBs collapsed, five leaf BSBs. Further labels: Datapath, Controller, B1, B2.

Figure: A) Simple data flow graph with nodes U, V, W, X, Y, Z. B) Three different topological sortings over time steps T = 1 to T = 6.

Figure: Technology files for the 8086 and the 68020. Each technology file maps a generic instruction to target instructions and records its execution time and size (cycle counts per target instruction in parentheses).

Generic instruction: dmem3 <-- dmem1 + dmem2

8086 instructions:
mov ax, word ptr[bp+offset1] (10)
add ax, word ptr[bp+offset2] (9+EA1)
mov word ptr[bp+offset3], ax (10)

68020 instructions:
mov a6@(offset1), d0 (7)
add a6@(offset2), d0 (2+EA2)
mov d0, a6@(offset3) (5)

Figure: PACE partitioning example with four blocks A, B, C, D. Each sequence of adjacent blocks is annotated with its hardware area a and speedup s:

Sequence   a   s
A          1   10
AB         2   16
ABC        3   28
ABCD       4   35
B          1   2
BC         2   14
BCD        3   21
C          1   10
CD         2   17
D          1   5

The tables BestSpeedup and BestChoice are filled group by group for areas 1 to 4. Best speedup per group and area:

Area:      1    2    3    4
Group D:   5    5    5    5
Group C:   10   17   17   17
Group B:   10   17   21   21
Group A:   10   20   28   35
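The numbers in this example can be reproduced with a small knapsack-style dynamic program over sequences of adjacent blocks. The sketch below is an illustration only, not the authors' PACE implementation: the function `best_speedup` and the dictionary `SEQUENCES` are hypothetical names, the (area, speedup) pairs are assigned to sequences as inferred from the figure labels, and the sketch ignores any PACE details that the figure does not show.

```python
from functools import lru_cache
from typing import Dict, Tuple

BLOCKS = "ABCD"

# (area, speedup) per sequence of adjacent blocks, as read from the figure labels.
SEQUENCES: Dict[str, Tuple[int, int]] = {
    "A": (1, 10), "B": (1, 2), "C": (1, 10), "D": (1, 5),
    "AB": (2, 16), "BC": (2, 14), "CD": (2, 17),
    "ABC": (3, 28), "BCD": (3, 21), "ABCD": (4, 35),
}


def best_speedup(total_area: int) -> Tuple[int, Tuple[str, ...]]:
    """Return (best total speedup, chosen hardware sequences) for an area budget."""
    n = len(BLOCKS)

    @lru_cache(maxsize=None)
    def best(i: int, area: int) -> Tuple[int, Tuple[str, ...]]:
        if i >= n:
            return 0, ()
        # Option 1: block i stays in software.
        result = best(i + 1, area)
        # Option 2: move the sequence of blocks i..j to hardware.
        for j in range(i, n):
            name = BLOCKS[i:j + 1]
            seq_area, seq_speedup = SEQUENCES[name]
            if seq_area <= area:
                rest_speedup, rest_choice = best(j + 1, area - seq_area)
                candidate = (seq_speedup + rest_speedup, (name,) + rest_choice)
                if candidate[0] > result[0]:
                    result = candidate
        return result

    return best(0, total_area)
```

For area budgets 1 to 4 the sketch yields speedups 10, 20, 28 and 35, which agree with the "Best" values for group A in the figure. For area 4 it compares A plus BCD (31), AB plus CD (33) and ABC plus D (33) against ABCD (35) and picks the full sequence, because longer sequences also gain the speedup of the communication they save.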

Figure: The three communication models between SW and HW: instantaneous communication, simple communication, and adjacent block communication.

Figure: Resulting clock cycles versus total chip area for the knapsack algorithm with instantaneous communication, the knapsack algorithm with simple communication, and the PACE algorithm with adjacent block communication.

Figure: Resulting clock cycles versus total chip area for allocations A, B and C.

Figure: Speedup and hardware cycles (all-HW execution time) versus the number of combinatorial multipliers (mul-comb).

A hardware/software partitioning method based on graph convolution network

Article

Full-text available

Dec 2021
DES AUTOM EMBED SYST

Hardware/software (HW/SW) partitioning is the crucial step in HW/SW co-design, which can significantly reduce the time-to-market and improves the performance of an embedded system. Due to that the majority of previous works have large exploration time and generate often low-quality solutions for large scale systems, we propose a fast HW/SW partitioning approach based on graph convolution network (GCN) to address this problem. To the best of our knowledge, it is a new partitioning method based on GCN which is a gradient-based optimization approach. It can aggressively speed up the partitioning process. To quantify the quality of solutions, the scheduling is integrated into the partitioning process. The experiment results show that not only does our proposed method outperform existing metaheuristics approaches in terms of the efficiency (e.g., 18× faster than Kernighan–Lin algorithm for the task graphs with 1000 nodes), but it also improves the quality of HW/SW partitioning (e.g., more than 10% acceleration ratio (AR) improvement for the 1000 nodes graphs).

Quadratic Integer Programming Approach for Reliability Optimization of Cyber-Physical Systems Under Uncertainty Theory

Conference Paper

Full-text available

Mar 2021

Cyber-physical systems (CPSs) are an example of software and hardware components working in symphony. The greatest challenge in CPS design and verification is to design a CPS to be reliable while encountering various uncertainties from the environment and its constituent subsystems. Cost, delay, and reliability of a CPS are functions of software–hardware partitioning of the CPS design. Hence, one of the key challenges in CPS design is to achieve reliability maximization while factoring in uncertainty in cost and delay. This work leverages the problem formulation developed in recent research (Jiang et al., Uncertainty theory based reliability-centric cyber-physical system design, in 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 208–215 (2019)), which poses CPS design as an optimization problem for reliability assurance while factoring in uncertainty in cost and delay. In this formulation, cost and delay are modeled as variables with uncertainty distributions under uncertainty theory, and the reliability requirement becomes an optimization objective. The authors of Uncertainty theory based reliability-centric cyber-physical system design, in 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 208–215 (2019) also show that heuristic solutions of this optimization problem can produce hardware/software partitioning which has potential to offer greater reliability under uncertainty. 
The novel contribution of this work is the exploration of alternate heuristics to genetic algorithm used in Uncertainty theory based reliability-centric cyber-physical system design, in 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 208–215 (2019) by Jiang et al. to solve the optimization problem. We conclude that treating the optimization problem as a 0–1 integer quadratic programming problem is feasible and then explore a few heuristics to solve such problems. Next, we solve this problem with a heuristic method. Preliminary results suggest that this solution method can achieve better reliability.

Quadratic Integer Programming Approach for Reliability Optimization of Cyber-Physical Systems under Uncertainty Theory

Book

Full-text available

Jul 2020

Cyber-physical systems (CPS) are an example of software and hardware components working in symphony. The greatest challenge in CPS design and verification is to design a CPS to be reliable while encountering various uncertainties from the environment and its constituent subsystems. Cost, delay and reliability of a CPS are functions of software-hardware partitioning of the CPS design. Hence, one of the key challenges in CPS design is to achieve reliability maximization while factoring in uncertainty in cost and delay. This work leverages the problem formulation developed in recent research [13], which poses CPS design as an optimization problem for reliability assurance while factoring in uncertainty in cost and delay. In this formulation cost and delay are modeled as variables with uncertainty distributions under uncertainty theory, and the reliability requirement becomes an optimization objective. Authors of [13] also show heuristic solutions of this optimization problem can produce hardware/software partitioning which has potential to offer greater reliability under uncertainty. The novel contribution of this work is the exploration of alternate heuristics to genetic algorithm used in [13] to solve the optimization problem. We conclude that treating the optimization problem as a 0-1 integer quadratic programming problem is feasible and then explore a few heuristics to solve such problems. Next, we solve this problem with a heuristic method. Preliminary results suggest that this solution method can achieve better reliability.

Design of a Reliable Multicast Protocol Using HW/SW Codesign Based on Performance Optimization with Genetic Algorithms

Conference Paper

Full-text available

Jan 2002

Projeto de Protocolos Utilizando HW/SW Codesign Baseado na Otimização de Desempenho por Algoritmos Genéticos

Conference Paper

Full-text available

Jan 2003

A general approach to solving hardware and software partitioning problem based on evolutionary algorithms

Article

Apr 2021
ADV ENG SOFTW

Hardware/software partitioning (HW/SW) is a significant problem in hardware-software co-design, and it is also an NP-hard problem. For large-scale partitioning problems, it is difficult to solve and time-consuming. In order to solve HW/SW quickly and efficiently, this paper proposes a novel idea for solving HW/SW based on evolutionary algorithms (EAs). Firstly, for the defect of infeasible solutions on the performance of the algorithm when using EAs to solve HW/SW, a greedy repair optimization method GROM of handling infeasible solutions is proposed. Then, a general framework for solving HW/SW based on EAs is given. Finally, genetic algorithm (GA), binary particle swarm optimization (BPSO), binary differential evolution algorithm with hybrid encoding (HBDE) and group theory-based optimization algorithm (GTOA) are used to solve large-scale HW/SW instances based on the above framework. The feasibility and effectiveness of the new method proposed in this paper are verified by comparing the good and bad of the calculation results and pointed out that the performance of GTOA and BPSO is better than that of HBDE and GA for solving the HW/SW problem.

Approximation on Data Flow Graph Execution for Energy Efficiency

Chapter

Jan 2022

Data flow graph (DFG) is a popular model for software and its execution. It consists of a list of arithmetic operations without conditionals and their dependencies. Completion time and energy consumption are two main objectives for DFG optimization. In this chapter, we discuss approximation methods at different levels of DFG that can reduce energy consumption with a guaranteed quantity of results. First, we consider a probabilistic design framework that approximates the application by intentionally terminating certain DFG executions before reaching the deadline. Second, we demonstrate a real-time estimation-and-recomputing approach that executes the non-critical parts of the DFG with approximation. Finally, we use the floating-point logarithmic operation as an example to show how to optimize data bit width based on the DFG model.

Hardware Software Partitioning Using Four Levels Hybrid Algorithm Technique

Conference Paper

Apr 2020

Embedded Systems Hardware Software Partitioning Approach Based on Game Theory

Chapter

Feb 2020

Embedded systems are the principal element in modern electronic devices and in intelligent systems. An Embedded system (ES) is generally composed of hardware blocks (ASIC, FPGA) and software blocks that run on a microprocessor. The hardware (HW) and the software (SW) are executing in collaboration to achieve specific functionalities of the system. The non-functional requirements have a big impact on the design of modern ES. The objective of new design methodologies such as the Co-design is to meet the functional specifications and to achieve the best possible balance between the non-functional requirements. The Hardware Software Partitioning (HSP) is a key step in this process of Co-design. For each block of the system, the HSP decides whether it is more advantageous to be assigned to the hardware part or to the software part. The most important metrics involved in the HSP process, are the cost of the hardware area and the execution time. The majority of previous works study the optimization of one metric with the respect of a given constraint on the other metric. In this paper, we propose a novel approach aimed to simultaneously optimize the hardware area and the execution time of the system. The approach is inspired from the GO game and based on Minimax algorithm. Experimental results show that the proposed approach leads to more optimal solutions compared to the Genetic Algorithm (GA).

Embedded Systems Hardware Software Partitioning using Minimax Algorithm

Conference Paper

Oct 2019

Embedded systems (ES) represent the most important elements in modern intelligent systems. An ES is a mix of hardware blocks (HW) and software blocks (SW), executing in collaboration to achieve specific functionalities. Designing a good ES is driven by several factors, related to non-functional requirements. The most influencing factors are the cost of the hardware area and the execution time. The Co-design is one of the most design methodologies, used to optimize those factors while meeting the functional specifications. The Hardware Software Partitioning (HSP) is a major step in this process of Co-design. The HSP decides for each block, whether it is more advantageous to be affected to the hardware part or to the software part. Most of previous works study the optimization of one factor with the respect of a given constraint on the other factor. In this paper, we propose a novel approach aimed to simultaneously optimize the hardware area and the execution time of the system. The approach is inspired from the GO game and based on Minimax algorithm. Experimental results show that the proposed approach leads to more optimal solutions comparing to the Genetic Algorithm (GA).

ESTIMATION OF VISUAL MOTION IN IMAGE SEQUENCES

Article

Full-text available

Rasmus Larsen

System Synthesis via Hardware-Software Co-design

Article

Full-text available

Synthesis of circuits containing application-specific as well as re-programmable components such as off-the-shelf microprocessors provides a promising approach to realization of complex systems using a minimal amount of application-specific hardware while still meeting the required performance constraints. We formulate the synthesis problem of complex behavioral descriptions with performance constraints as a hardware-software co-design problem. The target system architecture consists of a software component as a program running on a re-programmable processor assisted by application-specific hardware components. System synthesis is performed by first partitioning the input system description into hardware and software portions and then by implementing each of them separately. We consider the problem of identifying potential hardware and software components of a system described in a high-level modeling language. Partitioning approaches are presented based on decoupling of data and control flow, and based on communication/synchronization requirements of the resulting system design. Synchronization between various elements of a mixed system design is one of the key issues that any synthesis system must address. We present software and interface synchronization schemes that facilitate communication between system components. We explore the relationship between the non-determinism in the system models and the associated synchronization schemes needed in system implementations. The synthesis of dedicated hardware is achieved by hardware synthesis tools [1], while the software component is generated using software compiling techniques. We present tools to perform synthesis of a system description into hardware and software components. The resulting software component is assumed to be implemented for the DLX machine, a load/store microprocessor. We present design of an ethernet based network coprocessor to demonstrate the feasibility of mixed system synthesis.

Parallel Program Design, A Foundation

Article

Jan 1988

Statecharts: A Visual Formalism For Complex Systems

Article

Jun 1987
SCI COMPUT PROGRAM

David Harel

We present a broad extension of the conventional formalism of state machines and state diagrams, that is relevant to the specification and design of complex discrete-event systems, such as multi-computer real-time systems, communication protocols and digital control units. Our diagrams, which we call statecharts, extend conventional state-transition diagrams with essentially three elements, dealing, respectively, with the notions of hierarchy, concurrency and communication. These transform the language of state diagrams into a highly structured and economical description language. Statecharts are thus compact and expressive (small diagrams can express complex behavior) as well as compositional and modular. When coupled with the capabilities of computerized graphics, statecharts enable viewing the description at different levels of detail, and make even very large specifications manageable and comprehensible. In fact, we intend to demonstrate here that statecharts counter many of the objections raised against conventional state diagrams, and thus appear to render specification by diagrams an attractive and plausible approach. Statecharts can be used either as a stand-alone behavioral description or as part of a more general design methodology that deals also with the system's other aspects, such as functional decomposition and data-flow specification. We also discuss some practical experience that was gained over the last three years in applying the statechart formalism to the specification of a particularly complex system.

Introduction To Algorithms

Book

Jan 2001
J OPER RES SOC

Ptolemy: A frame-work for simulating and prototyping heterogeneous systems

Article

Jan 2002

The ASCIS data flow graph : semantics and textual format

Article

Jan 1991

Hardware-software codesign of embedded controllers based on hardware extraction

Article

Co-Synthesis of Hardware and Software for Digital Embedded Systems

Book

Jan 1995

Rajesh Gupta

A Formal Approach to Hardware Design

Book

Jan 1994

Jørgen Staunstrup

Preface. 1. Formal Design Methods. 2. Designing with Transitions. 3. Formal Verification. 4. Synchronous Designs. 5. Synchronous Realizations. 6. Refinement. 7. Self-Timed Circuits. 8. Towards Larger Designs. 9. Epilog. A: Synchronized Transitions Report. Index.
