Content uploaded by Jesper Grode
[Figure: overall framework. Target architecture: a processor and an ASIC connected by a communication channel. Blocks shown: SW/HW specification, specification model, C translator, VHDL translator, SW and HW estimators with SW, HW and SW/HW libraries, ComCom, functional library, interface library, Quenya, analysis, allocation, partitioning, scheduling, assignment, interface, power estimation, code generation, Synopsys, VHDL assembler.]
[Figure: example graphs Graph1 and Graph2 built from Add and Mult operations on variables x, y, z, a, b (also x1, x2, s), annotated with E1, E2, c1, c2, c3, k1, k2, k3, W1, I1 and ok.]
[Figure: graph templates Graph(s), Graph(s1), Graph(s2) and Graph(e) with B, M and b nodes, En and Ex nodes, Wait nodes, Export and Import nodes connected through a Syncher, and V nodes.]
[Figure: panels ConGIF, CDFG and BSB hierarchy. The CDFG contains Add, Sub, Mult, NOP and Wait operations and the constant 3. The BSB hierarchy contains MAIN, DFG, Loop, Test, Body, Wait, Cond, Branch1, Branch2 and FU BSBs.]
[Figure: collapsing the BSB hierarchy (MAIN, DFG, Loop, Test, Body, Wait, Cond, Branch1, Branch2, FU). A) Original hierarchy, seven leaf BSBs. B) Cond BSB collapsed, six leaf BSBs. C) Test and Body BSBs collapsed, five leaf BSBs.]
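The effect of collapsing BSBs on the leaf count can be sketched with a small tree. This is a minimal sketch under explicit assumptions. Only the BSB names come from the figure. The parent/child structure in `HIERARCHY` is a hypothetical reconstruction chosen to reproduce the counts seven, six and five. The helper names `collapse` and `leaves` are illustrative. Collapsing the Test and Body BSBs is modelled as turning their parent Loop into a single leaf, applied on top of the collapsed Cond.

```python
# Hypothetical BSB tree: only the node names come from the figure; the exact
# parent/child structure is an assumption chosen to match its leaf counts.
HIERARCHY = ("MAIN", [
    ("DFG", []),
    ("Loop", [("Test", []), ("Body", [])]),
    ("Wait", []),
    ("Cond", [("Branch1", []), ("Branch2", [])]),
    ("FU", []),
])


def collapse(node, names):
    """Return a copy of the tree in which every BSB named in `names`
    absorbs its children and so becomes a leaf."""
    name, children = node
    if name in names:
        return (name, [])
    return (name, [collapse(child, names) for child in children])


def leaves(node):
    """List the leaf BSBs from left to right."""
    name, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


for label, names in [("original", set()),
                     ("Cond collapsed", {"Cond"}),
                     ("Test and Body collapsed", {"Cond", "Loop"})]:
    tree = collapse(HIERARCHY, names)
    print(label, len(leaves(tree)), leaves(tree))
```

Each collapse trades estimation granularity for a smaller partitioning problem: fewer leaves means fewer candidate blocks to move between SW and HW.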
[Figure: datapath and controller organisation for blocks B1 and B2, shown with a single datapath/controller pair and with one datapath/controller pair per block.]
[Figure: A) Simple data flow graph with nodes U, V, W, X, Y, Z. B) Three different topological sortings, scheduled over time steps T = 1 to T = 6: 1) U W V X Y Z; 2) V W X Y Z U; 3) U V W Y X Z.]
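A data flow graph generally admits several topological sortings, and each gives a different sequential execution order. The sketch below shows this using Kahn's algorithm together with a validity check. The figure's edges are not recoverable. The edge set `EDGES` (V→X, W→Y, X→Z, Y→Z, U unconnected) is a hypothetical choice, picked only so that all three listed orderings are valid.

```python
from collections import deque

# Hypothetical edges: chosen so that the three orderings in the figure are
# all valid; the original graph's edges are not recoverable.
EDGES = {"U": [], "V": ["X"], "W": ["Y"], "X": ["Z"], "Y": ["Z"], "Z": []}


def kahn_order(edges):
    """Return one topological order using Kahn's algorithm."""
    indegree = {n: 0 for n in edges}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    # Start from all nodes without predecessors, in alphabetical order.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in edges[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(edges):
        raise ValueError("graph has a cycle")
    return order


def is_topological(order, edges):
    """Check that every node appears once and every edge u -> v has u before v."""
    pos = {n: i for i, n in enumerate(order)}
    if len(order) != len(edges) or set(order) != set(edges):
        return False
    return all(pos[u] < pos[v] for u, succs in edges.items() for v in succs)


for order in ("UWVXYZ", "VWXYZU", "UVWYXZ"):
    print(order, is_topological(list(order), EDGES))
print("Kahn:", "".join(kahn_order(EDGES)))
```

Kahn's algorithm returns only one of the valid orders; the tie-breaking rule (alphabetical here) decides which one.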
[Figure: technology files for the 8086 and the 68020. Each entry maps a generic instruction, with its execution time and size, to target instructions (cycle counts in parentheses). For the generic instruction dmem3 <-- dmem1 + dmem2:
8086 instructions: mov ax, word ptr[bp+offset1] (10); add ax, word ptr[bp+offset2] (9+EA1); mov word ptr[bp+offset3], ax (10); execution time 35.
68020 instructions: mov a6@(offset1), d0 (7); add a6@(offset2), d0 (2+EA2); mov d0, a6@(offset3) (5); execution time 22.]
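A technology-file entry's execution time can be estimated by summing the cycle counts of the target instructions that implement the generic instruction. This is a minimal sketch. The data layout `TECH_FILES` and the function `execution_time` are hypothetical. The effective-address costs EA1 and EA2 are left as a parameter, because the figure keeps them symbolic.

```python
# Hypothetical encoding of the two technology-file entries in the figure:
# (target instruction, base cycles, whether an effective-address term is added).
TECH_FILES = {
    "8086": [
        ("mov ax, word ptr[bp+offset1]", 10, False),
        ("add ax, word ptr[bp+offset2]", 9, True),   # 9 + EA1
        ("mov word ptr[bp+offset3], ax", 10, False),
    ],
    "68020": [
        ("mov a6@(offset1), d0", 7, False),
        ("add a6@(offset2), d0", 2, True),            # 2 + EA2
        ("mov d0, a6@(offset3)", 5, False),
    ],
}


def execution_time(target, ea_cycles):
    """Sum the cycles of the target instructions implementing
    dmem3 <-- dmem1 + dmem2; ea_cycles is the effective-address cost."""
    return sum(base + (ea_cycles if uses_ea else 0)
               for _, base, uses_ea in TECH_FILES[target])


for target in TECH_FILES:
    print(target, "cycles without EA:", execution_time(target, 0))
```

The fixed parts are 29 cycles for the 8086 and 14 cycles for the 68020. Any difference from the tabulated entry values comes from the effective-address terms.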
[Figure: A) blocks B1 to B8 partitioned between SW and HW. B) The same blocks with B3, B4 and B6, B7 grouped (labelled 3,4 and 6,7).]
[Figure: dynamic-programming partitioning example for blocks A, B, C, D, each of area a = 1, with speedups s(A) = 5, s(B) = 10, s(C) = 2, s(D) = 10 and adjacent-block terms s(AB) = 2, s(BC) = 2, s(CD) = 4.
Sequences (area, speedup): A (1, 5), B (1, 10), C (1, 2), D (1, 10), AB (2, 17), BC (2, 14), CD (2, 16), ABC (3, 21), BCD (3, 28), ABCD (4, 35).
BestSpeedup for areas 1 to 4: group A 5, 5, 5, 5; group B 10, 17, 17, 17; group C 10, 17, 21, 21; group D 10, 20, 28, 35. BestChoice[D, 4] = (A, D).]
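The BestSpeedup table can be reproduced with a dynamic program over contiguous hardware sequences. This is a minimal sketch, assuming the block values read off the example and the recurrence implied by its arithmetic. For instance, the example gives 16 + 17 = 33 for sequence CD plus the best area-2 result of group B. For each group and area, the block either stays in software, or a sequence ending at that block goes to hardware and is added to the best result for the preceding blocks within the remaining area. The names `sequence` and `best_speedup` are hypothetical. This is not claimed to be the exact formulation of the PACE algorithm named in the result plots.

```python
# Values from the example: every block has area 1; adjacent pairs add their
# term only when both blocks lie in the same hardware sequence.
BLOCKS = ["A", "B", "C", "D"]
AREA = {"A": 1, "B": 1, "C": 1, "D": 1}
SPEEDUP = {"A": 5, "B": 10, "C": 2, "D": 10}
ADJACENT = {("A", "B"): 2, ("B", "C"): 2, ("C", "D"): 4}


def sequence(j, i):
    """Area and speedup of the contiguous hardware sequence BLOCKS[j..i]."""
    seq = BLOCKS[j:i + 1]
    area = sum(AREA[b] for b in seq)
    speed = sum(SPEEDUP[b] for b in seq)
    speed += sum(ADJACENT[(a, b)] for a, b in zip(seq, seq[1:]))
    return area, speed


def best_speedup(max_area):
    """best[i][a]: best speedup using the first i blocks within area a;
    choice[i][a]: (first, last) block of the last hardware sequence."""
    n = len(BLOCKS)
    best = [[0] * (max_area + 1) for _ in range(n + 1)]
    choice = [[None] * (max_area + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for a in range(max_area + 1):
            # Option 1: block i-1 stays in software.
            best[i][a], choice[i][a] = best[i - 1][a], choice[i - 1][a]
            # Option 2: a hardware sequence BLOCKS[j..i-1] ends here.
            for j in range(i):
                area, speed = sequence(j, i - 1)
                if area <= a and speed + best[j][a - area] > best[i][a]:
                    best[i][a] = speed + best[j][a - area]
                    choice[i][a] = (BLOCKS[j], BLOCKS[i - 1])
    return best, choice


best, choice = best_speedup(4)
for i, name in enumerate(BLOCKS, start=1):
    print("Group", name, best[i][1:])
print("BestChoice[D, 4] =", choice[4][4])
```

The table has one row per group and one column per area. Each entry examines at most one sequence per possible start block, so the work grows quadratically with the number of blocks and linearly with the area bound.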
[Plot: resulting clock cycles (0 to 1,200,000) versus total chip area (1000 to 2400) for the knapsack algorithm with instantaneous communication, the knapsack algorithm with simple communication, and the PACE algorithm with adjacent block communication.]
[Figure: communication models for blocks B1 to B4 partitioned between SW and HW: instantaneous communication, simple communication and adjacent block communication.]
[Plot: resulting clock cycles (100,000 to 700,000) versus total chip area (500 to 2500) for allocations A, B and C.]
[Plot: speedup (0 to 20) and hardware cycles (50 to 200) versus the number of combinatorial multipliers (mul-comb, 0 to 4), with the all-HW execution time indicated.]