ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time


Page 1: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Jeng-Liang Tsai
Tsung-Hao Chen
Charlie Chung-Ping Chen (National Taiwan University)

University of Wisconsin-Madison
http://vlsi.ece.wisc.edu

Page 2: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Outline

Background
• Motivation and contribution
• Literature overview

ClockTune algorithm
• Problem formulation
• ClockTune algorithm overview
• Optimality and complexity analysis

Experimental results
• Runtime, memory usage, and optimality
• Power/Delay trade-off
• Incremental refinement

Page 3: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Motivation

Clock skew incurs a cycle-time penalty
• Start with a zero-skew clock tree
• Minimizing clock delay reduces system-level skew (Kuh, et al. [DAC ’90])

The clock tree is power-hungry (30% in Intel McKinley: 0.18 μm / 1 GHz / 130 W)
• P = f C V^2
• Minimize switching capacitance (wiring area)

Stability affects design convergence
• Allow incremental refinement to accommodate local changes

Interconnect delay dominates total delay
• Wire-sizing is effective in reducing interconnect delay

Page 4: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Motivation

Non-convex zero-skew constraints
• No known algorithm solves the zero-skew wire-sizing problem optimally with polynomial runtime

Hence, a good clock tree wire-sizing algorithm can
• Minimize delay and power
• Guarantee optimality and runtime
• Have good stability

Page 5: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Contribution

First ε-optimal algorithm for the minimum-delay/power zero-skew clock tree wire-sizing optimization problem

Provides the complete (sampled) solution set of delay/power/area trade-off information for design planning

Efficient pseudo-polynomial runtime (6170-branch clock tree in 6 minutes, within 1% of optimal)

Runtime vs. optimality trade-off

Incremental clock re-balancing to speed up design convergence

Page 6: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Literature Overview

“Reliable non-zero skew clock tree using wire width optimization”, Pillage, et al. [DAC ’93]
• Iteratively optimize skew and delay using adjoint sensitivity analysis
• Aimed at reliable clock trees under process variation

Deferred Merging Embedding (DME) algorithm, Kahng, et al. [TCAD ’92]
• Bottom-up merging segment construction, top-down embedding

Integrated Deferred Merging Embedding (IDME) algorithm, Wong, et al. [ISPD ’00]
• Handles simultaneous routing, buffer-insertion, and wire-sizing
• Merging segment set (MSS): a set of line samples of a merging region
• No optimality guarantee
• The size of the MSS grows exponentially

“Process variation aware clock tree routing”, Lu, et al. [ISPD ’03]
• Based on DME/BST

Page 7: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Outline

Background
• Motivation and contribution
• Literature overview

ClockTune algorithm
• Problem formulation
• ClockTune algorithm overview
• Optimality and complexity analysis

Experimental results
• Runtime, memory usage, and optimality
• Power/Delay trade-off
• Incremental refinement

Page 8: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

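Problem formulation

min-ZSWS (Zero-Skew Wire-Sizing) problem
• Given a clock routing T_v with wire widths w:

\[
\begin{aligned}
\text{minimize}\quad & \alpha_1\,\mathrm{area}(T_v(w)) + \alpha_2\,\mathrm{delay}(T_v(w)) \\
\text{s.t.}\quad & \mathrm{area}(T_v(w)) \le \mathrm{Area}_{\mathrm{Max}} \\
& \mathrm{delay}(T_v(w)) \le \mathrm{Delay}_{\mathrm{Max}} \\
& \mathrm{delay}(P_i(w)) = \mathrm{delay}(P_j(w)) \quad \forall\, i, j \quad \text{(zero-skew constraints)} \\
& w_m \le w \le w_M
\end{aligned}
\]

where P_i, P_j are the paths from v to leaf nodes i and j

Zero-skew constraints are non-convex constraints
• No known algorithm solves the problem optimally in polynomial runtime

[Figure: clock tree T_v rooted at v, with paths P_i and P_j to leaf nodes i and j]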
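The quantities in the formulation can be evaluated with a small sketch. It assumes the Elmore delay model, which these slides do not name explicitly, and takes r0 and c0 from the experimental-setup slide. The names `Node`, `sink_delays`, `skew`, and `area` are hypothetical and are not taken from ClockTune.

```python
from dataclasses import dataclass, field

R0 = 0.03    # r0 from the experimental-setup slide
C0 = 2e-16   # c0 from the experimental-setup slide

@dataclass
class Node:
    length: float = 0.0   # length of the wire from the parent to this node
    width: float = 1.0    # width of that wire (the sizing variable w)
    load: float = 0.0     # sink capacitance, nonzero only at leaves
    children: list = field(default_factory=list)

def subtree_cap(node):
    """Capacitance of the subtree, including the wire above `node`."""
    wire = C0 * node.length * node.width
    return wire + node.load + sum(subtree_cap(c) for c in node.children)

def sink_delays(node, upstream=0.0):
    """Elmore delay (assumed model) from the root to every leaf below `node`."""
    r = R0 * node.length / node.width
    wire_c = C0 * node.length * node.width
    downstream = subtree_cap(node) - wire_c
    d = upstream + r * (wire_c / 2 + downstream)
    if not node.children:
        return [d]
    return [x for c in node.children for x in sink_delays(c, d)]

def skew(root):
    """Zero-skew constraint holds when this is 0."""
    delays = sink_delays(root)
    return max(delays) - min(delays)

def area(node):
    """Wiring area: sum of length * width over all wires."""
    return node.length * node.width + sum(area(c) for c in node.children)
```

A sizing w is feasible for min-ZSWS when `skew(root)` is zero and the area, delay, and width bounds hold; the recursion recomputes subtree capacitance at every level, which is fine for illustration but not for large trees.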

Page 9: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

DC region approach

Clock Delay and wiring Capacitance are the top concerns

Define f : R^N → R^2 such that
• f_Y(w) = Delay(T_v(w)), f_X(w) = Capacitance(T_v(w))
• DC region (v): the projection of the feasible region under f
• Choose a d-c pair from the DC region on R^2

[Figure: f : R^6 → R^2 maps the feasible region over wire widths w1 to w6 of tree T_v to the DC region in the C-D plane]
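As an illustration of the projection, consider a node whose children are all sinks. Assuming the Elmore model (an assumption, not stated on the slides), a branch of length l and width w driving load C has delay D = r0 c0 l^2 / 2 + r0 l C / w, so a target delay fixes each width, and the DC region of such a node is a curve of (delay, capacitance) pairs. The function names below are hypothetical.

```python
R0, C0 = 0.03, 2e-16        # r0, c0 from the experimental-setup slide
W_MIN, W_MAX = 1.0, 4.0     # wm, wM from the experimental-setup slide

def branch_delay(length, load, w):
    """Elmore delay (assumed model) of one wire driving a sink."""
    return R0 * length / w * (C0 * length * w / 2 + load)

def width_for_delay(d, length, load):
    """Width that gives delay d on this branch, or None if out of bounds."""
    intrinsic = R0 * C0 * length ** 2 / 2    # width-independent delay term
    if d <= intrinsic:
        return None
    w = R0 * length * load / (d - intrinsic)
    if W_MIN * (1 - 1e-9) <= w <= W_MAX * (1 + 1e-9):
        return min(max(w, W_MIN), W_MAX)     # absorb rounding at the bounds
    return None

def dc_region(branches, p=64):
    """Sample p (delay, capacitance) pairs with zero skew; requires p >= 2.

    branches: list of (length, sink_load) pairs below one node.
    """
    lo = max(branch_delay(l, c, W_MAX) for l, c in branches)
    hi = min(branch_delay(l, c, W_MIN) for l, c in branches)
    points = []
    if lo > hi:
        return points                         # no common delay: infeasible
    for k in range(p):
        d = lo + (hi - lo) * k / (p - 1)
        widths = [width_for_delay(d, l, c) for l, c in branches]
        if None in widths:
            continue
        cap = sum(C0 * l * w + c for (l, c), w in zip(branches, widths))
        points.append((d, cap))
    return points
```

Each returned pair is one candidate d-c choice; capacitance falls as the allowed delay grows, which is the delay/power trade-off the DC region exposes.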

Page 10: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

ClockTune algorithm overview

Phase 1: bottom-up construction of DC regions for every node
Phase 2: top-down embedding after the delay/power trade-off

[Figure: (a) bottom-up DC region construction and (b) top-down embedding on an example clock tree; each node carries a DC region drawn on C (capacitance) and D (delay) axes]
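The two phases can be sketched as a toy dynamic program. This is not the authors' ClockTune implementation: it assumes the Elmore delay model, quantizes delays into bins of size `step` (so skew is zero only up to the bin size), and keeps just the minimum capacitance per delay bin. All names (`leaf`, `node`, `build`, `embed`, `WIDTHS`) are hypothetical.

```python
R0, C0 = 0.03, 2e-16                               # r0, c0 from the setup slide
WIDTHS = [1.0 + 3.0 * k / 15 for k in range(16)]   # 16 width samples in [1, 4]

def leaf(load):
    return {"load": load, "children": []}

def node(*branches):
    """branches: (wire_length, child) pairs."""
    return {"load": 0.0, "children": list(branches)}

def build(v, step):
    """Phase 1 (bottom-up): map delay bin -> (min capacitance, choices)."""
    if not v["children"]:
        v["region"] = {0: (v["load"], [])}
        return v["region"]
    per_branch = []
    for length, child in v["children"]:
        child_region = build(child, step)
        branch = {}
        for cbin, (c, _) in child_region.items():
            for w in WIDTHS:
                wire_c = C0 * length * w
                d = cbin * step + R0 * length / w * (wire_c / 2 + c)
                b = round(d / step)            # snap to the nearest delay sample
                cap = c + wire_c
                if b not in branch or cap < branch[b][0]:
                    branch[b] = (cap, (cbin, w))
        per_branch.append(branch)
    # Only delay bins reachable through every branch keep the skew balanced.
    common = set(per_branch[0]).intersection(*per_branch[1:])
    v["region"] = {b: (sum(br[b][0] for br in per_branch),
                       [br[b][1] for br in per_branch]) for b in common}
    return v["region"]

def embed(v, b, out):
    """Phase 2 (top-down): follow recorded choices from delay bin b."""
    _, choices = v["region"][b]
    for (length, child), (cbin, w) in zip(v["children"], choices):
        out.append((length, w))
        embed(child, cbin, out)
    return out
```

After `build(root, step)`, picking `min(root["region"])` selects the smallest sampled delay, and `embed` returns the (length, width) pair chosen for each wire in depth-first order.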

Page 11: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Optimality analysis

Embeddings that do not fall on the delay samples are omitted
• Propagated error
• Delay sampling error
• Wire width sampling error (detailed in the paper)

[Figure: exact DC region, DC region built from children information, and sampled DC region in the C-D plane; labels w, d, p]

Page 12: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Optimality analysis

Error is bounded
• d: delay sampling resolution
• w: wire width sampling resolution
• k (and a second constant whose symbol did not survive in this transcript): constants related to l, r0, c0, wm, wM, …

Generally speaking, the error is reduced by about half when the resolution is doubled

[Figure: DC region and sampled DC region in the C-D plane; plot of error against resolution]

Page 13: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Optimality/runtime trade-off

Controlling the sampling resolution trades off optimality against runtime and memory

[Figure: Minimum delay vs. optimal delay; error (0.0% to 2.0%) against number of samples (128, 256, 512, 1024) for benchmarks r1 to r5]

[Figure: Runtime in minutes (0 to 120) against number of nodes (0 to 4000) for p, q = 128, 256, 512, 1024]

Page 14: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Complexity analysis

Runtime
• Bottom-up phase takes O(n p max(p, q))
• Top-down phase takes O(n p)
• Overall: O(n p max(p, q))

Memory
• O(n p)

where
n: number of nodes of the clock tree
p: number of delay samples taken at each node
q: number of wire width samples taken at each level-2 node
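The scaling implied by these bounds can be checked with a few lines. The function below is a hypothetical helper that only evaluates the asymptotic expressions with constant factors dropped; it does not measure ClockTune itself.

```python
def clocktune_cost(n, p, q):
    """Evaluate the stated bounds for n nodes, p delay samples, q width samples."""
    bottom_up = n * p * max(p, q)     # O(n p max(p, q))
    top_down = n * p                  # O(n p)
    return {
        "bottom_up": bottom_up,
        "top_down": top_down,
        "overall": bottom_up + top_down,   # dominated by the bottom-up term
        "memory": n * p,                   # O(n p)
    }

base = clocktune_cost(1000, 256, 256)
double_n = clocktune_cost(2000, 256, 256)
double_pq = clocktune_cost(1000, 512, 512)
print(double_n["overall"] / base["overall"])    # linear in n
print(double_pq["bottom_up"] / base["bottom_up"])  # quadratic in p when p = q
print(double_pq["memory"] / base["memory"])     # linear in p
```

With p and q fixed, both runtime and memory grow linearly in n; doubling p = q quadruples the bottom-up work while only doubling memory.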

Page 15: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Outline

Background
• Motivation and contribution
• Related works
• Problem formulation

ClockTune Algorithm
• Design space projection
• Algorithm overview
• Optimality and complexity analysis

Experimental Results
• Runtime, memory usage, and optimality
• Power/Delay trade-off
• Incremental refinement

Page 16: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Experimental setup

ClockTune is implemented in C++ and executed on a 533 MHz Pentium III PC with 128 MB of memory

Benchmarks r1 – r5 from Tsay et al. [ICCAD ’91]

Initial routing generated by the BB+DME algorithm with minimum wire width w = 1 μm

ClockTune uses wm = 1 μm, wM = 4 μm

p: number of delay samples taken at every node
q: number of wire width samples taken at every level-2 node

r0 = 0.03 Ω/□, c0 = 2×10^-16 F/μm^2

Page 17: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Runtime and memory usage

Runtime and memory usage are linear in the problem size when p and q are fixed
Within 1% of optimal when p = q = 256 (runtime < 6 minutes, memory ~ 64 MB)

p, q = 256   # sink nodes   # branches   Runtime (s)   Memory (MB)   Optimality
r1           267            527          24.1          6.0           0.38%
r2           598            1185         61.0          12.5          0.71%
r3           862            1710         100.0         14.4          0.46%
r4           1903           3787         202.4         38.0          0.57%
r5           3101           6170         339.2         64.0          0.93%

[Figure: Runtime in minutes (0 to 120) against number of nodes (0 to 4000) for p, q = 128, 256, 512, 1024]

[Figure: Memory Usage in MB (0 to 90) against number of nodes (0 to 4000) for p, q = 128, 256, 512, 1024]
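The linearity claim can be checked directly against the p, q = 256 table by dividing runtime and memory by the number of branches. The numbers below are copied from the table; the variable names are illustrative only.

```python
# (sink nodes, branches, runtime in s, memory in MB) for p = q = 256
rows = {
    "r1": (267, 527, 24.1, 6.0),
    "r2": (598, 1185, 61.0, 12.5),
    "r3": (862, 1710, 100.0, 14.4),
    "r4": (1903, 3787, 202.4, 38.0),
    "r5": (3101, 6170, 339.2, 64.0),
}

ms_per_branch = {}
kb_per_branch = {}
for name, (_, branches, runtime_s, memory_mb) in rows.items():
    ms_per_branch[name] = 1000.0 * runtime_s / branches
    kb_per_branch[name] = 1024.0 * memory_mb / branches
    print(f"{name}: {ms_per_branch[name]:.1f} ms/branch, "
          f"{kb_per_branch[name]:.1f} KB/branch")

# Spread of the per-branch cost across a roughly 12x range of tree sizes.
runtime_spread = max(ms_per_branch.values()) / min(ms_per_branch.values())
memory_spread = max(kb_per_branch.values()) / min(kb_per_branch.values())
print(f"runtime spread {runtime_spread:.2f}x, memory spread {memory_spread:.2f}x")
```

The per-branch cost stays within a narrow band while the tree size grows by more than a factor of ten, which is what linear scaling predicts.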

Page 18: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Optimality results

Optimality error is below 1% with p = q = 256
Error is reduced to about half when the resolution is doubled

[Figure: Minimum delay vs. optimal delay; error (0.0% to 2.0%) against number of samples (128, 256, 512, 1024) for benchmarks r1 to r5]

[Figure: Minimum area vs. optimal area; error (0.0% to 0.8%) against number of samples (128, 256, 512, 1024) for benchmarks r1 to r5]

Page 19: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Power/Delay trade-off

[Figure: DC region of benchmark r5 on Capacitance and Delay axes; capacitance spans 0.2~1.1 nF and delay spans 5~150 ns, with the minimum-power and minimum-delay points marked]

15:1 delay:power trade-off
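Choosing a d-c pair from a sampled DC region amounts to picking a point on its lower-left boundary. The sketch below shows one way to do that under a delay bound; the sample points are made up for illustration and are not taken from the r5 results, and the function names are hypothetical.

```python
def pareto_front(points):
    """Keep (delay, capacitance) pairs that no other pair dominates."""
    front = []
    best_cap = float("inf")
    for d, c in sorted(points):          # increasing delay, then capacitance
        if c < best_cap:                 # strictly less capacitance than any faster point
            front.append((d, c))
            best_cap = c
    return front

def pick_min_power(points, delay_max):
    """Lowest-capacitance pair whose delay does not exceed delay_max."""
    feasible = [(d, c) for d, c in pareto_front(points) if d <= delay_max]
    return min(feasible, key=lambda dc: dc[1]) if feasible else None

# Illustrative samples only: (delay in ns, capacitance in nF)
samples = [(5, 1.1), (10, 0.8), (10, 0.9), (50, 0.5), (60, 0.6), (150, 0.2)]
print(pareto_front(samples))
print(pick_min_power(samples, delay_max=60))
```

Tightening `delay_max` moves the choice toward the minimum-delay end of the curve at the cost of more switching capacitance, and relaxing it moves toward the minimum-power end.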

Page 20: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Incremental refinement

DC region captures the design space
• Enables incremental refinement

[Figure: DC regions on C-D axes for a clock tree before and after a local change, with one location marked X]

Page 21: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Conclusion & Future Work

Provide a zero-skew clock tree wire-sizing algorithm which
• Minimizes delay and area ε-optimally
• Guarantees pseudo-polynomial runtime and memory usage
• Provides delay/power trade-off information to designers
• Speeds up design convergence by allowing clock tree re-balancing with minimum changes

Future work
• Better delay model
• Buffer insertion/sizing capability

Page 22: ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Thank you!