ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Jeng-Liang Tsai

Tsung-Hao Chen

Charlie Chung-Ping Chen (National Taiwan University)

University of Wisconsin-Madison, http://vlsi.ece.wisc.edu

Outline

Background
• Motivation and contribution
• Literature overview

ClockTune algorithm
• Problem formulation
• ClockTune algorithm overview
• Optimality and complexity analysis

Experimental results
• Runtime, memory usage, and optimality
• Power/delay trade-off
• Incremental refinement

Motivation

Clock skew incurs a cycle-time penalty
• Start with a zero-skew clock tree
• Minimizing clock delay reduces system-level skew (Kuh et al. [DAC '90])

The clock tree is power-hungry (30% of total power in Intel McKinley: 0.18 µm / 1 GHz / 130 W)
• P = fCV²
• Minimize switching capacitance (wiring area)

Stability affects design convergence
• Allow incremental refinement to accommodate local changes

Interconnect delay dominates total delay
• Wire-sizing is effective in reducing interconnect delay

Motivation

Non-convex zero-skew constraints
• No known algorithm solves the zero-skew wire-sizing problem optimally in polynomial runtime

Hence, a good clock tree wire-sizing algorithm should:
• Minimize delay and power
• Guarantee optimality and runtime
• Have good stability

Contribution

First ε-optimal algorithm for the clock minimum-delay/power zero-skew wire-sizing optimization problem

Provides a complete (sampled) solution set of delay/power/area trade-off information for design planning

Efficient pseudo-polynomial runtime (6170-branch clock tree in 6 minutes, within 1% of optimal)

Runtime vs. optimality trade-off

Incremental clock re-balancing to speed up design convergence

Literature Overview

“Reliable non-zero skew clock tree using wire width optimization”, Pillage et al. [DAC '93]
• Iteratively optimizes skew and delay using adjoint sensitivity analysis
• Aimed at reliable clock trees under process variation

Deferred Merging Embedding (DME) algorithm, Kahng et al. [TCAD '92]
• Bottom-up merging segment construction, top-down embedding

Integrated Deferred Merging Embedding (IDME) algorithm, Wong et al. [ISPD '00]
• Handles simultaneous routing, buffer insertion, and wire-sizing
• Merging segment set (MSS): a set of line samples of a merging region
• No optimality guarantee
• The size of the MSS grows exponentially

“Process variation aware clock tree routing”, Lu et al. [ISPD '03]
• Based on DME/BST

Problem formulation

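min-ZSWS (Zero-Skew Wire-Sizing) problem
• Given a clock routing, choose wire widths w to

$$\min_{w}\ \mathrm{delay}(w)\ \ \text{(Delay)} \qquad \text{or} \qquad \min_{w}\ \mathrm{area}(w)\ \ \text{(Area)}$$
$$\text{subject to}\quad \mathrm{delay}(P_i, w) = \mathrm{delay}(P_j, w)\quad \forall\, i, j \qquad \text{(zero-skew constraints)}$$

where P_i, P_j are paths from v to leaf nodes i and j.

Zero-skew constraints are non-convex constraints
• No known algorithm solves the problem optimally in polynomial runtime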
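To make the zero-skew constraint concrete, here is a minimal sketch that evaluates delay(P_i, w) for every leaf of a small wire-sized RC tree and reports the skew. It assumes an Elmore delay model (these slides do not say which model ClockTune uses), wire resistance r0·l/w and capacitance c0·w·l, and the r0, c0 values from the experimental setup with assumed units; `Node`, `leaf_delays`, and `skew` are hypothetical names.

```python
from dataclasses import dataclass, field

R0 = 0.03   # sheet resistance r0 from the setup (unit assumed: ohm per square)
C0 = 2e-16  # area capacitance c0 from the setup (unit assumed: F per um^2)


@dataclass
class Node:
    """One tree node; `length` and `width` describe the wire from its parent."""
    length: float = 0.0    # wire length in um
    width: float = 1.0     # wire width w in um
    sink_cap: float = 0.0  # sink load in F (leaves only)
    children: list = field(default_factory=list)


def wire_rc(node):
    """Resistance r0*l/w and capacitance c0*w*l of the wire into `node`."""
    return R0 * node.length / node.width, C0 * node.width * node.length


def downstream_cap(node):
    """Total capacitance hanging below `node` (its own incoming wire excluded)."""
    total = node.sink_cap
    for child in node.children:
        total += wire_rc(child)[1] + downstream_cap(child)
    return total


def leaf_delays(node, t=0.0):
    """Elmore delay from the starting node to every leaf, in depth-first order."""
    if not node.children:
        return [t]
    delays = []
    for child in node.children:
        r, c = wire_rc(child)
        # Elmore: wire resistance times (half its own cap + everything below it)
        delays += leaf_delays(child, t + r * (c / 2 + downstream_cap(child)))
    return delays


def skew(node):
    """Max minus min leaf delay; the zero-skew constraint asks for exactly 0."""
    d = leaf_delays(node)
    return max(d) - min(d)


balanced = Node(children=[Node(100.0, 1.0, 1e-14), Node(100.0, 1.0, 1e-14)])
unbalanced = Node(children=[Node(100.0, 1.0, 1e-14), Node(100.0, 2.0, 1e-14)])
print(skew(balanced), skew(unbalanced) > 0)  # prints: 0.0 True
```

Two identical branches give zero skew; widening only one branch lowers its delay and breaks the constraint, which is why the widths across the tree are coupled.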

DC region approach

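Clock delay and wiring capacitance are the top concerns. Define $f : \mathbb{R}^N \to \mathbb{R}^2$ such that
• $f_Y(w) = \mathrm{Delay}(T_v(w))$, $f_X(w) = \mathrm{Capacitance}(T_v(w))$
• DC region(v): the projection of the feasible region of $T_v$ onto $\mathbb{R}^2$ under $f$
• Choose a delay-capacitance (d-c) pair from the DC region in $\mathbb{R}^2$

[Figure: $f : \mathbb{R}^6 \to \mathbb{R}^2$ maps the feasible region of subtree $T_v$ onto its DC region]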
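The smallest possible DC region is that of a single wire driving a fixed load. The sketch below samples q widths in [wm, wM] and maps each one to a (capacitance, delay) pair, i.e. a sampled image of f. It assumes Elmore delay and the r0, c0 values from the experimental setup; `wire_dc_points` is a hypothetical helper, not ClockTune code.

```python
def wire_dc_points(length, load_cap, w_min=1.0, w_max=4.0, q=4, r0=0.03, c0=2e-16):
    """Sampled (capacitance, delay) pairs of one wire driving `load_cap`.

    Widths are q evenly spaced samples in [w_min, w_max]; delay is the
    Elmore delay r * (c_wire / 2 + load_cap) of the wire segment alone.
    """
    if q < 2:
        raise ValueError("need at least two width samples")
    points = []
    for k in range(q):
        w = w_min + k * (w_max - w_min) / (q - 1)
        r = r0 * length / w          # wider wire: less resistance
        c_wire = c0 * w * length     # wider wire: more capacitance
        cap = c_wire + load_cap      # f_X: capacitance of this small subtree
        delay = r * (c_wire / 2 + load_cap)  # f_Y: delay of this small subtree
        points.append((cap, delay))
    return points


for cap, delay in wire_dc_points(100.0, 1e-14):
    print(f"C = {cap * 1e15:.1f} fF, delay = {delay * 1e15:.2f} fs")
```

As the width grows, capacitance rises while delay falls, so even one wire already yields a small delay/capacitance trade-off curve.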

ClockTune algorithm overview

Phase 1: bottom-up construction of DC regions for every node
Phase 2: top-down embedding after a delay/power trade-off point is chosen

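A heavily simplified sketch of a bottom-up step in the spirit of Phase 1 follows; it is not ClockTune's actual data structure. Assumptions: Elmore delay, a uniform delay grid with spacing `step`, and a sampled DC region stored as {delay sample index: minimum capacitance}. Reusing the child's snapped delay when adding the next wire is where propagated error enters. The names `snap`, `add_wire`, and `merge_zero_skew` are hypothetical, and the top-down embedding (tracing back the chosen widths) is omitted.

```python
import math


def snap(delay, step):
    """Index of the nearest sample on a uniform delay grid (source of sampling error)."""
    return round(delay / step)


def add_wire(region, length, widths, step, r0=0.03, c0=2e-16):
    """Push a sampled DC region {delay_index: min_capacitance} up one wire.

    Every (delay sample, width sample) pair is tried; for each resulting
    delay sample only the smallest capacitance is kept. Under Elmore delay,
    less downstream capacitance never increases upstream delay or area, so
    this pruning is safe in this simplified model.
    """
    out = {}
    for idx, cap in region.items():
        for w in widths:
            r = r0 * length / w
            c_wire = c0 * w * length
            new_idx = snap(idx * step + r * (c_wire / 2 + cap), step)
            new_cap = cap + c_wire
            if new_cap < out.get(new_idx, math.inf):
                out[new_idx] = new_cap
    return out


def merge_zero_skew(left, right):
    """Zero skew at a branch point: both subtrees must show the same sampled
    delay there, and their capacitances add."""
    return {i: left[i] + right[i] for i in left.keys() & right.keys()}


sink = {0: 1e-14}  # a sink: delay 0 at the sink itself, 10 fF load
left = add_wire(sink, 100.0, [1.0, 2.0], step=1e-15)
right = add_wire(sink, 100.0, [2.0, 3.0], step=1e-15)
print(sorted(merge_zero_skew(left, right)))  # prints: [45]
```

Each node keeps at most one entry per delay sample, which is what bounds memory by the number of nodes times the number of delay samples.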

Optimality analysis

Embeddings that do not fall on the delay samples are omitted, which introduces:
• Propagated error
• Delay sampling error
• Wire width sampling error (detailed in the paper)

[Figure: DC region, DC region computed from children information, and sampled DC region]

Optimality analysis

Error is bounded

d : delay sampling resolution

w : wire width sampling resolution

• k, : Constants related to l, r0, c0, wm, wM …

Generally speaking, the error is reduced by about half when the resolution is doubled

[Figure: error vs. resolution]
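The halving behavior matches what snapping values to a uniform grid predicts: the worst-case rounding error is half the sample spacing, so doubling the number of samples halves it. The sketch below checks this numerically; it measures only the rounding component, not the propagated error, and `max_snap_error` is a hypothetical helper.

```python
def max_snap_error(lo, hi, intervals, trials=10_001):
    """Worst distance from points in [lo, hi] to the nearest of
    intervals + 1 evenly spaced samples, checked on a dense test grid."""
    step = (hi - lo) / intervals
    worst = 0.0
    for k in range(trials):
        x = lo + (hi - lo) * k / (trials - 1)
        nearest = lo + round((x - lo) / step) * step
        worst = max(worst, abs(x - nearest))
    return worst


coarse = max_snap_error(0.0, 1.0, 128)
fine = max_snap_error(0.0, 1.0, 256)
print(f"{coarse:.5f} {fine:.5f} ratio {coarse / fine:.2f}")
```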

Optimality/runtime trade-off

Controlling the sampling resolution trades off optimality against runtime and memory

[Figures: minimum delay vs. optimal delay for 128, 256, 512, and 1024 samples; runtime vs. number of nodes for p, q = 128, 256, and 1024]

Complexity analysis

Runtime
• Bottom-up phase takes O(n·p·max(p, q))
• Top-down phase takes O(n·p)
• Overall: O(n·p·max(p, q))

Memory: O(n·p)

where n : number of nodes of the clock tree,

p : number of delay samples taken at each node

q : number of wire width samples taken at each level-2 node
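A quick way to read these bounds is to ask what happens when p and q are both doubled. The sketch ignores constant factors; the function names are hypothetical, and the r5 branch count is used only as an example value of n (the ratios do not depend on n).

```python
def bottom_up_work(n, p, q):
    """Dominant term of the bottom-up phase, O(n * p * max(p, q)); constants ignored."""
    return n * p * max(p, q)


def memory_entries(n, p):
    """O(n * p): about p stored delay samples per node; constants ignored."""
    return n * p


def scaling(n, p, q, factor=2):
    """Ratios of work and memory when p and q are both multiplied by `factor`."""
    work = bottom_up_work(n, factor * p, factor * q) / bottom_up_work(n, p, q)
    mem = memory_entries(n, factor * p) / memory_entries(n, p)
    return work, mem


print(scaling(6170, 256, 256))  # prints: (4.0, 2.0)
```

Combined with the rule of thumb that doubling the resolution roughly halves the error, these asymptotic bounds suggest that each halving of the error costs about 4× bottom-up work and 2× memory.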

Outline

Background
• Motivation and contribution
• Related works
• Problem formulation

ClockTune algorithm
• Design space projection
• Algorithm overview
• Optimality and complexity analysis

Experimental results
• Runtime, memory usage, and optimality
• Power/delay trade-off
• Incremental refinement

Experimental setup

• ClockTune is implemented in C++ and executed on a 533 MHz Pentium III PC with 128 MB of memory

Benchmarks r1–r5 from Tsay et al. [ICCAD '91]
Initial routing generated by the BB+DME algorithm with minimum wire width w = 1 µm
ClockTune uses wm = 1 µm, wM = 4 µm

p: number of delay samples taken at every node
q: number of wire width samples taken at every level-2 node
r0 = 0.03, c0 = 2×10⁻¹⁶/µm²

Runtime and memory usage

Runtime and memory usage are linear in problem size when p, q are fixed
Within 1% of optimal when p, q = 256 (runtime < 6 minutes, memory ~64 MB)

Results for p, q = 256:

Benchmark  Sink nodes  Branches  Runtime (s)  Memory (MB)  Optimality error
r1                267       527         24.1          6.0             0.38%
r2                598      1185         61.0         12.5             0.71%
r3                862      1710        100.0         14.4             0.46%
r4               1903      3787        202.4         38.0             0.57%
r5               3101      6170        339.2         64.0             0.93%
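The linear-scaling claim can be checked directly against the r1–r5 table by dividing runtime and memory by the branch count; roughly constant ratios indicate linear growth. The numbers below are copied from that table; the variable names are illustrative only.

```python
# (benchmark, branches, runtime in s, memory in MB), from the r1-r5 table (p, q = 256)
ROWS = [
    ("r1", 527, 24.1, 6.0),
    ("r2", 1185, 61.0, 12.5),
    ("r3", 1710, 100.0, 14.4),
    ("r4", 3787, 202.4, 38.0),
    ("r5", 6170, 339.2, 64.0),
]

# Runtime and memory per branch; near-constant values indicate linear scaling
runtime_per_branch = {name: rt / br for name, br, rt, _ in ROWS}
memory_per_branch = {name: mem / br for name, br, _, mem in ROWS}

for name, _, _, _ in ROWS:
    print(f"{name}: {1e3 * runtime_per_branch[name]:.1f} ms/branch, "
          f"{1e3 * memory_per_branch[name]:.1f} MB per 1000 branches")
```

Across the five benchmarks this gives roughly 46–58 ms per branch and 8–12 MB per 1000 branches, consistent with linear growth at fixed p and q.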

[Figures: runtime vs. number of nodes for p, q = 128, 256, and 1024; memory usage vs. number of nodes for p, q = 1024]

Optimality results

Optimality error below 1% with p = q = 256
Error reduced by about half when the resolution is doubled

[Figures: minimum delay vs. optimal delay and minimum area vs. optimal area, for 128, 256, 512, and 1024 samples]

Power/Delay trade-off

[Figure: power/delay trade-off curve; capacitance 0.2–1.1 nF, delay 5–150 ns, spanning from the minimum-power point to the minimum-delay point]

15:1 delay:power trade-off
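Picking a design point between the minimum-power and minimum-delay ends amounts to filtering the root's sampled (capacitance, delay) pairs down to the non-dominated ones and choosing among them. The sketch below keeps the Pareto front and returns the lowest-capacitance point (lowest fCV² switching power) that meets a delay budget. The helper names are hypothetical and the sample pairs are made up for illustration (units arbitrary).

```python
def pareto_front(points):
    """Keep the (capacitance, delay) pairs that no other pair beats on both
    axes, ordered by increasing capacitance and hence decreasing delay."""
    front = []
    for cap, delay in sorted(points):
        if not front or delay < front[-1][1]:
            front.append((cap, delay))
    return front


def min_power_within(points, delay_budget):
    """Smallest-capacitance (lowest fCV^2 power) pair meeting the delay budget,
    or None if no sampled pair does."""
    feasible = [pt for pt in pareto_front(points) if pt[1] <= delay_budget]
    return min(feasible) if feasible else None


samples = [(1.0, 10.0), (2.0, 5.0), (1.5, 12.0), (3.0, 5.0), (4.0, 2.0)]
print(pareto_front(samples))           # prints: [(1.0, 10.0), (2.0, 5.0), (4.0, 2.0)]
print(min_power_within(samples, 6.0))  # prints: (2.0, 5.0)
```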

Incremental refinement

DC region captures the design space
• Enables incremental refinement

Conclusion & Future Work

Provides a zero-skew clock tree wire-sizing algorithm that
• Minimizes delay and area ε-optimally

• Guarantees pseudo-polynomial runtime and memory usage

• Provides delay/power trade-off information to designers

• Speeds up design convergence by allowing clock tree re-balancing with minimum changes

Future work:
• Better delay model
• Buffer insertion/sizing capability

Thank you!
