On-Line Adjustable Buffering for Runtime Power Reduction


On-Line Adjustable Buffering for Runtime Power Reduction. Andrew B. Kahng (Ψ), Sherief Reda (†), Puneet Sharma (Ψ). Ψ University of California, San Diego; † Brown University.

Transcript of On-Line Adjustable Buffering for Runtime Power Reduction

Page 1: On-Line Adjustable Buffering for Runtime Power Reduction

On-Line Adjustable Buffering for Runtime Power Reduction

Andrew B. Kahng (Ψ), Sherief Reda (†), Puneet Sharma (Ψ)

Ψ University of California, San Diego
† Brown University

Page 2: On-Line Adjustable Buffering for Runtime Power Reduction

Outline
• Introduction
• Adjustable Buffering Methodology
• Experiments & Results
• Conclusions

Page 3: On-Line Adjustable Buffering for Runtime Power Reduction

Power: First-Class Objective
• Power is a bottleneck to Moore’s law
• Power-frequency tradeoff exists in CMOS circuits
  • Much higher power is required to operate at high frequency
• Techniques to exploit the power-frequency tradeoff are of interest
  • Allow high-frequency operation
  • Can give significant power reduction when max. performance is not required
• Mainstream approach: dynamic voltage and frequency scaling (DVFS)
  • Power-frequency tradeoff with VDD scaling

[Figure: power vs. frequency curve]
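The power-frequency tradeoff with VDD scaling can be illustrated with the standard first-order CMOS switching-power estimate, P = a · C · VDD² · f. The sketch below is not from the slides: the capacitance and the two frequencies are hypothetical values chosen only to show how lowering VDD and frequency together reduces dynamic power. Leakage and short-circuit power are ignored here.

```python
def dynamic_power(activity, cap_farads, vdd, freq_hz):
    """First-order CMOS switching power: P = a * C * VDD^2 * f (watts)."""
    return activity * cap_farads * vdd ** 2 * freq_hz


# Hypothetical numbers for illustration only (not measured data).
ACTIVITY = 0.01   # switching activity factor
CAP = 1e-9        # total switched capacitance in farads (assumed)

p_fast = dynamic_power(ACTIVITY, CAP, vdd=1.1, freq_hz=400e6)
p_slow = dynamic_power(ACTIVITY, CAP, vdd=0.9, freq_hz=300e6)

# Fraction of dynamic power saved by moving to the slower operating point.
saving = 1.0 - p_slow / p_fast
print(f"dynamic power saving: {saving:.1%}")
```

Because power depends on VDD quadratically, a modest VDD reduction saves more than the frequency reduction alone would.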

Page 4: On-Line Adjustable Buffering for Runtime Power Reduction

Dynamic VDD & Freq. Scaling
• Scale down VDD and frequency when high performance is not needed
• Limitations of DVFS
  • VDD cannot be scaled down indefinitely
  • Range of VDD scaling is small and diminishing
    • Extremely high power at high VDD → reduce max. VDD
    • High Vth to reduce leakage, noise margins, variability, soft errors → increase min. VDD
  • Discrete allowed voltages

[Figure: power vs. frequency, comparing the ideal frequency-power curve from VDD scaling with the actual frequency-power curve from DVFS]

• Our objective: enable additional modes to exploit the frequency-power tradeoff
  • Usable when VDD cannot be scaled further
  • Usable without DVFS

Page 5: On-Line Adjustable Buffering for Runtime Power Reduction

Proposal: Adjustable Buffering
• Our approach, like DVFS, provides runtime-selectable low-power modes → supplement or replace DVFS
• Key idea: a lot of logic is added for performance, not functionality → turn this logic off when high performance is not needed
• Poor interconnect scaling → large number of repeaters
  • 20-30% of cells are repeaters
  • Fat repeaters are used to improve delay but consume a lot of power
• We modify repeaters to dynamically adjust their driving capacity

[Figure: a 32X repeater is transformed into a 16X / 32X adjustable repeater with a Select input]

Page 6: On-Line Adjustable Buffering for Runtime Power Reduction

Outline
• Introduction
• Adjustable Buffering Methodology
• Experiments & Results
• Conclusions

Page 7: On-Line Adjustable Buffering for Runtime Power Reduction

Adjustable Repeater Design
• We add a PMOS-NMOS pair to turn half the devices off dynamically
• What power components are likely to reduce in low-power mode?
  • Short-circuit power: during switching, PMOS & NMOS are ON momentarily → short circuit between VDD and VSS
    • High when transition time (slew) is large
  • Subthreshold leakage: flows when only one device of the PMOS-NMOS pair between VDD and VSS is ON

[Figure: traditional inverter (INVX8) next to the adjustable inverter, which adds two control gates]

“LPM” = ON → only half the devices operational (low-power mode).
“LPM” = OFF → all devices operational (high-performance mode).

Page 8: On-Line Adjustable Buffering for Runtime Power Reduction

Adjustable Repeater Requirements
• Low area overhead
  • Added PMOS-NMOS pair (LPM devices) takes area
  • LPM signal must be routed or locally generated
  • Layout of the new cell must be simple and have low area overhead
• High performance in high-performance mode
  • On-resistance of LPM devices may reduce performance
• Good power reduction in low-power mode

Page 9: On-Line Adjustable Buffering for Runtime Power Reduction

Area Overhead
• Problem: high performance needed when LPM signal is OFF → use large control gates → large area overhead
• Solution: share control gates among multiple repeaters
• Delay overhead: increase in delay of an adjustable repeater over a traditional repeater

[Figure: repeaters sharing two control gates]

Page 10: On-Line Adjustable Buffering for Runtime Power Reduction

Control Gate Sharing
• Fewer control gates, but virtual VDD (V’DD) and VSS (V’SS) need routing
• How many control gates are needed?
  • Compute the simultaneous switching rate (SSR) by finding the max. #repeaters that have overlapping timing windows. Time = O(R log R) (R = #repeaters)
  • Find the total width of all repeater devices controlled by CGs (= WR)
  • For good performance, width of control gates = 4 x SSR x WR
  • Typical SSR = ~10% → small area overhead

[Figure: LPM devices shared by two inverters, connected through V’DD and V’SS]
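The control-gate sizing steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each repeater's timing window is a (start, end) interval, finds the maximum number of overlapping windows with a sort-based sweep (which gives the O(R log R) bound quoted on the slide), and assumes SSR is that maximum divided by R. The function names and the example windows are hypothetical.

```python
def max_overlap(windows):
    """Max number of (start, end) timing windows that overlap at any instant."""
    events = []
    for start, end in windows:
        events.append((start, 1))    # window opens
        events.append((end, -1))     # window closes
    # Sort by time. At equal times, closings (-1) sort before openings (+1),
    # so windows that merely touch are not counted as overlapping.
    events.sort()
    active = best = 0
    for _, delta in events:
        active += delta
        best = max(best, active)
    return best


def simultaneous_switching_rate(windows):
    """Assumed definition: max overlapping windows as a fraction of R."""
    return max_overlap(windows) / len(windows) if windows else 0.0


def control_gate_width(ssr, total_repeater_width):
    """Slide rule of thumb: control-gate width = 4 x SSR x WR."""
    return 4.0 * ssr * total_repeater_width


# Hypothetical timing windows (ns) for ten repeaters.
windows = [(0.0, 1.0), (0.5, 1.5), (0.8, 2.0), (3.0, 4.0), (3.5, 4.5),
           (6.0, 7.0), (8.0, 9.0), (10.0, 11.0), (12.0, 13.0), (14.0, 15.0)]
ssr = simultaneous_switching_rate(windows)
print(ssr, control_gate_width(ssr, total_repeater_width=100.0))
```

The sort dominates the running time; the sweep itself is linear in the number of window endpoints.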

Page 11: On-Line Adjustable Buffering for Runtime Power Reduction

Ensuring High Performance
• Problem: adjustable repeaters are ~5% slower when LPM signal is OFF → up to ~5% reduction in circuit performance
• Solution: do not use adjustable repeaters on timing-critical paths
• Additional constraint: slew constraints must not be violated when LPM signal is OFF or ON
• We characterize adjustable repeaters (i.e., find delay, slew, power, input capacitance) and then substitute traditional repeaters with adjustable repeaters subject to delay and slew constraints → no loss in circuit performance & no slew violations
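One way to picture the constrained substitution described above is a greedy pass over the repeaters. The slides do not give the actual algorithm, so this is only a sketch under stated assumptions: every field of the hypothetical `Repeater` record (paths, delay penalty, slews in both modes) is assumed to come from library characterization and timing analysis, and slack is tracked per timing path so that swaps on a shared path cannot jointly exceed its budget.

```python
from dataclasses import dataclass


@dataclass
class Repeater:
    name: str
    paths: list            # ids of timing paths through this repeater
    delay_penalty: float   # extra delay (ns) of adjustable version, LPM OFF
    slew_lpm_off: float    # output slew (ns) of adjustable version, LPM OFF
    slew_lpm_on: float     # output slew (ns) of adjustable version, LPM ON


def choose_adjustable(repeaters, path_slack, max_slew):
    """Greedily pick repeaters that can become adjustable without violations."""
    slack = dict(path_slack)   # work on a copy of the per-path slack budget
    chosen = []
    for rep in repeaters:
        # Slew must be legal in both high-performance and low-power mode.
        slew_ok = max(rep.slew_lpm_off, rep.slew_lpm_on) <= max_slew
        # Every path through the repeater must absorb the added delay.
        timing_ok = all(slack[p] >= rep.delay_penalty for p in rep.paths)
        if slew_ok and timing_ok:
            for p in rep.paths:
                slack[p] -= rep.delay_penalty
            chosen.append(rep.name)
    return chosen


# Hypothetical example: path "A" is nearly critical, path "B" has ample slack.
reps = [
    Repeater("r1", ["A"], 0.02, 0.10, 0.20),
    Repeater("r2", ["A"], 0.02, 0.10, 0.20),   # rejected: path A slack used up
    Repeater("r3", ["B"], 0.02, 0.10, 0.50),   # rejected: slew too large in LPM
    Repeater("r4", ["B"], 0.02, 0.10, 0.20),
]
print(choose_adjustable(reps, {"A": 0.03, "B": 1.0}, max_slew=0.3))
```

A production flow would re-run timing analysis after substitutions rather than rely on a fixed delay penalty; the fixed penalty here keeps the sketch short.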

Page 12: On-Line Adjustable Buffering for Runtime Power Reduction

Power Reduction in Low-Power Mode

[Figure: traditional inverter next to the adjustable inverter with its control gates marked OFF]

Reduction in short-circuit energy and leakage for INVX8:

        Short-Circuit Energy    Leakage
LVT     43%                     28%
SVT     35%                     26%
HVT     22%                     22%

• Short-circuit energy and leakage reduce

Page 13: On-Line Adjustable Buffering for Runtime Power Reduction

Outline
• Introduction
• Adjustable Buffering Methodology
• Experiments & Results
• Conclusions

Page 14: On-Line Adjustable Buffering for Runtime Power Reduction

Experimental Validation
• Circuits: s38417 (8,890 cells), AES (15,272), OpenRisc (46,732)
• Tools: Synopsys HSPICE (SPICE), Design Compiler (synthesis, timing and power analysis); Cadence SoC Encounter (P&R), SignalStorm (library characterization); Artisan TSMC 90nm library models
• Other settings: power and timing analysis at slow corner, VDD of 1.1V and 0.9V, activity factor of 0.01

Page 15: On-Line Adjustable Buffering for Runtime Power Reduction

Results: Power Reduction
• We perform comparative analysis of: circuit with DVFS vs. circuit with DVFS + LPM
• Both dynamic and leakage power reduce
• 6-12% reduction in total power at low-power mode

[Figure: AES, total power (mW, axis 2 to 4.5) vs. frequency (MHz; axis values 445, 438, 432, 389, 354, 349, 343, 337); series: DVFS, DVFS+LPM]

[Figure: OpenRisc, total power (mW, axis 7 to 12) vs. frequency (MHz; axis values 192, 187, 181, 173, 164, 159, 154, 149); series: DVFS, DVFS+LPM]

Operating points marked on the plots: VDD=1.1, LPM=0; VDD=1.1, LPM=1; VDD=0.9, LPM=0; VDD=0.9, LPM=1
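The four operating points (two VDD levels, each with LPM off or on) suggest a simple runtime policy: pick the lowest-power mode that still meets the required frequency. The sketch below is an assumption about how such a selector might look, not something described in the slides, and the frequency and power values in `MODES` are placeholders rather than the measured results from the plots.

```python
# (vdd_volts, lpm, max_freq_mhz, power_mw)
# Placeholder numbers for illustration only, NOT measured data.
MODES = [
    (1.1, 0, 400.0, 4.0),
    (1.1, 1, 380.0, 3.6),
    (0.9, 0, 320.0, 2.8),
    (0.9, 1, 300.0, 2.5),
]


def pick_mode(required_mhz, modes=MODES):
    """Return the lowest-power mode whose max frequency meets the requirement."""
    feasible = [m for m in modes if m[2] >= required_mhz]
    if not feasible:
        raise ValueError("no mode meets the required frequency")
    return min(feasible, key=lambda m: m[3])


for target in (390.0, 350.0, 100.0):
    vdd, lpm, fmax, power = pick_mode(target)
    print(f"{target} MHz -> VDD={vdd}, LPM={lpm}, power={power} mW")
```

With LPM available, the selector has intermediate points between the discrete DVFS voltages, which is the extra flexibility the proposal aims for.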

Page 16: On-Line Adjustable Buffering for Runtime Power Reduction

Results: Area Overhead
• Logic area overhead due to control gates
  • Depends on SSR
  • Smaller if control gates can be placed in whitespace

[Figure: logic area overhead (axis 0.00% to 6.00%) for s38417 (SSR=13.5%), AES (SSR=7.95%), OpenRisc (SSR=9.77%)]

• Routing overhead
  • LPM and its complement are routed to control gates → routing overhead depends on locations of control gates
    • # control gates small → overhead small
  • V’DD, V’SS routed to all repeaters
  • For overhead estimation, nets are assumed to be Steiner trees

[Figure: routing overhead (axis 3.32% to 3.46%) for s38417, AES, OpenRisc]

Page 17: On-Line Adjustable Buffering for Runtime Power Reduction

Outline
• Introduction
• Adjustable Buffering Methodology
• Experiments & Results
• Conclusions

Page 18: On-Line Adjustable Buffering for Runtime Power Reduction

Conclusions
• Presented a novel technique that dynamically trades off power and performance by turning off devices not needed at less than max. performance
  • Both leakage and dynamic power reduce; total power reduction is 6-12% on our testcases
  • By sharing control gates, area overhead is reduced to <5.57%
  • No adverse effect on performance of the circuit when LPM signal is OFF
• Future work:
  • Actual layout of adjustable repeaters with routing of V’DD, V’SS, LPM nets to accurately estimate power, performance, area impacts
  • Customization of more cells, especially clock repeaters, to further improve the power-performance tradeoff

Page 19: On-Line Adjustable Buffering for Runtime Power Reduction

Thank You

Questions?