On-Line Adjustable Buffering for Runtime Power Reduction
Andrew B. KahngΨ, Sherief Reda†, Puneet SharmaΨ
ΨUniversity of California, San Diego
†Brown University
Outline
• Introduction
• Adjustable Buffering Methodology
• Experiments & Results
• Conclusions
Power: First-Class Objective
• Power is a bottleneck to Moore’s law.
• A power-frequency tradeoff exists in CMOS circuits: much higher power is required to operate at high frequency.
• Techniques to exploit the power-frequency tradeoff are of interest:
  ◦ Allow high-frequency operation.
  ◦ Can give significant power reduction when max. performance is not required.
• Mainstream approach: dynamic voltage and frequency scaling (DVFS).
[Figure: power-frequency tradeoff with VDD scaling (power vs. frequency).]
Dynamic VDD & Freq. Scaling
• Scale down VDD and frequency when high performance is not needed.
• Limitations of DVFS:
  ◦ VDD cannot be scaled down indefinitely.
  ◦ The range of VDD scaling is small and diminishing:
    extremely high power at high VDD → reduce max. VDD;
    high Vth (to reduce leakage), noise margins, variability, soft errors → increase min. VDD.
  ◦ Only discrete voltages are allowed.
[Figure: ideal frequency-power curve from VDD scaling vs. actual frequency-power points from DVFS (power vs. frequency).]
• Our objective: enable additional modes to exploit the frequency-power tradeoff.
  ◦ Usable when VDD cannot be scaled further.
  ◦ Usable without DVFS.
Proposal: Adjustable Buffering
• Our approach, like DVFS, provides runtime-selectable low-power modes.
  ◦ It can supplement or replace DVFS.
• Key idea: a lot of logic is added for performance, not functionality.
  ◦ Turn this logic off when high performance is not needed.
• Poor interconnect scaling → large number of repeaters.
  ◦ 20-30% of cells are repeaters.
  ◦ Fat repeaters are used to improve delay but consume a lot of power.
• We modify repeaters to dynamically adjust their driving capacity.
[Figure: a 32X repeater is transformed into a 16X / 32X adjustable repeater with a Select input.]
Adjustable Repeater Design
• We add a PMOS-NMOS pair to turn half the devices off dynamically.
• What power components are likely to reduce in low-power mode?
  ◦ Short-circuit power: during switching, the PMOS and NMOS are momentarily ON together, creating a short circuit between VDD and VSS. It is high when the transition time (slew) is large.
  ◦ Subthreshold leakage: present when one device of a PMOS-NMOS pair between VDD and VSS is ON.
[Figure: traditional inverter (INVX8) next to the adjustable inverter, which adds two control gates.]
• “LPM” = ON: only half the devices are operational (low-power mode).
• “LPM” = OFF: all devices are operational (high-performance mode).
Adjustable Repeater Requirements
• Low area overhead
  ◦ The added PMOS-NMOS pair (LPM devices) takes area.
  ◦ The LPM signal has to be routed or locally generated.
  ◦ The layout of the new cell must be simple and have low area overhead.
• High performance in high-performance mode
  ◦ The on-resistance of the LPM devices may reduce performance.
• Good power reduction in low-power mode
Area Overhead
• Problem: high performance is needed when the LPM signal is OFF → use large control gates → large area overhead.
• Solution: share control gates among multiple repeaters.
[Figure: schematic with two control gates.]
• Delay overhead: increase in delay of an adjustable repeater over a traditional repeater.
Control Gate Sharing
• Fewer control gates, but virtual VDD (V’DD) and VSS (V’SS) need routing.
• How many control gates are needed?
  ◦ Compute the simultaneous switching rate (SSR) by finding the max. number of repeaters that have overlapping timing windows. Time = O(R log R) (R = #repeaters).
  ◦ Find the total width of all repeater devices controlled by the control gates (CGs) (= WR).
  ◦ For good performance, width of control gates = 4 x SSR x WR.
  ◦ Typical SSR ≈ 10% → small area overhead.
[Figure: LPM devices shared by two inverters through V’DD and V’SS.]
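The SSR computation and the control-gate sizing rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each repeater's timing window is a closed interval (earliest, latest switching time), that windows touching at an endpoint count as overlapping, and that SSR is the peak overlap divided by the number of repeaters. The function names are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (earliest switch time, latest switch time)


def max_overlap(windows: List[Window]) -> int:
    """Largest number of timing windows that overlap at any instant.

    Sorting the 2R endpoints dominates the cost, so this is O(R log R).
    """
    events = []
    for start, end in windows:
        events.append((start, 0))  # 0 = window opens (sorts before a close at the same time)
        events.append((end, 1))    # 1 = window closes
    events.sort()

    active = 0
    best = 0
    for _, kind in events:
        if kind == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best


def simultaneous_switching_rate(windows: List[Window]) -> float:
    """Assumed definition: peak overlap as a fraction of all repeaters."""
    if not windows:
        return 0.0
    return max_overlap(windows) / len(windows)


def control_gate_width(ssr: float, total_repeater_width: float) -> float:
    """Sizing rule from the slide: width of control gates = 4 x SSR x WR."""
    return 4.0 * ssr * total_repeater_width


if __name__ == "__main__":
    # Illustrative windows only; not taken from any benchmark.
    windows = [(0.0, 2.0), (1.0, 3.0), (2.5, 4.0), (5.0, 6.0)]
    ssr = simultaneous_switching_rate(windows)
    print(ssr, control_gate_width(ssr, total_repeater_width=100.0))
```

With a typical SSR of about 10%, the rule gives a control-gate width of roughly 0.4 x WR, which is why sharing keeps the area overhead small.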
Ensuring High Performance
• Problem: adjustable repeaters are ~5% slower when the LPM signal is OFF → up to ~5% reduction in circuit performance.
• Solution: do not use adjustable repeaters on timing-critical paths.
• Additional constraint: slew constraints must not be violated when the LPM signal is OFF or ON.
• We characterize adjustable repeaters (i.e., find delay, slew, power, input capacitance) and then substitute traditional repeaters with adjustable repeaters subject to delay and slew constraints.
  ◦ No loss in circuit performance and no slew violations.
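One way to picture the constrained substitution is a greedy pass over the repeaters. The slides do not say how the substitution is actually ordered or solved, so everything here is an assumption for illustration: the `Repeater` record, the per-path slack bookkeeping, and the choice to visit repeaters in order of decreasing power saving are all hypothetical. A real flow would rely on incremental static timing analysis instead of a fixed path list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Repeater:
    name: str
    delay_penalty: float    # extra delay of the adjustable cell with LPM OFF
    power_saving: float     # saving in low-power mode (any consistent unit)
    slew_ok_lpm_off: bool   # slew constraint met in high-performance mode
    slew_ok_lpm_on: bool    # slew constraint met in low-power mode
    paths: List[str]        # timing paths that pass through this repeater


def select_adjustable(repeaters: List[Repeater],
                      path_slack: Dict[str, float]) -> Set[str]:
    """Greedily choose repeaters to replace without creating negative slack."""
    slack = dict(path_slack)  # work on a copy; the caller's slacks stay intact
    chosen: Set[str] = set()
    for rep in sorted(repeaters, key=lambda r: r.power_saving, reverse=True):
        # Slew must hold in both modes.
        if not (rep.slew_ok_lpm_off and rep.slew_ok_lpm_on):
            continue
        # Every path through the repeater must absorb the delay penalty.
        if all(slack[p] >= rep.delay_penalty for p in rep.paths):
            for p in rep.paths:
                slack[p] -= rep.delay_penalty
            chosen.add(rep.name)
    return chosen


if __name__ == "__main__":
    # Illustrative numbers only.
    reps = [
        Repeater("a", 5.0, 9.0, True, True, ["critical"]),
        Repeater("b", 5.0, 8.0, True, True, ["relaxed"]),
        Repeater("c", 6.0, 7.0, True, True, ["relaxed"]),
        Repeater("d", 1.0, 6.0, True, False, ["relaxed"]),
    ]
    print(select_adjustable(reps, {"critical": 0.0, "relaxed": 10.0}))
```

Repeaters on zero-slack (timing-critical) paths are skipped automatically, which matches the stated solution of keeping traditional repeaters there.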
Power Reduction in Low-Power Mode
[Figure: traditional inverter vs. adjustable inverter with the LPM devices OFF.]

        Short-Circuit Energy    Leakage
LVT     43%                     28%
SVT     35%                     26%
HVT     22%                     22%
Table: Reduction in short-circuit energy and leakage for INVX8.

• Short-circuit energy and leakage reduce.
Experimental Validation
• Circuits: s38417 (8,890 cells), AES (15,272), OpenRisc (46,732).
• Tools: Synopsys HSPICE (SPICE), Design Compiler (synthesis, timing and power analysis); Cadence SoC Encounter (P&R), SignalStorm (library characterization); Artisan TSMC 90nm library models.
• Other settings: power and timing analysis at slow corner, VDD of 1.1V and 0.9V, activity factor of 0.01.
Results: Power Reduction
• We perform a comparative analysis of:
  ◦ Circuit with DVFS
  ◦ Circuit with DVFS + LPM
• Both dynamic and leakage power reduce.
• 6-12% reduction in total power at low-power mode.
[Figure: total power (mW) vs. frequency (MHz) for DVFS and DVFS+LPM. Panels: AES (power axis 2-4.5 mW; frequency ticks 445, 438, 432, 389, 354, 349, 343, 337 MHz) and OpenRisc (power axis 7-12 mW; frequency ticks 192, 187, 181, 173, 164, 159, 154, 149 MHz). Labeled operating points: VDD=1.1 LPM=0; VDD=1.1 LPM=1; VDD=0.9 LPM=0; VDD=0.9 LPM=1.]
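The four operating points (two VDD levels, each with LPM off or on) suggest a simple runtime policy: pick the lowest-power mode that still meets the required frequency. The sketch below assumes such a policy; the slides do not describe the runtime controller. The `Mode` record and `pick_mode` are hypothetical, and the frequency and power values are normalized placeholders, not numbers read from the plots.

```python
from typing import NamedTuple, Optional, Sequence


class Mode(NamedTuple):
    vdd: float        # supply voltage in volts
    lpm: bool         # True = low-power mode (half the repeater devices off)
    max_freq: float   # highest supported frequency (normalized placeholder)
    power: float      # total power at that frequency (normalized placeholder)


def pick_mode(modes: Sequence[Mode], required_freq: float) -> Optional[Mode]:
    """Return the lowest-power mode that meets required_freq, or None."""
    feasible = [m for m in modes if m.max_freq >= required_freq]
    if not feasible:
        return None
    return min(feasible, key=lambda m: m.power)


# Placeholder table: made-up normalized values, NOT measured results.
PLACEHOLDER_MODES = [
    Mode(vdd=1.1, lpm=False, max_freq=1.00, power=1.00),
    Mode(vdd=1.1, lpm=True,  max_freq=0.95, power=0.90),
    Mode(vdd=0.9, lpm=False, max_freq=0.80, power=0.60),
    Mode(vdd=0.9, lpm=True,  max_freq=0.75, power=0.55),
]

if __name__ == "__main__":
    for f in (1.0, 0.9, 0.7):
        print(f, pick_mode(PLACEHOLDER_MODES, f))
```

Under this policy the LPM modes fill the gaps between the discrete DVFS voltages, and they remain available when VDD cannot be lowered any further.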
Results: Area Overhead
• Logic area overhead due to control gates
  ◦ Depends on SSR.
  ◦ Smaller if control gates can be placed in whitespace.
[Figure: logic area overhead (axis 0.00%-6.00%) for s38417 (SSR=13.5%), AES (SSR=7.95%), OpenRisc (SSR=9.77%).]
• Routing overhead
  ◦ LPM and complementary LPM signals are routed to the control gates.
    The routing overhead depends on the locations of the control gates.
    The number of control gates is small → the overhead is small.
  ◦ V’DD and V’SS are routed to all repeaters.
  ◦ For overhead estimation, nets are assumed to be Steiner trees.
[Figure: routing overhead (axis 3.32%-3.46%) for s38417, AES, OpenRisc.]
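For a rough feel of the wirelength estimate, a rectilinear minimum spanning tree (MST) can stand in for the Steiner tree. This is a substitute technique, not what the slides report using: a rectilinear MST is never shorter than the minimum rectilinear Steiner tree and is known to be at most 1.5 times longer, so it gives a conservative estimate. The function name and pin coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rectilinear_mst_length(pins: List[Point]) -> float:
    """Total Manhattan length of a minimum spanning tree over the pins.

    Uses Prim's algorithm with an O(n^2) scan, which is adequate for a sketch.
    """
    n = len(pins)
    if n < 2:
        return 0.0

    in_tree = [False] * n
    dist = [float("inf")] * n   # cheapest connection cost of each pin to the tree
    dist[0] = 0.0               # start the tree at the first pin
    total = 0.0

    for _ in range(n):
        # Pick the closest pin that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        # Relax the connection cost of the remaining pins.
        for v in range(n):
            if not in_tree[v]:
                d = abs(pins[u][0] - pins[v][0]) + abs(pins[u][1] - pins[v][1])
                if d < dist[v]:
                    dist[v] = d
    return total


if __name__ == "__main__":
    # Illustrative net: a control gate at the origin feeding two repeaters.
    print(rectilinear_mst_length([(0.0, 0.0), (2.0, 0.0), (2.0, 3.0)]))
```

Summing this length over the V’DD, V’SS, and LPM nets and dividing by the total routed wirelength of the design would give an overhead percentage comparable in spirit to the one plotted.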
Conclusions
• Presented a novel technique that dynamically trades off power and performance by turning off devices not needed at less than max. performance.
  ◦ Both leakage and dynamic power reduce; total power reduction is 6-12% on our testcases.
  ◦ By sharing control gates, area overhead is reduced to <5.57%.
  ◦ No adverse effect on the performance of the circuit when the LPM signal is OFF.
• Future work:
  ◦ Actual layout of adjustable repeaters with routing of V’DD, V’SS, and LPM nets to accurately estimate power, performance, and area impacts.
  ◦ Customization of more cells, especially clock repeaters, to further improve the power-performance tradeoff.
Thank You
Questions?