
Computer Structure 2012 – Power Management 1

Computer Structure

Power Management

Lihu Rappoport and Adi Yoaz

Thanks to Efi Rotem for many of the foils

Computer Structure 2012 – Power Management 2

Processor Power Components

· The power consumed by a processor consists of
  – Dynamic power: power for toggling transistors and lines from 0→1 or 1→0
    αCV²f, where α – activity, C – capacitance, V – voltage, f – frequency
  – Leakage power: leakage of transistors under voltage
    A function of: Z – total size of all transistors, V – voltage, t – temperature

· Peak power must not exceed the thermal constraints
  – Power generates heat
    Heat must be dissipated to keep transistors within the allowed temperature
  – Peak power determines peak frequency (and thus peak performance)
  – Also affects form factor, cooling solution cost, and acoustic noise

· Average power
  – Determines battery life (for mobile devices), electricity bill, air-conditioning bill
  – Average power = Total energy / Total time
    Including low-activity and idle time (~90% idle time for client)
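To make the two definitions on this slide concrete, here is a minimal Python sketch. The function names and every number in it (activity factor, capacitance, idle power, the 10%/90% split) are illustrative assumptions, not measurements from the lecture.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """Dynamic power in watts: alpha * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * frequency


def average_power(phases):
    """Average power = total energy / total time.

    phases: iterable of (power_watts, duration_seconds) pairs.
    """
    phases = list(phases)
    total_energy = sum(p * t for p, t in phases)
    total_time = sum(t for _, t in phases)
    return total_energy / total_time


# Made-up operating point: alpha = 0.2, C = 10 nF, V = 1.0 V, f = 2 GHz.
active = dynamic_power(0.2, 10e-9, 1.0, 2e9)

# Active 10% of the time, idle 90% of the time at an assumed 0.5 W.
avg = average_power([(active, 10.0), (0.5, 90.0)])
print(f"active: {active:.2f} W, average: {avg:.2f} W")
```

With a mostly idle workload, the average is dominated by the idle power, which is why the later slides spend so much effort on sleep states.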

Computer Structure 2012 – Power Management 3

Performance per Watt

· In small form-factor devices thermal budget limits performance
  – Old target: get max performance
  – New target: get max performance at a given power envelope
    ⇒ Performance per Watt

· Increasing f also requires increasing V (~linearly)
  – Dynamic power = αCV²f = Kf³ ⇒ X% performance costs ~3X% power
  – A power-efficient feature: better than 1:3 performance : power
    Otherwise it is better to just increase frequency (and voltage)

· Vmin is the minimal operating voltage
  – Once at Vmin, reducing frequency no longer reduces voltage
  – At this point a feature is power efficient only if it is 1:1 performance : power

· Active energy efficiency tradeoff
  – Energy_active = Power_active × Time_active ∝ Power_active / Perf_active
  – Energy-efficient feature: 1:1 performance : power
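The cubic relation and the 1:3 and 1:1 rules of thumb on this slide can be checked numerically. This is a small Python sketch; the helper names are hypothetical and the percentages are arbitrary examples.

```python
def relative_power(freq_ratio):
    """Relative dynamic power when V scales linearly with f, so P ~ f^3."""
    return freq_ratio ** 3


def is_power_efficient(perf_gain_pct, power_cost_pct, at_vmin=False):
    """Apply the slide's rule of thumb to a proposed feature.

    Above Vmin the feature must beat frequency scaling (better than 1:3).
    At Vmin voltage can no longer drop, so it must reach 1:1.
    """
    if at_vmin:
        return power_cost_pct <= perf_gain_pct
    return power_cost_pct < 3 * perf_gain_pct


for gain_pct in (1, 5, 10):
    cost_pct = (relative_power(1 + gain_pct / 100) - 1) * 100
    print(f"+{gain_pct}% frequency costs about +{cost_pct:.1f}% power")
```

The "~3X%" approximation holds for small X; for larger frequency steps the cubic term makes the cost grow faster than three times the gain.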

Computer Structure 2012 – Power Management 4

Platform Power

· Processor average power is <10% of the platform

[Pie chart: platform average power breakdown]

  Display (panel + inverter)   33%
  CPU                          10%
  Power Supply                 10%
  MCH                           9%
  Misc.                         8%
  GFX                           8%
  HDD                           8%
  CLK                           5%
  ICH                           3%
  DVD                           2%
  LAN                           2%
  Fan                           2%

Computer Structure 2012 – Power Management 5

Managing Power

· Typical CPU usage varies over time
  – Bursts of high utilization & long idle periods (~90% of time in client)

· Optimize power and energy consumption
  – High power when high performance is needed
  – Low power at low activity or idle

· Enhanced Intel SpeedStep® Technology
  – Multi voltage/frequency operating points
  – OS changes frequency to meet performance needs and minimize power
  – Referred to as processor Performance states = P-States

· OS notifies CPU when no tasks are ready for execution
  – CPU enters sleep state, called C-state
  – Using MWAIT instruction, with C-state level as an argument
  – Tradeoff between power and latency
    Deeper sleep ⇒ more power savings, but longer to wake
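The power/latency tradeoff can be pictured as a lookup: choose the lowest-power sleep state whose wake-up latency the system can still tolerate. The Python sketch below is a toy model only. The state table, its power and latency numbers, and the `pick_c_state` helper are all hypothetical, and it does not issue MWAIT or model any real OS idle policy.

```python
from typing import NamedTuple


class CState(NamedTuple):
    name: str
    power_w: float          # assumed power while in this state
    exit_latency_us: float  # assumed time needed to wake up


# Hypothetical values, for illustration only.
C_STATES = [
    CState("C1", 4.0, 1.0),
    CState("C3", 1.5, 50.0),
    CState("C6", 0.1, 200.0),
]


def pick_c_state(latency_tolerance_us, states=C_STATES):
    """Return the lowest-power state whose exit latency fits the tolerance.

    Returns None when no sleep state qualifies (the core stays in C0).
    """
    candidates = [s for s in states if s.exit_latency_us <= latency_tolerance_us]
    return min(candidates, key=lambda s: s.power_w) if candidates else None


for tolerance in (0.5, 10, 100, 1000):
    chosen = pick_c_state(tolerance)
    print(tolerance, "us ->", chosen.name if chosen else "stay in C0")
```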

Computer Structure 2012 – Power Management 6

P-states

· Operating frequencies are called P-states = Performance states
  – P0 is the highest frequency
  – P1, P2, P3, … are lower frequencies
  – Pn is the min Vcc point = energy-efficient point

· DVFS = Dynamic Voltage and Frequency Scaling
  – Power = CV²f ; f = KV ⇒ Power ~ f³
  – Program execution time ~ 1/f
  – E = P×t ⇒ E ~ f²
    Pn is the most energy-efficient point
  – Going up/down the cubic curve of power
    High cost to achieve frequency
    Large power savings for some small frequency reduction

[Figure: power vs. frequency curve with operating points Pn, P2, P1, and P0 marked]
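The scaling argument on this slide (power ~ f³, time ~ 1/f, energy ~ f²) can be reproduced with a few lines of Python. This is a sketch in arbitrary units: the constants `k`, `v_min`, `c_eff`, and `work` are assumptions chosen for round numbers, and leakage is ignored.

```python
def dvfs_point(freq, k=0.5, v_min=0.5, c_eff=1.0, work=10.0):
    """Return (power, exec_time, energy) for a fixed amount of work.

    Assumes voltage tracks frequency linearly (V = k * f) but cannot
    drop below v_min. All quantities are in arbitrary units.
    """
    voltage = max(v_min, k * freq)
    power = c_eff * voltage ** 2 * freq   # dynamic power only
    exec_time = work / freq               # execution time ~ 1/f
    return power, exec_time, power * exec_time


for f in (2.0, 1.0, 0.5):
    p, t, e = dvfs_point(f)
    print(f"f={f}: power={p:.3f}, time={t:.1f}, energy={e:.2f}")
```

Halving the frequency from 2.0 to 1.0 cuts power by 8x and energy by 4x in this model. Below the point where the voltage hits `v_min`, dynamic energy stops improving, which is consistent with the slide calling Pn (the min Vcc point) the most energy-efficient point.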

Computer Structure 2012 – Power Management 7

C-States: C0

· C0: CPU active state

[Figure: active core power bar, split into leakage, clock distribution, and local clocks and logic]

Computer Structure 2012 – Power Management 8

C-States: C1

· C0: CPU active state
· C1: Halt state:
  • Stop core pipeline
  • Stop most core clocks
  • No instructions are executed
  • Caches respond to external snoops

[Figure: active core power bar; components shown: leakage and clock distribution]

Computer Structure 2012 – Power Management 9

C-States: C3

· C0: CPU active state
· C1: Halt state:
  • Stop core pipeline
  • Stop most core clocks
  • No instructions are executed
  • Caches respond to external snoops
· C3 state:
  • Stop remaining core clocks
  • Flush internal core caches

[Figure: active core power bar; component shown: leakage]

Computer Structure 2012 – Power Management 10

C-States: C6

· C0: CPU active state
· C1: Halt state:
  • Stop core pipeline
  • Stop most core clocks
  • No instructions are executed
  • Caches respond to external snoops
· C3 state:
  • Stop remaining core clocks
  • Flush internal core caches
· C6 state:
  • Processor saves architectural state
  • Turn off power gate, eliminating leakage

[Figure: active core power bar with the leakage component; core power goes to ~0]

Computer Structure 2012 – Power Management 11

Putting it all together

· CPU running at max power and frequency
· Periodically enters C1

[Figure: power [W] (axis 0 to 20) vs. time; labeled states: C0P0, C1]

Computer Structure 2012 – Power Management 12

Putting it all together

· Going into idle period
  – Gradually enters deeper C states
  – Controlled by OS

[Figure: power [W] (axis 0 to 20) vs. time; labeled states: C0P0, C1, C2, C3, C4]

Computer Structure 2012 – Power Management 13

Putting it all together

· Tracking CPU utilization history
  – OS identifies low activity
  – Switches CPU to lower P state

[Figure: power [W] (axis 0 to 20) vs. time; labeled states: C0P0, C1, C2, C3, C4, C0P1]
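A utilization-driven P-state policy of the kind this slide describes can be sketched as a small Python class. Everything here is a hypothetical illustration: the class name, the window length, and the up/down thresholds are assumptions, not the policy of any real operating system.

```python
from collections import deque


class SimpleGovernor:
    """Toy P-state chooser driven by recent utilization.

    Index 0 stands for P0 (highest frequency); larger indices are
    lower-frequency P-states.
    """

    def __init__(self, num_p_states, window=4, up=0.8, down=0.3):
        self.num_p_states = num_p_states
        self.history = deque(maxlen=window)  # recent utilization samples
        self.up, self.down = up, down
        self.p_state = 0

    def update(self, utilization):
        """Record one utilization sample (0.0 to 1.0) and return the P-state index."""
        self.history.append(utilization)
        avg = sum(self.history) / len(self.history)
        if avg > self.up and self.p_state > 0:
            self.p_state -= 1   # busy: step toward P0
        elif avg < self.down and self.p_state < self.num_p_states - 1:
            self.p_state += 1   # mostly idle: step toward Pn
        return self.p_state


gov = SimpleGovernor(num_p_states=3)
trace = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print([gov.update(u) for u in trace])
```

Averaging over a window is what makes the policy react to sustained low activity rather than to a single idle sample, at the cost of a delay before it steps back up.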

Computer Structure 2012 – Power Management 14

Putting it all together

· CPU enters idle state again

[Figure: power [W] (axis 0 to 20) vs. time; labeled states: C0P0, C1, C2, C3, C4, C0P1, followed by C2, C3, C4 again]

Computer Structure 2012 – Power Management 15

Putting it all together

· Further lowering the P state
· DVD play runs at lowest P state

[Figure: power [W] (axis 0 to 20) vs. time; labeled states: C0P0, C1, C2, C3, C4, C0P1, C0P2]

Computer Structure 2012 – Power Management 16

Voltage and Frequency Domains

· Two independent variable power planes
  – CPU cores, ring and LLC
    Embedded power gates: each core can be turned off individually
    Cache power gating: turn off portions or all of the cache at deeper sleep states
  – Graphics processor
    Can be varied or turned off when not active

· Shared frequency for all IA32 cores and ring

· Independent frequency for PG

· Fixed programmable power plane for System Agent
  – Optimize SA power consumption
  – System On Chip functionality and PCU logic
  – Periphery: DDR, PCIe, Display

[Figure: die power planes; labels: four VCC Core (gated) regions with embedded power gates, VCC Core (ungated), VCC SA, VCC Graphics, VCC Periphery]

Computer Structure 2012 – Power Management 17

Turbo Mode

· P1 is the guaranteed frequency
  – CPU and GFX simultaneous heavy load at worst-case conditions
  – Actual power has high dynamic range

· P0 is the max possible frequency – the Turbo frequency
  – P1-P0 has a significant frequency range (GHz)
    Single-thread or lightly loaded applications
    GFX <-> CPU balancing
  – OS treats P0 as any other P-state
    Requesting it when it needs more performance
  – P1 to P0 range is fully H/W controlled
    Frequency transitions handled completely in HW
    PCU keeps silicon within existing operating limits
  – Systems designed to same specs, with or without Turbo Mode

· Pn is the energy-efficient state
  – Lower than Pn is controlled by Thermal-State

[Figure: frequency axis with points P0 (1C), P1, Pn, and LFM; labels: "Turbo" (H/W control), OS-visible states (OS control), T-state & throttle]

Computer Structure 2012 – Power Management 18

Turbo Mode

Workload lightly threaded

· Power gating: zero power for inactive cores

[Figure: two panels of per-core frequency (F) bars for Core 0 to Core 3; the first panel is labeled "No Turbo"]

Computer Structure 2012 – Power Management 19

Turbo Mode

Workload lightly threaded

· Power gating: zero power for inactive cores
· Turbo Mode: use thermal budget of inactive cores to increase frequency of active cores

[Figure: per-core frequency (F) bars; "No Turbo" panel with Core 0 to Core 3, "Turbo Mode" panel with Core 0 and Core 1]

Computer Structure 2012 – Power Management 20

Turbo Mode

Workload lightly threaded

· Power gating: zero power for inactive cores
· Turbo Mode: use thermal budget of inactive cores to increase frequency of active cores

[Figure: per-core frequency (F) bars; "No Turbo" panel with Core 0 to Core 3, "Turbo Mode" panel with Core 0 and Core 1]

Computer Structure 2012 – Power Management 21

Turbo Mode

Active cores running workloads < TDP

· Turbo Mode: increase frequency within thermal headroom

[Figure: per-core frequency (F) bars for Core 0 to Core 3; "No Turbo" panel and "Turbo Mode" panels]

Computer Structure 2012 – Power Management 22

Turbo Mode

Workload lightly threaded and active cores < TDP

· Power gating: zero power for inactive cores
· Turbo Mode: increase frequency within thermal headroom

[Figure: per-core frequency (F) bars for Core 0 to Core 3; "No Turbo" panel and "Turbo Mode" panel]

Computer Structure 2012 – Power Management 23

Thermal Capacitance

Classic model: steady-state thermal resistance
· Design guide for steady state
[Figure: temperature vs. time, classic model response]

New model: steady-state thermal resistance AND dynamic thermal capacitance
· Temperature rises as energy is delivered to the thermal solution
· Thermal solution response is calculated in real time
[Figure: temperature vs. time, more realistic response to power changes]

Foil taken from IDF 2011
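The difference between the two models can be illustrated with a first-order thermal RC circuit, which is a common textbook approximation and not necessarily the model the foil's authors use. In this Python sketch the resistance, capacitance, ambient temperature, and power step are all assumed values.

```python
def steady_state_temp(power_w, r_th, t_ambient=40.0):
    """Classic model: temperature is set by thermal resistance alone."""
    return t_ambient + power_w * r_th


def simulate_rc(power_trace, r_th, c_th, dt, t_start, t_ambient=40.0):
    """Resistance plus capacitance: integrate C * dT/dt = P - (T - T_ambient) / R.

    Uses forward Euler steps of length dt; returns the temperature after each step.
    """
    temps = []
    temp = t_start
    for power in power_trace:
        temp += dt * (power - (temp - t_ambient) / r_th) / c_th
        temps.append(temp)
    return temps


# Assumed values: R = 2 K/W, C = 10 J/K (time constant R*C = 20 s), 20 W step.
temps = simulate_rc([20.0] * 2000, r_th=2.0, c_th=10.0, dt=0.1, t_start=40.0)
print(f"after 1 s: {temps[9]:.1f} C, after 200 s: {temps[-1]:.1f} C")
print(f"steady-state prediction: {steady_state_temp(20.0, 2.0):.1f} C")
```

The steady-state model jumps straight to the final temperature, while the RC model approaches it gradually. That lag is the headroom a short burst above the sustained power level can use.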

Computer Structure 2012 – Power Management 24

Intel® Turbo Boost Technology 2.0

· After idle periods, the system accumulates "energy budget" and can accommodate high power/performance for a few seconds
  – P > TDP: responsiveness
· In steady-state conditions the power stabilizes on TDP
  – Sustain power
· Build up thermal budget during idle periods
· Use accumulated energy budget to enhance user experience

[Figure: power vs. time, with a "TDP" line and regions labeled "Sleep or Low power", "C0/P0 (Turbo)", and "Turbo Boost 2.0"]

Foil taken from IDF 2011
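The "energy budget" idea can be mimicked with a simple accounting rule: running below TDP banks energy, running above TDP spends it. The Python sketch below is a toy model under assumed numbers (TDP, turbo limit, budget cap); the function name is hypothetical and this is not Intel's actual control algorithm.

```python
def run_turbo(demand_trace, tdp=35.0, turbo_limit=45.0, budget_cap=100.0, dt=1.0):
    """Return the power granted at each step under a simple energy-budget rule.

    demand_trace: requested power in watts, one value per step of dt seconds.
    """
    budget = 0.0   # joules of accumulated headroom
    granted = []
    for demand in demand_trace:
        # Never grant more than the turbo limit, and never spend more
        # energy above TDP than the budget currently holds.
        limit = min(turbo_limit, tdp + budget / dt)
        power = min(demand, limit)
        budget = min(budget_cap, max(0.0, budget + (tdp - power) * dt))
        granted.append(power)
    return granted


# Three near-idle seconds, then a sustained heavy request.
print(run_turbo([5] * 3 + [60] * 5, budget_cap=25.0))
```

In this model the burst after the idle period runs above TDP until the banked energy is spent, and then the granted power settles at TDP, which matches the shape the foil describes.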

Computer Structure 2012 – Power Management 25

Core and Graphics Power Budgeting

• Cores and Graphics integrated on the same die with separate voltage/frequency controls; tight HW control
• Full package power specifications available for sharing
• Power budget can shift between Cores and Graphics

[Figure: Core Power [W] and Graphics Power [W] axes; labels: total package power, "Sum of max power", "Realistic concurrent max power", specification core power, specification graphics power, applications ranging from heavy graphics workload to heavy CPU workload, and "Sandy Bridge Next Gen Turbo for short periods"]

Foil taken from IDF 2011
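Budget shifting between cores and graphics can be sketched as splitting one package limit between two consumers. In the Python sketch below, the proportional scaling rule, the function name, and all wattages are assumptions made for illustration; the foil does not say how the hardware divides the budget.

```python
def share_package_budget(core_demand, gfx_demand, package_limit, core_max, gfx_max):
    """Split a package power limit between cores and graphics.

    Each side is first capped at its own specification maximum. If the sum
    still exceeds the package limit, both are scaled down proportionally
    (an assumed policy, chosen for simplicity).
    """
    core = min(core_demand, core_max)
    gfx = min(gfx_demand, gfx_max)
    total = core + gfx
    if total > package_limit:
        scale = package_limit / total
        core, gfx = core * scale, gfx * scale
    return core, gfx


# Heavy CPU workload: graphics is nearly idle, so the cores get most of the budget.
print(share_package_budget(30, 5, package_limit=35, core_max=30, gfx_max=20))
# Both heavy: the sum of the maxima (50 W) exceeds the package limit (35 W).
print(share_package_budget(30, 20, package_limit=35, core_max=30, gfx_max=20))
```

Because the package limit is lower than the sum of the two maxima, either side can reach its own maximum only when the other is lightly loaded.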