Processor Power Components

25
Computer Structure 2013 – Power Management 1 Computer Structure Power Management Lihu Rappoport and Adi Yoaz Thanks to Efi Rotem for many of the foils

description

Computer Structure Power Management Lihu Rappoport and Adi Yoaz Thanks to Efi Rotem for many of the foils . Processor Power Components. The power consumed by a processor consists of Dynamic power: power for toggling transistors and lines from 0  1 or 1  0 - PowerPoint PPT Presentation

Transcript of Processor Power Components

Page 1: Processor Power Components

Computer Structure 2013 – Power Management 1

Computer Structure

Power Management

Lihu Rappoport and Adi Yoaz

Thanks to Efi Rotem for many of the foils

Page 2: Processor Power Components

Computer Structure 2013 – Power Management 2

Processor Power Components· The power consumed by a processor consists of

– Dynamic power: power for toggling transistors and lines from 01 or 10 αCV2f : α – activity, C – capacitance, V – voltage, f – frequency

– Leakage power: leakage of transistors under voltage function of: Z – total size of all transistors, V – voltage, t – temperature

· Peak power must not exceed the thermal constrains– Power generates heat

Heat must be dissipated to keep transistors within allowed temperature– Peak power determines peak frequency (and thus peak performance)– Also affects form factor, cooling solution cost, and acoustic noise

· Average power– Determines battery life (for mobile devices), electricity bill, air-condition bill– Average power = Total Energy / Total time

Including low-activity and idle-time (~90% idle time for client)

Page 3: Processor Power Components

Computer Structure 2013 – Power Management 3

Performance per Watt· In small form-factor devices thermal budget limits performance

– Old target: get max performance– New target: get max performance at a given power envelope

Performance per Watt

· Increasing f also requires increasing V (~linearly)– Dynamic Power = αCV2f = Kf3 X% performance costs ~3X% power– A power efficient feature – better than 1:3 performance : power

Otherwise it is better to just increase frequency (and voltage)

· Vmin is the minimal operation voltage– Once at Vmin, reducing frequency no longer reduces voltage– At this point a feature is power efficient only if it is 1:1 performance : power

· Active energy efficiency tradeoff– Energyactive = Poweractive × Timeactive Poweractive / Perfactive

– Energy efficient feature: 1:1 performance : power

Page 4: Processor Power Components

Computer Structure 2013 – Power Management 4

Platform Power· Processor average power is <10% of the platform

Display(panel + inverter)

33%

CPU10%

Power Supply10%

MCH9%

Misc.8%

GFX8%

HDD8%

CLK5%

ICH3%

DVD2%

LAN2%

Fan2%

Page 5: Processor Power Components

Computer Structure 2013 – Power Management 5

Managing Power· Typical CPU usage varies over time

– Bursts of high utilization & long idle periods (~90% of time in client)

· Optimize power and energy consumption– High power when high performance is needed– Low power at low activity or idle

· Enhanced Intel SpeedStep® Technology– Multi voltage/frequency operating points– OS changes frequency to meet performance needs and minimize power– Referred to as processor Performance states = P-States

· OS notifies CPU when no tasks are ready for execution– CPU enters sleep state, called C-state– Using MWAIT instruction, with C-state level as an argument– Tradeoff between power and latency

Deeper sleep more power savings longer to wake

Page 6: Processor Power Components

Computer Structure 2013 – Power Management 6

P-states· Operation frequencies are called P-states = Performance states

– P0 is the highest frequency – P1,2,3… are lower frequencies– Pn is the min Vcc point = Energy efficient point

· DVFS = Dynamic Voltage and Frequency Scaling– Power = CV2f ; f = KV Power ~ f3 – Program execution time ~ 1/f– E = P×t E ~ f2

Pn is the most energy efficient point

– Going up/down the cubic curve of power High cost to achieve frequency large power savings for

some small frequency reduction

P0P1

Pn

Freq

Power

P2

Page 7: Processor Power Components

Computer Structure 2013 – Power Management 7

C-States: C0

· C0: CPU active state

Leakage

Clock Distribution

Local Clocks and Logic

Active Core Power

Page 8: Processor Power Components

Computer Structure 2013 – Power Management 8

C-States: C1

· C0: CPU active state· C1: Halt state:

• Stop core pipeline• Stop most core clocks• No instructions are executed• Caches respond to external snoops

Leakage

Clock Distribution

Active Core Power

Page 9: Processor Power Components

Computer Structure 2013 – Power Management 9

C-States: C3

· C0: CPU active state· C1: Halt state:

• Stop core pipeline• Stop most core clocks• No instructions are executed• Caches respond to external snoops

· C3 state:• Stop remaining core clocks• Flush internal core caches

Leakage

Active Core Power

Page 10: Processor Power Components

Computer Structure 2013 – Power Management 10

C-States: C6

· C0: CPU active state· C1: Halt state:

• Stop core pipeline• Stop most core clocks• No instructions are executed• Caches respond to external snoops

· C3 state:• Stop remaining core clocks• Flush internal core caches

· C6 state:• Processor saves architectural state• Turn off power gate, eliminating leakage

Leakage

Core power goes to ~0

Active Core Power

Page 11: Processor Power Components

Computer Structure 2013 – Power Management 11

Putting it all together

· CPU running at max power and frequency· Periodically enters C1

0

2

4

6

8

10

12

14

16

18

20

Pow

er [W

]

C1

C0P0

Time

Page 12: Processor Power Components

Computer Structure 2013 – Power Management 12

Putting it all together

· Going into idle period– Gradually enters deeper C states– Controlled by OS

0

2

4

6

8

10

12

14

16

18

20

Time

Pow

er [W

]

C2C3

C4C1

C0P0

Page 13: Processor Power Components

Computer Structure 2013 – Power Management 13

Putting it all together

· Tracking CPU utilization history– OS identifies low activity– Switches CPU to lower P state

0

2

4

6

8

10

12

14

16

18

20

Time

Pow

er [W

]

C2C3

C4

C0P1

C1

C0P0

Page 14: Processor Power Components

Computer Structure 2013 – Power Management 14

Putting it all together

· CPU enters Idle state again

0

2

4

6

8

10

12

14

16

18

20

Time

Pow

er [W

]

C2C3

C4

C0P1

C2C3

C4C1

C0P0

Page 15: Processor Power Components

Computer Structure 2013 – Power Management 15

· Further lowering the P state· DVD play runs at lowest P state

0

2

4

6

8

10

12

14

16

18

20

Putting it all together

Time

Pow

er [W

]

C2C3

C4

C0P1

C0P2C2

C3C4

C1

C0P0

Page 16: Processor Power Components

Computer Structure 2013 – Power Management 16

Voltage and Frequency Domains· Two Independent Variable Power Planes

– CPU cores, ring and LLC Embedded power gates – each core

can be turned off individually Cache power gating – turn off portions

or all cache at deeper sleep states– Graphics processor

Can be varied or turned off when not active

· Shared frequency for all IA32 cores and ring

· Independent frequency for PG· Fixed Programmable power plane

for System Agent – Optimize SA power consumption– System On Chip functionality and PCU logic– Periphery: DDR, PCIe, Display

VCC Core(Gated)

VCC Core(Gated)

VCC Core(Gated)

VCC Core(Gated)

VCC Core(ungated)

VCC SA

VCC Graphics

VCC Periphery

VCC Periphery

Embe

dded

pow

er g

ates

Page 17: Processor Power Components

Computer Structure 2013 – Power Management 17

Turbo Mode· P1 is guaranteed frequency

– CPU and GFX simultaneous heavy load at worst case conditions– Actual power has high dynamic range

· P0 is max possible frequency – the Turbo frequency– P1-P0 has significant frequency range (GHz)

Single thread or lightly loaded applications GFX <>CPU balancing

– OS treats P0 as any other P-state Requesting is when it needs more performance

– P1 to P0 range is fully H/W controlled Frequency transitions handled completely in HW PCU keeps silicon within existing operating limits

– Systems designed to same specs, with or without Turbo Mode

· Pn is the energy efficient state– Lower than Pn is controlled by Thermal-State

“Turbo”H/W

Control

OS VisibleStates

OS Control

T-state &Throttle

P1

Pn

P0 1C

freq

uenc

y

LFM

Page 18: Processor Power Components

Computer Structure 2013 – Power Management 18

Freq

uenc

y (F

)

Freq

uenc

y (F

)

No Turbo

Core

0

Core 1Core 2Core 3

Core 2Core 3Co

re 0

Core 1Power Gating

Zero power for inactive cores

Turbo Mode

Workload Lightly Threaded

Page 19: Processor Power Components

Computer Structure 2013 – Power Management 19

Workload Lightly Threaded

Freq

uenc

y (F

)

Freq

uenc

y (F

)

No Turbo

Core

0

Core 1Core 2Core 3

Turbo ModeUse thermal budget of

inactive core to increase

frequency of active cores

Core

0

Core 1

Turbo Mode

Power GatingZero power for inactive cores

Page 20: Processor Power Components

Computer Structure 2013 – Power Management 20

Freq

uenc

y (F

)

Freq

uenc

y (F

)

No Turbo

Core

0

Core 1Core 2Core 3

Workload Lightly Threaded Co

re 0

Core 1

Turbo Mode

Power GatingZero power for inactive cores

Turbo ModeUse thermal budget of

inactive core to increase

frequency of active cores

Page 21: Processor Power Components

Computer Structure 2013 – Power Management 21

Active cores running workloads < TDP

Freq

uenc

y (F

)

Freq

uenc

y (F

)

No Turbo

Core

0

Core 1Core 2Core 3

Core 2 Core 3

C

ore

0 Core 1Core 2Core 3Co

re 0

Core 1

Turbo Mode

Turbo ModeIncrease frequency

within thermal headroom

Page 22: Processor Power Components

Computer Structure 2013 – Power Management 22

Freq

uenc

y (F

)

Freq

uenc

y (F

)

No Turbo

Core

0

Core 1Core 2Core 3

Workload Lightly Threaded

And active cores < TDP

Core 2 Core 3

Core 1

Cor

e 0

Turbo Mode

Turbo ModeIncrease frequency

within thermal headroom

Power GatingZero power for inactive cores

Page 23: Processor Power Components

Computer Structure 2013 – Power Management 23

Thermal CapacitanceClassic Model

Steady-State Thermal Resistance

Design guide for steady state

Tem

pera

ture

Time

Classic model response

Temperature rises as energy is delivered to thermal solutionThermal solution response is calculated at real-time

Tem

pera

ture

Time

More realistic response to power

changes

New ModelSteady-State Thermal Resistance

ANDDynamic Thermal Capacitance

Foil taken from IDF 2011

Page 24: Processor Power Components

Computer Structure 2013 – Power Management 24

Time

Power

Sleep orLow power

Turbo Boost 2.0

“TDP”

C0/P0(Turbo)

After idle periods, the system accumulates “energy budget” and can accommodate high power/performance for a few seconds

In Steady State conditions the power stabilizes on TDP

P > TDP:

Responsiveness

Sust

ain

power

Buildup thermal budget during idle periods

Use accumulated

energy budget to enhance user

experience

Intel® Turbo Boost Technology 2.0

Foil taken from IDF 2011

Page 25: Processor Power Components

Computer Structure 2013 – Power Management 25

Core and Graphic Power Budgeting• Cores and Graphics integrated on the same die with

separate voltage/frequency controls; tight HW control• Full package power specifications available for sharing• Power budget can shift between Cores and Graphics

Core Power [W]

Graphics Power[W]

Total package power

Realistic concurrentmax power

Sum of max power

Heavy Graphicsworkload

Heavy CPUworkload

SpecificationCore Power

Specification Graphics Power

Applications

Sandy Bridge Next Gen Turbo

for short periods

Foil taken from IDF 2011