Power Managementece3056-sy.ece.gatech.edu/wp-content/uploads/sites/5… ·  · 2017-12-09v Power...


Power Management

Lecture notes S. Yalamanchili and S. Mukhopadhyay

Basic Trends


(3)

Technology Scaling

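• 30% scaling down in dimensions → doubles transistor density

• Power per transistor
  - Vdd scaling → lower power

$P = \alpha C V_{dd}^{2} f + V_{dd} I_{leak}$

• Transistor delay

$\text{Delay} = k \cdot C \, \dfrac{V_{dd}}{(V_{dd} - V_t)^{2}}$

[Figure: MOSFET views labeling gate, source, drain, body, oxide thickness t_ox, and channel length L]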

(4)

Moore’s Law

[Figure: Moore’s Law plot. From wikipedia.org]

• Performance scaled with number of transistors

• Dennard scaling*: power scaled with feature size

Goal: Sustain Performance Scaling

*R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.


(5)

Parallelism and Power

[Figures: IBM Power5 (Source: IBM); AMD Trinity (Source: forwardthinking.pcmag.com)]

• How much of the chip area is devoted to compute?

• Run many cores slower. Why does this reduce power?

(6)

Parallelism

• Concurrency + lower frequency → greater energy efficiency

$P = \alpha C V_{dd}^{2} f + V_{dd} I_{leak}$

[Figure: multicore chip made of core/cache pairs]

Example

• 4X #cores
• 0.75x voltage
• 0.5x frequency
• ~1X power
• 2X in performance
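The numbers in this example can be checked against the dynamic term of the power equation. The following is a minimal sketch, assuming leakage is ignored, the activity factor and capacitance per core are unchanged, and the workload scales perfectly across cores; the function names are illustrative only.

```python
def relative_dynamic_power(cores, v_scale, f_scale):
    """Dynamic power relative to one baseline core, from P = alpha*C*Vdd^2*f."""
    return cores * v_scale ** 2 * f_scale


def relative_throughput(cores, f_scale):
    """Throughput relative to one baseline core, assuming perfect scaling."""
    return cores * f_scale


power = relative_dynamic_power(cores=4, v_scale=0.75, f_scale=0.5)
perf = relative_throughput(cores=4, f_scale=0.5)
print(f"power = {power:.3f}x, performance = {perf:.1f}x")
```

Under these assumptions the dynamic power comes out at 1.125 times the baseline, which is the "~1X power" of the example, for twice the performance.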


(7)

The Power Wall

• Power per transistor scales with frequency but also scales with Vdd
  - Lower Vdd can be compensated for with increased pipelining to keep throughput constant
  - Power per transistor is not the same as power per area → power density is the problem!
  - Multiple units can be run at lower frequencies to keep throughput constant, while saving power

$P = \alpha C V_{dd}^{2} f + V_{dd} I_{leak}$

(8)

What is the Problem?

[Figure. Source: Mukhopadhyay and Yalamanchili (2009)]

• Based on scaling using Pentium-class cores
• While Moore’s Law continues, scaling phenomena have changed
• Power densities are increasing with each generation


(9)

ITRS Roadmap for Logic Devices

From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge et al., 2008

Power Management Basics


(11)

What are my Options?

1. Better technology
  - Manufacturing
  - Better devices (FinFET)
  - New devices → non-CMOS? → this is the future

2. Be more efficient: activity management
  - Clock gating: dynamic energy/power
  - Power gating: static energy/power
  - Power state management: both

3. Improved architecture
  - Simpler pipelines

4. Parallelism

Not this course

Where does the power go?

(12)

Distribution of Power on Chip*

* From G. Chandra, P. Kapur and K. C. Saraswat, “Scaling Trends for Distribution of on chip Power”, Stanford University, 2011

Circa 2011 (50nm)

Modern Estimates are 20%-40% (courtesy S. Mukhopadhyay)


(13)

Activity Management

Clock Gating

• Turn off clock to a block of logic

• Eliminate unnecessary transitions/activity

• Clock distribution power

Power Gating

• Turn off power to a block of logic, e.g., core

• No leakage

[Figures: clock gating, with clk and cond controlling the clock to a block of combinational logic; power gating, with a power gate transistor between Vdd and Core 0 / Core 1]

(14)

Qualcomm Snapdragon

www.quora.com

• Heterogeneous architecture

• Different components used at different points in time


(15)

Multiple Voltage-Frequency Domains

Intel Sandy Bridge Processor (from E. Rotem et al., Hot Chips 2011)

• Cores and ring in one DVFS domain
• Graphics unit in another DVFS domain
• Cores and portion of cache can be gated off

(16)

Processor Power States

• Performance States: P-states
  - Operate at different voltage/frequencies
    - Recall delay-voltage relationship
  - Lower voltage → lower leakage
  - Lower frequency → lower power (not the same as energy!)
  - Lower frequency → longer execution time

• Idle States: C-states
  - Sleep states
  - Differ in how much state is saved

• SW or HW managed transitions between states!
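Lower power is not the same as lower energy, because energy is power multiplied by execution time. The sketch below illustrates this with made-up numbers: a task with a fixed cycle count, dynamic power proportional to Vdd²·f, and a leakage current held constant for simplicity (in practice leakage also depends on voltage and temperature). The parameter values are assumptions, not measurements.

```python
def task_energy(cycles, freq_hz, vdd, c_eff=1e-9, i_leak=0.5):
    """Return (power_W, time_s, energy_J) for a task of a fixed cycle count.

    c_eff (activity factor times capacitance, in farads) and i_leak (amps)
    are illustrative values, not data for any real processor.
    """
    power = c_eff * vdd ** 2 * freq_hz + vdd * i_leak  # dynamic + static
    time = cycles / freq_hz
    return power, time, power * time


fast = task_energy(cycles=2e9, freq_hz=2e9, vdd=1.0)
slow_same_v = task_energy(cycles=2e9, freq_hz=1e9, vdd=1.0)
slow_low_v = task_energy(cycles=2e9, freq_hz=1e9, vdd=0.8)

for name, (p, t, e) in [("fast", fast),
                        ("slow, same Vdd", slow_same_v),
                        ("slow, lower Vdd", slow_low_v)]:
    print(f"{name:16s} power={p:.2f} W  time={t:.1f} s  energy={e:.2f} J")
```

With these numbers, halving the frequency alone lowers power but raises total energy, because the task leaks for twice as long. Energy only drops once the voltage is lowered along with the frequency.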


(17)

Example of P-states

• Software Managed Power States

• Changing Power States is not free

AMD Trinity A10-5800 APU: 100W TDP

Class             CPU P-state   Voltage (V)   Freq (MHz)
HW Only (Boost)   Pb0           1             2400
HW Only (Boost)   Pb1           0.875         1800
SW-Visible        P0            0.825         1600
SW-Visible        P1            0.812         1400
SW-Visible        P2            0.787         1300
SW-Visible        P3            0.762         1100
SW-Visible        P4            0.75          900
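To get a feel for what these states mean for power, the dynamic term αCVdd²f can be evaluated for each state relative to the top boost state. This is only a rough sketch: it assumes α and C are the same in every state and ignores leakage, and the dictionary and function names are illustrative.

```python
# (voltage in V, frequency in MHz), copied from the P-state table above.
P_STATES = {
    "Pb0": (1.0, 2400), "Pb1": (0.875, 1800),
    "P0": (0.825, 1600), "P1": (0.812, 1400), "P2": (0.787, 1300),
    "P3": (0.762, 1100), "P4": (0.75, 900),
}


def relative_dynamic_power(state, reference="Pb0"):
    """Ratio of Vdd^2 * f between a state and the reference state."""
    v, f = P_STATES[state]
    v_ref, f_ref = P_STATES[reference]
    return (v ** 2 * f) / (v_ref ** 2 * f_ref)


for name in P_STATES:
    print(f"{name:4s} {relative_dynamic_power(name):.3f}")
```

Under these assumptions P4 draws roughly a fifth of the dynamic power of Pb0 while running at 37.5% of its frequency.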

(18)

Example of P-states

From: http://www.intel.com/content/www/us/en/processors/core/2nd-gen-core-family-mobile-vol-1-datasheet.html


(19)

Management Knobs

• Each core can be in any one of multiple states

• How do I decide what state to set each core to?
  - Who decides? HW? SW?

• How do I decide when I can turn off a core?

• What am I saving? Static energy or dynamic energy?

(20)

Power Management

• Software controlled power management
  - Optimize power and/or energy
  - Orchestrated by the operating system or application libraries
  - Industry standard interfaces for power management
    - Advanced Configuration and Power Interface (ACPI)
      - https://www.acpica.org/
      - http://www.acpi.info/

• Hardware power management
  - Optimized power/energy
  - Failsafe operation, e.g., protect against thermal emergencies → automatically throttle when exceeding temperature bounds.


(21)

Power Management3.0

[Figure: three plots. Die temperature vs. time, marking the thermal headroom. Performance vs. CPU DVFS state, with HW-only boost states (Pb0, Pb1) and SW-visible states (P0, P1, P2, ..., Pmin). Instructions/cycle vs. time.]

• Convert thermal headroom to higher performance through boost

• Performance and energy efficiency depend on effective utilization of power and thermal headroom

(22)

Boosting

• Exploit package physics
  - Temperature changes on the order of milliseconds

• Use the thermal headroom

[Figure: Intel Sandy Bridge power vs. time. Labels: Max Power; TDP Power; Low power (build up thermal credits); Turbo boost region; 10s of seconds; Throttling]
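One way to see why boost is limited in time, and why a cooler start ("thermal credits") buys a longer boost, is a lumped thermal RC model of the package. The sketch below is an illustration under assumed parameters, not the algorithm used by any particular processor; the thermal resistance, thermal capacitance, ambient temperature, and temperature limit are all hypothetical.

```python
def seconds_until_limit(power_w, t_start_c, t_limit_c,
                        t_ambient_c=40.0, r_th=0.5, c_th=40.0, dt=0.01):
    """Simulate a lumped RC thermal model; return the time to reach t_limit_c.

    dT/dt = (power * r_th - (T - t_ambient)) / (r_th * c_th)
    r_th (K/W) and c_th (J/K) are hypothetical package parameters.
    Returns None if the steady-state temperature never exceeds the limit.
    """
    if t_ambient_c + power_w * r_th <= t_limit_c:
        return None  # this power level can be sustained indefinitely
    temp, elapsed = t_start_c, 0.0
    while temp < t_limit_c:
        temp += dt * (power_w * r_th - (temp - t_ambient_c)) / (r_th * c_th)
        elapsed += dt
    return elapsed


sustained = seconds_until_limit(power_w=100, t_start_c=60, t_limit_c=90)
boost_cool = seconds_until_limit(power_w=130, t_start_c=60, t_limit_c=90)
boost_warm = seconds_until_limit(power_w=130, t_start_c=80, t_limit_c=90)
print(sustained, boost_cool, boost_warm)
```

In this model 100 W can be held forever, while 130 W can only be held until the temperature limit is reached. Starting the boost from a cooler die roughly doubles the time available before throttling is needed.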


(23)

Power Gating

[Figure: Intel Sandy Bridge Processor]

• Turn off components that are not being used
  - Lose all state information

• Costs of powering down

• Costs of powering up

• Smart shutdown
  - Models to guide decisions
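A simple model for guiding the shutdown decision is a break-even time: gating only pays off if the block stays idle long enough for the avoided leakage energy to exceed the energy spent powering down and back up. The sketch below assumes a gated block leaks nothing and that the idle period can be predicted; the function names and numbers are hypothetical.

```python
def break_even_time(e_down_j, e_up_j, p_leak_w):
    """Idle time (s) beyond which power gating saves energy.

    Gating pays off when the leakage energy avoided, p_leak * t_idle,
    exceeds the energy spent powering down and powering back up.
    """
    return (e_down_j + e_up_j) / p_leak_w


def should_power_gate(predicted_idle_s, e_down_j, e_up_j, p_leak_w):
    """Gate only if the predicted idle period exceeds the break-even time."""
    return predicted_idle_s > break_even_time(e_down_j, e_up_j, p_leak_w)


# Hypothetical costs: 2 mJ to save state and power down, 3 mJ to wake up,
# and 0.5 W of leakage while the block is powered but idle.
t_be = break_even_time(2e-3, 3e-3, 0.5)
print(t_be)
print(should_power_gate(0.005, 2e-3, 3e-3, 0.5))
print(should_power_gate(0.050, 2e-3, 3e-3, 0.5))
```

A real policy would also weigh the wake-up latency seen by the application, which this energy-only sketch leaves out.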

(24)

Linux CPU Governors


(25)

Linux CPU Governors

Sources:
https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
https://lwn.net/Articles/682391/
https://lwn.net/Articles/531853/
https://android.googlesource.com/kernel/common/+/a7827a2a60218b25f222b54f77ed38f57aebe08b/Documentation/cpu-freq/governors.txt

Performance: Set all cores to max supported frequency. (Max performance)

Powersave: Set all cores to min supported frequency. (Min power)

Userspace: Allow user to set any supported frequency.

Ondemand: Sample periodically, assess CPU load, set frequency. (Max performance) If load > up_threshold, set core frequency to max. If load < up_threshold, decrease core frequency based on sampling_down_factor.

Conservative: Like ondemand, but core frequency increases and decreases more gradually.

Schedutil: Uses CPU load as calculated by the scheduler’s per-entity load tracking (PELT).

Interactive: Designed for Android. Event (cpu idle state) based policy makes it more responsive than ondemand and conservative. Boosting of core frequencies is done via a heuristic. (Max performance when needed).
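The load-threshold idea behind ondemand can be sketched in a few lines. This is a deliberate simplification and not the kernel's implementation: it jumps to the maximum frequency above the threshold and otherwise picks the lowest frequency that would keep the load under the threshold. The function name is hypothetical, and the frequency list reuses the SW-visible states from the earlier P-state example purely for illustration.

```python
def ondemand_step(load_pct, cur_freq, freqs, up_threshold=80):
    """One sampling decision in the spirit of the ondemand governor.

    load_pct: CPU load over the last sampling period (0-100).
    freqs: supported frequencies in ascending order.
    """
    if load_pct > up_threshold:
        return freqs[-1]  # busy: go straight to the maximum frequency
    # Otherwise choose the lowest frequency that keeps load under the threshold.
    needed = cur_freq * load_pct / up_threshold
    for f in freqs:
        if f >= needed:
            return f
    return freqs[-1]


FREQS_MHZ = [900, 1100, 1300, 1400, 1600]
print(ondemand_step(95, 1100, FREQS_MHZ))
print(ondemand_step(30, 1600, FREQS_MHZ))
print(ondemand_step(70, 1300, FREQS_MHZ))
```

The asymmetry is the point of the policy: frequency rises in one step so that bursts are served quickly, and falls only as far as the measured load allows.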

(26)

Parallelism

• Concurrency + lower frequency → greater energy efficiency

$P = \alpha C V_{dd}^{2} f + V_{dd} I_{leak}$

[Figure: multicore chip made of core/cache pairs]

Example

• 4X #cores
• 0.75x voltage
• 0.5x frequency
• 1X power
• 2X in performance


(27)

Simplify Core Design

[Figures: AMD Bulldozer Core; ARM A7 Core (arm.com)]

• Support for branch prediction, schedulers, etc. consumes more energy per instruction

• Can fit many more simpler cores on a die

(28)

Metrics

• Power efficiency
  - MIPS/watt
  - Ops/watt

• Energy efficiency
  - Joules/instruction
  - Joules/op

• Composite
  - Energy-delay product
  - Energy-delay²

Why are these useful?
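One way to approach that question is to compute the metrics for two design points and see that they do not always agree. The sketch below uses two hypothetical runs of the same program; the energy and delay values are invented for illustration.

```python
def metrics(energy_j, delay_s, instructions):
    """Return the efficiency metrics listed above for one program run."""
    power_w = energy_j / delay_s
    mips = instructions / delay_s / 1e6
    return {
        "MIPS/watt": mips / power_w,
        "joules/instruction": energy_j / instructions,
        "EDP": energy_j * delay_s,
        "ED2P": energy_j * delay_s ** 2,
    }


# Two hypothetical design points running the same 1e9-instruction program.
fast = metrics(energy_j=4.0, delay_s=1.0, instructions=1e9)
slow = metrics(energy_j=2.5, delay_s=1.5, instructions=1e9)
for key in fast:
    print(f"{key:20s} fast={fast[key]:.3g} slow={slow[key]:.3g}")
```

Here the slower design wins on MIPS/watt, joules/instruction, and energy-delay product, but loses on energy-delay², which weights execution time more heavily. The composite metrics make the trade-off between energy and performance explicit.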


Modeling


(30)

Microarchitectural Level Models

• How can we study power consumption without building circuits?
  - Models

• Models are available at multiple levels of abstraction.

We are interested in microarchitectural models


(31)

Processor Microarchitecture

[Figure: processor microarchitecture block diagram. Blocks: Instruction Cache, Instruction TLB, Branch Prediction, Fetch Queue, Instruction Decoder, Instruction Queue, Register Files, ALU, MUL, FPU, LD, ST, L1 Data Cache, Data TLB, L2 Data Cache, NoC Router, On-Chip Network. Stages: Fetch, Decode, Execute/Writeback, Memory, Network.]

(32)

Energy/Power Calculation

• How do we calculate energy or power dissipation for a given microarchitecture?

• Energy/Power varies between:
  - Different ISA; ARM vs Intel x86
  - Different microarchitecture; in-order vs out-of-order
  - Different applications; memory vs compute-bound
  - Different technologies; 90nm vs 22nm technology
  - Different operation conditions; frequency, temperature


(33)

Architecture Activity (1)

[Figure: processor microarchitecture block diagram, repeated from the Processor Microarchitecture slide]

Activity 1: Instruction Fetch

icache.read++; fbuffer.write++;

• Collect activity counts of each architecture component (through simulation or measurement).

• List of components differs between microarchitectures.

• Activity counts at each component differ between applications.

(34)

Architecture Activity (2)

[Figure: processor microarchitecture block diagram, repeated from the Processor Microarchitecture slide]

Activity 2: Instruction Decode

fbuffer.read++; idecoder.logic++;

• Read/write accesses to caches, buffers, etc.

• Logical accesses to logic blocks such as decoder, ALUs, etc.

• Tradeoff of differentiating more access types (accuracy) vs simulation speed (complexity).
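The counter updates shown for fetch and decode can be mimicked in a few lines of simulator-style code. This is a minimal sketch, assuming every instruction performs exactly the fetch and decode activities named on the slides; the `STAGE_ACTIVITY` table and `collect_activity` function are hypothetical names, and a real simulator would track many more stages and access types.

```python
from collections import Counter

# Per-instruction activities. Only fetch and decode come from the slides;
# the event names mirror icache.read++, fbuffer.write++, and so on.
STAGE_ACTIVITY = {
    "fetch": ["icache.read", "fbuffer.write"],
    "decode": ["fbuffer.read", "idecoder.logic"],
}


def collect_activity(num_instructions):
    """Count component accesses for a stream of instructions."""
    counts = Counter()
    for _ in range(num_instructions):
        for events in STAGE_ACTIVITY.values():
            counts.update(events)
    return counts


counts = collect_activity(1000)
print(counts["icache.read"], counts["idecoder.logic"])
```

Adding more event types per component improves accuracy at the cost of more bookkeeping per simulated instruction, which is the trade-off noted above.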


(35)

Power and Architecture Activity

• For example, at the nth clock cycle, the collected counters are:
  - Data cache:
    - read = 20, write = 12;
    - per-read energy = 0.5 nJ; per-write energy = 0.6 nJ;
    - Read energy = read × per-read energy = 10 nJ
    - Write energy = write × per-write energy = 7.2 nJ
    - Total activity energy = read + write energies = 17.2 nJ
    - If n = 50th clock cycle and clock frequency = 2 GHz, total activity power = energy × clock_freq / n = 688 mW

*Note: n/clock_freq = n clock periods in seconds; power = time average of energy
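The same arithmetic can be written as a small function so that it applies to any set of counters. This is a sketch with illustrative names; only the numeric values come from the example above.

```python
def activity_power(counts, per_access_energy_j, n_cycles, clock_hz):
    """Return (energy_J, average_power_W) over n_cycles of execution."""
    energy = sum(counts[k] * per_access_energy_j[k] for k in counts)
    elapsed_s = n_cycles / clock_hz  # n clock periods, in seconds
    return energy, energy / elapsed_s


counts = {"read": 20, "write": 12}
per_access = {"read": 0.5e-9, "write": 0.6e-9}  # joules per access
energy, power = activity_power(counts, per_access, n_cycles=50, clock_hz=2e9)
print(f"{energy * 1e9:.1f} nJ, {power * 1e3:.0f} mW")
```

The result reproduces the 17.2 nJ and 688 mW of the worked example: 50 cycles at 2 GHz is 25 ns, and 17.2 nJ spread over 25 ns is 0.688 W.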

(36)

Things to consider (1)

1. How do we calculate per-read/write energies?

• Per-access energies can be estimated from circuit-level designs and analyses.

• There are various open-source tools for this.

[Figure: Architecture Specification + Technology Parameters → Circuit-level Estimation Tool → Estimation Results: Area, Energy, Timing, etc.]


(37)

Things to consider (2)

2. Is per-access energy always the same?

• Per-access energy in fact depends on:
  - how many bits are switching
  - how they are switching (0→1 or 1→0)

• It is reasonable to assume constant per-access energy over long-term observation (e.g., n = 1B clock cycles); the number of switching bits is averaged (e.g., 50% of bits are switching).

• Most architecture simulators do not capture bit-level details due to simulation complexity.

(38)

Things to consider (3)

3. If a register file didn’t have read/write accesses but held data, what is the energy dissipation?

• Energy (or power) consists largely of dynamic and static dissipation.

• Dynamic (or switching) energy refers to energy dissipation due to switching activities.

• Static (or leakage) energy is dissipation to keep the electronic system turned on.

• In this case, the register file has no dynamic energy dissipation but consumes static energy.
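The idle register file case can be expressed by adding a leakage term to the activity-based energy estimate. The sketch below is illustrative only; the per-access energy and leakage power are made-up values.

```python
def total_energy(accesses, per_access_j, leakage_w, seconds):
    """Return (dynamic_J, static_J, total_J) for one component."""
    dynamic = accesses * per_access_j  # energy due to switching activity
    static = leakage_w * seconds       # energy spent just staying powered on
    return dynamic, static, dynamic + static


# Hypothetical register file: no accesses for 1 ms, with 2 mW of leakage.
dynamic, static, total = total_energy(accesses=0, per_access_j=3e-12,
                                      leakage_w=2e-3, seconds=1e-3)
print(dynamic, static, total)
```

With zero accesses the dynamic term vanishes, yet the component still consumes energy for as long as it stays powered, which is the part that power gating targets.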


(39)

Example: A Simple Energy Model

• We can use a simple model of per-access energy for the architecture components (@16nm)

Common Components                  Access Type       Access Energy (10^-12 joules)
Inst. Cache + TLB                  Read              19.22
                                   Write             21.6
Data Cache + TLB                   Read              25.28
                                   Write             27.26
Inst. Decoder                      Logic Switching   16.78
Inst. Registers                    Read              2.74
                                   Write             4.38
FP. Registers                      Read              1.26
                                   Write             1.98
Other Buffers                      Read              9.74
                                   Write             11.18
ALU + Result Bus (interconnect)    Logic Switching   123.2
FPU + Result Bus (interconnect)    Logic Switching   241.02

• Each unit can be accessed multiple times depending on instruction type
• An Intel/AMD x86 instruction consumes 600 pJ ~ 4 nJ of dynamic energy.
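The per-access energies can be combined with an access sequence to estimate the dynamic energy of one instruction. In the sketch below the energies come from the table, but the key names and the list of accesses for an integer add are assumptions made for illustration, not the accounting used in the lecture.

```python
# Per-access energies in picojoules, copied from the table above.
ENERGY_PJ = {
    "icache.read": 19.22, "icache.write": 21.6,
    "dcache.read": 25.28, "dcache.write": 27.26,
    "idecoder.logic": 16.78,
    "ireg.read": 2.74, "ireg.write": 4.38,
    "fpreg.read": 1.26, "fpreg.write": 1.98,
    "buffer.read": 9.74, "buffer.write": 11.18,
    "alu.logic": 123.2, "fpu.logic": 241.02,
}

# Assumed access sequence for a register-register integer add:
# fetch, decode, two register reads, one ALU operation, one register write.
INT_ADD = ["icache.read", "buffer.write", "buffer.read", "idecoder.logic",
           "ireg.read", "ireg.read", "alu.logic", "ireg.write"]


def instruction_energy_pj(accesses):
    """Sum the per-access energies for a list of component accesses."""
    return sum(ENERGY_PJ[a] for a in accesses)


print(f"{instruction_energy_pj(INT_ADD):.2f} pJ")
```

This short sequence adds up to about 190 pJ, below the 600 pJ to 4 nJ range quoted for x86 instructions. The gap is expected for such a sketch, since it leaves out components that are not in the table and counts each unit only the minimum number of times.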

Thermal Issues


(41)

Thermal Issues

• Heat can cause damage to the chip
  - Need failsafe operation

• Thermal fields change the physical characteristics
  - Leakage current and therefore power increases
  - Delay increases
  - Device degradation becomes worse

• Cooling solution determines the permitted power dissipation

(42)

Thermal Design Power (TDP)

• This is the maximum power at which the part is designed to operate
  - Dictates the design of the cooling system
    - Max temperature → Tjmax
  - Typically fixed by worst case workload

• Parts are typically operating below the TDP

• Opportunities for turbo mode?
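The link between the cooling system, Tjmax, and the power a part can sustain can be illustrated with the usual steady-state thermal resistance relation. This is a first-order sketch under assumed values and not necessarily how a vendor derives a TDP rating; the function name and all numbers are hypothetical.

```python
def max_sustained_power(t_jmax_c, t_ambient_c, r_th_k_per_w):
    """Largest steady-state power (W) that keeps the junction at or below Tjmax.

    Uses T_junction = T_ambient + P * R_th, where R_th is the
    junction-to-ambient thermal resistance of the cooling solution.
    """
    return (t_jmax_c - t_ambient_c) / r_th_k_per_w


# Hypothetical values: Tjmax = 90 C, 40 C ambient, two candidate coolers.
print(max_sustained_power(90, 40, 0.5))
print(max_sustained_power(90, 40, 0.4))
```

With these numbers the better cooler (lower thermal resistance) raises the sustainable power from about 100 W to about 125 W, which is why the cooling solution determines the permitted power dissipation.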

AMD Trinity APU

http://ecs.vancouver.wsu.edu/thermofluids-research


(43)

Heat Sink Limits on Performance

• Thermal design power (TDP)

• Determines the cooling solution & package limits

• Performance depends on effective utilization of this thermal headroom

• Convert thermal headroom to higher performance through boosting

[Figure: heat sink (www.legitreviews.com); plots of instructions/cycle, temperature, and power vs. time for a workload, marking thermal headroom, boost power, HW boost states, and SW visible states]

(44)

Trinity TDP

Source: http://www.anandtech.com/show/6347/amd-a10-5800k-a8-5600k-review-trinity-on-the-desktop-part-2


(45)

Issues

• Cooling chips is now an issue for computer architects!

• Co-design the cooling system and the processor

• Some very “cool” new technologies
  - E.g., microfluidics!

(46)

Electrical and Fluidic I/Os

• Fluid flow through the microchannels carries heat out to an external heat exchanger (e.g., heat sink)

Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)


(47)

Fabrication Examples

Electrical and fluidic microbumps, fluidic vias and fine wires

Micropin-fins (150 µm diameter and 225 µm diameter) and vias

Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)

(48)

IBM Series Mainframe

www.03.ibm.com


(49)

Immersion Cooling

www.physic.org

(50)

Conclusions

• Power/energy is the leading driver of modern architecture design

• Power and energy management is key to scalability

• Need integrated power/energy, performance, thermal management in fielded systems

• What about energy/power efficient algorithms?


(51)

Study Guide

• Explain the difference between energy dissipation and power dissipation

• Distinguish between static power dissipation and dynamic power dissipation

• Explain dynamic voltage frequency scaling
  - What are power states?
  - Why is this an advantage?
  - What is the impact of DVFS on i) energy, ii) execution time, and iii) power?

• Distinguish between clock gating and power gating

(52)

Study Guide (cont.)

• Define thermal design power (TDP)

• Name two schemes for preventing the chip from exceeding TDP. Explain how they achieve this goal

• What does boosting achieve?

• What is the difference between C-states and P-states?

• Name one power management technique that will save static power?

• How does using many slower simpler cores improve power efficiency?


(53)

Study Guide (cont.)

• How is thermal design power (TDP) calculated?

• When using boost algorithms, what determines the duration of the high frequency operation?

• How does a power virus work?

• Describe how throttling works

• Know the power dissipation in some modern processor-memory systems drawn from the embedded, server, and high performance computing segments

(54)

Glossary

• Boosting

• C-states

• Dynamic Power and Energy

• Power Gating

• P-states

• Static Power and Energy

• Time constant

• Thermal Design Power (TDP)

• Throttling