Architecting for Intermittence · 2019-10-04
Architecting for Intermittence
Joshua San Miguel
University of Wisconsin-Madison
Arm Research Summit 2019
Everything is Computing…
…Computing is Everything
advanced capability, quick response, long lifetime
Energy is the common denominator
Energy-Harvesting Devices
μW mW W
Intermittent Computing
[Figure: an energy transducer charges a capacitor that powers the CPU; processor state and volatile memory are backed up to, and restored from, non-volatile memory.]
Non-volatile memory: data persists upon power loss
Volatile memory: volatile data lost: intermittent computing
Computation may stop at any point in the program and cannot resume until the device has harvested sufficient energy
[Figure: progress over time for tasks A, B, and C under ideal, uninterrupted execution.]
[Figure: progress over time for tasks A, B, and C under intermittent execution, annotated with power loss and backup events.]
➢ Power losses and backup overheads greatly impede forward progress
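As a rough illustration of how power losses and backups eat into forward progress, the following is a minimal sketch of an intermittently powered loop. The function name `run_intermittent`, the unit energy cost per work item, and all numbers are assumptions made for illustration; they are not taken from the talk.

```python
def run_intermittent(total_work, budget_per_period, backup_cost, backup_every):
    """Simulate work units executed across power cycles (toy model).

    Assumptions: each work unit costs 1 energy unit, a backup costs
    `backup_cost`, and work done since the last backup is lost when the
    energy of an active period runs out.
    Returns (active_periods, fraction of spent energy that was useful).
    """
    if budget_per_period < backup_every + backup_cost:
        raise ValueError("budget too small to ever reach and save a checkpoint")
    saved = 0      # work units safely checkpointed
    periods = 0
    spent = 0
    while saved < total_work:
        periods += 1
        energy = budget_per_period
        done = saved                     # restore from the last checkpoint
        while energy > 0 and done < total_work:
            energy -= 1
            spent += 1
            done += 1
            at_checkpoint = done % backup_every == 0 or done == total_work
            if at_checkpoint and energy >= backup_cost:
                energy -= backup_cost
                spent += backup_cost
                saved = done
        # power loss here: anything beyond `saved` was dead work
    return periods, total_work / spent


periods, progress = run_intermittent(total_work=12, budget_per_period=10,
                                     backup_cost=2, backup_every=4)
print(periods, round(progress, 3))
```

In this toy setting, 12 units of useful work cost 26 units of energy: the rest goes to backups and to work that is redone after each power loss.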
The Necessary Burden of Backups
[Figure: pie charts of compute vs. backup-restore share for four systems. Extracted values: 17% / 83%, 30% / 70%, 59% / 41%, 12% / 88%. Systems: Clank [ISCA’17], DINO [PLDI’15], Mementos [ASPLOS’11], NVP [HPCA’15].]
Architecting for Intermittence?
Design Tools:
➢ EH Model [MICRO’18]
Design Paradigms:
➢ Computational Skimming [HPCA’19]
Design Space of Intermittent Systems
[Figure: design-space axes: energy efficiency; volatile memory bytes to save; backup frequency; non-volatile memory bandwidth; charging rate; program load:store ratio; non-volatile access energy; register bytes to save; processor frequency.]
And many more design axes…
➢ e.g., SW vs. HW backups, volatile vs. non-volatile registers, dirty vs. non-dirty data
The EH Model
Analytical model for rapid design space exploration of arbitrary intermittent IoT systems [MICRO’18]:
➢ Estimates forward progress: % of harvested energy spent on useful work
➢ Models significant factors that affect energy consumption:
• Dead cycles: instructions executed that are not saved prior to power loss
• Charging rate and capacitor size
• Architectural state (e.g., register file) and application state (e.g., volatile data) per backup
• Non-volatile memory latency and energy per access
• Backup frequency; multi-backup vs. single-backup systems
Multi-backup systems:
➢ e.g., Clank [ISCA’17], Alpaca [OOPSLA’17], Mementos [ASPLOS’11]
[Figure: one active period drawn as restore, program, backup, program, backup, program, backup, program, spanning the energy supply per active period (i.e., between power losses).]
Factors annotated on the figure:
• instructions executed
• volatile loads/stores
• charging
• PC and register file
• volatile cache/buffers
• non-volatile memory
Labeled portions of the energy supply: dead energy; energy spent on forward progress
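The energy accounting for a multi-backup active period can be sketched as follows. This is a simplified bookkeeping exercise, not the published EH model equations: the function name, the parameters, and the assumption that every backup costs the same fixed energy are all illustrative.

```python
def multi_backup_progress(e_supply, e_restore, e_backup, e_cycle,
                          cycles_between_backups):
    """Split one active period's energy into forward, overhead, and dead parts.

    Assumed timeline: restore, then repeated (program, backup) segments.
    Energy left after the last complete backup is treated as dead energy,
    since the work it paid for is never saved.
    Returns fractions of `e_supply`.
    """
    if e_supply < e_restore:
        raise ValueError("supply cannot even cover the restore")
    segment = cycles_between_backups * e_cycle + e_backup
    usable = e_supply - e_restore
    n = int(usable // segment)                    # complete program+backup pairs
    forward = n * cycles_between_backups * e_cycle
    overhead = e_restore + n * e_backup
    dead = e_supply - forward - overhead
    return {
        "forward": forward / e_supply,
        "overhead": overhead / e_supply,
        "dead": dead / e_supply,
    }


print(multi_backup_progress(e_supply=1000.0, e_restore=50.0, e_backup=30.0,
                            e_cycle=1.0, cycles_between_backups=100))
```

With these made-up numbers, seven program-plus-backup segments fit in the period, and the three fractions sum to one.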
Single-backup systems:
➢ e.g., Hibernus [IEEE TCAD’16], NVP [HPCA’15], QuickRecall [VLSID’14]
[Figure: one active period drawn as restore, program, conservative backup, spanning the energy supply per active period (i.e., between power losses); the program segment is labeled energy spent on forward progress.]
Conservative backup:
• ADC voltage check
• enough energy for all volatile data
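For a single-backup system, the same kind of accounting reduces to reserving enough energy for one conservative backup. The sketch below assumes a linear cost per volatile byte and a fixed cost per ADC voltage check; these parameters and the function name are hypothetical and only illustrate the idea.

```python
def single_backup_progress(e_supply, e_restore, volatile_bytes,
                           e_nvm_write_per_byte, e_adc_check=0.0, n_checks=0):
    """Fraction of one active period's energy left for forward progress.

    Assumptions: the backup reserve must cover writing ALL volatile bytes
    to non-volatile memory, and voltage monitoring costs a fixed energy
    per ADC check.
    """
    reserve = volatile_bytes * e_nvm_write_per_byte   # conservative backup
    monitoring = e_adc_check * n_checks
    useful = max(e_supply - e_restore - reserve - monitoring, 0.0)
    return useful / e_supply


print(single_backup_progress(e_supply=1000.0, e_restore=50.0,
                             volatile_bytes=256, e_nvm_write_per_byte=0.5,
                             e_adc_check=1.0, n_checks=22))
```

There is no dead energy term here because, under these assumptions, the single backup is always triggered while the reserve is still available.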
Evaluation on MSP430 LaunchPad:
[Figure: measured progress (0% to 100%) plotted against progress estimated from the analytical model (0% to 100%) for Hibernus, Mementos, and DINO; benchmark labels: rsa, crc, sense, ar, midi, ds.]
The EH Model – Analytical Explorations
e.g., Mementos [ASPLOS’11]:
[Figure: progress (0% to 100%) versus cycles between backups (300 to 30000), with worst-case, average, and best-case curves.]
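The spread between worst-case, average, and best-case progress can be explored with a small sweep. The sketch below is an assumption-laden toy, not the model from the paper: it treats the number of backups as a continuous quantity and takes the dead cycles per period to be a full backup interval (worst), half an interval (average), or zero (best). All energy values are invented.

```python
def progress(tau, dead_cycles, e_supply=20000.0, e_restore=200.0,
             e_backup=300.0, e_cycle=1.0):
    """Forward-progress fraction for `tau` cycles between backups."""
    usable = max(e_supply - e_restore - dead_cycles * e_cycle, 0.0)
    n_backups = usable / (tau * e_cycle + e_backup)   # continuous approximation
    return n_backups * tau * e_cycle / e_supply


def sweep(taus):
    """Return (tau, worst, average, best) rows."""
    return [(tau,
             progress(tau, dead_cycles=tau),        # power lost just before a backup
             progress(tau, dead_cycles=tau / 2),
             progress(tau, dead_cycles=0))          # power lost just after a backup
            for tau in taus]


for tau, worst, avg, best in sweep([300, 3000, 30000]):
    print(f"{tau:>6} cycles: worst={worst:.2f} average={avg:.2f} best={best:.2f}")
```

Under these assumptions, short intervals are dominated by backup overhead while long intervals risk losing most of the period to dead cycles, so the worst case peaks at an intermediate interval.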
e.g., Reduced-precision image recognition on Clank [ISCA’17]:
[Figure: improvement per bit reduction (0% to 5%) versus cycles between backups (10 to 1000; a program property), for volatile buffer sizes of 64 B, 128 B, and 256 B.]
More case studies and explorations [MICRO’18]:
➢ Saving architectural vs. application state
➢ The benefits of store-major locality
➢ Controlling backups via write-after-read dependences
The Anytime Automaton Model
General-purpose computation model for quality-proportional execution [ISCA’16]:
[Figure: output quality over application execution. Conventionally, a single precise output is produced at the end; an anytime automaton produces outputs of increasing quality on the way to the precise output.]
➢ Interruptibility: use current output if needed (e.g., under a strict target runtime)
➢ User flexibility: wait longer for better quality
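The interruptibility property can be sketched with a generator that emits progressively better outputs. This is only an illustration of the anytime idea using strided sampling of a mean; the functions `anytime_mean` and `run_with_deadline` are hypothetical and are not the construction from the ISCA’16 paper.

```python
def anytime_mean(values, stride=4):
    """Yield progressively better estimates of the mean of `values`.

    Each pass visits another strided subset of the data; after the last
    pass every element has been seen, so the final estimate is exact.
    """
    total, seen = 0.0, 0
    for offset in range(stride):
        for v in values[offset::stride]:
            total += v
            seen += 1
        if seen:
            yield total / seen


def run_with_deadline(automaton, max_steps):
    """Consume at most `max_steps` outputs and return the latest one."""
    result = None
    for step, result in enumerate(automaton, start=1):
        if step >= max_steps:
            break                     # interrupted: use the current output
    return result


data = list(range(16))
print(list(anytime_mean(data)))                 # quality improves per pass
print(run_with_deadline(anytime_mean(data), 2)) # stopped early
```

Stopping after two passes returns an approximate mean; letting the generator run to completion returns the precise one.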
Computational Skimming
Insight: Decouple backup from restore point via anytime model
[Figure: progress over time for tasks A, B, and C with backup and restore points marked; when the tasks are anytime tasks, a skim lets execution resume past the backup point.]
Computational Skimming: Quality of program output scales with how much energy we can afford
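A toy comparison of conventional restore and skimming is sketched below. It assumes that backups happen only at task boundaries, that each refinement step costs one energy unit, and that the partial output of an anytime task is already held in non-volatile memory when power is lost. The function `run_tasks` and these assumptions are illustrative, not a description of the actual system.

```python
def run_tasks(task_steps, energy_periods, skim):
    """Return the output quality (0..1) reached for each finished task.

    task_steps:     refinement steps needed for a precise output, per task
    energy_periods: steps affordable in each active period
    skim:           if True, a restore accepts the partial output of the
                    interrupted task and moves on to the next task
    """
    quality = []
    task, done = 0, 0
    for budget in energy_periods:
        if task >= len(task_steps):
            break
        if skim and done > 0:
            quality.append(done / task_steps[task])   # keep approximate output
            task, done = task + 1, 0
        elif not skim:
            done = 0          # conventional: redo the interrupted task
        while budget > 0 and task < len(task_steps):
            budget -= 1
            done += 1
            if done == task_steps[task]:
                quality.append(1.0)                   # precise output reached
                task, done = task + 1, 0
    return quality


print(run_tasks([8, 8, 8], [5] * 6, skim=False))  # never finishes a task
print(run_tasks([8, 8, 8], [5] * 6, skim=True))   # approximate outputs
```

When each active period affords fewer steps than a task needs, the conventional run makes no forward progress at all, while the skimming run finishes every task at reduced quality.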
e.g., Blood glucose monitoring:
[Figure: blood glucose (mg/dL, 0 to 250) over time, comparing clinical readings with traditional sampling; a critically low level is marked.]
[Figure: blood glucose (mg/dL, 0 to 250) over time, comparing clinical readings with computational skimming.]
Computational Skimming – What’s Next
What’s Next processor architecture for computational skimming, designed from baseline ARM M0+ [HPCA’19]:
➢ Anytime subword pipelining
➢ Anytime subword vectorization
➢ Subword memoization
Computational Skimming – Anytime Pipelining
Long-latency operations (e.g., mul):
➢ Conventional
[Figure: words 1 to 3 laid out over time, with a significance axis running from more to less.]
➢ Anytime subword pipelining
[Figure: words 1 to 3 laid out over time, with a skim point and a restore point marked.]
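[Figure: dataflow graphs. Conventional: a and b flow through x, f, and + stages together with x and y. Anytime subword pipelining: the same graph is split into a pass over a[MSb] and b[MSb] and a pass over a[LSb] and b[LSb], with a skim point marked.]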
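The arithmetic behind processing the most significant subwords first can be sketched in a few lines. The example assumes unsigned 8-bit weights split into 4-bit subwords and a dot product as the workload; the function `anytime_dot` is hypothetical and models only the ordering of the work, not the pipeline hardware.

```python
def anytime_dot(weights, inputs, width=8, sub=4):
    """Yield partial dot products, most significant subword pass first.

    Each weight (unsigned, `width` bits) is split into `sub`-bit subwords.
    After every pass over all elements the accumulated value is a usable
    approximation; after the last pass it equals the exact dot product.
    """
    mask = (1 << sub) - 1
    acc = 0
    for shift in range(width - sub, -1, -sub):
        for w, v in zip(weights, inputs):
            acc += (((w >> shift) & mask) * v) << shift
        yield acc              # a skim could stop here and keep `acc`


partials = list(anytime_dot([0xAB, 0x12], [3, 5]))
print(partials)                          # approximate, then exact
print(0xAB * 3 + 0x12 * 5)               # reference value
```

The first value already carries the high-order contribution of every element, which is why skimming after the first pass loses only low-order detail.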
Computational Skimming – Anytime Vectorization
Short-latency operations (e.g., add):
➢ Conventional
[Figure: words 1, 2, and 3 processed over time, each spanning more significant to less significant bits.]
➢ Anytime subword vectorization
[Figure: words 1 to 3 laid out over time, with a skim point and a restore point marked.]
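[Figure: dataflow graphs. Conventional: x = f(a, b) and y = f(c, d) are computed separately. Anytime subword vectorization: one f computes x[MSb] and y[MSb] from the packed operands (a[MSb], c[MSb]) and (b[MSb], d[MSb]), then one f computes x[LSb] and y[LSb] from (a[LSb], c[LSb]) and (b[LSb], d[LSb]), with a skim point marked.]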
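Packing the same-significance subwords of several elements into one word, so that a single add covers all of them, can be sketched as below. The 8-bit lanes that give each 4-bit subword headroom for its carry are a simplification chosen for this example, and the helper names are hypothetical; the sketch says nothing about how the hardware handles carries.

```python
LANE = 8   # bits per lane in the packed word (assumed: subword plus carry headroom)
SUB = 4    # bits per subword


def pack(subwords):
    """Place each subword in its own lane of one integer."""
    word = 0
    for i, s in enumerate(subwords):
        word |= s << (i * LANE)
    return word


def unpack(word, n):
    """Extract n lanes from a packed integer."""
    return [(word >> (i * LANE)) & ((1 << LANE) - 1) for i in range(n)]


def anytime_vector_add(xs, ys):
    """Element-wise add of 8-bit unsigned values, MS subwords first.

    Yields the running result after the most significant pass
    (approximate) and after the least significant pass (exact).
    """
    mask = (1 << SUB) - 1
    result = [0] * len(xs)
    for shift in (SUB, 0):
        px = pack([(x >> shift) & mask for x in xs])
        py = pack([(y >> shift) & mask for y in ys])
        packed_sum = px + py            # one add covers every element
        for i, s in enumerate(unpack(packed_sum, len(xs))):
            result[i] += s << shift
        yield list(result)


print(list(anytime_vector_add([0xAB, 0x12], [0x34, 0xFF])))
```

The second yielded list equals the ordinary element-wise sums; the first differs from it only by the low-order subword contributions.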
Computational Skimming – Subword Memoization
In conjunction with anytime subword pipelining:
➢ Conventional
[Figure: words 1 to 3 laid out over time, with a significance axis running from more to less; annotated "low value redundancy".]
➢ Subword memoization
[Figure: words 1 to 3 laid out over time, with a skim point and a restore point marked; annotated "high value redundancy".]
[Figure: dataflow graphs over a[MSb] and b[MSb] with x and y. In the memoized version, the f stages are replaced by LUTs. The LUT supports zero-skipping.]
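Why narrow subwords make memoization attractive can be seen in a small sketch: 4-bit operands take few distinct values, so repeated lookups are common. The unbounded dictionary used as the lookup table, the hit and miss counters, and the placement of the zero check ahead of the lookup are all assumptions for illustration, not details of the actual design.

```python
def memoized_subword_dot(weights, inputs, width=8, sub=4):
    """Dot product over subwords with a lookup table and zero-skipping.

    Returns (partials, stats): the accumulated value after each subword
    pass (most significant first) and counters for table hits, misses,
    and skipped zero operands.
    """
    mask = (1 << sub) - 1
    lut = {}                                   # (subword, input) -> product
    stats = {"hits": 0, "misses": 0, "zero_skips": 0}
    acc, partials = 0, []
    for shift in range(width - sub, -1, -sub):
        for w, v in zip(weights, inputs):
            s = (w >> shift) & mask
            if s == 0 or v == 0:
                stats["zero_skips"] += 1       # product is zero: no work needed
                continue
            key = (s, v)
            if key in lut:
                stats["hits"] += 1
            else:
                stats["misses"] += 1
                lut[key] = s * v
            acc += lut[key] << shift
        partials.append(acc)
    return partials, stats


print(memoized_subword_dot([0x33, 0x30, 0x03], [7, 7, 7]))
```

In this example only one multiplication is actually computed; every other subword product is either a table hit or a skipped zero.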
What’s Next multi-backup system:
[Figure: speedup (0.0x to 5.0x) and NRMSE (0% to 25%) for MatMul, Conv2D, Var, NetMotion, Home, MatAdd, and Average; series: 8-bit speedup, 4-bit speedup, 8-bit NRMSE, 4-bit NRMSE.]
Conclusion
Design Tools:
➢ EH Model [MICRO’18]
Design Paradigms:
➢ Computational Skimming [HPCA’19]
advanced capability, quick response, long lifetime
Energy is the common denominator
Acknowledgements
Research Group:
➢ Abhishek Bhattacharyya
➢ Asmita Pal
➢ Di Wu
➢ Giri Prasanna Mugunda Krishnan
➢ Mitali Soni
Collaborators:
➢ University of Wisconsin-Madison
➢ University of Toronto
➢ IBM Research