SEESAW: Set Enhanced Superpage Aware caching

Mayank Parasar∑, Abhishek BhattacharjeeΩ, Tushar Krishna∑

∑School of Electrical and Computer Engineering, Georgia Institute of Technology

ΩDepartment of Computer Science, Rutgers University

mparasar3@gatech.edu

http://synergy.ece.gatech.edu/

[Figure labels: Set, Associativity]

Outline

• Motivation

• SEESAW: Concept

• SEESAW: Micro-architecture

• Evaluation Methodology

• Results

• Conclusion

Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech

2

6/26/18

L1 Cache Characteristics

Fast lookup

High hit-rate

Energy Efficiency

Virtually Indexed Physically Tagged [VIPT] Cache

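[Figure: VIPT cache lookup. The VA is split into VPN and page offset; the TLB translates the VPN into the PPN of the PA, while the page offset supplies the set index and block offset. The cache is drawn as set-1 to set-N with Way-1 to Way-4 per set (fields: v, tag, data block); the tag comparison (=) produces cache HIT/MISS.]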

VIPT caches necessitate:

(set-index bits + block-offset bits) <= page-offset bits
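The constraint above can be turned into a small calculation. The following is a minimal sketch, not code from the talk: the function names are hypothetical, and the 64 B line size is an assumption chosen only for illustration. It finds the smallest power-of-two associativity for which the set-index and block-offset bits fit inside the page offset.

```python
def vipt_bits(cache_bytes: int, line_bytes: int, ways: int) -> tuple[int, int]:
    """Return (set-index bits, block-offset bits) for a power-of-two cache."""
    sets = cache_bytes // (line_bytes * ways)
    set_index_bits = sets.bit_length() - 1        # log2 for powers of two
    block_offset_bits = line_bytes.bit_length() - 1
    return set_index_bits, block_offset_bits


def satisfies_vipt(cache_bytes: int, line_bytes: int, ways: int, page_bytes: int) -> bool:
    """Check (set-index bits + block-offset bits) <= page-offset bits."""
    set_index_bits, block_offset_bits = vipt_bits(cache_bytes, line_bytes, ways)
    page_offset_bits = page_bytes.bit_length() - 1
    return set_index_bits + block_offset_bits <= page_offset_bits


def min_vipt_ways(cache_bytes: int, line_bytes: int, page_bytes: int) -> int:
    """Smallest power-of-two associativity that meets the VIPT constraint."""
    ways = 1
    while not satisfies_vipt(cache_bytes, line_bytes, ways, page_bytes):
        ways *= 2
    return ways


KB = 1024
LINE = 64  # assumed line size in bytes, for illustration only

print(min_vipt_ways(32 * KB, LINE, 4 * KB))     # 4 KB pages
print(min_vipt_ways(64 * KB, LINE, 4 * KB))     # 4 KB pages, larger cache
print(min_vipt_ways(32 * KB, LINE, 2048 * KB))  # 2 MB pages
```

Under these assumptions a 32 kB cache needs 8 ways and a 64 kB cache needs 16 ways with 4 KB pages, while the same 32 kB cache could be direct mapped if every access fell in a 2 MB page.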

Impact of Associativity on Access Latency and Energy of cache

[Figure panels: Cache Access Latency; Cache Access Energy]

Effect of associativity on MPKI of cache

High Associativity hurts latency and energy without commensurately improving hit rate

Revisiting L1 Cache Characteristics for VIPT Cache

Fast lookup

High hit-rate

Energy Efficiency

Virtual memory!

Virtual memory!

Opportunity: Superpage

Is it possible to relax the constraints of a traditional VIPT cache? Yes

How? More page-offset bits for superpages!

HW and OS support for superpages in modern processors:

Page type       Page size   Offset bits
Baseline page   4 KB        12
Superpage       2 MB        21
Superpage       1 GB        30
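The offset widths listed on this slide follow directly from the page sizes. As a short illustrative sketch (the names `PAGE_SIZES`, `offset_bits`, and `extra_index_bits` are hypothetical, not from the talk), the snippet below derives them and shows how many additional untranslated bits a superpage offers beyond a 4 KB base page:

```python
PAGE_SIZES = {
    "4-KB": 4 * 1024,
    "2-MB": 2 * 1024 ** 2,
    "1-GB": 1024 ** 3,
}


def offset_bits(page_bytes: int) -> int:
    """Number of page-offset bits (log2 of a power-of-two page size)."""
    return page_bytes.bit_length() - 1


def extra_index_bits(page_bytes: int, base_bytes: int = 4 * 1024) -> int:
    """Bits that stay untranslated beyond the base-page offset."""
    return offset_bits(page_bytes) - offset_bits(base_bytes)


for name, size in PAGE_SIZES.items():
    print(name, offset_bits(size), extra_index_bits(size))
```

These extra bits are identical in the virtual and physical address of a superpage, which is what allows them to take part in cache indexing before translation completes.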

Prevalence of superpages in modern OSes under memory fragmentation

Ran on 32-core; Sandybridge; 32 GB RAM

Memhog causes memory fragmentation; a higher percentage indicates higher fragmentation

SEESAW: Concept

[Figure: the same cache storage (fields: v, tag, data block) drawn in two organizations. Base-page: fewer sets, more associativity (Set:1 to Set:3, each with Way-1 to Way-3). Super-page: more sets, less associativity (Set:1 to Set:9, each with Way-1 only). The super-page organization is labeled Faster and Energy-Efficient.]

SEESAW: Micro-architecture

[Figure: SEESAW micro-architecture. The VA (VPN, set index, block offset) feeds the TLB, which produces the PA (PPN, basepage offset). The Translation Filter Table (TFT) predicts whether the page is a superpage. The partition decoder decodes the partition index from the partition bit, which is taken from the superpage offset. The cache (set-1 to set-N; fields: v, tag, data block) is split into Partition-0 and Partition-1, one holding Way-1 and Way-2 and the other Way-3 and Way-4.]

SEESAW: Superpage access

[Figure: superpage access on the SEESAW micro-architecture diagram (VA, TLB, PA, TFT, partition decoder, cache split into Partition-0 and Partition-1). Annotations: the TFT reports "Super Page"; the partition bit is taken from the superpage offset; the tag comparison (=) gives HIT/MISS.]

SEESAW: Basepage access

[Figure: basepage access on the SEESAW micro-architecture diagram (VA, TLB, PA, TFT, partition decoder, cache split into Partition-0 and Partition-1). Annotations: the TFT reports "Not a Super Page"; partition index; the tag comparison (=) gives HIT/MISS.]
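The two access slides can be summarized as a choice of which ways to probe. The sketch below is an illustration under stated assumptions, not the authors' implementation: the geometry (64 B lines, 64 sets, 4 ways in 2 partitions) mirrors the 4-way drawing, the partition bit is assumed to be the first VA bit above the 4 KB page offset, and the assumption that a basepage access probes every way of the set is mine. All names are hypothetical.

```python
BLOCK_OFFSET_BITS = 6   # assumed 64 B lines
SET_INDEX_BITS = 6      # assumed 64 sets; 6 + 6 = 12 bits = 4 KB page offset
WAYS = 4                # as drawn on the slides (Way-1 .. Way-4)
PARTITIONS = 2          # Partition-0 and Partition-1
WAYS_PER_PARTITION = WAYS // PARTITIONS


def set_index(va: int) -> int:
    """Set index taken from the base-page offset of the virtual address."""
    return (va >> BLOCK_OFFSET_BITS) & ((1 << SET_INDEX_BITS) - 1)


def partition_index(va: int) -> int:
    """Assumed partition bit: the first VA bit above the base-page offset.

    For a superpage this bit lies inside the (larger) page offset, so it is
    the same in the virtual and the physical address.
    """
    return (va >> (BLOCK_OFFSET_BITS + SET_INDEX_BITS)) & (PARTITIONS - 1)


def ways_to_probe(va: int, predicted_superpage: bool) -> list[int]:
    """Ways (0-based) whose tags are compared for this access."""
    if predicted_superpage:
        first = partition_index(va) * WAYS_PER_PARTITION
        return list(range(first, first + WAYS_PER_PARTITION))
    # Assumed: a basepage access probes every way of the set.
    return list(range(WAYS))


print(ways_to_probe(0x1040, predicted_superpage=True))
print(ways_to_probe(0x1040, predicted_superpage=False))
```

Probing only one partition for a predicted superpage is what gives the lower effective associativity; the basepage path keeps the behavior of a conventional VIPT lookup.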

SEESAW: TFT and Partition Decoder

[Figure: Translation Filter Table (TFT) entry with Tag: VA[63:21] and a "Superpage?" output feeding the partition decoder]

Translation Filter Table
➢ TFT Lookup
  ➢ Direct mapped
  ➢ False negative due to size
➢ TFT Update
  ➢ VA misprediction
  ➢ 2MB L1-TLB fill
  ➢ 2MB L1-TLB invalidation

Partition Decoder
➢ For 32kB Cache
➢ For 64kB Cache
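A direct-mapped table tagged with VA[63:21] can be sketched in a few lines. This is a hypothetical model for illustration only: the class and method names are invented, the 16-entry default is borrowed from the size mentioned in the results, and indexing by `tag % entries` is an assumption since the slides do not give the index function.

```python
SUPERPAGE_SHIFT = 21  # the tag is VA[63:21], i.e. the 2 MB region number


class TranslationFilterTable:
    """Direct-mapped filter that remembers recently seen 2 MB superpage regions."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.tags = [None] * entries  # None marks an empty slot

    def _slot(self, va: int) -> tuple[int, int]:
        tag = va >> SUPERPAGE_SHIFT
        return tag % self.entries, tag  # assumed index function

    def predict_superpage(self, va: int) -> bool:
        """TFT lookup: True only if this 2 MB region is currently recorded."""
        index, tag = self._slot(va)
        return self.tags[index] == tag

    def fill(self, va: int) -> None:
        """TFT update on a 2MB L1-TLB fill; may evict a conflicting region."""
        index, tag = self._slot(va)
        self.tags[index] = tag

    def invalidate(self, va: int) -> None:
        """TFT update on a 2MB L1-TLB invalidation."""
        index, tag = self._slot(va)
        if self.tags[index] == tag:
            self.tags[index] = None


tft = TranslationFilterTable()
tft.fill(0x4000_0000)
print(tft.predict_superpage(0x4000_1234))  # same 2 MB region as the fill
```

Because the table is small and direct mapped, a later fill that maps to the same slot evicts the earlier region, and the next access to the evicted superpage is predicted as a basepage. That is the false negative due to size noted in the lookup bullets.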

SEESAW: Cache line insertion policy

[Figure: SEESAW micro-architecture diagram (VA, TLB, PA, TFT, partition decoder, cache split into Partition-0 and Partition-1)]

Into which partition should a cache line be inserted?

SEESAW: Cache line insertion policy
• 4way-8way
  • Superpage miss: victim within the partition
  • Basepage miss: victim within the set
• 4way
  • Uses LRU within the associated partition
  • Avoids installing the same line twice
  • Saves energy
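The victim choice described for the 4way-8way policy can be illustrated with a short function. This is a sketch under assumptions rather than the actual replacement logic: the function name and arguments are hypothetical, LRU ordering is assumed for both cases, and the example uses an 8-way set split into two 4-way partitions.

```python
def choose_victim(lru_order: list[int], is_superpage: bool,
                  partition: int, ways_per_partition: int) -> int:
    """Pick the way to evict from one set.

    lru_order lists the way numbers of the set, least recently used first.
    Superpage miss: the victim must come from the line's own partition.
    Basepage miss: the victim may come from anywhere in the set.
    """
    if is_superpage:
        first = partition * ways_per_partition
        candidates = [w for w in lru_order
                      if first <= w < first + ways_per_partition]
    else:
        candidates = lru_order
    return candidates[0]  # least recently used among the candidates


# Example set with 8 ways: partition 0 holds ways 0-3, partition 1 holds ways 4-7.
order = [5, 2, 7, 0, 1, 3, 4, 6]
print(choose_victim(order, True, 0, 4))   # superpage line mapped to partition 0
print(choose_victim(order, True, 1, 4))   # superpage line mapped to partition 1
print(choose_victim(order, False, 0, 4))  # basepage line
```

Restricting superpage victims to one partition keeps each superpage line findable by a single-partition probe on later accesses.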

SEESAW: System Level Optimization
• Cache coherence
  • Cache coherence lookups use the physical address
  • Snoopy coherence provides higher energy benefits than directory-based coherence
• Page table modifications
  • Superpage splintered into multiple basepages
  • Multiple basepages promoted to superpages

SEESAW: Simulated system

SEESAW: Workloads

• Spec
• Parsec
• Cloudsuite
  • Tunkrank
• Biobench
  • Mummer
  • Tiger
• MongoDB
• Server Workload
  • graph500
  • Nutch Hadoop
• Social-event web service
  • Olia
• Key value store
  • Redis

SEESAW: Performance improvement

SEESAW achieves 3-10% better runtime than the baseline

SEESAW: Performance improvement

[Figure panels: Out-of-order CPU; In-order CPU]

~10% performance improvement for 64kB cache in OoO CPUs

SEESAW: Energy savings

10-20% more energy savings over CPUs using baseline VIPT caches!

Approx. one-third of energy savings from coherence

SEESAW: TFT analysis and Way-Prediction

[Figure panels: TFT Analysis; SEESAW + Way-prediction]

16-entry TFT drives miss-rate under 10%

SEESAW+WP shows symbiotic behavior

Revisiting L1 Cache Characteristics

Fast lookup

High hit-rate

Energy Efficiency

SEESAW: Conclusion

[Figure labels: Set, Associativity]

• L1 caches are optimized for latency

• VIPT imposes an indirect restriction on the number of sets in an L1 cache, increasing associativity

• There is a non-linear relation between associativity and the access latency/energy of the L1 cache

• Superpages are often used in modern OSes

• SEESAW provides low-associative access to superpages, providing both latency and energy benefits

• Up to 10% performance improvement and 20% energy reduction in modern workloads

• SEESAW has extremely low overhead and is readily implementable