Analysis of NUCA Policies for CMPs Using Parsec Benchmark Suite
Javier Lira ψ
Carlos Molina ф
Antonio González λ
λ Intel Barcelona Research Center
Intel Labs - UPC
Barcelona, Spain
ф Dept. Enginyeria Informàtica
Universitat Rovira i Virgili
Tarragona, Spain
ψ Dept. Arquitectura de Computadors
Universitat Politècnica de Catalunya
Barcelona, Spain
MMCS 2009, Washington DC (USA) - March 7, 2009
Outline
Introduction
Methodology
Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy
Conclusions
Introduction
CMPs have emerged as a dominant paradigm in system design.
1. Sustain performance improvements while reducing power consumption.
2. Take advantage of thread-level parallelism.
Commercial CMPs are currently available.
CMPs incorporate larger and shared last-level caches.
Wire delay is a key constraint.
NUCA
Non-Uniform Cache Architecture (NUCA) was first proposed in ASPLOS 2002 by Kim et al.[1].
NUCA divides a large cache into smaller, faster banks.
Banks close to the cache controller have lower latencies than banks farther away.
[1] C. Kim, D. Burger and S.W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02
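As a rough illustration of why banks near the cache controller are faster, the sketch below models access latency as a fixed bank access time plus a per-hop delay. The grid layout, the controller position, the function name and all cycle counts are assumptions for illustration, not values from the talk.

```python
# Hypothetical model: latency = bank access time + hops * per-hop delay.
BANK_ACCESS_CYCLES = 3   # assumed bank access time
HOP_CYCLES = 2           # assumed wire + switch delay per hop


def bank_latency(bank_xy, controller_xy=(0, 0)):
    """Cycles to reach a bank, using Manhattan distance on a bank grid."""
    hops = abs(bank_xy[0] - controller_xy[0]) + abs(bank_xy[1] - controller_xy[1])
    return BANK_ACCESS_CYCLES + hops * HOP_CYCLES


near = bank_latency((0, 1))   # one hop from the controller
far = bank_latency((3, 3))    # six hops from the controller
print(near, far)
```

Under these assumed numbers the far bank costs three times as many cycles as the near one, which is the non-uniformity that the NUCA policies try to exploit.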
NUCA Policies
Bank Placement Policy
Bank Access Policy
Bank Migration Policy
Bank Replacement Policy
Methodology
Simulation tools: Simics + GEMS, CACTI v6.0
PARSEC Benchmark Suite
Simulated configuration:
  Number of cores: 8
  Core processor: Out-of-order SPARCv9
  Main memory size: 4 GBytes
  Memory bandwidth: 512 Bytes/cycle
  On-chip wire delay: 1 cycle
  Off-chip wire delay: 20 cycles
  Switch delay: 1 cycle
  Private L1 data caches: 8 KBytes
  Private L1 instruction caches: 8 KBytes
  Shared L2 NUCA cache: 1 MByte, 256 banks
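The capacity of each bank follows directly from the L2 size and bank count listed above. The short calculation below works it out; treating one network hop as the sum of the on-chip wire delay and the switch delay is an assumption of this sketch, not something the slide states.

```python
# Values taken from the configuration listed above.
L2_SIZE_BYTES = 1 * 1024 * 1024   # 1 MByte shared L2 NUCA cache
NUM_BANKS = 256
ON_CHIP_WIRE_CYCLES = 1
SWITCH_CYCLES = 1

# Capacity of a single bank.
bank_size_bytes = L2_SIZE_BYTES // NUM_BANKS

# Assumed cost of one hop between neighbouring banks (wire + switch).
hop_cycles = ON_CHIP_WIRE_CYCLES + SWITCH_CYCLES

print(bank_size_bytes, hop_cycles)
```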
Baseline NUCA cache architecture
[Figure: baseline NUCA cache architecture with 8 cores (Core 0 to Core 7), each with private L1D and L1I caches, and a shared NUCA cache of 256 banks.]
[2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04
Bank Placement Policy
1B + Static
16B + Static
16B + Local
[Figure: the three placement approaches illustrated on the baseline 8-core NUCA layout.]
1B + Static placement provides a fair distribution of data across banks.
16B configurations concentrate data in a few banks.
Placement and migration policies are strictly correlated.
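The slides do not define the labels, so the sketch below rests on stated assumptions: "1B" is read as a block mapping to exactly one bank, "16B" as a block being allowed in any of 16 banks (a bank set), "Static" as choosing the entry bank from address bits, and "Local" as choosing the candidate nearest the requesting core. The function names and the core-to-bank mapping are hypothetical.

```python
NUM_BANKS = 256
BANKS_PER_SET = 16                      # assumed meaning of "16B"
NUM_SETS = NUM_BANKS // BANKS_PER_SET


def candidates_1b(block_addr):
    """'1B' (assumed): a block can live in exactly one bank, fixed by its address."""
    return [block_addr % NUM_BANKS]


def candidates_16b(block_addr):
    """'16B' (assumed): a block can live in any of the 16 banks of its bank set."""
    bank_set = block_addr % NUM_SETS
    return [bank_set * BANKS_PER_SET + i for i in range(BANKS_PER_SET)]


def place_static(block_addr):
    """Static (assumed): the entry bank is picked from address bits only."""
    banks = candidates_16b(block_addr)
    return banks[(block_addr // NUM_SETS) % BANKS_PER_SET]


def place_local(block_addr, core_id, cores=8):
    """Local (assumed): the entry bank is the candidate treated as nearest the core."""
    banks = candidates_16b(block_addr)
    return banks[core_id * BANKS_PER_SET // cores]


print(candidates_1b(1234), place_static(1234), place_local(1234, core_id=3))
```

With a single candidate bank the address alone spreads blocks over all 256 banks, whereas a local choice among 16 candidates pulls each core's blocks toward the same few banks, which is one way to read the distribution results above.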
Bank Access Policy
Partially Serial
9P + 7P
Parallel
[Figure: the three access approaches illustrated on the baseline 8-core NUCA layout.]
Power efficiency vs. performance.
9P + 7P is a trade-off, but it is still far from the performance potential.
These results suggest there is broad room for improvement in this policy.
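The power versus performance tension can be sketched by counting how many banks are probed (a proxy for energy) and how many rounds of probing are needed (a proxy for latency). The sketch assumes a block may reside in one of 16 candidate banks and reads "9P + 7P" as probing 9 banks in parallel and then the remaining 7 in parallel. Both readings, and the `probe` helper itself, are assumptions and not definitions given in the talk.

```python
def probe(phases, hit_index):
    """Return (banks_probed, rounds) needed to find a block.

    phases: sizes of the bank groups probed in successive rounds.
    hit_index: position of the bank holding the block in probe order,
               or None if the block is not in the cache.
    """
    probed = 0
    for rounds, size in enumerate(phases, start=1):
        probed += size
        if hit_index is not None and hit_index < probed:
            return probed, rounds
    return probed, len(phases)


PARALLEL = [16]        # all candidate banks at once
TWO_PHASE = [9, 7]     # assumed reading of "9P + 7P"
SERIAL = [1] * 16      # one bank per round, shown only as a reference point

for name, phases in [("parallel", PARALLEL), ("9P + 7P", TWO_PHASE), ("serial", SERIAL)]:
    print(name, probe(phases, hit_index=4))
```

When the block sits in an early bank, the two-phase scheme probes fewer banks than the parallel one at the same number of rounds; when it sits in a late bank, it pays an extra round.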
Bank Migration Policy
Static
Gradual + Swapping
Gradual + Replication
[Figure: the three migration approaches illustrated on the baseline 8-core NUCA layout.]
Replication reduces the effective size of the cache.
Migration approaches concentrate data blocks in a few banks.
The static approach distributes data blocks fairly across the whole cache.
Placement and migration policies are strictly correlated.
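A minimal sketch of gradual migration with swapping, assuming that each hit moves the block one bank closer to the requester and that it trades places with whatever block occupied that bank. The one-block-per-bank chain and the function name are simplifications for illustration; replication is not modelled.

```python
def access_with_swap(chain, block):
    """Promote a block one position on a hit.

    chain: blocks ordered from the closest bank (index 0) to the farthest,
           with one block per bank for simplicity.
    """
    i = chain.index(block)
    if i > 0:
        # Swap with the block in the next-closer bank.
        chain[i - 1], chain[i] = chain[i], chain[i - 1]
    return chain


chain = ["A", "B", "C", "D"]
access_with_swap(chain, "D")   # D moves one bank closer
access_with_swap(chain, "D")   # and one more on the next hit
print(chain)
```

Frequently used blocks drift toward the closest banks, which is consistent with the observation above that migration concentrates data in a few banks.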
Bank Replacement Policy
Zero-copy
One-copy
Last Bank
[Figure: the three replacement approaches illustrated on the baseline 8-core NUCA layout.]
Giving a second chance to evicted data blocks provides a significant performance gain.
Last Bank is a promising mechanism, but it is limited by its small size.
Further exploration on this policy is required.
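The second-chance idea can be sketched as a small extra store that catches blocks evicted from the NUCA banks, so that a later request can still find them on chip. The class below is a hypothetical model: the slides do not describe how Last Bank is organised, and the LRU ordering and the capacity parameter are assumptions of this sketch.

```python
from collections import OrderedDict


class LastBank:
    """Hypothetical extra bank that keeps recently evicted blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest entry first

    def insert_evicted(self, addr, data):
        """Store a block evicted from the NUCA cache, dropping the oldest if full."""
        self.blocks[addr] = data
        self.blocks.move_to_end(addr)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def lookup(self, addr):
        """Return the block and remove it (it goes back to the NUCA cache), or None."""
        return self.blocks.pop(addr, None)


last_bank = LastBank(capacity=2)
for addr, data in [(1, "a"), (2, "b"), (3, "c")]:
    last_bank.insert_evicted(addr, data)

print(last_bank.lookup(3), last_bank.lookup(1))
```

A small capacity means older evicted blocks are lost before they can be reused, which matches the remark above that the mechanism is limited by its size.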
Conclusions
NUCA is characterized by four policies.
NUCA policies are related.
Static placement with no migration: good trade-off.
Bank placement and bank migration are strictly correlated.
Bank access: Power efficiency vs. Performance.
Bank replacement: ↑ Performance (unbounded last bank).
Still room for improvement in all policies.
Questions?