Analysis of NUCA Policies for CMPs Using Parsec Benchmark Suite Javier Lira ψ Carlos Molina ф...

Analysis of NUCA Policies for CMPs Using Parsec Benchmark Suite Javier Lira ψ Carlos Molina ф Antonio González λ λ Intel Barcelona Research Center Intel Labs - UPC Barcelona, Spain [email protected] ф Dept. Enginyeria Informàtica Universitat Rovira i Virgili Tarragona, Spain [email protected] ψ Dept. Arquitectura de Computadors Universitat Politècnica de Catalunya Barcelona, Spain [email protected] MMCS 2009, Washington DC (USA) - March 7, 2009


Page 1

Analysis of NUCA Policies for CMPs Using Parsec Benchmark Suite

Javier Lira ψ

Carlos Molina ф

Antonio González λ

λ Intel Barcelona Research Center

Intel Labs - UPC

Barcelona, Spain

[email protected]

ф Dept. Enginyeria Informàtica

Universitat Rovira i Virgili

Tarragona, Spain

[email protected]

ψ Dept. Arquitectura de Computadors

Universitat Politècnica de Catalunya

Barcelona, Spain

[email protected]

MMCS 2009, Washington DC (USA) - March 7, 2009

Page 2

Outline

Introduction

Methodology

Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy

Conclusions

Page 3

Introduction

Chip multiprocessors (CMPs) have emerged as a dominant paradigm in system design, because they:

1. Sustain performance improvements while reducing power consumption.

2. Exploit thread-level parallelism.

Commercial CMPs are currently available.

CMPs incorporate larger, shared last-level caches.

Wire delay is a key constraint on the access latency of these large caches.

Page 4

NUCA

Non-Uniform Cache Architecture (NUCA) was first proposed by Kim et al. at ASPLOS 2002 [1].

NUCA divides a large cache into smaller, faster banks.

Banks close to the cache controller have lower access latencies than banks farther away.
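As a rough illustration of this latency gradient, the sketch below models a bank's hit latency as a fixed bank access time plus a round trip over a 2D mesh of banks. The mesh coordinates, the controller position, and the 3-cycle bank access time are assumptions made for illustration; only the 1-cycle on-chip wire delay and 1-cycle switch delay come from the Methodology parameters. It is not the simulator's timing model.

```python
# Minimal NUCA latency sketch (illustrative assumptions, not the simulated model):
# banks sit on a 2D mesh, and a request travels hop by hop from the cache
# controller to the target bank; each hop costs one wire plus one switch traversal.

WIRE_DELAY = 1        # cycles per link (on-chip wire delay from Methodology)
SWITCH_DELAY = 1      # cycles per router (switch delay from Methodology)
BANK_ACCESS_TIME = 3  # hypothetical per-bank access time in cycles


def hops(src, dst):
    """Manhattan distance between two (row, col) mesh positions."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def bank_latency(controller, bank):
    """Round-trip hit latency: request and reply each cross the same hops."""
    one_way = hops(controller, bank) * (WIRE_DELAY + SWITCH_DELAY)
    return 2 * one_way + BANK_ACCESS_TIME


controller = (0, 0)
for bank in [(0, 0), (0, 1), (2, 3), (7, 7)]:
    print(bank, bank_latency(controller, bank))
```

With the controller at (0, 0), latency grows from 3 cycles for the bank at the controller's own position to 59 cycles for the bank at (7, 7), which is the non-uniformity NUCA exploits.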


[1] C. Kim, D. Burger, and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02.

Page 5

NUCA Policies

Bank Placement Policy

Bank Access Policy

Bank Replacement Policy

Bank Migration Policy

Page 6

Outline

Introduction

Methodology

Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy

Conclusions

Page 7

Methodology

Simulation tools: Simics + GEMS, CACTI v6.0

PARSEC Benchmark Suite

Parameter                   Value
Number of cores             8
Core processor              Out-of-order SPARCv9
Main memory size            4 GBytes
Memory bandwidth            512 Bytes/cycle
On-chip wire delay          1 cycle
Off-chip wire delay         20 cycles
Switch delay                1 cycle
Private L1 data caches      8 KBytes
Private L1 instr. caches    8 KBytes
Shared L2 NUCA cache        1 MByte, 256 banks
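The parameters above can be captured as a small configuration record. This Python sketch assumes binary units (1 MByte = 2^20 bytes, 1 KByte = 2^10 bytes); the class and field names are hypothetical. It derives two figures implied by the table: each of the 256 NUCA banks holds 4 KBytes, and the eight cores together hold 128 KBytes of private L1 cache.

```python
from dataclasses import dataclass

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


@dataclass(frozen=True)
class SimConfig:
    """Simulated CMP parameters (hypothetical names; binary units assumed)."""
    cores: int = 8
    main_memory_bytes: int = 4 * GIB
    memory_bandwidth_bytes_per_cycle: int = 512
    onchip_wire_delay_cycles: int = 1
    offchip_wire_delay_cycles: int = 20
    switch_delay_cycles: int = 1
    l1d_bytes: int = 8 * KIB
    l1i_bytes: int = 8 * KIB
    l2_bytes: int = 1 * MIB
    l2_banks: int = 256

    @property
    def l2_bank_bytes(self) -> int:
        # Capacity of one NUCA bank: 1 MiB / 256 banks = 4 KiB.
        return self.l2_bytes // self.l2_banks

    @property
    def total_private_l1_bytes(self) -> int:
        # Aggregate private L1 capacity across all cores (data + instruction).
        return self.cores * (self.l1d_bytes + self.l1i_bytes)


cfg = SimConfig()
print(cfg.l2_bank_bytes)           # 4096
print(cfg.total_private_l1_bytes)  # 131072
```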

Page 8

Baseline NUCA cache architecture [2]

[Figure: baseline layout with Core 0 to Core 7, each with private L1D and L1I caches, sharing the NUCA L2 banks]

8 cores

256 banks

[2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04.

Page 9

Outline

Introduction

Methodology

Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy

Conclusions

Page 10

Bank Placement Policy

1B + Static

16B + Static

16B + Local

[Figure: the three placement configurations on the 8-core baseline layout (Core 0 to Core 7, each with private L1D and L1I caches)]

Page 11

Bank Placement Policy

1B + Static placement distributes data blocks fairly across the banks.

16B configurations concentrate data in a few banks.

Placement and migration policies are strongly correlated.
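One way to read the three placement configurations is sketched below, assuming "1B" means a block has exactly one legal bank and "16B" means it may live in any of the 16 banks of an address-selected bankset. The 64-byte block size, the bank numbering, and the rule that core c is nearest to slot 2c are hypothetical choices made only to keep the example small.

```python
NUM_BANKS = 256
BANKSET_SIZE = 16                  # assumed: "16B" = 16 candidate banks per block
NUM_BANKSETS = NUM_BANKS // BANKSET_SIZE
BLOCK_OFFSET_BITS = 6              # assumed 64-byte cache blocks


def block_number(addr):
    """Drop the byte-offset bits to get the block address."""
    return addr >> BLOCK_OFFSET_BITS


def bankset_of(addr):
    """Low block-address bits select the bankset."""
    return block_number(addr) % NUM_BANKSETS


def bank_id(bankset, slot):
    """Global id of `slot` (0..15) within `bankset` (hypothetical numbering)."""
    return slot * NUM_BANKSETS + bankset


def place_1b_static(addr):
    """1B + Static: a single legal bank, fixed by the block address."""
    return block_number(addr) % NUM_BANKS


def candidates_16b(addr):
    """16B: every bank of the bankset is a legal home for the block."""
    b = bankset_of(addr)
    return [bank_id(b, slot) for slot in range(BANKSET_SIZE)]


def place_16b_static(addr):
    """16B + Static: the next address bits pick the initial slot."""
    slot = (block_number(addr) // NUM_BANKSETS) % BANKSET_SIZE
    return bank_id(bankset_of(addr), slot)


def place_16b_local(addr, core):
    """16B + Local: start in the slot nearest the requesting core.
    Hypothetical rule: core c (0..7) is nearest to slot 2*c."""
    return bank_id(bankset_of(addr), 2 * core)


addr = 0x1000
print(place_1b_static(addr), place_16b_static(addr))
print(place_16b_local(addr, core=0), place_16b_local(addr, core=1))
```

In this particular numbering, 16B + Static starts a block in the same bank that 1B + Static would (both give 64 for address 0x1000); the difference is that under 16B the block has 15 other legal homes, which only matters once a migration policy starts moving it. That is one way to see why placement and migration are so closely tied.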

Page 12

Outline

Introduction

Methodology

Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy

Conclusions

Page 13

Bank Access Policy

Partially Serial

9P + 7P

Parallel

[Figure: the three access configurations on the 8-core baseline layout (Core 0 to Core 7, each with private L1D and L1I caches)]

Page 14

Bank Access Policy

Trade-off: power efficiency vs. performance.

9P + 7P is a compromise between the two, but it still falls well short of the potential performance.

These results suggest broad room for improvement in this policy.
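The power/performance tension can be made concrete with a phased search over a block's 16 candidate banks. The sketch assumes that "9P + 7P" means probing nine candidate banks in parallel and, only if they all miss, the remaining seven, while "Parallel" probes all 16 at once; it also assumes slots are ordered nearest-first and uses banks probed as an energy proxy and search phases as a latency proxy. "Partially Serial" is not modeled, and none of this is the authors' implementation.

```python
SLOTS = 16  # candidate banks in a bankset (assumed), ordered nearest-first


def phased_lookup(hit_slot, phases):
    """Probe the bankset one phase at a time. All banks in a phase are probed
    in parallel, and the search stops after the phase that finds the block.

    hit_slot: slot holding the block, or None on a NUCA miss.
    Returns (banks_probed, phases_used): rough energy and latency proxies.
    """
    probed = 0
    for used, group in enumerate(phases, start=1):
        probed += len(group)
        if hit_slot in group:
            return probed, used
    return probed, len(phases)


PARALLEL = [list(range(SLOTS))]
NINE_PLUS_SEVEN = [list(range(9)), list(range(9, SLOTS))]

for hit in (3, 12, None):
    print(hit, phased_lookup(hit, PARALLEL), phased_lookup(hit, NINE_PLUS_SEVEN))
```

For a block in a near slot (3), 9P + 7P probes 9 banks instead of 16 at no latency cost; for a far slot (12) or a NUCA miss it probes all 16 and needs a second phase, which is where it loses to Parallel.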

Page 15

Outline

Introduction

Methodology

Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy

Conclusions

Page 16

Bank Migration Policy

Static

Gradual + Swapping

Gradual + Replication

[Figure: the three migration configurations on the 8-core baseline layout (Core 0 to Core 7, each with private L1D and L1I caches)]

Page 17

Bank Migration Policy

Replication reduces the effective size of the cache.

Migration approaches concentrate data blocks in a few banks.

The static approach distributes data blocks fairly across the whole cache.

Placement and migration policies are strongly correlated.
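The contrast between swapping and replication can be shown with a toy bankset of five single-block slots and the requesting core next to slot 0. Gradual migration moves a block one slot closer per hit. The replication variant here, which writes a copy one slot closer on each hit and evicts that slot's previous occupant, is a deliberately simplified assumption, not the scheme evaluated in the study.

```python
def step_toward(pos, target):
    """One gradual-migration step: move at most one slot toward `target`."""
    if pos < target:
        return pos + 1
    if pos > target:
        return pos - 1
    return pos


def hit_swap(slots, block, requester):
    """Gradual + Swapping: exchange the hit block with its neighbour one slot
    closer to the requesting core; no block leaves the bankset."""
    pos = slots.index(block)
    nxt = step_toward(pos, requester)
    slots[pos], slots[nxt] = slots[nxt], slots[pos]


def hit_replicate(slots, block, requester):
    """Gradual + Replication (simplified): copy the hit block one slot closer
    and keep the original; the copy overwrites (evicts) that slot's block.
    slots.index() returns the lowest-index copy, which is the nearest one
    when the requester sits next to slot 0 as in this example."""
    pos = slots.index(block)
    nxt = step_toward(pos, requester)
    slots[nxt] = block


# One bankset of five single-block slots; the requesting core is next to slot 0.
swap_slots = list("ABCDE")
rep_slots = list("ABCDE")
for _ in range(3):
    hit_swap(swap_slots, "D", requester=0)
    hit_replicate(rep_slots, "D", requester=0)

print(swap_slots, len(set(swap_slots)))  # ['D', 'A', 'B', 'C', 'E'] 5
print(rep_slots, len(set(rep_slots)))    # ['D', 'D', 'D', 'D', 'E'] 2
```

After three hits to block D, swapping has moved D next to the core while keeping all five blocks resident; replication has left four copies of D and only two distinct blocks, the effective-capacity loss noted above.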

Page 18

Outline

Introduction

Methodology

Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy

Conclusions

Page 19

Bank Replacement Policy

Zero-copy

One-copy

Last Bank

[Figure: the three replacement configurations on the 8-core baseline layout (Core 0 to Core 7, each with private L1D and L1I caches); the Last Bank configuration adds an extra bank labeled "Last Bank"]

Page 20

Bank Replacement Policy

Giving evicted data blocks a second chance provides a significant performance gain.

Last Bank is a promising mechanism, but its benefit is limited by its small size.

Further exploration of this policy is required.
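The value of a second chance, and why Last Bank capacity matters, shows up even in a toy model. The sketch treats the NUCA as a single fully associative LRU store and the Last Bank as a second LRU store that receives every block the NUCA evicts; both simplifications, the capacities, and the access trace are assumptions. Zero-copy is modeled as having no Last Bank; One-copy is not modeled.

```python
from collections import OrderedDict


class LRUStore:
    """Fully associative LRU store of block ids; capacity=None means unbounded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, block):
        """Return True on a hit and mark the block most recently used."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def take(self, block):
        """Remove the block if present; return True if it was there."""
        return self.blocks.pop(block, None) is not None

    def insert(self, block):
        """Insert as most recently used; return the evicted LRU block, or None."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if self.capacity is not None and len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)
            return victim
        return None


def count_hits(trace, nuca_capacity, last_bank_capacity):
    """Hits in the NUCA plus an optional Last Bank that catches its evictions.

    last_bank_capacity=0 models zero-copy (evicted blocks leave the cache);
    last_bank_capacity=None models an unbounded Last Bank.
    """
    nuca = LRUStore(nuca_capacity)
    last_bank = None if last_bank_capacity == 0 else LRUStore(last_bank_capacity)
    hits = 0
    for block in trace:
        if nuca.lookup(block):
            hits += 1
            continue
        if last_bank is not None and last_bank.take(block):
            hits += 1  # second chance: the block moves back into the NUCA below
        victim = nuca.insert(block)
        if victim is not None and last_bank is not None:
            last_bank.insert(victim)  # Last Bank overflow simply leaves the cache
    return hits


trace = list(range(8)) * 3  # cyclic working set of 8 blocks, replayed 3 times
for lb in (0, 2, None):
    print(lb, count_hits(trace, nuca_capacity=4, last_bank_capacity=lb))
```

With a cyclic working set of 8 blocks and a 4-block NUCA, zero-copy and a 2-block Last Bank both get no hits, while an unbounded Last Bank turns every access after the first pass into a hit (16 hits).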

Page 21

Outline

Introduction

Methodology

Bank Policy Approaches
  Bank Placement Policy
  Bank Access Policy
  Bank Migration Policy
  Bank Replacement Policy

Conclusions

Page 22

Conclusions

NUCA is characterized by four policies.

The NUCA policies are interrelated.

Static placement with no migration: a good trade-off.

Bank placement and bank migration are strongly correlated.

Bank access: power efficiency vs. performance.

Bank replacement: higher performance (with an unbounded last bank).

Still room for improvement in all policies.

Page 23

Analysis of NUCA Policies for CMPs Using Parsec Benchmark Suite

Questions?