Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs

Javier Lira ψ

Carlos Molina ф

Antonio González λ

λ Intel Barcelona Research Center

Intel Labs - UPC

Barcelona, Spain

antonio.gonzalez@intel.com

ф Dept. Enginyeria Informàtica

Universitat Rovira i Virgili

Tarragona, Spain

carlos.molina@urv.net

ψ Dept. Arquitectura de Computadors

Universitat Politècnica de Catalunya

Barcelona, Spain

javier.lira@ac.upc.edu

Euro-Par 2009, Delft (The Netherlands) - August 27, 2009

Outline

Introduction

Methodology

Last Bank

Characterization of replacements in NUCA

Last Bank Optimizations

Conclusions

Introduction

CMPs have emerged as a dominant paradigm in system design.

1. Keep improving performance while reducing power consumption.

2. Take advantage of thread-level parallelism.

Commercial CMPs are currently available.

CMPs incorporate larger and shared last-level caches.

Wire delay is a key constraint.

Non-Uniform Cache Architecture (NUCA) was first proposed in ASPLOS 2002 by Kim et al.[1].

NUCA divides a large cache into smaller and faster banks.

Banks close to the cache controller have lower latencies than banks farther away.

[1] C. Kim, D. Burger and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02

Methodology

Simulation tools: Simics + GEMS, CACTI v6.0

PARSEC Benchmark Suite

Number of cores: 8
Core processor: Out-of-order SPARCv9
Main memory size: 4 GBytes
Memory bandwidth: 512 Bytes/cycle
L1 cache latency: 3 cycles
NUCA bank latency: 2 cycles
Router delay: 1 cycle
On-chip wire delay: 1 cycle
Main memory latency: 350 cycles (from core)
Private L1 data caches: 8 KBytes
Private L1 instr. caches: 8 KBytes
Shared L2 NUCA cache: 1 MByte, 256 banks
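To make the non-uniform latency concrete, here is a minimal sketch that combines the bank, router and wire latencies listed above. The cost model is an assumption, not something the slides state: each hop is charged one router delay plus one wire delay, and the reply travels back over the same path. The function name `nuca_access_latency` is hypothetical.

```python
# Latency parameters from the configuration above, in cycles.
BANK_LATENCY = 2
ROUTER_DELAY = 1
WIRE_DELAY = 1


def nuca_access_latency(hops: int) -> int:
    """Round-trip latency to a bank that is `hops` routers away.

    Assumption (not from the slides): every hop costs one router delay
    plus one wire delay, and request and reply use the same path.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    one_way = hops * (ROUTER_DELAY + WIRE_DELAY)
    return 2 * one_way + BANK_LATENCY


if __name__ == "__main__":
    # Closer banks answer sooner than distant ones.
    for hops in (1, 4, 8):
        print(hops, nuca_access_latency(hops))
```

Under this model the access time grows linearly with distance, which is why keeping frequently used data in nearby banks matters.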

Baseline NUCA cache architecture [2]: 8 cores, 256 banks

[2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04

Last Bank

Data movements concentrate the most frequently accessed data in a few banks.

Data replacements in HOT banks are unfair.

Last Bank

An extra bank is included in the NUCA cache.

Acts as a victim cache, but it is not fully associative.

Gives evicted data a second chance to stay in the NUCA cache.
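The behaviour described above can be sketched as a small set-associative structure that receives blocks evicted from the regular banks. This is only an illustration under stated assumptions: the class name `LastBank`, its method names, the default sizes (64 sets, 4 ways, 64-byte blocks) and the LRU replacement inside the Last Bank are all hypothetical and are not taken from the slides.

```python
from collections import OrderedDict


class LastBank:
    """Set-associative bank holding blocks evicted from the NUCA banks.

    Sizes and the internal LRU policy are illustrative assumptions.
    """

    def __init__(self, num_sets=64, ways=4, block_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One ordered dict per set; the oldest block comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _locate(self, addr):
        block = addr // self.block_bytes
        return block, self.sets[block % self.num_sets]

    def insert_victim(self, addr):
        """Store a block evicted from a NUCA bank.

        Returns the address of the block pushed out to main memory,
        or None if the set still had room.
        """
        block, s = self._locate(addr)
        s.pop(block, None)  # refresh the entry if it is already present
        written_back = None
        if len(s) >= self.ways:
            old_block, _ = s.popitem(last=False)  # drop the oldest block
            written_back = old_block * self.block_bytes
        s[block] = True
        return written_back

    def lookup(self, addr):
        """Check the Last Bank after a miss in the regular banks.

        On a hit the block is removed here, since it returns to the NUCA.
        """
        block, s = self._locate(addr)
        return s.pop(block, None) is not None
```

A hit in `lookup` models the second chance: the block goes back to the NUCA without a main-memory access.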

Last Bank

Performance benefits restricted by Last Bank size.

Significant performance potential.

Analysis of reused addresses to find improvement points.

Characterization of replacements in NUCA

How many evicted addresses are later reused?

How many cycles does a reused address usually spend out of the NUCA before being reinserted?

Where were reused addresses located within the NUCA just before being evicted?

What action caused the eviction of reused addresses from the NUCA?

Reused address statistics

Nearly 70% of evicted addresses return to the NUCA cache.

Most reused addresses return to the NUCA at least twice.

Time between Eviction and Reinsertion

Nearly 30% of evicted addresses return in less than 100,000 cycles.

In blackscholes, almost 50% of reused addresses return to NUCA in less than 1,000 cycles.
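Statistics of this kind can be derived from a trace of cache events. The sketch below assumes a hypothetical trace format, a list of `(cycle, event, address)` tuples where `event` is `"evict"` or `"insert"`; the function names and the format are illustrative and do not describe the actual simulator output.

```python
def reuse_stats(trace):
    """Return (number of evictions, list of out-of-cache times in cycles).

    `trace` is an iterable of (cycle, event, addr) tuples, ordered by
    cycle, where event is "evict" or "insert" (hypothetical format).
    """
    pending = {}  # addr -> cycle at which it was last evicted
    evictions = 0
    gaps = []
    for cycle, event, addr in trace:
        if event == "evict":
            evictions += 1
            pending[addr] = cycle
        elif event == "insert" and addr in pending:
            # The address comes back: record how long it was away.
            gaps.append(cycle - pending.pop(addr))
    return evictions, gaps


def fraction_reused(evictions, gaps, max_cycles=None):
    """Fraction of evictions that were followed by a reinsertion,
    optionally only those that returned in fewer than `max_cycles`."""
    if evictions == 0:
        return 0.0
    hits = [g for g in gaps if max_cycles is None or g < max_cycles]
    return len(hits) / evictions


if __name__ == "__main__":
    trace = [
        (0, "insert", 0xA0),
        (100, "evict", 0xA0),
        (600, "insert", 0xA0),
        (700, "evict", 0xB0),
        (900, "evict", 0xA0),
        (200_900, "insert", 0xA0),
    ]
    evictions, gaps = reuse_stats(trace)
    print(fraction_reused(evictions, gaps))           # share of evictions later reused
    print(fraction_reused(evictions, gaps, 100_000))  # share returning within 100,000 cycles
```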

Last location within the NUCA

Most reused addresses were evicted from Local Banks.

Most addresses replaced from Central Banks are not reused later.

Last Bank Optimizations

Selective Last Bank

Target: To reduce pollution in Last Bank.

This mechanism selects which evicted data blocks are stored in the Last Bank.

Implemented Selective Last Bank: Stores a data block if and only if it was evicted from a Local Bank. Otherwise, the block is sent back to main memory.
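The filter rule above is small enough to express directly. In this sketch the names `BankType` and `route_evicted_block` are hypothetical, and the string return values are only a convenient way to show the decision.

```python
from enum import Enum


class BankType(Enum):
    LOCAL = "local"
    CENTRAL = "central"


def route_evicted_block(source_bank: BankType) -> str:
    """Decide where a block evicted from a NUCA bank goes.

    Blocks evicted from a Local Bank enter the Last Bank; all other
    blocks are sent back to main memory.
    """
    if source_bank is BankType.LOCAL:
        return "last_bank"
    return "main_memory"
```

Filtering on the source bank follows the characterization results: reused addresses mostly come from Local Banks, so blocks from Central Banks would mainly pollute the Last Bank.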

LRU Prioritising Last Bank

Target: To maintain reused addresses in the NUCA cache.

Modification of data eviction policy of NUCA banks.

Prioritises lines that come from Last Bank during the data replacement process.

[Figure: LRU stack of a 4-way set, positions 0 (MRU) to 3 (LRU); P is the priority bit. Before replacement: @A (P:0), @B (P:0), @C (P:0), @D (P:1). After: @D (P:0), @A (P:0), @B (P:0), @C (P:0).]
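The replacement step shown in the figure can be sketched as follows. The first part follows the figure: a line with its priority bit set that reaches the LRU position has the bit cleared and moves to the MRU position. What happens next is an assumption of this sketch: the line that is now in the LRU position is evicted, and the check repeats if that line is also prioritised. The names `Line` and `choose_victim` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    addr: str
    priority: bool = False  # set when the line came from the Last Bank


def choose_victim(lru_stack):
    """Pick the line to evict from one set.

    `lru_stack[0]` is the MRU line and `lru_stack[-1]` the LRU line.
    The list is modified in place and the evicted line is returned.
    """
    if not lru_stack:
        raise ValueError("cannot evict from an empty set")
    # A prioritised line in the LRU position gets one extra chance:
    # clear its bit and move it to the MRU position.
    # Each iteration clears one bit, so the loop always terminates.
    while lru_stack[-1].priority:
        line = lru_stack.pop()
        line.priority = False
        lru_stack.insert(0, line)
    # Assumption: the line now in the LRU position is the victim.
    return lru_stack.pop()


if __name__ == "__main__":
    stack = [Line("@A"), Line("@B"), Line("@C"), Line("@D", priority=True)]
    victim = choose_victim(stack)
    print(victim.addr, [line.addr for line in stack])
```

With the stack from the figure, `@D` is kept and moved to the MRU position, and under the stated assumption `@C` is the line that leaves the bank.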

Results

Both optimizations increase the performance benefits of Last Bank.

There is still room for improvement.

Adaptive filters will be analysed in future work.

Conclusions

Data movements provoke unfair replacements in HOT banks.

Last Bank reduces the access latency of promptly reused addresses.

Huge performance potential.

Two optimizations are proposed:

Selective Last Bank: Reduce pollution in Last Bank.

LRU Prioritising Last Bank: Maintain reused addresses in the NUCA cache.

Questions?
