Υποθετικός Πολυνηματισμός · If t t f S lti Th di B d PInfrastructure for...

Υποθετικός Πολυνηματισμός

Τα προβλήματα του παράλληλου προγραμματισμού

Εντοπισμός παραλληλισμούμ ς ρ η μ– χειροκίνητα (επισκόπηση)– αυτόματα (compiler)

Έκφραση παραλληλισμούl l l ( Pth d MPI)– low-level (π.χ. Pthreads, MPI)

– high-level (π.χ. OpenMP, Cilk, Intel TBB, Galois, UPC κ.λπ.)Απεικόνιση

– scheduling, creation, termination, etc.g, , ,– operating system, runtime system

Συγχρονισμός– εύκολος όπως ο coarse-grain

αποδοτικός όπως ο fine grain– αποδοτικός όπως ο fine-grain– deadlock-free– composable

Απαιτήσεις– κλιμακωσιμότητα– ευκολία προγραμματισμού– υψηλή παραγωγικότητα– ορθότηταορθότητα– architectural awareness

Parallelizing Compilersg p

Στοχεύουν σε «κανονικές» εφαρμογέςf l– for loops, arrays

– (παραδοασικά, HPC scientific codes…)

Συνήθως 2 περάσματα:Συνήθως, 2 περάσματα:– Είναι ένα loop «ασφαλές» για παραλληλοποίηση;

• στατική ανάλυση (dependence, alias)ή η ( p , )

– «Αξίζει» να παραλληλοποιηθεί;• cost model, profiling

Π δ ί I l CC SUIF 1 P l i PGI O 64– Παραδείγματα: Intel CC, SUIF-1, Polaris, PGI, Open64

Τ ί ό έΤι γίνεται με τις «ακανόνιστες» εφαρμογές;– ασαφείς/μη-σταθερές εξαρτήσεις (π.χ. input dependent)

«περίπλοκες» δομές– «περίπλοκες» δομές

Ενδεικτική κατηγοριοποίησηή ηγ ρ η η

Χαρακτηριστικά κώδικα

Παράδειγμα Πεδίο εφαρμογήςκώδικα

Arrays with direct addressing

for(i = 1; i<n; i++) {A[i] = B[i]*C[i];

• scientific codes• media applications

Arrays with indirect addressing

for(i = 1; i<n; i++) {z A[K[i]]

• circuit simulationsstructural mechanics modelingaddressing z = A[K[i]];

A[L[i]] = z + C[i];}

• structural mechanics modeling• molecular dynamics simulation• fluid flows

Recursive Data Structures (RDS): • trees

while(ptr=ptr->next) {ProcessElement(ptr->val);

• graph algorithms (shortest paths,minimum spanning trees, max flows)• simulations (N-body, graphics)

• lists• graphs• sets • hash-tables

• meshes (refinement, triangulation)• dynamic programming• data mining •hash tables

• ……

Υποθετικός Πολυνηματισμόςημ μSpeculative Multithreading or Thread-Level Speculation

επιτρέπει τη δημιουργία και εκτέλεση παράλληλων νημάτων παρά τις πιθανές εξαρτήσεις δεδομένων

ελπίζουμε στην μη-ύπαρξη εξαρτήσεων– «αισιόδοξος» παραλληλισμόςξ ς ρ η μ ς

Run Time

Compile Time

Parallelize without

d i fDetect

Commit

ModificationNo

detection of

dependency Violation

Squash and

Re-execution Yes

Παράδειγμαρ γμ

for(i=0; i<n; i++) { Iteration iIteration i+1

Thread 1 Thread 2 Thread 3

for(i=0; i<n; i++) {… = A[B[i]];…A[C[i]] = …

… = A[4]… = A[3]

… = A[5]

Iteration i+1Iteration i+2

RAW i l tiA[C[i]] …} A[5] = …

A[2] = …A[6] = …

RAW violation

… = A[8]… = A[4]

Iteration i+3Iteration i+4

Iteration i+2REDO

A[6] = …A[9] = …

… = A[5]

A[6] =A[6] = …

Γενικό Μοντέλο

1 main (non-speculative) thread– γηραιότερο

– το μόνο που επιτρέπεται να κάνει commit

N ≥ 0 speculative threads

σχέσεις predecessor-successor ανάμεσα στα threadsσχέσεις predecessor successor ανάμεσα στα threads– sequential semantics

οποιοδήποτε thread μπορεί να κάνει spawn κάποιοοποιοδήποτε thread μπορεί να κάνει spawn κάποιο άλλο (speculative) thread

Γενικό ΜοντέλοSequential Execution

TU0 TU1 TU2Non-speculative

Thread Unit

SP1 INIT

SP2TSP1

SP2 INIT

TSP2 TSP2 validate

validate

TSP1 validate

SP: spawning pointTSP: thread start point

Προγραμματιστικές Δομέςρ γρ μμ μ

ξεκινάμε από το σειριακό πρόγραμμα και εξάγουμε d d ή (ή ί )«ordered» νήματα (ή εργασίες )

Υποθετικός Πολυνηματισμόςημ μ

Το HW/SW (runtime system) παρέχει υποστήριξη για:h k i ti ώ ήcheckpointing των καταχωρητών στην αρχή της

εκτέλεσης μιας εργασίαςbuffering της υποθετικής κατάστασης πουbuffering της υποθετικής κατάστασης που δημιουργείται– π.χ. write-buffer, cache

λ θ λπαρακολούθηση των λειτουργιών μνήμης των εργασιών– παραβίαση εξαρτήσεων: αναίρεση των επιδράσεων της (πιο)παραβίαση εξαρτήσεων: αναίρεση των επιδράσεων της (πιο)

υποθετικής εργασίας και επανεκκίνηση• π.χ. απόρριψη write-buffer, restore registers

μη παραβίαση εξαρτήσεων: commit αποτελεσμάτων των– μη παραβίαση εξαρτήσεων: commit αποτελεσμάτων των εργασιών

• στη σειρά προγράμματος!

Παράδειγμα Ανίχνευσης Παραβιάσεωνρ γμ χ η ρ β

Επέκταση cache μεh b δ– timestamp («epoch number»): δείχνει τη σειρά στο σειριακό

πρόγραμμα– violation flagviolation flag

Επέκταση cache line με ειδικά bits– SL: κάποιο speculative load έχει προσπελάσει την cache linep χ ρ η– SM: η γραμμή έχει τροποποιηθεί υποθετικά

Σε κάθε write-invalidation, μια εργασία μαρκάρεται για ακύρωση αν:– η γραμμή είναι παρούσα στην cache της

SL bit ί t– το SL bit της είναι set– το «epoch number» του invalidator υποδεικνύει προγενέστερο

νήμα στη σειρά προγράμματοςήμ η ρ ρ γρ μμ

Processor 1Epoch 5

Processor 2Epoch 6

p = q = &xpoc 5…

②STORE *q = 2;

become_speculative()①LOAD a = *p;…③attempt commit();

p = q = &x

… ③attempt_commit();…

L1 Cache L1 Cache

Epoch # = 5

Violation? = FALSE

Epoch # = 6

X = 1 T F

Speculatively

Loaded?

Spec lati el

Violation? = FALSE

X = 1 T F

① Read request

Speculatively

Modified?

Processor 1Epoch 5

Processor 2Epoch 6

p = q = &xpoc 5…

②STORE *q = 2;

p = q = &x

L1 Cache L1 Cache

Epoch # = 5

Violation? = FALSE

Epoch # = 6

X = 1 T F

Speculatively

Loaded?

Spec lati el

Violation? = FALSE

X = 1 2 T T

X = 1 T F

① Read request

Speculatively

Modified?SL SM

X 1 2 T T

② Invalidation (Epoch #5) q

Processor 1Epoch 5

Processor 2Epoch 6

p = q = &xpoc 5…

②STORE *q = 2;

p = q = &x

L1 Cache L1 Cache

③Epoch # = 5

Violation? = FALSE

Epoch # = 6

X = 1 T F

Speculatively

Loaded?

Spec lati el

Violation? = TRUE

X = 1 2 T T

X = 1 T F

① Read request

Speculatively

Modified?SL SM

X 1 2 T T

② Invalidation (Epoch #5) q

Τι εξαρτήσεις μας νοιάζουν τελικά;ρ ή μ

TM vs TLS

αισιόδοξος συγχρονισμός

αισιόδοξος παραλληλισμός

TM vs TLS

TLS TM

Contention Management

Task Spawning

Result Buffering

TLS TM

Context Dependency

Buffering

Passing Violation Detection

Replay

Sequential Ordering Checkpointing

TransactionScheduling

TLS and TM share multiple hardware componentsTLS and TM share multiple hardware components

Βιβλιογραφία

Speculative Multithreading στο Hardwarep g[Hammond98] L. Hammond, M. Willey and K. Olukotun. “Data Speculation Support for a Chip Multiprocessor”. SIGOPS Oper. Syst. Rev., vol. 32, no. 5, pages 58–69, 1998.[Hydra] Hydra project at Stanford University http://ogun stanford edu/[Hydra] Hydra project at Stanford University. http://ogun.stanford.edu/[Iacoma] I-ACOMA group at University of Illinois at Urbana-Champaign. http://iacoma.cs.uiuc.edu/work/speculation.html[Krishnan99] V. Krishnan and J. Torrellas. “A Chip-Multiprocessor Architecture with Speculative Multithreading”. IEEE Trans. Comput., vol. 48, no. 9, pages 866–880, 1999.[Multiscalar] Multiscalar project at University of Wisconsin Madison http://pages cs wisc edu/ mscalar[Multiscalar] Multiscalar project at University of Wisconsin-Madison. http://pages.cs.wisc.edu/~mscalar[Renau05] J. Renau, J. Tuck, W. Liu, L. Ceze, K. Strauss and J. Torrellas. “Tasking with Out-of-Order Spawn in TLS Chip Multiprocessors: Microarchitecture and Compilation”. 19th annual International Conference on Supercomputing (ICS ’05).[Quinones05] C. Quinones, C. Madriles, J. Sanchez, P. Marcuello, A. Gonzalez, D. Tullsen. "Mitosis Compiler: An I f t t f S l ti Th di B d P C t ti Sli " ACM SIGPLAN fInfrastructure for Speculative Threading Based on Pre-Computation Slices" . ACM SIGPLAN conference on Programming language design and implementation (PLDI ‘05)[Sohi95] G. Sohi, S. Breach and T. Vijaykumar. “Multiscalar Processors”. 22nd Annual International Symposium on Computer architecture (ISCA ’95). [Stampede] STAMPede project at Carnegie-Mellon University. http://www.cs.cmu.edu/~stampede/[Steffan98] G. Steffan and T. Mowry. “The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization”. IEEE 4th International Symposium on High Performance Computer Architecture (HPCA ’98). [Steffan00] G. Steffan, C. Colohan, A. Zhai and T. Mowry. “A Scalable Approach to Thread-Level Speculation”. 27th Annual International Symposium on Computer Architecture (ISCA ’00).

Speculative Multithreading στο Softwarep g

[Berger09] E. Berger, T. Yang, T. Liu, G. Novark. "Grace: Safe Multithreaded Programming for C/C++“. 24th ACM SIGPLAN conference on Object oriented programming systems j p g g ylanguages and applications (OOPSLA’09).

[Kulkarni07] M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala and L. Chew. “Optimistic Parallelism Requires Abstractions”. ACM SIGPLAN 2007 conference on Programming language design and implementation (PLDI ‘07), June 2007

[Rauchwerger95] L. Rauchwerger and D. Padua. “The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization”. ACM SIGPLAN 1995 conference on Programming language design and implementation (PLDI ’95) p 218 232conference on Programming language design and implementation (PLDI 95). p.218-232, June 18-21, 1995, La Jolla, California, United States.

[Tian08] C. Tian, M. Feng, V. Nagarajan and R. Gupta. “Copy or Discard Execution Model for Speculative Parallelization on Multicores.” 41st IEEE/ACM International Symposium on p / y pMicroarchitecture (MICRO ’08), pages 330–341, Washington, DC, USA, 2008. IEEE Computer Society.

[von Praun07] C. von Praun, L. Ceze, C. Cascaval. “Implicit Parallelism with Ordered Transactions”. 2th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP’07).

Compiler Frameworks for Spec. Multithreadingg

[Bhowmik02] A. Bhowmik and M. Franklin. “A general compiler framework for speculative multithreading” 14th annual ACM symposium on Parallel algorithmsspeculative multithreading . 14th annual ACM symposium on Parallel algorithms and architectures (SPAA ’02).

[Liu06] W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss, J. Renau and J. Torrellas. “POSH: a TLS compiler that exploits program structure” 11th ACM SIGPLAN symposium onTLS compiler that exploits program structure . 11th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP ’06).

Parallelizing Compilersg p

[Allen02] R. Allen and K. Kennedy. “Optimizing Compilers for Modern Architectures: A Dependence-Based Approach”. Morgan Kaufmann, 2002.[Brandes97] T. Brandes, S. Chaumette, M. Counilh, J. Roman, A. Darte, F. Desprez and J. Mignot. “HPFIT: a set of integrated tools for the parallelization of applications using high performance Fortran. part I: HPFIT and the Transtool environment”. Parallel Computing, 23(1-2), 1997.[Bugnion96] E. Bugnion, J. Anderson, T. Mowry, M. Rosenblum and M. Lam. “Compiler-Directed Page Coloring for Multiprocessors”. International Symposium on Architectural Support for Programming g p y p pp g gLanguages and Operating Systems (ASPLOS ‘96). [Goff91] G. Goff, K. Kennedy and C. Tseng. “Practical Dependence Testing”. ACM SIGPLAN Conference in Programming Language Design and Implementation (PLDI ‘91). [Hall96] M. Hall, J. Anderson, S. Amarasinghe, B. Murphy, S. Liao, E. Bugnion and M. Lam. “Maximizing multiprocessor performance with the SUIF compiler” IEEE Computer 29(12) 1996multiprocessor performance with the SUIF compiler . IEEE Computer, 29(12), 1996.[ICC] Intel C/C++ compiler. http://software.intel.com/en-us/intel-compilers/[Ishihara06] M. Ishihara, H. Honda and M. Sato. “Development and implementation of an interactive parallelization assistance tool for OpenMP: iPat/OMP”. IEICE - Transactions on Information and Systems, E89-D(2), 2006.( ),[Kennedy91] K. Kennedy, K. McKinley, and C. Tseng. “Interactive parallel programming using the Parascopeeditor”. IEEE TPDS, 2(3), 1991.[Open64] Open64. http://www.open64.net[Padua86] D. Padua and M. Wolfe, “Advanced Compiler Optimizations for Supercomputers,” Comm. ACM,

l 29 1 184 1 201 D 1986vol. 29, pp. 1,184–1,201, Dec. 1986.[Padua93] D. Padua, R. Eigenmann, J. Hoeflinger, P. Petersen, P. Tu, S. Weatherford and K. Faigin. “Polaris: A new-generation parallelizing compiler for MPPs”. Technical report, In CSRD No. 1306. UIUC, 1993.[Wolfe89] M. Wolfe. “Optimizing Compilers for Supercomputers”. The MIT Press, 1989.

Transactional Memory & Spec. Multithreading

[Baek07] W. Baek, C. Cao Minh, M. Trautmann, C. Kozyrakis and K. Olukotun. “The OpenTM Transactional Application Programming Interface” 16thThe OpenTM Transactional Application Programming Interface . 16th International Conference on Parallel Architecture and Compilation Techniques (PACT ’07).

[G 08] R G H A R D M C Y W d Q Li “L SP TM A[Guo08] R. Guo, H. An, R. Dou, M. Cong, Y. Wang and Q. Li. “LogSPoTM: A Scalable Thread Level Speculation Model Based on Transactional Memory”. 13th Asia-Pacific Computer Systems Architecture Conference (ACSAC ’08).

[Porter09] L. Porter, B. Choi and D. Tullsen. “Mapping Out a Path from Hardware Transactional Memory to Speculative Multithreading”. 18th International Conference on Parallel Architectures and Compilation Techniques (PACT ’09).

[Yoo08] R. Yoo and H. Lee. “Helper Transactions: Enabling Thread-Level Speculation via A Transactional Memory System”. Workshop on Parallel p y y pExecution of Sequential Programs on Multicore Architectures (PESPMA ’08).

Υποθετικός Πολυνηματισμός · If t t f S lti Th di B d PInfrastructure for...

Documents

Transcript of Υποθετικός Πολυνηματισμός · If t t f S lti Th di B d PInfrastructure for...

Spectral Methods in Causal Dynamical Triangulations · Spectral graph analysis on CDT spatial slices Observation ... Usual integer value on regular spaces: e.g. D S(˝) = d on Rd

KPC Series NEW - KYOCERA Asia-Pacific · Turning, grooving , cut-off and threading styles available Polygon Taper Shank with High Rigidity and High Precision NEW ... ISO26623-1 Polygonal

Threading Building Blocks

Threading Building Blocks - uoc.gr

1. LATHE MACHINE - WordPress.com · Machine Tool Lab Ram Prashanth Reddy Page 1 1. LATHE MACHINE Perform Face Turning, centre drilling, Plane Turning, Taper Turning, Grooving, Threading,

SIMPLOT 246683 - Avocado Slices S/O · 246683 - Avocado Slices S/O Perfect slices of rich Hass avocado; Consistent pricing, quality and availability all year; Elevate your menu and

9. Aplicación de la Transformada de Laplace a los sistemas ... · 9. Aplicación de la Transformada de Laplace a los sistemas LTI (I) {} 0 X ( ) () ( )sxtedtTLxt t Xsst TL ≡≡

S-72.1110 2007 Luento7...2 22.11.2007 3 LTI-järjestelmät • Sijoittamalla heräte sekä arvattu vaste differentiaaliyhtälöön saadaan • Joten taajuudella f pyörivää osoitin

The Convolution Sum for Discrete-Time LTI Systems

V.S.B. ENGINEERING COLLEGE, KARUR Department of …heat, energy and hyper threading and also describes the merits, demerits and applications. 13.A computer system has a main memory

Lectura 1: Conceptos de Filtrado · H(z) incluye el circulo unitario, entonces la respuesta en frecuencia H(ω) del filtro digital LTI puede ser obtenida de la siguiente forma: Asumiendo

Oti l i lti fd ttOptical manipulation of edge state ... · PDF fileOti l i lti fd ttOptical manipulation of edge state transpppgort in topological insulators ... A 364 me364 meVnm,

Small Combination of Slices in BanachSpaces · 2019. 8. 16. · space with an isolated point. ... (2P)-properties is a recent topic which generated a lot of interest in the study

Sup ersymmetry , da rk matter and the LHC GGI Conference ... · Th e se regions c ou ld p ossibly lea d to d istin guisha ... j will dec a y p ro ducing in ge n e ral m u lti-lep

Identifying threading dislocation types in ammonothermally ...

WiFeS IRAF DATA R P - Intranet - ANU...dodar dosubsky doslitsky overscan or fitbias doskyframe dozero Combine shuffled slices dosubshuffled data cube Subtract a separate blank sky

The Convolution Sum for Discrete-Time LTI Systemsahouse/mirror/engi7824/course_notes_7824... · The Convolution Sum for DT LTI Systems The Convolution Sum for Discrete-Time LTI Systems

The Convolution Sum for Discrete-Time LTI Systemsahouse/engi7824/course_notes_7824_part6.pdfThe Convolution Sum for DT LTI Systems The Convolution Sum for Discrete-Time LTI Systems

Template-A5...tomato, cheese & anchovy sandwich Ingredients - 4 slices of white bread - 1 or 2 tomatoes (cut into slices) - 2 tbs of grated Parmesan cheese - 6 anchovies - paprika

Céline A.J. Godard,*, †, 1, Roxanna M. Smolowitz, - PBSInduction of cetacean cytochrome P4501A1 by β-naphthoflavone exposure of skin biopsy slices Short title (40 characters max):