
High Level Parallel Programming Language Compiling to a Cellular Automata Processing Model

Master’s thesis defense by

Martin Mortensen

November 9th 2007

Two exercises – two solutions

1. Provide a convincing argument for the correctness of the runtime Φ-function structure (page 45)

2. Provide a convincing argument for the guaranteed arrival of messages (page 60)

The question

A presentation of the thesis's most important ideas is requested. Specifically, please explain the path from program to running cellular automata and demonstrate it with examples. For each phase, please briefly account for the status of the implementation and any planned improvements.

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

What are cellular automata?

Environment
State
Neighborhood
Rule
Configuration
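To make the five ingredients above concrete, here is a minimal sketch (not from the thesis) of a one-dimensional, two-state cellular automaton in Java: the int array is the configuration, each cell reads a three-cell neighborhood, and the rule number encodes the update rule. The CAPM grid CA is two-dimensional and has far richer cell states, so this is purely illustrative.

// Minimal 1D cellular automaton: configuration, neighborhood, rule.
public class ElementaryCA {
    private final int[] cells;   // current configuration (0/1 states)
    private final int rule;      // Wolfram rule number, e.g. 110

    public ElementaryCA(int[] initial, int rule) {
        this.cells = initial.clone();
        this.rule = rule;
    }

    // One synchronous generation: every cell reads its neighborhood from
    // the old configuration and all results are written back at once.
    public void step() {
        int n = cells.length;
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            int left  = cells[(i - 1 + n) % n];  // wrap-around environment
            int self  = cells[i];
            int right = cells[(i + 1) % n];
            int pattern = (left << 2) | (self << 1) | right;
            next[i] = (rule >> pattern) & 1;     // rule table lookup
        }
        System.arraycopy(next, 0, cells, 0, n);
    }

    public int[] configuration() { return cells.clone(); }
}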

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Hypothetical Situation

Don’t worry, we have an old Master’s thesis explaining how to utilize the processing power of cellular automata in an easy way.

If no alien with paint comes by

Dr. Buth A. Nist Makes cellular automata grow on trees.

Dr. Gene S. Blyce Creates a cellular automata bacteria.

Prof. Nanu Miq Anik Creates a self-replicating cellular automata nanorobot.

CEO Aqd Eave Memuri Creates cellular automata memory computers.

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Cellular Automata Processing Model

We know some Cellular Automata are Turing Complete

The question I have sought to answer is not about computability - the question is about usability.

Usability as:
1. Accessibility
2. Performance
3. Compared to other processing models
4. Hardware

CAPM – Q1: Accessibility

Can Cellular Automata be used as a general purpose processing model with an easy-to-use front-end?

Yes, an imperative programming language can be compiled to cellular automata and run by CAPM.

CAPM – Q2: Performance

Can the CA compiler achieve the expected performance gain?

Yes, provided we are realistic about what performance can be expected of CAPM, i.e. polynomial activation.

CAPM – Q3: Comparison

Can CAPM be an alternative to the Stack Based Random Access Processing Model (SBRAPM)?

Yes, tests of CAPM have shown that it can beat SBRAPM.

CAPM – Q4: Tipping point

When will a compiler outputting cellular automata be needed?
When cellular automata become abundant and very cheap to produce.
When super-technologies that are difficult to apply to the Von Neumann architecture turn out to be easily applied to cellular automata.

Concerns in CAPM

Same as in SBRAPM

Additional concerns – Compiler: parallelism, constant sized nodes

Additional concerns – Runtime: message passing, self-modifying cells, no central control, parallelism and double-buffered state information (see the sketch below)
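The double-buffered state information mentioned above can be illustrated with a small sketch; the class and method names are hypothetical and not the thesis's runtime. Every generation reads only from the old buffer and writes only to the new one, and the two buffers are swapped once the whole generation has been computed, so no cell ever observes a half-updated neighbor.

// Hypothetical sketch of double-buffered state for a 2D grid CA.
public class DoubleBufferedGrid {
    private int[][] current;  // state all cells read this generation
    private int[][] next;     // state all cells write this generation

    public DoubleBufferedGrid(int width, int height) {
        current = new int[height][width];
        next = new int[height][width];
    }

    public void generation() {
        for (int y = 0; y < current.length; y++) {
            for (int x = 0; x < current[y].length; x++) {
                next[y][x] = localRule(x, y);  // reads only from 'current'
            }
        }
        int[][] tmp = current;  // swap buffers after the full sweep
        current = next;
        next = tmp;
    }

    private int localRule(int x, int y) {
        // Placeholder rule: keep own state. A real CAPM cell would also
        // inspect neighbor states and pending messages.
        return current[y][x];
    }
}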

Primary Contributions

Cellular Automata Instruction Set: Instruction CA encapsulated in Grid CA

CA Message Passing: solution to static neighborhoods

Techniques for controlling the parallelism: evaluation wave, Φ-function control structures

Automatic non-sequential read of variables in a sequential read environment

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Phases

Programming Language & Parsing
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

CAPM – Overview (2)

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

Code example:

main() {
  var x, result;
  x = 1;
  result = 0;
  if (x == 1) {
    x = 42;
  } else {
    x = 1;
  }
  while (x > 1) {
    if (x / 2 == (x + 1) / 2) {  // true when x is even (integer division)
      x = x / 2;
    } else {
      x = x * 3 + 1;
    }
    result = result + 1;  // counts the Collatz-style steps
  }
  return result;
}

Status – Programming language

Current:
Basic computational structures: integer variables, comparative and arithmetic operators, if/else, while, and output. No dynamic allocation of memory. The parser does not handle illegal syntax.

Planned improvements:
Boolean algebra, functions, pointers, exceptions, basic data types (e.g. collections), parallel control structures. The parser should only accept legal syntax.

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

8 steps of sequentialization

1. Load and reload in while statements
   x = x;

2. Renaming
   x$2 = x;

3. Replace AVarExp with APhiExp
   x$2 = Φx

4. Associate each APhiExp with assignments
   x$2 = Φx[(<x$1, , >, <x$0, , >), ]

5. Initialize conditional paths
   x$2 = Φx[(<x$1, , , >, <x$0, , >), {<while0, true>}]
   x$1-Assignment: {<while0, true>, <if0, false>}
   x$2-Assignment: {<while0, true>}

6. Domination hierarchy
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, , >), {<while0, true>}]

7. Use conditions
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, <if1, true>, >), {<while0, true>}]

8. Clear conditions
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, <if1, true>, <if1, false>>), {<while0, true>}]

while (x > y) {
  x = y - 2;      //x$0
  if (y == 0)
    y = 5 + x;
  else
    x = 1;        //x$1
  y = y;
  y = y;
  x = x;          //load, x$2
  x = x;          //reload
}
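Step 2 above (renaming) is essentially classic SSA-style versioning of variables. A minimal sketch of such a renaming pass, assuming a per-variable version counter; the class is illustrative and not the thesis's implementation:

import java.util.HashMap;
import java.util.Map;

// Illustrative SSA-style renaming: each assignment to a variable x mints a
// fresh name x$0, x$1, ..., and every later read refers to the newest version.
class Renamer {
    private final Map<String, Integer> version = new HashMap<>();

    // Called for the left-hand side of an assignment: mints a new version.
    String define(String var) {
        int v = version.merge(var, 0, (old, zero) -> old + 1);
        return var + "$" + v;
    }

    // Called for a use of the variable: refers to the current version.
    String use(String var) {
        Integer v = version.get(var);
        return v == null ? var : var + "$" + v;
    }
}

// Where control paths merge (after an if or around a while), a Φ-function
// later selects among the versions, as in steps 3–8 above.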

Demonstration

AST, pre and post sequentialization

Status – Sequentialization

Current:
A very pure conceptual solution. It works, but the structure is very rigid and the overhead large.

Planned improvements:
One Φ-function, multiple targets.
Minimize node overhead.

Phases

Programming Language Parallelization/sequentialization Translation of AST to constant sized

nodes Inserting nodes into grid Running the compiled program

MCALIS Compiler

AST to BTNF
Introduce Reload statements
Compile Φ-functions: compile each argument, adding value-, use- and clear-listeners, and associate the argument with its Φ-function (a structural sketch follows below)
Introduce clear nodes in while statements
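The Φ-function compilation step can be pictured with a hypothetical data structure in which every argument carries the three listener kinds named above; all class names here are illustrative sketches, not the thesis's MCALIS representation:

import java.util.ArrayList;
import java.util.List;

// Hypothetical layout of a compiled Φ-function: each argument holds
// value-, use- and clear-listeners and is associated with its Φ-function.
final class PhiFunction {
    static final class Argument {
        final String versionedName;                     // e.g. "x$1"
        final List<Runnable> valueListeners = new ArrayList<>();
        final List<Runnable> useListeners   = new ArrayList<>();
        final List<Runnable> clearListeners = new ArrayList<>();
        Argument(String versionedName) { this.versionedName = versionedName; }
    }

    final String target;                                // e.g. "x$2"
    final List<Argument> arguments = new ArrayList<>();

    PhiFunction(String target) { this.target = target; }

    Argument addArgument(String versionedName) {
        Argument a = new Argument(versionedName);
        arguments.add(a);                               // associate with this Φ
        return a;
    }
}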

While loop

Status – MCALIS Compiler

Current:
Simple implementation. No balancing or optimizations. Only the main function.

Planned improvements:
Perform balancing and optimizations of the instruction graph. Support multiple functions. Support parameterized definitions of configurations.

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

Insert Strategies – Design concerns

Minimize MPP distance sum (see the sketch after this list)
Minimize congestion
Linear time complexity of the compiler (SBPM linear, CAPM linear)
Linear space complexity of the compiler
The resulting structure should not introduce too large a grid size overhead
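To make the first concern measurable: if each node is placed at a grid coordinate and every pair of communicating nodes contributes the length of its message path, the insert strategy should keep the sum of those lengths small. A sketch under the assumption that path length is Manhattan distance on the grid (the thesis's MPP metric may be defined differently); the types are illustrative:

import java.util.List;
import java.util.Map;

// Hedged sketch: score a candidate placement by the summed Manhattan
// distance between nodes that exchange messages.
final class PlacementScore {
    record Point(int x, int y) {}
    record Edge(String from, String to) {}

    static int distanceSum(Map<String, Point> placement, List<Edge> messageEdges) {
        int sum = 0;
        for (Edge e : messageEdges) {
            Point a = placement.get(e.from());
            Point b = placement.get(e.to());
            sum += Math.abs(a.x() - b.x()) + Math.abs(a.y() - b.y());
        }
        return sum;
    }
}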

No Loop Test

main() {
  var v00, vr00, v01, vr01, v10, vr10, v11, vr11;
  vr00 = 1;
  vr01 = 1;
  vr10 = 1;
  vr11 = 1;
  v00 = 1 - (1 - vr00*vr00) * (1 - vr01*vr10);
  v01 = 1 - (1 - vr00*vr01) * (1 - vr01*vr11);
  v10 = 1 - (1 - vr10*vr00) * (1 - vr11*vr10);
  v11 = 1 - (1 - vr10*vr01) * (1 - vr11*vr11);
  return 1;
}
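A reading of the benchmark (not stated on the slide, but implied by the 0/1 values): for a, b in {0, 1}, a*b is logical AND and 1 - (1 - a)(1 - b) is logical OR, so each assignment computes v_ij = (vr_i0 AND vr_0j) OR (vr_i1 AND vr_1j), i.e. one boolean squaring of a 2 by 2 matrix (a reachability step). The larger matrices in the charts below are presumably bigger instances of the same pattern. Written directly with booleans in Java:

// Boolean matrix squaring; equivalent to the 0/1 arithmetic of the test.
public class BoolMatrixSquare {
    public static boolean[][] square(boolean[][] a) {
        int n = a.length;
        boolean[][] r = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    r[i][j] |= a[i][k] && a[k][j];  // OR of ANDs
        return r;
    }
}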

[Chart: SCI insert strategy – Generations (0–9000) by Matrix, 4 by 4 to 14 by 14]

[Chart: IOZD insert strategy – Generations (0–10000) by Matrix, 4 by 4 to 14 by 14]

[Chart: IOESD insert strategy – Generations (0–3500) by Matrix, 4 by 4 to 12 by 12]

[Chart: COBD insert strategy – Generations (0–4000) by Matrix, 4 by 4 to 14 by 14]

Performance of strategies

[Chart: No Loop Comparison – Amount (0–30000) by Matrix, 4 by 4 to 14 by 14; series: SBPM, CAPM Simple Chronological Insert, CAPM COBD Insert, CAPM Simple Random Insert, CAPM IOZD, CAPM IOESD]

2 iteration test:

main() {
  var counter, v00, vr00, v01, vr01, v10, vr10, v11, vr11;
  v00 = 1;
  v01 = 1;
  v10 = 1;
  v11 = 1;
  counter = 2;
  while (counter > 0) {
    vr00 = v00;
    vr01 = v01;
    vr10 = v10;
    vr11 = v11;
    v00 = 1 - (1 - vr00*vr00) * (1 - vr01*vr10);
    v01 = 1 - (1 - vr00*vr01) * (1 - vr01*vr11);
    v10 = 1 - (1 - vr10*vr00) * (1 - vr11*vr10);
    v11 = 1 - (1 - vr10*vr01) * (1 - vr11*vr11);
    counter = counter - 1;
  }
  return 1;
}

2 iteration test – CAPM vs. SBPM

[Chart: 2 Loop Iterations, SBPM vs. optimized CAPM – Amount (0–40000) by Matrix, 4 by 4 to 14 by 14; series: SBPM, CAPM with COBD Insert, CAPM with IOESD Insert]

Insert strategy - conclusion

Fairly simple to increase the efficiency of the CALIS topology
Quite heavy algorithms
COBD:
  Doubles the CALCA amount
  Halves processing time (= generations)
  Reduces (almost eliminates) congestion
  Needs optimizations to become a practical compiler strategy

Status – CALIS Compiler

Current:
Simple insert strategies
Simple synchronization

Planned improvements:
Generic environments
Better insert strategies (both the strategies themselves and their optimization)

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

Running the compiled program

UpdateGrid
UpdateMainCellularAutomaton*
UpdateNode
HandleIngoingMsg
HandleOutgoingMsg
MessagePass
MessageSwap
IntroduceCALIS_MsgToGrid
SynchronizeSubCellularAutomaton*
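A minimal skeleton of how these phases might fit together in the Java runtime; only the phase names are taken from the slide, while the grouping, the empty bodies, and the hint comments are guesses:

// Sketch of a per-generation update using the phase names listed above.
public class CapmRuntime {
    public void updateGrid() {
        updateMainCellularAutomaton();      // * possibly once per main CA
        synchronizeSubCellularAutomaton();  // * possibly once per sub CA
    }

    private void updateMainCellularAutomaton() {
        updateNode();
        handleIngoingMsg();
        handleOutgoingMsg();
        messagePass();
        messageSwap();
        introduceCALIS_MsgToGrid();
    }

    private void updateNode() { /* cell-local instruction work */ }
    private void handleIngoingMsg() { /* consume arrived messages */ }
    private void handleOutgoingMsg() { /* queue messages to send */ }
    private void messagePass() { /* move messages one step on the grid */ }
    private void messageSwap() { /* swap message buffers (double buffering) */ }
    private void introduceCALIS_MsgToGrid() { /* inject new CALIS messages */ }
    private void synchronizeSubCellularAutomaton() { /* sync sub-CA state */ }
}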

Demonstration

Runtime

Status – Runtime

Current:
Simple Java implementation
Not optimized at all

Planned improvements:
Optimize the VM
Implement a GPU CAPM-VM

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Conclusion

CAPM is a practical processing model and a credible alternative to SBRAPM.

Much more research is required for CAPM to reach maturity. Many open research areas remain:
CA dynamic (active) memory allocation (functions, pointers) and extending the language base in general.
GPU implementation.
Message passing protocols.
Controlling CA parallelism.
Insert strategies.
Deconstructing the runtime algorithm.
And many more.

Thank you for your time.

Questions?

Bonus: Solutions to the two “exercises”

Correctness of the Φ-function control structure:
  Assign outside while
  Assign inside while
  Clear of state information
Main problem: Use Conditions
  Send/Receive simultaneously
  Use event received after Clear and EvalComp
Solution: introduce a Load before the while. This solves all the issues and gives better performance.

Φ-function Structure

CAL_MessagePassingProtocol

Guaranteed Message Arrival:
At least one message gets closer to the diagonal or its target each generation, and no messages get lost. Since the summed distance of all messages in flight is a non-negative integer that decreases by at least one per generation, every message arrives after finitely many generations.

Normal TriNodeSwitch

Inserting Strategies

Simple Insert: Random, Chronological
Instruction Oriented Dijkstra SSSP: Zoned, Expanding square
Cell Oriented Border Dijkstra SSSP
Cellular Automata Insert
Local Swap Convergence

COBD

The least-weight cell gets the node inserted.
Idea: fill the cell that adds the least weight.
Cells are considered as candidates if they have a neighbor holding a node.
Node candidates are chosen as in the CALIS oriented insert algorithms.

C C C

C C X C

C X X C

C X X C

C C C C
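A rough sketch of the insert step described above; the Grid, Cell and Node types and the weight function are placeholders, not the thesis's implementation. Among all empty cells that have an occupied neighbor (the 'C' cells in the figure), the node is placed in the one with the lowest weight.

import java.util.List;

// Hedged sketch of one COBD insert step.
final class CobdInsert {
    interface Node {}
    interface Cell { int weight(Node node); }               // cost of placing node here
    interface Grid {
        List<Cell> emptyCellsWithOccupiedNeighbor();        // the candidate 'C' cells
        void place(Node node, Cell cell);
    }

    static void insert(Grid grid, Node node) {
        Cell best = null;
        int bestWeight = Integer.MAX_VALUE;
        for (Cell c : grid.emptyCellsWithOccupiedNeighbor()) {
            int w = c.weight(node);                         // e.g. added MPP distance
            if (w < bestWeight) { bestWeight = w; best = c; }
        }
        if (best != null) grid.place(node, best);
    }
}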

Optimizing Position Optimizers – COBD

Optimizations to COBD:
Let cells remember their lowest-weight node.
When a node is inserted, let cells that had that node as their candidate find a new lowest-weight node.
Cells only check newly introduced nodes as new candidates.
These simple optimizations greatly improve performance (from 3 days down to 15 minutes).

The current version still lacks optimizations:
The associations from CALIS nodes to cells, the queue, and node removal are inefficiently implemented, and the code needs general trimming.
Cells do not re-calculate their candidate when a neighbor of the candidate is inserted.

COBD Performance (1)

[Chart: 2 Loop Iterations, COBD as a percentile of SCI (0%–300%) by Matrix, 4 by 4 to 14 by 14; series: Generations, #CALCA, Avg. TT, Observed worst case vs. empty-grid worst case]

COBD Performance (2)

[Chart: COBD Travel Time – Travel Time (TT, 0–350) by Matrix, 4 by 4 to 14 by 14; series: Average TT, 40% with TT <, 60% with TT <, 80% with TT <, 95% with TT <]

COBD Performance (3)

[Chart: Average distance – MPP distance (0–140) by Matrix, 4 by 4 to 14 by 14; series: Avg. Distance Sum, Avg. TT, Avg. Distance Sum div 4]