
High Level Parallel Programming Language Compiling to a Cellular Automata Processing Model

Master’s thesis defense by

Martin Mortensen

November 9th 2007

Two exercises – two solutions

1. Provide a convincing argument for the correctness of the runtime Φ-function structure (page 45)

2. Provide a convincing argument for the guaranteed arrival of messages (page 60)

The question

A presentation of the thesis's most important ideas is requested. Specifically, please explain the path from program to running cellular automata and demonstrate it with examples. For each phase, please briefly account for the status of the implementation and any planned improvements.

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

What are cellular automata?

Environment
State
Neighborhood
Rule
Configuration
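To make the five ingredients above concrete, here is a minimal sketch (not from the thesis) of a one-dimensional, two-state cellular automaton in Java: the int array is the configuration, each cell reads a three-cell neighborhood, and the rule number encodes the update rule. The CAPM grid CA is two-dimensional and has far richer cell states, so this is purely illustrative.

// Minimal 1D cellular automaton: configuration, neighborhood, rule.
public class ElementaryCA {
    private final int[] cells;   // current configuration (0/1 states)
    private final int rule;      // Wolfram rule number, e.g. 110

    public ElementaryCA(int[] initial, int rule) {
        this.cells = initial.clone();
        this.rule = rule;
    }

    // One synchronous generation: every cell reads its neighborhood from
    // the old configuration and all results are written back at once.
    public void step() {
        int n = cells.length;
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            int left  = cells[(i - 1 + n) % n];  // wrap-around environment
            int self  = cells[i];
            int right = cells[(i + 1) % n];
            int pattern = (left << 2) | (self << 1) | right;
            next[i] = (rule >> pattern) & 1;     // rule table lookup
        }
        System.arraycopy(next, 0, cells, 0, n);
    }

    public int[] configuration() { return cells.clone(); }
}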

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Hypothetical Situation

Don’t worry, we have an old Master’s thesis explaining how to utilize the processing power of cellular automata in an easy way.

If no alien with paint comes by

Dr. Buth A. Nist Makes cellular automata grow on trees.

Dr. Gene S. Blyce Creates a cellular automata bacteria.

Prof. Nanu Miq Anik Creates a self-replicating cellular automata nanorobot.

CEO Aqd Eave Memuri Creates cellular automata memory computers.

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Cellular Automata Processing Model

We know some Cellular Automata are Turing Complete

The question I have sought to answer is not about computability - the question is about usability.

Usability as:
1. Accessibility
2. Performance
3. Compared to other processing models
4. Hardware

CAPM – Q1: Accessibility

Can Cellular Automata be used as a general purpose processing model with an easy-to-use front-end?

Yes, an imperative programming language can be compiled to cellular automata and run by CAPM.

CAPM – Q2: Performance

Can the CA compiler achieve the expected performance gain?

Yes, provided we are realistic about what performance can be expected of CAPM, i.e. polynomial activation.

CAPM – Q3: Comparison

Can CAPM be an alternative to the Stack Based Random Access Processing Model (SBRAPM)?

Yes, tests of CAPM have shown that it can beat SBRAPM.

CAPM – Q4: Tipping point

When will a compiler outputting cellular automata be needed?
When cellular automata become abundant and very cheap to produce.
When super-technologies that are difficult to apply to the Von Neumann architecture turn out to be easily applied to cellular automata.

Concerns in CAPM

Same as in SBRAPM

Additional concerns – Compiler: parallelism, constant sized nodes

Additional concerns – Runtime: message passing, self-modifying cells, no central control, parallelism and double-buffered state information (see the sketch below)
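The double-buffered state information mentioned above can be illustrated with a small sketch; the class and method names are hypothetical and not the thesis's runtime. Every generation reads only from the old buffer and writes only to the new one, and the two buffers are swapped once the whole generation has been computed, so no cell ever observes a half-updated neighbor.

// Hypothetical sketch of double-buffered state for a 2D grid CA.
public class DoubleBufferedGrid {
    private int[][] current;  // state all cells read this generation
    private int[][] next;     // state all cells write this generation

    public DoubleBufferedGrid(int width, int height) {
        current = new int[height][width];
        next = new int[height][width];
    }

    public void generation() {
        for (int y = 0; y < current.length; y++) {
            for (int x = 0; x < current[y].length; x++) {
                next[y][x] = localRule(x, y);  // reads only from 'current'
            }
        }
        int[][] tmp = current;  // swap buffers after the full sweep
        current = next;
        next = tmp;
    }

    private int localRule(int x, int y) {
        // Placeholder rule: keep own state. A real CAPM cell would also
        // inspect neighbor states and pending messages.
        return current[y][x];
    }
}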

Primary Contributions

Cellular Automata Instruction Set: Instruction CA encapsulated in Grid CA

CA Message Passing: solution to static neighborhoods

Techniques for controlling the parallelism: evaluation wave, Φ-function control structures

Automatic non-sequential read of variables in a sequential read environment

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Phases

Programming Language & Parsing
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

CAPM – Overview (2)

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

Code example:

main() {
  var x, result;
  x = 1;
  result = 0;
  if (x == 1) {
    x = 42;
  } else {
    x = 1;
  }
  while (x > 1) {
    if (x / 2 == (x + 1) / 2) {  // true when x is even (integer division)
      x = x / 2;
    } else {
      x = x * 3 + 1;
    }
    result = result + 1;  // counts the Collatz-style steps
  }
  return result;
}

Status – Programming language

Current:
Basic computational structures: integer variables, comparative and arithmetic operators, if/else, while, and output. No dynamic allocation of memory. The parser does not handle illegal syntax.

Planned improvements:
Boolean algebra, functions, pointers, exceptions, basic data types (e.g. collections), parallel control structures. The parser should only accept legal syntax.

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

8 steps of sequentialization

1. Load and reload in while statements
   x = x;

2. Renaming
   x$2 = x;

3. Replace AVarExp with APhiExp
   x$2 = Φx

4. Associate each APhiExp with assignments
   x$2 = Φx[(<x$1, , >, <x$0, , >), ]

5. Initialize conditional paths
   x$2 = Φx[(<x$1, , , >, <x$0, , >), {<while0, true>}]
   x$1-Assignment: {<while0, true>, <if0, false>}
   x$2-Assignment: {<while0, true>}

6. Domination hierarchy
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, , >), {<while0, true>}]

7. Use conditions
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, <if1, true>, >), {<while0, true>}]

8. Clear conditions
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, <if1, true>, <if1, false>>), {<while0, true>}]

while (x > y) {
  x = y - 2;      //x$0
  if (y == 0)
    y = 5 + x;
  else
    x = 1;        //x$1
  y = y;
  y = y;
  x = x;          //load, x$2
  x = x;          //reload
}
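Step 2 above (renaming) is essentially classic SSA-style versioning of variables. A minimal sketch of such a renaming pass, assuming a per-variable version counter; the class is illustrative and not the thesis's implementation:

import java.util.HashMap;
import java.util.Map;

// Illustrative SSA-style renaming: each assignment to a variable x mints a
// fresh name x$0, x$1, ..., and every later read refers to the newest version.
class Renamer {
    private final Map<String, Integer> version = new HashMap<>();

    // Called for the left-hand side of an assignment: mints a new version.
    String define(String var) {
        int v = version.merge(var, 0, (old, zero) -> old + 1);
        return var + "$" + v;
    }

    // Called for a use of the variable: refers to the current version.
    String use(String var) {
        Integer v = version.get(var);
        return v == null ? var : var + "$" + v;
    }
}

// Where control paths merge (after an if or around a while), a Φ-function
// later selects among the versions, as in steps 3–8 above.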

Demonstration

AST, pre and post sequentialization

Status – Sequentialization

Current:
A very pure conceptual solution. It works, but the structure is very rigid and the overhead large.

Planned improvements:
One Φ-function, multiple targets.
Minimize node overhead.

Phases

Programming Language Parallelization/sequentialization Translation of AST to constant sized

nodes Inserting nodes into grid Running the compiled program

MCALIS Compiler

AST to BTNF
Introduce Reload statements
Compile Φ-functions: compile each argument, adding value-, use- and clear-listeners, and associate the argument with its Φ-function (a structural sketch follows below)
Introduce clear nodes in while statements
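The Φ-function compilation step can be pictured with a hypothetical data structure in which every argument carries the three listener kinds named above; all class names here are illustrative sketches, not the thesis's MCALIS representation:

import java.util.ArrayList;
import java.util.List;

// Hypothetical layout of a compiled Φ-function: each argument holds
// value-, use- and clear-listeners and is associated with its Φ-function.
final class PhiFunction {
    static final class Argument {
        final String versionedName;                     // e.g. "x$1"
        final List<Runnable> valueListeners = new ArrayList<>();
        final List<Runnable> useListeners   = new ArrayList<>();
        final List<Runnable> clearListeners = new ArrayList<>();
        Argument(String versionedName) { this.versionedName = versionedName; }
    }

    final String target;                                // e.g. "x$2"
    final List<Argument> arguments = new ArrayList<>();

    PhiFunction(String target) { this.target = target; }

    Argument addArgument(String versionedName) {
        Argument a = new Argument(versionedName);
        arguments.add(a);                               // associate with this Φ
        return a;
    }
}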

While loop

Status – MCALIS Compiler

Current:
Simple implementation. No balancing or optimizations. Only the main function.

Planned improvements:
Perform balancing and optimizations of the instruction graph. Support multiple functions. Support parameterized definitions of configurations.

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

Insert Strategies – Design concerns

Minimize MPP distance sum (see the sketch after this list)
Minimize congestion
Linear time complexity of the compiler (SBPM linear, CAPM linear)
Linear space complexity of the compiler
The resulting structure should not introduce too large a grid size overhead
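To make the first concern measurable: if each node is placed at a grid coordinate and every pair of communicating nodes contributes the length of its message path, the insert strategy should keep the sum of those lengths small. A sketch under the assumption that path length is Manhattan distance on the grid (the thesis's MPP metric may be defined differently); the types are illustrative:

import java.util.List;
import java.util.Map;

// Hedged sketch: score a candidate placement by the summed Manhattan
// distance between nodes that exchange messages.
final class PlacementScore {
    record Point(int x, int y) {}
    record Edge(String from, String to) {}

    static int distanceSum(Map<String, Point> placement, List<Edge> messageEdges) {
        int sum = 0;
        for (Edge e : messageEdges) {
            Point a = placement.get(e.from());
            Point b = placement.get(e.to());
            sum += Math.abs(a.x() - b.x()) + Math.abs(a.y() - b.y());
        }
        return sum;
    }
}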

No Loop Test

main() {
  var v00, vr00, v01, vr01, v10, vr10, v11, vr11;
  vr00 = 1;
  vr01 = 1;
  vr10 = 1;
  vr11 = 1;
  v00 = 1 - (1 - vr00*vr00) * (1 - vr01*vr10);
  v01 = 1 - (1 - vr00*vr01) * (1 - vr01*vr11);
  v10 = 1 - (1 - vr10*vr00) * (1 - vr11*vr10);
  v11 = 1 - (1 - vr10*vr01) * (1 - vr11*vr11);
  return 1;
}
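A reading of the benchmark (not stated on the slide, but implied by the 0/1 values): for a, b in {0, 1}, a*b is logical AND and 1 - (1 - a)(1 - b) is logical OR, so each assignment computes v_ij = (vr_i0 AND vr_0j) OR (vr_i1 AND vr_1j), i.e. one boolean squaring of a 2 by 2 matrix (a reachability step). The larger matrices in the charts below are presumably bigger instances of the same pattern. Written directly with booleans in Java:

// Boolean matrix squaring; equivalent to the 0/1 arithmetic of the test.
public class BoolMatrixSquare {
    public static boolean[][] square(boolean[][] a) {
        int n = a.length;
        boolean[][] r = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    r[i][j] |= a[i][k] && a[k][j];  // OR of ANDs
        return r;
    }
}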

[Chart: SCI insert strategy – Generations (0–9000) by Matrix, 4 by 4 to 14 by 14]

[Chart: IOZD insert strategy – Generations (0–10000) by Matrix, 4 by 4 to 14 by 14]

[Chart: IOESD insert strategy – Generations (0–3500) by Matrix, 4 by 4 to 12 by 12]

[Chart: COBD insert strategy – Generations (0–4000) by Matrix, 4 by 4 to 14 by 14]

Performance of strategies

[Chart: No Loop Comparison – Amount (0–30000) by Matrix, 4 by 4 to 14 by 14; series: SBPM, CAPM Simple Chronological Insert, CAPM COBD Insert, CAPM Simple Random Insert, CAPM IOZD, CAPM IOESD]

2 iteration test:

main() {
  var counter, v00, vr00, v01, vr01, v10, vr10, v11, vr11;
  v00 = 1;
  v01 = 1;
  v10 = 1;
  v11 = 1;
  counter = 2;
  while (counter > 0) {
    vr00 = v00;
    vr01 = v01;
    vr10 = v10;
    vr11 = v11;
    v00 = 1 - (1 - vr00*vr00) * (1 - vr01*vr10);
    v01 = 1 - (1 - vr00*vr01) * (1 - vr01*vr11);
    v10 = 1 - (1 - vr10*vr00) * (1 - vr11*vr10);
    v11 = 1 - (1 - vr10*vr01) * (1 - vr11*vr11);
    counter = counter - 1;
  }
  return 1;
}

2 iteration test – CAPM vs. SBPM

[Chart: 2 Loop Iterations, SBPM vs. optimized CAPM – Amount (0–40000) by Matrix, 4 by 4 to 14 by 14; series: SBPM, CAPM with COBD Insert, CAPM with IOESD Insert]

Insert strategy - conclusion

Fairly simple to increase the efficiency of the CALIS topology
Quite heavy algorithms
COBD:
  Doubles the CALCA amount
  Halves processing time (= generations)
  Reduces (almost eliminates) congestion
  Needs optimizations to become a practical compiler strategy

Status – CALIS Compiler

Current:
Simple insert strategies
Simple synchronization

Planned improvements:
Generic environments
Better insert strategies (both the strategies themselves and their optimization)

Phases

Programming Language
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program

Running the compiled program

UpdateGrid
UpdateMainCellularAutomaton*
UpdateNode
HandleIngoingMsg
HandleOutgoingMsg
MessagePass
MessageSwap
IntroduceCALIS_MsgToGrid
SynchronizeSubCellularAutomaton*
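A minimal skeleton of how these phases might fit together in the Java runtime; only the phase names are taken from the slide, while the grouping, the empty bodies, and the hint comments are guesses:

// Sketch of a per-generation update using the phase names listed above.
public class CapmRuntime {
    public void updateGrid() {
        updateMainCellularAutomaton();      // * possibly once per main CA
        synchronizeSubCellularAutomaton();  // * possibly once per sub CA
    }

    private void updateMainCellularAutomaton() {
        updateNode();
        handleIngoingMsg();
        handleOutgoingMsg();
        messagePass();
        messageSwap();
        introduceCALIS_MsgToGrid();
    }

    private void updateNode() { /* cell-local instruction work */ }
    private void handleIngoingMsg() { /* consume arrived messages */ }
    private void handleOutgoingMsg() { /* queue messages to send */ }
    private void messagePass() { /* move messages one step on the grid */ }
    private void messageSwap() { /* swap message buffers (double buffering) */ }
    private void introduceCALIS_MsgToGrid() { /* inject new CALIS messages */ }
    private void synchronizeSubCellularAutomaton() { /* sync sub-CA state */ }
}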

Demonstration

Runtime

Status – Runtime

Current:
Simple Java implementation
Not optimized at all

Planned improvements:
Optimize the VM
Implement a GPU CAPM-VM

Overview

Motivation
Goal
A breakdown of CAPM: description, status, planned improvements, demonstration
Conclusion

Conclusion

CAPM is a practical processing model and a credible alternative to SBRAPM.

Much more research is required for CAPM to reach maturity. Many open research areas remain:
CA dynamic (active) memory allocation (functions, pointers) and extending the language base in general.
GPU implementation.
Message passing protocols.
Controlling CA parallelism.
Insert strategies.
Deconstructing the runtime algorithm.
And many more.

Thank you for your time.

Questions?

Bonus: Solutions to the two “exercises”

Correctness of the Φ-function control structure:
  Assign outside while
  Assign inside while
  Clear of state information
Main problem: Use Conditions
  Send/Receive simultaneously
  Use event received after Clear and EvalComp
Solution: introduce a Load before the while. This solves all the issues and gives better performance.

Φ-function Structure

CAL_MessagePassingProtocol

Guaranteed Message Arrival:
At least one message gets closer to the diagonal or its target each generation, and no messages get lost. Since the summed distance of all messages in flight is a non-negative integer that decreases by at least one per generation, every message arrives after finitely many generations.

Normal TriNodeSwitch

Inserting Strategies

Simple Insert: Random, Chronological
Instruction Oriented Dijkstra SSSP: Zoned, Expanding square
Cell Oriented Border Dijkstra SSSP
Cellular Automata Insert
Local Swap Convergence

COBD

The least-weight cell gets the node inserted.
Idea: fill the cell that adds the least weight.
Cells are considered as candidates if they have a neighbor holding a node.
Node candidates are chosen as in the CALIS oriented insert algorithms.

C C C

C C X C

C X X C

C X X C

C C C C
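A rough sketch of the insert step described above; the Grid, Cell and Node types and the weight function are placeholders, not the thesis's implementation. Among all empty cells that have an occupied neighbor (the 'C' cells in the figure), the node is placed in the one with the lowest weight.

import java.util.List;

// Hedged sketch of one COBD insert step.
final class CobdInsert {
    interface Node {}
    interface Cell { int weight(Node node); }               // cost of placing node here
    interface Grid {
        List<Cell> emptyCellsWithOccupiedNeighbor();        // the candidate 'C' cells
        void place(Node node, Cell cell);
    }

    static void insert(Grid grid, Node node) {
        Cell best = null;
        int bestWeight = Integer.MAX_VALUE;
        for (Cell c : grid.emptyCellsWithOccupiedNeighbor()) {
            int w = c.weight(node);                         // e.g. added MPP distance
            if (w < bestWeight) { bestWeight = w; best = c; }
        }
        if (best != null) grid.place(node, best);
    }
}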

Optimizing Position Optimizers – COBD

Optimizations to COBD:
Let cells remember their lowest-weight node.
When a node is inserted, let cells that had that node as their candidate find a new lowest-weight node.
Cells only check newly introduced nodes as new candidates.
These simple optimizations greatly improve performance (from 3 days down to 15 minutes).

The current version still lacks optimizations:
The associations from CALIS nodes to cells, the queue, and node removal are inefficiently implemented, and the code needs general trimming.
Cells do not re-calculate their candidate when a neighbor of the candidate is inserted.

COBD Performance (1)

[Chart: 2 Loop Iterations, COBD as a percentile of SCI (0%–300%) by Matrix, 4 by 4 to 14 by 14; series: Generations, #CALCA, Avg. TT, Observed worst case vs. empty-grid worst case]

COBD Performance (2)

[Chart: COBD Travel Time – Travel Time (TT, 0–350) by Matrix, 4 by 4 to 14 by 14; series: Average TT, 40% with TT <, 60% with TT <, 80% with TT <, 95% with TT <]

COBD Performance (3)

[Chart: Average distance – MPP distance (0–140) by Matrix, 4 by 4 to 14 by 14; series: Avg. Distance Sum, Avg. TT, Avg. Distance Sum div 4]