High Level Parallel Programming Language Compiling to a Cellular Automata Processing Model
Master’s thesis defense by
Martin Mortensen
November 9th 2007
Two exercises – two solutions
1. Provide convincing argument for correctness of runtime Φ-function structure (page 45)
2. Provide convincing argument for guaranteed arrival of messages (page 60)
The question
A presentation of the thesis's most important ideas is requested.
Specifically, please explain the path from program to running cellular automata
and demonstrate it with examples.
For each phase, please briefly account for the status of the
implementation and any planned improvements.
Overview
Motivation
Goal
A breakdown of CAPM
  Description
  Status
  Planned improvements
  Demonstration
Conclusion
What are cellular automata?
Environment
State
Neighborhood
Rule
Configuration
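The five terms on this slide can be made concrete with a very small example. The sketch below is not taken from the thesis: it assumes a one-dimensional ring of cells and uses the well-known elementary rule 90 (XOR of the two outer neighbors) purely as an illustration. All names (`WIDTH`, `neighborhood`, `rule`, `step`) are hypothetical.

```python
# Minimal one-dimensional cellular automaton (illustrative assumption,
# not the grid model used by CAPM).

STATES = (0, 1)   # State: each cell holds 0 or 1
WIDTH = 8         # Environment: a ring of 8 cells


def neighborhood(config, i):
    """Neighborhood: the cell itself plus its left and right neighbors."""
    n = len(config)
    return config[(i - 1) % n], config[i], config[(i + 1) % n]


def rule(left, center, right):
    """Rule: XOR of the two outer neighbors (elementary rule 90)."""
    return left ^ right


def step(config):
    """Compute the next configuration; every cell reads only the old one."""
    return [rule(*neighborhood(config, i)) for i in range(len(config))]


# Configuration: the state of every cell at one generation.
config = [0, 0, 0, 1, 0, 0, 0, 0]
history = [config]
for _ in range(3):
    config = step(config)
    history.append(config)

for row in history:
    print("".join("#" if c else "." for c in row))
```

Each printed row is one configuration; the rule and neighborhood are fixed, and only the states change from generation to generation.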
Hypothetical Situation
Don’t worry, we have an old Master’s thesis explaining how to utilize the processing power of cellular automata in an easy way.
If no alien with paint comes by
Dr. Buth A. Nist: Makes cellular automata grow on trees.
Dr. Gene S. Blyce: Creates a cellular automata bacteria.
Prof. Nanu Miq Anik: Creates a self-replicating cellular automata nano robot.
CEO Aqd Eave Memuri: Creates cellular automata memory computers.
Cellular Automata Processing Model
We know some Cellular Automata are Turing Complete
The question I have sought to answer is not about computability - the question is about usability.
Usability as:
1. Accessibility
2. Performance
3. Compared to other processing models
4. Hardware
CAPM – Q1: Accessibility
Can Cellular Automata be used as a general purpose processing model with an easy-to-use front-end?
Yes, an imperative programming language can be compiled to Cellular automata and be run by CAPM.
CAPM – Q2: Performance
Can the CA compiler achieve the expected performance gain?
Yes, provided we are clear about what performance can be expected of CAPM, i.e. polynomial activation.
CAPM – Q3: Comparison
Can CAPM be an alternative to the Stack Based Random Access Processing Model (SBRAPM)?
Yes, tests of CAPM have shown that it can beat SBRAPM.
CAPM – Q4: Tipping point
When will a compiler outputting cellular automata be needed?
When cellular automata become abundant and very cheap to produce.
When super-technologies that are difficult to apply to the Von Neumann architecture turn out to be easily applied to cellular automata.
Concerns in CAPM
Same as in SBRAPM
Additional concerns – Compiler
  Parallelism
  Constant sized nodes
Additional concerns – Runtime
  Message passing
  Self-modifying cells
  No central control
  Parallelism and double-buffered state information
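Why double-buffered state matters for a parallel update can be shown in a few lines. The sketch below is an illustration under assumptions, not the CAPM runtime: it uses a hypothetical one-dimensional ring with an XOR rule and compares a double-buffered step with a faulty in-place step, where cells updated earlier in the loop leak their new values to their neighbors.

```python
def step_double_buffered(cells):
    """Read generation t from `cells`, write generation t+1 to a new buffer."""
    n = len(cells)
    nxt = [0] * n
    for i in range(n):
        nxt[i] = cells[(i - 1) % n] ^ cells[(i + 1) % n]
    return nxt


def step_in_place(cells):
    """Faulty variant: overwrites values that later cells still need to read."""
    n = len(cells)
    cells = list(cells)  # copy so the caller's list is not modified
    for i in range(n):
        cells[i] = cells[(i - 1) % n] ^ cells[(i + 1) % n]
    return cells


start = [0, 0, 0, 1, 0, 0, 0, 0]
buffered = step_double_buffered(start)
in_place = step_in_place(start)
print("double-buffered:", buffered)
print("in place:       ", in_place)
```

The two results differ because the in-place loop mixes values from generation t and t+1, which is exactly what a second buffer prevents.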
Primary Contributions
Cellular Automata Instruction Set
  Instruction Set
  Instruction CA encapsulated in Grid CA
CA Message Passing
  Solution to static neighborhoods
Techniques for controlling the parallelism
  Evaluation wave
  Φ-function control structures
  Automatic non-sequential read of variables in a sequential read environment
Phases
Programming Language & Parsing
Parallelization/sequentialization
Translation of AST to constant sized nodes
Inserting nodes into grid
Running the compiled program
Code example
main() {
  var x, result;
  x = 1;
  result = 0;
  if (x == 1) {
    x = 42;
  }
  else {
    x = 1;
  }
  while (x > 1) {
    if (x / 2 == (x + 1) / 2) {
      x = x / 2;
    }
    else {
      x = x * 3 + 1;
    }
    result = result + 1;
  }
  return result;
}
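To see what the example program computes, it can be traced in an ordinary sequential language. The Python transcription below assumes that `/` in the source language is integer division, which matches the slide's statement that variables are integers but is not confirmed by it. Under that assumption, `x / 2 == (x + 1) / 2` is an evenness test, and the loop counts the steps of the 3x + 1 iteration starting from 42.

```python
def main():
    x = 1
    result = 0
    if x == 1:
        x = 42
    else:
        x = 1
    while x > 1:
        # With integer division, x // 2 == (x + 1) // 2 holds exactly
        # when x is even, so no modulo operator is needed.
        if x // 2 == (x + 1) // 2:
            x = x // 2
        else:
            x = x * 3 + 1
        result = result + 1
    return result


print(main())  # prints 8: 42, 21, 64, 32, 16, 8, 4, 2, 1
```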
Status – Programming language
Current
  Basic computational structures.
  Integer variables, comparative and arithmetic operators, if-else, while and output.
  No dynamic allocation of memory.
  Parser does not handle illegal syntax.
Planned improvements
  Boolean algebra
  Functions
  Pointers
  Exceptions
  Basic data types, e.g. collection
  Parallel control structures
  Parser should only accept legal syntax.
8 steps of sequentialization
1. Load and reload in while statements
   x = x;
2. Renaming
   x$2 = x;
3. Replace AVarExp with APhiExp
   x$2 = Φx
4. Associate each APhiExp with assignments
   x$2 = Φx[(<x$1, , >, <x$0, , >), ]
5. Initialize conditional paths
   x$2 = Φx[(<x$1, , , >, <x$0, , >), {<while0, true>}]
   x$1-Assignment: {<while0, true>, <if0, false>}
   x$2-Assignment: {<while0, true>}
6. Domination hierarchy
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, , >), {<while0, true>}]
7. Use conditions
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, <if1, true>, >), {<while0, true>}]
8. Clear conditions
   x$2 = Φx[(<x$1, , , >, <x$0, {x$1}, <if1, true>, <if1, false>>), {<while0, true>}]
Example:
while (x > y) {
  x = y - 2; //x$0
  if (y == 0)
    y = 5 + x;
  else
    x = 1; //x$1
  y = y;
  y = y;
  x = x; //load, x$2
  x = x; //reload
}
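The role of the Φ-function in the example can be illustrated with a plain sequential stand-in. The sketch below is an assumption-laden simplification, not the runtime structure from the thesis: it ignores use and clear conditions, listeners and message passing, and only models the reading that `x$1` dominates `x$0`, i.e. that the load `x$2` takes `x$1` when the else branch assigned it and `x$0` otherwise. The helper names `phi` and `loop_body` are hypothetical.

```python
def phi(versions):
    """Return the value of the highest-priority version that was assigned.

    `versions` is ordered from dominated to dominating; unassigned
    versions are represented by None.
    """
    chosen = None
    for value in versions:
        if value is not None:
            chosen = value
    if chosen is None:
        raise ValueError("no version of the variable was assigned")
    return chosen


def loop_body(y):
    """One pass through the while body of the slide example, with x renamed."""
    x0 = y - 2          # x = y - 2;   (x$0)
    x1 = None           # x$1 is only assigned on the else branch
    if y == 0:
        y = 5 + x0      # then branch reads x$0 and leaves x untouched
    else:
        x1 = 1          # x = 1;       (x$1)
    x2 = phi([x0, x1])  # x = x;       (load, x$2)
    return x2, y


print(loop_body(0))  # then branch taken: x$2 comes from x$0
print(loop_body(7))  # else branch taken: x$2 comes from x$1
```

In the compiled cellular automaton the choice cannot be made by inspecting variables after the fact as done here; the conditions attached to each Φ argument in steps 5 to 8 carry that information instead.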
Demonstration
AST, pre and post sequentialization
Status - sequentialization
Current
  A very pure conceptual solution.
  Works, but very rigid structure and large overhead.
Planned improvements
  One Φ-function, multiple targets
  Minimize node overhead
MCALIS Compiler
AST to BTNF
Introduce Reload statements
Compile Φ-functions
  Compile each argument, argument by argument, adding value-, use- and clear-listeners.
  Associate argument with Φ-function.
Introduce clear nodes in while statements
While loop
Status – MCALIS Compiler
Current
  Simple implementation.
  No balancing or optimizations.
  Only main function.
Planned improvements
  Perform balancing and optimizations of the instruction graph.
  Support multiple functions.
  Support parameterized definitions of configurations.
Insert Strategies – Design concerns
Minimize MPP distance sum
Minimize congestion
Linear time complexity of compiler
  SBPM linear
  CAPM linear
Linear space complexity of compiler
Resulting structure should not introduce too large grid size overhead
No Loop Test
main() {
  var v00, vr00, v01, vr01, v10, vr10, v11, vr11;
  vr00 = 1;
  vr01 = 1;
  vr10 = 1;
  vr11 = 1;
  v00 = 1 - (1 - vr00*vr00) * (1 - vr01*vr10);
  v01 = 1 - (1 - vr00*vr01) * (1 - vr01*vr11);
  v10 = 1 - (1 - vr10*vr00) * (1 - vr11*vr10);
  v11 = 1 - (1 - vr10*vr01) * (1 - vr11*vr11);
  return 1;
}
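The four assignments in the test compute the Boolean square of a 2 by 2 matrix of 0/1 integers: a product stands for AND, and `1 - (1 - a) * (1 - b)` stands for OR, since the language has no Boolean algebra yet. The sketch below writes this as a loop for an n by n matrix. The generalization to arbitrary n and the function names are assumptions; the slides only show the unrolled 2 by 2 case.

```python
def bool_or(a, b):
    """OR on 0/1 integers using only arithmetic: 1 - (1 - a) * (1 - b)."""
    return 1 - (1 - a) * (1 - b)


def boolean_square(m):
    """Boolean matrix product m * m, with AND as multiplication."""
    n = len(m)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc = bool_or(acc, m[i][k] * m[k][j])
            out[i][j] = acc
    return out


def slide_formulas(vr):
    """The four unrolled assignments exactly as written on the slide."""
    (vr00, vr01), (vr10, vr11) = vr
    v00 = 1 - (1 - vr00 * vr00) * (1 - vr01 * vr10)
    v01 = 1 - (1 - vr00 * vr01) * (1 - vr01 * vr11)
    v10 = 1 - (1 - vr10 * vr00) * (1 - vr11 * vr10)
    v11 = 1 - (1 - vr10 * vr01) * (1 - vr11 * vr11)
    return [[v00, v01], [v10, v11]]


ones = [[1, 1], [1, 1]]
print(boolean_square(ones))
print(slide_formulas(ones))
```

Because every entry of the result depends only on the input matrix, all entries can be evaluated independently, which makes the test a natural probe for how much parallelism an insert strategy preserves.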
[Figure: SCI. Generations per matrix size, 4 by 4 to 14 by 14.]
[Figure: IOZD. Generations per matrix size, 4 by 4 to 14 by 14.]
[Figure: IOESD. Generations per matrix size, 4 by 4 to 12 by 12.]
[Figure: COBD. Generations per matrix size, 4 by 4 to 14 by 14.]
Performance of strategies
[Figure: No Loop Comparison. Amount per matrix size, 4 by 4 to 14 by 14. Series: SBPM; CAPM Simple Chronological Insert; CAPM COBD Insert; CAPM Simple Random Insert; CAPM IOZD; CAPM IOESD.]
2 iteration test
main() {
  var counter, v00, vr00, v01, vr01, v10, vr10, v11, vr11;
  v00 = 1;
  v01 = 1;
  v10 = 1;
  v11 = 1;
  counter = 2;
  while (counter > 0) {
    vr00 = v00;
    vr01 = v01;
    vr10 = v10;
    vr11 = v11;
    v00 = 1 - (1 - vr00*vr00) * (1 - vr01*vr10);
    v01 = 1 - (1 - vr00*vr01) * (1 - vr01*vr11);
    v10 = 1 - (1 - vr10*vr00) * (1 - vr11*vr10);
    v11 = 1 - (1 - vr10*vr01) * (1 - vr11*vr11);
    counter = counter - 1;
  }
  return 1;
}
2 iteration test – CAPM vs. SBPM
[Figure: 2 Loop Iterations: SBPM vs. optimized CAPM. Amount per matrix size, 4 by 4 to 14 by 14. Series: SBPM; CAPM with COBD Insert; CAPM with IOESD Insert.]
Insert strategy - conclusion
Fairly simple to increase the efficiency of the CALIS topology
Quite heavy algorithms
COBD:
  Doubles CALCA amount
  Halves processing time (= generations)
  Reduces (almost eliminates) congestion
  Needs optimizations to become a practical compiler strategy
Status – CALIS Compiler
Current
  Simple insert strategies
  Simple synchronization
Planned improvements
  Generic environments
  Better insert strategies (both itself and the optimization)
Running the compiled program
UpdateGrid
UpdateMainCellularAutomaton*
UpdateNode
HandleIngoingMsg
HandleOutgoingMsg
MessagePass
MessageSwap
IntroduceCALIS_MsgToGrid
SynchronizeSubCellularAutomaton*
Demonstration
Runtime
Status – Runtime
Current
  Simple Java implementation
  Not optimized at all
Planned improvements
  Optimize VM
  Implement GPU CAPM-VM
Conclusion
CAPM is a practical processing model and a credible alternative to SBRAPM.
Much more research is required to reach maturity of CAPM.
Many open research areas:
  CA dynamic (active) memory allocation (functions, pointers) and extending the language base in general
  GPU implementation
  Message passing protocols
  Controlling CA parallelism
  Insert strategies
  Deconstructing runtime algorithm
  And many more
Thank you for your time.
Questions?
Bonus: Solutions to the two “exercises”
Correctness of Φ-function control structure
  Assign outside while
  Assign inside while
  Clear of state information
Main problem: Use Conditions
  Send/Receive simultaneously
  Use event received after Clear and EvalComp
Introduce Load before while. This solves all the issues and gives better performance.
Φ-function Structure
CAL_MessagePassingProtocol
Guaranteed Message Arrival
At least one message gets closer to the diagonal or the target each generation
No messages get lost
Normal TriNodeSwitch
Inserting Strategies
Simple Insert
  Random, Chronological
Instruction Oriented Dijkstra SSSP
  Zoned, Expanding square
Cell Oriented Border Dijkstra SSSP
Cellular Automata Insert
  Local Swap Convergence
COBD
The least-weight cell gets the node inserted.
Idea: fill the cell that adds the least weight.
Cells are considered candidates if they have a neighbor with a node.
Node candidates are as in the CALIS oriented insert algorithms.
  C C C
C C X C
C X X C
C X X C
C C C C
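The candidate rule in the diagram (X is a cell holding a node, C is a candidate cell) can be sketched as follows. Two things are assumptions rather than facts from the slides: the use of the 8-cell neighborhood, which is inferred from the diagonal candidates in the diagram, and the weight function in `best_cell`, which is a hypothetical Manhattan distance sum to the cells the new node must talk to. Grid borders are ignored for brevity.

```python
def candidate_cells(occupied):
    """Empty cells with at least one occupied cell among their 8 neighbors."""
    candidates = set()
    for r, c in occupied:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                cell = (r + dr, c + dc)
                if cell not in occupied:
                    candidates.add(cell)
    return candidates


def best_cell(candidates, targets):
    """Candidate with the least weight (hypothetical distance-sum weight)."""
    def weight(cell):
        return sum(abs(cell[0] - t[0]) + abs(cell[1] - t[1]) for t in targets)
    # sorted() makes the choice deterministic when weights tie.
    return min(sorted(candidates), key=weight)


def render(occupied, candidates, rows, cols):
    """Draw the grid: X for nodes, C for candidates, '.' for other cells."""
    lines = []
    for r in range(rows):
        marks = ("X" if (r, c) in occupied else
                 "C" if (r, c) in candidates else "."
                 for c in range(cols))
        lines.append(" ".join(marks))
    return lines


occupied = {(1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}
candidates = candidate_cells(occupied)
for line in render(occupied, candidates, rows=5, cols=4):
    print(line)

# A new node that communicates with the nodes at (1, 2) and (3, 1).
print(best_cell(candidates, targets=[(1, 2), (3, 1)]))
```

Recomputing every candidate weight after each insert is what makes the unoptimized strategy heavy; caching per-cell results is the natural next step.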
Optimizing Position Optimizers: COBD
Optimizations to COBD:
  Let cells remember their lowest weight node.
  When a node is inserted, let cells having the node as candidate find a new lowest weight node.
  Cells only check newly introduced nodes as new candidates.
These simple optimizations greatly improve performance (3 days to 15 minutes).
The current version still lacks optimizations:
  Associations from CALIS nodes to cells, the queue and node removal are inefficiently implemented, and the code needs general trimming.
  Cells do not re-calculate their candidate when a neighbor of the candidate is inserted.
COBD Performance(1)
[Figure: 2 Loop Iterations: COBD Percentile of SCI. Percentile (0% to 300%) per matrix size, 4 by 4 to 14 by 14. Series: Generations; #CALCA; Avg. TT; Observed worst case vs Emptygrid worst case.]
COBD Performance(2)
[Figure: COBD Travel Time. Travel Time (TT) per matrix size, 4 by 4 to 14 by 14. Series: Average TT; 40% with TT <; 60% with TT <; 80% with TT <; 95% with TT <.]
COBD Performance (3)
[Figure: Average distance. MPP Distance per matrix size, 4 by 4 to 14 by 14. Series: Avg. Distance Sum; Avg TT; Avg. Distance Sum div 4.]