Optimization algorithms using SSA

Author
arthurcasey 
Software Optimizations & Restructuring Research Group
School of Electrical Engineering, Seoul National University
200621166 Wonsub Kim
2
Using static single assignment form (SSA)
  Review of SSA translation
    Place Φ functions using the dominance frontier
  Overview of SSA form (why SSA form?)
    SSA form transform step
    Role of SSA form, benefit of SSA form
  Further optimization using SSA form
    Constant propagation, dead code elimination, induction variable reduction and other issues
3
Review of SSA form translation
Observation
  Node X does not strictly dominate Z
  Node X dominates an immediate predecessor of Z
Key observation
  Insert the Φ function at the first node Z that is common to another path originating in a node Y with an assignment to V
  This description of node Z is the same as the dominance frontier definition
  Place Φ functions in the dominance frontier of every node that contains a def of V
4
Review of SSA form translation
Dominance frontier definition
  Definition: the dominance frontier of node A is the set of nodes Y such that A dominates an immediate predecessor of Y but does not strictly dominate Y
  From A's point of view, these are the nodes at which other control paths that don't go through A make their earliest appearance
  Case 1: the definition reaches every node that it dominates
  Case 2: only after leaving the case 1 nodes and entering the dominance frontier do we have to consider flows that carry other definitions of the same variable
How about assignments in a loop?
  They also need a Φ function, which merges the multiple definitions
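The placement rule above can be sketched in a few lines. This is a minimal illustration, not the deck's own code: the function names `dominance_frontiers` and `phi_sites`, the dictionary-based CFG encoding, and the node names are all assumptions made for the example, and the dominator sets are computed with the simple iterative method rather than a faster algorithm.

```python
from collections import defaultdict

def dominance_frontiers(cfg, entry):
    """cfg maps every node to its successor list; all nodes reachable from entry."""
    preds = defaultdict(set)
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # dom[n] = {n} | intersection of dom[p] over predecessors p, iterated to a fixed point
    dom = {n: set(cfg) for n in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True

    # z is in DF[x] when x dominates a predecessor of z but does not strictly dominate z
    df = {n: set() for n in cfg}
    for z in cfg:
        for p in preds[z]:
            for x in dom[p]:
                if x == z or x not in dom[z]:
                    df[x].add(z)
    return df

def phi_sites(cfg, entry, def_sites):
    """Nodes that need a phi for a variable defined in def_sites (iterated frontier)."""
    df = dominance_frontiers(cfg, entry)
    placed, work = set(), list(def_sites)
    while work:
        n = work.pop()
        for z in df[n]:
            if z not in placed:
                placed.add(z)
                work.append(z)  # the phi is itself a new definition of the variable
    return placed

# Hypothetical loop: a variable defined before the loop and again in its body
loop = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
loop_phis = phi_sites(loop, "entry", {"entry", "body"})
```

For the loop, the only Φ site is the loop header, which is the loop case raised on the slide: the definition from before the loop and the definition from the body meet there.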
5
Static single assignment (SSA) form
Single assignment form
  In MA (multiple assignment) form, a variable is a memory location, not a value
  In SA form, a variable is a value
  Simplifying property of variables
  Data structure: only a single assignment to each variable (single definition), but many uses of it (only one def site, a list of use sites)
  A def must dominate all of its uses!
  Variables are renamed to remove multiple assignments
Role of SSA form in optimization
  Data flow analysis and compiler optimizations are more efficient with SA form, and SSA form simplifies many optimizations
  Separate use-def chains are no longer needed; that information appears explicitly
  The quadratic number of use-def chains, O(n^2), becomes a linear number, O(n)
  Eliminates false dependences (simplifying context)
6
Why SSA form?
SSA form provides the compiler with an answer to the question "which definitions of a variable reach the points where it is used?"
  → Makes def-use chains explicit
  Every definition knows its uses and every use knows its single definition
Makes dataflow optimization easier and faster
  For most optimizations it reduces space/time requirements
DU chains in SSA form save space, but spend more …
In MA form, # of chains for V variables
  Def-use chains are so expensive!
  Worst case # of chains = O(# of defs(v) * # of uses(v)) <= O(E * E * V)
  # of defs(v) is proportional to E, and # of uses(v) is also proportional to E
In SSA form, # of chains = O(E * V)
7
Space reduction in DU chain
Def-use chain structure is simplified
[Figure: def-use chains for x in MA form and in SSA form. MA form panel: definitions x := … and uses … := x. SSA form panel: definitions x1 := …, x2 := …, a merge x4 := Φ(x1, x2, x3), and uses … := x4. Labels: one step search, multi step search.]
8
SSA transform step
SSA transformation step
  1. To get some efficiency from SA form, translate the original code into SA form statically.
  2. SSA form is not executable because of its pseudo-instructions (it is a compiler-internal form), so it must be translated back to MA form for execution.
[Figure: original code (MA form) → SSA form → optimizations (code motion, redundancy elimination, constant propagation, …) → optimized code (MA form)]
9
Static single assignment form
Further optimizations possible with SSA
  Constant propagation (simple constant, conditional constant)
  Dead code elimination
  Induction variable identification
  Global value numbering (p349~355), data dependences, ...
  Register allocation
Other considerations
  More variables, and an increase in code size due to Φ functions
    But the increase is only linear (in practice, SSA is 0.6 to 2.4 times larger)
  Some optimizations are more annoying
  But on the whole, a win for compilers
  How does the Φ function choose which xi to use?
    We don't really care about it
10
Dead code elimination (1)
Problem definition
  An assignment to a variable with no uses can be removed
SSA structure
  Only one definition site and a list of use sites (easy to check liveness)
Worklist algorithm
  W ← a list of all variables in the SSA program (if transformed)
  While W is not empty
    Remove some variable v from W
    If v's list of uses is empty (the def-use chain's use list is empty)
      Let S be v's statement of definition
      If S has no side effects other than the assignment to v
        Delete S from the program
        For each variable xi used by S (found via the use-def information; keep track of the last use!)
          Delete S from the list of uses of xi
          W ← W ∪ {xi}
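The worklist algorithm above could look roughly like this in Python. It is a sketch under stated assumptions: the function name `eliminate_dead_code` and the encoding (a dictionary from each SSA target to its operand list plus a side-effect flag) are invented for illustration, and the example marks `z3` as having a side effect only to model a value that is still needed after the fragment.

```python
def eliminate_dead_code(stmts):
    """stmts: dict target -> (operand variables, has_side_effects). Returns survivors."""
    stmts = dict(stmts)
    # uses[v] = targets of the statements that read v (the def-use chain of v)
    uses = {v: set() for v in stmts}
    for target, (operands, _) in stmts.items():
        for x in operands:
            uses.setdefault(x, set()).add(target)

    work = list(stmts)
    while work:
        v = work.pop()
        if v not in stmts or uses[v]:
            continue                  # already deleted, defined elsewhere, or still used
        operands, side_effects = stmts[v]
        if side_effects:
            continue                  # the statement must stay even though v is unused
        del stmts[v]
        for x in operands:
            uses[x].discard(v)        # S no longer uses x
            work.append(x)            # x may have just become dead

    return stmts

# SSA fragment modelled on the dead code elimination example slide;
# y1 and z0 are assumed to be defined before the fragment.
program = {
    "x1": ([], False),
    "y2": (["y1"], False),
    "z1": (["y2"], False),
    "x2": (["y2"], False),
    "x3": (["x1", "x2"], False),   # phi(x1, x2)
    "z2": (["z0", "z1"], False),   # phi(z0, z1)
    "z3": (["y2"], True),          # assumption: z3 is observable later
}
survivors = eliminate_dead_code(program)
```

Because every variable has exactly one definition and an explicit use list, deleting a statement only touches the use lists of its own operands, which is why no separate liveness analysis is needed.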
11
Dead code elimination (2)
Just check which variable values Vi never show up as a use
Remove the Φ function or statement that creates Vi
Recurse, since removing it may make other values dead
12
Dead code elimination example
DU chains in MA form

  Original code
        x := 4
        y := y+3
        goto L on x<5
        z := y*2
        x := y+9
    L:  z := y*5

  After constant propagation
        x := 4
        y := y+3
        goto L on 4<5
        z := y*2
        x := y+9
    L:  z := y*5

  After dead code elimination
        x := 4
        y := y+3
    L:  z := y*5

  Need an extra operation to avoid dangling pointers to the DU chains of removed statements

SSA forms

  Original code
        x1 := 4
        y2 := y1+3
        goto L on x1<5
        z1 := y2*2
        x2 := y2+9
    L:  x3 := phi(x1,x2)
        z2 := phi(z0,z1)
        z3 := y2*5

  After constant propagation
        x1 := 4
        y2 := y1+3
        goto L on 4<5
        z1 := y2*2
        x2 := y2+9
    L:  x3 := phi(x1,x2)
        z2 := phi(z0,z1)
        z3 := y2*5

  After dead code elimination
        x1 := 4
        y2 := y1+3
    L:  x3 := x1
        z2 := z0
        z3 := y2*5

  Arrow labels in the SSA figure: constant propagation, dead code elimination, dead code elimination
13
Constant propagation
Evaluate expressions at compile time, eliminate dead code, and improve the efficacy of other optimizations
What is a constant?
  Simple constant: constant for all paths through a program
    Faster algorithm: sparse simple constants algorithm using SSA translation
  Conditional constant: constant for the paths actually taken through a program
    Faster algorithm: sparse conditional constants algorithm using SSA translation
Key point!
  Traversing the sparse SSA edges is more efficient than traversing the normal CFG edges
14
Simple constant propagation using SSA
Standard worklist algorithm
  Identify simple constants
    If a variable is defined using only one constant value, it is a simple constant
    If a variable's definition uses a Φ function and the function's arguments are all the same constant, the variable is a simple constant!
  Simple constants in the example
    First iteration: i1, j1, k1
    Second iteration: j3 added
  Traverses all edges in the CFG and processes each one ← inefficient!

Example (SSA form)
  i1 ← 1
  j1 ← 1
  k1 ← 0
  j2 ← phi(j4, j1)
  k2 ← phi(k4, k1)
  if k2 < 100
  if j2 < 20
  return j2
  j3 ← i1
  k3 ← k2 + 1
  j5 ← k2
  k5 ← k2 + 2
  j4 ← phi(j3, j5)
  k4 ← phi(k3, k5)
15
Sparse simple constant propagation
The standard simple constant propagation algorithm is inefficient!
  For each program point (each node connected by CFG edges), it maintains one constant value for each variable
  O(EV), where E is the number of edges in the CFG
  Inefficient, since a constant may have to be propagated through irrelevant nodes → exploit sparse dependences
Exploit SSA edges (which explicitly connect defs with uses)
  Iterate over SSA edges instead of over all CFG edges; SSA has fewer edges than the def-use graph
16
Sparse simple constant propagation
Sparse simple worklist algorithm
  W ← list of all statements in the SSA program
  While W is not empty
    Remove statement S from W
    If S is v ← phi(c1, c2, c3, …) and ci == cj for all i, j
      Replace S by v ← c
    If S is v ← c for some constant c
      Delete S from the program
      For each statement T that uses v
        Substitute c for v in T
        W ← W ∪ {T}
In effect, searching only along the use edges of constant definitions turns this into a sparse search algorithm.
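As an illustration of the sparse worklist above, here is a small Python sketch run on the example program from the earlier slide. The encoding is an assumption made for this example: every statement is `target -> (op, args)`, `"copy"` stands for both `v ← c` and `v ← x`, integer arguments are constants, and strings are SSA names. Like the slide's algorithm, it applies only the two listed rules and does no constant folding of arithmetic; conditional branches and `return` are omitted because they define no variable.

```python
def sparse_simple_constants(program):
    """program: dict target -> (op, args). Returns (constants found, remaining program)."""
    program = {v: (op, list(args)) for v, (op, args) in program.items()}
    # SSA edges: uses[v] = targets of the statements that read v
    uses = {}
    for t, (_, args) in program.items():
        for a in args:
            if isinstance(a, str):
                uses.setdefault(a, set()).add(t)

    constants = {}
    work = list(program)
    while work:
        v = work.pop()
        if v not in program:
            continue
        op, args = program[v]
        # v <- phi(c, c, ..., c) is replaced by v <- c
        if op == "phi" and all(isinstance(a, int) and a == args[0] for a in args):
            op, args = "copy", [args[0]]
        # v <- c: delete the statement and substitute c into the users of v only
        if op == "copy" and isinstance(args[0], int):
            c = constants[v] = args[0]
            del program[v]
            for t in uses.get(v, ()):
                if t in program:
                    t_args = program[t][1]
                    t_args[:] = [c if a == v else a for a in t_args]
                    work.append(t)
    return constants, program

example = {
    "i1": ("copy", [1]), "j1": ("copy", [1]), "k1": ("copy", [0]),
    "j2": ("phi", ["j4", "j1"]), "k2": ("phi", ["k4", "k1"]),
    "j3": ("copy", ["i1"]), "k3": ("add", ["k2", 1]),
    "j5": ("copy", ["k2"]), "k5": ("add", ["k2", 2]),
    "j4": ("phi", ["j3", "j5"]), "k4": ("phi", ["k3", "k5"]),
}
found, rest = sparse_simple_constants(example)
```

On this input the sketch finds i1, j1, k1 and then j3, matching the simple constants listed on the earlier slide. It cannot discover that j2 and j4 are always 1, because that requires knowing that one branch never executes.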
17
Conditional constant propagation
Delete branches that become infeasible because of discovered constants
Data flow analysis
  Lattice of values: over defined (Top), defined (a constant), never defined (Bot)
  Executability: is there evidence that block B can ever be executed?
  Executable assignment: an assignment in an executable block B
  Processed at compile time, not at run time
Algorithm
  Simultaneously finds constants and eliminates infeasible branches
  First, find the executable blocks using the following observations for "if x < y goto L1 else L2"
    V[x] = Top or V[y] = Top → both L1 and L2 are marked executable (E(L1, L2) = T)
    V[x] = c1 and V[y] = c2 → only the one of L1 or L2 selected by comparing c1 with c2 is marked executable
  Second, for every assignment, compute the lattice value of the variable it defines
  Block executability and the variable lattice values are updated repeatedly
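The two ingredients of the analysis, combining lattice values at a Φ function and deciding which branch targets become executable, can be sketched as follows. The names `TOP`, `BOT`, `combine` and `branch_targets` are illustrative assumptions, and this is only the per-statement logic, not the full iteration over blocks and SSA edges.

```python
TOP = "Top"   # over defined: may hold more than one value
BOT = "Bot"   # never defined: no executable assignment seen yet

def combine(a, b):
    """Lattice value of phi(a, b): Bot is neutral, differing constants give Top."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == TOP or b == TOP:
        return TOP
    return a if a == b else TOP

def branch_targets(vx, vy):
    """Successors of `if x < y goto L1 else L2` that become executable."""
    if vx == TOP or vy == TOP:
        return {"L1", "L2"}          # cannot decide at compile time
    if vx == BOT or vy == BOT:
        return set()                 # no evidence yet that the branch runs
    return {"L1"} if vx < vy else {"L2"}
```

For instance, `combine(1, BOT)` is `1`: an argument that arrives only from a block never shown to execute does not spoil the constant, which is how a Φ function fed by a dead block can still be found constant.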
18
Conditional constant propagation example
Figure panels: source code; SSA form transform; analysis for conditional constant propagation; dead code elimination
By the analysis, j2, j3 and j4 are always 1, and the else part is not reachable
Unreachable code is removed by dead code elimination
19
Conditional constant propagation result
SSA form with block labels
  B1: i1 ← 1
      j1 ← 1
      k1 ← 0
  B2: j2 ← phi(j4, j1)
      k2 ← phi(k4, k1)
      if k2 < 100
  B3: if j2 < 20
  B4: return j2
  B5: j3 ← i1
      k3 ← k2 + 1
  B6: j5 ← k2
      k5 ← k2 + 2
  B7: j4 ← phi(j3, j5)
      k4 ← phi(k3, k5)

Block executability
  B    Exec[B]
  1    T
  2    T
  3    T
  4    T
  5    T
  6    F
  7    T

Lattice values
  x    V[x]
  i1   1
  j1   1
  j2   1
  j3   1
  j4   1
  j5   Bot
  k1   0
  k2   Top
  k3   Top
  k4   Top
  k5   Bot

Resulting code
  k2 ← phi(k3, 0)
  if k2 < 100
  k3 ← k2 + 1
  return 1

Notes
  j2: constant propagation
  B6: dead block
  j4: constant propagation
  k4: its phi function is left with one argument, so copy propagation applies
20
Induction variable identification
Induction variable reduction
  Optimize on the SSA graph rather than handling the CFG directly
  The SSA graph makes the link from each data use to its definition explicit
  An induction variable's value is increased or decreased by a constant in each iteration
Figure labels
  I0: first entry value
  I2: value after going through the loop
  RC: loop-invariant expression
21
Induction variable identification
SSA-based algorithm
  Build the SSA representation
  Iterate from the innermost CFG loop to the outermost loop (innermost → outermost, just like the loop-invariant code motion search), finding SSA cycles
  Each cycle may be a basic induction variable if a variable in the cycle is a function of loop invariants and its own value on the current iteration (how to detect this condition?)
    The Φ function in the cycle has as its inputs a def from inside the loop and a def from outside the loop
    The def inside the loop (the Φ function's input) is part of the cycle: it gets one operand from the Φ function, and all its other operands are loop invariant
  Find derived induction variables
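The basic induction variable condition above can be checked directly on SSA definitions. The following is a rough sketch, not the deck's algorithm in full: the function name, the `(op, operands)` encoding and the example definitions (including the multiplication that defines `m3`) are assumptions, only two-input Φ functions and single add/subtract steps are handled, and "loop invariant" is simplified to "a constant or a name defined outside the loop".

```python
def basic_induction_variables(defs, loop_defs):
    """defs: SSA name -> (op, operands); loop_defs: names defined inside the loop.

    Returns phi name -> (initial value name, step op, step operand).
    """
    def invariant(x):
        # Simplification: constants and names defined outside the loop
        return isinstance(x, int) or x not in loop_defs

    found = {}
    for phi, (op, args) in defs.items():
        if op != "phi" or phi not in loop_defs or len(args) != 2:
            continue
        inside = [a for a in args if a in loop_defs]
        outside = [a for a in args if a not in loop_defs]
        if len(inside) != 1 or len(outside) != 1:
            continue   # need one def from inside the loop and one from outside
        step_op, step_args = defs[inside[0]]
        if step_op not in ("add", "sub") or len(step_args) != 2:
            continue
        if step_op == "sub" and step_args[0] != phi:
            continue   # c - i is not a constant step
        others = [a for a in step_args if a != phi]
        # the inside def takes exactly one operand from the phi; the rest are invariant
        if len(others) == 1 and invariant(others[0]):
            found[phi] = (outside[0], step_op, others[0])
    return found

# Hypothetical loop: i counts up by 1, m is multiplied by i on each iteration
defs = {
    "i1": ("copy", [0]), "m1": ("copy", [1]),
    "i2": ("phi", ["i1", "i3"]), "i3": ("add", ["i2", 1]),
    "m2": ("phi", ["m1", "m3"]), "m3": ("mul", ["m2", "i2"]),
}
bivs = basic_induction_variables(defs, loop_defs={"i2", "i3", "m2", "m3"})
```

Both `i2` and `m2` sit on SSA cycles and are therefore candidates, but only `i2` passes the check, since the step of `m2` is not an addition of a loop-invariant value.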
22
Induction variable identification example
Source code transformation to SSA form
Build SSA value graph
Find a cycle in the SSA graph
The Φ function in the cycle satisfies the condition stated earlier, so it is a basic induction variable!
Variable x case
SSA value graph
23
Induction variable identification example
1. Transform to SSA form
2. Find SSA cycles
   i2 and m2 each have an SSA cycle ← candidates
3. For each candidate, check the basic induction variable condition
   i2 case
     i1: outside def, i3: inside def
     i3 (the inside def) gets one operand from the Φ function and another from a constant → biv!
24
Global Value numbering
Global value numbering (GVN)
  Symbolic evaluation (not run-time evaluation): if the symbolic numbers are the same, the two computations are equal
  Compiler optimization based on the SSA IR
  Builds a value graph from the SSA form
  Prevents false variable-name to value-name mappings
  More powerful than global common subexpression elimination (CSE) in some cases
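To make the idea of symbolic numbers concrete, here is a small hash-based value numbering sketch. Note the swap: the slide describes building a value graph from the SSA form, whereas this example uses the simpler hashing approach over a statement list in which definitions precede uses, and it does not handle Φ functions or loops. The function name and the statement encoding are assumptions for illustration.

```python
from itertools import count

def value_numbers(stmts):
    """stmts: list of (target, op, operands) in SSA form, defs listed before uses."""
    fresh = count()
    number = {}   # SSA name or constant -> value number
    table = {}    # (op, operand value numbers) -> value number

    def vn(x):
        if x not in number:
            number[x] = next(fresh)   # first sighting: an unknown input or a constant
        return number[x]

    for target, op, operands in stmts:
        if op == "copy":
            number[target] = vn(operands[0])   # a copy shares its source's number
            continue
        ops = tuple(vn(x) for x in operands)
        if op in ("add", "mul"):
            ops = tuple(sorted(ops))           # commutative: operand order is irrelevant
        key = (op, ops)
        if key not in table:
            table[key] = next(fresh)
        number[target] = table[key]
    return number

# Hypothetical statements: b1 computes the same value as a1 under different names
numbers = value_numbers([
    ("a1", "add", ["x0", "y0"]),
    ("c1", "copy", ["x0"]),
    ("b1", "add", ["y0", "c1"]),
    ("d1", "mul", ["x0", "y0"]),
])
```

Here `a1` and `b1` receive the same value number even though their operand names differ, which a purely name-based common subexpression check would miss.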
25
Value numbering example
SSA transform
26
Dependency issues
In optimization, parallelization and scheduling, dependence checking is important
  3 data dependences: true (read after write), anti (write after read), output (write after write)
In SSA form, the true dependence is evident (def site, use list)
There are no anti or output dependences in SSA form
  Write-after-write dependence can't occur. Why? SSA form has a single definition of each variable.
  Write-after-read dependence can't occur. Why? A variable's definition dominates all uses of it.
27
Other issues
Copy propagation
  In general: check the live range of each variable, and replace the uses of the target variable that follow the copy operation with the source variable, if the target variable is live
  In SSA form, variable values are assured to remain statically single
  So there is no need to check live ranges; simply replace the uses!
Register allocation, common subexpression elimination
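The copy propagation point above is short enough to sketch directly. The function name and the `(target, op, operands)` encoding are assumptions for this example, and `w1` is a hypothetical use added to the `x3 := x1` copy from the dead code elimination example.

```python
def propagate_copies(stmts):
    """stmts: list of (target, op, operands) in SSA form. Rewrites uses of copy targets."""
    source = {t: ops[0] for t, op, ops in stmts if op == "copy"}

    def resolve(x):
        # Follow chains of copies; no live-range check is needed because each
        # SSA name has exactly one definition, so its source never changes.
        while x in source:
            x = source[x]
        return x

    return [(t, op, [resolve(x) for x in ops]) for t, op, ops in stmts]

rewritten = propagate_copies([
    ("x3", "copy", ["x1"]),
    ("w1", "add", ["x3", 1]),   # hypothetical use of x3
])
```

After the rewrite, `w1` reads `x1` directly and the copy `x3 := x1` has no remaining uses, so dead code elimination can remove it.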