Runtime Analysis of Population-based Evolutionary Algorithms


Runtime Analysis of Population-based Evolutionary Algorithms¹

Per Kristian Lehre

School of Computer Science, University of Nottingham

United Kingdom

IEEE CEC 2015, Sendai, Japan, May 25th 2015

¹The latest version of these slides is available from http://www.cs.nott.ac.uk/~pkl/populations/

Evolution

Selection

Variation

Example - Linear Ranking Selection [Goldberg and Deb, 1991]

α : [0, 1] → [0, ∞) is a ranking function if

∫₀¹ α(y) dy = 1

The probability of selecting an individual with rank ≤ γ is

β(γ) := ∫₀^γ α(y) dy

Linear ranking selection is obtained for

α(γ) := η − 2(η − 1)γ,

where η ∈ (1, 2) specifies the selection pressure.

(Figure: the ranking function α and the cumulative selection probability β plotted against rank, from 0 to λ − 1.)

Example - Linear Ranking Selection

for t = 0 to ∞ do
  Sort current population Pt according to fitness f, s.t. f(Pt(1)) ≥ f(Pt(2)) ≥ · · · ≥ f(Pt(λ)).
  for i = 1 to λ do
    (Selection) Sample r in {1, . . . , λ} s.t. Pr(r ≤ γλ) = β(γ). Set Pt+1(i) := Pt(r).
    (Mutation) Flip each bit position in Pt+1(i) with probability χ/n.

Problem

- Is it possible to predict the behaviour of this and other EAs?
- Can we parameterise the EA so that it optimises f efficiently, e.g.

  Onemax(x) := Σᵢ₌₁ⁿ xᵢ
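To make the slide's pseudocode concrete, here is a minimal Python sketch (the function names and parameter values are my own, not from the slides) of the non-elitist EA with linear ranking selection and bitwise mutation, applied to Onemax; ranks are sampled by inverting β(γ) = ηγ − (η − 1)γ².

```python
import math
import random

def ranked_index(lam, eta):
    """Sample a rank r in {0, ..., lam-1} (0 = best) such that
    Pr(r < gamma*lam) ≈ beta(gamma) = eta*gamma - (eta - 1)*gamma**2."""
    u = random.random()
    # Invert beta: solve (eta - 1)*g**2 - eta*g + u = 0 for g in [0, 1].
    g = (eta - math.sqrt(eta * eta - 4 * (eta - 1) * u)) / (2 * (eta - 1))
    return min(int(g * lam), lam - 1)

def onemax(x):
    return sum(x)

def linear_ranking_ea(f, n, lam, eta, chi, max_evals):
    """Non-elitist EA with linear ranking selection and bitwise mutation
    rate chi/n. Returns the number of evaluations until f(x) = n is found,
    or max_evals if the budget runs out."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(lam)]
    evals = 0
    while evals < max_evals:
        pop.sort(key=f, reverse=True)   # rank 0 = fittest
        evals += lam
        if f(pop[0]) == n:
            return evals
        pop = [[b ^ (random.random() < chi / n)
                for b in pop[ranked_index(lam, eta)]]
               for _ in range(lam)]
    return max_evals
```

According to the mutation-selection balance results later in the tutorial, linear ranking is expected to be efficient on simple functions when η > e^χ and exponentially slow when η < e^χ.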


Evolutionary Algorithms are Algorithms

Criteria for evaluating algorithms

1. Correctness
   - Does the algorithm always give the correct output?
2. Computational Complexity
   - How much computational resources does the algorithm require to solve the problem?

The same criteria are also applicable to evolutionary algorithms

1. Correctness
   - Does the algorithm discover the global optimum in finite time?
2. Computational Complexity
   - Time (number of function evaluations) is the most relevant computational resource.

Runtime Analysis

How does the distribution of the runtime

T := min{λt ≥ 0 | Pt ∩ X* ≠ ∅}

depend on

- the parameter settings of the algorithm
- the problem size n, and other problem characteristics

Runtime as a function of problem size

(Figure: iterations of RS, up to roughly 8×10⁶, plotted against the number of states in the FSM (n), for n = 4 to 25, on the Easy FSM instance class.)

- Exponential ⟹ Algorithm impractical on problem.
- Polynomial ⟹ Possibly efficient algorithm.

Runtime as a function of problem size

(Figure: iterations of the (1+1) EA, up to roughly 10,000, plotted against the number of states in the FSM (n), for n = 10 to 760, on the Easy FSM instance class.)

- Exponential ⟹ Algorithm impractical on problem.
- Polynomial ⟹ Possibly efficient algorithm.

Outline

Introduction
- Basic Probability Theory
- Runtime Analysis

Upper Bounds
- The Level-Based Theorem
- Examples
  - Mutation and Selection
  - Mutation, Crossover and Selection
  - Noisy and Uncertain Fitness

Lower Bounds
- Negative Drift Theorem for Populations
- Mutation-Selection Balance

Basic Probability Theory

Basic Probability Theory

(Figure: a sample space Ω = {1, . . . , 6} mapped by a random variable X into R.)

Probability Triple (Ω, F, Pr)
- Ω : sample space
- F : σ-algebra (family of events)
- Pr : F → [0, 1] probability function (satisfying the probability axioms)

Events
- E ∈ F

Random Variable
- X : Ω → R and X⁻¹ : B → F
- {X = y} ⟺ {ω ∈ Ω | X(ω) = y}

Expectation
- E[X] := Σ_y y · Pr(X = y)


Some Basic Properties

Two events E1 and E2 are independent iff

Pr(E1 ∩ E2) = Pr(E1) · Pr(E2)

Linearity of expectation
For any random variables X and Y, and constants a, b ∈ R,

E[aX + bY] = aE[X] + bE[Y]

⟹ Often easier to find E[X] than Pr(X ≥ k).
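As a quick sanity check (not from the slides), the following snippet estimates E[2X + 3Y] for two fully dependent dice X and Y = 7 − X; linearity of expectation holds even without independence.

```python
import random

random.seed(42)
trials = 100_000
total = 0
for _ in range(trials):
    x = random.randint(1, 6)   # X: fair die, E[X] = 3.5
    y = 7 - x                  # Y is fully dependent on X, E[Y] = 3.5
    total += 2 * x + 3 * y     # linearity: E[2X + 3Y] = 2*3.5 + 3*3.5 = 17.5

print(total / trials)          # close to 17.5
```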

Conditional Probability

Given two events A and E, where Pr(E) > 0,

Pr(A | E) := Pr(A ∩ E) / Pr(E)

Law of total probability, for 0 < Pr(E) < 1 (with Ē the complement of E):

Pr(A) = Pr(E) · Pr(A | E) + Pr(Ē) · Pr(A | Ē)

Conditional Probability

Given two events A and E, where Pr(E) > 0,

Pr(A | E) := Pr(A ∩ E) / Pr(E)

Law of total probability, for 0 < Pr(E) < 1:

Pr(A) ≥ Pr(E) · Pr(A | E)

(dropping the term Pr(Ē) · Pr(A | Ē) gives a lower bound)

Example: Standard bitwise mutation

Mutation of bitstrings of length n
- each bit is flipped independently with probability χ/n

Define n random variables X₁, . . . , Xₙ:

Xᵢ := 1 if bit i was flipped, 0 otherwise.

Expectation

E[Xᵢ] = 1 · (χ/n) + 0 · (1 − χ/n) = χ/n

Expected number of bit positions that are flipped:

E[X₁ + · · · + Xₙ] = E[X₁] + · · · + E[Xₙ] = χ.

Example: (contd.)

- Probability that no bit positions are flipped (by independence):

  Pr(X₁ = 0 ∧ · · · ∧ Xₙ = 0) = ∏ᵢ₌₁ⁿ Pr(Xᵢ = 0) = (1 − χ/n)ⁿ

- Note that for any constant δ ∈ (0, 1) and n sufficiently large,

  (1 − χ/n)ⁿ ≥ (1 − δ)e^(−χ)

- Probability that only bit position i is flipped:

  (χ/n) ∏_{j≠i} Pr(Xⱼ = 0) ≥ (χ/n)(1 − δ)e^(−χ)

- Probability that k specific bit positions are flipped, say 1 to k:

  (χ/n)ᵏ ∏ⱼ₌ₖ₊₁ⁿ Pr(Xⱼ = 0) ≥ (χ/n)ᵏ (1 − δ)e^(−χ)
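These estimates are easy to check empirically. The sketch below (illustrative, not from the slides) samples standard bitwise mutations and compares the observed flip count and no-flip frequency against χ and (1 − χ/n)ⁿ ≈ e^(−χ).

```python
import random

def mutate(x, chi):
    """Standard bitwise mutation: flip each bit independently w.p. chi/n."""
    n = len(x)
    return [b ^ (random.random() < chi / n) for b in x]

random.seed(1)
n, chi, trials = 100, 1.5, 20_000
flips = no_flip = 0
for _ in range(trials):
    k = sum(mutate([0] * n, chi))   # number of flipped positions
    flips += k
    no_flip += (k == 0)

print(flips / trials)               # close to chi = 1.5 (expected flips)
print(no_flip / trials)             # close to (1 - chi/n)**n, roughly exp(-1.5)
```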

General Scheme for non-elitist Evolutionary Algorithms

for t = 0 to ∞ do
  for i = 1 to λ do
    Pt+1(i) ∼ D(Pt)

- X : search space (genospace)
- Pt : population vector containing λ individuals
- D : mapping from populations to distributions over X
  - can describe selection, variation, noise, etc.
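The general scheme is a one-liner to express in code. Below is a minimal Python rendering (the helper names and the concrete D are my own illustration, not from the slides): D takes the current population and returns one freshly sampled individual, so any combination of selection, variation, and noise can be plugged in.

```python
import random

def run_scheme(D, init_pop, found, max_gens=10_000):
    """General non-elitist scheme: every offspring in generation t+1 is
    sampled independently from the distribution D(P_t)."""
    P = list(init_pop)
    for t in range(max_gens):
        if any(found(x) for x in P):
            return t, P
        P = [D(P) for _ in range(len(P))]
    return max_gens, P

# One concrete D: 2-tournament selection on Onemax, then flip one uniformly
# chosen bit (D bundles selection and variation into a single sampler).
def D(P):
    a, b = random.choice(P), random.choice(P)
    x = list(max(a, b, key=sum))
    x[random.randrange(len(x))] ^= 1
    return x

random.seed(3)
n, lam = 10, 40
gens, P = run_scheme(D, [[0] * n for _ in range(lam)], lambda x: sum(x) == n)
```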

Runtime Analysis

for t = 0 to ∞ do
  for i = 1 to λ do
    Pt+1(i) ∼ D(Pt)

Problem
Given a subset B ⊂ X (e.g. the set of optimal search points), find

- E[TB] (expected runtime)
- Pr(TB ≤ t(n)) (probability of success in t(n) steps)

where

TB = min{λt ≥ 0 | Pt ∩ B ≠ ∅}

(i.e. TB is the time until at least one individual is in subset B)

Approaches to Runtime Analysis of Populations

- Infinite population size
  - Markov chain analysis [He and Yao, 2003]
- No parent population, or monomorphic populations
  - (1+1) EA
  - (1+λ) EA [Jansen et al., 2005]
  - (1,λ) EA [Rowe and Sudholt, 2012]
- Fitness-level techniques
  - (1+λ) EA [Witt, 2006]
  - (N+N) EAs [Chen et al., 2009]
  - non-elitist EAs with unary variation operators [Lehre, 2011a, Dang and Lehre, 2014]
- Classical drift analysis
  - Fitness proportionate selection [Neumann et al., 2009]
- Family trees
  - (µ+1) EA [Witt, 2006]
  - (µ+1) IA [Zarges, 2009]
- Multi-type branching processes [Lehre and Yao, 2012]
- Negative drift theorem for populations [Lehre, 2011b]
- Level-based analysis [Corus et al., 2014]

Level-based Theorem

Outline - Level-based Theorem²

1. Definition of levels of the search space
2. Definition of the "current level" of a population
3. Statement of the theorem and its conditions
4. Recommendations for how to apply the theorem
5. Some example applications
6. Derivation of special cases
   - Mutation-only EAs
   - Crossover
   - Mutation-only EAs with uncertain fitness (e.g. noise)

²It is out of the scope of this tutorial to present the proof of this theorem. The proof uses drift analysis with a distance function that takes into account the current level, as well as the number of individuals above the current level.

Level partitioning of the search space

Definition
(A₁, . . . , Aₘ₊₁) is a level-partitioning of the search space X if

- ⋃ⱼ₌₁ᵐ⁺¹ Aⱼ = X (i.e., together, the levels cover the search space)
- Aᵢ ∩ Aⱼ = ∅ whenever i ≠ j (i.e., the levels are non-overlapping)
- the last level Aₘ₊₁ contains the optimum for the algorithm

To denote everything above level j, we also define

A⁺ⱼ := ⋃ᵢ₌ⱼ₊₁ᵐ⁺¹ Aᵢ

Current level of a population P wrt γ0 ∈ (0, 1)

Definition
The unique integer j ∈ [m] such that

|P ∩ A⁺ⱼ₋₁| ≥ γ0λ > |P ∩ A⁺ⱼ|

Example
(Figure: an example population; its current level wrt γ0 = 1/2 is 4.)

Level-based theorem (informal version)

If the following three conditions are satisfied

(G1) it is always possible to sample above the current level
(G2) the proportion of the population above the current level increases in expectation
(G3) the population size is large enough

then the expected time to reach the last level cannot be too high.

Level-based Theorem³ (1/2) (setup)

Given
- a level-partitioning (A₁, . . . , Aₘ₊₁) of X
- m upgrade probabilities z₁, . . . , zₘ ∈ (0, 1], with zmin := minⱼ zⱼ
- a parameter δ ∈ (0, 1), and
- a constant γ0 ∈ (0, 1).

³This version of the theorem simplifies some of the conditions at the cost of a slightly less precise bound on the runtime.

Level-based Theorem (2/2)

If for any j ∈ [m] and any population P ∈ X^λ with

γλ := |P ∩ A⁺ⱼ| < γ0λ ≤ |P ∩ A⁺ⱼ₋₁|,

a new individual y ∼ D(P) is in A⁺ⱼ with probability

Pr(y ∈ A⁺ⱼ) ≥ zⱼ if γ = 0, and Pr(y ∈ A⁺ⱼ) ≥ γ(1 + δ) if γ > 0,  (G1 & G2)

and the population size λ is bounded from below by

λ ≥ (8/(γ0δ²)) (ln(m/(γ0δ⁷zmin)) + 11),  (G3)

then the expected time to reach the last level Aₘ₊₁ is less than

(1536/δ⁵) (mλ ln(λ) + Σⱼ₌₁ᵐ 1/zⱼ).

Level-based Theorem visualised

(Figure: the population Pt spread over the levels A⁺₁, . . . , A⁺ⱼ, A⁺ⱼ₊₁, . . . , A⁺ₘ; a new sample from D(Pt) is above the current level with probability at least γ(1 + δ), and on the next level with probability at least zⱼ when no individual is there yet.)

Suggested recipe for application of the level-based theorem

1. Find a partition (A₁, . . . , Aₘ₊₁) of X that reflects the state of the algorithm, and where Aₘ₊₁ is the goal state.
2. Find parameters γ0 and δ, and a configuration of the algorithm (e.g., mutation rate, selective pressure), such that whenever |P ∩ A⁺ⱼ| = γλ > 0, condition (G2) holds, i.e., Pr(y ∈ A⁺ⱼ) ≥ γ(1 + δ).
3. For each level j ∈ [m], estimate a lower bound zⱼ ∈ (0, 1) such that whenever |P ∩ A⁺ⱼ| = 0, condition (G1) holds, i.e., Pr(y ∈ A⁺ⱼ) ≥ zⱼ.
4. Calculate the sufficient population size λ from condition (G3).
5. Read off the bound on the expected runtime.

Simple Example to Illustrate the Theorem

Problem
- search space X = {1, . . . , m + 1}
- fitness function f(x) = x (to be maximised)

Evolutionary Algorithm

for t = 0, 1, 2, . . . until termination condition do
  for i = 1 to λ do
    Select a parent x from Pt using (µ, λ)-selection
    Obtain y by mutating x
    Set the i-th offspring Pt+1(i) := y

(µ, λ)-selection mechanism
1. Sort the current population P = (x₁, . . . , xλ) such that f(x₁) ≥ f(x₂) ≥ · · · ≥ f(xλ)
2. Return Unif(x₁, . . . , xµ)

A simple mutation operator:

Pr(V(x) = y) = 1/3 if y ∈ {x − 1, x, x + 1}, and 0 otherwise.

Step 1: Level-partition

Problem
- search space X = {1, . . . , m + 1}
- fitness function f(x) = x (to be maximised)

Level-partition of X

Aⱼ := {j},  A⁺ⱼ = {j + 1, j + 2, . . . , m + 1}


Properties of a Population at Level j

- Assume Aⱼ is the current level of the population P, i.e.,

  γλ = |P ∩ A⁺ⱼ| < γ0λ ≤ |P ∩ A⁺ⱼ₋₁|  (1)

- (µ, λ)-selection selects a parent u.a.r. among the best µ individuals
- choosing the parameter γ0 := µ/λ, assumption (1) implies
  - Pr(select parent in A⁺ⱼ₋₁) = 1
  - Pr(select parent in A⁺ⱼ) = γλ/µ

Condition (G2)

Assuming that λ/µ = 9/4 = (1 + 1/2)/(1 − 1/3),

Pr(y ∈ A⁺ⱼ) ≥ Pr(select parent in A⁺ⱼ) · Pr(do not downgrade)
           ≥ γ · (λ/µ) · (1 − 1/3)
           = γ(1 + 1/2)
           ≥ γ(1 + δ)

⟹ Condition (G2) satisfied for δ = 1/2.


Condition (G1)

Pr(y ∈ A⁺ⱼ) ≥ Pr(select parent in Aⱼ) · Pr(upgrade offspring to A⁺ⱼ)
           ≥ 1 · 1/3 = zⱼ > 0

⟹ Condition (G1) satisfied by choosing zⱼ := 1/3 for all j ∈ [m].


Condition (G3) - Sufficiently Large Population

Recall that γ0 = µ/λ = 4/9, δ = 1/2, and z* = 1/3. Then

(8/(γ0δ²)) (ln(m/(z*γ0δ⁷)) + 11) = 72 (ln(864m) + 11) < 72 (ln(m) + 18) ≤ λ

Hence, choosing λ ≥ 72(ln(m) + 18) is sufficient to satisfy (G3).


Example: Summary

We have shown that if λ ≥ 72(ln(m) + 18) and µ = 4λ/9, then

- (G1) is satisfied for zⱼ = 1/3 for all j ∈ [m],
- (G2) is satisfied for δ = 1/2, and
- (G3) is satisfied,

hence, by the level-based theorem, the expected running time of the EA is no more than

(1536/δ⁵) (mλ ln(λ) + Σⱼ₌₁ᵐ 1/zⱼ) = O(mλ ln λ)

Population-Selection-Variation Algorithm (PSVA)

for t = 0 to ∞ do
  for i = 1 to λ do
    Sample the i-th parent x according to psel(Pt, ·)
    Sample the i-th offspring Pt+1(i) according to pvar(x, ·)

Selective Pressure

Cumulative selection probability
psel has cumulative selection probability β(γ) if for all P ∈ X^λ and all γ ∈ (0, γ0)

Pr(f(psel(P)) ≥ f(γ-ranked individual)) ≥ β(γ)

- If f(P) = (8, 7, 6, 5, 5, 4, 3, 2), then β(3/8) ≈ Pr(select an individual with fitness ≥ 6).
- 2-tournament selection: β(γ) ≥ γ² + 2γ(1 − γ)
- linear ranking selection: β(γ) = γ(η(1 − γ) + γ)
- (µ, λ)-selection: β(γ) ≥ γλ/µ
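The cumulative selection probabilities listed above can be checked by simulation. The snippet below (illustrative, names are mine) estimates β(γ) for k-tournament selection over ranks 0, . . . , λ − 1 and compares the 2-tournament case against γ² + 2γ(1 − γ).

```python
import random

def tournament_beta(lam, k, gamma, trials=40_000, seed=7):
    """Empirical Pr(the k-tournament winner has rank < gamma*lam), with
    ranks 0..lam-1 (0 = best) and distinct fitness values, so the lowest
    rank in the tournament always wins."""
    rng = random.Random(seed)
    cut = gamma * lam
    hits = sum(min(rng.randrange(lam) for _ in range(k)) < cut
               for _ in range(trials))
    return hits / trials

gamma = 0.3
est = tournament_beta(lam=100, k=2, gamma=gamma)
exact = gamma**2 + 2 * gamma * (1 - gamma)   # = 0.51 for gamma = 0.3
print(est)   # close to 0.51
```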

Corollary for PSVA

If for any j ∈ [m]

(C1) pvar(y ∈ A⁺ⱼ | x ∈ A⁺ⱼ₋₁) ≥ sⱼ ≥ smin
(C2) pvar(y ∈ A⁺ⱼ | x ∈ A⁺ⱼ) ≥ p0
(C3) β(γ) ≥ γ(1 + δ)/p0
(C4) λ ≥ (8/(γ0δ²)) (ln(m/(γ0²δ⁷smin)) + 11)

then the expected time to reach the last level Aₘ₊₁ is less than

(1536/δ⁵) (mλ ln(λ) + (p0/γ0) Σⱼ₌₁ᵐ 1/sⱼ)

Proof of Corollary: (C2) & (C3) ⟹ (G2)

If |P ∩ A⁺ⱼ₋₁| ≥ γ0λ > |P ∩ A⁺ⱼ| =: γλ and y ∼ D(P), then

Pr(y ∈ A⁺ⱼ) ≥ Pr(x ∈ A⁺ⱼ) · Pr(y ∈ A⁺ⱼ | x ∈ A⁺ⱼ)
             (i.e., select x above the current level and do not downgrade it)
           ≥ β(γ)p0
           ≥ γ(1 + δ).


Proof of Corollary: (C1) & (C3) ⟹ (G1)

If |P ∩ A⁺ⱼ₋₁| ≥ γ0λ and |P ∩ A⁺ⱼ| = 0 and y ∼ D(P), then

Pr(y ∈ A⁺ⱼ) ≥ Pr(x ∈ A⁺ⱼ₋₁) · Pr(y ∈ A⁺ⱼ | x ∈ A⁺ⱼ₋₁)
             (i.e., select x from level j or above, and upgrade it)
           ≥ β(γ0)sⱼ
           ≥ γ0(1 + δ)sⱼ/p0 =: zⱼ > 0


Example Application

LeadingOnes(x) = Σᵢ₌₁ⁿ ∏ⱼ₌₁ⁱ xⱼ

Partition into n + 1 levels:

Aⱼ := {x ∈ {0, 1}ⁿ | x₁ = · · · = xⱼ₋₁ = 1 ∧ xⱼ = 0}
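In code, the level index of a search point under this partition is just its LeadingOnes value shifted by one; a small Python sketch (helper names are mine):

```python
def leading_ones(x):
    """LeadingOnes(x) = number of leading 1-bits of the bitstring x."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def level(x):
    """Index j with x in A_j, where A_j = {x : x_1 = ... = x_{j-1} = 1, x_j = 0}
    for j <= n, and the last level contains the optimum 1^n."""
    return leading_ones(x) + 1

print(level([1, 1, 0, 1]))  # in A_3: first two bits are 1, the third is 0
print(level([1, 1, 1]))     # the optimum 1^n lies in the last level A_4
```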

Example Application

(µ,λ) EA with bit-wise mutation rate χ/n on LeadingOnes.

If λ/µ > e^χ(1 + δ) and λ > c″ ln(n), then

(C1) pvar(y ∈ A⁺ⱼ | x ∈ Aⱼ) ≥ χ(1 − δ)/(n e^χ) =: sⱼ =: s*
(C2) pvar(y ∈ Aⱼ ∪ A⁺ⱼ | x ∈ Aⱼ) ≥ (1 − δ)/e^χ =: p0
(C3) β(γ) ≥ γλ/µ > γ(1 + δ)e^χ = γ(1 + δ)/p0
(C4) λ > c″ ln(n) > c ln(m/s*)

then E[T] = O(mλ ln(λ) + Σⱼ₌₁ᵐ sⱼ⁻¹) = O(nλ ln(λ) + n²)


Exercise: Our first example, linear ranking selection

How should we set the following parameters

- population size λ
- selective pressure η
- mutation rate χ/n

so that the EA optimises LeadingOnes efficiently?

Hints
- (C1), (C2), and (C4) are already satisfied, as for (µ, λ)-selection
- It remains to show (C3), i.e., β(γ) ≥ (1 + δ)γ/p0
- Linear ranking has cumulative selection probability β(γ) = γ(η(1 − γ) + γ)

Genetic Algorithms with Crossover

Definition (Genetic Algorithm)

for t = 0, 1, 2, . . . until termination condition do
  for i = 1 to λ do
    Select parents x₁ and x₂ from population Pt according to psel
    Create z by applying a crossover operator to x₁ and x₂
    Create y by applying a mutation operator to z
    Set the i-th offspring Pt+1(i) := y

Corollary for Genetic Algorithms

If for any j ∈ [m]

(C1) pvar(y ∈ A⁺ⱼ | x ∈ A⁺ⱼ₋₁) ≥ sⱼ ≥ s*
(C2) pvar(y ∈ A⁺ⱼ | x ∈ A⁺ⱼ) ≥ p0
(C3) pxor(z ∈ A⁺ⱼ | u ∈ A⁺ⱼ₋₁, v ∈ A⁺ⱼ) ≥ ε1
(C4) β(γ) ≥ γ √((1 + δ)/(p0ε1γ0))
(C5) λ ≥ (8/(δ²γ0)) (ln(m/(γ0²δ⁷s*)) + 8)

then the expected time to reach the last level Aₘ₊₁ is less than

(1536/δ⁵) (mλ ln(λ) + (p0/γ0) Σⱼ₌₁ᵐ 1/sⱼ)

Proof of Corollary: (C1) and (C4) ⟹ (G1)

Pr(y ∈ A⁺ⱼ) ≥ Pr(z ∈ A⁺ⱼ₋₁) · sⱼ
           ≥ Pr(x₁, x₂ ∈ A⁺ⱼ₋₁) · ε1sⱼ
           ≥ β(γ0)² ε1sⱼ
           ≥ γ0² ((1 + δ)/(p0ε1γ0)) ε1sⱼ
           = γ0(1 + δ)sⱼ/p0 =: zⱼ.


Proof of Corollary: (C2), (C3), and (C4) ⟹ (G2)

Pr(y ∈ A⁺ⱼ) ≥ Pr(z ∈ A⁺ⱼ) · p0
           ≥ Pr(x₁ ∈ A⁺ⱼ₋₁) · Pr(x₂ ∈ A⁺ⱼ) · ε1p0
           ≥ β(γ0)β(γ) ε1p0
           ≥ γ0 ((1 + δ)/(p0ε1γ0)) γ ε1p0
           = γ(1 + δ).


Uncertainty in Comparison-based PSVAs

Sources of uncertainty

1. Droste noise model (Droste, 2004)
2. Partial evaluation
3. Noisy fitness (Prügel-Bennett, Rowe, Shapiro, 2015)

It is sufficient to have mutation rate δ/(3n) and

Pr(x chosen | f(x) > f(y)) ≥ 1/2 + δ, with 1/δ ∈ poly(n)

(Dang & Lehre, GECCO 2014 & FOGA 2015)

Lower Bounds

Problem
Consider a sequence of populations P₁, . . . over a search space X, and a target region A ⊂ X (e.g., the set of optimal solutions). Let

TA := min{λt | Pt ∩ A ≠ ∅}

We would like to prove statements of the form

Pr(TA ≤ t(n)) ≤ e^(−Ω(n)).  (2)

- i.e., with overwhelmingly high probability, the target region A has not been found within t(n) evaluations
- lower bounds are often harder to prove than upper bounds
- we will present an easy-to-use method that is applicable in many situations

Algorithms considered for lower bounds

Definition (Non-elitist EA with selection and mutation)

for t = 0, 1, 2, . . . until termination condition do
  for i = 1 to λ do
    Select a parent x from population Pt according to psel
    Flip each position in x independently with probability χ/n
    Let the i-th offspring be Pt+1(i) := x
    (i.e., create each offspring by mutating a selected parent)

Assumptions

- population size λ ∈ poly(n), i.e. not exponentially large
- bitwise mutation with probability χ/n, but no crossover
- results hold for any non-elitist selection scheme psel that satisfies some mild conditions, described later

Reproductive rate⁴

Definition
For any population P = (x₁, . . . , xλ), let psel(xᵢ) be the probability that individual xᵢ is selected from the population P.

- The reproductive rate of individual xᵢ is λ · psel(xᵢ).
- The reproductive rate of a selection mechanism is bounded from above by α0 if

  ∀P ∈ X^λ, ∀x ∈ P : λ · psel(x) ≤ α0

  (i.e., no individual gets more than α0 offspring in expectation)

⁴The reproductive rate of an individual as defined here corresponds to the notion of "fitness" as used in the field of population genetics, i.e., the expected number of offspring.

(µ, λ)-selection mechanism

The probability of selecting the i-th ranked individual is pᵢ ∈ {0, 1/µ}.

- reproductive rate bounded by α0 = λ/µ
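A quick empirical check (not from the slides): sample (µ,λ)-selection many times and estimate each rank's reproductive rate λ · psel(xᵢ); the top µ ranks get rate λ/µ and the rest get 0.

```python
import random
from collections import Counter

def empirical_rates(lam, mu, trials=50_000, seed=5):
    """Estimate the reproductive rate lam * p_sel(i) of each rank i under
    (mu,lambda)-selection, i.e. uniform selection over the best mu ranks."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(mu) for _ in range(trials))
    return [lam * counts[i] / trials for i in range(lam)]

rates = empirical_rates(lam=10, mu=5)
print(rates[:5])   # each close to lam/mu = 2.0
print(rates[5:])   # all 0.0 (ranks below mu are never selected)
```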

Negative Drift Theorem for Populations (informal)

If individuals closer than b to the target have reproductive rate α0 < e^χ, then it takes exponential time e^(c(b−a)) to reach within distance a of the target.

Negative Drift Theorem for Populations

Consider the non-elitist EA with

- population size λ = poly(n)
- bitwise mutation rate χ/n for 0 < χ < n

and let T := min{t | H(Pt, x*) ≤ a} for any x* ∈ {0, 1}ⁿ.

If there are constants α0 ≥ 1, δ > 0, and integers a(n) and b(n) < n/χ with b(n) − a(n) = ω(ln n), s.t.

(C1) if a(n) < H(x, x*) < b(n), then λ · psel(x) ≤ α0,
(C2) ψ := ln(α0)/χ + δ < 1,
(C3) b(n) < min{n/5, (n/2)(1 − √(ψ(2 − ψ)))},

then there exist constants c, c′ > 0 such that

Pr(T ≤ e^(c(b(n)−a(n)))) ≤ e^(−c′(b(n)−a(n))).

The worst individuals have low reproductive rate

Lemma
Consider any selection mechanism which for x, y ∈ P satisfies

(a) If f(x) > f(y), then psel(x) > psel(y). (selection probabilities are monotone wrt fitness)
(b) If f(x) = f(y), then psel(x) = psel(y). (ties are drawn randomly)

If f(x) = min_{y∈P} f(y), then psel(x) ≤ 1/λ.
(individuals with lowest fitness have reproductive rate ≤ 1)

Proof.
- By (a) and (b), psel(x) = min_{y∈P} psel(y).
- 1 = Σ_{x∈P} psel(x) ≥ λ · min_{y∈P} psel(y) = λ · psel(x).

Example 1: Needle in the haystack

Definition

Needle(x) = 1 if x = 1ⁿ, and 0 otherwise.

Theorem
The optimisation time of the non-elitist EA, with any selection mechanism satisfying the properties above⁵, on Needle is at least e^(cn) with probability 1 − e^(−Ω(n)), for some constant c > 0.

⁵From black-box complexity theory, it is known that Needle is hard for all search heuristics (Droste et al., 2006).

Example 1: Needle in the haystack (proof⁶)

- Apply the negative drift theorem with a(n) := 1.
- By the previous lemma, we can choose α0 = 1 for any b(n), hence ψ = ln(α0)/χ + δ = δ < 1 for all χ and δ < 1.
- Choosing the parameters δ := 1/10 and b(n) := n/6 gives

  min{n/5, (n/2)(1 − √(ψ(2 − ψ)))} = n/5 > b(n).

- It follows that Pr(T ≤ e^(c(b(n)−a(n)))) ≤ e^(−Ω(n)).

⁶For simplicity, we assume that b(n) ≤ n/χ.

Exercise: Optimisation time on Jumpₖ

(Figure: fitness of Jumpₖ plotted against |x|, the number of 1-bits; the fitness is 0 on the gap of width k just below the optimum at |x| = n.)

Jumpₖ(x) := 0 if n − k ≤ |x| < n, and |x| otherwise.

Recipe
- a(n) = 1
- b(n) = k
- α0 = 1 as before
- small δ
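A Python sketch of Jumpₖ for experimenting with this recipe (the function name is mine). Gap individuals have fitness 0, the minimum value, which is what motivates reusing α0 = 1 in the recipe above.

```python
def jump_k(x, k):
    """Jump_k(x) = 0 on the gap n-k <= |x| < n, and |x| otherwise."""
    n = len(x)
    ones = sum(x)
    return 0 if n - k <= ones < n else ones

n, k = 6, 2
print([jump_k([1] * m + [0] * (n - m), k) for m in range(n + 1)])
# [0, 1, 2, 3, 0, 0, 6]: fitness drops to 0 on the gap just below the optimum
```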


When the best individuals have low reproductive rate

Remark
- The negative drift conditions hold trivially if α0 < e^χ holds for all individuals.

Example (Insufficient selective pressure)

Selection mechanism        Parameter settings
Linear ranking selection   η < e^χ
k-tournament selection     k < e^χ
(µ,λ)-selection            λ < µe^χ
Any in cellular EAs        ∆(G) < e^χ

Mutation-selection balance

(Figure: runtime regimes as a function of the mutation rate parameter χ and the selective pressure λ/µ; the runtime is exponential below the curve λ/µ = e^χ and polynomial above it.)

Example
The runtime T of a non-elitist EA with

- (µ, λ)-selection
- bit-wise mutation rate χ/n
- population size λ > c log(n)

on LeadingOnes has expected value

E[T] = e^(Ω(n)) if λ < µe^χ, and E[T] = O(nλ ln(λ) + n²) if λ > µe^χ.
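The dichotomy in this example reduces to comparing the selective pressure λ/µ with e^χ. A tiny helper (illustrative only, names mine) classifies parameter settings according to the stated result:

```python
import math

def leadingones_regime(lam, mu, chi):
    """Classify (mu,lambda) EA settings per the mutation-selection balance
    example: expected runtime on LeadingOnes is exponential when
    lam < mu*e^chi, and polynomial, O(n*lam*ln(lam) + n^2), when
    lam > mu*e^chi."""
    return "polynomial" if lam > mu * math.exp(chi) else "exponential"

print(leadingones_regime(lam=20, mu=5, chi=1.0))  # lam/mu = 4 > e: polynomial
print(leadingones_regime(lam=10, mu=5, chi=1.0))  # lam/mu = 2 < e: exponential
```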


Other Example Applications

Expected runtime of the EA with bit-wise mutation rate χ/n

Selection Mechanism     High Selective Pressure    Low Selective Pressure
Fitness Proportionate   ν > fmax ln(2e^χ)          ν < χ/ln 2 and λ ≥ n³
Linear Ranking          η > e^χ                    η < e^χ
k-Tournament            k > e^χ                    k < e^χ
(µ, λ)                  λ > µe^χ                   λ < µe^χ
Cellular EAs                                       ∆(G) < e^χ

Onemax                  O(nλ²)                     e^(Ω(n))
LeadingOnes             O(nλ² + n²)                e^(Ω(n))
Linear Functions        O(nλ² + n²)                e^(Ω(n))
r-Unimodal              O(rλ² + nr)                e^(Ω(n))
Jumpᵣ                   O(nλ² + (n/χ)ʳ)            e^(Ω(n))


Summary

- Runtime analysis of evolutionary algorithms
  - mathematically rigorous statements about EA performance
  - most previous results concern simple EAs, such as the (1+1) EA
  - special techniques have been developed for population-based EAs
- Level-based method [Corus et al., 2014]
  - EAs analysed from the perspective of EDAs
  - upper bounds on expected optimisation time
  - example applications include crossover and noise
- Negative drift theorem [Lehre, 2011b]
  - reproductive rate vs selective pressure
  - exponential lower bounds
  - mutation-selection balance

Acknowledgements

- Dogan Corus, University of Nottingham, UK
- Duc-Cuong Dang, University of Nottingham, UK
- Anton Eremeev, Omsk Branch of Sobolev Institute of Mathematics, Russia

The research leading to these results has received funding from theEuropean Union Seventh Framework Programme (FP7/2007-2013)under grant agreement no. 618091 (SAGE).

References I

Chen, T., He, J., Sun, G., Chen, G., and Yao, X. (2009). A new approach for analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(5):1092–1106.

Corus, D., Dang, D.-C., Eremeev, A. V., and Lehre, P. K. (2014). Level-based analysis of genetic algorithms and other search processes. In Parallel Problem Solving from Nature - PPSN XIII - 13th International Conference, Ljubljana, Slovenia, September 13-17, 2014. Proceedings, pages 912–921.

Dang, D.-C. and Lehre, P. K. (2014). Refined upper bounds on the expected runtime of non-elitist populations from fitness-levels. In Proceedings of the 16th Annual Conference on Genetic and Evolutionary Computation (GECCO 2014), pages 1367–1374.

Goldberg, D. E. and Deb, K. (1991). A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, pages 69–93. Morgan Kaufmann.

References II

He, J. and Yao, X. (2003). Towards an analytic framework for analysing the computation time of evolutionary algorithms. Artificial Intelligence, 145(1-2):59–97.

Jansen, T., De Jong, K. A., and Wegener, I. (2005). On the choice of the offspring population size in evolutionary algorithms. Evolutionary Computation, 13(4):413–440.

Lehre, P. K. (2011a). Fitness-levels for non-elitist populations. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO 2011), pages 2075–2082, New York, NY, USA. ACM.

Lehre, P. K. (2011b). Negative drift in populations. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 244–253. Springer Berlin/Heidelberg.

Lehre, P. K. and Yao, X. (2012). On the impact of mutation-selection balance on the runtime of evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 16(2):225–241.

References III

Neumann, F., Oliveto, P. S., and Witt, C. (2009). Theoretical analysis of fitness-proportional selection: landscapes and efficiency. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO 2009), pages 835–842, New York, NY, USA. ACM.

Rowe, J. E. and Sudholt, D. (2012). The choice of the offspring population size in the (1,λ) EA. In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation (GECCO 2012), pages 1349–1356, New York, NY, USA. ACM.

Witt, C. (2006). Runtime analysis of the (µ + 1) EA on simple pseudo-Boolean functions. Evolutionary Computation, 14(1):65–86.

Zarges, C. (2009). On the utility of the population size for inversely fitness proportional mutation rates. In FOGA '09: Proceedings of the Tenth ACM SIGEVO Workshop on Foundations of Genetic Algorithms, pages 39–46, New York, NY, USA. ACM.