Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility...

24
Stochastic Models Markov Chain Monte Carlo Introduction Matrix in R Lecture 2: Stochastic Models and MCMC October 5, 2010 Special Meeting 1/24

Transcript of Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility...

Page 1: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Lecture 2: Stochastic Models and MCMC

October 5, 2010

Special Meeting 1/24

Page 2: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Dynamical System

Knowing the state xt at time t, a system allows us to determinext+1. This suggests the evolution of states in the system

xt+1 = φt(xt)

where φt maps from all the possible states to themselves.

Special Meeting 2/24

Page 3: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Probabilistic Dynamical System

When the dynamical system is probabilistic it may be determinedby the transition probability

Pij = P(Xt+1 = j |Xt = i) = P(φt(i) = j)

for finitely many states i and j , with initial distributionp0(i) = P(X0 = i). This is a model for Markov chain.

Special Meeting 3/24

Page 4: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Transition probability matrix.

When the system has a finite state S = {1, . . . ,M}, the Markovchain is represented by the transition probability matrix

P =

P1,1 P1,2 · · · P1,M

P2,1 P2,2 · · · P2,M

. . . . . . . . . . . . . . . . . . . . . . .PM,1 PM,2 · · · PM,M

Since P is a square matrix, we can also introduce the n-th powerPn whose (i , j)-entry is denoted by Pn

ij . It represents the n-steptransition probability

Pnij = P(Xn = j | X0 = i)

Special Meeting 4/24

Page 5: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Transition Graph Representation

A Markov chain and its transition probabilities can be formulatedin terms of directed graph and function on its edges. Here Sbecomes the set of vertices of graph, and E = {(i , j) : Pij > 0}represents the collection of edges. The transition probability Pij ’scan be considered as a function on E .

Special Meeting 5/24

Page 6: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Communication Classes

When we can find a series of edges

(i0, i1), (i1, i2), . . . , (in−1, in)

we call it a directed path from i0 to in. There is a directed pathfrom i and j if and only if Pn

i ,j > 0 for some n. We say that state jcan be reached from state i , and write “i → j” if Pn

ij > 0 for somen ≥ 0, that is, if there is a directed path from i and j . The twostates i and j are said to communicate, denoted by “i ↔ j ,” if wehave i → j and j → i . It is an equivalence relation, and therefore,the state space S can be partitioned into equivalent classes, whichwe call communication classes.

Special Meeting 6/24

Page 7: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Irreducibility

The transition probability matrix P is said to be irreducible if itconsists of only one communication class. A communication classC is called recurrent if no state outside can be reached from anystate inside, that is,

Pni ,j = 0 for any i ∈ C and j 6∈ C , and for all n ≥ 0.

In particular, a state i is called absorbing if Pii = 1 since theMarkov chain has to stay at the state i forever (thus, “absorbed”)once it reaches there. Otherwise, that is, if a communication classis not recurrent, we call it transient.

Special Meeting 7/24

Page 8: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

An Example

The transition probability matrix

P =

0.3 0.1 0.5 0.10.2 0.4 0.2 0.20 0 1 00 0 0 1

corresponds to the previous example of graph representation. Herethe communication classes are {1, 2}, {3}, and {4}. Note that{3}, and {4} are absorbing.

Special Meeting 8/24

Page 9: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Periodicity of Recurrent Classes

We say that a recurrent state i has the period d if Pnii > 0 only

when n = kd with some integer k. Furthermore, if state i has theperiod d and i ↔ j , then state j will have the same period d ; thus,a recurrent class must have a common period d . If a recurrentclass has the period d ≥ 2 then it is called periodic; otherwise,(that is, if d = 1) it is said to be aperiodic. If a recurrent class Cis aperiodic then

Pnij > 0 whenever i , j ∈ C

for sufficiently large n.

Special Meeting 9/24

Page 10: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Eigenvalue and Eigenvector

A raw vector uT (which is a column vector u “transposed”) iscalled a left-eigenvector if it satisfies uT P = λuT . Then λ iscalled the corresponding eigenvalue. The eigenvalue λ = 1 isknown to be the largest in absolute value in any matrix P ofprobability transition matrix. Furthermore, Frobenius showed thatits corresponding left-eigenvector uT = [π1 · · ·πM ] is strictlypositive (i.e., πi > 0 for all i = 1, . . . ,M), and can be chosen asthe unique solution to the equations

πj =M∑i=1

πiPij for all j = 1, . . . ,M andM∑i=1

πi = 1.

Thus, π(j) is viewed as probability mass function, and called thestationary distribution.

Special Meeting 10/24

Page 11: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Perron–Frobenius theorem

Historically this fundamental discovery was made by Perron in1907. Suppose that P is aperiodic and irreducible. Then P has aunique left-eigenvector uT = [π1 · · ·πM ] representing thestationary distribution and

limn→∞

Pnij = πj for all i = 1, . . . ,M.

Moreover, if λ2 is the second largest eigenvalue of P in absolutevalue, for any |λ2| < ρ < 1 we can find C > 0 such that

max1≤i ,j≤M

|Pnij − πj | < Cρn for all n = 1, 2, . . ..

Special Meeting 11/24

Page 12: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Matrix Manipulation

A matrix and its multiplication with a row vector are introduced inR. By default data are filled into the matrix, starting from the firstcolumn to the last column with the specified number ncol ofcolumns.

> P = matrix(c(0, 0, 0.6, 1, 0.1, 0.4, 0, 0.9, 0), ncol=3)> P> P %*% P> c(0.2, 0.8, 0) %*% P %*% P

The result should define a transition probability matrix and givethe probability distribution after the second step given the initialdistribution p0(1) = 0.2, p0(2) = 0.8, and p0(3) = 0.

Special Meeting 12/24

Page 13: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

IntroductionMatrix in R

Stationary Distribution and Eigenvector

Compare the probability at the Nth step with the normalizedeigenvetor corresponding to λ = 1. Change the initial distributionand the size N, and compare the result. Note that eigen()calculates the left-eigenvectors when the transposed matrix t(P) isgiven.

> N = 20> A = P> for(i in 1:N) A = A %*% P> A> c(0.2, 0.8, 0) %*% A> uu = as.real(eigen(t(P))$vectors[,1])> uu / sum(uu)

Special Meeting 13/24

Page 14: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Markov chain

Let S be a discrete state space and let (Xt , t = 0, 1, . . .) be adiscrete time stochastic process, taking its value on S . Then Xt iscalled a Markov chain if

P(Xt = xt |Xs = xs , s ≤ t ′) = P(Xt = xt |Xt′ = xt′)

for t ′ < t. Let P(x , y) be a transition probability, that is, aprobability distribution P(x , ·) on S for each x ∈ S . Then one canconstruct a Markov chain Xt so that

P(Xt = y |Xt−1 = x) = P(x , y).

Then we can easily see that P(Xt+n = y |Xt = x) = Pn(x , y).

Special Meeting 14/24

Page 15: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Ergodicity and Limit Theorem

We call (Xt) (and its transition probability P) an ergodic Markovchain if P is irreducible and aperiodic. Now suppose that we havedevised an ergodic Markov chain whose stationary distribution is π.Then we have the following convergence theorem: If P isirreducible and aperiodic, then there is a unique stationarydistribution π such that

Pn(x , y) = P(Xn = y |X0 = x) → π(y) as n →∞,

regardless of the choice for x . In terms of sample path it impliesthat

XtL−→ π as t −→∞.

Special Meeting 15/24

Page 16: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Random beta-Walk

Here we will introduce a random walk on (0, 1) in which the nextstep Xt+1 is determined by the beta distribution with parameterα = max(δ + θXt , 1) and β = max(δ + θ(1− Xt), 1). Theseparameters can control the behavior of walk.

I A smaller δ keeps a sample path closer to either of theboundary.

I The larger the value θ is the smaller the move of each stepbecomes.

Thus, 0 < δ < 1 will change the shape of stationary distribution ofthe random walk, and θ > 0 will influence the speed ofconvergence of random walk. Later we use this random walk as aproposal Markov chain on (0, 1).

Special Meeting 16/24

Page 17: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Simulation in R

Download the R code “bwalk.r” and observe a sample path ofthe beta random walk.

I Choose a different choice of δ and θ.

I Change the running time and see how long it takes to displaya stationary behavior.

> source("bwalk.r")> sample.path = rwalk(move=bmove, trajectory=T, run.time=500,delta=0.8, theta=20)> par(mfrow=c(2,1))> plot(sample.path, type="l", xlab="time", ylab="state")

Special Meeting 17/24

Page 18: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

A Long Run Behavior

A long run behavior can be observed from the histogram of Xt

from the end of runs repeatedly.

I Change the running time, and see if the distribution of Xt isdifferent.

I Obtain the histogram of Xt for a different choice of δ and θ.

I Change the initial point by adding “init.state=1”, and seewhether it affects the long run behavior.

> sample.data = rwalk(move=bmove, run.time=100,delta=0.8, theta=20, sample.size=500)> hist(sample.data, freq=F, breaks=seq(0,1,by=0.05), col="red")

Special Meeting 18/24

Page 19: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Emergence of Markov Chain Monte Carlo

In reality the “state” space for π(x) is not R. Either it is a subsetof Rn (in Bayesian applications), or it has a complex discretestructure (e.g., Ising model). For such models both algorithms viainverse probability transform and resampling method are notapplicable in general. By way of Markov chain convergencetheorem one can construct a Markov chain X whose stationarydistribution is π.

XtL−→ π as t −→∞.

This methodology gives a way to break the limitation of MonteCarlo simulation. But how can we make it happen?

Special Meeting 19/24

Page 20: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Metropolis Algorithm

Let π be a pdf of interest on a state space S , and let Q be atransition probability on S . Define the acceptance probability by

A(x , y) := min

{π(y)Q(y , x)

π(x)Q(x , y), 1

}for a move from x to y satisfying Q(x , y) 6= 0 and Q(y , x) 6= 0.

Metropolis-Hastings Algorithm (MHA)

1. Choose an initial state X0 at time t = 0.

2. Suppose that Xt = x at time t. Then pick y according toQ(x , ·).

3. Move to Xt+1 = y with the probability A(x , y), or stay Xt+1 =x with the probability (1− A(x , y)).

Special Meeting 20/24

Page 21: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Transition Probability for MHA

One should choose Q to be irreducible on S , and call it a proposalchain. In terms of transition probability, the above algorithm isgiven by

P(x , y) =

{Q(x , y)A(x , y) if x 6= y ;

Q(x , x) +∑

z∈S Q(x , z)(1− A(x , z)) if x = y .

Then P is irreducible, aperiodic and reversible transition probabilitywith the stationary distribution π.

Special Meeting 21/24

Page 22: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Property of Time-Reversibility

Let π be a probability distribution on S . We call π stationary if∑x∈S

π(x)P(x , y) = π(y)

for all y ∈ S . A Markov chain P̃ is said to be time-reversed if thedetailed balance

π(x)P(x , y) = π(y)P̃(y , x)

holds between P and P̃. Moreover, P is called a reversible Markovchain when P = P̃. In particular if the probability distribution πsatisfies the detailed balance between the two Markov chains Pand P̃, then π becomes stationary for both P and P̃.

Special Meeting 22/24

Page 23: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Reversibility for MHA

The detailed balance clearly holds for either x = y orπ(x)Q(x , y) = π(y)Q(y , x). Suppose that x 6= y , and thatπ(x)Q(x , y) > π(y)Q(y , x). Then

A(x , y) = π(y)Q(y ,x)π(x)Q(x ,y) and A(y , x) = 1,

and therefore, we obtain

π(x)P(x , y) = π(x)Q(x , y)A(x , y)= π(y)Q(y , x) = π(y)P(y , x).

It also holds for the case that π(x)Q(x , y) < π(y)Q(y , x) bysymmetry.

Special Meeting 23/24

Page 24: Lecture 2: Stochastic Models and MCMC October 5, 2010 · Introduction Matrix in R Irreducibility The transition probability matrix P is said to be irreducible if it consists of only

Stochastic ModelsMarkov Chain Monte Carlo

Markov ChainExploration with RMetropolis Algorithm

Metropolis for Normal Mixture

Here we use a random beta walk as a proposal chain, and run theMetropolis-Hastings Algorithm to generate a normal mixturedensity on [0, 1].

> source("nm.r")> source("metro.r")> sample = bwalk.metro(ff=nm, run.time=100, delta=0.8,theta=20, sample.size=500)> hist(sample, freq=F, breaks=seq(0,1,by=0.05), col="red")> x = seq(0,1,by=0.01)> lines(x, nm(x))

Change the running time and the parameters for random betawalk, and see how the histogram differs from the targetdistribution of normal mixture.

Special Meeting 24/24