
Markov Chains

Math 705 Topics in Probability, Spring 2010

May 19, 2010


Transient States For Continuous Time Markov Chains

We begin with relevant definitions that we shall need for this discussion.

Definition 1. A stochastic process {Xt : t ∈ [0, ∞)}, defined on a probability space (Ω, F, µ) and with values in a countable set E (the state space), is called a continuous time Markov chain if for any 0 ≤ t1 < t2 < ⋯ < t_{n+1} and any states i1, ..., in, j ∈ E such that P(X_{tn} = in, X_{t_{n−1}} = i_{n−1}, ..., X_{t1} = i1) > 0, we have

P(X_{t_{n+1}} = j | X_{tn} = in, X_{t_{n−1}} = i_{n−1}, ..., X_{t1} = i1) = P(X_{t_{n+1}} = j | X_{tn} = in).

Definition 2. A Markov chain is time homogeneous if P(Xt = j | Xs = i) depends on s and t only through t − s.

In our discussion we shall only consider time homogeneous processes. Define the transition function of Xt as

Pi,j(t) ≡ P(Xt = j | X0 = i).

Definition 3. Given i, j ∈ E, we say that j can be reached from i (written i → j) if Pi,j(t) > 0 for some t > 0, and therefore for all t > 0. We say that i and j communicate, and write i ↔ j, if i and j can each be reached from the other.

We can define a partial ordering by saying that i > j if i → j (this is a genuine partial order on the communicating classes rather than on individual states), and ↔ defines an equivalence relation on E.

Definition 4. We say a state i ∈ E is recurrent if ∫_0^∞ Pi,i(t) dt = ∞, and transient otherwise.

Note that

∫_0^∞ Pi,i(t) dt = ∫_0^∞ E(1_{Xt=i} | X0 = i) dt = E( ∫_0^∞ 1_{Xt=i} dt | X0 = i ),

the expected time spent in i given that we start in i at time 0.

Proposition 0.0.1. Suppose i ↔ j. Then i is transient iff j is transient. Therefore in a

communicating class C, if one state is transient, then they all are.

Proof : Suppose i is transient. Then since i ↔ j, ∃ s, t > 0 such that Pi,j(t) > 0 and

Pj,i(s) > 0. Therefore for any u ≥ 0, Pi,i(t + u + s) ≥ Pi,j(t)Pj,j(u)Pj,i(s) by a simple

application of the Chapman-Kolmogorov equations. Therefore

∫_0^∞ Pi,i(v) dv ≥ ∫_0^∞ Pi,i(s + t + u) du ≥ ∫_0^∞ Pi,j(t) Pj,j(u) Pj,i(s) du = Pi,j(t) Pj,i(s) ∫_0^∞ Pj,j(u) du.

So if ∫_0^∞ Pi,i(v) dv < ∞, then ∫_0^∞ Pj,j(u) du < ∞, i.e. j is transient as well.


Notice that we can modify the previous proof to show that if i ↔ j and i is recurrent, then

so is j.

Now we shall prove the main result of this note. First we define the δ-skeleton, the discretization of Xt with time step δ.

Definition 5. Given δ > 0, the discrete-time Markov chain {X(nδ) : n ≥ 0}, with one-step transition probabilities Pi,j(δ), is called the δ-skeleton of {Xt : t ≥ 0}.

Theorem 0.0.2. Let δ > 0 and i ∈ E be fixed. Then i is transient ⇔ i is transient in the δ-skeleton of Xt.

Proof: Note that ∫_0^∞ Pi,i(t) dt = ∑_{n≥0} ∫_{nδ}^{(n+1)δ} Pi,i(t) dt. Therefore

∑_{n≥0} δ min_{0≤s≤δ} Pi,i(nδ + s) ≤ ∫_0^∞ Pi,i(t) dt ≤ ∑_{n≥0} δ max_{0≤s≤δ} Pi,i(nδ + s). (1)

Since Pi,i(nδ + s) ≥ Pi,i(nδ) Pi,i(s) for all s ≥ 0 by the Chapman-Kolmogorov equations,

min_{0≤s≤δ} Pi,i(nδ + s) ≥ Pi,i(nδ) min_{0≤s≤δ} Pi,i(s), where min_{0≤s≤δ} Pi,i(s) > 0. (2)

Also Pi,i((n+1)δ) = Pi,i(nδ + s + (δ − s)) ≥ Pi,i(nδ + s) Pi,i(δ − s), therefore

max_{0≤s≤δ} Pi,i(nδ + s) ≤ Pi,i((n+1)δ) / min_{0≤s≤δ} Pi,i(δ − s). (3)

If we combine (1), (2), and (3), we see that

( min_{0≤s≤δ} Pi,i(s) ) δ ∑_{n≥0} Pi,i(nδ) ≤ ∫_0^∞ Pi,i(t) dt ≤ ( δ / min_{0≤s≤δ} Pi,i(s) ) ∑_{n≥0} Pi,i((n+1)δ). (4)

If we remember that a state of a discrete Markov chain is transient iff the sum of its n-step return probabilities is finite, then we see from (4) that ∫_0^∞ Pi,i(t) dt < ∞ if and only if ∑_{n≥0} Pi,i(nδ) < ∞.
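As a sanity check of this equivalence, here is a small numerical sketch (assuming NumPy and SciPy; the 3-state generator G with an absorbing state is a hypothetical example). It approximates ∫_0^∞ P0,0(t) dt on a grid and compares it with the δ-skeleton sums δ ∑_{n≥0} P0,0(nδ); all quantities are finite, reflecting that state 0 is transient.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state generator with state 2 absorbing, so states 0 and 1
# are transient; rows sum to 0.
G = np.array([[-1.0,  1.0, 0.0],
              [ 0.5, -1.5, 1.0],
              [ 0.0,  0.0, 0.0]])

# Occupation time of state 0: integral of P00(t) dt, via the trapezoidal rule.
ts = np.linspace(0.0, 60.0, 3001)
P00 = np.array([expm(t * G)[0, 0] for t in ts])
integral = np.sum(0.5 * (P00[1:] + P00[:-1]) * np.diff(ts))

# delta-skeleton sums delta * sum_n P00(n*delta) for two step sizes.
for delta in (0.5, 1.0):
    Pd = expm(delta * G)     # one-step matrix of the delta-skeleton
    Pn = np.eye(3)
    s = 0.0
    for _ in range(400):     # the tail is negligible by then
        s += Pn[0, 0]
        Pn = Pn @ Pd
    print(f"delta={delta}: delta * sum P00(n delta) = {delta * s:.4f}")

print(f"integral of P00(t) dt ≈ {integral:.4f} (finite: state 0 is transient)")
```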


Bibliography

[1] William J. Anderson, Continuous-Time Markov Chains: An Applications-Oriented Approach, Springer-Verlag, 1991.


Markov Chains and Coupling

Introduction

Let Xn denote a Markov chain on a countable space S that is aperiodic, irreducible and positive recurrent, and hence has a stationary distribution π. Letting P denote the state transition matrix, we have for any initial distribution λ that

P(Xn = j) = ∑_i λ_i p_{ij}^{(n)} → π_j

as n → ∞, where the superscript (n) denotes the n-step transition probability. Depending on where the chain begins, there is no indication of the speed of convergence. Do we need to run the chain 100 times? 1000? This is an important question, since many problems are solved using Markov Chain Monte Carlo (MCMC), where one wishes to sample from an unknown distribution π and hence sets up a Markov chain whose stationary distribution is precisely π; after a certain number of steps one records the current state as a sample from that distribution.

More precisely, we would like an estimate of |P(Xn = j) − π_j|, and to be able to guarantee that this is small for large n.

The Intuition

The key insight to solving this problem is the following. If we knew the distribution π, then we could simply pick the initial value X0 according to this distribution, and we would forever be in stationarity. In reality we do not know this distribution, so we instead pick the starting location from some other distribution, say λ, and evolve according to P. In each instance we evolve according to P, and once we are in stationarity we stay there.

Herein lies the method of solution. Consider now two processes, X^1_n and X^2_n. Let X^1_0 be chosen according to π, and X^2_0 according to some other distribution λ. It is clear that X^1 will always be in stationarity, but what if X^2_0 = X^1_0? Won't X^2 too be in stationarity? The answer is affirmative, since of course they both evolve according to the same transition matrix P.

Extending this to any other meeting time T, we have that once the two processes X^1 and X^2 meet, they are both in stationarity, and we say they are coupled and treat them as one process.

To visualize the above explanation, imagine two random walks in dimension 1 starting at time 0 from different points on the line. Let them evolve independently until they meet, and then let them continue together so that they are 'coupled'.

Formal Calculations

More formally, we have the following formulation of the problem. Let X^1, X^2, and X^0 be Markov chains satisfying the assumptions in the Introduction.


Suppose that X^1 has initial distribution π and X^2 has initial distribution λ. Define

T = min{n ≥ 0 : X^1_n = X^2_n}.

Then letting X^0 be given by

X^0_n = X^1_n for n < T, and X^0_n = X^2_n for n ≥ T,

we have that X^0 is initially equal to X^1, and after time T, which is called the coupling time, X^0 is equal to X^2_n. Since X^0 starts in π and evolves according to P throughout, X^0_n has distribution π for every n. Let us assume that T < ∞ a.s. (as will be the case in any finite state system, or e.g. for a random walk in dimensions 1 and 2 but not 3 and above); then we have

|P(X^2_n = j) − π_j| = |P(X^2_n = j) − P(X^0_n = j)|
= |P(X^2_n = j, T ≤ n) + P(X^2_n = j, T > n) − P(X^0_n = j, T ≤ n) − P(X^0_n = j, T > n)|
= |P(X^2_n = j, T > n) − P(X^0_n = j, T > n)|   (since X^0_n = X^2_n on {T ≤ n})
≤ P(X^2_n = j, T > n) + P(X^0_n = j, T > n)   (1)
≤ 2P(T > n).   (2)

So if we are interested in estimating the error of a single sample, we can use Equation (2) to do so. Moreover, since we are in a countable state space, we can sum over all j on both sides and use Equation (1) to obtain

‖P(Xn ∈ ·) − π‖_TV ≤ 2P(T > n),

where ‖·‖_TV denotes the total variation distance. This allows us to prove uniform convergence towards the stationary distribution under certain conditions.
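A small simulation can make the bound concrete. The sketch below (assuming NumPy; the 4-state matrix P and all numbers are hypothetical) runs the two chains independently until they meet, estimates P(T > n) from many runs, and compares ∑_j |P(Xn = j) − π_j| with the bound 2P(T > n).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-state transition matrix (rows sum to 1).
P = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.2, 0.4, 0.3, 0.1],
              [0.1, 0.2, 0.5, 0.2],
              [0.3, 0.1, 0.2, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

def coupling_time(rng):
    """Meeting time of X^1 (started from pi) and X^2 (started at state 0)."""
    x = rng.choice(4, p=pi)
    y = 0
    t = 0
    while x != y:
        x = rng.choice(4, p=P[x])
        y = rng.choice(4, p=P[y])
        t += 1
    return t

Ts = np.array([coupling_time(rng) for _ in range(20000)])

lam = np.array([1.0, 0.0, 0.0, 0.0])  # X^2_0 = state 0
for n in (1, 2, 4, 8):
    err = np.abs(lam @ np.linalg.matrix_power(P, n) - pi).sum()
    print(f"n={n}: sum_j |P(Xn=j) - pi_j| = {err:.4f} "
          f"<= 2 P(T>n) ≈ {2 * np.mean(Ts > n):.4f}")
```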

A Symmetric Example

Consider the case where P is symmetric, i.e. P^T = P, and there are a finite number of states, say n. Let π = (π1, ..., πn) be the stationary distribution (so πP = π), which we assume is unknown, and let λ = (λ1, ..., λn) be the initial distribution given to our simulated process. First recalling that π = πP = πP^T, we then have


P(T = 0) = ∑_{i=1}^n π_i λ_i = πλ^T,

P(T = 1) = (1 − πλ^T) · π(λP)^T = (1 − πλ^T)(πP^T λ^T) = (1 − πλ^T)(πλ^T),

⋮

P(T = k) = (1 − πλ^T)^k πλ^T,

treating meetings at successive times as independent, and where we note that πλ^T = ∑_{j=1}^n π_j λ_j is simply a scalar value. Hence T + 1 is geometrically distributed with parameter πλ^T, so E(T) = 1/(πλ^T) − 1 and

P(T > n) = (1 − πλ^T)^{n+1}. (3)

The simplest choice for λ is to pick a state j0 ∈ {1, ..., n} and let λ_{j0} = 1 and all other entries 0, which represents starting our process at the same point each time. Then Equation (3) becomes

P(T > n) = (1 − π_{j0})^{n+1}.

It is then apparent that the best way to minimize this probability is to start at the most likely state under the stationary distribution.
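The step treating successive meeting events as independent is a heuristic, so it is worth checking numerically. The sketch below (assuming NumPy; the symmetric 4-state P is a hypothetical example, for which π is uniform) compares the empirical tail of the coupling time with (1 − πλ^T)^{n+1}.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical symmetric transition matrix on 4 states (P^T = P, rows sum
# to 1); symmetry forces the uniform stationary distribution pi_j = 1/4.
P = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.3, 0.3, 0.2, 0.2],
              [0.2, 0.2, 0.4, 0.2],
              [0.1, 0.2, 0.2, 0.5]])
pi = np.full(4, 0.25)
j0 = 0                      # lambda puts mass 1 on state j0
p = pi[j0]                  # pi lambda^T = pi_{j0}

def coupling_time(rng):
    x = rng.choice(4, p=pi)  # X^1_0 ~ pi
    y = j0                   # X^2_0 = j0
    t = 0
    while x != y:
        x = rng.choice(4, p=P[x])
        y = rng.choice(4, p=P[y])
        t += 1
    return t

Ts = np.array([coupling_time(rng) for _ in range(20000)])
for n in (0, 2, 4, 6):
    print(f"P(T > {n}): empirical {np.mean(Ts > n):.4f}, "
          f"heuristic (1-p)^(n+1) = {(1 - p) ** (n + 1):.4f}")
```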

Reference: Torgny Lindvall, Lectures on the Coupling Method, 1992.


Reversible Markov chain

April 27, 2010

I. The case in discrete time:

Suppose that {Xn : 0 ≤ n ≤ N} is an irreducible non-null persistent Markov chain, with transition matrix P and stationary distribution π. Suppose further that Xn has distribution π for all n. Define the 'reversed chain' Y by Yn = X_{N−n} for 0 ≤ n ≤ N.

Theorem. The sequence Y is a Markov chain with P(Y_{n+1} = j | Yn = i) = (π_j/π_i) p_{ji}.

Proof. We have

P(Y_{n+1} = i_{n+1} | Yn = i_n, Y_{n−1} = i_{n−1}, ..., Y0 = i_0) = P(Y_k = i_k, 0 ≤ k ≤ n+1) / P(Y_k = i_k, 0 ≤ k ≤ n)

= P(X_{N−n−1} = i_{n+1}, X_{N−n} = i_n, ..., X_N = i_0) / P(X_{N−n} = i_n, ..., X_N = i_0)

= ( π_{i_{n+1}} p_{i_{n+1}, i_n} p_{i_n, i_{n−1}} ⋯ p_{i_1, i_0} ) / ( π_{i_n} p_{i_n, i_{n−1}} ⋯ p_{i_1, i_0} )

= π_{i_{n+1}} p_{i_{n+1}, i_n} / π_{i_n} = P(Y_{n+1} = i_{n+1} | Yn = i_n),

as required.

We call the chain Y the time-reversal of the chain X, and we say that X is reversible if X and Y have the same transition probabilities.

Definition. Let X = {Xn : 0 ≤ n ≤ N} be an irreducible Markov chain such that Xn has the stationary distribution π for all n. The chain is called reversible if the transition matrices of X and its time-reversal Y are the same, that is,

π_i p_{ij} = π_j p_{ji} for all i, j. (∗)

Equations (∗) are called the detailed balance equations. More generally, we say that a transition matrix P and a distribution λ are in detailed balance if λ_i p_{ij} = λ_j p_{ji} for all i, j in the state space S. An irreducible chain X having a stationary distribution π is called reversible in equilibrium if its transition matrix P is in detailed balance with π.
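As a concrete check, the sketch below (assuming NumPy; the birth-death parameters are hypothetical) builds a birth-death chain, a standard example of a reversible chain, computes its stationary distribution, and verifies the detailed balance equations π_i p_{ij} = π_j p_{ji}.

```python
import numpy as np

# Hypothetical birth-death chain on {0,...,4}: up with prob 0.3, down with
# prob 0.2, stay otherwise; birth-death chains are reversible in equilibrium.
n, up, down = 5, 0.3, 0.2
P = np.zeros((n, n))
for i in range(n):
    if i + 1 < n:
        P[i, i + 1] = up
    if i - 1 >= 0:
        P[i, i - 1] = down
    P[i, i] = 1.0 - P[i].sum()

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

# Detailed balance: the matrix (pi_i p_ij) should be symmetric.
balance = pi[:, None] * P
print("detailed balance holds:", np.allclose(balance, balance.T))
```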


Theorem. Let P be the transition matrix of an irreducible chain X, and suppose that there exists a distribution π such that π_i p_{ij} = π_j p_{ji} for all i, j ∈ S. Then π is a stationary distribution of the chain. Furthermore, X is reversible in equilibrium.

Proof. Suppose that π satisfies the conditions of the theorem. Then

∑_i π_i p_{ij} = ∑_i π_j p_{ji} = π_j ∑_i p_{ji} = π_j,

and so π = πP; hence π is stationary. The reversibility in equilibrium of X follows from the definition.

Theorem (Kolmogorov's criterion for reversibility). Let X be an irreducible non-null persistent aperiodic Markov chain. Then X is reversible in equilibrium if and only if

p_{j1 j2} p_{j2 j3} ⋯ p_{j_{n−1} j_n} p_{j_n j_1} = p_{j_1 j_n} p_{j_n j_{n−1}} ⋯ p_{j_2 j_1}

for all n and all finite sequences j_1, j_2, ..., j_n of states.

Proof. Suppose that X is reversible in equilibrium. We note that a stationary distribution π exists, and π_i ≠ 0 for all i by the conditions of the theorem. Then

π_{j_1} p_{j_1 j_2} p_{j_2 j_3} ⋯ p_{j_{n−1} j_n} p_{j_n j_1} = p_{j_2 j_1} π_{j_2} p_{j_2 j_3} ⋯ p_{j_{n−1} j_n} p_{j_n j_1}

= p_{j_2 j_1} p_{j_3 j_2} π_{j_3} ⋯ p_{j_{n−1} j_n} p_{j_n j_1}

⋮

= p_{j_2 j_1} p_{j_3 j_2} ⋯ p_{j_n j_{n−1}} p_{j_1 j_n} π_{j_1}

by induction. Since π_{j_1} ≠ 0 by the above remark, the desired conclusion follows.

Suppose now that Kolmogorov's condition holds. Denote p_{ij}^{(n)} := P(Xn = j | X0 = i). We claim that

p_{ij}^{(n)} p_{ji} = p_{ij} p_{ji}^{(n)} for all n = 1, 2, ...

It is enough to show this for n = 3; the calculation for other values of n is similar:


p_{ij}^{(3)} p_{ji} = ∑_{k∈S} p_{ik}^{(2)} p_{kj} p_{ji}

= ∑_{k∈S} ∑_{l∈S} p_{il} p_{lk} p_{kj} p_{ji}

= ∑_{k,l∈S} p_{il} p_{lk} p_{kj} p_{ji}

= ∑_{k,l∈S} p_{ij} p_{jl} p_{lk} p_{ki}   (Kolmogorov's condition on the cycle i → l → k → j → i)

= p_{ij} p_{ji}^{(3)}.

The conditions on the chain also imply that X is ergodic, so p_{ij}^{(n)} → π_j as n → ∞. Taking the limit on both sides of the claimed identity gives π_j p_{ji} = π_i p_{ij}, the desired result.
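A brute-force check of the cycle condition is easy for small chains. The sketch below (assuming NumPy; both matrices are hypothetical examples) tests every cycle of length up to 4 for a symmetric (hence reversible) chain and for a "rotating" chain, which fails the criterion.

```python
import numpy as np
from itertools import permutations

def kolmogorov_criterion(P, max_len=4):
    """Compare each cycle product p_{j1 j2}...p_{jn j1} with its reversal
    (brute force over all cycles up to max_len; small chains only)."""
    m = P.shape[0]
    for L in range(2, max_len + 1):
        for cyc in permutations(range(m), L):
            fwd = np.prod([P[cyc[k], cyc[(k + 1) % L]] for k in range(L)])
            bwd = np.prod([P[cyc[(k + 1) % L], cyc[k]] for k in range(L)])
            if not np.isclose(fwd, bwd):
                return False, cyc
    return True, None

# Symmetric P: reversible, so the criterion holds.
P_sym = np.array([[0.5, 0.3, 0.2],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.3, 0.5]])
# A "rotation" chain 0 -> 1 -> 2 -> 0: not reversible.
P_rot = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])

print(kolmogorov_criterion(P_sym))  # (True, None)
print(kolmogorov_criterion(P_rot))  # (False, <a violating cycle>)
```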

II. The case in continuous time:

Some preliminaries:

Definition. A continuous time process X satisfies the Markov property if

P(X(tn) = j|X(t1) = i1, ..., X(tn−1) = in−1) = P(X(tn) = j|X(tn−1) = in−1)

for all j, i1, ..., in−1 ∈ S and any sequence t1 < t2 < ... < tn of times.

Definition. The transition probability pij(s, t) is defined to be

pij(s, t) = P(X(t) = j|X(s) = i) for s ≤ t

The chain is called homogeneous if pij(s, t) = pij(0, t − s) for all i, j, s, t and we write

pij(t− s) for pij(s, t). We will assume that X is a homogeneous chain in this section.

Definition. The semigroup Pt is called standard if Pt → I as t ↓ 0.

Definition. Suppose g_{ij} = lim_{t↓0} ( P(Xt = j | X0 = i) − δ_{ij} ) / t exists (note that Pt being standard is a necessary condition). We call the matrix G = (g_{ij}) the generator of the chain X.

Definition. Let X = {X(t) : −∞ < t < ∞} be a Markov chain with stationary distribution π, and suppose that X(0) has distribution π. We call X reversible if X and Y have the same joint distributions, where Y(t) := X(−t).

Theorem.

• (i) If X(t) has distribution π for all t, then Y is a Markov chain with transition probabilities p′_{ij}(t) = (π_j/π_i) p_{ji}(t), where the p_{ij}(t) are the transition probabilities of X.

• (ii) If the transition semigroup Pt of X is standard with generator G, then π_i g_{ij} = π_j g_{ji} (for all i and j) is a necessary condition for the chain to be reversible.

• (iii) If Pt = exp(tG) and X(t) has distribution π for all t, then the condition in (ii) is also sufficient for the chain to be reversible.

Proof.

• (i) The proof is similar to that in the discrete-time case, and will be skipped.

• (ii) Suppose X is reversible. Let 0 < t1 < t2 be arbitrary times. Then

P(Y_{t1} = i, Y_{t2} = j) = P(X_{−t1} = i, X_{−t2} = j) (by definition of Y)
= P(X_{t2−t1} = i, X_0 = j) (by time homogeneity)
= π_j p_{ji}(t2 − t1).

On the other hand,

P(Y_{t1} = i, Y_{t2} = j) = P(X_{t1} = i, X_{t2} = j) (since X is reversible)
= P(X_0 = i, X_{t2−t1} = j) (by time homogeneity)
= π_i p_{ij}(t2 − t1).

Hence π_i p_{ij}(t) = π_j p_{ji}(t) for all t > 0. Subtracting δ_{ij} π_i = δ_{ji} π_j from both sides, dividing by t, and letting t ↓ 0 gives the desired result.

• (iii) Suppose π_i g_{ij} = π_j g_{ji} for all i, j. For any sequence k1, k2, ..., kn of states we have

π_i g_{i k1} g_{k1 k2} ⋯ g_{k_{n−1} k_n} g_{k_n j} = g_{k1 i} π_{k1} g_{k1 k2} ⋯ g_{k_{n−1} k_n} g_{k_n j}

= ⋯

= g_{k1 i} g_{k2 k1} ⋯ g_{k_n k_{n−1}} g_{j k_n} π_j.

Summing the above expression over all sequences k1, k2, ..., kn of length n, we get

π_i (G^{n+1})_{ij} = π_j (G^{n+1})_{ji}.

Since Pt = exp(tG) = ∑_{n=0}^∞ (tG)^n / n!, this implies that π_i p_{ij}(t) = π_j p_{ji}(t). The chain is then reversible by a calculation similar to part (ii).
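The sketch below (assuming NumPy and SciPy; the birth-death generator is a hypothetical example) illustrates part (iii) numerically: detailed balance for the generator propagates to π_i p_{ij}(t) = π_j p_{ji}(t) for the semigroup Pt = exp(tG).

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical birth-death generator on {0,1,2} (rows sum to 0); birth-death
# generators satisfy detailed balance with their stationary distribution.
G = np.array([[-2.0,  2.0,  0.0],
              [ 1.0, -3.0,  2.0],
              [ 0.0,  1.0, -1.0]])

# Stationary distribution: solve pi G = 0 together with sum(pi) = 1.
A = np.vstack([G.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]   # pi = (1/7, 2/7, 4/7)

bal_G = pi[:, None] * G
print("pi_i g_ij = pi_j g_ji:", np.allclose(bal_G, bal_G.T))

# Detailed balance for G propagates to P_t = exp(tG) for every t.
for t in (0.1, 1.0, 5.0):
    bal_t = pi[:, None] * expm(t * G)
    print(f"t={t}: pi_i p_ij(t) = pi_j p_ji(t):", np.allclose(bal_t, bal_t.T))
```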

References

[1] G. Grimmett and D. Stirzaker, Probability and Random Processes, pp. 237-239.


A Brief Study of Markov Chains from the Transition Probability Point of View

Joel Nibert

April 26, 2010

Let {Xt : t0 ≤ t ≤ T} denote a Markov process taking values in R^d. The process has a transition probability function which, in the general non-homogeneous case, is a function of four arguments: s ≤ t are times in the interval [t0, T], x ∈ R^d, and B ∈ B^d, the Borel sets. We write P(s, x, t, B) for the transition probability function, which is essentially the conditional probability P(Xt ∈ B | Xs = x).

The transition probability function must satisfy several properties. When the first three arguments are fixed, P(s, x, t, ·) must be a probability measure on B^d. When only the x-argument is allowed to vary, the function P(s, ·, t, B) must be B^d-measurable. Finally, the transition probability function must satisfy the famous Kolmogorov-Chapman Equation (usually known in western literature as the Chapman-Kolmogorov Equation). It states:

P(s, x, t, B) = ∫_{R^d} P(u, y, t, B) P(s, x, u, dy),  s ≤ u ≤ t. (1)

The intuition behind the Kolmogorov-Chapman Equation derives from a consideration of times u intermediate between s and t. At any such time u, the Markov property guarantees that the future trajectory of the process is independent of the past trajectory; all that matters is the current position y. Thus the transition probabilities of the initial and final legs of the journey are multiplied, and those products are summed over all intermediate positions.

Following Arnold [1], we accept the K-C equation as a necessary condition on the transition probability function. Alternatively, one can proceed by establishing the filtration and a more standard definition of a Markov process. We denote by Fs the sigma-algebra generated by the history of the process


up to time s. As time increases, this gives an increasing sequence of sigma-algebras, the filtration generated by the random process. In this language, we recall the familiar definition of a Markov process as one satisfying

E(f(Xt) | Fs) = E(f(Xt) | Xs) (2)

for all times s < t and all bounded measurable f. Then one can prove the K-C Equation using (2) and properties of measurability and repeated conditioning.

We will continue to follow Arnold, and the point of view that the transition probability is the fundamental object of study. We say that a Markov process Xt, as above, is a diffusion process if ∀s < t, ∀x ∈ R^d, and ∀ε > 0, the following three statements hold true:

lim_{t→s} 1/(t − s) ∫_{|y−x|>ε} P(s, x, t, dy) = 0. (3)

There exists an R^d-valued function f(s, x) such that

lim_{t→s} 1/(t − s) ∫_{|y−x|≤ε} (y − x) P(s, x, t, dy) = f(s, x). (4)

There exists a d × d matrix-valued function B(s, x) such that

lim_{t→s} 1/(t − s) ∫_{|y−x|≤ε} (y − x)(y − x)′ P(s, x, t, dy) = B(s, x). (5)

Equation (3) means that large changes in Xt over a short period of time are improbable, ruling out jump processes. Notice that the equations above use truncated moments because, a priori, there is no integrability. If, however, integrability is known or assumed, then equation (4) gives f(s, x) as the drift vector and equation (5) gives B(s, x) as the diffusion matrix. In general, moments do not fully characterize a distribution, but surprisingly, the first and second truncated moments of P(s, x, t, B) are enough to define the diffusion. Finally, the equations defining the diffusion process greatly facilitate the derivations of the Kolmogorov forward and backward equations. Please see Gikhman and Skorokhod [2], and/or other sections of this compendium for more.
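To see (4) and (5) in action, here is a small numerical sketch (assuming NumPy; the constants b = 0.7 and σ = 1.3 are hypothetical) that recovers the drift and diffusion coefficients of the 1-d process dX = b dt + σ dW from the truncated moments of its transition kernel over a small step h.

```python
import numpy as np

rng = np.random.default_rng(2)

# For dX = b dt + sigma dW with hypothetical constants b, sigma, the
# truncated moments in (4) and (5) should recover b and sigma^2.
b, sigma = 0.7, 1.3
h, eps, n = 1e-3, 0.5, 2_000_000

# Samples of y - x over one step of length h, started at x.
incr = b * h + sigma * np.sqrt(h) * rng.standard_normal(n)
mask = np.abs(incr) <= eps  # the truncation |y - x| <= eps

drift_est = incr[mask].sum() / (n * h)        # approximates f(s, x) = b
diff_est = (incr[mask] ** 2).sum() / (n * h)  # approximates B(s, x) = sigma^2
print(f"drift ≈ {drift_est:.3f} (true {b}); "
      f"diffusion ≈ {diff_est:.3f} (true {sigma**2:.2f})")
```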

References

[1] Arnold, Ludwig. Stochastic Differential Equations: Theory and Applications.


[2] Gikhman and Skorokhod. Introduction to the Theory of Random Processes.


1 Definitions of Ergodicity

1.1 Discrete-time

For a discrete-time Markov chain, a state is called ergodic if it is persistent (recurrent), non-null (the expected return time is finite, a.k.a. positive recurrent), and aperiodic. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.

An ergodic Markov chain is also irreducible. A finite-state irreducible Markov chain is ergodic if it has an aperiodic state.

1.2 Continuous-time

The definition of ergodicity for discrete-time Markov chains is based on aperiodicity, which has no counterpart in the continuous case. Ergodicity for continuous Markov processes is therefore defined differently. Before we move on to that, the definition of a stationary distribution needs to be introduced.

For a homogeneous Markov process ξ(t), let p_{i,j}(t) = P(ξ(t) = j | ξ(0) = i). If there exist p_j such that

p_j = lim_{t→∞} p_{i,j}(t),   ∑_j p_j = 1,   (1)

then the distribution (p_j) is called a stationary distribution. For continuous Markov processes, the existence of a stationary distribution is equivalent to ergodicity.

2 Ergodic Theorem

Ergodic theorems concern the limiting behavior of averages over time. The ergodic theorem for Markov chains is a generalization of the Strong Law of Large Numbers (SLLN) for independent random variables.

Theorem 2.1. If f(·) is a bounded function on the state space of ξ(t), then in the discrete-time case

P( lim_{n→∞} (1/n) ∑_{t=0}^n f(ξ(t)) = ∑_j p_j f(j) ) = 1, (2)

and in the continuous-time case

P( lim_{T→∞} (1/T) ∫_0^T f(ξ(t)) dt = ∑_j p_j f(j) ) = 1. (3)


By choosing f(·) to be the identity function, the ergodic theorem above takes the form of the SLLN.
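A quick simulation illustrates the discrete-time statement. The sketch below (assuming NumPy; the 2-state matrix and the function f are hypothetical) compares the time average (1/n) ∑ f(ξ(t)) with the stationary average ∑_j p_j f(j).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 2-state chain; its stationary distribution is pi = (0.8, 0.2).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])   # solves pi P = pi
f = np.array([1.0, 5.0])    # a bounded function on the state space

n, x, total = 100_000, 0, 0.0
for _ in range(n):
    total += f[x]
    x = rng.choice(2, p=P[x])

print(f"time average {total / n:.4f}  vs  sum_j p_j f(j) = {pi @ f:.4f}")
```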

References

[1] James R. Norris, Markov Chains, Cambridge University Press, 1998.

[2] Geoffrey Grimmett and David Stirzaker, Probability and Random Processes, Oxford University Press, 2001.


The Kolmogorov equations

Jerome Grand’Maison - Math 705 - Presentations

May 9, 2010

Abstract

My goal here is to give a sketch to show how both equations come up.

1 Setting and initial constructions

dX^i_t = b_i(t, X_t) dt + ∑_{k=1}^m σ_{ik}(t, X_t) dW^k_t

I will assume linear growth and Lipschitz coefficients (to guarantee a unique strong solution).

Let

D(t, x) = ∑_{i=1}^n b_i(t, x) ∂/∂x_i + (1/2) ∑_{i,j=1}^n ∑_{k=1}^m σ_{ik}(t, x) σ_{jk}(t, x) ∂²/∂x_i∂x_j

• Take an arbitrary function f (at least in the domain of the generator of the diffusion process Xt, and with compact support).

• Let

A(y, t, T) = E[f(X_T) | F_t]|_{X_t=y} (1)

• In the present note I will assume that there exists a smooth density p(x, T | y, t) such that for all g,

E[g(X_T) | F_t]|_{X_t=y} = ∫ g(x) p(x, T | y, t) dx_1 ⋯ dx_n.

I believe that Nualart has criteria for when this is guaranteed, for example when an operator closely related to D is hypo-elliptic, which then guarantees the existence and smoothness of p. I believe that the minimal assumptions under which this occurs form a whole theory in its own right, so I won't attempt to do it justice here. Nualart's book Malliavin Calculus and Related Topics (2nd Ed., 2006) is a reference for this.


2 Forward Kolmogorov equation

For the forward equation we will want to differentiate p in the forward time variable T:

1. By Ito's lemma (taking expectations, so the martingale term vanishes),

E[f(X_T) | F_t] = f(X_t) + E[ ∫_t^T (Df)(s, X_s) ds | F_t ].

2. Differentiate in T:

∂A/∂T = E[(Df)(T, X_T) | F_t].

3. Use integration by parts on the previous equation:

∫ f(x) ∂p/∂T dx = ∫ (Df)(T, x) p dx = ∫ f(x) D*p dx,

where

D*(T, x) p(x, T | y, t) = −∑_{i=1}^n ∂/∂x_i ( b_i(T, x) p(x, T | y, t) ) + (1/2) ∑_{i,j=1}^n ∑_{k=1}^m ∂²/∂x_i∂x_j ( σ_{ik}(T, x) σ_{jk}(T, x) p(x, T | y, t) ).

4. Conclude that

∂p(x, T | y, t)/∂T = D*(T, x) p(x, T | y, t).

5. From

lim_{T→t} A(y, t, T) = f(y)

we get

lim_{T→t} p(x, T | y, t) = δ(x − y).

3 Backward Kolmogorov equation

Instead we could differentiate p in the backward time variable t:

1. Differentiate (1) in t, using the Feynman-Kac formula or Kolmogorov's work (here again I'm hiding work and assumptions under the rug), to get

∂A/∂t = −D(t, y) A(y, t, T). (2)

2. Plug A = ∫ f(x) p(x, T | y, t) dx into (2).

3. Conclude that

−∂p(x, T | y, t)/∂t = D(t, y) p(x, T | y, t).

4. From

lim_{t→T} A(y, t, T) = f(y)

we get

lim_{t→T} p(x, T | y, t) = δ(x − y).
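As a quick sanity check of the forward equation, the sketch below (assuming NumPy) uses standard 1-d Brownian motion (b = 0, σ = 1), for which p(x, T | y, t) is the Gaussian heat kernel and D*p = (1/2) ∂²p/∂x², and verifies the PDE by finite differences.

```python
import numpy as np

# Heat kernel p(x, T | 0, 0) of standard 1-d Brownian motion.
def p(x, T):
    return np.exp(-x ** 2 / (2 * T)) / np.sqrt(2 * np.pi * T)

x = np.linspace(-3.0, 3.0, 601)
dx = x[1] - x[0]
T, h = 1.0, 1e-4

dp_dT = (p(x, T + h) - p(x, T - h)) / (2 * h)        # d/dT by central difference
d2p_dx2 = np.gradient(np.gradient(p(x, T), dx), dx)  # d^2/dx^2 numerically
err = np.max(np.abs(dp_dT - 0.5 * d2p_dx2))
print(f"max |dp/dT - (1/2) d2p/dx2| ≈ {err:.2e}")    # small, as the PDE predicts
```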


1. Recurrence

Let Zn, n = 0, 1, ..., be a discrete time Markov chain with values in a countable state space S. For some j ∈ S denote

τj = inf{n ≥ 1 : Zn = j},  Pj(·) = P(·|Z0 = j)  and  Ej(·) = E(·|Z0 = j).

Definition: A state j ∈ S is recurrent if

Pj(τj <∞) = Pj(Zn = j for some n ≥ 1) = 1.

Zn is recurrent if this is true for all j ∈ S.

Example (Durrett, p. 186): A simple random walk is recurrent in d ≤ 2 and transient (not recurrent) in d ≥ 3.

Definition: A state j ∈ S is positive recurrent if

Ej(τj) <∞.

If for some j ∈ S we have Pj(τj = ∞) > 0, then Ej(τj) = ∞; so positive recurrence implies recurrence. The converse is not true in general: a recurrent state need not be positive recurrent, a case called null recurrence.

Definition: A state j ∈ S is null recurrent if it is recurrent but notpositive recurrent, i.e.

Pj(τj <∞) = 1 and Ej(τj) =∞.

Example (Durrett, p. 181): A symmetric simple random walk on Z is null recurrent at every state j.
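The following sketch (assuming NumPy; runs capped at a hypothetical n_max) illustrates null recurrence for the symmetric walk: the walk returns to 0, but the tail P(τ > n) decays like √(2/(πn)), too slowly for E(τ) to be finite.

```python
import numpy as np

rng = np.random.default_rng(4)

def return_time(rng, n_max=10_000):
    """First return time to 0 of a +/-1 symmetric walk, capped at n_max."""
    steps = rng.integers(0, 2, size=n_max) * 2 - 1
    hits = np.nonzero(np.cumsum(steps) == 0)[0]
    return hits[0] + 1 if hits.size else n_max

taus = np.array([return_time(rng) for _ in range(20_000)])
for n in (10, 100, 1000):
    print(f"P(tau > {n}): empirical {np.mean(taus > n):.3f}, "
          f"asymptotic sqrt(2/(pi n)) = {np.sqrt(2 / (np.pi * n)):.3f}")
```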

Let Xt, t ≥ 0, be a continuous time Markov process with values in a closed set D ⊂ R^d. For d = 1, recurrence can be defined analogously to the above definitions using the stopping time τj = inf{t > 0 : Xt = j}. In other cases we cannot expect that Xt hits fixed points, i.e. we need to consider "larger" sets when talking of recurrence. A set can be "large" in a topological sense or a measure theoretical sense, which leads to the following two definitions of recurrence.


Definition: Xt is recurrent to open neighborhoods if for every non-empty open U ⊂ D,

Px(Xt ∈ U for some t ≥ 0) = 1 for all x ∈ D.

Definition: Xt is Harris recurrent if there exists a σ-finite measure φ on D such that φ(A) > 0 implies

Px(Xt ∈ A for some t ≥ 0) = 1 for all x ∈ D.

See Baxendale, lecture notes for class 605 Topics in Probability, Spring 2010. Harris recurrence is the right kind of recurrence to look at for Markov processes on R^d; for example, we get the following theorem:

Theorem: If Xt is Harris recurrent, then there exists a stationary σ-finite measure µ, unique up to constant multiples. Moreover, the recurrence measure φ is absolutely continuous with respect to the invariant measure µ.


Consider discrete time homogeneous Markov chains with finite or countable state space S and transition probability matrix P. A stationary distribution for this chain is a vector π that satisfies πi ≥ 0, ∑_{i∈S} πi = 1, and πj = ∑_{i∈S} πi pij. This can be written in matrix-vector form as π^T P = π^T.

As an example, let S = {0, 1} and

P = ( 1−p   p
       q   1−q ).

Then you can solve a system of equations to get π0 = q/(p+q) and π1 = p/(p+q).
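Here is a minimal numerical version of this example (assuming NumPy; the values p = 0.3, q = 0.1 are hypothetical), solving π^T P = π^T together with the normalization ∑_i πi = 1.

```python
import numpy as np

p, q = 0.3, 0.1  # hypothetical transition probabilities
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Solve pi^T (P - I) = 0 together with sum(pi) = 1 as a least-squares system.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print("numerical:  ", pi)                          # [0.25 0.75]
print("closed form:", [q / (p + q), p / (p + q)])  # [0.25, 0.75]
```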

The two main results are as follows.

Theorem 0.1 Consider a recurrent irreducible aperiodic Markov chain. Let P_{ii}^{(n)} be the probability of entering state i at the nth transition. Let f_{ii}^{(n)} be the probability of first returning to state i at the nth transition. Then

lim_{n→∞} P_{ii}^{(n)} = 1 / ∑_{n=0}^∞ n f_{ii}^{(n)} = 1/m_i,

where m_i is the expected amount of time until the first return to state i.

Theorem 0.2 In a positive recurrent aperiodic class with states j = 0, 1, 2, ...,

lim_{n→∞} P_{jj}^{(n)} = πj = ∑_{i=0}^∞ πi Pij,   ∑_{i=0}^∞ πi = 1,

and the vector π is uniquely determined by the three equations defining a stationary distribution.

If in addition S is finite and the chain is irreducible (there is only one communicating class), then

lim_{t→∞} P^t = ( π1 π2 ... πn
                  π1 π2 ... πn
                  ⋮
                  π1 π2 ... πn ),

the matrix each of whose rows is π.
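This is easy to watch numerically. Continuing the hypothetical 2-state example above (assuming NumPy), the rows of P^t converge to π = (0.25, 0.75):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.1, 0.9]])  # the p = 0.3, q = 0.1 example again

for t in (1, 5, 20, 100):
    print(f"P^{t} =\n{np.linalg.matrix_power(P, t)}")
# Both rows approach (0.25, 0.75) as t grows.
```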

References

[1] Howard M. Taylor and Samuel Karlin, An Introduction to Stochastic Modeling, Academic Press, 1998.


Reducible/Irreducible

(1) Discrete Markov Chain

Definition 0.1. A state y is called accessible from a state x (written x → y) if there exists an integer n ≥ 0 such that

Pr(Xn = y | X0 = x) = p_{xy}^{(n)} > 0.

Every state is defined to be accessible from itself.

Definition 0.2. A state x is said to communicate with a state y (written x ↔ y) if both x → y and y → x. A set of states C is a communicating class if every pair of states in C communicates with each other, and no state in C communicates with any state not in C.

Communication defines an equivalence relation, and thus the communicating classes are the equivalence classes of this relation.

Definition 0.3. A Markov chain is called irreducible if its state space is a single communicating class.

Definition 0.4. A communicating class is closed if the probability ofleaving the class is zero. That is, if state x is in C but y is not, then y isnot accessible from x.

Definition 0.5. Let T_y^0 = 0, and for k ≥ 1 let

T_y^k = inf{n > T_y^{k−1} : Xn = y}.

T_y^k is the time of the kth return to y. We let Ty = T_y^1 and ρxy = Px(Ty < ∞). A state y is said to be recurrent if ρyy = 1 and transient if ρyy < 1.

Theorem 0.6 (Decomposition Theorem). Let R = {x : ρxx = 1} be the set of recurrent states of a Markov chain. Then R can be written as ∪_i R_i, where each R_i is closed and irreducible.

Example: Consider a Markov chain with states {1, 2, 3, 4, 5, 6, 7} whose possible transitions are 1 → 5, 2 → 1, 2 → 3, 2 → 4, 3 → 4, 4 → 6, 4 → 7, 5 → 1, 6 → 4, 6 → 7, 7 → 4. Then 1, 4, 5, 6, 7 are all the recurrent states, and {1, 5} and {4, 6, 7} are the irreducible closed sets.
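For a finite chain, the recurrent states and the closed irreducible classes can be found by brute-force reachability, as in this plain-Python sketch (the edge list encodes the example above):

```python
# Transition graph of the example chain (x -> list of possible next states).
edges = {1: [5], 2: [1, 3, 4], 3: [4], 4: [6, 7], 5: [1], 6: [4, 7], 7: [4]}

def reachable(x):
    """All states reachable from x (including x itself)."""
    seen, stack = {x}, [x]
    while stack:
        for y in edges[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

reach = {x: reachable(x) for x in edges}
# In a finite chain, x is recurrent iff every state reachable from x
# can reach x back; then reach[x] is x's closed irreducible class.
recurrent = [x for x in edges if all(x in reach[y] for y in reach[x])]
classes = {frozenset(reach[x]) for x in recurrent}

print("recurrent states:", sorted(recurrent))        # [1, 4, 5, 6, 7]
print("closed irreducible classes:",
      sorted(sorted(c) for c in classes))            # [[1, 5], [4, 6, 7]]
```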

(2) Continuous Markov Chain

Definition 0.7. The transition probability pij(s, t) is defined to be

pij(s, t) = Pr(X(t) = j | X(s) = i) for s ≤ t.

The chain is called homogeneous if pij(s, t) = pij(0, t − s) for all i, j, s, t, and we write pij(t − s) for pij(s, t). The family of matrices Pt with entries pij(t), for t > 0, is called the transition semigroup of the chain.

1


We are interested in the behaviour of pij(h) for small h. It turns out that pij(h) is approximately linear in h when h is small: that is, there exist constants gij such that pij(h) ≈ gij h if i ≠ j, and pii(h) ≈ 1 + gii h.

Definition 0.8. The matrix G = (gij) is called the generator of the chain.

Definition 0.9. The chain is called irreducible if for any pair x, y of states we have pxy(t) > 0 for some t.

Let X be a continuous-time Markov chain with generator G = (gij), and suppose that the transition semigroup Pt satisfies Pt = exp(tG). Then X is irreducible if and only if, for any pair i, j of states, there exists a sequence k1, k2, ..., kn of states such that g_{i,k1} g_{k1,k2} ⋯ g_{kn,j} ≠ 0.
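The sketch below (assuming NumPy and SciPy; the 3-state generator is a hypothetical example whose jumps form a cycle) checks irreducibility both ways: via positivity of pij(t) = exp(tG)_{ij}, and via the existence of a nonzero generator path for every pair of states.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state generator whose jumps form the cycle 0 -> 1 -> 2 -> 0.
G = np.array([[-1.0,  1.0,  0.0],
              [ 0.0, -2.0,  2.0],
              [ 3.0,  0.0, -3.0]])

# Criterion 1: irreducibility via positivity of p_ij(t) = exp(tG)_ij.
print("p_ij(1) > 0 for all i, j:", np.all(expm(G) > 0))

# Criterion 2: a nonzero generator path g_{i,k1}...g_{kn,j} exists for every
# pair i, j. Flag allowed jumps and take matrix powers of the 0/1 matrix.
A = (np.abs(G) > 0).astype(float)
np.fill_diagonal(A, 1.0)  # allow 'staying put' so shorter paths survive
reach = np.linalg.matrix_power(A, len(G) - 1)
print("every state reaches every state:", np.all(reach > 0))
```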