
Markov Chains

Markov Chains (1)

A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by a transition probability.

Suppose the random variables $X_0, X_1, \ldots$ take values in a countable state space $\Omega$. A Markov chain is a process that corresponds to the network

$$X_0 \rightarrow X_1 \rightarrow \cdots \rightarrow X_{t-1} \rightarrow X_t \rightarrow X_{t+1} \rightarrow \cdots$$

Markov Chains (2)

The current state of a Markov chain depends only on the most recent previous state:

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i)$$

where $i_0, \ldots, i_{t-1}, i, j \in \Omega$. The quantity on the right is the transition probability.

http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tutorial/Lect1_MCMC_Intro.pdf

An Example of Markov Chains

$\Omega = \{1, 2, 3, 4, 5\}$ and $X_0, X_1, \ldots, X_t, \ldots$, where $X_0$ is the initial state and so on. $P$ is the transition matrix, with rows and columns indexed by the states $1, \ldots, 5$:

$$P = \begin{pmatrix}
0.4 & 0.6 & 0.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.5 & 0.0 & 0.0 \\
0.0 & 0.3 & 0.0 & 0.7 & 0.0 \\
0.0 & 0.0 & 0.1 & 0.3 & 0.6 \\
0.0 & 0.3 & 0.0 & 0.5 & 0.2
\end{pmatrix}$$
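A minimal simulation sketch of this chain (assuming Python with NumPy; states are 0-indexed in code, so slide state 1 is index 0):

```python
import numpy as np

# Transition matrix of the 5-state example; row i is the distribution
# of the next state given the current state is i.
P = np.array([
    [0.4, 0.6, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.1, 0.3, 0.6],
    [0.0, 0.3, 0.0, 0.5, 0.2],
])

rng = np.random.default_rng(0)

def simulate(P, x0, n_steps):
    """Sample a trajectory X_0, X_1, ..., X_n: the next state is drawn
    using only the current state (the Markov property)."""
    path = [x0]
    for _ in range(n_steps):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

print(simulate(P, x0=0, n_steps=10))
```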

Definition (1)

Define the probability of going from state $i$ to state $j$ in $n$ time steps as

$$p_{ij}^{(n)} = P(X_{t+n} = j \mid X_t = i).$$

A state $j$ is accessible from state $i$ if there is some $n$ such that $p_{ij}^{(n)} > 0$, where $n = 0, 1, \ldots$

A state $i$ is said to communicate with state $j$ (denoted $i \leftrightarrow j$) if it is true that both $i$ is accessible from $j$ and $j$ is accessible from $i$.
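These definitions translate directly into matrix arithmetic, since the $(i, j)$ entry of $P^n$ is $p_{ij}^{(n)}$. A sketch (reusing the NumPy matrix `P` from the previous block; for a finite chain it suffices to check $n$ up to the number of states):

```python
import numpy as np

def n_step(P, n):
    """p_ij^(n) for all pairs: the (i, j) entry of the n-th matrix power of P."""
    return np.linalg.matrix_power(P, n)

def accessible(P, i, j):
    """j is accessible from i if p_ij^(n) > 0 for some n = 0, 1, ..."""
    return any(n_step(P, n)[i, j] > 0 for n in range(len(P) + 1))

def communicate(P, i, j):
    """i <-> j: each state is accessible from the other."""
    return accessible(P, i, j) and accessible(P, j, i)
```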

Definition (2)

A state $i$ has period $d(i)$ if any return to state $i$ must occur in multiples of $d(i)$ time steps. Formally, the period of a state is defined as

$$d(i) = \gcd\{\, n : p_{ii}^{(n)} > 0 \,\}.$$

If $d(i) = 1$, then the state is said to be aperiodic; otherwise ($d(i) > 1$), the state is said to be periodic with period $d(i)$.
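The period can be computed the same way, as a gcd over the step counts $n$ with $p_{ii}^{(n)} > 0$. A sketch (the finite scan limit `max_n` is a practical cutoff of my choosing, not part of the definition):

```python
from math import gcd
import numpy as np

def period(P, i, max_n=50):
    """d(i) = gcd{ n : p_ii^(n) > 0 }, scanning n = 1 .. max_n."""
    d = 0
    Pn = np.eye(len(P))
    for n in range(1, max_n + 1):
        Pn = Pn @ P                # Pn is now P^n
        if Pn[i, i] > 1e-12:
            d = gcd(d, n)          # gcd(0, n) == n, so the first hit sets d
    return d                       # d == 1 means state i is aperiodic
```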

Definition (3)

A set of states $C$ is a communicating class if every pair of states in $C$ communicates with each other.

Every state in a communicating class must have the same period.

Example:

Definition (4)

A finite Markov chain is said to be irreducible if its state space $\Omega$ is a communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.

Example:

Definition (5)

A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.

Example:

Definition (6)

A state $i$ is said to be transient if, given that we start in state $i$, there is a non-zero probability that we will never return to $i$.

Formally, let the random variable $T_i$ be the next return time to state $i$ (the "hitting time"):

$$T_i = \min\{\, n \geq 1 : X_n = i \mid X_0 = i \,\}.$$

Then, state $i$ is transient iff the return time has a non-zero chance of being infinite:

$$P(T_i < \infty) < 1.$$

Definition (7)

A state $i$ is said to be recurrent or persistent iff the return time is finite with probability one: $P(T_i < \infty) = 1$.

The mean recurrence time is $M_i = E[T_i]$. State $i$ is positive recurrent if $M_i$ is finite; otherwise, state $i$ is null recurrent.

A state $i$ is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.
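Return times are easy to probe by simulation. A sketch (an illustration of mine, not from the slides) estimating $M_i = E[T_i]$ for state 1 of the 5-state chain above by averaging many simulated excursions; for a positive recurrent state of an irreducible chain, $M_i = 1/\pi_i$:

```python
import numpy as np

# 5-state example chain from earlier (irreducible, so returns are certain).
P = np.array([
    [0.4, 0.6, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.1, 0.3, 0.6],
    [0.0, 0.3, 0.0, 0.5, 0.2],
])
rng = np.random.default_rng(0)

def return_time(P, i):
    """Simulate from X_0 = i until the chain first returns to i; report T_i."""
    x, n = i, 0
    while True:
        x = rng.choice(len(P), p=P[x])
        n += 1
        if x == i:
            return n

times = [return_time(P, i=0) for _ in range(20_000)]
print(np.mean(times))   # Monte Carlo estimate of M_1 = E[T_1]
```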

Stationary Distributions

Theorem: If a Markov chain is irreducible and aperiodic, then

$$p_{ij}^{(n)} \rightarrow \pi_j \quad \text{as } n \rightarrow \infty, \text{ for all } i, j.$$

Theorem: If a Markov chain is irreducible and aperiodic, then $\pi_j = \lim_{n \rightarrow \infty} P(X_n = j)$ exists and is unique, and

$$\pi_j = \sum_i \pi_i P_{ij}, \qquad \sum_j \pi_j = 1,$$

where $\pi$ is the stationary distribution.
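A quick numerical check of the first theorem, as a sketch (assuming NumPy and the 5-state matrix from the earlier example): every row of $P^n$ approaches the same vector $\pi$.

```python
import numpy as np

P = np.array([
    [0.4, 0.6, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.1, 0.3, 0.6],
    [0.0, 0.3, 0.0, 0.5, 0.2],
])

Pn = np.linalg.matrix_power(P, 500)   # p_ij^(500) for all i, j
pi = Pn[0]                            # each row ~ the stationary distribution
print(pi)
print(np.allclose(pi @ P, pi))        # True: pi P = pi
print(np.isclose(pi.sum(), 1.0))      # True: pi sums to 1
```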

Definition (8)

A Markov chain is said to be reversible if there is a stationary distribution $\pi$ such that

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j.$$

Theorem: if a Markov chain is reversible, then

$$\pi_j = \sum_i \pi_i P_{ij}.$$

An Example of Stationary Distributions

A Markov chain on the states $\{1, 2, 3\}$, with self-loop probabilities 0.7, 0.4, 0.7 and probability 0.3 of moving between adjacent states ($1 \leftrightarrow 2$ and $2 \leftrightarrow 3$):

$$P = \begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}$$

The stationary distribution is $\pi = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$, since

$$\left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)
\begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}
= \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right).$$
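The same check in code, as a sketch; it also confirms that this chain satisfies the detailed-balance condition of Definition (8), so it is reversible:

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])
pi = np.array([1/3, 1/3, 1/3])

print(np.allclose(pi @ P, pi))   # True: pi is stationary
D = pi[:, None] * P              # D[i, j] = pi_i * P_ij
print(np.allclose(D, D.T))       # True: pi_i P_ij = pi_j P_ji for all i, j
```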

Properties of Stationary Distributions

Regardless of the starting point, an irreducible and aperiodic Markov chain will converge to its stationary distribution.

The rate of convergence depends on properties of the transition probability.

Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution.

The state of the chain after a large number of steps is then used as a sample from the desired distribution.

http://en.wikipedia.org/wiki/MCMC

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm (1)

The Metropolis-Hastings algorithm can draw samples from any probability distribution $\pi(x)$, requiring only that a function proportional to the density can be calculated at $x$.

Process in three steps: set up a Markov chain; run the chain until stationary; estimate with Monte Carlo methods.

http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm

Metropolis-Hastings Algorithm (2)

Let $\pi = (\pi_1, \ldots, \pi_n)$ be a probability density (or mass) function (pdf or pmf). $f$ is any function, and we want to estimate

$$I = E_\pi[f] = \sum_{i=1}^{n} f(i)\,\pi_i.$$

Construct the transition matrix $P = (P_{ij})$ of an irreducible Markov chain with states $1, 2, \ldots, n$, where

$$P_{ij} = \Pr(X_{t+1} = j \mid X_t = i), \qquad X_t \in \{1, 2, \ldots, n\},$$

and $\Pi$ is its unique stationary distribution.

Metropolis-Hastings Algorithm (3)

Run this Markov chain for $t = 1, \ldots, N$ and calculate the Monte Carlo sum

$$\hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t);$$

then $\hat{I} \rightarrow I$ as $N \rightarrow \infty$.

Sheldon M. Ross (1997). Proposition 4.3, Introduction to Probability Models. 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
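A sketch of this estimator on the 3-state example chain from earlier (the choice $f(i) = i$ is an illustrative assumption of mine, so under $\pi = (1/3, 1/3, 1/3)$ the true value is $I = 2$):

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])
f = lambda i: i + 1             # f(i) = i with 1-based states (code is 0-based)

rng = np.random.default_rng(0)
x, total, N = 0, 0.0, 100_000
for t in range(N):
    x = rng.choice(3, p=P[x])   # advance the chain one step
    total += f(x)
print(total / N)                # I_hat; approaches E_pi[f] = 2 as N grows
```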

Metropolis-Hastings Algorithm (4)

In order to perform this method for a given distribution $\Pi$, we must construct a Markov chain transition matrix $P$ with $\Pi$ as its stationary distribution, i.e. $\Pi P = \Pi$.

Consider a matrix $P$ made to satisfy the reversibility condition

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i \text{ and } j.$$

This property ensures that $\sum_i \pi_i P_{ij} = \pi_j$ for all $j$, and hence $\Pi$ is a stationary distribution for $P$.

Metropolis-Hastings Algorithm (5)

Let a proposal $Q = (Q_{ij})$ be irreducible, where $Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$, and the range of $Q$ is equal to the range of $\Pi$.

But $\Pi$ does not have to be the stationary distribution of $Q$.

Process: tweak $Q_{ij}$ to yield $\Pi$:

states from $Q_{ij}$ (not $\pi$) $\longrightarrow$ tweak $\longrightarrow$ states from $P_{ij}$ ($\pi$)

Metropolis-Hastings Algorithm (6)

We assume that $P_{ij}$ has the form

$$P_{ij} = Q_{ij}\,\alpha(i, j) \quad (i \neq j), \qquad P_{ii} = 1 - \sum_{j \neq i} P_{ij},$$

where $\alpha(i, j)$ is called the acceptance probability; i.e., given $X_t = i$, take

$$X_{t+1} = \begin{cases} j & \text{with probability } \alpha(i, j), \\ i & \text{with probability } 1 - \alpha(i, j). \end{cases}$$

Metropolis-Hastings Algorithm (7)

For $\pi_i P_{ij} = \pi_j P_{ji}$ we need

$$\pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i). \quad (*)$$

WLOG, suppose for some $(i, j)$ that $\pi_i Q_{ij} > \pi_j Q_{ji}$. In order to achieve equality in (*), one can introduce a probability $\alpha(i, j) < 1$ on the left-hand side and set $\alpha(j, i) = 1$ on the right-hand side.

Metropolis-Hastings Algorithm (8)

Then

$$\pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i) = \pi_j Q_{ji}
\;\Longrightarrow\;
\alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}.$$

These arguments imply that the acceptance probability must be

$$\alpha(i, j) = \min\left(1,\; \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right).$$

Metropolis-Hastings Algorithm (9)

M-H Algorithm:
Step 1: Choose an irreducible Markov chain transition matrix $Q$ with transition probabilities $Q_{ij}$.
Step 2: Let $t = 0$ and initialize $X_0$ from the states of $Q$.
Step 3 (Proposal Step): Given $X_t = i$, sample $Y = j$ from $Q_{i\cdot}$, the $i$-th row of $Q$.

Metropolis-Hastings Algorithm (10)

M-H Algorithm (cont.):
Step 4 (Acceptance Step): Generate a random number $U$ from $\mathrm{Unif}(0, 1)$. If $U \leq \alpha(i, j)$, set $X_{t+1} = Y = j$; else set $X_{t+1} = X_t = i$.
Step 5: $t = t + 1$; repeat Steps 3-5 until convergence.
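Putting Steps 1-5 together, a minimal sketch of the discrete M-H sampler (assuming Python with NumPy; the target `pi` and uniform proposal `Q` below are illustrative choices of mine, not from the slides):

```python
import numpy as np

def metropolis_hastings(pi, Q, n_samples, x0=0, seed=0):
    """Discrete Metropolis-Hastings sampler following Steps 1-5 above.

    pi : unnormalized target probabilities over the states 0..n-1
    Q  : proposal matrix, Q[i, j] = Pr(propose j | current state i)
    """
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        j = rng.choice(len(Q), p=Q[x])                           # Step 3: propose
        alpha = min(1.0, (pi[j] * Q[j, x]) / (pi[x] * Q[x, j]))  # acceptance prob.
        if rng.uniform() <= alpha:                               # Step 4: accept
            x = j
        # else: reject, so X_{t+1} = X_t (x unchanged)
        samples.append(x)
    return np.array(samples)

# Illustration: target pi = (0.2, 0.3, 0.5) with a uniform proposal.
pi = np.array([0.2, 0.3, 0.5])
Q = np.full((3, 3), 1/3)
samples = metropolis_hastings(pi, Q, n_samples=100_000)
print(np.bincount(samples) / len(samples))   # empirical freqs ~ (0.2, 0.3, 0.5)
```

Note that `pi` enters only through the ratio $\pi_j / \pi_i$, so it need only be known up to a normalizing constant; in practice an initial burn-in segment of the samples is discarded before estimating anything.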