Part 6 Markov Chains
Markov Chains (1)
A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by transition probabilities.
Suppose the random variables $X_0, X_1, \ldots$ take values in a state space $\Omega$ that is a countable set. A Markov chain is a process that corresponds to the network
$$X_0 \to X_1 \to \cdots \to X_{t-1} \to X_t \to X_{t+1} \to \cdots$$
Markov Chains (2)
The current state of a Markov chain depends only on the most recent previous state. The transition probability satisfies
$$P(X_{t+1}=j \mid X_t=i, X_{t-1}=i_{t-1}, \ldots, X_0=i_0) = P(X_{t+1}=j \mid X_t=i),$$
where $i_0, \ldots, i_{t-1}, i, j \in \Omega$.
http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tutorial/Lect1_MCMC_Intro.pdf
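An Example of Markov Chains
Consider $X_0, X_1, \ldots, X_t, \ldots$ with state space $\Omega = \{1, 2, 3, 4, 5\}$, where $X_0$ is the initial state and so on. $P$ is the transition matrix, with row $i$ the current state and column $j$ the next state:
$$P = \begin{pmatrix} 0.4 & 0.6 & 0.0 & 0.0 & 0.0 \\ 0.5 & 0.0 & 0.5 & 0.0 & 0.0 \\ 0.0 & 0.3 & 0.0 & 0.7 & 0.0 \\ 0.0 & 0.0 & 0.1 & 0.3 & 0.6 \\ 0.0 & 0.3 & 0.0 & 0.5 & 0.2 \end{pmatrix}$$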
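The slides also present C/C++ versions of their programs, but those are not reproduced in this transcript, so the sketches added here use C++. Below is a minimal sketch that simulates a path $X_0, X_1, \ldots$ of the five-state example. It assumes states 1 to 5 are stored as indices 0 to 4, and it draws each next state from row $X_t$ of $P$. The names `kP` and `simulateChain` and the seed are illustrative.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Transition matrix of the five-state example: row i holds P(X_{t+1} = j | X_t = i).
// States 1..5 on the slide are stored as indices 0..4.
const std::vector<std::vector<double>> kP = {
    {0.4, 0.6, 0.0, 0.0, 0.0},
    {0.5, 0.0, 0.5, 0.0, 0.0},
    {0.0, 0.3, 0.0, 0.7, 0.0},
    {0.0, 0.0, 0.1, 0.3, 0.6},
    {0.0, 0.3, 0.0, 0.5, 0.2}};

// Simulates X_0, X_1, ..., X_steps starting from state index x0.
std::vector<int> simulateChain(const std::vector<std::vector<double>>& P,
                               int x0, int steps, unsigned seed) {
    assert(x0 >= 0 && x0 < static_cast<int>(P.size()) && steps >= 0);
    std::mt19937 rng(seed);
    // One categorical distribution per row: the next state is drawn from row X_t.
    std::vector<std::discrete_distribution<int>> rows;
    for (const auto& row : P) rows.emplace_back(row.begin(), row.end());
    std::vector<int> path{x0};
    for (int t = 0; t < steps; ++t) path.push_back(rows[path.back()](rng));
    return path;
}
```

Every consecutive pair in the returned path has a positive entry in `kP`. This gives a quick check that only allowed transitions occur.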
Definition (1)
Define the probability of going from state i to state j in n time steps as
$$p_{ij}^{(n)} = P(X_{t+n}=j \mid X_t=i).$$
A state j is accessible from state i if there is an $n \in \{0, 1, \ldots\}$ such that $p_{ij}^{(n)} > 0$.
A state i is said to communicate with state j (denoted $i \leftrightarrow j$) if both i is accessible from j and j is accessible from i.
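Here is a small C++ sketch of these definitions, assuming a finite chain stored as a row-stochastic matrix. The helper names `multiply`, `nStep`, `accessible` and `communicate` are hypothetical. $p_{ij}^{(n)}$ is the $(i, j)$ entry of $P^n$. For a $k$-state chain it is enough to check $n = 0, \ldots, k-1$, because a shortest path between two states never needs more than $k - 1$ steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Matrix product A * B for square matrices of the same size.
Matrix multiply(const Matrix& A, const Matrix& B) {
    const std::size_t k = A.size();
    Matrix C(k, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t m = 0; m < k; ++m)
            for (std::size_t j = 0; j < k; ++j) C[i][j] += A[i][m] * B[m][j];
    return C;
}

// n-step transition matrix: entry (i, j) is p_ij^(n); P^0 is the identity.
Matrix nStep(const Matrix& P, int n) {
    assert(n >= 0);
    Matrix R(P.size(), std::vector<double>(P.size(), 0.0));
    for (std::size_t i = 0; i < P.size(); ++i) R[i][i] = 1.0;
    for (int s = 0; s < n; ++s) R = multiply(R, P);
    return R;
}

// j is accessible from i if p_ij^(n) > 0 for some n = 0, ..., k - 1.
bool accessible(const Matrix& P, std::size_t i, std::size_t j) {
    assert(i < P.size() && j < P.size());
    for (int n = 0; n < static_cast<int>(P.size()); ++n)
        if (nStep(P, n)[i][j] > 0.0) return true;
    return false;
}

// i <-> j: each state is accessible from the other.
bool communicate(const Matrix& P, std::size_t i, std::size_t j) {
    return accessible(P, i, j) && accessible(P, j, i);
}
```

For the five-state example, $p_{11}^{(2)} = 0.4 \cdot 0.4 + 0.6 \cdot 0.5 = 0.46$, and every pair of states communicates.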
Definition (2)
A state i has period $d(i)$ if any return to state i must occur in multiples of $d(i)$ time steps. Formally, the period of a state is defined as
$$d(i) = \gcd\{n \ge 1 : p_{ii}^{(n)} > 0\}.$$
If $d(i) = 1$, the state is said to be aperiodic; otherwise ($d(i) > 1$), the state is said to be periodic with period $d(i)$.
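The gcd in the definition can be sketched in C++ by scanning $n = 1, \ldots, n_{\max}$ and taking the gcd of every $n$ with $p_{ii}^{(n)} > 0$. Stopping at a finite `maxN` is an assumption of this sketch, not part of the definition. `period` is a hypothetical helper name, and `std::gcd` requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// d(i) = gcd{ n >= 1 : p_ii^(n) > 0 }, searched over n = 1..maxN only.
// Returns 0 if no return to i is seen within maxN steps.
int period(const Matrix& P, std::size_t i, int maxN) {
    assert(i < P.size() && maxN >= 1);
    const std::size_t k = P.size();
    Matrix R = P;  // R holds P^n, starting with n = 1
    int d = 0;     // gcd(0, n) == n, so the first return time initializes d
    for (int n = 1; n <= maxN; ++n) {
        if (R[i][i] > 0.0) d = std::gcd(d, n);
        Matrix next(k, std::vector<double>(k, 0.0));  // next = R * P
        for (std::size_t a = 0; a < k; ++a)
            for (std::size_t m = 0; m < k; ++m)
                for (std::size_t b = 0; b < k; ++b) next[a][b] += R[a][m] * P[m][b];
        R = next;
    }
    return d;
}
```

A deterministic 3-cycle gives period 3. State 2 of the five-state example returns in both 2 steps (2 → 1 → 2) and 3 steps (2 → 1 → 1 → 2), so it is aperiodic.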
Definition (3)
A set of states C is a communicating class if every pair of states in C communicates with each other.
Every state in a communicating class must have the same period.
Example:
Definition (4)
A finite Markov chain is said to be irreducible if its state space $\Omega$ is a communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.
Example:
Definition (5)
A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.
Example:
Definition (6)
A state i is said to be transient if, given that we start in state i, there is a non-zero probability that we will never return to i.
Formally, let the random variable $T_i$ be the next return time to state i (the "hitting time"):
$$T_i = \min\{n \ge 1 : X_n = i \mid X_0 = i\}.$$
Then state i is transient iff $P(T_i < \infty) < 1$.
Definition (7)
A state i is said to be recurrent or persistent iff $P(T_i < \infty) = 1$.
The mean recurrence time is $\mu_i = E[T_i]$. State i is positive recurrent if $\mu_i$ is finite; otherwise, state i is null recurrent. A state i is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.
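Stationary Distributions
Theorem: If a finite Markov chain is irreducible and aperiodic, then
$$p_{ij}^{(n)} \to \pi_j \quad \text{as } n \to \infty, \ \forall i, j.$$
Theorem: If a finite Markov chain is irreducible and aperiodic, then there exists a unique
$$\pi_j = \lim_{n \to \infty} P(X_n = j)$$
and
$$\pi_j = \sum_i \pi_i P_{ij}, \qquad \sum_j \pi_j = 1,$$
where $\pi = (\pi_j)$ is the stationary distribution.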
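Definition (8)
A Markov chain is said to be reversible if there is a stationary distribution $\pi$ such that
$$\pi_i P_{ij} = \pi_j P_{ji} \quad \forall i, j.$$
Theorem: If a Markov chain is reversible, then
$$\pi_j = \sum_i \pi_i P_{ij}.$$
This follows by summing the reversibility condition over i: $\sum_i \pi_i P_{ij} = \sum_i \pi_j P_{ji} = \pi_j \sum_i P_{ji} = \pi_j$.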
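The following C++ sketch checks both conditions numerically; the helper names are hypothetical. The converse of the theorem does not hold. For example, a deterministic 3-cycle has the uniform distribution as its stationary distribution, yet it violates $\pi_i P_{ij} = \pi_j P_{ji}$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Detailed balance: pi_i P_ij == pi_j P_ji for all i, j (within tol).
bool isReversible(const std::vector<double>& pi, const Matrix& P, double tol) {
    assert(pi.size() == P.size());
    for (std::size_t i = 0; i < P.size(); ++i)
        for (std::size_t j = 0; j < P.size(); ++j)
            if (std::abs(pi[i] * P[i][j] - pi[j] * P[j][i]) > tol) return false;
    return true;
}

// Stationarity: sum_i pi_i P_ij == pi_j for all j (within tol).
bool isStationary(const std::vector<double>& pi, const Matrix& P, double tol) {
    assert(pi.size() == P.size());
    for (std::size_t j = 0; j < P.size(); ++j) {
        double flowIn = 0.0;
        for (std::size_t i = 0; i < P.size(); ++i) flowIn += pi[i] * P[i][j];
        if (std::abs(flowIn - pi[j]) > tol) return false;
    }
    return true;
}
```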
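An Example of Stationary Distributions
A Markov chain on states {1, 2, 3}:
- State 1 stays with probability 0.7 and moves to state 2 with probability 0.3.
- State 2 stays with probability 0.4 and moves to state 1 or state 3 with probability 0.3 each.
- State 3 stays with probability 0.7 and moves to state 2 with probability 0.3.
Its transition matrix is
$$P = \begin{pmatrix} 0.7 & 0.3 & 0.0 \\ 0.3 & 0.4 & 0.3 \\ 0.0 & 0.3 & 0.7 \end{pmatrix}.$$
The stationary distribution is $\pi = \left(\tfrac13, \tfrac13, \tfrac13\right)$, since
$$\left(\tfrac13, \tfrac13, \tfrac13\right) \begin{pmatrix} 0.7 & 0.3 & 0.0 \\ 0.3 & 0.4 & 0.3 \\ 0.0 & 0.3 & 0.7 \end{pmatrix} = \left(\tfrac13, \tfrac13, \tfrac13\right).$$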
Properties of Stationary Distributions
Regardless of the starting point, the process of an irreducible and aperiodic Markov chain will converge to a stationary distribution.
The rate of convergence depends on properties of the transition probability.
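Here is a C++ sketch of this convergence for the three-state example $P$ with rows (0.7, 0.3, 0), (0.3, 0.4, 0.3), (0, 0.3, 0.7). The function name and tolerance are assumptions. Repeatedly applying $\mu \leftarrow \mu P$ from different starting distributions reaches the same $\pi = (1/3, 1/3, 1/3)$. For this particular $P$ the second-largest eigenvalue is 0.7, so the error shrinks roughly like $0.7^n$. This illustrates that the rate depends on the transition probabilities.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Iterates mu <- mu P from a starting distribution mu until successive iterates
// differ by less than tol in the max-norm, or maxIter iterations have been run.
std::vector<double> stationaryByIteration(const Matrix& P, std::vector<double> mu,
                                          double tol = 1e-12, int maxIter = 100000) {
    assert(mu.size() == P.size());
    for (int it = 0; it < maxIter; ++it) {
        std::vector<double> next(mu.size(), 0.0);
        for (std::size_t i = 0; i < P.size(); ++i)
            for (std::size_t j = 0; j < P.size(); ++j) next[j] += mu[i] * P[i][j];
        double change = 0.0;
        for (std::size_t j = 0; j < mu.size(); ++j)
            change = std::max(change, std::abs(next[j] - mu[j]));
        mu = next;
        if (change < tol) break;
    }
    return mu;
}
```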
Part 7 Monte Carlo Markov Chains
Applications of MCMC
Simulation: Ex:
$$(x, y) \sim f(x, y) = c \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1},$$
where $x = 0, 1, 2, \ldots, n$, $0 \le y \le 1$, and $n, \alpha, \beta$ are known.
Integration: computing in high dimensions. Ex:
$$E[g(y)] = \int_0^1 g(y) f(y)\, dy.$$
Bayesian Inference: Ex: posterior distributions, posterior means, …
Monte Carlo Markov Chains
MCMC methods are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its stationary distribution.
The state of the chain after a large number of steps is then used as a sample from the desired distribution.
http://en.wikipedia.org/wiki/MCMC
Inversion Method vs. MCMC (1)
Inverse transform sampling, also known as the probability integral transform, is a method of sampling a number at random from any probability distribution given its cumulative distribution function (cdf).
http://en.wikipedia.org/wiki/Inverse_transform_sampling_method
Inversion Method vs. MCMC (2)
If a random variable X has a continuous cdf $F$, then $F(X)$ has a uniform distribution on [0, 1]. The inverse transform sampling method works as follows:
1. Generate a random number from the standard uniform distribution; call this $u$.
2. Compute the value $x$ such that $F(x) = u$; call this $x_{\text{chosen}}$.
3. Take $x_{\text{chosen}}$ to be the random number drawn from the distribution described by $F$.
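A C++ sketch of the three steps follows; the function names are hypothetical. For a continuous cdf such as the exponential, $F(x) = 1 - e^{-\lambda x}$ can be solved exactly. For a discrete cdf there is usually no $x$ with $F(x) = u$ exactly. The sketch therefore uses the generalized inverse $\min\{x : F(x) \ge u\}$, which is the usual convention for discrete distributions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Continuous case: Exponential(rate), F(x) = 1 - exp(-rate x), so F^{-1}(u) = -log(1 - u) / rate.
double inverseExponential(double u, double rate) {
    assert(u >= 0.0 && u < 1.0 && rate > 0.0);
    return -std::log1p(-u) / rate;
}

// Discrete case: cdf[k] = P(X <= k); returns the smallest k with cdf[k] >= u.
int inverseDiscrete(double u, const std::vector<double>& cdf) {
    assert(!cdf.empty());
    auto it = std::lower_bound(cdf.begin(), cdf.end(), u);
    if (it == cdf.end()) --it;  // guard: round-off can leave the last cdf value just below 1
    return static_cast<int>(it - cdf.begin());
}

// Steps 1-3 repeated n times: u ~ U(0, 1), then x_chosen = F^{-1}(u).
std::vector<int> sampleDiscrete(const std::vector<double>& cdf, int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<int> out;
    out.reserve(n);
    for (int i = 0; i < n; ++i) out.push_back(inverseDiscrete(unif(rng), cdf));
    return out;
}
```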
Inversion Method vs. MCMC (3)
For a one-dimensional random variable, the inversion method works well, but for two or more (high-dimensional) random variables it may not.
For two or more random variables, the marginal distributions of the individual variables can be difficult and time-consuming to calculate.
Gibbs Sampling
One kind of MCMC method. The point of Gibbs sampling is that, given a multivariate distribution, it is simpler to sample from a conditional distribution than to integrate over a joint distribution.
George Casella and Edward I. George. "Explaining the Gibbs sampler". The American Statistician, 46:167-174, 1992. (Basic summary and many references.)
http://en.wikipedia.org/wiki/Gibbs_sampling
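Example 1 (1)
To sample x from
$$(x, y) \sim f(x, y) = c \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1},$$
where $x = 0, 1, 2, \ldots, n$, $0 \le y \le 1$, $n, \alpha, \beta$ are known, and $c$ is a constant.
One can see that
$$f(x \mid y) = \frac{f(x, y)}{f(y)} \propto \binom{n}{x} y^{x} (1-y)^{n-x} \sim \text{Binomial}(n, y),$$
$$f(y \mid x) = \frac{f(x, y)}{f(x)} \propto y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \sim \text{Beta}(x+\alpha, n-x+\beta).$$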
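Example 1 (2)
Gibbs sampling algorithm:
Initial setting: $y_0 \sim \text{Uniform}(0, 1)$ or an arbitrary value in $[0, 1]$; $x_0 \sim \text{Bin}(n, y_0)$.
For $t = 0, \ldots, N-1$ (N iterations, distinct from the binomial size n), sample a value $(x_{t+1}, y_{t+1})$ from
$$y_{t+1} \sim \text{Beta}(x_t + \alpha, n - x_t + \beta), \qquad x_{t+1} \sim \text{Bin}(n, y_{t+1}).$$
Return $(x_N, y_N)$.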
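Below is a C++ sketch of this Gibbs sampler; the names, seed and burn-in length are assumptions. The C++ standard library has no Beta distribution, so $\text{Beta}(a, b)$ is drawn as $G_1/(G_1 + G_2)$ with independent $G_1 \sim \text{Gamma}(a, 1)$ and $G_2 \sim \text{Gamma}(b, 1)$. Marginally, x is Beta-Binomial with mean $n\alpha/(\alpha+\beta)$ and y is $\text{Beta}(\alpha, \beta)$. For $n = 16$, $\alpha = 5$, $\beta = 7$, the sample means should therefore be near $16 \cdot 5/12 \approx 6.67$ and $5/12$.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Beta(a, b) as G1 / (G1 + G2) with independent G1 ~ Gamma(a, 1), G2 ~ Gamma(b, 1).
double drawBeta(std::mt19937& rng, double a, double b) {
    std::gamma_distribution<double> ga(a, 1.0), gb(b, 1.0);
    const double g1 = ga(rng);
    const double g2 = gb(rng);
    return g1 / (g1 + g2);
}

struct Draw { int x; double y; };

// y_{t+1} ~ Beta(x_t + alpha, n - x_t + beta), x_{t+1} ~ Bin(n, y_{t+1}).
// Discards the first burnIn sweeps, then keeps m draws.
std::vector<Draw> gibbsExample1(int n, double alpha, double beta,
                                int burnIn, int m, unsigned seed) {
    assert(n >= 0 && alpha > 0.0 && beta > 0.0 && burnIn >= 0 && m >= 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double y = unif(rng);                                  // y_0 ~ Uniform(0, 1)
    int x = std::binomial_distribution<int>(n, y)(rng);    // x_0 ~ Bin(n, y_0)
    std::vector<Draw> out;
    out.reserve(m);
    for (int t = 0; t < burnIn + m; ++t) {
        y = drawBeta(rng, x + alpha, n - x + beta);
        x = std::binomial_distribution<int>(n, y)(rng);
        if (t >= burnIn) out.push_back({x, y});
    }
    return out;
}
```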
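Example 1 (3)
Under regular conditions, $x_t \to x$ and $y_t \to y$ in distribution as $t \to \infty$.
How many steps are needed for convergence?
Within an acceptable error, such as
$$\left| \frac{1}{10} \sum_{i=t+1}^{t+10} x_i - \frac{1}{10} \sum_{i=t+11}^{t+20} x_i \right| < 0.001.$$
The number of iterations N is large enough, such as N = 1000.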
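Example 1 (4)
Inversion method:
$$x \sim f(x) = \int_0^1 f(x, y)\, dy = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(x+\alpha)\, \Gamma(n-x+\beta)}{\Gamma(n+\alpha+\beta)}, \quad x = 0, 1, \ldots, n,$$
which is the Beta-Binomial distribution.
The cdf of x is $F(x) = \sum_{i=0}^{x} f(i)$. Because x is discrete, $F(x)$ is not itself uniform. Instead, drawing $u \sim U(0, 1)$ and taking the smallest x with $F(x) \ge u$ gives a draw from $f$.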
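Here is a C++ sketch of the exact pmf and cdf; the names are hypothetical. It evaluates $\binom{n}{x}$ and the Beta functions on the log scale with `std::lgamma`, which avoids the overflow that direct Gamma products hit for large n. The cdf is then a running sum computed with `std::partial_sum`.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Beta-Binomial pmf f(x) = C(n, x) B(x + alpha, n - x + beta) / B(alpha, beta), x = 0..n.
std::vector<double> betaBinomialPmf(int n, double alpha, double beta) {
    assert(n >= 0 && alpha > 0.0 && beta > 0.0);
    // log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    auto logBeta = [](double a, double b) {
        return std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
    };
    std::vector<double> pmf(n + 1);
    for (int x = 0; x <= n; ++x) {
        const double logChoose = std::lgamma(n + 1.0) - std::lgamma(x + 1.0) - std::lgamma(n - x + 1.0);
        pmf[x] = std::exp(logChoose + logBeta(x + alpha, n - x + beta) - logBeta(alpha, beta));
    }
    return pmf;
}

// cdf F(x) = sum_{i <= x} f(i).
std::vector<double> cumulative(const std::vector<double>& pmf) {
    std::vector<double> cdf(pmf.size());
    std::partial_sum(pmf.begin(), pmf.end(), cdf.begin());
    return cdf;
}
```

For $x = 0$ the pmf reduces to $\Gamma(\alpha+\beta)\Gamma(n+\beta) / (\Gamma(\beta)\Gamma(n+\alpha+\beta))$. This is the starting value used by the R code.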
Gibbs sampling by R
    # Example 1 with n = num = 16, alpha = 5, beta = 7.
    N <- 1000; num <- 16; alpha <- 5; beta <- 7
    tempy <- runif(1)                    # y_0 ~ Uniform(0, 1)
    tempx <- rbinom(1, num, tempy)       # x_0 ~ Bin(num, y_0)
    # Burn-in: run batches of N Gibbs steps until successive batch means of x
    # differ by at most 0.0001, or until 1001 batches have been run.
    j <- 0; Forward <- 1; Afterward <- 0
    while ((abs(Forward - Afterward) > 0.0001) && (j <= 1000)) {
      Forward <- Afterward; Afterward <- 0
      for (i in 1:N) {
        tempy <- rbeta(1, tempx + alpha, num - tempx + beta)
        tempx <- rbinom(1, num, tempy)
        Afterward <- Afterward + tempx
      }
      Afterward <- Afterward / N; j <- j + 1
    }
    # Keep the next N draws of (x, y).
    sample <- matrix(0, nrow = N, ncol = 2)
    for (i in 1:N) {
      tempy <- rbeta(1, tempx + alpha, num - tempx + beta)
      tempx <- rbinom(1, num, tempy)
      sample[i, ] <- c(tempx, tempy)
    }
    # rbetabin() is defined under "Inversion Method by R".
    sample_Inverse <- rbetabin(N, num, alpha, beta)
    write(t(sample), "Sample for Ex1 by R.txt", ncol = 2)
    # Counts of x = 0, ..., num from each method.
    Xhist <- cbind(tabulate(sample[, 1] + 1, nbins = num + 1),
                   tabulate(sample_Inverse + 1, nbins = num + 1))
    write(t(Xhist), "Histogram for Ex1 by R.txt", ncol = 2)
    # Column 1: exact Beta-Binomial pmf (inverse method).
    # Column 2: Gibbs estimate, the mean of P(X = i | Y = y) over the sampled y.
    prob <- matrix(0, nrow = num + 1, ncol = 2)
    for (i in 0:num) {
      prob[i + 1, 1] <- choose(num, i) * base::beta(i + alpha, num - i + beta) / base::beta(alpha, beta)
      prob[i + 1, 2] <- mean(dbinom(i, num, sample[, 2]))
    }
    write(t(prob), "ProbHistogram for Ex1 by R.txt", ncol = 2)
Inversion Method by R
    # rbetabin(): N draws from Beta-Binomial(size, alpha, beta) by inverse transform sampling.
    rbetabin <- function(N, size, alpha, beta) {
      x <- 0:size
      # Exact pmf f(x) = choose(size, x) B(x + alpha, size - x + beta) / B(alpha, beta)
      pmf <- choose(size, x) * base::beta(x + alpha, size - x + beta) / base::beta(alpha, beta)
      CDF <- cumsum(pmf)
      Usample <- runif(N)
      # Generalized inverse: the smallest x with F(x) >= u equals the number of
      # CDF values strictly below u; pmin() guards against round-off at F(size).
      sample <- pmin(vapply(Usample, function(u) sum(CDF < u), integer(1)), size)
      return(sample)
    }
Gibbs sampling by C/C++ (1)
Gibbs sampling by C/C++ (2)
Gibbs sampling by C/C++ (3)
Gibbs sampling by C/C++ (4)
Inversion Method by C/C++ (1)
Inversion Method by C/C++ (2)
Plot Histograms by Maple (1)
Figure 1: 1000 samples with n = 16, α = 5 and β = 7 (blue: inversion method; red: Gibbs sampling).
Plot Histograms by Maple (2)
Probability Histograms by Maple (1)
Figure 2: The blue histogram and yellow line are the pmf of x. The red histogram is from Gibbs sampling, estimated by
$$\hat{P}(X = x) = \frac{1}{m} \sum_{i=1}^{m} P(X = x \mid Y = y_i).$$
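This estimator averages binomial probabilities over the Gibbs draws of y instead of counting x values. Below is a C++ sketch; the names are hypothetical. It needs only the sampled $y_i$ and $n$.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// P(X = x | Y = y) for X ~ Bin(n, y); std::pow(0.0, 0) == 1 covers y = 0 or y = 1.
double binomialPmf(int n, int x, double y) {
    double coef = 1.0;
    for (int k = 1; k <= x; ++k) coef = coef * (n - x + k) / k;  // builds C(n, x)
    return coef * std::pow(y, x) * std::pow(1.0 - y, n - x);
}

// P_hat(X = x) = (1/m) sum_i P(X = x | Y = y_i), for x = 0..n.
std::vector<double> raoBlackwellPmf(int n, const std::vector<double>& ySamples) {
    assert(n >= 0 && !ySamples.empty());
    std::vector<double> est(n + 1, 0.0);
    for (double y : ySamples)
        for (int x = 0; x <= n; ++x) est[x] += binomialPmf(n, x, y);
    for (double& e : est) e /= static_cast<double>(ySamples.size());
    return est;
}
```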
Probability Histograms by Maple (2)
The probability histogram of the blue histogram of Figure 1 would be similar to the blue probability histogram of Figure 2 when the sample size $\to \infty$.
The probability histogram of the red histogram of Figure 1 would be similar to the red probability histogram of Figure 2 when the iteration $n \to \infty$.
Probability Histograms by Maple (3)
Exercises Write your own programs similar to those
examples presented in this talk, including Example 1 in Genetics and other examples.
Write programs for those examples mentioned at the reference web pages.
Write programs for the other examples that you know.
Bayesian Methods with Monte Carlo Markov Chains III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
http://tigpbp.iis.sinica.edu.tw/courses.htm
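Part 10 More Examples of Gibbs Sampling
An Example with Three Random Variables (1)
To sample (X, Y, N) as follows:
$$(x, y, n) \sim f(x, y, n) = c \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1} e^{-\lambda} \frac{\lambda^{n}}{n!},$$
where $x = 0, 1, 2, \ldots, n$, $0 \le y \le 1$, $n = 0, 1, 2, \ldots$, $\alpha$, $\beta$ and $\lambda$ are known, and $c$ is a constant.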
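An Example with Three Random Variables (2)
One can see that
$$f(x \mid y, n) \propto \binom{n}{x} y^{x} (1-y)^{n-x} \sim \text{Binomial}(n, y),$$
$$f(y \mid x, n) \propto y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \sim \text{Beta}(x+\alpha, n-x+\beta),$$
$$f(n \mid x, y) \propto e^{-(1-y)\lambda} \frac{[(1-y)\lambda]^{n-x}}{(n-x)!}, \quad n = x, x+1, \ldots, \quad \text{i.e. } n - x \sim \text{Poisson}((1-y)\lambda).$$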
An Example with Three Random Variables (3)
Gibbs sampling algorithm:
1. Initial setting: $t = 0$;
   $y_0 \sim \text{Unif}(0, 1)$ or an arbitrary value in $[0, 1]$;
   $n_0 \ge 1$: a discrete uniform draw or an arbitrary integer;
   $x_0 \sim \text{Bin}(n_0, y_0)$.
2. Sample a value $(x_{t+1}, y_{t+1}, n_{t+1})$ from
$$y_{t+1} \sim \text{Beta}(x_t + \alpha, n_t - x_t + \beta),$$
$$n_{t+1} - x_t \sim \text{Poisson}((1 - y_{t+1})\lambda),$$
$$x_{t+1} \sim \text{Bin}(n_{t+1}, y_{t+1}).$$
3. Set $t = t + 1$ and repeat step 2 until convergence.
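Here is a C++ sketch of this three-variable sampler. The names, seed and burn-in are assumptions, and Beta draws use the Gamma-ratio construction because the standard library has no Beta distribution. Working out the marginals of the joint density shows that $N \sim \text{Poisson}(\lambda)$ and $Y \sim \text{Beta}(\alpha, \beta)$. This gives a simple check: with $\alpha = 2$, $\beta = 4$, $\lambda = 16$, the sample means of n and y should be near 16 and 1/3.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

struct State { int x; double y; int n; };

// Beta(a, b) as G1 / (G1 + G2) with independent G1 ~ Gamma(a, 1), G2 ~ Gamma(b, 1).
double drawBeta(std::mt19937& rng, double a, double b) {
    std::gamma_distribution<double> ga(a, 1.0), gb(b, 1.0);
    const double g1 = ga(rng);
    const double g2 = gb(rng);
    return g1 / (g1 + g2);
}

// Steps 1-3: discards burnIn sweeps, then keeps m states (x, y, n).
std::vector<State> gibbsThreeVariables(double alpha, double beta, double lambda,
                                       int burnIn, int m, unsigned seed) {
    assert(alpha > 0.0 && beta > 0.0 && lambda > 0.0 && burnIn >= 0 && m >= 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    State s{0, unif(rng), 1};                                   // y_0 ~ Unif(0, 1), n_0 = 1
    s.x = std::binomial_distribution<int>(s.n, s.y)(rng);       // x_0 ~ Bin(n_0, y_0)
    std::vector<State> out;
    out.reserve(m);
    for (int t = 0; t < burnIn + m; ++t) {
        s.y = drawBeta(rng, s.x + alpha, s.n - s.x + beta);    // y | x, n
        // n - x | x, y ~ Poisson((1 - y) lambda); std::poisson_distribution needs a
        // positive mean, so the probability-zero case y == 1 is clamped.
        const double mean = std::max((1.0 - s.y) * lambda, 1e-12);
        s.n = s.x + std::poisson_distribution<int>(mean)(rng);
        s.x = std::binomial_distribution<int>(s.n, s.y)(rng);   // x | y, n
        if (t >= burnIn) out.push_back(s);
    }
    return out;
}
```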
An Example with Three Random Variables by R
    N <- 10000; alpha <- 2; beta <- 4; lambda <- 16
    sample <- matrix(0, nrow = N, ncol = 3)
    tempY <- runif(1); tempN <- 1              # y_0 ~ Unif(0, 1), n_0 = 1
    tempX <- rbinom(1, tempN, tempY)           # x_0 ~ Bin(n_0, y_0)
    # Burn-in: batches of N sweeps until successive batch means of x differ
    # by at most 0.001, or until 1001 batches have been run.
    j <- 0; forward <- 1; afterward <- 0
    while ((abs(forward - afterward) > 0.001) && (j <= 1000)) {
      forward <- afterward; afterward <- 0
      for (i in 1:N) {
        tempY <- rbeta(1, tempX + alpha, tempN - tempX + beta)
        tempN <- tempX + rpois(1, (1 - tempY) * lambda)   # n - x ~ Poisson((1 - y) lambda)
        tempX <- rbinom(1, tempN, tempY)
        afterward <- afterward + tempX
      }
      afterward <- afterward / N; j <- j + 1
    }
    # Keep the next N sweeps as samples of (x, y, n).
    for (i in 1:N) {
      tempY <- rbeta(1, tempX + alpha, tempN - tempX + beta)
      tempN <- tempX + rpois(1, (1 - tempY) * lambda)
      tempX <- rbinom(1, tempN, tempY)
      sample[i, ] <- c(tempX, tempY, tempN)
    }
Figure: 10000 samples with α = 2, β = 4 and λ = 16.
An Example with 3 Random Variables by C (1)
An Example with 3 Random Variables by C (2)
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b.
A, B: dominant; a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
Example 1 in Genetics (2)
Probabilities for genotypes in gametes:

            No Recombination    Recombination
  Male      1-r                 r
  Female    1-r'                r'

            AB          ab          aB        Ab
  Male      (1-r)/2     (1-r)/2     r/2       r/2
  Female    (1-r')/2    (1-r')/2    r'/2      r'/2
Example 1 in Genetics (3)
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
More:
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (4)
Offspring genotypes and probabilities (rows: female gamete; columns: male gamete):

  FEMALE \ MALE    AB (1-r)/2            ab (1-r)/2            aB r/2            Ab r/2
  AB (1-r')/2      AABB (1-r)(1-r')/4    aABb (1-r)(1-r')/4    aABB r(1-r')/4    AABb r(1-r')/4
  ab (1-r')/2      AaBb (1-r)(1-r')/4    aabb (1-r)(1-r')/4    aaBb r(1-r')/4    Aabb r(1-r')/4
  aB r'/2          AaBB (1-r)r'/4        aabB (1-r)r'/4        aaBB r r'/4       AabB r r'/4
  Ab r'/2          AABb (1-r)r'/4        aAbb (1-r)r'/4        aABb r r'/4       AAbb r r'/4
Example 1 in Genetics (5)
Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
Example 1 in Genetics (6)
Let $\phi = (1-r)(1-r')$; then
$$P(A^*B^*) = \frac{2+\phi}{4}, \qquad P(A^*b^*) = P(a^*B^*) = \frac{1-\phi}{4}, \qquad P(a^*b^*) = \frac{\phi}{4}.$$
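These four probabilities can be checked by brute force. The C++ sketch below uses hypothetical names. It enumerates the 16 combinations of male and female gametes from the gamete tables, and it classifies an offspring as dominant at a locus when at least one gamete carries the capital allele.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Gamete { std::string alleles; double prob; };

// Parental types AB, ab have probability (1 - r)/2 each; recombinants aB, Ab have r/2 each.
std::vector<Gamete> gametes(double r) {
    return {{"AB", (1 - r) / 2}, {"ab", (1 - r) / 2}, {"aB", r / 2}, {"Ab", r / 2}};
}

// Returns {P(A*B*), P(A*b*), P(a*B*), P(a*b*)} for selfed double heterozygotes.
std::vector<double> phenotypeProbabilities(double r, double rPrime) {
    assert(r >= 0.0 && r <= 0.5 && rPrime >= 0.0 && rPrime <= 0.5);
    std::vector<double> p(4, 0.0);
    for (const auto& male : gametes(r))
        for (const auto& female : gametes(rPrime)) {
            const bool dominantA = male.alleles[0] == 'A' || female.alleles[0] == 'A';
            const bool dominantB = male.alleles[1] == 'B' || female.alleles[1] == 'B';
            const int index = (dominantA ? 0 : 2) + (dominantB ? 0 : 1);
            p[index] += male.prob * female.prob;
        }
    return p;
}
```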
Example 1 in Genetics (7)
Hence, a random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution:
$$\text{Multinomial}\left(n; \frac{2+\phi}{4}, \frac{1-\phi}{4}, \frac{1-\phi}{4}, \frac{\phi}{4}\right).$$
We know that $\phi = (1-r)(1-r')$, $0 \le r \le 1/2$ and $0 \le r' \le 1/2$.
So $1/4 \le \phi \le 1$.
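Example 1 in Genetics (8)
Suppose that we observe the data $y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 24)$, which is a random sample from
$$\text{Multinomial}\left(n; \frac{2+\phi}{4}, \frac{1-\phi}{4}, \frac{1-\phi}{4}, \frac{\phi}{4}\right).$$
Then the probability mass function is
$$f(y \mid \phi) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left(\frac{2+\phi}{4}\right)^{y_1} \left(\frac{1-\phi}{4}\right)^{y_2+y_3} \left(\frac{\phi}{4}\right)^{y_4}.$$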
Example 1 in Genetics (9)
How to estimate $\phi$?
MME (shown last week): http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29
MLE (shown last week): http://en.wikipedia.org/wiki/Maximum_likelihood
Bayesian method: http://en.wikipedia.org/wiki/Bayesian_method
Example 1 in Genetics (10)
As the value of $\phi$ is between 1/4 and 1, we can assume that the prior distribution of $\phi$ is $\text{Unif}(1/4, 1)$.
The posterior distribution is
$$f(\phi \mid y) = \frac{f(y \mid \phi) f(\phi)}{\int f(y \mid \phi) f(\phi)\, d\phi}.$$
The integral in the denominator, $\int f(y \mid \phi) f(\phi)\, d\phi$, does not have a closed form.
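Example 1 in Genetics (11)
We will consider the mean of the posterior distribution (the posterior mean),
$$E(\phi \mid y) = \int \phi\, f(\phi \mid y)\, d\phi = \frac{\int \phi\, f(y \mid \phi) f(\phi)\, d\phi}{\int f(y \mid \phi) f(\phi)\, d\phi}.$$
The Monte Carlo Markov Chains method is a good method to estimate $E(\phi \mid y)$ even if $\int f(y \mid \phi) f(\phi)\, d\phi$ and the posterior mean do not have closed forms.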
Example 1 by R
Direct numerical integration when $\phi \sim U(1/4, 1)$:
    > y <- c(125, 18, 20, 24)
    > phi <- runif(1000000, 1/4, 1)
    > f_phi <- function(phi){((2+phi)/4)^y[1]*((1-phi)/4)^(y[2]+y[3])*(phi/4)^y[4]}
    > mean(f_phi(phi)*phi)/mean(f_phi(phi))
    [1] 0.573808
We can assume other prior distributions to compare the results of posterior means: Beta(1,1), Beta(2,2), Beta(2,3), Beta(3,2), Beta(0.5,0.5), Beta(10^-5,10^-5).
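Because $\phi$ is one-dimensional, the same ratio of integrals can also be approximated deterministically. The C++ sketch below uses the midpoint rule in place of uniform random draws. That is a swapped-in technique, not the slides' method, and the names are hypothetical. The constants $n!/(y_1!\,y_2!\,y_3!\,y_4!)$ and $4^{-n}$ cancel in the ratio, and working on the log scale avoids underflow. With the U(1/4, 1) prior, the result should agree with the value 0.573808 above up to Monte Carlo error.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Log-likelihood for y = (125, 18, 20, 24), dropping the multinomial coefficient
// and the 4^{-n} factor (both cancel in the posterior mean).
double logLikelihood(double phi) {
    const double y[4] = {125.0, 18.0, 20.0, 24.0};
    return y[0] * std::log(2.0 + phi) + (y[1] + y[2]) * std::log(1.0 - phi) + y[3] * std::log(phi);
}

// E(phi | y) = int phi f(y|phi) f(phi) dphi / int f(y|phi) f(phi) dphi, by the midpoint
// rule with 'cells' cells on (lo, hi). logPrior may omit additive constants.
double posteriorMean(const std::function<double(double)>& logPrior,
                     double lo, double hi, int cells) {
    assert(0.0 <= lo && lo < hi && hi <= 1.0 && cells > 0);
    const double h = (hi - lo) / cells;
    std::vector<double> phi(cells), logPost(cells);
    for (int k = 0; k < cells; ++k) {
        phi[k] = lo + (k + 0.5) * h;              // midpoints avoid log(0) at 0 and 1
        logPost[k] = logLikelihood(phi[k]) + logPrior(phi[k]);
    }
    // Subtract the largest log value before exponentiating to avoid underflow.
    const double shift = *std::max_element(logPost.begin(), logPost.end());
    double num = 0.0, den = 0.0;
    for (int k = 0; k < cells; ++k) {
        const double w = std::exp(logPost[k] - shift);
        num += phi[k] * w;
        den += w;
    }
    return num / den;
}
```

A Beta(a, b) prior on (0, 1) can be passed as `[a, b](double p) { return (a - 1) * std::log(p) + (b - 1) * std::log(1 - p); }` with `lo = 0` and `hi = 1`.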
Example 1 by C/C++
Replace the prior with other distributions, such as Beta(1,1), …, Beta(1e-5,1e-5).
Comparison for Example 1 (1)
  Method                          Estimate
  MME                             0.683616
  MLE                             0.663165
  Bayesian U(1/4, 1)              0.573931
  Bayesian Beta(1, 1)             0.573918
  Bayesian Beta(2, 2)             0.572103
  Bayesian Beta(2, 3)             0.564731
  Bayesian Beta(3, 2)             0.577575
  Bayesian Beta(1/2, 1/2)         0.574928
  Bayesian Beta(10^-5, 10^-5)     0.588925
  Bayesian Beta(10^-7, 10^-7)     shown below
Comparison for Example 1 (2)
  Method                          Estimate
  Bayesian Beta(10, 10)           0.559905
  Bayesian Beta(10^2, 10^2)       0.520366
  Bayesian Beta(10^4, 10^4)       0.500273
  Bayesian Beta(10^5, 10^5)       0.500027
  Bayesian Beta(10^n, 10^n)       → 0.5 as n → ∞

  Bayesian Beta(10^-7, 10^-7)     0.193891, 0.400567, 0.737646, 0.641388 (not stationary)
Part 11 Gibbs Sampling Strategy
Sampling Strategy (1)
Strategy I: Run one chain for a long time. After some "burn-in" period, sample points every fixed number of steps.
The code example of Gibbs sampling in the previous lecture uses sampling strategy I.
http://www.cs.technion.ac.il/~cs236372/tirgul09.ps
Figure: burn-in, then N samples from one chain.
Sampling Strategy (2)
Strategy II: Run the chain N times, each run for M steps. Each run starts from a different starting point. Return the last state in each run.
Figure: burn-in for each chain, then N samples taken from the last state of each chain.
Sampling Strategy (3)
Strategy II by R:
    N <- 100; num <- 16; alpha <- 5; beta <- 7
    sample <- matrix(0, nrow = N, ncol = 2)
    for (k in 1:N) {
      # Each chain starts from its own random point: y_0 ~ U(0, 1), x_0 ~ Bin(num, y_0).
      tempy <- runif(1); tempx <- rbinom(1, num, tempy)
      # Burn-in for this chain (at most 101 batches of N sweeps).
      j <- 0; Forward <- 1; Afterward <- 0
      while ((abs(Forward - Afterward) > 0.001) && (j <= 100)) {
        Forward <- Afterward; Afterward <- 0
        for (i in 1:N) {
          tempy <- rbeta(1, tempx + alpha, num - tempx + beta)
          tempx <- rbinom(1, num, tempy)
          Afterward <- Afterward + tempx
        }
        Afterward <- Afterward / N; j <- j + 1
      }
      # Keep only one more draw from this chain.
      tempy <- rbeta(1, tempx + alpha, num - tempx + beta)
      tempx <- rbinom(1, num, tempy)
      sample[k, ] <- c(tempx, tempy)
    }
Sampling Strategy (4)
Strategy II by C/C++:
Strategy Comparison
Strategy I:
Performs "burn-in" only once, which saves time.
Samples might be correlated (although only weakly).
Strategy II:
Better chance of "covering" the state space, especially if the chain is slow to reach stationarity.
Must perform "burn-in" steps for each chain, which takes more time.
Hybrid Strategies (1)
Run several chains and take a few samples from each. This combines the benefits of both strategies.
Figure: burn-in for each chain, then samples drawn from each chain.
Hybrid Strategies (2)
Hybrid Strategy by R (uses N, num, alpha and beta from Strategy II):
    sample <- matrix(0, nrow = N, ncol = 2)
    loc <- 1                                   # next row of 'sample' to fill
    while (loc != (N + 1)) {
      tempy <- runif(1); tempx <- rbinom(1, num, tempy); j <- 0
      # Number of samples to take from this chain: 1 to max(1, N - loc).
      pN <- floor(runif(1) * (N - loc)) + 1
      cat(pN, '\n'); Forward <- 1; Afterward <- 0
      while ((abs(Forward - Afterward) > 0.001) && (j <= 100)) {
        Forward <- Afterward; Afterward <- 0
        for (i in 1:N) {
          tempy <- rbeta(1, tempx + alpha, num - tempx + beta)
          tempx <- rbinom(1, num, tempy)
          Afterward <- Afterward + tempx
        }
        Afterward <- Afterward / N; j <- j + 1
      }
      for (i in loc:(loc + pN - 1)) {
        tempy <- rbeta(1, tempx + alpha, num - tempx + beta)
        tempx <- rbinom(1, num, tempy)
        sample[i, ] <- c(tempx, tempy)
      }
      loc <- loc + pN
    }
Hybrid Strategies (3)
Hybrid Strategy by C/C++:
Part 12 Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm (1)
Another kind of MCMC method. The Metropolis-Hastings algorithm can draw samples from any probability distribution $\pi(x)$, requiring only that a function proportional to the density can be calculated at $x$.
Process in three steps: set up a Markov chain; run the chain until stationary; estimate with Monte Carlo methods.
http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm
Metropolis-Hastings Algorithm (2)
Let $\pi = (\pi_1, \ldots, \pi_n)$ be a probability density (or mass) function (pdf or pmf). $f(\cdot)$ is any function and we want to estimate
$$I = E_\pi(f) = \sum_{i=1}^{n} f(i)\, \pi_i.$$
Construct the transition matrix $P = (P_{ij})$ of an irreducible Markov chain with states $1, 2, \ldots, n$, where
$$P_{ij} = \Pr(X_{t+1} = j \mid X_t = i), \qquad X_t \in \{1, 2, \ldots, n\},$$
and $\pi$ is its unique stationary distribution.
Metropolis-Hastings Algorithm (3)
Run this Markov chain for times $t = 1, \ldots, N$ and calculate the Monte Carlo sum
$$\hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t);$$
then $\hat{I} \to I$ as $N \to \infty$.
Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models. 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
Metropolis-Hastings Algorithm (4)
In order to perform this method for a given distribution $\pi$, we must construct a Markov chain transition matrix $P$ with $\pi$ as its stationary distribution, i.e. $\pi P = \pi$.
Consider a matrix $P$ made to satisfy the reversibility condition $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i$ and $j$. This property ensures that $\sum_i \pi_i P_{ij} = \pi_j$ for all $j$, and hence $\pi$ is a stationary distribution for $P$.
Metropolis-Hastings Algorithm (5)
Let a proposal $Q = (Q_{ij})$ be irreducible, where $Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$, and the range of $Q$ is equal to the range of $\pi$.
However, $\pi$ does not have to be a stationary distribution of $Q$.
Process: tweak $Q_{ij}$ to yield $\pi$.
Figure: states drawn from $Q_{ij}$ (not $\pi$) → tweak → states drawn from $P_{ij}$ ($\pi$).
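Metropolis-Hastings Algorithm (6)
We assume that $P_{ij}$ has the form
$$P_{ij} = Q_{ij}\, \alpha(i, j) \quad (i \ne j), \qquad P_{ii} = 1 - \sum_{j \ne i} P_{ij},$$
where $\alpha(i, j)$ is called the acceptance probability. That is, given $X_t = i$ and a proposed state $j$ drawn from $Q_{i\cdot}$, take
$$X_{t+1} = \begin{cases} j & \text{with probability } \alpha(i, j), \\ i & \text{with probability } 1 - \alpha(i, j). \end{cases}$$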
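Metropolis-Hastings Algorithm (7)
For $i \ne j$, the reversibility condition $\pi_i P_{ij} = \pi_j P_{ji}$ becomes
$$\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i). \qquad (*)$$
WLOG, for some $(i, j)$, $\pi_i Q_{ij} > \pi_j Q_{ji}$. In order to achieve equality (*), one can introduce a probability $\alpha(i, j) < 1$ on the left-hand side and set $\alpha(j, i) = 1$ on the right-hand side.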
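Metropolis-Hastings Algorithm (8)
Then
$$\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i) = \pi_j Q_{ji} \quad \Longrightarrow \quad \alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}.$$
These arguments imply that the acceptance probability $\alpha(i, j)$ must be
$$\alpha(i, j) = \min\left(1, \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right).$$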
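The C++ sketch below uses hypothetical names. It builds $P$ from a proposal $Q$ and a target $\pi$ on a finite state space, following the form above: $P_{ij} = Q_{ij}\alpha(i,j)$ for $i \ne j$ and $P_{ii} = 1 - \sum_{j \ne i} P_{ij}$. It then checks detailed balance, and it assumes $\pi_i > 0$ for every state.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// alpha(i, j) = min(1, pi_j Q_ji / (pi_i Q_ij)), defined when pi_i Q_ij > 0.
double acceptance(const std::vector<double>& pi, const Matrix& Q, std::size_t i, std::size_t j) {
    assert(pi[i] * Q[i][j] > 0.0);
    return std::min(1.0, pi[j] * Q[j][i] / (pi[i] * Q[i][j]));
}

// P_ij = Q_ij alpha(i, j) for j != i, and P_ii = 1 - sum_{j != i} P_ij.
Matrix metropolisHastingsMatrix(const std::vector<double>& pi, const Matrix& Q) {
    assert(pi.size() == Q.size());
    const std::size_t k = pi.size();
    Matrix P(k, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < k; ++i) {
        double offDiagonal = 0.0;
        for (std::size_t j = 0; j < k; ++j) {
            if (j == i || Q[i][j] == 0.0) continue;  // never proposed: P_ij stays 0
            P[i][j] = Q[i][j] * acceptance(pi, Q, i, j);
            offDiagonal += P[i][j];
        }
        P[i][i] = 1.0 - offDiagonal;
    }
    return P;
}

// Detailed balance: pi_i P_ij == pi_j P_ji for every pair (within tol).
bool detailedBalance(const std::vector<double>& pi, const Matrix& P, double tol) {
    for (std::size_t i = 0; i < pi.size(); ++i)
        for (std::size_t j = 0; j < pi.size(); ++j)
            if (std::abs(pi[i] * P[i][j] - pi[j] * P[j][i]) > tol) return false;
    return true;
}
```

For example, with $\pi = (0.2, 0.3, 0.5)$ and $Q_{ij} = 1/3$, $P_{21} = \frac13 \min\left(1, \frac{0.2}{0.3}\right) = \frac29$.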
Metropolis-Hastings Algorithm (9)
M-H Algorithm:
Step 1: Choose an irreducible Markov chain transition matrix $Q$ with transition probability $Q_{ij}$.
Step 2: Let $t = 0$ and initialize $X_0$ from the states in $Q$.
Step 3 (Proposal Step): Given $X_t = i$, sample $Y = j$ from $Q_{iY}$.
Metropolis-Hastings Algorithm (10)
M-H Algorithm (cont.):
Step 4 (Acceptance Step): Generate a random number $U$ from $\text{Unif}(0, 1)$. If $U \le \alpha(i, j)$, set $X_{t+1} = Y = j$; else set $X_{t+1} = X_t = i$.
Step 5: Set $t = t + 1$; repeat Steps 3 to 5 until convergence.
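Below is a C++ sketch of Steps 1 to 5 on a finite state space. The proposal is an assumption made for illustration: from state i, propose i - 1 or i + 1 (around a cycle) with probability 1/2 each. This Q is symmetric and irreducible, so $\alpha(i, j)$ reduces to $\min(1, w_j / w_i)$, where the weights $w$ need only be proportional to $\pi$. The names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

struct MHResult {
    std::vector<int> states;   // X_0, X_1, ..., X_steps
    double acceptanceRate;     // fraction of proposals accepted in Step 4
};

// Steps 1-5 with target pi proportional to the positive weights w.
MHResult discreteMH(const std::vector<double>& w, int x0, int steps, unsigned seed) {
    const int k = static_cast<int>(w.size());
    assert(k >= 2 && x0 >= 0 && x0 < k && steps > 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    MHResult result{{x0}, 0.0};                        // Step 2: t = 0, X_0 = x0
    int accepted = 0;
    int x = x0;
    for (int t = 0; t < steps; ++t) {
        // Step 3 (proposal): Y = x + 1 or x - 1 around the cycle.
        const int y = coin(rng) ? (x + 1) % k : (x + k - 1) % k;
        // Step 4 (acceptance): U ~ Unif(0, 1); accept if U <= alpha(x, y).
        const double alpha = std::min(1.0, w[y] / w[x]);
        if (unif(rng) <= alpha) {
            x = y;
            ++accepted;
        }
        result.states.push_back(x);                    // Step 5: t = t + 1
    }
    result.acceptanceRate = static_cast<double>(accepted) / steps;
    return result;
}
```

With weights (1, 2, 3, 4), the long-run state frequencies should approach (0.1, 0.2, 0.3, 0.4).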
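Metropolis-Hastings Algorithm (11)
An Example of Steps 3 to 5: proposals $Y_1, Y_2, Y_3, \ldots, Y_N$ are drawn from $Q$ and tweaked into the chain $X_1, X_2, X_3, \ldots, X_N$ governed by $P$ (here $X_1 = Y_1$, $X_2 = Y_1$, $X_3 = Y_3$):
1. $Y_1$ from $Q_{X_0, Y}$ and $\alpha(X_0, Y_1) = \min\left(1, \frac{\pi(Y_1)\, Q_{Y_1, X_0}}{\pi(X_0)\, Q_{X_0, Y_1}}\right)$: accepted, so $X_1 = Y_1$.
2. $Y_2$ from $Q_{X_1, Y}$ and $\alpha(X_1, Y_2) = \min\left(1, \frac{\pi(Y_2)\, Q_{Y_2, X_1}}{\pi(X_1)\, Q_{X_1, Y_2}}\right)$: not accepted, so $X_2 = X_1 = Y_1$.
3. $Y_3$ from $Q_{X_2, Y}$ and $\alpha(X_2, Y_3) = \min\left(1, \frac{\pi(Y_3)\, Q_{Y_3, X_2}}{\pi(X_2)\, Q_{X_2, Y_3}}\right)$: accepted, so $X_3 = Y_3$.
And so on.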
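Metropolis-Hastings Algorithm (12)
We may define a "rejection rate" as the proportion of times t for which $X_{t+1} = X_t$. Clearly, in choosing $Q$, high rejection rates are to be avoided.
Example: with a symmetric proposal $Q$, if the proposal $Y$ falls where $\pi$ is much lower than at $X_t$, then $\pi(Y)/\pi(X_t)$ will be small ($\ll 1$) and it is likely that $X_{t+1} = X_t$. More iterations of Steps 3 to 5 are then needed.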
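The C++ sketch below illustrates this effect with an assumed example that is not from the slides: random-walk Metropolis for a standard normal target with proposals $Y = X_t + U$, $U \sim U(-\delta, \delta)$. A large $\delta$ often proposes points far in the tails, where $\pi(Y)/\pi(X_t)$ is small, so the rejection rate rises. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Rejection rate (fraction of t with X_{t+1} = X_t) of random-walk Metropolis for an
// assumed N(0, 1) target with proposals Y = X_t + U, U ~ Uniform(-delta, delta).
double rejectionRate(double delta, int steps, unsigned seed) {
    assert(delta > 0.0 && steps > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> increment(-delta, delta);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double x = 0.0;
    int rejected = 0;
    for (int t = 0; t < steps; ++t) {
        const double y = x + increment(rng);
        // pi(Y) / pi(X_t) for the N(0, 1) density; the symmetric proposal cancels from alpha.
        const double ratio = std::exp(-0.5 * (y * y - x * x));
        if (unif(rng) <= std::min(1.0, ratio)) x = y; else ++rejected;
    }
    return static_cast<double>(rejected) / steps;
}
```

Very small steps are not a cure either. They give low rejection rates, but the chain then moves slowly, so the step size is a trade-off.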
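Example (1)
Simulate a bivariate normal distribution:
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N(\mu, \Sigma), \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix},$$
i.e.
$$\Pi(X) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right).$$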
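Example (2)
Metropolis-Hastings algorithm:
1. Initialize $X_0$ and set $i = 0$.
2. Generate $U_1 \sim U(-1, 1)$ and $U_2 \sim U(-1, 1)$ such that $U_1$ and $U_2$ are independent; then $U_i = (U_1, U_2)^T$.
3. $Y_i = X_i + U_i$.
4. $X_{i+1} = Y_i$ with probability $\alpha(X_i, Y_i) = \min\left(1, \frac{\Pi(Y_i)}{\Pi(X_i)}\right)$; otherwise $X_{i+1} = X_i$, with probability $1 - \alpha(X_i, Y_i)$.
5. $i = i + 1$; repeat steps 2 to 4 until convergence.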
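Below is a C++ sketch of steps 1 to 5; the names, burn-in and seed are assumptions. It works with $\log \Pi$, so the normalizing constant $1/(2\pi|\Sigma|^{1/2})$ cancels and is never computed. It also uses the closed-form inverse of a 2 × 2 covariance matrix. With $\mu = (3, 7)$ and $\Sigma$ having unit variances and covariance 0.4, as in the R code, the sample means and correlation should be near 3, 7 and 0.4.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Point { double x1, x2; };

// log Pi(x) up to an additive constant for N(mu, Sigma), Sigma = [[s11, s12], [s12, s22]].
double logPi(const Point& x, const Point& mu, double s11, double s12, double s22) {
    const double det = s11 * s22 - s12 * s12;
    assert(s11 > 0.0 && det > 0.0);                      // Sigma must be positive definite
    const double d1 = x.x1 - mu.x1, d2 = x.x2 - mu.x2;
    return -0.5 * (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det;
}

// Y_i = X_i + U_i with U_1, U_2 ~ U(-1, 1) independent, accepted with probability
// min(1, Pi(Y_i) / Pi(X_i)). Discards burnIn steps, then keeps n states.
std::vector<Point> bivariateNormalMH(const Point& mu, double s11, double s12, double s22,
                                     int burnIn, int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> step(-1.0, 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    Point x = mu;                                        // X_0 = mu, as in the R code
    double logPx = logPi(x, mu, s11, s12, s22);
    std::vector<Point> out;
    out.reserve(n);
    for (int i = 0; i < burnIn + n; ++i) {
        const Point y{x.x1 + step(rng), x.x2 + step(rng)};
        const double logPy = logPi(y, mu, s11, s12, s22);
        // U < Pi(Y) / Pi(X) on the log scale; this also covers the min(1, .) case.
        if (std::log(unif(rng)) < logPy - logPx) { x = y; logPx = logPy; }
        if (i >= burnIn) out.push_back(x);
    }
    return out;
}

struct Summary { double mean1, mean2, corr; };

// Sample means and correlation, for comparing the draws with mu and Sigma.
Summary summarize(const std::vector<Point>& s) {
    assert(s.size() > 1);
    double m1 = 0.0, m2 = 0.0;
    for (const auto& p : s) { m1 += p.x1; m2 += p.x2; }
    m1 /= s.size();
    m2 /= s.size();
    double c11 = 0.0, c12 = 0.0, c22 = 0.0;
    for (const auto& p : s) {
        c11 += (p.x1 - m1) * (p.x1 - m1);
        c12 += (p.x1 - m1) * (p.x2 - m2);
        c22 += (p.x2 - m2) * (p.x2 - m2);
    }
    return {m1, m2, c12 / std::sqrt(c11 * c22)};
}
```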
Example of M-H Algorithm by R
    # Bivariate normal density Pi(x) with mean mu and covariance sigma.
    Pi <- function(x, mu, sigma) {
      d <- x - mu
      drop(exp(-0.5 * t(d) %*% solve(sigma) %*% d)) / (2 * pi * sqrt(det(sigma)))
    }
    # One M-H step: propose Y = X + U with U1, U2 ~ U(-1, 1),
    # accept with probability min(1, Pi(Y) / Pi(X)).
    mh_step <- function(tempX) {
      tempY <- tempX + runif(2, -1, 1)
      if (min(Pi(tempY, mu, sigma) / Pi(tempX, mu, sigma), 1) > runif(1)) tempY else tempX
    }
    N <- 1000; mu <- c(3, 7)
    sigma <- matrix(c(1, 0.4, 0.4, 1), nrow = 2)
    sample <- matrix(0, nrow = N, ncol = 2)
    j <- 0; tempX <- mu
    # Burn-in: repeat batches of N steps until the batch mean of X1 is within
    # 0.001 of mu[1], or until 1000 batches have been run.
    while ((j < 1000) && (abs(mean(sample[, 1]) - mu[1]) > 0.001)) {
      for (i in 1:N) {
        tempX <- mh_step(tempX)
        sample[i, ] <- tempX
      }
      j <- j + 1
    }
    # Keep the next N states.
    for (i in 1:N) {
      tempX <- mh_step(tempX)
      sample[i, ] <- tempX
    }
Example of M-H Algorithm by C (1)
Example of M-H Algorithm by C (2)
Example of M-H Algorithm by C (3)
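A Figure to Check Simulation Results
Black points are simulated samples; colored points mark contours of the probability density.
Figure: scatter plot of the samples (X2 against X1) with density contours.
    plot(sample, xlab = "X1", ylab = "X2")
    # Overlay points where the density Pi() is within 0.003 of each level
    # 0.01, 0.03, ..., 0.29, alternating colors between levels.
    j <- 0
    for (i in seq(0.01, 0.3, 0.02)) {
      for (x in seq(0, 6, 0.1)) {
        for (y in seq(4, 11, 0.1)) {
          if (abs(Pi(c(x, y), mu, sigma) - i) < 0.003) points(x, y, col = (j %% 2 + 2), pch = 19)
        }
      }
      j <- j + 1
    }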