HW3 Solutions
36-724: Applied Bayesian and Computational Statistics
March 2, 2006
Problem 1
a Fatal Accidents – Poisson(θ)
I will set the prior for θ to be a Gamma, as it is the conjugate prior. I will let the parameters be α = 0.5, β = 0.01, as this is near the noninformative Jeffreys prior from the last problem. Thus, the posterior distribution can be calculated.
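For reference, the Jeffreys prior alluded to here follows from the Fisher information of the Poisson model; a brief derivation (not part of the original solution) is:

```latex
% For a single observation y ~ Poisson(theta):
% log p(y | theta) = y log(theta) - theta - log(y!)
I(\theta) = -\mathbb{E}\!\left[\frac{\partial^2}{\partial\theta^2}\log p(y\mid\theta)\right]
          = \mathbb{E}\!\left[\frac{y}{\theta^2}\right]
          = \frac{1}{\theta},
\qquad
p_J(\theta) \;\propto\; \sqrt{I(\theta)} \;=\; \theta^{-1/2}.
```

Formally θ^(-1/2) is an improper Gamma(1/2, 0) density, so Gamma(α = 0.5, β = 0.01) is a proper prior that closely approximates it.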
p(θ|y) ∝ p(y|θ)p(θ)
       = e^(-nθ) θ^(Σ yi) · θ^(-1/2) e^(-0.01θ)
       = e^(-θ(n+0.01)) θ^(Σ yi - 1/2)

Note that this is a Gamma(α = Σ yi + 1/2, β = n + 0.01).
I get the 95% predictive interval using simulation. I found the interval to be [14, 35].
> theta=rgamma(1000, sum(y)+0.5, length(y)+0.01)
> ypred=rpois(1000, theta)
> sort(ypred)[c(25, 975)]
[1] 14 35
b Let θ = α + βt
(a) Noninformative priors
What do we know about α + βt? We know it must be greater than zero, since it is the mean of a Poisson. One option is the flat prior p(α, β) ∝ 1. Given a fixed range of time values, we may simply truncate this prior to ensure that α + βt > 0 on the support (technically the interior of the support). Another option is Jeffreys' prior; however, with multiple parameters the use of Jeffreys' prior is more controversial.
(b) Informative Prior:
One may postulate that the number of accidents decreases over time and choose a prior which favors negative values of β. Since we do not hear of plane crashes all the time in the news, we
may believe that α is a fairly small positive number. To put these beliefs into practice, we can take, for example, independent normals with means 50 and −5 and standard deviations 20 and 5.
Another informative prior could come from a regression analysis on the data, giving estimates and distributions for α and β, which we then assume to be independent (not necessarily realistic). However, this would be double counting the data, and not really a "prior."
I will draw the contours assuming the regression analysis. (Again, this is not an ideal method.) We know that the regression coefficients are normally distributed. Performing a regression analysis, we get the following.
Call:
lm(formula = y ~ t)

Residuals:
   Min     1Q Median     3Q    Max
-4.576 -2.320 -1.761  3.273  5.818

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  28.8667     2.7494  10.499 5.89e-06 ***
t            -0.9212     0.4431  -2.079   0.0712 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.025 on 8 degrees of freedom
Multiple R-Squared: 0.3508, Adjusted R-squared: 0.2696
F-statistic: 4.322 on 1 and 8 DF, p-value: 0.07123
So I will let α ∼ N(28.87, std=2.75) and β ∼ N(−0.92, std=0.44). The contours are in Figure 1.
> z=matrix(NA, 100, 100)
> alpha.grid=seq(0, 50, length=100)
> beta.grid=seq(-30, 20, length=100)
> funval=function(alpha, beta)
+   return(dnorm(alpha, 28.87, 2.75)*dnorm(beta, -0.92, 0.44))
> for (i in 1:100)
+   for (j in 1:100)
+     z[i,j]=funval(alpha.grid[i], beta.grid[j])
> contour(alpha.grid, beta.grid, z, xlab="alpha", ylab="beta",
+   xlim=c(20, 40), ylim=c(-4, 2))
(c) Posterior Density & Sufficient Statistics
Using the noninformative prior p(α, β) ∝ 1, we can compute the posterior.

p(α, β|y) ∝ 1 · ∏_{i=1}^{10} (α + βti)^(yi) e^(-(α + βti))
          = e^(-(nα + β Σ ti)) ∏_i (α + βti)^(yi)
Figure 1: Prior Contours for 4b
In this case, there is no way to write the posterior as a "reduced" function of the data, so the data themselves are the sufficient statistics, i.e. (y1, t1), ..., (y10, t10).
(d) Proper Posterior
To check that the posterior is proper, we need to show ∫∫ p(α, β|y) dα dβ < ∞. The hand-wavy answer is that the exponential term dominates any polynomial, so the integrand must be integrable. A more precise answer is the following. If we let Ω be the parameter space (possible values) for (α, β), we get

sup_{(α,β)∈Ω} (α + βti) e^(-(α + βti)) ≤ sup_{u∈[0,∞)} u e^(-u) = M

for some constant M (in fact M = e^(-1)). Hence, bounding nine of the ten factors this way, we can create an upper bound for the integral,

∫∫ M^9 (α + βt1) e^(-(α + βt1)) dα dβ,

which is integrable.
(e) Linear RegressionSee the above section on Prior Distributions. We found α ∼ N(28.87, std=2.75) and β ∼N(−0.92, std=0.44).
(f) Posterior
Recall that the prior is p(α, β) ∝ 1, so the posterior is proportional to e^(-(nα + β Σ ti)) ∏_i (α + βti)^(yi). A contour plot of the posterior is in Figure 2.
> alpha.grid=seq(15, 50, length=100)
> beta.grid=seq(-3, 1, length=100)
> for (i in 1:100)
+   for (j in 1:100)
+     z[i,j]=postfun(alpha.grid[i], beta.grid[j])
> contour(alpha.grid, beta.grid, z, xlim=c(20, 40),
+   ylim=c(-2.5, 0.5), xlab="alpha", ylab="beta")
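The listings above call postfun without defining it anywhere. A minimal sketch consistent with the unnormalized posterior derived in part (c) might look like the following; the data vector is the fatal-accident counts for 1976-1985 (these reproduce the regression output quoted earlier), and the name postfun is taken from the listings.

```r
# Unnormalized posterior under the flat prior p(alpha, beta) ~ 1:
# p(alpha, beta | y) propto prod_i (alpha + beta*t_i)^(y_i) * exp(-(alpha + beta*t_i))
y <- c(24, 25, 31, 31, 22, 21, 26, 20, 16, 22)  # fatal accidents, 1976-1985
t <- 1:10

postfun <- function(alpha, beta) {
  lambda <- alpha + beta * t
  if (any(lambda <= 0)) return(0)      # truncate: every Poisson mean must be positive
  exp(sum(y * log(lambda) - lambda))   # sum on the log scale, then exponentiate
}
```

Returning 0 whenever some α + βti ≤ 0 implements the truncation discussed for the flat prior in part (a).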
(g) Posterior density for expected number of accidents in 1986.
Below is the code showing how I sampled from the joint posterior. I then computed α + 11β for each sampled pair and plotted the histogram in Figure 3.
Figure 2: Posterior Contours for 4f
> alpha.grid=seq(20, 40, length=100)
> beta.grid=seq(-2.5, 0.5, length=100)
> for (i in 1:100)
+   for (j in 1:100)
+     z[i,j]=postfun(alpha.grid[i], beta.grid[j])
> zvec=c(z)   # column-major: linear index = (j-1)*100 + i
> post.sample=sample(length(alpha.grid)*length(beta.grid),
+   length(alpha.grid), replace=T, prob=zvec)
> alpha=rep(NA, length(alpha.grid))
> beta=rep(NA, length(beta.grid))
> for (m in 1:100) {
+   j=post.sample[m]%/%100 + 1
+   i=post.sample[m]%%100
+   if (i == 0) { i=100; j=j-1 }   # braces needed: without them j-1 runs unconditionally
+   alpha[m]=alpha.grid[i]
+   beta[m]=beta.grid[j]
+ }
> hist(alpha+11*beta)
(h) Number of fatal accidents
For this, I generate 100 new values of y based on the sampled values of α and β. Doing this, I find the 95% posterior predictive interval to be [9, 30].
> newy=rpois(100, alpha+beta*11)
> quantile(newy, c(0.025, 0.975))
 2.5% 97.5%
8.475 29.575
(i) Informative prior vs. posterior
Figure 3: Posterior Expected Number of Accidents for 4g
The informative prior does differ from the posterior: the informative prior assumes independence, while α and β are clearly not independent in the posterior. That said, I knew when I constructed the informative prior that the independence assumption was not accurate.
Problem 2
a L(θ|~y) = ∏_i pi^(yi) = (2 + θ)^(y1) (1 − θ)^(y2+y3) θ^(y4) / 4^n
b We use simple acceptance/rejection sampling to do our Monte Carlo simulation.
We get the following summary statistics:
mean: 0.62194
sd: 0.05134142
accept. rate: 0.12723
Plots of the posterior density and histograms of the samples for 2b and 2d are given below.
[Figures: histograms of the accepted samples for 2b and 2d, with the grid-evaluated posterior density curve overlaid on each.]
# This is the code for 2b
y=c(125,18,20,34)

loglik = function(theta)
  y[1]*log(2+theta)+(y[2]+y[3])*log(1-theta)+y[4]*log(theta)

fudge = 1e-3
M = -nlm( function(t) -loglik(t), .5)$minimum + fudge

n = 100000
samples = runif(n,0,1)
U = runif(n,0,1)
accept = (log(U) < loglik(samples) - M)
samples = samples[accept]
sum(accept)/n
mean(samples)
sd(samples)

hist(samples, br=100, freq=F)
s = seq(0, 1, length=1000)
gridpost = exp(loglik(s))
gridpost = gridpost/sum(gridpost)*length(s)
lines(s, gridpost)
c The estimated mean and standard deviation from the Laplace approximation are 0.6227477 and 0.05095122, respectively. The code is given below.
h = function(t) -loglik(t)/n
hstar = function(theta)
  -(y[1]*log(2+theta)+(y[2]+y[3])*log(1-theta)+(y[4]+1)*log(theta))/n
hstar2 = function(theta)
  -(y[1]*log(2+theta)+(y[2]+y[3])*log(1-theta)+(y[4]+2)*log(theta))/n

min = nlm( h, .5, hessian=T)
min.star = nlm( hstar, .5, hessian=T)
min.star2 = nlm( hstar2, .5, hessian=T)

mu = sqrt(1/min.star$hessian)*exp(-n*min.star$minimum) /
  (sqrt(1/min$hessian)*exp(-n*min$minimum))
mom2 = sqrt(1/min.star2$hessian)*exp(-n*min.star2$minimum) /
  (sqrt(1/min$hessian)*exp(-n*min$minimum))
sqrt(mom2 - mu^2)
d The likelihood function is the same. We get the following summary statistics from running the same code as in part b with a different data vector.
mean: 0.8304324
sd: 0.1090006
accept. rate: 0.23798
e We have that our “old” posterior is cL(θ|~y) where c is the appropriate normalizing constant.
We have as the posterior mean under the new Beta(5, 15) prior

E(θ|~y) = ∫ θ L(θ|~y) p(θ) dθ / ∫ L(θ|~y) p(θ) dθ
        = ∫ θ p(θ) · cL(θ|~y) dθ / ∫ p(θ) · cL(θ|~y) dθ
        ≈ ∑_i θi · dbeta(θi, 5, 15) / ∑_i dbeta(θi, 5, 15)        (1)

where θi is the ith draw from our old posterior. Similarly, we may estimate the second moment. The estimated posterior mean and standard deviation under the Beta(5, 15) prior are 0.5542206 and 0.04916435, respectively.
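A short sketch of this reweighting in R. To keep it self-contained, the "old" (flat-prior) posterior draws are regenerated here by grid sampling the part-b likelihood rather than reusing the earlier rejection samples; the variable names are my own.

```r
y <- c(125, 18, 20, 34)
loglik <- function(theta)
  y[1]*log(2+theta) + (y[2]+y[3])*log(1-theta) + y[4]*log(theta)

# Draws from the "old" posterior (flat prior) via a grid approximation
set.seed(1)
s <- seq(0.001, 0.999, length=2000)
p <- exp(loglik(s) - max(loglik(s)))     # subtract the max before exponentiating
draws <- sample(s, 10000, replace=TRUE, prob=p)

# Importance weights are the new Beta(5, 15) prior evaluated at the draws,
# exactly as in equation (1)
w <- dbeta(draws, 5, 15)
new.mean <- sum(draws * w) / sum(w)
new.sd <- sqrt(sum(draws^2 * w) / sum(w) - new.mean^2)
```

The ratio in equation (1) is just a weighted average of the old draws, so no new sampling from the Beta(5, 15) posterior is required.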
Problem 3 We are given that yij = θj + εij, εij ∼ N(0, σ²), θj = μ + γj, γj ∼ N(0, τ²). Combining these, we see that yij = μ + γj + εij, εij ∼ N(0, σ²), γj ∼ N(0, τ²), where the γ's are independent of each other and of the ε's, and the ε's are independent of each other.
a Corr(y_{i1,j}, y_{i2,j}), i1 ≠ i2

Corr(y_{i1,j}, y_{i2,j})
  = Cov(y_{i1,j}, y_{i2,j}) / [ sqrt(Var(y_{i1,j})) sqrt(Var(y_{i2,j})) ]
  = Cov(μ + γj + ε_{i1,j}, μ + γj + ε_{i2,j}) / [ sqrt(Var(μ + γj + ε_{i1,j})) sqrt(Var(μ + γj + ε_{i2,j})) ]

(μ is constant)
  = Cov(γj + ε_{i1,j}, γj + ε_{i2,j}) / [ sqrt(Var(γj + ε_{i1,j})) sqrt(Var(γj + ε_{i2,j})) ]
  = [ Cov(γj, γj) + Cov(γj, ε_{i1,j}) + Cov(γj, ε_{i2,j}) + Cov(ε_{i1,j}, ε_{i2,j}) ] / [ sqrt(Var(γj + ε_{i1,j})) sqrt(Var(γj + ε_{i2,j})) ]

(by independence)
  = (τ² + 0 + 0 + 0) / (σ² + τ²)
  = τ² / (σ² + τ²)
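This intraclass correlation is easy to verify by simulation; a quick check with illustrative values σ = 2, τ = 3, so that τ²/(σ² + τ²) = 9/13 ≈ 0.692:

```r
set.seed(1)
sigma <- 2; tau <- 3; J <- 200000      # J groups, two observations per group
gamma <- rnorm(J, 0, tau)              # shared group effect gamma_j
y1 <- 10 + gamma + rnorm(J, 0, sigma)  # y_{i1,j} = mu + gamma_j + eps_{i1,j}
y2 <- 10 + gamma + rnorm(J, 0, sigma)  # y_{i2,j}: same gamma_j, fresh eps
cor(y1, y2)                            # approx tau^2 / (sigma^2 + tau^2)
```

The shared γj term is what induces the correlation; dropping it (as in part b, where the groups differ) drives the correlation to zero.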
b Corr(y_{i1,j1}, y_{i2,j2}), j1 ≠ j2

Corr(y_{i1,j1}, y_{i2,j2})
  = Cov(μ + γ_{j1} + ε_{i1,j1}, μ + γ_{j2} + ε_{i2,j2}) / [ sqrt(Var(μ + γ_{j1} + ε_{i1,j1})) sqrt(Var(μ + γ_{j2} + ε_{i2,j2})) ]

(μ is constant)
  = Cov(γ_{j1} + ε_{i1,j1}, γ_{j2} + ε_{i2,j2}) / [ sqrt(Var(γ_{j1} + ε_{i1,j1})) sqrt(Var(γ_{j2} + ε_{i2,j2})) ]
  = [ Cov(γ_{j1}, γ_{j2}) + Cov(γ_{j2}, ε_{i1,j1}) + Cov(γ_{j1}, ε_{i2,j2}) + Cov(ε_{i1,j1}, ε_{i2,j2}) ] / [ sqrt(Var(γ_{j1} + ε_{i1,j1})) sqrt(Var(γ_{j2} + ε_{i2,j2})) ]
  = (0 + 0 + 0 + 0) / [ sqrt(Var(γ_{j1} + ε_{i1,j1})) sqrt(Var(γ_{j2} + ε_{i2,j2})) ]
  = 0
Problem 4
a Graphical Summaries
See the included pdf file for history, autocorrelation, and density plots, and for the numerical summaries of sigma.theta and theta[1]. My BUGS code is below.
list(J=8, y=c(28, 8, -3, 7, -1, 1, 18, 12),
     sigma.y=c(15, 10, 16, 11, 9, 11, 10, 18))
model {
  for (j in 1:J) {
    y[j] ~ dnorm(theta[j], tau.y[j])
    theta[j] ~ dnorm(mu, tau.theta)
    tau.y[j] <- pow(sigma.y[j], -2)
  }
  mu ~ dnorm(0.0, 1.0E-6)
  tau.theta <- pow(sigma.theta, -2)
  sigma.theta ~ dunif(0, 1000)
}
b Change for sigma.theta
See the included pdf file for history, autocorrelation, and density plots, and for the numerical summaries of sigma.theta and theta[1]. My BUGS code is below.
list(J=8, y=c(28, 8, -3, 7, -1, 1, 18, 12),
     sigma.y=c(15, 10, 16, 11, 9, 11, 10, 18))

model {
  for (j in 1:J) {
    y[j] ~ dnorm(theta[j], tau.y[j])
    theta[j] ~ dnorm(mu, tau.theta)
    tau.y[j] <- pow(sigma.y[j], -2)
  }
  mu ~ dnorm(0.0, 1.0E-6)
  tau.theta <- pow(sigma.theta, -2)
  v ~ dgamma(1, 1)
  sigma.theta <- sqrt(1/v)
}
Note: I chose to compare only sigma.theta and theta[1] in the write-up, to save space. Other variables could have been reviewed.
Comparing parts a and b, we see that the values for sigma.theta are generally larger for part a (mean 6.4, sd 5.51) than for part b (mean 1.52, sd 1.154). Both histories look as if the chains are stable and converged. However, it is clear that the autocorrelation for part a is larger than the autocorrelation for part b. The densities appear similarly shaped, with the differences in the summaries (i.e., mean and sd) reflected.
For theta[1] (chosen arbitrarily for comparison), we see a similar pattern. For part a, the mean was 10.99 and the sd was 8.14. For part b, the mean is smaller, at 7.5, and the sd is also smaller, at 4.38. Both histories look stable, indicating the chains have converged. Here, though, the autocorrelation for part a is smaller than the autocorrelation for part b. This could explain the smaller sd for part b.