Applied Bayesian Data Analysis
Roy Levy, Ph.D.
Upcoming Seminar: February 18-20, 2021, Remote Seminar
Regression
Multiple Regression
Multiple Regression Model: J Predictors
Multiple xs, y for each of n subjects
• y = (y1, y2, y3,…, yn)
• x = (x1, x2, x3,…, xn)
• xi = (xi1, xi2,…, xiJ)
yi = β0 + β1xi1 + … + βJxiJ + εi,  εi independent, ~ N(0, σε²)
yi | xi, β0, β1,…, βJ, σε² ~ N(β0 + β1xi1 + … + βJxiJ, σε²)
yi = β0 + β′xi + εi,  εi independent, ~ N(0, σε²),  β = (β1,…, βJ)
yi | xi, β0, β, σε² ~ N(β0 + β′xi, σε²)
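As a rough illustration of the model above, the following R sketch simulates data for n subjects and J predictors. Every numeric value (sample size, coefficients, error standard deviation, predictor distribution) is an assumption chosen for illustration, not something taken from the slides.

```r
# Simulate from y_i | x_i ~ N(beta0 + beta' x_i, sigma_eps^2).
# All numeric values below are illustrative assumptions.
set.seed(1)
n <- 50                 # number of subjects
J <- 2                  # number of predictors
beta0 <- -2.5           # intercept (assumed)
beta <- c(0.7, 0.4)     # slopes beta_1, ..., beta_J (assumed)
sigma_eps <- 2          # error standard deviation (assumed)

# Row i of x holds x_i = (x_i1, ..., x_iJ)
x <- matrix(rnorm(n * J, mean = 14, sd = 3), nrow = n, ncol = J)

# Conditional mean beta0 + beta' x_i for each subject
mu <- as.vector(beta0 + x %*% beta)

# Independent normal errors around the conditional mean
y <- rnorm(n, mean = mu, sd = sigma_eps)
```

Note that `rnorm()` is parameterized by the standard deviation σε, whereas the slides write the normal distribution in terms of the variance σε².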
Multiple Regression: Bayesian Analysis
Posterior Distribution
p(β0, β, σε | y, x) ∝ p(y | β0, β, σε, x) p(β0, β, σε)
Conditional Probability of the Data
p(β0, β, σε | y, x) ∝ p(y | β0, β, σε, x) p(β0, β, σε)
Assuming exchangeability of subjects
p(y | β0, β, σε, x) = Πi p(yi | β0, β, σε, xi)
Assuming conditional normality
yi | β0, β, σε, xi ~ N(β0 + β1xi1 + … + βJxiJ, σε²)
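Under these two assumptions the product of normal densities becomes a sum on the log scale. A minimal sketch in base R follows; the function name `log_lik` and the toy data are hypothetical, introduced only for illustration.

```r
# Log of prod_i N(y_i | beta0 + beta' x_i, sigma_eps^2).
# `log_lik` is a hypothetical helper name, not from the slides.
log_lik <- function(beta0, beta, sigma_eps, y, x) {
  mu <- as.vector(beta0 + x %*% beta)                   # conditional means
  sum(dnorm(y, mean = mu, sd = sigma_eps, log = TRUE))  # sum of log densities
}

# Toy data: 3 subjects, 2 predictors (made-up values)
x <- cbind(c(1, 2, 3), c(0, 1, 0))
y <- c(1.0, 2.5, 2.0)

ll <- log_lik(beta0 = 0, beta = c(1, 0.5), sigma_eps = 1, y = y, x = x)
print(ll)
```

Working on the log scale avoids numerical underflow when n is large, which is why samplers typically evaluate the log likelihood rather than the likelihood itself.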
Prior Distribution
p(β0, β, σε) = p(β0) p(β) p(σε)
p(β0) = N(μβ0, σ²β0)
σε ~ Exp(λ)
p(β) = ?  Multivariate prior?
p(β) = p(β1) p(β2) ⋯ p(βJ)
     = N(μβ1, σ²β1) N(μβ2, σ²β2) ⋯ N(μβJ, σ²βJ)
Assuming exchangeability
p(βj) = N(μβ, σ²β), j = 1,…, J
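The factored prior can be evaluated as a sum of log densities. The sketch below is one way to write it in base R; the function name `log_prior` and its argument names are hypothetical, and the default hyperparameter values are assumptions that mirror the N(0, 900) and Exp(1) priors used in the example later in these slides.

```r
# Log of p(beta0) p(beta_1) ... p(beta_J) p(sigma_eps) with normal priors on
# the coefficients and an exponential prior on sigma_eps.
# Function and argument names are hypothetical; defaults are assumptions.
log_prior <- function(beta0, beta, sigma_eps,
                      mu_b0 = 0, sd_b0 = 30,   # prior mean and SD for beta0
                      mu_b = 0, sd_b = 30,     # common prior mean and SD for slopes
                      rate = 1) {              # rate of the exponential prior
  if (sigma_eps <= 0) return(-Inf)             # sigma_eps must be positive
  dnorm(beta0, mean = mu_b0, sd = sd_b0, log = TRUE) +
    sum(dnorm(beta, mean = mu_b, sd = sd_b, log = TRUE)) +
    dexp(sigma_eps, rate = rate, log = TRUE)
}

lp <- log_prior(beta0 = 0, beta = c(0, 0), sigma_eps = 1)
print(lp)
```

Because the slopes share one prior under exchangeability, a single pair (`mu_b`, `sd_b`) suffices regardless of J.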
Tying It All Together:
Complete Model and Posterior Distribution
Posterior Distribution
p(β0, β, σε | y, x) ∝ p(y | β0, β, σε, x) p(β0, β, σε)
p(y | β0, β, σε, x) = Πi N(yi | β0 + β1xi1 + … + βJxiJ, σε²)
β0 ~ N(μβ0, σ²β0)
βj ~ N(μβ, σ²β), j = 1,…, J
σε ~ Exp(λ)
Example
• End-of-chapter test scores, from summing dichotomously
scored item responses from 50 subjects
• Regress Chapter 3 on Chapter 1 and Chapter 2
Test        # items   Range   Mean    Standard Deviation
Chapter 1   16        4-16    14.10   2.02
Chapter 2   18        3-18    14.34   3.29
Chapter 3   15        1-15    12.22   2.96

Correlation   Chapter 1   Chapter 2
Chapter 2     0.58
Chapter 3     0.69        0.68
Posterior Distribution
p(β0, β, σε | y, x) ∝ p(y | β0, β, σε, x) p(β0, β, σε)
p(y | β0, β, σε, x) = Πi N(yi | β0 + β1xi1 + β2xi2, σε²)
β0 ~ N(0, 900)
βj ~ N(0, 900), j = 1, 2
σε ~ Exp(1)
Core Code
Ch3Testi | Ch1Testi, Ch2Testi, β0, β1, β2, σε ~ N(β0 + β1Ch1Testi + β2Ch2Testi, σε²)
β0 ~ N(0, 900) = N(0, 30²)
βj ~ N(0, 900) = N(0, 30²) for j = 1, 2
σε ~ Exp(1)
fitted.model <- stan_glm(
  Ch3Test ~ Ch1Test + Ch2Test,                         # conditional distribution of Ch3Test
  prior_intercept = normal(0, 30, autoscale = FALSE),  # β0 ~ N(0, 30²)
  prior = normal(0, 30, autoscale = FALSE),            # βj ~ N(0, 30²), j = 1, 2
  prior_aux = exponential(1, autoscale = FALSE),       # σε ~ Exp(1)
  …
)
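For readers who want something they can run end to end, here is a sketch of a complete call. The seminar's data set is not reproduced in these slides, so the data frame `test.data` is a simulated stand-in with assumed values, and the settings `chains = 4`, `iter = 6000`, `warmup = 1000` are one reading of "4 chains for 5,000 iterations after 1,000 iterations of warmup". The model fit only runs if rstanarm is installed.

```r
# Simulated stand-in data (assumption): NOT the seminar's test scores.
set.seed(123)
n <- 50
test.data <- data.frame(
  Ch1Test = round(runif(n, min = 4, max = 16)),
  Ch2Test = round(runif(n, min = 3, max = 18))
)
# Generating values are illustrative assumptions.
test.data$Ch3Test <- -2.5 + 0.66 * test.data$Ch1Test +
  0.38 * test.data$Ch2Test + rnorm(n, mean = 0, sd = 1.9)

if (requireNamespace("rstanarm", quietly = TRUE)) {
  library(rstanarm)
  fitted.model <- stan_glm(
    Ch3Test ~ Ch1Test + Ch2Test,
    data = test.data,
    family = gaussian(),
    prior_intercept = normal(0, 30, autoscale = FALSE),  # beta0 ~ N(0, 30^2)
    prior = normal(0, 30, autoscale = FALSE),            # beta_j ~ N(0, 30^2)
    prior_aux = exponential(1, autoscale = FALSE),       # sigma ~ Exp(1)
    chains = 4,
    iter = 6000,     # total per chain, including warmup
    warmup = 1000,
    seed = 123,
    refresh = 0      # suppress sampler progress output
  )
  print(summary(fitted.model))
}
```

Setting `autoscale = FALSE` keeps the priors exactly as written, rather than letting rstanarm rescale them to the data.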
See ‘Regression model (Ch1Test and Ch2Test predictors) in Stan via rstanarm.R’
Convergence of 4 Chains for 5,000 Iterations After 1,000 Iterations of Warmup
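Alongside visual checks, convergence is often summarized numerically with the potential scale reduction factor (PSRF, R̂). The sketch below implements the basic Gelman-Rubin version in base R for a matrix with one column per chain. It is a simplified illustration: the function name `psrf` is hypothetical, and the value Stan reports uses a refined split-chain variant, so numbers will not match exactly.

```r
# Basic (non-split) potential scale reduction factor.
# `draws`: matrix of post-warmup draws, rows = iterations, columns = chains.
psrf <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(apply(draws, 2, var))       # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

set.seed(42)
# Four well-mixed chains sampling the same distribution (simulated)
good <- matrix(rnorm(4 * 5000), ncol = 4)
# Same chains, but one is stuck in a different region
bad <- good
bad[, 4] <- bad[, 4] + 5

rhat_good <- psrf(good)
rhat_bad <- psrf(bad)
print(c(good = rhat_good, bad = rhat_bad))
```

Values close to 1 suggest the chains agree with one another; values well above 1 indicate that between-chain variability is large relative to within-chain variability.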
Summary of 4 Chains for 5,000 Iterations After 1,000 Iterations of Warmup
            Mean    SD     95% HPD lower   95% HPD upper   Effective Size
Intercept   -2.52   1.97   -6.24           1.46            21585
Ch1Test     0.66    0.17   0.31            0.97            13993
Ch2Test     0.38    0.10   0.18            0.59            16222
sigma       1.92    0.20   1.55            2.32            11706
R.squared   0.59    0.08   0.43            0.73            17241
Interpretation of the slope, R²?
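The HPD limits in the summary are the endpoints of the shortest interval containing 95% of the posterior draws. A minimal base-R sketch of that computation follows; the function name `hpd_interval` is hypothetical, the draws are simulated from a skewed distribution rather than taken from the fitted model, and the approach assumes a unimodal posterior.

```r
# Shortest interval containing a proportion `prob` of the draws.
# Assumes a unimodal distribution; `hpd_interval` is a hypothetical helper.
hpd_interval <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  m <- ceiling(prob * n)               # draws each candidate interval must cover
  starts <- seq_len(n - m + 1)         # possible positions of the lower endpoint
  widths <- sorted[starts + m - 1] - sorted[starts]
  best <- which.min(widths)            # narrowest candidate
  c(lower = sorted[best], upper = sorted[best + m - 1])
}

set.seed(7)
draws <- rexp(10000, rate = 1)         # skewed stand-in for posterior draws
hpd <- hpd_interval(draws, prob = 0.95)
print(hpd)
print(quantile(draws, c(0.025, 0.975)))  # equal-tailed interval for comparison
```

For a symmetric posterior the HPD and equal-tailed intervals nearly coincide; for a skewed one, such as the posterior of a standard deviation, the HPD interval shifts toward the mode.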
Write Up
We conducted a Bayesian normal-theory linear regression
analysis, specifying the outcome as
yi | β0, β, σε, xi ~ N(β0 + β1xi1 + β2xi2, σε²), i = 1,…, n,
and employing diffuse prior distributions
β0 ~ N(0, 900); βj ~ N(0, 900), j = 1, 2; σε ~ Exponential(1).
Four chains were run for 5,000 iterations following a warmup period
of 1,000 iterations. Inspection of the trace plots and the PSRF (R̂)
evidenced convergence. The marginal posterior distributions for
the parameters are depicted in Figure xxxx and summarized in
Table xxxx….
Table of Results from Bayesian Analysis
      Post. Mean   Post. SD   95% Cred. Interval
β0    -2.52        1.97       (-6.24, 1.46)
β1    0.66         0.17       (0.31, 0.97)
β2    0.38         0.10       (0.18, 0.59)
σε    1.92         0.20       (1.55, 2.32)
R²    0.59         0.08       (0.43, 0.73)
Classical and Bayesian Analyses of the Traditional Model
      Frequentist Analysis of Traditional Model   Bayesian Analysis of Traditional Model
      Est.    SE     95% Conf. Int.               Post. Mean   Post. SD   95% Cred. Interval
β0    -2.54   1.93   (-6.41, 1.34)                -2.52        1.97       (-6.24, 1.46)
β1    0.66    0.17   (0.33, 0.99)                 0.66         0.17       (0.31, 0.97)
β2    0.38    0.10   (0.18, 0.59)                 0.38         0.10       (0.18, 0.59)
σε    1.95    0.28   (1.60, 2.37)                 1.92         0.20       (1.55, 2.32)
R²    0.60                                        0.59         0.08       (0.43, 0.73)
Numerically similar, conceptually different
Results for β0 troublesome
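The frequentist column of such a comparison can be produced with base R's `lm()`. The sketch below uses simulated stand-in data (an assumption, since the seminar's data set is not included in the slides), so its output will not reproduce the numbers in the table.

```r
# Simulated stand-in data (assumption): NOT the seminar's test scores.
set.seed(123)
n <- 50
sim <- data.frame(
  Ch1Test = round(runif(n, min = 4, max = 16)),
  Ch2Test = round(runif(n, min = 3, max = 18))
)
sim$Ch3Test <- -2.5 + 0.66 * sim$Ch1Test + 0.38 * sim$Ch2Test +
  rnorm(n, mean = 0, sd = 1.9)

# Ordinary least squares fit of the same regression
ols.model <- lm(Ch3Test ~ Ch1Test + Ch2Test, data = sim)

est <- coef(summary(ols.model))[, c("Estimate", "Std. Error")]  # Est. and SE
ci <- confint(ols.model, level = 0.95)                          # 95% conf. int.
sigma_hat <- summary(ols.model)$sigma                           # residual SD
r2 <- summary(ols.model)$r.squared                              # R-squared

print(cbind(est, ci))
print(c(sigma = sigma_hat, R2 = r2))
```

The confidence intervals describe the long-run behavior of the estimation procedure, whereas the credible intervals are probability statements about the parameters given the data and priors, which is the conceptual difference noted above.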