Applied Bayesian Data Analysis - Statistical Horizons

Post on 08-Dec-2021

Applied Bayesian Data Analysis

Roy Levy, Ph.D.

Upcoming Seminar: February 18-20, 2021, Remote Seminar

Regression

Multiple Regression

Multiple Regression Model: J Predictors

Multiple xs, y for each of n subjects

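• $\mathbf{y} = (y_1, y_2, y_3, \ldots, y_n)$

• $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n)$

• $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iJ})$

$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_J x_{iJ} + \varepsilon_i$, with the $\varepsilon_i$ independent and $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$

$y_i \mid \mathbf{x}_i, \beta_0, \beta_1, \ldots, \beta_J, \sigma_\varepsilon^2 \sim N(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_J x_{iJ}, \sigma_\varepsilon^2)$

In vector notation, with $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_J)$:

$y_i = \beta_0 + \boldsymbol{\beta}'\mathbf{x}_i + \varepsilon_i$, with the $\varepsilon_i$ independent and $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$

$y_i \mid \mathbf{x}_i, \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon^2 \sim N(\beta_0 + \boldsymbol{\beta}'\mathbf{x}_i, \sigma_\varepsilon^2)$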

Multiple Regression: Bayesian Analysis

Posterior Distribution

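$p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon \mid \mathbf{y}, \mathbf{x}) \propto p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x})\, p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon)$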

Conditional Probability of the Data

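$p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon \mid \mathbf{y}, \mathbf{x}) \propto p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x})\, p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon)$

Assuming exchangeability of subjects

$p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x}) = \prod_i p(y_i \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x}_i)$

Assuming conditional normality

$y_i \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x}_i \sim N(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_J x_{iJ}, \sigma_\varepsilon^2)$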
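Combining the two assumptions, the conditional probability of the data can be written out as a product of normal densities. This is the standard expansion of the normal density, included here as a worked step; it does not appear on the slide.

```latex
p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x})
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}}
    \exp\!\left(
      -\frac{\left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_J x_{iJ} \right)^2}
            {2\sigma_\varepsilon^2}
    \right)
```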

Prior Distribution

$p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon) = p(\beta_0)\, p(\boldsymbol{\beta})\, p(\sigma_\varepsilon)$

$p(\beta_0) = N(\mu_{\beta_0}, \sigma_{\beta_0}^2)$

$\sigma_\varepsilon \sim \text{Exp}(\lambda)$

$p(\boldsymbol{\beta})$: multivariate prior?

$p(\boldsymbol{\beta}) = p(\beta_1)\, p(\beta_2) \cdots p(\beta_J)$

$p(\boldsymbol{\beta}) = N(\mu_{\beta_1}, \sigma_{\beta_1}^2)\, N(\mu_{\beta_2}, \sigma_{\beta_2}^2) \cdots N(\mu_{\beta_J}, \sigma_{\beta_J}^2)$

Assuming exchangeability

$p(\beta_j) = N(\mu_\beta, \sigma_\beta^2)$, $j = 1, \ldots, J$

Tying It All Together: Complete Model and Posterior Distribution

Posterior Distribution

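$p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon \mid \mathbf{y}, \mathbf{x}) \propto p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x})\, p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon)$

$p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x}) = \prod_i N(y_i \mid \beta_0 + \beta_1 x_{i1} + \cdots + \beta_J x_{iJ}, \sigma_\varepsilon^2)$

$\beta_0 \sim N(\mu_{\beta_0}, \sigma_{\beta_0}^2)$

$\beta_j \sim N(\mu_\beta, \sigma_\beta^2)$, $j = 1, \ldots, J$

$\sigma_\varepsilon \sim \text{Exp}(\lambda)$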
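To make the structure of the posterior concrete, here is a minimal base-R sketch of the unnormalized log posterior: the log-likelihood plus the log of each prior. The function name `log_posterior`, its argument names, the default hyperparameter values, and the toy data are all illustrative assumptions, not part of the seminar materials.

```r
# Unnormalized log posterior for the normal-theory multiple regression model.
# Function name, argument names, and default hyperparameters are illustrative.
log_posterior <- function(beta0, beta, sigma, y, X,
                          mu_beta0 = 0, sd_beta0 = 30,
                          mu_beta = 0, sd_beta = 30,
                          rate = 1) {
  if (sigma <= 0) return(-Inf)            # sigma must be positive
  mu <- beta0 + as.vector(X %*% beta)     # conditional mean for each subject
  # Log-likelihood: sum of independent normal log densities
  log_lik <- sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
  # Log prior: independent normal priors on the coefficients,
  # exponential prior on the error standard deviation
  log_prior <- dnorm(beta0, mu_beta0, sd_beta0, log = TRUE) +
    sum(dnorm(beta, mu_beta, sd_beta, log = TRUE)) +
    dexp(sigma, rate = rate, log = TRUE)
  log_lik + log_prior
}

# Hypothetical toy data: n = 5 subjects, J = 2 predictors
X <- cbind(c(1, 2, 3, 4, 5), c(2, 1, 4, 3, 5))
y <- c(1.1, 1.9, 3.2, 3.8, 5.1)
lp <- log_posterior(beta0 = 0, beta = c(0.5, 0.5), sigma = 1, y = y, X = X)
lp
```

MCMC software works with a quantity of this kind; the normalizing constant is never needed because it does not depend on the parameters.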

Example

• End-of-chapter test scores, from summing dichotomously scored item responses from 50 subjects

• Regress Chapter 3 on Chapter 1 and Chapter 2

Test        # items   Range   Mean    Standard Deviation
Chapter 1   16        4-16    14.10   2.02
Chapter 2   18        3-18    14.34   3.29
Chapter 3   15        1-15    12.22   2.96

Correlation   Chapter 1   Chapter 2
Chapter 2     0.58
Chapter 3     0.69        0.68

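Posterior Distribution

$p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon \mid \mathbf{y}, \mathbf{x}) \propto p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x})\, p(\beta_0, \boldsymbol{\beta}, \sigma_\varepsilon)$

$p(\mathbf{y} \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x}) = \prod_i N(y_i \mid \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}, \sigma_\varepsilon^2)$

$\beta_0 \sim N(0, 900)$

$\beta_j \sim N(0, 900)$, $j = 1, 2$

$\sigma_\varepsilon \sim \text{Exp}(1)$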

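Core Code

$\text{Ch3Test}_i \mid \text{Ch1Test}_i, \text{Ch2Test}_i, \beta_0, \beta_1, \beta_2, \sigma_\varepsilon \sim N(\beta_0 + \beta_1 \text{Ch1Test}_i + \beta_2 \text{Ch2Test}_i, \sigma_\varepsilon^2)$

fitted.model <- stan_glm(
  Ch3Test ~ Ch1Test + Ch2Test,  # model formula: outcome ~ predictors
  ...                           # remaining arguments are not shown on this slide
)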

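$\beta_0 \sim N(0, 900) = N(0, 30^2)$

$\beta_j \sim N(0, 900) = N(0, 30^2)$ for $j = 1, 2$

$\sigma_\varepsilon \sim \text{Exp}(1)$

fitted.model <- stan_glm(
  ...,
  # normal() takes a location and a scale (a standard deviation): 30^2 = 900
  prior_intercept = normal(0, 30, autoscale = FALSE),
  prior = normal(0, 30, autoscale = FALSE),
  # exponential() takes a rate
  prior_aux = exponential(1, autoscale = FALSE),
  ...
)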
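The fragments above can be assembled into one complete call. The following is a sketch under stated assumptions, not the seminar's script: the data frame `test.scores` and its simulated values are hypothetical stand-ins for the real data, and `iter = 6000` with `warmup = 1000` is an assumed way of obtaining 5,000 retained iterations per chain after 1,000 warmup iterations, since Stan's `iter` counts warmup.

```r
library(rstanarm)

# Hypothetical stand-in data: 50 simulated subjects, NOT the seminar's data.
set.seed(1)
n <- 50
test.scores <- data.frame(
  Ch1Test = round(rnorm(n, mean = 14, sd = 2)),
  Ch2Test = round(rnorm(n, mean = 14, sd = 3))
)
test.scores$Ch3Test <- round(
  1 + 0.5 * test.scores$Ch1Test + 0.3 * test.scores$Ch2Test + rnorm(n, sd = 2)
)

fitted.model <- stan_glm(
  Ch3Test ~ Ch1Test + Ch2Test,
  data = test.scores,
  family = gaussian(),
  prior_intercept = normal(0, 30, autoscale = FALSE),  # scale 30, variance 900
  prior = normal(0, 30, autoscale = FALSE),
  prior_aux = exponential(1, autoscale = FALSE),       # prior on sigma
  chains = 4,
  iter = 6000,     # assumed: 1,000 warmup + 5,000 retained per chain
  warmup = 1000,
  seed = 123,      # arbitrary seed for reproducibility
  refresh = 0      # suppress sampler progress output
)

print(fitted.model)
```

One detail worth checking against the rstanarm documentation: with the default dense design matrix, `prior_intercept` is applied to the intercept after the predictors have been centered, so it is not exactly a prior on $\beta_0$ as defined at the predictors' zero point.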

See ‘Regression model (Ch1Test and Ch2Test predictors) in Stan via rstanarm.R’

Convergence of 4 Chains for 5,000 Iterations After 1,000 Iterations of Warmup

[Figures: convergence diagnostics (5 slides)]

Summary of 4 Chains for 5,000 Iterations After 1,000 Iterations of Warmup

            Mean    SD     95% HPD lower   95% HPD upper   Effective Size
Intercept   -2.52   1.97   -6.24           1.46            21585
Ch1Test      0.66   0.17    0.31           0.97            13993
Ch2Test      0.38   0.10    0.18           0.59            16222
sigma        1.92   0.20    1.55           2.32            11706
R.squared    0.59   0.08    0.43           0.73            17241

Interpretation of the slope, $R^2$?
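One way to approach the interpretation question is to plug the posterior means into the regression equation. The sketch below uses only numbers reported in the tables for this example; the variable names are illustrative. Because the conditional mean is linear in the coefficients, plugging in posterior means gives the posterior mean of that conditional mean, up to rounding in the reported values.

```r
# Posterior means from the summary table
b0 <- -2.52  # intercept
b1 <- 0.66   # slope for Ch1Test
b2 <- 0.38   # slope for Ch2Test

# Sample means of the two predictors from the descriptive statistics
ch1_mean <- 14.10
ch2_mean <- 14.34

# Expected Chapter 3 score for a subject at the mean of both predictors
pred_at_means <- b0 + b1 * ch1_mean + b2 * ch2_mean
pred_at_means                  # 12.2352, close to the Chapter 3 mean of 12.22

# Expected change for one more point on Chapter 1, holding Chapter 2 fixed
pred_plus_one <- b0 + b1 * (ch1_mean + 1) + b2 * ch2_mean
pred_plus_one - pred_at_means  # 0.66, the slope for Ch1Test
```

The intercept, by contrast, is the expected Chapter 3 score when both predictors equal 0, a point outside the observed score ranges (4-16 and 3-18).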

Write Up

We conducted a Bayesian normal-theory linear regression analysis, specifying the outcome as

$y_i \mid \beta_0, \boldsymbol{\beta}, \sigma_\varepsilon, \mathbf{x}_i \sim N(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}, \sigma_\varepsilon^2)$, $i = 1, \ldots, n$,

and employing diffuse prior distributions

$\beta_0 \sim N(0, 900)$; $\beta_j \sim N(0, 900)$, $j = 1, 2$; $\sigma_\varepsilon \sim \text{Exponential}(1)$.

Four chains were run for 5,000 iterations following a warmup period of 1,000 iterations. Inspection of the trace plots and the PSRF ($\hat{R}$) provided evidence of convergence. The marginal posterior distributions for the parameters are depicted in Figure xxxx and summarized in Table xxxx….
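For readers unfamiliar with the PSRF, here is a minimal base-R sketch of the classic Gelman-Rubin version of the statistic for a single parameter. The function name `psrf` and the simulated draws are illustrative assumptions, and current Stan releases report a refined (split, rank-normalized) variant, so values from this sketch will not match Stan's output exactly.

```r
# Classic potential scale reduction factor for one parameter.
# `draws` is an iterations x chains matrix of post-warmup draws.
psrf <- function(draws) {
  n <- nrow(draws)                       # iterations per chain
  chain_means <- colMeans(draws)
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  B <- n * var(chain_means)              # between-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

# Hypothetical draws: 4 well-mixed chains of 5,000 iterations each
set.seed(42)
good <- matrix(rnorm(5000 * 4), nrow = 5000, ncol = 4)
psrf(good)   # very close to 1

# Hypothetical draws in which one chain sits in a different region
bad <- good
bad[, 4] <- bad[, 4] + 3
psrf(bad)    # well above 1, signalling a lack of convergence
```

Values near 1 indicate that the chains agree with one another; values well above 1 indicate that between-chain variation is large relative to within-chain variation.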

Table of Results from Bayesian Analysis

Bayesian Analysis

                         Post. Mean   Post. SD   95% Cred. Interval
$\beta_0$                -2.52        1.97       (-6.24, 1.46)
$\beta_1$                 0.66        0.17       (0.31, 0.97)
$\beta_2$                 0.38        0.10       (0.18, 0.59)
$\sigma_\varepsilon$      1.92        0.20       (1.55, 2.32)
$R^2$                     0.59        0.08       (0.43, 0.73)

Classical and Bayesian Analyses of the Traditional Model

                         Frequentist Analysis of Traditional Model   Bayesian Analysis of Traditional Model
                         Est.    SE     95% Conf. Int.                Post. Mean   Post. SD   95% Cred. Interval
$\beta_0$                -2.54   1.93   (-6.41, 1.34)                 -2.52        1.97       (-6.24, 1.46)
$\beta_1$                 0.66   0.17   (0.33, 0.99)                   0.66        0.17       (0.31, 0.97)
$\beta_2$                 0.38   0.10   (0.18, 0.59)                   0.38        0.10       (0.18, 0.59)
$\sigma_\varepsilon$      1.95   0.28   (1.60, 2.37)                   1.92        0.20       (1.55, 2.32)
$R^2$                     0.60                                         0.59        0.08       (0.43, 0.73)

Numerically similar, conceptually different

Results for $\beta_0$ troublesome
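For comparison, estimates, standard errors, confidence intervals, and $R^2$ of the kind shown in the frequentist columns can be obtained in base R with `lm()`. This is a sketch on simulated data: the data frame `sim` and its values are hypothetical stand-ins, so the output will not reproduce the table. The slides do not say how the frequentist results were computed.

```r
# Hypothetical stand-in data: 50 simulated subjects, NOT the seminar's data.
set.seed(2)
n <- 50
sim <- data.frame(
  Ch1Test = rnorm(n, mean = 14, sd = 2),
  Ch2Test = rnorm(n, mean = 14, sd = 3)
)
sim$Ch3Test <- 1 + 0.5 * sim$Ch1Test + 0.3 * sim$Ch2Test + rnorm(n, sd = 2)

# Ordinary least squares fit of the same regression model
ols <- lm(Ch3Test ~ Ch1Test + Ch2Test, data = sim)

coef(summary(ols))[, c("Estimate", "Std. Error")]  # estimates and standard errors
confint(ols, level = 0.95)                         # 95% confidence intervals
summary(ols)$sigma                                 # estimate of the error SD
summary(ols)$r.squared                             # R-squared
```

The confidence intervals describe the long-run behavior of the estimation procedure, whereas the credible intervals are probability statements about the parameters given the data and priors, which is the conceptual difference noted above.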