Post on 16-Jun-2018
(Last adjustments: December 6, 2017)
Workshop on statistical challenges in astronomy–Hierarchical models in Stan
Presenter
Dr. John T. Ormerod
School of Mathematics & Statistics F07
University of Sydney
(w) 02 9351 5883
(e) john.ormerod (at) sydney.edu.au
Why MCMC?
• Do you have data? (x)
• Do you want to build a rich statistical model? (p(x|θ))
• Perhaps you want to incorporate prior information? (p(θ))
• Are the integrals needed to obtain posterior densities intractable?
$$p(\theta \mid x) = \frac{p(x \mid \theta)\,p(\theta)}{p(x)} = \frac{p(x,\theta)}{\int p(x,\theta)\,d\theta}$$
• Are point estimates not adequate? (If not, Variational Bayes might be for you.)
Review: MCMC
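• Markov Chain Monte Carlo: the samples form a Markov chain.
• Markov property:
$$p(\theta_{t+1} \mid \theta_t, \ldots, \theta_1) = p(\theta_{t+1} \mid \theta_t)$$
• Invariant distribution:
$$\pi P = \pi$$
• Detailed balance (a sufficient condition for invariance): with transition kernel
$$P(\theta_t, A) = \int_A q(\theta_t, \theta_{t+1})\,d\theta_{t+1}$$
the chain satisfies detailed balance if
$$\pi(\theta_{t+1})\,q(\theta_{t+1}, \theta_t) = \pi(\theta_t)\,q(\theta_t, \theta_{t+1})$$
• A Markov chain satisfying detailed balance has π as an invariant distribution and, under mild conditions (irreducibility and aperiodicity), converges in distribution to π(θ).
• We want to design Markov chains to mimic posterior distributions.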
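The link between detailed balance and invariance can be checked numerically on a small discrete chain. The sketch below is only an illustration, not part of the workshop material: the three-state target `pi` and the uniform proposal are assumptions chosen for simplicity.

```python
def metropolis_matrix(pi):
    """Transition matrix of a Metropolis chain on states 0..n-1.

    Assumed proposal: pick any state uniformly (probability 1/n each),
    accept a move from i to j with probability min(1, pi[j] / pi[i]).
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = (1.0 / n) * min(1.0, pi[j] / pi[i])
        # Remaining probability mass stays at state i (rejections and self-proposals).
        P[i][i] = 1.0 - sum(P[i])
    return P


def detailed_balance_gap(pi, P):
    """Largest violation of pi[i] * P[i][j] == pi[j] * P[j][i]."""
    n = len(pi)
    return max(abs(pi[i] * P[i][j] - pi[j] * P[j][i])
               for i in range(n) for j in range(n))


def invariance_gap(pi, P):
    """Largest violation of (pi P)[j] == pi[j]."""
    n = len(pi)
    return max(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j])
               for j in range(n))


pi = [0.2, 0.3, 0.5]  # illustrative target distribution
P = metropolis_matrix(pi)
print(detailed_balance_gap(pi, P), invariance_gap(pi, P))
```

Both gaps are zero up to floating-point error, which is the discrete analogue of the statement that detailed balance implies πP = π.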
Review: Random Walk Metropolis Hastings
• Want samples from the posterior distribution: p(θ|x) ∝ p(x,θ).
• Algorithm (suppose θ ∈ R^d):
◦ Suppose we have x, p(x,θ) and θ_0. Choose a proposal covariance matrix B ∈ R^{d×d}.
◦ Loop over t = 1, …, T:
∗ Sample θ_prop ∼ N(θ_{t−1}, B).
∗ With probability
$$\alpha = \min\left[1, \frac{p(x,\theta_{\mathrm{prop}})}{p(x,\theta_{t-1})}\right]$$
set θ_t = θ_prop; otherwise set θ_t = θ_{t−1}.
• The above Markov chain can be shown (under mild conditions) to satisfy the detailed balance condition and to converge in distribution to p(θ|x).
• In practice, for sufficiently large T, we treat the samples {θ_t}_{t=1}^T as (correlated) draws from p(θ|x), usually after discarding an initial burn-in period.
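The algorithm above can be sketched in a few lines for a scalar θ, where the covariance B reduces to a proposal variance `prop_sd**2`. This is a minimal illustration using only the Python standard library; the toy data `x`, the N(θ, 1) likelihood and the N(0, 10²) prior are assumptions made for the example, and the ratio is computed on the log scale for numerical stability.

```python
import math
import random


def random_walk_metropolis(log_joint, theta0, n_iter, prop_sd, seed=0):
    """Random walk Metropolis for scalar theta; log_joint(theta) = log p(x, theta)."""
    rng = random.Random(seed)
    theta = theta0
    log_p = log_joint(theta)
    draws = []
    n_accept = 0
    for _ in range(n_iter):
        theta_prop = rng.gauss(theta, prop_sd)        # theta_prop ~ N(theta_{t-1}, B)
        log_p_prop = log_joint(theta_prop)
        log_alpha = min(0.0, log_p_prop - log_p)      # log of min[1, ratio]
        if rng.random() < math.exp(log_alpha):
            theta, log_p = theta_prop, log_p_prop     # accept the proposal
            n_accept += 1
        draws.append(theta)                           # on rejection, repeat theta_{t-1}
    return draws, n_accept / n_iter


# Toy model (assumed for illustration): x_i ~ N(theta, 1), theta ~ N(0, 10^2).
x = [1.2, 0.7, 1.5, 0.9]


def log_joint(theta):
    log_lik = sum(-0.5 * (xi - theta) ** 2 for xi in x)
    log_prior = -0.5 * (theta / 10.0) ** 2
    return log_lik + log_prior


draws, accept_rate = random_walk_metropolis(log_joint, theta0=0.0,
                                            n_iter=20000, prop_sd=1.0)
kept = draws[2000:]                                   # discard burn-in
print(sum(kept) / len(kept), accept_rate)
```

For this conjugate toy model the posterior is available in closed form (mean Σx_i / (n + 1/100)), so the sample mean can be compared against it as a sanity check.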
Stan: Hamiltonian Monte Carlo
• In the previous example a multivariate normal distribution was used to generate proposal samples.
• Stan uses Hamiltonian dynamics (Hamiltonian Monte Carlo) to generate good proposal samples and then uses an accept-reject step to ensure that the Markov chain mimics the posterior distribution.
• See HMC notes for details.
• HMC samples are typically much less autocorrelated than those from random walk samplers, and Stan's adaptive implementation requires less manual tuning.
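To make the idea concrete, here is a minimal sketch of one-dimensional HMC with a leapfrog integrator followed by an accept-reject step. It is not Stan's implementation: the unit mass, the fixed step size `eps`, the fixed number of steps `n_steps`, and the standard normal target are all assumptions made for illustration.

```python
import math
import random


def leapfrog(theta, r, grad_log_p, eps, n_steps):
    """Simulate Hamiltonian dynamics for n_steps leapfrog steps of size eps."""
    r = r + 0.5 * eps * grad_log_p(theta)          # initial half step for momentum
    for step in range(n_steps):
        theta = theta + eps * r                    # full step for position
        if step < n_steps - 1:
            r = r + eps * grad_log_p(theta)        # full step for momentum
    r = r + 0.5 * eps * grad_log_p(theta)          # final half step for momentum
    return theta, r


def hmc(log_p, grad_log_p, theta0, n_iter, eps, n_steps, seed=0):
    """Basic HMC for scalar theta with unit mass (illustrative only)."""
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for _ in range(n_iter):
        r0 = rng.gauss(0.0, 1.0)                   # resample momentum
        theta_new, r_new = leapfrog(theta, r0, grad_log_p, eps, n_steps)
        # Difference in log joint density of (theta, r): log p(theta) - r^2 / 2.
        log_accept = (log_p(theta_new) - 0.5 * r_new ** 2) - (log_p(theta) - 0.5 * r0 ** 2)
        if rng.random() < math.exp(min(0.0, log_accept)):
            theta = theta_new                      # accept the end of the trajectory
        draws.append(theta)
    return draws


# Assumed target for the example: standard normal.
def log_p(theta):
    return -0.5 * theta ** 2


def grad_log_p(theta):
    return -theta


draws = hmc(log_p, grad_log_p, theta0=0.0, n_iter=5000, eps=0.2, n_steps=10)
print(sum(draws) / len(draws))
```

The accept-reject step corrects for the discretisation error of the leapfrog integrator, which is why the chain still targets the intended distribution even though the dynamics are only simulated approximately.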
Stan: Where to get help
In anticipation that we will not get through everything today, I will begin with a list of resources on Stan:
• Homepage: http://mc-stan.org/
• User Guide: http://mc-stan.org/users/
• Documentation, tutorials and case studies: http://mc-stan.org/users/documentation/index.html
• Nice book: “Bayesian Models for Astrophysical Data Using R, JAGS, Python, and Stan” by Joseph M. Hilbe, Rafael S. de Souza & Emille E. O. Ishida
https://www.bayesianmodelsforastrophysicaldata.com/
• Code from the above book: https://github.com/astrobayes/
Motivation for Stan
• Fit rich Bayesian statistical models.
• The process:
1. Create a statistical model.
2. Perform inference on the model.
3. Evaluate.
• Existing tools have difficulty with the models of interest.
Motivation (cont.)
• Usability
◦ general purpose, clear modelling language, integration
• Scalability
◦ model complexity, number of parameters, data size
• Efficiency
◦ high effective sample sizes, fast iterations, low memory
• Robustness
◦ model structure (i.e. posterior geometry), numerical routines
What is Stan?
• Statistical model specification language
◦ high level, probabilistic programming language
◦ user specifies statistical model
◦ easy to create statistical models
• 4 cross-platform user interfaces
◦ CmdStan - command line
◦ RStan - R integration
◦ PyStan - Python integration
◦ MatlabStan - Matlab integration (user contributed)
Inference
• Hamiltonian Monte Carlo (HMC)
◦ Samples parameters on an unconstrained space (transform & Jacobian adjustment).
◦ Needs gradients of the model with respect to the parameters (automatic differentiation).
◦ Sensitive to tuning parameters (addressed by the No-U-Turn Sampler).
• No-U-Turn Sampler (NUTS)
◦ warmup: estimates mass matrix and step size
◦ sampling: adapts number of steps
◦ maintains detailed balance
• Optimization
◦ BFGS, Newton's method
• Variational inference.
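The "transform & Jacobian adjustment" item can be illustrated with a single positive parameter. The sketch below assumes, purely for illustration, a parameter σ > 0 with an Exponential(1) density, and maps it to the unconstrained variable u = log σ; the Jacobian term log |dσ/du| = u must be added to the log density so that the density over u still integrates to one. The helper names are hypothetical.

```python
import math


def log_density_sigma(sigma):
    """Assumed example density on the constrained scale: Exponential(1), sigma > 0."""
    return -sigma


def log_density_unconstrained(u):
    """Log density of u = log(sigma), including the Jacobian adjustment."""
    sigma = math.exp(u)                 # inverse transform back to the constrained scale
    log_jacobian = u                    # log |d sigma / d u| = log(exp(u)) = u
    return log_density_sigma(sigma) + log_jacobian


def integrate(f, lo, hi, n):
    """Composite trapezoid rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


with_jacobian = integrate(lambda u: math.exp(log_density_unconstrained(u)), -30.0, 10.0, 40000)
without_jacobian = integrate(lambda u: math.exp(log_density_sigma(math.exp(u))), -30.0, 10.0, 40000)
print(with_jacobian, without_jacobian)
```

With the adjustment the density over u integrates to one; without it the result is not a proper density for u, so a sampler working on the unconstrained scale would target the wrong distribution.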
Stan to Scientists
• Flexible probabilistic language, language still growing
• Focus on science: the modelling and assumptions
◦ access to multiple algorithms (default is pretty good)
◦ faster and less error prone than implementing from scratch
◦ efficient implementation
• Lots of (free) modelling help on users list
• Responsive developers, continued support for Stan
• Not just for inference
◦ fast forward sampling; lots of distributions
◦ gradients for arbitrary functions
The Stan Language
Data Types
• basic:
  real, int, vector, row_vector, matrix
• constrained:
  simplex, unit_vector, ordered, positive_ordered,
  corr_matrix, cov_matrix
• arrays
The Stan Language
Bounded variables
• applies to int, real, and matrix types
• lower example:
  real<lower=0> sigma;
• upper example:
  real<upper=100> x;
The Stan Language
Program Blocks
• data (optional)
• transformed data (optional)
• parameters (optional)
• transformed parameters (optional)
• model
• generated quantities (optional)
Stan Example: basic structure – Linear regression
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  sigma ~ cauchy(0,5);
  for (n in 1:N)
    y[n] ~ normal(alpha + beta * x[n], sigma);
}
Stan Example: vectorisation – Linear regression
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  sigma ~ cauchy(0,5);
  y ~ normal(alpha + beta*x, sigma);
}
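Both versions of the model block define the same log posterior density: each `~` statement adds a log density term, and the vectorised statement is simply the sum over n of the terms in the loop. The sketch below spells this out in plain Python as an illustration, not as a description of Stan's internals. It is correct only up to additive constants, which Stan may drop, and the function names and toy data are hypothetical.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def cauchy_lpdf(y, loc, scale):
    """Log density of Cauchy(loc, scale) evaluated at y."""
    return -math.log(math.pi * scale * (1.0 + ((y - loc) / scale) ** 2))


def log_posterior(alpha, beta, sigma, x, y):
    """Unnormalised log posterior of the linear regression model above."""
    if sigma <= 0.0:
        return -math.inf                      # mirrors the <lower=0> constraint on sigma
    lp = normal_lpdf(alpha, 0.0, 10.0)        # alpha ~ normal(0,10)
    lp += normal_lpdf(beta, 0.0, 10.0)        # beta ~ normal(0,10)
    lp += cauchy_lpdf(sigma, 0.0, 5.0)        # sigma ~ cauchy(0,5), restricted to sigma > 0
    # y ~ normal(alpha + beta*x, sigma): one term per observation.
    lp += sum(normal_lpdf(yn, alpha + beta * xn, sigma) for xn, yn in zip(x, y))
    return lp


# Toy data (assumed for illustration), roughly y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
print(log_posterior(1.0, 2.0, 1.0, x, y), log_posterior(0.0, 0.0, 1.0, x, y))
```

Parameter values close to the line that generated the data receive a higher log posterior than values far from it, which is the quantity the sampler explores.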
Stan Example: Eight Schools: hierarchical example
• Educational Testing Service study to analyze the effect of coaching
• SAT-V in eight high schools
• No prior reason to believe any program was:
◦ more effective than the others
◦ more similar in effect to some programs than to others
[see Rubin, 1981; Gelman et al., Bayesian Data Analysis, 2003]
Stan Example: Eight Schools: hierarchical example
School   Estimated treatment effect   Standard error of treatment effect
A        28                           15
B        8                            10
C        −3                           16
D        7                            11
E        −1                           9
F        1                            11
G        18                           10
H        12                           18
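For use with the Stan models that follow, the table can be encoded as a data structure whose names match the data block (`J`, `y`, `sigma`). The sketch below is one way to do this in Python; the `check_data` helper is hypothetical and only mirrors the declared constraints. How the dictionary is passed to Stan depends on the interface and its version, so no interface call is shown.

```python
# Eight schools data, transcribed from the table above.
schools = ["A", "B", "C", "D", "E", "F", "G", "H"]
schools_data = {
    "J": 8,                                     # number of schools
    "y": [28, 8, -3, 7, -1, 1, 18, 12],         # estimated treatment effects
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],   # standard errors of the effects
}


def check_data(data):
    """Hypothetical helper: check the constraints declared in the Stan data block."""
    if data["J"] < 0:
        raise ValueError("J must be non-negative")
    if len(data["y"]) != data["J"] or len(data["sigma"]) != data["J"]:
        raise ValueError("y and sigma must both have length J")
    if any(s < 0 for s in data["sigma"]):
        raise ValueError("sigma must be non-negative")
    return True


check_data(schools_data)
```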
Stan Example: Eight Schools: Model 0
Set up block structures so that data can be read.
data {
  int<lower=0> J; // # of schools
  real y[J]; // estimated treatment
  real<lower=0> sigma[J]; // std err of effect
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
}
Stan Example: Eight Schools: Model 1 – No pooling
Each school treated independently.
data {
  int<lower=0> J; // # of schools
  real y[J]; // estimated treatment
  real<lower=0> sigma[J]; // std err of effect
}
parameters {
  real theta[J]; // school effect
}
model {
  y ~ normal(theta, sigma);
}
Stan Example: Eight Schools: Model 2 – Complete pooling
All schools lumped together.
data {
  int<lower=0> J; // # of schools
  real y[J]; // estimated treatment
  real<lower=0> sigma[J]; // std err of effect
}
parameters {
  real theta; // pooled school effect
}
model {
  y ~ normal(theta, sigma);
}
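Because the complete pooling model places no explicit prior on `theta` (an implicit flat prior), its posterior is available in closed form: a normal distribution whose mean is the precision-weighted average of the school estimates. The sketch below computes it directly as a reference against which MCMC output could be compared; it is an analytic check under that flat-prior assumption, not output from Stan.

```python
import math

# Eight schools data from the table in the earlier slide.
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]


def pooled_posterior(y, sigma):
    """Posterior mean and sd of a common effect theta under a flat prior."""
    precisions = [1.0 / s ** 2 for s in sigma]            # weight of each school
    total_precision = sum(precisions)
    mean = sum(w * yj for w, yj in zip(precisions, y)) / total_precision
    sd = 1.0 / math.sqrt(total_precision)
    return mean, sd


pooled_mean, pooled_sd = pooled_posterior(y, sigma)
print(round(pooled_mean, 2), round(pooled_sd, 2))
```

The pooled estimate comes out at roughly 7.7 with a posterior standard deviation of about 4.1. Under the no pooling model, by contrast, each θ_j simply has posterior N(y_j, σ_j²).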
Stan Example: Eight Schools: Model 3 – Hierarchical Model
Estimate hyperparameters µ and τ.
data {
  int<lower=0> J; // # of schools
  real y[J]; // estimated treatment
  real<lower=0> sigma[J]; // std err of effect
}
parameters {
  real theta[J]; // school effect
  real mu; // mean for schools
  real<lower=0> tau; // standard deviation between schools
}
model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}
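The hierarchical model interpolates between the two previous models. Conditional on the hyperparameters, each school effect has a normal posterior whose mean is a precision-weighted compromise between the school's own estimate y_j and the population mean µ. The sketch below evaluates this standard conditional result; the values µ = 8 and τ = 5 are illustrative assumptions, not estimates from the data.

```python
import math


def conditional_theta(y_j, sigma_j, mu, tau):
    """Mean and sd of theta_j given y_j, mu and tau (normal-normal conjugacy)."""
    prec_data = 1.0 / sigma_j ** 2        # precision of the school's own estimate
    prec_prior = 1.0 / tau ** 2           # precision of the population distribution
    mean = (prec_data * y_j + prec_prior * mu) / (prec_data + prec_prior)
    sd = 1.0 / math.sqrt(prec_data + prec_prior)
    return mean, sd


# School A (y = 28, sigma = 15) with illustrative hyperparameter values.
print(conditional_theta(28.0, 15.0, mu=8.0, tau=5.0))
# Large tau approaches no pooling; small tau approaches complete pooling.
print(conditional_theta(28.0, 15.0, mu=8.0, tau=1e6)[0])
print(conditional_theta(28.0, 15.0, mu=8.0, tau=1e-3)[0])
```

As τ grows the conditional mean tends to y_j (no pooling), and as τ shrinks towards zero it tends to µ (complete pooling); estimating τ from the data lets the model choose the amount of shrinkage.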