Page 1

Gaussian variational approximation with structured covariance matrices

David Nott

Department of Statistics and Applied Probability, National University of Singapore

Collaborators: Linda Tan, Victor Ong, Michael Smith, Matias Quiroz, Robert Kohn

Page 2

Bayesian computation as usual

Data y to be observed, unknowns θ. Construct a model for (y, θ):

$$p(y, \theta) = \underbrace{p(\theta)}_{\text{prior}}\ \underbrace{p(y \mid \theta)}_{\text{likelihood}}$$

Condition on the observed y:

$$\underbrace{p(\theta \mid y)}_{\text{posterior}} \propto p(\theta)\, p(y \mid \theta)$$

Summarization of the posterior is done using algorithms like MCMC, sequential Monte Carlo, etc.

These usual Monte Carlo algorithms are exact in principle.

Page 3

Variational approximations

Increasingly there is interest in algorithms that do not possess the exact-in-principle property.

Why approximate inference?
- Make use of the scalability of optimization-based approaches to computation.
- Approximate inference methods are enough to understand why certain models are unsuitable.
- Approximate inference methods may perform as well as exact methods for predictive inference.

This talk concerns a popular framework for approximate inference, variational approximation.

Page 4

Variational inference basics (Blei et al., 2017; Ormerod and Wand, 2010)

Variational approximation reformulates Bayesian computation as an optimization problem.

Define an approximating family with some parameters (Gaussian for example). Then:
- Define some measure of "closeness" of an approximation to the true posterior.
- Optimize that measure over the approximating family.

Variational parameters to be optimized will be denoted λ.

Page 5

Gaussian variational approximation, toy example

Page 6

Gaussian variational approximation, toy example

Page 7

Gaussian variational approximation, toy example

Page 8

Gaussian variational approximation

In this talk we consider a multivariate normal approximation denoted qλ(θ) = N(µ, Σ), with λ = (µ, Σ) to be optimized.

The best normal approximation will be in the sense of minimizing the Kullback-Leibler (KL) divergence,

$$\mathrm{KL}(q_\lambda(\theta)\,\|\,p(\theta \mid y)) = \log p(y) - \int \log \frac{p(\theta)\, p(y \mid \theta)}{q_\lambda(\theta)}\, q_\lambda(\theta)\, d\theta = \log p(y) - \mathcal{L}(\lambda),$$

where $p(y) = \int p(\theta)\, p(y \mid \theta)\, d\theta$ and $\mathcal{L}(\lambda)$ is called the variational lower bound.

Minimizing the KL divergence is equivalent to maximizing L(λ).
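
As a concrete illustration of the optimization target (not part of the talk), here is a minimal Monte Carlo sketch of the lower bound L(λ) = E_q{log h(θ) − log q_λ(θ)}, with h(θ) = p(θ)p(y∣θ). The toy `log_h` and all names are hypothetical stand-ins.

```python
# Minimal sketch: Monte Carlo estimate of the variational lower bound
# L(lambda) = E_q[log h(theta) - log q_lambda(theta)] for q_lambda = N(mu, Sigma).
import numpy as np
from scipy.stats import multivariate_normal

def elbo_estimate(log_h, mu, Sigma, n_samples=1000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    q = multivariate_normal(mean=mu, cov=Sigma, seed=rng)
    thetas = q.rvs(size=n_samples).reshape(n_samples, -1)  # draws from q_lambda
    vals = [log_h(th) - q.logpdf(th) for th in thetas]
    return np.mean(vals)                                   # unbiased estimate of L(lambda)

# Toy log-joint: h(theta) proportional to a N(1, 2^2) density (1-d example).
log_h = lambda th: -0.5 * np.sum((th - 1.0) ** 2) / 4.0
print(elbo_estimate(log_h, mu=np.array([0.0]), Sigma=np.array([[1.0]])))
```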

Page 9

Gaussian variational approximation

The optimization of L(λ) is challenging for a normal family when the dimension d of θ is large.

Key difficulty: with no restriction on Σ in qλ(θ) = N(µ, Σ), there are d + d(d + 1)/2 parameters to optimize.

For high-dimensional θ we need reduced parametrizations for Σ:
- Exploiting conditional independence structure to motivate sparsity in Σ⁻¹ (Tan and Nott, 2018).
- Factor models (Ong et al., 2018).

Page 10

How should we optimize? Stochastic gradient ascent (Robbins and Monro, 1951)

We will use stochastic gradient ascent methods for optimizing the lower bound.

Suppose that $\nabla_\lambda \mathcal{L}(\lambda)$ is the gradient of $\mathcal{L}(\lambda)$ and that $\widehat{\nabla_\lambda \mathcal{L}}(\lambda)$ is an unbiased estimate of it.

Stochastic gradient ascent: initialize $\lambda^{(0)}$; for $t = 0, 1, \ldots$ and until some stopping rule is satisfied,

$$\lambda^{(t+1)} = \lambda^{(t)} + \rho_t\, \widehat{\nabla_\lambda \mathcal{L}}\big(\lambda^{(t)}\big).$$

Typically the learning rate sequence $\rho_t$, $t \ge 0$, satisfies $\sum_t \rho_t = \infty$ and $\sum_t \rho_t^2 < \infty$ (Robbins and Monro, 1951).
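
A minimal sketch of the ascent loop above (not the talk's implementation), assuming an unbiased gradient estimator is supplied; step sizes ρt = a/(b + t) satisfy the Robbins-Monro conditions, and the toy gradient and constants are illustrative only.

```python
# Minimal sketch: stochastic gradient ascent with Robbins-Monro step sizes.
import numpy as np

def sga(grad_estimate, lam0, n_iter=5000, a=1.0, b=10.0):
    lam = np.asarray(lam0, dtype=float).copy()
    for t in range(n_iter):
        rho_t = a / (b + t)                 # sum rho_t = inf, sum rho_t^2 < inf
        lam += rho_t * grad_estimate(lam)   # ascent step on the lower bound
    return lam

# Toy check: maximize L(lam) = -0.5 ||lam - 3||^2 from noisy gradients.
rng = np.random.default_rng(0)
noisy_grad = lambda lam: (3.0 - lam) + rng.normal(scale=0.1, size=lam.shape)
print(sga(noisy_grad, lam0=np.zeros(2)))    # approaches [3, 3]
```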

Page 11

Reparametrization gradients (Kingma and Welling, 2013; Rezende et al., 2014)

Low variance gradient estimates are crucial for stability and fast convergence of stochastic optimization; they are achieved using the reparametrization trick.

Writing h(θ) = p(θ)p(y∣θ), the lower bound is

$$\mathcal{L}(\lambda) = \int \{\log h(\theta) - \log q_\lambda(\theta)\}\, q_\lambda(\theta)\, d\theta.$$

Suppose that for the variational family qλ(θ) we can write θ ∼ qλ(θ) as θ = t(z, λ), where z ∼ f(z) and f(⋅) does not depend on λ. Then

$$\mathcal{L}(\lambda) = E_f\big(\log h(t(z, \lambda)) - \log q_\lambda(t(z, \lambda))\big).$$

Differentiating under the integral sign, ∇λL(λ) is an expectation with respect to f(⋅): simulation from f(⋅) gives unbiased estimates.

Page 12

Reparametrization gradients for the Gaussian family (Titsias and Lázaro-Gredilla, 2014; Kucukelbir et al., 2016)

Suppose qλ(θ) = N(θ; µ, Σ = CC⊺), where µ is the mean vector, Σ is the covariance matrix, and C the Cholesky factor of Σ.

Variational parameter λ = (µ, C). We can write θ ∼ qλ(θ) as

$$\theta = \mu + Cz, \qquad z \sim f(z) = N(0, I).$$

This shows that the Gaussian family has the structure required for the reparametrization trick.
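
A minimal sketch of this reparametrization with a dense Cholesky factor (not from the talk; the toy target is hypothetical). Since log qλ(µ + Cz) does not depend on µ, averaging ∇θ log h at reparametrized draws already gives an unbiased estimate of ∇µL; the gradient expressions later in the talk add a mean-zero term of the form Tz.

```python
# Minimal sketch: reparametrized draws theta = mu + C z and an unbiased
# estimate of grad_mu L for a Gaussian q_lambda = N(mu, C C^T).
import numpy as np

def sample_q(mu, C, n, rng):
    z = rng.standard_normal((n, mu.size))   # z ~ N(0, I)
    return mu + z @ C.T                     # rows are theta = mu + C z

def grad_mu_estimate(grad_log_h, mu, C, n=1000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    thetas = sample_q(mu, C, n, rng)
    return np.mean([grad_log_h(th) for th in thetas], axis=0)

# Toy log-joint: h(theta) proportional to N(m, I), so grad log h = m - theta.
m = np.array([1.0, -2.0])
g = grad_mu_estimate(lambda th: m - th, mu=np.zeros(2), C=np.eye(2),
                     rng=np.random.default_rng(1))
print(g)   # close to m - mu = [1, -2]
```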

Page 13

Gaussian variational approximation using conditional independence structure

The low variance gradient estimates provided by the reparametrization trick are crucial to efficient and stable stochastic variational optimization methods. However, learning a Gaussian variational approximation is still hard if we parametrize Σ with a dense Cholesky factor: the number of parameters in the covariance matrix grows quadratically with the parameter dimension.

We need to use the structure of the model to obtain parsimonious structured parametrizations of covariance matrices suitable for Gaussian variational approximations in high dimensions.

Page 14

Gaussian variational approximation

What exploitable structure is available? Conditional independence structure.

Can we match the true conditional independence structure in the Gaussian approximation to make such approximations practical in high dimensions?

For a Gaussian random vector with covariance matrix Σ, if Ω = Σ⁻¹, then $\Omega_{ij} = 0$ implies variables i and j are conditionally independent given the rest.
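
A quick numerical check of this fact (illustrative, not from the talk): with Ω₁₃ = 0, the conditional covariance of variables 1 and 3 given variable 2 is diagonal, even though Σ₁₃ itself is not zero.

```python
# Zero precision entry => conditional independence given the rest.
import numpy as np

Omega = np.array([[2.0, 0.5, 0.0],   # Omega_13 = 0
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])
Sigma = np.linalg.inv(Omega)         # note Sigma_13 is NOT zero
idx, rest = [0, 2], [1]              # condition (x1, x3) on x2
cond = Sigma[np.ix_(idx, idx)] - Sigma[np.ix_(idx, rest)] @ \
    np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[np.ix_(rest, idx)])
print(np.round(cond, 10))            # off-diagonal entry is 0
```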

Page 15

Motivating example: longitudinal generalized linear mixed model

Observations $y = (y_1, \ldots, y_n)$, $y_i = (y_{i1}, \ldots, y_{in_i})^\top$.

Observation-specific random effects $b = (b_1, \ldots, b_n)^\top$; assume $b_i \sim N(0, G)$, say.

The $y_i$ are conditionally independent given b, with likelihood

$$\prod_{i=1}^{n} p(y_i \mid b_i, \eta),$$

where η denotes fixed effects and variance parameters.

Joint posterior for $\theta = (b^\top, \eta^\top)^\top$:

$$p(\theta \mid y) \propto p(\eta) \prod_{i=1}^{n} p(b_i \mid \eta)\, p(y_i \mid b_i, \eta).$$

In the joint posterior, $b_i$ and $b_j$, $i \neq j$, are conditionally independent given η.

Page 16

Motivating example: longitudinal generalized linear mixed model

Consider a sparse Ω = Σ⁻¹ in the Gaussian variational approximation of the form (rows and columns indexed by b₁, b₂, ..., bₙ, η):

$$\Omega = \begin{pmatrix}
\Omega_{11} & 0 & \cdots & 0 & \Omega_{1,n+1} \\
0 & \Omega_{22} & \cdots & 0 & \Omega_{2,n+1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \Omega_{nn} & \Omega_{n,n+1} \\
\Omega_{n+1,1} & \Omega_{n+1,2} & \cdots & \Omega_{n+1,n} & \Omega_{n+1,n+1}
\end{pmatrix}$$

Page 17

Motivating example: longitudinal generalized linear mixed model

It will be convenient later to parametrize Ω in terms of its Cholesky factor, Ω = TT⊺ say, where T is lower triangular.

By imposing sparse structure on T we can impose sparse structure on Ω: the leftmost non-zero entries in each row match. Choose T of the form below (rows and columns indexed by b₁, b₂, ..., bₙ, η), as sketched in the code that follows:

$$T = \begin{pmatrix}
T_{11} & 0 & \cdots & 0 & 0 \\
0 & T_{22} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & T_{nn} & 0 \\
T_{n+1,1} & T_{n+1,2} & \cdots & T_{n+1,n} & T_{n+1,n+1}
\end{pmatrix}$$
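
A minimal sketch (with hypothetical toy sizes) of assembling a T with this pattern and verifying that Ω = TT⊺ has the zero blocks of the previous slide.

```python
# Minimal sketch: block-sparse Cholesky factor T for n random-effects blocks
# b_1, ..., b_n of dimension q and a global block eta of dimension g.
import numpy as np

n, q, g = 4, 2, 3                    # hypothetical toy sizes
d = n * q + g
rng = np.random.default_rng(0)
T = np.zeros((d, d))
for i in range(n):                   # lower-triangular diagonal blocks T_ii
    T[i*q:(i+1)*q, i*q:(i+1)*q] = np.tril(rng.standard_normal((q, q)))
T[n*q:, :n*q] = rng.standard_normal((g, n*q))         # dense last block row (eta)
T[n*q:, n*q:] = np.tril(rng.standard_normal((g, g)))  # triangular eta block
np.fill_diagonal(T, np.abs(T.diagonal()) + 0.5)       # keep the diagonal positive

Omega = T @ T.T
# The (b_1, b_2) block of Omega is zero: b_1, b_2 conditionally independent given eta.
print(np.allclose(Omega[0:q, q:2*q], 0.0))            # True
```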

Page 18

More general framework

Observations y = (y₁, ..., yₙ), observation-specific latent variables b₁, ..., bₙ, global parameters η.

Joint model p(y, θ) for (y, θ), θ = (b⊺, η⊺)⊺, of the form

$$p(\eta)\, \prod_{i=1}^{n} p(y_i \mid b_i, \eta)\; p(b_1, \ldots, b_k \mid \eta) \prod_{i>k} p(b_i \mid b_{i-1}, \ldots, b_{i-k}, \eta)$$

for some 0 ≤ k ≤ n.

- bᵢ is conditionally independent of the other latent variables in p(θ∣y) given η and k neighbouring values.
- Our previous random effects example fits this structure with k = 0.
- A state space model for a time series where the bᵢ are the states fits this structure with k = 1.
- For a state space model, use Ω of the form shown on the next slide.

Page 19

More general framework

With rows and columns indexed by b₁, b₂, b₃, ..., bₙ₋₁, bₙ, η:

$$\Omega = \begin{pmatrix}
\Omega_{11} & \Omega_{21}^\top & 0 & \cdots & 0 & 0 & \Omega_{n+1,1}^\top \\
\Omega_{21} & \Omega_{22} & \Omega_{32}^\top & \cdots & 0 & 0 & \Omega_{n+1,2}^\top \\
0 & \Omega_{32} & \Omega_{33} & \cdots & 0 & 0 & \Omega_{n+1,3}^\top \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \Omega_{n-1,n-1} & \Omega_{n,n-1}^\top & \Omega_{n+1,n-1}^\top \\
0 & 0 & 0 & \cdots & \Omega_{n,n-1} & \Omega_{nn} & \Omega_{n+1,n}^\top \\
\Omega_{n+1,1} & \Omega_{n+1,2} & \Omega_{n+1,3} & \cdots & \Omega_{n+1,n-1} & \Omega_{n+1,n} & \Omega_{n+1,n+1}
\end{pmatrix}$$

Again, we can impose the appropriate sparse structure on Ω by parametrizing in terms of the Cholesky factor T, Ω = TT⊺, and matching the row sparsity of T to that of Ω.

Page 20

Reparametrization gradients

Variational parameters λ = (µ, T). Now qλ(θ) = N(µ, Σ = T⁻⊺T⁻¹). Generative representation for applying the reparametrization trick: z ∼ N(0, I), θ = µ + T⁻⊺z.

Simulation from the variational distribution, and gradient estimation, can be done using only solutions of sparse triangular linear systems.

Expressions for lower bound gradients, where z ∼ f(z) = N(0, I):

$$\nabla_\mu \mathcal{L} = E_f\big(\nabla_\theta \log h(\mu + T^{-\top} z) + Tz\big),$$

$$\nabla_T \mathcal{L} = -E_f\big(T^{-\top} z\, (\nabla_\theta \log h(\mu + T^{-\top} z) + Tz)^\top T^{-\top}\big).$$
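
A minimal sketch (with a hypothetical toy target) of one-sample estimates of these two expressions. Only triangular solves with T are needed, which are the sparse solves referred to above; a small dense T is used here for brevity, and the projection onto T's lower-triangular pattern is my addition for the triangular parametrization.

```python
# Minimal sketch: one-sample reparametrization gradients for Omega = T T^T.
import numpy as np
from scipy.linalg import solve_triangular

def grad_estimates(grad_log_h, mu, T, rng):
    z = rng.standard_normal(mu.size)
    u = solve_triangular(T, z, trans='T', lower=True)  # u = T^{-T} z
    theta = mu + u                                     # reparametrized draw
    w = grad_log_h(theta) + T @ z
    grad_mu = w                                        # estimate of grad_mu L
    v = solve_triangular(T, w, lower=True)             # v = T^{-1} w
    grad_T = -np.tril(np.outer(u, v))                  # keep T's lower-triangular pattern
    return grad_mu, grad_T

rng = np.random.default_rng(2)
mu, T = np.zeros(3), np.eye(3)                         # q matches h below exactly,
g_mu, g_T = grad_estimates(lambda th: -th, mu, T, rng) # so both estimates vanish
print(g_mu, g_T, sep="\n")
```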

Page 21

Logistic regression with random effects (Hosmer et al., 2013)

- Data on 500 subjects studied over seven years.
- Response: whether the subject is taking drugs from 3 or more different groups.
- Covariates for gender, race, age, number of outpatient mental health visits.
- Random intercept model.
- 8 fixed effects parameters, one variance parameter, 500 random intercepts.

Page 22

Example: polypharmacy data (Hosmer et al., 2013)

Compare our approach with:
- Automatic differentiation variational inference (ADVI, implemented in the statistical package Stan) (Kucukelbir et al., 2016).
- Doubly stochastic variational inference (DSVI) (Titsias and Lázaro-Gredilla, 2014).

DSVI and ADVI have been implemented in a "mean field" version (diagonal covariance structure) and in a version assuming a dense Cholesky factor for the covariance. Although our DSVI and ADVI methods are essentially similar, they use different step size choices and stopping rules.

Page 23

Example: polypharmacy data (Hosmer et al., 2013)

[Figure: estimated marginal posterior densities for β0, βGender, βRace, βAge, βMHV41, βMHV42, βMHV43, βINPTMHV21 and ζ1. Blue = mean field (dashed = ADVI, solid = DSVI), green = dense Cholesky (dashed = ADVI, solid = DSVI), black = MCMC, red = sparse precision Cholesky.]

Page 24

Example: polypharmacy data (Hosmer et al., 2013)

The sparse precision Cholesky approach is most accurate.

Runtimes:
- ADVI methods: 7 seconds (mean field), 75 seconds (dense Cholesky). Step size sequences based on Stan defaults.
- DSVI methods: 30 seconds (mean field), 262 seconds (dense Cholesky). Adaptive step sizes chosen by ADADELTA.
- Sparse precision Cholesky: 56 seconds. Adaptive step sizes chosen by ADADELTA.

The DSVI and sparse precision Cholesky algorithms are implemented in Julia.

Page 25

An alternative to exploiting conditional independence: factor structure

Using a sparse precision matrix in the variational family is natural with conditional independence structure coming from the model.

In other cases, it may be useful to consider factor structure. Consider

$$q_\lambda(\theta) = N(\mu, BB^\top + D^2), \tag{1}$$

where B is a d × p full rank matrix with p ≪ d, and D is diagonal with diagonal elements $\delta = (\delta_1, \ldots, \delta_d)^\top$.

A random draw from (1) can be written as

$$\theta = \mu + Bz + D\varepsilon,$$

where (z⊺, ε⊺)⊺ ∼ N(0, I).
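
A minimal sketch of this draw with illustrative sizes; the cost is O(dp) per sample, and no d × d matrix is ever formed.

```python
# Minimal sketch: one draw from q_lambda = N(mu, B B^T + D^2) via (1).
import numpy as np

d, p = 1000, 4                              # illustrative: 1000 parameters, 4 factors
rng = np.random.default_rng(3)
mu = np.zeros(d)
B = np.tril(rng.standard_normal((d, p)))    # upper triangle of B set to zero
delta = np.full(d, 0.1)                     # diagonal elements of D

z = rng.standard_normal(p)                  # low-dimensional factor
eps = rng.standard_normal(d)                # idiosyncratic noise
theta = mu + B @ z + delta * eps            # theta = mu + B z + D eps
```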

Page 26

Gaussian variational approximation

In this generative model θ = µ + Bz + Dε:
- The term Bz involving the low-dimensional factor z captures dependence among components.
- The "idiosyncratic" variation Dε, which is independent between components, captures component-specific variation.

Factor models can give parsimonious descriptions of high-dimensional covariance structure; we set the upper triangle of B to zero (Geweke and Zhou, 1996).

Variational family parametrized by λ = (µ, B, δ): the dimension of λ grows linearly in d.

Page 27

Optimization of the lower bound

The generative representation of the factor model is the basis for reparametrization gradients in stochastic gradient optimization. Gradient expressions for variational parameters λ = (µ, B, δ):

$$\nabla_\mu \mathcal{L}(\lambda) = E_f\big(\nabla_\theta \log h(\mu + Bz + D\varepsilon) + (BB^\top + D^2)^{-1}(Bz + D\varepsilon)\big),$$

$$\nabla_B \mathcal{L}(\lambda) = E_f\big(\nabla_\theta \log h(\mu + Bz + D\varepsilon)\, z^\top + (BB^\top + D^2)^{-1}(Bz + D\varepsilon)\, z^\top\big),$$

$$\nabla_\delta \mathcal{L}(\lambda) = E_f\big(\mathrm{diag}\big(\nabla_\theta \log h(\mu + Bz + D\varepsilon)\, \varepsilon^\top + (BB^\top + D^2)^{-1}(Bz + D\varepsilon)\, \varepsilon^\top\big)\big).$$

The high-dimensional matrix inversions (BB⊺ + D²)⁻¹ can be done conveniently using the Sherman-Morrison-Woodbury formula.
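
A minimal sketch of that Woodbury step: applying (BB⊺ + D²)⁻¹ to a vector costs only a p × p solve, via (D² + BB⊺)⁻¹ = D⁻² − D⁻²B(I_p + B⊺D⁻²B)⁻¹B⊺D⁻². Names and sizes are illustrative.

```python
# Minimal sketch: (B B^T + D^2)^{-1} v by Sherman-Morrison-Woodbury.
import numpy as np

def woodbury_solve(B, delta, v):
    d2inv = 1.0 / delta**2                                # D^{-2} as a vector
    Dv = d2inv * v
    S = np.eye(B.shape[1]) + B.T @ (d2inv[:, None] * B)   # p x p "capacitance" matrix
    return Dv - d2inv * (B @ np.linalg.solve(S, B.T @ Dv))

# Check against direct inversion on a small example.
rng = np.random.default_rng(4)
B = rng.standard_normal((6, 2))
delta, v = rng.uniform(0.5, 1.5, 6), rng.standard_normal(6)
direct = np.linalg.solve(B @ B.T + np.diag(delta**2), v)
print(np.allclose(woodbury_solve(B, delta, v), direct))   # True
```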

Page 28

Logistic regression with random effects (Hosmer et al., 2013)

- Data on 500 subjects studied over seven years.
- Response: whether the subject is taking drugs from 3 or more different groups.
- Covariates for gender, race, age, number of outpatient mental health visits.
- Random intercept model.
- 8 fixed effects parameters, one variance parameter, 500 random intercepts.

Page 29

Logistic regression with random effects (Hosmer et al., 2013)

Page 30

Logistic regression with random effects (Hosmer et al., 2013)

[Figure: estimated marginal posterior densities for β0, βgender, βrace, βage, βM1, βM2, βM3, βIM and ζ, comparing Tan & Nott (2016), VAFC (p = 0), VAFC (p = 4) and VAFC (p = 20).]

Page 31

Logistic regression with random effects (Hosmer et al., 2013)

[Figure: scatter plots comparing the means (left) and standard deviations (right) of the variational distributions from VAFC and Tan & Nott.]

Page 32

Extensions

Combining factor and conditional independence structure can be fruitful.

Consider a time series with observations y₁, ..., yₙ. There is a state space structure with states b₁, ..., bₙ. Suppose the states are high-dimensional.

Model the posterior dependence of b = (b₁, ..., bₙ)⊺ through a dynamic factor model, as sketched below:
- factor structure to give a reduced-dimension description of the states; and
- conditional independence structure in time for the dynamic factors.
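
A heavily hedged generative sketch of one way to realize this combination (illustrative only, not the detailed construction of Quiroz et al., 2018): low-dimensional factors with Markov dependence in time drive high-dimensional states through a loading matrix.

```python
# Hedged sketch: factor structure across components, Markov (k = 1) structure in time.
import numpy as np

m, p, n = 50, 3, 100                     # state dim, number of factors, time points
rng = np.random.default_rng(5)
Lam = 0.3 * rng.standard_normal((m, p))  # loading matrix (toy values)
phi = 0.9                                # AR(1) coefficient for the factors
f = np.zeros((n, p))
for t in range(1, n):                    # Markov factors: block-tridiagonal precision
    f[t] = phi * f[t - 1] + rng.standard_normal(p)
b = f @ Lam.T                            # states b_t = Lam f_t, shape (n, m)
```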

Page 33

Example: A spatio-temporal model (Wikle and Hooten, 2006)

Spatio-temporal model of Wikle and Hooten (2006).

- Dataset on the spread of the Eurasian collared-dove across North America (North American Breeding Bird Survey).
- p = 111 spatial grid points, with the dove counts aggregated within each area.
- Vectors of spatial counts yt modelled as conditionally independent Poisson variables, where the log means are given by a latent high-dimensional Markovian dynamic process ut plus measurement error.
- The dynamic process ut evolves according to a discretized diffusion equation.

Page 34

Example: A spatio-temporal model (Wikle and Hooten, 2006)

- In this model, there are 4,223 unknowns.
- Dimension reduction: consider 4 factors for the parametrization of the spatial state vectors.
- 6,428 variational parameters, compared with the 8,923,199 (= d + d(d + 1)/2 with d = 4,223) in the unrestricted parametrization.

Page 35

Estimates of spatio-temporal intensity

[Figure: Gaussian VA (left) and MCMC (right) estimates of the spatio-temporal intensity for 1999-2003.]

Page 36

Spatially averaged intensity over time

[Figure: draws from the MCMC posterior (left) and VB posterior (right) of the spatially averaged intensity ϕt, for years 1986-2003.]

Page 37

Spatially varying diffusion coefficients

Columns are for locations with zero (left), low (middle) and high (right) total count.

[Figure: MCMC and VB density estimates of ψi at locations i = 1, 5, 35, 46, 96 and 105.]

Page 38

Future work

Using algorithms for Gaussian approximations within more complex procedures:
- Fitting Gaussian mixture approximations, Gaussian copula approaches (Guo et al., 2016; Miller et al., 2016).
- Gaussian copula approximations (Han et al., 2015).
- Skew normal copula.

How to make these methods easy to use?
- Amortized variational inference methods.

Page 39

References

D. Blei, A. Kucukelbir and J. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859-877, 2017.

J. Ormerod and M. Wand. Explaining variational approximations. The American Statistician, 64(2), 140-153, 2010.

Victor M.-H. Ong, David J. Nott and Michael S. Smith (2018). Gaussian variational approximation with a factor covariance structure. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2017.1390472.

Matias Quiroz, David J. Nott and Robert Kohn (2018). Gaussian variational approximation for high-dimensional state space models. arXiv:1801.07873.

Linda S. L. Tan and David J. Nott (2018). Gaussian variational approximation with sparse precision matrices. Statistics and Computing, 28(2), 259-275.

C. K. Wikle and M. B. Hooten. Hierarchical Bayesian spatio-temporal models for population spread. In Applications of Computational Statistics in the Environmental Sciences: Hierarchical Bayes and MCMC Methods, J. S. Clark and A. Gelfand (eds). Oxford University Press, 145-169, 2006.
