# Geostatistics for Large Data Sets ...


Whitney Huang

Geostatistical Modeling for Large Data Sets

Whitney Huang

October 28, 2014


Outline

Motivation


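Gaussian process (GP) geostatistics

Model:

$$Y(s) = \mu(s) + \eta(s) + \varepsilon(s), \quad s \in S \subset \mathbb{R}^d$$

where

$$\mu(s) = X^T(s)\beta, \quad \{\eta(s)\}_{s \in S} \sim GP(0, C(\cdot, \cdot)),$$

$$C(s, s') = \sigma^2 \rho_\theta(s - s'), \quad \text{and} \quad \varepsilon(s) \sim N(0, \tau^2) \ \ \forall s \in S$$

Log-likelihood: Given data $Y = (Y(s_1), \cdots, Y(s_n))^T$ and the design matrix $X$ whose $i$-th row is $X^T(s_i)$,

$$\ell_n(\beta, \theta, \sigma^2, \tau^2) \propto -\frac{1}{2} \log \left| \Sigma(\theta, \sigma^2) + \tau^2 I_n \right| - \frac{1}{2} (Y - X\beta)^T \left[ \Sigma(\theta, \sigma^2) + \tau^2 I_n \right]^{-1} (Y - X\beta)$$

where $\Sigma(\theta, \sigma^2)_{i,j} = \sigma^2 \rho_\theta(s_i - s_j)$, $i, j = 1, \cdots, n$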
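The log-likelihood above can be evaluated directly with a Cholesky factorization. The following is a minimal sketch, not the author's code: the exponential correlation $\rho_\theta(h) = \exp(-\|h\|/\theta)$, the function name `gp_loglik`, and the simulated inputs are all assumptions chosen for illustration. Unlike the displayed formula, it also includes the additive constant $-\tfrac{n}{2}\log 2\pi$.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist


def gp_loglik(y, X, coords, beta, sigma2, theta, tau2):
    """Exact Gaussian log-likelihood, including the constant term.

    Assumes rho_theta(h) = exp(-||h|| / theta); this correlation family is
    an illustrative choice and is not fixed by the slides.
    """
    n = len(y)
    dist = cdist(coords, coords)                       # n x n distance matrix
    cov = sigma2 * np.exp(-dist / theta) + tau2 * np.eye(n)
    chol, lower = cho_factor(cov, lower=True)          # the O(n^3) step
    resid = y - X @ beta
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))       # log|Sigma + tau2 I|
    quad = resid @ cho_solve((chol, lower), resid)     # quadratic form
    return -0.5 * (logdet + quad + n * np.log(2.0 * np.pi))


# Hypothetical data, only to exercise the function.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
X = np.column_stack([np.ones(50), coords[:, 0]])
y = rng.normal(size=50)
ll = gp_loglik(y, X, coords, beta=np.zeros(2), sigma2=1.0, theta=0.3, tau2=0.1)
```

Every evaluation inside an optimizer or MCMC loop repeats the factorization, which is why the cost grows so quickly with n.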


“Big n Problem” in geostatistics

- Modern environmental instruments have produced a wealth of space–time data ⇒ n is big
- Evaluating the likelihood function involves factorizing large covariance matrices, which generally requires
  - $O(n^3)$ operations
  - $O(n^2)$ memory
- Modeling strategies are needed to deal with large spatial data sets:
  - parameter estimation ⇒ MLE, Bayesian
  - spatial interpolation ⇒ Kriging
  - multivariate spatial data, spatio-temporal data
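To give a feel for the $O(n^3)$ and $O(n^2)$ scaling, here is a small back-of-the-envelope sketch. It assumes double-precision storage (8 bytes per entry) and the standard leading-order count of about $n^3/3$ floating-point operations for a dense Cholesky factorization; the function name `dense_covariance_cost` is hypothetical.

```python
def dense_covariance_cost(n, bytes_per_entry=8):
    """Rough storage (GiB) and Cholesky work (flops) for a dense n x n matrix."""
    memory_gib = n * n * bytes_per_entry / 2**30
    cholesky_flops = n**3 / 3.0          # leading-order operation count
    return memory_gib, cholesky_flops


for n in (1_000, 10_000, 100_000):
    mem, flops = dense_covariance_cost(n)
    print(f"n = {n:>7,d}: {mem:10.3f} GiB, {flops:.2e} flops")
```

Under these assumptions a dense covariance matrix for n = 100,000 already needs roughly 74.5 GiB, and doubling n multiplies the memory by 4 and the work by 8.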


Modeling strategies in the literature

- Covariance tapering (Furrer et al. 06, Kaufman et al. 08, Du et al. 09)
- Low–rank approximation (Cressie & Johannesson 08, Banerjee et al. 08)
- Likelihood approximation (Vecchia 88, Stein 04)
- Gaussian Markov random field approximation (Rue & Tjelmeland 02, Rue & Held 05, Lindgren et al. 11)



Methods

Covariance tapering (Furrer et al. 06)

We replace the covariance function $C(h)$ by

$$C_{\text{tap}}(h; \gamma) = \rho_{\text{tap}}(h; \gamma) \circ C(h)$$

where $\rho_{\text{tap}}(h; \gamma)$ is an isotropic correlation function with compact support ($\rho_{\text{tap}}(h; \gamma) = 0$ if $h \geq \gamma$) and $\circ$ denotes the Schur (elementwise) product
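A minimal sketch of tapering in practice, under stated assumptions: the Wendland-1 function is used as the taper (one common compactly supported choice, valid in up to three dimensions), the underlying covariance is exponential, and the names `wendland1` and `tapered_covariance` are hypothetical. Only pairs closer than $\gamma$ are ever touched, so the result is stored as a sparse matrix.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree


def wendland1(h, gamma):
    """Wendland-1 taper: (1 - h/gamma)^4_+ (1 + 4 h/gamma); zero for h >= gamma."""
    x = np.minimum(h / gamma, 1.0)
    return (1.0 - x) ** 4 * (1.0 + 4.0 * x)


def tapered_covariance(coords, sigma2, theta, gamma):
    """Sparse tapered covariance for C(h) = sigma2 * exp(-h / theta)."""
    n = coords.shape[0]
    # All pairs (i < j) whose distance is at most gamma.
    pairs = cKDTree(coords).query_pairs(gamma, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    vals = sigma2 * np.exp(-h / theta) * wendland1(h, gamma)   # Schur product
    upper = sparse.coo_matrix((vals, (i, j)), shape=(n, n))
    # Symmetrize and add the diagonal, where taper and correlation both equal 1.
    return (upper + upper.T + sigma2 * sparse.identity(n)).tocsc()


rng = np.random.default_rng(1)
coords = rng.uniform(size=(500, 2))
C_tap = tapered_covariance(coords, sigma2=1.0, theta=0.2, gamma=0.1)
density = C_tap.nnz / (500 * 500)      # fraction of stored entries
```

Because the taper is itself a valid correlation function, the Schur product stays positive definite, and sparse Cholesky routines can then exploit the zeros.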

Covariance tapering cont’d

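Low–rank approximation

$$Y = \eta + \varepsilon, \quad \varepsilon \sim MVN(0, \Sigma_\varepsilon)$$

$$\eta = H\alpha + \xi, \quad \xi \sim MVN(0, \Sigma_\xi)$$

$$\alpha \sim MVN(0, \Sigma_\alpha)$$

where $\alpha = (\alpha_1, \cdots, \alpha_p)^T$ with $p \ll n$, and $H$ is a mapping from the latent process, $\alpha$, to the true spatial process of interest, $\eta$. $\Sigma_\varepsilon$ and $\Sigma_\xi$ are diagonal.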


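Low–rank approximation cont’d

To carry out the spatial interpolation (i.e. kriging) of $\eta(s_0) \mid \{Y(s_i)\}_{i=1}^n$, one needs to compute

$$\left( H \Sigma_\alpha H^T + V \right)^{-1}$$

where $V = \Sigma_\varepsilon + \Sigma_\xi$.

Sherman–Morrison–Woodbury formula:

$$(A + BCD)^{-1} = A^{-1} - A^{-1} B \left( C^{-1} + D A^{-1} B \right)^{-1} D A^{-1}$$

In the case of the low–rank model, we have

$$\left( H \Sigma_\alpha H^T + V \right)^{-1} = V^{-1} - V^{-1} H \left( \Sigma_\alpha^{-1} + H^T V^{-1} H \right)^{-1} H^T V^{-1}$$

Since $V$ is diagonal, only $p \times p$ matrices need to be inverted.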
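The identity above can be checked numerically. This is a sketch with randomly generated stand-ins for $H$, $\Sigma_\alpha$ and the diagonal of $V$ (all hypothetical); the helper `woodbury_solve` is an illustrative name, and it applies the inverse to a vector using only a $p \times p$ solve.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 10
H = rng.normal(size=(n, p))                 # stand-in for the n x p mapping
A = rng.normal(size=(p, p))
Sigma_alpha = A @ A.T + p * np.eye(p)       # p x p, positive definite
v = rng.uniform(0.5, 2.0, size=n)           # diagonal entries of V


def woodbury_solve(H, Sigma_alpha, v, y):
    """Return (H Sigma_alpha H^T + V)^{-1} y using only a p x p solve."""
    Vinv_y = y / v                           # V^{-1} y, V is diagonal
    Vinv_H = H / v[:, None]                  # V^{-1} H
    small = np.linalg.inv(Sigma_alpha) + H.T @ Vinv_H      # p x p matrix
    return Vinv_y - Vinv_H @ np.linalg.solve(small, H.T @ Vinv_y)


y = rng.normal(size=n)
fast = woodbury_solve(H, Sigma_alpha, v, y)
direct = np.linalg.solve(H @ Sigma_alpha @ H.T + np.diag(v), y)   # O(n^3) reference
```

The low-rank route costs on the order of $np^2$ operations instead of $n^3$, which is the whole point of choosing $p \ll n$.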


Fixed Rank Kriging (Cressie & Johannesson 08)

$$Y = X\beta + ZW^* + \varepsilon$$

Let $W^* = \{w(s_i^*)\}_{i=1}^p$ be latent variables at $p \ll n$ known knots $\{s_i^*\}_{i=1}^p$ and let $Z(\cdot)$ be a known basis function.

Fixed rank kriging is equivalent to the following low-rank model:

$$Y(s) = X(s)\beta + \sum_{j=1}^{p} Z_j(s)\, w(s_j^*) + \varepsilon(s)$$


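Gaussian Predictive Process (Banerjee et al. 08)

Use a low-rank model defined on a set of knots to approximate the original spatial process

$$Y(s) = X(s)^T \beta + \eta(s) + \varepsilon(s)$$

Knots: $\{s_1^*, \cdots, s_p^*\}$ where $p \ll n$

$$\Rightarrow \alpha = \{\alpha(s_i^*)\}_{i=1}^p, \quad H(\theta) = \left[ \mathrm{Cov}(s_i, s_j^*; \theta) \right]^T \left[ \Sigma_\alpha \right]^{-1}$$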
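A small sketch of the predictive-process construction in one dimension. The exponential covariance, the knot placement, and the helper `exp_cov` are assumptions made for illustration; here $H$ is formed as the $n \times p$ matrix of cross-covariances between observation sites and knots, multiplied by $\Sigma_\alpha^{-1}$.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, theta=0.3):
    """Exponential covariance between two sets of 1-D locations (illustrative)."""
    return sigma2 * np.exp(-np.abs(a[:, None] - b[None, :]) / theta)


s = np.linspace(0.0, 1.0, 200)            # n = 200 observation locations
knots = np.linspace(0.0, 1.0, 15)         # p = 15 knots, p << n

Sigma_alpha = exp_cov(knots, knots)       # p x p covariance of alpha
# H = C(s, s*) Sigma_alpha^{-1}, computed with a solve instead of an inverse.
H = np.linalg.solve(Sigma_alpha, exp_cov(knots, s)).T     # n x p
C_pp = H @ Sigma_alpha @ H.T              # covariance of H alpha, rank at most p
C_full = exp_cov(s, s)                    # covariance of the original process
```

In this construction the approximation reproduces the original covariance exactly at the knots, while its variance elsewhere never exceeds that of the original process.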


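Likelihood approximation (Vecchia 88)

Partition the observation vector $Y$ into sub–vectors $Y_1, \cdots, Y_b$ and let $Y_{(j)} = (Y_1^T, \cdots, Y_j^T)^T$. The exact likelihood factorizes as

$$p(Y; \beta, \theta) = p(Y_1; \beta, \theta) \prod_{j=2}^{b} p(Y_j \mid Y_{(j-1)}; \beta, \theta)$$

Approximate the exact likelihood by replacing $Y_{(j-1)}$ by a sub–vector $S_{(j-1)}$ of $Y_{(j-1)}$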
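The conditioning idea can be sketched as follows. The assumptions here are mine rather than the slides': each $Y_j$ is a single observation, the mean is zero, locations are sorted on a line, and $S_{(j-1)}$ is simply the $m$ most recent previous observations. The function names are hypothetical.

```python
import numpy as np


def gaussian_logpdf(y, cov):
    """Zero-mean multivariate normal log-density (exact likelihood)."""
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (logdet + quad + len(y) * np.log(2.0 * np.pi))


def vecchia_loglik(y, cov, m):
    """Sum of log p(y_j | at most m preceding observations)."""
    total = 0.0
    for j in range(len(y)):
        cond = np.arange(max(0, j - m), j)       # indices forming S_(j-1)
        mean, var = 0.0, cov[j, j]
        if cond.size:
            # Kriging weights for y_j given the conditioning set.
            w = np.linalg.solve(cov[np.ix_(cond, cond)], cov[cond, j])
            mean = w @ y[cond]
            var = cov[j, j] - w @ cov[cond, j]
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[j] - mean) ** 2 / var)
    return total


rng = np.random.default_rng(3)
n = 60
s = np.sort(rng.uniform(size=n))
cov = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.3) + 0.1 * np.eye(n)
y = rng.multivariate_normal(np.zeros(n), cov)

exact = gaussian_logpdf(y, cov)
approx = vecchia_loglik(y, cov, m=5)     # each term needs only a 5 x 5 solve
```

With `m = n - 1` the conditioning sets are complete and the sum reduces to the exact log-likelihood; smaller `m` trades accuracy for solves of size at most `m`.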

Markov Random Fields

Gaussian Markov Random Fields (GMRF)

Definition: Let the neighbors of a point $i$ be the points $N_i$ that are “close” to $i$. A Gaussian random field $X \sim N(\mu, \Sigma = Q^{-1})$ that satisfies

$$p(X_i \mid X_j, j \neq i) = p(X_i \mid X_j : j \in N_i)$$

is a Gaussian Markov random field (GMRF), with $Q_{ij} = 0$ iff $X_i \perp X_j \mid X_{-ij}$ for $i \neq j$
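A stationary AR(1) process is perhaps the simplest GMRF and illustrates the definition: each $X_i$ has only $X_{i-1}$ and $X_{i+1}$ as neighbors, so the precision matrix is tridiagonal even though the covariance matrix is dense. The example below is an illustration chosen here, not taken from the slides; unit innovation variance is assumed.

```python
import numpy as np
from scipy import sparse

n, phi = 6, 0.7

# Precision matrix of a stationary AR(1) process with unit innovation variance.
main = np.full(n, 1.0 + phi**2)
main[[0, -1]] = 1.0                      # boundary entries
off = np.full(n - 1, -phi)
Q = sparse.diags([off, main, off], offsets=[-1, 0, 1], format="csc")

# Inverting Q gives the covariance, whose entries are all non-zero.
Sigma = np.linalg.inv(Q.toarray())
```

Here `Q[0, 2]` is zero because $X_1$ and $X_3$ are conditionally independent given the rest, yet `Sigma[0, 2]` equals $\phi^2 / (1 - \phi^2)$, which is non-zero.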


Remarks: GP vs. GMRF in geostatistical modeling

- +: The GP model is widely used for modeling continuously indexed spatial data; its covariance function characterizes the process properties
- –: GP inference involves factorizing covariance matrices
- +: The GMRF model is computationally efficient because its precision matrix is sparse
- –: GMRFs apply only to discretely indexed spatial data

Main idea of GMRF approach:

GP inference


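GP/Stochastic Partial Differential Equation (SPDE) connection (Whittle 1954, 1963)

A Gaussian process $Y(s)$ with Matérn covariance function is a stationary solution to the linear fractional stochastic partial differential equation

$$\left( \alpha^2 - \Delta \right)^{\kappa/2} Y(s) = W(s), \quad \kappa = \nu + \frac{d}{2}, \quad \nu > 0$$

where $W(s)$ is Gaussian white noise and

- $\Delta = \sum_{i=1}^{d} \frac{\partial^2}{\partial s_i^2}$ is the Laplacian
- $d$ is the dimension of the spatial domain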


An explicit link between GP and GMRF via SPDE (Lindgren et al. 11)

- Establishes the link between a GP with Matérn covariance function (when $\nu + d/2$ is an integer) and a GMRF
- (Bayesian) inference can be done using the integrated nested Laplace approximation (INLA) approach
- Extensions to nonstationary models, models on manifolds, multivariate models, and spatio-temporal models are relatively easy


Extensions

- Non-stationary model on a sphere: $(\alpha^2(s) + \Delta)^{\kappa} \ldots$
- Non-separable anisotropic space-time model: $\frac{\partial}{\partial t} + (\alpha^2 + m \cdot \nabla - \nabla \cdot H \nabla \ldots$

Appendix: For Further Reading

- H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall/CRC, 2005.
- S. Banerjee, A. E. Gelfand, A. O. Finley, and H. Sang. Gaussian Predictive Process Models for Large Spatial Data Sets. JRSSB, 70:825–848, 2008.
- N. A. C. Cressie and G. Johannesson. Fixed Rank Kriging for Very Large Spatial Data Sets. JRSSB, 70:209–226, 2008.
- J. Du, H. Zhang, and V. S. Mandrekar. Fixed–Domain Asymptotic Properties of Tapered Maximum Likelihood Estimators. The Annals of Statistics, 37:3330–3361, 2009.
- R. Furrer, M. G. Genton, and D. W. Nychka. Covariance Tapering for Interpolation of Large Spatial Datasets. Journal of Computational and Graphical Statistics, 15:502–523, 2006.
- C. G. Kaufman, M. J. Schervish, and D. W. Nychka. Covariance Tapering for Likelihood–Based Estimation in Large Spatial Data Sets. Journal of the American Statistical Association, 103:1545–1555, 2008.
- F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. JRSSB, 73:423–498, 2011.
- H. Rue and H. Tjelmeland. Fitting Gaussian Markov Random Fields to Gaussian Field. Scandinavian Journal of Statistics, 29:31–49, 2002.
- M. L. Stein, Z. Chi, and L. J. Welty. Approximating Likelihoods for Large Spatial Data Sets. JRSSB, 66:275–296, 2004.
- A. V. Vecchia. Estimation and Model Identification for Continuous Spatial Processes. JRSSB, 50:297–312, 1988.

