
Kriging

Introduction

Kriging, at its most fundamental level, is an interpolation method used to convert partial observations of a spatial field into predictions of that field at unobserved locations. In geostatistics, the field may represent the permeability of soil and the observations may come from physical measurements in a well-bore. In another example, of particular importance for environmental problems, the field may represent the concentration of pollutants in a contaminated site and the observations may come from remote sensing equipment. In this article we give a short account of kriging under some basic modeling assumptions and observation scenarios. We give a number of equivalent derivations of the kriging prediction, each shedding light on a particular aspect of kriging. Taken as a whole, these derivations provide a broader perspective than any single derivation of kriging can provide.

Kriging is named after the South African mining engineer D. G. Krige, who during the 1950s developed statistical techniques for predicting ore-grade distributions from ground samples [1]. The term kriging was coined by Matheron [2], who is one of the most influential contributors to the subject. By now, kriging has become a predominant prediction tool in spatial statistics (see Ref. 3 for a historical account) and can be shown to be related to spline interpolation [4], generalized least squares (GLS) regression, Wiener filtering, and objective analysis in meteorology [5]. Book-length accounts of kriging can be found in Refs 2, 5-13.

One of the fundamental assumptions used in kriging is that the spatial field of interest is a realization of a random field or a stochastic process. The randomness can be a consequence of some physical mechanism generating the spatial field (e.g., a diffusion process driven by random forcing) or may simply be used to account for uncertainty. Either way, the random field model is used to define the kriging predictor as the optimal linear unbiased predictor, where optimality and unbiasedness are defined through the randomness inherent in the spatial field model. In this article, we explore two alternatives for characterizing the kriging prediction: GLS regression and splines. The GLS characterization arguably gives the cleanest and most easily interpreted formula for kriging. The spline characterization, in contrast, is useful for relating all three characterizations. It also allows one to easily generalize from the pointwise predictions discussed in this article to the prediction of other linear functionals (integrals or derivatives, for example).

(Based in part on the article “Kriging” by Paul Switzer, which appeared in the Encyclopedia of Environmetrics.)

The Basic Model

In this section, we present the basic modeling assumptions that provide the main ingredients for kriging and serve as the basis for the generalizations discussed in the section titled Extensions to the Basic Model. We start by letting $Y(x)$ denote the spatial field of interest, where $x$ ranges in $\mathbb{R}^d$. The basic kriging model stipulates the following form for $Y$:

\[
Y(x) = \sum_{p=1}^{N} \beta_p f_p(x) + Z(x) \qquad (1)
\]

where $Z$ is a centered (i.e., mean zero) correlated random field, the functions $f_p$ are known basis functions, and the $\beta_p$ are unknown coefficients. We assume the field $Y$ is observed at spatial locations $x_1, \ldots, x_n$ with some additive noise corrupting the observations. In particular, our observations constitute $n$ measurements $y_1, \ldots, y_n$, where each $y_k$ has the form

\[
y_k = Y(x_k) + \sigma \varepsilon_k
\]

where $\varepsilon_1, \ldots, \varepsilon_n$ are independent mean zero random variables (also independent of $Y$) with standard deviation 1. The goal is then to predict $Y(x_0)$ at some spatial location $x_0 \in \mathbb{R}^d$. Kriging produces such a prediction, denoted $\hat{Y}(x_0)$, which is a linear function of the data $y_1, \ldots, y_n$.

The additive errors $\sigma \varepsilon_k$ constitute what is called a nugget effect. The nugget effect can model instrumental noise or the presence of a microscale process, possibly discontinuous, which has correlation length scales much smaller than those of $Z$. The nomenclature comes from the geostatistics literature (see Ref. 8), which uses a nugget effect to model the presence of small nuggets that contribute a spatially uncorrelated (or


nearly so) discontinuous field added to the larger scale concentration variations modeled with $Y$. When there is no nugget effect (i.e., $\sigma = 0$), kriging becomes an exact interpolator.

To fix notation for the remainder of the article, we let $K(x, y) \equiv \operatorname{cov}(Z(x), Z(y))$ denote the covariance function of $Z$. Let $y \equiv (y_1, \ldots, y_n)^t$ denote the vector of observations. Let $\beta \equiv (\beta_1, \ldots, \beta_N)^t$ denote the vector of unknown coefficients and $X = [f_p(x_k)]_{k,p}$ denote the matrix of covariates at the observations, where the rows index the observations and the columns index the covariates. Also let $x \equiv (f_1(x_0), \ldots, f_N(x_0))$ be the row vector of the covariates at the prediction location $x_0$. Finally, the matrix $\Sigma = [K(x_i, x_j)]_{i,j} + \sigma^2 I$ denotes the covariance matrix for the observation vector $y$, where $I$ is the identity matrix, and $\Sigma_0 = (K(x_0, x_1), \ldots, K(x_0, x_n))$ denotes the row vector of the covariances of $Y(x_0)$ with the observations in $y$.
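To make this notation concrete, the following minimal sketch assembles these objects numerically. It assumes a one-dimensional domain, an exponential covariance $K(x, y) = \exp(-|x - y|)$, and the covariates $f_1(x) = 1$ and $f_2(x) = x$; none of these particular choices come from the article. This sketch and the ones below use Python with numpy.

import numpy as np

rng = np.random.default_rng(0)
n = 10
xs = np.sort(rng.uniform(0, 1, n))    # observation locations x_1, ..., x_n
x0 = 0.5                              # prediction location x_0
sigma = 0.1                           # nugget standard deviation

def K(u, v):
    # Illustrative covariance function; any positive definite choice works here.
    return np.exp(-np.abs(u - v))

X = np.column_stack([np.ones(n), xs])       # X = [f_p(x_k)], rows index observations
x_row = np.array([1.0, x0])                 # x = (f_1(x_0), ..., f_N(x_0))
Sigma = K(xs[:, None], xs[None, :]) + sigma**2 * np.eye(n)   # [K(x_i, x_j)] + sigma^2 I
Sigma0 = K(x0, xs)                          # (K(x_0, x_1), ..., K(x_0, x_n))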

Kriging with Generalized Least Squares Regression

The most intuitive definition of kriging is easiest to derive when additionally assuming that both $Z$ and the observation errors $\varepsilon_1, \ldots, \varepsilon_n$ are Gaussian. In actuality, Gaussianity is unnecessary, but it makes the exposition somewhat cleaner. Under the Gaussian assumption one has the familiar regression characterization of the observations: $y \sim N[X\beta, \Sigma]$ (i.e., $y$ is a Gaussian vector with mean $X\beta$ and covariance matrix $\Sigma$). Even more is true: by Gaussianity of $Y$, the joint distribution of $y$ and $Y(x_0)$ is given by

\[
\begin{pmatrix} Y(x_0) \\ y \end{pmatrix} \sim N\left( \begin{pmatrix} x\beta \\ X\beta \end{pmatrix}, \begin{pmatrix} K(x_0, x_0) & \Sigma_0 \\ \Sigma_0^t & \Sigma \end{pmatrix} \right) \qquad (2)
\]

If $\beta$ were known, one would simply use the conditional expectation $E(Y(x_0) \mid y)$ to predict $Y(x_0)$, which can be computed as follows:

\[
E(Y(x_0) \mid y) = \Sigma_0 \Sigma^{-1} (y - X\beta) + x\beta \qquad (3)
\]

However, the kriging model (1) assumes $\beta$ is unknown. To account for this uncertainty, it is natural to use the regression format of the observations $y \sim N[X\beta, \Sigma]$ to estimate $\beta$. Indeed, the GLS estimate of $\beta$ is given by

\[
\hat{\beta} = (X^t \Sigma^{-1} X)^{-1} X^t \Sigma^{-1} y
\]

Now, by replacing $\beta$ in (3) with the GLS estimate $\hat{\beta}$, one obtains the following kriging prediction $\hat{Y}(x_0)$ (even if Gaussianity fails to hold):

\[
\hat{Y}(x_0) = \Sigma_0 \Sigma^{-1} (y - X\hat{\beta}) + x\hat{\beta} \qquad (4)
\]

This is arguably the easiest way to construct the kriging prediction, and it makes clear how one can extend the basic kriging model to more complex observation scenarios and predictions. For example, if one observes linear functionals of $Y$, then the basic structure of (2) does not change. The only difference is a change in the design matrix $X$ and the covariance matrices $\Sigma$ and $\Sigma_0$. Once these changes are made, (4) gives the appropriate kriging estimate for these generalizations.
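As a sketch of (4) under the illustrative setup above (exponential covariance, linear trend, synthetic data; all names are assumptions, not the article's):

import numpy as np

rng = np.random.default_rng(0)
n = 10
xs = np.sort(rng.uniform(0, 1, n))
x0, sigma = 0.5, 0.1
K = lambda u, v: np.exp(-np.abs(u - v))
X = np.column_stack([np.ones(n), xs])
x_row = np.array([1.0, x0])
Sigma = K(xs[:, None], xs[None, :]) + sigma**2 * np.eye(n)
Sigma0 = K(x0, xs)
y = rng.multivariate_normal(X @ np.array([1.0, 2.0]), Sigma)  # synthetic data

# GLS estimate: beta_hat = (X^t Sigma^{-1} X)^{-1} X^t Sigma^{-1} y.
beta_hat = np.linalg.solve(X.T @ np.linalg.solve(Sigma, X),
                           X.T @ np.linalg.solve(Sigma, y))

# Universal kriging prediction (4).
Y_hat = Sigma0 @ np.linalg.solve(Sigma, y - X @ beta_hat) + x_row @ beta_hat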

In the simplified case when $\beta$ is known, the kriging estimate $\hat{Y}(x_0)$ generated from (3) is often referred to as simple kriging. When $N = 1$ and the covariate $f_1(x)$ is a constant, $\hat{Y}(x_0)$ from (4) is referred to as ordinary kriging. Finally, the prediction $\hat{Y}(x_0)$ with no simplifying assumptions on the size of $N$ or the covariates $f_p$ is generally referred to as universal kriging.

Kriging as a Best Linear Unbiased Predictor

Our second characterization of kriging is arguably the most popular. It defines kriging as optimal linear unbiased prediction. One advantage of this viewpoint is that it makes clear why one should use kriging: it is unbiased and has the smallest expected (squared) prediction error among all linear unbiased estimates of $Y(x_0)$.

To derive this characterization, first consider the set of linear combinations of the data $\lambda^t y = \sum_{k=1}^{n} \lambda_k y_k$, where $\lambda \equiv (\lambda_1, \ldots, \lambda_n)^t$. One of these linear combinations is the kriging prediction $\hat{Y}(x_0)$. To isolate which one, first restrict $\lambda$ by requiring that $\lambda^t y$ and $Y(x_0)$ have the same expected value, irrespective of the unknown coefficient vector $\beta$. We call this condition unbiasedness. To enforce this condition, notice that $E\lambda^t y = \lambda^t X \beta$ and $E Y(x_0) = x\beta$. Therefore it is clear that $\lambda^t X = x$ (or equivalently $X^t \lambda = x^t$) is a sufficient condition for unbiasedness. Of course, the vector $x^t$ must be a member of the span of the columns of $X^t$ for such a vector $\lambda$ to exist. Assuming there exists at least one such $\lambda$, we can isolate the optimal unbiased linear predictor (the kriging predictor) by minimizing the mean square prediction


error $E\bigl[Y(x_0) - \lambda^t y\bigr]^2$ over the set of all $\lambda$ which satisfy $X^t \lambda = x^t$ (the unbiasedness constraint). Using the joint distribution (2), one gets

\[
E\bigl[Y(x_0) - \lambda^t y\bigr]^2 = \sigma^2 \sum_{k=1}^{n} \lambda_k^2 + \sum_{k,j=0}^{n} \lambda_k \lambda_j K(x_k, x_j), \quad \text{where } \lambda_0 = -1 \qquad (5)
\]

Then, using Lagrange multipliers, the minimizer of the above quantity over the set of $\lambda$ which satisfy $X^t \lambda = x^t$ is characterized as follows:

\[
\hat{Y}(x_0) = \lambda^t y, \quad \text{where} \quad
\begin{pmatrix} \Sigma & X \\ X^t & 0 \end{pmatrix}
\begin{pmatrix} \lambda \\ \xi \end{pmatrix} =
\begin{pmatrix} \Sigma_0^t \\ x^t \end{pmatrix} \qquad (6)
\]

where $\xi$ is a vector of Lagrange multipliers. Note that (5) also allows one to quantify the prediction error of the kriging prediction $\hat{Y}(x_0)$. At this point it is not entirely clear how this estimate relates to the one given in (4). As shown in the next section, the easiest way to relate (6) and (4) is through the third characterization: spline smoothers.

Remark: A sufficient condition for a minimizer of (5) to exist over the set of $\lambda$ which satisfy $X^t \lambda = x^t$ is that $X^t X$ be nonsingular and $\sum_{k,j=0}^{n} \lambda_k \lambda_j K(x_k, x_j) \geq 0$ for any $\lambda$. If not, and there exists a $\lambda$ such that $\sum_{k,j=0}^{n} \lambda_k \lambda_j K(x_k, x_j) < 0$, for example, then the right-hand side of (5) may not have a finite minimizer (the right-hand side may be unbounded from below). Indeed, this exposes the need for the covariance function $K(x, y)$ to be positive definite when constructing the kriging prediction.
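The system (6) can also be solved directly. The sketch below (same illustrative setup as before) assembles the bordered matrix, extracts the weights $\lambda$, and checks the unbiasedness constraint:

import numpy as np

rng = np.random.default_rng(0)
n, N = 10, 2
xs = np.sort(rng.uniform(0, 1, n))
x0, sigma = 0.5, 0.1
K = lambda u, v: np.exp(-np.abs(u - v))
X = np.column_stack([np.ones(n), xs])
x_row = np.array([1.0, x0])
Sigma = K(xs[:, None], xs[None, :]) + sigma**2 * np.eye(n)
Sigma0 = K(x0, xs)
y = rng.multivariate_normal(X @ np.array([1.0, 2.0]), Sigma)

# Bordered system [[Sigma, X], [X^t, 0]] (lambda, xi) = (Sigma_0^t, x^t).
A = np.block([[Sigma, X], [X.T, np.zeros((N, N))]])
lam = np.linalg.solve(A, np.concatenate([Sigma0, x_row]))[:n]

assert np.allclose(X.T @ lam, x_row)   # unbiasedness: X^t lambda = x^t
Y_hat = lam @ y                        # the kriging prediction lambda^t y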

Kriging as a Spline Interpolator

There are a number of advantages to a spline characterization of the kriging prediction, one of which is that the estimate is written as a function of $x_0$. This allows easy computation of $\hat{Y}(x_0)$ at a large number of points $x_0$ simultaneously. The spline characterization of $\hat{Y}(x_0)$ is given by

\[
\hat{Y}(x_0) = \sum_{k=1}^{n} c_k K(x_0, x_k) + \sum_{p=1}^{N} b_p f_p(x_0), \quad \text{where} \quad
\begin{pmatrix} \Sigma & X \\ X^t & 0 \end{pmatrix}
\begin{pmatrix} c \\ b \end{pmatrix} =
\begin{pmatrix} y \\ 0 \end{pmatrix} \qquad (7)
\]

This is the typical form of a spline estimate where, in the spline literature, $K$ is referred to as a reproducing kernel and the locations $x_k$ are called the spline knots (see Ref. 4, for example).
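A sketch of (7) under the same illustrative setup: the coefficients $(c, b)$ are computed once, after which $\hat{Y}(x_0)$ is evaluated cheaply over a whole grid of prediction points:

import numpy as np

rng = np.random.default_rng(0)
n, N = 10, 2
xs = np.sort(rng.uniform(0, 1, n))
sigma = 0.1
K = lambda u, v: np.exp(-np.abs(u - v))
X = np.column_stack([np.ones(n), xs])
Sigma = K(xs[:, None], xs[None, :]) + sigma**2 * np.eye(n)
y = rng.multivariate_normal(X @ np.array([1.0, 2.0]), Sigma)

# Solve [[Sigma, X], [X^t, 0]] (c, b) = (y, 0) once.
A = np.block([[Sigma, X], [X.T, np.zeros((N, N))]])
cb = np.linalg.solve(A, np.concatenate([y, np.zeros(N)]))
c, b = cb[:n], cb[n:]

# Evaluate Y_hat(x0) = sum_k c_k K(x0, x_k) + sum_p b_p f_p(x0) on a fine grid.
grid = np.linspace(0, 1, 200)
F0 = np.column_stack([np.ones(grid.size), grid])    # covariates at grid points
Y_hat = K(grid[:, None], xs[None, :]) @ c + F0 @ b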

At this point it is easy to establish the equivalence of all three characterizations: (4), (6), and (7). We start by showing the equivalence of (6) and (7):

\[
\lambda^t y = (\lambda^t, \xi^t)\begin{pmatrix} y \\ 0 \end{pmatrix}
\overset{\text{by }(6)}{=} (\Sigma_0, x)\begin{pmatrix} \Sigma & X \\ X^t & 0 \end{pmatrix}^{-1}\begin{pmatrix} y \\ 0 \end{pmatrix}
\overset{\text{by }(7)}{=} (\Sigma_0, x)\begin{pmatrix} c \\ b \end{pmatrix}
= \sum_{k=1}^{n} c_k K(x_0, x_k) + \sum_{p=1}^{N} b_p f_p(x_0)
\]

where the second equality also uses the symmetry of the bordered matrix in (6). Note that we are tacitly assuming the above matrix inverse exists. A sufficient condition for this existence is that $\Sigma$ be strictly positive definite and $X^t X$ be nonsingular (see Section 3.4.1 of Ref. 7 for details). To see why the spline interpolator is equivalent to the GLS characterization (4), simply notice that $b = \hat{\beta}$ and $c = \Sigma^{-1}(y - X\hat{\beta})$ satisfy (7). This establishes the equivalence of all three characterizations.
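A quick numerical check (same illustrative setup) that the GLS form (4), the BLUP form (6), and the spline form (7) return the same number:

import numpy as np

rng = np.random.default_rng(0)
n, N = 10, 2
xs = np.sort(rng.uniform(0, 1, n))
x0, sigma = 0.5, 0.1
K = lambda u, v: np.exp(-np.abs(u - v))
X = np.column_stack([np.ones(n), xs])
x_row = np.array([1.0, x0])
Sigma = K(xs[:, None], xs[None, :]) + sigma**2 * np.eye(n)
Sigma0 = K(x0, xs)
y = rng.multivariate_normal(X @ np.array([1.0, 2.0]), Sigma)
A = np.block([[Sigma, X], [X.T, np.zeros((N, N))]])

beta_hat = np.linalg.solve(X.T @ np.linalg.solve(Sigma, X),
                           X.T @ np.linalg.solve(Sigma, y))
gls = Sigma0 @ np.linalg.solve(Sigma, y - X @ beta_hat) + x_row @ beta_hat
blup = np.linalg.solve(A, np.concatenate([Sigma0, x_row]))[:n] @ y
cb = np.linalg.solve(A, np.concatenate([y, np.zeros(N)]))
spline = cb[:n] @ K(x0, xs) + cb[n:] @ x_row

assert np.allclose([gls, blup], spline)   # all three characterizations agree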

The Variogram and Generalized Covariance Functions

In this section, we investigate an important fact about kriging: often one does not need to specify the full covariance structure of the process $Z$ to compute the kriging prediction and the mean square prediction error. In particular, depending on which covariates are present in the kriging model (1), there are different covariance functions $K$ which lead to the same prediction and mean square prediction error. This allows simplified modeling assumptions and leads to the theory of intrinsic random functions and generalized covariance functions developed by Matheron [14].

The classic example of a modeling simplification occurs when one of the covariates $f_p(x)$ is a constant in the kriging model (1). In this case, it can be shown that the kriging prediction and mean square prediction error only depend on what is called the variogram, defined as $\operatorname{var}(Z(x) - Z(y)) = K(x,x) + K(y,y) - 2K(x,y)$, in contrast to the full covariance


function $K(x, y)$ (see Refs 7 and 8). To be explicit, when computing the kriging prediction in (4), (6), or (7) and the mean square prediction error (5), one can replace all occurrences of $K(x,y)$ with the function $-\frac{1}{2}\operatorname{var}(Z(x) - Z(y))$ and obtain the same quantity. This is important since multiple covariance functions may have the same variogram. For example, the two covariance functions $(1 - |x - y|)_+$ and $|x| + |y| - |x - y|$ both have the same variogram when $x, y \in (0, 1)$. The advantage, then, is that one does not need to spend time deciding between two competing covariance models if they have the same variogram, for they yield the same predictions and mean square prediction errors. Another advantage is that variograms must satisfy a somewhat less stringent requirement (they are conditionally negative definite) than the positive definite condition required for covariance functions (see Section 2.3.6 in Ref. 8). The ubiquity of the variogram in the kriging literature can therefore be partially understood through the simple fact that it is common for kriging models to have at least one of the covariates be a constant function.
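This invariance is easy to verify numerically. The sketch below (illustrative setup; the constant covariate is essential) solves the system (6) once with $K(x,y)$ and once with $-\frac{1}{2}\operatorname{var}(Z(x) - Z(y)) = K(x,y) - \frac{1}{2}K(x,x) - \frac{1}{2}K(y,y)$, and obtains the same prediction:

import numpy as np

rng = np.random.default_rng(0)
n, N = 10, 2
xs = np.sort(rng.uniform(0, 1, n))
x0, sigma = 0.5, 0.1
K = lambda u, v: np.exp(-np.abs(u - v))
G = lambda u, v: K(u, v) - 0.5 * (K(u, u) + K(v, v))   # minus one-half the variogram
X = np.column_stack([np.ones(n), xs])                  # includes a constant covariate
x_row = np.array([1.0, x0])
y = rng.standard_normal(n)                             # any data vector will do

def predict(C):
    # BLUP (6) with the kernel C standing in for K.
    Sigma = C(xs[:, None], xs[None, :]) + sigma**2 * np.eye(n)
    A = np.block([[Sigma, X], [X.T, np.zeros((N, N))]])
    lam = np.linalg.solve(A, np.concatenate([C(x0, xs), x_row]))[:n]
    return lam @ y

assert np.isclose(predict(K), predict(G))   # identical predictions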

The simplification provided by the variogram can be generalized further. In particular, suppose the covariates $\{f_p(x) : p = 1, \ldots, N\}$ contain all monomials with degree less than or equal to some nonnegative integer $k_0$. In addition, suppose there exists a function $G(x,y)$ which is related to $K(x,y)$ as follows:

\[
K(x,y) = G(x,y) + \sum_{0 \le |i| \le k_0} a_i(y)\, x^i + \sum_{0 \le |i| \le k_0} b_i(x)\, y^i + \sum_{0 \le |i| + |j| \le 2k_0} c_{i,j}\, x^i y^j \qquad (8)
\]

where $i, j$ are $d$-dimensional multi-indices; $a_i(y)$ and $b_i(x)$ are arbitrary functions; and $c_{i,j}$ is an arbitrary real number for each $i, j$. Then, to compute (4), (5), (6), or (7), one can simply replace all occurrences of $K(x,y)$ with $G(x,y)$ to obtain the same quantity. A derivation is beyond the scope of this article; however, we remark that the natural way to show this fact is through the kriging characterization given by (6) along with the unbiasedness condition $X^t \lambda = x^t$. Any function $G(x,y)$ which satisfies (8) for some positive definite function $K(x,y)$ is said to be a generalized covariance function of order $k_0$ [7, 12, 14]. A detailed study of the theory of generalized covariance functions can be found in Refs 7 and 14.

Although the form of (8) seems complicated, there are rich classes of functions $G(x,y)$ which satisfy (8). For example, any covariance function $K(x,y)$ can be written as in (8) when $k_0 = 0$ and $G(x,y)$ is set to $-\frac{1}{2}\operatorname{var}(Z(x) - Z(y))$. This recovers the aforementioned fact that if one of the covariates $f_p(x)$ is constant, then the kriging prediction only depends on the variogram. Another important class of generalized covariance functions is indexed by a single parameter $\alpha > 0$ and given by

\[
G_\alpha(x,y) =
\begin{cases}
(-1)^{1 + \lfloor \alpha/2 \rfloor}\, |x - y|^\alpha, & \text{when } \alpha/2 \notin \mathbb{Z} \\
(-1)^{1 + \alpha/2}\, |x - y|^\alpha \log |x - y|, & \text{when } \alpha/2 \in \mathbb{Z}
\end{cases}
\]

where $\mathbb{Z}$ denotes the set of integers. It can be shown that $G_\alpha(x,y)$ is a generalized covariance function of order $k_0 = \lfloor \alpha/2 \rfloor$ (see Sections 4.5.5 and 4.5.7 in Ref. 7). Therefore, one can use $G_\alpha(x,y)$ in place of $K(x,y)$ in any of the formulas (4), (5), (6), or (7) so long as the set of covariates $\{f_p(x) : p = 1, \ldots, N\}$ contains all monomials of order less than or equal to $\lfloor \alpha/2 \rfloor$. Note that when $0 < \alpha < 2$, the function $G_\alpha(x,y)$ corresponds to a generalized covariance function of order $k_0 = 0$ for the fractional Brownian covariance $K(x,y) = |x|^\alpha + |y|^\alpha - |x - y|^\alpha$. Another important case occurs when $d = 2$, $\alpha = 2$, and the set of covariates $\{f_p(x) : p = 1, \ldots, N\}$ is exactly the set of monomials of order less than or equal to $k_0 = 1$. In this case, kriging with the generalized covariance $G_2(x,y)$ yields the classic thin-plate spline smoother (see Section 2.4 of Ref. 4).

One advantage provided by the decomposition given in (8) is that one need not worry about modeling $a_i(y)$, $b_i(x)$, or $c_{i,j}$ when the set of covariates $\{f_p(x) : p = 1, \ldots, N\}$ contains all monomials of order less than or equal to $k_0$, since these terms have no impact on the prediction or the mean square prediction error. A particularly cogent example is the invariance of kriging to adding a random, mean zero polynomial of order $k_0$ to $Z$. If the random coefficients have finite second moments, then the only effect on the covariance structure of $Z$ is through $a_i(y)$, $b_i(x)$, and $c_{i,j}$ in (8). Therefore, it is unnecessary to model any such additive random polynomial in $Z$, since it is inconsequential to kriging (so long


as the covariates $f_p$ contain a sufficient number of monomials).

Estimating the Covariance Structure

Throughout the above exposition, it has been assumed that the covariance function $K(x,y)$ and the standard deviation $\sigma$ are both known. In practice, however, this is almost never the case. There are many different techniques for estimating $K$ and $\sigma$. Examples include variogram model fitting, spectral density estimation, maximum likelihood (or REML) estimation, Bayesian techniques, and cross validation. In this section, we present a short discussion of two popular techniques: REML estimation and cross validation.

If $K(x,y)$ is known up to some parameter vector $\theta$, then maximum likelihood estimation (MLE) based on the data $y$ provides a natural approach for estimating $\theta$ and $\sigma$. One of the difficulties with the MLE in this case is the presence of the additional (unknown) coefficients $\beta$ in the data model $y \sim N[X\beta, \Sigma_{\theta,\sigma}]$, where $\Sigma_{\theta,\sigma}$ denotes the covariance matrix of $y$ as it depends on the unknown parameters $\theta$ and $\sigma$. REML estimation circumvents this problem by restricting attention to linear combinations of the data which do not depend on $\beta$. For example, assume there exists a matrix $M$ whose rows form linearly independent vectors orthogonal to the columns of $X$. Notice this implies that $MX = 0$ and therefore $My \sim N[0, M\Sigma_{\theta,\sigma}M^t]$, which does not depend on $\beta$. The REML estimate is now defined as the maximizer of the likelihood for $My$ over $\theta$ and $\sigma$. REML estimation is particularly well suited to the generalized covariance functions discussed in the section titled The Variogram and Generalized Covariance Functions. In particular, if the covariates $\{f_p(x) : p = 1, \ldots, N\}$ contain all monomials with degree less than or equal to some nonnegative integer $k_0$ and $K(x,y)$ is decomposed as in (8), then the covariance matrix $M\Sigma_{\theta,\sigma}M^t$ only depends on $G(x,y)$. This fact, along with the discussion presented in the section titled The Variogram and Generalized Covariance Functions, provides a unified methodology for estimation and kriging within the class of generalized covariance functions.
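A sketch of REML under an assumed parametric family $K_\theta(x,y) = \exp(-|x - y|/\theta)$ (an illustration, not the article's choice). The matrix $M$ is built from the orthogonal complement of the columns of $X$, and a crude grid search stands in for a proper optimizer:

import numpy as np

rng = np.random.default_rng(0)
n = 30
xs = np.sort(rng.uniform(0, 1, n))
X = np.column_stack([np.ones(n), xs])
N = X.shape[1]

# Simulate data with theta = 0.3, sigma = 0.1, beta = (1, 2).
Sig_true = np.exp(-np.abs(xs[:, None] - xs[None, :]) / 0.3) + 0.1**2 * np.eye(n)
y = rng.multivariate_normal(X @ np.array([1.0, 2.0]), Sig_true)

# Rows of M span the orthogonal complement of the columns of X, so MX = 0
# and My ~ N[0, M Sigma M^t] is free of beta.
Q, _ = np.linalg.qr(X, mode="complete")
M = Q[:, N:].T
My = M @ y

def restricted_loglik(theta, sigma):
    Sigma = np.exp(-np.abs(xs[:, None] - xs[None, :]) / theta) + sigma**2 * np.eye(n)
    C = M @ Sigma @ M.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + My @ np.linalg.solve(C, My))

# Maximize over a grid of (theta, sigma) values.
best = max((restricted_loglik(t, s), t, s)
           for t in np.linspace(0.05, 1.0, 20)
           for s in np.linspace(0.01, 0.5, 20))
_, theta_hat, sigma_hat = best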

Cross validation is another approach which can be used to estimate $\sigma$ and a covariance parameter $\theta$. Like REML, cross validation only requires modeling the generalized covariance function when the covariates are sufficiently rich. The simplest form of cross validation divides the data into a test set and a training set. The training set is used to predict (with kriging) the test set values. Prediction squared error is measured and subsequently minimized to find the optimal values of $\theta$ and $\sigma$. This is the basic cross validation technique (with many generalizations). Since kriging only depends on the generalized covariance function $G(x,y)$ (when the covariates contain enough monomials), the corresponding cross validation estimates of $\theta$ and $\sigma$ only depend on $G(x,y)$. We mention a minor simplification that occurs when performing cross validation which can reduce the number of kriging parameters by one. When the covariance structure of $Z$ is known up to a proportionality constant, so that $\operatorname{cov}(Z(x), Z(y)) = \theta^2 K_0(x,y)$, for example, then the kriging prediction only depends on $\sigma^2$ and $\theta^2$ through the ratio $\sigma^2/\theta^2$. In particular, one can simply replace $K(x,y)$ with $K_0(x,y)$ and $\sigma^2$ with $\sigma^2/\theta^2$ in (4), (6), or (7) to obtain the same value. In doing so, the number of parameters which need to be estimated is reduced by one. Note that a derivation of this fact follows simply from the GLS characterization of kriging given in (4).
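A sketch of the simplest train/test cross validation, assuming (for illustration) $K_0(x,y) = \exp(-|x - y|/\rho)$ with an unknown range $\rho$, and searching over the ratio $r = \sigma^2/\theta^2$ exactly as described above:

import numpy as np

rng = np.random.default_rng(1)
n = 40
xs = np.sort(rng.uniform(0, 1, n))
X = np.column_stack([np.ones(n), xs])
Sig_true = np.exp(-np.abs(xs[:, None] - xs[None, :]) / 0.3) + 0.1**2 * np.eye(n)
y = rng.multivariate_normal(X @ np.array([1.0, 2.0]), Sig_true)

train = np.arange(n) % 2 == 0          # even indices train, odd indices test
test = ~train

def cv_error(rho, r):
    # Mean squared error of kriging the test set from the training set,
    # using K_0 in place of K and the ratio r in place of sigma^2.
    K0 = lambda u, v: np.exp(-np.abs(u - v) / rho)
    xt, yt, Xt = xs[train], y[train], X[train]
    Sigma = K0(xt[:, None], xt[None, :]) + r * np.eye(xt.size)
    beta_hat = np.linalg.solve(Xt.T @ np.linalg.solve(Sigma, Xt),
                               Xt.T @ np.linalg.solve(Sigma, yt))
    resid = np.linalg.solve(Sigma, yt - Xt @ beta_hat)
    preds = K0(xs[test][:, None], xt[None, :]) @ resid + X[test] @ beta_hat
    return np.mean((preds - y[test])**2)

best = min((cv_error(rho, r), rho, r)
           for rho in np.linspace(0.05, 1.0, 20)
           for r in np.linspace(0.001, 0.5, 20))
_, rho_hat, r_hat = best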

Extensions to the Basic Model

The kriging methodology presented here constitutes only a small portion of what appears in the current literature on kriging. In this section, we discuss some of the generalizations of the basic kriging model presented in the section titled The Basic Model, with the understanding that this is far from an exhaustive review. We loosely characterize generalizations as those which preserve some facet of linearity and those which do not.

There is a whole class of kriging problems devoted to linear generalizations of the basic kriging methodology. For example, one can generalize the stipulation, given in the section titled The Basic Model, that the observations and predictions involve pointwise values of $Y$ to the case where one observes or predicts linear functionals of $Y$. For example, in block kriging the problem is to predict a block average $\frac{1}{|B|} \int_B Y(x)\, dx$, where $B$ is some $d$-dimensional region and $|B|$ is its volume, as sketched below. In another situation, the observations or predictions are derivatives of $Y$.
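A minimal sketch of block kriging, reusing the illustrative one-dimensional setup from earlier and approximating the block average by a Riemann sum (the discretization is for illustration only): only $x$ and $\Sigma_0$ change relative to pointwise prediction, exactly as noted above.

import numpy as np

rng = np.random.default_rng(0)
n = 10
xs = np.sort(rng.uniform(0, 1, n))
sigma = 0.1
K = lambda u, v: np.exp(-np.abs(u - v))
X = np.column_stack([np.ones(n), xs])
Sigma = K(xs[:, None], xs[None, :]) + sigma**2 * np.eye(n)
y = rng.multivariate_normal(X @ np.array([1.0, 2.0]), Sigma)

# Block B = [0.2, 0.4]: replace x and Sigma_0 by their block averages.
grid = np.linspace(0.2, 0.4, 200)
x_row = np.array([1.0, grid.mean()])                  # block-averaged covariates
Sigma0 = K(grid[:, None], xs[None, :]).mean(axis=0)   # block-averaged covariances

beta_hat = np.linalg.solve(X.T @ np.linalg.solve(Sigma, X),
                           X.T @ np.linalg.solve(Sigma, y))
block_avg_hat = Sigma0 @ np.linalg.solve(Sigma, y - X @ beta_hat) + x_row @ beta_hat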


Another linear generalization, called cokriging, deals with the multidimensional setting where $Y$ is a random vector field mapping $\mathbb{R}^d$ to $\mathbb{R}^m$ for some $m > 1$. Many of the results found in the sections titled Introduction and Estimating the Covariance Structure extend easily to these examples.

In contrast to the linear case, some generalizations introduce nonlinearity into the kriging methodology. Consider, for example, the situation where the observations or predictions are nonlinear functions of $Y$ (indicators, maximum values, etc.). Another example postulates nonlinearity in the mean function of $Y$. Yet another considers predictors $\hat{Y}(x_0)$ which are nonlinear functions of the data $y_1, \ldots, y_n$. This is often used when the data are highly non-Gaussian, so that restricting to linear combinations of the data may result in poor performance. Two cogent examples are robust kriging and Bayesian kriging. In robust kriging, one attempts to mitigate the effects of outliers or highly skewed, non-Gaussian distributions. In Bayesian kriging, one incorporates a prior distribution on the mean function of $Y(x)$, which can produce predictors that are nonlinear functions of the data $y_1, \ldots, y_n$.

References

[1] Krige, D. (1951). A statistical approach to some basic mine valuation problems on the Witwatersrand, Journal of the Chemical, Metallurgical and Mining Society of South Africa 52, 119–139.

[2] Matheron, G. (1963). Principles of geostatistics, Economic Geology 58, 1246–1266.

[3] Cressie, N. (1990). The origins of kriging, Mathematical Geology 22, 239–252.

[4] Wahba, G. (1990). Spline Models for Observational Data, SIAM, Philadelphia.

[5] Gandin, L. S. (1965). Objective Analysis of Meteorological Fields, Israel Program for Scientific Translations, Jerusalem.

[6] Armstrong, M. (1998). Basic Linear Geostatistics, Springer-Verlag, Berlin.

[7] Chilès, J.-P. & Delfiner, P. (1999). Geostatistics: Modeling Spatial Uncertainty, John Wiley & Sons, New York.

[8] Cressie, N. (1991). Statistics for Spatial Data, John Wiley & Sons, New York.

[9] Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation, Oxford University Press, Oxford.

[10] Kitanidis, P. K. (1997). Introduction to Geostatistics: Applications to Hydrogeology, Cambridge University Press, Cambridge.

[11] Olea, R. (1999). Geostatistics for Engineers and Earth Scientists, Kluwer, Dordrecht.

[12] Stein, M. (1999). Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York.

[13] Wackernagel, H. (1998). Multivariate Geostatistics, 2nd Edition, Springer-Verlag, Berlin.

[14] Matheron, G. (1973). The intrinsic random functions and their applications, Advances in Applied Probability 5, 439–468.

(See also Kriging, asymptotic theory; Covariogram, examples of; Kriging for functional data; Multivariate kriging; Random field, Gaussian; Random fields; Screening effect)

ETHAN B. ANDERES

Encyclopedia of Environmetrics, Online © 2006 John Wiley & Sons, Ltd. This article is © 2013 John Wiley & Sons, Ltd. This article was published in Encyclopedia of Environmetrics Second Edition in 2012 by John Wiley & Sons, Ltd. DOI: 10.1002/9780470057339.vak003.pub2