
A Thinned Block Bootstrap Variance Estimation Procedure for Inhomogeneous Spatial Point Patterns

Yongtao GUAN and Ji Meng LOH

When modeling inhomogeneous spatial point patterns, it is of interest to fit a parametric model for the first-order intensity function (FOIF) of the process in terms of some measured covariates. Estimates for the regression coefficients, say β̂, can be obtained by maximizing a Poisson maximum likelihood criterion. Little work has been done on the asymptotic distribution of β̂ except in some special cases. In this article we show that β̂ is asymptotically normal for a general class of mixing processes. To estimate the variance of β̂, we propose a novel thinned block bootstrap procedure that assumes that the point process is second-order reweighted stationary. To apply this procedure, only the FOIF, and not any higher-order terms of the process, needs to be estimated. We establish the consistency of the resulting variance estimator and demonstrate its efficacy through simulations and an application to a real data example.

KEY WORDS: Block bootstrap; Inhomogeneous spatial point process; Thinning.

1. INTRODUCTION

A main interest when analyzing spatial point pattern data is to model the first-order intensity function (FOIF) of the underlying process that has generated the spatial point pattern. Heuristically, the FOIF is a function that describes the likelihood for an event of the process to occur at a given location (see Sec. 2 for its formal definition). We say a process is homogeneous if its FOIF is a constant and inhomogeneous otherwise. In practice, we often wish to model the FOIF in relation to some measured covariates. For example, for the Beilschmiedia pendula Lauraceae (BPL) data given in Section 5, we want to model the FOIF in terms of two important variables of landscape features: elevation and gradient. Because the FOIF characterizes the probability of finding a BPL tree at a given location with the associated elevation and gradient values, the study of this function can yield valuable insights into how these landscape features affect the spatial distribution of BPL trees. If a significant relationship can be established, then this will provide evidence in support of the niche assembly theory, which states that different species benefit from different habitats determined by local environmental features (e.g., Waagepetersen 2007).

To model the FOIF, we assume that it can be expressed as a parametric function of the available covariates. Specifically, let N denote a spatial point process defined over ℝ², with the FOIF at s ∈ ℝ² given by λ(s;β), where β is a p × 1 vector of unknown regression coefficients associated with the covariates, and let D be the region in which a realization of N has been observed. Our goal is then to estimate and make inference on the regression coefficients β.

To estimate β, we consider the following maximum likelihood criterion:

$$
U(\beta) = \frac{1}{|D|}\sum_{x \in D \cap N} \log\lambda(x;\beta) - \frac{1}{|D|}\int_{D} \lambda(s;\beta)\,ds. \tag{1}
$$

The maximizer of (1) is taken as an estimator for β (denoted by β̂ throughout this section). Note that (1) is proportional to the true maximum likelihood if N is an inhomogeneous Poisson process, that is, when the events of the process occurring in disjoint sets are completely independent. For the BPL example, however, the locations of the BPL trees are likely to be clustered, possibly due to seed dispersal and/or correlation among environmental factors that have not been accounted for by the model. Schoenberg (2004) showed that, under some mild conditions, β̂ obtained by maximizing (1) is still consistent for β for a class of spatial–temporal point process models, even if the process is not Poisson; however, he did not provide the asymptotic distribution of β̂. Waagepetersen (2007) significantly extended the scope of the method by deriving asymptotic properties of β̂, including asymptotic normality, for a wide class of spatial cluster processes.

To make inference on β, we need information on the distributional properties of β̂. One standard approach to this is to derive the limiting distribution of β̂ under an appropriate asymptotic framework and then use it as an approximation for the distribution of β̂ in a finite-sample setting. We note that currently available asymptotics for inhomogeneous spatial point processes are inadequate, because they either assume complete spatial independence, that is, the process is Poisson (e.g., Rathbun and Cressie 1994), or use a parametric model for the dependence of the process (e.g., Waagepetersen 2007). For data arising from biological studies such as the BPL data, however, the underlying biological process generating the spatial point patterns is rather complex and often not well understood. Thus using a specific form for the dependence may be debatable and could lead to incorrect inference on the regression parameters β (because the distribution of β̂ depends on the dependence structure of the process).

In this article we study the distributional properties of β̂ under an increasing-domain setting. To quantify the dependence of the process, we use a more flexible, model-free mixing condition (Sec. 2) but do not assume any specific parametric structure on the dependence. Our main result shows that under some mild conditions, the standardized distribution of β̂ is asymptotically normal. If the variance of β̂ is known, then approximate confidence intervals for β can be obtained, so that the inference on β becomes straightforward. Thus our result further extends the scope of applications of (1) beyond the work of Schoenberg (2004) and Waagepetersen (2007).

Yongtao Guan is Assistant Professor, Division of Biostatistics, Yale University, New Haven, CT 06520 (E-mail: yongtao.guan@yale.edu). Ji Meng Loh is Associate Professor, Department of Statistics, Columbia University, New York, NY 10027. Guan's research was supported in part by National Science Foundation (NSF) grant DMS-0706806. Ji Meng Loh's research was supported in part by NSF grant AST-0507687. The authors thank Steve Rathbun, Mike Sherman, and Rasmus Waagepetersen for helpful discussions, and the joint editors, associate editor, and three referees for their constructive comments that have greatly improved the manuscript.

© 2007 American Statistical Association
Journal of the American Statistical Association
December 2007, Vol. 102, No. 480, Theory and Methods
DOI 10.1198/016214507000000879


One complication in practice is that the variance of β̂ is unknown and thus must be estimated. From Theorem 1 in Section 2, we see that the variance of β̂ depends on the second-order cumulant function (SOCF) of the process, a function that is related to the dependence structure of the process. To avoid specifying the SOCF, we develop a nonparametric thinned block bootstrap procedure to estimate the variance of β̂ by using a combination of a thinning algorithm and block bootstrap. Specifically, in the thinning step, we retain (i.e., do not thin) an observed event from N with a probability that is proportional to the inverse of the estimated FOIF at the location of the event. If N is second-order reweighted stationary (see Sec. 2) and if β̂ is close to β, then the thinned process should resemble a second-order stationary (SOS) process. In Section 3 we show that the variance of β̂ can be written in terms of the variance of a statistic S_n, where S_n is computed from the thinned process and is expressed in terms of the intensity function only. The task of estimating the variance of β̂ is thus translated into estimating the variance of a statistic defined on an SOS process, which we can accomplish using block bootstrap. In Section 3 we prove that the resulting variance estimator is L₂-consistent for the target variance, and in Section 4 we report a simulation study done to investigate the performance of the proposed procedure. To illustrate its use in a practical setting, we also apply the proposed procedure to the BPL data in Section 5.

Before proceeding to the next section, we note that resampling methods, including the block bootstrap, have been extensively applied in spatial statistics (e.g., Politis 1999; Lahiri 2003). However, most of these methods were developed for stationary quantitative spatial processes that are observed on a regularly spaced grid. In the regression setting, also for quantitative processes, Cressie (1993, sec. 7.3.2) discussed both semiparametric and parametric bootstrap methods to resample the residuals, whereas Sherman (1997) proposed a subsampling approach. Politis and Sherman (2001) and McElroy and Politis (2007) considered using subsampling for marked point processes. The former assumed the point process to be stationary, whereas the latter assumed it to be Poisson. None of the aforementioned resampling procedures can be used for inhomogeneous spatial point processes, due to the unique features of the data. Note that by nature, an inhomogeneous spatial point process is not quantitative, is observed at random locations, is nonstationary, and can be non-Poisson.

2. NOTATION AND PRELIMINARY ASYMPTOTIC RESULTS

Let N be a two-dimensional spatial point process observed over a domain of interest D. For a Borel set B ⊂ ℝ², let |B| denote the area of B, and let N(B) denote the number of events from N that fall in B. We define the kth-order intensity and cumulant functions of N as

$$
\lambda_k(s_1,\ldots,s_k) = \lim_{|ds_i|\to 0} \frac{E[N(ds_1)\cdots N(ds_k)]}{|ds_1|\cdots|ds_k|}, \qquad i = 1,\ldots,k,
$$

and

$$
Q_k(s_1,\ldots,s_k) = \lim_{|ds_i|\to 0} \frac{\operatorname{cum}[N(ds_1),\ldots,N(ds_k)]}{|ds_1|\cdots|ds_k|}, \qquad i = 1,\ldots,k.
$$

Here ds is an infinitesimal region containing s, and cum(Y₁,...,Y_k) is the coefficient of i^k t₁···t_k in the Taylor series expansion of log E[exp(i Σ_{j=1}^k Y_j t_j)] about the origin (e.g., Brillinger 1975). For the intensity function, λ_k(s₁,...,s_k)|ds₁|···|ds_k| is the approximate probability that ds₁,...,ds_k each contains an event. The cumulant function Q_k(s₁,...,s_k) describes the dependence among sites s₁,...,s_k, where a near-0 value indicates near independence. Specifically, if N is Poisson, then Q_k(s₁,...,s_k) = 0 if at least two of s₁,...,s_k are different. The kth-order cumulant (intensity) function can be expressed as a function of the intensity (cumulant) functions up to the kth order (see Daley and Vere-Jones 1988, p. 147 for details).

We study the large-sample behavior of β̂ under an increasing-domain setting, where β̂ is obtained by maximizing (1). Specifically, consider a sequence of regions D_n. Let ∂D_n denote the boundary of D_n, let |∂D_n| denote the length of ∂D_n, and let β̂_n denote β̂ obtained over D_n. We assume that

$$
C_1 n^2 \le |D_n| \le C_2 n^2, \qquad C_1 n \le |\partial D_n| \le C_2 n \qquad \text{for some } 0 < C_1 \le C_2 < \infty. \tag{2}
$$

Condition (2) requires that D_n must become large in all directions (i.e., the data are truly spatial) and that the boundary must not be too irregular. Many commonly used domain sequences satisfy this condition. To see an example of this, let A ⊂ (0,1] × (0,1] be the interior of a simple closed curve with nonempty interior. If we define D_n as A inflated by a factor n, then D_n satisfies condition (2). Note that this formulation incorporates a wide variety of shapes, including rectangular and elliptical shapes.

To formally state the large-sample distributional properties of β̂_n, we need to quantify the dependence in N. We do this by using the model-free strong mixing coefficient (Rosenblatt 1956), defined as

$$
\alpha(p;k) \equiv \sup\big\{|P(A_1\cap A_2) - P(A_1)P(A_2)| : A_1\in\mathcal F(E_1),\ A_2\in\mathcal F(E_2),\ E_2 = E_1 + s,\ |E_1| = |E_2|\le p,\ d(E_1,E_2)\ge k\big\}.
$$

Here the supremum is taken over all compact and convex subsets E₁ ⊂ ℝ² and over all s ∈ ℝ² such that d(E₁,E₂) ≥ k, where d(E₁,E₂) is the maximal distance between E₁ and E₂ (e.g., Guan, Sherman, and Calvin 2006), and F(E) is the σ-algebra generated by the random events of N that are in E. We assume the following mixing condition on N:

$$
\sup_p \frac{\alpha(p;k)}{p} = O(k^{-\varepsilon}) \qquad \text{for some } \varepsilon > 2. \tag{3}
$$

Condition (3) states that for any two fixed sets, the dependence between them must decay to 0 at a polynomial rate of the interset distance k. The speed at which the dependence decays to 0 also depends on the size of the sets (i.e., p). In particular, for a fixed k, condition (3) allows the dependence to increase as p increases. Any point process with a finite dependence range, such as the Matérn cluster process (Stoyan and Stoyan 1994), satisfies this condition. Furthermore, it is also satisfied by the log-Gaussian Cox process (LGCP), which is very flexible in modeling environmental data (e.g., Møller, Syversveen, and Waagepetersen 1998), if the correlation of the underlying Gaussian random field decays at a polynomial rate faster than 2 + ε and has a spectral density bounded away from 0. This is due to corollary 2 of Doukhan (1994, p. 59).

In addition to conditions (2) and (3), we also need some mild conditions on the intensity and cumulant functions of N. In what follows, let f^{(i)}(β) denote the ith derivative of f(β). We assume that

$$
\lambda(s;\beta) \text{ is bounded away from } 0, \tag{4}
$$

$$
\lambda^{(2)}(s;\beta) \text{ is bounded and continuous with respect to } \beta, \tag{5}
$$

and

$$
\sup_{s_1}\int\cdots\int |Q_k(s_1,\ldots,s_k)|\,ds_2\cdots ds_k < C \qquad \text{for } k = 2,3,4. \tag{6}
$$

Conditions (4) and (5) are straightforward conditions that can be checked directly for a proposed FOIF model. Condition (6) is a fairly weak condition. It also requires the process N to be weakly dependent, but from a different perspective than (3). In the homogeneous case, (6) is implied by Brillinger mixing, which holds for many commonly used point process models, such as the Poisson cluster process, a class of doubly stochastic Poisson processes (i.e., Cox processes), and certain renewal processes (e.g., Heinrich 1985). In the inhomogeneous case, Q_k(s₁,...,s_k) often can be written as λ(s₁)···λ(s_k)φ_k(s₂ − s₁,...,s_k − s₁), where φ_k(·) is a cumulant function for some homogeneous process. Then (6) holds if λ(s) is bounded and ∫···∫|φ_k(u₂,...,u_k)| du₂···du_k < C. Processes satisfying this condition include, but are not limited to, the LGCP, the inhomogeneous Neyman–Scott process (INSP; Waagepetersen 2007), and any inhomogeneous process obtained by thinning a homogeneous process satisfying this condition.

Theorem 1. Assume that conditions (2)–(6) hold and that β̂_n converges to β₀ in probability, where β₀ is the true parameter vector. Then

$$
|D_n|^{1/2}(\Sigma_n)^{-1/2}(\hat\beta_n - \beta_0) \xrightarrow{d} N(0, I_p),
$$

where I_p is a p × p identity matrix,

$$
\Sigma_n = |D_n|\,(A_n)^{-1} B_n (A_n)^{-1},
$$

$$
A_n = \int_{D_n} \frac{\lambda^{(1)}(s;\beta_0)[\lambda^{(1)}(s;\beta_0)]'}{\lambda(s;\beta_0)}\,ds,
$$

and

$$
B_n = A_n + \int\!\!\int_{D_n} \frac{\lambda^{(1)}(s_1;\beta_0)[\lambda^{(1)}(s_2;\beta_0)]'}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\, Q_2(s_1,s_2)\,ds_1\,ds_2.
$$

Proof. See Appendix A.

We assume in Theorem 1 that β̂_n is a consistent estimator for β₀. Schoenberg (2004) established the consistency of β̂_n for a class of spatial–temporal processes under some mild conditions. He assumed that the spatial domain was fixed but that the time domain increased to infinity; his results can be directly extended to our case. Therefore, we simply assume consistency in Theorem 1. In connection with previous results, our asymptotic result coincides with that of Rathbun and Cressie (1994) in the inhomogeneous spatial Poisson process case and with that of Waagepetersen (2007) in the INSP case. Furthermore, note that Σ_n depends only on the first- and second-order properties of the process.
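To make the quantities in Theorem 1 concrete, consider the log-linear model λ(s;β) = exp{z(s)'β} used in Sections 4 and 5, where z(s) collects the covariates (including an intercept) at s. This worked specialization is our own illustration rather than a display from the paper. Because λ^{(1)}(s;β) = z(s)λ(s;β), the matrices in Theorem 1 reduce to

$$
A_n = \int_{D_n} z(s)z(s)'\,\lambda(s;\beta_0)\,ds
\qquad\text{and}\qquad
B_n = A_n + \int\!\!\int_{D_n} z(s_1)z(s_2)'\,Q_2(s_1,s_2)\,ds_1\,ds_2.
$$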

3. THINNED BLOCK BOOTSTRAP

3.1 The Proposed Method

Theorem 1 states that |D_n|^{1/2}(Σ_n)^{−1/2}(β̂_n − β₀) converges to a standard multivariate normal distribution as n increases. This provides the theoretical foundation for making inference on β. In practice, however, Σ_n is typically unknown and thus must be estimated. From the definition of Σ_n in Theorem 1, we see that two quantities (i.e., A_n and B_n) need to be estimated to estimate Σ_n. Note that A_n depends only on the FOIF and thus can be estimated easily once a FOIF model has been fitted to the data; however, the quantity B_n also depends on the SOCF Q₂(·). Unless an explicit parametric model is assumed for Q₂(·), which (as argued in Sec. 1) may be unrealistic, it is difficult to directly estimate B_n. In this section we propose a thinned block bootstrap approach to first estimate a term that is related to B_n and then use it to produce an estimate for B_n.

From now on, we assume that the point process N is second-order reweighted stationary (SORWS); that is, there exists a function g(·) defined on ℝ² such that λ₂(s₁,s₂) = λ(s₁)λ(s₂)g(s₁ − s₂). The function g(·) is often referred to as the pair correlation function (PCF; e.g., Stoyan and Stoyan 1994). Note that our definition of second-order reweighted stationarity is a special case of the original definition given by Baddeley, Møller, and Waagepetersen (2000). Specifically, these two definitions coincide if the PCF exists. Examples of SORWS processes include the INSP, the LGCP, and any point process obtained by thinning an SORWS point process (e.g., Baddeley et al. 2000). We also assume that β₀ is known; thus we suppress the dependence of the FOIF on β in this section. In practice, the proposed algorithm can be applied by simply replacing β₀ with its estimate β̂. A theoretical justification for using β̂ is given in Appendix C.

The essence of our approach is based on the fact that an SORWS point process can be thinned to be SOS by applying some proper thinning weights to the events (Guan 2007). Specifically, we consider the following thinned process:

$$
\Pi = \Big\{x : x \in N \cap D_n,\ P(x \text{ is retained}) = \min_{s\in D_n}\lambda(s)\big/\lambda(x)\Big\}. \tag{7}
$$

Clearly, Π is SOS, because its first- and second-order intensity functions can be written as

$$
\lambda_n = \min_{s\in D_n}\lambda(s) \qquad\text{and}\qquad \lambda_{2,n}(s_1,s_2) = (\lambda_n)^2\, g(s_1 - s_2),
$$

where s₁, s₂ ∈ D_n. Based on Π, we then define the following statistic:

$$
S_n = \sum_{x\in \Pi\cap D_n} \lambda^{(1)}(x). \tag{8}
$$
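As a concrete illustration of the thinning step (7) and the statistic (8), here is a minimal Python sketch of our own (not the authors' code); `lam` and `z` are hypothetical callables returning the fitted intensity and covariate vectors at a set of locations, and a log-linear FOIF is assumed so that λ^{(1)}(s) = z(s)λ(s):

```python
import numpy as np

def thin(points, lam, lam_min, rng):
    """Retain event x with probability lam_min / lam(x), as in (7).

    points: (m, 2) array of event locations; lam(points): (m,) intensities;
    lam_min: minimum of the fitted intensity over D_n (e.g., over a fine grid).
    """
    keep = rng.uniform(size=len(points)) <= lam_min / lam(points)
    return points[keep]

def S_n(thinned, lam, z):
    """S_n in (8) for a log-linear FOIF: sum of z(x) * lam(x) over Pi."""
    return np.sum(z(thinned) * lam(thinned)[:, None], axis=0)

# Usage sketch: rng = np.random.default_rng(0); Pi = thin(pts, lam, lam_min, rng)
```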

Note that Q₂(s₁,s₂) = λ(s₁)λ(s₂)[g(s₁ − s₂) − 1], because N is SORWS. Thus the covariance matrix of S_n, where S_n is defined in (8), is given by

$$
\operatorname{cov}(S_n) = \lambda_n \int_{D_n} \lambda^{(1)}(s)[\lambda^{(1)}(s)]'\,ds + (\lambda_n)^2 \int\!\!\int_{D_n} \frac{\lambda^{(1)}(s_1)[\lambda^{(1)}(s_2)]'}{\lambda(s_1)\lambda(s_2)}\, Q_2(s_1,s_2)\,ds_1\,ds_2. \tag{9}
$$

The first term on the right side of (9) depends only on the FOIF and thus can be estimated once a FOIF model has been fitted. The second term, interestingly, is (λ_n)²(B_n − A_n). These facts suggest that if we can estimate cov(S_n), then we can use the relationship between cov(S_n) and B_n to estimate B_n in an obvious way. Note that S_n is a statistic defined on Π ∩ D_n, where Π is an SOS process. Thus the problem of estimating B_n becomes a problem of estimating the variance of a statistic calculated from an SOS process, which we can accomplish by using the block bootstrap procedure.

Toward that end, we divide D_n into k_n subblocks of the same shape and size. Let D^i_{l(n)}, i = 1,...,k_n, denote these subblocks, where l(n) = cn^α, for some c > 0 and α ∈ (0,1), signifies the subblock size. For each D^i_{l(n)}, let c_i denote the "center" of the subblock, where the "centers" are defined in some consistent way across all of the D^i_{l(n)}'s. We then resample with replacement k_n subblocks from all of the available subblocks and "glue" them together so as to create a new region that will be used to approximate D_n. We do so a large number, say B, of times. For the bth collection of random subblocks, let J_b be the set of k_n random indices sampled from {1,...,k_n} that are associated with the selected subblocks. For each event x ∈ D^{J_b(i)}_{l(n)}, we replace it with a new value x − c_{J_b(i)} + c_i, so that all events in D^{J_b(i)}_{l(n)} are properly shifted into D^i_{l(n)}. In other words, the ith sampled subblock D^{J_b(i)}_{l(n)} will be "glued" at where D^i_{l(n)} was in the original data. We then define

$$
S^b_n = \sum_{i=1}^{k_n}\ \sum_{x\in \Pi\cap D^{J_b(i)}_{l(n)}} \lambda^{(1)}\big(x - c_{J_b(i)} + c_i\big), \tag{10}
$$

where the sum over x ∈ Π ∩ D^{J_b(i)}_{l(n)} is over all events in block D^{J_b(i)}_{l(n)} translated into block D^i_{l(n)}.

Based on the S^b_n, b = 1,...,B, thus obtained, we can calculate the sample covariance of the S^b_n and use that as an estimate for cov(S_n). Note that for a given realization of N on D_n, the application of the thinning algorithm given in (7) can yield multiple thinned realizations. This will in turn lead to multiple sample covariance matrices by applying the proposed block bootstrap procedure. One possible solution is to simply average all of the obtained sample covariance matrices so as to produce a single estimate for cov(S_n). To do this, let K denote the total number of thinned processes, and let S^b_{n,j} and S̄_{n,j} denote S^b_n and the bootstrapped mean of {S^b_n, b = 1,...,B} for the jth thinned process, where S^b_n is as defined in (10). We thus obtain the following estimator for the covariance matrix of S_n:

$$
\widehat{\operatorname{cov}}(S_n) = \frac{1}{K}\sum_{j=1}^{K}\sum_{b=1}^{B} \frac{(S^b_{n,j} - \bar S_{n,j})(S^b_{n,j} - \bar S_{n,j})'}{B-1}. \tag{11}
$$
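A minimal Python sketch of (10)–(11), assuming D_n is a rectangle tiled by nx × ny square subblocks of side l, with `score` a hypothetical callable evaluating λ̂^{(1)}(·) row-wise at a set of locations (again our own illustration, not the authors' implementation):

```python
import numpy as np

def bootstrap_cov(thinned_list, score, l, nx, ny, B, rng):
    """Estimator (11): thinned_list holds one (m_j, 2) point array per thinned
    process (K arrays in total); the k_n = nx*ny subblocks of side l tile
    [0, nx*l] x [0, ny*l] and have centers c_i."""
    centers = np.array([[(ix + 0.5) * l, (iy + 0.5) * l]
                        for ix in range(nx) for iy in range(ny)])
    kn = nx * ny
    p = score(centers[:1]).shape[1]              # dimension of lambda^(1)
    covs = []
    for pts in thinned_list:                     # loop over the K thinnings
        idx = (pts[:, 0] // l).astype(int) * ny + (pts[:, 1] // l).astype(int)
        by_block = [pts[idx == j] for j in range(kn)]
        Sb = np.zeros((B, p))
        for b in range(B):
            J = rng.integers(kn, size=kn)        # resample blocks, as in (10)
            for i in range(kn):
                blk = by_block[J[i]]
                if len(blk):                     # translate into block i
                    Sb[b] += score(blk - centers[J[i]] + centers[i]).sum(axis=0)
        covs.append(np.cov(Sb, rowvar=False, ddof=1))
    return np.mean(covs, axis=0)                 # average over K, as in (11)
```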

3.2 Consistency and Practical Issues

To study the asymptotic properties of cov̂(S_n), we also assume the following condition:

$$
\frac{1}{|D_n|^{1/2}}\int_{D_n}\int_{\mathbb{R}^2\setminus D_n} |g(s_1 - s_2) - 1|\,ds_1\,ds_2 < C, \tag{12}
$$

where ℝ²\D_n = {s : s ∈ ℝ² but s ∉ D_n}. Condition (12) is only slightly stronger than ∫_{ℝ²}|g(s) − 1| ds < C, which is implied by condition (6). In the case where the D_n's are sufficiently regular (e.g., a sequence of n × n squares), such that |(D_n + s) ∩ (ℝ²\D_n)| < C‖s‖|D_n|^{1/2}, condition (12) is implied by the condition ∫_{ℝ²}‖s‖|g(s) − 1| ds < C, which holds for any process that has a finite range of dependence. Furthermore, it holds for the LGCP if the correlation function of the underlying Gaussian random field generating the intensities decays to 0 at a rate faster than ‖s‖⁻³.

Theorem 2. Assume that conditions (2), (4), (5), and (6) hold. Then

$$
\frac{1}{|D_n|}\widehat{\operatorname{cov}}(S_n) \xrightarrow{L_2} \frac{1}{|D_n|}\operatorname{cov}(S_n).
$$

If we further assume (12), then

$$
\frac{1}{|D_n|^2} E\big[\widehat{\operatorname{cov}}(S_n) - \operatorname{cov}(S_n)\big]^2 \le C\big[1/k_n + 1/|D_{l(n)}|\big] \qquad\text{for some } C < \infty.
$$

Proof. See Appendix B.

Theorem 2 establishes the L₂ consistency of the proposed covariance matrix estimator for a wide range of values for the number of thinned processes and the subblock size. But to apply the proposed method in practice, these values must be determined. For the former, the proposed thinning algorithm can be applied as many times as necessary until a stable estimator is obtained. For the latter, Theorem 2 suggests that the "optimal" rate for the subblock size is |D_n|^{1/2}, where the word "optimal" is in the sense of minimizing the mean squared error. This result agrees with the findings for the optimal subblock size for other resampling variance estimators obtained from quantitative processes (e.g., Sherman 1996). If we further assume that the best rate can be approximated by c_n = c|D_n|^{1/2}, where c is an unknown constant, then we can adapt the algorithm of Hall and Jing (1996) to estimate c_n and thus determine the "optimal" subblock size, as sketched below.

Specifically, let D^i_m, i = 1,...,k′_n, be a set of subblocks contained in D_n that have the same size and shape, let S^i_{m,j} be S_n in (8) obtained from the jth thinned process on D^i_m, and let S̄_{m,j} be the mean of the S^i_{m,j}. We then define the sample variance–covariance matrix of the S^i_{m,j}, averaged over the K replicate thinned point processes:

$$
\hat\theta_m = \frac{1}{K}\sum_{j=1}^{K}\sum_{i=1}^{k'_n} \frac{(S^i_{m,j} - \bar S_{m,j})(S^i_{m,j} - \bar S_{m,j})'}{k'_n - 1}.
$$

If D^i_m is relatively small compared with D_n, then the foregoing should be a good estimator for the true covariance matrix of S_m, where S_m is S_n in (8) defined on D_m. For each D^i_m, we then apply (11) to estimate cov(S_m) over a fine set of candidate subblock sizes, say c′_m = c′|D^i_m|^{1/2}. Let θ̂_m(j,k) and [θ̂^i_m(j,k)]′ denote the (j,k)th elements of θ̂_m and of the resulting estimator for D^i_m using c′_m. We define the best c_m as the one that has the smallest value of the following criterion:

$$
M(c'_m) = \frac{1}{k'_n}\sum_{i=1}^{k'_n}\sum_{j=1}^{p}\sum_{k=1}^{p} \big\{[\hat\theta^i_m(j,k)]' - \hat\theta_m(j,k)\big\}^2.
$$

The "optimal" subblock size for D_n, c_n, can then be estimated easily by using the relationship c_n = c_m(|D_n|/|D_m|)^{1/2}.
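The following Python sketch outlines this subblock-size selection under the same assumptions as before; it reuses the hypothetical `bootstrap_cov` helper from the previous sketch, so the names and data layout are again our own illustration:

```python
import numpy as np

def select_block_size(sub_pts, score, side_m, candidate_ls, B, rng):
    """sub_pts[i][j]: points of the j-th thinned process inside subwindow
    D^i_m, re-centered so each subwindow is [0, side_m]^2; returns the
    candidate subblock side minimizing the criterion M(c'_m)."""
    # theta_m: covariance of S_m across the k'_n subwindows, averaged over K.
    S = np.array([[score(pts).sum(axis=0) for pts in row] for row in sub_pts])
    theta_m = np.mean([np.cov(S[:, j, :], rowvar=False, ddof=1)
                       for j in range(S.shape[1])], axis=0)
    best_l, best_M = None, np.inf
    for l in candidate_ls:                       # candidate sizes c'_m
        n_side = int(side_m // l)
        M = np.mean([np.sum((bootstrap_cov(row, score, l, n_side, n_side,
                                           B, rng) - theta_m) ** 2)
                     for row in sub_pts])        # criterion M(c'_m)
        if M < best_M:
            best_l, best_M = l, M
    return best_l
# The chosen size is then scaled to D_n via c_n = c_m (|D_n|/|D_m|)^{1/2}.
```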


4. SIMULATIONS

4.1 Log-Gaussian Cox Process

We performed a simulation study to test the results of our theoretical findings. On an observation region D of size L × L, where L = 1, 2, and 4, we simulated 1,000 realizations of an inhomogeneous Poisson process with intensity function λ(s) given by

$$
\log\lambda(s) = \alpha + \beta X(s) + G(s), \tag{13}
$$

where G is a mean-0 Gaussian random field. This is the LGCP model, with inhomogeneity due to the covariate X.

For values of X, we used a single realization from a different mean-0 Gaussian random field. We kept this realization fixed throughout the whole simulation study; that is, we used the same X for each of the 1,000 point process realizations. Figure 1 shows a surface plot of X for L = 4. The values of X for L = 1 and 2 are the lower-left corner of those shown in the figure. For G, we used the exponential model C(h) = σ² exp(−h/ρ) for its covariance function, where ρ = 1 and 2 and σ = 1. We set α and β to be 7.02 and 2. We generated the point process patterns using the rpoispp function in the spatstat R package (Baddeley and Turner 2005). The average numbers of points simulated were about 1,200, 4,500, and 16,700 for L = 1, 2, and 4.

For each of the 1,000 point patterns, we obtained the estimates α̂ and β̂ by maximizing the Poisson likelihood (using the ppm function in spatstat). Next, we thinned the point pattern according to the criterion described in Section 3, resulting in 20 independent thinned processes for each realization. The average numbers of points in the thinned patterns were roughly 490, 1,590, and 5,400 for L = 1, 2, and 4, so about 30–40% of the points were retained.

Figure 1. Image of the covariate X used in the simulation. The range of X is between −.6066 and .5303, where darker colors represent larger values of X.

We then applied the bootstrap procedure to each of the thinned processes to get the variance estimates for α̂ and β̂. From (13), we found that the quantities to be computed with each bootstrap sample were Σλ̂(s*) for α̂ and ΣX(s*)λ̂(s*) for β̂, where s* represents the locations of the points in a bootstrap sample. We used nonoverlapping square blocks in the bootstrap procedure. For L = 1, we used blocks with side lengths 1/4, 1/3, and 1/2, whereas for L = 2, 4, we used blocks with side lengths 1/4, 1/3, 1/2, and 1. The number of bootstrap samples was 499.

We computed the variances of α̂ and β̂ from the bootstrap variances using Theorem 1, (9), and the relationship of cov(S_n) to B_n. We averaged the variance estimates across the independent thinnings. Finally, we constructed nominal 95% confidence intervals, yielding a total of 1,000 confidence intervals each for α and β.

Table 1 gives the empirical coverage of the confidence intervals for β. The coverage for α (not shown) is similar but slightly lower. For both values of ρ, we found that the empirical coverage increased toward the nominal level as the observation region size increased. This increase in coverage agrees with what we would expect from the theory. The empirical coverage for ρ = 1 is typically closer to the nominal level than for ρ = 2. Note that ρ = 1 corresponds to a weaker dependence; thus we would expect our method to work better if the dependence in the data were relatively weaker. It is interesting to note that for L = 2 and 4, the best coverage for ρ = 2 was achieved by using a block larger than its counterpart for ρ = 1. Thus, for errors with longer-range dependence, it appears that the block bootstrap procedure requires larger blocks to work well.

Table 2 gives the bootstrap estimates of the standard errors of β̂ for ρ = 2, averaged over the 1,000 realizations. The standard deviations of these estimates over the 1,000 realizations are included in parentheses. The "true" standard errors, computed using the estimated regression parameters from the independent realizations, are also given. The table shows that the bootstrap estimates have a negative bias, becoming proportionally closer to the "true" standard errors as the region size L increases.

In summary, we found that our method worked reasonably well when the regression parameters were estimated. We found a slight undercoverage in each instance, but noted that this undercoverage becomes smaller as the sample size increases. We also estimated the standard errors by using the true regression parameters in our thinning step and found that the coverage increased only slightly.

Table 1. Empirical coverage of 95% nominal confidence intervals of β in the LGCP case

                          Block size
Region size     1/4     1/3     1/2     1

ρ = 1
1               .85     .77     .49
2               .91     .90     .88     .49
4               .93     .93     .93     .90

ρ = 2
1               .84     .76     .50
2               .87     .88     .87     .50
4               .88     .89     .90     .89


Table 2. Estimates of the standard errors of β̂ for ρ = 2 in the LGCP case

                                  Block size
Region size   True SE   1/4          1/3          1/2          1

1             .24       .17(.04)     .16(.05)     .13(.06)
2             .12       .10(.02)     .10(.02)     .10(.03)     .08(.03)
4             .07       .056(.003)   .057(.004)   .060(.006)   .060(.008)

4.2 Inhomogeneous Neyman–Scott Process

Theorem 1 gives the variance in terms of the FOIF and the SOCF. Another approach for variance estimation is to use the plug-in method. As pointed out by one referee, we may simply estimate the PCF or the K-function using the original data (Baddeley et al. 2000). This will yield an estimate of the SOCF that in turn can be plugged into the expression for B_n in Theorem 1 to obtain an estimate for the asymptotic variance. Waagepetersen (2007) studied the performance of this method in the INSP case.

To compare the performance of our method with the plug-in method, we also simulated data from the INSP model given by Waagepetersen (2007). To do so, we first simulated a homogeneous Poisson process with intensity κ = 50 as the parent process. For each parent, we then generated a Poisson number of offspring. We defined the position of each offspring relative to its parent by a radially symmetric Gaussian random variable (e.g., Diggle 2003). Let ω denote the standard deviation of the variable. We used ω = .02, .04, representing relatively strong and weak clustering. Finally, we thinned the offspring as done by Waagepetersen (2007), by setting the probability of retaining an offspring equal to the intensity at the offspring location divided by the maximum intensity in the study region. For the intensity, we used log λ(s) = α + βX(s), where α, β, and X(s) were as defined in the LGCP case.

We simulated 1,000 realizations of the process on both a 1 × 1 square and a 2 × 2 square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of α̂ and β̂ and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters κ and ω by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing

$$
\int_0^a \big[\hat K(t)^c - K(t;\kappa,\omega)^c\big]^2\,dt \tag{14}
$$

with respect to (κ,ω) for a specified a and c, where K̂(t) and K(t;κ,ω) are the empirical and theoretical K functions. We used a = 4ω and c = .25, and then used the estimated values of κ and ω to estimate the standard errors of α̂ and β̂. To do this, we used the inhomthomasasympcov function in the InhomCluster R package, which is available on Waagepetersen's website at http://www.math.aau.dk/~rw/sppcode.
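For reference, here is a small Python sketch of the minimum contrast step (14) for the Thomas process, using the standard closed form K(t;κ,ω) = πt² + κ⁻¹[1 − exp{−t²/(4ω²)}]; the grid-based integral and the starting values are our own assumptions (the comparison above used the InhomCluster R package instead):

```python
import numpy as np
from scipy.optimize import minimize

def thomas_K(t, kappa, omega):
    """Theoretical K function of a (modified) Thomas process."""
    return np.pi * t**2 + (1.0 - np.exp(-t**2 / (4.0 * omega**2))) / kappa

def min_contrast(t_grid, K_hat, a, c=0.25):
    """Minimize (14) over (kappa, omega), discretizing the integral."""
    mask = t_grid <= a
    t, Kh = t_grid[mask], K_hat[mask]
    dt = np.gradient(t)                          # quadrature weights
    def obj(par):
        kappa, omega = np.exp(par)               # keep parameters positive
        return np.sum((Kh**c - thomas_K(t, kappa, omega)**c)**2 * dt)
    res = minimize(obj, x0=np.log([50.0, 0.02]), method="Nelder-Mead")
    return tuple(np.exp(res.x))                  # (kappa-hat, omega-hat)
```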

Table 3 gives the empirical coverage of the confidence intervals for β̂. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for L = 2 and ω = .02, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for ω = .02 than for ω = .04, because the former yielded a process with a shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters a and c in (14). In the simulation, we also considered using the default value of a given by spatstat. The results, not shown here, suggested that the coverage was often worse than that achieved by our method.

5. AN APPLICATION

We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census, by using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), who considered the following FOIF model:

$$
\lambda(s) = \exp[\beta_0 + \beta_1 E(s) + \beta_2 G(s)],
$$

where E(s) and G(s) are the (estimated) elevation and gradient at location s. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1). Specifically, he obtained the estimates β̂₁ = .02 and β̂₂ = 5.84. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that β₂ was significant but β₁ was not.

Table 3. Empirical coverage of 95% nominal confidence intervals of β̂ obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                               Block size
Region size    Plug-in    1/4     1/3     1/2

ω = .02
1              .96        .91     .89     .81
2              .96        .94     .94     .94

ω = .04
1              .93        .87     .87     .77
2              .95        .90     .90     .91


Figure 2. Locations of BPL trees.

We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering was due to seed dispersal alone, it probably was due to seed dispersal that had occurred over multiple generations. As a result, the validity of the model and the subsequent estimated standard errors and inference on the regression parameters required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.

To estimate the standard errors for β̂₁ and β̂₂, we applied (11) with K = 100 and B = 999 and used 200 × 100 m subblocks. We chose K = 100 because the trace plots for the estimated standard errors became fairly stable once 100 or more thinned realizations were used. We selected the subblock size using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for β̂₁ and β̂₂ using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method but nevertheless lead to the same conclusion as the plug-in method. Specifically, they suggest that β₂ is significant but β₁ is not.

Table 4. Results for the BPL data

               Plug-in method                Thinned block bootstrap
       EST     STD     95% CI                STD      95% CI

β₁     .02     .02     (−.02, .06)           .017     (−.014, .054)
β₂     5.84    2.53    (.89, 10.80)          2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.

From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007), by using less restrictive assumptions.

APPENDIX A: PROOF OF THEOREM 1

Let β₀ represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that β and β₀ are both scalars, that is, p = 1. In the case of p > 1, the proof can be easily generalized by applying the Cramér–Wold device.

Lemma A.1. Assume that conditions (4)–(6) hold. Then |D_n|²E[U^{(1)}_n(β₀)]⁴ < C for some C < ∞.

Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that λ^{(1)}(s;β₀)/λ(s;β₀) is bounded due to conditions (4) and (5), we can derive that |D_n|⁴E[U^{(1)}_n(β₀)]⁴ is bounded by the following (ignoring some multiplicative constants):

$$
\int\!\!\int\!\!\int\!\!\int |Q_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \Big[\int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2\Big]^2 + \int\!\!\int\!\!\int |Q_3(s_1,s_2,s_3)|\,ds_1\,ds_2\,ds_3 + |D_n|\int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2 + \int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2,
$$

where all of the integrals are over D_n. It then follows from condition (6) that the foregoing sum is of order |D_n|². Thus the lemma is proven.

Proof of Theorem 1

First, note that

$$
U^{(1)}_n(\beta) = \frac{1}{|D_n|}\Bigg[\sum_{x\in N\cap D_n}\frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)} - \int \lambda^{(1)}(s;\beta)\,ds\Bigg]
$$

and

$$
U^{(2)}_n(\beta) = \frac{1}{|D_n|}\Bigg[\sum_{x\in N\cap D_n}\frac{\lambda^{(2)}(x;\beta)}{\lambda(x;\beta)} - \int \lambda^{(2)}(s;\beta)\,ds\Bigg] - \frac{1}{|D_n|}\sum_{x\in N\cap D_n}\Bigg[\frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)}\Bigg]^2 = U^{(2)}_{n,a}(\beta) - U^{(2)}_{n,b}(\beta).
$$

Using the Taylor expansion, we can obtain

$$
|D_n|^{1/2}(\hat\beta_n - \beta_0) = -\big[U^{(2)}_n(\tilde\beta_n)\big]^{-1}|D_n|^{1/2}U^{(1)}_n(\beta_0),
$$

where β̃_n is between β̂_n and β₀. We need to show that

$$
U^{(2)}_{n,a}(\beta_0) \xrightarrow{p} 0
\qquad\text{and}\qquad
U^{(2)}_{n,b}(\beta_0) \xrightarrow{p} \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.
$$


To show the first expression, note that E[U^{(2)}_{n,a}(β₀)] = 0. Thus we only need to look at the variance term:

$$
\operatorname{var}\big[U^{(2)}_{n,a}(\beta_0)\big] = \frac{1}{|D_n|^2}\int\!\!\int \frac{\lambda^{(2)}(s_1;\beta_0)\lambda^{(2)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int \frac{[\lambda^{(2)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds,
$$

which converges to 0 due to conditions (4)–(6). Thus U^{(2)}_{n,a}(β₀) →p 0.

To show the second expression, note that E[U^{(2)}_{n,b}(β₀)] = (1/|D_n|)∫[λ^{(1)}(s;β₀)]²/λ(s;β₀) ds. Thus again we need to consider only the variance term:

$$
\operatorname{var}\big[U^{(2)}_{n,b}(\beta_0)\big] = \frac{1}{|D_n|^2}\int\!\!\int \frac{[\lambda^{(1)}(s_1;\beta_0)\lambda^{(1)}(s_2;\beta_0)]^2}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int \frac{[\lambda^{(1)}(s;\beta_0)]^4}{\lambda(s;\beta_0)}\,ds,
$$

which converges to 0 due to conditions (4)–(6). Thus U^{(2)}_{n,b}(β₀) →p (1/|D_n|)∫[λ^{(1)}(s;β₀)]²/λ(s;β₀) ds.

Now we want to show that |D_n|^{1/2}U^{(1)}_n(β₀) converges to a normal distribution. First, we derive the mean and variance of |D_n|^{1/2}U^{(1)}_n(β₀). Clearly, E[|D_n|^{1/2}U^{(1)}_n(β₀)] = 0. For the variance,

$$
\operatorname{var}\big[|D_n|^{1/2}U^{(1)}_n(\beta_0)\big] = \frac{1}{|D_n|}\int\!\!\int \frac{\lambda^{(1)}(s_1;\beta_0)\lambda^{(1)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.
$$

Now consider a new sequence of regions D*_n, where D*_n ⊂ D_n and |D*_n|/|D_n| → 1 as n → ∞. Let U^{(1)}_{n*}(β₀) be the estimating function for the realization of the point process on D*_n. Based on the expression for the variance, we can deduce that

$$
\operatorname{var}\big[|D_n|^{1/2}U^{(1)}_n(\beta_0) - |D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0)\big] \to 0 \qquad\text{as } n\to\infty.
$$

Thus

$$
|D_n|^{1/2}U^{(1)}_n(\beta_0) \sim |D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0),
$$

where the notation a_n ∼ b_n means that a_n and b_n have the same limiting distribution. Thus we need only show that |D*_n|^{1/2}U^{(1)}_{n*}(β₀) converges to a normal distribution for some properly defined D*_n.

To obtain a sequence of D*_n, we apply the blocking technique used by Guan et al. (2004). Let l(n) = n^α and m(n) = n^α − n^η for 4/(2 + ε) < η < α < 1, where ε is defined as in condition (3). We first divide the original domain D_n into some nonoverlapping l(n) × l(n) subsquares D^i_{l(n)}, i = 1,...,k_n. Within each subsquare, we further obtain D^i_{m(n)}, an m(n) × m(n) square sharing the same center with D^i_{l(n)}. Note that d(D^i_{m(n)}, D^j_{m(n)}) ≥ n^η for i ≠ j. Now define

$$
D^*_n = \bigcup_{i=1}^{k_n} D^i_{m(n)}.
$$

Condition (2) implies that |D*_n|/|D_n| → 1. Thus we need only show that |D*_n|^{1/2}U^{(1)}_{n*}(β₀) converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and the application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.

APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let φ(u) = g(u) − 1, φ_k(s₁,...,s_k) = Q_k(s₁,...,s_k)/[λ(s₁)···λ(s_k)], and g_k(s₁,...,s_k) = λ_k(s₁,...,s_k)/[λ(s₁)···λ(s_k)] for k = 3 and 4, and let Z(s) = λ^{(1)}(s). As in the proof of Theorem 1, we assume that β and β₀ are both scalars, that is, p = 1. As a result, Z(s) is a scalar, so that cov(S_n) = var(S_n). Furthermore, we consider only the case where K = 1. Thus the covariance estimator defined in (11) becomes

$$
\widehat{\operatorname{var}}(S_n) = \sum_{b=1}^{B}\frac{(S^b_n - \bar S_n)^2}{B-1}.
$$

As B goes to infinity, we see that the foregoing converges to

$$
\operatorname{var}(S^b_n \mid \Pi\cap D_n) = \frac{1}{k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1\in D^j_{l(n)}}\sum_{x_2\in D^j_{l(n)}} Z(x_1 - c_j + c_i)Z(x_2 - c_j + c_i) - \frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_2}_{l(n)}} Z(x_1 - c_{j_1} + c_i)Z(x_2 - c_{j_2} + c_i),
$$

where the notation Σ_{x∈D} is short for Σ_{x∈D∩Π}. We wish to show that var(S^b_n | Π ∩ D_n)/|D_n| converges in L₂ to

$$
\frac{1}{|D_n|}\operatorname{var}(S_n) = \frac{(\lambda_n)^2}{|D_n|}\int_{D_n}\int_{D_n} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2 + \frac{\lambda_n}{|D_n|}\int_{D_n}[Z(s)]^2\,ds.
$$

To show this, we need to show that

$$
\frac{1}{|D_n|}E[\operatorname{var}(S^b_n \mid \Pi\cap D_n)] \to \frac{1}{|D_n|}\operatorname{var}(S_n) \tag{B.1}
$$

and

$$
\frac{1}{|D_n|^2}\operatorname{var}[\operatorname{var}(S^b_n \mid \Pi\cap D_n)] \to 0. \tag{B.2}
$$

To show (B.1), note that

$$
E[\operatorname{var}(S^b_n \mid \Pi\cap D_n)] = (\lambda_n)^2\sum_{i=1}^{k_n}\int\!\!\int_{D^i_{l(n)}} Z(s_1)Z(s_2)g(s_1 - s_2)\,ds_1\,ds_2 + \frac{\lambda_n(k_n - 1)}{k_n}\int_{D_n}[Z(s)]^2\,ds - \frac{(\lambda_n)^2}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int\!\!\int_{D^i_{l(n)}} Z(s_1)Z(s_2)\,g(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2.
$$

Thus

$$
\frac{1}{|D_n|}\big\{E[\operatorname{var}(S^b_n \mid \Pi\cap D_n)] - \operatorname{var}(S_n)\big\} = -\frac{\lambda_n}{k_n|D_n|}\int_{D_n}[Z(s)]^2\,ds - \frac{(\lambda_n)^2}{|D_n|}\mathop{\sum\sum}_{i\ne j}\int_{D^i_{l(n)}}\int_{D^j_{l(n)}} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2 - \frac{(\lambda_n)^2}{|D_n|}\,\frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int\!\!\int_{D^i_{l(n)}} Z(s_1)Z(s_2)\,\varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2 = -T^1_n - T^2_n - T^3_n.
$$

T^1_n goes to 0 due to conditions (2) and (5). For T^2_n and T^3_n, lengthy yet elementary derivations yield

$$
|T^2_n| \le \frac{C(\lambda_n)^2}{k_n}\sum_{i=1}^{k_n}\Bigg[\frac{1}{|D_n|}\int_{D_n - D_n}|D_n\cap(D_n - s)|\,|\varphi(s)|\,ds - \frac{1}{|D^i_{l(n)}|}\int_{D^i_{l(n)} - D^i_{l(n)}}\big|D^i_{l(n)}\cap\big(D^i_{l(n)} - s\big)\big|\,|\varphi(s)|\,ds\Bigg]
$$

and

$$
T^3_n \le \frac{C(\lambda_n)^2}{k_n}\,\frac{1}{|D_n|}\int_{D_n}\int_{D_n}|\varphi(s_1 - s_2)|\,ds_1\,ds_2.
$$

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that T^1_n and T^3_n are both of order 1/k_n, whereas T^2_n is of order 1/|D_{l(n)}|^{1/2}, due to condition (12).

To prove (B.2), we first note that [var(S^b_n | Π ∩ D_n)]² is equal to the sum of the following three terms:

$$
\frac{1}{k_n^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\ \sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_1}_{l(n)}}\sum_{x_3\in D^{j_2}_{l(n)}}\sum_{x_4\in D^{j_2}_{l(n)}}\big[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_1} + c_{i_1})Z(x_3 - c_{j_2} + c_{i_2})Z(x_4 - c_{j_2} + c_{i_2})\big],
$$

$$
-\frac{2}{k_n^3}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\ \sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_1}_{l(n)}}\sum_{x_3\in D^{j_2}_{l(n)}}\sum_{x_4\in D^{j_3}_{l(n)}}\big[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_1} + c_{i_1})Z(x_3 - c_{j_2} + c_{i_2})Z(x_4 - c_{j_3} + c_{i_2})\big],
$$

and

$$
\frac{1}{k_n^4}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\ \sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_2}_{l(n)}}\sum_{x_3\in D^{j_3}_{l(n)}}\sum_{x_4\in D^{j_4}_{l(n)}}\big[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_2} + c_{i_1})Z(x_3 - c_{j_3} + c_{i_2})Z(x_4 - c_{j_4} + c_{i_2})\big].
$$

Furthermore, the following relationships are possible for x₁,...,x₄:

a. x₁ = x₂ = x₃ = x₄;
b. x₁ = x₂ and x₃ = x₄ but x₁ ≠ x₃, or x₁ = x₃ and x₂ = x₄ but x₁ ≠ x₂;
c. x₁ = x₂ but x₂ ≠ x₃ ≠ x₄, or x₃ = x₄ but x₁ ≠ x₂ ≠ x₃;
d. x₁ = x₂ = x₃ but x₃ ≠ x₄, or x₁ = x₂ = x₄ but x₁ ≠ x₃, or x₂ = x₃ = x₄ but x₁ ≠ x₂;
e. x₁ ≠ x₂ ≠ x₃ ≠ x₄.

When calculating the variance of var(S^b_n | Π ∩ D_n)/|D_n|, the foregoing relationships, combined with the expressions for the squared value of E[var(S^b_n | Π ∩ D_n)]/|D_n|, in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than 1/k_n in a similar manner.

Let F₁(s₁,s₂,s₃,s₄) = g₄(s₁,s₂,s₃,s₄) − g(s₁,s₂)g(s₃,s₄). Term e leads to the following integral:

$$
\frac{(\lambda_n)^4}{k_n^2|D_n|^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\int_{D^{j_1}_{l(n)}} Z(s_1 - c_{j_1} + c_{i_1})\times\Bigg[\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})Z(s_3 - c_{j_2} + c_{i_2})Z(s_4 - c_{j_2} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4 - \frac{2}{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})Z(s_3 - c_{j_2} + c_{i_2})Z(s_4 - c_{j_3} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4 + \frac{1}{k_n^2}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}}\int_{D^{j_4}_{l(n)}} Z(s_2 - c_{j_2} + c_{i_1})Z(s_3 - c_{j_3} + c_{i_2})Z(s_4 - c_{j_4} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4\Bigg]\,ds_1.
$$

Let F₂(s₁,s₂,s₃,s₄) = φ₄(s₁,s₂,s₃,s₄) + 2φ₃(s₁,s₂,s₃) + 2φ₃(s₁,s₃,s₄) + 2φ(s₁ − s₃)φ(s₂ − s₄). By noting that Z(·) is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

$$
\frac{2C(\lambda_n)^4}{k_n|D_n|^2}\sum_{j_1=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D_n}\int_{D_n} |F_2(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |F_1(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4.
$$

The first integral in the foregoing is of order 1/k_n due to conditions (4), (5), and (6). The second integral is bounded by

$$
\frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{4C(\lambda_n)^4}{k_n|D_n|}\sum_{j_2=1}^{k_n}\int_{D_n}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi_3(s_1,s_3,s_4)|\,ds_1\,ds_3\,ds_4 + \frac{2C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\Bigg[\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi(s_1 - s_3)|\,ds_1\,ds_3\Bigg]^2.
$$

The foregoing is of order 1/k_n + a_n due to conditions (4), (5), (6), and (12), where the order of a_n is no higher than 1/|D_{l(n)}|.

APPENDIX C: JUSTIFICATION FOR USING β̂_n IN THE THINNING STEP

Here we assume |D_{l(n)}| = o(|D_n|^{1/2}). Let p_n(x;θ) = min_{s∈D_n}λ(s;θ)/λ(x;θ), and let r(x) be a uniform random variable on [0,1]. If r(x) ≤ p_n(x;θ), where x ∈ (N ∩ D_n), then x will be retained in the thinned process, say Π(θ). Based on the same set {r(x) : x ∈ (N ∩ D_n)}, we can determine Π(θ₀) and Π(θ̂_n). Note that Π(θ₀) is as defined in (7) using the true FOIF. Define a = {x ∈ Π(θ₀) : x ∉ Π(θ̂_n)} and b = {x ∉ Π(θ₀) : x ∈ Π(θ̂_n)}. Note that P(x ∈ a ∪ b | x ∈ N) ≤ |p_n(x;θ̂_n) − p_n(x;θ₀)|. We need to show that

$$
\frac{1}{|D_n|}\big\{\operatorname{var}[S^b_n \mid \Pi(\theta_0)\cap D_n] - \operatorname{var}[S^b_n \mid \Pi(\hat\theta_n)\cap D_n]\big\} \xrightarrow{p} 0,
$$

where var[S^b_n | Π(θ₀) ∩ D_n] is var(S^b_n | Π ∩ D_n) as defined in Appendix B, and var[S^b_n | Π(θ̂_n) ∩ D_n] is defined analogously.

Let Σ′_D and ΣΣ^{≠}_D denote

$$
\sum_{x\in D\cap(a\cup b)} \qquad\text{and}\qquad \mathop{\sum\sum}_{\substack{x_1\in D\cap N,\ x_2\in D\cap(a\cup b)\\ x_1\ne x_2}},
$$

respectively. Note that

$$
\frac{1}{|D_n|}\big|\operatorname{var}[S^b_n \mid \Pi(\theta_0)\cap D_n] - \operatorname{var}[S^b_n \mid \Pi(\hat\theta_n)\cap D_n]\big| < \frac{C}{|D_n|k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1\in D^j_{l(n)}\cap N}\ {\sum_{x_2\in D^j_{l(n)}}}'\, 1 + \frac{C}{|D_n|k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1\in D^{j_1}_{l(n)}\cap N}\ {\sum_{x_2\in D^{j_2}_{l(n)}}}'\, 1 = \frac{C}{|D_n|}\sum_{j=1}^{k_n}\mathop{\sum\sum}^{\ne}_{D^j_{l(n)}} 1 + \frac{C}{|D_n|}{\sum_{D_n}}'\, 1 + \frac{C}{|D_n|k_n}\Big[{\sum_{D_n}}'\, 1\Big]\Big[\sum_{x\in D_n\cap N} 1\Big].
$$

Note that P[x ∈ (a ∪ b) | x ∈ N] ≤ C/|D_n|^{α/2} with probability 1 for 0 < α < 1, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.

[Received December 2006. Revised May 2007.]

REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.

Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.

Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.

Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.

Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.

Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.

Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.

Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.

Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.

——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.

Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.

Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.

Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.

McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.

Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.

Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.

Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.

Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.

Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.

Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.

Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.

——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.

Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.

Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 63, 252–258.

Page 2: A Thinned Block Bootstrap Variance Estimation Procedure ...meng/Papers/Jasa.Guan07.pdf · Yongtao G UAN and Ji Meng L OH ... Estimates for the regression coefÞcients, ... block bootstrap

1378 Journal of the American Statistical Association December 2007

One complication in practice is that the variance of β is un-known and thus must be estimated From Theorem 1 in Sec-tion 2 we see that the variance of β depends on the second-order cumulant function (SOCF) of the process a function thatis related to the dependence structure of the process To avoidspecifying the SOCF we develop a nonparametric thinnedblock bootstrap procedure to estimate the variance of β by us-ing a combination of a thinning algorithm and block bootstrapSpecifically in the thinning step we retain (ie do not thin) anobserved event from N with a probability that is proportional tothe inverse of the estimated FOIF at the location of the event IfN is second-order reweighted stationary (see Sec 2) and if β isclose to β then the thinned process should resemble a second-order stationary (SOS) process In Section 3 we show that thevariance of β can be written in terms of the variance of a sta-tistic Sn where Sn is computed from the thinned process and isexpressed in terms of the intensity function only The task of es-timating the variance of β is thus translated into estimating thevariance of a statistic defined on an SOS process which we canaccomplish using block bootstrap In Section 3 we prove thatthe resulting variance estimator is L2-consistent for the targetvariance and in Section 4 we report a simulation study done toinvestigate the performance of the proposed procedure To il-lustrate its use in a practical setting we also apply the proposedprocedure to the BPL data in Section 5

Before proceeding to the next section we note that resam-pling methods including the block bootstrap have been exten-sively applied in spatial statistics (eg Politis 1999 Lahiri2003) However most of these methods were developed forstationary quantitative spatial processes that are observed on aregularly spaced grid In the regression setting also for quan-titative processes Cressie (1993 sec 732) discussed bothsemiparametric and parametric bootstrap methods to resam-ple the residuals whereas Sherman (1997) proposed a subsam-pling approach Politis and Sherman (2001) and McElroy andPolitis (2007) considered using subsampling for marked pointprocesses The former assumed the point process to be station-ary whereas the latter assumed it to be Poisson None of theaforementioned resampling procedures can be used for inho-mogeneous spatial point processes due to the unique feature ofthe data Note that by nature an inhomogeneous spatial pointprocess is not quantitative is observed at random locations isnonstationary and can be non-Poisson

2 NOTATION AND PRELIMINARYASYMPTOTIC RESULTS

Let N be a two-dimensional spatial point process observedover a domain of interest D For a Borel set B sub R

2 let |B|denote the area of B and let N(B) denote the number of eventsfrom N that fall in B We define the kth-order intensity andcumulant functions of N as

λk(s1 sk) = lim|dsi |rarr0

E[N(ds1) middot middot middotN(dsk)]

|ds1| middot middot middot |dsk|

i = 1 k

and

Qk(s1 sk) = lim|dsi |rarr0

cum[N(ds1) N(dsk)]

|ds1| middot middot middot |dsk|

i = 1 k

Here ds is an infinitesimal region containing s and cum(Y1

Yk) is the coefficient of ikt1 middot middot middot tk in the Taylor series expansionof logE[exp(i

sumkj=1 Yj tj )] about the origin (eg Brillinger

1975) For the intensity function λk(s1 sk)|ds1| middot middot middot |dsk| isthe approximate probability that ds1 dsk each contains anevent For the cumulant function Qk(s1 sk) describes thedependence among sites s1 sk where a near-0 value in-dicates near independence Specifically if N is Poisson thenQk(s1 sk) = 0 if at least two of s1 sk are differentThe kth-order cumulant (intensity) function can be expressedas a function of the intensity (cumulant) functions up to thekth-order (See Daley and Vere-Jones 1988 p 147 for details)

We study the large-sample behavior of β under an increasing-domain setting where β is obtained by maximizing (1) Specif-ically consider a sequence of regions Dn Let partDn denote theboundary of Dn let |partDn| denote the length of partDn and let βn

denote β obtained over Dn We assume that

C1n2 le |Dn| le C2n

2 C1n le |partDn| le C2n

for some 0 lt C1 le C2 lt infin (2)

Condition (2) requires that Dn must become large in all di-rections (ie the data are truly spatial) and that the boundarymust not be too irregular Many commonly used domain se-quences satisfy this condition To see an example of this letA sub (01] times (01] be the interior of a simple closed curve withnonempty interior If we define Dn as A inflated by a factor nthen Dn satisfies condition (2) Note that this formulation in-corporates a wide variety of shapes including rectangular andelliptical shapes

To formally state the large-sample distributional propertiesof βn we need to quantify the dependence in N We do thisby using the model-free strong mixing coefficient (Rosenblatt1956) defined as

α(p k) equiv sup|P(A1 cap A2) minus P(A1)P (A2)|

A1 isinF(E1)A2 isinF(E2)

E2 = E1 + s |E1| = |E2| le pd(E1E2) ge k

Here the supremum is taken over all compact and convex sub-sets E1 sub R

2 and over all s isin R2 such that d(E1E2) ge k

where d(E1E2) is the maximal distance between E1 and E2(eg Guan Sherman and Calvin 2006) and F(E) is the σ -algebra generated by the random events of N that are in E Weassume the following mixing condition on N

supp

α(p k)

p= O(kminusε) for some ε gt 2 (3)

Condition (3) states that, for any two fixed sets, the dependence between them must decay to 0 at a polynomial rate of the interset distance k. The speed at which the dependence decays to 0 also depends on the size of the sets (i.e., p). In particular, for a fixed k, condition (3) allows the dependence to increase as p increases. Any point process with a finite dependence range, such as the Matérn cluster process (Stoyan and Stoyan 1994), satisfies this condition. Furthermore, it is also satisfied by the log-Gaussian Cox process (LGCP), which is very flexible in modeling environmental data (e.g., Møller, Syversveen, and Waagepetersen 1998), if the correlation of the underlying Gaussian random field decays at a polynomial rate faster than 2 + ε and has a spectral density bounded below from 0. This is due to corollary 2 of Doukhan (1994, p. 59).

In addition to conditions (2) and (3), we also need some mild conditions on the intensity and cumulant functions of N. In what follows, let f^{(i)}(β) denote the ith derivative of f(β). We assume that

λ(s, β) is bounded below from 0,  (4)

λ^{(2)}(s, β) is bounded and continuous with respect to β,  (5)

and

sup_{s_1} ∫ ··· ∫ |Q_k(s_1, ..., s_k)| ds_2 ··· ds_k < C  for k = 2, 3, 4.  (6)

Conditions (4) and (5) are straightforward conditions that can be checked directly for a proposed FOIF model. Condition (6) is a fairly weak condition. It also requires the process N to be weakly dependent, but from a different perspective from (3). In the homogeneous case, (6) is implied by Brillinger mixing, which holds for many commonly used point process models, such as the Poisson cluster process, a class of doubly stochastic Poisson processes (i.e., Cox processes), and certain renewal processes (e.g., Heinrich 1985). In the inhomogeneous case, Q_k(s_1, ..., s_k) often can be written as λ(s_1) ··· λ(s_k) φ_k(s_2 − s_1, ..., s_k − s_1), where φ_k(·) is a cumulant function for some homogeneous process. Then (6) holds if λ(s) is bounded and ∫ ··· ∫ |φ_k(u_2, ..., u_k)| du_2 ··· du_k < C. Processes satisfying this condition include, but are not limited to, the LGCP, the inhomogeneous Neyman–Scott process (INSP; Waagepetersen 2007), and any inhomogeneous process obtained by thinning a homogeneous process satisfying this condition.

Theorem 1. Assume that conditions (2)–(6) hold and that β̂_n converges to β_0 in probability, where β_0 is the true parameter vector. Then

|D_n|^{1/2} (Σ_n)^{−1/2} (β̂_n − β_0) →_d N(0, I_p),

where I_p is a p × p identity matrix,

Σ_n = |D_n| (A_n)^{−1} B_n (A_n)^{−1},

A_n = ∫_{D_n} λ^{(1)}(s, β_0)[λ^{(1)}(s, β_0)]′ / λ(s, β_0) ds,

and

B_n = A_n + ∫∫_{D_n} [λ^{(1)}(s_1, β_0)[λ^{(1)}(s_2, β_0)]′ / (λ(s_1, β_0)λ(s_2, β_0))] Q_2(s_1, s_2) ds_1 ds_2.

Proof. See Appendix A.

We assume in Theorem 1 that β̂_n is a consistent estimator for β_0. Schoenberg (2004) established the consistency of β̂_n for a class of spatial–temporal processes under some mild conditions. He assumed that the spatial domain was fixed but that the time domain increased to infinity. His results can be directly extended to our case. Therefore, we simply assume consistency in Theorem 1. In connection with previous results, our asymptotic result coincides with that of Rathbun and Cressie (1994) in the inhomogeneous spatial Poisson process case and Waagepetersen (2007) in the INSP case. Furthermore, note that Σ_n depends only on the first- and second-order properties of the process.

3 THINNED BLOCK BOOTSTRAP

3.1 The Proposed Method

Theorem 1 states that |D_n|^{1/2}(Σ_n)^{−1/2}(β̂_n − β_0) converges to a standard multivariate normal distribution as n increases. This provides the theoretical foundation for making inference on β. In practice, however, Σ_n is typically unknown and thus must be estimated. From the definition of Σ_n in Theorem 1, we see that two quantities (i.e., A_n and B_n) need to be estimated to estimate Σ_n. Note that A_n depends only on the FOIF and thus can be estimated easily once a FOIF model has been fitted to the data; however, the quantity B_n also depends on the SOCF Q_2(·). Unless an explicit parametric model is assumed for Q_2(·), which (as argued in Sec. 1) may be unrealistic, it is difficult to directly estimate B_n. In this section we propose a thinned block bootstrap approach to first estimate a term that is related to B_n and then use it to produce an estimate for B_n.

From now on, we assume that the point process N is second-order reweighted stationary (SORWS); that is, there exists a function g(·) defined in R² such that λ_2(s_1, s_2) = λ(s_1)λ(s_2)g(s_1 − s_2). The function g(·) is often referred to as the pair correlation function (PCF; e.g., Stoyan and Stoyan 1994). Note that our definition of second-order reweighted stationarity is a special case of the original definition given by Baddeley, Møller, and Waagepetersen (2000). Specifically, these two definitions coincide if the PCF exists. Examples of SORWS processes include the INSP, the LGCP, and any point process obtained by thinning an SORWS point process (e.g., Baddeley et al. 2000). We also assume that β_0 is known; thus we suppress the dependence of the FOIF on β in this section. In practice, the proposed algorithm can be applied by simply replacing β_0 with its estimate β̂. A theoretical justification for using β̂ is given in Appendix C.

The essence of our approach is based on the fact that an SORWS point process can be thinned to be second-order stationary (SOS) by applying some proper thinning weights to the events (Guan 2007). Specifically, we consider the following thinned process:

Ψ = { x : x ∈ N ∩ D_n, P(x is retained) = min_{s∈D_n} λ(s) / λ(x) }.  (7)

Clearly, Ψ is SOS, because its first- and second-order intensity functions can be written as

λ_n = min_{s∈D_n} λ(s)  and  λ_{2,n}(s_1, s_2) = (λ_n)² g(s_1 − s_2),

where s_1, s_2 ∈ D_n. Based on Ψ, we then define the following statistic:

S_n = ∑_{x∈Ψ∩D_n} λ^{(1)}(x).  (8)

Note that Q_2(s_1, s_2) = λ(s_1)λ(s_2)[g(s_1 − s_2) − 1], because N is SORWS. Thus the covariance matrix of S_n, where S_n is defined in (8), is given by

cov(S_n) = λ_n ∫_{D_n} λ^{(1)}(s)[λ^{(1)}(s)]′ ds + (λ_n)² ∫∫_{D_n} [λ^{(1)}(s_1)[λ^{(1)}(s_2)]′ / (λ(s_1)λ(s_2))] Q_2(s_1, s_2) ds_1 ds_2.  (9)

The first term on the right side of (9) depends only on the FOIF and thus can be estimated once a FOIF model has been fitted. The second term, interestingly, is (λ_n)²(B_n − A_n). These facts suggest that if we can estimate cov(S_n), then we can use the relationship between cov(S_n) and B_n to estimate B_n in an obvious way. Note that S_n is a statistic defined on Ψ ∩ D_n, where Ψ is an SOS process. Thus the problem of estimating B_n becomes a problem of estimating the variance of a statistic calculated from an SOS process, which we can accomplish by using the block bootstrap procedure.

Toward that end, we divide D_n into k_n subblocks of the same shape and size. Let D^i_{l(n)}, i = 1, ..., k_n, denote these subblocks, where l(n) = cn^α, for some c > 0 and α ∈ (0, 1), signifies the subblock size. For each D^i_{l(n)}, let c_i denote the "center" of the subblock, where the "centers" are defined in some consistent way across all of the D^i_{l(n)}'s. We then resample with replacement k_n subblocks from all of the available subblocks and "glue" them together so as to create a new region that will be used to approximate D_n. We do so a large number, say B, of times. For the bth collection of random subblocks, let J_b be the set of k_n random indices sampled from {1, ..., k_n} that are associated with the selected subblocks. For each event x ∈ D^{J_b(i)}_{l(n)}, we replace it with a new value x − c_{J_b(i)} + c_i, so that all events in D^{J_b(i)}_{l(n)} are properly shifted into D^i_{l(n)}. In other words, the ith sampled subblock D^{J_b(i)}_{l(n)} will be "glued" at where D^i_{l(n)} was in the original data. We then define

S_n^b = ∑_{i=1}^{k_n} ∑_{x∈Ψ∩D^{J_b(i)}_{l(n)}} λ^{(1)}(x − c_{J_b(i)} + c_i),  (10)

where the sum ∑_{x∈Ψ∩D^{J_b(i)}_{l(n)}} is over all events in block D^{J_b(i)}_{l(n)} translated into block D^i_{l(n)}. Based on the S_n^b, b = 1, ..., B, thus obtained, we can calculate the sample covariance of the S_n^b and use that as an estimate for cov(S_n). Note that for a given realization of N on D_n, the application of the thinning algorithm given in (7) can yield multiple thinned realizations. This will in turn lead to multiple sample covariance matrices by applying the proposed block bootstrap procedure. One possible solution is to simply average all of the obtained sample covariance matrices so as to produce a single estimate for cov(S_n). To do this, let K denote the total number of thinned processes, and let S_{n,j}^b and S̄_{n,j} denote S_n^b and the bootstrapped mean of S_n^b, b = 1, ..., B, for the jth thinned process, where S_n^b is as defined in (10). We thus obtain the following estimator for the covariance matrix of S_n:

ĉov(S_n) = (1/K) ∑_{j=1}^{K} ∑_{b=1}^{B} (S_{n,j}^b − S̄_{n,j})(S_{n,j}^b − S̄_{n,j})′ / (B − 1).  (11)
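To make the procedure concrete, here is a minimal base-R sketch of the thinning rule (7) and the bootstrap draws (10) for a scalar parameter (p = 1) on the unit square. The functions lambda() and dlambda() stand for a fitted FOIF λ̂(·) and its derivative λ̂^{(1)}(·); all function names, arguments, and defaults are our own illustrative choices, not code from the paper or from any package.

## Thinning step (7): retain x with probability min_s lambda(s) / lambda(x).
thin_pattern <- function(x, y, lambda) {
  lam <- lambda(x, y)
  keep <- runif(length(x)) <= min(lam) / lam
  data.frame(x = x[keep], y = y[keep])
}

## Block bootstrap for var(S_n) on [0,1]^2 split into nblk x nblk subblocks.
boot_cov <- function(pts, dlambda, nblk, B = 499) {
  h <- 1 / nblk                               # subblock side length l(n)
  K <- nblk^2                                 # number of subblocks k_n
  blk <- pmin(floor(pts$x / h), nblk - 1) * nblk +    # subblock index of
         pmin(floor(pts$y / h), nblk - 1) + 1         # each retained event
  ctr <- cbind((((1:K) - 1) %/% nblk + 0.5) * h,      # subblock centers c_i
               (((1:K) - 1) %%  nblk + 0.5) * h)
  Sb <- replicate(B, {
    J <- sample(K, K, replace = TRUE)         # resample k_n subblocks
    tot <- 0
    for (i in 1:K) {                          # glue block J[i] into slot i
      sel <- blk == J[i]
      if (any(sel))
        tot <- tot + sum(dlambda(pts$x[sel] - ctr[J[i], 1] + ctr[i, 1],
                                 pts$y[sel] - ctr[J[i], 2] + ctr[i, 2]))
    }
    tot                                       # one draw of S_n^b in (10)
  })
  var(Sb)                                     # sample variance over the B draws
}

Averaging the output of boot_cov() over K independent calls to thin_pattern() then gives the estimator (11).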

3.2 Consistency and Practical Issues

To study the asymptotic properties of ĉov(S_n), we also assume the following condition:

(1/|D_n|^{1/2}) ∫_{D_n} ∫_{R²\D_n} |g(s_1 − s_2) − 1| ds_1 ds_2 < C,  (12)

where R²\D_n = {s : s ∈ R², s ∉ D_n}. Condition (12) is only slightly stronger than ∫_{R²} |g(s) − 1| ds < C, which is implied by condition (6). In the case where the D_n's are sufficiently regular (e.g., a sequence of n × n squares) such that |(D_n + s) ∩ R²\D_n| < C‖s‖|D_n|^{1/2}, condition (12) is implied by the condition ∫_{R²} ‖s‖ |g(s) − 1| ds < C, which holds for any process that has a finite range of dependence. Furthermore, it holds for the LGCP if the correlation function of the underlying Gaussian random field generating the intensities decays to 0 at a rate faster than ‖s‖^{−3}.

Theorem 2. Assume that conditions (2), (4), (5), and (6) hold. Then

(1/|D_n|) ĉov(S_n) →_{L²} (1/|D_n|) cov(S_n).

If we further assume (12), then (1/|D_n|²) E[ĉov(S_n) − cov(S_n)]² ≤ C[1/k_n + 1/|D_{l(n)}|] for some C < ∞.

Proof. See Appendix B.

Theorem 2 establishes the L² consistency of the proposed covariance matrix estimator for a wide range of values for the number of thinned processes and the subblock size. But to apply the proposed method in practice, these values must be determined. For the former, the proposed thinning algorithm can be applied as many times as necessary, until a stable estimator is obtained. For the latter, Theorem 2 suggests that the "optimal" rate for the subblock size is |D_n|^{1/2}, where the word "optimal" is in the sense of minimizing the mean squared error. This result agrees with the findings for the optimal subblock size for other resampling variance estimators obtained from quantitative processes (e.g., Sherman 1996). If we further assume that the best rate can be approximated by c_n = c|D_n|^{1/2}, where c is an unknown constant, then we can adapt the algorithm of Hall and Jing (1996) to estimate c_n and thus determine the "optimal" subblock size.

Specifically, let D_m^i, i = 1, ..., k′_n, be a set of subblocks contained in D_n that have the same size and shape, let S_{m,j}^i be S_n in (8) obtained from the jth thinned process on D_m^i, and let S̄_{m,j} be the mean of the S_{m,j}^i. We then define the sample variance–covariance matrix of the S_{m,j}^i, averaged over the K replicate thinned point processes,

θ̂_m = (1/K) ∑_{j=1}^{K} ∑_{i=1}^{k′_n} (S_{m,j}^i − S̄_{m,j})(S_{m,j}^i − S̄_{m,j})′ / (k′_n − 1).

If D_m^i is relatively small compared with D_n, then the foregoing should be a good estimator for the true covariance matrix of S_m, where S_m is S_n in (8) defined on D_m. For each D_m^i, we then apply (11) to estimate cov(S_m) over a fine set of candidate subblock sizes, say c′_m = c′|D_m^i|^{1/2}. Let θ̂_m(j, k) and [θ̂_m^i(j, k)]′ denote the (j, k)th elements of θ̂_m and of the resulting estimator obtained using c′_m. We define the best c_m as the one that has the smallest value of the following criterion:

M(c′_m) = (1/k′_n) ∑_{i=1}^{k′_n} ∑_{j=1}^{p} ∑_{k=1}^{p} {[θ̂_m^i(j, k)]′ − θ̂_m(j, k)}².

The "optimal" subblock size for D_n, c_n, then can be estimated easily using the relationship c_n = c_m(|D_n|/|D_m|)^{1/2}.
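A rough sketch of this selection rule for p = 1, reusing thin_pattern() and boot_cov() from the sketch in Section 3.1 and, for brevity, a single thinned realization (K = 1), is as follows; the subregion side hm and the candidate block counts are again our own illustrative choices.

select_block_size <- function(pts, lambda, dlambda, hm = 0.5,
                              cand_nblk = c(2, 3, 4)) {
  ngrid <- floor(1 / hm)                   # k'_n = ngrid^2 subregions D_m^i
  thin <- thin_pattern(pts$x, pts$y, lambda)
  gi <- pmin(floor(thin$x / hm), ngrid - 1) * ngrid +
        pmin(floor(thin$y / hm), ngrid - 1) + 1
  # S_m^i: the statistic (8) on each subregion of the thinned pattern
  Si <- tapply(dlambda(thin$x, thin$y),
               factor(gi, levels = 1:ngrid^2), sum, default = 0)
  theta_m <- var(as.numeric(Si))           # direct estimate of var(S_m)
  M <- sapply(cand_nblk, function(nb) {    # criterion M(c'_m)
    est <- sapply(1:ngrid^2, function(i) {
      ox <- ((i - 1) %/% ngrid) * hm       # subregion origin
      oy <- ((i - 1) %%  ngrid) * hm
      sel <- gi == i                       # rescale D_m^i to the unit square
      sub <- data.frame(x = (thin$x[sel] - ox) / hm,
                        y = (thin$y[sel] - oy) / hm)
      boot_cov(sub, function(x, y) dlambda(ox + x * hm, oy + y * hm),
               nblk = nb, B = 199)
    })
    mean((est - theta_m)^2)
  })
  cand_nblk[which.min(M)]                  # best block count on D_m; scale up
}                                          # via c_n = c_m (|D_n| / |D_m|)^(1/2)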


4 SIMULATIONS

4.1 Log-Gaussian Cox Process

We performed a simulation study to test the results of our theoretical findings. On an observation region D of size L × L, where L = 1, 2, and 4, we simulated 1,000 realizations of an inhomogeneous Poisson process with intensity function λ(s) given by

log λ(s) = α + βX(s) + G(s),  (13)

where G is a mean-0 Gaussian random field. This is the LGCP model, with inhomogeneity due to the covariate X.

For values of X, we used a single realization from a different mean-0 Gaussian random field. We kept this realization fixed throughout the whole simulation study; that is, we used the same X for each of the 1,000 point process realizations. Figure 1 shows a surface plot of X for L = 4. The values of X for L = 1 and 2 are the lower-left corner of those shown in the figure. For G, we used the exponential model C(h) = σ² exp(−h/ρ) for its covariance function, where ρ = 1 and 2 and σ = 1. We set α and β to be 7.02 and 2. We generated the point process patterns using the rpoispp function in the spatstat R package (Baddeley and Turner 2005). The average numbers of points simulated were about 1,200, 4,500, and 16,700 for L = 1, 2, and 4.
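As an illustration of this setup, the self-contained base-R sketch below generates one realization from (13) on the unit square (L = 1): the field G, and a stand-in for the fixed covariate X, are drawn on a coarse grid by Cholesky factorization of the exponential covariance, and the pattern is then sampled cell by cell, which is exact for the resulting piecewise-constant intensity. The grid resolution, the scaling of X, and the small nugget are our own choices (the study itself used rpoispp).

set.seed(1)
ng <- 32                                   # grid resolution for the fields
gx <- (seq_len(ng) - 0.5) / ng
grd <- expand.grid(x = gx, y = gx)         # x varies fastest
d <- as.matrix(dist(grd))
R <- chol(exp(-d / 1) + diag(1e-8, ng^2))  # exponential covariance, rho = 1,
G <- drop(t(R) %*% rnorm(ng^2))            #   sigma = 1; one draw of G
X <- 0.25 * drop(t(R) %*% rnorm(ng^2))     # stand-in covariate, scaled to
                                           #   roughly the range in Figure 1
alpha <- 7.02; beta <- 2
lam <- exp(alpha + beta * X + G)           # cell-wise intensity, model (13)
ncell <- rpois(ng^2, lam / ng^2)           # Poisson counts per grid cell
pts <- data.frame(                         # uniform locations within cells
  x = rep(grd$x, ncell) + (runif(sum(ncell)) - 0.5) / ng,
  y = rep(grd$y, ncell) + (runif(sum(ncell)) - 0.5) / ng)
nrow(pts)                                  # typically a few thousand points

The estimates α̂ and β̂ can then be obtained by maximizing (1), for example with spatstat's ppm using X as a pixel-image covariate.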

For each of the 1,000 point patterns, we obtained the estimates α̂ and β̂ by maximizing the Poisson likelihood (using the ppm function in spatstat). Next, we thinned the point pattern according to the criterion described in Section 3, resulting in 20 independent thinned processes for each realization. The average numbers of points in the thinned patterns were roughly 490, 1,590, and 5,400 for L = 1, 2, and 4, so about 30%–40% of the points were retained.

Figure 1. Image of the covariate X used in the simulation. The range of X is between −.6066 and .5303, where darker colors represent larger values of X.

We then applied the bootstrap procedure to each of the thinned processes to get the variance estimates for α̂ and β̂. From (13), we found that the quantities to be computed with each bootstrap sample were ∑ λ̂(s*) for α and ∑ X(s*)λ̂(s*) for β, where s* represents the locations of the points in a bootstrap sample. We used nonoverlapping square blocks in the bootstrap procedure. For L = 1, we used blocks with side lengths 1/4, 1/3, and 1/2, whereas for L = 2, 4, we used blocks with side lengths 1/4, 1/3, 1/2, and 1. The number of bootstrap samples was 499.

We computed the variances of α̂ and β̂ from the bootstrap variances using Theorem 1, (9), and the relationship of ĉov(S_n) to B_n. We averaged the variance estimates across the independent thinnings. Finally, we constructed nominal 95% confidence intervals, yielding a total of 1,000 confidence intervals for α and β.

Table 1 gives the empirical coverage of the confidence intervals for β. The coverage for α (not shown) is similar but slightly lower. For both values of ρ, we found that the empirical coverage increased toward the nominal level as the observation region size increased. This increase in coverage agrees with what we would expect from the theory. The empirical coverage for ρ = 1 is typically closer to the nominal level than for ρ = 2. Note that ρ = 1 corresponds to a weaker dependence. Thus we would expect our method to work better if the dependence in the data were relatively weaker. It is interesting to note that for L = 2 and 4, the best coverage for ρ = 2 was achieved by using a block larger than its counterpart for ρ = 1. Thus, for errors with longer-range dependence, it appears that the block bootstrap procedure requires larger blocks to work well.

Table 2 gives the bootstrap estimates of the standard errors of β̂ for ρ = 2, averaged over the 1,000 realizations. The standard deviations of these estimates over the 1,000 realizations are included in brackets. The "true" standard errors, computed using the estimated regression parameters from the independent realizations, are also given. The table shows that the bootstrap estimates have a negative bias, becoming proportionally closer to the "true" standard errors as the region size L increases.

In summary, we found that our method worked reasonably well when the regression parameters were estimated. We found a slight undercoverage in each instance, but noted that this undercoverage becomes smaller as the sample size increases. We also estimated the standard errors by using the true regression parameters in our thinning step and found that the coverage increased only slightly.

Table 1. Empirical coverage of 95% nominal confidence intervals of β in the LGCP case

                         Block size
Region size     1/4     1/3     1/2       1

ρ = 1
  1             .85     .77     .49      —
  2             .91     .90     .88     .49
  4             .93     .93     .93     .90

ρ = 2
  1             .84     .76     .50      —
  2             .87     .88     .87     .50
  4             .88     .89     .90     .89


Table 2. Estimates of the standard errors of β̂ for ρ = 2 in the LGCP case

                                 Block size
Region size   True SE      1/4         1/3         1/2          1

  1             .24      .17(.04)    .16(.05)    .13(.06)       —
  2             .12      .10(.02)    .10(.02)    .10(.03)    .08(.03)
  4             .07     .056(.003)  .057(.004)  .060(.006)  .060(.008)

4.2 Inhomogeneous Neyman–Scott Process

Theorem 1 gives the variance in terms of the FOIF and the SOCF. Another approach for variance estimation is to use the plug-in method. As pointed out by one referee, we may simply estimate the PCF or the K-function using the original data (Baddeley et al. 2000). This will yield an estimate of the SOCF, which in turn can be plugged into the expression for B_n in Theorem 1 to obtain an estimate for the asymptotic variance. Waagepetersen (2007) studied the performance of this method in the INSP case.

To compare the performance of our method with the plug-in method, we also simulated data from the INSP model given by Waagepetersen (2007). To do so, we first simulated a homogeneous Poisson process with intensity κ = 50 as the parent process. For each parent, we then generated a Poisson number of offspring. We defined the position of each offspring relative to its parent by a radially symmetric Gaussian random variable (e.g., Diggle 2003). Let ω denote the standard deviation of the variable. We used ω = .02, .04, representing relatively strong and weak clustering. Finally, we thinned the offspring, as done by Waagepetersen (2007), by setting the probability of retaining an offspring equal to the intensity at the offspring location divided by the maximum intensity in the study region. For the intensity, we used log λ(s) = α + βX(s), where α, β, and X(s) were as defined in the LGCP case.
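A base-R sketch of this generation scheme (one INSP realization on the unit square) is given below. The offspring mean mu and the window-expansion width are our own choices, because the text does not state them, and lambda/lam_max stand for the FOIF and its maximum over the window.

rinsp <- function(kappa = 50, omega = 0.02, mu = 40, lambda, lam_max) {
  ex <- 4 * omega                           # expand window so edge clusters
  np <- rpois(1, kappa * (1 + 2 * ex)^2)    #   are not lost
  parx <- runif(np, -ex, 1 + ex); pary <- runif(np, -ex, 1 + ex)
  noff <- rpois(np, mu)                     # Poisson offspring counts
  ox <- rep(parx, noff) + rnorm(sum(noff), sd = omega)
  oy <- rep(pary, noff) + rnorm(sum(noff), sd = omega)
  inside <- ox >= 0 & ox <= 1 & oy >= 0 & oy <= 1
  ox <- ox[inside]; oy <- oy[inside]
  keep <- runif(length(ox)) < lambda(ox, oy) / lam_max   # location-dependent
  data.frame(x = ox[keep], y = oy[keep])                 #   thinning
}
## e.g.: pts <- rinsp(omega = .02,
##                    lambda = function(x, y) exp(.5 + 2 * x),
##                    lam_max = exp(2.5))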

We simulated 1,000 realizations of the process on both a 1 × 1 square and a 2 × 2 square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of α̂ and β̂ and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters κ and ω by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing

∫_0^a [K̂(t)^c − K(t; κ, ω)^c]² dt  (14)

with respect to (κ, ω) for a specified a and c, where K̂(t) and K(t; κ, ω) are the empirical and theoretical K functions. We used a = 4ω and c = .25, then used the estimated values of κ̂ and ω̂ to estimate the standard errors of α̂ and β̂. To do this, we used the inhom.thomas.asympcov function in the InhomCluster R package, which is available on Waagepetersen's website at http://www.math.aau.dk/~rw/sppcode.
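For concreteness, here is a sketch of the minimum contrast step (14) for the Thomas process, discretizing the integral over a grid of lags tt at which an empirical inhomogeneous K-function Khat has been evaluated (e.g., by spatstat's Kinhom). K_thomas is the standard closed-form K-function of a Thomas process; the starting values and the optimizer on the log scale are our own choices.

K_thomas <- function(t, kappa, omega)
  pi * t^2 + (1 - exp(-t^2 / (4 * omega^2))) / kappa

fit_mincontrast <- function(tt, Khat, a, cc = 0.25) {
  dt <- diff(tt[1:2])                       # lag-grid spacing
  obj <- function(par) {                    # par = (log kappa, log omega)
    kap <- exp(par[1]); om <- exp(par[2])
    use <- tt <= a & tt > 0
    sum((Khat[use]^cc - K_thomas(tt[use], kap, om)^cc)^2) * dt
  }
  out <- optim(c(log(50), log(0.02)), obj)  # Nelder-Mead on the log scale
  exp(out$par)                              # (kappa_hat, omega_hat)
}
## e.g., with tt <- seq(0.001, 0.2, by = 0.001) and Khat from Kinhom:
## fit_mincontrast(tt, Khat, a = 4 * 0.02)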

Table 3 gives the empirical coverage of the confidence intervals for β. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for L = 2 and ω = .02, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for ω = .02 than for ω = .04, because the former yielded a process with a shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters a and c in (14). In the simulation, we also considered using the default value of a given by spatstat. The results, not shown here, suggested that the coverage was often worse than that achieved by our method.

5 AN APPLICATION

We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census, by using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), in which he considered the following FOIF model:

λ(s) = exp[β_0 + β_1 E(s) + β_2 G(s)],

where E(s) and G(s) are the (estimated) elevation and gradient at location s. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1). Specifically, he obtained the estimates β̂_1 = .02 and β̂_2 = 5.84. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that β_2 was significant but β_1 was not.

Table 3. Empirical coverage of 95% nominal confidence intervals of β obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                              Block size
Region size   Plug-in     1/4     1/3     1/2

ω = .02
  1             .96       .91     .89     .81
  2             .96       .94     .94     .94

ω = .04
  1             .93       .87     .87     .77
  2             .95       .90     .90     .91


Figure 2. Locations of BPL trees.

We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering was due to seed dispersal alone, it probably was due to seed dispersal that had occurred in multiple generations. As a result, the validity of the model and the subsequent estimated standard errors and inference on the regression parameters required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.

To estimate the standard errors for β̂_1 and β̂_2, we applied (11) with K = 100 and B = 999 and used 200 × 100 m subblocks. We chose K = 100 because the trace plots for the estimated standard errors became fairly stable if 100 or more thinned realizations were used. We selected the subblock size by using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for β̂_1 and β̂_2 using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method, but nevertheless lead to the same conclusion as the plug-in method. Specifically, they suggest that β_2 is significant but β_1 is not.

Table 4. Results for the BPL data

             Plug-in method               Thinned block bootstrap
       EST     STD     95% CI             STD       95% CI

β_1    .02     .02    (−.02, .06)         .017     (−.014, .054)
β_2   5.84    2.53    (.89, 10.80)        2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters (obtained by maximizing (1)), the standard deviation, and the confidence interval.

From a biological standpoint, this means that the BPL trees prefer to live on slopes, but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007), using less restrictive assumptions.

APPENDIX A: PROOF OF THEOREM 1

Let β_0 represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that β and β_0 are both scalars, that is, p = 1. In the case of p > 1, the proof can be easily generalized by applying the Cramér–Wold device.

Lemma A.1. Assume that conditions (4)–(6) hold. Then |D_n|² E[U_n^{(1)}(β_0)]⁴ < C for some C < ∞.

Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that λ^{(1)}(s, β_0)/λ(s, β_0) is bounded due to conditions (4) and (5), we can derive that |D_n|⁴ E[U_n^{(1)}(β_0)]⁴ is bounded by the following (ignoring some multiplicative constants):

∫∫∫∫ |Q_4(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4 + [∫∫ |Q_2(s_1, s_2)| ds_1 ds_2]² + ∫∫∫ |Q_3(s_1, s_2, s_3)| ds_1 ds_2 ds_3 + |D_n| ∫∫ |Q_2(s_1, s_2)| ds_1 ds_2 + ∫∫ |Q_2(s_1, s_2)| ds_1 ds_2,

where all of the integrals are over D_n. It then follows from condition (6) that the foregoing sum is of order |D_n|². Thus the lemma is proven.

Proof of Theorem 1

First, note that

U_n^{(1)}(β) = (1/|D_n|) [ ∑_{x∈N∩D_n} λ^{(1)}(x, β)/λ(x, β) − ∫ λ^{(1)}(s, β) ds ]

and

U_n^{(2)}(β) = (1/|D_n|) [ ∑_{x∈N∩D_n} λ^{(2)}(x, β)/λ(x, β) − ∫ λ^{(2)}(s, β) ds ] − (1/|D_n|) ∑_{x∈N∩D_n} [λ^{(1)}(x, β)/λ(x, β)]²
  = U_{n,a}^{(2)}(β) − U_{n,b}^{(2)}(β).

Using the Taylor expansion, we can obtain

|D_n|^{1/2}(β̂_n − β_0) = −[U_n^{(2)}(β̃_n)]^{−1} |D_n|^{1/2} U_n^{(1)}(β_0),

where β̃_n is between β̂_n and β_0. We need to show that

U_{n,a}^{(2)}(β_0) →_p 0

and

U_{n,b}^{(2)}(β_0) →_p (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]²/λ(s, β_0) ds.


To show the first expression, note that E[U_{n,a}^{(2)}(β_0)] = 0. Thus we need look only at the variance term:

var[U_{n,a}^{(2)}(β_0)] = (1/|D_n|²) ∫∫ [λ^{(2)}(s_1, β_0)λ^{(2)}(s_2, β_0) / (λ(s_1, β_0)λ(s_2, β_0))] Q_2(s_1, s_2) ds_1 ds_2 + (1/|D_n|²) ∫ [λ^{(2)}(s, β_0)]²/λ(s, β_0) ds,

which converges to 0 due to conditions (4)–(6). Thus U_{n,a}^{(2)}(β_0) →_p 0.

To show the second expression, note that E[U_{n,b}^{(2)}(β_0)] = (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]²/λ(s, β_0) ds. Thus, again, we need consider only the variance term:

var[U_{n,b}^{(2)}(β_0)] = (1/|D_n|²) ∫∫ [[λ^{(1)}(s_1, β_0)λ^{(1)}(s_2, β_0)]² / (λ(s_1, β_0)λ(s_2, β_0))] Q_2(s_1, s_2) ds_1 ds_2 + (1/|D_n|²) ∫ [λ^{(1)}(s, β_0)]⁴/λ(s, β_0) ds,

which converges to 0 due to conditions (4)–(6). Thus U_{n,b}^{(2)}(β_0) →_p (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]²/λ(s, β_0) ds.

Now we want to show that |D_n|^{1/2} U_n^{(1)}(β_0) converges to a normal distribution. First, we derive the mean and variance of |D_n|^{1/2} U_n^{(1)}(β_0). Clearly, E[|D_n|^{1/2} U_n^{(1)}(β_0)] = 0. For the variance,

var[|D_n|^{1/2} U_n^{(1)}(β_0)] = (1/|D_n|) ∫∫ [λ^{(1)}(s_1, β_0)λ^{(1)}(s_2, β_0) / (λ(s_1, β_0)λ(s_2, β_0))] Q_2(s_1, s_2) ds_1 ds_2 + (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]²/λ(s, β_0) ds.

Now consider a new sequence of regions D*_n, where D*_n ⊂ D_n and |D*_n|/|D_n| → 1 as n → ∞. Let U_{n*}^{(1)}(β_0) be the estimating function for the realization of the point process on D*_n. Based on the expression of the variance, we can deduce that

var[|D_n|^{1/2} U_n^{(1)}(β_0) − |D*_n|^{1/2} U_{n*}^{(1)}(β_0)] → 0 as n → ∞.

Thus

|D_n|^{1/2} U_n^{(1)}(β_0) ~ |D*_n|^{1/2} U_{n*}^{(1)}(β_0),

where the notation a_n ~ b_n means that a_n and b_n have the same limiting distribution. Thus we need only show that |D*_n|^{1/2} U_{n*}^{(1)}(β_0) converges to a normal distribution for some properly defined D*_n.

To obtain a sequence of D*_n, we apply the blocking technique used by Guan et al. (2004). Let l(n) = n^α and m(n) = n^α − n^η for 4/(2 + ε) < η < α < 1, where ε is defined as in condition (3). We first divide the original domain D_n into some nonoverlapping l(n) × l(n) subsquares D^i_{l(n)}, i = 1, ..., k_n. Within each subsquare, we further obtain D^i_{m(n)}, an m(n) × m(n) square sharing the same center with D^i_{l(n)}. Note that d(D^i_{m(n)}, D^j_{m(n)}) ≥ n^η for i ≠ j. Now define

D*_n = ∪_{i=1}^{k_n} D^i_{m(n)}.

Condition (2) implies that |D*_n|/|D_n| → 1. Thus we need only show that |D*_n|^{1/2} U_{n*}^{(1)}(β_0) converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and the application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.

APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let φ(u) = g(u) − 1, φ_k(s_1, ..., s_k) = Q_k(s_1, ..., s_k)/[λ(s_1) ··· λ(s_k)], and g_k(s_1, ..., s_k) = λ_k(s_1, ..., s_k)/[λ(s_1) ··· λ(s_k)] for k = 3 and 4, and let Z(s) = λ^{(1)}(s). As in the proof of Theorem 1, we assume that β and β_0 are both scalars, that is, p = 1. As a result, Z(s) is a scalar, so that cov(S_n) = var(S_n). Furthermore, we consider only the case where K = 1. Thus the covariance estimator defined in (11) becomes

v̂ar(S_n) = ∑_{b=1}^{B} (S_n^b − S̄_n)² / (B − 1).

As B goes to infinity, we see that the foregoing converges to

var(S_n^b | Ψ ∩ D_n) = (1/k_n) ∑_{i=1}^{k_n} ∑_{j=1}^{k_n} ∑_{x_1∈D^j_{l(n)}} ∑_{x_2∈D^j_{l(n)}} Z(x_1 − c_j + c_i) Z(x_2 − c_j + c_i)
  − (1/k_n²) ∑_{i=1}^{k_n} ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∑_{x_1∈D^{j_1}_{l(n)}} ∑_{x_2∈D^{j_2}_{l(n)}} Z(x_1 − c_{j_1} + c_i) Z(x_2 − c_{j_2} + c_i),

where the notation ∑_D is short for ∑_{D∩Ψ}. We wish to show that var(S_n^b | Ψ ∩ D_n)/|D_n| converges in L² to

(1/|D_n|) var(S_n) = ((λ_n)²/|D_n|) ∫_{D_n} ∫_{D_n} Z(s_1)Z(s_2) φ(s_1 − s_2) ds_1 ds_2 + (λ_n/|D_n|) ∫_{D_n} [Z(s)]² ds.

To show this, we need to show that

(1/|D_n|) E[var(S_n^b | Ψ ∩ D_n)] → (1/|D_n|) var(S_n)  (B.1)

and

(1/|D_n|²) var[var(S_n^b | Ψ ∩ D_n)] → 0.  (B.2)

To show (B.1), note that

E[var(S_n^b | Ψ ∩ D_n)] = (λ_n)² ∑_{i=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s_1)Z(s_2) g(s_1 − s_2) ds_1 ds_2 + λ_n ((k_n − 1)/k_n) ∫_{D_n} [Z(s)]² ds
  − ((λ_n)²/k_n²) ∑_{i=1}^{k_n} ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s_1)Z(s_2) g(s_1 − s_2 + c_{j_1} − c_{j_2}) ds_1 ds_2.

Thus

(1/|D_n|) {E[var(S_n^b | Ψ ∩ D_n)] − var(S_n)}
  = −(λ_n/(k_n|D_n|)) ∫_{D_n} [Z(s)]² ds
    − ((λ_n)²/|D_n|) ∑∑_{i≠j} ∫_{D^i_{l(n)}} ∫_{D^j_{l(n)}} Z(s_1)Z(s_2) φ(s_1 − s_2) ds_1 ds_2
    − ((λ_n)²/(|D_n| k_n²)) ∑_{i=1}^{k_n} ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s_1)Z(s_2) φ(s_1 − s_2 + c_{j_1} − c_{j_2}) ds_1 ds_2
  = −T_n^1 − T_n^2 − T_n^3.

T_n^1 goes to 0 due to conditions (2) and (5). For T_n^2 and T_n^3, lengthy yet elementary derivations yield

|T_n^2| ≤ (C(λ_n)²/k_n) ∑_{i=1}^{k_n} [ (1/|D_n|) ∫_{D_n−D_n} |D_n ∩ (D_n − s)| |φ(s)| ds − (1/|D^i_{l(n)}|) ∫_{D^i_{l(n)}−D^i_{l(n)}} |D^i_{l(n)} ∩ (D^i_{l(n)} − s)| |φ(s)| ds ]

and

T_n^3 ≤ (C(λ_n)²/k_n) (1/|D_n|) ∫_{D_n} ∫_{D_n} |φ(s_1 − s_2)| ds_1 ds_2.

Both terms converge to 0 due to conditions (2), (4), (5), and (6); thus (B.1) is proven. Furthermore, we can see that T_n^1 and T_n^3 are both of order 1/k_n, whereas T_n^2 is of order 1/|D_{l(n)}|^{1/2}, due to condition (12).

To prove (B.2), we first note that [var(S_n^b | Ψ ∩ D_n)]² is equal to the sum of the following three terms:

(1/k_n²) ∑_{i_1=1}^{k_n} ∑_{i_2=1}^{k_n} ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∑_{x_1∈D^{j_1}_{l(n)}} ∑_{x_2∈D^{j_1}_{l(n)}} ∑_{x_3∈D^{j_2}_{l(n)}} ∑_{x_4∈D^{j_2}_{l(n)}} [Z(x_1 − c_{j_1} + c_{i_1}) Z(x_2 − c_{j_1} + c_{i_1}) Z(x_3 − c_{j_2} + c_{i_2}) Z(x_4 − c_{j_2} + c_{i_2})],

−(2/k_n³) ∑_{i_1=1}^{k_n} ∑_{i_2=1}^{k_n} ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∑_{j_3=1}^{k_n} ∑_{x_1∈D^{j_1}_{l(n)}} ∑_{x_2∈D^{j_1}_{l(n)}} ∑_{x_3∈D^{j_2}_{l(n)}} ∑_{x_4∈D^{j_3}_{l(n)}} [Z(x_1 − c_{j_1} + c_{i_1}) Z(x_2 − c_{j_1} + c_{i_1}) Z(x_3 − c_{j_2} + c_{i_2}) Z(x_4 − c_{j_3} + c_{i_2})],

and

(1/k_n⁴) ∑_{i_1=1}^{k_n} ∑_{i_2=1}^{k_n} ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∑_{j_3=1}^{k_n} ∑_{j_4=1}^{k_n} ∑_{x_1∈D^{j_1}_{l(n)}} ∑_{x_2∈D^{j_2}_{l(n)}} ∑_{x_3∈D^{j_3}_{l(n)}} ∑_{x_4∈D^{j_4}_{l(n)}} [Z(x_1 − c_{j_1} + c_{i_1}) Z(x_2 − c_{j_2} + c_{i_1}) Z(x_3 − c_{j_3} + c_{i_2}) Z(x_4 − c_{j_4} + c_{i_2})].

Furthermore, the following relationships are possible for x_1, ..., x_4:

a. x_1 = x_2 = x_3 = x_4;
b. x_1 = x_2 and x_3 = x_4 but x_1 ≠ x_3, or x_1 = x_3 and x_2 = x_4 but x_1 ≠ x_2;
c. x_1 = x_2 but x_2 ≠ x_3 ≠ x_4, or x_3 = x_4 but x_1 ≠ x_2 ≠ x_3;
d. x_1 = x_2 = x_3 but x_3 ≠ x_4, or x_1 = x_3 = x_4 but x_1 ≠ x_2, or x_2 = x_3 = x_4 but x_1 ≠ x_2;
e. x_1, x_2, x_3, x_4 all different.

When calculating the variance of (1/|D_n|) var(S_n^b | Ψ ∩ D_n), the foregoing relationships, combined with the expression for the squared value of (1/|D_n|) E[var(S_n^b | Ψ ∩ D_n)], in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than 1/k_n in a similar manner.

Let F_1(s_1, s_2, s_3, s_4) = g_4(s_1, s_2, s_3, s_4) − g(s_1, s_2)g(s_3, s_4). Term e leads to the following integral:

((λ_n)⁴/(k_n²|D_n|²)) ∑_{i_1=1}^{k_n} ∑_{i_2=1}^{k_n} ∑_{j_1=1}^{k_n} ∫_{D^{j_1}_{l(n)}} Z(s_1 − c_{j_1} + c_{i_1})
  × [ ∑_{j_2=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} Z(s_2 − c_{j_1} + c_{i_1}) Z(s_3 − c_{j_2} + c_{i_2}) Z(s_4 − c_{j_2} + c_{i_2}) F_1(s_1, s_2, s_3, s_4) ds_2 ds_3 ds_4
    − (2/k_n) ∑_{j_2=1}^{k_n} ∑_{j_3=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_3}_{l(n)}} Z(s_2 − c_{j_1} + c_{i_1}) Z(s_3 − c_{j_2} + c_{i_2}) Z(s_4 − c_{j_3} + c_{i_2}) F_1(s_1, s_2, s_3, s_4) ds_2 ds_3 ds_4
    + (1/k_n²) ∑_{j_2=1}^{k_n} ∑_{j_3=1}^{k_n} ∑_{j_4=1}^{k_n} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_3}_{l(n)}} ∫_{D^{j_4}_{l(n)}} Z(s_2 − c_{j_2} + c_{i_1}) Z(s_3 − c_{j_3} + c_{i_2}) Z(s_4 − c_{j_4} + c_{i_2}) F_1(s_1, s_2, s_3, s_4) ds_2 ds_3 ds_4 ] ds_1.

Let F_2(s_1, s_2, s_3, s_4) = φ_4(s_1, s_2, s_3, s_4) + 2φ_3(s_1, s_2, s_3) + 2φ_3(s_1, s_3, s_4) + 2φ(s_1 − s_3)φ(s_2 − s_4). By noting that Z(·) is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

(2C(λ_n)⁴/(k_n|D_n|²)) ∑_{j_1=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_1}_{l(n)}} ∫_{D_n} ∫_{D_n} |F_2(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4
  + (C(λ_n)⁴/|D_n|²) ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |F_1(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4.

The first integral in the foregoing is of order 1/k_n, due to conditions (4), (5), and (6). The second integral is bounded by

(C(λ_n)⁴/|D_n|²) ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |φ_4(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4
  + (4C(λ_n)⁴/(k_n|D_n|)) ∑_{j_2=1}^{k_n} ∫_{D_n} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |φ_3(s_1, s_3, s_4)| ds_1 ds_3 ds_4
  + (2C(λ_n)⁴/|D_n|²) ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} [ ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |φ(s_1 − s_3)| ds_1 ds_3 ]².

The foregoing is of order 1/k_n + a_n due to conditions (4), (5), (6), and (12), where the order of a_n is no higher than 1/|D_{l(n)}|.

APPENDIX C: JUSTIFICATION FOR USING β̂_n IN THE THINNING STEP

Here we assume |D_{l(n)}| = o(|D_n|^{1/2}). Let p_n(x, θ) = min_{s∈D_n} λ(s, θ)/λ(x, θ), and let r(x) be a uniform random variable in [0, 1]. If r(x) ≤ p_n(x, θ), where x ∈ (N ∩ D_n), then x will be retained in the thinned process, say Ψ(θ). Based on the same set {r(x) : x ∈ (N ∩ D_n)}, we can determine Ψ(θ_0) and Ψ(θ̂_n). Note that Ψ(θ_0) is as defined in (7) using the true FOIF. Define Ω_a = {x : x ∈ Ψ(θ_0), x ∉ Ψ(θ̂_n)} and Ω_b = {x : x ∉ Ψ(θ_0), x ∈ Ψ(θ̂_n)}. Note that P(x ∈ Ω_a ∪ Ω_b | x ∈ N) ≤ |p_n(x, θ̂_n) − p_n(x, θ_0)|. We need to show that

(1/|D_n|) {var[S_n^b | Ψ(θ_0) ∩ D_n] − var[S_n^b | Ψ(θ̂_n) ∩ D_n]} →_p 0,

where var[S_n^b | Ψ(θ_0) ∩ D_n] is var(S_n^b | Ψ ∩ D_n) defined in Appendix B; var[S_n^b | Ψ(θ̂_n) ∩ D_n] is defined analogously.

Let ∑_D and ∑∑_D^{≠} denote ∑_{x∈D∩(Ω_a∪Ω_b)} and ∑∑_{x_1∈D∩N, x_2∈D∩(Ω_a∪Ω_b), x_1≠x_2}, respectively. Note that

(1/|D_n|) |var[S_n^b | Ψ(θ_0) ∩ D_n] − var[S_n^b | Ψ(θ̂_n) ∩ D_n]|
  < (C/(|D_n|k_n)) ∑_{i=1}^{k_n} ∑_{j=1}^{k_n} ∑_{x_1∈D^j_{l(n)}∩N} ∑_{x_2∈D^j_{l(n)}} 1 + (C/(|D_n|k_n²)) ∑_{i=1}^{k_n} ∑_{j_1=1}^{k_n} ∑_{j_2=1}^{k_n} ∑_{x_1∈D^{j_1}_{l(n)}∩N} ∑_{x_2∈D^{j_2}_{l(n)}} 1
  = (C/|D_n|) ∑_{j=1}^{k_n} ∑∑^{≠}_{D^j_{l(n)}} 1 + (C/|D_n|) ∑_{D_n} 1 + (C/(|D_n|k_n)) [∑_{D_n} 1][∑_{x∈D_n∩N} 1].

Note that P[x ∈ (Ω_a ∪ Ω_b) | x ∈ N] ≤ C/|D_n|^{α/2} with probability 1 for 0 < α < 1, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.

[Received December 2006. Revised May 2007.]

REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.

Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.

Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.

Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.

Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.

Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.

Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.

Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.

Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.

Guan, Y., Sherman, M., and Calvin, J. A. (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.

Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.

Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.

Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.

McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.

Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.

Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.

Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.

Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.

Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.

Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.

Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.

Sherman, M. (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.

Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.

Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 62, 252–258.

Page 3: A Thinned Block Bootstrap Variance Estimation Procedure ...meng/Papers/Jasa.Guan07.pdf · Yongtao G UAN and Ji Meng L OH ... Estimates for the regression coefÞcients, ... block bootstrap

Guan and Loh Block Bootstrap Variance Estimation 1379

Gaussian random field decays at a polynomial rate faster than2 + ε and has a spectral density bounded below from 0 This isdue to corollary 2 of Doukhan (1994 p 59)

In addition to conditions (2) and (3) we also need some mildconditions on the intensity and cumulant functions of N Inwhat follows let f (i)(β) denote the ith derivative of f (β) Weassume that

λ(sβ) is bounded below from 0 (4)

λ(2)(sβ) is bounded and continuous with respect to β (5)

and

sups1

intmiddot middot middot

int|Qk(s1 sk)|ds2 middot middot middot dsk lt C

for k = 234 (6)

Conditions (4) and (5) are straightforward conditions that canbe checked directly for a proposed FOIF model Condition (6)is a fairly weak condition It also requires the process N to beweakly dependent but from a different perspective from (3)In the homogeneous case (6) is implied by Brillinger mix-ing which holds for many commonly used point process mod-els such as the Poisson cluster process a class of doubly sto-chastic Poisson process (ie Cox process) and certain renewalprocess (eg Heinrich 1985) In the inhomogeneous caseQk(s1 sk) often can be written as λ(s1) middot middot middotλ(sk)ϕk(s2 minuss1 sk minus s1) where ϕk(middot) is a cumulant function for somehomogeneous process Then (6) holds if λ(s) is bounded andint middot middot middot int |ϕk(u2 uk)|du2 middot middot middot duk lt C Processes satisfyingthis condition include but are not limited to the LGCP theinhomogeneous NeymanndashScott process (INSP Waagepetersen2007) and any inhomogeneous process obtained by thinning ahomogeneous process satisfying this condition

Theorem 1 Assume that conditions (2)ndash(6) hold and that βn

converges to β0 in probability where β0 is the true parametervector Then

|Dn|12(n)minus12(βn minus β0)

drarr N(0 Ip)

where Ip is a p times p identity matrix

n = |Dn|(An)minus1Bn(An)

minus1

An =int

Dn

λ(1)(sβ0)[λ(1)(sβ0)]primeλ(sβ0)

ds

and

Bn = An +int int

Dn

λ(1)(s1β0)[λ(1)(s2β0)]primeλ(s1β0)λ(s2β0)

Q2(s1 s2) ds1 ds2

Proof See Appendix A

We assume in Theorem 1 that βn is a consistent estimatorfor β0 Schoenberg (2004) established the consistency of βn fora class of spatialndashtemporal processes under some mild condi-tions He assumed that the spatial domain was fixed but thatthe time domain increased to infinity His results can be di-rectly extended to our case Therefore we simply assume con-sistency in Theorem 1 In connection with previous results ourasymptotic result coincides with that of Rathbun and Cressie(1994) in the inhomogeneous spatial Poisson process case and

Waagepetersen (2007) in the INSP case Furthermore note thatn depends only on the first- and second-order properties of theprocess

3 THINNED BLOCK BOOTSTRAP

31 The Proposed Method

Theorem 1 states that |Dn|12(n)minus12(βn minus β0) converges

to a standard multivariate normal distribution as n increasesThis provides the theoretical foundation for making inferenceon β In practice however n is typically unknown and thusmust be estimated From the definition of n in Theorem 1 wesee that two quantities (ie An and Bn) need to be estimatedto estimate n Note that An depends only on the FOIF andthus can be estimated easily once a FOIF model has been fit-ted to the data however the quantity Bn also depends on theSOCF Q2(middot) Unless an explicit parametric model is assumedfor Q2(middot) which (as argued in Sec 1) may be unrealistic it isdifficult to directly estimate Bn In this section we propose athinned block bootstrap approach to first estimate a term that isrelated to Bn and then use it to produce an estimate for Bn

From now on we assume that the point process N issecond-order reweighted stationary (SORWS) that is thereexists a function g(middot) defined in R

2 such that λ2(s1 s2) =λ(s1)λ(s2)g(s1 minus s2) The function g(middot) is often referred toas the pair correlation function (PCF eg Stoyan and Stoyan1994) Note that our definition of second-order reweightedstationarity is a special case of the original definition givenby Baddeley Moslashller and Waagepetersen (2000) Specificallythese two definitions coincide if the PCF exists Examples ofSORWS processes include the INSP the LGCP and any pointprocess obtained by thinning an SORWS point process (egBaddeley et al 2000) We also assume that β0 is known Thuswe suppress the dependence of the FOIF on β in this sectionIn practice the proposed algorithm can be applied by simplyreplacing β0 with its estimate β A theoretical justification forusing β is given in Appendix C

The essence of our approach is based on the fact that anSORWS point process can be thinned to be SOS by apply-ing some proper thinning weights to the events (Guan 2007)Specifically we consider the following thinned process

=

x x isin N cap DnP (x is retained) = minsisinDn λ(s)λ(x)

(7)

Clearly is SOS because its first- and second-order intensityfunctions can be written as

λn = minsisinDn

λ(s) and λ2n(s1 s2) = (λn)2g(s1 minus s2)

where s1 s2 isin Dn Based on we then define the followingstatistic

Sn =sum

xisincapDn

λ(1)(x) (8)

Note that Q2(s1 s2) = λ(s1)λ(s2)[g(s1 minus s2)minus 1] because N isSORWS Thus the covariance matrix of Sn where Sn is definedin (8) is given by

cov(Sn)

= λn

int

Dn

λ(1)(s)[λ(1)(s)

]primeds

1380 Journal of the American Statistical Association December 2007

+ (λn)2int int

Dn

λ(1)(s1)[λ(1)(s2)]primeλ(s1)λ(s2)

Q2(s1 s2) ds1 ds2 (9)

The first term on the right side of (9) depends only on the FOIFand thus can be estimated once a FOIF model has been fittedThe second term interestingly is (λn)

2(Bn minus An) These factssuggest that if we can estimate cov(Sn) then we can use the re-lationship between cov(Sn) and Bn to estimate Bn in an obviousway Note that Sn is a statistic defined on cap Dn where isan SOS process Thus the problem of estimating Bn becomes aproblem of estimating the variance of a statistic calculated froman SOS process which we can accomplish by using the blockbootstrap procedure

Toward that end we divide Dn into kn subblocks of the sameshape and size Let Di

l(n) i = 1 kn denote these subblockswhere l(n) = cnα for some c gt 0 and α isin (01) signifies thesubblock size For each Di

l(n) let ci denote the ldquocenterrdquo of

the subblock where the ldquocentersrdquo are defined in some consis-tent way across all of the Di

l(n)rsquos We then resample with re-placement kn subblocks from all of the available subblocks andldquogluerdquo them together so as to create a new region that will beused to approximate Dn We do so for a large number say B

times For the bth collection of random subblocks let Jb be theset of kn random indices sampled from 1 kn that are as-sociated with the selected subblocks For each event x isin D

Jb(i)l(n)

we replace it with a new value x minus cJb(i) + ci so that all events

in DJb(i)l(n)

are properly shifted into Dil(n)

In other words the ith

sampled subblock DJb(i)l(n) will be ldquogluedrdquo at where Di

l(n) wasin the original data We then define

Sbn =

knsum

i=1

sum

xisincapDJb(i)

l(n)

λ(1)(x minus cJb(i) + ci

) (10)

where the sumsum

xisincapDJb(i)

l(n)

is over all events in block DJb(i)l(n)

translated into block Dil(n)

Based on the Sbn b = 1 B thus

obtained we can calculate the sample covariance of Sbn and use

that as an estimate for cov(Sn) Note that for a given realizationof N on Dn the application of the thinning algorithm givenin (7) can yield multiple thinned realizations This will in turnlead to multiple sample covariance matrices by applying theproposed block bootstrap procedure One possible solution is tosimply average all of the obtained sample covariance matricesso as to produce a single estimate for cov(Sn) To do this let K

denote the total number of thinned processes and let Sbnj and

Snj denote Sbn and the bootstrapped mean of Sb

n b = 1 B for the j th thinned process where Sb

n is as defined in (10) Wethus obtain the following estimator for the covariance matrixof Sn

cov(Sn) = 1

K

Ksum

j=1

Bsum

b=1

(Sbnj minus Snj )(Sb

nj minus Snj )prime

B minus 1 (11)

32 Consistency and Practical Issues

To study the asymptotic properties of cov(Sn) we also as-sume the following condition

1

|Dn|12

int

Dn

int

R2Dn

|g(s1 minus s2) minus 1|ds1 ds2 lt C (12)

where R2Dn = s s isin R

2 but s isin Dn Condition (12) is onlyslightly stronger than

intR2 |g(s) minus 1|ds lt C which is implied

by condition (6) In the case where the Dnrsquos are sufficientlyregular (eg a sequence of n times n squares) such that |(Dn +s) cap R

2Dn| lt Cs|Dn|12 then condition (12) is implied bythe condition

intR2 s|g(s) minus 1|ds lt C which holds for any

process that has a finite range of dependence Furthermore itholds for the LGCP if the correlation function of the underlyingGaussian random field generating the intensities decays to 0 ata rate faster than sminus3

Theorem 2 Assume that conditions (2) (4) (5) and (6)hold Then

1

|Dn|cov(Sn)

L2rarr 1

|Dn| cov(Sn)

If we further assume (12) then 1|Dn|2 E[ cov(Sn)minuscov(Sn)]2 le

C[1kn + 1|Dl(n)|] for some C lt infin

Proof See Appendix B

Theorem 2 establishes the L2 consistency of the proposedcovariance matrix estimator for a wide range of values for thenumber of thinned processes and the subblock size But to ap-ply the proposed method in practice these values must be deter-mined For the former the proposed thinning algorithm can beapplied as many times as necessary until a stable estimator canbe obtained For the latter Theorem 2 suggests that the ldquoopti-malrdquo rate for the subblock size is |Dn|12 where the word ldquoopti-malrdquo is in the sense of minimizing the mean squared error Thisresult agrees with the findings for the optimal subblock size forother resampling variance estimators obtained from quantita-tive processes (eg Sherman 1996) If we further assume thatthe best rate can be approximated by cn = c|Dn|12 where c isan unknown constant then we can adapt the algorithm of Halland Jing (1996) to estimate cn and thus determine the ldquooptimalrdquosubblock size

Specifically let Dim i = 1 kprime

n be a set of subblocks con-tained in Dn that have the same size and shape let Si

mj be Sn

in (8) obtained from the j th thinned process on Dim and let

Smj be the mean of Simj We then define the sample variancendash

covariance matrix of Simj averaged over the K replicate thinned

point processes

θm = 1

K

Ksum

j=1

kprimensum

i=1

(Simj minus Smj )(Si

mj minus Smj )prime

kprimen minus 1

If Dim is relatively small compared with Dn then the forego-

ing should be a good estimator for the true covariance matrix ofSm where Sm is Sn in (8) defined on Dm For each Di

m we thenapply (11) to estimate cov(Sm) over a fine set of candidate sub-block sizes say cprime

m = cprime|Dim|12 Let θm(j k) and [θ i

m(j k)]prime bethe (j k)th element of θm and the resulting estimator for usingcprimem We define the best cm as the one that has the smallest value

for the following criterion

M(cprimem) = 1

kprimen

kprimensum

i=1

psum

j=1

psum

k=1

[θ im(j k)]prime minus θm(j k)

2

The ldquooptimalrdquo subblock size for Dn cn then can be estimatedeasily using the relationship cn = cm(|Dn||Dm|)12

Guan and Loh Block Bootstrap Variance Estimation 1381

4 SIMULATIONS

41 Log-Gaussian Cox Process

We performed a simulation study to test the results of ourtheoretical findings On an observation region D of size L timesL where L = 12 and 4 we simulated 1000 realizations ofan inhomogeneous Poisson process with intensity function λ(s)given by

logλ(s) = α + βX(s) + G(s) (13)

where G is a mean-0 Gaussian random field This is the LGCPmodel with inhomogeneity due to the covariate X

For values of X we used a single realization from a dif-ferent mean-0 Gaussian random field We kept this realizationfixed throughout the whole simulation study that is we usedthe same X for each of the 1000 point process realizationsFigure 1 shows a surface plot of X for L = 4 The values ofX for L = 1 and 2 are the lower-left corner of those shownin the figure For G we used the exponential model C(h) =σ 2 exp(minushρ) for its covariance function where ρ = 1 and 2and σ = 1 We set α and β to be 702 and 2 We generated thepoint process patterns using the rpoispp function in the spat-stat R package (Baddeley and Turner 2005) The average num-ber of points simulated were about 1200 4500 and 16700 forL = 12 and 4

For each of the 1000 point patterns we obtained the esti-mates α and β by maximizing the Poisson likelihood (usingthe ppm function in spatstat) Next we thinned the point pat-tern according to the criterion described in Section 3 resultingin 20 independent thinned processes for each realization Theaverage number of points in the thinned patterns was roughly490 1590 and 5400 for L = 12 and 4 so about 30ndash40

Figure 1 Image of the covariate X used in the simulation Therange of X is between minus6066 and 5303 where darker colors rep-resent larger values of X

of the points were retained We then applied the bootstrap pro-cedure to each of the thinned processes to get the variance esti-mates for α and β From (13) we found that the quantities to becomputed with each bootstrap sample were

sumλ(slowast) for α andsum

X(slowast)λ(slowast) for β where slowast represents the locations of thepoints in a bootstrap sample We used nonoverlapping squareblocks in the bootstrap procedure For L = 1 we used blockswith side lengths 14 13 and 12 whereas for L = 24 weused blocks with side lengths 14 13 12 and 1 The numberof bootstrap samples was 499

We computed the variances of α and β from the bootstrapvariances using Theorem 1 (9) and the relationship of cov(Sn)

to Bn We averaged the variance estimates across the inde-pendent thinnings Finally we constructed nominal 95 con-fidence intervals yielding a total of 1000 confidence intervalsfor α and β

Table 1 gives the empirical coverage of the confidence inter-vals for β The coverage for α (not shown) is similar but slightlylower For both values of ρ we found that the empirical cover-age increased toward the nominal level as the observation re-gion size increased This increase in coverage agrees with whatwe would expect from the theory The empirical coverage forρ = 1 is typically closer to the nominal level than for ρ = 2Note that ρ = 1 corresponds to a weaker dependence Thuswe would expect our method to work better if the dependencein the data were relatively weaker It is interesting to note thatfor L = 2 and 4 the best coverage for ρ = 2 was achieved byusing a block larger than its counterpart for ρ = 1 Thus forerrors with longer-range dependence it appears that the blockbootstrap procedure requires larger blocks to work well

Table 2 gives the bootstrap estimates of the standard errorsof β for ρ = 2 averaged over the 1000 realizations The stan-dard deviations of these estimates over the 1000 realizationsare included in brackets The ldquotruerdquo standard errors computedusing the estimated regression parameters from the independentrealizations are also given The table shows that the bootstrapestimates have a negative bias becoming proportionally closerto the ldquotruerdquo standard errors as the region size L increases

In summary we found that our method worked reasonablywell when the regression parameters were estimated We founda slight undercoverage in each instance but noted that this un-dercoverage becomes smaller as the sample size increases Wealso estimated the standard errors by using the true regressionparameters in our thinning step and found that the coverageincreased only slightly

Table 1 Empirical coverage of 95 nominal confidence intervalsof β in the LGCP case

Block size

Region size 14 13 12 1

ρ = 11 85 77 492 91 90 88 494 93 93 93 90

ρ = 21 84 76 502 87 88 87 504 88 89 90 89

1382 Journal of the American Statistical Association December 2007

Table 2 Estimates of the standard errors of β for ρ = 2 in the LGCP case

ρ = 2 Block size

Region size True SE 14 13 12 1

1 24 17(04) 16(05) 13(06)

2 12 10(02) 10(02) 10(03) 08(03)

4 07 056(003) 057(004) 060(006) 060(008)

42 Inhomogeneous NeymanndashScott Process

Theorem 1 gives the variance in terms of the FOIF and theSOCF Another approach for variance estimation is to use theplug-in method As point out by one referee we may sim-ply estimate the PCF or the K-function using the originaldata (Baddeley et al 2000) This will yield an estimate of theSOCF that in turn can be plugged into the expression for Bn inTheorem 1 to obtain an estimate for the asymptotic varianceWaagepetersen (2007) studied the performance of this methodin the INSP case

To compare the performance of our method with the plug-in method we also simulated data from the INSP model givenby Waagepetersen (2007) To do so we first simulated a ho-mogeneous Poisson process with intensity κ = 50 as the parentprocess For each parent we then generated a Poisson numberof offspring We defined the position of each offspring relativeto its parent by a radially symmetric Gaussian random variable(eg Diggle 2003) Let ω denote the standard deviation of thevariable We used ω = 02 04 representing relatively strongand weak clustering Finally we thinned the offspring as doneby Waagepetersen (2007) by setting the probability of retain-ing an offspring equal to the intensity at the offspring locationdivided by the maximum intensity in the study region For theintensity we used logλ(s) = α + βX(s) where α β and X(s)were as defined in the LGCP case

We simulated 1000 realizations of the process on both a1 times 1 square and a 2 times 2 square For each realization we ap-plied the thinned block bootstrap and the plug-in method ofWaagepetersen (2007) to estimate the standard errors of α and β

and to further obtain their respective 95 confidence intervalsFor the plug-in method we estimated the second-order para-meters κ and ω by a minimum contrast estimation procedureSpecifically we obtained these estimates by minimizing

int a

0[K(t)c minus K(tκω)c]2 dt (14)

with respect to (κω) for a specified a and c where K(t) andK(tκω) are the empirical and theoretical K functions Weused a = 4ω and c = 25 then used the estimated values of κ

and ω to estimate the standard errors of α and β To do this weused the inhomthomasasympcov function in the InhomClusterR package which is available on Waagepetersenrsquos website athttpwwwmathaaudksimrwsppcode

Table 3 gives the empirical coverage of the confidence in-tervals for β As in the LGCP case there was a slight under-coverage but this undercoverage appears to be less serious Inparticular for L = 2 and ω = 02 the coverage was very closeto the nominal level for all block sizes used Furthermore thecoverage was better for ω = 02 than for ω = 04 because the

former yielded a process with a shorter-range dependence thanthe latter For the plug-in method all coverages were very closeto the nominal level Thus our method did not perform as wellas the plug-in method when the true SOCF was used Howeverthe difference in coverage tended to diminish as the sample sizeincreased As pointed out by Waagepetersen (2007) the perfor-mance of the plug-in method is affected by the choice of thetuning parameters a and c in (14) In the simulation we alsoconsidered using the default value of a given by spatstat Theresults not shown here suggested that the coverage was oftenworse than that achieved by our method

5 AN APPLICATION

We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census by using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), in which he considered the following FOIF model:

$$\lambda(s) = \exp[\beta_0 + \beta_1 E(s) + \beta_2 G(s)],$$

where E(s) and G(s) are the (estimated) elevation and gradient at location s. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1). Specifically, he obtained the estimates β̂₁ = .02 and β̂₂ = 5.84. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that β₂ was significant but β₁ was not.
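This fit can be reproduced, up to differences in data versions, with the spatstat package, whose bundled bei data set contains the same BPL locations together with pixel images of the two covariates; a minimal sketch:

library(spatstat)

# bei: BPL tree locations; bei.extra: elevation (elev) and gradient (grad) images
fit <- ppm(bei, ~ elev + grad, covariates = bei.extra)
coef(fit)    # beta0, beta1 (elevation), beta2 (gradient)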

Table 3. Empirical coverage of 95% nominal confidence intervals of β̂ obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                                       Block size
           Region size    Plug-in    1/4    1/3    1/2
ω = .02    1              96         91     89     81
           2              96         94     94     94
ω = .04    1              93         87     87     77
           2              95         90     90     91


Figure 2. Locations of BPL trees.

We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering was due to seed dispersal alone, it probably was due to seed dispersal that had occurred in multiple generations. As a result, the validity of the model and the subsequent estimated standard errors and inference on the regression parameters required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.

To estimate the standard errors for β̂₁ and β̂₂, we applied (11) with K = 100 and B = 999 and used 200 × 100 m subblocks. We chose K = 100 because the trace plots for the estimated standard errors became fairly stable if 100 or more thinned realizations were used. We selected the subblock size by using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for β̂₁ and β̂₂ using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method, but nevertheless lead to the same conclusion as the plug-in method. Specifically, they suggest that β₂ is significant but β₁ is not.
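To make the computation concrete, here is a schematic R sketch of one bootstrap replicate of the statistic S_n in (10) for a single thinned pattern on the unit square. The function and argument names are our own; lamhat (the fitted FOIF) and Z (the term λ⁽¹⁾) are placeholders, and the conversion of the bootstrap variance of S_n into standard errors for β̂₁ and β̂₂ via Theorem 1 and (9) is omitted.

# One thinned block bootstrap replicate of S_n (schematic; unit square window)
# pp: point pattern with fields x, y, n; lamhat, Z: vectorized functions of (x, y)
thinned_boot_S <- function(pp, lamhat, Z, nb = c(5, 5)) {
  lam <- lamhat(pp$x, pp$y)
  keep <- runif(pp$n) <= min(lam) / lam      # thinning: retain w.p. min(lambda)/lambda
  xs <- pp$x[keep]; ys <- pp$y[keep]
  bx <- findInterval(xs, seq(0, 1, length.out = nb[1] + 1), rightmost.closed = TRUE)
  by <- findInterval(ys, seq(0, 1, length.out = nb[2] + 1), rightmost.closed = TRUE)
  blk <- (bx - 1) * nb[2] + by               # subblock label of each retained event
  kn <- prod(nb)
  draw <- sample(kn, kn, replace = TRUE)     # resample kn subblocks with replacement
  S <- 0
  for (i in seq_len(kn)) {
    idx <- which(blk == draw[i])
    # translate the events of subblock draw[i] onto the slot of subblock i, as in (10)
    dx <- ((i - 1) %/% nb[2] - (draw[i] - 1) %/% nb[2]) / nb[1]
    dy <- ((i - 1) %% nb[2] - (draw[i] - 1) %% nb[2]) / nb[2]
    S <- S + sum(Z(xs[idx] + dx, ys[idx] + dy))
  }
  S
}
# Repeating this B = 999 times for each of K = 100 thinned patterns and taking the
# sample variance of the replicates yields the estimator in (11).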

Table 4. Results for the BPL data

             Plug-in method                 Thinned block bootstrap
      EST    STD     95% CI                 STD      95% CI
β1    .02    .02     (-.02, .06)            .017     (-.014, .054)
β2    5.84   2.53    (.89, 10.80)           2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.

From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007) by using less restrictive assumptions.

APPENDIX A: PROOF OF THEOREM 1

Let $\beta_0$ represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. In the case of $p > 1$, the proof can be easily generalized by applying the Cramér–Wold device.
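For completeness, recall the Cramér–Wold device: convergence in distribution of a random vector is equivalent to convergence in distribution of every fixed linear combination of its components,

$$\mathbf{X}_n \stackrel{d}{\rightarrow} \mathbf{X} \quad\Longleftrightarrow\quad \mathbf{t}'\mathbf{X}_n \stackrel{d}{\rightarrow} \mathbf{t}'\mathbf{X} \quad \text{for all fixed } \mathbf{t}\in\mathbb{R}^p,$$

so establishing the scalar case suffices.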

Lemma A.1. Assume that conditions (4)–(6) hold. Then $|D_n|^2 E[U_n^{(1)}(\beta_0)]^4 < C$ for some $C < \infty$.

Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that $\lambda^{(1)}(s,\beta_0)/\lambda(s,\beta_0)$ is bounded due to conditions (4) and (5), we can derive that $|D_n|^4 E[U_n^{(1)}(\beta_0)]^4$ is bounded by the following (ignoring some multiplicative constants):

$$\int\!\!\int\!\!\int\!\!\int |Q_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \left[\int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2\right]^2 + \int\!\!\int\!\!\int |Q_3(s_1,s_2,s_3)|\,ds_1\,ds_2\,ds_3 + |D_n|\int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2 + \int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2,$$

where all of the integrals are over $D_n$. It then follows from condition (6) that the foregoing sum is of order $|D_n|^2$. Thus the lemma is proven.

Proof of Theorem 1

First note that

$$U_n^{(1)}(\beta) = \frac{1}{|D_n|}\left[\sum_{x\in N\cap D_n}\frac{\lambda^{(1)}(x,\beta)}{\lambda(x,\beta)} - \int_{D_n}\lambda^{(1)}(s,\beta)\,ds\right]$$

and

$$U_n^{(2)}(\beta) = \frac{1}{|D_n|}\left[\sum_{x\in N\cap D_n}\frac{\lambda^{(2)}(x,\beta)}{\lambda(x,\beta)} - \int_{D_n}\lambda^{(2)}(s,\beta)\,ds\right] - \frac{1}{|D_n|}\sum_{x\in N\cap D_n}\left[\frac{\lambda^{(1)}(x,\beta)}{\lambda(x,\beta)}\right]^2 = U_{na}^{(2)}(\beta) - U_{nb}^{(2)}(\beta).$$

Using the Taylor expansion, we can obtain

$$|D_n|^{1/2}(\hat\beta_n - \beta_0) = -\left[U_n^{(2)}(\beta_n^*)\right]^{-1}|D_n|^{1/2}U_n^{(1)}(\beta_0),$$

where $\beta_n^*$ is between $\hat\beta_n$ and $\beta_0$. We need to show that

$$U_{na}^{(2)}(\beta_0) \stackrel{p}{\rightarrow} 0$$

and

$$U_{nb}^{(2)}(\beta_0) \stackrel{p}{\rightarrow} \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s,\beta_0)]^2}{\lambda(s,\beta_0)}\,ds.$$


To show the first expression, note that $E[U_{na}^{(2)}(\beta_0)] = 0$. Thus we only need look at the variance term:

$$\mathrm{var}\big[U_{na}^{(2)}(\beta_0)\big] = \frac{1}{|D_n|^2}\int\!\!\int \frac{\lambda^{(2)}(s_1,\beta_0)\lambda^{(2)}(s_2,\beta_0)}{\lambda(s_1,\beta_0)\lambda(s_2,\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int \frac{[\lambda^{(2)}(s,\beta_0)]^2}{\lambda(s,\beta_0)}\,ds,$$

which converges to 0 due to conditions (4)–(6). Thus $U_{na}^{(2)}(\beta_0) \stackrel{p}{\rightarrow} 0$.

To show the second expression, note that $E[U_{nb}^{(2)}(\beta_0)] = \frac{1}{|D_n|}\int [\lambda^{(1)}(s,\beta_0)]^2/\lambda(s,\beta_0)\,ds$. Thus again we need consider only the variance term:

$$\mathrm{var}\big[U_{nb}^{(2)}(\beta_0)\big] = \frac{1}{|D_n|^2}\int\!\!\int \frac{[\lambda^{(1)}(s_1,\beta_0)\lambda^{(1)}(s_2,\beta_0)]^2}{\lambda(s_1,\beta_0)\lambda(s_2,\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int \frac{[\lambda^{(1)}(s,\beta_0)]^4}{\lambda(s,\beta_0)}\,ds,$$

which converges to 0 due to conditions (4)–(6). Thus $U_{nb}^{(2)}(\beta_0) \stackrel{p}{\rightarrow} \frac{1}{|D_n|}\int [\lambda^{(1)}(s,\beta_0)]^2/\lambda(s,\beta_0)\,ds$.

Now we want to show that $|D_n|^{1/2}U_n^{(1)}(\beta_0)$ converges to a normal distribution. First we derive the mean and variance of $|D_n|^{1/2}U_n^{(1)}(\beta_0)$. Clearly, $E[|D_n|^{1/2}U_n^{(1)}(\beta_0)] = 0$. For the variance,

$$\mathrm{var}\big[|D_n|^{1/2}U_n^{(1)}(\beta_0)\big] = \frac{1}{|D_n|}\int\!\!\int \frac{\lambda^{(1)}(s_1,\beta_0)\lambda^{(1)}(s_2,\beta_0)}{\lambda(s_1,\beta_0)\lambda(s_2,\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s,\beta_0)]^2}{\lambda(s,\beta_0)}\,ds.$$

Now consider a new sequence of regions $D_n^*$, where $D_n^* \subset D_n$ and $|D_n^*|/|D_n| \rightarrow 1$ as $n \rightarrow \infty$. Let $U_{n*}^{(1)}(\beta_0)$ be the estimating function for the realization of the point process on $D_n^*$. Based on the expression of the variance, we can deduce that

$$\mathrm{var}\big[|D_n|^{1/2}U_n^{(1)}(\beta_0) - |D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)\big] \rightarrow 0 \quad \text{as } n \rightarrow \infty.$$

Thus

$$|D_n|^{1/2}U_n^{(1)}(\beta_0) \sim |D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0),$$

where the notation $a_n \sim b_n$ means that $a_n$ and $b_n$ have the same limiting distribution. Thus we need only show that $|D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution for some properly defined $D_n^*$.

To obtain a sequence of $D_n^*$, we apply the blocking technique used by Guan et al. (2004). Let $l(n) = n^{\alpha}$ and $m(n) = n^{\alpha} - n^{\eta}$ for $4/(2+\epsilon) < \eta < \alpha < 1$, where $\epsilon$ is defined as in condition (3). We first divide the original domain $D_n$ into some nonoverlapping $l(n)\times l(n)$ subsquares $D_{l(n)}^i$, $i = 1,\ldots,k_n$. Within each subsquare, we further obtain $D_{m(n)}^i$, an $m(n)\times m(n)$ square sharing the same center with $D_{l(n)}^i$. Note that $d(D_{m(n)}^i, D_{m(n)}^j) \geq n^{\eta}$ for $i \neq j$. Now define

$$D_n^* = \bigcup_{i=1}^{k_n} D_{m(n)}^i.$$

Condition (2) implies that $|D_n^*|/|D_n| \rightarrow 1$. Thus we need only show that $|D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and the application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.

APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let $\varphi(u) = g(u) - 1$, $\varphi_k(s_1,\ldots,s_k) = Q_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$, and $g_k(s_1,\ldots,s_k) = \lambda_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ for $k = 3$ and 4, and let $Z(s) = \lambda^{(1)}(s)$. As in the proof of Theorem 1, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. As a result, $Z(s)$ is a scalar, so that $\mathrm{cov}(S_n) = \mathrm{var}(S_n)$. Furthermore, we consider only the case where $K = 1$. Thus the covariance estimator defined in (11) becomes

$$\widehat{\mathrm{var}}(S_n) = \sum_{b=1}^{B}\frac{(S_n^b - \bar S_n)^2}{B - 1}.$$

As $B$ goes to infinity, we see that the foregoing converges to

$$\mathrm{var}(S_n^b \mid \Psi\cap D_n) = \frac{1}{k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{D_{l(n)}^{j}}\sum_{D_{l(n)}^{j}} Z(x_1 - c_j + c_i)Z(x_2 - c_j + c_i) - \frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}} Z(x_1 - c_{j_1} + c_i)Z(x_2 - c_{j_2} + c_i),$$

where $\Psi$ denotes the thinned process defined in (7) and the notation $\sum_D$ is short for $\sum_{D\cap\Psi}$. We wish to show that $\mathrm{var}(S_n^b \mid \Psi\cap D_n)/|D_n|$ converges in $L_2$ to

$$\frac{1}{|D_n|}\mathrm{var}(S_n) = \frac{(\lambda_n)^2}{|D_n|}\int_{D_n}\int_{D_n} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2 + \frac{\lambda_n}{|D_n|}\int_{D_n}[Z(s)]^2\,ds.$$

To show this, we need to show that

$$\frac{1}{|D_n|}E[\mathrm{var}(S_n^b \mid \Psi\cap D_n)] \rightarrow \frac{1}{|D_n|}\mathrm{var}(S_n) \qquad \mathrm{(B.1)}$$

and

$$\frac{1}{|D_n|^2}\mathrm{var}[\mathrm{var}(S_n^b \mid \Psi\cap D_n)] \rightarrow 0. \qquad \mathrm{(B.2)}$$

To show (B.1), note that

$$E[\mathrm{var}(S_n^b \mid \Psi\cap D_n)] = (\lambda_n)^2\sum_{i=1}^{k_n}\int\!\!\int_{D_{l(n)}^i} Z(s_1)Z(s_2)g(s_1 - s_2)\,ds_1\,ds_2 + \frac{\lambda_n(k_n - 1)}{k_n}\int_{D_n}[Z(s)]^2\,ds - \frac{(\lambda_n)^2}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int\!\!\int_{D_{l(n)}^i} Z(s_1)Z(s_2)g(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2.$$

Thus

$$\frac{1}{|D_n|}\big\{E[\mathrm{var}(S_n^b \mid \Psi\cap D_n)] - \mathrm{var}(S_n)\big\} = -\frac{\lambda_n}{k_n|D_n|}\int_{D_n}[Z(s)]^2\,ds - \frac{(\lambda_n)^2}{|D_n|}\mathop{\sum\sum}_{i\neq j}\int_{D_{l(n)}^i}\int_{D_{l(n)}^j} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2 - \frac{(\lambda_n)^2}{|D_n|}\frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int\!\!\int_{D_{l(n)}^i} Z(s_1)Z(s_2)\varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2 = -T_n^1 - T_n^2 - T_n^3.$$

$T_n^1$ goes to 0 due to conditions (2) and (5). For $T_n^2$ and $T_n^3$, lengthy yet elementary derivations yield

$$|T_n^2| \leq \frac{C(\lambda_n)^2}{k_n}\sum_{i=1}^{k_n}\left[\frac{1}{|D_n|}\int_{D_n - D_n}|D_n\cap(D_n - s)|\,|\varphi(s)|\,ds - \frac{1}{|D_{l(n)}^i|}\int_{D_{l(n)}^i - D_{l(n)}^i}\big|D_{l(n)}^i\cap(D_{l(n)}^i - s)\big|\,|\varphi(s)|\,ds\right]$$

and

$$T_n^3 \leq \frac{C(\lambda_n)^2}{k_n}\frac{1}{|D_n|}\int_{D_n}\int_{D_n}|\varphi(s_1 - s_2)|\,ds_1\,ds_2.$$

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that $T_n^1$ and $T_n^3$ are both of order $1/k_n$, whereas $T_n^2$ is of order $1/|D_{l(n)}|^{1/2}$ due to condition (12).

To prove (B.2), we first note that $[\mathrm{var}(S_n^b \mid \Psi\cap D_n)]^2$ is equal to the sum of the following three terms:

$$\frac{1}{k_n^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}}\sum_{D_{l(n)}^{j_2}} \big[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_1} + c_{i_1})Z(x_3 - c_{j_2} + c_{i_2})Z(x_4 - c_{j_2} + c_{i_2})\big],$$

$$-\frac{2}{k_n^3}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}}\sum_{D_{l(n)}^{j_3}} \big[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_1} + c_{i_1})Z(x_3 - c_{j_2} + c_{i_2})Z(x_4 - c_{j_3} + c_{i_2})\big],$$

and

$$\frac{1}{k_n^4}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}}\sum_{D_{l(n)}^{j_3}}\sum_{D_{l(n)}^{j_4}} \big[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_2} + c_{i_1})Z(x_3 - c_{j_3} + c_{i_2})Z(x_4 - c_{j_4} + c_{i_2})\big].$$

Furthermore, the following relationships are possible for $x_1,\ldots,x_4$:

a. $x_1 = x_2 = x_3 = x_4$;
b. $x_1 = x_2$ and $x_3 = x_4$ but $x_1 \neq x_3$, or $x_1 = x_3$ and $x_2 = x_4$ but $x_1 \neq x_2$;
c. $x_1 = x_2$ but $x_2 \neq x_3 \neq x_4$, or $x_3 = x_4$ but $x_1 \neq x_2 \neq x_3$;
d. $x_1 = x_2 = x_3$ but $x_3 \neq x_4$, or $x_1 \neq x_2 = x_3$ but $x_1 = x_4$, or $x_2 = x_3 = x_4$ but $x_1 \neq x_2$;
e. $x_1 \neq x_2 \neq x_3 \neq x_4$.

When calculating the variance of $\frac{1}{|D_n|}\mathrm{var}(S_n^b \mid \Psi\cap D_n)$, the foregoing relationships, combined with the expressions for the squared value of $\frac{1}{|D_n|}E[\mathrm{var}(S_n^b \mid \Psi\cap D_n)]$, in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than $1/k_n$ in a similar manner.

Let $F_1(s_1,s_2,s_3,s_4) = g_4(s_1,s_2,s_3,s_4) - g(s_1,s_2)g(s_3,s_4)$. Term e leads to the following integral:

$$\frac{(\lambda_n)^4}{k_n^2|D_n|^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\int_{D_{l(n)}^{j_1}} Z(s_1 - c_{j_1} + c_{i_1})\times\Bigg[\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}} Z(s_2 - c_{j_1} + c_{i_1})Z(s_3 - c_{j_2} + c_{i_2})Z(s_4 - c_{j_2} + c_{i_2})F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4 - \frac{2}{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_3}} Z(s_2 - c_{j_1} + c_{i_1})Z(s_3 - c_{j_2} + c_{i_2})Z(s_4 - c_{j_3} + c_{i_2})F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4 + \frac{1}{k_n^2}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_3}}\int_{D_{l(n)}^{j_4}} Z(s_2 - c_{j_2} + c_{i_1})Z(s_3 - c_{j_3} + c_{i_2})Z(s_4 - c_{j_4} + c_{i_2})F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4\Bigg]\,ds_1.$$

Let $F_2(s_1,s_2,s_3,s_4) = \varphi_4(s_1,s_2,s_3,s_4) + 2\varphi_3(s_1,s_2,s_3) + 2\varphi_3(s_1,s_3,s_4) + 2\varphi(s_1 - s_3)\varphi(s_2 - s_4)$. By noting that $Z(\cdot)$ is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

$$\frac{2C(\lambda_n)^4}{k_n|D_n|^2}\sum_{j_1=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_n}\int_{D_n}|F_2(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}}|F_1(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4.$$

The first integral in the foregoing is of order $1/k_n$ due to conditions (4), (5), and (6). The second integral is bounded by

$$\frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}}|\varphi_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{4C(\lambda_n)^4}{k_n|D_n|}\sum_{j_2=1}^{k_n}\int_{D_n}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}}|\varphi_3(s_1,s_3,s_4)|\,ds_1\,ds_3\,ds_4 + \frac{2C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\left[\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}|\varphi(s_1 - s_3)|\,ds_1\,ds_3\right]^2.$$

The foregoing is of order $1/k_n + a_n$ due to conditions (4), (5), (6), and (12), where the order of $a_n$ is no higher than $1/|D_{l(n)}|$.

APPENDIX C: JUSTIFICATION FOR USING $\hat\beta_n$ IN THE THINNING STEP

Here we assume $|D_{l(n)}| = o(|D_n|^{1/2})$. Let $p_n(x,\theta) = \min_{s\in D_n}\lambda(s,\theta)/\lambda(x,\theta)$, and let $r(x)$ be a uniform random variable in $[0,1]$. If $r(x) \leq p_n(x,\theta)$, where $x \in (N\cap D_n)$, then $x$ will be retained in the thinned process, say $\Psi(\theta)$. Based on the same set $\{r(x): x \in (N\cap D_n)\}$, we can determine $\Psi(\theta_0)$ and $\Psi(\hat\theta_n)$. Note that $\Psi(\theta_0)$ is as defined in (7) using the true FOIF. Define $a = \{x: x \in \Psi(\theta_0), x \notin \Psi(\hat\theta_n)\}$ and $b = \{x: x \notin \Psi(\theta_0), x \in \Psi(\hat\theta_n)\}$. Note that $P(x \in a\cup b \mid x \in N) \leq |p_n(x,\hat\theta_n) - p_n(x,\theta_0)|$. We need to show that

$$\frac{1}{|D_n|}\big\{\mathrm{var}[S_n^b \mid \Psi(\theta_0)\cap D_n] - \mathrm{var}[S_n^b \mid \Psi(\hat\theta_n)\cap D_n]\big\} \stackrel{p}{\rightarrow} 0,$$

where $\mathrm{var}[S_n^b \mid \Psi(\theta_0)\cap D_n]$ is $\mathrm{var}(S_n^b \mid \Psi\cap D_n)$ as defined in Appendix B; $\mathrm{var}[S_n^b \mid \Psi(\hat\theta_n)\cap D_n]$ is defined analogously.

Let $\sum_D^*$ and $\mathop{\sum\sum}_D^{\neq}$ denote $\sum_{D\cap(a\cup b)}$ and $\sum\sum_{x_1\in D\cap N,\,x_2\in D\cap(a\cup b),\,x_1\neq x_2}$. Note that

$$\frac{1}{|D_n|}\big|\mathrm{var}[S_n^b \mid \Psi(\theta_0)\cap D_n] - \mathrm{var}[S_n^b \mid \Psi(\hat\theta_n)\cap D_n]\big| < \frac{C}{|D_n|k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{D_{l(n)}^{j}\cap N}\sum_{D_{l(n)}^{j}}^{*} 1 + \frac{C}{|D_n|k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{D_{l(n)}^{j_1}\cap N}\sum_{D_{l(n)}^{j_2}}^{*} 1 = \frac{C}{|D_n|}\sum_{j=1}^{k_n}\mathop{\sum\sum}_{D_{l(n)}^{j}}^{\neq} 1 + \frac{C}{|D_n|}\sum_{D_n}^{*} 1 + \frac{C}{|D_n|k_n}\Bigg[\sum_{D_n}^{*} 1\Bigg]\Bigg[\sum_{x\in D_n\cap N} 1\Bigg].$$

Note that $P[x \in (a\cup b) \mid x \in N] \leq C/|D_n|^{\alpha/2}$ with probability 1 for $0 < \alpha < 1$, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.

[Received December 2006. Revised May 2007.]

REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.

Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.

Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.

Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.

Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.

Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.

Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.

Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.

Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.

Guan, Y., Sherman, M., and Calvin, J. A. (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.

Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.

Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.

Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.

McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.

Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.

Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.

Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.

Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.

Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.

Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.

Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.

Sherman, M. (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.

Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.

Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 63, 252–258.


To obtain a sequence of Dlowastn we apply the blocking technique used

by Guan et al (2004) Let l(n) = nα m(n) = nα minus nη for 4(2 + ε) lt

η lt α lt 1 where ε is defined as in condition (3) We first divide theoriginal domain Dn into some nonoverlapping l(n)times l(n) subsquaresDi

l(n) i = 1 kn Within each subsquare we further obtain Di

m(n)

an m(n) times m(n) square sharing the same center with Dil(n)

Note that

d(Dim(n)

Djm(n)

) ge nη for i = j Now define

Dlowastn =

kn⋃

i=1

Dim(n)

Condition (2) implies that |Dlowastn||Dn| rarr 1 Thus we need only show

that |Dlowastn|12U

(1)nlowast(β0) converges to a normal distribution This is true

due to the mixing condition the result in Lemma A1 and the appli-cation of the Lyapunovrsquos central limit theorem The proof is similar tothat of Guan et al (2004) thus we omit the details here

APPENDIX B PROOF OF THEOREM 2

To simplify notation let ϕ(u) = g(u) minus 1 ϕk(s1 sk) = Qk(s1

sk)[λ(s1) middot middot middotλ(sk)] gk(s1 sk) = λk(s1 sk)[λ(s1) middot middot middotλ(sk)] for k = 3 and 4 and Z(s) = λ(1)(s) As in the proof of The-orem 1 we assume that β and β0 are both scalars that is p = 1 Asa result Z(s) is a scalar so that cov(Sn) = var(Sn) Furthermore weconsider only the case where K = 1 Thus the covariance estimatordefined in (11) becomes

var(Sn) =Bsum

b=1

(Sbn minus Sn)2

B minus 1

As B goes to infinity we see that the foregoing converges to

var(Sbn | cap Dn)

= 1

kn

knsum

i=1

knsum

j=1

sum

Dj

l(n)

sum

Dj

l(n)

Z(x1 minus cj + ci )Z(x2 minus cj + ci )

minus 1

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

sum

Dj2l(n)

Z(x1 minus cj1 + ci

)

times Z(x2 minus cj2 + ci

)

where the notationsum

D is short forsum

Dcap We wish to show thatvar(Sb

n | cap Dn)|Dn| converges in L2 to

1

|Dn| var(Sn) = (λn)2

|Dn|int

Dn

int

Dn

Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2

+ λn

|Dn|int

Dn

[Z(s)]2 ds

To show this we need to show that

1

|Dn|E[var(Sbn | cap Dn)] rarr 1

|Dn| var(Sn) (B1)

and

1

|Dn|2 var[var(Sbn | cap Dn)] rarr 0 (B2)

To show (B1) note that

E[var(Sbn | cap Dn)]

= (λn)2knsum

i=1

int int

Dil(n)

Z(s1)Z(s2)g(s1 minus s2) ds1 ds2

+ λn(kn minus 1)

kn

int

Dn

[Z(s)]2 ds

minus (λn)2

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

int int

Dil(n)

Z(s1)Z(s2)

times g(s1 minus s2 + cj1 minus cj2

)ds1 ds2

Thus

1

|Dn|E[var(Sb

n | cap Dn)] minus var(Sn)

= minus λn

kn|Dn|int

Dn

[Z(s)]2 ds

minus (λn)2

|Dn|sumsum

i =j

int

Dil(n)

int

Dj

l(n)

Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2

Guan and Loh Block Bootstrap Variance Estimation 1385

minus (λn)2

|Dn|1

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

int int

Dil(n)

Z(s1)Z(s2)

times ϕ(s1 minus s2 + cj1 minus cj2

)ds1 ds2

= minusT 1n minus T 2

n minus T 3n

T 1n goes to 0 due to conditions (2) and (5) For T 2

n and T 3n lengthy yet

elementary derivations yield

|T 2n | le C(λn)2

|kn|knsum

i=1

[1

|Dn|int

DnminusDn

|Dn cap (Dn minus s)||ϕ(s)|ds

minus 1

|Dil(n)

|int

Dil(n)

minusDil(n)

∣∣Dil(n) cap (

Dil(n) minus s

)∣∣|ϕ(s)|ds]

and

T 3n le C(λn)2

kn

1

|Dn|int

Dn

int

Dn

|ϕ(s1 minus s2)|ds1 ds2

Both terms converge to 0 due to conditions (2) (4) (5) and (6) Thus(B1) is proven Furthermore we can see that both T 1

n and T 3n are both

of order 1kn whereas T 2n is of order 1|Dl(n)|12 due to condition

(12)To prove (B2) we first note that [var(Sb

n | cap Dn)]2 is equal to thesum of the following three terms

1

k2n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj2l(n)

[Z

(x1 minus cj1 + ci1

)

times Z(x2 minus cj1 + ci1

)Z

(x3 minus cj2 + ci2

)Z

(x4 minus cj2 + ci2

)]

minus 2

k3n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

knsum

j3=1

sum

Dj1l(n)

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj3l(n)

[Z

(x1 minus cj1

+ ci1

)Z

(x2 minus cj1 + ci1

)Z

(x3 minus cj2 + ci2

)

times Z(x4 minus cj3 + ci2

)]

and

1

k4n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

knsum

j3=1

knsum

j4=1

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj3l(n)

sum

Dj4l(n)

[Z

(x1 minus cj1

+ ci1

)Z

(x2 minus cj2 + ci1

)Z

(x3 minus cj3 + ci2

)Z

(x4 minus cj4 + ci2

)]

Furthermore the following relationships are possible for x1 x4

a x1 = x2 = x3 = x4b x1 = x2 and x3 = x4 but x1 = x3 or x1 = x3 and x2 = x4 but

x1 = x2c x1 = x2 but x2 = x3 = x4 or x3 = x4 but x1 = x2 = x3d x1 = x2 = x3 but x3 = x4 or x1 = x2 = x3 but x1 = x4 or x2 =

x3 = x4 but x1 = x2e x1 = x2 = x3 = x4

When calculating the variance of 1|Dn| var(Sb

n | cap Dn) the forego-ing relationships combined with the expressions for the squared valueof 1

|Dn|E[var(Sbn | cap Dn)] in turn lead to integrals of different com-

plexity To save space we present only the integral from term e theremaining integrals can be shown to have order no higher than 1kn ina similar manner

Let F1(s1 s2 s3 s4) = g4(s1 s2 s3 s4) minus g(s1 s2)g(s3 s4)Term e leads to the following integral

(λn)4

k2n|Dn|2

knsum

i1=1

knsum

i2=1

knsum

j1=1

int

Dj1l(n)

Z(s1 minus cj1 + ci1

)

times[ knsum

j2=1

int

Dj1l(n)

int

Dj2l(n)

int

Dj2l(n)

Z(s2 minus cj1 + ci1

)Z

(s3 minus cj2 + ci2

)

times Z(s4 minus cj2 + ci2

)F1(s1 s2 s3 s4) ds2 ds3 ds4

minus 2

kn

knsum

j2=1

knsum

j3=1

int

Dj1l(n)

int

Dj2l(n)

int

Dj3l(n)

Z(s2 minus cj1 + ci1

)

times Z(s3 minus cj2 + ci2

)Z

(s4 minus cj3 + ci2

)

times F1(s1 s2 s3 s4) ds2 ds3 ds4

+ 1

k2n

knsum

j2=1

knsum

j3=1

knsum

j4=1

int

Dj2l(n)

int

Dj3l(n)

int

Dj4l(n)

Z(s2 minus cj2 + ci1

)

times Z(s3 minus cj3 + ci2

)Z

(s4 minus cj4 + ci2

)

times F1(s1 s2 s3 s4) ds2 ds3 ds4

]ds1

Let F2(s1 s2 s3 s4) = ϕ4(s1 s2 s3 s4) + 2ϕ3(s1 s2 s3) + 2ϕ3(s1

s3 s4) + 2ϕ(s1 minus s3)ϕ(s2 minus s4) By noting that Z(middot) is bounded wecan derive from lengthy yet elementary algebra that the foregoing isbounded by

2C(λn)4

kn|Dn|2knsum

j1=1

int

Dj1l(n)

int

Dj1l(n)

int

Dn

int

Dn

|F2(s1 s2

s3 s4)|ds1 ds2 ds3 ds4

+ C(λn)4

|Dn|2knsum

j1=1

knsum

j2=1

int

Dj1l(n)

int

Dj1l(n)

int

Dj2l(n)

int

Dj2l(n)

|F1(s1 s2

s3 s4)|ds1 ds2 ds3 ds4

The first integral in the foregoing is of order 1kn due to conditions(4) (5) and (6) The second integral is bounded by

C(λn)4

|Dn|2knsum

j1=1

knsum

j2=1

int

Dj1l(n)

int

Dj1l(n)

int

Dj2l(n)

int

Dj2l(n)

|ϕ4(s1 s2

s3 s4)|ds1 ds2 ds3 ds4

+ 4C(λn)4

kn|Dn|knsum

j2=1

int

Dn

int

Dj2l(n)

int

Dj2l(n)

|ϕ3(s1 s3 s4)|ds1 ds3 ds4

+ 2C(λn)4

|Dn|2knsum

j1=1

knsum

j2=1

[int

Dj1l(n)

int

Dj2l(n)

|ϕ(s1 s3)|ds1 ds3

]2

The foregoing is of order 1kn + an due to conditions (4) (5) (6) and(12) where the order of an is no higher than 1|Dl(n)|

APPENDIX C JUSTIFICATION FOR USING βnIN THE THINNING STEP

Here we assume |Dl(n)| = o(|Dn|12) Let pn(x θ) = minsisinDnλ(s

θ)λ(x θ) and let r(x) be a uniform random variable in [01] Ifr(x) le pn(x θ) where x isin (N cap Dn) then x will be retained in thethinned process say (θ) Based on the same set r(x) x isin (N cap

1386 Journal of the American Statistical Association December 2007

Dn) we can determine (θ0) and (θn) Note that (θ0) is as de-fined in (7) using the true FOIF Define a = x isin (θ0)x isin (θn)and b = x isin (θ0)x isin (θn) Note that P(x isin a cupb|x isin N) le|pn(x θn) minus pn(x θ0)| We need to show that

1

|Dn|var[Sb

n |(θ0) cap Dn] minus var[Sbn |(θn) cap Dn] prarr 0

where var[Sbn |(θ0) cap Dn] is var(Sb

n | cap N) defined in Appendix Bvar[Sb

n |(θn) cap Dn] is defined analogously

Letsum

D andsumsum =

Ddenote

sum

Dcap(acupb)

andsumsum

x1isinDcapNx2isinDcap(acupb)x1 =x2

Note that

1

|Dn|∣∣var[Sb

n |(θ0) cap Dn] minus var[Sbn |(θn) cap Dn]∣∣

ltC

|Dn|kn

knsum

i=1

knsum

j=1

sum

Dj

l(n)capN

sum

Dj

l(n)

1

+ C

|Dn|k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

capN

sum

Dj2l(n)

1

= C

|Dn|knsum

j=1

=sumsum

Dj

l(n)

1

+ C

|Dn|sum

Dn

1 + C

|Dn|kn

[sum

Dn

1

][ sum

xisinDncapN

1

]

Note that P [x isin (a cup b)|x isin N ] le C|Dn|α2 with probability 1 for

0 lt α lt 1 due to Theorem 1 Thus all three terms in the foregoingconverge to 0 in probability This in turn leads to the consistency ofthe variance estimator

[Received December 2006 Revised May 2007]

REFERENCES

Baddeley A J and Turner R (2005) ldquoSpatstat An R Package for AnalyzingSpatial Point Patternsrdquo Journal of Statistical Software 12 1ndash42

Baddeley A J Moslashller J and Waagepetersen R (2000) ldquoNon- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patternsrdquo Sta-tistica Neerlandica 54 329ndash350

Brillinger D R (1975) Time Series Data Analysis and Theory New YorkHolt Rinehart amp Winston

Cressie N A C (1993) Statistics for Spatial Data New York WileyDaley D J and Vere-Jones D (1988) An Introduction to the Theory of Point

Processes New York Springer-VerlagDiggle P J (2003) Statistical Analysis of Spatial Point Patterns New York

Oxford University Press IncDoukhan P (1994) Mixing Properties and Examples New York Springer-

VerlagGuan Y (2007) ldquoVariance Estimation for Statistics Computed From Inhomo-

geneous Spatial Point Processesrdquo Journal of the Royal Statistical SocietySer B to appear

Guan Y Sherman M and Calvin J A (2004) ldquoA Nonparametric Test forSpatial Isotropy Using Subsamplingrdquo Journal of the American Statistical As-sociation 99 810ndash821

(2006) ldquoAssessing Isotropy for Spatial Point Processesrdquo Biometrics62 119ndash125

Hall P and Jing B (1996) ldquoOn Sample Reuse Methods for Dependent DatardquoJournal of the Royal Statistical Society Ser B 58 727ndash737

Heinrich L (1985) ldquoNormal Convergence of Multidimensional Shot Noise andRates of This Convergencerdquo Advances in Applied Probability 17 709ndash730

Lahiri S N (2003) Resampling Methods for Dependent Data New YorkSpringer-Verlag

McElroy T and Politis D N (2007) ldquoStable Marked Point Processesrdquo TheAnnals of Statistics 35 393ndash419

Moslashller J Syversveen A R and Waagepetersen R P (1998) ldquoLog GaussianCox Processesrdquo Scandinavian Journal of Statistics 25 451ndash482

Politis D N (1999) Subsampling New York Springer-VerlagPolitis D N and Sherman M (2001) ldquoMoment Estimation for Statistics

From Marked Point Processesrdquo Journal of the Royal Statistical SocietySer B 63 261ndash275

Rathbun S L and Cressie N (1994) ldquoAsymptotic Properties of Estimatorsfor the Parameters of Spatial Inhomogeneous Poisson Point Processesrdquo Ad-vances in Applied Probability 26 122ndash154

Rosenblatt M (1956) ldquoA Central Limit Theorem and a Strong Mixing Condi-tionrdquo Proceedings of the National Academy of Sciences 42 43ndash47

Schoenberg F P (2004) ldquoConsistent Parametric Estimation of the Intensityof a SpatialndashTemporal Point Processrdquo Journal of Statistical Planning andInference 128 79ndash93

Sherman M (1996) ldquoVariance Estimation for Statistics Computed FromSpatial Lattice Datardquo Journal of the Royal Statistical Society Ser B 58509ndash523

(1997) ldquoSubseries Methods in Regressionrdquo Journal of the AmericanStatistical Association 92 1041ndash1048

Stoyan D and Stoyan H (1994) Fractals Random Shapes and Point FieldsNew York Wiley

Waagepetersen R P (2007) ldquoAn Estimating Function Approach to Inferencefor Inhomogeneous NeymanndashScott Processesrdquo Biometrics 62 252ndash258

Page 5: A Thinned Block Bootstrap Variance Estimation Procedure ...meng/Papers/Jasa.Guan07.pdf · Yongtao G UAN and Ji Meng L OH ... Estimates for the regression coefÞcients, ... block bootstrap

Guan and Loh Block Bootstrap Variance Estimation 1381

4 SIMULATIONS

41 Log-Gaussian Cox Process

We performed a simulation study to test the results of ourtheoretical findings On an observation region D of size L timesL where L = 12 and 4 we simulated 1000 realizations ofan inhomogeneous Poisson process with intensity function λ(s)given by

logλ(s) = α + βX(s) + G(s) (13)

where G is a mean-0 Gaussian random field This is the LGCPmodel with inhomogeneity due to the covariate X

For values of X we used a single realization from a dif-ferent mean-0 Gaussian random field We kept this realizationfixed throughout the whole simulation study that is we usedthe same X for each of the 1000 point process realizationsFigure 1 shows a surface plot of X for L = 4 The values ofX for L = 1 and 2 are the lower-left corner of those shownin the figure For G we used the exponential model C(h) =σ 2 exp(minushρ) for its covariance function where ρ = 1 and 2and σ = 1 We set α and β to be 702 and 2 We generated thepoint process patterns using the rpoispp function in the spat-stat R package (Baddeley and Turner 2005) The average num-ber of points simulated were about 1200 4500 and 16700 forL = 12 and 4

For each of the 1000 point patterns we obtained the esti-mates α and β by maximizing the Poisson likelihood (usingthe ppm function in spatstat) Next we thinned the point pat-tern according to the criterion described in Section 3 resultingin 20 independent thinned processes for each realization Theaverage number of points in the thinned patterns was roughly490 1590 and 5400 for L = 12 and 4 so about 30ndash40

Figure 1 Image of the covariate X used in the simulation Therange of X is between minus6066 and 5303 where darker colors rep-resent larger values of X

of the points were retained We then applied the bootstrap pro-cedure to each of the thinned processes to get the variance esti-mates for α and β From (13) we found that the quantities to becomputed with each bootstrap sample were

sumλ(slowast) for α andsum

X(slowast)λ(slowast) for β where slowast represents the locations of thepoints in a bootstrap sample We used nonoverlapping squareblocks in the bootstrap procedure For L = 1 we used blockswith side lengths 14 13 and 12 whereas for L = 24 weused blocks with side lengths 14 13 12 and 1 The numberof bootstrap samples was 499

We computed the variances of α and β from the bootstrapvariances using Theorem 1 (9) and the relationship of cov(Sn)

to Bn We averaged the variance estimates across the inde-pendent thinnings Finally we constructed nominal 95 con-fidence intervals yielding a total of 1000 confidence intervalsfor α and β

Table 1 gives the empirical coverage of the confidence inter-vals for β The coverage for α (not shown) is similar but slightlylower For both values of ρ we found that the empirical cover-age increased toward the nominal level as the observation re-gion size increased This increase in coverage agrees with whatwe would expect from the theory The empirical coverage forρ = 1 is typically closer to the nominal level than for ρ = 2Note that ρ = 1 corresponds to a weaker dependence Thuswe would expect our method to work better if the dependencein the data were relatively weaker It is interesting to note thatfor L = 2 and 4 the best coverage for ρ = 2 was achieved byusing a block larger than its counterpart for ρ = 1 Thus forerrors with longer-range dependence it appears that the blockbootstrap procedure requires larger blocks to work well

Table 2 gives the bootstrap estimates of the standard errorsof β for ρ = 2 averaged over the 1000 realizations The stan-dard deviations of these estimates over the 1000 realizationsare included in brackets The ldquotruerdquo standard errors computedusing the estimated regression parameters from the independentrealizations are also given The table shows that the bootstrapestimates have a negative bias becoming proportionally closerto the ldquotruerdquo standard errors as the region size L increases

In summary we found that our method worked reasonablywell when the regression parameters were estimated We founda slight undercoverage in each instance but noted that this un-dercoverage becomes smaller as the sample size increases Wealso estimated the standard errors by using the true regressionparameters in our thinning step and found that the coverageincreased only slightly

Table 1 Empirical coverage of 95 nominal confidence intervalsof β in the LGCP case

Block size

Region size 14 13 12 1

ρ = 11 85 77 492 91 90 88 494 93 93 93 90

ρ = 21 84 76 502 87 88 87 504 88 89 90 89

1382 Journal of the American Statistical Association December 2007

Table 2 Estimates of the standard errors of β for ρ = 2 in the LGCP case

ρ = 2 Block size

Region size True SE 14 13 12 1

1 24 17(04) 16(05) 13(06)

2 12 10(02) 10(02) 10(03) 08(03)

4 07 056(003) 057(004) 060(006) 060(008)

42 Inhomogeneous NeymanndashScott Process

Theorem 1 gives the variance in terms of the FOIF and theSOCF Another approach for variance estimation is to use theplug-in method As point out by one referee we may sim-ply estimate the PCF or the K-function using the originaldata (Baddeley et al 2000) This will yield an estimate of theSOCF that in turn can be plugged into the expression for Bn inTheorem 1 to obtain an estimate for the asymptotic varianceWaagepetersen (2007) studied the performance of this methodin the INSP case

To compare the performance of our method with the plug-in method we also simulated data from the INSP model givenby Waagepetersen (2007) To do so we first simulated a ho-mogeneous Poisson process with intensity κ = 50 as the parentprocess For each parent we then generated a Poisson numberof offspring We defined the position of each offspring relativeto its parent by a radially symmetric Gaussian random variable(eg Diggle 2003) Let ω denote the standard deviation of thevariable We used ω = 02 04 representing relatively strongand weak clustering Finally we thinned the offspring as doneby Waagepetersen (2007) by setting the probability of retain-ing an offspring equal to the intensity at the offspring locationdivided by the maximum intensity in the study region For theintensity we used logλ(s) = α + βX(s) where α β and X(s)were as defined in the LGCP case

We simulated 1,000 realizations of the process on both a 1 × 1 square and a 2 × 2 square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of α and β, and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters κ and ω by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing

$$\int_0^a \left[\hat K(t)^c - K(t;\kappa,\omega)^c\right]^2 dt \tag{14}$$

with respect to (κ, ω) for a specified a and c, where $\hat K(t)$ and $K(t;\kappa,\omega)$ are the empirical and theoretical K functions. We used a = 4ω and c = .25, then used the estimated values of κ and ω to estimate the standard errors of α and β. To do this, we used the inhom.thomas.asympcov function in the InhomCluster R package, which is available on Waagepetersen's website at http://www.math.aau.dk/~rw/sppcode.
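
As a concrete illustration of the minimum contrast step in (14), the sketch below assumes an empirical K function evaluated on a grid of distances and uses the closed-form K function of a (modified) Thomas process, $K(t;\kappa,\omega) = \pi t^2 + [1 - \exp\{-t^2/(4\omega^2)\}]/\kappa$. The log-scale parameterization, the starting values, and the Nelder–Mead optimizer are our choices, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

def K_thomas(t, kappa, omega):
    # Closed-form K function of a (modified) Thomas process.
    return np.pi * t**2 + (1.0 - np.exp(-t**2 / (4.0 * omega**2))) / kappa

def min_contrast(t, K_hat, a, c=0.25):
    """Estimate (kappa, omega) by minimizing the contrast in (14),
    int_0^a [K_hat(t)^c - K(t; kappa, omega)^c]^2 dt,
    with the integral approximated by the trapezoidal rule."""
    keep = t <= a
    tt, Kh = t[keep], K_hat[keep]

    def objective(log_par):              # optimize on the log scale (> 0)
        kappa, omega = np.exp(log_par)
        diff = Kh**c - K_thomas(tt, kappa, omega)**c
        return np.trapz(diff**2, tt)

    res = minimize(objective, x0=np.log([50.0, 0.05]), method="Nelder-Mead")
    return np.exp(res.x)                 # (kappa_hat, omega_hat)
```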

Table 3 gives the empirical coverage of the confidence intervals for β. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for L = 2 and ω = .02, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for ω = .02 than for ω = .04, because the former yielded a process with a shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters a and c in (14). In the simulation, we also considered using the default value of a given by spatstat. The results, not shown here, suggested that the coverage was often worse than that achieved by our method.

5 AN APPLICATION

We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census, by using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), in which he considered the following FOIF model:

$$\lambda(s) = \exp[\beta_0 + \beta_1 E(s) + \beta_2 G(s)],$$

where E(s) and G(s) are the (estimated) elevation and gradient at location s. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1). Specifically, he obtained the estimates $\hat\beta_1 = .02$ and $\hat\beta_2 = 5.84$. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that β2 was significant but β1 was not.
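
For readers who want to reproduce such a fit, here is a rough sketch of maximizing criterion (1) for the log-linear model above, with the integral over the window approximated by a Riemann sum on the covariate grid. The grid-based approximation and the optimizer are our choices; spatstat's ppm function provides a production-quality alternative in R.

```python
import numpy as np
from scipy.optimize import minimize

def fit_loglinear_foif(z_points, z_grid, cell_area):
    """Maximize the Poisson likelihood criterion (1) for
    lambda(s) = exp{beta' z(s)}, with z(s) = (1, E(s), G(s)).
    z_points:  (n, p) covariate vectors at the n observed points.
    z_grid:    (m, p) covariate vectors at grid cells covering the window.
    cell_area: area of one grid cell (25 m^2 for a 5 x 5 m grid)."""
    def neg_criterion(beta):
        log_lam_points = z_points @ beta                   # sum of log-intensities
        integral = np.exp(z_grid @ beta).sum() * cell_area # integral approximation
        return -(log_lam_points.sum() - integral)          # negative of (1) x |D|

    res = minimize(neg_criterion, x0=np.zeros(z_points.shape[1]), method="BFGS")
    return res.x                                           # fitted (beta0, beta1, beta2)
```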

Table 3. Empirical coverage of 95% nominal confidence intervals of β obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                                 Block size
Region size   Plug-in      1/4      1/3      1/2

ω = .02
    1           .96        .91      .89      .81
    2           .96        .94      .94      .94

ω = .04
    1           .93        .87      .87      .77
    2           .95        .90      .90      .91


Figure 2. Locations of BPL trees.

We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering was due to seed dispersal alone, it probably was due to seed dispersal that had occurred in multiple generations. As a result, the validity of the model and the subsequent estimated standard errors and inference on the regression parameters required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.

To estimate the standard errors for β1 and β2, we applied (12) with K = 100 and B = 999, and used 200 × 100 m subblocks. We chose K = 100 because the trace plots for the estimated standard errors became fairly stable if 100 or more thinned realizations were used. We selected the subblock size using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for β1 and β2 using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method, but nevertheless lead to the same conclusion as the plug-in method. Specifically, they suggest that β2 is significant but β1 is not.
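
A schematic sketch of this computation, under our reading of the procedure: the fitted FOIF is used to thin the pattern toward homogeneity, the window is rebuilt by resampling nonoverlapping subblocks with replacement, and the empirical variance of the resampled statistic is pooled over K thinnings and B resamples. The score array and the pooling details below are simplifications, not the paper's exact formulas (11) and (12).

```python
import numpy as np

rng = np.random.default_rng(1)

def thinned_block_bootstrap_cov(pts, lam_hat, score, window, block, K=100, B=999):
    """Bootstrap covariance of S = sum of score(x) over the pattern.
    pts:     (n, 2) observed locations.
    lam_hat: (n,) fitted FOIF values at the points.
    score:   (n, p) score contributions at the points.
    window:  (W1, W2) side lengths; block: (b1, b2) subblock side lengths."""
    p_keep = lam_hat.min() / lam_hat                 # thinning probabilities
    nx, ny = int(window[0] // block[0]), int(window[1] // block[1])
    reps = []
    for _ in range(K):                               # K independent thinnings
        keep = rng.uniform(size=len(pts)) < p_keep
        x, s = pts[keep], score[keep]
        # Score totals within each of the nx * ny nonoverlapping subblocks.
        ix = np.minimum((x[:, 0] // block[0]).astype(int), nx - 1)
        iy = np.minimum((x[:, 1] // block[1]).astype(int), ny - 1)
        block_sums = np.zeros((nx * ny, s.shape[1]))
        np.add.at(block_sums, ix * ny + iy, s)
        for _ in range(B):                           # resample blocks w/ replacement
            pick = rng.integers(0, nx * ny, size=nx * ny)
            reps.append(block_sums[pick].sum(axis=0))
    reps = np.asarray(reps)
    return np.cov(reps, rowvar=False)                # covariance estimate of S

# var(beta_hat) then follows by plugging this estimate into the usual sandwich form.
```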

Table 4. Results for the BPL data

            Plug-in method               Thinned block bootstrap
        EST     STD     95% CI           STD      95% CI

β1      .02     .02     (−.02, .06)      .017     (−.014, .054)
β2     5.84    2.53     (.89, 10.80)     2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval, respectively.

From a biological standpoint, this means that the BPL trees prefer to live on slopes, but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007) by using less restrictive assumptions.

APPENDIX A: PROOF OF THEOREM 1

Let β0 represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that β and β0 are both scalars, that is, p = 1. In the case of p > 1, the proof can be easily generalized by applying the Cramér–Wold device.

Lemma A.1. Assume that conditions (4)–(6) hold. Then $|D_n|^2\,E[U_n^{(1)}(\beta_0)]^4 < C$ for some $C < \infty$.

Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that $\lambda^{(1)}(s;\beta_0)/\lambda(s;\beta_0)$ is bounded due to conditions (4) and (5), we can derive that $|D_n|^4\,E[U_n^{(1)}(\beta_0)]^4$ is bounded by the following (ignoring some multiplicative constants):

$$\int\!\!\int\!\!\int\!\!\int |Q_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \left[\int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2\right]^2 + \int\!\!\int\!\!\int |Q_3(s_1,s_2,s_3)|\,ds_1\,ds_2\,ds_3 + |D_n|\int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2 + \int\!\!\int |Q_2(s_1,s_2)|\,ds_1\,ds_2,$$

where all of the integrals are over $D_n$. It then follows from condition (6) that the foregoing sum is of order $|D_n|^2$. Thus the lemma is proven.

Proof of Theorem 1

First note that

$$U_n^{(1)}(\beta) = \frac{1}{|D_n|}\left[\sum_{x \in N \cap D_n} \frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)} - \int_{D_n} \lambda^{(1)}(s;\beta)\,ds\right]$$

and

$$U_n^{(2)}(\beta) = \frac{1}{|D_n|}\left[\sum_{x \in N \cap D_n} \frac{\lambda^{(2)}(x;\beta)}{\lambda(x;\beta)} - \int_{D_n} \lambda^{(2)}(s;\beta)\,ds\right] - \frac{1}{|D_n|}\sum_{x \in N \cap D_n}\left[\frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)}\right]^2 = U_{na}^{(2)}(\beta) - U_{nb}^{(2)}(\beta).$$

Using the Taylor expansion, we can obtain

$$|D_n|^{1/2}(\hat\beta_n - \beta_0) = -\left[U_n^{(2)}(\beta_n^*)\right]^{-1}|D_n|^{1/2}U_n^{(1)}(\beta_0),$$

where $\beta_n^*$ is between $\hat\beta_n$ and $\beta_0$. We need to show that

$$U_{na}^{(2)}(\beta_0) \xrightarrow{p} 0 \qquad\text{and}\qquad U_{nb}^{(2)}(\beta_0) \xrightarrow{p} \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.$$


To show the first expression, note that $E[U_{na}^{(2)}(\beta_0)] = 0$. Thus we need only look at the variance term:

$$\mathrm{var}[U_{na}^{(2)}(\beta_0)] = \frac{1}{|D_n|^2}\int\!\!\int \frac{\lambda^{(2)}(s_1;\beta_0)\lambda^{(2)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int \frac{[\lambda^{(2)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds,$$

which converges to 0 due to conditions (4)–(6). Thus $U_{na}^{(2)}(\beta_0) \xrightarrow{p} 0$.

To show the second expression, note that $E[U_{nb}^{(2)}(\beta_0)] = \frac{1}{|D_n|}\int [\lambda^{(1)}(s;\beta_0)]^2/\lambda(s;\beta_0)\,ds$. Thus, again, we need consider only the variance term:

$$\mathrm{var}[U_{nb}^{(2)}(\beta_0)] = \frac{1}{|D_n|^2}\int\!\!\int \frac{[\lambda^{(1)}(s_1;\beta_0)\lambda^{(1)}(s_2;\beta_0)]^2}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int \frac{[\lambda^{(1)}(s;\beta_0)]^4}{\lambda(s;\beta_0)}\,ds,$$

which converges to 0 due to conditions (4)–(6). Thus $U_{nb}^{(2)}(\beta_0) \xrightarrow{p} \frac{1}{|D_n|}\int [\lambda^{(1)}(s;\beta_0)]^2/\lambda(s;\beta_0)\,ds$.

Now we want to show that $|D_n|^{1/2}U_n^{(1)}(\beta_0)$ converges to a normal distribution. First we derive the mean and variance of $|D_n|^{1/2}U_n^{(1)}(\beta_0)$. Clearly, $E[|D_n|^{1/2}U_n^{(1)}(\beta_0)] = 0$. For the variance,

$$\mathrm{var}[|D_n|^{1/2}U_n^{(1)}(\beta_0)] = \frac{1}{|D_n|}\int\!\!\int \frac{\lambda^{(1)}(s_1;\beta_0)\lambda^{(1)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.$$

Now consider a new sequence of regions $D_n^*$, where $D_n^* \subset D_n$ and $|D_n^*|/|D_n| \to 1$ as $n \to \infty$. Let $U_{n*}^{(1)}(\beta_0)$ be the estimating function for the realization of the point process on $D_n^*$. Based on the expression of the variance, we can deduce that

$$\mathrm{var}\left[|D_n|^{1/2}U_n^{(1)}(\beta_0) - |D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)\right] \to 0 \quad\text{as } n \to \infty.$$

Thus

$$|D_n|^{1/2}U_n^{(1)}(\beta_0) \sim |D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0),$$

where the notation $a_n \sim b_n$ means that $a_n$ and $b_n$ have the same limiting distribution. Thus we need only show that $|D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution for some properly defined $D_n^*$.

To obtain a sequence of $D_n^*$, we apply the blocking technique used by Guan et al. (2004). Let $l(n) = n^\alpha$ and $m(n) = n^\alpha - n^\eta$ for $4/(2+\epsilon) < \eta < \alpha < 1$, where $\epsilon$ is defined as in condition (3). We first divide the original domain $D_n$ into some nonoverlapping $l(n) \times l(n)$ subsquares $D_{l(n)}^i$, $i = 1, \ldots, k_n$. Within each subsquare, we further obtain $D_{m(n)}^i$, an $m(n) \times m(n)$ square sharing the same center with $D_{l(n)}^i$. Note that $d(D_{m(n)}^i, D_{m(n)}^j) \ge n^\eta$ for $i \ne j$. Now define

$$D_n^* = \bigcup_{i=1}^{k_n} D_{m(n)}^i.$$

Condition (2) implies that $|D_n^*|/|D_n| \to 1$. Thus we need only show that $|D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and the application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.

APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let $\varphi(u) = g(u) - 1$, $\varphi_k(s_1,\ldots,s_k) = Q_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ and $g_k(s_1,\ldots,s_k) = \lambda_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ for $k = 3$ and $4$, and $Z(s) = \lambda^{(1)}(s)$. As in the proof of Theorem 1, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. As a result, $Z(s)$ is a scalar, so that $\mathrm{cov}(S_n) = \mathrm{var}(S_n)$. Furthermore, we consider only the case where $K = 1$. Thus the covariance estimator defined in (11) becomes

$$\widehat{\mathrm{var}}(S_n) = \sum_{b=1}^{B} \frac{(S_n^b - \bar S_n)^2}{B - 1}.$$

As $B$ goes to infinity, we see that the foregoing converges to

$$\mathrm{var}(S_n^b \mid \tilde N \cap D_n) = \frac{1}{k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1 \in D_{l(n)}^j}\sum_{x_2 \in D_{l(n)}^j} Z(x_1 - c_j + c_i)Z(x_2 - c_j + c_i) - \frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1 \in D_{l(n)}^{j_1}}\sum_{x_2 \in D_{l(n)}^{j_2}} Z(x_1 - c_{j_1} + c_i)Z(x_2 - c_{j_2} + c_i),$$

where $\tilde N$ denotes the thinned process and the notation $\sum_{x \in D}$ is short for $\sum_{x \in D \cap \tilde N}$. We wish to show that $\mathrm{var}(S_n^b \mid \tilde N \cap D_n)/|D_n|$ converges in $L^2$ to

$$\frac{1}{|D_n|}\mathrm{var}(S_n) = \frac{(\lambda_n)^2}{|D_n|}\int_{D_n}\int_{D_n} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2 + \frac{\lambda_n}{|D_n|}\int_{D_n} [Z(s)]^2\,ds.$$

To show this, we need to show that

$$\frac{1}{|D_n|}E[\mathrm{var}(S_n^b \mid \tilde N \cap D_n)] \to \frac{1}{|D_n|}\mathrm{var}(S_n) \tag{B.1}$$

and

$$\frac{1}{|D_n|^2}\mathrm{var}[\mathrm{var}(S_n^b \mid \tilde N \cap D_n)] \to 0. \tag{B.2}$$

To show (B.1), note that

$$E[\mathrm{var}(S_n^b \mid \tilde N \cap D_n)] = (\lambda_n)^2\sum_{i=1}^{k_n}\int\!\!\int_{D_{l(n)}^i} Z(s_1)Z(s_2)g(s_1 - s_2)\,ds_1\,ds_2 + \frac{\lambda_n(k_n - 1)}{k_n}\int_{D_n} [Z(s)]^2\,ds - \frac{(\lambda_n)^2}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int\!\!\int_{D_{l(n)}^i} Z(s_1)Z(s_2)\,g(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2.$$

Thus

$$\frac{1}{|D_n|}\left\{E[\mathrm{var}(S_n^b \mid \tilde N \cap D_n)] - \mathrm{var}(S_n)\right\} = -\frac{\lambda_n}{k_n|D_n|}\int_{D_n} [Z(s)]^2\,ds - \frac{(\lambda_n)^2}{|D_n|}\mathop{\sum\sum}_{i \ne j}\int_{D_{l(n)}^i}\int_{D_{l(n)}^j} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2 - \frac{(\lambda_n)^2}{|D_n|}\frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int\!\!\int_{D_{l(n)}^i} Z(s_1)Z(s_2)\,\varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2 = -T_n^1 - T_n^2 - T_n^3.$$

$T_n^1$ goes to 0 due to conditions (2) and (5). For $T_n^2$ and $T_n^3$, lengthy yet elementary derivations yield

$$|T_n^2| \le \frac{C(\lambda_n)^2}{k_n}\sum_{i=1}^{k_n}\left[\frac{1}{|D_n|}\int_{D_n - D_n} |D_n \cap (D_n - s)|\,|\varphi(s)|\,ds - \frac{1}{|D_{l(n)}^i|}\int_{D_{l(n)}^i - D_{l(n)}^i} \left|D_{l(n)}^i \cap \left(D_{l(n)}^i - s\right)\right|\,|\varphi(s)|\,ds\right]$$

and

$$T_n^3 \le \frac{C(\lambda_n)^2}{k_n}\,\frac{1}{|D_n|}\int_{D_n}\int_{D_n} |\varphi(s_1 - s_2)|\,ds_1\,ds_2.$$

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that $T_n^1$ and $T_n^3$ are both of order $1/k_n$, whereas $T_n^2$ is of order $1/|D_{l(n)}|^{1/2}$ due to condition (12).

To prove (B.2), we first note that $[\mathrm{var}(S_n^b \mid \tilde N \cap D_n)]^2$ is equal to the sum of the following three terms:

$$\frac{1}{k_n^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1 \in D_{l(n)}^{j_1}}\sum_{x_2 \in D_{l(n)}^{j_1}}\sum_{x_3 \in D_{l(n)}^{j_2}}\sum_{x_4 \in D_{l(n)}^{j_2}} \left[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_1} + c_{i_1})Z(x_3 - c_{j_2} + c_{i_2})Z(x_4 - c_{j_2} + c_{i_2})\right],$$

$$-\frac{2}{k_n^3}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{x_1 \in D_{l(n)}^{j_1}}\sum_{x_2 \in D_{l(n)}^{j_1}}\sum_{x_3 \in D_{l(n)}^{j_2}}\sum_{x_4 \in D_{l(n)}^{j_3}} \left[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_1} + c_{i_1})Z(x_3 - c_{j_2} + c_{i_2})Z(x_4 - c_{j_3} + c_{i_2})\right],$$

and

$$\frac{1}{k_n^4}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\sum_{x_1 \in D_{l(n)}^{j_1}}\sum_{x_2 \in D_{l(n)}^{j_2}}\sum_{x_3 \in D_{l(n)}^{j_3}}\sum_{x_4 \in D_{l(n)}^{j_4}} \left[Z(x_1 - c_{j_1} + c_{i_1})Z(x_2 - c_{j_2} + c_{i_1})Z(x_3 - c_{j_3} + c_{i_2})Z(x_4 - c_{j_4} + c_{i_2})\right].$$

Furthermore, the following relationships are possible for $x_1, \ldots, x_4$:

a. $x_1 = x_2 = x_3 = x_4$;
b. $x_1 = x_2$ and $x_3 = x_4$ but $x_1 \ne x_3$, or $x_1 = x_3$ and $x_2 = x_4$ but $x_1 \ne x_2$;
c. $x_1 = x_2$ but $x_2 \ne x_3 \ne x_4$, or $x_3 = x_4$ but $x_1 \ne x_2 \ne x_3$;
d. $x_1 = x_2 = x_3$ but $x_3 \ne x_4$, or $x_1 = x_2 = x_4$ but $x_1 \ne x_3$, or $x_2 = x_3 = x_4$ but $x_1 \ne x_2$;
e. $x_1 \ne x_2 \ne x_3 \ne x_4$ (all distinct).

When calculating the variance of $\frac{1}{|D_n|}\mathrm{var}(S_n^b \mid \tilde N \cap D_n)$, the foregoing relationships, combined with the expressions for the squared value of $\frac{1}{|D_n|}E[\mathrm{var}(S_n^b \mid \tilde N \cap D_n)]$, in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than $1/k_n$ in a similar manner.

Let $F_1(s_1,s_2,s_3,s_4) = g_4(s_1,s_2,s_3,s_4) - g(s_1,s_2)g(s_3,s_4)$. Term e leads to the following integral:

$$\frac{(\lambda_n)^4}{k_n^2|D_n|^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\int_{D_{l(n)}^{j_1}} Z(s_1 - c_{j_1} + c_{i_1})\times\Bigg[\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}} Z(s_2 - c_{j_1} + c_{i_1})Z(s_3 - c_{j_2} + c_{i_2})Z(s_4 - c_{j_2} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4$$
$$\qquad - \frac{2}{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_3}} Z(s_2 - c_{j_1} + c_{i_1})Z(s_3 - c_{j_2} + c_{i_2})Z(s_4 - c_{j_3} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4$$
$$\qquad + \frac{1}{k_n^2}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_3}}\int_{D_{l(n)}^{j_4}} Z(s_2 - c_{j_2} + c_{i_1})Z(s_3 - c_{j_3} + c_{i_2})Z(s_4 - c_{j_4} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4\Bigg]\,ds_1.$$

Let $F_2(s_1,s_2,s_3,s_4) = \varphi_4(s_1,s_2,s_3,s_4) + 2\varphi_3(s_1,s_2,s_3) + 2\varphi_3(s_1,s_3,s_4) + 2\varphi(s_1 - s_3)\varphi(s_2 - s_4)$. By noting that $Z(\cdot)$ is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

$$\frac{2C(\lambda_n)^4}{k_n|D_n|^2}\sum_{j_1=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_n}\int_{D_n} |F_2(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}} |F_1(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4.$$

The first integral in the foregoing is of order $1/k_n$ due to conditions (4), (5), and (6). The second integral is bounded by

$$\frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}} |\varphi_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{4C(\lambda_n)^4}{k_n|D_n|}\sum_{j_2=1}^{k_n}\int_{D_n}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}} |\varphi_3(s_1,s_3,s_4)|\,ds_1\,ds_3\,ds_4 + \frac{2C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\left[\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}} |\varphi(s_1 - s_3)|\,ds_1\,ds_3\right]^2.$$

The foregoing is of order $1/k_n + a_n$ due to conditions (4), (5), (6), and (12), where the order of $a_n$ is no higher than $1/|D_{l(n)}|$.

APPENDIX C: JUSTIFICATION FOR USING $\hat\beta_n$ IN THE THINNING STEP

Here we assume $|D_{l(n)}| = o(|D_n|^{1/2})$. Let $p_n(x,\theta) = \min_{s \in D_n} \lambda(s;\theta)/\lambda(x;\theta)$, and let $r(x)$ be a uniform random variable on $[0,1]$. If $r(x) \le p_n(x,\theta)$, where $x \in (N \cap D_n)$, then $x$ will be retained in the thinned process, say $\tilde N(\theta)$. Based on the same set $\{r(x): x \in (N \cap D_n)\}$, we can determine $\tilde N(\theta_0)$ and $\tilde N(\hat\theta_n)$. Note that $\tilde N(\theta_0)$ is as defined in (7) using the true FOIF. Define $\Psi_a = \{x \in \tilde N(\theta_0): x \notin \tilde N(\hat\theta_n)\}$ and $\Psi_b = \{x \notin \tilde N(\theta_0): x \in \tilde N(\hat\theta_n)\}$. Note that $P(x \in \Psi_a \cup \Psi_b \mid x \in N) \le |p_n(x,\hat\theta_n) - p_n(x,\theta_0)|$. We need to show that

$$\frac{1}{|D_n|}\left\{\mathrm{var}[S_n^b \mid \tilde N(\theta_0) \cap D_n] - \mathrm{var}[S_n^b \mid \tilde N(\hat\theta_n) \cap D_n]\right\} \xrightarrow{p} 0,$$

where $\mathrm{var}[S_n^b \mid \tilde N(\theta_0) \cap D_n]$ is $\mathrm{var}(S_n^b \mid \tilde N \cap D_n)$ defined in Appendix B; $\mathrm{var}[S_n^b \mid \tilde N(\hat\theta_n) \cap D_n]$ is defined analogously.

Let $\sum_D^*$ and $\mathop{\sum\sum}_D^{\ne}$ denote $\sum_{x \in D \cap (\Psi_a \cup \Psi_b)}$ and $\mathop{\sum\sum}_{x_1 \in D \cap N,\, x_2 \in D \cap (\Psi_a \cup \Psi_b),\, x_1 \ne x_2}$, respectively. Note that

$$\frac{1}{|D_n|}\left|\mathrm{var}[S_n^b \mid \tilde N(\theta_0) \cap D_n] - \mathrm{var}[S_n^b \mid \tilde N(\hat\theta_n) \cap D_n]\right| < \frac{C}{|D_n|k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1 \in D_{l(n)}^j \cap N}\ \sum_{D_{l(n)}^j}^{*} 1 + \frac{C}{|D_n|k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1 \in D_{l(n)}^{j_1} \cap N}\ \sum_{D_{l(n)}^{j_2}}^{*} 1 = \frac{C}{|D_n|}\sum_{j=1}^{k_n}\ \mathop{\sum\sum}_{D_{l(n)}^j}^{\ne} 1 + \frac{C}{|D_n|}\sum_{D_n}^{*} 1 + \frac{C}{|D_n|k_n}\Bigg[\sum_{D_n}^{*} 1\Bigg]\Bigg[\sum_{x \in D_n \cap N} 1\Bigg].$$

Note that $P[x \in (\Psi_a \cup \Psi_b) \mid x \in N] \le C/|D_n|^{\alpha/2}$ with probability 1 for $0 < \alpha < 1$, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.

[Received December 2006. Revised May 2007.]

REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.

Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.

Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.

Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.

Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.

Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.

Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.

Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.

Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.

——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.

Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.

Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.

Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.

McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.

Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.

Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.

Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.

Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.

Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.

Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.

Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.

——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.

Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.

Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 62, 252–258.

Page 6: A Thinned Block Bootstrap Variance Estimation Procedure ...meng/Papers/Jasa.Guan07.pdf · Yongtao G UAN and Ji Meng L OH ... Estimates for the regression coefÞcients, ... block bootstrap

1382 Journal of the American Statistical Association December 2007

Table 2 Estimates of the standard errors of β for ρ = 2 in the LGCP case

ρ = 2 Block size

Region size True SE 14 13 12 1

1 24 17(04) 16(05) 13(06)

2 12 10(02) 10(02) 10(03) 08(03)

4 07 056(003) 057(004) 060(006) 060(008)

42 Inhomogeneous NeymanndashScott Process

Theorem 1 gives the variance in terms of the FOIF and theSOCF Another approach for variance estimation is to use theplug-in method As point out by one referee we may sim-ply estimate the PCF or the K-function using the originaldata (Baddeley et al 2000) This will yield an estimate of theSOCF that in turn can be plugged into the expression for Bn inTheorem 1 to obtain an estimate for the asymptotic varianceWaagepetersen (2007) studied the performance of this methodin the INSP case

To compare the performance of our method with the plug-in method we also simulated data from the INSP model givenby Waagepetersen (2007) To do so we first simulated a ho-mogeneous Poisson process with intensity κ = 50 as the parentprocess For each parent we then generated a Poisson numberof offspring We defined the position of each offspring relativeto its parent by a radially symmetric Gaussian random variable(eg Diggle 2003) Let ω denote the standard deviation of thevariable We used ω = 02 04 representing relatively strongand weak clustering Finally we thinned the offspring as doneby Waagepetersen (2007) by setting the probability of retain-ing an offspring equal to the intensity at the offspring locationdivided by the maximum intensity in the study region For theintensity we used logλ(s) = α + βX(s) where α β and X(s)were as defined in the LGCP case

We simulated 1000 realizations of the process on both a1 times 1 square and a 2 times 2 square For each realization we ap-plied the thinned block bootstrap and the plug-in method ofWaagepetersen (2007) to estimate the standard errors of α and β

and to further obtain their respective 95 confidence intervalsFor the plug-in method we estimated the second-order para-meters κ and ω by a minimum contrast estimation procedureSpecifically we obtained these estimates by minimizing

int a

0[K(t)c minus K(tκω)c]2 dt (14)

with respect to (κω) for a specified a and c where K(t) andK(tκω) are the empirical and theoretical K functions Weused a = 4ω and c = 25 then used the estimated values of κ

and ω to estimate the standard errors of α and β To do this weused the inhomthomasasympcov function in the InhomClusterR package which is available on Waagepetersenrsquos website athttpwwwmathaaudksimrwsppcode

Table 3 gives the empirical coverage of the confidence in-tervals for β As in the LGCP case there was a slight under-coverage but this undercoverage appears to be less serious Inparticular for L = 2 and ω = 02 the coverage was very closeto the nominal level for all block sizes used Furthermore thecoverage was better for ω = 02 than for ω = 04 because the

former yielded a process with a shorter-range dependence thanthe latter For the plug-in method all coverages were very closeto the nominal level Thus our method did not perform as wellas the plug-in method when the true SOCF was used Howeverthe difference in coverage tended to diminish as the sample sizeincreased As pointed out by Waagepetersen (2007) the perfor-mance of the plug-in method is affected by the choice of thetuning parameters a and c in (14) In the simulation we alsoconsidered using the default value of a given by spatstat Theresults not shown here suggested that the coverage was oftenworse than that achieved by our method

5 AN APPLICATION

We applied the proposed thinned block bootstrap procedureto a tropical rain forest data set collected in a 1000 times 500 mplot on Barro Colorado Island The data contain measurementsfor more than 300 species existing in the same plot in mul-tiple years as well as information on elevation and gradientrecorded on a 5 times 5 m grid within the same region We wereparticularly interested in modeling the locations of 3605 BPLtrees recorded in a 1995 census by using elevation and gradi-ent as covariates (Fig 2) The same data set was analyzed byWaagepetersen (2007) in which he considered the followingFOIF model

λ(s) = exp[β0 + β1E(s) + β2G(s)]where E(s) and G(s) are the (estimated) elevation and gradientat location s Waagepetersen (2007) fitted this model by maxi-mizing the Poisson maximum likelihood criterion given in (1)Specifically he obtained the estimates β1 = 02 and β2 = 584To estimate the standard errors associated with these estimateshe further assumed that the locations of the BPL trees were gen-erated by an INSP Based on the estimated standard errors heconcluded that β2 was significant but β1 was not

Table 3 Empirical coverage of 95 nominal confidence intervalsof β obtained by the thinned block bootstrap method and

the plug-in method in the INSP case

Block size

Region size Plug-in 14 13 12

ω = 021 96 91 89 812 96 94 94 94

ω = 041 93 87 87 772 95 90 90 91

Guan and Loh Block Bootstrap Variance Estimation 1383

Figure 2 Locations of BLP trees

We reanalyzed the data because the INSP model was only aldquocruderdquo model for the data as pointed out by Waagepetersen(2007) The INSP model attributed the possible clusteringamong the BPL trees to a one-round seed-dispersal process Inreality however the clustering might be due to many differentfactors (eg important environmental factors other than eleva-tion and gradient) that were not included in the model Evenif the clustering was due to seed dispersal alone it probablywas due to seed dispersal that had occurred in multiple genera-tions As a result the validity of the model and the subsequentestimated standard errors and inference on the regression pa-rameters required further investigation Our proposed thinnedblocking bootstrap method on the other hand requires muchweaker conditions and thus might yield more reasonable esti-mates for the standard errors and more reliable inference on theregression parameters

To estimate the standard errors for β1 and β2 we applied(12) with K = 100 and B = 999 and used 200 times 100 m sub-blocks We chose K = 100 because the trace plots for the es-timated standard errors became fairly stable if 100 or morethinned realizations were used We selected the subblock sizefrom using the data-driven procedure discussed at the end ofSection 3 Table 4 gives the estimated standard errors for β1 andβ2 using the thinned block bootstrap and the plug-in methodof Waagepetersen (2007) along with the respective 95 confi-dence intervals based on these estimated standard errors Notethat the intervals based on the thinned block bootstrap areslightly shorter than those from the plug-in method but nev-ertheless lead to the same conclusion as the plug-in methodSpecifically they suggest that β2 is significant but β1 is not

Table 4 Results for the BLP data

Plug-in method Thinned block bootstrap

EST STD 95 CI STD 95 CI

β1 02 02 (minus02 06) 017 (minus014 054)

β2 584 253 (891080) 212 (169999)

NOTE EST STD and CI denote the point estimate of the regression parameters by maxi-mizing (1) the standard deviation and the confidence interval

From a biological standpoint this means that the BPL treesprefer to live on slopes but do not really favor either high orlow altitudes Thus our results formally confirm the findings ofWaagepetersen (2007) by using less restrictive assumptions

APPENDIX A PROOF OF THEOREM 1

Let β0 represent the true regression parameter For ease of presen-tation but without loss of generality we assume that β and β0 are bothscalars that is p = 1 In the case of p gt 1 the proof can be easilygeneralized by applying the CramerndashWold device

Lemma A1 Assume that conditions (4)ndash(6) hold Then |Dn|2 timesE[U(1)

n (β0)]4 lt C for some C lt infin

Proof By repeatedly using Campbellrsquos theorem (eg Stoyan andStoyan 1994) and the relationship between moments and cumulantsand using the fact that λ(1)(sβ0)λ(sβ0) is bounded due to condi-

tions (4) and (5) we can derive that |Dn|4E[U(1)n (β0)]4 is bounded

by the following (ignoring some multiplicative constants)int int int int

|Q4(s1 s2 s3 s4)|ds1 ds2 ds3 ds4

+[int int

|Q2(s1 s2)|ds1 ds2

]2

+int int int

|Q3(s1 s2 s3)|ds1 ds2 ds3

+ |Dn|int int

|Q2(s1 s2)|ds1 ds2

+int int

|Q2(s1 s2)|ds1 ds2

where all of the integrals are over Dn It then follows from condition(6) that the foregoing sum is of order |Dn|2 Thus the lemma is proven

Proof of Theorem 1

First note that

U(1)n (β) = 1

|Dn|[ sum

xisinNcapDn

λ(1)(xβ)

λ(xβ)minus

intλ(1)(sβ)ds

]

and

U(2)n (β) = 1

|Dn|[ sum

xisinNcapDn

λ(2)(xβ)

λ(xβ)minus

intλ(2)(sβ)ds

]

minus 1

|Dn|sum

xisinNcapDn

[λ(1)(xβ)

λ(xβ)

]2

= U(2)na(β) minus U

(2)nb

(β)

Using the Taylor expansion we can obtain

|Dn|12(βn minus β0) = minus[U

(2)n (βn)

]minus1|Dn|12U(1)n (β0)

where βn is between βn and β0 We need to show that

U(2)na(β0)

prarr 0

and

U(2)nb

(β0)prarr 1

|Dn|int [λ(1)(sβ0)]2

λ(sβ0)ds

1384 Journal of the American Statistical Association December 2007

To show the first expression note that E[U(2)na(β0)] = 0 Thus we only

need look at the variance term

var[U

(2)na(β0)

]

= 1

|Dn|2int int

λ(2)(s1β0)λ(2)(s2β0)

λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2

+ 1

|Dn|2int [λ(2)(sβ0)]2

λ(sβ0)ds

which converges to 0 due to conditions (4)ndash(6) Thus U(2)na(β0)

prarr 0

To show the second expression note that E[U(2)nb

(β0)] =1

|Dn|int [λ(1)(sβ0)]2λ(sβ0) ds Thus again we need consider only

the variance term

var[U

(2)nb

(β0)]

= 1

|Dn|2int int [λ(1)(s1β0)λ(1)(s2β0)]2

λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2

+ 1

|Dn|2int [λ(1)(sβ0)]4

λ(sβ0)ds

which converges to 0 due to conditions (4)ndash(6) Thus U(2)nb

(β0)prarr

1|Dn|

int [λ(1)(sβ0)]2λ(sβ0) ds

Now we want to show that |Dn|12U(1)n (β0) converges to a

normal distribution First we derive the mean and variance of|Dn|12U

(1)n (β0) Clearly E[|Dn|12U

(1)n (β0)] = 0 For the variance

var[|Dn|12U

(1)n (β0)

]

= 1

|Dn|int int

λ(1)(s1β0)λ(1)(s2β0)

λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2

+ 1

|Dn|int [λ(1)(sβ0)]2

λ(sβ0)ds

Now consider a new sequence of regions Dlowastn where Dlowast

n sub Dn and

|Dlowastn||Dn| rarr 1 as n rarr infin Let U

(1)nlowast(β0) be the estimating function

for the realization of the point process on Dlowastn Based on the expression

of the variance we can deduce that

var[|Dn|12U

(1)n (β0) minus |Dlowast

n|12U(1)nlowast(β0)

] rarr 0 as n rarr infin

Thus

|Dn|12U(1)n (β0) sim |Dlowast

n|12U(1)nlowast(β0)

where the notation an sim bn means that an and bn have the same lim-iting distribution Thus we need only show that |Dlowast

n|12U(1)nlowast(β0) con-

verges to a normal distribution for some properly defined Dlowastn

To obtain a sequence of Dlowastn we apply the blocking technique used

by Guan et al (2004) Let l(n) = nα m(n) = nα minus nη for 4(2 + ε) lt

η lt α lt 1 where ε is defined as in condition (3) We first divide theoriginal domain Dn into some nonoverlapping l(n)times l(n) subsquaresDi

l(n) i = 1 kn Within each subsquare we further obtain Di

m(n)

an m(n) times m(n) square sharing the same center with Dil(n)

Note that

d(Dim(n)

Djm(n)

) ge nη for i = j Now define

Dlowastn =

kn⋃

i=1

Dim(n)

Condition (2) implies that |Dlowastn||Dn| rarr 1 Thus we need only show

that |Dlowastn|12U

(1)nlowast(β0) converges to a normal distribution This is true

due to the mixing condition the result in Lemma A1 and the appli-cation of the Lyapunovrsquos central limit theorem The proof is similar tothat of Guan et al (2004) thus we omit the details here

APPENDIX B PROOF OF THEOREM 2

To simplify notation let ϕ(u) = g(u) minus 1 ϕk(s1 sk) = Qk(s1

sk)[λ(s1) middot middot middotλ(sk)] gk(s1 sk) = λk(s1 sk)[λ(s1) middot middot middotλ(sk)] for k = 3 and 4 and Z(s) = λ(1)(s) As in the proof of The-orem 1 we assume that β and β0 are both scalars that is p = 1 Asa result Z(s) is a scalar so that cov(Sn) = var(Sn) Furthermore weconsider only the case where K = 1 Thus the covariance estimatordefined in (11) becomes

var(Sn) =Bsum

b=1

(Sbn minus Sn)2

B minus 1

As B goes to infinity we see that the foregoing converges to

var(Sbn | cap Dn)

= 1

kn

knsum

i=1

knsum

j=1

sum

Dj

l(n)

sum

Dj

l(n)

Z(x1 minus cj + ci )Z(x2 minus cj + ci )

minus 1

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

sum

Dj2l(n)

Z(x1 minus cj1 + ci

)

times Z(x2 minus cj2 + ci

)

where the notationsum

D is short forsum

Dcap We wish to show thatvar(Sb

n | cap Dn)|Dn| converges in L2 to

1

|Dn| var(Sn) = (λn)2

|Dn|int

Dn

int

Dn

Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2

+ λn

|Dn|int

Dn

[Z(s)]2 ds

To show this we need to show that

1

|Dn|E[var(Sbn | cap Dn)] rarr 1

|Dn| var(Sn) (B1)

and

1

|Dn|2 var[var(Sbn | cap Dn)] rarr 0 (B2)

To show (B1) note that

E[var(Sbn | cap Dn)]

= (λn)2knsum

i=1

int int

Dil(n)

Z(s1)Z(s2)g(s1 minus s2) ds1 ds2

+ λn(kn minus 1)

kn

int

Dn

[Z(s)]2 ds

minus (λn)2

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

int int

Dil(n)

Z(s1)Z(s2)

times g(s1 minus s2 + cj1 minus cj2

)ds1 ds2

Thus

1

|Dn|E[var(Sb

n | cap Dn)] minus var(Sn)

= minus λn

kn|Dn|int

Dn

[Z(s)]2 ds

minus (λn)2

|Dn|sumsum

i =j

int

Dil(n)

int

Dj

l(n)

Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2

Guan and Loh Block Bootstrap Variance Estimation 1385

minus (λn)2

|Dn|1

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

int int

Dil(n)

Z(s1)Z(s2)

times ϕ(s1 minus s2 + cj1 minus cj2

)ds1 ds2

= minusT 1n minus T 2

n minus T 3n

T 1n goes to 0 due to conditions (2) and (5) For T 2

n and T 3n lengthy yet

elementary derivations yield

|T 2n | le C(λn)2

|kn|knsum

i=1

[1

|Dn|int

DnminusDn

|Dn cap (Dn minus s)||ϕ(s)|ds

minus 1

|Dil(n)

|int

Dil(n)

minusDil(n)

∣∣Dil(n) cap (

Dil(n) minus s

)∣∣|ϕ(s)|ds]

and

T 3n le C(λn)2

kn

1

|Dn|int

Dn

int

Dn

|ϕ(s1 minus s2)|ds1 ds2

Both terms converge to 0 due to conditions (2) (4) (5) and (6) Thus(B1) is proven Furthermore we can see that both T 1

n and T 3n are both

of order 1kn whereas T 2n is of order 1|Dl(n)|12 due to condition

(12)To prove (B2) we first note that [var(Sb

n | cap Dn)]2 is equal to thesum of the following three terms

1

k2n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj2l(n)

[Z

(x1 minus cj1 + ci1

)

times Z(x2 minus cj1 + ci1

)Z

(x3 minus cj2 + ci2

)Z

(x4 minus cj2 + ci2

)]

minus 2

k3n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

knsum

j3=1

sum

Dj1l(n)

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj3l(n)

[Z

(x1 minus cj1

+ ci1

)Z

(x2 minus cj1 + ci1

)Z

(x3 minus cj2 + ci2

)

times Z(x4 minus cj3 + ci2

)]

and

1

k4n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

knsum

j3=1

knsum

j4=1

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj3l(n)

sum

Dj4l(n)

[Z

(x1 minus cj1

+ ci1

)Z

(x2 minus cj2 + ci1

)Z

(x3 minus cj3 + ci2

)Z

(x4 minus cj4 + ci2

)]

Furthermore the following relationships are possible for x1 x4

a x1 = x2 = x3 = x4b x1 = x2 and x3 = x4 but x1 = x3 or x1 = x3 and x2 = x4 but

x1 = x2c x1 = x2 but x2 = x3 = x4 or x3 = x4 but x1 = x2 = x3d x1 = x2 = x3 but x3 = x4 or x1 = x2 = x3 but x1 = x4 or x2 =

x3 = x4 but x1 = x2e x1 = x2 = x3 = x4

When calculating the variance of 1|Dn| var(Sb

n | cap Dn) the forego-ing relationships combined with the expressions for the squared valueof 1

|Dn|E[var(Sbn | cap Dn)] in turn lead to integrals of different com-

plexity To save space we present only the integral from term e theremaining integrals can be shown to have order no higher than 1kn ina similar manner

Let F1(s1 s2 s3 s4) = g4(s1 s2 s3 s4) minus g(s1 s2)g(s3 s4)Term e leads to the following integral

(λn)4

k2n|Dn|2

knsum

i1=1

knsum

i2=1

knsum

j1=1

int

Dj1l(n)

Z(s1 minus cj1 + ci1

)

times[ knsum

j2=1

int

Dj1l(n)

int

Dj2l(n)

int

Dj2l(n)

Z(s2 minus cj1 + ci1

)Z

(s3 minus cj2 + ci2

)

times Z(s4 minus cj2 + ci2

)F1(s1 s2 s3 s4) ds2 ds3 ds4

minus 2

kn

knsum

j2=1

knsum

j3=1

int

Dj1l(n)

int

Dj2l(n)

int

Dj3l(n)

Z(s2 minus cj1 + ci1

)

times Z(s3 minus cj2 + ci2

)Z

(s4 minus cj3 + ci2

)

times F1(s1 s2 s3 s4) ds2 ds3 ds4

+ 1

k2n

knsum

j2=1

knsum

j3=1

knsum

j4=1

int

Dj2l(n)

int

Dj3l(n)

int

Dj4l(n)

Z(s2 minus cj2 + ci1

)

times Z(s3 minus cj3 + ci2

)Z

(s4 minus cj4 + ci2

)

times F1(s1 s2 s3 s4) ds2 ds3 ds4

]ds1

Let F2(s1 s2 s3 s4) = ϕ4(s1 s2 s3 s4) + 2ϕ3(s1 s2 s3) + 2ϕ3(s1

s3 s4) + 2ϕ(s1 minus s3)ϕ(s2 minus s4) By noting that Z(middot) is bounded wecan derive from lengthy yet elementary algebra that the foregoing isbounded by

2C(λn)4

kn|Dn|2knsum

j1=1

int

Dj1l(n)

int

Dj1l(n)

int

Dn

int

Dn

|F2(s1 s2

s3 s4)|ds1 ds2 ds3 ds4

+ C(λn)4

|Dn|2knsum

j1=1

knsum

j2=1

int

Dj1l(n)

int

Dj1l(n)

int

Dj2l(n)

int

Dj2l(n)

|F1(s1 s2

s3 s4)|ds1 ds2 ds3 ds4

The first integral in the foregoing is of order 1kn due to conditions(4) (5) and (6) The second integral is bounded by

C(λn)4

|Dn|2knsum

j1=1

knsum

j2=1

int

Dj1l(n)

int

Dj1l(n)

int

Dj2l(n)

int

Dj2l(n)

|ϕ4(s1 s2

s3 s4)|ds1 ds2 ds3 ds4

+ 4C(λn)4

kn|Dn|knsum

j2=1

int

Dn

int

Dj2l(n)

int

Dj2l(n)

|ϕ3(s1 s3 s4)|ds1 ds3 ds4

+ 2C(λn)4

|Dn|2knsum

j1=1

knsum

j2=1

[int

Dj1l(n)

int

Dj2l(n)

|ϕ(s1 s3)|ds1 ds3

]2

The foregoing is of order 1kn + an due to conditions (4) (5) (6) and(12) where the order of an is no higher than 1|Dl(n)|

APPENDIX C JUSTIFICATION FOR USING βnIN THE THINNING STEP

Here we assume |Dl(n)| = o(|Dn|12) Let pn(x θ) = minsisinDnλ(s

θ)λ(x θ) and let r(x) be a uniform random variable in [01] Ifr(x) le pn(x θ) where x isin (N cap Dn) then x will be retained in thethinned process say (θ) Based on the same set r(x) x isin (N cap

1386 Journal of the American Statistical Association December 2007

Dn) we can determine (θ0) and (θn) Note that (θ0) is as de-fined in (7) using the true FOIF Define a = x isin (θ0)x isin (θn)and b = x isin (θ0)x isin (θn) Note that P(x isin a cupb|x isin N) le|pn(x θn) minus pn(x θ0)| We need to show that

1

|Dn|var[Sb

n |(θ0) cap Dn] minus var[Sbn |(θn) cap Dn] prarr 0

where var[Sbn |(θ0) cap Dn] is var(Sb

n | cap N) defined in Appendix Bvar[Sb

n |(θn) cap Dn] is defined analogously

Letsum

D andsumsum =

Ddenote

sum

Dcap(acupb)

andsumsum

x1isinDcapNx2isinDcap(acupb)x1 =x2

Note that

1

|Dn|∣∣var[Sb

n |(θ0) cap Dn] minus var[Sbn |(θn) cap Dn]∣∣

ltC

|Dn|kn

knsum

i=1

knsum

j=1

sum

Dj

l(n)capN

sum

Dj

l(n)

1

+ C

|Dn|k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

capN

sum

Dj2l(n)

1

= C

|Dn|knsum

j=1

=sumsum

Dj

l(n)

1

+ C

|Dn|sum

Dn

1 + C

|Dn|kn

[sum

Dn

1

][ sum

xisinDncapN

1

]

Note that P [x isin (a cup b)|x isin N ] le C|Dn|α2 with probability 1 for

0 lt α lt 1 due to Theorem 1 Thus all three terms in the foregoingconverge to 0 in probability This in turn leads to the consistency ofthe variance estimator

[Received December 2006 Revised May 2007]

REFERENCES

Baddeley A J and Turner R (2005) ldquoSpatstat An R Package for AnalyzingSpatial Point Patternsrdquo Journal of Statistical Software 12 1ndash42

Baddeley A J Moslashller J and Waagepetersen R (2000) ldquoNon- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patternsrdquo Sta-tistica Neerlandica 54 329ndash350

Brillinger D R (1975) Time Series Data Analysis and Theory New YorkHolt Rinehart amp Winston

Cressie N A C (1993) Statistics for Spatial Data New York WileyDaley D J and Vere-Jones D (1988) An Introduction to the Theory of Point

Processes New York Springer-VerlagDiggle P J (2003) Statistical Analysis of Spatial Point Patterns New York

Oxford University Press IncDoukhan P (1994) Mixing Properties and Examples New York Springer-

VerlagGuan Y (2007) ldquoVariance Estimation for Statistics Computed From Inhomo-

geneous Spatial Point Processesrdquo Journal of the Royal Statistical SocietySer B to appear

Guan Y Sherman M and Calvin J A (2004) ldquoA Nonparametric Test forSpatial Isotropy Using Subsamplingrdquo Journal of the American Statistical As-sociation 99 810ndash821

(2006) ldquoAssessing Isotropy for Spatial Point Processesrdquo Biometrics62 119ndash125

Hall P and Jing B (1996) ldquoOn Sample Reuse Methods for Dependent DatardquoJournal of the Royal Statistical Society Ser B 58 727ndash737

Heinrich L (1985) ldquoNormal Convergence of Multidimensional Shot Noise andRates of This Convergencerdquo Advances in Applied Probability 17 709ndash730

Lahiri S N (2003) Resampling Methods for Dependent Data New YorkSpringer-Verlag

McElroy T and Politis D N (2007) ldquoStable Marked Point Processesrdquo TheAnnals of Statistics 35 393ndash419

Moslashller J Syversveen A R and Waagepetersen R P (1998) ldquoLog GaussianCox Processesrdquo Scandinavian Journal of Statistics 25 451ndash482

Politis D N (1999) Subsampling New York Springer-VerlagPolitis D N and Sherman M (2001) ldquoMoment Estimation for Statistics

From Marked Point Processesrdquo Journal of the Royal Statistical SocietySer B 63 261ndash275

Rathbun S L and Cressie N (1994) ldquoAsymptotic Properties of Estimatorsfor the Parameters of Spatial Inhomogeneous Poisson Point Processesrdquo Ad-vances in Applied Probability 26 122ndash154

Rosenblatt M (1956) ldquoA Central Limit Theorem and a Strong Mixing Condi-tionrdquo Proceedings of the National Academy of Sciences 42 43ndash47

Schoenberg F P (2004) ldquoConsistent Parametric Estimation of the Intensityof a SpatialndashTemporal Point Processrdquo Journal of Statistical Planning andInference 128 79ndash93

Sherman M (1996) ldquoVariance Estimation for Statistics Computed FromSpatial Lattice Datardquo Journal of the Royal Statistical Society Ser B 58509ndash523

(1997) ldquoSubseries Methods in Regressionrdquo Journal of the AmericanStatistical Association 92 1041ndash1048

Stoyan D and Stoyan H (1994) Fractals Random Shapes and Point FieldsNew York Wiley

Waagepetersen R P (2007) ldquoAn Estimating Function Approach to Inferencefor Inhomogeneous NeymanndashScott Processesrdquo Biometrics 62 252ndash258

Page 7: A Thinned Block Bootstrap Variance Estimation Procedure ...meng/Papers/Jasa.Guan07.pdf · Yongtao G UAN and Ji Meng L OH ... Estimates for the regression coefÞcients, ... block bootstrap

Guan and Loh Block Bootstrap Variance Estimation 1383

Figure 2 Locations of BLP trees

We reanalyzed the data because the INSP model was only aldquocruderdquo model for the data as pointed out by Waagepetersen(2007) The INSP model attributed the possible clusteringamong the BPL trees to a one-round seed-dispersal process Inreality however the clustering might be due to many differentfactors (eg important environmental factors other than eleva-tion and gradient) that were not included in the model Evenif the clustering was due to seed dispersal alone it probablywas due to seed dispersal that had occurred in multiple genera-tions As a result the validity of the model and the subsequentestimated standard errors and inference on the regression pa-rameters required further investigation Our proposed thinnedblocking bootstrap method on the other hand requires muchweaker conditions and thus might yield more reasonable esti-mates for the standard errors and more reliable inference on theregression parameters

To estimate the standard errors for β1 and β2 we applied(12) with K = 100 and B = 999 and used 200 times 100 m sub-blocks We chose K = 100 because the trace plots for the es-timated standard errors became fairly stable if 100 or morethinned realizations were used We selected the subblock sizefrom using the data-driven procedure discussed at the end ofSection 3 Table 4 gives the estimated standard errors for β1 andβ2 using the thinned block bootstrap and the plug-in methodof Waagepetersen (2007) along with the respective 95 confi-dence intervals based on these estimated standard errors Notethat the intervals based on the thinned block bootstrap areslightly shorter than those from the plug-in method but nev-ertheless lead to the same conclusion as the plug-in methodSpecifically they suggest that β2 is significant but β1 is not

Table 4 Results for the BLP data

Plug-in method Thinned block bootstrap

EST STD 95 CI STD 95 CI

β1 02 02 (minus02 06) 017 (minus014 054)

β2 584 253 (891080) 212 (169999)

NOTE EST STD and CI denote the point estimate of the regression parameters by maxi-mizing (1) the standard deviation and the confidence interval

From a biological standpoint this means that the BPL treesprefer to live on slopes but do not really favor either high orlow altitudes Thus our results formally confirm the findings ofWaagepetersen (2007) by using less restrictive assumptions

APPENDIX A PROOF OF THEOREM 1

Let β0 represent the true regression parameter For ease of presen-tation but without loss of generality we assume that β and β0 are bothscalars that is p = 1 In the case of p gt 1 the proof can be easilygeneralized by applying the CramerndashWold device

Lemma A1 Assume that conditions (4)ndash(6) hold Then |Dn|2 timesE[U(1)

n (β0)]4 lt C for some C lt infin

Proof By repeatedly using Campbellrsquos theorem (eg Stoyan andStoyan 1994) and the relationship between moments and cumulantsand using the fact that λ(1)(sβ0)λ(sβ0) is bounded due to condi-

tions (4) and (5) we can derive that |Dn|4E[U(1)n (β0)]4 is bounded

by the following (ignoring some multiplicative constants)int int int int

|Q4(s1 s2 s3 s4)|ds1 ds2 ds3 ds4

+[int int

|Q2(s1 s2)|ds1 ds2

]2

+int int int

|Q3(s1 s2 s3)|ds1 ds2 ds3

+ |Dn|int int

|Q2(s1 s2)|ds1 ds2

+int int

|Q2(s1 s2)|ds1 ds2

where all of the integrals are over Dn It then follows from condition(6) that the foregoing sum is of order |Dn|2 Thus the lemma is proven

Proof of Theorem 1

First note that

U(1)n (β) = 1

|Dn|[ sum

xisinNcapDn

λ(1)(xβ)

λ(xβ)minus

intλ(1)(sβ)ds

]

and

U(2)n (β) = 1

|Dn|[ sum

xisinNcapDn

λ(2)(xβ)

λ(xβ)minus

intλ(2)(sβ)ds

]

minus 1

|Dn|sum

xisinNcapDn

[λ(1)(xβ)

λ(xβ)

]2

= U(2)na(β) minus U

(2)nb

(β)

Using the Taylor expansion we can obtain

|Dn|12(βn minus β0) = minus[U

(2)n (βn)

]minus1|Dn|12U(1)n (β0)

where βn is between βn and β0 We need to show that

U(2)na(β0)

prarr 0

and

U(2)nb

(β0)prarr 1

|Dn|int [λ(1)(sβ0)]2

λ(sβ0)ds

1384 Journal of the American Statistical Association December 2007

To show the first expression note that E[U(2)na(β0)] = 0 Thus we only

need look at the variance term

var[U

(2)na(β0)

]

= 1

|Dn|2int int

λ(2)(s1β0)λ(2)(s2β0)

λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2

+ 1

|Dn|2int [λ(2)(sβ0)]2

λ(sβ0)ds

which converges to 0 due to conditions (4)ndash(6) Thus U(2)na(β0)

prarr 0

To show the second expression note that E[U(2)nb

(β0)] =1

|Dn|int [λ(1)(sβ0)]2λ(sβ0) ds Thus again we need consider only

the variance term

var[U

(2)nb

(β0)]

= 1

|Dn|2int int [λ(1)(s1β0)λ(1)(s2β0)]2

λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2

+ 1

|Dn|2int [λ(1)(sβ0)]4

λ(sβ0)ds

which converges to 0 due to conditions (4)ndash(6) Thus U(2)nb

(β0)prarr

1|Dn|

int [λ(1)(sβ0)]2λ(sβ0) ds

Now we want to show that |Dn|12U(1)n (β0) converges to a

normal distribution First we derive the mean and variance of|Dn|12U

(1)n (β0) Clearly E[|Dn|12U

(1)n (β0)] = 0 For the variance

var[|Dn|12U

(1)n (β0)

]

= 1

|Dn|int int

λ(1)(s1β0)λ(1)(s2β0)

λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2

+ 1

|Dn|int [λ(1)(sβ0)]2

λ(sβ0)ds

Now consider a new sequence of regions Dlowastn where Dlowast

n sub Dn and

|Dlowastn||Dn| rarr 1 as n rarr infin Let U

(1)nlowast(β0) be the estimating function

for the realization of the point process on Dlowastn Based on the expression

of the variance we can deduce that

var[|Dn|12U

(1)n (β0) minus |Dlowast

n|12U(1)nlowast(β0)

] rarr 0 as n rarr infin

Thus

|Dn|12U(1)n (β0) sim |Dlowast

n|12U(1)nlowast(β0)

where the notation an sim bn means that an and bn have the same lim-iting distribution Thus we need only show that |Dlowast

n|12U(1)nlowast(β0) con-

verges to a normal distribution for some properly defined Dlowastn

To obtain a sequence of Dlowastn we apply the blocking technique used

by Guan et al (2004) Let l(n) = nα m(n) = nα minus nη for 4(2 + ε) lt

η lt α lt 1 where ε is defined as in condition (3) We first divide theoriginal domain Dn into some nonoverlapping l(n)times l(n) subsquaresDi

l(n) i = 1 kn Within each subsquare we further obtain Di

m(n)

an m(n) times m(n) square sharing the same center with Dil(n)

Note that

d(Dim(n)

Djm(n)

) ge nη for i = j Now define

Dlowastn =

kn⋃

i=1

Dim(n)

Condition (2) implies that |Dlowastn||Dn| rarr 1 Thus we need only show

that |Dlowastn|12U

(1)nlowast(β0) converges to a normal distribution This is true

due to the mixing condition the result in Lemma A1 and the appli-cation of the Lyapunovrsquos central limit theorem The proof is similar tothat of Guan et al (2004) thus we omit the details here

APPENDIX B PROOF OF THEOREM 2

To simplify notation let ϕ(u) = g(u) minus 1 ϕk(s1 sk) = Qk(s1

sk)[λ(s1) middot middot middotλ(sk)] gk(s1 sk) = λk(s1 sk)[λ(s1) middot middot middotλ(sk)] for k = 3 and 4 and Z(s) = λ(1)(s) As in the proof of The-orem 1 we assume that β and β0 are both scalars that is p = 1 Asa result Z(s) is a scalar so that cov(Sn) = var(Sn) Furthermore weconsider only the case where K = 1 Thus the covariance estimatordefined in (11) becomes

var(Sn) =Bsum

b=1

(Sbn minus Sn)2

B minus 1

As B goes to infinity we see that the foregoing converges to

var(Sbn | cap Dn)

= 1

kn

knsum

i=1

knsum

j=1

sum

Dj

l(n)

sum

Dj

l(n)

Z(x1 minus cj + ci )Z(x2 minus cj + ci )

minus 1

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

sum

Dj2l(n)

Z(x1 minus cj1 + ci

)

times Z(x2 minus cj2 + ci

)

where the notationsum

D is short forsum

Dcap We wish to show thatvar(Sb

n | cap Dn)|Dn| converges in L2 to

1

|Dn| var(Sn) = (λn)2

|Dn|int

Dn

int

Dn

Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2

+ λn

|Dn|int

Dn

[Z(s)]2 ds

To show this we need to show that

1

|Dn|E[var(Sbn | cap Dn)] rarr 1

|Dn| var(Sn) (B1)

and

1

|Dn|2 var[var(Sbn | cap Dn)] rarr 0 (B2)

To show (B1) note that

E[var(Sbn | cap Dn)]

= (λn)2knsum

i=1

int int

Dil(n)

Z(s1)Z(s2)g(s1 minus s2) ds1 ds2

+ λn(kn minus 1)

kn

int

Dn

[Z(s)]2 ds

minus (λn)2

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

int int

Dil(n)

Z(s1)Z(s2)

times g(s1 minus s2 + cj1 minus cj2

)ds1 ds2

Thus

1

|Dn|E[var(Sb

n | cap Dn)] minus var(Sn)

= minus λn

kn|Dn|int

Dn

[Z(s)]2 ds

minus (λn)2

|Dn|sumsum

i =j

int

Dil(n)

int

Dj

l(n)

Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2

Guan and Loh Block Bootstrap Variance Estimation 1385

minus (λn)2

|Dn|1

k2n

knsum

i=1

knsum

j1=1

knsum

j2=1

int int

Dil(n)

Z(s1)Z(s2)

times ϕ(s1 minus s2 + cj1 minus cj2

)ds1 ds2

= minusT 1n minus T 2

n minus T 3n

T 1n goes to 0 due to conditions (2) and (5) For T 2

n and T 3n lengthy yet

elementary derivations yield

|T 2n | le C(λn)2

|kn|knsum

i=1

[1

|Dn|int

DnminusDn

|Dn cap (Dn minus s)||ϕ(s)|ds

minus 1

|Dil(n)

|int

Dil(n)

minusDil(n)

∣∣Dil(n) cap (

Dil(n) minus s

)∣∣|ϕ(s)|ds]

and

T 3n le C(λn)2

kn

1

|Dn|int

Dn

int

Dn

|ϕ(s1 minus s2)|ds1 ds2

Both terms converge to 0 due to conditions (2) (4) (5) and (6) Thus(B1) is proven Furthermore we can see that both T 1

n and T 3n are both

of order 1kn whereas T 2n is of order 1|Dl(n)|12 due to condition

(12)To prove (B2) we first note that [var(Sb

n | cap Dn)]2 is equal to thesum of the following three terms

1

k2n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

sum

Dj1l(n)

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj2l(n)

[Z

(x1 minus cj1 + ci1

)

times Z(x2 minus cj1 + ci1

)Z

(x3 minus cj2 + ci2

)Z

(x4 minus cj2 + ci2

)]

minus 2

k3n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

knsum

j3=1

sum

Dj1l(n)

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj3l(n)

[Z

(x1 minus cj1

+ ci1

)Z

(x2 minus cj1 + ci1

)Z

(x3 minus cj2 + ci2

)

times Z(x4 minus cj3 + ci2

)]

and

1

k4n

knsum

i1=1

knsum

i2=1

knsum

j1=1

knsum

j2=1

knsum

j3=1

knsum

j4=1

sum

Dj1l(n)

sum

Dj2l(n)

sum

Dj3l(n)

sum

Dj4l(n)

[Z

(x1 minus cj1

+ ci1

)Z

(x2 minus cj2 + ci1

)Z

(x3 minus cj3 + ci2

)Z

(x4 minus cj4 + ci2

)]

Furthermore the following relationships are possible for x1 x4

a x1 = x2 = x3 = x4b x1 = x2 and x3 = x4 but x1 = x3 or x1 = x3 and x2 = x4 but

x1 = x2c x1 = x2 but x2 = x3 = x4 or x3 = x4 but x1 = x2 = x3d x1 = x2 = x3 but x3 = x4 or x1 = x2 = x3 but x1 = x4 or x2 =

x3 = x4 but x1 = x2e x1 = x2 = x3 = x4

When calculating the variance of 1|Dn| var(Sb

n | cap Dn) the forego-ing relationships combined with the expressions for the squared valueof 1

|Dn|E[var(Sbn | cap Dn)] in turn lead to integrals of different com-

plexity To save space we present only the integral from term e theremaining integrals can be shown to have order no higher than 1kn ina similar manner

Let F1(s1 s2 s3 s4) = g4(s1 s2 s3 s4) minus g(s1 s2)g(s3 s4)Term e leads to the following integral

(λn)4

k2n|Dn|2

knsum

i1=1


To show the first expression, note that $E[U^{(2)}_{na}(\beta_0)] = 0$, so we only need to examine the variance term:

\[
\operatorname{var}[U^{(2)}_{na}(\beta_0)]
= \frac{1}{|D_n|^2}\int\!\!\int \frac{\lambda^{(2)}(s_1,\beta_0)\,\lambda^{(2)}(s_2,\beta_0)}{\lambda(s_1,\beta_0)\,\lambda(s_2,\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2
+ \frac{1}{|D_n|^2}\int \frac{[\lambda^{(2)}(s,\beta_0)]^2}{\lambda(s,\beta_0)}\,ds,
\]

which converges to 0 due to conditions (4)-(6). Thus $U^{(2)}_{na}(\beta_0)\stackrel{p}{\to}0$.

To show the second expression, note that $E[U^{(2)}_{nb}(\beta_0)] = \frac{1}{|D_n|}\int [\lambda^{(1)}(s,\beta_0)]^2/\lambda(s,\beta_0)\,ds$. Thus, again, we need to consider only the variance term:

\[
\operatorname{var}[U^{(2)}_{nb}(\beta_0)]
= \frac{1}{|D_n|^2}\int\!\!\int \frac{[\lambda^{(1)}(s_1,\beta_0)\,\lambda^{(1)}(s_2,\beta_0)]^2}{\lambda(s_1,\beta_0)\,\lambda(s_2,\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2
+ \frac{1}{|D_n|^2}\int \frac{[\lambda^{(1)}(s,\beta_0)]^4}{\lambda(s,\beta_0)}\,ds,
\]

which converges to 0 due to conditions (4)-(6). Thus $U^{(2)}_{nb}(\beta_0)\stackrel{p}{\to} \frac{1}{|D_n|}\int [\lambda^{(1)}(s,\beta_0)]^2/\lambda(s,\beta_0)\,ds$.

Now we want to show that $|D_n|^{1/2}U^{(1)}_n(\beta_0)$ converges to a normal distribution. First, we derive the mean and variance of $|D_n|^{1/2}U^{(1)}_n(\beta_0)$. Clearly, $E[|D_n|^{1/2}U^{(1)}_n(\beta_0)] = 0$. For the variance,

\[
\operatorname{var}[|D_n|^{1/2}U^{(1)}_n(\beta_0)]
= \frac{1}{|D_n|}\int\!\!\int \frac{\lambda^{(1)}(s_1,\beta_0)\,\lambda^{(1)}(s_2,\beta_0)}{\lambda(s_1,\beta_0)\,\lambda(s_2,\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2
+ \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s,\beta_0)]^2}{\lambda(s,\beta_0)}\,ds.
\]

Now consider a new sequence of regions $D^*_n$, where $D^*_n\subset D_n$ and $|D^*_n|/|D_n|\to 1$ as $n\to\infty$. Let $U^{(1)}_{n*}(\beta_0)$ be the estimating function for the realization of the point process on $D^*_n$. Based on the expression for the variance, we can deduce that

\[
\operatorname{var}\bigl[|D_n|^{1/2}U^{(1)}_n(\beta_0) - |D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0)\bigr]\to 0 \quad\text{as } n\to\infty.
\]

Thus

\[
|D_n|^{1/2}U^{(1)}_n(\beta_0) \sim |D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0),
\]

where the notation $a_n\sim b_n$ means that $a_n$ and $b_n$ have the same limiting distribution. Thus we need only show that $|D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0)$ converges to a normal distribution for some properly defined $D^*_n$.

To obtain a sequence of $D^*_n$, we apply the blocking technique used by Guan et al. (2004). Let $l(n) = n^{\alpha}$ and $m(n) = n^{\alpha} - n^{\eta}$ for $4/(2+\epsilon) < \eta < \alpha < 1$, where $\epsilon$ is defined as in condition (3). We first divide the original domain $D_n$ into nonoverlapping $l(n)\times l(n)$ subsquares $D^i_{l(n)}$, $i = 1,\ldots,k_n$. Within each subsquare, we further obtain $D^i_{m(n)}$, an $m(n)\times m(n)$ square sharing the same center with $D^i_{l(n)}$. Note that $d(D^i_{m(n)}, D^j_{m(n)})\ge n^{\eta}$ for $i\ne j$. Now define

\[
D^*_n = \bigcup_{i=1}^{k_n} D^i_{m(n)}.
\]

Condition (2) implies that $|D^*_n|/|D_n|\to 1$. Thus we need only show that $|D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0)$ converges to a normal distribution. This holds due to the mixing condition, the result in Lemma A.1, and an application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004), so we omit the details here.
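The construction of $D^*_n$ is purely geometric, and it may help to see it spelled out. The following Python sketch is not part of the paper; it is a rough illustration, with the hypothetical helper name blocking_squares, assuming a square $n\times n$ window anchored at the origin. It enumerates the inner $m(n)\times m(n)$ squares whose union forms $D^*_n$:

```python
def blocking_squares(n, alpha, eta):
    """Enumerate the inner m(n) x m(n) squares of the blocking scheme:
    the n x n window is cut into nonoverlapping l(n) x l(n) subsquares,
    and each contributes a concentric inner square; their union is D*_n."""
    l = n ** alpha             # l(n) = n^alpha, subsquare side
    m = n ** alpha - n ** eta  # m(n) = n^alpha - n^eta, inner side
    k_side = int(n // l)       # subsquares per axis, so k_n = k_side ** 2
    pad = (l - m) / 2.0        # margin; inner squares end up >= n^eta apart
    squares = []
    for i in range(k_side):
        for j in range(k_side):
            x0, y0 = i * l + pad, j * l + pad  # lower-left corner
            squares.append((x0, y0, x0 + m, y0 + m))
    return squares

# Example: n = 100, alpha = 0.5, eta = 0.3 gives 10 x 10 subsquares, each
# holding an inner square of side 10 - 100**0.3 (about 6.0).
inner = blocking_squares(100.0, 0.5, 0.3)
```

The margin of width $(l(n)-m(n))/2 = n^{\eta}/2$ on each side is what guarantees the separation $d(D^i_{m(n)}, D^j_{m(n)})\ge n^{\eta}$ exploited by the mixing condition.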

APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let $\varphi(u) = g(u) - 1$, $\varphi_k(s_1,\ldots,s_k) = Q_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$, and $g_k(s_1,\ldots,s_k) = \lambda_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ for $k = 3$ and $4$, and let $Z(s) = \lambda^{(1)}(s)$. As in the proof of Theorem 1, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. As a result, $Z(s)$ is a scalar, so that $\operatorname{cov}(S_n) = \operatorname{var}(S_n)$. Furthermore, we consider only the case where $K = 1$. Thus the covariance estimator defined in (11) becomes

\[
\widehat{\operatorname{var}}(S_n) = \sum_{b=1}^{B} \frac{(S^b_n - \bar S_n)^2}{B - 1}.
\]
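This estimator is just the sample variance of the $B$ block bootstrap replicates. A minimal Python sketch (the function name and replicate values are illustrative, not from the paper):

```python
def bootstrap_variance(replicates):
    """Sample variance of the bootstrap statistics S^1_n, ..., S^B_n,
    i.e., sum_b (S^b_n - S_bar_n)^2 / (B - 1)."""
    B = len(replicates)
    s_bar = sum(replicates) / B
    return sum((s - s_bar) ** 2 for s in replicates) / (B - 1)

# Illustration with made-up replicate values:
print(bootstrap_variance([0.9, 1.1, 1.0, 1.2]))  # ~0.0167
```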

As $B$ goes to infinity, the estimator $\widehat{\operatorname{var}}(S_n)$ converges to

\[
\operatorname{var}(S^b_n \mid \Xi\cap D_n)
= \frac{1}{k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1\in D^j_{l(n)}}\sum_{x_2\in D^j_{l(n)}}
Z(x_1 - c_j + c_i)\,Z(x_2 - c_j + c_i)
- \frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_2}_{l(n)}}
Z(x_1 - c_{j_1} + c_i)\,Z(x_2 - c_{j_2} + c_i),
\]

where the notation $\sum_{x\in D}$ is short for $\sum_{x\in D\cap\Xi}$, with $\Xi$ denoting the thinned process defined in (7). We wish to show that $\operatorname{var}(S^b_n\mid \Xi\cap D_n)/|D_n|$ converges in $L_2$ to

\[
\frac{1}{|D_n|}\operatorname{var}(S_n)
= \frac{(\lambda_n)^2}{|D_n|}\int_{D_n}\int_{D_n} Z(s_1)\,Z(s_2)\,\varphi(s_1 - s_2)\,ds_1\,ds_2
+ \frac{\lambda_n}{|D_n|}\int_{D_n} [Z(s)]^2\,ds.
\]
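As an aside (not part of the proof), the conditional variance above can be computed directly from its closed form: because the double point sums factor into per-block totals of $Z$ over recentered points, the expression reduces to first and second moments of block totals. A minimal Python sketch under illustrative assumptions, with hypothetical names (blocks[j] holds the retained points of $D^j_{l(n)}$, centers[j] is the block center $c_j$):

```python
def conditional_bootstrap_variance(blocks, centers, Z):
    """B -> infinity limit of the bootstrap variance given the thinned
    pattern. Uses the factorization: the double sum over x1, x2 in block j,
    both recentered at c_i, equals the squared block total T(j, i)."""
    k = len(blocks)

    def T(j, i):
        # Sum of Z over the points of block j, translated from c_j to c_i.
        cj, ci = centers[j], centers[i]
        return sum(Z((x[0] - cj[0] + ci[0], x[1] - cj[1] + ci[1]))
                   for x in blocks[j])

    first = sum(T(j, i) ** 2 for i in range(k) for j in range(k)) / k
    second = sum(sum(T(j, i) for j in range(k)) ** 2 for i in range(k)) / k ** 2
    return first - second

# Toy illustration with two blocks on [0, 2] x [0, 1] and Z(s) = s_x + s_y:
blocks = [[(0.2, 0.3), (0.7, 0.8)], [(1.4, 0.5)]]
centers = [(0.5, 0.5), (1.5, 0.5)]
print(conditional_bootstrap_variance(blocks, centers, lambda s: s[0] + s[1]))
```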

To show this convergence, we need to show that

\[
\frac{1}{|D_n|}E[\operatorname{var}(S^b_n\mid \Xi\cap D_n)] \to \frac{1}{|D_n|}\operatorname{var}(S_n) \tag{B.1}
\]

and

\[
\frac{1}{|D_n|^2}\operatorname{var}[\operatorname{var}(S^b_n\mid \Xi\cap D_n)] \to 0. \tag{B.2}
\]

To show (B.1), note that

\[
E[\operatorname{var}(S^b_n\mid \Xi\cap D_n)]
= (\lambda_n)^2 \sum_{i=1}^{k_n}\int\!\!\int_{D^i_{l(n)}} Z(s_1)\,Z(s_2)\,g(s_1 - s_2)\,ds_1\,ds_2
+ \frac{\lambda_n (k_n - 1)}{k_n}\int_{D_n} [Z(s)]^2\,ds
- \frac{(\lambda_n)^2}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\int\!\!\int_{D^i_{l(n)}} Z(s_1)\,Z(s_2)\,g(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2.
\]

Thus

\[
\frac{1}{|D_n|}\bigl\{E[\operatorname{var}(S^b_n\mid \Xi\cap D_n)] - \operatorname{var}(S_n)\bigr\}
= -\frac{\lambda_n}{k_n|D_n|}\int_{D_n}[Z(s)]^2\,ds
- \frac{(\lambda_n)^2}{|D_n|}\mathop{\sum\sum}_{i\ne j}\int_{D^i_{l(n)}}\int_{D^j_{l(n)}} Z(s_1)\,Z(s_2)\,\varphi(s_1 - s_2)\,ds_1\,ds_2
- \frac{(\lambda_n)^2}{|D_n|}\,\frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\int\!\!\int_{D^i_{l(n)}} Z(s_1)\,Z(s_2)\,\varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2
= -T^1_n - T^2_n - T^3_n.
\]

$T^1_n$ goes to 0 due to conditions (2) and (5). For $T^2_n$ and $T^3_n$, lengthy yet elementary derivations yield

\[
|T^2_n| \le \frac{C(\lambda_n)^2}{k_n}\sum_{i=1}^{k_n}
\biggl[\frac{1}{|D_n|}\int_{D_n - D_n} |D_n\cap(D_n - s)|\,|\varphi(s)|\,ds
- \frac{1}{|D^i_{l(n)}|}\int_{D^i_{l(n)} - D^i_{l(n)}} \bigl|D^i_{l(n)}\cap(D^i_{l(n)} - s)\bigr|\,|\varphi(s)|\,ds\biggr]
\]

and

\[
T^3_n \le \frac{C(\lambda_n)^2}{k_n}\,\frac{1}{|D_n|}\int_{D_n}\int_{D_n} |\varphi(s_1 - s_2)|\,ds_1\,ds_2.
\]

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that $T^1_n$ and $T^3_n$ are both of order $1/k_n$, whereas $T^2_n$ is of order $1/|D_{l(n)}|^{1/2}$ due to condition (12).

To prove (B.2), we first note that $[\operatorname{var}(S^b_n\mid \Xi\cap D_n)]^2$ is equal to the sum of the following three terms:

\[
\frac{1}{k_n^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_1}_{l(n)}}\sum_{x_3\in D^{j_2}_{l(n)}}\sum_{x_4\in D^{j_2}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_1} + c_{i_1})\,Z(x_3 - c_{j_2} + c_{i_2})\,Z(x_4 - c_{j_2} + c_{i_2})\bigr],
\]

\[
-\frac{2}{k_n^3}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}
\sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_1}_{l(n)}}\sum_{x_3\in D^{j_2}_{l(n)}}\sum_{x_4\in D^{j_3}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_1} + c_{i_1})\,Z(x_3 - c_{j_2} + c_{i_2})\,Z(x_4 - c_{j_3} + c_{i_2})\bigr],
\]

and

\[
\frac{1}{k_n^4}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}
\sum_{x_1\in D^{j_1}_{l(n)}}\sum_{x_2\in D^{j_2}_{l(n)}}\sum_{x_3\in D^{j_3}_{l(n)}}\sum_{x_4\in D^{j_4}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_2} + c_{i_1})\,Z(x_3 - c_{j_3} + c_{i_2})\,Z(x_4 - c_{j_4} + c_{i_2})\bigr].
\]

Furthermore, the following relationships are possible for $x_1,\ldots,x_4$:

a. $x_1 = x_2 = x_3 = x_4$;
b. $x_1 = x_2$ and $x_3 = x_4$ but $x_1 \ne x_3$, or $x_1 = x_3$ and $x_2 = x_4$ but $x_1 \ne x_2$;
c. $x_1 = x_2$ but $x_2 \ne x_3 \ne x_4$, or $x_3 = x_4$ but $x_1 \ne x_2 \ne x_3$;
d. $x_1 = x_2 = x_3$ but $x_3 \ne x_4$, or $x_1 = x_2 = x_4$ but $x_1 \ne x_3$, or $x_2 = x_3 = x_4$ but $x_1 \ne x_2$;
e. $x_1 \ne x_2 \ne x_3 \ne x_4$, that is, all four points are distinct.

When calculating the variance of $\frac{1}{|D_n|}\operatorname{var}(S^b_n\mid \Xi\cap D_n)$, the foregoing relationships, combined with the expressions for the squared value of $\frac{1}{|D_n|}E[\operatorname{var}(S^b_n\mid \Xi\cap D_n)]$, in turn lead to integrals of differing complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than $1/k_n$ in a similar manner.

Let $F_1(s_1,s_2,s_3,s_4) = g_4(s_1,s_2,s_3,s_4) - g(s_1 - s_2)\,g(s_3 - s_4)$. Term e leads to the following integral:

\[
\frac{(\lambda_n)^4}{k_n^2|D_n|^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}
\int_{D^{j_1}_{l(n)}} Z(s_1 - c_{j_1} + c_{i_1})
\times\Biggl[\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}}
Z(s_2 - c_{j_1} + c_{i_1})\,Z(s_3 - c_{j_2} + c_{i_2})\,Z(s_4 - c_{j_2} + c_{i_2})\,
F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4
- \frac{2}{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}
\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}}
Z(s_2 - c_{j_1} + c_{i_1})\,Z(s_3 - c_{j_2} + c_{i_2})\,Z(s_4 - c_{j_3} + c_{i_2})\,
F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4
+ \frac{1}{k_n^2}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}
\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}}\int_{D^{j_4}_{l(n)}}
Z(s_2 - c_{j_2} + c_{i_1})\,Z(s_3 - c_{j_3} + c_{i_2})\,Z(s_4 - c_{j_4} + c_{i_2})\,
F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4\Biggr]\,ds_1.
\]

Let $F_2(s_1,s_2,s_3,s_4) = \varphi_4(s_1,s_2,s_3,s_4) + 2\varphi_3(s_1,s_2,s_3) + 2\varphi_3(s_1,s_3,s_4) + 2\varphi(s_1 - s_3)\,\varphi(s_2 - s_4)$. By noting that $Z(\cdot)$ is bounded, we can derive through lengthy yet elementary algebra that the foregoing is bounded by

\[
\frac{2C(\lambda_n)^4}{k_n|D_n|^2}\sum_{j_1=1}^{k_n}
\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D_n}\int_{D_n}
|F_2(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4
+ \frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}}
|F_1(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4.
\]

The first integral in the foregoing is of order $1/k_n$ due to conditions (4), (5), and (6). The second integral is bounded by

\[
\frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}}
|\varphi_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4
+ \frac{4C(\lambda_n)^4}{k_n|D_n|}\sum_{j_2=1}^{k_n}
\int_{D_n}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}}
|\varphi_3(s_1,s_3,s_4)|\,ds_1\,ds_3\,ds_4
+ \frac{2C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\biggl[\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi(s_1 - s_3)|\,ds_1\,ds_3\biggr]^2.
\]

The foregoing is of order $1/k_n + a_n$ due to conditions (4), (5), (6), and (12), where the order of $a_n$ is no higher than $1/|D_{l(n)}|$.

APPENDIX C: JUSTIFICATION FOR USING $\hat\beta_n$ IN THE THINNING STEP

Here we assume $|D_{l(n)}| = o(|D_n|^{1/2})$. Let $p_n(x;\theta) = \min_{s\in D_n}\lambda(s;\theta)/\lambda(x;\theta)$, and let $r(x)$ be a uniform random variable in $[0,1]$. If $r(x)\le p_n(x;\theta)$, where $x\in(N\cap D_n)$, then $x$ is retained in the thinned process, say $\Xi(\theta)$. Based on the same set $\{r(x): x\in(N\cap D_n)\}$, we can determine $\Xi(\theta_0)$ and $\Xi(\hat\theta_n)$. Note that $\Xi(\theta_0)$ is as defined in (7) using the true FOIF. Define $a = \{x\in\Xi(\theta_0): x\notin\Xi(\hat\theta_n)\}$ and $b = \{x\notin\Xi(\theta_0): x\in\Xi(\hat\theta_n)\}$. Note that $P(x\in a\cup b\mid x\in N)\le |p_n(x;\hat\theta_n) - p_n(x;\theta_0)|$.
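This coupling is easy to mimic in code: one uniform draw $r(x)$ per point is shared across parameter values, so $\Xi(\theta_0)$ and $\Xi(\hat\theta_n)$ differ only where the retention probabilities disagree. Below is a minimal Python sketch under illustrative assumptions, not the paper's implementation: the intensity values are hand-picked stand-ins for $\lambda(\cdot;\theta_0)$ and $\lambda(\cdot;\hat\theta_n)$, and the minimum over $D_n$ is replaced by the minimum over the listed values.

```python
import random

def thin(points, r, p):
    """Retain x when r[x] <= p(x); reusing the same uniforms r couples
    the thinned processes obtained under different parameter values."""
    return [x for x in points if r[x] <= p(x)]

points = [(0.2, 0.4), (1.5, 0.9), (2.8, 2.1)]
r = {x: random.random() for x in points}  # one uniform draw per point

lam0 = {points[0]: 1.0, points[1]: 2.0, points[2]: 4.0}  # lambda(x; theta_0)
lamn = {points[0]: 1.1, points[1]: 1.9, points[2]: 4.2}  # lambda(x; theta_n-hat)

def p0(x):  # p_n(x; theta_0), with min over the listed values for brevity
    return min(lam0.values()) / lam0[x]

def pn(x):  # p_n(x; theta_n-hat)
    return min(lamn.values()) / lamn[x]

xi0, xin = thin(points, r, p0), thin(points, r, pn)

# Points in exactly one of xi0, xin make up the set a U b; each point lands
# there with probability at most |p_n(x; theta_n-hat) - p_n(x; theta_0)|.
sym_diff = set(xi0) ^ set(xin)
```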

We need to show that

\[
\frac{1}{|D_n|}\bigl\{\operatorname{var}[S^b_n\mid \Xi(\theta_0)\cap D_n] - \operatorname{var}[S^b_n\mid \Xi(\hat\theta_n)\cap D_n]\bigr\}\stackrel{p}{\to}0,
\]

where $\operatorname{var}[S^b_n\mid \Xi(\theta_0)\cap D_n]$ is $\operatorname{var}(S^b_n\mid \Xi\cap D_n)$ as defined in Appendix B, and $\operatorname{var}[S^b_n\mid \Xi(\hat\theta_n)\cap D_n]$ is defined analogously.

Let $\sum_{D}$ and $\sum\sum^{\ne}_{D}$ denote

\[
\sum_{x\in D\cap(a\cup b)} \qquad\text{and}\qquad \sum\sum_{x_1\in D\cap N,\; x_2\in D\cap(a\cup b),\; x_1\ne x_2},
\]

respectively.

Note that

\[
\frac{1}{|D_n|}\Bigl|\operatorname{var}[S^b_n\mid \Xi(\theta_0)\cap D_n] - \operatorname{var}[S^b_n\mid \Xi(\hat\theta_n)\cap D_n]\Bigr|
< \frac{C}{|D_n|k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1\in D^j_{l(n)}\cap N}\;\sum_{D^j_{l(n)}} 1
+ \frac{C}{|D_n|k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1\in D^{j_1}_{l(n)}\cap N}\;\sum_{D^{j_2}_{l(n)}} 1
= \frac{C}{|D_n|}\sum_{j=1}^{k_n}\sum\sum^{\ne}_{D^j_{l(n)}} 1
+ \frac{C}{|D_n|}\sum_{D_n} 1
+ \frac{C}{|D_n|k_n}\Bigl[\sum_{D_n} 1\Bigr]\Bigl[\sum_{x\in D_n\cap N} 1\Bigr].
\]

Note that $P[x\in(a\cup b)\mid x\in N]\le C/|D_n|^{\alpha/2}$ with probability 1 for $0 < \alpha < 1$, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.

[Received December 2006. Revised May 2007.]

REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1-42.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329-350.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.
Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.
Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.
Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.
Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.
Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.
Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810-821.
——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119-125.
Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727-737.
Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709-730.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.
McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393-419.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451-482.
Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.
Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261-275.
Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122-154.
Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43-47.
Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial-Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79-93.
Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509-523.
——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041-1048.
Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.
Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman-Scott Processes," Biometrics, 62, 252-258.
