A Thinned Block Bootstrap Variance Estimation Procedure for Inhomogeneous Spatial Point Patterns
Yongtao GUAN and Ji Meng LOH
When modeling inhomogeneous spatial point patterns, it is of interest to fit a parametric model for the first-order intensity function (FOIF) of the process in terms of some measured covariates. Estimates for the regression coefficients, say $\hat\beta$, can be obtained by maximizing a Poisson maximum likelihood criterion. Little work has been done on the asymptotic distribution of $\hat\beta$ except in some special cases. In this article we show that $\hat\beta$ is asymptotically normal for a general class of mixing processes. To estimate the variance of $\hat\beta$, we propose a novel thinned block bootstrap procedure that assumes that the point process is second-order reweighted stationary. To apply this procedure, only the FOIF, and not any high-order terms of the process, needs to be estimated. We establish the consistency of the resulting variance estimator and demonstrate its efficacy through simulations and an application to a real data example.
KEY WORDS: Block bootstrap; Inhomogeneous spatial point process; Thinning.
1. INTRODUCTION
A main interest when analyzing spatial point pattern data is to model the first-order intensity function (FOIF) of the underlying process that has generated the spatial point pattern. Heuristically, the FOIF is a function that describes the likelihood for an event of the process to occur at a given location (see Sec. 2 for its formal definition). We say a process is homogeneous if its FOIF is a constant, and inhomogeneous otherwise. In practice, we often wish to model the FOIF in relation to some measured covariates. For example, for the Beilschmiedia pendula Lauraceae (BPL) data given in Section 5, we want to model the FOIF in terms of two important variables of landscape features: elevation and gradient. Because the FOIF characterizes the probability of finding a BPL tree at a given location with the associated elevation and gradient values, the study of this function can yield valuable insights into how these landscape features affect the spatial distribution of BPL trees. If a significant relationship can be established, then this will provide evidence in support of the niche assembly theory, which states that different species benefit from different habitats determined by local environmental features (e.g., Waagepetersen 2007).
To model the FOIF, we assume that it can be expressed as a parametric function of the available covariates. Specifically, let $N$ denote a spatial point process defined over $\mathbb{R}^2$ with the FOIF at $s \in \mathbb{R}^2$ given by $\lambda(s;\beta)$, where $\beta$ is a $p \times 1$ vector of unknown regression coefficients associated with the covariates, and let $D$ be the region in which a realization of $N$ has been observed. Our goal is then to estimate and make inference on the regression coefficients $\beta$.
To estimate $\beta$, we consider the following maximum likelihood criterion:
$$U(\beta) = \frac{1}{|D|} \sum_{x \in D \cap N} \log \lambda(x;\beta) - \frac{1}{|D|} \int_D \lambda(s;\beta)\,ds. \qquad (1)$$
The maximizer of (1) is taken as an estimator for $\beta$ (denoted by $\hat\beta$ throughout this section). Note that (1) is proportional to
Yongtao Guan is Assistant Professor, Division of Biostatistics, Yale University, New Haven, CT 06520 (E-mail: yongtao.guan@yale.edu). Ji Meng Loh is Associate Professor, Department of Statistics, Columbia University, New York, NY 10027. Guan's research was supported in part by National Science Foundation (NSF) grant DMS-0706806. Ji Meng Loh's research was supported in part by NSF grant AST-0507687. The authors thank Steve Rathbun, Mike Sherman, and Rasmus Waagepetersen for helpful discussions, and the joint editors, associate editor, and three referees for their constructive comments that have greatly improved the manuscript.
the true maximum likelihood if $N$ is an inhomogeneous Poisson process, that is, when the events of the process occurring in disjoint sets are completely independent. For the BPL example, however, the locations of the BPL trees are likely to be clustered, possibly due to seed dispersal and/or correlation among environmental factors that have not been accounted for by the model. Schoenberg (2004) showed that under some mild conditions, $\hat\beta$ obtained by maximizing (1) is still consistent for $\beta$ for a class of spatial–temporal point process models even if the process is not Poisson; however, he did not provide the asymptotic distribution of $\hat\beta$. Waagepetersen (2007) significantly extended the scope of the method by deriving asymptotic properties of $\hat\beta$, including asymptotic normality, for a wide class of spatial cluster processes.
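To make the criterion concrete, here is a minimal numerical sketch (not from the paper; the function and variable names are hypothetical) that evaluates (1) for a log-linear FOIF $\lambda(s;\beta) = \exp\{z(s)'\beta\}$, approximating the integral over $D$ by a Riemann sum on a covariate grid:

```python
import numpy as np

def poisson_criterion(beta, events_z, grid_z, cell_area, region_area):
    """Evaluate U(beta) of (1) for a log-linear FOIF
    lambda(s; beta) = exp(z(s)' beta).

    events_z    : (n, p) covariate vectors z(x) at the observed events
    grid_z      : (m, p) covariate vectors on a regular grid covering D
    cell_area   : area of one grid cell (Riemann-sum approximation)
    region_area : |D|
    """
    log_lik_term = np.sum(events_z @ beta)                # sum of log lambda(x)
    integral = np.sum(np.exp(grid_z @ beta)) * cell_area  # approximates the integral
    return (log_lik_term - integral) / region_area
```

In practice one would maximize this over $\beta$ with a numerical optimizer; fitting routines for point process regression (such as ppm in spatstat) do essentially this via a quadrature scheme.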
To make inference on $\beta$, we need information on the distributional properties of $\hat\beta$. One standard approach to this is to derive the limiting distribution of $\hat\beta$ under an appropriate asymptotic framework and then use it as an approximation for the distribution of $\hat\beta$ in a finite-sample setting. We note that currently available asymptotics for inhomogeneous spatial point processes are inadequate, because they either assume complete spatial independence, that is, the process is Poisson (e.g., Rathbun and Cressie 1994), or use a parametric model for the dependence of the process (e.g., Waagepetersen 2007). For data arising from biological studies, such as the BPL data, however, the underlying biological process generating the spatial point patterns is rather complex and often not well understood. Thus using a specific form for the dependence may be debatable and could lead to incorrect inference on the regression parameters $\beta$ (because the distribution of $\hat\beta$ depends on the dependence structure of the process).
In this article we study the distributional properties of $\hat\beta$ under an increasing-domain setting. To quantify the dependence of the process, we use a more flexible model-free mixing condition (Sec. 2) but do not assume any specific parametric structure on the dependence. Our main result shows that under some mild conditions, the standardized distribution of $\hat\beta$ is asymptotically normal. If the variance of $\hat\beta$ is known, then approximate confidence intervals for $\beta$ can be obtained, so that inference on $\beta$ becomes straightforward. Thus our result further extends the scope of applications of (1) beyond the work of Schoenberg (2004) and Waagepetersen (2007).
© 2007 American Statistical Association. Journal of the American Statistical Association, December 2007, Vol. 102, No. 480, Theory and Methods. DOI 10.1198/016214507000000879
One complication in practice is that the variance of $\hat\beta$ is unknown and thus must be estimated. From Theorem 1 in Section 2, we see that the variance of $\hat\beta$ depends on the second-order cumulant function (SOCF) of the process, a function that is related to the dependence structure of the process. To avoid specifying the SOCF, we develop a nonparametric thinned block bootstrap procedure to estimate the variance of $\hat\beta$ by using a combination of a thinning algorithm and block bootstrap. Specifically, in the thinning step we retain (i.e., do not thin) an observed event from $N$ with a probability that is proportional to the inverse of the estimated FOIF at the location of the event. If $N$ is second-order reweighted stationary (see Sec. 2) and if $\hat\beta$ is close to $\beta$, then the thinned process should resemble a second-order stationary (SOS) process. In Section 3 we show that the variance of $\hat\beta$ can be written in terms of the variance of a statistic $S_n$, where $S_n$ is computed from the thinned process and is expressed in terms of the intensity function only. The task of estimating the variance of $\hat\beta$ is thus translated into estimating the variance of a statistic defined on an SOS process, which we can accomplish using block bootstrap. In Section 3 we prove that the resulting variance estimator is $L_2$-consistent for the target variance, and in Section 4 we report a simulation study done to investigate the performance of the proposed procedure. To illustrate its use in a practical setting, we also apply the proposed procedure to the BPL data in Section 5.
Before proceeding to the next section, we note that resampling methods, including the block bootstrap, have been extensively applied in spatial statistics (e.g., Politis 1999; Lahiri 2003). However, most of these methods were developed for stationary quantitative spatial processes that are observed on a regularly spaced grid. In the regression setting, also for quantitative processes, Cressie (1993, sec. 7.3.2) discussed both semiparametric and parametric bootstrap methods to resample the residuals, whereas Sherman (1997) proposed a subsampling approach. Politis and Sherman (2001) and McElroy and Politis (2007) considered using subsampling for marked point processes. The former assumed the point process to be stationary, whereas the latter assumed it to be Poisson. None of the aforementioned resampling procedures can be used for inhomogeneous spatial point processes, due to the unique features of the data. Note that by nature, an inhomogeneous spatial point process is not quantitative, is observed at random locations, is nonstationary, and can be non-Poisson.
2. NOTATION AND PRELIMINARY ASYMPTOTIC RESULTS
Let $N$ be a two-dimensional spatial point process observed over a domain of interest $D$. For a Borel set $B \subset \mathbb{R}^2$, let $|B|$ denote the area of $B$ and let $N(B)$ denote the number of events from $N$ that fall in $B$. We define the $k$th-order intensity and cumulant functions of $N$ as
$$\lambda_k(s_1,\ldots,s_k) = \lim_{|ds_i| \to 0} \frac{E[N(ds_1)\cdots N(ds_k)]}{|ds_1|\cdots|ds_k|}, \qquad i = 1,\ldots,k,$$
and
$$Q_k(s_1,\ldots,s_k) = \lim_{|ds_i| \to 0} \frac{\mathrm{cum}[N(ds_1),\ldots,N(ds_k)]}{|ds_1|\cdots|ds_k|}, \qquad i = 1,\ldots,k.$$
Here $ds$ is an infinitesimal region containing $s$, and $\mathrm{cum}(Y_1,\ldots,Y_k)$ is the coefficient of $i^k t_1 \cdots t_k$ in the Taylor series expansion of $\log E[\exp(i \sum_{j=1}^k Y_j t_j)]$ about the origin (e.g., Brillinger 1975). For the intensity function, $\lambda_k(s_1,\ldots,s_k)|ds_1|\cdots|ds_k|$ is the approximate probability that $ds_1,\ldots,ds_k$ each contains an event. For the cumulant function, $Q_k(s_1,\ldots,s_k)$ describes the dependence among sites $s_1,\ldots,s_k$, where a near-0 value indicates near independence. Specifically, if $N$ is Poisson, then $Q_k(s_1,\ldots,s_k) = 0$ if at least two of $s_1,\ldots,s_k$ are different. The $k$th-order cumulant (intensity) function can be expressed as a function of the intensity (cumulant) functions up to the $k$th order (see Daley and Vere-Jones 1988, p. 147, for details).
We study the large-sample behavior of $\hat\beta$ under an increasing-domain setting, where $\hat\beta$ is obtained by maximizing (1). Specifically, consider a sequence of regions $D_n$. Let $\partial D_n$ denote the boundary of $D_n$, let $|\partial D_n|$ denote the length of $\partial D_n$, and let $\hat\beta_n$ denote $\hat\beta$ obtained over $D_n$. We assume that
$$C_1 n^2 \le |D_n| \le C_2 n^2, \qquad C_1 n \le |\partial D_n| \le C_2 n$$
$$\text{for some } 0 < C_1 \le C_2 < \infty. \qquad (2)$$
Condition (2) requires that $D_n$ must become large in all directions (i.e., the data are truly spatial) and that the boundary must not be too irregular. Many commonly used domain sequences satisfy this condition. To see an example of this, let $A \subset (0,1] \times (0,1]$ be the interior of a simple closed curve with nonempty interior. If we define $D_n$ as $A$ inflated by a factor $n$, then $D_n$ satisfies condition (2). Note that this formulation incorporates a wide variety of shapes, including rectangular and elliptical shapes.
To formally state the large-sample distributional properties of $\hat\beta_n$, we need to quantify the dependence in $N$. We do this by using the model-free strong mixing coefficient (Rosenblatt 1956), defined as
$$\alpha(p;k) \equiv \sup\big\{ |P(A_1 \cap A_2) - P(A_1)P(A_2)| : A_1 \in \mathcal{F}(E_1), A_2 \in \mathcal{F}(E_2),$$
$$E_2 = E_1 + s, \; |E_1| = |E_2| \le p, \; d(E_1,E_2) \ge k \big\}.$$
Here the supremum is taken over all compact and convex subsets $E_1 \subset \mathbb{R}^2$ and over all $s \in \mathbb{R}^2$ such that $d(E_1,E_2) \ge k$, where $d(E_1,E_2)$ is the maximal distance between $E_1$ and $E_2$ (e.g., Guan, Sherman, and Calvin 2006), and $\mathcal{F}(E)$ is the $\sigma$-algebra generated by the random events of $N$ that are in $E$. We assume the following mixing condition on $N$:
$$\sup_p \frac{\alpha(p;k)}{p} = O(k^{-\varepsilon}) \qquad \text{for some } \varepsilon > 2. \qquad (3)$$
Condition (3) states that for any two fixed sets, the dependence between them must decay to 0 at a polynomial rate of the interset distance $k$. The speed at which the dependence decays to 0 also depends on the size of the sets (i.e., $p$). In particular, for a fixed $k$, condition (3) allows the dependence to increase as $p$ increases. Any point process with a finite dependence range, such as the Matérn cluster process (Stoyan and Stoyan 1994), satisfies this condition. Furthermore, it is also satisfied by the log-Gaussian Cox process (LGCP), which is very flexible in modeling environmental data (e.g., Møller, Syversveen, and Waagepetersen 1998), if the correlation of the underlying Gaussian random field decays at a polynomial rate faster than $2 + \varepsilon$ and has a spectral density bounded below from 0. This is due to corollary 2 of Doukhan (1994, p. 59).
In addition to conditions (2) and (3), we also need some mild conditions on the intensity and cumulant functions of $N$. In what follows, let $f^{(i)}(\beta)$ denote the $i$th derivative of $f(\beta)$. We assume that
$$\lambda(s;\beta) \text{ is bounded below from } 0, \qquad (4)$$
$$\lambda^{(2)}(s;\beta) \text{ is bounded and continuous with respect to } \beta, \qquad (5)$$
and
$$\sup_{s_1} \int \cdots \int |Q_k(s_1,\ldots,s_k)|\,ds_2 \cdots ds_k < C \qquad \text{for } k = 2,3,4. \qquad (6)$$
Conditions (4) and (5) are straightforward conditions that can be checked directly for a proposed FOIF model. Condition (6) is a fairly weak condition. It also requires the process $N$ to be weakly dependent, but from a different perspective than (3). In the homogeneous case, (6) is implied by Brillinger mixing, which holds for many commonly used point process models, such as the Poisson cluster process, a class of doubly stochastic Poisson processes (i.e., Cox processes), and certain renewal processes (e.g., Heinrich 1985). In the inhomogeneous case, $Q_k(s_1,\ldots,s_k)$ often can be written as $\lambda(s_1)\cdots\lambda(s_k)\varphi_k(s_2 - s_1,\ldots,s_k - s_1)$, where $\varphi_k(\cdot)$ is a cumulant function for some homogeneous process. Then (6) holds if $\lambda(s)$ is bounded and $\int\cdots\int |\varphi_k(u_2,\ldots,u_k)|\,du_2\cdots du_k < C$. Processes satisfying this condition include, but are not limited to, the LGCP, the inhomogeneous Neyman–Scott process (INSP; Waagepetersen 2007), and any inhomogeneous process obtained by thinning a homogeneous process satisfying this condition.
Theorem 1. Assume that conditions (2)–(6) hold and that $\hat\beta_n$ converges to $\beta_0$ in probability, where $\beta_0$ is the true parameter vector. Then
$$|D_n|^{1/2} (\Sigma_n)^{-1/2} (\hat\beta_n - \beta_0) \xrightarrow{d} N(0, I_p),$$
where $I_p$ is a $p \times p$ identity matrix,
$$\Sigma_n = |D_n| (A_n)^{-1} B_n (A_n)^{-1},$$
$$A_n = \int_{D_n} \frac{\lambda^{(1)}(s;\beta_0)\,[\lambda^{(1)}(s;\beta_0)]'}{\lambda(s;\beta_0)}\,ds,$$
and
$$B_n = A_n + \int\!\!\int_{D_n} \frac{\lambda^{(1)}(s_1;\beta_0)\,[\lambda^{(1)}(s_2;\beta_0)]'}{\lambda(s_1;\beta_0)\,\lambda(s_2;\beta_0)}\, Q_2(s_1,s_2)\,ds_1\,ds_2.$$

Proof. See Appendix A.
We assume in Theorem 1 that $\hat\beta_n$ is a consistent estimator for $\beta_0$. Schoenberg (2004) established the consistency of $\hat\beta_n$ for a class of spatial–temporal processes under some mild conditions. He assumed that the spatial domain was fixed but that the time domain increased to infinity. His results can be directly extended to our case. Therefore, we simply assume consistency in Theorem 1. In connection with previous results, our asymptotic result coincides with that of Rathbun and Cressie (1994) in the inhomogeneous spatial Poisson process case and Waagepetersen (2007) in the INSP case. Furthermore, note that $\Sigma_n$ depends only on the first- and second-order properties of the process.
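For a log-linear model $\lambda(s;\beta) = \exp\{z(s)'\beta\}$, the gradient is $\lambda^{(1)} = \lambda z$, so the integrand of $A_n$ reduces to $\lambda(s)\,z(s)z(s)'$. A small grid-based sketch of the plug-in estimate of $A_n$ (names are hypothetical, not from the paper):

```python
import numpy as np

def A_hat(grid_z, beta_hat, cell_area):
    """Riemann-sum approximation of A_n for a log-linear FOIF.
    Since lambda^(1) = lambda * z, the integrand
    lambda^(1) [lambda^(1)]' / lambda equals lambda * z z'.

    grid_z   : (m, p) covariate vectors on a grid covering D_n
    beta_hat : (p,) fitted regression coefficients
    """
    lam = np.exp(grid_z @ beta_hat)                 # (m,) intensity on the grid
    return (grid_z.T * lam) @ grid_z * cell_area    # (p, p) matrix
```

An estimate of $B_n$ (and hence of $\Sigma_n = |D_n| A_n^{-1} B_n A_n^{-1}$) additionally requires the SOCF, which is what the thinned block bootstrap of Section 3 avoids specifying.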
3. THINNED BLOCK BOOTSTRAP

3.1 The Proposed Method
Theorem 1 states that $|D_n|^{1/2} (\Sigma_n)^{-1/2} (\hat\beta_n - \beta_0)$ converges to a standard multivariate normal distribution as $n$ increases. This provides the theoretical foundation for making inference on $\beta$. In practice, however, $\Sigma_n$ is typically unknown and thus must be estimated. From the definition of $\Sigma_n$ in Theorem 1, we see that two quantities (i.e., $A_n$ and $B_n$) need to be estimated to estimate $\Sigma_n$. Note that $A_n$ depends only on the FOIF and thus can be estimated easily once a FOIF model has been fitted to the data; however, the quantity $B_n$ also depends on the SOCF $Q_2(\cdot)$. Unless an explicit parametric model is assumed for $Q_2(\cdot)$, which (as argued in Sec. 1) may be unrealistic, it is difficult to directly estimate $B_n$. In this section we propose a thinned block bootstrap approach to first estimate a term that is related to $B_n$ and then use it to produce an estimate for $B_n$.
From now on, we assume that the point process $N$ is second-order reweighted stationary (SORWS); that is, there exists a function $g(\cdot)$ defined on $\mathbb{R}^2$ such that $\lambda_2(s_1,s_2) = \lambda(s_1)\lambda(s_2)\,g(s_1 - s_2)$. The function $g(\cdot)$ is often referred to as the pair correlation function (PCF; e.g., Stoyan and Stoyan 1994). Note that our definition of second-order reweighted stationarity is a special case of the original definition given by Baddeley, Møller, and Waagepetersen (2000); specifically, these two definitions coincide if the PCF exists. Examples of SORWS processes include the INSP, the LGCP, and any point process obtained by thinning an SORWS point process (e.g., Baddeley et al. 2000). We also assume that $\beta_0$ is known; thus we suppress the dependence of the FOIF on $\beta$ in this section. In practice, the proposed algorithm can be applied by simply replacing $\beta_0$ with its estimate $\hat\beta$. A theoretical justification for using $\hat\beta$ is given in Appendix C.
The essence of our approach is based on the fact that an SORWS point process can be thinned to be SOS by applying some proper thinning weights to the events (Guan 2007). Specifically, we consider the following thinned process:
$$\tilde N = \Big\{ x : x \in N \cap D_n, \; P(x \text{ is retained}) = \min_{s \in D_n} \lambda(s) \big/ \lambda(x) \Big\}. \qquad (7)$$
Clearly, $\tilde N$ is SOS, because its first- and second-order intensity functions can be written as
$$\lambda_n = \min_{s \in D_n} \lambda(s) \quad \text{and} \quad \lambda_{2,n}(s_1,s_2) = (\lambda_n)^2 g(s_1 - s_2),$$
where $s_1, s_2 \in D_n$. Based on $\tilde N$, we then define the following statistic:
$$S_n = \sum_{x \in \tilde N \cap D_n} \lambda^{(1)}(x). \qquad (8)$$
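A minimal sketch of the thinning step (7) and the statistic (8) for one thinned realization (the interface is hypothetical; in practice `lam_points` would hold the fitted intensity at the events and `lam_min` its minimum over a fine grid covering the domain):

```python
import numpy as np

def thin_and_score(points, lam_points, lam_min, lam1, rng):
    """Apply the thinning rule (7) and compute S_n of (8).

    points     : (n, 2) observed event locations in D_n
    lam_points : (n,) fitted intensity evaluated at the events
    lam_min    : minimum of the fitted intensity over D_n
    lam1       : function mapping (m, 2) locations to (m, p) gradient values
    """
    # retain event x independently with probability lam_min / lambda(x)
    keep = rng.uniform(size=len(points)) < lam_min / lam_points
    thinned = points[keep]
    S_n = lam1(thinned).sum(axis=0)   # S_n = sum of lambda^(1) over retained events
    return thinned, S_n
```

Repeating the call gives the multiple independent thinned realizations used later when averaging over $K$ thinnings.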
Note that $Q_2(s_1,s_2) = \lambda(s_1)\lambda(s_2)[g(s_1 - s_2) - 1]$, because $N$ is SORWS. Thus the covariance matrix of $S_n$, where $S_n$ is defined in (8), is given by
$$\mathrm{cov}(S_n) = \lambda_n \int_{D_n} \lambda^{(1)}(s)\,[\lambda^{(1)}(s)]'\,ds$$
$$+\; (\lambda_n)^2 \int\!\!\int_{D_n} \frac{\lambda^{(1)}(s_1)\,[\lambda^{(1)}(s_2)]'}{\lambda(s_1)\,\lambda(s_2)}\, Q_2(s_1,s_2)\,ds_1\,ds_2. \qquad (9)$$
The first term on the right side of (9) depends only on the FOIF and thus can be estimated once a FOIF model has been fitted. The second term, interestingly, is $(\lambda_n)^2 (B_n - A_n)$. These facts suggest that if we can estimate $\mathrm{cov}(S_n)$, then we can use the relationship between $\mathrm{cov}(S_n)$ and $B_n$ to estimate $B_n$ in an obvious way. Note that $S_n$ is a statistic defined on $\tilde N \cap D_n$, where $\tilde N$ is an SOS process. Thus the problem of estimating $B_n$ becomes a problem of estimating the variance of a statistic calculated from an SOS process, which we can accomplish by using the block bootstrap procedure.
Toward that end, we divide $D_n$ into $k_n$ subblocks of the same shape and size. Let $D^i_{l(n)}$, $i = 1,\ldots,k_n$, denote these subblocks, where $l(n) = cn^\alpha$, for some $c > 0$ and $\alpha \in (0,1)$, signifies the subblock size. For each $D^i_{l(n)}$, let $c_i$ denote the "center" of the subblock, where the "centers" are defined in some consistent way across all of the $D^i_{l(n)}$'s. We then resample with replacement $k_n$ subblocks from all of the available subblocks and "glue" them together so as to create a new region that will be used to approximate $D_n$. We do so a large number of times, say $B$. For the $b$th collection of random subblocks, let $J_b$ be the set of $k_n$ random indices sampled from $\{1,\ldots,k_n\}$ that are associated with the selected subblocks. For each event $x \in D^{J_b(i)}_{l(n)}$, we replace it with a new value $x - c_{J_b(i)} + c_i$, so that all events in $D^{J_b(i)}_{l(n)}$ are properly shifted into $D^i_{l(n)}$. In other words, the $i$th sampled subblock $D^{J_b(i)}_{l(n)}$ will be "glued" at where $D^i_{l(n)}$ was in the original data. We then define
$$S^b_n = \sum_{i=1}^{k_n} \; \sum_{x \in \tilde N \cap D^{J_b(i)}_{l(n)}} \lambda^{(1)}\big(x - c_{J_b(i)} + c_i\big), \qquad (10)$$
where the sum $\sum_{x \in \tilde N \cap D^{J_b(i)}_{l(n)}}$ is over all events in block $D^{J_b(i)}_{l(n)}$ translated into block $D^i_{l(n)}$.
Based on the $S^b_n$, $b = 1,\ldots,B$, thus obtained, we can calculate the sample covariance of the $S^b_n$ and use that as an estimate for $\mathrm{cov}(S_n)$. Note that for a given realization of $N$ on $D_n$, the application of the thinning algorithm given in (7) can yield multiple thinned realizations. This will in turn lead to multiple sample covariance matrices by applying the proposed block bootstrap procedure. One possible solution is to simply average all of the obtained sample covariance matrices so as to produce a single estimate for $\mathrm{cov}(S_n)$. To do this, let $K$ denote the total number of thinned processes, and let $S^b_{n,j}$ and $\bar S_{n,j}$ denote $S^b_n$ and the bootstrapped mean of $S^b_n$, $b = 1,\ldots,B$, for the $j$th thinned process, where $S^b_n$ is as defined in (10). We thus obtain the following estimator for the covariance matrix of $S_n$:
$$\widehat{\mathrm{cov}}(S_n) = \frac{1}{K} \sum_{j=1}^{K} \sum_{b=1}^{B} \frac{(S^b_{n,j} - \bar S_{n,j})(S^b_{n,j} - \bar S_{n,j})'}{B - 1}. \qquad (11)$$
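The resampling step for a single thinned pattern on a square region can be sketched as follows (a simplified illustration assuming square subblocks; all names are hypothetical). Averaging the returned matrix over the $K$ thinned patterns gives (11):

```python
import numpy as np

def block_bootstrap_cov(pts, lam1, L, n_side, B, rng):
    """Block bootstrap estimate of cov(S_n) from one thinned pattern.

    pts    : (n, 2) thinned event locations in the square [0, L]^2
    lam1   : function mapping (m, 2) locations to (m, p) values of lambda^(1)
    n_side : the region is divided into n_side**2 congruent square blocks
    B      : number of bootstrap resamples
    """
    side = L / n_side
    centers = (np.arange(n_side) + 0.5) * side
    cxy = np.array([(cx, cy) for cx in centers for cy in centers])  # block centers
    k = len(cxy)                                   # k_n blocks
    idx = np.minimum((pts // side).astype(int), n_side - 1)
    labels = idx[:, 0] * n_side + idx[:, 1]        # block membership of each event
    p = lam1(pts[:1]).shape[1]
    S = np.empty((B, p))
    for b in range(B):
        J = rng.integers(0, k, size=k)             # resample k blocks with replacement
        total = np.zeros(p)
        for i, j in enumerate(J):                  # glue block j where block i was
            moved = pts[labels == j] - cxy[j] + cxy[i]
            total += lam1(moved).sum(axis=0)
        S[b] = total
    dev = S - S.mean(axis=0)
    return dev.T @ dev / (B - 1)                   # sample covariance of the S_n^b
```

The returned matrix plays the role of one inner sum in (11); the quantities $\hat B_n$ and then $\hat\Sigma_n$ follow from the relationship between $\mathrm{cov}(S_n)$ and $B_n$ given by (9).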
3.2 Consistency and Practical Issues
To study the asymptotic properties of $\widehat{\mathrm{cov}}(S_n)$, we also assume the following condition:
$$\frac{1}{|D_n|^{1/2}} \int_{D_n} \int_{\mathbb{R}^2 \setminus D_n} |g(s_1 - s_2) - 1|\,ds_1\,ds_2 < C, \qquad (12)$$
where $\mathbb{R}^2 \setminus D_n = \{s : s \in \mathbb{R}^2 \text{ but } s \notin D_n\}$. Condition (12) is only slightly stronger than $\int_{\mathbb{R}^2} |g(s) - 1|\,ds < C$, which is implied by condition (6). In the case where the $D_n$'s are sufficiently regular (e.g., a sequence of $n \times n$ squares) such that $|(D_n + s) \cap (\mathbb{R}^2 \setminus D_n)| < C\|s\|\,|D_n|^{1/2}$, condition (12) is implied by the condition $\int_{\mathbb{R}^2} \|s\|\,|g(s) - 1|\,ds < C$, which holds for any process that has a finite range of dependence. Furthermore, it holds for the LGCP if the correlation function of the underlying Gaussian random field generating the intensities decays to 0 at a rate faster than $\|s\|^{-3}$.
Theorem 2. Assume that conditions (2), (4), (5), and (6) hold. Then
$$\frac{1}{|D_n|}\,\widehat{\mathrm{cov}}(S_n) \xrightarrow{L_2} \frac{1}{|D_n|}\,\mathrm{cov}(S_n).$$
If we further assume (12), then $\frac{1}{|D_n|^2} E\big[\widehat{\mathrm{cov}}(S_n) - \mathrm{cov}(S_n)\big]^2 \le C\big[1/k_n + 1/|D_{l(n)}|\big]$ for some $C < \infty$.

Proof. See Appendix B.
Theorem 2 establishes the $L_2$ consistency of the proposed covariance matrix estimator for a wide range of values for the number of thinned processes and the subblock size. But to apply the proposed method in practice, these values must be determined. For the former, the proposed thinning algorithm can be applied as many times as necessary, until a stable estimator is obtained. For the latter, Theorem 2 suggests that the "optimal" rate for the subblock size is $|D_n|^{1/2}$, where the word "optimal" is in the sense of minimizing the mean squared error. This result agrees with the findings on the optimal subblock size for other resampling variance estimators obtained from quantitative processes (e.g., Sherman 1996). If we further assume that the best rate can be approximated by $c_n = c|D_n|^{1/2}$, where $c$ is an unknown constant, then we can adapt the algorithm of Hall and Jing (1996) to estimate $c_n$ and thus determine the "optimal" subblock size.
Specifically, let $D^i_m$, $i = 1,\ldots,k'_n$, be a set of subblocks contained in $D_n$ that have the same size and shape, let $S^i_{m,j}$ be $S_n$ in (8) obtained from the $j$th thinned process on $D^i_m$, and let $\bar S_{m,j}$ be the mean of the $S^i_{m,j}$. We then define the sample variance–covariance matrix of the $S^i_{m,j}$, averaged over the $K$ replicate thinned point processes:
$$\hat\theta_m = \frac{1}{K} \sum_{j=1}^{K} \sum_{i=1}^{k'_n} \frac{(S^i_{m,j} - \bar S_{m,j})(S^i_{m,j} - \bar S_{m,j})'}{k'_n - 1}.$$
If $D^i_m$ is relatively small compared with $D_n$, then the foregoing should be a good estimator for the true covariance matrix of $S_m$, where $S_m$ is $S_n$ in (8) defined on $D_m$. For each $D^i_m$, we then apply (11) to estimate $\mathrm{cov}(S_m)$ over a fine set of candidate subblock sizes, say $c'_m = c'|D^i_m|^{1/2}$. Let $\hat\theta_m(j,k)$ and $[\hat\theta^i_m(j,k)]'$ be the $(j,k)$th elements of $\hat\theta_m$ and of the resulting estimator obtained using $c'_m$. We define the best $c_m$ as the one that has the smallest value of the following criterion:
$$M(c'_m) = \frac{1}{k'_n} \sum_{i=1}^{k'_n} \sum_{j=1}^{p} \sum_{k=1}^{p} \big\{ [\hat\theta^i_m(j,k)]' - \hat\theta_m(j,k) \big\}^2.$$
The "optimal" subblock size for $D_n$, $\hat c_n$, then can be estimated easily using the relationship $\hat c_n = \hat c_m (|D_n|/|D_m|)^{1/2}$.
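The selection rule can be sketched as a simple argmin over candidate sizes (the container format is hypothetical; the entries would come from applying (11) on each subblock $D^i_m$):

```python
import numpy as np

def select_subblock_size(theta_m, theta_by_size):
    """Pick the candidate subblock size minimizing the criterion M(c'_m).

    theta_m       : (p, p) target estimate from the k'_n replicate subblocks
    theta_by_size : dict mapping candidate size c' -> (k'_n, p, p) array of
                    block bootstrap estimates, one per subblock D^i_m
    """
    def M(estimates):
        # mean over subblocks of the summed squared elementwise deviation
        return np.mean(np.sum((estimates - theta_m) ** 2, axis=(1, 2)))
    return min(theta_by_size, key=lambda c: M(theta_by_size[c]))
```

The chosen $\hat c_m$ is then rescaled to the full region via $\hat c_n = \hat c_m (|D_n|/|D_m|)^{1/2}$.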
4. SIMULATIONS

4.1 Log-Gaussian Cox Process
We performed a simulation study to test the results of our theoretical findings. On an observation region $D$ of size $L \times L$, where $L = 1, 2$, and $4$, we simulated 1,000 realizations of an inhomogeneous Poisson process with intensity function $\lambda(s)$ given by
$$\log \lambda(s) = \alpha + \beta X(s) + G(s), \qquad (13)$$
where $G$ is a mean-0 Gaussian random field. This is the LGCP model, with inhomogeneity due to the covariate $X$.

For values of $X$, we used a single realization from a different mean-0 Gaussian random field. We kept this realization fixed throughout the whole simulation study; that is, we used the same $X$ for each of the 1,000 point process realizations. Figure 1 shows a surface plot of $X$ for $L = 4$; the values of $X$ for $L = 1$ and $2$ are the lower-left corner of those shown in the figure. For $G$, we used the exponential model $C(h) = \sigma^2 \exp(-h/\rho)$ for its covariance function, where $\rho = 1$ and $2$, and $\sigma = 1$. We set $\alpha$ and $\beta$ to be 7.02 and 2. We generated the point process patterns using the rpoispp function in the spatstat R package (Baddeley and Turner 2005). The average numbers of points simulated were about 1,200, 4,500, and 16,700 for $L = 1, 2$, and $4$.
For each of the 1,000 point patterns, we obtained the estimates $\hat\alpha$ and $\hat\beta$ by maximizing the Poisson likelihood (using the ppm function in spatstat). Next, we thinned the point pattern according to the criterion described in Section 3, resulting in 20 independent thinned processes for each realization. The average numbers of points in the thinned patterns were roughly 490, 1,590, and 5,400 for $L = 1, 2$, and $4$, so about 30%–40% of the points were retained. We then applied the bootstrap procedure to each of the thinned processes to get the variance estimates for $\hat\alpha$ and $\hat\beta$. From (13), we found that the quantities to be computed with each bootstrap sample were $\sum \lambda(s^*)$ for $\alpha$ and $\sum X(s^*)\lambda(s^*)$ for $\beta$, where $s^*$ represents the locations of the points in a bootstrap sample. We used nonoverlapping square blocks in the bootstrap procedure. For $L = 1$, we used blocks with side lengths 1/4, 1/3, and 1/2, whereas for $L = 2, 4$, we used blocks with side lengths 1/4, 1/3, 1/2, and 1. The number of bootstrap samples was 499.

Figure 1. Image of the covariate $X$ used in the simulation. The range of $X$ is between −0.6066 and 0.5303, where darker colors represent larger values of $X$.
We computed the variances of $\hat\alpha$ and $\hat\beta$ from the bootstrap variances using Theorem 1, (9), and the relationship of $\mathrm{cov}(S_n)$ to $B_n$. We averaged the variance estimates across the independent thinnings. Finally, we constructed nominal 95% confidence intervals, yielding a total of 1,000 confidence intervals for $\alpha$ and $\beta$.
Table 1 gives the empirical coverage of the confidence intervals for $\beta$. The coverage for $\alpha$ (not shown) is similar but slightly lower. For both values of $\rho$, we found that the empirical coverage increased toward the nominal level as the observation region size increased. This increase in coverage agrees with what we would expect from the theory. The empirical coverage for $\rho = 1$ is typically closer to the nominal level than for $\rho = 2$. Note that $\rho = 1$ corresponds to weaker dependence; thus we would expect our method to work better if the dependence in the data were relatively weaker. It is interesting to note that for $L = 2$ and $4$, the best coverage for $\rho = 2$ was achieved by using a block larger than its counterpart for $\rho = 1$. Thus, for errors with longer-range dependence, it appears that the block bootstrap procedure requires larger blocks to work well.
Table 2 gives the bootstrap estimates of the standard errors of $\hat\beta$ for $\rho = 2$, averaged over the 1,000 realizations. The standard deviations of these estimates over the 1,000 realizations are included in parentheses. The "true" standard errors, computed using the estimated regression parameters from the independent realizations, are also given. The table shows that the bootstrap estimates have a negative bias, becoming proportionally closer to the "true" standard errors as the region size $L$ increases.
In summary, we found that our method worked reasonably well when the regression parameters were estimated. We found a slight undercoverage in each instance, but noted that this undercoverage becomes smaller as the sample size increases. We also estimated the standard errors by using the true regression parameters in our thinning step and found that the coverage increased only slightly.
Table 1. Empirical coverage of 95% nominal confidence intervals of β in the LGCP case

                          Block size
  Region size     1/4     1/3     1/2     1
  ρ = 1
    1             .85     .77     .49     —
    2             .91     .90     .88     .49
    4             .93     .93     .93     .90
  ρ = 2
    1             .84     .76     .50     —
    2             .87     .88     .87     .50
    4             .88     .89     .90     .89
Table 2. Estimates of the standard errors of β̂ for ρ = 2 in the LGCP case

                                    Block size
  Region size   True SE   1/4         1/3         1/2         1
  1             .24       .17(.04)    .16(.05)    .13(.06)    —
  2             .12       .10(.02)    .10(.02)    .10(.03)    .08(.03)
  4             .07       .056(.003)  .057(.004)  .060(.006)  .060(.008)
4.2 Inhomogeneous Neyman–Scott Process
Theorem 1 gives the variance in terms of the FOIF and the SOCF. Another approach to variance estimation is to use the plug-in method. As pointed out by one referee, we may simply estimate the PCF or the $K$-function using the original data (Baddeley et al. 2000). This will yield an estimate of the SOCF, which in turn can be plugged into the expression for $B_n$ in Theorem 1 to obtain an estimate for the asymptotic variance. Waagepetersen (2007) studied the performance of this method in the INSP case.
To compare the performance of our method with the plug-in method, we also simulated data from the INSP model given by Waagepetersen (2007). To do so, we first simulated a homogeneous Poisson process with intensity $\kappa = 50$ as the parent process. For each parent, we then generated a Poisson number of offspring. We defined the position of each offspring relative to its parent by a radially symmetric Gaussian random variable (e.g., Diggle 2003). Let $\omega$ denote the standard deviation of the variable. We used $\omega = 0.02, 0.04$, representing relatively strong and weak clustering. Finally, we thinned the offspring, as done by Waagepetersen (2007), by setting the probability of retaining an offspring equal to the intensity at the offspring location divided by the maximum intensity in the study region. For the intensity, we used $\log \lambda(s) = \alpha + \beta X(s)$, where $\alpha$, $\beta$, and $X(s)$ were as defined in the LGCP case.
We simulated 1,000 realizations of the process on both a $1 \times 1$ square and a $2 \times 2$ square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of $\hat\alpha$ and $\hat\beta$, and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters $\kappa$ and $\omega$ by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing
$$\int_0^a \big[\hat K(t)^c - K(t;\kappa,\omega)^c\big]^2\,dt \qquad (14)$$
with respect to $(\kappa,\omega)$ for a specified $a$ and $c$, where $\hat K(t)$ and $K(t;\kappa,\omega)$ are the empirical and theoretical $K$ functions. We used $a = 4\omega$ and $c = 0.25$, then used the estimated values of $\kappa$ and $\omega$ to estimate the standard errors of $\hat\alpha$ and $\hat\beta$. To do this, we used the inhomthomasasympcov function in the InhomCluster R package, which is available on Waagepetersen's website at http://www.math.aau.dk/~rw/sppcode.
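As an illustration of (14), here is a minimal grid-search sketch for the (modified) Thomas process, whose theoretical K-function is $K(t;\kappa,\omega) = \pi t^2 + \{1 - \exp(-t^2/4\omega^2)\}/\kappa$. The empirical $\hat K$ would in practice come from the data (e.g., via an inhomogeneous K-function estimator such as Kinhom in spatstat); all names here are hypothetical:

```python
import numpy as np

def thomas_K(t, kappa, omega):
    """Theoretical K-function of a (modified) Thomas process."""
    return np.pi * t**2 + (1.0 - np.exp(-t**2 / (4.0 * omega**2))) / kappa

def min_contrast(t, K_emp, a, c=0.25,
                 kappas=np.linspace(10.0, 100.0, 91),
                 omegas=np.linspace(0.01, 0.10, 91)):
    """Minimum contrast estimate of (kappa, omega) per (14), with the
    integral over (0, a] replaced by a sum on the grid t."""
    mask = (t > 0) & (t <= a)
    tt, Ke = t[mask], K_emp[mask]
    best, best_val = (None, None), np.inf
    for kappa in kappas:                 # brute-force search over a parameter grid
        for omega in omegas:
            val = np.sum((Ke**c - thomas_K(tt, kappa, omega)**c) ** 2)
            if val < best_val:
                best, best_val = (kappa, omega), val
    return best
```

A derivative-free optimizer would normally replace the brute-force grid; the grid keeps the sketch dependency-free and deterministic.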
Table 3 gives the empirical coverage of the confidence intervals for $\beta$. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for $L = 2$ and $\omega = 0.02$, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for $\omega = 0.02$ than for $\omega = 0.04$, because the former yielded a process with shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters $a$ and $c$ in (14). In the simulation, we also considered using the default value of $a$ given by spatstat; the results, not shown here, suggested that the coverage was often worse than that achieved by our method.
5. AN APPLICATION
We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census, using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), who considered the following FOIF model:

$$\lambda(s) = \exp[\beta_0 + \beta_1 E(s) + \beta_2 G(s)],$$

where E(s) and G(s) are the (estimated) elevation and gradient at location s. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1). Specifically, he obtained the estimates β̂1 = 0.02 and β̂2 = 5.84. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that β2 was significant but β1 was not.
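The fitting step can be sketched as follows. This is an illustrative discretized Poisson-likelihood fit on synthetic covariate surfaces (the surfaces, grid, and random seed are hypothetical; only the slope values 0.02 and 5.84 are borrowed from the fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)
ny, nx, cell = 50, 100, 10.0                 # a 1000 x 500 m plot, 10 m cells
E = rng.normal(0.0, 5.0, ny * nx)            # synthetic (centered) elevation
G = rng.gamma(2.0, 0.05, ny * nx)            # synthetic gradient
X = np.column_stack([np.ones(ny * nx), E, G])
beta_true = np.array([-6.5, 0.02, 5.84])     # slopes borrowed from the fitted model
counts = rng.poisson(np.exp(X @ beta_true) * cell**2)

# Maximize the discretized Poisson likelihood
#   sum_x log lambda(x; beta) - int_D lambda(s; beta) ds
# by damped Newton-Raphson (equivalent to a Poisson GLM with an area offset).
beta = np.zeros(3)
for _ in range(100):
    mu = np.exp(X @ beta) * cell**2          # expected count per cell
    score = X.T @ (counts - mu)
    info = X.T @ (X * mu[:, None])           # Fisher information
    step = np.linalg.solve(info, score)
    if np.max(np.abs(step)) > 1.0:           # damp large early steps
        step /= np.max(np.abs(step))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:
        break
```

The recovered `beta` should sit close to `beta_true`; in practice the integral term would be evaluated on the covariate grid actually observed, not on a simulated one.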
Table 3 Empirical coverage of 95 nominal confidence intervalsof β obtained by the thinned block bootstrap method and
the plug-in method in the INSP case
Block size
Region size Plug-in 14 13 12
ω = 021 96 91 89 812 96 94 94 94
ω = 041 93 87 87 772 95 90 90 91
Guan and Loh Block Bootstrap Variance Estimation 1383
Figure 2. Locations of BPL trees.
We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering was due to seed dispersal alone, it probably was due to seed dispersal that had occurred over multiple generations. As a result, the validity of the model, and of the subsequent estimated standard errors and inference on the regression parameters, required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.
To estimate the standard errors for β̂1 and β̂2, we applied (12) with K = 100 and B = 999 and used 200 × 100 m subblocks. We chose K = 100 because the trace plots for the estimated standard errors became fairly stable when 100 or more thinned realizations were used. We selected the subblock size using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for β̂1 and β̂2 using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method, but nevertheless lead to the same conclusion as the plug-in method. Specifically, they suggest that β2 is significant but β1 is not.
Table 4. Results for the BPL data

              Plug-in method                Thinned block bootstrap
        EST     STD     95% CI              STD      95% CI
  β1    0.02    0.02    (−0.02, 0.06)       0.017    (−0.014, 0.054)
  β2    5.84    2.53    (0.89, 10.80)       2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.
From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007) under less restrictive assumptions.
APPENDIX A PROOF OF THEOREM 1
Let β0 represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that β and β0 are both scalars, that is, p = 1. In the case of p > 1, the proof can be easily generalized by applying the Cramér–Wold device.

Lemma A.1. Assume that conditions (4)–(6) hold. Then |Dn|² E[U_n^{(1)}(β0)]⁴ < C for some C < ∞.

Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that λ^{(1)}(s; β0)/λ(s; β0) is bounded due to conditions (4) and (5), we can derive that |Dn|⁴ E[U_n^{(1)}(β0)]⁴ is bounded by the following (ignoring some multiplicative constants):

$$\iiiint |Q_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \Bigg[\iint |Q_2(s_1,s_2)|\,ds_1\,ds_2\Bigg]^2 + \iiint |Q_3(s_1,s_2,s_3)|\,ds_1\,ds_2\,ds_3 + |D_n|\iint |Q_2(s_1,s_2)|\,ds_1\,ds_2 + \iint |Q_2(s_1,s_2)|\,ds_1\,ds_2,$$

where all of the integrals are over Dn. It then follows from condition (6) that the foregoing sum is of order |Dn|². Thus the lemma is proven.
Proof of Theorem 1

First note that

$$U_n^{(1)}(\beta) = \frac{1}{|D_n|}\Bigg[\sum_{x\in N\cap D_n}\frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)} - \int \lambda^{(1)}(s;\beta)\,ds\Bigg]$$

and

$$U_n^{(2)}(\beta) = \frac{1}{|D_n|}\Bigg[\sum_{x\in N\cap D_n}\frac{\lambda^{(2)}(x;\beta)}{\lambda(x;\beta)} - \int \lambda^{(2)}(s;\beta)\,ds\Bigg] - \frac{1}{|D_n|}\sum_{x\in N\cap D_n}\Bigg[\frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)}\Bigg]^2 = U_{na}^{(2)}(\beta) - U_{nb}^{(2)}(\beta).$$

Using the Taylor expansion, we can obtain

$$|D_n|^{1/2}(\hat\beta_n - \beta_0) = -\big[U_n^{(2)}(\tilde\beta_n)\big]^{-1}|D_n|^{1/2}U_n^{(1)}(\beta_0),$$

where β̃n is between β̂n and β0. We need to show that

$$U_{na}^{(2)}(\beta_0)\xrightarrow{p}0 \quad\text{and}\quad U_{nb}^{(2)}(\beta_0)\xrightarrow{p}\frac{1}{|D_n|}\int\frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.$$
1384 Journal of the American Statistical Association December 2007
To show the first expression, note that E[U_{na}^{(2)}(β0)] = 0. Thus we need only look at the variance term:

$$\operatorname{var}\big[U_{na}^{(2)}(\beta_0)\big] = \frac{1}{|D_n|^2}\iint\frac{\lambda^{(2)}(s_1;\beta_0)\lambda^{(2)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int\frac{[\lambda^{(2)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds,$$

which converges to 0 due to conditions (4)–(6). Thus U_{na}^{(2)}(β0) →p 0.

To show the second expression, note that E[U_{nb}^{(2)}(β0)] = (1/|Dn|)∫[λ^{(1)}(s; β0)]²/λ(s; β0) ds. Thus again we need consider only the variance term:

$$\operatorname{var}\big[U_{nb}^{(2)}(\beta_0)\big] = \frac{1}{|D_n|^2}\iint\frac{[\lambda^{(1)}(s_1;\beta_0)\lambda^{(1)}(s_2;\beta_0)]^2}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|^2}\int\frac{[\lambda^{(1)}(s;\beta_0)]^4}{\lambda(s;\beta_0)}\,ds,$$

which converges to 0 due to conditions (4)–(6). Thus U_{nb}^{(2)}(β0) →p (1/|Dn|)∫[λ^{(1)}(s; β0)]²/λ(s; β0) ds.

Now we want to show that |Dn|^{1/2}U_n^{(1)}(β0) converges to a normal distribution. First, we derive the mean and variance of |Dn|^{1/2}U_n^{(1)}(β0). Clearly, E[|Dn|^{1/2}U_n^{(1)}(β0)] = 0. For the variance,

$$\operatorname{var}\big[|D_n|^{1/2}U_n^{(1)}(\beta_0)\big] = \frac{1}{|D_n|}\iint\frac{\lambda^{(1)}(s_1;\beta_0)\lambda^{(1)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2 + \frac{1}{|D_n|}\int\frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.$$

Now consider a new sequence of regions D*_n, where D*_n ⊂ Dn and |D*_n|/|Dn| → 1 as n → ∞. Let U_{n*}^{(1)}(β0) be the estimating function for the realization of the point process on D*_n. Based on the expression of the variance, we can deduce that

$$\operatorname{var}\big[|D_n|^{1/2}U_n^{(1)}(\beta_0) - |D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)\big]\to 0 \quad\text{as } n\to\infty.$$

Thus

$$|D_n|^{1/2}U_n^{(1)}(\beta_0)\sim|D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0),$$

where the notation a_n ∼ b_n means that a_n and b_n have the same limiting distribution. Thus we need only show that |D*_n|^{1/2}U_{n*}^{(1)}(β0) converges to a normal distribution for some properly defined D*_n.

To obtain a sequence of D*_n, we apply the blocking technique used by Guan, Sherman, and Calvin (2004). Let l(n) = n^α and m(n) = n^α − n^η for 4/(2 + ε) < η < α < 1, where ε is defined as in condition (3). We first divide the original domain Dn into nonoverlapping l(n) × l(n) subsquares D^i_{l(n)}, i = 1, ..., kn. Within each subsquare, we further obtain D^i_{m(n)}, an m(n) × m(n) square sharing the same center with D^i_{l(n)}. Note that d(D^i_{m(n)}, D^j_{m(n)}) ≥ n^η for i ≠ j. Now define

$$D_n^* = \bigcup_{i=1}^{k_n} D_{m(n)}^i.$$

Condition (2) implies that |D*_n|/|Dn| → 1. Thus we need only show that |D*_n|^{1/2}U_{n*}^{(1)}(β0) converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and the application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.
APPENDIX B PROOF OF THEOREM 2
To simplify notation, let ϕ(u) = g(u) − 1, ϕ_k(s_1, ..., s_k) = Q_k(s_1, ..., s_k)/[λ(s_1)···λ(s_k)], and g_k(s_1, ..., s_k) = λ_k(s_1, ..., s_k)/[λ(s_1)···λ(s_k)] for k = 3 and 4, and let Z(s) = λ^{(1)}(s). As in the proof of Theorem 1, we assume that β and β0 are both scalars, that is, p = 1. As a result, Z(s) is a scalar, so that cov(Sn) = var(Sn). Furthermore, we consider only the case where K = 1. Thus the covariance estimator defined in (11) becomes

$$\widehat{\operatorname{var}}(S_n) = \sum_{b=1}^{B}\frac{(S_n^b-\bar S_n)^2}{B-1}.$$

As B goes to infinity, we see that the foregoing converges to

$$\operatorname{var}(S_n^b\mid\Psi\cap D_n) = \frac{1}{k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{D_{l(n)}^{j}}\sum_{D_{l(n)}^{j}} Z(x_1-c_j+c_i)Z(x_2-c_j+c_i) - \frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}} Z(x_1-c_{j_1}+c_i)Z(x_2-c_{j_2}+c_i),$$

where Ψ denotes the thinned process and the notation \sum_D is short for \sum_{D\cap\Psi}. We wish to show that var(S_n^b | Ψ ∩ Dn)/|Dn| converges in L2 to

$$\frac{1}{|D_n|}\operatorname{var}(S_n) = \frac{(\lambda_n)^2}{|D_n|}\int_{D_n}\int_{D_n} Z(s_1)Z(s_2)\varphi(s_1-s_2)\,ds_1\,ds_2 + \frac{\lambda_n}{|D_n|}\int_{D_n}[Z(s)]^2\,ds.$$

To show this, we need to show that

$$\frac{1}{|D_n|}E[\operatorname{var}(S_n^b\mid\Psi\cap D_n)]\to\frac{1}{|D_n|}\operatorname{var}(S_n)\tag{B.1}$$

and

$$\frac{1}{|D_n|^2}\operatorname{var}[\operatorname{var}(S_n^b\mid\Psi\cap D_n)]\to 0.\tag{B.2}$$
To show (B.1), note that

$$E[\operatorname{var}(S_n^b\mid\Psi\cap D_n)] = (\lambda_n)^2\sum_{i=1}^{k_n}\iint_{D_{l(n)}^i} Z(s_1)Z(s_2)g(s_1-s_2)\,ds_1\,ds_2 + \frac{\lambda_n(k_n-1)}{k_n}\int_{D_n}[Z(s)]^2\,ds - \frac{(\lambda_n)^2}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\iint_{D_{l(n)}^i} Z(s_1)Z(s_2)g(s_1-s_2+c_{j_1}-c_{j_2})\,ds_1\,ds_2.$$

Thus

$$\frac{1}{|D_n|}\big\{E[\operatorname{var}(S_n^b\mid\Psi\cap D_n)] - \operatorname{var}(S_n)\big\} = -\frac{\lambda_n}{k_n|D_n|}\int_{D_n}[Z(s)]^2\,ds - \frac{(\lambda_n)^2}{|D_n|}\mathop{\sum\sum}_{i\ne j}\int_{D_{l(n)}^i}\int_{D_{l(n)}^j} Z(s_1)Z(s_2)\varphi(s_1-s_2)\,ds_1\,ds_2 - \frac{(\lambda_n)^2}{|D_n|}\frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\iint_{D_{l(n)}^i} Z(s_1)Z(s_2)\varphi(s_1-s_2+c_{j_1}-c_{j_2})\,ds_1\,ds_2 = -T_n^1 - T_n^2 - T_n^3.$$
T_n^1 goes to 0 due to conditions (2) and (5). For T_n^2 and T_n^3, lengthy yet elementary derivations yield

$$|T_n^2|\le\frac{C(\lambda_n)^2}{k_n}\sum_{i=1}^{k_n}\Bigg[\frac{1}{|D_n|}\int_{D_n-D_n}|D_n\cap(D_n-s)||\varphi(s)|\,ds - \frac{1}{|D_{l(n)}^i|}\int_{D_{l(n)}^i-D_{l(n)}^i}\big|D_{l(n)}^i\cap(D_{l(n)}^i-s)\big||\varphi(s)|\,ds\Bigg]$$

and

$$T_n^3\le\frac{C(\lambda_n)^2}{k_n}\,\frac{1}{|D_n|}\int_{D_n}\int_{D_n}|\varphi(s_1-s_2)|\,ds_1\,ds_2.$$

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that T_n^1 and T_n^3 are both of order 1/k_n, whereas T_n^2 is of order 1/|D_{l(n)}|^{1/2} due to condition (12).

To prove (B.2), we first note that [var(S_n^b | Ψ ∩ D_n)]² is equal to the sum of the following three terms:
$$\frac{1}{k_n^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}}\sum_{D_{l(n)}^{j_2}}\big[Z(x_1-c_{j_1}+c_{i_1})Z(x_2-c_{j_1}+c_{i_1})Z(x_3-c_{j_2}+c_{i_2})Z(x_4-c_{j_2}+c_{i_2})\big],$$

$$-\frac{2}{k_n^3}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}}\sum_{D_{l(n)}^{j_3}}\big[Z(x_1-c_{j_1}+c_{i_1})Z(x_2-c_{j_1}+c_{i_1})Z(x_3-c_{j_2}+c_{i_2})Z(x_4-c_{j_3}+c_{i_2})\big],$$

and

$$\frac{1}{k_n^4}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\sum_{D_{l(n)}^{j_1}}\sum_{D_{l(n)}^{j_2}}\sum_{D_{l(n)}^{j_3}}\sum_{D_{l(n)}^{j_4}}\big[Z(x_1-c_{j_1}+c_{i_1})Z(x_2-c_{j_2}+c_{i_1})Z(x_3-c_{j_3}+c_{i_2})Z(x_4-c_{j_4}+c_{i_2})\big].$$

Furthermore, the following relationships are possible for x_1, ..., x_4:

a. x_1 = x_2 = x_3 = x_4;
b. x_1 = x_2 and x_3 = x_4 but x_1 ≠ x_3, or x_1 = x_3 and x_2 = x_4 but x_1 ≠ x_2;
c. x_1 = x_2 but x_2, x_3, x_4 otherwise distinct, or x_3 = x_4 but x_1, x_2, x_3 otherwise distinct;
d. exactly three of x_1, ..., x_4 coincide (e.g., x_1 = x_2 = x_3 but x_3 ≠ x_4);
e. x_1, x_2, x_3, x_4 all distinct.

When calculating the variance of var(S_n^b | Ψ ∩ D_n)/|Dn|, the foregoing relationships, combined with the expression for the squared value of E[var(S_n^b | Ψ ∩ D_n)]/|Dn|, in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than 1/k_n in a similar manner.
Let F_1(s_1, s_2, s_3, s_4) = g_4(s_1, s_2, s_3, s_4) − g(s_1 − s_2)g(s_3 − s_4). Term e leads to the following integral:

$$\frac{(\lambda_n)^4}{k_n^2|D_n|^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\int_{D_{l(n)}^{j_1}} Z(s_1-c_{j_1}+c_{i_1})\times\Bigg[\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}} Z(s_2-c_{j_1}+c_{i_1})Z(s_3-c_{j_2}+c_{i_2})Z(s_4-c_{j_2}+c_{i_2})F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4$$
$$\qquad -\frac{2}{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_3}} Z(s_2-c_{j_1}+c_{i_1})Z(s_3-c_{j_2}+c_{i_2})Z(s_4-c_{j_3}+c_{i_2})F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4$$
$$\qquad +\frac{1}{k_n^2}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_3}}\int_{D_{l(n)}^{j_4}} Z(s_2-c_{j_2}+c_{i_1})Z(s_3-c_{j_3}+c_{i_2})Z(s_4-c_{j_4}+c_{i_2})F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4\Bigg]\,ds_1.$$
Let F_2(s_1, s_2, s_3, s_4) = ϕ_4(s_1, s_2, s_3, s_4) + 2ϕ_3(s_1, s_2, s_3) + 2ϕ_3(s_1, s_3, s_4) + 2ϕ(s_1 − s_3)ϕ(s_2 − s_4). By noting that Z(·) is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

$$\frac{2C(\lambda_n)^4}{k_n|D_n|^2}\sum_{j_1=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_n}\int_{D_n}|F_2(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}}|F_1(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4.$$

The first integral in the foregoing is of order 1/k_n due to conditions (4), (5), and (6). The second integral is bounded by

$$\frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}}|\varphi_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4 + \frac{4C(\lambda_n)^4}{k_n|D_n|}\sum_{j_2=1}^{k_n}\int_{D_n}\int_{D_{l(n)}^{j_2}}\int_{D_{l(n)}^{j_2}}|\varphi_3(s_1,s_3,s_4)|\,ds_1\,ds_3\,ds_4 + \frac{2C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\Bigg[\int_{D_{l(n)}^{j_1}}\int_{D_{l(n)}^{j_2}}|\varphi(s_1-s_3)|\,ds_1\,ds_3\Bigg]^2.$$

The foregoing is of order 1/k_n + a_n due to conditions (4), (5), (6), and (12), where the order of a_n is no higher than 1/|D_{l(n)}|.
APPENDIX C: JUSTIFICATION FOR USING β̂n IN THE THINNING STEP

Here we assume that |D_{l(n)}| = o(|D_n|^{1/2}). Let p_n(x, θ) = min_{s∈D_n} λ(s; θ)/λ(x; θ), and let r(x) be a uniform random variable on [0, 1]. If r(x) ≤ p_n(x, θ), where x ∈ N ∩ D_n, then x will be retained in the thinned process, say Ψ(θ). Based on the same set {r(x): x ∈ N ∩ D_n}, we can determine Ψ(θ_0) and Ψ(θ̂_n). Note that Ψ(θ_0) is as defined in (7) using the true FOIF. Define a = {x ∈ Ψ(θ_0): x ∉ Ψ(θ̂_n)} and b = {x ∉ Ψ(θ_0): x ∈ Ψ(θ̂_n)}. Note that P(x ∈ a ∪ b | x ∈ N) ≤ |p_n(x, θ̂_n) − p_n(x, θ_0)|. We need to show that

$$\frac{1}{|D_n|}\big\{\operatorname{var}[S_n^b\mid\Psi(\theta_0)\cap D_n] - \operatorname{var}[S_n^b\mid\Psi(\hat\theta_n)\cap D_n]\big\}\xrightarrow{p}0,$$

where var[S_n^b | Ψ(θ_0) ∩ D_n] is the quantity var(S_n^b | Ψ ∩ D_n) defined in Appendix B; var[S_n^b | Ψ(θ̂_n) ∩ D_n] is defined analogously.

Let \sum_D and \mathop{\sum\sum}^{\ne}_D denote \sum_{D\cap(a\cup b)} and \sum\sum_{x_1\in D\cap N,\,x_2\in D\cap(a\cup b),\,x_1\ne x_2}. Note that

$$\frac{1}{|D_n|}\big|\operatorname{var}[S_n^b\mid\Psi(\theta_0)\cap D_n] - \operatorname{var}[S_n^b\mid\Psi(\hat\theta_n)\cap D_n]\big| < \frac{C}{|D_n|k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{D_{l(n)}^{j}\cap N}\sum_{D_{l(n)}^{j}}1 + \frac{C}{|D_n|k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{D_{l(n)}^{j_1}\cap N}\sum_{D_{l(n)}^{j_2}}1 = \frac{C}{|D_n|}\sum_{j=1}^{k_n}\mathop{\sum\sum}^{\ne}_{D_{l(n)}^{j}}1 + \frac{C}{|D_n|}\sum_{D_n}1 + \frac{C}{|D_n|k_n}\Bigg[\sum_{D_n}1\Bigg]\Bigg[\sum_{x\in D_n\cap N}1\Bigg].$$

Note that P[x ∈ (a ∪ b) | x ∈ N] ≤ C/|D_n|^{α/2} with probability 1 for 0 < α < 1, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.
[Received December 2006. Revised May 2007.]
REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.
Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.
Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.
Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.
Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.
Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.
Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.
——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.
Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.
Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.
McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.
Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.
Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.
Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.
Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.
Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.
Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.
——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.
Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.
Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 62, 252–258.
One complication in practice is that the variance of β̂ is unknown and thus must be estimated. From Theorem 1 in Section 2, we see that the variance of β̂ depends on the second-order cumulant function (SOCF) of the process, a function that is related to the dependence structure of the process. To avoid specifying the SOCF, we develop a nonparametric thinned block bootstrap procedure to estimate the variance of β̂ by using a combination of a thinning algorithm and block bootstrap. Specifically, in the thinning step we retain (i.e., do not thin) an observed event from N with a probability that is proportional to the inverse of the estimated FOIF at the location of the event. If N is second-order reweighted stationary (see Sec. 2) and if β̂ is close to β, then the thinned process should resemble a second-order stationary (SOS) process. In Section 3 we show that the variance of β̂ can be written in terms of the variance of a statistic Sn, where Sn is computed from the thinned process and is expressed in terms of the intensity function only. The task of estimating the variance of β̂ is thus translated into estimating the variance of a statistic defined on an SOS process, which we can accomplish using block bootstrap. In Section 3 we prove that the resulting variance estimator is L2-consistent for the target variance, and in Section 4 we report a simulation study done to investigate the performance of the proposed procedure. To illustrate its use in a practical setting, we also apply the proposed procedure to the BPL data in Section 5.
Before proceeding to the next section, we note that resampling methods, including the block bootstrap, have been extensively applied in spatial statistics (e.g., Politis 1999; Lahiri 2003). However, most of these methods were developed for stationary quantitative spatial processes that are observed on a regularly spaced grid. In the regression setting, also for quantitative processes, Cressie (1993, sec. 7.3.2) discussed both semiparametric and parametric bootstrap methods to resample the residuals, whereas Sherman (1997) proposed a subsampling approach. Politis and Sherman (2001) and McElroy and Politis (2007) considered using subsampling for marked point processes; the former assumed the point process to be stationary, whereas the latter assumed it to be Poisson. None of the aforementioned resampling procedures can be used for inhomogeneous spatial point processes, due to the unique features of the data. Note that by nature an inhomogeneous spatial point process is not quantitative, is observed at random locations, is nonstationary, and can be non-Poisson.
2. NOTATION AND PRELIMINARY ASYMPTOTIC RESULTS
Let N be a two-dimensional spatial point process observed over a domain of interest D. For a Borel set B ⊂ R², let |B| denote the area of B, and let N(B) denote the number of events from N that fall in B. We define the kth-order intensity and cumulant functions of N as

$$\lambda_k(s_1,\ldots,s_k) = \lim_{|ds_i|\to 0}\frac{E[N(ds_1)\cdots N(ds_k)]}{|ds_1|\cdots|ds_k|},\quad i=1,\ldots,k,$$

and

$$Q_k(s_1,\ldots,s_k) = \lim_{|ds_i|\to 0}\frac{\operatorname{cum}[N(ds_1),\ldots,N(ds_k)]}{|ds_1|\cdots|ds_k|},\quad i=1,\ldots,k.$$

Here ds is an infinitesimal region containing s, and cum(Y_1, ..., Y_k) is the coefficient of i^k t_1···t_k in the Taylor series expansion of log E[exp(i Σ_{j=1}^k Y_j t_j)] about the origin (e.g., Brillinger 1975). For the intensity function, λ_k(s_1, ..., s_k)|ds_1|···|ds_k| is the approximate probability that ds_1, ..., ds_k each contains an event. For the cumulant function, Q_k(s_1, ..., s_k) describes the dependence among sites s_1, ..., s_k, where a near-0 value indicates near independence. Specifically, if N is Poisson, then Q_k(s_1, ..., s_k) = 0 if at least two of s_1, ..., s_k are different. The kth-order cumulant (intensity) function can be expressed as a function of the intensity (cumulant) functions up to the kth order (see Daley and Vere-Jones 1988, p. 147 for details).
We study the large-sample behavior of β̂ under an increasing-domain setting, where β̂ is obtained by maximizing (1). Specifically, consider a sequence of regions Dn. Let ∂Dn denote the boundary of Dn, let |∂Dn| denote the length of ∂Dn, and let β̂n denote β̂ obtained over Dn. We assume that

$$C_1 n^2\le|D_n|\le C_2 n^2,\qquad C_1 n\le|\partial D_n|\le C_2 n,\quad\text{for some } 0<C_1\le C_2<\infty.\tag{2}$$

Condition (2) requires that Dn must become large in all directions (i.e., the data are truly spatial) and that the boundary must not be too irregular. Many commonly used domain sequences satisfy this condition. To see an example, let A ⊂ (0,1] × (0,1] be the interior of a simple closed curve with nonempty interior. If we define Dn as A inflated by a factor n, then Dn satisfies condition (2). Note that this formulation incorporates a wide variety of shapes, including rectangular and elliptical shapes.
To formally state the large-sample distributional properties of β̂n, we need to quantify the dependence in N. We do this by using the model-free strong mixing coefficient (Rosenblatt 1956), defined as

$$\alpha(p;k)\equiv\sup\big\{|P(A_1\cap A_2)-P(A_1)P(A_2)|: A_1\in\mathcal F(E_1),\,A_2\in\mathcal F(E_2),\,E_2=E_1+s,\,|E_1|=|E_2|\le p,\,d(E_1,E_2)\ge k\big\}.$$

Here the supremum is taken over all compact and convex subsets E_1 ⊂ R² and over all s ∈ R² such that d(E_1, E_2) ≥ k, where d(E_1, E_2) is the maximal distance between E_1 and E_2 (e.g., Guan, Sherman, and Calvin 2006), and F(E) is the σ-algebra generated by the random events of N that are in E. We assume the following mixing condition on N:

$$\sup_p\frac{\alpha(p;k)}{p}=O(k^{-\varepsilon})\quad\text{for some }\varepsilon>2.\tag{3}$$
Condition (3) states that for any two fixed sets, the dependence between them must decay to 0 at a polynomial rate in the interset distance k. The speed at which the dependence decays to 0 also depends on the size of the sets (i.e., p); in particular, for a fixed k, condition (3) allows the dependence to increase as p increases. Any point process with a finite dependence range, such as the Matérn cluster process (Stoyan and Stoyan 1994), satisfies this condition. Furthermore, it is also satisfied by the log-Gaussian Cox process (LGCP), which is very flexible in modeling environmental data (e.g., Møller, Syversveen, and Waagepetersen 1998), if the correlation of the underlying Gaussian random field decays at a polynomial rate faster than 2 + ε and has a spectral density bounded below from 0. This is due to corollary 2 of Doukhan (1994, p. 59).
In addition to conditions (2) and (3) we also need some mildconditions on the intensity and cumulant functions of N Inwhat follows let f (i)(β) denote the ith derivative of f (β) Weassume that
λ(sβ) is bounded below from 0 (4)
λ(2)(sβ) is bounded and continuous with respect to β (5)
and
sups1
intmiddot middot middot
int|Qk(s1 sk)|ds2 middot middot middot dsk lt C
for k = 234 (6)
Conditions (4) and (5) are straightforward conditions that canbe checked directly for a proposed FOIF model Condition (6)is a fairly weak condition It also requires the process N to beweakly dependent but from a different perspective from (3)In the homogeneous case (6) is implied by Brillinger mix-ing which holds for many commonly used point process mod-els such as the Poisson cluster process a class of doubly sto-chastic Poisson process (ie Cox process) and certain renewalprocess (eg Heinrich 1985) In the inhomogeneous caseQk(s1 sk) often can be written as λ(s1) middot middot middotλ(sk)ϕk(s2 minuss1 sk minus s1) where ϕk(middot) is a cumulant function for somehomogeneous process Then (6) holds if λ(s) is bounded andint middot middot middot int |ϕk(u2 uk)|du2 middot middot middot duk lt C Processes satisfyingthis condition include but are not limited to the LGCP theinhomogeneous NeymanndashScott process (INSP Waagepetersen2007) and any inhomogeneous process obtained by thinning ahomogeneous process satisfying this condition
Theorem 1. Assume that conditions (2)–(6) hold and that β̂n converges to β0 in probability, where β0 is the true parameter vector. Then

$$|D_n|^{1/2}(\Sigma_n)^{-1/2}(\hat\beta_n-\beta_0)\xrightarrow{d}N(0,I_p),$$

where I_p is a p × p identity matrix,

$$\Sigma_n=|D_n|(A_n)^{-1}B_n(A_n)^{-1},$$

$$A_n=\int_{D_n}\frac{\lambda^{(1)}(s;\beta_0)[\lambda^{(1)}(s;\beta_0)]'}{\lambda(s;\beta_0)}\,ds,$$

and

$$B_n=A_n+\iint_{D_n}\frac{\lambda^{(1)}(s_1;\beta_0)[\lambda^{(1)}(s_2;\beta_0)]'}{\lambda(s_1;\beta_0)\lambda(s_2;\beta_0)}\,Q_2(s_1,s_2)\,ds_1\,ds_2.$$
Proof. See Appendix A.

We assume in Theorem 1 that β̂n is a consistent estimator for β0. Schoenberg (2004) established the consistency of β̂n for a class of spatial–temporal processes under some mild conditions. He assumed that the spatial domain was fixed but that the time domain increased to infinity; his results can be directly extended to our case. Therefore, we simply assume consistency in Theorem 1. In connection with previous results, our asymptotic result coincides with that of Rathbun and Cressie (1994) in the inhomogeneous spatial Poisson process case and with that of Waagepetersen (2007) in the INSP case. Furthermore, note that Σn depends only on the first- and second-order properties of the process.
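As a numerical illustration of the sandwich form Σn = |Dn| A_n⁻¹ B_n A_n⁻¹, the sketch below evaluates A_n and B_n by quadrature for a hypothetical one-covariate model, with a known Thomas-type PCF supplying Q2; every surface and parameter value here is invented for illustration, not taken from the paper:

```python
import numpy as np

n, Lside = 15, 10.0                         # quadrature grid on D = [0, Lside]^2
h = Lside / n
xs = (np.arange(n) + 0.5) * h
S1, S2 = np.meshgrid(xs, xs, indexing="ij")
pts = np.column_stack([S1.ravel(), S2.ravel()])

z = np.sin(2 * np.pi * pts[:, 0] / Lside)   # hypothetical covariate surface
b0, b1 = -2.0, 0.5
lam = np.exp(b0 + b1 * z)
X = np.column_stack([np.ones(len(z)), z])   # lambda^(1)(s) = lam(s) * (1, z(s))'

# A_n = int lambda^(1) [lambda^(1)]' / lambda ds = int lam * x x' ds
A = (X * lam[:, None]).T @ X * h**2

# Q2(s1, s2) = lam(s1) lam(s2) [g(s1 - s2) - 1] with an illustrative PCF
kappa, w = 0.5, 0.8
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
gm1 = np.exp(-d2 / (4 * w**2)) / (4 * np.pi * w**2 * kappa)   # g - 1 >= 0

F = X * lam[:, None]                        # rows: lambda^(1)(s)'
B = A + F.T @ gm1 @ F * h**4                # quadrature for the double integral
Sigma = Lside**2 * np.linalg.solve(A, np.linalg.solve(A, B).T)  # |D| A^-1 B A^-1
```

Because g − 1 ≥ 0 here, B_n − A_n is positive semidefinite, so the clustered process yields a larger asymptotic covariance than the Poisson benchmark |Dn| A_n⁻¹.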
3. THINNED BLOCK BOOTSTRAP

3.1 The Proposed Method
Theorem 1 states that |Dn|^{1/2}(Σn)^{-1/2}(β̂n − β0) converges to a standard multivariate normal distribution as n increases. This provides the theoretical foundation for making inference on β. In practice, however, Σn is typically unknown and thus must be estimated. From the definition of Σn in Theorem 1, we see that two quantities (i.e., An and Bn) need to be estimated in order to estimate Σn. Note that An depends only on the FOIF and thus can be estimated easily once a FOIF model has been fitted to the data; however, the quantity Bn also depends on the SOCF Q2(·). Unless an explicit parametric model is assumed for Q2(·), which (as argued in Sec. 1) may be unrealistic, it is difficult to estimate Bn directly. In this section we propose a thinned block bootstrap approach to first estimate a term that is related to Bn and then use it to produce an estimate for Bn.
From now on, we assume that the point process N is second-order reweighted stationary (SORWS); that is, there exists a function g(·) defined on R² such that λ₂(s_1, s_2) = λ(s_1)λ(s_2)g(s_1 − s_2). The function g(·) is often referred to as the pair correlation function (PCF; e.g., Stoyan and Stoyan 1994). Note that our definition of second-order reweighted stationarity is a special case of the original definition given by Baddeley, Møller, and Waagepetersen (2000); specifically, the two definitions coincide if the PCF exists. Examples of SORWS processes include the INSP, the LGCP, and any point process obtained by thinning an SORWS point process (e.g., Baddeley et al. 2000). We also assume that β0 is known; thus we suppress the dependence of the FOIF on β in this section. In practice, the proposed algorithm can be applied by simply replacing β0 with its estimate β̂. A theoretical justification for using β̂ is given in Appendix C.

The essence of our approach is based on the fact that an SORWS point process can be thinned to be SOS by applying proper thinning weights to the events (Guan 2007). Specifically, we consider the following thinned process:

$$\Psi=\Big\{x: x\in N\cap D_n,\ P(x\text{ is retained})=\min_{s\in D_n}\lambda(s)\big/\lambda(x)\Big\}.\tag{7}$$
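A minimal sketch of the thinning step (7), applied to a synthetic inhomogeneous Poisson pattern (the FOIF and the window are illustrative choices, not from the data example):

```python
import numpy as np

rng = np.random.default_rng(7)

def thin(points, lam_fn, lam_min):
    # Retain x with probability min_s lambda(s) / lambda(x), as in (7).
    keep = rng.uniform(size=len(points)) <= lam_min / lam_fn(points)
    return points[keep]

# Synthetic inhomogeneous Poisson pattern on [0,1]^2 with FOIF 200*exp(x),
# generated by the standard dominating-process construction.
lam_fn = lambda p: 200.0 * np.exp(p[:, 0])
lam_max = 200.0 * np.e
hom = rng.uniform(size=(rng.poisson(lam_max), 2))
pts = hom[rng.uniform(size=len(hom)) <= lam_fn(hom) / lam_max]

lam_min = 200.0                      # min of the FOIF over the unit square
thinned = thin(pts, lam_fn, lam_min)
# The thinned pattern has constant first-order intensity lam_min,
# so E(#thinned) = lam_min * |D| = 200 here.
```

The retained pattern should behave like a homogeneous (indeed SOS) process with intensity min λ, which is what makes the block bootstrap applicable to it.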
Clearly, Ψ is SOS, because its first- and second-order intensity functions can be written as

$$\lambda_n=\min_{s\in D_n}\lambda(s)\quad\text{and}\quad\lambda_{2,n}(s_1,s_2)=(\lambda_n)^2 g(s_1-s_2),$$

where s_1, s_2 ∈ Dn. Based on Ψ, we then define the following statistic:

$$S_n=\sum_{x\in\Psi\cap D_n}\lambda^{(1)}(x).\tag{8}$$

Note that Q_2(s_1, s_2) = λ(s_1)λ(s_2)[g(s_1 − s_2) − 1], because N is SORWS. Thus the covariance matrix of Sn, where Sn is defined in (8), is given by

$$\operatorname{cov}(S_n)=\lambda_n\int_{D_n}\lambda^{(1)}(s)[\lambda^{(1)}(s)]'\,ds+(\lambda_n)^2\iint_{D_n}\frac{\lambda^{(1)}(s_1)[\lambda^{(1)}(s_2)]'}{\lambda(s_1)\lambda(s_2)}\,Q_2(s_1,s_2)\,ds_1\,ds_2.\tag{9}$$
The first term on the right side of (9) depends only on the FOIF and thus can be estimated once a FOIF model has been fitted. The second term, interestingly, is (λn)²(Bn − An). These facts suggest that if we can estimate cov(Sn), then we can use the relationship between cov(Sn) and Bn to estimate Bn in an obvious way. Note that Sn is a statistic defined on Ψ ∩ Dn, where Ψ is an SOS process. Thus the problem of estimating Bn becomes a problem of estimating the variance of a statistic calculated from an SOS process, which we can accomplish by using the block bootstrap procedure.
Toward that end, we divide Dn into kn subblocks of the same shape and size. Let D^i_{l(n)}, i = 1, ..., kn, denote these subblocks, where l(n) = cn^α for some c > 0 and α ∈ (0, 1) signifies the subblock size. For each D^i_{l(n)}, let c_i denote the "center" of the subblock, where the "centers" are defined in some consistent way across all of the D^i_{l(n)}'s. We then resample with replacement kn subblocks from all of the available subblocks and "glue" them together so as to create a new region that will be used to approximate Dn. We do so a large number, say B, of times. For the bth collection of random subblocks, let J_b be the set of kn random indices sampled from {1, ..., kn} that are associated with the selected subblocks. For each event x ∈ D^{J_b(i)}_{l(n)}, we replace it with the new value x − c_{J_b(i)} + c_i, so that all events in D^{J_b(i)}_{l(n)} are properly shifted into D^i_{l(n)}. In other words, the ith sampled subblock D^{J_b(i)}_{l(n)} will be "glued" where D^i_{l(n)} was in the original data. We then define
$$S_n^b=\sum_{i=1}^{k_n}\sum_{x\in\Psi\cap D_{l(n)}^{J_b(i)}}\lambda^{(1)}\big(x-c_{J_b(i)}+c_i\big),\tag{10}$$

where the sum \sum_{x\in\Psi\cap D_{l(n)}^{J_b(i)}} is over all events in block D^{J_b(i)}_{l(n)}, translated into block D^i_{l(n)}.
obtained we can calculate the sample covariance of Sbn and use
that as an estimate for cov(Sn) Note that for a given realizationof N on Dn the application of the thinning algorithm givenin (7) can yield multiple thinned realizations This will in turnlead to multiple sample covariance matrices by applying theproposed block bootstrap procedure One possible solution is tosimply average all of the obtained sample covariance matricesso as to produce a single estimate for cov(Sn) To do this let K
denote the total number of thinned processes and let Sbnj and
Snj denote Sbn and the bootstrapped mean of Sb
n b = 1 B for the j th thinned process where Sb
n is as defined in (10) Wethus obtain the following estimator for the covariance matrixof Sn
cov(Sn) = 1
K
Ksum
j=1
Bsum
b=1
(Sbnj minus Snj )(Sb
nj minus Snj )prime
B minus 1 (11)
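The averaging in (11) itself is straightforward. The sketch below applies it to synthetic replicates standing in for the S^b_{n,j}; the target covariance is invented so that the estimator can be checked against it:

```python
import numpy as np

rng = np.random.default_rng(11)
K, B, p = 100, 999, 2                    # K thinned realizations, B replicates each
true_cov = np.array([[4.0, 1.0], [1.0, 2.0]])
Lc = np.linalg.cholesky(true_cov)
draws = rng.standard_normal((K, B, p)) @ Lc.T   # stand-ins for S^b_{n,j}

def cov_hat(draws):
    # Equation (11): average the K per-realization sample covariances.
    K, B, p = draws.shape
    out = np.zeros((p, p))
    for j in range(K):
        dev = draws[j] - draws[j].mean(axis=0)
        out += dev.T @ dev / (B - 1)
    return out / K

C = cov_hat(draws)
```

Averaging over K reduces the Monte Carlo noise contributed by the thinning step, which is why the trace plots mentioned in Section 5 stabilize as K grows.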
3.2 Consistency and Practical Issues
To study the asymptotic properties of cov̂(Sn), we also assume the following condition:

$$\frac{1}{|D_n|^{1/2}}\int_{D_n}\int_{\mathbb R^2\setminus D_n}|g(s_1-s_2)-1|\,ds_1\,ds_2<C,\tag{12}$$

where R²\Dn = {s: s ∈ R² but s ∉ Dn}. Condition (12) is only slightly stronger than ∫_{R²}|g(s) − 1| ds < C, which is implied by condition (6). In the case where the Dn's are sufficiently regular (e.g., a sequence of n × n squares) such that |(Dn + s) ∩ (R²\Dn)| < C‖s‖|Dn|^{1/2}, condition (12) is implied by the condition ∫_{R²}‖s‖|g(s) − 1| ds < C, which holds for any process that has a finite range of dependence. Furthermore, it holds for the LGCP if the correlation function of the underlying Gaussian random field generating the intensities decays to 0 at a rate faster than ‖s‖⁻³.
Theorem 2. Assume that conditions (2), (4), (5), and (6) hold. Then

$$\frac{1}{|D_n|} \widehat{\mathrm{cov}}(S_n) \xrightarrow{L^2} \frac{1}{|D_n|} \mathrm{cov}(S_n).$$

If we further assume (12), then $\frac{1}{|D_n|^2} E[\widehat{\mathrm{cov}}(S_n) - \mathrm{cov}(S_n)]^2 \le C[1/k_n + 1/|D_{l(n)}|]$ for some $C < \infty$.

Proof. See Appendix B.
Theorem 2 establishes the $L^2$ consistency of the proposed covariance matrix estimator for a wide range of values for the number of thinned processes and the subblock size. But to apply the proposed method in practice, these values must be determined. For the former, the proposed thinning algorithm can be applied as many times as necessary until a stable estimator is obtained. For the latter, Theorem 2 suggests that the "optimal" rate for the subblock size is $|D_n|^{1/2}$, where the word "optimal" is in the sense of minimizing the mean squared error. This result agrees with the findings on the optimal subblock size for other resampling variance estimators obtained from quantitative processes (e.g., Sherman 1996). If we further assume that the best rate can be approximated by $c_n = c|D_n|^{1/2}$, where $c$ is an unknown constant, then we can adapt the algorithm of Hall and Jing (1996) to estimate $c_n$ and thus determine the "optimal" subblock size.
Specifically, let $D_m^i$, $i = 1, \ldots, k_n'$, be a set of subblocks contained in $D_n$ that have the same size and shape, let $S_{m,j}^i$ be $S_n$ in (8) obtained from the $j$th thinned process on $D_m^i$, and let $\bar{S}_{m,j}$ be the mean of the $S_{m,j}^i$. We then define the sample variance-covariance matrix of the $S_{m,j}^i$, averaged over the $K$ replicate thinned point processes:

$$\hat{\theta}_m = \frac{1}{K} \sum_{j=1}^{K} \sum_{i=1}^{k_n'} \frac{(S_{m,j}^i - \bar{S}_{m,j})(S_{m,j}^i - \bar{S}_{m,j})'}{k_n' - 1}.$$

If $D_m^i$ is relatively small compared with $D_n$, then the foregoing should be a good estimator for the true covariance matrix of $S_m$, where $S_m$ is $S_n$ in (8) defined on $D_m$. For each $D_m^i$, we then apply (11) to estimate $\mathrm{cov}(S_m)$ over a fine set of candidate subblock sizes, say $c_m' = c'|D_m^i|^{1/2}$. Let $\hat{\theta}_m(j,k)$ and $[\hat{\theta}_m^i(j,k)]'$ be the $(j,k)$th elements of $\hat{\theta}_m$ and of the resulting estimator from using $c_m'$. We define the best $c_m$ as the one that has the smallest value of the following criterion:

$$M(c_m') = \frac{1}{k_n'} \sum_{i=1}^{k_n'} \sum_{j=1}^{p} \sum_{k=1}^{p} \bigl\{[\hat{\theta}_m^i(j,k)]' - \hat{\theta}_m(j,k)\bigr\}^2.$$

The "optimal" subblock size for $D_n$, $c_n$, then can be estimated easily using the relationship $c_n = c_m(|D_n|/|D_m|)^{1/2}$.
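The selection rule above can be sketched as follows. This is a simplified Python illustration, not the paper's implementation: it uses a scalar statistic (subblock event counts) in place of the full matrix criterion $M(c_m')$, takes the candidate-size grid and the per-subblock bootstrap estimator as user-supplied callables, and all names are mine.

```python
import numpy as np

def select_block_size(pts, side, d_m, candidates, boot_var):
    """Sketch of the data-driven subblock-size rule of Section 3.2.

    The region [0, side]^2 is tiled with k'_n squares D^i_m of side d_m.
    `boot_var(sp, c)` is a user-supplied bootstrap variance estimate computed
    on one subblock's (shifted) points `sp` with candidate block size `c`.
    The winner minimizes an empirical MSE criterion in the spirit of M(c'_m);
    the chosen size is rescaled via c_n = c_m (|D_n|/|D_m|)^{1/2} = c_m * side/d_m.
    """
    k = int(side // d_m)
    subpts, stats = [], []
    for i in range(k):
        for j in range(k):
            lo = np.array([i * d_m, j * d_m])
            in_ij = np.all((pts >= lo) & (pts < lo + d_m), axis=1)
            sp = pts[in_ij] - lo          # shift subblock to [0, d_m)^2
            subpts.append(sp)
            stats.append(len(sp))
    # target theta_m: between-subblock sample variance of the statistic
    theta_m = np.var(stats, ddof=1)
    scores = []
    for c in candidates:
        est = [boot_var(sp, c) for sp in subpts]
        scores.append(np.mean([(e - theta_m) ** 2 for e in est]))
    c_m = candidates[int(np.argmin(scores))]
    return c_m * (side / d_m)             # rescale to the full region
```

In practice `boot_var` would be the bootstrap variance from the procedure of Section 3.1 applied within each $D_m^i$.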
Guan and Loh Block Bootstrap Variance Estimation 1381
4. SIMULATIONS

4.1 Log-Gaussian Cox Process
We performed a simulation study to test our theoretical findings. On an observation region $D$ of size $L \times L$, where $L = 1, 2$, and 4, we simulated 1,000 realizations of an inhomogeneous Poisson process with intensity function $\lambda(s)$ given by

$$\log \lambda(s) = \alpha + \beta X(s) + G(s), \qquad (13)$$

where $G$ is a mean-0 Gaussian random field. This is the LGCP model, with inhomogeneity due to the covariate $X$.
For values of $X$, we used a single realization from a different mean-0 Gaussian random field. We kept this realization fixed throughout the whole simulation study; that is, we used the same $X$ for each of the 1,000 point process realizations. Figure 1 shows a surface plot of $X$ for $L = 4$; the values of $X$ for $L = 1$ and 2 are the lower-left corner of those shown in the figure. For $G$, we used the exponential model $C(h) = \sigma^2 \exp(-h/\rho)$ for its covariance function, where $\rho = 1$ and 2 and $\sigma = 1$. We set $\alpha$ and $\beta$ to be 7.02 and 2. We generated the point process patterns using the rpoispp function in the spatstat R package (Baddeley and Turner 2005). The average numbers of points simulated were about 1,200, 4,500, and 16,700 for $L = 1, 2$, and 4.
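A discretized version of the LGCP model (13) can be sketched as follows. This is a grid-based illustration in Python rather than the paper's spatstat/R workflow: the Gaussian field $G$ is sampled at cell centers via a Cholesky factor (adequate for small grids), and Poisson counts per cell approximate the point pattern; the function name and jitter constant are my choices.

```python
import numpy as np

def simulate_lgcp_grid(alpha, beta, X, rho, sigma, L, rng=None):
    """Discretized sketch of the LGCP model (13): log lambda = alpha + beta*X + G.

    X is an (n, n) covariate image on [0, L]^2 and G a mean-0 Gaussian field
    with exponential covariance sigma^2 exp(-h/rho). Returns the (n, n) array
    of Poisson event counts per grid cell.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    h = L / n
    # cell-center coordinates and pairwise distances
    xy = (np.indices((n, n)).reshape(2, -1).T + 0.5) * h
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # exponential covariance, with a small jitter for numerical stability
    C = sigma ** 2 * np.exp(-d / rho) + 1e-10 * np.eye(n * n)
    G = np.linalg.cholesky(C) @ rng.standard_normal(n * n)
    lam = np.exp(alpha + beta * X + G.reshape(n, n))   # intensity surface
    return rng.poisson(lam * h * h)                    # counts per cell
```

Conditional on $G$, each cell count is Poisson with mean equal to the cell-integrated intensity, which is the defining property of a Cox process.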
For each of the 1,000 point patterns, we obtained the estimates $\hat{\alpha}$ and $\hat{\beta}$ by maximizing the Poisson likelihood (using the ppm function in spatstat). Next, we thinned the point pattern according to the criterion described in Section 3, resulting in 20 independent thinned processes for each realization. The average numbers of points in the thinned patterns were roughly 490, 1,590, and 5,400 for $L = 1, 2$, and 4, so about 30-40%
Figure 1. Image of the covariate X used in the simulation. The range of X is between -0.6066 and 0.5303, where darker colors represent larger values of X.
of the points were retained. We then applied the bootstrap procedure to each of the thinned processes to get the variance estimates for $\hat{\alpha}$ and $\hat{\beta}$. From (13), we found that the quantities to be computed with each bootstrap sample were $\sum \lambda(s^*)$ for $\alpha$ and $\sum X(s^*)\lambda(s^*)$ for $\beta$, where $s^*$ represents the locations of the points in a bootstrap sample. We used nonoverlapping square blocks in the bootstrap procedure. For $L = 1$, we used blocks with side lengths 1/4, 1/3, and 1/2, whereas for $L = 2, 4$, we used blocks with side lengths 1/4, 1/3, 1/2, and 1. The number of bootstrap samples was 499.

We computed the variances of $\hat{\alpha}$ and $\hat{\beta}$ from the bootstrap variances using Theorem 1, (9), and the relationship of $\mathrm{cov}(S_n)$ to $B_n$. We averaged the variance estimates across the independent thinnings. Finally, we constructed nominal 95% confidence intervals, yielding a total of 1,000 confidence intervals for $\alpha$ and $\beta$.
Table 1 gives the empirical coverage of the confidence intervals for $\beta$. The coverage for $\alpha$ (not shown) is similar but slightly lower. For both values of $\rho$, we found that the empirical coverage increased toward the nominal level as the observation region size increased. This increase in coverage agrees with what we would expect from the theory. The empirical coverage for $\rho = 1$ is typically closer to the nominal level than that for $\rho = 2$; note that $\rho = 1$ corresponds to weaker dependence. Thus we would expect our method to work better when the dependence in the data is relatively weaker. It is interesting to note that for $L = 2$ and 4, the best coverage for $\rho = 2$ was achieved by using a block larger than its counterpart for $\rho = 1$. Thus, for errors with longer-range dependence, it appears that the block bootstrap procedure requires larger blocks to work well.

Table 2 gives the bootstrap estimates of the standard errors of $\hat{\beta}$ for $\rho = 2$, averaged over the 1,000 realizations. The standard deviations of these estimates over the 1,000 realizations are included in brackets. The "true" standard errors, computed using the estimated regression parameters from the independent realizations, are also given. The table shows that the bootstrap estimates have a negative bias, becoming proportionally closer to the "true" standard errors as the region size $L$ increases.

In summary, we found that our method worked reasonably well when the regression parameters were estimated. We found a slight undercoverage in each instance, but noted that this undercoverage becomes smaller as the sample size increases. We also estimated the standard errors by using the true regression parameters in our thinning step and found that the coverage increased only slightly.
Table 1. Empirical coverage of 95% nominal confidence intervals of β̂ in the LGCP case

                          Block size
  Region size    1/4    1/3    1/2     1
  ρ = 1
    1            .85    .77    .49
    2            .91    .90    .88    .49
    4            .93    .93    .93    .90
  ρ = 2
    1            .84    .76    .50
    2            .87    .88    .87    .50
    4            .88    .89    .90    .89
1382 Journal of the American Statistical Association December 2007
Table 2. Estimates of the standard errors of β̂ for ρ = 2 in the LGCP case

                              Block size
  Region size  True SE    1/4         1/3         1/2         1
  1            .24        .17(.04)    .16(.05)    .13(.06)
  2            .12        .10(.02)    .10(.02)    .10(.03)    .08(.03)
  4            .07        .056(.003)  .057(.004)  .060(.006)  .060(.008)
4.2 Inhomogeneous Neyman-Scott Process

Theorem 1 gives the variance in terms of the FOIF and the SOCF. Another approach to variance estimation is to use the plug-in method. As pointed out by one referee, we may simply estimate the PCF or the K-function using the original data (Baddeley et al. 2000). This will yield an estimate of the SOCF that in turn can be plugged into the expression for $B_n$ in Theorem 1 to obtain an estimate of the asymptotic variance. Waagepetersen (2007) studied the performance of this method in the INSP case.

To compare the performance of our method with that of the plug-in method, we also simulated data from the INSP model given by Waagepetersen (2007). To do so, we first simulated a homogeneous Poisson process with intensity $\kappa = 50$ as the parent process. For each parent, we then generated a Poisson number of offspring. We defined the position of each offspring relative to its parent by a radially symmetric Gaussian random variable (e.g., Diggle 2003). Let $\omega$ denote the standard deviation of this variable; we used $\omega = 0.02, 0.04$, representing relatively strong and weak clustering. Finally, we thinned the offspring, as done by Waagepetersen (2007), by setting the probability of retaining an offspring equal to the intensity at the offspring location divided by the maximum intensity in the study region. For the intensity, we used $\log \lambda(s) = \alpha + \beta X(s)$, where $\alpha$, $\beta$, and $X(s)$ were as defined in the LGCP case.

We simulated 1,000 realizations of the process on both a $1 \times 1$ square and a $2 \times 2$ square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of $\hat{\alpha}$ and $\hat{\beta}$
and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters $\kappa$ and $\omega$ by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing

$$\int_0^a \bigl[\hat{K}(t)^c - K(t; \kappa, \omega)^c\bigr]^2\, dt \qquad (14)$$

with respect to $(\kappa, \omega)$ for a specified $a$ and $c$, where $\hat{K}(t)$ and $K(t; \kappa, \omega)$ are the empirical and theoretical K functions. We used $a = 4\omega$ and $c = 0.25$, then used the estimated values of $\kappa$ and $\omega$ to estimate the standard errors of $\hat{\alpha}$ and $\hat{\beta}$. To do this, we used the inhomthomasasympcov function in the InhomCluster R package, which is available on Waagepetersen's website.
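The minimum contrast step (14) can be sketched as follows. This Python illustration makes two assumptions not stated in the text: it uses the standard K-function of a (modified) Thomas process as the theoretical $K(t;\kappa,\omega)$, and it minimizes over a finite candidate grid rather than over the continuum; the function names are mine.

```python
import numpy as np

def thomas_K(t, kappa, omega):
    """Theoretical K-function of a (modified) Thomas cluster process
    with parent intensity kappa and offspring dispersal sd omega."""
    return np.pi * t ** 2 + (1.0 - np.exp(-t ** 2 / (4 * omega ** 2))) / kappa

def min_contrast(K_emp, t, a, grid, c=0.25):
    """Grid-search sketch of the minimum contrast criterion (14):
    minimize int_0^a [Khat(t)^c - K(t; kappa, omega)^c]^2 dt over (kappa, omega).

    K_emp : empirical K-function Khat evaluated at the lags `t`
    grid  : list of candidate (kappa, omega) pairs (a simplification here)
    """
    keep = t <= a
    tt, Ke = t[keep], K_emp[keep]
    dt = np.gradient(tt)                   # quadrature weights for the integral
    best, best_val = None, np.inf
    for kappa, omega in grid:
        diff = Ke ** c - thomas_K(tt, kappa, omega) ** c
        val = np.sum(diff ** 2 * dt)       # discretized contrast integral
        if val < best_val:
            best, best_val = (kappa, omega), val
    return best
```

The exponent $c = 0.25$ acts as a variance-stabilizing transform of the K-function, which is the usual motivation for this choice in minimum contrast estimation.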
Table 3 gives the empirical coverage of the confidence intervals for $\beta$. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for $L = 2$ and $\omega = 0.02$, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for $\omega = 0.02$ than for $\omega = 0.04$, because the former yields a process with shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters $a$ and $c$ in (14). In the simulation, we also considered using the default value of $a$ given by spatstat; the results, not shown here, suggested that the coverage was often worse than that achieved by our method.
5. AN APPLICATION

We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a $1{,}000 \times 500$ m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a $5 \times 5$ m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census, using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), who considered the following FOIF model:

$$\lambda(s) = \exp[\beta_0 + \beta_1 E(s) + \beta_2 G(s)],$$

where $E(s)$ and $G(s)$ are the (estimated) elevation and gradient at location $s$. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1). Specifically, he obtained the estimates $\hat{\beta}_1 = 0.02$ and $\hat{\beta}_2 = 5.84$. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that $\beta_2$ was significant but $\beta_1$ was not.
Table 3. Empirical coverage of 95% nominal confidence intervals of β̂ obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                                 Block size
  Region size    Plug-in    1/4    1/3    1/2
  ω = 0.02
    1            .96        .91    .89    .81
    2            .96        .94    .94    .94
  ω = 0.04
    1            .93        .87    .87    .77
    2            .95        .90    .90    .91
Figure 2. Locations of BPL trees.
We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering were due to seed dispersal alone, it probably was due to seed dispersal that had occurred over multiple generations. As a result, the validity of the model, and of the subsequent estimated standard errors and inference on the regression parameters, required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.

To estimate the standard errors for $\hat{\beta}_1$ and $\hat{\beta}_2$, we applied (11) with $K = 100$ and $B = 999$ and used $200 \times 100$ m subblocks. We chose $K = 100$ because the trace plots for the estimated standard errors became fairly stable when 100 or more thinned realizations were used. We selected the subblock size using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for $\hat{\beta}_1$ and $\hat{\beta}_2$ using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method, but nevertheless lead to the same conclusion as the plug-in method. Specifically, they suggest that $\beta_2$ is significant but $\beta_1$ is not.
Table 4. Results for the BPL data

           Plug-in method                  Thinned block bootstrap
       EST     STD     95% CI              STD      95% CI
  β1   .02     .02     (-.02, .06)         .017     (-.014, .054)
  β2   5.84    2.53    (.89, 10.80)        2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.
From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007) under less restrictive assumptions.
APPENDIX A: PROOF OF THEOREM 1

Let $\beta_0$ represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. In the case of $p > 1$, the proof can be easily generalized by applying the Cramér-Wold device.

Lemma A.1. Assume that conditions (4)-(6) hold. Then $|D_n|^2 E[U_n^{(1)}(\beta_0)]^4 < C$ for some $C < \infty$.

Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that $\lambda^{(1)}(s; \beta_0)/\lambda(s; \beta_0)$ is bounded due to conditions (4) and (5), we can derive that $|D_n|^4 E[U_n^{(1)}(\beta_0)]^4$ is bounded by the following (ignoring some multiplicative constants):

$$\int\!\int\!\int\!\int |Q_4(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4 + \left[\int\!\int |Q_2(s_1, s_2)|\, ds_1\, ds_2\right]^2 + \int\!\int\!\int |Q_3(s_1, s_2, s_3)|\, ds_1\, ds_2\, ds_3 + |D_n| \int\!\int |Q_2(s_1, s_2)|\, ds_1\, ds_2 + \int\!\int |Q_2(s_1, s_2)|\, ds_1\, ds_2,$$

where all of the integrals are over $D_n$. It then follows from condition (6) that the foregoing sum is of order $|D_n|^2$. Thus the lemma is proven.
Proof of Theorem 1

First, note that

$$U_n^{(1)}(\beta) = \frac{1}{|D_n|}\left[\sum_{x \in N \cap D_n} \frac{\lambda^{(1)}(x; \beta)}{\lambda(x; \beta)} - \int \lambda^{(1)}(s; \beta)\, ds\right]$$

and

$$U_n^{(2)}(\beta) = \frac{1}{|D_n|}\left[\sum_{x \in N \cap D_n} \frac{\lambda^{(2)}(x; \beta)}{\lambda(x; \beta)} - \int \lambda^{(2)}(s; \beta)\, ds\right] - \frac{1}{|D_n|}\sum_{x \in N \cap D_n} \left[\frac{\lambda^{(1)}(x; \beta)}{\lambda(x; \beta)}\right]^2 = U_{n,a}^{(2)}(\beta) - U_{n,b}^{(2)}(\beta).$$

Using a Taylor expansion, we can obtain

$$|D_n|^{1/2}(\hat{\beta}_n - \beta_0) = -\bigl[U_n^{(2)}(\tilde{\beta}_n)\bigr]^{-1} |D_n|^{1/2} U_n^{(1)}(\beta_0),$$

where $\tilde{\beta}_n$ is between $\hat{\beta}_n$ and $\beta_0$. We need to show that

$$U_{n,a}^{(2)}(\beta_0) \xrightarrow{p} 0$$

and

$$U_{n,b}^{(2)}(\beta_0) \xrightarrow{p} \frac{1}{|D_n|} \int \frac{[\lambda^{(1)}(s; \beta_0)]^2}{\lambda(s; \beta_0)}\, ds.$$
To show the first expression, note that $E[U_{n,a}^{(2)}(\beta_0)] = 0$. Thus we only need to look at the variance term:

$$\mathrm{var}\bigl[U_{n,a}^{(2)}(\beta_0)\bigr] = \frac{1}{|D_n|^2} \int\!\int \frac{\lambda^{(2)}(s_1; \beta_0)\lambda^{(2)}(s_2; \beta_0)}{\lambda(s_1; \beta_0)\lambda(s_2; \beta_0)} Q_2(s_1, s_2)\, ds_1\, ds_2 + \frac{1}{|D_n|^2} \int \frac{[\lambda^{(2)}(s; \beta_0)]^2}{\lambda(s; \beta_0)}\, ds,$$

which converges to 0 due to conditions (4)-(6). Thus $U_{n,a}^{(2)}(\beta_0) \xrightarrow{p} 0$.

To show the second expression, note that $E[U_{n,b}^{(2)}(\beta_0)] = \frac{1}{|D_n|}\int [\lambda^{(1)}(s; \beta_0)]^2/\lambda(s; \beta_0)\, ds$. Thus again we need consider only the variance term:

$$\mathrm{var}\bigl[U_{n,b}^{(2)}(\beta_0)\bigr] = \frac{1}{|D_n|^2} \int\!\int \frac{[\lambda^{(1)}(s_1; \beta_0)\lambda^{(1)}(s_2; \beta_0)]^2}{\lambda(s_1; \beta_0)\lambda(s_2; \beta_0)} Q_2(s_1, s_2)\, ds_1\, ds_2 + \frac{1}{|D_n|^2} \int \frac{[\lambda^{(1)}(s; \beta_0)]^4}{\lambda(s; \beta_0)}\, ds,$$

which converges to 0 due to conditions (4)-(6). Thus $U_{n,b}^{(2)}(\beta_0) \xrightarrow{p} \frac{1}{|D_n|}\int [\lambda^{(1)}(s; \beta_0)]^2/\lambda(s; \beta_0)\, ds$.
Now we want to show that $|D_n|^{1/2} U_n^{(1)}(\beta_0)$ converges to a normal distribution. First, we derive the mean and variance of $|D_n|^{1/2} U_n^{(1)}(\beta_0)$. Clearly, $E[|D_n|^{1/2} U_n^{(1)}(\beta_0)] = 0$. For the variance,

$$\mathrm{var}\bigl[|D_n|^{1/2} U_n^{(1)}(\beta_0)\bigr] = \frac{1}{|D_n|} \int\!\int \frac{\lambda^{(1)}(s_1; \beta_0)\lambda^{(1)}(s_2; \beta_0)}{\lambda(s_1; \beta_0)\lambda(s_2; \beta_0)} Q_2(s_1, s_2)\, ds_1\, ds_2 + \frac{1}{|D_n|} \int \frac{[\lambda^{(1)}(s; \beta_0)]^2}{\lambda(s; \beta_0)}\, ds.$$

Now consider a new sequence of regions $D_n^*$, where $D_n^* \subset D_n$ and $|D_n^*|/|D_n| \to 1$ as $n \to \infty$. Let $U_{n*}^{(1)}(\beta_0)$ be the estimating function for the realization of the point process on $D_n^*$. Based on the expression for the variance, we can deduce that

$$\mathrm{var}\bigl[|D_n|^{1/2} U_n^{(1)}(\beta_0) - |D_n^*|^{1/2} U_{n*}^{(1)}(\beta_0)\bigr] \to 0 \quad \text{as } n \to \infty.$$

Thus

$$|D_n|^{1/2} U_n^{(1)}(\beta_0) \sim |D_n^*|^{1/2} U_{n*}^{(1)}(\beta_0),$$

where the notation $a_n \sim b_n$ means that $a_n$ and $b_n$ have the same limiting distribution. Thus we need only show that $|D_n^*|^{1/2} U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution for some properly defined $D_n^*$.

To obtain a sequence of $D_n^*$, we apply the blocking technique used by Guan, Sherman, and Calvin (2004). Let $l(n) = n^\alpha$ and $m(n) = n^\alpha - n^\eta$ for $4/(2+\epsilon) < \eta < \alpha < 1$, where $\epsilon$ is defined as in condition (3). We first divide the original domain $D_n$ into nonoverlapping $l(n) \times l(n)$ subsquares $D_{l(n)}^i$, $i = 1, \ldots, k_n$. Within each subsquare, we further obtain $D_{m(n)}^i$, an $m(n) \times m(n)$ square sharing the same center with $D_{l(n)}^i$. Note that $d(D_{m(n)}^i, D_{m(n)}^j) \ge n^\eta$ for $i \ne j$. Now define

$$D_n^* = \bigcup_{i=1}^{k_n} D_{m(n)}^i.$$

Condition (2) implies that $|D_n^*|/|D_n| \to 1$. Thus we need only show that $|D_n^*|^{1/2} U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and an application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.
APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let $\varphi(u) = g(u) - 1$, $\varphi_k(s_1, \ldots, s_k) = Q_k(s_1, \ldots, s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ and $g_k(s_1, \ldots, s_k) = \lambda_k(s_1, \ldots, s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ for $k = 3$ and 4, and $Z(s) = \lambda^{(1)}(s)$. As in the proof of Theorem 1, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. As a result, $Z(s)$ is a scalar, so that $\mathrm{cov}(S_n) = \mathrm{var}(S_n)$. Furthermore, we consider only the case where $K = 1$. Thus the covariance estimator defined in (11) becomes

$$\widehat{\mathrm{var}}(S_n) = \sum_{b=1}^{B} \frac{(S_n^b - \bar{S}_n)^2}{B - 1}.$$

As $B$ goes to infinity, we see that the foregoing converges to

$$\mathrm{var}(S_n^b \mid \Psi \cap D_n) = \frac{1}{k_n} \sum_{i=1}^{k_n} \sum_{j=1}^{k_n} \sum_{x_1 \in D_{l(n)}^j} \sum_{x_2 \in D_{l(n)}^j} Z(x_1 - c_j + c_i) Z(x_2 - c_j + c_i) - \frac{1}{k_n^2} \sum_{i=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \sum_{x_1 \in D_{l(n)}^{j_1}} \sum_{x_2 \in D_{l(n)}^{j_2}} Z(x_1 - c_{j_1} + c_i) Z(x_2 - c_{j_2} + c_i),$$

where the notation $\sum_{x \in D}$ is short for $\sum_{x \in D \cap \Psi}$. We wish to show that $\mathrm{var}(S_n^b \mid \Psi \cap D_n)/|D_n|$ converges in $L^2$ to

$$\frac{1}{|D_n|}\mathrm{var}(S_n) = \frac{(\bar{\lambda}_n)^2}{|D_n|} \int_{D_n}\!\int_{D_n} Z(s_1) Z(s_2) \varphi(s_1 - s_2)\, ds_1\, ds_2 + \frac{\bar{\lambda}_n}{|D_n|} \int_{D_n} [Z(s)]^2\, ds.$$

To show this, we need to show that

$$\frac{1}{|D_n|} E[\mathrm{var}(S_n^b \mid \Psi \cap D_n)] \to \frac{1}{|D_n|}\mathrm{var}(S_n) \qquad (B.1)$$

and

$$\frac{1}{|D_n|^2} \mathrm{var}[\mathrm{var}(S_n^b \mid \Psi \cap D_n)] \to 0. \qquad (B.2)$$
To show (B.1), note that

$$E[\mathrm{var}(S_n^b \mid \Psi \cap D_n)] = (\bar{\lambda}_n)^2 \sum_{i=1}^{k_n} \int\!\int_{D_{l(n)}^i} Z(s_1) Z(s_2) g(s_1 - s_2)\, ds_1\, ds_2 + \frac{\bar{\lambda}_n (k_n - 1)}{k_n} \int_{D_n} [Z(s)]^2\, ds - \frac{(\bar{\lambda}_n)^2}{k_n^2} \sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \int\!\int_{D_{l(n)}^i} Z(s_1) Z(s_2)\, g(s_1 - s_2 + c_{j_1} - c_{j_2})\, ds_1\, ds_2.$$

Thus

$$\frac{1}{|D_n|}E[\mathrm{var}(S_n^b \mid \Psi \cap D_n)] - \frac{1}{|D_n|}\mathrm{var}(S_n) = -\frac{\bar{\lambda}_n}{k_n |D_n|} \int_{D_n} [Z(s)]^2\, ds - \frac{(\bar{\lambda}_n)^2}{|D_n|} \mathop{\sum\sum}_{i \ne j} \int_{D_{l(n)}^i}\!\int_{D_{l(n)}^j} Z(s_1) Z(s_2) \varphi(s_1 - s_2)\, ds_1\, ds_2 - \frac{(\bar{\lambda}_n)^2}{|D_n|} \frac{1}{k_n^2} \sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \int\!\int_{D_{l(n)}^i} Z(s_1) Z(s_2)\, \varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\, ds_1\, ds_2 = -T_n^1 - T_n^2 - T_n^3.$$

$T_n^1$ goes to 0 due to conditions (2) and (5). For $T_n^2$ and $T_n^3$, lengthy yet elementary derivations yield

$$|T_n^2| \le \frac{C(\bar{\lambda}_n)^2}{k_n} \sum_{i=1}^{k_n} \left[\frac{1}{|D_n|} \int_{D_n - D_n} |D_n \cap (D_n - s)|\, |\varphi(s)|\, ds - \frac{1}{|D_{l(n)}^i|} \int_{D_{l(n)}^i - D_{l(n)}^i} \bigl|D_{l(n)}^i \cap (D_{l(n)}^i - s)\bigr|\, |\varphi(s)|\, ds\right]$$

and

$$T_n^3 \le \frac{C(\bar{\lambda}_n)^2}{k_n} \frac{1}{|D_n|} \int_{D_n}\!\int_{D_n} |\varphi(s_1 - s_2)|\, ds_1\, ds_2.$$

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that $T_n^1$ and $T_n^3$ are both of order $1/k_n$, whereas $T_n^2$ is of order $1/|D_{l(n)}|^{1/2}$ due to condition
(12).

To prove (B.2), we first note that $[\mathrm{var}(S_n^b \mid \Psi \cap D_n)]^2$ is equal to the sum of the following three terms:

$$\frac{1}{k_n^2} \sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \sum_{x_1 \in D_{l(n)}^{j_1}}\sum_{x_2 \in D_{l(n)}^{j_1}}\sum_{x_3 \in D_{l(n)}^{j_2}}\sum_{x_4 \in D_{l(n)}^{j_2}} \bigl[Z(x_1 - c_{j_1} + c_{i_1}) Z(x_2 - c_{j_1} + c_{i_1}) Z(x_3 - c_{j_2} + c_{i_2}) Z(x_4 - c_{j_2} + c_{i_2})\bigr],$$

$$-\frac{2}{k_n^3} \sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n} \sum_{x_1 \in D_{l(n)}^{j_1}}\sum_{x_2 \in D_{l(n)}^{j_1}}\sum_{x_3 \in D_{l(n)}^{j_2}}\sum_{x_4 \in D_{l(n)}^{j_3}} \bigl[Z(x_1 - c_{j_1} + c_{i_1}) Z(x_2 - c_{j_1} + c_{i_1}) Z(x_3 - c_{j_2} + c_{i_2}) Z(x_4 - c_{j_3} + c_{i_2})\bigr],$$

and

$$\frac{1}{k_n^4} \sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n} \sum_{x_1 \in D_{l(n)}^{j_1}}\sum_{x_2 \in D_{l(n)}^{j_2}}\sum_{x_3 \in D_{l(n)}^{j_3}}\sum_{x_4 \in D_{l(n)}^{j_4}} \bigl[Z(x_1 - c_{j_1} + c_{i_1}) Z(x_2 - c_{j_2} + c_{i_1}) Z(x_3 - c_{j_3} + c_{i_2}) Z(x_4 - c_{j_4} + c_{i_2})\bigr].$$

Furthermore, the following relationships are possible for $x_1, \ldots, x_4$:

a. $x_1 = x_2 = x_3 = x_4$;
b. $x_1 = x_2$ and $x_3 = x_4$ but $x_1 \ne x_3$, or $x_1 = x_3$ and $x_2 = x_4$ but $x_1 \ne x_2$;
c. $x_1 = x_2$ but $x_2 \ne x_3 \ne x_4$, or $x_3 = x_4$ but $x_1 \ne x_2 \ne x_3$;
d. $x_1 = x_2 = x_3$ but $x_3 \ne x_4$, or $x_1 = x_2 = x_4$ but $x_1 \ne x_3$, or $x_2 = x_3 = x_4$ but $x_1 \ne x_2$;
e. $x_1 \ne x_2 \ne x_3 \ne x_4$.

When calculating the variance of $\mathrm{var}(S_n^b \mid \Psi \cap D_n)/|D_n|$, the foregoing relationships, combined with the expressions for the squared value of $E[\mathrm{var}(S_n^b \mid \Psi \cap D_n)]/|D_n|$, in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than $1/k_n$ in a similar manner.
Let $F_1(s_1, s_2, s_3, s_4) = g_4(s_1, s_2, s_3, s_4) - g(s_1, s_2) g(s_3, s_4)$. Term e leads to the following integral:

$$\frac{(\bar{\lambda}_n)^4}{k_n^2 |D_n|^2} \sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n} \int_{D_{l(n)}^{j_1}} Z(s_1 - c_{j_1} + c_{i_1}) \times \Biggl[\sum_{j_2=1}^{k_n} \int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_2}}\!\int_{D_{l(n)}^{j_2}} Z(s_2 - c_{j_1} + c_{i_1}) Z(s_3 - c_{j_2} + c_{i_2}) Z(s_4 - c_{j_2} + c_{i_2})\, F_1(s_1, s_2, s_3, s_4)\, ds_2\, ds_3\, ds_4 - \frac{2}{k_n} \sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n} \int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_2}}\!\int_{D_{l(n)}^{j_3}} Z(s_2 - c_{j_1} + c_{i_1}) Z(s_3 - c_{j_2} + c_{i_2}) Z(s_4 - c_{j_3} + c_{i_2})\, F_1(s_1, s_2, s_3, s_4)\, ds_2\, ds_3\, ds_4 + \frac{1}{k_n^2} \sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n} \int_{D_{l(n)}^{j_2}}\!\int_{D_{l(n)}^{j_3}}\!\int_{D_{l(n)}^{j_4}} Z(s_2 - c_{j_2} + c_{i_1}) Z(s_3 - c_{j_3} + c_{i_2}) Z(s_4 - c_{j_4} + c_{i_2})\, F_1(s_1, s_2, s_3, s_4)\, ds_2\, ds_3\, ds_4 \Biggr]\, ds_1.$$
Let $F_2(s_1, s_2, s_3, s_4) = \varphi_4(s_1, s_2, s_3, s_4) + 2\varphi_3(s_1, s_2, s_3) + 2\varphi_3(s_1, s_3, s_4) + 2\varphi(s_1 - s_3)\varphi(s_2 - s_4)$. By noting that $Z(\cdot)$ is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

$$\frac{2C(\bar{\lambda}_n)^4}{k_n |D_n|^2} \sum_{j_1=1}^{k_n} \int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_1}}\!\int_{D_n}\!\int_{D_n} |F_2(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4 + \frac{C(\bar{\lambda}_n)^4}{|D_n|^2} \sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_2}}\!\int_{D_{l(n)}^{j_2}} |F_1(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4.$$

The first integral in the foregoing is of order $1/k_n$ due to conditions (4), (5), and (6). The second integral is bounded by

$$\frac{C(\bar{\lambda}_n)^4}{|D_n|^2} \sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_2}}\!\int_{D_{l(n)}^{j_2}} |\varphi_4(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4 + \frac{4C(\bar{\lambda}_n)^4}{k_n |D_n|} \sum_{j_2=1}^{k_n} \int_{D_n}\!\int_{D_{l(n)}^{j_2}}\!\int_{D_{l(n)}^{j_2}} |\varphi_3(s_1, s_3, s_4)|\, ds_1\, ds_3\, ds_4 + \frac{2C(\bar{\lambda}_n)^4}{|D_n|^2} \sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \left[\int_{D_{l(n)}^{j_1}}\!\int_{D_{l(n)}^{j_2}} |\varphi(s_1 - s_3)|\, ds_1\, ds_3\right]^2.$$

The foregoing is of order $1/k_n + a_n$ due to conditions (4), (5), (6), and (12), where the order of $a_n$ is no higher than $1/|D_{l(n)}|$.
APPENDIX C: JUSTIFICATION FOR USING β̂n IN THE THINNING STEP

Here we assume $|D_{l(n)}| = o(|D_n|^{1/2})$. Let $p_n(x, \theta) = \min_{s \in D_n} \lambda(s; \theta)/\lambda(x; \theta)$, and let $r(x)$ be a uniform random variable on $[0, 1]$. If $r(x) \le p_n(x, \theta)$, where $x \in (N \cap D_n)$, then $x$ will be retained in the thinned process, say $\Psi(\theta)$. Based on the same set $\{r(x) : x \in (N \cap D_n)\}$, we can determine $\Psi(\theta_0)$ and $\Psi(\hat{\theta}_n)$. Note that $\Psi(\theta_0)$ is as defined in (7) using the true FOIF. Define $a = \{x : x \in \Psi(\theta_0), x \notin \Psi(\hat{\theta}_n)\}$ and $b = \{x : x \notin \Psi(\theta_0), x \in \Psi(\hat{\theta}_n)\}$. Note that $P(x \in a \cup b \mid x \in N) \le |p_n(x, \hat{\theta}_n) - p_n(x, \theta_0)|$. We need to show that

$$\frac{1}{|D_n|}\bigl\{\mathrm{var}[S_n^b \mid \Psi(\theta_0) \cap D_n] - \mathrm{var}[S_n^b \mid \Psi(\hat{\theta}_n) \cap D_n]\bigr\} \xrightarrow{p} 0,$$

where $\mathrm{var}[S_n^b \mid \Psi(\theta_0) \cap D_n]$ is $\mathrm{var}(S_n^b \mid \Psi \cap D_n)$ as defined in Appendix B, and $\mathrm{var}[S_n^b \mid \Psi(\hat{\theta}_n) \cap D_n]$ is defined analogously.

Let $\sum_{D}^{\ne}$ and $\mathop{\sum\sum}_{D}^{\ne}$ denote $\sum_{x \in D \cap (a \cup b)}$ and $\mathop{\sum\sum}_{x_1 \in D \cap N,\; x_2 \in D \cap (a \cup b),\; x_1 \ne x_2}$. Note that

$$\frac{1}{|D_n|}\bigl|\mathrm{var}[S_n^b \mid \Psi(\theta_0) \cap D_n] - \mathrm{var}[S_n^b \mid \Psi(\hat{\theta}_n) \cap D_n]\bigr| < \frac{C}{|D_n| k_n} \sum_{i=1}^{k_n}\sum_{j=1}^{k_n} \sum_{x_1 \in D_{l(n)}^j \cap N}\; \sum_{x_2 \in D_{l(n)}^j}^{\ne} 1 + \frac{C}{|D_n| k_n^2} \sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \sum_{x_1 \in D_{l(n)}^{j_1} \cap N}\; \sum_{x_2 \in D_{l(n)}^{j_2}}^{\ne} 1 = \frac{C}{|D_n|} \sum_{j=1}^{k_n} \mathop{\sum\sum}_{D_{l(n)}^j}^{\ne} 1 + \frac{C}{|D_n|} \sum_{D_n}^{\ne} 1 + \frac{C}{|D_n| k_n} \Bigl[\sum_{D_n}^{\ne} 1\Bigr]\Bigl[\sum_{x \in D_n \cap N} 1\Bigr].$$

Note that $P[x \in (a \cup b) \mid x \in N] \le C/|D_n|^{\alpha/2}$ with probability 1 for $0 < \alpha < 1$, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.
[Received December 2006. Revised May 2007.]
REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1-42.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329-350.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.
Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.
Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.
Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.
Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.
Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.
Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810-821.
——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119-125.
Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727-737.
Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709-730.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.
McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393-419.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451-482.
Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.
Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261-275.
Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122-154.
Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43-47.
Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial-Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79-93.
Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509-523.
——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041-1048.
Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.
Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman-Scott Processes," Biometrics, 62, 252-258.
Gaussian random field decays at a polynomial rate faster than $2 + \epsilon$ and has a spectral density bounded away from 0. This is due to Corollary 2 of Doukhan (1994, p. 59).

In addition to conditions (2) and (3), we also need some mild conditions on the intensity and cumulant functions of $N$. In what follows, let $f^{(i)}(\beta)$ denote the $i$th derivative of $f(\beta)$. We assume that

$$\lambda(s; \beta) \text{ is bounded away from } 0, \qquad (4)$$

$$\lambda^{(2)}(s; \beta) \text{ is bounded and continuous with respect to } \beta, \qquad (5)$$

and

$$\sup_{s_1} \int \cdots \int |Q_k(s_1, \ldots, s_k)|\, ds_2 \cdots ds_k < C \quad \text{for } k = 2, 3, 4. \qquad (6)$$

Conditions (4) and (5) are straightforward conditions that can be checked directly for a proposed FOIF model. Condition (6) is a fairly weak condition. It also requires the process $N$ to be weakly dependent, but from a different perspective than (3). In the homogeneous case, (6) is implied by Brillinger mixing, which holds for many commonly used point process models, such as the Poisson cluster process, a class of doubly stochastic Poisson processes (i.e., Cox processes), and certain renewal processes (e.g., Heinrich 1985). In the inhomogeneous case, $Q_k(s_1, \ldots, s_k)$ often can be written as $\lambda(s_1)\cdots\lambda(s_k)\varphi_k(s_2 - s_1, \ldots, s_k - s_1)$, where $\varphi_k(\cdot)$ is a cumulant function for some homogeneous process. Then (6) holds if $\lambda(s)$ is bounded and $\int\cdots\int |\varphi_k(u_2, \ldots, u_k)|\, du_2 \cdots du_k < C$. Processes satisfying this condition include, but are not limited to, the LGCP, the inhomogeneous Neyman-Scott process (INSP; Waagepetersen 2007), and any inhomogeneous process obtained by thinning a homogeneous process satisfying this condition.
Theorem 1. Assume that conditions (2)-(6) hold and that $\hat{\beta}_n$ converges to $\beta_0$ in probability, where $\beta_0$ is the true parameter vector. Then

$$|D_n|^{1/2} (\Sigma_n)^{-1/2} (\hat{\beta}_n - \beta_0) \xrightarrow{d} N(0, I_p),$$

where $I_p$ is a $p \times p$ identity matrix,

$$\Sigma_n = |D_n| (A_n)^{-1} B_n (A_n)^{-1},$$

$$A_n = \int_{D_n} \frac{\lambda^{(1)}(s; \beta_0)[\lambda^{(1)}(s; \beta_0)]'}{\lambda(s; \beta_0)}\, ds,$$

and

$$B_n = A_n + \int\!\int_{D_n} \frac{\lambda^{(1)}(s_1; \beta_0)[\lambda^{(1)}(s_2; \beta_0)]'}{\lambda(s_1; \beta_0)\lambda(s_2; \beta_0)} Q_2(s_1, s_2)\, ds_1\, ds_2.$$

Proof. See Appendix A.
We assume in Theorem 1 that $\hat{\beta}_n$ is a consistent estimator of $\beta_0$. Schoenberg (2004) established the consistency of $\hat{\beta}_n$ for a class of spatial-temporal processes under some mild conditions. He assumed that the spatial domain was fixed but that the time domain increased to infinity; his results can be directly extended to our case. Therefore, we simply assume consistency in Theorem 1. In connection with previous results, our asymptotic result coincides with that of Rathbun and Cressie (1994) in the inhomogeneous spatial Poisson process case and with that of Waagepetersen (2007) in the INSP case. Furthermore, note that $\Sigma_n$ depends only on the first- and second-order properties of the process.
3. THINNED BLOCK BOOTSTRAP

3.1 The Proposed Method

Theorem 1 states that $|D_n|^{1/2}(\Sigma_n)^{-1/2}(\hat{\beta}_n - \beta_0)$ converges to a standard multivariate normal distribution as $n$ increases. This provides the theoretical foundation for making inference on $\beta$. In practice, however, $\Sigma_n$ is typically unknown and thus must be estimated. From the definition of $\Sigma_n$ in Theorem 1, we see that two quantities (i.e., $A_n$ and $B_n$) need to be estimated in order to estimate $\Sigma_n$. Note that $A_n$ depends only on the FOIF and thus can be estimated easily once a FOIF model has been fitted to the data; however, the quantity $B_n$ also depends on the SOCF $Q_2(\cdot)$. Unless an explicit parametric model is assumed for $Q_2(\cdot)$, which (as argued in Sec. 1) may be unrealistic, it is difficult to estimate $B_n$ directly. In this section we propose a thinned block bootstrap approach to first estimate a term that is related to $B_n$ and then use it to produce an estimate for $B_n$.

From now on, we assume that the point process $N$ is second-order reweighted stationary (SORWS); that is, there exists a function $g(\cdot)$ defined on $\mathbb{R}^2$ such that $\lambda_2(s_1, s_2) = \lambda(s_1)\lambda(s_2)g(s_1 - s_2)$. The function $g(\cdot)$ is often referred to as the pair correlation function (PCF; e.g., Stoyan and Stoyan 1994). Note that our definition of second-order reweighted stationarity is a special case of the original definition given by Baddeley, Møller, and Waagepetersen (2000); specifically, these two definitions coincide if the PCF exists. Examples of SORWS processes include the INSP, the LGCP, and any point process obtained by thinning an SORWS point process (e.g., Baddeley et al. 2000). We also assume that $\beta_0$ is known; thus we suppress the dependence of the FOIF on $\beta$ in this section. In practice, the proposed algorithm can be applied by simply replacing $\beta_0$ with its estimate $\hat{\beta}$. A theoretical justification for using $\hat{\beta}$ is given in Appendix C.
The essence of our approach is based on the fact that an SORWS point process can be thinned to be SOS by applying some proper thinning weights to the events (Guan 2007). Specifically, we consider the following thinned process:

    Ψ = {x : x ∈ N ∩ D_n, P(x is retained) = min_{s∈D_n} λ(s)/λ(x)}.   (7)

Clearly Ψ is SOS, because its first- and second-order intensity functions can be written as

    λ_n = min_{s∈D_n} λ(s)   and   λ_{2,n}(s_1, s_2) = (λ_n)^2 g(s_1 − s_2),

where s_1, s_2 ∈ D_n. Based on Ψ, we then define the following statistic:

    S_n = Σ_{x∈Ψ∩D_n} λ^{(1)}(x).   (8)
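As an illustration only (the paper's own computations use R/spatstat), the thinning step in (7) can be sketched as follows; the function names and the toy intensity surface are our assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def thin_to_sos(points, lam, lam_min):
    """Keep each event x with probability lam_min / lam(x), as in (7);
    the retained pattern then has constant intensity lam_min (SOS)."""
    p_keep = lam_min / np.array([lam(x) for x in points])
    keep = rng.uniform(size=len(points)) < p_keep
    return points[keep]

# toy pattern on the unit square with a hypothetical fitted FOIF
lam = lambda s: 100.0 * np.exp(s[0])      # stand-in intensity
pts = rng.uniform(size=(500, 2))          # stand-in event locations
lam_min = 100.0 * np.exp(0.0)             # min of lam over [0,1]^2
thinned = thin_to_sos(pts, lam, lam_min)
```

Because every retention probability is min_s λ(s)/λ(x) ≤ 1, the thinned pattern is a legitimate independent thinning of the original one.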
Note that Q_2(s_1, s_2) = λ(s_1)λ(s_2)[g(s_1 − s_2) − 1] because N is SORWS. Thus the covariance matrix of S_n, where S_n is defined in (8), is given by

    cov(S_n) = λ_n ∫_{D_n} λ^{(1)}(s)[λ^{(1)}(s)]′ ds
             + (λ_n)^2 ∫∫_{D_n} [λ^{(1)}(s_1)[λ^{(1)}(s_2)]′ / (λ(s_1)λ(s_2))] Q_2(s_1, s_2) ds_1 ds_2.   (9)

1380 Journal of the American Statistical Association December 2007

The first term on the right side of (9) depends only on the FOIF and thus can be estimated once a FOIF model has been fitted. The second term, interestingly, is (λ_n)^2(B_n − A_n). These facts suggest that if we can estimate cov(S_n), then we can use the relationship between cov(S_n) and B_n to estimate B_n in an obvious way. Note that S_n is a statistic defined on Ψ ∩ D_n, where Ψ is an SOS process. Thus the problem of estimating B_n becomes a problem of estimating the variance of a statistic calculated from an SOS process, which we can accomplish by using the block bootstrap procedure.
Toward that end, we divide D_n into k_n subblocks of the same shape and size. Let D^i_{l(n)}, i = 1, …, k_n, denote these subblocks, where l(n) = cn^α for some c > 0 and α ∈ (0, 1) signifies the subblock size. For each D^i_{l(n)}, let c_i denote the "center" of the subblock, where the "centers" are defined in some consistent way across all of the D^i_{l(n)}'s. We then resample with replacement k_n subblocks from all of the available subblocks and "glue" them together so as to create a new region that will be used to approximate D_n. We do so for a large number, say B, times. For the bth collection of random subblocks, let J_b be the set of k_n random indices sampled from {1, …, k_n} that are associated with the selected subblocks. For each event x ∈ D^{J_b(i)}_{l(n)}, we replace it with a new value x − c_{J_b(i)} + c_i, so that all events in D^{J_b(i)}_{l(n)} are properly shifted into D^i_{l(n)}. In other words, the ith sampled subblock D^{J_b(i)}_{l(n)} will be "glued" at where D^i_{l(n)} was in the original data. We then define

    S^b_n = Σ_{i=1}^{k_n} Σ_{x∈Ψ∩D^{J_b(i)}_{l(n)}} λ^{(1)}(x − c_{J_b(i)} + c_i),   (10)
where the sum Σ_{x∈Ψ∩D^{J_b(i)}_{l(n)}} is over all events in block D^{J_b(i)}_{l(n)} translated into block D^i_{l(n)}. Based on the S^b_n, b = 1, …, B, thus obtained, we can calculate the sample covariance of the S^b_n and use that as an estimate for cov(S_n). Note that for a given realization of N on D_n, the application of the thinning algorithm given in (7) can yield multiple thinned realizations. This will in turn lead to multiple sample covariance matrices by applying the proposed block bootstrap procedure. One possible solution is to simply average all of the obtained sample covariance matrices so as to produce a single estimate for cov(S_n). To do this, let K denote the total number of thinned processes, and let S^b_{n,j} and S̄_{n,j} denote S^b_n and the bootstrapped mean of the S^b_n, b = 1, …, B, for the jth thinned process, where S^b_n is as defined in (10). We thus obtain the following estimator for the covariance matrix of S_n:

    ĉov(S_n) = (1/K) Σ_{j=1}^K Σ_{b=1}^B (S^b_{n,j} − S̄_{n,j})(S^b_{n,j} − S̄_{n,j})′ / (B − 1).   (11)
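A minimal sketch of the resampling in (10), assuming a single thinned pattern (K = 1), a scalar statistic, and the unit square split into an n_side × n_side grid; all names here are ours. With K > 1 thinnings, (11) simply averages the resulting sample variances.

```python
import numpy as np

rng = np.random.default_rng(1)

def block_bootstrap_var(points, Z, n_side, B=200):
    """Block bootstrap variance of S = sum_x Z(x) for a pattern on
    [0,1]^2, as in (10): resample the n_side x n_side grid of subblocks
    with replacement and evaluate Z at the translated locations."""
    k = n_side * n_side
    h = 1.0 / n_side
    centers = np.array([[(a + 0.5) * h, (b + 0.5) * h]
                        for a in range(n_side) for b in range(n_side)])
    idx = np.minimum((points / h).astype(int), n_side - 1)
    bid = idx[:, 0] * n_side + idx[:, 1]
    offsets = [points[bid == j] - centers[j] for j in range(k)]  # x - c_j
    S_b = np.empty(B)
    for b in range(B):
        J = rng.integers(0, k, size=k)     # which block lands in slot i
        S_b[b] = sum(Z(offsets[J[i]] + centers[i]).sum() for i in range(k))
    return S_b.var(ddof=1)                 # sample variance, cf. (11)

# toy use on a stand-in (already thinned) pattern, with scalar Z(s) = s_1
pts = rng.uniform(size=(400, 2))
v = block_bootstrap_var(pts, lambda s: s[:, 0], n_side=4)
```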
3.2 Consistency and Practical Issues
To study the asymptotic properties of ĉov(S_n), we also assume the following condition:

    (1/|D_n|^{1/2}) ∫_{D_n} ∫_{R^2∖D_n} |g(s_1 − s_2) − 1| ds_1 ds_2 < C,   (12)

where R^2∖D_n = {s : s ∈ R^2 but s ∉ D_n}. Condition (12) is only slightly stronger than ∫_{R^2} |g(s) − 1| ds < C, which is implied by condition (6). In the case where the D_n's are sufficiently regular (e.g., a sequence of n × n squares) such that |(D_n + s) ∩ (R^2∖D_n)| < C‖s‖|D_n|^{1/2}, condition (12) is implied by the condition ∫_{R^2} ‖s‖|g(s) − 1| ds < C, which holds for any process that has a finite range of dependence. Furthermore, it holds for the LGCP if the correlation function of the underlying Gaussian random field generating the intensities decays to 0 at a rate faster than ‖s‖^{−3}.
Theorem 2. Assume that conditions (2), (4), (5), and (6) hold. Then

    (1/|D_n|) ĉov(S_n) →^{L_2} (1/|D_n|) cov(S_n).

If we further assume (12), then (1/|D_n|^2) E[ĉov(S_n) − cov(S_n)]^2 ≤ C[1/k_n + 1/|D_{l(n)}|] for some C < ∞.

Proof. See Appendix B.
Theorem 2 establishes the L_2 consistency of the proposed covariance matrix estimator for a wide range of values for the number of thinned processes and the subblock size. But to apply the proposed method in practice, these values must be determined. For the former, the proposed thinning algorithm can be applied as many times as necessary until a stable estimator is obtained. For the latter, Theorem 2 suggests that the "optimal" rate for the subblock size is |D_n|^{1/2}, where "optimal" is in the sense of minimizing the mean squared error. This result agrees with the findings for the optimal subblock size for other resampling variance estimators obtained from quantitative processes (e.g., Sherman 1996). If we further assume that the best rate can be approximated by c_n = c|D_n|^{1/2}, where c is an unknown constant, then we can adapt the algorithm of Hall and Jing (1996) to estimate c_n and thus determine the "optimal" subblock size.
Specifically, let D^i_m, i = 1, …, k′_n, be a set of subblocks contained in D_n that have the same size and shape, let S^i_{m,j} be S_n in (8) obtained from the jth thinned process on D^i_m, and let S̄_{m,j} be the mean of the S^i_{m,j}. We then define the sample variance–covariance matrix of the S^i_{m,j}, averaged over the K replicate thinned point processes:

    θ̂_m = (1/K) Σ_{j=1}^K Σ_{i=1}^{k′_n} (S^i_{m,j} − S̄_{m,j})(S^i_{m,j} − S̄_{m,j})′ / (k′_n − 1).

If D^i_m is relatively small compared with D_n, then the foregoing should be a good estimator for the true covariance matrix of S_m, where S_m is S_n in (8) defined on D_m. For each D^i_m, we then apply (11) to estimate cov(S_m) over a fine set of candidate subblock sizes, say c′_m = c′|D^i_m|^{1/2}. Let θ̂_m(j, k) and [θ̂^i_m(j, k)]′ be the (j, k)th elements of θ̂_m and of the resulting estimator obtained using c′_m. We define the best c_m as the one that has the smallest value of the following criterion:

    M(c′_m) = (1/k′_n) Σ_{i=1}^{k′_n} Σ_{j=1}^p Σ_{k=1}^p {[θ̂^i_m(j, k)]′ − θ̂_m(j, k)}^2.

The "optimal" subblock size for D_n, c_n, then can be estimated easily using the relationship c_n = ĉ_m(|D_n|/|D_m|)^{1/2}.
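The selection rule above can be sketched as a simple grid search; the inputs (a target variance on the subwindows and per-subwindow bootstrap estimates for each candidate size) are hypothetical stand-ins, and we take p = 1 so the criterion M reduces to a mean squared difference.

```python
import numpy as np

def select_block_size(theta_target, theta_boot, candidates):
    """Return the candidate subblock size c'_m minimizing the criterion
    M(c'_m): the mean squared difference between the per-subwindow
    bootstrap estimates and the target variance theta_hat_m."""
    M = {c: np.mean([(t - theta_target) ** 2 for t in theta_boot[c]])
         for c in candidates}
    return min(M, key=M.get)

# hypothetical inputs: target variance 1.0 on the subwindows D_m;
# one bootstrap estimate per subwindow for each candidate size
theta_boot = {0.25: [0.9, 1.1, 1.0], 0.5: [0.6, 0.7, 0.65]}
best = select_block_size(1.0, theta_boot, [0.25, 0.5])

# rescale to the full window via c_n = c_m * (|D_n| / |D_m|)^(1/2),
# here with |D_n| = 16 and |D_m| = 1
c_n = best * (16.0 / 1.0) ** 0.5
```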
4. SIMULATIONS

4.1 Log-Gaussian Cox Process
We performed a simulation study to test our theoretical findings. On an observation region D of size L × L, where L = 1, 2, and 4, we simulated 1,000 realizations of an inhomogeneous Poisson process with intensity function λ(s) given by

    log λ(s) = α + βX(s) + G(s),   (13)

where G is a mean-0 Gaussian random field. This is the LGCP model, with inhomogeneity due to the covariate X.

For values of X, we used a single realization from a different mean-0 Gaussian random field. We kept this realization fixed throughout the whole simulation study; that is, we used the same X for each of the 1,000 point process realizations. Figure 1 shows a surface plot of X for L = 4; the values of X for L = 1 and 2 are the lower-left corner of those shown in the figure. For G, we used the exponential model C(h) = σ^2 exp(−h/ρ) for its covariance function, where ρ = 1 and 2 and σ = 1. We set α and β to be 7.02 and 2. We generated the point process patterns using the rpoispp function in the spatstat R package (Baddeley and Turner 2005). The average numbers of points simulated were about 1,200, 4,500, and 16,700 for L = 1, 2, and 4.

For each of the 1,000 point patterns, we obtained the estimates α̂ and β̂ by maximizing the Poisson likelihood (using the ppm function in spatstat). Next, we thinned the point pattern according to the criterion described in Section 3, resulting in 20 independent thinned processes for each realization. The average number of points in the thinned patterns was roughly 490, 1,590, and 5,400 for L = 1, 2, and 4, so about 30–40% of the points were retained.

Figure 1. Image of the covariate X used in the simulation. The range of X is between −.6066 and .5303, where darker colors represent larger values of X.

We then applied the bootstrap procedure to each of the thinned processes to get the variance estimates for α̂ and β̂. From (13), we found that the quantities to be computed with each bootstrap sample were Σ λ(s*) for α̂ and Σ X(s*)λ(s*) for β̂, where s* represents the locations of the points in a bootstrap sample. We used nonoverlapping square blocks in the bootstrap procedure. For L = 1, we used blocks with side lengths 1/4, 1/3, and 1/2, whereas for L = 2, 4 we used blocks with side lengths 1/4, 1/3, 1/2, and 1. The number of bootstrap samples was 499.

We computed the variances of α̂ and β̂ from the bootstrap variances using Theorem 1, (9), and the relationship of cov(S_n) to B_n. We averaged the variance estimates across the independent thinnings. Finally, we constructed nominal 95% confidence intervals, yielding a total of 1,000 confidence intervals each for α and β.
Table 1 gives the empirical coverage of the confidence intervals for β. The coverage for α (not shown) is similar but slightly lower. For both values of ρ, we found that the empirical coverage increased toward the nominal level as the observation region size increased. This increase in coverage agrees with what we would expect from the theory. The empirical coverage for ρ = 1 is typically closer to the nominal level than for ρ = 2. Note that ρ = 1 corresponds to weaker dependence; thus we would expect our method to work better when the dependence in the data is relatively weaker. It is interesting to note that for L = 2 and 4, the best coverage for ρ = 2 was achieved by using a block larger than its counterpart for ρ = 1. Thus, for errors with longer-range dependence, it appears that the block bootstrap procedure requires larger blocks to work well.
Table 2 gives the bootstrap estimates of the standard errors of β̂ for ρ = 2, averaged over the 1,000 realizations. The standard deviations of these estimates over the 1,000 realizations are included in parentheses. The "true" standard errors, computed using the estimated regression parameters from the independent realizations, are also given. The table shows that the bootstrap estimates have a negative bias, becoming proportionally closer to the "true" standard errors as the region size L increases.

In summary, we found that our method worked reasonably well when the regression parameters were estimated. We found a slight undercoverage in each instance, but noted that this undercoverage becomes smaller as the sample size increases. We also estimated the standard errors by using the true regression parameters in our thinning step and found that the coverage increased only slightly.
Table 1. Empirical coverage of 95% nominal confidence intervals of β in the LGCP case

                        Block size
    Region size    1/4    1/3    1/2     1
    ρ = 1
      1            .85    .77    .49     –
      2            .91    .90    .88    .49
      4            .93    .93    .93    .90
    ρ = 2
      1            .84    .76    .50     –
      2            .87    .88    .87    .50
      4            .88    .89    .90    .89
Table 2. Estimates of the standard errors of β̂ for ρ = 2 in the LGCP case

                              Block size
    Region size   True SE   1/4           1/3           1/2           1
    1             .24       .17 (.04)     .16 (.05)     .13 (.06)     –
    2             .12       .10 (.02)     .10 (.02)     .10 (.03)     .08 (.03)
    4             .07       .056 (.003)   .057 (.004)   .060 (.006)   .060 (.008)
4.2 Inhomogeneous Neyman–Scott Process

Theorem 1 gives the variance in terms of the FOIF and the SOCF. Another approach to variance estimation is to use the plug-in method. As pointed out by one referee, we may simply estimate the PCF or the K-function using the original data (Baddeley et al. 2000). This will yield an estimate of the SOCF, which in turn can be plugged into the expression for B_n in Theorem 1 to obtain an estimate for the asymptotic variance. Waagepetersen (2007) studied the performance of this method in the INSP case.

To compare the performance of our method with the plug-in method, we also simulated data from the INSP model given by Waagepetersen (2007). To do so, we first simulated a homogeneous Poisson process with intensity κ = 50 as the parent process. For each parent, we then generated a Poisson number of offspring. We defined the position of each offspring relative to its parent by a radially symmetric Gaussian random variable (e.g., Diggle 2003). Let ω denote the standard deviation of this variable; we used ω = .02, .04, representing relatively strong and weak clustering. Finally, we thinned the offspring as done by Waagepetersen (2007), by setting the probability of retaining an offspring equal to the intensity at the offspring location divided by the maximum intensity in the study region. For the intensity, we used log λ(s) = α + βX(s), where α, β, and X(s) were as defined in the LGCP case.
We simulated 1,000 realizations of the process on both a 1 × 1 square and a 2 × 2 square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of α̂ and β̂ and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters κ and ω by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing

    ∫_0^a [K̂(t)^c − K(t; κ, ω)^c]^2 dt   (14)

with respect to (κ, ω) for a specified a and c, where K̂(t) and K(t; κ, ω) are the empirical and theoretical K functions. We used a = 4ω and c = .25, and then used the estimated values of κ and ω to estimate the standard errors of α̂ and β̂. To do this, we used the inhomthomasasympcov function in the InhomCluster R package, which is available on Waagepetersen's website at http://www.math.aau.dk/~rw/sppcode.
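A sketch of the minimum contrast criterion (14) as a grid search; the "empirical" and model K functions below are toy stand-ins (not the Thomas-process K), chosen so that the correct parameter pair attains zero contrast.

```python
import numpy as np

def min_contrast(K_emp, t, K_model, grid, a, c=0.25):
    """Grid-search version of (14): minimize the integrated squared
    difference of K^c between the empirical and model K functions
    over t in [0, a]."""
    tt = t[t <= a]
    best, best_val = None, np.inf
    for kappa, omega in grid:
        diff = K_emp[: len(tt)] ** c - K_model(tt, kappa, omega) ** c
        val = float(((diff ** 2)[:-1] * np.diff(tt)).sum())  # crude integral
        if val < best_val:
            best, best_val = (kappa, omega), val
    return best

# toy check: "empirical" K is the Poisson K(t) = pi t^2, and the
# hypothetical model reduces to it exactly when omega = 0
t = np.linspace(0.01, 1.0, 100)
K_emp = np.pi * t ** 2
K_model = lambda tt, kappa, omega: np.pi * tt ** 2 * (1.0 + omega / kappa)
fit = min_contrast(K_emp, t, K_model, [(50, 0.0), (50, 0.5)], a=0.8)
```

In practice one would replace the grid search with a numerical optimizer and use the empirical and theoretical K functions of the fitted cluster model.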
Table 3 gives the empirical coverage of the confidence intervals for β. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for L = 2 and ω = .02, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for ω = .02 than for ω = .04, because the former yielded a process with shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF model was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters a and c in (14). In the simulation, we also considered using the default value of a given by spatstat. The results, not shown here, suggest that the coverage was often worse than that achieved by our method.
5. AN APPLICATION
We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census, using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), who considered the following FOIF model:

    λ(s) = exp[β_0 + β_1 E(s) + β_2 G(s)],

where E(s) and G(s) are the (estimated) elevation and gradient at location s. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1); specifically, he obtained the estimates β̂_1 = .02 and β̂_2 = 5.84. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that β_2 was significant but β_1 was not.
Table 3. Empirical coverage of 95% nominal confidence intervals of β obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                              Block size
    Region size   Plug-in    1/4    1/3    1/2
    ω = .02
      1           .96        .91    .89    .81
      2           .96        .94    .94    .94
    ω = .04
      1           .93        .87    .87    .77
      2           .95        .90    .90    .91
Figure 2. Locations of BPL trees.
We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering was due to seed dispersal alone, it probably was due to seed dispersal that had occurred over multiple generations. As a result, the validity of the model and the subsequent estimated standard errors and inference on the regression parameters required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.

To estimate the standard errors for β̂_1 and β̂_2, we applied (11) with K = 100 and B = 999 and used 200 × 100 m subblocks. We chose K = 100 because the trace plots for the estimated standard errors became fairly stable once 100 or more thinned realizations were used. We selected the subblock size using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for β̂_1 and β̂_2 using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method but nevertheless lead to the same conclusion as the plug-in method; specifically, they suggest that β_2 is significant but β_1 is not.
Table 4. Results for the BPL data

                  Plug-in method              Thinned block bootstrap
          EST     STD    95% CI              STD     95% CI
    β_1   .02     .02    (−.02, .06)         .017    (−.014, .054)
    β_2   5.84    2.53   (.89, 10.80)        2.12    (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.
From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007) using less restrictive assumptions.
APPENDIX A: PROOF OF THEOREM 1

Let β_0 represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that β and β_0 are both scalars; that is, p = 1. In the case of p > 1, the proof can be easily generalized by applying the Cramér–Wold device.

Lemma A.1. Assume that conditions (4)–(6) hold. Then |D_n|^2 E[U^{(1)}_n(β_0)]^4 < C for some C < ∞.
Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that λ^{(1)}(s, β_0)/λ(s, β_0) is bounded due to conditions (4) and (5), we can derive that |D_n|^4 E[U^{(1)}_n(β_0)]^4 is bounded by the following (ignoring some multiplicative constants):

    ∫∫∫∫ |Q_4(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4 + [∫∫ |Q_2(s_1, s_2)| ds_1 ds_2]^2
      + ∫∫∫ |Q_3(s_1, s_2, s_3)| ds_1 ds_2 ds_3 + |D_n| ∫∫ |Q_2(s_1, s_2)| ds_1 ds_2
      + ∫∫ |Q_2(s_1, s_2)| ds_1 ds_2,

where all of the integrals are over D_n. It then follows from condition (6) that the foregoing sum is of order |D_n|^2. Thus the lemma is proven.
Proof of Theorem 1

First note that

    U^{(1)}_n(β) = (1/|D_n|) [Σ_{x∈N∩D_n} λ^{(1)}(x, β)/λ(x, β) − ∫ λ^{(1)}(s, β) ds]

and

    U^{(2)}_n(β) = (1/|D_n|) [Σ_{x∈N∩D_n} λ^{(2)}(x, β)/λ(x, β) − ∫ λ^{(2)}(s, β) ds]
                   − (1/|D_n|) Σ_{x∈N∩D_n} [λ^{(1)}(x, β)/λ(x, β)]^2
                 = U^{(2)}_{n,a}(β) − U^{(2)}_{n,b}(β).

Using a Taylor expansion, we can obtain

    |D_n|^{1/2}(β̂_n − β_0) = −[U^{(2)}_n(β*_n)]^{−1} |D_n|^{1/2} U^{(1)}_n(β_0),

where β*_n is between β̂_n and β_0. We need to show that

    U^{(2)}_{n,a}(β_0) →^p 0

and

    U^{(2)}_{n,b}(β_0) →^p (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]^2/λ(s, β_0) ds.
To show the first expression, note that E[U^{(2)}_{n,a}(β_0)] = 0. Thus we need only look at the variance term:

    var[U^{(2)}_{n,a}(β_0)]
      = (1/|D_n|^2) ∫∫ [λ^{(2)}(s_1, β_0)λ^{(2)}(s_2, β_0)/(λ(s_1, β_0)λ(s_2, β_0))] Q_2(s_1, s_2) ds_1 ds_2
        + (1/|D_n|^2) ∫ [λ^{(2)}(s, β_0)]^2/λ(s, β_0) ds,

which converges to 0 due to conditions (4)–(6). Thus U^{(2)}_{n,a}(β_0) →^p 0.

To show the second expression, note that E[U^{(2)}_{n,b}(β_0)] = (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]^2/λ(s, β_0) ds. Thus again we need consider only the variance term:

    var[U^{(2)}_{n,b}(β_0)]
      = (1/|D_n|^2) ∫∫ {[λ^{(1)}(s_1, β_0)λ^{(1)}(s_2, β_0)]^2/(λ(s_1, β_0)λ(s_2, β_0))} Q_2(s_1, s_2) ds_1 ds_2
        + (1/|D_n|^2) ∫ [λ^{(1)}(s, β_0)]^4/λ(s, β_0) ds,

which converges to 0 due to conditions (4)–(6). Thus U^{(2)}_{n,b}(β_0) →^p (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]^2/λ(s, β_0) ds.

Now we want to show that |D_n|^{1/2} U^{(1)}_n(β_0) converges to a normal distribution. First we derive the mean and variance of |D_n|^{1/2} U^{(1)}_n(β_0). Clearly E[|D_n|^{1/2} U^{(1)}_n(β_0)] = 0. For the variance,

    var[|D_n|^{1/2} U^{(1)}_n(β_0)]
      = (1/|D_n|) ∫∫ [λ^{(1)}(s_1, β_0)λ^{(1)}(s_2, β_0)/(λ(s_1, β_0)λ(s_2, β_0))] Q_2(s_1, s_2) ds_1 ds_2
        + (1/|D_n|) ∫ [λ^{(1)}(s, β_0)]^2/λ(s, β_0) ds.

Now consider a new sequence of regions D*_n, where D*_n ⊂ D_n and |D*_n|/|D_n| → 1 as n → ∞. Let U^{(1)}_{n*}(β_0) be the estimating function for the realization of the point process on D*_n. Based on the expression for the variance, we can deduce that

    var[|D_n|^{1/2} U^{(1)}_n(β_0) − |D*_n|^{1/2} U^{(1)}_{n*}(β_0)] → 0   as n → ∞.

Thus

    |D_n|^{1/2} U^{(1)}_n(β_0) ~ |D*_n|^{1/2} U^{(1)}_{n*}(β_0),

where the notation a_n ~ b_n means that a_n and b_n have the same limiting distribution. Thus we need only show that |D*_n|^{1/2} U^{(1)}_{n*}(β_0) converges to a normal distribution for some properly defined D*_n.

To obtain a sequence of D*_n, we apply the blocking technique used by Guan, Sherman, and Calvin (2004). Let l(n) = n^α and m(n) = n^α − n^η for 4/(2 + ε) < η < α < 1, where ε is defined as in condition (3). We first divide the original domain D_n into nonoverlapping l(n) × l(n) subsquares D^i_{l(n)}, i = 1, …, k_n. Within each subsquare, we further obtain D^i_{m(n)}, an m(n) × m(n) square sharing the same center as D^i_{l(n)}. Note that d(D^i_{m(n)}, D^j_{m(n)}) ≥ n^η for i ≠ j. Now define

    D*_n = ∪_{i=1}^{k_n} D^i_{m(n)}.

Condition (2) implies that |D*_n|/|D_n| → 1. Thus we need only show that |D*_n|^{1/2} U^{(1)}_{n*}(β_0) converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and an application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.
APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let φ(u) = g(u) − 1, φ_k(s_1, …, s_k) = Q_k(s_1, …, s_k)/[λ(s_1)···λ(s_k)] and g_k(s_1, …, s_k) = λ_k(s_1, …, s_k)/[λ(s_1)···λ(s_k)] for k = 3 and 4, and let Z(s) = λ^{(1)}(s). As in the proof of Theorem 1, we assume that β and β_0 are both scalars; that is, p = 1. As a result, Z(s) is a scalar, so that cov(S_n) = var(S_n). Furthermore, we consider only the case where K = 1. Thus the covariance estimator defined in (11) becomes

    v̂ar(S_n) = Σ_{b=1}^B (S^b_n − S̄_n)^2/(B − 1).
As B goes to infinity, we see that the foregoing converges to

    var(S^b_n | Ψ ∩ D_n)
      = (1/k_n) Σ_{i=1}^{k_n} Σ_{j=1}^{k_n} Σ_{x_1∈D^j_{l(n)}} Σ_{x_2∈D^j_{l(n)}} Z(x_1 − c_j + c_i) Z(x_2 − c_j + c_i)
        − (1/k_n^2) Σ_{i=1}^{k_n} Σ_{j_1=1}^{k_n} Σ_{j_2=1}^{k_n} Σ_{x_1∈D^{j_1}_{l(n)}} Σ_{x_2∈D^{j_2}_{l(n)}} Z(x_1 − c_{j_1} + c_i) Z(x_2 − c_{j_2} + c_i),

where the notation Σ_D is short for Σ_{D∩Ψ}. We wish to show that var(S^b_n | Ψ ∩ D_n)/|D_n| converges in L_2 to

    (1/|D_n|) var(S_n) = ((λ_n)^2/|D_n|) ∫_{D_n} ∫_{D_n} Z(s_1) Z(s_2) φ(s_1 − s_2) ds_1 ds_2
                         + (λ_n/|D_n|) ∫_{D_n} [Z(s)]^2 ds.

To show this, we need to show that

    (1/|D_n|) E[var(S^b_n | Ψ ∩ D_n)] → (1/|D_n|) var(S_n)   (B.1)

and

    (1/|D_n|^2) var[var(S^b_n | Ψ ∩ D_n)] → 0.   (B.2)
To show (B.1), note that

    E[var(S^b_n | Ψ ∩ D_n)]
      = (λ_n)^2 Σ_{i=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s_1) Z(s_2) g(s_1 − s_2) ds_1 ds_2
        + λ_n ((k_n − 1)/k_n) ∫_{D_n} [Z(s)]^2 ds
        − ((λ_n)^2/k_n^2) Σ_{i=1}^{k_n} Σ_{j_1=1}^{k_n} Σ_{j_2=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s_1) Z(s_2) g(s_1 − s_2 + c_{j_1} − c_{j_2}) ds_1 ds_2.

Thus

    (1/|D_n|) {E[var(S^b_n | Ψ ∩ D_n)] − var(S_n)}
      = −(λ_n/(k_n|D_n|)) ∫_{D_n} [Z(s)]^2 ds
        − ((λ_n)^2/|D_n|) Σ Σ_{i≠j} ∫_{D^i_{l(n)}} ∫_{D^j_{l(n)}} Z(s_1) Z(s_2) φ(s_1 − s_2) ds_1 ds_2
        − ((λ_n)^2/(|D_n| k_n^2)) Σ_{i=1}^{k_n} Σ_{j_1=1}^{k_n} Σ_{j_2=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s_1) Z(s_2) φ(s_1 − s_2 + c_{j_1} − c_{j_2}) ds_1 ds_2
      = −T^1_n − T^2_n − T^3_n.

T^1_n goes to 0 due to conditions (2) and (5). For T^2_n and T^3_n, lengthy yet elementary derivations yield

    |T^2_n| ≤ (C(λ_n)^2/k_n) Σ_{i=1}^{k_n} [ (1/|D_n|) ∫_{D_n−D_n} |D_n ∩ (D_n − s)| |φ(s)| ds
               − (1/|D^i_{l(n)}|) ∫_{D^i_{l(n)}−D^i_{l(n)}} |D^i_{l(n)} ∩ (D^i_{l(n)} − s)| |φ(s)| ds ]

and

    T^3_n ≤ (C(λ_n)^2/k_n) (1/|D_n|) ∫_{D_n} ∫_{D_n} |φ(s_1 − s_2)| ds_1 ds_2.

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that T^1_n and T^3_n are both of order 1/k_n, whereas T^2_n is of order 1/|D_{l(n)}|^{1/2} due to condition (12).

To prove (B.2), we first note that [var(S^b_n | Ψ ∩ D_n)]^2 is equal to the sum of the following three terms:

    (1/k_n^2) Σ_{i_1} Σ_{i_2} Σ_{j_1} Σ_{j_2} Σ_{x_1∈D^{j_1}_{l(n)}} Σ_{x_2∈D^{j_1}_{l(n)}} Σ_{x_3∈D^{j_2}_{l(n)}} Σ_{x_4∈D^{j_2}_{l(n)}}
        [Z(x_1 − c_{j_1} + c_{i_1}) Z(x_2 − c_{j_1} + c_{i_1}) Z(x_3 − c_{j_2} + c_{i_2}) Z(x_4 − c_{j_2} + c_{i_2})],

    −(2/k_n^3) Σ_{i_1} Σ_{i_2} Σ_{j_1} Σ_{j_2} Σ_{j_3} Σ_{x_1∈D^{j_1}_{l(n)}} Σ_{x_2∈D^{j_1}_{l(n)}} Σ_{x_3∈D^{j_2}_{l(n)}} Σ_{x_4∈D^{j_3}_{l(n)}}
        [Z(x_1 − c_{j_1} + c_{i_1}) Z(x_2 − c_{j_1} + c_{i_1}) Z(x_3 − c_{j_2} + c_{i_2}) Z(x_4 − c_{j_3} + c_{i_2})],

and

    (1/k_n^4) Σ_{i_1} Σ_{i_2} Σ_{j_1} Σ_{j_2} Σ_{j_3} Σ_{j_4} Σ_{x_1∈D^{j_1}_{l(n)}} Σ_{x_2∈D^{j_2}_{l(n)}} Σ_{x_3∈D^{j_3}_{l(n)}} Σ_{x_4∈D^{j_4}_{l(n)}}
        [Z(x_1 − c_{j_1} + c_{i_1}) Z(x_2 − c_{j_2} + c_{i_1}) Z(x_3 − c_{j_3} + c_{i_2}) Z(x_4 − c_{j_4} + c_{i_2})],

where all of the i and j indices run from 1 to k_n. Furthermore, the following relationships are possible for x_1, …, x_4:

a. x_1 = x_2 = x_3 = x_4;
b. x_1 = x_2 and x_3 = x_4 but x_1 ≠ x_3, or x_1 = x_3 and x_2 = x_4 but x_1 ≠ x_2;
c. x_1 = x_2 but x_2 ≠ x_3 ≠ x_4, or x_3 = x_4 but x_1 ≠ x_2 ≠ x_3;
d. x_1 = x_2 = x_3 but x_3 ≠ x_4, or x_1 = x_2 = x_4 but x_1 ≠ x_3, or x_2 = x_3 = x_4 but x_1 ≠ x_2;
e. x_1 ≠ x_2 ≠ x_3 ≠ x_4.

When calculating the variance of var(S^b_n | Ψ ∩ D_n)/|D_n|, the foregoing relationships, combined with the expressions for the squared value of (1/|D_n|) E[var(S^b_n | Ψ ∩ D_n)], in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than 1/k_n in a similar manner.
Let F_1(s_1, s_2, s_3, s_4) = g_4(s_1, s_2, s_3, s_4) − g(s_1, s_2)g(s_3, s_4). Term e leads to the following integral:

    ((λ_n)^4/(k_n^2|D_n|^2)) Σ_{i_1=1}^{k_n} Σ_{i_2=1}^{k_n} Σ_{j_1=1}^{k_n} ∫_{D^{j_1}_{l(n)}} Z(s_1 − c_{j_1} + c_{i_1})
      × [ Σ_{j_2=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} Z(s_2 − c_{j_1} + c_{i_1}) Z(s_3 − c_{j_2} + c_{i_2}) Z(s_4 − c_{j_2} + c_{i_2})
            × F_1(s_1, s_2, s_3, s_4) ds_2 ds_3 ds_4
        − (2/k_n) Σ_{j_2=1}^{k_n} Σ_{j_3=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_3}_{l(n)}} Z(s_2 − c_{j_1} + c_{i_1}) Z(s_3 − c_{j_2} + c_{i_2}) Z(s_4 − c_{j_3} + c_{i_2})
            × F_1(s_1, s_2, s_3, s_4) ds_2 ds_3 ds_4
        + (1/k_n^2) Σ_{j_2=1}^{k_n} Σ_{j_3=1}^{k_n} Σ_{j_4=1}^{k_n} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_3}_{l(n)}} ∫_{D^{j_4}_{l(n)}} Z(s_2 − c_{j_2} + c_{i_1}) Z(s_3 − c_{j_3} + c_{i_2}) Z(s_4 − c_{j_4} + c_{i_2})
            × F_1(s_1, s_2, s_3, s_4) ds_2 ds_3 ds_4 ] ds_1.
Let F_2(s_1, s_2, s_3, s_4) = φ_4(s_1, s_2, s_3, s_4) + 2φ_3(s_1, s_2, s_3) + 2φ_3(s_1, s_3, s_4) + 2φ(s_1 − s_3)φ(s_2 − s_4). By noting that Z(·) is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

    (2C(λ_n)^4/(k_n|D_n|^2)) Σ_{j_1=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_1}_{l(n)}} ∫_{D_n} ∫_{D_n} |F_2(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4
      + (C(λ_n)^4/|D_n|^2) Σ_{j_1=1}^{k_n} Σ_{j_2=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |F_1(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4.

The first integral in the foregoing is of order 1/k_n due to conditions (4), (5), and (6). The second integral is bounded by

    (C(λ_n)^4/|D_n|^2) Σ_{j_1=1}^{k_n} Σ_{j_2=1}^{k_n} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |φ_4(s_1, s_2, s_3, s_4)| ds_1 ds_2 ds_3 ds_4
      + (4C(λ_n)^4/(k_n|D_n|)) Σ_{j_2=1}^{k_n} ∫_{D_n} ∫_{D^{j_2}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |φ_3(s_1, s_3, s_4)| ds_1 ds_3 ds_4
      + (2C(λ_n)^4/|D_n|^2) Σ_{j_1=1}^{k_n} Σ_{j_2=1}^{k_n} [∫_{D^{j_1}_{l(n)}} ∫_{D^{j_2}_{l(n)}} |φ(s_1 − s_3)| ds_1 ds_3]^2.

The foregoing is of order 1/k_n + a_n due to conditions (4), (5), (6), and (12), where the order of a_n is no higher than 1/|D_{l(n)}|.
APPENDIX C: JUSTIFICATION FOR USING β̂_n IN THE THINNING STEP

Here we assume |D_{l(n)}| = o(|D_n|^{1/2}). Let p_n(x, θ) = min_{s∈D_n} λ(s, θ)/λ(x, θ), and let r(x) be a uniform random variable on [0, 1]. If r(x) ≤ p_n(x, θ), where x ∈ (N ∩ D_n), then x will be retained in the thinned process, say Ψ(θ). Based on the same set {r(x) : x ∈ (N ∩ D_n)}, we can determine Ψ(θ_0) and Ψ(θ̂_n). Note that Ψ(θ_0) is as defined in (7) using the true FOIF. Define a = {x : x ∈ Ψ(θ_0), x ∉ Ψ(θ̂_n)} and b = {x : x ∉ Ψ(θ_0), x ∈ Ψ(θ̂_n)}. Note that P(x ∈ a ∪ b | x ∈ N) ≤ |p_n(x, θ̂_n) − p_n(x, θ_0)|. We need to show that

    (1/|D_n|) {var[S^b_n | Ψ(θ_0) ∩ D_n] − var[S^b_n | Ψ(θ̂_n) ∩ D_n]} →^p 0,

where var[S^b_n | Ψ(θ_0) ∩ D_n] is var(S^b_n | Ψ ∩ D_n) as defined in Appendix B; var[S^b_n | Ψ(θ̂_n) ∩ D_n] is defined analogously.

Let Σ_D and Σ Σ^{≠}_D denote Σ_{D∩(a∪b)} and Σ Σ_{x_1∈D∩N, x_2∈D∩(a∪b), x_1≠x_2}. Note that

    (1/|D_n|) |var[S^b_n | Ψ(θ_0) ∩ D_n] − var[S^b_n | Ψ(θ̂_n) ∩ D_n]|
      < (C/(|D_n|k_n)) Σ_{i=1}^{k_n} Σ_{j=1}^{k_n} Σ_{x_1∈D^j_{l(n)}∩N} Σ_{x_2∈D^j_{l(n)}} 1
        + (C/(|D_n|k_n^2)) Σ_{i=1}^{k_n} Σ_{j_1=1}^{k_n} Σ_{j_2=1}^{k_n} Σ_{x_1∈D^{j_1}_{l(n)}∩N} Σ_{x_2∈D^{j_2}_{l(n)}} 1
      = (C/|D_n|) Σ_{j=1}^{k_n} Σ Σ^{≠}_{D^j_{l(n)}} 1 + (C/|D_n|) Σ_{D_n} 1 + (C/(|D_n|k_n)) [Σ_{D_n} 1][Σ_{x∈D_n∩N} 1].

Note that P[x ∈ (a ∪ b) | x ∈ N] ≤ C/|D_n|^{α/2} with probability 1 for some 0 < α < 1, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.
[Received December 2006. Revised May 2007.]
REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.
Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.
Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.
Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.
Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.
Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.
Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.
——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.
Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.
Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.
McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.
Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.
Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.
Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.
Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.
Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.
Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.
——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.
Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.
Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 62, 252–258.
1380 Journal of the American Statistical Association December 2007
$$
+ (\lambda_n)^2 \int_{D_n}\int_{D_n} \frac{\lambda^{(1)}(s_1)\,[\lambda^{(1)}(s_2)]'}{\lambda(s_1)\lambda(s_2)}\, Q_2(s_1, s_2)\, ds_1\, ds_2. \tag{9}
$$

The first term on the right side of (9) depends only on the FOIF and thus can be estimated once a FOIF model has been fitted. The second term, interestingly, is $(\lambda_n)^2(B_n - A_n)$. These facts suggest that if we can estimate $\mathrm{cov}(S_n)$, then we can use the relationship between $\mathrm{cov}(S_n)$ and $B_n$ to estimate $B_n$ in an obvious way. Note that $S_n$ is a statistic defined on $\Psi \cap D_n$, where $\Psi$ is an SOS process. Thus the problem of estimating $B_n$ becomes a problem of estimating the variance of a statistic calculated from an SOS process, which we can accomplish by using the block bootstrap procedure.
Toward that end, we divide $D_n$ into $k_n$ subblocks of the same shape and size. Let $D^i_{l(n)}$, $i = 1, \ldots, k_n$, denote these subblocks, where $l(n) = cn^\alpha$ for some $c > 0$ and $\alpha \in (0,1)$ signifies the subblock size. For each $D^i_{l(n)}$, let $c_i$ denote the "center" of the subblock, where the "centers" are defined in some consistent way across all of the $D^i_{l(n)}$'s. We then resample with replacement $k_n$ subblocks from all of the available subblocks and "glue" them together so as to create a new region that will be used to approximate $D_n$. We do so a large number of times, say $B$. For the $b$th collection of random subblocks, let $J_b$ be the set of $k_n$ random indices sampled from $\{1, \ldots, k_n\}$ that are associated with the selected subblocks. For each event $x \in D^{J_b(i)}_{l(n)}$, we replace it with a new value $x - c_{J_b(i)} + c_i$, so that all events in $D^{J_b(i)}_{l(n)}$ are properly shifted into $D^i_{l(n)}$. In other words, the $i$th sampled subblock $D^{J_b(i)}_{l(n)}$ will be "glued" at where $D^i_{l(n)}$ was in the original data. We then define
$$
S^b_n = \sum_{i=1}^{k_n} \sum_{x \in \Psi \cap D^{J_b(i)}_{l(n)}} \lambda^{(1)}\bigl(x - c_{J_b(i)} + c_i\bigr), \tag{10}
$$

where the sum $\sum_{x \in \Psi \cap D^{J_b(i)}_{l(n)}}$ is over all events in block $D^{J_b(i)}_{l(n)}$ translated into block $D^i_{l(n)}$.
Based on the Sbn b = 1 B thus
obtained we can calculate the sample covariance of Sbn and use
that as an estimate for cov(Sn) Note that for a given realizationof N on Dn the application of the thinning algorithm givenin (7) can yield multiple thinned realizations This will in turnlead to multiple sample covariance matrices by applying theproposed block bootstrap procedure One possible solution is tosimply average all of the obtained sample covariance matricesso as to produce a single estimate for cov(Sn) To do this let K
denote the total number of thinned processes and let Sbnj and
Snj denote Sbn and the bootstrapped mean of Sb
n b = 1 B for the j th thinned process where Sb
n is as defined in (10) Wethus obtain the following estimator for the covariance matrixof Sn
cov(Sn) = 1
K
Ksum
j=1
Bsum
b=1
(Sbnj minus Snj )(Sb
nj minus Snj )prime
B minus 1 (11)
3.2 Consistency and Practical Issues
To study the asymptotic properties of $\widehat{\mathrm{cov}}(S_n)$, we also assume the following condition:

$$
\frac{1}{|D_n|^{1/2}} \int_{D_n} \int_{\mathbb{R}^2 \setminus D_n} |g(s_1 - s_2) - 1|\, ds_1\, ds_2 < C, \tag{12}
$$

where $\mathbb{R}^2 \setminus D_n = \{s : s \in \mathbb{R}^2,\ s \notin D_n\}$. Condition (12) is only slightly stronger than $\int_{\mathbb{R}^2} |g(s) - 1|\, ds < C$, which is implied by condition (6). In the case where the $D_n$'s are sufficiently regular (e.g., a sequence of $n \times n$ squares) such that $|(D_n + s) \cap (\mathbb{R}^2 \setminus D_n)| < C\|s\|\,|D_n|^{1/2}$, condition (12) is implied by the condition $\int_{\mathbb{R}^2} \|s\|\,|g(s) - 1|\, ds < C$, which holds for any process that has a finite range of dependence. Furthermore, it holds for the LGCP if the correlation function of the underlying Gaussian random field generating the intensities decays to 0 at a rate faster than $\|s\|^{-3}$.
Theorem 2. Assume that conditions (2), (4), (5), and (6) hold. Then

$$
\frac{1}{|D_n|}\, \widehat{\mathrm{cov}}(S_n) \stackrel{L_2}{\longrightarrow} \frac{1}{|D_n|}\, \mathrm{cov}(S_n).
$$

If we further assume (12), then $\frac{1}{|D_n|^2} E[\widehat{\mathrm{cov}}(S_n) - \mathrm{cov}(S_n)]^2 \le C[1/k_n + 1/|D_{l(n)}|]$ for some $C < \infty$.

Proof. See Appendix B.
Theorem 2 establishes the $L_2$ consistency of the proposed covariance matrix estimator for a wide range of values of the number of thinned processes and the subblock size. But to apply the proposed method in practice, these values must be determined. For the former, the proposed thinning algorithm can be applied as many times as necessary until a stable estimator is obtained. For the latter, Theorem 2 suggests that the "optimal" rate for the subblock size is $|D_n|^{1/2}$, where "optimal" is in the sense of minimizing the mean squared error. This result agrees with the findings on the optimal subblock size for other resampling variance estimators obtained from quantitative processes (e.g., Sherman 1996). If we further assume that the best rate can be approximated by $c_n = c|D_n|^{1/2}$, where $c$ is an unknown constant, then we can adapt the algorithm of Hall and Jing (1996) to estimate $c_n$ and thus determine the "optimal" subblock size.
Specifically, let $D^i_m$, $i = 1, \ldots, k'_n$, be a set of subblocks contained in $D_n$ that have the same size and shape, let $S^i_{m,j}$ be $S_n$ in (8) obtained from the $j$th thinned process on $D^i_m$, and let $\bar{S}_{m,j}$ be the mean of the $S^i_{m,j}$. We then define the sample variance–covariance matrix of the $S^i_{m,j}$, averaged over the $K$ replicate thinned point processes:

$$
\hat{\theta}_m = \frac{1}{K} \sum_{j=1}^{K} \sum_{i=1}^{k'_n} \frac{(S^i_{m,j} - \bar{S}_{m,j})(S^i_{m,j} - \bar{S}_{m,j})'}{k'_n - 1}.
$$

If $D^i_m$ is relatively small compared with $D_n$, then the foregoing should be a good estimator for the true covariance matrix of $S_m$, where $S_m$ is $S_n$ in (8) defined on $D_m$. For each $D^i_m$, we then apply (11) to estimate $\mathrm{cov}(S_m)$ over a fine set of candidate subblock sizes, say $c'_m = c'|D^i_m|^{1/2}$. Let $\hat{\theta}_m(j,k)$ and $[\hat{\theta}^i_m(j,k)]'$ be the $(j,k)$th elements of $\hat{\theta}_m$ and of the resulting estimator using $c'_m$. We define the best $c_m$ as the one that has the smallest value of the following criterion:

$$
M(c'_m) = \frac{1}{k'_n} \sum_{i=1}^{k'_n} \sum_{j=1}^{p} \sum_{k=1}^{p} \bigl\{ [\hat{\theta}^i_m(j,k)]' - \hat{\theta}_m(j,k) \bigr\}^2.
$$

The "optimal" subblock size for $D_n$, $c_n$, can then be estimated easily using the relationship $c_n = c_m(|D_n|/|D_m|)^{1/2}$.
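The block-size selection just described can be sketched as follows, in the scalar case $p = 1$. The per-subwindow bootstrap estimates and the benchmark $\hat\theta_m$ are assumed to have been computed already; the function and variable names below are hypothetical, for illustration only.

```python
import numpy as np

def select_block_size(theta_m, theta_by_c, area_n, area_m):
    """Scalar (p = 1) version of the criterion M(c'_m): for each candidate
    scale c', average the squared deviations of the k'_n per-subwindow
    bootstrap estimates from the benchmark theta_m, pick the minimizer, and
    rescale to the full window via c_n = c_best * sqrt(|D_n| / |D_m|)."""
    scores = {c: float(np.mean((np.asarray(est) - theta_m) ** 2))
              for c, est in theta_by_c.items()}
    c_best = min(scores, key=scores.get)
    return c_best * np.sqrt(area_n / area_m), scores

# toy usage with made-up estimates: candidate 0.25 tracks theta_m = 2.0 best
cands = {0.25: [1.9, 2.1, 2.0], 0.5: [1.2, 3.0, 2.6]}
c_n, scores = select_block_size(2.0, cands, area_n=16.0, area_m=1.0)
```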
Guan and Loh Block Bootstrap Variance Estimation 1381
4. SIMULATIONS

4.1 Log-Gaussian Cox Process
We performed a simulation study to assess our theoretical findings. On an observation region $D$ of size $L \times L$, where $L = 1, 2$, and 4, we simulated 1,000 realizations of an inhomogeneous Poisson process with intensity function $\lambda(s)$ given by

$$
\log \lambda(s) = \alpha + \beta X(s) + G(s), \tag{13}
$$

where $G$ is a mean-0 Gaussian random field. This is the LGCP model, with inhomogeneity due to the covariate $X$.

For the values of $X$, we used a single realization from a different mean-0 Gaussian random field. We kept this realization fixed throughout the whole simulation study; that is, we used the same $X$ for each of the 1,000 point process realizations. Figure 1 shows a surface plot of $X$ for $L = 4$; the values of $X$ for $L = 1$ and 2 are the lower-left corner of those shown in the figure. For $G$, we used the exponential model $C(h) = \sigma^2 \exp(-h/\rho)$ for its covariance function, where $\rho = 1$ and 2 and $\sigma = 1$. We set $\alpha$ and $\beta$ to be 7.02 and 2. We generated the point process patterns using the rpoispp function in the spatstat R package (Baddeley and Turner 2005). The average numbers of points simulated were about 1,200, 4,500, and 16,700 for $L = 1, 2$, and 4.
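A simple way to generate such patterns without spatstat is to discretize the window and draw Poisson counts cell by cell. The sketch below does this in Python; for brevity it uses independent cell values for $G$ rather than a properly correlated Gaussian random field with exponential covariance, so it only approximates the LGCP setting described above, and the parameter values in the toy call are not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_lgcp(alpha, beta, X, G, L, m):
    """Approximate a Cox-process realization with log intensity
    alpha + beta*X(s) + G(s) by discretizing [0, L]^2 into an m x m grid:
    cell counts are Poisson with mean = cell intensity * cell area, and
    points are placed uniformly within their cells.
    X, G : (m, m) arrays of covariate and random-field values per cell."""
    h = L / m                               # cell width
    lam = np.exp(alpha + beta * X + G)      # intensity per cell
    counts = rng.poisson(lam * h * h)       # Poisson counts, cell by cell
    pts = []
    for i in range(m):
        for j in range(m):
            n = counts[i, j]
            if n:
                u = rng.uniform(0, h, size=(n, 2))
                pts.append(u + [i * h, j * h])
    return np.vstack(pts) if pts else np.empty((0, 2))

# toy usage on a coarse grid; G iid here is a placeholder for a correlated field
m = 20
X = rng.normal(size=(m, m))
G = 0.5 * rng.normal(size=(m, m))
pts = simulate_lgcp(alpha=4.0, beta=0.5, X=X, G=G, L=1.0, m=m)
```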
For each of the 1,000 point patterns, we obtained the estimates $\hat\alpha$ and $\hat\beta$ by maximizing the Poisson likelihood (using the ppm function in spatstat). Next, we thinned the point pattern according to the criterion described in Section 3, resulting in 20 independent thinned processes for each realization. The average numbers of points in the thinned patterns were roughly 490, 1,590, and 5,400 for $L = 1, 2$, and 4, so about 30–40% of the points were retained. We then applied the bootstrap procedure to each of the thinned processes to get the variance estimates for $\hat\alpha$ and $\hat\beta$. From (13), we found that the quantities to be computed with each bootstrap sample were $\sum \hat\lambda(s^*)$ for $\hat\alpha$ and $\sum X(s^*)\hat\lambda(s^*)$ for $\hat\beta$, where $s^*$ represents the locations of the points in a bootstrap sample. We used nonoverlapping square blocks in the bootstrap procedure. For $L = 1$, we used blocks with side lengths 1/4, 1/3, and 1/2, whereas for $L = 2, 4$, we used blocks with side lengths 1/4, 1/3, 1/2, and 1. The number of bootstrap samples was 499.

Figure 1. Image of the covariate $X$ used in the simulation. The range of $X$ is between −0.6066 and 0.5303, where darker colors represent larger values of $X$.
We computed the variances of $\hat\alpha$ and $\hat\beta$ from the bootstrap variances using Theorem 1, (9), and the relationship of $\mathrm{cov}(S_n)$ to $B_n$. We averaged the variance estimates across the independent thinnings. Finally, we constructed nominal 95% confidence intervals, yielding a total of 1,000 confidence intervals for $\alpha$ and $\beta$.
Table 1 gives the empirical coverage of the confidence intervals for $\beta$. The coverage for $\alpha$ (not shown) is similar but slightly lower. For both values of $\rho$, the empirical coverage increased toward the nominal level as the observation region grew. This increase in coverage agrees with what we would expect from the theory. The empirical coverage for $\rho = 1$ is typically closer to the nominal level than for $\rho = 2$; note that $\rho = 1$ corresponds to weaker dependence, so we would expect our method to work better when the dependence in the data is relatively weaker. It is interesting to note that for $L = 2$ and 4, the best coverage for $\rho = 2$ was achieved by using a larger block than its counterpart for $\rho = 1$. Thus, for errors with longer-range dependence, it appears that the block bootstrap procedure requires larger blocks to work well.
Table 2 gives the bootstrap estimates of the standard errors of $\hat\beta$ for $\rho = 2$, averaged over the 1,000 realizations; the standard deviations of these estimates over the 1,000 realizations are given in parentheses. The "true" standard errors, computed using the estimated regression parameters from the independent realizations, are also given. The table shows that the bootstrap estimates have a negative bias, becoming proportionally closer to the "true" standard errors as the region size $L$ increases.
In summary, we found that our method worked reasonably well when the regression parameters were estimated. We found a slight undercoverage in each instance, but noted that this undercoverage becomes smaller as the sample size increases. We also estimated the standard errors by using the true regression parameters in our thinning step and found that the coverage increased only slightly.
Table 1. Empirical coverage of 95% nominal confidence intervals of $\hat\beta$ in the LGCP case

                         Block size
Region size      1/4     1/3     1/2     1

ρ = 1
    1            .85     .77     .49
    2            .91     .90     .88     .49
    4            .93     .93     .93     .90

ρ = 2
    1            .84     .76     .50
    2            .87     .88     .87     .50
    4            .88     .89     .90     .89
Table 2. Estimates of the standard errors of $\hat\beta$ for $\rho = 2$ in the LGCP case

                              Block size
Region size   True SE   1/4          1/3          1/2          1

    1         .24       .17(.04)     .16(.05)     .13(.06)
    2         .12       .10(.02)     .10(.02)     .10(.03)     .08(.03)
    4         .07       .056(.003)   .057(.004)   .060(.006)   .060(.008)
4.2 Inhomogeneous Neyman–Scott Process
Theorem 1 gives the variance in terms of the FOIF and the SOCF. Another approach to variance estimation is the plug-in method. As pointed out by one referee, we may simply estimate the PCF or the $K$-function using the original data (Baddeley et al. 2000). This will yield an estimate of the SOCF, which in turn can be plugged into the expression for $B_n$ in Theorem 1 to obtain an estimate for the asymptotic variance. Waagepetersen (2007) studied the performance of this method in the INSP case.
To compare the performance of our method with the plug-in method, we also simulated data from the INSP model given by Waagepetersen (2007). To do so, we first simulated a homogeneous Poisson process with intensity $\kappa = 50$ as the parent process. For each parent, we then generated a Poisson number of offspring. We defined the position of each offspring relative to its parent by a radially symmetric Gaussian random variable (e.g., Diggle 2003). Let $\omega$ denote the standard deviation of this variable; we used $\omega = 0.02, 0.04$, representing relatively strong and weak clustering. Finally, we thinned the offspring, as done by Waagepetersen (2007), by setting the probability of retaining an offspring equal to the intensity at the offspring location divided by the maximum intensity in the study region. For the intensity, we used $\log \lambda(s) = \alpha + \beta X(s)$, where $\alpha$, $\beta$, and $X(s)$ were as defined in the LGCP case.
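The INSP generation just described can be sketched as follows. The paper's $\kappa = 50$ and $\omega = 0.02$ are used in the toy call, but the mean number of offspring per parent (`mu`) and the intensity function are hypothetical choices introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_insp(kappa, mu, omega, lam, lam_max, L=1.0):
    """Inhomogeneous Neyman-Scott (Thomas) sketch: Poisson(kappa*L^2) parents,
    Poisson(mu) offspring per parent with N(0, omega^2 I) displacements, then
    location-dependent thinning with retention probability lam(s)/lam_max."""
    n_par = rng.poisson(kappa * L * L)
    parents = rng.uniform(0, L, size=(n_par, 2))
    offspring = []
    for p in parents:
        n_off = rng.poisson(mu)
        if n_off:
            offspring.append(p + omega * rng.standard_normal((n_off, 2)))
    if not offspring:
        return np.empty((0, 2))
    pts = np.vstack(offspring)
    inside = np.all((pts >= 0) & (pts <= L), axis=1)  # keep points in window
    pts = pts[inside]
    keep = rng.uniform(size=len(pts)) < lam(pts) / lam_max
    return pts[keep]

# toy usage: kappa = 50 parents as in the simulation, log-linear intensity
lam = lambda p: np.exp(0.5 * p[:, 0])        # hypothetical lam(s)
pts = simulate_insp(kappa=50, mu=10, omega=0.02, lam=lam,
                    lam_max=float(np.exp(0.5)))
```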
We simulated 1,000 realizations of the process on both a $1 \times 1$ square and a $2 \times 2$ square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of $\hat\alpha$ and $\hat\beta$, and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters $\kappa$ and $\omega$ by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing

$$
\int_0^a \bigl[\hat{K}(t)^c - K(t; \kappa, \omega)^c\bigr]^2\, dt \tag{14}
$$

with respect to $(\kappa, \omega)$ for a specified $a$ and $c$, where $\hat{K}(t)$ and $K(t; \kappa, \omega)$ are the empirical and theoretical $K$ functions. We used $a = 4\omega$ and $c = 0.25$, then used the estimated values of $\kappa$ and $\omega$ to estimate the standard errors of $\hat\alpha$ and $\hat\beta$. To do this, we used the inhomthomasasympcov function in the InhomCluster R package, which is available on Waagepetersen's website at http://www.math.aau.dk/~rw/sppcode.
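As an illustration of (14), the sketch below fits $(\kappa, \omega)$ by a plain grid search, using the standard K-function of a (modified) Thomas process, $K(t; \kappa, \omega) = \pi t^2 + \kappa^{-1}[1 - \exp\{-t^2/(4\omega^2)\}]$. The original analysis used a numerical optimizer rather than a grid search, and the noiseless $\hat K$ here serves only as a sanity check; with real data $\hat K$ would be an empirical estimate.

```python
import numpy as np

def thomas_K(t, kappa, omega):
    """Standard K-function of a (modified) Thomas cluster process."""
    return np.pi * t**2 + (1.0 - np.exp(-t**2 / (4 * omega**2))) / kappa

def min_contrast(t, K_hat, kappas, omegas, c=0.25):
    """Discretized version of (14): grid search over (kappa, omega) for the
    pair minimizing sum_t [K_hat(t)^c - K(t; kappa, omega)^c]^2 * dt."""
    best, argbest = np.inf, None
    dt = t[1] - t[0]
    for k in kappas:
        for w in omegas:
            obj = np.sum((K_hat**c - thomas_K(t, k, w)**c) ** 2) * dt
            if obj < best:
                best, argbest = obj, (k, w)
    return argbest

# sanity check: recover the generating parameters from a noiseless K-curve
t = np.linspace(1e-3, 0.08, 80)            # a = 4 * omega with omega = 0.02
K_hat = thomas_K(t, kappa=50, omega=0.02)
k_hat, w_hat = min_contrast(t, K_hat,
                            kappas=np.linspace(20, 80, 31),
                            omegas=np.linspace(0.01, 0.04, 31))
```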
Table 3 gives the empirical coverage of the confidence intervals for $\hat\beta$. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for $L = 2$ and $\omega = 0.02$, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for $\omega = 0.02$ than for $\omega = 0.04$, because the former yielded a process with shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF model was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters $a$ and $c$ in (14). In the simulation, we also considered using the default value of $a$ given by spatstat; the results, not shown here, suggested that the coverage was often worse than that achieved by our method.
5. AN APPLICATION

We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census, using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), who considered the following FOIF model:

$$
\lambda(s) = \exp[\beta_0 + \beta_1 E(s) + \beta_2 G(s)],
$$

where $E(s)$ and $G(s)$ are the (estimated) elevation and gradient at location $s$. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1); specifically, he obtained the estimates $\hat\beta_1 = 0.02$ and $\hat\beta_2 = 5.84$. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that $\beta_2$ was significant but $\beta_1$ was not.
Table 3. Empirical coverage of 95% nominal confidence intervals of $\hat\beta$ obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                              Block size
Region size   Plug-in    1/4     1/3     1/2

ω = 0.02
    1         .96        .91     .89     .81
    2         .96        .94     .94     .94

ω = 0.04
    1         .93        .87     .87     .77
    2         .95        .90     .90     .91
Figure 2. Locations of BPL trees.
We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering were due to seed dispersal alone, it probably arose from seed dispersal occurring over multiple generations. As a result, the validity of the model, and of the resulting estimated standard errors and inference on the regression parameters, required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.
To estimate the standard errors for $\hat\beta_1$ and $\hat\beta_2$, we applied (11) with $K = 100$ and $B = 999$ and used 200 × 100 m subblocks. We chose $K = 100$ because the trace plots for the estimated standard errors became fairly stable once 100 or more thinned realizations were used. We selected the subblock size using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for $\hat\beta_1$ and $\hat\beta_2$ using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method, but nevertheless lead to the same conclusion: they suggest that $\beta_2$ is significant but $\beta_1$ is not.
Table 4. Results for the BPL data

              Plug-in method                 Thinned block bootstrap
       EST    STD     95% CI                 STD      95% CI

β1     .02    .02     (−.02, .06)            .017     (−.014, .054)
β2     5.84   2.53    (.89, 10.80)           2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.
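The intervals in Table 4 are ordinary normal confidence intervals, est ± 1.96 × std. As a quick consistency check on the reconstructed table values (they agree up to rounding of the reported estimates):

```python
# Normal 95% confidence interval, est +/- 1.96 * std, as used in Table 4.
def ci(est, std, z=1.96):
    return est - z * std, est + z * std

lo, hi = ci(5.84, 2.12)   # beta_2, thinned block bootstrap column
```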
From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007), using less restrictive assumptions.
APPENDIX A: PROOF OF THEOREM 1

Let $\beta_0$ represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. In the case of $p > 1$, the proof can be easily generalized by applying the Cramér–Wold device.
Lemma A.1. Assume that conditions (4)–(6) hold. Then $|D_n|^2\, E[U^{(1)}_n(\beta_0)]^4 < C$ for some $C < \infty$.

Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that $\lambda^{(1)}(s; \beta_0)/\lambda(s; \beta_0)$ is bounded due to conditions (4) and (5), we can derive that $|D_n|^4 E[U^{(1)}_n(\beta_0)]^4$ is bounded by the following (ignoring some multiplicative constants):

$$
\int\!\int\!\int\!\int |Q_4(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4
+ \left[\int\!\int |Q_2(s_1, s_2)|\, ds_1\, ds_2\right]^2
+ \int\!\int\!\int |Q_3(s_1, s_2, s_3)|\, ds_1\, ds_2\, ds_3
+ |D_n| \int\!\int |Q_2(s_1, s_2)|\, ds_1\, ds_2
+ \int\!\int |Q_2(s_1, s_2)|\, ds_1\, ds_2,
$$

where all of the integrals are over $D_n$. It then follows from condition (6) that the foregoing sum is of order $|D_n|^2$. Thus the lemma is proven.
Proof of Theorem 1

First note that

$$
U^{(1)}_n(\beta) = \frac{1}{|D_n|}\left[\sum_{x \in N \cap D_n} \frac{\lambda^{(1)}(x; \beta)}{\lambda(x; \beta)} - \int \lambda^{(1)}(s; \beta)\, ds\right]
$$

and

$$
U^{(2)}_n(\beta) = \frac{1}{|D_n|}\left[\sum_{x \in N \cap D_n} \frac{\lambda^{(2)}(x; \beta)}{\lambda(x; \beta)} - \int \lambda^{(2)}(s; \beta)\, ds\right]
- \frac{1}{|D_n|}\sum_{x \in N \cap D_n} \left[\frac{\lambda^{(1)}(x; \beta)}{\lambda(x; \beta)}\right]^2
= U^{(2)}_{n,a}(\beta) - U^{(2)}_{n,b}(\beta).
$$

Using a Taylor expansion, we obtain

$$
|D_n|^{1/2}(\hat\beta_n - \beta_0) = -\left[U^{(2)}_n(\tilde\beta_n)\right]^{-1} |D_n|^{1/2}\, U^{(1)}_n(\beta_0),
$$

where $\tilde\beta_n$ is between $\hat\beta_n$ and $\beta_0$. We need to show that

$$
U^{(2)}_{n,a}(\beta_0) \stackrel{p}{\to} 0
\quad\text{and}\quad
U^{(2)}_{n,b}(\beta_0) \stackrel{p}{\to} \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s; \beta_0)]^2}{\lambda(s; \beta_0)}\, ds.
$$
To show the first expression, note that $E[U^{(2)}_{n,a}(\beta_0)] = 0$. Thus we need only look at the variance term:

$$
\mathrm{var}\bigl[U^{(2)}_{n,a}(\beta_0)\bigr]
= \frac{1}{|D_n|^2} \int\!\int \frac{\lambda^{(2)}(s_1; \beta_0)\,\lambda^{(2)}(s_2; \beta_0)}{\lambda(s_1; \beta_0)\,\lambda(s_2; \beta_0)}\, Q_2(s_1, s_2)\, ds_1\, ds_2
+ \frac{1}{|D_n|^2} \int \frac{[\lambda^{(2)}(s; \beta_0)]^2}{\lambda(s; \beta_0)}\, ds,
$$

which converges to 0 due to conditions (4)–(6). Thus $U^{(2)}_{n,a}(\beta_0) \stackrel{p}{\to} 0$.

To show the second expression, note that $E[U^{(2)}_{n,b}(\beta_0)] = \frac{1}{|D_n|}\int [\lambda^{(1)}(s; \beta_0)]^2/\lambda(s; \beta_0)\, ds$. Thus, again, we need consider only the variance term:

$$
\mathrm{var}\bigl[U^{(2)}_{n,b}(\beta_0)\bigr]
= \frac{1}{|D_n|^2} \int\!\int \frac{[\lambda^{(1)}(s_1; \beta_0)\,\lambda^{(1)}(s_2; \beta_0)]^2}{\lambda(s_1; \beta_0)\,\lambda(s_2; \beta_0)}\, Q_2(s_1, s_2)\, ds_1\, ds_2
+ \frac{1}{|D_n|^2} \int \frac{[\lambda^{(1)}(s; \beta_0)]^4}{\lambda(s; \beta_0)}\, ds,
$$

which converges to 0 due to conditions (4)–(6). Thus $U^{(2)}_{n,b}(\beta_0) \stackrel{p}{\to} \frac{1}{|D_n|}\int [\lambda^{(1)}(s; \beta_0)]^2/\lambda(s; \beta_0)\, ds$.
Now we want to show that $|D_n|^{1/2} U^{(1)}_n(\beta_0)$ converges to a normal distribution. First we derive the mean and variance of $|D_n|^{1/2} U^{(1)}_n(\beta_0)$. Clearly, $E[|D_n|^{1/2} U^{(1)}_n(\beta_0)] = 0$. For the variance,

$$
\mathrm{var}\bigl[|D_n|^{1/2} U^{(1)}_n(\beta_0)\bigr]
= \frac{1}{|D_n|}\int\!\int \frac{\lambda^{(1)}(s_1; \beta_0)\,\lambda^{(1)}(s_2; \beta_0)}{\lambda(s_1; \beta_0)\,\lambda(s_2; \beta_0)}\, Q_2(s_1, s_2)\, ds_1\, ds_2
+ \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s; \beta_0)]^2}{\lambda(s; \beta_0)}\, ds.
$$

Now consider a new sequence of regions $D^*_n$, where $D^*_n \subset D_n$ and $|D^*_n|/|D_n| \to 1$ as $n \to \infty$. Let $U^{(1)}_{n*}(\beta_0)$ be the estimating function for the realization of the point process on $D^*_n$. Based on the expression for the variance, we can deduce that

$$
\mathrm{var}\bigl[|D_n|^{1/2} U^{(1)}_n(\beta_0) - |D^*_n|^{1/2} U^{(1)}_{n*}(\beta_0)\bigr] \to 0 \quad\text{as } n \to \infty.
$$

Thus

$$
|D_n|^{1/2} U^{(1)}_n(\beta_0) \sim |D^*_n|^{1/2} U^{(1)}_{n*}(\beta_0),
$$

where the notation $a_n \sim b_n$ means that $a_n$ and $b_n$ have the same limiting distribution. Thus we need only show that $|D^*_n|^{1/2} U^{(1)}_{n*}(\beta_0)$ converges to a normal distribution for some properly defined $D^*_n$.

To obtain a sequence of $D^*_n$, we apply the blocking technique used by Guan et al. (2004). Let $l(n) = n^\alpha$ and $m(n) = n^\alpha - n^\eta$ for $4/(2+\varepsilon) < \eta < \alpha < 1$, where $\varepsilon$ is defined as in condition (3). We first divide the original domain $D_n$ into nonoverlapping $l(n) \times l(n)$ subsquares $D^i_{l(n)}$, $i = 1, \ldots, k_n$. Within each subsquare, we further obtain $D^i_{m(n)}$, an $m(n) \times m(n)$ square sharing the same center as $D^i_{l(n)}$. Note that $d(D^i_{m(n)}, D^j_{m(n)}) \ge n^\eta$ for $i \ne j$. Now define

$$
D^*_n = \bigcup_{i=1}^{k_n} D^i_{m(n)}.
$$

Condition (2) implies that $|D^*_n|/|D_n| \to 1$. Thus we need only show that $|D^*_n|^{1/2} U^{(1)}_{n*}(\beta_0)$ converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and an application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.
APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let $\varphi(u) = g(u) - 1$, $\varphi_k(s_1, \ldots, s_k) = Q_k(s_1, \ldots, s_k)/[\lambda(s_1) \cdots \lambda(s_k)]$ and $g_k(s_1, \ldots, s_k) = \lambda_k(s_1, \ldots, s_k)/[\lambda(s_1) \cdots \lambda(s_k)]$ for $k = 3$ and 4, and $Z(s) = \lambda^{(1)}(s)$. As in the proof of Theorem 1, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$; as a result, $Z(s)$ is a scalar, so that $\mathrm{cov}(S_n) = \mathrm{var}(S_n)$. Furthermore, we consider only the case where $K = 1$. Thus the covariance estimator defined in (11) becomes

$$
\widehat{\mathrm{var}}(S_n) = \sum_{b=1}^{B} \frac{(S^b_n - \bar{S}_n)^2}{B - 1}.
$$
As $B$ goes to infinity, the foregoing converges to

$$
\mathrm{var}(S^b_n \mid \Psi \cap D_n)
= \frac{1}{k_n} \sum_{i=1}^{k_n} \sum_{j=1}^{k_n} \sum_{x_1 \in D^j_{l(n)}} \sum_{x_2 \in D^j_{l(n)}} Z(x_1 - c_j + c_i)\, Z(x_2 - c_j + c_i)
- \frac{1}{k_n^2} \sum_{i=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \sum_{x_1 \in D^{j_1}_{l(n)}} \sum_{x_2 \in D^{j_2}_{l(n)}} Z(x_1 - c_{j_1} + c_i)\, Z(x_2 - c_{j_2} + c_i),
$$

where the notation $\sum_{x \in D}$ is short for $\sum_{x \in D \cap \Psi}$. We wish to show that $\mathrm{var}(S^b_n \mid \Psi \cap D_n)/|D_n|$ converges in $L_2$ to

$$
\frac{1}{|D_n|}\,\mathrm{var}(S_n)
= \frac{(\lambda_n)^2}{|D_n|} \int_{D_n}\int_{D_n} Z(s_1)\, Z(s_2)\, \varphi(s_1 - s_2)\, ds_1\, ds_2
+ \frac{\lambda_n}{|D_n|} \int_{D_n} [Z(s)]^2\, ds.
$$

To show this, we need to show that

$$
\frac{1}{|D_n|}\, E[\mathrm{var}(S^b_n \mid \Psi \cap D_n)] \to \frac{1}{|D_n|}\,\mathrm{var}(S_n) \tag{B.1}
$$

and

$$
\frac{1}{|D_n|^2}\, \mathrm{var}[\mathrm{var}(S^b_n \mid \Psi \cap D_n)] \to 0. \tag{B.2}
$$
To show (B.1), note that

$$
E[\mathrm{var}(S^b_n \mid \Psi \cap D_n)]
= (\lambda_n)^2 \sum_{i=1}^{k_n} \int\!\int_{D^i_{l(n)}} Z(s_1)\, Z(s_2)\, g(s_1 - s_2)\, ds_1\, ds_2
+ \frac{\lambda_n (k_n - 1)}{k_n} \int_{D_n} [Z(s)]^2\, ds
- \frac{(\lambda_n)^2}{k_n^2} \sum_{i=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \int\!\int_{D^i_{l(n)}} Z(s_1)\, Z(s_2)\, g(s_1 - s_2 + c_{j_1} - c_{j_2})\, ds_1\, ds_2.
$$

Thus

$$
\frac{1}{|D_n|}\Bigl\{E[\mathrm{var}(S^b_n \mid \Psi \cap D_n)] - \mathrm{var}(S_n)\Bigr\}
= -\frac{\lambda_n}{k_n |D_n|} \int_{D_n} [Z(s)]^2\, ds
- \frac{(\lambda_n)^2}{|D_n|} \mathop{\sum\sum}_{i \ne j} \int_{D^i_{l(n)}} \int_{D^j_{l(n)}} Z(s_1)\, Z(s_2)\, \varphi(s_1 - s_2)\, ds_1\, ds_2
- \frac{(\lambda_n)^2}{|D_n|} \frac{1}{k_n^2} \sum_{i=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \int\!\int_{D^i_{l(n)}} Z(s_1)\, Z(s_2)\, \varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\, ds_1\, ds_2
= -T^1_n - T^2_n - T^3_n.
$$
$T^1_n$ goes to 0 due to conditions (2) and (5). For $T^2_n$ and $T^3_n$, lengthy yet elementary derivations yield

$$
|T^2_n| \le \frac{C(\lambda_n)^2}{k_n} \sum_{i=1}^{k_n} \left[\frac{1}{|D_n|} \int_{D_n - D_n} |D_n \cap (D_n - s)|\,|\varphi(s)|\, ds
- \frac{1}{|D^i_{l(n)}|} \int_{D^i_{l(n)} - D^i_{l(n)}} \bigl|D^i_{l(n)} \cap \bigl(D^i_{l(n)} - s\bigr)\bigr|\,|\varphi(s)|\, ds\right]
$$

and

$$
T^3_n \le \frac{C(\lambda_n)^2}{k_n} \frac{1}{|D_n|} \int_{D_n} \int_{D_n} |\varphi(s_1 - s_2)|\, ds_1\, ds_2.
$$

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, we can see that $T^1_n$ and $T^3_n$ are both of order $1/k_n$, whereas $T^2_n$ is of order $1/|D_{l(n)}|^{1/2}$, due to condition (12).

To prove (B.2), we first note that $[\mathrm{var}(S^b_n \mid \Psi \cap D_n)]^2$ is equal to the sum of the following three terms:

$$
\frac{1}{k_n^2} \sum_{i_1=1}^{k_n} \sum_{i_2=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n}
\sum_{x_1 \in D^{j_1}_{l(n)}} \sum_{x_2 \in D^{j_1}_{l(n)}} \sum_{x_3 \in D^{j_2}_{l(n)}} \sum_{x_4 \in D^{j_2}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\, Z(x_2 - c_{j_1} + c_{i_1})\, Z(x_3 - c_{j_2} + c_{i_2})\, Z(x_4 - c_{j_2} + c_{i_2})\bigr],
$$

$$
-\frac{2}{k_n^3} \sum_{i_1=1}^{k_n} \sum_{i_2=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \sum_{j_3=1}^{k_n}
\sum_{x_1 \in D^{j_1}_{l(n)}} \sum_{x_2 \in D^{j_1}_{l(n)}} \sum_{x_3 \in D^{j_2}_{l(n)}} \sum_{x_4 \in D^{j_3}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\, Z(x_2 - c_{j_1} + c_{i_1})\, Z(x_3 - c_{j_2} + c_{i_2})\, Z(x_4 - c_{j_3} + c_{i_2})\bigr],
$$

and

$$
\frac{1}{k_n^4} \sum_{i_1=1}^{k_n} \sum_{i_2=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \sum_{j_3=1}^{k_n} \sum_{j_4=1}^{k_n}
\sum_{x_1 \in D^{j_1}_{l(n)}} \sum_{x_2 \in D^{j_2}_{l(n)}} \sum_{x_3 \in D^{j_3}_{l(n)}} \sum_{x_4 \in D^{j_4}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\, Z(x_2 - c_{j_2} + c_{i_1})\, Z(x_3 - c_{j_3} + c_{i_2})\, Z(x_4 - c_{j_4} + c_{i_2})\bigr].
$$

Furthermore, the following relationships are possible for $x_1, \ldots, x_4$:

a. $x_1 = x_2 = x_3 = x_4$;
b. $x_1 = x_2$ and $x_3 = x_4$, but $x_1 \ne x_3$; or $x_1 = x_3$ and $x_2 = x_4$, but $x_1 \ne x_2$;
c. $x_1 = x_2$, but $x_2 \ne x_3 \ne x_4$; or $x_3 = x_4$, but $x_1 \ne x_2 \ne x_3$;
d. $x_1 = x_2 = x_3$, but $x_3 \ne x_4$; or $x_1 = x_3 = x_4$, but $x_1 \ne x_2$; or $x_2 = x_3 = x_4$, but $x_1 \ne x_2$;
e. $x_1 \ne x_2 \ne x_3 \ne x_4$ (all four distinct).

When calculating the variance of $\frac{1}{|D_n|}\,\mathrm{var}(S^b_n \mid \Psi \cap D_n)$, the foregoing relationships, combined with the expression for the square of $\frac{1}{|D_n|}\, E[\mathrm{var}(S^b_n \mid \Psi \cap D_n)]$, in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than $1/k_n$ in a similar manner.
Let $F_1(s_1, s_2, s_3, s_4) = g_4(s_1, s_2, s_3, s_4) - g(s_1 - s_2)\, g(s_3 - s_4)$. Term e leads to the following integral:

$$
\frac{(\lambda_n)^4}{k_n^2 |D_n|^2} \sum_{i_1=1}^{k_n} \sum_{i_2=1}^{k_n} \sum_{j_1=1}^{k_n} \int_{D^{j_1}_{l(n)}} Z(s_1 - c_{j_1} + c_{i_1})
\times \Biggl[ \sum_{j_2=1}^{k_n} \int_{D^{j_1}_{l(n)}} \int_{D^{j_2}_{l(n)}} \int_{D^{j_2}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})\, Z(s_3 - c_{j_2} + c_{i_2})\, Z(s_4 - c_{j_2} + c_{i_2})\, F_1(s_1, s_2, s_3, s_4)\, ds_2\, ds_3\, ds_4
$$
$$
- \frac{2}{k_n} \sum_{j_2=1}^{k_n} \sum_{j_3=1}^{k_n} \int_{D^{j_1}_{l(n)}} \int_{D^{j_2}_{l(n)}} \int_{D^{j_3}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})\, Z(s_3 - c_{j_2} + c_{i_2})\, Z(s_4 - c_{j_3} + c_{i_2})\, F_1(s_1, s_2, s_3, s_4)\, ds_2\, ds_3\, ds_4
$$
$$
+ \frac{1}{k_n^2} \sum_{j_2=1}^{k_n} \sum_{j_3=1}^{k_n} \sum_{j_4=1}^{k_n} \int_{D^{j_2}_{l(n)}} \int_{D^{j_3}_{l(n)}} \int_{D^{j_4}_{l(n)}} Z(s_2 - c_{j_2} + c_{i_1})\, Z(s_3 - c_{j_3} + c_{i_2})\, Z(s_4 - c_{j_4} + c_{i_2})\, F_1(s_1, s_2, s_3, s_4)\, ds_2\, ds_3\, ds_4 \Biggr]\, ds_1.
$$
Let $F_2(s_1, s_2, s_3, s_4) = \varphi_4(s_1, s_2, s_3, s_4) + 2\varphi_3(s_1, s_2, s_3) + 2\varphi_3(s_1, s_3, s_4) + 2\varphi(s_1 - s_3)\varphi(s_2 - s_4)$. By noting that $Z(\cdot)$ is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

$$
\frac{2C(\lambda_n)^4}{k_n |D_n|^2} \sum_{j_1=1}^{k_n} \int_{D^{j_1}_{l(n)}} \int_{D^{j_1}_{l(n)}} \int_{D_n} \int_{D_n} |F_2(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4
+ \frac{C(\lambda_n)^4}{|D_n|^2} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \int_{D^{j_1}_{l(n)}} \int_{D^{j_1}_{l(n)}} \int_{D^{j_2}_{l(n)}} \int_{D^{j_2}_{l(n)}} |F_1(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4.
$$
The first integral in the foregoing is of order $1/k_n$ due to conditions (4), (5), and (6). The second integral is bounded by

$$
\frac{C(\lambda_n)^4}{|D_n|^2} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \int_{D^{j_1}_{l(n)}} \int_{D^{j_1}_{l(n)}} \int_{D^{j_2}_{l(n)}} \int_{D^{j_2}_{l(n)}} |\varphi_4(s_1, s_2, s_3, s_4)|\, ds_1\, ds_2\, ds_3\, ds_4
+ \frac{4C(\lambda_n)^4}{k_n |D_n|} \sum_{j_2=1}^{k_n} \int_{D_n} \int_{D^{j_2}_{l(n)}} \int_{D^{j_2}_{l(n)}} |\varphi_3(s_1, s_3, s_4)|\, ds_1\, ds_3\, ds_4
+ \frac{2C(\lambda_n)^4}{|D_n|^2} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \left[\int_{D^{j_1}_{l(n)}} \int_{D^{j_2}_{l(n)}} |\varphi(s_1 - s_3)|\, ds_1\, ds_3\right]^2.
$$

The foregoing is of order $1/k_n + a_n$, due to conditions (4), (5), (6), and (12), where the order of $a_n$ is no higher than $1/|D_{l(n)}|$.
APPENDIX C: JUSTIFICATION FOR USING $\hat\beta_n$ IN THE THINNING STEP

Here we assume that $|D_{l(n)}| = o(|D_n|^{1/2})$. Let $p_n(x; \theta) = \min_{s \in D_n} \lambda(s; \theta)/\lambda(x; \theta)$, and let $r(x)$ be a uniform random variable on $[0,1]$. If $r(x) \le p_n(x; \theta)$, where $x \in N \cap D_n$, then $x$ is retained in the thinned process, say $\Psi(\theta)$. Based on the same set $\{r(x) : x \in N \cap D_n\}$, we can determine $\Psi(\theta_0)$ and $\Psi(\hat\theta_n)$; note that $\Psi(\theta_0)$ is as defined in (7), using the true FOIF. Define $\Delta_a = \{x \in \Psi(\theta_0) : x \notin \Psi(\hat\theta_n)\}$ and $\Delta_b = \{x \notin \Psi(\theta_0) : x \in \Psi(\hat\theta_n)\}$. Note that $P(x \in \Delta_a \cup \Delta_b \mid x \in N) \le |p_n(x; \hat\theta_n) - p_n(x; \theta_0)|$. We need to show that

$$
\frac{1}{|D_n|}\Bigl\{\mathrm{var}[S^b_n \mid \Psi(\theta_0) \cap D_n] - \mathrm{var}[S^b_n \mid \Psi(\hat\theta_n) \cap D_n]\Bigr\} \stackrel{p}{\to} 0,
$$

where $\mathrm{var}[S^b_n \mid \Psi(\theta_0) \cap D_n]$ is $\mathrm{var}(S^b_n \mid \Psi \cap D_n)$ as defined in Appendix B, and $\mathrm{var}[S^b_n \mid \Psi(\hat\theta_n) \cap D_n]$ is defined analogously.

Let $\widetilde{\sum}_D$ and $\widetilde{\sum\sum}^{\ne}_D$ denote $\sum_{x \in D \cap (\Delta_a \cup \Delta_b)}$ and $\sum\sum_{x_1 \in D \cap N,\, x_2 \in D \cap (\Delta_a \cup \Delta_b),\, x_1 \ne x_2}$. Note that

$$
\frac{1}{|D_n|}\Bigl|\mathrm{var}[S^b_n \mid \Psi(\theta_0) \cap D_n] - \mathrm{var}[S^b_n \mid \Psi(\hat\theta_n) \cap D_n]\Bigr|
< \frac{C}{|D_n| k_n} \sum_{i=1}^{k_n} \sum_{j=1}^{k_n} \sum_{x_1 \in D^j_{l(n)} \cap N}\ \widetilde{\sum}_{x_2 \in D^j_{l(n)}} 1
+ \frac{C}{|D_n| k_n^2} \sum_{i=1}^{k_n} \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \sum_{x_1 \in D^{j_1}_{l(n)} \cap N}\ \widetilde{\sum}_{x_2 \in D^{j_2}_{l(n)}} 1
= \frac{C}{|D_n|} \sum_{j=1}^{k_n} \widetilde{\sum\sum}^{\ne}_{D^j_{l(n)}} 1
+ \frac{C}{|D_n|}\ \widetilde{\sum}_{D_n} 1
+ \frac{C}{|D_n| k_n} \Biggl[\widetilde{\sum}_{D_n} 1\Biggr]\Biggl[\sum_{x \in D_n \cap N} 1\Biggr].
$$

Note that $P[x \in (\Delta_a \cup \Delta_b) \mid x \in N] \le C/|D_n|^{\alpha/2}$ with probability 1 for $0 < \alpha < 1$, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.
[Received December 2006. Revised May 2007.]
REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.
Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.
Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.
Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.
Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.
Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.
Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.
——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.
Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.
Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.
McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.
Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.
Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.
Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.
Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.
Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.
Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.
——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.
Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.
Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 62, 252–258.
Guan and Loh Block Bootstrap Variance Estimation 1381
4. SIMULATIONS
4.1 Log-Gaussian Cox Process
We performed a simulation study to test our theoretical findings. On an observation region D of size L × L, where L = 1, 2, and 4, we simulated 1,000 realizations of an inhomogeneous Poisson process with intensity function λ(s) given by
log λ(s) = α + βX(s) + G(s),  (13)
where G is a mean-0 Gaussian random field. This is the LGCP model, with inhomogeneity due to the covariate X.
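As an illustration, a realization from a model of the form (13) can be generated by simulating G on a discrete grid and thinning a dominating homogeneous Poisson process. This is a Python sketch rather than the R/spatstat code used in the paper, and the grid size, covariate values, and parameter values below are toy choices introduced here, not those of the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 5 x 5 evaluation grid on the unit square and a toy covariate X
side = np.linspace(0.1, 0.9, 5)
grid = np.array([(u, v) for u in side for v in side])
X = rng.uniform(-0.6, 0.5, size=len(grid))

def sim_lgcp(alpha, beta, X, rho, sigma, grid, L):
    """Draw one realization of log lam = alpha + beta*X + G, with G a
    mean-0 Gaussian field with C(h) = sigma^2 exp(-h/rho), approximated
    on a discrete grid; points are drawn by thinning a dominating
    homogeneous Poisson process."""
    # exponential covariance matrix of G over the grid, then one draw of G
    d = np.linalg.norm(grid[:, None, :] - grid[None, :, :], axis=-1)
    C = sigma**2 * np.exp(-d / rho)
    G = np.linalg.cholesky(C + 1e-8 * np.eye(len(grid))) @ rng.standard_normal(len(grid))
    lam = np.exp(alpha + beta * X + G)          # intensity at each grid cell
    lam_max = lam.max()
    # dominating homogeneous Poisson process with intensity lam_max
    n = rng.poisson(lam_max * L * L)
    pts = rng.uniform(0, L, size=(n, 2))
    # retain each candidate with probability lam(nearest cell) / lam_max
    cell = np.argmin(np.linalg.norm(pts[:, None, :] - grid[None, :, :], axis=-1), axis=1)
    keep = rng.uniform(size=n) < lam[cell] / lam_max
    return pts[keep]

# alpha is deliberately smaller than the paper's value to keep the toy pattern small
pts = sim_lgcp(alpha=5.0, beta=2.0, X=X, rho=0.1, sigma=1.0, grid=grid, L=1.0)
```

The same thinning construction underlies spatstat's rpoispp when it is given a pixel image of λ.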
For values of X, we used a single realization from a different mean-0 Gaussian random field. We kept this realization fixed throughout the whole simulation study; that is, we used the same X for each of the 1,000 point process realizations. Figure 1 shows a surface plot of X for L = 4. The values of X for L = 1 and 2 are the lower-left corner of those shown in the figure. For G, we used the exponential model C(h) = σ² exp(−h/ρ) for its covariance function, where ρ = .1 and .2 and σ = 1. We set α and β to be 7.02 and 2. We generated the point process patterns using the rpoispp function in the spatstat R package (Baddeley and Turner 2005). The average numbers of points simulated were about 1,200, 4,500, and 16,700 for L = 1, 2, and 4.
For each of the 1,000 point patterns, we obtained the estimates α̂ and β̂ by maximizing the Poisson likelihood (using the ppm function in spatstat). Next we thinned each point pattern according to the criterion described in Section 3, resulting in 20 independent thinned processes for each realization. The average numbers of points in the thinned patterns were roughly 490, 1,590, and 5,400 for L = 1, 2, and 4, so about 30%–40% of the points were retained.

Figure 1. Image of the covariate X used in the simulation. The range of X is between −.6066 and .5303, where darker colors represent larger values of X.

We then applied the bootstrap procedure to each of the thinned processes to get the variance estimates for α̂ and β̂. From (13), we found that the quantities to be computed with each bootstrap sample were ∑ λ(s*) for α̂ and ∑ X(s*)λ(s*) for β̂, where s* represents the locations of the points in a bootstrap sample. We used nonoverlapping square blocks in the bootstrap procedure. For L = 1, we used blocks with side lengths 1/4, 1/3, and 1/2, whereas for L = 2, 4 we used blocks with side lengths 1/4, 1/3, 1/2, and 1. The number of bootstrap samples was 499.
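The thinning step can be sketched as follows. This is an illustrative Python translation, not the paper's code: each point x is retained with probability min_s λ̂(s)/λ̂(x), and the minimum over the region is approximated here by the minimum of the fitted intensity over the observed points; the pattern and fitted FOIF below are hypothetical:

```python
import numpy as np

def thin_pattern(points, lam_hat, rng):
    """Independent thinning toward homogeneity: retain each point x with
    probability min(lam_hat) / lam_hat(x), so the thinned pattern has
    (approximately) the constant intensity min_s lam_hat(s), provided
    lam_hat is close to the true FOIF."""
    p_retain = lam_hat.min() / lam_hat
    return points[rng.uniform(size=len(points)) < p_retain]

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(2000, 2))         # hypothetical point pattern
lam_hat = np.exp(5.0 + 2.0 * pts[:, 0])         # hypothetical fitted FOIF
# 20 independent thinned processes, as in the simulation design
thinnings = [thin_pattern(pts, lam_hat, rng) for _ in range(20)]
```

Because each thinning uses fresh uniform draws, the 20 thinned patterns are conditionally independent given the data, which is what allows averaging the resulting variance estimates.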
We computed the variances of α̂ and β̂ from the bootstrap variances using Theorem 1, (9), and the relationship of cov(S_n) to B_n. We averaged the variance estimates across the independent thinnings. Finally, we constructed nominal 95% confidence intervals, yielding a total of 1,000 confidence intervals each for α and β.
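The block bootstrap computation for one thinned pattern can be sketched as follows (again an illustrative Python translation under simplifying assumptions, not the paper's code; the pattern and per-point summands below are toy inputs):

```python
import numpy as np

def block_bootstrap_var(points, summands, L, b, B, rng):
    """Nonoverlapping-block bootstrap for a sum-type statistic on [0, L]^2.

    points: (n, 2) locations of a (thinned, roughly homogeneous) pattern;
    summands: per-point contributions (e.g. X(s*) * lam_hat(s*) for beta-hat);
    b: block side length. Each bootstrap sample draws k^2 blocks with
    replacement (k = L/b) and sums their contributions; the sample variance
    over B replicates estimates the variance of the statistic."""
    k = int(round(L / b))
    ix = np.minimum((points[:, 0] / b).astype(int), k - 1)
    iy = np.minimum((points[:, 1] / b).astype(int), k - 1)
    block_sums = np.bincount(ix * k + iy, weights=summands, minlength=k * k)
    boot = np.array([block_sums[rng.integers(0, k * k, size=k * k)].sum()
                     for _ in range(B)])
    return boot.var(ddof=1)

rng = np.random.default_rng(2)
n = rng.poisson(2000)
pts = rng.uniform(0, 1, size=(n, 2))   # toy, roughly homogeneous pattern
v = block_bootstrap_var(pts, np.ones(n), L=1.0, b=0.25, B=499, rng=rng)
```

With unit summands the statistic is the point count, so for a homogeneous Poisson pattern the returned variance should be of the same order as n.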
Table 1 gives the empirical coverage of the confidence intervals for β. The coverage for α (not shown) is similar but slightly lower. For both values of ρ, we found that the empirical coverage increased toward the nominal level as the observation region size increased. This increase in coverage agrees with what we would expect from the theory. The empirical coverage for ρ = .1 is typically closer to the nominal level than for ρ = .2. Note that ρ = .1 corresponds to a weaker dependence. Thus we would expect our method to work better if the dependence in the data were relatively weaker. It is interesting to note that for L = 2 and 4, the best coverage for ρ = .2 was achieved by using a block larger than its counterpart for ρ = .1. Thus, for errors with longer-range dependence, it appears that the block bootstrap procedure requires larger blocks to work well.
Table 2 gives the bootstrap estimates of the standard errors of β̂ for ρ = .2, averaged over the 1,000 realizations. The standard deviations of these estimates over the 1,000 realizations are included in brackets. The "true" standard errors, computed using the estimated regression parameters from the independent realizations, are also given. The table shows that the bootstrap estimates have a negative bias, becoming proportionally closer to the "true" standard errors as the region size L increases.
In summary, we found that our method worked reasonably well when the regression parameters were estimated. We found a slight undercoverage in each instance but noted that this undercoverage becomes smaller as the sample size increases. We also estimated the standard errors by using the true regression parameters in our thinning step and found that the coverage increased only slightly.
Table 1. Empirical coverage of 95% nominal confidence intervals of β in the LGCP case

                          Block size
Region size      1/4      1/3      1/2      1

ρ = .1
1                .85      .77      .49
2                .91      .90      .88      .49
4                .93      .93      .93      .90

ρ = .2
1                .84      .76      .50
2                .87      .88      .87      .50
4                .88      .89      .90      .89
1382 Journal of the American Statistical Association December 2007
Table 2. Estimates of the standard errors of β̂ for ρ = .2 in the LGCP case

                               Block size
Region size   True SE   1/4         1/3         1/2         1

1             .24       .17(.04)    .16(.05)    .13(.06)
2             .12       .10(.02)    .10(.02)    .10(.03)    .08(.03)
4             .07       .056(.003)  .057(.004)  .060(.006)  .060(.008)
4.2 Inhomogeneous Neyman–Scott Process
Theorem 1 gives the variance in terms of the FOIF and the SOCF. Another approach for variance estimation is to use the plug-in method. As pointed out by one referee, we may simply estimate the PCF or the K-function using the original data (Baddeley et al. 2000). This will yield an estimate of the SOCF that in turn can be plugged into the expression for B_n in Theorem 1 to obtain an estimate for the asymptotic variance. Waagepetersen (2007) studied the performance of this method in the INSP case.
To compare the performance of our method with the plug-in method, we also simulated data from the INSP model given by Waagepetersen (2007). To do so, we first simulated a homogeneous Poisson process with intensity κ = 50 as the parent process. For each parent, we then generated a Poisson number of offspring. We defined the position of each offspring relative to its parent by a radially symmetric Gaussian random variable (e.g., Diggle 2003). Let ω denote the standard deviation of the variable. We used ω = .02 and .04, representing relatively strong and weak clustering. Finally, we thinned the offspring, as done by Waagepetersen (2007), by setting the probability of retaining an offspring equal to the intensity at the offspring location divided by the maximum intensity in the study region. For the intensity, we used log λ(s) = α + βX(s), where α, β, and X(s) were as defined in the LGCP case.
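The parent–offspring construction above can be sketched as follows (a Python sketch under stated assumptions: the mean offspring count mu is a knob we introduce for illustration, the log-intensity is a hypothetical stand-in for α + βX(s), and the maximum intensity is approximated by the maximum over the generated offspring rather than over the whole region):

```python
import numpy as np

def sim_insp(kappa, mu, omega, log_lam, L, rng):
    """Simulate an inhomogeneous Neyman-Scott (Thomas-type) process:
    Poisson(kappa) parents, Poisson(mu) offspring per parent displaced by
    N(0, omega^2 I), then location-dependent thinning with probability
    lam(s) / max lam."""
    parents = rng.uniform(0, L, size=(rng.poisson(kappa * L * L), 2))
    n_off = rng.poisson(mu, size=len(parents))
    off = (np.repeat(parents, n_off, axis=0)
           + omega * rng.standard_normal((n_off.sum(), 2)))
    off = off[((off >= 0) & (off <= L)).all(axis=1)]   # keep offspring in window
    lam = np.exp(log_lam(off))
    return off[rng.uniform(size=len(off)) < lam / lam.max()]

rng = np.random.default_rng(3)
log_lam = lambda s: 0.5 + 2.0 * s[:, 0]   # hypothetical alpha + beta * X(s)
pts = sim_insp(kappa=50, mu=20, omega=0.02, log_lam=log_lam, L=1.0, rng=rng)
```

Smaller ω concentrates offspring around their parents, which is why ω = .02 yields the stronger clustering.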
We simulated 1,000 realizations of the process on both a 1 × 1 square and a 2 × 2 square. For each realization, we applied the thinned block bootstrap and the plug-in method of Waagepetersen (2007) to estimate the standard errors of α̂ and β̂ and to further obtain their respective 95% confidence intervals. For the plug-in method, we estimated the second-order parameters κ and ω by a minimum contrast estimation procedure. Specifically, we obtained these estimates by minimizing
∫₀ᵃ [K̂(t)^c − K(t; κ, ω)^c]² dt  (14)

with respect to (κ, ω) for a specified a and c, where K̂(t) and K(t; κ, ω) are the empirical and theoretical K functions. We used a = 4ω and c = .25, and then used the estimated values of κ and ω to estimate the standard errors of α̂ and β̂. To do this, we used the inhomthomasasympcov function in the InhomCluster R package, which is available on Waagepetersen's website at http://www.math.aau.dk/~rw/sppcode.
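The minimum contrast fit (14) can be sketched numerically as follows. This Python sketch swaps the optimizer actually used for a plain grid search, approximates the integral by a Riemann sum, and assumes the standard K-function of a (modified) Thomas process, K(t; κ, ω) = πt² + [1 − exp(−t²/4ω²)]/κ; the grid ranges are illustrative choices:

```python
import numpy as np

def thomas_K(t, kappa, omega):
    """Theoretical K-function of a (modified) Thomas process."""
    return np.pi * t**2 + (1.0 - np.exp(-t**2 / (4.0 * omega**2))) / kappa

def min_contrast(t, K_emp, c=0.25):
    """Grid-search minimizer of the contrast (14):
    integral over t of [K_emp(t)^c - K(t; kappa, omega)^c]^2,
    approximated by a Riemann sum on the grid t."""
    dt = t[1] - t[0]
    best, arg = np.inf, None
    for kappa in np.linspace(5.0, 200.0, 80):
        for omega in np.linspace(0.005, 0.2, 80):
            val = ((K_emp**c - thomas_K(t, kappa, omega)**c) ** 2).sum() * dt
            if val < best:
                best, arg = val, (kappa, omega)
    return arg

# sanity check: feed in an exact theoretical K and recover the parameters
t = np.linspace(0.001, 0.08, 40)            # a = 4*omega with omega = .02
kappa_hat, omega_hat = min_contrast(t, thomas_K(t, 50.0, 0.02))
```

The power transform c = .25 downweights large-t values of K, which is the usual motivation for this choice of contrast.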
Table 3 gives the empirical coverage of the confidence intervals for β. As in the LGCP case, there was a slight undercoverage, but this undercoverage appears to be less serious. In particular, for L = 2 and ω = .02, the coverage was very close to the nominal level for all block sizes used. Furthermore, the coverage was better for ω = .02 than for ω = .04, because the former yielded a process with a shorter-range dependence than the latter. For the plug-in method, all coverages were very close to the nominal level. Thus our method did not perform as well as the plug-in method when the true SOCF was used. However, the difference in coverage tended to diminish as the sample size increased. As pointed out by Waagepetersen (2007), the performance of the plug-in method is affected by the choice of the tuning parameters a and c in (14). In the simulation, we also considered using the default value of a given by spatstat. The results, not shown here, suggested that the coverage was often worse than that achieved by our method.
5. AN APPLICATION
We applied the proposed thinned block bootstrap procedure to a tropical rain forest data set collected in a 1,000 × 500 m plot on Barro Colorado Island. The data contain measurements for more than 300 species existing in the same plot in multiple years, as well as information on elevation and gradient recorded on a 5 × 5 m grid within the same region. We were particularly interested in modeling the locations of 3,605 BPL trees recorded in a 1995 census by using elevation and gradient as covariates (Fig. 2). The same data set was analyzed by Waagepetersen (2007), who considered the following FOIF model:
λ(s) = exp[β₀ + β₁E(s) + β₂G(s)],

where E(s) and G(s) are the (estimated) elevation and gradient at location s. Waagepetersen (2007) fitted this model by maximizing the Poisson maximum likelihood criterion given in (1). Specifically, he obtained the estimates β̂₁ = .02 and β̂₂ = 5.84. To estimate the standard errors associated with these estimates, he further assumed that the locations of the BPL trees were generated by an INSP. Based on the estimated standard errors, he concluded that β₂ was significant but β₁ was not.
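The Poisson maximum likelihood fit referred to here maximizes a criterion of the form ∑ₓ log λ(x; β) − ∫ λ(s; β) ds. A minimal Newton–Raphson sketch of this fit is given below, in Python with a one-dimensional covariate as a crude stand-in for spatstat's ppm; the quadrature scheme, starting value, and synthetic data are our own illustrative choices:

```python
import numpy as np

def fit_poisson_ml(pts_X, quad_X, quad_w, n_iter=30):
    """Newton-Raphson maximization of
    sum_x log lam(x) - integral lam(s) ds,  log lam(s) = a + b*X(s),
    with the integral approximated by quadrature weights quad_w at the
    covariate values quad_X."""
    Zx = np.column_stack([np.ones_like(pts_X), pts_X])     # design at points
    Zq = np.column_stack([np.ones_like(quad_X), quad_X])   # design at quad nodes
    theta = np.array([np.log(len(pts_X) / quad_w.sum()), 0.0])  # stable start
    for _ in range(n_iter):
        lam = np.exp(Zq @ theta)
        score = Zx.sum(axis=0) - Zq.T @ (quad_w * lam)
        info = Zq.T @ (Zq * (quad_w * lam)[:, None])       # negative Hessian
        theta = theta + np.linalg.solve(info, score)
    return theta

# check on a synthetic pattern with true log lam(s) = 5 + 1 * x-coordinate
rng = np.random.default_rng(4)
n = rng.poisson(np.exp(6.0))                 # dominating intensity e^6
cand = rng.uniform(0, 1, size=n)             # candidate x-coordinates
pts_X = cand[rng.uniform(size=n) < np.exp(5.0 + cand) / np.exp(6.0)]
quad_X = np.linspace(0.0025, 0.9975, 200)    # midpoint rule on [0, 1]
a_hat, b_hat = fit_poisson_ml(pts_X, quad_X, np.full(200, 1.0 / 200))
```

The criterion is concave in (a, b) for a log-linear intensity, so Newton–Raphson from a count-matched starting value converges quickly.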
Table 3. Empirical coverage of 95% nominal confidence intervals of β obtained by the thinned block bootstrap method and the plug-in method in the INSP case

                             Block size
Region size   Plug-in   1/4      1/3      1/2

ω = .02
1             .96       .91      .89      .81
2             .96       .94      .94      .94

ω = .04
1             .93       .87      .87      .77
2             .95       .90      .90      .91
Figure 2. Locations of BPL trees.
We reanalyzed the data because the INSP model was only a "crude" model for the data, as pointed out by Waagepetersen (2007). The INSP model attributed the possible clustering among the BPL trees to a one-round seed-dispersal process. In reality, however, the clustering might be due to many different factors (e.g., important environmental factors other than elevation and gradient) that were not included in the model. Even if the clustering was due to seed dispersal alone, it probably was due to seed dispersal that had occurred in multiple generations. As a result, the validity of the model, and the subsequent estimated standard errors and inference on the regression parameters, required further investigation. Our proposed thinned block bootstrap method, on the other hand, requires much weaker conditions and thus might yield more reasonable estimates for the standard errors and more reliable inference on the regression parameters.
To estimate the standard errors for β̂₁ and β̂₂, we applied (12) with K = 100 and B = 999 and used 200 × 100 m subblocks. We chose K = 100 because the trace plots for the estimated standard errors became fairly stable if 100 or more thinned realizations were used. We selected the subblock size using the data-driven procedure discussed at the end of Section 3. Table 4 gives the estimated standard errors for β̂₁ and β̂₂ using the thinned block bootstrap and the plug-in method of Waagepetersen (2007), along with the respective 95% confidence intervals based on these estimated standard errors. Note that the intervals based on the thinned block bootstrap are slightly shorter than those from the plug-in method but nevertheless lead to the same conclusion as the plug-in method. Specifically, they suggest that β₂ is significant but β₁ is not.
Table 4. Results for the BPL data

             Plug-in method                  Thinned block bootstrap
      EST    STD     95% CI                  STD      95% CI

β₁    .02    .02     (−.02, .06)             .017     (−.014, .054)
β₂    5.84   2.53    (.89, 10.80)            2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.
From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007) by using less restrictive assumptions.
APPENDIX A PROOF OF THEOREM 1
Let β₀ represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that β and β₀ are both scalars, that is, p = 1. In the case of p > 1, the proof can be easily generalized by applying the Cramér–Wold device.
Lemma A.1. Assume that conditions (4)–(6) hold. Then |D_n|² E[U_n^(1)(β₀)]⁴ < C for some C < ∞.
Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that λ^(1)(s, β₀)/λ(s, β₀) is bounded due to conditions (4) and (5), we can derive that |D_n|⁴ E[U_n^(1)(β₀)]⁴ is bounded by the following (ignoring some multiplicative constants):

∫∫∫∫ |Q₄(s₁, s₂, s₃, s₄)| ds₁ ds₂ ds₃ ds₄ + [∫∫ |Q₂(s₁, s₂)| ds₁ ds₂]²
+ ∫∫∫ |Q₃(s₁, s₂, s₃)| ds₁ ds₂ ds₃ + |D_n| ∫∫ |Q₂(s₁, s₂)| ds₁ ds₂ + ∫∫ |Q₂(s₁, s₂)| ds₁ ds₂,

where all of the integrals are over D_n. It then follows from condition (6) that the foregoing sum is of order |D_n|². Thus the lemma is proven.
Proof of Theorem 1
First note that

U_n^(1)(β) = (1/|D_n|) [ ∑_{x ∈ N ∩ D_n} λ^(1)(x, β)/λ(x, β) − ∫ λ^(1)(s, β) ds ]

and

U_n^(2)(β) = (1/|D_n|) [ ∑_{x ∈ N ∩ D_n} λ^(2)(x, β)/λ(x, β) − ∫ λ^(2)(s, β) ds ] − (1/|D_n|) ∑_{x ∈ N ∩ D_n} [λ^(1)(x, β)/λ(x, β)]²
= U_na^(2)(β) − U_nb^(2)(β).

Using the Taylor expansion, we can obtain

|D_n|^{1/2} (β̂_n − β₀) = −[U_n^(2)(β̃_n)]^{−1} |D_n|^{1/2} U_n^(1)(β₀),

where β̃_n is between β̂_n and β₀. We need to show that

U_na^(2)(β₀) →p 0

and

U_nb^(2)(β₀) →p (1/|D_n|) ∫ [λ^(1)(s, β₀)]²/λ(s, β₀) ds.
To show the first expression, note that E[U_na^(2)(β₀)] = 0. Thus we only need look at the variance term:

var[U_na^(2)(β₀)] = (1/|D_n|²) ∫∫ [λ^(2)(s₁, β₀)λ^(2)(s₂, β₀)/(λ(s₁, β₀)λ(s₂, β₀))] Q₂(s₁, s₂) ds₁ ds₂ + (1/|D_n|²) ∫ [λ^(2)(s, β₀)]²/λ(s, β₀) ds,

which converges to 0 due to conditions (4)–(6). Thus U_na^(2)(β₀) →p 0.

To show the second expression, note that E[U_nb^(2)(β₀)] = (1/|D_n|) ∫ [λ^(1)(s, β₀)]²/λ(s, β₀) ds. Thus again we need consider only the variance term:

var[U_nb^(2)(β₀)] = (1/|D_n|²) ∫∫ {[λ^(1)(s₁, β₀)λ^(1)(s₂, β₀)]²/(λ(s₁, β₀)λ(s₂, β₀))} Q₂(s₁, s₂) ds₁ ds₂ + (1/|D_n|²) ∫ [λ^(1)(s, β₀)]⁴/λ(s, β₀) ds,

which converges to 0 due to conditions (4)–(6). Thus U_nb^(2)(β₀) →p (1/|D_n|) ∫ [λ^(1)(s, β₀)]²/λ(s, β₀) ds.
Now we want to show that |D_n|^{1/2} U_n^(1)(β₀) converges to a normal distribution. First we derive the mean and variance of |D_n|^{1/2} U_n^(1)(β₀). Clearly, E[|D_n|^{1/2} U_n^(1)(β₀)] = 0. For the variance,

var[|D_n|^{1/2} U_n^(1)(β₀)] = (1/|D_n|) ∫∫ [λ^(1)(s₁, β₀)λ^(1)(s₂, β₀)/(λ(s₁, β₀)λ(s₂, β₀))] Q₂(s₁, s₂) ds₁ ds₂ + (1/|D_n|) ∫ [λ^(1)(s, β₀)]²/λ(s, β₀) ds.

Now consider a new sequence of regions D*_n, where D*_n ⊂ D_n and |D*_n|/|D_n| → 1 as n → ∞. Let U_{n*}^(1)(β₀) be the estimating function for the realization of the point process on D*_n. Based on the expression of the variance, we can deduce that

var[|D_n|^{1/2} U_n^(1)(β₀) − |D*_n|^{1/2} U_{n*}^(1)(β₀)] → 0 as n → ∞.

Thus

|D_n|^{1/2} U_n^(1)(β₀) ~ |D*_n|^{1/2} U_{n*}^(1)(β₀),

where the notation a_n ~ b_n means that a_n and b_n have the same limiting distribution. Thus we need only show that |D*_n|^{1/2} U_{n*}^(1)(β₀) converges to a normal distribution for some properly defined D*_n.
To obtain a sequence of D*_n, we apply the blocking technique used by Guan et al. (2004). Let l(n) = n^α and m(n) = n^α − n^η for 4/(2 + ε) < η < α < 1, where ε is defined as in condition (3). We first divide the original domain D_n into some nonoverlapping l(n) × l(n) subsquares D^i_{l(n)}, i = 1, ..., k_n. Within each subsquare we further obtain D^i_{m(n)}, an m(n) × m(n) square sharing the same center with D^i_{l(n)}. Note that d(D^i_{m(n)}, D^j_{m(n)}) ≥ n^η for i ≠ j. Now define

D*_n = ∪_{i=1}^{k_n} D^i_{m(n)}.

Condition (2) implies that |D*_n|/|D_n| → 1. Thus we need only show that |D*_n|^{1/2} U_{n*}^(1)(β₀) converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and the application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.
APPENDIX B PROOF OF THEOREM 2
To simplify notation, let φ(u) = g(u) − 1, φ_k(s₁, ..., s_k) = Q_k(s₁, ..., s_k)/[λ(s₁)···λ(s_k)], and g_k(s₁, ..., s_k) = λ_k(s₁, ..., s_k)/[λ(s₁)···λ(s_k)] for k = 3 and 4, and let Z(s) = λ^(1)(s). As in the proof of Theorem 1, we assume that β and β₀ are both scalars, that is, p = 1. As a result, Z(s) is a scalar, so that cov(S_n) = var(S_n). Furthermore, we consider only the case where K = 1. Thus the covariance estimator defined in (11) becomes

var̂(S_n) = ∑_{b=1}^{B} (S_n^b − S̄_n)²/(B − 1).
As B goes to infinity, we see that the foregoing converges to

var(S_n^b | N ∩ D_n)
= (1/k_n) ∑_{i=1}^{k_n} ∑_{j=1}^{k_n} ∑_{x₁ ∈ D^j_{l(n)}} ∑_{x₂ ∈ D^j_{l(n)}} Z(x₁ − c_j + c_i) Z(x₂ − c_j + c_i)
− (1/k_n²) ∑_{i=1}^{k_n} ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∑_{x₁ ∈ D^{j₁}_{l(n)}} ∑_{x₂ ∈ D^{j₂}_{l(n)}} Z(x₁ − c_{j₁} + c_i) Z(x₂ − c_{j₂} + c_i),

where the notation ∑_D is short for ∑_{D ∩ N}. We wish to show that var(S_n^b | N ∩ D_n)/|D_n| converges in L² to

var(S_n)/|D_n| = [(λ_n)²/|D_n|] ∫_{D_n} ∫_{D_n} Z(s₁) Z(s₂) φ(s₁ − s₂) ds₁ ds₂ + (λ_n/|D_n|) ∫_{D_n} [Z(s)]² ds.
To show this, we need to show that

(1/|D_n|) E[var(S_n^b | N ∩ D_n)] → (1/|D_n|) var(S_n)  (B.1)

and

(1/|D_n|²) var[var(S_n^b | N ∩ D_n)] → 0.  (B.2)
To show (B.1), note that

E[var(S_n^b | N ∩ D_n)]
= (λ_n)² ∑_{i=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s₁) Z(s₂) g(s₁ − s₂) ds₁ ds₂ + [λ_n(k_n − 1)/k_n] ∫_{D_n} [Z(s)]² ds
− [(λ_n)²/k_n²] ∑_{i=1}^{k_n} ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s₁) Z(s₂) g(s₁ − s₂ + c_{j₁} − c_{j₂}) ds₁ ds₂.

Thus

(1/|D_n|) {E[var(S_n^b | N ∩ D_n)] − var(S_n)}
= −[λ_n/(k_n|D_n|)] ∫_{D_n} [Z(s)]² ds
− [(λ_n)²/|D_n|] ∑∑_{i≠j} ∫_{D^i_{l(n)}} ∫_{D^j_{l(n)}} Z(s₁) Z(s₂) φ(s₁ − s₂) ds₁ ds₂
− [(λ_n)²/(|D_n|k_n²)] ∑_{i=1}^{k_n} ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∫∫_{D^i_{l(n)}} Z(s₁) Z(s₂) φ(s₁ − s₂ + c_{j₁} − c_{j₂}) ds₁ ds₂
= −T_n¹ − T_n² − T_n³.

T_n¹ goes to 0 due to conditions (2) and (5). For T_n² and T_n³, lengthy yet elementary derivations yield

|T_n²| ≤ [C(λ_n)²/k_n] ∑_{i=1}^{k_n} [(1/|D_n|) ∫_{D_n−D_n} |D_n ∩ (D_n − s)| |φ(s)| ds − (1/|D^i_{l(n)}|) ∫_{D^i_{l(n)}−D^i_{l(n)}} |D^i_{l(n)} ∩ (D^i_{l(n)} − s)| |φ(s)| ds]

and

T_n³ ≤ [C(λ_n)²/k_n] (1/|D_n|) ∫_{D_n} ∫_{D_n} |φ(s₁ − s₂)| ds₁ ds₂.

Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B.1) is proven. Furthermore, T_n¹ and T_n³ are both of order 1/k_n, whereas T_n² is of order 1/|D_{l(n)}|^{1/2}, due to condition (12).

To prove (B.2), we first note that [var(S_n^b | N ∩ D_n)]² is equal to the sum of the following three terms:

(1/k_n²) ∑_{i₁=1}^{k_n} ∑_{i₂=1}^{k_n} ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∑_{x₁∈D^{j₁}_{l(n)}} ∑_{x₂∈D^{j₁}_{l(n)}} ∑_{x₃∈D^{j₂}_{l(n)}} ∑_{x₄∈D^{j₂}_{l(n)}} [Z(x₁ − c_{j₁} + c_{i₁}) Z(x₂ − c_{j₁} + c_{i₁}) Z(x₃ − c_{j₂} + c_{i₂}) Z(x₄ − c_{j₂} + c_{i₂})],

−(2/k_n³) ∑_{i₁=1}^{k_n} ∑_{i₂=1}^{k_n} ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∑_{j₃=1}^{k_n} ∑_{x₁∈D^{j₁}_{l(n)}} ∑_{x₂∈D^{j₁}_{l(n)}} ∑_{x₃∈D^{j₂}_{l(n)}} ∑_{x₄∈D^{j₃}_{l(n)}} [Z(x₁ − c_{j₁} + c_{i₁}) Z(x₂ − c_{j₁} + c_{i₁}) Z(x₃ − c_{j₂} + c_{i₂}) Z(x₄ − c_{j₃} + c_{i₂})],

and

(1/k_n⁴) ∑_{i₁=1}^{k_n} ∑_{i₂=1}^{k_n} ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∑_{j₃=1}^{k_n} ∑_{j₄=1}^{k_n} ∑_{x₁∈D^{j₁}_{l(n)}} ∑_{x₂∈D^{j₂}_{l(n)}} ∑_{x₃∈D^{j₃}_{l(n)}} ∑_{x₄∈D^{j₄}_{l(n)}} [Z(x₁ − c_{j₁} + c_{i₁}) Z(x₂ − c_{j₂} + c_{i₁}) Z(x₃ − c_{j₃} + c_{i₂}) Z(x₄ − c_{j₄} + c_{i₂})].

Furthermore, the following relationships are possible for x₁, ..., x₄:

a. x₁ = x₂ = x₃ = x₄.
b. x₁ = x₂ and x₃ = x₄ but x₁ ≠ x₃; or x₁ = x₃ and x₂ = x₄ but x₁ ≠ x₂.
c. x₁ = x₂ but x₂ ≠ x₃ ≠ x₄; or x₃ = x₄ but x₁ ≠ x₂ ≠ x₃.
d. x₁ = x₂ = x₃ but x₃ ≠ x₄; or x₁ = x₂ = x₄ but x₁ ≠ x₃; or x₂ = x₃ = x₄ but x₁ ≠ x₂.
e. x₁ ≠ x₂ ≠ x₃ ≠ x₄ (all four distinct).

When calculating the variance of var(S_n^b | N ∩ D_n)/|D_n|, the foregoing relationships, combined with the expressions for the squared value of (1/|D_n|) E[var(S_n^b | N ∩ D_n)], in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than 1/k_n in a similar manner.
Let F₁(s₁, s₂, s₃, s₄) = g₄(s₁, s₂, s₃, s₄) − g(s₁ − s₂) g(s₃ − s₄). Term e leads to the following integral:

[(λ_n)⁴/(k_n²|D_n|²)] ∑_{i₁=1}^{k_n} ∑_{i₂=1}^{k_n} ∑_{j₁=1}^{k_n} ∫_{D^{j₁}_{l(n)}} Z(s₁ − c_{j₁} + c_{i₁})
× [ ∑_{j₂=1}^{k_n} ∫_{D^{j₁}_{l(n)}} ∫_{D^{j₂}_{l(n)}} ∫_{D^{j₂}_{l(n)}} Z(s₂ − c_{j₁} + c_{i₁}) Z(s₃ − c_{j₂} + c_{i₂}) Z(s₄ − c_{j₂} + c_{i₂}) F₁(s₁, s₂, s₃, s₄) ds₂ ds₃ ds₄
− (2/k_n) ∑_{j₂=1}^{k_n} ∑_{j₃=1}^{k_n} ∫_{D^{j₁}_{l(n)}} ∫_{D^{j₂}_{l(n)}} ∫_{D^{j₃}_{l(n)}} Z(s₂ − c_{j₁} + c_{i₁}) Z(s₃ − c_{j₂} + c_{i₂}) Z(s₄ − c_{j₃} + c_{i₂}) F₁(s₁, s₂, s₃, s₄) ds₂ ds₃ ds₄
+ (1/k_n²) ∑_{j₂=1}^{k_n} ∑_{j₃=1}^{k_n} ∑_{j₄=1}^{k_n} ∫_{D^{j₂}_{l(n)}} ∫_{D^{j₃}_{l(n)}} ∫_{D^{j₄}_{l(n)}} Z(s₂ − c_{j₂} + c_{i₁}) Z(s₃ − c_{j₃} + c_{i₂}) Z(s₄ − c_{j₄} + c_{i₂}) F₁(s₁, s₂, s₃, s₄) ds₂ ds₃ ds₄ ] ds₁.
Let F₂(s₁, s₂, s₃, s₄) = φ₄(s₁, s₂, s₃, s₄) + 2φ₃(s₁, s₂, s₃) + 2φ₃(s₁, s₃, s₄) + 2φ(s₁ − s₃)φ(s₂ − s₄). By noting that Z(·) is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by

[2C(λ_n)⁴/(k_n|D_n|²)] ∑_{j₁=1}^{k_n} ∫_{D^{j₁}_{l(n)}} ∫_{D^{j₁}_{l(n)}} ∫_{D_n} ∫_{D_n} |F₂(s₁, s₂, s₃, s₄)| ds₁ ds₂ ds₃ ds₄
+ [C(λ_n)⁴/|D_n|²] ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∫_{D^{j₁}_{l(n)}} ∫_{D^{j₁}_{l(n)}} ∫_{D^{j₂}_{l(n)}} ∫_{D^{j₂}_{l(n)}} |F₁(s₁, s₂, s₃, s₄)| ds₁ ds₂ ds₃ ds₄.
The first integral in the foregoing is of order 1/k_n due to conditions (4), (5), and (6). The second integral is bounded by

[C(λ_n)⁴/|D_n|²] ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∫_{D^{j₁}_{l(n)}} ∫_{D^{j₁}_{l(n)}} ∫_{D^{j₂}_{l(n)}} ∫_{D^{j₂}_{l(n)}} |φ₄(s₁, s₂, s₃, s₄)| ds₁ ds₂ ds₃ ds₄
+ [4C(λ_n)⁴/(k_n|D_n|)] ∑_{j₂=1}^{k_n} ∫_{D_n} ∫_{D^{j₂}_{l(n)}} ∫_{D^{j₂}_{l(n)}} |φ₃(s₁, s₃, s₄)| ds₁ ds₃ ds₄
+ [2C(λ_n)⁴/|D_n|²] ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} [∫_{D^{j₁}_{l(n)}} ∫_{D^{j₂}_{l(n)}} |φ(s₁ − s₃)| ds₁ ds₃]².

The foregoing is of order 1/k_n + a_n due to conditions (4), (5), (6), and (12), where the order of a_n is no higher than 1/|D_{l(n)}|.
APPENDIX C: JUSTIFICATION FOR USING β̂_n IN THE THINNING STEP
Here we assume that |D_{l(n)}| = o(|D_n|^{1/2}). Let p_n(x, θ) = min_{s ∈ D_n} λ(s, θ)/λ(x, θ), and let r(x) be a uniform random variable on [0, 1]. If r(x) ≤ p_n(x, θ), where x ∈ N ∩ D_n, then x will be retained in the thinned process, say Ñ(θ). Based on the same set {r(x): x ∈ N ∩ D_n}, we can determine Ñ(θ₀) and Ñ(θ̂_n). Note that Ñ(θ₀) is as defined in (7) using the true FOIF. Define a = {x ∈ Ñ(θ₀): x ∉ Ñ(θ̂_n)} and b = {x ∉ Ñ(θ₀): x ∈ Ñ(θ̂_n)}. Note that P(x ∈ a ∪ b | x ∈ N) ≤ |p_n(x, θ̂_n) − p_n(x, θ₀)|. We need to show that

(1/|D_n|) {var[S_n^b | Ñ(θ₀) ∩ D_n] − var[S_n^b | Ñ(θ̂_n) ∩ D_n]} →p 0,

where var[S_n^b | Ñ(θ₀) ∩ D_n] is the quantity var(S_n^b | N ∩ D_n) defined in Appendix B, and var[S_n^b | Ñ(θ̂_n) ∩ D_n] is defined analogously.
Let ∑'_D and ∑∑^{≠}_D denote ∑_{x ∈ D ∩ (a ∪ b)} and ∑∑_{x₁ ∈ D ∩ N, x₂ ∈ D ∩ (a ∪ b), x₁ ≠ x₂}, respectively. Note that
(1/|D_n|) |var[S_n^b | Ñ(θ₀) ∩ D_n] − var[S_n^b | Ñ(θ̂_n) ∩ D_n]|
< [C/(|D_n|k_n)] ∑_{i=1}^{k_n} ∑_{j=1}^{k_n} ∑_{x₁ ∈ D^j_{l(n)} ∩ N} ∑'_{D^j_{l(n)}} 1 + [C/(|D_n|k_n²)] ∑_{i=1}^{k_n} ∑_{j₁=1}^{k_n} ∑_{j₂=1}^{k_n} ∑_{x₁ ∈ D^{j₁}_{l(n)} ∩ N} ∑'_{D^{j₂}_{l(n)}} 1
= (C/|D_n|) ∑_{j=1}^{k_n} ∑∑^{≠}_{D^j_{l(n)}} 1 + (C/|D_n|) ∑'_{D_n} 1 + [C/(|D_n|k_n)] [∑'_{D_n} 1] [∑_{x ∈ D_n ∩ N} 1].
Note that P[x ∈ (a ∪ b) | x ∈ N] ≤ C/|D_n|^{α/2} with probability 1 for 0 < α < 1, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.
[Received December 2006. Revised May 2007.]
REFERENCES
Baddeley A J and Turner R (2005) ldquoSpatstat An R Package for AnalyzingSpatial Point Patternsrdquo Journal of Statistical Software 12 1ndash42
Baddeley A J Moslashller J and Waagepetersen R (2000) ldquoNon- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patternsrdquo Sta-tistica Neerlandica 54 329ndash350
Brillinger D R (1975) Time Series Data Analysis and Theory New YorkHolt Rinehart amp Winston
Cressie N A C (1993) Statistics for Spatial Data New York WileyDaley D J and Vere-Jones D (1988) An Introduction to the Theory of Point
Processes New York Springer-VerlagDiggle P J (2003) Statistical Analysis of Spatial Point Patterns New York
Oxford University Press IncDoukhan P (1994) Mixing Properties and Examples New York Springer-
VerlagGuan Y (2007) ldquoVariance Estimation for Statistics Computed From Inhomo-
geneous Spatial Point Processesrdquo Journal of the Royal Statistical SocietySer B to appear
Guan Y Sherman M and Calvin J A (2004) ldquoA Nonparametric Test forSpatial Isotropy Using Subsamplingrdquo Journal of the American Statistical As-sociation 99 810ndash821
(2006) ldquoAssessing Isotropy for Spatial Point Processesrdquo Biometrics62 119ndash125
Hall P and Jing B (1996) ldquoOn Sample Reuse Methods for Dependent DatardquoJournal of the Royal Statistical Society Ser B 58 727ndash737
Heinrich L (1985) ldquoNormal Convergence of Multidimensional Shot Noise andRates of This Convergencerdquo Advances in Applied Probability 17 709ndash730
Lahiri S N (2003) Resampling Methods for Dependent Data New YorkSpringer-Verlag
McElroy T and Politis D N (2007) ldquoStable Marked Point Processesrdquo TheAnnals of Statistics 35 393ndash419
Moslashller J Syversveen A R and Waagepetersen R P (1998) ldquoLog GaussianCox Processesrdquo Scandinavian Journal of Statistics 25 451ndash482
Politis D N (1999) Subsampling New York Springer-VerlagPolitis D N and Sherman M (2001) ldquoMoment Estimation for Statistics
From Marked Point Processesrdquo Journal of the Royal Statistical SocietySer B 63 261ndash275
Rathbun S L and Cressie N (1994) ldquoAsymptotic Properties of Estimatorsfor the Parameters of Spatial Inhomogeneous Poisson Point Processesrdquo Ad-vances in Applied Probability 26 122ndash154
Rosenblatt M (1956) ldquoA Central Limit Theorem and a Strong Mixing Condi-tionrdquo Proceedings of the National Academy of Sciences 42 43ndash47
Schoenberg F P (2004) ldquoConsistent Parametric Estimation of the Intensityof a SpatialndashTemporal Point Processrdquo Journal of Statistical Planning andInference 128 79ndash93
Sherman M (1996) ldquoVariance Estimation for Statistics Computed FromSpatial Lattice Datardquo Journal of the Royal Statistical Society Ser B 58509ndash523
(1997) ldquoSubseries Methods in Regressionrdquo Journal of the AmericanStatistical Association 92 1041ndash1048
Stoyan D and Stoyan H (1994) Fractals Random Shapes and Point FieldsNew York Wiley
Waagepetersen R P (2007) ldquoAn Estimating Function Approach to Inferencefor Inhomogeneous NeymanndashScott Processesrdquo Biometrics 62 252ndash258
1382 Journal of the American Statistical Association December 2007
Table 2 Estimates of the standard errors of β for ρ = 2 in the LGCP case
ρ = 2 Block size
Region size True SE 14 13 12 1
1 24 17(04) 16(05) 13(06)
2 12 10(02) 10(02) 10(03) 08(03)
4 07 056(003) 057(004) 060(006) 060(008)
42 Inhomogeneous NeymanndashScott Process
Theorem 1 gives the variance in terms of the FOIF and theSOCF Another approach for variance estimation is to use theplug-in method As point out by one referee we may sim-ply estimate the PCF or the K-function using the originaldata (Baddeley et al 2000) This will yield an estimate of theSOCF that in turn can be plugged into the expression for Bn inTheorem 1 to obtain an estimate for the asymptotic varianceWaagepetersen (2007) studied the performance of this methodin the INSP case
To compare the performance of our method with the plug-in method we also simulated data from the INSP model givenby Waagepetersen (2007) To do so we first simulated a ho-mogeneous Poisson process with intensity κ = 50 as the parentprocess For each parent we then generated a Poisson numberof offspring We defined the position of each offspring relativeto its parent by a radially symmetric Gaussian random variable(eg Diggle 2003) Let ω denote the standard deviation of thevariable We used ω = 02 04 representing relatively strongand weak clustering Finally we thinned the offspring as doneby Waagepetersen (2007) by setting the probability of retain-ing an offspring equal to the intensity at the offspring locationdivided by the maximum intensity in the study region For theintensity we used logλ(s) = α + βX(s) where α β and X(s)were as defined in the LGCP case
We simulated 1000 realizations of the process on both a1 times 1 square and a 2 times 2 square For each realization we ap-plied the thinned block bootstrap and the plug-in method ofWaagepetersen (2007) to estimate the standard errors of α and β
and to further obtain their respective 95 confidence intervalsFor the plug-in method we estimated the second-order para-meters κ and ω by a minimum contrast estimation procedureSpecifically we obtained these estimates by minimizing
int a
0[K(t)c minus K(tκω)c]2 dt (14)
with respect to (κω) for a specified a and c where K(t) andK(tκω) are the empirical and theoretical K functions Weused a = 4ω and c = 25 then used the estimated values of κ
and ω to estimate the standard errors of α and β To do this weused the inhomthomasasympcov function in the InhomClusterR package which is available on Waagepetersenrsquos website athttpwwwmathaaudksimrwsppcode
Table 3 gives the empirical coverage of the confidence in-tervals for β As in the LGCP case there was a slight under-coverage but this undercoverage appears to be less serious Inparticular for L = 2 and ω = 02 the coverage was very closeto the nominal level for all block sizes used Furthermore thecoverage was better for ω = 02 than for ω = 04 because the
former yielded a process with a shorter-range dependence thanthe latter For the plug-in method all coverages were very closeto the nominal level Thus our method did not perform as wellas the plug-in method when the true SOCF was used Howeverthe difference in coverage tended to diminish as the sample sizeincreased As pointed out by Waagepetersen (2007) the perfor-mance of the plug-in method is affected by the choice of thetuning parameters a and c in (14) In the simulation we alsoconsidered using the default value of a given by spatstat Theresults not shown here suggested that the coverage was oftenworse than that achieved by our method
5 AN APPLICATION
We applied the proposed thinned block bootstrap procedureto a tropical rain forest data set collected in a 1000 times 500 mplot on Barro Colorado Island The data contain measurementsfor more than 300 species existing in the same plot in mul-tiple years as well as information on elevation and gradientrecorded on a 5 times 5 m grid within the same region We wereparticularly interested in modeling the locations of 3605 BPLtrees recorded in a 1995 census by using elevation and gradi-ent as covariates (Fig 2) The same data set was analyzed byWaagepetersen (2007) in which he considered the followingFOIF model
λ(s) = exp[β0 + β1E(s) + β2G(s)]where E(s) and G(s) are the (estimated) elevation and gradientat location s Waagepetersen (2007) fitted this model by maxi-mizing the Poisson maximum likelihood criterion given in (1)Specifically he obtained the estimates β1 = 02 and β2 = 584To estimate the standard errors associated with these estimateshe further assumed that the locations of the BPL trees were gen-erated by an INSP Based on the estimated standard errors heconcluded that β2 was significant but β1 was not
Table 3 Empirical coverage of 95 nominal confidence intervalsof β obtained by the thinned block bootstrap method and
the plug-in method in the INSP case
Block size
Region size Plug-in 14 13 12
ω = 021 96 91 89 812 96 94 94 94
ω = 041 93 87 87 772 95 90 90 91
Guan and Loh Block Bootstrap Variance Estimation 1383
Figure 2 Locations of BLP trees
We reanalyzed the data because the INSP model was only aldquocruderdquo model for the data as pointed out by Waagepetersen(2007) The INSP model attributed the possible clusteringamong the BPL trees to a one-round seed-dispersal process Inreality however the clustering might be due to many differentfactors (eg important environmental factors other than eleva-tion and gradient) that were not included in the model Evenif the clustering was due to seed dispersal alone it probablywas due to seed dispersal that had occurred in multiple genera-tions As a result the validity of the model and the subsequentestimated standard errors and inference on the regression pa-rameters required further investigation Our proposed thinnedblocking bootstrap method on the other hand requires muchweaker conditions and thus might yield more reasonable esti-mates for the standard errors and more reliable inference on theregression parameters
To estimate the standard errors for β1 and β2 we applied(12) with K = 100 and B = 999 and used 200 times 100 m sub-blocks We chose K = 100 because the trace plots for the es-timated standard errors became fairly stable if 100 or morethinned realizations were used We selected the subblock sizefrom using the data-driven procedure discussed at the end ofSection 3 Table 4 gives the estimated standard errors for β1 andβ2 using the thinned block bootstrap and the plug-in methodof Waagepetersen (2007) along with the respective 95 confi-dence intervals based on these estimated standard errors Notethat the intervals based on the thinned block bootstrap areslightly shorter than those from the plug-in method but nev-ertheless lead to the same conclusion as the plug-in methodSpecifically they suggest that β2 is significant but β1 is not
Table 4. Results for the BPL data

         Plug-in method                    Thinned block bootstrap
      EST     STD     95% CI               STD      95% CI
β1    0.2     0.2     (−0.2, 0.6)          0.17     (−0.14, 0.54)
β2    5.84    2.53    (0.89, 10.80)        2.12     (1.69, 9.99)

NOTE: EST, STD, and CI denote the point estimate of the regression parameters obtained by maximizing (1), the standard deviation, and the confidence interval.
From a biological standpoint, this means that the BPL trees prefer to live on slopes but do not really favor either high or low altitudes. Thus our results formally confirm the findings of Waagepetersen (2007) using less restrictive assumptions.
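The intervals in Table 4 follow the usual normal approximation, estimate ± 1.96 × standard error. A quick arithmetic check on the β2 row:

```python
# Normal-approximation 95% CI: estimate +/- 1.96 * SE (beta_2 values from Table 4).
est, se_plug, se_boot = 5.84, 2.53, 2.12
ci_plug = (est - 1.96 * se_plug, est + 1.96 * se_plug)
ci_boot = (est - 1.96 * se_boot, est + 1.96 * se_boot)
# ci_plug is about (0.88, 10.80) and ci_boot about (1.68, 10.00); both exclude 0,
# so beta_2 is significant under either method, matching the table.
```

The computed bootstrap interval is within rounding of the table's (1.69, 9.99); the small gap reflects rounding of the reported standard error.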
APPENDIX A PROOF OF THEOREM 1
Let β0 represent the true regression parameter. For ease of presentation, but without loss of generality, we assume that β and β0 are both scalars, that is, p = 1. In the case of p > 1, the proof can be easily generalized by applying the Cramér–Wold device.
Lemma A.1. Assume that conditions (4)–(6) hold. Then $|D_n|^2 E\{[U_n^{(1)}(\beta_0)]^4\} < C$ for some $C < \infty$.
Proof. By repeatedly using Campbell's theorem (e.g., Stoyan and Stoyan 1994) and the relationship between moments and cumulants, and using the fact that $\lambda^{(1)}(s;\beta_0)/\lambda(s;\beta_0)$ is bounded due to conditions (4) and (5), we can derive that $|D_n|^4 E\{[U_n^{(1)}(\beta_0)]^4\}$ is bounded by the following (ignoring some multiplicative constants):
\[
\iiiint |Q_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4
+ \Big[\iint |Q_2(s_1,s_2)|\,ds_1\,ds_2\Big]^2
+ \iiint |Q_3(s_1,s_2,s_3)|\,ds_1\,ds_2\,ds_3
+ |D_n|\iint |Q_2(s_1,s_2)|\,ds_1\,ds_2
+ \iint |Q_2(s_1,s_2)|\,ds_1\,ds_2,
\]
where all of the integrals are over $D_n$. It then follows from condition (6) that the foregoing sum is of order $|D_n|^2$. Thus the lemma is proven.
Proof of Theorem 1
First note that
\[
U_n^{(1)}(\beta) = \frac{1}{|D_n|}\Big[\sum_{x\in N\cap D_n} \frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)} - \int \lambda^{(1)}(s;\beta)\,ds\Big]
\]
and
\[
U_n^{(2)}(\beta) = \frac{1}{|D_n|}\Big[\sum_{x\in N\cap D_n} \frac{\lambda^{(2)}(x;\beta)}{\lambda(x;\beta)} - \int \lambda^{(2)}(s;\beta)\,ds\Big]
- \frac{1}{|D_n|}\sum_{x\in N\cap D_n} \Big[\frac{\lambda^{(1)}(x;\beta)}{\lambda(x;\beta)}\Big]^2
= U_{na}^{(2)}(\beta) - U_{nb}^{(2)}(\beta).
\]
Using the Taylor expansion, we can obtain
\[
|D_n|^{1/2}(\hat\beta_n - \beta_0) = -\big[U_n^{(2)}(\beta_n^*)\big]^{-1}|D_n|^{1/2}U_n^{(1)}(\beta_0),
\]
where $\beta_n^*$ is between $\hat\beta_n$ and $\beta_0$. We need to show that
\[
U_{na}^{(2)}(\beta_0) \xrightarrow{p} 0
\quad\text{and}\quad
U_{nb}^{(2)}(\beta_0) \xrightarrow{p} \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.
\]
1384 Journal of the American Statistical Association December 2007
To show the first expression, note that $E[U_{na}^{(2)}(\beta_0)] = 0$. Thus we need only look at the variance term:
\[
\operatorname{var}\big[U_{na}^{(2)}(\beta_0)\big]
= \frac{1}{|D_n|^2}\iint \frac{\lambda^{(2)}(s_1;\beta_0)\,\lambda^{(2)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\,\lambda(s_2;\beta_0)} Q_2(s_1,s_2)\,ds_1\,ds_2
+ \frac{1}{|D_n|^2}\int \frac{[\lambda^{(2)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds,
\]
which converges to 0 due to conditions (4)–(6). Thus $U_{na}^{(2)}(\beta_0) \xrightarrow{p} 0$.
To show the second expression, note that $E[U_{nb}^{(2)}(\beta_0)] = \frac{1}{|D_n|}\int [\lambda^{(1)}(s;\beta_0)]^2/\lambda(s;\beta_0)\,ds$. Thus again we need consider only the variance term:
\[
\operatorname{var}\big[U_{nb}^{(2)}(\beta_0)\big]
= \frac{1}{|D_n|^2}\iint \frac{[\lambda^{(1)}(s_1;\beta_0)\,\lambda^{(1)}(s_2;\beta_0)]^2}{\lambda(s_1;\beta_0)\,\lambda(s_2;\beta_0)} Q_2(s_1,s_2)\,ds_1\,ds_2
+ \frac{1}{|D_n|^2}\int \frac{[\lambda^{(1)}(s;\beta_0)]^4}{\lambda(s;\beta_0)}\,ds,
\]
which converges to 0 due to conditions (4)–(6). Thus $U_{nb}^{(2)}(\beta_0) \xrightarrow{p} \frac{1}{|D_n|}\int [\lambda^{(1)}(s;\beta_0)]^2/\lambda(s;\beta_0)\,ds$.
Now we want to show that $|D_n|^{1/2}U_n^{(1)}(\beta_0)$ converges to a normal distribution. First we derive the mean and variance of $|D_n|^{1/2}U_n^{(1)}(\beta_0)$. Clearly $E[|D_n|^{1/2}U_n^{(1)}(\beta_0)] = 0$. For the variance,
\[
\operatorname{var}\big[|D_n|^{1/2}U_n^{(1)}(\beta_0)\big]
= \frac{1}{|D_n|}\iint \frac{\lambda^{(1)}(s_1;\beta_0)\,\lambda^{(1)}(s_2;\beta_0)}{\lambda(s_1;\beta_0)\,\lambda(s_2;\beta_0)} Q_2(s_1,s_2)\,ds_1\,ds_2
+ \frac{1}{|D_n|}\int \frac{[\lambda^{(1)}(s;\beta_0)]^2}{\lambda(s;\beta_0)}\,ds.
\]
Now consider a new sequence of regions $D_n^*$, where $D_n^* \subset D_n$ and $|D_n^*|/|D_n| \to 1$ as $n \to \infty$. Let $U_{n*}^{(1)}(\beta_0)$ be the estimating function for the realization of the point process on $D_n^*$. Based on the expression of the variance, we can deduce that
\[
\operatorname{var}\big[|D_n|^{1/2}U_n^{(1)}(\beta_0) - |D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)\big] \to 0 \quad\text{as } n \to \infty.
\]
Thus
\[
|D_n|^{1/2}U_n^{(1)}(\beta_0) \sim |D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0),
\]
where the notation $a_n \sim b_n$ means that $a_n$ and $b_n$ have the same limiting distribution. Thus we need only show that $|D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution for some properly defined $D_n^*$.
To obtain a sequence of $D_n^*$, we apply the blocking technique used by Guan et al. (2004). Let $l(n) = n^{\alpha}$ and $m(n) = n^{\alpha} - n^{\eta}$ for $4/(2+\varepsilon) < \eta < \alpha < 1$, where $\varepsilon$ is defined as in condition (3). We first divide the original domain $D_n$ into some nonoverlapping $l(n) \times l(n)$ subsquares $D^i_{l(n)}$, $i = 1, \ldots, k_n$. Within each subsquare we further obtain $D^i_{m(n)}$, an $m(n) \times m(n)$ square sharing the same center with $D^i_{l(n)}$. Note that $d(D^i_{m(n)}, D^j_{m(n)}) \ge n^{\eta}$ for $i \ne j$. Now define
\[
D_n^* = \bigcup_{i=1}^{k_n} D^i_{m(n)}.
\]
Condition (2) implies that $|D_n^*|/|D_n| \to 1$. Thus we need only show that $|D_n^*|^{1/2}U_{n*}^{(1)}(\beta_0)$ converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and the application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.
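The blocking construction used in this proof — nonoverlapping $l(n) \times l(n)$ subsquares, each shrunk to a concentric $m(n) \times m(n)$ square so that distinct pieces of $D_n^*$ are at least $n^{\eta}$ apart — can be sketched numerically. The exponents below are illustrative choices, not values mandated by condition (3):

```python
def blocking(n, alpha=0.8, eta=0.5):
    """Return centers and side lengths of the squares D^i_{m(n)} inside [0, n]^2."""
    l = n ** alpha                  # side length of the subsquares D^i_{l(n)}
    m = n ** alpha - n ** eta       # side length of the concentric shrunk squares
    k = int(n // l)                 # subsquares per side (k_n = k * k in total)
    centers = [((i + 0.5) * l, (j + 0.5) * l)
               for i in range(k) for j in range(k)]
    return centers, l, m

centers, l, m = blocking(10_000.0)
# Adjacent shrunk squares are separated by l - m = n**eta; this growing gap
# is what drives the mixing argument behind the central limit theorem.
```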
APPENDIX B PROOF OF THEOREM 2
To simplify notation, let $\varphi(u) = g(u) - 1$, $\varphi_k(s_1,\ldots,s_k) = Q_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$, and $g_k(s_1,\ldots,s_k) = \lambda_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ for $k = 3$ and $4$, and let $Z(s) = \lambda^{(1)}(s)$. As in the proof of Theorem 1, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. As a result, $Z(s)$ is a scalar, so that $\operatorname{cov}(S_n) = \operatorname{var}(S_n)$. Furthermore, we consider only the case where $K = 1$. Thus the covariance estimator defined in (11) becomes
\[
\widehat{\operatorname{var}}(S_n) = \sum_{b=1}^{B} \frac{(S_n^b - \bar S_n)^2}{B-1}.
\]
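For $K = 1$ the covariance estimator in (11) is simply the sample variance of the $B$ bootstrap replicates $S_n^b$, which for large $B$ settles at the conditional variance of a single replicate. A toy check with stand-in replicates of known variance 4:

```python
import numpy as np

rng = np.random.default_rng(7)

# B bootstrap replicates S^b_n, here simulated as stand-ins with variance 4.
S_b = rng.normal(loc=3.0, scale=2.0, size=100_000)
var_hat = np.sum((S_b - S_b.mean()) ** 2) / (len(S_b) - 1)   # the estimator above
# var_hat agrees with np.var(S_b, ddof=1) and approaches 4 as B grows.
```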
As $B$ goes to infinity, we see that the foregoing converges to
\[
\operatorname{var}(S_n^b \mid N \cap D_n)
= \frac{1}{k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n} \sum_{D^j_{l(n)}}\sum_{D^j_{l(n)}} Z(x_1 - c_j + c_i)\,Z(x_2 - c_j + c_i)
- \frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \sum_{D^{j_1}_{l(n)}}\sum_{D^{j_2}_{l(n)}} Z(x_1 - c_{j_1} + c_i)\,Z(x_2 - c_{j_2} + c_i),
\]
where the notation $\sum_D$ is short for $\sum_{D \cap N}$. We wish to show that $\operatorname{var}(S_n^b \mid N \cap D_n)/|D_n|$ converges in $L_2$ to
\[
\frac{1}{|D_n|}\operatorname{var}(S_n)
= \frac{(\lambda_n)^2}{|D_n|}\int_{D_n}\int_{D_n} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2
+ \frac{\lambda_n}{|D_n|}\int_{D_n} [Z(s)]^2\,ds.
\]
To show this, we need to show that
\[
\frac{1}{|D_n|}E[\operatorname{var}(S_n^b \mid N \cap D_n)] \to \frac{1}{|D_n|}\operatorname{var}(S_n) \tag{B1}
\]
and
\[
\frac{1}{|D_n|^2}\operatorname{var}[\operatorname{var}(S_n^b \mid N \cap D_n)] \to 0. \tag{B2}
\]
To show (B1), note that
\[
E[\operatorname{var}(S_n^b \mid N \cap D_n)]
= (\lambda_n)^2 \sum_{i=1}^{k_n} \iint_{D^i_{l(n)}} Z(s_1)Z(s_2)g(s_1 - s_2)\,ds_1\,ds_2
+ \frac{\lambda_n(k_n - 1)}{k_n}\int_{D_n} [Z(s)]^2\,ds
- \frac{(\lambda_n)^2}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \iint_{D^i_{l(n)}} Z(s_1)Z(s_2)\, g(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2.
\]
Thus
\[
\frac{1}{|D_n|}\big\{E[\operatorname{var}(S_n^b \mid N \cap D_n)] - \operatorname{var}(S_n)\big\}
= -\frac{\lambda_n}{k_n|D_n|}\int_{D_n} [Z(s)]^2\,ds
- \frac{(\lambda_n)^2}{|D_n|}\sum_{i \ne j}\int_{D^i_{l(n)}}\int_{D^j_{l(n)}} Z(s_1)Z(s_2)\varphi(s_1 - s_2)\,ds_1\,ds_2
- \frac{(\lambda_n)^2}{|D_n|}\frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\iint_{D^i_{l(n)}} Z(s_1)Z(s_2)\varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2
= -T_n^1 - T_n^2 - T_n^3.
\]
$T_n^1$ goes to 0 due to conditions (2) and (5). For $T_n^2$ and $T_n^3$, lengthy yet elementary derivations yield
\[
|T_n^2| \le \frac{C(\lambda_n)^2}{k_n}\sum_{i=1}^{k_n}\bigg[\frac{1}{|D_n|}\int_{D_n - D_n} |D_n \cap (D_n - s)|\,|\varphi(s)|\,ds
- \frac{1}{|D^i_{l(n)}|}\int_{D^i_{l(n)} - D^i_{l(n)}} \big|D^i_{l(n)} \cap (D^i_{l(n)} - s)\big|\,|\varphi(s)|\,ds\bigg]
\]
and
\[
T_n^3 \le \frac{C(\lambda_n)^2}{k_n}\,\frac{1}{|D_n|}\int_{D_n}\int_{D_n} |\varphi(s_1 - s_2)|\,ds_1\,ds_2.
\]
Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B1) is proven. Furthermore, we can see that $T_n^1$ and $T_n^3$ are both of order $1/k_n$, whereas $T_n^2$ is of order $1/|D_{l(n)}|^{1/2}$ due to condition (12).

To prove (B2), we first note that $[\operatorname{var}(S_n^b \mid N \cap D_n)]^2$ is equal to the sum of the following three terms:
\[
\frac{1}{k_n^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \sum_{D^{j_1}_{l(n)}}\sum_{D^{j_1}_{l(n)}}\sum_{D^{j_2}_{l(n)}}\sum_{D^{j_2}_{l(n)}}
\big[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_1} + c_{i_1})\,Z(x_3 - c_{j_2} + c_{i_2})\,Z(x_4 - c_{j_2} + c_{i_2})\big],
\]
\[
-\frac{2}{k_n^3}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n} \sum_{D^{j_1}_{l(n)}}\sum_{D^{j_1}_{l(n)}}\sum_{D^{j_2}_{l(n)}}\sum_{D^{j_3}_{l(n)}}
\big[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_1} + c_{i_1})\,Z(x_3 - c_{j_2} + c_{i_2})\,Z(x_4 - c_{j_3} + c_{i_2})\big],
\]
and
\[
\frac{1}{k_n^4}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n} \sum_{D^{j_1}_{l(n)}}\sum_{D^{j_2}_{l(n)}}\sum_{D^{j_3}_{l(n)}}\sum_{D^{j_4}_{l(n)}}
\big[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_2} + c_{i_1})\,Z(x_3 - c_{j_3} + c_{i_2})\,Z(x_4 - c_{j_4} + c_{i_2})\big].
\]
Furthermore, the following relationships are possible for $x_1, \ldots, x_4$:

a. $x_1 = x_2 = x_3 = x_4$;
b. $x_1 = x_2$ and $x_3 = x_4$ but $x_1 \ne x_3$, or $x_1 = x_3$ and $x_2 = x_4$ but $x_1 \ne x_2$;
c. $x_1 = x_2$ but $x_2 \ne x_3 \ne x_4$, or $x_3 = x_4$ but $x_1 \ne x_2 \ne x_3$;
d. $x_1 = x_2 = x_3$ but $x_3 \ne x_4$, or $x_1 = x_2 \ne x_3$ but $x_1 = x_4$, or $x_2 = x_3 = x_4$ but $x_1 \ne x_2$;
e. $x_1 \ne x_2 \ne x_3 \ne x_4$ (all distinct).
When calculating the variance of $\frac{1}{|D_n|}\operatorname{var}(S_n^b \mid N \cap D_n)$, the foregoing relationships, combined with the expressions for the squared value of $\frac{1}{|D_n|}E[\operatorname{var}(S_n^b \mid N \cap D_n)]$, in turn lead to integrals of different complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than $1/k_n$ in a similar manner.
Let $F_1(s_1,s_2,s_3,s_4) = g_4(s_1,s_2,s_3,s_4) - g(s_1,s_2)g(s_3,s_4)$. Term e leads to the following integral:
\[
\frac{(\lambda_n)^4}{k_n^2|D_n|^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\int_{D^{j_1}_{l(n)}} Z(s_1 - c_{j_1} + c_{i_1})
\times \bigg[\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})\,Z(s_3 - c_{j_2} + c_{i_2})\,Z(s_4 - c_{j_2} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4
\]
\[
- \frac{2}{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})\,Z(s_3 - c_{j_2} + c_{i_2})\,Z(s_4 - c_{j_3} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4
\]
\[
+ \frac{1}{k_n^2}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}}\int_{D^{j_4}_{l(n)}} Z(s_2 - c_{j_2} + c_{i_1})\,Z(s_3 - c_{j_3} + c_{i_2})\,Z(s_4 - c_{j_4} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4\bigg]\,ds_1.
\]
Let $F_2(s_1,s_2,s_3,s_4) = \varphi_4(s_1,s_2,s_3,s_4) + 2\varphi_3(s_1,s_2,s_3) + 2\varphi_3(s_1,s_3,s_4) + 2\varphi(s_1 - s_3)\varphi(s_2 - s_4)$. By noting that $Z(\cdot)$ is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by
\[
\frac{2C(\lambda_n)^4}{k_n|D_n|^2}\sum_{j_1=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D_n}\int_{D_n} |F_2(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4
+ \frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |F_1(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4.
\]
The first integral in the foregoing is of order $1/k_n$ due to conditions (4), (5), and (6). The second integral is bounded by
\[
\frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4
+ \frac{4C(\lambda_n)^4}{k_n|D_n|}\sum_{j_2=1}^{k_n}\int_{D_n}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi_3(s_1,s_3,s_4)|\,ds_1\,ds_3\,ds_4
+ \frac{2C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\bigg[\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi(s_1 - s_3)|\,ds_1\,ds_3\bigg]^2.
\]
The foregoing is of order $1/k_n + a_n$ due to conditions (4), (5), (6), and (12), where the order of $a_n$ is no higher than $1/|D_{l(n)}|$.
APPENDIX C PROOF OF THE JUSTIFICATION FOR USING $\hat\beta_n$ IN THE THINNING STEP

Here we assume $|D_{l(n)}| = o(|D_n|^{1/2})$. Let $p_n(x;\theta) = \min_{s \in D_n} \lambda(s;\theta)/\lambda(x;\theta)$, and let $r(x)$ be a uniform random variable on $[0,1]$. If $r(x) \le p_n(x;\theta)$, where $x \in (N \cap D_n)$, then $x$ will be retained in the thinned process, say $\tilde N(\theta)$. Based on the same set $\{r(x) : x \in (N \cap D_n)\}$, we can determine $\tilde N(\theta_0)$ and $\tilde N(\hat\theta_n)$. Note that $\tilde N(\theta_0)$ is as defined in (7) using the true FOIF. Define $a = \{x \in \tilde N(\theta_0) : x \notin \tilde N(\hat\theta_n)\}$ and $b = \{x \notin \tilde N(\theta_0) : x \in \tilde N(\hat\theta_n)\}$. Note that $P(x \in a \cup b \mid x \in N) \le |p_n(x;\hat\theta_n) - p_n(x;\theta_0)|$. We need to show that
\[
\frac{1}{|D_n|}\big\{\operatorname{var}[S_n^b \mid \tilde N(\theta_0) \cap D_n] - \operatorname{var}[S_n^b \mid \tilde N(\hat\theta_n) \cap D_n]\big\} \xrightarrow{p} 0,
\]
where $\operatorname{var}[S_n^b \mid \tilde N(\theta_0) \cap D_n]$ is the quantity $\operatorname{var}(S_n^b \mid N \cap D_n)$ defined in Appendix B; $\operatorname{var}[S_n^b \mid \tilde N(\hat\theta_n) \cap D_n]$ is defined analogously.
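The coupling used in this argument — thinning with probability p_n(x; θ) = min_s λ(s; θ)/λ(x; θ) while reusing the same uniforms r(x) under the true parameter and under the estimate — can be sketched as follows. The pattern and intensity functions are hypothetical stand-ins, and the minimum over the observed points approximates the minimum over D_n:

```python
import numpy as np

rng = np.random.default_rng(3)

def thin_coupled(pts, lam_by_theta):
    """Thin one pattern under several parameter values, reusing a single set of
    uniforms r(x) so that the resulting thinned patterns are coupled."""
    r = rng.random(len(pts))                     # one r(x) per point, shared
    out = {}
    for theta, lam_vals in lam_by_theta.items():
        # min over the observed points stands in for min over D_n
        p = lam_vals.min() / lam_vals            # p_n(x; theta)
        out[theta] = pts[r <= p]
    return out

# Hypothetical pattern and intensities under theta_0 and a nearby estimate.
pts = rng.random((500, 2))
lams = {"theta0": 100.0 * np.exp(pts[:, 0]),
        "theta_hat": 100.0 * np.exp(1.05 * pts[:, 0])}
thinned = thin_coupled(pts, lams)
n_diff = len(set(map(tuple, thinned["theta0"])) ^
             set(map(tuple, thinned["theta_hat"])))
# Because the intensities are close and r(x) is shared, the two thinned
# patterns differ only in the few points x with r(x) between the two p_n(x).
```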
Let $\sum_D^{\ne}$ and $\sum\sum_D^{\ne}$ denote
\[
\sum_{D \cap (a \cup b)} \quad\text{and}\quad \sum_{\substack{x_1 \in D \cap N,\; x_2 \in D \cap (a \cup b) \\ x_1 \ne x_2}},
\]
respectively. Note that
\[
\frac{1}{|D_n|}\big|\operatorname{var}[S_n^b \mid \tilde N(\theta_0) \cap D_n] - \operatorname{var}[S_n^b \mid \tilde N(\hat\theta_n) \cap D_n]\big|
< \frac{C}{|D_n|k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{D^j_{l(n)} \cap N}\;\sum_{D^j_{l(n)}}^{\ne} 1
+ \frac{C}{|D_n|k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{D^{j_1}_{l(n)} \cap N}\;\sum_{D^{j_2}_{l(n)}}^{\ne} 1
= \frac{C}{|D_n|}\sum_{j=1}^{k_n}\sum\sum_{D^j_{l(n)}}^{\ne} 1
+ \frac{C}{|D_n|}\sum_{D_n}^{\ne} 1
+ \frac{C}{|D_n|k_n}\bigg[\sum_{D_n}^{\ne} 1\bigg]\bigg[\sum_{x \in D_n \cap N} 1\bigg].
\]
Note that $P[x \in (a \cup b) \mid x \in N] \le C/|D_n|^{\alpha/2}$ with probability 1 for $0 < \alpha < 1$, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.
[Received December 2006 Revised May 2007]
REFERENCES
Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.
Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.
Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.
Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.
Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.
Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.
Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.
——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.
Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.
Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.
McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.
Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.
Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.
Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.
Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.
Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.
Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.
——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.
Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.
Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 62, 252–258.
Guan and Loh Block Bootstrap Variance Estimation 1383
Figure 2 Locations of BLP trees
We reanalyzed the data because the INSP model was only aldquocruderdquo model for the data as pointed out by Waagepetersen(2007) The INSP model attributed the possible clusteringamong the BPL trees to a one-round seed-dispersal process Inreality however the clustering might be due to many differentfactors (eg important environmental factors other than eleva-tion and gradient) that were not included in the model Evenif the clustering was due to seed dispersal alone it probablywas due to seed dispersal that had occurred in multiple genera-tions As a result the validity of the model and the subsequentestimated standard errors and inference on the regression pa-rameters required further investigation Our proposed thinnedblocking bootstrap method on the other hand requires muchweaker conditions and thus might yield more reasonable esti-mates for the standard errors and more reliable inference on theregression parameters
To estimate the standard errors for β1 and β2 we applied(12) with K = 100 and B = 999 and used 200 times 100 m sub-blocks We chose K = 100 because the trace plots for the es-timated standard errors became fairly stable if 100 or morethinned realizations were used We selected the subblock sizefrom using the data-driven procedure discussed at the end ofSection 3 Table 4 gives the estimated standard errors for β1 andβ2 using the thinned block bootstrap and the plug-in methodof Waagepetersen (2007) along with the respective 95 confi-dence intervals based on these estimated standard errors Notethat the intervals based on the thinned block bootstrap areslightly shorter than those from the plug-in method but nev-ertheless lead to the same conclusion as the plug-in methodSpecifically they suggest that β2 is significant but β1 is not
Table 4 Results for the BLP data
Plug-in method Thinned block bootstrap
EST STD 95 CI STD 95 CI
β1 02 02 (minus02 06) 017 (minus014 054)
β2 584 253 (891080) 212 (169999)
NOTE EST STD and CI denote the point estimate of the regression parameters by maxi-mizing (1) the standard deviation and the confidence interval
From a biological standpoint this means that the BPL treesprefer to live on slopes but do not really favor either high orlow altitudes Thus our results formally confirm the findings ofWaagepetersen (2007) by using less restrictive assumptions
APPENDIX A PROOF OF THEOREM 1
Let β0 represent the true regression parameter For ease of presen-tation but without loss of generality we assume that β and β0 are bothscalars that is p = 1 In the case of p gt 1 the proof can be easilygeneralized by applying the CramerndashWold device
Lemma A1 Assume that conditions (4)ndash(6) hold Then |Dn|2 timesE[U(1)
n (β0)]4 lt C for some C lt infin
Proof By repeatedly using Campbellrsquos theorem (eg Stoyan andStoyan 1994) and the relationship between moments and cumulantsand using the fact that λ(1)(sβ0)λ(sβ0) is bounded due to condi-
tions (4) and (5) we can derive that |Dn|4E[U(1)n (β0)]4 is bounded
by the following (ignoring some multiplicative constants)int int int int
|Q4(s1 s2 s3 s4)|ds1 ds2 ds3 ds4
+[int int
|Q2(s1 s2)|ds1 ds2
]2
+int int int
|Q3(s1 s2 s3)|ds1 ds2 ds3
+ |Dn|int int
|Q2(s1 s2)|ds1 ds2
+int int
|Q2(s1 s2)|ds1 ds2
where all of the integrals are over Dn It then follows from condition(6) that the foregoing sum is of order |Dn|2 Thus the lemma is proven
Proof of Theorem 1
First note that
U(1)n (β) = 1
|Dn|[ sum
xisinNcapDn
λ(1)(xβ)
λ(xβ)minus
intλ(1)(sβ)ds
]
and
U(2)n (β) = 1
|Dn|[ sum
xisinNcapDn
λ(2)(xβ)
λ(xβ)minus
intλ(2)(sβ)ds
]
minus 1
|Dn|sum
xisinNcapDn
[λ(1)(xβ)
λ(xβ)
]2
= U(2)na(β) minus U
(2)nb
(β)
Using the Taylor expansion we can obtain
|Dn|12(βn minus β0) = minus[U
(2)n (βn)
]minus1|Dn|12U(1)n (β0)
where βn is between βn and β0 We need to show that
U(2)na(β0)
prarr 0
and
U(2)nb
(β0)prarr 1
|Dn|int [λ(1)(sβ0)]2
λ(sβ0)ds
1384 Journal of the American Statistical Association December 2007
To show the first expression note that E[U(2)na(β0)] = 0 Thus we only
need look at the variance term
var[U
(2)na(β0)
]
= 1
|Dn|2int int
λ(2)(s1β0)λ(2)(s2β0)
λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2
+ 1
|Dn|2int [λ(2)(sβ0)]2
λ(sβ0)ds
which converges to 0 due to conditions (4)ndash(6) Thus U(2)na(β0)
prarr 0
To show the second expression note that E[U(2)nb
(β0)] =1
|Dn|int [λ(1)(sβ0)]2λ(sβ0) ds Thus again we need consider only
the variance term
var[U
(2)nb
(β0)]
= 1
|Dn|2int int [λ(1)(s1β0)λ(1)(s2β0)]2
λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2
+ 1
|Dn|2int [λ(1)(sβ0)]4
λ(sβ0)ds
which converges to 0 due to conditions (4)ndash(6) Thus U(2)nb
(β0)prarr
1|Dn|
int [λ(1)(sβ0)]2λ(sβ0) ds
Now we want to show that |Dn|12U(1)n (β0) converges to a
normal distribution First we derive the mean and variance of|Dn|12U
(1)n (β0) Clearly E[|Dn|12U
(1)n (β0)] = 0 For the variance
var[|Dn|12U
(1)n (β0)
]
= 1
|Dn|int int
λ(1)(s1β0)λ(1)(s2β0)
λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2
+ 1
|Dn|int [λ(1)(sβ0)]2
λ(sβ0)ds
Now consider a new sequence of regions Dlowastn where Dlowast
n sub Dn and
|Dlowastn||Dn| rarr 1 as n rarr infin Let U
(1)nlowast(β0) be the estimating function
for the realization of the point process on Dlowastn Based on the expression
of the variance we can deduce that
var[|Dn|12U
(1)n (β0) minus |Dlowast
n|12U(1)nlowast(β0)
] rarr 0 as n rarr infin
Thus
|Dn|12U(1)n (β0) sim |Dlowast
n|12U(1)nlowast(β0)
where the notation an sim bn means that an and bn have the same lim-iting distribution Thus we need only show that |Dlowast
n|12U(1)nlowast(β0) con-
verges to a normal distribution for some properly defined Dlowastn
To obtain a sequence of Dlowastn we apply the blocking technique used
by Guan et al (2004) Let l(n) = nα m(n) = nα minus nη for 4(2 + ε) lt
η lt α lt 1 where ε is defined as in condition (3) We first divide theoriginal domain Dn into some nonoverlapping l(n)times l(n) subsquaresDi
l(n) i = 1 kn Within each subsquare we further obtain Di
m(n)
an m(n) times m(n) square sharing the same center with Dil(n)
Note that
d(Dim(n)
Djm(n)
) ge nη for i = j Now define
Dlowastn =
kn⋃
i=1
Dim(n)
Condition (2) implies that |Dlowastn||Dn| rarr 1 Thus we need only show
that |Dlowastn|12U
(1)nlowast(β0) converges to a normal distribution This is true
due to the mixing condition the result in Lemma A1 and the appli-cation of the Lyapunovrsquos central limit theorem The proof is similar tothat of Guan et al (2004) thus we omit the details here
APPENDIX B PROOF OF THEOREM 2
To simplify notation let ϕ(u) = g(u) minus 1 ϕk(s1 sk) = Qk(s1
sk)[λ(s1) middot middot middotλ(sk)] gk(s1 sk) = λk(s1 sk)[λ(s1) middot middot middotλ(sk)] for k = 3 and 4 and Z(s) = λ(1)(s) As in the proof of The-orem 1 we assume that β and β0 are both scalars that is p = 1 Asa result Z(s) is a scalar so that cov(Sn) = var(Sn) Furthermore weconsider only the case where K = 1 Thus the covariance estimatordefined in (11) becomes
var(Sn) =Bsum
b=1
(Sbn minus Sn)2
B minus 1
As B goes to infinity we see that the foregoing converges to
var(Sbn | cap Dn)
= 1
kn
knsum
i=1
knsum
j=1
sum
Dj
l(n)
sum
Dj
l(n)
Z(x1 minus cj + ci )Z(x2 minus cj + ci )
minus 1
k2n
knsum
i=1
knsum
j1=1
knsum
j2=1
sum
Dj1l(n)
sum
Dj2l(n)
Z(x1 minus cj1 + ci
)
times Z(x2 minus cj2 + ci
)
where the notationsum
D is short forsum
Dcap We wish to show thatvar(Sb
n | cap Dn)|Dn| converges in L2 to
1
|Dn| var(Sn) = (λn)2
|Dn|int
Dn
int
Dn
Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2
+ λn
|Dn|int
Dn
[Z(s)]2 ds
To show this we need to show that
1
|Dn|E[var(Sbn | cap Dn)] rarr 1
|Dn| var(Sn) (B1)
and
1
|Dn|2 var[var(Sbn | cap Dn)] rarr 0 (B2)
To show (B1) note that
E[var(Sbn | cap Dn)]
= (λn)2knsum
i=1
int int
Dil(n)
Z(s1)Z(s2)g(s1 minus s2) ds1 ds2
+ λn(kn minus 1)
kn
int
Dn
[Z(s)]2 ds
minus (λn)2
k2n
knsum
i=1
knsum
j1=1
knsum
j2=1
int int
Dil(n)
Z(s1)Z(s2)
times g(s1 minus s2 + cj1 minus cj2
)ds1 ds2
Thus
1
|Dn|E[var(Sb
n | cap Dn)] minus var(Sn)
= minus λn
kn|Dn|int
Dn
[Z(s)]2 ds
minus (λn)2
|Dn|sumsum
i =j
int
Dil(n)
int
Dj
l(n)
Z(s1)Z(s2)ϕ(s1 minus s2) ds1 ds2
Guan and Loh Block Bootstrap Variance Estimation 1385
minus (λn)2
|Dn|1
k2n
knsum
i=1
knsum
j1=1
knsum
j2=1
int int
Dil(n)
Z(s1)Z(s2)
times ϕ(s1 minus s2 + cj1 minus cj2
)ds1 ds2
= minusT 1n minus T 2
n minus T 3n
T 1n goes to 0 due to conditions (2) and (5) For T 2
n and T 3n lengthy yet
elementary derivations yield
|T 2n | le C(λn)2
|kn|knsum
i=1
[1
|Dn|int
DnminusDn
|Dn cap (Dn minus s)||ϕ(s)|ds
minus 1
|Dil(n)
|int
Dil(n)
minusDil(n)
∣∣Dil(n) cap (
Dil(n) minus s
)∣∣|ϕ(s)|ds]
and
T 3n le C(λn)2
kn
1
|Dn|int
Dn
int
Dn
|ϕ(s1 minus s2)|ds1 ds2
Both terms converge to 0 due to conditions (2) (4) (5) and (6) Thus(B1) is proven Furthermore we can see that both T 1
n and T 3n are both
of order 1kn whereas T 2n is of order 1|Dl(n)|12 due to condition
(12)To prove (B2) we first note that [var(Sb
n | cap Dn)]2 is equal to thesum of the following three terms
1
k2n
knsum
i1=1
knsum
i2=1
knsum
j1=1
knsum
j2=1
sum
Dj1l(n)
sum
Dj1l(n)
sum
Dj2l(n)
sum
Dj2l(n)
[Z
(x1 minus cj1 + ci1
)
times Z(x2 minus cj1 + ci1
)Z
(x3 minus cj2 + ci2
)Z
(x4 minus cj2 + ci2
)]
minus 2
k3n
knsum
i1=1
knsum
i2=1
knsum
j1=1
knsum
j2=1
knsum
j3=1
sum
Dj1l(n)
sum
Dj1l(n)
sum
Dj2l(n)
sum
Dj3l(n)
[Z
(x1 minus cj1
+ ci1
)Z
(x2 minus cj1 + ci1
)Z
(x3 minus cj2 + ci2
)
times Z(x4 minus cj3 + ci2
)]
and
1
k4n
knsum
i1=1
knsum
i2=1
knsum
j1=1
knsum
j2=1
knsum
j3=1
knsum
j4=1
sum
Dj1l(n)
sum
Dj2l(n)
sum
Dj3l(n)
sum
Dj4l(n)
[Z
(x1 minus cj1
+ ci1
)Z
(x2 minus cj2 + ci1
)Z
(x3 minus cj3 + ci2
)Z
(x4 minus cj4 + ci2
)]
Furthermore the following relationships are possible for x1 x4
a x1 = x2 = x3 = x4b x1 = x2 and x3 = x4 but x1 = x3 or x1 = x3 and x2 = x4 but
x1 = x2c x1 = x2 but x2 = x3 = x4 or x3 = x4 but x1 = x2 = x3d x1 = x2 = x3 but x3 = x4 or x1 = x2 = x3 but x1 = x4 or x2 =
x3 = x4 but x1 = x2e x1 = x2 = x3 = x4
When calculating the variance of 1|Dn| var(Sb
n | cap Dn) the forego-ing relationships combined with the expressions for the squared valueof 1
|Dn|E[var(Sbn | cap Dn)] in turn lead to integrals of different com-
plexity To save space we present only the integral from term e theremaining integrals can be shown to have order no higher than 1kn ina similar manner
Let F1(s1 s2 s3 s4) = g4(s1 s2 s3 s4) minus g(s1 s2)g(s3 s4)Term e leads to the following integral
(λn)4
k2n|Dn|2
knsum
i1=1
knsum
i2=1
knsum
j1=1
int
Dj1l(n)
Z(s1 minus cj1 + ci1
)
times[ knsum
j2=1
int
Dj1l(n)
int
Dj2l(n)
int
Dj2l(n)
Z(s2 minus cj1 + ci1
)Z
(s3 minus cj2 + ci2
)
times Z(s4 minus cj2 + ci2
)F1(s1 s2 s3 s4) ds2 ds3 ds4
minus 2
kn
knsum
j2=1
knsum
j3=1
int
Dj1l(n)
int
Dj2l(n)
int
Dj3l(n)
Z(s2 minus cj1 + ci1
)
times Z(s3 minus cj2 + ci2
)Z
(s4 minus cj3 + ci2
)
times F1(s1 s2 s3 s4) ds2 ds3 ds4
+ 1
k2n
knsum
j2=1
knsum
j3=1
knsum
j4=1
int
Dj2l(n)
int
Dj3l(n)
int
Dj4l(n)
Z(s2 minus cj2 + ci1
)
times Z(s3 minus cj3 + ci2
)Z
(s4 minus cj4 + ci2
)
times F1(s1 s2 s3 s4) ds2 ds3 ds4
]ds1
Let F2(s1 s2 s3 s4) = ϕ4(s1 s2 s3 s4) + 2ϕ3(s1 s2 s3) + 2ϕ3(s1
s3 s4) + 2ϕ(s1 minus s3)ϕ(s2 minus s4) By noting that Z(middot) is bounded wecan derive from lengthy yet elementary algebra that the foregoing isbounded by
2C(λn)4
kn|Dn|2knsum
j1=1
int
Dj1l(n)
int
Dj1l(n)
int
Dn
int
Dn
|F2(s1 s2
s3 s4)|ds1 ds2 ds3 ds4
+ C(λn)4
|Dn|2knsum
j1=1
knsum
j2=1
int
Dj1l(n)
int
Dj1l(n)
int
Dj2l(n)
int
Dj2l(n)
|F1(s1 s2
s3 s4)|ds1 ds2 ds3 ds4
The first integral in the foregoing is of order 1kn due to conditions(4) (5) and (6) The second integral is bounded by
C(λn)4
|Dn|2knsum
j1=1
knsum
j2=1
int
Dj1l(n)
int
Dj1l(n)
int
Dj2l(n)
int
Dj2l(n)
|ϕ4(s1 s2
s3 s4)|ds1 ds2 ds3 ds4
+ 4C(λn)4
kn|Dn|knsum
j2=1
int
Dn
int
Dj2l(n)
int
Dj2l(n)
|ϕ3(s1 s3 s4)|ds1 ds3 ds4
+ 2C(λn)4
|Dn|2knsum
j1=1
knsum
j2=1
[int
Dj1l(n)
int
Dj2l(n)
|ϕ(s1 s3)|ds1 ds3
]2
The foregoing is of order 1kn + an due to conditions (4) (5) (6) and(12) where the order of an is no higher than 1|Dl(n)|
APPENDIX C JUSTIFICATION FOR USING βnIN THE THINNING STEP
Here we assume |Dl(n)| = o(|Dn|12) Let pn(x θ) = minsisinDnλ(s
θ)λ(x θ) and let r(x) be a uniform random variable in [01] Ifr(x) le pn(x θ) where x isin (N cap Dn) then x will be retained in thethinned process say (θ) Based on the same set r(x) x isin (N cap
1386 Journal of the American Statistical Association December 2007
Dn) we can determine (θ0) and (θn) Note that (θ0) is as de-fined in (7) using the true FOIF Define a = x isin (θ0)x isin (θn)and b = x isin (θ0)x isin (θn) Note that P(x isin a cupb|x isin N) le|pn(x θn) minus pn(x θ0)| We need to show that
1
|Dn|var[Sb
n |(θ0) cap Dn] minus var[Sbn |(θn) cap Dn] prarr 0
where var[Sbn |(θ0) cap Dn] is var(Sb
n | cap N) defined in Appendix Bvar[Sb
n |(θn) cap Dn] is defined analogously
Letsum
D andsumsum =
Ddenote
sum
Dcap(acupb)
andsumsum
x1isinDcapNx2isinDcap(acupb)x1 =x2
Note that
1
|Dn|∣∣var[Sb
n |(θ0) cap Dn] minus var[Sbn |(θn) cap Dn]∣∣
ltC
|Dn|kn
knsum
i=1
knsum
j=1
sum
Dj
l(n)capN
sum
Dj
l(n)
1
+ C
|Dn|k2n
knsum
i=1
knsum
j1=1
knsum
j2=1
sum
Dj1l(n)
capN
sum
Dj2l(n)
1
= C
|Dn|knsum
j=1
=sumsum
Dj
l(n)
1
+ C
|Dn|sum
Dn
1 + C
|Dn|kn
[sum
Dn
1
][ sum
xisinDncapN
1
]
Note that P [x isin (a cup b)|x isin N ] le C|Dn|α2 with probability 1 for
0 lt α lt 1 due to Theorem 1 Thus all three terms in the foregoingconverge to 0 in probability This in turn leads to the consistency ofthe variance estimator
[Received December 2006 Revised May 2007]
REFERENCES
Baddeley A J and Turner R (2005) ldquoSpatstat An R Package for AnalyzingSpatial Point Patternsrdquo Journal of Statistical Software 12 1ndash42
Baddeley A J Moslashller J and Waagepetersen R (2000) ldquoNon- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patternsrdquo Sta-tistica Neerlandica 54 329ndash350
Brillinger D R (1975) Time Series Data Analysis and Theory New YorkHolt Rinehart amp Winston
Cressie N A C (1993) Statistics for Spatial Data New York WileyDaley D J and Vere-Jones D (1988) An Introduction to the Theory of Point
Processes New York Springer-VerlagDiggle P J (2003) Statistical Analysis of Spatial Point Patterns New York
Oxford University Press IncDoukhan P (1994) Mixing Properties and Examples New York Springer-
VerlagGuan Y (2007) ldquoVariance Estimation for Statistics Computed From Inhomo-
geneous Spatial Point Processesrdquo Journal of the Royal Statistical SocietySer B to appear
Guan Y Sherman M and Calvin J A (2004) ldquoA Nonparametric Test forSpatial Isotropy Using Subsamplingrdquo Journal of the American Statistical As-sociation 99 810ndash821
(2006) ldquoAssessing Isotropy for Spatial Point Processesrdquo Biometrics62 119ndash125
Hall P and Jing B (1996) ldquoOn Sample Reuse Methods for Dependent DatardquoJournal of the Royal Statistical Society Ser B 58 727ndash737
Heinrich L (1985) ldquoNormal Convergence of Multidimensional Shot Noise andRates of This Convergencerdquo Advances in Applied Probability 17 709ndash730
Lahiri S N (2003) Resampling Methods for Dependent Data New YorkSpringer-Verlag
McElroy T and Politis D N (2007) ldquoStable Marked Point Processesrdquo TheAnnals of Statistics 35 393ndash419
Moslashller J Syversveen A R and Waagepetersen R P (1998) ldquoLog GaussianCox Processesrdquo Scandinavian Journal of Statistics 25 451ndash482
Politis D N (1999) Subsampling New York Springer-VerlagPolitis D N and Sherman M (2001) ldquoMoment Estimation for Statistics
From Marked Point Processesrdquo Journal of the Royal Statistical SocietySer B 63 261ndash275
Rathbun S L and Cressie N (1994) ldquoAsymptotic Properties of Estimatorsfor the Parameters of Spatial Inhomogeneous Poisson Point Processesrdquo Ad-vances in Applied Probability 26 122ndash154
Rosenblatt M (1956) ldquoA Central Limit Theorem and a Strong Mixing Condi-tionrdquo Proceedings of the National Academy of Sciences 42 43ndash47
Schoenberg F P (2004) ldquoConsistent Parametric Estimation of the Intensityof a SpatialndashTemporal Point Processrdquo Journal of Statistical Planning andInference 128 79ndash93
Sherman M (1996) ldquoVariance Estimation for Statistics Computed FromSpatial Lattice Datardquo Journal of the Royal Statistical Society Ser B 58509ndash523
(1997) ldquoSubseries Methods in Regressionrdquo Journal of the AmericanStatistical Association 92 1041ndash1048
Stoyan D and Stoyan H (1994) Fractals Random Shapes and Point FieldsNew York Wiley
Waagepetersen R P (2007) ldquoAn Estimating Function Approach to Inferencefor Inhomogeneous NeymanndashScott Processesrdquo Biometrics 62 252ndash258
1384 Journal of the American Statistical Association December 2007
To show the first expression note that E[U(2)na(β0)] = 0 Thus we only
need look at the variance term
var[U
(2)na(β0)
]
= 1
|Dn|2int int
λ(2)(s1β0)λ(2)(s2β0)
λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2
+ 1
|Dn|2int [λ(2)(sβ0)]2
λ(sβ0)ds
which converges to 0 due to conditions (4)ndash(6) Thus U(2)na(β0)
prarr 0
To show the second expression note that E[U(2)nb
(β0)] =1
|Dn|int [λ(1)(sβ0)]2λ(sβ0) ds Thus again we need consider only
the variance term
var[U
(2)nb
(β0)]
= 1
|Dn|2int int [λ(1)(s1β0)λ(1)(s2β0)]2
λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2
+ 1
|Dn|2int [λ(1)(sβ0)]4
λ(sβ0)ds
which converges to 0 due to conditions (4)ndash(6) Thus U(2)nb
(β0)prarr
1|Dn|
int [λ(1)(sβ0)]2λ(sβ0) ds
Now we want to show that |Dn|12U(1)n (β0) converges to a
normal distribution First we derive the mean and variance of|Dn|12U
(1)n (β0) Clearly E[|Dn|12U
(1)n (β0)] = 0 For the variance
var[|Dn|12U
(1)n (β0)
]
= 1
|Dn|int int
λ(1)(s1β0)λ(1)(s2β0)
λ(s1β0)λ(s2β0)Q2(s1 s2) ds1 ds2
+ 1
|Dn|int [λ(1)(sβ0)]2
λ(sβ0)ds
Now consider a new sequence of regions $D^*_n$, where $D^*_n \subset D_n$ and $|D^*_n|/|D_n| \to 1$ as $n \to \infty$. Let $U^{(1)}_{n*}(\beta_0)$ be the estimating function for the realization of the point process on $D^*_n$. Based on the expression of the variance, we can deduce that
\[
\mathrm{var}\bigl[|D_n|^{1/2}U^{(1)}_n(\beta_0) - |D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0)\bigr] \to 0 \quad\text{as } n \to \infty.
\]
Thus
\[
|D_n|^{1/2}U^{(1)}_n(\beta_0) \sim |D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0),
\]
where the notation $a_n \sim b_n$ means that $a_n$ and $b_n$ have the same limiting distribution. Thus we need only show that $|D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0)$ converges to a normal distribution for some properly defined $D^*_n$.
To obtain a sequence of $D^*_n$, we apply the blocking technique used by Guan et al. (2004). Let $l(n) = n^{\alpha}$ and $m(n) = n^{\alpha} - n^{\eta}$ for $4/(2+\epsilon) < \eta < \alpha < 1$, where $\epsilon$ is defined as in condition (3). We first divide the original domain $D_n$ into nonoverlapping $l(n) \times l(n)$ subsquares $D^i_{l(n)}$, $i = 1, \ldots, k_n$. Within each subsquare, we further obtain $D^i_{m(n)}$, an $m(n) \times m(n)$ square sharing the same center as $D^i_{l(n)}$. Note that $d(D^i_{m(n)}, D^j_{m(n)}) \ge n^{\eta}$ for $i \ne j$. Now define
\[
D^*_n = \bigcup_{i=1}^{k_n} D^i_{m(n)}.
\]
Condition (2) implies that $|D^*_n|/|D_n| \to 1$. Thus we need only show that $|D^*_n|^{1/2}U^{(1)}_{n*}(\beta_0)$ converges to a normal distribution. This is true due to the mixing condition, the result in Lemma A.1, and an application of Lyapunov's central limit theorem. The proof is similar to that of Guan et al. (2004); thus we omit the details here.
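As a concrete companion to the blocking construction above, the following Python sketch lays out the subsquares $D^i_{l(n)}$ and their concentric inner squares $D^i_{m(n)}$ on a hypothetical square domain $[0,n] \times [0,n]$. The exponent values are illustrative choices satisfying $\eta < \alpha < 1$ only, and edge effects at the domain boundary are ignored.

```python
def blocking(n, alpha=0.7, eta=0.5):
    """Sketch of the blocking construction: big l(n) x l(n) subsquares and
    concentric m(n) x m(n) inner squares on the domain [0, n] x [0, n]."""
    l = n ** alpha                 # side length l(n) = n^alpha of the subsquares
    m = n ** alpha - n ** eta      # side length m(n) = n^alpha - n^eta of inner squares
    k_side = int(n // l)           # subsquares per side (boundary remainder dropped)
    # centers c_i of the subsquares D^i_{l(n)} on a regular grid
    centers = [((i + 0.5) * l, (j + 0.5) * l)
               for i in range(k_side) for j in range(k_side)]
    # each inner square D^i_{m(n)} shares its center with D^i_{l(n)}
    inner = [(cx - m / 2, cx + m / 2, cy - m / 2, cy + m / 2)
             for cx, cy in centers]
    return l, m, inner

l, m, inner = blocking(1000.0)
gap = l - m   # separation between adjacent inner squares: exactly n^eta
```

Adjacent inner squares are separated by exactly $l(n) - m(n) = n^{\eta}$, which is the separation $d(D^i_{m(n)}, D^j_{m(n)}) \ge n^{\eta}$ used in the proof to invoke the mixing condition.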
APPENDIX B: PROOF OF THEOREM 2

To simplify notation, let $\varphi(u) = g(u) - 1$, $\varphi_k(s_1,\ldots,s_k) = Q_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$, and $g_k(s_1,\ldots,s_k) = \lambda_k(s_1,\ldots,s_k)/[\lambda(s_1)\cdots\lambda(s_k)]$ for $k = 3$ and $4$, and let $Z(s) = \lambda^{(1)}(s)$. As in the proof of Theorem 1, we assume that $\beta$ and $\beta_0$ are both scalars, that is, $p = 1$. As a result, $Z(s)$ is a scalar, so that $\mathrm{cov}(S_n) = \mathrm{var}(S_n)$. Furthermore, we consider only the case where $K = 1$. Thus the covariance estimator defined in (11) becomes
\[
\widehat{\mathrm{var}}(S_n) = \sum_{b=1}^{B} \frac{(S^b_n - \bar{S}_n)^2}{B - 1}.
\]
As $B$ goes to infinity, we see that the foregoing converges to
\[
\mathrm{var}\bigl(S^b_n \mid \tilde{N} \cap D_n\bigr)
= \frac{1}{k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1 \in D^j_{l(n)}}\sum_{x_2 \in D^j_{l(n)}} Z(x_1 - c_j + c_i)\,Z(x_2 - c_j + c_i)
- \frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1 \in D^{j_1}_{l(n)}}\sum_{x_2 \in D^{j_2}_{l(n)}} Z(x_1 - c_{j_1} + c_i)\,Z(x_2 - c_{j_2} + c_i),
\]
where, for a region $D$, the notation $\sum_{x \in D}$ is short for $\sum_{x \in D \cap \tilde{N}}$, with $\tilde{N}$ the thinned process defined in (7). We wish to show that $\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)/|D_n|$ converges in $L^2$ to
\[
\frac{1}{|D_n|}\mathrm{var}(S_n)
= \frac{(\lambda_n)^2}{|D_n|}\int_{D_n}\int_{D_n} Z(s_1)Z(s_2)\,\varphi(s_1 - s_2)\,ds_1\,ds_2
+ \frac{\lambda_n}{|D_n|}\int_{D_n} [Z(s)]^2\,ds.
\]
To show this, we need to show that
\[
\frac{1}{|D_n|}E\bigl[\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)\bigr] \to \frac{1}{|D_n|}\mathrm{var}(S_n) \tag{B1}
\]
and
\[
\frac{1}{|D_n|^2}\mathrm{var}\bigl[\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)\bigr] \to 0. \tag{B2}
\]
To show (B1), note that
\[
E\bigl[\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)\bigr]
= (\lambda_n)^2 \sum_{i=1}^{k_n} \int\!\!\int_{D^i_{l(n)}} Z(s_1)Z(s_2)\,g(s_1 - s_2)\,ds_1\,ds_2
+ \frac{\lambda_n (k_n - 1)}{k_n} \int_{D_n} [Z(s)]^2\,ds
- \frac{(\lambda_n)^2}{k_n^2} \sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \int\!\!\int_{D^i_{l(n)}} Z(s_1)Z(s_2)\,g(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2.
\]
Thus
\[
\frac{1}{|D_n|}\Bigl\{E\bigl[\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)\bigr] - \mathrm{var}(S_n)\Bigr\}
= -\frac{\lambda_n}{k_n |D_n|}\int_{D_n} [Z(s)]^2\,ds
- \frac{(\lambda_n)^2}{|D_n|}\mathop{\sum\sum}_{i \ne j} \int_{D^i_{l(n)}}\int_{D^j_{l(n)}} Z(s_1)Z(s_2)\,\varphi(s_1 - s_2)\,ds_1\,ds_2
- \frac{(\lambda_n)^2}{|D_n|}\,\frac{1}{k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n} \int\!\!\int_{D^i_{l(n)}} Z(s_1)Z(s_2)\,\varphi(s_1 - s_2 + c_{j_1} - c_{j_2})\,ds_1\,ds_2
= -T^1_n - T^2_n - T^3_n.
\]
Guan and Loh: Block Bootstrap Variance Estimation 1385
$T^1_n$ goes to 0 due to conditions (2) and (5). For $T^2_n$ and $T^3_n$, lengthy yet elementary derivations yield
\[
|T^2_n| \le \frac{C(\lambda_n)^2}{k_n}\sum_{i=1}^{k_n}\biggl[\frac{1}{|D_n|}\int_{D_n - D_n} |D_n \cap (D_n - s)|\,|\varphi(s)|\,ds
- \frac{1}{|D^i_{l(n)}|}\int_{D^i_{l(n)} - D^i_{l(n)}} \bigl|D^i_{l(n)} \cap (D^i_{l(n)} - s)\bigr|\,|\varphi(s)|\,ds\biggr]
\]
and
\[
T^3_n \le \frac{C(\lambda_n)^2}{k_n}\,\frac{1}{|D_n|}\int_{D_n}\int_{D_n} |\varphi(s_1 - s_2)|\,ds_1\,ds_2.
\]
Both terms converge to 0 due to conditions (2), (4), (5), and (6). Thus (B1) is proven. Furthermore, we can see that $T^1_n$ and $T^3_n$ are both of order $1/k_n$, whereas $T^2_n$ is of order $1/|D_{l(n)}|^{1/2}$ due to condition (12).

To prove (B2), we first note that $[\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)]^2$ is equal to the sum of the following three terms:
\[
\frac{1}{k_n^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}
\sum_{x_1 \in D^{j_1}_{l(n)}}\sum_{x_2 \in D^{j_1}_{l(n)}}\sum_{x_3 \in D^{j_2}_{l(n)}}\sum_{x_4 \in D^{j_2}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_1} + c_{i_1})\,Z(x_3 - c_{j_2} + c_{i_2})\,Z(x_4 - c_{j_2} + c_{i_2})\bigr],
\]
\[
-\frac{2}{k_n^3}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}
\sum_{x_1 \in D^{j_1}_{l(n)}}\sum_{x_2 \in D^{j_1}_{l(n)}}\sum_{x_3 \in D^{j_2}_{l(n)}}\sum_{x_4 \in D^{j_3}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_1} + c_{i_1})\,Z(x_3 - c_{j_2} + c_{i_2})\,Z(x_4 - c_{j_3} + c_{i_2})\bigr],
\]
and
\[
\frac{1}{k_n^4}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}
\sum_{x_1 \in D^{j_1}_{l(n)}}\sum_{x_2 \in D^{j_2}_{l(n)}}\sum_{x_3 \in D^{j_3}_{l(n)}}\sum_{x_4 \in D^{j_4}_{l(n)}}
\bigl[Z(x_1 - c_{j_1} + c_{i_1})\,Z(x_2 - c_{j_2} + c_{i_1})\,Z(x_3 - c_{j_3} + c_{i_2})\,Z(x_4 - c_{j_4} + c_{i_2})\bigr].
\]
Furthermore, the following relationships are possible for $x_1, \ldots, x_4$:

a. $x_1 = x_2 = x_3 = x_4$;
b. $x_1 = x_2$ and $x_3 = x_4$ but $x_1 \ne x_3$, or $x_1 = x_3$ and $x_2 = x_4$ but $x_1 \ne x_2$;
c. $x_1 = x_2$ but $x_2 \ne x_3 \ne x_4$, or $x_3 = x_4$ but $x_1 \ne x_2 \ne x_3$;
d. $x_1 = x_2 = x_3$ but $x_3 \ne x_4$, or $x_1 = x_2 = x_4$ but $x_1 \ne x_3$, or $x_2 = x_3 = x_4$ but $x_1 \ne x_2$;
e. $x_1 \ne x_2 \ne x_3 \ne x_4$.
When calculating the variance of $\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)/|D_n|$, the foregoing relationships, combined with the expression for the squared value of $E[\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)]/|D_n|$, in turn lead to integrals of differing complexity. To save space, we present only the integral from term e; the remaining integrals can be shown to have order no higher than $1/k_n$ in a similar manner.
Let $F_1(s_1,s_2,s_3,s_4) = g_4(s_1,s_2,s_3,s_4) - g(s_1 - s_2)\,g(s_3 - s_4)$. Term e leads to the following integral:
\[
\frac{(\lambda_n)^4}{k_n^2 |D_n|^2}\sum_{i_1=1}^{k_n}\sum_{i_2=1}^{k_n}\sum_{j_1=1}^{k_n}\int_{D^{j_1}_{l(n)}} Z(s_1 - c_{j_1} + c_{i_1})
\times\biggl[\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})\,Z(s_3 - c_{j_2} + c_{i_2})\,Z(s_4 - c_{j_2} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4
- \frac{2}{k_n}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}} Z(s_2 - c_{j_1} + c_{i_1})\,Z(s_3 - c_{j_2} + c_{i_2})\,Z(s_4 - c_{j_3} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4
+ \frac{1}{k_n^2}\sum_{j_2=1}^{k_n}\sum_{j_3=1}^{k_n}\sum_{j_4=1}^{k_n}\int_{D^{j_2}_{l(n)}}\int_{D^{j_3}_{l(n)}}\int_{D^{j_4}_{l(n)}} Z(s_2 - c_{j_2} + c_{i_1})\,Z(s_3 - c_{j_3} + c_{i_2})\,Z(s_4 - c_{j_4} + c_{i_2})\,F_1(s_1,s_2,s_3,s_4)\,ds_2\,ds_3\,ds_4\biggr]\,ds_1.
\]
Let $F_2(s_1,s_2,s_3,s_4) = \varphi_4(s_1,s_2,s_3,s_4) + 2\varphi_3(s_1,s_2,s_3) + 2\varphi_3(s_1,s_3,s_4) + 2\varphi(s_1 - s_3)\varphi(s_2 - s_4)$. By noting that $Z(\cdot)$ is bounded, we can derive from lengthy yet elementary algebra that the foregoing is bounded by
\[
\frac{2C(\lambda_n)^4}{k_n|D_n|^2}\sum_{j_1=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D_n}\int_{D_n} |F_2(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4
+ \frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |F_1(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4.
\]
The first integral in the foregoing is of order $1/k_n$ due to conditions (4), (5), and (6). The second integral is bounded by
\[
\frac{C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\int_{D^{j_1}_{l(n)}}\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi_4(s_1,s_2,s_3,s_4)|\,ds_1\,ds_2\,ds_3\,ds_4
+ \frac{4C(\lambda_n)^4}{k_n|D_n|}\sum_{j_2=1}^{k_n}\int_{D_n}\int_{D^{j_2}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi_3(s_1,s_3,s_4)|\,ds_1\,ds_3\,ds_4
+ \frac{2C(\lambda_n)^4}{|D_n|^2}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\biggl[\int_{D^{j_1}_{l(n)}}\int_{D^{j_2}_{l(n)}} |\varphi(s_1 - s_3)|\,ds_1\,ds_3\biggr]^2.
\]
The foregoing is of order $1/k_n + a_n$ due to conditions (4), (5), (6), and (12), where the order of $a_n$ is no higher than $1/|D_{l(n)}|$.
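For concreteness, the estimator $\widehat{\mathrm{var}}(S_n)$ analyzed in this appendix is simply the sample variance of the bootstrap replicates $S^b_n$. The following minimal Python sketch computes it; the resampling that produces the replicates (translating the point pattern on block $D^j_{l(n)}$ onto block $D^i_{l(n)}$) is not reproduced here, so `replicates` stands in for hypothetical bootstrap output.

```python
import numpy as np

def bootstrap_variance(replicates):
    """Sample variance sum_b (S^b_n - S_bar_n)^2 / (B - 1) over the
    B bootstrap replicates (scalar case, K = 1)."""
    replicates = np.asarray(replicates, dtype=float)
    s_bar = replicates.mean()                      # bootstrap mean S_bar_n
    return ((replicates - s_bar) ** 2).sum() / (len(replicates) - 1)

# hypothetical replicates for illustration only
rng = np.random.default_rng(0)
v = bootstrap_variance(rng.normal(size=500))
```

This agrees with `np.var(replicates, ddof=1)`; the appendix shows that, suitably normalized, it converges to $\mathrm{var}(S_n)/|D_n|$ as $B \to \infty$ and the domain grows.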
APPENDIX C: JUSTIFICATION FOR USING $\hat\beta_n$ IN THE THINNING STEP

Here we assume $|D_{l(n)}| = o(|D_n|^{1/2})$. Let $p_n(x,\theta) = \min_{s \in D_n} \lambda(s,\theta)/\lambda(x,\theta)$, and let $r(x)$ be a uniform random variable in $[0,1]$. If $r(x) \le p_n(x,\theta)$, where $x \in (N \cap D_n)$, then $x$ will be retained in the thinned process, say $\tilde{N}(\theta)$. Based on the same set $\{r(x) : x \in (N \cap D_n)\}$, we can determine $\tilde{N}(\theta_0)$ and $\tilde{N}(\hat\theta_n)$. Note that $\tilde{N}(\theta_0)$ is as defined in (7) using the true FOIF. Define $a = \{x \in \tilde{N}(\theta_0) : x \notin \tilde{N}(\hat\theta_n)\}$ and $b = \{x \notin \tilde{N}(\theta_0) : x \in \tilde{N}(\hat\theta_n)\}$. Note that $P(x \in a \cup b \mid x \in N) \le |p_n(x,\hat\theta_n) - p_n(x,\theta_0)|$. We need to show that
\[
\frac{1}{|D_n|}\Bigl\{\mathrm{var}\bigl[S^b_n \mid \tilde{N}(\theta_0) \cap D_n\bigr] - \mathrm{var}\bigl[S^b_n \mid \tilde{N}(\hat\theta_n) \cap D_n\bigr]\Bigr\} \xrightarrow{p} 0,
\]
where $\mathrm{var}[S^b_n \mid \tilde{N}(\theta_0) \cap D_n]$ is the $\mathrm{var}(S^b_n \mid \tilde{N} \cap D_n)$ defined in Appendix B; $\mathrm{var}[S^b_n \mid \tilde{N}(\hat\theta_n) \cap D_n]$ is defined analogously.

Let $\tilde\sum_D$ and $\sum\sum^{\ne}_D$ denote $\sum_{x \in D \cap (a \cup b)}$ and $\sum\sum_{x_1 \in D \cap N,\, x_2 \in D \cap (a \cup b),\, x_1 \ne x_2}$, respectively. Note that
\[
\frac{1}{|D_n|}\Bigl|\mathrm{var}\bigl[S^b_n \mid \tilde{N}(\theta_0) \cap D_n\bigr] - \mathrm{var}\bigl[S^b_n \mid \tilde{N}(\hat\theta_n) \cap D_n\bigr]\Bigr|
< \frac{C}{|D_n| k_n}\sum_{i=1}^{k_n}\sum_{j=1}^{k_n}\sum_{x_1 \in D^j_{l(n)} \cap N}\tilde\sum_{D^j_{l(n)}} 1
+ \frac{C}{|D_n| k_n^2}\sum_{i=1}^{k_n}\sum_{j_1=1}^{k_n}\sum_{j_2=1}^{k_n}\sum_{x_1 \in D^{j_1}_{l(n)} \cap N}\tilde\sum_{D^{j_2}_{l(n)}} 1
= \frac{C}{|D_n|}\sum_{j=1}^{k_n}\mathop{\sum\sum}^{\ne}_{D^j_{l(n)}} 1
+ \frac{C}{|D_n|}\tilde\sum_{D_n} 1
+ \frac{C}{|D_n| k_n}\biggl[\tilde\sum_{D_n} 1\biggr]\biggl[\sum_{x \in D_n \cap N} 1\biggr].
\]
Note that $P[x \in (a \cup b) \mid x \in N] \le C/|D_n|^{\alpha/2}$ with probability 1 for $0 < \alpha < 1$, due to Theorem 1. Thus all three terms in the foregoing converge to 0 in probability. This in turn leads to the consistency of the variance estimator.
[Received December 2006. Revised May 2007.]

REFERENCES

Baddeley, A. J., and Turner, R. (2005), "Spatstat: An R Package for Analyzing Spatial Point Patterns," Journal of Statistical Software, 12, 1–42.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), "Non- and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns," Statistica Neerlandica, 54, 329–350.
Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart & Winston.
Cressie, N. A. C. (1993), Statistics for Spatial Data, New York: Wiley.
Daley, D. J., and Vere-Jones, D. (1988), An Introduction to the Theory of Point Processes, New York: Springer-Verlag.
Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.
Doukhan, P. (1994), Mixing: Properties and Examples, New York: Springer-Verlag.
Guan, Y. (2007), "Variance Estimation for Statistics Computed From Inhomogeneous Spatial Point Processes," Journal of the Royal Statistical Society, Ser. B, to appear.
Guan, Y., Sherman, M., and Calvin, J. A. (2004), "A Nonparametric Test for Spatial Isotropy Using Subsampling," Journal of the American Statistical Association, 99, 810–821.
——— (2006), "Assessing Isotropy for Spatial Point Processes," Biometrics, 62, 119–125.
Hall, P., and Jing, B. (1996), "On Sample Reuse Methods for Dependent Data," Journal of the Royal Statistical Society, Ser. B, 58, 727–737.
Heinrich, L. (1985), "Normal Convergence of Multidimensional Shot Noise and Rates of This Convergence," Advances in Applied Probability, 17, 709–730.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, New York: Springer-Verlag.
McElroy, T., and Politis, D. N. (2007), "Stable Marked Point Processes," The Annals of Statistics, 35, 393–419.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), "Log Gaussian Cox Processes," Scandinavian Journal of Statistics, 25, 451–482.
Politis, D. N. (1999), Subsampling, New York: Springer-Verlag.
Politis, D. N., and Sherman, M. (2001), "Moment Estimation for Statistics From Marked Point Processes," Journal of the Royal Statistical Society, Ser. B, 63, 261–275.
Rathbun, S. L., and Cressie, N. (1994), "Asymptotic Properties of Estimators for the Parameters of Spatial Inhomogeneous Poisson Point Processes," Advances in Applied Probability, 26, 122–154.
Rosenblatt, M. (1956), "A Central Limit Theorem and a Strong Mixing Condition," Proceedings of the National Academy of Sciences, 42, 43–47.
Schoenberg, F. P. (2004), "Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process," Journal of Statistical Planning and Inference, 128, 79–93.
Sherman, M. (1996), "Variance Estimation for Statistics Computed From Spatial Lattice Data," Journal of the Royal Statistical Society, Ser. B, 58, 509–523.
——— (1997), "Subseries Methods in Regression," Journal of the American Statistical Association, 92, 1041–1048.
Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.
Waagepetersen, R. P. (2007), "An Estimating Function Approach to Inference for Inhomogeneous Neyman–Scott Processes," Biometrics, 62, 252–258.