Bootstrap Methods for Time Series: A Selective Overview...
Transcript of Bootstrap Methods for Time Series: A Selective Overview...
Bootstrap Methods for Time Series:A Selective Overview
Dimitris N. PolitisUniversity of California, San Diego
2
DATA: X1, . . . , Xn from time series {Xt, t ∈ Z}
Different resampling set-ups:
1. Parametric (?) E.g. {Xt} is a Gaussian time serieswith mean µ = EXt and stationary autocovarianceγ(k) = Cov(Xt,Xt+k); cf. Ramos (1988).
2. Semi-parametric.E.g. {Xt} satisfies the ARMA(p, q) equation:
Xt−φ1Xt−1−· · ·−φpXt−p = Zt+θ1Zt−1+· · ·+θqZt−q
or the AR(∞) linear time series model:
Xt =
∞∑k=1
φkXt−k + Zt
where Zt ∼ i.i.d. F (unknown) with EZt = 0.Freedman, Efron/Tibshirani, Kreiss, Paparoditis,Swanepoel/VanWyk, Buhlmann
3
3. Non-parametric.
• Model-based. E.g. nonparametric AR(1)Xt = g(Xt−1) + Zt, where Zt ∼ i.i.d. (0, σ2).
• Model-free. Blocking methods, Markovmethods, Frequency-Domain, etc.
A different classification:
• Frequency-Domain Bootstrap.Hurvich/Zeger (1987), Franke/Hardle (1992),Theiler et al. (1994), Dahlhaus/Janas (1996),Braun/Kulperger (1997), Paparoditis/Politis (1999),Kreiss/Paparoditis (2003), Paparoditis (2003)
• Time-Domain Bootstrap.
– Markov: Rajarshi (1990), Horowitz (2002).
– Local bootstrap: Paparoditis/Politis (2000-2002)
– AR(∞) ‘sieve’ bootstrap: Kreiss (1988, 1992),Paparoditis (1992), Buhlmann (1997, 2002),Buhlmann/Wyner (1999)
– Blocking methods ���
4
The sample mean
Data: X1, . . . , Xn from stationary series {Xt, t ∈ Z}with unknown mean µ = EXt and (equally unknown)autocovariance γ(k) = Cov(Xt,Xt+k).
Xn = 1n
∑ni=1 Xi : consistent & asymptotically efficient
σ2n = Var (
√nXn) =
∑ns=−n(1 − |s|
n)γ(s).
• Under regularity:
σ2∞ := lim
n→∞σ2
n =∞∑
s=−∞γ(s) = 2πf(0)
where f(w) = (2π)−1∑∞
s=−∞ eiwsγ(s) forw ∈ [−π, π], is the spectral density function.
• Standard error estimation is nontrivial underdependence.
5
Standard error estimation
Var (Xn) � σ2∞n
where σ2∞ =
∑∞s=−∞ γ(s)
• γ(s) = n−1∑n−|s|
t=1 (Xt − Xn)(Xt+|s| − Xn)
• Naive plug-in estimator σ2∞,naive =
∑|s|<n γ(s)
• But σ2∞,naive = 2πT (0), where
T (w) =1
2πn|
n∑s=1
eiws(Xs − Xn)|2
• The periodogram T (w) is inconsistent for f(w).
� ET (w) = f(w) + O(1/n) for w �= 0.
� Var T (w) � f 2(w)(1 + 1{w/π∈Z}) �→ 0.
• Furthermore, T (0) ≡ 0!
6
Blocking schemes
Basic assumption: b → ∞ but b/n → 0 as n → ∞.
• Fully overlapping—number of blocks q = n − b + 1
B3︷ ︸︸ ︷B1︷ ︸︸ ︷
X1, X2, X3, · · · , Xb, Xb+1, Xb+2, · · · ,
Bq︷ ︸︸ ︷Xn−b+1, · · · , Xn︸ ︷︷ ︸
B2
• Non-overlapping—number of blocks Q = [n/b]
B1︷ ︸︸ ︷X1, · · · , Xb,
B2︷ ︸︸ ︷Xb+1, · · · , X2b,
B3︷ ︸︸ ︷X2b+1, · · · , X3b, · · · · · ·Xn
• Non-overlapping with ‘buffer’—number ofblocks [n/(2b)]
B1︷ ︸︸ ︷X1, · · · , Xb,
buffer︷ ︸︸ ︷Xb+1, · · · , X2b,
B2︷ ︸︸ ︷X2b+1, · · · , X3b, · · · · · ·Xn
7
Bartlett’s spectral estimation scheme
• Consider one of the blocking schemes—for simplicity:non-overlapping
• Let Ti(w) be the periodogram calculated from Bi
– ETi(w) = f(w) + O(1/b)
– Var Ti(w) � cw
where cw = f 2(w)(1 + 1{w/π∈Z}).
• Define T (w) = Q−1∑Q
i=1 Ti(w).
– ET (w) = f(w) + O(1/b)
– Var T (w) � cw/Q = cw[b/n]
• If b → ∞ but b/n → 0, then T (w)P−→ f(w).
◦ Same argument for overlapping scheme—just cw isdifferent: 33% smaller.
8
Welch’s tapered periodograms
• Lag-window interpretation:
T (w) ≈ fB(w) =1
2π
b∑s=−b
λB(s/b)eiwsγ(s)
where λB(x) = 1 − |x| is Bartlett’s kernel.
◦ Let Tνi (w) be the periodogram calculated from block
Bi tapered, i.e., multiplied, by taper ν : [0, 1] → R+
◦ Define T ν(w) = Q−1∑Q
i=1 Tνi (w). Then:
T ν(w) ≈ 1
2π
b∑s=−b
ν2(s/b)eiwsγ(s)
where ν2 = ν ∗ ν is self-convolution of ν.
• If ν(0) = 0, ν is continuous, increasing on [0, 1/2]and symmetric about 1/2, then ν2 is twicecontinuously differentiable at the origin, and:
– ET ν(w) = f(w) + O(1/b2)
– Var T ν(w) � cνw[b/n]
9
t
w(t)
0.0 0.2 0.4 0.6 0.8 1.0
0.00.2
0.40.6
0.81.0
w(t) TRAP ; c=0.43
t
w*w(
t)
-1.0 -0.5 0.0 0.5 1.0
0.00.2
0.40.6
0.81.0
Figure 1: (a) Graph of window wTRAP0.43 ; (b) graph of the self-convolution wTRAP
0.43 ∗ wTRAP0.43 .
t
w(t)
0.0 0.2 0.4 0.6 0.8 1.0
0.00.2
0.40.6
0.81.0
w(t) SMOOTH ; a=1.3
t
w*w(
t)
-1.0 -0.5 0.0 0.5 1.0
0.00.2
0.40.6
0.81.0
Figure 2: (a) Graph of window wSMOOTH1.3 ; (b) graph of the self-convolution wSMOOTH
1.3 ∗wSMOOTH
1.3 (this is like 1 − |x|a).
10
Subsampling standard errors—Carlstein
• Consider all available blocks B1, B2, . . . from one ofthe blocking schemes, e.g., the full-overlap scheme.1
• General statistic of interest θn = θn(X1, . . . , Xn)that is
√n–consistent for some parameter θ.
• Typically,√
n(θn − θ)L⇒ N(0, σ2
θ) as n → ∞.
• Re-compute the statistic θb over the blocksB1, . . . Bq, i.e., let θb,i = θb(Bi).
◦ Let Vb(θn) = q−1∑q
i=1(θb,i − θn)2 denote the sample
variance of the subsample statistics θb,1, . . . , θb,q.
◦ Under some moment, mixing and uniform integrability
conditions, bVb(θn)P−→ σ2
θ as b−1 + b/n → 0.
1Carlstein used non-overlapping blocking scheme for convenience.
11
Subsampling distributions—Politis/Romano
• Assume τn(θn − θ)L⇒ some law J as n → ∞.
• As before, re-compute the statistic θb over the blocksB1, . . . Bq, i.e., let θb,i = θb(Bi).
• Define the subsampling distribution Jb,n as theempirical distribution of the (centered and scaled)subsample statistics θb,1, . . . , θb,q, i.e., let
Jb,n(x) =1
q
q∑i=1
1{τb(θb,i − θn) ≤ x}
� If {Xt} is strong mixing, and b−1 + b/n → 0, then
Jb,n(x)P−→ J(x) for all points of continuity of J .
◦ Confidence intervals and tests under minimalassumptions— not set-up specific!
12
Block Bootstrap: Hall, Kunsch, Liu/Singh
• Consider all available blocks B1, B2, . . . from one ofthe blocking schemes.
• Draw Q = [n/b] blocks B∗1 , . . . , B∗
Q randomly (withreplacement) from the given set of blocks B1, B2, . . .
• Concatenate the elements of the Q bootstrap blocksB∗
1 , . . . , B∗Q to create pseudo-realization X∗
1 , · · · , X∗n
where n = bQ = b[n/b] � n.
B∗1︷ ︸︸ ︷
X∗1 , · · · , X∗
b ,
B∗2︷ ︸︸ ︷
X∗b+1, · · · , X∗
2b, X∗2b+1, · · ·
B∗Q︷ ︸︸ ︷
· · · , X∗n−1, X
∗n
13
Approximately linear statistics
◦ Let θ be a parameter associated with L(X1), andθn = θn(X1, . . . , Xn) an (approximately) linear
statistic satisfying:√
n(θn − θ)L⇒ N(0, σ2
θ).
◦ For example, θn = 1n
∑nt=1 g(Xt) + oP ( 1√
n).
◦ Re-compute θn over the pseudo-realizationX∗
1 , · · · , X∗n, i.e., let θ∗n = θn(X
∗1 , . . . , X∗
n).
� Then, under some moment, mixing and regularityconditions, as b−1 + b/n → 0, BB ‘works’, i.e.
supx |P ∗(√
n(θ∗n− θn) ≤ x)−P (√
n(θn−θ) ≤ x)| P−→ 0
• Can equally handle continuous functions ofapproximately linear statistics.
◦ Bootstrap distribution may require explicit centeringas E∗θ∗n �= θn in general due to edge effects—use acircular scheme.
14
Parameters of joint distributions
◦ Let θ be a parameter associated with the joint lawL(X1, X2, . . . , Xp)
◦ Let g : Rp → R
d and θn = θn(X1, . . . , Xn) be anapproximately linear statistic of the type:
θn =1
n − p + 1
n−p+1∑t=1
g(Xt, Xt+1, . . . , Xt+p−1)+oP (1√n
)
• For example, assume µ = 0 and let p = 2, d = 2 andg(x1, x2) = (x2
1, x1x2)′. Then, θn = (γ(0), γ(1))′.
◦ How to bootstrap θn?
◦ How to bootstrap the sample autocorrelationρ(1) = γ(1)/γ(0); it is a smooth function of θn.
15
Blocks-of-blocks bootstrap
◦ Naive BB scheme:
– BB on X1, . . . , Xn yields X∗1 , . . . , X∗
n
– Re-compute θn over X∗1 , . . . , X∗
n to get θ∗n.
◦ Blocks-of-blocks bootstrap:
– Define Yt = (Xt, Xt+1, . . . , Xt+p−1) fort = 1, . . . , N where N = n − p + 1
– Then, θn = N−1∑N
t=1 g(Yt) + oP ( 1√n)
– Perform BB on Y1, . . . , YN to get Y �1 , . . . , Y �
N
– Re-compute θn over Y �1 , . . . , Y �
N to get θ�n.
• Both schemes work asymptotically when p is finitebut naive scheme has bias due to edge effects.
• Naive scheme fails if p = ∞, e.g., θ =∑∞
k=−∞ γ(k)but blocks-of-blocks scheme still works—Politis, 1990.
16
Circular blocking schemes
◦ Data periodically extended ‘modulo’ n
X1, X2, X3, · · · , Xb,Xb+1, · · · , Xn, X1, X2, X3, · · ·
◦ No edge effects! Bootstrap distribution isautomatically centered correctly.
• Circular Block Bootstrap: Fixed block size b andnumber of blocks = n
B1︷ ︸︸ ︷X1, X2, X3, · · · , Xb, Xb+1, · · · ,
Bn︷ ︸︸ ︷Xn, X1, X2, X3, · · · , Xb−1︸ ︷︷ ︸
B2
• Stationary Bootstrap: Random block size withexpected value b; blocks of all sizes are available.
◦ If the block sizes are drawn from a geometricdistribution, then SB sample paths are stationary.
17
The sample mean revisited
• Subsampling for the sample mean works in greatgenerality; all that is required is strong mixing and
τn(θn − θ)L⇒ some law J as n → ∞.
• J can be heavy-tailed α-stable with τn = n1−1/αL(n).
� BB/CB/SB bootstrap work only when a CLT holdsfor the sample mean, i.e. J is normal and τn = n1/2.
• Why? Let θ∗n denote the BB sample mean.
◦ θ∗n= n−1∑Q
i=1 X∗i = (bQ)−1
∑Qk=1
∑bj=1 X∗
(k−1)b+j
= Q−1∑Q
k=1 θb,i where n = bQ = b[n/b] � n.
◦ θ∗n is the average of Q subsample sample means θb,i.
� The BB distribution of θ∗n is a Q–fold convolution ofthe subsampling distribution Jb,n with itself.
� Q = [n/b] → ∞; hence the BB distribution tends tonormal regardless of the shape of Jb,n (and of J).
18
Higher-order accuracy
� Under some strong moment and mixing conditions,the studentized BB/CB/SB distributions all havehigher-order accuracy—cf. Lahiri, Gotze/Kunsch; i.e.,for some δ > 1/2 (but unfortunately < 1):
supx |P ∗(√
n(θ∗n−θn)σ∗n
≤ x) − P (√
n(θn−θ)σn
≤ x)| = OP (n−δ)
whereas supx |Φ(x) − P (√
n(θn−θ)σn
≤ x)| = OP (n−1/2)
� Proper studentization for BB/CB/SB is cumbersome.
◦ Extrapolation techniques can make subsamplinghigher-order accurate as well; cf. Bertail/Politis.
◦ Extrapolated/interpolated subsampling rate is slightlyslower than BB/CB/SB in the sample mean case.
◦ But for the sample median and other quantilestatistics, subsampling has faster rate than bootstrapin the i.i.d. seting—Arcones, Bickel/Sakov.
19
Standard error estimation
• Let σ2b,BB, σ2
b,CB ,σ2b,SB and σ2
b,SUB be the
BB/CB/SB and subsampling2 estimators ofσ2∞ = limn Var (
√nX).
• Then, σ2b,BB ≈ σ2
b,CB ≈ σ2b,SUB ≈ 2πfB(0), the
Bartlett estimator—they all have bias O(1/b) and
variance approximately 4σ4∞3
bn.
• σ2b,SB is approximately a linear combination of σ2
bi,CB
with bi close to b; still Bias(σ2b,BB) = O(1/b) but has
variance ∼ cSB(b/n) for cSB > (4/3)σ4∞.
• SB is less sensitive to block size mispecification.
2All using full-overlap scheme
20
Block size considerations
◦ Can choose b to minimize the MSE of σ2b :
bias/variance trade-off yields bopt ∼ cn1/3; but theproportionality constant c involves f(0) and f ′(0).
◦ Pretending f(0) and f ′(0) are known, we cancompute bopt for all methods, MSEs and AREs.
MSEopt ≈ C∗ n−2/3 for BB/CB/SUB
MSEopt,SB ≈ CSB
n−2/3 for SB
◦ Can show: 1/3 < ARE(SB/BB) < 1/2.
• But f(0) and f ′(0) are unknown.
• Let bopt be optimal block size estimator based onestimated f(0) and f ′(0).3
• Define Finite-sample Attainable Relative Efficiency(FARE) as a ratio of MSEs based on estimated bopt.
• FARE (SB/BB) close to one for small samples;4
3More on this later...4SB is less sensitive to block size.
21
MSE rates
◦ The rate MSEopt = O(n−2/3) for BB/CB/SB/SUB
and the Bartlett 2πfB(0) is suboptimal.
◦ Can achieve O(n−4/5) with nonnegative/quadraticspectral estimators, e.g. Parzen window, Daniel,etc.—but also with Welch’s scheme.
◦ Welch’s tapering idea is applicable to BB.
• Can actually further reduce the MSE to close toO(1/n) by use of higher-order kernels—but notnecessarily nonnegative estimation.
22
Tapered block bootstrap—Paparoditis/Politis
• Assume Xt is centered5 and consider all availableblocks B1, B2, . . . from one of the blocking schemes.
• Draw Q = [n/b] blocks B∗1 , . . . , B∗
Q randomly (withreplacement) from the given set of blocks B1, B2, . . .
• Taper the data from each block using the taper ν.Let B�
1, . . . , B�Q denote the tapered blocks.
• Concatenate the elements of the Q tapered blocksB�
1, . . . , B�Q to create pseudo-realization X�
1 , · · · , X�n
B�1︷ ︸︸ ︷
X�1 , · · · , X�
b ,
B�2︷ ︸︸ ︷
X�b+1, · · · , X�
2b, X�2b+1, · · ·
B�Q︷ ︸︸ ︷
· · · , X�n−1, X
�n
5centered at the sample mean will do.
23
TBB/BB comparisons
BB/CB or SUB with fully overlapping blocks:
• Bias(σ2b,BB) ≈ −2π“f ′”(0)/b
• Var (σ2b,BB) ≈ 8π2f 2(0)||λB||2 · (b/n)
• bopt,BB ∼ cBB
n1/3 and MSEopt,BB = O(n−2/3)
where “f ′”(w) = (2π)−1∑∞
k=−∞ |k|γ(k)eiwk.
For TBB:
◦ Bias(σ2b,TBB) ≈ −πf ′′(0)/b2
◦ Var (σ2b,TBB) ≈ 8π2f 2(0)||ν ∗ ν||2 · (b/n)
◦ bopt,TBB ∼ cTBB
n1/5 and MSEopt,TBB = O(n−4/5)
—Optimal block sizes depend on f and its derivatives.
—Need accurate estimation of f (and its derivatives).
24
Higher-order kernels in spectral estimation
◦ General lag-window spectral density estimator:
f (w) =1
2π
b∑s=−b
λ(s/b)eiwsγ(s)
◦ Note: f (w) can be equivalently defined as akernel-smoothed periodogram with kernel
Λ(w) =1
2π
∞∑s=−∞
λ(s)eiws
◦ Λ is said to be of order q if∫
wkΛ(w) = 0 fork = 1, . . . , q − 1, and
∫wqΛ(w) �= 0.6
◦ If f has r continuous derivatives and ζ = min(r, q):
– Bias(f(w)) ≈ −(ζ !)−1f (ζ)(w)/bζ
– Var (f(w)) ≈ f 2(w)||λ||2 · (1 + 1{w/π∈Z})(b/n)
◦ bopt,f ∼ cλ,w
n1/(2ζ+1) and MSEopt,f = O(n−2/(2ζ+1))
6Bartlett kernel is ‘almost’ order one:∫ m
−m wΛ(w) → 0 as m → ∞ (Cauchy principal value integral).
25
Flat-top lag-windows in spectral estimation
◦ To get Bias(f(w)) = O(1/br) we need q ≥ r, i.e.,order of kernel ≥ number of continuous derivatives.
◦ But r is unknown; so use a kernel with q = ∞.
◦ Simplest infinite-order kernel: flat-top lag-window(Politis/Romano) λT (x) = min(1, 2(1 − |x|)+).
� Flat-top lag-window spectral density estimator:
fT (w) =
[1
2π
b∑s=−b
λT (s/b)eiwsγ(s)
]+
� Not only MSEopt,ft= O(n−2/(2r+1))—best possible
but choosing the bandwidth b for ft is very intuitivebased on a correlogram inspection.
• Empirical Rule: if γ(s) � 0 for all s ≥ s0, let b = 2s0.
26
lag window
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1(a)
Fejer kernel
0 5 10 15 20
0.0
0.0
20
.06
0.1
0
1(b)
lag window
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1(c)
Dirichlet kernel
0 5 10 15 20
0.0
0.1
0.2
0.3
1(d)
lag window
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1(e)
flat-top kernel, c=0.5
0 5 10 15 20
0.0
0.0
50
.10
0.1
5
1(f)
1(g) 1(h)
27
MSE—optimal standard error estimation
� Can get flat-top by linear combination of two Bartlettwindows: λT (x) = 2λB(x) − λB(2x).
◦ Recall: σ2b,BB ≈ σ2
b,CB ≈ σ2b,SUB ≈ 2πfB(0).
� Let σ2b = 2σ2
b,BB − σ2b/2,BB.
� Then MSEopt,σ2b
= O(n−2/(2r+1))—best possible.
� But σ2b is not necessarily nonnegative.
– Define σ2b,+ = max(ε, σ2
b ) for small ε ≥ 0.
– There is no bootstrap scheme P� such thatVar �(
√nX�) = σ2
b or σ2b,+
28
Bandwidth/block choice for fT and σ2b
◦ Empirical Rule: if γ(s) � 0 for all s ≥ s0, let b = 2s0.
◦ γ(s) � 0 for s ≥ s0 is an implied test of significance.
◦ Focus on sample autocorrelation ρ(k) = γ(k)/γ(0).
� Formal Rule: Let b = 2s0 where s0 be the smallestpositive integer such that
|ρ(s0 + k)| < c
√log
10n
n for all k = 1, . . . ,Kn.
• Practical choice: c = 2 with Kn = max(5,√
log10 n).
◦ If 100< n <1000, then 1.41 <√
log10
n < 1.73.
� Practical Rule: If ρ(k) is in the band ±3/√
n for fiveconsecutive points k-points, then ρ(k) � 0.
� Interpretation: If Kn ≈ 5, the above bandscorrespond to 95% simultaneous intervals forρ(s0 + 1), ρ(s0 + 2), . . . , ρ(s0 + 5) by Bonferroni.
29
Lag
AC
F
0 5 10 15 20
-0.2
0.2
0.6
1.0
Series : AR1 (a)
w
f
0.0 0.5 1.0 1.5 2.0 2.5 3.0
0.0
0.2
0.4
0.6
hat m =1 (b)
w
f
0.0 0.5 1.0 1.5 2.0 2.5 3.0
0.0
0.2
0.4
0.6
hat m =2 (c)
w
f
0.0 0.5 1.0 1.5 2.0 2.5 3.0
0.0
0.2
0.4
0.6
hat m =3 (d)
Figure 3: Gaussian AR(1) acf and different bandwidth choices for flat-top lag-window spec-tral density estimation; n = 200. Superimposed are the naive ±2/
√n bands.
30
lag
ACF
0 5 10 15 20
0.00.2
0.40.6
0.81.0
Figure 4: A ‘problematic’ correlogram from an AR(1) model with ρ = 0.3 and n = 500.Superimposed are the correct bands ±c
√log10 n/n with c = 2 recommended in connection
with Kn = max(5,√
log10 n).
A problematic correlogram
◦ The values c = 2 and Kn = max(5,√
log10 n) arerecommendations—not absolute requirements.
◦ Faced with a problematic correlogram, thepractitioner must make an informed decision.
◦ When in doubt, choose the smallest b.
– Flat-tops perform best with small bs.
– “Okham’s razor” favors the simplest model.
31
Block size choice for BB/CB/SUB and TBB
BB/CB or SUB with fully overlapping blocks:
• Bias(σ2b,BB) ≈ −2π“f ′”(0)/b
• Var (σ2b,BB) ≈ 8π2f 2(0)||λB||2 · (b/n)
• bopt,BB ∼ cBB
n1/3 and MSEopt,BB = O(n−2/3)
where “f ′”(w) = (2π)−1∑∞
k=−∞ |k|γ(k)eiwk.
For TBB:
◦ Bias(σ2b,TBB) ≈ −πf ′′(0)/b2
◦ Var (σ2b,TBB) ≈ 8π2f 2(0)||ν ∗ ν||2 · (b/n)
◦ bopt,TBB ∼ cTBB
n1/5 and MSEopt,TBB = O(n−4/5)
—Optimal block sizes depend on f and its derivatives.
—Estimate f via flat-top kernel and plug-in!
32
Locally Stationary Series–Dahlhaus
◦ Stationarity assumption is often unrealistic for verylong time series.
◦ More realistic model: assume a slowly-changingstochastic structure, i.e. (for fixed k) the jointprobability law of (Xt+1, . . . , Xt+k) changessmoothly (and slowly) with the time index t.
• Local Block Bootstrap—Paparoditis/Politis (2002).
• LBB resamples blocks that are close to each other,i.e., a block that starts at time t, can only be replacedwith a block whose starting point is close to t.
– An LBB bootstrap pseudo-series is constructed bya concatenation of Q blocks of size b.
– The jth block of the resampled series is chosenrandomly from a distribution (say, uniform) on allthe size-b blocks whose time indices are ‘close’ tothose in the original jth block.
33
year
1880 1920 1960
12
34
5Figure 3a: annual S&P 500 data
year
1880 1920 1960
12
34
5
Figure 3b: BB realization
year
1880 1920 1960
23
4
Figure 3c: CBB realization
Integrated Series, Unit Roots and Random Walks
e.g.: S&P 500 (figure), stock prices, foreign exchange...
DEFINITION: {Xt} is I(1), i.e., integrated of order one,if {Xt} is not stationary but its first difference series{Yt} is stationary, where Yt = Xt − Xt−1.
• I(1) sample-paths are “continuous”.
• BB destroys the sample-path continuity; see figure.
• Continuous-Path BB—Paparoditis/Politis (2001)
• CBB IDEA: Adjust (shift) the BB blocks to ensurecontinuity of the bootstrap sample paths.
◦ Can employ CBB to test for unit root or cointegration.
34