Maximum Likelihood Estimation for α-Stable AR and Allpass ...rdavis/lectures/Bern_07.pdf · EVA...
EVA 2007
Maximum Likelihood Estimation for α-Stable AR and Allpass Processes
Beth Andrews, Northwestern University
Matt Calder, PHZ Capital Partners
Richard A. Davis, Columbia University
(http://www.stat.columbia.edu/~rdavis)
Blood!
1. Motivating Example
[Figure: Log(volume) of Walmart stock, 12/1/03-12/31/04; x-axis: day (0 to 250), y-axis: log-volume (15.0 to 17.0)]
1. Motivating Example (cont)
[Figure: sample ACF and PACF of the series; lag (h) from 0 to 40]
Analysis suggests that {Xt} follows an AR(1) or AR(2) model.
A causal AR(2) fit is
Xt = .4455 Xt-1 + .1025 Xt-2 + Zt
Are the estimated residuals iid?
Analysis of residuals from causal AR(2) fit
[Figure: four panels for the residuals, lag (h) from 0 to 40: ACF of squares, ACF of absolute values, ACF, PACF]
ACF + PACF imply data uncorrelated
ACFs of abs values and squares imply data dependent
Example of data from an allpass model
Game Plan
1. Motivating example: Wal-Mart volume
2. Setup
AR models
Allpass models
Stable noise
3. Maximum likelihood estimation
4. Simulation results
5. Wal-Mart revisited
2. Setup: AR models
Assume $\{X_t\}$ follows an autoregressive model or an allpass model.
AR(p) model:
$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t$, where $\{Z_t\} \sim$ IID
AR polynomial: $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$, $\phi(z) \neq 0$ for $|z| = 1$.
• causal if $\phi(z) \neq 0$ for $|z| \le 1$, i.e., fcn of past values of $Z_t$:
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$$
• purely noncausal if $\phi(z) \neq 0$ for $|z| > 1$, i.e., fcn of future values of $Z_t$:
$$X_t = \sum_{j=p+1}^{\infty} \psi_j Z_{t+j}$$
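The causal / purely noncausal distinction above depends only on where the zeros of the AR polynomial fall relative to the unit circle. The following is a minimal sketch of that check, not part of the original slides: the helper names `ar_zeros` and `classify_ar` and the label "mixed" (some zeros inside, some outside) are assumptions made for illustration.

```python
import numpy as np

def ar_zeros(phi):
    """Zeros of phi(z) = 1 - phi_1 z - ... - phi_p z^p."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant.
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    return np.roots(coeffs)

def classify_ar(phi, tol=1e-8):
    """Classify an AR polynomial by the moduli of its zeros (hypothetical helper)."""
    moduli = np.abs(ar_zeros(phi))
    if np.any(np.abs(moduli - 1.0) < tol):
        return "unit root"            # excluded by the condition phi(z) != 0 on |z| = 1
    if np.all(moduli > 1.0):
        return "causal"               # all zeros outside the unit circle
    if np.all(moduli < 1.0):
        return "purely noncausal"     # all zeros inside the unit circle
    return "mixed"                    # zeros on both sides of the unit circle

# Causal AR(2) fit from the motivating example, and an illustrative mixed case.
print(classify_ar([0.4455, 0.1025]))
print(classify_ar([-1.2, 1.6]))
```

Working with zeros rather than coefficients keeps the check valid for any order p.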
2. Setup: AR Models
In each case below the process has the two-sided representation
$$X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$$
[Figure: $Z_t$ and $X_t$; impulse response: causal & low frequency]
[Figure: $Z_t$ and $X_t$; impulse response: noncausal & high frequency]
[Figure: $Z_t$ and $X_t$; impulse response: mixed causal (low frequency) & noncausal (high frequency)]
2. Setup: Allpass models
Causal AR polynomial: $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$, $\phi(z) \neq 0$ for $|z| \le 1$.
Define MA polynomial:
$\theta(z) = -z^p \phi(z^{-1})/\phi_p = -(z^p - \phi_1 z^{p-1} - \cdots - \phi_p)/\phi_p$
$\neq 0$ for $|z| \ge 1$ (MA polynomial is non-invertible).
Model for data $\{X_t\}$: $\phi(B) X_t = \theta(B) Z_t$, $\{Z_t\} \sim$ IID (non-Gaussian)
$B^k X_t = X_{t-k}$
Examples:
All-pass(1): $X_t - \phi X_{t-1} = Z_t - \phi^{-1} Z_{t-1}$, $|\phi| < 1$.
All-pass(2): $X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = Z_t + (\phi_1/\phi_2) Z_{t-1} - (1/\phi_2) Z_{t-2}$
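The name "allpass" reflects that the transfer function θ(z)/φ(z) has constant modulus on the unit circle. A small numerical sketch of this, under the definition θ(z) = −z^p φ(z⁻¹)/φ_p given above, is shown here; the function names and the coefficients (0.3, 0.4) are illustrative assumptions, not values from the talk.

```python
import numpy as np

def ar_poly(phi, z):
    """phi(z) = 1 - phi_1 z - ... - phi_p z^p at an array of complex points z."""
    phi = np.asarray(phi, dtype=float)
    z = np.asarray(z, dtype=complex)
    powers = np.arange(1, len(phi) + 1)
    return 1.0 - np.sum(phi[:, None] * z[None, :] ** powers[:, None], axis=0)

def allpass_gain(phi, omega):
    """|theta(z)|^2 / |phi(z)|^2 at z = exp(-i*omega), theta(z) = -z^p phi(1/z) / phi_p."""
    z = np.exp(-1j * np.asarray(omega, dtype=float))
    p = len(phi)
    theta = -(z ** p) * ar_poly(phi, 1.0 / z) / phi[-1]
    return np.abs(theta / ar_poly(phi, z)) ** 2

omega = np.linspace(0.0, np.pi, 50)
gain = allpass_gain([0.3, 0.4], omega)   # illustrative causal AR(2) coefficients
print(gain.min(), gain.max())            # both should equal 1 / phi_p**2
```

The gain is flat because, for real coefficients, φ(1/z) is the complex conjugate of φ(z) when |z| = 1.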
2. Setup: Allpass models
Properties: linear process with nonlinear behavior
• causal, non-invertible ARMA with MA representation
$$X_t = \frac{\phi(B^{-1}) B^p}{-\phi_p\,\phi(B)} Z_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$$
• uncorrelated (flat spectrum)
$$f_X(\omega) = \frac{|e^{-ip\omega}|^2\,|\phi(e^{i\omega})|^2}{\phi_p^2\,|\phi(e^{-i\omega})|^2}\,\frac{\sigma^2}{2\pi} = \frac{\sigma^2}{\phi_p^2\, 2\pi}$$
• zero mean
• data are dependent if noise is non-Gaussian (e.g. Breidt & Davis 1991).
• squares and absolute values are correlated.
• Xt is heavy-tailed if noise is heavy-tailed.
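The "uncorrelated but dependent" behavior can be seen in a quick simulation. This is a rough sketch only: it uses an all-pass(1) model with φ = 0.5 and Laplace noise (non-Gaussian with finite variance, chosen here as an assumption so that sample autocorrelations are well behaved), not the stable or t3 noise used in the talk.

```python
import numpy as np

def simulate_allpass1(phi, noise):
    """X_t - phi X_{t-1} = Z_t - Z_{t-1} / phi, started at X_0 = Z_0."""
    x = np.empty_like(noise)
    x[0] = noise[0]
    for t in range(1, len(noise)):
        x[t] = phi * x[t - 1] + noise[t] - noise[t - 1] / phi
    return x

def sample_acf(y, lag):
    """Sample autocorrelation of y at a single positive lag."""
    y = y - y.mean()
    return float(np.dot(y[:-lag], y[lag:]) / np.dot(y, y))

rng = np.random.default_rng(0)
z = rng.laplace(size=50_000)                 # assumed noise law for this sketch
x = simulate_allpass1(0.5, z)[1_000:]        # drop a burn-in segment

acf_x = sample_acf(x, 1)                     # expected to be near zero
acf_x2 = sample_acf(x ** 2, 1)               # expected to be clearly positive
print(acf_x, acf_x2)
```

With Gaussian noise the same series would be iid, so the lag-1 autocorrelation of the squares would also be near zero.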
2. Setup: Allpass models
Properties cont:
• Suppose the time series $X_t$ follows a noncausal AR and/or a non-invertible MA process. For definiteness, suppose $\{X_t\}$ follows the noninvertible MA model
$X_t = \theta_i(B)\,\theta_{ni}(B) Z_t$, $\{Z_t\} \sim$ IID.
Step 1: Let $\{U_t\}$ be the residuals obtained by fitting a purely invertible MA model, i.e.,
$$X_t = \hat\theta(B) U_t \approx \theta_i(B)\,\tilde\theta_{ni}(B) U_t, \quad (\tilde\theta_{ni} \text{ is the invertible version of } \theta_{ni}).$$
So
$$U_t \approx \frac{\theta_{ni}(B)}{\tilde\theta_{ni}(B)} Z_t$$
Step 2: Fit a purely causal AP model to $\{U_t\}$
2. Setup: stable noise
The noise is assumed to have a non-Gaussian stable distribution.
Noise distribution: $\{Z_t\} \sim$ IID with a stable distribution
That is, $Z_t$ has characteristic function given by
$$\varphi_0(s) = E\{\exp(isZ_t)\} = \exp\{-\sigma^{\alpha}|s|^{\alpha}[1 + i\beta\,\mathrm{sgn}(s)\tan(\pi\alpha/2)(|\sigma s|^{1-\alpha} - 1)] + i\mu s\} \quad (\alpha \neq 1)$$
where
+ exponent α ∈ (0,2)
+ symmetry parameter β ∈ (-1,1)
+ scale parameter σ > 0
+ location parameter μ.
Density function: inverse Fourier transform
$$f(z; (\alpha,\beta,\sigma,\mu)) = (2\pi)^{-1} \int_{-\infty}^{\infty} e^{-isz}\,\varphi_0(s)\,ds$$
can be computed numerically reasonably quickly (Nolan, `97).
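To make the inverse Fourier transform concrete, here is a minimal sketch that evaluates the density by a plain trapezoid rule on a truncated grid. This is an illustration under stated assumptions, not Nolan's algorithm: the function names, the truncation rule, and the grid size are hypothetical choices, and the skewed characteristic function is used only for α ≠ 1 (β = 0 is handled for any α).

```python
import numpy as np

def stable_cf(s, alpha, beta, sigma=1.0, mu=0.0):
    """Characteristic function in the parameterization used on this slide."""
    s = np.asarray(s, dtype=float)
    a = np.abs(sigma * s)
    # sigma^alpha |s|^alpha (|sigma s|^(1-alpha) - 1) = a - a^alpha, finite at s = 0.
    skew = 0.0 if beta == 0 else beta * np.sign(s) * np.tan(np.pi * alpha / 2) * (a - a ** alpha)
    return np.exp(-a ** alpha - 1j * skew + 1j * mu * s)

def stable_pdf(z, alpha, beta, sigma=1.0, mu=0.0, n_grid=200_001):
    """f(z) = (1/pi) * Re of the integral over s >= 0 of exp(-i s z) cf(s)."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    s_max = 40.0 ** (1.0 / alpha) / sigma          # |cf| is about exp(-40) here
    s = np.linspace(0.0, s_max, n_grid)
    ds = s[1] - s[0]
    integrand = np.real(np.exp(-1j * np.outer(z, s)) * stable_cf(s, alpha, beta, sigma, mu))
    total = integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1])
    return total * ds / np.pi

# Sanity check on a case with a closed form: alpha = 1, beta = 0 is the Cauchy density.
print(stable_pdf([0.0, 1.0], alpha=1.0, beta=0.0), 1 / np.pi, 1 / (2 * np.pi))
```

The half-line form is valid because the characteristic function at −s is the complex conjugate of its value at s.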
2. Setup: stable noise (cont)
Remarks:
1. There are a number of parameterizations of a stable distribution (see Zolotarev `86), but this is the one advocated by Nolan `01. The density is
• differentiable wrt z
• differentiable wrt the parameter vector (α,β,σ,μ)'
2. Stable distributions have Pareto-like heavy tails:
$$x^{\alpha} P(|Z_1| > x) \to c(\alpha)\,\sigma^{\alpha} \quad \text{and} \quad c(\alpha) = \left(\int_0^{\infty} t^{-\alpha} \sin(t)\,dt\right)^{-1}.$$
3. Location/scale family:
$$f(z; (\alpha,\beta,\sigma,\mu)) = \sigma^{-1} f(\sigma^{-1}(z - \mu); (\alpha,\beta,1,0))$$
4. Unimodal: $f(\,\cdot\,; (\alpha,\beta,1,0))$ is unimodal.
3. Maximum Likelihood Estimation
Estimation for noncausal and/or all-pass models
Second-order moment techniques do not work (they work for causal AR!)
• least squares
• Gaussian likelihood
Higher-order cumulant methods
• Giannakis and Swami (1990)
• Chi and Kung (1995)
Non-Gaussian likelihood methods
• likelihood approximation assuming known density
• quasi-likelihood
Other
• LAD: least absolute deviation
• Rank-based estimation (minimum dispersion)
Realizations of invertible and noninvertible MA(2) processes
Model: Xt = θ∗(B) Zt, {Zt} ~ IID(α = 1), where
θi(B) = (1 + 1/2 B)(1 + 1/3 B) and θni(B) = (1 + 2B)(1 + 3B)
[Figure: one realization from each model (t = 0 to 40), each with its sample ACF (lags 0 to 10)]
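One way to see why second-order methods cannot separate these two models is that both MA polynomials give exactly the same coefficient-based autocorrelations ρ(h) = Σ θ_j θ_{j+h} / Σ θ_j². The sketch below checks this in exact arithmetic. Treating this ratio as the relevant "ACF" for infinite-variance noise is an assumption made here for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (constant first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def ma_acf(theta, max_lag):
    """rho(h) = sum_j theta_j theta_{j+h} / sum_j theta_j^2 for X_t = theta(B) Z_t."""
    gamma0 = sum(c * c for c in theta)
    return [sum(a * b for a, b in zip(theta, theta[h:])) / gamma0
            for h in range(max_lag + 1)]

theta_i = poly_mul([1, Fraction(1, 2)], [1, Fraction(1, 3)])   # invertible version
theta_ni = poly_mul([1, 2], [1, 3])                            # noninvertible version

print(ma_acf(theta_i, 2))
print(ma_acf(theta_ni, 2))
```

Both lists agree term by term, so any estimator that looks only at autocorrelations sees the two models as identical.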
3. Estimation for AR Models
Estimation strategies for causal AR models:
• LS estimation (Davis and Resnick `86)
• Maximum Gaussian likelihood (Mikosch, Gadrich, Kluppelberg, Adler `95)
• LAD: least absolute deviation (special case of M-estimation) (Davis, Knight, Liu `92; Davis `96)
4. Allpass models
Realization from an all-pass model of order 2 (t3 noise)
[Figure: realization X(t) for t = 0 to 1000; ACF (allpass) and ACF (allpass)^2 for lags 0 to 40, model and sample]
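3. MLE: allpass models (approximating the likelihood)
Data: $(X_1, \ldots, X_n)$
Model:
$$X_t = \phi_{01} X_{t-1} + \cdots + \phi_{0p} X_{t-p} - (Z_{t-p} - \phi_{01} Z_{t-p+1} - \cdots - \phi_{0p} Z_t)/\phi_{0r},$$
where $\phi_{0r}$ is the last non-zero coefficient among the $\phi_{0j}$'s.
Noise:
$$z_{t-p} = \phi_{01} z_{t-p+1} + \cdots + \phi_{0p} z_t - (X_t - \phi_{01} X_{t-1} - \cdots - \phi_{0p} X_{t-p}),$$
where $z_t = Z_t/\phi_{0r}$.
More generally define,
$$z_{t-p}(\boldsymbol\phi) = \begin{cases} 0, & \text{if } t = n+p, \ldots, n+1, \\ \phi_1 z_{t-p+1}(\boldsymbol\phi) + \cdots + \phi_p z_t(\boldsymbol\phi) - \phi(B) X_t, & \text{if } t = n, \ldots, p+1. \end{cases}$$
Note: $z_t(\boldsymbol\phi_0)$ is a close approximation to $z_t$ (initialization error)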
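The backward recursion for z_t(φ) can be sketched in a few lines. This is an illustrative implementation of the definition above, with the hypothetical name `allpass_residuals`; it stores z_j at array index j so the indices match the slide.

```python
import numpy as np

def allpass_residuals(x, phi):
    """Return (z_1(phi), ..., z_{n-p}(phi)) from the backward recursion.

    z_{t-p} = phi_1 z_{t-p+1} + ... + phi_p z_t - phi(B) X_t for t = n, ..., p+1,
    with z_{n-p+1} = ... = z_n = 0 as initial values.
    """
    x = np.asarray(x, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n, p = len(x), len(phi)
    z = np.zeros(n + 1)                      # z[j] holds z_j; z[0] is unused
    for t in range(n, p, -1):                # t = n, n-1, ..., p+1
        # phi(B) X_t = X_t - phi_1 X_{t-1} - ... - phi_p X_{t-p}  (x[t-1] is X_t)
        ar_part = x[t - 1] - phi @ x[t - p - 1:t - 1][::-1]
        z[t - p] = phi @ z[t - p + 1:t + 1] - ar_part
    return z[1:n - p + 1]

# Tiny hand-checkable case: p = 1, phi = 0.5, data (1, 2, 3).
print(allpass_residuals([1.0, 2.0, 3.0], [0.5]))   # z_2 = -2.0, then z_1 = -2.5
```

Because the recursion runs backward from zero initial values, the first few computed residuals at the end of the sample carry the initialization error mentioned in the note.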
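Assume that $Z_t$ has stable pdf $f_\tau$ and consider the vector
$$\mathbf{z} = (X_{1-p}, \ldots, X_0, z_{1-p}(\boldsymbol\phi), \ldots, z_0(\boldsymbol\phi),\; z_1(\boldsymbol\phi), \ldots, z_{n-p}(\boldsymbol\phi),\; z_{n-p+1}(\boldsymbol\phi), \ldots, z_n(\boldsymbol\phi))'$$
(the blocks are independent pieces).
Joint density of $\mathbf{z}$:
$$h(\mathbf{z}) = h_1(X_{1-p}, \ldots, X_0, z_{1-p}(\boldsymbol\phi), \ldots, z_0(\boldsymbol\phi)) \left(\prod_{t=1}^{n-p} f_\tau(\phi_q z_t(\boldsymbol\phi))\,|\phi_q|\right) h_2(z_{n-p+1}(\boldsymbol\phi), \ldots, z_n(\boldsymbol\phi)),$$
and hence the joint density of the data can be approximated by
$$h(\mathbf{x}) = \prod_{t=1}^{n-p} f_\tau(\phi_q z_t(\boldsymbol\phi))\,|\phi_q|,$$
where $q = \max\{0 \le j \le p : \phi_j \neq 0\}$.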
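Log-likelihood: Let $\tau = (\alpha, \mathrm{sgn}(\phi_q)\beta, \sigma/|\phi_q|, \mu/\phi_q)'$. Then
$$\begin{aligned}
L(\boldsymbol\phi, \tau) &= \sum_{t=1}^{n-p} \left[\ln f(\phi_q z_t(\boldsymbol\phi); (\alpha,\beta,\sigma,\mu)) + \ln|\phi_q|\right] \\
&= \sum_{t=1}^{n-p} \ln\left[\frac{|\phi_q|}{\sigma}\, f\!\left(\frac{\phi_q z_t(\boldsymbol\phi) - \mu}{\sigma}; (\alpha,\beta,1,0)\right)\right] \\
&= \sum_{t=1}^{n-p} \ln f(z_t(\boldsymbol\phi); (\alpha, \mathrm{sgn}(\phi_q)\beta, \sigma/|\phi_q|, \mu/\phi_q)) \\
&= \sum_{t=1}^{n-p} \ln f(z_t(\boldsymbol\phi); \tau)
\end{aligned}$$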
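The reparameterization τ rests on the scaling rule for the stable family: if φ_q z has parameters (α, β, σ, μ), then z has parameters (α, sgn(φ_q)β, σ/|φ_q|, μ/φ_q). A numerical sketch of this rule, checked through the characteristic function from the stable-noise slide, is given below; the helper names and the parameter values are illustrative assumptions.

```python
import numpy as np

def stable_cf(s, alpha, beta, sigma, mu):
    """Characteristic function in the slide's parameterization (alpha != 1)."""
    s = np.asarray(s, dtype=float)
    a = np.abs(sigma * s)
    skew = beta * np.sign(s) * np.tan(np.pi * alpha / 2) * (a - a ** alpha)
    return np.exp(-a ** alpha - 1j * skew + 1j * mu * s)

def tau_for_z(phi_q, alpha, beta, sigma, mu):
    """Parameters of Z / phi_q when Z has parameters (alpha, beta, sigma, mu)."""
    return alpha, np.sign(phi_q) * beta, sigma / abs(phi_q), mu / phi_q

s = np.linspace(-5.0, 5.0, 201)
alpha, beta, sigma, mu, phi_q = 1.5, 0.5, 1.0, 0.3, -0.4   # illustrative values

lhs = stable_cf(s / phi_q, alpha, beta, sigma, mu)          # cf of Z / phi_q at s
rhs = stable_cf(s, *tau_for_z(phi_q, alpha, beta, sigma, mu))
print(np.max(np.abs(lhs - rhs)))                            # should be at rounding level
```

A negative φ_q flips the sign of the skewness parameter, which is why sgn(φ_q) appears in τ.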
3. MLE: noncausal AR models
One can go through a similar calculation for noncausal AR models.
Model:
$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t$
$Z_t = (1 - \phi_1 B - \cdots - \phi_p B^p) X_t$, B = backward shift operator
$Z_t = (1 - \theta_1 B - \cdots - \theta_r B^r)(1 - \theta_{r+1} B - \cdots - \theta_{r+s} B^s) X_t$, with $p = r + s$
$Z_t = \theta^+(B)\,\theta^*(B) X_t$
where $\theta^+(z)$ is the good (causal) AR polynomial and $\theta^*(z)$ is the bad (purely noncausal) AR polynomial.
Likelihood:
$$L(\theta, \tau) = \sum_{t=p+1}^{n} \left[\ln f(Z_t(\theta); \tau) + \ln|\theta_p|\, I(s > 0)\right]$$
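The likelihood above can be sketched as a short function. Two assumptions are made here and should be read as such: θ_p is taken to be θ_{r+s}, the last coefficient of the noncausal factor, and the noise log-density is passed in as a callable (a Laplace log-density is used in the example purely as a stand-in for the stable log-density). The function name is hypothetical.

```python
import numpy as np

def noncausal_ar_loglik(x, theta_causal, theta_noncausal, logpdf):
    """Sum over t = p+1..n of [ln f(Z_t(theta)) + ln|theta_p| I(s > 0)]."""
    x = np.asarray(x, dtype=float)
    plus = np.concatenate(([1.0], -np.asarray(theta_causal, dtype=float)))
    star = np.concatenate(([1.0], -np.asarray(theta_noncausal, dtype=float)))
    phi_poly = np.convolve(plus, star)            # coefficients of theta+(B) theta*(B)
    # Z_t(theta) = theta+(B) theta*(B) X_t for t = p+1, ..., n.
    z = np.convolve(x, phi_poly, mode="valid")
    s = len(theta_noncausal)
    jac = np.log(abs(theta_noncausal[-1])) if s > 0 else 0.0
    return float(np.sum(logpdf(z) + jac))

def laplace_logpdf(z):
    """Stand-in noise log-density (assumption): ln of exp(-|z|) / 2."""
    return -np.abs(z) - np.log(2.0)

# (1 - 0.5B)(1 - 2B) X_t = Z_t on a tiny data set: Z_3 = -1, Z_4 = -1.5.
print(noncausal_ar_loglik([1.0, 2.0, 3.0, 4.0], [0.5], [2.0], laplace_logpdf))
```

In this toy case the Jacobian term ln 2 cancels the normalizing constant of the stand-in density, so the value is −(1 + 1.5) = −2.5.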
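Reparameterize: Denote the true parameters by $\boldsymbol\phi_0$ and $\tau_0$ and set
$$u = n^{1/\alpha_0}(\boldsymbol\phi - \boldsymbol\phi_0) \quad \text{and} \quad v = n^{1/2}(\tau - \tau_0),$$
which are elements of $R^p$ and $R^4$, respectively. Now the log-likelihood can be re-expressed as a continuous function on $R^p \times R^4$ given by
$$\begin{aligned}
W_n(u,v) &= L(\boldsymbol\phi_0 + n^{-1/\alpha_0}u,\; \tau_0 + n^{-1/2}v) - L(\boldsymbol\phi_0, \tau_0) \\
&= \sum_{t=1}^{n-p} \left[\ln f(z_t(\boldsymbol\phi_0 + n^{-1/\alpha_0}u);\; \tau_0 + n^{-1/2}v)\right] - \sum_{t=1}^{n-p} \left[\ln f(z_t(\boldsymbol\phi_0); \tau_0)\right]
\end{aligned}$$
Note:
$$(\hat u_n, \hat v_n) = \arg\max_{u,v} W_n(u,v) = \left(n^{1/\alpha_0}(\hat{\boldsymbol\phi} - \boldsymbol\phi_0),\; n^{1/2}(\hat\tau - \tau_0)\right)$$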
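Result: $W_n(u,v) \to_d W(u) + v'N - 2^{-1} v' I(\tau_0) v$ on $C(R^p \times R^4)$, where
• $I(\tau_0) := -\left[E\{\partial^2 \ln f(z_1; \tau_0)/(\partial\tau_i\,\partial\tau_j)\}\right]_{i,j=1}^{4}$ is the Fisher information for a stable density.
• N ~ N(0, I(τ0)) is a normal random vector independent of W.
• W is the process
$$W(u) = \sum_{k=1}^{\infty} \sum_{j \neq 0} \left\{ \ln f\!\left(z_{k,j} + [\tilde c(\alpha_0)]^{1/\alpha_0}\, \sigma_0\, |\phi_{0r}|^{-1}\, c_j(u)\, \delta_k\, \Gamma_k^{-1/\alpha_0};\; \tau_0\right) - \ln f(z_{k,j}; \tau_0) \right\}$$
$c_j(u)$ are found through the Laurent series identity
$$\sum_{j \neq 0} c_j(u)\, z^j = u' \left[ -\frac{z^k}{\phi_0(z)} + \frac{z^{-k}}{\phi_0(z^{-1})} \right]_{k=1}^{p}$$
$\{z_{k,j}\}$ is iid, $z_{1,1} =_d Z_1/\phi_{0r}$
$\{\delta_k\}$ is iid, $P(\delta_k = 1) = p = 1 - P(\delta_k = -1)$.
$\Gamma_k = E_1 + \cdots + E_k$, where $\{E_k\}$ is iid unit exponentials
$\{z_{k,j}\}$, $\{\delta_k\}$, and $\{E_k\}$ are mutually independent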
Deconstructing the limit result and limit process:
The limit process in the following
$$W_n(u,v) \to_d W(u) + v'N - 2^{-1} v' I(\tau_0) v$$
consists of two independent pieces:
1. $W(u)$, which governs the limit behavior of the AR or AP model parameters $\phi_1, \ldots, \phi_p$. Set
$$\xi = \arg\max_u W(u)$$
2. $v'N - 2^{-1} v' I(\tau_0) v$, which governs the limit behavior of the stable parameters (α,β,σ,μ). Set
$$\eta = \arg\max_v \{v'N - 2^{-1} v' I(\tau_0) v\},$$
which is equal to $\eta = I^{-1}(\tau_0) N \sim N(0, I^{-1}(\tau_0))$.
Theorem: There exists a sequence of maximizers of the likelihood function such that
$$n^{1/\alpha_0}(\hat{\boldsymbol\phi}_{ML} - \boldsymbol\phi_0) \to_d \xi \quad \text{and} \quad n^{1/2}(\hat\tau_{ML} - \tau_0) \to_d \eta.$$
3. MLE: allpass models
Remarks:
$$n^{1/\alpha_0}(\hat{\boldsymbol\phi}_{ML} - \boldsymbol\phi_0) \to_d \xi, \qquad n^{1/2}(\hat\tau_{ML} - \tau_0) \to_d \eta \sim N(0, I^{-1}(\tau_0))$$
The distribution of ξ is generally intractable. However, one can use bootstrapping techniques (see later example and Davis and Wu `97).
The scaling for the AR parameters is $n^{1/\alpha}$, which is much faster than the standard $n^{1/2}$ rate.
The limit behavior of the estimates of the stable parameters is the same as for an iid sequence (see DuMouchel `73).
4. Simulation Results
Simulation setup:
• 300 replicates of a non-causal AR(2) model
Xt = −1.2 Xt-1 + 1.6 Xt-2 + Zt (zeros of AR polyn 1.25, -.5)
• noise distribution is stable with two sets of parameter values:
α = .8 β = .5 σ = 1.0 μ = 0.0 (really heavy!)
α = 1.5 β = .5 σ = 1.0 μ = 0.0
• sample size n = 500
• estimation is maximum likelihood
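One way to generate a replicate of this model is to factor the AR polynomial as 1 + 1.2z − 1.6z² = (1 − 0.8z)(1 + 2z), run the noncausal factor backward in time and the causal factor forward. The sketch below does this under loudly stated assumptions: it uses symmetric noise (β = 0) drawn with the Chambers-Mallows-Stuck formula for simplicity, and the burn-in length of 200 at each end is an arbitrary choice. It is not the authors' simulation code.

```python
import numpy as np

def symmetric_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck draw for beta = 0, sigma = 1, mu = 0 (assumed noise)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / e) ** ((1 - alpha) / alpha))

def simulate_mixed_ar2(z, burn):
    """X_t = -1.2 X_{t-1} + 1.6 X_{t-2} + Z_t via (1 - 0.8B)(1 + 2B) X_t = Z_t."""
    n = len(z)
    y = np.zeros(n)
    for t in range(n - 1, 0, -1):        # noncausal factor: Y_{t-1} = (Z_t - Y_t) / 2
        y[t - 1] = (z[t] - y[t]) / 2.0
    x = np.zeros(n)
    x[0] = y[0]
    for t in range(1, n):                # causal factor: X_t = 0.8 X_{t-1} + Y_t
        x[t] = 0.8 * x[t - 1] + y[t]
    return x[burn:n - burn], z[burn:n - burn]

rng = np.random.default_rng(1)
z_all = symmetric_stable(1.5, 500 + 2 * 200, rng)
x, z = simulate_mixed_ar2(z_all, burn=200)

# The simulated series should reproduce the noise through the AR(2) equation.
resid = x[2:] + 1.2 * x[1:-1] - 1.6 * x[:-2]
print(len(x), np.max(np.abs(resid - z[2:])))
```

Running the noncausal factor backward is what keeps the recursion stable, since its forward-in-time version would explode.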
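4. Simulation Results

Noise with α = 0.8, β = 0.0
Parameter    Asymp SD   Empirical Mean   Empirical Std Dev
φ1 = −1.2               -1.200           0.004
φ2 = 1.6                 1.600           0.004
α = 0.8      0.051       0.798           0.041
β = 0.0      0.067      -0.001           0.068
σ = 1.0      0.077       0.997           0.073
μ = 0.0      0.054      -0.002           0.057

Noise with α = 0.8, β = 0.5
Parameter    Asymp SD   Empirical Mean   Empirical Std Dev
φ1 = −1.2               -1.200           0.004
φ2 = 1.6                 1.600           0.004
α = 0.8      0.049       0.800           0.039
β = 0.5      0.058       0.502           0.056
σ = 1.0      0.074       0.997           0.071
μ = 0.0      0.062      -0.004           0.064

Noise with α = 1.5, β = 0.0
Parameter    Asymp SD   Empirical Mean   Empirical Std Dev
φ1 = −1.2               -1.212           0.083
φ2 = 1.6                 1.605           0.065
α = 1.5      0.071       1.502           0.069
β = 0.0      0.137       0.010           0.128
σ = 1.0      0.048       0.999           0.066
μ = 0.0      0.078      -0.006           0.078

Noise with α = 1.5, β = 0.5
Parameter    Asymp SD   Empirical Mean   Empirical Std Dev
φ1 = −1.2               -1.204           0.078
φ2 = 1.6                 1.598           0.062
α = 1.5      0.070       1.499           0.071
β = 0.5      0.121       0.509           0.128
σ = 1.0      0.047       0.997           0.056
μ = 0.0      0.078       0.000           0.083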
5. Walmart revisited: residuals from noncausal model
Analysis of the ACF and PACF of the time series (n=274) suggests that {Xt} follows an AR(1) or AR(2).
A causal AR(2) fit (using Gaussian MLE) is
Xt = .4455 Xt-1 + .1025 Xt-2 + Zt
The estimated residuals were uncorrelated but dependent.
Bootstrap confidence intervals for φ1 and φ2 : (based on Davis and Wu `97)
(-2.5719,-1.4212) (1.4521,2.5743)
Resample size is m=135.
Maximum-likelihood model:
Xt = -2.0766 Xt-1 + 2.0772 Xt-2 + Zt (noncausal; zeros of AR polyn approx. 1.355, -.355)
{Zt} ~ IID Stable
α = 1.8335, β = .5650, σ = .4559, μ = 16.0030
[Figure: ACF of absolute values and ACF of squares of the residuals; lag (h) from 0 to 40]
5. Walmart: analysis of residuals from noncausal model
[Figure: four panels: QQ plot of empirical vs theoretical quantiles; residuals by day (0 to 250); ACF of absolute values and ACF of squares for lags 0 to 40, with simulated confidence bands (CB's simulated)]
Graybill VIII Conference in 2009
6th Conference on Extremes
Fort Collins, Colorado
June 22-26, 2009