Maximum Likelihood Estimation for α-Stable AR and Allpass ...rdavis/lectures/Bern_07.pdf · EVA...

Post on 24-Jun-2020

6 views 0 download

Transcript of Maximum Likelihood Estimation for α-Stable AR and Allpass ...rdavis/lectures/Bern_07.pdf · EVA...

1EVA 2007

Maximum Likelihood Estimation for Maximum Likelihood Estimation for αα--Stable Stable AR and AR and AllpassAllpass Processes Processes

Beth AndrewsNorthwestern University

Matt CalderPHZ Capital Partners

Richard A. DavisColumbia University

(http://www.stat.columbia.edu/~rdavis)

2EVA 2007

Blood!Blood!

3EVA 2007

1. Motivating Example

day

log-

volu

me

0 50 100 150 200 250

15.0

15.5

16.0

16.5

17.0

Log(volume) of Walmart stock 12/1/03-12/31/04

4EVA 2007

1. Motivating Example (cont)

lag (h)

acf

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

lag (h)

pacf

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

Analysis suggests that {Xt} follows an AR (1) or AR(2).

A causal AR(2) fit is

Xt = .4455 Xt-1 + .1025 Xt-2 + Zt

Are the estimated residuals iid?

5EVA 2007

Analysis of residuals from causal AR(2) fit

lag (h)

acf o

f squ

ares

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

lag (h)

acf o

f abs

val

ues

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

lag (h)

acf

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

lag (h)

pacf

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0ACF + PACF imply data uncorrelated

ACFs of abs values and squares imply data dependentExample of data from an allpass model

6EVA 2007

Game Plan

1. Motivating example: Wal-Mart volume

2. Setup

AR models

Allpass models

Stable noise

3. Maximum likelihood estimation

4. Simulation results

5. Wal-Mart revisited

7EVA 2007

Assume {Xt} follows an autoregressive model or an allpass model.

AR(p) model:

Xt = φ1 Xt-1 + … + φp Xt-p + Zt, where {Zt} ~ IID

AR polynomial: φ(z)=1−φ1z − … − φpzp , φ(z) ≠ 0 for |z| = 1.

• causal if φ(z) ≠ 0 for |z| ≤ 1, i.e., fcn of past values of Zt

• purely noncausal if φ(z) ≠ 0 for |z| > 1, i.e., fcn of future values of Zt

2. Setup—AR models

jt0j

jt −

=∑= ZX ψ

jt1pj

jt +

+=∑= ZX ψ

8EVA 2007

2. Setup—AR Models

Zt

Xt

Impulse response: causal & low frequency

jt-j

jt −

∞=∑= ZX ψ

9EVA 2007

2. Setup—AR Models

Impulse response: noncausal & high frequency

Zt

Xt

jt-j

jt −

∞=∑= ZX ψ

10EVA 2007

2. Setup—AR Models

Impulse response: mixed causal (low frequency) & noncausal (high frequency)

Zt

Xt

jt-j

jt −

∞=∑= ZX ψ

11EVA 2007

Causal AR polynomial: φ(z)=1−φ1z − … − φpzp , φ(z) ≠ 0 for |z|≤1.

Define MA polynomial:

θ(z) = −zp φ(z−1)/φp = −(zp −φ1zp-1 − … − φp)/ φp

≠ 0 for |z|≥1 (MA polynomial is non-invertible).

Model for data {Xt} : φ(B)Xt = θ(B) Zt , {Zt} ~ IID (non-Gaussian)

BkXt = Xt-k

Examples:

All-pass(1): Xt − φ Xt-1 = Zt − φ−1 Zt-1 , | φ | < 1.

All-pass(2): Xt − φ1 Xt-1 − φ2 Xt-2 = Zt + φ1/ φ2 Zt-1 + 1/ φ2 Zt-2

2. Setup—Allpass models

12EVA 2007

Properties: linear process with nonlinear behavior

• causal, non-invertible ARMA with MA representation

• uncorrelated (flat spectrum)

• zero mean

• data are dependent if noise is non-Gaussian

(e.g. Breidt & Davis 1991).

• squares and absolute values are correlated.

• Xt is heavy-tailed if noise is heavy-tailed.

πφσ

πσ

φφ

φω

ω

ωω

22)(

)()( 2

22

22

22

pi

p

iip

Xe

eef ==

jt0j

jtp

1

t )()(

=

∑ψ=φφ−

φ= ZZ

BBBX

p

2. Setup—Allpass models

13EVA 2007

Properties cont:

• Suppose the time series Xt follows a noncausal AR and/or a non-invertible MA process. For definiteness, suppose {Xt} follows the noninvertible MA model

Xt= θi(B) θni(B) Zt , {Zt} ~ IID.

Step 1: Let {Ut} be the residuals obtained by fitting a purely invertible MA model, i.e.,

So

Step 2: Fit a purely causal AP model to {Ut}

2. Setup—Allpass models

Z(B)~(B)U

). of version invertible theis ~( ,(B)U~(B)

(B)UˆX

tni

nit

ninitnii

tt

θθ

θθθθ≈

θ=

14EVA 2007

The noise is assumed to have a nonGaussian stable distribution

Noise distribution: {Zt} ~ IID with a stable distribution

That is, Zt has characteristic function given by

where+ exponent α ∈(0,2)

+ symmetry parameter β ∈(-1,1)

+ scale parameter σ > 0 ∈(0,2)

+ location parameter μ.

Density function: inverse Fourier transform

can be computed numerically reasonably quickly (Nolan, `97).

2. Setup—stable noise

})]1|)(|2/tan()sgn(1[||σexp{)}{exp()( 1ααt sissisisZEs μσπαβϕ α +−+−== −

0

dssezf isz∫∞

∞−0

−−= )()2()),,,(;( 1 ϕπμσβα

15EVA 2007

Remarks:

1. There are a number of parameterizations of a stable distribution (see Zolotarev `86), but this is the one advocated by Nolan `01.

• differentiable wrt to z

• differentiable wrt to the parameter vector (α,β,σ,μ)’

2. Stable distributions have pareto-like heavy tails

3. Location/scale family

4. Unimodal. is unimodal

2. Setup—stable noise (cont)

.)sin()( and )()|(|1

01

−∞−

⎟⎟⎠

⎞⎜⎜⎝

⎛=→> ∫ dtttccxZPx ααα αασ

( ) ( ))0,1,,();(),,,(; 11 βαμσσμσβα −= −− zfzf

( ))0,1,,(; βα•f

16EVA 2007

Second-order moment techniques do not work (work for causal AR!)

• least squares

• Gaussian likelihood

Higher-order cumulant methods

• Giannakis and Swami (1990)

• Chi and Kung (1995)

Non-Gaussian likelihood methods

• likelihood approximation assuming known density

• quasi-likelihood

Other

• LAD- least absolute deviation

• Rank-based estimation (minimum dispersion)

Estimation for nonCausal and/or All-Pass Models

3. Maximum Likelihood Estimation

17EVA 2007

Realizations of an invertible and noninvertible MA(2) processesModel: Xt = θ∗(B) Zt , {Zt} ~ IID(α = 1), whereθi(B) = (1 +1/2B)(1 + 1/3B) and θni(B) = (1 + 2B)(1 + 3B)

0 10 20 30 40

-40

-20

020

0 10 20 30 40

-300

-100

010

0

Lag

0 2 4 6 8 10

-0.2

0.2

0.6

1.0

ACF

Lag

0 2 4 6 8 10

-0.2

0.2

0.6

1.0

ACF

18EVA 2007

Estimation strategies for causal AR models:

• LS estimation (Davis and Resnick `86)

• Maximum Gaussian likelihood (Mikosch, Gadrich, Kluppelberg,

Adler `95)

• LAD- least absolute deviation (special case of M-Estimation)

(Davis, Knight, Liu `92; Davis `96)

3. Estimation for AR Models

19EVA 2007

Realization from an all-pass model of order 2

(t3 noise )

0 1 0 2 0 3 0 4 0

0.0

0.2

0.4

0.6

0.8

1.0

L a g

AC

F

A C F : (a llp a s s )2

0 1 0 2 0 3 0 4 0

0.0

0.2

0.4

0.6

0.8

1.0

L a g

AC

F

A C F : (a llp a s s )

modelsample

t

X(t)

0 200 400 600 800 1000

-30

-20

-10

010

20

4. Allpass models

20EVA 2007

Data: (X1, . . ., Xn)

Model:

where φ0r is the last non-zero coefficient among the φ0j’s.

Noise:

where zt =Zt / φ0r.

More generally define,

Note: zt(φ0) is a close approximation to zt (initialization error)

rtpptpt

ptptt

ZZZ

XXX

00101

0101

/)( φφ−−φ−−

φ++φ=

+−−

−−

L

L

),( 01010101 ptptttpptpt XXXzzz −−+−− φ−−φ−−φ++φ= LL

⎩⎨⎧

+=φ−φ++φ++=

=+−

− .1,..., if ,)()()(,1,..., if ,0

)(11 pntXBzz

npntz

ttpptpt φφ

φL

3. MLE—allpass models: approximating the likelihood

21EVA 2007

Assume that Zt has stable pdf fτ and consider the vector

Joint density of z:

and hence the joint density of the data can be approximated by

where q=max{0 ≤ j ≤ p: φj ≠ 0}.

)')(),...,(),...,(,)(),...,(,,...,( 110101 4444 34444 2144444 344444 21φφφφφ npnpp zzzzzXX +−−−=z

independent pieces

)),(),...,(( ||))((

))(),...,(,,...,()(

121

01011

φφφ

φφ

npn

pn

tqtq

pp

zzhzf

zzXXhh

+−

=

−−

⎟⎟⎠

⎞⎜⎜⎝

⎛•

=

∏ φφτ

z

⎟⎟⎠

⎞⎜⎜⎝

⎛= ∏

=

pn

tqtq zfh

1

||))(()( φφτ φx

22EVA 2007

Log-likelihood: Let

( )[ ]

( )[ ]∑

=

=

=

=

=

=

⎥⎥⎦

⎢⎢⎣

⎡⎟⎟⎠

⎞⎜⎜⎝

⎛ −=

+=

pn

tt

pn

tqqqt

pn

t q

qtq

q

pn

ttq

zf

zf

zf

zfL

1

1

1

1

);(ln

)/|,|/,)sgn(,();(ln

)0,1,,(;/

/)(||ln

|)|ln)),,,();(((ln),(

τ

φμφσβφα

βαφσ

φμσφ

φμσβαφτ

φ

φ

φ

φφ

)'/|,|/,)sgn(,( qqq φμφσβφατ =

23EVA 2007

One can go through a similar calculation for noncausal AR models.

Model:

Xt = φ1 Xt-1 + … + φp Xt-p + Zt

Zt = (1−φ1 B + … + φp Bp) Xt B=backward shift operator

Zt = (1−θ1 B + … + θr Br) (1−θr+1 B + … + θr+s Bs) Xt

Zt = θ+(B)θ*(B) Xt

where θ+(z) is the good (causal) AR polynomial and θ*(B) is the bad (purely noncausal) AR polynomial.

Likelihood:

3. MLE—noncausal AR models

( )[ ]( )∑+=

>+=n

ptpt sIZfL

1)0(||ln);(ln),( θττ θθ

)'/|,|/,)sgn(,( qqq φμφσβφατ =

24EVA 2007

Reparameterize: Denote the true parameters by φ0 and τ0 and set

and

which are elements of Rp and R4, respectively. Now the log-likelihood

can be re-expressed as a continuous function on Rp×R4 given by

Note:

( )[ ] ( )[ ]∑∑−

=

=

−−

−−

−++=

−++=pn

tt

pn

tt

n

zfvnunzf

LvnunLvuW

10

1

2/10

/1

02/1

0/1

);(ln);(ln

),(),(),(

0

0

ττ

ττ

α

α

00

00

φφ

φφ

)( )( 02/1

0/1 0 ττφφ −=−= nvnu α

( ) ( ) )ˆ(),ˆ(),(maxargˆ,ˆ 02/1

0/1

,0 ττφφ −−== nnvuWvu nvunn

α

25EVA 2007

Result: on C(Rp×R4 ), where

• is the Fisher information for a stable density.

• N ~ N(0, I(τ0)) is a normal random vector independent of W.

• W is the process

vvvuWvuW dn )('2')(),( 01 τIN −−+→

[ ]41,

20 )}/();(ln{E:)(

=∂∂∂−=

jijizf ττττI

( ) ( ){ }∑∑∞

= ≠

−− −Γ+=1 0

0,0/11

00/1

0, ;ln;)(||)](~[ln )( 00

k jjkkkjrjk zfucczfuW ττδφσα αα

cj(u) are found through the Laurent series identity

{zk,j} is iid z1,1 =d Z1/φ0r

{δk} is iid P(δk=1) = p = 1 − P(δk=-1).

Γk=E1 + … + Ek, where {Ek} is iid unit exponentials

{zk,j}, {δk}, and {Ek} are mutually independent

[ ]p

kj

kkjj zzzzuzuc

10

100 ))(/())(/(')(

=≠

−−∑ +−= φφ

26EVA 2007

Deconstructing the limit result and limit process:

The limit process in the following

consists of two independent pieces:

1. W(u) which governs the limit behavior of the AR or AP model parameters φ1,. . . ,φp. Set

2. v’N−2−1v’I(τ0)v which governs the limit behavior of the stable parameters (α,β,σ,μ). Set is equal to

vvvuWvuW dn )('2')(),( 01 τIN −−+→

Theorem: There exists a sequence of maximizers of the likelihood

function such that

and

)(maxarg uWu=ξ

,)('2'max arg 01

v vvv τIN −−=η)).(,0(N~)( 0

10

1 ττ −−= INIη

. )ˆ( )ˆ( 02/1

d0/1 0 ηττξφφ dMLML nn →−→−α

27EVA 2007

Remarks:

The distribution of ξ is generally intractable. However, one can use bootstrapping techniques (see later example and Davis and Wu `97).

The scaling for the AR parameters is n1/α, which is must faster thanthe standard n1/2 rate.

The limit behavior of the estimates of the stable parameters is the same as for an iid sequence (see DuMouchel `73).

. ))()ˆ( , )ˆ( 002/1

d0/1 0 τα 1ΙΝ(0,ηττξφφ −∼→−→− dMLML nn

3. MLE—allpass models

28EVA 2007

Simulation setup:

• 300 replicates of a non-causal AR(2) model

Xt = −1.2 Xt-1 + 1.6 Xt-2 + Zt (zeros of AR polyn 1.25, -.5)

• noise distribution is stable with two sets of parameter values:

α = .8 β = .5 σ = 1.0 μ = 0.0 (really heavy!)

α = 1.5 β = .5 σ = 1.0 μ =0.0

• sample sizes n=500

• estimation is maximum likelihood

4. Simulation Results

29EVA 2007

4. Simulation Results

0.0830.0000.078μ = 0.00.078-0.0060.078μ = 0.0

0.0560.9970.047σ = 1.00.0660.9990.048σ = 1.0

0.1280.5090.121β = 0.50.1280.0100.137β = 0.0

0.0711.4990.070α = 1.50.0691.5020.071α = 1.5

0.0621.598φ2 = 1.60.0651.605φ2 = 1.6

0.078-1.204φ1=−1.20.083-1.212φ1=−1.2

0.064-0.0040.062μ = 0.00.057-0.0020.054μ = 0.0

0.0710.9970.074σ = 1.00.0730.9970.077σ = 1.0

0.0560.5020.058β = 0.50.068-0.0010.067β = 0.0

0.0390.8000.049α = 0.80.0410.7980.051α = 0.8

0.0041.600φ2 = 1.60.0041.600φ2 = 1.6

0.004-1.200φ1=−1.20.004-1.200φ1=−1.2

EmpiricalMean Std Dev

AsympSD

EmpiricalMean Std Dev

AsympSD

30EVA 2007

5. Walmart revisited---residuals from noncausal model

Analysis of the ACF and PACF of the time series (n=274) suggests that {Xt} follows an AR (1) or AR(2).

A causal AR(2) fit (using Gaussian MLE) is

Xt = .4455 Xt-1 + .1025 Xt-2 + Zt

The estimated residuals were uncorrelated but dependent.

Bootstrap confidence intervals for φ1 and φ2 : (based on Davis and Wu `97)

(-2.5719,-1.4212) (1.4521,2.5743)

Resample size is m=135.

Maximum-likelihood model:

Xt = -2.0766 Xt-1 + 2.0772 Xt-2 + Zt (purely non-causal)

{Zt} ~ IID Stable

α = 1.8335, β = .5650, σ = .4559, μ = 16.0030

lag (h)

acf o

f abs

val

ues

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

lag (h)

acf o

f squ

ares

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

31EVA 2007 theoretical quantiles

empi

rical

qua

ntile

s

14 16 18 20

1416

1820

day

resi

dual

s

0 50 100 150 200 250

1416

1820

5. Walmart—analysis of residuals from noncausal model

lag (h)

acf o

f abs

val

ues

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

lag (h)

acf o

f squ

ares

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

CB’s simulated

CB’s simulated

32EVA 2007

6th Conference on Extremes: Fort Collins, Colorado June 22-26, 2009

33EVA 2007

Graybill VIII Conference in 2009VIII Conference in 2009

66thth Conference on Extremes Conference on Extremes

Fort Collins, Colorado Fort Collins, Colorado

June 22June 22-- 26, 200926, 2009