Testing for Serial Correlation by means of Extreme Values

Testing for Serial Correlation by means of Extreme Values Ishay Weissman Technion - Israel Institute of Technology [email protected] Vimeiro 2013 1


Testing for Serial Correlation by means of Extreme Values

Ishay Weissman
Technion - Israel Institute of Technology

[email protected]

Vimeiro 2013


A typical lecture in statistics begins as follows:

Let $X_1, X_2, \dots, X_n$ be an iid sample from some df $F$ ...

I will open with


Ivette, Ivette Jr., Tiago de Oliveira

Vimeiro, 1983


And now, let $X_1, X_2, \dots, X_n$ be a sample from a continuous df $F_0$, and $F_0$ is $U[0,1]$ (if not, replace $X_i$ by $F_0(X_i)$).

We suspect the data exhibit some serial correlation (dependence).

The main purpose of this work is to study the effectiveness of the LARGEST SPACING (LS) as a tool to detect serial dependence.


OVERVIEW

- Background on Spacings

- Possible Competitors

- Autoregressive Model and a Surprising Connection to Extreme Values

- Power Comparisons

- Two More Models

- Conclusions


Want to test

$H_0$ : "iid-uniform"

There is no optimal test against all possible alternatives !!!

Concentrate on Autoregressive Model

$X_i = \rho X_{i-1} + (1 - \rho) U_i$  $(1 \le i \le n,\ 0 \le \rho \le 1)$,

where $\{U_i : i \ge 0\}$ is an iid-$U[0,1]$ sequence, $X_0 = U_0$.

So, here we test

$H_0 : \rho = 0$ vs. $H_1 : \rho > 0$.
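To make the model concrete, here is a minimal sketch of how one sample path could be generated. The function name `simulate_ar` and the optional `rng` argument are illustrative choices, not part of the talk.

```python
import random


def simulate_ar(n, rho, rng=None):
    """Return [X_1, ..., X_n] with X_i = rho*X_{i-1} + (1-rho)*U_i and X_0 = U_0.

    The U_i are iid U[0,1]; `rng` is a hypothetical hook for reproducibility.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    x_prev = rng.random()  # X_0 = U_0
    xs = []
    for _ in range(n):
        # Convex combination of the previous value and a fresh uniform.
        x_prev = rho * x_prev + (1.0 - rho) * rng.random()
        xs.append(x_prev)
    return xs
```

With `rho = 0` the output is just the iid uniform sequence (the null hypothesis); with `rho = 1` every value equals $X_0$.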


SPACINGS

Let

$Y_1 \le Y_2 \le \dots \le Y_n$  $(Y_0 \equiv 0,\ Y_{n+1} \equiv 1)$

be the order statistics of the $\{X_i\}$ and let

$V_i = Y_i - Y_{i-1}$  $(i = 1, 2, \dots, n+1)$

be the spacings and $V_{\max}$ be the largest.

When $\rho = 0$, for $0 \le y \le 1$,

$P\{V_{\max} \le y\} = \sum_{j=0}^{n+1} (-1)^j \binom{n+1}{j} \{(1 - jy)_+\}^n$

(Whitworth (1897), Darling (1953)).
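The null distribution above can be evaluated directly, which gives an exact critical value for the LS test. The following is a sketch only: the function names are made up for illustration, and the alternating sum is summed naively in floating point, so it should be trusted only for moderate $n$ and for $y$ in the upper tail.

```python
from math import comb


def largest_spacing(xs):
    """V_max: largest gap between consecutive points of 0, Y_1, ..., Y_n, 1."""
    pts = [0.0] + sorted(xs) + [1.0]
    return max(b - a for a, b in zip(pts, pts[1:]))


def ls_cdf(y, n):
    """P{V_max <= y} under H0 (Whitworth's formula) for sample size n."""
    if y >= 1.0:
        return 1.0
    total = 0.0
    for j in range(n + 2):
        base = 1.0 - j * y
        if base <= 0.0:  # (1 - j*y)_+ vanishes from here on
            break
        total += (-1) ** j * comb(n + 1, j) * base ** n
    return min(1.0, max(0.0, total))  # clip rounding noise


def ls_critical_value(n, alpha=0.05, tol=1e-12):
    """Bisection for c with P{V_max > c} = alpha under H0."""
    lo, hi = 1.0 / (n + 1), 1.0  # V_max is never below 1/(n+1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ls_cdf(mid, n) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return hi
```

The LS test then rejects $H_0$ when `largest_spacing(xs) > ls_critical_value(len(xs))`.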


If $E_1, E_2, \dots, E_{n+1}$ are iid unit-exponential and

$T_{n+1} = \sum_{i=1}^{n+1} E_i$,

then

$(V_1, V_2, \dots, V_{n+1}) \overset{D}{=} \frac{(E_1, E_2, \dots, E_{n+1})}{T_{n+1}} = \frac{n+1}{T_{n+1}} \cdot \frac{(E_1, E_2, \dots, E_{n+1})}{n+1}$,

independent of $T_{n+1}$. Since $T_{n+1}/(n+1) \to 1$ a.s., for large $n$, the spacings behave (approximately) as iid exponential ($\lambda = n+1$).

Hence, for $-\infty < x < \infty$,

$\lim_{n \to \infty} P\{(n+1)V_{\max} - \log(n+1) \le x\} = \exp\{-e^{-x}\}$,

i.e. attraction to the Gumbel distribution.
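For large $n$ the Gumbel limit suggests an approximate critical value, obtained by solving $\exp\{-e^{-x}\} = 1 - \alpha$ for $x$. The sketch below (function names are illustrative) also draws $V_{\max}$ through the exponential representation, so the quality of the approximation can be checked by simulation.

```python
import random
from math import log


def gumbel_critical_value(n, alpha=0.05):
    """Approximate c_alpha with P{V_max > c_alpha} ~ alpha under H0."""
    x = -log(-log(1.0 - alpha))  # (1 - alpha)-quantile of the Gumbel law
    return (log(n + 1) + x) / (n + 1)


def simulate_vmax(n, rng):
    """Draw V_max under H0 via spacings = (E_1, ..., E_{n+1}) / T_{n+1}."""
    e = [rng.expovariate(1.0) for _ in range(n + 1)]
    return max(e) / sum(e)
```

The approximation is asymptotic; for small $n$ the exact Whitworth formula is the safer choice.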


Want to compare the power of LS with some other competitors.

That is, the power of the test which rejects $H_0$ when $V_{\max} > c_\alpha$ with powers of tests based on:

- Likelihood ratio (LR)

- Sample serial correlation (SSC)

- Kolmogorov-Smirnov (K-S)


LR: Most powerful, used as a benchmark, to see how close LS is to LR.

SSC: Least squares estimator of $\rho$, intuitive.

K-S: Very popular, similar in nature: extreme vertical distance (K-S) vs. extreme horizontal distance (LS).
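For reference, the two competing statistics could be computed as sketched below. The K-S statistic is the standard one-sample distance from the $U[0,1]$ df. The talk does not spell out the exact form of the SSC statistic, so the lag-1 sample autocorrelation used here is an assumption, and both function names are illustrative.

```python
def ks_statistic(xs):
    """Kolmogorov-Smirnov distance between the empirical df and the U[0,1] df."""
    ys = sorted(xs)
    n = len(ys)
    # The empirical df jumps from i/n to (i+1)/n at the i-th order statistic.
    return max(max((i + 1) / n - y, y - i / n) for i, y in enumerate(ys))


def lag1_serial_correlation(xs):
    """Lag-1 sample autocorrelation (assumed form of the SSC statistic)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den
```

K-S measures the largest vertical gap between the two dfs, whereas the largest spacing is the longest flat stretch of the empirical df, i.e. a horizontal gap.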


Empirical df vs. Uniform df

[Figure: two panels plotting the empirical df against the uniform df on [0,1]. Panel captions: "LS and K-S occur together"; "K-S is large due to accumulation".]


LIKELIHOOD RATIO

Denote

$X = (X_1, X_2, \dots, X_n)$, $U = (U_1, U_2, \dots, U_n)$

and let $U_0 = X_0 = x_0 \in [0,1]$. Then

$U_i = (X_i - \rho X_{i-1})(1 - \rho)^{-1}$  $(1 \le i \le n)$.

The Jacobian of the map $X \mapsto U$ is $(1 - \rho)^{-n}$.

Hence the joint density of $X$, conditioned on $U_0 = X_0 = x_0$, at $x \in [0,1]^n$, is given by

$f_X(x) = (1 - \rho)^{-n} \prod_{i=1}^{n} I\{\rho x_{i-1} \le x_i \le \rho x_{i-1} + 1 - \rho\}$

$= (1 - \rho)^{-n} \, I\left\{ \rho \le \min_{1 \le i \le n} \min\left\{ \frac{x_i}{x_{i-1}}, \frac{1 - x_i}{1 - x_{i-1}} \right\} \right\}$.


Let

$T_i = \min\left\{ \frac{X_i}{X_{i-1}}, \frac{1 - X_i}{1 - X_{i-1}} \right\}$  $(\in [\rho, 1])$  $(1 \le i \le n)$, $T_{\min} = \min_{1 \le i \le n} T_i$;

then the following facts follow from the joint density of $X$ derived above:

Fact 1. The $\{T_i\}$ are iid uniform on $[\rho, 1]$.

Fact 2. The likelihood function is given by

$L(\rho) = (1 - \rho)^{-n} I\{\rho \le T_{\min}\}$  $(0 \le \rho \le 1)$.

Fact 3. The statistic $T_{\min}$ is sufficient with respect to $\rho$ and it is the maximum likelihood estimator (MLE) of $\rho$.


Fact 4. For testing

$H_0 : \rho = 0$ vs. $H_1 : \rho > 0$

the test which rejects $H_0$ when

$T_{\min} > c_\alpha = 1 - \alpha^{1/n}$

is the most powerful $\alpha$-level test, with power given by

$\pi_\alpha(\rho) = \begin{cases} \alpha (1 - \rho)^{-n} & \text{if } \rho \le c_\alpha, \\ 1 & \text{if } \rho \ge c_\alpha. \end{cases}$

Interesting case: a sample extreme (minimum) is most powerful for testing existence of serial correlation !!!
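Facts 2 to 4 translate into a few lines of code. This is a sketch under the same conditioning as the derivation, namely that the starting value $x_0$ is known; the function names are illustrative, and no guard is included for the probability-zero event that some value equals exactly 0 or 1.

```python
def t_min(xs, x0):
    """T_min = min over i of min(X_i / X_{i-1}, (1 - X_i) / (1 - X_{i-1}))."""
    prev = x0
    best = float("inf")
    for x in xs:
        best = min(best, x / prev, (1.0 - x) / (1.0 - prev))
        prev = x
    return best


def lr_critical_value(n, alpha=0.05):
    """c_alpha = 1 - alpha**(1/n): reject H0 when T_min > c_alpha."""
    return 1.0 - alpha ** (1.0 / n)


def lr_power(rho, n, alpha=0.05):
    """Power pi_alpha(rho) of the most powerful test (Fact 4)."""
    if rho >= lr_critical_value(n, alpha):
        return 1.0
    return min(1.0, alpha / (1.0 - rho) ** n)  # min() guards against rounding
```

For example, with $n = 10$ and $\alpha = .05$ the critical value is about 0.259, so the power is exactly 1 for any $\rho$ at or above that value.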


POWER COMPARISONS

For each pair $(\rho, n)$ we generated $10^5$ samples from the autoregressive model and computed the (empirical) power, namely, the proportion of samples for which $H_0 : \rho = 0$ was rejected.

The significance level is $\alpha = .05$ in all cases.
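The simulation design for the LS test can be sketched as follows. This is not the author's code: the replication count is far below $10^5$, the helper names are hypothetical, and, to keep the block self-contained, the critical value is calibrated by simulating under $H_0$ rather than taken from the exact null distribution.

```python
import random


def ar_sample(n, rho, rng):
    """One sample X_1..X_n from X_i = rho*X_{i-1} + (1-rho)*U_i, X_0 = U_0."""
    x = rng.random()
    xs = []
    for _ in range(n):
        x = rho * x + (1.0 - rho) * rng.random()
        xs.append(x)
    return xs


def largest_spacing(xs):
    pts = [0.0] + sorted(xs) + [1.0]
    return max(b - a for a, b in zip(pts, pts[1:]))


def empirical_power(n, rho, alpha=0.05, reps=2000, seed=0):
    """Proportion of AR(rho) samples for which the LS test rejects H0."""
    rng = random.Random(seed)
    # Assumption: c_alpha is the empirical (1 - alpha)-quantile of V_max
    # over `reps` samples simulated under H0 (rho = 0).
    null_stats = sorted(
        largest_spacing(ar_sample(n, 0.0, rng)) for _ in range(reps)
    )
    c_alpha = null_stats[min(reps - 1, round((1.0 - alpha) * reps))]
    rejections = sum(
        largest_spacing(ar_sample(n, rho, rng)) > c_alpha for _ in range(reps)
    )
    return rejections / reps
```

Calling `empirical_power(n, 0.0)` should return a value close to $\alpha$, which is a quick sanity check on the calibration.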


Power Functions, Autoregressive Model, α = .05.

[Figure: six panels of power against rho, alpha=.05, for n = 10, 20, 50, 100, 200, 500. Curves: LR (blue), LS (black), K-S (red), SSC (green).]


[Figure, continued: four panels of power against rho, alpha=.05, for n = 1000, 2000, 5000, 10000. Curves: LR (blue), LS (black), K-S (red), SSC (green).]


To be fair to Kolmogorov-Smirnov, we have run similar simulations on samples from beta models beta($\gamma$, 1), namely

$X_i = U_i^{1/\gamma}$.

Independent, but not uniform.

Here,

$H_0 : \gamma = 1$ vs. $H_1 : \gamma > 1$

or

$H_0 : \gamma = 1$ vs. $H_2 : \gamma < 1$.


Power Functions, Beta Model

[Figure: four panels of power against 1/gamma, alpha=.05, for n = 10, 50, 100, 1000. Curves: LR (blue), LS (black), K-S (red).]

LR here refers to the likelihood ratio test for this model (*), namely the most powerful test.

K-S tends to the optimum, while LS stays far below.


(*) Reject $H_0$ vs. $\gamma > 1$ when $-2 \sum \log X_i < \chi^2_{2n}(.05)$.

Reject $H_0$ vs. $\gamma < 1$ when $-2 \sum \log X_i > \chi^2_{2n}(.95)$.
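The rejection rules in (*) need quantiles of a chi-square distribution with an even number of degrees of freedom, for which the df has a closed form. The sketch below uses only the standard library and finds the quantile by bisection; the function names are illustrative, and the plain summation is meant for moderate $n$ (up to a few hundred), since the partial sums overflow for very large $n$.

```python
from math import exp, log


def chi2_cdf_even(x, n):
    """df of chi-square with 2n degrees of freedom (n a positive integer):
    1 - exp(-x/2) * sum_{k<n} (x/2)^k / k!."""
    if x <= 0.0:
        return 0.0
    half = 0.5 * x
    term = total = 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return 1.0 - exp(-half) * total


def chi2_quantile_even(q, n, tol=1e-10):
    """q-quantile of chi-square(2n), by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n) < q:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, n) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_lr_test(xs, alpha=0.05, alternative="greater"):
    """True if H0: gamma = 1 is rejected; 'greater' means H1: gamma > 1,
    anything else means H2: gamma < 1."""
    n = len(xs)
    stat = -2.0 * sum(log(x) for x in xs)
    if alternative == "greater":
        return stat < chi2_quantile_even(alpha, n)
    return stat > chi2_quantile_even(1.0 - alpha, n)
```

The rules work because, under the beta($\gamma$, 1) model, $-2\gamma \sum \log X_i$ has a chi-square distribution with $2n$ degrees of freedom, so large $\gamma$ pushes the statistic toward 0.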


TWO MORE MODELS

Binomial Model:

Let $B_1, B_2, \dots$ be an iid Bernoulli sequence with parameter $p$, independent of the $\{U_i\}$ sequence.

Define

$Y_i = B_i Y_{i-1} + (1 - B_i) U_i$  $(i \ge 1,\ Y_0 = U_0)$.

Notice, the marginal distribution of $Y_i$ is $U[0,1]$; the first serial correlation and $P\{Y_i = Y_{i+1}\}$ are both equal to $p$, while the extremal index is $1 - p$ (the mean cluster length is $1/(1-p)$). Clusters of equal neighbors are of random (geometric) length.
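A sample path of the binomial model can be generated as in the sketch below; the function name and the `rng` argument are illustrative.

```python
import random


def simulate_binomial_model(n, p, rng=None):
    """Return [Y_1, ..., Y_n] with Y_i = B_i*Y_{i-1} + (1-B_i)*U_i, Y_0 = U_0."""
    if rng is None:
        rng = random.Random()
    y = rng.random()  # Y_0 = U_0
    ys = []
    for _ in range(n):
        u = rng.random()       # U_i is drawn at every step
        if rng.random() >= p:  # B_i = 0 (probability 1 - p): take the fresh value
            y = u
        ys.append(y)           # B_i = 1 (probability p): repeat the previous value
    return ys
```

In a long path the fraction of ties between neighbors should be close to $p$.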


Moving-max model:

Let $\xi_1, \xi_2, \dots$ be a sequence of iid $\beta(k^{-1}, 1)$ random variables, where $k$ is a fixed positive integer. Let

$Z_i = \max\{\xi_i, \xi_{i+1}, \dots, \xi_{i+k-1}\}$  $(i \ge 1)$.

The $Z$-sequence is called a moving-max sequence of order $k$. For each $i$, $Z_i$ is $U[0,1]$-distributed but neighboring values are dependent. Upper extreme values appear in clusters of size $k$, which implies that the extremal index is equal to $k^{-1}$.

For $k = 2$, the first serial correlation is 3/7 and $P\{Z_i = Z_{i+1}\} = 1/3$.
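A moving-max path can be simulated as sketched below. It uses the fact that if $U$ is $U[0,1]$ then $U^k$ has df $x^{1/k}$, i.e. it is $\beta(k^{-1}, 1)$, so the maximum of $k$ independent copies has df $x$. The function name and `rng` argument are illustrative.

```python
import random


def simulate_moving_max(n, k, rng=None):
    """Return [Z_1, ..., Z_n] with Z_i = max(xi_i, ..., xi_{i+k-1})."""
    if rng is None:
        rng = random.Random()
    # xi = U**k is beta(1/k, 1): P{U**k <= x} = x**(1/k).
    xi = [rng.random() ** k for _ in range(n + k - 1)]
    return [max(xi[i:i + k]) for i in range(n)]
```

For $k = 2$, about one third of neighboring pairs in a long path should be tied.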


Scatter points $(i, Y_i)$ and $(i, Z_i)$

[Figure: two scatter plots for i from 0 to 100, values in [0,1]. Panel titles: Binomial, p=.333; Moving-Max(2).]

The two plots look very similar. In both cases, the experienced practitioner will reject the independence hypothesis just on the basis of the fact that for continuous random variables, the probability of a tie is 0. We include these cases to see how well the LS and K-S tests detect the dependence.


Power Functions

[Figure: two panels of power against n (logarithmic scale, 1 to 10000), alpha=.05. Panel titles: Moving-Max(2); Moving-Max(3). Curves: LS (black), K-S (red).]

K-S test is not consistent !

(Similar results for the Binomial Model.)


CONCLUSION

- We presented here evidence (not a theorem) that the largest spacing is quite sensitive to serial dependence.

- K-S is more sensitive to deviation from "uniform distribution".

- As a byproduct, in the Autoregressive Model, the optimal test for serial correlation is based on lower extremes.


THANK YOU FOR YOUR ATTENTION

SEE YOU ALL IN VIMEIRO 2043
