Estimation of simultaneous change-point under sparsity conditions
Farida Enikeeva and Zaid Harchaoui
LJK-UJF and INRIA Rhône-Alpes
7 June 2012
Change-point problem for Gaussian observations
We observe Gaussian data with the change in mean,
X_i = θ′·1{i ≤ τ} + θ′′·1{i > τ} + ε ξ_i,   ξ_i ∼ N(0, 1),   i = 1, …, n
[Figure: a sample path X_i, i = 1, …, n, with mean θ′ before the change-point τ = 50 and θ′′ after]
The problem is to estimate the change-point τ
Change-point problem for Gaussian observations
Historical Overview
Chernoff and Zacks (1964): Bayesian estimate of the mean after the change
Hinkley (1970): an MLE of the change-point τ
Bhattacharya and Brockwell (1976): limiting behaviour of the likelihood process
Brodsky and Darkhovsky (1990): asymptotic distribution of an estimate of τ
Books: Shiryaev (1978), Brodsky and Darkhovsky (1993), Csörgő and Horváth (1997)
Change-point problem for Gaussian observations
We observe
X_i = θ′·1{i ≤ τ} + θ′′·1{i > τ} + ε ξ_i,   ξ_i ∼ N(0, 1),   i = 1, …, n
Pass to continuous time: t = i/n, t0 = τ/n,
i = 1, …, n → t ∈ [a, b],   a, b > 0
Size of the jump: ∆ = |θ′ − θ′′|
Asymptotic behavior:
t_mle = argmax_{t∈[a,b]} { ∆ φ(t) + (ε/√n) · B(t)/√(t(1−t)) },
where
φ(t) = √(t(1−t)) · { (1−t0)/(1−t), t ≤ t0;  t0/t, t > t0 }
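As an illustration, the MLE above can be computed on simulated data: with known variance, maximising the likelihood over the two segment means reduces to maximising the between-segment sum of squares over the candidate change-points. A minimal sketch, with parameter values of our own choosing (not from the talk):

```python
import numpy as np

# Simulated 1-D model X_i = θ′·1{i ≤ τ} + θ′′·1{i > τ} + ε ξ_i
# (all parameter values below are illustrative).
rng = np.random.default_rng(1)
n, tau, theta1, theta2, eps = 200, 50, 0.0, 1.5, 1.0
X = np.where(np.arange(1, n + 1) <= tau, theta1, theta2) + eps * rng.standard_normal(n)

def mle_change_point(X):
    """MLE of τ: maximise the between-segment sum of squares
    S_k^2/k + (S_n - S_k)^2/(n-k) - S_n^2/n over k = 1..n-1."""
    n = len(X)
    cs = np.cumsum(X)                      # partial sums S_k = X_1 + ... + X_k
    k = np.arange(1, n)                    # candidate change-points
    crit = cs[:-1]**2 / k + (cs[-1] - cs[:-1])**2 / (n - k) - cs[-1]**2 / n
    return int(k[np.argmax(crit)])

tau_hat = mle_change_point(X)
```

With a jump of 1.5 against unit noise the criterion is sharply peaked near τ, consistent with the n⁻¹ rate discussed next.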
Change-point problem for Gaussian observations
Main results concerning the MLE of τ:

Rate of convergence n⁻¹:
lim_{n→∞} E[n(t_mle − t0)]² = 26 ε⁴/∆⁴.

The error of estimation converges to
n(t_mle − t0) →_w (ε²/∆²) argmin_{t∈ℝ} { |t|/2 − W(t) },
where W(t) = W1(t) for t ≥ 0 and W(t) = W2(−t) for t < 0 is a two-sided Wiener process.

Asymptotic distribution of t_mle:
P{ n|t_mle − t0| ≥ z } ∼ (z∆²/(2ε²)) exp(−z∆²/(8ε²)),   n → ∞
Multi-dimensional case
A sequence of Gaussian vectors X_i ∈ ℝ^d,
X_i ∼ N(θ′, ε²I_d)·1{i ≤ τ} + N(θ′′, ε²I_d)·1{i > τ},
with means θ′, θ′′ ∈ ℝ^d before and after the change.
The norm of the vector of jumps ∆θ = θ′ − θ′′ ∈ ℝ^d:
‖∆θ‖² = Σ_{j=1}^{d} (θ′_j − θ′′_j)²
The error of estimation converges to
n(t̂ − t0) →_w (ε²/‖∆θ‖²) argmin_{t∈ℝ} { |t|/2 − W(t) },   n → ∞
Multi-channel change-point
[Figure: d channels X_ij, i = 1, …, n, j = 1, …, d, with a simultaneous change at τ = 30 in some of the channels]
Observations
X_ij = θ′_j·1{i ≤ τ} + θ′′_j·1{i > τ} + ξ_ij,   i = 1, …, n,   j = 1, …, d channels

simultaneous change in the signal mean at some channels
ξ_ij ∼ N(0, 1) i.i.d.
the number of corrupted channels J is unknown
d → ∞

Estimate the common change-point τ
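A minimal simulation of this multi-channel model; the sparsity pattern and all parameter values are our own illustrative choices:

```python
import numpy as np

# d channels observed over i = 1..n; only the first J channels jump at τ
# (which channels jump, and all values here, are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, J, tau = 100, 50, 5, 30

theta_before = np.zeros(d)           # θ′: per-channel mean before the change
theta_after = np.zeros(d)            # θ″: per-channel mean after the change
theta_after[:J] = 2.0                # a jump only in the J corrupted channels

idx = np.arange(1, n + 1)[:, None]   # shape (n, 1), broadcasts against (d,)
X = np.where(idx <= tau, theta_before, theta_after) + rng.standard_normal((n, d))
# X has shape (n, d): row i is the d-channel observation at time i.
```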
Motivation
Applications
Signal processing: segmentation of audio-visual signals
Biology: analysis of microarrays; genetic linkage studies
Cancer research: detection of copy-number variation in a gene; detection of a shared pattern in genomic profiles of patients
Finance: detection of shifts of volatilities in the stock market
Applications
Segmentation of audio-visual signals
Take a number of features of the audio-visual signal (d → ∞)
Some features change simultaneously ⇒ beginning of a new segment
The change does not necessarily occur in all the features ⇒ sparsity
The signal length n is fixed
Applications
Detection of copy-number variations (CNV) in a gene
CNVs are alterations in a genome resulting in an abnormal number of copies of one or more sections of the DNA:
A-B-C-D → A-B-C-C-D (a duplication of "C")
A-B-C-D → A-B-D (a deletion of "C")
Gene copy number can be elevated in cancer cells
Identify CNVs that are frequent in a population of cancer patients or similar for certain genes
Method
Model
X_ij = θ′_j·1{i ≤ τ} + θ′′_j·1{i > τ} + ξ_ij,   i = 1, …, n,
where ξ_ij ∼ N(0, 1), j = 1, …, d, and there are J channels with a change.

The log-likelihood L(X, τ, J, m, θ′, θ′′)
Estimate θ′, θ′′ → L(X, τ, J, m)
Estimate the number of channels with a change:
J*(τ) = argmax_{m, 1≤J≤d} { L(X, τ, J, m) − Pen(J) }
Estimate the change-point:
τ̂ = argmax_{1≤τ≤n} L(X, τ, J*(τ))
Log-likelihood
J is the number of channels with a change
m* = {j1, …, jJ} ∈ M is the set of indices of the corrupted channels
M = ⋃_{J=1}^{d} { m ⊂ {1, …, d} : #m = J } is the set of all possible combinations of indices

Log-likelihood
L(X; τ, m) = Σ_{j∈m} Z²_j(τ),   m ∈ M,   τ = 1, …, n,
where for k = 1, …, n the channels are merged,
Z²_j(k) = (1/k)(Σ_{i=1}^{k} X_ij)² + (1/(n−k))(Σ_{i=k+1}^{n} X_ij)² − (1/n)(Σ_{i=1}^{n} X_ij)²
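The merged statistic Z²_j(k) can be transcribed directly from the displayed formula. A sketch (our transcription, not the authors' code):

```python
import numpy as np

def z_squared(X, k):
    """Z_j^2(k) for all d channels at a candidate change-point 1 <= k <= n-1.

    Direct transcription of the formula: squared partial sums before and
    after k, minus the squared total sum, with weights 1/k, 1/(n-k), 1/n.
    """
    n = X.shape[0]
    s_left = X[:k].sum(axis=0)           # sum_{i=1}^{k} X_ij, shape (d,)
    s_right = X[k:].sum(axis=0)          # sum_{i=k+1}^{n} X_ij
    return s_left**2 / k + s_right**2 / (n - k) - (s_left + s_right)**2 / n

# Under pure noise each Z_j^2(k) is a chi-squared(1) variable (between-group
# sum of squares with one degree of freedom), so the values are nonnegative
# and average to about 1:
rng = np.random.default_rng(0)
X0 = rng.standard_normal((100, 200))
z = z_squared(X0, 50)
```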
Estimator
Ordered statistics Z²_(1)(k) > Z²_(2)(k) > · · · > Z²_(d)(k) for each k = 1, …, n.

Penalized likelihood for estimating the number of corrupted channels:
J*(τ) = argmax_{1≤J≤d} { Σ_{j=1}^{J} Z²_(j)(τ) − Pen(J) },
with the penalty chosen according to (Birgé and Massart, 2007),
Pen(J) = (1+α)J + 2J√x_J + 2J x_J,   x_J = log(de/J) + 2 log(J)/J,   α > 0.

Estimator of τ:
τ̂ = argmax_{1≤τ≤n} { Σ_{j=1}^{J*(τ)} Z²_(j)(τ) }.
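Putting the pieces together, a sketch of the full procedure as we read it off the slides (helper names are ours; α = 1 is an arbitrary choice):

```python
import numpy as np

def z_squared(X, k):
    """Per-channel statistic Z_j^2(k) from the log-likelihood slide."""
    n = X.shape[0]
    s1, s2 = X[:k].sum(axis=0), X[k:].sum(axis=0)
    return s1**2 / k + s2**2 / (n - k) - (s1 + s2)**2 / n

def pen(J, d, alpha=1.0):
    """Pen(J) = (1+α)J + 2J√x_J + 2J x_J with x_J = log(de/J) + 2 log(J)/J."""
    x = np.log(d * np.e / J) + 2.0 * np.log(J) / J
    return (1.0 + alpha) * J + 2.0 * J * np.sqrt(x) + 2.0 * J * x

def estimate_change_point(X, alpha=1.0):
    """For each k: order Z_(1)^2(k) >= ... >= Z_(d)^2(k), take the best
    penalised partial sum over J, then maximise the criterion over k."""
    n, d = X.shape
    penalties = pen(np.arange(1, d + 1), d, alpha)
    best_val, best_k = -np.inf, 0
    for k in range(1, n):
        z = np.sort(z_squared(X, k))[::-1]       # ordered statistics
        val = (np.cumsum(z) - penalties).max()   # max over J of the criterion
        if val > best_val:
            best_val, best_k = val, k
    return best_k

# Quick check on simulated data: change at τ = 30 in 5 of 50 channels, jump 2.
rng = np.random.default_rng(0)
n, d, tau = 100, 50, 30
theta_after = np.zeros(d)
theta_after[:5] = 2.0
X = np.where(np.arange(1, n + 1)[:, None] <= tau, 0.0, theta_after) \
    + rng.standard_normal((n, d))
tau_hat = estimate_change_point(X)
```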
Why penalization?
[Figure: penalized and non-penalized log-likelihood vs. j = 1, …, d, for J = 100, d = 500, n = 100, ∆θ_j ≈ 2.17, at τ/n = 1/4 and τ/n = 1/2]
The non-penalized log-likelihood keeps increasing with J
We subtract a penalty Pen(J) to penalize the number of "redundant" subsets
The choice of penalty
Pen(J) = (1 + α)J + 2J√x_J + 2J x_J,   α > 0,
x_J = log(de/J) + 2 log(J)/J

The number of subsets of size J is bounded by
(d choose J) ≤ exp(J log(de/J))
⇒ the choice of x_J: J x_J = J log(de/J) + 2 log J

Birgé and Massart condition: Σ_{m∈M: #m=J} exp(−J x_J) < ∞
⇒ "putting a prior finite measure on the list of models"
⇒ the term log(J)/J in x_J makes the sum finite
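The summability can be checked numerically: combining (d choose J) ≤ exp(J log(de/J)) with J x_J = J log(de/J) + 2 log J gives (d choose J)·e^{−J x_J} ≤ J⁻², so the sum over model sizes stays below π²/6 for any d. A quick illustrative check (the value d = 500 is our choice):

```python
import math

# Each size-J term is at most J**(-2): C(d, J) <= exp(J*log(d*e/J)) and
# J*x_J = J*log(d*e/J) + 2*log(J), hence C(d, J)*exp(-J*x_J) <= J**(-2).
d = 500
total = 0.0
for J in range(1, d + 1):
    x_J = math.log(d * math.e / J) + 2.0 * math.log(J) / J
    total += math.comb(d, J) * math.exp(-J * x_J)

# total is below sum_J J**(-2) < pi**2/6, uniformly in d.
```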
Log-likelihood
For each fixed k = 1, …, n the merged channels have a Gaussian distribution,
Z_j(k) = ∆θ_j μ(k) + ε ξ_j(k),   ξ_j(k) ∼ N(0, 1),   j = 1, …, d,
where
ξ_j(k) = √(n/(k(n−k))) · ( Σ_{i=1}^{k} ξ_ij − (k/n) Σ_{i=1}^{n} ξ_ij )
and
ξ_j(k) =_d B_j(t)/√(t(1−t)),   t = k/n.
Log-likelihood
The statistic follows a non-central χ² distribution,
Σ_{j∈m} Z²_j(k) ∼ χ²(#m, L(k)),   k = 1, …, n,
with #m degrees of freedom (the number of elements in m) and non-centrality parameter
L(k) = μ²(k) Σ_{j∈m} (θ′′_j − θ′_j)²,
where the function μ²(k) attains its maximum at τ:
argmax_{k=1,…,n} μ²(k) = τ.

[Figure: μ_τ(k), k = 1, …, 100, for τ = 10, 50, 80]
Estimation of quadratic functional
It is impossible to estimate consistently:
the true subset m* = {j1, …, jJ} of corrupted channels
the number of channels with a change J

We estimate the parameter of the non-central χ² distribution,
L(τ) = μ²(τ)‖∆θ‖²,
by the estimate (Laurent and Massart, 2000)
L̂(τ) = max_{1≤J≤d} { Σ_{j=1}^{J} Z²_(j)(τ) − Pen(J) }
⇒ Consistent estimation of L(k) is possible under certain conditions on ‖∆θ‖² and J.
Consistency of L̂(k)
Estimating L(k) = μ²(k)‖∆θ‖²:
Z_j(k) = ∆θ_j μ(k) + ε ξ_j(k),   ξ_j(k) ∼ N(0, 1),   j = 1, …, d,
L̂(k) = max_{1≤J≤d} { Σ_{j=1}^{J} Z²_(j)(k) − Pen(J) }

Lemma
For any t > 0,
P[ L̂(k) − L(k) − 2μ(k)⟨∆θ, ξ(k)⟩ > t ] ≤ C e^{−t/(2+1/α)}.
For any z > 0,
P[ L̂(k) − L(k) − 2μ(k)⟨∆θ, ξ(k)⟩ < −z − Q_d(k) ] ≤ e^{−z},
where
Q_d(k) = inf_m { (3/4) μ²(k)‖∆θ_{⊥m}‖² + Pen(#m) }.
Corollaries
Assume that for all k = 1, …, n,
lim_{d→∞} Q_d(k)/‖∆θ‖² = 0.
Then
L̂(k)/‖∆θ‖² →_P μ²(k),   d → ∞,
and L̂(k) is asymptotically normal,
L̂(k)/‖∆θ‖² − μ²(k) →_w N(0, 4μ²(k)),   d → ∞.
Then τ̂ is consistent:
τ̂ = argmax_{k=1,…,n} L̂(k) →_P argmax_{k=1,…,n} μ²(k) = τ
Consistency of τ
Let the number of channels with a change be J = d^β
Define ρ = min(τ/n, 1 − τ/n)
Assume that
lim_{d→∞} min(J log(de/J), √d) / (‖∆θ‖² ρ) = 0.

Consistency of τ̂
If for some K > 0,
δ_{n,d} = K · min(J log(de/J), √d) / (n‖∆θ‖² ρ),
then the estimator τ̂ is consistent:
P[ |τ̂ − τ| ≥ n δ_{n,d} ] → 0,   d → ∞.
Rate
The rate
δ_{n,d} ∼ min(J log(de/J), √d) / (n‖∆θ‖² ρ)
depends on:
the norm of the jump sizes ‖∆θ‖² = Σ_{j=1}^{d} (θ′_j − θ′′_j)²
the location of the change-point τ/n ⇒ ρ = min(τ/n, 1 − τ/n)
the number of channels d and the number of corrupted channels J
Rate
[Figure: risk E|t̂ − t0| vs. d = 10, …, 200, for n = 100, J = d, ∆θ_j = 0.5, at τ/n = 0.5, 0.6, 0.8, 0.9]
‖∆θ‖² = d/4,   J = d   ⇒   δ_{n,d} ∼ d^{−1/2}/(nρ)

[Figure: risk E|t̂ − t0| vs. d = 10, …, 200, for n = 100, J = 10, ∆θ_j = 1, at τ/n = 0.5, 0.6, 0.8, 0.9]
‖∆θ‖² = J,   J = d^β,   β < 1/2   ⇒   δ_{n,d} ∼ (1 + (1−β) log d)/(nρ)
Future work
Why Q_d(k) ∼ min(J log(de/J), √d)? ⇒ Construct an estimator with different penalties depending on whether J < √d or J > √d.

Asymptotic distribution of τ̂:
t_mle = argmax_{t∈[a,b]} { ‖∆θ‖φ(t) · ( ‖∆θ‖φ(t) + (1/√n) · B(t)/√(t(1−t)) ) },
where
φ(t) = √(t(1−t)) · { (1−t0)/(1−t), t ≤ t0;  t0/t, t > t0 }

Optimality ⇒ lower bounds:
lim_{d→∞} inf_{t̂} sup_{t0∈[a,b]} sup_{∆θ∈Θ} δ_{n,d}^{−1} E_{∆θ}|t̂ − t0| ≥ C0.