Estimation of simultaneous change-point under sparsity conditions
Farida Enikeeva and Zaid Harchaoui
LJK-UJF and INRIA Rhône-Alpes
7 June 2012
Change-point problem for Gaussian observations
We observe Gaussian data with the change in mean,
X_i = θ′·1{i ≤ τ} + θ′′·1{i > τ} + ε ξ_i,   ξ_i ∼ N(0, 1),   i = 1, …, n
[Figure: a sample path X_i, i = 1, …, n, with mean θ′ before the change-point τ = 50 and θ′′ after]
The problem is to estimate the change-point τ
Change-point problem for Gaussian observations
Historical Overview
Chernoff and Zacks (1964): Bayesian estimate of the mean after the change
Hinkley (1970): an MLE of the change-point τ
Bhattacharya and Brockwell (1976): limiting behaviour of the likelihood process
Brodsky and Darkhovsky (1990): asymptotic distribution of an estimate of τ
Books: Shiryaev (1978), Brodsky and Darkhovsky (1993), Csörgő and Horváth (1997)
Change-point problem for Gaussian observations
We observe
X_i = θ′·1{i ≤ τ} + θ′′·1{i > τ} + ε ξ_i,   ξ_i ∼ N(0, 1),   i = 1, …, n
Pass to continuous time: t = i/n, t0 = τ/n,
i = 1, …, n → t ∈ [a, b],   a, b > 0
Size of the jump: ∆ = |θ′ − θ′′|
Asymptotic behavior:
t_mle = argmax_{t∈[a,b]} { ∆ φ(t) + (ε/√n) · B(t)/√(t(1−t)) },
where
φ(t) = √(t(1−t)) · { (1−t0)/(1−t), t ≤ t0;  t0/t, t > t0 }
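As an illustration, the MLE above can be computed on simulated data: with known variance, maximising the likelihood over the two segment means reduces to maximising the between-segment sum of squares over the candidate change-points. A minimal sketch, with parameter values of our own choosing (not from the talk):

```python
import numpy as np

# Simulated 1-D model X_i = θ′·1{i ≤ τ} + θ′′·1{i > τ} + ε ξ_i
# (all parameter values below are illustrative).
rng = np.random.default_rng(1)
n, tau, theta1, theta2, eps = 200, 50, 0.0, 1.5, 1.0
X = np.where(np.arange(1, n + 1) <= tau, theta1, theta2) + eps * rng.standard_normal(n)

def mle_change_point(X):
    """MLE of τ: maximise the between-segment sum of squares
    S_k^2/k + (S_n - S_k)^2/(n-k) - S_n^2/n over k = 1..n-1."""
    n = len(X)
    cs = np.cumsum(X)                      # partial sums S_k = X_1 + ... + X_k
    k = np.arange(1, n)                    # candidate change-points
    crit = cs[:-1]**2 / k + (cs[-1] - cs[:-1])**2 / (n - k) - cs[-1]**2 / n
    return int(k[np.argmax(crit)])

tau_hat = mle_change_point(X)
```

With a jump of 1.5 against unit noise the criterion is sharply peaked near τ, consistent with the n⁻¹ rate discussed next.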
Change-point problem for Gaussian observations
Main results concerning the MLE of τ:

Rate of convergence n⁻¹:
lim_{n→∞} E[n(t_mle − t0)]² = 26 ε⁴/∆⁴.

The error of estimation converges to
n(t_mle − t0) →_w (ε²/∆²) argmin_{t∈ℝ} { |t|/2 − W(t) },
where W(t) = W1(t) for t ≥ 0 and W(t) = W2(−t) for t < 0 is a two-sided Wiener process.

Asymptotic distribution of t_mle:
P{ n|t_mle − t0| ≥ z } ∼ (z∆²/(2ε²)) exp(−z∆²/(8ε²)),   n → ∞
Multi-dimensional case
A sequence of Gaussian vectors X_i ∈ ℝ^d,
X_i ∼ N(θ′, ε²I_d)·1{i ≤ τ} + N(θ′′, ε²I_d)·1{i > τ},
with means θ′, θ′′ ∈ ℝ^d before and after the change.
The norm of the vector of jumps ∆θ = θ′ − θ′′ ∈ ℝ^d:
‖∆θ‖² = Σ_{j=1}^{d} (θ′_j − θ′′_j)²
The error of estimation converges to
n(t̂ − t0) →_w (ε²/‖∆θ‖²) argmin_{t∈ℝ} { |t|/2 − W(t) },   n → ∞
Multi-channel change-point
[Figure: d channels X_ij, i = 1, …, n, j = 1, …, d, with a simultaneous change at τ = 30 in some of the channels]
Observations
X_ij = θ′_j·1{i ≤ τ} + θ′′_j·1{i > τ} + ξ_ij,   i = 1, …, n,   j = 1, …, d channels

simultaneous change in the signal mean at some channels
ξ_ij ∼ N(0, 1) i.i.d.
the number of corrupted channels J is unknown
d → ∞

Estimate the common change-point τ
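A minimal simulation of this multi-channel model; the sparsity pattern and all parameter values are our own illustrative choices:

```python
import numpy as np

# d channels observed over i = 1..n; only the first J channels jump at τ
# (which channels jump, and all values here, are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, J, tau = 100, 50, 5, 30

theta_before = np.zeros(d)           # θ′: per-channel mean before the change
theta_after = np.zeros(d)            # θ″: per-channel mean after the change
theta_after[:J] = 2.0                # a jump only in the J corrupted channels

idx = np.arange(1, n + 1)[:, None]   # shape (n, 1), broadcasts against (d,)
X = np.where(idx <= tau, theta_before, theta_after) + rng.standard_normal((n, d))
# X has shape (n, d): row i is the d-channel observation at time i.
```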
Motivation
Applications
Signal processing: segmentation of audio-visual signals
Biology: analysis of microarrays; genetic linkage studies
Cancer research: detection of copy-number variation in a gene; detection of a shared pattern in genomic profiles of patients
Finance: detection of shifts of volatilities in the stock market
Applications
Segmentation of audio-visual signals
Take a number of features of the audio-visual signal (d → ∞)
Some features change simultaneously ⇒ beginning of a new segment
The change does not necessarily occur in all the features ⇒ sparsity
The signal length n is fixed
Applications
Detection of copy-number variations (CNV) in a gene
CNVs are alterations in a genome resulting in an abnormal number of copies of one or more sections of the DNA:
A-B-C-D → A-B-C-C-D (a duplication of "C")
A-B-C-D → A-B-D (a deletion of "C")
Gene copy number can be elevated in cancer cells
Identify CNVs that are frequent in a population of cancer patients or similar for certain genes
Method
Model
X_ij = θ′_j·1{i ≤ τ} + θ′′_j·1{i > τ} + ξ_ij,   i = 1, …, n,
where ξ_ij ∼ N(0, 1), j = 1, …, d, and there are J channels with a change.

The log-likelihood L(X, τ, J, m, θ′, θ′′)
Estimate θ′, θ′′ → L(X, τ, J, m)
Estimate the number of channels with a change:
J*(τ) = argmax_{m, 1≤J≤d} { L(X, τ, J, m) − Pen(J) }
Estimate the change-point:
τ̂ = argmax_{1≤τ≤n} L(X, τ, J*(τ))
Log-likelihood
J is the number of channels with a change
m* = {j1, …, jJ} ∈ M is the set of indices of the corrupted channels
M = ⋃_{J=1}^{d} { m ⊂ {1, …, d} : #m = J } is the set of all possible combinations of indices

Log-likelihood
L(X; τ, m) = Σ_{j∈m} Z²_j(τ),   m ∈ M,   τ = 1, …, n,
where for k = 1, …, n the channels are merged,
Z²_j(k) = (1/k)(Σ_{i=1}^{k} X_ij)² + (1/(n−k))(Σ_{i=k+1}^{n} X_ij)² − (1/n)(Σ_{i=1}^{n} X_ij)²
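The merged statistic Z²_j(k) can be transcribed directly from the displayed formula. A sketch (our transcription, not the authors' code):

```python
import numpy as np

def z_squared(X, k):
    """Z_j^2(k) for all d channels at a candidate change-point 1 <= k <= n-1.

    Direct transcription of the formula: squared partial sums before and
    after k, minus the squared total sum, with weights 1/k, 1/(n-k), 1/n.
    """
    n = X.shape[0]
    s_left = X[:k].sum(axis=0)           # sum_{i=1}^{k} X_ij, shape (d,)
    s_right = X[k:].sum(axis=0)          # sum_{i=k+1}^{n} X_ij
    return s_left**2 / k + s_right**2 / (n - k) - (s_left + s_right)**2 / n

# Under pure noise each Z_j^2(k) is a chi-squared(1) variable (between-group
# sum of squares with one degree of freedom), so the values are nonnegative
# and average to about 1:
rng = np.random.default_rng(0)
X0 = rng.standard_normal((100, 200))
z = z_squared(X0, 50)
```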
Estimator
Ordered statistics Z²_(1)(k) > Z²_(2)(k) > · · · > Z²_(d)(k) for each k = 1, …, n.

Penalized likelihood for estimating the number of corrupted channels:
J*(τ) = argmax_{1≤J≤d} { Σ_{j=1}^{J} Z²_(j)(τ) − Pen(J) },
with the penalty chosen according to (Birgé and Massart, 2007),
Pen(J) = (1+α)J + 2J√x_J + 2J x_J,   x_J = log(de/J) + 2 log(J)/J,   α > 0.

Estimator of τ:
τ̂ = argmax_{1≤τ≤n} { Σ_{j=1}^{J*(τ)} Z²_(j)(τ) }.
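Putting the pieces together, a sketch of the full procedure as we read it off the slides (helper names are ours; α = 1 is an arbitrary choice):

```python
import numpy as np

def z_squared(X, k):
    """Per-channel statistic Z_j^2(k) from the log-likelihood slide."""
    n = X.shape[0]
    s1, s2 = X[:k].sum(axis=0), X[k:].sum(axis=0)
    return s1**2 / k + s2**2 / (n - k) - (s1 + s2)**2 / n

def pen(J, d, alpha=1.0):
    """Pen(J) = (1+α)J + 2J√x_J + 2J x_J with x_J = log(de/J) + 2 log(J)/J."""
    x = np.log(d * np.e / J) + 2.0 * np.log(J) / J
    return (1.0 + alpha) * J + 2.0 * J * np.sqrt(x) + 2.0 * J * x

def estimate_change_point(X, alpha=1.0):
    """For each k: order Z_(1)^2(k) >= ... >= Z_(d)^2(k), take the best
    penalised partial sum over J, then maximise the criterion over k."""
    n, d = X.shape
    penalties = pen(np.arange(1, d + 1), d, alpha)
    best_val, best_k = -np.inf, 0
    for k in range(1, n):
        z = np.sort(z_squared(X, k))[::-1]       # ordered statistics
        val = (np.cumsum(z) - penalties).max()   # max over J of the criterion
        if val > best_val:
            best_val, best_k = val, k
    return best_k

# Quick check on simulated data: change at τ = 30 in 5 of 50 channels, jump 2.
rng = np.random.default_rng(0)
n, d, tau = 100, 50, 30
theta_after = np.zeros(d)
theta_after[:5] = 2.0
X = np.where(np.arange(1, n + 1)[:, None] <= tau, 0.0, theta_after) \
    + rng.standard_normal((n, d))
tau_hat = estimate_change_point(X)
```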
Why penalization?
[Figure: penalized and non-penalized log-likelihood vs. j = 1, …, d, for J = 100, d = 500, n = 100, ∆θ_j ≈ 2.17, at τ/n = 1/4 and τ/n = 1/2]
The non-penalized log-likelihood keeps increasing with J
We subtract a penalty Pen(J) to penalize the number of "redundant" subsets
The choice of penalty
Pen(J) = (1 + α)J + 2J√x_J + 2J x_J,   α > 0,
x_J = log(de/J) + 2 log(J)/J

The number of subsets of size J is bounded by
(d choose J) ≤ exp(J log(de/J))
⇒ the choice of x_J: J x_J = J log(de/J) + 2 log J

Birgé and Massart condition: Σ_{m∈M: #m=J} exp(−J x_J) < ∞
⇒ "putting a prior finite measure on the list of models"
⇒ the term log(J)/J in x_J makes the sum finite
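The summability can be checked numerically: combining (d choose J) ≤ exp(J log(de/J)) with J x_J = J log(de/J) + 2 log J gives (d choose J)·e^{−J x_J} ≤ J⁻², so the sum over model sizes stays below π²/6 for any d. A quick illustrative check (the value d = 500 is our choice):

```python
import math

# Each size-J term is at most J**(-2): C(d, J) <= exp(J*log(d*e/J)) and
# J*x_J = J*log(d*e/J) + 2*log(J), hence C(d, J)*exp(-J*x_J) <= J**(-2).
d = 500
total = 0.0
for J in range(1, d + 1):
    x_J = math.log(d * math.e / J) + 2.0 * math.log(J) / J
    total += math.comb(d, J) * math.exp(-J * x_J)

# total is below sum_J J**(-2) < pi**2/6, uniformly in d.
```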
Log-likelihood
For each fixed k = 1, …, n the merged channels have a Gaussian distribution,
Z_j(k) = ∆θ_j μ(k) + ε ξ_j(k),   ξ_j(k) ∼ N(0, 1),   j = 1, …, d,
where
ξ_j(k) = √(n/(k(n−k))) · ( Σ_{i=1}^{k} ξ_ij − (k/n) Σ_{i=1}^{n} ξ_ij )
and
ξ_j(k) =_d B_j(t)/√(t(1−t)),   t = k/n.
Log-likelihood
The statistic follows a non-central χ² distribution,
Σ_{j∈m} Z²_j(k) ∼ χ²(#m, L(k)),   k = 1, …, n,
with #m degrees of freedom (the number of elements in m) and non-centrality parameter
L(k) = μ²(k) Σ_{j∈m} (θ′′_j − θ′_j)²,
where the function μ²(k) attains its maximum at τ:
argmax_{k=1,…,n} μ²(k) = τ.

[Figure: μ_τ(k), k = 1, …, 100, for τ = 10, 50, 80]
Estimation of quadratic functional
It is impossible to estimate consistently:
the true subset m* = {j1, …, jJ} of corrupted channels
the number of channels with a change J

We estimate the parameter of the non-central χ² distribution,
L(τ) = μ²(τ)‖∆θ‖²,
by the estimate (Laurent and Massart, 2000)
L̂(τ) = max_{1≤J≤d} { Σ_{j=1}^{J} Z²_(j)(τ) − Pen(J) }
⇒ Consistent estimation of L(k) is possible under certain conditions on ‖∆θ‖² and J.
Consistency of L̂(k)
Estimating L(k) = μ²(k)‖∆θ‖²:
Z_j(k) = ∆θ_j μ(k) + ε ξ_j(k),   ξ_j(k) ∼ N(0, 1),   j = 1, …, d,
L̂(k) = max_{1≤J≤d} { Σ_{j=1}^{J} Z²_(j)(k) − Pen(J) }

Lemma
For any t > 0,
P[ L̂(k) − L(k) − 2μ(k)⟨∆θ, ξ(k)⟩ > t ] ≤ C e^{−t/(2+1/α)}.
For any z > 0,
P[ L̂(k) − L(k) − 2μ(k)⟨∆θ, ξ(k)⟩ < −z − Q_d(k) ] ≤ e^{−z},
where
Q_d(k) = inf_m { (3/4) μ²(k)‖∆θ_{⊥m}‖² + Pen(#m) }.
Corollaries
Assume that for all k = 1, …, n,
lim_{d→∞} Q_d(k)/‖∆θ‖² = 0.
Then
L̂(k)/‖∆θ‖² →_P μ²(k),   d → ∞,
and L̂(k) is asymptotically normal,
L̂(k)/‖∆θ‖² − μ²(k) →_w N(0, 4μ²(k)),   d → ∞.
Then τ̂ is consistent:
τ̂ = argmax_{k=1,…,n} L̂(k) →_P argmax_{k=1,…,n} μ²(k) = τ
Consistency of τ
Let the number of channels with a change be J = d^β
Define ρ = min(τ/n, 1 − τ/n)
Assume that
lim_{d→∞} min(J log(de/J), √d) / (‖∆θ‖² ρ) = 0.

Consistency of τ̂
If for some K > 0,
δ_{n,d} = K · min(J log(de/J), √d) / (n‖∆θ‖² ρ),
then the estimator τ̂ is consistent:
P[ |τ̂ − τ| ≥ n δ_{n,d} ] → 0,   d → ∞.
Rate
The rate
δ_{n,d} ∼ min(J log(de/J), √d) / (n‖∆θ‖² ρ)
depends on:
the norm of the jump sizes ‖∆θ‖² = Σ_{j=1}^{d} (θ′_j − θ′′_j)²
the location of the change-point τ/n ⇒ ρ = min(τ/n, 1 − τ/n)
the number of channels d and the number of corrupted channels J
Rate
[Figure: risk E|t̂ − t0| vs. d = 10, …, 200, for n = 100, J = d, ∆θ_j = 0.5, at τ/n = 0.5, 0.6, 0.8, 0.9]
‖∆θ‖² = d/4,   J = d   ⇒   δ_{n,d} ∼ d^{−1/2}/(nρ)

[Figure: risk E|t̂ − t0| vs. d = 10, …, 200, for n = 100, J = 10, ∆θ_j = 1, at τ/n = 0.5, 0.6, 0.8, 0.9]
‖∆θ‖² = J,   J = d^β,   β < 1/2   ⇒   δ_{n,d} ∼ (1 + (1−β) log d)/(nρ)
Future work
Why Q_d(k) ∼ min(J log(de/J), √d)? ⇒ Construct an estimator with different penalties depending on whether J < √d or J > √d.

Asymptotic distribution of τ̂:
t_mle = argmax_{t∈[a,b]} { ‖∆θ‖φ(t) · ( ‖∆θ‖φ(t) + (1/√n) · B(t)/√(t(1−t)) ) },
where
φ(t) = √(t(1−t)) · { (1−t0)/(1−t), t ≤ t0;  t0/t, t > t0 }

Optimality ⇒ lower bounds:
lim_{d→∞} inf_{t̂} sup_{t0∈[a,b]} sup_{∆θ∈Θ} δ_{n,d}^{−1} E_{∆θ}|t̂ − t0| ≥ C0.