A Bernstein inequality for dependent random matrices

Marwa Banna, with F. Merlevède and P. Youssef
Saarland University

Analytic Versus Combinatorial in Free Probability, Banff, December 2016

Introduction

- Let $(X_k)_k$ be a sequence of i.i.d. random variables with common mean $\mu$ and variance $\sigma^2$.
- By the central limit theorem,
$$\sqrt{n}\,\Big(\frac{1}{n}\sum_{k=1}^{n} X_k - \mu\Big) \xrightarrow[n\to\infty]{\mathcal{L}} \mathcal{N}(0,\sigma^2) =: Y,$$
and consequently
$$\mathbb{P}\Big(\sum_{k=1}^{n}(X_k-\mu) > x\sqrt{n}\Big) \xrightarrow[n\to\infty]{} \mathbb{P}(Y > x) \leq \exp\Big(-\frac{x^2}{2\sigma^2}\Big).$$
- For any fixed $n$,
$$\mathbb{P}\Big(\sum_{k=1}^{n}(X_k-\mu) > x\Big) \leq \;?$$
- A long history: Cramér, Bennett, Hoeffding, Nagaev, Bernstein, ...

Marwa BANNA | Bernstein inequality for dependent random matrices
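The Gaussian tail bound $\mathbb{P}(Y > x) \leq e^{-x^2/(2\sigma^2)}$ quoted above is easy to check numerically; the following quick sanity check (not part of the talk) evaluates the exact Gaussian tail via the complementary error function:

```python
import math

# P(N(0, sigma^2) > x) = 0.5 * erfc(x / (sigma * sqrt(2))),
# compared against the tail bound exp(-x^2 / (2 sigma^2)).
sigma = 1.5
for x in [0.5, 1.0, 2.0, 4.0]:
    tail = 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))
    bound = math.exp(-x**2 / (2.0 * sigma**2))
    assert tail <= bound
    print(f"x={x}: P(Y > x) = {tail:.4g} <= {bound:.4g}")
```

The bound holds for all $x \geq 0$ (at $x = 0$ it reads $1/2 \leq 1$) and becomes tight in the exponent as $x \to \infty$.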

Bernstein inequality (scalar independent case)

Let $X_1,\dots,X_n$ be independent random variables such that
$$\mathbb{E}X_k = 0,\quad \mathbb{E}X_k^2 = \sigma_k^2 \quad\text{and}\quad \sup_k |X_k| \leq 1 \ \text{a.s.}$$
For any $x > 0$,
$$\mathbb{P}\Big(\sum_{k=1}^{n} X_k > x\Big) \leq \exp\Big(-\frac{x^2}{2nV_n + 2x/3}\Big),\qquad\text{where } V_n = \frac{1}{n}\sum_{k=1}^{n}\sigma_k^2.$$
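A small Monte Carlo sanity check of the scalar bound, using the standard Bernstein denominator $2nV_n + 2x/3$ for variables bounded by $1$. The choice of uniform variables on $[-1,1]$ (so $\sigma_k^2 = 1/3$), the seed, and the parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, x = 200, 20000, 15.0

# Bounded centered variables: uniform on [-1, 1], so |X_k| <= 1 and sigma_k^2 = 1/3.
samples = rng.uniform(-1.0, 1.0, size=(trials, n))
sums = samples.sum(axis=1)

emp_tail = (sums > x).mean()                       # empirical P(sum X_k > x)
V_n = 1.0 / 3.0                                    # V_n = (1/n) sum_k sigma_k^2
bound = np.exp(-x**2 / (2.0 * n * V_n + 2.0 * x / 3.0))

print(f"empirical tail = {emp_tail:.4f}, Bernstein bound = {bound:.4f}")
assert emp_tail <= bound
```

The empirical tail sits well below the bound here, as expected: Bernstein's inequality is not tight for moderate deviations.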

Matrix Setting

Let $X_1,\dots,X_n$ be $d\times d$ centered Hermitian random matrices. What can be said about
$$\mathbb{P}\Big(\lambda_{\max}\Big(\sum_{k=1}^{n} X_k\Big) > x\Big) \leq \;?$$

Independent Matrix Case

Theorem (Ahlswede and Winter '02, Tropp '12). Let $X_1,\dots,X_n$ be $d\times d$ independent Hermitian random matrices. Assume that each matrix satisfies
$$\mathbb{E}X_k = 0 \quad\text{and}\quad \lambda_{\max}(X_k) \leq 1 \ \text{a.s.}$$
Then for any $x > 0$,
$$\mathbb{P}\Big(\lambda_{\max}\Big(\sum_{k=1}^{n} X_k\Big) \geq x\Big) \leq d\cdot\exp\Big(-\frac{x^2/2}{n\sigma^2 + x/3}\Big),$$
where $\sigma^2 := \frac{1}{n}\,\lambda_{\max}\Big(\sum_{k=1}^{n}\mathbb{E}X_k^2\Big)$.
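A minimal Monte Carlo check of the matrix Bernstein bound in the simplest non-trivial model, $X_k = \varepsilon_k A$ with i.i.d. Rademacher signs $\varepsilon_k$ and a fixed Hermitian $A$ of spectral norm $1$ (my illustrative choice, not from the talk); in this model $\sigma^2 = \lambda_{\max}(A^2) = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, trials, x = 5, 100, 2000, 30.0

# Fixed Hermitian A with spectral norm 1, so E X_k = 0 and lambda_max(X_k) <= 1 a.s.
G = rng.standard_normal((d, d))
A = (G + G.T) / 2
A /= np.abs(np.linalg.eigvalsh(A)).max()

sigma2 = np.linalg.eigvalsh(A @ A).max()           # here (1/n) lmax(sum E X_k^2) = lmax(A^2) = 1
eps = rng.choice([-1.0, 1.0], size=(trials, n))
# sum_k X_k = (sum_k eps_k) A, so lambda_max is cheap to evaluate per trial
lmax = np.array([np.linalg.eigvalsh(s * A).max() for s in eps.sum(axis=1)])

emp_tail = (lmax >= x).mean()
bound = d * np.exp(-(x**2 / 2.0) / (n * sigma2 + x / 3.0))

print(f"empirical tail = {emp_tail:.4f}, matrix Bernstein bound = {bound:.4f}")
assert emp_tail <= bound
```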

Applications?!

- Let $Y\in\mathbb{R}^d$ be an isotropic random vector, i.e.
$$\mathbb{E}Y = 0 \quad\text{and}\quad \mathbb{E}YY^t = I_d.$$
In particular, $\mathbb{E}\|Y\|_2^2 = \mathbb{E}\,\mathrm{Tr}(YY^t) = d$.
- Let $Y_1,\dots,Y_n$ be independent copies of $Y$.
- If $d$ is fixed then, by the law of large numbers,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} Y_k Y_k^t = I_d \quad\text{a.s.}$$

Question: What is the minimal order of $n$ so that $\Big\|\frac{1}{n}\sum_{k=1}^{n} Y_k Y_k^t - I_d\Big\| < \varepsilon$?

Applications?!

- Set $X_k = \frac{1}{n}\big(Y_k Y_k^t - I_d\big)$.
- $\mathbb{E}X_k = 0$.
- $\|Y_k\|_2 = O(\sqrt{d}) \;\Rightarrow\; \|X_k\| = O\big(\frac{d}{n}\big)$.

Control of $\sigma^2 = \frac{1}{n}\,\lambda_{\max}\big(\sum_{k=1}^{n}\mathbb{E}X_k^2\big)$:

- $\mathbb{E}X_k^2 = \frac{1}{n^2}\,\mathbb{E}\big[\|Y_k\|_2^2\, Y_k Y_k^t + I_d - 2Y_k Y_k^t\big] \preceq \frac{d}{n^2}\, I_d$.
- Hence $\sigma^2 = O\big(\frac{d}{n^2}\big)$.

Bernstein's inequality then yields
$$\mathbb{P}\Big(\Big\|\frac{1}{n}\sum_{k=1}^{n} Y_k Y_k^t - I_d\Big\| \geq \sqrt{\frac{d\log d}{n}}\,\Big) = O(d^{-1}).$$

Conclusion: $n \sim d\log d$ copies are sufficient.
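A minimal simulation of the covariance-estimation application. Vectors drawn uniformly on the sphere and rescaled to radius $\sqrt{d}$ satisfy the isotropy and $\|Y\|_2 = \sqrt{d}$ assumptions exactly; the constant $10$ in front of $d\log d$ and the seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 50
n = int(10 * d * np.log(d))                        # n ~ d log d, with a modest constant

# Isotropic vectors with ||Y||_2 = sqrt(d) exactly: uniform on the sphere, rescaled.
G = rng.standard_normal((n, d))
Y = np.sqrt(d) * G / np.linalg.norm(G, axis=1, keepdims=True)

emp_cov = Y.T @ Y / n
err = np.linalg.norm(emp_cov - np.eye(d), 2)       # operator-norm deviation
rate = np.sqrt(d * np.log(d) / n)

print(f"n = {n}, ||emp_cov - I|| = {err:.3f}, sqrt(d log d / n) = {rate:.3f}")
assert err <= 3.0 * rate                           # deviation is O(sqrt(d log d / n))
```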

Dependent Matrix Case

- The $\beta$-mixing coefficient between two $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$ is defined by
$$\beta(\mathcal{A},\mathcal{B}) = \frac{1}{2}\,\sup\Big\{\sum_{i\in I}\sum_{j\in J}\big|\mathbb{P}(A_i\cap B_j) - \mathbb{P}(A_i)\,\mathbb{P}(B_j)\big|\Big\},$$
where the supremum is taken over all finite partitions $(A_i)_{i\in I}$ and $(B_j)_{j\in J}$ that are respectively $\mathcal{A}$- and $\mathcal{B}$-measurable.
- The mixing sequence $(\beta_k)_k$ associated with $(X_k)_k$ is defined by
$$\beta_k := \sup_j \beta\big(\sigma(X_i,\ i\leq j),\ \sigma(X_i,\ i\geq j+k)\big).$$
- The sequence is said to be geometrically $\beta$-mixing if $\beta_k \leq e^{-ck}$ for some positive constant $c$.
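As an illustration (not from the talk) of geometric $\beta$-mixing: for a stationary finite-state Markov chain, the coefficients $\beta_k$ decay at the same geometric rate as the total-variation distance between the $k$-step kernel and the stationary law. The sketch below uses that TV distance as a proxy for $\beta_k$ in a two-state chain, whose rate is the second eigenvalue $0.7$ of the transition matrix:

```python
import numpy as np

# Two-state Markov chain: a standard example of geometric beta-mixing.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])              # stationary distribution: pi P = pi

def tv_mixing(k):
    # max over starting states of TV(P^k(x, .), pi), a proxy for beta_k.
    Pk = np.linalg.matrix_power(P, k)
    return 0.5 * np.abs(Pk - pi).sum(axis=1).max()

betas = np.array([tv_mixing(k) for k in range(1, 11)])
ratios = betas[1:] / betas[:-1]

print("beta proxies:", np.round(betas, 4))
assert np.allclose(pi @ P, pi)                     # pi is stationary
assert np.allclose(ratios, 0.7)                    # geometric decay at rate 0.7
```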

Dependent Matrix Case

Theorem (B., Merlevède and Youssef '15). Let $(X_k)_{k\geq 1}$ be a family of geometrically $\beta$-mixing random matrices of dimension $d$. Assume that
$$\mathbb{E}(X_k) = 0 \quad\text{and}\quad \lambda_{\max}(X_k) \leq 1 \ \text{a.s.}$$
Then for any $x > 0$,
$$\mathbb{P}\Big(\lambda_{\max}\Big(\sum_{k=1}^{n} X_k\Big) \geq x\Big) \leq d\,\exp\Big(-\frac{Cx^2}{nv^2 + xc^{-1}(\log n)^2}\Big),$$
where $C$ is a universal constant and $v^2$ is given by
$$v^2 = \sup_{J\subseteq\{1,\dots,n\}} \frac{1}{\mathrm{Card}\,J}\,\lambda_{\max}\Big(\sum_{k,\ell\in J}\mathrm{Cov}(X_k, X_\ell)\Big).$$

Consequence

Let $A$ be a $d\times n$ random matrix whose columns $Y_1,\dots,Y_n$ satisfy:

- $(Y_i)_i$ are isotropic, i.e. $\mathbb{E}Y_i = 0$ and $\mathbb{E}Y_iY_i^t = I_d$;
- $\|Y_i\|_2 \leq \sqrt{d}$ a.s.;
- $(Y_i)_{i\leq n}$ are geometrically $\beta$-mixing.

Recall that $AA^T = \sum_{k=1}^{n} Y_k Y_k^T$ and $\lambda_k(AA^T) = s_k^2(A)$. Then, with high probability,
$$1 - \sqrt{d\log^3 d/n} \;\leq\; s_{\min}\Big(\frac{A}{\sqrt{n}}\Big) \;\leq\; s_{\max}\Big(\frac{A}{\sqrt{n}}\Big) \;\leq\; 1 + \sqrt{d\log^3 d/n}.$$

If $d\log^3 d \ll n$, then $\frac{1}{\sqrt{n}}A$ is "almost" an isometry.
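A quick numerical illustration of the singular-value consequence, using independent isotropic columns (the special case $\beta_k = 0$) with $\|Y_i\|_2 = \sqrt{d}$; the dimensions and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 30, 9000                                    # d log^3 d ~ 1.2e3 << n

# Independent isotropic columns with ||Y_i||_2 = sqrt(d) exactly.
G = rng.standard_normal((d, n))
A = np.sqrt(d) * G / np.linalg.norm(G, axis=0, keepdims=True)

s = np.linalg.svd(A / np.sqrt(n), compute_uv=False)
eps = np.sqrt(d * np.log(d) ** 3 / n)

print(f"s_min = {s.min():.3f}, s_max = {s.max():.3f}, 1 +/- {eps:.3f}")
assert 1.0 - eps <= s.min() <= s.max() <= 1.0 + eps
```

All singular values of $A/\sqrt{n}$ indeed land in the window $1 \pm \sqrt{d\log^3 d/n}$.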

Matrix Chernoff Bound

Ahlswede and Winter (2002) prove the following matrix Chernoff bound:
$$\mathbb{P}\Big(\lambda_{\max}\Big(\sum_{k=1}^{n} X_k\Big) > x\Big) \leq \inf_{t>0}\Big\{e^{-tx}\cdot\mathbb{E}\,\mathrm{Tr}\exp\Big(t\sum_{i=1}^{n} X_i\Big)\Big\}.$$

Indeed, for any $t > 0$, Markov's inequality gives
$$\mathbb{P}\Big(\lambda_{\max}\Big(\sum_{k=1}^{n} X_k\Big) > x\Big) = \mathbb{P}\Big(e^{\lambda_{\max}(t\sum_{k=1}^{n} X_k)} > e^{tx}\Big) \leq e^{-tx}\cdot\mathbb{E}\,\lambda_{\max}\big(e^{t\sum_{k=1}^{n} X_k}\big) \qquad \big[\lambda_{\max}(e^A) = e^{\lambda_{\max}(A)}\big]$$
$$\leq e^{-tx}\cdot\mathbb{E}\,\mathrm{Tr}\big(e^{t\sum_{k=1}^{n} X_k}\big) \qquad \big[e^A \succ 0\big].$$

Aim: give a suitable bound for the Laplace transform
$$L_n(t) := \mathbb{E}\,\mathrm{Tr}\exp\Big(t\sum_{i=1}^{n} X_i\Big).$$
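The two facts used in the derivation, the spectral mapping identity $\lambda_{\max}(e^A) = e^{\lambda_{\max}(A)}$ and $\lambda_{\max}(M) \leq \mathrm{Tr}(M)$ for $M \succ 0$, can be sanity-checked numerically (a quick illustration, not part of the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
G = rng.standard_normal((4, 4))
A = (G + G.T) / 2                                  # real symmetric (Hermitian)

w, V = np.linalg.eigh(A)
eA = (V * np.exp(w)) @ V.T                         # e^A via the spectral theorem

# Spectral mapping: lambda_max(e^A) = e^{lambda_max(A)}
assert np.isclose(np.linalg.eigvalsh(eA).max(), np.exp(w.max()))

# e^A is positive definite, so lambda_max(e^A) <= Tr(e^A)
assert np.linalg.eigvalsh(eA).min() > 0
assert np.linalg.eigvalsh(eA).max() <= np.trace(eA) + 1e-12
```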

On the matrix exponential

- The matrix exponential is given by
$$A \mapsto e^A := I_d + \sum_{k=1}^{\infty}\frac{A^k}{k!}.$$
- The matrix exponential is positive definite, but it is NOT increasing,
$$A \succeq B \;\not\Rightarrow\; A^k \succeq B^k,$$
NOR convex,
$$\Big(\frac{1}{2}A + \frac{1}{2}B\Big)^k \not\preceq \frac{1}{2}A^k + \frac{1}{2}B^k \quad\text{for } k > 2.$$
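A concrete $2\times 2$ counterexample (my choice, not from the talk) showing that $A \succeq B$ does not imply $A^2 \succeq B^2$:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])

def is_psd(M, tol=1e-12):
    return np.linalg.eigvalsh(M).min() >= -tol

assert is_psd(A - B)                               # A - B has eigenvalues 0 and 2, so A >= B
assert not is_psd(A @ A - B @ B)                   # but A^2 - B^2 has a negative eigenvalue
print(np.linalg.eigvalsh(A @ A - B @ B))           # eigenvalues of [[4, 3], [3, 2]]: one is negative
```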

On the trace exponential

- The trace exponential is increasing and convex:
$$A \preceq B \implies \mathrm{Tr}\exp(A) \leq \mathrm{Tr}\exp(B),$$
and for any $t\in[0,1]$,
$$\mathrm{Tr}\exp\big(tA + (1-t)B\big) \leq t\,\mathrm{Tr}\exp(A) + (1-t)\,\mathrm{Tr}\exp(B).$$
- Jensen's inequality for the trace exponential yields
$$\mathrm{Tr}\exp(\mathbb{E}A) \leq \mathbb{E}\,\mathrm{Tr}\exp(A).$$
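Both properties of the trace exponential can be stress-tested numerically on random symmetric pairs with $A \preceq B$ (built by adding a PSD matrix $RR^T$ to $A$); a minimal sketch, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(6)

def trexp(M):
    return np.exp(np.linalg.eigvalsh(M)).sum()     # Tr exp(M) for symmetric M

for _ in range(200):
    G = rng.standard_normal((4, 4))
    A = (G + G.T) / 2
    R = rng.standard_normal((4, 4))
    B = A + R @ R.T                                # B >= A since R R^T is PSD

    # Monotonicity: A <= B  =>  Tr exp(A) <= Tr exp(B)
    assert trexp(A) <= trexp(B) + 1e-9

    # Convexity of Tr exp
    t = rng.uniform()
    assert trexp(t * A + (1 - t) * B) <= t * trexp(A) + (1 - t) * trexp(B) + 1e-9
```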

The Golden–Thompson Inequality

The matrix exponential does not convert sums into products:
$$e^{A+B} \neq e^A\cdot e^B \quad\text{in general.}$$
The Golden–Thompson (1965) inequality:
$$\mathrm{Tr}\big(e^{A+B}\big) \leq \mathrm{Tr}\big(e^A\cdot e^B\big).$$
This inequality fails for more than two matrices:
$$\mathrm{Tr}\big(e^{A+B+C}\big) \not\leq \mathrm{Tr}\big(e^A\cdot e^B\cdot e^C\big).$$
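A numerical check (not from the talk) of the two-matrix statements: for random non-commuting symmetric $A, B$, the matrices $e^{A+B}$ and $e^A e^B$ differ, yet Golden–Thompson holds for their traces:

```python
import numpy as np

rng = np.random.default_rng(5)

def sym_exp(M):
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T                   # e^M via the spectral theorem

def trexp(M):
    return np.exp(np.linalg.eigvalsh(M)).sum()     # Tr exp(M)

for _ in range(200):
    G1, G2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
    A, B = (G1 + G1.T) / 2, (G2 + G2.T) / 2

    # e^{A+B} != e^A e^B for non-commuting A, B ...
    assert not np.allclose(sym_exp(A + B), sym_exp(A) @ sym_exp(B))
    # ... yet Golden-Thompson holds: Tr e^{A+B} <= Tr(e^A e^B)
    assert trexp(A + B) <= np.trace(sym_exp(A) @ sym_exp(B)) + 1e-9
```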

Independent Matrix Case: Ahlswede–Winter's approach

Let $X_1,\dots,X_n$ be $d\times d$ mutually independent centered Hermitian random matrices.

Aim: control $L_n(t) = \mathbb{E}\,\mathrm{Tr}\exp\big(t\sum_{k=1}^{n} X_k\big)$.

$$\mathbb{E}\,\mathrm{Tr}\exp\Big(t\sum_{k=1}^{n} X_k\Big) \leq \mathbb{E}\,\mathrm{Tr}\Big(e^{tX_n}\cdot e^{t\sum_{k=1}^{n-1} X_k}\Big) \qquad\text{(Golden–Thompson)}$$
$$= \mathrm{Tr}\Big(\mathbb{E}\big(e^{tX_n}\big)\cdot\mathbb{E}\big(e^{t\sum_{k=1}^{n-1} X_k}\big)\Big) \qquad\text{(independence)}$$
$$\leq \lambda_{\max}\big(\mathbb{E}e^{tX_n}\big)\cdot\mathbb{E}\,\mathrm{Tr}\big(e^{t\sum_{k=1}^{n-1} X_k}\big),$$
where the last inequality follows from $\mathbb{E}\big(e^{tX_n}\big) \preceq \lambda_{\max}\big(\mathbb{E}(e^{tX_n})\big)\cdot I_d$. Iterating this procedure:
$$\mathbb{E}\,\mathrm{Tr}\exp\Big(t\sum_{k=1}^{n} X_k\Big) \leq d\,\prod_{k=1}^{n}\lambda_{\max}\big(\mathbb{E}e^{tX_k}\big).$$
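The iterated bound can be verified exactly (no Monte Carlo error) in the Rademacher model $X_k = \varepsilon_k A_k$ with fixed Hermitian $A_k$ (an illustrative choice): there $\mathbb{E}\,e^{tX_k} = \cosh(tA_k)$, and the expectation on the left is a finite average over all $2^n$ sign patterns:

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
d, n, t = 3, 6, 0.5

def rand_sym(d):
    G = rng.standard_normal((d, d))
    return (G + G.T) / 2

def sym_fun(M, f):
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T                        # f(M) via the spectral theorem

A = [rand_sym(d) for _ in range(n)]

# LHS: exact expectation of Tr exp(t sum_k eps_k A_k) over all 2^n sign patterns.
lhs = np.mean([
    np.exp(np.linalg.eigvalsh(t * sum(e * Ak for e, Ak in zip(eps, A)))).sum()
    for eps in itertools.product([-1.0, 1.0], repeat=n)
])

# RHS: d * prod_k lambda_max(E e^{t X_k}), with E e^{t eps A} = cosh(t A).
rhs = d * np.prod([np.linalg.eigvalsh(sym_fun(t * Ak, np.cosh)).max() for Ak in A])

print(f"E Tr exp = {lhs:.3f} <= d * prod lambda_max = {rhs:.3f}")
assert lhs <= rhs + 1e-9
```

The factor $d$ and the product over $k$ make the right-hand side rather loose, which is the price the Ahlswede–Winter iteration pays.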

Construction of Cantor-type sets

$X_1,\dots,X_n$ geometrically $\beta$-mixing.

Aim: bound $\mathbb{E}\,\mathrm{Tr}\exp\big(t\sum_{k=1}^{n} X_k\big)$.

[Figure: construction of the Cantor-type set $K_n \subseteq \{1,\dots,n\}$ — at each step the remaining intervals $I_{j,1},\dots,I_{j,2^j}$ are split by removing central gaps of lengths $d_0, d_1, d_2,\dots$]

Aim: control the Laplace transform on the Cantor set $K_n$:
$$\mathbb{E}\,\mathrm{Tr}\exp\Big(t\sum_{k\in K_n} X_k\Big) \leq \;?$$

Dependent matrix case: control of the Laplace transform

- Control: $\mathbb{E}\,\mathrm{Tr}\big(e^{tS_1+tS_2+tS_3+tS_4}\big) \leq \;?$
- Scalar setting:
$$\mathbb{E}\big(e^{tS_1+tS_2+tS_3+tS_4}\big) \leq \mathbb{E}\big(e^{tS_1+tS_2}\big)\cdot\mathbb{E}\big(e^{tS_3+tS_4}\big)\big(1+\varepsilon(d_{2,3})\big)$$
- Matrix setting:
$$\mathbb{E}\,\mathrm{Tr}\big(e^{tS_1+tS_2+tS_3+tS_4}\big) \leq \mathrm{Tr}\Big(\mathbb{E}\big(e^{tS_1+tS_2}\big)\cdot\mathbb{E}\big(e^{tS_3+tS_4}\big)\Big)\big(1+\varepsilon(d_{2,3})\big)$$

We shall prove instead:
$$\mathbb{E}\,\mathrm{Tr}\big(e^{tS_1+tS_2+tS_3+tS_4}\big) \leq \mathbb{E}\,\mathrm{Tr}\big(e^{tS_1^*+tS_2^*+tS_3^*+tS_4^*}\big)\big(1+\varepsilon(d_{1,2},d_{2,3},d_{3,4})\big)$$


Thank you for your attention!


References

[1] Ahlswede, R. and Winter, A. (2002). Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48, no. 3, 569-579.

[2] Banna, M., Merlevède, F. and Youssef, P. (2015). Bernstein-type inequality for a class of dependent random matrices. In revision: Random Matrices: Theory and Applications.

[3] Merlevède, F., Peligrad, M. and Rio, E. (2009). Bernstein inequality and moderate deviations under strong mixing conditions. High Dimensional Probability V: The Luminy Volume, 273-292, Inst. Math. Stat. Collect., 5, Inst. Math. Statist., Beachwood, OH.

[4] Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math., 12, no. 4, 389-434.