
High-dimensional covariance matrix estimation with applications to microarray studies and

portfolio optimization

Esa Ollila

Department of Signal Processing and Acoustics, Aalto University, Finland

June 7th, 2018, Gipsa-Lab


Covariance estimation problem

x : p-variate (centered) random vector (p large)

x1, . . . ,xn i.i.d. realizations of x.

Problem: Find an estimate Σ̂ of the positive definite covariance matrix

Σ = E[(x − E[x])(x − E[x])^⊤] ∈ S^{p×p}_{++}

The sample covariance matrix (SCM),

S = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)(x_i − x̄)^⊤,

is the most commonly used estimator of Σ.

Challenges in HD:

1 Insufficient sample support (ISS) case: p > n ⇒ an estimate of Σ^{−1} cannot be computed! (See the sketch below.)

2 Low sample support (LSS) case (i.e., p of the same magnitude as n) ⇒ Σ is estimated with large error.

3 Outliers or heavy-tailed non-Gaussian data
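A minimal numpy sketch of the ISS case (variable names are ours): with p > n the SCM has rank at most n − 1, so it is singular and Σ^{−1} cannot be estimated by S^{−1}.

    import numpy as np

    rng = np.random.default_rng(0)
    p, n = 100, 40                       # ISS case: p > n
    X = rng.standard_normal((n, p))      # n i.i.d. p-variate samples
    S = np.cov(X, rowvar=False)          # SCM with 1/(n-1) normalization
    print(np.linalg.matrix_rank(S))      # n - 1 = 39 < p, so S is singular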

2/40


Why covariance estimation?

The most important applications:
- portfolio selection
- discriminant analysis / clustering
- graphical models
- PCA
- radar detection

[Application collage adapted from Ilya Soloveychik (HUJI), Robust Covariance Estimation.]

Gaussian graphical model

p-dimensional Gaussian vector

x = (x_1, . . . , x_p) ∼ N(0, Σ)

x_i, x_j are conditionally independent (given the rest of x) iff

(Σ^{−1})_{ij} = 0

Modeled as an undirected graph with p nodes; the edge {i, j} is absent iff (Σ^{−1})_{ij} = 0. Example with p = 5 nodes:

Σ^{−1} =
[ • • 0 • •
  • • • 0 •
  0 • • • 0
  • 0 • • 0
  • • 0 0 • ]

13/40
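A small sketch of this correspondence with hypothetical numbers: build a positive definite precision matrix with the zero pattern above and read the graph off its support (the values are illustrative only, not from the talk).

    import numpy as np

    # Hypothetical precision matrix with the slide's zero pattern
    # (strictly diagonally dominant, hence positive definite).
    Theta = np.array([[2. , 0.5, 0. , 0.5, 0.5],
                      [0.5, 2. , 0.5, 0. , 0.5],
                      [0. , 0.5, 2. , 0.5, 0. ],
                      [0.5, 0. , 0.5, 2. , 0. ],
                      [0.5, 0.5, 0. , 0. , 2. ]])
    Sigma = np.linalg.inv(Theta)
    P = np.linalg.inv(Sigma)             # recover the precision matrix from Sigma
    edges = [(i + 1, j + 1) for i in range(5) for j in range(i + 1, 5)
             if abs(P[i, j]) > 1e-10]    # edge {i, j} present iff (Sigma^{-1})_{ij} != 0
    print(edges)                         # pairs {1,3}, {2,4}, {3,5}, {4,5} are absent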


Bias-variance trade-off

Frobenius norm: ‖A‖²_F = tr(A²) for symmetric A.

Any estimator Σ̂ ∈ S^{p×p}_{++} of Σ verifies

MSE(Σ̂) = var(Σ̂) + bias(Σ̂),

where MSE(Σ̂) ≜ E[‖Σ̂ − Σ‖²_F]. Suppose Σ̂ = S. Then, since S is unbiased, i.e., bias(Σ̂) = ‖E[Σ̂] − Σ‖²_F = 0, one has that

MSE(Σ̂) = var(S) (= tr{cov(vec(S))})

✗ However, S is not applicable for high-dimensional problems:

  n ≈ p ⇒ var(S) is very large;
  p > n or p ≫ n ⇒ S is singular (non-invertible).

✓ Use an estimator Σ̂ = S_β that shrinks S towards a structure (e.g., a scaled identity matrix) using a tuning (shrinkage) parameter β:

  MSE(Σ̂) can be reduced by introducing some bias!
  Positive definiteness of Σ̂ can be ensured!

4/40



Bias-variance trade-off (cont’d)

Regularized SCM (RSCM) à la Ledoit and Wolf

S_β = βS + (1 − β)[tr(S)/p] I,

where β ∈ [0, 1] denotes the shrinkage (regularization) parameter.

[Figure: bias-variance trade-off — total error, variance, and squared bias as functions of the tuning parameter.]

5/40
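A minimal sketch of the RSCM mapping above (numpy; the function name is ours):

    import numpy as np

    def rscm(X, beta):
        """S_beta = beta*S + (1 - beta)*(tr(S)/p)*I, beta in [0, 1]."""
        p = X.shape[1]
        S = np.cov(X, rowvar=False)
        eta_hat = np.trace(S) / p        # estimated scale tr(S)/p
        return beta * S + (1.0 - beta) * eta_hat * np.eye(p)

For any β < 1 (and tr(S) > 0) the result is positive definite, hence invertible, even when p > n.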


This talk: choose β optimally under random sampling from an elliptical distribution + applications

Esa Ollila. Optimal high-dimensional shrinkage covariance estimation for elliptical distributions. In EUSIPCO'17, Kos, Greece, 2017, pp. 1689–1693.

Esa Ollila and Elias Raninen. Optimal shrinkage covariance matrix estimator under random sampling from elliptical distributions. In ArXiv, soon.

M. N. Tabassum and E. Ollila. Compressive regularized discriminant analysis of high-dimensional data with applications to microarray studies. In ICASSP'18, Calgary, Canada, 2018, pp. 4204–4208.

6/40


1 Portfolio optimization

2 Ell-RSCM estimators

3 Estimates of oracle parameters

4 Simulation studies

5 Compressive Regularized Discriminant Analysis


Modern portfolio theory (MPT)

A mathematical framework for portfolio allocation that balances the return-risk trade-off (Markowitz, 1952).

Important inputs by Tobin (1958), Sharpe (1964), and Malkiel and Fama (1970).∗

A portfolio consists of p assets, e.g.:
- equity securities (stocks), market indexes
- fixed-income securities (e.g., government or corporate bonds), commodities (e.g., oil, gold), ETFs
- currencies (exchange rates), derivatives, . . .

To use MPT one needs to estimate the mean vector µ and the covariance matrix Σ of asset returns.

✗ Often p, the number of assets, is larger than (or of similar magnitude to) n, the number of historical returns
⇒ the standard MPT plug-in estimates are not applicable.

∗Nobel prize winners: James Tobin (1981), Harry Markowitz (1990), William F. Sharpe (1990), and Eugene F. Fama (2013)

7/40


Basic definitions

Portfolio weight at (discrete) time index t:

w_t = (w_{t,1}, . . . , w_{t,p})^⊤  s.t.  ∑_{i=1}^{p} w_{t,i} = 1^⊤ w_t = 1

Let C_{i,t} > 0 be the price of the ith asset at time t.

The net return of the ith asset over one interval is

r_{i,t} = (C_{i,t} − C_{i,t−1}) / C_{i,t−1} = C_{i,t}/C_{i,t−1} − 1 ∈ [−1, ∞)

The single-period net returns of the p assets form a p-variate vector

r_t = (r_{1,t}, . . . , r_{p,t})^⊤

The portfolio net return at period t + 1 is

R_{t+1} = w_t^⊤ r_{t+1} = ∑_{i=1}^{p} w_{t,i} r_{i,t+1}

8/40
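A small numpy sketch of these definitions (the prices and weights below are made up for illustration):

    import numpy as np

    C = np.array([[100.0, 50.0],     # prices of p = 2 assets at t = 0
                  [102.0, 49.0],     # t = 1
                  [101.0, 51.5]])    # t = 2
    r = C[1:] / C[:-1] - 1.0         # net returns r_{i,t} = C_{i,t}/C_{i,t-1} - 1
    w = np.array([0.6, 0.4])         # portfolio weights, 1^T w = 1
    R = r @ w                        # portfolio net return per period
    print(r, R)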


Basic definitions (cont’d)

Given historical returns {r_t}_{t=1}^{n}, assume that

µ = E[r_t] and Σ ≡ cov(r_t) = E[(r_t − µ)(r_t − µ)^⊤]

hold for all t (so drop the index t from the subscript).

Let r denote the (random) vector of returns; then the expected return of the portfolio is

µ_w = E[R] = w^⊤µ

and the variance (or risk) of the portfolio is

σ²_w = var(R) = w^⊤Σw.

The standard deviation σ_w = √var(R) is called the volatility.

Global minimum variance portfolio (GMVP) optimization problem:

minimize_{w∈R^p} w^⊤Σw subject to 1^⊤w = 1.

The solution is w_o = Σ^{−1}1 / (1^⊤Σ^{−1}1).

9/40
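The GMVP solution in code, a sketch that solves a linear system instead of forming Σ^{−1} explicitly:

    import numpy as np

    def gmvp_weights(Sigma):
        """w_o = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
        ones = np.ones(Sigma.shape[0])
        x = np.linalg.solve(Sigma, ones)   # x = Sigma^{-1} 1
        return x / (ones @ x)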


S&P 500 and Nasdaq-100 indexes for year 2017

[Figure: S&P 500 and NASDAQ-100 index prices and daily net returns, Jan 3 – Dec 29, 2017.]

10/40


Are historical returns Gaussian?

QQ-plots and estimated pdfs

[Figure: QQ-plots of daily returns against standard normal quantiles, and estimated pdfs, for the S&P 500, the NASDAQ-100, and an N(0,1) reference sample.]

11/40


Are historical returns Gaussian (cont’d)?

Scatter plots and estimated 99%, 95% and 50% tolerance ellipses

[Figure: scatter plot of the daily returns with estimated 99%, 95%, and 50% tolerance ellipses.]

Inside the 50% ellipse: 65.6% of returns; inside the 95% and 99% ellipses: 93.2% and 95.6% of returns, respectively.

12/40


Stocks are unpredictable (and non-Gaussian)!

[Figure: realized portfolio performance over time, shorting allowed vs. long-only.]

TECH stocks (Facebook, Apple, Amazon, Microsoft, Google) dropped drastically (in seconds) due to a "fat finger" trade or an automated trade.

13/40


Real data analysis

We apply the GMVP to real stock data consisting of dividend-adjusted daily closing prices, from which daily net returns are computed.

Data sets:

1 p = 45 stocks in the Hang Seng Index (HSI), Jan. 4, 2010 – Dec. 24, 2011
2 p = 50 stocks in the HSI, Jan. 1, 2016 – Dec. 27, 2017
3 p = 396 stocks in the S&P 500 index, Jan. 4, 2016 – Apr. 27, 2018

Sliding window method (sketched in code below):

1 At day t, use the previous n days to estimate Σ and w.
2 Portfolio returns are then computed for the following 20 days.
3 The window is shifted 20 trading days forward, and new weights and portfolio returns for another 20 days are computed.

14/40
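A sketch of this sliding-window backtest (window length n, 20-day holding period), reusing gmvp_weights from the earlier sketch; the plain SCM stands in for the covariance estimator:

    import numpy as np

    def sliding_window_backtest(r, n, hold=20,
                                estimate=lambda X: np.cov(X, rowvar=False)):
        """r: T x p matrix of daily net returns; returns realized portfolio returns."""
        T = r.shape[0]
        out = []
        for t in range(n, T - hold + 1, hold):
            Sigma_hat = estimate(r[t - n:t])   # estimate from the previous n days
            w = gmvp_weights(Sigma_hat)        # defined in the GMVP sketch above
            out.extend(r[t:t + hold] @ w)      # hold w for the next 20 days
        return np.asarray(out)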


Results for HSI

Jan 4, 2010 – Dec 24, 2011 (left) and Jan 1, 2016 – Dec 27, 2017 (right):

[Figure: annualized realized risk (top) and shrinkage parameter β (bottom) versus window length n for Ell1-RSCM, Ell2-RSCM, and LW-RSCM.]

15/40


Results for S&P 500

p = 396 stocks in the S&P 500 index, Jan. 4, 2016 – Apr. 27, 2018.

[Figure: annualized realized risk (left) and shrinkage parameter β (right) versus window length n for Ell1-RSCM, Ell2-RSCM, and LW-RSCM.]

16/40


1 Portfolio optimization

2 Ell-RSCM estimators

3 Estimates of oracle parameters

4 Simulation studies

5 Compressive Regularized Discriminant Analysis


Regularized SCM and MMSE estimator

Minimum mean squared error (MMSE) estimator S_{α_o,β_o}:

(α_o, β_o) = argmin_{α,β>0} E[‖βS + αI − Σ‖²_F],

✗ (α_o, β_o) depends on the true unknown Σ ⇒ need to estimate (α_o, β_o).

Ledoit and Wolf (2004) developed popular consistent estimators (α_o^LW, β_o^LW) under the random matrix theory (RMT) regime.

Chen et al. (2010) assumed Gaussianity to improve the estimation accuracy.

We develop estimators of (α_o, β_o) under the RMT regime when sampling from an unspecified elliptically symmetric distribution.

17/40


Important statistics

Scale measure:

η = tr(Σ)/p = mean of the eigenvalues

Sphericity measure:

γ = p tr(Σ²) / tr(Σ)² = (mean of the squared eigenvalues) / (mean of the eigenvalues)²

γ ∈ [1, p], with γ = 1 iff Σ ∝ I and γ = p iff rank(Σ) = 1.

18/40
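These two statistics translate into a couple of lines of code; a sketch (names ours), with the two extremes of γ checked:

    import numpy as np

    def scale_and_sphericity(Sigma):
        """eta = tr(Sigma)/p; gamma = p*tr(Sigma^2)/tr(Sigma)^2, gamma in [1, p]."""
        p = Sigma.shape[0]
        eta = np.trace(Sigma) / p
        gamma = p * np.trace(Sigma @ Sigma) / np.trace(Sigma) ** 2
        return eta, gamma

    print(scale_and_sphericity(np.eye(5)))                      # gamma = 1 for Sigma ∝ I
    print(scale_and_sphericity(np.diag([1., 0., 0., 0., 0.])))  # gamma = p = 5 for rank 1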


Optimal shrinkage parameters

Result 1. Assume finite 4th-order moments. The optimal shrinkage parameters are

β_o = ‖Σ − ηI‖²_F / (‖Σ − ηI‖²_F + MSE(S)) = p(γ − 1)η² / (E[tr(S²)] − pη²)

α_o = (1 − β_o)η,

and note that β_o ∈ [0, 1).

The MSE at the optimum is

E[‖β_o S + α_o I − Σ‖²_F] = ‖Σ − ηI‖²_F (1 − β_o).

19/40


Elliptically symmetric distributions

x ∼ E_p(µ, Σ, g) (and assume finite 4th-order moments):

f(x) = C · |Σ|^{−1/2} g((x − µ)^⊤ Σ^{−1} (x − µ)),

where g : [0,∞) → [0,∞) is the density generator, C is a normalizing constant, µ = E[x] and Σ = cov(x) ≻ 0.

x ∼ N_p(µ, Σ): g(t) = exp(−t/2); t-distribution with ν > 4 degrees of freedom: x ∼ t_{p,ν}(0, Σ); . . .

Elliptical kurtosis parameter (Muirhead, 1982):

κ = E[‖Σ^{−1/2}(x − µ)‖⁴] / (p(p + 2)) − 1 = (1/3) · {kurtosis of x_i}

(kurtosis = E[(x_i − µ_i)⁴]/var(x_i)² − 3)

20/40


Covariance matrix of the SCM S

K_p := commutation matrix (i.e., K_p vec(A) = vec(A^⊤) ∀ A ∈ R^{p×p})

Result. When sampling from E_p(µ, Σ, g):

cov(vec(S)) = (1/(n − 1) + κ/n)(I + K_p)(Σ ⊗ Σ) + (κ/n) vec(Σ) vec(Σ)^⊤.

Special cases:

1 Case p = 1 (so S = s² (sample variance), Σ = σ², κ = (1/3)kurt(x)):

var(s²) = σ⁴ {2/(n − 1) + kurt(x)/n}

2 Gaussian case (κ = 0):

cov(vec(S)) = (1/(n − 1))(I + K_p)(Σ ⊗ Σ),

which is the expected result since S ∼ W_p(n − 1, (1/(n − 1))Σ).

21/40
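A quick Monte Carlo sanity check of the p = 1 special case for Gaussian data (kurt(x) = 0), a sketch only:

    import numpy as np

    rng = np.random.default_rng(1)
    n, sigma2, reps = 50, 2.0, 200_000
    x = np.sqrt(sigma2) * rng.standard_normal((reps, n))
    s2 = x.var(axis=1, ddof=1)           # sample variance of each replicate
    print(s2.var())                      # empirical var(s^2)
    print(sigma2**2 * 2.0 / (n - 1))     # theory: sigma^4 * {2/(n-1) + 0/n}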



MMSE shrinkage parameters: elliptical case (cont’d)

Result. Optimal shrinkage parameters when F = E_p(µ, Σ, g):

α_o^Ell = (1 − β_o^Ell) η

β_o^Ell = (γ − 1) / (γ − 1 + κ(2γ + p)/n + (γ + p)/(n − 1)),

where η = tr(Σ)/p and γ = p tr(Σ²)/tr(Σ)² are the scale and sphericity measures.

Note: β_o^Ell = β_o^Ell(γ, κ) depends on the unknown γ and κ.

Proof: Simply use the earlier result and the fact that

MSE(S) = tr{cov(vec(S))}

(plus some calculus related to the Kronecker product, the trace, and the commutation matrix).

22/40
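The closed form translates directly into code; a one-function sketch (naming ours):

    def beta_ell_oracle(gamma, kappa, n, p):
        """Oracle beta_o^Ell for elliptical sampling (finite 4th-order moments)."""
        return (gamma - 1.0) / (gamma - 1.0
                                + kappa * (2.0 * gamma + p) / n
                                + (gamma + p) / (n - 1.0))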


1 Portfolio optimization

2 Ell-RSCM estimators

3 Estimates of oracle parameters

4 Simulation studies

5 Compressive Regularized Discriminant Analysis


Estimation of oracle parameters

Plug-in estimator: β̂_o^Ell = β_o^Ell(γ̂, κ̂), where γ̂ and κ̂ are preferably consistent estimators of γ and κ.

Ell-RSCM estimator:

β̂_o^Ell = min(1, max(0, (γ̂ − 1) / (γ̂ − 1 + κ̂(2γ̂ + p)/n + (γ̂ + p)/(n − 1))))

α̂_o^Ell = (1 − β̂_o^Ell) tr(S)/p

Note: the "min–max" clipping is due to the fact that β_o ∈ [0, 1).

23/40


Estimate of elliptical kurtosis

A natural estimate of κ = (1/3)kurt(x_i):

κ̂ = max(−2/(p + 2), (1/(3p)) ∑_{j=1}^{p} K̂_j),

where K̂_j is the (cumulant-based) estimate of the kurtosis of x_j (Joanes and Gill, 1998):

K̂_j = ((n − 1) / ((n − 2)(n − 3))) {(n + 1) k̂_j + 6},

and k̂_j is the sample-moment-based estimate of the kurtosis.

Note: κ > −2/(p + 2) (Bentler and Berkane, 1986).

24/40
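A sketch of this estimate (function name ours; k̂_j is the usual sample excess kurtosis):

    import numpy as np

    def kurtosis_estimate(X):
        """kappa_hat = max(-2/(p+2), mean_j K_j / 3), K_j the corrected kurtosis."""
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        k = n * (Xc**4).sum(axis=0) / (Xc**2).sum(axis=0)**2 - 3.0  # sample kurtosis k_j
        K = (n - 1.0) / ((n - 2.0) * (n - 3.0)) * ((n + 1.0) * k + 6.0)
        return max(-2.0 / (p + 2.0), K.mean() / 3.0)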


Ell1-estimator of γ

The sample spatial sign covariance matrix (Visuri et al., 2000) is defined as

S_sgn = (1/n) ∑_{i=1}^{n} (x_i − µ̂)(x_i − µ̂)^⊤ / ‖x_i − µ̂‖²,

where µ̂ = argmin_µ ∑_{i=1}^{n} ‖x_i − µ‖ is the spatial median.

Zhang and Wiesel (2016) proposed the sphericity statistic

γ̂ = p tr(S²_sgn) − p/n

and showed that γ̂ → γ under the assumptions:

n, p → ∞ and c = p/n → c₀, and the eigenvalues of Σ converge to a fixed spectrum [the random matrix theory (RMT) regime];

as p → ∞, η_i = tr(Σ^i)/p → η_i^o, for i = 1, . . . , 4.

25/40


Ell1-estimator of γ (cont’d)

We include a bias correction:

γ̂_Ell1 = (n/(n − 1)) (p tr(S²_sgn) − p/n)

        = (p/(n(n − 1))) ∑_{i≠j} (v_i^⊤ v_j)²

        = p · ave_{i≠j}{cos²(∠(x_i, x_j))}    (v_i = (x_i − µ̂)/‖x_i − µ̂‖)

Note: E[γ̂_Ell1] ∈ [1, p].

26/40
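A sketch of γ̂_Ell1 (for brevity the sample mean stands in for the spatial median µ̂ of the talk):

    import numpy as np

    def gamma_ell1(X):
        """Bias-corrected sphericity estimate from spatial signs."""
        n, p = X.shape
        Xc = X - X.mean(axis=0)          # NOTE: the talk uses the spatial median here
        V = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)   # sign vectors v_i
        G2 = (V @ V.T)**2                # (v_i^T v_j)^2
        off = G2.sum() - np.trace(G2)    # drop the i = j terms (each equals 1)
        return p * off / (n * (n - 1.0))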


Ell2-estimator of γ

Define

η̂₂ = b_n (tr(S²)/p − a_n (p/n) [tr(S)/p]²),

where

a_n ≡ a_n(κ) = (n/(n + κ))(n/(n − 1) + κ)

b_n ≡ b_n(κ) = (κ + n)(n − 1)² / ((n − 2)(3κ(n − 1) + n(n + 1)))

Note: for large n, we have

η̂₂ ≈ tr(S²)/p − (1 + κ)(p/n)[tr(S)/p]².

Result: E[η̂₂] = tr(Σ²)/p.

⇒ tr(S²)/p ↛ tr(Σ²)/p unless c = p/n → 0 as p, n → ∞.

27/40


Ell2-estimator of γ (cont’d)

Defining â_n = a_n(κ̂) and b̂_n = b_n(κ̂), we get our second estimator of the sphericity γ (with η̂ = tr(S)/p):

γ̂_Ell2 = η̂₂ / η̂² = b̂_n (p tr(S²)/tr(S)² − â_n p/n)

The Ell2-RSCM estimator uses β̂_o^Ell2 = β_o(κ̂, γ̂_Ell2).

The Ell1-RSCM estimator uses β̂_o^Ell1 = β_o(κ̂, γ̂_Ell1).

28/40
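Putting the pieces together: a sketch of the full Ell2-RSCM estimator, reusing kurtosis_estimate and beta_ell_oracle from the earlier sketches:

    import numpy as np

    def ell2_rscm(X):
        """Ell2-RSCM: plug kappa_hat and gamma_hat_Ell2 into the oracle formulas."""
        n, p = X.shape
        S = np.cov(X, rowvar=False)
        k = kurtosis_estimate(X)                      # earlier sketch
        a = (n / (n + k)) * (n / (n - 1.0) + k)
        b = (k + n) * (n - 1.0)**2 / ((n - 2.0) * (3.0*k*(n - 1.0) + n*(n + 1.0)))
        g = b * (p * np.trace(S @ S) / np.trace(S)**2 - a * p / n)
        beta = min(1.0, max(0.0, beta_ell_oracle(g, k, n, p)))
        alpha = (1.0 - beta) * np.trace(S) / p
        return beta * S + alpha * np.eye(p)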


1 Portfolio optimization

2 Ell-RSCM estimators

3 Estimates of oracle parameters

4 Simulation studies

5 Compressive Regularized Discriminant Analysis


AR(1) covariance matrix: [Σ]_{ij} = ϱ^{|i−j|}, ϱ ∈ (0, 1); observations x_i from a Gaussian distribution, p = 100.

Normalized MSE = E[‖Σ̂ − Σ‖²_F] / ‖Σ‖²_F

[Figure: NMSE versus n for Theory, Ell1, Ell2, Ell3, and LW: (c) ϱ = 0.1; (d) ϱ = 0.4.]

29/40


AR(1) covariance matrix: [Σ]_{ij} = ϱ^{|i−j|}, ϱ ∈ (0, 1); observations x_i from a t₁₂ distribution, p = 100.

[Figure: NMSE versus n for Theory, Ell1, Ell2, Ell3, and LW: (e) ϱ = 0.1; (f) ϱ = 0.4 (ν = 12).]

30/40


Largely varying spectrum case I: Gaussian samples

n = p = 50

Σ = diag(1, 1, . . . , 1, 0.01, . . . , 0.01), where m eigenvalues are 1 and the remaining 50 − m are 0.01.

[Figure: NMSE (left) and β̂ (right) versus m for Theory, Ell1, Ell2, Ell3, and LW.]

31/40


Largely varying spectrum case I: t₁₀-distributed samples

n = p = 50

Σ = diag(1, 1, . . . , 1, 0.01, . . . , 0.01), where m eigenvalues are 1 and the remaining 50 − m are 0.01.

[Figure: NMSE (left) and β̂ (right) versus m for Theory, Ell1, Ell2, Ell3, and LW.]

32/40


Largely varying spectrum case II: Gaussian samples

Σ has 30 eigenvalues equal to 100, 40 eigenvalues equal to 1, and 30 eigenvalues equal to 0.01.

p = 100 and n varies.

[Figure: NMSE (left) and β̂ (right) versus n for Theory, Ell1, Ell2, Ell3, and LW.]

33/40


Largely varying spectrum case II: t10-distributed samples

Σ has 30 eigenvalues equal to 100, 40 eigenvalues equal to 1, and 30 eigenvalues equal to 0.01.

p = 100 and n varies.

[Figure: NMSE (left) and β̂ (right) versus n for Theory, Ell1, Ell2, Ell3, and LW.]

34/40


1 Portfolio optimization

2 Ell-RSCM estimators

3 Estimates of oracle parameters

4 Simulation studies

5 Compressive Regularized Discriminant Analysis


Microarray analysis

Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics.

Microarrays monitor genome-wide expression levels of genes in a given organism.

Microarray data analysis (MDA) provides a challenging framework for the data analyst to work with:

p = # of genes (> 10,000)

n_g = # of patients/organisms (∼ 10), e.g., of cancer type g, g = 1, . . . , G

G = # of classes (around 2–10)

Dataset                    N     p       G   Disease
Ramaswamy et al. (2001)    190   16,063  14  Cancer
Yeoh et al. (2002)         248   12,625   6  Leukemia
Sun et al. (2006)          180   54,613   4  Glioma
Nakayama et al. (2007)     105   22,283  10  Sarcoma

35/40


Compressive regularized discriminant analysis (CRDA)

Goals:

1 Assign x ∈ R^p to the correct class (out of G distinct classes).
2 Reduce the # of variables (or features) without sacrificing classification accuracy.

Challenge: small n, large p, i.e., # of features ≫ # of observations in the training dataset.

Benchmark methods:

nearest shrunken centroids (NSC) (Tibshirani et al., 2002)

shrunken centroids regularized discriminant analysis (SCRDA) (Guo et al., 2007)

✓ CRDA can be used as a fast and accurate gene selection method and classification tool in microarray studies.

✓ CRDA gives fewer misclassification errors than its competitors while at the same time achieving accurate feature elimination.

36/40


Linear Discriminant Analysis

The LDA discriminant function expressed in vector form:

d(x) = (d₁(x), . . . , d_g(x), . . . , d_G(x))
     = x^⊤B − (1/2) diag(M^⊤B),

where

M = (µ₁ · · · µ_G), B = Σ^{−1}M.

Here Σ is the common covariance matrix of the classes.

LDA rule: classify x ∈ R^p to class ĝ ∈ {1, . . . , G}:

ĝ = argmax_g d_g(x)

37/40
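A direct transcription of the rule (a sketch; names ours):

    import numpy as np

    def lda_classify(x, M, Sigma):
        """d(x) = x^T B - 0.5*diag(M^T B), B = Sigma^{-1} M; return argmax_g."""
        B = np.linalg.solve(Sigma, M)           # p x G coefficient matrix
        d = x @ B - 0.5 * np.diag(M.T @ B)      # G discriminant scores
        return int(np.argmax(d)) + 1            # class label in {1, ..., G}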


Compressive Regularized Discriminant Analysis (CRDA)

CRDA uses the compressed estimated rule

d̂(x) = (d̂₁(x), . . . , d̂_G(x)) = x^⊤B̂ − (1/2) diag(M̂^⊤B̂),

where µ̂_g = x̄_g, and

M̂ = (µ̂₁ · · · µ̂_G), B̂ = H_K(Σ̂^{−1}M̂, q).

Regularization: Σ̂ = Ell2-RSCM estimator (based on the pooled data).

Feature elimination through the hard-thresholding operator H_K(·, q): H_K(B, q) retains the K rows of B with the largest ℓ_q norms and sets the elements of the other rows to zero.

The regularization parameter is K (for a fixed ℓ_q norm, q ∈ {1, 2, ∞}).

38/40
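A sketch of the row hard-thresholding operator H_K(·, q) (function name ours):

    import numpy as np

    def hard_threshold_rows(B, K, q=np.inf):
        """H_K(B, q): keep the K rows of B with the largest l_q norms, zero the rest."""
        row_norms = np.linalg.norm(B, ord=q, axis=1)
        keep = np.argsort(row_norms)[-K:]        # indices of the K largest row norms
        out = np.zeros_like(B)
        out[keep] = B[keep]
        return out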


Compressive Regularized Discriminant Analysis (CRDA)

           Ramaswamy et al.   Yeoh et al.    Sun et al.     Nakayama et al.
Method       TE     NFS        TE    NFS      TE     NFS      TE     NFS
CRDA_ℓ1      9.9    4899       7.5   4697    12.9   27416     7.9    6952
CRDA_ℓ2     10.3    3968       6.0   4659    13.3   23484     7.6    7755
CRDA_ℓ∞     10.3    4530       9.3   4697    12.4   20207     7.6    2340
PLDA        18.8    5023       NA     NA     15.2   21635     4.4   10479
SCRDA       24     14874       NA     NA     15.7   54183     2.8   22283
NSC         16.3    2337       NA     NA     15     30005     4.2    5908

Table: Classification results: bold-face indicates the best results for test error (TE) and number of features selected (NFS).

We compute the results for each dataset over 10 training-test splits, each with a random choice of training and test sets containing 75% and 25% of the total N observations, respectively.

39/40


Conclusions

We proposed optimal (MSE-minimizing) regularized sample covariance matrix estimators, called the Ell1- and Ell2-RSCM estimators, which are suitable for high-dimensional problems.

We exploit the assumption that the samples come from an unspecified elliptically symmetric distribution.

Simulation studies illustrated the significant advantages of the proposed Ell1- and Ell2-RSCM estimators over the Ledoit-Wolf (LW-)RSCM estimator.

Good performance of the method in regularized QDA was illustrated with real data.

We proposed a new compressive regularized discriminant analysis (CRDA) for simultaneous classification and feature selection.

The method was shown to outperform its popular competitors in the analysis of real microarray data sets.

40/40


References

Peter M Bentler and Maia Berkane. Greatest lower bound to the elliptical theory kurtosis parameter. Biometrika, 73(1):240–241, 1986.

Yilun Chen, Ami Wiesel, Yonina C Eldar, and Alfred O Hero. Shrinkage algorithms for MMSE covariance estimation. IEEE Trans. Signal Process., 58(10):5016–5029, 2010.

Yaqian Guo, Trevor Hastie, and Robert Tibshirani. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 8(1):86–100, 2007.

D. N. Joanes and C. A. Gill. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 47(1):183–189, 1998.

Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004.

Burton G Malkiel and Eugene F Fama. Efficient capital markets: a review of theory and empirical work. The Journal of Finance, 25(2):383–417, 1970.

Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.

Harry Markowitz. Portfolio Selection: Efficient Diversification of Investments. J. Wiley, 1959.

R. J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, New York, 1982.

Robert Nakayama, Takeshi Nemoto, Hiro Takahashi, Tsutomu Ohta, Akira Kawai, Kunihiko Seki, Teruhiko Yoshida, Yoshiaki Toyama, Hitoshi Ichikawa, and Tadashi Hasegawa. Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma. Modern Pathology, 20(7):749–759, 2007.

Sridhar Ramaswamy, Pablo Tamayo, Ryan Rifkin, Sayan Mukherjee, Chen-Hsiang Yeang, Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P Mesirov, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences, 98(26):15149–15154, 2001.

William F Sharpe. Capital asset prices: a theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442, 1964.

Lixin Sun, Ai-Min Hui, Qin Su, Alexander Vortmeyer, Yuri Kotliarov, Sandra Pastorino, Antonino Passaniti, Jayant Menon, Jennifer Walling, Rolando Bailey, et al. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell, 9(4):287–300, 2006.

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99(10):6567–6572, 2002.

James Tobin. Liquidity preference as behavior towards risk. The Review of Economic Studies, 25(2):65–86, 1958.

S. Visuri, V. Koivunen, and H. Oja. Sign and rank covariance matrices. J. Statist. Plann. Inference, 91:557–575, 2000.

Eng-Juh Yeoh, Mary E Ross, Sheila A Shurtleff, W Kent Williams, Divyen Patel, Rami Mahfouz, Fred G Behm, Susana C Raimondi, Mary V Relling, Anami Patel, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1(2):133–143, 2002.

Teng Zhang and Ami Wiesel. Automatic diagonal loading for Tyler's robust covariance estimator. In IEEE Statistical Signal Processing Workshop (SSP'16), pages 1–5, 2016.