
  • High-dimensional covariance matrix estimation with applications to microarray studies and

    portfolio optimization

    Esa Ollila

    Department of Signal Processing and Acoustics Aalto University, Finland

    June 7th, 2018, Gipsa-Lab

  • Covariance estimation problem

    x : p-variate (centered) random vector (p large)

    x1, . . . ,xn i.i.d. realizations of x.

    Problem: Find an estimate Σ̂ of the positive definite covariance matrix

    Σ = E[(x − E[x])(x − E[x])^⊤] ∈ S^{p×p}_{++}

    The sample covariance matrix (SCM),

    S = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)(x_i − x̄)^⊤,

    is the most commonly used estimator of Σ.

    Challenges in high dimensions (HD):

    1 Insufficient sample support (ISS), p > n ⇒ an estimate of Σ^{−1} cannot be computed!

    2 Low sample support (LSS), i.e., p of the same magnitude as n ⇒ Σ̂ is estimated with a lot of error.

    3 Outliers or heavy-tailed non-Gaussian data.

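The ISS case above can be checked numerically. A minimal NumPy sketch (not from the talk; dimensions are illustrative): the SCM built from n centered samples has rank at most n − 1, so for p > n it is singular and cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100                      # insufficient sample support: p > n
X = rng.standard_normal((n, p))     # rows are the i.i.d. realizations x_i

# S = 1/(n-1) * sum_i (x_i - xbar)(x_i - xbar)^T
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (n - 1)

rank = np.linalg.matrix_rank(S)
print(rank, p)                      # rank <= n - 1 = 49 < p, so S is singular
```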

  • Why covariance estimation?

    The most important applications: portfolio selection, discriminant analysis and clustering, graphical models, PCA, and radar detection.

    (Application overview after Ilya Soloveychik (HUJI), Robust Covariance Estimation.)

    Graphical models

    Gaussian graphical model

    n-dimensional Gaussian vector

    x = (x_1, . . . , x_n) ∼ N(0, Σ)

    x_i and x_j are conditionally independent (given the rest of x) if

    (Σ^{−1})_{ij} = 0

    modeled as an undirected graph with n nodes; arc (i, j) is absent if (Σ^{−1})_{ij} = 0. Example with n = 5 nodes:

    Σ^{−1} =
    ⎡ • • 0 • • ⎤
    ⎢ • • • 0 • ⎥
    ⎢ 0 • • • 0 ⎥
    ⎢ • 0 • • 0 ⎥
    ⎣ • • 0 0 • ⎦

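The correspondence between zeros of Σ^{−1} and missing arcs can be illustrated with a small sketch using the 5-node pattern on this slide. The numeric values are hypothetical, chosen only to make the matrix diagonally dominant and hence positive definite:

```python
import numpy as np

# Hypothetical 5-node precision matrix with the slide's zero pattern;
# the values are made up (diagonally dominant => positive definite).
K = np.array([
    [2.0, 0.6, 0.0, 0.5, 0.4],
    [0.6, 2.0, 0.5, 0.0, 0.3],
    [0.0, 0.5, 2.0, 0.4, 0.0],
    [0.5, 0.0, 0.4, 2.0, 0.0],
    [0.4, 0.3, 0.0, 0.0, 2.0],
])

Sigma = np.linalg.inv(K)   # a valid covariance matrix with K = Sigma^{-1}

# Arc (i, j) is present in the graph iff K[i, j] != 0
edges = [(i, j) for i in range(5) for j in range(i + 1, 5) if K[i, j] != 0.0]
print(edges)   # [(0, 1), (0, 3), (0, 4), (1, 2), (1, 4), (2, 3)]
```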

  • Bias-variance trade-off

    Frobenius norm: ‖A‖²_F = tr(A^⊤A). Any estimator Σ̂ ∈ S^{p×p}_{++} of Σ verifies

    MSE(Σ̂) = var(Σ̂) + bias²(Σ̂),

    where MSE(Σ̂) ≜ E[‖Σ̂ − Σ‖²_F] and bias²(Σ̂) = ‖E[Σ̂] − Σ‖²_F. Suppose Σ̂ = S. Then, since S is unbiased, i.e., bias²(S) = 0, one has that

    MSE(S) = var(S) (= tr{cov(vec(S))})

    ✗ However, S is not applicable to high-dimensional problems:

      n ≈ p ⇒ var(S) is very large;
      p > n or p ≫ n ⇒ S is singular (non-invertible).

    ✓ Use an estimator Σ̂ = S_β that shrinks S towards a structure (e.g., a scaled identity matrix) using a tuning (shrinkage) parameter β:

      MSE(Σ̂) can be reduced by introducing some bias!
      Positive definiteness of Σ̂ can be ensured!


  • Bias-variance trade-off (cont’d)

    Regularized SCM (RSCM) à la Ledoit and Wolf:

    S_β = βS + (1 − β)[tr(S)/p] I,

    where β > 0 denotes the shrinkage (regularization) parameter.

    [Figure: total error, variance, and squared bias as a function of the tuning parameter.]

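A minimal Monte Carlo sketch of this trade-off (illustrative dimensions and a hypothetical diagonal Σ, not values from the talk): shrinking S towards the scaled identity typically lowers the Frobenius-norm MSE when n is comparable to p.

```python
import numpy as np

def rscm(S, beta):
    """Regularized SCM: shrink S towards the scaled identity [tr(S)/p] I."""
    p = S.shape[0]
    return beta * S + (1 - beta) * (np.trace(S) / p) * np.eye(p)

rng = np.random.default_rng(1)
p, n = 40, 30                                   # low sample support: n < p
Sigma = np.diag(np.linspace(1.0, 5.0, p))       # hypothetical true covariance

def mse(beta, reps=200):
    """Monte Carlo estimate of E[||S_beta - Sigma||_F^2]."""
    err = 0.0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        S = np.cov(X, rowvar=False)
        err += np.linalg.norm(rscm(S, beta) - Sigma, "fro") ** 2
    return err / reps

# beta = 1 recovers the raw SCM; moderate shrinkage reduces the MSE here
print(mse(0.6), mse(1.0))
```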

  • This talk: choose β optimally under random sampling from an elliptical distribution + applications

    Esa Ollila. Optimal high-dimensional shrinkage covariance estimation for elliptical distributions. In EUSIPCO’17, Kos, Greece, 2017, pp 1689–1693.

    Esa Ollila and Elias Raninen. Optimal shrinkage covariance matrix estimator under random sampling from elliptical distributions. arXiv preprint, forthcoming.

    M. N. Tabassum and E. Ollila. Compressive regularized discriminant analysis of high-dimensional data with applications to microarray studies. In ICASSP’18, Calgary, Canada, 2018, pp. 4204–4208.


  • 1 Portfolio optimization

    2 Ell-RSCM estimators

    3 Estimates of oracle parameters

    4 Simulation studies

    5 Compressive Regularized Discriminant Analysis

  • Modern portfolio theory (MPT)

    A mathematical framework for portfolio allocation that balances the return-risk trade-off by ??.

    Important inputs by ???∗

    A portfolio consists of p assets, e.g.:

    - equity securities (stocks), market indexes
    - fixed-income securities (e.g., government or corporate bonds), commodities (e.g., oil, gold), ETFs
    - currencies (exchange rates), derivatives, . . .

    To use MPT one needs to estimate the mean vector µ and the covariance matrix Σ of the asset returns.

    ✗ Often p, the number of assets, is larger than (or of similar magnitude to) n, the number of historical returns.

    ⇒ Standard MPT plug-in estimates are not applicable.

    ∗Nobel Prize winners: James Tobin (1981), Harry Markowitz (1990), William F. Sharpe (1990), and Eugene F. Fama (2013)

  • Basic definitions

    Portfolio weight at (discrete) time index t:

    w_t = (w_{t,1}, . . . , w_{t,p})^⊤  s.t.  ∑_{i=1}^{p} w_{t,i} = 1^⊤ w_t = 1

    Let C_{i,t} > 0 be the price of the ith asset.

    The net return of the ith asset over one interval is

    r_{i,t} = (C_{i,t} − C_{i,t−1}) / C_{i,t−1} = C_{i,t}/C_{i,t−1} − 1 ∈ [−1, ∞)

    The single-period net returns of the p assets form a p-variate vector

    r_t = (r_{1,t}, . . . , r_{p,t})^⊤

    The portfolio net return at period t + 1 is

    R_{t+1} = w_t^⊤ r_{t+1} = ∑_{i=1}^{p} w_{i,t} r_{i,t+1}
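The net-return formula can be sketched on a hypothetical price series (illustrative numbers only, not market data):

```python
import numpy as np

# Hypothetical price series C_t for one asset
C = np.array([100.0, 102.0, 99.96, 101.0])

# Net return over one interval: r_t = C_t / C_{t-1} - 1, always in [-1, inf)
r = C[1:] / C[:-1] - 1.0
print(r)   # first return: 102/100 - 1 = 0.02
```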

  • Basic definitions (cont’d)

    Given historical returns {r_t}_{t=1}^{n}, assume that

    µ = E[r_t]  and  Σ ≡ cov(r_t) = E[(r_t − µ)(r_t − µ)^⊤]

    hold for all t (stationarity, so the time index t can be dropped).

    Let r denote the (random) vector of returns; then the expected return of the portfolio is

    µ_w = E[R] = w^⊤µ

    and the variance (or risk) of the portfolio is

    σ²_w = var(R) = w^⊤Σw.

    The standard deviation σ_w = √var(R) is called the volatility.

    Global minimum variance portfolio (GMVP) optimization problem:

    minimize_{w ∈ ℝ^p}  w^⊤Σw   subject to  1^⊤w = 1.

    The solution is

    w_o = Σ^{−1}1 / (1^⊤Σ^{−1}1).
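The closed-form GMVP solution can be sketched in a few lines; the 3-asset covariance matrix below is a hypothetical example, not data from the talk:

```python
import numpy as np

def gmvp_weights(Sigma):
    """GMVP solution w_o = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)   # Sigma^{-1} 1 without forming the inverse
    return w / w.sum()

# Toy 3-asset covariance matrix (illustrative, positive definite)
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = gmvp_weights(Sigma)
print(w, w.sum())   # the weights sum to 1 by construction
```

Solving the linear system instead of inverting Σ is the standard numerically stable choice; with p > n, however, Σ̂ = S is singular and this step fails, which is exactly why the regularized estimators above are needed.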

  • S&P 500 and Nasdaq-100 indexes for year 2017

    [Figure: S&P 500 and NASDAQ-100 index prices and daily net returns, 2017-01-03 to 2017-12-29.]

  • Are historical returns Gaussian?

    QQ-plots and estimated pdfs

    [Figure: QQ-plots of daily net returns against standard normal quantiles, and estimated pdfs, for S&P 500, NASDAQ-100, and an N(0,1) reference sample.]

  • Are historical returns Gaussian (cont’d)?

    Scatter plots and estimated 99%, 95% and 50% tolerance ellipses

    [Figure: scatter plot of daily net returns with estimated 50%, 95% and 99% tolerance ellipses.]

    Inside the 50% ellipse: 65.6% of returns; inside the 95% and 99% ellipses: 93.2% and 95.6% of returns, respectively.

  • Stocks are unpredictable (and non-Gaussian)!

    [Figure: two panels comparing "shorting allowed" and long-only portfolios over 2017.]

    Tech stocks (Facebook, Apple, Amazon, Microsoft, Google) dropped drastically (in seconds) due to a "fat finger" trade or an automated trade.

  • Real data analysis