High-dimensional covariance matrix estimation with ... · PDF file Bias-variance trade-o...
date post
11-Oct-2020Category
Documents
view
1download
0
Embed Size (px)
Transcript of High-dimensional covariance matrix estimation with ... · PDF file Bias-variance trade-o...
High-dimensional covariance matrix estimation with applications to microarray studies and
portfolio optimization
Esa Ollila
Department of Signal Processing and Acoustics Aalto University, Finland
June 7th, 2018, Gipsa-Lab
Covariance estimation problem
x : p-variate (centered) random vector (p large)
x1, . . . ,xn i.i.d. realizations of x.
Problem: Find an estimate Σ̂ of the pos. def. covariance matrix
Σ = E [ (x− E[x])(x− E[x])>
] ∈ Sp×p++
The sample covariance matrix (SCM),
S = 1
n− 1 n∑
i=1
(xi − x)(xi − x)>,
is the most commonly used estimator of Σ.
Challenges in HD:
1 Insufficient sample support (ISS) case: p > n. =⇒ Estimate of Σ−1 can not be computed!
2 Low sample support (LSS) (i.e., p of the same magnitude as n) =⇒ Σ̂ is estimated with a lot of error.
3 Outliers or heavy-tailed non-Gaussian data
2/40
Why covariance estimation?
Portfolio selection Discriminant Analysis The Most Important Applications
graphical models clustering/discriminant analysis
PCA radar detection
Ilya Soloveychik (HUJI) Robust Covariance Estimation 15 / 47
PCA
The Most Important Applications
graphical models clustering/discriminant analysis
PCA radar detection
Ilya Soloveychik (HUJI) Robust Covariance Estimation 15 / 47
Radar detection
Graphical models
Gaussian graphical model
n-dimensional Gaussian vector
x = (x1, . . . , xn) ∼ N (0,Σ)
xi, xj are conditionally independent (given the rest of x) if
(Σ−1)ij = 0
modeled as undirected graph with n nodes; arc i, j is absent if (Σ−1)ij = 0
1
2
34
5 Σ−1 =
• • 0 • • • • • 0 • 0 • • • 0 • 0 • • 0 • • 0 0 •
13/40
Bias-variance trade-off
Frobenius norm: ‖A‖2F = tr(A2). Any estimator Σ̂ ∈ Sp×p++ of Σ verifies
MSE(Σ̂) = var(Σ̂) + bias(Σ̂)
where MSE(Σ̂) , E[‖Σ̂−Σ‖2F]. Suppose Σ̂ = S. Then, since S is unbiased, i.e., bias(Σ̂) = ‖E[Σ̂]−Σ‖2F = 0, one has that
MSE(Σ̂) = var(Ŝ) (= tr{cov(vec(S))})
7 However, S is not applicable for high-dimensional problems
n ≈ p ⇒ var(S) is very large p > n or p� n ⇒ S is singular (non-invertible).
3 Use an estimator Σ̂ = Ŝβ that shrinks S towards a structure (e.g., a scaled identity matrix) using a tuning (shrinkage) parameter β
MSE(Σ̂) can be reduced by introducing some bias! Positive definiteness of Σ̂ can be ensured!
4/40
Bias-variance trade-off
Frobenius norm: ‖A‖2F = tr(A2). Any estimator Σ̂ ∈ Sp×p++ of Σ verifies
MSE(Σ̂) = var(Σ̂) + bias(Σ̂)
where MSE(Σ̂) , E[‖Σ̂−Σ‖2F]. Suppose Σ̂ = S. Then, since S is unbiased, i.e., bias(Σ̂) = ‖E[Σ̂]−Σ‖2F = 0, one has that
MSE(Σ̂) = var(Ŝ) (= tr{cov(vec(S))})
7 However, S is not applicable for high-dimensional problems
n ≈ p ⇒ var(S) is very large p > n or p� n ⇒ S is singular (non-invertible).
3 Use an estimator Σ̂ = Ŝβ that shrinks S towards a structure (e.g., a scaled identity matrix) using a tuning (shrinkage) parameter β
MSE(Σ̂) can be reduced by introducing some bias! Positive definiteness of Σ̂ can be ensured!
4/40
Bias-variance trade-off (cont’d)
Regularized SCM (RSCM) a lá Ledoit and Wolf
Sβ = βS + (1− β)[tr(S)/p]I where β > 0 denotes the shrinkage (regularization) parameter.
,_
e ,_
w
Total error Variance
Bias2
Tuning parameter
5/40
This talk: choose β optimally under the random sampling from elliptical distribution + applications
Esa Ollila. Optimal high-dimensional shrinkage covariance estimation for elliptical distributions. In EUSIPCO’17, Kos, Greece, 2017, pp 1689–1693.
Esa Ollila and Elias Raninen. Optimal shrinkage covariance matrix estimator under random sampling from elliptical distributions. in Arxiv soon.
M. N. Tabassum and E. Ollila. Compressive regularized discriminant analysis of high-dimensional data with applications to microarray studies. In ICASSP’18, Calgary, Canada, 2017, pp. 4204 –4208.
6/40
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Modern portfolio theory (MPT)
mathematical framework for portfolio allocations that balances the return-risk tradeoff by ??.
Important inputs by ???∗
A portfolio consist of p assets, e.g.:
- equity securities (stocks), market indexes - fixed-income securities (e.g., government or corporate bonds),
commodities (e.g., oil, gold, etc), ETF-s - currencies (exchange rates), derivatives, . . . ,
To use MPT one needs to estimate the mean vector µ and the covariance matrix Σ of asset returns.
7 often p, the number of assets is larger (or of similar magnitude) to n, the number of historical returns.
⇒ MPT standard plug-in estimates is not applicable ∗Noble price winners: James Tobin (1981), Harry Markovitz (1990) and Willian F.
Sharpe (1990), and Eugene F. Fama (2013)
7/40
Basic definitions
Portfolio weight at (discrete) time index t:
wt = (wt,1, . . . , wt,p) > s.t.
p∑
i=1
wt,i = 1 >wt = 1
Let Ci,t > 0 be the price of i th asset
The net return of the ith asset over one interval is
ri,t = Ci,t − Ci,t−1
Ci,t−1 =
Ci,t Ci,t−1
− 1 ∈ [−1,∞)
Single period net returns of p assets is a p-variate vector
rt = (r1,t, . . . , rp,t) >
The portfolio net return at period t+ 1 is
Rt+1 = w > t rt+1 =
p∑
i=1
wi,tri,t+1
8/40
Basic definitions (cont’d)
Given historical returns {rt}nt=1,
µ = E[rt] and Σ ≡ cov(rt) = E [ (rt − µt)(rt − µt)>
]
holds for all t (so drop the index t from subscript).
Let r denote (random vector) of returns, then the expected return of the portfolio is
µw = E[R] = w>µ
and the variance (or risk) of the portfolio is
σ2w = var(R) = w >Σw.
The standard deviation σw = √
var(R) is called the volatility
Global mean variance portfolio (GMVP) optimization problem:
minimize w∈Rp
w>Σw subject to 1>w = 1.
The solution is wo = Σ−11
1>Σ−11 .
9/40
S&P 500 and Nasdaq-100 indexes for year 2017
01-03 03-16 05-26 08-08 10-18 12-29
2300
2400
2500
2600
S&P 500 index prices
01-03 03-16 05-26 08-08 10-18 12-29
5000
5500
6000
6500 NASDAQ-100 prices
01-03 03-16 05-26 08-08 10-18 12-29
-0.02
-0.01
0
0.01
0.02
NASDAQ-100 daily net returns
01-03 03-16 05-26 08-08 10-18 12-29
-0.02
-0.01
0
0.01
0.02
S&P 500 daily net returns
10/40
Are historical returns Gaussian?
QQ-plots and estimated pdf-s
-4 -2 0 2 4
Standard Normal Quantiles
-4
-2
0
2
4
Q u
a n
ti le
s o
f In
p u
t S
a m
p le
S&P 500
-4 -2 0 2 4
Standard Normal Quantiles
-4
-2
0
2
4
Q u
a n
ti le
s o
f In
p u
t S
a m
p le
NASDAQ-100
-4 -2 0 2 4
Standard Normal Quantiles
-4
-2
0
2
4
Q u
a n
ti le
s o
f In
p u
t S
a m
p le
N(0,1) sample
S&P 500
-4 -2 0 2
0
0.1
0.2
0.3
0.4
0.5
0.6 NASDAQ-100
-4 -2 0 2 4
0
0.1
0.2
0.3
0.4
0.5
0.6 N(0,1) sample
-2 -1 0 1 2
0
0.1
0.2
0.3
0.4
0.5
0.6
11/40
Are historical returns Gaussian (cont’d)?
Scatter plots and estimated 99%, 95% and 50% tolerance ellipses
-0.02 -0.015 -0.01 -0.005 0 0.005 0.01 0.015
-0.03
-0.02
-0.01
0
0.01
0.02
0.03
inside the 50% ellipse: 65.6% of returns inside the 95 and 99% ellipses = 93.2% 95.6% of returns
12/40
Stocks are unpredictable (and non-Gaussian)!
03-16 05-26 08-08 10-18 12-29 -5
0
5
10
15
Shorting allowed
long-only
05-26 08-08 10-18 12-29 -0.2
-0.15
-0.1
-0.05
0
0.05
0.1
Shorting allowed
long-only
TECH stocks (Facebook, Apple, Amazon, Microsoft, Google) dropped drastically (in seconds) due ”fat finger” trade or an automated trade.
13/40
Real data analysis