
High-dimensional Statistics

Pradeep Ravikumar

UT Austin

Outline

1. High-Dimensional Data: Large p, small n

2. Sparsity

3. Group Sparsity

4. Low Rank


Curse of Dimensionality

Statistical Learning: Given n observations from p(X; θ∗), where θ∗ ∈ R^p, recover the signal/parameter θ∗.

For reliable statistical learning, the number of observations n should scale exponentially with the data dimension p (for instance, covering [0, 1]^p at resolution ε already requires (1/ε)^p cells).

What if we do not have this many observations?

What if the dimension p instead scales exponentially with the number of observations n?


High-dim. Data: Imaging

Tens of thousands of “voxels” in each 3D image.

We don't want to spend hundreds of thousands of minutes inside the machine in the name of the curse of dimensionality!


High-dim. Data: Gene (Microarray) Experiments

Tens of thousands of genes

Each experiment costs money (so no access to "exponentially" more observations)


High-dim. Data: Social Networks

Millions/Billions of nodes/parameters

Far fewer observations than parameters


Linear Regression

[Figure: linear regression illustration. Source: Wikipedia.]

Y_i = X_i^T θ∗ + ε_i,   i = 1, …, n

Y : real-valued response

X : "covariates/features" in R^p

Examples:

Finance: modeling investment risk, spending, demand, etc. (responses) given market conditions (features)

Epidemiology: linking tobacco smoking (feature) to mortality (response)
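To make the model concrete, here is a minimal NumPy sketch (synthetic data; all sizes and names are illustrative, not from the slides) that generates observations Y_i = X_i^T θ∗ + ε_i and recovers θ∗ by ordinary least squares in the classical regime n > p.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                      # classical regime: many more observations than features

theta_star = rng.normal(size=p)    # true parameter
X = rng.normal(size=(n, p))        # covariates / features
eps = 0.1 * rng.normal(size=n)     # observation noise
y = X @ theta_star + eps           # responses Y_i = X_i^T theta* + eps_i

# Ordinary least squares: argmin_theta ||y - X theta||_2^2
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.linalg.norm(theta_hat - theta_star))   # small when n >> p
```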


Linear Regression

Y_i = X_i^T θ∗ + ε_i,   i = 1, …, n

What if p ≫ n?

There is hope for consistent estimation even in such a high-dimensional model if there is some low-dimensional structure!

Sparsity: Only a few entries are non-zero


Sparse Linear Regression

Example: Sparse regression

[Figure: observation model y = Xθ∗ + w, with y ∈ R^n, design matrix X ∈ R^{n×p} (n ≪ p), θ∗ sparse with support S (zero on S^c), and noise w.]

Set-up: noisy observations y = Xθ∗ + w with sparse θ∗.

‖θ∗‖_0 = |{j ∈ {1, …, p} : θ∗_j ≠ 0}| is small

Estimate a sparse linear model:

min_θ ‖y − Xθ‖_2²   s.t. ‖θ‖_0 ≤ k.

ℓ0-constrained linear regression!

NP-hard: Davis (1994), Natarajan (1995)

Note: The estimation problem is non-convex
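To see why the ℓ0 constraint is combinatorial, here is a hypothetical brute-force solver sketch: it enumerates all (p choose k) supports and solves a least-squares problem on each, which is only feasible for very small p. The function name and setup are illustrative assumptions, not from the slides.

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Exhaustively search all supports of size k: (p choose k) least-squares solves."""
    n, p = X.shape
    best_err, best_theta = np.inf, None
    for support in itertools.combinations(range(p), k):
        support = list(support)
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        err = np.sum((y - X[:, support] @ coef) ** 2)
        if err < best_err:
            best_err = err
            best_theta = np.zeros(p)
            best_theta[support] = coef
    return best_theta

# Even for p = 100 and k = 5 this is already ~75 million subsets.
```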


ℓ1 Regularization

Why an ℓ1 penalty?

[Figure: unit balls of the ℓ0 quasi-norm, the ℓ1 norm, and the ℓ2 norm. Source: Tropp, "Just Relax", 2006.]

The ℓ1 norm is the closest "convex" norm to the ℓ0 penalty.
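As an illustrative aside (an assumption of this write-up, not stated on the slide): in one dimension the ℓ0- and ℓ1-penalized problems are solved by hard and soft thresholding, respectively, which gives a concrete feel for how the convex ℓ1 penalty still produces exact zeros while shrinking the surviving coefficients.

```python
import numpy as np

def hard_threshold(y, lam):
    # Solves min_x 0.5*(x - y)**2 + lam * 1[x != 0]   (l0-type penalty)
    return np.where(np.abs(y) > np.sqrt(2 * lam), y, 0.0)

def soft_threshold(y, lam):
    # Solves min_x 0.5*(x - y)**2 + lam * |x|         (l1 penalty)
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
print(hard_threshold(y, lam=1.0))   # keeps large entries unchanged, zeroes small ones
print(soft_threshold(y, lam=1.0))   # zeroes small entries and shrinks the rest toward 0
```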


ℓ1 Regularization

Example: Sparse regression

[Figure: observation model y = Xθ∗ + w, with y ∈ R^n, design matrix X ∈ R^{n×p} (n ≪ p), θ∗ sparse with support S (zero on S^c), and noise w.]

Set-up: noisy observations y = Xθ∗ + w with sparse θ∗.

Estimator: Lasso program

θ̂ ∈ arg min_θ  { (1/n) Σ_{i=1}^n (y_i − x_i^T θ)² + λ_n Σ_{j=1}^p |θ_j| }

Some past work: Tibshirani, 1996; Chen et al., 1998; Donoho/Huo, 2001; Tropp, 2004; Fuchs, 2004; Meinshausen/Bühlmann, 2005; Candes/Tao, 2005; Donoho, 2005; Haupt & Nowak, 2006; Zhao/Yu, 2006; Wainwright, 2006; Zou, 2006; Koltchinskii, 2007; Meinshausen/Yu, 2007; Tsybakov et al., 2008.

Equivalent constrained form:

min_θ  (1/n) Σ_{i=1}^n (y_i − x_i^T θ)²   s.t. ‖θ‖_1 ≤ C.
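A minimal sketch of the Lasso program on synthetic data using scikit-learn's Lasso (note that scikit-learn puts a 1/(2n) factor on the squared loss, so its alpha corresponds to λ_n only up to that rescaling; the sizes below are illustrative).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5              # high-dimensional regime: p >> n, but theta* is s-sparse

theta_star = np.zeros(p)
theta_star[:s] = rng.normal(size=s)
X = rng.normal(size=(n, p))
y = X @ theta_star + 0.1 * rng.normal(size=n)

# Lasso (sklearn's parameterization): (1/(2n)) ||y - X theta||_2^2 + alpha * ||theta||_1
lasso = Lasso(alpha=0.05).fit(X, y)
support_hat = np.flatnonzero(lasso.coef_)
print(support_hat)                               # ideally recovers {0, ..., s-1}
print(np.linalg.norm(lasso.coef_ - theta_star))  # l2 estimation error
```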


Group-Sparsity

Parameters come in groups: θ = (θ_1, …, θ_{|G1|}, …, θ_{p−|Gm|+1}, …, θ_p), where the first block of coordinates forms group θ_{G1}, …, and the last block forms group θ_{Gm}.

A group analog of sparsity: θ∗ = (∗, …, ∗, 0, …, 0, …), where the first group θ_{G1} is non-zero and entire later groups are zero.

Only a few groups are active; the rest are zero.


Handwriting Recognition

Data: the digit "two" written by multiple writers. Task: recognize the digit given a new image.

We could model digit recognition for each writer separately, or mix all the data for training.

Alternative: use group sparsity. Model digit recognition for each writer, but make the per-writer models share relevant features. (Each image is represented as a vector of features.)


Group-sparse Multiple Linear Regression

m response variables:

Y_i^{(l)} = X_i^T Θ^{(l)} + w_i^{(l)},   i = 1, …, n,   l = 1, …, m.

Collate into matrices Y ∈ R^{n×m}, X ∈ R^{n×p}, and Θ ∈ R^{p×m}:

Multiple linear regression: Y = XΘ + W.
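A small sketch (synthetic data, illustrative sizes) checking that stacking the m per-response regressions column by column gives exactly the matrix equation Y = XΘ + W.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 50, 10, 3

X = rng.normal(size=(n, p))
Theta = rng.normal(size=(p, m))    # column Theta[:, l] is the parameter for response l
W = 0.1 * rng.normal(size=(n, m))

# Per-response view: each column of Y is its own linear regression
Y_cols = np.column_stack([X @ Theta[:, l] + W[:, l] for l in range(m)])

# Collated matrix view
Y = X @ Theta + W
print(np.allclose(Y, Y_cols))      # True: the two views coincide
```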


Group-sparse Multiple Linear Regression

Multivariate regression with block regularizers

[Figure: Y = XΘ∗ + W, with Y, W ∈ R^{n×m}, X ∈ R^{n×p} (n ≪ p), and Θ∗ ∈ R^{p×m} non-zero only on a small set of rows S.]

ℓ1/ℓq-regularized group Lasso, with λ_n ≥ 2 |||X^T W / n|||_{∞, q̃}, where 1/q + 1/q̃ = 1:

Θ̂ ∈ arg min_{Θ ∈ R^{p×m}}  { (1/(2n)) |||Y − XΘ|||_F² + λ_n |||Θ|||_{1,q} }.

Corollary: suppose Θ∗ is supported on |S| = s rows, X satisfies restricted strong convexity (RSC), and W_ij ∼ N(0, σ²). Then |||Θ̂ − Θ∗|||_F ≤ c(L) Ψ_q(S) λ_n, where c(L) is a constant depending on the RSC curvature and Ψ_q(S) = m^{1/q − 1/2} √s if q ∈ [1, 2), and √s if q ≥ 2.

Estimate a group-sparse model where only a few rows (groups) of Θ∗ are non-zero:

|{j ∈ {1, …, p} : Θ∗_{j·} ≠ 0}| is small.


Group Lasso

min_Θ  { Σ_{l=1}^m Σ_{i=1}^n (Y_i^{(l)} − X_i^T Θ_{·l})² + λ Σ_{j=1}^p ‖Θ_{j·}‖_q }.

Group analog of Lasso.

Lasso: ‖θ‖_0 → Σ_{j=1}^p |θ_j|.

Group Lasso: ‖(‖Θ_{j·}‖_q)_{j=1,…,p}‖_0 → Σ_{j=1}^p ‖Θ_{j·}‖_q.

(Obozinski et al; Negahban et al; Huang et al; . . .)
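A sketch of the group Lasso above for the common choice q = 2, using scikit-learn's MultiTaskLasso (which penalizes the sum of ℓ2 norms of the rows of Θ, with a 1/(2n) factor on the loss); the data and sizes are synthetic and illustrative.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, m, s = 100, 50, 5, 4         # n samples, p features, m responses, s active rows

Theta_star = np.zeros((p, m))
Theta_star[:s, :] = rng.normal(size=(s, m))   # only the first s rows (groups) are active
X = rng.normal(size=(n, p))
Y = X @ Theta_star + 0.1 * rng.normal(size=(n, m))

# l1/l2 group Lasso over rows of Theta (q = 2):
#   (1/(2n)) ||Y - X Theta||_F^2 + alpha * sum_j ||Theta_{j.}||_2
model = MultiTaskLasso(alpha=0.1).fit(X, Y)
Theta_hat = model.coef_.T                      # sklearn stores coefficients as (m, p)
active_rows = np.flatnonzero(np.linalg.norm(Theta_hat, axis=1))
print(active_rows)                             # ideally {0, ..., s-1}
```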


Low Rank

Matrix-structured observations: X ∈ R^{k×m}, Y ∈ R.

Parameters are matrices: Θ ∈ R^{k×m}.

Linear model: Y_i = tr(X_i^T Θ) + W_i,   i = 1, …, n.

Applications: analysis of fMRI image data, EEG data decoding, neural response modeling, financial data.

Low-rank models also arise in collaborative filtering: predicting user preferences for items (such as movies) based on their own and other users' ratings of related items.
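A minimal sketch (synthetic, illustrative sizes) of the trace-regression observation model above with a low-rank Θ∗.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, r, n = 20, 30, 2, 500        # Theta* is k x m with rank r << min(k, m)

Theta_star = rng.normal(size=(k, r)) @ rng.normal(size=(r, m))   # rank-r parameter matrix

X = rng.normal(size=(n, k, m))     # matrix-structured covariates X_i in R^{k x m}
W = 0.1 * rng.normal(size=n)

# Y_i = tr(X_i^T Theta*) + W_i  =  <X_i, Theta*> + W_i
Y = np.einsum('ikm,km->i', X, Theta_star) + W
print(Y.shape, np.linalg.matrix_rank(Theta_star))   # (500,) and 2
```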


Low Rank

Example: Low-rank matrix approximation

[Figure: Θ∗ = U D V^T, with Θ∗ ∈ R^{k×m}, U ∈ R^{k×r}, D ∈ R^{r×r}, and V^T ∈ R^{r×m}.]

Set-up: matrix Θ∗ ∈ R^{k×m} with rank r ≪ min{k, m}.

Estimator:

Θ̂ ∈ arg min_Θ  { (1/n) Σ_{i=1}^n (y_i − ⟨⟨X_i, Θ⟩⟩)² + λ_n Σ_{j=1}^{min{k,m}} σ_j(Θ) },   where ⟨⟨X_i, Θ⟩⟩ = tr(X_i^T Θ).

Some past work: Frieze et al., 1998; Achlioptas & McSherry, 2001; Srebro et al., 2004; Drineas et al., 2005; Rudelson & Vershynin, 2006; Recht et al., 2007; Bach, 2008; Meka et al., 2009; Candes & Tao, 2009; Keshavan et al., 2009.


Nuclear Norm

Singular values of A ∈ R^{k×m}: the square roots of the non-zero eigenvalues of A^T A.

Matrix decomposition: A = Σ_{i=1}^r σ_i u_i v_i^T.

Rank of A = |{i ∈ {1, …, min{k, m}} : σ_i ≠ 0}|.

The nuclear norm is the low-rank analog of the ℓ1 penalty in the Lasso:

‖A‖_∗ = Σ_{i=1}^{min{k,m}} σ_i.
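A short sketch computing the nuclear norm from the SVD and checking it against NumPy's built-in matrix norm; the singular-value soft-thresholding operator at the end is included as an illustrative aside, since it is the matrix analog of the Lasso's coordinate-wise soft-thresholding.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, r = 20, 30, 3
A = rng.normal(size=(k, r)) @ rng.normal(size=(r, m))   # a rank-r matrix

sigma = np.linalg.svd(A, compute_uv=False)              # singular values, in decreasing order
print(int(np.sum(sigma > 1e-10)))                       # numerical rank: 3
print(sigma.sum(), np.linalg.norm(A, ord='nuc'))        # nuclear norm = sum of singular values

def svt(A, tau):
    """Singular-value soft-thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # shrinks singular values -> low rank
```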


High-dimensional Statistical Analysis

Typical statistical consistency analysis: holding the model size p fixed, as the number of samples n goes to infinity, the estimated parameter θ̂ approaches the true parameter θ∗.

This is meaningless in finite-sample settings where p ≫ n!

We need a new breed of modern statistical analysis, in which both the model size p and the sample size n go to infinity!

Typical statistical guarantees of interest for an estimate θ̂ (see the sketch after this list):

• Structure recovery: e.g., is the sparsity pattern of θ̂ the same as that of θ∗?

• Parameter bounds: ‖θ̂ − θ∗‖ (e.g., ℓ2 error bounds)

• Risk (loss) bounds: the difference in expected loss between θ̂ and θ∗
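As a sketch (synthetic data, illustrative choices) of the first two guarantees for the Lasso: exact support recovery and the ℓ2 error, tracked as n grows with p fixed, using the common √(log p / n) scaling for the regularization level.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, s = 200, 5
theta_star = np.zeros(p)
theta_star[:s] = 1.0

for n in [50, 100, 200, 400]:
    X = rng.normal(size=(n, p))
    y = X @ theta_star + 0.5 * rng.normal(size=n)
    lam = 0.5 * np.sqrt(np.log(p) / n)                 # common theoretical scaling
    theta_hat = Lasso(alpha=lam).fit(X, y).coef_
    recovered = set(np.flatnonzero(theta_hat)) == set(range(s))   # structure recovery?
    print(n, recovered, round(np.linalg.norm(theta_hat - theta_star), 3))   # l2 error
```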
