Ji Xu, Daniel Hsu - Columbia Universitydjhsu/papers/pcr-poster.pdf · Title: On the number of...

1
On the number of variables to use in principal component regression Ji Xu, Daniel Hsu Computer Science Department and Data Science Institute, Columbia University Principal Component Regression Model: Suppose the data consists of n i.i.d. observations (x 1 ,y 1 ),...,(x n ,y n ) from R N × R, where y i = x > i θ + w i , and x i ∼N (0, Σ), w i ∼N (02 ). Principal component regression: Let λ 1 ... λ p be the eigenvalues of Σ. Let v 1 ,v 2 ,...,v p be the corresponding eigenvectors. The PCR estimator ˆ θ for θ is defined by (minimum 2 norm solution for p>n regime) ˆ θ P := ( (X > P X P ) -1 X > P y if p n, X > P (X P X > P ) -1 y if p > n, where X P =[x 1 |···|x n ] > [v 1 | ... |v p ]. The prediction error is given by Error p := E x,y [(y - x > ˆ θ P ) 2 ]. Question: What is the optimal value of p that minimizes the prediction error? Double descent phenomenon First descent: classic U-shaped risk curve arising from a bias-variance trade-off. Second descent: behavior of models in H that interpolate training data. This particular shape is observed in many learning problems, such as neural networks, decision trees and ensemble methods. Figure: [BHMM19] (a) The classical U-shaped risk curve arising from the bias-variance trade-off. (b) The double descent risk curve including both the U-shaped risk curve and the observed behavior from using high capacity function classes. For principal component regression (PCR): Assumption: We assume E θ [θ ]=0 and E θ [θθ > ]= I . Question: Does double descent phenomenon happen in PCR? Question: When the second descent achieve error smaller than the first descent? Case of Polynomial Decay We first analyze a special case when the eigenvalues of Σ decay to zero at a polynomial rate. Specifically, we assume A.1 There exists a constant κ> 0 such that λ j = j -κ for all j =1,...,N . A.2 There exist constants α [0, 1] and β (0, 1) such that p/N α and n/N β as p, n, N →∞. Define m κ (z ) for z 0 to be the smallest positive solution to the equation -z = 1 m κ (z ) - 1 β Z α -κ 1 κt 1(1 + t · m κ (z )) dt, (1) and let m 0 κ (·) denote the derivative of m κ (·). Remark: m κ (z ) is the Stieltjes transform of the limiting distribution of the empirical distribution of the eigenvalues of N κ Σ. Theorem 1. Assume A.1 with constant κ and A.2 with constants α and β . (i) Risk characterization at α<β : For all α<β , we have E w,θ [Error] p N 1-κ Z 1 α t -κ dt + σ 2 ! · β β - α =: R κ (α, σ ), α < β. (ii) Optimal risk at α<β : When κ> 1, the minimum of R κ (α, σ ) is achieved at α =0 and the minimum risk is given by min α<β R κ (α, σ )= σ 2 . When κ 1, the minimum of R κ (α, σ ) is achieved at α * which is the unique solution of the equation h κ (α)=0 on (0), where h κ (α) is given by h κ (α) := β α - Z 1 α t κ-2 dt - 1 - σ 2 1{κ =1}. The minimum risk is therefore given by min α<β R κ (α, σ )= N 1-κ β (α * ) κ . (iii) Risk characterization at α>β : For all α>β , the function m κ defined in (1) and its derivative m 0 κ are well-defined and positive at z =0, and E w,θ [Error] p N 1-κ β m κ (0) + N 1-κ Z 1 α t -κ dt + σ 2 ! m 0 κ (0) m 2 κ (0) =: R κ (α, σ ). (iv) Comparison between two regimes: When κ> 1, the minimum risk for all α< 1 and α 6= β is achieved at α =0, i.e., p = o(n). When κ< 1, let α * be the minimizer of R κ (α, σ ) over the interval [0). Then lim sup N R κ (1) R κ (α * ) < 1. Case of General Decay Results from Theorem 1 can be extended to the eigenvalues of Σ with other decay rate when the following assumptions hold: B.1 kΣk 2 C for some constant C> 0. Also, there exists a positive sequence (c N ) N 1 such that the empirical eigenvalue distribution of c N Σ converges as N →∞ to F = (1 - δ )F 0 + δF 1 , where δ (0, 1], F 0 is a point mass of 0, and F 1 has a continuous probability density f supported on either [η 1 2 ] or [η 1 , ) for some constants η 1 2 > 0. B.2 There exist constants ν> 0 and β (0) s.t. p = N i=1 I(λ i ν N ), ν N c N ν and n/N β as n, N →∞. Remark: B.1 is the extension of A.1 with c N = N κ . B.2 is the extension of A.2 where ν N is the threshold parameter that determines the number of selected principal components. Summaries and Discussions We confirm the “double descent” in a natural setting with Gaussian design. Optimal p depends on noise level and decay rate of the eigenvalues of Σ. When κ< 1, a smaller risk is achieved after the interpolation threshold (p>n) than any point before (p<n). When κ> 1, a smaller risk is achieved after the interpolation threshold (p>n) only in the noiseless setting. When Σ is unknown Estimate Σ via unlabeled data. Since the dominance of the p>n regime is always established at p = N (full model), we believe same results hold for standard PCR as well. R () E w,✓ Error 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 p/N E Error R(alpha) 0.0 0.2 0.4 0.6 0.8 1.0 0 2 4 6 8 10 = p/N risk =1 0.0 0.2 0.4 0.6 0.8 1.0 0.006 0.008 0.010 0.012 E Error R(alpha) 0.0 0.2 0.4 0.6 0.8 1.0 0.006 0.008 0.010 0.012 = p/N risk =2 Figure: The asymptotic risk function R κ as a function of α (with σ =0, n = 300, N = 1000, β = n/N =0.3 and κ =1, 2 respectively). The location of α * from Theorem 1 is marked with a black circle. In both cases, the asymptotic risk at α =1 is lower than the asymptotic risk at α * . https://arxiv.org/abs/1906.01139 [email protected], [email protected]

Transcript of Ji Xu, Daniel Hsu - Columbia Universitydjhsu/papers/pcr-poster.pdf · Title: On the number of...

Page 1: Ji Xu, Daniel Hsu - Columbia Universitydjhsu/papers/pcr-poster.pdf · Title: On the number of variables to use in principal component regression Author: Ji Xu, Daniel Hsu Created

On the number of variables to use in principal component regressionJi Xu, Daniel Hsu

Computer Science Department and Data Science Institute, Columbia University

Principal Component Regression

Model: Suppose the data consists of n i.i.d. observations (x1, y1), . . . ,(xn, yn)from RN × R, where

yi = x>i θ + wi,

andxi ∼ N (0,Σ), wi ∼ N (0, σ2).

Principal component regression: Let λ1 ≥ . . . ≥ λp be the eigenvalues of Σ.

Let v1, v2, . . . , vp be the corresponding eigenvectors. The PCR estimator θ̂ for θis defined by (minimum `2 norm solution for p > n regime)

θ̂P :=

{(X>

PXP)−1X>Py if p ≤ n,

X>P(XPX

>P)−1y if p > n,

where XP = [x1| · · · |xn]>[v1| . . . |vp]. The prediction error is given by

Errorp := Ex,y[(y − x>θ̂P)2].

Question: What is the optimal value of p that minimizes the prediction error?

Double descent phenomenon

• First descent: classic U-shaped risk curve arising from a bias-variance trade-off.

• Second descent: behavior of models in H that interpolate training data.

• This particular shape is observed in many learning problems, such as neural networks,decision trees and ensemble methods.

Figure: [BHMM19] (a) The classical U-shaped risk curve arising from the bias-variance trade-off. (b)The double descent risk curve including both the U-shaped risk curve and the observed behavior fromusing high capacity function classes.

For principal component regression (PCR):

Assumption: We assume Eθ[θ] = 0 and Eθ[θθ>] = I.

Question: Does double descent phenomenon happen in PCR?

Question: When the second descent achieve error smaller than the first descent?

Case of Polynomial Decay

We first analyze a special case when the eigenvalues of Σ decay to zero at a polynomial rate.Specifically, we assume

A.1 There exists a constant κ > 0 such that λj = j−κ for all j = 1, . . . , N .

A.2 There exist constants α ∈ [0, 1] and β ∈ (0, 1) such that p/N → α andn/N → β as p, n,N →∞.

Define mκ(z) for z ≤ 0 to be the smallest positive solution to the equation

−z =1

mκ(z)−

1

β

∫ ∞α−κ

1

κt1/κ(1 + t ·mκ(z))dt, (1)

and let m′κ(·) denote the derivative of mκ(·).

Remark: mκ(z) is the Stieltjes transform of the limiting distribution of the empiricaldistribution of the eigenvalues of NκΣ.

Theorem 1. Assume A.1 with constant κ and A.2 with constants α and β.

(i) Risk characterization at α < β: For all α < β, we have

Ew,θ[Error]p→(N1−κ

∫ 1

α

t−κ dt+ σ2

β

β − α=: Rκ(α, σ), ∀α < β.

(ii) Optimal risk at α < β: When κ > 1, the minimum of Rκ(α, σ) is achieved atα = 0 and the minimum risk is given by

minα<βRκ(α, σ) = σ2.

When κ ≤ 1, the minimum of Rκ(α, σ) is achieved at α∗ which is the uniquesolution of the equation hκ(α) = 0 on (0, β), where hκ(α) is given by

hκ(α) :=β

α−∫ 1

α

tκ−2 dt− 1− σ21{κ = 1}.

The minimum risk is therefore given by

minα<βRκ(α, σ) = N1−κ β

(α∗)κ.

(iii) Risk characterization at α > β: For all α > β, the function mκ defined in (1) and itsderivative m′κ are well-defined and positive at z = 0, and

Ew,θ[Error]p→ N1−κ β

mκ(0)+

(N1−κ

∫ 1

α

t−κ dt+ σ2

)m′κ(0)

m2κ(0)

=: Rκ(α, σ).

(iv) Comparison between two regimes: When κ > 1, the minimum risk for all α < 1 andα 6= β is achieved at α = 0, i.e., p = o(n). When κ < 1, let α∗ be the minimizerof Rκ(α, σ) over the interval [0, β). Then

lim supN

Rκ(1, σ)

Rκ(α∗, σ)< 1.

Case of General Decay

Results from Theorem 1 can be extended to the eigenvalues of Σ with otherdecay rate when the following assumptions hold:

B.1 ‖Σ‖2 ≤ C for some constant C > 0. Also, there exists a positivesequence (cN)N≥1 such that the empirical eigenvalue distribution of cNΣconverges as N →∞ to F = (1− δ)F0 + δF1, where δ ∈ (0, 1], F0

is a point mass of 0, and F1 has a continuous probability density fsupported on either [η1, η2] or [η1,∞) for some constants η1, η2 > 0.

B.2 There exist constants ν > 0 and β ∈ (0, δ) s.t. p =∑Ni=1 I(λi ≥ νN),

νNcN → ν and n/N → β as n,N →∞.

Remark: B.1 is the extension of A.1 with cN = Nκ. B.2 is the extension ofA.2 where νN is the threshold parameter that determines the number of selectedprincipal components.

Summaries and Discussions

• We confirm the “double descent” in a natural setting with Gaussian design.

• Optimal p depends on noise level and decay rate of the eigenvalues of Σ.

◦ When κ < 1, a smaller risk is achieved after the interpolation threshold(p > n) than any point before (p < n).

◦ When κ > 1, a smaller risk is achieved after the interpolation threshold(p > n) only in the noiseless setting.

• When Σ is unknown

◦ Estimate Σ via unlabeled data.

◦ Since the dominance of the p > n regime is always established at p = N(full model), we believe same results hold for standard PCR as well.

R(↵)<latexit sha1_base64="G3+rf6RqL6ugJ3aBEV4KtdSPiRA=">AAACBXicbVBNS8NAEN34WetX1KMegkWol5KIoMeiF49V7Ac0IUy2m3bpJll2N0IJuXjxr3jxoIhX/4M3/42bNgdtfTDweG+GmXkBZ1Qq2/42lpZXVtfWKxvVza3tnV1zb78jk1Rg0sYJS0QvAEkYjUlbUcVIjwsCUcBINxhfF373gQhJk/heTTjxIhjGNKQYlJZ888iNQI0wsOwu9zN3DJxDXneB8RGc+mbNbthTWIvEKUkNlWj55pc7SHAakVhhBlL2HZsrLwOhKGYkr7qpJBzwGIakr2kMEZFeNv0it060MrDCROiKlTVVf09kEEk5iQLdWdws571C/M/rpyq89DIa81SRGM8WhSmzVGIVkVgDKghWbKIJYEH1rRYegQCsdHBVHYIz//Ii6Zw1HLvh3J7XmldlHBV0iI5RHTnoAjXRDWqhNsLoET2jV/RmPBkvxrvxMWtdMsqZA/QHxucPwoGYtg==</latexit><latexit sha1_base64="G3+rf6RqL6ugJ3aBEV4KtdSPiRA=">AAACBXicbVBNS8NAEN34WetX1KMegkWol5KIoMeiF49V7Ac0IUy2m3bpJll2N0IJuXjxr3jxoIhX/4M3/42bNgdtfTDweG+GmXkBZ1Qq2/42lpZXVtfWKxvVza3tnV1zb78jk1Rg0sYJS0QvAEkYjUlbUcVIjwsCUcBINxhfF373gQhJk/heTTjxIhjGNKQYlJZ888iNQI0wsOwu9zN3DJxDXneB8RGc+mbNbthTWIvEKUkNlWj55pc7SHAakVhhBlL2HZsrLwOhKGYkr7qpJBzwGIakr2kMEZFeNv0it060MrDCROiKlTVVf09kEEk5iQLdWdws571C/M/rpyq89DIa81SRGM8WhSmzVGIVkVgDKghWbKIJYEH1rRYegQCsdHBVHYIz//Ii6Zw1HLvh3J7XmldlHBV0iI5RHTnoAjXRDWqhNsLoET2jV/RmPBkvxrvxMWtdMsqZA/QHxucPwoGYtg==</latexit><latexit sha1_base64="G3+rf6RqL6ugJ3aBEV4KtdSPiRA=">AAACBXicbVBNS8NAEN34WetX1KMegkWol5KIoMeiF49V7Ac0IUy2m3bpJll2N0IJuXjxr3jxoIhX/4M3/42bNgdtfTDweG+GmXkBZ1Qq2/42lpZXVtfWKxvVza3tnV1zb78jk1Rg0sYJS0QvAEkYjUlbUcVIjwsCUcBINxhfF373gQhJk/heTTjxIhjGNKQYlJZ888iNQI0wsOwu9zN3DJxDXneB8RGc+mbNbthTWIvEKUkNlWj55pc7SHAakVhhBlL2HZsrLwOhKGYkr7qpJBzwGIakr2kMEZFeNv0it060MrDCROiKlTVVf09kEEk5iQLdWdws571C/M/rpyq89DIa81SRGM8WhSmzVGIVkVgDKghWbKIJYEH1rRYegQCsdHBVHYIz//Ii6Zw1HLvh3J7XmldlHBV0iI5RHTnoAjXRDWqhNsLoET2jV/RmPBkvxrvxMWtdMsqZA/QHxucPwoGYtg==</latexit><latexit sha1_base64="G3+rf6RqL6ugJ3aBEV4KtdSPiRA=">AAACBXicbVBNS8NAEN34WetX1KMegkWol5KIoMeiF49V7Ac0IUy2m3bpJll2N0IJuXjxr3jxoIhX/4M3/42bNgdtfTDweG+GmXkBZ1Qq2/42lpZXVtfWKxvVza3tnV1zb78jk1Rg0sYJS0QvAEkYjUlbUcVIjwsCUcBINxhfF373gQhJk/heTTjxIhjGNKQYlJZ888iNQI0wsOwu9zN3DJxDXneB8RGc+mbNbthTWIvEKUkNlWj55pc7SHAakVhhBlL2HZsrLwOhKGYkr7qpJBzwGIakr2kMEZFeNv0it060MrDCROiKlTVVf09kEEk5iQLdWdws571C/M/rpyq89DIa81SRGM8WhSmzVGIVkVgDKghWbKIJYEH1rRYegQCsdHBVHYIz//Ii6Zw1HLvh3J7XmldlHBV0iI5RHTnoAjXRDWqhNsLoET2jV/RmPBkvxrvxMWtdMsqZA/QHxucPwoGYtg==</latexit>

Ew,✓⇤ Error<latexit sha1_base64="b+Ytr3XWSnr8N38ZMyOL/kC6yrU=">AAACD3icbVDLSsNAFJ34rPUVdekmWBQRKYkIuixKwWUF+4Amhsl02g6dPJi5UUqIX+DGX3HjQhG3bt35N07SLLT1wIXDOfdy7z1exJkE0/zW5uYXFpeWSyvl1bX1jU19a7slw1gQ2iQhD0XHw5JyFtAmMOC0EwmKfY/Ttje6zPz2HRWShcENjCPq+HgQsD4jGJTk6ge2j2HoeUk9dZP7YxuGFPDtUfqQ68JP6kKEInX1ilk1cxizxCpIBRVouPqX3QtJ7NMACMdSdi0zAifBAhjhNC3bsaQRJiM8oF1FA+xT6ST5P6mxr5Se0Q+FqgCMXP09kWBfyrHvqc7sSjntZeJ/XjeG/rmTsCCKgQZksqgfcwNCIwvH6DFBCfCxIpgIpm41yBALTEBFWFYhWNMvz5LWSdUyq9b1aaV2UcRRQrtoDx0iC52hGrpCDdREBD2iZ/SK3rQn7UV71z4mrXNaMbOD/kD7/AFrvZ2H</latexit><latexit sha1_base64="b+Ytr3XWSnr8N38ZMyOL/kC6yrU=">AAACD3icbVDLSsNAFJ34rPUVdekmWBQRKYkIuixKwWUF+4Amhsl02g6dPJi5UUqIX+DGX3HjQhG3bt35N07SLLT1wIXDOfdy7z1exJkE0/zW5uYXFpeWSyvl1bX1jU19a7slw1gQ2iQhD0XHw5JyFtAmMOC0EwmKfY/Ttje6zPz2HRWShcENjCPq+HgQsD4jGJTk6ge2j2HoeUk9dZP7YxuGFPDtUfqQ68JP6kKEInX1ilk1cxizxCpIBRVouPqX3QtJ7NMACMdSdi0zAifBAhjhNC3bsaQRJiM8oF1FA+xT6ST5P6mxr5Se0Q+FqgCMXP09kWBfyrHvqc7sSjntZeJ/XjeG/rmTsCCKgQZksqgfcwNCIwvH6DFBCfCxIpgIpm41yBALTEBFWFYhWNMvz5LWSdUyq9b1aaV2UcRRQrtoDx0iC52hGrpCDdREBD2iZ/SK3rQn7UV71z4mrXNaMbOD/kD7/AFrvZ2H</latexit><latexit sha1_base64="b+Ytr3XWSnr8N38ZMyOL/kC6yrU=">AAACD3icbVDLSsNAFJ34rPUVdekmWBQRKYkIuixKwWUF+4Amhsl02g6dPJi5UUqIX+DGX3HjQhG3bt35N07SLLT1wIXDOfdy7z1exJkE0/zW5uYXFpeWSyvl1bX1jU19a7slw1gQ2iQhD0XHw5JyFtAmMOC0EwmKfY/Ttje6zPz2HRWShcENjCPq+HgQsD4jGJTk6ge2j2HoeUk9dZP7YxuGFPDtUfqQ68JP6kKEInX1ilk1cxizxCpIBRVouPqX3QtJ7NMACMdSdi0zAifBAhjhNC3bsaQRJiM8oF1FA+xT6ST5P6mxr5Se0Q+FqgCMXP09kWBfyrHvqc7sSjntZeJ/XjeG/rmTsCCKgQZksqgfcwNCIwvH6DFBCfCxIpgIpm41yBALTEBFWFYhWNMvz5LWSdUyq9b1aaV2UcRRQrtoDx0iC52hGrpCDdREBD2iZ/SK3rQn7UV71z4mrXNaMbOD/kD7/AFrvZ2H</latexit><latexit sha1_base64="b+Ytr3XWSnr8N38ZMyOL/kC6yrU=">AAACD3icbVDLSsNAFJ34rPUVdekmWBQRKYkIuixKwWUF+4Amhsl02g6dPJi5UUqIX+DGX3HjQhG3bt35N07SLLT1wIXDOfdy7z1exJkE0/zW5uYXFpeWSyvl1bX1jU19a7slw1gQ2iQhD0XHw5JyFtAmMOC0EwmKfY/Ttje6zPz2HRWShcENjCPq+HgQsD4jGJTk6ge2j2HoeUk9dZP7YxuGFPDtUfqQ68JP6kKEInX1ilk1cxizxCpIBRVouPqX3QtJ7NMACMdSdi0zAifBAhjhNC3bsaQRJiM8oF1FA+xT6ST5P6mxr5Se0Q+FqgCMXP09kWBfyrHvqc7sSjntZeJ/XjeG/rmTsCCKgQZksqgfcwNCIwvH6DFBCfCxIpgIpm41yBALTEBFWFYhWNMvz5LWSdUyq9b1aaV2UcRRQrtoDx0iC52hGrpCDdREBD2iZ/SK3rQn7UV71z4mrXNaMbOD/kD7/AFrvZ2H</latexit>

0.0 0.2 0.4 0.6 0.8 1.0

05

1015

p/N

risk

E ErrorR(alpha)

0.0 0.2 0.4 0.6 0.8 1.0

02

46

810

↵ = p/N<latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit><latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit><latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit><latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit>

risk

= 1<latexit sha1_base64="A0YchN0cVRV7/kKcxtDLjLttdgA=">AAAB73icbVA9SwNBEJ2LXzF+RS1tFoNgFe5E0EYI2lhGMDGQHGFus0mW7O2tu3tCOPInbCwUsfXv2Plv3CRXaOKDgcd7M8zMi5Tgxvr+t1dYWV1b3yhulra2d3b3yvsHTZOkmrIGTUSiWxEaJrhkDcutYC2lGcaRYA/R6GbqPzwxbXgi7+1YsTDGgeR9TtE6qdUZoVJ4FXTLFb/qz0CWSZCTCuSod8tfnV5C05hJSwUa0w58ZcMMteVUsEmpkxqmkI5wwNqOSoyZCbPZvRNy4pQe6SfalbRkpv6eyDA2ZhxHrjNGOzSL3lT8z2untn8ZZlyq1DJJ54v6qSA2IdPnSY9rRq0YO4JUc3croUPUSK2LqORCCBZfXibNs2rgV4O780rtOo+jCEdwDKcQwAXU4Bbq0AAKAp7hFd68R+/Fe/c+5q0FL585hD/wPn8Ai+CPoQ==</latexit><latexit sha1_base64="A0YchN0cVRV7/kKcxtDLjLttdgA=">AAAB73icbVA9SwNBEJ2LXzF+RS1tFoNgFe5E0EYI2lhGMDGQHGFus0mW7O2tu3tCOPInbCwUsfXv2Plv3CRXaOKDgcd7M8zMi5Tgxvr+t1dYWV1b3yhulra2d3b3yvsHTZOkmrIGTUSiWxEaJrhkDcutYC2lGcaRYA/R6GbqPzwxbXgi7+1YsTDGgeR9TtE6qdUZoVJ4FXTLFb/qz0CWSZCTCuSod8tfnV5C05hJSwUa0w58ZcMMteVUsEmpkxqmkI5wwNqOSoyZCbPZvRNy4pQe6SfalbRkpv6eyDA2ZhxHrjNGOzSL3lT8z2untn8ZZlyq1DJJ54v6qSA2IdPnSY9rRq0YO4JUc3croUPUSK2LqORCCBZfXibNs2rgV4O780rtOo+jCEdwDKcQwAXU4Bbq0AAKAp7hFd68R+/Fe/c+5q0FL585hD/wPn8Ai+CPoQ==</latexit><latexit sha1_base64="A0YchN0cVRV7/kKcxtDLjLttdgA=">AAAB73icbVA9SwNBEJ2LXzF+RS1tFoNgFe5E0EYI2lhGMDGQHGFus0mW7O2tu3tCOPInbCwUsfXv2Plv3CRXaOKDgcd7M8zMi5Tgxvr+t1dYWV1b3yhulra2d3b3yvsHTZOkmrIGTUSiWxEaJrhkDcutYC2lGcaRYA/R6GbqPzwxbXgi7+1YsTDGgeR9TtE6qdUZoVJ4FXTLFb/qz0CWSZCTCuSod8tfnV5C05hJSwUa0w58ZcMMteVUsEmpkxqmkI5wwNqOSoyZCbPZvRNy4pQe6SfalbRkpv6eyDA2ZhxHrjNGOzSL3lT8z2untn8ZZlyq1DJJ54v6qSA2IdPnSY9rRq0YO4JUc3croUPUSK2LqORCCBZfXibNs2rgV4O780rtOo+jCEdwDKcQwAXU4Bbq0AAKAp7hFd68R+/Fe/c+5q0FL585hD/wPn8Ai+CPoQ==</latexit><latexit sha1_base64="A0YchN0cVRV7/kKcxtDLjLttdgA=">AAAB73icbVA9SwNBEJ2LXzF+RS1tFoNgFe5E0EYI2lhGMDGQHGFus0mW7O2tu3tCOPInbCwUsfXv2Plv3CRXaOKDgcd7M8zMi5Tgxvr+t1dYWV1b3yhulra2d3b3yvsHTZOkmrIGTUSiWxEaJrhkDcutYC2lGcaRYA/R6GbqPzwxbXgi7+1YsTDGgeR9TtE6qdUZoVJ4FXTLFb/qz0CWSZCTCuSod8tfnV5C05hJSwUa0w58ZcMMteVUsEmpkxqmkI5wwNqOSoyZCbPZvRNy4pQe6SfalbRkpv6eyDA2ZhxHrjNGOzSL3lT8z2untn8ZZlyq1DJJ54v6qSA2IdPnSY9rRq0YO4JUc3croUPUSK2LqORCCBZfXibNs2rgV4O780rtOo+jCEdwDKcQwAXU4Bbq0AAKAp7hFd68R+/Fe/c+5q0FL585hD/wPn8Ai+CPoQ==</latexit>

0.0 0.2 0.4 0.6 0.8 1.0

0.00

60.

008

0.01

00.

012 E Error

R(alpha)

0.0 0.2 0.4 0.6 0.8 1.0

0.006

0.008

0.010

0.012

↵ = p/N<latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit><latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit><latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit><latexit sha1_base64="dadLJSVMX+K2Z5x9P6fR+QzKsK0=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69BIvgqSYi6EUoevEkFewHtqFMtpt26Waz7G6EEvovvHhQxKv/xpv/xm2bg1YfDDzem2FmXig508bzvpzC0vLK6lpxvbSxubW9U97da+okVYQ2SMIT1Q5RU84EbRhmOG1LRTEOOW2Fo+up33qkSrNE3JuxpEGMA8EiRtBY6aGLXA7xUp7c9soVr+rN4P4lfk4qkKPeK392+wlJYyoM4ah1x/ekCTJUhhFOJ6VuqqlEMsIB7VgqMKY6yGYXT9wjq/TdKFG2hHFn6s+JDGOtx3FoO2M0Q73oTcX/vE5qoosgY0KmhgoyXxSl3DWJO33f7TNFieFjS5AoZm91yRAVEmNDKtkQ/MWX/5LmadX3qv7dWaV2lcdRhAM4hGPw4RxqcAN1aAABAU/wAq+Odp6dN+d93lpw8pl9+AXn4xvuWZBq</latexit>

risk

= 2<latexit sha1_base64="YL1ePw28IevvBzwsoV26KDEGt2I=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKoBeh6MVjBfuBbSiT7aZdutmE3Y1QQv+FFw+KePXfePPfuG1z0NYHA4/3ZpiZFySCa+O6305hbX1jc6u4XdrZ3ds/KB8etXScKsqaNBax6gSomeCSNQ03gnUSxTAKBGsH49uZ335iSvNYPphJwvwIh5KHnKKx0mNvjEmC5JrU+uWKW3XnIKvEy0kFcjT65a/eIKZpxKShArXuem5i/AyV4VSwaamXapYgHeOQdS2VGDHtZ/OLp+TMKgMSxsqWNGSu/p7IMNJ6EgW2M0Iz0sveTPzP66YmvPIzLpPUMEkXi8JUEBOT2ftkwBWjRkwsQaq4vZXQESqkxoZUsiF4yy+vklat6rlV7/6iUr/J4yjCCZzCOXhwCXW4gwY0gYKEZ3iFN0c7L86787FoLTj5zDH8gfP5Az3oj/Y=</latexit><latexit sha1_base64="YL1ePw28IevvBzwsoV26KDEGt2I=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKoBeh6MVjBfuBbSiT7aZdutmE3Y1QQv+FFw+KePXfePPfuG1z0NYHA4/3ZpiZFySCa+O6305hbX1jc6u4XdrZ3ds/KB8etXScKsqaNBax6gSomeCSNQ03gnUSxTAKBGsH49uZ335iSvNYPphJwvwIh5KHnKKx0mNvjEmC5JrU+uWKW3XnIKvEy0kFcjT65a/eIKZpxKShArXuem5i/AyV4VSwaamXapYgHeOQdS2VGDHtZ/OLp+TMKgMSxsqWNGSu/p7IMNJ6EgW2M0Iz0sveTPzP66YmvPIzLpPUMEkXi8JUEBOT2ftkwBWjRkwsQaq4vZXQESqkxoZUsiF4yy+vklat6rlV7/6iUr/J4yjCCZzCOXhwCXW4gwY0gYKEZ3iFN0c7L86787FoLTj5zDH8gfP5Az3oj/Y=</latexit><latexit sha1_base64="YL1ePw28IevvBzwsoV26KDEGt2I=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKoBeh6MVjBfuBbSiT7aZdutmE3Y1QQv+FFw+KePXfePPfuG1z0NYHA4/3ZpiZFySCa+O6305hbX1jc6u4XdrZ3ds/KB8etXScKsqaNBax6gSomeCSNQ03gnUSxTAKBGsH49uZ335iSvNYPphJwvwIh5KHnKKx0mNvjEmC5JrU+uWKW3XnIKvEy0kFcjT65a/eIKZpxKShArXuem5i/AyV4VSwaamXapYgHeOQdS2VGDHtZ/OLp+TMKgMSxsqWNGSu/p7IMNJ6EgW2M0Iz0sveTPzP66YmvPIzLpPUMEkXi8JUEBOT2ftkwBWjRkwsQaq4vZXQESqkxoZUsiF4yy+vklat6rlV7/6iUr/J4yjCCZzCOXhwCXW4gwY0gYKEZ3iFN0c7L86787FoLTj5zDH8gfP5Az3oj/Y=</latexit><latexit sha1_base64="YL1ePw28IevvBzwsoV26KDEGt2I=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBbBU0mKoBeh6MVjBfuBbSiT7aZdutmE3Y1QQv+FFw+KePXfePPfuG1z0NYHA4/3ZpiZFySCa+O6305hbX1jc6u4XdrZ3ds/KB8etXScKsqaNBax6gSomeCSNQ03gnUSxTAKBGsH49uZ335iSvNYPphJwvwIh5KHnKKx0mNvjEmC5JrU+uWKW3XnIKvEy0kFcjT65a/eIKZpxKShArXuem5i/AyV4VSwaamXapYgHeOQdS2VGDHtZ/OLp+TMKgMSxsqWNGSu/p7IMNJ6EgW2M0Iz0sveTPzP66YmvPIzLpPUMEkXi8JUEBOT2ftkwBWjRkwsQaq4vZXQESqkxoZUsiF4yy+vklat6rlV7/6iUr/J4yjCCZzCOXhwCXW4gwY0gYKEZ3iFN0c7L86787FoLTj5zDH8gfP5Az3oj/Y=</latexit>

Figure: The asymptotic risk function Rκ as a function of α (with σ = 0, n = 300,N = 1000, β = n/N = 0.3 and κ = 1, 2 respectively). The location of α∗ from Theorem1 is marked with a black circle. In both cases, the asymptotic risk at α = 1 is lower than theasymptotic risk at α∗.

https://arxiv.org/abs/1906.01139 [email protected], [email protected]