
Unbiased Risk Estimation for Sparse Analysis Regularization

Charles Deledalle1, Samuel Vaiter1, Gabriel Peyré1, Jalal Fadili3 and Charles Dossal2

1CEREMADE, Université Paris-Dauphine; 2GREYC, ENSICAEN; 3IMB, Université Bordeaux I

Problem statement

Consider the convex but non-smooth Analysis Sparsity Regularization problem

$$x^\star(y,\lambda) \in \operatorname*{argmin}_{x \in \mathbb{R}^N} \; \frac{1}{2}\|y - \Phi x\|^2 + \lambda \|D^* x\|_1 \qquad (P_\lambda(y))$$

which aims at inverting

$$y = \Phi x_0 + w$$

by promoting sparsity, with

- $x_0 \in \mathbb{R}^N$ the unknown image of interest,
- $y \in \mathbb{R}^Q$ the low-dimensional noisy observation of $x_0$,
- $\Phi \in \mathbb{R}^{Q \times N}$ a linear operator that models the acquisition process,
- $w \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_Q)$ the noise component,
- $D \in \mathbb{R}^{N \times P}$ an analysis dictionary, and
- $\lambda > 0$ a regularization parameter.
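As a concrete reading of the objective above, the following minimal sketch evaluates it for a candidate $x$. The function names, the toy operator `phi` and the 1-D finite-difference dictionary are illustrative assumptions, not part of the poster.

```python
import numpy as np

def analysis_objective(x, y, phi, d, lam):
    """Value of 0.5 * ||y - Phi x||^2 + lam * ||D^* x||_1."""
    residual = y - phi @ x
    return 0.5 * residual @ residual + lam * np.abs(d.T @ x).sum()

def finite_difference_dictionary(n):
    """Hypothetical N x (N-1) dictionary with (D^* x)_i = x[i+1] - x[i]."""
    d = np.zeros((n, n - 1))
    idx = np.arange(n - 1)
    d[idx, idx] = -1.0
    d[idx + 1, idx] = 1.0
    return d

# Toy instance (assumed values): N = 4 unknowns, Q = 2 observations.
phi = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]])
d = finite_difference_dictionary(4)
x = np.array([1.0, 1.0, 3.0, 3.0])   # piecewise constant, so D^* x is sparse
y = np.array([1.5, 6.0])
value = analysis_objective(x, y, phi, d, lam=2.0)
```

Here the data term contributes 0.125 and the penalty contributes 4, since $D^* x$ has a single non-zero entry equal to 2.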

How to choose the value of the parameter λ?

Risk-based selection of λ

- Risk associated with $\lambda$: a measure of the expected quality of $x^\star(y,\lambda)$ with respect to $x_0$,

$$R(\lambda) = \mathbb{E}_w \|x^\star(y,\lambda) - x_0\|^2 .$$

- The optimal (theoretical) $\lambda$ minimizes the risk.

The risk is unknown since it depends on $x_0$.

Can we estimate the risk solely from $x^\star(y,\lambda)$?

Risk estimation

- Assume $y \mapsto \Phi x^\star(y,\lambda)$ is weakly differentiable (a fortiori uniquely defined).

Prediction risk estimation via SURE

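- The Stein Unbiased Risk Estimator (SURE):

$$\mathrm{SURE}(y,\lambda) = \|y - \Phi x^\star(y,\lambda)\|^2 - \sigma^2 Q + 2\sigma^2 \underbrace{\operatorname{tr}\left(\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\right)}_{\text{estimator of the DOF}}$$

is an unbiased estimator of the prediction risk [Stein, 1981]:

$$\mathbb{E}_w(\mathrm{SURE}(y,\lambda)) = \mathbb{E}_w(\|\Phi x_0 - \Phi x^\star(y,\lambda)\|^2) .$$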
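The SURE formula can be sketched in a few lines once an estimate of the degrees of freedom is available. The example below assumes the simplest setting $\Phi = D = \mathrm{Id}$, where the solution is the soft-thresholding of $y$ and the trace of its Jacobian is the number of non-zero entries; this setting, the function names and the numbers are illustrative assumptions rather than the poster's experiments.

```python
import numpy as np

def sure(y, prediction, sigma, dof):
    """SURE = ||y - prediction||^2 - sigma^2 * Q + 2 * sigma^2 * dof, with Q = len(y)."""
    residual = y - prediction
    return residual @ residual - sigma**2 * y.size + 2.0 * sigma**2 * dof

def soft_threshold(y, lam):
    """Closed-form solution when Phi = D = Id (assumed toy setting)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, -0.5, 0.2, -2.0])
sigma, lam = 1.0, 1.0
x_star = soft_threshold(y, lam)      # two entries survive the threshold
dof = np.count_nonzero(x_star)       # trace of the Jacobian in this toy setting
estimate = sure(y, x_star, sigma, dof)
```

Only $y$, $\sigma$ and the estimate itself are needed, so the quantity can be evaluated for several values of $\lambda$ without access to $x_0$.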

Projection risk estimation via GSURE

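- Let $\Pi = \Phi^*(\Phi\Phi^*)^+\Phi$ be the orthogonal projector on $\ker(\Phi)^\perp = \operatorname{Im}(\Phi^*)$,
- Denote $x_{ML}(y) = \Phi^*(\Phi\Phi^*)^+ y$,
- The Generalized Stein Unbiased Risk Estimator (GSURE):

$$\mathrm{GSURE}(y,\lambda) = \|x_{ML}(y) - \Pi x^\star(y,\lambda)\|^2 - \sigma^2 \operatorname{tr}((\Phi\Phi^*)^+) + 2\sigma^2 \operatorname{tr}\left((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\right)$$

is an unbiased estimator of the projection risk [Vaiter et al., 2012]:

$$\mathbb{E}_w(\mathrm{GSURE}(y,\lambda)) = \mathbb{E}_w(\|\Pi x_0 - \Pi x^\star(y,\lambda)\|^2)$$

(see also [Eldar, 2009, Pesquet et al., 2009, Vonesch et al., 2008] for similar results).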
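A minimal sketch of how the GSURE expression could be assembled with dense linear algebra is given below. The trace term is taken as an input here, and the function name `gsure`, the toy operator and all numerical values (including `trace_term=1.0`) are assumptions made for illustration only.

```python
import numpy as np

def gsure(y, x_star, phi, sigma, trace_term):
    """GSURE from its three terms; trace_term stands for tr((Phi Phi^*)^+ dPhi x*/dy)."""
    gram_pinv = np.linalg.pinv(phi @ phi.T)   # (Phi Phi^*)^+
    projector = phi.T @ gram_pinv @ phi       # Pi, projector on Im(Phi^*)
    x_ml = phi.T @ gram_pinv @ y              # x_ML(y)
    diff = x_ml - projector @ x_star
    return (diff @ diff
            - sigma**2 * np.trace(gram_pinv)
            + 2.0 * sigma**2 * trace_term)

# Toy instance (assumed values): Q = 2 observations of N = 3 unknowns.
phi = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 1.0]])
y = np.array([2.0, 4.0])
x_star = np.array([1.0, 1.0, 3.0])            # hypothetical candidate solution
value = gsure(y, x_star, phi, sigma=1.0, trace_term=1.0)
```

Forming the pseudo-inverse explicitly is only reasonable for small dense operators; for large problems one would apply $\Phi$ and $\Phi^*$ as functions instead.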

Illustration of risk estimation

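[Figure: schematic distinguishing the unknown quantities from the known ones, with the parts handled by SURE and GSURE labelled.]

(here, $x^\star$ denotes $x^\star(y,\lambda)$ for an arbitrary value of $\lambda$)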

How to estimate the quantity $\operatorname{tr}\left((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\right)$?

Main notations and assumptions

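- Let $I = \operatorname{supp}(D^* x^\star(y,\lambda))$ be the support of $D^* x^\star(y,\lambda)$,
- Let $J = I^c$ be the co-support of $D^* x^\star(y,\lambda)$,
- Let $D_I$ be the submatrix of $D$ whose columns are indexed by $I$,
- Let $s_I = \operatorname{sign}(D^* x^\star(y,\lambda))_I$ be the subvector of $\operatorname{sign}(D^* x^\star(y,\lambda))$ whose entries are indexed by $I$,
- Let $G_J = \operatorname{Ker} D_J^*$ be the "cospace" associated to $x^\star(y,\lambda)$,
- To study the local behaviour of $x^\star(y,\lambda)$, we require $\Phi$ to be "invertible" on $G_J$:

$$G_J \cap \operatorname{Ker} \Phi = \{0\},$$

- This allows us to define the matrix

$$A^{[J]} = U (U^* \Phi^* \Phi U)^{-1} U^*,$$

where $U$ is a matrix whose columns form a basis of $G_J$,

- In this case, we obtain an implicit equation:

$$x^\star(y,\lambda) \text{ solution of } P_\lambda(y) \;\Leftrightarrow\; x^\star(y,\lambda) = \hat{x}(y,\lambda) \triangleq A^{[J]} \Phi^* y - \lambda A^{[J]} D_I s_I .$$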
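The notations above translate fairly directly into code. The sketch below builds $I$, $J$, a basis $U$ of $G_J$ (through an SVD, which is one possible choice) and $A^{[J]}$, then evaluates the affine expression; the function name, the tolerance and the toy setting $\Phi = D = \mathrm{Id}$ with a soft-thresholded solution are assumptions for illustration.

```python
import numpy as np

def local_affine_solution(y, x_star, phi, d, lam, tol=1e-10):
    """Return (x_hat, A, U) built from the support of D^* x_star."""
    corr = d.T @ x_star
    support = np.abs(corr) > tol                 # index set I; J is its complement
    d_i, d_j = d[:, support], d[:, ~support]
    s_i = np.sign(corr[support])
    if d_j.shape[1] == 0:
        u = np.eye(d.shape[0])                   # empty co-support: G_J is the whole space
    else:
        # Rows of vt beyond the rank of D_J^* span Ker D_J^* = G_J.
        _, sing, vt = np.linalg.svd(d_j.T)
        rank = int(np.sum(sing > tol))
        u = vt[rank:].T
    # A^{[J]} = U (U^* Phi^* Phi U)^{-1} U^*, assuming Phi is injective on G_J.
    a = u @ np.linalg.solve(u.T @ phi.T @ phi @ u, u.T)
    x_hat = a @ phi.T @ y - lam * (a @ (d_i @ s_i))
    return x_hat, a, u

# Toy check with Phi = D = Id, where the solution is a soft-thresholding of y.
y = np.array([3.0, -0.5, 0.2, -2.0])
lam = 1.0
x_star = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
x_hat, a, u = local_affine_solution(y, x_star, np.eye(4), np.eye(4), lam)
```

In this toy case $A^{[J]}$ reduces to the projector onto the coordinates in $I$, and `x_hat` coincides with `x_star`.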

Is this relation true in a neighbourhood of (y,λ)?

Theorem (Local Parameterization)

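- Even if the solutions $x^\star(y,\lambda)$ of $P_\lambda(y)$ might not be unique, $\Phi x^\star(y,\lambda)$ is uniquely defined.
- If $(y,\lambda) \notin \mathcal{H}$, then for $(\bar{y},\bar{\lambda})$ close to $(y,\lambda)$, $\hat{x}(\bar{y},\bar{\lambda})$ is a solution of $P_{\bar{\lambda}}(\bar{y})$, where

$$\hat{x}(\bar{y},\bar{\lambda}) = A^{[J]} \Phi^* \bar{y} - \bar{\lambda} A^{[J]} D_I s_I .$$

- Hence, we can write

$$\frac{\partial \Phi x^\star(y,\lambda)}{\partial y} = \Phi A^{[J]} \Phi^* ,$$

- Moreover, the DOF can be estimated by

$$\operatorname{tr}\left(\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\right) = \dim(G_J) .$$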
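The step from the Jacobian $\Phi A^{[J]} \Phi^*$ to the dimension of $G_J$ follows from the cyclic property of the trace. A short derivation, assuming as above that the columns of $U$ form a basis of $G_J$ and that $U^* \Phi^* \Phi U$ is invertible:

```latex
\begin{aligned}
\operatorname{tr}\left(\Phi A^{[J]} \Phi^*\right)
  &= \operatorname{tr}\left(\Phi U (U^* \Phi^* \Phi U)^{-1} U^* \Phi^*\right) \\
  &= \operatorname{tr}\left((U^* \Phi^* \Phi U)^{-1} U^* \Phi^* \Phi U\right) \\
  &= \operatorname{tr}\left(\mathrm{Id}_{\dim(G_J)}\right) = \dim(G_J) .
\end{aligned}
```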

Can we compute this quantity efficiently?

[Figure: two-dimensional illustration with axes $x_1$ and $x_2$; the remaining labels are not recoverable from the extracted text.]

Computation of GSURE

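- One has, for $Z \sim \mathcal{N}(0, \mathrm{Id}_Q)$,

$$\operatorname{tr}\left((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\right) = \mathbb{E}_Z\left(\langle \nu(Z), \Phi^*(\Phi\Phi^*)^+ Z \rangle\right)$$

where, for any $z \in \mathbb{R}^Q$, $\nu = \nu(z)$ solves the following linear system

$$\begin{pmatrix} \Phi^*\Phi & D_J \\ D_J^* & 0 \end{pmatrix} \begin{pmatrix} \nu \\ \tilde{\nu} \end{pmatrix} = \begin{pmatrix} \Phi^* z \\ 0 \end{pmatrix} .$$

- In practice, by the law of large numbers, the expectation is replaced by an empirical mean.
- The computation of $\nu(z)$ is achieved by solving the linear system with a conjugate gradient solver.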
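The Monte Carlo procedure above can be sketched as follows. For brevity this sketch solves the linear system with a dense least-squares routine rather than the conjugate gradient solver mentioned above, and the function names, the random operator, the 1-D finite-difference dictionary and the chosen co-support are all assumptions for illustration.

```python
import numpy as np

def solve_nu(z, phi, d_j):
    """Solve [[Phi^*Phi, D_J], [D_J^*, 0]] [nu; nu_tilde] = [Phi^* z; 0] and return nu."""
    n, m = phi.shape[1], d_j.shape[1]
    kkt = np.block([[phi.T @ phi, d_j],
                    [d_j.T, np.zeros((m, m))]])
    rhs = np.concatenate([phi.T @ z, np.zeros(m)])
    # Dense least-squares solve, used here in place of an iterative solver.
    sol = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
    return sol[:n]

def trace_term(phi, d_j, probes):
    """Empirical mean of <nu(z), Phi^* (Phi Phi^*)^+ z> over the rows of probes."""
    back = phi.T @ np.linalg.pinv(phi @ phi.T)   # Phi^* (Phi Phi^*)^+
    vals = [solve_nu(z, phi, d_j) @ (back @ z) for z in probes]
    return float(np.mean(vals))

rng = np.random.default_rng(0)
n, q = 6, 4
phi = rng.standard_normal((q, n))                # hypothetical acquisition operator
d = np.eye(n, n - 1, k=-1) - np.eye(n, n - 1)    # 1-D finite differences
d_j = d[:, [0, 1, 3, 4]]                         # assumed co-support J, so dim(G_J) = 2
probes = rng.standard_normal((200, q))           # realizations of Z ~ N(0, Id_Q)
estimate = trace_term(phi, d_j, probes)
```

The first block row of the system together with the constraint $D_J^* \nu = 0$ gives $\nu(z) = A^{[J]} \Phi^* z$, so the empirical mean approximates $\operatorname{tr}((\Phi\Phi^*)^+ \Phi A^{[J]} \Phi^*)$ without forming $A^{[J]}$ explicitly.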

Numerical example

Super-resolution using (anisotropic) Total-Variation

[Figure: (a) $y$; (b) $x^\star(y,\lambda)$ at the optimal $\lambda$; plot of the quadratic loss (scale $\times 10^6$) against the regularization parameter $\lambda$ (from 2 to 12), with curves for the projection risk, GSURE and the true risk.]

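Compressed-sensing using multi-scale wavelet thresholding

[Figure: (c) $x_{ML}$; (d) $x^\star(y,\lambda)$ at the optimal $\lambda$; plot of the quadratic loss (scale $\times 10^6$) against the regularization parameter $\lambda$ (from 2 to 12), with curves for the projection risk, GSURE and the true risk.]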

Perspectives: How to efficiently minimize $\mathrm{GSURE}(y,\lambda)$ with respect to $\lambda$?

References

Eldar, Y. C. (2009). Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481.

Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009). A SURE approach for digital signal/image deconvolution problems. IEEE Transactions on Signal Processing, 57(12):4616–4632.

Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151.

Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., and Fadili, J. (2012). Local behavior of sparse analysis regularization: Applications to risk estimation. arXiv preprint arXiv:1204.3212.

Vonesch, C., Ramani, S., and Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In ICIP, pages 665–668. IEEE.

http://www.ceremade.dauphine.fr/~deledall/