
Regularization Parameter Estimation for Least Squares: Using the χ²-curve

Rosemary Renaut, Jodi Mead
Supported by NSF

Arizona State and Boise State

Harrachov, August 2007

Outline

Introduction

Methods

Examples

Chi-squared Method: Background, Algorithm, Single Variable Newton Method, Extension to General D (Generalized Tikhonov)

Results

Conclusions

References

Regularized Least Squares for Ax = b

- Ill-posed system: A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n.

- Generalized Tikhonov regularization with operator D on x:

  x̂ = argmin J(x) = argmin { ‖Ax − b‖²_Wb + ‖D(x − x0)‖²_Wx }.  (1)

- Assume N(A) ∩ N(D) = {0}.
- Statistically, Wb is the inverse covariance matrix for the data b.
- Standard choice: Wx = λ²I, with λ the unknown penalty parameter.

Focus: How to find λ?
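
As a concrete reference for what follows, here is a minimal sketch of solving (1) for a fixed λ (numpy; the function name, the random test data, and the defaults Wb = I, x0 = 0 are illustrative assumptions, not part of the talk):

```python
import numpy as np

def tikhonov_solve(A, b, D, lam, x0=None, Wb=None):
    """Solve min ||A x - b||^2_Wb + lam^2 ||D (x - x0)||^2 via the normal equations."""
    m, n = A.shape
    x0 = np.zeros(n) if x0 is None else x0
    Wb = np.eye(m) if Wb is None else Wb
    lhs = A.T @ Wb @ A + lam**2 * (D.T @ D)
    rhs = A.T @ Wb @ (b - A @ x0)
    return x0 + np.linalg.solve(lhs, rhs)

# Illustrative use with random data (not one of the talk's test problems)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
b = A @ np.sin(np.linspace(0, np.pi, 30)) + 0.01 * rng.standard_normal(50)
x_hat = tikhonov_solve(A, b, np.eye(30), lam=0.5)
```

Every method below is a rule for choosing λ in this solve.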

Standard Methods I: L-curve - Find the corner

1. Let r(λ) = b − A x(λ) be the residual, where the influence matrix is A(λ) = A(Aᵀ Wb A + λ² Dᵀ D)⁻¹ Aᵀ. Plot (log ‖D x(λ)‖, log ‖r(λ)‖).

2. Trade off the two contributions.

3. Expensive: requires a range of λ values.

4. The GSVD makes the calculations efficient.

5. Uses no statistical information.

Figures: one L-curve with a corner to find, one with no corner.
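
A minimal sketch of evaluating the L-curve coordinates over a grid of λ values (numpy; the helper name and the random test data are illustrative, and corner detection is not shown):

```python
import numpy as np

def l_curve_points(A, b, D, lambdas, Wb=None):
    """Return (log ||D x(lam)||, log ||r(lam)||) for each lam, with r(lam) = b - A x(lam)."""
    m, n = A.shape
    Wb = np.eye(m) if Wb is None else Wb
    pts = []
    for lam in lambdas:
        x = np.linalg.solve(A.T @ Wb @ A + lam**2 * (D.T @ D), A.T @ Wb @ b)
        r = b - A @ x
        pts.append((np.log(np.linalg.norm(D @ x)), np.log(np.linalg.norm(r))))
    return pts

# Illustrative use with random data (not the talk's test problems)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
b = A @ np.sin(np.linspace(0, np.pi, 30)) + 0.01 * rng.standard_normal(50)
curve = l_curve_points(A, b, np.eye(30), np.logspace(-4, 2, 30))
```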

Standard Methods II: Generalized Cross-Validation (GCV)

1. Minimizes the GCV function

   ‖b − A x(λ)‖²_Wb / [trace(Im − A(Wx))]²,   Wx = λ² In,

   which estimates the predictive risk.

2. Expensive: requires a range of λ values.

3. The GSVD makes the calculations efficient.

4. Uses statistical information.

Figures: the GCV function may have multiple minima, or may be nearly flat near the minimum.
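
A minimal sketch of the GCV function for the simplified case Wb = I, x0 = 0 (the function name is illustrative):

```python
import numpy as np

def gcv(A, b, D, lam):
    """GCV function, simplified to Wb = I and x0 = 0:
    ||b - A x(lam)||^2 / [trace(I_m - A(lam))]^2."""
    m = b.size
    M = A.T @ A + lam**2 * (D.T @ D)
    x = np.linalg.solve(M, A.T @ b)                      # Tikhonov solution for this lam
    trace_infl = np.trace(A @ np.linalg.solve(M, A.T))   # trace of the influence matrix
    return np.linalg.norm(b - A @ x) ** 2 / (m - trace_infl) ** 2
```

The parameter is taken as the minimizer of this function over a grid of λ values, which is why a range of λ is needed.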

Standard Methods III: Unbiased Predictive Risk Estimation (UPRE)

1. Minimize the expected value of the predictive risk, i.e. minimize the UPRE function

   ‖b − A x(λ)‖²_Wb + 2 trace(A(Wx)) − m.

2. Expensive: requires a range of λ values.

3. The GSVD makes the calculations efficient.

4. Uses statistical information.

5. A minimum of the function is needed.
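
A matching sketch of the UPRE function, again assuming Wb = I and x0 = 0 (illustrative names); as with GCV, it is minimized over a grid of λ:

```python
import numpy as np

def upre(A, b, D, lam):
    """UPRE function, simplified to Wb = I and x0 = 0:
    ||b - A x(lam)||^2 + 2 trace(A(lam)) - m."""
    m = b.size
    M = A.T @ A + lam**2 * (D.T @ D)
    x = np.linalg.solve(M, A.T @ b)                      # Tikhonov solution for this lam
    trace_infl = np.trace(A @ np.linalg.solve(M, A.T))   # trace of the influence matrix
    return np.linalg.norm(b - A @ x) ** 2 + 2.0 * trace_infl - m
```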

An Illustrative Example: phillips Fredholm integral equation (Hansen)

1. Add noise to b.

2. Standard deviation: σ_bi = .01|bi| + .1 bmax.

3. Covariance matrix: Cb = σb² Im = Wb⁻¹.

4. σb² is the average of the σ_bi².

5. In the figures, − is the original b and ∗ the noisy data.

6. Each method gives a different solution: o is the L-curve solution.

7. + is the reference solution.

Figures: example with 10% error; comparison with the new method.
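
A minimal sketch of this noise model, applied to a stand-in right-hand side (the talk uses b from Hansen's phillips test problem, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(-6, 6, 64)
b = np.maximum(0.0, 1.0 + np.cos(np.pi * t / 3.0))     # stand-in for the phillips data

sigma_bi = 0.01 * np.abs(b) + 0.1 * b.max()            # componentwise std. dev. from the slide
b_noisy = b + sigma_bi * rng.standard_normal(b.size)   # noisy data

sigma_b2 = np.mean(sigma_bi**2)                        # sigma_b^2: average of the sigma_bi^2
Cb = sigma_b2 * np.eye(b.size)                         # Cb = sigma_b^2 I_m = Wb^{-1}
Wb = np.eye(b.size) / sigma_b2
```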

General Result (Tikhonov, D = I): the cost functional at its minimum is a χ² r.v.

Theorem (Rao 1973; Tarantola; Mead 2007)

J(x) = (b − Ax)ᵀ Cb⁻¹ (b − Ax) + (x − x0)ᵀ Cx⁻¹ (x − x0).

- x and b are stochastic (they need not be normally distributed).
- The elements of r = b − A x0 are iid.
- The matrices Cb = Wb⁻¹ and Cx = Wx⁻¹ are SPD.
- Then, for large m, the minimum value of J is a random variable that follows a χ² distribution with m degrees of freedom.

Implication: Find Wx such that J is a χ² r.v.

- The theorem implies

  m − √(2m) zα/2 < J(x̂) < m + √(2m) zα/2

  for a (1 − α) confidence interval, with x̂ the solution.

- Equivalently, when D = I,

  m − √(2m) zα/2 < rᵀ (A Cx Aᵀ + Cb)⁻¹ r < m + √(2m) zα/2.

- Having found Wx, the posterior inverse covariance matrix is

  W̃x = Aᵀ Wb A + Wx.

Note that Wx is completely general.
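
A minimal sketch of checking the theorem numerically: when Cb and Cx are the true covariances, J(x̂) = rᵀ(A Cx Aᵀ + Cb)⁻¹ r should behave like a χ²_m variable, so its sample mean should be close to m. The sizes, covariances, and random test matrix below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 80, 40
A = rng.standard_normal((m, n))
Cb = 0.01 * np.eye(m)        # data covariance (assumed known)
Cx = 0.25 * np.eye(n)        # model covariance (assumed known)
x0 = np.zeros(n)

J_vals = []
for _ in range(2000):
    x = x0 + np.sqrt(0.25) * rng.standard_normal(n)      # x ~ N(x0, Cx)
    b = A @ x + np.sqrt(0.01) * rng.standard_normal(m)   # noise ~ N(0, Cb)
    r = b - A @ x0
    J_vals.append(r @ np.linalg.solve(A @ Cx @ A.T + Cb, r))   # J at the minimizer

print(np.mean(J_vals), m)    # sample mean of J(x_hat) should be close to m
```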

A New Algorithm for Estimating Model Covariance

Algorithm (Mead 07). Given the confidence interval parameter α, the initial residual r = b − A x0, and an estimate of the data covariance Cb, find Lx which solves the nonlinear optimization problem

  minimize ‖Lx Lxᵀ‖²_F

  subject to m − √(2m) zα/2 < rᵀ (A Lx Lxᵀ Aᵀ + Cb)⁻¹ r < m + √(2m) zα/2,

             A Lx Lxᵀ Aᵀ + Cb well-conditioned.

Expensive.

Single Variable Approach: Seek efficient, practical algorithm

1. Let Wx = σx⁻² I, where the regularization parameter is λ = 1/σx.

2. Use the SVD Ub Σb Vbᵀ = Wb^{1/2} A, with singular values σ1 ≥ σ2 ≥ … ≥ σp, and define s = Ubᵀ Wb^{1/2} r.

3. Find σx such that

   m − √(2m) zα/2 < sᵀ diag(1/(σi² σx² + 1)) s < m + √(2m) zα/2.

4. Equivalently, find σx such that

   F(σx) = sᵀ diag(1/(1 + σx² σi²)) s − m = 0.

Scalar Root Finding: Newton’s Method
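
A minimal sketch of the scalar Newton iteration for F(σx) = 0 in the D = I case, assuming s = Ubᵀ Wb^{1/2} r and the singular values (padded with zeros for i > n) have already been computed; the function name and stopping rule are illustrative:

```python
import numpy as np

def chi2_newton(s, sigma, m, sigma_x0=1.0, tol=1e-10, max_iter=50):
    """Newton iteration for F(sigma_x) = sum_i s_i^2/(1 + sigma_x^2 sigma_i^2) - m = 0.

    s     : U_b^T W_b^(1/2) r, length m
    sigma : singular values of W_b^(1/2) A, padded with zeros to length m
    """
    sigma_x = sigma_x0
    for _ in range(max_iter):
        d = 1.0 + sigma_x**2 * sigma**2
        F = np.sum(s**2 / d) - m
        Fp = -2.0 * sigma_x * np.sum(sigma**2 * s**2 / d**2)   # F'(sigma_x)
        if Fp == 0.0:
            break
        step = F / Fp
        sigma_x = sigma_x - step
        if abs(step) <= tol * max(abs(sigma_x), 1.0):
            break
    return sigma_x, 1.0 / sigma_x        # sigma_x and the penalty parameter lambda = 1/sigma_x
```

If F(0) = ‖s‖² − m < 0, no positive root exists and the iteration should be abandoned, as noted in the observations below.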

Extension to Generalized Tikhonov

Define

x̂_GTik = argmin J_D(x) = argmin { ‖Ax − b‖²_Wb + ‖D(x − x0)‖²_Wx }.  (2)

Theorem. For large m, the minimum value of J_D is a random variable which follows a χ² distribution with m − n + p degrees of freedom.

Proof. Use the generalized singular value decomposition of [Wb^{1/2} A, Wx^{1/2} D].

Find Wx such that J_D is χ² with m − n + p degrees of freedom.

Newton Root Finding: Wx = σx⁻² Ip

- GSVD of [Wb^{1/2} A, D]:

  A = U [Υ; 0_{(m−n)×n}] Xᵀ,   D = V [M, 0_{p×(n−p)}] Xᵀ.

- The γi are the generalized singular values.

- m̃ = m − n + p − Σ_{i=1}^{p} si² δ_{γi,0} − Σ_{i=n+1}^{m} si².

- s̃i = si/(γi² σx² + 1), i = 1, …, p, and ti = s̃i γi.

Solve F = 0, where

  F(σx) = sᵀ s̃ − m̃   and   F′(σx) = −2 σx ‖t‖²₂.
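
A minimal sketch of F and F′ for the generalized Tikhonov case, assuming the generalized singular values γi, the first p components of s, and m̃ have already been computed from the GSVD (names are illustrative); this pair is what the Newton update σx ← σx − F/F′ uses:

```python
import numpy as np

def F_and_Fprime(sigma_x, s, gamma, m_tilde):
    """F(sigma_x) = s^T s_tilde - m_tilde and F'(sigma_x) = -2 sigma_x ||t||_2^2,
    with s_tilde_i = s_i/(gamma_i^2 sigma_x^2 + 1) and t_i = s_tilde_i gamma_i."""
    s_tilde = s / (gamma**2 * sigma_x**2 + 1.0)
    t = s_tilde * gamma
    return float(s @ s_tilde) - m_tilde, -2.0 * sigma_x * float(t @ t)
```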

Observations: Example F

- Initialization: GCV, UPRE, L-curve, and the χ² method all use the GSVD (or SVD).

- The algorithm is cheap compared to GCV, UPRE, and the L-curve.

- F is monotonically decreasing (and even in σx).

- The solution either exists and is unique for positive σx, or no solution exists, in which case F(0) < 0.

Relationship to Discrepancy Principle

- The discrepancy principle can be implemented by a Newton method.

- It finds σx such that the regularized residual satisfies

  σb² = (1/m) ‖b − A x(σ)‖²₂.  (3)

- Consistent with our notation,

  Σ_{i=1}^{p} (1/(γi² σ² + 1))² si² + Σ_{i=n+1}^{m} si² = m.  (4)

- This is similar to F = 0, but note that the weight in the first sum is squared in this case.
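
For comparison, a minimal sketch of the corresponding discrepancy-principle function built from (4), again assuming the GSVD quantities are available (illustrative names); a root of this function gives the discrepancy-principle parameter:

```python
import numpy as np

def discrepancy_G(sigma, s, gamma, m, n):
    """Left-hand side of (4) minus m; the filter weight 1/(gamma_i^2 sigma^2 + 1) is
    squared here, unlike in the chi-squared function F."""
    p = gamma.size
    w = 1.0 / (gamma**2 * sigma**2 + 1.0)
    return float(np.sum(w**2 * s[:p]**2) + np.sum(s[n:]**2)) - m
```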

Some Solutions: with no prior information x0

Illustrated are solutions and error bars

Panels: no statistical information (solution is smoothed); with statistical information, Cb = diag(σ_bi²).

Some Generalized Tikhonov Solutions: First Order Derivative

Panels: no statistical information; Cb = diag(σ_bi²).

Some Generalized Tikhonov Solutions: Prior x0: Solution not smoothed

Panels: no statistical information; Cb = diag(σ_bi²).

Some Generalized Tikhonov Solutions: x0 = 0: Exponential noise

Panels: no statistical information; Cb = diag(σ_bi²).

Newton's Method converges in 5-10 Iterations

l   cb   Iterations k (mean)   Iterations k (std)
0   1    8.23e+00              6.64e−01
0   2    8.31e+00              9.80e−01
0   3    8.06e+00              1.06e+00
1   1    4.92e+00              5.10e−01
1   2    1.00e+01              1.16e+00
1   3    1.00e+01              1.19e+00
2   1    5.01e+00              8.90e−01
2   2    8.29e+00              1.48e+00
2   3    8.38e+00              1.50e+00

Table: Convergence characteristics for problem phillips with n = 40 over 500 runs.

Newton's Method converges in 5-10 Iterations

l   cb   Iterations k (mean)   Iterations k (std)
0   1    6.84e+00              1.28e+00
0   2    8.81e+00              1.36e+00
0   3    8.72e+00              1.46e+00
1   1    6.05e+00              1.30e+00
1   2    7.40e+00              7.68e−01
1   3    7.17e+00              8.12e−01
2   1    6.01e+00              1.40e+00
2   2    7.28e+00              8.22e−01
2   3    7.33e+00              8.66e−01

Table: Convergence characteristics for problem blur with n = 36 over 500 runs.

Estimating The Error and Predictive Risk

Error (mean)
l   cb   χ²         L-curve    GCV        UPRE
0   2    4.37e−03   4.39e−03   4.21e−03   4.22e−03
0   3    4.32e−03   4.42e−03   4.21e−03   4.22e−03
1   2    4.35e−03   5.17e−03   4.30e−03   4.30e−03
1   3    4.39e−03   5.05e−03   4.38e−03   4.37e−03
2   2    4.50e−03   6.68e−03   4.39e−03   4.56e−03
2   3    4.37e−03   6.66e−03   4.43e−03   4.54e−03

Table: Error characteristics for problem phillips with n = 60 over 500 runs with error-contaminated x0. Relative errors larger than .009 removed.

Results are comparable

Estimating The Error and Predictive Risk

Risk (mean)
l   cb   χ²         L-curve    GCV        UPRE
0   2    3.78e−02   5.22e−02   3.15e−02   2.92e−02
0   3    3.88e−02   5.10e−02   2.97e−02   2.90e−02
1   2    3.94e−02   5.71e−02   3.02e−02   2.74e−02
1   3    1.10e−01   5.90e−02   3.27e−02   2.79e−02
2   2    3.41e−02   6.00e−02   3.35e−02   3.79e−02
2   3    3.61e−02   5.98e−02   3.35e−02   3.82e−02

Table: Risk characteristics for problem phillips with n = 60 over 500 runs.

The χ² method does not give the best estimate of the risk.

Estimating The Error and Predictive Risk

Error Histogram

Normal noise on the right-hand side, first-order derivative, Cb = σ²I.

Estimating The Error and Predictive Risk

Error Histogram

Exponential noise on the right-hand side, first-order derivative, Cb = σ²I.

Conclusions

- The χ² Newton algorithm is cost effective.

- It performs as well as (or better than) GCV and UPRE when statistical information is available.

- It should be the method of choice when statistical information is provided.

- The method can be adapted to find Wb if Wx is provided.

Future Work

- Analysis for truncated expansions (TSVD and TGSVD): reduce the degrees of freedom.

- Further theoretical analysis and simulations with other noise distributions.

- Can it be extended to nonlinear regularization terms (e.g. TV)?

- Development of the nonlinear least squares approach for a general diagonal Wx.

- Efficient calculation of uncertainty information (the covariance matrix).

- Nonlinear problems?

Major References

1. Bennett, A., 2005. Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press.

2. Hansen, P. C., 1994. Regularization Tools: A Matlab Package for Analysis and Solution of Discrete Ill-posed Problems. Numerical Algorithms 6, 1-35.

3. Mead, J., 2007. A priori weighting for parameter estimation. J. Inverse and Ill-posed Problems, to appear.

4. Rao, C. R., 1973. Linear Statistical Inference and its Applications. Wiley, New York.

5. Tarantola, A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.

6. Vogel, C. R., 2002. Computational Methods for Inverse Problems. SIAM, Frontiers in Applied Mathematics.

blur: Atmospheric blur (Gaussian PSF) (Hansen), again with noise

Solution on the left and degraded data on the right.

blur: Atmospheric blur (Gaussian PSF) (Hansen), again with noise

Solutions using x0 = 0, generalized Tikhonov with second derivative, 5% noise.

blur: Atmospheric blur (Gaussian PSF) (Hansen), again with noise

Solutions using x0 = 0, generalized Tikhonov with second derivative, 10% noise.