Regularization Parameter Estimation for Least Squares: A Newton Method Using the χ²-Distribution
Rosemary Renaut, Jodi Mead
Arizona State University and Boise State University
September 2007
Outline
1. Introduction: ill-posed least squares; some standard methods
2. A statistically based method: the χ² method (background; algorithm; single-variable Newton method; extension to general D: generalized Tikhonov; observations)
3. Results
4. Conclusions
5. References
Regularized Least Squares for Ax = b

Assume A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, and the system is ill-posed.

Generalized Tikhonov regularization, with an operator D acting on x:
\[ \hat{x} = \arg\min J(x) = \arg\min \{ \|Ax - b\|^2_{W_b} + \|D(x - x_0)\|^2_{W_x} \}. \tag{1} \]

Assume N(A) ∩ N(D) = {0}. Statistically, W_b is the inverse covariance matrix for the data b. Standard choice: W_x = λ²I, with λ an unknown penalty parameter, so that (1) becomes
\[ \hat{x}(\lambda) = \arg\min J(x) = \arg\min \{ \|Ax - b\|^2_{W_b} + \lambda^2 \|D(x - x_0)\|^2 \}. \tag{2} \]

Question: What is the correct λ?
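For concreteness, a minimal sketch of solving (2) for one fixed λ via the normal equations; the function name and the dense solve are illustrative only, not the method of the talk:

```python
import numpy as np

def gen_tikhonov(A, b, D, x0, Wb, lam):
    """Minimizer of ||Ax - b||^2_Wb + lam^2 ||D(x - x0)||^2 for a fixed lam.

    Setting the gradient to zero gives the normal equations
    (A^T Wb A + lam^2 D^T D)(x - x0) = A^T Wb (b - A x0).
    """
    lhs = A.T @ Wb @ A + lam**2 * (D.T @ D)
    rhs = A.T @ Wb @ (b - A @ x0)
    return x0 + np.linalg.solve(lhs, rhs)
```

Every parameter-choice rule below selects the λ fed into a solve of this form.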
Some standard approaches I: L-curve (find the corner)

Let r(λ) = (I_m − A(λ)) b, with influence matrix
\[ A(\lambda) = A (A^T W_b A + \lambda^2 D^T D)^{-1} A^T W_b. \]
Plot the curve
\[ \big( \log \|r(\lambda)\|, \ \log \|Dx(\lambda)\| \big) \]
and trade off the two contributions.

- Expensive: requires a range of λ values.
- The GSVD makes the calculations efficient.
- Not statistically based.

[Figures: one L-curve with a clear corner to find, and one with no corner.]
Some standard approaches II: Generalized Cross-Validation (GCV)

Minimize the GCV function
\[ \frac{\|b - A x(\lambda)\|^2_{W_b}}{[\mathrm{trace}(I_m - A(\lambda))]^2}, \]
which estimates the predictive risk.

- Expensive: requires a range of λ values.
- The GSVD makes the calculations efficient.
- Statistically based.
- Requires a minimum.

[Figures: GCV curves with multiple minima, and sometimes flat near the minimum.]
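A dense sketch of evaluating the GCV function, assuming x₀ = 0 and forming the influence matrix explicitly; the GSVD version avoids rebuilding A(λ) for every λ in the sweep.

```python
import numpy as np

def gcv_value(A, b, D, Wb, lam):
    """GCV(lam) = ||b - A x(lam)||^2_Wb / [trace(I_m - A(lam))]^2  (x0 = 0)."""
    m = A.shape[0]
    # influence matrix A(lam) = A (A^T Wb A + lam^2 D^T D)^{-1} A^T Wb
    infl = A @ np.linalg.solve(A.T @ Wb @ A + lam**2 * (D.T @ D), A.T @ Wb)
    r = b - infl @ b                       # residual (I_m - A(lam)) b
    return (r @ Wb @ r) / np.trace(np.eye(m) - infl) ** 2
```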
Some standard approaches III: Unbiased Predictive Risk Estimation (UPRE)

Minimize the expected value of the predictive risk, i.e. minimize the UPRE function
\[ \|b - A x(\lambda)\|^2_{W_b} + 2\,\mathrm{trace}(A(\lambda)) - m. \]

- Expensive: requires a range of λ values.
- The GSVD makes the calculations efficient.
- Statistically based.
- A minimum is needed.
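The corresponding sketch for UPRE, under the same assumptions as the GCV snippet (x₀ = 0, dense influence matrix); a minimizer over a λ sweep is still required.

```python
import numpy as np

def upre_value(A, b, D, Wb, lam):
    """UPRE(lam) = ||b - A x(lam)||^2_Wb + 2 trace(A(lam)) - m  (x0 = 0)."""
    m = A.shape[0]
    infl = A @ np.linalg.solve(A.T @ Wb @ A + lam**2 * (D.T @ D), A.T @ Wb)
    r = b - infl @ b
    return r @ Wb @ r + 2.0 * np.trace(infl) - m
```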
Development

A new statistically based method: the χ² method.
- Its background
- A Newton algorithm
- Some examples
- Future work
General Result: Tikhonov (D = I). The cost functional at its minimum is a χ² random variable.

Theorem (Rao 1973; Tarantola 2005; Mead 2007)
Let
\[ J(x) = (b - Ax)^T C_b^{-1} (b - Ax) + (x - x_0)^T C_x^{-1} (x - x_0), \]
where
- x and b are stochastic (they need not be normal),
- the elements of r = b − Ax₀ are i.i.d. (assume no components are zero),
- the matrices C_b = W_b^{-1} and C_x = W_x^{-1} are SPD.
Then, for large m, the minimum value of J is a random variable, and it follows a χ² distribution with m degrees of freedom.
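The claim is easy to check empirically. A small simulation sketch, with Gaussian draws for simplicity (the theorem itself does not require normality) and all sizes and seeds chosen arbitrarily: the sample mean and variance of min J should approach m and 2m.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, N = 200, 50, 2000
A = rng.standard_normal((m, n))
sb, sx = 0.1, 0.5                 # assumed data / model standard deviations
x0 = np.zeros(n)

Jmin = []
for _ in range(N):
    x = x0 + sx * rng.standard_normal(n)        # draw model, then data
    b = A @ x + sb * rng.standard_normal(m)
    # minimizer of J for Cb = sb^2 I, Cx = sx^2 I
    lhs = A.T @ A / sb**2 + np.eye(n) / sx**2
    xh = x0 + np.linalg.solve(lhs, A.T @ (b - A @ x0) / sb**2)
    Jmin.append(np.sum((b - A @ xh) ** 2) / sb**2
                + np.sum((xh - x0) ** 2) / sx**2)

print(np.mean(Jmin), np.var(Jmin))   # expect roughly m = 200 and 2m = 400
```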
Implications

The theorem implies
\[ m - \sqrt{2m}\, z_{\alpha/2} < J(\hat{x}) < m + \sqrt{2m}\, z_{\alpha/2} \]
for a (1 − α) confidence interval, with x̂ the solution. Equivalently, when D = I,
\[ m - \sqrt{2m}\, z_{\alpha/2} < r^T (A C_x A^T + C_b)^{-1} r < m + \sqrt{2m}\, z_{\alpha/2}. \]

Note that no assumptions are made on W_x: it is completely general.

Can we use the result to obtain an efficient algorithm?
First attempt: a New Algorithm for Estimating the Model Covariance

Algorithm (Mead 07): Given the confidence interval parameter α, the initial residual r = b − Ax₀, and an estimate of the data covariance C_b, find L_x which solves the nonlinear optimization:

Minimize ‖L_x L_x^T‖²_F subject to
\[ m - \sqrt{2m}\, z_{\alpha/2} < r^T (A L_x L_x^T A^T + C_b)^{-1} r < m + \sqrt{2m}\, z_{\alpha/2}, \]
with A L_x L_x^T A^T + C_b well-conditioned.

Expensive!
Single Variable Approach: seek an efficient, practical algorithm

Let W_x = σ_x^{-2} I, where the regularization parameter is λ = 1/σ_x.

Use the SVD to implement: U_b Σ_b V_b^T = W_b^{1/2} A, with singular values σ₁ ≥ σ₂ ≥ ... ≥ σ_p, and define s = U_b^T W_b^{1/2} r. Find σ_x such that
\[ m - \sqrt{2m}\, z_{\alpha/2} < s^T \mathrm{diag}\!\left( \frac{1}{\sigma_i^2 \sigma_x^2 + 1} \right) s < m + \sqrt{2m}\, z_{\alpha/2}. \]
Equivalently, find σ_x² such that
\[ F(\sigma_x) = s^T \mathrm{diag}\!\left( \frac{1}{1 + \sigma_x^2 \sigma_i^2} \right) s - m = 0. \]

Scalar root finding: Newton's method.
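A minimal sketch of the resulting scalar Newton iteration for D = I, following the notation above. The Cholesky factor stands in for W_b^{1/2} (any factor R with W_b = R^T R gives the same weighted norm); safeguards such as the F(0) < 0 check and step bracketing are omitted.

```python
import numpy as np
from scipy.linalg import cholesky, svd

def chi2_newton(A, b, x0, Wb, sigma=1.0, tol=1e-8, maxit=50):
    """Newton root finding for F(sigma_x) = s^T diag(1/(1 + sigma_x^2 sigma_i^2)) s - m.

    Returns lambda = 1/sigma_x at the root.  The derivative is
    F'(sigma_x) = -2 sigma_x sum_i sigma_i^2 s~_i^2, with
    s~_i = s_i / (sigma_i^2 sigma_x^2 + 1).
    """
    m = len(b)
    R = cholesky(Wb)                      # Wb = R^T R, so ||v||_Wb = ||R v||_2
    U, sv, _ = svd(R @ A, full_matrices=True)
    s = U.T @ (R @ (b - A @ x0))          # rotated, whitened residual
    sv2 = np.zeros(m)
    sv2[: len(sv)] = sv**2                # sigma_i^2, zero-padded to length m
    for _ in range(maxit):
        st = s / (sv2 * sigma**2 + 1.0)   # s~
        F = s @ st - m
        if abs(F) < tol:
            break
        dF = -2.0 * sigma * np.sum(sv2 * st**2)
        sigma -= F / dF
    return 1.0 / sigma                    # regularization parameter lambda
```

The whole iteration reuses one SVD; each Newton step is O(m), which is the source of the cost advantage over sweeping λ.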
Extension to Generalized Tikhonov

Define
\[ \hat{x}_{\mathrm{GTik}} = \arg\min J_D(x) = \arg\min \{ \|Ax - b\|^2_{W_b} + \|D(x - x_0)\|^2_{W_x} \}. \tag{3} \]

Theorem
For large m, the minimum value of J_D is a random variable which follows a χ² distribution with m − n + p degrees of freedom (assuming that no components of r are zero).

Proof: use the Generalized Singular Value Decomposition of [W_b^{1/2} A; W_x^{1/2} D].

Goal: find W_x such that J_D is χ² with m − n + p degrees of freedom.
Newton Root Finding, W_x = σ_x^{-2} I_p

Let the GSVD of [W_b^{1/2} A; D] be
\[ W_b^{1/2} A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n) \times n} \end{bmatrix} X^T, \qquad D = V\, [M,\ 0_{p \times (n-p)}]\, X^T, \]
where
- γ_i are the generalized singular values,
- \( \tilde{m} = m - n + p - \sum_{i=1}^{p} s_i^2\, \delta_{\gamma_i 0} - \sum_{i=n+1}^{m} s_i^2 \) (the Kronecker delta removes the terms with γ_i = 0),
- \( \tilde{s}_i = s_i / (\gamma_i^2 \sigma_x^2 + 1) \) for i = 1, ..., p, and \( t_i = \tilde{s}_i \gamma_i \).

Find the root of
\[ \sum_{i=1}^{p} \frac{s_i^2}{\gamma_i^2 \sigma^2 + 1} + \sum_{i=n+1}^{m} s_i^2 = m - n + p, \]
that is, solve F = 0, where
\[ F(\sigma_x) = s^T \tilde{s} - \tilde{m} \quad \text{and} \quad F'(\sigma_x) = -2 \sigma_x \|t\|_2^2. \]
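In this notation a single Newton update is one line each for F and F′. A sketch, assuming γ, s, and m̃ have already been assembled as above (zero γ_i and the trailing s_i² folded into m̃):

```python
import numpy as np

def newton_step(sigma, gamma, s, m_tilde):
    """One Newton update sigma <- sigma - F/F', with
    F(sigma) = s^T s~ - m~  and  F'(sigma) = -2 sigma ||t||_2^2."""
    st = s / (gamma**2 * sigma**2 + 1.0)   # s~_i = s_i / (gamma_i^2 sigma^2 + 1)
    t = gamma * st                         # t_i = gamma_i s~_i
    F = s @ st - m_tilde
    dF = -2.0 * sigma * (t @ t)
    return sigma - F / dF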
An Illustrative Example: phillips Fredholm integral equation (Hansen)

- Add noise to b with standard deviation σ_{b_i} = 0.01 |b_i| + 0.1 b_max.
- Covariance matrix C_b = σ_b² I_m = W_b^{-1}, where σ_b² is the average of the σ_{b_i}².
- [Figure: − is the original b, ∗ the noisy data. Example error 10%.]
Compare solutions: + is the reference x₀, −− is exact, o is the L-curve solution; the three other solutions are UPRE, GCV, and the χ² method (blue, magenta, black). Each method gives a different solution, but UPRE, GCV, and χ² are comparable.

[Figure: comparison with the new method.]
Observations: the example function F

- Initialization: GCV, UPRE, L-curve, and χ² all use the GSVD (or SVD).
- The algorithm is cheap compared with GCV, UPRE, and the L-curve.
- F is monotonically decreasing (and even in σ).
- Either the solution exists and is unique for positive σ, or no solution exists: F(0) < 0.
- Theoretically, lim_{σ→∞} F > 0 is possible. This is equivalent to λ = 0: no regularization needed.
Remark on F(0) < 0

When F(0) < 0, m̃ is too big relative to J; equivalently, there are insufficient degrees of freedom. Notice that
\[ J(\hat{x}) = \|P^{1/2} s\|_2^2, \qquad P = \mathrm{diag}\big( 1/((\gamma_i \sigma)^2 + 1),\ 0_{n-p},\ I_{m-n} \big). \]
In particular, J(x̂(0)) = ‖P^{1/2}(0) s‖₂² = y for some y. If y < m̃, set m̃ = floor(y). The theorem is thus revised to
\[ \tilde{m} = \min\{ \mathrm{floor}(J(0)),\ m - n + p \}. \]
In the example, J(0) ≈ 39 and F(0) ≈ −461; on the right, m̃ = 38.
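The revised degrees-of-freedom rule is a one-liner; a sketch in the same GSVD notation, where s is the full length-m rotated residual:

```python
import math
import numpy as np

def effective_dof(s, m, n, p):
    """m~ = min(floor(J(0)), m - n + p).  At sigma = 0 all filter weights
    are 1, so J(0) = sum_{i <= p} s_i^2 + sum_{i > n} s_i^2."""
    J0 = np.sum(s[:p] ** 2) + np.sum(s[n:] ** 2)
    return min(math.floor(J0), m - n + p)
```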
Example: Seismic Signal Restoration

- Real data set of 48 signals of length 500.
- The point spread function is derived from the signals.
- Solve Pf = g, where P is the PSF matrix and g is the signal, and restore f.
- Calculate the signal variance pointwise over all 48 signals.
- Compare restoration of the S-wave with derivative orders 0, 1, 2.
- Weighting matrices are I, σ_g^{-2} I, and diag(σ_{g_i}^{-2}): cases 1, 2, and 3.
Tikhonov Regularization

Observations:
- The reduced degrees of freedom are relevant!
- The degrees of freedom are found automatically.
- Cases 1 and 2 have different solutions.
- Case 3 preserves the features of the signal.
First and Second Order Derivative Restoration

Observations:
- Here derivative smoothing is not desirable.
- Case 3 preserves the signal characteristics.
- The value reported is λ, the regularization parameter.
- λ increases with derivative order: more smoothing.
Comparison with L-curve and UPRE Solutions

Observations:
- The L-curve underestimates λ severely.
- UPRE and χ² are similar when the degrees of freedom are limited for χ².
- UPRE underestimates λ for the case 2 and case 3 weightings.
Newton’s Method converges in 5− 10 Iterations
l cb Iterations kmean std
0 1 8.23e + 00 6.64e − 010 2 8.31e + 00 9.80e − 010 3 8.06e + 00 1.06e + 001 1 4.92e + 00 5.10e − 011 2 1.00e + 01 1.16e + 001 3 1.00e + 01 1.19e + 002 1 5.01e + 00 8.90e − 012 2 8.29e + 00 1.48e + 002 3 8.38e + 00 1.50e + 00
Table: Convergence characteristics for problem phillips with n = 40 over 500runs
Renaut and Mead (ASU/Boise) Scalar Newton method September 2007 21 / 31
Newton’s Method converges in 5− 10 Iterations
l cb Iterations kmean std
0 1 6.84e + 00 1.28e + 000 2 8.81e + 00 1.36e + 000 3 8.72e + 00 1.46e + 001 1 6.05e + 00 1.30e + 001 2 7.40e + 00 7.68e − 011 3 7.17e + 00 8.12e − 012 1 6.01e + 00 1.40e + 002 2 7.28e + 00 8.22e − 012 3 7.33e + 00 8.66e − 01
Table: Convergence characteristics for problem blur with n = 36 over 500runs
Renaut and Mead (ASU/Boise) Scalar Newton method September 2007 21 / 31
Estimating The Error and Predictive Risk

Error (mean over runs):

 l  cb   χ²         L-curve    GCV        UPRE
 0   2   4.37e-03   4.39e-03   4.21e-03   4.22e-03
 0   3   4.32e-03   4.42e-03   4.21e-03   4.22e-03
 1   2   4.35e-03   5.17e-03   4.30e-03   4.30e-03
 1   3   4.39e-03   5.05e-03   4.38e-03   4.37e-03
 2   2   4.50e-03   6.68e-03   4.39e-03   4.56e-03
 2   3   4.37e-03   6.66e-03   4.43e-03   4.54e-03

Table: Error characteristics for problem phillips with n = 60 over 500 runs with error-contaminated x₀. Relative errors larger than 0.009 removed.

Results are comparable.
Risk (mean over runs):

 l  cb   χ²         L-curve    GCV        UPRE
 0   2   3.78e-02   5.22e-02   3.15e-02   2.92e-02
 0   3   3.88e-02   5.10e-02   2.97e-02   2.90e-02
 1   2   3.94e-02   5.71e-02   3.02e-02   2.74e-02
 1   3   1.10e-01   5.90e-02   3.27e-02   2.79e-02
 2   2   3.41e-02   6.00e-02   3.35e-02   3.79e-02
 2   3   3.61e-02   5.98e-02   3.35e-02   3.82e-02

Table: Risk characteristics for problem phillips with n = 60 over 500 runs.

The χ² method does not give the best estimate of the risk.
[Error histogram: normal noise on the right-hand side, first-order derivative, C_b = σ²I.]
[Error histogram: exponential noise on the right-hand side, first-order derivative, C_b = σ²I.]
Conclusions

- The χ² Newton algorithm is cost effective.
- It performs as well as (or better than) GCV and UPRE when statistical information is available.
- It should be the method of choice when statistical information is provided.
- The method can be adapted to find W_b if W_x is provided.
Future Work

- Analyse truncated expansions (TSVD and TGSVD), which reduce the degrees of freedom.
- Further theoretical analysis and simulations with other noise distributions; comparison with the new work of Rust & O'Leary (2007).
- Can it be extended to nonlinear regularization terms (e.g. total variation)?
- Development of the nonlinear least squares approach for general diagonal W_x.
- Efficient calculation of uncertainty information, i.e. the covariance matrix.
- Nonlinear problems?
Some Solutions: with no prior information x₀

Illustrated are solutions and error bars.

[Left: no statistical information; the solution is smoothed. Right: with statistical information, C_b = diag(σ_{b_i}²).]
Some Generalized Tikhonov Solutions: First Order Derivative

[Left: no statistical information. Right: C_b = diag(σ_{b_i}²).]
Some Generalized Tikhonov Solutions: Prior x₀ (solution not smoothed)

[Left: no statistical information. Right: C_b = diag(σ_{b_i}²).]
Some Generalized Tikhonov Solutions: x₀ = 0, Exponential Noise

[Left: no statistical information. Right: C_b = diag(σ_{b_i}²).]
Relationship to the Discrepancy Principle

The discrepancy principle can also be implemented by a Newton method. It finds σ_x such that the regularized residual satisfies
\[ \sigma_b^2 = \frac{1}{m} \|b - A x(\sigma)\|_2^2. \tag{4} \]
Consistent with our notation, this is
\[ \sum_{i=1}^{p} \left( \frac{1}{\gamma_i^2 \sigma^2 + 1} \right)^{2} s_i^2 + \sum_{i=n+1}^{m} s_i^2 = m. \tag{5} \]
The weight in the first sum is squared here; otherwise the functional is the same. But the discrepancy principle often oversmooths. What happens here?
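For comparison, a sketch of the discrepancy-principle root function in the same GSVD notation as (5): identical to the χ² function except that the filter weight is squared and the target is m.

```python
import numpy as np

def discrepancy_F(sigma, gamma, s, m, n):
    """F_disc(sigma) per (5): sum_i (1/(gamma_i^2 sigma^2 + 1))^2 s_i^2
    + sum_{i > n} s_i^2 - m, with s the length-m rotated residual."""
    p = len(gamma)
    w = 1.0 / (gamma**2 * sigma**2 + 1.0)
    return s[:p] @ (w**2 * s[:p]) + s[n:] @ s[n:] - m
```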
Major References

- Bennett, A., 2005. Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press.
- Hansen, P. C., 1994. Regularization Tools: A Matlab Package for Analysis and Solution of Discrete Ill-posed Problems. Numerical Algorithms 6, 1-35.
- Mead, J., 2007. A priori weighting for parameter estimation. J. Inv. Ill-posed Problems, to appear.
- Rao, C. R., 1973. Linear Statistical Inference and its Applications. Wiley, New York.
- Tarantola, A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.
- Vogel, C. R., 2002. Computational Methods for Inverse Problems. SIAM, Frontiers in Applied Mathematics.
blur: atmospheric blur with Gaussian PSF (Hansen), again with noise

[Figure: true solution on the left, degraded image on the right.]
[Figure: solutions using x₀ = 0, generalized Tikhonov with second derivative, 5% noise.]
[Figure: solutions using x₀ = 0, generalized Tikhonov with second derivative, 10% noise.]