
EE 396: Lecture 5

Ganesh Sundaramoorthi

Feb. 22, 2011

In this lecture, we learn to solve the Euler-Lagrange equations of our denoising model numerically:

u(x) − α∆u(x) = I(x),   x ∈ int(Ω),
∂u/∂n(x) = 0,           x ∈ ∂Ω.    (1)

In the last lecture we showed that this equation has a solution. Note that the solution will be a global minimizer of our energy

E(u) = −log p(u|I) = ∫_Ω (I(x) − u(x))² dx + α ∫_Ω |∇u(x)|² dx,    (2)

since the functional is convex.
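To make the objects concrete, here is a minimal sketch (not from the lecture) of how the energy (2) might be evaluated on a discrete image using NumPy; the function name denoising_energy, the unit grid spacing, and the use of forward differences for ∇u are my own choices.

```python
import numpy as np

def denoising_energy(u, I, alpha):
    """Discrete version of E(u) = integral (I - u)^2 + alpha * integral |grad u|^2.

    Assumes a unit-spaced grid and forward differences for the gradient;
    boundary terms are simply dropped (a crude stand-in for the Neumann
    boundary condition in (1)).
    """
    fidelity = np.sum((I - u) ** 2)
    ux = u[:, 1:] - u[:, :-1]          # forward difference in x
    uy = u[1:, :] - u[:-1, :]          # forward difference in y
    smoothness = np.sum(ux ** 2) + np.sum(uy ** 2)
    return fidelity + alpha * smoothness
```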

0.1 Gradient Descent

0.1.1 Steepest Descent in Rⁿ

A generic algorithm for minimizing a smooth energy function is gradient descent, or steepest descent. For a function f : Rⁿ → R, the algorithm makes a guess for the minimizer, x_guess ∈ Rⁿ, and then corrects the guess by moving slightly in the negative gradient direction (recall from your calculus class that the gradient direction is the direction along which f increases most rapidly):

x_next = x_guess − ∆t ∇f(x_guess),

where ∆t > 0 is small. Since the negative gradient direction decreases the function in the steepest possible way, f(x_next) should be lower than f(x_guess) provided ∆t is small enough. One then repeats the process from x_next and hopes that the procedure converges. That is,

x_0 = x_guess    (3)

x_{k+1} = x_k − ∆t ∇f(x_k).    (4)

If the algorithm converges, it may still become trapped in a local minimum if, for example, f is not convex (if f is convex, then the algorithm will reach a global minimum). Therefore, the initial guess is quite important; ideally it should be chosen as close as possible to the global minimum, or as close as possible to the desired local minimum.
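As a concrete illustration of the iteration (3)-(4), here is a minimal sketch in Python/NumPy; the helper name gradient_descent, the quadratic test function, and the fixed iteration count are illustrative choices, not part of the lecture.

```python
import numpy as np

def gradient_descent(grad_f, x_guess, dt=0.1, num_iters=1000):
    """Steepest descent: x_{k+1} = x_k - dt * grad_f(x_k)."""
    x = np.asarray(x_guess, dtype=float)
    for _ in range(num_iters):
        x = x - dt * grad_f(x)
    return x

# Example: minimize f(x) = |x - b|^2, whose gradient is 2(x - b).
b = np.array([1.0, -2.0, 3.0])
x_min = gradient_descent(lambda x: 2.0 * (x - b), x_guess=np.zeros(3))
# For small enough dt, x_min should be close to b (the global minimizer).
```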

0.1.2 Gradient of Functionals

We now generalize this gradient descent procedure to our denoising model. However, what is meant by the gradient of such a functional? It turns out that this is analogous to the multivariable case. Recall that


from previous lectures we have shown that the directional derivative of E is

dE(u) · v = d/dt E(u + tv)|_{t=0} = ∫_Ω v(x) (u(x) − I(x) − α∆u(x)) dx    (5)

for v ∈ V, the space of permissible perturbations, and u ∈ U. From this expression of the directional derivative, we can choose the perturbation v of u so that the directional derivative in the direction v is as small as possible. A sure way to ensure dE(u) · v < 0 is to choose

v(x) = −(u(x)− I(x)− α∆u(x)); (6)

in this case,

dE(u) · v = −∫_Ω |u(x) − I(x) − α∆u(x)|² dx ≤ 0.    (7)

But is this the steepest direction? To answer this question, we must first note that V is an inner product space equipped with the inner product¹

⟨v_1, v_2⟩_{L²} = ∫_Ω v_1(x) v_2(x) dx   for v_1, v_2 ∈ V.    (8)

In an inner product space, we automatically get for free the Cauchy-Schwarz inequality |⟨v_1, v_2⟩| ≤ ‖v_1‖‖v_2‖ for the norm induced from the inner product, with equality if and only if v_1 = a v_2 for some a ∈ R. Applying this to (5), we find that

|dE(u) · v| ≤ ‖v‖_{L²} ‖u − I − α∆u‖_{L²}    (9)

or

|dE(u) · v| / ‖v‖_{L²} ≤ ‖u − I − α∆u‖_{L²}    (10)

where ‖·‖_{L²} is the norm induced from the L² inner product. Therefore we see that the magnitude of the quotient above is maximized (equality in the above expression) when

v = −(u− I − α∆u), (11)

thus, this justifies calling

v = −(u − I − α∆u) = −∇E(u)    (12)

the negative gradient direction. This motivates the formal definition:

Definition 1. Let E : U → R be a functional. Then the gradient of E at u (u ∈ U), denoted ∇E(u) ∈ V, is defined as the perturbation that satisfies

dE(u) · v = ∫_Ω v(x) ∇E(u)(x) dx = ⟨v, ∇E(u)⟩_{L²}   for all v ∈ V.    (13)

¹Recall that ⟨·, ·⟩ is a (real) inner product on a vector space V if the following properties hold:

1. Linearity: ⟨a_1 v_1 + a_2 v_2, v⟩ = a_1⟨v_1, v⟩ + a_2⟨v_2, v⟩ for all v_1, v_2, v ∈ V and a_1, a_2 ∈ R.

2. Positive-definiteness: ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 if and only if v = 0.

3. Symmetry: ⟨v_1, v_2⟩ = ⟨v_2, v_1⟩ for all v_1, v_2 ∈ V.

Note that a norm can be defined from the inner product: ‖v‖ = √⟨v, v⟩.


0.1.3 Gradient Descent PDE

Now the steepest descent algorithm for our functional is

u_0 = u_guess    (14)

u_{k+1} = u_k − ∆t ∇E(u_k)    (15)

where k represents the iteration number. In continuous time, the gradient descent is the PDE

u(0, ·) = u_guess
u_t(t, ·) = −∇E(u(t, ·))

and (15) is a discretization in which we have used the forward Euler difference approximation:

u_t(t, ·) = (u(t + ∆t, ·) − u(t, ·)) / ∆t + O(∆t).    (16)

For our denoising model, this becomes

u_t = I − u + α∆u    (17)

where α > 0.²

We assume that we sample the function u on a regular grid with spacing ∆x in the x direction and ∆y in the y direction. We now discretize the PDE.³ Using central differences,⁴ we find that

u(t, x + ∆x, y) = u(t, x, y) + ∂u/∂x(t, x, y) ∆x + (1/2) ∂²u/∂x²(t, x, y) (∆x)² + O((∆x)³)    (18)

u(t, x − ∆x, y) = u(t, x, y) − ∂u/∂x(t, x, y) ∆x + (1/2) ∂²u/∂x²(t, x, y) (∆x)² + O((∆x)³)    (19)

where O((∆x)³) collects terms of third and higher order in ∆x. Adding the above two equations (and proceeding analogously in y), we find that

∂²u/∂x²(t, x, y) = (u(t, x + ∆x, y) − 2u(t, x, y) + u(t, x − ∆x, y)) / (∆x)² + O(∆x)    (20)

∂²u/∂y²(t, x, y) = (u(t, x, y + ∆y) − 2u(t, x, y) + u(t, x, y − ∆y)) / (∆y)² + O(∆y)    (21)

Setting u_k(x, y) = u(k∆t, x, y), we find that our PDE can be written

u_{k+1}(x, y) = u_k(x, y) + ∆t [ I(x, y) − u_k(x, y)
    + α (u_k(x + ∆x, y) − 2u_k(x, y) + u_k(x − ∆x, y)) / (∆x)²
    + α (u_k(x, y + ∆y) − 2u_k(x, y) + u_k(x, y − ∆y)) / (∆y)² ]
    + O(∆x∆t) + O(∆y∆t),    (22)

thus, we have discretized our PDE. The question remains of how to choose ∆t to ensure that the numerical scheme is stable and converges.
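A minimal sketch of one iteration of (22) in NumPy, assuming ∆x = ∆y = 1 and imposing the Neumann condition ∂u/∂n = 0 by replicating edge values before taking the central differences (20)-(21); the function name and the edge-padding choice are my own.

```python
import numpy as np

def denoise_step(u, I, alpha, dt):
    """One explicit update u_{k+1} = u_k + dt * (I - u_k + alpha * Laplacian(u_k)).

    Assumes dx = dy = 1.  The Neumann condition du/dn = 0 is imposed by
    replicating the edge values ('edge' padding) before the central
    differences (20)-(21) are taken.
    """
    p = np.pad(u, 1, mode="edge")
    laplacian = (p[1:-1, 2:] - 2.0 * u + p[1:-1, :-2]    # second difference in x
                 + p[2:, 1:-1] - 2.0 * u + p[:-2, 1:-1]) # second difference in y
    return u + dt * (I - u + alpha * laplacian)
```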

²The equation u_t = α∆u is called the heat equation; for Ω = Rⁿ, one can show that u(t, ·) is simply a Gaussian of standard deviation proportional to √t convolved with the initial guess u_0. Therefore, we can understand that our denoising model combines smoothing with closeness to the original image I.

³There is a subtle art in choosing the discretization scheme for a PDE; ultimately the appropriate numerical scheme requires a full understanding of the behavior of the PDE.

⁴Generally, central differences are preferred over forward or backward differences because they induce the least error. However, for certain PDEs, choosing central differences can lead to an unstable scheme. Again, the differencing scheme depends on the behavior of the PDE.


0.1.4 Von Neumann Analysis

We apply von Neumann analysis to study the stability of the discretization of the PDE; this analysis can typically be applied to linear PDEs. We compute the Fourier transform in the spatial variables (x, y) of our discretization scheme:

U_{k+1}(ω_1, ω_2) = U_k(ω_1, ω_2) [ 1 − ∆t + α∆t ( (e^{i∆xω_1} − 2 + e^{−i∆xω_1}) / (∆x)² + (e^{i∆yω_2} − 2 + e^{−i∆yω_2}) / (∆y)² ) ] + ∆t I(ω_1, ω_2)    (23)

where we have noted that the Fourier transform of u_k(x + ∆x, y) is U_k(ω_1, ω_2) e^{i∆xω_1}, where U_k denotes the Fourier transform of u_k and i is the imaginary unit. We define H(ω_1, ω_2) to be the expression in the brackets above, and then

H(ω_1, ω_2) = 1 − ∆t − 4α∆t ( sin²(∆xω_1/2) / (∆x)² + sin²(∆yω_2/2) / (∆y)² ).    (24)

Therefore, (23) becomes

U_{k+1}(ω_1, ω_2) = H(ω_1, ω_2) U_k(ω_1, ω_2) + ∆t I(ω_1, ω_2).    (25)

Writing the above as an explicit formula, we have that

U_k(ω_1, ω_2) = (H(ω_1, ω_2))^k U_0(ω_1, ω_2) + ∆t I(ω_1, ω_2) Σ_{i=0}^{k−1} (H(ω_1, ω_2))^i    (26)

where (·)^i denotes the ith power. For U_k to remain bounded (and the geometric series to converge) as k → ∞, we must have that

|H(ω_1, ω_2)| < 1   for all ω_1, ω_2;    (27)

that is

1 − ∆t − 4α∆t ( sin²(∆xω_1/2) / (∆x)² + sin²(∆yω_2/2) / (∆y)² ) < 1    (28)

and

1 − ∆t − 4α∆t ( sin²(∆xω_1/2) / (∆x)² + sin²(∆yω_2/2) / (∆y)² ) > −1.    (29)

The first yields that ∆t > 0 (noting that α > 0) and the second yields that

∆t < 2 / ( 1 + 4α ( sin²(∆xω_1/2) / (∆x)² + sin²(∆yω_2/2) / (∆y)² ) )   for all ω_1, ω_2    (30)

which means that ∆t must be less than the minimum of the right-hand side over ω_1, ω_2; the minimum is attained when both sin² terms equal 1, so

∆t < 2 / ( 1 + 4α ( 1/(∆x)² + 1/(∆y)² ) ).    (31)

In the case that ∆x = ∆y = 1, we have that

∆t < 2 / (1 + 8α).    (32)
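To sanity-check the bound numerically, one can evaluate H(ω_1, ω_2) from (24) on a grid of frequencies and verify that max |H| drops below 1 only when ∆t is below the bound (32). The sketch below assumes ∆x = ∆y = 1; the frequency grid and the 0.95/1.05 margins are arbitrary illustrative choices.

```python
import numpy as np

def max_gain(alpha, dt, dx=1.0, dy=1.0, n=200):
    """Maximum of |H(w1, w2)| from (24) over a grid of frequencies."""
    w1, w2 = np.meshgrid(np.linspace(-np.pi, np.pi, n),
                         np.linspace(-np.pi, np.pi, n))
    H = 1.0 - dt - 4.0 * alpha * dt * (np.sin(dx * w1 / 2) ** 2 / dx ** 2
                                       + np.sin(dy * w2 / 2) ** 2 / dy ** 2)
    return np.abs(H).max()

alpha = 10.0
dt_bound = 2.0 / (1.0 + 8.0 * alpha)      # equation (32) for dx = dy = 1
print(max_gain(alpha, 0.95 * dt_bound))    # < 1: stable
print(max_gain(alpha, 1.05 * dt_bound))    # > 1: unstable
```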


0.1.5 Choosing α

Recall that the original energy from our model is

E(u) = 1/(2σ_n²) ∫_Ω (I(x) − u(x))² dx + 1/(2σ_p²) ∫_Ω |∇u(x)|² dx,    (33)

where σ_n is the standard deviation of the noise and σ_p is the standard deviation of the true noise-free image derivatives. We scaled the energy by 2σ_n² (which does not affect the minimizer u). Then α was defined as

α = σ_n² / σ_p² = 1/SNR,    (34)

where SNR can be thought of as a signal-to-noise ratio. We see that α should be chosen larger as the noise increases. In practice, we do not know these parameters, but you may experiment with estimating them from the noisy image.

Notice that as α becomes larger (as the image becomes noisier), ∆t becomes smaller and thus the algorithm becomes slower.
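The lecture leaves parameter estimation open; purely as an illustration (this heuristic is not from the lecture), one might estimate σ_n from the residual between the noisy image and a slightly smoothed copy, and σ_p from the gradients of the smoothed copy. The helper name, the Gaussian blur, and its width are hypothetical choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_alpha(I_noisy, blur_sigma=1.5):
    """Rough heuristic (illustrative only): alpha = sigma_n^2 / sigma_p^2.

    sigma_n is estimated from the residual between the noisy image and a
    slightly smoothed copy; sigma_p from the gradients of the smoothed copy.
    Both estimates are crude; they are meant only as a starting point for
    the experimentation suggested above.
    """
    smooth = gaussian_filter(I_noisy, blur_sigma)
    sigma_n = np.std(I_noisy - smooth)
    gy, gx = np.gradient(smooth)
    sigma_p = np.std(np.hypot(gx, gy))
    return sigma_n ** 2 / sigma_p ** 2
```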

0.1.6 Denoising Algorithm and Results

A key point is the initialization. Theoretically, we have shown that the energy is convex, so every local minimum is a global minimum, and thus the initialization should be irrelevant since the scheme can only converge to a global minimum. In practice, initialization is important. For these experiments we choose u_0 = I. Also of importance is the convergence criterion, which we take to be

max_{x∈Ω} |u_k(x) − I(x) − α∆u_k(x)| < ε    (35)

where ε > 0 is our allowed error. Fig. 1 shows an original image that is then corrupted by i.i.d. Gaussian noise. Fig. 2 shows the results of the algorithm constructed above.
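Putting the pieces together, here is a minimal sketch of the full scheme with u_0 = I, a step size safely below the bound (32), and the stopping criterion (35); the assumptions ∆x = ∆y = 1, edge replication for the Neumann condition, and the 0.9 safety factor on ∆t are my own choices.

```python
import numpy as np

def denoise(I, alpha, eps=1e-3, max_iters=10000):
    """Gradient descent (17) discretized as in (22), with u_0 = I.

    Stops when max |u - I - alpha * Laplacian(u)| < eps, as in (35).
    Assumes dx = dy = 1; Neumann boundaries via edge replication.
    """
    dt = 0.9 * 2.0 / (1.0 + 8.0 * alpha)   # safely below the bound (32)
    u = I.astype(float).copy()
    for _ in range(max_iters):
        p = np.pad(u, 1, mode="edge")
        lap = (p[1:-1, 2:] + p[1:-1, :-2]
               + p[2:, 1:-1] + p[:-2, 1:-1] - 4.0 * u)
        residual = I - u + alpha * lap      # the negative gradient -grad E(u), cf. (12)
        if np.abs(residual).max() < eps:
            break
        u = u + dt * residual
    return u
```

Run with the α values used in Fig. 2, this should produce progressively smoother results as α grows, at the cost of a smaller ∆t and hence more iterations.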


Figure 1: Left: an image of the UCLA campus; right: the image corrupted by synthetically generated noise (i.i.d. Gaussian, mean zero, with σ² = 0.05).


Figure 2: Left to right, top to bottom: denoising results for the noisy image in Fig. 1 with α = 1, 5, 10, 17, 25, 50, respectively.
