Source: links.uwaterloo.ca/amath391w13docs/set4.pdf

Lecture 10

Discrete Fourier Transforms (cont’d)

Some properties of DFTs

We now establish a few properties of DFTs which are discrete analogues of properties of Fourier

Transforms that you may have seen.

Linearity

F(f + g) = Ff + Fg. (1)

This is quite trivial to show. Let h = f + g, F = Ff and G = Fg. By definition, for 0 ≤ k ≤ N − 1,

H[k] = Σ_{n=0}^{N−1} h[n] exp(−i2πkn/N)

     = Σ_{n=0}^{N−1} (f[n] + g[n]) exp(−i2πkn/N)

     = Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N) + Σ_{n=0}^{N−1} g[n] exp(−i2πkn/N)

     = F[k] + G[k],   (2)

which proves Eq. (1).
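The linearity property is easy to verify numerically. Below is a minimal sketch in Python (not part of the original notes), using the direct O(N²) sum from the definition rather than an FFT; the sample vectors f and g are arbitrary choices.

```python
import cmath

def dft(f):
    """DFT as defined in the notes: F[k] = sum_n f[n] exp(-i 2*pi*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Two arbitrary sample vectors of the same length N = 4.
f = [1.0, 2.0, -0.5, 3.0]
g = [0.5, -1.0, 2.0, 0.0]
h = [a + b for a, b in zip(f, g)]   # h = f + g

F, G, H = dft(f), dft(g), dft(h)

# Linearity: H[k] = F[k] + G[k] for every k (up to roundoff).
ok = all(abs(H[k] - (F[k] + G[k])) < 1e-9 for k in range(4))
```

Note that H[0] is just the sum of the samples of h, so it equals F[0] + G[0] exactly.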

Conjugate symmetry

Let F be the DFT of a sampled real signal f of N points. Then

F[N − k] = F[k]*,   k = 0, 1, · · · , N − 1,   (3)

where the asterisk denotes complex conjugation.

Proof: By definition,

F[k] = Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N),   (4)

so that

F[N − k] = Σ_{n=0}^{N−1} f[n] exp(−i2π(N − k)n/N)

         = Σ_{n=0}^{N−1} f[n] exp(i2πkn/N) exp(−i2πn)

         = F[k]*,   (5)

since exp(−i2πn) = 1 for every integer n and the samples f[n] are real,

and the desired result follows.

Note that the above result may also be written as

F[k]* = F[N − k],   k = 0, 1, · · · , N − 1.   (6)

In other words, F [k] and F [N − k] are complex conjugates of each other. The consequences of this

very important result will be discussed shortly.
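A quick numerical check of conjugate symmetry, again with a direct-sum DFT in Python; the real test vector is an arbitrary choice.

```python
import cmath

def dft(f):
    """DFT as in the notes: F[k] = sum_n f[n] exp(-i 2*pi*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# A real-valued signal of length N = 8.
f = [1.0, -0.3, 2.5, 0.7, -1.2, 0.0, 0.4, 3.1]
N = len(f)
F = dft(f)

# Conjugate symmetry: F[N - k] = conj(F[k]) for k = 1, ..., N - 1.
sym = all(abs(F[N - k] - F[k].conjugate()) < 1e-9 for k in range(1, N))

# F[0] is the sum of the samples, hence real.
f0_real = abs(F[0].imag) < 1e-9
```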

Shift Theorem

Let

f = (f [0], f [1], · · · , f [N − 1]) and g = (f [1], f [2], · · · , f [N − 1], f [0]). (7)

In other words, g is obtained from f by shifting the sequence of sample values one space to the left.

We may consider the first element f [0] to be “wrapped-around” and placed at the end, or we may view

it as coming from the next N data points, since f [N ] = f [0]. Then the DFT coefficients of G = Fg

and F = Ff are related as follows,

G[k] = ω^{−k} F[k],   k = 0, 1, · · · , N − 1,   where ω = exp(−i2π/N).   (8)

Proof: By definition,

G[k] = Σ_{n=0}^{N−1} g[n] exp(−i2πkn/N)

     = Σ_{n=0}^{N−1} f[n + 1] exp(−i2πkn/N)

     = Σ_{n=0}^{N−1} f[n + 1] exp(−i2πk(n + 1)/N) exp(i2πk/N)

     = exp(i2πk/N) Σ_{m=0}^{N−1} f[m] exp(−i2πkm/N)

     = ω^{−k} F[k],   (9)

where we set m = n + 1 and used the N-periodicity f[N] = f[0] to let the sum run over m = 0, 1, · · · , N − 1.


Remarks:

1. The Shift Theorem may be applied repeatedly. For example, if gM is obtained by left-shifting

the entries of f M times, i.e.,

f = (f [0], f [1], · · · , f [N − 1]) and gM = (f [M ], f [M + 1], · · · , f [M − 2], f [M − 1]), (10)

then (Exercise)

G_M[k] = ω^{−Mk} F[k].   (11)

2. The Shift Theorem may also be applied for right-shifted sequences. We leave this as an exercise.
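Both the one-step shift theorem (8) and its repeated application (11) can be checked numerically; the following Python sketch (not part of the original notes) uses an arbitrary real 6-vector.

```python
import cmath

def dft(f):
    """DFT as in the notes: F[k] = sum_n f[n] exp(-i 2*pi*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

f = [1.0, 2.0, -0.5, 3.0, 0.25, -1.0]
N = len(f)
g = f[1:] + f[:1]                   # left-shift by one, with wrap-around

F, G = dft(f), dft(g)
w = cmath.exp(-2j * cmath.pi / N)   # omega = exp(-i 2 pi / N)

# Shift theorem, Eq. (8): G[k] = omega^(-k) F[k].
shift_ok = all(abs(G[k] - w**(-k) * F[k]) < 1e-9 for k in range(N))

# Repeated application, Eq. (11), with M = 2.
g2 = f[2:] + f[:2]
G2 = dft(g2)
shift2_ok = all(abs(G2[k] - w**(-2 * k) * F[k]) < 1e-9 for k in range(N))
```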

Convolution Theorem

Let f and g be two N -periodic complex vectors. Define the (circular) convolution of these two vectors

as the vector h with components

h[n] = Σ_{j=0}^{N−1} f[j] g[n − j],   n = 0, 1, · · · , N − 1.   (12)

Then the DFT of h is related to the DFTs of f and g as follows,

H[k] = F [k]G[k], (13)

in other words, the pointwise product – the discrete analogue of multiplying two functions together,

i.e., f(x)g(x).

Proof:

H[k] = Σ_{n=0}^{N−1} h[n] exp(−i2πkn/N)

     = Σ_{n=0}^{N−1} [ Σ_{j=0}^{N−1} f[j] g[n − j] ] exp(−i2πkn/N)

     = Σ_{n=0}^{N−1} Σ_{j=0}^{N−1} f[j] exp(−i2πkj/N) g[n − j] exp(−i2πk(n − j)/N)

     = Σ_{j=0}^{N−1} f[j] exp(−i2πkj/N) Σ_{n=0}^{N−1} g[n − j] exp(−i2πk(n − j)/N)

     = Σ_{j=0}^{N−1} f[j] exp(−i2πkj/N) [ Σ_{l=0}^{N−1} g[l] exp(−i2πkl/N) ]

     = F[k] G[k].   (14)


The second-to-last line follows from the N-periodicity of g: for each fixed j, as n runs over 0, 1, · · · , N − 1, the index l = n − j (mod N) also runs over all of 0, 1, · · · , N − 1, so the inner sum is independent of j.
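The circular convolution (12) and the Convolution Theorem (13) can be verified directly; in the sketch below (not part of the original notes), the index n − j is reduced mod N, which implements the N-periodicity of g.

```python
import cmath

def dft(f):
    """DFT as in the notes: F[k] = sum_n f[n] exp(-i 2*pi*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circ_conv(f, g):
    """Circular convolution, Eq. (12): h[n] = sum_j f[j] g[(n - j) mod N]."""
    N = len(f)
    return [sum(f[j] * g[(n - j) % N] for j in range(N)) for n in range(N)]

f = [1.0, 2.0, -0.5, 3.0]
g = [0.5, -1.0, 2.0, 0.0]
N = len(f)

H = dft(circ_conv(f, g))
F, G = dft(f), dft(g)

# Convolution theorem, Eq. (13): H[k] = F[k] G[k].
conv_ok = all(abs(H[k] - F[k] * G[k]) < 1e-9 for k in range(N))
```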

Some consequences of the above results

Conjugate Symmetry

Recall that if f ∈ RN is a real-valued signal, then its DFT F has conjugate symmetry of the form

F[N − k] = F[k]*,   k = 0, 1, · · · , N.   (15)

Assuming that N is even, this implies that we need only compute the first N/2 + 1 coefficients,
F[0], F[1], · · · , F[N/2]. The remaining coefficients may be obtained by complex conjugating these
coefficients. But there are some other interesting features.

First of all, note that for k = 0, conjugate symmetry implies that F[N] = F[0]*. But by definition,

F[0] = Σ_{n=0}^{N−1} f[n],   (16)

which is real-valued. Therefore F[N] = F[0] ∈ R.

For further discussion, we must consider two cases:

1. N is even (This is the case for most applications.)

Setting k = N/2 (an integer) in Eq. (15) implies that the “middle element” satisfies

F[N/2] = F[N/2]*,   (17)

implying that F[N/2] ∈ R. All other DFT coefficients F[1], F[2], · · · , F[N/2 − 1] are, in general, complex. If

we consider a real number to represent one degree of freedom and a complex number two degrees

of freedom, then the total number of “degrees of freedom” represented by the DFT coefficients,

F [0], F [1], · · · , F [N/2 − 1], F [N/2], (18)

is 1 + 2(N/2 − 1) + 1 = N. This is the number of degrees of freedom represented by the original

data vector f ∈ RN .

Example: Let N = 10. The periodic DFT 10-vector is composed of the elements

F [0], F [1], · · · , F [5], · · · , F [9]. (19)


The “middle element,” F [10/2] = F [5], is real-valued. Because of conjugate symmetry, the

sequence in (19) is determined uniquely by the DFT coefficients F [0], · · · , F [5]. Since elements

F [1] to F [4] are complex, the total number of degrees of freedom is 1 + 4 × 2 + 1 = 10.

2. N is odd

In this case, N/2 is not an integer, so there is no “middle element” F [N/2]. The N -dimensional

DFT is uniquely determined by the elements

F[0], F[1], · · · , F[int(N/2)],   (20)

where int(x) denotes “the integer part of x”.

Example: Let N = 9. The periodic DFT 9-vector is composed of the elements

F [0], F [1], · · · , F [4], F [5], · · · , F [8]. (21)

There is no “middle element”. Here, int(N/2) = int(9/2) = 4. Because of conjugate symmetry,
the elements in (21) are determined by the DFT coefficients F[0], · · · , F[4]. (Setting k = 4 in (15)
yields F[5] = F[4]*.) Since elements F[1] to F[4] are complex, the total number of degrees of
freedom is 1 + 4 × 2 = 9.
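The degree-of-freedom counting in both cases can be confirmed numerically; in this Python sketch (not part of the original notes), the two real test vectors, with N = 10 and N = 9, are arbitrary choices.

```python
import cmath

def dft(f):
    """DFT as in the notes: F[k] = sum_n f[n] exp(-i 2*pi*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Even case: N = 10. F[0] and the middle element F[5] are real.
f10 = [0.3, 1.7, -2.0, 0.9, 4.1, -0.6, 2.2, 0.0, -1.1, 0.5]
F10 = dft(f10)
even_ok = abs(F10[0].imag) < 1e-9 and abs(F10[5].imag) < 1e-9

# Odd case: N = 9, int(N/2) = 4. There is no middle element;
# setting k = 4 in the symmetry relation gives F[5] = conj(F[4]).
f9 = [1.0, -0.5, 2.0, 0.7, -1.3, 0.2, 3.0, -0.8, 0.4]
F9 = dft(f9)
odd_ok = abs(F9[5] - F9[4].conjugate()) < 1e-9
```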

Finally, we note that Examples 1 and 2 of the previous lecture, the cos(2xn) and sin(2xn) data

sets, along with Example 3, where an additional sin(5xn) set was added, demonstrated conjugate

symmetry.

High- and low-frequency DFT coefficients

From Eq. (15), it follows that

|F [N − k]| = |F [k]|, k = 0, 1, 2, · · · , N − 1. (22)

In other words, a plot of the magnitudes of DFT coefficients |F [k]| will be symmetric with respect to

the “middle”:

1. In the case that N is even, the symmetry will be about the line k = N/2.

2. In the case that N is odd, the symmetry will be about the line k = N/2, a non-integer.


In order to simplify the discussion, and keeping in mind that most, if not all, applications employ even

values of N , we shall assume, from this point onward, that N is even.

The plots of DFT coefficient magnitudes presented in the previous lecture all demonstrate this

symmetry.

From our discussions of Fourier series expansions, we know that the magnitudes of the coefficients

an and bn of cos(nx) and sin(nx), respectively, decay to zero in the limit n → ∞. Since the DFT is

based on complex exponentials, we also expect the DFT coefficients |F [k]| to decay with increasing k.

But from the conjugate symmetry property, the region of high-frequency DFTs is centered about the

value k = N/2, as sketched below.

[Figure: General behaviour of magnitudes |F[k]| of DFT coefficients. Low-frequency regions lie near k = 0 and k = N − 1; oscillation increases toward the middle, and the highest frequency occurs at k = N/2.]

In applications, it is often more desirable to consider a slightly revised version of this plot of DFT

coefficient magnitudes. Recalling that the DFT values F [k] are N -periodic, the right-half of the above

figure is identical to the plot of DFT coefficients immediately to the left of the point k = 0. As such,

we may consider the plot of F [k] magnitudes centered at k = 0, as shown below. In this way, moving

away from k = 0, in either direction corresponds to higher frequencies.
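Re-centering the spectrum at k = 0 is just a relabelling of indices: bins with k > N/2 are mapped to the negative frequencies k − N. A small Python sketch (not part of the original notes; the real 8-vector is an arbitrary choice):

```python
import cmath

def dft(f):
    """DFT as in the notes: F[k] = sum_n f[n] exp(-i 2*pi*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

f = [1.0, -0.3, 2.5, 0.7, -1.2, 0.0, 0.4, 3.1]
N = len(f)
F = dft(f)

# Relabel bin k as a signed frequency: k for k <= N/2, k - N otherwise.
centered = {(k if k <= N // 2 else k - N): abs(F[k]) for k in range(N)}

# For a real signal, |F| is even in the signed frequency,
# so the centered plot is symmetric about k = 0.
sym_ok = all(abs(centered[k] - centered[-k]) < 1e-9 for k in range(1, N // 2))
```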

We shall discuss the region of high-frequency DFT coefficients very shortly.

Shift Theorem

As mentioned in the previous lecture, the Shift Theorem may be applied more than once to treat

sequences that have been left- or right-shifted by M entries. We leave it as an exercise for the reader


[Figure: A revised version of the above plot of the magnitudes |F[k]| of DFT coefficients, now centered at k = 0 with k running from −N/2 + 1 to N/2; the low-frequency region lies near k = 0 and the high-frequency regions at the edges.]

to show how the Shift Theorem may be used to derive the DFT of the sin(2xn) sequence of Example

2 of the previous lecture from the cos(2xn) sequence of Example 1 (or vice versa).

Convolution Theorem

We shall discuss some interesting consequences and applications of this theorem in the next lecture.

“Thresholding” of DFT coefficients as a method of data compression

Data compression is a fundamental area of research and development in signal and image processing.

Practically speaking, you’d like to get as many songs or images on a DVD – or your iPod - as you

can. But you probably know that in order to squeeze more songs on a device, you have to “compress”

the digital data sets representing the songs/images, i.e., reduce the storage space required for each

item. But reducing the storage space means throwing out some information, implying a reduction

in the quality or fidelity of the song or image. Because of the redundancy of signals and images,

some compression is possible without any noticeable changes in aural or visual quality. But as the

compression is pushed higher, noticeable distortions eventually appear – for example, echoing or hissing

in the case of audio and blockiness or blurring in the case of images.

It is not the purpose of this course to study data compression methods in any detail. However,

our study of Fourier, and later wavelet, transforms naturally takes us to some of the basic concepts

that underlie compression methods.


The truncation of the Fourier series expansion of a function f(x), i.e., approximating f(x) by the

partial sum SN (x) may be viewed as a compression method. In general, it is impossible to store all

Fourier coefficients an and bn. We know that for a function f ∈ L2[a, b], the coefficients an and bn

decay to zero: for a sufficiently large N , all coefficients for which n > N will be “negligible.” (Of

course, as we have discussed, N will depend on the decay of the coefficients which, in turn, depends
on the regularity of the function f.)

In the case of discrete data sets, i.e., the discrete Fourier transform, we already have a finite

number of coefficients representing our signal/image of interest. Given a signal f ∈ RN , the N DFT

coefficients F[k], k = 0, 1, · · · , N − 1, permit a perfect reconstruction of f. We now wish to perform

compression on this data set.

That being said, it is probably useful to mention one simple, yet important, fact: You can’t

perform compression by simply deleting the signal values f [n]. That would be too brutal. That is not

to say that you couldn’t exploit the redundancy of the signal, e.g., that contiguous elements – f [n]

and f [n + 1] – of the signal are generally close in magnitude. You could do this by keeping f [n] and

perhaps using a single digit, say “1”, to indicate that f [n] is repeated in the next data element. Or

you could register the difference between f [n] and f [n + 1] which generally would require less storage

space. This is the essence of “predictive coding.” But this is getting us deeper into the subject of

data compression and away from the course. Here, we simply wish to show that one can work on the

discrete transforms – Fourier and wavelet – of signals.

One of the simplest methods of performing compression is thresholding, i.e., a perhaps less brutal

removal of “insignificant coefficients” in a discrete Fourier or wavelet transform. (The words significant

and insignificant are heavily used in signal/image compression literature, especially with regard to

wavelets.) The idea is simple: You set a threshold value ǫ > 0, and delete all coefficients with

magnitudes less than ǫ. Let’s present this a little more mathematically:

Thresholding algorithm: Let f = (f [0], f [1], · · · , f [N − 1]) ∈ RN (or CN) represent our signal of

interest, with DFT,

F = Ff = (F [0], F [1], · · · , F [N − 1]) ∈ CN . (23)

Let ǫ ≥ 0 be a threshold parameter. Now define the new sequence

F̃ǫ = (F̃ǫ[0], · · · , F̃ǫ[N − 1]), (24)


as follows,

F̃ǫ[k] = F[k]  if |F[k]| ≥ ǫ,   F̃ǫ[k] = 0  if |F[k]| < ǫ.   (25)

Then define

f̃ǫ = F−1F̃ǫ. (26)

f̃ǫ is the “compressed signal” corresponding to the threshold parameter ǫ. Of course, f̃0 = f, since
no DFT coefficients have been thrown away. We expect that for “small” values of ǫ, f̃ǫ should
approximate f well. (The question, of course, is “What is small?” It will depend upon the signal, in
particular on the entire ensemble of DFT coefficients F[k].) This could be summarized in the statement

‖f − f̃ǫ‖ → 0 as ǫ → 0. (27)

The quantity on the left may be viewed as the approximation error.

Of course, the question that remains is, “What values of ǫ do we use?” This depends on the

signal and the nature of the DFT spectrum F [k]. One could, for example, sort the DFT coefficients in

decreasing order of magnitude and then decide where to “cut.” Instead of prescribing an ǫ-value, one

may also decide to throw away a prescribed percentage of DFT coefficients based on their insignificance,

e.g., the 10% most insignificant coefficients.
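A minimal implementation of the thresholding algorithm of Eqs. (23)-(26) in Python (not part of the original notes). The inverse DFT below uses the standard 1/N normalization, and the test signal is an arbitrary choice; with ǫ = 0 the reconstruction is exact, since nothing is discarded.

```python
import cmath

def dft(f):
    """DFT as in the notes: F[k] = sum_n f[n] exp(-i 2*pi*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(F):
    """Inverse DFT: f[n] = (1/N) sum_k F[k] exp(+i 2*pi*k*n/N)."""
    N = len(F)
    return [sum(F[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def threshold(F, eps):
    """Eq. (25): keep F[k] if |F[k]| >= eps, otherwise set it to zero."""
    return [Fk if abs(Fk) >= eps else 0.0 for Fk in F]

f = [2.0, 0.1, -1.5, 0.05, 3.0, -0.02, 0.8, 0.0]
F = dft(f)

# eps = 0 keeps every coefficient, so the signal is reconstructed exactly.
f_rec = idft(threshold(F, 0.0))
exact = all(abs(f_rec[n] - f[n]) < 1e-9 for n in range(len(f)))

# A positive threshold zeroes the small coefficients.
kept = sum(1 for Fk in threshold(F, 2.0) if Fk != 0.0)
```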

Example: We now illustrate this method with a simple example. Consider the following function

f(x) = e^{−x²/10} [sin(2x) + 2 cos(4x) + 0.4 sin(x) sin(10x)],   0 ≤ x ≤ 2π.   (28)

(This function was also used in the book of Boggess and Narcowich, p. 139.) We sample this function

on [0, 2π] with N = 256, i.e.,

f[n] = f(xn),   xn = 2πn/N,   n = 0, 1, · · · , 255.   (29)

This original sampled signal is plotted on the top left of the next figure. Consecutive data points f[n]
have been connected with straight lines so that the signal may be seen more clearly. On the top right
is a plot of the magnitudes of the DFT coefficients F[k] of this signal. Note that the 2π-periodic
extension of this signal is not continuous at the endpoints, since f(0) ≠ f(2π).

The sets of plots that follow show the results of thresholding for increasing values of ǫ: 1.5, 2.5,

10.0, 20.0 and 40.0. In each case, the signal f̃ and thresholded DFT spectrum F̃ are shown. Also, the

percentage of coefficients retained by the thresholding procedure is given, as well as the L2 (Euclidean)


distance ‖f − f̃‖ between f and f̃, as well as the percentage relative error of approximation, computed
as follows,

(‖f − f̃‖ / ‖f‖) × 100.   (30)

There are some noteworthy features demonstrated in these plots:

1. First of all, as expected, as ǫ increases, more and more of the inner “high frequency” region of

the DFT spectrum gets deleted.

2. Also as expected, the L2 (Euclidean) distance ‖f − f̃‖ increases as ǫ increases.

3. Note that at small ǫ values, e.g., ǫ = 1.5, the most significant error/distortion to the signal

occurs in the region 5 ≤ x ≤ 6. Indeed, one observes “ringing” there. This is because of the

discontinuity of the 2π-periodic extension of f: the value of f(x255) is close to zero, whereas
f(x256) = f(x0) = 2. That being said, we must mention that the use of all 256 DFT coefficients
will reconstruct the data series f[n] perfectly. However, when terms of the DFT are deleted, the
Gibbs phenomenon will appear.

4. As ǫ is increased, the signal is altered in other regions as well, e.g., the relative minimum near

x = 0.6, and the relative maximum near x = 0.1.

Finally, let us return to the observation that the relative error in approximation of the signal f by

f̃ǫ increases with ǫ. In the next figure is plotted the relative error vs. ǫ for 0 ≤ ǫ ≤ 110. To construct

this plot, the relative errors were computed in increments of ∆ǫ = 0.5.

This figure is a simple example of a rate-distortion curve: a plot of the distortion, or error, vs.

the rate of thresholding. There are some noteworthy features in this plot:

1. For 0 ≤ ǫ < 1, there is no noticeable error. This might be because there are no coefficients with

magnitudes less than these ǫ values, or there are so few that their absence does not affect the

fidelity of reconstruction.

2. As ǫ is increased, there are flat regions. Once again, these might correspond to regions that are

not occupied by the DFT coefficients.

3. At ǫ = 107.5, 100% error is achieved. This is because the DFT coefficient with the highest

magnitude is F [4] = 107.36.


Thresholding: Given an ǫ > 0, remove all DFT coefficients with |F[k]| < ǫ.

[Figure. Left: Original signal f[n], n = 0, · · · , 255, obtained by sampling f(x) in Eq. (28). Right: Magnitudes |F[k]| of DFT coefficients. ‖f‖ = 12.95.]

[Figure: Threshold ǫ = 1.5. 44.9% of original coefficients retained. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f − f̃‖2 = 0.83. Relative L2 error 6.3%.]

[Figure: Threshold ǫ = 2.5. 25.4% of original coefficients retained. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f − f̃‖2 = 1.19. Relative L2 error 9.0%.]


[Figure: Threshold ǫ = 10.0. 8.2% of original coefficients retained. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f − f̃‖2 = 2.18. Relative L2 error 16.5%.]

[Figure: Threshold ǫ = 20.0. 5.1% of original coefficients retained. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f − f̃‖2 = 3.75. Relative L2 error 28.5%.]

[Figure: Threshold ǫ = 40.0. 3.1% of original coefficients retained. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f − f̃‖2 = 5.44. Relative L2 error 41.4%.]


[Figure: Relative error of approximation of signal f with thresholded signal f̃ǫ vs. ǫ, for 0 ≤ ǫ ≤ 110.]

In order to clarify the questions raised in points 1 and 2 above, it is probably more instructive to

plot the relative error vs. the percentage of coefficients removed: this is a more accurate indication of

the compression rate. This plot is shown below.

[Figure: Relative error of approximation of signal f with thresholded signal f̃ǫ vs. percentage of coefficients removed.]

This plot is much more instructive. It shows that, in fact, significant compression can be performed

before the error becomes appreciable. When 70% of the coefficients are removed, the relative error is

10%.


For this simple example, the compression ratio R may be defined as

R = (number of DFT coefficients used in original signal) / (number of DFT coefficients used in thresholded signal).   (31)

In this case, the numerator is N = 256. A plot of the relative error vs. compression rate is shown

below. This is the true rate-distortion curve for this experiment.

[Figure: Relative error of approximation of signal f with thresholded signal f̃ǫ vs. compression rate R.]

The point situated roughly at (3, 10) corresponds to the 70% removal rate mentioned for the
previous plot. The plot stops at R = 128; beyond this rate, virtually all coefficients are removed, which
corresponds to R = ∞, where the asymptotic value of 100% relative error is achieved.
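For concreteness, here is Eq. (31) worked with hypothetical counts (not taken from the lecture's experiment): if 77 of the N = 256 coefficients were retained, then

```python
# Compression ratio, Eq. (31): coefficients in original / coefficients retained.
# The retained count below is hypothetical, for illustration only.
N = 256
retained = 77
R = N / retained                           # a bit above 3
removed_pct = 100.0 * (N - retained) / N   # roughly 70% removed
```

so a roughly 70% removal rate corresponds to a compression ratio of about 3, matching the point (3, 10) noted above.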


Lecture 11

Discrete Fourier transforms (cont’d)

Thresholding as a method of denoising

The method of thresholding can also be used to “denoise” signals. (In fact, it has been a rather
standard method for wavelet-based denoising of signals and images.) First of all, we should qualify

what we mean by “noisy signals/images.”

The transmission, reproduction or recording of signals/images, as “pure” as they may be initially,

generally introduces distortions. Some of these distortions may be quite systematic in nature, e.g.,

the scratch on a lens of a digital camera. But signals/images may also be subject to distortions that

may be considered as random in nature, for example, the distortion of an audio signal that is sent

over a very poor communications line. There are various models for such degradations, according to

the application. In what follows, we employ one of the simplest and most standard models, namely

additive Gaussian noise. Our actual implementation of this model is also quite simple. (That

being said, the majority of research papers basically use the same type of simplified model.)

Let f0 = (f0[0], f0[1], · · · , f0[N − 1]) denote a “pure” or “noiseless”, i.e., undegraded, signal. For

example, it could represent part of an audio track that was recorded in a “perfect studio”. (Of course,

no such studio exists.) We then assume that this perfect signal f0 is degraded according to the

following model,

f = f0 + n, (32)

where n ∈ RN denotes a random N-vector. The components n[i], 0 ≤ i ≤ N − 1 (we’ll use i as
an index instead of the usual n, to avoid the confusing notation “n[n]”) are independent random
variables, identically distributed according to the normal or Gaussian distribution N (0, σ),

i.e., zero-mean, standard deviation σ > 0. The vector f then represents the noisy signal. Of course,

what we want is to find f0, or at least a good approximation to it, from f .

As you know from probability/statistics, a proper interpretation of this model implies that we

must consider a large collection or ensemble of such noisy signals produced by this random process.

f0 remains the same, but we’ll have many different noisy signals f produced by the random N -vectors

n. And if we examined the values assumed by a particular entry in the n vectors, say n[5], we would

see that, very roughly, the mean of these values would be near zero, and the standard deviation near σ.


Actually, let’s stop here for a moment and mention that this represents one way of extracting
approximations to the noiseless signal f0: by collecting a large number M of such noisy signals f
and taking the average of them. If M is large enough, then the average of the n vectors will be roughly
(0, 0, · · · , 0) ∈ RN. Therefore the average of all of these noisy signals will be a rough approximation

to f0. This is one of the oldest methods of noise reduction.

Here, however, we assume that we do not have access to a large number of noisy signals, but only

one – produced, in essence, from a particular realization of the random vector n. In the numerical

experiment below, this particular realization is constructed by simply generating N random numbers

from a random number generator that is designed to generate them according to a normal N (0, σ)

distribution.

Here is the important point:

By “denoising” the noisy signal f , we mean finding approximations to the

noiseless signal f0.

We can never find f0 exactly, since the elements of the random vector n are not known deterministically.

The best we can do is to find approximations to f0.

For the noiseless signal, we shall once again employ the discrete signal of length N = 256,

f0[n] = f(xn),   xn = 2πn/N,   n = 0, 1, · · · , 255,   (33)

obtained by sampling the function

f(x) = e^{−x²/10} [sin(2x) + 2 cos(4x) + 0.4 sin(x) sin(10x)],   0 ≤ x ≤ 2π.   (34)

The function is plotted once again at the top left in the figure below.

A particular vector n ∈ RN was also generated by means of a random number generator (in the

FORTRAN programming language) using standard deviation value σ = 0.1. The vector n is plotted

at the top right in the figure below.

Finally, the noisy signal f = f0 + n is constructed by adding the components of these two signals,

cf. Eq. (32). The result is plotted at the bottom of the figure.
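The noise model of Eq. (32) is easy to reproduce; the sketch below (not part of the original notes) uses Python's random.gauss in place of the FORTRAN generator mentioned above, with the same N = 256 and σ = 0.1. The seed is an arbitrary choice, fixed only for reproducibility.

```python
import math
import random

random.seed(0)        # arbitrary fixed seed: one particular realization of n
N = 256
sigma = 0.1

# Noiseless signal: samples of f(x) in Eq. (34) on [0, 2*pi).
def f_of_x(x):
    return math.exp(-x**2 / 10) * (math.sin(2*x) + 2*math.cos(4*x)
                                   + 0.4*math.sin(x)*math.sin(10*x))

f0 = [f_of_x(2*math.pi*n/N) for n in range(N)]

# Additive Gaussian noise model, Eq. (32): f = f0 + n,
# with n[i] i.i.d. normal, mean 0, standard deviation sigma.
noise = [random.gauss(0.0, sigma) for _ in range(N)]
f = [a + b for a, b in zip(f0, noise)]

# The L2 norm of the noise vector is roughly sigma * sqrt(N) ~ 1.6 here.
noise_norm = math.sqrt(sum(v * v for v in noise))
```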

We now show that some “denoising” of the signal f , i.e., finding approximations to f0, may be

achieved by thresholding the discrete Fourier transform F of f . First of all, recall that the DFT is a


[Figure. Left: Noiseless signal f0[n], sampled from f(x) in Eq. (34). Right: Noise vector n, σ = 0.1.]

[Figure: Resulting noisy signal f = f0 + n.]


linear operator. This means that

F = F(f) = F(f0 + n) = F(f0) + F(n) = F0 + N. (35)

Here we run the risk of confusion since N , the DFT of n, also denotes the number of samples. We

hope that things will be clear by context.

This addition property of the DFTs is illustrated in the next figure. At the top left is the DFT
F0 of the noiseless signal f0. At the top right is the DFT N of the pure-noise signal n. These two are
added to produce the DFT F of the noisy signal f.

[Figure. Left: Noiseless DFT F0, obtained from the sampled signal f0. Right: Noise DFT N, obtained from noise vector n.]

[Figure: Resulting noisy DFT F = F0 + N.]

We now step back and examine the DFT N of the pure-noise signal. Perhaps the most noteworthy

feature of this plot is that the DFT coefficients do not exhibit the decay characteristic of “normal”

signals. In fact, they do not appear to decay at all. It is a fact, which will not be proved here,


that the coefficients N [k] are also random – for all intents and purposes, we may view them as being

generated randomly from a normal distribution. (That being said, each of the coefficients N [k] is
related deterministically to the noise vector coefficients n[i].) In the next figure, we show how the

amplitude of the DFT coefficients is related to the amplitude of the noise vector n. It shows the

noise vector n used in this experiment, with σ = 0.1 (top), along with its DFT, and a noise vector

corresponding to σ = 0.5, so that the random entries can assume values of larger magnitude.

Noise and its DFT representation

[Figure: Pure-noise signal n[i], zero-mean, σ = 0.1, and corresponding DFT coefficient magnitudes |N[k]|.]

[Figure: Pure-noise signal n[i], zero-mean, σ = 0.5, and corresponding DFT coefficient magnitudes |N[k]|.]

We now come to the main point behind this thresholding denoiser. The coefficients N [k] of

the pure noise vector are relatively small in magnitude for all frequencies k. They are seen to be

insignificant with respect to the low-frequency coefficients F0[k] of the noiseless signal. They are not

insignificant with respect to the high-frequency coefficients F0[k]. Therefore we shall assume that

most of the high-frequency content of the noisy DFT F comes from noise. Since these coefficients are

insignificant with respect to much of the low-frequency content, we conjecture that thresholding might

be able to “remove” much of the noise content in F , and thereby provide reasonable approximations


to F0, hence f0.
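The whole thresholding denoiser amounts to a few lines. Here is a minimal NumPy sketch (not the notes' MATLAB code; the signal f0 below is an illustrative stand-in, not Eq. (34), and eps = 3.0 is an ad hoc choice):

```python
import numpy as np

def threshold_denoise(f, eps):
    """Zero every DFT coefficient with magnitude below eps, then invert."""
    F = np.fft.fft(f)
    return np.real(np.fft.ifft(np.where(np.abs(F) >= eps, F, 0.0)))

# Toy stand-in for the experiment: a low-frequency signal plus Gaussian noise.
rng = np.random.default_rng(1)
N = 256
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
f0 = np.sin(x) + 0.5 * np.sin(3.0 * x)   # illustrative "noiseless" signal
f = f0 + rng.normal(0.0, 0.1, N)         # noisy signal

f_tilde = threshold_denoise(f, eps=3.0)  # eps chosen ad hoc
err_noisy = np.linalg.norm(f0 - f)
err_denoised = np.linalg.norm(f0 - f_tilde)
print(err_noisy, err_denoised)
```

On this toy signal the large low-frequency coefficients survive while most noise coefficients are discarded; the handful of noise coefficients that happen to exceed eps are exactly what limits the method.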

The results of thresholding for a number of ε values are shown in the next figures. For each ε

value we present the resulting signal, as well as the L2 error and relative L2 error of the approximation to

the noiseless signal f0.

What is rather interesting, and potentially discouraging, is that the L2 error, 1.71, for the case

ε = 2.0 is greater than the error, 1.66, for the actual noisy signal! As ε is increased, however, the error

decreases – it is 1.64 at ε = 3.0 – but then increases again.

The results are not very encouraging and, indeed, thresholding of DFTs is not a very good

method. But it’s not the thresholding that’s the problem – it’s the DFTs. They are too global: each

DFT coefficient contains information from the entire signal. We’ll see later that thresholding works

quite well with wavelet transforms, because of the locality of wavelet functions.


Simple denoising by thresholding

Left: Original noisy signal f = f0 + n, σ = 0.1. Right: Magnitudes |F[k]| of DFT coefficients. L2 error

‖f0 − f‖2 = 1.66. Relative L2 error 12.6%.

Threshold ε = 2.0. 57.4% of original coeffs retained. Reconstructed signal f̃[n] and magnitudes

|F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.71. Relative L2 error 13.0%.

Threshold ε = 3.0. 37.1% of original coeffs retained. Reconstructed signal f̃[n] and magnitudes

|F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.64. Relative L2 error 12.4%.


Threshold ε = 4.0. 25.3% of original coeffs retained. Reconstructed signal f̃[n] and magnitudes

|F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.70. Relative L2 error 12.9%.

Threshold ε = 5.0. 16.8% of original coeffs retained. Reconstructed signal f̃[n] and magnitudes

|F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.72. Relative L2 error 13.0%.

Threshold ε = 10.0. 9.0% of original coeffs retained. Reconstructed signal f̃[n] and magnitudes

|F̃[k]|. L2 error ‖f0 − f̃‖2 = 2.06. Relative L2 error 15.6%.


A closer look at the Convolution Theorem

In this section, we examine some particular examples, along with a very simple, yet interesting,

application to signal processing.

Recall the Convolution Theorem:

Let f and g be two N -periodic complex vectors. Define the (circular) convolution of these two

vectors as the vector h with components

h[n] = Σ_{j=0}^{N−1} f[j] g[n − j],   n = 0, 1, · · · , N − 1.   (36)

Then the DFT of h is related to the DFTs of f and g as follows,

H[k] = F[k] G[k].   (37)

We first rewrite the RHS of (36) slightly, via a change of variables:

h[n] = Σ_{j=0}^{N−1} f[n − j] g[j],   n = 0, 1, · · · , N − 1.   (38)

In this way, we can view f as a “signal” and g as a “mask”: The convolution then produces a new

signal h from f .
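The theorem is easy to verify numerically. Below is a direct NumPy check (illustrative code, not from the notes): h is computed from Eq. (36) by brute force, and its DFT is compared with the product F[k]G[k].

```python
import numpy as np

def circular_convolution(f, g):
    """h[n] = sum_{j=0}^{N-1} f[j] g[n-j], indices taken mod N (Eq. (36))."""
    N = len(f)
    return np.array([sum(f[j] * g[(n - j) % N] for j in range(N))
                     for n in range(N)])

rng = np.random.default_rng(0)
f = rng.normal(size=8)
g = rng.normal(size=8)
h = circular_convolution(f, g)

# Convolution Theorem, Eq. (37): H[k] = F[k] G[k].
H_direct = np.fft.fft(h)
H_product = np.fft.fft(f) * np.fft.fft(g)
print(np.allclose(H_direct, H_product))   # True
```

The brute-force sum costs O(N²) operations; computing h as the inverse DFT of F[k]G[k] via the FFT costs only O(N log N), which is why convolutions are routinely done in the frequency domain.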

In the figure below, we align the vector g with f in a manner appropriate to the convolution. The

terms that are joined by lines are multiplied and then added together to form the entry h[n]:

Terms in the convolution of f with g contributing to h[n]: g[0] is aligned with f[n], g[1] with f[n − 1], g[−1] = g[N − 1] with f[n + 1], and so on; aligned terms are multiplied and the products summed.

The convolution operation may be viewed as a kind of “reversed scalar product”: To compute h[n],

we “flip” the order of the elements of g with respect to g[0], which is lined up with f[n], and then

perform the scalar product. We’ll come back to this idea in our study of wavelets.

Let us now examine a few special cases for g:


1. g[0] = 1 and g[n] = 0 otherwise: The only term that contributes to the sum in Eq. (36) is

f [n]g[0] = f [n]. Therefore

h[n] = f [n], n = 0, 1, · · · , N − 1, (39)

or simply h = f . This has the appearance of an identity operation, but it is more convenient to

view g as the discrete version of the Dirac delta function. This will become clearer in the next

example.

In this example, the DFTs of f and h are identical, i.e., H = F . From the Convolution Theorem,

H[k] = F [k]G[k], implying that G[k] = 1. But we could have also derived this result by directly

computing the DFT of g:

G[k] = Σ_{n=0}^{N−1} g[n] exp(−i2πkn/N)

     = g[0] exp(0)   (since only g[0] is nonzero)

     = 1.   (40)

2. g[1] = 1 and g[n] = 0 otherwise: The only term that contributes to the sum in Eq. (36) is

f [n − 1]g[1] = f [n − 1]. Therefore,

h[n] = f [n − 1], n = 0, 1, · · · , N − 1. (41)

Thus,

(h[0], h[1], · · · , h[N − 1]) = (f[N − 1], f[0], f[1], · · · , f[N − 2]).   (42)

In other words, g corresponds to the right-shift operator.

We’ll leave it as an exercise for the reader to determine the DFT of g, i.e., G[k], in two different

ways.

3. Of course, we can generalize the above result: g[k0] = 1 and g[n] = 0 otherwise, where k0 ∈

{0, 1, · · · , N − 1}. Then g is a k0-fold right-shift operator.

Once again, we’ll leave it as an exercise for the reader to determine the DFT of g, i.e., G[k], in

two different ways.
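One way to check the shift property numerically is the NumPy sketch below (not from the notes; it also reveals the answer G[k] to the exercise, so treat it as a way to verify your own derivation):

```python
import numpy as np

N = 8
k0 = 3
g = np.zeros(N)
g[k0] = 1.0                       # single spike at position k0

f = np.arange(N, dtype=float)     # test signal 0, 1, ..., 7

# Convolving with g (done here via the Convolution Theorem) performs
# a k0-fold circular right shift, i.e. h[n] = f[(n - k0) mod N]:
h = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))
print(np.allclose(h, np.roll(f, k0)))                      # True

# The DFT of g is a pure phase factor, G[k] = exp(-i 2*pi*k*k0/N):
k = np.arange(N)
print(np.allclose(np.fft.fft(g), np.exp(-2j * np.pi * k * k0 / N)))   # True
```

Note that |G[k]| = 1 for all k: shifting a signal changes only the phases of its DFT coefficients, never their magnitudes.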


Lecture 12

Discrete Fourier transforms (cont’d)

A closer look at the Convolution Theorem: “Averaging” as a convolution

With reference to Eq. (38) from the previous lecture, we now consider the following “mask” g: For

an α ∈ [0, 1],

g[0] = α,

g[1] = (1/2)(1 − α),

g[−1] = g[N − 1] = (1/2)(1 − α),

g[n] = 0, otherwise.   (43)

In other words, g has at most three non-zero elements. When α = 1, we have the Dirac delta mask.

Note that

g[0] + g[1] + g[−1] = 1. (44)

The convolution of a signal f and g then produces the signal

h[n] = g[1] f[n − 1] + g[0] f[n] + g[−1] f[n + 1]

     = α f[n] + (1/2)(1 − α) (f[n − 1] + f[n + 1]).   (45)

This may be viewed as a weighted averaging of f [n] with its immediate neighbours to produce a new

signal value h[n]. In the special case α = 1/3, the weighting is uniform:

h[n] = (1/3) (f[n − 1] + f[n] + f[n + 1]).   (46)

Since only the immediate neighbours of f [n] are employed in this averaging procedure, it is

often referred to as “local averaging.” The effect of this procedure is to “smooth out” a signal.

For example, if f [n] lies higher in value than its neighbours, as sketched below, then averaging will

produce a lower value. And, of course, if f [n] lies lower in value, then averaging will produce a higher,

i.e., more positive, value.
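In code, one application of this weighted average is a one-liner. Here is a NumPy sketch (not from the notes; np.roll supplies the circular indexing, so the endpoints wrap around as in the N-periodic setting):

```python
import numpy as np

def local_average(f, alpha=1.0 / 3.0):
    """One pass of Eq. (45): h[n] = a f[n] + (1-a)/2 (f[n-1] + f[n+1])."""
    w = 0.5 * (1.0 - alpha)
    # np.roll makes the indexing circular: np.roll(f, 1)[n] = f[n-1], etc.
    return alpha * f + w * (np.roll(f, 1) + np.roll(f, -1))

# A flat signal with an "edge" between samples 4 and 5:
f = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
h = local_average(f)    # alpha = 1/3: uniform weights of 1/3
print(h[4], h[5])       # f[4] drops one-third of the way down, f[5] rises
```

Because the convolution is circular, the values at samples 0 and 9 also change here (the periodic extension puts another "edge" at the wrap-around); interior flat regions are untouched, as described next.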

This “smoothing” effect may also be viewed as “blurring”, especially if there are sharp disconti-

nuities in the signal, as sketched in the next figure. Signal f consists of two flat, i.e., constant, regions,


Local averaging at f[n], using neighbours f[n − 1] and f[n + 1], to produce h[n].

and an “edge,” or discontinuity between 4 and 5. One application of the convolution/local averaging

will lower the value of f [4] one-third the way down towards the value of f [5], to produce the value h[4]

and raise the value of f [5] one-third the way up to produce the value h[5]. In summary, signal values

are changed at 4 and 5. The other signal values are unaffected since they lie in constant regions –

local averaging will not change their values. The result of this operation is a slightly “blurred” edge,

i.e., a more gradual change in values from the highest ones to the lowest ones.

The blurring of an “edge” or discontinuity of a signal by local averaging. Three panels: the original signal f, with edge between 4 and 5; one application of local averaging, h = f ∗ g; another application of local averaging, r = h ∗ g = f ∗ (g ∗ g).

Another application of the averaging operator will alter the values of the signal at 3, 4, 5 and 6.


The reader can see that the gradient of the signal has been further decreased in magnitude, i.e., the

graph has become less steep.

One final point: the reader may have already noticed how each application of the local averaging

operator increases the region of influence, i.e., the points affected by the averaging. Each application

affects an additional signal value, previously unaffected, on either side of the original edge.

Local averaging viewed in the frequency domain

Let us now examine what is happening in the frequency domain, i.e., in “k-space,” with the DFTs.

Once again, the DFT H of the blurred signal will be related to F as follows,

H[k] = F [k]G[k], k = 0, 1, · · · , N − 1. (47)

Since we know g, we may compute G[k]: By definition,

G[k] = Σ_{n=0}^{N−1} g[n] exp(−i2πkn/N)

     = g[−1] exp(i2πk/N) + g[0] exp(0) + g[1] exp(−i2πk/N)

     = α + (1/2)(1 − α) [exp(i2πk/N) + exp(−i2πk/N)]

     = α + (1 − α) cos(2πk/N).   (48)

Therefore,

H[k] = F[k] [α + (1 − α) cos(2πk/N)].   (49)

One immediate consequence of this relation is that

H[0] = F[0].   (50)

In other words, the zero-frequency component of F is unchanged. But what about the other frequencies? We need to examine the graph of the function G[k] vs. k.

First, we identify some other important values:

G[N/2] = 2α − 1, G[N/4] = G[3N/4] = α. (51)

A qualitative sketch of the graph of G[k] for α < 1/2 is shown in the next figure.


The DFT G[k] = α + (1 − α) cos(2πk/N) of the local averaging convolution kernel g, sketched for 0 ≤ k ≤ N: it falls from G[0] = 1 through G[N/4] = α to G[N/2] = 2α − 1, then rises again, dampening the magnitudes of high-frequency DFT coefficients.

Perhaps the most important feature of the graph is that

|G[k]| < 1, 1 ≤ k ≤ N − 1. (52)

Then from the fact that

|H[k]| = |F [k]||G[k]|, (53)

we may conclude that

|H[k]| < |F [k]|, 1 ≤ k ≤ N − 1. (54)

In other words, the magnitudes of the DFT coefficients F [k] have been reduced to produce H[k]. For

the particular case α = 1/3, the degree of shrinking is greatest in the high-frequency region, i.e.,

N/4 ≤ k ≤ 3N/4.
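Both Eq. (48) and the bound (52) can be confirmed numerically; a NumPy sketch (illustrative, not from the notes):

```python
import numpy as np

N = 256
alpha = 1.0 / 3.0
g = np.zeros(N)
g[0] = alpha
g[1] = g[-1] = 0.5 * (1.0 - alpha)   # g[-1] is stored at index N-1

G = np.fft.fft(g)                    # numerically computed DFT of the mask
k = np.arange(N)
G_formula = alpha + (1.0 - alpha) * np.cos(2.0 * np.pi * k / N)  # Eq. (48)

print(np.allclose(G, G_formula))     # the DFT is real and matches Eq. (48)
print(np.max(np.abs(G[1:])) < 1.0)   # Eq. (52): |G[k]| < 1 for 1 <= k <= N-1
```

That G[k] turns out to be purely real is a consequence of the symmetry g[−1] = g[1]: the mask is an even function of n.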

Of course, this result is not surprising – we expected that the blurring or smearing of a signal

means that higher frequency components are being diminished in magnitude. But the main point is

that our analysis allows us to move from deblurring or denoising operations in the spatial or temporal

domain to equivalent operations in the frequency domain. We may choose to modify the DFT F of a

signal in order to denoise/deblur it, rather than working on the signal f itself. Of course, the method

of thresholding of DFT coefficients examined in the previous lecture is an example.


Repeated applications of the averaging operator

Let us change the name of the averaged signal h = f ∗ g to be h1. If we now apply the averaging

operator to the averaged signal h1, the result is a new signal – call it h2:

h2 = h1 ∗ g = (f ∗ g) ∗ g. (55)

In the frequency domain, the DFT of h2 will be H2 = H1 G. But H1 = F G. The net result is that

H2[k] = (F[k] G[k]) G[k] = F[k] G[k]^2.   (56)

It is straightforward to show that if we apply the convolution/averaging operator n times, the result

is the signal hn with DFT

Hn[k] = F[k] G[k]^n.   (57)

Recall that G[0] = 1 and that |G[k]| < 1 for k ≠ 0. It follows that

Hn[0] = F[0],   n = 1, 2, · · · ,   (58)

and

G[k]^n → 0 as n → ∞,   1 ≤ k ≤ N − 1.   (59)

This implies that in the limit n → ∞, the DFT Hn – recall that it is a complex N-vector

– will approach the limiting N -vector

H = (F [0], 0, 0, · · · , 0). (60)

The reader may already see that this corresponds to the DFT of a constant function, as expected: If

you keep taking averages, you eventually smooth the function out to a constant, i.e., h[n] = C. The

question is, “What is the value of the constant C?” We leave it for the reader to show that

C = (1/N) Σ_{n=0}^{N−1} f[n],   (61)

i.e., the average value of the signal f . This seems to make sense, from a conservation principle, since

the convolution coefficients were chosen to “conserve signals,” cf. Eq. (44).
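This limiting behaviour can be seen numerically by doing all n applications at once in the frequency domain, via Eq. (57); a NumPy sketch (not from the notes; the signal and the number of applications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
f = rng.normal(size=N)               # arbitrary test signal

alpha = 1.0 / 3.0
g = np.zeros(N)
g[0], g[1], g[-1] = alpha, 0.5 * (1.0 - alpha), 0.5 * (1.0 - alpha)
G = np.fft.fft(g)

# n-fold averaging, all in the frequency domain: H_n[k] = F[k] G[k]^n.
n_applications = 20000
h_n = np.real(np.fft.ifft(np.fft.fft(f) * G ** n_applications))

# All frequency content except k = 0 has been wiped out, leaving the
# constant C = (1/N) sum f[n], the average value of f, as in Eq. (61).
print(np.allclose(h_n, f.mean()))    # True
```

Note how cheap this is: one FFT, one elementwise power, one inverse FFT, regardless of how many averaging passes are being simulated.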


Denoising by local averaging

The fact that local averaging/convolution smooths a signal suggests that it may be able to perform

denoising, essentially by averaging out the fluctuations produced by the additive noise. This is also

supported by the fact that local averaging dampens the magnitudes of high-frequency coefficients.

As such, we consider applying the local averaging operator introduced in the previous section, with

α = 1/3 to the noisy signal f [n] examined in the previous lecture, and treated with the thresholding

algorithm. Recall that the noisy signal was given by f = f0 + n, where f0 is the noiseless signal

given by Eq. (34) and n is an N -vector composed of random numbers generated by a random number

generator from a normal distribution N (0, σ), zero-mean and standard deviation σ = 0.1.

The results are shown in the accompanying figures. We first present again the noiseless signal f0

along with its DFT, then the noisy signal f along with its DFT. In the following figures, we show the

results of applying one, two, three and four convolutions to the signal. Most noticeably, the L2 error

between the signal and the noiseless signal f0 has been reduced from 1.66 to 1.31 after one application

of averaging. (Recall that the first application of thresholding resulted in an increase in the error.)

The L2 error is further reduced to 1.19 after another convolution. However, the third convolution

increases the error, as does the fourth. This decrease in error, followed by an increase, indicates the

tradeoff between smoothing of the signal to remove noise and smoothing of the signal away from the

underlying noiseless signal f0.

Note also the effects of the convolutions on the DFT spectra – one application of convolution

significantly diminishes the high-frequency coefficients in the region 75 ≤ k ≤ 175. After two applications, this part of the spectrum is almost eliminated. Further applications continue to diminish other

parts of the spectrum, which may actually “oversmooth” the signal.

This experiment has shown that the local averaging/convolution method seems to work better

than the thresholding method. But it is only one experiment, and definite conclusions cannot be

made.

The connection between local averaging/convolution and high-frequency damping also suggests

that we could perform denoising in the frequency domain, as was done for thresholding. Instead of

merely discarding DFT coefficients deemed insignificant, we could apply some kind of dampening

factor to the spectrum, with greater dampening being performed for high frequencies.
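A minimal version of such frequency shaping, sketched in NumPy (not from the notes; the raised-cosine damping curve below is one arbitrary choice of shaper, and the signal is an illustrative stand-in):

```python
import numpy as np

def shape_spectrum(f, damping):
    """Multiply each DFT coefficient by a chosen damping factor and invert."""
    return np.real(np.fft.ifft(np.fft.fft(f) * damping))

rng = np.random.default_rng(2)
N = 256
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
f0 = np.sin(x)                          # illustrative noiseless signal
f = f0 + rng.normal(0.0, 0.1, N)        # noisy signal

# One possible shaper: a raised-cosine curve that equals 1 at k = 0 and
# damps coefficients near the high-frequency band k = N/2 toward 0.
k = np.arange(N)
damping = (0.5 * (1.0 + np.cos(2.0 * np.pi * k / N))) ** 2

f_shaped = shape_spectrum(f, damping)
print(np.linalg.norm(f0 - f), np.linalg.norm(f0 - f_shaped))
```

Unlike hard thresholding, the shaper varies smoothly with k, so no isolated high-frequency coefficient survives to produce artifacts such as the "whine" described in the Appendix.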

I thank one of the students in the class (E. Grant) for telling me after class that this is essentially


Simple denoising by convolution operation, fc[n] = (1/3)(f[n − 1] + f[n] + f[n + 1])

Left: Original noiseless signal, N = 256 samples, f0[n], n = 0, 1, · · · , 255. Right: Magnitudes |F0[k]| of DFT

coefficients.

Left: Noisy signal f = f0 + n, σ = 0.1, 0 ≤ x ≤ 2π, N = 256 samples. Right: Magnitudes |F[k]| of

DFT coefficients. L2 error ‖f0 − f‖2 = 1.66. Relative L2 error 12.6%.

Application of one convolution. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.31.

Relative L2 error 10.0%.


Application of two convolutions. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.19.

Relative L2 error 9.0%.

Application of three convolutions. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.25.

Relative L2 error 9.5%.

Application of four convolutions. Reconstructed signal f̃[n] and magnitudes |F̃[k]|. L2 error ‖f0 − f̃‖2 = 1.29.

Relative L2 error 9.8%.


what is done by sophisticated audio processing software packages. For example, I have been told that

a software package will ask for a sample of the “background noise”. It then performs a frequency

analysis of this noise – analogous to the DFT – and then allows you to design a “shaper” to modify

these frequencies as desired.

If time permits, perhaps we can return to this topic near the end of this course. In the meantime,

the reader may wish to experiment with various “frequency-shaping” formulas in an effort to denoise

a given signal.

Signal/image enhancement and “Deconvolution”

In many practical applications, we are given a degraded signal, say, h, and asked to find good approx-

imations to the original signal f that was degraded to produce h. The degradation could be caused by noise or by blurring, or both. In fact, in the signal processing literature, the general model for degradation is a composition of a blur with a noise operator (often additive noise).

If we happen to know (or assume!) that the degradation was accomplished by convolution with a

kernel g, then the DFTs of the degraded signal h and the original signal f are related as follows,

H[k] = F[k] G[k].   (62)

Now suppose that we know the operator g, hence the DFT coefficients G[k]. One may well be tempted to solve for F[k] by division, i.e.,

F[k] = H[k]/G[k],   k = 0, 1, · · · , N − 1,   (63)

and then to perform an inverse DFT on F to obtain f, i.e., f = F⁻¹F.

Very nice in theory, but not often successful in practice! One reason is that some coefficients G[k]

may be zero or very close to zero in magnitude. As a result, this procedure is unstable. A more stable

procedure would be to find a DFT F that minimizes the squared distance

‖FG − H‖2.   (64)

In L2, this becomes a least-squares problem, which is generally more stable.
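The instability of the naive division, and the benefit of stabilizing it, can be seen in a small NumPy experiment (not from the notes; the Tikhonov-regularized formula below is one standard choice of the extra condition discussed next, and the kernel, signal and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
alpha = 1.0 / 3.0

# Blur kernel: the three-point averaging mask; its DFT G[k] comes very
# close to zero near k = N/3 and k = 2N/3, which is what destabilizes
# naive division.
g = np.zeros(N)
g[0], g[1], g[-1] = alpha, 0.5 * (1.0 - alpha), 0.5 * (1.0 - alpha)
G = np.fft.fft(g)

x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
f = np.sin(x)                                        # original signal
h = np.real(np.fft.ifft(np.fft.fft(f) * G))          # blurred signal
h = h + rng.normal(0.0, 1e-3, N)                     # plus a little noise

H = np.fft.fft(h)

# Naive deconvolution, Eq. (63): divide by G[k]; the noise in the bins
# where |G[k]| is tiny gets hugely amplified.
f_naive = np.real(np.fft.ifft(H / G))

# Tikhonov-regularized least squares: F = conj(G) H / (|G|^2 + lam).
lam = 1e-3
f_reg = np.real(np.fft.ifft(np.conj(G) * H / (np.abs(G) ** 2 + lam)))

print(np.linalg.norm(f - f_naive), np.linalg.norm(f - f_reg))
```

Even with noise a hundred times smaller than the signal, the naive reconstruction error is dominated by the near-zero bins of G, while the regularized division keeps the error small at the cost of a slight bias.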

But there are other problems. Generally, such inverse problems, i.e., given an h, find f such

that h = f ∗ g, are said to be ill-posed because they lack unique solutions. There are often many, if

not an infinite number of, solutions that satisfy the relation, at least approximately. One must often


impose additional conditions on the solution during the process, which restricts the space of solutions

that we are exploring, but we still may be able to find useful approximations. The imposition of

additional conditions (do you recall the Lagrange multiplier technique in advanced calculus?) is

known as regularization.

The problem of “inverting” Eq. (62), i.e., find F given H, is known as “deconvolution,” for reasons

that should be clear: One obtains H from F by convolution, so obtaining F from H is the reverse

process, i.e., “undoing” the convolution.

Signal/image enhancement: When does one stop?

In the previous experiments on denoising using thresholding and convolution, we applied a particular

operation on the noisy signal several times and observed the errors between the enhanced (i.e., denoised) signal and the reference signal f0.

but then increase. As such, there are optimal “cutoff times”. Of course, if we know the original signal

f0, we know when to stop. But what if we don’t know f0, the situation we face most often? For

example, we may retrieve a noisy or blurred image f from somewhere, perhaps our own digital camera

and wish to enhance it, i.e., denoise or deblur it. Most often, this is done by trial and error – we look

at the result of the enhancement process and decide if we are satisfied with the result. If not, we may

wish to continue with the enhancement process, perhaps tweaking the control parameters.

A big question in signal/image processing is, “How do we automate this process?” For example,

how can we program a computer to know when to stop applying an operation, say convolution, to a

noisy image, if we don’t know the noiseless image f0?

A simple illustration of DFT for audio signal processing: Handel’s “Hallelujah”

chorus

This lecture actually began with a presentation of some results of simple DFT-based denoising, now

applied to a real audio signal – Handel’s “Hallelujah” chorus – instead of a seemingly sterile mathematical function as done earlier. The results, along with the MATLAB file used to generate them

(written by D. Brunet, my former Ph.D. student and the TA for this course in Winter 2011), can be

found in a folder posted below this week’s set of lectures on Waterloo LEARN. The notes summarizing

the presentation are included below.


Appendix

DFT-based denoising of an audio signal: Handel’s “Hallelujah” Chorus

Here the ideas of thresholding and low-pass filtering for the purpose of denoising are applied to a more

realistic data set, namely an almost 9-second stream of Handel’s famous “Hallelujah” Chorus. I thank

Dominique Brunet, the TA for this course, for writing a MATLAB script, soundexampleDFT.m, which

performs these denoising operations. The MATLAB file is copied below and also posted along with

these notes at the course website on UW-ACE.

The Handel chorus is available from MATLAB as the digital audio file ’handel’, a data set of

N = 73113 points that were obtained by sampling the (continuous) audio signal at a frequency of

8192 Hz (i.e., 8192 = 2^13 samples per second). In the MATLAB program below, it is accessed by

means of the load command. The signal is then normalized to assume values in [−1, 1] by dividing all

elements by the data value with the highest magnitude. This “noiseless audio signal” is also written

into the file Hallelujah.wav, which is also posted.

Gaussian noise of standard deviation σ = 0.1 is then added to this data set to produce a noisy

audio signal, written into the file noisyHallelujah.wav, also posted.

%%
clear all
close all

%% load signal
VOLUME = 1;               % multiplicative factor of the amplitude of the signal
load('handel','Fs','y');  % load "handel" in vector 'y' with sample frequency of 'Fs' hertz
y = y/max(abs(y));        % normalize amplitude of signal
len = length(y);          % number of samples in 'y'

%% generate white noise
seed = 0;
randn('state',seed)       % initialize pseudo-random number generator
sigma = 0.1;              % standard deviation of the noise
n = sigma*randn(len,1);   % create vector of Gaussian noise

%% additive noise
yn = y + n;

%% low-pass filter
med = (len+1)/2;           % middle of the signal
cut = med/2;               % cut 50% of the signal
medmin = ceil(med - cut);  % lower bound
medmax = floor(med + cut); % upper bound
Yn = fft(yn);              % DFT
lYn = Yn;
lYn(medmin:medmax) = 0;    % clamp middle frequencies
lyn = real(ifft(lYn));     % inverse-DFT

%% threshold
T = 10000;
Yn = fft(yn);              % DFT
pYn = abs(Yn).^2;          % power spectrum of Yn
tYn = Yn.*(pYn>T);         % cut frequencies under threshold T
tyn = real(ifft(tYn));     % inverse-DFT

%% play all sounds
pause                      % (press 'enter' to continue)
sound(y*VOLUME,Fs)         % original
pause
sound(yn*VOLUME,Fs)        % noisy
pause
sound(lyn*VOLUME,Fs)       % low-pass filter
pause
sound(tyn*VOLUME,Fs)       % hard threshold filter

%% display all sound waves
t = (0:len-1)/Fs;          % time (step = 1/Fs)
pause
figure, plot(t,y), title('Handel'), xlabel('time (s)'), ylabel('intensity (%)')
pause
figure, plot(t,yn), title('noisy Handel'), xlabel('time (s)'), ylabel('intensity (%)'), axis([0 9 -1 1])
pause
figure, plot(t,lyn), title('low-pass Handel'), xlabel('time (s)'), ylabel('intensity (%)'), axis([0 9 -1 1])
pause
figure, plot(t,tyn), title('hard threshold Handel'), xlabel('time (s)'), ylabel('intensity (%)'), axis([0 9 -1 1])

%% frequency analysis
Y = fft(y);
pY = abs(Y).^2;
%rY = real(Y);
%iY = imag(Y);
Yn = fft(yn);
pYn = abs(Yn).^2;
%rYn = real(Yn);
%iYn = imag(Yn);
plYn = abs(lYn).^2;
%rlYn = real(lYn);
%ilYn = imag(lYn);
ptYn = abs(tYn).^2;
%rtYn = real(tYn);
%itYn = imag(tYn);
pause

%% display all power spectrums
figure, plot(pY), title('power spectrum Handel')
%figure, plot(rY), title('real Handel')
%figure, plot(iY), title('imaginary Handel')
pause
figure, plot(pYn), title('power spectrum noisy Handel')
%figure, plot(rYn), title('real noisy Handel')
%figure, plot(iYn), title('imaginary noisy Handel')
pause
figure, plot(plYn), title('power spectrum of low-pass filtered noisy Handel')
%figure, plot(rlYn), title('real of low-pass filtered noisy Handel')
%figure, plot(ilYn), title('imaginary of low-pass filtered noisy Handel')
pause
figure, plot(ptYn), title('power spectrum of hard thresholded noisy Handel')
%figure, plot(rtYn), title('real of hard thresholded noisy Handel')
%figure, plot(itYn), title('imaginary of hard thresholded noisy Handel')

%% write sound waves
wavwrite(y,Fs,'Hallelujah.wav')
wavwrite(yn,Fs,'noisyHallelujah.wav')
wavwrite(lyn,Fs,'LPnoisyHallelujah.wav')
wavwrite(tyn,Fs,'HTnoisyHallelujah.wav')

The DFT of the noisy signal is then constructed using the fft command. Then two procedures

are applied to this noisy signal in an effort to denoise it:

1. Low-pass filtering: This is a removal of all high-frequency coefficients. In this experiment,

the removal was rather brutal: 50% of the signal was removed. (This means that, of the DFT coefficients ranging from 0 to N − 1, with N = 73113, the coefficients in the range [N/4, 3N/4] were deleted.) The inverse DFT of the resulting set, with roughly 36556 nonzero points, was then taken using the ifft command. The resulting audio signal is stored in the file LPnoisyHallelujah.wav.

When this signal was played in class, it sounded quite “hollow”.

2. Thresholding: Here, all DFT coefficients of the noisy signal with magnitudes under a prescribed

threshold T – in this case T = 10000 – were set to zero. The inverse DFT of the resulting set

was then taken and the resulting audio signal stored as HTnoisyHallelujah.wav.

When this signal was played in class, it sounded perhaps a little less hollow, but one could

hear some continuous high frequency whining. This is due to the fact that a small number of

high frequency DFT coefficients survived the thresholding. These few coefficients produced the

constant “whine.”

The graphs of all four audio signals – original (noiseless), noisy, low-pass denoised, threshold

denoised – are shown in the figure below. And in the next figure are shown the DFTs of these signals. The corresponding audio files are also posted as .wav files.


Left: Original signal, N = 73113 data points. Right: Noisy signal. (Intensity (%) vs. time (s).)

Left: Result of low-pass filter applied to noisy signal. Right: Result of hard threshold applied to

noisy signal. (Intensity (%) vs. time (s).)


Left: DFT (power spectrum) of original signal. Right: DFT (power spectrum) of noisy signal.

Left: DFT of low-pass filtered noisy signal. The frequency components in the interval [N/4, 3N/4]

have been removed. Right: DFT of hard-thresholded noisy signal. Some isolated individual

high-frequency DFT components remain.
