Orthonormal wavelet basesmdav.ece.gatech.edu/ece-6250-fall2017/notes/07-notes...piecewise-constant...

Orthonormal wavelet bases

In the last set of lecture notes, we developed the Haar wavelet basisfor decomposing continuous-time signals x(t) ∈ L2(R):

x(t) =∞∑

n=−∞s0,nφ0,n(t) +

∞∑j=0

∞∑n=−∞

wj,nψj,n(t),

where the (orthonormal) basis functions are are scaled an shiftedversions of two template functions:

φ0(t) =

1, 0 ≤ t ≤ 1,

0, otherwise, ψ0(t) =

1, 0 ≤ t < 1/2,

−1, 1/2 ≤ t < 1,

0, otherwise.

φ0,n(t) = φ0(t− n), ψj,n(t) = 2j/2ψ0(2jt− n).

The two template functions were linear combinations of shifts of acontracted version of φ0(t):

φ0(t) = φ0(2t) + φ0(2t− 1), ψ0(t) = φ0(2t)− φ0(2t− 1).

This gave us the very nice interpretation of the wavelet coefficientswj,n capturing the differences between piecewise-constant approx-imations of x(t) at different dyadic scales,

x(t) = P V0[x(t)] + PW0[x(t)]︸︷︷︸

=P V1 [x(t)]

+PW1[x(t)]

︸︷︷︸=P V2 [x(t)]

+PW2[x(t)]

︸︷︷︸=P V3 [x(t)]

+ · · · .

It is natural to ask if we can do something similar for other typesof approximation spaces Vj, ones that contain things other than just

133

Georgia Tech ECE 6250 Fall 2017; Notes by J. Romberg and M. Davenport. Last updated 12:51, October 2, 2017

piecewise-constant functions. Indeed we can, and it leads to a veryrich family of orthonormal wavelet bases.

As in the Haar case, everything will follow from properties of a scal-ing function φ0(t). The first thing we must do is carefully write downsome properties of φ0(t) that lead to consistent multiscale approxi-mations.

Multiscale approximation: Scaling spaces

For a given φ0(t), the first approximation space V0 is set of signals wecan build up from different linear combinations1 of the integer shiftsof φ0(t):

V0 = Span(φ0(t− n)n∈Z).

The first thing we want is for φ0(t− n)n∈Z to be an orthobasis, sowe ask that

(P1) 〈φ0(t− k), φ0(t− n)〉 =

1, k = n,

0, k 6= n.

Now setφj,n(t) = 2j/2φ0(2

jt− n),

so the function φ0(2jt − n) is formed by contracting φ0(t) by a

factor of 2j, then shifting the result on a grid with spacing 2−j. Fora fixed scale j, define

Vj = Span(φj,n(t)n∈Z).1Technically, this is the set of signals we can approximate arbitrarily well

from different linear combinations — this is the closure of the span, whichwe will denote by Span.

134


It follows immediately from the definitions that

x(t) ∈ V0 ⇔ x(t− k) ∈ V0 for all k ∈ Z,

or more generally,

x(t) ∈ Vj ⇔ x(t− 2−jk) ∈ Vj for all k ∈ Z.

This means that if Vj contains a signal, than it also contains ev-ery shift of that signal by integer multiples of 2−j. It also followsimmediately that

x(t) ∈ V0 ⇔ x(2t) ∈ V1

⇔ x(4t) ∈ V2

...

⇔ x(2jt) ∈ Vj.

Following the Haar case, there are two more key properties we askof this sequence of approximation spaces; we would like these spacesto be nested,

(P2) Vj ⊂ Vj+1, so x(t) ∈ Vj ⇒ x(t) ∈ Vj+1,

and we also want these approximation spaces to cover all of L2(R)in their limit:

(P3) limj→∞Vj = L2(R), so lim

j→∞P Vj [x(t)] = x(t) for all x(t) ∈ L2(R).

Now the question is: What properties does φ0(t) have to have to en-sure (P1)–(P3) hold? While the answer is not straightforward, thisquestion was answered completely in the late 1980s/early 1990s. The

135


conditions on φ0(t) are most easily expressed in terms of the inter-scale relationships between the φj,nn∈Z and φj+1,nn∈Z. This rela-tionship also connects wavelets to digital filterbanks, a fact thatallows discrete wavelet transforms to be computed very efficiently.

Given a φ0(t), define the sequence of numbers g0[n]

g0[n] = 〈φ0(t),√

2φ0(2t− n)〉. (1)

It turns out that whether properties (P1)–(P3) hold depends en-tirely on properties of this sequence of numbers. Let G0(e

jω) bethe discrete-time Fourier transform of g0[n]. Then we have followingmajor result:

If g0[n] obeys the following three properties, then the approximationspaces Vjj≥0 obey properties (P1)–(P3):

(G1) |G0(ejω)|2 + |G0(e

j(ω+π))|2 = 2, for all − π ≤ ω ≤ π

(G2) G0(ej0) =

∑n

g0[n] =√

2,

(G3) |G0(ejω)| > 0 for all |ω| ≤ π

2.

The proof of this result is long and complicated.2 Note that with(P2) established, we know that φ0(t) ∈ V1. This gives us an addi-

2There are a few good references here. I will recommend Chapter 7 of AWavelet Tour of Signal Processing, by S. Mallat, and Daubechies’ bookTen Lectures on Wavelets.

136


tional interpretation of the g0[n]; they tell us how to build up φ0(t)out of shifts of the contracted version φ0(2t):

φ0(t) =∞∑

n=−∞g0[n]

√2φ0(2t− n). (2)

Given a particular φ0(t), we can of course generate the g0[n] using(1), and check to see if the properties above hold. But we can alsogo the other way. If we design a sequence g0[n] that obeys the threeproperties above, it specifies a unique scaling function φ0(t). To getφ0(t) from g0[n], we take the continuous-time Fourier transform ofboth sides of (2):

Φ0(jΩ) =∞∑

n=−∞g0[n]√

2

∫ ∞−∞

φ0(2t− n)e−jΩt dt

=∞∑

n=−∞g0[n]

1√2ejΩn/2Φ0(jΩ/2)

=1√2G(ejΩ/2)Φ0(jΩ/2)

We can again expand Φ0(jΩ/2) = 1√2G(ejΩ/4)Φ0(jΩ/4), etc. Condi-

tion (G3) above means that the limit exists, and we have

Φ0(jΩ) =

( ∞∏p=1

G(ej2−pΩ)√2

)Φ0(j0) =

∞∏p=1

G(ej2−pΩ)√2

,

since Φ0(j0) = 1 (this follows from integrating both sides of (2)and applying Condition (G2) above). Unfortunately, except in spe-cial cases it is hard to compute Φ0(jΩ) past the iterative expressionabove. This is why wavelets are usually specified in terms of theircorresponding sequences g0[n].

137


Multiscale approximation: Wavelet spaces

The complementary wavelet spaces and wavelet basis functions canalso be generated from the coefficient sequence g0[n]. This is detailedas our second major result:

Suppose φ0(t) with corresponding g0[n] obeys (G1)–(G3). Set

g1[n] = (−1)1−ng0[1− n],

and

ψ0(t) =∞∑

n=−∞g1[n]√

2φ0(2t− n).

Then, along with integer shifts of the scaling functionφ0,n(t) = φ0(t− n), the set of all dyadic shifts and contractions ofψ0(t),

ψj,n(t) = 2j/2ψ0(2jt− n), n ∈ Z, j = 0, 1, 2, . . . ,

form an orthobasis for L2(R). That is,

x(t) =∞∑

n=−∞〈x,φ0,n〉φ0,n(t) +

∞∑j=0

∞∑n=−∞〈x,ψj,n〉ψj,n(t)

for all x(t) ∈ L2(R).

As with the Haar case, the wavelet coefficients at scale j representthe difference between the approximation of a signal in Vj and theapproximation in Vj+1. That is, if we set

Wj = Span (ψj,n(t)n∈Z)

then

138


1. For fixed j, 〈ψj,n, ψj,`〉 = 0 for n 6= `. That is, the ψj,n(t)n∈Zare orthobasis for Wj.

2. Wj ⊥ Vj′ for all j ′ ≤ j. Notice that sinceWj ⊂ Vj+1, it followsthat the sequence of spaces V0,W0,W1, . . . are all mutually or-thogonal.

3. Vj+1 = Vj ⊕Wj. That is, every v(t) ∈ Vj+1 can be written as

v(t) = P Vj [v(t)] + PWj[v(t)].

As the previous property states, these two components are or-thogonal to one another.

In summary, this means we can break L2(R) into orthogonal parts,

L2(R) = V0 ⊕W0 ⊕W1 ⊕W2 ⊕ · · ·

and we have an orthobases for each of these.

139


Vanishing moments and support size

In addition to forming an orthobasis with a certain multiscale form,there are other desirable properties that wavelet systems often have.

Vanishing moments. We say that ψ0(t) has p vanishing momentsif ∫ ∞

−∞tqψ0(t) dt = 0, for q = 0, 1, . . . , p− 1.

This means that ψ0(t) is orthogonal to all polynomials of degreep− 1 or smaller. Since shifting a polynomial just gives you anotherpolynomial of the same order, ψ0(t − n) is also orthogonal to thesepolynomials. This means that polynomials that have degree at mostp − 1 are completely contained in the scaling space V0 — all of thewavelet coefficients of a polynomial are zero.

Compact support. The support of ψ0(t) is the size of the intervalon which it is non-zero. If ψ0(t) is supported on [0, L], then ψ0,n(t) =ψ0(t− n) is supported on [n, n + L], and

w0,n = 〈x,ψ0,n〉 =

∫ n+L

n

x(t)ψ0,n(t) dt.

This means that w0,n only depends on what x(t) is doing on [n, n+L]— the wavelet coefficients are recording local information about thebehavior of x(t).

These two properties make wavelets very good for representing signalswhich are smooth except at a few singularities.

140


Daubechies Wavelets

In the late 1980s, Ingrid Daubechies presented a systematic frame-work for designing wavelets with vanishing moments and compactsupport. For any integer p, there is a method for solving for theg0[n] that corresponds to a wavelet with p vanishing moments andhas support size 2p− 1.

Here are the filter coefficients for p = 2, . . . , 10. (p = 1 gives youHaar wavelets.):

From Mallat, A Wavelet Tour of Signal Processing

141


Here are pictures of some of the scaling functions (N = 2p in thecaptions below):

From Burrus et al, Introduction to Wavelets ...

142


Here are pictures of some of the wavelet functions (N = 2p in thecaptions below):

From Burrus et al, Introduction to Wavelets ...

143


Non-orthogonal bases

Although orthogonal bases have many useful properties, it is possi-ble (and sometimes desirable) to use a set of non-orthogonal basisfunctions for discretization.

The main property we want from a basis is that the mapping fromsignal space to coefficient space is a stable bijection — each signalshould have a different set of expansion coefficients, and each set ofexpansion coefficients should correspond to a different signal. Smallchanges in the expansion coefficients should not lead to large changesin the re-synthesized signal. We also want a concrete method forcalculating the coefficients in a basis expansion.

We start our discussion in the familiar setting of RN .

Bases in RN

Let ψ1, . . . ,ψN ∈ RN be a basis for RN . We know from basic linearalgebra that these vectors form a basis if and only if they are linearlyindependent. For any x ∈ RN , we have

x =N∑n=1

αnψn, (3)

for some coefficient sequence

α =

α1

α2...αN

.How do we compute the αn?

144


The answer to this turns out to be straightforward as soon as wehave everything written down the right way. We can write the de-composition in (3) as

x = α1ψ1 + · · · + αNψN =

| | |

ψ1 ψ2 · · · ψN

| | |

α1

α2...αN

= Ψα

That is, the basis vectors are concatenated as columns in the N ×Nmatrix Ψ. Since its columns are linearly independent, Ψ is invertible,and so we have the reproducing formula

x = ΨΨ−1x

=N∑n=1

〈x, ψn〉ψn,

where ψn is the nth row of the inverse of Ψ:

Ψ−1 =

− ψ

T

1 −− ψ

T

2 −...

− ψT

N −

We have

Transform (analysis): αn = 〈x, ψn〉, n = 1, . . . , N ;

Inverse (synthesis): x =N∑n=1

αnψn

145


So we compute the expansion coefficients by taking inner productsagainst basis signals, just not the same basis signals as we are usingto re-synthesize x. The ψ1, . . . , ψN themselves are linearly indepen-dent, and are called the dual basis for ψ1, . . . ,ψN .

Also note that while the ψn are not orthonormal and the ψnare not orthonormal, jointly they obey the relation

〈ψn, ψ`〉 =

1, n = `,

0, n 6= `.

(This follows simply from the fact that ΨΨ−1 = I.) For this reason,

ψn and ψn are called biorthogonal bases.

Bases for subspaces of RN

Suppose that T is a M -dimensional (M < N) subspace of RN , andψ1, . . . ,ψM ∈ T is a basis for this space. Now

Ψ =

| |

ψ1 · · · ψM

| |

is N×M — it is not square, and so it is not invertible. But since theψm are linearly independent, it does have a left inverse, and hencewe can derive a reproducing formula.

Let x ∈ T , so there exists an α ∈ RM such that x = Ψα. Thenthe reproducing formula is

x = Ψ(ΨTΨ)−1ΨTx.

146


(The M ×M matrix ΨTΨ is invertible by the linear independenceof ψ1, . . . ,ψM .) To see that the formula above holds, simply plugx = Ψα into the expression on the right hand side.

We can write

x =M∑m=1

〈x, ψm〉ψm,

where

ψm = mth row of the M ×N matrix (ΨTΨ)−1ΨT.

Notice that when M = N (and so Ψ is square and invertible), thisagrees with the result in the previous section, as in this case

(ΨTΨ)−1ΨT = Ψ−1(ΨT)−1ΨT = Ψ−1.

Bases in finite dimensional spaces

The construction of the dual basis in RN (which tells us how tocompute the expansion coefficients for a basis) relied on manipulatingmatrices that contained the basis vectors. If our signals of interestare not length N vectors, but still live in a finite dimensional Hilbertspace S , then we can proceed in a similar manner.

Let ψ1(t), . . . , ψN(t) be a basis for an N -dimensional inner productspace S . Let x(t) ∈ S be another arbitrary signal in this space. Weknow that the closest point to x in S is x itself, and from our workon the closest point problem, we know that we can write

x(t) =N∑n=1

αnψn(t),

147


whereα1

α2...αN

=

〈ψ1,ψ1〉〈ψ2,ψ1〉 · · · 〈ψN ,ψ1〉〈ψ1,ψ2〉〈ψ2,ψ2〉 · · · 〈ψN ,ψ2〉

... . . .〈ψ1,ψN〉〈ψ2,ψN〉 · · · 〈ψN ,ψN〉

−1 〈x,ψ1〉〈x,ψ2〉

...〈x,ψN〉

.Use H to denote the inverse Gram matrix above. Then

αn =N∑`=1

Hn,`〈x,ψ`〉 =

⟨x,

N∑`=1

Hn,`ψ`

⟩.

Thus

x(t) =N∑n=1

〈x, ψn〉ψn(t), where ψn(t) =N∑`=1

Hn,` ψ`(t).

Example: Let S be the space of all second-order polynomials on[0, 1]. Set

ψ1(t) = 1, ψ2(t) = t, ψ3(t) = t2.

Then

H =

1 1/2 1/31/2 1/3 1/41/3 1/4 1/5

−1

=

9 −36 30−36 192 −18030 −180 180

,and

ψ1(t) = 30t2 − 36t + 9,

ψ2(t) = −180t2 + 192t− 36,

ψ3(t) = 180t2 − 180t + 30.

148


Non-orthogonal basis in infinite dimensions: Riesz Bases

When S is infinite dimensional, we have to proceed with a littlemore caution. It is possible that we have a infinite set of vectorswhich are linearly independent and span S (after closure), but therepresentation is completely unstable.

Recall that a (possibly infinite) set of vectors is called linearly inde-pendent if no finite subset is linearly dependent. Trouble can comewhen larger and larger sets are coming closer and closer to beinglinearly dependent. That is, if ψn, 1 ≤ n ≤ ∞ is a set of vectors,there might be no α2, . . . , αL such that

ψ1 =L∑`=2

α`ψ`,

exactly, no matter how large L is. But there could be an infinitesequence α` such that

limL→∞‖ψ1 −

L∑`=2

α`ψ`‖ = 0.

We will see an example of this below.

Our definition of basis prevents sequences like the above from occur-ring.

Definition. We say3 ψn∞n=1 is a Riesz basis if there existsconstants A,B > 0 such that

A∞∑n=1

|αn|2 ≤∥∥∥∥∥∞∑n=1

αnψn

∥∥∥∥∥2

≤ B∞∑n=1

|αn|2

3This definition uses the natural numbers to index the set of basis functions,but of course it applies equally to any countably infinite sequence.

149


uniformly for all sequences αn with∑∞

n=1 |αn|2 <∞.

Note that if ψn is an orthonormal basis, then it is a Riesz basiswith A = B = 1 (Parseval’s theorem).

Our first example is something which is not a Riesz basis.

Example: Multiscale Tent Functions

Consider this set of continuous-time signals on [0, 1].

φ0(t) =√

2(1− t), φ1(t) =√

2t,

ψ0(t) =

√3t, 0 ≤ t ≤ 1/2,√3(1/2− t), 1/2 ≤ t ≤ 1.

ψj,n(t) = 2j/2ψ0(2jt− n), j ≥ 1, n = 0, . . . , 2j − 1.

Sketch:

From your sketch above, it should be clear that

Span(φ0,φ1,ψ0,ψj,n, 1 ≤ j ≤ J, n = 0, . . . , 2j − 1)

is the set of all continuous piecewise-linear functions on dyadic in-tervals of length 2−J . Since this set is dense in L2([0, 1]), we can

150


writex(t) = b0φ0(t) + b1φ1(t) +

∑j,n

cj,nψj,n(t)

for some sequence of numbers b0, b1, cj,n. The problem is that thissequence of numbers might not be well-behaved.

To see that this collection of functions cannot be a Riesz basis, noticethat using the functions with 1 ≤ j ≤ J , we can match the samplesof any function on the grid with spacing 2−J , with linear interpola-tion in between these samples. From this, we see that φ0(t) can bematched exactly on the interval [2−J−1, 1] with a linear combinationof ψj,n, 0 ≤ j ≤ J . This means that there is a sequence of numbersβj,n such that

φ0(t) =∑j≥0

2j−1∑n=0

βj,nψj,n(t).

This means that the non-zero sequence of numbers 1, 0, βj,n, j ≥0, n = 0, . . . , 2j − 1 synthesizes the 0 signal, thus violating thecondition that A > 0 uniformly.

Example: Non-harmonic sinusoids

Consider the set of signals on [0, 1]

ψk(t) = ej2πγkt, k ∈ Z

where the frequencies γk are a sequence of numbers obeying

γk < γk+1, γk → −∞ as k → −∞, γk → +∞ as k → +∞.

Of course, if γk = k, this is the classical Fourier Series basis, andthe ψk form an orthobasis. If the γk are no longer equally spaced

151


by an integer multiple, these signals are not orthogonal. However,if they are not too far from being uniformly spaced, they still forma Riesz basis. “Kadec’s 1/4-Theorem” is a result from harmonicanalysis that says: If there exists a δ < 1/4 such that

|γk − k| ≤ δ for all k,

then ψk is a Riesz basis with

A = (cos(πδ)− sin(πδ))2, B = (2− cos(πδ) + sin(πδ))2.

Example: Modulated Bumps

In a previous lecture, we saw that signals of the form

ψn,k(t) = g(t− n) cos((k + 1/2)πt), k ≥ 0, n ∈ Z,

formed an orthobasis if the windowing function g(t) was chosen care-fully (this was called the Lapped Orthogonal Transform). If we arenot so concerned with orthogonality, we can use many different kindsof windows. For example, the set

ψn,k(t) = e−(t−n−1/2)2/2 sin((k + 1/2)πt), k ≥ 1, n ∈ Z,

is a Riesz basis for L2(R). The basis functions are essentially a bellcurve centered on the half integers modulated by sinusoids of differentfrequencies. (Using sin in the expression above instead of cos makesthe symmetries work out the way they need to.) In this case, theRiesz constants A,B and the dual basis can be computed explicitly4.

4R. R. Coifman and Y. Meyer, “Gaussian Bases,” Applied and Computa-tional Harmonic Analysis, vol. 2, pp. 299–302, 1995.

152


Riesz representation

If ψn is a Riesz basis for a Hilbert space S , then there is a dual

basis ψn such that for all x ∈ S,

x =∞∑n=1

〈x, ψn〉ψn.

So just as in finite dimensions, we can think of the ψn as providinga transform: we calculate the transform coefficients using αn =〈x, ψn〉, and can invert the transform (re-synthesize the signal) usingx =

∑n αnψn.

Linear approximation

Similarly, if ψn is a Riesz basis for an infinite dimensional subspace

T of S , then there exists a dual basis with ψn ∈ T such that

x =∞∑n=1

〈x, ψn〉ψn,

is the best approximation to x in T . That is, x above is the solutionto

minv∈T‖x− v‖

Stable reconstruction

Suppose that we compute the basis expansion coefficients for a signalx:

αn = 〈x, ψn〉,

153


and then perturb them:

αn = αn + εn.

What effect does this have on the reconstructed signal? Set

x =∞∑

n=−∞αnψn,

then

x− x =∞∑

n=−∞εnψn,

and so if ψn is a Riesz basis with constants A,B,

A‖ε‖22 ≤ ‖x− x‖2 ≤ B‖ε‖2

2.

That is, the squared-error in signal space is no greater than B timesthe squared-error in coefficient space (and no less than A times thecoefficient squared-error.)

Computing the dual basis

We have not said anything yet about how to compute the dual basisin infinite dimensions. This is because it is much less straightforwardthan in the finite dimensional case — instead matrix equations, wehave to manipulate linear operators that act on sequences of numbersof infinite length.

But still, we can do this in certain cases, as we will see in the nextsection. Let’s draw some parallels to the finite dimensional case tosee what needs to be done. For a finite N -dimensional space, weform the N × N Gram matrix G by filling in the entries Gn,` =

154


〈ψ`,ψn〉, invert it to get another N × N matrix H , and then set

ψn =∑N

`=1Hn,`ψ`.

We can follow the same procedure in infinite dimensions, but nowthe Gram “matrix” has an infinite number of rows and columns.The Grammian is a linear operator G : `2(Z) → `2(Z); it mapsinfinite length sequences to infinite length sequences. Given an inputx ∈ `2(Z) to this operator, the output at the nth index is given by

[G (x)](n) =∞∑

`=−∞

〈ψ`,ψn〉x[`].

It turns out that the conditions for being a Riesz basis ensure thatG is invertible, that is, that there is another linear operator H :`2(Z)→ `2(Z) such that

H (G (x)) = x, for all x ∈ `2(Z).

In general, we need completely different methods to compute theinverse H = G −1 than we do in the finite dimensional case. Butin the end, the action of H will be specified by a two-dimensionalarray of numbers Hn,`, n, ` ∈ Z; for any v ∈ `2(Z),

[H (v)](n) =∞∑

`=−∞

Hn,` v[`].

155


Orthonormal wavelet basesmdav.ece.gatech.edu/ece-6250-fall2017/notes/07-notes...piecewise-constant...

Documents

Transcript of Orthonormal wavelet basesmdav.ece.gatech.edu/ece-6250-fall2017/notes/07-notes...piecewise-constant...