Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time...

18
1 1 Digital Speech Processing— Lecture 9 Short-Time Fourier Analysis Methods- Introduction 2 General Discrete-Time Model of Speech Production Voiced Speech: A V P(z)G(z)V(z)R(z) Unvoiced Speech: A N N(z)V(z)R(z) 3 Short-Time Fourier Analysis represent signal by sum of sinusoids or complex exponentials as it leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis methods), and insight into the signal itself • such Fourier representations provide convenient means to determine response to a sum of sinusoids for linear systems clear evidence of signal properties that are obscured in the original signal 4 Why STFT for Speech Signals steady state sounds, like vowels, are produced by periodic excitation of a linear system => speech spectrum is the product of the excitation spectrum and the vocal tract frequency response speech is a time-varying signal => need more sophisticated analysis to reflect time varying properties changes occur at syllabic rates (~10 times/sec) – over fixed time intervals of 10-30 msec, properties of most speech signals are relatively constant (when is this not the case) 5 Overview of Lecture • define time-varying Fourier transform (STFT) analysis method • define synthesis method from time-varying FT (filter-bank summation, overlap addition) show how time-varying FT can be viewed in terms of a bank of filters model computation methods based on using FFT application to vocoders, spectrum displays, format estimation, pitch period estimation 6 Frequency Domain Processing Coding: – transform, subband, homomorphic, channel vocoders Restoration/Enhancement/Modification: – noise and reverberation removal, helium restoration, time-scale modifications (speed-up and slow-down of speech)

Transcript of Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time...

Page 1: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

1

1

Digital Speech Processing—Lecture 9

Short-Time Fourier Analysis Methods-

Introduction2

General Discrete-Time Model of Speech Production

Voiced Speech:

• AVP(z)G(z)V(z)R(z)

Unvoiced Speech:

• ANN(z)V(z)R(z)

3

Short-Time Fourier Analysis• represent signal by sum of sinusoids or

complex exponentials as it leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis methods), and insight into the signal itself

• such Fourier representations provide– convenient means to determine response to a sum of

sinusoids for linear systems– clear evidence of signal properties that are obscured

in the original signal

4

Why STFT for Speech Signals• steady state sounds, like vowels, are produced

by periodic excitation of a linear system => speech spectrum is the product of the excitation spectrum and the vocal tract frequency response

• speech is a time-varying signal => need more sophisticated analysis to reflect time varying properties– changes occur at syllabic rates (~10 times/sec)– over fixed time intervals of 10-30 msec, properties of

most speech signals are relatively constant (when is this not the case)

5

Overview of Lecture• define time-varying Fourier transform (STFT)

analysis method• define synthesis method from time-varying FT

(filter-bank summation, overlap addition)• show how time-varying FT can be viewed in

terms of a bank of filters model• computation methods based on using FFT• application to vocoders, spectrum displays,

format estimation, pitch period estimation

6

Frequency Domain Processing

• Coding:– transform, subband, homomorphic, channel vocoders

• Restoration/Enhancement/Modification:– noise and reverberation removal, helium restoration,

time-scale modifications (speed-up and slow-down of speech)

Page 2: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

2

7

Frequency and the DTFT

0 00

0

2 sinusoids

( ) cos( ) ( ) /where is the (in radians) of the sinusoid the Discrete-Time Fourier Transform ( )

( ) ( ) (

frequency

ω ω

ω ω

ωω

∞−

=−∞

= = +

= =∑

j n j n

j j n

n

x n n e e

X e x n e x

DTFT

DTFT { }

{ }12

-1

)

( ) ( ) ( )

where is the of ( )frequency variable

πω ω ω

π

ω

ωπ

ω−

= =∫ j j n j

j

n

x n X e e d X e

X e

DTFT

8

DTFT and DFT of Speech

1(2 / )

0

The DTFT and the DFT for the infinite duration signal could be calculated (the DTFT) and approximated (the DFT) by the following:

( ) ( ) ( )

( ) ( ) ( ) ,

j j m

mL

j L km

m

X e x m e DTFT

X k x m w m e

ω ω

π

∞−

=−∞

−−

=

=

=

(2 / )

0,1,..., 1

( ) ( )

using a value of =25000 we get the following plot

j

k L

k L

X e DFT

L

ω

ω π=

= −

=

9

25000-Point DFT of Speech

Mag

nitu

deLo

g M

agni

tude

(dB

)

Short-Time Fourier Transform (STFT)

10

11

Short-Time Fourier Transform• speech is not a stationary signal, i.e., it

has properties that change with time• thus a single representation based on all

the samples of a speech utterance, for the most part, has no meaning

• instead, we define a time-dependent Fourier transform (TDFT or STFT) of speech that changes periodically as the speech properties change over time

12

Definition of STFTˆ ˆ

ˆ

ˆˆ

ˆ ˆ ˆ ( ) ( ) ( ) both and are variables

ˆ ˆ ( ) is a real window which determines the portion of ( )that is used in the computation of ( )

ω ω

ω

ω∞

=−∞

= −

• −

∑j j mn

m

jn

X e x m w n m e n

w n m x nX e

Page 3: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

3

13

Short Time Fourier Transform

ω ω

ω

ω

∞−

=−∞

= −

= − ⇒

∑ˆ ˆˆ

ˆ STFT is a function of two variables, the time index, , whichis discrete, and the frequency variable, ˆ, which is continuous

ˆ( ) ( ) ( )

ˆ ˆ( ( ) ( )) fixed, ˆ varia

j j mn

m

n

X e x m w n m e

x m w n m n DTFT ble

14

ˆˆ ˆ ( )ˆ

ˆˆ ˆ

ˆ ˆˆ

ˆ

alternative form of STFT (based on change of variables) is

ˆ ( ) ( ) ( )

ˆ( ) ( )

if we define

ˆ ( ) ( ) ( )

then (

ω ω

ω ω

ω ω

∞− −

=−∞

∞−

=−∞

=−∞

= −

= −

= −

∑%

j j n mn

m

j n j m

m

j j mn

mj

n

X e w m x n m e

e x n m w m e

X e x n m w m e

X e ˆ

ˆ ˆˆ ˆ ˆ ˆˆ

) can be expressed as (using )ˆ ( ) ( ) ( ) ( )

ω

ω ω ω ω− −

′ = −

= = + −⎡ ⎤⎣ ⎦%j j n j j n

nn

m m

X e e X e e x n m w mDTFT

Short-Time Fourier Transform

15

STFT-Different Time Origins

ˆ ˆˆ

the STFT can be viewed as having two different time origins 1. time origin tied to signal ( )

ˆ( ) ( ) ( )

ˆ ˆ ˆ( ) ( ) , fixed, variable

2. time origin ti

ω ω

ω

∞−

=−∞

= −

= −⎡ ⎤⎣ ⎦

∑j j mn

m

x n

X e x m w n m e

x m w n m n DTFT

ˆˆ ˆ ˆˆ

ˆˆ ˆ

ˆˆ

ed to window signal ( )

ˆ( ) ( ) ( )

( )ˆ ˆ ˆ( ) ( ) , fixed, variable

ω ω ω

ω ω

ω ω

∞− −

=−∞

= + −

=

= − +⎡ ⎤⎣ ⎦

∑%

j j n j mn

m

j n j

j n

w m

X e e x n m w m e

e X ee w m x n m nDTFT

16

Time Origin for STFT

ˆ [0]m n x= − ⇒

ˆ[ ] [ ]Time origin tied towindow w m x n m− +

17

Interpretations of STFT

1

ˆˆ

ˆˆ

ˆˆ

there are 2 distinct interpretations of ( )ˆ. assume is fixed, then ( ) is simply the normal Fourier

ˆ transform of the sequence ( ) ( ), for ˆfixed , ( ) has the

ω

ω

ω

− − ∞ < < ∞ =>

jn

jn

jn

X e

n X ew n m x m m

n X e

ˆˆ

ˆˆ ˆˆ

same properties as a normal Fouriertransform

ˆ ˆ2. consider ( ) as a function of the time index with fixed.ˆ Then ( ) is in the form of a convolution of the signal ( )

with

ω

ω ω

ω−

jn

j j nn

X e n

X e x n e

ˆˆ

ˆthe window ( ). This leads to an interpretation in the form of ˆ ˆ linear filtering of the frequency modulated signal ( ) by ( ).

- we will now consider each of these interpretations of th

ω− j n

w nx n e w n

e STFT in a lot more detail

18

Fourier Transform Interpretationˆ

ˆ consider ( ) as the normal Fourier transform of the sequenceˆ ˆ( ) ( ), for fixed .

ˆ the window ( ) slides along the sequence ( ) and defines aˆ new STFT for every value of

w

ω•

− −∞ < < ∞• −

jnX e

w n m x m m nw n m x m

n

hat are the conditions for the existence of the STFTˆ the sequence ( ) ( ) must be absolutely summable for all

ˆ values of ˆ - since | ( ) | (32767 for 16-bit sampling)

-

• −

w n m x mn

x n L1ˆ since | ( ) | (normalized window levels)

- since window duration is usually finiteˆ ˆ ( ) ( ) is absolutely summable for all

• −

w n

w n m x m n

Page 4: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

4

19

Frequencies for STFT

2ˆ ˆ( )ˆ ˆ

the STFT is periodic in with period 2 , i.e.,( ) ( ),

can use any of several frequency variables to express STFT, including

ˆˆ -- (where is the sampling period for

j j kn nX e X e k

T T

ω ω π

ω π

ω

+

= ∀

= Ω

2

2 2

1

1

ˆˆ

ˆˆ

( )) to represent analog radian frequency,

giving ( )ˆ ˆˆ ˆ -- or to represent normalized

ˆ frequency (0 ) or analog frequency ˆ (0 ), giving (

j Tn

j fs n

x m

X e

f FT

f

F F / T X e π

ω π ω π

Ω

= =

≤ ≤

≤ ≤ = 2 ˆˆ) or ( )j FTnX e π

20

Signal Recovery from STFTˆ

ˆ

ˆˆ

ˆ since for a given value of , ( ) has the same properties as a normal Fourier transform, we can recover the input sequence exactly

since ( ) is the normal Fourier transform of the windo

ω

ω

jn

jn

n X e

X e

12

0 0

ˆ ˆˆ

wed ˆ sequence ( ) ( ), then

ˆ ˆ ( ) ( ) ( )

assuming the window satisfies the property that ( ) ( a trivial requirement), then by evaluating the inverse Fourier t

πω ω

π

ωπ −

− =

• ≠

∫ j j mn

w n m x m

w n m x m X e e d

w

12 0

ˆˆˆ

ransformˆ when , we obtain

ˆ ˆ ( ) ( )( )

πω ω

π

ωπ −

=

= ∫ j j nn

m n

x n X e e dw

21

Signal Recovery from STFT

12 0

0 0

ˆˆˆ

ˆ ˆˆ ˆ

ˆ ˆ ( ) ( )( )

ˆ with the requirement that ( ) , the sequence ( ) can be recovered exactly from ( ), if ( ) is known

ˆfor all values of over one complete pe

πω ω

π

ω ω

ωπ

ω

=

• ≠

∫ j j nn

j jn n

x n X e e dw

w x nX e X e

ˆˆ

riod- sample-by-sample recovery process

ˆ ˆ - ( ) must be known for every value of and for all ˆcan also recover sequence ( ) ( ) but can't guarantee

that ( ) can be recovered since (

ω ω−

jnX e n

w n m x mx m w ˆ ) can equal 0−n m

22

Properties of STFT

[ ]

ˆˆ

ˆ ˆ ˆ ˆ2ˆ ˆ ˆ ˆ ˆ

ˆ ˆ

ˆ ˆ ˆ( ) [ ( ) ( )] fixed, variable relation to short-time power density function

ˆ ( ) | ( ) | ( ) ( ) ( ) fixed

ˆ ˆ( ) ( ) ( ) ( ) ( )

ω

ω ω ω ω

ω

= −

= = ⋅ =

= − − − + ⇔

jn

j j j jn n n n n

n n

X e w n m x m n

S e X e X e X e R k n

R k w n m x m w n m k x m k S

DTFT

DTFT

[ ]

ˆ

ˆ

ˆ ˆ

ˆˆ ˆ( )ˆ

ˆ

( )

Relation to regular ( ) (assuming it exists)

( ) ( ) ( )

1( ) ( ) ( )2

ˆ( ) ( ) ( ) ( )

ω

ω

ω ω

πω θ ω θ θ

π

θ θ θ

θπ

=−∞

∞−

=−∞

− − −

− −

= =

=

⎡ ⎤− ↔ ∗⎣ ⎦

j

mj

j j m

m

j j j j nn

j j n j

e

X e

X e x m x m e

X e W e X e e d

w n m x m W e e X e

DTFT

23

Properties of STFT

[ ]

12

1 2

1 22

ˆ

ˆ ˆ

ˆˆ ˆ( )ˆ

ˆ

ˆˆ ˆ ˆ( )ˆ

assume ( ) exists

( ) ( ) ( )

( ) ( ) ( )

limiting caseˆ ˆ ˆ ( ) ( ) ( )

( ) ( ) ( ) ( )

i.e.

ω

ω ω

πω θ ω θ θ

π

ω

πω ω θ θ ω

π

θπ

πδ ω

πδ θ θπ

∞−

=−∞

− − −

− −

= =

=

= −∞ < < ∞ ⇔ =

= − =

j

j j m

m

j j j j nn

j

j j j n jn

X e

X e x m x m e

X e W e X e e d

w n n W e

X e X e e d X e

DTFT

, we get the same thing no matter where the window is shifted

24

Alternative Forms of STFT

1

ˆˆ

ˆ ˆ ˆˆ ˆ ˆ

ˆ ˆ

ˆˆ ˆ

ˆˆ ˆ

Alternative forms of ( ). real and imaginary parts

( ) Re ( ) Im ( )

ˆ ˆ( ) ( )

ˆ( ) Re ( )

ˆ( ) Im ( )

ˆ when ( ) and ( ) are bot

ω

ω ω ω

ω

ω

ω ω

ω

ω

⎡ ⎤ ⎡ ⎤= +⎣ ⎦ ⎣ ⎦= −

⎡ ⎤= ⎣ ⎦⎡ ⎤= − ⎣ ⎦

• −

jn

j j jn n n

n n

jn n

jn n

X e

X e X e j X e

a j b

a X e

b X e

x m w n m

ˆ

ˆ ˆ

ˆ( )ˆ ˆˆ ˆ

ˆˆ ˆ

h real (usually the case)ˆ ˆ ˆ can show that ( ) is symmetric in , and ( ) isˆ anti-symmetric in

2. magnitude and phase ( ) | ( ) |

ˆ can relate | ( ) | and ( )

θ ωω ω

ω

ω ω ωω

θ ω

=

n

n n

jj jn n

jn n

a b

X e X e e

X e ˆ ˆˆ ˆ to ( ) and ( ) ω ωn na b

Page 5: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

5

25

Role of Window in STFT

ˆˆ

ˆˆ

ˆ The window ( ) does the following: 1. chooses portion of ( ) to be analyzed 2. window shape determines the nature of ( )

ˆ Since ( ) (for fixed ) is the normal FT o

ω

ω

jn

jn

w n mx m

X e

X e n

ˆ ˆ

ˆ ˆ

ˆf ( ) ( ), then if we consider the normal FT's of both ( ) and ( ) individually, we get

( ) ( )

( ) ( )

ω ω

ω ω

∞−

=−∞

∞−

=−∞

=

=

j j m

m

j j m

m

w n m x mx n w n

X e x m e

W e w m e

26

Role of Window in STFT

ˆˆ ˆ

ˆ then for fixed , the normal Fourier transform of the ˆproduct ( ) ( ) is the convolution of the transforms

ˆof ( ) and ( )ˆ ˆ for fixed , the FT of ( ) is ( ) --thus

ω ω− −

•−

• − j j n

nw n m x m

w n m x mn w n m W e e

12

12

ˆˆ ˆ( )ˆ

ˆˆ ˆ( )ˆ

( ) ( ) ( )

and replacing by gives

( ) ( ) ( )

πω θ θ ω θ

π

πω θ θ ω θ

π

θπ

θ θ

θπ

− − −

+

=

• −

=

j j j n jn

j j j n jn

X e W e e X e d

X e W e e X e d

27

Interpretation of Role of Windowˆ ˆ

ˆ

ˆˆ ˆ

ˆ

( ) is the convolution of ( ) with the FT of the shifted

window sequence ( )ˆ ( ) really doesn't have meaning since ( ) varies with time;

ˆ consider ( ) defined for wi

ω ω

ω ω

ω

− −

j jn

j j n

j

X e X e

W e eX e x n

x nˆ

ndow duration and extended for all time to have the same properties then ( ) does exist with properties

ˆ that reflect the sound within the window (can also consider ( ) 0 outside the window

ω⇒

=

jX ex n

ˆ

ˆˆ

and define ( ) appropriately--but this is another case)Bottom Line: ( ) is a smoothed version of the FT of the part

ˆ of ( ) that is within the window .

ω

ω

j

jn

X e

X ex n w

28

Windows in STFTˆ

ˆ

ˆ

ˆ for ( ) to represent the short-time spectral properties of ( )

inside the window ( ) should be much narrower in frequency than significant spectral regions of ( )--i.e., almost a

ω

θ

ω

=>

jn

j

j

X e x n

W eX e n impulse

in frequency consider rectangular and Hamming windows, where width of the

main spectral lobe is inversely proportional to window length, and side lobe levels are essentially independent

of window length

Rectangular Window: flat window of length L samples; first zero in frequency response occurs at FS/L, with sidelobe levels of -14 dB or lower

Hamming Window: raised cosine window of length Lsamples; first zero in frequency response occurs at 2FS/L, with sidelobe levels of -40 dB or lower

Windows

29L=2M+1-point Hamming window and its corresponding DTFT

30

Frequency Responses of Windows

Page 6: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

6

Effect of Window Length-HW

31

Effect of Window Length-HW

32

Effect of Window Length-RW

33

Effect of Window Length-HW

34

Relation to Short-Time Autocorrelation

35

ˆˆ

ˆ ˆ ˆ ˆ2 *ˆ ˆ ˆ ˆ

ˆ

ˆ( ) [ ] [ ]ˆ,

( ) | ( ) | ( ) ( )

ˆ( ) [ ]

is the discrete-time Fourier transform of for each value of then it is seen that is the Fourier transform of

ω

ω ω ω ω

= =

= −

jn

j j j jn n n n

n

X e w n m x mn

S e X e X e X e

R l w n m x ˆ[ ] [ ] [ ]

which is the short-time autocorrelation function of the previouschapter. Thus the above equations relate the short-time spectrum to the short-time autocorrelation,

=−∞

− − +∑m

m w n l m x m l

Short-Time Autocorrelation and STFT

36

Page 7: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

7

37

Summary of FT view of STFTˆ

ˆ

interpret ( ) as the normal Fourier transform of the sequenceˆ( ) ( ),

properties of this Fourier transform depend on the window frequency resolution of ( ) varies inver

ω

ω

− − ∞ < < ∞•

o

jn

jn

X ew n m x m m

X e sely with the length of the window want long windows for high resolution want ( ) to be relatively stationary (non-time-varying) during duration of window for most s

=>o x n

table spectrum want short windows

as usual in speech processing, there needs to be a compromise between good temporal resolution (short windows) and good frequency resolution (l

=>

ong windows)

Linear Filtering Interpretation of STFT

38

39

Linear Filtering Interpretation

( )

1

12

2

ˆ ˆ

ˆ

ˆ( )

ˆ. modulation-lowpass filter form ( rather than )

( ) ( ) ( )

ˆ( ) ( ) , variable, fixed

( ) ( )

. bandpass filter-demodulation

ω ω

ω

πθ θ ω θ

π

ω

θπ

∞−

=−∞

+

= −

= ∗

=

j j mn

m

j n

j j j n

n

n n

X e x m e w n m

w n x n e n

W e X e e d

X ˆ ˆ ( )

ˆ ˆ

ˆ ˆ

( ) ( ) ( )

( ( ) ) ( )

ˆ[( ( ) ) ( )], variable, fixed

ω ω

ω ω

ω ω ω

∞− −

=−∞

∞−

=−∞

= −

= −

= ∗

j j n m

m

j n j m

mj n j n

e w m x n m e

e w m e x n m

e w n e x n n40

Linear Filtering Interpretation

( )( )( )

ˆ ˆ

ˆ

1. modulation-lowpass filter form:

( ) ( ) ( ),

ˆ variable, fixed

( ) ( )

ˆ( )cos( ) ( )

ˆ( )sin( ) ( )ˆ ˆ( ) ( )

j j mn

m

j n

n n

X e x m e w n m

n

x n e w n

x n n w n

j x n n w na jb

ω ω

ω

ω

ω

ω

ω ω

∞−

=−∞

= −

= ∗

= ∗ −

= −

41

Linear Filtering Interpretation

( )ˆ ˆ ˆ

2. bandpass filter-demodulation form

ˆ ( ) ( ) ( ) , variable, fixedω ω ω ω− ⎡ ⎤= ∗⎣ ⎦j j n j n

nX e e w n e x n n

ˆ

complex bandpass filter output modulated by signal if ( ) is lowpass, then filter

ˆ is bandpass around all real computation for lower

half structure

j n

j

eW e

ω

θ

θ ω

•=

42

Linear Filtering InterpretationLowpass filter frequency response

Bandpass filter frequency response

Page 8: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

8

43

Linear Filtering Interpretation

ˆ ˆ( )

assume normal FT of ( ) existsˆ ( ) ( ) (recall that is a particular frequency)

( ) ( )ˆ spectrum of ( ) at frequency is shifted to zero frequency;

s

θ

ω θ ω

ω

ω

− +

↔=>•

j

j n j

x nx n X ex n e X e

x n

1

ˆ( )

ince the STFT is a convolution, the FT of the STFT is the product of the individual FT's, i.e., ( ) ( ) if ( ) resembles a narrow band lowpass filter, i.e., ( )

for sm

θ ω θ

θ θ

+ ⋅

• =

j j

j j

X e W eW e W e

ˆ ˆ( )

all and is 0 otherwise, then ( ) ( ) ( )θ ω θ ω

θ+ ⋅ ≈j j jX e W e X e

Summary-STFT

44

ˆ ˆˆ

Short-Time Fourier Transform (STFT)

ˆ( ) [ ] [ ] ,

ˆ ˆ, 0 2ˆ ˆFixed value of , varying -- DFT Interpretation

ˆˆFixed value of , varying -- Filter Bank Interpretation

j j mn

mX e x m w n m e

nn

n

ω ω

ω πω

ω

∞−

=−∞

= −

− ∞ < < ∞ ≤ <

Summary

45

ˆ ˆˆ

Short-Time Fourier Transform (STFT)

ˆ ˆ ˆ( ) [ ] [ ] , , 0 2j j mn

mX e x m w n m e nω ω ω π

∞−

=−∞

= − −∞ < < ∞ ≤ <∑

0 R 2R 3R n̂

ω̂

0

( )

( )

ˆˆ ˆ

ˆˆ 1

ˆˆ

ˆDFT: ( ) [ ] [ ]

ˆ( ) DFT [ ] [ ]ˆˆ0 2 , 0, ,2 ,...

nj j m

nm n L

jn

X e x m w n m e

X e x m w n m

n R R

ω ω

ω

ω π

= − +

= −

= −

≤ < =

Summary – Modulation/Lowpass Filter

46

ˆ ˆˆ

Short-Time Fourier Transform (STFT)

ˆ ˆ ˆ( ) [ ] [ ] , , 0 2j j mn

mX e x m w n m e nω ω ω π

∞−

=−∞

= − −∞ < < ∞ ≤ <∑

0ω̂1ω̂2ω̂

1ˆLω −

n

( )( )( )

ˆ ˆ

1

ˆ ˆ

ˆ

Filter Bank: ( ) [ ] [ ]

( ) [ ] [ ]

[ ] [ ]

nj j m

nm n L

j j nn

j n

X e x m e w n m

X e x n e w n m

x n e w n

ω ω

ω ω

ω

= − +

= −

= −

= ∗

Summary – Bandpass Filter/Demodulation

47

ˆ ˆˆ

Short-Time Fourier Transform (STFT)

ˆ ˆ ˆ( ) [ ] [ ] , , 0 2j j mn

mX e x m w n m e nω ω ω π

∞−

=−∞

= − −∞ < < ∞ ≤ <∑

0ω̂1ω̂2ω̂

1ˆLω −

n

( )ˆ ˆ ( )

ˆ ˆ ˆ

Filter Bank: ( ) [ ] [ ]

( ) ( [ ] ) [ ]

j j n mn

m

j j n j nn

X e x n m e w m

X e e w n e x n

ω ω

ω ω ω

∞− −

=−∞

= −

⎡ ⎤= ∗⎣ ⎦

Summary – Modulation

48

ˆ ˆ

ˆ( )

Modulation[ ] ( ) ( )

ˆ( ) ( )( )

j n j j n

j

j

x n e X e FT eX eX e

ω ω ω

ω

ω ω

δ ω ω−

↔ ∗

= ∗ −

=

-W 0 W

( )jX e ω ˆ( )( )jX e ω ω−

ˆ Wω − ˆ Wω +ω̂

Page 9: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

9

49

STFT Magnitude Only

1 22 2 /ˆ

for many applications you only need the magnitude of the STFT(not the phase) in such cases, the bandpass filter implementation is

less complex, since

ˆ ˆ | ( ) | ( ) ( )

|

ω ω ω

⎡ ⎤= +⎣ ⎦

=

jn n nX e a b

1 22 2 /ˆ ˆ ˆ( ) | ( ) ( )ω ω ω⎡ ⎤= +⎣ ⎦% %%j

n n nX e a b

Sampling Rates of STFT

50

51

Sampling Rates of STFT• need to sample STFT in both time and frequency to

produce an unaliased representation from which x(n) can be exactly recovered

• sampling rates lower than the theoretical minimum rate can be used, in either time or frequency, and x(n) can still be exactly recovered from the aliased (under-sampled) short-time transform– this is useful for spectral estimation, pitch estimation, formant

estimation, speech spectrograms, vocoders– for applications where the signal is modified, e.g., speech

enhancement, cannot undersample STFT and still recover modified signal exactly

52

Sampling Rate in Time

ˆ

ˆ

to determine the sampling rate in time, we take a linear filtering view 1. ( ) is the output of a filter with impulse response ( )

2. ( ) is a lowpass response with effective ba

jn

j

X e w n

W e

ω

ω

•%

2

0 54 0 46

ˆ ˆ

ndwidth of Hertz thus the effective bandwidth of ( ) is Hertz ( ) has

to be sampled at a rate of samples/second to avoid aliasingExample: Hamming Window

( ) . . co

j jn n

BX e B X e

B

w n

ω ω• =>

= − 2 1 0 10

2 10 000

s( / ( ))otherwise

(Hz); for 400, , Hz 50 Hz need

rate of 100/sec (every 100 samples) for sampling rate in time

ss

n L n L

FB L F BL

π − ≤ ≤ −=

⇒ ≈ = = => = =>

53

Sampling Rate in Frequency2

22 0 1 1

ˆ ˆ since ( ) is periodic in with period , it is only necessary to sample over an interval of length

ˆ need to determine an appropriate finite set of frequencies, / , , ,...,

jn

k

X e

k N k N

ω ω ππ

ω π

• = = −ˆ

ˆ

ˆ

at which ( ) must be specified to exactly recover ( )

use the Fourier transform interpretation of ( )

1. if the window ( ) is time-limited, then the inverse transform of ( ) is

jn

jn

jn

X e x n

X e

w n X e

ω

ω

ω

ˆ

time-limited

2. the sampling theorem requres that we sample ( ) in the frequency dimension at a rate of at least twice its ('symmetric') "time width" 3. since the inverse Fourie

jnX e ω

ˆ

ˆ

r transform of ( ) is the signal ( ) ( ) and this signal is of duration samples (the duration of ( )), then according to the sampling theorem

( ) must be sampled (in frequency

jn

jn

X e x m w n mL w n

X e

ω

ω

2 0 1 1 2

ˆ

) at the set of frequencies

ˆ , , ,..., (where / is the effective width of the window)

in order to exactly recover ( ) from ( ) k

k

jn

k k L LL

x n X e ω

πω = = −

• thus for a Hamming window of duration L=400 samples, we require that the STFT be evaluated at at least 400 uniformly spaced frequencies around the unit circle 54

“Total” Sampling Rate of STFT• the “total” sampling rate for the STFT is the product of the sampling

rates in time and frequency, i.e.,SR = SR(time) x SR(frequency)

= 2B x L samples/secB = frequency bandwidth of window (Hz)L = time width of window (samples)

• for most windows of interest, B is a multiple of FS/L, i.e.,B = C FS/L (Hz), C=1 for Rectangular Window

C=2 for Hamming WindowSR = 2C FS samples/second

• can define an ‘oversampling rate’ ofSR/ FS = 2C = oversampling rate of STFT as compared to

conventional sampling representation of x(n)for RW, 2C=2; for HW 2C=4 => range of oversampling is 2-4this oversampling gives a very flexible representation of the speech signal

Page 10: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

10

55

Mathematical Basis for Sampling the STFT

2 2

2 2

2

2 0 1 1

ˆ assume sample in time at ,

ˆ ˆ and in frequency at , , ,...,

sample values

( ) [ ] [ ]

( )

( ) [ ]

r

k

j k j kmN N

r Rm

j krR j kN N

r R

j kN

r R

n n rR r

k k NN

X e w rR m x m e

e X e

X e x rR m

π π

π π

π

πω ω

∞ −

=−∞

• = = − ∞ < < ∞

⎛ ⎞= = = −⎜ ⎟⎝ ⎠

= −

=

= +

%

% ( )2

2 2

( ) set ;

define DFT-type notation

( ) ( ) ( )

j kmN

m

j k j krRN N

r r R r

w m e m rR m m m

X k X e e X k

π

π π

∞ −

=−∞

′ ′− = + =

= =

%

56

Sampling the STFT

57

Sampling the STFT

2 2

21

0

0 0 1

0 1

DFT Notation

[ ] ( ) [ ]

let [ ] for (finite duration window with no zero-valued samples)

[ ] [ ] [ ]

( fixed, ) if then (D

j k j krRN N

r r R r

L j kmN

rm

X k X e e X k

w m m L

X k x r R m w m e

r k NL N

π π

π

− −

=

= =

• − ≠ ≤ ≤ −

= + −

≤ ≤ −• ≤

%

%

21

0

1

0 1

FT defined with no aliasing can recover sequence exactly using inverse DFT)

[ ] [ ] [ ]

( fixed, ) if (IDFT defined with no aliasing), then all samples

ca

N j kmN

rk

x r R m w m X k eN

r m NR L

π−

=

+ − =

≤ ≤ −• ≤

∑ %

( )n be recovered from [ ] gaps in sequencerX k R L> ⇒

58

What We Have Learned So Far1 ˆ ˆ

ˆ

ˆˆ

ˆ. ( ) ( ) ( )

ˆ ˆ function of for sampled (looks like a time sequence)ˆˆ function of for sampled (looks like a transform)

( ) (no sampling rate reduction) define

j j mn

m

jn

X e x m w n m e

n nn

X e

ω ω

ω

ωω ω

∞−

=−∞

= −

=

=

1 2 3 0

2 ˆˆ

ˆˆ ˆˆ

ˆ ˆd for , , ,...;ˆ ˆ ˆ. ( ) DTFT ( ) ( ) fixed, variable

ˆ with time origin tied to ( )ˆ ˆ ˆ( ) DTFT ( ) ( ) fixed, variable

with time origin tied to (

jn

j j nn

n

X e x m w n m n

x nX e e x n m w m n

w

ω

ω ω

ω π

ω

ω−

= ≤ ≤

= − ⇒⎡ ⎤⎣ ⎦

= + − ⇒⎡ ⎤⎣ ⎦

3

1

ˆˆ

ˆˆ

ˆ ˆˆ

). Interpretations of ( )

ˆ ˆˆ. fixed, variable; ( ) DTFT ( ) ( ) DFT View

ˆ ˆ 2. variable, fixed; ( ) ( ) ( ) Linear Filtering view

jn

jn

j j nn

mX e

n X e x m w n m

n n X e x n e w n

ω

ω

ω ω

ω ω

ω −

= = − ⇒⎡ ⎤⎣ ⎦= = ∗ ⇒

filter bank implementation⇒

59

What We Have Learned So Far4

12

12 0

5

ˆ ˆˆ

ˆˆ ˆˆ

ˆ

. Signal Recovery from STFT

ˆ ˆ ( ) ( ) ( )

ˆ ˆ( ) ( )( )

. Linear Filtering Interpretation

1. modulation-lowpass filter ( ) ( ) ( )

j j mn

j j nn

j jn

x m w n m X e e d

x n X e e dw

X e w n x n e

πω ω

π

πω ω

π

ω

ωπ

ωπ

− =

=

⇒ = ∗

( )

12

ˆ

ˆ ˆ( )

ˆ ˆ ˆ

,

ˆ ˆ variable, fixed ( ) ( ) ( )

2. bandpass filer-demodulation ( ) ( ) ( ) ,

ˆ ˆ variable, fixed

n

j j j j nn

j j n j nn

n n X e W e X e e d

X e e w n e x n

n n

ω

πω θ θ ω θ

π

ω ω ω

ω θπ

ω

+

⎡ ⎤⎣ ⎦

= =

⎡ ⎤⇒ = ∗⎣ ⎦=

60

What We Have Learned So Far6

22

. Sampling Rates in Time and Frequency 1. time: ( ) has bandwidth of Hertz samples/sec rate

Hamming Window: (Hz)

2. frequency: ( ) is time limited to samples

j

S

W e B BFBL

w n L

ω ⇒

=

% inverse of ( ) is also time limited need to sample in frequency at twice the (effective) time width of the time-limited sequence frequency samples 3.

jnX e

L

ω⇒

⇒⇒

2total Sampling Rate: samples/sec - frequency bandwidth of the window (Hz) - effective time width of the window (samples) / (Hz) SampliS

B LBL

B C F L

⋅=== ⋅ ⇒ 2 2

12

ng Rate samples/second - for Rectangular Window, - for Hamming Window,

SB L CFC

C

= ⋅ =

==

Page 11: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

11

Spectrographic Displays

61 62

Spectrographic Displays• Sound Spectrograph-one of the earliest embodiments of the time-

dependent spectrum analysis techniques– 2-second utterance repeatedly modulates a variable frequency

oscillator, then bandpass filtered, and the average energy at a given time and frequency is measured and used as a crude measure of the STFT

– thus energy is recorded by an ingenious electro-mechanical system on special electrostatic paper called teledeltos paper

– result is a two-dimensional representation of the time-dependent spectrum-with vertical intensity being spectrum level at a given frequency, and horizontal intensity being spectral level at a given time-with spectrum magnitude being represented by the darkness of the marking

– wide bandpass filters (300 Hz bandwidth) provide good temporal resolution and poor frequency resolution (resolve pitch pulses in time but not in frequency)—called wideband spectrogram

– narrow bandpass filters (45 Hz bandwidth) provide good frequency resolution and poor time resolution (resolve pitch pulses in frequency, but not in time)—called narrowband spectrogram

Conventional Spectrogram (Every salt breeze comes from the sea)

63 64

Digital Speech Spectrograms• wideband spectrogram

• follows broad spectral peaks (formants) over time

• resolves most individual pitch periods as vertical striations since the IR of the analyzing filter is comparable in duration to a pitch period

• what happens for low pitch males—high pitch females

• for unvoiced speech there are no vertical pitch striations

• narrowband spectrogram• individual harmonics are resolved in voiced regions

• formant frequencies are still in evidence

• usually can see fundamental frequency

• unvoiced regions show no strong structure

Digital Speech Spectrograms• Speech Parameters (“This is a test”):

– sampling rate: 16 kHz– speech duration: 1.406 seconds– speaker: male

• Wideband Spectrogram Parameters:– analysis window: Hamming window– analysis window duration: 6 msec (96 samples)– analysis window shift: 0.625 msec (10 samples)– FFT size: 512– dynamic range of spectral log magnitudes: 40 dB

• Narrowband Spectrogram Parameters:– analysis window: Hamming window– analysis window duration: 60 msec (960 samples)– analysis window shift: 6 msec (96 samples)– FFT size: 1024 – dynamic range of spectral log magnitudes: 40 dB

65

Digital Speech Spectrograms

66

Top Panel: 3 msec (48 samples) window

Second Panel: 6 msec (96 samples) window

Third Panel: 9 msec (144 sample) window

Fourth Panel: 30 msec (480 sample) window

Page 12: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

12

67

Spectrogram Comparisons Spectrogram - Male

68

1024 80 75= = =nfft , ,L Overlap

Spectrogram - Female

69

1024 80 75= = =nfft , ,L Overlap

Overlap Addition (OLA) Method

70

71

Overlap Addition (OLA) Method

/ˆ ˆ

ˆ

based on normal FT interpretation of short-time spectrumˆ( ) ( ) ( ) ( )

can reconstruct by computing IDFT of ( ) and dividing out the window (assumed non-zero

k

k

DFT IDFTjn n

jn

X e y m x m w n m

x(m) X e

ω

ω

←⎯⎯⎯⎯→ = −

ˆ

for all samples) this process gives signal values of for each window

window can be moved by samples and the process repeated

since ( ) is "undersampled" in time, it is highly susckjn

L x(m)L

X e ω

• =>

• eptible to aliasing errors need more robust synthesis procedure=>

72

Overlap Addition (OLA) Method( ) ( )

summation is for overlapping analysis sections for each value of where ( ) is measured, do an inverse FT to give

( ) ( ) ( ) (where is the size of the FT)(

k k

k

j j nm

m k

jm

m

y n X e e

m X ey n Lx n w m n Ly

ω ω

ω

⎡ ⎤= ⎢ ⎥

⎣ ⎦•

= −

∑ ∑

10

00

) ( ) ( ) ( )

a basic property of the window is

( ) ( ) | ( )

since any set of samples of the window are equivalent (by sampling arguments), then if ( ) is sampled ofte

k

k

mm m

Njj

n

n y n Lx n w m n

W e W e w n

w n

ωω

==

= = −

= =

∑ ∑

0

0

n enough we get (independent of ) ( ) ( )

( ) ( ) ( ) using overlap-added sections

j

mj

nw m n W e

y n Lx n W e

− =

=

Page 13: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

13

Overlap Addition of Bartlett and Hann Windows

73 74

Overlap Addition of Hamming WindowL=128

Window Spectra

75DTFTof Bartlett (triangular), Hann and Hamming windows

Hamming Window Spectra

76

DTFTs of even-length, odd-length and modified odd-to-even length Hamming windows; zeros spaced at 2π/R give perfect

reconstruction using OLA (even-length window)

77

Overlap Addition (OLA) Method

78

Overlap Addition (OLA) Method

• w(n) is an L-point Hamming window with R=L/4

• assume x(n)=0 for n<0

• time overlap of 4:1 for HW

• first analysis section begins at n=L/4

Page 14: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

14

79

Overlap Addition (OLA) Method

• 4-overlapping sections contribute to each interval

• N-point FFT’s done using L speech samples, with N-Lzeros padded at end to allow modifications without significant aliasing effects

• for a given value of ny(n)=x(n)w(R-n)+x(n)w(2R-n)+x(n)w(3R-n)+x(n)w(4R-n)=x(n)[w(R-n)+w(2R-n)+w(3R-n)+w(4R-n)]=x(n) W(ej0)/R

Filter Bank Summation (FBS)

80

81

Filter Bank Summation

the filter bank interpretation of the STFT shows that for

any frequency , ( ) is a lowpass representation ˆ of the signal in a band centered at ( for FBS)

( ) ( )

ω

ω ω

ωω

=

= −

k

k k

jk n

k

j j nn k

X en n

X e e x n m w ( )

where ( ) is the lowpass window used at frequency (we have generalized the structure to allow a different lowpass window at each frequency ).

ω

ω

ω

=−∞∑ kj m

m

k k

k

m e

w m

82

Filter Bank Summation

define a bandpass filter and substitute it in the equation to give

( ) ( )

( ) ( ) ( )

k

k k

j nk k

j j nn k

m

h n w n e

X e e x n m h m

ω

ω ω∞

=−∞

=

= −∑

83

Filter Bank Summation

84

Filter Bank Interpretation of STFT (case: wk[n]=w[n])

[ ]

( )

( )

[ ] ( )

[ ] [ ] ( ) ( ) ( )

( ) ( )

( ) ( ) ( ) ( )

[ ] ( ) [ ] [ ]

( ) ( ) ( )

k k

k k k

k

k k

j

j n j nj j

j n j n j nj nk

n njj j

k

j j nk n

j j jk

w n W e

h n w n e H e W e FT e

FT e e e e

H e W e W e

v n X e e x n h n

V e H e X e FT

ω

ω ωω ω

ω ω ω ωω

ω ωω ω

ω ω

ω ω ω

δ ω ω

δ ω ω

∞ ∞− −−

=−∞ =−∞

← − − − →

= ← − − − → = ⊗

= = = −

= ⊗ − =

= = ⋅ ∗

⎡ ⎤= ⋅ ⊗⎣ ⎦

∑ ∑

( )

( )

( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( )

k

k

k

j n

j jk

jjk

j j

e

H e X e

X e W e

X e W e

ω

ω ω

ω ωω

ω ω ω

δ ω ω

δ ω ω

+

⎡ ⎤= ⋅ ⊗ +⎣ ⎦⎡ ⎤= ⋅ ⊗ +⎣ ⎦

= ⋅

-ω0 ω0 ω

ωk-ω0 ωk ωk+ ω0 ω

“single-sided, bandpass”

“lowpass”

Page 15: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

15

85

Filter Bank Summation

thus ( ) is obtained by bandpass filtering ( ) followed by modulation with the complex exponential

. We can express this in the form

( ) ( ) ( ) ( )

thus

k

k

k k

jn

j n

j j nk n k

m

X e x n

e

y n X e e x n m h m

ω

ω

ω ω

=−∞

= = −

∑ ( ) is the output of a bandpass filter with impulse

response ( )k

k

y nh n

86

Filter Bank Summation

87

Filter Bank Summation0 1 1

a practical method for reconstructing ( ) from the STFT is as follows

1. assume we know ( ) for a set of frequencies { }, , ,..., 2. assume we have a set of bandpass filters

ω ω

= −kjn k

x n

X e N k NN

0 1 1

with impulse responses

( ) ( ) , , ,..., 3. assume ( ) is an ideal lowpass filter with cutoff frequency

ω

ω= = −kj n

k k

k pk

h n w n e k Nw n

( )

- the frequency response of the bandpass filter is

( ) ( )kjjk kH e W e ω ωω −=

88

Filter Bank Summation

2 0 1 1

consider a set of bandpass filters, uniformly spaced, so that the entire frequency band is covered

, , ,...,

also assume window the same for all channels, i.e., ( ) ( ),

πω

= = −

•=

k

k

N

k k NN

w n w n

1 1

0 0

0 1 1

( )

, ,..., if we add together all the bandpass outputs, the composite response is

( ) ( ) ( )

if ( ) is properly sampled in frequency ( ), where is the

ω ωω ω

ω

− −−

= =

= −

= =

• ≥

∑ ∑% k

k

N Njj j

kk k

j

k N

H e H e W e

W e N L L

1

0

1 0( )

window duration, then it can be shown that

( ) ( )ω ω ω−

=

= ∀∑ k

Nj

k

W e wN

FBS Formula

89

Filter Bank Summation

1/N

90

Filter Bank Summation

/

derivation of FBS formula

( ) ( )

if ( ) is sampled in frequency at uniformly spaced points, the inverse discrete Fourier transform

of the sampled version of ( ) is

ω

ω

ω

←⎯⎯⎯→

k

FT IFT j

j

j

w n W e

W e N

W e

1

0

1

(recall that sampling multiplication convolution aliasing)

( ) ( )

an aliased version of ( ) is obtained.

ω ω− ∞

= =−∞

⇒ ⇔ ⇒

= +

∑ ∑k k

Nj j n

k r

W e e w n rNN

w n

Page 16: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

16

91

Filter Bank Summation

0 0

0

1

If ( ) is of duration samples, then ( ) , , and no aliasing occurs due to sampling in frequency

of ( ). In this case if we evaluate the aliased formula for , we get

ω

•= < ≥

=

j

w n Lw n n n L

W en

1

0

0( ) ( )

the FBS formula is seen to be equivalent to the formula above, since (according to the sampling theorem) any

set of uniformly spaced samples of ( ) is adequate

ω

ω

=

=

∑ k

Nj

k

j

W e wN

N W e

92

Filter Bank Summation

1 1

0 0

0

0

the impulse response of the composite filter bank system is

( ) ( ) ( ) ( ) ( )

thus the composite output is

( ) ( ) ( ) ( ) ( ) thus for FBS method, the

ω δ− −

= =

= = =

= ∗ =•

∑ ∑%

%

k

N Nj n

kk k

h n h n w n e N w n

y n x n h n Nw x n

1 1

0 0

0

reconstructed signal is

( ) ( ) ( ) ( ) ( )

if ( ) is sampled properly in frequency, and is independent of theshape of ( )

ω ω

ω

− −

= =

= = =

∑ ∑ k k

k

N Nj j n

k nk k

jn

y n y n X e e N w x n

X ew n

93

Filter Bank Summation

1/N

94

2 21 1

0 0

2 21

0

21

0

0 0 1

( )

( ) ( ) ( )

( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) for if then n

N N j k j knN N

k nk k

N j km j knN N

k m

N j k n mN

m k

m r

r

y n y n X e e

x m w n m e e

x m w n m e

x m w n m N n m rN

y n N w rN x n rN

w n n L N L

π π

π π

π

δ

− −

= =

− −

=

− −

=

=−∞

=−∞

= =

⎡ ⎤= −⎢ ⎥

⎣ ⎦

= −

= − − −

= −

• ≠ ≤ ≤ − ⇒ ≥

∑ ∑

∑ ∑

∑ ∑

∑ ∑

∑0

00 1 2

eed only term

if then in order for ( ) ( ) you need the condition ( ) , , ,... 'undersampled' representation can still work--at least in theory

ry(n) Nw( )x(n)

N L y n x n w rN r

==

• < = = = ± ±•

Filter Bank Summation

95

Summary of FBS Method perfect reconstruction of from ( ) is possible using FBS

under the following conditions: 1. is a finite duration filter/window

2. ( ) is sampled properly in both ti

jn

jn

x(n) X e

w(n)

X e

ω

ω

me and frequency

perfect reconstruction of from ( ) is also possible using FBS under the following condition:

( ) is perfectly bandlimited

To avoid time aliasing, ( ) must

k

k

jn

j

jn

x(n) X e

W e

X e

ω

ω

ω

2 4

be evaluated at at least uniformly spaced frequencies, where is the window duration -since window of length samples has frequency bandwidth of from

/ (for RW) to / (for HW), the

LL

LL Lπ π

2 0 1 1 bandpass filters in FBS overlap

in frequency since the analysis frequencies are / , , ,...,

there is a way (at least theoretically) for ( ) to be evaluated in non-overlapping bands

kjn

k L k L

X e ω

π = −

and for which can still be exactly recoveredx(n) 96

FBS Reconstruction in Non-Overlapping Bands

2 0 1 1

assume window length for all bands is samples assume the same window is used for equally spaced

frequency bands with analysis frequencies

, , ,...,

where can be less than

πω

••

= = −

k

LN

k k NN

N L assume is an ideal lowpass filter with cutoff frequency

πω

=p

w(n)

N

example with N=6 equally spaced ideal

filters

Page 17: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

17

97

FBS Reconstruction in Non-Overlapping Bands

1 1

0 0

1 12

0 0

/

the composite impulse response for the FBS system is

( ) ( ) ( )

defining a composite of the terms being summed as

( )

we get fo

k k

k

N Nj n j n

k k

N Nj n j kn N

k k

h n w n e w n e

p n e e

ω ω

ω π

− −

= =

− −

= =

= =

= =

∑ ∑

∑ ∑

%

r ( )

( ) ( ) ( ) it is easy to show that is a periodic train of impulses of the form

( ) ( )

giving for ( ) the expression

( ) ( ) ( )

thus the composite i

r

r

h n

h n w n p np(n)

p n N n rN

h n

h n N w rN n rN

δ

δ

=−∞

=−∞

=•

= −

= −

%

%

%

%

mpulse response is the window sequence sampled at intervals of samplesN

98

FBS Reconstruction in Non-Overlapping Bands

1 for ideal LPF we have

sin( / ) sin( )( ) , ( ) ( )

giving ( ) ( ) other cases where perfect reconstruction is obtained1. ( ) is of finite length and causal (no images)2. ( ) has

n N rw n w rN rn rN N

h n n

w n L Nw n

π π δπ π

δ

= = =

=•

%

0

0

0

0

0

10 0 1 2

length and has the property ( ) / , for

for ( , , , ,...)

giving ( ) ( ) ( ) ( )

( ) ( ) ( )j r Nj

Nw n N n r N

n rN r r r

h n p n w n n r N

H e e y n x n r Nωω

δ−

>= =

= = ≠ = ± ±

= = −

= ⇒ = −

%

%

impulse response of ideal lowpass filter with cutoff frequency π/N

99

Summary of FBS Reconstruction for perfect reconstruction using FBS methods

1. does not need to be either time-limited or frequency-limited to exactly reconstruct

from ( ) 2. just needs eq

ω

kjn

w(n)

x(n) X ew(n) ually spaced zeros, spaced samples apart for theoretically perfect reconstruction

exact reconstruction of the input is possible with a number of frequency channels less than that required by the

•N

sampling theorem key issue is how to design digital filters that match these

criteria•

100

Practical Implementation of FBS

FBS and OLA Comparisons

101 102

FBS and OLA Comparisonsduals filter bank summation method overlap addition method

-- one depends on sampling relation in frequency -- one depends on sampling relation in time FBS requires sampling in frequency

• ←⎯⎯⎯→

1

0

0

1 0( )

be such that the window

transform ( ) obeys the relation

( ) ( ) any

OLA requires that sampling in time be such that the window obeys the relation

( ) ( ) / any

ω

ω ω ω−

=

=

− =

∑ k

j

Nj

k

j

W e

W e wN

w rR n W e R

the key to Short-Time Fourier Analysis is the ability to modify the short-time spectrum (via quantization, noise enhancement, signal enhancement, speed-up/slow-down,etc) and recover an

=−∞

∑r

n

"unaliased" modified signal

Page 18: Short-Time Fourier Analysis Why STFT for Speech Signals speech processing... · 3 13 Short Time Fourier Transform ωω ω ω ∞ − =−∞ • =− =−⇒ ˆˆ∑ ˆ STFT is a

18

103

Overlap Addition (OLA) Method

1

0

0 1

1

assume ( ) sampled with period samples in time

( ) ( ), integer, the Overlap Add Method is based on the summation

( ) ( ) OLA Metho

ω

ω ω

ω ω∞ −

=−∞ =

= ≤ ≤ −

⎡ ⎤= ⎢ ⎥

⎢ ⎥⎣ ⎦∑ ∑

k

k k

k k

jn

j jr rR

Nj j n

rr k

X e R

Y e X e r k N

y n Y e eN

d

for each value of , compute the inverse transform of ( ) giving the sequences ( ) ( ) ( ), the signal at time is obtained by summing the values at time

of all the seque

ω•

= − − ∞ < < ∞

kjr

r

r Y e

y m x m w rR m mn n

nces, ( ) that overlap at time , giving

( ) ( ) ( ) ( )

if has a bandlimited FT and if ( ) is properly sampled in time (i.e., small enough to avoid aliasing) the

ω

∞ ∞

=−∞ =−∞

= = −

∑ ∑k

r

rr r

jn

y m n

y n y n x n w rR n

w(n) X eR

0

0

n

( ) ( ) / -- independent of , and

( ) ( ) ( ) / -- exact reconstruction of

=−∞

− ≈

=

∑ j

rj

w rR n W e R n

y n x n W e R x(n)