
Spectral Estimation & Examples of Signal Analysis

Examples from the research of Kyoung Hoon Lee, Aaron Hastings, Don Gallant, Shashikant More, and Weonchan Sung, Herrick Graduate Students

Estimation: Bias, Variance and Mean Square Error

Let φ denote the quantity that we are trying to estimate, and let φ̂ denote the result of an estimation based on one data set with N pieces of information. Each data set used for estimation → a different estimate of φ. Bias: the true value minus the average of all possible estimates.

Variance: Measure of the spread of the estimates about the mean of all estimates. Mean Square Error:

$$b(\hat\varphi) = \varphi - E[\hat\varphi], \qquad \sigma^2 = E\big[(\hat\varphi - E[\hat\varphi])^2\big], \qquad \text{m.s.e.} = E\big[(\hat\varphi - \varphi)^2\big] = b^2 + \sigma^2$$

Estimation: Some definitions

An estimate is consistent if, when we use more data to form the estimate, the mean square error is reduced.

If we have two ways of estimating the same thing, we say that the estimator that leads to the smaller mean square error is more efficient than the other estimator.

[Figure: scatter of estimates of φ = (a, b) in the (a, b) plane, showing the mean of all the estimates, the true value, and the bias between them.]

Examples

Bias and variance of an estimate of the mean:

Estimate of the mean:
$$\hat\mu = \frac{1}{N}\sum_{n=1}^{N} X_n$$

Bias:
$$E[\hat\mu] = E\left[\frac{1}{N}\sum_{n=1}^{N} X_n\right] = \frac{1}{N}\sum_{n=1}^{N} E[X_n] = \frac{1}{N}\sum_{n=1}^{N}\mu = \mu \quad\text{(unbiased)}$$

Variance:
$$\sigma_{\hat\mu}^2 = E\big[(\hat\mu - E[\hat\mu])^2\big] = E\left[\left(\frac{1}{N}\sum_{n=1}^{N} X_n - \mu\right)^{\!2}\right] = E\left[\left(\frac{1}{N}\sum_{n=1}^{N}(X_n - \mu)\right)^{\!2}\right] = \frac{1}{N^2}\,E\left[\sum_{m=1}^{N}(X_m - \mu)\sum_{n=1}^{N}(X_n - \mu)\right]$$

Separating into the terms where n ≠ m and where n = m:
$$= \frac{1}{N^2}\left\{(N^2 - N)\,E\big[(X_n - \mu)(X_m - \mu)\big] + N\,E\big[(X_n - \mu)^2\big]\right\}$$

Assuming that the samples Xn are independent of one another:
$$= \frac{1}{N^2}\left\{(N^2 - N)\,E[(X_n - \mu)]\,E[(X_m - \mu)] + N\,E\big[(X_n - \mu)^2\big]\right\} = \frac{1}{N^2}\,N\,E\big[(X_n - \mu)^2\big] = \frac{1}{N}\sigma_x^2$$
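As an illustrative check (not part of the original slides), a minimal Matlab sketch: simulate many independent records of N samples and compare the spread of the sample means with σx²/N. All names and values are placeholders.

% Monte Carlo check that var(sample mean) is approximately sigma_x^2 / N
N      = 100;                      % samples per record
Nrec   = 10000;                    % number of independent records
sigmax = 2;                        % true standard deviation of each sample
X      = sigmax*randn(Nrec, N);    % each row is one record
muHat  = mean(X, 2);               % one mean estimate per record
empVar = var(muHat);               % spread of the estimates
theory = sigmax^2/N;               % (1/N)*sigma_x^2 from the derivation above
fprintf('empirical %.4f, theory %.4f\n', empVar, theory);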

Examples

Biased estimate of the variance of a set of N measurements (first estimate the mean, then use that estimate in this calculation; we have lost one degree of freedom):
$$\frac{1}{N}\sum_{n=1}^{N}(X_n - \hat\mu)^2$$

Unbiased estimates of the variance of a set of N measurements:
$$\frac{1}{N-1}\sum_{n=1}^{N}(X_n - \hat\mu)^2 \qquad\text{and}\qquad \frac{1}{N}\sum_{n=1}^{N}(X_n - \mu)^2,$$
the second being the special case where the mean is known and doesn't need to be estimated from the data.
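A short Matlab sketch of these formulas (illustrative only; the data are simulated): Matlab's var(x,1) uses the 1/N normalization and var(x) uses 1/(N-1).

% Biased vs unbiased variance estimates of one record
N  = 50;
x  = 3 + randn(N,1);                 % measurements with an unknown mean
mu = mean(x);                        % estimated mean (costs one degree of freedom)
sBiased   = sum((x - mu).^2)/N;      % 1/N form (biased)
sUnbiased = sum((x - mu).^2)/(N-1);  % 1/(N-1) form (unbiased)
% Built-in equivalents: var(x,1) is the biased form, var(x) the unbiased form
fprintf('%.4f %.4f %.4f %.4f\n', sBiased, var(x,1), sUnbiased, var(x));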

Estimation of Autocovariance functions

Two methods of estimating Rxx(τ) from T seconds of data:
1. Dividing by the integration time, T − |τ|: the estimate is unbiased but has very high variance, particularly when τ is close to T.
2. Dividing by the total time, T: the estimate is biased (but asymptotically unbiased). This is equivalent to multiplying the first estimate by a triangular window, (T − |τ|)/T, which attenuates the high-variance estimates.

[Figure: x(t) over T seconds, with x(t) and x(t + τ) marked; calculating the average value of x(t)·x(t + τ) from T seconds of data.]
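A hedged discrete-time sketch of the two normalizations using Matlab's xcov, whose 'unbiased' option divides by N − |m| (Method 1) and whose 'biased' option divides by N (Method 2); the signal and sample rate are placeholders.

% Two normalizations of the autocovariance estimate (discrete-time analogue)
fs = 1000;  N = 4000;
x  = filter(1, [1 -0.9], randn(N,1));        % a correlated random signal (stand-in)
[c1, lags] = xcov(x, 'unbiased');            % divide by N-|m|  (Method 1)
[c2, ~   ] = xcov(x, 'biased');              % divide by N      (Method 2)
tau = lags/fs;                               % lag in seconds
plot(tau, c1, tau, c2); legend('unbiased','biased'); xlabel('\tau - s');
% The unbiased curve becomes erratic as |tau| approaches the record length;
% the biased curve is the unbiased one multiplied by the triangle (N-|m|)/N.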


Estimation of Cross Covariance

Same issues as for autocovariance: a bigger τ means less averaging for a finite T.

[Figure: x(t) and y(t) over T seconds, with the lag τ between x(t) and y(t + τ) indicated.]

x(t) and y(t) are zero-mean, weakly stationary random processes; we are calculating the average value of x(t)·y(t + τ). Additional problem: T must be made large enough to accommodate system delays.

Estimation of Covariance

With fast computation of spectra, covariance functions are now more usually estimated by inverse Fourier transforming the power and cross spectral density estimates. The inverse transform of a raw PSD or CSD estimate is equivalent to Method 2 for calculating covariance functions, with the triangular window applied to data segments of length Tr.
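A minimal sketch of the idea (not the course's code): inverse-FFT a raw two-sided PSD estimate of one zero-padded segment, which reproduces the biased (Method 2) autocovariance estimate for that segment. The signal is a placeholder.

% Autocovariance via the inverse FFT of a raw spectral estimate
N  = 2048;  fs = 1000;  dt = 1/fs;
x  = filter(1, [1 -0.8], randn(N,1));    % zero-mean correlated signal (stand-in)
x  = x - mean(x);
Xz = fft(x, 2*N);                        % zero-pad to avoid circular wrap-around
Sraw = dt*abs(Xz).^2 / N;                % raw two-sided PSD estimate (V^2/Hz)
R    = real(ifft(Sraw)) * fs;            % inverse transform back to a covariance
Rpos = R(1:N);                           % lags 0 .. (N-1)*dt
% Rpos matches the "divide by total time" (Method 2) estimate, xcov(x,'biased')
[cb, lags] = xcov(x, 'biased');
plot((0:N-1)*dt, Rpos, lags(lags>=0)*dt, cb(lags>=0), '--');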

Power Spectral Density Estimation

Definition:
$$S_{xx}(f) = \lim_{T\to\infty} E\left[\frac{X_T^*(f)\,X_T(f)}{T}\right] = \int_{-\infty}^{+\infty} R_{xx}(\tau)\,e^{-j2\pi f\tau}\,d\tau.$$

Estimation:
1. Could Fourier transform the autocorrelation function estimate (not computationally efficient).
2. Could use the frequency domain definition directly.

Raw estimate:
$$\hat S_{xx}(f) = \frac{X_T^*(f)\,X_T(f)}{T}$$

No averaging! Extremely poor variance characteristics: the variance is $S_{xx}(f)^2$ and is unaffected by T, the length of data used.
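An illustrative Matlab sketch (white-noise example, values are placeholders) of why the raw estimate is so poor: its scatter does not shrink as the record length grows.

% Raw (unaveraged) PSD estimate: variance does not shrink as T grows
fs = 1000;  sigma2 = 1;                    % white noise, true two-sided PSD = sigma2/fs
for N = [1024 16384]                       % two record lengths, T = N/fs
    x    = sqrt(sigma2)*randn(N,1);
    Sraw = (1/fs)*abs(fft(x)).^2 / N;      % X* X / T with T = N*dt
    % for Gaussian noise each interior bin is roughly exponential, so std ~ mean
    fprintf('N = %5d: mean %.3g, std %.3g\n', N, mean(Sraw), std(Sraw));
end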

Power Spectral Density Estimation (Continued)

Smoothed estimate from segment averaging.

1. Break the signal up into Nseg segments, each Tr seconds long.
2. For each segment:
   a. Apply a window to smooth the transitions at the ends of the segment.
   b. Fourier transform the windowed segment → $X_{T_r}(f)$.
   c. Calculate a raw power spectral density estimate: $|X_{T_r}(f)|^2/T_r$.
3. Average the results from each segment to get the smoothed estimate, and apply a power compensation for the window used.

$$\tilde S_{xx}(f) = \frac{1}{N_{SEG}\,P_{comp}}\sum_{i=1}^{N_{SEG}} \hat S_{xx}^{\,i}(f), \qquad P_{comp} = \frac{1}{T}\int w^2(t)\,dt$$

[Figure: x(t) divided into segments of length Tr, each multiplied by a window w(t).]

Power Spectral Density Estimation (Continued)

Smoothed estimate from segment averaging.


Overlap: for some windows, segment overlap makes sense. With a Hann window and 50% overlap, data de-emphasized in one windowed segment is strongly emphasized in the next window (and vice versa).

Bias: note that the bias of the PSD estimate is controlled by the size of the window (Tr), which controls the frequency resolution (1/Tr). Larger window, smoother transitions → less power leakage → less bias.
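A minimal Welch-averaging sketch in Matlab using pwelch with a Hann window and 50% overlap; the signal, record length, and segment length Nr are placeholders.

% Smoothed PSD by segment averaging (Welch's method)
fs = 8192;  T = 30;                        % 30 s of data (placeholder values)
x  = filter(1, [1 -0.95], randn(T*fs,1));  % stand-in for a measured random signal
Nr = 4096;                                 % segment length -> Tr = Nr/fs, df = 1/Tr
[Sxx, f] = pwelch(x, hann(Nr), Nr/2, Nr, fs);   % Hann window, 50% overlap
plot(f, 10*log10(Sxx)); xlabel('Frequency - Hz'); ylabel('PSD - dB re 1 V^2/Hz');
% Larger Nr -> finer resolution (less bias) but fewer segments (more variance).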

Power Spectral Density (PSD) Estimation (Continued)

We argued that the distribution of the smoothed PSD is related to that of a chi-squared random variable ($\chi_\nu^2$) with $\nu = 2N_{SEG}$ degrees of freedom, provided Tr is large enough that bias errors can be ignored. From the equations below it follows that we can control the variance by averaging more segments. Note: shorter segments mean larger bias, so for a fixed T seconds of data there is a trade-off between the segment length (Tr), which controls the bias, and the number of segments (Nseg), which controls the variance: T = Tr·Nseg.

$$\mathrm{Var}\!\left[\frac{2N_{seg}\,\tilde S_{xx}}{S_{xx}}\right] = \frac{4N_{seg}^2}{S_{xx}^2}\,\mathrm{Var}\big[\tilde S_{xx}\big] = 2\,(2N_{seg}),$$
and rearranging:
$$\mathrm{Var}\big[\tilde S_{xx}\big] = \frac{S_{xx}^2}{N_{seg}}.$$
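A rough empirical check of this scaling (not from the slides), using many independent white-noise records and a rectangular window so that each segment is independent; all values are placeholders.

% Empirical check: Variance[S~xx] ~ Sxx^2 / Nseg  (white-noise example)
fs = 1000;  Nr = 256;  Sxx = 1/fs;            % true two-sided PSD of unit-variance noise
for Nseg = [8 64]
    est = zeros(500,1);
    for k = 1:500                             % 500 independent smoothed estimates
        x = randn(Nseg*Nr, 1);
        P = pwelch(x, rectwin(Nr), 0, Nr, fs, 'twosided');
        est(k) = P(20);                       % value at one interior frequency bin
    end
    fprintf('Nseg = %2d: var = %.2e, theory = %.2e\n', Nseg, var(est), Sxx^2/Nseg);
end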

Cross Spectral Density (CSD)

Definition:
$$S_{xy}(f) = \lim_{T\to\infty} E\left[\frac{X_T^*(f)\,Y_T(f)}{T}\right] = \int_{-\infty}^{+\infty} R_{xy}(\tau)\,e^{-j2\pi f\tau}\,d\tau.$$

Estimation: could Fourier transform the cross-correlation function estimate (not computationally efficient), or could use the frequency domain definition directly.

Raw estimate:
$$\hat S_{xy}(f) = \frac{X_T^*(f)\,Y_T(f)}{T}$$

As with the PSD, this has extremely poor variance characteristics, so:
– divide the time histories into segments,
– generate a raw estimate from each segment, and
– average to reduce variance and produce a smoothed estimate.

Cross Spectral Density Estimation: Segment Averaging

[Figure: x(t) and y(t) divided into windowed segments of length Tr, with window w(t).]

Fourier transform the windowed segments → $X_{T_r}(f)$ and $Y_{T_r}(f)$.

Raw estimate from the ith segment:
$$\hat S_{xy}^{\,i}(f) = \frac{X_{T_r}^*(f)\,Y_{T_r}(f)}{T_r}$$

Smoothed estimate:
$$\tilde S_{xy}(f) = \frac{1}{N_{seg}}\sum_{i=1}^{N_{seg}} \hat S_{xy}^{\,i}(f)$$
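A minimal Matlab sketch using cpsd for the segment-averaged CSD; the signals and system are placeholders.

% Smoothed cross spectral density by segment averaging
fs = 2048;  N = 60*fs;  Nr = 2048;
x  = randn(N,1);                                 % input (placeholder)
y  = filter([0 0.5 0.3], 1, x) + 0.1*randn(N,1); % output of some system plus noise
[Sxy, f] = cpsd(x, y, hann(Nr), Nr/2, Nr, fs);   % segment-averaged CSD estimate
subplot(2,1,1); plot(f, 10*log10(abs(Sxy)));  ylabel('|S_{xy}| - dB');
subplot(2,1,2); plot(f, angle(Sxy));          ylabel('phase - rad'); xlabel('Hz');
% Mind the Matlab conjugation convention noted later in these slides when
% comparing cpsd output with the X_T^* Y_T / T definition above.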

Issues with Cross Spectral Density Estimates

1. Reduce bias by choosing the segment length (Tr) as large as possible. (Bias is greatest where the phase changes rapidly.)
2. Reduce variance by averaging many segments.
3. A large amount of averaging might be required to reduce noise effects. With
$$y_m(t) = y(t) + n(t) = h(t)*x(t) + n(t),$$
where x(t) and n(t) are zero-mean, weakly stationary, uncorrelated random processes,
$$\mathrm{SNR}_{y_m} = \frac{S_{yy}}{S_{n_y n_y}} = \frac{|H(f)|^2 S_{xx}}{S_{n_y n_y}}, \qquad \tilde S_{xy} \approx H(f)\,\tilde S_{xx} + \tilde S_{xn} \to H(f)\,\tilde S_{xx}.$$
4. Time delays between x and y cause problems if the time delay (t0) is greater than a small fraction of the segment length (Tr). One can estimate t0 and offset the y segments, but this needs T + t0 seconds of data.

Cross Spectral Density Estimation: Segment Averaging with System Delays


Fourier transform the windowed segments → $X_{T_r}(f)$ and $Y_{T_r}(f)$. Offsetting the y segments essentially removes most of the delay from the estimated frequency response function. The delay effects can be put back in by multiplying the estimate of H(f) by $e^{-j2\pi f \hat t_0}$, where $\hat t_0$ is the estimated delay.
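One possible Matlab sketch of this workflow, under stated assumptions: the delay is estimated from the cross-correlation peak, y is advanced by that amount before segment averaging, and the phase factor reinserts the delay. The system, delay, and parameter values are all invented for illustration.

% Handling a transport delay before segment averaging (illustrative only)
fs = 4096;  N = 30*fs;  Nr = 1024;
d  = round(0.05*fs);                             % assumed 50 ms true delay
x  = randn(N,1);
y  = 0.8*[zeros(d,1); x(1:end-d)] + 0.05*randn(N,1);
[c, lags] = xcorr(y, x);                         % peak location estimates the delay
[~, imax] = max(abs(c));
dhat  = lags(imax);  t0hat = dhat/fs;            % estimated t0
ya = [y(dhat+1:end); zeros(dhat,1)];             % advance y by the estimated delay
[H, f] = tfestimate(x, ya, hann(Nr), Nr/2, Nr, fs);   % FRF of the aligned pair
Hfull  = H .* exp(-1j*2*pi*f*t0hat);             % put the delay back into H(f)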

Coherence Function Estimation: Substitute in Smoothed Estimates of Spectral Densities

Coherence takes values in the range 0 to 1.

Definition:
$$\gamma_{xy}^2 = \frac{|S_{xy}|^2}{S_{xx}S_{yy}}$$
Estimate:
$$\tilde\gamma_{xy}^2 = \frac{|\tilde S_{xy}|^2}{\tilde S_{xx}\,\tilde S_{yy}}$$

– Substituting raw spectral density estimates into the formula results in a coherence of 1 at all frequencies. A result where the coherence = 1 at all frequencies from measured signals should be treated with a high degree of suspicion.
– The estimate is highly sensitive to bias in the spectral density estimates, which is particularly bad where the phase of the cross spectral density changes rapidly (at maxima and minima in |Sxy|).
– Coherence → 0 because of: noise, nonlinearity, bias errors in estimation, or a very weak linear relationship between the signals.
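A short Matlab sketch using mscohere with the same segment parameters as the spectral estimates; signals and the system are placeholders.

% Smoothed coherence estimate (must use averaged spectral densities)
fs = 2048;  N = 60*fs;  Nr = 1024;
x = randn(N,1);
y = filter([0.2 0.5 0.1], 1, x) + 0.3*randn(N,1);     % linear system + output noise
[Cxy, f] = mscohere(x, y, hann(Nr), Nr/2, Nr, fs);    % gamma^2 estimate, Nseg > 1
plot(f, Cxy); ylim([0 1]); xlabel('Frequency - Hz'); ylabel('\gamma^2_{xy}');
% With a single segment (no averaging) the estimate is identically 1 and meaningless.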

Example: System with Some Nonlinearities (cubic stiffness) and Noisy Measurements

[Figure: spectra and coherence for the nonlinear system. Annotations: the nonlinearity causes a spread of energy around the nonlinear mode and around 3x and 5x its frequency, and broad dips in the coherence function (if you drive the system harder these regions become wider); other dips are due to bias errors; poor SNR on the output (SNRy) also lowers the coherence.]

Example: Linear System with Noisy Output Measurements

[Figure: coherence estimates for three cases: high SNR with Tr = 512/fs, high SNR with Tr = 2048/fs, and low SNR on the output with Tr = 512/fs. Annotations: dips are mainly due to bias and thus get smaller as the resolution increases; bias is greatest where the phase change is fastest; with low output SNR the dips are filled in with noise, and SNRy also lowers the coherence; with Tr = 2048/fs there is less averaging than in the N = 512 case (fewer segments → greater variance), but the bias effects are smaller.]

H1 and H2 Estimates of H: Effects of Noise

If the system is linear and there is no noise (ignoring all other estimation errors):
$$H(f) = \frac{S_{xy}(f)}{S_{xx}(f)}\;\;(\text{H1 approach}) = \frac{S_{yy}(f)}{S_{yx}(f)}\;\;(\text{H2 approach})$$

Cases with noise (assume that the estimation errors are small, i.e., Tr and Nseg are both large):

H1 estimate:
$$\frac{S_{x_m y_m}}{S_{x_m x_m}} = \frac{S_{xy}(f)/S_{xx}(f)}{1 + S_{n_x n_x}/S_{xx}} = \frac{H(f)}{1 + S_{n_x n_x}/S_{xx}}$$
Noise on the input adversely affects this estimate of H. Theory: |H1 estimate| < |H|.

H2 estimate:
$$\frac{S_{y_m y_m}}{S_{x_m y_m}^*} = \frac{S_{yy}(f)}{S_{xy}^*(f)}\left(1 + \frac{S_{n_y n_y}}{S_{yy}}\right) = H(f)\left(1 + \frac{S_{n_y n_y}}{S_{yy}}\right)$$
Noise on the output adversely affects this estimate of H. Theory: |H2 estimate| > |H|.

Note that with bias errors due to windowing (Tr not as large as you would like) these inequalities may not hold, but |H1 estimate| < |H2 estimate|.
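A sketch (assumed example, not the course code) that forms H1 and H2 from explicitly segment-averaged spectra so the formulas above are concrete; the 1/Tr and window-compensation factors cancel in the ratios, so they are omitted. Matlab's tfestimate gives the H1 form directly.

% H1 and H2 frequency response estimates from segment-averaged spectra
fs = 4096;  N = 120*fs;  Nr = 2048;  Nseg = floor(N/Nr);
x  = randn(N,1);
y  = filter([0.3 0.4 -0.2], 1, x) + 0.2*randn(N,1);   % output noise only (placeholder)
w  = hann(Nr);  Sxx = 0;  Syy = 0;  Sxy = 0;
for i = 1:Nseg                                        % no overlap, for clarity
    k1 = (i-1)*Nr + 1;  k2 = i*Nr;
    X = fft(w .* x(k1:k2));   Y = fft(w .* y(k1:k2));
    Sxx = Sxx + conj(X).*X;   Syy = Syy + conj(Y).*Y;
    Sxy = Sxy + conj(X).*Y;                           % X* Y convention, as in the notes
end
H1 = Sxy ./ Sxx;                                      % pulled low by noise on the input
H2 = Syy ./ conj(Sxy);                                % pulled high by noise on the output
f  = (0:Nr-1)'*fs/Nr;
plot(f(1:Nr/2), abs(H1(1:Nr/2)), f(1:Nr/2), abs(H2(1:Nr/2)));
legend('|H_1|','|H_2|');  xlabel('Frequency - Hz');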

Estimation of H

Frequency response function estimates are extremely sensitive to bias errors, which are worst at peaks and troughs. Overcoming bias requires large segment sizes, but that means fewer segments to average and thus higher variance. Note that, for example,
$$E[\hat H] = E\!\left[\frac{\tilde S_{xy}}{\tilde S_{xx}}\right] \neq \frac{E[\tilde S_{xy}]}{E[\tilde S_{xx}]}.$$
Note: a low coherence function does not necessarily imply a poor frequency response function estimate. If the coherence function is low because of noise on the response (input), then the H1 (H2) frequency response estimate should be accurate, provided sufficient averaging was done to reduce the variance of the estimates.

Calibration of PSD and CSD in MatLab

psd – old function; pwelch – new function; cpsd – gives the complex conjugate of what you want.

The mean square value of the time signal (its variance, for a zero-mean signal) should give the same result as integrating the PSD (Parseval's theorem). Check whether you are getting a two-sided or a one-sided PSD.
– One-sided: the negative and positive frequency contributions have been added (not for the components at f = 0 and fs/2, though, which should be zero anyway); this is what Matlab does.
– Two-sided: when you integrate the spectrum from 0 to fs/2 you will get about half of what you expect (no addition of positive and negative frequency contributions has occurred).
Matlab also doubles the CPSD from 0 to fs/2, which doesn't make sense on its own, but it is convenient when you estimate the frequency response function because the doubling cancels.
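A quick Parseval-style calibration check in Matlab (illustrative; the signal is a placeholder): the integral of the one-sided PSD should come out close to the signal variance.

% Calibration sanity check: integral of the PSD should match the signal variance
fs = 8192;  N = 64*fs;
x  = filter(1, [1 -0.7], randn(N,1));           % zero-mean test signal
Nr = 4096;
[Sxx, f] = pwelch(x, hann(Nr), Nr/2, Nr, fs);   % one-sided PSD from Matlab
df = f(2) - f(1);
fprintf('sum(PSD)*df = %.4f,  var(x) = %.4f\n', sum(Sxx)*df, var(x));
% For a one-sided PSD the positive and negative frequency halves are already
% added, so no further doubling is needed; a two-sided PSD integrated from
% 0 to fs/2 would come out at roughly half of var(x).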

Calibration (continued)

Power Spectral Density Estimates Using DFTs: recall that for −fs/2 < f < fs/2,
$$X_T(f)\Big|_{f = k f_s/N} \approx \Delta\cdot\mathrm{DFT}\big(w(n\Delta)\,x(n\Delta),\; n = 0,1,\dots,N-1\big) = \Delta\,X_k$$
$$\hat S_{xx}(f_k) = \frac{X_T^*(f_k)\,X_T(f_k)}{T\cdot w_{comp}} \approx \frac{\Delta^2\,X_k^*X_k}{N\Delta\cdot w_{comp}} = \frac{\Delta\,|X_k|^2}{N\,w_{comp}}$$
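A minimal sketch implementing this scaling for one windowed segment (the data and window length are placeholders):

% Raw PSD of one windowed segment, scaled per the formula above
fs = 8192;  dt = 1/fs;  Nr = 4096;
x  = randn(Nr,1);                       % one segment of data (placeholder)
w  = hann(Nr);
wcomp = sum(w.^2)/Nr;                   % discrete form of (1/T) * integral of w^2(t) dt
Xk  = fft(w.*x);                        % DFT of the windowed segment
Sxx = dt*abs(Xk).^2 / (Nr*wcomp);       % two-sided raw estimate, units V^2/Hz
f   = (0:Nr-1)'*fs/Nr;                  % bins above fs/2 correspond to negative frequencies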

Calibration Continued: Energy Spectral Density

We sometimes have segments that each contain a single transient (e.g., tap testing of structures), and we average the raw spectra from each segment to remove noise effects. [Be careful with applying this random process theory to different types of signals; each segment used in the estimation should contain similar information.] If we choose a different Tr, i.e., allow a shorter or longer time between successive transients (the transient should have died away within the segment), the PSD will change because of the division by Tr in the formula.

To overcome this problem we estimate an Energy Spectral Density (ESD), i.e., we remove the division by Tr from the raw PSD estimate:

Raw ESD estimate = $|X_{T_r}(f)|^2 \approx \Delta^2|X_k|^2$   (Volts/Hz)²

[You also need to be careful with the window choice here so as not to distort the transient.]

[Figure: a single transient within a segment of length Tr; time in seconds.]

Calibration Continued: Power Spectrum

Segment averaging is often applied to signals that have both periodic and random components.

Power spectrum (works well for periodic signals): as the resolution increases (frequency spacing gets smaller), the noise floor decreases. Total power = sum of the power at each spectral component. Recall: Ck = Xk/N if you synchronize, don't alias, and there is no noise.

Power spectral density (ideal for random signals): the level is unaffected by changes in frequency resolution (window size). Total power = the integral of the PSD = sum of the PSD values × frequency resolution.

Power estimate = $|X_k|^2/N^2$ = raw PSD estimate × (frequency resolution) = $(\Delta|X_k|^2/N)\cdot(f_s/N)$   [V²]
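A tiny worked check of the power-spectrum scaling (synchronized sine, no window, no noise; all values chosen for illustration): each of the two spectral lines carries A²/4, summing to the sine's power A²/2.

% Power spectrum of a synchronized sinusoid: |Xk|^2 / N^2
fs = 1024;  N = 1024;  A = 2;  f1 = 100;         % f1 falls exactly on a bin (df = 1 Hz)
n  = (0:N-1)'/fs;
x  = A*sin(2*pi*f1*n);                           % no window, no noise, no leakage
P  = abs(fft(x)).^2 / N^2;                       % power at each spectral component
fprintf('P(f1) + P(-f1) = %.4f  (A^2/2 = %.4f)\n', P(f1+1) + P(N-f1+1), A^2/2);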

PSDs for Sines + Noise

The power spectral density of a sinusoid of amplitude A and frequency f1 is:
$$\frac{A^2}{2}\delta(f - f_1) + \frac{A^2}{2}\delta(f + f_1)$$

• But by using windows Tr seconds long, the delta functions become sinc or sinc-like functions whose maximum height is affected by the window size Tr.
• If Tr is too small the sinc functions will be buried in the noise. But as Tr is increased the sinc functions begin to emerge from the noise.
• So if you suspect that a peak in your spectrum is due to a sine wave, increase the window size (better frequency resolution) and see if the peak gets larger, as you would expect if it were truly a sine wave.

[Figure: level (dB) versus frequency, 1000 to 1300 Hz, comparing the original and simulated signals (left channel).]

Sines + Noise

$$S_{x_m x_m} = S_{xx} + S_{nn} = \frac{\left|\dfrac{AT_r}{2}\,\mathrm{sinc}\big(\pi(f-f_1)T_r\big) + \dfrac{AT_r}{2}\,\mathrm{sinc}\big(\pi(f+f_1)T_r\big)\right|^2}{T_r} + S_{nn}$$
$$\approx \frac{A^2 T_r}{4}\,\mathrm{sinc}^2\big(\pi(f-f_1)T_r\big) + \frac{A^2 T_r}{4}\,\mathrm{sinc}^2\big(\pi(f+f_1)T_r\big) + S_{nn}$$

Note here we have assumed averaging is sufficient to make cross terms small compared to the terms retained.

Sines + Broadband Random Noise

[Figure: PSD (V²/Hz) versus frequency (Hz) for Tr = NΔ with N = 4,096, 8,192, and 16,384. Annotations: the sinc function emerges from the noise as Tr increases; there is variation in the estimated PSD due to lack of averaging, since larger Tr ⇒ smaller Nseg, ∴ larger variance.]
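A simulation sketch of the same idea (tone frequency, amplitude, and lengths are placeholders): as the segment length increases, the tone's peak rises out of a noise floor that stays put.

% Sine + broadband noise: the spectral peak grows with segment length Tr
fs = 8192;  A = 0.1;  f1 = 1000;
t  = (0:60*fs-1)'/fs;
x  = A*sin(2*pi*f1*t) + randn(size(t));          % weak tone buried in noise
for Nr = [4096 8192 16384]                       % Tr = Nr/fs doubles each time
    [S, f] = pwelch(x, hann(Nr), Nr/2, Nr, fs);
    semilogy(f, S); hold on;                     % peak grows roughly with Tr; floor is flat
end
hold off; xlim([900 1100]); xlabel('Frequency - Hz'); ylabel('PSD - V^2/Hz');
legend('N = 4096','N = 8192','N = 16384');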

Frequency Variations in Sinusoidal Components

When there are frequency variations, the sinusoidal power is spread over a group of frequencies and the amplitudes are reduced. This is sometimes more noticeable at the higher harmonics when the variations are small, as in signal 17 below, versus signal 22, where there appears to be very little frequency variation (the sinusoidal components are narrow, even at high frequencies).

[Figure: power spectral densities (dB) versus frequency (Hz) for signals 17 and 22.]

Fast Modulations in Frequency Modulated (FM) Sounds

FM tones of the form shown below (randomized variation of tones):
– r(t): random noise passed through a 4th-order Butterworth filter
– fc: cutoff frequency
– B: the range of frequency modulation
– f0: center frequency (700 Hz)
– Sampling frequency: 44.1 kHz

$$y(t) = A\sin\!\left(2\pi f_0\,t + 2\pi B\int_0^t r(t)\,dt\right)$$

[Figure: instantaneous frequency f(t) versus time, varying about f0 = 700 Hz between 700 − B and 700 + B, driven by the filtered noise r(t).]
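A sketch of how such a tone could be synthesized in Matlab; the duration, parameter values, and the scaling of r(t) (so the instantaneous frequency stays within f0 ± B) are assumptions, not the original study's code.

% Generate an FM tone with a random, band-limited frequency variation
fs = 44100;  dur = 5;  A = 1;  f0 = 700;  B = 50;  fc = 50;   % example parameter values
N  = dur*fs;  t = (0:N-1)'/fs;
[b, a] = butter(4, fc/(fs/2));                  % 4th-order lowpass Butterworth filter
r  = filter(b, a, randn(N,1));
r  = r / max(abs(r));                           % assumed scaling so |f(t) - f0| <= B
y  = A*sin(2*pi*f0*t + 2*pi*B*cumsum(r)/fs);    % cumsum/fs approximates the integral
% sound(y, fs)                                  % uncomment to listen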

Sounds – Power Spectra of a Frequency Modulated Tone

[Figure: grid of power spectra for fc = 10, 50, 100, and 200 Hz (columns) and B = 25, 50, 75, and 100 Hz (rows).]

Spectral estimation parameters: Hann window, fs = 44.1 kHz, Δf = 1 Hz, 100 segments, 50% overlap.

fc, the filter cut-off frequency, controls the frequency content of the frequency variation; B controls the range of the frequency modulation.

Power Spectra of a FM Tone with Trackable FMs Made Stationary

[Figure: the same grid of power spectra (fc = 10, 50, 100, 200 Hz; B = 25, 50, 75, 100 Hz) after the trackable frequency modulations have been removed (made stationary). Spectral estimation parameters as above: Hann window, fs = 44.1 kHz, Δf = 1 Hz, 100 segments, 50% overlap.]

Another Example of Spectral Manipulation to Help in Estimation of Tonality Metrics

Recording (> 5 s) → finely resolved spectrum.
Signal decomposition: (1) significant tones, (2) insignificant tones, (3) noise floor.
Signal reconstruction: (2) + (3) → (4) new noise floor; then (1) + (4) and an inverse DFT to give the sound.

Time-Varying (Non-Stationary) Signals

Spectrograms: Apply stationary spectral methods over short periods of time with overlapping windows

– limits averaging for the random parts of the signals
– short windows mean more bias, and tones are not as prominent

[Figure: spectrogram of a humming/whining motor-driven device; time 0 to 3 seconds, frequency 0 to 2000 Hz.]

Spectrogram: Non-stationary Sounds

[Figure: spectrogram of an aircraft flyover: tones with Doppler shift and ground effects.]

Spectrograms: Sliding Spectral Estimates

You have to “play” with window sizes (see the sketch after this list).
a. Listen to see if there are any obvious variations you can track, and try a window size of about 1/10 of a variation “period” (Ta). In Matlab: nfft = nearest power of 2 to (0.1·fs·Ta). Typically we choose a Hann window with 50% overlap.
b. Identify the fundamental frequencies of tone complexes to establish the lowest desirable frequency resolution. Based on the frequency analysis and an understanding of the repetition rates in your machine, the minimum window size should be the inverse of (fundamental frequency / 7) for a Hann window. (One harmonic-series example.)
c. Make the window smaller (if the harmonics remain well separated) to see if there are faster fluctuations. As you continue to make the windows smaller, the frequency resolution in Hz (the inverse of the window size in seconds) will get bigger. Eventually the harmonic separation and the spectral resolution will merge (not good).
d. There is always a trade-off between spectral and temporal resolution.
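A minimal Matlab sketch of guideline (a); the signal, the assumed variation "period" Ta, and the parameter values are placeholders.

% Sliding spectral estimate (spectrogram) with a window chosen per guideline (a)
fs = 44100;
x  = randn(3*fs, 1);                          % placeholder signal; replace with audioread(...)
Ta   = 0.5;                                   % an audible variation "period" (assumed)
nfft = 2^nextpow2(0.1*fs*Ta);                 % ~1/10 of Ta, rounded to a power of 2
[S, f, t] = spectrogram(x, hann(nfft), nfft/2, nfft, fs);   % Hann window, 50% overlap
imagesc(t, f, 10*log10(abs(S).^2)); axis xy;
xlabel('Time - s'); ylabel('Frequency - Hz');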

References (for ME 579 at Purdue)