Signal processing. Example data – ChIP-Seq T.N. Siegel, D.R. Hekstra, L.E. Kemp, L.M. Figueiredo,...

Page 1:

Signal processing

Page 2:

Example data – ChIP-Seq

T.N. Siegel, D.R. Hekstra, L.E. Kemp, L.M. Figueiredo, J.E. Lowell, D. Fenyö, X. Wang, S. Dewell, G.A. Cross, "Four histone variants mark the boundaries of polycistronic transcription units in Trypanosoma brucei", Genes Dev. 23 (2009) 1063-1076.

Page 3:

Example Data: Time-Resolved ChIP-chip

[Figure: time-resolved ChIP-chip data for chromosome 16 following α-factor release]

M.D. Sekedat, D. Fenyö, R.S. Rogers, A.J. Tackett, J.D. Aitchison, B.T. Chait, "GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome", Mol Syst Biol. 6 (2010) 353.

Page 4:

Example data – MALDI-TOF

Peptide intensity vs m/z

[Figure: five MALDI-TOF spectra; axes: m/z (x), intensity (y). Displayed m/z ranges: 1000 to 4500, 2280 to 2400, 1300 to 1460, 1444.0 to 1458.0, and 2378.0 to 2394.0]

Page 5:

Example data – ESI-LC-MS/MS

Peptide intensity vs m/z vs time

Fragment intensity vs m/z

[Figure: an LC-MS map with axes time and m/z, and an MS/MS spectrum with axes m/z (250 to 1000) and % relative abundance (0 to 100). Labels in the MS/MS spectrum: [M+2H]2+ and m/z 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080]

Page 6:

Example Data: Super-Resolution Microscopy

Dylan Reid and Eli Rothenberg

Page 7:

Sine

[Figure: a sine wave annotated with its amplitude and wavelength, and a right triangle with legs a and b and hypotenuse c]

$$\sin(\theta) = a/c$$

Page 8:

Sine and Cosine

[Figure: a right triangle with legs a and b and hypotenuse c]

$$\sin(\theta) = a/c$$

$$\cos(\theta) = b/c$$

Page 9:

Two Frequencies

Page 10:

Fourier Transform

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx$$

$$e^{2\pi i \theta} = \cos(2\pi\theta) + i \sin(2\pi\theta)$$

Page 11:

Fourier Transform

from numpy import *
# 1000 samples: sin1 completes 10 cycles and sin2 completes 100 cycles
x = 2.0*pi*arange(1000.0)/100000.0
sin1 = sin(1000.0*x)
sin2 = 0.2*sin(10000.0*x)
sin12 = sin1 + sin2
# FFT for real input: returns 501 complex coefficients
fft12 = fft.rfft(sin12)

[Figure: plot with axis label "Frequency"]
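
To connect the transform to the two sine waves in the signal, the sketch below rescales the magnitude of the coefficients and picks the two largest bins. The scaling factor and the names `amplitude` and `top_bins` are illustrative choices, not part of the slides.

```python
import numpy as np

# Same two-sine signal as on the slide: 1000 samples
n = 1000
x = 2.0 * np.pi * np.arange(n) / 100000.0
sin12 = np.sin(1000.0 * x) + 0.2 * np.sin(10000.0 * x)

fft12 = np.fft.rfft(sin12)
# Scale so that a sine of amplitude A that fits the record exactly shows up as A
amplitude = 2.0 * np.abs(fft12) / n

# The two largest bins, sorted by frequency (in cycles per record)
top_bins = np.sort(np.argsort(amplitude)[-2:])
print(top_bins, amplitude[top_bins])   # bins 10 and 100, amplitudes close to 1.0 and 0.2
```

The bin index counts how many full cycles fit in the 1000-sample record, which is why the peaks land at 10 and 100.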

Page 12:

Inverse Fourier Transform

$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi$$

[Figure: plot with axis label "Frequency"]

Page 13:

Inverse Fourier Transform

from numpy import *
x = 2.0*pi*arange(1000.0)/100000.0
sin1 = sin(1000.0*x)
sin2 = 0.2*sin(10000.0*x)
sin12 = sin1 + sin2
fft12 = fft.rfft(sin12)
# invert the transform; the second argument is the length of the output
sin12_ = fft.irfft(fft12,len(sin12))

[Figure: plot with axis label "Frequency"]

Page 14:

Inverse Fourier Transform

[Figure: plot with axis label "Frequency"]

Page 15:

A Peak

[Figure: a peak on an intensity axis, annotated with its centroid, maximum, height, area, and full width at half maximum (FWHM)]

Moments of a peak: mean, variance, skewness, kurtosis

Page 16:

Mean and variance

A peak is defined by $f(x)$, normalized so that $\sum_x f(x) = 1$.

Mean

$$\mu = \sum_x x\, f(x)$$

Variance

$$\sigma^2 = \sum_x (x - \mu)^2 f(x)$$

Page 17:

Skewness and kurtosis

Skewness

$$\gamma_1 = \sum_x (x - \mu)^3 f(x) / \sigma^3$$

Kurtosis

$$\gamma_2 = \sum_x (x - \mu)^4 f(x) / \sigma^4 - 3$$
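
The four moments can be computed directly from a sampled peak. The following is a minimal sketch, assuming a peak sampled on a uniform grid; the function name `peak_moments` and the test peak are illustrative.

```python
import numpy as np

def peak_moments(x, f):
    """Mean, variance, skewness and excess kurtosis of a peak sampled at x."""
    f = f / f.sum()                      # normalize so that the sum of f(x) is 1
    mean = np.sum(x * f)
    var = np.sum((x - mean) ** 2 * f)
    sigma = np.sqrt(var)
    skewness = np.sum((x - mean) ** 3 * f) / sigma ** 3
    kurtosis = np.sum((x - mean) ** 4 * f) / sigma ** 4 - 3.0
    return mean, var, skewness, kurtosis

x = np.linspace(-1, 1, 1000)
f = np.exp(-(x - 0.2) ** 2 / (2 * 0.1 ** 2))   # Gaussian peak, center 0.2, sigma 0.1
mean, var, skewness, kurtosis = peak_moments(x, f)
print(mean, var, skewness, kurtosis)
```

For this Gaussian test peak the mean and variance come out close to 0.2 and 0.01, and the skewness and kurtosis are close to zero.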

Page 18:

A Gaussian Peak

from numpy import *

def gaussian(x,x0,s):
    # Gaussian of height 1 centered at x0 with standard deviation s
    return exp(-(x-x0)**2/(2*s**2))

x = linspace(-1,1,1000)
y = gaussian(x,0,0.1)
ffty = fft.rfft(y)

[Figure: plot with axis label "Frequency"]

Page 19:

A Gaussian Peak

Skewness = 0

Kurtosis = 0

$$\mathrm{FWHM} = 2\sqrt{2 \log 2}\,\sigma$$

$$\mathrm{area} = \sqrt{2\pi}\,\sigma \cdot \mathrm{height}$$

[Figure: plot with axis label "Frequency"]
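
The FWHM and area relations for a Gaussian can be checked numerically. This is a small sketch on a fine grid; the grid size and the way the half-maximum region is measured are illustrative choices.

```python
import numpy as np

s = 0.1
x = np.linspace(-1, 1, 100001)
y = np.exp(-x ** 2 / (2 * s ** 2))       # Gaussian of height 1 and standard deviation s

dx = x[1] - x[0]
height = y.max()
area = y.sum() * dx

# Width of the region where the peak is at or above half of its height
above = x[y >= height / 2]
fwhm = above[-1] - above[0]

print(fwhm, 2 * np.sqrt(2 * np.log(2)) * s)      # both close to 0.2355
print(area, height * s * np.sqrt(2 * np.pi))     # both close to 0.2507
```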

Page 20:

Peak with a longer tail

$$f(x) = \frac{1}{1 + \left((x - x_0)/\gamma\right)^2}$$

$$\mathrm{FWHM} = 2\gamma$$

$$\mathrm{area} = \pi\gamma \cdot \mathrm{height}$$

[Figure: plot with axis label "Frequency"]

Page 21:

A skewed peak

from numpy import *
from scipy.special import erf  # error function that accepts arrays

def pdf(x):
    return 1/sqrt(2*pi) * exp(-x**2/2)

def cdf(x):
    return (1 + erf(x/sqrt(2))) / 2

def skew(x,e=0,w=1,a=0):
    # skew-normal density with location e, scale w and shape a
    t = (x-e) / w
    return 2 / w * pdf(t) * cdf(a*t)

[Figure: plot with axis label "Frequency"]

Page 22:

Normal noise

from numpy import *
x = linspace(-1,1,1000)
y = 0.2*random.normal(size=len(x))

If the noise is not normally distributed, try to find a transform that makes it normal.

[Figure: plot with axis label "Frequency"]

Page 23:

Lognormal noise

from numpy import *
x = linspace(-1,1,1000)
y = 0.2*random.lognormal(size=len(x))

[Figure: plot with axis label "Frequency"]
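
The advice to look for a transform that makes the noise normal can be illustrated with lognormal noise, where taking the logarithm is the natural choice. This is a sketch with more samples than on the slide; the helper `skewness` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 0.2 * rng.lognormal(size=100000)    # lognormal noise as on the slide, more samples

def skewness(v):
    """Third standardized moment of the samples in v."""
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

log_y = np.log(y)                       # the logarithm of lognormal noise is normal
print(skewness(y), skewness(log_y))     # strongly skewed vs close to zero
```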

Page 24:

Skewed noise

# rejection sampling: keep the x_test values that fall under the skewed density
# x is the 1000-point grid and skew() the function from the earlier slides
x_test = random.uniform(-1.0,1.0,size=10*len(x))
y = random.uniform(0.0,1.0,size=10*len(x))
yskew = skew(x_test,-0.1,0.2,10)
yskew = yskew/max(yskew)
yn_skew = x_test[y<yskew][:len(x)]

[Figure: plot with axis label "Frequency"]

Page 25:

Gaussian peak with normal noise

[Figure: three panels with axis label "Frequency"]

Page 26:

Removing High Frequencies

[Figure: plot with axis label "Frequency"]
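
One way to remove high frequencies is to zero the upper Fourier coefficients and transform back. The sketch below assumes a Gaussian peak with normal noise as on the earlier slides; the cutoff bin is a hypothetical value chosen for this example, not one taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
clean = np.exp(-x ** 2 / (2 * 0.1 ** 2))          # Gaussian peak, sigma = 0.1
noisy = clean + 0.2 * rng.normal(size=len(x))     # add normal noise

spectrum = np.fft.rfft(noisy)
cutoff = 20                                       # hypothetical cutoff bin
spectrum[cutoff:] = 0.0                           # remove the high frequencies
filtered = np.fft.irfft(spectrum, len(noisy))

def rms(v):
    return np.sqrt(np.mean(v ** 2))

# Deviation from the noise-free peak before and after filtering
print(rms(noisy - clean), rms(filtered - clean))
```

A broad peak is concentrated in the lowest bins while white noise is spread over all of them, so the cut removes mostly noise.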

Page 27:

Convolution

http://en.wikipedia.org/wiki/Convolution

$$(f * g)(t) = \sum_{\tau} f(\tau)\, g(t - \tau)$$

Convolution describes the response of a linear and time-invariant system to an input signal.

It equals the inverse Fourier transform of the pointwise product in frequency space.
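
The relation between convolution and the Fourier transform can be checked numerically. This is a minimal sketch with random test signals; the zero padding to the full output length is needed because the FFT computes a circular convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=64)
g = rng.normal(size=16)

direct = np.convolve(f, g)                 # full linear convolution, length 64 + 16 - 1

# Zero-pad both signals to the full output length so that the circular
# convolution computed via the FFT equals the linear convolution
n = len(f) + len(g) - 1
via_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

print(np.allclose(direct, via_fft))        # True
```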

Page 28:

Smoothing by convolution

Page 29:

Smoothing

from numpy import *
# moving average over 2*width+1 points; 'valid' drops width points at each end
w = ones(2*width+1,'d')
convolve(w/w.sum(),y,'valid')

[Figure: three panels with axis labels "Intensity" and "Frequency"]

Page 30:

Smoothing

Page 31:

Smoothing

Page 32:

Adaptive Background Correction (unsharp masking)

$$I'(l, w, d) = I(l) - \frac{d}{2w+1} \sum_{k=l-w}^{l+w} I(k)$$

[Figure: two panels, "Original" and "Unsharp masking"]

from numpy import *
wi = linspace(1,window_len,window_len)
# weights fall off with distance from the center of the window
w = 1 / ( 2*r_[wi[::-1],0,wi] + 1 )
# 'same' keeps the result the same length as x so that the subtraction works
x_ = x - d*convolve(w/w.sum(),x,'same')

Page 33:

Adaptive Background Correction

Page 34:

Smoothing and Adaptive Background Correction

Page 35:

Savitzky-Golay smoothing

[Figure: grid of panels for polynomial order = 3, 5, 7 and bin size = 25, 75, 150]
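
Savitzky-Golay smoothing fits a low-order polynomial to the points in each window and keeps the fitted value at the window center. The sketch below is one way to write this with NumPy only; it assumes that "bin size" means the window length in points (odd here), and the function name `savgol_smooth` is illustrative.

```python
import numpy as np

def savgol_smooth(y, window, order):
    """Fit a polynomial of the given order in each window, keep its center value."""
    half = window // 2
    t = np.arange(-half, half + 1)
    # A least-squares polynomial fit evaluated at t = 0 is a fixed linear
    # combination of the window samples; row 0 of the pseudo-inverse of the
    # Vandermonde matrix holds those weights.
    design = np.vander(t, order + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    # The weights are symmetric, so convolution and correlation coincide;
    # 'valid' drops half a window at each end
    return np.convolve(y, weights, mode='valid')

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
clean = np.exp(-x ** 2 / (2 * 0.1 ** 2))
noisy = clean + 0.2 * rng.normal(size=len(x))

smooth = savgol_smooth(noisy, 25, 3)     # bin size 25, polynomial order 3
print(len(noisy), len(smooth))           # 1000 976
```

Unlike a plain moving average, this filter reproduces any polynomial up to the chosen order exactly, which helps preserve peak heights and widths.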

Page 36:

Background

[Figure: two panels with axis label "Frequency"]

Page 37:

Background Subtraction Using Smoothing

[Figure: for each of bin size = 100, 200, 300, a "Smoothing" panel and a "Background subtraction" panel]

Page 38:

Root Mean Square Deviation (RMSD)

$$\mathrm{RMSD}(l) = \sqrt{\frac{1}{w} \sum_{|k-l| \le w/2} \left(I(k) - \bar{I}_l\right)^2}$$

where $\bar{I}_l$ is the mean of $I(k)$ over the same window of size $w$.

The Root Mean Square Deviation (RMSD) is often constant for the noise and larger for the peak if the window size is approximately the size of the peak.
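
A moving RMSD can be computed from two moving averages, using the identity that the variance is the mean of the squares minus the square of the mean. The sketch below assumes a window of `w` points and a simulated peak on normal noise; the function name `moving_rmsd` and all parameter values are illustrative.

```python
import numpy as np

def moving_rmsd(y, w):
    """RMSD of y around its local mean in a sliding window of w points."""
    kernel = np.ones(w) / w
    mean = np.convolve(y, kernel, mode='valid')            # local mean of y
    mean_sq = np.convolve(y ** 2, kernel, mode='valid')    # local mean of y**2
    # Mean of squares minus square of the mean; clip tiny negatives from rounding
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
y = 0.2 * rng.normal(size=len(x)) + 5.0 * np.exp(-x ** 2 / (2 * 0.05 ** 2))
rmsd = moving_rmsd(y, 100)

# The typical value reflects the noise level; the largest value marks the peak
print(np.median(rmsd), rmsd.max())
```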

Page 39:

Background Subtraction using RMSD

[Figure: for each of bin size = 100, 200, 300, panels showing intensity and RMSD]

Page 40:

Convolution, Cross-correlation, and Autocorrelation

http://en.wikipedia.org/wiki/Convolution

Convolution describes the response of a linear and time-invariant system to an input signal.

It equals the inverse Fourier transform of the pointwise product in frequency space.

Cross-correlation is a measure of similarity of two signals.

It can be used for finding a shift between two signals.

Auto-correlation is the cross-correlation of a signal with itself.

It can be used for finding periodic signals obscured by noise.
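
Finding a shift between two signals with the cross-correlation can be sketched as follows. The test signal, the shift of 25 points and the noise level are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
signal = np.zeros(n)
signal[[100, 180, 330]] = 1.0                       # three sharp peaks
kernel = np.exp(-np.arange(-10, 11) ** 2 / 18.0)    # Gaussian peak shape, sigma = 3 points
signal = np.convolve(signal, kernel, mode='same')

true_shift = 25
shifted = np.roll(signal, true_shift) + 0.05 * rng.normal(size=n)

# Full cross-correlation; index 0 corresponds to a lag of -(n - 1)
xcorr = np.correlate(shifted, signal, mode='full')
lags = np.arange(-(n - 1), n)
estimated_shift = lags[np.argmax(xcorr)]
print(estimated_shift)
```

The lag with the largest cross-correlation is the estimate of the shift; with this little noise it should land at or next to the true value.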

Page 41:

Cross-correlation and autocorrelation

http://en.wikipedia.org/wiki/Convolution

Cross-correlation:

$$(f \star g)(t) = \sum_{\tau} f(\tau)\, g(t + \tau)$$

Autocorrelation:

$$(f \star f)(t) = \sum_{\tau} f(\tau)\, f(t + \tau)$$

Page 42:

Autocorrelation

[Figure: a signal, the same signal, and their autocorrelation]

Page 43:

Cross-correlation

[Figure: a signal, a shifted copy of the signal, and their cross-correlation]

Page 44:

Cross-correlation

[Figure: a signal, a copy with half of the peaks shifted, and their cross-correlation]

Page 45:

How similar are two signals?

Dot product

$$A = (a_1, a_2, \ldots, a_n), \quad B = (b_1, b_2, \ldots, b_n)$$

$$A \cdot B = \sum_i a_i b_i = |A|\,|B| \cos\theta$$

For vectors of unit length:

Identical vectors: $\theta = 0$, $A \cdot B = 1$

Perpendicular vectors: $\theta = \pi/2$, $A \cdot B = 0$

The dot product is the same as the cross-correlation at zero:

$$(f \star g)(0) = \sum_{\tau} f(\tau)\, g(\tau)$$
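
A short sketch of the dot product as a similarity score, and of its equality with the cross-correlation at zero lag. The function name `similarity` and the example vectors are illustrative.

```python
import numpy as np

def similarity(a, b):
    """Normalized dot product: the cosine of the angle between a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
same = similarity(a, a)                                           # identical vectors: close to 1
perpendicular = similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))   # 0
print(same, perpendicular)

# The dot product equals the cross-correlation at zero lag
f = np.array([0.0, 1.0, 2.0, 1.0])
g = np.array([1.0, 2.0, 1.0, 0.0])
zero_lag = np.correlate(f, g, mode='full')[len(g) - 1]
print(np.dot(f, g), zero_lag)                                     # both 4.0
```

Dividing by the two lengths makes the score independent of the overall intensity of each signal.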

Page 46:

What are the characteristics of the dot product?

[Figure: grid of panels comparing "Signal+Noise" with "Noise" for S/N = 10, 3, 1, 0.3, 0.1 and dimensions = 10, 100, 1000]

Page 47:

Autocorrelation

[Figure: a signal, a shifted signal, the sum of the signal and the shifted signal, and the autocorrelation]

Page 48:

Coincidence – enhances the signal

The signal-to-noise ratio can be dramatically increased by measuring several independent signals of the same phenomenon and combining these signals.

[Figure: the ideal signal, four measurements, and the product of the four measurements]
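
The effect of combining independent measurements by multiplication can be simulated. In this sketch the peak shape, the noise level and the helper `peak_to_noise` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
ideal = np.exp(-x ** 2 / (2 * 0.05 ** 2))                 # ideal signal: one peak

# Four independent noisy measurements of the same signal
measurements = ideal + 0.3 * rng.normal(size=(4, len(x)))
product = np.prod(measurements, axis=0)

def peak_to_noise(y):
    """Value at the peak position divided by the spread far from the peak."""
    far = np.abs(x) > 0.5
    return y[np.argmax(ideal)] / y[far].std()

print(peak_to_noise(measurements[0]), peak_to_noise(product))
```

Away from the peak the product multiplies four small noise values, so the noise shrinks much faster than the signal at the peak.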

Page 49:

Coincidence – suppresses and transforms the noise

[Figure: two panels, "Original noise" and "Noise in product"]

Page 50:

Coincidence – suppresses interference

[Figure: the ideal signal, four measurements with interference, and the product of the four measurements]

Page 51:

Peak Finding

The derivative of a function is zero at its minima and maxima.

The second derivative is negative at maxima and positive at minima.
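
The two derivative rules can be turned into a simple peak finder for smooth, noise-free data. This is a sketch on a made-up signal with two Gaussian peaks; on noisy data the signal would need smoothing first.

```python
import numpy as np

x = np.linspace(-1, 1, 2001)
y = (np.exp(-(x + 0.4) ** 2 / (2 * 0.05 ** 2))
     + 0.5 * np.exp(-(x - 0.3) ** 2 / (2 * 0.1 ** 2)))

dy = np.gradient(y, x)          # first derivative
d2y = np.gradient(dy, x)        # second derivative

# A maximum: the first derivative changes sign from positive to non-positive
# and the second derivative is negative
crossing = (dy[:-1] > 0) & (dy[1:] <= 0)
maxima = np.where(crossing & (d2y[:-1] < 0))[0]
print(x[maxima])                # close to -0.4 and 0.3
```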

Page 52:

Detection of steps

Motivation: To demonstrate a general strategy for separating signal from noise:

1. Characterize the signal and the noise
2. Make a model of the data
3. Select detection method
4. Select parameters using simulations

[Figure: plot with axis label "Intensity"]

Page 53:

Detection of steps: Characterization of noise

Remove signal by subtracting a moving average

Page 54:

Detection of steps: Model of data

from numpy import *
noise = 1.0   # example value: standard deviation of the noise
signal = 1.0  # example value: height of the step, so S/N = signal/noise
points = 1000
x = linspace(-1,1,points)
y = noise*random.normal(size=len(x))
y[points//2:] += signal  # integer division: slice indices must be integers

[Figure: simulated steps for S/N = 0.75, S/N = 1, S/N = 2]

Page 55:

Detection of steps: Detection method

Steps can be converted into peaks by calculating the difference between the moving average in two windows.

[Figure: panels for S/N = 0.75, S/N = 1, S/N = 2]
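
The conversion of a step into a peak can be sketched by subtracting the average in a window to the left of each point from the average in a window to its right. The S/N, the window size and the variable names below are example choices.

```python
import numpy as np

rng = np.random.default_rng(0)
points = 1000
noise, signal = 1.0, 2.0                       # example values, S/N = 2
y = noise * rng.normal(size=points)
y[points // 2:] += signal                      # step at index 500

w = 100                                        # window (bin) size, example value
# csum[i] is the sum of y[:i], so csum[i + w] - csum[i] is the sum of y[i:i + w]
csum = np.concatenate(([0.0], np.cumsum(y)))
window_mean = (csum[w:] - csum[:-w]) / w       # window_mean[i] = mean of y[i:i + w]

# Difference between the window to the right and the window to the left of k
k = np.arange(w, points - w + 1)
score = window_mean[k] - window_mean[k - w]
step_position = k[np.argmax(score)]
print(step_position)
```

The score forms a triangular peak centered on the step, so the position of its maximum estimates where the step occurs.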

Page 56:

Detection of steps: Detection method

[Figure: grid of panels showing the average intensity for S/N = 0.75, 1, 2 and bin size = 10, 30, 100]

Page 57:

Detection of steps: Simulations - peak location

[Figure: grid of panels for S/N = 0.05, 0.25, 1 and bin size = 10, 30, 100]

Page 58:

Detection of steps: Simulations – correct peak

[Figure: grid of panels for S/N = 0.05, 0.25, 1 and bin size = 10, 30, 100; axes: score (x), frequency (y)]

Page 59:

Detection of steps: Simulations - FDR and FNR

[Figure: grid of panels for S/N = 0.05, 0.25, 1 and bin size = 10, 30, 100; axes: threshold (x), false rate (y); curves: False Discovery Rate and False Negative Rate]

Page 60:

Peak Finding

1. Characterize the signal and the noise
2. Make a model of the data
3. Select detection method
4. Select parameters using simulations

[Figure: plot with axis label "Intensity"]

Page 61:

Peak Finding: Characterizing the noise

Let’s first try without removing the peaks.

[Figure: plot with axis label "Intensity"]

Page 62:

Peak Finding: Characterizing the noise

Removing the peaks by looking for outliers in the root mean square deviation (RMSD)

[Figure: panels with axis labels "Intensity" and "RMSD"]

Page 63:

Peak Finding: Characterizing the peaks

[Figure: plot with axis label "Intensity"]

Page 64:

Peak Finding: Model of data

from numpy import *
# noise and signal set the S/N; gaussian() is defined on the "A Gaussian Peak" slide
points = 1000
x = linspace(-1,1,points)
y = noise*random.normal(size=len(x))
y += signal*gaussian(x,0,0.01)

[Figure: simulated peaks for S/N = 1, S/N = 2, S/N = 4]

Page 65:

Peak Finding: Detection method

Peaks can be detected by finding maxima in the moving average with a window size similar to the peak width.

[Figure: panels for S/N = 1, S/N = 2, S/N = 4]

$$S(l) = \sum_{k=l-w}^{l+w} I(k)$$
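
Detecting a peak as the maximum of a moving sum can be sketched as follows. The data model follows the "Model of data" slide; the values of `noise`, `signal` and the half-width `w` are example choices.

```python
import numpy as np

rng = np.random.default_rng(0)
points = 1000
x = np.linspace(-1, 1, points)
noise, signal = 1.0, 4.0                                   # example values, S/N = 4
y = noise * rng.normal(size=points)
y += signal * np.exp(-x ** 2 / (2 * 0.01 ** 2))            # Gaussian peak at 0, sigma 0.01

w = 5                                                      # half-width of the window, in points
# Moving sum over the 2w + 1 points centered on each position;
# 'same' implies zero padding at the edges
S = np.convolve(y, np.ones(2 * w + 1), mode='same')
peak_index = np.argmax(S)
print(x[peak_index])                                       # close to 0
```

With a grid spacing of 0.002 the peak has a standard deviation of 5 points, so the 11-point window is similar in size to the peak.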

Page 66:

Peak Finding: Detection method – moving average

[Figure: grid of panels; rows: S/N = 1, 2, 4; columns: signal, bin size = 5, 20, 80]

Page 67:

Peak Finding: Detection method – RMSD

[Figure: grid of panels; rows: S/N = 1, 2, 4; columns: signal, bin size = 5, 20, 80]

Page 68:

Peak Finding: Information about the Peak

[Figure: a peak on an intensity axis, annotated with its centroid (mean), maximum, height, area, and full width at half maximum (FWHM)]

Moments of a peak: mean, variance, skewness, kurtosis

Page 69:

Information about a Peak

A peak is defined by $f(x)$.

Centroid or mean

$$\mathrm{centroid} = \frac{\sum_x x\, f(x)}{\sum_x f(x)}$$

$$\mathrm{area} = \sum_x f(x)$$

$$\mathrm{height} = \max(f(x))$$

To calculate any of these measures we need to know where the peak starts and ends.

Page 70:

Where does a peak start and end?

Page 71:

Estimating peptide quantity

[Figure: peaks on axes m/z and intensity, with quantity estimated by peak height, peak area, and curve fitting]

Page 72:

Time dimension

[Figure: panels with axes m/z, intensity, and time]

Page 73:

Sampling

[Figure: a sampled peak; axes: retention time (x), intensity (y)]

Page 74:

Sampling

[Figure: thresholds (90%) on the y axis, 0.5 to 1.1, vs # of points on the x axis, 1 to 10]

Page 75:

What is the best way to estimate quantity?

Peak height - resistant to interference - poor statistics

Peak area - better statistics - more sensitive to interference

Curve fitting - better statistics - needs to know the peak shape - slow
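
The difference in statistics between peak height and peak area can be illustrated with a small simulation of repeated noisy measurements of one peak. The peak shape, the noise level, the number of repeats and the peak limits are all assumptions made for this sketch, and it does not cover interference or curve fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
dx = x[1] - x[0]
peak = np.exp(-x ** 2 / (2 * 0.05 ** 2))      # ideal peak, height 1
inside = np.abs(x) < 0.15                     # assumed peak limits (3 sigma)

heights, areas = [], []
for _ in range(500):
    y = peak + 0.1 * rng.normal(size=len(x))  # one noisy measurement
    heights.append(y[inside].max())
    areas.append(y[inside].sum() * dx)

heights, areas = np.array(heights), np.array(areas)
# Relative spread (coefficient of variation) of the two estimates
cv_height = heights.std() / heights.mean()
cv_area = areas.std() / areas.mean()
print(cv_height, cv_area)
```

The area averages the noise over all points in the peak, so its relative spread should come out smaller than that of the height, which depends on only the top few points.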

Page 76:

Homework: Background Subtraction Using Smoothing

Page 77:

Summary

Fourier transform - transformation to frequency space and back

Signal – how do we detect and characterize signals?

Noise – how do we characterize noise?

Modeling signal and noise

Simulation to select thresholds and parameters

Filters – filtering by low-pass (i.e. smoothing) and high-pass filters (e.g. adaptive background correction)

Detection methods based on moving average and RMSD

Convolution - describes the response of a linear and time-invariant system to an input signal

Cross-correlation is a measure of similarity of two signals

Autocorrelation can be used for finding periodic signals obscured by noise

The dot product can be used to determine how similar two signals are

Coincidence measurements enhance the signal and suppress noise

The quantity associated with a peak – height and area

Sampling – how often do we need to sample a peak to get a good estimate of its area?