Pattern Recognition applied to Biomedical Signals

Dr Philip de Chazal
Chief Technical Officer
BiancaMed, Dublin, Ireland

Outline

1st hour: Focus on pattern recognition basics
1. Pattern Recognition Overview
2. Classifiers: LDA, QDA, FFNN, HMM etc.
3. Performance Assessment
– Data splitting
– Performance measures (sensitivity, specificity etc.)
4. Features
– Transformations
– Missing values
– Feature Selection

Outline

2nd hour: Case study on sleep apnea detection from the electrocardiogram
– Database
– Expert annotations
– Features
– Performance assessment
– Practical tips

1. Pattern Recognition Overview

[Diagram: taxonomy of pattern recognition]

Pattern Recognition
– Supervised learning
  – Parametric: plug-in parameters, distributed parameters
  – Non-parametric
– Unsupervised learning: self organising feature maps, cluster analysis, Hebbian learning, vector quantisation

Methods shown under supervised learning: nearest neighbour (kNN), decision trees, discriminant analysis, Bayesian classifiers, neural networks (feedforward etc.), HMM

Classifier Types: Parametric models with plug-in parameters

Define a parametric model between the input features and the output classes
The model has adjustable parameters which are set using training data

[Diagram: Features → Classifier → Classes, with Adjustable Parameters feeding the Classifier]

2. Classification Methods

Bayes theorem
Gaussian models
Linear discriminant analysis
– Derivation
– Covariance inversion
– Training Equations
– Classifying Equations
Quadratic discriminant analysis
Feedforward neural networks

Bayes theorem

[Figure: the curves p(x|C1)P(C1) and p(x|C2)P(C2) plotted against x, with decision regions ℜ1 and ℜ2]

Bayes theorem

Optimally classify an object into one of c mutually exclusive classes given priors and class densities.

For a c-class problem, Bayes' rule states that the posterior probability of the kth class is related to its prior probability and its class density function by

$$p_k = \frac{\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)}{\sum_{l=1}^{c} \pi_l f_l(\mathbf{x}, \boldsymbol{\theta}_l)}$$

where $p_k$ is the posterior probability, $\pi_k$ is the prior probability, $f_k$ is the class density and the denominator is the normalising component.
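As a minimal sketch of Bayes' rule, the snippet below computes posteriors for a hypothetical two-class problem. The univariate Gaussian class densities and all numeric values are assumptions chosen for illustration; they are not taken from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density, used here as an example class density f_k."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, means, stds):
    """Bayes' rule: p_k = pi_k f_k(x) / sum_l pi_l f_l(x)."""
    joint = [p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
    total = sum(joint)  # the normalising component
    return [j / total for j in joint]

# Hypothetical two-class problem with equal priors (assumed values).
p = posteriors(x=1.0, priors=[0.5, 0.5], means=[0.0, 2.0], stds=[1.0, 1.0])
```

Because x = 1.0 lies midway between the two assumed class means and the priors are equal, both posteriors come out equal; moving x towards one mean favours that class.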

Likelihood function

If we have $N = \sum_{k=1}^{c} N_k$ labelled data (d×1) vectors, $\mathbf{x}_n^{(k)}$, $n = 1..N_k$, then a likelihood can be formed as follows:

$$l(\boldsymbol{\theta}) = \prod_{k=1}^{c} \prod_{n=1}^{N_k} \pi_k f_k(\mathbf{x}_n^{(k)}, \boldsymbol{\theta}_k)$$

Our aim is to find the values of θ for each class that maximise the value of the likelihood $l(\boldsymbol{\theta})$. Equivalently we can find the values of θ that maximise the value of the log-likelihood:

$$L(\boldsymbol{\theta}) = \log l(\boldsymbol{\theta}) = \sum_{k=1}^{c} \sum_{n=1}^{N_k} \log\left(\pi_k f_k(\mathbf{x}_n^{(k)}, \boldsymbol{\theta}_k)\right) = \sum_{k=1}^{c} \sum_{n=1}^{N_k} \log f_k(\mathbf{x}_n^{(k)}, \boldsymbol{\theta}_k) + \sum_{k=1}^{c} N_k \log \pi_k$$

Think of it as combined probabilities.

LDA: Gaussian parametric model for the class densities

The class densities are modelled with a Gaussian model (d-dimensional) with common covariance across all classes:

$$f_k(\mathbf{x}, \boldsymbol{\theta}_k = \{\boldsymbol{\mu}_k, \boldsymbol{\Sigma}\}) = (2\pi)^{-d/2} |\boldsymbol{\Sigma}|^{-1/2} \exp\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)\right]$$

where $\boldsymbol{\mu}_k$ is the class mean and $\boldsymbol{\Sigma}$ is the common covariance.

LDA: Log-likelihood

For a training example

$$\log f_k(\mathbf{x}_n^{(k)}, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}) = -\tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}| - \tfrac{1}{2} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)$$

Hence

$$\sum_{k=1}^{c} \sum_{n=1}^{N_k} \log f_k(\mathbf{x}_n^{(k)}, \boldsymbol{\theta}_k) = -\tfrac{dN}{2} \log(2\pi) - \tfrac{N}{2} \log|\boldsymbol{\Sigma}| - \tfrac{1}{2} \sum_{k=1}^{c} \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)$$

And the log-likelihood over all training examples

$$L(\boldsymbol{\theta} = \{\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_c, \boldsymbol{\Sigma}\}) = -\tfrac{dN}{2} \log(2\pi) - \tfrac{N}{2} \log|\boldsymbol{\Sigma}| - \tfrac{1}{2} \sum_{k=1}^{c} \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k) + \sum_{k=1}^{c} N_k \log \pi_k$$

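LDA: Maximising the log-likelihood function

1. Differentiate with respect to each class mean:

$$\frac{\partial L}{\partial \boldsymbol{\mu}_k} = \text{[many steps omitted!]} = \boldsymbol{\Sigma}^{-1} \left( \sum_{n=1}^{N_k} \mathbf{x}_n^{(k)} - N_k \boldsymbol{\mu}_k \right) = 0$$

$$\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N_k} \mathbf{x}_n^{(k)}$$

2. Differentiate with respect to the inverse of the common covariance:

$$\frac{\partial L}{\partial \boldsymbol{\Sigma}^{-1}} = \text{[many steps omitted!]} = \mathbf{H} \times \left( N \boldsymbol{\Sigma} - \sum_{k=1}^{c} \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)(\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \right)$$

Setting this to zero gives

$$\boldsymbol{\Sigma} = \frac{1}{N} \sum_{k=1}^{c} \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)(\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T$$

Here × is read as element-by-element multiplication and

$$\mathbf{H} = \begin{bmatrix} 0.5 & 1 & \cdots & 1 \\ 1 & 0.5 & & \vdots \\ \vdots & & \ddots & 1 \\ 1 & \cdots & 1 & 0.5 \end{bmatrix}$$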
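The two maximum-likelihood estimates above can be sketched in plain Python as follows. The class labels and the tiny two-feature data set are hypothetical, chosen so that the result is easy to check by hand.

```python
def class_means(data):
    """data maps class label -> list of feature vectors (lists of floats)."""
    means = {}
    for k, vectors in data.items():
        d = len(vectors[0])
        means[k] = [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]
    return means

def pooled_covariance(data, means):
    """Common covariance: sum over classes of (x - mu_k)(x - mu_k)^T, divided by N."""
    d = len(next(iter(means.values())))
    n_total = sum(len(v) for v in data.values())
    cov = [[0.0] * d for _ in range(d)]
    for k, vectors in data.items():
        for x in vectors:
            diff = [x[i] - means[k][i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j] / n_total
    return cov

# Hypothetical training data: two classes, two features.
train = {
    "normal": [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
    "apnea": [[4.0, 4.0], [6.0, 4.0], [4.0, 6.0], [6.0, 6.0]],
}
mu = class_means(train)
sigma = pooled_covariance(train, mu)
```

Note that the divisor is N, the maximum-likelihood choice on the slide, rather than the N − c used by some unbiased estimators.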

QDA: Separate Gaussian parametric model for the class densities

The class densities are modelled with a Gaussian model (d-dimensional) with a separate covariance for each class:

$$f_k(\mathbf{x}, \boldsymbol{\theta}_k = \{\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}) = (2\pi)^{-d/2} |\boldsymbol{\Sigma}_k|^{-1/2} \exp\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)\right]$$

where $\boldsymbol{\mu}_k$ is the class mean and $\boldsymbol{\Sigma}_k$ is the class specific covariance.

QDA: Log-likelihood

For a training example

$$\log f_k(\mathbf{x}_n^{(k)}, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = -\tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}_k| - \tfrac{1}{2} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)$$

Hence

$$\sum_{k=1}^{c} \sum_{n=1}^{N_k} \log f_k(\mathbf{x}_n^{(k)}, \boldsymbol{\theta}_k) = -\tfrac{dN}{2} \log(2\pi) - \tfrac{1}{2} \sum_{k=1}^{c} N_k \log|\boldsymbol{\Sigma}_k| - \tfrac{1}{2} \sum_{k=1}^{c} \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)$$

And the log-likelihood over all training examples

$$L(\boldsymbol{\theta} = \{\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_c\}) = -\tfrac{dN}{2} \log(2\pi) - \tfrac{1}{2} \sum_{k=1}^{c} N_k \log|\boldsymbol{\Sigma}_k| - \tfrac{1}{2} \sum_{k=1}^{c} \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k) + \sum_{k=1}^{c} N_k \log \pi_k$$

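QDA: Maximising the log-likelihood function

1. Differentiate with respect to each class mean:

$$\frac{\partial L}{\partial \boldsymbol{\mu}_k} = \text{[many steps omitted!]} = \boldsymbol{\Sigma}_k^{-1} \left( \sum_{n=1}^{N_k} \mathbf{x}_n^{(k)} - N_k \boldsymbol{\mu}_k \right) = 0$$

$$\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N_k} \mathbf{x}_n^{(k)}$$

2. Differentiate with respect to the inverse of each class covariance:

$$\frac{\partial L}{\partial \boldsymbol{\Sigma}_k^{-1}} = \text{[many steps omitted!]} = \mathbf{H} \times \left( N_k \boldsymbol{\Sigma}_k - \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)(\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T \right)$$

Setting this to zero gives

$$\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N_k} (\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)(\mathbf{x}_n^{(k)} - \boldsymbol{\mu}_k)^T$$

Here × is read as element-by-element multiplication and

$$\mathbf{H} = \begin{bmatrix} 0.5 & 1 & \cdots & 1 \\ 1 & 0.5 & & \vdots \\ \vdots & & \ddots & 1 \\ 1 & \cdots & 1 & 0.5 \end{bmatrix}$$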

Covariance inversion

If a covariance matrix does not have full rank then the matrix cannot be inverted.
Work around
– Identify columns of the covariance (CV) matrix with zero eigenvalues
– Remove these columns from the CV matrix (equivalent to removing the corresponding features)
– Invert the submatrix

Training Equations

Define an associated (c×1) target vector $\mathbf{t}_n$, which has one element set to 1 and all other elements set to zero, for each of the N (d×1) training feature vectors $\mathbf{x}_n$. The position of the element with value 1 indicates the class, e.g. for a four class problem a target vector indicating that the associated training feature vector belongs to class 3 is $\mathbf{t} = [0\ 0\ 1\ 0]^T$.

Form
– a (d×N) matrix of feature vectors, $\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \ldots\ \mathbf{x}_N]$
– a (c×N) matrix of target vectors, $\mathbf{T} = [\mathbf{t}_1\ \mathbf{t}_2\ \ldots\ \mathbf{t}_N]$
– a (d×c) matrix of mean vectors, $\mathbf{M} = [\boldsymbol{\mu}_1\ \boldsymbol{\mu}_2\ \ldots\ \boldsymbol{\mu}_c]$
– a (c×1) vector of prior probabilities, $\boldsymbol{\pi} = [\pi_1\ \pi_2\ \ldots\ \pi_c]^T$

Matlab implementation – Target vectors

Target vector   Vector elements
                Class 1   Class 2   Class 3   Class 4   Class 5
t1              1         0         0         0         0
t2              0         1         0         0         0
t3              0         0         1         0         0
t4              0         0         0         1         0
t5              0         0         0         0         1
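The target-vector encoding can be sketched as below. The slide refers to a Matlab implementation that is not shown; this is a plain-Python illustration instead, and the function names are hypothetical.

```python
def target_vector(class_index, num_classes):
    """Return the (c x 1) target vector for a 1-based class index."""
    if not 1 <= class_index <= num_classes:
        raise ValueError("class_index must lie in 1..num_classes")
    return [1 if i == class_index else 0 for i in range(1, num_classes + 1)]

def target_matrix(labels, num_classes):
    """Return T as a list of c rows and N columns (one column per training vector)."""
    columns = [target_vector(k, num_classes) for k in labels]
    return [[col[row] for col in columns] for row in range(num_classes)]

t = target_vector(3, 4)             # class 3 of a four class problem
T = target_matrix([1, 3, 3, 2], 3)  # four training vectors, three classes
```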

Training Equations

$$\mathbf{M} = \mathbf{X}\mathbf{T}^T (\mathbf{T}\mathbf{T}^T)^{-1}$$

LDA

$$\boldsymbol{\Sigma} = \mathbf{X} (\mathbf{X} - \mathbf{M}\mathbf{T})^T / N$$

QDA

$$\boldsymbol{\Sigma}_k = \mathbf{X} \left( \mathbf{X} \times (\mathbf{1}_{[d,1]} \mathbf{T}_{k.}) - \boldsymbol{\mu}_k \mathbf{T}_{k.} \right)^T / N_k$$

$\mathbf{T}_{k.}$ means the kth row of matrix $\mathbf{T}$.
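As a small check of the matrix form of the mean estimate, the sketch below evaluates M = X Tᵀ (T Tᵀ)⁻¹ on hypothetical data. It assumes one-hot target columns and that every class appears at least once, so T Tᵀ is diagonal (it holds the class counts) and can be inverted element by element; a general matrix inverse is not implemented here.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def class_mean_matrix(X, T):
    """M = X T^T (T T^T)^-1 for one-hot targets.

    With one-hot columns, T T^T is diagonal and holds the class counts N_k,
    so its inverse is taken element by element on the diagonal.
    """
    Tt = transpose(T)
    counts = matmul(T, Tt)                      # (c x c), diagonal
    c = len(counts)
    inv = [[1.0 / counts[i][i] if i == j else 0.0 for j in range(c)]
           for i in range(c)]
    return matmul(matmul(X, Tt), inv)           # (d x c)

# Hypothetical data: d = 2 features, N = 4 vectors, c = 2 classes.
X = [[1.0, 3.0, 10.0, 20.0],
     [2.0, 4.0, 30.0, 50.0]]
T = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
M = class_mean_matrix(X, T)
```

Each column of M is the mean of the feature vectors of one class, which is the same estimate as the per-class average in the maximum-likelihood derivation.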

Processing – general form

Let K be any term that does not depend on the class k. Then

$$p_k = \frac{\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)}{\sum_{l=1}^{c} \pi_l f_l(\mathbf{x}, \boldsymbol{\theta}_l)} = \frac{\exp(K)\, \pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)}{\sum_{l=1}^{c} \exp(K)\, \pi_l f_l(\mathbf{x}, \boldsymbol{\theta}_l)}$$

$$= \frac{\exp(K) \exp\left(\log(\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k))\right)}{\sum_{l=1}^{c} \exp(K) \exp\left(\log(\pi_l f_l(\mathbf{x}, \boldsymbol{\theta}_l))\right)} = \frac{\exp\left(\log(\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)) + K\right)}{\sum_{l=1}^{c} \exp\left(\log(\pi_l f_l(\mathbf{x}, \boldsymbol{\theta}_l)) + K\right)}$$

$$= \frac{\exp(y_k)}{\sum_{l=1}^{c} \exp(y_l)}, \quad \text{where } y_k = \log(\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)) + K$$

Now

$$\sum_{k=1}^{c} p_k = \sum_{k=1}^{c} \frac{\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)}{\sum_{l=1}^{c} \pi_l f_l(\mathbf{x}, \boldsymbol{\theta}_l)} = 1$$

$$\therefore\ p_c = 1 - \sum_{k=1}^{c-1} p_k = 1 - \sum_{k=1}^{c-1} \frac{\exp(y_k)}{\sum_{l=1}^{c} \exp(y_l)}$$
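The freedom to add a class-independent constant K is what makes the posterior computation numerically safe. The sketch below picks K so that the largest score becomes zero before exponentiating; this shift is a common implementation choice and an assumption here, not something stated on the slide.

```python
import math

def posteriors_from_scores(y):
    """p_k = exp(y_k) / sum_l exp(y_l), computed with a shift for numerical safety.

    Subtracting max(y) is the same as choosing a different constant K,
    which cancels between numerator and denominator.
    """
    shift = max(y)
    exps = [math.exp(v - shift) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

# Scores this negative would underflow to zero without the shift.
p = posteriors_from_scores([-1000.0, -1001.0, -1002.0])
```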

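Processing - LDA

$$\log(\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)) = \log \pi_k - \tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)$$

$$= \log \pi_k - \tfrac{1}{2} \left( \mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - 2 \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{x} + \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k \right) - \tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}|$$

$$= \log \pi_k + \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - \tfrac{1}{2} \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k - K, \quad \text{where } K = \tfrac{d}{2} \log(2\pi) + \tfrac{1}{2} \log|\boldsymbol{\Sigma}| + \tfrac{1}{2} \mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}$$

$$y_k = \log \pi_k + \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - \tfrac{1}{2} \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k$$

$$y_k = \log \pi_k + \mathbf{a}_k^T \mathbf{x} + b_k, \quad \mathbf{a}_k = \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k, \quad b_k = -\tfrac{1}{2} \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k$$

Linear equation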
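The linear form can be sketched for the two-feature case as follows. The 2 × 2 inverse is written out by hand to keep the example free of dependencies, and the means, priors and covariance are hypothetical values.

```python
import math

def invert_2x2(m):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def lda_score(x, mean, prior, sigma_inv):
    """y_k = log(pi_k) + a_k^T x + b_k, a_k = Sigma^-1 mu_k, b_k = -0.5 mu_k^T Sigma^-1 mu_k."""
    a = [sum(sigma_inv[i][j] * mean[j] for j in range(2)) for i in range(2)]
    b = -0.5 * sum(mean[i] * a[i] for i in range(2))
    return math.log(prior) + sum(a[i] * x[i] for i in range(2)) + b

# Hypothetical model: identity covariance, two classes, equal priors.
sigma_inv = invert_2x2([[1.0, 0.0], [0.0, 1.0]])
means = [[1.0, 1.0], [5.0, 5.0]]
priors = [0.5, 0.5]
scores = [lda_score([2.0, 2.0], m, p, sigma_inv) for m, p in zip(means, priors)]
predicted = scores.index(max(scores))  # index of the winning class
```

Since a_k and b_k depend only on the trained parameters, they can be computed once after training, leaving one dot product per class at classification time.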

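Processing - QDA

$$\log(\pi_k f_k(\mathbf{x}, \boldsymbol{\theta}_k)) = \log \pi_k - \tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}_k| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)$$

$$= \log \pi_k - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}_k| - \tfrac{d}{2} \log(2\pi)$$

$$= \log \pi_k - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}_k| - K, \quad \text{where } K = \tfrac{d}{2} \log(2\pi)$$

$$y_k = \log \pi_k - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) - \tfrac{1}{2} \log|\boldsymbol{\Sigma}_k|$$

Quadratic equation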

Feedforward Neural Networks

Multilayer perceptron artificial neural network
0 or more hidden layers

[Figure: network diagram. Input features x_1, x_2, ..., x_d feed a hidden layer with weights w_11^(1) ... w_Md^(1), biases b_1^(1) ... b_M^(1) and outputs y_1^(1) ... y_M^(1); these feed an output layer with weights w_11^(2) ... w_cM^(2), biases b_1^(2) ... b_c^(2) and outputs y_1^(2) ... y_c^(2), the output classes.]

Feedforward Neural Networks

Flexible linear or nonlinear mapping from features to classes
A feedforward neural network is a ‘universal function approximator’
Except for linear networks, training requires numerical optimisation
Back propagation algorithm used for efficient training

Nodal equation

$$y_m^{(1)} = \varphi\left( b_m^{(1)} + \sum_{i=1}^{d} w_{mi}^{(1)} x_i \right)$$

where $y_m^{(1)}$ is the output, the bracketed term is the linearly summed inputs and $\varphi$ is the non-linearity.
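The nodal equation can be sketched as a forward pass through one layer. The tanh non-linearity and all weights below are assumptions for illustration; the slide does not say which non-linearity is used.

```python
import math

def node_output(x, weights, bias, phi=math.tanh):
    """y_m = phi(b_m + sum_i w_mi x_i) for a single node."""
    return phi(bias + sum(w * xi for w, xi in zip(weights, x)))

def layer_output(x, weight_rows, biases, phi=math.tanh):
    """Apply the nodal equation to every node of one layer."""
    return [node_output(x, w, b, phi) for w, b in zip(weight_rows, biases)]

# Hypothetical layer: d = 2 inputs, M = 2 nodes.
hidden = layer_output([1.0, -1.0], [[0.5, 0.5], [1.0, 0.0]], [0.0, 1.0])

# A linear node is obtained by passing the identity as the non-linearity.
linear = node_output([2.0, 3.0], [1.0, 1.0], 1.0, phi=lambda v: v)
```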

3. Performance Measurement Methods

Two-way classification
– Prior probabilities
– Sensitivity
– Specificity
– Positive and negative predictivity
– Accuracy

Multiway classification
– Prior probabilities
– Sensitivity
– Specificity
– Positive and negative predictivity
– Accuracy

Data splitting

Two way classification

Diagnostic      True status
allocation      Abnormal (D)   Normal (~D)   Total
Abnormal (S)    a              b             a+b
Normal (~S)     c              d             c+d
Total           a+c            b+d           N

a is the number of cases which were classified abnormal and were truly abnormal (TP).
b is the number of cases which were classified abnormal but were in fact normal (FP).
c is the number of cases which were classified normal but were in fact abnormal (FN).
d is the number of cases which were classified normal and were truly normal (TN).
N is the total number of cases.

Probability of having the disease: $P = \frac{a+c}{N}$

Probability of not having the disease: $\frac{b+d}{N} = 1 - P$

Two way classification

With a, b, c, d and N as defined in the two way table:

Sensitivity (Se) = p(S|D) = Percentage of well classified abnormals $= \frac{a}{a+c}$

Specificity (Sp) = p(~S|~D) = Percentage of well classified normals $= \frac{d}{b+d}$

Accuracy (A) = Percentage of well classified cases $= \frac{a+d}{N}$

Two way classification

With a, b, c, d as defined in the two way table:

Predictive value of a positive test (PV+) = p(D|S) = Percentage of well classified positives

$$PV^{+} = \frac{a}{a+b} = \frac{P \cdot Se}{P \cdot Se + (1-P)(1-Sp)}$$

Predictive value of a negative test (PV-) = p(~D|~S) = Percentage of well classified negatives

$$PV^{-} = \frac{d}{c+d} = \frac{(1-P) \cdot Sp}{P(1-Se) + (1-P) \cdot Sp}$$
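The two-way measures can be computed directly from the four counts. The sketch below also evaluates the prevalence form of PV+ so that the two expressions can be compared; the counts are hypothetical.

```python
def two_way_measures(a, b, c, d):
    """a = TP, b = FP, c = FN, d = TN, as in the two way table."""
    n = a + b + c + d
    return {
        "prevalence": (a + c) / n,
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "accuracy": (a + d) / n,
        "pv_pos": a / (a + b),
        "pv_neg": d / (c + d),
    }

def pv_pos_from_rates(p, se, sp):
    """PV+ rewritten in terms of prevalence, sensitivity and specificity."""
    return p * se / (p * se + (1 - p) * (1 - sp))

m = two_way_measures(a=40, b=10, c=20, d=130)  # hypothetical counts
```

The prevalence form is useful when a test is moved to a population with a different prevalence: sensitivity and specificity stay fixed while the predictive values change.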

Multiway classification

                 Diagnostic allocation
True status      No disease   Disease 1   Disease 2   ...   Disease n   Sum
No disease       N00          N01         N02         ...   N0n         N0.
Disease 1        N10          N11         N12         ...   N1n         N1.
Disease 2        N20          N21         N22         ...   N2n         N2.
:
Disease n        Nn0          Nn1         Nn2         ...   Nnn         Nn.
Sum              N.0          N.1         N.2         ...   N.n         N..

Prevalence of disease i = probability of having disease i: $P_i = \frac{N_{i.}}{N_{..}}$

Multiway classification

With the counts of the multiway table (rows: true status, columns: diagnostic allocation):

Sensitivity for disease i = Proportion of correctly classified cases with disease i: $Se_i = \frac{N_{ii}}{N_{i.}}$

Specificity = Proportion of correctly classified normals $= \frac{N_{00}}{N_{0.}}$

Accuracy (A) = Proportion of correctly classified cases $= \frac{\sum_i N_{ii}}{N_{..}}$

Multiway classification

With the counts of the multiway table (rows: true status, columns: diagnostic allocation):

Predictive value for disease i = proportion of cases classified disease i which are correct: $PV_i = \frac{N_{ii}}{N_{.i}}$
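The multiway measures follow from row sums, column sums and the diagonal of the table. A minimal sketch, assuming rows index the true status and columns the diagnostic allocation, with hypothetical counts:

```python
def multiway_measures(table):
    """table[i][j] = number of cases with true status i allocated to class j.

    Index 0 is 'no disease'; indices 1..n are the diseases.
    """
    size = len(table)
    row_sums = [sum(row) for row in table]                                   # N_i.
    col_sums = [sum(table[i][j] for i in range(size)) for j in range(size)]  # N_.j
    total = sum(row_sums)                                                    # N_..
    return {
        "prevalence": [r / total for r in row_sums],
        "sensitivity": [table[i][i] / row_sums[i] for i in range(size)],
        "specificity": table[0][0] / row_sums[0],
        "predictive_value": [table[i][i] / col_sums[i] for i in range(size)],
        "accuracy": sum(table[i][i] for i in range(size)) / total,
    }

# Hypothetical counts: no disease plus two diseases.
counts = [[80, 15, 5],
          [10, 30, 10],
          [5, 5, 40]]
m = multiway_measures(counts)
```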

Data splitting

Ideal
– Maximum data for training
– Maximum data for testing
– Training and test data independent
– Conflicting requirements

Resubstitution
– Train and test on same recordings
– Positively biased results

Holdout
– Train on one sample, test on remaining sample

Cross fold validation

Cross fold validation

[Figure legend: training case, testing case]

Illustration of 5-fold cross validation using 10 ECG records. The data is divided into 5 mutually exclusive folds and the classifier is trained and tested 5 times. Each time a different test fold is used and the remainder of the data is used for training.

Unbiased, computationally intensive
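The 5-fold split over 10 records can be sketched as below. The record names are hypothetical, and the folds are formed by striding through the list, which is only one of several reasonable ways to assign records to folds.

```python
def k_fold_splits(records, k):
    """Yield (train, test) lists; each record appears in exactly one test fold."""
    folds = [records[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

records = ["rec%02d" % n for n in range(1, 11)]  # 10 hypothetical ECG records
splits = list(k_fold_splits(records, 5))
```

Splitting by record rather than by individual epoch keeps all data from one recording on the same side of the train/test boundary.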

4. Features

Transformations
Missing values
Feature Selection

Transformations

Look at the histogram of each feature and, if its distribution is skewed, try applying a transformation that results in a less skewed distribution.

[Figures: histogram of original feature; histogram of log(feature)]
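One way to check whether a transformation helped is to compare skewness before and after. The sketch below uses a hypothetical, strictly positive feature; the log transform is only defined for positive values.

```python
import math

def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

feature = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]  # hypothetical right-skewed feature
log_feature = [math.log(v) for v in feature]

before = skewness(feature)
after = skewness(log_feature)
```

Here the original values are strongly right-skewed, while their logarithms are evenly spaced and therefore symmetric.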

Missing Values

Practical data sets often have missing feature values due to faulty measurements etc.
A majority of classifier models require all feature values to be present.
What to do?
– Delete all cases with one or more missing feature values
– Estimate the missing feature values
  – Replace with average (bad choice as it skews the distribution)
  – Replace with random value
  – Replace with value from another case that is “similar” and has all features
See academic.uprm.edu/~eacuna/IFCS04r.pdf for a good summary

Feature Selection

The aim is to find a subset of the available features that provides “acceptable” performance
– Fewer features means easier implementation
– There may exist subsets of the available features that provide higher classification performance than the full feature set

Irrelevant and redundant features in general reduce classifier performance
– Methods to look for include “filter” and “wrapper” methods, forward selection, backward elimination, stepwise, exhaustive, beam search
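Of the methods listed, forward selection is the simplest to sketch: start with no features and greedily add whichever feature improves the score most. The scoring function below is a hypothetical stand-in; in a wrapper method it would be the classifier's estimated performance on the candidate subset.

```python
def forward_selection(features, score, max_features=None):
    """Greedy wrapper search: repeatedly add the feature that most improves score."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        trial_scores = [(score(selected + [f]), f) for f in remaining]
        trial_best, feature = max(trial_scores)
        if trial_best <= best:
            break  # no candidate improves performance, so stop
        selected.append(feature)
        remaining.remove(feature)
        best = trial_best
    return selected, best

# Hypothetical scoring function standing in for cross-validated accuracy.
useful = {"rr_mean": 0.10, "edr_psd_3": 0.20, "noise": -0.05}

def toy_score(subset):
    return 0.5 + sum(useful[f] for f in subset)

chosen, value = forward_selection(list(useful), toy_score)
```

The search stops as soon as no remaining feature improves the score, so the irrelevant "noise" feature is never added.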


Part 2: Case Study: Sleep Apnea Detection using the Electrocardiogram

Case Study: Sleep Apnea Detection using the Electrocardiogram

Obstructive sleep apnea: 2-4% prevalence, disrupted sleep, treated with Continuous Positive Airway Pressure (CPAP) mask.

Diagnosis using polysomnogram (multiple signals).
Diagnostic test costs $1500 - carried out in hospital.
Only 15% with disease have been diagnosed.

Study Objective

See if we can determine a method that can reliably detect sleep apnea using the electrocardiogram (ECG)

Benefits
– Can do the test at home
– Low cost
– Reduce waiting lists in hospitals

Sleep Apnoea ECG database

Computers in Cardiology Conference 2000 Challenge
– Automated ECG apnoea detection.

Uses modified lead V2 ECG from PSG database from patients at Philipps University in Germany (T. Penzel) which had been scored by sleep physiologists using complete polysomnogram.
Supplied raw ECG waveform from a single lead and QRS detection times (unverified).
70 records total (about 8 hrs each); 35 released for training, 35 for independent testing.

Epoch based scoring

PSG scored on epoch-by-epoch basis.
Goal was to mimic human scorer
– Each epoch labelled as Normal (NR) or Sleep disordered respiration (SDR)
– Ea

Splitting up of the Data

We have 70 overnight recordings of ECG
– Every minute annotated as ‘normal’ or ‘sleep disordered breathing’ by an expert
– Over 32000 labels
– 35 recordings available for training, 35 withheld for testing

Bradycardia/tachycardia patterns

• Guilleminault et al. were first to report on the characteristic bradycardia/tachycardia pattern associated with obstructive apnoeas (Guilleminault C et al. Cyclical variation of heart rate in sleep apnea syndrome. Lancet 1984).

• Ichimaru reported on low-frequency heart rate fluctuations caused by Cheyne-Stokes respiration (Ichimaru Y, Yanaga T. Frequency characteristics of the heart rate variability produced by Cheyne-Stokes respiration during 24-hr ambulatory electrocardiographic monitoring. Comput Biomed Res 1989).

Brady/tachy patterns

Stein et al., J. Cardiovasc. Electrophysiol., 2003


EDR signal

Modulation of chest lead ECG signal amplitude by respiration

Respiration

EDR(n)=area enclosed by the QRS complex(n)


ECG derived respiration (EDR)

A. Travaglini, et al, “Respiratory signal derived from eight-lead ECG,” in Computers in Cardiology, 1998.

B. G.B. Moody, et al, “Clinical Validation of the ECG-Derived Respiration (EDR) Technique,” in Computers in Cardiology, 1986.

Travaglini et al

Features

RR interval time domain
– Record-based mean RR interval
– Record-based stdev RR interval
– Epoch-based mean RR interval
– Epoch-based st.dev. RR interval
– NN50
– pNN50
– SDNN
– Allan Factor at 5-25 secs time scale
– Serial correlation

RR interval frequency domain
– 32 PSD features

EDR frequency domain
– 32 PSD features

EDR amplitude
– Record-based mean EDR amplitude
– Record-based stdev EDR amplitude
– Epoch-based mean EDR amplitude
– Epoch-based st.dev. EDR amplitude

88 Features in all
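Three of the RR interval time-domain features can be sketched as below. The definitions are the common textbook ones for heart rate variability (SDNN as the sample standard deviation, NN50 as the count of successive differences above 50 ms, pNN50 as that count divided by the number of differences); they are an assumption here, and the study may have used variants. The RR values are hypothetical.

```python
import math

def rr_time_domain_features(rr_ms):
    """Return SDNN, NN50 and pNN50 for a list of RR intervals in milliseconds."""
    n = len(rr_ms)
    mean = sum(rr_ms) / n
    sdnn = math.sqrt(sum((r - mean) ** 2 for r in rr_ms) / (n - 1))
    diffs = [abs(b - a) for a, b in zip(rr_ms, rr_ms[1:])]
    nn50 = sum(1 for d in diffs if d > 50)
    pnn50 = nn50 / len(diffs)
    return {"sdnn": sdnn, "nn50": nn50, "pnn50": pnn50}

rr = [800, 810, 900, 820, 830, 760]  # hypothetical RR intervals for one epoch
features = rr_time_domain_features(rr)
```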

Experiments

Linear and quadratic discriminant classifiers
– Quick to train
– Wanted to focus study on the features not the classifiers
– Wanted a system that is readily implemented on microprocessors

Different combinations of feature groups:
– RR time domain and frequency domain
– EDR frequency domain

Feature Selection
Covariance regularisation (not discussed here)

Training – all features

35-fold cross validation. Each fold contained 1 record
– Removed intra-record bias

Training – Feature Selection

Use the best first feature selection strategy
Outer loop: 35-fold cross validation
Inner loop: 34-fold cross validation
As before each fold contained 1 record
– Removed intra-record bias
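The nested loop structure can be sketched as below, with one record per fold at both levels. Only the fold bookkeeping is shown; the classifier training and the feature selection search are left out, and the function names are hypothetical.

```python
def leave_one_record_out(records):
    """Yield (training_records, held_out_record) with one record per fold."""
    for i, held_out in enumerate(records):
        yield records[:i] + records[i + 1:], held_out

def nested_fold_sizes(records):
    """Count the inner folds seen inside each outer fold."""
    sizes = []
    for outer_train, _outer_test in leave_one_record_out(records):
        # Feature selection would be scored here using only outer_train,
        # so the outer test record never influences which features are chosen.
        inner = list(leave_one_record_out(outer_train))
        sizes.append(len(inner))
    return sizes

records = list(range(1, 36))  # 35 training recordings
sizes = nested_fold_sizes(records)
```

Each of the 35 outer folds leaves 34 records for the inner loop, which matches the 35-fold and 34-fold counts on the slide.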

Performance Assessment 1

During feature selection classifiers were compared using ‘Accuracy’

Accuracy = (TP+TN) / (TP+FN+FP+TN)

              Predicted NR   Predicted SDR
Expert NR     TP             FN
Expert SDR    FP             TN

Performance Assessment 2

In addition classifiers were assessed using Sensitivity and Specificity

              Predicted NR   Predicted SDR
Expert NR     TP             FN
Expert SDR    FP             TN

Sensitivity = TP / (TP+FN)

Specificity = TN / (TN+FP)


Results- cross validation

(a) All features, (b) Feature selection

•LDA better than QDA

•Feature selection improved QDA

•Best features were RR and EDR combined


Results – withheld set

•Results on new data (withheld set) similar to the cross-validation results which is an encouraging sign!

Results - feature separation across classes

RR PSD features
– Good separation at low frequencies

EDR PSD features
– Good separation particularly at low frequencies

Practical tips

Always start with a simple system and add complexity if needed, e.g. start with LDA, progress to NN if needed
Focus on finding good discriminating features. If your features are poor then no fancy classifier will help
As best as possible make sure your performance assessment is unbiased
– Be careful of training and testing with features from the same record
– Never make many performance assessments on the same data set then report the best as this will be a positively biased result. Be particularly careful with feature selection where many thousands of comparisons may be made.
Consider what is the target device for the pattern recognition system
– E.g. low power, low computational device will influence your choice of features and classifier

Bibliography

Data splitting
– R. Kohavi, “A study of cross validation and bootstrap for accuracy estimation and model selection,” In: Proc. of 14th Int. Joint Conference on Artificial Intelligence, 1995, pp. 1137-1143.

Performance estimating
– M. H. Zweig (1993), “Receiver Operator Characteristic (ROC) Plots: A Fundamental Evaluation Tool in Clinical Medicine,” Clin. Chem., vol. 39(4), pp. 561-577.
– J. Michaelis and J. L. Willems (1987), “Performance Evaluation of Diagnostic ECG Programs,” Proceedings of the Computers in Cardiology Conference, Leuven, Belgium, Sept 12-15, pp. 25-30, Edited by: K. L. Ripley, IEEE Computer Society.

Classifiers
– C. M. Bishop, Neural Networks for Pattern Recognition. New York: Oxford University Press, 1995.
– B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge, England: Cambridge University Press, 1996.
– R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, New York: John Wiley and Sons, 2001.
– Warren Sarle, SAS Institute, http://www.faqs.org/faqs/ai-faq/neural-nets/part1/
– Mike James, Classification Algorithms, John Wiley and Sons, 1985.

Sleep Apnea
– P. de Chazal, C. Heneghan, E. Sheridan, R.B. Reilly, P. Nolan, M. O’Malley (2003), “Automated Processing of the Single Lead Electrocardiogram for the Detection of Obstructive Sleep Apnea,” IEEE Transactions on Biomedical Engineering, Vol. 50, No. 6, June 2003, pp. 686-696.


Email – contact me

[email protected]@ee.ucd.ie