CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 1
L5: Quadratic classifiers
• Bayes classifiers for Normally distributed classes
– Case 1: $\Sigma_i = \sigma^2 I$
– Case 2: $\Sigma_i = \Sigma$ ($\Sigma$ diagonal)
– Case 3: $\Sigma_i = \Sigma$ ($\Sigma$ non-diagonal)
– Case 4: $\Sigma_i = \sigma_i^2 I$
– Case 5: $\Sigma_i \neq \Sigma_j$ (general case)
• Numerical example
• Linear and quadratic classifiers: conclusions
Bayes classifiers for Gaussian classes
• Recap
– In L4 we showed that the decision rule that minimized $P[error]$ could be formulated in terms of a family of discriminant functions
• For normally distributed classes, these DFs reduce to simple expressions
– The multivariate Normal pdf is
$$f_X(x) = (2\pi)^{-n/2} |\Sigma|^{-1/2} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)}$$
– Using Bayes rule, the DFs become
$$g_i(x) = P(\omega_i|x) = \frac{P(x|\omega_i)\,P(\omega_i)}{P(x)} = (2\pi)^{-n/2} |\Sigma_i|^{-1/2} e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i)} \frac{P(\omega_i)}{P(x)}$$
– Eliminating constant terms
$$g_i(x) = |\Sigma_i|^{-1/2} e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i)} P(\omega_i)$$
– And taking natural logs
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$
– This expression is called a quadratic discriminant function
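The log-discriminant above can be evaluated directly from the class parameters; a minimal numerical sketch (the two classes, priors, and test point below are made-up values, not from the lecture):

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - 1/2 log|Sigma| + log P(w_i)."""
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(Sigma, d)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Hypothetical two-class 2D problem with equal priors
mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([2.0, 2.0]), 2 * np.eye(2)
x = np.array([0.5, 0.5])
g1 = quadratic_discriminant(x, mu1, S1, 0.5)
g2 = quadratic_discriminant(x, mu2, S2, 0.5)
print("assign to class", 1 if g1 > g2 else 2)  # → assign to class 1
```

Assigning $x$ to the class with the largest $g_i(x)$ is exactly the select-max rule of the Bayes classifier.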
[Figure: discriminant-function classifier: features $x_1, x_2, \ldots, x_d$ feed a bank of discriminant functions $g_1(x), \ldots, g_C(x)$; a select-max stage, optionally incorporating costs, produces the class assignment]
Case 1: $\Sigma_i = \sigma^2 I$
• This situation occurs when the features are statistically independent, with equal variance for all classes
– In this case, the quadratic DFs become
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T (\sigma^2 I)^{-1} (x-\mu_i) - \frac{1}{2}\log|\sigma^2 I| + \log P(\omega_i) \equiv -\frac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i) + \log P(\omega_i)$$
– Expanding this expression
$$g_i(x) = -\frac{1}{2\sigma^2}\left[ x^T x - 2\mu_i^T x + \mu_i^T \mu_i \right] + \log P(\omega_i)$$
– Eliminating the term $x^T x$, which is constant for all classes
$$g_i(x) = -\frac{1}{2\sigma^2}\left[ -2\mu_i^T x + \mu_i^T \mu_i \right] + \log P(\omega_i) = w_i^T x + w_{i0}$$
– So the DFs are linear, and the boundaries $g_i(x) = g_j(x)$ are hyper-planes
– If we assume equal priors
$$g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i)$$
• This is called a minimum-distance or nearest-mean classifier
• The equiprobable contours are hyper-spheres
• For unit variance ($\sigma^2 = 1$), $g_i(x)$ is simply (minus one half) the squared Euclidean distance to the class mean
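With equal priors and $\Sigma_i = \sigma^2 I$, the rule reduces to assigning $x$ to the nearest class mean; a short sketch, using the three means from the example that follows:

```python
import numpy as np

def nearest_mean_classify(x, means):
    """Minimum-Euclidean-distance (nearest mean) classifier."""
    dists = [np.linalg.norm(x - m) for m in means]
    return int(np.argmin(dists))  # index of the closest class mean

# Means from the three-class example on the next slide
means = [np.array([3.0, 2.0]), np.array([7.0, 4.0]), np.array([2.0, 5.0])]
print(nearest_mean_classify(np.array([3.5, 2.5]), means))  # → 0 (closest to mu1)
```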
[Figure: minimum-distance classifier: the distances from $x$ to each class mean (Distance 1 … Distance C) feed a minimum selector that outputs the class. After Schalkoff, 1992]
• Example
– Three-class 2D problem with equal priors
$$\mu_1 = [3 \; 2]^T \quad \mu_2 = [7 \; 4]^T \quad \mu_3 = [2 \; 5]^T$$
$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
Case 2: $\Sigma_i = \Sigma$ ($\Sigma$ diagonal)
• Classes still have the same covariance, but the features are allowed to have different variances
– In this case, the quadratic DFs become
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1}(x-\mu_i) - \frac{1}{2}\log|\Sigma| + \log P(\omega_i) = -\frac{1}{2}\sum_{k=1}^{n} \frac{(x_k - \mu_{i,k})^2}{\sigma_k^2} - \frac{1}{2}\log\prod_{k=1}^{n}\sigma_k^2 + \log P(\omega_i)$$
– Eliminating the term $x_k^2$, which is constant for all classes
$$g_i(x) = -\frac{1}{2}\sum_{k=1}^{n} \frac{-2 x_k \mu_{i,k} + \mu_{i,k}^2}{\sigma_k^2} - \frac{1}{2}\log\prod_{k=1}^{n}\sigma_k^2 + \log P(\omega_i)$$
– This discriminant is also linear, so the decision boundaries $g_i(x) = g_j(x)$ will also be hyper-planes
– The equiprobable contours are hyper-ellipses aligned with the reference frame
– Note that the only difference with the previous classifier is that the distance along each axis is normalized by its variance
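The variance-normalized distance can be sketched directly; the means and per-axis variances below are taken from the example on the next slide, and the helper name is ad hoc:

```python
import numpy as np

def diag_gaussian_discriminant(x, mu, var, prior):
    """g_i(x) for a shared diagonal covariance: each axis is normalized
    by its variance sigma_k^2; the log|Sigma| term is shared across
    classes, so only the prior differs per class and it is dropped here."""
    return -0.5 * np.sum((x - mu) ** 2 / var) + np.log(prior)

var = np.array([1.0, 2.0])  # sigma_1^2 = 1, sigma_2^2 = 2
mus = [np.array([3.0, 2.0]), np.array([5.0, 4.0]), np.array([2.0, 5.0])]
x = np.array([2.5, 4.5])
scores = [diag_gaussian_discriminant(x, m, var, 1 / 3) for m in mus]
print(int(np.argmax(scores)))  # → 2 (x is closest, variance-weighted, to mu3)
```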
• Example
– Three-class 2D problem with equal priors
$$\mu_1 = [3 \; 2]^T \quad \mu_2 = [5 \; 4]^T \quad \mu_3 = [2 \; 5]^T$$
$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$
Case 3: $\Sigma_i = \Sigma$ ($\Sigma$ non-diagonal)
• Classes have equal covariance matrices, but they are no longer diagonal
– The quadratic discriminant becomes
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1}(x-\mu_i) - \frac{1}{2}\log|\Sigma| + \log P(\omega_i)$$
– Eliminating the term $\log|\Sigma|$, which is constant for all classes, and assuming equal priors
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1}(x-\mu_i)$$
– The quadratic term is called the Mahalanobis distance, a very important concept in statistical pattern recognition
– The Mahalanobis distance is a vector distance that uses a $\Sigma^{-1}$ norm
– $\Sigma^{-1}$ acts as a stretching factor on the space
– Note that when $\Sigma = I$, the Mahalanobis distance becomes the familiar Euclidean distance
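Both facts can be checked in a few lines; the covariance below is the one used in the example two slides ahead, and the test point is made up:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x-mu)^T Sigma^{-1} (x-mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))

x, mu = np.array([2.0, 1.0]), np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.7], [0.7, 2.0]])
print(mahalanobis_sq(x, mu, Sigma))         # the ellipse-stretched distance
print(mahalanobis_sq(x, mu, np.eye(2)))     # with Sigma = I: Euclidean, 5.0
```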
[Figure: contours in the $(x_1, x_2)$ plane: a circle of constant Euclidean distance $\|x-\mu\|^2 = K$ versus an ellipse of constant Mahalanobis distance $(x-\mu)^T \Sigma^{-1}(x-\mu) = K$]
– Expanding the quadratic term
$$g_i(x) = -\frac{1}{2}\left[ x^T \Sigma^{-1} x - 2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i \right]$$
– Removing the term $x^T \Sigma^{-1} x$, which is constant for all classes
$$g_i(x) = -\frac{1}{2}\left[ -2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i \right] = w_i^T x + w_{i0}$$
– So the DFs are still linear, and the decision boundaries will also be hyper-planes
– The equiprobable contours are hyper-ellipses aligned with the eigenvectors of $\Sigma$
– This is known as a minimum (Mahalanobis) distance classifier
[Figure: minimum Mahalanobis distance classifier: the distances from $x$ to each class mean, computed with the shared $\Sigma$, feed a minimum selector that outputs the class]
• Example
– Three-class 2D problem with equal priors
$$\mu_1 = [3 \; 2]^T \quad \mu_2 = [5 \; 4]^T \quad \mu_3 = [2 \; 5]^T$$
$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & 0.7 \\ 0.7 & 2 \end{bmatrix}$$
Case 4: $\Sigma_i = \sigma_i^2 I$
• In this case, each class has a different covariance matrix, which is proportional to the identity matrix
– The quadratic discriminant becomes
$$g_i(x) = -\frac{1}{2}\sigma_i^{-2}(x-\mu_i)^T (x-\mu_i) - \frac{1}{2} n \log \sigma_i^2 + \log P(\omega_i)$$
– This expression cannot be reduced further
– The decision boundaries are quadratic: hyper-ellipses
– The equiprobable contours are hyper-spheres aligned with the feature axes
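Since nothing cancels across classes, the full expression must be evaluated per class; a sketch using the per-class variances from the example on the next slide (the test point is made up):

```python
import numpy as np

def isotropic_discriminant(x, mu, var_i, prior):
    """g_i(x) for Sigma_i = sigma_i^2 I: neither the quadratic term nor
    the n*log(sigma_i^2) term is shared across classes, so both stay."""
    n = len(x)
    return (-0.5 / var_i * np.sum((x - mu) ** 2)
            - 0.5 * n * np.log(var_i)
            + np.log(prior))

mus = [np.array([3.0, 2.0]), np.array([5.0, 4.0]), np.array([2.0, 5.0])]
vars_ = [0.5, 1.0, 2.0]  # sigma_i^2 per class
x = np.array([4.0, 3.0])
scores = [isotropic_discriminant(x, m, v, 1 / 3) for m, v in zip(mus, vars_)]
print(int(np.argmax(scores)))  # → 1 (class 2 wins at this point)
```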
• Example
– Three-class 2D problem with equal priors
$$\mu_1 = [3 \; 2]^T \quad \mu_2 = [5 \; 4]^T \quad \mu_3 = [2 \; 5]^T$$
$$\Sigma_1 = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \quad \Sigma_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \Sigma_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
[Figure: decision regions, with a zoomed-out view]
Case 5: $\Sigma_i \neq \Sigma_j$ (general case)
• We already derived the expression for the general case
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$
– Reorganizing terms in a quadratic form yields
$$g_i(x) = x^T W_{2,i}\, x + w_{1,i}^T x + w_{0,i}$$
where
$$W_{2,i} = -\frac{1}{2}\Sigma_i^{-1} \qquad w_{1,i} = \Sigma_i^{-1}\mu_i \qquad w_{0,i} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1}\mu_i - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$
– The equiprobable contours are hyper-ellipses, oriented with the eigenvectors of $\Sigma_i$ for each class
– The decision boundaries are again quadratic: hyper-ellipses or hyper-paraboloids
– Notice that the quadratic expression in the discriminant is proportional to the Mahalanobis distance for covariance $\Sigma_i$
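The decomposition into $W_{2,i}$, $w_{1,i}$, $w_{0,i}$ can be verified numerically against the direct form of $g_i(x)$; the helper name and parameter values below are illustrative:

```python
import numpy as np

def qda_coefficients(mu, Sigma, prior):
    """Quadratic-form coefficients W2, w1, w0 such that
    g_i(x) = x^T W2 x + w1^T x + w0 (general-case Bayes discriminant)."""
    Sinv = np.linalg.inv(Sigma)
    W2 = -0.5 * Sinv
    w1 = Sinv @ mu
    w0 = (-0.5 * mu @ Sinv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))
    return W2, w1, w0

# Check the two forms agree at an arbitrary point
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, -1.0], [-1.0, 2.0]])
W2, w1, w0 = qda_coefficients(mu, Sigma, 0.5)
x = np.array([0.3, 0.8])
g_quad = x @ W2 @ x + w1 @ x + w0
d = x - mu
g_direct = (-0.5 * d @ np.linalg.inv(Sigma) @ d
            - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(0.5))
print(np.isclose(g_quad, g_direct))  # → True, by the algebra above
```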
• Example
– Three-class 2D problem with equal priors
$$\mu_1 = [3 \; 2]^T \quad \mu_2 = [5 \; 4]^T \quad \mu_3 = [3 \; 4]^T$$
$$\Sigma_1 = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \quad \Sigma_2 = \begin{bmatrix} 1 & -1 \\ -1 & 7 \end{bmatrix} \quad \Sigma_3 = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 3 \end{bmatrix}$$
[Figure: decision regions, with a zoomed-out view]
Numerical example
• Derive a linear DF for the following 2-class 3D problem
$$\mu_1 = [0 \; 0 \; 0]^T; \quad \mu_2 = [1 \; 1 \; 1]^T; \quad \Sigma_1 = \Sigma_2 = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}; \quad P(\omega_2) = 2 P(\omega_1)$$
– Solution
$$g_1(x) = -\frac{1}{2}\begin{bmatrix} x-0 & y-0 & z-0 \end{bmatrix} \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x-0 \\ y-0 \\ z-0 \end{bmatrix} + \log\frac{1}{3}$$
$$g_2(x) = -\frac{1}{2}\begin{bmatrix} x-1 & y-1 & z-1 \end{bmatrix} \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x-1 \\ y-1 \\ z-1 \end{bmatrix} + \log\frac{2}{3}$$
$$g_1(x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} g_2(x) \;\Rightarrow\; -2\left(x^2+y^2+z^2\right) + \log\frac{1}{3} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} -2\left[(x-1)^2+(y-1)^2+(z-1)^2\right] + \log\frac{2}{3}$$
$$\Rightarrow\; x + y + z \underset{\omega_1}{\overset{\omega_2}{\gtrless}} \frac{6 - \log 2}{4} \approx 1.33$$
– Classify the test example $x_u = [0.1 \; 0.7 \; 0.8]^T$
$$0.1 + 0.7 + 0.8 = 1.6 > 1.33 \;\Rightarrow\; x_u \in \omega_2$$
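The boundary and the classification of $x_u$ can be checked numerically; a short sketch of the same computation (variable names are ad hoc):

```python
import numpy as np

# Parameters from the worked example: mu1 = 0, mu2 = (1,1,1),
# Sigma1 = Sigma2 = 0.25*I, and P(w2) = 2*P(w1) => P(w1) = 1/3, P(w2) = 2/3
mu1, mu2 = np.zeros(3), np.ones(3)
p1, p2 = 1 / 3, 2 / 3
inv_var = 1 / 0.25  # Sigma^{-1} = 4*I

def g(x, mu, prior):
    """Discriminant for equal isotropic covariances (shared terms dropped)."""
    return -0.5 * inv_var * np.sum((x - mu) ** 2) + np.log(prior)

threshold = (6 - np.log(2)) / 4  # linear boundary: x + y + z = 1.3267...
x_u = np.array([0.1, 0.7, 0.8])  # test example; its coordinate sum is 1.6
label = "w2" if g(x_u, mu2, p2) > g(x_u, mu1, p1) else "w1"
print(label)  # → w2, consistent with 1.6 > 1.33
```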
Conclusions
• The examples in this lecture illustrate the following points
– The Bayes classifier for Gaussian classes (general case) is quadratic
– The Bayes classifier for Gaussian classes with equal covariance is linear
– The Mahalanobis distance classifier is Bayes-optimal for
  • normally distributed classes and
  • equal covariance matrices and
  • equal priors
– The Euclidean distance classifier is Bayes-optimal for
  • normally distributed classes and
  • equal covariance matrices proportional to the identity matrix and
  • equal priors
– Both the Euclidean and the Mahalanobis distance classifiers are linear classifiers
• Thus, some of the simplest and most popular classifiers can be derived from decision-theoretic principles
– Using a specific (Euclidean or Mahalanobis) minimum-distance classifier implicitly corresponds to certain statistical assumptions
– Whether these assumptions hold can rarely be answered in practice; in most cases we can only determine whether the classifier solves our problem