

L5: Quadratic classifiers

• Bayes classifiers for Normally distributed classes
– Case 1: $\Sigma_i = \sigma^2 I$

– Case 2: $\Sigma_i = \Sigma$ ($\Sigma$ diagonal)

– Case 3: $\Sigma_i = \Sigma$ ($\Sigma$ non-diagonal)

– Case 4: $\Sigma_i = \sigma_i^2 I$

– Case 5: $\Sigma_i \neq \Sigma_j$ (general case)

• Numerical example

• Linear and quadratic classifiers: conclusions


Bayes classifiers for Gaussian classes

• Recap

– In L4 we showed that the decision rule that minimized $P[error]$ could be formulated in terms of a family of discriminant functions

• For Normally distributed classes, these DFs reduce to simple expressions
– The multivariate Normal pdf is

$$f_X(x) = (2\pi)^{-N/2}\,|\Sigma|^{-1/2}\,e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$

– Using Bayes rule, the DFs become

$$g_i(x) = P(\omega_i|x) = \frac{p(x|\omega_i)\,P(\omega_i)}{p(x)} = (2\pi)^{-N/2}\,|\Sigma_i|^{-1/2}\,e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\,\frac{P(\omega_i)}{p(x)}$$

– Eliminating constant terms

$$g_i(x) = |\Sigma_i|^{-1/2}\,e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\,P(\omega_i)$$

– And taking natural logs

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$

– This expression is called a quadratic discriminant function

[Figure: block diagram of the discriminant-function classifier — features $x_1, x_2, x_3, \dots, x_d$ feed discriminant functions $g_1(x), g_2(x), \dots, g_C(x)$; a "Select max" stage (optionally weighted by costs) produces the class assignment]
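– A minimal NumPy sketch of this quadratic discriminant (names are illustrative; the class means, covariances, and priors are assumed to be estimated elsewhere):

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """Evaluate g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - 1/2 log|Sigma| + log P(omega_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(Sigma, diff)   # quadratic (Mahalanobis-type) term
            - 0.5 * np.log(np.linalg.det(Sigma))         # log-determinant term
            + np.log(prior))                              # prior term

def bayes_classify(x, mus, Sigmas, priors):
    """Assign x to the class whose discriminant g_i(x) is largest."""
    scores = [quadratic_discriminant(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
    return int(np.argmax(scores))
```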


Case 1: $\Sigma_i = \sigma^2 I$

• This situation occurs when the features are statistically independent, with equal variance for all classes
– In this case, the quadratic DFs become

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T (\sigma^2 I)^{-1} (x-\mu_i) - \frac{1}{2}\log|\sigma^2 I| + \log P_i \;\equiv\; -\frac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i) + \log P_i$$

– Expanding this expression

$$g_i(x) = -\frac{1}{2\sigma^2}\left[x^T x - 2\mu_i^T x + \mu_i^T \mu_i\right] + \log P_i$$

– Eliminating the term $x^T x$, which is constant for all classes

$$g_i(x) = -\frac{1}{2\sigma^2}\left[-2\mu_i^T x + \mu_i^T \mu_i\right] + \log P_i = w_i^T x + w_0$$

– So the DFs are linear, and the decision boundaries $g_i(x) = g_j(x)$ are hyper-planes

– If we assume equal priors

$$g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i)$$

• This is called a minimum-distance or nearest-mean classifier

• The equiprobable contours are hyper-spheres

• For unit variance ($\sigma^2 = 1$), maximizing $g_i(x)$ amounts to minimizing the Euclidean distance to $\mu_i$

[Figure: minimum-distance classifier — the input $x$ is compared against Distance 1 … Distance C and a minimum selector outputs the class; after Schalkoff, 1992]
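– A minimal sketch of this nearest-mean rule under the Case 1 assumptions (equal priors, $\Sigma_i = \sigma^2 I$), with the class means assumed known:

```python
import numpy as np

def nearest_mean(x, mus):
    """Minimum-distance (nearest-mean) classifier: pick the class whose mean is closest to x."""
    dists = [np.linalg.norm(x - mu) for mu in mus]  # Euclidean distance to each class mean
    return int(np.argmin(dists))
```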


• Example – Three-class 2D problem with equal priors

$$\mu_1 = [3\ \ 2]^T \qquad \mu_2 = [7\ \ 4]^T \qquad \mu_3 = [2\ \ 5]^T$$

$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
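– Using the nearest-mean sketch above, the example's parameters can be plugged in directly (the query point below is a hypothetical test sample, used only for illustration):

```python
mus = [np.array([3.0, 2.0]), np.array([7.0, 4.0]), np.array([2.0, 5.0])]
# With equal priors and Sigma_i = 2*I, the Bayes rule reduces to the nearest mean.
print(nearest_mean(np.array([6.0, 4.0]), mus))  # prints 1, i.e. the class of mu_2
```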


Case 2: $\Sigma_i = \Sigma$ ($\Sigma$ diagonal)

• Classes still have the same covariance matrix, but features are allowed to have different variances
– In this case, the quadratic DFs become

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P_i = -\frac{1}{2}\sum_{k=1}^{N}\frac{(x_k-\mu_{i,k})^2}{\sigma_k^2} - \frac{1}{2}\log\prod_{k=1}^{N}\sigma_k^2 + \log P_i$$

– Eliminating the term $x_k^2$, which is constant for all classes

$$g_i(x) = -\frac{1}{2}\sum_{k=1}^{N}\frac{-2x_k\mu_{i,k} + \mu_{i,k}^2}{\sigma_k^2} - \frac{1}{2}\log\prod_{k=1}^{N}\sigma_k^2 + \log P_i$$

– This discriminant is also linear, so the decision boundaries $g_i(x) = g_j(x)$ will also be hyper-planes

– The equiprobable contours are hyper-ellipses aligned with the reference frame

– Note that the only difference from the previous classifier is that the distance along each axis is normalized by its variance
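– A minimal sketch of this variance-normalized distance rule, assuming equal priors and that the shared per-feature variances are known:

```python
import numpy as np

def normalized_distance_classify(x, mus, variances):
    """Minimum-distance classifier where each axis is normalized by its variance (Case 2)."""
    dists = [np.sum((x - mu) ** 2 / variances) for mu in mus]  # per-axis variance scaling
    return int(np.argmin(dists))
```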


• Example – Three-class 2D problem with equal priors

$$\mu_1 = [3\ \ 2]^T \qquad \mu_2 = [5\ \ 4]^T \qquad \mu_3 = [2\ \ 5]^T$$

$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$


Case 3: $\Sigma_i = \Sigma$ ($\Sigma$ non-diagonal)

• Classes have the same covariance matrix, but it is no longer diagonal
– The quadratic discriminant becomes

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma| + \log P_i$$

– Eliminating the term $\log|\Sigma|$, which is constant for all classes, and assuming equal priors

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i)$$

– The quadratic term is called the Mahalanobis distance, a very important concept in statistical pattern recognition

– The Mahalanobis distance is a vector distance that uses a $\Sigma^{-1}$ norm; $\Sigma^{-1}$ acts as a stretching factor on the space

– Note that when $\Sigma = I$, the Mahalanobis distance becomes the familiar Euclidean distance

[Figure: contours in the $(x_1, x_2)$ plane of constant Euclidean distance $K = \|x - \mu_i\|^2$ and constant Mahalanobis distance $K = (x - \mu_i)^T \Sigma^{-1} (x - \mu_i)$]


– Expanding the quadratic term

$$g_i(x) = -\frac{1}{2}\left[x^T \Sigma^{-1} x - 2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i\right]$$

– Removing the term $x^T \Sigma^{-1} x$, which is constant for all classes

$$g_i(x) = -\frac{1}{2}\left[-2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i\right] = w_i^T x + w_0$$

– So the DFs are still linear, and the decision boundaries will also be hyper-planes

– The equiprobable contours are hyper-ellipses aligned with the eigenvectors of $\Sigma$

– This is known as a minimum (Mahalanobis) distance classifier

[Figure: minimum Mahalanobis distance classifier — the input $x$ is compared against Distance 1 … Distance C, each computed with the shared covariance $\Sigma$, and a minimum selector outputs the class]
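– A minimal sketch of this minimum Mahalanobis distance rule, assuming the shared covariance $\Sigma$ and the class means are given:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x-mu)^T Sigma^{-1} (x-mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

def mahalanobis_classify(x, mus, Sigma):
    """Minimum (Mahalanobis) distance classifier with a covariance shared by all classes."""
    return int(np.argmin([mahalanobis_sq(x, mu, Sigma) for mu in mus]))
```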


• Example – Three-class 2D problem with equal priors

$$\mu_1 = [3\ \ 2]^T \qquad \mu_2 = [5\ \ 4]^T \qquad \mu_3 = [2\ \ 5]^T$$

$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & 0.7 \\ 0.7 & 2 \end{bmatrix}$$


Case 4: $\Sigma_i = \sigma_i^2 I$

• In this case, each class has a different covariance matrix, which is proportional to the identity matrix
– The quadratic discriminant becomes

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \sigma_i^{-2} (x-\mu_i) - \frac{N}{2}\log\sigma_i^2 + \log P_i$$

– This expression cannot be reduced further

– The decision boundaries are quadratic: hyper-ellipses

– The equiprobable contours are hyper-spheres aligned with the feature axis
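– A minimal sketch of the Case 4 discriminant, where each class keeps its own spherical variance $\sigma_i^2$ (means, variances, and priors assumed known):

```python
import numpy as np

def spherical_discriminant(x, mu, var, prior):
    """g_i(x) = -||x-mu||^2 / (2*var) - N/2 * log(var) + log(prior), for Sigma_i = var * I."""
    diff = x - mu
    return -0.5 * diff @ diff / var - 0.5 * len(x) * np.log(var) + np.log(prior)
```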


• Example – Three-class 2D problem with equal priors

$$\mu_1 = [3\ \ 2]^T \qquad \mu_2 = [5\ \ 4]^T \qquad \mu_3 = [2\ \ 5]^T$$

$$\Sigma_1 = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \qquad \Sigma_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \Sigma_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$



Case 5: $\Sigma_i \neq \Sigma_j$ (general case)

• We already derived the expression for the general case

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P_i$$

– Reorganizing terms in a quadratic form yields

$$g_i(x) = x^T W_{2,i}\, x + w_{1,i}^T x + w_{0,i}$$

where

๐‘Š2,๐‘– = โˆ’1

2ฮฃ๐‘–โˆ’1

๐‘ค1,๐‘– = ฮฃ๐‘–โˆ’1๐œ‡๐‘–

๐‘ค๐‘œ,๐‘– = โˆ’1

2๐œ‡๐‘–๐‘‡ฮฃ๐‘–

โˆ’1๐œ‡๐‘– โˆ’1

2log ฮฃ๐‘– + ๐‘™๐‘œ๐‘”๐‘ƒ๐‘–

– The equiprobable contours are hyper-ellipses, oriented with the eigenvectors of $\Sigma_i$ for that class

– The decision boundaries are again quadratic: hyper-ellipses or hyper-paraboloids

– Notice that the quadratic expression in the discriminant is proportional to the Mahalanobis distance for covariance $\Sigma_i$
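– A minimal sketch of how these per-class coefficients could be assembled so that $g_i(x)$ is evaluated directly from the quadratic form (names are illustrative; parameters assumed known):

```python
import numpy as np

def quadratic_form_coeffs(mu, Sigma, prior):
    """Return (W2, w1, w0) so that g_i(x) = x^T W2 x + w1^T x + w0 for this class."""
    Sigma_inv = np.linalg.inv(Sigma)
    W2 = -0.5 * Sigma_inv
    w1 = Sigma_inv @ mu
    w0 = (-0.5 * mu @ Sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))
    return W2, w1, w0

def g_from_coeffs(x, W2, w1, w0):
    """Evaluate the quadratic discriminant from its coefficients."""
    return float(x @ W2 @ x + w1 @ x + w0)
```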


• Example – Three-class 2D problem with equal priors

$$\mu_1 = [3\ \ 2]^T \qquad \mu_2 = [5\ \ 4]^T \qquad \mu_3 = [3\ \ 4]^T$$

$$\Sigma_1 = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \qquad \Sigma_2 = \begin{bmatrix} 1 & -1 \\ -1 & 7 \end{bmatrix} \qquad \Sigma_3 = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 3 \end{bmatrix}$$



Numerical example

• Derive a linear DF for the following 2-class 3D problem

$$\mu_1 = [0\ \ 0\ \ 0]^T; \quad \mu_2 = [1\ \ 1\ \ 1]^T; \quad \Sigma_1 = \Sigma_2 = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}; \quad P_2 = 2P_1$$

– Solution

$$g_1(x) = -\frac{1}{2\sigma^2}(x-\mu_1)^T (x-\mu_1) + \log P_1 = -\frac{1}{2}\begin{bmatrix} x-0 \\ y-0 \\ z-0 \end{bmatrix}^T \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x-0 \\ y-0 \\ z-0 \end{bmatrix} + \log\frac{1}{3}$$

โ€ข ๐‘”2 ๐‘ฅ = โˆ’1

2

๐‘ฅ โˆ’ 1๐‘ฆ โˆ’ 1๐‘ง โˆ’ 1

๐‘‡4

44

๐‘ฅ โˆ’ 1๐‘ฆ โˆ’ 1๐‘ง โˆ’ 1

+ ๐‘™๐‘œ๐‘”2

3

โ€ข ๐‘”1 ๐‘ฅ๐œ”1><๐œ”2

๐‘”2 ๐‘ฅ โ‡’ โˆ’2 ๐‘ฅ2 + ๐‘ฆ2 + ๐‘ง2 + ๐‘™๐‘”1

3 ๐œ”1><๐œ”2

โˆ’ 2 ๐‘ฅ โˆ’ 1 2 + ๐‘ฆ โˆ’ 1 2 + ๐‘ง โˆ’ 1 2 + lg2

3

โ€ข ๐‘ฅ + ๐‘ฆ + ๐‘ง ๐œ”2><๐œ”1

6โˆ’๐‘™๐‘œ๐‘”2

4= 1.32

– Classify the test example $x_u = [0.1\ \ 0.7\ \ 0.8]^T$

$$0.1 + 0.7 + 0.8 = 1.6 \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; 1.32 \;\Rightarrow\; x_u \in \omega_2$$
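– A short sketch that checks this numerical example with the general quadratic discriminant (the helper mirrors the formula derived earlier; parameters are taken from the problem statement):

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """Quadratic discriminant for a Gaussian class."""
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(Sigma, d)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

mu1, mu2 = np.zeros(3), np.ones(3)
Sigma = 0.25 * np.eye(3)
P1, P2 = 1/3, 2/3                    # since P2 = 2*P1 and the priors sum to 1
x_u = np.array([0.1, 0.7, 0.8])

print(g(x_u, mu2, Sigma, P2) > g(x_u, mu1, Sigma, P1))  # True -> x_u assigned to omega_2
print(x_u.sum(), (6 - np.log(2)) / 4)                   # 1.6 vs. the threshold (6 - log 2)/4
```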


Conclusions

• The examples in this lecture illustrate the following points

– The Bayes classifier for Gaussian classes (general case) is quadratic
– The Bayes classifier for Gaussian classes with equal covariance is linear
– The Mahalanobis distance classifier is Bayes-optimal for
  • normally distributed classes and
  • equal covariance matrices and
  • equal priors
– The Euclidean distance classifier is Bayes-optimal for
  • normally distributed classes and
  • equal covariance matrices proportional to the identity matrix and
  • equal priors
– Both the Euclidean and Mahalanobis distance classifiers are linear classifiers

• Thus, some of the simplest and most popular classifiers can be derived from decision-theoretic principles
– Using a specific (Euclidean or Mahalanobis) minimum-distance classifier implicitly corresponds to certain statistical assumptions
– Whether these assumptions hold can rarely be answered in practice; in most cases we can only determine whether the classifier solves our problem