Naive Bayes Classifiers - Wilkes University

Page 1: Naive Bayes Classifiers - Wilkes University

kNN (k-Nearest Neighbor)

Page 2: Naive Bayes Classifiers - Wilkes University

kNN (Instance Based Classifier)

• Uses k “closest” points (nearest neighbors)

• Requires a similarity (distance) metric

• Similarity between items a and b:

– C, common (shared) features

– A, features unique in a

– B, features unique in b

S = θC - αA - βB, where θ, α, β ≥ 0
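The similarity score S can be sketched in a few lines of Python. This is a minimal illustration that assumes items are represented as sets of feature names and that C, A and B are simple counts. The function name `contrast_similarity`, the default weights and the example feature sets are hypothetical, not taken from the slides.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """S = theta*C - alpha*A - beta*B, with C, A, B taken as feature counts."""
    if min(theta, alpha, beta) < 0:
        raise ValueError("theta, alpha and beta must be non-negative")
    a, b = set(a), set(b)
    common = len(a & b)   # C: features shared by a and b
    only_a = len(a - b)   # A: features unique to a
    only_b = len(b - a)   # B: features unique to b
    return theta * common - alpha * only_a - beta * only_b


# Hypothetical feature sets, for illustration only.
cat = {"fur", "four legs", "tail", "whiskers"}
dog = {"fur", "four legs", "tail", "barks"}
bird = {"feathers", "two legs", "tail", "wings"}

print(contrast_similarity(cat, dog))   # 3 shared features, 1 unique on each side
print(contrast_similarity(cat, bird))  # 1 shared feature, 3 unique on each side
```

With these weights the cat and dog sets score higher than the cat and bird sets, because shared features add to S while unshared ones subtract from it.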

Page 3: Naive Bayes Classifiers - Wilkes University

One of these things…

Page 4: Naive Bayes Classifiers - Wilkes University

kNN Classifiers

• Requirements

– The set of stored records

– The distance metric

– The value of k

• Classification Algorithm

– Compute distance from item to other stored records

– Identify its k nearest neighbors

– Use their class labels to determine the unknown class label

– May weigh the vote according to distance, e.g. use w = 1/d^2

[Figure: scatter plot of stored records labeled c and d, with an unlabeled point “?” to be classified]
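The classification steps above can be sketched as follows. This is a minimal illustration, not a tuned implementation: it assumes Euclidean distance as the metric, and the function name `knn_predict`, the `weighted` flag and the sample records are hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(records, query, k=3, weighted=False):
    """Classify `query` from (point, label) records by a k-nearest-neighbor vote."""
    # Step 1: distance from the query to every stored record (Euclidean here).
    distances = [(math.dist(point, query), label) for point, label in records]
    # Step 2: keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: vote, optionally weighting each neighbor by w = 1/d^2.
    votes = defaultdict(float)
    for d, label in nearest:
        if weighted:
            if d == 0:
                return label  # exact match: avoid dividing by zero
            votes[label] += 1.0 / d**2
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)


# Hypothetical stored records with class labels "c" and "d".
records = [
    ((1.0, 1.0), "c"), ((1.5, 2.0), "c"), ((2.0, 1.5), "c"),
    ((5.0, 5.0), "d"), ((6.0, 5.5), "d"), ((5.5, 6.5), "d"),
]
print(knn_predict(records, (2.0, 2.0), k=3))
print(knn_predict(records, (5.0, 6.0), k=3, weighted=True))
```

Distance weighting matters most when the k neighbors disagree: one very close neighbor can then outvote several distant ones.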

Page 5: Naive Bayes Classifiers - Wilkes University

(next: video clip examining parameters on website)

Page 6: Naive Bayes Classifiers - Wilkes University

Reducing Complexity

• Decrease training set size

• Help distance metric

– Apply PCA to reduce features

– Neighborhood Components Analysis

• Precompute distances

– Nearest Neighbor Transformer

• Change search strategy
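One way to picture the "precompute distances" idea is to sort each point's neighbors once and reuse that ordering for every value of k. The sketch below is a plain-Python illustration under that assumption; it is not the scikit-learn transformer, and the helper names and sample data are hypothetical.

```python
import math
from collections import Counter


def precompute_neighbor_labels(points, labels):
    """For each point, return the labels of all other points sorted by distance."""
    ordered = []
    for i, p in enumerate(points):
        others = [(math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i]
        others.sort(key=lambda pair: pair[0])
        ordered.append([label for _, label in others])
    return ordered


def leave_one_out_accuracy(ordered, labels, k):
    """Majority vote over the first k cached neighbors of each point."""
    correct = 0
    for neighbor_labels, truth in zip(ordered, labels):
        predicted = Counter(neighbor_labels[:k]).most_common(1)[0][0]
        correct += predicted == truth
    return correct / len(labels)


# Hypothetical data: three points of class "c" and three of class "d".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]
labels = ["c", "c", "c", "d", "d", "d"]

ordered = precompute_neighbor_labels(points, labels)  # distances computed once
for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(ordered, labels, k))
```

Trying several values of k then costs only a slice and a vote per point, since the expensive distance computation is not repeated.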

Page 7: Naive Bayes Classifiers - Wilkes University

Naive Bayes Classifiers

Page 8: Naive Bayes Classifiers - Wilkes University

Naive Bayes is

• a supervised learning algorithm

• a classification algorithm

• a probabilistic classifier, based on Bayes’ theorem of probability

Page 9: Naive Bayes Classifiers - Wilkes University

Bayes’ Theorem

Let A and B be events. Then

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where

P(A) and P(B) are the probabilities of observing events A and B, respectively, with P(B) ≠ 0

P(A|B) is a conditional probability: the probability of event A occurring given that B is true

P(B|A) is also a conditional probability: the probability of event B occurring given that A is true
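A small numeric check can make the theorem concrete. The sketch below uses made-up probabilities, chosen only for illustration, and obtains P(B) from the law of total probability. The function name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, chosen only for illustration.
p_a = 0.01              # P(A)
p_b_given_a = 0.90      # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

print(round(bayes_posterior(p_b_given_a, p_a, p_b), 4))
```

Even though P(B|A) is high here, P(A|B) stays fairly small because the prior P(A) is small.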

Page 10: Naive Bayes Classifiers - Wilkes University

Naive Bayes

Let a dataset have two classes, for instance {cats, dogs}. Every data point has a set of features (variables). Then

$$P(\text{class} \mid \text{feature set}) = \frac{P(\text{feature set} \mid \text{class})\,P(\text{class})}{P(\text{feature set})}$$

where

P(class|feature set) is called the posterior: the probability that an observation is a cat (for example, that an image shows a cat), given the set of features observed in it.

P(class) is called the prior: the (unscaled) probability that a randomly chosen observation is a cat.

P(feature set|class) is called the scaler (also called the likelihood): it scales the prior up or down given this specific set of features.

P(feature set) is called the normalizer (evidence): the probability of observing this set of features in our dataset.

Page 11: Naive Bayes Classifiers - Wilkes University

Naive Bayes

“Naive” = assumption that all the features in the data are independent of one another, given the class! (This strong assumption rarely holds in the real world though.)

The method is simple and computationally fast!
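Written out, the independence assumption lets the likelihood factor into one term per feature. The notation $x_1, \dots, x_n$ for the individual features is introduced here for illustration and does not appear on the slides.

$$P(x_1, \dots, x_n \mid \text{class}) = \prod_{i=1}^{n} P(x_i \mid \text{class})$$

so that

$$P(\text{class} \mid x_1, \dots, x_n) = \frac{P(\text{class}) \prod_{i=1}^{n} P(x_i \mid \text{class})}{P(x_1, \dots, x_n)}$$

Each factor can be estimated from the data separately, one feature at a time, which is a large part of why the method is fast.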

Page 12: Naive Bayes Classifiers - Wilkes University

Example

Suppose we have 60 cats and 40 dogs in our dataset. Each data point is a vector of n features.

Given particular values for the first two features (feature 1 and feature 2), what is the probability of a data point being a cat or a dog?

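Feature values    Cats (count, fraction)    Dogs (count, fraction)
Total             60                        40
feature 1         50, 5/6                   5, 1/8
feature 2         45, 3/4                   10, 1/4
both features     37.5, 15/24               5/4, 1/32

Under the naive assumption, the fraction with both features is the product of the single-feature fractions, and the expected count is that fraction times the class total: 60 × 15/24 = 37.5 cats and 40 × 1/32 = 5/4 dogs.

$$P(\text{Cat} \mid \text{both features}) = \frac{37.5}{37.5 + 5/4} \approx 97\%$$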
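The example can be reproduced with exact fractions. The sketch below assumes the per-class counts from the table (50 and 45 of 60 cats, 5 and 10 of 40 dogs) and multiplies the single-feature fractions, as the naive assumption prescribes. The variable names are hypothetical.

```python
from fractions import Fraction

totals = {"cat": 60, "dog": 40}
# Number of animals in each class showing feature 1 and feature 2.
counts = {"cat": (50, 45), "dog": (5, 10)}

n = sum(totals.values())
joint = {}
for cls, total in totals.items():
    prior = Fraction(total, n)                # P(class)
    likelihood = Fraction(1)
    for count in counts[cls]:
        likelihood *= Fraction(count, total)  # P(feature i | class)
    joint[cls] = prior * likelihood           # P(feature set | class) P(class)

evidence = sum(joint.values())                # P(feature set)
posterior = {cls: value / evidence for cls, value in joint.items()}

for cls, p in posterior.items():
    print(cls, p, f"{float(p):.1%}")
```

The posterior for cats comes out at about 97%, and the two posteriors sum to 1 because the evidence term normalizes them.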

Page 13: Naive Bayes Classifiers - Wilkes University

Naive Bayes in Scikit-Learn

Page 14: Naive Bayes Classifiers - Wilkes University

References

“A Comparison of Event Models for Naive Bayes Text Classification” by Andrew McCallum and Kamal Nigam

“Spam Filtering with Naive Bayes – Which Naive Bayes?” by Vangelis Metsis, et al.

“Pattern Recognition and Machine Learning” by Christopher Bishop

“Image Classification Using Naive Bayes Classifier” by Dong-Chul Park

Naive Bayes in Python (Scikit Learn): https://scikit-learn.org/stable/modules/naive_bayes.html

Page 15: Naive Bayes Classifiers - Wilkes University

ANN (brief discussion)

Page 16: Naive Bayes Classifiers - Wilkes University

Live Session IV (pause video here)