Naive Bayes Classifiers - Wilkes University


kNN (k-Nearest Neighbor)

kNN (Instance-Based Classifier)

• Uses the k “closest” points (nearest neighbors) to classify an item
• Requires a similarity (distance) metric
• Similarity between items a and b:
  – C, the features common to (shared by) a and b
  – A, the features unique to a
  – B, the features unique to b

  S = θC - αA - βB, where θ, α, β ≥ 0
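As a rough illustration, here is a minimal Python sketch of this feature-overlap similarity. The feature sets and the weight values θ, α, β below are made up for the example, not taken from the slides.

```python
# Sketch: feature-overlap similarity S = theta*C - alpha*A - beta*B,
# where C = shared features, A = features only in a, B = features only in b.
# The weights are illustrative values, not prescribed by the slides.

def similarity(a: set, b: set, theta: float = 1.0, alpha: float = 0.5, beta: float = 0.5) -> float:
    common = len(a & b)   # C: features shared by a and b
    only_a = len(a - b)   # A: features unique to a
    only_b = len(b - a)   # B: features unique to b
    return theta * common - alpha * only_a - beta * only_b

# Example: two items described by categorical features
cat = {"whiskers", "fur", "meows", "retractable_claws"}
dog = {"whiskers", "fur", "barks", "fetches"}
print(similarity(cat, dog))   # 2*1.0 - 2*0.5 - 2*0.5 = 0.0
```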

One of these things…

kNN Classifiers

• Requirements
  – The set of stored records
  – The distance metric
  – The value of k
• Classification algorithm
  – Compute the distance from the item to the stored records
  – Identify its k nearest neighbors
  – Use their class labels to determine the unknown class label, e.g. by majority vote (see the sketch below)
  – The vote may be weighted by distance, e.g. using w = 1/d²
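A minimal Python sketch of this procedure with distance-weighted voting (w = 1/d²). The stored records and query point are toy data invented for the example.

```python
# Minimal sketch of distance-weighted kNN voting on made-up toy data.
from collections import defaultdict
import math

def knn_classify(query, records, k=3):
    """records: list of (feature_vector, label) pairs."""
    # Distance from the query to every stored record (Euclidean)
    dists = [(math.dist(query, x), label) for x, label in records]
    dists.sort(key=lambda pair: pair[0])

    # Weighted vote among the k nearest neighbors, w = 1/d^2
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d * d) if d > 0 else float("inf")
    return max(votes, key=votes.get)

records = [((1.0, 1.0), "c"), ((1.2, 0.8), "c"), ((5.0, 5.2), "d"), ((4.8, 5.0), "d")]
print(knn_classify((1.1, 0.9), records, k=3))   # -> "c"
```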

[Figure: scatter of points labeled c and d, with an unknown point “?” to be classified from its nearest neighbors]

(next: video clip examining parameters on website)

Reducing Complexity

• Decrease the training set size
• Help the distance metric
  – Apply PCA to reduce the number of features
  – Neighborhood Component Analysis (see the sketch below)
• Precompute distances
  – Nearest Neighbors Transformer
• Change the search strategy
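One possible way to combine these ideas in scikit-learn: project the features with Neighborhood Component Analysis, then classify with a distance-weighted kNN backed by a tree index instead of brute-force search. The dataset and parameter values below are illustrative, not prescribed by the slides.

```python
# Sketch: NCA projection + tree-indexed, distance-weighted kNN in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    # Learn a low-dimensional projection before the kNN vote
    NeighborhoodComponentsAnalysis(n_components=2, random_state=0),
    # KD-tree search strategy and 1/d-style distance weighting
    KNeighborsClassifier(n_neighbors=5, weights="distance", algorithm="kd_tree"),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # held-out accuracy
```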

Naive Bayes Classifiers

Naive Bayes is

• a supervised learning algorithm
• a classification algorithm
• a probabilistic classifier, based on Bayes’ theorem of probability

Bayes’ Theorem

Let A and B be events. Then

P(A|B) = P(B|A) P(A) / P(B)

where

• P(A) and P(B) are the probabilities of observing events A and B, respectively
• P(A|B) is a conditional probability: the likelihood of event A occurring given that B is true
• P(B|A) is also a conditional probability: the likelihood of event B occurring given that A is true
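A quick numeric sanity check of the formula; the probabilities below are made up for illustration.

```python
# Bayes' theorem with made-up probabilities (illustrative values only).
p_a = 0.6          # P(A): e.g. probability a random image is a cat
p_b_given_a = 0.9  # P(B|A): probability of seeing feature B on a cat
p_b = 0.7          # P(B): overall probability of seeing feature B

p_a_given_b = p_b_given_a * p_a / p_b   # Bayes' theorem
print(round(p_a_given_b, 3))            # 0.771
```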

Naive Bayes

Let a dataset have two classes, for instance {cats, dogs}. Every data point has a set of features (variables). Then

P(class|feature set) = P(feature set|class) P(class) / P(feature set)

where

• P(class|feature set) is called the posterior: the probability of classifying a data point as a cat (an image of a cat), given a set of features observed in cats.
• P(class) is called the prior: the (unscaled) probability that a randomly chosen observation is a cat.
• P(feature set|class) is called the likelihood (the “scaler”): it scales the prior up or down, given this specific set of features.
• P(feature set) is called the normalizer (evidence): the probability of observing what we are observing (the set of features) in our dataset.

Naive Bayes

“Naive” = the assumption that all the features in the data are independent of one another! (This strong assumption rarely holds in the real world, though.)

The method is simple and computationally fast!

Example

Suppose we have 60 cats and 40 dogs in our dataset. Each data point is a vector of n features.

Given particular values for the first two features (feature 1 and feature 2), what is the probability of a data point being a cat or a dog?

Feature values     Cats            Dogs
Total              60              40
feature 1          50    (5/6)     5     (1/8)
feature 2          45    (3/4)     10    (1/4)
both features      37.5  (15/24)   5/4   (1/32)

Under the naive independence assumption, the “both features” fractions are the products of the single-feature fractions: 5/6 × 3/4 = 15/24 for cats and 1/8 × 1/4 = 1/32 for dogs, giving expected counts of 60 × 15/24 = 37.5 cats and 40 × 1/32 = 5/4 dogs. Then

P(Cat|both features) = 37.5 / (37.5 + 5/4) ≈ 97%
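The same arithmetic in Python, using exact fractions so the 97% result can be verified; the class and feature labels are just names for this example.

```python
# Reproduce the worked example: 60 cats, 40 dogs, per-class feature probabilities.
from fractions import Fraction as F

priors = {"cat": F(60, 100), "dog": F(40, 100)}
likelihoods = {
    "cat": [F(5, 6), F(3, 4)],   # P(feature 1|cat), P(feature 2|cat)
    "dog": [F(1, 8), F(1, 4)],   # P(feature 1|dog), P(feature 2|dog)
}

# Naive assumption: multiply the per-feature likelihoods within each class.
scores = {}
for cls in priors:
    score = priors[cls]
    for p in likelihoods[cls]:
        score *= p
    scores[cls] = score

evidence = sum(scores.values())            # P(feature set), the normalizer
posterior_cat = scores["cat"] / evidence   # P(cat | both features)
print(float(posterior_cat))                # ~0.968, i.e. about 97%
```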

Naive Bayes in Scikit-Learn
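As a minimal illustration of the scikit-learn API, the sketch below trains GaussianNB on a built-in toy dataset; the dataset and parameters are chosen for illustration only.

```python
# Minimal scikit-learn naive Bayes sketch (GaussianNB on the built-in wine dataset;
# the dataset choice is illustrative).
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()                    # assumes Gaussian-distributed features per class
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))      # held-out accuracy
print(clf.predict_proba(X_test[:3]))  # class posteriors for the first 3 test points
```

For count or binary features, scikit-learn also provides MultinomialNB and BernoulliNB; see the naive_bayes documentation linked under References.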

References

• “A Comparison of Event Models for Naive Bayes Text Classification” by Andrew McCallum and Kamal Nigam
• “Spam Filtering with Naive Bayes – Which Naive Bayes?” by Vangelis Metsis et al.
• “Pattern Recognition and Machine Learning” by Christopher Bishop
• “Image Classification Using Naive Bayes Classifier” by Dong-Chul Park
• Naive Bayes in Python (Scikit-Learn): https://scikit-learn.org/stable/modules/naive_bayes.html

ANN (brief discussion)

Live Session IV (pause video here)