MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve...

19
MLE’s, Bayesian Classifiers and Naïve Bayes and Naïve Bayes Required reading: • Mitchell draft chapter (on class website) Machine Learning 10-601 Machine Learning 10 601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 28, 2008

Transcript of MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve...

Page 1: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes

Required reading:

• Mitchell draft chapter (on class website)

Machine Learning 10-601Machine Learning 10 601

Tom M. MitchellMachine Learning Department

Carnegie Mellon University

January 28, 2008

Page 2: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Estimating Parameters• Maximum Likelihood Estimate (MLE): choose θ that maximizes probability of observed dataθ that maximizes probability of observed data

• Maximum a Posteriori (MAP) estimate: choose θ that is most probable given prior probability and the data

Page 3: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required
Page 4: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Estimating MLE for Bernoulli model

Page 5: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Naïve Bayes and Logistic Regression

• Design learning algorithms based on probabilistic modelprobabilistic model– Learn f: X Y, or better yet P(Y|X)

• Two of the most widely used algorithms

• Interesting relationship between these g ptwo:– Generative vs Discriminative classifiers

Page 6: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Bayes Rule

Which is shorthand for:

Random ith possible value of YRandom Variable

p

Page 7: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Bayes Rule

Which is shorthand for:

Equivalently:

Page 8: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Bayes Rule

Which is shorthand for:

Common abbreviation:

Page 9: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Bayes ClassifierX Y

Training data:X Y

Learning = estimating P(X|Y), P(Y)Classification = using Bayes rule toClassification = using Bayes rule to

calculate P(Y | Xnew)

Page 10: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Bayes ClassifierX Y

Training data:X Y

How shall we represent P(X|Y), P(Y)?How many parameters must we estimate?How many parameters must we estimate?

Page 11: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Bayes ClassifierX Y

Training data:X Y

How shall we represent P(X|Y), P(Y)?How many parameters must we estimate?How many parameters must we estimate?

Page 12: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Naïve Bayes

Naïve Bayes assumes

i th t X d X diti lli.e., that Xi and Xj are conditionally independent given Y, for all i≠j

Page 13: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Conditional IndependenceDefinition: X is conditionally independent of Y given Z,

if the probability distribution governing X is independent of the value of Y, given the value of Z

Which we often writeWhich we often write

E.g.,

Page 14: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Naïve Bayes uses assumption that the Xi are conditionally independent given Yindependent, given Y

Given this assumption, then:Given this assumption, then:

in general:

How many parameters needed to describe P(X|Y)? P(Y)?• Without conditional indep assumption?• Without conditional indep assumption?• With conditional indep assumption?

Page 15: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

How many parameters to estimate?P(X1 X | Y) ll i bl b lP(X1, ... Xn | Y), all variables booleanWithout conditional independence assumption:

With conditional independence assumption:

Page 16: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Naïve Bayes in a NutshellBayes rule:

Assuming conditional independence among Xi’s:

So, classification rule for Xnew =h X1, …, Xn i is:

Page 17: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Naïve Bayes Algorithm – discrete Xi

• Train Naïve Bayes (examples) for each* value yfor each value yk

estimatefor each* value x of each attribute Xfor each value xij of each attribute Xi

estimate

• Classify (Xnew)

* probabilities must sum to 1, so need estimate only n-1 parameters...

Page 18: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Estimating Parameters• Maximum Likelihood Estimate (MLE): choose θ that maximizes probability of observed dataθ that maximizes probability of observed data

• Maximum a Posteriori (MAP) estimate: choose θ that is most probable given prior probability and the data

Page 19: MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayestom/10601_sp08/slides/NBayes-1-28-2008-plus.pdf · MLE’s, Bayesian Classifiers and Naïve Bayesand Naïve Bayes Required

Estimating Parameters: Y, Xi discrete-valued

Maximum likelihood estimates (MLE’s):

N b f D Number of items in set D for which Y=yk