
MLE’s, Bayesian Classifiers and Naïve Bayes

Required reading:

• Mitchell draft chapter (on class website)

Machine Learning 10-601

Tom M. Mitchell
Machine Learning Department

Carnegie Mellon University

January 28, 2008

Estimating Parameters

• Maximum Likelihood Estimate (MLE): choose the θ that maximizes the probability of the observed data D:

  θ̂_MLE = argmax_θ P(D | θ)

• Maximum a Posteriori (MAP) estimate: choose the θ that is most probable given the prior probability and the data:

  θ̂_MAP = argmax_θ P(θ | D) = argmax_θ P(D | θ) P(θ)

Estimating MLE for Bernoulli model

Observing a sequence of coin flips with αH heads and αT tails, the MLE is the empirical fraction of heads:

θ̂_MLE = αH / (αH + αT)
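As a concrete illustration, here is a minimal Python sketch of both estimators for the Bernoulli case (not from the lecture): the MLE is the raw frequency, while the MAP estimate shown assumes a Beta(a, b) prior and returns its posterior mode; the flip counts and hyperparameters are made up.

```python
# A minimal sketch (not from the slides): MLE vs. MAP for a Bernoulli
# parameter theta. The MAP estimate assumes a Beta(a, b) prior; the flip
# counts and hyperparameters below are hypothetical.

def bernoulli_mle(heads: int, tails: int) -> float:
    """MLE: the empirical fraction of heads, alpha_H / (alpha_H + alpha_T)."""
    return heads / (heads + tails)

def bernoulli_map(heads: int, tails: int, a: float = 2.0, b: float = 2.0) -> float:
    """MAP under a Beta(a, b) prior: the posterior mode
    (alpha_H + a - 1) / (alpha_H + alpha_T + a + b - 2)."""
    return (heads + a - 1) / (heads + tails + a + b - 2)

heads, tails = 3, 1
print(bernoulli_mle(heads, tails))  # 0.75
print(bernoulli_map(heads, tails))  # (3 + 1) / (4 + 2) = 0.666...
```

With a = b = 2 the prior acts like one imaginary head and one imaginary tail, pulling the estimate toward 1/2; as the real counts grow, the MAP and MLE estimates converge.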

Naïve Bayes and Logistic Regression

• Design learning algorithms based on a probabilistic model
  – Learn f: X → Y, or better yet P(Y|X)

• Two of the most widely used algorithms

• Interesting relationship between these two:
  – Generative vs. discriminative classifiers

Bayes Rule

P(Y | X) = P(X | Y) P(Y) / P(X)

Which is shorthand for:

(∀ i, j)  P(Y = yi | X = xj) = P(X = xj | Y = yi) P(Y = yi) / P(X = xj)

where Y is a random variable and yi denotes its i-th possible value.

Equivalently, expanding P(X) by the law of total probability:

P(Y = yi | X = xj) = P(X = xj | Y = yi) P(Y = yi) / Σk P(X = xj | Y = yk) P(Y = yk)

Common abbreviation: writing P(yi | xj) for P(Y = yi | X = xj).
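As a sanity check on the equivalent form above, a tiny numeric example (the prior and likelihood values below are hypothetical, not from the slides):

```python
# Toy Bayes rule computation for a boolean Y and one observed value x.
p_y = {True: 0.3, False: 0.7}           # prior P(Y = y_k), made-up numbers
p_x_given_y = {True: 0.8, False: 0.1}   # likelihood P(X = x | Y = y_k)

# Denominator: P(X = x) = sum_k P(X = x | Y = y_k) P(Y = y_k)
p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)

# Posterior via Bayes rule
posterior = p_x_given_y[True] * p_y[True] / p_x
print(posterior)  # 0.24 / 0.31 = 0.774...
```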

Bayes Classifier

Training data: labeled examples (X, Y)

Learning = estimating P(X|Y) and P(Y)
Classification = using Bayes rule to calculate P(Y | Xnew)

Bayes Classifier

Training data: labeled examples (X, Y)

How shall we represent P(X|Y) and P(Y)?
How many parameters must we estimate?


Naïve Bayes

Naïve Bayes assumes

P(X1, …, Xn | Y) = ∏i P(Xi | Y)

i.e., that Xi and Xj are conditionally independent given Y, for all i ≠ j.

Conditional Independence

Definition: X is conditionally independent of Y given Z, if the probability distribution governing X is independent of the value of Y, given the value of Z:

(∀ i, j, k)  P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)

Which we often write:

P(X | Y, Z) = P(X | Z)

E.g., P(Thunder | Rain, Lightning) = P(Thunder | Lightning)
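A quick way to see the definition in action is to build a joint distribution that factors as P(Thunder | Lightning) P(Rain | Lightning) P(Lightning) and confirm that conditioning on Rain changes nothing; the probabilities below are made up for the check:

```python
# A made-up joint distribution over booleans Thunder (T), Rain (R),
# Lightning (L) that factors as P(T | L) * P(R | L) * P(L), i.e. T and R
# are conditionally independent given L. We then check the definition:
# P(T | R, L) computed from the joint equals P(T | L).

p_l = {True: 0.2, False: 0.8}            # P(L)
p_t_given_l = {True: 0.9, False: 0.05}   # P(T = True | L)
p_r_given_l = {True: 0.7, False: 0.3}    # P(R = True | L)

def joint(t, r, l):
    pt = p_t_given_l[l] if t else 1 - p_t_given_l[l]
    pr = p_r_given_l[l] if r else 1 - p_r_given_l[l]
    return pt * pr * p_l[l]

# P(T = True | R = True, L = True), computed from the joint
num = joint(True, True, True)
den = sum(joint(t, True, True) for t in (True, False))
print(num / den)           # 0.9
print(p_t_given_l[True])   # 0.9 -- equal, as conditional independence requires
```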

Naïve Bayes uses the assumption that the Xi are conditionally independent, given Y.

Given this assumption, then:

P(X1, X2 | Y) = P(X1 | X2, Y) P(X2 | Y) = P(X1 | Y) P(X2 | Y)

in general:

P(X1, …, Xn | Y) = ∏i P(Xi | Y)

How many parameters needed to describe P(X|Y)? P(Y)?

• Without conditional indep assumption?
• With conditional indep assumption?

How many parameters to estimate?

P(X1, …, Xn | Y), all variables boolean.

Without conditional independence assumption: 2(2^n - 1) parameters (2^n - 1 for each of the two values of Y)

With conditional independence assumption: 2n parameters (n for each value of Y)
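The gap between the two counts is easy to underestimate; here is the arithmetic for a hypothetical n = 30 boolean attributes:

```python
# Parameter counts for P(X1, ..., Xn | Y) with boolean Xi and boolean Y;
# n = 30 is an arbitrary choice for illustration.
n = 30
without_ci = 2 * (2**n - 1)  # a full joint table (2^n - 1 free entries) per class
with_ci = 2 * n              # one Bernoulli parameter per attribute per class
print(without_ci)  # 2147483646, i.e. about 2.1 billion
print(with_ci)     # 60
```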

Naïve Bayes in a Nutshell

Bayes rule:

P(Y = yk | X1, …, Xn) = P(Y = yk) P(X1, …, Xn | Y = yk) / Σj P(Y = yj) P(X1, …, Xn | Y = yj)

Assuming conditional independence among the Xi’s:

P(Y = yk | X1, …, Xn) = P(Y = yk) ∏i P(Xi | Y = yk) / Σj P(Y = yj) ∏i P(Xi | Y = yj)

So, the classification rule for Xnew = ⟨X1, …, Xn⟩ is:

Ynew ← argmax_yk P(Y = yk) ∏i P(Xi_new | Y = yk)

Naïve Bayes Algorithm – discrete Xi

• Train Naïve Bayes (examples):
  for each* value yk:
    estimate πk ≡ P(Y = yk)
  for each* value xij of each attribute Xi:
    estimate θijk ≡ P(Xi = xij | Y = yk)

• Classify (Xnew):

  Ynew ← argmax_yk πk ∏i θijk, where xij is the value of Xi observed in Xnew

* since the probabilities must sum to 1, only n - 1 of n such parameters need to be estimated...
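The following is a minimal plain-Python sketch of the train and classify steps above, using relative-frequency (MLE) counts; the helper names and the toy dataset are invented for illustration, not the course’s reference code:

```python
# A minimal sketch of the algorithm above for discrete attributes.
# MLE estimates are relative-frequency counts, as in the slides.
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (x, y) where x is a tuple of discrete attribute values."""
    prior = Counter()               # counts for pi_k = P(Y = y_k)
    cond = defaultdict(Counter)     # counts for theta_ijk = P(X_i = x_ij | Y = y_k)
    for x, y in examples:
        prior[y] += 1
        for i, v in enumerate(x):
            cond[(i, y)][v] += 1
    n = len(examples)
    pi = {y: c / n for y, c in prior.items()}
    theta = {key: {v: c / sum(cnt.values()) for v, c in cnt.items()}
             for key, cnt in cond.items()}
    return pi, theta

def classify(x_new, pi, theta):
    """Return argmax_yk pi_k * prod_i P(Xi_new | Y = y_k)."""
    def score(y):
        s = pi[y]
        for i, v in enumerate(x_new):
            s *= theta[(i, y)].get(v, 0.0)  # unseen value -> probability 0 under MLE
        return s
    return max(pi, key=score)

# Hypothetical usage with made-up data:
data = [(("sunny", "hot"), "no"), (("rainy", "cool"), "yes"),
        (("sunny", "cool"), "yes"), (("rainy", "hot"), "no")]
pi, theta = train_naive_bayes(data)
print(classify(("sunny", "cool"), pi, theta))  # "yes"
```

Note that an attribute value never observed with a class drives the whole product to zero under pure MLE; MAP-style smoothed estimates avoid this.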

Estimating Parameters

• Maximum Likelihood Estimate (MLE): choose θ that maximizes the probability of the observed data

• Maximum a Posteriori (MAP) estimate: choose θ that is most probable given the prior probability and the data

Estimating Parameters: Y, Xi discrete-valued

Maximum likelihood estimates (MLE’s):

π̂k = P̂(Y = yk) = #D{Y = yk} / |D|

θ̂ijk = P̂(Xi = xij | Y = yk) = #D{Xi = xij ∧ Y = yk} / #D{Y = yk}

where #D{x} denotes the number of items in set D for which property x holds.
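A tiny worked instance of these two count formulas, on a made-up four-example dataset with one attribute:

```python
# D: four (x, y) items; estimate pi_hat = P(Y = yes) and
# theta_hat = P(X1 = sunny | Y = yes) by the MLE count formulas above.
D = [(("sunny",), "yes"), (("sunny",), "yes"),
     (("rainy",), "yes"), (("sunny",), "no")]

count_y = sum(1 for _, y in D if y == "yes")        # #D{Y = yes} = 3
pi_hat = count_y / len(D)                           # 3 / 4 = 0.75
count_xy = sum(1 for x, y in D
               if x[0] == "sunny" and y == "yes")   # #D{X1 = sunny AND Y = yes} = 2
theta_hat = count_xy / count_y                      # 2 / 3
print(pi_hat, theta_hat)
```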