# Machine Learning - Introduction to Bayesian Classification

Date posted: 14-Aug-2015
Category: Engineering


### Transcript of Machine Learning - Introduction to Bayesian Classification

- 1. Machine Learning for Data Mining: Introduction to Bayesian Classifiers. Andres Mendez-Vazquez, August 3, 2015. 1 / 71
- 2. Outline. 1 Introduction: Supervised Learning; Naive Bayes; The Naive Bayes Model; The Multi-Class Case; Minimizing the Average Risk. 2 Discriminant Functions and Decision Surfaces: Introduction; Gaussian Distribution; Influence of the Covariance; Maximum Likelihood Principle; Maximum Likelihood on a Gaussian.
- 3. Classification problem. Training data: samples of the form $(d, h(d))$, where $d$ is the data object to classify (the input) and $h(d)$ is the correct class label for $d$, with $h(d) \in \{1, \ldots, K\}$.
- 7. Classification Problem. Goal: given a new object $d_{new}$, provide $h(d_{new})$. The machinery, in general, is supervised learning: the training information supplies the desired/target output for each input. [Figure: block diagram from INPUT through the learned model to OUTPUT.]
- 9. Naive Bayes Model. Task for two classes: let $\omega_1, \omega_2$ be the two classes to which our samples can belong. There is a prior probability of belonging to each class: $P(\omega_1)$ for class 1 and $P(\omega_2)$ for class 2. The rule for classification is $$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)} \quad (1)$$ Remark: Bayes taken to the next level.
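Equation (1) can be sketched in a few lines of code. The likelihood and prior values below are illustrative assumptions, not taken from the slides:

```python
# Minimal sketch of Bayes' rule (Equation 1) for a finite set of classes.

def posterior(likelihoods, priors):
    """Return P(w_i | x) for each class, given P(x | w_i) and P(w_i)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))  # P(x)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Assumed values: p(x|w1) = 0.5, p(x|w2) = 0.2, priors P(w1) = P(w2) = 0.5.
post = posterior([0.5, 0.2], [0.5, 0.5])
print(post)  # the two posteriors sum to one
```

Dividing by the evidence is what turns the products $P(x \mid \omega_i) P(\omega_i)$ into proper probabilities.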
- 13. In informal English, we have that $$\text{posterior} = \frac{\text{likelihood} \times \text{prior information}}{\text{evidence}} \quad (2)$$ Basically: one, we can observe $x$; two, we can convert the prior information into posterior information.
- 16. We have the following terms... Likelihood: we call $p(x \mid \omega_i)$ the likelihood of $\omega_i$ given $x$; for a given category $\omega_i$, if $p(x \mid \omega_i)$ is large, then $\omega_i$ is a likely class for $x$. Prior probability: the known probability of a given class; because we often lack information about it, we tend to use the uniform distribution, although other choices are possible. Evidence: a scale factor that guarantees that the posterior probabilities sum to one.
- 22. The most important term in all of this is the factor $$\text{likelihood} \times \text{prior information} \quad (3)$$
- 23. Example: [Figure: the likelihoods $p(x \mid \omega_1)$ and $p(x \mid \omega_2)$ of two classes.]
- 24. Example: [Figure: the posteriors of the two classes when $P(\omega_1) = \frac{2}{3}$ and $P(\omega_2) = \frac{1}{3}$.]
- 25. Naive Bayes Model. In the case of two classes: $$P(x) = \sum_{i=1}^{2} p(x, \omega_i) = \sum_{i=1}^{2} p(x \mid \omega_i)\, P(\omega_i) \quad (4)$$
- 26. Error in this rule. We have that $$P(\text{error} \mid x) = \begin{cases} P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\ P(\omega_2 \mid x) & \text{if we decide } \omega_1 \end{cases} \quad (5)$$ Thus, we have that $$P(\text{error}) = \int P(\text{error}, x)\, dx = \int P(\text{error} \mid x)\, p(x)\, dx \quad (6)$$
- 28. Classification Rule. Thus, we have the Bayes classification rule: 1. If $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, $x$ is classified to $\omega_1$. 2. If $P(\omega_1 \mid x) < P(\omega_2 \mid x)$, $x$ is classified to $\omega_2$.
- 30. What if we remove the normalization factor? Remember that $$P(\omega_1 \mid x) + P(\omega_2 \mid x) = 1 \quad (7)$$ Since $P(x)$ is a common positive factor, we obtain the equivalent Bayes classification rule: 1. If $P(x \mid \omega_1)\, P(\omega_1) > P(x \mid \omega_2)\, P(\omega_2)$, $x$ is classified to $\omega_1$. 2. If $P(x \mid \omega_1)\, P(\omega_1) < P(x \mid \omega_2)\, P(\omega_2)$, $x$ is classified to $\omega_2$.
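The equivalence can be checked numerically: dropping the common factor $1/P(x)$ never changes which side of the inequality wins. A small sketch with assumed (illustrative) values:

```python
# Sketch: the decision is the same with or without the evidence P(x),
# because P(x) scales both sides of the comparison equally.

def decide_normalized(lik, priors):
    evidence = sum(l * p for l, p in zip(lik, priors))
    posts = [l * p / evidence for l, p in zip(lik, priors)]
    return 1 if posts[0] > posts[1] else 2

def decide_unnormalized(lik, priors):
    scores = [l * p for l, p in zip(lik, priors)]  # likelihood * prior
    return 1 if scores[0] > scores[1] else 2

# Illustrative values: likelihoods 0.3 and 0.4, priors 2/3 and 1/3.
lik, priors = [0.3, 0.4], [2 / 3, 1 / 3]
assert decide_normalized(lik, priors) == decide_unnormalized(lik, priors)
```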
- 33. We have several cases. If for some $x$ we have $P(x \mid \omega_1) = P(x \mid \omega_2)$, the final decision relies completely on the prior probabilities. On the other hand, if $P(\omega_1) = P(\omega_2)$, the classes are equally probable a priori, and the decision is based entirely on the likelihoods $P(x \mid \omega_i)$.
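Both special cases fall directly out of the unnormalized rule. A tiny sketch, with illustrative numbers of my own choosing:

```python
# Sketch of the two special cases of the rule likelihood * prior.

def decide(lik, priors):
    scores = [l * p for l, p in zip(lik, priors)]
    return 1 if scores[0] > scores[1] else 2

# Case 1: equal likelihoods -> the priors alone decide.
assert decide([0.4, 0.4], [0.7, 0.3]) == 1

# Case 2: equal priors -> the likelihoods alone decide.
assert decide([0.2, 0.6], [0.5, 0.5]) == 2
```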
- 35. How the rule looks: if $P(\omega_1) = P(\omega_2)$, the rule depends only on the term $p(x \mid \omega_i)$.
- 36. The error in the second case of Naive Bayes. Error for equiprobable classes: $$P(\text{error}) = \frac{1}{2} \int_{-\infty}^{x_0} p(x \mid \omega_2)\, dx + \frac{1}{2} \int_{x_0}^{+\infty} p(x \mid \omega_1)\, dx \quad (8)$$ Remark: $P(\omega_1) = P(\omega_2) = \frac{1}{2}$, and $x_0$ is the decision threshold where the two posteriors are equal.
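Equation (8) can be evaluated in closed form when the class-conditional densities are Gaussian. A sketch under assumed distributions $p(x \mid \omega_1) = \mathcal{N}(0, 1)$ and $p(x \mid \omega_2) = \mathcal{N}(2, 1)$ (these values are not from the slides):

```python
import math

# For two equiprobable, equal-variance Gaussian classes, the threshold
# x0 sits where the likelihoods cross: halfway between the means.

def gauss_cdf(x, mu, sigma):
    """Cumulative distribution of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu1, mu2, sigma = 0.0, 2.0, 1.0
x0 = (mu1 + mu2) / 2.0  # decision threshold

# 1/2 * int_{-inf}^{x0} p(x|w2) dx  +  1/2 * int_{x0}^{inf} p(x|w1) dx
p_error = 0.5 * gauss_cdf(x0, mu2, sigma) + 0.5 * (1.0 - gauss_cdf(x0, mu1, sigma))
print(round(p_error, 4))  # about 0.1587
```

Each half-integral is the tail mass of one class spilling across the threshold into the other class's region.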
- 37. What do we want to prove? Something notable: the Bayesian classifier is optimal with respect to minimizing the classification error probability.
- 38. Proof
