Linear Discrimination Centering on Support Vector Machines


  • 1. CHAPTER 10: Linear Discrimination Eick/Alpaydın: Topic 13

2. Likelihood- vs. Discriminant-based Classification

  • Likelihood-based: assume a model for p(x | C_i) and use Bayes' rule to calculate P(C_i | x)
  • g_i(x) = log P(C_i | x)
  • Discriminant-based: assume a model for the discriminant g_i(x | Φ_i) directly; no density estimation
  • Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)

3. Linear Discriminant

  • Linear discriminant: g_i(x) = w_i^T x + w_i0 (a weighted sum of the attributes plus a bias term)
  • Advantages:
    • Simple: O(d) space/computation
    • Knowledge extraction: weighted sum of attributes; positive/negative weights and their magnitudes are interpretable (e.g., credit scoring)
    • Optimal when the p(x | C_i) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable (a minimal sketch follows this list)
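
A minimal sketch, not from the slides, of evaluating such a linear discriminant with NumPy; the weight vector w and bias w0 below are made-up illustration values:

```python
import numpy as np

# Hypothetical weights for a 3-attribute credit-scoring example (illustration only).
w = np.array([0.8, -1.2, 0.3])   # positive weight raises the score, negative lowers it
w0 = -0.5                        # bias term

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

x = np.array([1.0, 0.4, 2.0])    # one example with 3 attributes
print("g(x) =", g(x))            # choose class C1 if g(x) > 0, else C2
```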

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)

4. Generalized Linear Model

  • Quadratic discriminant: g_i(x) = x^T W_i x + w_i^T x + w_i0
  • Higher-order (product) terms, e.g. z1 = x1, z2 = x2, z3 = x1^2, z4 = x2^2, z5 = x1 x2
  • Map from x to z using nonlinear basis functions and use a linear discriminant in z-space (a brief sketch follows this list)
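
A brief sketch, assuming the quadratic basis functions listed above, of mapping x to z and applying a linear discriminant in z-space (the weights are made up):

```python
import numpy as np

def phi(x):
    """Quadratic basis functions: map x = (x1, x2) to z-space."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

# Hypothetical weights of a linear discriminant in z-space (illustration only).
w = np.array([0.5, -0.3, 1.0, 1.0, -2.0])
w0 = 0.1

x = np.array([0.7, -1.2])
g = w @ phi(x) + w0      # linear in z, non-linear (quadratic) in x
print("g(x) =", g)
```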

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)

5. Two Classes

6. Geometry

7. Support Vector Machines

  • One Possible Solution

8. Support Vector Machines

  • Another possible solution

9. Support Vector Machines

  • Other possible solutions

10. Support Vector Machines

  • Which one is better? B1 or B2?
  • How do you define better?

11. Support Vector Machines

  • Find the hyperplane that maximizes the margin ⇒ B1 is better than B2

12. Support Vector Machines

  • Training examples are (x1, .., xn, y) with y ∈ {-1, 1}

13. Support Vector Machines

  • We want to maximize the margin: 2 / ||w||
    • Which is equivalent to minimizing: ||w||^2 / 2
    • But subject to the following N constraints: y_i (w · x_i + b) ≥ 1, for i = 1, .., N
    • This is a constrained convex quadratic optimization problem that can be solved in polynomial time
      • Numerical approaches to solve it (e.g., quadratic programming) exist (see the sketch after this list)
      • The function to be optimized has only a single (global) minimum, so there is no local-minimum problem
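
A minimal sketch, assuming scikit-learn is available, of solving this quadratic program with an off-the-shelf linear SVM; the toy data set is made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two small clusters labeled -1 and +1 (illustration only).
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# kernel='linear' solves the (soft-margin) quadratic program for a separating hyperplane.
clf = SVC(kernel='linear', C=1e6)   # a very large C approximates the hard-margin case
clf.fit(X, y)

print("w =", clf.coef_[0], " b =", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
```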

14. Support Vector Machines

  • What if the problem is not linearly separable?

15. Linear SVM for Non-linearly Separable Problems

  • What if the problem is not linearly separable?
    • Introduce slack variables ξ_i ≥ 0, one per training example
    • Need to minimize: ||w||^2 / 2 + C Σ_i ξ_i
    • Subject to (i = 1, .., N): y_i (w · x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0
    • C is chosen using a validation set, trying to keep the margin wide while keeping the training error low (a tuning sketch follows this list).
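
A rough sketch, assuming scikit-learn; the synthetic data, the split, and the candidate C values are arbitrary illustration choices:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Made-up, slightly noisy two-class data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0, 1, -1)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1, 10, 100]:          # candidate trade-off values
    clf = SVC(kernel='linear', C=C).fit(X_train, y_train)
    acc = clf.score(X_val, y_val)          # validation accuracy
    if acc > best_acc:
        best_C, best_acc = C, acc

print("chosen C:", best_C, "validation accuracy:", best_acc)
```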

In this objective, C Σ_i ξ_i measures the prediction error, ||w||^2 / 2 is inversely related to the size of the margin between the hyperplanes, C is the trade-off parameter, and the slack variables ξ_i allow constraint violations to a certain degree.

16. Nonlinear Support Vector Machines

  • What if decision boundary is not linear?

Alternative 1: Use a technique that employs non-linear decision boundaries (a non-linear function).

17. Nonlinear Support Vector Machines

  • Transform the data into a higher-dimensional space
  • Find the best hyperplane using the methods introduced earlier

Alternative 2: Transform into a higher-dimensional attribute space and find linear decision boundaries in this space.

18. Nonlinear Support Vector Machines

    • Choose a non-linear kernel function to transform into a different, usually higher-dimensional, attribute space
    • Minimize: ||w||^2 / 2
    • But subject to the following N constraints: y_i (w · φ(x_i) + b) ≥ 1, for i = 1, .., N (a fitting sketch follows this list)
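
A minimal sketch, assuming scikit-learn; the RBF kernel and its parameters are illustrative choices, not prescribed by the slides:

```python
import numpy as np
from sklearn.svm import SVC

# Toy non-linearly separable data: the class depends on the distance from the origin (illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(np.linalg.norm(X, axis=1) > 1.2, 1, -1)

# The RBF kernel implicitly maps into a higher-dimensional space; the hyperplane
# found there corresponds to a non-linear boundary in the original space.
clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)
print("training accuracy:", clf.score(X, y))
```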

Find a good hyperplane in the transformed space.

19. Example: Polynomial Kernel Function

  • Polynomial kernel function (degree 2):
  • φ(x1, x2) = (x1^2, x2^2, sqrt(2)*x1*x2, sqrt(2)*x1, sqrt(2)*x2, 1)
  • K(u, v) = φ(u) · φ(v) = (u · v + 1)^2
  • A support vector machine with a polynomial kernel function classifies a new example z as follows:
  • sign((Σ_i λ_i y_i φ(x_i) · φ(z)) + b) =
  • sign((Σ_i λ_i y_i (x_i · z + 1)^2) + b)
  • Remark: the λ_i and b are determined using the methods for linear SVMs that were discussed earlier (a small numeric check of the kernel identity follows this list)
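
A small numeric check, not from the slides, that the explicit mapping φ and the kernel shortcut agree, i.e. that φ(u) · φ(v) = (u · v + 1)^2:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for 2-D inputs."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

def K(u, v):
    """Polynomial kernel computed directly in the original 2-D space."""
    return (np.dot(u, v) + 1) ** 2

u = np.array([0.3, -1.7])
v = np.array([2.0, 0.5])

print(np.dot(phi(u), phi(v)))   # dot product in the 6-D transformed space
print(K(u, v))                  # same value, computed in the original space
```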

Kernel function trick: perform the computations in the original space even though we solve an optimization problem in the transformed space; this is more efficient. More details in Topic 14.

20. Summary Support Vector Machines

  • Support vector machines learn hyperplanes that separate two classes by maximizing the margin between them (the empty space between the instances of the two classes).
  • Support vector machines introduce slack variables when the classes are not linearly separable, trying to maximize the margin while keeping the training error low.
  • The most popular versions of SVMs use non-linear kernel functions to map the attribute space into a higher-dimensional space, which makes it easier to find good linear decision boundaries in the modified space.
  • Support vector machines find margin-optimal hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to become too slow once the number of examples goes beyond roughly 500/2000/5000 (depending on the implementation).
  • In general, support vector machines achieve quite high accuracies compared to other techniques.

21. Useful Support Vector Machine Links

  • Lecture notes that are much more helpful for understanding the basic ideas: http://www.ics.uci.edu/~welling/teaching/KernelsICS273B/Kernels.html and http://cerium.raunvis.hi.is/~tpr/courseware/svm/kraekjur.html
  • Tools that are often used in publications: libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ and spider: http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html
  • Tutorial slides: http://www.support-vector.net/icml-tutorial.pdf
  • Surveys: http://www.svms.org/survey/Camp00.pdf
  • More general material: http://www.learning-with-kernels.org/ , http://www.kernel-machines.org/ , http://kernel-machines.org/publications.html , http://www.support-vector.net/tutorial.html
  • Remark: Thanks to Chaofan Sun for providing these links!

22. Optimal Separating Hyperplane

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1) (Cortes and Vapnik, 1995; Vapnik, 1995). Alpaydın's transparencies on support vector machines were not used in the lecture!

23. Margin

  • Margin: the distance from the discriminant to the closest instances on either side
  • Distance of x^t to the hyperplane is |w^T x^t + w0| / ||w||
  • We require r^t (w^T x^t + w0) / ||w|| ≥ ρ, for all t
  • For a unique solution, fix ρ ||w|| = 1; then, to maximize the margin, minimize ||w||^2 / 2 subject to r^t (w^T x^t + w0) ≥ +1, for all t (a short distance sketch follows this list)
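
A short sketch with made-up hyperplane parameters, computing the distance of an instance to the hyperplane and its signed margin:

```python
import numpy as np

# Hypothetical hyperplane parameters (illustration only).
w = np.array([2.0, -1.0])
w0 = 0.5

def distance_to_hyperplane(x):
    """Distance of x to the hyperplane w^T x + w0 = 0."""
    return abs(w @ x + w0) / np.linalg.norm(w)

x_t, r_t = np.array([1.0, 3.0]), -1      # one labeled instance
print("distance:", distance_to_hyperplane(x_t))
print("signed margin:", r_t * (w @ x_t + w0) / np.linalg.norm(w))  # must be >= rho
```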

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)

24. (formula slide)

25. (formula slide)

26. Most of the α^t are 0 and only a small number have α^t > 0; the corresponding x^t are the support vectors.

27. Soft Margin Hyperplane

  • Not linearly separable: r^t (w^T x^t + w0) ≥ 1 - ξ^t, with slack variables ξ^t ≥ 0
  • Soft error: Σ_t ξ^t
  • New primal is L_p = ||w||^2 / 2 + C Σ_t ξ^t - Σ_t α^t [r^t (w^T x^t + w0) - 1 + ξ^t] - Σ_t μ^t ξ^t (an objective-evaluation sketch follows this list)
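
A rough sketch with made-up data and parameters; it evaluates the equivalent hinge-loss form ||w||^2/2 + C Σ_t ξ^t with ξ^t = max(0, 1 - r^t (w^T x^t + w0)), not the full Lagrangian above:

```python
import numpy as np

# Hypothetical data, labels in {-1, +1}, and a candidate hyperplane (illustration only).
X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 4.5], [5.0, 4.0]])
r = np.array([-1, -1, 1, 1])
w, w0, C = np.array([0.6, 0.8]), -4.0, 1.0

# Slack needed by each instance: xi_t = max(0, 1 - r_t (w^T x_t + w0)).
xi = np.maximum(0.0, 1.0 - r * (X @ w + w0))

soft_error = xi.sum()
objective = 0.5 * np.dot(w, w) + C * soft_error
print("slacks:", xi)
print("soft error:", soft_error, " objective:", objective)
```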

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)

28. Kernel Machines

  • Preprocess input x by basis functions:
  • z = φ(x), g(z) = w^T z
  • g(x) = w^T φ(x)
  • The SVM solution: w = Σ_t α^t r^t φ(x^t), so g(x) = Σ_t α^t r^t φ(x^t)^T φ(x) = Σ_t α^t r^t K(x^t, x) (a small decision-function sketch follows this list)
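
A minimal sketch with made-up support vectors, labels, dual coefficients, and bias, evaluating the kernelized discriminant g(x) = Σ_t α^t r^t K(x^t, x) + w0:

```python
import numpy as np

def K(u, v):
    """Degree-2 polynomial kernel, as in the earlier example."""
    return (np.dot(u, v) + 1) ** 2

# Hypothetical support vectors, labels r^t, dual coefficients alpha^t, and bias (illustration only).
support_vectors = np.array([[1.0, 2.0], [3.0, 0.5], [2.0, 2.5]])
r = np.array([1, -1, 1])
alpha = np.array([0.4, 0.7, 0.3])
w0 = -0.2

def g(x):
    """Kernelized discriminant: a sum over the support vectors only."""
    return sum(a * rt * K(sv, x) for a, rt, sv in zip(alpha, r, support_vectors)) + w0

x_new = np.array([2.5, 1.0])
print("g(x) =", g(x_new), "-> class", 1 if g(x_new) > 0 else -1)
```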

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)

29. Kernel Functions

  • Polynomials of degree q: K(x^t, x) = (x^T x^t + 1)^q
  • Radial-basis functions: K(x^t, x) = exp(-||x^t - x||^2 / (2 s^2))
  • Sigmoidal functions: K(x^t, x) = tanh(2 x^T x^t + 1) (small implementations follow this list)
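
Small sketches of these three kernel functions in NumPy; the parameters q and s are free choices:

```python
import numpy as np

def poly_kernel(xt, x, q=2):
    """Polynomial kernel of degree q."""
    return (np.dot(x, xt) + 1) ** q

def rbf_kernel(xt, x, s=1.0):
    """Radial-basis (Gaussian) kernel with width s."""
    return np.exp(-np.linalg.norm(xt - x) ** 2 / (2 * s ** 2))

def sigmoid_kernel(xt, x):
    """Sigmoidal kernel."""
    return np.tanh(2 * np.dot(x, xt) + 1)

u, v = np.array([1.0, 0.5]), np.array([-0.5, 2.0])
print(poly_kernel(u, v), rbf_kernel(u, v), sigmoid_kernel(u, v))
```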

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1) (Cherkassky and Mulier, 1998)

30. SVM for Regression

  • Use a linear model (possibly kernelized):
    • f(x) = w^T x + w0
  • Use the ε-sensitive error function: e_ε(r^t, f(x^t)) = 0 if |r^t - f(x^t)| < ε, and |r^t - f(x^t)| - ε otherwise; errors smaller than ε are ignored (a regression sketch follows this list)
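
A minimal sketch, assuming scikit-learn; the data and the choices of C and ε (epsilon) are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression data: a noisy linear relationship (illustration only).
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(100, 1))
r = 1.5 * X[:, 0] + 0.5 + rng.normal(scale=0.2, size=100)

# epsilon is the width of the insensitive tube; errors smaller than epsilon are not penalized.
reg = SVR(kernel='linear', C=1.0, epsilon=0.1).fit(X, r)
print("slope:", reg.coef_[0][0], " intercept:", reg.intercept_[0])
print("prediction at x=2:", reg.predict([[2.0]])[0])
```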

Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)