Exercises in Machine Learning: Playing with Kernels


  • Exercises in Machine Learning: Playing with Kernels

    Zdeněk Žabokrtský, Ondřej Bojar

    Institute of Formal and Applied Linguistics
    Faculty of Mathematics and Physics
    Charles University, Prague

    Tue Apr 19, 2016

  • Outline

    - Regularization parameter C in SVM.
    - Linear Kernel: k(x, y) = x·y
    - Polynomial Kernel: k(x, y) = (gamma · (x·y) + coef0)^degree
    - RBF Kernel: k(x, y) = exp(-gamma · ||x - y||^2); gamma > 0
      ... including their parameters
    - Cross-validation Heatmap
    - Multi-class SVM
      - For the PAMAP-easy dataset.
      - Regularization parameters.
      - Inseparable classes.

    Based on http://scikit-learn.org/stable/modules/svm.html and other scikit-learn demos.


  • Regularization (C) in linear SVM

    k(x, y) = x·y

    (Linear kernel = no kernel)

    The parameter C in (linear) SVM:
    - sets the weight of the sum of slack variables.
    - serves as a regularization parameter.
    - controls the number of support vectors.

    C    | Penalty for errors | Number of points considered | Margin | Bias | Variance
    Low  | Low                | Many                        | Wide   | High | Low
    High | High               | Few                         | Narrow | Low  | High

    Think C for VarianCe.
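  • Sketch: sweeping C in scikit-learn

    A minimal sketch of the sweep behind the following figure slides, assuming
    scikit-learn and a hypothetical toy dataset in place of the slides' data:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Hypothetical two-class toy data standing in for the slides' dataset.
    X, y = make_blobs(n_samples=100, centers=2, random_state=0)

    for C in [0.1, 0.2, 0.5, 1, 5, 10, 20, 50, 100]:
        clf = SVC(kernel="linear", C=C).fit(X, y)
        # Higher C = higher penalty on slack: fewer support vectors, narrower margin.
        print(C, clf.n_support_.sum())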

  • SVM Linear, C = 0.1 / 0.2 / 0.5 / 1 / 5 / 10 / 20 / 50 / 100

    [Nine figure slides: decision boundary and support vectors of the linear SVM for each value of C.]

  • Polynomial Kernel

    k(x, y) = (gamma · (x·y) + coef0)^degree
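  • Sketch: polynomial kernel in scikit-learn

    How the slide's parameters map onto scikit-learn's SVC arguments; the data
    is the same hypothetical stand-in as above:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # hypothetical toy data

    # degree, gamma and coef0 correspond directly to the slide's
    # k(x, y) = (gamma * (x.y) + coef0)^degree.
    clf = SVC(kernel="poly", degree=3, gamma=0.5, coef0=1.0).fit(X, y)
    print(clf.score(X, y))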

  • SVM Poly, degree = 1 ... 9

    [Nine figure slides: decision boundaries as the degree grows.]

  • SVM Poly, degree 3, gamma = 0.05 / 0.1 / 0.2 / 0.5 / 0.7 / 1 / 2

    [Seven figure slides: decision boundaries as gamma grows.]

  • SVM Poly, d=3, g=0.5, coef0 = -2.0 / -1.0 / -0.5 / 0 / 0.5 / 1 / 2

    [Seven figure slides: decision boundaries as coef0 varies.]

  • RBF Kernels

    k(x, y) = exp(-gamma · ||x - y||^2); gamma > 0

    [Plot: the bells exp(-0.5 x^2), exp(-1 x^2), exp(-2 x^2) and exp(-3 x^2) for x in [-10, 10].]

    - Each training point creates its own bell.
    - The overall shape is the sum of the bells.
    - Kind of an all-nearest-neighbours model.
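  • Sketch: plotting the bells

    A minimal matplotlib sketch reproducing the plot above:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-10, 10, 400)
    for gamma in [0.5, 1, 2, 3]:
        # One Gaussian bell per gamma; larger gamma = narrower bell.
        plt.plot(x, np.exp(-gamma * x ** 2), label=f"exp(-{gamma}*x*x)")
    plt.legend()
    plt.show()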

  • RBF Kernel Parameters

    C    | Decision surface | Model   | Bias | Variance
    Low  | Smooth           | Simple  | High | Low
    High | Peaked           | Complex | Low  | High

    gamma | Affected points
    Low   | can be far from training examples
    High  | must be close to training examples

    - Does higher gamma lead to higher variance?
    - The choice is critical for SVM performance.
    - Advised to use GridSearchCV for C and gamma (a sketch follows):
      - exponentially spaced probes
      - wide range
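  • Sketch: GridSearchCV over C and gamma

    A minimal sketch, assuming scikit-learn and the same hypothetical toy data;
    the probes are exponentially spaced over a wide range, as advised above:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)  # hypothetical toy data

    param_grid = {
        "C": np.logspace(-2, 3, 6),      # 0.01 ... 1000
        "gamma": np.logspace(-3, 2, 6),  # 0.001 ... 100
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
    print(search.best_params_, search.best_score_)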

  • SVM RBF, gamma = 2, C = 0.05 / 0.1 / 0.2 / 0.5 / 0.6 / 0.7 / 1 / 2 (and back down to 0.5)

    [Figure slides: decision boundaries as C is swept up and back.]

  • SVM RBF, C = 0.5, gamma swept up to 5 and 10, then back down through 5, 2, 1, 0.7, 0.5, 0.2, 0.1, 0.05

    [Figure slides: decision boundaries as gamma is swept up and back.]

  • Cross-validation Heatmap

    http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html
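  • Sketch: scores as a heatmap

    A minimal sketch of turning the grid-search results into a heatmap, assuming
    matplotlib and the `search` and `param_grid` names from the previous sketch
    (gamma varies fastest in cv_results_, so the reshape is (C, gamma)):

    import matplotlib.pyplot as plt

    scores = search.cv_results_["mean_test_score"].reshape(
        len(param_grid["C"]), len(param_grid["gamma"]))

    plt.imshow(scores, interpolation="nearest", cmap=plt.cm.hot)
    plt.xlabel("gamma")
    plt.ylabel("C")
    plt.xticks(range(len(param_grid["gamma"])), param_grid["gamma"], rotation=45)
    plt.yticks(range(len(param_grid["C"])), param_grid["C"])
    plt.colorbar()
    plt.show()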

  • Multi-class SVM

    Two implementations in scikit-learn:
    - SVC: one-against-one
      - n(n-1)/2 classifiers constructed
      - supports various kernels, incl. custom ones
    - LinearSVC: one-vs-the-rest
      - n classifiers trained
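  • Sketch: the two multi-class schemes

    A minimal sketch contrasting the two, assuming scikit-learn's bundled iris
    data (3 classes) as a stand-in:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC, LinearSVC

    X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

    # One-against-one: n(n-1)/2 = 3 pairwise classifiers.
    ovo = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
    print(ovo.decision_function(X[:1]).shape)  # (1, 3): one score per class pair

    # One-vs-the-rest: n = 3 classifiers, one weight vector per class.
    ovr = LinearSVC(max_iter=10000).fit(X, y)
    print(ovr.coef_.shape)  # (3, 4)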

  • PAMAP-easy Training Data

    [Scatter plot of the training data: x axis 60-200, y axis 30-34, legend listing activity classes 1-7, 12, 13, 16, 17 and 24.]

  • Default View (every 200 / 300 / 400)

    [Three figure slides: multi-class decision regions at subsampling rates of every 200, 300 and 400 points.]

  • Regularization, C = 0.5 / 1 / 5 / 10 / 20 / 50 / 500 / 5000

    [Eight figure slides: multi-class decision regions for each value of C.]

  • Inseparable classes 12, 13 (every 200 / 100 / 80 / 60 / 55)

    [Five figure slides: decision regions for the overlapping classes as the subsampling gets denser.]

  • Task and Homework #07

    Required minimum, for PAMAP-Easy as divided into train+test:
    - Cross-validate to choose between linear, poly and RBF.
    - Create the heatmap for RBF (i.e. plot the score for all values of C and gamma).
    - Use GridSearchCV to find the best C and gamma (i.e. find the best without plotting anything).
    - Enter the accuracy of the best setting into classification results.txt; mention C and gamma in the comment.

    Optional:
    - Use GridSearchCV on all our datasets.
    - Enter the accuracies of the best settings into classification results.txt; mention C and gamma in the comment.

    Due: 2 weeks from now, i.e. May 3.
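  • Sketch: the required minimum

    A minimal end-to-end sketch of the required workflow; the blob data and the
    train/test split are hypothetical stand-ins for the PAMAP-Easy split:

    from sklearn.datasets import make_blobs
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Hypothetical data; load the PAMAP-Easy train+test split here instead.
    X, y = make_blobs(n_samples=300, centers=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Cross-validate over the three kernels and their parameters at once.
    param_grid = [
        {"kernel": ["linear"], "C": [0.1, 1, 10, 100]},
        {"kernel": ["poly"], "degree": [2, 3], "C": [0.1, 1, 10], "gamma": [0.1, 0.5, 1]},
        {"kernel": ["rbf"], "C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]},
    ]
    search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)
    print(search.best_params_)
    print("test accuracy:", search.score(X_test, y_test))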