Exercises in Machine Learning: Playing with Kernels


  • Exercises in Machine Learning: Playing with Kernels

    Zdeněk Žabokrtský, Ondřej Bojar

    Institute of Formal and Applied Linguistics
    Faculty of Mathematics and Physics
    Charles University, Prague

    Tue Apr 19, 2016

  • Outline

    - Regularization parameter C in SVM.
    - Linear Kernel: k(x, y) = x·y
    - Polynomial Kernel: k(x, y) = (gamma · (x·y) + coef0)^degree
    - RBF Kernel: k(x, y) = exp(-gamma · ||x - y||^2); gamma > 0
      ... including their parameters
    - Cross-validation Heatmap
    - Multi-class SVM
      - For the PAMAP-easy dataset.
      - Regularization parameters.
      - Inseparable classes.

    Based on http://scikit-learn.org/stable/modules/svm.html and other scikit-learn demos.


  • Regularization (C) in linear SVM

    k(x, y) = x·y

    (Linear kernel = no kernel)

    The parameter C in (linear) SVM:
    - sets the weight of the sum of slack variables.
    - serves as a regularization parameter.
    - controls the number of support vectors.

    C    | Penalty for errors | Number of points considered | Margin | Bias | Variance
    Low  | Low                | Many                        | Wide   | High | Low
    High | High               | Few                         | Narrow | Low  | High

    Think C for VarianCe.
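  • Sketch: sweeping C in scikit-learn

    A minimal sketch of the sweep behind the following figure slides, assuming
    scikit-learn and a hypothetical toy dataset in place of the slides' data:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Hypothetical two-class toy data standing in for the slides' dataset.
    X, y = make_blobs(n_samples=100, centers=2, random_state=0)

    for C in [0.1, 0.2, 0.5, 1, 5, 10, 20, 50, 100]:
        clf = SVC(kernel="linear", C=C).fit(X, y)
        # Higher C = higher penalty on slack: fewer support vectors, narrower margin.
        print(C, clf.n_support_.sum())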

  • SVM Linear, C = 0.1 / 0.2 / 0.5 / 1 / 5 / 10 / 20 / 50 / 100

    [Nine figure slides: decision boundary and support vectors of the linear SVM for each value of C.]

  • Polynomial Kernel

    k(x, y) = (gamma · (x·y) + coef0)^degree
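  • Sketch: polynomial kernel in scikit-learn

    How the slide's parameters map onto scikit-learn's SVC arguments; the data
    is the same hypothetical stand-in as above:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # hypothetical toy data

    # degree, gamma and coef0 correspond directly to the slide's
    # k(x, y) = (gamma * (x.y) + coef0)^degree.
    clf = SVC(kernel="poly", degree=3, gamma=0.5, coef0=1.0).fit(X, y)
    print(clf.score(X, y))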

  • SVM Poly, degree = 1 ... 9

    [Nine figure slides: decision boundaries as the degree grows.]

  • SVM Poly, degree 3, gamma = 0.05 / 0.1 / 0.2 / 0.5 / 0.7 / 1 / 2

    [Seven figure slides: decision boundaries as gamma grows.]

  • SVM Poly, d=3, g=0.5, coef0 = -2.0 / -1.0 / -0.5 / 0 / 0.5 / 1 / 2

    [Seven figure slides: decision boundaries as coef0 varies.]

  • RBF Kernels

    k(x, y) = exp(-gamma · ||x - y||^2); gamma > 0

    [Plot: the bells exp(-0.5 x^2), exp(-1 x^2), exp(-2 x^2) and exp(-3 x^2) for x in [-10, 10].]

    - Each training point creates its own bell.
    - The overall shape is the sum of the bells.
    - Kind of an all-nearest-neighbours model.
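  • Sketch: plotting the bells

    A minimal matplotlib sketch reproducing the plot above:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-10, 10, 400)
    for gamma in [0.5, 1, 2, 3]:
        # One Gaussian bell per gamma; larger gamma = narrower bell.
        plt.plot(x, np.exp(-gamma * x ** 2), label=f"exp(-{gamma}*x*x)")
    plt.legend()
    plt.show()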

  • RBF Kernel Parameters

    C    | Decision surface | Model   | Bias | Variance
    Low  | Smooth           | Simple  | High | Low
    High | Peaked           | Complex | Low  | High

    gamma | Affected points
    Low   | can be far from training examples
    High  | must be close to training examples

    - Does higher gamma lead to higher variance?
    - The choice is critical for SVM performance.
    - Advised to use GridSearchCV for C and gamma (a sketch follows):
      - exponentially spaced probes
      - wide range
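  • Sketch: GridSearchCV over C and gamma

    A minimal sketch, assuming scikit-learn and the same hypothetical toy data;
    the probes are exponentially spaced over a wide range, as advised above:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)  # hypothetical toy data

    param_grid = {
        "C": np.logspace(-2, 3, 6),      # 0.01 ... 1000
        "gamma": np.logspace(-3, 2, 6),  # 0.001 ... 100
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
    print(search.best_params_, search.best_score_)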

  • SVM RBF, gamma = 2, C = 0.05 / 0.1 / 0.2 / 0.5 / 0.6 / 0.7 / 1 / 2 (and back down to 0.5)

    [Figure slides: decision boundaries as C is swept up and back.]

  • SVM RBF, C = 0.5, gamma swept up to 5 and 10, then back down through 5, 2, 1, 0.7, 0.5, 0.2, 0.1, 0.05

    [Figure slides: decision boundaries as gamma is swept up and back.]

  • Cross-validation Heatmap

    http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html
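  • Sketch: scores as a heatmap

    A minimal sketch of turning the grid-search results into a heatmap, assuming
    matplotlib and the `search` and `param_grid` names from the previous sketch
    (gamma varies fastest in cv_results_, so the reshape is (C, gamma)):

    import matplotlib.pyplot as plt

    scores = search.cv_results_["mean_test_score"].reshape(
        len(param_grid["C"]), len(param_grid["gamma"]))

    plt.imshow(scores, interpolation="nearest", cmap=plt.cm.hot)
    plt.xlabel("gamma")
    plt.ylabel("C")
    plt.xticks(range(len(param_grid["gamma"])), param_grid["gamma"], rotation=45)
    plt.yticks(range(len(param_grid["C"])), param_grid["C"])
    plt.colorbar()
    plt.show()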

  • Multi-class SVM

    Two implementations in scikit-learn:
    - SVC: one-against-one
      - n(n-1)/2 classifiers constructed
      - supports various kernels, incl. custom ones
    - LinearSVC: one-vs-the-rest
      - n classifiers trained
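  • Sketch: the two multi-class schemes

    A minimal sketch contrasting the two, assuming scikit-learn's bundled iris
    data (3 classes) as a stand-in:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC, LinearSVC

    X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

    # One-against-one: n(n-1)/2 = 3 pairwise classifiers.
    ovo = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
    print(ovo.decision_function(X[:1]).shape)  # (1, 3): one score per class pair

    # One-vs-the-rest: n = 3 classifiers, one weight vector per class.
    ovr = LinearSVC(max_iter=10000).fit(X, y)
    print(ovr.coef_.shape)  # (3, 4)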

  • PAMAP-easy Training Data

    [Scatter plot of the training data: x axis 60-200, y axis 30-34, legend listing activity classes 1-7, 12, 13, 16, 17 and 24.]

  • Default View (every 200 / 300 / 400)

    [Three figure slides: multi-class decision regions at subsampling rates of every 200, 300 and 400 points.]

  • Regularization, C = 0.5 / 1 / 5 / 10 / 20 / 50 / 500 / 5000

    [Eight figure slides: multi-class decision regions for each value of C.]

  • Inseparable classes 12, 13 (every 200 / 100 / 80 / 60 / 55)

    [Five figure slides: decision regions for the overlapping classes as the subsampling gets denser.]

  • Task and Homework #07

    Required minimum, for PAMAP-Easy as divided into train+test:
    - Cross-validate to choose between linear, poly and RBF.
    - Create the heatmap for RBF (i.e. plot the score for all values of C and gamma).
    - Use GridSearchCV to find the best C and gamma (i.e. find the best without plotting anything).
    - Enter the accuracy of the best setting into classification results.txt; mention C and gamma in the comment.

    Optional:
    - Use GridSearchCV on all our datasets.
    - Enter the accuracies of the best settings into classification results.txt; mention C and gamma in the comment.

    Due: 2 weeks from now, i.e. May 3.
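  • Sketch: the required minimum

    A minimal end-to-end sketch of the required workflow; the blob data and the
    train/test split are hypothetical stand-ins for the PAMAP-Easy split:

    from sklearn.datasets import make_blobs
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Hypothetical data; load the PAMAP-Easy train+test split here instead.
    X, y = make_blobs(n_samples=300, centers=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Cross-validate over the three kernels and their parameters at once.
    param_grid = [
        {"kernel": ["linear"], "C": [0.1, 1, 10, 100]},
        {"kernel": ["poly"], "degree": [2, 3], "C": [0.1, 1, 10], "gamma": [0.1, 0.5, 1]},
        {"kernel": ["rbf"], "C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]},
    ]
    search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)
    print(search.best_params_)
    print("test accuracy:", search.score(X_test, y_test))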