
MA2823: Foundations of Machine Learning Homework 8

Due November 25, 2016.

Question 1

Let us consider the training data $\{(x_1, y_1), \dots, (x_n, y_n)\}$ where $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, +1\}$. A soft-margin SVM solves

$$\arg\min_{w \in \mathbb{R}^p,\, b \in \mathbb{R},\, \xi \in \mathbb{R}^n} \;\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \qquad (1)$$

$$\text{s.t.}\quad y_i(\langle w, x_i \rangle + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad \forall i \in \{1, \dots, n\}$$

(a) Is a soft-margin SVM more likely to overfit if C is large or small?

Solution: If C is large, more weight is given to the error on the training set (the slack term), so the SVM fits the training points more tightly and is more likely to overfit.

(b) Give one way of choosing C in practice.

Solution: By cross-validation.
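In practice, cross-validation over a grid of C values is straightforward. A minimal sketch using scikit-learn (the library choice, the synthetic dataset, and the grid of C values are illustrative assumptions, not part of the homework):

```python
# Choosing C by cross-validation: try several values, keep the one
# with the best cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy dataset for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 5-fold cross-validation estimates the generalization error for each C.
grid = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
    cv=5,
)
grid.fit(X, y)
print("best C:", grid.best_params_["C"])
```

Typically C is searched on a logarithmic grid, since its effect is multiplicative on the slack penalty.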

(c) What does it mean for a feature j if the solution $w_j$ is close to 0?

Solution: That this feature is uninformative: the predicted class does not depend on it. (Note: we are talking about a feature weight $w_j$, not a support vector coefficient $\alpha_i$.)
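This can be seen directly from the decision function $\langle w, x \rangle + b$: a feature with weight 0 drops out of the inner product. A small NumPy illustration (the weight vector and points are made-up toy values):

```python
import numpy as np

# Toy weights: feature j=1 has weight exactly 0, so it cannot
# influence the decision function <w, x> + b.
w = np.array([1.5, 0.0, -2.0])
b = 0.3

def decision(x):
    """SVM decision function for a single point x."""
    return np.dot(w, x) + b

x1 = np.array([1.0, -5.0, 2.0])
x2 = np.array([1.0, 100.0, 2.0])  # only feature 1 changed

# The two values are identical: feature 1 is ignored.
print(decision(x1), decision(x2))
```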

(d) Give an interpretation of the two terms of Equation (1): $\|w\|^2$ and $\sum_{i=1}^{n} \xi_i$.

Solution: $\|w\|$ is inversely proportional to the margin (the margin is $2/\|w\|$), so minimizing $\|w\|^2$ maximizes the margin.
$\sum_{i=1}^{n} \xi_i$ is the sum of the slacks $\xi_i$, each of which quantifies how much training point $i$ violates the margin: $\xi_i$ is positive for misclassified points and for correctly classified points that fall inside the margin.
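The "inverse of the margin" statement can be made precise with a short derivation (standard SVM geometry, not part of the original solution):

```latex
% With the canonical scaling of the constraints,
%   y_i(\langle w, x_i \rangle + b) \geq 1,
% the points closest to the separating hyperplane satisfy
%   \langle w, x \rangle + b = \pm 1.
% The distance from a point x_0 to the hyperplane
% \langle w, x \rangle + b = 0 is |\langle w, x_0 \rangle + b| / \|w\|,
% so each margin hyperplane lies at distance 1/\|w\| and the total margin is
\text{margin} \;=\; \frac{1}{\|w\|} + \frac{1}{\|w\|} \;=\; \frac{2}{\|w\|}.
```

Hence minimizing $\frac{1}{2}\|w\|^2$ in Equation (1) is equivalent to maximizing the margin.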