
Solving SVMs (SMO Algorithms)

Jordan Boyd-Graber, University of Colorado Boulder
LECTURE 9A

Jordan Boyd-Graber | Boulder Solving SVMs (SMO Algorithms) | 1 of 19


Dual Objective

Plan

Dual Objective

Algorithm Big Picture

The Algorithm

Recap


Dual Objective

Lagrange Multipliers

Introduce Lagrange variables αi ≥ 0, i ∈ [1, m], one for each of the m constraints (one per data point).

L(w, b, α) = (1/2) ||w||² − Σ_{i=1}^m αi [yi (w · xi + b) − 1]   (1)


Dual Objective

Solving Lagrangian

Weights

~w = Σ_{i=1}^m αi yi ~xi   (2)

Bias

0 = Σ_{i=1}^m αi yi   (3)

Support Vector-ness (complementary slackness)

αi = 0 ∨ yi (w · xi + b) = 1   (4)
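Equations (2)–(4) translate directly into code. A minimal sketch in pure Python (the helper names are ours, not from the lecture): recover w via (2), then read b off any support vector, where complementary slackness gives yi (w · xi + b) = 1, i.e. b = yi − w · xi.

```python
def recover_w(alpha, X, y):
    """Equation (2): w = sum_i alpha_i y_i x_i."""
    d = len(X[0])
    w = [0.0] * d
    for a_i, y_i, x_i in zip(alpha, y, X):
        for k in range(d):
            w[k] += a_i * y_i * x_i[k]
    return w

def recover_b(alpha, X, y, w, tol=1e-8):
    """Pick any support vector (alpha_i > 0); there y_i (w . x_i + b) = 1,
    so b = y_i - w . x_i (using y_i in {-1, +1})."""
    for a_i, y_i, x_i in zip(alpha, y, X):
        if a_i > tol:
            return y_i - sum(wk * xk for wk, xk in zip(w, x_i))
    raise ValueError("no support vectors")
```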


Dual Objective

Reparameterize in terms of α

max_~α  Σ_{i=1}^m αi − (1/2) Σ_{i=1}^m Σ_{j=1}^m αi αj yi yj (~xi · ~xj)   (5)

subject to αi ≥ 0 and Σ_{i=1}^m αi yi = 0.
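For concreteness, the dual objective (5) can be evaluated directly for a candidate α; a pure-Python sketch (function names are ours):

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, y):
    """Equation (5): sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)."""
    m = len(X)
    total = sum(alpha)
    for i in range(m):
        for j in range(m):
            total -= 0.5 * alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
    return total
```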


Dual Objective

Strawman: Coordinate Descent

• Why not optimize one coordinate αi at a time?

• The constraint Σi αi yi = 0 would be violated!

• So we optimize pairs (αi, αj) at a time


Dual Objective

Outline for SVM Optimization (SMO)

1. Select two examples i, j

2. Get a learning rate η

3. Update αj

4. Update αi


Algorithm Big Picture

Plan

Dual Objective

Algorithm Big Picture

The Algorithm

Recap



Algorithm Big Picture

Contrast with SG

• There’s a learning rate η that depends on the data

• Use the error of an example to derive update

• You update multiple α at once: if one goes up, the other should go down, because Σi yi αi = 0


Algorithm Big Picture

More details

• We enforce 0 ≤ αi ≤ C (the box constraint from slackness)

• How do we know we’ve converged?

αi = 0 ⇒ yi (w · xi + b) ≥ 1   (6)

αi = C ⇒ yi (w · xi + b) ≤ 1   (7)

0 < αi < C ⇒ yi (w · xi + b) = 1   (8)

(Karush-Kuhn-Tucker Conditions)

• Keep checking (to some tolerance)
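Conditions (6)–(8) become a small predicate; a sketch in pure Python (the name and tolerance handling are ours), where f(xi) = w · xi + b:

```python
def kkt_violated(alpha_i, y_i, f_xi, C, tol=1e-3):
    """Check conditions (6)-(8) for one example, up to a tolerance.
    Returns True if the example violates KKT."""
    margin = y_i * f_xi
    if alpha_i < tol:                 # alpha_i == 0: need margin >= 1
        return margin < 1 - tol
    if alpha_i > C - tol:             # alpha_i == C: need margin <= 1
        return margin > 1 + tol
    return abs(margin - 1) > tol      # 0 < alpha_i < C: need margin == 1
```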


The Algorithm

Plan

Dual Objective

Algorithm Big Picture

The Algorithm

Recap


The Algorithm

Step 1: Select i and j

• Find some i ∈ {1, …, m} that violates the KKT conditions

• Choose j randomly from m − 1 other options

• You can do better (particularly for large datasets)

• Repeat until KKT conditions are met


The Algorithm

Step 2: Optimize αj

1. Compute upper (H) and lower (L) bounds that ensure 0 ≤ αj ≤ C.

If yi ≠ yj:

L = max(0, αj − αi)   (9)

H = min(C, C + αj − αi)   (10)

If yi = yj:

L = max(0, αi + αj − C)   (11)

H = min(C, αj + αi)   (12)

This is because the update for αi is based on yiyj (sign matters)
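Equations (9)–(12) as code; a minimal sketch (the function name is ours):

```python
def clip_bounds(alpha_i, alpha_j, y_i, y_j, C):
    """Box bounds [L, H] for the new alpha_j, equations (9)-(12).
    The y_i != y_j case shifts the box along alpha_j - alpha_i;
    the y_i == y_j case along alpha_i + alpha_j."""
    if y_i != y_j:
        L = max(0.0, alpha_j - alpha_i)
        H = min(C, C + alpha_j - alpha_i)
    else:
        L = max(0.0, alpha_i + alpha_j - C)
        H = min(C, alpha_i + alpha_j)
    return L, H
```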



The Algorithm

Step 2: Optimize αj

Compute errors for i and j

Ek ≡ f(xk) − yk   (13)

and the learning rate (the more similar the two points, the higher the step size)

η = 2xi · xj − xi · xi − xj · xj (14)

for new value for αj

α∗j = α(old)j − yj (Ei − Ej) / η   (15)

Similar to stochastic gradient, but with an additional error term. If α∗j is outside [L, H], clip it so that it is within the range.


What if xi == xj? Then η = 0 and the update is undefined.
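Equations (13)–(15), plus the compensating move for αi that the next step derives (eq. 16), fit in one pair update; a sketch in pure Python (names ours). Returning None covers both degenerate cases: η ≥ 0 (e.g. xi == xj) and an empty box L ≥ H.

```python
def smo_pair_update(x_i, x_j, y_i, y_j, E_i, E_j, a_i, a_j, L, H):
    """Equations (13)-(16): step alpha_j, clip to [L, H], then move alpha_i
    to keep sum_k y_k alpha_k = 0. Returns (a_i_new, a_j_new) or None."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    eta = 2 * dot(x_i, x_j) - dot(x_i, x_i) - dot(x_j, x_j)   # eq. (14), always <= 0
    if eta >= 0 or L >= H:
        return None                                           # degenerate pair: skip it
    a_j_new = a_j - y_j * (E_i - E_j) / eta                   # eq. (15)
    a_j_new = min(H, max(L, a_j_new))                         # clip to [L, H]
    a_i_new = a_i + y_i * y_j * (a_j - a_j_new)               # eq. (16)
    return a_i_new, a_j_new
```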


The Algorithm

Step 3: Optimize αi

Set αi :

α∗i = α(old)i + yi yj (α(old)j − α∗j)   (16)

This balances out the move that we made for αj .


The Algorithm

Step 4: Optimize the threshold b

We need the KKT conditions to be satisfied for these two examples.

• If 0 < αi < C (support vector)

b = b1 = b − Ei − yi (α∗i − α(old)i) xi · xi − yj (α∗j − α(old)j) xi · xj   (17)

• If 0 < αj < C (support vector)

b = b2 = b − Ej − yi (α∗i − α(old)i) xi · xj − yj (α∗j − α(old)j) xj · xj   (18)

• If both αi and αj are at the bounds (well away from the margin), then anything between b1 and b2 works, so we set

b = (b1 + b2) / 2   (19)
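A sketch of the threshold update (17)–(19) in pure Python (the function name and argument order are ours):

```python
def update_b(b, E_i, E_j, x_i, x_j, y_i, y_j,
             a_i_new, a_i_old, a_j_new, a_j_old, C):
    """Equations (17)-(19): take b1 or b2 when the matching alpha is interior,
    otherwise average them."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    di = y_i * (a_i_new - a_i_old)
    dj = y_j * (a_j_new - a_j_old)
    b1 = b - E_i - di * dot(x_i, x_i) - dj * dot(x_i, x_j)    # eq. (17)
    b2 = b - E_j - di * dot(x_i, x_j) - dj * dot(x_j, x_j)    # eq. (18)
    if 0 < a_i_new < C:
        return b1
    if 0 < a_j_new < C:
        return b2
    return (b1 + b2) / 2                                       # eq. (19)
```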



The Algorithm

Iterations / Details

• What if i doesn’t violate the KKT conditions? Skip it!

• What if η ≥ 0? Skip it!

• When do we stop? When we make a full pass through the α's without changing anything
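The steps above assemble into the simplified SMO variant this lecture outlines. A compact sketch in pure Python (function names, the random choice of j, and the max_sweeps cap are ours; Platt's full algorithm picks j with heuristics rather than at random):

```python
import random

def smo_train(X, y, C=1.0, tol=1e-3, max_sweeps=100, seed=0):
    """Naive SMO sweep: find a KKT-violating i, pick j at random, update the
    pair, and stop after a full pass with no changes (as on the slide)."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    m = len(X)
    alpha, b = [0.0] * m, 0.0

    def f(x):  # decision function from the current alphas and b
        return sum(alpha[k] * y[k] * dot(X[k], x) for k in range(m)) + b

    rng = random.Random(seed)
    for _ in range(max_sweeps):
        changed = 0
        for i in range(m):
            E_i = f(X[i]) - y[i]                               # eq. (13)
            # KKT check (6)-(8): y_i * E_i is the margin minus one
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.choice([k for k in range(m) if k != i])    # random partner
            E_j = f(X[j]) - y[j]
            a_i, a_j = alpha[i], alpha[j]
            if y[i] != y[j]:                                   # bounds, eqs. (9)-(12)
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2 * dot(X[i], X[j]) - dot(X[i], X[i]) - dot(X[j], X[j])
            if eta >= 0 or L >= H:
                continue                                       # degenerate pair: skip it
            a_j_new = min(H, max(L, a_j - y[j] * (E_i - E_j) / eta))  # eq. (15)
            if abs(a_j_new - a_j) < 1e-5:
                continue
            a_i_new = a_i + y[i] * y[j] * (a_j - a_j_new)      # eq. (16)
            b1 = b - E_i - y[i] * (a_i_new - a_i) * dot(X[i], X[i]) \
                         - y[j] * (a_j_new - a_j) * dot(X[i], X[j])   # eq. (17)
            b2 = b - E_j - y[i] * (a_i_new - a_i) * dot(X[i], X[j]) \
                         - y[j] * (a_j_new - a_j) * dot(X[j], X[j])   # eq. (18)
            if 0 < a_i_new < C:
                b = b1
            elif 0 < a_j_new < C:
                b = b2
            else:
                b = (b1 + b2) / 2                              # eq. (19)
            alpha[i], alpha[j] = a_i_new, a_j_new
            changed += 1
        if changed == 0:                                       # full pass, nothing moved
            break
    return alpha, b
```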


Recap

Plan

Dual Objective

Algorithm Big Picture

The Algorithm

Recap


Recap

Recap

• SMO: Optimize objective function for two data points

• Convex problem: Will converge

• Relatively fast

• Gives good performance

• Next HW!
