Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems...

22
CS 6364 Perceptrons

Transcript of Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems...

Page 1: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

CS6364Perceptrons

Page 2: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

LinearClassifiers

§ Inputsarefeaturevalues§ Eachfeaturehasaweight§ Sumistheactivation

§ Iftheactivationis:§ Positive,output+1§ Negative,output-1

Σf1f2f3

w1w2w3

>0?

Page 3: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Weights§ Binarycase:comparefeaturestoaweightvector§ Learning:figureouttheweightvectorfromexamples

# free : 2YOUR_NAME : 0MISSPELLED : 2FROM_FRIEND : 0...

# free : 4YOUR_NAME :-1MISSPELLED : 1FROM_FRIEND :-3...

# free : 0YOUR_NAME : 1MISSPELLED : 1FROM_FRIEND : 1...

Dot product positive means the positive class

Page 4: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

DecisionRules

Page 5: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

BinaryDecisionRule

§ Inthespaceoffeaturevectors§ Examplesarepoints§ Anyweightvectorisahyperplane§ OnesidecorrespondstoY=+1§ OthercorrespondstoY=-1

BIAS : -3free : 4money : 2... 0 1

0

1

2

freemoney

+1=SPAM

-1=HAM

Page 6: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

WeightUpdates

Page 7: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Learning:BinaryPerceptron

§ Startwithweights=0§ Foreachtraininginstance:

§ Classifywithcurrentweights

§ Ifcorrect(i.e.,y=y*),nochange!

§ Ifwrong:adjusttheweightvector

Page 8: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Learning:BinaryPerceptron

§ Startwithweights=0§ Foreachtraininginstance:

§ Classifywithcurrentweights

§ Ifcorrect(i.e.,y=y*),nochange!§ Ifwrong:adjusttheweightvectorbyaddingorsubtractingthefeaturevector.Subtractify*is-1.

Page 9: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Examples:Perceptron

§ SeparableCase

Page 10: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

MulticlassDecisionRule

§ Ifwehavemultipleclasses:§ Aweightvectorforeachclass:

§ Score(activation)ofaclassy:

§ Predictionhighestscorewins

Binary=multiclasswherethenegativeclasshasweightzero

Page 11: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Learning:MulticlassPerceptron

§ Startwithallweights=0§ Pickuptrainingexamplesonebyone§ Predictwithcurrentweights

§ Ifcorrect,nochange!§ Ifwrong:lowerscoreofwronganswer,

raisescoreofrightanswer

Page 12: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Example:MulticlassPerceptron

BIAS : 1win : 0game : 0 vote : 0 the : 0 ...

BIAS : 0 win : 0 game : 0 vote : 0 the : 0 ...

BIAS : 0 win : 0 game : 0 vote : 0 the : 0 ...

“winthevote”

“wintheelection”

“winthegame”

Page 13: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

PropertiesofPerceptrons

§ Separability:trueifsomeparametersgetthetrainingsetperfectlycorrect

§ Convergence:ifthetrainingisseparable,perceptron willeventuallyconverge(binarycase)

§ MistakeBound:themaximumnumberofmistakes(binarycase)relatedtothemargin ordegreeofseparability

Separable

Non-Separable

Page 14: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Examples:Perceptron

§ Non-SeparableCase

Page 15: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

ImprovingthePerceptron

Page 16: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

ProblemswiththePerceptron

§ Noise:ifthedataisn’tseparable,weightsmightthrash§ Averagingweightvectorsovertime

canhelp(averagedperceptron)

§ Mediocregeneralization:findsa“barely”separatingsolution

§ Overtraining:test/held-outaccuracyusuallyrises,thenfalls§ Overtrainingisakindofoverfitting

Page 17: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

FixingthePerceptron

§ Idea:adjusttheweightupdatetomitigatetheseeffects

§ MIRA*:chooseanupdatesizethatfixesthecurrentmistake…

§ …but,minimizesthechangetow

§ The+1helpstogeneralize

*MarginInfusedRelaxedAlgorithm

Page 18: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

MinimumCorrectingUpdate

minnotτ=0,orwouldnothavemadeanerror,sominwillbewhereequalityholds

Page 19: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

MaximumStepSize

§ Inpractice,it’salsobadtomakeupdatesthataretoolarge§ Examplemaybelabeledincorrectly§ Youmaynothaveenoughfeatures§ Solution:capthemaximumpossiblevalueofτ withsome

constantC

§ Correspondstoanoptimizationthatassumesnon-separabledata§ Usuallyconvergesfasterthanperceptron§ Usuallybetter,especiallyonnoisydata

Page 20: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

LinearSeparators

§ Whichoftheselinearseparatorsisoptimal?

Page 21: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

SupportVectorMachines

§ Maximizingthemargin:goodaccordingtointuition,theory,practice§ Onlysupportvectorsmatter;othertrainingexamplesareignorable§ Supportvectormachines(SVMs)findtheseparatorwithmaxmargin§ Basically,SVMsareMIRAwhereyouoptimizeoverallexamplesatonce

MIRA

SVM

Page 22: Perceptrons - University of Texas at Dallasvgogate/ai/fall16/grad/slides/perceptron.pdf · Problems with the Perceptron § Noise: if the data isn’t separable, weights might thrash

Classification:Comparison

§ NaïveBayes§ Buildsamodeltrainingdata§ Givespredictionprobabilities§ Strongassumptionsaboutfeatureindependence§ Onepassthroughdata(counting)

§ Perceptrons/MIRA:§ Makeslessassumptionsaboutdata§ Mistake-drivenlearning§ Multiplepassesthroughdata(prediction)§ Oftenmoreaccurate