Post on 16-Mar-2018
CS 6364: Perceptrons

Linear Classifiers
§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation
§ If the activation is:
  § Positive, output +1
  § Negative, output -1
[Figure: a perceptron unit — features f1, f2, f3 weighted by w1, w2, w3 feed a sum Σ, followed by a > 0 test]

activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
Weights
§ Binary case: compare features to a weight vector
§ Learning: figure out the weight vector from examples
Feature vector f(x1):  # free : 2,  YOUR_NAME : 0,  MISSPELLED : 2,  FROM_FRIEND : 0, ...
Weight vector w:       # free : 4,  YOUR_NAME : -1, MISSPELLED : 1,  FROM_FRIEND : -3, ...
Feature vector f(x2):  # free : 0,  YOUR_NAME : 1,  MISSPELLED : 1,  FROM_FRIEND : 1, ...

A positive dot product w · f(x) means the positive class
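The dot-product test can be sketched directly; the numbers below reproduce the spam example above, and the helper names (`activation`, `classify`) are chosen here for illustration:

```python
# Sketch of linear classification by dot product, using the example
# weights and feature counts from the slide above.

def activation(weights, features):
    """Dot product of the weight vector and the feature vector."""
    return sum(weights[f] * v for f, v in features.items())

def classify(weights, features):
    """Output +1 if the activation is positive, else -1."""
    return 1 if activation(weights, features) > 0 else -1

weights = {"free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}

email_a = {"free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
email_b = {"free": 0, "YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}

print(classify(weights, email_a))  # activation is 10 -> +1 (SPAM)
print(classify(weights, email_b))  # activation is -3 -> -1 (HAM)
```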
Decision Rules

Binary Decision Rule
§ In the space of feature vectors:
  § Examples are points
  § Any weight vector defines a hyperplane
  § One side corresponds to Y = +1
  § The other corresponds to Y = -1
Example weight vector w:  BIAS : -3,  free : 4,  money : 2, ...

[Figure: decision boundary in the (free, money) feature space; the +1 side is SPAM, the -1 side is HAM]
Weight Updates
Learning: Binary Perceptron
§ Start with weights = 0
§ For each training instance:
  § Classify with current weights
  § If correct (i.e., y = y*), no change!
  § If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1 (i.e., w ← w + y* · f(x)).
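The loop above can be sketched as follows; the toy data set and the number of passes are invented for illustration:

```python
# Sketch of the binary perceptron: on a mistake, add y_star * f(x)
# to the weights (i.e., add the feature vector if y* = +1, subtract
# it if y* = -1).

def predict(w, f):
    """Classify with current weights: sign of the activation."""
    return 1 if sum(w[i] * f[i] for i in range(len(w))) > 0 else -1

def train_binary_perceptron(data, n_features, passes=10):
    w = [0.0] * n_features          # start with weights = 0
    for _ in range(passes):
        for f, y_star in data:      # for each training instance
            y = predict(w, f)       # classify with current weights
            if y != y_star:         # if wrong: adjust the weights
                for i in range(n_features):
                    w[i] += y_star * f[i]
    return w

# Toy separable data: label is +1 iff the second feature dominates.
data = [([1, 0], -1), ([0, 1], 1), ([2, 1], -1), ([1, 3], 1)]
w = train_binary_perceptron(data, n_features=2)
print(all(predict(w, f) == y for f, y in data))  # True: data is separable
```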
Examples: Perceptron
§ Separable Case
Multiclass Decision Rule
§ If we have multiple classes:
  § A weight vector for each class: w_y
  § Score (activation) of a class y: w_y · f(x)
  § Prediction: the highest score wins: y = argmax_y w_y · f(x)

Binary = multiclass where the negative class has weight zero
Learning:MulticlassPerceptron
§ Startwithallweights=0§ Pickuptrainingexamplesonebyone§ Predictwithcurrentweights
§ Ifcorrect,nochange!§ Ifwrong:lowerscoreofwronganswer,
raisescoreofrightanswer
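A sketch of these steps; the class labels, feature vectors, and pass count below are invented for illustration:

```python
# Sketch of the multiclass perceptron: one weight vector per class;
# on a mistake, lower the wrong class's weights and raise the right
# class's weights by the feature vector.

def score(w_y, f):
    """Score (activation) of a class: dot product w_y . f(x)."""
    return sum(w_y[i] * f[i] for i in range(len(f)))

def predict(weights, f):
    """Prediction: the highest-scoring class wins."""
    return max(weights, key=lambda y: score(weights[y], f))

def train_multiclass_perceptron(data, classes, n_features, passes=10):
    weights = {y: [0.0] * n_features for y in classes}  # all weights = 0
    for _ in range(passes):
        for f, y_star in data:
            y = predict(weights, f)
            if y != y_star:  # wrong: lower wrong answer, raise right one
                for i in range(n_features):
                    weights[y][i] -= f[i]
                    weights[y_star][i] += f[i]
    return weights

# Toy data: one-hot features, one class per feature.
classes = ["a", "b", "c"]
data = [([1, 0, 0], "a"), ([0, 1, 0], "b"), ([0, 0, 1], "c")]
weights = train_multiclass_perceptron(data, classes, n_features=3)
print(all(predict(weights, f) == y for f, y in data))  # True
```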
Example: Multiclass Perceptron

Initial weight vectors (one per class):
  w1:  BIAS : 1,  win : 0,  game : 0,  vote : 0,  the : 0, ...
  w2:  BIAS : 0,  win : 0,  game : 0,  vote : 0,  the : 0, ...
  w3:  BIAS : 0,  win : 0,  game : 0,  vote : 0,  the : 0, ...

Training examples: “win the vote”, “win the election”, “win the game”
Properties of Perceptrons
§ Separability: true if some parameters get the training set perfectly correct
§ Convergence: if the training data are separable, the perceptron will eventually converge (binary case)
§ Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability
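The mistake bound can be stated precisely (this is Novikoff's classic perceptron bound, not spelled out on the slide; it assumes every feature vector is bounded in norm by R and the data are separable with margin γ):

```latex
% Novikoff's bound: if \|f(x)\| \le R for all examples and some
% separator achieves margin \gamma > 0, the binary perceptron makes
% at most this many mistakes before converging:
\text{mistakes} \;\le\; \frac{R^2}{\gamma^2}
```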
[Figures: a separable data set vs. a non-separable data set]
Examples: Perceptron
§ Non-Separable Case
Improving the Perceptron
Problems with the Perceptron
§ Noise: if the data isn't separable, the weights might thrash
  § Averaging weight vectors over time can help (averaged perceptron)
§ Mediocre generalization: finds a “barely” separating solution
§ Overtraining: test / held-out accuracy usually rises, then falls
  § Overtraining is a kind of overfitting
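The averaged perceptron mentioned above can be sketched as follows; the toy data and pass count are invented, and the running-sum implementation is one simple way to do the averaging:

```python
# Sketch of the averaged perceptron: train as usual, but accumulate
# the weight vector after every example and return the average, which
# smooths out thrashing on noisy data.

def predict(w, f):
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else -1

def train_averaged_perceptron(data, n_features, passes=10):
    w = [0.0] * n_features
    total = [0.0] * n_features      # running sum of weight vectors
    count = 0
    for _ in range(passes):
        for f, y_star in data:
            if predict(w, f) != y_star:
                for i in range(n_features):
                    w[i] += y_star * f[i]
            for i in range(n_features):
                total[i] += w[i]    # accumulate after every example
            count += 1
    return [t / count for t in total]  # averaged weights

data = [([1, 0], -1), ([0, 1], 1), ([2, 1], -1), ([1, 3], 1)]
w_avg = train_averaged_perceptron(data, n_features=2)
print(all(predict(w_avg, f) == y for f, y in data))  # True
```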
Fixing the Perceptron
§ Idea: adjust the weight update to mitigate these effects
§ MIRA*: choose an update size τ that fixes the current mistake…
§ …but minimizes the change to w:

  min_w ½‖w − w′‖²  s.t.  w_{y*} · f(x) ≥ w_y · f(x) + 1

§ The +1 helps to generalize

* Margin Infused Relaxed Algorithm
Minimum Correcting Update

The minimizing τ is not 0 (otherwise no error would have been made), so the minimum is where the constraint holds with equality:

  τ = ((w_y − w_{y*}) · f(x) + 1) / (2 f(x) · f(x))
Maximum Step Size
§ In practice, it's also bad to make updates that are too large
  § The example may be labeled incorrectly
  § You may not have enough features
§ Solution: cap the maximum possible value of τ with some constant C
§ Corresponds to an optimization that assumes non-separable data
§ Usually converges faster than the perceptron
§ Usually better, especially on noisy data
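A sketch of the MIRA step, using the closed-form τ derived above with the cap C; the example classes, features, and value of C are invented:

```python
# Sketch of one MIRA update: the smallest tau that makes the updated
# weights separate the example with margin 1, capped at constant C.
# Closed form: tau = ((w_y - w_{y*}) . f + 1) / (2 f . f).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mira_update(weights, f, y_star, y, C=0.5):
    """Lower the wrong class y, raise the right class y_star by tau*f."""
    tau = (dot(weights[y], f) - dot(weights[y_star], f) + 1) / (2 * dot(f, f))
    tau = min(tau, C)  # cap the maximum possible step size with C
    weights[y] = [wi - tau * fi for wi, fi in zip(weights[y], f)]
    weights[y_star] = [wi + tau * fi for wi, fi in zip(weights[y_star], f)]
    return tau

# Mistake on f = [1, 1]: predicted "b", true class "a".
weights = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
tau = mira_update(weights, [1, 1], y_star="a", y="b")
print(tau)  # 0.25: the margin constraint now holds with equality
```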
Linear Separators
§ Which of these linear separators is optimal?
Support Vector Machines
§ Maximizing the margin: good according to intuition, theory, and practice
§ Only the support vectors matter; other training examples are ignorable
§ Support vector machines (SVMs) find the separator with the maximum margin
§ Basically, SVMs are MIRA where you optimize over all examples at once
MIRA (one example at a time):  min_w ½‖w − w′‖²  s.t.  w_{y*} · f(x) ≥ w_y · f(x) + 1

SVM (all examples at once):  min_w ½‖w‖²  s.t.  ∀ i, y:  w_{y*_i} · f(x_i) ≥ w_y · f(x_i) + 1
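One way to see "optimize over all examples at once" in code is subgradient descent on the hinge loss, a standard (if simplified) way to train a linear SVM; the data, learning rate, regularization constant, and epoch count here are all invented:

```python
# Sketch of a linear SVM trained by subgradient descent on the
# regularized hinge loss: every epoch looks at ALL examples, not one.

def train_linear_svm(data, n_features, lam=0.01, lr=0.1, epochs=200):
    w = [0.0] * n_features
    for _ in range(epochs):
        # subgradient of (lam/2)||w||^2 + average hinge loss
        grad = [lam * wi for wi in w]
        for f, y in data:
            margin = y * sum(wi * fi for wi, fi in zip(w, f))
            if margin < 1:  # example violates the margin constraint
                for i in range(n_features):
                    grad[i] -= y * f[i] / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

data = [([1, 0], -1), ([0, 1], 1), ([2, 1], -1), ([1, 3], 1)]
w = train_linear_svm(data, n_features=2)
correct = all(
    (sum(wi * fi for wi, fi in zip(w, f)) > 0) == (y > 0) for f, y in data
)
print(correct)  # True on this separable toy set
```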
Classification: Comparison
§ Naïve Bayes:
  § Builds a model of the training data
  § Gives prediction probabilities
  § Makes strong assumptions about feature independence
  § One pass through the data (counting)
§ Perceptrons / MIRA:
  § Make fewer assumptions about the data
  § Mistake-driven learning
  § Multiple passes through the data (prediction)
  § Often more accurate