Perceptrons - CS 6364, University of Texas at Dallas (vgogate/ai/fall16/grad/slides/perceptron.pdf)

CS 6364: Perceptrons

Linear Classifiers

§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation

§ If the activation is:
  § Positive, output +1
  § Negative, output -1
(a small Python sketch of this rule follows the figure below)

activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)

[Figure: features f1, f2, f3 feed into a weighted sum Σ with weights w1, w2, w3; the output is +1 if the sum is > 0, -1 otherwise]
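A minimal sketch of this classifier in Python (dict-based features; the function names are my own, not from the slides):

```python
# Minimal sketch of a binary linear classifier over dict-based features.
def activation(w, f):
    """Weighted sum of feature values: w . f(x)."""
    return sum(w[k] * v for k, v in f.items())

def classify(w, f):
    """+1 if the activation is positive, -1 otherwise."""
    return 1 if activation(w, f) > 0 else -1
```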

Weights

§ Binary case: compare features to a weight vector
§ Learning: figure out the weight vector from examples

Feature vector f(x1) of one email:
  # free : 2
  YOUR_NAME : 0
  MISSPELLED : 2
  FROM_FRIEND : 0
  ...

Weight vector w:
  # free : 4
  YOUR_NAME : -1
  MISSPELLED : 1
  FROM_FRIEND : -3
  ...

Feature vector f(x2) of another email:
  # free : 0
  YOUR_NAME : 1
  MISSPELLED : 1
  FROM_FRIEND : 1
  ...

If the dot product w · f(x) is positive, predict the positive class (worked through below).
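A quick check of these numbers in Python (assuming, as labeled above, that the middle vector, the only one with negative entries, is the weight vector w):

```python
# Quick check of the dot products above (assumes the middle vector is w).
w  = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
x1 = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
x2 = {"# free": 0, "YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}

dot = lambda w, f: sum(w[k] * f[k] for k in f)
print(dot(w, x1))  # 8 + 0 + 2 + 0 = 10 > 0: positive class (spam)
print(dot(w, x2))  # 0 - 1 + 1 - 3 = -3 < 0: negative class (ham)
```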

Decision Rules

Binary Decision Rule

§ In the space of feature vectors
  § Examples are points
  § Any weight vector defines a hyperplane
  § One side corresponds to Y = +1
  § The other corresponds to Y = -1

Example weight vector:
  BIAS : -3
  free : 4
  money : 2
  ...

[Figure: decision boundary in the (free, money) feature plane; the line w · f(x) = 0 separates the region +1 = SPAM from the region -1 = HAM]

(a quick check of this rule in Python follows)
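With the weight vector above, a hypothetical email containing "free" once and "money" once scores -3 + 4 + 2 = 3, landing on the spam side:

```python
# Decision-rule check with the weights above (the email is hypothetical).
w = {"BIAS": -3, "free": 4, "money": 2}
f = {"BIAS": 1, "free": 1, "money": 1}   # the BIAS feature is always 1

score = sum(w[k] * f[k] for k in f)      # -3 + 4 + 2 = 3
print("+1 = SPAM" if score > 0 else "-1 = HAM")   # prints: +1 = SPAM
```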

Weight Updates


Learning: Binary Perceptron

§ Start with weights w = 0
§ For each training instance (f(x), y*):
  § Classify with current weights: y = +1 if w · f(x) > 0, else y = -1
  § If correct (i.e., y = y*), no change!
  § If wrong: adjust the weight vector by adding or subtracting the feature vector: w = w + y* · f(x). Subtract if y* is -1. (A sketch of this loop follows below.)
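A minimal sketch of the training loop (dict features, labels in {+1, -1}; the fixed epoch count is my simplification, since the slides simply loop over instances):

```python
# Minimal sketch of binary perceptron training.
from collections import defaultdict

def train_binary_perceptron(data, epochs=10):
    """data: list of (features, label) pairs with label in {+1, -1}."""
    w = defaultdict(float)                     # start with weights = 0
    for _ in range(epochs):
        for features, y_star in data:
            activation = sum(w[k] * v for k, v in features.items())
            y = 1 if activation > 0 else -1    # classify with current weights
            if y != y_star:                    # if wrong: add/subtract f(x)
                for k, v in features.items():
                    w[k] += y_star * v
    return w
```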

Examples: Perceptron

§ Separable Case

Multiclass Decision Rule

§ If we have multiple classes:
  § A weight vector w_y for each class y
  § Score (activation) of a class y: score(y) = w_y · f(x)
  § Prediction: the highest score wins, y = argmax_y w_y · f(x)

Binary = multiclass where the negative class has weight zero (illustrated below)
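A sketch of the multiclass decision rule in Python, which also shows the binary case as a special case with an all-zero negative class (class names here are illustrative):

```python
# Multiclass decision rule: one weight vector per class, highest score wins.
def predict(weights_by_class, features):
    def score(w):
        return sum(w.get(k, 0.0) * v for k, v in features.items())
    return max(weights_by_class, key=lambda y: score(weights_by_class[y]))

w = {"SPAM": {"free": 4, "BIAS": -3}, "HAM": {}}  # binary as a special case:
f = {"free": 2, "BIAS": 1}                        # negative class weights = 0
print(predict(w, f))                              # SPAM (score 5 vs. 0)
```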

Learning: Multiclass Perceptron

§ Start with all weights = 0
§ Pick up training examples one by one
§ Predict with current weights: y = argmax_y w_y · f(x)
§ If correct, no change!
§ If wrong: lower the score of the wrong answer, raise the score of the right answer:

  w_y = w_y - f(x)
  w_{y*} = w_{y*} + f(x)

(A sketch of this loop follows below.)
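A self-contained sketch of the multiclass loop:

```python
# Minimal sketch of multiclass perceptron training (dict features).
from collections import defaultdict

def train_multiclass_perceptron(data, classes, epochs=10):
    """data: list of (features, true_class) pairs; features is a dict."""
    w = {y: defaultdict(float) for y in classes}       # all weights = 0
    score = lambda y, f: sum(w[y][k] * v for k, v in f.items())
    for _ in range(epochs):
        for features, y_star in data:
            y = max(classes, key=lambda c: score(c, features))  # predict
            if y != y_star:                            # if wrong:
                for k, v in features.items():
                    w[y][k] -= v                       # lower wrong answer
                    w[y_star][k] += v                  # raise right answer
    return w
```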

Example: Multiclass Perceptron

One weight vector per class, over the features BIAS, win, game, vote, the:

  Class 1: BIAS : 1, win : 0, game : 0, vote : 0, the : 0, ...
  Class 2: BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...
  Class 3: BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...

Training sentences: "win the vote", "win the election", "win the game"

Properties of Perceptrons

§ Separability: true if some parameter setting classifies the training set perfectly

§ Convergence: if the training data are separable, the perceptron will eventually converge (binary case)

§ Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability (the classic bound is (R/δ)² mistakes, where δ is the margin and R bounds the size of the feature vectors)

[Figure: a separable dataset, where some line splits the two classes, next to a non-separable one, where no line can]

Examples: Perceptron

§ Non-Separable Case

Improving the Perceptron

Problems with the Perceptron

§ Noise: if the data isn't separable, the weights might thrash
  § Averaging weight vectors over time can help (averaged perceptron); see the sketch below

§ Mediocre generalization: finds a "barely" separating solution

§ Overtraining: test/held-out accuracy usually rises, then falls
  § Overtraining is a kind of overfitting
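A sketch of the averaged-perceptron idea (my implementation choice: accumulate the weight vector after every example and return the mean, which smooths out thrashing on noisy data):

```python
# Sketch of an averaged perceptron (binary case).
from collections import defaultdict

def train_averaged_perceptron(data, epochs=10):
    w, w_sum, steps = defaultdict(float), defaultdict(float), 0
    for _ in range(epochs):
        for features, y_star in data:
            activation = sum(w[k] * v for k, v in features.items())
            if (1 if activation > 0 else -1) != y_star:
                for k, v in features.items():
                    w[k] += y_star * v
            for k in w:                      # accumulate after every example
                w_sum[k] += w[k]
            steps += 1
    return {k: s / steps for k, s in w_sum.items()}
```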

Fixing the Perceptron

§ Idea: adjust the weight update to mitigate these effects

§ MIRA*: choose an update size τ that fixes the current mistake...

§ ...but minimizes the change to w (here w' denotes the weights before the update):

  min_w (1/2) ||w - w'||²  such that  w_{y*} · f(x) ≥ w_y · f(x) + 1

§ The +1 helps to generalize

* Margin Infused Relaxed Algorithm

Minimum Correcting Update

The update moves weight mass τ from the wrong answer to the right one:

  w_y = w'_y - τ f(x)
  w_{y*} = w'_{y*} + τ f(x)

The minimizing τ is not 0 (otherwise no error would have been made), so the minimum is where the constraint holds with equality:

  τ = ((w'_y - w'_{y*}) · f(x) + 1) / (2 f(x) · f(x))

Maximum Step Size

§ In practice, it's also bad to make updates that are too large
  § The example may be labeled incorrectly
  § You may not have enough features
§ Solution: cap the maximum possible value of τ with some constant C: τ* = min(τ, C)

§ Corresponds to an optimization that assumes non-separable data
§ Usually converges faster than the perceptron
§ Usually better, especially on noisy data (a sketch of the capped update follows below)
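A minimal sketch of one capped MIRA update in Python, following the τ formula above (the function name and dict-based representation are my choices):

```python
# One capped MIRA update (multiclass); w maps class -> weight dict.
def mira_update(w, features, y_star, y_pred, C=0.01):
    """Mutates w after a mistake: w[y_pred] loses, w[y_star] gains."""
    if y_pred == y_star:
        return                                    # correct: no change
    diff = sum((w[y_pred].get(k, 0.0) - w[y_star].get(k, 0.0)) * v
               for k, v in features.items())
    norm_sq = sum(v * v for v in features.values())
    if norm_sq == 0:
        return                                    # nothing to update on
    tau = min((diff + 1.0) / (2.0 * norm_sq), C)  # cap the step size at C
    for k, v in features.items():
        w[y_pred][k] = w[y_pred].get(k, 0.0) - tau * v
        w[y_star][k] = w[y_star].get(k, 0.0) + tau * v
```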

Linear Separators

§ Which of these linear separators is optimal?

Support Vector Machines

§ Maximizing the margin: good according to intuition, theory, and practice
§ Only support vectors matter; other training examples are ignorable
§ Support vector machines (SVMs) find the separator with max margin
§ Basically, SVMs are MIRA where you optimize over all examples at once

MIRA (online, one example at a time): min_w (1/2) ||w - w'||²  s.t.  w_{y*} · f(x) ≥ w_y · f(x) + 1

SVM (batch, all examples at once): min_w (1/2) ||w||²  s.t.  w_{y*_i} · f(x_i) ≥ w_y · f(x_i) + 1 for all i and all y
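In practice one rarely hand-rolls an SVM; a sketch using scikit-learn's LinearSVC (the library and the toy data are my choices, not from the slides):

```python
# LinearSVC fits a max-margin linear separator like the SVM above.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

emails = [{"free": 2, "money": 1}, {"free": 0, "money": 0}]  # toy data
labels = [1, -1]                                             # +1 = spam

X = DictVectorizer().fit_transform(emails)
clf = LinearSVC(C=1.0).fit(X, labels)   # C caps slack, like MIRA's cap
print(clf.predict(X))                   # expect [ 1 -1 ] on this toy set
```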

Classification: Comparison

§ Naïve Bayes:
  § Builds a model of the training data
  § Gives prediction probabilities
  § Strong assumptions about feature independence
  § One pass through the data (counting)

§ Perceptrons / MIRA:
  § Make fewer assumptions about the data
  § Mistake-driven learning
  § Multiple passes through the data (prediction)
  § Often more accurate