Transcript of CS388: Natural Language Processing, Lecture 15: Semantics II / Seq2seq I
CS388: Natural Language Processing
Greg Durrett
Lecture 15: Semantics II / Seq2seq I
Administrivia
‣ Project 2 out today
‣ Mini 2 graded by tomorrow
‣ Final project feedback soon
Recall: Parses to Logical Forms
[Parse tree figure: "Lady Gaga sings and dances". The NP "Lady Gaga" (NNP NNP) denotes e470; the VBP "sings" denotes λy. sings(y); the VBP "dances" denotes λy. dances(y); the coordinated VP "sings and dances" denotes λy. sings(y) ∧ dances(y); the full S denotes sings(e470) ∧ dances(e470)]
‣ General rules: S: f(x) -> NP: x VP: f; VP: λy. a(y) ∧ b(y) -> VP: λy. a(y) CC VP: λy. b(y)
Recall: CCG
‣ Steedman + Szabolcsi, 1980s: formalism bridging syntax and semantics
‣ Syntactic categories (for this lecture): S, NP, "slash" categories
‣ S\NP: "if I combine with an NP on my left side, I form a sentence" (a verb)
‣ (S\NP)/NP: "I need an NP on my right and then one on my left" (a verb with a direct object)
[Derivation: "Eminem sings". Eminem: NP, e728; sings: S\NP, λy. sings(y); combining them gives S, sings(e728)]
[Derivation: "Oklahoma borders Texas". Oklahoma: NP, e101; borders: (S\NP)/NP, λx.λy. borders(y,x); Texas: NP, e89; "borders Texas": S\NP, λy. borders(y, e89); the whole sentence: S, borders(e101, e89)]
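The two application steps in these derivations can be mimicked with plain Python lambdas; a minimal sketch (representing semantics as strings is an illustration, not the lecture's formalism):

```python
# Minimal sketch of CCG function application, representing each word's
# semantics as a Python lambda over entity-ID strings.
eminem = "e728"                                    # NP
sings = lambda y: f"sings({y})"                    # S\NP
oklahoma, texas = "e101", "e89"                    # NPs
borders = lambda x: lambda y: f"borders({y},{x})"  # (S\NP)/NP

borders_texas = borders(texas)   # forward application: S\NP, λy. borders(y, e89)
print(sings(eminem))             # backward application: sings(e728)
print(borders_texas(oklahoma))   # borders(e101,e89)
```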
This Lecture
‣ Seq2seq models
‣ Seq2seq models for semantic parsing
‣ Intro to attention
Encoder-Decoder Models
Motivation
‣ Parsers have been pretty hard to build…
‣ Constituency/graph-based: complex dynamic programs
‣ Transition-based: complex transition systems
‣ CCG/semantic parsers: complex syntax/semantics interface, challenging inference, challenging learning
‣ For semantic parsing in particular: bridging the syntax-semantics divide results in structural weirdnesses in parsers, and it is hard to learn the right semantic grammar
‣ Encoder-decoder models can (in principle) predict any linearized sequence of tokens
Encoder-Decoder
‣ Semantic parsing:
what states border Texas → λx. state(x) ∧ borders(x, e89)
‣ Syntactic parsing:
the dog ran → (S (NP (DT the) (NN dog) ) (VP (VBD ran) ) )
(but what if we produce an invalid tree or one with different words?) 🤔
‣ Machine translation, summarization, and dialogue can all be viewed in this framework as well
Encoder-Decoder
‣ Encode a sequence into a fixed-sized vector:
the movie was great
‣ Now use that vector to produce a series of tokens as output from a separate LSTM decoder:
le film était bon [STOP]
Sutskever et al. (2014)
Encoder-Decoder
‣ Is this true? Sort of… we'll come back to this later
Model
‣ Generate the next word conditioned on the previous word as well as the hidden state h̄
the movie was great → <s>
‣ W has size |vocab| x |hidden state|, so this is a max over the entire vocabulary
‣ The decoder has separate parameters from the encoder, so this can learn to be a language model (produce a plausible next word given the current one)
P(y|x) = ∏_{i=1}^{n} P(y_i | x, y_1, …, y_{i−1})
P(y_i | x, y_1, …, y_{i−1}) = softmax(W h̄)_{y_i}
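As a rough sketch of this computation in PyTorch (layer and variable names here are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn

hidden_size, vocab_size = 256, 10000     # illustrative sizes
W = nn.Linear(hidden_size, vocab_size)   # the W of size |vocab| x |hidden state|

h_bar = torch.randn(hidden_size)          # decoder hidden state h̄ at step i
probs = torch.softmax(W(h_bar), dim=-1)   # softmax over the entire vocabulary
# P(y_i = w | x, y_1, ..., y_{i-1}) is probs[w]
```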
Inference
‣ Generate the next word conditioned on the previous word as well as the hidden state
the movie was great → <s> le film était bon [STOP]
‣ During inference: need to compute the argmax over the word predictions and then feed that to the next RNN state
‣ Need to actually evaluate the computation graph up to this point to form the input for the next state
‣ The decoder is advanced one state at a time until [STOP] is reached; a sketch of this loop follows
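A minimal greedy-decoding sketch (assuming PyTorch; `decoder_cell`, `W`, and `embed` are hypothetical names standing in for the pieces above):

```python
import torch

def greedy_decode(decoder_cell, W, embed, h, sos_id, stop_id, max_len=50):
    # decoder_cell: advances the RNN given (input embedding, hidden state)
    # W: linear layer from hidden state to vocabulary logits
    # embed: token embedding table; h: final hidden state from the encoder
    tokens, prev = [], sos_id
    for _ in range(max_len):
        h = decoder_cell(embed(torch.tensor([prev])), h)  # one step of the graph
        prev = torch.argmax(W(h), dim=-1).item()          # argmax over predictions
        if prev == stop_id:                               # advance until [STOP]
            break
        tokens.append(prev)                               # fed to the next state
    return tokens
```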
Implementing seq2seq Models
the movie was great → le film …
‣ Encoder: consumes a sequence of tokens, produces a vector. Analogous to encoders for classification/tagging tasks
‣ Decoder: separate module, single cell. Takes two inputs: hidden state (vector h or tuple (h, c)) and the previous token. Outputs a token + a new state
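A sketch of these two modules (illustrative hyperparameters and names; assuming PyTorch, not the lecture's code):

```python
import torch.nn as nn

class Encoder(nn.Module):
    # Consumes a token sequence; its final LSTM state summarizes the input.
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)

    def forward(self, tokens):                    # tokens: [batch, src_len]
        _, (h, c) = self.lstm(self.embed(tokens))
        return (h.squeeze(0), c.squeeze(0))       # (h, c) tuple for the decoder

class DecoderCell(nn.Module):
    # One step: (previous token, hidden state) -> (output logits, new state).
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.cell = nn.LSTMCell(emb, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, prev_token, state):         # prev_token: [batch]
        h, c = self.cell(self.embed(prev_token), state)
        return self.out(h), (h, c)
```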
Training
‣ Objective: maximize
∑_{(x,y)} ∑_{i=1}^{n} log P(y*_i | x, y*_1, …, y*_{i−1})
the movie was great → <s> le film était bon [STOP]
‣ One loss term for each target-sentence word; feed the correct word regardless of the model's prediction (called "teacher forcing")
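A sketch of this objective for one batch, reusing the hypothetical Encoder/DecoderCell above:

```python
import torch
import torch.nn.functional as F

def teacher_forced_loss(encoder, decoder, src, tgt, sos_id):
    # One cross-entropy (negative log-likelihood) term per target word.
    state = encoder(src)                                  # src: [batch, src_len]
    prev = torch.full((src.size(0),), sos_id)             # start from <s>
    loss = 0.0
    for i in range(tgt.size(1)):                          # tgt: [batch, tgt_len]
        logits, state = decoder(prev, state)
        loss = loss + F.cross_entropy(logits, tgt[:, i])  # -log P(y*_i | ...)
        prev = tgt[:, i]                # teacher forcing: feed the gold word
    return loss
```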
Training: Scheduled Sampling
‣ Scheduled sampling: with probability p, take the gold token as input; else, sample and take the model's own prediction
the movie was great → <s> le film était … (the model's samples might instead give "la film étais bon [STOP]")
‣ Starting with p = 1 (teacher forcing) and decaying it works best
‣ The model needs to do the right thing even with its own predictions
‣ The "right" thing: train with reinforcement learning
Bengio et al. (2015)
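Scheduled sampling changes only the input-feeding line of the training loop sketched above; one way to write it (hypothetical helper, following Bengio et al.'s description):

```python
import random
import torch

def next_input(logits, gold, p):
    # With probability p feed the gold token, else sample the model's prediction.
    if random.random() < p:
        return gold
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(1)  # sampled tokens
```

In the loop above, `prev = tgt[:, i]` becomes `prev = next_input(logits, tgt[:, i], p)`, with p decayed from 1 toward 0 over training.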
Implementation Details
‣ Sentence lengths vary for both the encoder and the decoder:
‣ Typically pad everything to the right length and use a mask or indexing to access a subset of terms
‣ Encoder: looks like what you did in Mini 2
‣ Decoder: execute one step of computation at a time, so the computation graph is formulated as taking one input + hidden state
‣ Test time: do this until you generate the stop token
‣ Training: do this until you reach the gold stopping point
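A sketch of the padding-and-masking pattern (the helper below is hypothetical, not lecture code):

```python
import torch

def pad_batch(seqs, pad_id=0):
    # Pad variable-length token lists into [batch, max_len], plus a mask.
    max_len = max(len(s) for s in seqs)
    batch = torch.full((len(seqs), max_len), pad_id)
    mask = torch.zeros(len(seqs), max_len, dtype=torch.bool)
    for i, s in enumerate(seqs):
        batch[i, :len(s)] = torch.tensor(s)
        mask[i, :len(s)] = True     # True = real token, False = padding
    return batch, mask

# e.g., per-token losses can be zeroed on padding: (loss_per_token * mask).sum()
```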
Implementation Details (cont'd)
‣ Batching is pretty tricky: the decoder runs across timesteps, so you probably want your label vectors to look like [num timesteps x batch size x num labels] and iterate upwards by timestep
‣ Beam search: can help with lookahead. Finds the (approximate) highest-scoring sequence:
argmax_y ∏_{i=1}^{n} P(y_i | x, y_1, …, y_{i−1})
Beam Search
‣ Maintain the decoder state and token history in the beam
[Worked example, input "the movie was great", beam size 2: after <s> the distribution is la: 0.4, le: 0.3, les: 0.1, so the beam keeps "la" (score log(0.4)) and "le" (score log(0.3)). Expanding each: "la" gives film: 0.4 and "le" gives film: 0.8, so "la film" scores log(0.4) + log(0.4) and "le film" scores log(0.3) + log(0.8)]
‣ Keep both "film" states! The hidden state vectors are different
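A compact beam search sketch over the hypothetical DecoderCell above (simplified: fixed number of steps, no length normalization or [STOP] handling):

```python
import torch

def beam_search(decoder, init_state, sos_id, beam_size=2, steps=10):
    # Each hypothesis keeps its token history, log-prob score, AND its own
    # decoder state: two hypotheses ending in the same token still differ here.
    beam = [([sos_id], 0.0, init_state)]
    for _ in range(steps):
        candidates = []
        for tokens, score, state in beam:
            logits, new_state = decoder(torch.tensor([tokens[-1]]), state)
            log_probs = torch.log_softmax(logits, dim=-1).squeeze(0)
            top = torch.topk(log_probs, beam_size)
            for lp, tok in zip(top.values, top.indices):
                candidates.append((tokens + [tok.item()], score + lp.item(), new_state))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beam[0][0]   # token history of the highest-scoring hypothesis
```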
Other Architectures
‣ What's the basic abstraction here?
‣ Encoder: sentence -> vector
‣ Decoder: hidden state, output prefix -> new hidden state, new output
‣ OR: sentence, output prefix -> new output (more general)
‣ A wide variety of models can apply here: CNN encoders, and the decoder can be any autoregressive model, including certain types of CNNs
‣ Transformer: another such model, discussed next lecture
Seq2seq Semantic Parsing
Semantic Parsing as Translation
‣ Write down a linearized form of the semantic parse, then train seq2seq models to directly translate into this representation:
"what states border Texas" → lambda x ( state( x ) and border( x , e89 ) )
‣ What might be some concerns about this approach? How do we mitigate them?
‣ What are some benefits of this approach compared to grammar-based parsing?
Jia and Liang (2016)
Handling Invariances
"what states border Texas" ↔ "what states border Ohio"
‣ Parsing-based approaches handle these the same way
‣ Possible divergences: features, different weights in the lexicon
‣ Can we get seq2seq semantic parsers to handle these the same way?
‣ Key idea: don't change the model, change the data
‣ "Data augmentation": encode invariances by automatically generating new training examples
Data Augmentation
‣ Abstract out entities: now we can "remix" examples and encode invariance to entity IDs. More complicated remixes are possible too
‣ Lets us synthesize a "what states border ohio?" example, as in the toy sketch below
Jia and Liang (2016)
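A toy version of the entity-abstraction remix (the template and the "ohio" entity ID are invented for illustration; Jia and Liang's grammar-based recombination is richer):

```python
# Abstract the entity out of one (utterance, logical form) pair, then
# re-instantiate the template with other known entities.
template = ("what states border STATE",
            "lambda x ( state( x ) and border( x , STATE_ID ) )")
entities = {"texas": "e89", "ohio": "e102"}   # illustrative IDs only

augmented = [(template[0].replace("STATE", name),
              template[1].replace("STATE_ID", eid))
             for name, eid in entities.items()]
for utterance, lf in augmented:
    print(utterance, "->", lf)
```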
Semantic Parsing as Translation
‣ Prolog
‣ Lambda calculus
‣ Other DSLs
‣ Handle all of these with uniform machinery!
Jia and Liang (2016)
Semantic Parsing as Translation
‣ Three forms of data augmentation all help
‣ Results on these tasks are still not as strong as hand-tuned systems from 10 years ago, but the same simple model can do well at all of the problems
Jia and Liang (2016)
Regex Prediction
‣ Predict a regex from text
‣ Problem: requires a lot of data: 10,000 examples are needed to get ~60% accuracy on pretty simple regexes
‣ Does not scale when regex specifications are more abstract ("I want to recognize a decimal number less than 20")
Locascio et al. (2016)
SQL Generation
‣ Convert a natural language description into a SQL query against some DB
‣ How to ensure that well-formed SQL is generated?
‣ Three seq2seq models
‣ How to capture column names + constants?
‣ Pointer mechanisms, to be discussed later
Zhong et al. (2017)
Attention
"what states border Texas" → lambda x ( state( x ) and border( x , e89 ) )
‣ The boilerplate pieces (highlighted in orange on the slide) are probably reused across many problems
‣ These are not too hard to learn to generate: start with lambda, always follow with x, follow that with a paren, etc.
‣ But the LSTM has to remember the value of Texas for 13 steps!
‣ Next: attention mechanisms that let us "look back" at the input to avoid having to remember everything
Attention
Problems with Seq2seq Models
‣ Encoder-decoder models like to repeat themselves:
Un garçon joue dans la neige → A boy plays in the snow boy plays boy plays
‣ Why does this happen?
‣ Models trained poorly
‣ The input is forgotten by the LSTM, so it gets stuck in a "loop" of generating the same output tokens again and again
‣ We need some notion of input coverage, of which input words we've already translated
Problems with Seq2seq Models
‣ Bad at long sentences: 1) a fixed-size hidden representation doesn't scale; 2) LSTMs still have a hard time remembering for really long periods of time
[Plot legend: RNNenc is the model we've discussed so far; RNNsearch uses attention]
Bahdanau et al. (2014)
Problems with Seq2seq Models
‣ Unknown words:
‣ Encoding these rare words into a vector space is really hard
‣ In fact, we don't want to encode them; we want a way of directly looking back at the input and copying them (e.g., Pont-de-Buis)
Aligned Inputs
‣ Suppose we knew that the source and target would be translated word by word:
the movie was great ↔ le film était bon [STOP]
‣ In that case, we could look at the corresponding input word when translating; this might improve handling of long sentences!
‣ How can we achieve this without hardcoding it?
Attention
‣ At each decoder state, compute a distribution over source inputs based on the current decoder state
the movie was great → <s> le …
‣ Use the weighted sum of input tokens to predict the output, as in the sketch below
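A minimal sketch of this computation (dot-product scoring is one common choice; the lecture has not yet committed to a particular scoring function):

```python
import torch

def attend(decoder_state, encoder_outputs):
    # decoder_state: [hidden]; encoder_outputs: [src_len, hidden]
    scores = encoder_outputs @ decoder_state   # one score per source token
    weights = torch.softmax(scores, dim=0)     # distribution over source inputs
    context = weights @ encoder_outputs        # weighted sum of input tokens
    return context, weights
```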
Takeaways
‣ Rather than combining syntax and semantics as in CCG, we can either parse to semantic representations directly or generate them with seq2seq models
‣ Seq2seq models are a very flexible framework; some weaknesses can potentially be patched with more data
‣ How to fix their shortcomings? Next time: attention, copying, and transformers