ΕΠΛ323 - Theory and Practice of Compilers (Θεωρία και Πρακτική Μεταγλωττιστών) · 2018. 2. 12. ·...

ΕΠΛ323 - Theory and Practice of Compilers · Lecture 5a: Syntax Analysis · Elias Athanasopoulos · [email protected]

  • Syntax Analysis (Συντακτική Ανάλυση)

    • Context-free Grammars (CFGs)
    • Derivations
    • Parse trees
    • Top-down Parsing
    • Ambiguities

  • Syntax Analysis

    • Syntax analysis (parsing) is the process of determining if a string of tokens can be generated by a grammar

    Example sentence: “I gave him the book”

    subject: I
    verb: gave
    indirect object: him
    object (noun phrase): article: the, noun: book

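  • Lexical-Syntax Analysis

    Source code (character stream):

    {
      if (b == 0) a = b;
      while (a != 1) { printf("%I ", I--); }
    }

    Lexical Analysis turns the character stream into a token stream:

    if ( b == 0 ) a = b ; ...

    Syntax Analysis turns the token stream into a syntax tree (figure): a block containing an if_stmt, whose condition is the expr (variable b) == (constant 0) and whose body is the expr (variable a) = (variable b), followed by a while_stmt, whose condition is the expr (variable a) != (constant 1) and whose body is a block.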

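  • The Role of the Parser

    Figure: source program → lexical analyzer → parser → rest of front end. The parser asks the lexical analyzer for input with "get next token" and receives a token in return; it passes a parse tree to the rest of the front end. The symbol table is shared by these phases.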

  • Syntax Analysis Operation

    • Input
      – A stream of tokens taken from lexical analysis

    • Output
      – Syntax tree which determines the token relations and the syntax correctness (are all parentheses balanced?)

    • Semantic analysis takes care of types
      – int x = true;
      – int y; z = f(y);

  • Syntax Error Handling

    • Lexical
      – Misspelling an identifier, keyword, or operator

    • Syntactic
      – Arithmetic expression with unbalanced parentheses

    • Semantic
      – Operator applied to an incompatible operand

    • Logical
      – Infinitely recursive call

  • Error Handler Requirements

    • It should report the presence of errors clearly and accurately

    • It should recover from each error quickly enough to be able to detect subsequent errors

    • It should not significantly slow down the processing of correct programs

  • What happens when an error is detected?

    • Many strategies, none clearly dominates

    • Not adequate for the parser to quit upon detecting the first error
      – Subsequent parsing may reveal additional errors

    • Usually, the compiler attempts error recovery
      – Reasonable hope that the rest of the program can be parsed

    • Error recovery should be realized correctly
      – Otherwise many errors can be generated

  • Example

    • While recovering from an error, a compiler may skip the declaration of a variable zap

    • At a later point, when zap is used, the compiler should not report a syntactic error, only the missing declaration
      – There is no entry for zap in the symbol table

    • Conservative strategy
      – Once an error is detected, filter out errors that are close to it (consume enough tokens to exit the error area)

  • Error-recovery Strategies

    • Panic mode
      – Once an error is detected, consume tokens until a synchronizing token is detected
      – Synchronizing tokens are usually delimiters (end, ;), which have a clear meaning
      – Simple and cannot enter an infinite loop

    • Phrase level
      – Attempt to correct the error by taking action
      – Insert a missing semicolon, replace a comma with a semicolon, etc.
      – Can create infinite loops if actions are not applied correctly
      – Hard to cope with cases where the error has occurred before the point of detection
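
    Panic mode can be illustrated with a minimal Python sketch. The toy statement form id = num ; and the names classify and parse_statements are assumptions made for this illustration, not part of the lecture. After an error the parser discards tokens up to and including the next ';', so every pass through the loop consumes at least one token and the loop always terminates.

```python
def classify(token):
    """Map a token string to its kind for the toy statement form  id = num ;"""
    if token in ("=", ";"):
        return token
    if token.isdigit():
        return "num"
    if token.isidentifier():
        return "id"
    return "unknown"


def parse_statements(tokens):
    """Parse statements of the (hypothetical) form  id = num ;

    Returns (statements, errors). On a syntax error the parser records a
    message and skips tokens up to and including the next ';' (panic mode).
    """
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        start = pos
        ok = True
        for kind in ["id", "=", "num", ";"]:
            if pos < len(tokens) and classify(tokens[pos]) == kind:
                pos += 1
            else:
                found = tokens[pos] if pos < len(tokens) else "end of input"
                errors.append(f"token {pos}: expected {kind}, found {found}")
                ok = False
                break
        if ok:
            statements.append(tokens[start:pos])
        else:
            # Panic mode: discard tokens until the synchronizing token ';'.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1  # consume the ';' itself (or step past the end)
    return statements, errors


statements, errors = parse_statements("a = 1 ; b = = 2 ; c = 3 ;".split())
```

    In this run the faulty second statement is reported once and skipped, and the third statement is still parsed, which is the behaviour the error handler requirements ask for.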

  • Error-recovery Strategies

    • Error productions
      – The grammar of the language can be augmented with productions for common errors
      – The parser can then detect these errors, since the erroneous constructs are part of the augmented grammar

    • Global correction
      – Attempt to correct an error with the fewest possible actions
      – Given an incorrect input string x and grammar G, find a valid string y that can be obtained from x with the fewest changes
      – The closest correct program may not be the one the programmer had in mind

  • CONTEXT-FREE GRAMMARS (Γραμματικές Χωρίς Συμφραζόμενα)

  • Regular Expressions Limitations

    • Regular expressions can be transformed easily to NFA (and then to DFA)

    • Discovering and classifying tokens using regular expressions is easy and efficient

    • Regular expressions cannot be used for syntax analysis

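  • Regular Expressions Limitations

    • Match all balanced parentheses:
      – () (()) ()()() (())()((()()))

    • You need an NFA with an infinite number of states

    For 5 nested parentheses you need the following NFA (figure): starting at state S, five transitions on ( followed by five matching transitions on ).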
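
    The need for unbounded memory can be seen in a small Python sketch (the function name is_balanced is chosen here for illustration). The recognizer keeps a counter for the current nesting depth; because the depth has no upper limit, no fixed, finite set of states can stand in for the counter.

```python
def is_balanced(text):
    """Return True if `text` is a string of balanced parentheses."""
    depth = 0  # unbounded counter: this is what a finite automaton lacks
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' with no matching '(' before it
        else:
            return False  # only parentheses are allowed in this sketch
    return depth == 0


examples = ["()", "(())", "()()()", "(())()((()()))"]
results = [is_balanced(e) for e in examples]
```

    All four example strings from the slide are accepted, while strings such as "(()" or ")(" are rejected.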

  • Context-free Grammar (CFG) (Γραμματική Χωρίς Συμφραζόμενα)

    1. A set of tokens, known as terminal symbols.
      – Terminals are the basic symbols from which strings are formed. The word “token” is a synonym for “terminal” when we are talking about programming languages (e.g., tokens like if, then, and else are all terminals)

    2. A set of nonterminals.
      – Nonterminals are syntactic variables that denote sets of strings. The nonterminals define sets of strings that help define the language generated by the grammar. They also impose a hierarchical structure on the language defined by the grammar.

  • Context-free Grammar (CFG) (Γραμματική Χωρίς Συμφραζόμενα)

    3. A set of productions (κανόνες παραγωγής), where each production consists of a nonterminal, called the left side of the production, an arrow, and a sequence of tokens and/or nonterminals, called the right side of the production.
      – The productions of the grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production consists of a nonterminal, followed by an arrow (sometimes the symbol ::= is used in place of the arrow), followed by a string of nonterminals and terminals.

    4. A designation of one of the nonterminals as the start symbol
      – In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.
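
    The four components can be written down directly as a data structure. The Python sketch below is one possible encoding, not the lecture's; the class name Grammar, the method is_well_formed, and the example grammar S → ( S ) S | ε for balanced parentheses are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # component 1: the tokens
    nonterminals: frozenset   # component 2: the syntactic variables
    productions: tuple        # component 3: pairs (left side, right side)
    start: str                # component 4: the start symbol

    def is_well_formed(self):
        """Check the four components against the definition of a CFG."""
        if self.terminals & self.nonterminals:
            return False  # a symbol cannot be both terminal and nonterminal
        if self.start not in self.nonterminals:
            return False
        symbols = self.terminals | self.nonterminals
        return all(
            left in self.nonterminals and all(sym in symbols for sym in right)
            for left, right in self.productions
        )


# Hypothetical example grammar: S -> ( S ) S | ε
parens = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    productions=(("S", ("(", "S", ")", "S")), ("S", ())),  # () stands for ε
    start="S",
)
```

    Each right side is a tuple of symbols, so the empty tuple represents ε, and the left side is always a single nonterminal, which is what makes the grammar context-free.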

  • Example 1

    • Expressions of digits separated by plus and minus signs
      – 9-5+2, 3-1, 7

    list → list + digit    (2.2)
    list → list - digit    (2.3)
    list → digit    (2.4)
    digit → 0|1|2|3|4|5|6|7|8|9    (2.5)

    The first three productions can be grouped:
    list → list + digit | list - digit | digit

    Terminals/Tokens: + - 0 1 2 3 4 5 6 7 8 9
    Nonterminals: list, digit
    Start symbol: list

  • Example 1

    • The ten productions for the nonterminal digit allow it to stand for any of the tokens 0, 1, ..., 9

    • From 2.4, a single digit by itself is a list

    • 2.2 and 2.3 express the fact that if we take any list and follow it by a plus or minus sign and then another digit, we have a new list

    9-5+2
    • 9 is a list by production 2.4, since 9 is a digit
    • 9-5 is a list by production 2.3, since 9 is a list and 5 is a digit
    • 9-5+2 is a list by production 2.2, since 9-5 is a list and 2 is a digit
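
    The reasoning for 9-5+2 can be mechanized. The Python sketch below (the name recognize_list is an assumption for illustration) reads the input left to right and records which of the productions 2.2 to 2.4 justifies each longer prefix being a list; it returns None when the input is not a list.

```python
DIGITS = set("0123456789")


def recognize_list(text):
    """Return the steps showing that `text` is a list, or None if it is not."""
    if not text or text[0] not in DIGITS:
        return None
    steps = [f"{text[0]} is a list by (2.4)"]
    pos = 1
    while pos < len(text):
        # A list can only be extended by a sign followed by one digit.
        if pos + 1 >= len(text) or text[pos] not in "+-" or text[pos + 1] not in DIGITS:
            return None
        rule = "(2.2)" if text[pos] == "+" else "(2.3)"
        pos += 2
        steps.append(f"{text[:pos]} is a list by {rule}")
    return steps


steps = recognize_list("9-5+2")
```

    The prefixes grow from the left because the productions are left-recursive: each new list is an existing list followed by a sign and a digit.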

  • Example 2

    • “Begin End” block in Pascal

    begin
      ... (* Pascal code *)
    end

    block → begin opt_stmts end
    opt_stmts → stmt_list | ε
    stmt_list → stmt_list ; stmt | stmt

    (stmt is not expanded at this point)

  • Example 3

    • Simple arithmetic expressions

    expr → expr op expr
    expr → (expr)
    expr → -expr
    expr → id
    op → +
    op → -
    op → *
    op → /
    op → ^

    Equivalent to:
    E → E A E | (E) | -E | id
    A → + | - | * | / | ^

  • Derivation (Παραγωγή)

    E → E A E | (E) | -E | id

    • The production E → -E signifies that an expression preceded by a minus sign is also an expression

    • We can thus generate more complex expressions from simpler expressions by just replacing E with -E

  • Derivation (Παραγωγή)

    E => -E (E derives -E)

    Examples
    Using E → (E): E*E => (E)*E or E*E => E*(E)
    E => -E => -(E) => -(id)

    =>   Derives in one step
    =>*  Derives in zero or more steps
    =>+  Derives in one or more steps

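  • Leftmost - Rightmost

    E → E A E | (E) | -E | id    (G1)
    A → + | - | * | / | ^

    Leftmost derivation (each step, written =>lm, rewrites the leftmost nonterminal):
    E =>lm -E =>lm -(E) =>lm -(E+E) =>lm -(id+E) =>lm -(id+id)

    Rightmost derivation (each step, written =>rm, rewrites the rightmost nonterminal):
    E =>rm -E =>rm -(E) =>rm -(E+E) =>rm -(E+id) =>rm -(id+id)

    (Both derivations abbreviate E → E A E followed by A → + as the single step E → E+E.)

    The string -(id + id) is a sentence of grammar G1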
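
    The difference between the two derivation orders can be reproduced with a short Python sketch. The function names derive and show and the list-of-indices encoding of production choices are assumptions made for this illustration. The grammar is G1 written out in full, so the operator is introduced by a separate step through A rather than in one step.

```python
# Grammar G1: each nonterminal maps to its list of alternatives.
G1 = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"], ["^"]],
}


def derive(grammar, start, choices, leftmost=True):
    """Rewrite the leftmost (or rightmost) nonterminal once per step.

    `choices` gives, for each step, the index of the alternative to apply.
    Returns every sentential form, starting with [start].
    """
    form = [start]
    steps = [form]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        body = grammar[form[pos]][choice]
        form = form[:pos] + body + form[pos + 1:]
        steps.append(form)
    return steps


def show(steps):
    """Render a derivation as  form => form => ..."""
    return " => ".join("".join(form) for form in steps)


choices = [2, 1, 0, 3, 0, 3]  # -E, (E), E A E, id, +, id
leftmost = derive(G1, "E", choices, leftmost=True)
rightmost = derive(G1, "E", choices, leftmost=False)
```

    Both calls end in the same sentence -(id+id); they differ only in the intermediate forms, for example -(idAE) in the leftmost derivation against -(EAid) in the rightmost one.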

  • Grammars and Languages

    • Given a grammar G with a start symbol S,
      – A string of only terminals, w, is in L(G) iff S =>+ w
      – The string w is called a sentence of G
      – L(G) is the language generated by G and includes all such w (strings composed of terminals of G)

    • A language that can be generated by a context-free grammar is a context-free language

    • If two grammars generate the same language, then they are equivalent

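  • Parse Trees

    A parse tree may be viewed as a graphical representation for a derivation that filters out the choice regarding replacement order.

    Figure: parse tree for -(id+id). The root E has the children - and E; that E has the children (, E and ); the inner E has the children E, + and E; each of these two E nodes has the single child id.

    E =>lm -E =>lm -(E) =>lm -(E+E) =>lm -(id+E) =>lm -(id+id)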

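  • Constructing the Parse Tree

    Figure: the parse tree for -(id+id) built step by step, with one tree per sentential form of the derivation E => -E => -(E) => -(E+E) => -(id+E) => -(id+id). Each step expands one leaf nonterminal of the previous tree.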
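
    The step-by-step construction can be sketched in Python. The helper names node, frontier and expand are assumptions made for this illustration, and the sketch uses the shorthand step E → E + E, as the derivation on the slides does. Each call to expand attaches the right side of a production to one leaf, and the leaves read from left to right give the current sentential form.

```python
def node(symbol, *children):
    """A parse-tree node is (symbol, children); a leaf has no children."""
    return (symbol, list(children))


def frontier(tree):
    """Return the leaves of the tree from left to right."""
    symbol, children = tree
    if not children:
        return [symbol]
    leaves = []
    for child in children:
        leaves.extend(frontier(child))
    return leaves


def expand(tree, path, body):
    """Return a copy of `tree` in which the leaf at `path` gets children `body`."""
    symbol, children = tree
    if not path:
        assert not children, "only a leaf can be expanded"
        return node(symbol, *[node(s) for s in body])
    i = path[0]
    replaced = expand(children[i], path[1:], body)
    return (symbol, children[:i] + [replaced] + children[i + 1:])


tree = node("E")
snapshots = ["".join(frontier(tree))]
for path, body in [
    ([], ["-", "E"]),            # E => -E
    ([1], ["(", "E", ")"]),      # -E => -(E)
    ([1, 1], ["E", "+", "E"]),   # -(E) => -(E+E)
    ([1, 1, 0], ["id"]),         # -(E+E) => -(id+E)
    ([1, 1, 2], ["id"]),         # -(id+E) => -(id+id)
]:
    tree = expand(tree, path, body)
    snapshots.append("".join(frontier(tree)))
```

    Swapping the last two expansions gives the rightmost derivation but ends with the same tree, which is the sense in which a parse tree filters out the replacement order.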

  • Ambiguity (Αμφισημία)

    • A grammar that produces more than one parse tree for some sentence is said to be ambiguous

    • For certain types of parsers, it is desirable that the grammar be made unambiguous

    • For some applications we shall also consider methods whereby we can use certain ambiguous grammars, together with disambiguating rules that “throw away” undesirable parse trees
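
    Ambiguity can be made concrete by counting parse trees. The Python sketch below is an illustration under stated assumptions: it covers only the fragment E → E A E | id, A → + | - | * | / | ^ of the expression grammar (parentheses and unary minus are left out), and the function name count_parse_trees is chosen here. A sentence with more than one parse tree shows that the grammar is ambiguous.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/", "^"}


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E A E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways to derive tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # E -> E A E: try every operator position as the top-level A.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in OPERATORS:
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


trees = count_parse_trees(["id", "+", "id", "*", "id"])
```

    For id + id * id the count is 2: one tree groups the operands as (id + id) * id and the other as id + (id * id). A disambiguating rule such as operator precedence would keep only the second.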