ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ?ΠΛ323 -...

download ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ?ΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ... Compiler Phases Lexical

of 35

  • date post

    11-May-2018
  • Category

    Documents

  • view

    215
  • download

    0

Embed Size (px)

Transcript of ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ?ΠΛ323 -...

  • 323-

    Lecture1IntroductionEliasAthanasopoulos

    eliasathan@cs.ucy.ac.cy

  • Contract- Logistics

    Officehours EveryTuesday,10:00 12:00,B105(01)

    Credits:7.5ECTS LectureTimetableMonday,Thursday,12:00 13:30,006(01)Wednesday,9:00-10:00,110(01),onlywhenannounced!

  • Contract- Logistics

    Labs(Dr.Petros Panayi) EveryWednesday,B103(01) Group1:15:00- 17:00 Group2:17:00- 19:00

  • Contract- Logistics

    Score ProgrammingAssignments(Quizzes),15% Project(multiplesteps),15%Midterm,30% Final,40%

    Requirements Averagegradeofwrittenexamsshouldbeatleast4.5

    Finalgradeshouldbeatleast5

  • Contract- Logistics

    Requiredcourses CS132(Cprogramming), CS211 (Complexity), CS231 (DataStructures)

    Studentswithconflictsshouldletmeknow

  • Contract- Logistics

    Readingmaterial Dragonbook,Compilers Principles,Techniques,andTools(2ndEdition),A.V.Aho,M.S.Lam,R.Sethi andJ.D.Ullman,PrenticeHall,2006.

  • Communication

    Labs Blackboard

    Lectures http://www.cs.ucy.ac.cy/courses/EPL323

  • INTRODUCTION(Chapter1fromDragonBook)

  • Whythiscourseisimportant?

    Manycoreconcepts ofCSinareal-lifesetting CS132:BehindthescenesofCprogramming CS211,CS231:DFAs,parsingalgorithms CS372,CS221:Assembly,ComputerArchitecture

    Compilersareeverywhere Fromexoticdevices(IoT)towebbrowsers(JavaScriptengines,DOMparsers,etc.)

    TweakingcompilersforSecurity

  • Compilerisatoughone

    Significantefforttobuildacompilerfromscratch FirstFortrancompilertook18staff-years

    Nowadays,manytoolsexisttohelp Ourunderstandingismuchbetter Bettertechniques,betterprogramminglanguages

  • Whatisacompiler?

    Acompilerisaprogramthat readsaprogramwritteninonelanguage(source) andtranslatesittoanequivalentprograminanotherlanguage(target)

    important:errorreportingduringtranslation

    Compilersourceprogram targetprogram

    errormessages

  • Examples

    GCC(GnuCompilerCollection) gcc,g++,javac,etc.

    LLVM(LowLevelVirtualMachine) clang,clang++

    Compilersareeverywhere, Prettyprinters(i.e.,colorsyntaxineditors),staticcheckers,interpreters(i.e.,scriptinglanguages),etc.

  • Analysis-SynthesisModel

    Therearetwopartsincompilation: Analysis Synthesis

    Analysis Breaksupthesourceprogram tosubpartsandcreatesintermediaterepresentation(s)

    Synthesis Constructsthetargetprogram fromintermediaterepresentation(s)

  • Example1(LaTeX)\begin{table}[tb]

    \centering\caption{We name gadgets based on their type (prefix), payload (body),and exit instruction (suffix). In total, we name 2$\times$3$\times$3=18different gadget types.}\begin{tabular}{|c|c|c|}

    \hline\textbf{Gadget type} & \textbf{Payload instructions} &

    \textbf{Exit instruction} \\\hline{Prefix} & {Body} & {Suffix} \\\hline

    \begin{tabular}{l}CS - Call site\\EP - Entry point\\\end{tabular} &...

    $ pdflatex main.tex

  • Example2(Database)

    class

    student Seq. Scan

    Hash Join

    Aggregate

    SELECT AVG(grade)FROM student, classWHEREclass.name = epl223 ANDclass.id = student.id;

  • Requirements

    Compiler Reliability Fastexecution Lowmemoryoverhead Gooderrorreporting Errorrecovery Portability Maintainability

    Targetprogram Fastexecution Lowmemoryoverhead

  • Sourcecode/program

    Easytoread/writebyhuman

    int expr(int n) {int d;

    d = 4 * n * n * (n + 1) * (n + 1);

    return d;}

  • AssemblyandMachineCode

    Optimizedforexecutionbyamachine(CPU) Lessdescriptive Hardtobeprocessedbyahuman

    lda $30,-32($30)stq $26,0($30)stq $15,8($30)bis $30,$30,$15bis $16,$16,$1stl $1,16($15)lds $f1,16($15)sts $f1,24($15)ldl $5,24($15)bis $5,$5,$2s4addq $2,0,$3ldl $4,16($15)mull $4,$3,$2ldl $3,16($15)

  • Optimizations

    Compilershaveseverallayersofoptimizationsint expr(int n){

    int d;d = 4 * n * n * (n + 1) * (n + 1);

    return d;}

    .expr:stw 31,-4(1) lwz 11,64(31)stwu 1,-40(1) addi 9,11,1mr 31,1 mullw 0,0,9stw 3,64(31) stw 0,24(31)lwz 0,64(31) lwz 0,24(31)mr 9,0 mr 3,0slwi 0,9,2 b L..2lwz 9,64(31) L..2:mullw 0,0,9 lwz 1,0(1)lwz 11,64(31) lwz 31,-4(1)addi 9,11,1 blrmullw 0,0,9

    .expr:addi 9,3,1slwi 0,3,2mullw 3,3,0mullw 3,3,9mullw 3,3,9blr

    Nooptimizations$ gcc O0

    Optimizations$ gcc O3

  • Cross-compiler

    Compilerscangeneratecodefordifferentmachines(targets) int expr(int n){

    int d;d = 4 * n * n * (n + 1) * (n + 1);

    return d;}

    expr:pushl %ebpmovl %esp, %ebpmovl 8(%ebp), %eaxleal 1(%eax), %edximull %eax, %eaximull %edx, %eaximull %edx, %eaxsall $2, %eaxpopl %ebpret

    .expr:addi 9,3,1slwi 0,3,2mullw 3,3,0mullw 3,3,9mullw 3,3,9blr

    Forx86$ gcc O3 b i586

    ForPowerPC$ gcc O3 b powerpc

  • Compilationlifecycle

    Phases Sourcecodeistransformedtointermediaterepresentations

    Eachintermediaterepresentationissuitableforaparticularprocessing(lexical,syntax,optimization,etc.)

    Ineachphasetheprogramistranslatedtoaformclosertothemachinerepresentationandlesssimilartothe(human-oriented)sourcerepresentation

  • CompilerPhases

    Lexical Analyzer

    Source Program

    Target Program

    Syntax Analyzer

    Semantic Analyzer

    Intermediate codegenerator

    Code optimizer

    Code generator

    Error HandlerSymbol Table Manager

    1

    2

    2

    3

    3

    Optional

    1:Tokens()2:Syntaxtree()3:Intermediatecode()

  • Analysisofthesourceprogram

    Linearanalysis Sourceistreatedasastreamofcharacters(left-to-right)andisgroupedintotokens

    Hierarchicalanalysis Tokensarefurthergroupedinlargergrammaticalstructures(e.g.,nestedparenthesesandblocks)

    Semanticanalysis Certainchecksareperformedtoensurethevalidityoftheidentifiedgrammaticalstructures

  • LexicalAnalysis

    Linearscanning Considertheexpression

    position := initial + rate *60

    Lexicalanalysisproducesid(1) op(:=) id(2) op(+) id(3) op(*) cons(60)id: identifier, op: operator, cons: constant

    SymbolTable 1 position 2 initial 3 rate 4

  • SyntaxAnalysis

    Hierarchical Involvesgroupingthetokensintogrammaticalphases

    Constructsthestructurewiththetokenrelationship

    op(:=)

    id(3) cons(60)

    id(2) op(*)

    id(1) op(+)

    position := initial + rate * 60

  • SimpleGrammar

    Thehierarchicalstructureoftheprogramisusuallyexpressedbyrecursiverules

    1. Anyidentifier isanexpression2. Anynumber isanexpression3. Ifexpression1 andexpression2 areexpressions,

    thensoare:expression1 +expression2expression1 *expression2(expression1 )

  • Applyingthegrammar

    op(:=)

    id(3) cons(60)

    id(2) op(*)

    id(1) op(+)

    (1)Anyidentifier isanexpression(2)Anynumber isanexpression(3)Ifexpression1 andexpression2 areexpressions,thensoare:

    expression1 +expression2expression1 *expression2(expression1 )

  • SemanticAnalysis

    Checkstheprogramforsemanticerrors Gatherstypeinformation Operandsandoperators Type-checking

    op(:=)

    id(3) int_2_real

    id(2) op(*)

    id(1) op(+)

    cons(60)

    FLOAT

    INTposition := initial + rate * 60

    int_2_real() isanextranodeforconverting60 toarealnumber.Remember:themachinerepresentationofintegersandrealnumbersisdifferent!

  • Errordetectionandreporting

    Allphasescanissueerrors Acompilerthatstopsatthefirsterrorisnothelpful Mostoftheerrorsarehandledinthesyntax/semanticanalysisphases Lexicalanalysisdetectserrorswhereastreamofcharactersdoesnotformavalidtoken

    Syntaxanalysisdetectserrorswherethestreamofvalidtokensviolatethestructurerules(syntax)

    Semanticanalysisdetectserrorswherethesyntaxisvalidbytheoperationnot(addinganarraywitharealnumber)

  • IntermediateCodeandOptimization

    Eachphaseproducesintermediatecode

    Optimization

    temp1 := int_2_real(60)temp2 := id(3) * temp1temp3 := id(2) + temp2id(1) := temp3

    temp1 := id(3) * 60.0id(1) := id(2) + temp1

    three-addresscode:asimpleassembly-likelanguage,whichconsistsofinstructions,eachofwhichhasatmostthreeoperands

    Operator:Operand:

  • CodeGeneration

    Thelastphaseofthecompileristhegenerationofthetargetcode

    Registerallocation Eachexpressionshoulduseregistersthatareavailable

    Relocationinformation Variablesarestoredinrelocatable addresses

    MOVF id3, R2MULF #60.0, R2MOVF id2, R1ADDF R2, R1MOVF R1, id1

  • GenericPictureCompilerandFriends

    Sourcecode

    Compiler

    Assemblycode

    Assembler

    Machinecode

    Executable

    Linker

    SLoader

  • FrontandBackends

    Separationofcommontasks Makesdesignand

    implementationeasier KcompilersforNmachines Nbackends,Kfrontends InsteadofK*Ncompilers

    Frontend(machine-

    independent)

    Backend

    Lexical Analyzer

    Source Program

    Target Program

    Syntax Analyzer

    Semantic Analyzer

    Intermediate codegenerator

    Code optimizer

    Code generator

  • Passes

    Apass iswhenthecompilerreadsthesourcecode(orintermediatefiles)

    Thenumberofpassesdependsonthesourceandtargetlanguageandtherunningenvironment

    Differentphasesthatcooperatecanbegroupedtoasinglepass(notalwayspossible)

    Whengroupingisnotpossible Backpatching:leaveemptyinformationthatisgoingtobefilledbyalaterphase/pass

  • Compiler-constructionTools

    Parsergenerators Scannergenerators Syntax-directedtranslationengines Automaticcodegenerators Data-flowengines