ΕΠΛ323 (EPL323) - Theory and Practice of Compilers

Lecture 1: Introduction
Elias Athanasopoulos

Contract - Logistics

• Office hours
  – Every Tuesday, 10:00 – 12:00, B105 (ΘΕΕ01)

• Credits: 7.5 ECTS
• Lecture timetable
  – Monday, Thursday, 12:00 – 13:30, 006 (ΧΩΔ01)
  – Wednesday, 9:00 – 10:00, 110 (ΧΩΔ01), only when announced!

• Labs (Dr. Petros Panayi)
  – Every Wednesday, B103 (ΘΕΕ01)
  – Group 1: 15:00 – 17:00
  – Group 2: 17:00 – 19:00

• Score
  – Programming assignments (quizzes), 15%
  – Project (multiple steps), 15%
  – Midterm, 30%
  – Final, 40%

• Requirements
  – Average grade of written exams should be at least 4.5
  – Final grade should be at least 5

• Required courses
  – CS132 (C programming)
  – CS211 (Complexity)
  – CS231 (Data Structures)

Students with conflicts should let me know.

• Reading material
  – Dragon book: Compilers: Principles, Techniques, and Tools (2nd Edition), A. V. Aho, M. S. Lam, R. Sethi and J. D. Ullman, Prentice Hall, 2006.

Communication

• Labs
  – Blackboard

• Lectures
  – http://www.cs.ucy.ac.cy/courses/EPL323

INTRODUCTION (Chapter 1 from the Dragon Book)

Why is this course important?

• Many core concepts of CS in a real-life setting
  – CS132: behind the scenes of C programming
  – CS211, CS231: DFAs, parsing algorithms
  – CS372, CS221: assembly, computer architecture

• Compilers are everywhere
  – From exotic devices (IoT) to web browsers (JavaScript engines, DOM parsers, etc.)

• Tweaking compilers for security

A compiler is a tough one

• Building a compiler from scratch takes significant effort
  – The first Fortran compiler took 18 staff-years

• Nowadays, many tools exist to help
  – Our understanding is much better
  – Better techniques, better programming languages

What is a compiler?

• A compiler is a program that
  – reads a program written in one language (source)
  – and translates it to an equivalent program in another language (target)
  – important: error reporting during translation

[Figure: source program → Compiler → target program; the compiler also emits error messages]

Examples

• GCC (GNU Compiler Collection)
  – gcc, g++, gfortran, etc.

• LLVM (Low Level Virtual Machine)
  – clang, clang++

• "Compilers" are everywhere
  – Pretty printers (e.g., syntax coloring in editors), static checkers, interpreters (e.g., for scripting languages), etc.

Analysis-Synthesis Model

• There are two parts in compilation:
  – Analysis
  – Synthesis

• Analysis
  – Breaks up the source program into subparts and creates intermediate representation(s)

• Synthesis
  – Constructs the target program from the intermediate representation(s)

Example 1 (LaTeX)

    \begin{table}[tb]
    \centering
    \caption{We name gadgets based on their type (prefix), payload (body),
    and exit instruction (suffix). In total, we name 2$\times$3$\times$3=18
    different gadget types.}
    \begin{tabular}{|c|c|c|}
    \hline
    \textbf{Gadget type} & \textbf{Payload instructions} &
    \textbf{Exit instruction} \\
    \hline
    {Prefix} & {Body} & {Suffix} \\
    \hline
    \begin{tabular}{l}CS - Call site\\EP - Entry point\\\end{tabular} &
    ...

    $ pdflatex main.tex

Example 2 (Database)

    SELECT AVG(grade)
    FROM student, class
    WHERE class.name = 'epl223' AND
          class.id = student.id;

[Figure: query plan for the statement above, with nodes Seq. Scan (over student and class), Hash Join, and Aggregate]

Requirements

• Compiler
  – Reliability
  – Fast execution
  – Low memory overhead
  – Good error reporting
  – Error recovery
  – Portability
  – Maintainability

• Target program
  – Fast execution
  – Low memory overhead

Source code/program

• Easy to read/write by a human

    int expr(int n) {
        int d;
        d = 4 * n * n * (n + 1) * (n + 1);
        return d;
    }

Assembly and Machine Code

• Optimized for execution by a machine (CPU)
• Less descriptive
• Hard for a human to process

    lda $30,-32($30)
    stq $26,0($30)
    stq $15,8($30)
    bis $30,$30,$15
    bis $16,$16,$1
    stl $1,16($15)
    lds $f1,16($15)
    sts $f1,24($15)
    ldl $5,24($15)
    bis $5,$5,$2
    s4addq $2,0,$3
    ldl $4,16($15)
    mull $4,$3,$2
    ldl $3,16($15)

Optimizations

• Compilers have several layers of optimizations

    int expr(int n)
    {
        int d;
        d = 4 * n * n * (n + 1) * (n + 1);
        return d;
    }

No optimizations ($ gcc -O0):

    .expr:
        stw 31,-4(1)
        stwu 1,-40(1)
        mr 31,1
        stw 3,64(31)
        lwz 0,64(31)
        mr 9,0
        slwi 0,9,2
        lwz 9,64(31)
        mullw 0,0,9
        lwz 11,64(31)
        addi 9,11,1
        mullw 0,0,9
        lwz 11,64(31)
        addi 9,11,1
        mullw 0,0,9
        stw 0,24(31)
        lwz 0,24(31)
        mr 3,0
        b L..2
    L..2:
        lwz 1,0(1)
        lwz 31,-4(1)
        blr

Optimizations ($ gcc -O3):

    .expr:
        addi 9,3,1
        slwi 0,3,2
        mullw 3,3,0
        mullw 3,3,9
        mullw 3,3,9
        blr
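To see that the short -O3 listing still computes the same value as the source, the six instructions can be transcribed back into C one by one. This is only a sketch: the names `expr_o3` and `same_result` are made up for illustration, the register variables mirror the PowerPC registers in the listing, and the shift is written as a multiplication by 4 so that negative inputs stay well-defined in C.

```c
#include <assert.h>

/* Source-level version, as on the slide. */
int expr(int n) {
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}

/* Hand transcription of the optimized listing (hypothetical helper).
   r3 holds the argument n on entry and the result on return. */
int expr_o3(int n) {
    int r3 = n;
    int r9 = r3 + 1;   /* addi  9,3,1 : r9 = n + 1                 */
    int r0 = r3 * 4;   /* slwi  0,3,2 : r0 = n shifted left by 2   */
    r3 = r3 * r0;      /* mullw 3,3,0 : r3 = n * 4n                */
    r3 = r3 * r9;      /* mullw 3,3,9 : r3 = 4n^2 * (n + 1)        */
    r3 = r3 * r9;      /* mullw 3,3,9 : r3 = 4n^2 * (n + 1)^2      */
    return r3;         /* blr         : return to caller           */
}

/* Returns 1 if both versions agree on every n in [lo, hi].
   The range must be small enough that the product does not overflow. */
int same_result(int lo, int hi) {
    assert(lo <= hi);
    for (int n = lo; n <= hi; n++)
        if (expr(n) != expr_o3(n))
            return 0;
    return 1;
}
```

Calling `same_result(-20, 20)` compares the two versions on a small range. The optimizer kept the arithmetic and removed only the stack traffic of the -O0 version.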

Cross-compiler

• Compilers can generate code for different machines (targets)

    int expr(int n)
    {
        int d;
        d = 4 * n * n * (n + 1) * (n + 1);
        return d;
    }

For x86 ($ gcc -O3 -b i586):

    expr:
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %eax
        leal 1(%eax), %edx
        imull %eax, %eax
        imull %edx, %eax
        imull %edx, %eax
        sall $2, %eax
        popl %ebp
        ret

For PowerPC ($ gcc -O3 -b powerpc):

    .expr:
        addi 9,3,1
        slwi 0,3,2
        mullw 3,3,0
        mullw 3,3,9
        mullw 3,3,9
        blr

Compilation lifecycle

• Phases
  – Source code is transformed into intermediate representations
  – Each intermediate representation is suitable for a particular kind of processing (lexical, syntax, optimization, etc.)

• In each phase the program is translated to a form closer to the machine representation and less similar to the (human-oriented) source representation

Compiler Phases

    Source Program
      → Lexical Analyzer
      → (1) Syntax Analyzer
      → (2) Semantic Analyzer
      → (2) Intermediate code generator
      → (3) Code optimizer (optional)
      → (3) Code generator
      → Target Program

The Symbol Table Manager and the Error Handler interact with all phases.

Data passed between phases:
1: Tokens (Διακριτικά)
2: Syntax tree (Συντακτικό δένδρο)
3: Intermediate code (Ενδιάμεσος κώδικας)

Analysis of the source program

• Linear analysis
  – The source is treated as a stream of characters (left to right) and is grouped into tokens

• Hierarchical analysis
  – Tokens are further grouped into larger grammatical structures (e.g., nested parentheses and blocks)

• Semantic analysis
  – Certain checks are performed to ensure the validity of the identified grammatical structures

Lexical Analysis (Λεξιλογική Ανάλυση)

• Linear scanning
• Consider the expression

    position := initial + rate * 60

• Lexical analysis produces

    id(1) op(:=) id(2) op(+) id(3) op(*) cons(60)

  id: identifier, op: operator, cons: constant

• Symbol table

    1  position  …
    2  initial   …
    3  rate      …
    4  …         …
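The scan above can be sketched as a small hand-written tokenizer. This is an illustration under assumptions, not the course's scanner: the names `tokenize`, `intern`, and `symbols` are hypothetical, only the operators `:=`, `+`, and `*` are recognized, and identifiers or numbers longer than the fixed buffer are simply split.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 16
#define MAX_NAME    32

/* Hypothetical symbol table: slot i holds the name of id(i + 1). */
static char symbols[MAX_SYMBOLS][MAX_NAME];
static int  symbol_count = 0;

/* Return the 1-based index of name, inserting it on first sight. */
static int intern(const char *name) {
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(symbols[i], name) == 0)
            return i + 1;
    assert(symbol_count < MAX_SYMBOLS);
    strcpy(symbols[symbol_count], name);
    return ++symbol_count;
}

/* Scan src left to right and write the token stream into out.
   Returns 0 on success, -1 if a character starts no valid token
   or the output buffer is too small. */
int tokenize(const char *src, char *out, size_t out_size) {
    size_t used = 0;
    assert(out_size > 0);
    symbol_count = 0;               /* start with an empty symbol table */
    out[0] = '\0';
    while (*src != '\0') {
        char lexeme[MAX_NAME];
        char token[MAX_NAME + 8];
        size_t len = 0;
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (isalpha((unsigned char)*src)) {            /* identifier */
            while (isalnum((unsigned char)*src) && len < MAX_NAME - 1)
                lexeme[len++] = *src++;
            lexeme[len] = '\0';
            snprintf(token, sizeof token, "id(%d)", intern(lexeme));
        } else if (isdigit((unsigned char)*src)) {     /* constant */
            while (isdigit((unsigned char)*src) && len < MAX_NAME - 1)
                lexeme[len++] = *src++;
            lexeme[len] = '\0';
            snprintf(token, sizeof token, "cons(%s)", lexeme);
        } else if (src[0] == ':' && src[1] == '=') {   /* assignment */
            strcpy(token, "op(:=)");
            src += 2;
        } else if (*src == '+' || *src == '*') {       /* arithmetic */
            snprintf(token, sizeof token, "op(%c)", *src++);
        } else {
            return -1;                                 /* lexical error */
        }
        int n = snprintf(out + used, out_size - used, "%s%s",
                         used ? " " : "", token);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Running `tokenize` on the expression from the slide fills the output buffer with the token stream listed above, and the symbol table ends up holding position, initial, and rate in slots 1 to 3.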

Syntax Analysis (Συντακτική Ανάλυση)

• Hierarchical
• Involves grouping the tokens into grammatical phrases
• Constructs a structure that captures the relationships between the tokens

    position := initial + rate * 60

            op(:=)
           /      \
       id(1)      op(+)
                 /     \
             id(2)     op(*)
                      /     \
                  id(3)   cons(60)

Simple Grammar

• The hierarchical structure of the program is usually expressed by recursive rules

1. Any identifier is an expression
2. Any number is an expression
3. If expression1 and expression2 are expressions, then so are:
     expression1 + expression2
     expression1 * expression2
     ( expression1 )
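A recursive-descent recognizer is one common way to turn such rules into code. The sketch below is an assumption-laden illustration: the rules as stated do not fix operator precedence, so the code layers them in the usual way (`*` binds tighter than `+`), and it prints the recognized structure in postfix order instead of building a tree. The names `Parser`, `to_postfix`, and the `parse_*` functions are hypothetical, and identifiers and numbers are both treated as plain alphanumeric runs.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parser state: input cursor plus a buffer for the postfix output. */
typedef struct {
    const char *p;     /* next unread input character */
    char out[256];     /* postfix rendering built so far */
    size_t used;
    int ok;            /* cleared on the first syntax error */
} Parser;

static void skip_spaces(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Append one item (lexeme or operator) followed by a space. */
static void emit(Parser *ps, const char *text, size_t len) {
    if (ps->used + len + 2 > sizeof ps->out) { ps->ok = 0; return; }
    memcpy(ps->out + ps->used, text, len);
    ps->used += len;
    ps->out[ps->used++] = ' ';
    ps->out[ps->used] = '\0';
}

static void parse_expr(Parser *ps);

/* factor -> identifier | number | ( expr )   (rules 1, 2, and 3c) */
static void parse_factor(Parser *ps) {
    skip_spaces(ps);
    if (*ps->p == '(') {
        ps->p++;
        parse_expr(ps);
        skip_spaces(ps);
        if (*ps->p == ')') ps->p++; else ps->ok = 0;
    } else if (isalnum((unsigned char)*ps->p)) {
        const char *start = ps->p;
        while (isalnum((unsigned char)*ps->p)) ps->p++;
        emit(ps, start, (size_t)(ps->p - start));
    } else {
        ps->ok = 0;    /* no rule applies here */
    }
}

/* term -> factor { * factor }   (rule 3b) */
static void parse_term(Parser *ps) {
    parse_factor(ps);
    skip_spaces(ps);
    while (ps->ok && *ps->p == '*') {
        ps->p++;
        parse_factor(ps);
        emit(ps, "*", 1);
        skip_spaces(ps);
    }
}

/* expr -> term { + term }   (rule 3a) */
static void parse_expr(Parser *ps) {
    parse_term(ps);
    while (ps->ok && *ps->p == '+') {
        ps->p++;
        parse_term(ps);
        emit(ps, "+", 1);
    }
}

/* Returns 1 and fills postfix if src is an expression, else returns 0. */
int to_postfix(const char *src, char *postfix, size_t size) {
    Parser ps = { src, "", 0, 1 };
    parse_expr(&ps);
    skip_spaces(&ps);
    if (!ps.ok || *ps.p != '\0' || ps.used == 0 || ps.used > size)
        return 0;
    memcpy(postfix, ps.out, ps.used);
    postfix[ps.used - 1] = '\0';   /* overwrite the trailing space */
    return 1;
}
```

For `initial + rate * 60` the postfix form places `*` before `+`, which matches the nesting of the syntax tree: the multiplication is a subexpression of the addition.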

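Applying the grammar

            op(:=)
           /      \
       id(1)      op(+)
                 /     \
             id(2)     op(*)
                      /     \
                  id(3)   cons(60)

(1) Any identifier is an expression: id(1), id(2), id(3)
(2) Any number is an expression: cons(60)
(3) If expression1 and expression2 are expressions, then so are:
      expression1 + expression2
      expression1 * expression2
      ( expression1 )
    so id(3) * cons(60) is an expression, and so is id(2) + id(3) * cons(60)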

Semantic Analysis (Σημασιολογική Ανάλυση)

• Checks the program for semantic errors
• Gathers type information
• Operands and operators
• Type-checking

    position := initial + rate * 60

            op(:=)
           /      \
       id(1)      op(+)
                 /     \
             id(2)     op(*)
                      /     \
                  id(3)   int_2_real
                 (FLOAT)      |
                          cons(60)
                           (INT)

int_2_real() is an extra node for converting 60 to a real number. Remember: the machine representation of integers and real numbers is different!
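The insertion of the conversion node can be sketched as a bottom-up pass over the tree. Everything named here (`Node`, `leaf`, `binary`, `check`, `coerce_to_real`) is hypothetical, the sketch assumes only two types, and it covers only binary arithmetic nodes: the assignment node and error reporting are left out, and nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TYPE_INT, TYPE_REAL } Type;
typedef enum { NODE_LEAF, NODE_BINARY, NODE_INT_2_REAL } Kind;

typedef struct Node {
    Kind kind;
    Type type;            /* declared type for leaves, computed otherwise */
    struct Node *left;    /* left operand, or the operand of a conversion */
    struct Node *right;   /* right operand of a binary node, else NULL */
} Node;

static Node *new_node(Kind kind, Type type, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->type = type;
    n->left = left;
    n->right = right;
    return n;
}

/* An identifier or constant with a known type. */
Node *leaf(Type type) {
    return new_node(NODE_LEAF, type, NULL, NULL);
}

/* An arithmetic node; its type is a placeholder until check() runs. */
Node *binary(Node *left, Node *right) {
    return new_node(NODE_BINARY, TYPE_INT, left, right);
}

/* Wrap an INT operand in a conversion node; REAL operands pass through. */
static Node *coerce_to_real(Node *n) {
    if (n->type == TYPE_REAL)
        return n;
    return new_node(NODE_INT_2_REAL, TYPE_REAL, n, NULL);
}

/* Compute the type of every node bottom-up, inserting int_2_real
   wherever an INT operand meets a REAL one. Returns the subtree type. */
Type check(Node *n) {
    if (n->kind != NODE_BINARY)
        return n->type;
    Type lt = check(n->left);
    Type rt = check(n->right);
    if (lt != rt) {
        n->left = coerce_to_real(n->left);
        n->right = coerce_to_real(n->right);
    }
    n->type = (lt == TYPE_REAL || rt == TYPE_REAL) ? TYPE_REAL : TYPE_INT;
    return n->type;
}
```

Building the subtree for `initial + rate * 60` with REAL identifiers and an INT constant, then calling `check` on its root, leaves a conversion node between the multiplication and the constant, as in the tree above.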

Error detection and reporting

• All phases can issue errors
• A compiler that stops at the first error is not helpful
• Most of the errors are handled in the syntax/semantic analysis phases
  – Lexical analysis detects errors where a stream of characters does not form a valid token
  – Syntax analysis detects errors where the stream of valid tokens violates the structure rules (syntax)
  – Semantic analysis detects errors where the syntax is valid but the operation is not (e.g., adding an array to a real number)

Intermediate Code and Optimization (Ενδιάμεσος Κώδικας και Βελτιστοποίηση)

• Each phase produces intermediate code

    temp1 := int_2_real(60)
    temp2 := id(3) * temp1
    temp3 := id(2) + temp2
    id(1) := temp3

• Optimization

    temp1 := id(3) * 60.0
    id(1) := id(2) + temp1

Three-address code: a simple assembly-like language, which consists of instructions, each of which has at most three operands.

Operator: τελεστής. Operand: τελεστέος.
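One way to produce the unoptimized three-address listing is a post-order walk of the expression tree that allocates a fresh temporary for every inner node. The sketch below assumes a hypothetical `Expr` node type, uses the character 'c' to mark the int_2_real conversion, and keeps the generated text in a fixed-size `Code` buffer; none of these names come from the lecture.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Expr {
    char op;                    /* '+', '*', 'c' (int_2_real), 0 for a leaf */
    const char *name;           /* leaf text such as "id(3)" or "60" */
    const struct Expr *left;    /* left operand, or operand of 'c' */
    const struct Expr *right;   /* right operand of '+' and '*' */
} Expr;

typedef struct {
    char text[512];             /* generated instructions, one per line */
    size_t used;
    int next_temp;              /* number of the next free temporary */
} Code;

/* Append one instruction line to the code buffer. */
static void append(Code *code, const char *line) {
    size_t len = strlen(line);
    assert(code->used + len + 2 <= sizeof code->text);
    memcpy(code->text + code->used, line, len);
    code->used += len;
    code->text[code->used++] = '\n';
    code->text[code->used] = '\0';
}

/* Emit code for e; result receives the name that holds its value. */
static void gen(const Expr *e, Code *code, char *result, size_t size) {
    char left[32], right[32], line[128];
    if (e->op == 0) {           /* a leaf is already a valid operand */
        snprintf(result, size, "%s", e->name);
        return;
    }
    gen(e->left, code, left, sizeof left);
    if (e->op == 'c') {         /* unary conversion */
        snprintf(result, size, "temp%d", code->next_temp++);
        snprintf(line, sizeof line, "%s := int_2_real(%s)", result, left);
    } else {                    /* binary operator */
        gen(e->right, code, right, sizeof right);
        snprintf(result, size, "temp%d", code->next_temp++);
        snprintf(line, sizeof line, "%s := %s %c %s",
                 result, left, e->op, right);
    }
    append(code, line);
}

/* Translate the assignment "target := e" into three-address code. */
void gen_assign(const char *target, const Expr *e, Code *code) {
    char value[32], line[128];
    code->used = 0;
    code->next_temp = 1;
    code->text[0] = '\0';
    gen(e, code, value, sizeof value);
    snprintf(line, sizeof line, "%s := %s", target, value);
    append(code, line);
}
```

Applied to the tree for `position := initial + rate * 60`, this walk yields the four-instruction listing shown first above. Folding the conversion into the constant 60.0 and dropping the final copy are left to the optimizer.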

Code Generation (Παραγωγή Κώδικα)

• The last phase of the compiler is the generation of the target code

• Register allocation
  – Each expression should use registers that are available

• Relocation information
  – Variables are stored in relocatable addresses

    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1

Generic Picture: Compiler and Friends

    Source code
      → Compiler
      → Assembly code
      → Assembler
      → Machine code
      → Linker
      → Executable
      → OS Loader

Front and Back Ends

• Separation of common tasks
• Makes design and implementation easier
• Compilers for K source languages and N machines
  – K front ends and N back ends
  – Instead of K*N complete compilers

[Figure: the phase pipeline (Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate code generator → Code optimizer → Code generator → Target Program), split into a front end (machine-independent) and a back end]

Passes

• A pass is one complete reading of the source code (or of intermediate files) by the compiler

• The number of passes depends on the source and target languages and on the running environment

• Different phases that cooperate can be grouped into a single pass (not always possible)

• When grouping is not possible
  – Backpatching: leave blank information that is going to be filled in by a later phase/pass
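Backpatching is easiest to see on a forward jump, where the target address is not known at the moment the jump is emitted. The sketch below uses a made-up three-opcode instruction set and hypothetical names (`Program`, `emit`, `backpatch`, `emit_skip`); it only illustrates the leave-a-hole-then-fill-it idea.

```c
#include <assert.h>

#define MAX_CODE 64
#define UNKNOWN  (-1)      /* placeholder for a target not yet known */

typedef enum { OP_NOP, OP_JUMP, OP_HALT } Opcode;

typedef struct {
    Opcode op;
    int target;            /* jump destination; unused by other opcodes */
} Instr;

typedef struct {
    Instr code[MAX_CODE];
    int count;             /* number of instructions emitted so far */
} Program;

/* Append one instruction and return its index. */
int emit(Program *p, Opcode op, int target) {
    assert(p->count < MAX_CODE);
    p->code[p->count].op = op;
    p->code[p->count].target = target;
    return p->count++;
}

/* Fill in a jump whose target was unknown when it was emitted. */
void backpatch(Program *p, int jump_index, int target) {
    assert(jump_index >= 0 && jump_index < p->count);
    assert(p->code[jump_index].op == OP_JUMP);
    assert(p->code[jump_index].target == UNKNOWN);
    p->code[jump_index].target = target;
}

/* Emit a forward jump over `skipped` NOP instructions in one pass.
   The jump is written first with an empty target and patched once
   the position after the skipped block is known. */
int emit_skip(Program *p, int skipped) {
    int jump = emit(p, OP_JUMP, UNKNOWN);
    for (int i = 0; i < skipped; i++)
        emit(p, OP_NOP, 0);
    backpatch(p, jump, p->count);   /* next free slot is the target */
    return jump;
}
```

After `emit_skip(&p, 3)` on an empty program, instruction 0 is a jump to index 4, the slot where the next emitted instruction will land. A real compiler typically keeps lists of such unresolved jumps, for example one list per label.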

Compiler-construction Tools

• Parser generators
• Scanner generators
• Syntax-directed translation engines
• Automatic code generators
• Data-flow engines