Contract - Logistics
• Office hours
– Every Tuesday, 10:00-12:00, B105 (ΘΕΕ01)
• Credits: 7.5 ECTS
• Lecture timetable
– Monday, Thursday, 12:00-13:30, 006 (ΧΩΔ01)
– Wednesday, 9:00-10:00, 110 (ΧΩΔ01), only when announced!
• Labs (Dr. Petros Panayi)
– Every Wednesday, B103 (ΘΕΕ01)
– Group 1: 15:00-17:00
– Group 2: 17:00-19:00
• Score
– Programming Assignments (Quizzes), 15%
– Project (multiple steps), 15%
– Midterm, 30%
– Final, 40%
• Requirements
– Average grade of the written exams should be at least 4.5
– Final grade should be at least 5
• Required courses
– CS132 (C programming)
– CS211 (Complexity)
– CS231 (Data Structures)
Students with conflicts should let me know.
• Reading material
– Dragon book: Compilers: Principles, Techniques, and Tools (2nd Edition), A. V. Aho, M. S. Lam, R. Sethi and J. D. Ullman, Prentice Hall, 2006.
Why is this course important?
• Many core concepts of CS in a real-life setting
– CS132: Behind the scenes of C programming
– CS211, CS231: DFAs, parsing algorithms
– CS372, CS221: Assembly, Computer Architecture
• Compilers are everywhere
– From exotic devices (IoT) to web browsers (JavaScript engines, DOM parsers, etc.)
• Tweaking compilers for security
Building a compiler is tough
• Significant effort to build a compiler from scratch
– The first Fortran compiler took 18 staff-years
• Nowadays, many tools exist to help
– Our understanding is much better
– Better techniques, better programming languages
What is a compiler?
• A compiler is a program that
– reads a program written in one language (source)
– and translates it to an equivalent program in another language (target)
– important: error reporting during translation
Figure: source program → Compiler → target program, with error messages emitted by the compiler.
Examples
• GCC (GNU Compiler Collection)
– gcc, g++, gcj, etc.
• LLVM (Low Level Virtual Machine)
– clang, clang++
• "Compilers" are everywhere
– Pretty printers (e.g., syntax coloring in editors), static checkers, interpreters (e.g., for scripting languages), etc.
Analysis-Synthesis Model
• There are two parts in compilation:
– Analysis
– Synthesis
• Analysis
– Breaks up the source program into its constituent parts and creates intermediate representation(s)
• Synthesis
– Constructs the target program from the intermediate representation(s)
Example 1 (LaTeX)
\begin{table}[tb]
\centering
\caption{We name gadgets based on their type (prefix), payload (body), and exit instruction (suffix). In total, we name 2$\times$3$\times$3=18 different gadget types.}
\begin{tabular}{|c|c|c|}
\hline
\textbf{Gadget type} & \textbf{Payload instructions} & \textbf{Exit instruction} \\
\hline
{Prefix} & {Body} & {Suffix} \\
\hline
\begin{tabular}{l}
CS - Call site\\
EP - Entry point\\
\end{tabular} &
...
$ pdflatex main.tex
Example 2 (Database)
SELECT AVG(grade)
FROM student, class
WHERE class.name = 'epl223' AND class.id = student.id;
Figure (query plan): class, student → Seq. Scan → Hash Join → Aggregate
Requirements
• Compiler
– Reliability
– Fast execution
– Low memory overhead
– Good error reporting
– Error recovery
– Portability
– Maintainability
• Target program
– Fast execution
– Low memory overhead
Source code/program
• Easy for a human to read and write
int expr(int n) {
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}
Assembly and Machine Code
• Optimized for execution by a machine (CPU)
• Less descriptive
• Hard for a human to process
lda $30,-32($30)
stq $26,0($30)
stq $15,8($30)
bis $30,$30,$15
bis $16,$16,$1
stl $1,16($15)
lds $f1,16($15)
sts $f1,24($15)
ldl $5,24($15)
bis $5,$5,$2
s4addq $2,0,$3
ldl $4,16($15)
mull $4,$3,$2
ldl $3,16($15)
Optimizations
• Compilers have several layers of optimizations
int expr(int n)
{
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}
No optimizations ($ gcc -O0)
.expr:
    stw 31,-4(1)
    stwu 1,-40(1)
    mr 31,1
    stw 3,64(31)
    lwz 0,64(31)
    mr 9,0
    slwi 0,9,2
    lwz 9,64(31)
    mullw 0,0,9
    lwz 11,64(31)
    addi 9,11,1
    mullw 0,0,9
    lwz 11,64(31)
    addi 9,11,1
    mullw 0,0,9
    stw 0,24(31)
    lwz 0,24(31)
    mr 3,0
    b L..2
L..2:
    lwz 1,0(1)
    lwz 31,-4(1)
    blr
Optimizations ($ gcc -O3)
.expr:
    addi 9,3,1
    slwi 0,3,2
    mullw 3,3,0
    mullw 3,3,9
    mullw 3,3,9
    blr
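Read back into C, the -O3 listing computes n + 1 only once and keeps every intermediate value in a register. The sketch below is one possible reading of that listing, not compiler output; the function names expr_plain and expr_regs and the variable names r0, r3, r9 (chosen after the PowerPC registers) are made up for illustration.

```c
/* The function as written in the source program. */
int expr_plain(int n) {
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}

/* Hypothetical C rendering of the -O3 listing: one variable per register. */
int expr_regs(int n) {
    int r9 = n + 1;    /* addi  9,3,1                                  */
    int r0 = 4 * n;    /* slwi  0,3,2  (shift left by 2 = times four)  */
    int r3 = n * r0;   /* mullw 3,3,0                                  */
    r3 = r3 * r9;      /* mullw 3,3,9                                  */
    r3 = r3 * r9;      /* mullw 3,3,9                                  */
    return r3;         /* blr                                          */
}
```

Both functions return the same value for inputs that do not overflow; the second one simply makes the five arithmetic instructions of the optimized code visible.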
Cross-compiler
• Compilers can generate code for different machines (targets)
int expr(int n)
{
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}
For x86 ($ gcc -O3 -b i586)
expr:
    pushl %ebp
    movl %esp, %ebp
    movl 8(%ebp), %eax
    leal 1(%eax), %edx
    imull %eax, %eax
    imull %edx, %eax
    imull %edx, %eax
    sall $2, %eax
    popl %ebp
    ret
For PowerPC ($ gcc -O3 -b powerpc)
.expr:
    addi 9,3,1
    slwi 0,3,2
    mullw 3,3,0
    mullw 3,3,9
    mullw 3,3,9
    blr
Compilation lifecycle
• Phases
– Source code is transformed into intermediate representations
– Each intermediate representation is suited to a particular kind of processing (lexical, syntax, optimization, etc.)
• In each phase the program is translated to a form that is closer to the machine representation and less similar to the (human-oriented) source representation
Compiler Phases
Figure: Source Program → Lexical Analyzer → (1) → Syntax Analyzer → (2) → Semantic Analyzer → (2) → Intermediate code generator → (3) → Code optimizer (optional) → (3) → Code generator → Target Program. All phases communicate with the Symbol Table Manager and the Error Handler.
1: Tokens (Διακριτικά)
2: Syntax tree (Συντακτικό δένδρο)
3: Intermediate code (Ενδιάμεσος κώδικας)
Analysis of the source program
• Linear analysis
– The source is treated as a stream of characters (left-to-right) and is grouped into tokens
• Hierarchical analysis
– Tokens are further grouped into larger grammatical structures (e.g., nested parentheses and blocks)
• Semantic analysis
– Certain checks are performed to ensure the validity of the identified grammatical structures
Lexical Analysis (Λεξιλογική Ανάλυση)
• Linear scanning
• Consider the expression
position := initial + rate * 60
• Lexical analysis produces
id(1) op(:=) id(2) op(+) id(3) op(*) cons(60)
id: identifier, op: operator, cons: constant
• Symbol Table
1 position …
2 initial …
3 rate …
4 … …
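A linear scan of this kind can be sketched in a few lines of C. This is a minimal illustration rather than a production lexer: the names next_token, intern, struct token and the TOK_* constants are hypothetical, identifiers are truncated to 31 characters, and any unrecognized character is simply treated as a one-character operator (a real lexer would report an error there).

```c
#include <ctype.h>
#include <string.h>

enum kind { TOK_ID, TOK_OP, TOK_CONS, TOK_END };

struct token {
    enum kind kind;
    char text[32];              /* lexeme, truncated to 31 characters */
};

#define MAX_SYMS 16
static char symtab[MAX_SYMS][32];
static int nsyms = 0;

/* Returns the 1-based symbol-table index of name, adding it if it is new.
   Returns 0 when the table is full. */
int intern(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], name) == 0) return i + 1;
    if (nsyms == MAX_SYMS) return 0;
    strncpy(symtab[nsyms], name, sizeof symtab[0] - 1);
    symtab[nsyms][sizeof symtab[0] - 1] = '\0';
    return ++nsyms;
}

/* Reads one token starting at *src and advances *src past it. */
struct token next_token(const char **src) {
    struct token t = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;          /* skip blanks */

    if (isalpha((unsigned char)*p)) {                /* identifier */
        t.kind = TOK_ID;
        while (isalnum((unsigned char)*p)) {
            if (n < sizeof t.text - 1) t.text[n++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {         /* integer constant */
        t.kind = TOK_CONS;
        while (isdigit((unsigned char)*p)) {
            if (n < sizeof t.text - 1) t.text[n++] = *p;
            p++;
        }
    } else if (p[0] == ':' && p[1] == '=') {         /* two-character operator */
        t.kind = TOK_OP;
        t.text[n++] = *p++;
        t.text[n++] = *p++;
    } else if (*p != '\0') {                         /* simplification: any other character */
        t.kind = TOK_OP;
        t.text[n++] = *p++;
    }
    t.text[n] = '\0';
    *src = p;
    return t;
}
```

Calling next_token repeatedly on "position := initial + rate * 60" and passing each identifier to intern yields the indices 1, 2 and 3 used in the token stream.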
Syntax Analysis (Συντακτική Ανάλυση)
• Hierarchical
• Involves grouping the tokens into grammatical phrases
• Constructs the structure that captures the relationships between the tokens
position := initial + rate * 60
op(:=)
  id(1)
  op(+)
    id(2)
    op(*)
      id(3)
      cons(60)
Simple Grammar
• The hierarchical structure of the program is usually expressed by recursive rules
1. Any identifier is an expression
2. Any number is an expression
3. If expression1 and expression2 are expressions, then so are:
   expression1 + expression2
   expression1 * expression2
   ( expression1 )
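Recursive rules like these map naturally onto recursive functions. The sketch below is a recognizer that only answers whether a string is an expression. It makes one assumption that the three rules do not state: * binds tighter than +, so the rules are split into the layers expr, term and factor. All function names are hypothetical.

```c
#include <ctype.h>

static const char *cur;          /* next unread character of the input */

static void skip_blanks(void) {
    while (isspace((unsigned char)*cur)) cur++;
}

static int parse_expr(void);     /* forward declaration: the rules are recursive */

/* factor -> identifier | number | ( expr ) */
static int parse_factor(void) {
    skip_blanks();
    if (isalpha((unsigned char)*cur)) {              /* rule 1 */
        while (isalnum((unsigned char)*cur)) cur++;
        return 1;
    }
    if (isdigit((unsigned char)*cur)) {              /* rule 2 */
        while (isdigit((unsigned char)*cur)) cur++;
        return 1;
    }
    if (*cur == '(') {                               /* rule 3, parenthesized case */
        cur++;
        if (!parse_expr()) return 0;
        skip_blanks();
        if (*cur != ')') return 0;
        cur++;
        return 1;
    }
    return 0;
}

/* term -> factor { * factor } */
static int parse_term(void) {
    if (!parse_factor()) return 0;
    for (;;) {
        skip_blanks();
        if (*cur != '*') return 1;
        cur++;
        if (!parse_factor()) return 0;
    }
}

/* expr -> term { + term } */
static int parse_expr(void) {
    if (!parse_term()) return 0;
    for (;;) {
        skip_blanks();
        if (*cur != '+') return 1;
        cur++;
        if (!parse_term()) return 0;
    }
}

/* Returns 1 if the whole string is an expression under the rules, else 0. */
int is_expression(const char *text) {
    cur = text;
    if (!parse_expr()) return 0;
    skip_blanks();
    return *cur == '\0';
}
```

For example, is_expression("initial + rate * 60") accepts, while is_expression("a + * b") rejects because no rule produces an expression that starts with *.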
Applying the grammar
op(:=)
  id(1)
  op(+)
    id(2)
    op(*)
      id(3)
      cons(60)
(1) Any identifier is an expression
(2) Any number is an expression
(3) If expression1 and expression2 are expressions, then so are:
    expression1 + expression2
    expression1 * expression2
    ( expression1 )
Semantic Analysis (Σημασιολογική Ανάλυση)
• Checks the program for semantic errors
• Gathers type information
• Operands and operators
• Type-checking
position := initial + rate * 60
op(:=)
  id(1)
  op(+)
    id(2)
    op(*)
      id(3)          FLOAT
      int_2_real
        cons(60)     INT
int_2_real() is an extra node for converting 60 to a real number. Remember: the machine representation of integers and real numbers is different!
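The insertion of int_2_real can be sketched as a bottom-up walk over the tree. This is a minimal illustration under stated assumptions: only + and * are handled, the only types are integer and real, and the names struct node, mk, to_real and check are hypothetical. Nodes are never freed, which is acceptable for a short-lived sketch only.

```c
#include <stdlib.h>

enum type { T_INT, T_REAL };
enum op   { N_ID, N_CONS, N_ADD, N_MUL, N_INT2REAL };

struct node {
    enum op op;
    enum type type;              /* given for leaves, computed for operators */
    struct node *left, *right;   /* both NULL for leaves; right NULL for N_INT2REAL */
};

/* Allocates a tree node; aborts if memory runs out. */
struct node *mk(enum op op, enum type type, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->op = op;
    n->type = type;
    n->left = l;
    n->right = r;
    return n;
}

/* Wraps an integer subtree in an int_2_real node; real subtrees pass through. */
static struct node *to_real(struct node *n) {
    return n->type == T_REAL ? n : mk(N_INT2REAL, T_REAL, n, NULL);
}

/* Computes the type of every operator node bottom-up and inserts a
   conversion wherever the two operand types differ. Returns the type of n. */
enum type check(struct node *n) {
    if (n->op == N_ADD || n->op == N_MUL) {
        enum type lt = check(n->left);
        enum type rt = check(n->right);
        if (lt != rt) {
            n->left = to_real(n->left);
            n->right = to_real(n->right);
        }
        n->type = (lt == T_REAL || rt == T_REAL) ? T_REAL : T_INT;
    }
    return n->type;
}
```

Building the subtree for initial + rate * 60 with real-typed identifiers and an integer constant, then calling check on its root, leaves the constant wrapped in an N_INT2REAL node, as in the tree on this slide.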
Error detection and reporting
• All phases can issue errors
• A compiler that stops at the first error is not helpful
• Most of the errors are handled in the syntax/semantic analysis phases
– Lexical analysis detects errors where a stream of characters does not form a valid token
– Syntax analysis detects errors where the stream of valid tokens violates the structure rules (syntax)
– Semantic analysis detects errors where the syntax is valid but the operation is not (e.g., adding an array to a real number)
Intermediate Code and Optimization (Ενδιάμεσος Κώδικας και Βελτιστοποίηση)
• Each phase produces intermediate code
temp1 := int_2_real(60)
temp2 := id(3) * temp1
temp3 := id(2) + temp2
id(1) := temp3
• Optimization
temp1 := id(3) * 60.0
id(1) := id(2) + temp1
three-address code: a simple assembly-like language, which consists of instructions, each of which has at most three operands
Operator: τελεστής
Operand: τελεστέος
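The step from the four-instruction form to the two-instruction form can be sketched as a single pass over an array of three-address instructions. This is an illustrative simplification, not how a real optimizer is organized: struct quad, optimize and the one-character opcodes ('c' for int_2_real, '=' for a plain copy) are made-up names, and the copy rule assumes each temporary is used exactly once. Temporaries are not renumbered, so the result uses temp2 where the slide shows temp1.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One three-address instruction: dst := a op b.
   op 'c' means dst := int_2_real(a); op '=' means dst := a. */
struct quad {
    char op;
    char dst[16], a[16], b[16];
};

static int is_int_literal(const char *s) {
    if (*s == '\0') return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}

/* Rewrites code[0..n) in place and returns the new instruction count. */
int optimize(struct quad *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        struct quad q = code[i];
        if (q.op == 'c' && is_int_literal(q.a)) {
            /* Convert the constant at compile time: "60" becomes "60.0". */
            char lit[16];
            int len = snprintf(lit, sizeof lit, "%s.0", q.a);
            if (len > 0 && len < (int)sizeof lit) {
                /* Every later use of the temporary becomes the literal. */
                for (int j = i + 1; j < n; j++) {
                    if (strcmp(code[j].a, q.dst) == 0) strcpy(code[j].a, lit);
                    if (strcmp(code[j].b, q.dst) == 0) strcpy(code[j].b, lit);
                }
                continue;        /* the conversion instruction disappears */
            }
        }
        if (q.op == '=' && out > 0 && strcmp(code[out - 1].dst, q.a) == 0) {
            /* x := t right after t := ...: store into x directly.
               Assumes t is not used anywhere else. */
            strcpy(code[out - 1].dst, q.dst);
            continue;
        }
        code[out++] = q;
    }
    return out;
}
```

Applied to the four instructions of the unoptimized form, the pass returns 2 and leaves temp2 := id3 * 60.0 followed by id1 := id2 + temp2.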
Code Generation (Παραγωγή Κώδικα)
• The last phase of the compiler is the generation of the target code
• Register allocation
– Each expression should use registers that are available
• Relocation information
– Variables are stored in relocatable addresses
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Generic Picture: Compiler and Friends
Figure: Source code → Compiler → Assembly code → Assembler → Machine code → Linker → Executable → OS Loader
Front and Back Ends
• Separation of common tasks
• Makes design and implementation easier
• K source languages for N machines
– K front ends, N back ends
– Instead of K*N compilers
Figure: the phase pipeline (Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate code generator → Code optimizer → Code generator → Target Program), divided into a front end (machine-independent) and a back end.
Passes
• A pass is one complete read of the source code (or of an intermediate file) by the compiler
• The number of passes depends on the source and target languages and on the running environment
• Different phases that cooperate can be grouped into a single pass (not always possible)
• When grouping is not possible
– Backpatching: leave a blank slot for information that is going to be filled in by a later phase/pass
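Backpatching is easiest to see with a forward jump: when the jump is emitted, its target address is not yet known, so a placeholder is written and filled in afterwards. The sketch below assumes a toy instruction set (OP_PUSH, OP_JUMP_IF_ZERO, OP_PRINT) invented for this example; emit, backpatch and gen_if_print are hypothetical names as well.

```c
#include <stdlib.h>

enum { OP_PUSH, OP_JUMP_IF_ZERO, OP_PRINT };

struct instr { int op; int arg; };

#define MAX_CODE 64
static struct instr code_buf[MAX_CODE];
static int code_len = 0;

/* Appends one instruction and returns its index. */
int emit(int op, int arg) {
    if (code_len == MAX_CODE) abort();   /* sketch: no graceful recovery */
    code_buf[code_len].op = op;
    code_buf[code_len].arg = arg;
    return code_len++;
}

/* Fills in the target of a jump that was emitted before the target was known. */
void backpatch(int jump_index, int target) {
    code_buf[jump_index].arg = target;
}

/* Generates code for "if (cond) print(value);" in a single pass. */
void gen_if_print(int cond, int value) {
    emit(OP_PUSH, cond);
    int hole = emit(OP_JUMP_IF_ZERO, -1);   /* target unknown: leave it blank */
    emit(OP_PUSH, value);
    emit(OP_PRINT, 0);
    backpatch(hole, code_len);              /* the end of the "then" part is known now */
}
```

After gen_if_print(1, 42) on an empty buffer, the jump at index 1 points to index 4, the first position after the generated code.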