Lecture 5: YACC and Grammar Design (cse450/Lectures/05-yacc.pdf)

Page 1

CSE450 Translation of Programming Languages
Lecture 5: YACC and Grammar Design

Page 2

Hold up the card for the regex that accepts the same language as this CFG:

S ➔ abS | a

Cards:
abS|a
ab*a
a*b*a*
(ab)*a
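A quick way to check the cards is brute force. Expand S ➔ abS | a until no S remains and collect every derived string up to a length bound. Then compare that set with what each candidate regex accepts over the alphabet {a, b}, using Python's re.fullmatch. This is only a sketch. The bound MAX_LEN = 7 is an arbitrary choice, and a finite check can rule a card out but cannot prove that two languages are equal.

```python
import itertools
import re

CARDS = ["abS|a", "ab*a", "a*b*a*", "(ab)*a"]
MAX_LEN = 7  # arbitrary bound for the brute-force check


def derive(max_len):
    """Expand S -> abS | a until no S remains; keep strings up to max_len."""
    results, forms = set(), ["S"]
    while forms:
        form = forms.pop()
        if "S" not in form:
            results.add(form)
            continue
        for rhs in ("abS", "a"):
            new_form = form.replace("S", rhs, 1)
            # Count only terminals when checking the length bound.
            if len(new_form.replace("S", "")) <= max_len:
                forms.append(new_form)
    return results


def all_strings(max_len):
    """Every string over {a, b} of length 0..max_len."""
    for length in range(max_len + 1):
        for chars in itertools.product("ab", repeat=length):
            yield "".join(chars)


def agrees(card, max_len=MAX_LEN):
    """True if the regex and the grammar accept the same strings up to max_len."""
    language = derive(max_len)
    pattern = re.compile(card)
    return all(
        (pattern.fullmatch(s) is not None) == (s in language)
        for s in all_strings(max_len)
    )


for card in CARDS:
    print(card, agrees(card))
```

Only (ab)*a agrees with the grammar on every string checked. ab*a also accepts "aa", a*b*a* accepts the empty string, and abS|a treats S as a literal character.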

Page 3

Which grammar describes the language a^x b^y, where x > y? Example: "aaabb"

S ➔ aL ; L ➔ aLb | ε
S ➔ aL ; L ➔ aLb | aL | ε
S ➔ aSb | L ; L ➔ aL | ε

None of the above
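The same brute-force idea works for the grammar question. Enumerate every terminal string each option derives up to a length bound, always expanding the leftmost non-terminal, and compare the result with the strings a^x b^y where x > y. In this sketch each grammar is a dict from a non-terminal to its alternatives, with "" standing for ε. Both that encoding and the bound MAX_LEN = 8 are choices made for illustration. The search stops because each right-hand side here contains at most one non-terminal and no production deletes terminals. That leaves only finitely many sentential forms within the bound.

```python
# Each option maps a non-terminal to its alternatives; "" stands for ε.
GRAMMARS = {
    "option 1": {"S": ["aL"], "L": ["aLb", ""]},
    "option 2": {"S": ["aL"], "L": ["aLb", "aL", ""]},
    "option 3": {"S": ["aSb", "L"], "L": ["aL", ""]},
}
MAX_LEN = 8  # arbitrary bound for the brute-force check


def language(grammar, start="S", max_len=MAX_LEN):
    """All terminal strings of length <= max_len, via leftmost derivations."""
    found, seen, forms = set(), set(), [start]
    while forms:
        form = forms.pop()
        if form in seen:
            continue
        seen.add(form)
        # No production deletes terminals, so prune forms that already have too many.
        if sum(ch not in grammar for ch in form) > max_len:
            continue
        # Position of the leftmost non-terminal, or None if the form is all terminals.
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            found.add(form)
            continue
        for rhs in grammar[form[idx]]:
            forms.append(form[:idx] + rhs + form[idx + 1:])
    return found


# The target language: a^x b^y with x > y, up to the same length bound.
target = {
    "a" * x + "b" * y
    for x in range(MAX_LEN + 1)
    for y in range(MAX_LEN + 1)
    if x > y and x + y <= MAX_LEN
}

for name, grammar in GRAMMARS.items():
    print(name, language(grammar) == target)
```

Only option 2 produces exactly the target set. Option 1 misses strings such as "aa" because it only derives x = y + 1. Option 3 also derives strings with x = y, such as "ab" and the empty string.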

Page 4

Yacc Overview

import sys
import ply.yacc as yacc
from lexer import tokens

<production rule functions>

parser = yacc.yacc(errorlog=yacc.NullLogger())

source = sys.stdin.read()
parser.parse(source)

Format of parser.py file

[Diagram: lexer.py (Lex) and parser.py (Yacc)]

Page 5

A Yacc Program (num_parser.py)

import ply.yacc as yacc
from num_lexer import tokens

def p_number(p):
    "num_entry : NUMBER"
    print("I found a number!")

def p_error(p):
    print("Liar! That isn't a number!")

parser = yacc.yacc(errorlog=yacc.NullLogger())
source = "67"
parser.parse(source)

Page 6

... and its Lex Partner (num_lexer.py)

import sys
import ply.lex as lex

tokens = ["NUMBER", "OTHER"]

def t_NUMBER(t):
    r"[0-9]+"
    return t

def t_OTHER(t):
    r"[^0-9]"
    return t

def t_error(t):
    sys.exit(1)

lexer = lex.lex()

Page 7

Compiling the Pair

python3 num_parser.py

Note: PLY will generate several additional files ("parsetab.py", "parser.out", "__pycache__/"). You shouldn't commit these files, because PLY regenerates them as needed.

Page 8

Structure of Yacc Parser

import sys
import ply.yacc as yacc
from lexer import tokens

Importing tokens from the lexer module (lexer.py or whatever you named it) is required. PLY uses the token list to identify the grammar's terminals, and parse() uses the lexer created in that module to tokenize the input.

parser = yacc.yacc(errorlog=yacc.NullLogger())

source = sys.stdin.read()
parser.parse(source)

These lines create the parser, read a string from stdin and parse the input.

Page 9

Production Rules

In parser.py, the context-free grammar rules are defined as functions.

The function name must start with p_, but otherwise can be whatever you want.

The docstring contains the rule specification.

The function body says what to do when the rule is used:

def p_number(p):
    "num_entry : NUMBER"
    print("I found a number!")

The arrow notation (➔) is replaced with a colon (:).

Page 10

More Production Rules

An ε (empty) production has the form "something :", with nothing after the colon.

The first production function/rule is the starting rule. PLY also lets you choose a different one by assigning start = "name" in parser.py.

You can specify multiple productions for the same non-terminal with either multiple functions (usually recommended) or a compound production:

def p_signed_int(p):
    """
    signed_int : INT
               | MINUS_SIGN INT
    """
    pass

Page 11

Literals

You can make your life easier by adding single-character literals to lexer.py like so:

literals = ['+', '-', '*', '/', '(', ...]

Then you can use the literals in production rules like so:

def p_parentheses(p):
    """expression : '(' expression ')'"""
    pass

Page 12

3-minute break

http://rossching.com/wp-content/uploads/2011/01/3mins.jpg

Page 13

Designing your Grammar

Question: Given a rule that reduces to an LHS called "statement" and identifies legal statements, how do we use it to verify a program with zero or more statements?

statement_list: statement statement_list

| /* Empty program! */
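To see why the right-recursive rule accepts zero or more statements, here is a hand-written recursive-descent recognizer that mirrors its two alternatives. PLY does not parse this way, since it builds an LALR table. The statement rule used here, NUMBER ';', is a hypothetical stand-in for whatever your real statement rule is.

```python
def parse_statement(tokens, pos):
    """Hypothetical stand-in rule: statement : NUMBER ';'
    Returns the position after the statement, or None if none starts here."""
    if pos + 1 < len(tokens) and tokens[pos].isdigit() and tokens[pos + 1] == ";":
        return pos + 2
    return None


def parse_statement_list(tokens, pos=0):
    """statement_list : statement statement_list
                      | (empty)"""
    after = parse_statement(tokens, pos)
    if after is None:
        return pos  # empty alternative: consume nothing
    return parse_statement_list(tokens, after)  # one statement, then the rest


def is_program(tokens):
    # Valid only if statement_list consumed every token.
    return parse_statement_list(tokens) == len(tokens)


print(is_program([]))                    # True: zero statements
print(is_program(["1", ";", "2", ";"]))  # True: two statements
print(is_program(["1", ";", "x"]))       # False: trailing junk
```

The empty alternative is what lets an empty token list count as a valid program. Each recursive call consumes exactly one statement before handing the rest back to statement_list.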

Page 14

Designing your Grammar (2)

Next step: Now that we need to design "statement", what things do we need to worry about?

statement : assignment ';'     // has an =
          | expression ';'     // just the right of =
          | var_declare ';'    // just the left of =
          | command ';'        // non-expression

Once we've broken down statement into sub-categories, we can just go and define them.

Page 15

Designing your Grammar (3)

Assignments: What do we need to worry about in an assignment? We need an LHS, an '=', and an RHS.

assignment: var_any '=' expression

For "var_any" we want either a normal variable, or one we declare on the same line.

var_any: var_usage | var_declare

Page 16

Designing your Grammar (4)

Expressions: What can be part of an expression?

Literal numbers
Math symbols (including negation and comparisons!)
Parentheses
Variables
Random command

You have a lot of examples of these and should be able to design it yourself!

Page 17

Designing your Grammar (5)

Commands: What should be in the commands portion of the definition of the statement rule?

For now, just the "print" command.

Eventually, "if", "while", or any other commands that do not return a value.

Why only these?

If a command returns a value, it should be included with expression! We don't want to have two definitions without setting a precedence.

Page 18

Getting More Information

Sometimes we will need to retrieve more information about a token than just its type:

for identifiers, we need to know their full name;
for numbers, we need to know their value;
for types, we need to know which type.

How can we get this extra information?

We can attach attributes to each terminal (token) or non-terminal used in the production rules.

Page 19

Attribute Values (aka the p argument)

Each terminal/non-terminal in a production rule has one.

They are denoted by p[n], where n is the symbol's position in the rule, counting the RHS from 1:
p[0] = LHS (assign to it to give the LHS its value)
p[1] = first symbol of the RHS
p[2] = second symbol, etc.

For example, in r"""A : B '+' C""", C's value is denoted by p[3].

Note: negative indices have a special meaning in PLY that we won't be exploiting.
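A small simulation makes the numbering concrete. Here p is a plain Python list standing in for the object PLY passes to a rule function. In PLY that object is a YaccProduction, which supports the same p[n] indexing. simulate_reduce is a hypothetical helper, not part of PLY. It fills p[1], p[2], ... with the right-hand-side values and then reads back whatever the rule stored in p[0].

```python
def p_A(p):
    "A : B '+' C"
    # p[1] is B's value, p[2] is the '+' literal, p[3] is C's value.
    p[0] = p[1] + p[3]  # the value assigned to p[0] becomes A's value


def simulate_reduce(rule, rhs_values):
    """Hypothetical helper: mimic how a parser calls a rule function."""
    p = [None] + list(rhs_values)  # slot 0 is reserved for the LHS
    rule(p)
    return p[0]


print(simulate_reduce(p_A, [4, "+", 38]))  # 42
```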

Page 20

Lexeme Attributes in Yacc (add_parser.py)

import ply.yacc as yacc
from add_lexer import tokens

def p_add_one(p):
    "expression : NUMBER"
    num = p[1]
    new_num = num + 1
    print("My better number is {}".format(new_num))

parser = yacc.yacc(errorlog=yacc.NullLogger())
source = "67"
parser.parse(source)

Page 21

Lexeme Attributes in Lex (add_lexer.py)

import sys
import ply.lex as lex

tokens = ["NUMBER", "OTHER"]

def t_NUMBER(t):
    r"[0-9]+"
    t.value = int(t.value)
    return t

def t_OTHER(t):
    r"[^0-9]"
    return t

def t_error(t):
    sys.exit(1)

lexer = lex.lex()

Page 22

Symbol Tables

A symbol table is a data structure that associates names with information about the user-defined data that are denoted by the names.

Programming languages have many kinds of symbols. In our case:

variables (project 2)
constants (project 5)
function names (project 6)

Other programs may have: position labels, user-defined types, enumerations, etc.

Page 23

Symbol Table Design

A symbol table should be well organized:

For fast lookup of symbols.

Note: a program of size n performs O(n) lookups, so if each symbol table lookup takes O(n) time, the total compilation time becomes O(n²).

You want to be able to reflect the organization of the program (block structure). This will be important in project 4!
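One common way to get both fast lookup and block structure is a stack of hash tables. Each open block gets its own dict, and lookups search from the innermost block outward. The sketch below is an illustration only, not the design project 4 requires. The class name ScopedSymbolTable and the methods enter_scope and exit_scope are hypothetical. search() and insert() are the two operations a symbol table needs.

```python
class ScopedSymbolTable:
    """Stack of dicts: one hash table per open block, innermost last."""

    def __init__(self):
        self.scopes = [{}]  # the global scope is always present

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        # Leaving a block deactivates every symbol declared in it.
        if len(self.scopes) > 1:
            self.scopes.pop()

    def insert(self, name, info):
        current = self.scopes[-1]
        if name in current:
            raise ValueError(f"{name} is already declared in this scope")
        current[name] = info

    def search(self, name):
        # Innermost declaration wins; each dict probe is O(1) on average,
        # so a lookup costs O(depth) probes rather than O(number of symbols).
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = ScopedSymbolTable()
table.insert("x", "int")
table.enter_scope()
table.insert("x", "char")   # shadows the outer x
print(table.search("x"))    # char
table.exit_scope()
print(table.search("x"))    # int
```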

Page 24

Symbol Table Entries

What information do we need to put in an entry for a variable in a symbol table?

Some obvious choices: Name and Type

Line number (used in reporting errors)
Scope (so we know when to deactivate it)
Initialized? (for compile-time error checking)
Memory position (for compiling to assembly)

Many more if we were interpreting the code.
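As a concrete sketch, the fields listed above could be bundled into a Python dataclass. The class name VariableEntry, the field names, and the choice to record scope as a nesting depth are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableEntry:
    name: str
    var_type: str
    line: int                              # used in reporting errors
    scope: int                             # e.g. nesting depth of the declaring block
    initialized: bool = False              # for compile-time error checking
    memory_position: Optional[int] = None  # assigned when compiling to assembly


entry = VariableEntry("total", "int", line=3, scope=0)
entry.initialized = True  # e.g. after seeing an assignment to total
print(entry)
```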

Page 25

Symbol Table Data Structures

Several data structures can be used for a symbol table.

Arrays (sorted / unsorted)
Linked lists (sorted / unsorted)
Binary tree
Hash table

Which are the best choices? We need to be able to search() and insert().
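To make the trade-off concrete, here are two of the listed options behind the same search()/insert() interface. The first is a sorted array, where bisect gives O(log n) search but insert must shift elements, costing O(n). The second is a hash table (a Python dict), where both operations are O(1) on average. Both classes are hypothetical sketches.

```python
import bisect


class SortedArrayTable:
    """Sorted parallel lists: O(log n) search, O(n) insert (elements shift)."""

    def __init__(self):
        self.names = []
        self.infos = []

    def _index(self, name):
        return bisect.bisect_left(self.names, name)

    def search(self, name):
        i = self._index(name)
        if i < len(self.names) and self.names[i] == name:
            return self.infos[i]
        return None

    def insert(self, name, info):
        i = self._index(name)
        if i < len(self.names) and self.names[i] == name:
            raise ValueError(f"{name} is already declared")
        self.names.insert(i, name)  # shifts later elements: O(n)
        self.infos.insert(i, info)


class HashTable:
    """A Python dict is a hash table: O(1) average search and insert."""

    def __init__(self):
        self.entries = {}

    def search(self, name):
        return self.entries.get(name)

    def insert(self, name, info):
        if name in self.entries:
            raise ValueError(f"{name} is already declared")
        self.entries[name] = info
```

The hash table has the best average-case search() and insert(). The sorted array keeps names in order but pays O(n) per insert.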

Page 26

Example Declaration and Symbol Tables

import sys

SYMBOL_TABLE = {}

def p_declaration(p):
    """
    declaration : TYPE ID
    """
    var_name = p[2]
    if var_name in SYMBOL_TABLE:
        sys.exit(1)  # redeclaring a variable is an error
    SYMBOL_TABLE[var_name] = p[1]  # map the name to its declared type

Page 27

Some Advanced Yacc: Interpreter (calc_parser.py)

import sys
import ply.yacc as yacc
from calc_lexer import tokens

# Lowest precedence first. Without this table PLY resolves the ambiguous
# expr rules by shifting, so 2 * 3 + 4 would evaluate as 2 * (3 + 4).
precedence = (
    ("left", "+", "-"),
    ("left", "*", "/"),
)

def p_start(p):
    "start : expr"
    print(p[1])

def p_add(p):
    "expr : expr '+' expr"
    p[0] = p[1] + p[3]

def p_sub(p):
    "expr : expr '-' expr"
    p[0] = p[1] - p[3]

def p_mult(p):
    "expr : expr '*' expr"
    p[0] = p[1] * p[3]

def p_div(p):
    "expr : expr '/' expr"
    if p[3] == 0:
        sys.exit(1)  # division by zero
    p[0] = p[1] / p[3]

def p_para(p):
    "expr : '(' expr ')'"
    p[0] = p[2]

def p_number(p):
    "expr : NUMBER"
    p[0] = p[1]

def p_error(p):
    sys.exit(1)

parser = yacc.yacc()
parser.parse("1 * 2 + 3 * 5 + 5")  # prints 22
parser.parse("4 / 8")              # prints 0.5 (Python 3 true division)
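The expr rules above are ambiguous on their own. PLY resolves any shift/reduce conflict not covered by a precedence declaration in favor of shifting. For this grammar, that makes every operator right-associative at the same precedence. The sketch below does not use PLY. It swaps in a small precedence-climbing evaluator, a different parsing technique chosen so it runs with only the standard library. It evaluates the same inputs under the all-shift policy and under the usual precedence rules. The table names ALL_SHIFT and USUAL are hypothetical.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

# operator -> (precedence level, associativity)
ALL_SHIFT = {op: (1, "right") for op in OPS}  # mimics "always shift"
USUAL = {"+": (1, "left"), "-": (1, "left"),
         "*": (2, "left"), "/": (2, "left")}


def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)


def parse_atom(tokens, pos, table):
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1, 1, table)
        return value, pos + 1  # skip the closing ')'
    return int(tokens[pos]), pos + 1


def parse_expr(tokens, pos, min_prec, table):
    lhs, pos = parse_atom(tokens, pos, table)
    while pos < len(tokens) and tokens[pos] in table:
        op = tokens[pos]
        prec, assoc = table[op]
        if prec < min_prec:
            break
        # A left-associative operator requires its right operand to bind tighter.
        next_min = prec + 1 if assoc == "left" else prec
        rhs, pos = parse_expr(tokens, pos + 1, next_min, table)
        lhs = OPS[op](lhs, rhs)
    return lhs, pos


def evaluate(src, table):
    value, _ = parse_expr(tokenize(src), 0, 1, table)
    return value


for src in ["1 * 2 + 3 * 5 + 5", "8 - 2 - 1", "4 / 8"]:
    print(src, "->", evaluate(src, ALL_SHIFT), "vs", evaluate(src, USUAL))
```

With ALL_SHIFT, 1 * 2 + 3 * 5 + 5 is read as 1 * (2 + (3 * (5 + 5))) and yields 32 instead of 22. Likewise, 8 - 2 - 1 yields 7 instead of 5.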

Page 28

Interpreter Lex File (calc_lexer.py)

import sys
import ply.lex as lex

tokens = ["NUMBER"]

def t_NUMBER(t):
    r"[0-9]+"
    t.value = int(t.value)
    return t

def t_WHITESPACE(t):
    r"[\n\t ]"
    pass

literals = ['+', '-', '*', '/', '(', ')']

def t_error(t):
    sys.exit(1)

lexer = lex.lex()