Regular Languages and Regular Grammars


Page 1: Regular  Languages and Regular Grammars

Regular Languages and Regular Grammars

Chapter 3

Page 2: Regular  Languages and Regular Grammars

Regular Languages

A Regular Expression describes a Regular Language.

A Finite State Machine accepts a Regular Language.

Page 3: Regular  Languages and Regular Grammars

Operators on Regular Expressions

In order of precedence:

() Parentheses
* Star Closure
. Concatenation
+ Union

Example:

Over Σ = {a, b, c}, (a + (b . c))* produces:

{λ, a, bc, aa, abc, bcbc, … }

Note: The concatenation symbol is often omitted.

Page 4: Regular  Languages and Regular Grammars

Regular Expressions

Let Σ be a given alphabet. Then

1. ∅, λ, and a ∈ Σ are all primitive regular expressions.

2. If r1 and r2 are regular expressions, so are r1 + r2, r1 . r2, r1*, and (r1).

3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).

Page 5: Regular  Languages and Regular Grammars

Languages Associated with Regular Expressions

If r is a regular expression, L(r) is the language associated with r.

Rules to simplify languages associated with r:

L(∅) = ∅
L(λ) = {λ}
L(a) = {a}

L(r1 + r2) = L(r1) ∪ L(r2)

L(r1 . r2) = L(r1) . L(r2)

L((r1)) = L(r1)

L(r1*) = (L(r1))*
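The rules above are recursive, so they translate directly into a recursive function. The following is a minimal sketch, assuming a hypothetical tuple encoding of regular expressions that is not part of the slides. Because L(r) can be infinite, it only collects strings up to a chosen length.

```python
# Hypothetical encoding chosen for this sketch (not from the slides):
#   ("empty",)          the empty language ∅
#   ("lambda",)         the empty string λ
#   ("sym", "a")        a single symbol
#   ("union", r1, r2)   r1 + r2
#   ("concat", r1, r2)  r1 . r2
#   ("star", r1)        r1*

def language(r, max_len):
    """Return every string of L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        # Keep appending strings of L(r1) until nothing new fits in max_len.
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

a, b = ("sym", "a"), ("sym", "b")
ends_with_b = ("concat", ("star", ("union", a, b)), b)   # (a + b)*b
print(sorted(language(ends_with_b, 2), key=lambda w: (len(w), w)))
# ['b', 'ab', 'bb']
```

Parentheses need no case of their own here, because the nesting of the tuples already records the grouping.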

Page 6: Regular  Languages and Regular Grammars

Analyzing a Regular Expression

L((a + b)*b) = L((a + b)*) L(b)

= (L(a + b))* L(b)

= (L(a) ∪ L(b))* L(b)

= ({a} ∪ {b})* {b}

= {a, b}* {b}.

The set of all strings of a’s and b’s that end with b.

Page 7: Regular  Languages and Regular Grammars

Analyzing a Regular Expression

L(a*b*) = L(a*) L(b*) = {a}* {b}*

A string of zero or more a’s followed by a string of zero or more b’s.

Page 8: Regular  Languages and Regular Grammars

Given a Language, Find a Regular Expression

L = {w ∈ {a, b}* : |w| is even}

((a + b)(a + b))*

or

(aa + ab + ba + bb)*

Page 9: Regular  Languages and Regular Grammars

Examples

L = {w ∈ {a, b}* : w contains an odd number of a’s}

b*(ab*ab*)*ab*

or

b*ab*(ab*ab*)*

Both expressions require that there be a single a somewhere. There can also be other a’s, but they must occur in pairs.
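A quick way to gain confidence in an answer like this is to compare it with the defining property on every short string. The sketch below uses Python's `re` module as a stand-in matcher (an assumption of this example, not something the slides use). A bounded check like this is evidence, not a proof.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Both expressions use only concatenation and star, so they carry over
# to Python's regex syntax unchanged.
first = re.compile(r"b*(ab*ab*)*ab*")
second = re.compile(r"b*ab*(ab*ab*)*")

def odd_number_of_as(w):
    return w.count("a") % 2 == 1

# Collect any string on which the two expressions and the property disagree.
mismatches = [
    w for w in all_strings("ab", 8)
    if not (bool(first.fullmatch(w)) == bool(second.fullmatch(w)) == odd_number_of_as(w))
]
print(mismatches)   # []
```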

Page 10: Regular  Languages and Regular Grammars

More Regular Expression Examples

Try these:

L = {w ∈ {a, b}* : there is no more than one b in w}

L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}

Page 11: Regular  Languages and Regular Grammars

More Regular Expression Examples

Try these:

L = {w ∈ {a, b}* : there is no more than one b in w}

a*(λ + b)a* or a* + a*ba*

L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}

(aa)*(bb)*b

Page 12: Regular  Languages and Regular Grammars

The Details Matter

a* + b* ≠ (a + b)*

(ab)* ≠ a*b*
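One witness string is enough to separate each pair. The sketch below uses Python's `re` module for the test (an assumption of this example); note that Python writes union as `|` where the slides write `+`.

```python
import re

def accepts(pattern, w):
    """True if the whole string w matches pattern."""
    return re.fullmatch(pattern, w) is not None

# "ab" mixes both symbols: allowed by (a + b)* but not by a* + b*.
print(accepts(r"a*|b*", "ab"), accepts(r"(a|b)*", "ab"))    # False True

# "aab" is a's followed by b's, but it is not a repetition of "ab".
print(accepts(r"(ab)*", "aab"), accepts(r"a*b*", "aab"))    # False True

# "abab" goes the other way: in (ab)* but not in a*b*.
print(accepts(r"(ab)*", "abab"), accepts(r"a*b*", "abab"))  # True False
```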

Page 13: Regular  Languages and Regular Grammars

Rex to NFA

Finite state machines and regular expressions define the same class of languages.

Theorem: Any language that can be defined with a regular expression can be accepted by some NFA and so is regular.

Proof by Construction: Must show that an NFA can be constructed using rules for: ∅, λ, any symbol in Σ, union, concatenation, and star closure.

Page 14: Regular  Languages and Regular Grammars

For Every Regular Expression There is a Corresponding FSM

We’ll show this by construction. An FSM for:

∅: [figure: an FSM that accepts no strings]

A single element of Σ: [figure: an FSM that accepts only that one-symbol string]

λ: [figure: an FSM that accepts only the empty string]

Page 20: Regular  Languages and Regular Grammars

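Union

FSA that recognizes s + t

[Figure: a new start state with λ-transitions into M1 (recognizes string s) and M2 (recognizes string t), and a λ-transition from each of them to a new accepting state.]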

Page 21: Regular  Languages and Regular Grammars

Concatenation

FSA that recognizes st

[Figure: M1 (recognizes string s) followed by M2 (recognizes string t), linked by λ-transitions.]

Page 22: Regular  Languages and Regular Grammars

Star Closure

FSA that recognizes s*

[Figure: M1 (recognizes string s) wrapped in λ-transitions that allow it to be skipped entirely or repeated any number of times.]
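The union, concatenation, and star constructions can be sketched in a few lines of code. This is a minimal illustration, assuming a hypothetical `NFA` class with one start state and one accepting state, and with `None` standing for a λ-transition. The exact placement of λ-transitions follows one common textbook variant and may differ from the lost figures.

```python
from itertools import count

_ids = count()   # source of fresh state numbers

class NFA:
    """An NFA fragment with one start state and one accepting state."""
    def __init__(self):
        self.start = next(_ids)
        self.accept = next(_ids)
        self.edges = []   # (source, label, target); label None is a λ-transition

def symbol(c):
    m = NFA()
    m.edges.append((m.start, c, m.accept))
    return m

def union(m1, m2):
    m = NFA()
    m.edges = m1.edges + m2.edges + [
        (m.start, None, m1.start), (m.start, None, m2.start),
        (m1.accept, None, m.accept), (m2.accept, None, m.accept)]
    return m

def concat(m1, m2):
    m = NFA()
    m.edges = m1.edges + m2.edges + [
        (m.start, None, m1.start), (m1.accept, None, m2.start),
        (m2.accept, None, m.accept)]
    return m

def star(m1):
    m = NFA()
    m.edges = m1.edges + [
        (m.start, None, m1.start), (m1.accept, None, m.accept),
        (m.start, None, m.accept),    # skip m1 entirely
        (m.accept, None, m.start)]    # go around again
    return m

def closure(m, states):
    """All states reachable from `states` using only λ-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in m.edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, w):
    current = closure(m, {m.start})
    for c in w:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == c}
        current = closure(m, moved)
    return m.accept in current

# (b + ab)*; each call to symbol() builds a fresh fragment.
m = star(union(symbol("b"), concat(symbol("a"), symbol("b"))))
print(accepts(m, ""), accepts(m, "babb"), accepts(m, "aab"))   # True True False
```

The machines built this way carry many redundant λ-transitions; they are easy to prove correct, not small.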

Page 23: Regular  Languages and Regular Grammars

An Example

(b + ab)*

An FSM for a: [figure omitted]

An FSM for b: [figure omitted]

An FSM for ab: [figure omitted; the FSMs for a and b joined by a λ-transition]

Page 24: Regular  Languages and Regular Grammars

An Example

(b + ab)*

An FSM for (b + ab): [figure omitted]

Page 25: Regular  Languages and Regular Grammars

An Example

An FSM for (b + ab)*: [figure omitted]

Page 26: Regular  Languages and Regular Grammars

An Example

A Simplified FSM for (b + ab)*: [figure omitted; its transitions are labeled a, b, and λ]

Page 27: Regular  Languages and Regular Grammars

For Every FSM There is a Corresponding Regular Expression

Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression.

Proof by Construction: Use generalized transition graphs (GTGs) to convert an FSM to a regular expression. A GTG is a transition graph whose edges are labeled with regular expressions.

Page 28: Regular  Languages and Regular Grammars

A Simple Example

Let M be: [figure omitted]

Suppose we rip out state 2: [figure omitted]

Page 29: Regular  Languages and Regular Grammars

The Algorithm fsmtoregexheuristic

fsmtoregexheuristic(M: FSM) =

1. Remove unreachable states from M.
2. If M has no accepting states then return ∅.
3. If the start state of M is part of a loop, create a new start state s and connect s to M’s start state via a λ-transition.
4. If there is more than one accepting state of M or there are any transitions out of any of them, create a new accepting state and connect each of M’s accepting states to it via a λ-transition. The old accepting states no longer accept.
5. If M has only one state then return λ.
6. Until only the start state and the accepting state remain do:
   6.1 Select rip (not s or an accepting state).
   6.2 Remove rip from M.
   6.3 Modify the transitions among the remaining states so M accepts the same strings.
7. Return the regular expression that labels the one remaining transition from the start state to the accepting state.
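The core of the algorithm is step 6.3: when a state rip is removed, every path p → rip → q is replaced by a direct edge labeled R(p, rip) R(rip, rip)* R(rip, q), unioned with any existing p → q label. The sketch below is one possible rendering under stated assumptions: the function name `fsm_to_regex`, the dictionary encoding of transitions, and the output in Python regex syntax (`|` for union) are all choices made for this example. For simplicity it always adds a new start and accepting state and skips the removal of unreachable states, which does not change the language.

```python
import re
from itertools import product

def union(x, y):
    """Union of two labels; None stands for a missing edge (∅)."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def concat(*parts):
    """Concatenation; a missing edge anywhere makes the whole path missing."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def star(x):
    """Star of a self-loop label; no loop (or a λ loop) contributes λ."""
    return "" if x in (None, "") else f"(?:{x})*"

def fsm_to_regex(states, start, accepting, transitions):
    """transitions maps (source, target) to a regex label, e.g. {(0, 1): "b"}."""
    label = {edge: f"(?:{r})" for edge, r in transitions.items()}
    s, f = object(), object()          # fresh start and accepting states
    label[(s, start)] = ""             # "" is a λ-transition
    for q in accepting:
        label[(q, f)] = ""
    alive = list(states) + [s, f]
    for rip in states:                 # rip out every original state
        alive.remove(rip)
        loop = star(label.get((rip, rip)))
        for p in alive:
            for q in alive:
                via = concat(label.get((p, rip)), loop, label.get((rip, q)))
                combined = union(label.get((p, q)), via)
                if combined is not None:
                    label[(p, q)] = combined
        label = {(p, q): r for (p, q), r in label.items() if rip not in (p, q)}
    return label.get((s, f))           # None means the language is ∅

# A hypothetical two-state DFA accepting strings over {a, b} that end in b.
pattern = fsm_to_regex([0, 1], 0, {1},
                       {(0, 0): "a", (0, 1): "b", (1, 1): "b", (1, 0): "a"})
print(pattern)
```

The resulting expression is correct but heavily parenthesized; the order in which states are ripped out affects its size, which is why the slides call the procedure a heuristic.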

Page 30: Regular  Languages and Regular Grammars

Example 1

1. Create a new initial state and a new, unique accepting state, neither of which is part of a loop.

Note: λ

Page 31: Regular  Languages and Regular Grammars

Example 1, Continued

2. Remove states and arcs and replace with arcs labeled with larger and larger regular expressions.

Page 32: Regular  Languages and Regular Grammars

Example 1, Continued

Remove state 3: [figure omitted]

Page 33: Regular  Languages and Regular Grammars

Example 1, Continued

Remove state 2: [figure omitted]

Page 34: Regular  Languages and Regular Grammars

Example 1, Continued

Remove state 1: [figure omitted]

Page 35: Regular  Languages and Regular Grammars

Example 2

a*(a + b)c*

Page 36: Regular  Languages and Regular Grammars

Example 3

a* + a*(a + b)c*

Page 37: Regular  Languages and Regular Grammars

Simplifying Regular Expressions

Regular expressions describe sets:
● Union is commutative: α + β = β + α.
● Union is associative: (α + β) + γ = α + (β + γ).
● ∅ is the identity for union: α + ∅ = ∅ + α = α.
● Union is idempotent: α + α = α.

Concatenation:
● Concatenation is associative: (αβ)γ = α(βγ).
● λ is the identity for concatenation: αλ = λα = α.
● ∅ is a zero for concatenation: α∅ = ∅α = ∅.

Concatenation distributes over union:
● (α + β)γ = (αγ) + (βγ).
● γ(α + β) = (γα) + (γβ).

Kleene star:
● ∅* = λ.
● λ* = λ.
● (α*)* = α*.
● α*α* = α*.
● (α + β)* = (α*β*)*.
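These identities can be spot-checked by instantiating the regular expressions with concrete symbols and comparing both sides on every short string. The helper below is a hypothetical sketch using Python's `re` module, with `|` in place of the slides' `+`. Agreement up to a length bound is supporting evidence for an identity, not a proof of it.

```python
import re
from itertools import product

def same_language(p1, p2, alphabet="ab", max_len=6):
    """True if p1 and p2 accept exactly the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(p1, w)) != bool(re.fullmatch(p2, w)):
                return False
    return True

print(same_language(r"(a|b)*", r"(a*b*)*"))   # star of a union        -> True
print(same_language(r"(a*)*", r"a*"))         # star of a star         -> True
print(same_language(r"a*a*", r"a*"))          # star followed by star  -> True
print(same_language(r"(a|b)a", r"aa|ba"))     # distributivity         -> True
print(same_language(r"a*|b*", r"(a|b)*"))     # not an identity        -> False
```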

Page 38: Regular  Languages and Regular Grammars

Applications of regular expressions: Pattern Matching

Many applications allow pattern matches:
● Unix
● Perl
● Excel
● Access
● …

Pattern matching programs use automata:

pattern → regex → NFA → DFA → transition table → driver

Page 39: Regular  Languages and Regular Grammars

A Biology Example – BLAST

Given a protein or DNA sequence, find others that are likely to be evolutionarily close to it.

ESGHDTTTYYNKNRYPAGWNNHHDQMFFWV

Build a DFSM that can examine thousands of other sequences and find those that match any of the selected patterns.

Page 40: Regular  Languages and Regular Grammars

Regular Expressions in Perl

Syntax      Name              Description
abc         Concatenation     Matches a, then b, then c, where a, b, and c are any regexes
a | b | c   Union (Or)        Matches a or b or c, where a, b, and c are any regexes
a*          Kleene star       Matches 0 or more a’s, where a is any regex
a+          At least one      Matches 1 or more a’s, where a is any regex
a?                            Matches 0 or 1 a’s, where a is any regex
a{n,m}      Replication       Matches at least n but no more than m a’s, where a is any regex
a*?, a+?    Parsimonious      Turns off greedy matching so the shortest match is selected
.           Wild card         Matches any character except newline
^           Left anchor       Anchors the match to the beginning of a line or string
$           Right anchor      Anchors the match to the end of a line or string
[a-z]                         Assuming a collating sequence, matches any single character in range
[^a-z]                        Assuming a collating sequence, matches any single character not in range
\d          Digit             Matches any single digit, i.e., a character in [0-9]
\D          Nondigit          Matches any single nondigit character, i.e., [^0-9]
\w          Alphanumeric      Matches any single “word” character, i.e., [a-zA-Z0-9_]
\W          Nonalphanumeric   Matches any character in [^a-zA-Z0-9_]
\s          White space       Matches any character in [space, tab, newline, etc.]

Page 41: Regular  Languages and Regular Grammars

Regular Expressions in Perl

Syntax      Name               Description
\S          Nonwhite space     Matches any character not matched by \s
\n          Newline            Matches newline
\r          Return             Matches return
\t          Tab                Matches tab
\f          Formfeed           Matches formfeed
\b          Backspace          Matches backspace inside []
\b          Word boundary      Matches a word boundary outside []
\B          Nonword boundary   Matches a non-word boundary
\0          Null               Matches a null character
\nnn        Octal              Matches an ASCII character with octal value nnn
\xnn        Hexadecimal        Matches an ASCII character with hexadecimal value nn
\cX         Control            Matches an ASCII control character
\char       Quote              Matches char; used to quote symbols such as . and \
(a)         Store              Matches a, where a is any regex, and stores the matched string in the next variable
\1          Variable           Matches whatever the first parenthesized expression matched
\2                             Matches whatever the second parenthesized expression matched
…                              For all remaining variables
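A few of these operators are easiest to understand by running them. The sketch below uses Python's `re` module rather than Perl (an assumption made for this example); for the operators shown, the two share the same syntax.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy star grabs as much as possible; the parsimonious form stops early.
print(re.search(r"<.*>", text).group())    # <b>bold</b> and <i>italic</i>
print(re.search(r"<.*?>", text).group())   # <b>

# Replication: between two and three digits at a time.
print(re.findall(r"\d{2,3}", "7 42 2024"))   # ['42', '202']

# Left anchor in multi-line mode matches at the start of every line.
print(re.findall(r"^\w+", "first line\nsecond line", re.MULTILINE))
# ['first', 'second']

# \w counts the underscore as a word character.
print(re.fullmatch(r"\w+", "snake_case") is not None)   # True
```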

Page 42: Regular  Languages and Regular Grammars

Using Regular Expressions in the Real World

Matching numbers:

-? ([0-9]+(\.[0-9]*)? | \.[0-9]+)

Matching ip addresses:

s!<emphasis>([0-9]{1,3}(\.[0-9]{1,3}){3})</emphasis>!<inet>$1</inet>!

Finding doubled words:

\< ([A-Za-z]+) \s+ \1 \>

From Friedl, J., Mastering Regular Expressions, O’Reilly, 1997.
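Two of these patterns can be tried out directly. The sketch below uses Python's `re` module, which is an assumption of this example. The spaces shown in the slide patterns are dropped because they are for readability only, and `\b` replaces the `\<` and `\>` word anchors, which Python does not support.

```python
import re

# Matching numbers: optional sign, then digits with an optional fraction,
# or a fraction with no leading digits.
number = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
for s in ["42", "-3.14", ".5", "7.", "-", "1.2.3"]:
    print(s, bool(number.fullmatch(s)))
# 42 True / -3.14 True / .5 True / 7. True / - False / 1.2.3 False

# Finding doubled words: a word, white space, then the same word again.
doubled = re.compile(r"\b([A-Za-z]+)\s+\1\b")
print(doubled.search("Paris in the the spring").group())   # the the
print(doubled.search("this is fine"))                      # None
```

Without the closing `\b`, the second test would wrongly report "this is" as a doubled "is", which is why the word anchors matter.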

Page 43: Regular  Languages and Regular Grammars

More Regular Expressions

Identifying spam:

\badv\(?ert\)?\b

Trawl for email addresses:

\b[A-Za-z0-9_%-]+@[A-Za-z0-9_%-]+(\.[A-Za-z]+){1,4}\b

Page 44: Regular  Languages and Regular Grammars

Using Substitution

Building a chatbot:

On input:

<phrase1> is <phrase2>

the chatbot will reply:

Why is <phrase1> <phrase2>?

Page 45: Regular  Languages and Regular Grammars

Chatbot Example

<user> The food there is awful
<chatbot> Why is the food there awful?

Assume that the input text is stored in the variable $text:

$text =~ s/^([A-Za-z ]+)\sis\s([A-Za-z ]+)\.?$/Why is \l$1 $2?/;
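The same substitution can be sketched in Python to show how the two captured phrases are reused. This is an illustrative version, not the slide's code: the function name `reply` is hypothetical, the character classes are assumed to include spaces so that multi-word phrases such as "The food there" can match, and the first letter of the first phrase is lowercased to reproduce the sample reply.

```python
import re

def reply(text):
    """Turn '<phrase1> is <phrase2>' into 'Why is <phrase1> <phrase2>?'."""
    pattern = r"^([A-Za-z ]+)\sis\s([A-Za-z ]+)\.?$"
    match = re.match(pattern, text)
    if match is None:
        return None   # the input does not have the expected shape
    phrase1, phrase2 = match.groups()
    # Lowercase the first letter, since phrase1 no longer starts the sentence.
    phrase1 = phrase1[0].lower() + phrase1[1:]
    return f"Why is {phrase1} {phrase2}?"

print(reply("The food there is awful"))   # Why is the food there awful?
print(reply("Hello"))                     # None
```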

Page 46: Regular  Languages and Regular Grammars

Regular Grammars

A regular grammar G is a quadruple (V, T, S, P) that is either consistently right-linear or consistently left-linear.

● V - Variables

● T – Terminals

● S - Start variable, S ∈ V

● P - Productions

Page 47: Regular  Languages and Regular Grammars

Right-Linear Grammar

All production rules are of the form:

A → xB or A → x

A, B ∈ V (A and B are variables)
x ∈ T* (x is a string over the alphabet T)

Example:

G = ({S}, {a, b}, S, P)

P: S → abS | a

Corresponding Regular Expression: (ab)*a

Page 48: Regular  Languages and Regular Grammars

Left-Linear Grammar

All production rules are of the form:

A → Bx or A → x

A, B ∈ V (A and B are variables)
x ∈ T* (x is a string over the alphabet T)

Example:

G = ({S, S1, S2}, {a, b}, S, P)

P: S → S1ab
   S1 → S1ab | S2
   S2 → a

Corresponding Regular Expression: aab(ab)*
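Both example grammars can be checked against their regular expressions by enumerating derivations. The sketch below is a hypothetical helper, assuming variables are single uppercase letters and terminals are lowercase, so S1 and S2 are renamed A and B. It relies on the grammar having no cycle of productions that add no terminals; otherwise the search would not terminate.

```python
import re
from collections import deque

def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    found, queue = set(), deque([start])
    while queue:
        form = queue.popleft()
        variable = next((c for c in form if c.isupper()), None)
        if variable is None:
            found.add(form)            # no variables left: a finished string
            continue
        for rhs in productions[variable]:
            new = form.replace(variable, rhs, 1)
            # Terminals are never removed, so over-long forms can be dropped.
            if sum(c.islower() for c in new) <= max_len:
                queue.append(new)
    return found

right_linear = {"S": ["abS", "a"]}
left_linear = {"S": ["Aab"], "A": ["Aab", "B"], "B": ["a"]}   # A = S1, B = S2

right_words = generate(right_linear, "S", 7)
left_words = generate(left_linear, "S", 7)
print(sorted(right_words, key=len))   # ['a', 'aba', 'ababa', 'abababa']
print(sorted(left_words, key=len))    # ['aab', 'aabab', 'aababab']

print(all(re.fullmatch(r"(ab)*a", w) for w in right_words))    # True
print(all(re.fullmatch(r"aab(ab)*", w) for w in left_words))   # True
```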

Page 49: Regular  Languages and Regular Grammars

Focus on Right-Linear Grammars

A language generated by a right-linear grammar is always regular. Proof by construction of FA on page 91 of text.

Example: Construct an FA that accepts the language generated by the grammar:

V0 → aV1
V1 → abV0 | b

Page 50: Regular  Languages and Regular Grammars

Focus on Right-Linear Grammars

V0 → aV1

V1 → b

V1 → abV0

Complete FA:
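Since the figure is not available here, the construction can be sketched in code instead. In this hypothetical rendering, each variable becomes a state, a production A → xB becomes a path from A to B spelling x (with fresh intermediate states when x has more than one symbol), and A → x becomes a path from A to a single final state. The function names and the tuple encoding of productions are assumptions of this example, and productions with an empty terminal string are not handled.

```python
def grammar_to_nfa(productions, start):
    """productions: list of (variable, terminal_string, next_variable_or_None)."""
    edges = set()            # (source, symbol, target)
    final = "FINAL"
    fresh = 0                # counter for intermediate state names
    for head, terminals, tail in productions:
        if not terminals:
            raise ValueError("this sketch needs at least one terminal per rule")
        target = final if tail is None else tail
        current = head
        for i, symbol in enumerate(terminals):
            if i == len(terminals) - 1:
                nxt = target
            else:
                nxt = f"_{fresh}"
                fresh += 1
            edges.add((current, symbol, nxt))
            current = nxt
    return edges, start, final

def accepts(nfa, w):
    edges, start, final = nfa
    current = {start}
    for c in w:
        current = {t for (s, sym, t) in edges if s in current and sym == c}
    return final in current

# V0 → aV1, V1 → abV0 | b
nfa = grammar_to_nfa([("V0", "a", "V1"), ("V1", "ab", "V0"), ("V1", "b", None)], "V0")
print(accepts(nfa, "ab"), accepts(nfa, "aabab"), accepts(nfa, "aab"))   # True True False
```

The accepted strings are those of the form (aab)*ab, which is what the grammar derives: every pass through V1 → abV0 contributes aab before the final ab.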

Page 51: Regular  Languages and Regular Grammars

Right-Linear Grammars

Every regular language can be generated by some right-linear grammar. Proof by reverse construction of an FA, page 93 of text.

Example: Find a right-linear grammar that generates the language accepted by the FA shown below.

Page 52: Regular  Languages and Regular Grammars

G = ({Q0, Q1, Q2}, {0, 1}, Q0, P)

P: Q0 → 1Q1 | Q2 | λ
   Q1 → 0Q0 | 0Q2
   Q2 → 1Q2

● Each state in the FA is represented by a variable in the grammar.
● Each transition symbol in the FA is a terminal in the grammar.
● Each transition in the FA is represented by a rule in the grammar.
● If a state qk is a final state, include the production qk → λ.
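The four rules above are mechanical enough to write down as a short function. Because the FA figure for this example is not available, the sketch below uses a hypothetical two-state DFA instead (binary strings with an even number of 1s); the function name and data layout are assumptions of this example.

```python
def fa_to_grammar(transitions, finals):
    """Build right-linear productions from an FA.

    transitions: list of (state, symbol, next_state)
    finals: list of accepting states
    Returns a dict mapping each variable to its right-hand sides.
    """
    productions = {}
    for state, symbol, nxt in transitions:
        # A transition q --symbol--> p becomes the rule q → symbol p.
        productions.setdefault(state, []).append(f"{symbol}{nxt}")
    for q in finals:
        # A final state may stop the derivation.
        productions.setdefault(q, []).append("λ")
    return productions

# Hypothetical DFA: E = even number of 1s seen so far (start, final), O = odd.
grammar = fa_to_grammar(
    [("E", "0", "E"), ("E", "1", "O"), ("O", "0", "O"), ("O", "1", "E")],
    ["E"],
)
for variable, alternatives in grammar.items():
    print(variable, "→", " | ".join(alternatives))
# E → 0E | 1O | λ
# O → 0O | 1E
```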