Chapter 12: Regular Expressions and Finite-State Automatamth314/W08/Slides/Chapter12.pdf · Chapter...

Chapter 12: Regular Expressions and Finite-State Automata

March 10, 2008

Outline

1 12.1 Formal Languages and Regular Expressions

2 12.2 Finite-State Automata

3 7.3 Pigeonhole Principle

4 5.4 Russell’s Paradox and the Halting Problem

5 7.5 Cardinality of Sets

Formal Languages

Definition An alphabet Σ is a finite set of characters (symbols).

Examples

1 $\Sigma_E = \{a, b, c, \ldots, X, Y, Z\}$ is the usual alphabet for the English language.

2 Computer languages use a slightly richer alphabet, which is called ASCII,

$\Sigma_{ASCII} = \Sigma_E \cup \{!, @, \$, \ldots, ?\}$

3 The real language of computers is based on the binary alphabet

$\Sigma_0 = \{0, 1\}$

Definition A string over an alphabet Σ is either
(a) the empty string ε; or
(b) any ordered n-tuple of elements from Σ, written without commas or parentheses.

Examples

1 If $\Sigma = \Sigma_0$, then ε, 0, 1, 00, 01, 10, 11, 011000101, . . . are all strings over Σ.

2 If $\Sigma = \Sigma_E$, then “in”, “fvedwyf”, “string” are all strings over Σ.

Definition The length of a string is the number of characters that appear in the string. By default, we define the length of the empty string, ε, to be zero.

Examples

1 Over the alphabet $\Sigma_0 = \{0, 1\}$, the string

011000101

has length 9.
2 Over $\Sigma_E$, the usual English alphabet, the string “string” has length 6.

Definition Suppose Σ is an alphabet and n a non-negative integer. Then
• $\Sigma^n$ = the set of all strings over Σ that have length n
• $\Sigma^+$ = the set of all strings over Σ whose length is at least 1
• $\Sigma^*$ = the set of all strings over Σ

It is not difficult to see that

$\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots \cup \Sigma^n \cup \cdots$

Example Suppose the alphabet is $\Sigma = \Sigma_0 = \{0, 1\}$. Then

1 $\Sigma^0 = \{\varepsilon\}$,
2 $\Sigma^1 = \{0, 1\}$,
3 $\Sigma^2 = \{00, 01, 10, 11\}$,
4 $\Sigma^3 = \{000, 001, 010, 100, 011, 101, 110, 111\}$,

etc.
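As an informal check (not part of the original slides), the sets Σ^n for the binary alphabet can be enumerated in Python. The function name strings_of_length is our own; the sketch assumes the alphabet is given as a set of one-character strings and represents ε by the empty string "".

```python
from itertools import product

def strings_of_length(alphabet, n):
    """Sigma^n: all strings of length n over `alphabet` (epsilon is the empty string "")."""
    return ["".join(symbols) for symbols in product(sorted(alphabet), repeat=n)]

binary = {"0", "1"}          # the binary alphabet Sigma_0
for n in range(4):
    print(n, strings_of_length(binary, n))
```

Since each of the n positions can hold either symbol, this alphabet gives |Σ^n| = 2^n; the loop prints Σ^0 through Σ^3 in lexicographic order.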

Definition Given an alphabet Σ, a formal language over Σ is any fixed subset L of Σ∗. The members of L are called the words of the language.

Examples

(a) Given the usual English alphabet $\Sigma_E$,

L = {English words}

(b) Given $\Sigma_0 = \{0, 1\}$,

L = {x ∈ Σ∗ | x ends in 11} = {11, 011, 111, 0011, 1011, . . .}

(c) Given Σ = {a, b}, a palindrome is a word which is equal to its reverse:

L = {x ∈ Σ∗ | x is a palindrome}

= {ε, a, b, aa, bb, aaa, aba, bab, bbb, abba, . . .}

Operations on Formal Languages

Definition If Σ is an alphabet, and x and y are two strings over Σ, then the concatenation of x and y is the string xy obtained by writing the characters of x followed by the characters of y.

Example For Σ = {0, 1}, consider

x = 010, y = 11011

Their concatenation is

xy = 01011011

Notice that, in general, the concatenation of strings is not commutative; i.e.

$xy \neq yx$

(for the strings above, yx = 11011010).

Definition For any languages L and L′ over some alphabet Σ, we can define new languages as follows:
• Concatenation of L and L′:

LL′ = {xy | x ∈ L and y ∈ L′}

• Union of L and L′:

L ∪ L′ = {x | x ∈ L or x ∈ L′}

• Kleene closure of L:

L∗ = {x | x is a concatenation of a finite number of words from L}

[ε is in L∗ since it is a concatenation of zero strings from L.]

Example Let

$L_1 = \{0, 01\}, \quad L_2 = \{1\}$

Then
• $L_1 L_2 = \{01, 011\}$
• $L_2 L_1 = \{10, 101\}$
• $L_1 \cup L_2 = \{0, 01, 1\}$
• $L_1^* = \{\varepsilon, 0, 01, 00, 001, 010, 0000, 0100, 01001, \ldots\}$
• $L_2^* = \{\varepsilon, 1, 11, 111, 1111, \ldots\}$
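The three operations can be tried out on finite languages represented as Python sets of strings. This is a sketch under our own assumptions: concat and kleene_star are hypothetical helper names, and because L∗ is usually infinite, kleene_star only returns the words of length at most max_len.

```python
def concat(L1, L2):
    """Concatenation LL' = {xy | x in L, y in L'} of two finite languages."""
    return {x + y for x in L1 for y in L2}

def kleene_star(L, max_len):
    """The words of L* of length at most max_len (L* itself is usually infinite)."""
    result = {""}                 # epsilon: the concatenation of zero words of L
    frontier = {""}
    while frontier:
        # append one more word of L to the newest words, keeping only short results
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

L1, L2 = {"0", "01"}, {"1"}
print(sorted(concat(L1, L2)))     # ['01', '011']
print(sorted(concat(L2, L1)))     # ['10', '101']
print(sorted(L1 | L2))            # ['0', '01', '1']
print(sorted(kleene_star(L2, 4), key=len))   # ['', '1', '11', '111', '1111']
```

The loop terminates because only finitely many strings have length at most max_len, and each pass adds only words not seen before.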

Regular Expressions

Definition (Regular Expression) Suppose Σ is a finite alphabet. Then the following are regular expressions over Σ:

I BASE: ∅, ε, and each individual symbol from Σ.
II RECURSION: If r and s are regular expressions over Σ, then so are:
(a) (rs)
(b) (r ∨ s) (also written as (r + s))
(c) (r∗)

III RESTRICTION: Nothing else is a regular expression over Σ, except for the objects from I and II.

Examples

(a) If Σ = {a, b},

((a ∨ b)∗(a ∨ ε))

is a regular expression over Σ.
(b) If Σ = {0, 1}, one example of a regular expression over Σ is

((0∗)(1∗))

• If the context is clear, we can omit unnecessary brackets; e.g. the two examples above can be written as:

(a ∨ b)∗(a ∨ ε), 0∗1∗

• The operation of highest priority is the Kleene closure ∗, followed by concatenation, while ∨ has the lowest priority.

• Regular expressions define certain languages, which are called regular languages.

• Given a regular expression r over some fixed finite alphabet Σ, what is the language L(r) of words from Σ∗ defined by r?

Definition For any finite alphabet Σ, we can associate a language L(r) to a regular expression r over Σ. L(r) is called the language defined by r.

I BASE: L(∅) = ∅, L(ε) = {ε}, and L(a) = {a}, for every a ∈ Σ.

II RECURSION: If L(r), $L(r_1)$ and $L(r_2)$ are the languages defined by regular expressions r, $r_1$ and $r_2$ over Σ, then
(a) $L(r_1 r_2) = L(r_1)L(r_2)$
(b) $L(r_1 \vee r_2) = L(r_1) \cup L(r_2)$
(c) $L(r^*) = (L(r))^*$
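The recursive definition of L(r) translates almost clause by clause into a recursive function. This is a sketch under our own assumptions: a regular expression is encoded as a nested Python tuple (the tags "empty", "eps", "sym", "cat", "or" and "star" are hypothetical names, not notation from the slides), and since L(r) is usually infinite, lang only returns the words of length at most max_len.

```python
def lang(r, max_len):
    """Words of L(r) of length <= max_len, following the BASE and RECURSION clauses."""
    kind = r[0]
    if kind == "empty":                      # L(empty set) = empty set
        return set()
    if kind == "eps":                        # L(epsilon) = {epsilon}
        return {""}
    if kind == "sym":                        # L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "cat":                        # L(r1 r2) = L(r1) L(r2)
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {x + y for x in left for y in right if len(x) + len(y) <= max_len}
    if kind == "or":                         # L(r1 v r2) = L(r1) u L(r2)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "star":                       # L(r*) = (L(r))*, truncated
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {w + x for w in frontier for x in base
                        if len(w) + len(x) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regular expression node: {kind}")

# 0*1* over {0, 1}
zeros_then_ones = ("cat", ("star", ("sym", "0")), ("star", ("sym", "1")))
print(sorted(lang(zeros_then_ones, 2), key=lambda w: (len(w), w)))
```

Truncating each subexpression at max_len is safe because the length of a concatenation is the sum of the lengths of its parts.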

Examples Find regular expressions for the following languages:
(a) L = {x ∈ {0, 1}∗ | x begins with a 1}

1(0 ∨ 1)∗

(b) All strings with at least one 1

(0 ∨ 1)∗1(0 ∨ 1)∗

(c) All strings of length two or three over the alphabet Σ = {x, y, z}

(x ∨ y ∨ z)(x ∨ y ∨ z)(ε ∨ x ∨ y ∨ z)

(d) All strings over {0, 1} which have no repeated 1’s, i.e. in which no two 1’s are adjacent.

(10 ∨ 0)∗(ε ∨ 1)

(e) All strings in which the number of 1’s is even.

(0 ∨ 10∗1)∗
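These answers can be spot-checked by brute force with Python's re module, writing ∨ as |, (0 ∨ 1) as the character class [01] and (ε ∨ 1) as 1?. The translation into Python syntax and the helper names are our own, and agreement on all strings up to a fixed length is evidence, not a proof. For (d) we read "no repeated 1's" as "no two adjacent 1's".

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Every string over `alphabet` of length 0, 1, ..., max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

# name -> (Python pattern, alphabet, intended property), for examples (a)-(e)
examples = {
    "a": (r"1[01]*", "01", lambda w: w.startswith("1")),
    "b": (r"[01]*1[01]*", "01", lambda w: "1" in w),
    "c": (r"[xyz][xyz][xyz]?", "xyz", lambda w: len(w) in (2, 3)),
    "d": (r"(?:10|0)*1?", "01", lambda w: "11" not in w),
    "e": (r"(?:0|10*1)*", "01", lambda w: w.count("1") % 2 == 0),
}

def agrees(pattern, alphabet, prop, max_len=8):
    """True if the pattern matches exactly the strings with the property (up to max_len)."""
    return all((re.fullmatch(pattern, w) is not None) == prop(w)
               for w in all_strings(alphabet, max_len))

for name, (pattern, alphabet, prop) in examples.items():
    print(name, agrees(pattern, alphabet, prop))
```

re.fullmatch is used rather than re.match so that the whole string, not just a prefix, must belong to the language.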

Examples Describe the languages that correspond to the following regular expressions:
(a) (0 ∨ 1)∗1

L = {x ∈ {0, 1}∗ | x ends in a 1}

(b) (a ∨ b)∗c(a ∨ b)∗c(a ∨ b)∗, where Σ = {a, b, c}
All strings with exactly two c’s.

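(c) ((a ∨ b)∗c(a ∨ b)∗c(a ∨ b)∗)∗, where Σ = {a, b, c}
The empty string ε together with all strings containing a positive even number of c’s. (A nonempty string with no c’s, such as ab, is not in this language, since every repetition of the starred block contributes exactly two c’s.)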
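A brute-force check with Python's re module (our own translation, with (a ∨ b) written as the class [ab]) confirms description (b) and shows why (c) needs care: zero repetitions give ε, but every repetition contributes exactly two c's, so a nonempty string with no c's is not matched.

```python
import re
from itertools import product

EXACTLY_TWO_CS = r"[ab]*c[ab]*c[ab]*"        # (a v b)*c(a v b)*c(a v b)*
REPEATED = r"(?:[ab]*c[ab]*c[ab]*)*"         # ((a v b)*c(a v b)*c(a v b)*)*

def in_language(pattern, w):
    return re.fullmatch(pattern, w) is not None

print(in_language(REPEATED, ""))        # True: zero repetitions give epsilon
print(in_language(REPEATED, "accbcc"))  # True: four c's
print(in_language(REPEATED, "ab"))      # False: no c's, but not empty

# Exhaustive comparison with the descriptions for all strings of length <= 7.
for n in range(8):
    for symbols in product("abc", repeat=n):
        w = "".join(symbols)
        k = w.count("c")
        assert in_language(EXACTLY_TWO_CS, w) == (k == 2)
        assert in_language(REPEATED, w) == (w == "" or (k > 0 and k % 2 == 0))
```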

Example Determine, in each case, whether the two regular expressions define the same language.

(a) (a ∨ ε)∗ and a∗.
Yes. Since aε = a = εa, the language defined by the first expression is the set of all strings that result from concatenating a with itself a finite number of times, which is the same language as the one defined by the second regular expression.

(b) 0∗ ∨ 1∗ and (01)∗.
No. For example, the string 00 is in the language defined by the first expression, but not in the second language.
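Comparing two regular expressions on every string up to some length can find a counterexample quickly, although agreement up to a bound does not prove equivalence. The sketch below (first_difference is our own helper; in Python's re syntax, (a ∨ ε) is written (?:a|)) reproduces both answers.

```python
import re
from itertools import product

def first_difference(p1, p2, alphabet, max_len=6):
    """Shortest string of length <= max_len on which the two patterns disagree, or None."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            w = "".join(symbols)
            if (re.fullmatch(p1, w) is None) != (re.fullmatch(p2, w) is None):
                return w
    return None

print(first_difference(r"(?:a|)*", r"a*", "a"))        # None: no difference found
print(first_difference(r"0*|1*", r"(?:01)*", "01"))    # 0
```

The search returns the single string 0 as the shortest counterexample for (b); the string 00 from the solution is another one.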

Finite-State Automata

• A finite-state automaton is an idealized (theoretical) version of a sequential computational circuit.

• Roughly speaking, a finite-state automaton is a machine whose memory can store a finite amount of information regarding its prior input; based on that information and the current input symbol, one can predict its output.

Definition A finite-state automaton A consists of five objects:

1 A finite set I, called the input alphabet (or input symbols)
2 A finite set S of states that the automaton can be in
3 A designated state s0 called the initial state
4 A designated set of states called the accepting (or terminal) states
5 A next-state (or transition) function

N : S × I → S

which, based on the current state s that the automaton is in and the current input symbol m, computes the next state s′ the automaton will be in:

N(s, m) = s′.

• The easiest way to visualize a finite-state automaton is to use the so-called transition diagram.

• In such a diagram, states are represented by circles and accepting states by double circles. There is an arrow that points to the initial state, and other arrows are labelled with input symbols and point from each state to other states in the following way:

if N(s, m) = s′, then there is an arrow labelled by m pointing from s to s′.

Example Consider the finite-state automaton A defined by the following transition diagram:

[Transition diagram: states s0, s1, s2, with arrows labelled 0 and 1 as given by the next-state table below]

(a) The states are S = {s0, s1, s2}.
(b) The input symbols are I = {0, 1}.
(c) The initial state is s0.
(d) The only accepting state is s2.
(e) The next-state function is given by the following table:

N  | 0  | 1
s0 | s1 | s0
s1 | s1 | s2
s2 | s1 | s0
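In code, the five objects of the definition can be stored directly. The dictionary below is one convenient encoding of the table in (e) (our choice, not the slides'), and the asserts check that N really is a function defined on all of S × I.

```python
# The example automaton, with the next-state function N : S x I -> S stored as a dict.
STATES = {"s0", "s1", "s2"}
INPUTS = {"0", "1"}
INITIAL = "s0"
ACCEPTING = {"s2"}
N = {
    ("s0", "0"): "s1", ("s0", "1"): "s0",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s0",
}

# N is a function on all of S x I: exactly one entry per (state, symbol), values in S.
assert set(N) == {(s, m) for s in STATES for m in INPUTS}
assert set(N.values()) <= STATES
print(N[("s1", "1")])    # s2
```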

Example Consider the finite-state automaton A defined by the following transition table:

N | a | b | c
U | Z | Y | Y
V | V | V | V
Y | Z | V | Y
Z | Z | Z | Z

and suppose the initial state is U, while the accepting states are V and Z. Draw the transition diagram for this automaton.

[Transition diagram: states U, V, Y, Z, with one arrow labelled a, b or c for each entry of the table above]

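• We can simplify this diagram by condensing all arrows pointing from one state to some other fixed state into a single arrow with several labels:

[Simplified transition diagram: U → Z labelled a; U → Y labelled b,c; Y → Z labelled a; Y → V labelled b; loop at Y labelled c; loops at V and at Z labelled a,b,c]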

The Language Accepted by an Automaton

• Suppose a string of input symbols

s1s2s3 . . . sn . . .

is being fed into a finite-state automaton.
• After reading each input symbol si, the automaton moves to its next state (possibly the same one); after the last symbol, it ends up in either an accepting or a non-accepting state.

• In this way, the automaton separates the set of all input strings into two subsets: those that force the automaton into an accepting state and those that do not.

• Those strings that send the automaton into an accepting state are said to be accepted by the automaton.

Definition Let A be a finite-state automaton with the set of input symbols I. Let I∗ be the set of all strings over the alphabet I. Suppose w is a string in I∗. Then w is accepted by A if, and only if, A goes into an accepting state when the symbols of w are input into A in sequence from left to right, with A starting from its initial state. The language accepted by A, L(A), is the set of all strings accepted by A.

Example Consider the finite-state automaton from our original example:

[Transition diagram: states s0, s1, s2 as in the first example; s0 is initial and s2 is accepting]

(a) To what states does A go if the symbols of the following strings are input to A in sequence, starting from s0?

(i) 01, (ii) 0011, (iii) 0101100, (iv) 10101

(i) s2, (ii) s0, (iii) s1, (iv) s2

(b) Which of the strings in part (a) send A to an accepting state?

01, 10101

(c) What is the language accepted by A?
L(A) = all strings ending in 01.

(d) Is there a regular expression that defines the same language?

(0 ∨ 1)∗01
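Parts (a) to (c) can be checked by simulating the automaton. The sketch below (run is our own helper name) feeds each string to the table from the example, and then compares acceptance with "ends in 01" on every binary string of length at most 10, which supports, but does not prove, the answer to (c).

```python
from itertools import product

N = {
    ("s0", "0"): "s1", ("s0", "1"): "s0",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s0",
}
ACCEPTING = {"s2"}

def run(state, word):
    """Feed the symbols of `word` to the automaton from left to right."""
    for symbol in word:
        state = N[(state, symbol)]
    return state

for w in ["01", "0011", "0101100", "10101"]:
    final = run("s0", w)
    print(w, final, "accepted" if final in ACCEPTING else "rejected")

# Part (c): accepted exactly when the string ends in 01 (checked up to length 10).
assert all((run("s0", w) in ACCEPTING) == w.endswith("01")
           for w in ("".join(t) for n in range(11) for t in product("01", repeat=n)))
```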

Eventual-State Function

• Suppose we input some string

w = s1s2 . . . sn

into a finite-state automaton, and not just a single symbol.
• What will be the state that the automaton will eventually enter?
• To answer that question, we need to introduce the eventual-state function.

Definition Let A be a finite-state automaton with the set of states S, input symbols I, and next-state function

N : S × I → S

Let I∗ be the set of all strings over I, and define the eventual-state function

N∗ : S × I∗ → S

so that, if w is a string from I∗,

N∗(s, w) = the state to which A goes if the symbols from w are input into A in sequence, starting with A being in the state s.

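Example Consider, again, the automaton

[Transition diagram: states s0, s1, s2 as in the first example; s0 is initial and s2 is accepting]

What is N∗(s1, 01100)?
Solution:

$s_1 \xrightarrow{0} s_1 \xrightarrow{1} s_2 \xrightarrow{1} s_0 \xrightarrow{0} s_1 \xrightarrow{0} s_1$

so:
N∗(s1, 01100) = s1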

• Suppose A is a finite-state automaton with input symbols I and next-state function N, let I∗ be the set of all strings over I, and let w be a string in I∗. Then

w is accepted by A ⇔ N∗(s0, w) is an accepting state of A

• The language accepted by A is

L(A) = {w ∈ I∗ | N∗(s0, w) is an accepting state of A}
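One way to make N∗ precise is by recursion on the length of w: N∗(s, ε) = s and N∗(s, wm) = N(N∗(s, w), m) for a symbol m. This formulation is ours, but it agrees with the informal description above; in Python it is a left fold over the input, which functools.reduce provides. The table is the one from the running example.

```python
from functools import reduce

N = {
    ("s0", "0"): "s1", ("s0", "1"): "s0",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s0",
}

def eventual_state(s, w):
    """N*(s, w), computed by folding N over the symbols of w; N*(s, "") = s."""
    return reduce(lambda state, m: N[(state, m)], w, s)

def accepts(w, initial="s0", accepting=frozenset({"s2"})):
    """w is accepted iff N*(s0, w) is an accepting state."""
    return eventual_state(initial, w) in accepting

print(eventual_state("s1", "01100"))        # s1
print(accepts("10101"), accepts("0011"))    # True False
```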

Constructing a Finite-State Automaton

Example

(a) Construct a finite-state automaton A which accepts the set of all strings over {0, 1} such that the number of 1’s in the string is divisible by 3.

(b) Find a regular expression that defines this set of strings.

Solution: Suppose the initial state is s0. We want A to keep track of how many 1’s have been input up to that point, counted modulo 3, so we need at least two more states: s1, s2. A will be in s1 if the number of 1’s scanned so far leaves remainder 1 when divided by 3, and in s2 if it leaves remainder 2. Since the empty string contains a number of 1’s (zero) that is divisible by 3, we want to make s0 an accepting state.

So, we want A to behave as follows:

$s_0 \xrightarrow{1} s_1 \xrightarrow{1} s_2 \xrightarrow{1} s_0$

If, at some point, A encounters a 0 in the string, this has no effect on the number of 1’s, so A doesn’t need to change its current state; i.e.

$s_0 \xrightarrow{0} s_0, \quad s_1 \xrightarrow{0} s_1, \quad s_2 \xrightarrow{0} s_2$

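(a) [Transition diagram: states s0 (initial and accepting), s1, s2; arrows labelled 1 from s0 to s1, from s1 to s2 and from s2 to s0; a loop labelled 0 at each state]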

(b) The regular expression for this language is:

0∗ ∨ (0∗10∗10∗10∗)∗
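The construction can be checked mechanically. In this sketch (our own encoding, with state s_i represented by the remainder i), the automaton, the regular expression from (b) in Python's re syntax, and the defining property are compared on every binary string of length at most 10.

```python
import re
from itertools import product

def next_state(i, symbol):
    """N(s_i, 0) = s_i and N(s_i, 1) = s_((i + 1) mod 3); state s_i is stored as i."""
    return (i + 1) % 3 if symbol == "1" else i

def accepts(w):
    state = 0                     # s0 is the initial state and the only accepting state
    for symbol in w:
        state = next_state(state, symbol)
    return state == 0

PATTERN = r"0*|(?:0*10*10*10*)*"  # 0* v (0*10*10*10*)* in Python syntax

for n in range(11):
    for symbols in product("01", repeat=n):
        w = "".join(symbols)
        by_regex = re.fullmatch(PATTERN, w) is not None
        assert accepts(w) == (w.count("1") % 3 == 0) == by_regex
print("automaton, regular expression and definition agree up to length 10")
```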

Example

(a) Construct a finite-state automaton A to accept the set of allstrings over {0, 1} which contain exactly one symbol 1.

(b) Find a regular expression for this language.

Solution: We start with two states:
s0: the initial state of A;
s1: the state which A enters when the input string contains exactly one 1 (so s1 is an accepting state).

We can start in this way:

$s_0 \xrightarrow{0} s_0, \quad s_0 \xrightarrow{1} s_1$

If A is in state s1 and reads another 0, it can remain in that state, since the number of 0’s is irrelevant.
If A is in state s1 and reads another 1, we should not put

$s_1 \xrightarrow{1} s_0$

because a string with two or more 1’s must never be accepted, yet from s0 a later 1 would take A back to the accepting state s1.
Instead, if we have at least two 1’s, we want A to enter a “dead-end” state s2 from which it cannot get to either s0 or s1.

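[Transition diagram: s0 → s1 labelled 1, s1 → s2 labelled 1; loops labelled 0 at s0 and at s1; a loop labelled 0,1 at s2; s1 is the accepting state]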

(b) The corresponding regular expression is:

0∗10∗
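The same kind of check works here. The sketch below encodes the three-state automaton from the solution (the dict encoding is our choice) and compares it with the regular expression 0∗10∗, written 0*10* in Python's re syntax, on all binary strings of length at most 10.

```python
import re
from itertools import product

# Next-state table; s2 is the dead-end state that A can never leave.
N = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s2", ("s2", "1"): "s2",
}

def accepts(w):
    state = "s0"
    for symbol in w:
        state = N[(state, symbol)]
    return state == "s1"          # s1 is the only accepting state

for n in range(11):
    for symbols in product("01", repeat=n):
        w = "".join(symbols)
        assert accepts(w) == (w.count("1") == 1) == (re.fullmatch(r"0*10*", w) is not None)
print(accepts("0010"), accepts("0110"))   # True False
```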

Finite-State Automata and Regular Languages

• We have seen, in the previous two examples, that, given a finite automaton, it may be possible to find a regular expression that defines the language accepted by the automaton.

• This is true for any finite-state automaton.

Theorem (Kleene’s Theorem, Part 1) Given any language that is accepted by a finite-state automaton, there is a regular expression that defines the same language.

Theorem (Kleene’s Theorem, Part 2) Given any language defined by a regular expression, there is a finite-state automaton that accepts the same language.

• Therefore, the class of all languages defined by regular expressions is identical with the class of all languages accepted by finite-state automata.

• Such languages are called regular languages.

• Question: are there languages over finite alphabets thatare not regular?

• Answer: Yes. In fact, regular languages are rather special, and most of the languages we work with require more complicated recognizers (machines), e.g. pushdown automata, Turing machines, random access machines, etc.

• Next, we will give an example of a relatively simple language over {a, b} which is not regular. To do that, we will need a theorem from 7.3.

Theorem (Pigeonhole Principle) If we want to put n pigeons in m holes, and n > m, then there must be at least one hole with more than 1 pigeon in it.

• This is an informal version of a combinatorial principle which asserts that, if we want to distribute n objects in m categories, with n > m, then at least one category will contain more than one object.

• We will discuss this principle (theorem) in more detail in 7.3.

Example Let L be the language consisting of all strings

$a^k b^k, \quad k \geq 1$

over Σ = {a, b} (here $a^k$ denotes the string consisting of k copies of a); i.e.

$L = \{s \in \Sigma^* \mid s = a^k b^k, \ k \geq 1\}$

Show that L is not a regular language.
Solution: We will prove this by contradiction. Suppose that there exists a finite-state automaton A that accepts the language L. A has a finite number of states, say

s1, s2, . . . , sn

Look at the infinite sequence of strings

$a, a^2, a^3, \ldots, a^k, \ldots$

There are infinitely many such strings but only n states, so by the Pigeonhole Principle there must be a state, say sm, and two input strings $a^p$, $a^q$ with p < q that both cause A to end up in that same state sm, starting from s1.
We assumed that A accepts L, i.e. that the string

$a^p b^p$

is accepted by A.
So, the string $b^p$ will cause A to go from state sm to some accepting state, say sa.
But, in that case, the string

$a^q b^p$

will also be accepted by A. [$a^q$ first leads A from the initial state s1 to sm and, then, $b^p$ causes the transition to sa.]

However,

$a^q b^p$

should not be accepted by A, since the number of a’s (q) and the number of b’s (p) are not the same. Contradiction.
Therefore, L is not a regular language; i.e. there is no finite-state automaton that accepts it.

[Figure: starting from s1, p a’s are input and lead A to sm; q − p additional a’s are input and lead A from sm back to sm; p b’s are input and lead A from sm to an accepting state]
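The pigeonhole step is constructive: for any automaton with n states, among the n + 1 strings a, a^2, ..., a^(n+1) two must lead to the same state. The sketch below uses hypothetical helper names and a made-up 3-state automaton (not from the slides) to find such p < q; since a^p and a^q end in the same state, the automaton treats a^p b^p and a^q b^p alike, so it cannot accept exactly L.

```python
def find_repeat(N, initial, symbol, num_states):
    """Return (p, q) with 1 <= p < q such that a^p and a^q lead from `initial`
    to the same state; among a, a^2, ..., a^(n+1) two must, by the Pigeonhole Principle."""
    first_seen = {}                       # state -> smallest k with N*(initial, a^k) = state
    state = initial
    for k in range(1, num_states + 2):
        state = N[(state, symbol)]
        if state in first_seen:
            return first_seen[state], k
        first_seen[state] = k
    raise AssertionError("unreachable: the Pigeonhole Principle guarantees a repeat")

def run(N, state, word):
    for s in word:
        state = N[(state, s)]
    return state

# A hypothetical 3-state automaton over {a, b}, used only to illustrate the argument.
N = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q1", ("q2", "b"): "q2",
}
p, q = find_repeat(N, "q0", "a", num_states=3)
print(p, q)   # 1 3
# Whatever state a^p b^p reaches, a^q b^p reaches the same one.
assert run(N, "q0", "a" * p + "b" * p) == run(N, "q0", "a" * q + "b" * p)
```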

Pigeonhole Principle

• The Pigeonhole Principle states that if n pigeons fly into m pigeonholes and n > m, then at least one hole must contain two or more pigeons.

• This combinatorial principle is sometimes also called the Dirichlet Principle, since it was first formulated by J.P.G.L. Dirichlet (1805-1859).

Pigeonhole Principle: A function from one finite set to a smaller finite set cannot be one-to-one: there must be at least two elements in the domain that have the same image in the codomain.

• Suppose |X| denotes the number of elements of set X.
• The Pigeonhole Principle can then be written as: given a function

f : X → Y, |X| > |Y|

we have

$\exists x_1, x_2 \, (x_1 \neq x_2 \wedge f(x_1) = f(x_2))$.
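When |X| > |Y|, such x1, x2 can actually be found by recording the first preimage of each image: the loop must succeed before X is exhausted, because at most |Y| distinct images exist. The helper name find_collision and the sample data are invented for illustration.

```python
def find_collision(f, X):
    """Return distinct x1, x2 in X with f(x1) == f(x2), or None if f is one-to-one on X."""
    first_with_image = {}             # image y -> first element of X mapped to y
    for x in X:
        y = f(x)
        if y in first_with_image:
            return first_with_image[y], x
        first_with_image[y] = x
    return None

# 13 people, birth months numbered 0..11 (hypothetical data: person i born in month i mod 12).
print(find_collision(lambda person: person % 12, range(13)))   # (0, 12)
```

When |X| ≤ |Y| the function may or may not return None; the principle only guarantees a collision when |X| > |Y|.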

Example In a group of 13 people, must there be two people born in the same month? What about a group of 10 people?
Solution: Let

X = {People in the group}, Y = {Months of the year}

and define the function

f : X → Y , f (x) = the month of x ’s birth

Since |X| = 13 > 12 = |Y|

by the Pigeonhole Principle, there are distinct x1, x2 such that

f (x1) = f (x2)

i.e. x1 and x2 have the same month of birth.

In a group of 10 people, |X| = 10, |Y| = 12, so

$|X| \not> |Y|$

and the Pigeonhole Principle does not apply.
Therefore, there is no guarantee that two people would be born in the same month of the year.

Example Are there at least two people in the Metro Toronto area with the same number of hairs on their head?
Solution: Let T be the set of all people in the Metro Toronto area. Then,

$|T| \approx 3 \times 10^6.$

Let

$H_x$ = set of hairs on the head of $x \in T$.

On the other hand, the maximum number of hairs on a human head is less than 300,000, so, for every x ∈ T,

$|H_x| < 300{,}000.$

Then,

$\max_{x \in T} |H_x| < 300{,}000.$

Let

$f : T \to \{0, 1, \ldots, 300000\}$

be defined as

$f(x) = |H_x|$ = the number of hairs on x’s head.

Since

$|T| \approx 3 \times 10^6 > 300{,}001 = |\{0, 1, \ldots, 300000\}|,$

there must be two members of T which have the same number of hairs.

Example A drawer contains 10 red and 10 blue socks. If we pull socks at random, how many must we pick in order to guarantee 2 of the same color?
Solution: Let

X = {Socks pulled}, Y = {Red, Blue}.

Let f : X → Y, f(x) = colour of x.

We want the size of X to be such that it guarantees (by the Pigeonhole Principle) that there will be at least two distinct elements of X for which the image is the same. To accomplish that, we need

|X| > |Y| = 2

so we need to pick at least 3 socks.

Generalized Pigeonhole Principle

Generalized Pigeonhole Principle: For any function f from a finite set X to a finite set Y and any positive integer k, if

|X| > k · |Y|,

there is some y ∈ Y that is the image of at least k + 1 distinct elements of X.

Examples

(a) In a group of 85 people, what is the minimum number who must have the same first initial?

Number of initials = 26, and 85 = 26 · 3 + 7 > 3 · 26, so (taking k = 3) at least 4 people have the same first initial.
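The guaranteed number in questions like (a) is the ceiling of |X| / |Y|: with k = ⌈|X|/|Y|⌉ − 1 we have |X| > k · |Y|, so the Generalized Pigeonhole Principle gives a category with at least k + 1 objects. A small sketch (the function name is our own) makes the arithmetic explicit:

```python
def guaranteed_share(num_objects, num_categories):
    """Number of objects some category is guaranteed to receive: ceil(n / m).

    With k = ceil(n / m) - 1 we have n > k * m, so the Generalized Pigeonhole
    Principle gives a category with at least k + 1 objects."""
    return -(-num_objects // num_categories)     # integer ceiling division

print(guaranteed_share(85, 26))   # 4: first initials
print(guaranteed_share(13, 12))   # 2: birth months
print(guaranteed_share(3, 2))     # 2: socks of two colours
```

The bound is sharp when the objects are spread as evenly as possible. It also shows how sensitive such estimates are: guaranteed_share(3_000_001, 300_000) is 11, while exactly 3 000 000 objects in 300 000 categories only guarantee 10.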

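(b) What is the minimum number of people with the same number of hairs on their head in the Metro Toronto area?
Since

$|T| \approx 3 \times 10^6$

and

$\forall x \in T, \ |H_x| < 300{,}000,$

the function $x \mapsto |H_x|$ takes at most 300,000 different values. If $|T| > 10 \cdot 300{,}000 = 3 \times 10^6$ (the estimate for |T| is only approximate, so this is borderline), the Generalized Pigeonhole Principle with k = 10 shows that there are at least 11 people with the same number of hairs on their head.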

Russell’s Paradox

• At the beginning of the twentieth century, Alfred Whitehead (1861-1947) and Bertrand Russell (1872-1970) attempted to write a book, Principia Mathematica, which would develop all of mathematics, starting from some basic principles of set theory. The scope of this work was incredible: the proof that 1 + 1 = 2 does not appear until page 362.

• The work itself was flawed; there were theorems that they were not able to address using only the basic principles they had assumed.

• The basic inconsistency is what is now known as Russell’s Paradox.

• We have seen before that a set can be an element of another set; e.g.

{{a, b}, c}

• Most often, a set is not an element of itself. However, there is no principle of set theory which would prohibit that.

• Consider the universal set U and suppose that every object is an element of U.

• Then, U must be an element of itself

U ∈ U

• Let S be the set of all sets that are not elements of themselves:

$S = \{A \mid A \in U \text{ and } A \notin A\}$

• Now, S is either an element of itself or not:

$S \in S$ or $S \notin S$

• If $S \in S$, then, by definition of S,

$S \notin S$

which is a contradiction.
• On the other hand, if $S \notin S$, then

$S \in S$

(by definition of S). Contradiction.
• We have obtained a paradox.
• The only way out is to conclude that $S \notin U$; i.e. S is not a set in the universe.

The Barber Puzzle

In a certain town there is a male barber who shaves all those men, and only those men, who do not shave themselves.
Question: Does the barber shave himself?

Answer: Neither yes nor no.
If the barber shaves himself, then, by assumption, he does not shave himself, which is a contradiction.
If he doesn’t shave himself, he is shaved by the barber (himself). Again, we reached a contradiction.

Conclusion: Such a situation cannot exist in the real world.

Figure: R. Magritte - Ceci n’est pas une pipe

The Halting Problem

• We wish to design an algorithm H which will take as input an algorithm A and a potential input w to that algorithm, and decide whether A eventually halts on the input w.

• The question of whether such an algorithm H exists or not is known as the Halting Problem.

• The existence of such an algorithm would be very useful for checking whether our programs ever enter an infinite loop.

• An algorithm is, ultimately, a sequence of characters, presumably in some appropriate programming language.

• This sequence can be used as an input to another algorithm.

• Indeed, by asking the halting problem we implicitly assumed that the algorithm A could be an input for H.

• This is similar to the situation where a set can be an element of another set.

• In particular, if A expects an algorithm for input, w could be the encoding of another algorithm.

Theorem There is no algorithm which will take as input an algorithm A and a potential input w to that algorithm, and decide whether A eventually halts on the input w.
Proof. We prove this by contradiction. Suppose that such an algorithm H(A, w) exists. Then H(A, w) will output:
• Halts, if A halts on input w
• Loops, if A does not halt on input w

Now we design a new algorithm D, which uses H as a ‘subroutine’. The input to D is the encoding of an algorithm A. On input A, D runs H on ⟨A, A⟩, i.e. H determines the outcome of running the algorithm A on an encoding of itself. D then reverses the output from H and acts as follows:
• D(A) loops forever, if A halts on input A.
• D(A) halts, if A loops forever on input A.

Now, run D on itself to derive a contradiction:

• D(D) loops forever, if D halts on input D. Contradiction.
• D(D) halts, if D loops forever on input D. Contradiction, again.

We see that our original assumption, that H exists, is wrong. ∎
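The diagonal construction can be mimicked in Python, with function objects standing in for encodings of algorithms (a simplification we make for illustration; make_D and never_halts are hypothetical names). Given any candidate decider, make_D builds D, and the candidate is then wrong about D(D). The sketch only runs D(D) for a candidate that answers "loops", since for a candidate answering "halts" D(D) really does loop forever.

```python
def make_D(halts):
    """Build the diagonal program D from a claimed decider halts(program, data) -> bool."""
    def D(program):
        if halts(program, program):   # the decider says: program halts on itself ...
            while True:               # ... so D loops forever
                pass
        return "halted"               # the decider says: program loops, so D halts
    return D

def never_halts(program, data):
    """A (necessarily wrong) candidate decider that always answers 'loops'."""
    return False

D = make_D(never_halts)
print(never_halts(D, D))   # False: the candidate claims D does not halt on D ...
print(D(D))                # halted: ... but D(D) returns at once
```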

Cardinal vs Ordinal Numbers

• Cardinal number: describes the size of the set.
• Ordinal number: refers to the order of an element in a sequence (e.g. the sixth element in the enumeration of some set or sequence)

• In order to define cardinal numbers, we need to review some things about functions.

Definition A one-to-one correspondence, or a bijection, between two sets A and B is a function

f : A → B

with the following two properties:
1 f is one-to-one; i.e. two distinct elements of A cannot have the same image in B

$\forall x, y \in A \, (x \neq y \rightarrow f(x) \neq f(y))$

2 f is onto; i.e. every element of B is an image of some element of A.

$\forall y \in B \; \exists x \in A \; f(x) = y$

Definition

1 We say that two sets A and B have the same cardinality if there exists a bijection f : A → B.

2 If B = {1, 2, . . . , n}, for some fixed n ∈ Z+, and there is a bijection

f : A → B

we say that A is of size n (or of cardinality n) and write that as |A| = n.

3 If |A| = n, for some n ∈ Z+, or A = ∅ (with |∅| = 0), we say that A is a finite set.

• Clearly, if f : A → B is a bijection, then there is an inverse function (which is also a bijection)

$f^{-1} : B \to A$

• Also, the composition of two bijections is a bijection; namely, if

f : A → B, g : B → C

are both bijections, so is

g ◦ f : A → C, where (g ◦ f)(x) = g(f(x))

• As a consequence, if A has the same cardinality as B and B has the same cardinality as C, then A and C will be of the same cardinality.

Countable Sets

• So far, our main distinction between different sets was by separating them into finite and infinite sets.

• Georg Cantor introduced the notion of cardinal numbers as a means of comparing the sizes of infinite sets.

• Based on this idea, it is possible to measure to what extent a set may be “infinite”; i.e. is there any difference in the way the set of all integers appears to be infinite, as opposed to, say, the set of all rational (or even real) numbers?

• For example, the set Z is discrete, whereas the real line (the usual way we think about R) has no “gaps”.

Definition We say that a set A is countably infinite if, and only if, it has the same cardinality as the set of all positive integers Z+.

A set is called countable if it is either finite or countably infinite.

Example (Even Numbers) Consider the set of positive even numbers

E = {2, 4, 6, . . . , 2n, . . .}

Consider the function

f : Z+ → E , f (n) = 2n

It is easy to show that this function is a bijection (exercise). Therefore, the positive even numbers are an example of a countably infinite set.

• This example also shows that a countably infinite set can have a proper subset of the same cardinality, which is not the case for finite sets.

Example (Z - Integers) We will show that the set of all integers is countable.
Consider the function

$f : \mathbb{Z}^+ \to \mathbb{Z}, \quad f(n) = \begin{cases} \dfrac{n}{2}, & n \text{ is even} \\ -\dfrac{n-1}{2}, & n \text{ is odd} \end{cases}$

The output of the function looks like this:

n    | 1 | 2 | 3  | 4 | 5  | . . .
f(n) | 0 | 1 | -1 | 2 | -2 | . . .

Again, it is relatively easy to show that this function is a bijection, which shows that Z is countable.
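The bijection and an inverse are easy to code; checking that each undoes the other on a range of values is a quick sanity test (not a proof) of the bijection claim. The function f_inverse is our own addition.

```python
def f(n):
    """The bijection Z+ -> Z: n/2 for even n and -(n - 1)/2 for odd n."""
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

def f_inverse(z):
    """Inverse Z -> Z+: a positive z comes from 2z, a non-positive z from 1 - 2z."""
    return 2 * z if z > 0 else 1 - 2 * z

print([f(n) for n in range(1, 6)])          # [0, 1, -1, 2, -2]
assert all(f_inverse(f(n)) == n for n in range(1, 1001))
assert all(f(f_inverse(z)) == z for z in range(-500, 501))
```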

Example Show that the set of all positive rational numbers Q+ is countable.
In order to count the positive rational numbers, we create an infinite table on which to count: the denominator increases as we move to the right, and the numerator increases as we move down (see the figure). This enumerates all possible values of numerator and denominator.

The output of the function looks like

n    | 1   | 2   | 3   | 4   | 5   | 6   | . . .
f(n) | 1/1 | 1/2 | 2/1 | 3/1 | 2/2 | 1/3 | . . .

Fractions that are not in lowest terms, such as 2/2 = 1/1, repeat a value that has already been counted; they are skipped so that every positive rational number is counted exactly once.

• This method of showing that Q+ is countable is called Cantor’s diagonalization process.

• Using the same approach we used to show that Z is countable, we can show that the set of all rational numbers is countable; we need to construct a bijection between Q and Q+ (Exercise.)
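The zig-zag walk through the table can be written as a generator of (numerator, denominator) pairs. This sketch assumes the walk goes diagonal by diagonal (numerator + denominator = 2, 3, 4, ...), alternating direction so that it reproduces the order f(1), ..., f(6) listed above; the second generator skips fractions not in lowest terms, which is what turns the enumeration into a bijection onto Q+. Both names are our own.

```python
from itertools import islice
from math import gcd

def zigzag_pairs():
    """Walk the numerator/denominator table diagonal by diagonal, alternating direction."""
    s = 2                                   # s = numerator + denominator
    while True:
        numerators = range(1, s) if s % 2 == 1 else range(s - 1, 0, -1)
        for p in numerators:
            yield p, s - p
        s += 1

def positive_rationals():
    """Skip fractions not in lowest terms so that each positive rational appears once."""
    return ((p, q) for p, q in zigzag_pairs() if gcd(p, q) == 1)

print(list(islice(zigzag_pairs(), 6)))        # [(1, 1), (1, 2), (2, 1), (3, 1), (2, 2), (1, 3)]
print(list(islice(positive_rationals(), 6)))  # [(1, 1), (1, 2), (2, 1), (3, 1), (1, 3), (1, 4)]
```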

Theorem (Cantor) The set of all real numbers between 0 and 1 is uncountable; i.e. there is no bijection

f : Z+ → (0, 1)

Proof. We will prove this by contradiction, by assuming that such a bijection exists:

f : Z+ → (0, 1)

This means that we can index the numbers between 0 and 1 into a sequence on a large sheet of paper.
An example of what such a list may look like is shown below.

n     | f(n)
1     | 0.1986759143598725309861532 . . .
2     | 0.6569872345023458796234509 . . .
3     | 0.2938745723450972345234534 . . .
4     | 0.9854918273450912346598764 . . .
5     | 0.1987523444098234734598723 . . .
6     | 0.2341897123487912349876123 . . .
. . . | . . .

Construct a new decimal number

$d = 0.d_1 d_2 d_3 \ldots d_n \ldots$

between 0 and 1 as follows:

$d_n = \begin{cases} 1, & \text{if the } n\text{-th digit of } f(n) \text{ is not } 1 \\ 0, & \text{if the } n\text{-th digit of } f(n) \text{ is } 1 \end{cases}$

In this way, we get a number which is between 0 and 1, but it cannot appear in the list above since, for every n ≥ 1, it differs in the n-th decimal place from the n-th number in the list. Contradiction. Therefore, such a bijection (sequence) cannot exist, and (0, 1) is not countable.

∎
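The diagonal construction can be applied to any finite prefix of a list. The sketch below (diagonal_digits is our own helper, taking each number as the string of its digits after the decimal point) uses the first six rows of the example list; the resulting digits differ from the n-th row in the n-th place, exactly as in the proof.

```python
def diagonal_digits(rows):
    """d_n = 0 if the n-th digit of the n-th row is 1, and d_n = 1 otherwise."""
    return "".join("0" if row[n] == "1" else "1" for n, row in enumerate(rows))

rows = [
    "1986759143598725309861532",
    "6569872345023458796234509",
    "2938745723450972345234534",
    "9854918273450912346598764",
    "1987523444098234734598723",
    "2341897123487912349876123",
]
d = diagonal_digits(rows)
print("0." + d)    # 0.011111
assert all(d[n] != rows[n][n] for n in range(len(rows)))
```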

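Example Show that the interval (0, 1) and the set of all real numbers R have the same cardinality.
Solution: We will construct a bijection

f : (0, 1) → R

in the following way:

$f(x) = \tan\left(\left(x - \tfrac{1}{2}\right)\pi\right)$

Notice that the function $x \mapsto \left(x - \tfrac{1}{2}\right)\pi$ transforms the interval (0, 1) bijectively into another interval

$\left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right)$

and the tangent function is a bijection between this interval and R. Since the composition of two bijections is a bijection, f is a bijection.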
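Numerically, the inverse of f is y ↦ arctan(y)/π + 1/2. The sketch below (f_inverse is our addition) checks the round trip at a few points, up to floating-point error, and shows f sending points near 0 and 1 to large negative and positive values.

```python
import math

def f(x):
    """The bijection (0, 1) -> R: tan((x - 1/2) * pi)."""
    return math.tan((x - 0.5) * math.pi)

def f_inverse(y):
    """Inverse R -> (0, 1): atan maps R onto (-pi/2, pi/2), then rescale."""
    return math.atan(y) / math.pi + 0.5

for x in [0.001, 0.25, 0.5, 0.75, 0.999]:
    print(x, f(x), f_inverse(f(x)))
```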

Theorem Any subset of a countable set is countable.

The proof of this theorem is relatively easy; see p.451.

• One consequence of this theorem is the following:

Any set with an uncountable subset must be uncountable.

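Example Let T be the set of all functions from positive integers to the set

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Show that T is uncountable.
Solution: We are going to construct a one-to-one function

F : (0, 1) → T

Suppose

$0.a_1 a_2 a_3 \ldots a_n \ldots \in (0, 1)$

is the decimal expansion of a number in (0, 1), chosen so that it does not end in an infinite string of 9’s. Define

F(0.a1a2a3 . . . an . . .) = the function sending each n ≥ 1 to the n-th digit an

It is easy to check that F is one-to-one, so F is a bijection between (0, 1) and the subset F((0, 1)) of T, the set of all functions from positive integers into the set of decimal digits. (F is not onto T: for example, the constant function 0 is not of this form.)

Therefore, T is uncountable, since it contains the subset F((0, 1)), which has the same cardinality as the uncountable set (0, 1).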

• The sets which have the same cardinality as (0, 1) (or R) are said to be of cardinality continuum.

• The question as to whether there are infinite cardinalities lying between countable sets and sets of cardinality continuum is very difficult and is known to be independent of the basic principles of set theory.
