Download - String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Transcript
Page 1: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

String Matching

Inge Li Gørtz

CLRS 32

Page 2: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

String Matching

• String matching problem:

• string T (text) and string P (pattern) over an alphabet Σ.

• |T| = n, |P| = m.

• Report all starting positions of occurrences of P in T.

• ε: empty string

• prefix/suffix: v=xy, x prefix of v, y suffix of v.

P = a b a b a c aT = b a c b a b a b a b a b a c a b

Page 3: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

String Matching

• Finite automaton

• Knuth-Morris-Pratt (KMP)

Page 4: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

A naive string matching algorithm

a b a b a c ab a c b a b a b a b a b a c a b

a b a b a c aa b a b a c a

a b a b a c aa b a b a c a

a b a b a c aa b a b a c a

a b a b a c aa b a b a c a

a b a b a c a

Page 5: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

T =P = a a a b a b a

Page 6: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

Page 7: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

Page 8: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

Page 9: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

a a a b a b a

Page 10: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

Page 11: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

a a a b a a a

Page 12: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

Page 13: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

a a a b a a a

Page 14: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

Page 15: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

a a a b a b a

Page 16: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

a a a b a a a b a b a b a c a b b

Improving the naive algorithm

a a a b a b aT =P = a a a b a b a

Page 17: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Exploiting what we know from pattern

a b a b a c aT =P = a b a b a c a

a b a b a a

a b a b a c aT = a b a b a b

a b a b a c aT = a b a b a c

What character in the pattern should we check next?

What character in the pattern should we check next?

What character in the pattern should we check next?

Page 18: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Fill the table

matched a b a b a c a#matched 0 1 2 3 4 5 6 7

abc

P = a b a b a c a What character in the pattern should we check next?

Page 19: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Fill the table

matched a b a b a c a#matched 0 1 2 3 4 5 6 7

abc

P = a b a b a c a

matched a b a b a c a#matched 0 1 2 3 4 5 6 7

a 2b 5c 7

What character in the pattern should we check next?

Page 20: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

matched a b a b a c a#matched 0 1 2 3 4 5 6 7

abc

Fill the table

P = a b a b a c a

matched a b a b a c a#matched 0 1 2 3 4 5 6 7

abc

What character in the pattern should we check next?

How much of

the pattern is matched?

matched a b a b a c a#matched 0 1 2 3 4 5 6 7

abc

Page 21: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

Page 22: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P = ababaca.

a b a b a c a

a

a a

b

a

b

starting state accepting state

Page 23: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P = ababaca.

a b a b a c a

a

a a

b

a

b

starting state accepting state

Page 24: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

a

a a

b

a

b

starting state accepting state

a b a aa b a b a c aP:

Matched until now:

Page 25: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

a

a a

b

a

b

starting state accepting state

a b a aa b a b a c aP:

Matched until now:

Page 26: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

a

a a

b

a

b

starting state accepting state

a b a aa b a b a c aP:

Matched until now:

Page 27: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

a

a a

b

a

b

starting state accepting state

a b a aa b a b a c aP:

Matched until now:

Page 28: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

a

a a

b

a

b

longest prefix of P that is a suffix of ‘abaa'

starting state accepting state

a b a aa b a b a c aP:

Matched until now:

Page 29: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

Page 30: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

read ‘a’?

Page 31: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

read ‘a’?

a aa b a b a c aP:

Matched until now:

Page 32: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

read ‘a’?

a aa b a b a c aP:

Matched until now:

Page 33: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

longest prefix of P that is a suffix of ‘aa’ = ‘a’ read ‘a’?

a aa b a b a c aP:

Matched until now:

Page 34: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: alphabet Σ = {a,b,c}. P= ababaca.

a b a b a c a

longest prefix of P that is a suffix of ‘aa’ = ‘a’ read ‘a’?

a

a aa b a b a c aP:

Matched until now:

Page 35: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: • Q: finite set of states • q0 ∈ Q: start state

• A ⊆ Q: set of accepting states

• Σ: finite input alphabet • δ: transition function a b a b a c a

a

a a

b

a

b

Page 36: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

Finite Automaton

• Finite automaton: • Q: finite set of states • q0 ∈ Q: start state

• A ⊆ Q: set of accepting states

• Σ: finite input alphabet • δ: transition function

• Matching time: O(n)• Preprocessing time: O(m3|Σ|). Can be done in O(m|Σ|).• Total time: O(n + m|Σ|)

a b a b a c a

a

a a

b

a

b

Page 37: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

Page 38: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• Finite automaton: alphabet Σ = {a,b,c}. P = ababaca.

• KMP: Can be seen as finite automaton with failure links:

a b a b a c a

a

a a

b

a

b

a b a b a c a1 2 3 4 5 6

Page 39: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• Finite automaton: alphabet Σ = {a,b,c}. P = ababaca.

• KMP: Can be seen as finite automaton with failure links:

a b a b a c a

a

a a

b

a

b

a b a b a c a1 2 3 4 5 6

Page 40: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a suffix of what we have matched until now (ignore the mismatched character).

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘aba'

Page 41: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP matching

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a suffix of what we have matched until now.

b a c b a b a b a b a b a c a bT =

a b a b a c a1 2 3 4 5 6

Page 42: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 43: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 44: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 45: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 46: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 47: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 48: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 49: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 50: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 51: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 52: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 53: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 54: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 55: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 56: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 57: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 58: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• KMP: Can be seen as finite automaton with failure links:

• longest prefix of P that is a proper suffix of what we have matched until now.

• can follow several failure links when matching one character:

a b a b a aT =

a b a b a c a1 2 3 4 5 6

Page 59: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP Analysis

• Lemma. The running time of KMP matching is O(n).

• Each time we follow a forward edge we read a new character of T.

• #backward edges followed ≤ #forward edges followed ≤ n.

• If in the start state and the character read in T does not match the forward edge, we stay there.

• Total time = #non-matched characters in start state + #forward edges followed + #backward edges followed ≤ 2n.

Page 60: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

Matched until now:

Page 61: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a bMatched until now:

Page 62: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a ba b a b a c a

Matched until now:

Page 63: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a ba b a b a c a

Matched until now:

Page 64: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a ba b a b a c a

Matched until now:

Page 65: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 66: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 67: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 68: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 69: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 70: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 71: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 72: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

• Computing failure links: Use KMP matching algorithm.

Computation of failure links

a b a b a c a1 2 3 4 5 6

longest prefix of P that is a suffix of ‘abab'

a b a b a c a1 2 3 4 5 6

Can be found by using KMP to match ‘bab'

Page 73: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

• Computing failure links: As KMP matching algorithm (only need failure links that are already computed).

• Failure link: longest prefix of P that is a proper suffix of what we have matched until now.

Computation of failure links

1 2 3 4 5 6 7

a b a b a c aP =

a b a b a c a1 2 3 4 5 6

Page 74: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP

• Computing π: As KMP matching algorithm (only need π values that are already computed).

• Running time: O(n + m):

• Lemma. Total number of comparisons of characters in KMP is at most 2n.• Corollary. Total number of comparisons of characters in the preprocessing

of KMP is at most 2m.

Page 75: String Matching - Technical University of Denmark · • String matching problem: • string T (text) and string P (pattern) over an alphabet Σ. • |T| = n, |P| = m. • Report

KMP: the π array

• π array: A representation of the failure links.

• Takes up less space than pointers.

a b a b a c a

i 1 2 3 4 5 6 7

π[i] 0 0 1 2 3 0 1

1 2 3 4 5 6