Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all...

53
Tries and Suffix Trees Inge Li Gørtz

Transcript of Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all...

Page 1: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Tries and Suffix Trees Inge Li Gørtz

Page 2: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• String matching problem. Given strings T (text) and P (pattern) over an alphabet Σ, report starting positions of all occurrences of P in T. • Finite automaton: O(mΣ + n) time and space• KMP: O(m+n) time and space

• String indexing problem. Given a string S of characters from an alphabet Σ. Preprocess S into a data structure to support• Search(P): Return starting position of all occurrences of P in S.

• Today: Data structure using O(n) space and supporting Search(P) in O(m) time.

• Applications: • Search engines, e.g. prefix searches.• Finding common substrings of many biological strings• Finding repeating substructures in biological strings• Detecting DNA contamination

String indexing problem

Page 3: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Tries• Compressed tries• Suffix trees• Applications of suffix trees

Outline

Page 4: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Tries

Page 5: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval

• Trie over the strings: sells, by, the, sea, shells, tea.

Tries

b

yS2

S1

s

s

s

S4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

Page 6: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Prefix-free?

• Trie over the strings: sells, by, the, sea, shells, tea, she.

Tries

b

yS2

S1

s

s

s

S4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

Page 7: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Prefix-free?

• Trie over the strings: sells, by, the, sea, shells, tea, she.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 8: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Page 9: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 10: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 11: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 12: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 13: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 14: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 15: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 16: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 17: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 18: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 19: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “sea”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

S4

Tries

b

y

S2

S1

s

s

sS6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 20: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “short”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 21: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “short”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 22: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “short”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 23: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “short”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 24: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Text retrieval • Search for “short”

• Trie over the strings: sells$, by$, the$, sea$, shells$, tea$, she$.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 25: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Build a trie over the strings: by$, sells$, sea$.

Tries

b

y

S2

S1

s

sS4

e

a l

l

$

$

$

Page 26: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Properties of the trie. A trie T storing a collection S of s strings of total length n from an alphabet of size d has the following properties:

• How many children can a node have?

• How many leaves does T have?

• What is the height of T?

• What is the number of nodes in T?

Trie

Page 27: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Search time: O(d) in each node => O(dm).• O(m) if d constant. • d not constant: use dictionary

• Hashing O(1)• Balanced BST: O(log d)

• Time and space for a trie (for small/constant d): • O(m) for searching for a string of length m. • O(n) space.• Preprocessing: O(n)

Trie

Page 28: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Prefix search: return all words in the trie starting with “se”

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 29: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

• Prefix search: return all words in the trie starting with “se”

Page 30: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Prefix search: return all words in the trie starting with “se”

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 31: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Time for prefix search: O(m) + time to report all occurrences.

• Solution: compact tries.

Trie

Could be large!!

Page 32: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Compact tries

Page 33: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Compact trie: Chains of nodes with a single child is merged into a single node.

Tries

b

y

S2

S1

s

s

sS4 S6 S3

e

e

e

ea al

l

l

l

h h

t

S5

$

$

$

$

$ $S7

$

Page 34: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Compact trie: Chains of nodes with a single child is merged into a single node.

Tries

b ty

a

se

ll

e

s

ahe

hel

ls

$$

$$

$ $ $

by$

S2

s

e

a$

S4

lls$S1

he

lls$

S6 S3

S5 S7

$

t

ea$

he$

Page 35: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Properties of the compact trie. A compact trie T storing a collection S of s strings of total length n from an alphabet of size d has the following properties:• Every internal node of T has at least 2 and at most d children.• T has s leaves• The number of nodes in T is < 2s.

• Time and space for a compact trie (constant d): • O(m) for searching for a string of length m. • O(m + occ) for prefix search, where occ = #occurrences• O(s) space.• Preprocessing: O(n)

Trie

Page 36: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Suffix trees

Page 37: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• String indexing problem. Given a string S of characters from an alphabet Σ. Preprocess S into a data structure to support• Search(P): Return starting position of all occurrences of P in S.

• Build a compressed trie over all suffixes of S (suffix tree). Label leaves with index of suffix.

• Observation: An occurrence of P is a prefix of a suffix of S.

Suffix tree

occurrence of P

Suffix of S

Page 38: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• String indexing problem. Given a string S of characters from an alphabet Σ. Preprocess S into a data structure to support• Search(P): Return starting position of all occurrences of P in S.

• Build a compressed trie over all suffixes of S (suffix tree). Label leaves with index of suffix.

• Observation: An occurrence of P is a prefix of a suffix of S.

• Example: P = ana.

Suffix tree

occurrence of P

Suffix of S

b a n a n a s t r i n g s s a l a d s

Suffix of S

Suffix of S

Page 39: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree: over the string banana$

Suffix Tree

$

b

1

an

a

an

a

$

6

na

$

4

na$

2

na

na$

3 5

$

$

7

Page 40: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree: over the string banana$

Suffix Tree

$

b

1

an

a

an

a

$

6

na

$

4

na$

2

na

na$

3 5

$

$

7

• Search for P.• Report labels of all leaves

below final node

Page 41: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree: over the string banana$• Find all occurrences of P=“an”

Suffix Tree

$

b

1

an

a

an

a

$

6

na

$

4

na$

2

na

na$

3 5

$

$

7

• Search for P.• Report labels of all leaves

below final node

Page 42: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree: over the string banana$

Suffix Tree

$

b

1

an

a

an

a

$

6

na

$

4

na$

2

na

na$

3 5

$

$

7

1 2 3 4 5 6 7b a n a n a $

• Store S and store node labels by reference to S.

Page 43: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree: over the string banana$

Suffix Tree

1

[2,2]

6

42

3 5

7

1 2 3 4 5 6 7b a n a n a $

• Store S and store node labels by reference to S.

[3,4] [7,7]

[5,7]

[7,7]

[7,7]

[7,7]

[1,7][3,4]

[5,7]

Page 44: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Suffix trees and common substrings

918

15 12

716 3 6

14 11

2

1013

5 1

17 8 4

$2 $1

$2 $2

$2 $1 $1

$2 $2

$1 $1

$2 $2

$1 $1

$2 $1 $1

Page 45: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree of a string S: Compact trie over all suffixes of S.

• Space and time:• Space: O(n)• Search time: O(m) + time to report all occurrences = O(m+occ)• Preprocessing: Can be done in O(sort(n,|Σ|)) time, where sort(n,|Σ|) is the time it

takes to sort n characters from an alphabet Σ.

• Suffix trees can be used to solve the String indexing problem in:• Space: O(n)• Search time: O(m+occ)• Preprocessing: O(sort(n,|Σ|)) time

Suffix tree

Page 46: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

Applications of suffix trees

Page 47: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Find longest common substring of strings S1 and S2. • Construct the suffix tree over S1$1S2$2.• Example: Find longest common substring of piespies and piepiees:

• Construct suffix tree of piespies$1piepiees$2.

Longest common substring

Page 48: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree of piespies$1piepiees$2.

Generalized suffix tree

$1

918

15 12

716 3 6

14 11

2

1013

5 1

17 8 4

$2

$1

$1

$1

$2

$2

$2

$2

$2

p

p p

pp

i

i

i eee

ee

e

e

e

ss

s

s

s

s

i

i

i

s s

$1

$2

$2$2

s

s

e

e

...

...

......

...

e e

$2

$2

$2 $2

$1

$2

...

e

$2s p

s

ie

$1

$2

...

p

si

$1

$2

e

...

p

si

$1

$2

e

...

Page 49: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree of piespies$1piepiees$2.• Mark leaf with if $1 suffix starts in S1.

Generalized suffix tree

918

15 12

716 3 6

14 11

2

1013

5 1

17 8 4

$2 $1

$2 $2

$2 $1 $1

$2 $2

$1 $1

$2 $2

$1 $1

$2 $1 $1

Page 50: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree of piespies$1piepiees$2.• Mark leaf with if $1 suffix starts in S1.

Generalized suffix tree

918

15 12

716 3 6

14 11

2

1013

5 1

17 8 4

$2 $1

$2 $2

$2 $1 $1

$2 $2

$1 $1

$2 $2

$1 $1

$2 $1 $1

Page 51: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Suffix tree of piespies$1piepiees$2.• Mark leaf with if $1 suffix starts in S1.• Add string-depth.

15111

Generalized suffix tree

918

15 12

716 3 6

14 11

2

1013

5 1

17 8 4

$2 $1

$2 $2

$2 $1 $1

$2 $2

$1 $1

$2 $2

$1 $1

$2 $1 $1

[18,18] [9,18] [15,15]

[16,18]

[13,18]

[17,17]

[18,18]

[9,18]

[5,18]

[14,15]

[16,18]

[13,18]

[8,8]

[9,18] [5,18]

[13,15]

[16,18]

[13,18]

[8,8]

[9,18] [5,18]

[18,18]

[17,17]

[9,18]

[5,18]

1 10 1

4 7 2

3 12 16

2

5 8 3

13 17

3

6 9 4

14 18

Page 52: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

13

3

• Suffix tree of piespies$1piepiees$2.• Mark leaf with if $1 suffix starts in S1.• Add string-depth.

15111

Generalized suffix tree

918

15 12

716 3 6

14 11

2

1013

5 1

17 8 4

$2 $1

$2 $2

$2 $1 $1

$2 $2

$1 $1

$2 $2

$1 $1

$2 $1 $1

[18,18] [9,18] [15,15]

[16,18]

[13,18]

[17,17]

[18,18]

[9,18]

[5,18]

[14,15]

[16,18]

[13,18]

[8,8]

[9,18] [5,18]

[13,15]

[16,18]

[13,18]

[8,8]

[9,18] [5,18]

[18,18]

[17,17]

[9,18]

[5,18]

1 11 1

4 7 2

3 12 16

2

5 8 3

17

6 9 4

14 18

S[13,15] = “pie” is the longest common substring.

Page 53: Tries and Suffix Trees - Technical University of · PDF filereport starting positions of all occurrences of P in T. ... • Hashing O(1) ... Suffix Tree $ b 1 a n a a n a $ 6 n a

• Using a suffix tree we can solve the longest common substring problem in linear time (for a constant size alphabet).

Longest common substring