Dynamic Text and Static Pattern Matching
Amihood Amir, Gad M. Landau
Moshe Lewenstein, Dina Sokol
Bar-Ilan University
Classical Pattern Matching
Input:
- Pattern P = p1 p2 … pm
- Text T = t1 t2 t3 … tn
over alphabet Σ.
• m is the PATTERN size.
• n is the TEXT size.
Output: locations of T where P appears.
Pattern Matching (e.g.)
Input: P = agca, Σ = {a, g, c, t}
T = aaagcattagctagcagcat
Output: 3, 13, 16, …
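The classical problem can be stated directly as code. The sketch below is a naive O(nm) scan, not the KMP algorithm the slides use later. `occurrences` is a hypothetical name, and positions are 1-indexed to match the example above.

```python
def occurrences(pattern: str, text: str) -> list[int]:
    """Return every 1-indexed position where pattern starts in text (naive O(nm) scan)."""
    m = len(pattern)
    # Compare the pattern against every window of length m.
    return [i + 1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


if __name__ == "__main__":
    # The example above: P = agca over {a, g, c, t}.
    print(occurrences("agca", "aaagcattagctagcagcat"))  # [3, 13, 16]
```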
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern.
B. Dynamic Text and Dynamic Pattern.
C. Dynamic Text and Static Pattern.
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern.
a.k.a. the indexing problem.
Solution: preprocess the text and answer pattern queries.
Preprocessing data structure: suffix trees [Wei73, McC75, Ukk95, Far97].
Time: O(n) preprocessing, O(m) query time.
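The indexing approach preprocesses the text once and then answers each pattern query against that structure. A suffix tree takes a lot of code to write out, so the sketch below swaps in a simpler and slower index: a suffix array searched with binary search. This is a different technique from the one cited. It is built naively (O(n² log n) worst case) and answers queries in O(m log n) rather than O(m). The function names are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Suffix start positions (0-indexed), sorted by the suffixes they begin.

    Naive construction by sorting slices; suffix trees or linear-time
    suffix-array algorithms do this in O(n).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text: str, sa: list[int], pattern: str) -> list[int]:
    """All 0-indexed starts of pattern in text, found with two binary searches on sa.

    The suffixes that start with pattern form one contiguous range of sa, because
    the m-character prefixes of sorted suffixes are themselves in sorted order.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


if __name__ == "__main__":
    t = "aaagcattagctagcagcat"
    sa = build_suffix_array(t)            # preprocessing, done once
    print(find_all(t, sa, "agca"))        # [2, 12, 15], i.e. 3, 13, 16 when 1-indexed
```

Once the text is preprocessed, any number of different patterns can be queried against the same array.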
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern.
Time: O(n) preprocessing, O(m) query time.
B. Dynamic Text and Dynamic Pattern.
a.k.a. the dynamic indexing problem. Solution: sophisticated data structures [SV96, ABR00].
Time: O(m + log² n) per query, O(log² n) per change.
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern.
Time: O(n) preprocessing, O(m) query time.
B. Dynamic Text and Dynamic Pattern.
Time: O(m + log² n) per query, O(log² n) per change.
C. Dynamic Text and Static Pattern?
Dynamic Text and Static Pattern Matching
The pattern does not change; the text changes over time.
Goal: report new occurrences of the pattern without performing a new search.
Motivation
[Figure: rows of a FAX image stored run-length encoded, e.g. a^14; a^4 b^2 c^3 d^5; c^8 a^6]
1. Intrusion detection systems
2. Info alerts
3. Two-dimensional run-length compressed matching problem, [ALS03]
Problem Definition
Input: T and P over Σ = {1, …, m}.
Output:
1. At start: all occurrences of P in T.
2. After each change operation:
   a. report all new occurrences of P in T;
   b. discard all old occurrences of P in T that no longer match.
Change operation: change one character in the text, e.g. location 5 from a to b.
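A useful reference point for this definition is a naive baseline. This is not the algorithm presented in these slides. After a change at position i, only occurrences that start in [i − m + 1, i] can appear or disappear, so the baseline simply re-checks those m starts, which costs O(m²) per change. The class and method names are hypothetical, and positions are 0-indexed.

```python
class NaiveDynamicMatcher:
    """Hypothetical baseline for dynamic text / static pattern matching."""

    def __init__(self, text: str, pattern: str):
        self.text = list(text)
        self.pattern = pattern
        m = len(pattern)
        # Static stage: record all occurrences (0-indexed starts).
        self.matches = {i for i in range(len(text) - m + 1) if text[i:i + m] == pattern}

    def _matches_at(self, i: int) -> bool:
        m = len(self.pattern)
        return 0 <= i <= len(self.text) - m and "".join(self.text[i:i + m]) == self.pattern

    def change(self, pos: int, ch: str) -> tuple[set[int], set[int]]:
        """Set text[pos] = ch; return (new occurrences, discarded occurrences)."""
        self.text[pos] = ch
        m = len(self.pattern)
        new, discarded = set(), set()
        # Only occurrences starting in [pos - m + 1, pos] contain the changed position.
        for i in range(max(0, pos - m + 1), pos + 1):
            now = self._matches_at(i)
            if now and i not in self.matches:
                new.add(i)
            elif not now and i in self.matches:
                discarded.add(i)
        self.matches = (self.matches | new) - discarded
        return new, discarded


if __name__ == "__main__":
    dm = NaiveDynamicMatcher("gagagctagcgagcat", "agagagc")
    print(dm.change(9, "a"))  # ({7}, set()): position 10 becomes 'a', new match at position 8
```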
Example
Input: P = agagagc = (ag)³c, Σ = {a, g, c, t}
T = g a g a g c t a g c g a g c a t
Example
Input: P = agagagc = (ag)³c, Σ = {a, g, c, t}
Replace position 10 (c → a): T = g a g a g c t a g a g a g c a t
Output: {8}, the new occurrence of P starting at position 8.
Results
After O(n log log m + m√log m) preprocessing time, O(log log m) time per replacement.
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern.
Time: O(n) preprocessing, O(m) query time.
B. Dynamic Text and Dynamic Pattern.
Time: O(m + log² n) per query, O(log² n) per change.
C. Dynamic Text and Static Pattern.
Time: O(log log m) per change, including announcing the new occurrences.
Static Stage
To initially find all occurrences of P in T, use KMP [Knuth-Morris-Pratt ’77].
All pattern occurrences in a text of length 2m can be stored in O(1) space.
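Below is a compact sketch of the KMP search used in the static stage. It runs in O(n + m) time. The function names are hypothetical, and the code assumes a non-empty pattern. The same prefix (failure) function reappears in the later slides on suffixes of X that are pattern prefixes.

```python
def prefix_function(p: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of p[:i+1] that is also its suffix."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(pattern: str, text: str) -> list[int]:
    """Return all 1-indexed starting positions of pattern in text."""
    pi = prefix_function(pattern)
    m, k, out = len(pattern), 0, []
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = pi[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:                  # a full match ends at text[i]
            out.append(i - m + 2)   # 0-indexed start i - m + 1, reported 1-indexed
            k = pi[k - 1]           # keep going to find overlapping matches
    return out


if __name__ == "__main__":
    print(kmp_search("agca", "aaagcattagctagcagcat"))  # [3, 13, 16]
```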
Succinct Output
Assumption: the text has size 2m. (Break the text T into overlapping substrings of length 2m−1.)
[Figure: text T marked at positions 1, m, 2m, 3m, 4m, with the pattern P aligned below]
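Here is one way to read the decomposition, sketched under an assumption: blocks of length 2m−1 start every m positions (0-indexed), and each block reports only the starts that fall in its first m positions. An occurrence starting at position s then ends at most 2m−2 positions after its block's start, so it lies entirely inside exactly one block and is found exactly once. The helper names are hypothetical.

```python
def blocks(n: int, m: int) -> list[tuple[int, int]]:
    """Half-open ranges [b, min(b + 2m - 1, n)) for b = 0, m, 2m, ... (0-indexed)."""
    return [(b, min(b + 2 * m - 1, n)) for b in range(0, n, m)]


def occurrences_by_block(pattern: str, text: str) -> list[int]:
    """Find all 0-indexed occurrences by searching each overlapping block separately."""
    m = len(pattern)
    found = []
    for b, e in blocks(len(text), m):
        chunk = text[b:e]
        # Only starts among the block's first m positions, so overlaps are not double-counted.
        for i in range(min(m, len(chunk) - m + 1)):
            if chunk[i:i + m] == pattern:
                found.append(b + i)
    return found


if __name__ == "__main__":
    print(occurrences_by_block("agca", "aaagcattagctagcagcat"))  # [2, 12, 15]
```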
Succinct Output (cont.)
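P is periodic: a string P is periodic if it matches itself at a shift of at most |P|/2, i.e. its smallest period is at most |P|/2.
e.g. P = abcabcabca, which matches itself at shift 3 (period abc).
Store the output as a ‘chain’ of pattern occurrences.
P is non-periodic: any two occurrences are more than m/2 apart. In a text of length 2m−1, every occurrence starts in one of the first m positions, so there are no more than two occurrences.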
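Whether P is periodic can be decided from its smallest period, which equals |P| minus the length of its longest proper border. The sketch below computes this with the KMP prefix function. The function names are hypothetical, and the code assumes a non-empty string.

```python
def smallest_period(p: str) -> int:
    """Smallest period of p: |p| minus the length of its longest proper border."""
    pi = [0] * len(p)  # KMP prefix function
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return len(p) - pi[-1]


def is_periodic(p: str) -> bool:
    """True if p matches itself at a shift of at most |p|/2."""
    return 2 * smallest_period(p) <= len(p)


if __name__ == "__main__":
    print(smallest_period("abcabcabca"), is_periodic("abcabcabca"))  # 3 True
    print(smallest_period("agagagc"), is_periodic("agagagc"))        # 7 False
```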
On-line Algorithm
Following each replacement:
Delete old matches that are no longer pattern occurrences.
Find new matches.
Delete Old Matches
Deleting is trivial since we store the matches in constant space:
P is periodic: Truncate the chain of pattern occurrences.
P is non-periodic: Discard all matches that start within distance m before the replacement (exactly the matches that contain the replaced position).
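For the periodic case, here is a constant-time sketch under an assumed chain representation (hypothetical): the triple (first, period, count), meaning occurrences at first, first + period, and so on, all starting within m consecutive positions as in a block of length 2m−1. A change at pos removes the occurrences that contain pos. If survivors remained on both sides of the cut, they would be more than m positions apart, which the assumption rules out. The survivors are therefore a prefix or a suffix of the chain, so truncation suffices.

```python
def truncate_chain(first: int, period: int, count: int, m: int, pos: int) -> tuple[int, int, int]:
    """Drop every occurrence in the chain that contains text position pos.

    Chain = occurrences first, first + period, ..., first + (count - 1) * period,
    each of length m, all starting within m consecutive positions (assumption).
    Returns the surviving chain; count == 0 means it is empty.
    """
    # Occurrence j contains pos iff pos - m + 1 <= first + j * period <= pos.
    j_lo = max(0, -((first - (pos - m + 1)) // period))  # ceil((pos - m + 1 - first) / period)
    j_hi = min(count - 1, (pos - first) // period)        # floor((pos - first) / period)
    if j_lo > j_hi:
        return (first, period, count)   # no occurrence contains pos
    before, after = j_lo, count - 1 - j_hi
    if before > 0:                      # survivors are a prefix of the chain
        return (first, period, before)
    if after > 0:                       # survivors are a suffix of the chain
        return (first + (j_hi + 1) * period, period, after)
    return (first, period, 0)


if __name__ == "__main__":
    # Pattern abab (m = 4, period 2) occurs at 0 and 2 in ababab; change position 5.
    print(truncate_chain(0, 2, 2, 4, 5))  # (0, 2, 1)
```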
Find New Matches
Challenge: How can we locate occurrences of P, following each replacement, without actually searching for P?
Main Idea - Text Covers
We ‘cover’ the text with substrings of the pattern, i.e. store the text in terms of P.
Pattern = a g a g a g c (positions 1–7)
Text = g a g a g c t a g c g a g c a t
Cover: [2,7] - [5,7] [4,7] [1,1] -
(pieces gagagc, t, agc, gagc, a, t; "-" marks a character that does not occur in P)
Text Cover (cont.)
The text cover must satisfy two properties:
Substring Property: each element of the cover is a substring of P, or a character not included in P.
Maximality Property: no two adjacent elements can concatenate to form a substring of P.
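The slides build the initial cover in the static stage but do not say how. One simple construction that satisfies both properties is a left-to-right greedy scan: extend each piece while it is still a substring of P. Maximality holds because each piece followed by the first character of the next piece is already not a substring of P, so the two whole pieces cannot concatenate into one. This naive sketch uses Python's substring test (O(m) per check), not the O(n log log m) preprocessing quoted in the Results. `greedy_cover` and the encoding (1-indexed `(i, j)` for P[i..j], `None` for a character not in P) are hypothetical.

```python
from typing import List, Optional, Tuple

Piece = Optional[Tuple[int, int]]  # (i, j) means P[i..j], 1-indexed; None = char not in P


def greedy_cover(text: str, pattern: str) -> List[Piece]:
    """Cover text with maximal substrings of pattern, scanning left to right."""
    cover: List[Piece] = []
    i, n = 0, len(text)
    while i < n:
        if text[i] not in pattern:
            cover.append(None)  # a character absent from P is a piece by itself
            i += 1
            continue
        j = i + 1
        # Extend while text[i..j] (inclusive) is still a substring of the pattern.
        while j < n and text[i:j + 1] in pattern:
            j += 1
        start = pattern.find(text[i:j]) + 1  # leftmost occurrence in P, 1-indexed
        cover.append((start, start + (j - i) - 1))
        i = j
    return cover


if __name__ == "__main__":
    print(greedy_cover("gagagctagcgagcat", "agagagc"))
    # [(2, 7), None, (5, 7), (4, 7), (1, 1), None]
```

On the example text, this reproduces the cover [2,7] - [5,7] [4,7] [1,1] - from the earlier slide.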
Text Cover (cont.)
How does a replacement in the text affect the text cover?
• Initially, in the static stage, we construct a text cover for T.
• We ensure that the cover satisfies both the substring and maximality property.
Text Cover following replacement
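Pattern = a g a g a g c (positions 1–7)
Text = g a g a g c t a g c g a g c a t, cover: (2,7) - (5,7) (4,7) (1,1) -
After replacing position 10 (c → a), the text is g a g a g c t a g a g a g c a t.
Split the changed piece agc into ag and a: (2,7) - (5,6) (1,1) (4,7) (1,1) -
Merge ag + a = aga: (2,7) - (1,3) (4,7) (1,1) -
Merge aga + gagc = agagagc: (2,7) - (1,7) (1,1) -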
Updating the Text Cover
At most 5 pieces can violate the maximality property.
Substring Concatenation Query
Query: Given two substrings of P, P[i,j] and P[k,l], is their concatenation also a substring of P?
Query time: O(log log m). Preprocessing time: O(m√log m) (also uses [BG00]).
Hence, in O(log log m) time we can update the cover so that it satisfies both properties.
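To pin down what the substring concatenation query answers, here is a naive version. It builds the concatenated string and searches P for it in O(m) time, which is far from the O(log log m) structure the slides describe. `concat_query` is a hypothetical name. Pieces use the 1-indexed inclusive `[i, j]` convention of the slides, and the answer is the leftmost location of the concatenation in P, or `None`.

```python
from typing import Optional, Tuple


def concat_query(pattern: str, a: Tuple[int, int], b: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Is P[i..j] + P[k..l] (1-indexed, inclusive) a substring of P? Naive O(m) check."""
    (i, j), (k, l) = a, b
    s = pattern[i - 1:j] + pattern[k - 1:l]
    pos = pattern.find(s)
    return None if pos < 0 else (pos + 1, pos + len(s))


if __name__ == "__main__":
    p = "agagagc"
    print(concat_query(p, (5, 6), (1, 1)))  # (1, 3): ag + a = aga
    print(concat_query(p, (1, 3), (4, 7)))  # (1, 7): aga + gagc = agagagc
```

These two queries are exactly the merges in the replacement example (5,6)(1,1) → (1,3) and (1,3)(4,7) → (1,7).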
Find New Matches
Given: a text cover which satisfies both the substring and maximality properties.
Find: all new locations of the pattern in the text.
Key Observations
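A new match must begin within distance m of the change, i.e. at one of the positions i − m + 1, …, i for a change at position i.
A new match can include at most one entire piece of the cover: two adjacent entire pieces inside a match would concatenate to a substring of P, violating maximality.
Hence it can span at most three pieces of the cover.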
Furthermore
A new match can begin in one of at most three pieces of the cover:
– the piece with the change
– the previous piece
– the one before that
Simplified Problem
The search starts within a piece X of the cover.
Simple O(m) time algorithm:
– Check each location in X for a pattern start.
– Use suffix trees and LCA queries to compare substrings in constant time.
Improved Algorithm
In fact, we only have to check the suffixes of X that are also prefixes of the pattern.
e.g. X = a g a g a
The KMP automaton can give the necessary information.
However, the time is still O(m)!
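Here is how the KMP machinery yields these suffixes. Running the KMP matcher for P over X leaves it in the state k of the longest suffix of X that is a prefix of P. Every shorter such suffix is a border of that one, so following the failure links from k enumerates all of them, longest first. This takes O(m) time overall, as the slide says. The function names are hypothetical, and the pattern is assumed non-empty.

```python
def prefix_function(p: str) -> list[int]:
    """pi[i] = length of the longest proper border of p[:i+1] (KMP failure function)."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def suffixes_that_are_prefixes(x: str, pattern: str) -> list[int]:
    """Lengths of all suffixes of x that are also prefixes of pattern, longest first."""
    pi = prefix_function(pattern)
    m, k = len(pattern), 0
    for c in x:  # run the KMP matcher over x
        while k > 0 and (k == m or c != pattern[k]):
            k = pi[k - 1]
        if c == pattern[k]:
            k += 1
    # k is the longest such suffix; the shorter ones are its chain of borders.
    lengths = []
    while k > 0:
        lengths.append(k)
        k = pi[k - 1]
    return lengths


if __name__ == "__main__":
    print(suffixes_that_are_prefixes("agaga", "agagagc"))  # [5, 3, 1]: agaga, aga, a
```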
Improved Algorithm
We can group the prefixes of P by their periods.
Each group of prefixes can be checked in constant time!
There are at most O(log m) groups.
Groups (e.g.)
Pattern = a g a g a g c (positions 1–7)
X = a g a g a
There are three suffixes of X that are also pattern prefixes:
{ agaga, aga } { a }
Prefixes with the same period fall into a single group.
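Grouping is easy once the failure function is available: the smallest period of the prefix P[1..k] is k − pi[k−1]. The sketch below groups the prefix lengths found for X by that period. `group_by_period` is a hypothetical name, and the prefix function is repeated so that the block stands alone.

```python
def prefix_function(p: str) -> list[int]:
    """pi[i] = length of the longest proper border of p[:i+1]."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def group_by_period(lengths: list[int], pattern: str) -> dict[int, list[int]]:
    """Group prefix lengths k of pattern by the smallest period of pattern[:k]."""
    pi = prefix_function(pattern)
    groups: dict[int, list[int]] = {}
    for k in lengths:
        groups.setdefault(k - pi[k - 1], []).append(k)
    return groups


if __name__ == "__main__":
    # Suffixes of X = agaga that are prefixes of P: agaga (5), aga (3), a (1).
    print(group_by_period([5, 3, 1], "agagagc"))  # {2: [5, 3], 1: [1]}
```

This output matches the slide's groups { agaga, aga } with period ag and { a } with period a.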
Checking a group in Constant Time
Pattern = a g a g a g c (positions 1–7)
X = a g a g a
[Figure: X aligned against sample text continuations such as a g a g a a g t … and a g a g a g a g a g c …]
Idea: Match the period ‘ag’ as far as possible. As soon as (ag)* no longer matches, check for a ‘c’.
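Below is one way to realize this idea, sketched under stated assumptions. It is not claimed to be the authors' exact procedure. A group with period q has candidate starts s_0 < s_1 < … spaced q apart. Let r_P be how far P keeps period q, and r_T how far the text keeps period q from s_0. The text run from s_j then has length r_T − j·q. If P itself has period q, s_j matches exactly when that run is at least m long. Otherwise the text run must break exactly where P's run breaks, so at most one candidate survives and is verified with a single comparison. Both `periodic_run` (which equals q + LCE(start, start + q)) and the final comparison stand in for constant-time LCE queries (suffix tree + LCA). Here they are computed naively, and all names are hypothetical.

```python
def periodic_run(s: str, start: int, q: int) -> int:
    """Length of the longest prefix of s[start:] that has period q (naive LCE)."""
    k = start + q
    while k < len(s) and s[k] == s[k - q]:
        k += 1
    return min(k, len(s)) - start


def check_group(text: str, pattern: str, starts: list[int], q: int) -> list[int]:
    """Return the starts of this group at which pattern occurs in text.

    Assumes starts are increasing and spaced exactly q apart, and that each
    text[s:] begins with a prefix of pattern of length >= q that has period q.
    """
    m = len(pattern)
    r_p = periodic_run(pattern, 0, q)        # where P stops having period q
    r_t = periodic_run(text, starts[0], q)   # where the text stops, measured from s_0
    if r_p == m:
        # P has period q: s_j matches iff the text run from s_j covers m characters.
        return [s for s in starts if r_t - (s - starts[0]) >= m]
    # The run from s_j (length r_t - j*q) must end exactly where P's run ends.
    d = r_t - r_p
    if d < 0 or d % q != 0 or d // q >= len(starts):
        return []
    s = starts[0] + d
    return [s] if text[s:s + m] == pattern else []  # one remaining comparison


if __name__ == "__main__":
    # X = agaga followed by gc: the group {agaga, aga} (q = 2) starts at 0 and 2.
    print(check_group("agagagc", "agagagc", [0, 2], 2))  # [0]
```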
Groups
A string cannot have more than O(log m) border groups.
Hence, the time of the algorithm is O(log m).
[Intuition: each new group has a new period which has to be at least double the size of the old period. e.g. aagaagaa]
Even Better...
We check only a constant number of groups.
Choosing these O(1) groups takes O(log log m) time. Hence, our algorithm takes O(log log m) time per replacement.
Open Problems
Allowing insertions and deletions to the text.
Searching for a set of multiple static patterns.