Modeling Delta Encoding of Compressed Files

S.T. Klein, T.C. Serebro, D. Shapira

Delta Encoding

Example:

S = The Prague Stringology Club

T = The Prague Stringology Conference 06

Δ = (1, 24) onferenc (3, 2) 06
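To make the notation concrete, here is a minimal Python sketch of how a decoder might rebuild T from S and Δ. It assumes that a pair (pos, len) copies len characters of S starting at the 1-based position pos, and that anything else is a literal string. The name `apply_delta` and the list representation of Δ are illustrative and do not come from the slides.

```python
def apply_delta(source, delta):
    """Rebuild the target from `source` and a list of delta items.

    Each item is either a (pos, length) pair, copied from `source` using a
    1-based position (an assumption about the slide notation), or a literal
    string that is inserted as is.
    """
    out = []
    for item in delta:
        if isinstance(item, tuple):
            pos, length = item
            out.append(source[pos - 1:pos - 1 + length])
        else:
            out.append(item)
    return "".join(out)


S = "The Prague Stringology Club"
delta = [(1, 24), "onferenc", (3, 2), "06"]
print(apply_delta(S, delta))  # The Prague Stringology Conference 06
```

The pair (3, 2) picks up the characters "e " from "The " in S, which is why the delta needs only two short literals.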

Compressed Differencing

Goal: create a delta file of S and T without decompressing the compressed files.

Delta encoding: S, T → Δ(S,T)

Semi Compressed Differencing: E(S), T → Δ(S,T)

Full Compressed Differencing: E(S), E(T) → Δ(S,T)

LZW compression

STR = input character
WHILE there are input characters {
    C = input character
    IF STR C is in T then
        STR = STR C
    ELSE {
        output the code for STR
        add STR C to T
        STR = C
    }
}
output the code for STR

Example

S = abccbaaabccba

E(S) = 1233219571
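The example can be checked with a short Python sketch of the LZW encoder. It assumes that the dictionary starts with the alphabet a, b, c under the codes 1, 2, 3 and that new entries receive consecutive codes; these conventions are inferred from the codeword sequence on the slide. The name `lzw_encode` is illustrative, and the input is assumed to be non-empty.

```python
def lzw_encode(text, alphabet):
    """LZW-encode a non-empty `text`; the alphabet gets codes 1..len(alphabet)."""
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    codes = []
    current = text[0]
    for ch in text[1:]:
        if current + ch in table:
            current += ch                         # keep extending the phrase
        else:
            codes.append(table[current])          # emit the longest known phrase
            table[current + ch] = len(table) + 1  # next free code
            current = ch
    codes.append(table[current])
    return codes


print(lzw_encode("abccbaaabccba", "abc"))  # [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
```

Read as a digit string, the output is 1233219571, the E(S) shown above.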

Semi Compressed Differencing Algorithm

construct the trie of E(S)
i ← 1
while i ≤ u {
    P ← T_i T_{i+1} ... T_u
    Starting at the root, traverse the trie using P
    When a leaf v is reached
        k ← depth of v in trie
        output the position in S corresponding to v
    i ← i + k
}

Example

E(S) = 1233219571, T = ccbbabccbabccbba.

Δ(S,T) = (3,2) b (5,2) (9,3) (5,2) (9,3) b (5,2)
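The example can be reproduced with a small Python sketch. It makes several assumptions: codes start at 1 for the alphabet a, b, c; a plain dict of phrases stands in for the trie (the LZW dictionary is prefix-closed, so extending a match one character at a time visits the same nodes); and a match of length 1 is written as a literal character, as in the example. The names `build_dictionary` and `semi_compressed_delta` are illustrative.

```python
def build_dictionary(codes, alphabet):
    """Rebuild the LZW phrases of S from E(S), mapping each multi-character
    phrase to the 1-based position in S where it was created."""
    by_code = {i + 1: ch for i, ch in enumerate(alphabet)}
    positions = {}
    pos = 1                              # start of the current phrase in S
    prev = by_code[codes[0]]
    for code in codes[1:]:
        # A code that is not defined yet spells prev + first character of prev.
        cur = by_code[code] if code in by_code else prev + prev[0]
        entry = prev + cur[0]
        by_code[len(by_code) + 1] = entry
        positions[entry] = pos
        pos += len(prev)
        prev = cur
    return positions


def semi_compressed_delta(codes_s, target, alphabet):
    """Greedy longest-match parse of the uncompressed target against the
    dictionary of S. Assumes every character of target is in the alphabet."""
    positions = build_dictionary(codes_s, alphabet)
    delta, i = [], 0
    while i < len(target):
        k = 1
        while i + k < len(target) and target[i:i + k + 1] in positions:
            k += 1
        if k == 1:
            delta.append(target[i])                           # literal character
        else:
            delta.append((positions[target[i:i + k]], k))     # (pos, len) in S
        i += k
    return delta


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
print(semi_compressed_delta(E_S, "ccbbabccbabccbba", "abc"))
# [(3, 2), 'b', (5, 2), (9, 3), (5, 2), (9, 3), 'b', (5, 2)]
```

Only S is handled in compressed form here; T is scanned directly, which is what distinguishes the semi compressed variant from the full one.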

Full Compressed Differencing Algorithm

1 construct the trie of E(S)
2 flag ← 0 // output character k
3 counter ← 1 // position in T
4 input oldcw from E(T)
5 while oldcw ≠ NULL // still processing E(T)
{
    5.1 input cw from E(T)
    5.2 node ← Dictionary[oldcw]
    5.3 if (Dictionary[cw] ≠ NULL)
        5.3.1 k ← first character of string corresponding to Dictionary[cw]
    5.4 else
        5.4.1 k ← first character of string corresponding to node
    5.5 if ((node has a child k) and (cw ≠ NULL))
        5.5.1 output (pos+flag, len-flag) corresponding to child k of node
        5.5.2 flag ← 1
    5.6 else
        5.6.1 output (pos+flag, len-flag) corresponding to node
        5.6.2 create a new child of node corresponding to k
        5.6.3 flag ← 0
    5.7 pos of child k of node ← counter
    5.8 oldcw ← cw
    5.9 counter ← counter + len - flag
}

Example

E(S) = 1233219571, E(T) = 33221247957

S = abccbaaabccba

Each step lists the codewords read, the character k, the part of T covered so far, the dictionary entry added for T, the output, and the resulting flag.

1. oldcw=3, cw=3, k=c, T=cc: new entry 4(1,2,c), output <3,2>, flag=1
2. oldcw=3, cw=2, k=b, T=ccb: new entry 5(2,2,c), output <5,1>, flag=1
3. oldcw=2, cw=2, k=b, T=ccbb: new entry 6(3,2,b), new child b, output <b,0>, flag=0
4. oldcw=2, cw=1, k=a, T=ccbba: new entry 7(4,2,b), output <5,2>, flag=1
5. oldcw=1, cw=2, k=b, T=ccbbab: new entry 8(5,2,a), output <2,1>, flag=1
6. oldcw=2, cw=4, k=c, T=ccbbabcc: new entry 9(6,2,b), output <3,1>, flag=1
7. oldcw=4, cw=7, k=b, T=ccbbabccba: new entry 10(7,3,c), new child b, output (2,1), flag=0
8. oldcw=7, cw=9, k=b, T=ccbbabccbabc: new entry 11(9,3,b), new child b, output (4,2), flag=0
9. oldcw=9, cw=5, k=c, T=ccbbabccbabccb: new entry 12(11,3,b), output <9,3>, flag=1
10. oldcw=5, cw=7, k=b, T=ccbbabccbabccbba: new entry 13(13,3,c), new child b, output (3,1), flag=0
11. oldcw=7, cw=Null, k=b: output (4,2)
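The walk-through can be replayed with a Python sketch of the numbered algorithm. Several choices are assumptions rather than part of the slides: codes start at 1 for the alphabet a, b, c; references are tagged "S" or "T" in place of the two bracket styles; zero-length outputs are dropped; and a single-character node reached with flag 0 is written as a literal character. The names `Node`, `build_trie` and `full_compressed_delta` are illustrative.

```python
class Node:
    """Trie node: `text` is the phrase it spells, `pos` its 1-based start position."""
    def __init__(self, text, pos=None):
        self.text, self.pos, self.children = text, pos, {}


def build_trie(codes_s, alphabet):
    """Rebuild the LZW trie of S from E(S); returns the single-character nodes."""
    roots = {ch: Node(ch) for ch in alphabet}
    by_code = {i + 1: roots[ch] for i, ch in enumerate(alphabet)}
    pos, prev = 1, by_code[codes_s[0]]
    for code in codes_s[1:]:
        # A code that is not defined yet starts like the previous phrase.
        first = by_code[code].text[0] if code in by_code else prev.text[0]
        child = Node(prev.text + first, pos)
        prev.children[first] = child
        by_code[len(by_code) + 1] = child
        pos += len(prev.text)
        prev = by_code[code]
    return roots


def full_compressed_delta(codes_s, codes_t, alphabet):
    roots = build_trie(codes_s, alphabet)
    dictionary = {i + 1: roots[ch] for i, ch in enumerate(alphabet)}  # codes of E(T)
    flag, counter, delta = 0, 1, []
    for idx, oldcw in enumerate(codes_t):
        cw = codes_t[idx + 1] if idx + 1 < len(codes_t) else None
        node = dictionary[oldcw]
        k = dictionary[cw].text[0] if cw in dictionary else node.text[0]
        if cw is not None and k in node.children:
            child = node.children[k]              # phrase already present in S
            delta.append(("S", child.pos + flag, len(child.text) - flag))
            flag = 1
        else:
            length = len(node.text) - flag
            if length > 0 and node.pos is None:
                delta.append(node.text)           # lone character, written literally
            elif length > 0:
                delta.append(("T", node.pos + flag, length))
            if cw is not None:
                node.children[k] = Node(node.text + k)
            flag = 0
        if cw is not None:
            child = node.children[k]
            child.pos = counter                   # the position now refers to T
            dictionary[len(dictionary) + 1] = child
        counter += len(node.text)
    return delta


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
E_T = [3, 3, 2, 2, 1, 2, 4, 7, 9, 5, 7]
for item in full_compressed_delta(E_S, E_T, "abc"):
    print(item)
```

For the example codewords this yields the ten pairs of the walk-through, with ("S", 3, 2) standing for <3,2> and ("T", 2, 1) for (2,1). Neither S nor T is ever decompressed into a flat string; the phrases stored in the nodes are kept only to keep the sketch short.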

Combination of Pairs

If two consecutive ordered pairs are of the form <i, l1> and <i + l1, l2>, we combine them into a single ordered pair <i, l1 + l2>.

Δ(S,T) = <3,2> <5,1> <5,2> <2,1> <3,1> (2,1) (4,2) <9,3> (3,1) (4,2)

S = abccbaaabccba

<3,2> <5,1> combine into <3,3>

<2,1> <3,1> combine into <2,2>

Δ(S,T) = <3,3> <5,2> <2,2> c (4,2) <9,3> b (4,2)
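The combination rule can be sketched in Python as a single pass over the delta. The sketch assumes references are represented as ("S", pos, len) or ("T", pos, len) tuples, and it merges only adjacent references into S, because those are the only pairs the slides show being combined. The name `combine_pairs` is illustrative.

```python
def combine_pairs(delta):
    """Merge consecutive S-references (i, l1) and (i + l1, l2) into (i, l1 + l2)."""
    merged = []
    for item in delta:
        last = merged[-1] if merged else None
        if (isinstance(item, tuple) and isinstance(last, tuple)
                and item[0] == last[0] == "S"
                and item[1] == last[1] + last[2]):
            # The new pair starts right where the previous one ends: extend it.
            merged[-1] = ("S", last[1], last[2] + item[2])
        else:
            merged.append(item)
    return merged


delta = [("S", 3, 2), ("S", 5, 1), ("S", 5, 2), ("S", 2, 1), ("S", 3, 1),
         ("T", 2, 1), ("T", 4, 2), ("S", 9, 3), ("T", 3, 1), ("T", 4, 2)]
for item in combine_pairs(delta):
    print(item)
```

On the example this leaves eight items, starting with ("S", 3, 3), ("S", 5, 2), ("S", 2, 2). Replacing the length-1 references into T by the characters c and b, as in the final delta, is a separate step that this sketch does not perform.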

Encoding the delta file

Δ(S,T) = <3,3> <5,2> <2,2> c (4,2) <9,3> b (4,2)

File consists of:

(pos, len) in S

(pos, len) in T

Characters

Flags

Experiments:

S = xfig.3.2.1, T = xfig.3.2.2

|T| = 812K

|Gzip(T)| = 325K

|LZW(T)| = 497K

|Δ(S,T)| 3K