Modeling Delta Encoding of Compressed Files S.T. Klein, T.C. Serebro, D. Shapira.


Modeling Delta Encoding of Compressed Files

S.T. Klein, T.C. Serebro, D. Shapira

Delta Encoding

Example:

S = The Prague Stringology Club
T = The Prague Stringology Conference 06

Δ = (1, 24) onferenc (3, 2) 06
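As a minimal sketch of how such a delta can be read back, the snippet below rebuilds T from S and Δ. The function name apply_delta and the list representation of Δ are assumptions made for illustration; positions are taken to be 1-based, which is what the pair (1, 24) in the example implies.

```python
def apply_delta(source, delta):
    """Rebuild the target from `source` and a list of delta items.

    Each item is either a literal string or a (pos, length) pair that
    copies `length` characters of `source` starting at 1-based `pos`.
    """
    out = []
    for item in delta:
        if isinstance(item, tuple):
            pos, length = item
            out.append(source[pos - 1:pos - 1 + length])
        else:
            out.append(item)
    return "".join(out)


S = "The Prague Stringology Club"
delta = [(1, 24), "onferenc", (3, 2), "06"]
T = apply_delta(S, delta)
print(T)
```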

Compressed Differencing

Goal: create a delta file of S and T without decompressing the compressed files.

Delta encoding: S, T → Δ(S,T)
Semi Compressed Differencing: E(S), T → Δ(S,T)
Full Compressed Differencing: E(S), E(T) → Δ(S,T)

LZW compression

(In this pseudocode T denotes the LZW dictionary, not the target file.)

STR = input character
WHILE there are input characters {
    C = input character
    IF STR C is in T then
        STR = STR C
    ELSE {
        output the code for STR
        add STR C to T
        STR = C
    }
}
output the code for STR

Example

S = abccbaaabccba

E(S) = 1233219571
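The LZW loop can be sketched in Python as follows. The function name lzw_compress and the initial code assignment a=1, b=2, c=3 are assumptions; the latter is what the example output implies. The sketch expects a non-empty input.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codes for `text` (non-empty).

    The dictionary starts with one code per alphabet symbol, numbered
    from 1, and grows by one entry each time a code is output.
    """
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    codes = []
    cur = text[0]
    for ch in text[1:]:
        if cur + ch in table:
            cur += ch                         # keep extending the phrase
        else:
            codes.append(table[cur])          # output the code for STR
            table[cur + ch] = len(table) + 1  # add STR C to the dictionary
            cur = ch
    codes.append(table[cur])
    return codes


codes_s = lzw_compress("abccbaaabccba", "abc")
print("".join(map(str, codes_s)))
```

With these assumptions the call reproduces E(S) = 1233219571 from the example.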

Semi Compressed Differencing Algorithm

construct the trie of E(S)
i ← 1
while i ≤ u {
    P ← T_i T_{i+1} ... T_u
    Starting at the root, traverse the trie using P
    When a leaf v is reached
        k ← depth of v in trie
        output the position in S corresponding to v
        i ← i + k
}

Example

E(S) = 1233219571, T = ccbbabccbabccbba

Δ(S,T) = (3,2) b (5,2) (9,3) (5,2) (9,3) b (5,2)
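The semi compressed case can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the trie is replaced by a Python dict keyed by phrase (possible because an LZW dictionary is prefix-closed), the helper names are hypothetical, initial codes are a=1, b=2, c=3, and a single character is emitted as a literal, as in the example.

```python
def lzw_phrases(codes, alphabet):
    """Rebuild the LZW dictionary from the codes alone.

    Returns phrase -> (pos, len), where pos is the 1-based position in S
    at which the phrase was first parsed. S itself is never materialized.
    """
    table = {i + 1: ch for i, ch in enumerate(alphabet)}
    where = {}
    prev, pos = table[codes[0]], 1
    for code in codes[1:]:
        cur = table.get(code, prev + prev[0])  # code not yet defined: prev + its first char
        new = prev + cur[0]
        table[len(table) + 1] = new
        where[new] = (pos, len(new))
        pos += len(prev)
        prev = cur
    return where


def semi_compressed_delta(codes_s, t, alphabet):
    """Greedily parse T into longest phrases of the dictionary of E(S)."""
    where = lzw_phrases(codes_s, alphabet)
    delta, i = [], 0
    while i < len(t):
        k = 1
        while i + k < len(t) and t[i:i + k + 1] in where:
            k += 1
        delta.append(where[t[i:i + k]] if k > 1 else t[i])
        i += k
    return delta


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
delta = semi_compressed_delta(E_S, "ccbbabccbabccbba", "abc")
print(delta)
```

On the example input this yields the same sequence of pairs and characters as Δ(S,T) above.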

Full Compressed Differencing Algorithm

1 construct the trie of E(S)
2 flag ← 0 // output character k
3 counter ← 1 // position in T
4 input oldcw from E(T)
5 while oldcw ≠ NULL // still processing E(T)
{
    5.1 input cw from E(T)
    5.2 node ← Dictionary[oldcw]
    5.3 if (Dictionary[cw] ≠ NULL)
        5.3.1 k ← first character of string corresponding to Dictionary[cw]
    5.4 else
        5.4.1 k ← first character of string corresponding to node
    5.5 if ((node has a child k) and (cw ≠ NULL))
        5.5.1 output (pos+flag, len-flag) corresponding to child k of node
        5.5.2 flag ← 1
    5.6 else
        5.6.1 output (pos+flag, len-flag) corresponding to node
        5.6.2 create a new child of node corresponding to k
        5.6.3 flag ← 0
    5.7 pos of child k of node ← counter
    5.8 oldcw ← cw
    5.9 counter ← counter + len - flag
}
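One possible reading of the steps above, as a hedged Python sketch. Several choices here are assumptions rather than statements from the slides: the trie is a dict keyed by phrase, each entry is tagged "S" or "T" to record which file its position refers to, len in step 5.9 is taken to be the length of the node that was just output, zero-length outputs are skipped, and a single character that must be output is written as a literal. Initial codes a=1, b=2, c=3 are also assumed.

```python
def full_compressed_delta(codes_s, codes_t, alphabet):
    # Step 1: trie of E(S), kept as phrase -> (file, 1-based position).
    table = {i + 1: ch for i, ch in enumerate(alphabet)}
    trie, prev, pos = {}, table[codes_s[0]], 1
    for code in codes_s[1:]:
        cur = table.get(code, prev + prev[0])   # undefined code: prev + its first char
        table[len(table) + 1] = prev + cur[0]
        trie[prev + cur[0]] = ("S", pos)
        pos += len(prev)
        prev = cur

    dict_t = {i + 1: ch for i, ch in enumerate(alphabet)}  # Dictionary[] for E(T)
    delta, flag, counter = [], 0, 1
    stream = list(codes_t) + [None]             # None plays the role of NULL
    oldcw = stream[0]
    for cw in stream[1:]:
        node = dict_t[oldcw]
        k = dict_t[cw][0] if cw in dict_t else node[0]      # steps 5.3 / 5.4
        child = node + k
        found = cw is not None and child in trie            # step 5.5
        hit = child if found else node
        length = len(hit) - flag
        if length > 0:
            if hit in trie:
                origin, p = trie[hit]
                delta.append((origin, p + flag, length))
            else:
                delta.append(hit)   # assumption: lone characters become literals
        flag = 1 if found else 0
        if cw is not None:
            dict_t[len(dict_t) + 1] = child
            trie[child] = ("T", counter)                    # step 5.7
        counter += len(hit) - flag                          # step 5.9
        oldcw = cw
    return delta


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
E_T = [3, 3, 2, 2, 1, 2, 4, 7, 9, 5, 7]
pairs = full_compressed_delta(E_S, E_T, "abc")
print(pairs)
```

Entries tagged "S" correspond to the <pos, len> pairs of the slides and entries tagged "T" to the (pos, len) pairs.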

Example

S = abccbaaabccba       E(S) = 1233219571
T = ccbbabccbabccbba    E(T) = 33221247957

step  oldcw  cw    k  output  flag  new node    T decoded so far
 1    3      3     c  <3, 2>  1     4(1,2,c)    cc
 2    3      2     b  <5, 1>  1     5(2,2,c)    ccb
 3    2      2     b  <b, 0>  0     6(3,2,b)    ccbb
 4    2      1     a  <5, 2>  1     7(4,2,b)    ccbba
 5    1      2     b  <2, 1>  1     8(5,2,a)    ccbbab
 6    2      4     c  <3, 1>  1     9(6,2,b)    ccbbabcc
 7    4      7     b  (2, 1)  0     10(7,3,c)   ccbbabccba
 8    7      9     b  (4, 2)  0     11(9,3,b)   ccbbabccbabc
 9    9      5     c  <9, 3>  1     12(11,3,b)  ccbbabccbabccb
10    5      7     b  (3, 1)  0     13(13,3,c)  ccbbabccbabccbba
11    7      Null  b  (4, 2)  0                 ccbbabccbabccbba

The flag column gives the value after the step. A node label reads code(pos, len, first character), where pos is the position of the phrase in T. Pairs written <pos, len> refer to S and pairs written (pos, len) refer to T. The output of step 3 has length 0 and does not appear in Δ(S,T).

Combination of Pairs

Δ(S,T) = <3,2> <5,1> <5,2> <2,1> <3,1> (2,1) (4,2) <9,3> (3,1) (4,2)

If two consecutive ordered pairs are of the form (i, l_1) and (i + l_1, l_2), we combine them into a single ordered pair (i, l_1 + l_2).

S = abccbaaabccba

<3,2> <5,1> → <3,3>
<2,1> <3,1> → <2,2>

Δ(S,T) = <3,3> <5,2> <2,2> c (4,2) <9,3> b (4,2)
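The combination rule can be sketched as a single pass over a run of pairs. The function name combine_pairs is hypothetical, and the sketch assumes all pairs in the list refer to the same file (here the pairs that point into S).

```python
def combine_pairs(pairs):
    """Merge consecutive pairs (i, l1), (i + l1, l2) into (i, l1 + l2)."""
    out = []
    for pos, length in pairs:
        if out and out[-1][0] + out[-1][1] == pos:
            prev_pos, prev_len = out[-1]
            out[-1] = (prev_pos, prev_len + length)   # the two copies are adjacent
        else:
            out.append((pos, length))
    return out


s_pairs = [(3, 2), (5, 1), (5, 2), (2, 1), (3, 1)]
combined = combine_pairs(s_pairs)
print(combined)
```

Applied to the first five pairs of the example, the pass merges <3,2> <5,1> and <2,1> <3,1> and leaves <5,2> unchanged.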

Encoding the delta file

Δ(S,T) = <3,3> <5,2> <2,2> c (4,2) <9,3> b (4,2)

File consists of:

(pos, len) in S

(pos, len) in T

Characters

flags
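To illustrate how the three kinds of items fit together, here is a minimal decoder sketch. The tagged-tuple representation stands in for the flags, whose actual bit layout the slides do not give; the function name decode_delta and the item order in the usage example are likewise assumptions (the order shown is the one that reconstructs T). A pair tagged "T" copies from the part of T decoded so far.

```python
def decode_delta(source, delta):
    """Rebuild T from S and a delta of literals and tagged copy items.

    Items are either a literal string or (origin, pos, length), with
    origin "S" or "T" and 1-based pos. "T" refers to the output so far.
    """
    target = ""
    for item in delta:
        if isinstance(item, str):
            target += item
            continue
        origin, pos, length = item
        text = source if origin == "S" else target
        target += text[pos - 1:pos - 1 + length]
    return target


S = "abccbaaabccba"
delta = [("S", 3, 3), ("S", 5, 2), ("S", 2, 2), "c",
         ("T", 4, 2), ("S", 9, 3), "b", ("T", 4, 2)]
T = decode_delta(S, delta)
print(T)
```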

Experiments:

S = xfig.3.2.1
T = xfig.3.2.2

File         Size
|T|          812K
|Gzip(T)|    325K
|LZW(T)|     497K
|Δ(S,T)|     3K