New Lower and Upper Bounds for Representing Sequences

Djamal Belazzougui and Gonzalo Navarro

LIAFA, Univ. Paris Diderot - Paris 7 and University of Chile

Sequence representation

Sequence of length n over an alphabet of size σ.

Goal: support three operations: select, access, and rank.

Many applications in compressed and succinct data structures: text indexing (FM-index), representations of graphs, permutations, etc.

We work in the word-RAM model with w-bit words.

Sequence representation (space)

The plain sequence occupies n log σ bits of space.

Compact space: O(n log σ) bits.

Succinct space: n log σ (1 + o(1)) bits.

Compressed space: nH0 (1 + o(1)) bits or nHk (1 + o(1)) bits.

H0 and Hk are the empirical entropies of order 0 and order k of the sequence.
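As a rough illustration of the gap between plain and compressed space, the zero-order empirical entropy H0 can be computed directly from symbol frequencies. The sketch below is only an assumption-laden example: the function names and the sample string are hypothetical, and σ is taken to be the number of distinct symbols that actually occur.

```python
import math
from collections import Counter

def empirical_h0(seq):
    """Zero-order empirical entropy of seq, in bits per symbol."""
    n = len(seq)
    counts = Counter(seq)
    # Sum over symbols c of (n_c / n) * log2(n / n_c).
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def space_bounds(seq):
    """Return (n log sigma, n * H0) in bits, ignoring lower-order terms."""
    n = len(seq)
    sigma = len(set(seq))  # assumption: alphabet = symbols that occur
    plain_bits = n * math.log2(sigma) if sigma > 1 else 0.0
    return plain_bits, n * empirical_h0(seq)

sample = "abracadabra"  # hypothetical input
plain, compressed = space_bounds(sample)
print(f"n log sigma = {plain:.1f} bits, n H0 = {compressed:.1f} bits")
```

For a skewed distribution, nH0 is smaller than n log σ; for a uniform one the two coincide.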

Sequence representation (Operations)

select(c, i) −→ position of the i-th occurrence of character c.

rank(c, i) −→ number of occurrences of character c before position i.

access(i) −→ character at position i in the sequence.
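The three operations can be pinned down with a naive reference implementation. This is only a sketch for checking definitions: every query scans the sequence, so it gives none of the time or space bounds discussed in the talk. Positions are 1-based, and rank is assumed here to count occurrences in positions 1..i inclusive (the usual convention). The class name is hypothetical.

```python
class NaiveSequence:
    """Reference semantics for access, rank and select (linear-time scans)."""

    def __init__(self, symbols):
        self.s = list(symbols)

    def access(self, i):
        """Character at 1-based position i."""
        return self.s[i - 1]

    def rank(self, c, i):
        """Occurrences of c in positions 1..i (inclusive, by assumption)."""
        return self.s[:i].count(c)

    def select(self, c, i):
        """1-based position of the i-th occurrence of c."""
        seen = 0
        for pos, sym in enumerate(self.s, start=1):
            if sym == c:
                seen += 1
                if seen == i:
                    return pos
        raise ValueError("fewer than i occurrences of c")

seq = NaiveSequence("acabca")
print(seq.access(4), seq.rank("a", 3), seq.select("c", 2))  # b 2 5
```

Note that select(c, rank(c, p)) = p whenever position p holds c, which is a handy sanity check for any implementation.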

Sequence representation (Select)

select(c, i) −→ position of the i-th occurrence of character c.

Examples: select(a, 3) = 4, select(c, 1) = 7, select(b, 3) = 12.

[Figure: a 12-character sequence over {a, b, c} with positions 4, 7 and 12 marked as select(a,3), select(c,1) and select(b,3).]

Sequence representation (access)

access(i) −→ character at position i in the sequence.

Examples: access(4) = a, access(7) = c, access(12) = b.

[Figure: the 12-character sequence with positions numbered 1 to 12; access(4), access(7) and access(12) point at positions 4, 7 and 12.]

Sequence representation (rank)

rank(c, i) −→ number of occurrences of character c before position i.

Examples: rank(b, 4) = 1, rank(a, 7) = 3, rank(c, 12) = 2.

[Figure: the sequence annotated with the three example queries rank(b,4) = 1, rank(a,7) = 3 and rank(c,12) = 2.]

Sequence representation (main results)

Rank on a sequence representation −→ equivalent to predecessor search.

Predecessor search over n elements from a universe of size nσ.

Equivalent to the rank operation on a sequence of n elements over an alphabet of size σ.

Sequence representation (main results)

Optimal time for the rank operation: O(log(log σ / log w)).

Optimal space: O(n log σ) bits.

Select and access are also supported in constant time (already known before).

We also get succinct (n log σ (1 + o(1)) bits) and compressed (nH0 (1 + o(1)) bits) space, with slower access or slower select.

Our new upper bounds

H0: zero-order entropy of the sequence. Hk: entropy of order k, with k = o(n log σ).

f(n, σ) = ω(1) and f(n, σ) = o(log(log σ / log w)).

Source  Space (bits)        Access                 Select                 Rank
Old     O(n log σ)          O(1)                   O(1)                   O(log log σ)
Old     nH0 + o(nH0 + n)    O(log σ / log log n)   O(log σ / log log n)   O(log σ / log log n)
Old     nHk + o(n log σ)    O(1)                   O(log log σ)           O(log log σ)
New     O(n log σ)          O(1)                   O(1)                   Θ(log(log σ / log w))
New     nH0 + o(nH0 + n)    O(log σ / log w)       O(log σ / log w)       O(log σ / log w)
New     nH0 + o(nH0 + n)    O(f(n, σ))             O(f(n, σ))             Θ(log(log σ / log w))
New     nHk + o(n log σ)    O(1)                   O(f(n, σ))             Θ(log(log σ / log w))

Sequence representation (Lower bound)

Predecessor search over n elements from a universe of size u = nσ.

Reduces to the rank operation on a sequence of length n over an alphabet of size σ.

Use the optimal Patrascu-Thorup result.

Query time o(log(log(u/n) / log w)) = o(log(log σ / log w)) −→ space ω(n w^c) for any constant c.

Sequence representation (Lower bound reduction)

Colored predecessor: n = 6, σ = 3, u = nσ = 18.

[Figure: the universe 1..18 drawn as a grid with columns 1 to 6 and rows a, b, c; the query predcolor(10) is marked.]

Sequence representation (Lower bound reduction)

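[Figure: the elements of the set placed on the grid with rows a, b, c, the translated column labels (1', 2, 2', 4, 6), and the resulting sequence a c a b c a; the query predcolor(10) is marked.]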

Sequence representation (Lower bound reduction)

Colored predecessor search data structure:

Color bit-vector: marks the color of each element of the set (n bits), 1 for green and 0 for red.

Row bit-vector: encodes the number of elements in each row. A row of t elements −→ t 1s followed by one 0 (n + σ bits in total).

Column bit-vector: encodes the number of elements in each column in the same way (2n bits).

Sequence representation of n elements over an alphabet of size σ.

Overall space −→ about 3n + σ bits plus the space for the sequence representation.
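The unary encoding of the row and column bit-vectors can be illustrated with a small sketch. The counts below are assumptions chosen to match the example with n = 6 and σ = 3 (rows a, b, c holding 3, 1 and 2 elements). The helper names are hypothetical, and the linear-scan select0 is only a stand-in for a constant-time select structure.

```python
def unary(counts):
    """Encode each count t as t ones followed by a single zero."""
    bits = []
    for t in counts:
        bits.extend([1] * t + [0])
    return bits

def select0(bits, k):
    """1-based position of the k-th zero; 0 when k == 0 (linear scan)."""
    if k == 0:
        return 0
    seen = 0
    for pos, bit in enumerate(bits, start=1):
        if bit == 0:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k zeros")

def count_in_first(bits, k):
    """Number of elements in the first k rows (or columns)."""
    # Up to the k-th zero there are exactly k zeros; the rest are ones.
    return select0(bits, k) - k

row_bits = unary([3, 1, 2])           # n + sigma = 9 bits
col_bits = unary([2, 2, 0, 1, 0, 1])  # 2n = 12 bits
print(len(row_bits), len(col_bits))   # 9 12
print(count_in_first(row_bits, 1), count_in_first(col_bits, 4))  # 3 5
```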

Sequence representation (Lower bound reduction)

We want to answer colorpred(x).

Translate x into a 2D point −→ column j = x mod n, row i = x div n.

Compute r0, the number of elements in rows [1..i − 1], in constant time using select(0, i − 1) − (i − 1) on the row bit-vector.

Find the translated column j′ of j using select(0, j) − j on the column bit-vector.

Compute r1 = rank(i, j′) on the sequence representation.

The final answer is color[r0 + r1].
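The query procedure can be sketched end to end as follows. This is a toy version under stated assumptions, not the authors' implementation: select and rank are done by linear scans, keys are 1-based and shifted by one before the div/mod split so that 1..nσ maps onto rows 1..σ and columns 1..n, and the key set and colors are made up (the keys were chosen so that the sequence comes out as a c a b c a, written 1 3 1 2 3 1).

```python
def unary(counts):
    """Encode each count t as t ones followed by a single zero."""
    bits = []
    for t in counts:
        bits.extend([1] * t + [0])
    return bits

def select0(bits, k):
    """1-based position of the k-th zero; 0 when k == 0 (linear scan)."""
    if k == 0:
        return 0
    seen = 0
    for pos, bit in enumerate(bits, start=1):
        if bit == 0:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k zeros")

class ColoredPredecessor:
    def __init__(self, keys, colors, n, sigma):
        # keys: sorted distinct integers in 1..n*sigma; colors[k] goes with keys[k].
        self.n = n
        self.colors = list(colors)
        cells = [((x - 1) // n + 1, (x - 1) % n + 1) for x in keys]  # (row, column)
        row_counts = [0] * sigma
        col_counts = [0] * n
        for row, col in cells:
            row_counts[row - 1] += 1
            col_counts[col - 1] += 1
        self.row_bits = unary(row_counts)
        self.col_bits = unary(col_counts)
        # Row labels of the points, listed column by column.
        self.seq = [row for row, col in sorted(cells, key=lambda rc: (rc[1], rc[0]))]

    def colorpred(self, x):
        """Color of the largest key <= x, or None if there is none."""
        i = (x - 1) // self.n + 1                     # row of x
        j = (x - 1) % self.n + 1                      # column of x
        r0 = select0(self.row_bits, i - 1) - (i - 1)  # keys in rows 1..i-1
        j_prime = select0(self.col_bits, j) - j       # points in columns 1..j
        r1 = self.seq[:j_prime].count(i)              # naive rank(i, j')
        k = r0 + r1                                   # number of keys <= x
        return self.colors[k - 1] if k > 0 else None

keys = [1, 2, 6, 8, 13, 16]
colors = ["green", "red", "red", "green", "red", "green"]  # made-up colors
ds = ColoredPredecessor(keys, colors, n=6, sigma=3)
print(ds.seq)            # [1, 3, 1, 2, 3, 1]
print(ds.colorpred(10))  # green (predecessor is 8, the 4th key)
```

The only non-constant-time step in a real structure is the rank on the sequence, which is why a fast rank would yield a fast colored predecessor search.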

Sequence representation (Lower bound reduction)

[Figure: the grid with columns 1 to 6 and rows a, b, c, the translated column labels (1', 2, 2', 4, 6), the sequence a c a b c a, and the color array; the query is predcolor(10).]

x = 10, i = b, j = 4

r_0 = select(0,1) − 1 = 3 (on row vector)

j′ = select(0,4) − 4 = 5 (on column vector)

r_1 = rank(b,5) = 1 (on sequence)

color[r_1 + r_0] = color[1 + 3] = color[4]

Sequence representation (Lower bound reduction)

Reduction space −→ additive O(n + σ) bits (bit-vectors).

Reduction time −→ additive O(1) time (select queries on bit-vectors).

Compressed sequence representation (compact space upper bounds)

Optimal time for the rank operation: O(log(log σ / log w)).

Reduction to predecessor search: use the Patrascu-Thorup upper bound.

Optimal space: O(n log σ) bits.

Select and access are also supported in constant time (already known before).

Compressed sequence representation (compressed upper bounds)

Previously known: select, rank and access are supported in constant time when σ = log^c n for some constant c.

New result: all three operations in constant time for σ = w^c.

Time improved from O(log σ / log log n) to O(log σ / log w).

Useful only when log w = ω(log log n), that is, for a very large word size.
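To get a feel for when the new bound helps, the two expressions can be evaluated for concrete parameters. This is only a back-of-the-envelope sketch: it ignores the constants hidden in the O-notation, and the parameter values and function names are hypothetical.

```python
import math

def old_bound(sigma, n):
    """log sigma / log log n, without hidden constants."""
    return math.log2(sigma) / math.log2(math.log2(n))

def new_bound(sigma, w):
    """log sigma / log w, without hidden constants."""
    return math.log2(sigma) / math.log2(w)

n, sigma = 2**32, 2**16  # hypothetical sequence length and alphabet size
for w in (64, 2**16):
    print(f"w = {w}: old {old_bound(sigma, n):.2f}, new {new_bound(sigma, w):.2f}")
```

With w = 64 and n = 2^32, log w = 6 and log log n = 5 are nearly equal, so the two bounds barely differ; the gap only opens up when the word size is much larger.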

Compressed sequence representation (compressed and succinct upper bounds)

Replace tabulation with word-parallelism.

Support select and rank in constant time on blocks of w^(1/3) bits instead of blocks of log^ε n bits.

Get compressed and succinct space by combining with standard techniques.
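The flavor of word-parallelism can be conveyed with a generic bit trick that counts occurrences of a symbol inside one packed word without a lookup table. This is a standard SWAR-style illustration, not the construction from the talk: the field layout (b data bits plus one guard bit per symbol) and all function names are assumptions, and Python integers stand in for machine words.

```python
def pack(symbols, b):
    """Pack symbols (each < 2**b) into one integer, b + 1 bits per field."""
    word = 0
    for k, s in enumerate(symbols):
        word |= s << (k * (b + 1))
    return word

def broadcast(value, b, count):
    """Copy value into each of count fields (a precomputed constant in practice)."""
    return sum(value << (k * (b + 1)) for k in range(count))

def block_rank(word, c, i, b, count):
    """Occurrences of symbol c among the first i fields of word."""
    low = broadcast(1, b, count)          # 1 in the lowest bit of each field
    guard = broadcast(1 << b, b, count)   # 1 in the guard bit of each field
    x = word ^ broadcast(c, b, count)     # fields equal to c become zero
    y = (x | guard) - low                 # guard bit survives iff field != 0
    zero_fields = ~y & guard              # guard bit set exactly for zero fields
    prefix = (1 << (i * (b + 1))) - 1     # keep only the first i fields
    return bin(zero_fields & prefix).count("1")

symbols = [1, 3, 1, 2, 3, 1]              # a c a b c a with a=1, b=2, c=3
word = pack(symbols, b=2)
print(block_rank(word, c=1, i=5, b=2, count=6))  # 2
```

The subtraction never borrows across fields because every field has its guard bit set beforehand, which is what makes a single arithmetic operation act on all symbols at once.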

Conclusion

We show an optimal bound for rank queries over an alphabet of size σ.

Rank lower bound −→ o(log(log σ / log w)) time is impossible if the space is only O(n w^c) for any constant c.

Rank upper bound −→ O(log(log σ / log w)) time in compressed space.

All 3 operations in constant time when w is very large and σ is relatively small.

Conclusion

Main open problem −→ achieve improved select and access in compressed space?

The Golynski lower bound for access vs. select in compressed space does not match the known upper bounds for all σ.