Distributed Storage Allocation Problems

Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California Institute of Technology NetCod 2009 2009-06-16


Transcript of Distributed Storage Allocation Problems

Page 1: Distributed Storage Allocation Problems

Distributed Storage Allocation Problems
Derek Leong, Alexandros G. Dimakis, Tracey Ho
California Institute of Technology
NetCod 2009
2009-06-16

Page 2: Distributed Storage Allocation Problems

Motivation

Page 3: Distributed Storage Allocation Problems

Motivation

[Figure: storage nodes holding unknown amounts "?", with the labels "2" and "0.1" and the recovery condition "Σ ≥ 1?"]

Page 4: Distributed Storage Allocation Problems

Motivation

A: (1, 1, 0, 0, 0)

B: (2/5, 2/5, 2/5, 2/5, 2/5)

C: (1/2, 1/2, 1/2, 1/2, 0)

Page 5: Distributed Storage Allocation Problems

Motivation

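A: (1, 1, 0, 0, 0)

Success probability
$$
\begin{aligned}
&= 0.9^0 \times 0.1^5 \times 0 \quad \text{(successful 0-subsets)} \\
&+ 0.9^1 \times 0.1^4 \times 2 \quad \text{(successful 1-subsets)} \\
&+ 0.9^2 \times 0.1^3 \times 7 \quad \text{(successful 2-subsets)} \\
&+ 0.9^3 \times 0.1^2 \times 9 \quad \text{(successful 3-subsets)} \\
&+ 0.9^4 \times 0.1^1 \times 5 \quad \text{(successful 4-subsets)} \\
&+ 0.9^5 \times 0.1^0 \times 1 \quad \text{(successful 5-subsets)} \\
&= 0.99
\end{aligned}
$$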

Page 6: Distributed Storage Allocation Problems

Motivation

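B: (2/5, 2/5, 2/5, 2/5, 2/5)

Success probability
$$
\begin{aligned}
&= 0.9^0 \times 0.1^5 \times 0 \quad \text{(successful 0-subsets)} \\
&+ 0.9^1 \times 0.1^4 \times 0 \quad \text{(successful 1-subsets)} \\
&+ 0.9^2 \times 0.1^3 \times 0 \quad \text{(successful 2-subsets)} \\
&+ 0.9^3 \times 0.1^2 \times 10 \quad \text{(successful 3-subsets)} \\
&+ 0.9^4 \times 0.1^1 \times 5 \quad \text{(successful 4-subsets)} \\
&+ 0.9^5 \times 0.1^0 \times 1 \quad \text{(successful 5-subsets)} \\
&= 0.99144
\end{aligned}
$$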

Page 7: Distributed Storage Allocation Problems

Motivation

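C: (1/2, 1/2, 1/2, 1/2, 0)

Success probability
$$
\begin{aligned}
&= 0.9^0 \times 0.1^5 \times 0 \quad \text{(successful 0-subsets)} \\
&+ 0.9^1 \times 0.1^4 \times 0 \quad \text{(successful 1-subsets)} \\
&+ 0.9^2 \times 0.1^3 \times 6 \quad \text{(successful 2-subsets)} \\
&+ 0.9^3 \times 0.1^2 \times 10 \quad \text{(successful 3-subsets)} \\
&+ 0.9^4 \times 0.1^1 \times 5 \quad \text{(successful 4-subsets)} \\
&+ 0.9^5 \times 0.1^0 \times 1 \quad \text{(successful 5-subsets)} \\
&= 0.9963
\end{aligned}
$$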

Page 8: Distributed Storage Allocation Problems

Motivation

A: (1, 1, 0, 0, 0), success probability 0.99

B: (2/5, 2/5, 2/5, 2/5, 2/5), success probability 0.99144

C: (1/2, 1/2, 1/2, 1/2, 0), success probability 0.9963
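The three success probabilities above can be checked by brute force. The following is a minimal sketch, assuming (as the 0.9 and 0.1 factors in the sums suggest) that each node is accessed independently with probability p = 0.9. The function name `success_probability` is illustrative and does not come from the slides.

```python
from fractions import Fraction
from itertools import combinations


def success_probability(allocation, p):
    """Probability that the accessed nodes hold at least one unit of data.

    Assumes each node is accessed independently with probability p
    (the access model implied by the 0.9 / 0.1 factors in the slides).
    """
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        # Count the k-subsets whose stored amounts sum to at least 1.
        successful = sum(
            1 for subset in combinations(allocation, k) if sum(subset) >= 1
        )
        total += successful * p**k * (1 - p) ** (n - k)
    return total


p = Fraction(9, 10)
half, two_fifths = Fraction(1, 2), Fraction(2, 5)
allocations = {
    "A": [1, 1, 0, 0, 0],
    "B": [two_fifths] * 5,
    "C": [half, half, half, half, 0],
}
results = {name: success_probability(x, p) for name, x in allocations.items()}
for name, prob in results.items():
    print(name, float(prob))
```

Exact fractions are used so that borderline sums such as 1/2 + 1/2 compare against 1 without floating-point error. Enumerating all subsets is only practical for small n.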

Page 9: Distributed Storage Allocation Problems

Motivation

[Figure: the same storage diagram as before ("2", "?", "0.1", "Σ ≥ 1?"), now annotated with "allocation model" and "access model"]

Page 10: Distributed Storage Allocation Problems

Problem Description

How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?

• Storage Allocation
• Access by the Data Collector
• Objective

Page 11: Distributed Storage Allocation Problems

Problem Description

How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?

• Storage Allocation
  • Source s has a data object of unit size
  • It can use n storage nodes to store x1, x2, …, xn amount of data
  • But it faces an aggregate storage budget T, i.e. $\sum_{i=1}^{n} x_i \le T$
• Access by the Data Collector
• Objective

Page 12: Distributed Storage Allocation Problems

Problem Description

How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?

• Storage Allocation
• Access by the Data Collector
  • Data collector t attempts to recover the data object by accessing a subset r of storage nodes
  • It succeeds when the total amount of data accessed is at least the size of the data object, i.e. $\sum_{i \in \mathbf{r}} x_i \ge 1$
• Objective

Page 13: Distributed Storage Allocation Problems

Problem Description

How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?

• Storage Allocation
• Access by the Data Collector
• Objective
  • We seek the optimal allocation that maximizes the probability of successful recovery
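Putting the three parts together, the problem can be sketched as the optimization below. This is a reconstruction from the bullet points on the preceding slides, not a formula copied from them; the nonnegativity constraint and the notation $\mathbf{r}$ for the random subset of accessed nodes are assumptions.

```latex
\begin{aligned}
\underset{x_1, \ldots, x_n}{\text{maximize}} \quad
  & \Pr\Bigl[\, \sum_{i \in \mathbf{r}} x_i \ge 1 \Bigr] \\
\text{subject to} \quad
  & \sum_{i=1}^{n} x_i \le T, \\
  & x_i \ge 0, \quad i = 1, \ldots, n.
\end{aligned}
```

The probability is taken over the data collector's random choice of $\mathbf{r}$, which depends on the access model in use.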

Page 14: Distributed Storage Allocation Problems

Problem Description

How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?

• Difficulty
  • Problem is nonconvex
  • Large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise)

Page 15: Distributed Storage Allocation Problems

[1] Deterministic Allocation with Probabilistic Access

Data collector accesses each storage node independently with constant probability p

Page 16: Distributed Storage Allocation Problems

[1] Deterministic Allocation with Probabilistic Access

• Symmetric allocations can be suboptimal
• †Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation
• Finding the optimal symmetric allocation is also nontrivial

†Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley
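The suboptimality claim for n = 5, T = 12/5, p = 0.9 can be checked numerically. The sketch below enumerates every symmetric allocation (the budget split evenly over m nodes) and compares the best one against a single nonsymmetric allocation. That nonsymmetric allocation, (7/10, 7/10, 1/3, 1/3, 1/3), is a hypothetical choice made here for illustration and may differ from the one shown on the original slide; the function and variable names are likewise illustrative.

```python
from fractions import Fraction
from itertools import combinations


def success_probability(allocation, p):
    """Success probability when each node is accessed independently w.p. p."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        for subset in combinations(allocation, k):
            if sum(subset) >= 1:  # enough data to recover the unit-size object
                total += p**k * (1 - p) ** (n - k)
    return total


n, T, p = 5, Fraction(12, 5), Fraction(9, 10)

# Symmetric allocations: spread the whole budget evenly over m nodes.
symmetric = {
    m: success_probability([T / m] * m + [0] * (n - m), p)
    for m in range(1, n + 1)
}
best_m = max(symmetric, key=symmetric.get)

# Hypothetical nonsymmetric allocation chosen for illustration only;
# it may differ from the one shown on the original slide.
nonsymmetric = [Fraction(7, 10), Fraction(7, 10)] + [Fraction(1, 3)] * 3
nonsymmetric_prob = success_probability(nonsymmetric, p)

print("best symmetric: m =", best_m, float(symmetric[best_m]))
print("nonsymmetric:", float(nonsymmetric_prob))
```

Under these assumptions the best symmetric allocation uses four nodes, and the nonsymmetric allocation above achieves a strictly higher success probability with the same budget.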

Page 17: Distributed Storage Allocation Problems

[2] Deterministic Allocation with Fixed Access

Data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant
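Under this access model the success probability is simply the fraction of r-subsets that hold enough data. A minimal sketch follows; the function name `fixed_access_success` and the example allocations are hypothetical and not taken from the slides.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def fixed_access_success(allocation, r):
    """Fraction of r-subsets of nodes that hold at least one unit of data."""
    successful = sum(
        1 for subset in combinations(allocation, r) if sum(subset) >= 1
    )
    return Fraction(successful, comb(len(allocation), r))


# Hypothetical examples with 6 nodes, 2 accessed, and a budget of 2.
spread = [Fraction(1, 2)] * 4 + [0, 0]   # 1/2 on each of four nodes
concentrated = [1, 1, 0, 0, 0, 0]        # the whole object on two nodes
print(fixed_access_success(spread, 2))
print(fixed_access_success(concentrated, 2))
```

With the spread allocation a pair succeeds only if both accessed nodes are nonempty, while with the concentrated allocation one nonempty node is enough.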

Page 18: Distributed Storage Allocation Problems

[2] Deterministic Allocation with Fixed Access

• Equivalently, we can seek the allocation that minimizes the budget T, among all allocations that achieve a given probability of successful recovery

Page 19: Distributed Storage Allocation Problems

[2] Deterministic Allocation with Fixed Access

• Example: (n, r) = (6, 2)
• Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?

Page 20: Distributed Storage Allocation Problems

[2] Deterministic Allocation with Fixed Access

• Question: What is the optimal symmetric allocation?
• For most choices of (n, r, T), the optimal symmetric allocation either concentrates the budget over a minimal number of nodes, or spreads it out maximally
• An example of an exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes
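The exception (n, r, T) = (15, 3, 4.6) can be reproduced with a short search over symmetric allocations. The sketch below assumes a symmetric allocation splits the budget evenly over m nodes, so the number of nonempty nodes among the r accessed follows a hypergeometric distribution; the function names are illustrative and not from the slides.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_success(n, r, T, m):
    """Success probability when budget T is split evenly over m of n nodes
    and a uniformly random r-subset of the n nodes is accessed."""
    per_node = Fraction(T) / m
    # Smallest number of nonempty nodes the data collector must access.
    needed = ceil(1 / per_node)
    # Hypergeometric count of r-subsets containing at least `needed`
    # of the m nonempty nodes.
    hits = sum(
        comb(m, k) * comb(n - m, r - k) for k in range(needed, min(m, r) + 1)
    )
    return Fraction(hits, comb(n, r))


def best_symmetric(n, r, T):
    """Return the best number of nodes to use and all per-m probabilities."""
    probs = {m: symmetric_success(n, r, T, m) for m in range(1, n + 1)}
    return max(probs, key=probs.get), probs


best_m, probs = best_symmetric(15, 3, Fraction(23, 5))  # T = 4.6 exactly
print(best_m, float(probs[best_m]))
```

The budget is passed as the exact fraction 23/5 because a binary float for 4.6 could tip a borderline comparison the wrong way.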

Page 21: Distributed Storage Allocation Problems

[2] Deterministic Allocation with Fixed Access

• For Probability-1 Recovery, the problem reduces to a simple LP
• Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of
  $T = n/r$
  which corresponds to the allocation
  $x = (1/r, 1/r, \ldots, 1/r)$
  i.e. it is optimal to spread the budget maximally
• We can also bound the success probability above which this allocation is optimal
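One way to see why no smaller budget suffices for probability-1 recovery is a counting argument. The derivation below is a sketch added for illustration and is not taken from the slides; it assumes the recovery condition $\sum_{i \in \mathbf{r}} x_i \ge 1$ must hold for every r-subset $\mathbf{r}$.

```latex
% Sum the recovery constraint over all r-subsets; each node i
% appears in exactly \binom{n-1}{r-1} of them.
\sum_{\mathbf{r}:\,|\mathbf{r}| = r} \; \sum_{i \in \mathbf{r}} x_i
  = \binom{n-1}{r-1} \sum_{i=1}^{n} x_i
  \;\ge\; \binom{n}{r}
\quad\Longrightarrow\quad
\sum_{i=1}^{n} x_i \;\ge\; \frac{\binom{n}{r}}{\binom{n-1}{r-1}} = \frac{n}{r}.
```

Storing 1/r in every node meets this bound with equality, since any r nodes then hold exactly one unit of data.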

Page 22: Distributed Storage Allocation Problems

[3] Symmetric Probabilistic Allocation with Fixed Access

Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most budget T in expectation

Page 23: Distributed Storage Allocation Problems

[3] Symmetric Probabilistic Allocation with Fixed Access

• Probability of successful recovery can be written as
  $\Pr[\operatorname{Bin}(r, s/n) \ge \ell]$
  where "Bin(n, p)" denotes the binomial random variable with n trials and success probability p
• Reparameterizing in terms of budget T gives the success probability P(r, T, ℓ)

(each nonempty node stores 1/ℓ amount of data)
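The reparameterization step can be sketched as follows. This is a reconstruction rather than the slide's own formula: it assumes the expected-storage constraint is met with equality, and it is valid only while $\ell T \le n$ so that the usage probability does not exceed 1.

```latex
% Expected total storage: n nodes, each used w.p. s/n, each storing 1/\ell.
\mathbb{E}[\text{storage}]
  = n \cdot \frac{s}{n} \cdot \frac{1}{\ell}
  = \frac{s}{\ell} = T
\quad\Longrightarrow\quad
s = \ell T,
\qquad
P(r, T, \ell)
  = \Pr\Bigl[\operatorname{Bin}\bigl(r, \tfrac{\ell T}{n}\bigr) \ge \ell\Bigr].
```

Here the data collector succeeds exactly when at least ℓ of the r accessed nodes are nonempty, because each nonempty node contributes 1/ℓ of the data object.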

Page 24: Distributed Storage Allocation Problems

[3] Symmetric Probabilistic Allocation with Fixed Access

• Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice ℓ = r is optimal, i.e. it is best to spread the budget maximally

(each nonempty node stores 1/ℓ amount of data)

Page 25: Distributed Storage Allocation Problems

[3] Symmetric Probabilistic Allocation with Fixed Access

• As we increase the budget T, we observe a sharp change in the optimal allocation
• For small budgets and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes
• For large budgets and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them

[Plot: r = 5]
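The sharp change in the optimal ℓ can be explored numerically for r = 5. The sketch below rests on an assumption not stated explicitly in the slides: that each node is used with probability ℓT/n (capped at 1), so that the expected storage equals the budget T. The value n = 100 and the function names are arbitrary illustrative choices.

```python
from math import comb


def recovery_probability(r, T, n, ell):
    """P[Bin(r, q) >= ell], with q = min(ell * T / n, 1).

    Assumption (not stated explicitly in the slides): the expected storage
    n * q / ell uses the whole budget T, so q = ell * T / n.
    """
    q = min(ell * T / n, 1.0)
    return sum(
        comb(r, k) * q**k * (1 - q) ** (r - k) for k in range(ell, r + 1)
    )


def best_ell(r, T, n):
    """Value of ell in 1..r that maximizes the recovery probability."""
    return max(
        range(1, r + 1), key=lambda ell: recovery_probability(r, T, n, ell)
    )


r, n = 5, 100  # n = 100 is an arbitrary illustrative choice
for T in (5, 10, 15, 18, 19, 20):
    print(T, best_ell(r, T, n))
```

Sweeping T this way shows the maximizer jumping between the two extremes rather than passing through intermediate values of ℓ, at least for the budgets sampled here. It is a spot check, not a proof.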

Page 26: Distributed Storage Allocation Problems

[3] Symmetric Probabilistic Allocation with Fixed Access

• We conjecture that for any r and T, the optimal choice of ℓ that maximizes success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r

(each nonempty node stores 1/ℓ amount of data)

[Plot: r = 5]

Page 27: Distributed Storage Allocation Problems

[3] Symmetric Probabilistic Allocation with Fixed Access

• We conjecture that for any r and T, the optimal choice of ℓ that maximizes success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r

(each nonempty node stores 1/ℓ amount of data)

[Plot: r = 5, annotated "store less", "store more", and "increasing budget per node"]

Page 28: Distributed Storage Allocation Problems

Summary & Future Work

[1] Deterministic Allocation with Probabilistic Access
• Suboptimality of symmetric allocations

[2] Deterministic Allocation with Fixed Access
• Optimal allocation for high probability recovery
• Extreme point solutions not necessarily optimal for symmetric allocations
• Is there always a symmetric optimal allocation?

[3] Symmetric Probabilistic Allocation with Fixed Access
• Optimal allocation in high-probability regime
• Is there a phase transition in optimal allocation with increasing budget?

Page 29: Distributed Storage Allocation Problems

Distributed Storage Allocation Problems
Derek Leong, Alexandros G. Dimakis, Tracey Ho
California Institute of Technology
NetCod 2009
2009-06-16