
1

The Packing Server for Real-time Scheduling of MapReduce Workflows

Shen Li, Shaohan Hu, Tarek Abdelzaher
University of Illinois at Urbana Champaign

2

Generalized Parallel Tasks

Parallel Tasks

A.K.A. Pipeline Model A.K.A. Workflow Model

3

Significance

[Figure: tasks τ1, τ2, …, τn are each handled by their own packing server (Packing Server 1, Packing Server 2, …, Packing Server n) with an app-level scheduler (App-Sched), on top of an underlying independent scheduler. Each server is labeled "Utilization Bound".]

1. The main contribution is the notion of a packing server.

2. Packing servers allow graphs of tasks with precedence constraints to be converted to a set of budgets that the underlying scheduler treats as independent.

- This is achieved by the app-level scheduler, which schedules the workload inside the server.

3. As a result, we are able to convert bounds for independent tasks into equivalent bounds for parallel tasks.

4. This leads to the notion of a conversion bound.

5. Using this approach, we derive bounds for parallel task models that beat the best known ones.

6. We apply the approach to MapReduce.

4

Independent vs Parallel Tasks

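In MapReduce applications: m >> 1, D >> L

Comparing Utilization Bound (1/β is the max util. of any task; percentages assume β = 5)

Scheduler    Independent                                                                    Parallel
G-EDF        $\frac{m - (m-1)/\beta}{m} \approx 80\%$                                       38.2%
G-RM         $\frac{\frac{m}{2}(1 - \frac{1}{\beta}) + \frac{1}{\beta}}{m} \approx 40\%$    26.8%
Federated    -                                                                              50%
EDF-FF       $\frac{m\beta + 1}{(\beta + 1)\,m} \approx 80\%$                               -
EDF-FFD      $\frac{m(1 - \frac{1}{\beta}) + \frac{1}{\beta}}{m} \approx 80\%$              -
RM-ST        $\frac{(m-2)(1 - \frac{1}{\beta}) + 1 - \ln 2}{m} \approx 80\%$                -
EDZL         $\frac{m(1 - \frac{1}{e})}{m} \approx 63\%$                                    -

[Li et al. ECRTS’14] [Davis et al. ACM Computing Surveys’11]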

5

The Conversion Bound

Parallel Task Set Utilization Bound = $U_B \times \frac{\varphi - \beta}{\varphi}$, where $U_B$ is the Independent Task Set Utilization Bound.

● $\varphi$: the stretch, i.e., deadline over critical path length

● $\beta$: the reciprocal of the maximum task utilization

6

An Example of φ=30, β=5

Scheduler    Independent    Parallel (Interdependent)    Using Conversion
G-EDF        80%            38.2%                        80% × (30−5)/30 ≈ 67%
G-RM         40%            26.8%                        40% × (30−5)/30 ≈ 33%
Federated    -              50%                          -
EDF-FF       80%            -                            80% × (30−5)/30 ≈ 67%
EDF-FFD      80%            -                            80% × (30−5)/30 ≈ 67%
RM-ST        80%            -                            80% × (30−5)/30 ≈ 67%
EDZL         63%            -                            63% × (30−5)/30 ≈ 52.5%
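The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the conversion formula $U_B \cdot \frac{\varphi - \beta}{\varphi}$ as stated on the slides; the function name `conversion_bound` is a hypothetical helper, not something from the paper.

```python
def conversion_bound(independent_bound, phi, beta):
    """Scale an independent-task utilization bound by (phi - beta) / phi.

    independent_bound: utilization bound for independent tasks (0..1)
    phi: stretch (deadline over critical path length)
    beta: reciprocal of the maximum task utilization
    The name and signature are illustrative assumptions.
    """
    if not 0 < beta < phi:
        raise ValueError("the formula is only meaningful for 0 < beta < phi")
    return independent_bound * (phi - beta) / phi


if __name__ == "__main__":
    # Independent-task bounds quoted on the slide for beta = 5.
    for ub in (0.80, 0.63, 0.40):
        converted = conversion_bound(ub, phi=30, beta=5)
        print(f"{ub:.0%} -> {converted:.1%}")
```

With φ = 30 and β = 5 the scaling factor is 25/30, so every independent bound keeps about 83% of its value after conversion.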

7

Construct a Packing Server for a Pipeline

Two questions:

1. How to schedule the pipeline in its budgets?

2. What is the conversion bound when using this technique?

[Figure: a pipeline with deadline Di.]

Pack to min parallelism without violating the deadline

8

The App-Scheduler

[Figure panels: Before Packing, After Packing. The budget portions are shown with the time instant t marked.]

1. Find the time instant t such that the cumulative execution time before t equals the total budget size of the first phase.

2. Schedule each phase in its corresponding budget portions using a best-fit-like algorithm.

3. For each phase, process one segment at a time. Lay each segment into the budget portions from right to left, starting from the smallest budget portion. Skip any parallelism conflict.

4. This algorithm is guaranteed to schedule every phase in its own budget portions, using a simulation of the next Dmax time units. Please refer to the paper for more details.
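Step 1 can be illustrated with a small sketch. It assumes, purely for illustration, that the budget schedule is given as sorted, non-overlapping intervals `(start, end, parallelism)`, where `parallelism` is the number of budgets active in that interval; this representation and the name `find_split_time` are assumptions and do not come from the paper.

```python
def find_split_time(budget_intervals, first_phase_budget):
    """Return the instant t at which the cumulative budget equals first_phase_budget.

    budget_intervals: sorted, non-overlapping (start, end, parallelism) tuples
                      (a hypothetical representation of the budget schedule).
    first_phase_budget: total budget size reserved for the first phase.
    Returns None if the intervals never supply that much budget.
    """
    supplied = 0.0
    for start, end, parallelism in budget_intervals:
        capacity = (end - start) * parallelism
        if parallelism > 0 and supplied + capacity >= first_phase_budget:
            # The target is reached inside this interval: interpolate linearly.
            return start + (first_phase_budget - supplied) / parallelism
        supplied += capacity
    return None


if __name__ == "__main__":
    schedule = [(0, 2, 3), (2, 5, 1), (5, 6, 2)]
    print(find_split_time(schedule, 8))  # 6 units by t = 2, then 1 unit per time step
```

The remaining steps (best-fit-like placement, right-to-left filling, skipping parallelism conflicts) depend on details given only in the paper and are not sketched here.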

9

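The Conversion Bound for M-R Pipelines

Workflow Job i

-deadline ($D_i$)

-crit. path len ($L_i$)

-Stretch ($\varphi_i$)

-budget utilization bound ($1/\beta$)

-# of segments ($m_i$)

Phase j

-# of segments ($m_i^j$)

-WCET ($c_i^j$)

Task τi

-utilization ($u_i$); $u_i'$ denotes the utilization after virtual segments are added

● Lower bound of total WCET:

$$\sum_j m_i^j c_i^j \ge (m_i - 1)\left(\frac{\varphi_i}{\beta} - 1\right)\sum_j c_i^j$$

● Relative increase in utilization, where $m_i - m_i^j$ is the # of virtual segments in phase j:

$$\frac{u_i' - u_i}{u_i} = \frac{\sum_{\{j \mid m_i^j < m_i\}} (m_i - m_i^j)\, c_i^j}{\sum_j m_i^j c_i^j} \le \frac{(m_i - 1)\sum_{\{j \mid m_i^j < m_i\}} c_i^j}{\sum_j m_i^j c_i^j} \quad \text{as } m_i^j \ge 1$$

$$\frac{u_i' - u_i}{u_i} \le \frac{(m_i - 1)\sum_{\{j \mid m_i^j < m_i\}} c_i^j}{(m_i - 1)\left(\frac{\varphi_i}{\beta} - 1\right)\sum_j c_i^j} \le \frac{\beta}{\varphi_i - \beta} \quad \text{as } \sum_{\{j \mid m_i^j < m_i\}} c_i^j \le \sum_j c_i^j$$

● The conversion bound:

$$u_i' \le u_i \cdot \frac{\varphi_i}{\varphi_i - \beta} \le U_B \quad\Longrightarrow\quad u_i \le U_B \cdot \frac{\varphi_i - \beta}{\varphi_i}$$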
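A numeric check of the derivation on this slide can make the inequalities concrete. The sketch below evaluates the relative utilization increase caused by virtual segments and compares it with β/(φ − β). The pipeline, the function names, and the value m_i = 5 are illustrative assumptions. m_i = 5 is the minimum concurrency for this pipeline under the fluid packed-length expression used on the backup slide, with budget deadline D_i/β = 20.

```python
def virtual_overhead_ratio(phases, m_i):
    """(u_i' - u_i) / u_i for a pipeline packed to concurrency m_i.

    phases: list of (segments, wcet) pairs, i.e. (m_i^j, c_i^j).
    Phases narrower than m_i are padded with (m_i - m_i^j) virtual segments.
    """
    real_work = sum(m * c for m, c in phases)
    virtual_work = sum((m_i - m) * c for m, c in phases if m < m_i)
    return virtual_work / real_work


def wcet_lower_bound(phases, m_i, phi, beta):
    """Right-hand side of the lower bound on total WCET from the slide."""
    critical_path = sum(c for _, c in phases)
    return (m_i - 1) * (phi / beta - 1) * critical_path


if __name__ == "__main__":
    phases = [(40, 1.0), (2, 1.0), (20, 2.0)]  # hypothetical pipeline, L_i = 4
    phi, beta, m_i = 20, 4, 5                  # D_i = 80, budget deadline D_i / beta = 20
    ratio = virtual_overhead_ratio(phases, m_i)
    print(f"overhead ratio   = {ratio:.4f}")
    print(f"beta/(phi-beta)  = {beta / (phi - beta):.4f}")
    print(f"total WCET       = {sum(m * c for m, c in phases)}")
    print(f"WCET lower bound = {wcet_lower_bound(phases, m_i, phi, beta)}")
```

In this example only the two-segment phase needs virtual segments, so the overhead is far below the worst-case ratio; the bound is tight only for pipelines dominated by narrow phases.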

10

Transform Workflow into Pipeline

[Figure: example workflow with stages labeled (m = 3, c = 7), (m = 2, c = 5), (m = 6, c = 5), (m = 4, c = 3), (m = 3, c = 3), (m = 2, c = 2); time axis t from 0 to 20.]

● Introducing no computational penalty

● Respecting dependencies

● Preserving critical path length

11

Summary

1. The packing operation packs a pipeline into minimum parallelism.

2. The app-scheduler schedules the pipeline into budgets using underlying-scheduler simulations.

3. Prove the conversion bound, $\frac{\varphi - \beta}{\varphi}$, by analyzing the upper bound on the amount of introduced virtual execution time.

4. Translate the workflow into a pipeline without introducing virtual computation overhead or lengthening the critical path.

12

Evaluation: Algorithms

1. Packing & EDF-FF: The packing server uses EDF First-Fit as the underlying scheduler. Independent tasks are partitioned into the first resource slot that does not violate the 100% utilization bound.

2. Packing & GEDF: The packing server uses GEDF as the underlying scheduler. GEDF assigns the highest priority to the job with the most urgent deadline.

3. GEDF: The workflow with the most urgent deadline gets the highest priority.

4. Federated: Each high-utilization task (u ≥ 1) is assigned a set of dedicated cores, and the remaining low-utilization tasks share the remaining cores.
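The first-fit partitioning used by the EDF-FF baseline can be sketched as follows. This is a generic first-fit by utilization, assuming each resource slot can hold tasks up to a total utilization of 100%; the function name and the choice to return `None` when a task does not fit are illustrative assumptions, not details from the paper or the prototype.

```python
def edf_first_fit(utilizations, num_slots):
    """Assign each task to the first slot whose total utilization stays <= 1.0.

    utilizations: per-task utilizations, in arrival order.
    num_slots: number of resource slots (processors).
    Returns a list of slot indices, or None if some task fits nowhere.
    """
    load = [0.0] * num_slots
    assignment = []
    for u in utilizations:
        for slot in range(num_slots):
            if load[slot] + u <= 1.0:
                load[slot] += u
                assignment.append(slot)
                break
        else:
            # No slot can take this task without exceeding 100% utilization.
            return None
    return assignment


if __name__ == "__main__":
    print(edf_first_fit([0.6, 0.5, 0.3, 0.4], num_slots=2))
    print(edf_first_fit([0.7, 0.7, 0.7], num_slots=2))
```

Within each slot, the tasks would then be scheduled by uniprocessor EDF, which is why the per-slot limit is 100%.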

13

Evaluation: Compute β

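Packing & EDF-FF

$$U_B \cdot \frac{\varphi - \beta}{\varphi} = \frac{m\beta + 1}{m(\beta + 1)} \cdot \frac{\varphi - \beta}{\varphi}$$

By taking the derivative with respect to β, the highest utilization bound is achieved at:

$$\beta = \sqrt{\frac{(\varphi + 1)(m - 1)}{m}} - 1$$

Packing & GEDF

$$U_B \cdot \frac{\varphi - \beta}{\varphi} = \frac{m\beta - m + 1}{m\beta} \cdot \frac{\varphi - \beta}{\varphi}$$

Similarly:

$$\beta = \sqrt{\frac{\varphi(m - 1)}{m}}$$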
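The closed forms for β can be evaluated directly. The sketch below computes the optimal β and the resulting converted bound for both configurations; the function names are hypothetical helpers. For φ = 20 and φ = 30 with m = 500 it gives β values that round to those quoted in the evaluation slides, and bounds close to the quoted percentages.

```python
import math


def ub_edf_ff(beta, m):
    """Independent-task bound for EDF-FF, normalized by m."""
    return (m * beta + 1) / (m * (beta + 1))


def ub_gedf(beta, m):
    """Independent-task bound for GEDF, normalized by m."""
    return (m * beta - m + 1) / (m * beta)


def best_beta_edf_ff(phi, m):
    """Beta maximizing ub_edf_ff(beta, m) * (phi - beta) / phi."""
    return math.sqrt((phi + 1) * (m - 1) / m) - 1


def best_beta_gedf(phi, m):
    """Beta maximizing ub_gedf(beta, m) * (phi - beta) / phi."""
    return math.sqrt(phi * (m - 1) / m)


def converted_bound(ub, phi, beta):
    """Apply the conversion factor (phi - beta) / phi."""
    return ub * (phi - beta) / phi


if __name__ == "__main__":
    m = 500
    for phi in (20, 30):
        b_ff = best_beta_edf_ff(phi, m)
        b_g = best_beta_gedf(phi, m)
        print(f"phi={phi}: EDF-FF beta={b_ff:.2f}, "
              f"bound={converted_bound(ub_edf_ff(b_ff, m), phi, b_ff):.3f}")
        print(f"phi={phi}: GEDF   beta={b_g:.2f}, "
              f"bound={converted_bound(ub_gedf(b_g, m), phi, b_g):.3f}")
```

Because m is large, the (m − 1)/m factor is close to 1 and the optimal β is roughly √(φ + 1) − 1 for EDF-FF and √φ for GEDF.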

14

Evaluation: Accepted Utilization

Workflows are generated based on Yahoo! WebScope data.

Set φ = 20, m = 500 (small granularity)

Compute β = 3.58 for Packing & EDF-FF, β = 4.47 for Packing & GEDF

Theoretical utilization bounds:

Packing & EDF-FF: 64%

Packing & GEDF: 60.3%

Federated: 50% [Li et al. ECRTS’14]

GEDF: 38.2% [Li et al. ECRTS’14]

Domino effect

15

Evaluation: Accepted Utilization

Workflows are generated based on Yahoo! WebScope data.

Set φ = 30, m = 500 (small granularity)

Compute β = 4.56 for Packing & EDF-FF, β = 5.47 for Packing & GEDF

Theoretical utilization bounds:

Packing & EDF-FF: 70%

Packing & GEDF: 66.9%

Federated: 50% [Li et al. ECRTS’14]

GEDF: 38.2% [Li et al. ECRTS’14]

Domino effect

16

Evaluation: Admission Control

Workflows are generated based on Yahoo! WebScope data.

Implemented a prototype on WOHA [Li et al., ICDCS’14], a variant of Hadoop.

Submitted a set of tasks with a total utilization above 100%.

Admission control is enforced at the theoretical utilization bound.

Set φ = 20, m = 160 (small granularity)
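Admission control at a utilization bound can be sketched as a simple running-sum test. The sketch assumes the bound is normalized by the number of processors m (as the percentages on the earlier slides are), and that a rejected task does not block later, smaller tasks. Both points, and the name `admit_tasks`, are assumptions made for illustration and are not taken from the WOHA prototype.

```python
def admit_tasks(utilizations, normalized_bound, num_processors):
    """Admit tasks in order while total utilization stays within the bound.

    utilizations: per-task utilizations, in submission order.
    normalized_bound: theoretical utilization bound as a fraction of m (0..1).
    num_processors: m, the number of processors (resource slots).
    Returns (indices of admitted tasks, total admitted utilization).
    """
    capacity = normalized_bound * num_processors
    admitted, total = [], 0.0
    for index, u in enumerate(utilizations):
        if total + u <= capacity:
            admitted.append(index)
            total += u
        # Otherwise the task is rejected; later tasks are still considered.
    return admitted, total


if __name__ == "__main__":
    # Offered load is 1.6 on 2 processors; the bound admits at most 1.2.
    print(admit_tasks([0.5, 0.4, 0.5, 0.2], normalized_bound=0.6, num_processors=2))
```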

17

Thank You!

Q & A

18

The Conversion Bound for M-R Pipelines

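Workflow Job i

-deadline ($D_i$)

-crit. path len ($L_i$)

-Stretch ($\varphi_i$)

-budget utilization bound ($1/\beta$)

Phase j

-# of segments ($m_i^j$)

-WCET ($c_i^j$)

Task τi

-utilization ($u_i$)

Find the minimum concurrency $m_i$ such that converted budgets do not violate the deadline. Then, we have:

$$\frac{\sum_{\{j \mid m_i^j \ge m_i - 1\}} m_i^j c_i^j}{m_i - 1} + \sum_{\{j \mid m_i^j < m_i - 1\}} c_i^j \ge D_i'$$

The first sum covers the phases that need to be packed (big); the second covers the phases that need virtual segments (small).

● To cap the max util. of resulting tasks:

$$D_i' = \frac{D_i}{\beta} = \frac{\varphi_i}{\beta}\sum_j c_i^j$$

● Together, we have (the subtracted sum ranges over a subset of phases, so it is at most $\sum_j c_i^j$):

$$\frac{\sum_{\{j \mid m_i^j \ge m_i - 1\}} m_i^j c_i^j}{m_i - 1} \ge \frac{\varphi_i}{\beta}\sum_j c_i^j - \sum_{\{j \mid m_i^j < m_i - 1\}} c_i^j \ge \frac{\varphi_i}{\beta}\sum_j c_i^j - \sum_j c_i^j = \left(\frac{\varphi_i}{\beta} - 1\right)\sum_j c_i^j$$

● Moreover:

$$\sum_j m_i^j c_i^j \ge \sum_{\{j \mid m_i^j \ge m_i - 1\}} m_i^j c_i^j \ge (m_i - 1)\left(\frac{\varphi_i}{\beta} - 1\right)\sum_j c_i^j$$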
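The search for the minimum concurrency can be sketched with the packed-length expression from this slide: at concurrency k, a phase with at least k segments takes $m_i^j c_i^j / k$ time and a narrower phase takes $c_i^j$. This fluid length ignores rounding to whole segments, and the function names are hypothetical; the sketch only illustrates the inequality above and is not the paper's packing algorithm.

```python
def packed_length(phases, k):
    """Fluid length of a pipeline packed to concurrency k.

    phases: list of (segments, wcet) pairs, i.e. (m_i^j, c_i^j).
    """
    return sum(m * c / k if m >= k else c for m, c in phases)


def min_parallelism(phases, budget_deadline):
    """Smallest k whose packed length fits within budget_deadline (D_i / beta).

    Returns None when even full parallelism (the critical path) is too long.
    """
    widest = max(m for m, _ in phases)
    for k in range(1, widest + 1):
        if packed_length(phases, k) <= budget_deadline:
            return k
    return None


if __name__ == "__main__":
    phases = [(40, 1.0), (2, 1.0), (20, 2.0)]  # hypothetical pipeline
    print(min_parallelism(phases, budget_deadline=20))
```

A linear scan is enough because the packed length never increases as k grows, and beyond the widest phase it stays at the critical path length.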

19

The Packing Server: straightforward strategy

Budgets

The Problem: It introduces too much virtual computational overhead.

Consider a MapReduce workflow of two phases:

is bad

20

The Packing Server: fit into Hadoop

[Figure: architecture diagram. Components: Input Task Set τ; RM (Resource Manager); AM1, AM2, AM3 (AM: Application Master); Containers (Container: executes a segment). Labels: Schedule, Dmax, Budget Schedule (one per AM).]

21

The Story of Aperiodic Task Servers

● Aperiodic tasks are difficult to analyze.

● There exists a rich set of techniques to analyze periodic tasks.

● Researchers proposed the concept of aperiodic task servers.

[Figure: timeline, t from 0 to 20.]