1
The Packing Server for Real-time Scheduling of MapReduce Workflows
Shen Li, Shaohan Hu, Tarek Abdelzaher
University of Illinois at Urbana-Champaign
3
Significance
[Figure: parallel tasks τ1, τ2, …, τn, each wrapped in its own packing server (Packing Server 1, 2, …, n) containing an app-level scheduler (App-Sched), on top of the underlying independent scheduler; the levels carry "Utilization Bound" labels]
1. The main contribution is the notion of a packing server.
2. Packing servers allow graphs of tasks with precedence constraints to be converted into a set of budgets that the underlying scheduler treats as independent tasks. This is achieved by the app-level scheduler that schedules the workload inside each server.
3. As a result, utilization bounds for independent tasks can be converted into equivalent bounds for parallel tasks.
4. This leads to the notion of a conversion bound.
5. Using this approach, we derive bounds for parallel task models that beat the best known ones.
6. We apply the approach to MapReduce.
4
Independent vs Parallel Tasks
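Comparing Utilization Bounds (maximum utilization of any task is $1/\beta$; assume $\beta = 5$)
In MapReduce applications: $m \gg 1$ and $D \gg L$.

Scheduler | Parallel tasks | Independent tasks (normalized by $m$)
G-EDF | 38.2% | $\left(m-\frac{m-1}{\beta}\right)/m \approx 80\%$
G-RM | 26.8% | $\left(\frac{m}{2}\left(1-\frac{1}{\beta}\right)+\frac{1}{\beta}\right)/m \approx 40\%$
Federated | 50% | -
EDF-FF | - | $\frac{m\beta+1}{(\beta+1)m} \approx 80\%$
EDF-FFD | - | $\left(m\left(1-\frac{1}{\beta}\right)+\frac{1}{\beta}\right)/m \approx 80\%$
RM-ST | - | $\left((m-2)\left(1-\frac{1}{\beta}\right)+1-\ln 2\right)/m \approx 80\%$
EDZL | - | $m\left(1-\frac{1}{e}\right)/m \approx 63\%$

[Li et al. ECRTS’14] [Davis et al. ACM Computing Surveys’11]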
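For a quick numeric check of the independent-task column, the sketch below evaluates the normalized expressions from the table for $\beta = 5$ and a large core count. The value m = 1000 is an arbitrary stand-in for $m \gg 1$, the function name is illustrative, and EDF-FFD is left out because its expression is algebraically identical to the G-EDF one.

```python
import math


def independent_bounds(m: int, beta: float) -> dict:
    """Normalized utilization bounds (fraction of m cores) for independent
    task sets whose maximum task utilization is 1/beta."""
    return {
        "G-EDF": (m - (m - 1) / beta) / m,
        "G-RM": ((m / 2) * (1 - 1 / beta) + 1 / beta) / m,
        "EDF-FF": (m * beta + 1) / ((beta + 1) * m),
        "RM-ST": ((m - 2) * (1 - 1 / beta) + 1 - math.log(2)) / m,
        "EDZL": 1 - 1 / math.e,  # m(1 - 1/e) / m does not depend on m
    }


if __name__ == "__main__":
    for name, ub in independent_bounds(m=1000, beta=5).items():
        print(f"{name:7s} {ub:.1%}")
```

EDF-FF evaluates to roughly 83% for $\beta = 5$; the slide rounds it to ≈80%.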
5
The Conversion Bound
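Parallel task set utilization bound = independent task set utilization bound $UB$ × conversion factor:
$$UB_{\text{parallel}} = UB \cdot \frac{\varphi-\beta}{\varphi}$$
● $\varphi$: the stretch, i.e., the deadline divided by the critical path length
● $\beta$: the reciprocal of the maximum task utilization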
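The conversion is a single multiplication, sketched below with a hypothetical helper name. It rejects $\varphi < \beta$, where the factor would be negative, and applies the factor to the independent-task values used in the $\varphi = 30$, $\beta = 5$ example.

```python
def conversion_bound(ub_independent: float, phi: float, beta: float) -> float:
    """Parallel-task utilization bound obtained from an independent-task
    bound UB via the factor (phi - beta) / phi."""
    if phi < beta:
        # The factor (phi - beta) / phi would be negative: no usable bound.
        raise ValueError("the conversion bound requires phi >= beta")
    return ub_independent * (phi - beta) / phi


if __name__ == "__main__":
    # phi = 30, beta = 5, as in the worked example on the slides
    for ub in (0.80, 0.40, 0.63):
        print(f"{ub:.0%} -> {conversion_bound(ub, phi=30, beta=5):.1%}")
```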
6
An Example of φ=30, β=5
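Scheduler | Parallel (interdependent) | Independent | Using conversion
G-EDF | 38.2% | 80% | $80\% \times \frac{30-5}{30} \approx 67\%$
G-RM | 26.8% | 40% | $40\% \times \frac{30-5}{30} \approx 33\%$
Federated | 50% | - | -
EDF-FF | - | 80% | $80\% \times \frac{30-5}{30} \approx 67\%$
EDF-FFD | - | 80% | $80\% \times \frac{30-5}{30} \approx 67\%$
RM-ST | - | 80% | $80\% \times \frac{30-5}{30} \approx 67\%$
EDZL | - | 63% | $63\% \times \frac{30-5}{30} \approx 52.5\%$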
7
Construct a Packing Server for a Pipeline
Two questions:
1. How to schedule the pipeline in its budgets?
2. What is the conversion bound when using this technique?
[Figure: a pipeline with deadline $D_i$]
Pack to minimum parallelism without violating the deadline.
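One way to make "pack to minimum parallelism" concrete is a linear search for the smallest concurrency k whose packed pipeline still fits within $D_i' = D_i/\beta$. This sketch assumes the pipeline is given as (segments, WCET) pairs per phase and uses a fluid length (a phase with $m_i^j \ge k$ segments takes $m_i^j c_i^j / k$, otherwise $c_i^j$), which matches the inequality in the conversion-bound derivation. The function names and the example pipeline are hypothetical, and the segment-level layout is left to the app-scheduler.

```python
from typing import List, Tuple

Phase = Tuple[int, float]  # (m_i^j segments, WCET c_i^j of each segment)


def packed_length(phases: List[Phase], k: int) -> float:
    """Fluid length of the pipeline when packed onto k parallel budgets."""
    return sum(max(m / k, 1.0) * c for m, c in phases)


def min_concurrency(phases: List[Phase], phi: float, beta: float) -> int:
    """Smallest k whose packed length fits in D_i' = D_i / beta = phi * L_i / beta."""
    if phi < beta:
        raise ValueError("need phi >= beta, otherwise D_i' < L_i")
    crit_path = sum(c for _, c in phases)  # L_i of a pipeline
    deadline = phi * crit_path / beta      # D_i'
    max_m = max(m for m, _ in phases)
    k = 1
    # At k = max_m every phase runs at full width, so the length is L_i <= D_i'.
    while k < max_m and packed_length(phases, k) > deadline:
        k += 1
    return k


if __name__ == "__main__":
    pipeline = [(10, 2.0), (1, 3.0), (4, 1.0)]  # hypothetical (segments, WCET)
    print(min_concurrency(pipeline, phi=10, beta=5))  # 3
```

With $\varphi_i = 10$ and $\beta = 5$, $D_i' = 12$: the packed length is 27 at k = 1, 15 at k = 2 and 11 at k = 3, so three budgets suffice.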
8
[Figure panels: Before Packing, After Packing]
The App-Scheduler
[Figure: budget portions and the split time t]
1. Find the time instance t at which the accumulated execution time before t equals the total budget size of the first phase.
2. Schedule each phase in its corresponding budget portions using a best-fit-like algorithm.
3. For each phase, process one segment at a time. Lay each segment into the budget portions from right to left, starting from the smallest budget portion, and skip any parallelism conflict.
4. This algorithm is guaranteed to schedule every phase in its own budget portions, using a simulation of the underlying scheduler Dmax time ahead. Please refer to the paper for more details.
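Step 1 can be sketched as a sweep over the execution intervals that a simulation of the underlying scheduler grants to the server's budgets. The flat list of (start, end) intervals, the function name, and the assumption that intervals of different budgets may overlap (they run on different cores) are illustrative choices, not the paper's data structures.

```python
from typing import List, Tuple


def find_split_time(intervals: List[Tuple[float, float]], target: float) -> float:
    """Earliest t such that the execution time granted before t, summed over
    all budget intervals, equals `target` (the first phase's budget size)."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # one more budget executing from here
        events.append((end, -1))    # one budget stops executing here
    events.sort()

    supplied = 0.0  # execution time accumulated so far
    rate = 0        # number of budgets executing concurrently
    prev = None
    for t, delta in events:
        if prev is not None and rate > 0:
            gain = rate * (t - prev)
            if supplied + gain >= target:
                # The target is reached inside [prev, t): interpolate linearly.
                return prev + (target - supplied) / rate
            supplied += gain
        rate += delta
        prev = t
    raise ValueError("the budgets supply less execution time than the target")


if __name__ == "__main__":
    # Hypothetical simulated budget intervals
    print(find_split_time([(0, 2), (1, 4), (3, 5)], target=5))  # 3.5
```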
9
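The Conversion Bound for M-R Pipelines
Workflow job $i$: deadline ($D_i$), critical path length ($L_i$), stretch ($\varphi_i$), budget utilization bound ($1/\beta$), number of segments ($m_i$)
Task $\tau_i$: utilization ($u_i$)
Phase $j$: number of segments ($m_i^j$), WCET ($c_i^j$)
● Lower bound of the total WCET:
$$\sum_j m_i^j c_i^j \ge (m_i-1)\left(\frac{\varphi_i}{\beta}-1\right)\sum_j c_i^j$$
● Let $\bar{u}_i$ denote the utilization of the packed budgets, including virtual segments. A phase with $m_i^j < m_i$ receives $m_i - m_i^j$ virtual segments, so
$$\frac{\bar{u}_i-u_i}{u_i} = \frac{\sum_{\{j \mid m_i^j<m_i\}} (m_i-m_i^j)\,c_i^j}{\sum_j m_i^j c_i^j} \le \frac{(m_i-1)\sum_{\{j \mid m_i^j<m_i\}} c_i^j}{\sum_j m_i^j c_i^j} \quad \text{as } m_i^j \ge 1$$
● Substituting the lower bound of the total WCET:
$$\frac{\bar{u}_i-u_i}{u_i} \le \frac{(m_i-1)\sum_{\{j \mid m_i^j<m_i\}} c_i^j}{(m_i-1)\left(\frac{\varphi_i}{\beta}-1\right)\sum_j c_i^j} \le \frac{\beta}{\varphi_i-\beta} \quad \text{as } \sum_{\{j \mid m_i^j<m_i\}} c_i^j \le \sum_j c_i^j$$
● The conversion bound:
$$\bar{u}_i \le u_i\cdot\frac{\varphi_i}{\varphi_i-\beta} \le UB \quad \text{whenever} \quad u_i \le UB\cdot\frac{\varphi_i-\beta}{\varphi_i}$$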
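The inflation bound can be sanity-checked numerically. This sketch assumes the same (segments, WCET) representation of a pipeline and takes the minimum concurrency $m_i$ as an input (for the hypothetical pipeline below, $m_i = 3$ when $\varphi_i = 10$ and $\beta = 5$). It computes $u_i$ and $\bar{u}_i$, padding every phase with $m_i^j < m_i$ up to $m_i$ segments; the helper name is illustrative.

```python
from typing import List, Tuple

Phase = Tuple[int, float]  # (m_i^j segments, WCET c_i^j)


def utilizations(phases: List[Phase], m_i: int, phi: float) -> Tuple[float, float]:
    """Return (u_i, u_bar_i): utilization before and after padding small
    phases up to m_i segments with virtual segments."""
    crit_path = sum(c for _, c in phases)  # L_i of a pipeline
    deadline = phi * crit_path             # D_i = phi_i * L_i
    u = sum(m * c for m, c in phases) / deadline
    u_bar = sum(max(m, m_i) * c for m, c in phases) / deadline
    return u, u_bar


if __name__ == "__main__":
    pipeline = [(10, 2.0), (1, 3.0), (4, 1.0)]  # hypothetical
    phi, beta, m_i = 10.0, 5.0, 3               # m_i: minimum concurrency for D_i / beta
    u, u_bar = utilizations(pipeline, m_i, phi)
    print(f"u = {u:.3f}, u_bar = {u_bar:.3f}")
    print("relative inflation:", (u_bar - u) / u, "<= beta/(phi-beta):", beta / (phi - beta))
```

Here $u_i = 27/60 = 0.45$ and $\bar{u}_i = 33/60 = 0.55$, an inflation of about 22%, well below the $\beta/(\varphi_i-\beta) = 1$ allowed by the bound.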
10
Transform Workflow into Pipeline
[Figure: an example workflow whose stages are labeled (m, c) = (3, 7), (2, 5), (6, 5), (4, 3), (3, 3), (2, 2), laid out on a time axis from 0 to 20]
● Introducing no computational penalty
● Respecting dependencies
● Preserving critical path length
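The slides do not spell out the transformation, so the sketch below is only one plausible reading, stated as an assumption: start every stage as early as its predecessors allow, cut the timeline at every stage start and finish, and treat each slice as a pipeline phase whose parallelism is the total number of segments active in it. Under the fluid assumption that segments can be split at slice boundaries, this keeps total work unchanged, respects dependencies, and makes the pipeline exactly as long as the critical path. The stage dictionary format and function name are hypothetical, and the input must be a DAG.

```python
from typing import Dict, List, Tuple

# name -> (segments m, WCET c, predecessor names); hypothetical format
Stages = Dict[str, Tuple[int, float, List[str]]]


def to_pipeline(stages: Stages) -> List[Tuple[int, float]]:
    start: Dict[str, float] = {}
    finish: Dict[str, float] = {}

    def schedule(name: str) -> float:
        # Earliest finish time: longest path from a source through `name`.
        if name not in finish:
            _, c, preds = stages[name]
            start[name] = max((schedule(p) for p in preds), default=0.0)
            finish[name] = start[name] + c
        return finish[name]

    for name in stages:
        schedule(name)

    # Cut the timeline at every start and finish instant.
    cuts = sorted(set(start.values()) | set(finish.values()))
    phases = []
    for t0, t1 in zip(cuts, cuts[1:]):
        width = sum(m for n, (m, _, _) in stages.items()
                    if start[n] <= t0 and finish[n] >= t1)
        if width > 0:
            phases.append((width, t1 - t0))  # (segments, WCET) of the new phase
    return phases


if __name__ == "__main__":
    dag = {"a": (2, 4, []), "b": (3, 2, ["a"]), "c": (1, 5, [])}
    print(to_pipeline(dag))  # [(3, 4.0), (4, 1.0), (3, 1.0)]
```

In the example, the pipeline length (6) equals the critical path a → b, and the total work (19) equals the workflow's $\sum m \cdot c$.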
11
Summary
1. The packing operation packs a pipeline into minimum parallelism.
2. The app-scheduler schedules the pipeline into budgets using simulations of the underlying scheduler.
3. The conversion bound, with factor $\frac{\varphi-\beta}{\varphi}$, is proved by upper-bounding the amount of virtual execution time that packing introduces.
4. A workflow is translated into a pipeline without introducing virtual computation overhead or lengthening the critical path.
12
Evaluation: Algorithms
1. Packing & EDF-FF: the packing server uses EDF First-Fit as the underlying scheduler. Independent tasks are partitioned onto the first resource slot that does not violate the 100% utilization bound.
2. Packing & GEDF: the packing server uses GEDF as the underlying scheduler. GEDF assigns the highest priority to the job with the earliest deadline.
3. GEDF: the workflow with the most urgent deadline gets the highest priority.
4. Federated: each high-utilization task (u ≥ 1) is assigned a set of dedicated cores, and the remaining low-utilization tasks share the remaining cores.
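The partitioning step of EDF-FF can be sketched in a few lines: on a single core, EDF schedules implicit-deadline independent tasks whenever their total utilization is at most 1, so first fit places each budget on the first core that stays within 100%. Tasks are taken in the given order (sorting them by decreasing utilization first would give EDF-FFD); the function name is illustrative.

```python
from typing import List, Optional


def edf_first_fit(utils: List[float], cores: int) -> Optional[List[int]]:
    """Assign each task to the first core whose EDF load stays <= 1.0.
    Returns the core index per task, or None if some task does not fit."""
    load = [0.0] * cores
    assignment = []
    for u in utils:
        for k in range(cores):
            if load[k] + u <= 1.0:  # per-core EDF test for implicit deadlines
                load[k] += u
                assignment.append(k)
                break
        else:
            return None  # no core can host this task
    return assignment


if __name__ == "__main__":
    print(edf_first_fit([0.5, 0.25, 0.5, 0.5], cores=2))  # [0, 0, 1, 1]
```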
13
Evaluation: Compute β
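Packing & EDF-FF:
$$UB\cdot\frac{\varphi-\beta}{\varphi} = \frac{m\beta+1}{m(\beta+1)}\cdot\frac{\varphi-\beta}{\varphi}$$
By taking the derivative with respect to $\beta$, the highest utilization bound is achieved at:
$$\beta = \sqrt{\frac{(\varphi+1)(m-1)}{m}} - 1$$
Packing & GEDF:
$$UB\cdot\frac{\varphi-\beta}{\varphi} = \frac{m\beta-m+1}{m\beta}\cdot\frac{\varphi-\beta}{\varphi}$$
Similarly:
$$\beta = \sqrt{\frac{\varphi(m-1)}{m}}$$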
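The closed-form optima can be plugged back into the converted bounds. The sketch below does this for the two evaluation settings, using only the expressions above; the function names are illustrative.

```python
import math


def edf_ff_ub(m: int, beta: float) -> float:
    return (m * beta + 1) / (m * (beta + 1))


def gedf_ub(m: int, beta: float) -> float:
    return (m * beta - m + 1) / (m * beta)


def converted(ub: float, phi: float, beta: float) -> float:
    return ub * (phi - beta) / phi


def best_beta_edf_ff(phi: float, m: int) -> float:
    return math.sqrt((phi + 1) * (m - 1) / m) - 1


def best_beta_gedf(phi: float, m: int) -> float:
    return math.sqrt(phi * (m - 1) / m)


if __name__ == "__main__":
    m = 500
    for phi in (20, 30):
        b1, b2 = best_beta_edf_ff(phi, m), best_beta_gedf(phi, m)
        print(f"phi={phi}: EDF-FF beta={b1:.2f} bound={converted(edf_ff_ub(m, b1), phi, b1):.1%}; "
              f"GEDF beta={b2:.2f} bound={converted(gedf_ub(m, b2), phi, b2):.1%}")
```

For m = 500 this gives β ≈ 3.58 and 4.47 at φ = 20 and β ≈ 4.56 and 5.47 at φ = 30, with converted bounds of about 64% and 60.3% (φ = 20) and about 70% and 67% (φ = 30).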
14
Evaluation: Accepted Utilization
Workflows are generated based on Yahoo! WebScope data.
Set φ = 20, m = 500 (small granularity).
Computed β = 3.58 for Packing & EDF-FF and β = 4.47 for Packing & GEDF.
Theoretical utilization bounds:
Packing & EDF-FF: 64%
Packing & GEDF: 60.3%
Federated: 50% [Li et al. ECRTS’14]
GEDF: 38.2% [Li et al. ECRTS’14]
[Figure: accepted utilization, annotated "Domino effect"]
15
Evaluation: Accepted Utilization
Workflows are generated based on Yahoo! WebScope data.
Set φ = 30, m = 500 (small granularity).
Computed β = 4.56 for Packing & EDF-FF and β = 5.47 for Packing & GEDF.
Theoretical utilization bounds:
Packing & EDF-FF: 70%
Packing & GEDF: 66.9%
Federated: 50% [Li et al. ECRTS’14]
GEDF: 38.2% [Li et al. ECRTS’14]
[Figure: accepted utilization, annotated "Domino effect"]
16
Evaluation: Admission Control
Workflows are generated based on Yahoo! WebScope data.
Implemented a prototype on WOHA [Li et al., ICDCS’14], a variant of Hadoop.
Submitted a set of tasks with a total utilization above 100%.
Admission control is enforced at the theoretical utilization bound.
Set φ = 20, m = 160 (small granularity).
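A minimal admission-control sketch, assuming workflows arrive in submission order, each with a known utilization expressed in cores, and that a workflow is admitted only if the admitted total stays within the theoretical bound times the number of cores m. The greedy in-order policy and the function name are assumptions for illustration, not the prototype's implementation.

```python
from typing import List


def admit(utils: List[float], m: int, bound: float) -> List[bool]:
    """Admit workflows in submission order while the total admitted
    utilization stays within bound * m cores (hypothetical policy)."""
    capacity = bound * m
    total = 0.0
    decisions = []
    for u in utils:
        ok = total + u <= capacity
        if ok:
            total += u
        decisions.append(ok)
    return decisions


if __name__ == "__main__":
    print(admit([2, 3, 2, 0.5], m=10, bound=0.6))  # [True, True, False, True]
```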
18
The Conversion Bound for M-R Pipelines
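Find the minimum concurrency $m_i$ such that the converted budgets do not violate the deadline. Then packing with concurrency $m_i-1$ would violate it:
$$\sum_{\{j \mid m_i^j \ge m_i-1\}} \frac{m_i^j c_i^j}{m_i-1} + \sum_{\{j \mid m_i^j < m_i-1\}} c_i^j \ge D_i'$$
The first sum covers the phases that need to be packed (big); the second covers the phases that need virtual segments (small).
● To cap the maximum utilization of the resulting tasks (using $L_i = \sum_j c_i^j$ for a pipeline):
$$D_i' = \frac{D_i}{\beta} = \frac{\varphi_i}{\beta}\sum_j c_i^j$$
● Together, we have:
$$\sum_{\{j \mid m_i^j \ge m_i-1\}} \frac{m_i^j c_i^j}{m_i-1} \ge \frac{\varphi_i}{\beta}\sum_j c_i^j - \sum_{\{j \mid m_i^j < m_i-1\}} c_i^j \ge \frac{\varphi_i}{\beta}\sum_j c_i^j - \sum_j c_i^j = \left(\frac{\varphi_i}{\beta}-1\right)\sum_j c_i^j$$
where the second inequality holds because $\{j \mid m_i^j < m_i-1\}$ is a subset of the phases.
● Moreover:
$$\sum_j m_i^j c_i^j \ge \sum_{\{j \mid m_i^j \ge m_i-1\}} m_i^j c_i^j \ge (m_i-1)\left(\frac{\varphi_i}{\beta}-1\right)\sum_j c_i^j$$
Workflow job $i$: deadline ($D_i$), critical path length ($L_i$), stretch ($\varphi_i$), budget utilization bound ($1/\beta$), number of segments ($m_i$)
Task $\tau_i$: utilization ($u_i$)
Phase $j$: number of segments ($m_i^j$), WCET ($c_i^j$)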
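A small worked instance of this derivation, for a hypothetical three-phase pipeline (not taken from the evaluation) with phases $(m_i^j, c_i^j) = (10, 2), (1, 3), (4, 1)$, $\varphi_i = 10$ and $\beta = 5$:

```latex
\sum_j c_i^j = 6, \qquad D_i' = \frac{\varphi_i}{\beta}\sum_j c_i^j = 12
% m_i = 3 fits: packed length (10\cdot 2 + 4\cdot 1)/3 + 3 = 11 \le 12
% m_i - 1 = 2 does not (big phases: m_i^j \ge 2; small phase: m_i^j = 1):
\frac{10\cdot 2 + 4\cdot 1}{2} + 3 = 15 \ge D_i' = 12
% lower bound on the total WCET:
\sum_j m_i^j c_i^j = 27 \;\ge\; (m_i-1)\left(\frac{\varphi_i}{\beta}-1\right)\sum_j c_i^j = 2\cdot 1\cdot 6 = 12
```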
19
The Packing Server: straightforward strategy
The problem: it introduces too much virtual computational overhead.
Consider a MapReduce workflow of two phases:
[Figure: budgets produced by the straightforward strategy for the two-phase workflow, annotated "is bad"]
20
The Packing Server: fit into Hadoop
[Figure: input task set τ, the Resource Manager (RM) scheduling Dmax ahead, Application Masters AM1, AM2, AM3 each with its budget schedule, and the containers that execute segments]
AM: Application Master
RM: Resource Manager
Container: executes a segment