Download - μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

Transcript
Page 1: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices  

Akshitha  Sriraman,  Thomas  F.  Wenisch  University  of  Michigan  

 OSDI  2018  10/08/2018  

Page 2: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

OLDI:  From  Monoliths  to  Microservices

2  

>  100  ms  SLO    

Monolith Microservices

Sub-­‐ms  SLO                                  

Sub-­‐ms-­‐scale  system  overheads  must  be  characterized  for  microservices

Monolithic)service

Scaling Scaling

RPC

Microservice

100μs

20  μs

120μs 20%!

Monolith

300ms

20  μs

300.02ms

Page 3: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

Impact  of  Threading  on  Microservices •  Our  focus:  Sub-­‐ms  overheads  due  to  threading  design

Threading  model  that  exhibits  best  tail  latency  is  a  funcNon  of  load

Lock  contenNon Thread  wakeups Spurious  context  switch

3  

Polling

Low  load

Blocking Polling

Polling

High  load

Page 4: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

ContribuNons •  A  taxonomy  of  threading  models

–  Structured  understanding  of  threading  implicaNons

Ø  Reveals  tail  inflecNon  points  across  load Ø  Peak  load-­‐sustaining  model  is  worse  at  low  load

•  μTune:   –  Uses  tail  inflecNon  insights  to  opNmize  tail  latency

–  Tunes  model  &  thread  pool  size  across  load

–  Simple  interface:  Abstracts  threading  model  from  RPC  code

4  

Latency

Up  to  1.9x  tail  improvement  over  staNc  throughput-­‐opNmized  model  

Page 5: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

Outline •  MoNvaNon   •  A  taxonomy  of  threading  models

•  μTune:   –  Simple  interface  design

–  AutomaNc  load  adaptaNon  system

•  EvaluaNon

5  

Page 6: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

Mid-­‐Ner  Faces  More  Threading  Overheads

6  

Front-­‐End  Microserver Mid-­‐Tier  Microserver

Leaf  Microserver  1

Leaf  Microserver  2 •  Mid-­‐Ner  –  subject  to  more  threading  overheads

–  Manages  RPC  fan-­‐out  to  many  leaves –  RPC  layer  interacNons  dominate  computaNon

Threading  overheads  must  be  characterized  for  mid-­‐%er  microservices

Page 7: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

A  Taxonomy  of  Threading  Models

7  

Front-­‐End Mid-­‐Tier Leaf

NW  socket

(1)  Poll  vs.  Block Request

(2)  In-­‐line  vs.  Dispatch

(3)  Synchronous  vs.  

Asynchronous

Synchronous

Asynchronous

Block Poll

In-­‐line SIB SIP

Dispatch SDB SDP

Block Poll

In-­‐line AIB AIP

Dispatch ADB ADP

vs. ...

vs. ...

vs. ...

Network  poller

Worker

Response

Page 8: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices 8  

Latency  Tradeoffs  Across  Threading  Models

0

0.5

1

1.5

2

10 100 1000 10000

99th  percenN

le  tail  latency  (m

s)

Load  (Queries  Per  Second)

saturaNon

HDSearch:  Sync.

In-­‐line  Block

In-­‐line  Poll

Dispatch  Block

Dispatch  Poll

1.4x

In-­‐line  Poll  has  lowest  low-­‐load  latency:  Avoids  thread  wakeup  delays

Page 9: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices 9  

Latency  Tradeoffs  Across  Threading  Models

0

0.5

1

1.5

2

10 100 1000 10000

99th  percenN

le  tail  latency  (m

s)

Load  (Queries  Per  Second)

saturaNon

HDSearch:  Sync.

In-­‐line  Block

In-­‐line  Poll

Dispatch  Block

Dispatch  Poll

In-­‐Line  Poll  faces  contenNon;  Dispatch  Poll  with  one  network  poller  is  best  

Page 10: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices 10  

Latency  Tradeoffs  Across  Threading  Models

0

0.5

1

1.5

2

10 100 1000 10000

99th  percenN

le  tail  latency  (m

s)

Load  (Queries  Per  Second)

saturaNon

HDSearch:  Sync.

In-­‐line  Block

In-­‐line  Poll

Dispatch  Block

Dispatch  Poll

Dispatch  Block  is  best  at  high  load  as  it  does  not  waste  CPU  

Page 11: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices 11  

Latency  Tradeoffs  Across  Threading  Models

0

0.5

1

1.5

2

10 100 1000 10000

99th  percenN

le  tail  latency  (m

s)

Load  (Queries  Per  Second)

saturaNon

HDSearch:  Sync.

In-­‐line  Block

In-­‐line  Poll

Dispatch  Block

Dispatch  Poll

No  single  threading  model  works  best  at  all  loads

Page 12: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

•  Threading  choice  can  significantly  affect  tail  latency •  Threading  latency  trade-­‐offs  are  not  obvious •  Most  sohware  face  latency  penalNes  due  to  staNc  threading

12  

Need  for  AutomaNc  Load  AdaptaNon:  μTune

Opportunity:  Exploit  trade-­‐offs  among  threading  models  at  run-­‐Nme

Page 13: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

•  Load  adaptaNon:  Vary  threading  model  &  pool  size  at  run-­‐Nme •  Abstract  threading  model  boiler-­‐plate  code  from  RPC  code

13  

Microservice  funcNonality:  ProcessReq(),  InvokeLeaf(),  FinalizeResp()

μTune  automaNc  load  adaptaNon  system

RPC  layer

App  layer

μTune  

Network  layer

μTune

Simple  interface:  Developer  defines  only  three  funcNons

Page 14: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

μTune:  Goals  &  Challenges

14  

Service  code

μTune‘s   threading   framework

Simple   interface

Quick  load  change   detecNon

Fast  threading   model  switches

Scale   thread  pools

Page 15: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

μTune  System  Design:  Auto-­‐Tuner •  Dynamically  picks  threading  model  &  pool  sizes  based  on  load

15  

Request rate Best TM Ideal no. of threads

0 – 128 QPS SIP In-line: one

. . .

4096 – 8192 QPS SDB NW poller: one, Workers: many (eg. 50), Resp. threads: many

Offline   training

Create  piecewise  linear  model

Mid-tier Leaf

Front end

Request

Compute

In-line thread

In-line thread

Response

NW socket

In-line thread: <block>

Synchronous

Mid-tier Leaf

Front end

Request

Compute

In-line thread

Response

NW socket

Synchronous

(a) (b)

Mid-tier Front- end

Request

Worker

Response

NW socket

Network thread: <block>

Task queue

Worker notified

Worker awaits notification

Synchronous

Dispatch

Leaf

Compute

Mid-tier Front- end

Request

Worker

Response

NW socket

Network thread: <poll>

Task queue

Worker notified

Worker awaits notification

Synchronous

Dispatch

Leaf

Compute

(a) (b)

Mid-tier Front- end

Request

Compute

In-line thread

Resp. thread: <block>

Response

NW (server) socket

In-line thread: <block>

Asynchronous

Leaf

NW (client) socket

Mid-tier Front- end

Request

Compute

In-line thread

Resp. thread: <poll>

Response

NW (server) socket

In-line thread: <poll>

Asynchronous

Leaf

NW (client) socket

(a) (b)

Front- end

Request

Network thread: <block>

Task queue

Worker notified

Dispatch

Worker awaits notification

Asynchronous

Mid-tier Leaf

Compute Response

NW (client) socket

NW (server) socket

Resp. thread: <block>

Request

Network thread: <poll>

Task queue

Worker notified

Dispatch

Worker awaits notification

Asynchronous

Compute Response

NW (client) socket

NW (server) socket

Resp. thread: <poll>

Front- end

Mid-tier Leaf

(a) (b)

SIB SIP SDB SDP AIB AIP ADB ADP

(a) µTune framework

Offline training

(b) Async. µTune’s automatic load adaptation system

1

Create piecewise linear model

Request'rate' Best'TM' Ideal'no.'of'threads'

0'–'128'QPS' AIP' Inline:'one'

.'

.'

4096'–'8192'QPS'

ADB' NW'poller:'one''Workers:'few'(eg.'4),'Resp.'threads:'few'

Online: Request from front-end gRPC

Circular event buffer 1

2

Request'rate'

compute'

Send to switching logic

Switch'to'best'TM'and'

thread'poll'sizes'if'needed'

ProcessRequest()

InvokeLeafAsync()

Request to leaf

Mid-tier Front- end

Request

Compute

In-line thread

Resp. thread: <block>

Response

NW (server) socket

In-line thread: <block>

Asynchronous

Leaf

NW (client) socket

Mid-tier Front- end

Request

Compute

In-line thread

Resp. thread: <poll>

Response

NW (server) socket

In-line thread: <poll>

Asynchronous

Leaf

NW (client) socket

(a) (b)

Front- end

Request

Network thread: <block>

Task queue

Worker notified

Dispatch

Worker awaits notification

Asynchronous

Mid-tier Leaf

Compute Response

NW (client) socket

NW (server) socket

Resp. thread: <block>

Request

Network thread: <poll>

Task queue

Worker notified

Dispatch

Worker awaits notification

Asynchronous

Compute Response

NW (client) socket

NW (server) socket

Resp. thread: <poll>

Front- end

Mid-tier Leaf

(a) (b)

FinalizeResponse()Response from leaf Response to

front-end 3

4

5

6

7

10 11

8

9

2

Circular  event  buffer Online:  Request  from  

front-­‐end gRPC

Request  rate   compute

Send  to  switching  logic

Switch  to  best  TM  &  thread  pool  

sizes

Request  to  leaf

Page 16: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

Experimental  Setup •  μSuite  [Sriraman  ’18]  benchmark  suite:

–  Load  generator,  a  mid-­‐Ner,  4  or  16  leaf  microservers

•  Study  μTune’s  adaptaNon  in  two  load  scenarios: –  Steady-­‐state –  Transient

16  

Page 17: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

EvaluaNon:  μTune’s  Load  AdaptaNon

17  

Converges  to  best  threading  model  &  pool  sizes  to  improve  tails  by  up  to  1.9x  

0 0.5

1 1.5

2

20 50 100 1K 8K 14K

Load (Queries Per Second)

∞ 6.17 ∞HDSearch:  Async.

99th  percenN

le  tail  latency  (m

s)

1.9x  

In-­‐Line  Poll Dispatch  Poll Dispatch  Block Haque  ‘15 Abdelzaher  ‘99 μTune  

saturaNon

<5%  mean  overhead

Page 18: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices

Conclusion •  Taxonomy  of  threading  models

–  No  single  threading  model  has  best  tail  latency  across  all  load

•  μTune  –  threading  model  framework  +  load  adaptaNon  system

•  Achieved  up  to  1.9x  tail  speedup  over  best  staNc  model

18  

Page 19: μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 · μTune:Auto+TunedThreadingforOLDI Microservices μTune’SystemDesign:’Auto+Tuner • Dynamically’picks’threading’model’&’pool’sizes’based’on’load

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices 19  

μTune:  Auto-­‐Tuned  Threading  for  OLDI  Microservices Akshitha  Sriraman,  Thomas  F.  Wenisch

University  of  Michigan

hwps://github.com/wenischlab/MicroTune

Poster  number:  29