μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 ·...

of 19 /19
μTune: AutoTuned Threading for OLDI Microservices Akshitha Sriraman , Thomas F. Wenisch University of Michigan OSDI 2018 10/08/2018

Embed Size (px)

Transcript of μTune:’Auto+Tuned’Threading’for’OLDI Microservices’ · 2019-02-26 ·...

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices  

    Akshitha  Sriraman,  Thomas  F.  Wenisch  University  of  Michigan  

     OSDI  2018  10/08/2018  

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    OLDI:  From  Monoliths  to  Microservices

    2  

    >  100  ms  SLO    

    Monolith Microservices

    Sub-‐ms  SLO                                  

    Sub-‐ms-‐scale  system  overheads  must  be  characterized  for  microservices

    Monolithic)service

    Scaling Scaling

    RPC

    Microservice

    100μs

    20  μs

    120μs20%!

    Monolith

    300ms

    20  μs

    300.02ms

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    Impact  of  Threading  on  Microservices•  Our  focus:  Sub-‐ms  overheads  due  to  threading  design

    Threading  model  that  exhibits  best  tail  latency  is  a  funcNon  of  load

    Lock  contenNon Thread  wakeups Spurious  context  switch

    3  

    Polling

    Low  load

    Blocking Polling

    Polling

    High  load

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    ContribuNons

    •  A  taxonomy  of  threading  models

    –  Structured  understanding  of  threading  implicaNonsØ  Reveals  tail  inflecNon  points  across  loadØ  Peak  load-‐sustaining  model  is  worse  at  low  load

    •  μTune:  –  Uses  tail  inflecNon  insights  to  opNmize  tail  latency–  Tunes  model  &  thread  pool  size  across  load–  Simple  interface:  Abstracts  threading  model  from  RPC  code

    4  

    Latency

    Up  to  1.9x  tail  improvement  over  staNc  throughput-‐opNmized  model  

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    Outline•  MoNvaNon  •  A  taxonomy  of  threading  models•  μTune:  

    –  Simple  interface  design–  AutomaNc  load  adaptaNon  system

    •  EvaluaNon

    5  

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    Mid-‐Ner  Faces  More  Threading  Overheads

    6  

    Front-‐End  Microserver Mid-‐Tier  Microserver

    Leaf  Microserver  1

    Leaf  Microserver  2•  Mid-‐Ner  –  subject  to  more  threading  overheads

    –  Manages  RPC  fan-‐out  to  many  leaves–  RPC  layer  interacNons  dominate  computaNon

    Threading  overheads  must  be  characterized  for  mid-‐%er  microservices

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    A  Taxonomy  of  Threading  Models

    7  

    Front-‐End Mid-‐Tier Leaf

    NW  socket

    (1)  Poll  vs.  BlockRequest

    (2)  In-‐line  vs.  Dispatch

    (3)  Synchronous  vs.  

    Asynchronous

    Synchronous

    Asynchronous

    Block Poll

    In-‐line SIB SIP

    Dispatch SDB SDP

    Block Poll

    In-‐line AIB AIP

    Dispatch ADB ADP

    vs. ...

    vs. ...

    vs. ...

    Network  poller

    Worker

    Response

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices 8  

    Latency  Tradeoffs  Across  Threading  Models

    0

    0.5

    1

    1.5

    2

    10 100 1000 10000

    99th  percenN

    le  tail  latency  (m

    s)

    Load  (Queries  Per  Second)

    saturaNon

    HDSearch:  Sync.

    In-‐line  Block

    In-‐line  Poll

    Dispatch  Block

    Dispatch  Poll

    1.4x

    In-‐line  Poll  has  lowest  low-‐load  latency:  Avoids  thread  wakeup  delays

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices 9  

    Latency  Tradeoffs  Across  Threading  Models

    0

    0.5

    1

    1.5

    2

    10 100 1000 10000

    99th  percenN

    le  tail  latency  (m

    s)

    Load  (Queries  Per  Second)

    saturaNon

    HDSearch:  Sync.

    In-‐line  Block

    In-‐line  Poll

    Dispatch  Block

    Dispatch  Poll

    In-‐Line  Poll  faces  contenNon;  Dispatch  Poll  with  one  network  poller  is  best  

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices 10  

    Latency  Tradeoffs  Across  Threading  Models

    0

    0.5

    1

    1.5

    2

    10 100 1000 10000

    99th  percenN

    le  tail  latency  (m

    s)

    Load  (Queries  Per  Second)

    saturaNon

    HDSearch:  Sync.

    In-‐line  Block

    In-‐line  Poll

    Dispatch  Block

    Dispatch  Poll

    Dispatch  Block  is  best  at  high  load  as  it  does  not  waste  CPU  

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices 11  

    Latency  Tradeoffs  Across  Threading  Models

    0

    0.5

    1

    1.5

    2

    10 100 1000 10000

    99th  percenN

    le  tail  latency  (m

    s)

    Load  (Queries  Per  Second)

    saturaNon

    HDSearch:  Sync.

    In-‐line  Block

    In-‐line  Poll

    Dispatch  Block

    Dispatch  Poll

    No  single  threading  model  works  best  at  all  loads

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    •  Threading  choice  can  significantly  affect  tail  latency•  Threading  latency  trade-‐offs  are  not  obvious•  Most  sohware  face  latency  penalNes  due  to  staNc  threading

    12  

    Need  for  AutomaNc  Load  AdaptaNon:  μTune

    Opportunity:  Exploit  trade-‐offs  among  threading  models  at  run-‐Nme

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    •  Load  adaptaNon:  Vary  threading  model  &  pool  size  at  run-‐Nme•  Abstract  threading  model  boiler-‐plate  code  from  RPC  code

    13  

    Microservice  funcNonality:  ProcessReq(),  InvokeLeaf(),  FinalizeResp()

    μTune  automaNc  load  adaptaNon  system

    RPC  layer

    App  layer

    μTune  

    Network  layer

    μTune

    Simple  interface:  Developer  defines  only  three  funcNons

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    μTune:  Goals  &  Challenges

    14  

    Service  code

    μTune‘s  threading  framework

    Simple  interface

    Quick  load  change  detecNon

    Fast  threading  model  switches

    Scale  thread  pools

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    μTune  System  Design:  Auto-‐Tuner•  Dynamically  picks  threading  model  &  pool  sizes  based  on  load

    15  

    Request rate Best TM Ideal no. of threads

    0 – 128 QPS SIP In-line: one

    . . .

    4096 – 8192 QPS SDB NW poller: one, Workers: many (eg. 50), Resp. threads: many

    Offline  training

    Create  piecewise  linear  model

    Mid-tier Leaf

    Front end

    Request

    Compute

    In-line thread

    In-line thread

    Response

    NW socket

    In-line thread:

    Synchronous

    Mid-tier Leaf

    Front end

    Request

    Compute

    In-line thread

    Response

    NW socket

    Synchronous

    (a) (b)

    Mid-tier Front- end

    Request

    Worker

    Response

    NW socket

    Network thread:

    Task queue

    Worker notified

    Worker awaits notification

    Synchronous

    Dispatch

    Leaf

    Compute

    Mid-tier Front- end

    Request

    Worker

    Response

    NW socket

    Network thread:

    Task queue

    Worker notified

    Worker awaits notification

    Synchronous

    Dispatch

    Leaf

    Compute

    (a) (b)

    Mid-tier Front- end

    Request

    Compute

    In-line thread

    Resp. thread:

    Response

    NW (server) socket

    In-line thread:

    Asynchronous

    Leaf

    NW (client) socket

    Mid-tier Front- end

    Request

    Compute

    In-line thread

    Resp. thread:

    Response

    NW (server) socket

    In-line thread:

    Asynchronous

    Leaf

    NW (client) socket

    (a) (b)

    Front- end

    Request

    Network thread:

    Task queue

    Worker notified

    Dispatch

    Worker awaits notification

    Asynchronous

    Mid-tier Leaf

    Compute Response

    NW (client) socket

    NW (server) socket

    Resp. thread:

    Request

    Network thread:

    Task queue

    Worker notified

    Dispatch

    Worker awaits notification

    Asynchronous

    Compute Response

    NW (client) socket

    NW (server) socket

    Resp. thread:

    Front- end

    Mid-tier Leaf

    (a) (b)

    SIB SIP SDB SDP AIB AIP ADB ADP

    (a) µTune framework

    Offline training

    (b) Async. µTune’s automatic load adaptation system

    1

    Create piecewise linear model

    Request'rate' Best'TM' Ideal'no.'of'threads'

    0'–'128'QPS' AIP' Inline:'one'

    .'

    .'

    4096'–'8192'QPS'

    ADB' NW'poller:'one''Workers:'few'(eg.'4),'Resp.'threads:'few'

    Online: Request from front-end gRPC

    Circular event buffer 1

    2

    Request'rate'

    compute'

    Send to switching logic

    Switch'to'best'TM'and'

    thread'poll'sizes'if'needed'

    ProcessRequest()

    InvokeLeafAsync()

    Request to leaf

    Mid-tier Front- end

    Request

    Compute

    In-line thread

    Resp. thread:

    Response

    NW (server) socket

    In-line thread:

    Asynchronous

    Leaf

    NW (client) socket

    Mid-tier Front- end

    Request

    Compute

    In-line thread

    Resp. thread:

    Response

    NW (server) socket

    In-line thread:

    Asynchronous

    Leaf

    NW (client) socket

    (a) (b)

    Front- end

    Request

    Network thread:

    Task queue

    Worker notified

    Dispatch

    Worker awaits notification

    Asynchronous

    Mid-tier Leaf

    Compute Response

    NW (client) socket

    NW (server) socket

    Resp. thread:

    Request

    Network thread:

    Task queue

    Worker notified

    Dispatch

    Worker awaits notification

    Asynchronous

    Compute Response

    NW (client) socket

    NW (server) socket

    Resp. thread:

    Front- end

    Mid-tier Leaf

    (a) (b)

    FinalizeResponse() Response from leaf Response to front-end

    3

    4

    5

    6

    7

    10 11

    8

    9

    2

    Circular  event  bufferOnline:  Request  from  

    front-‐end gRPC

    Request  rate  compute

    Send  to  switching  logic

    Switch  to  best  TM  &  thread  pool  

    sizes

    Request  to  leaf

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    Experimental  Setup•  μSuite  [Sriraman  ’18]  benchmark  suite:

    –  Load  generator,  a  mid-‐Ner,  4  or  16  leaf  microservers•  Study  μTune’s  adaptaNon  in  two  load  scenarios:

    –  Steady-‐state–  Transient

    16  

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    EvaluaNon:  μTune’s  Load  AdaptaNon

    17  

    Converges  to  best  threading  model  &  pool  sizes  to  improve  tails  by  up  to  1.9x  

    0 0.5

    1 1.5

    2

    20 50 100 1K 8K 14K

    Load (Queries Per Second)

    ∞ 6.17∞HDSearch:  Async.

    99th  percenN

    le  tail  latency  (m

    s)

    1.9x  

    In-‐Line  Poll Dispatch  Poll Dispatch  Block Haque  ‘15 Abdelzaher  ‘99 μTune  

    saturaNon

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    Conclusion•  Taxonomy  of  threading  models

    –  No  single  threading  model  has  best  tail  latency  across  all  load•  μTune  –  threading  model  framework  +  load  adaptaNon  system•  Achieved  up  to  1.9x  tail  speedup  over  best  staNc  model

    18  

  • μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices 19  

    μTune:  Auto-‐Tuned  Threading  for  OLDI  Microservices

    Akshitha  Sriraman,  Thomas  F.  Wenisch

    University  of  Michigan

    hwps://github.com/wenischlab/MicroTune

    Poster  number:  29