μTune: Auto-Tuned Threading for OLDI Microservices
Akshitha Sriraman, Thomas F. Wenisch (University of Michigan)
OSDI 2018, 10/08/2018
OLDI: From Monoliths to Microservices

• A monolithic service (> 100 ms SLO) is scaled out into RPC-connected microservices (sub-ms SLO)
• [Figure: a 20 μs system overhead is negligible for a 300 ms monolith request (300.02 ms), but adds 20% to a 100 μs microservice request (120 μs)]
• Sub-ms-scale system overheads must be characterized for microservices
Impact of Threading on Microservices

• Our focus: sub-ms overheads due to threading design
• The threading model that exhibits the best tail latency is a function of load
• [Figure: polling wins at low load by avoiding thread wakeups; blocking wins at high load by avoiding lock contention and spurious context switches]
Contributions

• A taxonomy of threading models
  – Structured understanding of threading implications
  – Reveals tail latency inflection points across load
  – The peak load-sustaining model is worse at low load
• μTune
  – Uses tail inflection insights to optimize tail latency
  – Tunes threading model & thread pool size across load
  – Simple interface: abstracts the threading model from RPC code
• Up to 1.9x tail latency improvement over a static throughput-optimized model
Outline

• Motivation
• A taxonomy of threading models
• μTune: simple interface design; automatic load adaptation system
• Evaluation
Mid-Tier Faces More Threading Overheads

• [Figure: front-end microserver → mid-tier microserver → leaf microservers 1, 2, ...]
• The mid-tier is subject to more threading overheads:
  – It manages RPC fan-out to many leaves
  – RPC layer interactions dominate its computation
• Threading overheads must be characterized for mid-tier microservices
A Taxonomy of Threading Models

Three binary design axes at the mid-tier's network socket (requests arrive from the front-end; responses return after leaf fan-out):
(1) Poll vs. Block
(2) In-line vs. Dispatch (a network poller hands requests to workers)
(3) Synchronous vs. Asynchronous

Eight resulting models:

                            Block   Poll
  Synchronous    In-line    SIB     SIP
                 Dispatch   SDB     SDP
  Asynchronous   In-line    AIB     AIP
                 Dispatch   ADB     ADP
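The eight models are simply the cross product of the three binary axes; a minimal sketch of how the taxonomy's names expand (illustrative only):

```python
from itertools import product

# The three binary axes of the taxonomy.
sync_axis = {"S": "Synchronous", "A": "Asynchronous"}
disp_axis = {"I": "In-line", "D": "Dispatch"}
wait_axis = {"B": "Block", "P": "Poll"}

# Cross product yields the eight models: SIB, SIP, SDB, SDP, AIB, AIP, ADB, ADP.
models = {
    s + d + w: f"{sync_axis[s]} {disp_axis[d]} {wait_axis[w]}"
    for s, d, w in product("SA", "ID", "BP")
}

print(models["SDB"])  # Synchronous Dispatch Block
```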
Latency Tradeoffs Across Threading Models

[Figure: HDSearch, synchronous models: 99th percentile tail latency (ms, 0-2) vs. load (10-10000 QPS, log scale); curves for In-line Block, In-line Poll, Dispatch Block, Dispatch Poll; saturation marked; a 1.4x gap at low load]

In-line Poll has the lowest low-load latency: it avoids thread wakeup delays
Latency Tradeoffs Across Threading Models (cont.)

[Same HDSearch figure, annotated across the load range:]
• At moderate load, In-line Poll faces contention; Dispatch Poll with one network poller is best
• At high load, Dispatch Block is best, as it does not waste CPU; polling models saturate (tail latency → ∞)
• No single threading model works best at all loads
Need for Automatic Load Adaptation: μTune

• Threading choice can significantly affect tail latency
• Threading latency trade-offs are not obvious
• Most software faces latency penalties due to static threading
• Opportunity: exploit trade-offs among threading models at run-time
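The poll-vs-block trade-off driving these curves can be sketched with a plain in-process queue (illustrative only; μTune itself operates on network sockets): a blocking receiver sleeps until woken, paying a wakeup delay at low load, while a polling receiver spins, avoiding that delay but burning CPU at high load.

```python
import queue

q = queue.Queue()

def recv_blocking():
    # Block: sleep until an item arrives; the thread wakeup sits on the critical path.
    return q.get(block=True)

def recv_polling():
    # Poll: spin on the queue, burning a core but skipping the wakeup delay.
    while True:
        try:
            return q.get_nowait()
        except queue.Empty:
            pass  # busy-wait: cheap at low load, wasteful at high load

q.put("req-1")
print(recv_polling())  # req-1
```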
μTune: Simple Interface

• Load adaptation: vary threading model & thread pool size at run-time
• Abstract threading-model boilerplate code away from RPC code
• [Figure: μTune's automatic load adaptation system sits between the application layer and the RPC/network layers]
• Simple interface: the developer defines only three functions covering the microservice's functionality: ProcessReq(), InvokeLeaf(), FinalizeResp()
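The three developer-supplied hooks might plug into the framework roughly like this. This is a hedged sketch: the real μTune is a C++/gRPC framework, and the `MidTierService` class and `handle` driver below are hypothetical names, with leaf calls stubbed out.

```python
class MidTierService:
    """Developer supplies only the three hooks; the framework owns all threading."""

    def ProcessReq(self, req):
        # Pre-process the front-end request (e.g., parse it and pick leaf shards).
        return {"query": req, "shards": [0, 1]}

    def InvokeLeaf(self, work):
        # Fan the request out to leaf microservers; stubbed here with strings.
        return [f"leaf{s}:{work['query']}" for s in work["shards"]]

    def FinalizeResp(self, leaf_results):
        # Merge leaf responses into the reply sent back to the front-end.
        return "|".join(leaf_results)

def handle(service, req):
    # The framework, not the service code, decides which threads run each stage.
    return service.FinalizeResp(service.InvokeLeaf(service.ProcessReq(req)))

print(handle(MidTierService(), "cat"))  # leaf0:cat|leaf1:cat
```

Because the service code never touches threads, the framework is free to run these stages in-line, dispatched, polling, or blocking without any change to the service.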
μTune: Goals & Challenges

• Simple interface between service code and μTune's threading framework
• Quick load change detection
• Fast threading model switches
• Scaling of thread pools
μTune System Design: Auto-Tuner

• Dynamically picks the threading model & thread pool sizes based on load
• Offline training: create a piecewise linear model mapping request rate to the best threading model (TM) and ideal thread pool sizes, e.g. for the synchronous models:

  Request rate       Best TM   Ideal no. of threads
  0 - 128 QPS        SIP       In-line: one
  . . .
  4096 - 8192 QPS    SDB       NW poller: one; workers: many (e.g. 50); resp. threads: many

  (For asynchronous μTune: AIP at low load; ADB at high load, with few workers (e.g. 4) and few response threads.)

• [Figure: diagrams of the eight threading models (SIB, SIP, SDB, SDP, AIB, AIP, ADB, ADP): per model, the front-end request arrives at the NW socket; either an in-line thread or a network thread (<block> or <poll>) with a task queue and notified workers handles it; the leaf performs the compute; asynchronous models add a separate response thread (<block> or <poll>) on the NW client socket]
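The trained table can be sketched as a piecewise lookup from request rate to a (model, pool sizes) pair. The thresholds and sizes below are illustrative placeholders, not the paper's trained values:

```python
import bisect

# (upper-bound QPS, model, thread-pool sizing) -- illustrative thresholds only.
TRAINED = [
    (128,  "AIP", {"inline": 1}),
    (1024, "ADP", {"nw_poller": 1, "workers": 2, "resp": 2}),
    (8192, "ADB", {"nw_poller": 1, "workers": 4, "resp": 4}),
]
BOUNDS = [row[0] for row in TRAINED]

def pick(qps):
    # Binary-search the piecewise table for the first range containing qps,
    # clamping to the last (highest-load) entry.
    i = min(bisect.bisect_left(BOUNDS, qps), len(TRAINED) - 1)
    return TRAINED[i][1], TRAINED[i][2]

print(pick(100))   # ('AIP', {'inline': 1})
print(pick(5000))  # ('ADB', {'nw_poller': 1, 'workers': 4, 'resp': 4})
```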
• Online, per request from the front-end (over gRPC):
  1. Record the request's arrival in a circular event buffer
  2. Compute the request rate
  3. Send the rate to the switching logic; switch to the best TM and thread pool sizes if needed
  4. ProcessRequest() → InvokeLeafAsync() → request to leaf
  5. Response from leaf → FinalizeResponse() → response to the front-end
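Request-rate detection via the circular event buffer can be sketched as a fixed-size ring of arrival timestamps: the rate is the number of inter-arrival gaps divided by the time span the buffer covers. The buffer size of 5 here is an arbitrary choice for the sketch.

```python
from collections import deque

class RateEstimator:
    def __init__(self, size=5):
        # Fixed-size ring of recent request-arrival timestamps;
        # deque(maxlen=...) drops the oldest entry on overflow.
        self.buf = deque(maxlen=size)

    def record(self, t):
        self.buf.append(t)

    def rate(self):
        # Requests per second over the window the buffer currently spans.
        if len(self.buf) < 2:
            return 0.0
        span = self.buf[-1] - self.buf[0]
        return (len(self.buf) - 1) / span if span > 0 else float("inf")

est = RateEstimator()
for t in (0.0, 0.25, 0.5, 0.75, 1.0):  # one request every 250 ms
    est.record(t)
print(est.rate())  # 4.0
```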
Experimental Setup

• μSuite [Sriraman '18] benchmark suite: a load generator, a mid-tier, and 4 or 16 leaf microservers
• Study μTune's adaptation in two load scenarios: steady-state and transient
Evaluation: μTune's Load Adaptation

[Figure: HDSearch, asynchronous models: 99th percentile tail latency (ms, 0-2) vs. load (20 QPS - 14K QPS); curves for In-line Poll, Dispatch Poll, Dispatch Block, Haque '15, Abdelzaher '99, and μTune; static models saturate (6.17 ms, then ∞) while μTune tracks the best model at every load, winning by up to 1.9x]

• Converges to the best threading model & pool sizes, improving tails by up to 1.9x
• <5% mean overhead
Conclusion

• Taxonomy of threading models: no single threading model has the best tail latency across all loads
• μTune: a threading model framework + load adaptation system
• Achieves up to 1.9x tail latency speedup over the best static model
μTune: Auto-Tuned Threading for OLDI Microservices
Akshitha Sriraman, Thomas F. Wenisch
University of Michigan
https://github.com/wenischlab/MicroTune
Poster number: 29