-
μTune: Auto-Tuned Threading for OLDI Microservices
Akshitha Sriraman, Thomas F. Wenisch, University of Michigan
OSDI 2018, 10/08/2018
-
OLDI: From Monoliths to Microservices
[Diagram: a monolithic service with a > 100 ms SLO vs. the same functionality decomposed into microservices with sub-ms SLOs that scale out and communicate via RPC. A 20 μs system overhead barely moves a 300 ms monolith (300.02 ms), but inflates a 100 μs microservice to 120 μs, a 20% increase.]
Sub-ms-scale system overheads must be characterized for microservices
-
Impact of Threading on Microservices
• Our focus: sub-ms overheads due to threading design (e.g., lock contention, thread wakeups, spurious context switches)
• The threading model that exhibits the best tail latency is a function of load
[Diagram: the better choice between polling and blocking designs differs between low load and high load.]
-
Contributions
• A taxonomy of threading models
  – Structured understanding of threading implications
  – Reveals tail latency inflection points across load
  – The model that sustains peak load is worse at low load
• μTune:
  – Uses tail inflection insights to optimize tail latency
  – Tunes threading model & thread pool size across load
  – Simple interface: abstracts the threading model from RPC code
Up to 1.9x tail latency improvement over a static, throughput-optimized model
-
Outline
• Motivation
• A taxonomy of threading models
• μTune:
  – Simple interface design
  – Automatic load adaptation system
• Evaluation
-
Mid-Tier Faces More Threading Overheads
[Diagram: a front-end microserver sends RPCs to a mid-tier microserver, which fans them out to leaf microservers 1 and 2.]
• Mid-tier: subject to more threading overheads
  – Manages RPC fan-out to many leaves
  – RPC-layer interactions dominate computation (see the sketch below)
Threading overheads must be characterized for mid-tier microservices
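A minimal C++ sketch of this fan-out pattern (LeafClient and the merge logic are hypothetical stand-ins, not μSuite or μTune code): nearly all of the mid-tier handler's time goes to RPC-layer interaction with the leaves rather than local compute.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a blocking leaf RPC stub (e.g., a unary call).
struct LeafClient {
  std::string Query(const std::string& req) { return "partial:" + req; }
};

// Mid-tier handler: fan the request out to every leaf, then merge.
std::string HandleAtMidTier(const std::string& request,
                            std::vector<LeafClient>& leaves) {
  std::string merged;
  for (auto& leaf : leaves) {        // RPC fan-out: one call per leaf
    merged += leaf.Query(request);   // network + threading overhead lives here
    merged += ';';
  }
  return merged;                     // local compute is just this tiny merge
}
```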
-
A Taxonomy of Threading Models
Three design choices, applied at the mid-tier between the front-end and leaf RPCs:
(1) Poll vs. Block: how threads await new requests on the NW socket
(2) In-line vs. Dispatch: whether the network poller handles the request itself or hands it to a worker
(3) Synchronous vs. Asynchronous: how leaf RPCs and their responses are handled

The three choices yield eight models (two of which are sketched below):

                Block   Poll
Synchronous
  In-line       SIB     SIP
  Dispatch      SDB     SDP
Asynchronous
  In-line       AIB     AIP
  Dispatch      ADB     ADP

[Diagram: request flow from front-end to mid-tier to leaf, annotated with the NW socket, network poller, worker, and response paths for each choice.]
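To make two corners of this taxonomy concrete, here is a minimal C++ sketch (all names are hypothetical; a thread-safe in-memory queue stands in for the NW socket, and shutdown handling is omitted) contrasting Synchronous In-line Poll (SIP), where one thread busy-polls and handles each request itself, with Synchronous Dispatch Block (SDB), where a blocking network thread dispatches requests to a blocking worker pool.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

struct Request { int id; };

// Thread-safe queue standing in for the NW socket: TryPop() for polling
// models, PopBlocking() for blocking models.
class RequestQueue {
 public:
  void Push(Request r) {
    { std::lock_guard<std::mutex> lk(mu_); q_.push(r); }
    cv_.notify_one();
  }
  std::optional<Request> TryPop() {        // non-blocking receive
    std::lock_guard<std::mutex> lk(mu_);
    if (q_.empty()) return std::nullopt;
    Request r = q_.front(); q_.pop(); return r;
  }
  Request PopBlocking() {                  // blocking receive
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    Request r = q_.front(); q_.pop(); return r;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Request> q_;
};

// Placeholder for ProcessReq + synchronous leaf RPCs + FinalizeResp.
void HandleSynchronously(const Request&) {}

// SIP: one in-line thread busy-polls the socket and handles each request
// itself. Lowest latency at low load (no wakeups) but burns a core when idle.
void SyncInlinePollLoop(RequestQueue& socket) {
  for (;;) {
    if (auto r = socket.TryPop()) HandleSynchronously(*r);  // busy-poll
  }
}

// SDB: a network thread blocks on the socket and dispatches each request to
// a task queue; blocked workers handle them. No wasted CPU at low load, but
// every request pays a thread wakeup before it is processed.
void SyncDispatchBlockLoop(RequestQueue& socket, int num_workers) {
  static RequestQueue tasks;               // static: outlives the workers
  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i)
    workers.emplace_back([] { for (;;) HandleSynchronously(tasks.PopBlocking()); });
  for (;;) tasks.Push(socket.PopBlocking());  // block on socket, then dispatch
}
```

The same skeleton extends to the other models: swapping TryPop() for PopBlocking() toggles poll vs. block, and removing the task queue toggles dispatch vs. in-line.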
-
Latency Tradeoffs Across Threading Models
[Plot: 99th-percentile tail latency (ms, 0–2) vs. load (10–10,000 QPS) for HDSearch's synchronous models: In-line Block, In-line Poll, Dispatch Block, Dispatch Poll; a 1.4x gap is annotated at low load, and latencies climb toward saturation.]
In-line Poll has the lowest low-load latency: it avoids thread wakeup delays
-
Latency Tradeoffs Across Threading Models
[Plot: the same HDSearch synchronous-model curves, highlighting behavior as load grows.]
In-line Poll faces contention; Dispatch Poll with one network poller is best
-
Latency Tradeoffs Across Threading Models
[Plot: the same curves at high load; at least one model's tail latency grows unbounded (∞) as load approaches saturation.]
Dispatch Block is best at high load as it does not waste CPU
-
Latency Tradeoffs Across Threading Models
[Plot: the same HDSearch synchronous-model curves across the full load range; no curve is lowest everywhere.]
No single threading model works best at all loads
-
Need for Automatic Load Adaptation: μTune
• Threading choice can significantly affect tail latency
• Threading latency trade-offs are not obvious
• Most software faces latency penalties due to static threading
Opportunity: exploit trade-offs among threading models at run-time
-
• Load adaptation: vary the threading model & thread pool size at run-time
• Abstract threading-model boilerplate code from the RPC code
Microservice functionality: the developer supplies ProcessReq(), InvokeLeaf(), FinalizeResp()
[Diagram: the μTune automatic load adaptation system is interposed among the app, RPC, and network layers.]
Simple interface: the developer defines only three functions (a sketch follows below)
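A minimal sketch of what that three-function interface could look like in C++ (the types and signatures here are assumptions for illustration, not the real μTune API; the framework, not the developer, decides which of the eight threading models runs these hooks):

```cpp
#include <string>
#include <vector>

// Hypothetical request/response types; the real interface is gRPC-based.
struct LeafRequest  { std::string payload; };
struct LeafResponse { std::string payload; };

// The developer implements only these three hooks; the threading framework
// owns poll/block, in-line/dispatch, and sync/async decisions and calls them.
class MidTierHandler {
 public:
  virtual ~MidTierHandler() = default;

  // 1) Turn the front-end request into per-leaf sub-requests (fan-out).
  virtual std::vector<LeafRequest> ProcessReq(const std::string& request) = 0;

  // 2) Issue the leaf RPCs; kept free of threading boilerplate so the
  //    framework can run it under any threading model.
  virtual void InvokeLeaf(const std::vector<LeafRequest>& leaf_requests) = 0;

  // 3) Merge the leaf responses into the reply for the front end.
  virtual std::string FinalizeResp(
      const std::vector<LeafResponse>& leaf_responses) = 0;
};
```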
-
μTune: Goals & Challenges
[Diagram: service code runs on top of μTune's threading framework.]
• Simple interface
• Quick load-change detection
• Fast threading-model switches
• Scaling thread pools
-
μTune System Design: Auto-Tuner
• Dynamically picks the threading model & thread pool sizes based on load
• Offline training: create a piecewise linear model mapping request rate to the best threading model (TM) and ideal thread pool sizes, e.g.:

  Request rate      Best TM   Ideal no. of threads
  0 – 128 QPS       SIP       In-line: one
  . . .
  4096 – 8192 QPS   SDB       NW poller: one; Workers: many (e.g., 50); Resp. threads: many

  Request rate      Best TM   Ideal no. of threads (asynchronous μTune)
  0 – 128 QPS       AIP       In-line: one
  . . .
  4096 – 8192 QPS   ADB       NW poller: one; Workers: few (e.g., 4); Resp. threads: few

• Online (for each request from the front end, received via gRPC; a code sketch follows below):
  1. Record the arrival in a circular event buffer
  2. Compute the current request rate
  3. Send the rate to the switching logic, which switches to the best TM and thread pool sizes if needed
  4. ProcessRequest(), then InvokeLeafAsync() forwards the request to the leaf
  5. FinalizeResponse() consumes the leaf's response and returns the response to the front end

[Figure: (a) the µTune framework spanning the eight threading models (SIB, SIP, SDB, SDP, AIB, AIP, ADB, ADP); (b) async µTune's automatic load adaptation system, with numbered steps from request arrival to response.]
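A minimal sketch of the online adaptation loop (hypothetical names and table values, not the real μTune code): arrivals go into a circular event buffer, the request rate is estimated over that window, and an offline-trained table is consulted to decide whether to switch the threading model and resize thread pools.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Threading models from the taxonomy (only a few listed here).
enum class ThreadingModel { SIP, SDP, SDB };

// One row of the offline-trained piecewise table.
struct TuningEntry {
  double max_qps;        // upper edge of this load bucket
  ThreadingModel model;  // best threading model for the bucket
  int num_workers;       // illustrative pool size
};

// Illustrative table in the spirit of the slide: SIP with one in-line thread
// at low load, SDB with a large worker pool near peak load.
constexpr std::array<TuningEntry, 3> kTable{{
    {128.0, ThreadingModel::SIP, 1},
    {4096.0, ThreadingModel::SDP, 8},
    {8192.0, ThreadingModel::SDB, 50},
}};

class LoadAdapter {
 public:
  // Called on every request arrival from the front end.
  void RecordArrival() {
    arrivals_[head_++ % kWindow] = Clock::now();
    if (count_ < kWindow) ++count_;
  }

  // Request rate estimated over the arrivals held in the circular buffer.
  double EstimateQps() const {
    if (count_ < 2) return 0.0;
    auto oldest = arrivals_[(head_ - count_) % kWindow];
    auto newest = arrivals_[(head_ - 1) % kWindow];
    double secs = std::chrono::duration<double>(newest - oldest).count();
    return secs > 0.0 ? (count_ - 1) / secs : 0.0;
  }

  // Pick the table row for the current load; the caller switches the
  // threading model and resizes pools only if this choice changed.
  TuningEntry Choose() const {
    double qps = EstimateQps();
    for (const auto& entry : kTable)
      if (qps <= entry.max_qps) return entry;
    return kTable.back();
  }

 private:
  using Clock = std::chrono::steady_clock;
  static constexpr std::size_t kWindow = 128;  // circular event buffer size
  std::array<Clock::time_point, kWindow> arrivals_{};
  std::size_t head_ = 0;
  std::size_t count_ = 0;
};
```

A caller would invoke RecordArrival() on every request and periodically call Choose(), reconfiguring the threading framework only when the returned entry differs from the current one.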
-
Experimental Setup
• μSuite [Sriraman '18] benchmark suite:
  – Load generator, a mid-tier, and 4 or 16 leaf microservers
• Study μTune's adaptation in two load scenarios:
  – Steady-state
  – Transient
-
Evaluation: μTune's Load Adaptation
[Plot: 99th-percentile tail latency (ms, 0–2) vs. load (20 QPS to 14K QPS) for HDSearch async., comparing In-line Poll, Dispatch Poll, Dispatch Block, Haque '15, Abdelzaher '99, and μTune; annotations mark a 1.9x gap, a 6.17 ms point, and saturation (∞) for some configurations.]
μTune converges to the best threading model & pool sizes, improving tails by up to 1.9x
-
Conclusion
• Taxonomy of threading models
  – No single threading model has the best tail latency across all loads
• μTune: a threading-model framework + load adaptation system
• Achieved up to 1.9x tail speedup over the best static model
-
μTune: Auto-‐Tuned Threading for OLDI Microservices
Akshitha Sriraman, Thomas F. Wenisch
University of Michigan
https://github.com/wenischlab/MicroTune
Poster number: 29