
R2P2: Making RPCs first-class datacenter citizens

Marios Kogias <marios.kogias@epfl.ch>

Datacenter Communication

• Infrastructure:
  • Clos topology
  • 10/40/100G links
  • Few µs RTTs
  • Kernel-bypass
  • In-network programmability

• Applications:
  • Data-stores, search, etc.
  • Complex fan-in/fan-out patterns
  • Tight tail-latency SLOs
  • Service time variability
  • µs-scale Remote Procedure Calls

[Figure: Clos datacenter topology with root and leaf switches and a load balancer in front of the servers]

Q: What is an RPC?

Q: What is a typical RPC stack?

Q: Identify the layers involved

Q: What is a latency SLO?

[Figure: a typical RPC stack, with the Application on top of an RPC layer on top of the Transport layer]

Paradigm Mismatch

Multiplexing independent RPCs over a reliable byte-stream, e.g. TCP

[Figure: independent requests and responses multiplexed over a single TCP byte-stream]

Problems:

1. Ordering and head-of-line blocking
   • TCP imposes ordering of requests
   • RPCs are independent
   • Lost packets can affect several requests

2. RPC-agnostic network
   • TCP hides RPC semantics
   • Software middleboxes:
     • Deep packet inspection
     • Connection termination, e.g. L7 LB

Outline

• R2P2, a transport protocol for RPCs that exposes the RPC abstraction to the network and enables in-network policy enforcement

• Use case: In-network RPC loadbalancing over R2P2

• Identify reusable system design principles
• Suggested reading: Hints for Computer System Design

R2P2: Request Response Pair Protocol

• Independent RR pairs
  • Not connections
  • Not messages
• No protocol-enforced ordering
• No fate sharing
  • Lost packets only affect the corresponding RR pair
• Per-RPC decisions (see the sketch below):
  • Timeout
  • At-least-once / at-most-once

[Figure: a client and a server exchanging a request-response pair; a lost packet affects only that pair]

Hint: Leave it to the client
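Because every request-response pair is independent, delivery semantics can live entirely in the client, per RPC. The following is a minimal sketch of that idea over plain UDP; the 4-byte request-id prefix and the function name are assumptions for the example, not R2P2's actual wire format or API.

```python
import socket

def call(server, payload, req_id, timeout=0.2, retries=3, at_most_once=False):
    """Issue one RPC as an independent request-response pair.

    Per-RPC policy lives in the client: the timeout, the number of
    retries, and whether re-execution is allowed (at-least-once) or
    we give up after a single try (at-most-once).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    msg = req_id.to_bytes(4, "big") + payload        # toy framing for the example
    attempts = 1 if at_most_once else retries
    for _ in range(attempts):
        sock.sendto(msg, server)
        try:
            while True:
                data, _ = sock.recvfrom(65535)
                if int.from_bytes(data[:4], "big") == req_id:
                    return data[4:]                  # reply for *this* pair
                # A stale reply belongs to some other pair; ignore it.
        except socket.timeout:
            continue                                 # retry only this RPC
    raise TimeoutError(f"RPC {req_id} failed")
```

A timeout here retries only the affected pair; nothing else is ordered behind it.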

Paradigm Mismatch (slide revisited; same content as above)

Client-Server Decoupling

• RR pair identified by:
  • Source IP
  • Source port
  • Request id

• Break point-to-point RPC semantics
  • Request destination != reply source
  • Per-request policy enforcement

[Diagram: a client sends a request to a middlebox; the reply returns directly from server S1 or S2]
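Because the pair is named by the client's IP, port, and request id rather than by a connection, a reply can arrive from whichever server was picked. A rough sketch of that matching on the client side (the 4-byte request-id framing and function names are mine, not R2P2's):

```python
# Outstanding RPCs are keyed by the triple that names the pair.
# The reply's source address is *not* part of the key, so a reply
# arriving straight from S1 or S2 still matches a request that was
# originally sent to the middlebox.
outstanding = {}   # (client_ip, client_port, req_id) -> request payload

def send_request(sock, middlebox_addr, req_id, payload):
    client_ip, client_port = sock.getsockname()
    outstanding[(client_ip, client_port, req_id)] = payload
    sock.sendto(req_id.to_bytes(4, "big") + payload, middlebox_addr)

def on_reply(sock, data, reply_source):
    req_id = int.from_bytes(data[:4], "big")
    client_ip, client_port = sock.getsockname()
    key = (client_ip, client_port, req_id)
    if key in outstanding:             # matched, no matter who sent the reply
        del outstanding[key]
        return data[4:]
    return None                        # stale or unknown reply: drop it
```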

RPC Policy Enforcement over R2P2

[Figure: message flow; the client sends REQ0 to the middlebox, which forwards it to one of servers S1…SN; optional REQREADY/REQN messages carry the rest of a multi-packet request; the server sends REPLY directly to the client and FEEDBACK to the middlebox]

RPC-aware middlebox:
• Only policy enforcement
• No I/O bottleneck

Hint: Separate normal and worst case

Paradigm Mismatch (slide revisited; same content as above)

RPC Policies explored

• Network-based RPC load balancing [ATC 2019]
• Target selection based on request type [ATC 2019]
• And more to discuss later…

Request-level Load Balancing over TCP

[Figure: clients connect through a middlebox (reverse proxy) to servers S1…SN]

Vanilla Request-Level LoadBalancing

• L7 loadbalancing, e.g. NGINX reverse proxy
• Terminate client connections
• Open other connections to the servers (sketched below)
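For contrast with the in-network approach that follows, here is a bare-bones sketch of what "terminate and re-open connections" means for the data path. It is a per-connection relay rather than a full per-request HTTP proxy (so it is not how NGINX works internally), and the backend addresses are hypothetical, but the key property is the same: every request and every reply byte crosses the proxy, which is what makes it an I/O bottleneck.

```python
import itertools, socket, threading

# Hypothetical backend addresses; a real deployment would discover these.
SERVERS = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]
backends = itertools.cycle(SERVERS)

def pump(src, dst):
    # Every byte of every request and reply is copied through the proxy.
    while chunk := src.recv(4096):
        dst.sendall(chunk)
    dst.close()

def handle(client_conn):
    # Terminate the client's TCP connection here and open a fresh one to a server.
    server_conn = socket.create_connection(next(backends))
    threading.Thread(target=pump, args=(client_conn, server_conn), daemon=True).start()
    pump(server_conn, client_conn)

listener = socket.create_server(("0.0.0.0", 8080))
while True:
    conn, _ = listener.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()
```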

[Plot: 99th-percentile latency (µs) vs load (MRPS), 0 to 2.5 MRPS, for NGINX-JSQ]

• 4 servers x 16 threads
• HTTP-based RPC
• S = 25 µs exponential service-time distribution
• Max throughput 2.56 MRPS
• NGINX with Join-Shortest-Queue

L7 loadbalancers suffer from the mismatch and become I/O bottlenecks.

In-network Request-level Loadbalancing

• Software DPDK R2P2 router
  • 5 µs latency overhead
  • IOPS-bottlenecked with 2 cores
• P4 dataplane on Barefoot Tofino
  • 1 µs latency overhead

• 4 servers x 16 threads
• S = 25 µs exponential service-time distribution
• Max throughput 2.56 MRPS

[Plot: 99th-percentile latency (µs) vs load (MRPS), 0 to 2.5 MRPS, for NGINX-JSQ and RANDOM]

Can we do better...?

In-network Request-level Loadbalancing (continued)

Same router implementations and experimental setup as above.

[Plot: 99th-percentile latency (µs) vs load (MRPS), 0 to 2.5 MRPS, for NGINX-JSQ, RANDOM, and SW-JBSQ(3)]

Yes, we can!

Q: Why do the 3 curves perform differently?

Request-level Load-Balancing

[Figure: two dispatcher models; a dispatcher feeding per-server queues (N x M/G/1) vs a single shared queue drained by all servers (M/G/N)]

• N x M/G/1
  • Transient load-imbalance
  • Scalable throughput
  • Equivalent to L4 loadbalancing

• M/G/N (a small simulation contrasting the two models follows below)
  • Better tail-latency
  • Communication overhead
  • Could be implemented as L7 loadbalancing

[Sketch: tail latency vs load for N x M/G/1 and M/G/N, in theory and in practice]
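The gap between the two models is easy to see with a toy simulation (a sketch for intuition, not the paper's methodology): Poisson arrivals at 80% load are served either from one shared FCFS queue (the M/G/N case, here with exponential service) or from per-server queues with random assignment (N x M/G/1).

```python
import heapq, random

def p99_wait(shared_queue, n_servers=64, service_us=10.0, load=0.8, n_reqs=200_000):
    """99th-percentile waiting time (µs) under Poisson arrivals, exponential service."""
    random.seed(0)
    arrival_rate = load * n_servers / service_us        # requests per µs
    t, waits = 0.0, []
    free = [0.0] * n_servers                            # per-server "free at" times
    for _ in range(n_reqs):
        t += random.expovariate(arrival_rate)
        if shared_queue:
            # M/G/N: one FCFS queue; the request goes to whichever server frees first.
            soonest = heapq.heappop(free)
            start = max(t, soonest)
            heapq.heappush(free, start + random.expovariate(1 / service_us))
        else:
            # N x M/G/1: pick a server at random, queue behind whatever it already has.
            s = random.randrange(n_servers)
            start = max(t, free[s])
            free[s] = start + random.expovariate(1 / service_us)
        waits.append(start - t)
    waits.sort()
    return waits[int(0.99 * len(waits))]

print("single shared queue  p99 wait (µs):", p99_wait(shared_queue=True))   # small
print("random per-server    p99 wait (µs):", p99_wait(shared_queue=False))  # much larger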

Challenge

How can we implement RPC loadbalancing with single-queue performance across multiple servers while achieving high throughput and low latency?

Join-Bounded-Shortest-Queue JBSQ(n)

• Split-queue model
  • One central "unbounded" queue
  • Several distributed bounded queues
• Delay the scheduling decision for better placement
• Trade-off:
  • Throughput: small n exposes the communication overhead
  • Tail-latency: high n can lead to bad placement

[Figure: JBSQ(2); a central unbounded queue feeding per-server queues bounded at two outstanding requests]

Always think about the trade-offs

JBSQ RPC Load Balancing on R2P2

• Central queue of REQ0s in the middlebox
• Middlebox maintains the number of outstanding RPCs per server
• Feedback messages for each completed RPC

[Figure: same message flow as before; REQ0 from the clients to the middlebox and on to a server S1…SN, optional REQREADY/REQN, REPLY directly to the client, FEEDBACK back to the middlebox]

Middlebox implemented in software (DPDK) or hardware (P4).
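A compressed sketch of the middlebox logic just described (class and method names are mine; the real router is DPDK C or P4, not Python): REQ0s wait in the single central queue, each server is allowed at most n outstanding RPCs, and a FEEDBACK message frees a slot and may release the next queued request.

```python
from collections import deque

class JBSQMiddlebox:
    """Join-Bounded-Shortest-Queue dispatch over R2P2-style messages (sketch)."""

    def __init__(self, servers, bound):
        self.bound = bound                        # the "n" in JBSQ(n)
        self.outstanding = {s: 0 for s in servers}
        self.central_queue = deque()              # unbounded queue of REQ0s

    def on_req0(self, req, forward):
        self.central_queue.append(req)
        self._dispatch(forward)

    def on_feedback(self, server, forward):
        self.outstanding[server] -= 1             # one bounded slot freed
        self._dispatch(forward)

    def _dispatch(self, forward):
        # Send queued REQ0s to the least-loaded server, but never let any
        # server exceed its bound; otherwise delay the decision.
        while self.central_queue:
            server = min(self.outstanding, key=self.outstanding.get)
            if self.outstanding[server] >= self.bound:
                return
            self.outstanding[server] += 1
            forward(self.central_queue.popleft(), server)

# Example: JBSQ(3) over four servers.
mb = JBSQMiddlebox(servers=["S1", "S2", "S3", "S4"], bound=3)
mb.on_req0("req-0", forward=lambda req, s: print(f"{req} -> {s}"))
```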

JBSQ Evaluation

• 4 servers (DPDK) x 16 threads
• S = 10 µs exponential service-time distribution
• 4-byte packets over R2P2

[Plot: 99th-percentile latency (µs) vs load (MRPS), 0 to 6 MRPS, for RANDOM and M/G/64]

JBSQ Evaluation (continued)

Same setup as above.

[Plot: 99th-percentile latency (µs) vs load (MRPS) for RANDOM, SW-JBSQ(1), and M/G/64]

n = 1 is not enough to saturate throughput


Q: How will the SW-JBSQ(1) curve look, and why?

Q: What can we do to get more throughput?

JBSQ Evaluation (continued)

Same setup as above.

[Plot: 99th-percentile latency (µs) vs load (MRPS) for RANDOM, SW-JBSQ(1), SW-JBSQ(5), and M/G/64]

Max throughput under SLO

JBSQ Evaluation (continued)

Same setup as above.

[Plot: 99th-percentile latency (µs) vs load (MRPS) for RANDOM, SW-JBSQ(1), SW-JBSQ(5), P4-JBSQ(3), and M/G/64]

More efficient HW implementation

Smaller n is better for tail-latency

Alternative Policies

[Figure: R2P2 header layout; fields include Magic, Header Size, MessageType, Policy, flags (F, L), ReqId, and PacketId / Packet Count]

• Policy field in the R2P2 header
• Existing policies:
  • ROUTE_ANY
  • ROUTE_FIXED
• Alternative policies:
  • STICKY
  • HASH
  • etc.
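As a rough illustration of how a per-request policy field could drive forwarding decisions in the middlebox (the numeric encodings and the STICKY/HASH handling here are assumptions for the example, not R2P2's wire format):

```python
import zlib

# Hypothetical encodings for the policies named on the slide.
ROUTE_ANY, ROUTE_FIXED, STICKY, HASH = range(4)

def pick_server(policy, req, servers, outstanding):
    """Choose a destination for one request based on its policy field (sketch).

    `servers` is a list of server ids, `outstanding` maps server -> queued RPCs.
    """
    if policy == ROUTE_ANY:        # middlebox may load-balance freely (e.g. reads)
        return min(servers, key=lambda s: outstanding[s])
    if policy == ROUTE_FIXED:      # must reach a specific server (e.g. writes to a master)
        return req["target"]
    if policy == STICKY:           # pin one client to one server across requests
        return servers[zlib.crc32(req["client"].encode()) % len(servers)]
    if policy == HASH:             # place by request key, e.g. for caching
        return servers[zlib.crc32(req["key"].encode()) % len(servers)]
    raise ValueError(f"unknown policy {policy}")

# Example: a write pinned to the master, a read free to go anywhere.
servers = ["master", "replica1", "replica2"]
load = {s: 0 for s in servers}
print(pick_server(ROUTE_FIXED, {"target": "master"}, servers, load))
print(pick_server(ROUTE_ANY, {}, servers, load))
```

The Redis experiment on the next slide uses exactly this split: SETs are pinned to the master while GETs are free to be load-balanced.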

Redis

• KV-store
• Master-slave replication
  • SETs only go to the master
  • GETs are loadbalanced
• 3+1 DPDK servers
• USR Facebook workload
• Baseline: Linux TCP

[Plots: 99th-percentile latency (µs) vs load (MRPS) for TCP-DIRECT, RANDOM, and SW-JBSQ(20); annotated speedups: 4.09x, 4.8x, 5.3x, 5.6x]

Observations:
• R2P2 and DPDK increase throughput
• Scheduling benefits are more significant as service time variability increases

R2P2 and JBSQ vs ZygOS

[Plot: 99th-percentile latency (µs) vs load (MRPS), 0 to 1.6 MRPS, for ZygOS, P4-JBSQ(3), and M/M/16]

• 1 server x 16 threads
• S = 10 µs exponential service-time distribution
• 64-byte packets

Lessons Learnt from R2P2


1. Pushing functionality in the network is a viable option

2. Programmable switches can undertake some of this functionality

3. Adding network hops for better scheduling can improve performance

Design Points to Remember

1. Try to place functionality in the right layer
   • Can you think of alternative RPC policies or functionality that could be implemented with this new abstraction?
2. Separate normal and worst case
   • Mention other use cases of this hint
3. Leave it to the client
   • Mention other use cases of this hint

Conclusion

• R2P2: a transport protocol for RPCs
  • Exposes the RPC abstraction to the network
  • Enables in-network policy enforcement
• In-network RPC loadbalancing
  • Software/hardware middlebox
  • JBSQ scheduling policy
• Extensible in-network policies

https://github.com/epfl-dcsl/r2p2


Thank you!