
R2P2: Making RPCs first-class datacenter citizens

Marios Kogias <[email protected]>

Datacenter Communication

• Infrastructure:
  • Clos topology
  • 10/40/100G links
  • Few μs RTTs
  • Kernel-bypass
  • In-network programmability

• Applications:
  • Data-stores, search, etc.
  • Complex fan-in/fan-out patterns
  • Tight tail-latency SLOs
  • Service time variability
  • μs-scale Remote Procedure Calls

[Figure: a load balancer in front of root nodes that fan out to leaf nodes]


Q: What is an RPC?

Q: What is a typical RPC stack?

Q: Identify the layers involved

Q: What is a latency SLO?

[Figure: layers of the stack: Transport, RPC, Application]

Paradigm Mismatch

Multiplexing independent RPCs over a reliable byte-stream, e.g. TCP

[Figure: requests R1 to R4 carried in packets P1 to P3 of one byte-stream]

Problems:

1. Ordering and head-of-line blocking
  • TCP imposes ordering of requests
  • RPCs are independent
  • Lost packets can affect several requests

2. RPC-agnostic network
  • TCP hides RPC semantics
  • Software middleboxes:
    • Deep packet inspection
    • Connection termination, e.g. L7 LB
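The head-of-line blocking problem can be illustrated with a minimal sketch. It assumes a toy model that is not from the talk: each request fits in one packet, and a lost packet arrives only after a fixed retransmission delay. The function name and all timings are hypothetical.

```python
def delivery_times(arrivals, in_order):
    """Return when each request becomes visible to the application.

    arrivals[i] is the time request i's packet finally reaches the receiver,
    including any retransmission delay. With in_order=True the receiver acts
    like a byte-stream: request i is released only after all earlier ones.
    """
    if not in_order:
        return list(arrivals)
    released, latest = [], 0.0
    for t in arrivals:
        latest = max(latest, t)  # cannot overtake an earlier request
        released.append(latest)
    return released


# Hypothetical scenario: four requests sent 1 µs apart with a 10 µs one-way
# delay; the packet carrying request 0 is lost and arrives 200 µs late.
arrivals = [10.0 + 200.0, 11.0, 12.0, 13.0]

stream = delivery_times(arrivals, in_order=True)
pairs = delivery_times(arrivals, in_order=False)
print("byte-stream :", stream)
print("independent :", pairs)
```

In this toy model a single loss delays all four requests on the byte-stream, while with independent request-response pairs only the request whose packet was lost is delayed.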

Outline

• R2P2, a transport protocol for RPCs that exposes the RPC abstraction to the network and enables in-network policy enforcement

• Use case: in-network RPC load balancing over R2P2

• Identify reusable system design principles
  • Suggested reading: Hints for Computer System Design

R2P2: Request Response Pair Protocol

• Independent RR pairs
  • Not connections
  • Not messages

• No protocol-enforced ordering
  • No fate sharing
  • Lost packets only affect the corresponding RR pair

• Per-RPC decisions:
  • Timeout
  • At-least/at-most once

[Figure: Client and server S, with a lost packet marked X]

Hint: Leave it to the client


Client-Server Decoupling

• RR pair identified by:
  • Source IP
  • Source port
  • Request-id

• Break point-to-point RPC semantics
  • Request destination != reply source
  • Per-request policy enforcement

[Figure: Client, Middlebox, and servers S1 and S2]
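A minimal sketch of how a client could match replies using only the (source IP, source port, request-id) triple, so that a reply is accepted no matter which server sent it. The `Client` class, its method names, and the 16-bit wrap-around of the request-id are assumptions made for illustration, not the R2P2 library API.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    ip: str
    port: int
    next_id: int = 0
    pending: dict = field(default_factory=dict)

    def send(self, payload):
        """Register a request and return the key that identifies its RR pair."""
        key = (self.ip, self.port, self.next_id)
        self.next_id = (self.next_id + 1) % 65536  # assumed 16-bit request-id
        self.pending[key] = payload
        return key

    def on_reply(self, key, reply_source, body):
        """Match a reply by key only; the reply source is not part of the key."""
        request = self.pending.pop(key, None)
        if request is None:
            return None  # unknown or duplicate reply
        return (request, reply_source, body)


client = Client("10.0.0.1", 40000)
key = client.send("GET x")
# The request was addressed to the middlebox, but server S2 answers directly.
result = client.on_reply(key, reply_source="S2", body="42")
print(result)
```

Because the destination address is not part of the key, a middlebox is free to pick the server for each request without the client needing to know.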

RPC Policy Enforcement over R2P2

[Figure: message flow between Clients, the Middlebox, and servers S1, S2, ..., SN: REQ0, (REQREADY), (REQN), REPLY, (FEEDBACK)]

• RPC-aware middlebox
• Only policy enforcement
• No IO bottleneck

Hint: Separate normal and worst case


RPC Policies Explored

• Network-based RPC load balancing [ATC 2019]
• Target selection based on request type [ATC 2019]
• And more to discuss later…

Request-level Load Balancing over TCP

[Figure: Clients, a Middlebox, and servers S1, S2, ..., SN]

Vanilla Request-Level Load Balancing

• L7 load balancing
  • e.g. NGINX reverse proxy
• Terminate client connections
  • Open other connections to the servers

• 4 servers x 16 threads
• HTTP-based RPC
• S = 25 μs, exponential distribution
• Max throughput 2.56 MRPS
• NGINX with Join-Shortest-Queue

[Figure: 99th percentile latency (μs, 0 to 500) vs. load (MRPS, 0 to 2.5); curve: NGINX-JSQ]

L7 load balancers suffer from the mismatch and become IO bottlenecks
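The quoted maximum throughput follows from the setup: 4 servers x 16 threads give 64 workers, and each worker completes on average one request per mean service time S. A small sketch of that arithmetic (the function name is illustrative):

```python
def max_throughput_mrps(servers, threads_per_server, mean_service_time_us):
    """Upper bound on throughput in millions of requests per second (MRPS)."""
    workers = servers * threads_per_server
    # One request per S microseconds per worker; 1 request/µs equals 1 MRPS.
    return workers / mean_service_time_us


print(max_throughput_mrps(4, 16, 25))  # the S = 25 µs setup
print(max_throughput_mrps(4, 16, 10))  # the S = 10 µs setup used in the JBSQ evaluation
```

The first call reproduces the 2.56 MRPS stated on the slide. The second value, 6.4 MRPS, is derived with the same formula and is not stated in the slides.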

In-network Request-level Load Balancing

• Software DPDK R2P2 router
  • 5 μs latency overhead
  • IOPS-bottlenecked with 2 cores
• P4 dataplane on Barefoot Tofino
  • 1 μs latency overhead

• 4 servers x 16 threads
• S = 25 μs, exponential distribution
• Max throughput 2.56 MRPS

[Figure: 99th percentile latency (μs, 0 to 500) vs. load (MRPS, 0 to 2.5); curves: NGINX-JSQ, RANDOM]

Can we do better...?

In-network Request-level Load Balancing

• Software DPDK R2P2 router
  • 5 μs latency overhead
  • IOPS-bottlenecked with 2 cores
• P4 dataplane on Barefoot Tofino
  • 1 μs latency overhead

• 4 servers x 16 threads
• S = 25 μs, exponential distribution
• Max throughput 2.56 MRPS

[Figure: 99th percentile latency (μs, 0 to 500) vs. load (MRPS, 0 to 2.5); curves: NGINX-JSQ, RANDOM, SW-JBSQ(3)]

Yes, we can!

Q: Why do the 3 curves perform differently?

Request-level Load-Balancing

[Figure: a Dispatcher in front of servers S1, S2, ..., SN, drawn once for each of the two queueing models]

N x M/G/1:
• Transient load imbalance
• Scalable throughput
• Equivalent to L4 load balancing

M/G/N:
• Better tail latency
• Communication overhead
• Could be implemented as L7 load balancing

[Figure: panels "Theory" and "Practice"; x-axis: Load; curves: N x M/G/1, M/G/N]
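The tail-latency gap between the two models can be reproduced with a small simulation. This is a sketch under stated assumptions: Poisson arrivals, exponentially distributed service times with mean 1, random queue selection for the N x M/G/1 case, FCFS for the shared queue, and zero communication overhead. The function name and parameter values are illustrative.

```python
import heapq
import random


def p99_latency(num_workers, load, single_queue, num_requests=20000, seed=1):
    """99th percentile sojourn time, in units of the mean service time."""
    rng = random.Random(seed)
    arrival_rate = load * num_workers  # mean service time is 1.0
    free_at = [0.0] * num_workers      # when each worker next becomes idle
    now, latencies = 0.0, []
    for _ in range(num_requests):
        now += rng.expovariate(arrival_rate)  # Poisson arrivals
        service = rng.expovariate(1.0)        # exponential service time
        if single_queue:
            # M/G/N, FCFS: the head of the shared queue takes the first idle worker.
            start = max(now, heapq.heappop(free_at))
            heapq.heappush(free_at, start + service)
        else:
            # N x M/G/1: pick a queue at random, blind to its current backlog.
            w = rng.randrange(num_workers)
            start = max(now, free_at[w])
            free_at[w] = start + service
        latencies.append(start + service - now)
    latencies.sort()
    return latencies[int(0.99 * len(latencies))]


split = p99_latency(16, 0.8, single_queue=False)
shared = p99_latency(16, 0.8, single_queue=True)
print(f"N x M/G/1 p99: {split:.1f}   M/G/N p99: {shared:.1f}")
```

The shared queue wins in this idealized setting because no request waits while a worker sits idle. The simulation deliberately ignores the communication overhead that makes a single queue costly in practice.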


Challenge

How can we implement RPC loadbalancing with single-queue performance across multiple servers while achieving high throughput and low latency?

Join-Bounded-Shortest-Queue JBSQ(n)

• Split-queue model
  • One central "unbounded" queue
  • Several distributed bounded queues
• Delay scheduling decision for better placement
• Trade-off
  • Throughput
    • High n can lead to bad placement
  • Tail-latency
    • Small n exposes the communication overhead

[Figure: example with n = 2]

Always think about the trade-offs

JBSQ RPC Load Balancing on R2P2

• Central queue of REQ0s in the middlebox
• Middlebox maintains #outstanding RPCs per server
• Feedback messages for each completed RPC
• Middlebox: software (DPDK) or hardware (P4)

[Figure: message flow between Clients, the Middlebox, and servers S1, S2, ..., SN: REQ0, (REQREADY), (REQN), REPLY, (FEEDBACK)]
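The dispatcher logic described here (a central queue, a per-server count of outstanding RPCs bounded by n, and a feedback message per completed RPC) can be sketched as follows. The class and method names are hypothetical, and this is neither the DPDK router nor the P4 program from the R2P2 repository.

```python
from collections import deque


class JBSQDispatcher:
    """Toy JBSQ(n): at most `bound` outstanding requests per server."""

    def __init__(self, num_servers, bound):
        self.bound = bound
        self.outstanding = [0] * num_servers  # outstanding RPCs per server
        self.central = deque()                # central queue of waiting requests

    def on_request(self, request):
        """Enqueue a request; return (server, request) if one is dispatched."""
        self.central.append(request)
        return self._dispatch()

    def on_feedback(self, server):
        """A server finished one RPC; a queued request may now be placed."""
        self.outstanding[server] -= 1
        return self._dispatch()

    def _dispatch(self):
        if not self.central:
            return None
        # Shortest bounded queue; ties go to the lowest server index.
        server = min(range(len(self.outstanding)),
                     key=self.outstanding.__getitem__)
        if self.outstanding[server] >= self.bound:
            return None  # every bounded queue is full: keep waiting centrally
        self.outstanding[server] += 1
        return server, self.central.popleft()


d = JBSQDispatcher(num_servers=2, bound=2)
placed = [d.on_request(f"r{i}") for i in range(5)]
print(placed)             # the fifth request stays in the central queue
print(d.on_feedback(1))   # a completion on server 1 releases it
```

With two servers and n = 2, the first four requests are placed immediately and the fifth is held until a feedback message frees a slot, which is the delayed scheduling decision the policy relies on.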

JBSQ Evaluation

• 4 servers (DPDK) x 16 threads
• S = 10 μs, exponential distribution
• 4-byte packets over R2P2

[Figure: 99th percentile latency (μs, 0 to 150) vs. load (MRPS, 0 to 6); curves: RANDOM, M/G/64]

JBSQ Evaluation

• 4 servers (DPDK) x 16 threads
• S = 10 μs, exponential distribution
• 4-byte packets over R2P2

[Figure: 99th percentile latency (μs, 0 to 150) vs. load (MRPS, 0 to 6); curves: RANDOM, SW-JBSQ(1), M/G/64]

Q: What will the SW-JBSQ(1) curve look like, and why?

n = 1 is not enough to saturate throughput

Q: What can we do to get more throughput?

JBSQ Evaluation

• 4 servers (DPDK) x 16 threads
• S = 10 μs, exponential distribution
• 4-byte packets over R2P2

[Figure: 99th percentile latency (μs, 0 to 150) vs. load (MRPS, 0 to 6); curves: RANDOM, SW-JBSQ(1), SW-JBSQ(5), M/G/64]

Max throughput under SLO

JBSQ Evaluation

• 4 servers (DPDK) x 16 threads
• S = 10 μs, exponential distribution
• 4-byte packets over R2P2

[Figure: 99th percentile latency (μs, 0 to 150) vs. load (MRPS, 0 to 6); curves: RANDOM, SW-JBSQ(1), SW-JBSQ(5), P4-JBSQ(3), M/G/64]

More efficient HW implementation

Smaller n is better for tail-latency

Alternative Policies

[Figure: R2P2 header layout (bit offsets 0 to 16); fields: Magic, Header Size, MessageType, Policy, F, L, Reserved, ReqId, PacketId/Packet Count]

• Policy field in R2P2 header
• Existing policies:
  • ROUTE_ANY
  • ROUTE_FIXED
• Alternative policies:
  • STICKY
  • HASH
  • Etc.

Redis

• KV-store
• Master-Slave replication
  • SETs only go to Master
  • GETs are load balanced
• 3+1 DPDK servers
• USR Facebook workload
• Baseline:
  • Linux TCP

[Figure: two plots of 99th percentile latency (μs, 0 to 300) vs. load (MRPS, 0 to 2.5 and 0 to 2.0); curves: TCP-DIRECT, RANDOM, SW-JBSQ(20); annotations: 5.3x, 4.09x, 4.8x, 5.6x]

Observations:
• R2P2 and DPDK increase throughput
• Scheduling benefits are more significant as service time variability increases
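Target selection by request type for this Redis setup can be sketched as below. Mapping SET to ROUTE_FIXED and GET to ROUTE_ANY is an assumption based on the policy names in the talk, and the `route` function, the server names, and the set of servers eligible for reads are hypothetical.

```python
def route(command, master, read_targets):
    """Return (policy, candidate servers) for a Redis-style command string."""
    verb = command.split(None, 1)[0].upper()
    if verb == "SET":
        return ("ROUTE_FIXED", [master])          # writes only go to the master
    if verb == "GET":
        return ("ROUTE_ANY", list(read_targets))  # reads can be load balanced
    raise ValueError(f"unsupported command: {verb}")


# Hypothetical deployment: which servers may serve reads is an assumption.
master = "master"
read_targets = ["replica1", "replica2", "replica3"]

print(route("SET k v", master, read_targets))
print(route("get k", master, read_targets))
```

The middlebox then only needs the policy and the candidate list; a scheduler such as JBSQ can pick among the candidates for the load-balanced case.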

R2P2 and JBSQ vs ZygOS

• 1 server x 16 threads
• S = 10 μs, exponential distribution
• 64-byte packets

[Figure: 99th percentile latency (μs, 0 to 150) vs. load (MRPS, 0 to 1.6); curves: M/M/16, ZygOS, P4-JBSQ(3)]

Lessons Learnt from R2P2

1. Pushing functionality in the network is a viable option

2. Programmable switches can undertake some of this functionality

3. Adding network hops for better scheduling can improve performance

Design Points to Remember

1. Try to properly place functionality in the right layer
  • Can you think of alternative RPC policies / functionality that can be implemented with this new abstraction?

2. Separate normal and worst case
  • Mention other use cases of this hint

3. Leave it to the client
  • Mention other use cases of this hint

Conclusion

• R2P2: transport protocol for RPCs
  • Exposes the RPC abstraction to the network
  • Enables in-network policy enforcement

• In-network RPC load balancing
  • Software/hardware middlebox
  • JBSQ scheduling policy

• Extensible in-network policies

https://github.com/epfl-dcsl/r2p2

Thank you!