Software Data Planes: You Can’t Always Spin to Win

Hossein Golestani, Amirhossein Mirhosseini, Thomas F. Wenisch
University of Michigan

ACM Symposium on Cloud Computing (SoCC), November 22, 2019

This work is supported by the Semiconductor Research Corporation (SRC) and DARPA
@ADA_Center, adacenter.org


What’s Up in the Cloud?

• Virtual μs-scale computing era
• Service objectives
  • High throughput
  • Low average/tail latency

[Pictured: high-speed I/O*; microservices; network function virtualization (Server #1: address translation, routing; Server #2: firewall, load balancing); I/O virtualization (VM #1 … VM #n)]

* Image credits: Mellanox, Intel

Software Stacks: Under Revision

• Then vs. now

[Diagram: two arrangements of user app, kernel, CPUs, and I/O devices, contrasting then and now]

• Kernel-bypass architectures (just a handful)
  • Andromeda [NSDI’18]
  • Arrakis [OSDI’14]
  • IX [OSDI’14]
  • mTCP [NSDI’14]
  • ReFlex [ASPLOS’17]
  • Shenango [NSDI’19]
  • Shinjuku [NSDI’19]
  • Snap [SOSP’19]
  • ZygOS [SOSP’17]

Software Data Planes

• Key mechanisms
  • User-level shared queues
  • Spin-polling cores
  • Fast notification by cache coherence write signals
• Widely adopted in industry

[Logo: SPDK, Storage Performance Development Kit]

Spin-polling: Not a Panacea

• An easy-to-use and fast model for communication and signaling
• But far from ideal, especially when scaled
• We show that spin-based data planes:
  • Perform more work when there is less
  • Are not scalable to many cores
  • Are not scalable to many queues
  • Are not well-suited for shared queues

Outline

• Introduction to Software Data Planes
• Methodology
• Characterization of Software Data Plane Challenges
• Solution Directions
• Conclusion

Methodology

• Setup
  • DPDK-based applications
  • Skylake cores
  • 100GbE Mellanox NIC
• Experiments
  1. Inefficiencies of spin-polling
  2. Lack of queue scalability
  3. Impracticality of queue sharing

Inefficiencies of Spin-polling

• Polling “tax”
  • Body of poll loop
• Useless polling on idle queues (possibly causing cache misses)
• Affects throughput scalability with cores

While forever:
    For each RX queue:
        Read packets from RX queue;
        If there are any packets:
            Route packets using LPM*;
            Send packets to TX queue(s);

* LPM: Longest Prefix Match

Polling tax can be 20-28% of total CPU cycles even at 100% load
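The poll loop above can be mimicked in a few lines to make the useless polls on idle queues visible. This is a minimal sketch, not the DPDK code used in the talk: it assumes in-memory deques in place of NIC RX queues and a linear-scan longest-prefix match in place of a real LPM table. The names `lpm_lookup`, `poll_loop`, and the routing table entries are hypothetical and chosen only for illustration.

```python
import ipaddress
from collections import deque


def lpm_lookup(table, dst):
    """Return the port of the longest prefix in `table` that contains `dst`."""
    addr = ipaddress.ip_address(dst)
    best_len, best_port = -1, None
    for prefix, port in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port


def poll_loop(rx_queues, table, passes, burst=32):
    """Run `passes` iterations of the poll loop and count empty polls."""
    stats = {"polls": 0, "empty_polls": 0, "packets": 0}
    tx = []  # (output port, destination) pairs standing in for TX queues
    for _ in range(passes):          # "While forever", bounded for the sketch
        for q in rx_queues:          # "For each RX queue"
            stats["polls"] += 1
            batch = [q.popleft() for _ in range(min(burst, len(q)))]
            if not batch:            # idle queue: this poll did no useful work
                stats["empty_polls"] += 1
                continue
            for dst in batch:        # route with LPM, then "send"
                tx.append((lpm_lookup(table, dst), dst))
            stats["packets"] += len(batch)
    return stats, tx


# Hypothetical routing table and traffic: only the first of four queues is busy.
table = [("0.0.0.0/0", 0), ("10.0.0.0/8", 1), ("10.1.0.0/16", 2)]
queues = [deque(["10.1.2.3", "10.9.9.9", "8.8.8.8"]), deque(), deque(), deque()]
stats, tx = poll_loop(queues, table, passes=2)
print(stats)
print(tx)
```

In this toy run most polls hit empty queues, which is the "more work when there is less" effect in miniature; the sketch does not model cycle counts or cache misses.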

[Diagram: one core serving multiple queues of NIC Port 1 and NIC Port 2]

IPC != Useful Work

• IPC (Instructions Per Cycle) of routing core at varying loads

[Figure: IPC of routing core (y-axis, 0.00 to 2.75) vs. routing throughput in Mpps (x-axis, 0 to 30), for 1, 4, and 8 queues]

IPC decreases as load increases, so the core executes the most instructions when it has the least useful work, resulting in energy inefficiency, fast aging, and severe co-runner interference

Effect on SMT Co-runner

• More (useless) instructions executed in lighter traffic
• Co-running:
  • Matrix mult
  • Spin-based routing (0-100% load)
• Executed on:
  • SMT cores of a physical CPU
  • Different physical CPUs

[Figure: IPC of matrix mult. Not collocated: 2.24; collocated with routing at 0% load: 1.56; collocated with routing at 100% load: 1.54]

Useless spinning wastes execution resources of an SMT co-runner
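The co-runner penalty can be read off the three IPC values on this slide with one line of arithmetic. The sketch below assumes the bars map to the labels in slide order (2.24 not collocated, 1.56 collocated with routing at 0% load, 1.54 collocated with routing at 100% load); the helper name `ipc_loss_percent` is hypothetical.

```python
def ipc_loss_percent(ipc_alone, ipc_collocated):
    """Relative IPC loss of the co-runner, in percent."""
    return 100.0 * (ipc_alone - ipc_collocated) / ipc_alone


alone = 2.24             # matrix mult, not collocated (assumed bar mapping)
with_idle_router = 1.56  # collocated, routing at 0% load
with_busy_router = 1.54  # collocated, routing at 100% load

print(f"idle router: {ipc_loss_percent(alone, with_idle_router):.1f}% IPC loss")
print(f"busy router: {ipc_loss_percent(alone, with_busy_router):.1f}% IPC loss")
```

Under this reading, a router that is spinning on empty queues costs the matrix-mult co-runner roughly 30% of its IPC, nearly the same as a fully loaded router (roughly 31%).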

Lack of Queue Scalability

• Traffic flows spread among multiple queues
• Limited size of CPU caches: a performance antagonist
• Experiment
  • Forwarding packets by a single core
  • Scaling up the number of queues

[Diagram: one core serving multiple queues of NIC Port 1 and NIC Port 2]

Effect on Latency

• Round-trip latency of packet forwarding
• Light traffic (minimal queuing delay)

[Figure: average latency in μs (y-axis, 0 to 25) vs. number of queues (x-axis, 0 to 512)]

Latency is severely affected as queue heads fall out of L1/L2 caches
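Why queue heads "fall out" of a cache can be illustrated with a toy model of a core that polls its queues round-robin. The sketch below assumes a fully associative LRU cache holding one line per queue head, with a made-up capacity of 128 lines; real L1/L2 caches are set-associative and hold other data too, so this shows only the shape of the effect, not the measured numbers. The function name `head_miss_rate` is hypothetical.

```python
from collections import OrderedDict


def head_miss_rate(n_queues, cache_lines, passes=10):
    """Miss rate when polling `n_queues` queue heads round-robin under LRU."""
    cache = OrderedDict()  # keys are queue ids, ordered from LRU to MRU
    misses = accesses = 0
    for _ in range(passes):
        for q in range(n_queues):
            accesses += 1
            if q in cache:
                cache.move_to_end(q)           # hit: mark most recently used
            else:
                misses += 1
                cache[q] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return misses / accesses


for n in (64, 128, 256):
    print(n, head_miss_rate(n, cache_lines=128))
```

With these assumed sizes, 64 or 128 queues miss only on the first pass, while 256 queues miss on every poll: once the polled set exceeds capacity, cyclic access under LRU evicts each head just before it is needed again.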

Effect on Peak Throughput

• Balanced traffic: Passing through all queues
• Unbalanced traffic: Passing through only one queue

[Figure: throughput in Mpps (y-axis, 0 to 40) vs. total number of queues (x-axis, 0 to 512), for balanced and unbalanced traffic]

Cache misses not interleaved with transmits severely hurt peak throughput in unbalanced traffic

Scale-up Queuing Is Impractical

• (a) Scale-out vs. (b) Scale-up queuing (shared queue)
• Scale-up queuing
  • Strong theoretical merits
  • Synchronization disadvantage

[Diagram: (a) cores 1 to n, each with its own queue; (b) cores 1 to n sharing one queue]

Scale-out vs. Scale-up

• Processing hiccups cause head-of-line (HoL) blocking in scale-out
• Round-trip latency with 10 parallel cores
  (a) No hiccups
  (b) 1 μs processing hiccup with 1% probability

[Figure: two panels, (a) and (b), each plotting average latency in μs (y-axis, 0 to 400) vs. throughput in Mpps (x-axis, 0 to 60) for scale-out and scale-up]

Although effective in avoiding HoL blocking, spin-polling in scale-up queuing saturates at lower loads
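The trade-off on this slide can be reproduced with a small queuing recursion. This is a sketch under simplifying assumptions, not the DPDK experiment: scale-out is modeled as round-robin dispatch to per-core FIFO queues, scale-up as one FIFO queue served by whichever core frees up first, and synchronization is reduced to a fixed hypothetical `sync_cost` added to every dequeue from the shared queue. The function name `mean_latency` is also hypothetical.

```python
def mean_latency(arrivals, services, n_cores, shared, sync_cost=0):
    """Mean per-packet latency (queuing + service) for one dispatch policy."""
    free_at = [0] * n_cores  # time at which each core next becomes idle
    total = 0
    for i, (t, s) in enumerate(zip(arrivals, services)):
        if shared:
            # Scale-up: the head of the shared queue goes to the first idle core.
            core = min(range(n_cores), key=free_at.__getitem__)
            s += sync_cost
        else:
            # Scale-out: the packet is bound to one core's private queue.
            core = i % n_cores
        start = max(t, free_at[core])
        free_at[core] = start + s
        total += free_at[core] - t
    return total / len(arrivals)


# One "hiccup": the first packet takes 10 time units instead of 1.
arrivals = [0, 1, 2, 3]
services = [10, 1, 1, 1]
print(mean_latency(arrivals, services, n_cores=2, shared=False))
print(mean_latency(arrivals, services, n_cores=2, shared=True))
print(mean_latency(arrivals, services, n_cores=2, shared=True, sync_cost=1))
```

In this toy trace the packet queued behind the hiccup is what inflates scale-out latency, and the shared queue avoids it. Adding a per-dequeue synchronization cost erodes much of that advantage, which is the direction of the saturation effect reported on the slide.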

Future Data Planes

Solution Direction(s)

• QWAIT, a multi-address monitoring scheme
  • Inspired by x86 MWAIT
  • Avoids polling tax, useless polling, and disruption to SMT co-runners
  • Needs hardware support
  • Programming model similar to select-case in Go

QWAIT (queue_set):
    case queue_1:
        process_queue_1();
    case queue_n:
        process_queue_n();
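The QWAIT programming model can be approximated in software to show what the caller sees: block until any queue in a set has work, then dispatch on which queue it was. The sketch below is only an emulation with a condition variable; the proposal itself relies on hardware support, and nothing here reflects how that hardware would work. `QueueSet`, `push`, and `qwait` are hypothetical names.

```python
import threading
from collections import deque


class QueueSet:
    """A set of queues with a blocking wait for 'any queue is non-empty'."""

    def __init__(self, n_queues):
        self._queues = [deque() for _ in range(n_queues)]
        self._ready = threading.Condition()

    def push(self, qid, item):
        with self._ready:
            self._queues[qid].append(item)
            self._ready.notify()  # wake one waiter instead of letting it spin

    def qwait(self, timeout=None):
        """Block until some queue has an item; return (queue id, item) or None."""
        with self._ready:
            if not self._ready.wait_for(lambda: any(self._queues), timeout):
                return None       # timed out with every queue still empty
            for qid, q in enumerate(self._queues):
                if q:
                    return qid, q.popleft()


# Dispatch table playing the role of the "case queue_i:" arms.
handlers = {
    0: lambda pkt: f"queue_1 handled {pkt}",
    1: lambda pkt: f"queue_n handled {pkt}",
}

qs = QueueSet(2)
producer = threading.Thread(target=qs.push, args=(1, "pkt-a"))
producer.start()
qid, pkt = qs.qwait()  # sleeps until the producer pushes; no polling loop
producer.join()
print(handlers[qid](pkt))
```

The consumer executes no instructions while all queues are empty, which is the property the slide attributes to QWAIT; the lock and wake-up latency of this emulation are far higher than a hardware mechanism would aim for.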


Conclusion

• Key mechanisms of software data planes
  • User-level shared queues
  • Spin-polling cores
• Although easy-to-use and low-latency, software data planes have deficiencies, especially when scaled
• Using DPDK, we quantified these deficiencies:
  • Incurring polling overhead and useless work
  • Not scalable to many cores/queues
  • Not well-suited for scale-up queuing

Thank you!

Q & A