Using Load-Balancing To Build High-Performance Routers

Isaac Keslassy, Shang-Tse (Da) Chuang, Nick McKeown

Stanford University

Typical Router Architecture

[Figure: inputs at rate R feed a switch fabric controlled by a scheduler, which feeds outputs at rate R; numbered packets (1, 2) are shown.]

Definitions: Traffic Matrix

Traffic matrix: [λij], for inputs i = 1, …, N and outputs j = 1, …, N

Uniform traffic matrix: λij = λ

[Figure: inputs i = 1, …, N and outputs j = 1, …, N, each at rate R.]

Definitions: 100% Throughput

100% throughput: for any traffic matrix of row and column sum less than R,

λij < μij

[Figure: inputs i = 1, …, N and outputs j = 1, …, N, each at rate R.]
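The admissibility condition in this definition (every row sum and every column sum below R) can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_admissible` and the two example matrices are hypothetical and chosen only for illustration.

```python
def is_admissible(traffic, line_rate):
    """Return True if every row sum and column sum is below line_rate."""
    n = len(traffic)
    rows_ok = all(sum(row) < line_rate for row in traffic)
    cols_ok = all(
        sum(traffic[i][j] for i in range(n)) < line_rate for j in range(n)
    )
    return rows_ok and cols_ok


R = 1.0  # line rate, normalized (illustrative value)
N = 4

# Uniform matrix: lambda_ij = 0.2 for all i, j, so each row/column sums to 0.8.
uniform = [[0.2] * N for _ in range(N)]

# Hypothetical hotspot matrix: two inputs overload output 0 (column sum 1.2).
hotspot = [
    [0.6, 0.1, 0.1, 0.1],
    [0.6, 0.1, 0.1, 0.1],
    [0.0, 0.3, 0.3, 0.3],
    [0.0, 0.3, 0.3, 0.3],
]

print(is_admissible(uniform, R))
print(is_admissible(hotspot, R))
```

A 100% throughput guarantee only concerns matrices for which this check passes; no switch can serve the second matrix, because output 0 is oversubscribed.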

Router Wish List

Scale to High Linecard Speeds
- No Centralized Scheduler
- Optical Switch Fabric
- Low Packet-Processing Complexity

Scale to High Number of Linecards
- High Number of Linecards
- Arbitrary Arrangement of Linecards

Provide Performance Guarantees
- 100% Throughput Guarantee
- Delay Guarantee
- No Packet Reordering

Stanford 100Tb/s Router

“Optics in Routers” project: http://yuba.stanford.edu/or/

Some challenging numbers:
- 100Tb/s
- 160Gb/s linecards
- 640 linecards
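As a quick sanity check, the three numbers are consistent with each other if the headline capacity is taken to be the sum of the linecard rates (an assumption; the slide does not state how the 100Tb/s figure is derived). The variable names below are illustrative.

```python
LINECARDS = 640            # number of linecards, from the slide
LINECARD_RATE_GBPS = 160   # per-linecard rate in Gb/s, from the slide

# Aggregate capacity, assuming it is simply linecards x rate.
aggregate_tbps = LINECARDS * LINECARD_RATE_GBPS / 1000

print(f"{aggregate_tbps:.1f} Tb/s")
```

Under that assumption the aggregate comes to slightly more than 100Tb/s.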

100% Throughput in a Mesh Fabric

Router capacity = NR
Switch capacity = N²R

[Figure: inputs and outputs, each at rate R, joined by a full mesh; each mesh link is marked “?” and labeled R.]

If Traffic Is Uniform

[Figure: full mesh between the inputs and the outputs (each at rate R) in which every link runs at rate R/N.]

Real Traffic is Not Uniform

[Figure: the same mesh with links at rate R/N, now offered non-uniform traffic (flows labeled R); the outcome is marked “?”.]

Load-Balanced Switch

[Figure: two meshes in series. The load-balancing stage connects the inputs (rate R) to the middle ports over links of rate R/N; the forwarding stage connects the middle ports to the outputs (rate R) over links of rate R/N.]

100% throughput for weakly mixing traffic (Valiant, C.-S. Chang)

Load-Balanced Switch

[Figure: the two-stage load-balanced switch with R/N links; packets 1, 2 and 3 are shown.]

Load-Balanced Switch

[Figure: the two-stage load-balanced switch with R/N links, continued; packets 1, 2 and 3 are shown in transit.]

Intuition: Proof of 100% Throughput

Arrivals to second mesh: $b = \frac{1}{N} U a$

Capacity of second mesh: $C = \frac{R}{N} U$

Second mesh: arrival rate < service rate: $C - b = \frac{R}{N} U - \frac{1}{N} U a > 0$

where $U$ is the all-ones matrix (drawn as 3 × 3 on the slide).

[Figure: the two-stage load-balanced switch with R/N links.]
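The intuition on this slide can be checked numerically. The sketch below assumes that the first mesh spreads each input's traffic evenly over the N middle ports, so that the link from any middle port to output j carries 1/N of the total rate destined to output j. The function name and the example matrix are hypothetical.

```python
def second_mesh_arrivals(traffic):
    """Rate on each (middle port k -> output j) link after uniform spreading."""
    n = len(traffic)
    col_sums = [sum(traffic[i][j] for i in range(n)) for j in range(n)]
    # Every middle port sees the same share of the traffic for output j.
    return [[col_sums[j] / n for j in range(n)] for _ in range(n)]


R = 1.0
# Hypothetical non-uniform but admissible matrix (row and column sums < R).
traffic = [
    [0.90, 0.0, 0.0],
    [0.00, 0.5, 0.4],
    [0.05, 0.4, 0.5],
]

n = len(traffic)
link_capacity = R / n
arrivals = second_mesh_arrivals(traffic)

fits = all(rate < link_capacity for row in arrivals for rate in row)
print(fits)
# A single direct mesh would not cope: input 0 -> output 0 alone exceeds R/N.
print(traffic[0][0] > link_capacity)
```

The design point is that the spreading stage makes the load on the second mesh depend only on column sums, which the admissibility condition already bounds by R.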

Router Wish List

Scale to High Linecard Speeds
- No Centralized Scheduler
- Optical Switch Fabric
- Low Packet-Processing Complexity

Scale to High Number of Linecards
- High Number of Linecards
- Arbitrary Arrangement of Linecards

Provide Performance Guarantees
- 100% Throughput Guarantee
- Delay Guarantee
- No Packet Reordering

?

Packet Reordering

[Figure: the two-stage load-balanced switch with R/N links; packets 1 and 2 are shown.]

Bounding Delay Difference Between Middle Ports

[Figure: the two-stage load-balanced switch with R/N links; cells 1 and 2 are shown at the middle ports.]

UFS (Uniform Frame Spreading)

[Figure: the two-stage load-balanced switch with R/N links; packets labeled 0, 1, 2 and 3 are shown.]

FOFF (Full Ordered Frames First)

[Figure: the two-stage load-balanced switch with R/N links; packets 1 and 2 are shown.]

FOFF (Full Ordered Frames First)

Input Algorithm
- N FIFO queues corresponding to the N output flows
- Spread each flow uniformly: if last packet was sent to middle port k, send next to k+1.
- Every N time-slots, pick a flow:
  - If full frame exists, pick it and spread like UFS
  - Else if all frames are partial, pick one in round-robin order and send it

[Figure: FIFO queues at an input, labeled up to N, holding numbered packets.]
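The input algorithm above can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the class name `FoffInput`, the 0-based port numbering, and the choice to pick among several full frames in round-robin order are all assumptions (the slide only specifies round-robin for partial frames).

```python
from collections import deque


class FoffInput:
    """Hypothetical model of one input linecard in an N-port switch."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_port = [0] * n  # per-flow pointer to the next middle port
        self.rr = 0               # round-robin pointer over flows

    def enqueue(self, output, packet):
        self.queues[output].append(packet)

    def pick_frame(self):
        """Run once every N time-slots; returns [(middle_port, packet), ...]."""
        n = self.n
        order = [(self.rr + d) % n for d in range(n)]
        full = [f for f in order if len(self.queues[f]) >= n]
        partial = [f for f in order if 0 < len(self.queues[f]) < n]
        if full:            # full frames first
            flow = full[0]
        elif partial:       # otherwise a partial frame, round-robin
            flow = partial[0]
        else:
            return []
        self.rr = (flow + 1) % n
        sent = []
        for _ in range(min(n, len(self.queues[flow]))):
            port = self.next_port[flow]
            sent.append((port, self.queues[flow].popleft()))
            # Last packet went to port k, so the next one goes to k+1.
            self.next_port[flow] = (port + 1) % n
        return sent


inp = FoffInput(3)
for p in "ab":
    inp.enqueue(0, p)       # partial frame for output 0
for p in "pqrs":
    inp.enqueue(1, p)       # full frame (plus one extra) for output 1
first = inp.pick_frame()
second = inp.pick_frame()
print(first)
print(second)
```

Note that the per-flow pointer persists across frames, so a later packet of a flow continues from the middle port after the one its predecessor used.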

Bounding Reordering

[Figure: the two-stage load-balanced switch with R/N links; packets 1, 2, 3, …, N are shown.]

FOFF

Output properties
- N FIFO queues corresponding to the N middle ports
- If there are N² packets, one of the head-of-line packets is in order and can depart
- Buffer size at most N² packets

[Figure: FIFO queues at an output, labeled up to N, holding numbered packets.]


FOFF Properties

Property 1: FOFF maintains packet order.

Property 2: FOFF has O(1) complexity.

Property 3: Congestion buffers operate independently.

Property 4: FOFF maintains an average packet delay within a constant of that of an ideal output-queued router.

Corollary: FOFF has 100% throughput for any adversarial traffic.

Output-Queued Router?

[Figure: inputs and outputs, each at rate R, joined by a full mesh; each mesh link is marked “?” and labeled R.]

Router Wish List

Scale to High Linecard Speeds
- No Centralized Scheduler
- Optical Switch Fabric
- Low Packet-Processing Complexity

Scale to High Number of Linecards
- High Number of Linecards
- Arbitrary Arrangement of Linecards

Provide Performance Guarantees
- 100% Throughput Guarantee
- Delay Guarantee
- No Packet Reordering

From Two Meshes to One Mesh

[Figure: the two-stage load-balanced switch with R/N links; an In port and an Out port are marked together as “One linecard”.]

From Two Meshes to One Mesh

[Figure: four linecards, each with an In and an Out port at rate R, attached to both a “First mesh” and a “Second mesh”; one of them is marked “One linecard”.]

From Two Meshes to One Mesh

[Figure: four linecards (In/Out) attached to a single “Combined mesh”; links are labeled R and 2R.]

Many Fabric Options

Options
- Space: Full uniform mesh
- Time: Round-robin crossbar
- Wavelength: Static WDM

[Figure: linecards (In/Out) attached to “any spreading device” over N channels C1, C2, …, CN, each at rate 2R/N; one of them is marked “One linecard”.]

AWGR (Arrayed Waveguide Grating Router): A Passive Optical Component

- Wavelength i on input port j goes to output port (i+j-1) mod N
- Can shuffle information from different inputs

[Figure: an NxN AWGR with Linecard 1, Linecard 2, …, Linecard N on each side; wavelengths 1, 2, …, N are shown on the fibers.]
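The wavelength routing rule stated on this slide can be turned into a small function to see why a passive AWGR acts as a spreading device. This is a sketch under one stated assumption: ports and wavelengths are numbered 1 to N, and a result of 0 from the modulo is taken to mean port N. The function name is hypothetical.

```python
def awgr_output_port(wavelength, input_port, n):
    """Output port for `wavelength` on `input_port`, using (i + j - 1) mod N."""
    port = (wavelength + input_port - 1) % n
    # Assumption: with 1-based numbering, a residue of 0 denotes port N.
    return port if port != 0 else n


N = 4
for j in range(1, N + 1):
    row = [awgr_output_port(i, j, N) for i in range(1, N + 1)]
    print(f"input {j}: wavelengths 1..{N} -> outputs {row}")
```

For a fixed input, the N wavelengths reach N distinct outputs, and for a fixed wavelength, the N inputs reach N distinct outputs, so a linecard selects its destination purely by choosing a wavelength.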

Static WDM Switching: Packaging

[Figure: four linecards (In/Out) each send N WDM channels, each at rate 2R/N, into an AWGR (passive and almost zero power). Each fiber into the AWGR carries A, B, C, D; the fibers out of it carry A, A, A, A / B, B, B, B / C, C, C, C / D, D, D, D.]

Router Wish List

Scale to High Linecard Speeds
- No Centralized Scheduler
- Optical Switch Fabric
- Low Packet-Processing Complexity

Scale to High Number of Linecards
- High Number of Linecards
- Arbitrary Arrangement of Linecards

Provide Performance Guarantees
- 100% Throughput Guarantee
- Delay Guarantee
- No Packet Reordering

Scaling Problem

- For N < 64, an AWGR is a good solution.
- We want N = 640.
- Need to decompose.

A Different Representation of the Mesh

[Figure: four linecards (In/Out) attached to a block labeled “Mesh”, redrawn with the linecards on both sides; links are labeled R and 2R.]

A Different Representation of the Mesh

[Figure: four linecards (In/Out) on each side, fully meshed; links are labeled R and 2R/N.]

Example: N=8

[Figure: linecards 1 to 8 on each side, fully meshed with links at rate 2R/8.]

When N is Too Large
Decompose into groups (or racks)

[Figure: linecards 1 to 8 on each side, each at rate 2R, decomposed into groups; the group links are labeled 4R and 4R/4.]

When N is Too Large
Decompose into groups (or racks)

[Figure: Group/Rack 1 through Group/Rack G on each side, each holding linecards 1, 2, …, L at rate 2R; each group aggregates 2RL and connects to every group on the other side over links of rate 2RL/G.]

Router Wish List

Scale to High Linecard Speeds
- No Centralized Scheduler
- Optical Switch Fabric
- Low Packet-Processing Complexity

Scale to High Number of Linecards
- High Number of Linecards
- Arbitrary Arrangement of Linecards

Provide Performance Guarantees
- 100% Throughput Guarantee
- Delay Guarantee
- No Packet Reordering

When Linecards Fail

[Figure: Group/Rack 1 through Group/Rack G on each side, each holding linecards 1, 2, …, L at rate 2R; each group aggregates 2RL and connects to the groups on the other side over links of rate 2RL/G.]

Solution: replace mesh with sum of permutations

[Figure: the group mesh with links at rate 2RL/G drawn as a sum of permutations (mesh = permutation + permutation + permutation), each at rate 2RL/G; the labels 2RL, “G *” and 2RL/G also appear.]
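The idea of replacing a uniform mesh with a sum of permutations can be illustrated with cyclic shifts. The sketch below is an assumption about one convenient choice of permutations; the slide does not say which permutations are used, and the function names are hypothetical. Entry (i, j) of each matrix is 1 if group i is connected to group j.

```python
def cyclic_shift_permutations(g):
    """Return g permutation matrices; the k-th maps group i to (i + k) mod g."""
    return [
        [[1 if j == (i + k) % g else 0 for j in range(g)] for i in range(g)]
        for k in range(g)
    ]


def add_matrices(matrices):
    """Element-wise sum of equally sized square matrices."""
    g = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(g)] for i in range(g)]


G = 4
perms = cyclic_shift_permutations(G)
mesh = add_matrices(perms)

for row in mesh:
    print(row)
```

Summing the G permutations gives the all-ones matrix, that is, one link between every pair of groups. Scaling each permutation by the per-link rate (2RL/G on the slide) reproduces the uniform mesh.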

Hybrid Electro-Optical Architecture Using MEMS Switches

[Figure: Group/Rack 1 through Group/Rack G on each side, each holding linecards 1, 2, …, L at rate 2R, interconnected through MEMS switches.]

When Linecards Fail

[Figure: Group/Rack 1 through Group/Rack G on each side, each holding linecards 1, 2, …, L at rate 2R, interconnected through MEMS switches.]

Fiber Link Capacity

Link Capacity ≈ 64 λ’s * 5 Gb/s/λ = 320 Gb/s = 2R

[Figure: Group/Rack 1 through Group/Rack G on each side, each holding linecards 1, 2, …, L at rate 2R, interconnected through MEMS switches; a Laser/Modulator and a MUX feed each fiber.]
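The link-capacity figure can be cross-checked against the linecard rate quoted earlier in the talk. A minimal sketch, assuming R is the 160Gb/s linecard rate from the “challenging numbers” slide; the variable names are illustrative.

```python
R_GBPS = 160               # linecard rate quoted earlier in the talk
WAVELENGTHS = 64           # wavelengths per fiber, from this slide
GBPS_PER_WAVELENGTH = 5    # rate per wavelength, from this slide

link_capacity_gbps = WAVELENGTHS * GBPS_PER_WAVELENGTH

print(link_capacity_gbps)                 # capacity of one fiber in Gb/s
print(link_capacity_gbps == 2 * R_GBPS)   # does one fiber carry 2R?
```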

Number of MEMS Switches
Example: 2 Groups of 2 Linecards

[Figure: Group/Rack 1 and Group/Rack 2 on each side, each holding linecards 1 and 2 at rate 2R and aggregating 4R; the groups are joined by links of rate 2R.]

Number of MEMS Switches

G groups, $L_i$ linecards in group $i$, $N = \sum_{i=1}^{G} L_i$, $L = \max_k L_k$.

Theorem: $M = L + G - 1$ MEMS switches are sufficient for bandwidth.

Examples:
- $N = 640$, $L = 16$, $G = 40$: $M = 55$
- $L = G = \sqrt{N}$: $M \approx 2\sqrt{N}$
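The bound in the theorem is easy to evaluate for a given arrangement of linecards. The sketch below takes a list of group sizes, with L as the largest group and G as the number of groups; the function name `mems_switches` is hypothetical.

```python
import math


def mems_switches(group_sizes):
    """Number of MEMS switches from the theorem: M = L + G - 1."""
    g = len(group_sizes)     # G: number of groups
    l = max(group_sizes)     # L: size of the largest group
    return l + g - 1


# 40 racks of 16 linecards each, i.e. N = 640.
stanford = [16] * 40
print(sum(stanford), mems_switches(stanford))

# Square arrangement L = G = sqrt(N), here with N = 625.
side = math.isqrt(625)
square = [side] * side
print(sum(square), mems_switches(square))
```

Because the function only looks at the largest group, it also covers unequal group sizes, for example `mems_switches([16, 16, 3])`.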

Implementation of a 100Tb/s Load-Balanced Router

[Figure: Linecard Rack 1 through Linecard Rack G = 40, each holding L = 16 linecards at 160Gb/s, connected to a Switch Rack (< 100W) of 40 x 40 static MEMS switches labeled 1, 2, …, 55, 56.]

Packet Schedule

[Figure: Group A and Group B on each side, each holding linecards 1 and 2 at rate 2R and aggregating 4R; the groups are joined by links of rate 2R.]

Rules for Packet Schedule

At each time-slot:
- Each transmitting linecard sends one packet
- Each receiving linecard receives one packet
- (MEMS constraint) Each transmitting group i sends at most one packet to each receiving group j through each MEMS connecting them

In a schedule of N time-slots:
- Each transmitting linecard sends exactly one packet to each receiving linecard

Packet Schedule

              T+1  T+2  T+3  T+4
Tx Group A
  Tx LC A1    ?    ?    ?    ?
  Tx LC A2    ?    ?    ?    ?
Tx Group B
  Tx LC B1    ?    ?    ?    ?
  Tx LC B2    ?    ?    ?    ?

Packet Schedule

              T+1  T+2  T+3  T+4
Tx Group A
  Tx LC A1    A1   A2   B1   B2
  Tx LC A2    B2   A1   A2   B1
Tx Group B
  Tx LC B1    B1   B2   A1   A2
  Tx LC B2    A2   B1   B2   A1

Bad Packet Schedule

              T+1  T+2  T+3  T+4
Tx Group A
  Tx LC A1    A1   A2   B1   B2
  Tx LC A2    B2   A1   A2   B1
Tx Group B
  Tx LC B1    B1   B2   A1   A2
  Tx LC B2    A2   B1   B2   A1

At T+2 and T+4, both linecards of a transmitting group send to the same receiving group (for example, at T+2, A1 and A2 both send to Group A).

Group Schedule

              T+1  T+2  T+3  T+4
Tx Group A    AB   AB   AB   AB
Tx Group B    AB   AB   AB   AB

Good Packet Schedule

              T+1  T+2  T+3  T+4
Tx Group A
  Tx LC A1    A1   A2   B1   B2
  Tx LC A2    B2   B1   A2   A1
Tx Group B
  Tx LC B1    B1   B2   A1   A2
  Tx LC B2    A2   A1   B2   B1

Theorem: There exists a polynomial-time algorithm that finds the correct packet schedule.

Verilog implementation < 50ms.
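The rules for a packet schedule can be checked mechanically against the two example tables. The following is a validator sketch, not the polynomial-time construction the theorem refers to. It assumes that linecard names start with their group letter (as in A1, B2) and that each pair of groups may exchange at most one packet per time-slot in this 2 x 2 example; that limit is exposed as the hypothetical parameter `links_per_group_pair`.

```python
from collections import Counter


def check_schedule(schedule, links_per_group_pair=1):
    """Return a list of rule violations; an empty list means the schedule is valid.

    `schedule` maps each transmitting linecard to its receiver per time-slot.
    """
    linecards = sorted(schedule)
    n = len(linecards)
    problems = []
    # Over N time-slots, each transmitter reaches each receiver exactly once.
    for tx in linecards:
        if sorted(schedule[tx]) != linecards:
            problems.append(f"{tx} does not reach every linecard exactly once")
    if problems:
        return problems
    for t in range(n):
        # Each receiving linecard receives exactly one packet per time-slot.
        receivers = [schedule[tx][t] for tx in linecards]
        if sorted(receivers) != linecards:
            problems.append(f"slot {t + 1}: receivers are not a permutation")
        # MEMS constraint, counted per (transmitting group, receiving group).
        load = Counter((tx[0], schedule[tx][t][0]) for tx in linecards)
        for (src, dst), count in load.items():
            if count > links_per_group_pair:
                problems.append(
                    f"slot {t + 1}: group {src} sends {count} packets to group {dst}"
                )
    return problems


bad = {
    "A1": ["A1", "A2", "B1", "B2"],
    "A2": ["B2", "A1", "A2", "B1"],
    "B1": ["B1", "B2", "A1", "A2"],
    "B2": ["A2", "B1", "B2", "A1"],
}
good = {
    "A1": ["A1", "A2", "B1", "B2"],
    "A2": ["B2", "B1", "A2", "A1"],
    "B1": ["B1", "B2", "A1", "A2"],
    "B2": ["A2", "A1", "B2", "B1"],
}

print(check_schedule(good))
for problem in check_schedule(bad):
    print(problem)
```

Both tables satisfy the per-linecard rules; only the group-level constraint separates them, which is why the check is split into a linecard pass and a group pass.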

Router Wish List

Scale to High Linecard Speeds
- No Centralized Scheduler
- Optical Switch Fabric
- Low Packet-Processing Complexity

Scale to High Number of Linecards
- High Number of Linecards
- Arbitrary Arrangement of Linecards

Provide Performance Guarantees
- 100% Throughput Guarantee
- Delay Guarantee
- No Packet Reordering

Summary

The load-balanced switch
- Does not need any centralized scheduling
- Can use a mesh

Using FOFF
- It keeps packets in order
- It guarantees 100% throughput

Using the hybrid electro-optical architecture
- It scales to high port numbers
- It tolerates linecard failure

References

Initial Work

C.-S. Chang, D.-S. Lee and Y.-S. Jou, "Load Balanced Birkhoff-von Neumann Switches, part I: One-Stage Buffering," Computer Communications, Vol. 25, pp. 611-622, 2002.

Extensions

I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, "Scaling Internet Routers Using Optics," ACM SIGCOMM '03, Karlsruhe, Germany, August 2003.

I. Keslassy, S.-T. Chuang and N. McKeown, "A Load-Balanced Switch with an Arbitrary Number of Linecards," IEEE Infocom '04, Hong Kong, March 2004.


Thank you.