2008 Generalized SEN
Optimal All-To-All Personalized Exchange Algorithms in Generalized
Shuffle-Exchange Networks
Student: YuChieh Chiu Advisor: Chiuyuan Chen
Department of Applied Mathematics
National Chiao Tung University
July 31, 2008
Shuffle
[Figure: shuffle connecting the 3-bit addresses 000 to 111 on one side to the addresses 000 to 111 on the other]
When $N = 2^n$, the shuffle operation is:
$\pi(i_{n-1} i_{n-2} \cdots i_1 i_0) = i_{n-2} \cdots i_1 i_0 i_{n-1}$.
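The shuffle above is a one-bit left rotation of the n-bit address. A minimal Python sketch, where the function name `shuffle` and the choice n = 3 are illustrative assumptions rather than anything taken from the slides:

```python
def shuffle(i: int, n: int) -> int:
    """Left-rotate the n-bit address i by one position.

    The most significant bit i_{n-1} moves to the least significant place,
    which matches pi(i_{n-1} ... i_0) = i_{n-2} ... i_0 i_{n-1}.
    """
    msb = (i >> (n - 1)) & 1             # extract i_{n-1}
    mask = (1 << n) - 1                  # keep only n bits after shifting
    return ((i << 1) & mask) | msb


n = 3  # illustrative size: N = 2**n = 8 addresses
for i in range(1 << n):
    print(f"{i:0{n}b} -> {shuffle(i, n):0{n}b}")
```

Under these assumptions, address 100 is sent to 001 and address 011 to 110, and the map is a permutation of the 8 addresses.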
Binary Switch
A binary switch is a 2 × 2 switch element (SE).
Legitimate states = 4
Permutation connections = 2
Control bit
0 for straight and 1 for exchange (cross)
The 2 broadcast states are not used in this paper.
[Figure: the different settings of the 2 × 2 SE: straight, exchange, upper-broadcast, lower-broadcast]
Multistage Interconnection Network (MIN)
Three typical MINs
8 × 8 baseline network, shuffle-exchange network, and indirect binary n-cube network
MIN implementation
Source (S), Destination (D), Control (X)
N × N Shuffle Exchange Networks
N × N Shuffle Exchange Networks = N × N SENs
N = number of nodes, n = number of stages
$N = 2^n$
Figure: 4 × 4 and 8 × 8 SENs
N × N Generalized Shuffle Exchange Networks
N × N Generalized Shuffle Exchange Networks = N × N GSENs
N = number of nodes, n + 1 = number of stages
$2^n < N \le 2^{n+1}$
SENs are contained in GSENs
Figure: 4 × 4, 6 × 6, and 8 × 8 GSENs
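The relation $2^n < N \le 2^{n+1}$ fixes n, and therefore the number of stages n + 1, for any given N. A small sketch of that computation, assuming N ≥ 2; the helper name `gsen_parameters` is hypothetical:

```python
def gsen_parameters(N: int) -> tuple[int, int]:
    """Return (n, number_of_stages) for an N x N GSEN.

    n is the unique integer with 2**n < N <= 2**(n + 1); the slide states
    that a GSEN has n + 1 stages.
    """
    if N < 2:
        raise ValueError("N must be at least 2")
    n = (N - 1).bit_length() - 1
    assert 2 ** n < N <= 2 ** (n + 1)    # sanity check of the defining bound
    return n, n + 1


for N in (4, 6, 8, 10):
    n, stages = gsen_parameters(N)
    print(f"N = {N}: n = {n}, stages = {stages}")
```

For N = 10 this gives n = 3 and 4 stages, which agrees with the stage labels 0 to 3 in the 10 × 10 figures.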
NOT unique-path
A MIN is unique-path if there is a unique path between each pair of input and output.
SENs are unique-path
GSENs are NOT unique-path
[Figure: 10 × 10 GSEN with ports 0 to 9 and stages 0 to 3]
Communications among processors
one-to-one
one-to-many
all-to-all
all-to-all broadcast
all-to-all personalized exchange (ATAPE) ← our focus here
Definition
In ATAPE, each processor sends a specific message to every other processor.
Why ATAPE?
ATAPE occurs in many applications:
matrix transposition
fast Fourier transform (FFT)
Comparison of MINs with other networks
network model    scalability    communication delay
hypercubes       poor           shorter
meshes           better         higher
tori             better         higher
MINs             better         shorter
is ATAPE easy?
Figure: 6× 6 GSEN
Stage control
Stage Control (SC): all the SEs at the same stage are set to the same state.
Figure: SC in 10 × 10 GSEN (ports 0 to 9, stages 0 to 3)
Configuration
Network configuration: the states of the switches of the network
In matrix form (one row per switch, one column per stage):
1 0 0 1
1 0 0 1
1 0 0 1
1 0 0 1
1 0 0 1
Under stage control, a configuration is represented as an (n + 1)-tuple
(1 0 0 1)
or as an integer C
$(1001)_2 = 9$
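The three representations above (matrix, tuple, integer) can be converted into one another mechanically. A hedged sketch: the helper names are hypothetical, and reading the first tuple entry as the most significant bit is an assumption based only on $(1001)_2 = 9$.

```python
def config_to_int(bits: tuple[int, ...]) -> int:
    """Read an (n + 1)-tuple of stage states as a binary number C."""
    value = 0
    for b in bits:
        value = (value << 1) | b         # first entry becomes the top bit
    return value


def int_to_config(C: int, stages: int) -> tuple[int, ...]:
    """Inverse of config_to_int for a network with the given stage count."""
    return tuple((C >> (stages - 1 - s)) & 1 for s in range(stages))


def stage_control_matrix(bits: tuple[int, ...], N: int) -> list[list[int]]:
    """Under stage control every one of the N/2 switches in a stage shares
    the same state, so each matrix row repeats the tuple."""
    return [list(bits) for _ in range(N // 2)]


cfg = (1, 0, 0, 1)
print(config_to_int(cfg))                # 9
print(int_to_config(9, 4))               # (1, 0, 0, 1)
print(stage_control_matrix(cfg, 10))     # five identical rows
```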
Previous Results
Yang & Wang (2000), IEEE Transactions on Parallel and Distributed Systems, "Optimal all-to-all personalized exchange in self-routable multistage networks."
Yang & Wang propose an optimal all-to-all personalized exchange algorithm that combines the stage control technique with the Latin square method.
Their algorithm requires constructing a Latin square in advance and allocating memory to store it.
MINs in that paper must be unique-path and self-routable.
Massini (2003), Discrete Applied Mathematics, "All-to-all personalized communication on multistage interconnection networks."
Massini's algorithm requires neither precomputation of a Latin square nor extra memory space.
MINs in that paper must be unique-path and self-routable.
Motivation
The purpose of this thesis
To our knowledge, no one has studied ATAPE algorithms in GSENs.
The purpose of this thesis is to propose ATAPE algorithms for GSENs.
We propose two algorithms
Algorithm 1 uses the stage control technique and works for all even N.
In contrast, Algorithm 2 works without stage control for all $N \equiv 2 \pmod 4$.
Both are optimal.
R(N) and Rsc(N)
Definition
Let $R(N)$ denote the minimum number of network configurations required to realize ATAPE in an N × N GSEN. Also, let $R_{sc}(N)$ denote the minimum number of network configurations required to realize ATAPE in an N × N GSEN when the stage control technique is assumed.
Lemma 1
$N \le R(N) \le R_{sc}(N) \le 2^{n+1}$.
Main results
We propose two optimal ATAPE algorithms for N ×N GSENs.
Algorithm 1
with stage control
needs $2^{n+1}$ configurations
needs a destination matrix constructed in advance
Algorithm 2
without stage control
needs N configurations
computes destinations directly
Algorithm 1
Preprocessing phase: destination matrix construction
for each processor i ($0 \le i < N$) do in parallel
    for each time k ($0 \le k < 2^{n+1}$) do sequentially
        prepare a null message;
        equip the message with configuration k and send it out;
        when an output (say, j) receives the message, set $s_{j,k} = i$;
    for each j ($0 \le j < N$) do
        for each k ($0 \le k < 2^{n+1}$) do
            if $s_{j,k} = i$ then set $d_{i,k} = j$;
for each i ($0 \le i < N$) do
    for each j ($0 \le j < N$) do
        set mark[j] = 0;
    for each k ($0 \le k < 2^{n+1}$) do
        if mark[$d_{i,k}$] = 0 then set mark[$d_{i,k}$] = 1;
        else set $d_{i,k} = -1$;
S =
0 4 8 5 7 6 1 3 5 9 3 0 2 1 6 8
4 0 5 8 6 7 3 1 9 5 0 3 1 2 8 6
8 3 0 2 1 5 7 9 3 8 5 7 6 0 2 4
3 8 2 0 5 1 9 7 8 3 7 5 0 6 4 2
7 2 6 3 0 9 4 5 2 7 1 8 5 4 9 0
2 7 3 6 9 0 5 4 7 2 8 1 4 5 0 9
6 1 7 9 4 8 0 2 1 6 2 4 9 3 5 7
1 6 9 7 8 4 2 0 6 1 4 2 3 9 7 5
5 9 4 1 3 2 6 8 0 4 9 6 8 7 1 3
9 5 1 4 2 3 8 6 4 0 6 9 7 8 3 1

D =
0 1 2 3 4 5 6 7 8 9 − − − − − −
7 6 9 8 2 3 0 1 − − 4 5 − − − −
5 4 3 2 9 8 7 6 − − − − 0 1 − −
3 2 5 4 8 9 1 0 − − − − 7 6 − −
1 0 8 9 6 7 4 5 − − − − − − 3 2
8 9 1 0 3 2 5 4 − − − − − − 6 7
6 7 4 5 1 0 8 9 − − − − 2 3 − −
4 5 6 7 0 1 2 3 − − − − 9 8 − −
2 3 0 1 7 6 9 8 − − 5 4 − − − −
9 8 7 6 5 4 3 2 1 0 − − − − − −
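The step from S to D can be replayed on the slide's N = 10 example. The sketch below is a centralized simulation, not the distributed procedure itself: it assumes S is already known (rows j, columns k, entry $s_{j,k}$ = source), inverts it column by column, and then keeps only the first time each destination appears in a row. The function name `destination_matrix` is hypothetical.

```python
# S as shown on the slide: S[j][k] is the source whose null message
# arrives at output j under configuration k (N = 10, 16 configurations).
S = [
    [0, 4, 8, 5, 7, 6, 1, 3, 5, 9, 3, 0, 2, 1, 6, 8],
    [4, 0, 5, 8, 6, 7, 3, 1, 9, 5, 0, 3, 1, 2, 8, 6],
    [8, 3, 0, 2, 1, 5, 7, 9, 3, 8, 5, 7, 6, 0, 2, 4],
    [3, 8, 2, 0, 5, 1, 9, 7, 8, 3, 7, 5, 0, 6, 4, 2],
    [7, 2, 6, 3, 0, 9, 4, 5, 2, 7, 1, 8, 5, 4, 9, 0],
    [2, 7, 3, 6, 9, 0, 5, 4, 7, 2, 8, 1, 4, 5, 0, 9],
    [6, 1, 7, 9, 4, 8, 0, 2, 1, 6, 2, 4, 9, 3, 5, 7],
    [1, 6, 9, 7, 8, 4, 2, 0, 6, 1, 4, 2, 3, 9, 7, 5],
    [5, 9, 4, 1, 3, 2, 6, 8, 0, 4, 9, 6, 8, 7, 1, 3],
    [9, 5, 1, 4, 2, 3, 8, 6, 4, 0, 6, 9, 7, 8, 3, 1],
]


def destination_matrix(S: list[list[int]]) -> list[list[int]]:
    """Build D from S: d[i][k] = j when s[j][k] = i, then replace every
    repeated destination in a row by -1 (first occurrence wins)."""
    N, K = len(S), len(S[0])
    D = [[-1] * K for _ in range(N)]
    for j in range(N):
        for k in range(K):
            D[S[j][k]][k] = j
    for i in range(N):
        marked = set()                   # plays the role of mark[]
        for k in range(K):
            if D[i][k] in marked:
                D[i][k] = -1
            else:
                marked.add(D[i][k])
    return D


D = destination_matrix(S)
for row in D:
    print(" ".join("-" if d == -1 else str(d) for d in row))
```

Printing D this way gives one line per processor i, with a dash wherever $d_{i,k} = -1$, so the output can be compared directly against the D matrix on the slide.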
Algorithm 1
Phase 1: The message preparing phase
for each processor i ($0 \le i < N$) do in parallel
    for each time k ($0 \le k < 2^{n+1}$) do sequentially
        if $d_{i,k} \ne -1$ then prepare a personalized message to $d_{i,k}$
        else prepare a null message;
        equip the message with configuration k;
        insert the message into the message queue of i;
Phase 2: The message sending phase
for each processor i ($0 \le i < N$) do in parallel
    for each time k ($0 \le k < 2^{n+1}$) do sequentially
        send a message in the message queue of i;
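A short sketch of Phase 1 for a single processor, assuming its row of D is already available. The row used here is the one listed for processor 1 in the N = 10 example; the tuple layout `(k, destination, payload)` and the name `build_queue` are illustrative choices, not part of the slides.

```python
from collections import deque

# Row of D for processor 1 in the N = 10 example (-1 marks a null message).
D_ROW_1 = [7, 6, 9, 8, 2, 3, 0, 1, -1, -1, 4, 5, -1, -1, -1, -1]


def build_queue(i: int, d_row: list[int]) -> deque:
    """Prepare one message per configuration k, in order of k.

    A personalized message is prepared when d_row[k] != -1, otherwise a
    null message (payload None). Each entry carries its configuration k.
    """
    queue = deque()
    for k, dest in enumerate(d_row):
        payload = f"data {i}->{dest}" if dest != -1 else None
        queue.append((k, dest, payload))
    return queue


queue = build_queue(1, D_ROW_1)
real = [entry for entry in queue if entry[2] is not None]
print(len(queue), "messages queued,", len(real), "of them personalized")

# Phase 2 then amounts to popping the queue front to back, one per time k.
while queue:
    k, dest, payload = queue.popleft()
```

The queue holds one entry per configuration, so Phase 2 needs no further bookkeeping beyond sending entries in order.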
Correct and Optimal
Correctness
Easy part, since we use every configuration k, $0 \le k < 2^{n+1}$.
Optimality
Hard part. We have to show that every configuration contains a unique path.
Definition
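[Figure: a path from input i to output j through stage 0, stage 1, ..., stage n, labeled with the bits $f_n, b_n$, then $f_{n-1}, b_{n-1}$, ..., then $f_0, b_0$]
Forward control tag $F = f_n 2^n + \cdots + f_1 2^1 + f_0 2^0$
Backward control tag $B = b_n 2^n + \cdots + b_1 2^1 + b_0 2^0$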
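As a small worked instance of the tag definitions, assume n = 3 and the purely illustrative bit values $(f_3, f_2, f_1, f_0) = (1, 0, 1, 1)$ and $(b_3, b_2, b_1, b_0) = (0, 1, 1, 0)$; these values are not taken from the slides.

```latex
\begin{align*}
F &= 1\cdot 2^3 + 0\cdot 2^2 + 1\cdot 2^1 + 1\cdot 2^0 = 11,\\
B &= 0\cdot 2^3 + 1\cdot 2^2 + 1\cdot 2^1 + 0\cdot 2^0 = 6.
\end{align*}
```

With n + 1 = 4 bits, both tags range over $0, 1, \ldots, 2^{n+1} - 1 = 15$.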
Observation
Sketch of proof
[Figure: proof flow diagram relating C, B and F through Lem. 7, Lem. 8, Lem. 9, Lem. 10, Lem. 11 (no 2 holes), Thm FBCT, and Thm. 21]
Without SC
Can we do better, if we abandon the SC technique?
Since $N \le R(N)$, at least N configurations are needed.
Alternating stage control (ASC)
A variation of stage control in which the states of the switches of a stage alternate between straight and cross.
[Figure: ASC in a 10 × 10 GSEN with ports 0 to 9]
Observation
To choose N configurations
We reordered the configurations $A_F$ by the control tag F of input processor 0, and then found that $A_0 = A_N$, $A_1 = A_{N+1}$, ...
Algorithm 2
Phase 1: The message preparing phase
for each processor i ($0 \le i < N$) do in parallel
    calculate $m_i$ by the formula:
        $m_i = (i \cdot 2^{n+1}) \bmod N$, if i is even;
        $m_i = ((i + 1) \cdot 2^{n+1} - 1) \bmod N$, if i is odd;
    for each time k ($0 \le k < N$) do sequentially
        prepare a personalized message for destination processor
            $(m_i + k) \bmod N$, if i is even;
            $(m_i - k) \bmod N$, if i is odd;
        equip the message with configuration $A_k = k \oplus \lfloor k/2 \rfloor$;
        insert the message into the message queue of i;

Table for N = 10:
i\k  0 1 2 3 4 5 6 7 8 9
0    0 1 8 3 6 5 4 7 2 9
1    1 0 3 8 5 6 7 4 9 2
2    2 3 0 5 8 7 6 9 4 1
3    3 2 5 0 7 8 9 6 1 4
4    4 5 2 7 0 9 8 1 6 3
5    5 4 7 2 9 0 1 8 3 6
6    6 7 4 9 2 1 0 3 8 5
7    7 6 9 4 1 2 3 0 5 8
8    8 9 6 1 4 3 2 5 0 7
9    9 8 1 6 3 4 5 2 7 0
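The formulas of Phase 1 are easy to evaluate directly. The following sketch computes $m_i$, the destination schedule, and the configurations $A_k = k \oplus \lfloor k/2 \rfloor$ for a given N; all function names are hypothetical, and n is derived from $2^n < N \le 2^{n+1}$. The helper `sources_at_outputs` is an added inverse view (which source reaches output j at time k) that appears to coincide with the N = 10 table on the slide.

```python
def gsen_exponent(N: int) -> int:
    """Return n with 2**n < N <= 2**(n + 1)."""
    return (N - 1).bit_length() - 1


def start_offsets(N: int) -> list[int]:
    """m_i as defined in Phase 1 (even and odd i use different formulas)."""
    p = 2 ** (gsen_exponent(N) + 1)
    return [(i * p) % N if i % 2 == 0 else ((i + 1) * p - 1) % N
            for i in range(N)]


def destinations(N: int) -> list[list[int]]:
    """dest[i][k]: destination of the message processor i prepares for time k."""
    m = start_offsets(N)
    return [[(m[i] + k) % N if i % 2 == 0 else (m[i] - k) % N
             for k in range(N)] for i in range(N)]


def configurations(N: int) -> list[int]:
    """A_k = k XOR floor(k / 2), i.e. the binary reflected Gray code of k."""
    return [k ^ (k >> 1) for k in range(N)]


def sources_at_outputs(N: int) -> list[list[int]]:
    """Inverse view: src[j][k] is the processor whose message targets j at time k."""
    dest = destinations(N)
    src = [[-1] * N for _ in range(N)]
    for i in range(N):
        for k in range(N):
            src[dest[i][k]][k] = i
    return src


N = 10
dest = destinations(N)
# Each processor addresses every destination exactly once ...
assert all(sorted(row) == list(range(N)) for row in dest)
# ... and at each time k the destinations form a permutation.
assert all(sorted(dest[i][k] for i in range(N)) == list(range(N))
           for k in range(N))
print([format(a, "04b") for a in configurations(N)])
```

For N = 10 the offsets reduce to $m_i = i$, and the printed configurations are 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101. The two assertions only check that the schedule is a Latin square; whether each $A_k$ actually routes the messages that way in the network is what the correctness proof establishes, and the sketch does not simulate the switches.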
Algorithm 2
Phase 2: The message sending phase
for each processor i ($0 \le i < N$) do in parallel
    for each time k ($0 \le k < N$) do sequentially
        send a message in the message queue of i;
Correct and Optimal
Optimality
Easy part, since we use only N configurations $A_k$, $0 \le k < N$, and $N \le R(N)$.
Correctness
Hard part. We proved that the link from any input i to any output j exists in our algorithm, and that the message sent by input i reaches exactly the output j calculated in our algorithm.
Why $N \equiv 2 \pmod 4$?
In the proof of correctness, property (∗) holds only when $N \equiv 2 \pmod 4$. Thus Algorithm 2 works only when $N \equiv 2 \pmod 4$.
Property (∗)
(E: even, O: odd)
1. If the alternating control bit is 0, then $E \xrightarrow{0} E$ and $O \xrightarrow{1} O$;
2. Else (the alternating control bit is 1), $E \xrightarrow{1} O$ and $O \xrightarrow{0} E$.
[Figure: panels (a) to (d) showing the even ports 0, 2, 4, 6, 8 and the odd ports 1, 3, 5, 7, 9 under control bits 0 and 1]
Proof of property (∗)
$N/2$ is odd because $N \equiv 2 \pmod 4$.
$x_0 = y$ and $x_1 = y + N/2$.
Thus one of the input ports of switch y is even while the other input port is odd.
$z_0$ is even and $z_1$ is odd.
Now set the control bit to 0 or 1 and take switch y even or odd; there are 4 cases in total.
Examining the states of all switches in each case completes the proof.
[Figure: switch y with input ports $x_0$, $x_1$ and output ports $z_0$, $z_1$, in a stage with ports 0 to N − 1 and switches 0 to N/2 − 1]
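The parity step of this proof can be checked numerically. The sketch below takes the relations $x_0 = y$ and $x_1 = y + N/2$ from the slide and tests whether the two inputs of every switch have opposite parity; the function names are hypothetical, and the switch index y is assumed to range over 0 to N/2 − 1.

```python
def switch_inputs(y: int, N: int) -> tuple[int, int]:
    """Inputs feeding switch y, per the slide: x0 = y and x1 = y + N/2."""
    return y, y + N // 2


def inputs_have_mixed_parity(N: int) -> bool:
    """True when every switch receives one even and one odd input."""
    return all((x0 % 2) != (x1 % 2)
               for x0, x1 in (switch_inputs(y, N) for y in range(N // 2)))


for N in (4, 6, 8, 10, 12, 14):
    print(N, N % 4, inputs_have_mixed_parity(N))
```

The check succeeds for N = 6, 10, 14 and fails for N = 4, 8, 12, because y and y + N/2 differ in parity exactly when N/2 is odd.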
[Figure: proof flow diagram involving Lem. 6, Lem. 16, Lem. 17, Thm. 15, Property (∗) with cases (i) $E \xrightarrow{0} E$, $O \xrightarrow{1} O$ and (ii) $E \xrightarrow{1} O$, $O \xrightarrow{0} E$, Thm. 18 ($A_0, A_1, \ldots, A_{N-1}$ fulfill ATAPE), and Thm. 19 (the algorithm is correct)]
Concluding remarks
In this thesis
We have proposed two optimal ATAPE algorithms for GSENs.
We obtained $R_{sc}(N) = 2^{n+1}$.
We obtained $R(N) = N$ if $N \equiv 2 \pmod 4$.
Open questions
Determine $R(N)$ for $N \equiv 0 \pmod 4$.
Generalize to d × d switch elements.
[Animation: pipelined execution on the 10 × 10 GSEN, from the initial GSEN through Rounds 1 to 13. The configurations 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101 enter the network one per round and advance through the stages in later rounds; the output table gains its first column in Round 4 and is complete in Round 13.]