Post on 17-Oct-2020

Section 1.5 - Exploiting partitioning of matrices and vectors

Maggie Myers
Robert A. van de Geijn

The University of Texas at Austin

Practical Linear Algebra – Fall 2009

http://z.cs.utexas.edu/wiki/pla.wiki/ 1

Example: Partitioning vectors

Given a vector

\[ x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}, \]

one can think of this as its four elements

\[ x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \chi_2 \\ \chi_3 \end{pmatrix}, \]

where $\chi_0 = 4$, $\chi_1 = -1$, etc.

Note

The parentheses are only there to delimit (outline) the vector. They have no other particular meaning.

Example: Partitioning vectors

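Given a vector

\[ x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}, \]

one can think of this as two subvectors:

\[
x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
= \begin{pmatrix} \begin{pmatrix} 4 \\ -1 \end{pmatrix} \\ \begin{pmatrix} 3 \\ 1 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix},
\]

so that

\[ x_0 = \begin{pmatrix} 4 \\ -1 \end{pmatrix} \quad\text{and}\quad x_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}. \]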

Example: Partitioning vectors

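Given a vector

\[ x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}, \]

one can think of this as two subvectors:

\[
x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
= \begin{pmatrix} \begin{pmatrix} 4 \end{pmatrix} \\ \begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix},
\]

so that

\[ x_0 = \begin{pmatrix} 4 \end{pmatrix} \quad\text{and}\quad x_1 = \begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix}. \]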

Example: Partitioning vectors

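Given a vector

\[ x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}, \]

one can think of this as two subvectors:

\[
x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
= \begin{pmatrix} (\;) \\ \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix},
\]

so that

\[ x_0 = (\;) \quad\text{and}\quad x_1 = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}. \]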
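The three partitionings of $x$ shown in these examples can be reproduced with list slicing. This is only an illustrative sketch: the helper name `partition` and the choice of plain Python lists are assumptions made here, not part of the slides. Note that a subvector may be empty, as in the last example.

```python
def partition(x, sizes):
    """Split the list x into consecutive subvectors with the given sizes.

    `partition` is a hypothetical helper introduced for illustration.
    """
    if sum(sizes) != len(x):
        raise ValueError("subvector sizes must add up to len(x)")
    parts, start = [], 0
    for size in sizes:
        parts.append(x[start:start + size])
        start += size
    return parts


x = [4, -1, 3, 1]
x0, x1 = partition(x, [2, 2])  # x0 = [4, -1], x1 = [3, 1]
s0, s1 = partition(x, [1, 3])  # s0 = [4],     s1 = [-1, 3, 1]
e0, e1 = partition(x, [0, 4])  # e0 = [],      e1 = [4, -1, 3, 1]
```

Concatenating the subvectors in order always gives back the original vector, which is what the equalities in the examples express.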

Example: Inner product with partitioned vectors

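Given vectors

\[
x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}
\quad\text{and}\quad
y = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 3 \\ -4 \end{pmatrix},
\]

we find that

\[
x^T y =
\underbrace{
\underbrace{(4)\times(1) + (-1)\times(-2)}_{= \begin{pmatrix} 4 \\ -1 \end{pmatrix}^T \begin{pmatrix} 1 \\ -2 \end{pmatrix} = x_0^T y_0}
+
\underbrace{(3)\times(3) + (1)\times(-4)}_{= \begin{pmatrix} 3 \\ 1 \end{pmatrix}^T \begin{pmatrix} 3 \\ -4 \end{pmatrix} = x_1^T y_1}
}_{= x_0^T y_0 + x_1^T y_1}.
\]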

Theorem

Let $x, y \in \mathbb{R}^n$ and partition

\[
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}
\quad\text{and}\quad
y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix},
\]

where $x_i$ and $y_i$ have the same size, for $i = 0, \ldots, N-1$.

Theorem (continued)

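Then

\[
x^T y =
\begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}^T
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}
=
\begin{pmatrix} x_0^T & x_1^T & \cdots & x_{N-1}^T \end{pmatrix}
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}
= x_0^T y_0 + x_1^T y_1 + \cdots + x_{N-1}^T y_{N-1}.
\]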
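The theorem says an inner product can be accumulated one pair of subvectors at a time. A minimal sketch of that idea follows, assuming vectors are stored as plain Python lists; the function names `dot` and `partitioned_dot` and the `sizes` argument are illustrative choices, not notation from the slides.

```python
def dot(x, y):
    """Ordinary inner product x^T y of two equally sized lists."""
    return sum(chi * psi for chi, psi in zip(x, y))


def partitioned_dot(x, y, sizes):
    """Compute x^T y as the sum of x_i^T y_i over conforming subvectors."""
    if len(x) != len(y) or sum(sizes) != len(x):
        raise ValueError("x, y, and the partitioning must conform")
    total, start = 0, 0
    for size in sizes:
        # x_i^T y_i for the current pair of subvectors
        total += dot(x[start:start + size], y[start:start + size])
        start += size
    return total


x = [4, -1, 3, 1]
y = [1, -2, 3, -4]
result = partitioned_dot(x, y, [2, 2])  # 6 + 5 = 11, the same as dot(x, y)
```

Any partitioning whose sizes add up to $n$ gives the same value, including ones with empty subvectors.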

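Example: axpy with partitioned vectors

Given

\[
x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix},
\quad
y = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 3 \\ -4 \end{pmatrix},
\quad\text{and}\quad \alpha = 4,
\]

we find that

\[
\alpha x + y
= 4 \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \\ 3 \\ -4 \end{pmatrix}
= \begin{pmatrix}
(4)\times(4) + (1) \\
(4)\times(-1) + (-2) \\
(4)\times(3) + (3) \\
(4)\times(1) + (-4)
\end{pmatrix}
= \begin{pmatrix}
4 \begin{pmatrix} 4 \\ -1 \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \end{pmatrix} \\
4 \begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ -4 \end{pmatrix}
\end{pmatrix}
= \begin{pmatrix} \alpha x_0 + y_0 \\ \alpha x_1 + y_1 \end{pmatrix}.
\]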

Theorem

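Let $x, y \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$, and partition

\[
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}
\quad\text{and}\quad
y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix},
\]

where $x_i$ and $y_i$ have the same size, for $i = 0, \ldots, N-1$.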

Theorem (continued)

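Then

\[
\alpha x + y
= \alpha \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}
+ \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}
= \begin{pmatrix} \alpha x_0 + y_0 \\ \alpha x_1 + y_1 \\ \vdots \\ \alpha x_{N-1} + y_{N-1} \end{pmatrix}.
\]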
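The axpy theorem can be checked the same way: applying axpy to each pair of subvectors and stacking the results gives the same vector as applying it to the whole vectors. The sketch below assumes plain Python lists; `axpy`, `partitioned_axpy`, and `sizes` are names chosen here for illustration.

```python
def axpy(alpha, x, y):
    """Return alpha * x + y for equally sized lists x and y."""
    return [alpha * chi + psi for chi, psi in zip(x, y)]


def partitioned_axpy(alpha, x, y, sizes):
    """Compute alpha * x + y one pair of subvectors at a time."""
    if len(x) != len(y) or sum(sizes) != len(x):
        raise ValueError("x, y, and the partitioning must conform")
    result, start = [], 0
    for size in sizes:
        # alpha * x_i + y_i becomes the i-th subvector of the result
        result.extend(axpy(alpha, x[start:start + size], y[start:start + size]))
        start += size
    return result


alpha = 4
x = [4, -1, 3, 1]
y = [1, -2, 3, -4]
z = partitioned_axpy(alpha, x, y, [2, 2])  # [17, -6, 15, 0]
```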

Partitioning matrices

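\[
A = \left(\begin{array}{c|c|c}
A_{00} & a_{01} & A_{02} \\ \hline
a_{10}^T & \alpha_{11} & a_{12}^T \\ \hline
A_{20} & a_{21} & A_{22}
\end{array}\right)
= \left(\begin{array}{rr|r|rr}
-1 & 2 & 4 & 1 & 0 \\
1 & 0 & -1 & -2 & 1 \\ \hline
2 & -1 & 3 & 1 & 2 \\ \hline
1 & 2 & 3 & 4 & 3 \\
-1 & -2 & 0 & 1 & 2
\end{array}\right)
\]

Pronounce $a_{21}$ as a-two-one instead of a-twenty-one, please.

Notice how the labels “A”, “a”, “α” are used.

Why do we use the label “$a_{10}^T$”?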

Example: blocked matrix-vector multiplication

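Consider

\[
A = \left(\begin{array}{c|c|c}
A_{00} & a_{01} & A_{02} \\ \hline
a_{10}^T & \alpha_{11} & a_{12}^T \\ \hline
A_{20} & a_{21} & A_{22}
\end{array}\right)
= \left(\begin{array}{rr|r|rr}
-1 & 2 & 4 & 1 & 0 \\
1 & 0 & -1 & -2 & 1 \\ \hline
2 & -1 & 3 & 1 & 2 \\ \hline
1 & 2 & 3 & 4 & 3 \\
-1 & -2 & 0 & 1 & 2
\end{array}\right),
\]

\[
x = \left(\begin{array}{c} x_0 \\ \hline \chi_1 \\ \hline x_2 \end{array}\right)
= \left(\begin{array}{r} 1 \\ 2 \\ \hline 3 \\ \hline 4 \\ 5 \end{array}\right),
\quad\text{and}\quad
y = \left(\begin{array}{c} y_0 \\ \hline \psi_1 \\ \hline y_2 \end{array}\right),
\]

where $y_0, y_2 \in \mathbb{R}^2$.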

Example (continued)

Then

\[
Ax =
\begin{pmatrix}
-1 & 2 & 4 & 1 & 0 \\
1 & 0 & -1 & -2 & 1 \\
2 & -1 & 3 & 1 & 2 \\
1 & 2 & 3 & 4 & 3 \\
-1 & -2 & 0 & 1 & 2
\end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}
=
\begin{pmatrix}
(-1)\times(1) + (2)\times(2) + (4)\times(3) + (1)\times(4) + (0)\times(5) \\
(1)\times(1) + (0)\times(2) + (-1)\times(3) + (-2)\times(4) + (1)\times(5) \\
(2)\times(1) + (-1)\times(2) + (3)\times(3) + (1)\times(4) + (2)\times(5) \\
(1)\times(1) + (2)\times(2) + (3)\times(3) + (4)\times(4) + (3)\times(5) \\
(-1)\times(1) + (-2)\times(2) + (0)\times(3) + (1)\times(4) + (2)\times(5)
\end{pmatrix}
\]

Example (continued)

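\[
\begin{aligned}
&\begin{pmatrix}
(-1)\times(1) + (2)\times(2) + (4)\times(3) + (1)\times(4) + (0)\times(5) \\
(1)\times(1) + (0)\times(2) + (-1)\times(3) + (-2)\times(4) + (1)\times(5) \\
(2)\times(1) + (-1)\times(2) + (3)\times(3) + (1)\times(4) + (2)\times(5) \\
(1)\times(1) + (2)\times(2) + (3)\times(3) + (4)\times(4) + (3)\times(5) \\
(-1)\times(1) + (-2)\times(2) + (0)\times(3) + (1)\times(4) + (2)\times(5)
\end{pmatrix} \\
&\quad=
\begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ \begin{pmatrix} 4 \\ -1 \end{pmatrix} 3
+ \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 2 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ (3)3
+ \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ \begin{pmatrix} 3 \\ 0 \end{pmatrix} 3
+ \begin{pmatrix} 4 & 3 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix}
\end{pmatrix} \\
&\quad=
\begin{pmatrix}
\begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 12 \\ -3 \end{pmatrix} + \begin{pmatrix} 4 \\ -3 \end{pmatrix} \\
0 + 9 + 14 \\
\begin{pmatrix} 5 \\ -5 \end{pmatrix} + \begin{pmatrix} 9 \\ 0 \end{pmatrix} + \begin{pmatrix} 31 \\ 14 \end{pmatrix}
\end{pmatrix}
=
\begin{pmatrix} 19 \\ -5 \\ 23 \\ 45 \\ 9 \end{pmatrix}
\end{aligned}
\]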

Blocked matrix-vector multiplication

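Let $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^m$. Let

$m = m_0 + m_1 + \cdots + m_{M-1}$, $m_i \geq 0$ for $i = 0, \ldots, M-1$; and

$n = n_0 + n_1 + \cdots + n_{N-1}$, $n_j \geq 0$ for $j = 0, \ldots, N-1$.

Partition

\[
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix},
\quad
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix},
\quad\text{and}\quad
y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{M-1} \end{pmatrix}
\]

with $A_{i,j} \in \mathbb{R}^{m_i \times n_j}$, $x_j \in \mathbb{R}^{n_j}$, and $y_i \in \mathbb{R}^{m_i}$.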

Theorem (continued)

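Then

\[
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{M-1} \end{pmatrix}
=
\begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}
=
\begin{pmatrix}
A_{0,0} x_0 + A_{0,1} x_1 + \cdots + A_{0,N-1} x_{N-1} \\
A_{1,0} x_0 + A_{1,1} x_1 + \cdots + A_{1,N-1} x_{N-1} \\
\vdots \\
A_{M-1,0} x_0 + A_{M-1,1} x_1 + \cdots + A_{M-1,N-1} x_{N-1}
\end{pmatrix}
\]

In other words,

\[ y_i = \sum_{j=0}^{N-1} A_{i,j} x_j . \]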
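The formula $y_i = \sum_j A_{i,j} x_j$ translates directly into two nested loops over blocks. The following is a sketch under assumptions made here: matrices are lists of rows, and the names `matvec`, `blocked_matvec`, `row_sizes`, and `col_sizes` are hypothetical. It reuses the 5 × 5 example with both dimensions partitioned as 2 + 1 + 2.

```python
def matvec(A, x):
    """Unblocked product of a matrix (list of rows) with a vector."""
    return [sum(a * chi for a, chi in zip(row, x)) for row in A]


def blocked_matvec(A, x, row_sizes, col_sizes):
    """Compute y = A x block by block: y_i = sum over j of A_ij x_j."""
    y, row_start = [], 0
    for mi in row_sizes:
        yi = [0] * mi  # accumulator for the subvector y_i
        col_start = 0
        for nj in col_sizes:
            # Extract the mi x nj block A_ij and the subvector x_j.
            Aij = [row[col_start:col_start + nj]
                   for row in A[row_start:row_start + mi]]
            xj = x[col_start:col_start + nj]
            yi = [psi + t for psi, t in zip(yi, matvec(Aij, xj))]
            col_start += nj
        y.extend(yi)
        row_start += mi
    return y


A = [[-1, 2, 4, 1, 0],
     [1, 0, -1, -2, 1],
     [2, -1, 3, 1, 2],
     [1, 2, 3, 4, 3],
     [-1, -2, 0, 1, 2]]
x = [1, 2, 3, 4, 5]
y = blocked_matvec(A, x, [2, 1, 2], [2, 1, 2])  # [19, -5, 23, 45, 9]
```

The result does not depend on how the rows and columns are partitioned, only on the partitionings of $A$ and $x$ conforming with each other.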

Example (revisited)

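Consider

\[
A = \left(\begin{array}{c|c|c}
A_{00} & a_{01} & A_{02} \\ \hline
a_{10}^T & \alpha_{11} & a_{12}^T \\ \hline
A_{20} & a_{21} & A_{22}
\end{array}\right)
= \left(\begin{array}{rr|r|rr}
-1 & 2 & 4 & 1 & 0 \\
1 & 0 & -1 & -2 & 1 \\ \hline
2 & -1 & 3 & 1 & 2 \\ \hline
1 & 2 & 3 & 4 & 3 \\
-1 & -2 & 0 & 1 & 2
\end{array}\right),
\]

\[
x = \left(\begin{array}{c} x_0 \\ \hline \chi_1 \\ \hline x_2 \end{array}\right)
= \left(\begin{array}{r} 1 \\ 2 \\ \hline 3 \\ \hline 4 \\ 5 \end{array}\right),
\quad\text{and}\quad
y = \left(\begin{array}{c} y_0 \\ \hline \psi_1 \\ \hline y_2 \end{array}\right),
\]

where $y_0, y_2 \in \mathbb{R}^2$.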

Example (continued)

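Then

\[
\begin{aligned}
y = \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}
&=
\begin{pmatrix}
A_{00} & a_{01} & A_{02} \\
a_{10}^T & \alpha_{11} & a_{12}^T \\
A_{20} & a_{21} & A_{22}
\end{pmatrix}
\begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix}
A_{00} x_0 + a_{01} \chi_1 + A_{02} x_2 \\
a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 \\
A_{20} x_0 + a_{21} \chi_1 + A_{22} x_2
\end{pmatrix} \\
&=
\begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ \begin{pmatrix} 4 \\ -1 \end{pmatrix} 3
+ \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 2 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ (3)3
+ \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ \begin{pmatrix} 3 \\ 0 \end{pmatrix} 3
+ \begin{pmatrix} 4 & 3 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix}
\end{pmatrix} \\
&=
\begin{pmatrix}
\begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 12 \\ -3 \end{pmatrix} + \begin{pmatrix} 4 \\ -3 \end{pmatrix} \\
0 + 9 + 14 \\
\begin{pmatrix} 5 \\ -5 \end{pmatrix} + \begin{pmatrix} 9 \\ 0 \end{pmatrix} + \begin{pmatrix} 31 \\ 14 \end{pmatrix}
\end{pmatrix}
=
\begin{pmatrix} 19 \\ -5 \\ 23 \\ 45 \\ 9 \end{pmatrix}
\end{aligned}
\]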

We are now going to “play” with partitioned matrices, to get the hang of it.

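Special case: Partition matrix by rows and result vector by elements

Partition

\[
A = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix}
\quad\text{and}\quad
y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}
\]

Then $y = Ax$ can be computed as

\[
\begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}
=
\begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} x
=
\begin{pmatrix} a_0^T x \\ a_1^T x \\ \vdots \\ a_{m-1}^T x \end{pmatrix}
\]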

Concrete example

(on blackboard)

A very strange way of presenting the algorithm...

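$y := \text{Mvmult\_unb\_var1}(A, x, y)$

Partition
\[
A \rightarrow \left(\begin{array}{c} A_T \\ \hline A_B \end{array}\right), \quad
y \rightarrow \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right)
\]
where $A_T$ is $0 \times n$ and $y_T$ is $0 \times 1$

while $m(A_T) < m(A)$ do

  Repartition
  \[
  \left(\begin{array}{c} A_T \\ \hline A_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} A_0 \\ \hline a_1^T \\ A_2 \end{array}\right), \quad
  \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} y_0 \\ \hline \psi_1 \\ y_2 \end{array}\right)
  \]
  where $a_1^T$ is a row

  \[ \psi_1 := a_1^T x + \psi_1 \]

  Continue with
  \[
  \left(\begin{array}{c} A_T \\ \hline A_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} A_0 \\ a_1^T \\ \hline A_2 \end{array}\right), \quad
  \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} y_0 \\ \psi_1 \\ \hline y_2 \end{array}\right)
  \]

endwhile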
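Read with ordinary indices, the algorithm above marches through $A$ one row at a time and updates one element of $y$ per iteration. A possible index-based rendering is sketched below; it assumes $A$ is a list of rows, and the loop index `i` stands in for the moving boundary between $A_T$ and $A_B$. The function name mirrors the slide, but the code is an illustration, not the authors' implementation.

```python
def mvmult_unb_var1(A, x, y):
    """Return A x + y, computed row by row with inner products.

    Iteration i exposes row a_1^T = A[i] and element psi_1 = y[i],
    then performs the update psi_1 := a_1^T x + psi_1.
    """
    y = list(y)  # work on a copy so the caller's y is not modified
    for i, a1t in enumerate(A):
        y[i] = sum(alpha * chi for alpha, chi in zip(a1t, x)) + y[i]
    return y


A = [[-1, 2, 4, 1, 0],
     [1, 0, -1, -2, 1],
     [2, -1, 3, 1, 2],
     [1, 2, 3, 4, 3],
     [-1, -2, 0, 1, 2]]
x = [1, 2, 3, 4, 5]
y = mvmult_unb_var1(A, x, [0] * 5)  # [19, -5, 23, 45, 9]
```

Because the update adds to $\psi_1$, the routine computes $y := Ax + y$; starting from a zero vector gives $y = Ax$.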

Special case: Partition matrix by columns and vector by elements

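Partition

\[
A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
\quad\text{and}\quad
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
\]

Then $y = Ax$ can be computed as

\[
\begin{aligned}
y &= \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
\begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
= a_0 \chi_0 + a_1 \chi_1 + \cdots + a_{n-1} \chi_{n-1} \\
&= \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}.
\end{aligned}
\]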

Concrete example

(on blackboard)

A very strange way of presenting the algorithm...

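$y := \text{Mvmult\_unb\_var2}(A, x, y)$

Partition
\[
A \rightarrow \left(\begin{array}{c|c} A_L & A_R \end{array}\right), \quad
x \rightarrow \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right)
\]
where $A_L$ is $m \times 0$ and $x_T$ is $0 \times 1$

while $m(x_T) < m(x)$ do

  Repartition
  \[
  \left(\begin{array}{c|c} A_L & A_R \end{array}\right) \rightarrow
  \left(\begin{array}{c|cc} A_0 & a_1 & A_2 \end{array}\right), \quad
  \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} x_0 \\ \hline \chi_1 \\ x_2 \end{array}\right)
  \]
  where $a_1$ is a column

  \[ y := \chi_1 a_1 + y \]

  Continue with
  \[
  \left(\begin{array}{c|c} A_L & A_R \end{array}\right) \leftarrow
  \left(\begin{array}{cc|c} A_0 & a_1 & A_2 \end{array}\right), \quad
  \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} x_0 \\ \chi_1 \\ \hline x_2 \end{array}\right)
  \]

endwhile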
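The column-oriented variant replaces the inner products by a sequence of axpy operations, one per column of $A$. A hedged index-based sketch follows, again assuming $A$ is stored as a list of rows (so column $j$ is read as `A[i][j]`); the function name follows the slide but the code itself is only an illustration.

```python
def mvmult_unb_var2(A, x, y):
    """Return A x + y, computed column by column with axpy updates.

    Iteration j exposes column a_1 (entries A[i][j]) and element
    chi_1 = x[j], then performs the update y := chi_1 * a_1 + y.
    """
    y = list(y)  # work on a copy so the caller's y is not modified
    for j, chi1 in enumerate(x):
        for i in range(len(y)):
            y[i] = chi1 * A[i][j] + y[i]
    return y


A = [[-1, 2, 4, 1, 0],
     [1, 0, -1, -2, 1],
     [2, -1, 3, 1, 2],
     [1, 2, 3, 4, 3],
     [-1, -2, 0, 1, 2]]
x = [1, 2, 3, 4, 5]
y = mvmult_unb_var2(A, x, [0] * 5)  # [19, -5, 23, 45, 9]
```

Both variants perform the same multiplications and additions; they differ only in the order in which the entries of $A$ are visited.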

Example: Transpose matrix-vector multiplication

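Let

\[
A = \begin{pmatrix} 1 & -2 & 0 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{pmatrix}
\quad\text{and}\quad
x = \begin{pmatrix} -1 \\ 2 \\ -3 \end{pmatrix}.
\]

Then

\[
A^T x
= \begin{pmatrix} 1 & -2 & 0 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{pmatrix}^T
\begin{pmatrix} -1 \\ 2 \\ -3 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ -2 & -1 & 2 \\ 0 & 1 & 3 \end{pmatrix}
\begin{pmatrix} -1 \\ 2 \\ -3 \end{pmatrix}
= \begin{pmatrix} 0 \\ -6 \\ -7 \end{pmatrix}.
\]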
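Since the columns of $A^T$ are the rows of $A$, the product $A^T x$ can be accumulated without ever forming $A^T$: each row of $A$ is scaled by the matching element of $x$ and added to the result. The sketch below shows this for the example above; the function name `mvmult_t` is a hypothetical choice, and the approach is offered as one possible reading rather than something the slides prescribe.

```python
def mvmult_t(A, x):
    """Return A^T x without explicitly transposing A (A must be non-empty).

    Row i of A is column i of A^T, so A^T x is the sum over i of
    x[i] times row i of A.
    """
    n = len(A[0])
    y = [0] * n
    for i, row in enumerate(A):
        for j in range(n):
            y[j] += x[i] * row[j]
    return y


A = [[1, -2, 0],
     [2, -1, 1],
     [1, 2, 3]]
x = [-1, 2, -3]
y = mvmult_t(A, x)  # [0, -6, -7]
```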

Algorithm for transposing a matrix

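$B := \text{Trans\_unb\_var1}(A, B)$

Partition
\[
A \rightarrow \left(\begin{array}{c|c} A_L & A_R \end{array}\right), \quad
B \rightarrow \left(\begin{array}{c} B_T \\ \hline B_B \end{array}\right)
\]
where $A_L$ is $m \times 0$ and $B_T$ is $0 \times m$

while $n(A_L) < n(A)$ do

  Repartition
  \[
  \left(\begin{array}{c|c} A_L & A_R \end{array}\right) \rightarrow
  \left(\begin{array}{c|cc} A_0 & a_1 & A_2 \end{array}\right), \quad
  \left(\begin{array}{c} B_T \\ \hline B_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} B_0 \\ \hline b_1^T \\ B_2 \end{array}\right)
  \]
  where $a_1$ is a column and $b_1^T$ is a row

  \[ b_1^T := a_1^T \]

  Continue with
  \[
  \left(\begin{array}{c|c} A_L & A_R \end{array}\right) \leftarrow
  \left(\begin{array}{cc|c} A_0 & a_1 & A_2 \end{array}\right), \quad
  \left(\begin{array}{c} B_T \\ \hline B_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} B_0 \\ b_1^T \\ \hline B_2 \end{array}\right)
  \]

endwhile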
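In index terms, the transposition algorithm copies column $j$ of $A$ into row $j$ of $B$, for each column in turn. A minimal sketch, assuming $A$ is a list of rows and returning a new matrix $B$ instead of overwriting an existing one (a simplification made here for illustration):

```python
def trans_unb_var1(A):
    """Return B = A^T by copying each column of A into a row of B."""
    m = len(A)
    n = len(A[0]) if m else 0
    B = []
    for j in range(n):
        a1 = [A[i][j] for i in range(m)]  # column a_1 of A
        B.append(a1)                      # b_1^T := a_1^T
    return B


A = [[-1, 2, 4, 1],
     [1, 0, -1, -2],
     [2, -1, 3, 1],
     [1, 2, 3, 4],
     [-1, -2, 0, 1]]
B = trans_unb_var1(A)
# B == [[-1, 1, 2, 1, -1],
#       [2, 0, -1, 2, -2],
#       [4, -1, 3, 3, 0],
#       [1, -2, 1, 4, 1]]
```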

Example: Blocked matrix transposition

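\[
\begin{pmatrix}
-1 & 2 & 4 & 1 \\
1 & 0 & -1 & -2 \\
2 & -1 & 3 & 1 \\
1 & 2 & 3 & 4 \\
-1 & -2 & 0 & 1
\end{pmatrix}^T
=
\begin{pmatrix}
-1 & 1 & 2 & 1 & -1 \\
2 & 0 & -1 & 2 & -2 \\
4 & -1 & 3 & 3 & 0 \\
1 & -2 & 1 & 4 & 1
\end{pmatrix}
\]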

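\[
\begin{aligned}
\begin{pmatrix}
-1 & 2 & 4 & 1 \\
1 & 0 & -1 & -2 \\
2 & -1 & 3 & 1 \\
1 & 2 & 3 & 4 \\
-1 & -2 & 0 & 1
\end{pmatrix}^T
&=
\begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix} & \begin{pmatrix} 4 \\ -1 \end{pmatrix} & \begin{pmatrix} 1 \\ -2 \end{pmatrix} \\
\begin{pmatrix} 2 & -1 \end{pmatrix} & \begin{pmatrix} 3 \end{pmatrix} & \begin{pmatrix} 1 \end{pmatrix} \\
\begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix} & \begin{pmatrix} 3 \\ 0 \end{pmatrix} & \begin{pmatrix} 4 \\ 1 \end{pmatrix}
\end{pmatrix}^T \\
&=
\begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix}^T & \begin{pmatrix} 2 & -1 \end{pmatrix}^T & \begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix}^T \\
\begin{pmatrix} 4 \\ -1 \end{pmatrix}^T & \begin{pmatrix} 3 \end{pmatrix}^T & \begin{pmatrix} 3 \\ 0 \end{pmatrix}^T \\
\begin{pmatrix} 1 \\ -2 \end{pmatrix}^T & \begin{pmatrix} 1 \end{pmatrix}^T & \begin{pmatrix} 4 \\ 1 \end{pmatrix}^T
\end{pmatrix} \\
&=
\begin{pmatrix}
\begin{pmatrix} -1 & 1 \\ 2 & 0 \end{pmatrix} & \begin{pmatrix} 2 \\ -1 \end{pmatrix} & \begin{pmatrix} 1 & -1 \\ 2 & -2 \end{pmatrix} \\
\begin{pmatrix} 4 & -1 \end{pmatrix} & \begin{pmatrix} 3 \end{pmatrix} & \begin{pmatrix} 3 & 0 \end{pmatrix} \\
\begin{pmatrix} 1 & -2 \end{pmatrix} & \begin{pmatrix} 1 \end{pmatrix} & \begin{pmatrix} 4 & 1 \end{pmatrix}
\end{pmatrix} \\
&=
\begin{pmatrix}
-1 & 1 & 2 & 1 & -1 \\
2 & 0 & -1 & 2 & -2 \\
4 & -1 & 3 & 3 & 0 \\
1 & -2 & 1 & 4 & 1
\end{pmatrix}
\end{aligned}
\]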

Theorem

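Let

\[
A = \begin{pmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{pmatrix}.
\]

Then

\[
A^T = \begin{pmatrix}
A_{0,0}^T & A_{1,0}^T & \cdots & A_{M-1,0}^T \\
A_{0,1}^T & A_{1,1}^T & \cdots & A_{M-1,1}^T \\
\vdots & \vdots & \ddots & \vdots \\
A_{0,N-1}^T & A_{1,N-1}^T & \cdots & A_{M-1,N-1}^T
\end{pmatrix}.
\]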
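The theorem states that transposing a partitioned matrix both transposes each block and swaps the block positions. The sketch below assembles $A^T$ that way and can be compared against an ordinary transpose. It assumes a list-of-rows representation; `transpose`, `blocked_transpose`, `row_sizes`, and `col_sizes` are names introduced here for illustration.

```python
def transpose(A):
    """Plain transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def blocked_transpose(A, row_sizes, col_sizes):
    """Build A^T block by block: block (j, i) of A^T is (A_ij)^T."""
    result, col_start = [], 0
    for nj in col_sizes:            # block row j of A^T
        block_row = [[] for _ in range(nj)]
        row_start = 0
        for mi in row_sizes:        # block column i of A^T
            for c in range(nj):
                # Row c of (A_ij)^T is column c of the block A_ij.
                block_row[c].extend(
                    A[row_start + r][col_start + c] for r in range(mi))
            row_start += mi
        result.extend(block_row)
        col_start += nj
    return result


A = [[-1, 2, 4, 1],
     [1, 0, -1, -2],
     [2, -1, 3, 1],
     [1, 2, 3, 4],
     [-1, -2, 0, 1]]
# Rows partitioned as 2 + 1 + 2 and columns as 2 + 1 + 1, as in the example.
At = blocked_transpose(A, [2, 1, 2], [2, 1, 1])
```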

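$y := \text{Mvmult\_unb\_var1b}(A, x, y)$

Partition
\[
A \rightarrow \left(\begin{array}{c|c} A_{TL} & A_{TR} \\ \hline A_{BL} & A_{BR} \end{array}\right), \quad
x \rightarrow \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right), \quad
y \rightarrow \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right)
\]
where $A_{TL}$ is $0 \times 0$, $x_T$, $y_T$ are $0 \times 1$

while $m(A_{TL}) < m(A)$ do

  Repartition
  \[
  \left(\begin{array}{c|c} A_{TL} & A_{TR} \\ \hline A_{BL} & A_{BR} \end{array}\right) \rightarrow
  \left(\begin{array}{c|cc} A_{00} & a_{01} & A_{02} \\ \hline a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{array}\right), \quad
  \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} x_0 \\ \hline \chi_1 \\ x_2 \end{array}\right), \quad
  \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} y_0 \\ \hline \psi_1 \\ y_2 \end{array}\right)
  \]
  where $\alpha_{11}$, $\chi_1$, and $\psi_1$ are scalars

  \[ \psi_1 := a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1 \]

  Continue with
  \[
  \left(\begin{array}{c|c} A_{TL} & A_{TR} \\ \hline A_{BL} & A_{BR} \end{array}\right) \leftarrow
  \left(\begin{array}{cc|c} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ \hline A_{20} & a_{21} & A_{22} \end{array}\right), \quad
  \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} x_0 \\ \chi_1 \\ \hline x_2 \end{array}\right), \quad
  \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} y_0 \\ \psi_1 \\ \hline y_2 \end{array}\right)
  \]

endwhile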

Theorem

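Let $U$ be an upper triangular matrix. Partition

\[
U \rightarrow \left(\begin{array}{c|c} U_{TL} & U_{TR} \\ \hline U_{BL} & U_{BR} \end{array}\right)
= \begin{pmatrix}
U_{00} & u_{01} & U_{02} \\
u_{10}^T & \upsilon_{11} & u_{12}^T \\
U_{20} & u_{21} & U_{22}
\end{pmatrix},
\]

where $U_{TL}$ and $U_{00}$ are square matrices. Then

\[
U \rightarrow \left(\begin{array}{c|c} U_{TL} & U_{TR} \\ \hline 0 & U_{BR} \end{array}\right)
= \begin{pmatrix}
U_{00} & u_{01} & U_{02} \\
0 & \upsilon_{11} & u_{12}^T \\
0 & 0 & U_{22}
\end{pmatrix},
\]

where $U_{TL}$ and $U_{BR}$ are upper triangular matrices.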

Example

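Consider

\[
\left(\begin{array}{c|c|c}
U_{00} & u_{01} & U_{02} \\ \hline
u_{10}^T & \upsilon_{11} & u_{12}^T \\ \hline
U_{20} & u_{21} & U_{22}
\end{array}\right)
= \left(\begin{array}{rr|r|rr}
-1 & 2 & 4 & 1 & 0 \\
0 & 0 & -1 & -2 & 1 \\ \hline
0 & 0 & 3 & 1 & 2 \\ \hline
0 & 0 & 0 & 4 & 3 \\
0 & 0 & 0 & 0 & 2
\end{array}\right)
\]

We notice that $u_{10}^T = 0$, $U_{20} = 0$, and $u_{21} = 0$.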

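$y := \text{Trmv\_un\_unb\_var1}(U, x, y)$

Partition
\[
U \rightarrow \left(\begin{array}{c|c} U_{TL} & U_{TR} \\ \hline 0 & U_{BR} \end{array}\right), \quad
x \rightarrow \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right), \quad
y \rightarrow \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right)
\]
where $U_{TL}$ is $0 \times 0$, $x_T$, $y_T$ are $0 \times 1$

while $m(U_{TL}) < m(U)$ do

  Repartition
  \[
  \left(\begin{array}{c|c} U_{TL} & U_{TR} \\ \hline 0 & U_{BR} \end{array}\right) \rightarrow
  \left(\begin{array}{c|cc} U_{00} & u_{01} & U_{02} \\ \hline 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{array}\right), \quad
  \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} x_0 \\ \hline \chi_1 \\ x_2 \end{array}\right), \quad
  \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right) \rightarrow
  \left(\begin{array}{c} y_0 \\ \hline \psi_1 \\ y_2 \end{array}\right)
  \]
  where $\upsilon_{11}$, $\chi_1$, and $\psi_1$ are scalars

  \[ \psi_1 := \upsilon_{11} \chi_1 + u_{12}^T x_2 + \psi_1 \]
  (the term $u_{10}^T x_0$ of the general update is dropped because $u_{10}^T = 0$)

  Continue with
  \[
  \left(\begin{array}{c|c} U_{TL} & U_{TR} \\ \hline 0 & U_{BR} \end{array}\right) \leftarrow
  \left(\begin{array}{cc|c} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ \hline 0 & 0 & U_{22} \end{array}\right), \quad
  \left(\begin{array}{c} x_T \\ \hline x_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} x_0 \\ \chi_1 \\ \hline x_2 \end{array}\right), \quad
  \left(\begin{array}{c} y_T \\ \hline y_B \end{array}\right) \leftarrow
  \left(\begin{array}{c} y_0 \\ \psi_1 \\ \hline y_2 \end{array}\right)
  \]

endwhile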
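With indices, the triangular variant touches only the diagonal element and the part of each row to its right. The sketch below is one way to write that, assuming $U$ is a list of rows whose entries below the diagonal are never read; the test vector $x = (1, 2, 3, 4, 5)^T$ is borrowed from the earlier matrix-vector example and is an assumption, since the slides do not multiply this particular $U$ by a vector.

```python
def trmv_un_unb_var1(U, x, y):
    """Return U x + y for upper triangular U, skipping the zeros.

    Iteration k uses upsilon_11 = U[k][k] and u_12^T = U[k][k+1:],
    performing psi_1 := upsilon_11 * chi_1 + u_12^T x_2 + psi_1.
    """
    n = len(x)
    y = list(y)  # work on a copy so the caller's y is not modified
    for k in range(n):
        u12t_x2 = sum(U[k][j] * x[j] for j in range(k + 1, n))
        y[k] = U[k][k] * x[k] + u12t_x2 + y[k]
    return y


U = [[-1, 2, 4, 1, 0],
     [0, 0, -1, -2, 1],
     [0, 0, 3, 1, 2],
     [0, 0, 0, 4, 3],
     [0, 0, 0, 0, 2]]
x = [1, 2, 3, 4, 5]
y = trmv_un_unb_var1(U, x, [0] * 5)  # [19, -6, 23, 31, 10]
```

Because entries below the diagonal are never accessed, the same result is obtained even if that part of the array holds other data.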

Exercise

Let $U \in \mathbb{R}^{n \times n}$ be an upper triangular matrix. Modify the algorithm for computing $y = Ax$ to compute $y = Ux$ instead, taking advantage of the zeroes in the matrix.

Cost of a triangular matrix-vector multiplication?

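Consider

\[
U \rightarrow \begin{pmatrix}
U_{00} & u_{01} & U_{02} \\
u_{10}^T & \upsilon_{11} & u_{12}^T \\
U_{20} & u_{21} & U_{22}
\end{pmatrix}
\]

with $U \in \mathbb{R}^{n \times n}$ and $U_{00} \in \mathbb{R}^{k \times k}$. Then

What is the size of $u_{12}^T$? $n - k - 1$

What is the cost of $\psi_1 := \upsilon_{11} \chi_1 + u_{12}^T x_2 + \psi_1$?

\[ 2 + 2(n - k - 1) = 2(n - k). \]

What is the total cost of the algorithm (in flops)?

\[
\begin{aligned}
\text{Cost} &= \sum_{k=0}^{n-1} \left[ 2(n-k) \right]
= 2 \sum_{k=0}^{n-1} \left[ (n-k) \right]
= 2 \sum_{j=1}^{n} j
= 2 \left( \sum_{j=0}^{n-1} j + n \right) \\
&= 2 \left( \frac{n(n-1)}{2} + n \right)
= 2 \left( \frac{n(n+1)}{2} \right)
= n(n+1) \approx n^2.
\end{aligned}
\]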
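The closed form $n(n+1)$ can be sanity-checked by adding up the per-iteration cost $2 + 2(n - k - 1)$ directly. A small sketch, in which the function name `trmv_flops` is a hypothetical choice:

```python
def trmv_flops(n):
    """Sum the per-iteration flop counts of the triangular algorithm."""
    total = 0
    for k in range(n):
        # 2 flops for upsilon_11 * chi_1 + psi_1, plus 2 per element
        # of the inner product u_12^T x_2, which has length n - k - 1.
        total += 2 + 2 * (n - k - 1)
    return total


# The loop total agrees with the closed form n(n + 1) for small n.
counts_match = all(trmv_flops(n) == n * (n + 1) for n in range(50))
```

For comparison, a general $n \times n$ matrix-vector multiplication costs about $2n^2$ flops, so exploiting the triangular structure roughly halves the work.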