QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

30
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures Austin Benson ICME, Stanford University Work completed in part at UC-Berkeley SIAM CSE 2013 February 27, 2013

Transcript of QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Page 1: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

QR Factorizations and SVDs for Tall-and-skinnyMatrices in MapReduce Architectures

Austin Benson

ICME, Stanford UniversityWork completed in part at UC-Berkeley

SIAM CSE 2013

February 27, 2013

Page 2: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Collaborators

James Demmel, UC-Berkeley David Gleich, Purdue

Paul Constantine, Stanford

Thanks!

Page 3: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Quick QR and SVD review

A Q R

VT

n

n

n

n

n

n

A U n

n

n

n

n

n

Σ

n

n

Figure: Q, U, and V are orthogonal matrices. R is upper triangular andΣ is diagonal with positive entries.

Page 4: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Tall-and-skinny QR

A Q

R

m

n

m

n n

n

Tall-and-skinny (TS): m >> n

Page 5: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

TS-QR → TS-SVD

R is small, so computing its SVD is cheap.

Page 6: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Why Tall-and-skinny QR and SVD?

1. Regression with many samples

2. Principle Component Analysis (PCA)

3. Model Reduction

Pressure, Dilation, Jet Engine

Figure: Dynamic mode decomposition of a rectangular supersonicscreeching jet. Joe Nichols, Stanford University.

Page 7: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

MapReduce overview

Two functions that operate on key value pairs:

(key , value)map−−→ (key , value)

(key , 〈value1, . . . , valuen〉)reduce−−−−→ (key , value)

A shuffle stage runs between map and reduce to sort the values bykey.

Page 8: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

MapReduce overview

Scalability: many map tasks and many reduce tasks are used

https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png

Page 9: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

MapReduce overview

The idea is data-local computations. The programmer implements:

I map(key, value)

I reduce(key, 〈 value1, . . ., valuen 〉)

The shuffle and data I/O is implemented by the MapReduceframework, e.g., Hadoop.

This is a very restrictive programming environment! We sacrificeprogram control for structure, scalability, fault tolerance, etc.

Page 10: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Matrix representation

We have matrices, so what are the key-value pairs? We can justhave unique row identifiers:

A =

1.0 0.02.4 3.70.8 4.29.0 9.0

(a3320354, [1.0, 0.0])(5da011e2, [2.4, 3.7])(94d80025, [0.8, 4.2])(90764917, [9.0, 9.0])

Each value in a key-value pair is a row of the matrix.

Page 11: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Matrix representation: an example

We can encode more information in the keys, e.g., (x, y, z)coordinates and model number:((47570,103.429767811242,0,-16.525510963787,iDV7), [0.00019924

-4.706066e-05 2.875293979e-05 2.456653e-05 -8.436627e-06 -1.508808e-05

3.731976e-06 -1.048795e-05 5.229153e-06 6.323812e-06])

Figure: Aircraft simulation data. Paul Constantine, Stanford University

Page 12: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Why MapReduce?

1. Fault tolerance

2. Structured data I/O

3. Cheaper clusters with more data storage

4. Easy to program

Generate lots of data on supercomputer

Post-process and analyze on MapReduce cluster

http://www.nersc.gov/assets/About-Us/hopper1.jpg

Page 13: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Fault tolerance

0 50 100 150 2001200

1250

1300

1350

1400

1450

1500

1550

1/(probability of task fault)

job

tim

e (

se

cs.)

Performance with Injected Faults

Running time with no faults

Running times with injected faults

≈ 25% performance penalty with 1 out 5 tasks failing

Page 14: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Why MapReduce?

Not convinced MapReduce is good for computational science?

MS218: Is MapReduce Good for Science and Simulation Data?,Thursday, 4:30-6:30pm.

Page 15: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Communication-avoiding TSQR

A =

A1

A2

A3

A4

︸ ︷︷ ︸8n×4n

=

Q1

Q2

Q3

Q4

︸ ︷︷ ︸

8n×4n

R1

R2

R3

R4

︸ ︷︷ ︸4n×n

=

=Q︷ ︸︸ ︷Q1

Q2

Q3

Q4

︸ ︷︷ ︸

8n×4n

Q̃︸︷︷︸4n×n

R︸︷︷︸n×n

(Demmel et al. 2008)

Page 16: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Communication-avoiding TSQR

A =

A1

A2

A3

A4

︸ ︷︷ ︸8n×4n

=

Q1

Q2

Q3

Q4

︸ ︷︷ ︸

8n×4n

R1

R2

R3

R4

︸ ︷︷ ︸4n×n

Ai = QiRi can be computed in parallel. If we only need R, thenwe can throw out the Q factors.

Page 17: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

MapReduce TSQR

S(1)

A

A1

A2

A3

A3

R1 map

A2

emit R2 map

A3

emit R3 map

A4

emit R4 map

shuffle

S1

A2

reduce

S2 R2,2

reduce

R2,1 emit

emit

emit

shuffle

A2 S3 R2,3

reduce emit

Local TSQR

identity map

A2 S(2) R reduce emit

Local TSQR Local TSQR

Figure: S (1) is the matrix consisting of the rows of all of the Ri factors.Similarly, S (2) consists of all of the rows of the R2,j factors.

Page 18: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

MapReduce TSQR: the implementation

1 import random, numpy, hadoopy2 class SerialTSQR:3 def i n i t ( self , blocksize , isreducer ) :4 se l f . bsize , se l f . data =blocksize , [ ]5 se l f . c a l l = se l f . reducer i f isreducer else se l f .mapper67 def compress( se l f ) :8 R = numpy. l ina lg . qr(numpy. array( se l f . data) , ’ r ’ )9 se l f . data = [ ]

10 for row in R: se l f . data .append([ f loat (v) for v in row])1112 def col lect ( self , key , value ) :13 se l f . data .append(value)14 i f len ( se l f . data) > se l f . bsize ∗ len ( se l f . data [0 ] ) :15 se l f . compress()1617 def close ( se l f ) :18 se l f . compress()19 for row in se l f . data : yield random. randint(0,2000000000), row2021 def mapper( self , key , value ) :22 se l f . co l lect (key , value)2324 def reducer( self , key , values ) :25 for value in values : se l f .mapper(key , value)2627 i f name ==’ main ’ :28 mapper = SerialTSQR(blocksize=3, isreducer=False)29 reducer = SerialTSQR(blocksize=3, isreducer=True)30 hadoopy. run(mapper, reducer)

Page 19: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

MapReduce TSQR: Getting Q

We have an efficient way of getting R in QR (and Σ in the SVD).What if I want Q (or my left singular vectors)?

A = QR → Q = AR−1

This is a numerically unstable computation, and Q can be far fromorthogonal. (This may be OK in some cases).

Page 20: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Indirect TSQR

We call this method Indirect TSQR.

Page 21: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Indirect TSQR: Iterative Refinement

We can repeat the TSQR computation on Q to get a moreorthogonal matrix. This is called iterative refinement.

A

A1 R-1 map

R

Q1

A2 R-1 map

Q2

A3 R-1 map

Q3

A4 R-1 map

Q4

emit

emit

emit

emit

distribute

TSQ

R

Q

Q1 R1

-1 map

R1

Q1

Q2 R1

-1 map

Q2

Q3 R1

-1 map

Q3

Q4 R1

-1 map

Q4

emit

emit

emit

emit

distribute

Local MatMul Local MatMul

Iterative Refinement step

Page 22: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Direct TSQR

Why is it hard to reconstruct Q?

I Only maintained “state” is the data written to disk at the endof each map/reduce iteration

I No controlled communication between processors → can onlycontrol data movement via the shuffle

I File names and keys are the only methods of labeling datacomponents

Page 23: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Direct TSQR: Steps 1 and 2

A

A1 R1 map

emit Q11 emit

A2 R2 map

emit Q21

emit

A3 R3 map

emit Q31

emit

A4 R4 map

emit Q41

emit

First step

R1

R2

R3

R4

Q12

Q22

Q32

Q42

R

reduce

emit

emit

emit

emit

emit

Second step

shuffle

Page 24: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Direct TSQR: Step 3

Page 25: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Cholesky QR

We also have a Cholesky QR:

ATA = (QR)T (QR) = RTQTQR = RTR

Computing ATA translates nicely to MapReduce. This method hasworse stability issues but requires fewer flops.

Page 26: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Performance Data

4Bx4 (135 GB) 2.5Bx10 (193 GB) 600Mx25 (112 GB) 500Mx50 (184 GB) 150Mx100 (110 GB)0

1000

2000

3000

4000

5000

6000

7000

8000

matrix size

Tim

e t

o c

om

ple

tio

n (

se

co

nd

s)

Performance of various MapReduce algorithms for computing QR

Cholesky

Indirect TSQR

Cholesky + IR

Indirect TSQR + IR

Direct TSQR

Page 27: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

End

Questions??

I code: https://github.com/arbenson/mrtsqr

I arXiv paper: http://arxiv.org/abs/1301.1071

I my email: [email protected]

Page 28: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Extra slides

Page 29: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Cholesky QR

A

A1 A1

TA1 map emit

A2 A2

TA2 map emit

A3 A3

TA3 map emit

A4 A4

TA4 map emit

shuffle

(A1TA1)1

(A2TA2)1

(A3TA3)1

(A4TA4)1

(ATA)1

(A1TA1)2

(A2TA2)2

(A3TA3)2

(A4TA4)2

(ATA)2

(A1TA1)3

(A2TA2)3

(A3TA3)3

(A4TA4)3

(ATA)3

(A1TA1)4

(A2TA2)4

(A3TA3)4

(A4TA4)4

(ATA)4

emit

emit

emit

emit

ATA RTR emit

reduce

reduce

reduce

reduce

Local ATA Row Sum Cholesky

Page 30: QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architectures (SIAM CSE)

Stability

100

105

1010

1015

1020

10−15

10−10

10−5

100

105

cond(A)

||Q

TQ

− I

||2

Numerical stability: 10000x10 matrices

Dir. TSQR

Indir. TSQR

Indir. TSQR + I.R.

Chol.

Chol. + I.R.