Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw,...

29
Model Reduction for Edge-Weighted Personalized PageRank David Bindel Mar 2, 2015 David Bindel SCAN Mar 2, 2015 1 / 29

Transcript of Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw,...

Page 1: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Model Reduction forEdge-Weighted Personalized PageRank

David Bindel

Mar 2, 2015

David Bindel SCAN Mar 2, 2015 1 / 29

Page 2: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

The PageRank Model

Surfer follows random link (probability α) or teleports to random node:

x (t+1) = αPx (t) + (1− α)v ,

P = AD−1 is a (weighted) adjacency matrix with columns normalized.

David Bindel SCAN Mar 2, 2015 2 / 29

Page 3: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

PageRank: Unweighted case

David Bindel SCAN Mar 2, 2015 3 / 29

Page 4: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

PageRank: Node Weighted

David Bindel SCAN Mar 2, 2015 4 / 29

Page 5: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

PageRank: Edge Weighted

David Bindel SCAN Mar 2, 2015 5 / 29

Page 6: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

The PageRank Model

Surfer follows random link (probability α) or teleports to random node:

x (t+1) = αPx (t) + (1− α)v ,

P = AD−1 is a (weighted) adjacency matrix with columns normalized.

Stationary equations:

Mx = b, M = I − αP, b = (1− α)v

PageRank iteration is a standard solver for this system.

David Bindel SCAN Mar 2, 2015 6 / 29

Page 7: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Personalized PageRank

Introduce personalization parameters w ∈ Rd , consider two cases:

Node-weight: M x(w) = b(w)Edge-weight: M(w) x(w) = b

Examples:

b(w) = 1− αVw , columns of V are authorities for reference topics

Different edge types (authorship, citation, etc); wi is weight of type i

Nodes are writers, edge weights for topical similarity;w are weights in a weighted cosine similarity measure

Goal: Fast computation for varying w (different users, queries)

David Bindel SCAN Mar 2, 2015 7 / 29

Page 8: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Edge-Weight vs Node-Weight

Node-weight personalization is well-studied

Topic-sensitive PageRank: fast methods based on linearity

Localized PageRank: fast methods based on sparsity

Little work on fast methods for edge-weight personalization!

David Bindel SCAN Mar 2, 2015 8 / 29

Page 9: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Idea: Model Reduction

Replace large, expensive model by cheaper “reduced-order” model

Common idea in physical simulations

Use the model equations (vs black-box regression)

Great for control, optimization, etc (many evaluations)

Expensive pre-processing is OK

David Bindel SCAN Mar 2, 2015 9 / 29

Page 10: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Model Reduction Framework

Observation: x(w) approximately in a low-dimensional space:

x(w) ≈ Uy(w), U ∈ Rn×k , k � n

Can find U by PCA/POD/KL/SVD on a “snapshot” matrix of samples

X =[x(w1) x(w2) . . . x(wr )

]Can estimate quality of best approximation in the space

A priori by interpolation theory (given bounds on derivatives)

A posteriori from truncated singular values

Question: How to extract the best (or near-best) y(w)?

David Bindel SCAN Mar 2, 2015 10 / 29

Page 11: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Background: Interpolation Connection

Why not go with interpolant

x̂(w) =r∑

j=1

x(wj)cj(w)

where cj(w) is some Lagrange basis for an interpolation space?

Online phase is cheap

Accuracy depends on Lagrange basis (Lebesgue constants)

Have observed better accuracy with methods based on equations

David Bindel SCAN Mar 2, 2015 11 / 29

Page 12: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Galerkin Approach

Goal: r = MUy − b ≈ 0 or Uy ≈ xGalerkin ansatz: W T (MUy − b) = 0Bubnov-Galerkin: W = U

Works great for linear parameterization

M(w) = I − αP(w) = I − α

(∑i

wiP(i)

).

Model: pick edge type i with probability wi , then pick edge of that type.

B-G system: M̃(w)y(w) = UTb, where

M̃(w) = UTM(w)U = I − α

(∑i

wi P̃(i)

), P̃(i) = UTP(i)U.

David Bindel SCAN Mar 2, 2015 12 / 29

Page 13: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Error estimates

Key concept: quasi-optimality

‖x − x̂‖ ≤ C minz‖x − Uz‖

where C can be controlled in some way.

Accuracy = Good space (consistency) + Quasi-optimality (stability)

David Bindel SCAN Mar 2, 2015 13 / 29

Page 14: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Quasi-optimality

Define M̃ = W TMU and Π = UM̃−1W TM:

x − Uy = (I − Π)x 0 = (I − Π)U

So we have the Galerkin error relation

x − Uy = (I − Π)(x − Uz)

for any candidate solution Uz . Take norms and minimize over z :

‖e‖ ≤ (1 + κG )‖emin‖

whereκG ≡ ‖U‖‖M̃−1‖‖W TM‖.

David Bindel SCAN Mar 2, 2015 14 / 29

Page 15: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Quasi-optimality

If W TU normalized so W TU = I , then for linear parameterization

‖M̃−1‖ ≤ 1

1− αmaxj ‖P̃(j)‖

For 1-norm, have ‖M‖1 ≤ 1 + α, so if ‖P̃(j)‖1 < α−1 for j = 1, . . . , d ,

κG ≤(1 + α)‖U‖1‖W ‖∞1− αmaxj ‖P̃(j)‖1

.

That is, we can bound the quasi-optimality constant offline.

David Bindel SCAN Mar 2, 2015 15 / 29

Page 16: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Galerkin Shortcomings

For nonlinear parameterizations, still need

M̃(w) = UTM(w)U.

Without a trick, have to

Form all of M(w)

Do k matrix-vector products with M(w)

Comparable to cost of standard PageRank algorithm!

David Bindel SCAN Mar 2, 2015 16 / 29

Page 17: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

DEIM Approach

DEIM = Discrete Empirical Interpolation Method

Goal: r = MUy − b ≈ 0 or Uy ≈ xDEIM ansatz: minimize ‖rI‖ for chosen indices I, |I| ≥ k.

Only requires a few rows/columns of M(w)! But how to choose I?

More expensive if we choose high-degree nodes(Much more an issue in social networks than physical problems)

What about accuracy? Choose “important” (high-PR) nodes?

David Bindel SCAN Mar 2, 2015 17 / 29

Page 18: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Cost of forming the system

Typical case: P = AD−1, given A(w). Think of partitioning:A11 A12 0A21 A22 A23

0 A32 A33

If we enforce the first block equation, we need the colored blocksA11 A12 0

A21 A22 A23

0 A32 A33

where blue = used in DEIM equations, red = needed for normalization.

If A =∑

j wjA(j), no need to compute entries for normalization.

David Bindel SCAN Mar 2, 2015 18 / 29

Page 19: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Cost of forming the system

Graph theoretic terms: if A(w) is linear, cost to form MI,:U is∑v∈I

inDegree(v)

Issue: Social networks have some very high-degree nodes!

David Bindel SCAN Mar 2, 2015 19 / 29

Page 20: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Quasi-optimality

Analysis for DEIM ≈ analysis for Galerkin; in one-norm, have

κDEIM ≤ (1 + α)‖U‖1‖(MI,:(w)U)†‖1

Key: well-posedness of the projected least squares problem.

Pro: Estimating ‖(MI,:(w)U)†‖1 is cheap given MI,:(w)U = QR.

Con: A priori bounds are hard

David Bindel SCAN Mar 2, 2015 20 / 29

Page 21: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Choosing the interpolation set

Key: keep MI,: far from singular.

If |I| = k , this is a subset selection over rows of MU.

Have standard techniques (e.g. pivoted QR)

Want to pick I once, so look at rows of

Z =[M(w (1))U M(w (2))U . . .

]for sample parameters w (i).

David Bindel SCAN Mar 2, 2015 21 / 29

Page 22: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Application: Learning to Rank

Goal: Given T = {(iq, jq)}|T |q=1, find w that mostly ranks iq over j1.

Standard: Gradient descent on full problem

One PR computation for objective

One PR computation for each gradient component

Costs d + 1 PR computations per step

With model reduction

Rephrase objective in reduced coordinate space

Use factorization to solve PR for objective

Re-use same factorization for gradient

David Bindel SCAN Mar 2, 2015 22 / 29

Page 23: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

Test case

Test case: DBLP, 3.5M nodes, 18.5M edges, 7 params

Goal: Learning to rank (8 papers for training)

Consider linear parameterization (B-G and DEIM both apply)

Compare to ScaleRank (more restrictive than we are, but applies here)

This is a good case – see paper for some others

David Bindel SCAN Mar 2, 2015 23 / 29

Page 24: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

DBLP singular values

0.1

1

10

100

1000

10000

100000

1x106

0 50 100 150 200

Val

ue

ith

Largest Singular Value

DBLP-L

David Bindel SCAN Mar 2, 2015 24 / 29

Page 25: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

DBLP accuracy

10-5

10-4

10-3

10-2

10-1

100

Galerk

in

DEIM

-100

DEIM

-120

DEIM

-200

ScaleR

ank

Kendall@100

Normalized L1

David Bindel SCAN Mar 2, 2015 25 / 29

Page 26: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

DBLP learning task

100

150

200

250

300

350

400

0 2 4 6 8 10 12 14 16 18 20

Obje

ctiv

e F

unct

ion V

alue

Iteration

StandardGalerkin

DEIM-200

David Bindel SCAN Mar 2, 2015 26 / 29

Page 27: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

DBLP running times (PR at all nodes)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Galerkin

DEIM

-100

DEIM

-120

DEIM

-200

ScaleRank

Runnin

g t

ime

(s)

CoefficientsConstruction

David Bindel SCAN Mar 2, 2015 27 / 29

Page 28: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

The Punchline

Test case: DBLP, 3.5M nodes, 18.5M edges, 7 params

Cost per Iteration:

Method Standard Bubnov-Galerkin DEIM-200

Time(sec) 159.3 0.002 0.033

Improvement of nearly four or five orders of magnitude.

David Bindel SCAN Mar 2, 2015 28 / 29

Page 29: Model Reduction for Edge-Weighted Personalized PageRankbindel/present/2015-03-scan.pdfb(w) = 1 Vw, columns of V are authorities for reference topics Di erent edge types (authorship,

For more

Edge-Weighted Personalized PageRank:Breaking a Decade-Old Performance Barrier

Wenlei Xie, David Bindel, Johannes Gehrke, and Al Demers

Submitted to KDD

David Bindel SCAN Mar 2, 2015 29 / 29