
Model Reduction for Edge-Weighted Personalized PageRank

David Bindel

Mar 2, 2015


The PageRank Model

A surfer follows a random link (probability α) or teleports to a random node (probability 1 − α):

x^(t+1) = α P x^(t) + (1 − α) v,

where P = AD^{-1} is the (weighted) adjacency matrix with columns normalized.

PageRank: Unweighted case

PageRank: Node Weighted

PageRank: Edge Weighted

The PageRank Model

A surfer follows a random link (probability α) or teleports to a random node (probability 1 − α):

x^(t+1) = α P x^(t) + (1 − α) v,

where P = AD^{-1} is the (weighted) adjacency matrix with columns normalized.

Stationary equations:

Mx = b,   M = I − αP,   b = (1 − α) v

PageRank iteration is a standard solver for this system.
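To make this concrete, here is a minimal sketch of the iteration in Python (NumPy/SciPy), assuming a sparse weighted adjacency matrix A whose columns all have nonzero sums; the function name and defaults are illustrative, not from the talk.

    import numpy as np
    import scipy.sparse as sp

    def pagerank(A, alpha=0.85, v=None, tol=1e-10, maxit=1000):
        """Iterate x <- alpha*P*x + (1-alpha)*v with P = A D^{-1}.

        Assumes every node has at least one weighted out-edge,
        so all column sums of A are positive.
        """
        n = A.shape[0]
        v = np.full(n, 1.0 / n) if v is None else v
        d = np.asarray(A.sum(axis=0)).ravel()  # column sums (out-weights)
        P = A @ sp.diags(1.0 / d)              # column-stochastic transition
        x = v.copy()
        for _ in range(maxit):
            x_new = alpha * (P @ x) + (1 - alpha) * v
            delta = np.abs(x_new - x).sum()    # 1-norm change per sweep
            x = x_new
            if delta < tol:
                break
        return x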


Personalized PageRank

Introduce personalization parameters w ∈ R^d; consider two cases:

Node-weight: M x(w) = b(w)
Edge-weight: M(w) x(w) = b

Examples:

b(w) = (1 − α) V w, columns of V are authorities for reference topics

Different edge types (authorship, citation, etc.); w_i is the weight of type i

Nodes are writers, edges weighted by topical similarity; w gives the weights in a weighted cosine similarity measure

Goal: Fast computation for varying w (different users, queries)


Edge-Weight vs Node-Weight

Node-weight personalization is well-studied

Topic-sensitive PageRank: fast methods based on linearity

Localized PageRank: fast methods based on sparsity

Little work on fast methods for edge-weight personalization!


Idea: Model Reduction

Replace large, expensive model by cheaper “reduced-order” model

Common idea in physical simulations

Use the model equations (vs black-box regression)

Great for control, optimization, etc. (many evaluations)

Expensive pre-processing is OK


Model Reduction Framework

Observation: x(w) approximately in a low-dimensional space:

x(w) ≈ U y(w),   U ∈ R^{n×k},   k ≪ n

Can find U by PCA/POD/KL/SVD on a “snapshot” matrix of samples

X = [ x(w_1)  x(w_2)  ...  x(w_r) ]

Can estimate the quality of the best approximation in the space:

A priori by interpolation theory (given bounds on derivatives)

A posteriori from truncated singular values
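A minimal sketch of the offline basis construction, assuming a routine solve_pr(w) for a full PageRank solve and a list ws of sample parameter vectors (both names illustrative):

    import numpy as np

    # Offline: snapshot matrix from full solves at sampled parameters.
    X = np.column_stack([solve_pr(w) for w in ws])   # n x r
    Uu, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Truncate where the singular values say the snapshots are captured.
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, 1.0 - 1e-10)) + 1
    U = Uu[:, :k]                                    # n x k reduced basis

    # A posteriori: the discarded tail s[k:] bounds (in 2-norm) how well
    # the snapshots themselves can be represented in range(U).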

Question: How to extract the best (or near-best) y(w)?


Background: Interpolation Connection

Why not go with an interpolant

x̂(w) = Σ_{j=1}^{r} x(w_j) c_j(w),

where c_j(w) is some Lagrange basis for an interpolation space?

Online phase is cheap

Accuracy depends on Lagrange basis (Lebesgue constants)

Have observed better accuracy with methods based on equations


Galerkin Approach

Goal: r = MUy − b ≈ 0, or Uy ≈ x

Galerkin ansatz: W^T (MUy − b) = 0

Bubnov-Galerkin: W = U

Works great for linear parameterization

M(w) = I − αP(w) = I − α Σ_i w_i P^(i).

Model: pick edge type i with probability w_i, then pick an edge of that type.

B-G system: M̃(w) y(w) = U^T b, where

M̃(w) = U^T M(w) U = I − α Σ_i w_i P̃^(i),   P̃^(i) = U^T P^(i) U.
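For this linear case the online phase reduces to a k × k solve; a sketch, with P_list holding the sparse matrices P^(i) (names illustrative):

    import numpy as np

    def galerkin_offline(U, P_list, b, alpha):
        """Project once offline; return a fast online solver over w."""
        Pt = [U.T @ (Pi @ U) for Pi in P_list]  # k x k matrices P~(i)
        bt = U.T @ b
        k = U.shape[1]

        def solve(w):
            # Online: assemble and solve the k x k Bubnov-Galerkin system.
            Mt = np.eye(k) - alpha * sum(wi * Pti for wi, Pti in zip(w, Pt))
            y = np.linalg.solve(Mt, bt)
            return U @ y                        # x(w) ~ U y(w)

        return solve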


Error estimates

Key concept: quasi-optimality

‖x − x̂‖ ≤ C min_z ‖x − Uz‖

where C can be controlled in some way.

Accuracy = Good space (consistency) + Quasi-optimality (stability)


Quasi-optimality

Define M̃ = W^T M U and Π = U M̃^{-1} W^T M. Then

x − Uy = (I − Π)x,   0 = (I − Π)U.

So we have the Galerkin error relation

x − Uy = (I − Π)(x − Uz)

for any candidate solution Uz. Take norms and minimize over z:

‖e‖ ≤ (1 + κ_G) ‖e_min‖,

where κ_G ≡ ‖U‖ ‖M̃^{-1}‖ ‖W^T M‖.


Quasi-optimality

If W and U are normalized so that W^T U = I, then for the linear parameterization

‖M̃^{-1}‖ ≤ 1 / (1 − α max_j ‖P̃^(j)‖).

For the 1-norm we have ‖M‖_1 ≤ 1 + α, so if ‖P̃^(j)‖_1 < α^{-1} for j = 1, ..., d,

κ_G ≤ (1 + α) ‖U‖_1 ‖W‖_∞ / (1 − α max_j ‖P̃^(j)‖_1).

That is, we can bound the quasi-optimality constant offline.
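With the projected matrices P̃^(j) in hand, checking the bound costs only a few small norm computations; a sketch for W = U with orthonormal U (so W^T U = I), reusing Pt and alpha from the sketch above:

    import numpy as np

    # Offline check of the 1-norm quasi-optimality bound.
    max_Pt = max(np.linalg.norm(Pti, 1) for Pti in Pt)  # max_j ||P~(j)||_1
    assert alpha * max_Pt < 1.0, "bound does not apply"
    kappa_G = ((1 + alpha)
               * np.linalg.norm(U, 1)          # ||U||_1
               * np.linalg.norm(U, np.inf)     # ||W||_inf with W = U
               / (1 - alpha * max_Pt))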


Galerkin Shortcomings

For nonlinear parameterizations, still need

M̃(w) = U^T M(w) U.

Without a trick, have to

Form all of M(w)

Do k matrix-vector products with M(w)

Comparable to cost of standard PageRank algorithm!


DEIM Approach

DEIM = Discrete Empirical Interpolation Method

Goal: r = MUy − b ≈ 0, or Uy ≈ x

DEIM ansatz: minimize ‖r_I‖ for chosen indices I, |I| ≥ k.

Only requires a few rows/columns of M(w)! But how to choose I?

More expensive if we choose high-degree nodes (much more of an issue in social networks than in physical problems)

What about accuracy? Choose “important” (high-PR) nodes?
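The online DEIM solve is then a small least-squares problem over the selected rows; a sketch, where rows_of_M(w, I) is an illustrative stand-in that assembles only the rows M(w)[I, :]:

    import numpy as np

    def deim_online(U, rows_of_M, b, I):
        """Return a solver that only touches the selected rows of M(w)."""
        bI = b[I]

        def solve(w):
            MIU = rows_of_M(w, I) @ U                     # |I| x k
            y, *_ = np.linalg.lstsq(MIU, bI, rcond=None)  # min ||r_I||_2
            return U @ y

        return solve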


Cost of forming the system

Typical case: P = AD^{-1}, given A(w). Think of a 3×3 block partitioning:

A(w) =
[ A11  A12  0   ]
[ A21  A22  A23 ]
[ 0    A32  A33 ]

If we enforce the first block equation, the DEIM equations themselves use the first block row (A11, A12), and column normalization additionally needs the rest of those block columns (A21, A22, A32), since the column sums of A enter D.

If A = Σ_j w_j A^(j), there is no need to compute entries just for normalization: the column sums are linear in w and can be precomputed from the A^(j).


Cost of forming the system

In graph-theoretic terms: if A(w) is linear in w, the cost to form M_{I,:}U is Σ_{v∈I} inDegree(v)

Issue: Social networks have some very high-degree nodes!


Quasi-optimality

Analysis for DEIM ≈ analysis for Galerkin; in the one-norm, we have

κ_DEIM ≤ (1 + α) ‖U‖_1 ‖(M_{I,:}(w) U)†‖_1

Key: well-posedness of the projected least squares problem.

Pro: estimating ‖(M_{I,:}(w) U)†‖_1 is cheap given a factorization M_{I,:}(w)U = QR.

Con: A priori bounds are hard


Choosing the interpolation set

Key: keep M_{I,:}U far from singular.

If |I| = k, this is subset selection over the rows of MU.

Have standard techniques (e.g., pivoted QR)

Want to pick I once, so look at the rows of

Z = [ M(w^(1))U   M(w^(2))U   ... ]

for sample parameters w^(i).
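A sketch of this selection via column-pivoted QR (M_of, w_samples, and the oversampling size are illustrative):

    import numpy as np
    from scipy.linalg import qr

    # Columns of Z.T are rows of Z, so column-pivoted QR on Z.T
    # performs row subset selection on Z.
    Z = np.hstack([M_of(w) @ U for w in w_samples])  # n x (k * #samples)
    _, _, piv = qr(Z.T, mode='economic', pivoting=True)
    m = 2 * U.shape[1]                               # oversample: |I| >= k
    I = np.sort(piv[:m])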


Application: Learning to Rank

Goal: Given training pairs T = {(i_q, j_q)}_{q=1}^{|T|}, find w that mostly ranks i_q above j_q.

Standard: Gradient descent on full problem

One PR computation for objective

One PR computation for each gradient component

Costs d + 1 PR computations per step

With model reduction

Rephrase objective in reduced coordinate space

Use a factorization to solve the reduced PR system for the objective

Re-use the same factorization for the gradient (see the sketch below)
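A sketch of the reuse for the linear parameterization, with Pt and bt as in the Galerkin sketch above: differentiating M̃(w)y = b̃ gives M̃ (∂y/∂w_i) = α P̃^(i) y, so one LU factorization serves both the objective solve and all d gradient components.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def reduced_pr_and_grad(w, Pt, bt, alpha):
        """Solve the reduced system; get every dy/dw_i from the same LU."""
        k = bt.shape[0]
        Mt = np.eye(k) - alpha * sum(wi * Pti for wi, Pti in zip(w, Pt))
        lu = lu_factor(Mt)
        y = lu_solve(lu, bt)
        # dM~/dw_i = -alpha * P~(i)  =>  M~ (dy/dw_i) = alpha * P~(i) y
        dy = [lu_solve(lu, alpha * (Pti @ y)) for Pti in Pt]
        return y, dy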


Test case

Test case: DBLP, 3.5M nodes, 18.5M edges, 7 params

Goal: Learning to rank (8 papers for training)

Consider linear parameterization (B-G and DEIM both apply)

Compare to ScaleRank (more restrictive than we are, but applies here)

This is a good case; see the paper for some others


DBLP singular values

[Figure: ith largest singular value of the DBLP-L snapshot matrix vs. i (0 to 200), log scale from 0.1 to 10^6.]

DBLP accuracy

[Figure: DBLP approximation accuracy for Galerkin, DEIM-100, DEIM-120, DEIM-200, and ScaleRank, measured by Kendall@100 and normalized L1, log scale from 10^-5 to 10^0.]

DBLP learning task

[Figure: objective function value (100 to 400) vs. iteration (0 to 20) on the DBLP learning task, for Standard, Galerkin, and DEIM-200.]

DBLP running times (PR at all nodes)

[Figure: running time (s), 0 to 0.7, for Galerkin, DEIM-100, DEIM-120, DEIM-200, and ScaleRank, broken down into Coefficients and Construction.]

The Punchline

Test case: DBLP, 3.5M nodes, 18.5M edges, 7 params

Cost per Iteration:

Method       Standard   Bubnov-Galerkin   DEIM-200
Time (sec)   159.3      0.002             0.033

An improvement of nearly four to five orders of magnitude.


For more

Edge-Weighted Personalized PageRank: Breaking a Decade-Old Performance Barrier

Wenlei Xie, David Bindel, Johannes Gehrke, and Al Demers

Submitted to KDD
