Transcript of Chapter 7: PageRank

Page 1

Chapter 7 PageRank

Angsheng Li

Institute of Software, Chinese Academy of Sciences

Advanced Algorithms, UCAS

1 April 2017

Page 2

Outline

1. Background

2. Web graph

3. Google’s matrix

4. Teleportation

5. Personalised vector

6. Sensitivity

7. Proofs

8. Local algorithms

9. Exercises

Page 3

The new phenomena

Brin and Page, 1995 - 1998

1. The current-generation search engine

2. Billions of queries every day

3. What is the principle behind it?

4. How good is the current-generation search engine?

Page 4

The graph

• Massive directed graph

• Nodes: webpages

• Directed edges (hyperlinks), including inlinks and outlinks

• The question: Rank the web pages by importance.

Page 5

The PageRank thesis

A page is important if it is pointed to by many important pages.

Page 6

Brin and Page, 1998

Established the equation of the PageRank thesis. The PageRank of a page P_i, written r(P_i), is the sum of the PageRanks of all the pages pointing to P_i, that is,

r(P_i) = Σ_{P_j ∈ B_i} r(P_j)/|P_j|,    (1)

• B_i: the set of pages pointing to P_i,

• |P_j|: the number of outlinks from page P_j.

Page 7

Recurrence of the PageRank

r_{k+1}(P_i) = Σ_{P_j ∈ B_i} r_k(P_j)/|P_j|,
r_0(P_i) = 1/n.    (2)

The stationary solution of the recursive equation (2) gives rise to the PageRank of a graph G.
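As a concrete illustration, here is a minimal Python sketch of the recurrence in Equation (2) on a toy graph; the graph, tolerance, and iteration cap are illustrative choices, not from the slides.

```python
# A direct transcription of Equation (2): r_{k+1}(P_i) = sum over
# in-neighbours P_j of r_k(P_j)/|P_j|, starting from r_0 = 1/n.

def pagerank_recurrence(out_links, tol=1e-12, max_iter=1000):
    nodes = list(out_links)
    n = len(nodes)
    r = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for j, outs in out_links.items():
            if outs:                      # a dangling node contributes nothing
                share = r[j] / len(outs)  # r_k(P_j) / |P_j|
                for i in outs:
                    nxt[i] += share
        if sum(abs(nxt[v] - r[v]) for v in nodes) < tol:
            return nxt
        r = nxt
    return r

# Example: a small strongly connected directed graph, given by outlinks.
graph = {1: [2, 3], 2: [3], 3: [1]}
print(pagerank_recurrence(graph))
```

On graphs with sinks this iteration leaks probability mass, which is exactly the problem the matrix S and teleportation address below.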

Page 8

Matrix representation

H_{ij} = 1/|P_i| if there is an edge from node i to node j, and 0 otherwise.    (3)

|P_i|: the number of outlinks from node i. H = (H_{ij}) is the PageRank matrix of G.

Page 9

PageRank solution

Let π^T be a 1 × n vector. Set

π^{(k+1)T} = π^{(k)T} H,
π^{(0)T} = (1/n) e^T,    (4)

where e^T = (1, 1, · · · , 1). For equation (4), we require:

• convergence and the interpretation of the solution

• uniqueness of the solution

• invariance with respect to the choice of π^{(0)}

• the number of iterations needed for convergence

Page 10

Rank sinks

[Figure: a three-node graph in which nodes 1 and 2 both point into node 3, which has no outlinks.]

All the PageRank goes to node 3.

Page 11

Matrix S

To solve the sink problem, define a vector a by

a_i = 1 if node i has no outgoing links, and 0 otherwise.    (5)

Definition. Define

S = H + (1/n) a e^T,

where e^T = (1, 1, · · · , 1).

Intuition: if node i has no outgoing link, then from node i the random walk jumps to any node uniformly at random. S is the transition probability matrix of a Markov chain.

Page 12

Google’s matrix G

Definition. Define Google's matrix by

G = αS + (1 − α)J,

where J_{ij} = 1/n.

• J is called the teleportation matrix.

• 1 − α is called the teleportation parameter.

Page 13

Expander

Recall: if G is a graph with λ = λ(G) < 1, then for A = A_G,

A = (1 − λ)J + λC,

for some C with ‖C‖ ≤ 1. We thus know that Google's matrix is an expander. However, the parameter α is chosen arbitrarily. Of course, α determines the spectral gap of the graph.

Page 14

Properties of G - I

(1) G is stochastic.
It is a convex combination of two stochastic matrices S and J.

(2) G is irreducible.
Because of J, every page is directly connected to every other page.

(3) G is aperiodic.
G_{ii} > 0: every node has a self-loop.

(4) G is primitive.
There exists a k such that G^k > 0, because G is an expander. There is a unique π^T such that

‖pG^l − π^T‖ ≈ 0

for a small l. (Hence the power method works.)

Page 15

Properties of G - II

(5) G is a rank-one update of H:

G = αS + (1 − α)(1/n) e e^T
  = α(H + (1/n) a e^T) + (1 − α)(1/n) e e^T
  = αH + ((α/n) a + ((1 − α)/n) e) e^T.    (6)

• H is sparse.

• (α/n) a + ((1 − α)/n) e is dense, but it is only a single vector (a rank-one term).

(6) G is artificial, due to the choice of α.
G may not well reflect the real-world matrix H.

Page 16

Computation of πT

Power method:

π^{(k+1)T} = π^{(k)T} G
           = α π^{(k)T} S + ((1 − α)/n) π^{(k)T} e e^T
           = α π^{(k)T} H + (α π^{(k)T} a + (1 − α)) e^T/n.    (7)

Suppose that 1, λ_2, · · · , λ_n are the eigenvalues of G with 1 > |λ_2| ≥ · · · ≥ |λ_n|. Then

G = G_1 + λ_2 G_2 + · · · + λ_n G_n,

where

– G_i² = G_i,

– for i ≠ j, G_i G_j = 0.

Then

G^l = G_1 + λ_2^l G_2 + · · · + λ_n^l G_n.

Since |λ_2| < 1, G^l quickly converges to G_1. Furthermore, for any probability vector p, pG^l converges to π^T.
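The rank-one update in property (5) is what makes this practical: one multiplies by the sparse H only and adds a scalar correction. A Python sketch of Equation (7) follows; the toy matrix, α = 0.85, and the stopping rule are illustrative choices, not from the slides.

```python
import numpy as np

def pagerank_power(H, a, alpha=0.85, tol=1e-12):
    """Iterate Equation (7). H is the row-substochastic matrix of
    Equation (3); a is the 0/1 dangling-node vector of Equation (5)."""
    n = H.shape[0]
    pi = np.full(n, 1.0 / n)   # pi^(0)T = (1/n) e^T
    while True:
        # alpha * pi H  +  (alpha * pi a + (1 - alpha)) e^T / n
        new = alpha * (pi @ H) + (alpha * (pi @ a) + (1 - alpha)) / n
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new

# Toy example: node 1 (0-indexed) is dangling.
H = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
a = np.array([0.0, 1.0, 0.0])
print(pagerank_power(H, a))
```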

Page 17

λ(G)

Lemma. For the Google matrix G = αS + (1 − α)J,

|λ_2(G)| ≤ α.

Page 18

λ(G) again

Lemma. If the spectrum of the stochastic matrix S is {1, λ_2, · · · , λ_n}, then the spectrum of the Google matrix G = αS + (1 − α)ev^T is

{1, αλ_2, · · · , αλ_n},

where v^T is the personalised vector.
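The lemma is easy to check numerically. The sketch below (random S, random v, and α = 0.85 are all arbitrary choices) compares the eigenvalues of G with α times the eigenvalues of S; apart from the single eigenvalue 1, the two lists agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 5, 0.85
S = rng.random((n, n)); S /= S.sum(axis=1, keepdims=True)  # row-stochastic S
v = rng.random(n); v /= v.sum()                            # personalised vector
G = alpha * S + (1 - alpha) * np.outer(np.ones(n), v)      # G = aS + (1-a) e v^T

print(np.sort_complex(np.linalg.eigvals(G)))          # {1} plus alpha*spec(S)
print(alpha * np.sort_complex(np.linalg.eigvals(S)))  # compare the rest
```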

Page 19

Proofs - I

Since S is stochastic, (1, e) is an eigenpair of S. Let Q = (e X) be a nonsingular matrix that has the eigenvector e as its first column. Set

Q^{-1} = [ y^T ; Y^T ].    (8)

Then:

Q^{-1} Q = [ y^T e, y^T X ; Y^T e, Y^T X ] = [ 1, 0 ; 0, I ].    (9)

Page 20

Proofs - II

Similarly,

Q^{-1} S Q = [ y^T S e, y^T S X ; Y^T S e, Y^T S X ] = [ 1, y^T S X ; 0, Y^T S X ].    (10)

This implies that Y^T S X contains the remaining eigenvalues of S, i.e., λ_2, · · · , λ_n. In addition,

Q^{-1} G Q = [ 1, α y^T S X + (1 − α) v^T X ; 0, α Y^T S X ].    (11)

The eigenvalues of G are

1, αλ_2, · · · , αλ_n.

Since |λ_2| ≤ 1, |αλ_2| ≤ α.

Page 21

The role of α

G = (1 − α)J + αS.

If α is small, then 1 − α is large, and G is basically an artificial random graph, failing to reflect the real-world matrix S. If α is large (approaching 1), then

• the stationary distribution may fail to be unique

• even if there is a stationary distribution, it is hard to compute

• the power method fails

Google's choice: α = 0.85.

Page 22

Personalised PageRank

For a personalised probability vector v^T,

G = αS + (1 − α)ev^T.

The power method works as before. The stationary distribution is the personalised PageRank.

Significance: real applications.

Page 23

The stationary distribution

Theorem. The PageRank π^T(α) of G_α is

π^T(α) = (1/Σ_{i=1}^n D_i(α)) · (D_1(α), D_2(α), · · · , D_n(α)),

where D_i(α) is the i-th principal minor determinant of order n − 1 in I − G_α. Furthermore, every D_i(α) is differentiable in α.

Proof. By definition.
Page 24

Differential

Theorem. If π^T(α) = (π_1(α), π_2(α), · · · , π_n(α)), then

1. For each j,

|dπ_j(α)/dα| ≤ 1/(1 − α).

2.

‖dπ^T(α)/dα‖_1 ≤ 2/(1 − α).

• If α is small, then the PageRank π^T(α) is not sensitive.

• If α is large, then the upper bounds 1/(1 − α) and 2/(1 − α) both approach infinity.

Page 25

Representation

Theorem.

dπ^T(α)/dα = −v^T(I − S)(I − αS)^{-2}.
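A sketch of the computation, assuming the standard closed form π^T(α) = (1 − α)v^T(I − αS)^{-1} for the personalised PageRank (this identity is not derived on these slides). Differentiating, with d/dα (I − αS)^{-1} = (I − αS)^{-1} S (I − αS)^{-1}:

```latex
\frac{d\pi^{T}(\alpha)}{d\alpha}
  = -v^{T}(I-\alpha S)^{-1}
    + (1-\alpha)\,v^{T}(I-\alpha S)^{-1} S\,(I-\alpha S)^{-1}
  = v^{T}\bigl[-(I-\alpha S) + (1-\alpha)S\bigr](I-\alpha S)^{-2}
  = -v^{T}(I-S)(I-\alpha S)^{-2}.
```

The last two steps use −(I − αS) + (1 − α)S = −(I − S) and the fact that I − S commutes with (I − αS)^{-1}.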

Page 26

Sensitive to H

1.

dπ^T(h_{ij})/dh_{ij} = α π_i (e_j^T − v^T)(I − αS)^{-1}.

2.

(I − αS)^{-1} → ∞

as α goes to 1. Therefore, if α ≈ 1, then π^T is sensitive to small changes of the matrix H.
Page 27

Sensitive to vT

dπ^T(v^T)/dv^T = (1 − α + α Σ_{i∈D} π_i)(I − αS)^{-1},

where D is the set of nodes that have no outgoing links. As before, as α goes to 1, (I − αS)^{-1} goes to ∞.

Page 28

Summary of sensitivity

If α ≈ 1, then

1. computing π^T(α) is hard, since the power method fails;

2. π^T(α) is sensitive to perturbations of H;

3. π^T(α) is sensitive to the personalised vector v^T.

Google's tradeoff:

α = 0.85

Page 29

Proof of upper bounds - I

Theorem. If π^T(α) = (π_1(α), π_2(α), · · · , π_n(α)), then

1. For each j,

|dπ_j(α)/dα| ≤ 1/(1 − α).

2.

‖dπ^T(α)/dα‖_1 ≤ 2/(1 − α).

π^T(α) is a probability vector, so

Σ_{i=1}^n π_i(α) = 1,

giving

π^T(α)e = 1, where e^T = (1, 1, · · · , 1).
Page 30

Proof of upper bounds - II

By definition, π^T(α) = π^T(α)G(α) = π^T(α)(αS + (1 − α)ev^T). Differentiating,

dπ^T(α)/dα = π^T(α)(S − ev^T)(I − αS)^{-1}.    (12)

For (1): for every real vector x with x^T ⊥ e, i.e., Σ x_i = 0, and for every real column vector y,

|x^T y| = |Σ_{i=1}^n x_i y_i| ≤ ‖x^T‖_1 · (y_max − y_min)/2.    (13)

By Equation (12),

dπ_j(α)/dα = π^T(α)(S − ev^T)(I − αS)^{-1} e_j.

Page 31

Proof of upper bounds - III

Since π^T(α)(S − ev^T)e = 0, set x^T = π^T(α)(S − ev^T) and y = (I − αS)^{-1}e_j. By Inequality (13),

|dπ_j(α)/dα| ≤ ‖π^T(α)(S − ev^T)‖_1 · (y_max − y_min)/2.

Since ‖π^T(α)(S − ev^T)‖_1 ≤ 2,

|dπ_j(α)/dα| ≤ y_max − y_min.

Since (I − αS)^{-1} ≥ 0, we have y_min ≥ 0. Since (I − αS)e = (1 − α)e, we get (I − αS)^{-1}e = (1 − α)^{-1}e, i.e., every row of (I − αS)^{-1} sums to (1 − α)^{-1}. Hence

y_max ≤ max_{i,j} [(I − αS)^{-1}]_{ij} ≤ 1/(1 − α),

and (1) follows.

Page 32

Proof of upper bounds - IV

For (2):

‖dπ^T(α)/dα‖_1 = ‖π^T(α)(S − ev^T)(I − αS)^{-1}‖_1
              ≤ ‖π^T(α)(S − ev^T)‖_1 · ‖(I − αS)^{-1}‖_∞
              ≤ 2 · 1/(1 − α) = 2/(1 − α).    (14)

Page 33

Conductance

Given a graph G = (V, E) and S ⊂ V, the conductance of S in G is

Φ(S) = |E(S, S̄)| / min{vol(S), vol(S̄)}.

The conductance of G is

Φ = min{Φ(S) : |S| ≤ n/2}.

Page 34

Push(u)

Andersen, Chung and Lang, FOCS, 2006. Define an operator Push(u):

1. p(u) ← p(u) + α r(u)

2. r(u) ← (1 − α) r(u)/2

3. For each v with v ∼ u, set

r(v) ← r(v) + (1 − α) r(u)/(2 d(u)).

Page 35

Approximate PageRank

Given a node v:

1. Set p = 0, r(v) = 1, and r(u) = 0 for all u ≠ v.

2. While there is a u with r(u) ≥ ε d(u), apply Push(u).

3. Otherwise, output p and r.
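A sketch of Push(u) and the approximate-PageRank loop together, for an undirected graph. Here α plays the teleportation role, as in the slides' operator; the queue-based scheduling, ε, and the example graph are implementation choices, not from the slides.

```python
from collections import deque

def approximate_pagerank(adj, v, alpha=0.15, eps=1e-4):
    p = {u: 0.0 for u in adj}
    r = {u: 0.0 for u in adj}
    r[v] = 1.0
    queue = deque([v])                  # candidates with r(u) >= eps * d(u)
    while queue:
        u = queue.popleft()
        if r[u] < eps * len(adj[u]):
            continue
        ru = r[u]
        p[u] += alpha * ru              # 1. p(u) <- p(u) + alpha r(u)
        r[u] = (1 - alpha) * ru / 2     # 2. r(u) <- (1 - alpha) r(u) / 2
        for w in adj[u]:                # 3. r(v) <- r(v) + (1-a) r(u)/(2 d(u))
            r[w] += (1 - alpha) * ru / (2 * len(adj[u]))
            if r[w] >= eps * len(adj[w]):
                queue.append(w)
        if r[u] >= eps * len(adj[u]):
            queue.append(u)
    return p, r

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, r = approximate_pagerank(adj, 0)
print(sorted(p.items(), key=lambda t: -t[1]))
```

Each push moves α·r(u) ≥ α·ε·d(u) of residual mass into p, and the total residual is at most 1, so the loop terminates.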

Page 36

ACL local algorithm

1. Compute the approximate PageRank from a given input vertex v.

2. Rank the pages in decreasing order of the normalised PageRank p_v/d(v). Suppose that v_1, v_2, · · · , v_l is listed such that

p_{v_1}/d(v_1) ≥ p_{v_2}/d(v_2) ≥ · · · ≥ p_{v_l}/d(v_l).

3. (Pruning) Take an initial segment of the list as a community associated with the given input v: let j be such that

Φ(X_j) = min{Φ(X_i) | 1 ≤ i ≤ l},

where Φ(X) is the conductance of X in G and X_k = {v_1, · · · , v_k}; small conductance means few edges leave the set. Output X_j (a sketch of this sweep appears below).

Page 37

Question for local algorithm

For every query Q, we rank the set of answers for the query by PageRank; however, the resulting list is far too long. The question is to determine a short list of ranks as the output of the query. Still open.

Page 38

The great idea

• The PageRank thesis

• The teleportation parameter 1 − α. This is a great idea, which may be used in many other areas, such as learning and data processing. The essence of the idea is to make sure that the ranking matrix is a well-defined stochastic process, so that PageRank exists and can be computed. We may also regard the introduction of 1 − α as amplifying noise, playing a role similar to that of noise in error-correcting codes.

• Google’s success: Making big money by randomness

Page 39

A grand challenge

• What is the principle for determining α? Is there a metric ofnetworks which determines the optimum α?

• What are principles for structuring the unstructured andnoisy data?

• Making money by connection and interaction???

Page 40

Reference

1. Amy N. Langville and Carl D. Meyer, Google's PageRank and Beyond: The Science of Search Engine Ranking, Princeton University Press, 2006.

2. R. Andersen, F. Chung, and K. Lang, Local graph partitioning using PageRank vectors, FOCS, 2006.

Page 41

Natural rank

The natural rank based on the structural information theory isthe answer.

Page 42

Exercise 1

Let X_1, · · · , X_n be independent random variables such that X_i is equal to 1 with probability 1 − δ and equal to 0 with probability δ. Let X = Σ_{i=1}^n X_i (mod 2). Prove that

Pr[X = 1] = 1/2 + (1 − 2δ)^n/2, if n is odd,
Pr[X = 1] = 1/2 − (1 − 2δ)^n/2, if n is even.

Significance?

Page 43

Exercise 1 - proof 1

Let Y_i = (−1)^{X_i} and Y = Π_{i=1}^n Y_i. Assume n is odd, and let Pr[X = 1] = α. Then

E[Y] = Pr[X = 0] − Pr[X = 1] = 1 − 2α.

Since the X_i, and hence the Y_i, are independent,

E[Y] = (−1 + 2δ)^n = −(1 − 2δ)^n    (n odd).

Therefore −(1 − 2δ)^n = 1 − 2α, giving

α = 1/2 + (1 − 2δ)^n/2.

Page 44

Exercise 1 - proof 2

Proof.

(1 − 2δ)^n = ((1 − δ) − δ)^n = Σ_{i=0}^n C(n, i) (1 − δ)^i (−δ)^{n−i}.

Case 1: n odd. Grouping the terms by the parity of i,

(1 − 2δ)^n = Pr[X = 1] − Pr[X = 0], so Pr[X = 1] = 1/2 + (1/2)(1 − 2δ)^n.

Case 2: n even. Then (1 − 2δ)^n = Pr[X = 0] − Pr[X = 1], so

Pr[X = 1] = 1/2 − (1/2)(1 − 2δ)^n.

Page 45

Exercise 2

Prove that if there exists a δ-density distribution H such that Pr_{x ∈_R H}[C(x) = f(x)] ≤ 1/2 + ε for every circuit C of size at most s, with s ≤ √(ε²δ2^n/100), then there exists a subset I ⊆ {0,1}^n of size at least (δ/2)2^n such that

Pr_{x ∈_R I}[C(x) = f(x)] ≤ 1/2 + 2ε

for every circuit C of size at most s.

Page 46

Exercise 2 - Proof

Some problems? Leave this to Mingji.

Page 47

Exercise 3

1. Let f : F → F be any function. Suppose integer d ≥ 0 and ε > 2√(d/|F|). Prove that there are at most 2/ε degree-d polynomials that agree with f on at least an ε fraction of its coordinates. Significance?

2. Prove that if Q(x, y) is a bivariate polynomial over some field F and P(x) is a univariate polynomial over F such that Q(x, P(x)) is the zero polynomial, then Q(x, y) = (y − P(x))A(x, y) for some polynomial A(x, y).

Page 48

Exercise 3 - proof 1

Suppose that P_1, P_2, · · · , P_l are all the degree-d polynomials that agree with f on at least an ε fraction of coordinates. For each i, define a vector v_i by

v_i(j) = 1 if P_i(j) = f(j), and 0 otherwise,

for every j ∈ F. Then for every i,

‖v_i‖_1 ≥ ε · m,

where m = |F|, and

εm ≤ ⟨v_i, v_i⟩ ≤ m.

Page 49

Exercise 3 - proof 2

Set

v = Σ_{i=1}^l v_i.

Then

⟨v, v⟩ = Σ_{i=1}^l ⟨v_i, v_i⟩ + Σ_{i≠j} ⟨v_i, v_j⟩ ≤ l·m + (l² − l)d,

since two distinct degree-d polynomials agree on at most d points, so ⟨v_i, v_j⟩ ≤ d for i ≠ j.

Page 50

Exercise 3 - proof 3

And, by the Cauchy–Schwarz inequality,

⟨v, v⟩ = Σ_{k∈F} (v(k))²
       = Σ_k (Σ_{i=1}^l v_i(k))²
       ≥ (Σ_k Σ_i v_i(k))²/m
       = (Σ_i Σ_k v_i(k))²/m
       ≥ (lεm)²/m.

Page 51

Exercise 3 - proof 4

Combining the two bounds gives l²ε²m ≤ lm + (l² − l)d; dividing by lm and rearranging,

l ≤ (1 − d/m)/(ε² − d/m)

for ε > √(d/m).

Page 52

Exercise 3 - proof 5

For ε > 2√(d/m), we show l ≤ 2/ε. By inclusion–exclusion, the fraction of coordinates covered by the agreement sets is at least

ε + (ε − d/m) + · · · + (ε − (l − 1)d/m),

provided ε − (l − 1)d/m ≥ 0, and this fraction is at most 1. Solving this, we have

l ≤ 2/ε.

Page 53

Exercise 3 - proof 6 (part 2)

Regard Q(x, y) as a polynomial in y with coefficients that are polynomials in x. Divide Q(x, y) by y − P(x), which is linear (and monic) in the variable y, giving

Q(x, y) = (y − P(x))A(x, y) + R(x).

By the assumption,

Q(x, P(x)) = R(x) ≡ 0.

Page 54

Exercise 4

Linear codes. We say that an ECC E : {0,1}^n → {0,1}^m is linear if for every x, x′ ∈ {0,1}^n, E(x + x′) = E(x) + E(x′) (componentwise addition modulo 2). A linear ECC can be seen as an m × n matrix A such that E(x) = Ax, thinking of x as a column vector.

1. Prove that the distance of a linear ECC is equal to the minimum, over all nonzero x ∈ {0,1}^n, of the fraction of 1s in E(x).

2. Prove that for every δ > 0, there exists a linear ECC E : {0,1}^n → {0,1}^m for m = Ω(n)/(1 − H(δ)) with distance δ.

3. Prove that for some δ > 0, there is an ECC E : {0,1}^n → {0,1}^{poly(n)} of distance δ with polynomial-time encoding and decoding algorithms.

Page 55

Exercise 4 - proof 1

Let A be an m × n 0-1 matrix which defines a linear ECC. The distance of A is

δ = min_{x ≠ x′} (1/m) · |{i | y_i ≠ y′_i}|,

where

y_i = a_{i,1}x_1 + a_{i,2}x_2 + · · · + a_{i,n}x_n

and

y′_i = a_{i,1}x′_1 + a_{i,2}x′_2 + · · · + a_{i,n}x′_n.

By linearity, y − y′ = E(x − x′) with x − x′ ≠ 0, so this is

δ = min_{x ≠ 0} (1/m) · |{i | y_i = 1}|.

Page 56

Exercise 4 - proof 2

Remove the condition that E is linear. Given a vector y ∈ {0,1}^m, define the δ-ball of y, denoted B^δ_y, to be the set of vectors z ∈ {0,1}^m such that the distance between y and z is less than δ. Then

|B^δ_y| ≤ C(m, δ·m) = o(1) · 2^{H(δ)·m}.

In increasing order, for each x ∈ {0,1}^n, we define E(x) to be a y ∈ {0,1}^m such that B^δ_y is disjoint from all the δ-balls associated with the codewords of x′ < x. Suppose that m ≥ n/(1 − H(δ)). Then the definition above never gets stuck, since {0,1}^m has room for at least 2^n disjoint δ-balls.

Page 57

Exercise 4 - proof 3

Consider now the linear ECC. Each linear ECC is given by an m × n matrix A. Two approaches:

Case 1. Consider a random matrix A. With nonzero probability, A is such an ECC.

Case 2. Count the number of linear ECCs that have distance < δ.

Page 58

Exercise 4 - proof 4

Consider the first approach. Let A be a random m × n matrix. We say that x = (x_1, x_2, · · · , x_n) is a witness showing that A has distance < δ if x ≠ 0 and there are < δm many j such that y_j = 1, where

y_j = a_{j1}x_1 + · · · + a_{jn}x_n.

For each j, define

Y_j = 1 if y_j = 1, and 0 otherwise.

Let

Y = Σ_{j=1}^m Y_j.

Page 59

Exercise 4 - proof 5

Then, for any fixed x ≠ 0 and each j,

E[Y_j] = 1/2, so E[Y] = μ = m/2.

Clearly, all the Y_j are independent. By the Chernoff bound, for ε = 1 − 2δ,

Pr[Y < δm] = Pr[Y < (1 − ε)μ] ≤ [e^{−ε}/(1 − ε)^{(1−ε)}]^{m/2} ≤ 2^{−c·m}

for some constant c.

Page 60

Exercise 4 - proof 6

By the union bound over the 2^n − 1 possible witnesses, the probability that A has a witness for distance < δ is at most

2^{−(c·m − n)},

which is ≈ 0 if

m = Ω(n).

Page 61

Exercise 4 - proof 7

Consider the Reed–Solomon code

RS : F^n → F^m.

It is an ECC with distance δ_1 = 1 − n/m. For every x = (a_0, a_1, · · · , a_{n−1}) ∈ F^n,

RS(x) = (z_0, z_1, · · · , z_{m−1}),

where

z_j = Σ_{i=0}^{n−1} a_i j^i,   j ∈ F.
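A sketch of the evaluation map z_j = Σ_i a_i j^i, written over a prime field F_p for simplicity; the slides work over a general finite field.

```python
def rs_encode(a, p):
    """Encode a = (a_0, ..., a_{n-1}) in F_p^n as the evaluations of
    the polynomial sum_i a_i x^i at every point j of F_p."""
    return [sum(ai * pow(j, i, p) for i, ai in enumerate(a)) % p
            for j in range(p)]

# Two distinct degree-(n-1) polynomials agree on at most n-1 points,
# which is where the distance about 1 - n/m comes from.
print(rs_encode([1, 2, 3], p=7))
```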

Page 62

Exercise 4 - proof 8

Let |F| = 2^k. Then each element f ∈ F is interpreted as an element of GF(2^k). For x ∈ F^n, we interpret it as an element of {0,1}^{k·n}. We encode RS(x) by

WH(z_0), WH(z_1), · · · , WH(z_{m−1}),

where WH is the Walsh–Hadamard code. This is an ECC from {0,1}^{k·n} to {0,1}^{m·2^k}. Choose k such that m · 2^k is a polynomial in k · n.

Page 63

Exercise 5

1. Recall the spectral norm of a matrix A, written ‖A‖, is the maximum of ‖Av‖_2 over unit vectors v. Let A be symmetric stochastic, i.e., A = A^T, and every row and column of A has nonnegative entries summing up to 1. Prove that ‖A‖ ≤ 1.

2. Let A, B be symmetric stochastic matrices. Prove that λ(A + B) ≤ λ(A) + λ(B).

3. Let A, B be two n × n matrices.
(a) Prove that ‖A + B‖ ≤ ‖A‖ + ‖B‖.
(b) Prove that ‖AB‖ ≤ ‖A‖ · ‖B‖.

Page 64

Exercise 5 - proof

For 1. First,

‖A‖ ≤ n².

Second, for every such A:

• A² is a symmetric stochastic matrix;

• ‖A²‖ ≥ ‖A‖²;

• hence, if there were an A with ‖A‖ = 1 + α for some α > 0, repeated squaring would produce a symmetric stochastic matrix B with ‖B‖ unbounded, contradicting the first bound.

Page 65

Exercise 6

Let G be an (n, d, λ)-expander graph, and let B be a set of vertices of size at most βn for 0 < β < 1. Let X_1, X_2, · · · , X_k be a random walk of k steps in G, starting from a vertex X_1 chosen uniformly at random.

1. Prove that for every subset I ⊆ [k],

Pr[(∀i ∈ I)[X_i ∈ B]] ≤ ((1 − λ)√β + λ)^{|I|−1}.

2. Conclude that if |B| < n/100 and λ < 1/100, then the probability that there exists a subset I ⊆ [k] such that |I| > k/10 and X_i ∈ B for all i ∈ I is at most 2^{−k/100}.

3. Show how to convert every BPP algorithm that uses m coins and decides a language L with probability 0.99 into an algorithm B that uses m + O(k) coins and decides the language L with probability 1 − 2^{−k}.

Page 66

Exercise 6: Proof - I

For each i, 1 ≤ i ≤ k, let B_i be the event X_i ∈ B. For I ⊆ [k], let I = {j_1 < j_2 < · · · < j_i}. Then:

Pr[∧_{i∈I} B_i] = Pr[B_{j_1}] · Pr[B_{j_2} | B_{j_1}] · · · · · Pr[B_{j_i} | B_{j_1}, · · · , B_{j_{i−1}}].    (15)

Define B̂ to be the linear transformation from R^n to R^n that keeps the values indexed by B. That is, for u = (u_1, u_2, · · · , u_n), define

(B̂u)_i = u_i if i ∈ B, and 0 otherwise.

Page 67

Exercise 6: Proof - II

For every probability vector p:

(i) the coordinates of B̂p sum to the probability that a vertex chosen according to p lies in B;

(ii) the normalised B̂p is the distribution of p conditioned on the event that the vertex is in B.

Page 68

Exercise 6: Proof - III

Let p^i be the distribution of X_{j_i} conditioned on the events B_{j_1}, · · · , B_{j_i}, and write 1 for the uniform initial distribution. Then:

p^1 = (1/Pr[B_{j_1}]) · B̂1,

p^2 = (1/(Pr[B_{j_2} | B_{j_1}] Pr[B_{j_1}])) · B̂AB̂1,

p^i = (1/(Pr[B_{j_i} | B_{j_{i−1}}, · · · , B_{j_1}] · · · Pr[B_{j_1}])) · (B̂A)^{i−1}B̂1.

Hence,

Pr[B_{j_1}] · · · Pr[B_{j_i} | B_{j_{i−1}}, · · · , B_{j_1}] · p^i = (B̂A)^{i−1}B̂1.

Page 69

Exercise 6: Proof - IV

Pr[∧_{j∈I} B_j] = Pr[B_{j_1}] · · · Pr[B_{j_i} | B_{j_{i−1}}, · · · , B_{j_1}] = ‖(B̂A)^{i−1}B̂1‖_1.

Let A = (1 − λ)J + λC for some C with ‖C‖ ≤ 1. Then B̂A = (1 − λ)B̂J + λB̂C. Noting:

(i) ‖B̂1‖_2 ≤ √β ‖1‖_2;

(ii) ‖B̂J‖ ≤ √β, ‖B̂‖ ≤ 1, ‖B̂C‖ ≤ 1;

(iii) ‖B̂A‖ ≤ (1 − λ)√β + λ.

Therefore,

‖(B̂A)^{i−1}B̂1‖_1 ≤ ‖(B̂A)^{i−1}B̂1‖_2 · √n ≤ ((1 − λ)√β + λ)^{i−1}.    (16)

Page 70

Exercise 7

(1) Give a probabilistic polynomial-time algorithm that, given a 3CNF formula φ with exactly three distinct variables in each clause, outputs an assignment satisfying at least a 7/8 fraction of φ's clauses.

(2) Give a deterministic polynomial-time algorithm with the same approximation guarantee as in (1) above.

(3) Show a polynomial-time algorithm that, given a satisfiable 2CSP instance φ over a binary alphabet with m clauses, outputs a satisfying assignment for φ.

(4) Show a deterministic poly(n, 2^q)-time algorithm that, given a qCSP instance φ over a binary alphabet with m clauses, outputs an assignment satisfying m/2^q of the constraints of φ.

Page 71

Exercise 7 - proof

Easy

Page 72

Exercise 8

(5) Suppose that G = (V, E) is an (n, d, λ)-expander. Show that for any S ⊂ V of size ≤ n/2, the following holds:

Pr_{(u,v)∈_R E}[u ∈ S ∧ v ∈ S] ≤ (|S|/n)(1/2 + λ/2).

Page 73

Exercise 8 - proof- 1

Pr_{(u,v)∈_R E}[u ∈ S ∧ v ∈ S] = Pr[u ∈ S] · Pr[v ∈ S | u ∈ S].

Clearly,

Pr[u ∈ S] = s/n,

where s = |S|. Recall the expander mixing lemma: for any X and Y,

|e(X, Y) − vol(X) · vol(Y)/vol(G)| ≤ λ √(vol(X) · vol(Y)).

Page 74

Exercise 8 - proof -2

For X = Y = S, using the lemma,

Pr[v ∈ S | u ∈ S] ≤ (1/2)(1 + λ).