Raleigh, NC North Carolina State University Department of...

126
Google’sPageRank and Beyond Carl Meyer Department of Mathematics North Carolina State University Raleigh, NC Colorado State University Fort Collins, Colorado November 2, 2007

Transcript of Raleigh, NC North Carolina State University Department of...

Page 1: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Google’sPageRankand Beyond

Carl Meyer

Department of MathematicsNorth Carolina State UniversityRaleigh, NC

Colorado State University

Fort Collins, Colorado

November 2, 2007

Page 2: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Beautiful mathematics eventually

tends to be useful.

And useful mathematics eventually

tends to be beautiful.

Page 3: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Eye Of The Beholder

F (f ) =∫ ∞

−∞x(t)e−i2πftdt ←→

Page 4: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Eye Of The Beholder

⎧⎨⎩

Perron–Frobenius

Markov Chains

⎫⎬⎭ ←→

Page 5: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

System for the Mechanical Analysis and Retrieval ofText

Gerard Salton

Harvard 1962 – 1965

Cornell 1965 – 1970

• Implemented on IBM 7094 & IBM 360

• Based on matrix methods

Page 6: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Term–Document Matrices

Start with dictionary of terms

Words or phrases ( e.g., landing gear)

Page 7: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Term–Document Matrices

Start with dictionary of terms

Words or phrases ( e.g., landing gear)

Index Each Document

Humans scour pages and mark key terms

Page 8: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Term–Document Matrices

Start with dictionary of terms

Words or phrases ( e.g., landing gear)

Index Each Document

Humans scour pages and mark key terms

Count fij = # times term i appears in document j

Page 9: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Term–Document Matrices

Start with dictionary of terms

Words or phrases ( e.g., landing gear)

Index Each Document

Humans scour pages and mark key terms

Count fij = # times term i appears in document j

Term–Document Matrix

⎛⎜⎜⎜⎝

Doc 1 Doc 2 . . . Doc n

Term 1 f11 f12. . . f1n

Term 2 f21 f22. . . f2n...

......

. . ....

Term m fm1 fm2. . . fmn

⎞⎟⎟⎟⎠ = Am×n

Page 10: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Query MatchingQuery Vector

qT = (q1, q2, . . ., qm) qi ={

1 if Term i is requested0 if not

Page 11: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Query MatchingQuery Vector

qT = (q1, q2, . . ., qm) qi ={

1 if Term i is requested0 if not

How Close is Query to Each Document?

Page 12: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Query MatchingQuery Vector

qT = (q1, q2, . . ., qm) qi ={

1 if Term i is requested0 if not

How Close is Query to Each Document?

i.e., how close is q to each column Ai?

θ2

A1A2

A3

q

Page 13: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Query MatchingQuery Vector

qT = (q1, q2, . . ., qm) qi ={

1 if Term i is requested0 if not

How Close is Query to Each Document?

i.e., how close is q to each column Ai?

θ2

A1A2

A3

q

Use δi = cos θi =qTAi

‖q‖ ‖Ai‖

Page 14: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Query MatchingQuery Vector

qT = (q1, q2, . . ., qm) qi ={

1 if Term i is requested0 if not

How Close is Query to Each Document?

i.e., how close is q to each column Ai?

θ2

A1A2

A3

q

Use δi = cos θi =qTAi

‖q‖ ‖Ai‖

Rank documents by size of δi

Return Document i to user when δi ≥ tol

Page 15: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Susan Dumais’s Improvement

� Approximate A with a lower rank matrix

� Effect is to compress data in A

• 2 patents for Bell/Telcordia

— Computer information retrieval using latent semantic structure. U.S. Patent No.

4,839,853, June 13, 1989.

— Computerized cross-language document retrieval using latent semantic indexing.

U.S. Patent No. 5,301,109, April 5, 1994.

• LATENT SEMANTIC INDEXING

Page 16: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Latent Semantic IndexingUse a finite Fourier expansion of A

A =∑r

i=1 σiZi, 〈Zi Zj〉 ={

1 i=j,

0 i=j,|σ1| ≥ |σ2| ≥ . . . ≥ |σr|

|σi| = | 〈Zi A〉 | = amount of A in direction of Zi

Page 17: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Latent Semantic IndexingUse a finite Fourier expansion of A

A =∑r

i=1 σiZi, 〈Zi Zj〉 ={

1 i=j,

0 i=j,|σ1| ≥ |σ2| ≥ . . . ≥ |σr|

|σi| = | 〈Zi A〉 | = amount of A in direction of Zi

Realign data along dominant directions {Z1, . . ., Zk, Zk+1, . . ., Zr}— Project A onto span {Z1, Z2, . . ., Zk}

Page 18: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Latent Semantic IndexingUse a finite Fourier expansion of A

A =∑r

i=1 σiZi, 〈Zi Zj〉 ={

1 i=j,

0 i=j,|σ1| ≥ |σ2| ≥ . . . ≥ |σr|

|σi| = | 〈Zi A〉 | = amount of A in direction of Zi

Realign data along dominant directions {Z1, . . ., Zk, Zk+1, . . ., Zr}— Project A onto span {Z1, Z2, . . ., Zk}

Truncate: Ak = P (A) = σ1Z1 + σ2Z2 + . . . + σkZk

Page 19: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Latent Semantic IndexingUse a finite Fourier expansion of A

A =∑r

i=1 σiZi, 〈Zi Zj〉 ={

1 i=j,

0 i=j,|σ1| ≥ |σ2| ≥ . . . ≥ |σr|

|σi| = | 〈Zi A〉 | = amount of A in direction of Zi

Realign data along dominant directions {Z1, . . ., Zk, Zk+1, . . ., Zr}— Project A onto span {Z1, Z2, . . ., Zk}

Truncate: Ak = P (A) = σ1Z1 + σ2Z2 + . . . + σkZk

LSI: Query matching with Ak in place of A

— Doc2 forced closer to Doc1 =⇒ better chance of finding Doc2

Page 20: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Latent Semantic IndexingUse a finite Fourier expansion of A

A =∑r

i=1 σiZi, 〈Zi Zj〉 ={

1 i=j,

0 i=j,|σ1| ≥ |σ2| ≥ . . . ≥ |σr|

|σi| = | 〈Zi A〉 | = amount of A in direction of Zi

Realign data along dominant directions {Z1, . . ., Zk, Zk+1, . . ., Zr}— Project A onto span {Z1, Z2, . . ., Zk}

Truncate: Ak = P (A) = σ1Z1 + σ2Z2 + . . . + σkZk

LSI: Query matching with Ak in place of A

— Doc2 forced closer to Doc1 =⇒ better chance of finding Doc2

“Best” mathematical solution— SVD: A = UDVT =

∑σiuivT

i Zi = uivTi

Page 21: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Strengths & WeaknessesPros

• Finds hidden connections

Page 22: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Strengths & WeaknessesPros

• Finds hidden connections

• Can be adapted to identify document clusters

— Data mining applications

Page 23: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Strengths & WeaknessesPros

• Finds hidden connections

• Can be adapted to identify document clusters

— Data mining applications

• Performs well on document collections that are

� Small + Homogeneous + Static

Page 24: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Strengths & WeaknessesPros

• Finds hidden connections

• Can be adapted to identify document clusters

— Data mining applications

• Performs well on document collections that are

� Small + Homogeneous + StaticCons

• Rankings are query dependent

— Rank of each doc is recomputed for each query

Page 25: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Strengths & WeaknessesPros

• Finds hidden connections

• Can be adapted to identify document clusters

— Data mining applications

• Performs well on document collections that are

� Small + Homogeneous + StaticCons

• Rankings are query dependent

— Rank of each doc is recomputed for each query

• Only semantic content used

— Susceptible to malicious manipulation

Page 26: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Strengths & WeaknessesPros

• Finds hidden connections

• Can be adapted to identify document clusters

— Data mining applications

• Performs well on document collections that are

� Small + Homogeneous + StaticCons

• Rankings are query dependent

— Rank of each doc is recomputed for each query

• Only semantic content used

— Susceptible to malicious manipulation

• Difficult to add & delete documents

Page 27: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Strengths & WeaknessesPros

• Finds hidden connections

• Can be adapted to identify document clusters

— Data mining applications

• Performs well on document collections that are

� Small + Homogeneous + StaticCons

• Rankings are query dependent

— Rank of each doc is recomputed for each query

• Only semantic content used

— Susceptible to malicious manipulation

• Difficult to add & delete documents

• Finding optimal compression requires empirical tuning

Page 28: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Web FactsDifferent from other document collections

• It’s huge– Over 10 billion pages, where average page size ≈ 500KB

– 20 times size of Library of Congress print collection

– Deep Web ≈ 550 billion pages

Page 29: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Web FactsDifferent from other document collections

• It’s huge– Over 10 billion pages, where average page size ≈ 500KB

– 20 times size of Library of Congress print collection

– Deep Web ≈ 550 billion pages

• It’s dynamic– 40% of all pages change in a week

– 23% of .com pages change daily

– Billions of pages added each year

Page 30: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Web FactsDifferent from other document collections

• It’s huge– Over 10 billion pages, where average page size ≈ 500KB

– 20 times size of Library of Congress print collection

– Deep Web ≈ 550 billion pages

• It’s dynamic– 40% of all pages change in a week

– 23% of .com pages change daily

– Billions of pages added each year

• It’s self-organized– No standards, review process, formats

– Errors, falsehoods, link rot, and spammers!

Page 31: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Web FactsDifferent from other document collections

• It’s huge– Over 10 billion pages, where average page size ≈ 500KB

– 20 times size of Library of Congress print collection

– Deep Web ≈ 550 billion pages

• It’s dynamic– 40% of all pages change in a week

– 23% of .com pages change daily

– Billions of pages added each year

• It’s self-organized– No standards, review process, formats

– Errors, falsehoods, link rot, and spammers!

• It has many users– Google alone processes more than 200 million queries per day

– Approximately 0.25 sec per query involving thousands of computers

Page 32: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Web Search Components

Web Crawlers Software robotsgather web pages

Page 33: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Web Search Components

Web Crawlers Software robotsgather web pages

Doc Server Stores docsand snippits

Page 34: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Web Search Components

Web Crawlers Software robotsgather web pages

Doc Server Stores docsand snippits

Index Server

Scans pages and does term indexingTerms −→ Pages (similar to book index)

Page 35: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Ranking Module

• Measure the importance of each page

Page 36: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Ranking Module

• Measure the importance of each page

• The measure should be Independent of any query

— Primarily determined by the link structure of the Web

— Tempered by some content considerations

Page 37: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Ranking Module

• Measure the importance of each page

• The measure should be Independent of any query

— Primarily determined by the link structure of the Web

— Tempered by some content considerations

• Compute these measures off-line long before any queries areprocessed

Page 38: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Ranking Module

• Measure the importance of each page

• The measure should be Independent of any query

— Primarily determined by the link structure of the Web

— Tempered by some content considerations

• Compute these measures off-line long before any queries areprocessed

• Google’s PageRank c© technology distinguishes it from all com-petitors

Page 39: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Ranking Module

• Measure the importance of each page

• The measure should be Independent of any query

— Primarily determined by the link structure of the Web

— Tempered by some content considerations

• Compute these measures off-line long before any queries areprocessed

• Google’s PageRank c© technology distinguishes it from all com-petitors

Google’s PageRank = Google’s $$$$$

Page 40: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Process

Page 41: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Process

Page 42: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Process

Page 43: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Process

Page 44: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Process

Page 45: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 46: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 47: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 48: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 49: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 50: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 51: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 52: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

How To Measure “Importance”

Landmark Result Paper Survey Paper—Big Bib

Page 53: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

How To Measure “Importance”

Landmark Result Paper Survey Paper—Big Bib

Authorities Hubs

Page 54: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

How To Measure “Importance”

Landmark Result Paper Survey Paper—Big Bib

Authorities Hubs

• Good hubs point to good authorities

• Good authorities are pointed to by good hubs

Page 55: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITSHypertext Induced Topic Search (1998)

Jon Kleinberg

Determine Authority & Hub Scores

• ai = authority score for Pi

• hi = hub score for Pi

Page 56: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITSHypertext Induced Topic Search (1998)

Jon Kleinberg

Determine Authority & Hub Scores

• ai = authority score for Pi

• hi = hub score for Pi

Successive Refinement• Start with hi = 1 for all pages Pi ⇒ h0 =

⎡⎢⎢⎣

11...1

⎤⎥⎥⎦

Page 57: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITSHypertext Induced Topic Search (1998)

Jon Kleinberg

Determine Authority & Hub Scores

• ai = authority score for Pi

• hi = hub score for Pi

Successive Refinement• Start with hi = 1 for all pages Pi ⇒ h0 =

⎡⎢⎢⎣

11...1

⎤⎥⎥⎦

• Define Authority Scores (on the first pass)

ai =∑

j:Pj→Pi

hj

Page 58: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITSHypertext Induced Topic Search (1998)

Jon Kleinberg

Determine Authority & Hub Scores

• ai = authority score for Pi

• hi = hub score for Pi

Successive Refinement• Start with hi = 1 for all pages Pi ⇒ h0 =

⎡⎢⎢⎣

11...1

⎤⎥⎥⎦

• Define Authority Scores (on the first pass)

ai =∑

j:Pj→Pi

hj ⇒ a1 =

⎡⎢⎢⎣

a1

a2...an

⎤⎥⎥⎦ = LTh0

Lij ={

1 Pi → Pj

0 Pi → Pj

Page 59: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Page 60: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

Page 61: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

• h1 = La1

Page 62: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

• h1 = La1

• a2 = LTh1

Page 63: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

• h1 = La1

• a2 = LTh1

• h2 = La2

. . .

Page 64: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

• h1 = La1

• a2 = LTh1

• h2 = La2

. . .Combined Iterations

• A = LTL (authority matrix)

Page 65: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

• h1 = La1

• a2 = LTh1

• h2 = La2

. . .Combined Iterations

• A = LTL (authority matrix) ak = Aak−1 → e-vector (direction)

Page 66: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

• h1 = La1

• a2 = LTh1

• h2 = La2

. . .Combined Iterations

• A = LTL (authority matrix) ak = Aak−1 → e-vector (direction)

• H = LLT (hub matrix) hk = Hhk−1 → e-vector (direction)

Page 67: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS AlgorithmRefine Hub Scores

• hi =∑

j:Pi→Pj

aj ⇒ h1 = La1 Lij ={

1 Pi → Pj

0 Pi → Pj

Successively Re-refine Authority & Hub Scores

• a1 = LTh0

• h1 = La1

• a2 = LTh1

• h2 = La2

. . .Combined Iterations

• A = LTL (authority matrix) ak = Aak−1 → e-vector (direction)

• H = LLT (hub matrix) hk = Hhk−1 → e-vector (direction)

!! May not be uniquely defined !!

Page 68: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Compromise

1. Do direct query matching

Page 69: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Compromise

1. Do direct query matching

2. Build neighborhood graph

Page 70: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Compromise

1. Do direct query matching

2. Build neighborhood graph

3. Compute authority & hub scores for just the neighborhood

Page 71: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Pros & Cons

Advantages

• Returns satisfactory results

— Client gets both authority & hub scores

Page 72: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Pros & Cons

Advantages

• Returns satisfactory results

— Client gets both authority & hub scores

• Some flexibility for making refinements

Page 73: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Pros & Cons

Advantages

• Returns satisfactory results

— Client gets both authority & hub scores

• Some flexibility for making refinements

Disadvantages

• Too much has to happen while client is waiting

Page 74: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Pros & Cons

Advantages

• Returns satisfactory results

— Client gets both authority & hub scores

• Some flexibility for making refinements

Disadvantages

• Too much has to happen while client is waiting

— Custom built neighborhood graph needed for each query

Page 75: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Pros & Cons

Advantages

• Returns satisfactory results

— Client gets both authority & hub scores

• Some flexibility for making refinements

Disadvantages

• Too much has to happen while client is waiting

— Custom built neighborhood graph needed for each query

— Two eigenvector computations needed for each query

Page 76: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Pros & Cons

Advantages

• Returns satisfactory results

— Client gets both authority & hub scores

• Some flexibility for making refinements

Disadvantages

• Too much has to happen while client is waiting

— Custom built neighborhood graph needed for each query

— Two eigenvector computations needed for each query

• Scores can be manipulated by creating artificial hubs

Page 77: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

HITS Applied

−→ −→

Page 78: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Wall Street Journal May 24, 2007

Page 79: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 80: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Google’s PageRank(Lawrence Page & Sergey Brin 1998)

The Google Goals

• Create a PageRank r(P ) that is not query dependent

� Off-line calculations — No query time computation

• Let the Web vote with in-links

� But not by simple link counts

— One link to P from Yahoo! is important

— Many links to P from me is not

• Share The Vote

� Yahoo! casts many “votes”

— value of vote from Yahoo! is diluted

� If Yahoo! “votes” for n pages

— Then P receives only r(Y )/n credit from Y

Page 81: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Google’s PageRank(Lawrence Page & Sergey Brin 1998)

The Google Goals

• Create a PageRank r(P ) that is not query dependent

� Off-line calculations — No query time computation

• Let the Web vote with in-links

� But not by simple link counts

— One link to P from Yahoo! is important

— Many links to P from me is not

• Share The Vote

� Yahoo! casts many “votes”

— value of vote from Yahoo! is diluted

� If Yahoo! “votes” for n pages

— Then P receives only r(Y )/n credit from Y

Page 82: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Google’s PageRank(Lawrence Page & Sergey Brin 1998)

The Google Goals

• Create a PageRank r(P ) that is not query dependent

� Off-line calculations — No query time computation

• Let the Web vote with in-links

� But not by simple link counts

— One link to P from Yahoo! is important

— Many links to P from me is not

• Share The Vote

� Yahoo! casts many “votes”

— value of vote from Yahoo! is diluted

� If Yahoo! “votes” for n pages

— Then P receives only r(Y )/n credit from Y

Page 83: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Google’s PageRank(Lawrence Page & Sergey Brin 1998)

The Google Goals

• Create a PageRank r(P ) that is not query dependent

� Off-line calculations — No query time computation

• Let the Web vote with in-links

� But not by simple link counts

— One link to P from Yahoo! is important

— Many links to P from me is not

• Share The Vote

� Yahoo! casts many “votes”

— value of vote from Yahoo! is diluted

� If Yahoo! “votes” for n pages

— Then P receives only r(Y )/n credit from Y

Page 84: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

PageRankGoogle’s Original Idea

r(P ) =∑P∈BP

r(P )|P |

BP = {all pages pointing to P}|P | = number of out links from P

Page 85: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

PageRankGoogle’s Original Idea

r(P ) =∑P∈BP

r(P )|P |

BP = {all pages pointing to P}|P | = number of out links from P

Successive Refinement

Start with r0(Pi) = 1/n for all pages P1, P2, . . ., Pn

Page 86: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

PageRankGoogle’s Original Idea

r(P ) =∑P∈BP

r(P )|P |

BP = {all pages pointing to P}|P | = number of out links from P

Successive Refinement

Start with r0(Pi) = 1/n for all pages P1, P2, . . ., Pn

Iteratively refine rankings for each page

r1(Pi) =∑

P∈BPi

r0(P )|P |

Page 87: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

PageRankGoogle’s Original Idea

r(P ) =∑P∈BP

r(P )|P |

BP = {all pages pointing to P}|P | = number of out links from P

Successive Refinement

Start with r0(Pi) = 1/n for all pages P1, P2, . . ., Pn

Iteratively refine rankings for each page

r1(Pi) =∑

P∈BPi

r0(P )|P |

r2(Pi) =∑

P∈BPi

r1(P )|P |

Page 88: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

PageRankGoogle’s Original Idea

r(P ) =∑P∈BP

r(P )|P |

BP = {all pages pointing to P}|P | = number of out links from P

Successive Refinement

Start with r0(Pi) = 1/n for all pages P1, P2, . . ., Pn

Iteratively refine rankings for each page

r1(Pi) =∑

P∈BPi

r0(P )|P |

r2(Pi) =∑

P∈BPi

r1(P )|P |

. . .

rj+1(Pi) =∑

P∈BPi

rj(P )|P |

Page 89: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

In Matrix Notation

After Step k

— πTk = [rk(P1), rk(P2), . . ., rk(Pn)]

Page 90: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

In Matrix Notation

After Step k

— πTk = [rk(P1), rk(P2), . . ., rk(Pn)]

— πTk+1 = πT

k H where hij ={

1/|Pi| if i → j

0 otherwise

Page 91: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

In Matrix Notation

After Step k

— πTk = [rk(P1), rk(P2), . . ., rk(Pn)]

— πTk+1 = πT

k H where hij ={

1/|Pi| if i → j

0 otherwise

— PageRank vector = πT = limk→∞

πTk = eigenvector for H

Provided that the limit exists

Page 92: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1

P2

P3

P4

P5

P6

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 93: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2

P3

P4

P5

P6

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 94: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3

P4

P5

P6

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 95: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4

P5

P6

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 96: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5

P6

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 97: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 98: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 99: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

� A random walk on the Web Graph

Page 100: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

� A random walk on the Web Graph

� PageRank = πi = amount of time spent at Pi

Page 101: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

� A random walk on the Web Graph

� PageRank = πi = amount of time spent at Pi

� Dead end page (nothing to click on) — a “dangling node”

Page 102: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Tiny Web

3

6 5

4

1

3

6 5

4

1 2

H =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 0 0 0 0 0 0

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

� A random walk on the Web Graph

� PageRank = πi = amount of time spent at Pi

� Dead end page (nothing to click on) — a “dangling node”

� πT = (0,1,0,0,0,0) = e-vector =⇒ Page P2 is a “rank sink”

Page 103: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The FixAllow Web Surfers To Make Random Jumps

Page 104: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The FixAllow Web Surfers To Make Random Jumps

— Replace zero rows with eT

n =(

1

n,1

n, . . . ,

1

n

)

S =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 1/6 1/6 1/6 1/6 1/6 1/6

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 105: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The FixAllow Web Surfers To Make Random Jumps

— Replace zero rows with eT

n =(

1

n,1

n, . . . ,

1

n

)

S =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 1/6 1/6 1/6 1/6 1/6 1/6

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

— S = H + a eT

6 is now row stochastic =⇒ ρ(S) = 1

Page 106: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The FixAllow Web Surfers To Make Random Jumps

— Replace zero rows with eT

n =(

1

n,1

n, . . . ,

1

n

)

S =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 1/6 1/6 1/6 1/6 1/6 1/6

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

— S = H + a eT

6 is now row stochastic =⇒ ρ(S) = 1

— Perron says ∃ πT ≥ 0 s.t. πT = πTS with∑

i πi = 1

Page 107: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Nasty ProblemThe Web Graph Is Not Strongly Connected

Page 108: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Nasty ProblemThe Web Graph Is Not Strongly Connected

— i.e., S is a reducible matrix

S =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 1/6 1/6 1/6 1/6 1/6 1/6

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

Page 109: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Nasty ProblemThe Web Graph Is Not Strongly Connected

— i.e., S is a reducible matrix

S =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

P1 P2 P3 P4 P5 P6

P1 0 1/2 1/2 0 0 0

P2 1/6 1/6 1/6 1/6 1/6 1/6

P3 1/3 1/3 0 0 1/3 0

P4 0 0 0 0 1/2 1/2

P5 0 0 0 1/2 0 1/2

P6 0 0 0 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

— Reducible =⇒ PageRank vector is not well defined

— Frobenius says S needs to be irreducible to ensure a uniqueπT > 0 s.t. πT = πTS with

∑i πi = 1

Page 110: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle (Pi → Pj → Pi)

Page 111: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle (Pi → Pj → Pi)

— The powers Sk fail to converge

Page 112: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle (Pi → Pj → Pi)

— The powers Sk fail to converge

— πTk+1 = πT

k S fails to convergence

Page 113: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle (Pi → Pj → Pi)

— The powers Sk fail to converge

— πTk+1 = πT

k S fails to convergence

Convergence Requirement

— Perron–Frobenius requires S to be primitive

Page 114: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle (Pi → Pj → Pi)

— The powers Sk fail to converge

— πTk+1 = πT

k S fails to convergence

Convergence Requirement

— Perron–Frobenius requires S to be primitive

— No eigenvalues other than λ = 1 on unit circle

Page 115: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle (Pi → Pj → Pi)

— The powers Sk fail to converge

— πTk+1 = πT

k S fails to convergence

Convergence Requirement

— Perron–Frobenius requires S to be primitive

— No eigenvalues other than λ = 1 on unit circle

— Frobenius proved S is primitive ⇐⇒ Sk > 0 for some k

Page 116: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Google FixAllow A Random Jump From Any Page

— G = αS + (1 − α)E > 0, E = eeT/n, 0 < α < 1

Page 117: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Google FixAllow A Random Jump From Any Page

— G = αS + (1 − α)E > 0, E = eeT/n, 0 < α < 1

— G = αH + uvT > 0 u = αa + (1 − α)e, vT = eT/n

Page 118: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Google FixAllow A Random Jump From Any Page

— G = αS + (1 − α)E > 0, E = eeT/n, 0 < α < 1

— G = αH + uvT > 0 u = αa + (1 − α)e, vT = eT/n

— PageRank vector πT = left-hand Perron vector of G

Page 119: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Google FixAllow A Random Jump From Any Page

— G = αS + (1 − α)E > 0, E = eeT/n, 0 < α < 1

— G = αH + uvT > 0 u = αa + (1 − α)e, vT = eT/n

— PageRank vector πT = left-hand Perron vector of G

Some Happy Accidents

— xTG = αxTH + βvT Sparse computations with the original link structure

Page 120: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Google FixAllow A Random Jump From Any Page

— G = αS + (1 − α)E > 0, E = eeT/n, 0 < α < 1

— G = αH + uvT > 0 u = αa + (1 − α)e, vT = eT/n

— PageRank vector πT = left-hand Perron vector of G

Some Happy Accidents

— xTG = αxTH + βvT Sparse computations with the original link structure

— λ2(G) = α Convergence rate controllable by Google engineers

Page 121: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Google FixAllow A Random Jump From Any Page

— G = αS + (1 − α)E > 0, E = eeT/n, 0 < α < 1

— G = αH + uvT > 0 u = αa + (1 − α)e, vT = eT/n

— PageRank vector πT = left-hand Perron vector of G

Some Happy Accidents

— xTG = αxTH + βvT Sparse computations with the original link structure

— λ2(G) = α Convergence rate controllable by Google engineers

— vT can be any positive probability vector in G = αH + uvT

Page 122: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

The Google FixAllow A Random Jump From Any Page

— G = αS + (1 − α)E > 0, E = eeT/n, 0 < α < 1

— G = αH + uvT > 0 u = αa + (1 − α)e, vT = eT/n

— PageRank vector πT = left-hand Perron vector of G

Some Happy Accidents

— xTG = αxTH + βvT Sparse computations with the original link structure

— λ2(G) = α Convergence rate controllable by Google engineers

— vT can be any positive probability vector in G = αH + uvT

— The choice of vT allows for personalization

Page 123: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 124: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton
Page 125: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

Personalization is Coming

Page 126: Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton

ConclusionGoogle Augments PR With Content Scores For Final Rankings

Content “Metrics” Are Proprietary — But Known Examples

— Whether query terms appear in the title or the body

— Number of times query terms appear in a page

— Proximity of multiple query words to one another

— Appearance of query terms in a page (e.g., headings in bold font score higher)

— Content of neighboring web pages

Elegant and Exciting Application of Mathematics

That Is Changing The World