Download - Raleigh, NC North Carolina State University Department of ...carlmeyer.com/Presentations/CSU_11_2_07.pdf · S ystem for the M echanical A nalysis and R etrieval of T ext Gerard Salton


Google’s PageRank and Beyond

Carl Meyer

Department of Mathematics

North Carolina State University

Raleigh, NC

Colorado State University

Fort Collins, Colorado

November 2, 2007

Beautiful mathematics eventually tends to be useful.

And useful mathematics eventually tends to be beautiful.

Eye Of The Beholder

$F(f) = \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, dt$ ←→

Eye Of The Beholder

{Perron–Frobenius, Markov Chains} ←→

System for the Mechanical Analysis and Retrieval of Text (SMART)

Gerard Salton

Harvard 1962 – 1965

Cornell 1965 – 1970

• Implemented on IBM 7094 & IBM 360

• Based on matrix methods

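Term–Document Matrices

Start with dictionary of terms

Words or phrases (e.g., landing gear)

Index Each Document

Humans scour pages and mark key terms

Count $f_{ij}$ = number of times term $i$ appears in document $j$

Term–Document Matrix

$$
\begin{array}{c|cccc}
 & \text{Doc } 1 & \text{Doc } 2 & \cdots & \text{Doc } n \\ \hline
\text{Term } 1 & f_{11} & f_{12} & \cdots & f_{1n} \\
\text{Term } 2 & f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\text{Term } m & f_{m1} & f_{m2} & \cdots & f_{mn}
\end{array}
\;=\; A_{m \times n}
$$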
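A minimal sketch of how such a matrix could be assembled, assuming single-word terms and whitespace tokenization (the slides describe human indexing and allow phrases, so this is a simplification). The function name `term_document_matrix` and the sample terms and documents are hypothetical.

```python
from collections import Counter

def term_document_matrix(terms, docs):
    """Return A as a list of rows, where A[i][j] counts terms[i] in docs[j]."""
    # Assumption: lower-casing and splitting on whitespace stands in for indexing.
    counts = [Counter(doc.lower().split()) for doc in docs]
    return [[c[t] for c in counts] for t in terms]

# Hypothetical dictionary of terms and three tiny documents.
terms = ["landing", "gear", "engine"]
docs = ["landing gear landing", "engine gear", "engine engine engine"]
A = term_document_matrix(terms, docs)
```

Each row of `A` corresponds to a term and each column to a document, matching the layout of the matrix above.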

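Query Matching

Query Vector

$$q^T = (q_1, q_2, \ldots, q_m), \qquad q_i = \begin{cases} 1 & \text{if Term } i \text{ is requested} \\ 0 & \text{if not} \end{cases}$$

How Close is Query to Each Document?

i.e., how close is $q$ to each column $A_i$?

[Figure: vectors $A_1$, $A_2$, $A_3$ and $q$, with angle $\theta_2$ marked]

Use $\delta_i = \cos\theta_i = \dfrac{q^T A_i}{\|q\|\,\|A_i\|}$

Rank documents by size of $\delta_i$

Return Document $i$ to user when $\delta_i \ge tol$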
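The matching rule above can be sketched in a few lines. This is an illustration only: the function names, the sample matrix, and the threshold value 0.4 are hypothetical, and treating an all-zero query or document as scoring 0 is an assumption the slides do not address.

```python
import math

def cosine_scores(A, q):
    """Cosine of the angle between q and each column of A (one score per document)."""
    m, n = len(A), len(A[0])
    q_norm = math.sqrt(sum(x * x for x in q))
    scores = []
    for j in range(n):
        col = [A[i][j] for i in range(m)]
        col_norm = math.sqrt(sum(x * x for x in col))
        dot = sum(q[i] * col[i] for i in range(m))
        # Assumption: a zero vector gets score 0 instead of a division by zero.
        scores.append(dot / (q_norm * col_norm) if q_norm and col_norm else 0.0)
    return scores

def retrieve(A, q, tol):
    """Indices of documents whose score is at least tol, best match first."""
    scores = cosine_scores(A, q)
    hits = [j for j, s in enumerate(scores) if s >= tol]
    return sorted(hits, key=lambda j: scores[j], reverse=True)

A = [[2, 0, 0],
     [1, 1, 0],
     [0, 1, 3]]
q = [1, 1, 0]  # terms 1 and 2 are requested
scores = cosine_scores(A, q)
ranked = retrieve(A, q, tol=0.4)
```

Here the third document shares no term with the query, so its score is 0 and it is not returned.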

Susan Dumais’s Improvement

• Approximate A with a lower rank matrix

• Effect is to compress data in A

• 2 patents for Bell/Telcordia

— Computer information retrieval using latent semantic structure. U.S. Patent No. 4,839,853, June 13, 1989.

— Computerized cross-language document retrieval using latent semantic indexing. U.S. Patent No. 5,301,109, April 5, 1994.

• LATENT SEMANTIC INDEXING

Latent Semantic Indexing

Use a finite Fourier expansion of A

$$A = \sum_{i=1}^{r} \sigma_i Z_i, \qquad \langle Z_i, Z_j \rangle = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}, \qquad |\sigma_1| \ge |\sigma_2| \ge \cdots \ge |\sigma_r|$$

$|\sigma_i| = |\langle Z_i, A \rangle|$ = amount of A in direction of $Z_i$

Realign data along dominant directions $\{Z_1, \ldots, Z_k, Z_{k+1}, \ldots, Z_r\}$

— Project A onto span $\{Z_1, Z_2, \ldots, Z_k\}$

Truncate: $A_k = P(A) = \sigma_1 Z_1 + \sigma_2 Z_2 + \cdots + \sigma_k Z_k$

LSI: Query matching with $A_k$ in place of A

— Doc2 forced closer to Doc1 ⟹ better chance of finding Doc2

“Best” mathematical solution

— SVD: $A = U D V^T = \sum \sigma_i u_i v_i^T$, $Z_i = u_i v_i^T$
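As a worked check of why the SVD terms fit the expansion, assume the inner product is the trace inner product $\langle X, Y \rangle = \operatorname{trace}(X^T Y)$ (the slides do not spell this out). With orthonormal $u_i$ and orthonormal $v_i$, the matrices $Z_i = u_i v_i^T$ are then orthonormal, the coefficients are the singular values, and the truncation error has a closed form:

$$
\begin{aligned}
\langle Z_i, Z_j \rangle &= \operatorname{trace}\!\left(v_i u_i^T u_j v_j^T\right) = (u_i^T u_j)(v_j^T v_i) = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \\
\langle Z_i, A \rangle &= \sum_{j=1}^{r} \sigma_j \langle Z_i, Z_j \rangle = \sigma_i \\
\|A - A_k\|_F^2 &= \Big\| \sum_{i=k+1}^{r} \sigma_i Z_i \Big\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2
\end{aligned}
$$

By the Eckart–Young theorem, no matrix of rank at most $k$ comes closer to $A$ in the Frobenius norm, which is one way to read “best” here.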

Strengths & Weaknesses

Pros

• Finds hidden connections

• Can be adapted to identify document clusters

— Data mining applications

• Performs well on document collections that are

– Small + Homogeneous + Static

Cons

• Rankings are query dependent

— Rank of each doc is recomputed for each query

• Only semantic content used

— Susceptible to malicious manipulation

• Difficult to add & delete documents

• Finding optimal compression requires empirical tuning

Web Facts

Different from other document collections

• It’s huge

– Over 10 billion pages, where average page size ≈ 500KB

– 20 times size of Library of Congress print collection

– Deep Web ≈ 550 billion pages

• It’s dynamic

– 40% of all pages change in a week

– 23% of .com pages change daily

– Billions of pages added each year

• It’s self-organized

– No standards, review process, formats

– Errors, falsehoods, link rot, and spammers!

• It has many users

– Google alone processes more than 200 million queries per day

– Approximately 0.25 sec per query involving thousands of computers

Web Search Components

Web Crawlers: Software robots gather web pages

Doc Server: Stores docs and snippets

Index Server: Scans pages and does term indexing

Terms → Pages (similar to book index)
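The Terms → Pages mapping is commonly called an inverted index. A minimal sketch follows, assuming whitespace tokenization and integer page ids; the function name `build_index` and the sample pages are hypothetical, and nothing here reflects how any real search engine stores its index.

```python
from collections import defaultdict

def build_index(pages):
    """Map each term to the sorted list of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Assumption: lower-cased whitespace tokens stand in for real term extraction.
        for term in text.lower().split():
            index[term].add(page_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical crawl results: page id -> page text.
pages = {1: "landing gear", 2: "gear ratio", 3: "landing strip"}
index = build_index(pages)
```

Looking up `index["gear"]` then returns the pages to consider for a query on that term, much as a book index returns page numbers.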

The Ranking Module

• Measure the importance of each page

• The measure should be Independent of any query

— Primarily determined by the link structure of the Web

— Tempered by some content considerations

• Compute these measures off-line long before any queries are processed

• Google’s PageRank© technology distinguishes it from all competitors

Google’s PageRank = Google’s $$$$$

The Process

[Diagram slides; images not recovered]

How To Measure “Importance”

Landmark Result Paper: Authorities

Survey Paper (Big Bib): Hubs

• Good hubs point to good authorities

• Good authorities are pointed to by good hubs

HITS

Hypertext Induced Topic Search (1998)

Jon Kleinberg

Determine Authority & Hub Scores

• $a_i$ = authority score for $P_i$

• $h_i$ = hub score for $P_i$

Successive Refinement

• Start with $h_i = 1$ for all pages $P_i$ ⇒ $\mathbf{h}_0 = (1, 1, \ldots, 1)^T$

• Define Authority Scores (on the first pass)

$$a_i = \sum_{j:\, P_j \to P_i} h_j \;\Rightarrow\; \mathbf{a}_1 = (a_1, a_2, \ldots, a_n)^T = L^T \mathbf{h}_0, \qquad L_{ij} = \begin{cases} 1 & P_i \to P_j \\ 0 & P_i \not\to P_j \end{cases}$$

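HITS Algorithm

Refine Hub Scores

$$h_i = \sum_{j:\, P_i \to P_j} a_j \;\Rightarrow\; \mathbf{h}_1 = L \mathbf{a}_1, \qquad L_{ij} = \begin{cases} 1 & P_i \to P_j \\ 0 & P_i \not\to P_j \end{cases}$$

Successively Re-refine Authority & Hub Scores

• $\mathbf{a}_1 = L^T \mathbf{h}_0$

• $\mathbf{h}_1 = L \mathbf{a}_1$

• $\mathbf{a}_2 = L^T \mathbf{h}_1$

• $\mathbf{h}_2 = L \mathbf{a}_2$

. . .

Combined Iterations

• $A = L^T L$ (authority matrix), $\mathbf{a}_k = A \mathbf{a}_{k-1}$ → e-vector (direction)

• $H = L L^T$ (hub matrix), $\mathbf{h}_k = H \mathbf{h}_{k-1}$ → e-vector (direction)

!! May not be uniquely defined !!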
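The alternating refinement can be sketched as follows. Rescaling each vector to unit length and running a fixed number of passes are assumptions added here so the numbers stay bounded; the slides only say that the iterates converge in direction. The names `hits` and `normalize` and the three-page graph are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm == 0 else [x / norm for x in v]

def hits(L, iterations=50):
    """Alternate a = L^T h and h = L a, starting from h = (1, ..., 1)."""
    n = len(L)
    h = [1.0] * n
    a = [0.0] * n
    for _ in range(iterations):
        a = [sum(L[j][i] * h[j] for j in range(n)) for i in range(n)]  # a = L^T h
        h = [sum(L[i][j] * a[j] for j in range(n)) for i in range(n)]  # h = L a
        a, h = normalize(a), normalize(h)  # assumption: keep only the direction
    return a, h

# Hypothetical graph: P1 -> P2, P1 -> P3, P2 -> P3 (rows are source pages).
L = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
a, h = hits(L)
```

In this small graph P3 ends up as the strongest authority and P1 as the strongest hub. The warning about uniqueness still applies in general: when the dominant eigenvalue is repeated, the limit depends on the starting vector.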

Compromise

1. Do direct query matching

2. Build neighborhood graph

3. Compute authority & hub scores for just the neighborhood
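Step 2 can be sketched under one common reading, which is an assumption here since the slides do not define the neighborhood: take the pages matched by the query, then add every page they link to and every page that links to them. The function names and the four-page graph are hypothetical.

```python
def neighborhood(L, root):
    """Root pages plus every page they link to or are linked from."""
    n = len(L)
    nodes = set(root)
    for i in root:
        nodes.update(j for j in range(n) if L[i][j])  # out-links of page i
        nodes.update(j for j in range(n) if L[j][i])  # in-links of page i
    return sorted(nodes)

def submatrix(L, nodes):
    """Adjacency matrix restricted to the chosen pages."""
    return [[L[i][j] for j in nodes] for i in nodes]

# Hypothetical web: page 0 -> 1, page 1 -> 2, page 3 is isolated.
L = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
nodes = neighborhood(L, root=[1])  # suppose the query matched page 1 only
N = submatrix(L, nodes)
```

Authority and hub scores would then be computed on `N` rather than on the full matrix, which is what keeps step 3 small enough to run per query.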

Pros & Cons

Advantages

• Returns satisfactory results

— Client gets both authority & hub scores

• Some flexibility for making refinements

Disadvantages

• Too much has to happen while client is waiting

— Custom built neighborhood graph needed for each query

— Two eigenvector computations needed for each query

• Scores can be manipulated by creating artificial hubs

HITS Applied

[Images not recovered]

Wall Street Journal May 24, 2007

Google’s PageRank

(Lawrence Page & Sergey Brin 1998)

The Google Goals

• Create a PageRank $r(P)$ that is not query dependent

– Off-line calculations — No query time computation

• Let the Web vote with in-links

– But not by simple link counts

— One link to P from Yahoo! is important

— Many links to P from me is not

• Share The Vote

– Yahoo! casts many “votes”

— value of vote from Yahoo! is diluted

– If Yahoo! “votes” for n pages

— Then P receives only $r(Y)/n$ credit from Y

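PageRank: Google’s Original Idea

$$ r(P_i) = \sum_{P \in B_{P_i}} \frac{r(P)}{|P|} $$

where $B_{P_i}$ = {all pages pointing to $P_i$} and $|P|$ = number of out-links from $P$.

Successive Refinement

Start with $r_0(P_i) = 1/n$ for all pages $P_1, P_2, \ldots, P_n$

Iteratively refine rankings for each page

$$ r_1(P_i) = \sum_{P \in B_{P_i}} \frac{r_0(P)}{|P|} $$

$$ r_2(P_i) = \sum_{P \in B_{P_i}} \frac{r_1(P)}{|P|} $$

$$ \vdots $$

$$ r_{j+1}(P_i) = \sum_{P \in B_{P_i}} \frac{r_j(P)}{|P|} $$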
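The refinement rule can be tried out on a very small example. The sketch below is illustrative only: the three-page web, the names `in_links`, `out_degree`, and `refine`, and the fixed count of 100 passes are all assumptions, not part of the talk.

```python
# Hypothetical three-page web (not from the slides):
# page 1 links to 2 and 3, page 2 links to 3, page 3 links to 1.
in_links = {1: [3], 2: [1], 3: [1, 2]}   # B_{P_i}: pages pointing to P_i
out_degree = {1: 2, 2: 1, 3: 1}          # |P|: number of out-links from P


def refine(r):
    """One pass of r_{j+1}(P_i) = sum over P in B_{P_i} of r_j(P) / |P|."""
    return {
        page: sum(r[p] / out_degree[p] for p in in_links[page])
        for page in in_links
    }


n = len(in_links)
r = {page: 1.0 / n for page in in_links}  # r_0(P_i) = 1/n
for _ in range(100):                      # assumed number of passes
    r = refine(r)
```

On this particular web the iterates settle down, with pages 1 and 3 sharing the top rank; nothing guarantees that for an arbitrary link structure.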

In Matrix Notation

After Step k

- $\pi_k^T = [r_k(P_1), r_k(P_2), \ldots, r_k(P_n)]$

- $\pi_{k+1}^T = \pi_k^T H$, where
  $$ h_{ij} = \begin{cases} 1/|P_i| & \text{if } i \to j \\ 0 & \text{otherwise} \end{cases} $$

- PageRank vector $= \pi^T = \lim_{k\to\infty} \pi_k^T$ = eigenvector for $H$

  Provided that the limit exists
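The matrix form is a repeated row-vector times matrix product. The following is a minimal sketch under stated assumptions: the 3×3 matrix `H` describes a hypothetical web (1 → {2, 3}, 2 → {3}, 3 → {1}), and `left_multiply`, the tolerance, and the iteration cap are illustrative choices rather than anything from the talk.

```python
def left_multiply(pi, H):
    """Return the row vector pi^T H as a list."""
    n = len(H)
    return [sum(pi[i] * H[i][j] for i in range(n)) for j in range(n)]


# Hypothetical hyperlink matrix: h_ij = 1/|P_i| if page i links to page j.
H = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

pi = [1.0 / 3] * 3                      # pi_0^T = (1/n, ..., 1/n)
for _ in range(200):                    # assumed iteration cap
    nxt = left_multiply(pi, H)
    done = max(abs(a - b) for a, b in zip(nxt, pi)) < 1e-12
    pi = nxt
    if done:
        break

# If the limit exists, it satisfies pi^T = pi^T H, so this should be tiny.
residual = max(abs(a - b) for a, b in zip(left_multiply(pi, H), pi))
```

The residual check is the numerical counterpart of saying that the limit is an eigenvector for H.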

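Tiny Web

[Figure: directed link graph on the six pages 1 through 6]

$$
H = \begin{array}{c|cccccc}
      & P_1 & P_2 & P_3 & P_4 & P_5 & P_6 \\ \hline
P_1 & 0   & 1/2 & 1/2 & 0   & 0   & 0   \\
P_2 & 0   & 0   & 0   & 0   & 0   & 0   \\
P_3 & 1/3 & 1/3 & 0   & 0   & 1/3 & 0   \\
P_4 & 0   & 0   & 0   & 0   & 1/2 & 1/2 \\
P_5 & 0   & 0   & 0   & 1/2 & 0   & 1/2 \\
P_6 & 0   & 0   & 0   & 1   & 0   & 0
\end{array}
$$

- A random walk on the Web graph

- PageRank $= \pi_i$ = amount of time spent at $P_i$

- Dead-end page (nothing to click on): a “dangling node”

- $\pi^T = (0,1,0,0,0,0)$ = eigenvector $\Longrightarrow$ page $P_2$ is a “rank sink”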
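To make the dangling-node problem concrete, the sketch below rebuilds H from the link lists read off the matrix above and takes one step of the random walk. The names `links`, `dangling`, and `pi_next` are illustrative assumptions; exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction

# Out-links read off the rows of H above (page -> pages it links to).
links = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}
n = len(links)

# h_ij = 1/|P_i| if i -> j, else 0. A page with no out-links gives a zero row.
H = [
    [Fraction(1, len(links[i])) if j in links[i] else Fraction(0)
     for j in range(1, n + 1)]
    for i in range(1, n + 1)
]

dangling = [page for page in links if not links[page]]
row_sums = [sum(row) for row in H]

# One step of the walk from the uniform start: pi_1^T = pi_0^T H.
pi0 = [Fraction(1, n)] * n
pi_next = [sum(pi0[i] * H[i][j] for i in range(n)) for j in range(n)]
total_after_one_step = sum(pi_next)
```

Because the row for P2 is zero, the share of rank sitting on P2 is not passed on, so the entries of `pi_next` no longer sum to 1.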

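The Fix: Allow Web Surfers To Make Random Jumps

- Replace zero rows with $\frac{e^T}{n} = \left( \frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n} \right)$

$$
S = \begin{array}{c|cccccc}
      & P_1 & P_2 & P_3 & P_4 & P_5 & P_6 \\ \hline
P_1 & 0   & 1/2 & 1/2 & 0   & 0   & 0   \\
P_2 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
P_3 & 1/3 & 1/3 & 0   & 0   & 1/3 & 0   \\
P_4 & 0   & 0   & 0   & 0   & 1/2 & 1/2 \\
P_5 & 0   & 0   & 0   & 1/2 & 0   & 1/2 \\
P_6 & 0   & 0   & 0   & 1   & 0   & 0
\end{array}
$$

- $S = H + \frac{a e^T}{6}$ is now row stochastic $\Longrightarrow \rho(S) = 1$ (here $a_i = 1$ if $P_i$ is a dangling node and $a_i = 0$ otherwise)

- Perron says $\exists\, \pi^T \ge 0$ s.t. $\pi^T = \pi^T S$ with $\sum_i \pi_i = 1$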
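The rank-one update can be checked directly on the Tiny Web. This is a small sketch, assuming the dangling-node vector `a` is derived from the zero rows of H; the alias `F` and the variable names are illustrative.

```python
from fractions import Fraction as F

# Hyperlink matrix H of the Tiny Web (row P2 is all zeros).
H = [
    [F(0), F(1, 2), F(1, 2), F(0), F(0), F(0)],
    [F(0)] * 6,
    [F(1, 3), F(1, 3), F(0), F(0), F(1, 3), F(0)],
    [F(0), F(0), F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1, 2), F(0), F(1, 2)],
    [F(0), F(0), F(0), F(1), F(0), F(0)],
]
n = len(H)

# a_i = 1 exactly when row i of H is zero (P_i is a dangling node).
a = [1 if sum(row) == 0 else 0 for row in H]

# S = H + a e^T / n: only the zero rows are replaced by (1/n, ..., 1/n).
S = [[H[i][j] + a[i] * F(1, n) for j in range(n)] for i in range(n)]

row_sums = [sum(row) for row in S]   # every entry should equal 1
```

Rows of H that already sum to 1 are untouched, which is why the fix preserves the original link structure everywhere except at dangling nodes.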

Nasty Problem: The Web Graph Is Not Strongly Connected

- i.e., S is a reducible matrix

$$
S = \begin{array}{c|cccccc}
      & P_1 & P_2 & P_3 & P_4 & P_5 & P_6 \\ \hline
P_1 & 0   & 1/2 & 1/2 & 0   & 0   & 0   \\
P_2 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
P_3 & 1/3 & 1/3 & 0   & 0   & 1/3 & 0   \\
P_4 & 0   & 0   & 0   & 0   & 1/2 & 1/2 \\
P_5 & 0   & 0   & 0   & 1/2 & 0   & 1/2 \\
P_6 & 0   & 0   & 0   & 1   & 0   & 0
\end{array}
$$

- Reducible $\Longrightarrow$ PageRank vector is not well defined

- Frobenius says S needs to be irreducible to ensure a unique $\pi^T > 0$ s.t. $\pi^T = \pi^T S$ with $\sum_i \pi_i = 1$
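Reducibility of S is the same as its graph failing to be strongly connected, and that can be tested by a reachability search. The sketch below is one possible check, not the speaker's method; `reachable` and `is_irreducible` are hypothetical helper names, and pages are indexed from 0 (so index 3 is P4).

```python
# Stochastic matrix S of the Tiny Web (after the dangling-node fix).
S = [
    [0, 1/2, 1/2, 0, 0, 0],
    [1/6, 1/6, 1/6, 1/6, 1/6, 1/6],
    [1/3, 1/3, 0, 0, 1/3, 0],
    [0, 0, 0, 0, 1/2, 1/2],
    [0, 0, 0, 1/2, 0, 1/2],
    [0, 0, 0, 1, 0, 0],
]


def reachable(S, start):
    """Indices reachable from `start` by following nonzero entries of S."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, weight in enumerate(S[i]):
            if weight > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


n = len(S)
# S is irreducible exactly when every page can reach every other page.
is_irreducible = all(len(reachable(S, i)) == n for i in range(n))
from_p4 = reachable(S, 3)   # pages a surfer can ever visit starting at P4
```

Starting at P4 the surfer never leaves {P4, P5, P6}, which is what makes S reducible here.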

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle ($P_i \to P_j \to P_i$)

- The powers $S^k$ fail to converge

- $\pi_{k+1}^T = \pi_k^T S$ fails to converge

Convergence Requirement

- Perron–Frobenius requires S to be primitive

- No eigenvalues other than $\lambda = 1$ on unit circle

- Frobenius proved S is primitive $\Longleftrightarrow S^k > 0$ for some k
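A two-page cycle shows the failure, and Frobenius's test gives a finite check for primitivity. The sketch below is illustrative: the matrices `cycle` and `lazy` are hypothetical examples, `is_primitive` is an assumed helper name, and the cutoff $(n-1)^2 + 1$ is Wielandt's bound on the smallest power that must be positive when a matrix is primitive.

```python
def is_primitive(S):
    """Test whether S^k > 0 for some k, using only the zero/nonzero pattern."""
    n = len(S)
    pattern = [[S[i][j] > 0 for j in range(n)] for i in range(n)]
    power = pattern                       # pattern of S^1
    for _ in range((n - 1) ** 2 + 1):     # Wielandt's bound on k
        if all(all(row) for row in power):
            return True
        # Pattern of the next power: entry (i, j) is positive when some
        # intermediate page k has a positive (i, k) and (k, j) entry.
        power = [
            [any(power[i][k] and pattern[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return False


# Hypothetical two-page cycle P1 -> P2 -> P1: irreducible but not primitive.
cycle = [[0.0, 1.0], [1.0, 0.0]]
# Hypothetical variant where P1 also links to itself: primitive.
lazy = [[0.5, 0.5], [1.0, 0.0]]

# pi_{k+1}^T = pi_k^T S on the cycle just swaps the two entries forever.
pi = [1.0, 0.0]
history = []
for _ in range(4):
    pi = [sum(pi[i] * cycle[i][j] for i in range(2)) for j in range(2)]
    history.append(pi)
```

The iterates in `history` alternate between (0, 1) and (1, 0), so the sequence has no limit even though the cycle is strongly connected.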

The Google Fix: Allow A Random Jump From Any Page

- $G = \alpha S + (1 - \alpha) E > 0$, where $E = ee^T/n$ and $0 < \alpha < 1$

- $G = \alpha H + uv^T > 0$, where $u = \alpha a + (1 - \alpha) e$ and $v^T = e^T/n$

- PageRank vector $\pi^T$ = left-hand Perron vector of G

Some Happy Accidents

- $x^T G = \alpha x^T H + \beta v^T$, with the scalar $\beta = x^T u$: sparse computations with the original link structure

- $\lambda_2(G) = \alpha$: convergence rate controllable by Google engineers

- $v^T$ can be any positive probability vector in $G = \alpha H + uv^T$

- The choice of $v^T$ allows for personalization
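The identity $x^T G = \alpha x^T H + \beta v^T$ means G never has to be formed. The following is a minimal sketch of a power iteration on the Tiny Web that touches only the link lists; the function name `pagerank`, the value $\alpha = 0.9$, the tolerance, and the iteration cap are assumptions made for illustration and are not stated in the talk.

```python
# Out-links of the Tiny Web (page -> pages it links to); P2 is dangling.
links = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}


def pagerank(links, alpha=0.9, v=None, tol=1e-12, max_iter=1000):
    """Power iteration x^T <- alpha x^T H + beta v^T without building G."""
    pages = sorted(links)
    n = len(pages)
    if v is None:
        v = {p: 1.0 / n for p in pages}        # v^T = e^T / n
    x = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new_x = {p: 0.0 for p in pages}
        dangling_mass = 0.0                    # x^T a
        for p in pages:
            if links[p]:
                share = alpha * x[p] / len(links[p])
                for q in links[p]:             # sparse part: alpha x^T H
                    new_x[q] += share
            else:
                dangling_mass += x[p]
        # beta = x^T u with u = alpha a + (1 - alpha) e
        beta = alpha * dangling_mass + (1.0 - alpha) * sum(x.values())
        for p in pages:
            new_x[p] += beta * v[p]
        change = sum(abs(new_x[p] - x[p]) for p in pages)
        x = new_x
        if change < tol:
            break
    return x


pi = pagerank(links)
ranking = sorted(pi, key=pi.get, reverse=True)
```

With these assumed settings P4 comes out on top. Passing a non-uniform positive probability vector as `v` is the hook for personalization, and a smaller `alpha` would converge in fewer passes at the cost of leaning less on the link structure.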

Personalization is Coming

Conclusion

Google Augments PR With Content Scores For Final Rankings

Content “Metrics” Are Proprietary, But Some Examples Are Known

- Whether query terms appear in the title or the body

- Number of times query terms appear in a page

- Proximity of multiple query words to one another

- Appearance of query terms in a page (e.g., headings in bold font score higher)

- Content of neighboring web pages

Elegant and Exciting Application of Mathematics

That Is Changing The World