Wsdm west wesley-smith

Jevin West, Information School, University of Washington; Ian Wesley-Smith, Information School, University of Washington; Carl T. Bergstrom, Department of Biology, University of Washington. Article-Level EigenFactor (ALEF)

Transcript of Wsdm west wesley-smith

Page 1: Wsdm west wesley-smith

Jevin West, Information School, University of Washington

Ian Wesley-Smith, Information School, University of Washington

Carl T. Bergstrom, Department of Biology, University of Washington

Article-Level EigenFactor (ALEF)

Page 2: Wsdm west wesley-smith
Page 3: Wsdm west wesley-smith

Article-level Eigenfactor

WSDM Cup Challenge

Page 4: Wsdm west wesley-smith
Page 5: Wsdm west wesley-smith

Journal Ranking

$P = \alpha H + (1 - \alpha)\, a e^{T}$

• $P$: matrix representing the random walk over citations
• $\alpha$: probability of not teleporting
• $H$: cross-citation matrix dictating the structure of the citation network
• $(1 - \alpha)\, a e^{T}$: probability of teleporting to a completely new journal, weighted by the number of articles in that journal

$EF = 100 \, \dfrac{H\pi}{\sum_i [H\pi]_i}$

• $\pi$: leading eigenvector of the random walk matrix $P$
• $\sum_i [H\pi]_i$: normalization

West, J.D. et al. (2010) College & Research Libraries
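The journal-ranking formulas on this slide can be sketched numerically. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. The function name `eigenfactor`, the orientation `Z[i, j]` = citations from journal j to journal i, the default α = 0.85, and the requirement that every journal cites at least one other journal (no dangling columns) are all assumptions made for illustration.

```python
import numpy as np

def eigenfactor(Z, articles, alpha=0.85, tol=1e-12, max_iter=1000):
    """Toy Eigenfactor: power iteration on P = alpha*H + (1 - alpha)*a*e^T.

    Z[i, j]  : citations from journal j to journal i (assumed orientation)
    articles : number of articles per journal (defines the teleport vector a)
    """
    Z = np.asarray(Z, dtype=float)
    a = np.asarray(articles, dtype=float)
    a = a / a.sum()                       # teleport vector, sums to 1

    col_sums = Z.sum(axis=0)
    if np.any(col_sums == 0):
        # Assumption of this sketch: no dangling journals.
        raise ValueError("every journal must cite at least one other journal")
    H = Z / col_sums                      # column-stochastic cross-citation matrix

    n = len(a)
    pi = np.full(n, 1.0 / n)              # start from the uniform distribution
    for _ in range(max_iter):
        # P @ pi, using e^T pi = 1 so that (a e^T) pi = a
        new_pi = alpha * (H @ pi) + (1 - alpha) * a
        new_pi /= new_pi.sum()
        converged = np.abs(new_pi - pi).sum() < tol
        pi = new_pi
        if converged:
            break

    flow = H @ pi                         # one citation step from the stationary state
    return 100 * flow / flow.sum()        # normalize so the scores sum to 100
```

For example, `eigenfactor([[0, 1], [1, 0]], [1, 1])` describes two journals that cite only each other and publish equally, so each receives a score of 50.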

Page 6: Wsdm west wesley-smith
Page 7: Wsdm west wesley-smith
Page 8: Wsdm west wesley-smith

Hierarchical Mapping without ALEF

Page 9: Wsdm west wesley-smith

Hierarchical Mapping with ALEF

Page 10: Wsdm west wesley-smith

Flow Distribution

[Figure: flow distribution over time for PageRank and ALEF]

Page 11: Wsdm west wesley-smith
Page 12: Wsdm west wesley-smith

Smart Teleportation

Lambiotte & Rosvall (2012) Phys. Rev. E

Page 13: Wsdm west wesley-smith

Mechanics

Input: adjacency matrix

1. calculate step weight

2. make row stochastic

3. one-step on network
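The three steps listed on this slide can be sketched as follows. This is a hedged illustration, not the authors' code. The slide does not define the step weight, so the sketch assumes it is each paper's in-degree plus out-degree, normalized to sum to 1. The function name `alef_like_scores` and the orientation `A[i, j] = 1` when paper i cites paper j are also assumptions.

```python
import numpy as np

def alef_like_scores(A):
    """Three-step sketch: step weight, row-stochastic matrix, one step.

    A[i, j] = 1 if paper i cites paper j (assumed orientation).
    """
    A = np.asarray(A, dtype=float)

    # 1. calculate step weight (ASSUMPTION: in-degree + out-degree per paper)
    w = A.sum(axis=0) + A.sum(axis=1)
    w = w / w.sum()

    # 2. make row stochastic; papers with no out-citations keep an all-zero row
    out_deg = A.sum(axis=1, keepdims=True)
    H = np.divide(A, out_deg, out=np.zeros_like(A), where=out_deg > 0)

    # 3. one step on the network: weight flows along citations exactly once
    return w @ H
```

Because only a single matrix-vector product is taken instead of iterating to convergence, a sketch like this stays cheap on large citation graphs, which fits the "fast calculation" point made later in the deck.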

Page 14: Wsdm west wesley-smith

Flow Distribution (JSTOR)

[Figure: Total Flow/Paper by Year. Black = ALEF; Green = citations; Red = PageRank; Blue = unrecorded teleport]

Page 15: Wsdm west wesley-smith

Tree Depth and Cluster Size

[Figure: two panels, Tree Depth by Year and Cluster Size by Year. Black = ALEF; Green = OUTDIR; Red = DIR-R; Blue = DIR-UR]

Page 16: Wsdm west wesley-smith

ALEF Strengths

Performs well

Simple mechanics

Fast calculation

High resolution partitions

Page 17: Wsdm west wesley-smith

West, Wesley-Smith, Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data

Page 18: Wsdm west wesley-smith

Papers

• J.D. West, M. Rosvall, C.T. Bergstrom (2016) Ranking and mapping article-level citation networks, in prep

• J.D. West, I. Wesley-Smith, C.T. Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data

• I. Wesley-Smith, C.T. Bergstrom, J.D. West (2016) Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF), WSDM Conference: Entity Ranking Challenge Workshop

• I. Wesley-Smith, J.D. West (2016) Babel: A platform for research in scholarly article recommendation. WWW Conference, Workshop on Big Scholarly Data

Page 19: Wsdm west wesley-smith

babel.eigenfactor.org

Ian Wesley-Smith

Page 20: Wsdm west wesley-smith

Article-level Eigenfactor

WSDM Cup Challenge

Page 21: Wsdm west wesley-smith

Data Pipeline

Raw Data → Citation Score + Author Scores → Blend Features → Randomize Zeroes → Final Scores

Page 22: Wsdm west wesley-smith

Citation Scores

Page 23: Wsdm west wesley-smith

Citation Scores

[Figure: Average Paper Score by Year (ALEF). Axes: Average Score vs. Year]

Page 24: Wsdm west wesley-smith

Citation Variants

Variant             Coverage (%)   Unique (%)   Score (%)
ALEF                42.76          4.2          69.3
Degree Centrality   33.97          0.01         68.7
2-Step              40.76          7.2          66.5
In Citations        40.76          6.13         66.3
Uniform             40.76          4.9          68.1

Page 25: Wsdm west wesley-smith

Author Scores

Page 26: Wsdm west wesley-smith

Author Scores

• Author Score = Average citation score of all papers

• How should paper credit be assigned?
  – Equally or Fractional?

• Why not sum?
  – Unique Scores: 72.15% vs 28.27%
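The averaging rule in the first bullet can be sketched as below. This is a minimal illustration that assumes equal (not fractional) credit per author. The function name `author_scores` and the dictionary-based inputs are hypothetical, chosen only to keep the example self-contained.

```python
from collections import defaultdict

def author_scores(paper_scores, paper_authors):
    """Author score = average citation score of all of the author's scored papers.

    paper_scores  : dict paper_id -> citation score
    paper_authors : dict paper_id -> list of author ids
    ASSUMPTION: every author of a paper receives the paper's full score
    (equal credit); papers without a citation score are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for paper, authors in paper_authors.items():
        if paper not in paper_scores:
            continue
        for author in authors:
            totals[author] += paper_scores[paper]
            counts[author] += 1
    return {author: totals[author] / counts[author] for author in totals}
```

An author's score can then be carried back to papers that lack a citation score of their own, which is one way the coverage gain reported later in the deck could arise.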

Page 27: Wsdm west wesley-smith

Other Features?

Raw Data → Citation Score + Author Scores + Affiliation Score? → Blend Features → Randomize Zeroes → Final Scores

Page 28: Wsdm west wesley-smith

Other Features

• Matching datasets is hard

• Author Affiliation: University of Washington
  – george washington university
  – university of washington bioengineering
  – university of washington information school
  – university of washington school of law
  – university of washington tacoma
  – university of washington bothell

• Coverage is low: 25% of paper-author pairs have an affiliation

Page 29: Wsdm west wesley-smith

Blend Features

Page 30: Wsdm west wesley-smith

Blend Features

• Weighted Average
  – Weights found via manual parameter sweep
  – Citation Score: 70%
  – Author Score: 30%

• Axiom: Derived scores shouldn’t outweigh the source
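The weighted average on this slide amounts to the short sketch below. The 70/30 weights come from the slide. The function name `blend` and the assumption that both scores are already on a comparable scale are illustrative.

```python
def blend(citation_score, author_score, w_citation=0.7, w_author=0.3):
    """Weighted average of the two features (weights taken from the slide).

    ASSUMPTION: both inputs are already normalized to a comparable scale.
    """
    return w_citation * citation_score + w_author * author_score
```

Keeping the citation weight above the author weight reflects the axiom stated above: author scores are derived from citation scores, so they should not outweigh them.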

Page 31: Wsdm west wesley-smith

Randomize Zeroes

Page 32: Wsdm west wesley-smith

Random Chance?

• Our best isn’t much better than random
  – Random: 52.6%
  – 1st: 68.3% (+30%)

• This judging is favorable to random chance

• Unscored papers assigned [0, minval * 0.999]
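The last bullet, assigning unscored papers a value in [0, minval * 0.999], can be sketched as follows. The slide does not say which distribution was used, so drawing uniformly is an assumption. The function name `randomize_zeroes`, the fixed seed, and the requirement that all existing scores are positive are also assumptions.

```python
import random

def randomize_zeroes(scores, all_papers, seed=0):
    """Give every unscored paper a random score below the smallest real score.

    scores     : dict paper_id -> positive score
    all_papers : iterable of every paper_id that must receive a score
    ASSUMPTION: values are drawn uniformly from [0, minval * 0.999].
    """
    rng = random.Random(seed)             # seeded for reproducibility
    minval = min(scores.values())
    upper = minval * 0.999                # stay strictly below every real score
    return {
        paper: scores[paper] if paper in scores else rng.uniform(0.0, upper)
        for paper in all_papers
    }
```

Scored papers keep their values and always rank above the randomized ones, while ties among the unscored papers are broken at random.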

Page 33: Wsdm west wesley-smith

Phase I Results

Page 34: Wsdm west wesley-smith

Submissions

Submission             Coverage (%)   Unique (%)   Score (%)
ALEF                   40.76          4.2          69.3
ALEF + Author Scores   54.76          72.15        69.9
Final Submission       100            84.75        69.9

Page 35: Wsdm west wesley-smith

Submissions

Page 36: Wsdm west wesley-smith

ALEF Paper Scores

[Figure: Average Paper Score by Year (ALEF). Axes: Average Score vs. Year]

Page 37: Wsdm west wesley-smith

Final Paper Scores

[Figure: Average Paper Score by Year (Final). Axes: Average Score vs. Year]

Page 38: Wsdm west wesley-smith

Phase I – Evaluation Results

• Score: 0.699
• Rank: 15th

Page 39: Wsdm west wesley-smith

Phase I – Test Results

• Score: 0.699 -> 0.676 (-3.3%)
• Rank: 15th -> 2nd

Page 40: Wsdm west wesley-smith

Eigenfactor™ & Author Scores

Method                        Coverage (%)   Unique (%)   Score (%)
Eigenfactor™                  42.76          4.2          69.3
Eigenfactor™ & Author Scores  54.76          72.15        69.9

Page 41: Wsdm west wesley-smith
Page 42: Wsdm west wesley-smith
Page 43: Wsdm west wesley-smith
Page 44: Wsdm west wesley-smith

Logistics

• Phase II
  – Vertices: 49,870,036
  – Edges: 949,577,946

• Calculate Citation Scores: 34 minutes
• Build Paper-Author Matrix: ~2 hours
• Calculate Author Scores: 2 minutes
• Author Score Feature: 5 minutes
• Blending: 30 seconds

Page 45: Wsdm west wesley-smith

ALEF Summary

• Simple, fast variant of PageRank for article-level citation networks
• Ranks and maps
• More experiments and modifications
• Data cleaning issues
• Thanks to Microsoft Academic Graph and WSDM Cup Challenge

Page 46: Wsdm west wesley-smith

Acknowledgements

Carl Bergstrom, Department of Biology, University of Washington

Daril Vilhena, Department of Biology, University of Washington

Martin Rosvall, Department of Physics, Umeå University

Aditya Gandhi, Information School, University of Washington

Metaknowledge Network, Templeton Foundation

Page 47: Wsdm west wesley-smith

Resources

• Info, Data, Code - http://www.eigenfactor.org/
• Babel - http://babel.eigenfactor.org/
• J.D. West, M. Rosvall, C.T. Bergstrom (2016) Ranking and mapping article-level citation networks, in prep
• J.D. West, I. Wesley-Smith, C.T. Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data
• I. Wesley-Smith, C.T. Bergstrom, J.D. West (2016) Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF), WSDM Conference: Entity Ranking Challenge Workshop
• I. Wesley-Smith, J.D. West (2016) Babel: A platform for research in scholarly article recommendation. WWW Conference, Workshop on Big Scholarly Data
• Jevin West - http://www.jevinwest.org/
• Ian Wesley-Smith - http://iwsmith.in/