Wikipedia as controlled vocabulary

Chris Sizemore, Silver Oliver (BBC): Wikipedia as controlled vocabulary

description

The Essentials of Metadata and Taxonomy - Henry Stewart Event
The Next Wave: Using Wikipedia as a Controlled Vocabulary
  • Leveraging an online resource for internal use
  • Integrating pre-existing unique identification numbers (UIDs)
  • Inherited relations
  • Capturing and cataloging
  • Risks and remedies
Chris Sizemore and Silver Oliver, BBC Future Technology & Media

Transcript of Wikipedia as controlled vocabulary

  • 1. Chris Sizemore, Silver Oliver (BBC): Wikipedia as controlled vocabulary

2. "I'm about Victorians"
3. BBC Topic Page: "I'm about Victorians". Outside the BBC; BBC silo #1; BBC silo #3; BBC silo #2
4. BBC Topic Page: "I'm about Victorians"; viktorianisch; NY Times, flickr, wikipedia. Outside the BBC; BBC silo #1; BBC silo #3; BBC silo #2
5. An index language exists primarily to:
6.

  • An index language exists primarily to:
  • Allow an indexer to represent the subject matter of documents in a consistent way
  • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
  • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate

F.W. Lancaster, Vocabulary Control for Information Retrieval

10. Could Wikipedia be used as a universal language for identifying subjects?
11. Story of Wikipedia-as-CV
12. Story of Wikipedia-as-CV: personal origins
13.
14. Story of Wikipedia-as-CV: personal origins. We needed a system to categorise movie & TV reviews
15. Story of Wikipedia-as-CV: personal origins. So of course we built a categorisation system from scratch -- including its own controlled vocab
16. Story of Wikipedia-as-CV: personal origins. And when people saw the system, they always said: "Hey, that reminds me of Internet Movie Database"
17.
18. Story of Wikipedia-as-CV: personal origins. It struck me that the way Internet Movie Database is set up isn't dissimilar to the structure of a thesaurus or a very flat taxonomy
19. Story of Wikipedia-as-CV: personal origins. But it's one where the emphasis is on "related to", not broader/narrower, synonym, antonym, etc.
20. Story of Wikipedia-as-CV: personal origins. From then, I couldn't help but be drawn to websites where the structure is clearly:
21. Story of Wikipedia-as-CV: personal origins. From then, I couldn't help but be drawn to websites where the structure is clearly: a single primary Concept per page -- and pages for related Concepts link to each other
22. Story of Wikipedia-as-CV: personal origins. Could those "one Concept per page" webpages be used as terms, as in a controlled vocabulary?
23. Are some websites actually indexing languages in disguise?
24. conText -- a Wikipedia-as-CV auto-categoriser prototype
25.
26. conText -- a Wikipedia-as-CV auto-categoriser prototype: http://sells.welcomebackstage.com:5000/item/submit
27.
28. Demo of conText -- a Wikipedia-as-CV auto-categoriser prototype
29. Demo of conText -- a Wikipedia-as-CV auto-categoriser prototype: Take text from audience!
30. Wikipedia is already being used across the Web as a form of subject identification & disambiguation, in a grassroots way:
31. Wikipedia is already being used across the Web as a form of subject identification & disambiguation, in a grassroots way: in the form of hyperlinks embedded by authors in blog posts, news articles, music reviews, etc. Everywhere!
32. http://en.wikipedia.org/wiki/British http://en.wikipedia.org/wiki/Science_fiction http://en.wikipedia.org/wiki/BBC http://en.wikipedia.org/wiki/Time_travel http://en.wikipedia.org/wiki/Dr_who http://en.wikipedia.org/wiki/Tardis
33. These days, by convention, when you link to Wikipedia from your webpage, more than saying "go and have a look at this other page", you are more likely giving a definition to a concept referred to in your content
34. These days, by convention, when you link to Wikipedia from your webpage, more than saying "go and have a look at this other page", you are more likely giving a definition to a concept referred to in your content. Also used in this way for specific domains are Internet Movie Database (for films & TV programmes), MySpace (for bands), Amazon (for books), etc.
35. For general knowledge, though, Wikipedia is becoming the Web's de facto controlled vocabulary
36. http://en.wikipedia.org/wiki/Heerlen http://en.wikipedia.org/wiki/Beethoven http://en.wikipedia.org/wiki/Amsterdam http://en.wikipedia.org/wiki/Van_Gogh_Museum
37.

  • An index language exists primarily to:
  • Allow an indexer to represent the subject matter of documents in a consistent way
  • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
  • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate

F.W. Lancaster, Vocabulary Control for Information Retrieval

38. Wikipedia pages provide the best scope notes in the world
39. Wikipedia pages provide the best scope notes in the world. Wikipedia-as-CV benefits from being developed through a social process, maintained and kept current by the Wikipedia community
40. Wikipedia pages provide the best scope notes in the world. Wikipedia-as-CV benefits from being developed through a social process, maintained and kept current by the Wikipedia community. Each concept represents a consensus view and its meaning can be understood simply by reading the associated Wikipedia page
41. Wikipedia pages provide the best scope notes in the world. For each Concept, the document edit history, discussion around concept definition, & debate is important here
42.
43.

  • An index language exists primarily to:
  • Allow an indexer to represent the subject matter of documents in a consistent way
  • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
  • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate

F.W. Lancaster, Vocabulary Control for Information Retrieval

44. So, we can tag pretty accurately, semi-automatically, with globally unique subject identifiers using this approach. So what?
45. So, we can tag pretty accurately, semi-automatically, with globally unique subject identifiers using this approach. So what? Un-silo your content repository quickly and cheaply, by connecting it to the Web via Wikipedia
46.
47.
48.
49.
50. Now playing vs. the Web
51.
52.
53. Now playing vs. the Web. Why not bring in BBC Archive materials to this service via Wikipedia-as-CV tagging and a linked data bridge between Wikipedia & MusicBrainz?
54.
55.
56. By using Wikipedia-as-CV, you can get your repository onto this diagram quickly, for free
57.
58.

  • An index language exists primarily to:
  • Allow an indexer to represent the subject matter of documents in a consistent way
  • Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
  • Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate

F.W. Lancaster, Vocabulary Control for Information Retrieval

59. A Web-scale, globally accessible index language accidentally exists:
60.

  • A Web-scale, globally accessible index language accidentally exists:
  • It encourages multiple indexers across the Web to represent the subject matter of any content in a consistent way
  • It brings the vocabulary used by info seekers into coincidence with the vocabulary used by indexers -- the searchers ARE indexers, and vice versa
  • It provides means whereby a searcher can modulate a search and/or browse strategy to attain comprehensive or selective results as user needs dictate
  • It adds Web-scale navigation & cross-reference possibilities
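
The points above can be illustrated with a minimal Python sketch. It is not the conText prototype from the talk, and every name in it (`extract_concepts`, `build_index`, `search`, the sample document ids and HTML) is hypothetical. It assumes that the Wikipedia links an author has already embedded in a page can be read as that page's subject tags, then indexes pages from separate repositories under those shared identifiers. The `mode` argument stands in for modulating a search between comprehensive and selective results.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class WikipediaLinkExtractor(HTMLParser):
    """Collect the Wikipedia concepts an author linked to in a page."""

    def __init__(self):
        super().__init__()
        self.concepts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        url = urlparse(href)
        # Assumption: only English Wikipedia article links count as subject tags.
        if url.netloc == "en.wikipedia.org" and url.path.startswith("/wiki/"):
            title = unquote(url.path[len("/wiki/"):])
            if title:
                self.concepts.add(title)


def extract_concepts(html):
    """Return the set of Wikipedia page titles linked from an HTML fragment."""
    parser = WikipediaLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.concepts


def build_index(documents):
    """Map each concept to the ids of all documents tagged with it."""
    index = defaultdict(set)
    for doc_id, html in documents.items():
        for concept in extract_concepts(html):
            index[concept].add(doc_id)
    return index


def search(index, concepts, mode="any"):
    """mode="any" widens the search (comprehensive); mode="all" narrows it (selective)."""
    hits = [index.get(concept, set()) for concept in concepts]
    if not hits:
        return set()
    return set.union(*hits) if mode == "any" else set.intersection(*hits)


# Hypothetical documents from two internal silos and one outside blog.
documents = {
    "silo1/review-42": (
        '<p>A <a href="http://en.wikipedia.org/wiki/Science_fiction">sci-fi</a> '
        'classic about <a href="http://en.wikipedia.org/wiki/Time_travel">time travel</a>.</p>'
    ),
    "silo2/programme-7": (
        '<p>The <a href="http://en.wikipedia.org/wiki/BBC">BBC</a> and '
        '<a href="http://en.wikipedia.org/wiki/Time_travel">time travel</a>.</p>'
    ),
    "blog/post-3": '<p>See <a href="http://example.org/">this</a>.</p>',
}

index = build_index(documents)
print(sorted(search(index, ["Science_fiction", "Time_travel"], mode="any")))
print(sorted(search(index, ["Science_fiction", "Time_travel"], mode="all")))
```

The sketch treats the page title in the URL as the identifier and does not resolve redirects, so two links that Wikipedia itself would send to the same article may be indexed as different concepts. A real system would need a normalisation step for that.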

64. Wikipedia is a controlled vocabulary
65. Wikipedia is a controlled vocabulary
66. Chris Sizemore, Silver Oliver (BBC): Wikipedia is a controlled vocabulary
67. Chris Sizemore, Silver Oliver (BBC): Wikipedia is a controlled vocabulary. Much thanks! Questions, comments, & constructive criticism?
68. http://flickr.com/photos/deniscollette/1817034358/