Subject Headings make information to be
topic maps
2010-9-30
Motomu Naito
Center for Integrated Area Studies (CIAS)
Kyoto University
Ψ http://psi.ontopedia.net/Motomu_Naito
http://www.cias.kyoto-u.ac.jp/english/CIAS/
Outline
1. Background
2. Purpose
3. Subject Headings
3.1 NDLSH
3.2 LCSH
4. Practical use of Subject Headings
5. Demo
6. Challenges
7. Conclusion & Future work
1. Background: Area Study and Area Informatics
This work is part of the Area Informatics activities at the Center for Integrated Area Studies (CIAS), Kyoto University
Area Study is an Interdisciplinary Science
Understanding/comparing areas comprehensively
Diverse languages/subjects/disciplines/methodologies:
• history, literature, religions, politics, economics, ethnology, folklore, agriculture, environment, etc.
Area Informatics
Informatics paradigm in area studies
Focusing on quantitative analysis
• Objective, comparative and reproducible approaches
• Spatiotemporal attributes of events
Knowledge discovery supports
• Integration of disciplines
• Creation of hypotheses
Source: Shoichiro Hara, TMJP2010,
http://www.knowledge-synergy.com/events/documents/TMJP2010-hara.pdf
Model of Area Informatics
Source: Shoichiro Hara, TMJP2010
2. Purpose
- Building and maintaining well-organized knowledge is very hard
and time-consuming work
- Many well-organized bodies of knowledge already exist
(e.g. NDLSH, BSH, LCSH, JST thesaurus, etc.)
- Fortunately, some Subject Headings (SHs) are published on the web
and we can use them (e.g. NDLSH, LCSH)
Purpose of our activity:
To build a good system for linking and organizing information related to Area Studies
Purpose of today’s presentation:
To report on and discuss our efforts to make topic maps and PSIs (Published Subject Identifiers) from SHs
3. Subject Headings
What are Subject Headings?
Wikipedia redirects “Subject Headings” to “Index term” and defines the term as
“An index term, subject term, subject heading, or descriptor, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records.” (http://en.wikipedia.org/wiki/Index_term)
・We are currently working on the following SHs
- NDLSH, BSH and LCSH
・There are probably many more SHs in various countries
- German SH, Norwegian SH, Finnish SH, Thai SH, etc.
3.1 NDLSH
・ NDLSH: National Diet Library Subject Headings (Japan)
・We are making a topic map from the 2008 version of NDLSH
- Subject Headings: 17,953
- Subject Headings + Reference words: 47,816 (47,377)
- BT-NT relations: 13,220; RT relations: 9,738
- USE-UF relations with LCSH: 11,663
・Conversion from the SHs to a topic map
- Subject Headings -> Topics
- BT-NT, RT, USE-UF relations -> Associations
- USE-UF, SA relations, scope notes, readings, … -> Occurrences
・ Each SH has its own ID, which can be used as a PSI (e.g. 00574308)
・ If NDLSH shares PSIs with LCSH, it can be merged with LCSH
・ NDLSH has been published on the Web
and can be downloaded from http://id.ndl.go.jp/auth/ndlsh
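The conversion rules listed above can be sketched roughly as follows. This is only an illustration, not the converter used in this work: the `Topic` and `Association` classes, the record keys, and the PSI base `http://example.org/ndlsh/` are all assumptions made for the example. Reference words (UF) are kept as occurrences here, which is one possible reading of the rules.

```python
from dataclasses import dataclass, field

# Hypothetical PSI base, used only for this sketch.
PSI_BASE = "http://example.org/ndlsh/"


@dataclass
class Topic:
    psi: str    # subject identifier built from the heading's ID
    name: str   # the heading itself
    occurrences: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Association:
    type: str    # "broader-narrower" or "related"
    source: str  # PSI of the heading that declares the link
    target: str  # PSI of the linked heading


def convert(record):
    """Turn one already-parsed heading record into a topic and its associations."""
    psi = PSI_BASE + record["id"]
    topic = Topic(psi=psi, name=record["heading"])
    # Reference words (UF) and the reading are stored as occurrences.
    if record.get("uf"):
        topic.occurrences["used-for"] = list(record["uf"])
    if record.get("reading"):
        topic.occurrences["reading"] = record["reading"]
    associations = []
    for bt_id in record.get("bt", []):
        associations.append(Association("broader-narrower", psi, PSI_BASE + bt_id))
    for rt_id in record.get("rt", []):
        associations.append(Association("related", psi, PSI_BASE + rt_id))
    return topic, associations


# IDs taken from the "ビール" (Beer) record in the NDLSH sample data.
beer = {
    "id": "00560674", "heading": "ビール", "reading": "ビール",
    "uf": ["ビヤ", "麦酒", "Beer"], "bt": ["00574373"], "rt": ["00563417", "00560487"],
}
topic, associations = convert(beer)
print(topic.psi, len(associations))
```

Because each association refers to its target by PSI, two topic maps that use the same PSI for a subject can be merged on that identifier.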
Part of NDLSH
Subject Headings around “ビール: Beer”
Original data
ビール ビール〈地理区分〉 ID:00560674 UF:ビヤ ; 麦酒〔バクシュ〕 ; Beer BT:洋酒〔ヨウシュ〕{00574373} RT:ホップ{00563417} ; ※麦芽〔バクガ〕{00560487} NDC(9):588.54 NDLC:DL687;PA416
ビールス ビールス USE:ウイルス{00560678}
ビールスショウ ビールス症 USE:ウイルス感染症〔ウイルスカンセンショウ〕{00560679}
ビールゾク ※ビール族 ID:00575193 UF:Bhil (Indic people) NDC(9):382.25;469.925 NDLC:G131;SA51
ビールムギ ビール麦 USE:大麦〔オオムギ〕{00568818}
ビインコウ 鼻咽腔 ID:00560662 UF:上咽頭〔ジョウイントウ〕 ; Nasopharynx BT:咽頭〔イントウ〕{00564179} NDC(9):491.134;496.8 NDLC:SC661
ヒエ ヒエ ID:00563143 UF:稗〔ヒエ〕 BT:穀物〔コクモツ〕{00566375} ; イネ科〔イネカ〕{00564121} NDC(9):479.343;616.62 NDLC:DM221;RA347;RB134
ヒエ 稗 USE:ヒエ{00563143}
ヒエイリダンタイ 非営利団体 USE:NPO〈地理区分〉{00577640}
NDLSH is provided as TSV (tab-separated values) data
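A record like the ones above could be read with a few lines of Python. This is a minimal sketch under an assumed layout: the tabs are not visible on the slide, so the example supposes that the reading, the heading, and each "KEY:value" field are separated by tabs. The function names `parse_record` and `linked_ids` are made up for the illustration.

```python
import re

# Assumed layout: reading, heading, then "KEY:value" fields, all tab-separated.
SAMPLE = (
    "ビール\tビール〈地理区分〉\tID:00560674\tUF:ビヤ ; 麦酒〔バクシュ〕 ; Beer\t"
    "BT:洋酒〔ヨウシュ〕{00574373}\tRT:ホップ{00563417} ; ※麦芽〔バクガ〕{00560487}\t"
    "NDC(9):588.54\tNDLC:DL687;PA416"
)

# IDs of linked headings appear in braces, e.g. {00574373}.
LINKED_ID = re.compile(r"\{(\d+)\}")


def parse_record(line):
    """Split one line into its reading, heading, and keyed fields."""
    reading, heading, *fields = line.rstrip("\n").split("\t")
    record = {"reading": reading, "heading": heading}
    for item in fields:
        key, _, value = item.partition(":")  # split at the first colon only
        record[key] = value
    return record


def linked_ids(value):
    """Extract the heading IDs referenced in a BT, RT, or USE field."""
    return LINKED_ID.findall(value)


record = parse_record(SAMPLE)
print(record["ID"], linked_ids(record["BT"]), linked_ids(record["RT"]))
```

The IDs in braces are what make the BT and RT fields usable as association targets without any name matching.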
Conversion process
Conversion from original TSV data to topic maps
NDLSH Ontology
Ontology graph of NDLSH topic map
NDLSH topic map application
Screenshots of the application
3.2 LCSH
・ LCSH: Library of Congress Subject Headings (US)
・We are making a topic map from LCSH
- We downloaded it from “http://id.loc.gov/authorities/”
- Subject Headings: 380,123
- BT-NT: 254,651; RT: 11,137
・ RDF (SKOS) to Topic Maps conversion using Omnigator
- SH (core:Concept) -> Topics
- BT-NT, RT relations -> Associations
- scopeNote, created, modified, comment, etc. -> Occurrences
・ Each SH has its own URI identifier, which can be used as a PSI
(e.g. http://id.loc.gov/authorities/sh85000002#concept)
・ LCSH has already been published on the Web with
Linked Data in mind
Part of LCSH
Subject Headings around “Beer”
Original data
<rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
: :
<skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>
<skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>
<skos:closeMatch rdf:resource="http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11965887d"/>
<skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme"/>
<skos:inScheme rdf:resource="http://id.loc.gov/authorities#topicalTerms"/>
<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
<skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>
<skos:related rdf:resource="http://id.loc.gov/authorities/sh85016775#concept"/>
<skos:related rdf:resource="http://id.loc.gov/authorities/sh85031951#concept"/>
<skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
<owl:sameAs rdf:resource="info:lc/authorities/sh85012832"/>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">1989-03-22T15:09:28-04:00</dcterms:modified>
</rdf:Description>
LCSH is provided as RDF format data
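To show what the SKOS record carries, the following sketch reads the preferred label and the broader, narrower, and related links with Python's standard `xml.etree.ElementTree`. It is not the Omnigator conversion used in this work. The sample is an abridged copy of the "Beer" record wrapped in an `rdf:RDF` root, and `read_concepts` is a name chosen for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Abridged copy of the "Beer" record, wrapped in an rdf:RDF root element.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
    <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>
    <skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>
    <skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>
    <skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def links(desc, relation):
    """Collect the target URIs of one SKOS relation (e.g. 'broader')."""
    return [e.get(RDF_RESOURCE) for e in desc.findall("skos:" + relation, NS)]


def read_concepts(xml_text):
    """Return one dict per rdf:Description: its URI, name, and SKOS links."""
    concepts = []
    for desc in ET.fromstring(xml_text).findall("rdf:Description", NS):
        label = desc.find("skos:prefLabel", NS)
        concepts.append({
            "psi": desc.get(RDF_ABOUT),  # the concept URI can serve as the PSI
            "name": label.text if label is not None else None,
            "broader": links(desc, "broader"),
            "narrower": links(desc, "narrower"),
            "related": links(desc, "related"),
        })
    return concepts


concepts = read_concepts(SAMPLE)
print(concepts[0]["name"], concepts[0]["broader"])
```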
LCSH Ontology
Ontology graph of LCSH topic map
LCSH topic map application
Screenshots of the application
4. Practical use of Subject Headings
Many practical uses are possible
For example:
・ Organizing internal and external information according to SHs
・Multilingual mapping using LCSH as the core system
・Mutual complementing of our concept classification and SHs
・ A web service that provides SHs using TMRAP
・ Using SHs as PSIs
・ Using SHs as common test data for TM engines, TM query
engines, etc.
(1) Organizing information according to SHs
Example: Organizing Wikipedia according to SHs
・Available links to Wikipedia (NDLSH: 12,051, BSH: 6,086)
Subject Headings around “Beer”
Organizing Wikipedia
[Figure: The world around “Beer” in NDLSH. Nodes: Beer, Hop, Malt, Wines and Spirits, Liquor, Amenities of life, Wine, Whiskey, Fruit liquor, Brandy, Barley, Distilled liquor]
Organizing Wikipedia
We can easily generate the address of a Wikipedia article:
“http://ja.wikipedia.org/wiki/” + “ビール” (SH)
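The concatenation above can be written as a small function. This is a minimal sketch: the function name `wikipedia_url` is made up for the example, and percent-encoding the heading as UTF-8 is an added step that keeps the address valid when it is passed to tools that expect ASCII-only URLs.

```python
from urllib.parse import quote

WIKIPEDIA_JA = "http://ja.wikipedia.org/wiki/"


def wikipedia_url(heading):
    """Append the heading to the base address, percent-encoding it as UTF-8."""
    return WIKIPEDIA_JA + quote(heading)


print(wikipedia_url("ビール"))
```

Whether an article actually exists at the generated address still has to be checked; the link counts on the earlier slide suggest that only part of the headings have a matching article.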
(2) Multilingual mapping
If the SHs of each language are mapped to LCSH, multilingual mapping
will be achieved
[Figure: NDLSH or BSH (Japanese) and Norwegian SH (Norwegian) are each merged with LCSH (English)]
e.g. Japanese-Norwegian mapping via LCSH (English)
ビール - Beer - Øl
Multilingual mapping
Link from NDLSH to LCSH
(USE-UF relation between NDLSH and LCSH)
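The idea of mapping two languages through LCSH can be sketched with plain dictionaries. This is an illustration only: the two tables are hypothetical, the LCSH URI for “Beer” is the one from the sample record, and the assumption that a Norwegian SH entry “Øl” points at that same URI is made up for the example.

```python
# Hypothetical mapping tables: each national heading points at an LCSH identifier.
BEER = "http://id.loc.gov/authorities/sh85012832#concept"
NDLSH_TO_LCSH = {"ビール": BEER}
NORWEGIAN_TO_LCSH = {"Øl": BEER}  # assumed entry, for illustration only


def map_via_pivot(source_to_pivot, target_to_pivot):
    """Pair source and target headings that share the same pivot identifier."""
    pivot_to_target = {}
    for heading, pivot in target_to_pivot.items():
        pivot_to_target.setdefault(pivot, []).append(heading)
    return {
        heading: pivot_to_target.get(pivot, [])
        for heading, pivot in source_to_pivot.items()
    }


print(map_via_pivot(NDLSH_TO_LCSH, NORWEGIAN_TO_LCSH))
```

With this arrangement each national SH needs only one mapping, to LCSH, instead of one mapping per language pair.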
(3) Mutual complementing
- SHs sometimes lack the subjects or vocabulary we need, yet
gathering enough subjects from scratch by ourselves is very hard
- By merging our own subjects with SHs, we can obtain an enriched set of subjects
(4) Web service for providing Subject Headings
Subject Heading providing web service using TMRAP
[Figure: A client and an SH-providing web service. Each side consists of Ontopia (Navigator Framework, Query engine) and a Topic Maps web application (JSP pages); the client side holds its own topic map and the service side holds the SH topic map. The client requests an SH with a “Term or Subject”; the service returns the TM fragments related to the matching “Subject” topic. The client shows the SH-related information together with information from its own web application.]
5. Demo
I will give a short demo if time allows
6. Challenges
(1) Attaching subjects to, or extracting them from, information
In order to organize information, we need to
・attach subjects to information manually
- tagging systems are required
・extract subjects from information
- subject extraction systems are required
(2) Large data
・At the moment we cannot convert large RDF data to a topic map
because the conversion runs out of memory
We had to omit “skos:altLabel”, “owl:sameAs”, etc.
We need a scalable and stable environment for big files
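One common way to keep memory use down when reading a large RDF/XML file is to process it record by record instead of loading the whole document. The sketch below uses `iterparse` from Python's standard library; it is not the tool chain used in this work, it only covers the reading side (not the memory needed by the topic map engine itself), and `iter_concepts` is a name chosen for the example.

```python
import io
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"


def iter_concepts(source):
    """Yield (URI, preferred label) pairs, releasing each record after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == RDF + "Description":
            yield elem.get(RDF + "about"), elem.findtext(SKOS + "prefLabel")
            elem.clear()  # drop the record's children once it has been handled


# Tiny in-memory stand-in for a large RDF/XML dump.
SAMPLE = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
    <skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""

for uri, label in iter_concepts(io.BytesIO(SAMPLE)):
    print(uri, label)
```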
(3) Type or Instance?
・We are treating each Subject Heading as an instance topic
But Subject Headings are probably type topics
We want to make a topic map that treats them as type topics
7. Conclusion & Future work No.1
・ CIAS has already accumulated a huge amount of information that we want
to turn into topic maps
・Many well-organized bodies of knowledge, such as NDLSH, BSH and LCSH,
already exist
・We are making topic maps and web applications from them
・ Topic maps can naturally inherit Subject Headings and their relationships,
such as BT-NT, RT and USE-UF
・Following these relationships, information can be linked and
organized, in other words, turned into topic maps
・ By providing Subject Headings as topic maps and PSIs for use in
the context of Linked Topic Maps, they will become powerful
elements and will be used in many ways
7. Conclusion & Future work No.2
・ Make our own ontologies
・ Continue turning our information into topic maps
according to our ontologies and the SHs
・ Continue working toward multilingual mapping
using the SHs
・ Try to merge our domain subjects with the SHs
・ Try to find and implement good ways to link the SHs
with information resources
・ Try to implement the web service for providing the SHs
・ Others (many, many, many, …)
Thank you very much
Any suggestions?