Subject Headings make information to be topic maps

Subject Headings make information to be topic maps 2010-9-30 Motomu Naito Center for Integrated Area Studies (CIAS) Kyoto University [email protected] Ψ http://psi.ontopedia.net/Motomu_Naito http://www.cias.kyoto-u.ac.jp/english/CIAS/

description

This paper reports our efforts to make topic maps from Subject Headings (SHs) and discusses their practical use for organizing information and knowledge. SHs are often maintained by libraries and used in bibliographic records. They are well-organized thesauri, and fortunately some of them are published on the Web. We transformed these into topic maps. Usually each subject in a set of SHs has its own ID, which can play the role of a PSI. By keeping the relationships contained in the SHs, such as Broader-Narrower, Related, and USE-UF, in the topic maps, information and knowledge can be linked together and organized according to the structure of the SHs. In other words, SHs make it easy to turn information and knowledge into topic maps.

Transcript of Subject Headings make information to be topic maps

Page 1: Subject Headings make information to be topic maps

Subject Headings make information to be

topic maps

2010-9-30

Motomu Naito

Center for Integrated Area Studies (CIAS)

Kyoto University

[email protected]

Ψ http://psi.ontopedia.net/Motomu_Naito

http://www.cias.kyoto-u.ac.jp/english/CIAS/

Page 2: Subject Headings make information to be topic maps

Outline

1. Background

2. Purpose

3. Subject Headings

3.1 NDLSH

3.2 LCSH

4. Practical use of Subject Headings

5. Demo

6. Challenges

7. Conclusion & Future work

Page 3: Subject Headings make information to be topic maps

1. Background: Area Study and Area Informatics

This work is part of the Area Informatics activities at the Center for Integrated Area Studies (CIAS), Kyoto University

Area Study is an Interdisciplinary Science

Understanding/comparing areas comprehensively

Diverse languages/subjects/disciplines/methodologies:

• history, literature, religions, politics, economics, ethnology, folklore, agriculture, environment, etc.

Area Informatics

Informatics paradigm in area studies

Focusing on quantitative analysis

• Objective, comparative and reproducible approaches

• Spatiotemporal attributes of events

Knowledge discovery supports

• Integration of disciplines

• Creation of hypotheses

Source: Shoichiro Hara, TMJP2010, http://www.knowledge-synergy.com/events/documents/TMJP2010-hara.pdf

Page 4: Subject Headings make information to be topic maps

Model of Area Informatics

Source: Shoichiro Hara, TMJP2010

Page 5: Subject Headings make information to be topic maps

2. Purpose

- Building and maintaining well-organized knowledge is hard, time-consuming work

- Many well-organized bodies of knowledge already exist

(e.g. NDLSH, BSH, LCSH, JST thesaurus)

- Fortunately, some Subject Headings (SHs) are published on the Web

and we can use them (e.g. NDLSH, LCSH)

Purpose of our activity:

To build a good system for linking and organizing information related to Area Studies

Purpose of today’s presentation:

To report on and discuss our efforts to make topic maps and PSIs from SHs

Page 6: Subject Headings make information to be topic maps

3. Subject Headings

What are Subject Headings?

Wikipedia redirects “Subject Headings” to “Index term” and defines the term as

“An index term, subject term, subject heading, or descriptor, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records.” (http://en.wikipedia.org/wiki/Index_term)

・We are currently working on the following SHs

- NDLSH, BSH and LCSH

・Many more SHs can probably be found in various countries

- German SH, Norwegian SH, Finnish SH, Thai SH, etc.

Page 7: Subject Headings make information to be topic maps

3.1 NDLSH

・ NDLSH: National Diet Library Subject Headings (Japan)

・We are making a topic map from the 2008 version of NDLSH

- Subject Headings: 17,953

- Subject Headings + Reference words: 47,816 (47,377)

- BT-NT relations: 13,220; RT relations: 9,738

- USE-UF relations with LCSH: 11,663

・Conversion from the SHs to a topic map

- Subject Headings -> Topics

- BT-NT, RT, USE-UF relations -> Associations

- USE-UF, SA relations, scope note, reading, … -> Occurrences

・ Each SH has its own ID, which can be used as a PSI (e.g. 00574308)

・ If NDLSH shares PSIs with LCSH, it can be merged with LCSH

・ NDLSH is published on the Web

We can download it from http://id.ndl.go.jp/auth/ndlsh
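The conversion rules on this slide can be sketched as a small in-memory model. This is only an illustration under assumptions, not the converter used in this work: the `Topic` and `Association` classes, the record dictionary layout, and the `BASE` URI prefix used to build a PSI from an ID are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed URI prefix; the slides only state that the ID can serve as a PSI.
BASE = "http://id.ndl.go.jp/auth/ndlsh/"

@dataclass
class Topic:
    psi: str                                   # subject identifier built from the SH ID
    name: str                                  # the heading itself
    occurrences: dict = field(default_factory=dict)

@dataclass(frozen=True)
class Association:
    type: str                                  # "BT-NT" or "RT"
    roles: tuple                               # ((role name, PSI of the player), ...)

def convert(record):
    """Turn one subject-heading record (a plain dict) into a topic plus associations."""
    topic = Topic(psi=BASE + record["id"], name=record["heading"])
    if record.get("uf"):
        # "Used for" terms are kept as an occurrence in this sketch.
        topic.occurrences["UF"] = list(record["uf"])
    assocs = []
    for bt in record.get("bt", []):
        assocs.append(Association("BT-NT", (("narrower", topic.psi), ("broader", BASE + bt))))
    for rt in record.get("rt", []):
        assocs.append(Association("RT", (("related", topic.psi), ("related", BASE + rt))))
    return topic, assocs

# Values taken from the "ビール" (Beer) entry of NDLSH.
beer = {"id": "00560674", "heading": "ビール", "uf": ["ビヤ", "麦酒", "Beer"],
        "bt": ["00574373"], "rt": ["00563417", "00560487"]}
topic, assocs = convert(beer)
print(topic.psi, len(assocs))
```

Keeping the PSI as the only key for a topic is what makes later merging with another scheme possible: two topics with the same PSI describe the same subject.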

Page 8: Subject Headings make information to be topic maps

Some part of NDLSH

Subject Headings around “ビール: Beer”

Page 9: Subject Headings make information to be topic maps

Original data

ビール ビール〈地理区分〉 ID:00560674 UF:ビヤ ; 麦酒〔バクシュ〕 ; Beer BT:洋酒〔ヨウシュ〕{00574373} RT:ホップ{00563417} ; ※麦芽〔バクガ〕{00560487} NDC(9):588.54 NDLC:DL687;PA416

ビールス ビールス USE:ウイルス{00560678}

ビールスショウ ビールス症 USE:ウイルス感染症〔ウイルスカンセンショウ〕{00560679}

ビールゾク ※ビール族 ID:00575193 UF:Bhil (Indic people) NDC(9):382.25;469.925 NDLC:G131;SA51

ビールムギ ビール麦 USE:大麦〔オオムギ〕{00568818}

ビインコウ 鼻咽腔 ID:00560662 UF:上咽頭〔ジョウイントウ〕 ; Nasopharynx BT:咽頭〔イントウ〕{00564179} NDC(9):491.134;496.8 NDLC:SC661

ヒエ ヒエ ID:00563143 UF:稗〔ヒエ〕 BT:穀物〔コクモツ〕{00566375} ; イネ科〔イネカ〕{00564121} NDC(9):479.343;616.62 NDLC:DM221;RA347;RB134

ヒエ 稗 USE:ヒエ{00563143}

ヒエイリダンタイ 非営利団体 USE:NPO〈地理区分〉{00577640}

NDLSH is provided as TSV (tab-separated values) data
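As a rough illustration of how such records could be read, the sketch below parses one line into a dictionary. The field layout (reading, heading, then `KEY:value` fields separated by tabs) is inferred from the sample above and is an assumption, as are the function and variable names.

```python
import re

# IDs of referenced headings appear in braces, e.g. "洋酒〔ヨウシュ〕{00574373}".
ID_REF = re.compile(r"\{(\d+)\}")

def parse_record(line):
    """Parse one tab-separated NDLSH line into a dict (assumed field layout)."""
    reading, heading, *fields = line.rstrip("\n").split("\t")
    record = {"reading": reading, "heading": heading}
    for item in fields:
        key, _, value = item.partition(":")
        if key in ("BT", "RT", "USE"):
            # Keep only the IDs of the headings this field points to.
            record[key] = ID_REF.findall(value)
        elif key == "ID":
            record[key] = value.strip()
        else:
            # UF, NDC(9), NDLC, ...: a list of values separated by ";".
            record[key] = [part.strip() for part in value.split(";") if part.strip()]
    return record

line = "\t".join([
    "ビール", "ビール〈地理区分〉", "ID:00560674",
    "UF:ビヤ ; 麦酒〔バクシュ〕 ; Beer",
    "BT:洋酒〔ヨウシュ〕{00574373}",
    "RT:ホップ{00563417} ; ※麦芽〔バクガ〕{00560487}",
    "NDC(9):588.54", "NDLC:DL687;PA416",
])
record = parse_record(line)
print(record["ID"], record["BT"], record["RT"])
```

Lines that carry only a `USE` field (such as the "ビールス" entry) have no ID of their own; in this sketch they simply yield a record with a `USE` list pointing at the preferred heading.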

Page 10: Subject Headings make information to be topic maps

Conversion process

Conversion from original TSV data to topic maps

Page 11: Subject Headings make information to be topic maps

NDLSH Ontology

Ontology graph of NDLSH topic map

Page 12: Subject Headings make information to be topic maps

NDLSH topic map application

Screenshots of the application

Page 13: Subject Headings make information to be topic maps

3.2 LCSH

・ LCSH: Library of Congress Subject Headings (US)

・We are making a topic map from LCSH

- We downloaded it from “http://id.loc.gov/authorities/”

- Subject Headings: 380,123

- BT-NT: 254,651; RT: 11,137

・ RDF (SKOS) to Topic Maps using Omnigator

- SH (core:Concept) -> Topics

- BT-NT, RT relations -> Associations

- scopeNote, created, modified, comment, etc. -> Occurrences

・ Each SH has its own URI identifier, which can be used as a PSI

(e.g. http://id.loc.gov/authorities/sh85000002#concept)

・ LCSH is already published on the Web with Linked Data in mind

Page 14: Subject Headings make information to be topic maps

Some part of LCSH

Subject Headings around “Beer”

Page 15: Subject Headings make information to be topic maps

Original data

<rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">

: :

<skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>

<skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>

<skos:closeMatch rdf:resource="http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11965887d"/>

<skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme"/>

<skos:inScheme rdf:resource="http://id.loc.gov/authorities#topicalTerms"/>

<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>

<skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>

<skos:related rdf:resource="http://id.loc.gov/authorities/sh85016775#concept"/>

<skos:related rdf:resource="http://id.loc.gov/authorities/sh85031951#concept"/>

<skos:prefLabel xml:lang="en">Beer</skos:prefLabel>

<owl:sameAs rdf:resource="info:lc/authorities/sh85012832"/>

<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">1989-03-22T15:09:28-04:00</dcterms:modified>

</rdf:Description>

LCSH is provided as RDF format data
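To show how the SKOS properties in such a record line up with topics and associations, here is a minimal sketch using only Python's standard library. It is not the Omnigator conversion used in this work. The `rdf:RDF` wrapper and namespace declarations are added so that the abridged sample parses on its own, and the function name `concepts` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Abridged copy of the "Beer" record, wrapped so it is well-formed by itself.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
    <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>
    <skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>
    <skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>
    <skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""

def concepts(xml_text):
    """Yield (URI, preferred label, links) for each rdf:Description element."""
    root = ET.fromstring(xml_text)
    for desc in root.findall("rdf:Description", NS):
        label = desc.findtext("skos:prefLabel", default="", namespaces=NS)
        # broader/narrower would become BT-NT associations, related becomes RT.
        links = {
            rel: [e.get(RDF_RESOURCE) for e in desc.findall("skos:" + rel, NS)]
            for rel in ("broader", "narrower", "related")
        }
        yield desc.get(RDF_ABOUT), label, links

for uri, label, links in concepts(SAMPLE):
    print(uri, label, links["broader"])
```

The concept URI plays the PSI role directly, so no extra identifier has to be minted for LCSH topics.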

Page 16: Subject Headings make information to be topic maps

LCSH Ontology

Ontology graph of LCSH topic map

Page 17: Subject Headings make information to be topic maps

LCSH topic map application

Screenshots of the application

Page 18: Subject Headings make information to be topic maps

4. Practical use of Subject Headings

Many practical uses are possible

For example:

・ Organizing internal and external information according to SHs

・ Multi-language mapping using LCSH as a core system

・ Mutual complementing of our concept classification and SHs

・ A web service providing SHs using TMRAP

・ Using SHs as PSIs

・ Using SHs as common test data for TM engines, TM query engines, etc.

Page 19: Subject Headings make information to be topic maps

(1) Organizing information according to SHs

Example: Organizing Wikipedia according to SHs

・Available links to Wikipedia (NDLSH: 12051, BSH: 6086)

Subject Headings around “Beer”

Page 20: Subject Headings make information to be topic maps

Organizing Wikipedia

The world around “Beer” in NDLSH (figure; node labels: Beer, Hop, Malt, Wines and Spirits, Liquor, Amenities of life, Wine, Whiskey, Fruit liquor, Brandy, Barley, Distilled liquor)

Page 21: Subject Headings make information to be topic maps

Organizing Wikipedia

We can easily generate the address of the Wikipedia article

“http://ja.wikipedia.org/wiki/” + “ビール” (SH)
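The concatenation above can be written as a small helper. This is a minimal sketch: the function name is hypothetical, and percent-encoding the heading and replacing spaces with underscores are additions that follow how Wikipedia article URLs are normally formed, not something stated on the slide.

```python
from urllib.parse import quote

WIKIPEDIA_JA = "http://ja.wikipedia.org/wiki/"

def wikipedia_url(heading):
    """Build the Japanese Wikipedia address for a subject heading."""
    # Article titles use underscores for spaces; non-ASCII text is percent-encoded as UTF-8.
    return WIKIPEDIA_JA + quote(heading.replace(" ", "_"))

print(wikipedia_url("ビール"))
# -> http://ja.wikipedia.org/wiki/%E3%83%93%E3%83%BC%E3%83%AB
```

A generated address is only a candidate link: whether an article with that exact title exists still has to be checked, which is presumably why only part of the headings have available links.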

Page 22: Subject Headings make information to be topic maps

(2) Mapping between multiple languages

If the SHs of each language are mapped to LCSH, multi-language mapping will be achieved

Figure: NDLSH or BSH (Japanese) <- merge -> LCSH (English) <- merge -> Norwegian SH (Norwegian)

e.g. Japanese-Norwegian mapping via LCSH (English): ビール - Beer - Øl
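The pivot idea can be sketched with two plain lookup tables keyed by heading. The function name and table layout are hypothetical, and the Norwegian link to LCSH is made up for illustration; only the LCSH identifier `sh85012832` for “Beer” comes from the data shown earlier.

```python
def pivot_mapping(source_to_lcsh, target_to_lcsh):
    """Map source-language headings to target-language headings via shared LCSH IDs."""
    lcsh_to_target = {}
    for target, lcsh in target_to_lcsh.items():
        lcsh_to_target.setdefault(lcsh, []).append(target)
    # A source heading maps to every target heading that shares its LCSH ID.
    return {src: lcsh_to_target.get(lcsh, []) for src, lcsh in source_to_lcsh.items()}

japanese = {"ビール": "sh85012832"}   # NDLSH heading -> LCSH ID (USE-UF link)
norwegian = {"Øl": "sh85012832"}      # hypothetical Norwegian SH -> LCSH ID
mapping = pivot_mapping(japanese, norwegian)
print(mapping)
```

With LCSH as the hub, each new language needs only one mapping (to LCSH) rather than one mapping per language pair.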

Page 23: Subject Headings make information to be topic maps

Mapping between multiple languages

Link from NDLSH to LCSH

(USE-UF relation between NDLSH and LCSH)

Page 24: Subject Headings make information to be topic maps

(3) Mutual complementing

- SHs sometimes lack the subjects or vocabulary we need, yet gathering enough subjects from scratch by ourselves is very hard

- By merging our own subjects with SHs we can get an enriched set of subjects

Page 25: Subject Headings make information to be topic maps

(4) Web service for providing Subject Headings

Subject Heading providing web service using TMRAP

Figure (labels): client Topic Maps web application (JSP pages; Ontopia Navigator Framework and query engine; topic map); SH providing web service (JSP pages; Ontopia Navigator Framework and query engine; SH topic map); flow: “Term or Subject” -> Request SH -> Return SH related TM fragments -> “Subject” topic with SH related information, alongside information from the client’s web application

Page 26: Subject Headings make information to be topic maps

5. Demo

I will give a short demo if time allows

Page 27: Subject Headings make information to be topic maps

6. Challenges

(1) Attaching subjects to, or extracting subjects from, information

In order to organize information, we need to

・attach subjects to information by hand

- tagging systems are required

・extract subjects from information

- subject extraction systems are required

(2) Large data

・At the moment we cannot convert large RDF data to a topic map because we run out of memory

We had to omit “skos:altLabel”, “owl:sameAs”, etc.

We need a scalable and stable environment for big files
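One common way to keep memory use bounded, which is not what the slides describe and is offered only as a possible direction, is to read the RDF/XML as a stream and discard each record once it has been handled. The sketch below uses Python's standard `iterparse` and assumes that each concept is an `rdf:Description` element directly under the root; the function name and the second sample record are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"

def stream_concepts(source):
    """Yield (URI, prefLabel) per rdf:Description without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == RDF + "Description":
            yield elem.get(RDF + "about"), elem.findtext(SKOS + "prefLabel")
            root.clear()  # drop records that were already processed

SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
    <skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/concept/2">
    <skos:prefLabel xml:lang="en">Example</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""

results = list(stream_concepts(io.BytesIO(SAMPLE)))
print(results)
```

Streaming only solves the reading side. The topic map being built still has to fit somewhere, so a database-backed store would likely be needed for the full LCSH data set.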

(3) Type or instance?

・We are treating each Subject Heading as an instance topic

But Subject Headings are probably type topics

We want to make a topic map that treats them as type topics

Page 28: Subject Headings make information to be topic maps

7. Conclusion & Future work No.1

・ CIAS has already stored a huge amount of information that we want to turn into topic maps

・ Many well-organized bodies of knowledge, such as NDLSH, BSH and LCSH, already exist

・ We are making topic maps and web applications from them

・ Topic maps can naturally inherit Subject Headings and their relationships, such as BT-NT, RT and USE-UF

・ Following these relationships, information can be linked and organized; in other words, it can become topic maps

・ By providing Subject Headings as topic maps and PSIs for use in the context of Linked Topic Maps, they will become powerful elements and will be used in many ways

Page 29: Subject Headings make information to be topic maps

7. Conclusion & Future work No.2

・ Make our own ontologies

・ Continue turning our information into topic maps according to our ontologies and the SHs

・ Continue working toward multi-language mapping using the SHs

・ Try to merge our domain subjects with the SHs

・ Try to find and implement good ways to link the SHs with information resources

・ Try to implement the web service for providing the SHs

・ Others (many, many, many, …)

Page 30: Subject Headings make information to be topic maps

Thank you very much (ありがとうございました / Danke schön)

Any suggestions?