Subject Headings make information to be topic maps

Post on 13-Dec-2014


description

This paper reports on our efforts to make topic maps from Subject Headings (SHs) and discusses their practical use for organizing information and knowledge. SHs are often maintained by libraries and used in bibliographic records. They are thesauri and are well organized. Fortunately, some SHs are published on the Web, and we transformed them into topic maps. Each subject in an SH usually has its own ID, which can play the role of a PSI (Published Subject Identifier). By keeping the relationships included in SHs, such as Broader-Narrower, Related and USE-UF, in the topic maps, information and knowledge can be linked together and organized according to the structure of the SHs. In other words, SHs make it easy to turn information and knowledge into topic maps.

Transcript of Subject Headings make information to be topic maps

Subject Headings make information to be topic maps

2010-9-30

Motomu Naito

Center for Integrated Area Studies (CIAS)

Kyoto University

motom@green.ocn.ne.jp

Ψ http://psi.ontopedia.net/Motomu_Naito

http://www.cias.kyoto-u.ac.jp/english/CIAS/

Outline

1. Background

2. Purpose

3. Subject Headings

3.1 NDLSH

3.2 LCSH

4. Practical use of Subject Headings

5. Demo

6. Challenges

7. Conclusion & Future work

1. Background: Area Study and Area Informatics

This work is part of the Area Informatics activities at the Center for Integrated Area Studies (CIAS), Kyoto University

Area Study is an Interdisciplinary Science

Understanding/comparing areas comprehensively

Diverse languages/subjects/disciplines/methodologies:

• history, literature, religions, politics, economics, ethnology, folklore, agriculture, environment, etc.

Area Informatics

Informatics paradigm in area studies

Focusing on quantitative analysis

• Objective, comparative and reproducible approaches

• Spatiotemporal attributes of events

Knowledge discovery supports

• Integration of disciplines

• Creation of hypotheses

Source: Shoichiro Hara, TMJP2010, http://www.knowledge-synergy.com/events/documents/TMJP2010-hara.pdf

Model of Area Informatics

Source: Shoichiro Hara, TMJP2010

2. Purpose

- Building and maintaining well-organized knowledge is very hard and time-consuming work

- Many well-organized knowledge resources already exist (e.g. NDLSH, BSH, LCSH, JST thesaurus)

- Fortunately, some Subject Headings (SHs) are published on the Web and we can use them (e.g. NDLSH, LCSH)

Purpose of our activity:

To build a good system for linking and organizing information related to Area Studies

Purpose of today’s presentation:

To report on and discuss our efforts to make topic maps and PSIs from SHs

3. Subject Headings

What are Subject Headings?

Wikipedia redirects “Subject Headings” to “Index term” and defines the term as

“An index term, subject term, subject heading, or descriptor, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records.” (http://en.wikipedia.org/wiki/Index_term)

・We are currently working on the following SHs

- NDLSH, BSH and LCSH

・There are probably many more SHs in various countries

- German SH, Norwegian SH, Finnish SH, Thai SH, etc.

3.1 NDLSH

・NDLSH: National Diet Library Subject Headings (Japan)

・We are making a topic map from the 2008 version of NDLSH

- Subject Headings: 17,953

- Subject Headings + Reference words: 47,816 (47,377)

- BT-NT relations: 13,220; RT relations: 9,738

- USE-UF relations with LCSH: 11,663

・Conversion from the SHs to a topic map

- Subject Headings -> Topics

- BT-NT, RT, USE-UF relations -> Associations

- USE-UF, SA relations, scope note, reading, … -> Occurrences

・Each SH has its own ID, which can be used as a PSI (e.g. 00574308)

・If NDLSH shares PSIs with LCSH, it can be merged with LCSH

・NDLSH has been published on the Web

We can download it from http://id.ndl.go.jp/auth/ndlsh
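The conversion rules above can be sketched roughly as follows. This is a minimal Python illustration, not the converter actually used for the project. The `Topic` and `Association` classes, the dict layout of a record, and the idea of forming a PSI by appending the ID to http://id.ndl.go.jp/auth/ndlsh/ are all assumptions made for the example; the sample values for “ビール” come from the NDLSH excerpt in this deck.

```python
from dataclasses import dataclass, field

# Assumption: a PSI is formed as base URI + SH ID.
# The slides only state that the ID "can be used as PSI".
NDLSH_BASE = "http://id.ndl.go.jp/auth/ndlsh/"


@dataclass
class Topic:
    psi: str
    name: str
    occurrences: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Association:
    kind: str    # "BT-NT" or "RT"
    first: str   # PSI (the narrower term for BT-NT)
    second: str  # PSI (the broader term for BT-NT)


def to_topic_map(records):
    """Turn SH records (hypothetical dict layout) into topics and associations."""
    topics, associations = {}, set()
    for rec in records:
        psi = NDLSH_BASE + rec["id"]
        # Subject Heading -> Topic; UF terms are kept as occurrences
        topics[psi] = Topic(psi, rec["heading"], {"UF": list(rec.get("UF", []))})
        for broader_id in rec.get("BT", []):
            associations.add(Association("BT-NT", psi, NDLSH_BASE + broader_id))
        for related_id in rec.get("RT", []):
            # RT is symmetric, so store the pair in sorted order to avoid duplicates
            a, b = sorted((psi, NDLSH_BASE + related_id))
            associations.add(Association("RT", a, b))
    return topics, associations


beer = {"id": "00560674", "heading": "ビール", "UF": ["ビヤ", "Beer"],
        "BT": ["00574373"], "RT": ["00563417", "00560487"]}
topics, associations = to_topic_map([beer])
```

Keying topics by PSI means that a second source using the same identifiers would land on the same topics, which is the property the merging with LCSH relies on.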

Part of NDLSH

Subject Headings around “ビール: Beer”

Original data

ビール ビール〈地理区分〉 ID:00560674 UF:ビヤ ; 麦酒〔バクシュ〕 ; Beer BT:洋酒〔ヨウシュ〕{00574373} RT:ホップ{00563417} ; ※麦芽〔バクガ〕{00560487} NDC(9):588.54 NDLC:DL687;PA416

ビールス ビールス USE:ウイルス{00560678}

ビールスショウ ビールス症 USE:ウイルス感染症〔ウイルスカンセンショウ〕{00560679}

ビールゾク ※ビール族 ID:00575193 UF:Bhil (Indic people) NDC(9):382.25;469.925 NDLC:G131;SA51

ビールムギ ビール麦 USE:大麦〔オオムギ〕{00568818}

ビインコウ 鼻咽腔 ID:00560662 UF:上咽頭〔ジョウイントウ〕 ; Nasopharynx BT:咽頭〔イントウ〕{00564179} NDC(9):491.134;496.8 NDLC:SC661

ヒエ ヒエ ID:00563143 UF:稗〔ヒエ〕 BT:穀物〔コクモツ〕{00566375} ; イネ科〔イネカ〕{00564121} NDC(9):479.343;616.62 NDLC:DM221;RA347;RB134

ヒエ 稗 USE:ヒエ{00563143}

ヒエイリダンタイ 非営利団体 USE:NPO〈地理区分〉{00577640}

NDLSH is provided as TSV (tab-separated values) data
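To make the record structure concrete, here is a small Python sketch that parses one NDLSH line. The exact column layout is not visible in the flattened excerpt, so the sketch assumes that each line holds the reading, the heading, and then one tab-separated field per prefix (`ID:`, `UF:`, `BT:`, `RT:`, `USE:`, ...). The function names `parse_refs` and `parse_record` are made up for this example.

```python
import re

# One reference such as "洋酒〔ヨウシュ〕{00574373}": label, optional reading, ID.
# A leading "※" marker, if present, is dropped.
REF = re.compile(r"※?(?P<label>[^〔{]+?)(?:〔(?P<reading>[^〕]*)〕)?\{(?P<id>\d+)\}")


def parse_refs(value):
    """Split a BT/RT/USE value on ';' and return (label, id) pairs."""
    refs = []
    for part in value.split(";"):
        match = REF.fullmatch(part.strip())
        if match:
            refs.append((match["label"], match["id"]))
    return refs


def parse_record(line):
    """Parse one line, assuming tab-separated fields (see the note above)."""
    reading, heading, *fields = line.rstrip("\n").split("\t")
    record = {"reading": reading, "heading": heading}
    for item in fields:
        key, _, value = item.partition(":")
        if key in ("BT", "RT", "USE"):
            record[key] = parse_refs(value)
        elif key == "UF":
            record[key] = [term.strip() for term in value.split(";")]
        else:
            record[key] = value.strip()
    return record


line = "\t".join([
    "ビール", "ビール〈地理区分〉", "ID:00560674",
    "UF:ビヤ ; 麦酒〔バクシュ〕 ; Beer",
    "BT:洋酒〔ヨウシュ〕{00574373}",
    "RT:ホップ{00563417} ; ※麦芽〔バクガ〕{00560487}",
    "NDC(9):588.54", "NDLC:DL687;PA416",
])
record = parse_record(line)
```

The IDs in braces are what later become the targets of BT-NT and RT associations; USE lines without an `ID:` field are reference words that only point at a real heading.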

Conversion process

Conversion from original TSV data to topic maps

NDLSH Ontology

Ontology graph of NDLSH topic map

NDLSH topic map application

Screen shots of the application

3.2 LCSH

・LCSH: Library of Congress Subject Headings (US)

・We are making a topic map from LCSH

- We downloaded it from “http://id.loc.gov/authorities/”

- Subject Headings: 380,123

- BT-NT: 254,651; RT: 11,137

・RDF (SKOS) to Topic Maps using Omnigator

- SH (core:Concept) -> Topics

- BT-NT, RT relations -> Associations

- scopeNote, created, modified, comment, etc. -> Occurrences

・Each SH has its own URI identifier, which can be used as a PSI

(e.g. http://id.loc.gov/authorities/sh85000002#concept)

・LCSH has already been published on the Web with Linked Data in mind

Part of LCSH

Subject Headings around “Beer”

Original data

<rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">

: :

<skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>

<skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>

<skos:closeMatch rdf:resource="http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11965887d"/>

<skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme"/>

<skos:inScheme rdf:resource="http://id.loc.gov/authorities#topicalTerms"/>

<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>

<skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>

<skos:related rdf:resource="http://id.loc.gov/authorities/sh85016775#concept"/>

<skos:related rdf:resource="http://id.loc.gov/authorities/sh85031951#concept"/>

<skos:prefLabel xml:lang="en">Beer</skos:prefLabel>

<owl:sameAs rdf:resource="info:lc/authorities/sh85012832"/>

<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">1989-03-22T15:09:28-04:00</dcterms:modified>

</rdf:Description>

LCSH is provided as RDF data
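As an illustration of what such a record contains, the following Python sketch reads the SKOS relations of one concept with the standard library. It is not the Omnigator conversion used in this work. The `SAMPLE` document is a shortened copy of the “Beer” record with an `rdf:RDF` wrapper and namespace declarations added as assumptions, and `read_concepts` is a name invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Shortened copy of the "Beer" record; the wrapper element is an assumption.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
    <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>
    <skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>
    <skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>
    <skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Return {concept URI: {"label": ..., "broader": [...], ...}}."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for desc in root.findall("rdf:Description", NS):
        entry = {"label": desc.findtext("skos:prefLabel", default="", namespaces=NS)}
        for relation in ("broader", "narrower", "related"):
            entry[relation] = [
                elem.get(RDF_RESOURCE)
                for elem in desc.findall("skos:" + relation, NS)
            ]
        concepts[desc.get(RDF_ABOUT)] = entry
    return concepts


concepts = read_concepts(SAMPLE)
```

The concept URI plays the PSI role, `skos:broader` / `skos:narrower` / `skos:related` correspond to the BT-NT and RT associations, and the label becomes the topic name.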

LCSH Ontology

Ontology graph of LCSH topic map

LCSH topic map application

Screen shots of the application

4. Practical use of Subject Headings

Many practical uses are possible

For example:

・Organizing internal and external information according to SHs

・Multi-language mapping using LCSH as a core system

・Mutual complementing of our concept classification and SHs

・A web service that provides SHs using TMRAP

・Using SHs as PSIs

・Using SHs as common test data for TM engines, TM query engines, etc.

(1) Organizing information according to SHs

Example: Organizing Wikipedia according to SHs

・Available links to Wikipedia (NDLSH: 12,051; BSH: 6,086)

Subject Headings around “Beer”

Organizing Wikipedia

Figure: the world around “Beer” in NDLSH. Headings shown: Beer, Hop, Malt, Wines and Spirits, Liquor, Amenities of life, Wine, Whiskey, Fruit liquor, Brandy, Barley, Distilled liquor

Organizing Wikipedia

The Wikipedia address is easy to generate from the SH:

“http://ja.wikipedia.org/wiki/” + “ビール” (SH)
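In code, the concatenation also needs percent-encoding so that the result is a valid URL. A minimal Python sketch (the function name `wikipedia_url` is ours; replacing spaces with underscores follows Wikipedia's usual title convention):

```python
from urllib.parse import quote

WIKIPEDIA_JA = "http://ja.wikipedia.org/wiki/"


def wikipedia_url(heading, base=WIKIPEDIA_JA):
    """Build a Wikipedia address from a subject heading."""
    # Wikipedia titles use "_" for spaces; quote() percent-encodes the UTF-8 bytes
    return base + quote(heading.replace(" ", "_"))


url = wikipedia_url("ビール")
# url == "http://ja.wikipedia.org/wiki/%E3%83%93%E3%83%BC%E3%83%AB"
```

A generated address is only a candidate: whether an article actually exists still has to be checked, which is why only part of the headings have available links.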

(2) Mapping between multiple languages

If the SHs of each language are mapped to LCSH, multi-language mapping will be achieved

NDLSH or BSH (Japanese) <- merge -> LCSH (English) <- merge -> Norwegian SH (Norwegian)

e.g. Japanese-Norwegian mapping via LCSH (English)

ビール -> Beer -> Øl

Mapping between multiple languages

Link from NDLSH to LCSH

(USE-UF relation between NDLSH and LCSH)
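The pivot idea can be sketched in a few lines of Python. The LCSH URI for “Beer” is the one from the RDF excerpt in this deck; the two lookup tables and the function `map_via_pivot` are illustrative assumptions, and the Norwegian entry simply mirrors the example on the slide.

```python
LCSH_BEER = "http://id.loc.gov/authorities/sh85012832#concept"

# Each language-specific SH is linked to LCSH (illustrative entries only)
ja_to_lcsh = {"ビール": LCSH_BEER}
no_to_lcsh = {"Øl": LCSH_BEER}


def map_via_pivot(term, source_to_pivot, target_to_pivot):
    """Return target-language terms that share the source term's LCSH concept."""
    pivot = source_to_pivot.get(term)
    if pivot is None:
        return []
    return sorted(t for t, p in target_to_pivot.items() if p == pivot)


norwegian = map_via_pivot("ビール", ja_to_lcsh, no_to_lcsh)
```

With LCSH as the hub, each new language needs only one mapping (to LCSH) rather than one per language pair.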

(3) Mutual complementing

- SHs sometimes lack the subjects or vocabulary we need, yet gathering enough subjects from scratch by ourselves is very hard

- By merging our own subjects with SHs we can get an enriched set of subjects
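As a rough illustration of such a merge, the Python sketch below combines two topic collections keyed by PSI, so that topics with the same identifier are united and the rest are simply added. The dict layout, the PSI formed from the NDLSH ID, the example.org addresses and the function `merge_topics` are all hypothetical.

```python
def merge_topics(*topic_maps):
    """Merge {psi: {"names": ..., "occurrences": ...}} collections by PSI."""
    merged = {}
    for topic_map in topic_maps:
        for psi, topic in topic_map.items():
            target = merged.setdefault(psi, {"names": set(), "occurrences": set()})
            target["names"].update(topic.get("names", ()))
            target["occurrences"].update(topic.get("occurrences", ()))
    return merged


BEER = "http://id.ndl.go.jp/auth/ndlsh/00560674"  # assumed PSI form

sh_topics = {BEER: {"names": {"ビール", "Beer"}}}
own_topics = {
    BEER: {"names": {"麦酒"}, "occurrences": {"http://example.org/notes/beer"}},
    "http://example.org/subject/local-brewing": {"names": {"Local brewing"}},
}
merged = merge_topics(sh_topics, own_topics)
```

The shared topic ends up with the names from both sides, while the subject that exists only in our own collection is kept as a new topic.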

Figure: a Topic Maps web application (JSP pages) running on Ontopia (Navigator Framework, query engine), backed by our topic map and the SH topic map

(4) Web service for providing Subject Headings

Subject Heading providing web service using TMRAP

Figure: a client and the SH providing Web service, each a Topic Maps web application (JSP pages) on Ontopia (Navigator Framework, query engine). Labels shown: information from the client’s Web application; “Term or Subject”; “Subject” topic; Request SH; Return SH related TM fragments; SH related information

5. Demo

I will give a short demo if time allows

6. Challenges

(1) Attaching subjects to, or extracting them from, information

In order to organize information, we need to

・attach subjects to information by hand

- tagging systems are required

・extract subjects from information

- subject extraction systems are required

(2) Large data

・At the moment we can’t convert large RDF data to a topic map because we run out of memory

We had to omit “skos:altLabel”, “owl:sameAs”, etc.

We need a scalable and stable environment for big files
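One general technique for reading large XML files without loading them whole is streaming (incremental) parsing. The Python sketch below shows the idea with the standard library; it is offered only as a possible direction, not as something tried in this work, and it covers reading the RDF/XML, not building the topic map itself. The second record in `SAMPLE` uses a made-up example.org URI.

```python
import io
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"

SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
    <skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/concept/2">
    <skos:prefLabel xml:lang="en">Example</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def iter_descriptions(source):
    """Yield (URI, prefLabel) one record at a time from a file-like object."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == RDF + "Description":
            yield elem.get(RDF + "about"), elem.findtext(SKOS + "prefLabel")
            root.clear()  # drop records already processed to keep memory flat


labels = list(iter_descriptions(io.BytesIO(SAMPLE)))
```

Because each record is discarded after it is yielded, memory use depends on the size of one record rather than on the size of the whole file.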

(3) Type or Instance?

・We are treating each Subject Heading as an instance topic

But Subject Headings are probably type topics

We want to make a topic map that treats them as type topics

7. Conclusion & Future work No.1

・CIAS has already stored a huge amount of information that we want to turn into topic maps

・Many well-organized knowledge resources such as NDLSH, BSH and LCSH already exist

・We are making topic maps and web applications from them

・Topic maps can naturally inherit Subject Headings and their relationships, such as BT-NT, RT and USE-UF

・Following these relationships, information can be linked and organized; in other words, it becomes topic maps

・By providing Subject Headings as topic maps and PSIs for use in the context of Linked Topic Maps, they will become powerful elements and will be used in many ways

7. Conclusion & Future work No.2

・Make our own ontologies

・Continue turning our information into topic maps according to our ontologies and the SHs

・Continue working toward multi-language mapping using the SHs

・Try to merge our domain subjects with the SHs

・Try to find and implement good ways to link the SHs with information resources

・Try to implement the web service for providing the SHs

・Others (many, many, many, …)

Thank you very much (ありがとうございました / Danke schön)

Any suggestions?