University of the Aegean AI LAB: Ontology Learning
University of the Aegean AI LAB, www.icsd.aegean.gr
Ontology Learning
Artificial Intelligence and Decision Support Laboratory (AI Lab)
Department of Information and Communication Systems Eng.
University of the Aegean, 83200 Karlovassi, Samos, Greece
K. Kotis - QueryOnto

Structure
- Introduction
- Ontology Learning approaches
- Learning from Social Data
- Evaluation
- Future trends
The research problem
- Lack of user involvement in semantic content creation tasks
  - Only a small number of Web users (only "SW people") annotate, or may annotate, their Web resources semantically or build and publish ontologies
- "Tackle the incentive bottleneck in semantic content creation"
  - Ontologies
  - RDF data
Aim of the SW research community
- To encourage large-scale user participation, the SW community must identify/propose incentive structures (spurs):
  - means to motivate humans to become part of the Semantic Web movement
  - ...to contribute their knowledge and time to create useful ontologies, and to use these in annotating documents, images, videos or even Web services
Possible solutions/proposals
- Apply successful Web 2.0 stories to the SW
- Borrow incentives from Intranet applications and apply them to the SW
- Exploit collective intelligence and the "Wisdom of Crowds" (community-driven SW applications)
- Make the creation of semantic content FUN (games?)
- Automate the creation of semantic content "to some degree"
Automated ontology learning
- Focus on "useful" ontologies
  - Ontologies that reflect users' search intentions
- Automate the creation of "kick-off" ontologies
  - To help users participate in the ontology engineering life cycle more easily
- Automate the creation of "fully fledged" ontologies
  - To assist semantic querying of SWDs
A modern approach
- Ontology learning from social data
- Query logs: queries reflect users' search intentions, thus:
  - Learned ontologies will be more suitable for SW searching
  - Learned ontologies will reflect domain knowledge related to a specific problem/application (assisting software agents)
- Mining techniques to cluster queries into domains
- NLP and ontology matching techniques
  - To reformulate the NL query and build the ontology (query-ontology)
Other O.L approaches
- From text/corpora
  - Texts are noisy and hard to process
  - Learned ontologies are too broad, whereas queries are usually expressed in an extremely synoptic manner (a short sequence of keyword terms), focused on specific views of the domain of discourse
- From Web 2.0 data (tags in folksonomies)
  - No easy way to identify structure (ongoing research)
  - Cannot identify POS (with high precision)
  - Focus on the creation of lightweight ontologies (mostly taxonomies)
- Both may be suitable for a "kick-off" ontology, BUT are they suitable for a "useful" and "fully-fledged" one?
Other applications of query log mining
- Detecting influenza epidemics using search engine query data
  - Monitor health-seeking behaviour in the form of queries to online search engines
  - Analyse large numbers of Google search queries
  - The relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms
- Add formal semantics to solve such problems
O.L in O.E life cycle
[Figure: ontology engineering life cycle. Labels: Requirements Specification, Ontology Learning, Knowledge acquisition, Conceptualization, Develop & Maintain, Exploitation, Use, Evaluate]
O.L in HCOME
Subtasks in ontology learning
- Extract the relevant domain terminology and synonyms from a text collection
- Discover concepts, which can be regarded as abstractions of human thought
- Derive a concept hierarchy organizing these concepts
- Extend an existing concept hierarchy with new concepts
- Learn non-taxonomic relations between concepts
- Populate the ontology with instances of relations and concepts
- Discover other axiomatic relationships or rules involving concepts and relations
Extracting the Relevant Terminology
- Assumption: some terms unambiguously refer to a domain-specific concept
- Extract the relevant domain terminology from a text collection by:
  - counting the raw frequency of terms,
  - applying information retrieval methods such as TF-IDF (see [Baeza-Yates & Ribeiro-Neto 1999]), OR
  - applying more sophisticated methods (see [Frantzi & Ananiadou 1999])
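As a rough illustration of the TF-IDF option, the sketch below weights each term by its raw count in a document times log(N / df). The tokenization (lowercasing and whitespace splitting), the function name `tf_idf` and the three sample documents are assumptions made for this example, not part of the original slides.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf is the raw count of the term in the document; idf is log(N / df),
    where df is the number of documents that contain the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical three-document collection.
docs = [
    "ontology learning from text",
    "ontology matching and ontology alignment",
    "query log mining",
]
scores = tf_idf(docs)
# Terms of the first document, highest weight first.
ranked = sorted(scores[0], key=scores[0].get, reverse=True)
print(ranked)
```

Terms that occur in many documents (here "ontology") receive a lower weight than terms concentrated in one document, which is why such scores can serve as a first filter for domain terminology.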
Discovery of Synonyms
- Apply clustering techniques to group similar words together, OR
- Use some association measure to detect pairs of statistically correlated terms ([Manning & Schütze 1999])
- Detecting synonyms helps to cluster terms into groups sharing (almost) the same meaning, thus representing ontological classes
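One possible association measure is pointwise mutual information (PMI). The slides do not say which measure is meant, so PMI is an assumption here, as are the function name `pmi_scores` and the sample contexts. The sketch estimates probabilities as the fraction of contexts that contain a term or a pair of terms.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(contexts):
    """PMI(x, y) = log2(P(x, y) / (P(x) * P(y))) for co-occurring term pairs."""
    n = len(contexts)
    term_counts = Counter()
    pair_counts = Counter()
    for context in contexts:
        terms = sorted(set(context))  # sorted so each pair has one canonical key
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return {
        (x, y): math.log2((count / n) / ((term_counts[x] / n) * (term_counts[y] / n)))
        for (x, y), count in pair_counts.items()
    }


# Hypothetical contexts (e.g. tokenized queries).
contexts = [
    ["cheap", "car", "rental"],
    ["cheap", "automobile", "rental"],
    ["car", "insurance"],
    ["automobile", "insurance"],
]
pmi = pmi_scores(contexts)
print(pmi[("cheap", "rental")], pmi[("car", "rental")])
```

A positive score means the two terms co-occur more often than independence would predict; a score near zero means no detectable association.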
Learning Concepts and Concept Hierarchies
- Intensionally: by a descriptive label or by relationships to other classes
- Extensionally: by specifying a set of instances belonging to the class
- Unsupervised hierarchical clustering techniques known from machine learning research
  - very noisy, as they depend heavily on the frequency and behavior of the terms in the text collection under consideration
  - learn concepts at the same time, since they also group terms into meaning-bearing units
  - these units can be regarded as abstractions over words and thus, to some extent, as concepts
Learning Concepts and Concept Hierarchies (2)
- Supervised hierarchical clustering
  - directly involving the user to validate or reject certain clusters
  - including external information to guide the clustering process
- Hearst Patterns (Marti Hearst)
  - certain patterns in text reliably indicate a relation of interest between terms
  - E.g. "X such as Y" indicates that Y is a subclass of X
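The "X such as Y" pattern can be sketched with a regular expression. This is a deliberately naive illustration: terms are limited to single words, no POS tagging or noun-phrase chunking is done, and the names `SUCH_AS` and `extract_subclass_pairs` are made up for this example.

```python
import re

# Separators allowed between listed hyponyms: ", ", ", and ", ", or ", " and ", " or ".
SEPARATOR = r"(?:, (?:and |or )?| and | or )"
# "<superclass> such as <term>[<separator><term>]*", single-word terms only.
SUCH_AS = re.compile(rf"(\w+) such as (\w+(?:{SEPARATOR}\w+)*)")


def extract_subclass_pairs(text):
    """Return (subclass, superclass) pairs suggested by "X such as Y"."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        superclass = match.group(1)
        for subclass in re.split(SEPARATOR, match.group(2)):
            pairs.append((subclass, superclass))
    return pairs


pairs = extract_subclass_pairs("They study languages such as Greek, Latin and Hebrew.")
print(pairs)
```

Because the pattern is purely lexical, it will also pick up words that merely follow "and" or "or" after the list, which is one reason such extractions are usually validated before being added to the hierarchy.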
Extending the Concept Hierarchy with New Concepts
- ...by adding a new concept at an appropriate position in the existing taxonomy
- Supervised methods:
  - classifiers need to be trained to predict membership for every concept in the existing concept hierarchy
  - they need a considerable amount of training data for each concept, so such approaches typically do not scale to arbitrarily large ontologies
- Unsupervised approaches:
  - assume a similarity function which computes a measure of fit between the new concept and the concepts existing in the ontology
  - rely on an appropriate contextual representation of the different concepts, on the basis of which similarity can be computed
  - the hierarchical structure of the ontology needs to be considered and somehow integrated into the similarity measure
Learning Non-Taxonomic Relations
- ...learn the "flesh" of the ontology, i.e. a set of non-taxonomic relationships
  - essential for expressing domain-specific properties of both classes and instances
- E.g. identify verbs in text as indicators of a relation between their arguments (object properties)
Ontology Population
- ...adding instances of concepts and relations to the ontology
- Hearst Patterns work well
- An easy task in O.L
O.L Applications
- Use of ontologies which (to a large extent) are not axiomatized in the sense of a logical theory
- Such ontologies typically consist of a set of concepts and a loosely defined taxonomic organization of these concepts
- Such semi-formal ontologies (Gruber, 2004) can benefit applications which need some abstraction over plain words but do not mainly rely on logical reasoning
- Such applications are mainly found in the fields of information retrieval, text mining and machine learning
BT Digital Library Case Study
A modern Approach
- Mining query logs in an organizational K.M setting
  - queries reflect organization-specific users' search interests (queries already clustered)
  - query history and personalization are important
- Mining query logs in the open Web
  - search engine query logs (e.g. Yahoo!)
  - a preprocessing step is applied to organize (cluster) the queries into domain-specific data sets
The architecture
The Algorithm
1. Identify key terms: "cut" terms that occur more than once in the query log
2. Identify significant neighbor terms of the key terms
   - identifies mainly nouns, verbs, and adjectives, in order to be able to apply simple heuristics, e.g. for the creation of object properties
- OWL: subsumption, equivalent and disjoint axioms
- WordNet: Hypernym/Hyponym, Synonym and Antonym relations
- Object properties and individual objects are also discovered using heuristic rules
Domain clustering of Web queries
- ...in order to reflect domain-specific users' search interests (a necessary condition of our approach)
- Algorithm requirements:
  - cope with large data sets, in terms of time and computational cost
  - no prior input regarding the number of clusters
  - incremental algorithm, since new queries are constantly fed into search engines
Domain clustering of Web queries (2)
- Clustering method: Incremental DBSCAN [Ester et al. 1998]
- Similarity function: cosine similarity of weighted term vectors
  1. "Cut" stop-words
  2. Apply the Porter Stemmer
  3. Compute weights (tf-idf)
  4. Compute the cosine similarity function

$$\mathrm{Sim}(q_1, q_2) = \frac{\sum_{i=1}^{k} cw_i(q_1)\, cw_i(q_2)}{\sqrt{\sum_{i=1}^{n} w_i^2(q_1)}\;\sqrt{\sum_{i=1}^{m} w_i^2(q_2)}}$$
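The cosine similarity of tf-idf weighted query vectors can be sketched as follows. Here the sum in the numerator runs over the terms common to both queries, and each norm in the denominator runs over all terms of one query. The stop-word list, the function names and the sample queries are assumptions for illustration, and the Porter stemming step is omitted to keep the example free of external libraries.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # illustrative subset


def tokenize(query):
    """Lowercase, split on whitespace and drop stop-words (no stemming here)."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]


def tf_idf_vectors(queries):
    """Return one {term: tf-idf weight} dict per query."""
    token_lists = [tokenize(q) for q in queries]
    n = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        {t: count * math.log(n / doc_freq[t]) for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine_similarity(v1, v2):
    """Dot product over the common terms divided by the product of the norms."""
    common = v1.keys() & v2.keys()
    dot = sum(v1[t] * v2[t] for t in common)
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical query log entries.
queries = ["cheap flights to athens", "flights to samos", "ontology learning tools"]
vectors = tf_idf_vectors(queries)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

A density-based clusterer such as DBSCAN would then typically work on the distance 1 - Sim(q1, q2) between queries.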
Domain clustering of Web queries (3)
- Advanced methods for discovering similarity between queries that do not share common keywords:
  - Latent Semantic Indexing (LSI or LSA)
  - Similarity as proportional to the number of commonly selected documents (from the returned results)
- ...the numerator denotes the number of common documents clicked, and the denominator expresses the maximum number of documents clicked for each query
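The click-based similarity described above can be sketched as the number of commonly clicked documents divided by the larger of the two click counts. The function name `click_similarity` and the document identifiers are hypothetical, and treating each query's clicks as a set of distinct documents is an assumption of this sketch.

```python
def click_similarity(clicks_q1, clicks_q2):
    """Common clicked documents divided by the larger number of clicked documents."""
    largest = max(len(clicks_q1), len(clicks_q2))
    if largest == 0:
        return 0.0  # neither query led to a click
    return len(clicks_q1 & clicks_q2) / largest


# Hypothetical sets of documents clicked for two queries with no shared keywords.
clicks_a = {"doc1", "doc2", "doc3"}
clicks_b = {"doc2", "doc3", "doc4", "doc5"}
print(click_similarity(clicks_a, clicks_b))
```

Two queries can score high here even when their keyword vectors are orthogonal, which is the case this measure is meant to cover.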
Technologies used
- JENA API
- PELLET reasoner
- Stanford POS tagger
- WordNet lexicon
- Clustering algorithms: DBSCAN
- Porter Stemmer
- Similarity functions: cosine and cross-reference
- Yahoo! query log (licensed)
Evaluation strategies
- Exploitation of learned ontologies in applications (SW search, e.g. SAMOS)
- Manual evaluation by experts (ontology engineers and domain experts, using e.g. the Protégé tool)
- Automated evaluation by computing similarity functions against a gold ontology (e.g. the OntoEval tool)
  - Generic ontology alignment tools can also be used
Evaluation via predictions
- ...researchers have realized that the output of ontology learning algorithms is far from perfect
- To make the process controllable, we need an assessment of how certain an algorithm is in its predictions
- Numeric confidence values for an algorithm's predictions could then be used as a basis for combining different algorithms, so that they compensate for each other's drawbacks and false predictions
- The representation of uncertainty and the combination of algorithms given their certainty are thus inherently coupled, and represent one of the main open problems in the field of ontology learning
The OntoEval approach
- Dellschaft and Staab, 2006
- Computes measures including Lexical Precision / Lexical Recall, Taxonomic Precision / Taxonomic Recall, F-Measure
- Given a computed core ontology OC and a reference ontology OR ….
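Lexical Precision/Recall
- Lexical precision (LP) and lexical recall (LR) are defined as:

$$LP(O_C, O_R) = \frac{|C_C \cap C_R|}{|C_C|}, \qquad LR(O_C, O_R) = \frac{|C_C \cap C_R|}{|C_R|}$$

  where C_C and C_R are the sets of concepts (lexical terms) of OC and OR
- They reflect how well the learned lexical terms cover the target domain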
Lexical Precision/Recall (2)
- If one compares OC1 and OR1 with each other, one gets LP(OC1,OR1) = 4/6 = 0.67 and LR(OC1,OR1) = 4/5 = 0.8
- [Figure: example reference ontology (OR1, left) and computed ontology (OC1, right)]
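The arithmetic can be reproduced with two sets of concept labels. Lexical precision is the share of computed terms that also occur in the reference ontology, and lexical recall is the share of reference terms that were found. The labels below are hypothetical (the original figure is not available); they are chosen only so that the computed ontology has six terms, the reference has five, and four are shared.

```python
def lexical_precision(computed_terms, reference_terms):
    """Fraction of computed terms that also appear in the reference ontology."""
    return len(computed_terms & reference_terms) / len(computed_terms)


def lexical_recall(computed_terms, reference_terms):
    """Fraction of reference terms that were found by the learning procedure."""
    return len(computed_terms & reference_terms) / len(reference_terms)


# Hypothetical labels: 6 computed terms, 5 reference terms, 4 in common.
computed = {"root", "vehicle", "car", "bike", "wheel", "engine"}
reference = {"root", "vehicle", "car", "bike", "truck"}

print(round(lexical_precision(computed, reference), 2))
print(lexical_recall(computed, reference))
```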
Local Taxonomic Precision
- The similarity of two concepts is computed based on characteristic extracts from the concept hierarchy, i.e. the position of a concept in the hierarchy
- Two extracts should contain many common objects if the characterized objects are at similar positions in the hierarchy
- The proportion of common objects in the extracts should decrease with increasing dissimilarity of the characterized concepts
Local Taxonomic Precision (2)
- Given such a characteristic extract ce, the local taxonomic precision tp_ce of two concepts c1 ∈ OC and c2 ∈ OR is defined as:

$$tp_{ce}(c_1, c_2, O_C, O_R) = \frac{|ce(c_1, O_C) \cap ce(c_2, O_R)|}{|ce(c_1, O_C)|}$$
Global Taxonomic Precision/Recall
- Local Taxonomic Precision + Semantic Cotopy
- Semantic Cotopy (sc) = all super- and sub-concepts of a class in the ontology
  - heavily influenced by the lexical precision of OC, because with decreasing lexical precision more and more concepts of sc(c, OC) are not contained in OR and sc(c, OR)
- To overcome this problem, we can use the Common Semantic Cotopy (csc)
Global Taxonomic Precision/Recall (2)
- Common Semantic Cotopy (csc)
  - excludes all concepts which are not also available in the other ontology's set of concepts
- Use the common semantic cotopy and compute the taxonomic precision values for the common concepts of both ontologies
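The difference between the two extracts can be sketched on toy taxonomies. Several assumptions are made for illustration: each ontology is a tree stored as a child-to-parent dictionary, the semantic cotopy includes the concept itself while the common semantic cotopy does not, the local taxonomic precision is the share of the computed extract that also appears in the reference extract, and all function names and concept labels are made up.

```python
def ancestors(concept, parent_of):
    """All strict super-concepts of `concept`, following child -> parent links."""
    found = set()
    current = parent_of.get(concept)
    while current is not None and current not in found:
        found.add(current)
        current = parent_of.get(current)
    return found


def descendants(concept, parent_of):
    """All strict sub-concepts of `concept`."""
    return {c for c in parent_of if concept in ancestors(c, parent_of)}


def semantic_cotopy(concept, parent_of):
    """Super- and sub-concepts of `concept`, plus the concept itself (assumption)."""
    return ancestors(concept, parent_of) | descendants(concept, parent_of) | {concept}


def common_semantic_cotopy(concept, parent_of, other_concepts):
    """Super- and sub-concepts that also exist in the other ontology."""
    strict = ancestors(concept, parent_of) | descendants(concept, parent_of)
    return strict & other_concepts


def local_taxonomic_precision(extract_computed, extract_reference):
    """Share of the computed extract that is also in the reference extract."""
    if not extract_computed:
        return 0.0
    return len(extract_computed & extract_reference) / len(extract_computed)


# Hypothetical taxonomies (child -> parent, the root maps to None).
computed = {"thing": None, "vehicle": "thing", "car": "vehicle",
            "bike": "vehicle", "wheel": "car"}
reference = {"thing": None, "vehicle": "thing", "car": "vehicle",
             "truck": "vehicle"}

tp_sc = local_taxonomic_precision(
    semantic_cotopy("car", computed), semantic_cotopy("car", reference))
tp_csc = local_taxonomic_precision(
    common_semantic_cotopy("car", computed, set(reference)),
    common_semantic_cotopy("car", reference, set(computed)))
print(tp_sc, tp_csc)
```

In this toy case the extra concept "wheel", which has no counterpart in the reference, lowers the sc-based value but not the csc-based one. This is the lexical-precision effect that the common semantic cotopy is meant to remove.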
Global Taxonomic Precision/Recall (3)
- Balance precision/recall using their F-measure
  - The harmonic mean of the global taxonomic precision and recall
- The harmonic mean H of the positive real numbers x1, x2, ..., xn is defined to be

$$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$
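For the two-value case used by the F-measure, the harmonic mean reduces to a familiar closed form. The symbols TP and TR below are shorthand, introduced here, for the global taxonomic precision and recall.

```latex
% Harmonic mean of n positive numbers, specialised to n = 2
% with x_1 = TP (taxonomic precision) and x_2 = TR (taxonomic recall).
\[
H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}
\quad\Longrightarrow\quad
F = \frac{2}{\frac{1}{TP} + \frac{1}{TR}} = \frac{2 \cdot TP \cdot TR}{TP + TR}
\]
```

Because the harmonic mean is dominated by the smaller value, an ontology cannot reach a high F-measure by maximizing only one of precision or recall.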
Related Work to the modern approach
- Mining of query logs to assist ontology learning from relational databases
- Park et al (2003)
  - Lightweight ontologies, based on selected documents returned from queries
- ORAKEL (2006)
  - A target corpus must be available to construct lexicons that will then assist the learning method
- Gulla et al (2007)
  - Not a fully automated approach
- Sekine & Suzuki (2007)
  - Named entities mapping against query logs
Future work
- Large-scale evaluation
  - More evaluation data sets
- Enrich learned ontologies from other resources
  - Other lexicons
  - Existing ontology repositories: SWOOGLE or WATSON (ontology mapping)
  - Existing web documents of online thesauri (Wikipedia and Wiktionary), applying Hearst patterns
  - Compute the union (or some other kind of combination) of learned items (concepts, properties, instances)
Future work – Open issues
- Incorporate knowledge extracted from:
  - the history of queries
  - the selected results
- Extract knowledge from other social (semi-structured) data:
  - Yahoo! Answers
  - Fixya.com
- Assign trust values to the learning objects
References
- Kotis, K., A. Papasalouros, and M. Maragoudakis, "Mining Query Logs for Learning Useful Ontologies: an Incentive to SW Content Creation", International Journal for Knowledge Engineering and Data Mining (IJKEDM), Special Issue on Incentives for Semantic Content Creation.
- Kotis, K., and A. Papasalouros, "Learning useful kick-off ontologies from Query Logs: HCOME revised", 4th International Conference on Complex, Intelligent and Software Intensive Systems (CISIS-2010), Krakow, IEEE Computer Society Press, 2010.
- Zavitsanos, E., G. Paliouras, G. Vouros, and S. Petridis, "Learning Subsumption Hierarchies of Ontology Concepts from Texts", Journal of Web Intelligence, 2010.