Authors:
Saad Alajlan (1); Frans Coenen (2); Boris Konev (2) and Angrosh Mandya (2)
Affiliations:
(1) Department of Computer Science, The University of Liverpool, Liverpool, U.K.; College of Computer and Information Sciences, Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
(2) Department of Computer Science, The University of Liverpool, Liverpool, U.K.
Keyword(s):
Ontology Learning, RDF, Relation Extraction, Twitter, Named Entity Recognition, Regular Expression.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Data Engineering; Enterprise Information Systems; Information Systems Analysis and Specification; Knowledge Engineering and Ontology Development; Knowledge Representation; Knowledge-Based Systems; Natural Language Processing; Ontologies and the Semantic Web; Ontology Engineering; Pattern Recognition; Symbolic Systems
Abstract:
This paper presents and compares three mechanisms for learning an ontology describing a domain of discourse as defined in a collection of tweets. The task in part involves identifying entities and relations in the free-text data, which can then be used to produce a set of RDF triples from which an ontology can be generated. The first mechanism is founded on the Stanford CoreNLP Toolkit, in particular the Named Entity Recognition and Relation Extraction mechanisms that come with this toolkit. The second is founded on GATE (General Architecture for Text Engineering), which provides an alternative mechanism for relation extraction from text. Both require a substantial amount of training data. To reduce the training data requirement, the third mechanism is founded on the concept of Regular Expressions extracted from a training data “seed set”. Although the third mechanism still requires training data, the amount required is significantly reduced without adversely affecting the quality of the ontologies generated.
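
To make the third mechanism more concrete, the following is a minimal sketch of regular-expression-based relation extraction that emits RDF triples in N-Triples syntax. It is not the authors' implementation: the patterns, the predicate names (`livesIn`, `worksFor`), the function names and the `http://example.org/` namespace are all hypothetical, and the patterns are written by hand here, whereas the abstract describes patterns extracted from a training "seed set".

```python
import re

# Hypothetical hand-written patterns (assumption: the paper derives its
# patterns from a seed set; these are illustrative only).
PATTERNS = {
    "livesIn": re.compile(r"(?P<subj>[A-Z][a-z]+) lives in (?P<obj>[A-Z][a-z]+)"),
    "worksFor": re.compile(r"(?P<subj>[A-Z][a-z]+) works for (?P<obj>[A-Z][a-z]+)"),
}


def extract_triples(tweets):
    """Return (subject, predicate, object) tuples matched in each tweet."""
    triples = []
    for tweet in tweets:
        for predicate, pattern in PATTERNS.items():
            for match in pattern.finditer(tweet):
                triples.append((match.group("subj"), predicate, match.group("obj")))
    return triples


def to_ntriples(triples, base="http://example.org/"):
    """Serialise tuples as N-Triples lines under a placeholder namespace."""
    return [f"<{base}{s}> <{base}{p}> <{base}{o}> ." for s, p, o in triples]


if __name__ == "__main__":
    sample = ["Alice lives in Liverpool and Bob works for Acme"]
    for line in to_ntriples(extract_triples(sample)):
        print(line)
```

In a sketch like this, the set of distinct predicates and the entity types appearing as their subjects and objects would give a starting point for the properties and classes of a generated ontology.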