Abstract
Named entity recognition (NER) and relation extraction (RE) are two important subtasks in information extraction (IE). Most current learning methods for NER and RE rely on supervised machine learning techniques, which yield more accurate results for NER than for RE. This paper presents OntoILPER, a system for extracting entity and relation instances from unstructured texts using ontology and inductive logic programming, a symbolic machine learning technique. OntoILPER uses the domain ontology and takes advantage of a more expressive relational hypothesis space for representing examples whose structure is relevant to IE. It induces extraction rules that subsume examples of entity and relation instances from a specific graph-based model of sentence representation. Furthermore, OntoILPER enables the exploitation of the domain ontology and further background knowledge in the form of relational features. To evaluate OntoILPER, several experiments over the TREC corpus were conducted for both NER and RE tasks, and the results demonstrate its effectiveness in both. This paper also provides a comparative assessment of OntoILPER against other NER and RE systems, showing that OntoILPER is very competitive on NER and outperforms the selected systems on RE.
Notes
Horn clauses are first-order clauses containing at most one positive literal.
ACE (2004). Automatic Content Extraction. Relation Detection and Characterization 2004 Evaluation. http://www.itl.nist.gov/iad/mig/tests/ace/2004.
In an ontology, TBox statements describe a system in terms of a controlled vocabulary, or a set of classes and properties, whereas ABox is the assertional component, i.e., TBox-compliant statements about that vocabulary.
Stanford CoreNLP Tools. http://nlp.stanford.edu/software/corenlp.shtml.
Apache OpenNLP. The Apache Software Foundation. http://opennlp.apache.org.
We have also experimented with 4-grams, but bi-grams and tri-grams achieved better results in our preliminary experiments.
The ProGolem ILP system runs on YAP Prolog (http://www.dcc.fc.up.pt/~vsc/Yap).
LIBSVM. A library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
WordNet. A lexical database for English. https://wordnet.princeton.edu.
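The Horn clause definition given in the notes above can be checked mechanically. The following is a minimal sketch, assuming a clause is represented as a list of (sign, atom) pairs. The representation, the function name `is_horn`, and the predicate names are hypothetical illustrations and are not taken from OntoILPER.

```python
from typing import List, Tuple

# Assumed representation: a literal is (is_positive, atom),
# and a clause is a disjunction of such literals.
Literal = Tuple[bool, str]
Clause = List[Literal]


def is_horn(clause: Clause) -> bool:
    """Return True if the clause contains at most one positive literal."""
    positives = sum(1 for is_positive, _ in clause if is_positive)
    return positives <= 1


# Hypothetical rule: located_in(X, Y) :- person(X), city(Y), lives(X, Y).
# Written as a disjunction, the head is the only positive literal.
rule: Clause = [
    (True, "located_in(X,Y)"),
    (False, "person(X)"),
    (False, "city(Y)"),
    (False, "lives(X,Y)"),
]

# Two positive literals, so this clause is not Horn.
not_horn: Clause = [(True, "p(X)"), (True, "q(X)")]
```

A rule with a single head and a conjunctive body, as in the hypothetical `rule` above, always satisfies this condition, which is why rules written in Prolog-style syntax are Horn clauses.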
Acknowledgements
The authors are grateful to Hilário Oliveira for his help in the development of some of the OntoILPER components. We also thank the National Council for Scientific and Technological Development (CNPq/Brazil) for financial support (Grant No. 140791/2010-8).
Cite this article
Lima, R., Espinasse, B. & Freitas, F. OntoILPER: an ontology- and inductive logic programming-based system to extract entities and relations from text. Knowl Inf Syst 56, 223–255 (2018). https://doi.org/10.1007/s10115-017-1108-3
DOI: https://doi.org/10.1007/s10115-017-1108-3