Abstract
While classic information retrieval methods return whole documents in response to a query, many information needs would be better satisfied by fine-grained access inside the documents. One way to support this goal is to make the semantics of small document regions explicit, e.g. as XML labels, so that query engines can exploit them. To this end, the topics of the small document regions must be discovered from the texts: unlike in document labelling applications, fine-grained topics cannot be listed in advance for arbitrary collections. Text-understanding approaches can derive the topic of a document region but are less appropriate for constructing a small set of topics that can be used in queries.
To address this challenge, we propose coupling text mining, prior knowledge made explicit in ontologies, and human expertise, and we present the system RELFIN, which is designed to assist the human expert in discovering topics appropriate for (i) ontology enhancement with additional concepts or relationships and (ii) semantic characterization and tagging of document regions. RELFIN performs data mining on linguistically preprocessed corpora to group document regions by topic and to construct topic labels for them, so that the labels are characteristic of the regions and thus helpful in ontology-based search. We present first results of applying RELFIN in a case study of text analysis and retrieval.
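The general idea of grouping document regions by topic and tagging them with characteristic labels can be illustrated with a minimal sketch. This is not RELFIN's algorithm, which the abstract does not specify: the TF-IDF weighting, the greedy single-pass grouping, the similarity threshold, the label construction from the most widely shared terms, and all function names below are assumptions chosen purely for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tfidf_vectors(tokenized):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized region."""
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_regions(vectors, threshold=0.1):
    """Greedy grouping (an assumption, not RELFIN's method): a region joins
    the first cluster whose first member is similar enough, else starts one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def label_cluster(cluster, tokenized, top_k=2):
    """Label a cluster with the terms shared by most of its regions
    (ties broken alphabetically)."""
    support = Counter(t for i in cluster for t in set(tokenized[i]))
    ranked = sorted(support, key=lambda t: (-support[t], t))
    return "_".join(ranked[:top_k])


def annotate(regions, threshold=0.1):
    """Wrap every region in an XML element named after its cluster label.
    Assumes non-empty regions whose tokens form valid XML names."""
    tokenized = [r.lower().split() for r in regions]
    vectors = tfidf_vectors(tokenized)
    label_of = {}
    for cluster in group_regions(vectors, threshold):
        label = label_cluster(cluster, tokenized)
        for i in cluster:
            label_of[i] = label
    root = ET.Element("document")
    for i, region in enumerate(regions):
        ET.SubElement(root, label_of[i]).text = region
    return ET.tostring(root, encoding="unicode")


# Hypothetical toy regions, not taken from the paper's case study.
sample = [
    "merger of the two banks approved",
    "banks announce merger plans",
    "court rules on patent dispute",
    "patent dispute reaches court",
]
print(annotate(sample))
```

In a setting like the one described above, a human expert would review the proposed labels before they are used for tagging or added to an ontology; the sketch only covers the automatic proposal step.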
Work partially funded under the EU Contract IST-2001-39023 Parmenides. http://www.crim.co.umist.ac.uk/parmenides
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schaal, M., Müller, R.M., Brunzel, M., Spiliopoulou, M. (2005). RELFIN – Topic Discovery for Ontology Enhancement and Annotation. In: Gómez-Pérez, A., Euzenat, J. (eds) The Semantic Web: Research and Applications. ESWC 2005. Lecture Notes in Computer Science, vol 3532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11431053_41
DOI: https://doi.org/10.1007/11431053_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26124-7
Online ISBN: 978-3-540-31547-6
eBook Packages: Computer Science, Computer Science (R0)