Abstract
We present an architecture for structuring and querying the contents of a set of documents that belong to an organization. The structure is a database that is semi-automatically populated using information extraction techniques. We provide an ontology-based language for interrogating the contents of the documents. Query processing in this language can return approximate answers and triggers a mechanism that improves those answers by performing additional information extraction on the textual sources. Individual database items carry quality metadata, which can be used to evaluate the quality of answers. The interaction between information extraction and query processing is a pivotal aspect of this research.
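The feedback loop between query processing and information extraction described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names Item, Answer and DocumentStore, the use of a single confidence value as quality metadata, the mean confidence as the answer's quality, and the min_quality threshold that decides when an answer counts as approximate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Item:
    """One extracted database item with quality metadata (hypothetical fields)."""
    concept: str       # ontology concept the value instantiates
    value: str         # extracted value
    confidence: float  # illustrative quality measure in [0, 1]
    source: str        # document the value was extracted from


@dataclass
class Answer:
    items: List[Item]
    quality: float     # aggregate quality of the answer (mean confidence here)
    approximate: bool  # True when quality is below the requested threshold


class DocumentStore:
    """Toy database populated by an extractor and queried by concept."""

    def __init__(self, extractor: Callable[[str], List[Item]]) -> None:
        self._items: Dict[Tuple[str, str], Item] = {}
        self._extractor = extractor  # stands in for information extraction

    def add(self, item: Item) -> None:
        # Keep only the highest-confidence version of each (concept, value).
        key = (item.concept, item.value)
        current = self._items.get(key)
        if current is None or item.confidence > current.confidence:
            self._items[key] = item

    def _evaluate(self, concept: str, min_quality: float) -> Answer:
        hits = [i for i in self._items.values() if i.concept == concept]
        quality = sum(i.confidence for i in hits) / len(hits) if hits else 0.0
        return Answer(hits, quality, quality < min_quality)

    def query(self, concept: str, min_quality: float = 0.8) -> Answer:
        answer = self._evaluate(concept, min_quality)
        if answer.approximate:
            # An approximate answer triggers one more round of extraction,
            # after which the query is evaluated again.
            for item in self._extractor(concept):
                self.add(item)
            answer = self._evaluate(concept, min_quality)
        return answer
```

In this sketch a query whose answer already meets the threshold never calls the extractor, so additional extraction work is only spent on the concepts users actually ask about.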
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abad-Mota, S. (2006). Document Interrogation: Architecture, Information Extraction and Approximate Answers. In: Grust, T., et al. Current Trends in Database Technology – EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 4254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11896548_12
DOI: https://doi.org/10.1007/11896548_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46788-5
Online ISBN: 978-3-540-46790-8
eBook Packages: Computer Science, Computer Science (R0)