Document Interrogation: Architecture, Information Extraction and Approximate Answers

  • Conference paper
Current Trends in Database Technology – EDBT 2006 (EDBT 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4254)

Abstract

We present an architecture for structuring and querying the contents of a set of documents that belong to an organization. The structure is a database that is semi-automatically populated using information extraction techniques. We provide an ontology-based language to interrogate the contents of the documents. Query processing in this language can return approximate answers, and it triggers a mechanism that improves those answers by performing additional information extraction on the textual sources. Individual database items carry quality metadata that can be used to evaluate the quality of answers. The interaction between information extraction and query processing is a pivotal aspect of this research.
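The interaction between query processing and extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Item` schema, the single `confidence` score standing in for quality metadata, the use of the minimum confidence as answer quality, the `min_quality` threshold, and the toy `regex_extractor` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Item:
    """One extracted database value plus quality metadata (hypothetical schema)."""
    concept: str       # ontology concept the value instantiates
    value: str
    source_doc: str    # document the value was extracted from
    confidence: float  # assumed quality measure in [0, 1]


@dataclass
class Answer:
    items: List[Item]
    quality: float     # aggregate quality of the answer
    approximate: bool  # True while quality is below the requested threshold


# Hypothetical extractor signature: (concept, doc_id, text) -> Item or None.
Extractor = Callable[[str, str, str], Optional[Item]]


def answer_quality(items: List[Item]) -> float:
    """Assumed aggregate: an answer is only as good as its weakest item."""
    return min(i.confidence for i in items) if items else 0.0


def interrogate(concept: str, database: List[Item], documents: Dict[str, str],
                extractor: Extractor, min_quality: float = 0.8) -> Answer:
    # Best item currently stored for this concept, keyed by source document.
    best: Dict[str, Item] = {}
    for item in database:
        if item.concept == concept:
            current = best.get(item.source_doc)
            if current is None or item.confidence > current.confidence:
                best[item.source_doc] = item

    # A low-quality answer triggers additional extraction over the texts.
    if answer_quality(list(best.values())) < min_quality:
        for doc_id, text in documents.items():
            found = extractor(concept, doc_id, text)
            if found is not None and (
                doc_id not in best or found.confidence > best[doc_id].confidence
            ):
                best[doc_id] = found
                database.append(found)  # the database is populated incrementally

    items = list(best.values())
    quality = answer_quality(items)
    return Answer(items, quality, quality < min_quality)


def regex_extractor(concept: str, doc_id: str, text: str) -> Optional[Item]:
    """Toy extractor: matches '<concept> of <token>' and assigns a fixed confidence."""
    match = re.search(rf"{re.escape(concept)} of (\S+)", text)
    if match is None:
        return None
    return Item(concept, match.group(1), doc_id, confidence=0.9)
```

In this sketch, a query whose stored items fall below the threshold re-reads the documents, and the returned `Answer` stays flagged as approximate if extraction cannot raise its quality.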

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abad-Mota, S. (2006). Document Interrogation: Architecture, Information Extraction and Approximate Answers. In: Grust, T., et al. Current Trends in Database Technology – EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 4254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11896548_12

  • DOI: https://doi.org/10.1007/11896548_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46788-5

  • Online ISBN: 978-3-540-46790-8

  • eBook Packages: Computer Science, Computer Science (R0)
