A System for Discovering Relationships by Feature Extraction from Text Databases

  • Conference paper
SIGIR ’94

Abstract

A method for accessing text-based information using domain-specific features rather than documents alone is presented. The basis of this approach is the ability to automatically extract features from large text databases, and identify statistically significant relationships or associations between those features. The techniques supporting this approach are discussed, and examples from an application using these techniques, named the Associations System, are illustrated using the Wall Street Journal database. In this particular application, the features extracted are company and person names. The series of tests run on the Associations System demonstrate that feature extraction can be quite accurate, and that the relationships generated are reliable. In addition to conventional measures of recall and precision, evaluation measures are currently being studied which will indicate the usefulness of the relationships identified, in various domain-specific contexts.
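The abstract does not say which statistic the Associations System uses to decide that a relationship is significant. As a minimal sketch only, the following assumes that features (company and person names) have already been extracted per document, and scores each co-occurring pair with pointwise mutual information, one common association measure. The function name `association_scores`, the `min_pair_count` threshold, and the sample names are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def association_scores(docs, min_pair_count=2):
    """Score feature pairs by pointwise mutual information (PMI).

    docs: iterable of collections of feature strings, one per document.
    Returns a dict mapping (feature_a, feature_b) -> PMI in bits,
    with each pair stored in sorted order.
    NOTE: PMI is an assumed stand-in, not the paper's confirmed measure.
    """
    docs = [set(d) for d in docs]
    n_docs = len(docs)
    feature_counts = Counter()  # number of documents containing each feature
    pair_counts = Counter()     # number of documents containing both features

    for features in docs:
        ordered = sorted(features)
        feature_counts.update(ordered)
        pair_counts.update(combinations(ordered, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        # Rare pairs give unstable estimates, so skip them.
        if n_ab < min_pair_count:
            continue
        # PMI = log2( P(a, b) / (P(a) * P(b)) ) using document frequencies.
        scores[(a, b)] = math.log2(
            n_ab * n_docs / (feature_counts[a] * feature_counts[b])
        )
    return scores


if __name__ == "__main__":
    # Hypothetical extracted features for four documents.
    sample = [
        {"Acme Corp.", "J. Smith"},
        {"Acme Corp.", "J. Smith"},
        {"R. Jones"},
        {"R. Jones"},
    ]
    for pair, score in association_scores(sample).items():
        print(pair, round(score, 3))
```

A positive score means the two names appear together more often than independence would predict. The count threshold is a crude guard against spurious pairs; a real system would likely need a proper significance test and name normalization before counting.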

This research was performed at the Center for Intelligent Information Retrieval at the University of Massachusetts at Amherst.


Copyright information

© 1994 Springer-Verlag London Limited

About this paper

Cite this paper

Conrad, J.G., Utt, M.H. (1994). A System for Discovering Relationships by Feature Extraction from Text Databases. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_27

  • DOI: https://doi.org/10.1007/978-1-4471-2099-5_27

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19889-5

  • Online ISBN: 978-1-4471-2099-5

  • eBook Packages: Springer Book Archive
