Abstract
A method for accessing text-based information using domain-specific features rather than documents alone is presented. The basis of this approach is the ability to automatically extract features from large text databases and to identify statistically significant relationships, or associations, between those features. The techniques supporting this approach are discussed, and examples from an application using these techniques, named the Associations System, are illustrated using the Wall Street Journal database. In this particular application, the features extracted are company and person names. The series of tests run on the Associations System demonstrates that feature extraction can be quite accurate and that the relationships generated are reliable. In addition to the conventional measures of recall and precision, evaluation measures that indicate the usefulness of the identified relationships in various domain-specific contexts are currently being studied.
This research was performed at the Center for Intelligent Information Retrieval at the University of Massachusetts at Amherst.
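The association step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which statistic the Associations System uses, so the choice of pointwise mutual information (PMI) over document-level co-occurrence is an assumption made here for illustration only. The function name `association_scores`, the `min_pair_count` threshold, and the company and person names in the sample data are all hypothetical, and the names are assumed to have been extracted already.

```python
import math
from collections import Counter
from itertools import combinations


def association_scores(docs, min_pair_count=2):
    """Score feature pairs by pointwise mutual information (PMI).

    docs: list of iterables, each holding the features (for example
    company and person names) already extracted from one document.
    PMI is an assumed measure, not necessarily the one used in the paper.
    """
    n_docs = len(docs)
    feature_counts = Counter()  # number of documents containing each feature
    pair_counts = Counter()     # number of documents containing both features

    for features in docs:
        unique = sorted(set(features))  # count each feature once per document
        feature_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_pair_count:
            continue  # skip pairs with too little evidence
        p_ab = n_ab / n_docs
        p_a = feature_counts[a] / n_docs
        p_b = feature_counts[b] / n_docs
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical extracted names, one list per document.
docs = [
    ["Acme Corp", "J. Smith"],
    ["Acme Corp", "J. Smith", "Globex Inc"],
    ["Globex Inc", "R. Jones"],
    ["Globex Inc", "R. Jones"],
]
scores = association_scores(docs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A positive score means the two names appear together more often than independence would predict. The `min_pair_count` threshold is a crude stand-in for a significance test, since PMI is known to overrate pairs that are seen only once or twice.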
Copyright information
© 1994 Springer-Verlag London Limited
About this paper
Cite this paper
Conrad, J.G., Utt, M.H. (1994). A System for Discovering Relationships by Feature Extraction from Text Databases. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_27
DOI: https://doi.org/10.1007/978-1-4471-2099-5_27
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5
eBook Packages: Springer Book Archive