Abstract
This paper introduces a new full-text document retrieval model based on comparing the occurrence-frequency rank numbers of terms in queries and documents.
More precisely, to compute the similarity between a query and a document, the model first ranks the terms in the query and in the document by decreasing occurrence frequency. Next, for each term, it computes a local similarity between the query and the document by calculating a weighted difference between the term's rank number in the query and its rank number in the document. Finally, it collects all those local similarities and combines them into one global similarity between the query and the document.
We also demonstrate that the effectiveness of this new full-text document retrieval model is comparable to that of the standard vector-space retrieval model.
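The three steps described in the abstract (rank terms, compute per-term local similarities, combine them into a global similarity) can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting or combination formula, so everything concrete here is an assumption made for illustration: the function names `rank_terms`, `local_similarity`, and `global_similarity`, the alphabetical tie-breaking, the local form `1 / (1 + weight * |rank difference|)`, and the averaging over query terms. It should not be read as the paper's own definition.

```python
from collections import Counter


def rank_terms(tokens):
    """Map each term to its rank number, 1 being the most frequent term.

    Assumption: ties in frequency are broken alphabetically.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda term: (-counts[term], term))
    return {term: rank for rank, term in enumerate(ordered, start=1)}


def local_similarity(query_rank, doc_rank, weight=1.0):
    """Per-term similarity derived from a weighted rank difference.

    Assumption: the form 1 / (1 + weight * |difference|) is illustrative
    only; it equals 1.0 when the ranks agree and shrinks as they diverge.
    """
    return 1.0 / (1.0 + weight * abs(query_rank - doc_rank))


def global_similarity(query_tokens, doc_tokens, weight=1.0):
    """Combine the local similarities into one query-document score.

    Assumption: terms missing from the document contribute 0, and the
    total is averaged over the distinct query terms.
    """
    query_ranks = rank_terms(query_tokens)
    doc_ranks = rank_terms(doc_tokens)
    if not query_ranks:
        return 0.0
    shared = query_ranks.keys() & doc_ranks.keys()
    total = sum(
        local_similarity(query_ranks[t], doc_ranks[t], weight) for t in shared
    )
    return total / len(query_ranks)


if __name__ == "__main__":
    query = ["rank", "rank", "term"]
    document = ["rank", "rank", "rank", "term", "term", "model"]
    print(global_similarity(query, document))
```

In this sketch a document whose terms have the same frequency ordering as the query scores highest, regardless of the absolute counts, which is the property that distinguishes a rank-based comparison from one based on raw term frequencies.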
On temporary leave from Philips Research Laboratories, Eindhoven, The Netherlands.
Copyright information
© 1994 Springer-Verlag London Limited
About this paper
Cite this paper
Aalbersberg, I.J. (1994). A Document Retrieval Model Based on Term Frequency Ranks. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_17
DOI: https://doi.org/10.1007/978-1-4471-2099-5_17
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5