Abstract
Modern information retrieval (IR) systems widely use scoring functions to sort results. For a given document, its rank in the sorted result lists for hot searches can be regarded as its influence. When a new document arrives, can such an IR system evaluate its influence before the document is inserted into the corpus? Current IR systems built on inverted indexes do not handle this problem well. This paper proposes an influence measure based on a document's global rank and extends the inverted index structure with position milestones to speed up the rank calculation. A performance study on both real and synthetic data verifies the effectiveness and efficiency of the method.
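The abstract does not spell out how the position milestones are laid out, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that each hot query has a posting list of scores sorted in descending order, that a milestone records the score found at every `step`-th position, and that a new document's rank is one plus the number of existing scores strictly greater than its own. The names `MarkedPostingList`, `step`, and `rank` are hypothetical.

```python
from bisect import bisect_left


class MarkedPostingList:
    """Score-sorted posting list with sampled position milestones (illustrative only)."""

    def __init__(self, scores_desc, step=4):
        # Assumption: scores are already sorted from highest to lowest.
        self.scores = list(scores_desc)
        self.step = step
        # Milestones: every `step`-th position in the list.
        self.positions = list(range(0, len(self.scores), step))
        # Negated milestone scores are ascending, so bisect can search them.
        self.keys = [-self.scores[p] for p in self.positions]

    def rank(self, new_score):
        """Return the 1-based rank a new score would receive without inserting it."""
        # Number of milestones whose score is strictly greater than new_score.
        k = bisect_left(self.keys, -new_score)
        # Start scanning at the last milestone that still beats new_score.
        pos = self.positions[k - 1] if k > 0 else 0
        # The scan covers at most `step` entries before reaching the next milestone.
        while pos < len(self.scores) and self.scores[pos] > new_score:
            pos += 1
        return pos + 1


plist = MarkedPostingList([9.0, 8.5, 7.0, 6.5, 6.0, 4.0, 3.5, 2.0, 1.0], step=4)
new_doc_rank = plist.rank(6.2)  # four existing scores are higher, so the rank is 5
```

Under these assumptions, the binary search over the milestones narrows the candidate range to a single block, and only that block is scanned. This is the kind of saving over a full scan of the posting list that the abstract attributes to milestones.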
The research of Yi Han was supported in part by the China National High-tech R&D Program (863 Program) under Grant No. 2007AA010502 and by the National Natural Science Foundation of China under Grant Nos. 60873204 and 60933005. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Han, Y., Lu, T. (2010). Estimating the Influence of Documents in IR Systems: A Marked Indexing Approach. In: Taniar, D., Gervasi, O., Murgante, B., Pardede, E., Apduhan, B.O. (eds) Computational Science and Its Applications – ICCSA 2010. ICCSA 2010. Lecture Notes in Computer Science, vol 6019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12189-0_23
DOI: https://doi.org/10.1007/978-3-642-12189-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12188-3
Online ISBN: 978-3-642-12189-0
eBook Packages: Computer Science (R0)