A Statistical View of Binned Retrieval Models

  • Conference paper
Advances in Information Retrieval (ECIR 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4956)

Abstract

Many traditional information retrieval models, such as BM25 and language modeling, give good retrieval effectiveness but can be difficult to implement efficiently. Recently, document-centric impact models were developed to overcome some of these efficiency issues. However, such models have a number of problems, including poor effectiveness and heuristic term weighting schemes. In this work, we present a statistical view of document-centric impact models. We describe how such models can be treated statistically and propose a supervised parameter estimation technique. We analyze various theoretical and practical aspects of the model and show that weights estimated using our new estimation technique are significantly better than the integer-based weights used in previous studies.

Editor information

Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Metzler, D., Strohman, T., Croft, W.B. (2008). A Statistical View of Binned Retrieval Models. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_18

  • DOI: https://doi.org/10.1007/978-3-540-78646-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78645-0

  • Online ISBN: 978-3-540-78646-7

  • eBook Packages: Computer Science, Computer Science (R0)
