Does Selective Search Benefit from WAND Optimization?

Kim, Yubin; Callan, Jamie; Culpepper, J. Shane; Moffat, Alistair

doi:10.1007/978-3-319-30671-1_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Selective search is a distributed retrieval technique that reduces the computational cost of large-scale information retrieval. By partitioning the collection into topical shards, and using a resource selection algorithm to identify a subset of shards to search, selective search allows retrieval effectiveness to be maintained while evaluating fewer postings, often resulting in 90+% reductions in querying cost. However, there has been only limited attention given to the interaction between dynamic pruning algorithms and topical index shards. We demonstrate that the WAND dynamic pruning algorithm is more effective on topical index shards than it is on randomly-organized index shards, and that the savings generated by selective search and WAND are additive. We also compare two methods for applying WAND to topical shards: searching each shard with a separate top-k heap and threshold; and sequentially passing a shared top-k heap and threshold from one shard to the next, in the order established by a resource selection mechanism. Separate top-k heaps provide low query latency, whereas a shared top-k heap provides higher throughput.
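The two heap strategies compared in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and not the WAND algorithm itself: it assumes a simplified pruning rule in which each document carries a precomputed score upper bound and is fully scored only if that bound exceeds the current top-k threshold. The function names, the `(doc_id, upper_bound, score)` tuple layout, and the toy shard data are all hypothetical.

```python
import heapq

def search_shard(shard, k, heap):
    """Scan one shard, fully scoring a document only when its score upper
    bound could still beat the current top-k threshold.
    Returns the number of documents that were fully scored."""
    scored = 0
    for doc_id, upper_bound, score in shard:
        # The threshold is the smallest score in a full heap, else -infinity.
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        if upper_bound <= threshold:
            continue  # pruned: this document cannot enter the top-k
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > threshold:
            heapq.heapreplace(heap, (score, doc_id))
    return scored

def separate_heaps(shards, k):
    """Each shard gets its own heap and threshold; results are merged after."""
    merged, scored = [], 0
    for shard in shards:
        heap = []
        scored += search_shard(shard, k, heap)
        merged.extend(heap)
    return heapq.nlargest(k, merged), scored

def shared_heap(shards, k):
    """One heap and threshold are passed from shard to shard, in the given order."""
    heap, scored = [], 0
    for shard in shards:
        scored += search_shard(shard, k, heap)
    return sorted(heap, reverse=True), scored

# Hypothetical shards, listed in the order a resource selection step might produce.
shards = [
    [("a1", 9.0, 8.0), ("a2", 7.0, 6.5), ("a3", 3.0, 2.0)],
    [("b1", 6.0, 5.0), ("b2", 4.0, 3.5), ("b3", 2.0, 1.0)],
]

separate_result, separate_scored = separate_heaps(shards, k=2)
shared_result, shared_scored = shared_heap(shards, k=2)
print(separate_result, separate_scored)
print(shared_result, shared_scored)
```

In this toy setting both strategies return the same top-2 documents, but the shared heap fully scores fewer documents because the second shard starts with the threshold established by the first. The separate-heap loop has no dependency between shards, so in principle the shards could be searched in parallel, which is one plausible reading of the latency versus throughput trade-off stated in the abstract.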

Notes

1. The values for b and \(k_1\) are based on the parameter choices reported for Atire and Lucene in the 2015 IR-Reproducibility Challenge, see http://github.com/lintool/IR-Reproducibility.
2. We recognize that the AOL log has been withdrawn, but also note that it continues to be widely used for research purposes.

Acknowledgments

This research was supported by National Science Foundation (NSF) grant IIS-1302206; a Natural Sciences and Engineering Research Council of Canada (NSERC) Postgraduate Scholarship-Doctoral award; and the Australian Research Council (ARC) under the Discovery Projects scheme (DP140103256). Shane Culpepper is the recipient of an Australian Research Council (ARC) DECRA Research Fellowship (DE140100275).

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, USA
Yubin Kim & Jamie Callan
RMIT University, Melbourne, Australia
J. Shane Culpepper
The University of Melbourne, Melbourne, Australia
Alistair Moffat

Corresponding author

Correspondence to Yubin Kim.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Padova, Italy
Nicola Ferro
Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
Fabio Crestani
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Systèmes d’informations, Big Data et Recherche d’Information, Institut de Recherche en Informatique de Toulouse IRIT/équipe SIG, Toulouse Cedex 04, France
Josiane Mothe
Yahoo! Labs London, London, UK
Fabrizio Silvestri
Department of Information Engineering, University of Padua, Padova, Italy
Giorgio Maria Di Nunzio
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Department of Information Engineering, University of Padua, Padova, Italy
Gianmaria Silvello

Cite this paper

Kim, Y., Callan, J., Culpepper, J.S., Moffat, A. (2016). Does Selective Search Benefit from WAND Optimization? In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_11

DOI: https://doi.org/10.1007/978-3-319-30671-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science; Computer Science (R0)
