
An Inverse Retrieval Method via Query Generation for Xiaohongshu’s Search Engine

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2024)

Abstract

In real-world information retrieval, the timely retrieval of the latest documents has gained significant attention in recent years. In this paper, we develop an effective retrieval method for search engines, namely inverse retrieval. We propose a two-stage contrastive strategy to train the doc2query model, the query-generation component of inverse retrieval. We perform offline or nearline computations to generate queries and then build or update an index that maps each query to tuples of document and score. We have implemented an offline and a nearline retrieval channel at Xiaohongshu. Both channels showed substantial improvement in A/B tests. To make our work reproducible, we release the QD100K dataset with 111K documents and 23M query-doc pairs. Our experimental results on QD100K and MS MARCO show the effectiveness of our method. All our code and datasets are available at https://github.com/fytxlj/InverseRetrievalDataset.
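
The indexing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `InverseIndex`, `normalize`, and `toy_generate` are hypothetical, and `toy_generate` is only a stand-in for a trained doc2query model. The sketch assumes that the generator returns (query, score) pairs for each document and that lookup is an exact match on the normalized query string.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a doc2query model maps document text to (query, score) pairs.
QueryGenerator = Callable[[str], List[Tuple[str, float]]]


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so equivalent queries share one key."""
    return " ".join(query.lower().split())


class InverseIndex:
    """Maps each generated query to the documents (and scores) it was generated from."""

    def __init__(self) -> None:
        self._postings: Dict[str, Dict[str, float]] = defaultdict(dict)

    def add_document(self, doc_id: str, text: str, generate: QueryGenerator) -> None:
        """Index one document; a repeated (query, doc_id) pair keeps the latest score."""
        for query, score in generate(text):
            self._postings[normalize(query)][doc_id] = score

    def lookup(self, query: str, k: int = 10) -> List[Tuple[str, float]]:
        """Return up to k (doc_id, score) tuples, highest score first."""
        postings = self._postings.get(normalize(query), {})
        ranked = sorted(postings.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:k]


def toy_generate(text: str) -> List[Tuple[str, float]]:
    """Stand-in for a trained doc2query model: adjacent word pairs, earlier pairs score higher."""
    words = text.lower().split()
    return [(" ".join(words[i:i + 2]), 1.0 / (i + 1)) for i in range(len(words) - 1)]


index = InverseIndex()
index.add_document("d1", "red lipstick review", toy_generate)
index.add_document("d2", "matte red lipstick", toy_generate)
print(index.lookup("Red  Lipstick"))  # [('d1', 1.0), ('d2', 0.5)]
```

In this sketch, an offline build would call `add_document` over the whole corpus, while a nearline update would call it for each newly published document, so that new documents become retrievable without rebuilding the index.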

Notes

  1. https://www.xiaohongshu.com/

  2. Quality is predicted by a neural network upon the publication of each new document.

  3. https://github.com/shibing624/similarities

  4. https://huggingface.co/

Acknowledgement

The authors would like to thank Dr. Shusen Wang at Xiaohongshu. This work was mainly conducted when Yuantao Fan was an intern. In addition, the authors would like to thank the anonymous reviewers for their valuable comments on improving the final version of this paper.

Author information

Corresponding author

Correspondence to Ruifan Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Fan, Y., Tu, X., Li, R. (2024). An Inverse Retrieval Method via Query Generation for Xiaohongshu’s Search Engine. In: Huang, DS., Zhang, X., Zhang, C. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14879. Springer, Singapore. https://doi.org/10.1007/978-981-97-5675-9_31

  • DOI: https://doi.org/10.1007/978-981-97-5675-9_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5674-2

  • Online ISBN: 978-981-97-5675-9

  • eBook Packages: Computer Science, Computer Science (R0)
