An Inverse Retrieval Method via Query Generation for Xiaohongshu’s Search Engine

Fan, Yuantao; Tu, Xinyu; Li, Ruifan

doi:10.1007/978-981-97-5675-9_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14879)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

In real-world information retrieval, the timely retrieval of the latest documents has gained significant attention in recent years. In this paper, we develop an effective retrieval method for search engines, namely inverse retrieval. We propose a two-stage contrastive strategy to train the doc2query model, the core component of inverse retrieval. We perform offline or nearline computations to generate queries for each document and then build or update an index that maps each query to (document, score) tuples. We have implemented an offline and a nearline retrieval channel at Xiaohongshu, and both channels showed substantial improvements in A/B tests. To make our work reproducible, we release the QD100K dataset with 111K documents and 23M query-doc pairs. Our experimental results on QD100K and MS MARCO show the effectiveness of our method. All our code and datasets are available at https://github.com/fytxlj/InverseRetrievalDataset.
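
The abstract describes the inverse index only at a high level: queries are generated from each document offline or nearline, and an index maps each query to (document, score) tuples, so serving a user query becomes a lookup. The following is a minimal sketch of that data structure, not the authors' implementation. It assumes a hypothetical `toy_generator` in place of the trained doc2query model, exact string matching of queries, and a hypothetical per-query cap on posting-list length (`max_docs_per_query`); the class and method names are likewise illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A query generator maps document text to (generated_query, score) pairs.
QueryGenerator = Callable[[str], List[Tuple[str, float]]]


def toy_generator(text: str) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a doc2query model: emits the first three
    words of the document as 'queries', scored 1, 1/2, 1/3 by position."""
    words = text.lower().split()
    return [(word, 1.0 / (rank + 1)) for rank, word in enumerate(words[:3])]


class InverseIndex:
    """Sketch of an index from a generated query to (doc_id, score) tuples."""

    def __init__(self, generate: QueryGenerator, max_docs_per_query: int = 100):
        self.generate = generate
        self.max_docs_per_query = max_docs_per_query  # assumed cap, not from the paper
        self.postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def add_document(self, doc_id: str, text: str) -> None:
        """Nearline path: index one newly published document as it arrives."""
        for query, score in self.generate(text):
            plist = self.postings[query]
            plist.append((doc_id, score))
            # Keep each posting list sorted by descending score, then truncate.
            plist.sort(key=lambda pair: pair[1], reverse=True)
            del plist[self.max_docs_per_query:]

    def build(self, docs: Iterable[Tuple[str, str]]) -> None:
        """Offline path: batch-index a whole corpus of (doc_id, text) pairs."""
        for doc_id, text in docs:
            self.add_document(doc_id, text)

    def retrieve(self, query: str, k: int = 10) -> List[Tuple[str, float]]:
        """Online path: a single exact-match lookup on the user query."""
        return self.postings.get(query, [])[:k]


if __name__ == "__main__":
    index = InverseIndex(toy_generator)
    index.build([("d1", "hotpot recipe chengdu"), ("d2", "chengdu travel guide")])
    index.add_document("d3", "hotpot chengdu night market")
    print([doc for doc, _ in index.retrieve("chengdu")])  # ['d2', 'd3', 'd1']
```

The design choice this illustrates is that the expensive step (query generation) happens before any user searches, so the online cost is one dictionary lookup on a pre-sorted list. Re-indexing an edited document would add duplicate entries in this sketch; a production index would need to handle updates and deletions.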

Similar content being viewed by others

Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization

Multi-stage enhanced representation learning for document reranking based on query view (Article, 21 August 2024)

Learning Query-Space Document Representations for High-Recall Retrieval

Notes

1. https://www.xiaohongshu.com/.
2. Quality is predicted by a neural network upon the publication of each new document.
3. https://github.com/shibing624/similarities.
4. https://huggingface.co/.

Acknowledgement

The authors would like to thank Dr. Shusen Wang at Xiaohongshu. This work was mainly conducted when Yuantao Fan was an intern. In addition, the authors would like to thank the anonymous reviewers for their valuable comments on improving the final version of this paper.

Author information

Authors and Affiliations

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Yuantao Fan, Xinyu Tu & Ruifan Li
Engineering Research Center of Information Networks, Ministry of Education, Beijing, 100876, China
Ruifan Li

Corresponding author

Correspondence to Ruifan Li.

Editor information

Editors and Affiliations

Eastern Institute of Technology, Ningbo, China
De-Shuang Huang
Tianjin University of Science and Technology, Tianjin, China
Xiankun Zhang
Tianjin University of Science and Technology, Tianjin, China
Chuanlei Zhang

About this paper

Cite this paper

Fan, Y., Tu, X., Li, R. (2024). An Inverse Retrieval Method via Query Generation for Xiaohongshu’s Search Engine. In: Huang, DS., Zhang, X., Zhang, C. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14879. Springer, Singapore. https://doi.org/10.1007/978-981-97-5675-9_31

DOI: https://doi.org/10.1007/978-981-97-5675-9_31
Published: 01 August 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5674-2
Online ISBN: 978-981-97-5675-9
eBook Packages: Computer Science, Computer Science (R0)
