Abstract
Although semi-supervised learning with a small amount of labeled data can improve the effectiveness of learning to rank in information retrieval, the pseudo labels it creates may not be reliable. The uncertain data near the boundary between relevant and irrelevant documents for a given query has a significant impact on the effectiveness of learning to rank. How to exploit this uncertain data to benefit semi-supervised learning to rank is therefore an important challenge. In this paper, we propose a semi-supervised learning to rank algorithm that builds a query-quality predictor by utilizing uncertain data. Specifically, this approach selects training queries following the empirical observation that the relevant documents of high-quality training queries are highly coherent. It learns from the uncertain data to predict, from query features, the retrieval performance gain of a given training query. The pseudo labels for learning to rank are then aggregated iteratively by semi-supervised learning with the selected queries. Experimental results on the standard LETOR dataset show that our proposed approaches outperform strong baselines.
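The query-selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the paper learns a query-quality predictor from query features, whereas this sketch assumes a fixed threshold on a simple coherence score (mean pairwise cosine similarity of the documents pseudo-labeled as relevant). The function names, the threshold value, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence(doc_vectors):
    """Mean pairwise cosine similarity of a query's pseudo-relevant documents.

    Assumption: this score is a hypothetical stand-in for query quality.
    Returns 0.0 when there are fewer than two documents.
    """
    pairs = list(combinations(doc_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def select_queries(pseudo_relevant, threshold=0.8):
    """Keep queries whose pseudo-relevant documents are coherent enough.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [q for q, docs in pseudo_relevant.items() if coherence(docs) >= threshold]


# Toy data: query id -> feature vectors of documents pseudo-labeled as relevant.
pseudo_relevant = {
    "q1": [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]],   # tightly clustered documents
    "q2": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # scattered documents
}
selected = select_queries(pseudo_relevant)
```

In an iterative setting, only the pseudo labels of the selected queries would be added to the training set before the ranker is retrained and the remaining queries are re-scored.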
Supported by the Natural Science Foundation of Hefei University (18ZR07ZDA, 19ZR04ZDA), the National Natural Science Foundation of China (Grant No. 61806068), the natural science research key project of Anhui university (Grant No. KJ2018A0556), and a grant of the Natural Science Foundation of Hefei University (Grant No. 16-17RC19, 18-19RC27).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, X., Zhao, Z., Liu, C., Zhang, C., Cheng, Z. (2019). Semi-supervised Learning to Rank with Uncertain Data. In: Ni, W., Wang, X., Song, W., Li, Y. (eds) Web Information Systems and Applications. WISA 2019. Lecture Notes in Computer Science, vol 11817. Springer, Cham. https://doi.org/10.1007/978-3-030-30952-7_4
DOI: https://doi.org/10.1007/978-3-030-30952-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30951-0
Online ISBN: 978-3-030-30952-7
eBook Packages: Computer Science, Computer Science (R0)