Abstract
In this paper, we propose a novel ranking framework for collaborative filtering with the overall aim of learning user preferences over items by minimizing a pairwise ranking loss. We show the minimization problem involves dependent random variables and provide a theoretical analysis by proving the consistency of the empirical risk minimization in the worst case where all users choose a minimal number of positive and negative items. We further derive a Neural-Network model that jointly learns a new representation of users and items in an embedded space as well as the preference relation of users over the pairs of items. The learning objective is based on three scenarios of ranking losses that control the ability of the model to maintain the ordering over the items induced from the users’ preferences, as well as, the capacity of the dot-product defined in the learned embedded space to produce the ordering. The proposed model is by nature suitable for implicit feedback and involves the estimation of only very few parameters. Through extensive experiments on several real-world benchmarks on implicit data, we show the interest of learning the preference and the embedding simultaneously when compared to learning those separately. We also demonstrate that our approach is very competitive with the best state-of-the-art collaborative filtering techniques proposed for implicit feedback.



Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Talk of Xavier Amatriain - Recommender Systems - Machine Learning Summer School 2014 @ CMU.
B. James and L. Stan, The Netflix Prize (2007).
References
Ailon N, Mohri M (2008) An efficient reduction of ranking to classification. In: Proceedings of computational learning theory (COLT), pp. 87–98
Amini M-R, Usunier N (2015) Learning with partially labeled and interdependent data. Springer, New York
Bottou L (2012) Stochastic gradient descent tricks. In: Neural networks: tricks of the trade - Second Edition, pp. 421–436
Burges C. J. C, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender G. N (2005) Learning to rank using gradient descent. In: Proceedings of international conference on machine learning (ICML), pp. 89–96
Caruana R, Baluja S, Mitchell T. M (1995) Using the future to sort out the present: Rankprop and multitask learning for medical risk evaluation. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 959–965
Cheng H.-T, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, Anderson G, Corrado G, Chai W, Ispir M, Anil R, Haque Z, Hong L, Jain V, Liu X, Shah H (2016) Wide & deep learning for recommender systems. In: Proceedings of the 1st workshop on deep learning for recommender systems, pp. 7–10
Cheng H, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, Anderson G, Corrado G, Chai W, Ispir M, Anil R, Haque Z, Hong L, Jain V, Liu X, Shah H (2016) Wide & deep learning for recommender systems. In: Proceedings of the 1st workshop on deep learning for recommender systems, pp. 7–10
Cohen WW, Schapire RE, Singer Y (1999) Learning to order things. J Artif Intell Res (JAIR) 10:243–270
Crammer K, Singer Y (2001) Pranking with ranking. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 641–647
Dacrema MF, Cremonesi P, Jannach D (2019) Are we really making much progress? A worrying analysis of recent neural recommendation approaches. In: Proceedings of the 13th ACM conference on recommender systems, RecSys 2019, Copenhagen, Denmark, September 16-20, 2019, pp. 101–109
Freund Y, Iyer RD, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
Grbovic M, Radosavljevic V, Djuric N, Bhamidipati N, Savla J, Bhagwan V, Sharp D (2015) E-commerce in your inbox: product recommendations at scale. In: Proceedings of SIGKDD conference on knowledge discovery and data mining, pp. 1809–1818
Guàrdia-Sebaoun É, Guigue V, Gallinari P (2015) Latent trajectory modeling: a light and efficient way to introduce time in recommender systems. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 281–284
Harper FM, Konstan JA (2015) The movielens datasets: history and context. ACM Trans Interact Intell Syst 5(4):19:1–19:19
He X, Liao L, Zhang H, Nie L, Hu X, Chua T (2017) Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web (WWW), pp. 173–182
He X, Liao L, Zhang H, Nie L, Hu X, Chua T (2017) Neural collaborative filtering. In: Proceedings of the international conference on world wide web companion (WWW), pp. 173–182
Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for implicit feedback datasets. In: Proceedings of the international conference on data mining (ICDM), pp. 263–272
Janson S (2004) Large deviations for sums of partly dependent random variables. Random Struct Algorithms 24(3):234–248
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of SIGKDD conference on knowledge discovery and data mining, pp. 133–142
Kingma D. P, Ba J (2014) Adam: a method for stochastic optimization. In: Proceedings of the 3rd international conference on learning representations (ICLR)
Kula M (2015) Metadata embeddings for user and item cold-start recommendations. In: Proceedings of the 2nd workshop on new trends on content-based recommender systems co-located with RecSys., pp. 14–21
Lehmann E, D’Abrera H (2006) Nonparametrics: statistical methods based on ranks. Springer, Berlin
Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 2177–2185
Liang D, Altosaar J, Charlin L, Blei DM (2016) Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 59–66
Li P, Burges C. J. C, Wu Q (2007) Mcrank: learning to rank using multiple classification and gradient boosting. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 897–904
Liu T (2009) Learning to rank for information retrieval. Found Trends Inf Retrieval 3(3):225–331
Lops P, de Gemmis M, Semeraro G (2011) Content-based recommender systems: state of the art and trends. In: Ricci F, Rokach L, Shapira B, Kantor PB (Eds.), Recommender systems handbook, Springer, Berlin pp. 73–105
McDiarmid C (1989) On the method of bounded differences. Survey Combinatorics 141:148–188
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of the international conference on learning representations (ICLR)
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of international conference on neural information processing systems (NIPS), pp. 3111–3119
Pessiot J, Truong T, Usunier N, Amini M.-R, Gallinari P (2007) Learning to rank for collaborative filtering. In: Proceedings of ICEIS, pp. 145–151
Ralaivola L, Amini MR (2015) Entropy-based concentration inequalities for dependent variables. In: Proceedings of intenational conference on machine learning (ICML), pp. 2436–2444
Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L (2009) BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the international conference on uncertainty in artificial intelligence (UAI), pp. 452–461
Ricci F, Rokach L, Shapira B, Kantor PB (2010) Recommender systems handbook, 1st edn. Springer, New York
Rigutini L, Papini T, Maggini M, Scarselli F (2011) Sortnet: learning to rank by a neural preference function. IEEE Trans Neural Netw 22(9):1368–1380
Rigutini L, Papini T, Maggini M, Bianchini M (2008) A neural network approach for learning object ranking. In: Proceedings of international conference on artificial neural networks (ICANN), pp. 899–908
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, New York, NY, USA
Shi Y, Larson M, Hanjalic A (2010) List-wise learning to rank with matrix factorization for collaborative filtering. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 269–272
Sidana S, Laclau C, Amini MR (2018) Learning to recommend diverse items over implicit feedback on PANDOR. In: Proceedings of the \(12^{th}\) ACM conference on recommender systems (RecSys), pp. 427–431
Sidana S, Laclau C, Amini M-R, Vandelle G, Bois-Crettez A (2017) KASANDR: a large-scale dataset with implicit feedback for recommendation
Usunier N, Amini MR, Gallinari P (2006) Generalization error bounds for classifiers trained with interdependent data. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 1369–1376
Usunier N, Amini M.-R, Gallinari P (2005) A data-dependent generalisation error bound for the AUC. In: ICML workshop on ROC analysis in machine learning, Bonn, Germany
Vapnik V (2000) The nature of statistical learning theory. Springer, Berlin
Vasile F, Smirnova E, Conneau A (2016) Meta-prod2vec: product embeddings using side-information for recommendation. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 225–232
Wang H, Wang N, Yeung D (2015) Collaborative deep learning for recommender systems. In: Proceedings of SIGKDD conference on knowledge discovery and data mining, pp. 1235–1244
White R, Jose JM, Ruthven I (2001) Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report, in: Proceedings of Proceedings of the Tenth Text Retrieval Conference (TREC), pp. 534–538
Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp. 391–398
Xu J, Liu T, Lu M, Li H, Ma W (2008) Directly optimizing evaluation measures in learning to rank. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp. 107–114
Zhang Y, Ai Q, Chen X, Croft WB (2017) Joint representation learning for top-n recommendation with heterogeneous information sources. In: Proceedings of the 2017 ACM on conference on information and knowledge management, CIKM ’17
Zhang W, Yuan Q, Han J, Wang J (2016) Collaborative multi-level embedding learning from reviews for rating prediction. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI), pp. 2986–2992
Acknowledgements
This work was partly done under the Calypso project supported by the FEDER program from the Région Auvergne-Rhône-Alpes. The work of Yury Maximov at LANL was carried out under the auspices of the National Nuclear Security Administration of the US Department of Energy under Contract No. DE-AC52-06NA25396 and CNLS/LANL support.
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
Theorem 1
Let \({\mathcal {U}}\) be a set of M independent users, such that each user \(u \in {\mathcal {U}}\) prefers \(n_*^+\) items over \(n_*^-\) ones in a predefined set of \({\mathcal {I}}\) items. Let \(S=\{({\mathbf {z}}_{i,u,i'}\doteq (i,u,i'),y_{i,u,i'})\mid u\in {\mathcal {U}}, (i,i')\in {\mathcal {I}}^+_u\times {\mathcal {I}}^-_u\}\) be the associated training set, then for any \(1>\delta >0\) the following generalization bound holds for all \(f\in {\mathcal {F}}_{B,r}\) with probability at least \(1-\delta \):
where \({\mathfrak {C}}(S)=\sqrt{ \frac{1}{n^-_*}\sum _{j=1}^{n_*^-}{\mathbb {E}}_{{\mathcal {M}}_j}\left[ \sum _{\begin{array}{c} \alpha \in {\mathcal {M}}_j \\ {\mathbf {z}}_{\alpha } \in S \end{array}}d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha }))\right] }\), \({\mathbf {z}}_\alpha =(i_\alpha ,u_\alpha ,i'_\alpha )\) and
Proof
Let \(n_\star ^+\) and respectively \(n_\star ^-\) be the minimum number of preferred and non-preferred items for any user \(u\in {\mathcal {U}}\), then we have:
As the set of users \({\mathcal {U}}\) is supposed to be independent, the exact fractional cover of the dependency graph corresponding to the training set S will be the union of the exact fractional cover associated to each user such that cover sets which do not contain any items in common are joined together.
Following [Ralaivola and Amini (2015), Proposition 4], for any \(1>\delta >0\) we have with probability at least \(1-\delta \):
The infimum is reached for \(\beta ^*=\sqrt{\frac{25}{16}\frac{\log \frac{1}{\delta }}{n_*^+\times {\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})}}\) which by plugging it back into the upper-bound, and from Eq. (4), gives:
Now, for all \(j\in \{1,\ldots ,J\}\) and \(\alpha \in {\mathcal {M}}_j\), let \((u_\alpha ,i_\alpha )\) and \((u_\alpha ,i'_\alpha )\) be the first and the second pair constructed from \({\mathbf {z}}_\alpha \), then from the bilinearity of dot product and the Cauchy-Schwartz inequality, \({\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})\) is upper-bounded by:
where the last inequality follows from Jensen’s inequality and the concavity of the square root, and
Further, for all \(j\in \{1,\ldots ,n^-_*\}, \alpha ,\alpha ' \in {\mathcal {M}}_j, \alpha \ne \alpha '; \) we have \({\mathbb {E}}_\xi [\xi _\alpha \xi _{\alpha '}]=0\), [Shawe-Taylor and Cristianini (2004), p. 91] so:
By using Jensen’s inequality and the concavity of the square root once again, we finally get
Rights and permissions
About this article
Cite this article
Sidana, S., Trofimov, M., Horodnytskyi, O. et al. User preference and embedding learning with implicit feedback for recommender systems. Data Min Knowl Disc 35, 568–592 (2021). https://doi.org/10.1007/s10618-020-00730-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-020-00730-8