
User preference and embedding learning with implicit feedback for recommender systems


Abstract

In this paper, we propose a novel ranking framework for collaborative filtering, with the overall aim of learning user preferences over items by minimizing a pairwise ranking loss. We show that the minimization problem involves dependent random variables and provide a theoretical analysis by proving the consistency of the empirical risk minimization in the worst case, where all users choose a minimal number of positive and negative items. We further derive a neural-network model that jointly learns a new representation of users and items in an embedded space as well as the preference relation of users over pairs of items. The learning objective is based on three scenarios of ranking losses that control both the ability of the model to maintain the ordering over the items induced from the users' preferences and the capacity of the dot product defined in the learned embedded space to produce that ordering. The proposed model is by nature suitable for implicit feedback and requires the estimation of only very few parameters. Through extensive experiments on several real-world benchmarks with implicit data, we demonstrate the benefit of learning the preference and the embedding simultaneously compared to learning them separately. We also show that our approach is very competitive with the best state-of-the-art collaborative filtering techniques proposed for implicit feedback.
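
To make the learning problem concrete, the sketch below trains user and item embeddings jointly by stochastic gradient descent on a logistic surrogate of the pairwise ranking loss over implicit-feedback triplets (u, i, i'). This is a minimal BPR-style illustration in NumPy, not the paper's full neural architecture (cf. the NERvE repository in the notes); all sizes and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's benchmarks (MovieLens, Netflix, KASANDR, PANDOR) are far larger.
n_users, n_items, dim = 100, 500, 16
lr, n_epochs = 0.05, 10

# Jointly learned representations: one embedding per user and per item.
U = 0.1 * rng.standard_normal((n_users, dim))
V = 0.1 * rng.standard_normal((n_items, dim))

# Implicit feedback as triplets (u, i, j): user u preferred (e.g. clicked) item i over item j.
triplets = [(rng.integers(n_users), rng.integers(n_items), rng.integers(n_items))
            for _ in range(5000)]

for _ in range(n_epochs):
    for u, i, j in triplets:
        # Score of a (user, item) pair: dot product in the learned embedded space.
        margin = U[u] @ V[i] - U[u] @ V[j]
        # Logistic surrogate of the pairwise 0/1 ranking loss: -log sigmoid(margin).
        g = -1.0 / (1.0 + np.exp(margin))  # derivative of the loss w.r.t. the margin
        grad_u, grad_i, grad_j = g * (V[i] - V[j]), g * U[u], -g * U[u]
        U[u] -= lr * grad_u
        V[i] -= lr * grad_i
        V[j] -= lr * grad_j

# Items are then recommended to user u by ranking them by U[u] @ V.T.
```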


Notes

  1. Talk by Xavier Amatriain, Recommender Systems, Machine Learning Summer School 2014 @ CMU.

  2. https://www.tensorflow.org/.

  3. https://movielens.org/.

  4. http://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a.

  5. J. Bennett and S. Lanning, The Netflix Prize (2007).

  6. https://archive.ics.uci.edu/ml/datasets/KASANDR.

  7. https://archive.ics.uci.edu/ml/datasets/PANDOR.

  8. https://github.com/hexiangnan/neural_collaborative_filtering.

  9. https://github.com/tensorflow/models/tree/master/official/r1/wide_deep

  10. https://github.com/sumitsidana/NERvE.

References

  • Ailon N, Mohri M (2008) An efficient reduction of ranking to classification. In: Proceedings of computational learning theory (COLT), pp. 87–98

  • Amini M-R, Usunier N (2015) Learning with partially labeled and interdependent data. Springer, New York


  • Bottou L (2012) Stochastic gradient descent tricks. In: Neural networks: tricks of the trade - Second Edition, pp. 421–436

  • Burges CJC, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender GN (2005) Learning to rank using gradient descent. In: Proceedings of international conference on machine learning (ICML), pp. 89–96

  • Caruana R, Baluja S, Mitchell TM (1995) Using the future to sort out the present: Rankprop and multitask learning for medical risk evaluation. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 959–965

  • Cheng H-T, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, Anderson G, Corrado G, Chai W, Ispir M, Anil R, Haque Z, Hong L, Jain V, Liu X, Shah H (2016) Wide & deep learning for recommender systems. In: Proceedings of the 1st workshop on deep learning for recommender systems, pp. 7–10


  • Cohen WW, Schapire RE, Singer Y (1999) Learning to order things. J Artif Intell Res (JAIR) 10:243–270


  • Crammer K, Singer Y (2001) Pranking with ranking. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 641–647

  • Dacrema MF, Cremonesi P, Jannach D (2019) Are we really making much progress? A worrying analysis of recent neural recommendation approaches. In: Proceedings of the 13th ACM conference on recommender systems (RecSys), pp. 101–109

  • Freund Y, Iyer RD, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969


  • Grbovic M, Radosavljevic V, Djuric N, Bhamidipati N, Savla J, Bhagwan V, Sharp D (2015) E-commerce in your inbox: product recommendations at scale. In: Proceedings of SIGKDD conference on knowledge discovery and data mining, pp. 1809–1818

  • Guàrdia-Sebaoun É, Guigue V, Gallinari P (2015) Latent trajectory modeling: a light and efficient way to introduce time in recommender systems. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 281–284

  • Harper FM, Konstan JA (2015) The movielens datasets: history and context. ACM Trans Interact Intell Syst 5(4):19:1–19:19


  • He X, Liao L, Zhang H, Nie L, Hu X, Chua T (2017) Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web (WWW), pp. 173–182


  • Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for implicit feedback datasets. In: Proceedings of the international conference on data mining (ICDM), pp. 263–272

  • Janson S (2004) Large deviations for sums of partly dependent random variables. Random Struct Algorithms 24(3):234–248


  • Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of SIGKDD conference on knowledge discovery and data mining, pp. 133–142

  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. In: Proceedings of the 3rd international conference on learning representations (ICLR)

  • Kula M (2015) Metadata embeddings for user and item cold-start recommendations. In: Proceedings of the 2nd workshop on new trends on content-based recommender systems co-located with RecSys, pp. 14–21

  • Lehmann E, D’Abrera H (2006) Nonparametrics: statistical methods based on ranks. Springer, Berlin


  • Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 2177–2185

  • Liang D, Altosaar J, Charlin L, Blei DM (2016) Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 59–66

  • Li P, Burges CJC, Wu Q (2007) McRank: learning to rank using multiple classification and gradient boosting. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 897–904

  • Liu T (2009) Learning to rank for information retrieval. Found Trends Inf Retrieval 3(3):225–331


  • Lops P, de Gemmis M, Semeraro G (2011) Content-based recommender systems: state of the art and trends. In: Ricci F, Rokach L, Shapira B, Kantor PB (Eds.), Recommender systems handbook, Springer, Berlin pp. 73–105

  • McDiarmid C (1989) On the method of bounded differences. Surveys in Combinatorics 141:148–188


  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of the international conference on learning representations (ICLR)

  • Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of international conference on neural information processing systems (NIPS), pp. 3111–3119

  • Pessiot J, Truong T, Usunier N, Amini M-R, Gallinari P (2007) Learning to rank for collaborative filtering. In: Proceedings of the international conference on enterprise information systems (ICEIS), pp. 145–151

  • Ralaivola L, Amini M-R (2015) Entropy-based concentration inequalities for dependent variables. In: Proceedings of international conference on machine learning (ICML), pp. 2436–2444

  • Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L (2009) BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the international conference on uncertainty in artificial intelligence (UAI), pp. 452–461

  • Ricci F, Rokach L, Shapira B, Kantor PB (2010) Recommender systems handbook, 1st edn. Springer, New York


  • Rigutini L, Papini T, Maggini M, Scarselli F (2011) Sortnet: learning to rank by a neural preference function. IEEE Trans Neural Netw 22(9):1368–1380


  • Rigutini L, Papini T, Maggini M, Bianchini M (2008) A neural network approach for learning object ranking. In: Proceedings of international conference on artificial neural networks (ICANN), pp. 899–908

  • Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, New York, NY, USA


  • Shi Y, Larson M, Hanjalic A (2010) List-wise learning to rank with matrix factorization for collaborative filtering. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 269–272

  • Sidana S, Laclau C, Amini M-R (2018) Learning to recommend diverse items over implicit feedback on PANDOR. In: Proceedings of the 12th ACM conference on recommender systems (RecSys), pp. 427–431

  • Sidana S, Laclau C, Amini M-R, Vandelle G, Bois-Crettez A (2017) KASANDR: a large-scale dataset with implicit feedback for recommendation. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval

  • Usunier N, Amini MR, Gallinari P (2006) Generalization error bounds for classifiers trained with interdependent data. In: Proceedings of the international conference on neural information processing systems (NIPS), pp. 1369–1376

  • Usunier N, Amini M-R, Gallinari P (2005) A data-dependent generalisation error bound for the AUC. In: ICML workshop on ROC analysis in machine learning, Bonn, Germany

  • Vapnik V (2000) The nature of statistical learning theory. Springer, Berlin


  • Vasile F, Smirnova E, Conneau A (2016) Meta-prod2vec: product embeddings using side-information for recommendation. In: Proceedings of the ACM recommender systems conference (RecSys), pp. 225–232

  • Wang H, Wang N, Yeung D (2015) Collaborative deep learning for recommender systems. In: Proceedings of SIGKDD conference on knowledge discovery and data mining, pp. 1235–1244

  • White R, Jose JM, Ruthven I (2001) Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report. In: Proceedings of the tenth text retrieval conference (TREC), pp. 534–538

  • Xu J, Li H (2007) AdaRank: a boosting algorithm for information retrieval. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp. 391–398

  • Xu J, Liu T, Lu M, Li H, Ma W (2008) Directly optimizing evaluation measures in learning to rank. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp. 107–114

  • Zhang Y, Ai Q, Chen X, Croft WB (2017) Joint representation learning for top-n recommendation with heterogeneous information sources. In: Proceedings of the 2017 ACM on conference on information and knowledge management, CIKM ’17

  • Zhang W, Yuan Q, Han J, Wang J (2016) Collaborative multi-level embedding learning from reviews for rating prediction. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI), pp. 2986–2992


Acknowledgements

This work was partly done under the Calypso project supported by the FEDER program from the Région Auvergne-Rhône-Alpes. The work of Yury Maximov at LANL was carried out under the auspices of the National Nuclear Security Administration of the US Department of Energy under Contract No. DE-AC52-06NA25396 and CNLS/LANL support.

Author information


Corresponding author

Correspondence to Massih-Reza Amini.

Additional information

Responsible editor: Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Theorem 1

Let \({\mathcal {U}}\) be a set of N independent users, such that each user \(u \in {\mathcal {U}}\) prefers \(n_*^+\) items over \(n_*^-\) ones in a predefined set \({\mathcal {I}}\) of items. Let \(S=\{({\mathbf {z}}_{i,u,i'}\doteq (i,u,i'),y_{i,u,i'})\mid u\in {\mathcal {U}}, (i,i')\in {\mathcal {I}}^+_u\times {\mathcal {I}}^-_u\}\) be the associated training set; then for any \(1>\delta >0\) the following generalization bound holds for all \(f\in {\mathcal {F}}_{B,r}\) with probability at least \(1-\delta \):

$$\begin{aligned} {\mathcal {L}}(f)\le ~~&{\hat{{\mathcal {L}}}}_{*}(f,S) + \frac{2B{\mathfrak {C}}(S)}{Nn^+_*}\\&+\frac{5}{2}\left( \sqrt{\frac{2B{\mathfrak {C}}(S)}{Nn^+_*}}+\sqrt{\frac{r}{2}}\right) \sqrt{\frac{\log \frac{1}{\delta }}{n_*^+}}+\frac{25}{48}\frac{\log \frac{1}{\delta }}{n_*^+}, \end{aligned}$$

where \({\mathfrak {C}}(S)=\sqrt{ \frac{1}{n^-_*}\sum _{j=1}^{n_*^-}{\mathbb {E}}_{{\mathcal {M}}_j}\left[ \sum _{\begin{array}{c} \alpha \in {\mathcal {M}}_j \\ {\mathbf {z}}_{\alpha } \in S \end{array}}d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha })\right] }\), \({\mathbf {z}}_\alpha =(i_\alpha ,u_\alpha ,i'_\alpha )\) and

$$\begin{aligned} d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha })=~&\kappa (\Phi (u_\alpha ,i_\alpha ),\Phi (u_\alpha ,i_\alpha ))\\&\quad +\kappa (\Phi (u_\alpha ,i'_\alpha ),\Phi (u_\alpha ,i'_\alpha ))- 2\kappa (\Phi (u_\alpha ,i_\alpha ),\Phi (u_\alpha ,i'_\alpha )). \end{aligned}$$
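
To unpack the quantity \(d({\mathbf {z}}_\alpha ,{\mathbf {z}}_\alpha )\): when \(\kappa \) is the dot product, it is exactly the squared distance between the representations of the preferred and the non-preferred pair. The following sketch checks this numerically; the joint feature map \(\Phi \) used here is a hypothetical stand-in for the learned one.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

def phi(u_vec, i_vec):
    # Hypothetical joint feature map Phi(u, i); in the paper it is learned.
    return u_vec * i_vec

u = rng.standard_normal(dim)
i_pos = rng.standard_normal(dim)  # a preferred item
i_neg = rng.standard_normal(dim)  # a non-preferred item

a, b = phi(u, i_pos), phi(u, i_neg)
# d(z, z) from the theorem, with kappa(x, y) = <x, y>:
d_zz = a @ a + b @ b - 2 * (a @ b)
# ...which equals ||Phi(u, i) - Phi(u, i')||^2:
assert np.isclose(d_zz, np.sum((a - b) ** 2))
```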

Proof

Let \(n_*^+\) and \(n_*^-\) be respectively the minimum number of preferred and non-preferred items over all users \(u\in {\mathcal {U}}\). Since every user has at least \(n_*^+\) preferred and \(n_*^-\) non-preferred items, replacing each user's pair counts by these minima in the normalization can only increase the weight of each summand, so we have:

$$\begin{aligned} {{\hat{{\mathcal {L}}}}}(f,S)\le \underbrace{\frac{1}{N}\frac{1}{n_*^- n_*^+}\sum _{u\in {\mathcal {U}}}\sum _{i\in {\mathcal {I}}^+_u}\sum _{i'\in {\mathcal {I}}^-_u} \mathbb {1}_{y_{i,u,i'}f(i,u,i')<0}}_{={\hat{{\mathcal {L}}}}_{*}(f,S)}. \end{aligned}$$
(9)
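
Read concretely, \({\hat{{\mathcal {L}}}}_{*}(f,S)\) is the fraction of misordered (preferred, non-preferred) pairs under the worst-case normalization \(Nn_*^+n_*^-\). A minimal sketch with toy data and a hypothetical scorer:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 4, 10
n_pos, n_neg = 3, 3                               # n_*^+ and n_*^- in the notation above

scores = rng.standard_normal((n_users, n_items))  # hypothetical scores, one per (user, item)
pos = {u: [0, 1, 2] for u in range(n_users)}      # preferred items I_u^+
neg = {u: [7, 8, 9] for u in range(n_users)}      # non-preferred items I_u^-

# Fraction of pairs where a non-preferred item outscores a preferred one, as in Eq. (9):
loss = sum(scores[u, i] < scores[u, j]
           for u in range(n_users)
           for i in pos[u] for j in neg[u]) / (n_users * n_pos * n_neg)
print(f"empirical pairwise ranking loss: {loss:.3f}")
```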

As the users in \({\mathcal {U}}\) are assumed to be independent, the exact fractional cover of the dependency graph associated with the training set S is the union of the exact fractional covers associated with each user, where cover sets that share no items are merged.

Following [Ralaivola and Amini (2015), Proposition 4], for any \(1>\delta >0\) we have with probability at least \(1-\delta \):

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_S[{\hat{{\mathcal {L}}}}_{*}(f,S)]&-{\hat{{\mathcal {L}}}}_{*}(f,S)\\&\le \inf _{\beta >0}\left( (1+\beta ){\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})+\frac{5}{4}\sqrt{\frac{2r\log \frac{1}{\delta }}{n_*^+}}+\frac{25}{16}\left( \frac{1}{3}+\frac{1}{\beta }\right) \frac{\log \frac{1}{\delta }}{n_*^+}\right) . \end{aligned} \end{aligned}$$

The infimum is attained at \(\beta ^*=\sqrt{\frac{25}{16}\frac{\log \frac{1}{\delta }}{n_*^+\times {\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})}}\); plugging this value back into the upper bound, and using Eq. (4), gives:

$$\begin{aligned} {\mathcal {L}}(f) \le {\hat{{\mathcal {L}}}}_{*}(f,S) + {\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})+\frac{5}{2}\left( \sqrt{{\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})}+\sqrt{\frac{r}{2}}\right) \sqrt{\frac{\log \frac{1}{\delta }}{n_*^+}}+\frac{25}{48}\frac{\log \frac{1}{\delta }}{n_*^+}. \end{aligned}$$
(10)
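
For completeness, the minimization behind the choice of \(\beta ^*\) is elementary. Writing \(R={\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})\) and \(c=\frac{25}{16}\frac{\log \frac{1}{\delta }}{n_*^+}\), the \(\beta \)-dependent part of the bound is \(\beta R+c/\beta \), and

$$\begin{aligned} \frac{d}{d\beta }\left( \beta R+\frac{c}{\beta }\right) =R-\frac{c}{\beta ^2}=0\;\Longrightarrow \;\beta ^*=\sqrt{\frac{c}{R}},\qquad \beta ^*R+\frac{c}{\beta ^*}=2\sqrt{cR}, \end{aligned}$$

where \(2\sqrt{cR}=\frac{5}{2}\sqrt{R}\sqrt{\frac{\log \frac{1}{\delta }}{n_*^+}}\), which, combined with the term \(\frac{5}{4}\sqrt{\frac{2r\log \frac{1}{\delta }}{n_*^+}}\) and with \(\frac{c}{3}=\frac{25}{48}\frac{\log \frac{1}{\delta }}{n_*^+}\), rearranges into the last three terms of Eq. (10).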

Now, for all \(j\in \{1,\ldots ,n_*^-\}\) and \(\alpha \in {\mathcal {M}}_j\), let \((u_\alpha ,i_\alpha )\) and \((u_\alpha ,i'_\alpha )\) be the first and second pairs constructed from \({\mathbf {z}}_\alpha \). Then, from the bilinearity of the dot product and the Cauchy-Schwarz inequality, \({\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})\) is upper-bounded by:

$$\begin{aligned} \frac{2}{m}{\mathbb {E}}_{\xi }\sum _{j=1}^{n_*^-}{\mathbb {E}}_{{\mathcal {M}}_j}\sup _{f\in {\mathcal {F}}_{B,r}}&\left\langle {\varvec{w}},\sum _{\begin{array}{c} \alpha \in {\mathcal {M}}_j \\ {\mathbf {z}}_\alpha \in S \end{array}}\xi _\alpha \left( \Phi (u_\alpha ,i_\alpha )-\Phi (u_\alpha ,i'_\alpha )\right) \right\rangle \nonumber \\&\le \frac{2B}{m}\sum _{j=1}^{n_*^-}{\mathbb {E}}_{{\mathcal {M}}_j} {\mathbb {E}}_{\xi }\left\| \sum _{\begin{array}{c} \alpha \in {\mathcal {M}}_j \\ {\mathbf {z}}_\alpha \in S \end{array}}\xi _\alpha (\Phi (u_\alpha ,i_\alpha )-\Phi (u_\alpha ,i'_\alpha ))\right\| \nonumber \\&\le \frac{2B}{m}\sum _{j=1}^{n_*^-}\left( {\mathbb {E}}_{{\mathcal {M}}_j \xi }\left[ \sum _{\begin{array}{c} \alpha ,\alpha '\in {\mathcal {M}}_j \\ {\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha '} \in S \end{array}}\xi _\alpha \xi _{\alpha '}d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha '})\right] \right) ^{1/2}, \end{aligned}$$
(11)

where the last inequality follows from Jensen’s inequality and the concavity of the square root, and

$$\begin{aligned} d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha '})=\left\langle \Phi (u_\alpha ,i_\alpha )-\Phi (u_\alpha ,i'_\alpha ),\Phi (u_\alpha ,i_\alpha )-\Phi (u_\alpha ,i'_\alpha )\right\rangle . \end{aligned}$$

Further, for all \(j\in \{1,\ldots ,n^-_*\}\) and \(\alpha ,\alpha ' \in {\mathcal {M}}_j\) with \(\alpha \ne \alpha '\), we have \({\mathbb {E}}_\xi [\xi _\alpha \xi _{\alpha '}]=0\) [Shawe-Taylor and Cristianini (2004), p. 91], so:

$$\begin{aligned} {\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})\le&~\frac{2B}{m}\sum _{j=1}^{n_*^-}\left( {\mathbb {E}}_{{\mathcal {M}}_j}\left[ \sum _{\begin{array}{c} \alpha \in {\mathcal {M}}_j \\ {\mathbf {z}}_\alpha \in S \end{array}}d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha })\right] \right) ^{1/2}\\ =&~ \frac{2Bn^-_*}{m}\sum _{j=1}^{n_*^-}\frac{1}{n^-_*}\left( {\mathbb {E}}_{{\mathcal {M}}_j}\left[ \sum _{\begin{array}{c} \alpha \in {\mathcal {M}}_j \\ {\mathbf {z}}_\alpha \in S \end{array}}d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha })\right] \right) ^{1/2}. \end{aligned}$$

By using Jensen's inequality and the concavity of the square root once again, and recalling that \(m=Nn_*^+n_*^-\) is the total number of triplets in S, we finally get

$$\begin{aligned} {\mathfrak {R}}_{S}({\mathcal {F}}_{B,r})\le \frac{2B}{Nn^+_*}\sqrt{\sum _{j=1}^{n_*^-}\frac{1}{n^-_*}{\mathbb {E}}_{{\mathcal {M}}_j}\left[ \sum _{\begin{array}{c} \alpha \in {\mathcal {M}}_j \\ {\mathbf {z}}_\alpha \in S \end{array}}d({\mathbf {z}}_\alpha ,{\mathbf {z}}_{\alpha })\right] }. \end{aligned}$$
(12)

The result follows from Eqs. (10) and (12). \(\square \)


About this article


Cite this article

Sidana, S., Trofimov, M., Horodnytskyi, O. et al. User preference and embedding learning with implicit feedback for recommender systems. Data Min Knowl Disc 35, 568–592 (2021). https://doi.org/10.1007/s10618-020-00730-8

