Explainable recommendations with nonnegative matrix factorization

Abstract

Explainable recommendation systems have been shown to improve the persuasiveness of recommendations, enabling users to trust the system more and make better-informed decisions. Nonnegative Matrix Factorization (NMF) produces interpretable solutions for many applications, including collaborative filtering, because of its nonnegativity. However, the latent features make it difficult to explain recommendation results to users, since we do not know the specific meaning of the features that users are interested in, nor the extent to which the items or users belong to these features. To overcome this difficulty, we develop a novel method called Partially Explainable Nonnegative Matrix Factorization (PE-NMF), which employs explicit data to replace part of the latent variables of the item-feature matrix, so that users can learn more about the features of the items and then make better decisions and recommendations. The objective function of PE-NMF is composed of two parts: one corresponding to explicit features and the other to implicit features. We develop an iterative method to minimize the objective function and derive the iterative update rules, under which the objective function is proved to be non-increasing. Finally, experiments are carried out on the Yelp, Amazon and Dianping datasets, and the results demonstrate that PE-NMF maintains high prediction performance on both rating prediction and top-N recommendation compared with fully explainable nonnegative matrix factorization (FE-NMF), which is obtained by using explicit opinions in place of the item-feature matrix. PE-NMF also retains almost the same recommendation ability as NMF.

Notes

  1. Amazon data set can be found at http://jmcauley.ucsd.edu/data/amazon/.

  2. Yelp data set can be found at https://www.yelp.com/dataset.

  3. Dianping data set can be found at https://doi.org/10.18170/DVN/GCIUN4.

References

  • Abdollahi B, Nasraoui O (2016) Explainable matrix factorization for collaborative filtering, Proceedings of the 25th international conference companion on world wide web. International world wide web conferences steering committee. WWW’16 Companion, April 11–15, Montreal, Quebec, Canada

  • Abdollahi B, Nasraoui O (2017) Using explainability for constrained matrix factorization. In: Proceedings of the eleventh ACM conference on recommender systems, pp 79–83

  • Adeel A, Khalid S, Osman K (2021) On deep neural network for trust aware cross domain recommendations in E-commerce. Expert Syst Appl 174:114757

  • Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17:734–749

  • Aghdam MH (2022) A novel constrained non-negative matrix factorization method based on users and items pairwise relationship for recommender systems. Expert Syst Appl 195:116593

  • Behera G, Nain N (2022) DeepNNMF: deep nonlinear non-negative matrix factorization to address sparsity problem of collaborative recommender system. Int J Inf Technol 14:3637–3645

  • Belkin M, Niyogi P, Sindhwani V (2004) Manifold regularization: a geometric framework for learning from examples. J Mach Learn Res 7:2399–2434

  • Bobadilla J, Serradilla F, Hernando A (2009) Collaborative filtering adapted to recommender systems of e-learning. Knowl-Based Syst 22:261–265

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the Em algorithm. J R Stat Soc 39:1–38

  • Fatemeh R, Chitra D (2021) A survey of attack detection approaches in collaborative filtering recommender systems. Artif Intell Rev 54:2011–2066

  • Hernando A, Bobadilla J, Ortega F (2016) A non negative matrix factorization for collaborative filtering recommender systems based on a Bayesian probabilistic model. Knowl-Based Syst 97:188–202

  • Hongyan C (2020) Personalized recommendation of film and television culture based on an intelligent classification algorithm. Pers Ubiquitous Comput 24:165–176

  • Hua LP, Jing BW, Zhi JZ (2021) A movie recommendation model combining time information and probability matrix factorisation. Int J Embed Syst 14:239–247

  • Jamali M, Ester M (2010) A matrix factorization technique with trust propagation for recommendation in social networks. In: Proceedings of the fourth ACM conference on Recommender systems, pp 135–142

  • Ji Z, Pi H, Wei W (2019) Recommendation based on review texts and social communities: a hybrid model. IEEE Access 103:40416–40427

  • Jiang M, Cui PR (2012) Social contextual recommendation. In: 21st ACM international conference on information and knowledge management, pp 45–54

  • Khan Z, Iltaf N, Afzal H, Abbas H (2020) Enriching non-negative matrix factorization with contextual embeddings for recommender systems. Neurocomputing 380:246–258

  • Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42:30–37

  • Lee DD, Seung HS (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401:788–791

  • Lee DD, Seung HS (2000) Algorithms for non-negative matrix factorization. Nips 13:556–562

  • Lee SK, Cho YH, Kim SH (2020) Collaborative filtering with ordinal scale-based implicit ratings for mobile music recommendations. Inf Sci 180:2142–2155

  • Li L, Zhang YJ (2009) FastNMF: highly efficient monotonic fixed-point nonnegative matrix factorization algorithm with good applicability. J Electron Imaging 18:273–288

  • Li H, Li K, An J et al (2019) An efficient manifold regularized sparse non-negative matrix factorization model for large-scale recommender systems on GPUs. Inf Sci 496:464–484

  • Lu Y, Castellanos M, Dayal U, Zhai CX (2011) Automatic construction of a context-aware sentiment lexicon: an optimization approach. In: International conference on world wide web, pp 347–356

  • Luo X, Zhou M, Xia Y, Zhu Q (2014) An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems. IEEE Trans Ind Inf 10:1273–1284

  • Mao Y, Lawrence KS (2004) Modeling distances in large-scale networks by matrix factorization. In: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pp 278–287

  • Núñez-Valdez ER, Cueva-Lovelle JM, Sanjuán-Martínez O (2012) Implicit feedback techniques on recommender systems applied to electronic books. Comput Hum Behav 28:1186–1193

  • Paatero P, Tapper U (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5:111–126

  • Peng S, Ser W, Chen B, Sun L, Lin Z (2020) Robust nonnegative matrix factorization with local coordinate constraint for image clustering. Eng Appl Artif Intell 88:1–12

  • Qian XM, Feng H, Zhao GS, Mei T (2014) Personalized recommendation combining user interest and social circle. IEEE Trans Knowl Data Eng 26:1763–1777

  • Salakhutdinov R, Mnih A (2008) Probabilistic matrix factorization. Adv Neural Inf Process Syst 20:1257–1264

  • Saul L, Pereira F (1997) Aggregate and mixed-order Markov models for statistical language processing. In: Proceedings of the second conference on empirical methods in natural language processing, ACL Press, pp 81–89

  • Song W, Li X (2019) A non-negative matrix factorization for recommender systems based on dynamic bias. In: International conference on modeling decisions for artificial intelligence, pp 151–163

  • Tao YY, Jia YL, Wang N (2019) The FacT: taming latent factor models for explainability with factorization trees. In: Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, pp 295–304

  • Vaccari I, Carlevaro A, Narteni S, Cambiaso E, Mongelli M (2022) eXplainable and reliable against adversarial machine learning in data analytics. Proc IEEE Access 10:83949–83970

  • Vig J, Sen S, Riedl J (2009) Tagsplanations: explaining recommendations using tags. In: Proceedings of the 14th international conference on intelligent user interfaces, pp 47–56

  • Wang CP, Song XL, Zhang JS (2018a) Graph regularized nonnegative matrix factorization with sample diversity for image representation. Eng Appl Artif Intell 68:32–39

  • Wang N, Wang H, Jia Y, Yin Y (2018b) Explainable recommendation via multi-task learning in opinionated text data. In: The 41st international ACM SIGIR conference on research & development in information retrieval, ACM, pp 165–174

  • Wang J, Zhu L, Dai T, Xu QN, Gao TY (2021) Low-rank and sparse matrix factorization with prior relations for recommender systems. Appl Intell 51:3435–3449

  • Wu Q, Tan M, Li X, Min H, Sun N (2015) NMFE-SSCC: non-negative matrix factorization ensemble for semi-supervised collective classification. Knowl-Based Syst 89:160–172

  • Yang XW, Steck H, Liu Y (2012) Circle-based recommendation in online social networks. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1267–1275

  • Zhang Y, Lai G, Zhang M, Zhang Y, Liu Y, Ma S (2014) Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval, pp 83–92

  • Zhang H, Ganchev I, Nikolov NS (2020) Featuremf: an item feature enriched matrix factorization model for item recommendation. IEEE Access 23:1–11

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61936001, 62136002, 62233018, 62221005, 12201089), the Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002, cstc2020jcyj-msxmX0737, cstc2021ycjh-bgzxm0013, cstb2022nscq-msx0226), the Key Cooperation Project of Chongqing Municipal Education Commission (HZ2021008), and the Science and Technology Research Program of Chongqing Education Commission of China (KJQN201900638, KJQN202200513).

Author information

Corresponding author

Correspondence to Yanjun Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The objective function \(O_{EF}\) in Eq. (3) is formulated with the Frobenius norm, and the proof of Theorem 1 follows Lee and Seung (1999). In fact, \(O_{EF}\) is obtained from \(O_{1}\) by fixing the item-feature matrix, so that \(O_{EF}\) depends on \({\textbf {U}}\) alone. Therefore, we can minimize \(O_{EF}\) by taking its derivative with respect to \({\textbf {U}}\).

Proof of Theorem 1

As

$$\begin{aligned} O_{EF}&=\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {X}}-2r){\mathcal {P}}_{\Omega ^{T}}({\textbf {X}}^{T}))+2r\text{Tr}({\textbf {I}}{\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{F}{\textbf {U}}^{T}))+2r^{2}\text{Tr}({\textbf {I}}{\textbf {I}}^{T}) \\ &\quad -2\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {X}}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{F}{\textbf {U}}^{T}))+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}{\textbf {V}}_{F}^{T}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{F}{\textbf {U}}^{T})), \end{aligned}$$

let \(\psi _{ik}\) be the Lagrange multiplier for the constraint \(u_{ik}\ge 0\), and \(\Psi =[\psi _{ik}]\). Then the Lagrangian function \({\mathcal {L}}_{O_{EF}}\) is

$$\begin{aligned} {\mathcal {L}}_{O_{EF}}&=\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {X}}-2r){\mathcal {P}}_{\Omega ^{T}}({\textbf {X}}^{T}))+2r\text{Tr}({\textbf {I}}{\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{F}{\textbf {U}}^{T}))+2r^{2}\text{Tr}({\textbf {I}}{\textbf {I}}^{T}) \\ &\quad -2\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {X}}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{F}{\textbf {U}}^{T}))+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}{\textbf {V}}_{F}^{T}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{F}{\textbf {U}}^{T}))+\text{Tr}(\Psi {\textbf {U}}^{T}). \end{aligned}$$

The derivative of \({\mathcal {L}}_{O_{EF}}\) with respect to \({\textbf {U}}\) is:

$$\begin{aligned} \frac{\partial {\mathcal {L}}_{O_{EF}}}{\partial {\textbf {U}}}=-2{\mathcal {P}}_{\Omega }({\textbf {X}}){\textbf {V}}_{F}+2r{\textbf {I}}{\textbf {V}}_{F}+2{\mathcal {P}}_{\Omega }({\textbf {U}}{\textbf {V}}_{F}^{T}){\textbf {V}}_{F}+\Psi . \end{aligned}$$

Using the KKT condition \(\psi _{ik}u_{ik}=0\), we obtain the following equality for \(u_{ik}\):

$$\begin{aligned} -\big ({\mathcal {P}}_{\Omega }({\textbf {X}}){\textbf {V}}_{F}\big )_{ik}u_{ik}+r\big ({\textbf {I}}{\textbf {V}}_{F}\big )_{ik}u_{ik}+\big ({\mathcal {P}}_{\Omega }({\textbf {U}}{\textbf {V}}_{F}^{T}){\textbf {V}}_{F}\big )_{ik}u_{ik}=0, \end{aligned}$$

which leads to the following updating formula:

$$\begin{aligned} u_{ik}\leftarrow u_{ik}\frac{\Big ({\mathcal {P}}_{\Omega }({\textbf {X}}){\textbf {V}}_{F}\Big )_{ik}}{\Big ({\mathcal {P}}_{\Omega }({\textbf {U}}{\textbf {V}}_{F}^{T}+r){\textbf {V}}_{F}\Big )_{ik}}. \end{aligned}$$

Thus, Theorem 1 is proved. \(\square\)
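
For intuition, the update of Theorem 1 translates directly into a few lines of array code. The following NumPy sketch applies the multiplicative rule above with the explicit item-feature matrix \({\textbf {V}}_{F}\) held fixed; the function name, the binary mask `M` standing in for \({\mathcal {P}}_{\Omega }\), the scalar treatment of \(r\), and the constant `eps` guarding against division by zero are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def fe_nmf_update_U(X, M, U, V_F, r=0.0, n_iter=100, eps=1e-12):
    """Sketch of the Theorem 1 multiplicative update for U (V_F fixed).

    X   : (m, n) rating matrix; only entries with M == 1 are observed
    M   : (m, n) binary mask playing the role of P_Omega
    U   : (m, k) nonnegative user-feature matrix, updated in place
    V_F : (n, k) fixed explicit item-feature matrix
    r   : offset constant from the derivation (assumed scalar here)
    """
    MX = M * X                                     # P_Omega(X)
    for _ in range(n_iter):
        numer = MX @ V_F                           # (P_Omega(X) V_F)_{ik}
        denom = (M * (U @ V_F.T + r)) @ V_F + eps  # (P_Omega(U V_F^T + r) V_F)_{ik}
        U *= numer / denom                         # element-wise multiplicative update
    return U
```

Because the rule is multiplicative, a nonnegative initialization of \({\textbf {U}}\) remains nonnegative throughout the iterations.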

The objective function \(O_{PE}\) in Eq. (5) is formulated with the Frobenius norm, so \(O_{PE}\) is certainly bounded from below by zero. To finish the proof of Theorem 3, we need to show that \(O_{PE}\) is non-increasing under the update steps in Eq. (6). Similar to the procedure described in Lee and Seung (1999), the auxiliary function used in the Expectation-Maximization algorithm (Dempster et al. 1977; Saul and Pereira 1997) can also be used in our proof. The notion of an auxiliary function is given as follows:

Definition 1

(Dempster et al. 1977) \(T(v, v^{'})\) is an auxiliary function for E(v) if the following conditions

$$\begin{aligned} T(v, v^{'})\ge E(v), \quad T(v, v)=E(v) \end{aligned}$$

are satisfied.

The auxiliary function is useful because of the following lemma.

Lemma 1

(Dempster et al. 1977; Saul and Pereira 1997) If T is an auxiliary function of E, then E is non-increasing under the update

$$\begin{aligned} v^{t+1}=\arg \min _{v}T(v, v^{t}). \end{aligned}$$
(7)

Proof

Since \(v^{t+1}=\arg \min \limits _{v}T(v, v^{t})\), we have

$$\begin{aligned} T(v^{t+1}, v^{t})\le T(v^{t}, v^{t}). \end{aligned}$$

Moreover, by Definition 1 we have \(E(v^{t+1})\le T(v^{t+1}, v^{t})\) and \(T(v^{t}, v^{t})=E(v^{t})\), so we get

$$\begin{aligned} E(v^{t+1})\le T(v^{t+1}, v^{t})\le T(v^{t}, v^{t})=E(v^{t}). \end{aligned}$$

In what follows, we show that the update formulas for \({\textbf {U}}_{1}, {\textbf {U}}_{2}, {\textbf {V}}_{2}\) in Eq. (6) are exactly the update in Eq. (7), by defining suitable auxiliary functions \(T(v, v^{t})\) for Eq. (5).

According to the objective function \(O_{PE}\) in Eq. (5), we have

$$\begin{aligned} O_{PE}&=\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {X}}){\mathcal {P}}_{\Omega ^{T}}({\textbf {X}}^{T}))-2r\text{Tr}({\textbf {I}}{\textbf {X}}^{T})+2r\text{Tr}({\textbf {I}}{\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{2}{\textbf {U}}_{2}^{T}))\\&\quad +2r\text{Tr}({\textbf {I}}{\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{1}{\textbf {U}}_{1}^{T}))+2r^{2}\text{Tr}({\textbf {I}}{\textbf {I}}^{T})-2\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {X}}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{1}{\textbf {U}}_{1}^{T}))\\&\quad -2\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {X}}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{2}{\textbf {U}}_{2}^{T}))+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{1}{\textbf {U}}_{1}^{T}))\\&\quad +\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{2}{\textbf {V}}_{2}^{T}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{2}{\textbf {U}}_{2}^{T}))+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}){\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{2}{\textbf {U}}_{2}^{T})). \end{aligned}$$

Taking

$$\begin{aligned} E_{{\textbf {U}}_{1}}&=2\text{Tr}({\mathcal {P}}_{\Omega }(r{\textbf {I}}-{\textbf {X}}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{1}{} {\textbf {U}}_{1}^{T}))+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{1}{} {\textbf {V}}_{1}^{T}) {\mathcal {P}}_{\Omega ^{T}}({\textbf {V}}_{1}{} {\textbf {U}}_{1}^{T})) \\ {}&\quad+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{1}{} {\textbf {V}}_{1}^{T}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T})),\\ E_{{\textbf {U}}_{2}}&=2\text{Tr}({\mathcal {P}}_{\Omega }(r{\textbf {I}}-{\textbf {X}}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T})) +\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{2}{} {\textbf {V}}_{2}^{T}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T})) \\ {}&\quad+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{1}{} {\textbf {V}}_{1}^{T}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T})),\\ E_{{\textbf {V}}_{2}}&=2\text{Tr}({\mathcal {P}}_{\Omega }(r{\textbf {I}}-{\textbf {X}}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T})) +\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{2}{} {\textbf {V}}_{2}^{T}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T})) \\ {}&\quad+\text{Tr}({\mathcal {P}}_{\Omega }({\textbf {U}}_{1}{} {\textbf {V}}_{1}^{T}){\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T})), \end{aligned}$$

as the parts of \(O_{PE}\) that depend only on \({\textbf {U}}_{1}\), \({\textbf {U}}_{2}\) and \({\textbf {V}}_{2}\), respectively, where \(k\in K_{1}\) and \(l\in K-K_{1}\). Considering any element \(u_{ik}\) in \({\textbf {U}}_{1}\), \(u_{il}\) in \({\textbf {U}}_{2}\) and \(v_{jl}\) in \({\textbf {V}}_{2}\), we use \(E_{{\textbf {U}}_{1}}(u_{ik})\), \(E_{{\textbf {U}}_{2}}(u_{il})\) and \(E_{{\textbf {V}}_{2}}(v_{jl})\) to denote the parts of these expressions that involve \(u_{ik}\), \(u_{il}\) and \(v_{jl}\), respectively. It is easy to check that the first-order derivatives of \(E_{{\textbf {U}}_{1}}(u_{ik})\), \(E_{{\textbf {U}}_{2}}(u_{il})\) and \(E_{{\textbf {V}}_{2}}(v_{jl})\) are as follows:

$$\begin{aligned} E_{{\textbf {U}}_{1}}^{'}(u_{ik})&=\big (-2{\mathcal {P}}_{\Omega } ({\textbf {X}}){\textbf {V}}_{1}+2r{\textbf {IV}}_{1}+2{\mathcal {P}}_{\Omega } ({\textbf {U}}_{1}{} {\textbf {V}}_{1}^{T}){\textbf {V}}_{1}+2{\mathcal {P}}_{\Omega } ({\textbf {U}}_{2}{} {\textbf {V}}_{2}^{T}){\textbf {V}}_{1}\big )_{ik},\\ E_{{\textbf {U}}_{2}}^{'}(u_{il})&=\big (-2{\mathcal {P}}_{\Omega } ({\textbf {X}}){\textbf {V}}_{2}+2r{\textbf {IV}}_{2}+2{\mathcal {P}}_{\Omega } ({\textbf {U}}_{1}{} {\textbf {V}}_{1}^{T}){\textbf {V}}_{2}+2{\mathcal {P}}_{\Omega } ({\textbf {U}}_{2}{} {\textbf {V}}_{2}^{T}){\textbf {V}}_{2}\big )_{il},\\ E_{{\textbf {V}}_{2}}^{'}(v_{jl})&=\big (-2{\mathcal {P}}_{\Omega ^{T}} ({\textbf {X}}^{T}){\textbf {U}}_{2}+2r{\textbf {I}}^{T}{} {\textbf {U}}_{2}+2{\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{2}{} {\textbf {U}}_{2}^{T}){\textbf {U}}_{2}+2{\mathcal {P}}_{\Omega ^{T}} ({\textbf {V}}_{1}{} {\textbf {U}}_{1}^{T}){\textbf {U}}_{2}\big )_{jl}. \end{aligned}$$

Similarly, the second-order derivatives of \(E_{{\textbf {U}}_{1}}(u_{ik})\), \(E_{{\textbf {U}}_{2}}(u_{il})\) and \(E_{{\textbf {V}}_{2}}(v_{jl})\) are given as

$$\begin{aligned} E_{{\textbf {U}}_{1}}^{''}(u_{ik})=2\big ({\textbf {V}}_{1}^{T}{} {\textbf {V}}_{1}\big )_{kk}, \quad E_{{\textbf {U}}_{2}}^{''}(u_{il})=2\big ({\textbf {V}}_{2}^{T}{} {\textbf {V}}_{2}\big )_{ll}, \quad E_{{\textbf {V}}_{2}}^{''}(v_{jl})=2\big ({\textbf {U}}_{2}^{T}{} {\textbf {U}}_{2}\big )_{ll}. \end{aligned}$$

\(\square\)

Since the update rule proposed in this paper is essentially element-wise, it suffices to show that each of \(E_{{\textbf {U}}_{1}}(u_{ik}), E_{{\textbf {U}}_{2}}(u_{il})\) and \(E_{{\textbf {V}}_{2}}(v_{jl})\) is non-increasing under the update formula of Eq. (7).

Lemma 2

Function

$$\begin{aligned} T(u_{ik}, u_{ik}^{t})&=E_{{\textbf {U}}_{1}}(u_{ik}^{t})+E_{{\textbf {U}}_{1}}^{'}(u_{ik}^{t})(u_{ik}-u_{ik}^{t})+\frac{\Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{\textbf {V}}_{2}^{T}+r\big ){\textbf {V}}_{1}\Big )_{ik}}{u_{ik}^{t}}(u_{ik}-u_{ik}^{t})^{2},\\ T(u_{il}, u_{il}^{t})&=E_{{\textbf {U}}_{2}}(u_{il}^{t})+E_{{\textbf {U}}_{2}}^{'}(u_{il}^{t})(u_{il}-u_{il}^{t})+\frac{\Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{\textbf {V}}_{2}^{T}+r\big ){\textbf {V}}_{2}\Big )_{il}}{u_{il}^{t}}(u_{il}-u_{il}^{t})^{2},\\ T(v_{jl}, v_{jl}^{t})&=E_{{\textbf {V}}_{2}}(v_{jl}^{t})+E_{{\textbf {V}}_{2}}^{'}(v_{jl}^{t})(v_{jl}-v_{jl}^{t})+\frac{\Big ({\mathcal {P}}_{\Omega ^{T}}\big ({\textbf {V}}_{1}{\textbf {U}}_{1}^{T}+{\textbf {V}}_{2}{\textbf {U}}_{2}^{T}+r\big ){\textbf {U}}_{2}\Big )_{jl}}{v_{jl}^{t}}(v_{jl}-v_{jl}^{t})^{2}, \end{aligned}$$
(8)

are auxiliary functions for \(E_{{\textbf {U}}_{1}}(u_{ik}), E_{{\textbf {U}}_{2}}(u_{il})\) and \(E_{{\textbf {V}}_{2}}(v_{jl})\), respectively, which are the parts of \(O_{PE}\) associated only with \(u_{ik}, u_{il}\) and \(v_{jl}\).

Proof

Since \(T(u_{ik}^{t}, u_{ik}^{t})=E_{{\textbf {U}}_{1}}(u_{ik}^{t}), T(u_{il}^{t}, u_{il}^{t})=E_{{\textbf {U}}_{2}}(u_{il}^{t})\) and \(T(v_{jl}^{t}, v_{jl}^{t})=E_{{\textbf {V}}_{2}}(v_{jl}^{t})\) are obvious, we only need to show that \(T(u_{ik}, u_{ik}^{t})\ge E_{{\textbf {U}}_{1}}(u_{ik}), T(u_{il}, u_{il}^{t})\ge E_{{\textbf {U}}_{2}}(u_{il})\) and \(T(v_{jl}, v_{jl}^{t})\ge E_{{\textbf {V}}_{2}}(v_{jl})\). To this end, we compare them with the Taylor expansions of \(E_{{\textbf {U}}_{1}}(u_{ik}), E_{{\textbf {U}}_{2}}(u_{il})\) and \(E_{{\textbf {V}}_{2}}(v_{jl})\) at \(u_{ik}^{t}, u_{il}^{t}\) and \(v_{jl}^{t}\), respectively; these expansions are exact because each function is quadratic in the corresponding variable:

$$\begin{aligned} E_{{\textbf {U}}_{1}}(u_{ik})&=E_{{\textbf {U}}_{1}}(u_{ik}^{t})+E_{{\textbf {U}}_{1}}^{'}(u_{ik}^{t})(u_{ik}-u_{ik}^{t}) +\frac{E_{{\textbf {U}}_{1}}^{''}(u_{ik}^{t})}{2}(u_{ik}-u_{ik}^{t})^{2}\\ E_{{\textbf {U}}_{2}}(u_{il})&=E_{{\textbf {U}}_{2}}(u_{il}^{t})+E_{{\textbf {U}}_{2}}^{'}(u_{il}^{t})(u_{il}-u_{il}^{t}) +\frac{E_{{\textbf {U}}_{2}}^{''}(u_{il}^{t})}{2}(u_{il}-u_{il}^{t})^{2}\\ E_{{\textbf {V}}_{2}}(v_{jl})&=E_{{\textbf {V}}_{2}}(v_{jl}^{t})+E_{{\textbf {V}}_{2}}^{'}(v_{jl}^{t})(v_{jl}-v_{jl}^{t}) +\frac{E_{{\textbf {V}}_{2}}^{''}(v_{jl}^{t})}{2}(v_{jl}-v_{jl}^{t})^{2} \end{aligned}$$

Comparing these with the expression of \(T(u_{ik}, u_{ik}^{t})\), we find that \(T(u_{ik}, u_{ik}^{t})\ge E_{{\textbf {U}}_{1}}(u_{ik})\) is equivalent to

$$\begin{aligned} \frac{\Big ({\mathcal {P}}_{\Omega }({\textbf {U}}_{1}{} {\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{} {\textbf {V}}_{2}^{T} +r\big ){\textbf {V}}_{1}\Big )_{ik}}{u_{ik}^{t}}\ge \frac{E_{{\textbf {U}}_{1}}^{''}(u_{ik}^{t})}{2} =\big ({\textbf {V}}_{1}^{T}{} {\textbf {V}}_{1}\big )_{kk}, \end{aligned}$$

which is equivalent to \(\Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{\textbf {V}}_{2}^{T}+r\big ){\textbf {V}}_{1}\Big )_{ik}\ge u_{ik}^{t}\big ({\textbf {V}}_{1}^{T}{\textbf {V}}_{1}\big )_{kk}.\) On the other hand,

$$\begin{aligned} \Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}\big ){\textbf {V}}_{1}\Big )_{ik}=\sum _{p=1}^{K_{1}}u_{ip}^{t}({\textbf {V}}_{1}^{T}{\textbf {V}}_{1})_{pk}\ge u_{ik}^{t}({\textbf {V}}_{1}^{T}{\textbf {V}}_{1})_{kk}. \end{aligned}$$

Therefore, since the remaining terms on the left-hand side are nonnegative, \(T(u_{ik}, u_{ik}^{t})\ge E_{{\textbf {U}}_{1}}(u_{ik})\); similar arguments show \(T(u_{il}, u_{il}^{t})\ge E_{{\textbf {U}}_{2}}(u_{il})\) and \(T(v_{jl}, v_{jl}^{t})\ge E_{{\textbf {V}}_{2}}(v_{jl})\). Thus, the results are obtained.

Based on the preceding analysis, the proof of Theorem 3 follows easily. \(\square\)

Proof of Theorem 3

Substituting the auxiliary functions \(T(u_{ik}, u_{ik}^{t}), T(u_{il}, u_{il}^{t})\) and \(T(v_{jl}, v_{jl}^{t})\) of Eq. (8) into the update of Eq. (7) leads to the following update formulas

$$\begin{aligned} u_{ik}^{t+1}=\arg \min _{u_{ik}}T(u_{ik},u_{ik}^{t}), u_{il}^{t+1}=\arg \min _{u_{il}}T(u_{il},u_{il}^{t}), v_{jl}^{t+1}=\arg \min _{v_{jl}}T(v_{jl}, v_{jl}^{t}). \end{aligned}$$

To find the \(u_{ik}\) at which \(T(u_{ik}, u_{ik}^{t})\) attains its minimum, set

$$\begin{aligned} \frac{\partial T(u_{ik}, u_{ik}^{t})}{\partial u_{ik}}=E_{{\textbf {U}}_{1}}^{'}(u_{ik}^{t})+2\frac{\Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{\textbf {V}}_{2}^{T}+r\big ){\textbf {V}}_{1}\Big )_{ik}}{u_{ik}^{t}}(u_{ik}-u_{ik}^{t})=0, \end{aligned}$$

which leads to \(u_{ik}=u_{ik}^{t}\frac{\big ({\mathcal {P}}_{\Omega }({\textbf {X}}){\textbf {V}}_{1}\big )_{ik}}{\Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{\textbf {V}}_{2}^{T}+r\big ){\textbf {V}}_{1}\Big )_{ik}}.\) Thus, \(u_{ik}^{t+1}=u_{ik}^{t}\frac{\big ({\mathcal {P}}_{\Omega }({\textbf {X}}){\textbf {V}}_{1}\big )_{ik}}{\Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{\textbf {V}}_{2}^{T}+r\big ){\textbf {V}}_{1}\Big )_{ik}}.\) Similarly, we can prove

$$\begin{aligned} u_{il}^{t+1}&=\arg \min _{u_{il}}T(u_{il}, u_{il}^{t})=u_{il}^{t}\frac{\big ({\mathcal {P}}_{\Omega }({\textbf {X}}){\textbf {V}}_{2}\big )_{il}}{\Big ({\mathcal {P}}_{\Omega }\big ({\textbf {U}}_{1}{\textbf {V}}_{1}^{T}+{\textbf {U}}_{2}{\textbf {V}}_{2}^{T}+r\big ){\textbf {V}}_{2}\Big )_{il}},\\ v_{jl}^{t+1}&=\arg \min _{v_{jl}}T(v_{jl}, v_{jl}^{t})=v_{jl}^{t}\frac{\Big ({\mathcal {P}}_{\Omega ^{T}}({\textbf {X}}^{T}){\textbf {U}}_{2}\Big )_{jl}}{\Big ({\mathcal {P}}_{\Omega ^{T}}\big ({\textbf {V}}_{1}{\textbf {U}}_{1}^{T}+{\textbf {V}}_{2}{\textbf {U}}_{2}^{T}+r\big ){\textbf {U}}_{2}\Big )_{jl}}. \end{aligned}$$

Since the functions in Eq. (8) are auxiliary functions for \(E_{{\textbf {U}}_{1}}(u_{ik}), E_{{\textbf {U}}_{2}}(u_{il})\) and \(E_{{\textbf {V}}_{2}}(v_{jl})\), these are non-increasing under the updates \(u_{ik}^{t+1}, u_{il}^{t+1}\) and \(v_{jl}^{t+1}\). Therefore, Eq. (5) is non-increasing under the update rules in Eq. (6). \(\square\)
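
For completeness, the three updates derived above can be combined into one sweep of the PE-NMF iteration. The NumPy sketch below follows the same conventions as the earlier sketch (binary mask `M` for \({\mathcal {P}}_{\Omega }\), scalar \(r\), small constant `eps`); whether the reconstruction is refreshed after each factor update or computed once per sweep is an implementation choice not fixed by the derivation, and all names here are illustrative.

```python
import numpy as np

def pe_nmf_step(X, M, U1, V1, U2, V2, r=0.0, eps=1e-12):
    """One sweep of the multiplicative updates in Eq. (6) (sketch).

    U1 (m, k1), V1 (n, k1): explicit part, V1 held fixed (explicit item features)
    U2 (m, k2), V2 (n, k2): implicit (latent) part, both updated
    M : (m, n) binary mask of observed ratings, playing the role of P_Omega
    """
    MX = M * X                                # P_Omega(X)

    R = M * (U1 @ V1.T + U2 @ V2.T + r)       # P_Omega(U1 V1^T + U2 V2^T + r)
    U1 *= (MX @ V1) / (R @ V1 + eps)          # update of u_{ik}, k in K_1

    R = M * (U1 @ V1.T + U2 @ V2.T + r)       # refresh with the new U1
    U2 *= (MX @ V2) / (R @ V2 + eps)          # update of u_{il}, l in K - K_1

    R = M * (U1 @ V1.T + U2 @ V2.T + r)       # refresh with the new U2
    V2 *= (MX.T @ U2) / (R.T @ U2 + eps)      # update of v_{jl}

    return U1, U2, V2
```

Iterating this sweep keeps all factors nonnegative, and under the analysis above each individual factor update does not increase the objective of Eq. (5).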

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, X., Zhou, X., Chen, L. et al. Explainable recommendations with nonnegative matrix factorization. Artif Intell Rev 56 (Suppl 3), 3927–3955 (2023). https://doi.org/10.1007/s10462-023-10619-9

  • DOI: https://doi.org/10.1007/s10462-023-10619-9
