Abstract
In this paper, a new alternating nonmonotone projected Barzilai–Borwein (BB) algorithm is developed for solving large-scale nonnegative matrix factorization problems. Unlike existing algorithms in the literature, it employs a nonmonotone line search strategy to find suitable step lengths and an adaptive BB spectral parameter to generate search directions, so that the constructed subproblems are solved efficiently. In addition to establishing global convergence of the algorithm, we conduct numerical tests on three synthetic datasets, four public face-image datasets and a real-world transcriptomic dataset to demonstrate its advantages. In terms of numerical efficiency, noise robustness and quality of factorization, the algorithm compares favorably with state-of-the-art methods and is applicable to face image reconstruction and to deep mining of the transcriptomic profiles of the sub-genomes in a hybrid fish lineage.
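The alternating projected-BB scheme summarized above can be sketched as follows. This is a minimal illustration under assumptions (factor-wise updates, a BB1 step length, projection onto the nonnegative orthant, and a simple monotone backtracking safeguard standing in for the paper's nonmonotone search), not the authors' Algorithm 3:

```python
import numpy as np

def projected_bb_nmf(V, r, iters=100, seed=0):
    """Illustrative sketch (not the paper's Algorithm 3) of alternating
    projected BB for NMF: minimize f(W,H) = ||V - W H||_F^2 / 2 over
    W >= 0, H >= 0.  Each factor is updated with a Barzilai-Borwein step
    length, projection onto the nonnegative orthant, and a simple
    monotone backtracking safeguard (the paper uses a nonmonotone one)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    f = lambda W_, H_: 0.5 * np.linalg.norm(V - W_ @ H_) ** 2

    def update(X, G, prev, obj):
        # BB1 step length alpha = <s,s>/<s,y>, safeguarded to an interval
        if prev is None:
            alpha = 1.0
        else:
            s, y = (X - prev[0]).ravel(), (G - prev[1]).ravel()
            sy = s @ y
            alpha = min(max((s @ s) / sy, 1e-6), 1e6) if sy > 1e-12 else 1.0
        for _ in range(40):                       # backtracking safeguard
            X_new = np.maximum(X - alpha * G, 0.0)  # project onto >= 0
            if obj(X_new) <= obj(X):
                break
            alpha *= 0.5
        return X_new, (X, G)

    prev_W = prev_H = None
    for _ in range(iters):
        H, prev_H = update(H, W.T @ (W @ H - V), prev_H, lambda Hn: f(W, Hn))
        W, prev_W = update(W, (W @ H - V) @ H.T, prev_W, lambda Wn: f(Wn, H))
    return W, H
```

With the monotone safeguard, the objective never increases, so the returned factors fit at least as well as the random initialization.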
Availability of data and material
The data used to support the findings of this study are available from the corresponding author upon request.
References
Barzilai J, Borwein JM (1988) Two-point step length gradient methods. IMA J Numer Anal 8:141–148. https://doi.org/10.1093/imanum/8.1.141
Berry MW, Browne M, Langville AN, Pauca VP, Plemmons RJ (2007) Algorithms and applications for approximate nonnegative matrix factorization. Comput Stat Data Anal 52:155–173. https://doi.org/10.1016/j.csda.2006.11.006
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont
Birgin EG, Martínez JM, Raydan M (2000) Nonmonotone spectral projected gradient methods on convex sets. SIAM J Optim 10:1196–1211. https://doi.org/10.1137/S1052623497330963
Cai D, He XF, Han JW (2005) Document clustering using locality preserving indexing. IEEE Trans Knowl Data Eng 17:1624–1637. https://doi.org/10.1109/TKDE.2005.198
Dai YH, Fletcher R (2005) Projected Barzilai–Borwein methods for large-scale box-constrained quadratic programming. Numer Math 100:21–47. https://doi.org/10.1007/s00211-004-0569-y
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91:201–213. https://doi.org/10.1007/s101070100263
Fu X, Huang K, Sidiropoulos ND, Ma WK (2019) Nonnegative matrix factorization for signal and data analytics: identifiability, algorithms, and applications. IEEE Signal Process Mag 36:59–80. https://doi.org/10.1109/MSP.2018.2877582
Gillis N, Glineur F (2008) Nonnegative factorization and the maximum edge biclique problem. ArXiv e-prints arXiv:0810.4225
Gong PH, Zhang CS (2012) Efficient nonnegative matrix factorization via projected Newton method. Pattern Recognit 45:3557–3565. https://doi.org/10.1016/j.patcog.2012.02.037
Grippo L, Sciandrone M (2000) On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper Res Lett 26:127–136. https://doi.org/10.1016/S0167-6377(99)00074-7
Guan NY, Tao DC, Luo ZG, Yuan B (2012) NeNMF: an optimal gradient method for nonnegative matrix factorization. IEEE Trans Signal Process 60:2882–2898. https://doi.org/10.1109/TSP.2012.2190406
Hager WW, Zhang HC (2006) A new active set algorithm for box constrained optimization. SIAM J Optim 17:526–557. https://doi.org/10.1137/050635225
Han J, Han LX, Neumann M, Prasad U (2009a) On the rate of convergence of the image space reconstruction algorithm. Oper Matrices 3:41–58. https://doi.org/10.7153/oam-03-02
Han LX, Neumann M, Prasad AU (2009b) Alternating projected Barzilai–Borwein methods for nonnegative matrix factorization. Electron Trans Numer Anal 36:54–82. https://doi.org/10.1007/978-0-8176-4751-3_16
Hoyer PO (2004) Nonnegative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
Huang S, Wan Z (2017) A new nonmonotone spectral residual method for nonsmooth nonlinear equations. J Comput Appl Math 313:82–101. https://doi.org/10.1016/j.cam.2016.09.014
Huang S, Wan Z, Zhang J (2018) An extended nonmonotone line search technique for large-scale unconstrained optimization. J Comput Appl Math 330:586–604. https://doi.org/10.1016/j.cam.2017.09.026
Huang YK, Liu HW, Zhou SS (2015a) Quadratic regularization projected Barzilai–Borwein method for nonnegative matrix factorization. Data Min Knowl Discov 29:1665–1684. https://doi.org/10.1007/s10618-014-0390-x
Huang YK, Liu HW, Zhou SS (2015b) An efficient monotone projected Barzilai–Borwein method for nonnegative matrix factorization. Appl Math Lett 45:12–17. https://doi.org/10.1016/j.aml.2015.01.003
Kim DM, Sra S, Dhillon IS (2007) Fast Newton-type methods for the least squares nonnegative matrix approximation problem
Kim H, Park H (2008a) Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J Matrix Anal Appl 30:713–730. https://doi.org/10.1137/07069239X
Kim J, Park H (2008b) Toward faster nonnegative matrix factorization: a new algorithm and comparisons. Proc Eighth IEEE Int Conf Data Min. https://doi.org/10.1109/ICDM.2008.149
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791. https://doi.org/10.1038/44565
Li JC, Li WB, Liu XN (2020) An adaptive nonmonotone projected Barzilai–Borwein gradient method with active set prediction for nonnegative matrix factorization. Numer Math Theor Meth Appl 13:516–538. https://doi.org/10.4208/nmtma.OA-2019-0028
Li T, Wan Z (2019) New adaptive Barzilai–Borwein step size and its application in solving large scale optimization problems. ANZIAM J 61:76–98. https://doi.org/10.1017/S1446181118000263
Li XL, Zhang W, Dong XL (2017) A class of modified FR conjugate gradient method and applications to non-negative matrix factorization. Comput Math Appl 73:270–276. https://doi.org/10.1016/j.camwa.2016.11.017
Li Z, Tang J, He X (2018) Robust structured nonnegative matrix factorization for image representation. IEEE Trans Neural Netw Learn Syst 29:1947–1960. https://doi.org/10.1109/TNNLS.2017.2691725
Lin CJ (2007a) On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans Neural Netw 18:1589–1596. https://doi.org/10.1109/TNN.2007.895831
Lin CJ (2007b) Projected gradient methods for nonnegative matrix factorization. Neural Comput 19:2756–2779. https://doi.org/10.1162/neco.2007.19.10.2756
Paatero P, Tapper U (1994) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5:111–126. https://doi.org/10.1002/env.3170050203
Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1:127–239. https://doi.org/10.1561/2400000003
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 416:29–47. https://doi.org/10.1016/j.laa.2005.06.025
Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Text mining using non-negative matrix factorization
Tang JY, Wan Z (2021) Orthogonal dual graph-regularized nonnegative matrix factorization for co-clustering. J Sci Comput 87:66. https://doi.org/10.1007/s10915-021-01489-w
Vavasis SA (2009) On the complexity of nonnegative matrix factorization. SIAM J Optim 20:1364–1377. https://doi.org/10.1137/070709967
Wan Z, Guo J, Liu JJ, Liu WY (2018) A modified spectral conjugate gradient projection method for signal recovery. Signal Image Video Process 12:1455–1462. https://doi.org/10.1007/s11760-018-1300-2
Wan Z, Tang JY, Ren L, Xiao YM, Liu SJ (2019) Optimization techniques to deeply mine the transcriptomic profile of the sub-genomes in hybrid fish lineage. Front Genet 10:911. https://doi.org/10.3389/fgene.2019.00911
Xiao J, Hu FZ, Luo KK, Li WH, Liu SJ (2016) Unique nucleolar dominance patterns in distant hybrid lineage derived from Megalobrama amblycephala \(\times \) Culter alburnus. BMC Genet 17:150. https://doi.org/10.1186/s12863-016-0457-3
Xu YY, Yin WT (2013) A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J Imaging Sci 6:1758–1789. https://doi.org/10.1137/120887795
Zdunek R, Cichocki A (2006) Non-negative matrix factorization with quasi-Newton optimization. In: Proceedings of the eighth international conference on artificial intelligence and soft computing (ICAISC 2006)
Zhang L, Liu ZH, Pu JX, Song B (2020) Adaptive graph regularized nonnegative matrix factorization for data representation. Appl Intell 50:438–447. https://doi.org/10.1007/s10489-019-01539-9
Funding
This research is supported by the National Natural Science Foundation of China (Grant No. 71671190), the Fundamental Research Funds for the Central Universities of Central South University (Grant No. 206021706) and the Hunan Provincial Innovation Foundation For Postgraduate (Grant No. 150110022).
Author information
Authors and Affiliations
Contributions
Z.W. conceived and designed the research plan and wrote the paper. T.L. and J.T. performed the mathematical modelling, developed the algorithm, conducted the experiments and wrote the paper.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the submission and publication of this paper.
Code availability
All the computer codes used in this study are available from the corresponding author upon request.
Additional information
Responsible editor: Evangelos Papalexakis.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Remark 4
Proof
From (Lin 2007b) and (Gillis and Glineur 2008), the KKT conditions of Problem (1.1) are
Equivalently, (A.1) can be rewritten as
By the following definitions,
we can prove that the KKT conditions (A.1) and equations (2.20) are equivalent. This completes the proof of Remark 4. \(\square \)
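The displayed conditions (A.1) did not survive extraction. For reference, assuming the usual least-squares formulation of Problem (1.1), \(\min _{W\ge 0,\, H\ge 0} f(W,H)=\frac{1}{2}\Vert V-WH\Vert _F^2\) (as in Lin 2007b), the standard KKT conditions read:

```latex
% KKT conditions for  min_{W,H >= 0}  f(W,H) = (1/2) ||V - WH||_F^2
W \ge 0, \qquad H \ge 0,\\[2pt]
\nabla_W f(W,H) = (WH - V)H^{\top} \ge 0, \qquad
\nabla_H f(W,H) = W^{\top}(WH - V) \ge 0,\\[2pt]
W \odot \nabla_W f(W,H) = 0, \qquad
H \odot \nabla_H f(W,H) = 0,
```

where \(\odot \) denotes the Hadamard (entrywise) product.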
Appendix B: Proof of Theorem 1
Proof
We prove Theorem 1 by mathematical induction.
For all \(t\ge 0\), from the first result in Lemma 3, it follows that \(D_t\) is a sufficient descent direction at \(Z_t\). Then, similar to Lemma 1 in (Huang et al. 2018), there is a step length \(\lambda _t \in \{1, \rho , \rho ^2, \ldots \}\) such that the following inequality holds:
It is easy to see that for \(t=0\), (3.1) and (3.2) hold.
Suppose that (3.1) and (3.2) also hold for \(t\in \{1, 2, \ldots , r-1\}\). Then, from Algorithm 1, it follows that \(f(Z_{r}) \le C_{\ell (r)}\) if \({\bar{f}}_1 \le {\bar{C}}_1\). Otherwise, \(f(Z_{r}) \le f(Z_{r-1}+{\hat{\lambda }}_2 D_{r-1}) \le C_{\ell (r-1)}\) and \(\eta _{r-1}=0\), which implies \(f(Z_{r}) \le \max \{C_{\ell (r-1)}, f(Z_r)\}= C_{\ell (r)}\). Consequently, \(f(Z_{t})\le C_{\ell (t)}\) holds for all \(t \in \{1, 2, \ldots , r\}\).
For \(t=r\), take \({\hat{\lambda }}_1= \lambda _t\) in (B.1). Then,
The last inequality in (B.2) is equivalent to (3.1).
Take \({\hat{\lambda }}_2=\lambda _t\). Then, it follows from (B.1) and \({\bar{Q}}_{t+1}\ge 1\) that
where the last inequality indicates that (3.2) holds. Therefore, Algorithm 1 is well defined. \(\square \)
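The acceptance test underlying this induction can be sketched as a max-type nonmonotone Armijo backtracking search. This is a simplified illustration in the spirit of the reference value \(C_{\ell (t)}\) above; the paper's Algorithm 1 combines two such tests:

```python
import numpy as np
from collections import deque

def nonmonotone_armijo(f, grad, z, d, memory, rho=0.5, delta=1e-4, max_back=30):
    """Max-type nonmonotone Armijo backtracking (sketch, not the paper's
    exact Algorithm 1).  Accept the first lambda in {1, rho, rho^2, ...}
    satisfying
        f(z + lambda*d) <= max(recent f values) + delta*lambda*<grad f(z), d>,
    where the max plays the role of the reference value C_{l(t)}."""
    g_dot_d = float(np.dot(grad(z).ravel(), d.ravel()))
    assert g_dot_d < 0, "d must be a descent direction at z"
    C = max(memory)                     # nonmonotone reference value
    lam = 1.0
    for _ in range(max_back):
        if f(z + lam * d) <= C + delta * lam * g_dot_d:
            break
        lam *= rho
    z_new = z + lam * d
    memory.append(f(z_new))             # deque(maxlen=M) drops old values
    return z_new, lam
```

On the quadratic \(f(z)=\Vert z\Vert ^2\) with \(d=-\nabla f(z)\), the full step overshoots and exactly one halving is taken.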
Appendix C: Proof of Lemma 1
Proof
From the definition of \(C_{\ell (t)}\) and the inequality (3.3), it follows that
Thus, the sequence \(\{C_{\ell (t)}\}\) is nonincreasing. It directly follows that
Consequently, we have \(\{Z_t\} \subset L(H_0)=\{H\ge 0: f(H)\le f(H_0)\}\). The first result has been proved.
Since f is bounded below, from Theorem 2 in (Huang et al. 2018), it directly follows that the inequalities (3.4) and (3.5) hold. This ends the proof. \(\square \)
Appendix D: Proof of Lemma 2
Proof
Since f is continuously differentiable and \(\nabla f\) is Lipschitz continuous with the Lipschitz constant L, it holds that for any \(\lambda >0\),
Consequently, by taking \(\lambda =\dfrac{\lambda _t}{\rho }\), we have
On the other hand, for the obtained step length \(\lambda _t\), either \(\lambda _t=1\), or (3.1), or (3.2) fails at least once. That is to say,
or
Therefore,
Directly from (D.1) and (D.2), it follows that
By positivity of \(\lambda _t\) and \(\rho \), we know
Combining the conditions \(\delta _t {\bar{Q}}_{t+1} \le \delta _{\max }\), \(0< \delta _{\max }<1\), and the first result in Lemma 3, we have proved the inequality (3.6). \(\square \)
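The Lipschitz bound invoked at the beginning of this proof is presumably the standard descent lemma; in the notation of the appendix it reads:

```latex
f(Z_t + \lambda D_t)
  \le f(Z_t) + \lambda \langle \nabla f(Z_t), D_t \rangle
      + \frac{L \lambda^2}{2}\, \Vert D_t \Vert_F^2,
  \qquad \forall\, \lambda > 0 .
```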
Appendix E: Proof of Theorem 2
Proof
By the definition of \(D^{\alpha }(H)\), we have
Then, \(\Vert D^1 (Z_t)\Vert _F=0\) is equivalent to the termination condition of Algorithm 2.
From the second result in Lemma 3, Algorithm 2 terminates within finitely many iterations with \(\epsilon =0\) if \(Z_t\) is a stationary point of (2.3).
In the case that \(\{Z_t\}\) is an infinite sequence, we can prove that there exists an infinite subsequence \(l_1 \le l_2 \le \dots \) such that \(\Vert D^1 (Z_{l_t})\Vert _F\) approaches zero as t tends to \(\infty \). Taking \(l_t=\ell (tM)-1\), it follows from (3.5) in Lemma 1 that
From Lemma 2, for all t, there exists a positive constant \({{\hat{\lambda }}}\) such that \({{\hat{\lambda }}} \le \lambda _{l_t} \le 1\). Consequently,
From the first result in Lemma 3 and \(0<\alpha _{\min }\le \alpha _{l_t}\le \alpha _{\max }\), we have
which implies
From P4 and P5 of Proposition 2.1 in (Hager and Zhang 2006), it follows that
Then, \(\Vert D_{l_t}\Vert _F=0\) implies \(\Vert D^1 (Z_{l_t})\Vert _F=0\). Therefore, (3.8) holds. The proof has been completed. \(\square \)
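The stationarity measure used throughout this proof can be computed directly. The following sketch assumes the standard definition \(D^{\alpha }(H) = P(H-\alpha \nabla f(H)) - H\) with \(P\) the projection onto the nonnegative orthant, which is consistent with the termination condition \(\Vert D^1(Z_t)\Vert _F = 0\) above:

```python
import numpy as np

def projected_gradient_residual(H, grad_H, alpha=1.0):
    """Sketch of the stationarity measure in Theorem 2 (notation assumed):
    D^alpha(H) = P(H - alpha * grad f(H)) - H, with P the projection onto
    the nonnegative orthant.  A feasible H is stationary for
    min f(H) s.t. H >= 0 exactly when ||D^1(H)||_F = 0."""
    return np.maximum(H - alpha * grad_H, 0.0) - H
```

For example, for \(f(H)=\frac{1}{2}\Vert H-A\Vert _F^2\) the constrained minimizer is \(\max (A,0)\) entrywise, and the residual vanishes exactly there.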
Appendix F: Proof of Theorem 3
Proof
Clearly, Algorithm 3 is a modification of the block coordinate descent method with two blocks for solving nonlinear optimization problems, and the search directions and step lengths it computes possess the same properties as those generated by the algorithms developed in (Bertsekas 1999; Grippo and Sciandrone 2000). Therefore, the convergence result for Algorithm 3 follows directly from Corollary 2 in (Grippo and Sciandrone 2000). \(\square \)
About this article
Cite this article
Li, T., Tang, J. & Wan, Z. An alternating nonmonotone projected Barzilai–Borwein algorithm of nonnegative factorization of big matrices. Data Min Knowl Disc 35, 1972–2008 (2021). https://doi.org/10.1007/s10618-021-00773-5