Abstract
We introduce a proximal alternating linearized minimization (PALM) algorithm for solving a broad class of nonconvex and nonsmooth minimization problems. Building on the powerful Kurdyka–Łojasiewicz property, we derive a self-contained convergence analysis framework and establish that each bounded sequence generated by PALM globally converges to a critical point. Our approach allows to analyze various classes of nonconvex-nonsmooth problems and related nonconvex proximal forward–backward algorithms with semi-algebraic problem’s data, the later property being shared by many functions arising in a wide variety of fundamental applications. A by-product of our framework also shows that our results are new even in the convex setting. As an illustration of the results, we derive a new and simple globally convergent algorithm for solving the sparse nonnegative matrix factorization problem.
Similar content being viewed by others
Notes
For instance, it suffices to assume that \(\varPsi \) is coercive to obtain a bounded sequence via (i); see also Remark 6.
References
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. Ser. A 137, 91–129 (2013)
Auslender, A.: Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables. Numerische Mathematik 18, 213–223 (1971)
Auslender, A.: Optimisation—Méthodes numériques. Masson, Paris (1976)
Auslender, A.: Asymptotic properties of the Fenchel dual functional and applications to decomposition problems. J. Optim. Theory Appl. 73, 427–449 (1992)
Auslender, A., Teboulle, M., Ben-Tiba, S.: Coupling the logarithmic-quadratic proximal method and the block nonlinear Gauss-Seidel algorithm for linearly constrained convex minimization. In: Thera, M., Tichastschke, R. (eds.) Lecture Notes in Economics and Mathematical Systems, vol. 477. pp. 35–47 (1998)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci 2, 183–202 (2009)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. Preprint (2011)
Berry, M., Browne, M., Langville, A., Pauca, P., Plemmons, R.J.: Algorithms and applications for approximation nonnegative matrix factorization. Comput. Stat. Data Anal. 52, 155–173 (2007)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, New Jersey (1989)
Blum, M., Floyd, R.W., Pratt, V., Rivest, R., Tarjan, R.: Time bounds for selection. J. Comput. Syst. Sci. 7, 448–461 (1973)
Bolte, J., Combettes, P.L., Pesquet, J.-C.: Alternating proximal algorithm for blind image recovery. In: Proceedings of the 17-th IEEE International Conference on Image Processing,Hong-Kong, ICIP, pp. 1673–1676 (2010)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Wiley, New York (2009)
Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett. 26, 127–136 (2000)
Heiler, M., Schnorr, C.: Learning sparse representations by non-negative matrix factorization and sequential cone programming. J. Mach. Learn. Res 7, 1385–1407 (2006)
Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier 48, 769–783 (1998)
Lee, D.D., Seung, H.S.: Learning the part of objects from nonnegative matrix factorization. Nature 401, 788–791 (1999)
Lin, C.J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19, 2756–2779 (2007)
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles. Éditions du centre National de la Recherche Scientifique, Paris, 8–89 (1963)
Luss, R., Teboulle, M.: Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Rev. 55, 65–98 (2013)
Mordukhovich, B.: Variational Analysis and Generalized Differentiation. I. Basic Theory, Grundlehren der Mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New-York (1970)
Palomar, D.P., Eldar, Y. (eds.): Convex Optimization in Signal Processing and Communications. Cambridge University Press, UK (2010)
Powell, M.J.D.: On search directions for minimization algorithms. Math. Program. 4, 193–201 (1973)
Rockafellar, R.T., Wets, R.: Variational Analysis Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
Sra, S., Nowozin, S., Wright, S.J. (eds.): Optimization for Machine Learning. The MIT Press, Cambridge (2011)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Zangwill, W.I.: Nonlinear Programming: A Unified Approach. Prentice Hall, Englewood Cliffs (1969)
Author information
Authors and Affiliations
Corresponding author
Additional information
Jérôme Bolte: This research benefited from the support of the FMJH Program Gaspard Monge in optimization and operation research (and from the support to this program from EDF) and it was co-funded by the European Union under the 7th Framework Programme “FP7-PEOPLE-2010-ITN”, grant agreement number 264735-SADCO. Shoham Sabach: Supported by a Tel Aviv University postdoctoral fellowship. Marc Teboulle: Partially supported by the Israel Science Foundation, ISF Grant 998-12.
Appendix: KL results
Appendix: KL results
This appendix summarizes some important results on KL theory and gives some examples.
Definition 5
(Semi-algebraic sets and functions)
-
(i)
A subset \(S\) of \(\mathbb R ^{d}\) is a real semi-algebraic set if there exists a finite number of real polynomial functions \(g_{ij} , h_{ij} : \mathbb R ^{d} \rightarrow \mathbb R \) such that
$$\begin{aligned} S = \bigcup _{j = 1}^{p} \bigcap _{i = 1}^{q} \left\{ u \in \mathbb R ^{d} : \; g_{ij}\left( u\right) = 0 \, \text {and } \, h_{ij}\left( u\right) < 0 \right\} \!\,. \end{aligned}$$ -
(ii)
A function \(h : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \) is called semi-algebraic if its graph
$$\begin{aligned} \left\{ \left( u , t\right) \in \mathbb R ^{d + 1} : \; h\left( u\right) = t \right\} \end{aligned}$$is a semi-algebraic subset of \(\mathbb R ^{d + 1}\).
The following result is a nonsmooth version of the Łojasiewicz gradient inequality, it can be found in [16, 17].
Theorem 3
Let \(\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \) be a proper and lower semicontinuous function. If \(\sigma \) is semi-algebraic then it satisfies the KL property at any point of \({\hbox {dom}}\,{\sigma }\).
The class of semi-algebraic sets is stable under the following operations: finite unions, finite intersections, complementation and Cartesian products.
Example 2
(Examples of semi-algebraic sets and functions) There is broad class of functions arising in optimization.
-
Real polynomial functions.
-
Indicator functions of semi-algebraic sets.
-
Finite sums and product of semi-algebraic functions.
-
Composition of semi-algebraic functions.
-
Sup/Inf type function, e.g., \(\sup \left\{ g\left( u , v\right) : \; v \in C \right\} \) is semi-algebraic when \(g\) is a semi-algebraic function and \(C\) a semi-algebraic set.
-
In matrix theory, all the following are semi-algebraic sets: cone of PSD matrices, Stiefel manifolds and constant rank matrices.
-
The function \(x \rightarrow \hbox {dist}\left( x , S\right) ^{2}\) is semi-algebraic whenever \(S\) is a nonempty semi-algebraic subset of \(\mathbb R ^{d}\).
Remark 8
The above results can be proven directly or via the fundamental Tarski-Seidenberg principle: The image of a semi-algebraic set \(A \subset \mathbb R ^{d + 1}\) by the projection \(\pi : \mathbb R ^{d + 1} \rightarrow \mathbb R ^{d}\) is semi-algebraic.
All these results and properties can be found in [1–3].
Let us now give some examples of semi-algebraic functions and other notions related to KL functions and their minimization through PALM.
Example 3
(\(\left\| {\cdot } \right\| _{0}\) is semi-algebraic) The sparsity measure (or the counting norm) of a vector \(x\) of \(\mathbb R ^d\) is defined by
For any given subset \(I \subset \left\{ 1 , \ldots , d \right\} \), we denote by \(\left| I\right| \) its cardinal and we define
The graph of \(\left\| {\cdot } \right\| _{0}\) is given by a finite union of product sets:
it is thus a piecewise linear set, and in particular a semi-algebraic set. Therefore \(\left\| {\cdot } \right\| _{0}\) is semi-algebraic. As a consequence the merit functions appearing in the various sparse NMF formulations we studied in Sect. 4 are semi-algebraic, hence KL.
Example 4
(\(\left\| {\cdot } \right\| _{p}\) and KL functions) Being given \(p > 0\) the \(p\) norm is defined through
Let us establish that \(\left\| {\cdot } \right\| _{p}\) is semi-algebraic whenever \(p\) is rational, i.e., \(p = \frac{p_{1}}{p_{2}}\) where \(p_{1}\) and \(p_{2}\) are positive integers. From a general result concerning the composition of semi-algebraic functions we see that it suffices to establish that the function \(s > 0 \rightarrow s^{\frac{p_{1}}{p_{2}}}\) is semi-algebraic. Its graph in \(\mathbb R ^{2}\) can be written as
This last set is semi-algebraic by definition.
When \(p\) is irrational \(\left\| {\cdot } \right\| ^{p}\) is not semi-algebraic, however for any semi-algebraic and lower semicontinuous functions \(H,\,f\) and any nonnegative real numbers \(\alpha \) and \(\lambda \) the functions
are KL functions (see, e.g., [2] and references therein) with \(\varphi \) of the form \(\varphi \left( s\right) = cs^{1 - \theta }\) where \(c\) is positive and \(\theta \) belongs to \(\left( 0 , 1\right] \).
1.1 Convex functions and KL property
Our developments on the convergence of PALM and its rate of convergence seem to be new even in the convex case. It is thus very important to realize that most convex functions encountered in finite dimensional applications satisfy the KL property. This may be due to the fact that they are semi-algebraic or subanalytic, but it can also come from more involved reasons involving o-minimal structures (see [2] for further details) or more down-to-earth properties like various growth conditions (see below). The reader which is wondering what a non KL convex function looks like can consult [15]. The convex counterexample provided in this work exhibit a wildly oscillatory collection of level sets, a phenomenon which seems highly unlikely to happen with functions modeling real world problems.
An interesting and rather specific feature of convex functions is that their desingularizing function \(\varphi \) can be explicitly computed from rather common and simple properties. Here are two important examples taken from Attouch et al. [2].
Example 5
(Growth condition for convex functions) Consider a proper, convex and lower semicontinuous function \(\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \). Assume that \(\sigma \) satisfies the following growth condition: There exist a neighborhood \(U\) of \(\bar{x},\,\eta > 0, c > 0\) and \(r \ge 1\) such that
where \(\bar{x} \in \mathrm{argmin }\, \sigma \ne \emptyset \). Then \(\sigma \) satisfies the KL property at the point \(\bar{x}\) for \(\varphi \left( s\right) = r\; c^{-\frac{1}{r}} \; s^{\frac{1}{r}}\) on the set \(U \cap \left[ \min \sigma < \sigma < \min \sigma + \eta \right] \) (see, for more details, [15, 16]).
Example 6
(Uniform convexity) Assume now that \(\sigma \) is uniformly convex i.e., satisfies
for all \(x , y \in \mathbb R ^{d}\) and \(u \in \partial \sigma \left( x\right) \) (when \(p = 2\) the function is called strongly convex). Then \(\sigma \) satisfies the Kurdyka–Łojasiewicz property on \({\hbox {dom}}\,{\sigma }\) with \(\varphi \left( s\right) = pc^{-\frac{1}{p}}s^{\frac{1}{p}}\).
Rights and permissions
About this article
Cite this article
Bolte, J., Sabach, S. & Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10107-013-0701-9
Keywords
- Alternating minimization
- Block coordinate descent
- Gauss-Seidel method
- Kurdyka–Łojasiewicz property
- Nonconvex-nonsmooth minimization
- Proximal forward-backward
- Sparse nonnegative matrix factorization