Abstract
In this paper we introduce a novel abstract descent scheme suited for the minimization of proper and lower semicontinuous functions. The proposed abstract scheme generalizes a set of properties that are crucial for the convergence of several first-order methods designed for nonsmooth nonconvex optimization problems. Such properties guarantee the convergence of the full sequence of iterates to a stationary point, if the objective function satisfies the Kurdyka–Łojasiewicz property. The abstract framework allows for the design of new algorithms. We propose two inertial-type algorithms with implementable inexactness criteria for the main iteration update step. The first algorithm, i\(^2\)Piano, exploits large steps by adjusting a local Lipschitz constant. The second algorithm, iPila, overcomes the main drawback of line-search based methods by enforcing a descent only on a merit function instead of the objective function. Both algorithms have the potential to escape local minimizers (or stationary points) by leveraging the inertial feature. Moreover, they are proved to enjoy the full convergence guarantees of the abstract descent scheme, which is the best we can expect in such a general nonsmooth nonconvex optimization setup using first-order methods. The efficiency of the proposed algorithms is demonstrated on two exemplary image deblurring problems, where we can appreciate the benefits of performing a linesearch along the descent direction inside an inertial scheme.
References
Bertero, M., Boccacci, P., Ruggiero, V.: Inverse Imaging with Poisson Data. IOP Publishing, Bristol (2018)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R.S., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, pp. 185–212. Springer, New York, NY (2011)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)
Bonettini, S., Loris, I., Porta, F., Prato, M., Rebegoldi, S.: On the convergence of a linesearch based proximal-gradient method for nonconvex optimization. Inverse Probl. 33(5), 055005 (2017)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Bonettini, S., Prato, M., Rebegoldi, S.: A block coordinate variable metric linesearch based proximal gradient method. Comput. Optim. Appl. 71(1), 5–52 (2018)
Chouzenoux, E., Pesquet, J.-C., Repetti, A.: A block coordinate variable metric forward-backward algorithm. J. Global Optim. 66(3), 457–485 (2016)
Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165, 874–900 (2015)
Li, G., Pong, T.K.: Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159, 371–401 (2016)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier 48(3), 769–783 (1998)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
van den Dries, L.: Tame Topology and O-minimal Structures. Cambridge University Press, Cambridge (1998)
Bolte, J., Daniilidis, A.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 10, 556–572 (2007)
Absil, P.A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16(2), 531–547 (2005)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1–2), 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)
Ochs, P.: Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano. SIAM J. Optim. 29(1), 541–570 (2019)
Bonettini, S., Prato, M., Rebegoldi, S.: Convergence of inexact forward-backward algorithms using the forward-backward envelope. SIAM J. Optim. 30(4), 3069–3097 (2020)
Bonettini, S., Prato, M., Rebegoldi, S.: New convergence results for the inexact variable metric forward-backward method. Appl. Math. Comput. 392, 125719 (2021)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Polyak, B.: Introduction to Optimization. Optimization Software, Inc., Publications Division, New York (1987)
Ayers, G.R., Dainty, J.C.: Iterative blind deconvolution method and its applications. Opt. Lett. 13(7), 547–549 (1988)
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Structured sparsity through convex optimization. Stat. Sci. 27(4), 450–468 (2012)
Bonettini, S., Rebegoldi, S., Ruggiero, V.: Inertial variable metric techniques for the inexact forward-backward algorithm. SIAM J. Sci. Comput. 40(5), 3180–3210 (2018)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
Chouzenoux, E., Pesquet, J.-C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)
Noll, D.: Convergence of non-smooth descent methods using the Kurdyka-Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)
Stella, L., Themelis, A., Patrinos, P.: Forward-backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. 67, 443–487 (2017)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)
Zălinescu, C.: Convex Analysis in General Vector Spaces. World Scientific Publishing, Singapore (2002)
Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books on Mathematics. Springer, New York (2011)
Calatroni, L., Chambolle, A.: Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM J. Optim. 29(3), 1772–1798 (2019)
Scheinberg, K., Goldfarb, D., Bai, X.: Fast first-order methods for composite convex optimization with backtracking. Found. Comput. Math. 14, 389–417 (2014)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, NJ (1970)
Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs for image restoration. In: Weickert, J., Hein, M., Schiele, B. (eds.) Pattern Recognition, pp. 271–281. Springer, Berlin, Heidelberg (2013)
Chen, Y., Ranftl, R., Pock, T.: Insights into analysis operator learning: from patch-based sparse models to higher order MRFs. IEEE Trans. Image Process. 23(3), 1060–1072 (2014)
Bonettini, S., Ochs, P., Prato, M., Rebegoldi, S.: inertial inexact Proximal algorithm for nonconvex optimization (i2Piano) and inertial Proximal inexact linesearch algorithm (iPila) software. http://www.oasis.unimore.it/site/home/software.html (2021)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1–4), 259–268 (1992)
Repetti, A., Chouzenoux, E.: RestoVMFB Lab: Matlab Toolbox for Image Restoration with the Variable Metric Forward-Backward Algorithm. http://www-syscom.univ-mlv.fr/~chouzeno/Logiciel.html (2013)
Benning, M., Betcke, M.M., Ehrhardt, M.J., Schönlieb, C.-B.: Choose your path wisely: gradient descent in a Bregman distance framework. SIAM J. Imaging Sci. 14(2), 814–843 (2021)
Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Funding
Silvia Bonettini, Marco Prato and Simone Rebegoldi are members of the INdAM research group GNCS. Peter Ochs acknowledges funding by the German Research Foundation (DFG Grant OC 150/3-1).
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
A Appendix: Convergence analysis of i\(^2\)Piano and iPila
In this section we analyze the convergence properties of the algorithms i\(^2\)Piano and iPila, showing that they can both be considered as special cases of the abstract scheme presented in Sect. 3.
A.1 Preliminary results
We collect here new basic results concerning the inexactness criterion introduced in Sect. 4.1, which is incorporated in our proposed algorithms. The following lemma is a consequence of the strong convexity of the function \(h (\cdot ;x,s)\) defined in (15), and will often be employed in the following. In order to simplify the notation, we omit the iteration index k here.
Lemma 8
Suppose that Assumptions [A1]–[A2] hold true. For a given pair \((x,s)\in \textrm{dom}(f_1)\times {\mathbb {R}}^n\), let \({\hat{y}}, {\tilde{y}}\) be defined as in (16–18). Then, the following inequalities hold.
Proof
Inequalities (36–37) follow by combining the strong convexity of the function \(h(\cdot ;x,s)\) and condition (18) as in [25, Lemma 2]. As for (38), we have
where the last inequality follows from the application of (36–37). \(\square\)
The next lemma provides a subgradient \(\hat{v}\in \partial f(\hat{y})\) whose norm is bounded from above by a quantity containing \(\sqrt{ - h( {\tilde{y}};x,s)}\). Its proof is omitted since it is almost identical to that of Lemma 3 in [25].
Lemma 9
Suppose Assumptions [A1]–[A3] hold true. Let x be a point in \(\textrm{dom}(f_1)\) and let \({\hat{y}}, {\tilde{y}}\) be defined as in (16–18). Moreover, assume that \(\alpha \in [\alpha _{min},\alpha _{max}]\), with \(0<\alpha _{min}\le \alpha _{max}\) and \(\beta \in [0,\beta _{max}]\), with \(\beta _{max}\ge 0\). Then, there exists a subgradient \({\hat{v}}\in \partial f( {\hat{y}})\) such that
where the two constants p, q depend only on \(\alpha _{min},\alpha _{max},\beta _{max}\) and on the Lipschitz constant L.
The following lemma is the analogue of Lemmas 4 and 5 in [25].
Lemma 10
Suppose Assumptions [A1]–[A3] hold true and assume \(0<\alpha _{min}\le \alpha \le \alpha _{max}\), \(\beta \in [0,\beta _{max}]\). Let (x, s) be a point in \(\textrm{dom}(f_1)\times {\mathbb {R}}^n\) and let \({\hat{y}}, {\tilde{y}}\) be defined as in (16–18), for some \(\tau \ge 0\). Then, there exist \(c,d,{\bar{c}},{\bar{d}} \in {\mathbb {R}}\) depending only on \(\alpha _{min},\alpha _{max},\beta _{max},\tau\) such that
Proof
From the Descent Lemma [51, Proposition A.24] we have
The inclusion \(0\in \partial _\epsilon h( {\tilde{y}};x,s)\) in (20) implies that there exists a vector \(e\in {\mathbb {R}}^n\) with
such that
(see [7] and references therein). The definition of \(\epsilon\)-subdifferential implies
Summing inequalities (43) and (45) yields
Now we consider each term at the right-hand side of the above inequality so as to obtain a lower bound. Using the Cauchy-Schwarz inequality, Assumption [A3], (36) and (37) we obtain
Similarly, using again the Cauchy-Schwarz inequality, (37) and (38), we can write
Moreover, from (44) and (20) we obtain \(\Vert e\Vert \le \sqrt{2\alpha _{max}\epsilon }\le \sqrt{\alpha _{max}\tau (-h( {\tilde{y}};x,s))}\) which, using also (37), yields
Finally, using (37), we can also write
Combining (46) with (38), (47)–(50) and (20) gives (41) with
As for (42), using the Descent Lemma we obtain
for all \(x, y\in \textrm{dom}(f_1)\). Summing \(f_1(y)\) on both sides yields
Applying the previous inequality with \(y= {\hat{y}}\), recalling that \(h( {\hat{y}};x,s)\le 0\), and combining with (36) yields (42), where the constants are set as \({\bar{c}} = (L\alpha _{max}+\beta _{max}) (1+{\tau }/{2})\), \({\bar{d}} = {\beta _{max}}/{(2\alpha _{min})}\). \(\square\)
A.2 Convergence analysis of i\(^2\)Piano
Our aim now is to frame i\(^2\)Piano within the abstract scheme of Sect. 3, so as to enjoy the favourable convergence guarantees provided by Theorem 5. The line of proof developed in this section is based on an extension of the arguments in [23]. We start the convergence analysis by showing that i\(^2\)Piano is well posed and that its parameters satisfy some useful relations.
Lemma 11
The loop between STEP 2 and STEP 6 terminates in a finite number of steps. In particular, there exists \(\overline{L}>0\) such that \(L_k\le \overline{L}\), \(\forall \ k\ge 0\). Moreover, we have
and there exist two positive constants \(\alpha _{min},\alpha _{max}\) with \(0<\alpha _{min}\le \alpha _{max}\) such that \(\alpha _k\in [\alpha _{min},\alpha _{max}]\), \(\forall k\ge 0\). We also have
Proof
Since \(\eta >1\), after a finite number of steps the tentative value of \(L_k\) satisfies \(L_k\ge L\), where L is the Lipschitz constant of \(\nabla f_0\). Then, from the Descent Lemma, the inequality at STEP 6 is satisfied. From \(\delta \ge \gamma\) we have \(b_k\ge 1\), which implies (51). A simple inspection shows that the following equalities hold:
which leads to rewriting the parameter \(\alpha _k\) as
Then there holds \(\alpha _k\ge \alpha _{min}\) with \(\alpha _{min} = (1+\theta \omega )/(2(\overline{L} + 2\delta ))\) and, since \(\theta \omega \le 1\), we also have \(\alpha _k\le \alpha _{max}\) with \(\alpha _{max} = 2/L_{min}\). Moreover, we have
and
\(\square\)
Notice that the case \(\tau =0\), which corresponds to the exact computation of the inertial proximal gradient point at STEP 5, implies \(\theta = 1\); choosing \(\omega =1\), the parameter settings in i\(^2\)Piano are then exactly the same as in [6]. The need for the parameter \(\omega\) is mainly technical and will be explained in the following.
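To illustrate the mechanism behind Lemma 11, the following minimal Python sketch reproduces a backtracking loop of this type: a tentative constant is increased by a factor \(\eta >1\) until a user-supplied sufficient-decrease test holds; the test plays the role of the inequality at STEP 6, which the Descent Lemma guarantees once \(L_k\ge L\). The function name and the toy test are illustrative placeholders, not the actual i\(^2\)Piano update, which also recomputes the trial point inside the loop.

def backtrack_lipschitz(L_init, eta, sufficient_decrease, max_trials=100):
    # Increase the tentative Lipschitz estimate by a factor eta > 1 until the
    # sufficient-decrease test holds. Since the test is assumed to hold
    # whenever the estimate exceeds the true Lipschitz constant L, the loop
    # terminates after finitely many trials and the accepted value is bounded
    # by eta * max(L_init, L).
    L_k = L_init
    for _ in range(max_trials):
        if sufficient_decrease(L_k):
            return L_k
        L_k *= eta
    raise RuntimeError("sufficient-decrease test never satisfied")

# Toy usage, with a test standing in for the STEP 6 inequality:
# backtrack_lipschitz(0.5, 2.0, lambda L: L >= 3.0) returns 4.0.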
We now prove that condition [H1] holds for i\(^2\)Piano when the corresponding merit function \(\Phi\) is defined as follows:
Proposition 12
Let \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) be the sequence generated by i\(^2\)Piano. Then there holds
Consequently, there exist \(\{s^{(k)}\}_{k\in {\mathbb {N}}}\), \(\{d_k\}_{k\in {\mathbb {N}}}\), \(\{a_k\}_{k\in {\mathbb {N}}}\) such that [H1] holds with \(\Phi\) as in (55).
Proof
By summing the quantity \(f_1( { x}^{(k+1)})\) to both sides of inequality (24) we obtain
where the first equality is obtained by adding and subtracting to the right-hand side the quantity \(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2/(2{\alpha _k}) +(\beta _k/\alpha _k)\langle { x}^{(k+1)}- { x}^{(k)}, { x}^{(k)}- { x}^{(k-1)}\rangle\), the subsequent inequality follows from the basic relation \(2\langle a,b\rangle \le \Vert a\Vert ^2+\Vert b\Vert ^2\) and the next one from (38). Recalling (52), the above inequality can be conveniently rewritten as
Finally, exploiting (53), we obtain condition [H1] with \(\Phi\) given in (55), \(a_k=1\) and
\(\square\)
Under Assumption [A4], \(\Phi\) is bounded from below, hence, condition [H1] implies
and, if \(\omega < 1\), also
The choice \(\omega < 1\) is enforced when \(\tau > 0\) in order to obtain (60). If we take \(\omega =1\), with the same arguments as above we still obtain (59). It is also worth noticing that, for large values of \(\tau\), that is when a coarser accuracy is allowed in the computation of \({\tilde{y}}^{(k)}\), the parameter \(\theta\) can be very small and this also influences the choice of \(\alpha _k\) and \(\beta _k\).
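For the reader's convenience, we recall how (59)–(60) follow from [H1]; this is a sketch, assuming [H1] has the usual sufficient-decrease form \(\Phi ( { x}^{(k+1)}, s^{(k+1)})+a_k d_k^2\le \Phi ( { x}^{(k)}, s^{(k)})\) with \(a_k=1\), as obtained in Proposition 12 (the precise statement of [H1] is given in Sect. 3). Summing over \(k=0,\dots ,N\) and telescoping gives

\[ \sum _{k=0}^{N} d_k^2 \;\le \; \Phi ( { x}^{(0)}, s^{(0)}) - \Phi ( { x}^{(N+1)}, s^{(N+1)}) \;\le \; \Phi ( { x}^{(0)}, s^{(0)}) - \inf \Phi \;<\;\infty , \]

so that \(d_k\rightarrow 0\) as \(k\rightarrow \infty\); inserting the explicit expression (58) of \(d_k\) then yields (59) and, when \(\omega <1\), also (60).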
In order to prove condition [H2], let us now introduce the second merit function as follows
The following result is proved using arguments similar to those of Lemmas 4 and 5 in [25].
Proposition 13
Let \(\{ { x}^{(k)}\}_{k\in {\mathbb {N}}}\) be the sequence generated by i\(^2\)Piano and, for each k, let the point \(y^{(k)}\) be defined as
Then, there exist \(\{\rho ^{(k)}\}_{k\in {\mathbb {N}}}\), \(\{r_k\}_{k\in {\mathbb {N}}}\) such that condition [H2] holds with \(\Phi\) defined in (55), \({\mathcal {F}}\) defined in (61), \(s^{(k)}\) defined in (57) and \(u^{(k)}=\hat{x}^{(k+1)}\).
Proof
When \(\omega =1\) we necessarily have \(\tau = 0\), which means \({\tilde{y}}^{(k)}= \hat{x}^{(k+1)}\). Therefore, [H2] directly follows from [H1] with \(u^{(k)}= { x}^{(k+1)}\), \(\rho ^{(k)} = \sqrt{2\delta } \Vert { x}^{(k)}- { x}^{(k-1)}\Vert\), \(r_k=0\). Consider now the case \(\omega <1\). From (41) and (42), we directly obtain
where \(c,d,{\bar{c}},{\bar{d}}\) are defined as in Lemma 10 and do not depend on k. Combining the two inequalities above we obtain
Recalling the definition of \({\mathcal {F}}\) in (61), the above inequalities can be rewritten as
where \(\rho ^{(k)}, r_k\) are given by
From (59–60) we obtain \(\displaystyle \lim _{k\rightarrow \infty } r_k = 0\), hence, assumption [H2] is satisfied with the above settings and with \(u^{(k)}= y^{(k)}\). \(\square\)
Next we show that condition [H3] holds for i\(^2\)Piano. The following result combines elements of Lemma 3 in [25] and Lemma 17 in [23].
Proposition 14
There exist \(b>0\), \(I\subset {\mathbb {Z}}\), \(\{\theta _i\}_{i\in I}\) such that condition [H3] holds with \(\{u^{(k)}\}_{k\in {\mathbb {N}}},\{\rho ^{(k)}\}_{k\in {\mathbb {N}}},\{d_k\}_{k\in {\mathbb {N}}}\) defined as in Proposition 13 and in (58), \(\zeta _{k+1}=0\) and \(b_{k+1}=1\).
Proof
From the separable structure of \({{\mathcal {F}}}\), if \(v\in \partial f(u)\) and \(\rho \in {\mathbb {R}}\), we have that \((v,\rho )\in \partial {{\mathcal {F}}}(u,\rho )\). In particular,
If \(\omega = 1\), we are in the case \(\tau = 0\). This means that \({ x}^{(k+1)}= y^{(k)}\), \(\rho ^{(k)}= \sqrt{2\delta } \Vert { x}^{(k)}- { x}^{(k-1)}\Vert\). Moreover, (58) reduces to \(d_k = \sqrt{\gamma }\Vert { x}^{(k)}- { x}^{(k-1)}\Vert = \sqrt{\gamma /(2\delta )}\rho ^{(k)}\). From (39), we have that there exists a subgradient \({\hat{v}}^{(k+1)}\in \partial f(y^{(k)})\) and a positive constant p such that
Hence, \(\Vert {\hat{v}}^{(k+1)}\Vert +\rho ^{(k)}\le p d_{k+1}/{\sqrt{\gamma }} + d_k(p+\sqrt{2\delta })/{\sqrt{\gamma }}\) and, recalling (65), [H3] follows with \(b = (p+\sqrt{2\delta })/{\sqrt{\gamma }}\), \(I=\{0,1\}\), \(\theta _0=\theta _1=\frac{1}{2}\). Consider now the case \(\omega < 1\). From (40), there exists a subgradient \({\hat{v}}^{(k+1)}\in \partial f(y^{(k)})\) and a positive constant q such that
where \(r = \sqrt{2} \max \left\{ (2\delta \alpha _{max}/\theta +c)/(1-\omega ), d/\gamma \right\} ^{\frac{1}{2}}\). Then, combining (66) with (67), in view of (65) we obtain
and the thesis follows with \(I = \{1,2\}\), \(\theta _1=\theta _2 = \frac{1}{2}\). \(\square\)
The following result holds for all methods whose iterates satisfy [H1], [H2], [H3] with \({{\mathcal {F}}}\) defined as in (61), and it is crucial to ensure condition [H6] for i\(^2\)Piano.
Proposition 15
Let Assumptions [A1]–[A4] be satisfied and assume that \(\{ { x}^{(k)}\}_{k\in {\mathbb {N}}}\) satisfies [H1], [H2], [H3] with \({{\mathcal {F}}}\) defined as in (61). If \(\{(x^{(k_j)},\rho ^{(k_j)})\}_{j\in {\mathbb {N}}}\) is a subsequence of \(\{( { x}^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converging to some \((x^*,\rho ^*)\in {\mathbb {R}}^n\times {\mathbb {R}}^m\) such that \(\lim _{j\rightarrow \infty }\Vert u^{(k_j)}-x^{(k_j)}\Vert =0\), then \(\rho ^*=0\) and \(\lim _{j\rightarrow \infty }{{\mathcal {F}}}( u^{(k_j)},\rho ^{(k_j)}) = {{\mathcal {F}}}(x^*,\rho ^*)\).
Proof
Recalling that [H1] implies \(\displaystyle \lim _{k\rightarrow \infty } d_k=0\), from [H3] and thanks to the separable structure of \({\mathcal {F}}\), we obtain that there exists \({\hat{v}}^{(k)}\in \partial f(u^{(k)})\) such that \(\displaystyle \lim _{k\rightarrow \infty } \Vert {\hat{v}}^{(k)}\Vert =\displaystyle \lim _{k\rightarrow \infty }\rho ^{(k)}= 0\). In particular, in view of (12), we can write \({\hat{v}}^{(k)}= \nabla f_0( u^{(k)}) + w^{(k)}\), where \(w^{(k)}\in \partial f_1( u^{(k)})\). Therefore, by continuity of \(\nabla f_0\), the following implication holds
Adding the quantity \(\frac{1}{2} (\rho ^{(k_j)})^2\) to both sides of the subgradient inequality yields
where the last equality is obtained by adding and subtracting \(f_0(u^{(k_j)})\) to the right-hand-side. Taking limits on both sides we obtain
which, rearranging terms, gives \(\displaystyle \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) \le {{\mathcal {F}}}(x^*,\rho ^*)\). On the other hand, by assumption, \({\mathcal {F}}\) is lower semicontinuous, therefore \(\displaystyle \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) \ge {{\mathcal {F}}}(x^*,\rho ^*)\), which completes the proof. \(\square\)
We are now ready to prove Theorem 6, which states the convergence of the i\(^2\)Piano iterates to a stationary point.
Proof
(Proof of Theorem 6) By Propositions 12, 13 and 14, we know that conditions [H1]–[H3] hold for i\(^2\)Piano. Furthermore, if \(\omega =1\), then \(u^{(k)}= {\hat{x}}^{(k+1)}= { x}^{(k+1)}\), and (59) directly implies that \(\displaystyle \lim _{k\rightarrow \infty } \Vert u^{(k)}- { x}^{(k)}\Vert = 0\). If \(\omega <1\), using (36) and (60) we have
Then, in both cases, the assumptions of Proposition 15 are satisfied and, therefore, condition [H6] holds. Finally, condition [H4] follows from (58), while condition [H5] is trivially satisfied, since both sequences \(\{a_k\}_{k\in {\mathbb {N}}}\) and \(\{b_k\}_{k\in {\mathbb {N}}}\) are constant. Then, Theorem 5 applies and guarantees that the sequence \(\{(x^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converges to a stationary point \((x^*,\rho ^*)\) of \({\mathcal {F}}\). Note that, since \({\mathcal {F}}\) is the sum of separable functions, its subdifferential can be written as \(\partial {\mathcal {F}}(x,\rho )=\partial f(x)\times \{\rho \}\). Then, \((x^*,\rho ^*)\) is stationary for \({\mathcal {F}}\) if and only if \(\rho ^*=0\) and \(0\in \partial f(x^*)\). Hence, \(x^*\) is a stationary point for f and \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) converges to it. \(\square\)
A.3 Convergence analysis of iPila
This section aims to develop the convergence framework for iPila. The first issue to be addressed is the well-posedness of the linesearch algorithm. To this end, we first prove the following lemma.
Lemma 16
Let \(d^{(k)}\in {\mathbb {R}}^{2n}\) be defined according to (31–32), where \(0< {\alpha _k}\le \alpha _{max}\), \(\beta _k\ge 0\), \(\gamma _k\ge \gamma _{min}\), for some given positive real constants \(\alpha _{max}>0\), \(\gamma _{min}\ge 0\). Define \(\Delta _k \in {\mathbb {R}}_{\le 0}\) as
Then, we have
where \(a>0\) is a positive constant. Therefore, \(\Phi '( { x}^{(k)}, s^{(k)};d_x^{(k)},d_s^{(k)})<0\) whenever \({\tilde{y}}^{(k)}\ne { x}^{(k)}\) or \({ x}^{(k)}\ne s^{(k)}\).
Proof
We first observe that (29) with \(x= { x}^{(k)}\), \(s= s^{(k)}\), \(y={\tilde{y}}^{(k)}\), \(d_x=d_x^{(k)}\), \(d_s=d_s^{(k)}\), gives:
where the last inequality follows from (38). Then, the thesis follows from \(\alpha _k\le \alpha _{max}\) with \(a = \theta /(2\alpha _{max})\). \(\square\)
The well-posedness of the linesearch procedure and its main properties are summarized in the following lemma.
Lemma 17
Let the assumptions of Lemma 16 be satisfied with, in addition, \(\beta _k\le \beta _{max}\), \(\gamma _k\le \gamma _{max}\), \({ \beta _{max}\ge 0}\), \(\gamma _{max}\ge 0\). Then, the Armijo backtracking linesearch algorithm terminates in a finite number of steps, and there exists \(\lambda _{min}>0\) such that the parameter \(\lambda _k^+\) computed with the linesearch algorithm satisfies
Proof
Let us first prove that
for a positive constant \(C>0\). Setting \(\delta _k = 1+\frac{\beta _k}{\alpha _k}\), the bounds on the parameters imply that \(1\le \delta _k\le {\bar{\delta }}\), with \({\bar{\delta }} = 1+\frac{\beta _{max}}{\alpha _{min}}\). By definition of \(d^{(k)}\) in (30) we have
where \(C =1/\max \{(1+{\bar{\delta }}^2+{\bar{\delta }} \gamma _{max})/a,\gamma _{max} + {\bar{\delta }}\}\). Multiplying both sides of the last inequality above by C and combining with (70) gives (74).
Since \(\Phi _0\) has Lipschitz continuous gradient with constant \(L_{\Phi _0}=L+2\) (see (27)), we can apply the Descent Lemma, obtaining
where the second inequality follows from (74). From Jensen's inequality applied to the convex function \(f_1\) we also obtain
where the last inequality follows from (71–72). The above relation implies
with \(\rho = \frac{ L_{\Phi _0}}{2C}\). Moreover, comparing the above inequality with the Armijo condition (33) shows that the latter is surely fulfilled when \(\lambda _k^+\) satisfies \(1-\rho \lambda _k^+\ge \sigma\), that is, when \(\lambda _k^+\le (1-\sigma )/\rho\).
Since \(\lambda _k^+\) in the backtracking procedure is obtained starting from 1 by successive reductions by a factor \(\delta <1\), we have \(\lambda _k^+\ge \delta ^{\,\bar{\jmath }}\), where \(\bar{\jmath }\) is the smallest nonnegative integer such that \(\delta ^{\,\bar{\jmath }}\le (1-\sigma )/\rho\). Therefore, (73) is satisfied with \(\lambda _{min} = \delta ^{\,\bar{\jmath }}\). \(\square\)
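The finite-termination argument of Lemma 17 can be condensed in the following minimal Python sketch of a backtracking linesearch, where armijo_ok is a placeholder for the acceptance test (33), which is not restated in this appendix: the step starts at 1 and is reduced by a factor \(\delta <1\); if, as shown above, the test is guaranteed to hold once \(\lambda \le (1-\sigma )/\rho\), the returned step admits a uniform positive lower bound, independent of k.

def armijo_backtracking(armijo_ok, delta, max_trials=100):
    # Backtracking linesearch: start from step size 1 and shrink by a factor
    # delta in (0, 1) until the acceptance test holds. If the test passes
    # whenever lam <= threshold, the returned step is bounded below by
    # min(1, delta * threshold), uniformly in the iteration index.
    lam = 1.0
    for _ in range(max_trials):
        if armijo_ok(lam):
            return lam
        lam *= delta
    raise RuntimeError("acceptance test never satisfied")

# Illustrative acceptance test of Armijo type on a merit function Phi along a
# direction d, with slope surrogate Delta < 0 (hypothetical names):
# armijo_ok = lambda lam: Phi(z + lam * d) <= Phi(z) + sigma * lam * Delta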
In the remainder of this section we show that iPila can be cast in the framework of the abstract scheme. We first show that conditions [H1]–[H3] are satisfied.
Proposition 18
Let \(\{( { x}^{(k)}, s^{(k)})\}_{k\in {\mathbb {N}}}\) be the sequence generated by iPila. Then, condition [H1] holds with \(d_k=\sqrt{-\Delta _k}\), \(a_k = \sigma \lambda _{min}\). Moreover, under Assumption [A4], we also have
Proof
From the updating rule at STEP 7 and from Lemma 17, we have
Then, condition [H1] is satisfied with \(d_k^2=-\Delta _k\), \(a_k = \sigma \lambda _{min}\). Since from Assumption [A4] f is bounded from below, \(\Phi\) is bounded from below as well. Therefore, [H1] implies \(\displaystyle -\sum _{k=0}^{\infty }\Delta _k<\infty\) which, in turn, yields \(\displaystyle \lim _{k\rightarrow \infty }\Delta _k=0\). Recalling (70), this implies (77). \(\square\)
In the following we describe the setup for proving [H2], with the second auxiliary function defined as in (61).
Proposition 19
Let \(\{( { x}^{(k)}, s^{(k)})\}_{{k\in {\mathbb {N}}}}\) be the sequence generated by Algorithm iPila with \(\gamma _{min}>0\) and let \({{\mathcal {F}}}\) be defined as in (61). Then, there exist \(\{\rho ^{(k)}\}_{k\in {\mathbb {N}}},\{r_k\}_{k\in {\mathbb {N}}}\subset {\mathbb {R}}\), with \(\lim _{k\rightarrow \infty }r_k=0\), such that [H2] holds with \(u^{(k)}= {\hat{y}}^{(k)}\), where \({\hat{y}}^{(k)}\) is the exact minimizer of \(h^{(k)}(y; { x}^{(k)}, s^{(k)})\), i.e.,
Proof
From STEP 7, we have
where the last inequality follows from (41) and (38). Setting
we obtain \(\Phi ( { x}^{(k+1)}, s^{(k+1)}) \le {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\), which represents the left-most inequality in [H2], with \(u^{(k)}= {\hat{y}}^{(k)}\). On the other hand, from inequality (42) we obtain
Setting \(r_k = (\rho ^{(k)})^2/2 -{\bar{c}} h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) +({\bar{d}}-\frac{1}{2})\Vert { x}^{(k)}- s^{(k)}\Vert ^2\), we can write
From (77) we have that \(\lim _{k\rightarrow \infty }r_k=0\) and this proves [H2]. \(\square\)
Proposition 20
Let \(\{( { x}^{(k)}, s^{(k)})\}_{{k\in {\mathbb {N}}}}\) be the sequence generated by iPila with \(\gamma _{min}>0\). Then, there exists a positive constant b such that [H3] is satisfied with \(I = \{1\}\), \(\theta _1 = 1\), \(\zeta _k = 0\).
Proof
From (40) we know that there exists a subgradient \({\hat{v}}^{(k)}\in \partial f(\hat{y}^{(k)})\) such that
and, reasoning as in the proof of Proposition 15, it follows that
Let us analyze the two terms at the right-hand side of the inequality above, showing that both can be bounded from above with a multiple of \(\sqrt{-\Delta _k}\). From (79) we obtain
which, setting \(A = q\left( 1+1/\sqrt{\gamma _{min}}\right)\) and using (70), yields
On the other hand, from definition (78)
where \(B = \sqrt{2}\max \left\{ c+ \alpha _{max}/{\theta }, d / {\gamma _{min}}\right\} ^{\frac{1}{2}}\). Therefore, combining the last inequality above with (81) and (80), yields \(\Vert \partial {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\Vert _-\le (A+B)\sqrt{-\Delta _k}\) which proves that [H3] is satisfied with \(I = \{1\}\), \(\theta _1 = 1\), \(\zeta _k = 0\), \(b=A+B\), \(b_k=1\). \(\square\)
We are now ready to prove Theorem 7, which states the main convergence result for Algorithm iPila.
Proof of Theorem 7
By Propositions 18, 19 and 20, we know that conditions [H1]–[H3] hold for iPila. From (36) we have \(\displaystyle \lim _{k\rightarrow \infty }\Vert u^{(k)}- { x}^{(k)}\Vert = \displaystyle \lim _{k\rightarrow \infty }\Vert \hat{y}^{(k)}- { x}^{(k)}\Vert =0\). Hence we can apply Proposition 15 and conclude that [H6] holds. Moreover, condition [H4] holds as a consequence of Lemma 16, since \(d_k= \sqrt{-\Delta _k}\) and \(\Vert \tilde{y}^{(k)}-x^{(k)}\Vert \ge \Vert x^{(k+1)}-x^{(k)}\Vert /\lambda _k\ge \Vert x^{(k+1)}-x^{(k)}\Vert\) (see STEP 7). Finally, condition [H5] is trivially satisfied, since both sequences \(\{a_k\}_{k\in {\mathbb {N}}}\) and \(\{b_k\}_{k\in {\mathbb {N}}}\) are constant. Then Theorem 5 applies and guarantees that the sequence \(\{(x^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converges to a stationary point \((x^*,\rho ^*)\) of \({\mathcal {F}}\). Note that, since \({\mathcal {F}}\) is the sum of separable functions, its subdifferential can be written as \(\partial {\mathcal {F}}(x,\rho )=\partial f(x)\times \{\rho \}\). Then, \((x^*,\rho ^*)\) is stationary for \({\mathcal {F}}\) if and only if \(\rho ^*=0\) and \(0\in \partial f(x^*)\). Hence \(x^*\) is a stationary point for f and \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) converges to it. \(\square\)