
An abstract convergence framework with application to inertial inexact forward–backward methods

Abstract

In this paper we introduce a novel abstract descent scheme suited for the minimization of proper and lower semicontinuous functions. The proposed abstract scheme generalizes a set of properties that are crucial for the convergence of several first-order methods designed for nonsmooth nonconvex optimization problems. Such properties guarantee the convergence of the full sequence of iterates to a stationary point, if the objective function satisfies the Kurdyka–Łojasiewicz property. The abstract framework allows for the design of new algorithms. We propose two inertial-type algorithms with implementable inexactness criteria for the main iteration update step. The first algorithm, i\(^2\)Piano, exploits large steps by adjusting a local Lipschitz constant. The second algorithm, iPila, overcomes the main drawback of line-search based methods by enforcing a descent only on a merit function instead of the objective function. Both algorithms have the potential to escape local minimizers (or stationary points) by leveraging the inertial feature. Moreover, they are proved to enjoy the full convergence guarantees of the abstract descent scheme, which is the best we can expect in such a general nonsmooth nonconvex optimization setup using first-order methods. The efficiency of the proposed algorithms is demonstrated on two exemplary image deblurring problems, where we can appreciate the benefits of performing a linesearch along the descent direction inside an inertial scheme.

Data Availability Statement

The test image barbara in the experiments described in Sect. 5.1 is included in the software available at [47], while the jetplane one used in Sect. 5.2 can be downloaded from [49].

References

1. Bertero, M., Boccacci, P., Ruggiero, V.: Inverse Imaging with Poisson Data. IOP Publishing, Bristol (2018)
2. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
3. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
4. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R.S., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, pp. 185–212. Springer, New York, NY (2011)
5. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
6. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)
7. Bonettini, S., Loris, I., Porta, F., Prato, M., Rebegoldi, S.: On the convergence of a linesearch based proximal-gradient method for nonconvex optimization. Inverse Probl. 33(5), 055005 (2017)
8. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
9. Bonettini, S., Prato, M., Rebegoldi, S.: A block coordinate variable metric linesearch based proximal gradient method. Comput. Optim. Appl. 71(1), 5–52 (2018)
10. Chouzenoux, E., Pesquet, J.-C., Repetti, A.: A block coordinate variable metric forward-backward algorithm. J. Global Optim. 66(3), 457–485 (2016)
11. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165, 874–900 (2015)
12. Li, G., Pong, T.K.: Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159, 371–401 (2016)
13. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
14. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
15. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier 48(3), 769–783 (1998)
16. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
17. van den Dries, L.: Tame Topology and O-minimal Structures. Cambridge University Press, Cambridge (1998)
18. Bolte, J., Daniilidis, A.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 10, 556–572 (2007)
19. Absil, P.A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16(2), 531–547 (2005)
20. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1–2), 5–16 (2009)
21. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
22. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)
23. Ochs, P.: Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano. SIAM J. Optim. 29(1), 541–570 (2019)
24. Bonettini, S., Prato, M., Rebegoldi, S.: Convergence of inexact forward-backward algorithms using the forward-backward envelope. SIAM J. Optim. 30(4), 3069–3097 (2020)
25. Bonettini, S., Prato, M., Rebegoldi, S.: New convergence results for the inexact variable metric forward-backward method. Appl. Math. Comput. 392, 125719 (2021)
26. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
27. Polyak, B.: Introduction to Optimization. Optimization Software, Inc., Publications Division, New York (1987)
28. Ayers, G.R., Dainty, J.C.: Iterative blind deconvolution method and its applications. Opt. Lett. 13(7), 547–549 (1988)
29. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Structured sparsity through convex optimization. Stat. Sci. 27(4), 450–468 (2012)
30. Bonettini, S., Rebegoldi, S., Ruggiero, V.: Inertial variable metric techniques for the inexact forward-backward algorithm. SIAM J. Sci. Comput. 40(5), 3180–3210 (2018)
31. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
32. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
33. Chouzenoux, E., Pesquet, J.-C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)
34. Noll, D.: Convergence of non-smooth descent methods using the Kurdyka-Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)
35. Stella, L., Themelis, A., Patrinos, P.: Forward-backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. 67, 443–487 (2017)
36. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
37. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
38. Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)
39. Zălinescu, C.: Convex Analysis in General Vector Spaces. World Scientific Publishing, Singapore (2002)
40. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)
41. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)
42. Calatroni, L., Chambolle, A.: Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM J. Optim. 29(3), 1772–1798 (2019)
43. Scheinberg, K., Goldfarb, D., Bai, X.: Fast first-order methods for composite convex optimization with backtracking. Found. Comput. Math. 14, 389–417 (2014)
44. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, NJ (1970)
45. Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs for image restoration. In: Weickert, J., Hein, M., Schiele, B. (eds.) Pattern Recognition, pp. 271–281. Springer, Berlin, Heidelberg (2013)
46. Chen, Y., Ranftl, R., Pock, T.: Insights into analysis operator learning: from patch-based sparse models to higher-order MRFs. IEEE Trans. Image Process. 23(3), 1060–1072 (2014)
47. Bonettini, S., Ochs, P., Prato, M., Rebegoldi, S.: inertial inexact Proximal algorithm for nonconvex optimization (i2Piano) and inertial Proximal inexact linesearch algorithm (iPila) software. http://www.oasis.unimore.it/site/home/software.html (2021)
48. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
49. Repetti, A., Chouzenoux, E.: RestoVMFB Lab: Matlab Toolbox for Image Restoration with the Variable Metric Forward-Backward Algorithm. http://www-syscom.univ-mlv.fr/~chouzeno/Logiciel.html (2013)
50. Benning, M., Betcke, M.M., Ehrhardt, M.J., Schönlieb, C.-B.: Choose your path wisely: gradient descent in a Bregman distance framework. SIAM J. Imaging Sci. 14(2), 814–843 (2021)
51. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)

Funding

Silvia Bonettini, Marco Prato and Simone Rebegoldi are members of the INdAM research group GNCS. Peter Ochs acknowledges funding by the German Research Foundation (DFG Grant OC 150/3-1).

Author information

Corresponding author

Correspondence to Silvia Bonettini.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Appendix A: Convergence analysis of i\(^2\)Piano and iPila

In this section we analyze the convergence properties of algorithms i\(^2\)Piano and iPila, showing that they can both be considered as special cases of the abstract scheme presented in Sect. 3.

A.1 Preliminary results

We collect here new basic results concerning the inexactness criterion introduced in Sect. 4.1, which is incorporated in our proposed algorithms. The following lemma is a consequence of the strong convexity of the function \(h (\cdot ;x,s)\) defined in (15), and will often be employed in the following. In order to simplify the notation, here we omit the iteration index k.

Lemma 8

Suppose that Assumptions [A1]–[A2] hold true. For a given pair \((x,s)\in \textrm{dom}(f_1)\times {\mathbb {R}}^n\), let \({\hat{y}}, {\tilde{y}}\) be defined as in (16)–(18). Then, the following inequalities hold.

$$\begin{aligned} \frac{1}{2\alpha } \Vert {\hat{y}}-x\Vert ^2\le & {} \left( 1+\frac{\tau }{2}\right) (-h( {\tilde{y}};x,s))\end{aligned}$$
(36)
$$\begin{aligned} \frac{1}{2\alpha } \Vert {\tilde{y}}- {\hat{y}}\Vert ^2\le & {} \frac{\tau }{2} (-h( {\tilde{y}};x,s))\end{aligned}$$
(37)
$$\begin{aligned} \frac{\theta }{2\alpha } \Vert {\tilde{y}}-x\Vert ^2\le & {} (-h( {\tilde{y}};x,s)),\ \ \text{ with } \theta = 1/\left( \sqrt{1+\frac{\tau }{2}}+\sqrt{\frac{\tau }{2}}\right) ^2\le 1. \end{aligned}$$
(38)

Proof

Inequalities (36)–(37) follow by combining the strong convexity of the function \(h(\cdot ;x,s)\) and condition (18) as in [25, Lemma 2]. As for (38), we have

$$\begin{aligned} \frac{1}{2\alpha }\Vert {\tilde{y}}-x\Vert ^2= & {} \frac{1}{2\alpha }\Vert {\tilde{y}}- {\hat{y}}+ {\hat{y}}-x\Vert ^2 = \frac{1}{2\alpha }\Vert {\tilde{y}}- {\hat{y}}\Vert ^2 + \frac{1}{2\alpha }\Vert {\hat{y}}-x\Vert ^2 + \frac{1}{\alpha }\langle {\tilde{y}}- {\hat{y}}, {\hat{y}}-x\rangle \\\le & {} \frac{1}{2\alpha }\Vert {\tilde{y}}- {\hat{y}}\Vert ^2 + \frac{1}{2\alpha }\Vert {\hat{y}}-x\Vert ^2 + \frac{1}{\alpha } \Vert {\tilde{y}}- {\hat{y}}\Vert \cdot \Vert {\hat{y}}-x\Vert \\\le & {} \left( 1+\frac{\tau }{2}\right) (-h( {\tilde{y}};x,s)) + \frac{\tau }{2} (-h( {\tilde{y}};x,s)) + 2\sqrt{1+\frac{\tau }{2}}\sqrt{\frac{\tau }{2}}(-h( {\tilde{y}};x,s))\\= & {} \left( \sqrt{1+\frac{\tau }{2}}+\sqrt{\frac{\tau }{2}}\right) ^2(-h( {\tilde{y}};x,s)), \end{aligned}$$

where the last inequality follows from the application of (36)–(37). \(\square\)

The next lemma provides a subgradient \(\hat{v}\in \partial f(\hat{y})\) whose norm is bounded from above by a quantity containing \(\sqrt{ - h( {\tilde{y}};x,s)}\). Its proof is omitted since it is almost identical to the one of Lemma 3 in [25].

Lemma 9

Suppose Assumptions [A1]–[A3] hold true. Let x be a point in \(\textrm{dom}(f_1)\) and let \({\hat{y}}, {\tilde{y}}\) be defined as in (16)–(18). Moreover, assume that \(\alpha \in [\alpha _{min},\alpha _{max}]\), with \(0<\alpha _{min}\le \alpha _{max}\) and \(\beta \in [0,\beta _{max}]\), with \(\beta _{max}\ge 0\). Then, there exists a subgradient \({\hat{v}}\in \partial f( {\hat{y}})\) such that

$$\begin{aligned} \Vert {\hat{v}}\Vert\le & {} p(\Vert {\hat{y}}-x\Vert + \Vert x-s\Vert ) \end{aligned}$$
(39)
$$\begin{aligned}\le & {} q(\sqrt{ - h( {\tilde{y}};x,s)} + \Vert x-s\Vert ) , \end{aligned}$$
(40)

where the two constants \(p, q\) depend only on \(\alpha _{min},\alpha _{max},\beta _{max}\) and on the Lipschitz constant L.

The following lemma is the equivalent of Lemmas 4–5 in [25].

Lemma 10

Suppose Assumptions [A1]–[A3] hold true and assume \(0<\alpha _{min}\le \alpha \le \alpha _{max}\), \(\beta \in [0,\beta _{max}]\). Let \((x,s)\) be a point in \(\textrm{dom}(f_1)\times {\mathbb {R}}^n\) and let \({\hat{y}}, {\tilde{y}}\) be defined as in (16)–(18), for some \(\tau \ge 0\). Then, there exist \(c,d,{\bar{c}},{\bar{d}} \in {\mathbb {R}}\) depending only on \(\alpha _{min},\alpha _{max},\beta _{max},\tau\) such that

$$\begin{aligned}{} & {} f( {\hat{y}})\ge f( {\tilde{y}}) +ch( {\tilde{y}};x,s) - d \Vert x-s\Vert ^2 \end{aligned}$$
(41)
$$\begin{aligned}{} & {} f( {\hat{y}})\le f(x) - {\bar{c}} h( {\tilde{y}};x,s)+ {\bar{d}}\Vert x-s\Vert ^2. \end{aligned}$$
(42)

Proof

From the Descent Lemma [51, Proposition A.24] we have

$$\begin{aligned} f_0( {\hat{y}})\ge f_0( {\tilde{y}}) - \langle \nabla f_0( {\hat{y}}), {\tilde{y}}- {\hat{y}}\rangle -\frac{L}{2} \Vert {\tilde{y}}- {\hat{y}}\Vert ^2. \end{aligned}$$
(43)

The inclusion \(0\in \partial _\epsilon h( {\tilde{y}};x,s)\) in (20) implies that there exists a vector \(e\in {\mathbb {R}}^n\) with

$$\begin{aligned} \frac{1}{2\alpha } \Vert e\Vert ^2 \le \epsilon \end{aligned}$$
(44)

such that

$$\begin{aligned} -\frac{1}{\alpha }( {\tilde{y}}-x+\alpha \nabla f_0(x) -\beta (x-s)+ e)\in \partial _{\epsilon }f_1( {\tilde{y}}) \end{aligned}$$

(see [7] and references therein). The definition of \(\epsilon\)-subdifferential implies

$$\begin{aligned} f_1( {\hat{y}})\ge f_1( {\tilde{y}}) -\frac{1}{\alpha }\langle {\tilde{y}}-x, {\hat{y}}- {\tilde{y}}\rangle +\frac{\beta }{\alpha }\langle {\hat{y}}- {\tilde{y}},x-s\rangle -\langle \nabla f_0(x), {\hat{y}}- {\tilde{y}}\rangle - \frac{1}{\alpha }\langle e, {\hat{y}}- {\tilde{y}}\rangle - \epsilon . \end{aligned}$$
(45)

Summing inequalities (43) and (45) yields

$$\begin{aligned}{} & {} f( {\hat{y}})\ge f( {\tilde{y}})-\langle \nabla f_0(x)- \nabla f_0( {\hat{y}}), {\hat{y}}- {\tilde{y}}\rangle +\frac{\beta }{\alpha }\langle {\hat{y}}- {\tilde{y}},x-s\rangle \nonumber \\{} & {} -\frac{1}{\alpha }\langle {\tilde{y}}-x, {\hat{y}}- {\tilde{y}}\rangle - \frac{1}{\alpha }\langle e, {\hat{y}}- {\tilde{y}}\rangle -\frac{L}{2} \Vert {\tilde{y}}- {\hat{y}}\Vert ^2 -\epsilon . \end{aligned}$$
(46)

Now we consider each term on the right-hand side of the above inequality so as to obtain a lower bound. Using the Cauchy-Schwarz inequality, Assumption [A3], (36) and (37) we obtain

$$\begin{aligned} \langle \nabla f_0(x)- \nabla f_0( {\hat{y}}), {\hat{y}}- {\tilde{y}}\rangle\le & {} \Vert \nabla f_0(x)- \nabla f_0( {\hat{y}})\Vert \Vert {\hat{y}}- {\tilde{y}}\Vert \nonumber \\\le & {} L\alpha _{max}\sqrt{2\tau \left( 1+\frac{\tau }{2}\right) }(-h( {\tilde{y}};x,s)). \end{aligned}$$
(47)

Similarly, using again the Cauchy-Schwarz inequality, (37) and (38), we can write

$$\begin{aligned} \frac{1}{\alpha }\langle {\tilde{y}}-x, {\hat{y}}- {\tilde{y}}\rangle\le & {} \frac{1}{\alpha _{min}}\Vert {\tilde{y}}-x\Vert \Vert {\hat{y}}- {\tilde{y}}\Vert \le \frac{\alpha _{max}}{\alpha _{min}}\sqrt{\frac{2\tau }{\theta }}(-h( {\tilde{y}};x,s)). \end{aligned}$$
(48)

Moreover, from (44) and (20) we obtain \(\Vert e\Vert \le \sqrt{2\alpha _{max}\epsilon }\le \sqrt{\alpha _{max}\tau (-h( {\tilde{y}};x,s))}\) which, using also (37), yields

$$\begin{aligned} \frac{1}{\alpha }\langle e, {\hat{y}}- {\tilde{y}}\rangle\le & {} \frac{1}{\alpha } \Vert e\Vert \Vert {\hat{y}}- {\tilde{y}}\Vert \le \frac{\alpha _{max}\tau }{\alpha _{min}}(-h( {\tilde{y}};x,s)). \end{aligned}$$
(49)

Finally, using (37), we can also write

$$\begin{aligned} \frac{\beta }{\alpha }\langle {\hat{y}}- {\tilde{y}},x-s\rangle \ge - \frac{\beta }{2\alpha }(\Vert {\hat{y}}- {\tilde{y}}\Vert ^2 + \Vert x-s\Vert ^2)\ge -\frac{\beta _{max}}{2\alpha _{min}}\left( -\alpha _{max}\tau h( {\tilde{y}};x,s) + \Vert x-s\Vert ^2\right) . \end{aligned}$$
(50)

Combining (46) with (38), (47), (48), (49), (50) and (20) gives (41) with

$$\begin{aligned} c = L\alpha _{max}\sqrt{2\tau \left( 1+\frac{\tau }{2}\right) }+\frac{\alpha _{max}}{\alpha _{min}}\sqrt{\frac{2\tau }{\theta }}+\frac{\alpha _{max}\tau }{\alpha _{min}}+\frac{L \alpha _{max}\tau }{2}+\frac{\tau }{2}+\frac{\beta _{max}\alpha _{max}\tau }{2\alpha _{min}}, \ \ d = \frac{\beta _{max}}{2\alpha _{min}}. \end{aligned}$$

As for (42), using the Descent Lemma we obtain

$$\begin{aligned} f_0( y) \le f_0(x) + \langle \nabla f_0(x),y - x\rangle + \frac{L}{2} \Vert y - x\Vert ^2, \end{aligned}$$

for all \(x, y\in \textrm{dom}(f_1)\). Summing \(f_1(y)\) on both sides yields

$$\begin{aligned}{} & {} f(y) \le f(x) + f_1(y)-f_1(x) + \langle \nabla f_0(x),y - x\rangle + \frac{L}{2} \Vert y - x\Vert ^2\\{} & {} \le f(x) + h(y;x,s) + \frac{L}{2} \Vert y - x\Vert ^2+ \frac{\beta }{\alpha }\langle x-s,y - x\rangle \\{} & {} \le f(x) + h(y;x,s) + \frac{L}{2} \Vert y - x\Vert ^2+ \frac{\beta }{2\alpha }(\Vert x-s\Vert ^2 + \Vert y-x\Vert ^2). \end{aligned}$$

From the previous inequality with \(y= {\hat{y}}\), recalling that \(h( {\hat{y}};x,s)\le 0\) and combining with (36) yields (42), where the constants are set as \({\bar{c}} = (L\alpha _{max}+\beta _{max}) (1+{\tau }/{2})\), \({\bar{d}} = {\beta _{max}}/{(2\alpha _{min})}\). \(\square\)
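For later reference, the following minimal Python sketch simply collects the closed-form constants appearing in the proof of Lemma 10; the function name and grouping are ours, while the formulas are those displayed above, with \(\theta\) as in (38).

```python
from math import sqrt

# Constants c, d of (41) and c_bar, d_bar of (42), copied from the closed forms
# derived in the proof of Lemma 10; theta is the quantity defined in (38).
def lemma10_constants(alpha_min, alpha_max, beta_max, tau, L):
    theta = 1.0 / (sqrt(1.0 + tau / 2.0) + sqrt(tau / 2.0)) ** 2
    c = (L * alpha_max * sqrt(2.0 * tau * (1.0 + tau / 2.0))
         + (alpha_max / alpha_min) * sqrt(2.0 * tau / theta)
         + alpha_max * tau / alpha_min
         + L * alpha_max * tau / 2.0
         + tau / 2.0
         + beta_max * alpha_max * tau / (2.0 * alpha_min))
    d = beta_max / (2.0 * alpha_min)
    c_bar = (L * alpha_max + beta_max) * (1.0 + tau / 2.0)
    d_bar = beta_max / (2.0 * alpha_min)
    return c, d, c_bar, d_bar
```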

A.2 Convergence analysis of i\(^2\)Piano

Our aim now is to frame i\(^2\)Piano in the abstract scheme of Sect. 3, so as to enjoy the favourable convergence guarantees that are provided by Theorem 5. The line of the proof developed in this section is based on an extension of the arguments in [23]. We start the convergence analysis by showing that i\(^2\)Piano is well-posed and that its parameters satisfy some useful relations.

Lemma 11

The loop between STEP 2 and STEP 6 terminates in a finite number of steps. In particular, there exists \(\overline{L}>0\) such that \(L_k\le \overline{L}\), \(\forall \ k\ge 0\). Moreover, we have

$$\begin{aligned} 0\le \beta _k\le \frac{1+\theta \omega }{2}, \ \ \forall \ k\ge 0 \end{aligned}$$
(51)

and there exist two positive constants \(\alpha _{min},\alpha _{max}\) with \(0<\alpha _{min}\le \alpha _{max}\) such that \(\alpha _k\in [\alpha _{min},\alpha _{max}]\), \(\forall k\ge 0\). We also have

$$\begin{aligned} \frac{1+\theta \omega }{2\alpha _k}-\frac{L_k}{2}-\frac{\beta _k}{2\alpha _k}= & {} \delta \end{aligned}$$
(52)
$$\begin{aligned} \delta -\frac{\beta _k}{2\alpha _k}= & {} \gamma . \end{aligned}$$
(53)

Proof

Since \(\eta >1\), after a finite number of steps the tentative value of \(L_k\) satisfies \(L_k\ge L\), where L is the Lipschitz constant of \(\nabla f_0\). Then, from the Descent Lemma, the inequality at Step 6 is satisfied. From \(\delta \ge \gamma\) we have \(b_k\ge 1\), which implies (51). A simple inspection shows that the following equalities hold:

$$\begin{aligned} b_k=\frac{1+\theta \omega -{\beta _k}}{1+\theta \omega -2\beta _k}\Rightarrow \frac{L_k+2\delta }{L_k+2\gamma }=\frac{1+\theta \omega -{\beta _k}}{1+\theta \omega -2\beta _k}, \end{aligned}$$

which leads to rewriting the parameter \(\alpha _k\) as

$$\begin{aligned} \alpha _k=\frac{1+\theta \omega -\beta _k}{L_k+2\delta }. \end{aligned}$$
(54)

Then there holds \(\alpha _k\ge \alpha _{min}\) with \(\alpha _{min} = (1+\theta \omega )/(2(\overline{L} + 2\delta ))\) and, since \(\theta \omega \le 1\), we also have \(\alpha _k\le \alpha _{max}\) with \(\alpha _{max} = 2/L_{min}\). Moreover, we have

$$\begin{aligned} \frac{1+\theta \omega }{2\alpha _k}-\frac{L_k}{2}-\frac{\beta _k}{2\alpha _k}=\frac{1+\theta \omega -\beta _k}{2\alpha _k}-\frac{L_k}{2}=\frac{L_k+2\delta }{2}-\frac{L_k}{2}=\delta \end{aligned}$$

and

$$\begin{aligned} \delta -\frac{\beta _k}{2\alpha _k}=\frac{1+\theta \omega }{2\alpha _k}-\frac{L_k}{2}-\frac{\beta _k}{\alpha _k}=\frac{1+\theta \omega -2\beta _k}{2\alpha _k}-\frac{L_k}{2}=\frac{L_k+2\gamma }{2}-\frac{L_k}{2}=\gamma . \end{aligned}$$

\(\square\)

Notice that the case \(\tau =0\), which corresponds to the exact computation of the inertial proximal gradient point at Step 5, implies \(\theta = 1\) and, choosing \(\omega =1\), the parameter settings in i\(^2\)Piano are exactly the same as in [6]. The need for introducing the parameter \(\omega\) is mainly technical and will be explained in the following.
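As a sanity check of the relations just derived, the following Python sketch computes \(\alpha _k\) and \(\beta _k\) from \(L_k,\delta ,\gamma ,\theta ,\omega\) via (54) and the identity \(b_k=(1+\theta \omega -\beta _k)/(1+\theta \omega -2\beta _k)\), and then verifies (52)–(53) numerically; the closed form for \(\beta _k\) is obtained by solving that identity, and the sample values are illustrative only, not taken from the paper.

```python
import numpy as np

def i2piano_parameters(L_k, delta, gamma, theta, omega):
    # b_k as in the proof of Lemma 11, then beta_k from
    # b_k = (1 + theta*omega - beta_k)/(1 + theta*omega - 2*beta_k),
    # and alpha_k from Eq. (54).
    b_k = (L_k + 2.0 * delta) / (L_k + 2.0 * gamma)
    beta_k = (b_k - 1.0) * (1.0 + theta * omega) / (2.0 * b_k - 1.0)
    alpha_k = (1.0 + theta * omega - beta_k) / (L_k + 2.0 * delta)
    return alpha_k, beta_k

# Illustrative values (delta >= gamma > 0, theta*omega <= 1).
L_k, delta, gamma, theta, omega = 4.0, 1.0, 0.5, 0.8, 0.9
alpha_k, beta_k = i2piano_parameters(L_k, delta, gamma, theta, omega)
lhs_52 = (1.0 + theta * omega) / (2.0 * alpha_k) - L_k / 2.0 - beta_k / (2.0 * alpha_k)
lhs_53 = delta - beta_k / (2.0 * alpha_k)
print(np.isclose(lhs_52, delta), np.isclose(lhs_53, gamma))   # True True
```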

We now prove that condition [H1] holds for i\(^2\)Piano when the corresponding merit function \(\Phi\) is defined as follows:

$$\begin{aligned}&\Phi :{\mathbb {R}}^n\times {\mathbb {R}}^n \rightarrow \overline{{\mathbb {R}}},\ \ \ \ \Phi (x,s) = f(x) + \delta \Vert x-s\Vert ^2. \end{aligned}$$
(55)

Proposition 12

Let \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) be the sequence generated by i\(^2\)Piano. Then there holds

$$\begin{aligned} \Phi (x^{(k+1)},x^{(k)}) \le \Phi (x^{(k)},x^{(k-1)}) -\gamma \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2+(1-\omega ) h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$
(56)

Consequently, there exist \(\{s^{(k)}\}_{k\in {\mathbb {N}}}\), \(\{d_k\}_{k\in {\mathbb {N}}}\), \(\{a_k\}_{k\in {\mathbb {N}}}\) such that [H1] holds with \(\Phi\) as in (55).

Proof

Adding the quantity \(f_1( { x}^{(k+1)})\) to both sides of inequality (24) we obtain

$$\begin{aligned}{} & {} f( { x}^{(k+1)}) \le f( { x}^{(k)}) \!+\! f_1( { x}^{(k+1)})-f_1( { x}^{(k)}) \!+\! \langle \nabla f_0( { x}^{(k)}), { x}^{(k+1)}- { x}^{(k)}\rangle \!+\! \frac{L_k}{2}\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\\{} & {} = f( { x}^{(k)}) + h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) - \left( \frac{1}{2{\alpha _k}}-\frac{L_k}{2}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 \\{} & {} + \frac{\beta _k}{\alpha _k}\langle { x}^{(k+1)}- { x}^{(k)}, { x}^{(k)}- { x}^{(k-1)}\rangle \\{} & {} \le f( { x}^{(k)}) + h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) - \left( \frac{1}{2{\alpha _k}}-\frac{L_k}{2}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\\{} & {} + \frac{\beta _k}{2\alpha _k}(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2+\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2)\\{} & {} \le f( { x}^{(k)})+(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) -\frac{\theta \omega }{2\alpha _k}\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 \\{} & {} - \left( \frac{1}{2{\alpha _k}}-\frac{L_k}{2}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2+ \frac{\beta _k}{2\alpha _k}(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2+\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2)\\{} & {} = f( { x}^{(k)}) - \left( \frac{1+\theta \omega }{2{\alpha _k}}-\frac{L_k}{2}-\frac{\beta _k}{2\alpha _k}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\\{} & {} +(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) + \frac{\beta _k}{2\alpha _k}\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 \end{aligned}$$

where the first equality is obtained by adding and subtracting the quantity \(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2/(2{\alpha _k}) +\beta _k/\alpha _k\langle { x}^{(k+1)}- { x}^{(k)}, { x}^{(k)}- { x}^{(k-1)}\rangle\) on the right-hand side, the subsequent inequality follows from the basic relation \(2\langle a,b\rangle \le \Vert a\Vert ^2+\Vert b\Vert ^2\), and the next one from (38). Recalling (52), the above inequality can be conveniently rewritten as

$$\begin{aligned} f(x^{(k+1)})+\delta \Vert x^{(k+1)}&-x^{(k)}\Vert ^2 \le f(x^{(k)})+\delta \Vert x^{(k)}-x^{(k-1)}\Vert ^2\\&\ \ +\left( \frac{\beta _k}{2\alpha _k}-\delta \right) \Vert x^{(k)}-x^{(k-1)}\Vert ^2 +(1-\omega )h( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$

Finally, exploiting (53), we obtain condition [H1] with \(\Phi\) given in (55), \(a_k=1\) and

$$\begin{aligned} s^{(k)}= & {} { x}^{(k-1)}\end{aligned}$$
(57)
$$\begin{aligned} d_k^2= & {} \gamma \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 -(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$
(58)

\(\square\)

Under Assumption [A4], \(\Phi\) is bounded from below; hence, condition [H1] implies

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert x^{(k)}-x^{(k-1)}\Vert = 0 \end{aligned}$$
(59)

and, if \(\omega < 1\), also

$$\begin{aligned} \lim _{k\rightarrow \infty }h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) = 0. \end{aligned}$$
(60)

The choice \(\omega < 1\) is enforced when \(\tau > 0\) in order to obtain (60). If we take \(\omega =1\), with the same arguments as above we still obtain (59). It is also worth noticing that, for large values of \(\tau\), that is when a coarser accuracy is allowed in the computation of \({\tilde{y}}^{(k)}\), the parameter \(\theta\) can be very small and this also influences the choice of \(\alpha _k\) and \(\beta _k\).
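To illustrate the last remark, here is a short sketch (values purely illustrative) showing how \(\theta\) from (38) shrinks as the inexactness level \(\tau\) grows:

```python
from math import sqrt

# theta(tau) = 1/(sqrt(1+tau/2) + sqrt(tau/2))^2, see (38): the exact case tau = 0
# gives theta = 1, while a coarse accuracy (large tau) makes theta very small.
for tau in [0.0, 0.1, 1.0, 10.0, 100.0]:
    theta = 1.0 / (sqrt(1.0 + tau / 2.0) + sqrt(tau / 2.0)) ** 2
    print(f"tau = {tau:7.1f}   theta = {theta:.4f}")
```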

In order to prove condition [H2], let us now introduce the second merit function as follows

$$\begin{aligned} {{\mathcal {F}}}:{\mathbb {R}}^n\times {\mathbb {R}}\rightarrow \overline{{\mathbb {R}}},\ \ \ \ {{\mathcal {F}}}(u,\rho ) = f(u) +\frac{1}{2} \rho ^2. \end{aligned}$$
(61)
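For concreteness, a minimal sketch of the two merit functions (55) and (61) (Python; f is a placeholder callable and the function names are ours):

```python
import numpy as np

def Phi(f, delta, x, s):
    # merit function of (55): objective value plus a proximity term
    return f(x) + delta * np.linalg.norm(x - s) ** 2

def F(f, u, rho):
    # second merit function of (61), denoted \mathcal{F} in the text
    return f(u) + 0.5 * rho ** 2
```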

The following result is proved using arguments similar to those of Lemmas 4–5 in [25].

Proposition 13

Let \(\{ { x}^{(k)}\}_{k\in {\mathbb {N}}}\) be the sequence generated by i\(^2\)Piano and, for each k, let the point \(y^{(k)}\) be defined as

$$\begin{aligned} y^{(k)}= \mathop {\textrm{argmin}}\limits _{y\in {\mathbb {R}}^n} h^{(k)}(y; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$
(62)

Then, there exist \(\{\rho ^{(k)}\}_{k\in {\mathbb {N}}}\), \(\{r_k\}_{k\in {\mathbb {N}}}\) such that condition [H2] holds with \(\Phi\) defined in (55), \({\mathcal {F}}\) defined in (61), \(s^{(k)}\) defined in (57) and \(u^{(k)}=\hat{x}^{(k+1)}\).

Proof

When \(\omega =1\) we necessarily have \(\tau = 0\), which means \({\tilde{y}}^{(k)}= \hat{x}^{(k+1)}\). Therefore, [H2] directly follows from [H1] with \(u^{(k)}= { x}^{(k+1)}\), \(\rho ^{(k)} = \sqrt{2\delta } \Vert { x}^{(k)}- { x}^{(k-1)}\Vert\), \(r_k=0\). Consider now the case \(\omega <1\). From (41) and (42), we directly obtain

$$\begin{aligned} f(y^{(k)})\ge & {} f({\tilde{y}}^{(k)}) + ch^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) - d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2\\ f(y^{(k)})\le & {} f( { x}^{(k)}) - {\bar{c}} h({\tilde{y}}^{(k)}; { x}^{(k)}, { x}^{(k-1)})+ {\bar{d}}\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2, \end{aligned}$$

where \(c,d,{\bar{c}},{\bar{d}}\) are defined as in Lemma 10 and do not depend on k. Combining the two inequalities above we obtain

$$\begin{aligned} f(&{\tilde{y}}^{(k)}) + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\le \\&\le f(y^{(k)}) + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 { -ch^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2\\&\le f( { x}^{(k)}) - (c+{\bar{c}}) h({\tilde{y}}^{(k)}; { x}^{(k)}, { x}^{(k-1)})+ (d+{\bar{d}})\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 . \end{aligned}$$

Recalling the definition of \({\mathcal {F}}\) in (61), the above inequalities can be rewritten as

$$\begin{aligned} f({\tilde{y}}^{(k)}) + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\le {{\mathcal {F}}}(y^{(k)},\rho ^{(k)}) \le f( { x}^{(k)}) + \delta \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 + r_k \end{aligned}$$
(63)

where \(\rho ^{(k)}, r_k\) are given by

$$\begin{aligned} \rho ^{(k)}= & {} \sqrt{2}(\delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 { -ch^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 )^{\frac{1}{2}} \nonumber \\{} & {} \nonumber \\ r_k= & {} - (c+{\bar{c}}) h({\tilde{y}}^{(k)}; { x}^{(k)}, { x}^{(k-1)})+ (d+{\bar{d}})\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2+ \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2. \end{aligned}$$
(64)

From (59)–(60) we obtain \(\displaystyle \lim _{k\rightarrow \infty } r_k = 0\), hence, assumption [H2] is satisfied with the above settings and with \(u^{(k)}= y^{(k)}\). \(\square\)

Next we show that condition [H3] holds for i\(^2\)Piano. The following result combines elements of Lemma 3 in [25] and Lemma 17 in [23].

Proposition 14

There exist \(b>0\), \(I\subset {\mathbb {Z}}\), \(\{\theta _i\}_{i\in I}\) such that condition [H3] holds with \(\{u^{(k)}\}_{k\in {\mathbb {N}}},\{\rho ^{(k)}\}_{k\in {\mathbb {N}}},\{d_k\}_{k\in {\mathbb {N}}}\) defined as in Proposition 13 and in (58), \(\zeta _{k+1}=0\) and \(b_{k+1}=1\).

Proof

From the separable structure of \({{\mathcal {F}}}\), if \(v\in \partial f(u)\) and \(\rho \in {\mathbb {R}}\), we have that \((v,\rho )\in \partial {{\mathcal {F}}}(u,\rho )\). In particular,

$$\begin{aligned} \Vert \partial {{\mathcal {F}}}(u,\rho )\Vert _- \le \Vert v\Vert + |\rho |, \text{ for } \text{ all } v\in \partial f(u), \rho \in {\mathbb {R}}. \end{aligned}$$
(65)

If \(\omega = 1\), we are in the case \(\tau = 0\). This means that \({ x}^{(k+1)}= y^{(k)}\), \(\rho ^{(k)}= \sqrt{2\delta } \Vert { x}^{(k)}- { x}^{(k-1)}\Vert\). Moreover, (58) reduces to \(d_k = \sqrt{\gamma }\Vert { x}^{(k)}- { x}^{(k-1)}\Vert = \sqrt{\gamma /(2\delta )}\rho ^{(k)}\). From (39), we have that there exists a subgradient \({\hat{v}}^{(k+1)}\in \partial f(y^{(k)})\) and a positive constant p such that

$$\begin{aligned} \Vert {\hat{v}}^{(k+1)}\Vert\le & {} p(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert + \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ) = \frac{p}{\sqrt{\gamma }}(d_{k+1} + d_{k}). \end{aligned}$$

Hence, \(\Vert {\hat{v}}^{(k+1)}\Vert +\rho ^{(k)}\le p d_{k+1}/{\sqrt{\gamma }} + d_k(p+\sqrt{2\delta })/{\sqrt{\gamma }}\) and, recalling (65), [H3] follows with \(b = (p+\sqrt{2\delta })/{\sqrt{\gamma }}\), \(I=\{0,1\}\), \(\theta _0=\theta _1=\frac{1}{2}\). Consider now the case \(\omega < 1\). From (40), there exists a subgradient \({\hat{v}}^{(k+1)}\in \partial f(y^{(k)})\) and a positive constant q such that

$$\begin{aligned}{} & {} \Vert {\hat{v}}^{(k+1)}\Vert \le q\sqrt{-h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + q\Vert { x}^{(k)}- { x}^{(k-1)}\Vert \nonumber \\{} & {} \le q\sqrt{-h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + q\sqrt{\frac{2\alpha _{max}}{\theta }}\sqrt{-h^{(k-1)}( { x}^{(k)}; { x}^{(k-1)},x^{(k-2)})}\nonumber \\{} & {} \le \frac{q}{\sqrt{1-\omega }}\sqrt{-(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})+\gamma \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2} \nonumber \\{} & {} + q\sqrt{\frac{2\alpha _{max}}{\theta (1-\omega )}}\sqrt{-(1-\omega )h^{(k-1)}( { x}^{(k)}; { x}^{(k-1)},x^{(k-2)})+\gamma \Vert { x}^{(k-1)}-x^{(k-2)}\Vert ^2} \nonumber \\{} & {} \le \frac{q}{\sqrt{1-\omega }} d_k + q\sqrt{\frac{2\alpha _{max}}{\theta (1-\omega )}}d_{k-1} . \end{aligned}$$
(66)

From (64) and (38) we have

$$\begin{aligned}{} & {} \rho ^{(k)}\le \sqrt{2}\left( -(2\delta \alpha _{max}/\theta + c)h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) + d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 \right) ^{\frac{1}{2}} \nonumber \\{} & {} \le r d_k \end{aligned}$$
(67)

where \(r = \sqrt{2} \max \left\{ (2\delta \alpha _{max}/\theta +c)/(1-\omega ), d/\gamma \right\} ^{\frac{1}{2}}\). Then, combining (66) with (67), in view of (65) we obtain

$$\begin{aligned} \Vert \partial {{\mathcal {F}}}(y^{(k)},\rho ^{(k)})\Vert _-\le \frac{b}{2}(d_k+d_{k-1}),\ \ \text{ with } b = 2\max \left\{ \frac{q}{\sqrt{1-\omega }}+r,q\sqrt{\frac{2\alpha _{max}}{\theta (1-\omega )}}\right\} \end{aligned}$$

and the thesis follows with \(I = \{1,2\}\), \(\theta _1=\theta _2 = \frac{1}{2}\). \(\square\)

The following result holds for all methods whose iterates satisfy [H1], [H2], [H3] with \({{\mathcal {F}}}\) defined as in (61), and it is crucial to ensure condition [H6] for i\(^2\)Piano.

Proposition 15

Let Assumptions [A1]–[A4] be satisfied and assume that \(\{ { x}^{(k)}\}_{k\in {\mathbb {N}}}\) satisfies [H1], [H2], [H3] with \({{\mathcal {F}}}\) defined as in (61). If \(\{(x^{(k_j)},\rho ^{(k_j)})\}_{j\in {\mathbb {N}}}\) is a subsequence of \(\{( { x}^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converging to some \((x^*,\rho ^*)\in {\mathbb {R}}^n\times {\mathbb {R}}\) such that \(\lim _{j\rightarrow \infty }\Vert u^{(k_j)}-x^{(k_j)}\Vert =0\), then \(\rho ^*=0\) and \(\lim _{j\rightarrow \infty }{{\mathcal {F}}}( u^{(k_j)},\rho ^{(k_j)}) = {{\mathcal {F}}}(x^*,\rho ^*)\).

Proof

Recalling that [H1] implies \(\displaystyle \lim _{k\rightarrow \infty } d_k=0\), from [H3] and thanks to the separable structure of \({\mathcal {F}}\), we obtain that there exists \({\hat{v}}^{(k)}\in \partial f(u^{(k)})\) such that \(\displaystyle \lim _{k\rightarrow \infty } \Vert {\hat{v}}^{(k)}\Vert =\displaystyle \lim _{k\rightarrow \infty }\rho ^{(k)}= 0\). In particular, in view of (12), we can write \({\hat{v}}^{(k)}= \nabla f_0( u^{(k)}) + w^{(k)}\), where \(w^{(k)}\in \partial f_1( u^{(k)})\). Therefore, by continuity of \(\nabla f_0\), the following implication holds

$$\begin{aligned} \lim _{k\rightarrow \infty }{\hat{v}}^{(k)}=0\Rightarrow \lim _{j\rightarrow \infty } w^{(k_j)} = -\nabla f_0(x^*). \end{aligned}$$

Adding the quantity \(\frac{1}{2} (\rho ^{(k_j)})^2\) to both sides of the subgradient inequality yields

$$\begin{aligned} f_1(x^*) + \frac{1}{2} (\rho ^{(k_j)})^2\ge & {} f_1( u^{(k_j)}) + \langle w^{(k_j)},x^*-u^{(k_j)}\rangle + \frac{1}{2} (\rho ^{(k_j)})^2\\= & {} {{\mathcal {F}}}( u^{(k_j)},\rho ^{(k_j)}) -f_0(u^{(k_j)}) + \langle w^{(k_j)}, x^*-u^{(k_j)}\rangle \end{aligned}$$

where the last equality is obtained by adding and subtracting \(f_0(u^{(k_j)})\) to the right-hand-side. Taking limits on both sides we obtain

$$\begin{aligned} f_1(x^*) +\frac{1}{2} (\rho ^*)^2\ge \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) -f_0(x^*), \end{aligned}$$

which, rearranging terms, gives \(\displaystyle \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) \le {{\mathcal {F}}}(x^*,\rho ^*)\). On the other hand, \({\mathcal {F}}\) is lower semicontinuous by assumption, therefore \(\displaystyle \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) \ge {{\mathcal {F}}}(x^*,\rho ^*)\), which completes the proof. \(\square\)

We are now ready to prove Theorem 6, which states the convergence of the i\(^2\)Piano iterates to a stationary point.

Proof

(Proof of Theorem 6) By Propositions 12-13-14, we know that conditions [H1]-[H2]-[H3] hold for i\(^2\)Piano. Furthermore, if \(\omega =1\), then \(u^{(k)}= {\hat{x}}^{(k+1)}= { x}^{(k+1)}\), and (59) directly implies that \(\displaystyle \lim _{k\rightarrow \infty } \Vert u^{(k)}- { x}^{(k)}\Vert = 0\). If \(\omega <1\), using (36) and (60) we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert u^{(k)}- { x}^{(k)}\Vert&=\lim _{k\rightarrow \infty } \Vert y^{(k)}- { x}^{(k)}\Vert \\&\le \lim _{k\rightarrow \infty }\sqrt{-2\alpha _{max}\left( 1+\frac{\tau }{2}\right) h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})}= 0. \end{aligned}$$

Then, in both cases, the assumptions of Proposition 15 are satisfied and, therefore, condition [H6] holds. Finally, condition [H4] follows from (58), while condition [H5] is trivially satisfied, since both sequences \(\{a_k\}_{k\in {\mathbb {N}}}\) and \(\{b_k\}_{k\in {\mathbb {N}}}\) are constant. Then, Theorem 5 applies and guarantees that the sequence \(\{(x^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converges to a stationary point \((x^*,\rho ^*)\) of \({\mathcal {F}}\). Note that, since \({\mathcal {F}}\) is the sum of separable functions, its subdifferential can be written as \(\partial {\mathcal {F}}(x,\rho )=\partial f(x)\times \{\rho \}\). Then, \((x^*,\rho ^*)\) is stationary for \({\mathcal {F}}\) if and only if \(\rho ^*=0\) and \(0\in \partial f(x^*)\). Hence, \(x^*\) is a stationary point for f and \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) converges to it. \(\square\)
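To make the structure of the method more tangible, here is a plausible Python schematic of one i\(^2\)Piano-type iteration in the exact case \(\tau =0\) (so \(\theta =1\) and the proximal point is computed exactly), assembled from Lemma 11 and Proposition 12. It is not the authors' listing: grad_f0, prox_f1 and all numerical values are placeholders, and the inexactness criterion of Sect. 4.1 is not modelled.

```python
import numpy as np

def i2piano_step(x, x_prev, f0, grad_f0, prox_f1, L_k, delta, gamma,
                 theta=1.0, omega=1.0, eta=2.0):
    """One inertial forward-backward step with backtracking on L_k
    (schematic reading of STEPs 2-6; exact prox, i.e. tau = 0)."""
    g = grad_f0(x)
    while True:
        b_k = (L_k + 2.0 * delta) / (L_k + 2.0 * gamma)
        beta_k = (b_k - 1.0) * (1.0 + theta * omega) / (2.0 * b_k - 1.0)
        alpha_k = (1.0 + theta * omega - beta_k) / (L_k + 2.0 * delta)   # Eq. (54)
        # exact minimizer of h^{(k)}(.; x, x_prev): an inertial forward-backward point
        x_new = prox_f1(x - alpha_k * g + beta_k * (x - x_prev), alpha_k)
        # local descent-lemma test (inequality (24)); if violated, enlarge L_k
        lin = f0(x) + g @ (x_new - x) + 0.5 * L_k * np.linalg.norm(x_new - x) ** 2
        if f0(x_new) <= lin:
            return x_new, L_k
        L_k *= eta
```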

A.3 Convergence analysis of iPila

This section aims to develop the convergence framework for iPila. The first issue to be addressed is the well-posedness of the line-search algorithm. To this end, we first prove the following lemma.

Lemma 16

Let \(d^{(k)}\in {\mathbb {R}}^{2n}\) be defined according to (31)–(32), where \(0< {\alpha _k}\le \alpha _{max}\), \(\beta _k\ge 0\), \(\gamma _k\ge \gamma _{min}\), for some given constants \(\alpha _{max}>0\), \(\gamma _{min}\ge 0\). Define \(\Delta _k \in {\mathbb {R}}_{\le 0}\) as

$$\begin{aligned} \Delta _k = h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2. \end{aligned}$$
(68)

Then, we have

$$\begin{aligned} \Phi '(x^{(k)},s^{(k)};d_x^{(k)},d_s^{(k)})&\le \Delta _k\end{aligned}$$
(69)
$$\begin{aligned}&\le - a\Vert {\tilde{y}}^{(k)}-x^{(k)}\Vert ^2-\gamma _{min} \Vert x^{(k)}-s^{(k)}\Vert ^2 \end{aligned}$$
(70)

where \(a>0\) is a positive constant. Therefore, \(\Phi '( { x}^{(k)}, s^{(k)};d_x^{(k)},d_s^{(k)})<0\) whenever \({\tilde{y}}^{(k)}\ne { x}^{(k)}\) or \({ x}^{(k)}\ne s^{(k)}\).

Proof

We first observe that (29) with \(x= { x}^{(k)}\), \(s= s^{(k)}\), \(y={\tilde{y}}^{(k)}\), \(d_x=d_x^{(k)}\), \(d_s=d_s^{(k)}\), gives:

$$\begin{aligned} \Phi '&(x^{(k)},s^{(k)};d_x^{(k)},d_s^{(k)}) \nonumber \\&\le f_1({\tilde{y}}^{(k)})-f_1(x^{(k)})+\langle \nabla f_0(x^{(k)})+x^{(k)}-s^{(k)},d_x^{(k)}\rangle +\langle s^{(k)}-x^{(k)},d_s^{(k)}\rangle \end{aligned}$$
(71)
$$\begin{aligned}&= f_1({\tilde{y}}^{(k)}) -f_1( { x}^{(k)}) +\langle \nabla f_0( { x}^{(k)}),{\tilde{y}}^{(k)}- { x}^{(k)}\rangle + \langle { x}^{(k)}- s^{(k)},{\tilde{y}}^{(k)}- { x}^{(k)}\rangle +\nonumber \\&+ \left( 1+\frac{\beta _k}{\alpha _k}\right) \langle s^{(k)}- { x}^{(k)},{\tilde{y}}^{(k)}- { x}^{(k)}\rangle -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2\nonumber \\&= f_1({\tilde{y}}^{(k)}) -f_1( { x}^{(k)}) +\langle \nabla f_0( { x}^{(k)})-\frac{\beta _k}{\alpha _k}( { x}^{(k)}- s^{(k)}),{\tilde{y}}^{(k)}- { x}^{(k)}\rangle -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2\nonumber \\&\le h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2 \nonumber \\&\le -\frac{\theta }{2\alpha _k}\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2 \end{aligned}$$
(72)

where the last inequality follows from (38). Then, the thesis follows from \(\alpha _k\le \alpha _{max}\) with \(a = \theta /{2\alpha _{max}}\). \(\square\)

The well-posedness of the line-search procedure and its main properties are summarized in the following lemma.

Lemma 17

Let the assumptions of Lemma 16 be satisfied with, in addition, \(\beta _k\le \beta _{max}\), \(\gamma _k\le \gamma _{max}\), \({ \beta _{max}\ge 0}\), \(\gamma _{max}\ge 0\). Then, the Armijo backtracking line-search algorithm terminates in a finite number of steps, and there exists \(\lambda _{min}>0\) such that the parameter \(\lambda _k^+\) computed with the line-search algorithm satisfies

$$\begin{aligned} \lambda _k^+ \ge \lambda _{min}. \end{aligned}$$
(73)

Proof

Let us first prove that

$$\begin{aligned} \Delta _k\le - C \Vert d^{(k)}\Vert ^2, \ \ \forall k\in {\mathbb {N}}, \end{aligned}$$
(74)

for a positive constant \(C>0\). Setting \(\delta _k = 1+\frac{\beta _k}{\alpha _k}\), the bounds on the parameters imply that \(1\le \delta _k\le {\bar{\delta }}\), with \({\bar{\delta }} = 1+\frac{\beta _{max}}{\alpha _{min}}\). By definition of \(d^{(k)}\) in (30) we have

$$\begin{aligned} \Vert d^{(k)}\Vert ^2&= \Vert d_x^{(k)}\Vert ^2 + \Vert d_s^{(k)}\Vert ^2\\&= \Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \Vert \delta _k({\tilde{y}}^{(k)}- { x}^{(k)}) +\gamma _k( { x}^{(k)}- s^{(k)})\Vert ^2\\&= (1+\delta _k^2)\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \gamma _k^2\Vert { x}^{(k)}- s^{(k)}\Vert ^2+2\delta _k\gamma _k\langle {\tilde{y}}^{(k)}- { x}^{(k)}, { x}^{(k)}- s^{(k)}\rangle \\&\le (1+\delta _k^2+\delta _k\gamma _k)\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + (\gamma _k^2 + \delta _k\gamma _k)\Vert { x}^{(k)}- s^{(k)}\Vert ^2\\&\le (1+{\bar{\delta }}^2+{\bar{\delta }} \gamma _{max})\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \gamma _k(\gamma _{max} + {\bar{\delta }})\Vert { x}^{(k)}- s^{(k)}\Vert ^2\\&\le \frac{1}{C} (a\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2) \end{aligned}$$

where \(C =1/\max \{(1+{\bar{\delta }}^2+{\bar{\delta }} \gamma _{max})/a,\gamma _{max} + {\bar{\delta }}\}\). Multiplying both sides of the last inequality above by C and combining with (70) gives (74).

Since \(\Phi _0\) has Lipschitz continuous gradient with Lipschitz constant \(L_{\Phi _0}=L+2\) (see (27)), we can apply the Descent Lemma obtaining

$$\begin{aligned}&\Phi _0( { x}^{(k)}+\lambda d_x^{(k)}, s^{(k)}+\lambda d_s^{(k)}) \le \Phi _0( { x}^{(k)}, s^{(k)}) + \lambda \langle \nabla \Phi _0( { x}^{(k)}, s^{(k)}),d^{(k)}\rangle + \frac{ L_{\Phi _0}}{2} \lambda ^2\Vert d^{(k)}\Vert ^2\nonumber \\&\le \Phi _0( { x}^{(k)}, s^{(k)}) + \lambda \langle \nabla \Phi _0( { x}^{(k)}, s^{(k)}),d^{(k)}\rangle - \frac{ L_{\Phi _0}}{2C} \lambda ^2\Delta _k, \end{aligned}$$
(75)

where the second inequality follows from (74). From Jensen's inequality applied to the convex function \(f_1\) we also obtain

$$\begin{aligned} \Phi _1( { x}^{(k)}+\lambda d_x^{(k)}, s^{(k)}+\lambda d_s^{(k)})= & {} f_1( { x}^{(k)}+\lambda d_x^{(k)})= f_1(\lambda {\tilde{y}}^{(k)}+(1-\lambda ) { x}^{(k)})\nonumber \\{} & {} \le (1-\lambda )f_1( { x}^{(k)}) + \lambda f_1({\tilde{y}}^{(k)}). \end{aligned}$$
(76)

Summing (75) with (76) gives

$$\begin{aligned} \Phi ( { x}^{(k)}&+\lambda d_x^{(k)}, s^{(k)}+\lambda d_s^{(k)}) \le \\&\le \Phi ( { x}^{(k)}, s^{(k)}) + \lambda \left( f_1({\tilde{y}}^{(k)}) - f_1( { x}^{(k)}) +\langle \nabla \Phi _0( { x}^{(k)}, s^{(k)}),d^{(k)}\rangle \right) - \frac{ L_{\Phi _0}}{2C} \lambda ^2\Delta _k\\&\le \Phi ( { x}^{(k)}, s^{(k)}) + \lambda \Delta _k - \frac{ L_{\Phi _0}}{2C} \lambda ^2\Delta _k, \end{aligned}$$

where the last inequality follows from (71)–(72). The above relation implies

$$\begin{aligned} \Phi ( { x}^{(k)}+\lambda d_x^{(k)}, s^{(k)}+ \lambda d_s^{(k)}) \le \Phi ( { x}^{(k)}, s^{(k)}) + \lambda (1-\rho \lambda )\Delta _k,\ \ \forall \lambda \in [0,1] \end{aligned}$$

with \(\rho = \frac{ L_{\Phi _0}}{2C}\). Moreover, comparing the above inequality with the Armijo condition (33) shows that the last one is surely fulfilled when \(\lambda _k^+\) satisfies \(1-\rho \lambda _k^+\ge \sigma\), that is when \(\lambda _k^+\le (1-\sigma )/\rho\).

Since \(\lambda _k^+\) in the backtracking procedure is obtained starting from 1 and by successive reductions of a factor \(\delta <1\), we have \(\lambda _k^+\ge \delta ^{j_0}\), where \(j_0\) is the smallest nonnegative integer such that \(\delta ^{j_0}\le (1-\sigma )/\rho\). Therefore, (73) is satisfied with \(\lambda _{min} = \delta ^{j_0}\). \(\square\)
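The backtracking procedure analysed in Lemma 17 can be sketched as follows (Python; Phi is any callable implementing the merit function \(\Phi\), and the function name and default values are ours, not the paper's listing).

```python
def armijo_backtracking(Phi, x, s, d_x, d_s, Delta_k, sigma, delta=0.5, max_iter=60):
    """Find lambda in (0,1] satisfying the Armijo-type condition (33):
        Phi(x + lam*d_x, s + lam*d_s) <= Phi(x, s) + sigma*lam*Delta_k,
    with Delta_k <= 0 as in (68), starting from lam = 1 and shrinking by delta < 1."""
    lam = 1.0
    Phi0 = Phi(x, s)
    for _ in range(max_iter):
        if Phi(x + lam * d_x, s + lam * d_s) <= Phi0 + sigma * lam * Delta_k:
            return lam
        lam *= delta
    return lam   # Lemma 17 guarantees finite termination; the cap is only a safeguard
```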

In the remainder of this section we show that iPila can be cast in the framework of the abstract scheme. We first show that conditions [H1]–[H3] are satisfied.

Proposition 18

Let \(\{( { x}^{(k)}, s^{(k)})\}_{k\in {\mathbb {N}}}\) be the sequence generated by iPila. Then, condition [H1] holds with \(d_k=\sqrt{-\Delta _k}\), \(a_k = \sigma \lambda _{min}\). Moreover, under Assumption [A4], we also have

$$\begin{aligned} 0=\lim _{k\rightarrow \infty } \Vert { x}^{(k)}- s^{(k)}\Vert = \lim _{k\rightarrow \infty } h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) = \lim _{k\rightarrow \infty }\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert . \end{aligned}$$
(77)

Proof

From the updating rule at STEP 7 and from Lemma 17, we have

$$\begin{aligned} \Phi ( { x}^{(k+1)}, s^{(k+1)})\le & {} \Phi ( { x}^{(k)}, s^{(k)}) + \sigma \lambda _k\Delta _k\le \Phi ( { x}^{(k)}, s^{(k)}) + \sigma \lambda _{min}\Delta _k. \end{aligned}$$

Then, condition [H1] is satisfied with \(d_k^2=-\Delta _k\), \(a_k = \sigma \lambda _{min}\). Since from Assumption [A4] f is bounded from below, \(\Phi\) is bounded from below as well. Therefore, [H1] implies \(\displaystyle -\sum _{k=0}^{\infty }\Delta _k<\infty\) which, in turn, yields \(\displaystyle \lim _{k\rightarrow \infty }\Delta _k=0\). Recalling (70), this implies (77). \(\square\)

In the following we describe the setup for proving [H2], with the second auxiliary function defined as in (61).

Proposition 19

Let \(\{( { x}^{(k)}, s^{(k)})\}_{{k\in {\mathbb {N}}}}\) be the sequence generated by Algorithm iPila with \(\gamma _{min}>0\) and let \({{\mathcal {F}}}\) be defined as in (61). Then, there exist \(\{\rho ^{(k)}\}_{k\in {\mathbb {N}}},\{r_k\}_{k\in {\mathbb {N}}}\subset {\mathbb {R}}\), with \(\lim _{k\rightarrow \infty }r_k=0\), such that [H2] holds with \(u^{(k)}= {\hat{y}}^{(k)}\), where \({\hat{y}}^{(k)}\) is the exact minimizer of \(h^{(k)}(y; { x}^{(k)}, s^{(k)})\), i.e.,

$$\begin{aligned} {\hat{y}}^{(k)}= \mathop {\textrm{argmin}}\limits _{y\in {\mathbb {R}}^n} h^{(k)}(y; { x}^{(k)}, s^{(k)}). \end{aligned}$$

Proof

From STEP 7, we have

$$\begin{aligned} \Phi ( { x}^{(k+1)}, s^{(k+1)})\le & {} \Phi ({\tilde{y}}^{(k)}, { x}^{(k)}) = f({\tilde{y}}^{(k)}) + \frac{1}{2} \Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2\\\le & {} f( {\hat{y}}^{(k)}) - \left( c+\frac{\alpha _{max}}{\theta }\right) h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + d\Vert { x}^{(k)}- s^{(k)}\Vert ^2, \end{aligned}$$

where the last inequality follows from (41) and (38). Setting

$$\begin{aligned} \rho ^{(k)}= \sqrt{2}\left( -\left( c+\frac{\alpha _{max}}{\theta }\right) h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)})+ d\Vert { x}^{(k)}- s^{(k)}\Vert ^2\right) ^{\frac{1}{2}}, \end{aligned}$$
(78)

we obtain \(\Phi ( { x}^{(k+1)}, s^{(k+1)}) \le {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\), which represents the left-most inequality in [H2], with \(u^{(k)}= {\hat{y}}^{(k)}\). On the other hand, from inequality (42) we obtain

$$\begin{aligned} {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)}) = f( {\hat{y}}^{(k)}) +\frac{1}{2} (\rho ^{(k)})^2 \le f( { x}^{(k)}) -{\bar{c}} h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) +{\bar{d}}\Vert { x}^{(k)}- s^{(k)}\Vert ^2+\frac{1}{2} (\rho ^{(k)})^2. \end{aligned}$$

Setting \(r_k = (\rho ^{(k)})^2/2 -{\bar{c}} h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) +({\bar{d}}-\frac{1}{2})\Vert { x}^{(k)}- s^{(k)}\Vert ^2\), we can write

$$\begin{aligned} {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)}) \le f( { x}^{(k)}) +\frac{1}{2} \Vert { x}^{(k)}- s^{(k)}\Vert ^2 + r_k = \Phi ( { x}^{(k)}, s^{(k)}) + r_k. \end{aligned}$$

From (77) we have that \(\lim _{k\rightarrow \infty }r_k=0\) and this proves [H2]. \(\square\)

Proposition 20

Let \(\{( { x}^{(k)}, s^{(k)})\}_{{k\in {\mathbb {N}}}}\) be the sequence generated by iPila with \(\gamma _{min}>0\). Then, there exists a positive constant b such that [H3] is satisfied with \(I = \{1\}\), \(\theta _1 = 1\), \(\zeta _k = 0\).

Proof

From (40) we know that there exists a subgradient \({\hat{v}}^{(k)}\in \partial f(\hat{y}^{(k)})\) such that

$$\begin{aligned} \Vert {\hat{v}}^{(k)}\Vert \le q\sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)})} + q\Vert { x}^{(k)}- s^{(k)}\Vert \end{aligned}$$
(79)

and, reasoning as in the proof of Proposition 15, it follows that

$$\begin{aligned} \Vert \partial {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\Vert _- \le \left\| \begin{pmatrix} {\hat{v}}^{(k)}\\ \rho ^{(k)}\end{pmatrix} \right\| \le \Vert {\hat{v}}^{(k)}\Vert + |\rho ^{(k)}|. \end{aligned}$$
(80)

Let us analyze the two terms at the right-hand side of the inequality above, showing that both can be bounded from above with a multiple of \(\sqrt{-\Delta _k}\). From (79) we obtain

$$\begin{aligned} \Vert {\hat{v}}^{(k)}\Vert\le & {} q\sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + \gamma _{min}\Vert { x}^{(k)}- s^{(k)}\Vert ^2} \\{} & {} + \frac{q}{\sqrt{\gamma _{min}}}\sqrt{\gamma _{min}\Vert { x}^{(k)}- s^{(k)}\Vert ^2 - h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)})}, \end{aligned}$$

which, setting \(A = q\left( 1+1/\sqrt{\gamma _{min}}\right)\) and using (70), yields

$$\begin{aligned} \Vert {\hat{v}}^{(k)}\Vert \le A \sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + \gamma _{min}\Vert { x}^{(k)}- s^{(k)}\Vert ^2}\le A\sqrt{-\Delta _k}. \end{aligned}$$
(81)

On the other hand, from definition (78)

$$\begin{aligned} \rho ^{(k)}\le & {} B \sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + \gamma _k \Vert { x}^{(k)}- s^{(k)}\Vert ^2} = B\sqrt{-\Delta _k} \end{aligned}$$

where \(B = \sqrt{2}\max \left\{ c+ \alpha _{max}/{\theta }, d / {\gamma _{min}}\right\} ^{\frac{1}{2}}\). Therefore, combining the last inequality above with (81) and (80), yields \(\Vert \partial {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\Vert _-\le (A+B)\sqrt{-\Delta _k}\) which proves that [H3] is satisfied with \(I = \{1\}\), \(\theta _1 = 1\), \(\zeta _k = 0\), \(b=A+B\), \(b_k=1\). \(\square\)

We are now ready to prove Theorem 7, which states the main convergence result for Algorithm iPila.

Proof of Theorem 7

By Propositions 18, 19, and 20, we know that conditions [H1]–[H3] hold for iPila. From (36) we have \(\displaystyle \lim _{k\rightarrow \infty }\Vert u^{(k)}- { x}^{(k)}\Vert = \displaystyle \lim _{k\rightarrow \infty }\Vert \hat{y}^{(k)}- { x}^{(k)}\Vert =0\). Hence we can apply Proposition 15 and conclude that [H6] holds. Moreover, condition [H4] holds as a consequence of Lemma 16, since \(d_k= \sqrt{-\Delta _k}\) and \(\Vert \tilde{y}^{(k)}-x^{(k)}\Vert \ge \Vert x^{(k+1)}-x^{(k)}\Vert /\lambda _k\ge \Vert x^{(k+1)}-x^{(k)}\Vert\) (see STEP 7). Finally, condition [H5] is trivially satisfied, since both sequences \(\{a_k\}_{k\in {\mathbb {N}}}\) and \(\{b_k\}_{k\in {\mathbb {N}}}\) are constant. Then Theorem 5 applies and guarantees that the sequence \(\{(x^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converges to a stationary point \((x^*,\rho ^*)\) of \({\mathcal {F}}\). Note that, since \({\mathcal {F}}\) is the sum of separable functions, its subdifferential can be written as \(\partial {\mathcal {F}}(x,\rho )=\partial f(x)\times \{\rho \}\). Then, \((x^*,\rho ^*)\) is stationary for \({\mathcal {F}}\) if and only if \(\rho ^*=0\) and \(0\in \partial f(x^*)\). Hence \(x^*\) is a stationary point for f and \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) converges to it. \(\square\)

About this article

Cite this article

Bonettini, S., Ochs, P., Prato, M. et al. An abstract convergence framework with application to inertial inexact forward–backward methods. Comput Optim Appl 84, 319–362 (2023). https://doi.org/10.1007/s10589-022-00441-4
