
An abstract convergence framework with application to inertial inexact forward–backward methods

Abstract

In this paper we introduce a novel abstract descent scheme suited for the minimization of proper and lower semicontinuous functions. The proposed abstract scheme generalizes a set of properties that are crucial for the convergence of several first-order methods designed for nonsmooth nonconvex optimization problems. Such properties guarantee the convergence of the full sequence of iterates to a stationary point, if the objective function satisfies the Kurdyka–Łojasiewicz property. The abstract framework allows for the design of new algorithms. We propose two inertial-type algorithms with implementable inexactness criteria for the main iteration update step. The first algorithm, i\(^2\)Piano, exploits large steps by adjusting a local Lipschitz constant. The second algorithm, iPila, overcomes the main drawback of line-search based methods by enforcing a descent only on a merit function instead of the objective function. Both algorithms have the potential to escape local minimizers (or stationary points) by leveraging the inertial feature. Moreover, they are proved to enjoy the full convergence guarantees of the abstract descent scheme, which is the best we can expect in such a general nonsmooth nonconvex optimization setup using first-order methods. The efficiency of the proposed algorithms is demonstrated on two exemplary image deblurring problems, where we can appreciate the benefits of performing a linesearch along the descent direction inside an inertial scheme.

Data Availability Statement

The test image barbara in the experiments described in Sect. 5.1 is included in the software available at [47], while the jetplane one used in Sect. 5.2 can be downloaded from [49].

References

1. Bertero, M., Boccacci, P., Ruggiero, V.: Inverse Imaging with Poisson Data. IOP Publishing, Bristol (2018)
2. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
3. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
4. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R.S., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, pp. 185–212. Springer, New York, NY (2011)
5. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
6. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)
7. Bonettini, S., Loris, I., Porta, F., Prato, M., Rebegoldi, S.: On the convergence of a linesearch based proximal-gradient method for nonconvex optimization. Inverse Probl. 33(5), 055005 (2017)
8. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
9. Bonettini, S., Prato, M., Rebegoldi, S.: A block coordinate variable metric linesearch based proximal gradient method. Comput. Optim. Appl. 71(1), 5–52 (2018)
10. Chouzenoux, E., Pesquet, J.-C., Repetti, A.: A block coordinate variable metric forward-backward algorithm. J. Global Optim. 66(3), 457–485 (2016)
11. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165, 874–900 (2015)
12. Li, G., Pong, T.K.: Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159, 371–401 (2016)
13. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
14. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
15. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier 48(3), 769–783 (1998)
16. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
17. van den Dries, L.: Tame Topology and O-minimal Structures. Cambridge University Press, Cambridge (1998)
18. Bolte, J., Daniilidis, A.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 10, 556–572 (2007)
19. Absil, P.A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16(2), 531–547 (2005)
20. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1–2), 5–16 (2009)
21. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
22. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)
23. Ochs, P.: Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano. SIAM J. Optim. 29(1), 541–570 (2019)
24. Bonettini, S., Prato, M., Rebegoldi, S.: Convergence of inexact forward-backward algorithms using the forward-backward envelope. SIAM J. Optim. 30(4), 3069–3097 (2020)
25. Bonettini, S., Prato, M., Rebegoldi, S.: New convergence results for the inexact variable metric forward-backward method. Appl. Math. Comput. 392, 125719 (2021)
26. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
27. Polyak, B.: Introduction to Optimization. Optimization Software, Inc., Publications Division, New York (1987)
28. Ayers, G.R., Dainty, J.C.: Iterative blind deconvolution method and its applications. Opt. Lett. 13(7), 547–549 (1988)
29. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Structured sparsity through convex optimization. Stat. Sci. 27(4), 450–468 (2012)
30. Bonettini, S., Rebegoldi, S., Ruggiero, V.: Inertial variable metric techniques for the inexact forward-backward algorithm. SIAM J. Sci. Comput. 40(5), 3180–3210 (2018)
31. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
32. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
33. Chouzenoux, E., Pesquet, J.-C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)
34. Noll, D.: Convergence of non-smooth descent methods using the Kurdyka-Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)
35. Stella, L., Themelis, A., Patrinos, P.: Forward-backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. 67, 443–487 (2017)
36. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
37. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
38. Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)
39. Zălinescu, C.: Convex Analysis in General Vector Spaces. World Scientific Publishing, Singapore (2002)
40. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)
41. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)
42. Calatroni, L., Chambolle, A.: Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM J. Optim. 29(3), 1772–1798 (2019)
43. Scheinberg, K., Goldfarb, D., Bai, X.: Fast first-order methods for composite convex optimization with backtracking. Found. Comput. Math. 14, 389–417 (2014)
44. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, NJ (1970)
45. Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs for image restoration. In: Weickert, J., Hein, M., Schiele, B. (eds.) Pattern Recognition, pp. 271–281. Springer, Berlin, Heidelberg (2013)
46. Chen, Y., Ranftl, R., Pock, T.: Insights into analysis operator learning: from patch-based sparse models to higher-order MRFs. IEEE Trans. Image Process. 23(3), 1060–1072 (2014)
47. Bonettini, S., Ochs, P., Prato, M., Rebegoldi, S.: inertial inexact Proximal algorithm for nonconvex optimization (i2Piano) and inertial Proximal inexact linesearch algorithm (iPila) software. http://www.oasis.unimore.it/site/home/software.html (2021)
48. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
49. Repetti, A., Chouzenoux, E.: RestoVMFB Lab: Matlab Toolbox for Image Restoration with the Variable Metric Forward-Backward Algorithm. http://www-syscom.univ-mlv.fr/~chouzeno/Logiciel.html (2013)
50. Benning, M., Betcke, M.M., Ehrhardt, M.J., Schönlieb, C.-B.: Choose your path wisely: gradient descent in a Bregman distance framework. SIAM J. Imaging Sci. 14(2), 814–843 (2021)
51. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)

Funding

Silvia Bonettini, Marco Prato and Simone Rebegoldi are members of the INdAM research group GNCS. Peter Ochs acknowledges funding by the German Research Foundation (DFG Grant OC 150/3-1).

Author information

Corresponding author

Correspondence to Silvia Bonettini.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Appendix A: Convergence analysis of i\(^2\)Piano and iPila

In this section we analyze the convergence properties of algorithms i\(^2\)Piano and iPila, showing that they can both be considered as special cases of the abstract scheme presented in Sect. 3.

A.1 Preliminary results

We collect here new basic results concerning the inexactness criterion introduced in Sect. 4.1, which is incorporated in our proposed algorithms. The following lemma is a consequence of the strong convexity of the function \(h (\cdot ;x,s)\) defined in (15), and will often be employed in the following. In order to simplify the notation, here we omit the iteration index k.

Lemma 8

Suppose that Assumptions [A1]–[A2] hold true. For a given pair \((x,s)\in \textrm{dom}(f_1)\times {\mathbb {R}}^n\), let \({\hat{y}}, {\tilde{y}}\) be defined as in (16)–(18). Then, the following inequalities hold.

$$\begin{aligned} \frac{1}{2\alpha } \Vert {\hat{y}}-x\Vert ^2\le & {} \left( 1+\frac{\tau }{2}\right) (-h( {\tilde{y}};x,s))\end{aligned}$$
(36)
$$\begin{aligned} \frac{1}{2\alpha } \Vert {\tilde{y}}- {\hat{y}}\Vert ^2\le & {} \frac{\tau }{2} (-h( {\tilde{y}};x,s))\end{aligned}$$
(37)
$$\begin{aligned} \frac{\theta }{2\alpha } \Vert {\tilde{y}}-x\Vert ^2\le & {} (-h( {\tilde{y}};x,s)),\ \ \text{ with } \theta = 1/\left( \sqrt{1+\frac{\tau }{2}}+\sqrt{\frac{\tau }{2}}\right) ^2\le 1. \end{aligned}$$
(38)

Proof

Inequalities (36)–(37) follow by combining the strong convexity of the function \(h(\cdot ;x,s)\) and condition (18) as in [25, Lemma 2]. As for (38), we have

$$\begin{aligned} \frac{1}{2\alpha }\Vert {\tilde{y}}-x\Vert ^2= & {} \frac{1}{2\alpha }\Vert {\tilde{y}}- {\hat{y}}+ {\hat{y}}-x\Vert ^2 = \frac{1}{2\alpha }\Vert {\tilde{y}}- {\hat{y}}\Vert ^2 + \frac{1}{2\alpha }\Vert {\hat{y}}-x\Vert ^2 + \frac{1}{\alpha }\langle {\tilde{y}}- {\hat{y}}, {\hat{y}}-x\rangle \\\le & {} \frac{1}{2\alpha }\Vert {\tilde{y}}- {\hat{y}}\Vert ^2 + \frac{1}{2\alpha }\Vert {\hat{y}}-x\Vert ^2 + \frac{1}{\alpha } \Vert {\tilde{y}}- {\hat{y}}\Vert \cdot \Vert {\hat{y}}-x\Vert \\\le & {} \left( 1+\frac{\tau }{2}\right) (-h( {\tilde{y}};x,s)) + \frac{\tau }{2} (-h( {\tilde{y}};x,s)) + 2\sqrt{1+\frac{\tau }{2}}\sqrt{\frac{\tau }{2}}(-h( {\tilde{y}};x,s))\\= & {} \left( \sqrt{1+\frac{\tau }{2}}+\sqrt{\frac{\tau }{2}}\right) ^2(-h( {\tilde{y}};x,s)), \end{aligned}$$

where the last inequality follows from the application of (36)–(37). \(\square\)

The next lemma provides a subgradient \(\hat{v}\in \partial f(\hat{y})\) whose norm is bounded from above by a quantity containing \(\sqrt{ - h( {\tilde{y}};x,s)}\). Its proof is omitted since it is almost identical to the one of Lemma 3 in [25].

Lemma 9

Suppose Assumptions [A1]–[A3] hold true. Let x be a point in \(\textrm{dom}(f_1)\) and let \({\hat{y}}, {\tilde{y}}\) be defined as in (16)–(18). Moreover, assume that \(\alpha \in [\alpha _{min},\alpha _{max}]\), with \(0<\alpha _{min}\le \alpha _{max}\) and \(\beta \in [0,\beta _{max}]\), with \(\beta _{max}\ge 0\). Then, there exists a subgradient \({\hat{v}}\in \partial f( {\hat{y}})\) such that

$$\begin{aligned} \Vert {\hat{v}}\Vert\le & {} p(\Vert {\hat{y}}-x\Vert + \Vert x-s\Vert ) \end{aligned}$$
(39)
$$\begin{aligned}\le & {} q(\sqrt{ - h( {\tilde{y}};x,s)} + \Vert x-s\Vert ) , \end{aligned}$$
(40)

where the two constants \(p, q\) depend only on \(\alpha _{min},\alpha _{max},\beta _{max}\) and on the Lipschitz constant L.

The following lemma is the equivalent of Lemmas 4–5 in [25].

Lemma 10

Suppose Assumptions [A1]–[A3] hold true and assume \(0<\alpha _{min}\le \alpha \le \alpha _{max}\), \(\beta \in [0,\beta _{max}]\). Let \((x,s)\) be a point in \(\textrm{dom}(f_1)\times {\mathbb {R}}^n\) and let \({\hat{y}}, {\tilde{y}}\) be defined as in (16)–(18), for some \(\tau \ge 0\). Then, there exist \(c,d,{\bar{c}},{\bar{d}} \in {\mathbb {R}}\) depending only on \(\alpha _{min},\alpha _{max},\beta _{max},\tau\) such that

$$\begin{aligned}{} & {} f( {\hat{y}})\ge f( {\tilde{y}}) +ch( {\tilde{y}};x,s) - d \Vert x-s\Vert ^2 \end{aligned}$$
(41)
$$\begin{aligned}{} & {} f( {\hat{y}})\le f(x) - {\bar{c}} h( {\tilde{y}};x,s)+ {\bar{d}}\Vert x-s\Vert ^2. \end{aligned}$$
(42)

Proof

From the Descent Lemma [51, Proposition A.24] we have

$$\begin{aligned} f_0( {\hat{y}})\ge f_0( {\tilde{y}}) - \langle \nabla f_0( {\hat{y}}), {\tilde{y}}- {\hat{y}}\rangle -\frac{L}{2} \Vert {\tilde{y}}- {\hat{y}}\Vert ^2. \end{aligned}$$
(43)

The inclusion \(0\in \partial _\epsilon h( {\tilde{y}};x,s)\) in (20) implies that there exists a vector \(e\in {\mathbb {R}}^n\) with

$$\begin{aligned} \frac{1}{2\alpha } \Vert e\Vert ^2 \le \epsilon \end{aligned}$$
(44)

such that

$$\begin{aligned} -\frac{1}{\alpha }( {\tilde{y}}-x+\alpha \nabla f_0(x) -\beta (x-s)+ e)\in \partial _{\epsilon }f_1( {\tilde{y}}) \end{aligned}$$

(see [7] and references therein). The definition of \(\epsilon\)-subdifferential implies

$$\begin{aligned} f_1( {\hat{y}})\ge f_1( {\tilde{y}}) -\frac{1}{\alpha }\langle {\tilde{y}}-x, {\hat{y}}- {\tilde{y}}\rangle +\frac{\beta }{\alpha }\langle {\hat{y}}- {\tilde{y}},x-s\rangle -\langle \nabla f_0(x), {\hat{y}}- {\tilde{y}}\rangle - \frac{1}{\alpha }\langle e, {\hat{y}}- {\tilde{y}}\rangle - \epsilon . \end{aligned}$$
(45)

Summing inequalities (43) and (45) yields

$$\begin{aligned}{} & {} f( {\hat{y}})\ge f( {\tilde{y}})-\langle \nabla f_0(x)- \nabla f_0( {\hat{y}}), {\hat{y}}- {\tilde{y}}\rangle +\frac{\beta }{\alpha }\langle {\hat{y}}- {\tilde{y}},x-s\rangle \nonumber \\{} & {} -\frac{1}{\alpha }\langle {\tilde{y}}-x, {\hat{y}}- {\tilde{y}}\rangle - \frac{1}{\alpha }\langle e, {\hat{y}}- {\tilde{y}}\rangle -\frac{L}{2} \Vert {\tilde{y}}- {\hat{y}}\Vert ^2 -\epsilon . \end{aligned}$$
(46)

Now we consider each term on the right-hand side of the above inequality so as to obtain a lower bound. Using the Cauchy-Schwarz inequality, Assumption [A3], (36) and (37) we obtain

$$\begin{aligned} \langle \nabla f_0(x)- \nabla f_0( {\hat{y}}), {\hat{y}}- {\tilde{y}}\rangle\le & {} \Vert \nabla f_0(x)- \nabla f_0( {\hat{y}})\Vert \Vert {\hat{y}}- {\tilde{y}}\Vert \nonumber \\\le & {} L\alpha _{max}\sqrt{2\tau \left( 1+\frac{\tau }{2}\right) }(-h( {\tilde{y}};x,s)). \end{aligned}$$
(47)

Similarly, using again the Cauchy-Schwarz inequality, (37) and (38), we can write

$$\begin{aligned} \frac{1}{\alpha }\langle {\tilde{y}}-x, {\hat{y}}- {\tilde{y}}\rangle\le & {} \frac{1}{\alpha _{min}}\Vert {\tilde{y}}-x\Vert \Vert {\hat{y}}- {\tilde{y}}\Vert \le \frac{\alpha _{max}}{\alpha _{min}}\sqrt{\frac{2\tau }{\theta }}(-h( {\tilde{y}};x,s)). \end{aligned}$$
(48)

Moreover, from (44) and (20) we obtain \(\Vert e\Vert \le \sqrt{2\alpha _{max}\epsilon }\le \sqrt{\alpha _{max}\tau (-h( {\tilde{y}};x,s))}\) which, using also (37), yields

$$\begin{aligned} \frac{1}{\alpha }\langle e, {\hat{y}}- {\tilde{y}}\rangle\le & {} \frac{1}{\alpha } \Vert e\Vert \Vert {\hat{y}}- {\tilde{y}}\Vert \le \frac{\alpha _{max}\tau }{\alpha _{min}}(-h( {\tilde{y}};x,s)). \end{aligned}$$
(49)

Finally, using (37), we can also write

$$\begin{aligned} \frac{\beta }{\alpha }\langle {\hat{y}}- {\tilde{y}},x-s\rangle \ge - \frac{\beta }{2\alpha }(\Vert {\hat{y}}- {\tilde{y}}\Vert ^2 + \Vert x-s\Vert ^2)\ge -\frac{\beta _{max}}{2\alpha _{min}}\left( -\alpha _{max}\tau h( {\tilde{y}};x,s) + \Vert x-s\Vert ^2\right) . \end{aligned}$$
(50)

Combining (46) with (38), (47), (48), (49), (50) and (20) gives (41) with

$$\begin{aligned} c = L\alpha _{max}\sqrt{2\tau \left( 1+\frac{\tau }{2}\right) }+\frac{\alpha _{max}}{\alpha _{min}}\sqrt{\frac{2\tau }{\theta }}+\frac{\alpha _{max}\tau }{\alpha _{min}}+\frac{L \alpha _{max}\tau }{2}+\frac{\tau }{2}+\frac{\beta _{max}\alpha _{max}\tau }{2\alpha _{min}}, \ \ d = \frac{\beta _{max}}{2\alpha _{min}}. \end{aligned}$$

As for (42), using the Descent Lemma we obtain

$$\begin{aligned} f_0( y) \le f_0(x) + \langle \nabla f_0(x),y - x\rangle + \frac{L}{2} \Vert y - x\Vert ^2, \end{aligned}$$

for all \(x, y\in \textrm{dom}(f_1)\). Summing \(f_1(y)\) on both sides yields

$$\begin{aligned}{} & {} f(y) \le f(x) + f_1(y)-f_1(x) + \langle \nabla f_0(x),y - x\rangle + \frac{L}{2} \Vert y - x\Vert ^2\\{} & {} \le f(x) + h(y;x,s) + \frac{L}{2} \Vert y - x\Vert ^2+ \frac{\beta }{\alpha }\langle x-s,y - x\rangle \\{} & {} \le f(x) + h(y;x,s) + \frac{L}{2} \Vert y - x\Vert ^2+ \frac{\beta }{2\alpha }(\Vert x-s\Vert ^2 + \Vert y-x\Vert ^2). \end{aligned}$$

From the previous inequality with \(y= {\hat{y}}\), recalling that \(h( {\hat{y}};x,s)\le 0\) and combining with (36) yields (42), where the constants are set as \({\bar{c}} = (L\alpha _{max}+\beta _{max}) (1+{\tau }/{2})\), \({\bar{d}} = {\beta _{max}}/{(2\alpha _{min})}\). \(\square\)
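For later reference, the following minimal Python sketch simply collects the closed-form constants appearing in the proof of Lemma 10; the function name and grouping are ours, while the formulas are those displayed above, with \(\theta\) as in (38).

```python
from math import sqrt

# Constants c, d of (41) and c_bar, d_bar of (42), copied from the closed forms
# derived in the proof of Lemma 10; theta is the quantity defined in (38).
def lemma10_constants(alpha_min, alpha_max, beta_max, tau, L):
    theta = 1.0 / (sqrt(1.0 + tau / 2.0) + sqrt(tau / 2.0)) ** 2
    c = (L * alpha_max * sqrt(2.0 * tau * (1.0 + tau / 2.0))
         + (alpha_max / alpha_min) * sqrt(2.0 * tau / theta)
         + alpha_max * tau / alpha_min
         + L * alpha_max * tau / 2.0
         + tau / 2.0
         + beta_max * alpha_max * tau / (2.0 * alpha_min))
    d = beta_max / (2.0 * alpha_min)
    c_bar = (L * alpha_max + beta_max) * (1.0 + tau / 2.0)
    d_bar = beta_max / (2.0 * alpha_min)
    return c, d, c_bar, d_bar
```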

A.2 Convergence analysis of i\(^2\)Piano

Our aim now is to frame i\(^2\)Piano in the abstract scheme of Sect. 3, so as to enjoy the favourable convergence guarantees that are provided by Theorem 5. The line of the proof developed in this section is based on an extension of the arguments in [23]. We start the convergence analysis by showing that i\(^2\)Piano is well-posed and that its parameters satisfy some useful relations.

Lemma 11

The loop between STEP 2 and STEP 6 terminates in a finite number of steps. In particular, there exists \(\overline{L}>0\) such that \(L_k\le \overline{L}\), \(\forall \ k\ge 0\). Moreover, we have

$$\begin{aligned} 0\le \beta _k\le \frac{1+\theta \omega }{2}, \ \ \forall \ k\ge 0 \end{aligned}$$
(51)

and there exist two positive constants \(\alpha _{min},\alpha _{max}\) with \(0<\alpha _{min}\le \alpha _{max}\) such that \(\alpha _k\in [\alpha _{min},\alpha _{max}]\), \(\forall k\ge 0\). We also have

$$\begin{aligned} \frac{1+\theta \omega }{2\alpha _k}-\frac{L_k}{2}-\frac{\beta _k}{2\alpha _k}= & {} \delta \end{aligned}$$
(52)
$$\begin{aligned} \delta -\frac{\beta _k}{2\alpha _k}= & {} \gamma . \end{aligned}$$
(53)

Proof

Since \(\eta >1\), after a finite number of steps the tentative value of \(L_k\) satisfies \(L_k\ge L\), where L is the Lipschitz constant of \(\nabla f_0\). Then, from the Descent Lemma, the inequality at Step 6 is satisfied. From \(\delta \ge \gamma\) we have \(b_k\ge 1\), which implies (51). A simple inspection shows that the following equalities hold:

$$\begin{aligned} b_k=\frac{1+\theta \omega -{\beta _k}}{1+\theta \omega -2\beta _k}\Rightarrow \frac{L_k+2\delta }{L_k+2\gamma }=\frac{1+\theta \omega -{\beta _k}}{1+\theta \omega -2\beta _k}, \end{aligned}$$

which leads to rewriting the parameter \(\alpha _k\) as

$$\begin{aligned} \alpha _k=\frac{1+\theta \omega -\beta _k}{L_k+2\delta }. \end{aligned}$$
(54)

Then there holds \(\alpha _k\ge \alpha _{min}\) with \(\alpha _{min} = (1+\theta \omega )/(2(\overline{L} + 2\delta ))\) and, since \(\theta \omega \le 1\), we also have \(\alpha _k\le \alpha _{max}\) with \(\alpha _{max} = 2/L_{min}\). Moreover, we have

$$\begin{aligned} \frac{1+\theta \omega }{2\alpha _k}-\frac{L_k}{2}-\frac{\beta _k}{2\alpha _k}=\frac{1+\theta \omega -\beta _k}{2\alpha _k}-\frac{L_k}{2}=\frac{L_k+2\delta }{2}-\frac{L_k}{2}=\delta \end{aligned}$$

and

$$\begin{aligned} \delta -\frac{\beta _k}{2\alpha _k}=\frac{1+\theta \omega }{2\alpha _k}-\frac{L_k}{2}-\frac{\beta _k}{\alpha _k}=\frac{1+\theta \omega -2\beta _k}{2\alpha _k}-\frac{L_k}{2}=\frac{L_k+2\gamma }{2}-\frac{L_k}{2}=\gamma . \end{aligned}$$

\(\square\)

Notice that the case \(\tau =0\), which corresponds to the exact computation of the inertial proximal gradient point at Step 5, implies \(\theta = 1\) and, choosing \(\omega =1\), the parameter settings in i\(^2\)Piano are exactly the same as in [6]. The need for introducing the parameter \(\omega\) is mainly technical and will be explained in the following.
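As a sanity check of the relations just derived, the following Python sketch computes \(\alpha _k\) and \(\beta _k\) from \(L_k,\delta ,\gamma ,\theta ,\omega\) via (54) and the identity \(b_k=(1+\theta \omega -\beta _k)/(1+\theta \omega -2\beta _k)\), and then verifies (52)–(53) numerically; the closed form for \(\beta _k\) is obtained by solving that identity, and the sample values are illustrative only, not taken from the paper.

```python
import numpy as np

def i2piano_parameters(L_k, delta, gamma, theta, omega):
    # b_k as in the proof of Lemma 11, then beta_k from
    # b_k = (1 + theta*omega - beta_k)/(1 + theta*omega - 2*beta_k),
    # and alpha_k from Eq. (54).
    b_k = (L_k + 2.0 * delta) / (L_k + 2.0 * gamma)
    beta_k = (b_k - 1.0) * (1.0 + theta * omega) / (2.0 * b_k - 1.0)
    alpha_k = (1.0 + theta * omega - beta_k) / (L_k + 2.0 * delta)
    return alpha_k, beta_k

# Illustrative values (delta >= gamma > 0, theta*omega <= 1).
L_k, delta, gamma, theta, omega = 4.0, 1.0, 0.5, 0.8, 0.9
alpha_k, beta_k = i2piano_parameters(L_k, delta, gamma, theta, omega)
lhs_52 = (1.0 + theta * omega) / (2.0 * alpha_k) - L_k / 2.0 - beta_k / (2.0 * alpha_k)
lhs_53 = delta - beta_k / (2.0 * alpha_k)
print(np.isclose(lhs_52, delta), np.isclose(lhs_53, gamma))   # True True
```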

We now prove that condition [H1] holds for i\(^2\)Piano when the corresponding merit function \(\Phi\) is defined as follows:

$$\begin{aligned}&\Phi :{\mathbb {R}}^n\times {\mathbb {R}}^n \rightarrow \overline{{\mathbb {R}}},\ \ \ \ \Phi (x,s) = f(x) + \delta \Vert x-s\Vert ^2. \end{aligned}$$
(55)

Proposition 12

Let \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) be the sequence generated by i\(^2\)Piano. Then there holds

$$\begin{aligned} \Phi (x^{(k+1)},x^{(k)}) \le \Phi (x^{(k)},x^{(k-1)}) -\gamma \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2+(1-\omega ) h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$
(56)

Consequently, there exist \(\{s^{(k)}\}_{k\in {\mathbb {N}}}\), \(\{d_k\}_{k\in {\mathbb {N}}}\), \(\{a_k\}_{k\in {\mathbb {N}}}\) such that [H1] holds with \(\Phi\) as in (55).

Proof

Adding the quantity \(f_1( { x}^{(k+1)})\) to both sides of inequality (24) we obtain

$$\begin{aligned}{} & {} f( { x}^{(k+1)}) \le f( { x}^{(k)}) \!+\! f_1( { x}^{(k+1)})-f_1( { x}^{(k)}) \!+\! \langle \nabla f_0( { x}^{(k)}), { x}^{(k+1)}- { x}^{(k)}\rangle \!+\! \frac{L_k}{2}\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\\{} & {} = f( { x}^{(k)}) + h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) - \left( \frac{1}{2{\alpha _k}}-\frac{L_k}{2}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 \\{} & {} + \frac{\beta _k}{\alpha _k}\langle { x}^{(k+1)}- { x}^{(k)}, { x}^{(k)}- { x}^{(k-1)}\rangle \\{} & {} \le f( { x}^{(k)}) + h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) - \left( \frac{1}{2{\alpha _k}}-\frac{L_k}{2}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\\{} & {} + \frac{\beta _k}{2\alpha _k}(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2+\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2)\\{} & {} \le f( { x}^{(k)})+(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) -\frac{\theta \omega }{2\alpha _k}\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 \\{} & {} - \left( \frac{1}{2{\alpha _k}}-\frac{L_k}{2}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2+ \frac{\beta _k}{2\alpha _k}(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2+\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2)\\{} & {} = f( { x}^{(k)}) - \left( \frac{1+\theta \omega }{2{\alpha _k}}-\frac{L_k}{2}-\frac{\beta _k}{2\alpha _k}\right) \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\\{} & {} +(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) + \frac{\beta _k}{2\alpha _k}\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 \end{aligned}$$

where the first equality is obtained by adding and subtracting the quantity \(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2/(2{\alpha _k}) +\beta _k/\alpha _k\langle { x}^{(k+1)}- { x}^{(k)}, { x}^{(k)}- { x}^{(k-1)}\rangle\) on the right-hand side, the subsequent inequality follows from the basic relation \(2\langle a,b\rangle \le \Vert a\Vert ^2+\Vert b\Vert ^2\), and the next one from (38). Recalling (52), the above inequality can be conveniently rewritten as

$$\begin{aligned} f(x^{(k+1)})+\delta \Vert x^{(k+1)}&-x^{(k)}\Vert ^2 \le f(x^{(k)})+\delta \Vert x^{(k)}-x^{(k-1)}\Vert ^2\\&\ \ +\left( \frac{\beta _k}{2\alpha _k}-\delta \right) \Vert x^{(k)}-x^{(k-1)}\Vert ^2 +(1-\omega )h( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$

Finally, exploiting (53), we obtain condition [H1] with \(\Phi\) given in (55), \(a_k=1\) and

$$\begin{aligned} s^{(k)}= & {} { x}^{(k-1)}\end{aligned}$$
(57)
$$\begin{aligned} d_k^2= & {} \gamma \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 -(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$
(58)

\(\square\)

Under Assumption [A4], \(\Phi\) is bounded from below; hence, condition [H1] implies

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert x^{(k)}-x^{(k-1)}\Vert = 0 \end{aligned}$$
(59)

and, if \(\omega < 1\), also

$$\begin{aligned} \lim _{k\rightarrow \infty }h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) = 0. \end{aligned}$$
(60)

The choice \(\omega < 1\) is enforced when \(\tau > 0\) in order to obtain (60). If we take \(\omega =1\), with the same arguments as above we still obtain (59). It is also worth noticing that, for large values of \(\tau\), that is when a coarser accuracy is allowed in the computation of \({\tilde{y}}^{(k)}\), the parameter \(\theta\) can be very small and this also influences the choice of \(\alpha _k\) and \(\beta _k\).
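To illustrate the last remark, here is a short sketch (values purely illustrative) showing how \(\theta\) from (38) shrinks as the inexactness level \(\tau\) grows:

```python
from math import sqrt

# theta(tau) = 1/(sqrt(1+tau/2) + sqrt(tau/2))^2, see (38): the exact case tau = 0
# gives theta = 1, while a coarse accuracy (large tau) makes theta very small.
for tau in [0.0, 0.1, 1.0, 10.0, 100.0]:
    theta = 1.0 / (sqrt(1.0 + tau / 2.0) + sqrt(tau / 2.0)) ** 2
    print(f"tau = {tau:7.1f}   theta = {theta:.4f}")
```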

In order to prove condition [H2], let us now introduce the second merit function as follows

$$\begin{aligned} {{\mathcal {F}}}:{\mathbb {R}}^n\times {\mathbb {R}}\rightarrow \overline{{\mathbb {R}}},\ \ \ \ {{\mathcal {F}}}(u,\rho ) = f(u) +\frac{1}{2} \rho ^2. \end{aligned}$$
(61)
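For concreteness, a minimal sketch of the two merit functions (55) and (61) (Python; f is a placeholder callable and the function names are ours):

```python
import numpy as np

def Phi(f, delta, x, s):
    # merit function of (55): objective value plus a proximity term
    return f(x) + delta * np.linalg.norm(x - s) ** 2

def F(f, u, rho):
    # second merit function of (61), denoted \mathcal{F} in the text
    return f(u) + 0.5 * rho ** 2
```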

The following result is proved using arguments similar to those of Lemmas 4–5 in [25].

Proposition 13

Let \(\{ { x}^{(k)}\}_{k\in {\mathbb {N}}}\) be the sequence generated by i\(^2\)Piano and, for each k, let the point \(y^{(k)}\) be defined as

$$\begin{aligned} y^{(k)}= \mathop {\textrm{argmin}}\limits _{y\in {\mathbb {R}}^n} h^{(k)}(y; { x}^{(k)}, { x}^{(k-1)}). \end{aligned}$$
(62)

Then, there exist \(\{\rho ^{(k)}\}_{k\in {\mathbb {N}}}\), \(\{r_k\}_{k\in {\mathbb {N}}}\) such that condition [H2] holds with \(\Phi\) defined in (55), \({\mathcal {F}}\) defined in (61), \(s^{(k)}\) defined in (57) and \(u^{(k)}=\hat{x}^{(k+1)}\).

Proof

When \(\omega =1\) we necessarily have \(\tau = 0\), which means \({\tilde{y}}^{(k)}= \hat{x}^{(k+1)}\). Therefore, [H2] directly follows from [H1] with \(u^{(k)}= { x}^{(k+1)}\), \(\rho ^{(k)} = \sqrt{2\delta } \Vert { x}^{(k)}- { x}^{(k-1)}\Vert\), \(r_k=0\). Consider now the case \(\omega <1\). From (41) and (42), we directly obtain

$$\begin{aligned} f(y^{(k)})\ge & {} f({\tilde{y}}^{(k)}) + ch^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) - d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2\\ f(y^{(k)})\le & {} f( { x}^{(k)}) - {\bar{c}} h({\tilde{y}}^{(k)}; { x}^{(k)}, { x}^{(k-1)})+ {\bar{d}}\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2, \end{aligned}$$

where \(c,d,{\bar{c}},{\bar{d}}\) are defined as in Lemma 10 and do not depend on k. Combining the two inequalities above we obtain

$$\begin{aligned} f(&{\tilde{y}}^{(k)}) + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\le \\&\le f(y^{(k)}) + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 { -ch^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2\\&\le f( { x}^{(k)}) - (c+{\bar{c}}) h({\tilde{y}}^{(k)}; { x}^{(k)}, { x}^{(k-1)})+ (d+{\bar{d}})\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 . \end{aligned}$$

Recalling the definition of \({\mathcal {F}}\) in (61), the above inequalities can be rewritten as

$$\begin{aligned} f({\tilde{y}}^{(k)}) + \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2\le {{\mathcal {F}}}(y^{(k)},\rho ^{(k)}) \le f( { x}^{(k)}) + \delta \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 + r_k \end{aligned}$$
(63)

where \(\rho ^{(k)}, r_k\) are given by

$$\begin{aligned} \rho ^{(k)}= & {} \sqrt{2}(\delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2 { -ch^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 )^{\frac{1}{2}} \nonumber \\{} & {} \nonumber \\ r_k= & {} - (c+{\bar{c}}) h({\tilde{y}}^{(k)}; { x}^{(k)}, { x}^{(k-1)})+ (d+{\bar{d}})\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2+ \delta \Vert { x}^{(k+1)}- { x}^{(k)}\Vert ^2. \end{aligned}$$
(64)

From (59)–(60) we obtain \(\displaystyle \lim _{k\rightarrow \infty } r_k = 0\), hence, assumption [H2] is satisfied with the above settings and with \(u^{(k)}= y^{(k)}\). \(\square\)

Next we show that condition [H3] holds for i\(^2\)Piano. The following result combines elements of Lemma 3 in [25] and Lemma 17 in [23].

Proposition 14

There exist \(b>0\), \(I\subset {\mathbb {Z}}\), \(\{\theta _i\}_{i\in I}\) such that condition [H3] holds with \(\{u^{(k)}\}_{k\in {\mathbb {N}}},\{\rho ^{(k)}\}_{k\in {\mathbb {N}}},\{d_k\}_{k\in {\mathbb {N}}}\) defined as in Proposition 13 and in (58), \(\zeta _{k+1}=0\) and \(b_{k+1}=1\).

Proof

From the separable structure of \({{\mathcal {F}}}\), if \(v\in \partial f(u)\) and \(\rho \in {\mathbb {R}}\), we have that \((v,\rho )\in \partial {{\mathcal {F}}}(u,\rho )\). In particular,

$$\begin{aligned} \Vert \partial {{\mathcal {F}}}(u,\rho )\Vert _- \le \Vert v\Vert + |\rho |, \text{ for } \text{ all } v\in \partial f(u), \rho \in {\mathbb {R}}. \end{aligned}$$
(65)

If \(\omega = 1\), we are in the case \(\tau = 0\). This means that \({ x}^{(k+1)}= y^{(k)}\), \(\rho ^{(k)}= \sqrt{2\delta } \Vert { x}^{(k)}- { x}^{(k-1)}\Vert\). Moreover, (58) reduces to \(d_k = \sqrt{\gamma }\Vert { x}^{(k)}- { x}^{(k-1)}\Vert = \sqrt{\gamma /(2\delta )}\rho ^{(k)}\). From (39), we have that there exists a subgradient \({\hat{v}}^{(k+1)}\in \partial f(y^{(k)})\) and a positive constant p such that

$$\begin{aligned} \Vert {\hat{v}}^{(k+1)}\Vert\le & {} p(\Vert { x}^{(k+1)}- { x}^{(k)}\Vert + \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ) = \frac{p}{\sqrt{\gamma }}(d_{k+1} + d_{k}). \end{aligned}$$

Hence, \(\Vert {\hat{v}}^{(k+1)}\Vert +\rho ^{(k)}\le p d_{k+1}/{\sqrt{\gamma }} + d_k(p+\sqrt{2\delta })/{\sqrt{\gamma }}\) and, recalling (65), [H3] follows with \(b = (p+\sqrt{2\delta })/{\sqrt{\gamma }}\), \(I=\{0,1\}\), \(\theta _0=\theta _1=\frac{1}{2}\). Consider now the case \(\omega < 1\). From (40), there exists a subgradient \({\hat{v}}^{(k+1)}\in \partial f(y^{(k)})\) and a positive constant q such that

$$\begin{aligned}{} & {} \Vert {\hat{v}}^{(k+1)}\Vert \le q\sqrt{-h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + q\Vert { x}^{(k)}- { x}^{(k-1)}\Vert \nonumber \\{} & {} \le q\sqrt{-h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})} + q\sqrt{\frac{2\alpha _{max}}{\theta }}\sqrt{-h^{(k-1)}( { x}^{(k)}; { x}^{(k-1)},x^{(k-2)})}\nonumber \\{} & {} \le \frac{q}{\sqrt{1-\omega }}\sqrt{-(1-\omega )h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})+\gamma \Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2} \nonumber \\{} & {} + q\sqrt{\frac{2\alpha _{max}}{\theta (1-\omega )}}\sqrt{-(1-\omega )h^{(k-1)}( { x}^{(k)}; { x}^{(k-1)},x^{(k-2)})+\gamma \Vert { x}^{(k-1)}-x^{(k-2)}\Vert ^2} \nonumber \\{} & {} \le \frac{q}{\sqrt{1-\omega }} d_k + q\sqrt{\frac{2\alpha _{max}}{\theta (1-\omega )}}d_{k-1} . \end{aligned}$$
(66)

From (64) and (38) we have

$$\begin{aligned}{} & {} \rho ^{(k)}\le \sqrt{2}\left( -(2\delta \alpha _{max}/\theta + c)h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)}) + d\Vert { x}^{(k)}- { x}^{(k-1)}\Vert ^2 \right) ^{\frac{1}{2}} \nonumber \\{} & {} \le r d_k \end{aligned}$$
(67)

where \(r = \sqrt{2} \max \left\{ (2\delta \alpha _{max}/\theta +c)/(1-\omega ), d/\gamma \right\} ^{\frac{1}{2}}\). Then, combining (66) with (67), in view of (65) we obtain

$$\begin{aligned} \Vert \partial {{\mathcal {F}}}(y^{(k)},\rho ^{(k)})\Vert _-\le \frac{b}{2}(d_k+d_{k-1}),\ \ \text{ with } b = 2\max \left\{ \frac{q}{\sqrt{1-\omega }}+r,q\sqrt{\frac{2\alpha _{max}}{\theta (1-\omega )}}\right\} \end{aligned}$$

and the thesis follows with \(I = \{1,2\}\), \(\theta _1=\theta _2 = \frac{1}{2}\). \(\square\)

The following result holds for all methods whose iterates satisfy [H1], [H2], [H3] with \({{\mathcal {F}}}\) defined as in (61), and it is crucial to ensure condition [H6] for i\(^2\)Piano.

Proposition 15

Let Assumptions [A1]–[A4] be satisfied and assume that \(\{ { x}^{(k)}\}_{k\in {\mathbb {N}}}\) satisfies [H1], [H2], [H3] with \({{\mathcal {F}}}\) defined as in (61). If \(\{(x^{(k_j)},\rho ^{(k_j)})\}_{j\in {\mathbb {N}}}\) is a subsequence of \(\{( { x}^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converging to some \((x^*,\rho ^*)\in {\mathbb {R}}^n\times {\mathbb {R}}\) such that \(\lim _{j\rightarrow \infty }\Vert u^{(k_j)}-x^{(k_j)}\Vert =0\), then \(\rho ^*=0\) and \(\lim _{j\rightarrow \infty }{{\mathcal {F}}}( u^{(k_j)},\rho ^{(k_j)}) = {{\mathcal {F}}}(x^*,\rho ^*)\).

Proof

Recalling that [H1] implies \(\displaystyle \lim _{k\rightarrow \infty } d_k=0\), from [H3] and thanks to the separable structure of \({\mathcal {F}}\), we obtain that there exists \({\hat{v}}^{(k)}\in \partial f(u^{(k)})\) such that \(\displaystyle \lim _{k\rightarrow \infty } \Vert {\hat{v}}^{(k)}\Vert =\displaystyle \lim _{k\rightarrow \infty }\rho ^{(k)}= 0\). In particular, in view of (12), we can write \({\hat{v}}^{(k)}= \nabla f_0( u^{(k)}) + w^{(k)}\), where \(w^{(k)}\in \partial f_1( u^{(k)})\). Therefore, by continuity of \(\nabla f_0\), the following implication holds

$$\begin{aligned} \lim _{k\rightarrow \infty }{\hat{v}}^{(k)}=0\Rightarrow \lim _{j\rightarrow \infty } w^{(k_j)} = -\nabla f_0(x^*). \end{aligned}$$

Adding the quantity \(\frac{1}{2} (\rho ^{(k_j)})^2\) to both sides of the subgradient inequality yields

$$\begin{aligned} f_1(x^*) + \frac{1}{2} (\rho ^{(k_j)})^2\ge & {} f_1( u^{(k_j)}) + \langle w^{(k_j)},x^*-u^{(k_j)}\rangle + \frac{1}{2} (\rho ^{(k_j)})^2\\= & {} {{\mathcal {F}}}( u^{(k_j)},\rho ^{(k_j)}) -f_0(u^{(k_j)}) + \langle w^{(k_j)}, x^*-u^{(k_j)}\rangle \end{aligned}$$

where the last equality is obtained by adding and subtracting \(f_0(u^{(k_j)})\) to the right-hand-side. Taking limits on both sides we obtain

$$\begin{aligned} f_1(x^*) +\frac{1}{2} (\rho ^*)^2\ge \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) -f_0(x^*), \end{aligned}$$

which, rearranging terms, gives \(\displaystyle \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) \le {{\mathcal {F}}}(x^*,\rho ^*)\). On the other hand, \({\mathcal {F}}\) is lower semicontinuous by assumption, therefore \(\displaystyle \lim _{j\rightarrow \infty }{{\mathcal {F}}}(u^{(k_j)},\rho ^{(k_j)}) \ge {{\mathcal {F}}}(x^*,\rho ^*)\), which completes the proof. \(\square\)

We are now ready to prove Theorem 6, which states the convergence of the i\(^2\)Piano iterates to a stationary point.

Proof

(Proof of Theorem 6) By Propositions 12-13-14, we know that conditions [H1]-[H2]-[H3] hold for i\(^2\)Piano. Furthermore, if \(\omega =1\), then \(u^{(k)}= {\hat{x}}^{(k+1)}= { x}^{(k+1)}\), and (59) directly implies that \(\displaystyle \lim _{k\rightarrow \infty } \Vert u^{(k)}- { x}^{(k)}\Vert = 0\). If \(\omega <1\), using (36) and (60) we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert u^{(k)}- { x}^{(k)}\Vert&=\lim _{k\rightarrow \infty } \Vert y^{(k)}- { x}^{(k)}\Vert \\&\le \lim _{k\rightarrow \infty }\sqrt{-2\alpha _{max}\left( 1+\frac{\tau }{2}\right) h^{(k)}( { x}^{(k+1)}; { x}^{(k)}, { x}^{(k-1)})}= 0. \end{aligned}$$

Then, in both cases, the assumptions of Proposition 15 are satisfied and, therefore, condition [H6] holds. Finally, condition [H4] follows from (58), while condition [H5] is trivially satisfied, since both sequences \(\{a_k\}_{k\in {\mathbb {N}}}\) and \(\{b_k\}_{k\in {\mathbb {N}}}\) are constant. Then, Theorem 5 applies and guarantees that the sequence \(\{(x^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converges to a stationary point \((x^*,\rho ^*)\) of \({\mathcal {F}}\). Note that, since \({\mathcal {F}}\) is the sum of separable functions, its subdifferential can be written as \(\partial {\mathcal {F}}(x,\rho )=\partial f(x)\times \{\rho \}\). Then, \((x^*,\rho ^*)\) is stationary for \({\mathcal {F}}\) if and only if \(\rho ^*=0\) and \(0\in \partial f(x^*)\). Hence, \(x^*\) is a stationary point for f and \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) converges to it. \(\square\)
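To make the structure of the method more tangible, here is a plausible Python schematic of one i\(^2\)Piano-type iteration in the exact case \(\tau =0\) (so \(\theta =1\) and the proximal point is computed exactly), assembled from Lemma 11 and Proposition 12. It is not the authors' listing: grad_f0, prox_f1 and all numerical values are placeholders, and the inexactness criterion of Sect. 4.1 is not modelled.

```python
import numpy as np

def i2piano_step(x, x_prev, f0, grad_f0, prox_f1, L_k, delta, gamma,
                 theta=1.0, omega=1.0, eta=2.0):
    """One inertial forward-backward step with backtracking on L_k
    (schematic reading of STEPs 2-6; exact prox, i.e. tau = 0)."""
    g = grad_f0(x)
    while True:
        b_k = (L_k + 2.0 * delta) / (L_k + 2.0 * gamma)
        beta_k = (b_k - 1.0) * (1.0 + theta * omega) / (2.0 * b_k - 1.0)
        alpha_k = (1.0 + theta * omega - beta_k) / (L_k + 2.0 * delta)   # Eq. (54)
        # exact minimizer of h^{(k)}(.; x, x_prev): an inertial forward-backward point
        x_new = prox_f1(x - alpha_k * g + beta_k * (x - x_prev), alpha_k)
        # local descent-lemma test (inequality (24)); if violated, enlarge L_k
        lin = f0(x) + g @ (x_new - x) + 0.5 * L_k * np.linalg.norm(x_new - x) ** 2
        if f0(x_new) <= lin:
            return x_new, L_k
        L_k *= eta
```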

A.3 Convergence analysis of iPila

This section aims to develop the convergence framework for iPila. The first issue to be addressed is the well-posedness of the line-search algorithm. To this end, we first prove the following lemma.

Lemma 16

Let \(d^{(k)}\in {\mathbb {R}}^{2n}\) be defined according to (31)–(32), where \(0< {\alpha _k}\le \alpha _{max}\), \(\beta _k\ge 0\), \(\gamma _k\ge \gamma _{min}\), for some given constants \(\alpha _{max}>0\), \(\gamma _{min}\ge 0\). Define \(\Delta _k \in {\mathbb {R}}_{\le 0}\) as

$$\begin{aligned} \Delta _k = h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2. \end{aligned}$$
(68)

Then, we have

$$\begin{aligned} \Phi '(x^{(k)},s^{(k)};d_x^{(k)},d_s^{(k)})&\le \Delta _k\end{aligned}$$
(69)
$$\begin{aligned}&\le - a\Vert {\tilde{y}}^{(k)}-x^{(k)}\Vert ^2-\gamma _{min} \Vert x^{(k)}-s^{(k)}\Vert ^2 \end{aligned}$$
(70)

where \(a>0\) is a positive constant. Therefore, \(\Phi '( { x}^{(k)}, s^{(k)};d_x^{(k)},d_s^{(k)})<0\) whenever \({\tilde{y}}^{(k)}\ne { x}^{(k)}\) or \({ x}^{(k)}\ne s^{(k)}\).

Proof

We first observe that (29) with \(x= { x}^{(k)}\), \(s= s^{(k)}\), \(y={\tilde{y}}^{(k)}\), \(d_x=d_x^{(k)}\), \(d_s=d_s^{(k)}\), gives:

$$\begin{aligned} \Phi '&(x^{(k)},s^{(k)};d_x^{(k)},d_s^{(k)}) \nonumber \\&\le f_1({\tilde{y}}^{(k)})-f_1(x^{(k)})+\langle \nabla f_0(x^{(k)})+x^{(k)}-s^{(k)},d_x^{(k)}\rangle +\langle s^{(k)}-x^{(k)},d_s^{(k)}\rangle \end{aligned}$$
(71)
$$\begin{aligned}&= f_1({\tilde{y}}^{(k)}) -f_1( { x}^{(k)}) +\langle \nabla f_0( { x}^{(k)}),{\tilde{y}}^{(k)}- { x}^{(k)}\rangle + \langle { x}^{(k)}- s^{(k)},{\tilde{y}}^{(k)}- { x}^{(k)}\rangle +\nonumber \\&+ \left( 1+\frac{\beta _k}{\alpha _k}\right) \langle s^{(k)}- { x}^{(k)},{\tilde{y}}^{(k)}- { x}^{(k)}\rangle -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2\nonumber \\&= f_1({\tilde{y}}^{(k)}) -f_1( { x}^{(k)}) +\langle \nabla f_0( { x}^{(k)})-\frac{\beta _k}{\alpha _k}( { x}^{(k)}- s^{(k)}),{\tilde{y}}^{(k)}- { x}^{(k)}\rangle -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2\nonumber \\&\le h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2 \nonumber \\&\le -\frac{\theta }{2\alpha _k}\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 -\gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2 \end{aligned}$$
(72)

where the last inequality follows from (38). Then, the thesis follows from \(\alpha _k\le \alpha _{max}\) with \(a = \theta /{2\alpha _{max}}\). \(\square\)

The well-posedness of the line-search procedure and its main properties are summarized in the following lemma.

Lemma 17

Let the assumptions of Lemma 16 be satisfied with, in addition, \(\beta _k\le \beta _{max}\), \(\gamma _k\le \gamma _{max}\), \({ \beta _{max}\ge 0}\), \(\gamma _{max}\ge 0\). Then, the Armijo backtracking line-search algorithm terminates in a finite number of steps, and there exists \(\lambda _{min}>0\) such that the parameter \(\lambda _k^+\) computed with the line-search algorithm satisfies

$$\begin{aligned} \lambda _k^+ \ge \lambda _{min}. \end{aligned}$$
(73)

Proof

Let us first prove that

$$\begin{aligned} \Delta _k\le - C \Vert d^{(k)}\Vert ^2, \ \ \forall k\in {\mathbb {N}}, \end{aligned}$$
(74)

for a positive constant \(C>0\). Setting \(\delta _k = 1+\frac{\beta _k}{\alpha _k}\), the bounds on the parameters imply that \(1\le \delta _k\le {\bar{\delta }}\), with \({\bar{\delta }} = 1+\frac{\beta _{max}}{\alpha _{min}}\). By definition of \(d^{(k)}\) in (30) we have

$$\begin{aligned} \Vert d^{(k)}\Vert ^2&= \Vert d_x^{(k)}\Vert ^2 + \Vert d_s^{(k)}\Vert ^2\\&= \Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \Vert \delta _k({\tilde{y}}^{(k)}- { x}^{(k)}) +\gamma _k( { x}^{(k)}- s^{(k)})\Vert ^2\\&= (1+\delta _k^2)\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \gamma _k^2\Vert { x}^{(k)}- s^{(k)}\Vert ^2+2\delta _k\gamma _k\langle {\tilde{y}}^{(k)}- { x}^{(k)}, { x}^{(k)}- s^{(k)}\rangle \\&\le (1+\delta _k^2+\delta _k\gamma _k)\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + (\gamma _k^2 + \delta _k\gamma _k)\Vert { x}^{(k)}- s^{(k)}\Vert ^2\\&\le (1+{\bar{\delta }}^2+{\bar{\delta }} \gamma _{max})\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \gamma _k(\gamma _{max} + {\bar{\delta }})\Vert { x}^{(k)}- s^{(k)}\Vert ^2\\&\le \frac{1}{C} (a\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2 + \gamma _k\Vert { x}^{(k)}- s^{(k)}\Vert ^2) \end{aligned}$$

where \(C =1/\max \{(1+{\bar{\delta }}^2+{\bar{\delta }} \gamma _{max})/a,\gamma _{max} + {\bar{\delta }}\}\). Multiplying both sides of the last inequality above by C and combining with (70) gives (74).

Since \(\Phi _0\) has Lipschitz continuous gradient with Lipschitz constant \(L_{\Phi _0}=L+2\) (see (27)), we can apply the Descent Lemma obtaining

$$\begin{aligned}&\Phi _0( { x}^{(k)}+\lambda d_x^{(k)}, s^{(k)}+\lambda d_s^{(k)}) \le \Phi _0( { x}^{(k)}, s^{(k)}) + \lambda \langle \nabla \Phi _0( { x}^{(k)}, s^{(k)}),d^{(k)}\rangle + \frac{ L_{\Phi _0}}{2} \lambda ^2\Vert d^{(k)}\Vert ^2\nonumber \\&\le \Phi _0( { x}^{(k)}, s^{(k)}) + \lambda \langle \nabla \Phi _0( { x}^{(k)}, s^{(k)}),d^{(k)}\rangle - \frac{ L_{\Phi _0}}{2C} \lambda ^2\Delta _k, \end{aligned}$$
(75)

where the second inequality follows from (74). From Jensen's inequality applied to the convex function \(f_1\) we also obtain

$$\begin{aligned} \Phi _1( { x}^{(k)}+\lambda d_x^{(k)}, s^{(k)}+\lambda d_s^{(k)})= & {} f_1( { x}^{(k)}+\lambda d_x^{(k)})= f_1(\lambda {\tilde{y}}^{(k)}+(1-\lambda ) { x}^{(k)})\nonumber \\{} & {} \le (1-\lambda )f_1( { x}^{(k)}) + \lambda f_1({\tilde{y}}^{(k)}). \end{aligned}$$
(76)

Summing (75) with (76) gives

$$\begin{aligned} \Phi ( { x}^{(k)}&+\lambda d_x^{(k)}, s^{(k)}+\lambda d_s^{(k)}) \le \\&\le \Phi ( { x}^{(k)}, s^{(k)}) + \lambda \left( f_1({\tilde{y}}^{(k)}) - f_1( { x}^{(k)}) +\langle \nabla \Phi _0( { x}^{(k)}, s^{(k)}),d^{(k)}\rangle \right) - \frac{ L_{\Phi _0}}{2C} \lambda ^2\Delta _k\\&\le \Phi ( { x}^{(k)}, s^{(k)}) + \lambda \Delta _k - \frac{ L_{\Phi _0}}{2C} \lambda ^2\Delta _k, \end{aligned}$$

where the last inequality follows from (71)–(72). The above relation implies

$$\begin{aligned} \Phi ( { x}^{(k)}+\lambda d_x^{(k)}, s^{(k)}+ \lambda d_s^{(k)}) \le \Phi ( { x}^{(k)}, s^{(k)}) + \lambda (1-\rho \lambda )\Delta _k,\ \ \forall \lambda \in [0,1] \end{aligned}$$

with \(\rho = \frac{ L_{\Phi _0}}{2C}\). Moreover, comparing the above inequality with the Armijo condition (33) shows that the last one is surely fulfilled when \(\lambda _k^+\) satisfies \(1-\rho \lambda _k^+\ge \sigma\), that is when \(\lambda _k^+\le (1-\sigma )/\rho\).

Since \(\lambda _k^+\) in the backtracking procedure is obtained starting from 1 and by successive reductions of a factor \(\delta <1\), we have \(\lambda _k^+\ge \delta ^{j_0}\), where \(j_0\) is the smallest nonnegative integer such that \(\delta ^{j_0}\le (1-\sigma )/\rho\). Therefore, (73) is satisfied with \(\lambda _{min} = \delta ^{j_0}\). \(\square\)
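The backtracking procedure analysed in Lemma 17 can be sketched as follows (Python; Phi is any callable implementing the merit function \(\Phi\), and the function name and default values are ours, not the paper's listing).

```python
def armijo_backtracking(Phi, x, s, d_x, d_s, Delta_k, sigma, delta=0.5, max_iter=60):
    """Find lambda in (0,1] satisfying the Armijo-type condition (33):
        Phi(x + lam*d_x, s + lam*d_s) <= Phi(x, s) + sigma*lam*Delta_k,
    with Delta_k <= 0 as in (68), starting from lam = 1 and shrinking by delta < 1."""
    lam = 1.0
    Phi0 = Phi(x, s)
    for _ in range(max_iter):
        if Phi(x + lam * d_x, s + lam * d_s) <= Phi0 + sigma * lam * Delta_k:
            return lam
        lam *= delta
    return lam   # Lemma 17 guarantees finite termination; the cap is only a safeguard
```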

In the remainder of this section we show that iPila can be cast in the framework of the abstract scheme. We first show that conditions [H1]–[H3] are satisfied.

Proposition 18

Let \(\{( { x}^{(k)}, s^{(k)})\}_{k\in {\mathbb {N}}}\) be the sequence generated by iPila. Then, condition [H1] holds with \(d_k=\sqrt{-\Delta _k}\), \(a_k = \sigma \lambda _{min}\). Moreover, under Assumption [A4], we also have

$$\begin{aligned} 0=\lim _{k\rightarrow \infty } \Vert { x}^{(k)}- s^{(k)}\Vert = \lim _{k\rightarrow \infty } h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) = \lim _{k\rightarrow \infty }\Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert . \end{aligned}$$
(77)

Proof

From the updating rule at STEP 7 and from Lemma 17, we have

$$\begin{aligned} \Phi ( { x}^{(k+1)}, s^{(k+1)})\le & {} \Phi ( { x}^{(k)}, s^{(k)}) + \sigma \lambda _k\Delta _k\le \Phi ( { x}^{(k)}, s^{(k)}) + \sigma \lambda _{min}\Delta _k. \end{aligned}$$

Then, condition [H1] is satisfied with \(d_k^2=-\Delta _k\), \(a_k = \sigma \lambda _{min}\). Since from Assumption [A4] f is bounded from below, \(\Phi\) is bounded from below as well. Therefore, [H1] implies \(\displaystyle -\sum _{k=0}^{\infty }\Delta _k<\infty\) which, in turn, yields \(\displaystyle \lim _{k\rightarrow \infty }\Delta _k=0\). Recalling (70), this implies (77). \(\square\)

In the following we describe the setup for proving [H2], with the second auxiliary function defined as in (61).

Proposition 19

Let \(\{( { x}^{(k)}, s^{(k)})\}_{{k\in {\mathbb {N}}}}\) be the sequence generated by Algorithm iPila with \(\gamma _{min}>0\) and let \({{\mathcal {F}}}\) be defined as in (61). Then, there exist \(\{\rho ^{(k)}\}_{k\in {\mathbb {N}}},\{r_k\}_{k\in {\mathbb {N}}}\subset {\mathbb {R}}\), with \(\lim _{k\rightarrow \infty }r_k=0\), such that [H2] holds with \(u^{(k)}= {\hat{y}}^{(k)}\), where \({\hat{y}}^{(k)}\) is the exact minimizer of \(h^{(k)}(y; { x}^{(k)}, s^{(k)})\), i.e.,

$$\begin{aligned} {\hat{y}}^{(k)}= \mathop {\textrm{argmin}}\limits _{y\in {\mathbb {R}}^n} h^{(k)}(y; { x}^{(k)}, s^{(k)}). \end{aligned}$$

Proof

From STEP 7, we have

$$\begin{aligned} \Phi ( { x}^{(k+1)}, s^{(k+1)})\le & {} \Phi ({\tilde{y}}^{(k)}, { x}^{(k)}) = f({\tilde{y}}^{(k)}) + \frac{1}{2} \Vert {\tilde{y}}^{(k)}- { x}^{(k)}\Vert ^2\\\le & {} f( {\hat{y}}^{(k)}) - \left( c+\frac{\alpha _{max}}{\theta }\right) h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + d\Vert { x}^{(k)}- s^{(k)}\Vert ^2, \end{aligned}$$

where the last inequality follows from (41) and (38). Setting

$$\begin{aligned} \rho ^{(k)}= \sqrt{2}\left( -\left( c+\frac{\alpha _{max}}{\theta }\right) h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)})+ d\Vert { x}^{(k)}- s^{(k)}\Vert ^2\right) ^{\frac{1}{2}}, \end{aligned}$$
(78)

we obtain \(\Phi ( { x}^{(k+1)}, s^{(k+1)}) \le {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\), which represents the left-most inequality in [H2], with \(u^{(k)}= {\hat{y}}^{(k)}\). On the other hand, from inequality (42) we obtain

$$\begin{aligned} {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)}) = f( {\hat{y}}^{(k)}) +\frac{1}{2} (\rho ^{(k)})^2 \le f( { x}^{(k)}) -{\bar{c}} h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) +{\bar{d}}\Vert { x}^{(k)}- s^{(k)}\Vert ^2+\frac{1}{2} (\rho ^{(k)})^2. \end{aligned}$$

Setting \(r_k = (\rho ^{(k)})^2/2 -{\bar{c}} h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) +({\bar{d}}-\frac{1}{2})\Vert { x}^{(k)}- s^{(k)}\Vert ^2\), we can write

$$\begin{aligned} {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)}) \le f( { x}^{(k)}) +\frac{1}{2} \Vert { x}^{(k)}- s^{(k)}\Vert ^2 + r_k = \Phi ( { x}^{(k)}, s^{(k)}) + r_k. \end{aligned}$$

From (77) we have that \(\lim _{k\rightarrow \infty }r_k=0\) and this proves [H2]. \(\square\)

Proposition 20

Let \(\{( { x}^{(k)}, s^{(k)})\}_{{k\in {\mathbb {N}}}}\) be the sequence generated by iPila with \(\gamma _{min}>0\). Then, there exists a positive constant b such that [H3] is satisfied with \(I = \{1\}\), \(\theta _1 = 1\), \(\zeta _k = 0\).

Proof

From (40) we know that there exists a subgradient \({\hat{v}}^{(k)}\in \partial f(\hat{y}^{(k)})\) such that

$$\begin{aligned} \Vert {\hat{v}}^{(k)}\Vert \le q\sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)})} + q\Vert { x}^{(k)}- s^{(k)}\Vert \end{aligned}$$
(79)

and, reasoning as in the proof of Proposition 15, it follows that

$$\begin{aligned} \Vert \partial {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\Vert _- \le \left\| \begin{pmatrix} {\hat{v}}^{(k)}\\ \rho ^{(k)}\end{pmatrix} \right\| \le \Vert {\hat{v}}^{(k)}\Vert + |\rho ^{(k)}|. \end{aligned}$$
(80)

Let us analyze the two terms at the right-hand side of the inequality above, showing that both can be bounded from above with a multiple of \(\sqrt{-\Delta _k}\). From (79) we obtain

$$\begin{aligned} \Vert {\hat{v}}^{(k)}\Vert\le & {} q\sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + \gamma _{min}\Vert { x}^{(k)}- s^{(k)}\Vert ^2} \\{} & {} + \frac{q}{\sqrt{\gamma _{min}}}\sqrt{\gamma _{min}\Vert { x}^{(k)}- s^{(k)}\Vert ^2 - h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)})}, \end{aligned}$$

which, setting \(A = q\left( 1+1/\sqrt{\gamma _{min}}\right)\) and using (70), yields

$$\begin{aligned} \Vert {\hat{v}}^{(k)}\Vert \le A \sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + \gamma _{min}\Vert { x}^{(k)}- s^{(k)}\Vert ^2}\le A\sqrt{-\Delta _k}. \end{aligned}$$
(81)

On the other hand, from definition (78)

$$\begin{aligned} \rho ^{(k)}\le & {} B \sqrt{-h^{(k)}({\tilde{y}}^{(k)}; { x}^{(k)}, s^{(k)}) + \gamma _k \Vert { x}^{(k)}- s^{(k)}\Vert ^2} = B\sqrt{-\Delta _k} \end{aligned}$$

where \(B = \sqrt{2}\max \left\{ c+ \alpha _{max}/{\theta }, d / {\gamma _{min}}\right\} ^{\frac{1}{2}}\). Therefore, combining the last inequality above with (81) and (80), yields \(\Vert \partial {{\mathcal {F}}}( {\hat{y}}^{(k)},\rho ^{(k)})\Vert _-\le (A+B)\sqrt{-\Delta _k}\) which proves that [H3] is satisfied with \(I = \{1\}\), \(\theta _1 = 1\), \(\zeta _k = 0\), \(b=A+B\), \(b_k=1\). \(\square\)

We are now ready to prove Theorem 7, which states the main convergence result for Algorithm iPila.

Proof of Theorem 7

By Propositions 18, 19, and 20, we know that conditions [H1]–[H3] hold for iPila. From (36) we have \(\displaystyle \lim _{k\rightarrow \infty }\Vert u^{(k)}- { x}^{(k)}\Vert = \displaystyle \lim _{k\rightarrow \infty }\Vert \hat{y}^{(k)}- { x}^{(k)}\Vert =0\). Hence we can apply Proposition 15 and conclude that [H6] holds. Moreover, condition [H4] holds as a consequence of Lemma 16, since \(d_k= \sqrt{-\Delta _k}\) and \(\Vert \tilde{y}^{(k)}-x^{(k)}\Vert \ge \Vert x^{(k+1)}-x^{(k)}\Vert /\lambda _k\ge \Vert x^{(k+1)}-x^{(k)}\Vert\) (see STEP 7). Finally, condition [H5] is trivially satisfied, since both sequences \(\{a_k\}_{k\in {\mathbb {N}}}\) and \(\{b_k\}_{k\in {\mathbb {N}}}\) are constant. Then Theorem 5 applies and guarantees that the sequence \(\{(x^{(k)},\rho ^{(k)})\}_{k\in {\mathbb {N}}}\) converges to a stationary point \((x^*,\rho ^*)\) of \({\mathcal {F}}\). Note that, since \({\mathcal {F}}\) is the sum of separable functions, its subdifferential can be written as \(\partial {\mathcal {F}}(x,\rho )=\partial f(x)\times \{\rho \}\). Then, \((x^*,\rho ^*)\) is stationary for \({\mathcal {F}}\) if and only if \(\rho ^*=0\) and \(0\in \partial f(x^*)\). Hence \(x^*\) is a stationary point for f and \(\{x^{(k)}\}_{k\in {\mathbb {N}}}\) converges to it. \(\square\)

About this article

Cite this article

Bonettini, S., Ochs, P., Prato, M. et al. An abstract convergence framework with application to inertial inexact forward–backward methods. Comput Optim Appl 84, 319–362 (2023). https://doi.org/10.1007/s10589-022-00441-4
