Abstract
In a real Hilbert space, we study a new class of forward-backward algorithms for structured non-smooth minimization problems. For a special choice of the parameters, we recover the method AFB (Accelerated Forward-Backward), which was recently discussed as an enhanced variant of FISTA (Fast Iterative Shrinkage-Thresholding Algorithm). Our algorithms enjoy the well-known properties of AFB: they generate convergent sequences (xn) that minimize the function values at the rate o(n⁻²). Another important specificity of our processes is that they can be regarded as discrete models suggested by first-order formulations of Newton-like dynamical systems. This permits us to extend to the non-smooth setting a property of fast convergence to zero of the gradients, established so far only for discrete Newton-like dynamics with smooth potentials. In particular, as a new result, we show that the latter property also applies to AFB. To prove this stability phenomenon, we develop a technical analysis that may also be useful for many other related developments. Numerical experiments are furthermore performed to illustrate the properties of the considered algorithms in comparison with existing ones.
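To fix ideas, the classical forward-backward scheme that AFB enhances is the FISTA iteration of Beck and Teboulle: a forward (gradient) step on the smooth part followed by a backward (proximal) step on the non-smooth part, with inertial extrapolation. The sketch below applies it to the lasso problem min (1/2)‖Ax − b‖² + λ‖x‖₁; it is only an illustration of the standard scheme, not the AFB variant analyzed in this paper, and the function names and problem instance are our own.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(A, b, lam, step, n_iter=500):
    """Classical FISTA for min 0.5*||Ax - b||^2 + lam*||x||_1.

    Illustrative sketch only: the paper's AFB-type methods use different
    inertial parameters and achieve the sharper o(n^-2) rate; this is the
    standard O(n^-2) scheme with step <= 1/L, L = ||A^T A||.
    """
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        # Forward step on the smooth part, backward (prox) step on the l1 part.
        x_new = soft_threshold(y - step * A.T @ (A @ y - b), step * lam)
        # Nesterov-type momentum coefficient update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Inertial extrapolation of the iterates.
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

With λ = 0 the iteration reduces to Nesterov's accelerated gradient method for least squares, which is a convenient sanity check.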





References
Alvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian driven damping. Application to Optimization and Mechanics. J. Math. Pures appl. 81(8), 747–779 (2002)
Attouch, H., Bolte, J., Redont, P.: Optimizing properties of an inertial dynamical system with geometric damping. Control. Cybern. 31, 643–657 (2002)
Attouch, H., Cabot, A.: Convergence rates of inertial forward-backward algorithms. SIAM J. Optim. 28, 849–874 (2018)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial forward-backward algorithm for structured monotone inclusions. Applied Math. Optimization 80, 547–598 (2019)
Attouch, H., Chbani, Z., Fadili, J., Riahi, H.: First-order optimization algorithms via inertial systems with Hessian driven damping, arXiv preprint, arXiv:1907.10536 (2019)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168(1–2), 123–175 (2018)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1/k2. SIAM J. Optimization 26(3), 1824–1834 (2016)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differential Equations 261(10), 5734–5783 (2016)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Brezis, H.: Opérateurs Maximaux Monotones, Math. Stud, 5. North-Holland, Amsterdam (1973)
Chambolle, A., Dossal, C.: On the convergence of the iterates of FISTA. JOTA 166(3), 968–982 (2015)
Cruz, J.B., Nghia, T.: On the convergence of the proximal forward-backward splitting method with linesearches. Optim. Methods and Software 31 (6), 1209–1238 (2016)
Güler, O.: On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control Optimization 29, 403–419 (1991)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Iutzeler, F., Hendrickx, J.M.: A generic online acceleration scheme for optimization algorithms via relaxation and inertia. arXiv:1603.05398v3 (2017)
Lemaire, B.: The Proximal Algorithm. In: New Methods in Optimization and Their Industrial Uses, J.P. Penot (Ed), Internat. Ser. Numer. Math, 87, pp. 73-87. Birkhauser, Basel (1989)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Lorenz, D.A., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vision, pp. 1–15 (2014)
Labarre, F., Maingé, P. E.: First-order frameworks for continuous Newton-like dynamics governed by maximally monotone operators. Set-Valued and Variational Analysis, pp. 1–27 (2021)
Maingé, P.E., Maruster, S.: Convergence in norm of modified Krasnoselski-Mann iterations for fixed points of demicontractive mappings. Appl. Math. Comput. 217(24), 9864–9874 (2011)
May, R.: Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Turkish Journal of Mathematics, 41(3). https://doi.org/10.3906/mat-1512-28 (2015)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155(2), 447–454 (2003)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k2). Soviet Mathematics Doklady 27, 372–376 (1983)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Programming, Ser. B 140, 125–161 (2013). https://doi.org/10.1007/s10107-012-0629-5
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Amer. Math. Soc. 73, 591–597 (1967)
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Su, W., Boyd, S., Candes, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)
Scheinberg, K., Goldfarb, D., Bai, X.: Fast first-order methods for composite convex optimization with backtracking. Found. Comput. Math. 14(3), 389–417 (2014)
Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via High-Resolution differential equations. https://doi.org/10.13140/RG.2.2.20063.92329 (2018)
Apidopoulos, V., Aujol, J.F., Dossal, C.: Convergence rate of inertial forward-backward algorithms beyond Nesterov’s rule. Math. Program., Ser. A, pp. 1–20. https://doi.org/10.1007/s10107-018-1350-9 (2018)
Appendix 1. Proof of Proposition 3.2
Let {κ, e, νn} be positive parameters, and set ϱ = 1 − κ, τn = e + νn+1 and un = yn − xn. It is easily seen that (3.8a) can be rewritten as (for n ≥ p)
The remainder of the proof is divided into the following parts (r1)–(r4):
(r1) An estimate from the inertial part of the method. Given \((s,q) \in [0,\infty ) \times {\mathcal H} \), we begin by proving that the discrete derivative \( \dot {G}_{n+1}(s,q)\) satisfies
In order to get this result, we readily notice that \(\dot {G}_{n+1}(s,q)\) can be formulated as
where an := 〈q − xn, un〉, bn := (1/2)∥xn − q∥² and cn := (1/2)∥un∥². For the sake of clarity, and so as to estimate the right side of (A.3), we set
Clearly, by an = 〈q − xn,un〉 and \(u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1} \) (from (A.1c)), we get
Again from an = 〈q − xn,un〉, and noticing \(\dot {u}_{n+1} = \dot {y}_{n+1} - \dot {x}_{n+1} \) (as un = yn − xn), we readily have
Taking the scalar product of each side of (A.1b) by \(q-\dot {x}_{n+1}\), along with \(u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1}\) (from (A.1c)), amounts to Pn − Wn = κ⁻¹𝜃nRn, which, by \( \theta _{n} = \tau _{n}^{-1} (\nu _{n} - {\varrho } \nu _{n+1})\) (from (A.1a)) is equivalent to
Therefore, by (A.4), (A.5) and (A.6), and recalling that ϱ = 1 − κ, we are led to
From bn+1 = (1/2)∥xn+1 − q∥², we readily get
In addition, by cn+1 = (1/2)∥un+1∥², and \(\dot {u}_{n+1}= - \kappa u_{n} - \dot {x}_{n+1} \) (from un = yn − xn and \(\dot {y}_{n+1} =-\kappa u_{n}\)), we immediately have
In light of (A.3) together with (A.7), (A.8) and (A.9), we are led to
where the quantity \( \bar {\eta }_{n}\) is given by
This leads to the desired result.
(r2) An estimate from the proximal part of the method. Let us establish that, for any ξn ≠ 1, it holds that
Indeed, we have \(\dot {x}_{n+1} =-\theta _n u_{n} -\chi _{n}^{*}\) (from (A.1b)), which, for any ξn ≠ 1, can be rewritten as
where \(H_{n}= \dot {x}_{n+1} + (1-\xi _n)^{-1}\theta _n u_{n} \). Furthermore, by \(-\chi _{n}^{*}= \dot {x}_{n+1}+\theta _{n} u_{n}\) (again using (A.1b)) and denoting \(Q_{n}=\langle \dot {x}_{n+1}, u_{n}\rangle \), we simply obtain
Therefore, by adding \((1/2) \| \chi _{n}^{*}\|^{2}\) to the scalar product of the left side of equality (A.11) with \(\chi _{n}^{*}\), and using (A.12) and \( \| \chi _{n}^{*}\|^{2}= \|\dot {x}_{n+1}+ \theta _{n} u_{n}\|^{2}\), we get
This yields (A.10).
(r3) Combining proximal and inertial effects. Let (3.10) hold and set \(\rho _{n}= 1- (1-\kappa ) \frac {\nu _{n+1}}{\nu _{n}}\). It is worthwhile noticing that the term 𝜃n involved in (A.1) can be simply expressed as
So, by (A.13), in light of ρn > 0 (from condition (3.3a)), we deduce that 𝜃n is a positive sequence.
Now, we introduce the real sequence \((\bar {\gamma }_{n})\) defined by
Clearly, by (A.14), along with τn > 0 (as τn := e + νn+1) and ρn > 0, we have
Next, given s > 0, we show that the iterates produced by (3.8a) (or, equivalently, by (A.1)) verify
where Tn(u, x) is defined for any \((u,x) \in {\mathcal H}^{2}\) by
together with the parameters
Indeed, by (A.2) and setting \(Q=\langle \dot {x}_{n+1} , u_{n} \rangle \), we know that
Moreover, in light of \(\bar {\gamma }_{n} \neq 1\) (from (A.15)), by using (A.10) (with the special value \(\xi _{n}= \gamma _{n}:=1- \frac {s \rho _{n}}{\tau _{n}}\)) and recalling that \(\theta _n = \frac {\nu _{n} \rho _{n}}{\tau _{n}}\) we obtain
Then, multiplying equality (A.20) by \(\rho _{n}^{-1} {\tau _{n}^{2}}\), and adding the resulting equality to (A.19), we get
Hence, noticing that νnρn = νn − ϱνn+1, we infer that (A.16)–(A.17) is actually satisfied together with the parameters
This leads to the desired result.
(r4) Finally, we give an alternative formulation of the quantity Tn(u, x) given by (A.17)–(A.18). For this purpose, we begin by reformulating σn. By the definitions τn := e + νn+1 and \(\bar {\gamma }_{n}:=1-s \frac {\rho _{n}}{\tau _{n}}\), an easy computation yields
where τn,t = t + 2νn+1 (for t ≥ 0). As a consequence, by the previous definition of σn (in (A.18)), we obtain
Then we consider the following two situations relative to the constant κ:
- In the special case when κ = 1 (hence ρn = 1 and \( \rho _{n}^{-1}=1\)), we obviously have wn = 0 and ηn = 0. Then, for \((u,x) \in {\mathcal H}^{2}\), by definition of Tn (in (A.17)) along with \( {\sigma }_{n}= \frac {\left (e -s \right )}{2} \tau _{n,e} \) (from (A.21)) we obtain
- For \(\kappa \in (0,1) \cup (1,\infty )\) (hence ηn ≠ 0), also setting \({\varsigma }_{n}:=\frac {w_{n}}{2 \eta _{n}}\) and \(\psi _{n}:= 4 {\sigma }_{n} {\eta }_{n}- {w}_{n}^{2}\), by definition of Tn (in (A.17)) we classically have
On the one hand, by \({w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right )\) (from (A.18)) and remembering that τn,s = s + 2νn+1, we simply have \({w}_{n}^{2} = ({\varrho } \nu _{n+1})^{2} \left ((\nu _{n+1} )^{2} + s \tau _{n,s} \right )\). Hence, by (A.21) while using the definition of ψn, and setting Sn := ϱρnνnνn+1 (so that Sn = 2ηn and \(\psi _{n}= 2 {\sigma }_{n} S_{n}- {w}_{n}^{2}\)), we obtain
It is also easily checked that \(S_{n} \left (\rho _{n}^{-1} -1 \right ) = ({\varrho } \nu _{n+1})^{2} \), which by the previous equality yields
Then, noticing that eτn,e − sτn,s = (e − s)τn,e+s (as τn,t := t + 2νn+1, for t ≥ 0), we infer that \( \psi _{n} = \left (e-s \right ) \left (S_{n} \tau _{n,e} + ({\varrho } \nu _{n+1})^{2} \tau _{n,e+s} \right )\), which by (A.23) entails that
On the other hand, we clearly have \({\varsigma }_{n}=\frac {w_{n}}{S_{n}}\) (since Sn = 2ηn), together with \( {w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right ) \), Sn = ϱρnνnνn+1 and \( \frac {1}{\theta _{n}}=\frac { e+ \nu _{n+1}}{ \rho _{n} \nu _{n}}\) (from (A.13)), which gives us
Combining the last two results amounts to
This completes the proof.
Cite this article
Maingé, PE., Labarre, F. Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization. Numer Algor 90, 99–136 (2022). https://doi.org/10.1007/s11075-021-01181-y
Keywords
- Nesterov-type algorithm
- Inertial-type algorithm
- Global rate of convergence
- Non-smooth minimization
- Structured minimization
- Fast first-order method