Abstract
Motivated by the minimization of a non-linear least-squares function penalized by an \(L_p\)-norm, with \(p\in (1,2)\), arising from a 4DVar formulation of a data assimilation problem, we propose a modification of the non-linear conjugate gradient method in which the iterates and the line search are carried out in the topological dual space of \(L_p\). We prove the convergence of this new algorithm and assess its performance on a data assimilation setup based on the shallow-water equations, where the use of such an \(L_p\)-norm regularization is justified.
Data Availability
Data are available on request.
Notes
Other choices could be made for \(H_k(x_k^*)\) when \(\vert \langle \nabla f_k, J_q'(x_k^*)p_k \rangle \vert < \varepsilon \). For instance, one choice that would make \(J_q'(H_k(x_k^*)) p_k\) a descent direction for \(f\) whenever \(p_k\) is one would be to take \(H_k(x_k^*) = \mathbf{1}\), the vector with all components equal to 1.
For \(p=2\), we retrieve the Zoutendijk condition of Theorem 2.
Reminder of the abbreviations: GD: the classical Gradient Descent; LCGDS: Linear Conjugate Gradient in Dual Space; RPCG: Restricted Preconditioned Conjugate Gradient; GDDS: Gradient Descent in Dual Space; NLCG: Non-Linear Conjugate Gradient; NLCGDS: Non-Linear Conjugate Gradient in Dual Space.
References
Freitag, M.A., Nichols, N.K., Budd, C.J.: Resolution of sharp fronts in the presence of model error in variational data assimilation. Quarterly Journal of the Royal Meteorological Society 139, 742–757 (2013)
Gratton, S., Lawless, A.S., Nichols, N.K.: Approximate Gauss-Newton methods for nonlinear least squares problems. SIAM J. Optim. 18(1), 106–132 (2007)
Mascarenhas, W.F.: The divergence of the BFGS and Gauss Newton methods. Mathematical Programming 147, 253–276 (2014)
Conn, A.R., Gould, N.I.M., Toint, P.: Trust-Region Methods. SIAM, Philadelphia, PA, USA (2000)
Gürol, S.: Solving regularized nonlinear least-squares problem in dual space with application to variational data assimilation. PhD thesis, Toulouse INP (2013)
Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.S.: Regularization Methods in Banach Spaces. Radon Series on Computational and Applied Mathematics. De Gruyter, Berlin (2012)
Schöpfer, F., Louis, A.K., Schuster, T.: Nonlinear iterative methods for linear ill-posed problems in Banach spaces. Inverse Problems 22, 311–329 (2006)
Bonesky, T., Kazimierski, K.S., Maass, P., Schöpfer, F., Schuster, T.: Minimization of Tikhonov functionals in Banach spaces. Abstract and Applied Analysis 2008, 1–18 (2007)
Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (1999)
Fletcher, S.: Data Assimilation for the Geosciences: From Theory to Application. Elsevier (2017)
Herzog, R., Wollner, W.: A conjugate direction method for linear systems in Banach spaces. Journal of Inverse and Ill-Posed Problems 25 (2016)
Estatico, C., Gratton, S., Lenti, F., Titley-Peloquin, D.: A conjugate gradient like method for p-norm minimization in functional spaces. Numerische Mathematik 134, 895–922 (2017)
Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)
Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21–42 (1992)
Bergou, E., Gratton, S., Vicente, L.: Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation. SIAM/ASA Journal on Uncertainty Quantification 4(1), 924–951 (2016)
Bonino, B., Estatico, C., Lazzaretti, M.: Dual descent regularization algorithms in variable exponent Lebesgue spaces for imaging. Numerical Algorithms. https://doi.org/10.1007/s11075-022-01458-w (2023)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167–175 (2003). https://doi.org/10.1016/S0167-6377(02)00231-6
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math 16, 1–3 (1966)
Hager, W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006)
Hestenes, M.R., Stiefel, E.L.: Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards 49, 406–436 (1952)
Gratton, S., Toint, P.L., Tshimanga, J.: Conjugate gradients versus multigrid solvers for diffusion-based correlation models in data assimilation. Quarterly Journal of the Royal Meteorological Society 139(675), 1481–1487 (2013)
Gratton, S., Toint, P.L.: Approximate invariant subspaces and quasi-Newton optimization methods. Optimization Methods and Software 25(4), 507–529 (2010). https://doi.org/10.1080/10556780902992746
Mirouze, I., Weaver, A.: Representation of correlation functions in variational assimilation using an implicit diffusion operator. Quarterly Journal of the Royal Meteorological Society 136, 1421–1443 (2010)
Daley, R.: Atmospheric Data Analysis. Cambridge University Press, Cambridge Atmospheric and Space Science Series (1991)
Mallat, S.: A Wavelet Tour of Signal Processing, 3rd edn. Academic Press (2008). https://doi.org/10.1016/B978-0-12-374370-1.X0001-8
Asadi, N., Scott, K.A., Clausi, D.A.: Data fusion and data assimilation of ice thickness observations using a regularisation framework. Tellus A: Dynamic Meteorology and Oceanography (2019)
Bernigaud, A., Gratton, S., Lenti, F., Simon, E., Sohab, O.: Lp-norm regularization in variational data assimilation. Quarterly Journal of the Royal Meteorological Society 147, 2067–2081 (2021)
Hansen, P.C.: Regularization Tools version 4.0 for Matlab 7.3. Numerical Algorithms 46, 189–194 (2007)
Wang, Y., Navon, I.M., Wang, X., Cheng, Y.: 2D Burgers equation with large Reynolds number using POD/DEIM and calibration. International Journal for Numerical Methods in Fluids 82(12), 909–931 (2016)
Anzengruber, S.W.: The discrepancy principle for Tikhonov regularization in Banach spaces. PhD thesis, Johannes Kepler Universität Linz (2011)
Gratton, S., Tshimanga, J.: An observation-space formulation of variational assimilation using a restricted preconditioned conjugate gradient algorithm. Quarterly Journal of the Royal Meteorological Society 135(643), 1573–1585 (2009)
Acknowledgements
We thank Olivier Cots for his valuable feedback, as well as the two anonymous referees for their helpful comments and suggestions.
Funding
This study was supported by the French national program LEFE (Les Enveloppes Fluides et Environnement).
Contributions
All authors contributed jointly to this manuscript.
Ethics declarations
Ethics Approval
Not Applicable.
Conflicts of interest
The authors declare that they have no conflict of interest.
Descent algorithms in dual space as mirror descent algorithms
[16] has pointed out the similarities between Banach-space algorithms that transport either the iterates into the dual space or the direction into the primal space and what are referred to as proximal algorithms, which are themselves similar to mirror descent algorithms. [17] provides a convergence analysis of these algorithms relying on a strong convexity hypothesis. In order to establish a similar convergence result under the hypothesis of a p-convex space, we briefly summarize how these algorithms are connected.
The classical gradient descent update \(x_{k+1} = x_k - \alpha_k \nabla f_k\) can be retrieved by minimizing
\[ x_{k+1} = \underset{x \in X}{\arg\min}\ \left\{ \langle \nabla f_k, x - x_k \rangle + \frac{1}{2\alpha_k} \Vert x - x_k \Vert_2^2 \right\}, \]
looking for a compromise between minimizing the local approximation of f around \(x_k\) and not moving too far from this point. Mirror descent algorithms consist in using a different distance, instead of the \(L_2\)-norm, to better capture the geometry of the underlying space. Taking the distance \(D(x,x_k) = \frac{1}{p} \Vert x - x_k \Vert_p^p\) leads to the Banach descent with transport of the gradient in the primal space. [17] has highlighted the benefits of using the Bregman distance \(\Delta_h\) [6] associated with a strictly convex and differentiable function h: \(\Delta_h(x,y) = h(y) - h(x) - \langle \nabla h(x), y-x\rangle\). The mirror descent update can then be written as:
\[ x_{k+1} = \nabla h^*\big(\nabla h(x_k) - \alpha_k \nabla f_k\big), \qquad (28) \]
with \(h^*\) the Fenchel conjugate of h defined by \(h^*(y) = \underset{x \in X}{\sup }\ \left( \langle x,y \rangle - h(x) \right) \).
The Banach descent with transport of the iterations in the dual space (9) is equivalent to (28) if \(h = \frac{1}{p}\Vert \cdot \Vert _p^p\). We now adapt the result from [17] to the case of a p-convex space.
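As a brief worked illustration of this equivalence (a sketch relying on standard duality-map identities rather than on a formula displayed above): for \(h = \frac{1}{p}\Vert \cdot \Vert_p^p\) one has \(\nabla h = J_p\), while \(h^* = \frac{1}{q}\Vert \cdot \Vert_q^q\) on \(X^*\) with \(\nabla h^* = J_q\), where \(\frac{1}{p} + \frac{1}{q} = 1\). The update (28) then reads
\[ x_{k+1} = \nabla h^*\big(\nabla h(x_k) - \alpha_k \nabla f_k\big) = J_q\big(J_p(x_k) - \alpha_k \nabla f_k\big), \]
that is, the gradient step is performed on the dual iterate \(x_k^* = J_p(x_k)\) and the result is mapped back to \(X\) by \(J_q\), which is precisely the dual-space descent (9).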
Theorem 5
Let \(f: X \rightarrow \mathbb{R}\) be convex and differentiable, and let X be p-convex as in Definition 3. The gradient descent in dual space, equivalent to a mirror descent with \(h = \frac{1}{p}\Vert \cdot \Vert _p^p\), with a constant step \(\alpha\) and a starting point \(x_0\), produces iterates \(x_1,\ldots, x_N\) such that for all \(\bar{x} \in X\):
which holds in particular when \(\bar{x}\) is a minimizer of f.
Proof
Let \(k \in \mathbb{N}\) and \(\Phi_k = \frac{1}{\alpha} \Delta_h(x_k,\bar{x})\). By definition of \(\Delta_h\), and since \(\nabla h(x) = J_p(x)\):
Using the definition of a p-convex space and substituting \(x\) with \(x_k\) and \(y\) with \(x_k - x_{k+1}\):
Whence
We then have:
Thanks to the Cauchy-Schwarz inequality in Banach spaces, \(\langle \nabla f(x_k), x_k - x_{k+1} \rangle \le \Vert \nabla f_k\Vert _{X^*}\Vert x_k - x_{k+1}\Vert _X \). We now use Young's inequality (\(ab \le \frac{1}{q}a^q + \frac{1}{p}b^p\), for all \(a,b \ge 0\)) to write
Thus
Summing over k we get:
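To complement Theorem 5, here is a minimal numerical sketch of the constant-step gradient descent in dual space, written for a finite-dimensional \(\ell_p\) setting with component-wise duality maps; the toy quadratic objective, the step size, and the iteration count are assumptions made purely for illustration and are not taken from the experiments of the paper.

```python
import numpy as np

def duality_map(x, r):
    # Component-wise duality map of l^r with gauge t^(r-1): (J_r(x))_i = |x_i|^(r-1) * sign(x_i)
    return np.sign(x) * np.abs(x) ** (r - 1)

def gd_dual_space(grad_f, x0, p, alpha, n_iter):
    """Constant-step gradient descent with iterates transported in the dual space,
    equivalent to mirror descent with h = (1/p)||.||_p^p (cf. Theorem 5):
        x*_k = J_p(x_k),  x*_{k+1} = x*_k - alpha * grad f(x_k),  x_{k+1} = J_q(x*_{k+1})."""
    q = p / (p - 1.0)                                    # conjugate exponent, 1/p + 1/q = 1
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x_star = duality_map(x, p) - alpha * grad_f(x)   # gradient step taken in X*
        x = duality_map(x_star, q)                       # map back to the primal space X
    return x

# Toy usage on a smooth convex function (illustrative only):
# f(x) = 0.5 * ||A x - b||_2^2, whose gradient is A^T (A x - b).
A = np.array([[3.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A.T @ (A @ x - b)
x_N = gd_dual_space(grad_f, x0=np.zeros(2), p=1.5, alpha=0.05, n_iter=500)
```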