A non-linear conjugate gradient in dual space for $$L_p$$ -norm regularized non-linear least squares with application in data assimilation | Numerical Algorithms Skip to main content
Log in

A non-linear conjugate gradient in dual space for \(L_p\)-norm regularized non-linear least squares with application in data assimilation

  • Original Paper
  • Published:
Numerical Algorithms Aims and scope Submit manuscript

Abstract

Wanting to minimize a non-linear least squares function penalized with an \(L_p\)-norm, with \(p\in (1,2)\), stemming from a 4DVar formulation of a data assimilation problem, we propose a modification of the non-linear conjugate gradient by making the iterations and the step search in the topological dual space of \(L_p\). We prove the convergence of this new algorithm and look at its performance on a data assimilation setup based on the shallow-water equations where the use of such an \(L_p\)-norm regularization is justified.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Data Availability

Data are available on request.

Notes

  1. Other choices could be made for \(H_k(x_k^*)\) when \(\vert \langle \nabla f_k, J_q'(x_k^*)p_k \rangle \vert < \varepsilon \). For instance one choice that would make \(J_q'(H_k(x_k^*)) p_k\) a descent direction for f as soon as \(p_k\) is, would be to take \(H_k(x_k^*) = \varvec{1}\) the vector of all components 1.

  2. For \(p=2\) we retrieve the Zoutendijk condition of Theorem 2

  3. Reminder of the abbreviations: GD: the classical Gradient Descent; LCGDS: Linear Conjugate Gradient in Dual Space; RPCG: Restricted Preconditioned Conjugate Gradient; GDDS: Gradient Descent in Dual Space; NLCG: Non-Linear Conjugate Gradient; NLCGDS: Non-Linear Conjugate Gradient in Dual Space.

References

  1. Freitag, M.A., Nichols, N.K., Budd, C.J.: Resolution of sharp fronts in the presence of model error in variational data assimilation. Quartely journal of the royal meteorological society 139, 742–757 (2013)

    Article  Google Scholar 

  2. Gratton, S., Lawless, A.S., Nichols, N.K.: Approximate Gauss-Newton methods for nonlinear least squares problems. SIAM J. Optim. 18(1), 106–132 (2007)

    Article  MathSciNet  Google Scholar 

  3. Mascarenhas, W.F.: The divergence of the BFGS and Gauss Newton Methods. Mathematical Programming 147, 253–276 (2014)

    Article  MathSciNet  Google Scholar 

  4. Conn, A.R., Gould, N.I.M., Toint, P.: Trust-Region Methods. SIAM, Philadelphia, PA, USA (2000)

    Book  Google Scholar 

  5. Gürol, S.: Solving regularized nonlinear least-squares problem in dual space with application to variational data assimilation. PhD thesis, Toulouse INP (2013)

  6. Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.S.: Regularization Methods in Banach Spaces. Radon series on Computational and applied mathematics, De Gruyter Mouton (2012)

    Book  Google Scholar 

  7. Schöpfer, F., Louis, A.K., Schuster, T.: Nonlinear iterative methods for linear ill-posed problems in Banach spaces. Inverse problems 22, 311–329 (2006)

    Article  MathSciNet  Google Scholar 

  8. Bonesky, T., Kazimierski, K.S., Maass, P., Schöpfer, F., Schuster, T.: Minimization of Tikhonov functional in Banach spaces. Abstract and applied analysis 2008, 1–18 (2007)

    Article  Google Scholar 

  9. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (1999)

    Book  Google Scholar 

  10. Fletcher, S.: Data Assimilation for the Geosciences: From Theory to Application. Elsevier, March 10 (2017)

  11. R.Herzog, Wollner, W.: A conjugate direction method for linear systems in Banach spaces. Journal of Inverse and Ill-Posed Problems 25 (2016)

  12. Estatico, C., Gratton, S., Lenti, F., Titley-Peloquin, D.: A conjugate gradient like method for p-norm minimization in functional spaces. Numerische Mathematik 134, 895–922 (2017)

    Article  MathSciNet  Google Scholar 

  13. Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)

    Google Scholar 

  14. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21–42 (1992)

    Article  MathSciNet  Google Scholar 

  15. Bergou, E., Gratton, S., Vicente, L.: Levenberg-marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation. SIAM/ASA Journal on Uncertainty Quantification (JUQ), 4 (1). 924-951. ISSN 2166-2525 (2016)

  16. Bonino, B., Estatico, C., Lazzaretti, M.: Dual descent regularization algorithms in variable exponent Lebesgue spaces for imaging. Numer. Algor https://doi.org/10.1007/s11075-022-01458-w (2023)

  17. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167–175 (2003). https://doi.org/10.1016/S0167-6377(02)00231-6

    Article  MathSciNet  Google Scholar 

  18. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math 16, 1–3 (1966)

    Article  MathSciNet  Google Scholar 

  19. Hager, W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006)

    MathSciNet  Google Scholar 

  20. Hestenes, M.R., Stiefel, E.L.: Methods of conjugate gradients for solving linear systems. J. Reasearch Nat. Bur. Standards 49, 406–436 (1952)

    MathSciNet  Google Scholar 

  21. Gratton, S., Toint, P.L., Tshimanga, J.: Conjugate gradients versus multigrid solvers for diffusion-based correlation models in data assimilation. Quarterly Journal of the Royal Meteorological Society 139(675), 1481–1487 (2013)

    Article  Google Scholar 

  22. Gratton, S., Toint, P.L.: Approximate invariant subspaces and quasi-Newton optimization methods. Optimization Methods and Software 25(4), 507–529 (2010). https://doi.org/10.1080/10556780902992746

    Article  MathSciNet  Google Scholar 

  23. Mirouze, I., Weaver, A.: Representation of correlation functions in variational assimilation using an implicit diffusion operator. Quarterly Journal of the Royal Meteorological Society 654, 1421–1443 (2010)

    Article  Google Scholar 

  24. Daley, R.: Atmospheric Data Analysis. Cambridge University Press, Cambridge Atmospheric and Space Science Series (1991)

    Google Scholar 

  25. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press; 3rd edition, https://doi.org/10.1016/B978-0-12-374370-1.X0001-8 (2008)

  26. Asadi, N., Scott, K.A., Clausi, D.A.: Data fusion and data assimilation of ice thickness observations using a regularisation framework. Dynamic Meteorology and Oceanography, Tellus A (2019)

    Book  Google Scholar 

  27. Bernigaud, A., Gratton, S., Lenti, F., Simon, E., Sohab, O.: Lp-norm regularization in variational data assimilation. Quarterly Journal of the Royal Meteorological Society 147, 2067–2081 (2021)

    Article  Google Scholar 

  28. Hansen, P.C.: Regularization tools version 4.0 for Matlab 7.3. International Journal for Numerical Methods in Fluids 46, 189–194 (2007)

  29. Wang, Y., Navon, I.M., Wang, X., Cheng, Y.: 2D Burgers equation with large Reynolds number using POD/DEIM and calibration. International Journal for Numerical Methods in Fluids 82(12), 909–931 (2016)

    Article  MathSciNet  Google Scholar 

  30. Anzengruber, S.W.: The discrepancy principle for Tikhonov regularization in Banach spaces. PhD thesis, Johannes Kepler Universitat Linz (2011)

  31. Gratton, S., Tshimanga, J.: An observation-space formulation of variational assimilation using a restricted preconditioned conjugate gradient algorithm. Quarterly Journal of the Royal Meteorological Society 135(643), 1573–1585 (2009)

    Article  Google Scholar 

Download references

Acknowledgements

We thank Olivier Cots for his precious feedback, as well as the two anonymous referees for their helpful comments and suggestions.

Funding

This study was supported by the French national program LEFE (Les Enveloppes Fluides et Environnement).

Author information

Authors and Affiliations

Authors

Contributions

All the authors have contributed together to this manuscript.

Corresponding author

Correspondence to Antoine Bernigaud.

Ethics declarations

Ethics Approval

Not Applicable.

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Descent algorithms in dual space as mirror descent algorithms

Descent algorithms in dual space as mirror descent algorithms

[16] has pointed out the similarities between Banach algorithm that transport the iterates in the dual space or the direction in the primal space and what are referred as proximal algorithms, which are themselves similar to mirror descent algorithms. [17] provides a convergence analysis of these algorithms relying on an hypothesis of strong convexity. In order to show a similar convergence result based on the hypothesis of a p-convex space, we briefly summarize how these algorithms are connected.

The classical gradient descent update \(x_{k+1} = x_k - \alpha _k \nabla f_k\) can be retrieved by minimizing

$$\begin{aligned} \mathcal {F}(x) = f_k + \alpha _k \langle \nabla f_k,x \rangle + \frac{1}{2}\Vert x - x_k\Vert _2^2, \end{aligned}$$

looking for a compromise between minimizing the local approximation of f around \(x_k\) while not going too far from this point. Mirror descent algorithms consist in using a different distance, instead of \(L_2\)-norm, to better capture the geometry of the underlying space. Taking the distance \(D(x,x_k) = \frac{1}{p} \Vert x - x_k \Vert _p^p\) leads to the Banach descent with transport of the gradient in the primal space. [17] has highlighted the benefits of using the Bregman distance \(\Delta _h\) [6] associated with a strictly convex and differentiable function h: \(\Delta _h(x,y) = h(y) - h(x) - \langle \nabla h(x), y-x\rangle \). The mirror descent update can then be written as:

$$\begin{aligned} x_{k+1} = \nabla h^* (\nabla h - \alpha _k \nabla f_k), \end{aligned}$$
(28)

with \(h^*\) the Fenchel conjugate of f defined by \(h^*(y) = \underset{x \in X}{\sup }\ \left( \langle x,y \rangle - h(x) \right) \).

The Banach descent with transport of the iterations in the dual space (9) is equivalent to (28) if \(h = \frac{1}{p}\Vert \cdot \Vert _p^p\). We now adapt the result from [17] in the case of a p-convex space.

Theorem 5

Let \(f: X \rightarrow \mathbb R\) convex, differentiable and X being p-convex as in Definition 3. The gradient descent in dual space, equivalent to a mirror descent with \(h = \frac{1}{p}\Vert \cdot \Vert _p^p\), with a constant step \(\alpha \) and a starting point \(x_0\) produces iterations \(x_1,..., x_N\) such that for all \(\bar{x} \in X\):

$$\begin{aligned} {\sum _{k=0}^N \left( f_k - f(\bar{x}) \right) } \le \frac{1}{\alpha } \Delta _h(x_0,\bar{x}) + \underbrace{\frac{1}{q}(\frac{\alpha }{c_p})^{\frac{q}{p}} \sum _{k=0}^N \Vert \nabla f_k\Vert _q^q}_{\text {Regret term}}, \end{aligned}$$

which holds in particular when \(\bar{x}\) is the minimum of f.

Proof

Let \(k \in \mathbb N\) et \(\Phi _k = \frac{1}{\alpha } \Delta _h(x_k,\bar{x})\). By definition of \(\Delta _h\), and since \(\nabla h(x) = J_p(x)\):

$$\begin{aligned} \begin{aligned} \Phi _{k+1} - \Phi _k&= \frac{1}{\alpha } [ h(\bar{x}) - h(x_{k+1}) - \langle \underbrace{\nabla h(x_{k+1})}_{ x_{k+1}^*}, \bar{x} - x_{k+1} \rangle - h(\bar{x}) + h(x_{k}) \\&\quad + \langle \underbrace{\nabla h(x_{k})}_{x_{k}^*}, \bar{x} - x_{k} \rangle ] \\&= \frac{1}{\alpha } [ h(x_{k}) - h(x_{k+1}) - \langle x_k^* - \alpha \nabla f(x_k), \bar{x} - x_{k+1} \rangle + \langle x_{k}^*, \bar{x} - x_{k} \rangle ] \\&= \frac{1}{\alpha } [ h(x_{k}) - h(x_{k+1}) - \langle J_p(x), x_k - x_{k+1} \rangle + \alpha \langle \nabla f(x_k) , \bar{x} - x_{k+1} \rangle ] \end{aligned} \end{aligned}$$

Using the definition of p-convex space and substituting x by \(x_k\) and y by \(x_k - x_{k+1}\):

$$\begin{aligned} \frac{1}{p} \Vert x_k\Vert _p^p - \frac{1}{p} \Vert x_{k+1}\Vert _p^p - \langle J_p(x_k),x_k - x_{k+1} \rangle \le - \frac{c_p}{p} \Vert x_k - x_{k+1} \Vert _p^p. \end{aligned}$$

Whence

$$\begin{aligned} \Phi _{k+1} - \Phi _k \le \frac{1}{\alpha } [- \frac{c_p}{p} \Vert x_k - x_{k+1} \Vert _p^p + \alpha \langle \nabla f(x_k) , \bar{x} - x_{k+1} \rangle ]. \end{aligned}$$

We then have:

$$\begin{aligned} \begin{aligned} f_k \!-\! f(\bar{x}) \!+\! (\Phi _{k+1} \!-\! \Phi _k)&\le f_k - f(\bar{x}) - \frac{c_p}{\alpha p} \Vert x_k - x_{k+1} \Vert _p^p + \langle \nabla f(x_k) , \bar{x} - x_{k+1} \rangle \\&\le \underbrace{f_k - f(\bar{x}) + \langle \nabla f(x_k) , \bar{x} - x_{k} \rangle }_{\le 0 \text { because f is convex}} - \frac{c_p}{\alpha p} \Vert x_k - x_{k+1} \Vert _p^p \\&+ \langle \nabla f(x_k) , x_k - x_{k+1} \rangle \end{aligned} \end{aligned}$$

Thanks to the Cauchy inequality in Banach space, \(\langle \nabla f(x_k), x_k - x_{k+1} \rangle \le \Vert \nabla f_k\Vert _{X^*}\Vert x_k - x_{k+1}\Vert _X \). We now use the Young’s inequality (\(ab \le \frac{1}{q}a^q + \frac{1}{p}b^p\), for all \(a,b \ge 0\)) to write

$$\begin{aligned} \begin{aligned} \Vert \nabla f_k\Vert _q \Vert x_k - x_{k+1}\Vert _p&= (\frac{\alpha }{c_p})^{\frac{1}{p}} \Vert \nabla f_k\Vert _q (\frac{c_p}{\alpha })^{\frac{1}{p}}\Vert x_k - x_{k+1}\Vert _p \\&\le \frac{1}{q}(\frac{\alpha }{c_p})^{\frac{q}{p}}\Vert \nabla f_k\Vert _q^q + \frac{c_p}{\alpha p}\Vert x_k - x_{k+1}\Vert _p^p. \end{aligned} \end{aligned}$$

Thus

$$\begin{aligned} \begin{aligned} f_k - f(\bar{x}) + (\Phi _{k+1} - \Phi _k)&\le - \frac{c_p}{\alpha p} \Vert x_k - x_{k+1} \Vert _p^p + \frac{1}{q}(\frac{\alpha }{c_p})^{\frac{q}{p}}\Vert \nabla f_k\Vert _q^q + \frac{c_p}{\alpha p}\Vert x_k - x_{k+1}\Vert _p^p \\&\le \frac{1}{q}(\frac{\alpha }{c_p})^{\frac{q}{p}}\Vert \nabla f_k\Vert _q^q. \end{aligned} \end{aligned}$$

Summing over k we get:

$$\begin{aligned} {\sum _{k=0}^N \left( f_k - f(\bar{x}) \right) } \le \frac{1}{\alpha } \Delta _h(x_0,\bar{x}) + \frac{1}{q}(\frac{\alpha }{c_p})^{\frac{q}{p}} \sum _{k=0}^N \Vert \nabla f_k\Vert _q^q. \end{aligned}$$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bernigaud, A., Gratton, S. & Simon, E. A non-linear conjugate gradient in dual space for \(L_p\)-norm regularized non-linear least squares with application in data assimilation. Numer Algor 95, 471–497 (2024). https://doi.org/10.1007/s11075-023-01578-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11075-023-01578-x

Keywords

Navigation