Abstract
Motivated by the minimization of a non-linear least-squares function penalized by an \(L_p\)-norm, with \(p\in (1,2)\), arising from a 4DVar formulation of a data assimilation problem, we propose a modification of the non-linear conjugate gradient method in which the iterates and the line search are carried out in the topological dual space of \(L_p\). We prove the convergence of this new algorithm and assess its performance on a data assimilation setup based on the shallow-water equations, where the use of such an \(L_p\)-norm regularization is justified.
Data Availability
Data are available on request.
Notes
Other choices could be made for \(H_k(x_k^*)\) when \(\vert \langle \nabla f_k, J_q'(x_k^*)p_k \rangle \vert < \varepsilon \). For instance, one choice that would make \(J_q'(H_k(x_k^*)) p_k\) a descent direction for \(f\) whenever \(p_k\) is one would be to take \(H_k(x_k^*) = \mathbf{1}\), the vector with all components equal to 1.
For \(p=2\), we retrieve the Zoutendijk condition of Theorem 2.
Reminder of the abbreviations: GD: the classical Gradient Descent; LCGDS: Linear Conjugate Gradient in Dual Space; RPCG: Restricted Preconditioned Conjugate Gradient; GDDS: Gradient Descent in Dual Space; NLCG: Non-Linear Conjugate Gradient; NLCGDS: Non-Linear Conjugate Gradient in Dual Space.
References
Freitag, M.A., Nichols, N.K., Budd, C.J.: Resolution of sharp fronts in the presence of model error in variational data assimilation. Quarterly Journal of the Royal Meteorological Society 139, 742–757 (2013)
Gratton, S., Lawless, A.S., Nichols, N.K.: Approximate Gauss-Newton methods for nonlinear least squares problems. SIAM J. Optim. 18(1), 106–132 (2007)
Mascarenhas, W.F.: The divergence of the BFGS and Gauss Newton methods. Mathematical Programming 147, 253–276 (2014)
Conn, A.R., Gould, N.I.M., Toint, P.: Trust-Region Methods. SIAM, Philadelphia, PA, USA (2000)
Gürol, S.: Solving regularized nonlinear least-squares problem in dual space with application to variational data assimilation. PhD thesis, Toulouse INP (2013)
Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.S.: Regularization Methods in Banach Spaces. Radon Series on Computational and Applied Mathematics. De Gruyter, Berlin (2012)
Schöpfer, F., Louis, A.K., Schuster, T.: Nonlinear iterative methods for linear ill-posed problems in Banach spaces. Inverse Problems 22, 311–329 (2006)
Bonesky, T., Kazimierski, K.S., Maass, P., Schöpfer, F., Schuster, T.: Minimization of Tikhonov functionals in Banach spaces. Abstract and Applied Analysis 2008, 1–18 (2007)
Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (1999)
Fletcher, S.: Data Assimilation for the Geosciences: From Theory to Application. Elsevier (2017)
Herzog, R., Wollner, W.: A conjugate direction method for linear systems in Banach spaces. Journal of Inverse and Ill-Posed Problems 25 (2016)
Estatico, C., Gratton, S., Lenti, F., Titley-Peloquin, D.: A conjugate gradient like method for p-norm minimization in functional spaces. Numerische Mathematik 134, 895–922 (2017)
Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)
Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21–42 (1992)
Bergou, E., Gratton, S., Vicente, L.: Levenberg-Marquardt methods based on probabilistic gradient models and inexact subproblem solution, with application to data assimilation. SIAM/ASA Journal on Uncertainty Quantification 4(1), 924–951 (2016)
Bonino, B., Estatico, C., Lazzaretti, M.: Dual descent regularization algorithms in variable exponent Lebesgue spaces for imaging. Numerical Algorithms. https://doi.org/10.1007/s11075-022-01458-w (2023)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167–175 (2003). https://doi.org/10.1016/S0167-6377(02)00231-6
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math 16, 1–3 (1966)
Hager, W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2, 35–58 (2006)
Hestenes, M.R., Stiefel, E.L.: Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards 49, 406–436 (1952)
Gratton, S., Toint, P.L., Tshimanga, J.: Conjugate gradients versus multigrid solvers for diffusion-based correlation models in data assimilation. Quarterly Journal of the Royal Meteorological Society 139(675), 1481–1487 (2013)
Gratton, S., Toint, P.L.: Approximate invariant subspaces and quasi-Newton optimization methods. Optimization Methods and Software 25(4), 507–529 (2010). https://doi.org/10.1080/10556780902992746
Mirouze, I., Weaver, A.: Representation of correlation functions in variational assimilation using an implicit diffusion operator. Quarterly Journal of the Royal Meteorological Society 136, 1421–1443 (2010)
Daley, R.: Atmospheric Data Analysis. Cambridge University Press, Cambridge Atmospheric and Space Science Series (1991)
Mallat, S.: A Wavelet Tour of Signal Processing, 3rd edn. Academic Press (2008). https://doi.org/10.1016/B978-0-12-374370-1.X0001-8
Asadi, N., Scott, K.A., Clausi, D.A.: Data fusion and data assimilation of ice thickness observations using a regularisation framework. Tellus A: Dynamic Meteorology and Oceanography (2019)
Bernigaud, A., Gratton, S., Lenti, F., Simon, E., Sohab, O.: Lp-norm regularization in variational data assimilation. Quarterly Journal of the Royal Meteorological Society 147, 2067–2081 (2021)
Hansen, P.C.: Regularization Tools version 4.0 for Matlab 7.3. Numerical Algorithms 46, 189–194 (2007)
Wang, Y., Navon, I.M., Wang, X., Cheng, Y.: 2D Burgers equation with large Reynolds number using POD/DEIM and calibration. International Journal for Numerical Methods in Fluids 82(12), 909–931 (2016)
Anzengruber, S.W.: The discrepancy principle for Tikhonov regularization in Banach spaces. PhD thesis, Johannes Kepler Universität Linz (2011)
Gratton, S., Tshimanga, J.: An observation-space formulation of variational assimilation using a restricted preconditioned conjugate gradient algorithm. Quarterly Journal of the Royal Meteorological Society 135(643), 1573–1585 (2009)
Acknowledgements
We thank Olivier Cots for his valuable feedback, as well as the two anonymous referees for their helpful comments and suggestions.
Funding
This study was supported by the French national program LEFE (Les Enveloppes Fluides et Environnement).
Contributions
All authors contributed jointly to this manuscript.
Ethics declarations
Ethics Approval
Not Applicable.
Conflicts of interest
The authors declare that they have no conflict of interest.
Descent algorithms in dual space as mirror descent algorithms
[16] has pointed out the similarities between Banach-space algorithms that transport either the iterates into the dual space or the direction into the primal space and what are referred to as proximal algorithms, which are themselves similar to mirror descent algorithms. [17] provides a convergence analysis of these algorithms relying on a strong convexity hypothesis. In order to establish a similar convergence result under the hypothesis of a p-convex space, we briefly summarize how these algorithms are connected.
The classical gradient descent update \(x_{k+1} = x_k - \alpha_k \nabla f_k\) can be retrieved by minimizing
\[ x_{k+1} = \underset{x \in X}{\arg\min}\ \left\{ \langle \nabla f_k, x - x_k \rangle + \frac{1}{2\alpha_k} \Vert x - x_k \Vert_2^2 \right\}, \]
looking for a compromise between minimizing the local approximation of f around \(x_k\) and not moving too far from this point. Mirror descent algorithms consist in using a different distance, instead of the \(L_2\)-norm, to better capture the geometry of the underlying space. Taking the distance \(D(x,x_k) = \frac{1}{p} \Vert x - x_k \Vert_p^p\) leads to the Banach descent with transport of the gradient in the primal space. [17] has highlighted the benefits of using the Bregman distance \(\Delta_h\) [6] associated with a strictly convex and differentiable function h: \(\Delta_h(x,y) = h(y) - h(x) - \langle \nabla h(x), y-x\rangle\). The mirror descent update can then be written as:
\[ x_{k+1} = \nabla h^*\big(\nabla h(x_k) - \alpha_k \nabla f_k\big), \qquad (28) \]
with \(h^*\) the Fenchel conjugate of h defined by \(h^*(y) = \underset{x \in X}{\sup }\ \left( \langle x,y \rangle - h(x) \right) \).
The Banach descent with transport of the iterations in the dual space (9) is equivalent to (28) if \(h = \frac{1}{p}\Vert \cdot \Vert _p^p\). We now adapt the result from [17] to the case of a p-convex space.
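As a brief worked illustration of this equivalence (a sketch relying on standard duality-map identities rather than on a formula displayed above): for \(h = \frac{1}{p}\Vert \cdot \Vert_p^p\) one has \(\nabla h = J_p\), while \(h^* = \frac{1}{q}\Vert \cdot \Vert_q^q\) on \(X^*\) with \(\nabla h^* = J_q\), where \(\frac{1}{p} + \frac{1}{q} = 1\). The update (28) then reads
\[ x_{k+1} = \nabla h^*\big(\nabla h(x_k) - \alpha_k \nabla f_k\big) = J_q\big(J_p(x_k) - \alpha_k \nabla f_k\big), \]
that is, the gradient step is performed on the dual iterate \(x_k^* = J_p(x_k)\) and the result is mapped back to \(X\) by \(J_q\), which is precisely the dual-space descent (9).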
Theorem 5
Let \(f: X \rightarrow \mathbb{R}\) be convex and differentiable, and let X be p-convex as in Definition 3. The gradient descent in dual space, equivalent to a mirror descent with \(h = \frac{1}{p}\Vert \cdot \Vert _p^p\), with a constant step \(\alpha\) and a starting point \(x_0\), produces iterates \(x_1,\ldots, x_N\) such that for all \(\bar{x} \in X\):
which holds in particular when \(\bar{x}\) is a minimizer of f.
Proof
Let \(k \in \mathbb{N}\) and \(\Phi_k = \frac{1}{\alpha} \Delta_h(x_k,\bar{x})\). By definition of \(\Delta_h\), and since \(\nabla h(x) = J_p(x)\):
Using the definition of a p-convex space and substituting \(x\) with \(x_k\) and \(y\) with \(x_k - x_{k+1}\):
Whence
We then have:
Thanks to the Cauchy-Schwarz inequality in Banach spaces, \(\langle \nabla f(x_k), x_k - x_{k+1} \rangle \le \Vert \nabla f_k\Vert _{X^*}\Vert x_k - x_{k+1}\Vert _X \). We now use Young's inequality (\(ab \le \frac{1}{q}a^q + \frac{1}{p}b^p\), for all \(a,b \ge 0\)) to write
Thus
Summing over k we get:
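To complement Theorem 5, here is a minimal numerical sketch of the constant-step gradient descent in dual space, written for a finite-dimensional \(\ell_p\) setting with component-wise duality maps; the toy quadratic objective, the step size, and the iteration count are assumptions made purely for illustration and are not taken from the experiments of the paper.

```python
import numpy as np

def duality_map(x, r):
    # Component-wise duality map of l^r with gauge t^(r-1): (J_r(x))_i = |x_i|^(r-1) * sign(x_i)
    return np.sign(x) * np.abs(x) ** (r - 1)

def gd_dual_space(grad_f, x0, p, alpha, n_iter):
    """Constant-step gradient descent with iterates transported in the dual space,
    equivalent to mirror descent with h = (1/p)||.||_p^p (cf. Theorem 5):
        x*_k = J_p(x_k),  x*_{k+1} = x*_k - alpha * grad f(x_k),  x_{k+1} = J_q(x*_{k+1})."""
    q = p / (p - 1.0)                                    # conjugate exponent, 1/p + 1/q = 1
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x_star = duality_map(x, p) - alpha * grad_f(x)   # gradient step taken in X*
        x = duality_map(x_star, q)                       # map back to the primal space X
    return x

# Toy usage on a smooth convex function (illustrative only):
# f(x) = 0.5 * ||A x - b||_2^2, whose gradient is A^T (A x - b).
A = np.array([[3.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A.T @ (A @ x - b)
x_N = gd_dual_space(grad_f, x0=np.zeros(2), p=1.5, alpha=0.05, n_iter=500)
```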