Abstract
In this paper we analyze several new methods for solving optimization problems whose objective function is the sum of two terms: one is smooth and given by a black-box oracle, and the other is a simple general convex function with known structure. Despite the fact that the sum need not have good properties, such problems, both in convex and nonconvex cases, can be solved with the efficiency typical of the smooth part of the objective. For convex problems of this structure, we consider primal and dual variants of the gradient method (with convergence rate \(O\left(\frac{1}{k}\right)\)) and an accelerated multistep version with convergence rate \(O\left(\frac{1}{k^2}\right)\), where \(k\) is the iteration counter. For nonconvex problems with this structure, we prove convergence to a point from which there is no descent direction. In contrast, we show that for general nonsmooth, nonconvex problems, even deciding whether a descent direction exists at a given point is NP-hard. For all methods, we suggest efficient “line search” procedures and show that the additional computational work needed to estimate the unknown problem-class parameters can only multiply the complexity of each iteration by a small constant factor. We also present the results of preliminary computational experiments, which confirm the superiority of the accelerated scheme.
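To make the composite setting concrete, the sketch below minimizes \(\frac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1\), an \(l_1\)-regularized least-squares problem of exactly this sum-of-two-terms form, by a basic composite gradient step and an accelerated variant. It is a minimal illustration only: the Lipschitz constant \(L\) of the smooth part is assumed known (the paper's methods estimate it by a line search), and the FISTA-style extrapolation shown is a standard stand-in rather than the paper's exact accelerated scheme; all function names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1; closed form thanks to the
    # known structure of the nonsmooth term.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def composite_gradient(A, b, lam, L, iters=500):
    # Basic composite gradient step for min 0.5||Ax - b||^2 + lam * ||x||_1:
    #   x+ = prox_{psi/L}(x - grad f(x) / L),
    # which attains the O(1/k) rate in the convex case.
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)            # gradient of the smooth part
        x = soft_threshold(x - g / L, lam / L)
    return x

def accelerated_composite_gradient(A, b, lam, L, iters=500):
    # Multistep variant: extrapolation between successive iterates
    # (FISTA-style, used here as an illustrative stand-in) yields
    # the O(1/k^2) rate quoted in the abstract.
    x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0
    for _ in range(iters):
        g = A.T @ (A @ y - b)
        x_new = soft_threshold(y - g / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((80, 200))
    x_true = np.zeros(200); x_true[:5] = rng.standard_normal(5)
    b = A @ x_true
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f
    x = accelerated_composite_gradient(A, b, lam=0.1, L=L)
    print("nonzeros recovered:", np.sum(np.abs(x) > 1e-3))
```

On such instances the accelerated loop typically needs markedly fewer iterations for the same accuracy, consistent with the \(O\left(\frac{1}{k^2}\right)\) versus \(O\left(\frac{1}{k}\right)\) rates above.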
Notes
However, this idea has a much longer history. To the best of our knowledge, this technique was originally developed for the general framework in [4].
In this paper, for the sake of simplicity, we restrict ourselves to Euclidean norms. The extension to the general case can be done in a standard way using Bregman distances (see, e.g., [10]).
References
Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1998)
Claerbout, J., Muir, F.: Robust modeling with erratic data. Geophysics 38, 826–844 (1973)
Figueiredo, M., Nowak, R., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. Submitted for publication
Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain nonconvex problems. Int. J. Sys. Sci. 12(8), 989–1000 (1981)
Kim, S.-J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: A method for large-scale \(l_1\)-regularized least-squares problems with applications in signal processing and statistics. Research report, Stanford University (2007)
Levy, S., Fullagar, P.: Reconstruction of a sparse spike train from a portion of its spectrum and application to high-resolution deconvolution. Geophysics 46, 1235–1243 (1981)
Miller, A.: Subset Selection in Regression. Chapman and Hall, London (2002)
Nemirovsky, A., Yudin, D.: Informational Complexity and Efficient Methods for Solution of Convex Extremal Problems. Wiley, New York (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer, Boston (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. (A) 103(1), 127–152 (2005)
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper #2007/76, CORE (2007)
Nesterov, Y.: Rounding of convex sets and efficient gradient methods for linear programming problems. Optim. Methods Softw. 23(1), 109–135 (2008)
Nesterov, Y.: Accelerating the cubic regularization of Newton’s method on convex problems. Math. Program. 112(1), 159–181 (2008)
Nesterov, Y., Nemirovskii, A.: Interior Point Polynomial Methods in Convex Programming: Theory and Applications. SIAM, Philadelphia (1994)
Ortega, J., Rheinboldt, W.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)
Santosa, F., Symes, W.: Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Stat. Comput. 7, 1307–1330 (1986)
Taylor, H., Banks, S., McCoy, J.: Deconvolution with the \(l_1\) norm. Geophysics 44, 39–52 (1979)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Tropp, J.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52(3), 1030–1051 (2006)
Wright, S.J.: Solving \(l_{1}\)-Regularized Regression Problems. Talk at International Conference “Combinatorics and Optimization”, Waterloo (June 2007)
Acknowledgments
The author would like to thank M. Overton, Y. Xia, and anonymous referees for numerous useful suggestions.
Additional information
Dedicated to Claude Lemaréchal on the Occasion of his 65th Birthday.
The author acknowledges the support from Office of Naval Research grant # N000140811104: Efficiently Computable Compressed Sensing.
About this article
Cite this article
Nesterov, Y. Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013). https://doi.org/10.1007/s10107-012-0629-5
Keywords
- Local optimization
- Convex optimization
- Nonsmooth optimization
- Complexity theory
- Black-box model
- Optimal methods
- Structural optimization
- \(l_1\)-Regularization