Abstract
We propose a new variant of Kelley’s cutting-plane method for minimizing a nonsmooth convex Lipschitz-continuous function over the Euclidean space. We derive the method through a constructive approach and prove that it attains the optimal rate of convergence for this class of problems.
Notes
In order to avoid overly numerous special cases, we adopt the convention \(\frac{0}{0}=0\).
Note that since both problems admit a compact feasible set, attainment of both optimal values is guaranteed.
References
Auslender, A.: Numerical methods for nondifferentiable convex optimization. In: Cornet, B., Nguyen, V., Vial, J. (eds.) Nonlinear Analysis and Optimization, Mathematical Programming Studies, vol. 30, pp. 102–126. Springer, Berlin (1987). doi:10.1007/BFb0121157
Auslender, A., Teboulle, M.: Interior gradient and epsilon-subgradient descent methods for constrained convex minimization. Math. Oper. Res. 29(1), 1–26 (2004)
Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Progr. 102(3), 407–456 (2005)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. SIAM, Philadelphia (2001)
Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems. Numer. Math. 4(1), 238–252 (1962)
Cheney, E.W., Goldstein, A.A.: Newton’s method for convex programming and Tchebycheff approximation. Numer. Math. 1(1), 253–268 (1959)
de Oliveira, W., Sagastizábal, C.: Bundle methods in the XXIst century: a bird's-eye view. Optimization Online Report 4088 (2013)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Progr. Ser. A 145, 451–482 (2014)
Fan, K.: Minimax theorems. Proc. Natl. Acad. Sci. USA 39(1), 42–47 (1953)
Grone, R., Johnson, C.R., Sá, E.M., Wolkowicz, H.: Positive definite completions of partial Hermitian matrices. Linear Algebra Appl. 58, 109–124 (1984)
Kelley, J.E., Jr.: The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math. 8(4), 703–712 (1960)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. arXiv:1406.5468 (2014)
Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable minimization. Math. Progr. 46(1–3), 105–122 (1990)
Kiwiel, K.C.: Proximal level bundle methods for convex nondifferentiable optimization, saddle-point problems and variational inequalities. Math. Progr. 69(1–3), 89–109 (1995)
Kiwiel, K.C.: Efficiency of proximal bundle methods. J. Optim. Theory Appl. 104(3), 589–603 (2000)
Lemaréchal, C.: An extension of Davidon methods to non differentiable problems. In: Nondifferentiable Optimization, pp. 95–109. Springer, Berlin (1975)
Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Progr. 69(1–3), 111–147 (1995)
Lemaréchal, C., Sagastizábal, C.: Variable metric bundle methods: from conceptual to implementable forms. Math. Progr. 76(3), 393–410 (1997)
Lukšan, L., Vlček, J.: A Bundle–Newton method for nonsmooth unconstrained minimization. Math. Progr. 83(1–3), 373–391 (1998)
Mäkelä, M.: Survey of bundle methods for nonsmooth optimization. Optim. Methods Softw. 17(1), 1–29 (2002)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. A Wiley-Interscience Publication. Wiley, New York (1983) (Translated from the Russian and with a preface by E. R. Dawson, Wiley-Interscience Series in Discrete Mathematics)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)
Schramm, H., Zowe, J.: A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J. Optim. 2(1), 121–152 (1992)
Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. In: Nondifferentiable Optimization, pp. 145–173. Springer, Berlin (1975)
Acknowledgments
We thank the two referees and the associate editor for their constructive comments and useful suggestions.
This research was partially supported by the Israel Science Foundation under ISF Grant No. 998-12.
Appendix: a tight lower-complexity bound
In this appendix, we refine the proof from [22, Sect. 3.2] to obtain a new lower-complexity bound for the class of nonsmooth, convex, and Lipschitz-continuous functions, which, together with the results discussed above, forms a tight complexity result for this class of problems. More precisely, under the setting of Sect. 2.1, we show that for any first-order method, the worst-case absolute inaccuracy after N steps cannot be better than \(\frac{LR}{\sqrt{N}}\), which is exactly the bound attained by Algorithm KLM.
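In other words (our summary of the claim, stated loosely and assuming the dimension satisfies \(p\ge N\)), the combination of the upper and lower bounds can be written as the minimax identity
\[ \inf _{\text {first-order methods}} \ \sup _{\begin{array}{c} f \text { convex, } L\text {-Lipschitz},\\ \Vert x^*_f - x_1\Vert \le R \end{array}} \ \Big ( \min _{1\le i\le N} f(x_i) - f(x^*_f) \Big ) \;=\; \frac{LR}{\sqrt{N}}, \]
where the infimum runs over methods performing N oracle calls and is attained by Algorithm KLM.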
In order to simplify the presentation, and following [22, Sect. 3.2], we restrict our attention to first-order methods that generate sequences satisfying the following assumption:
Assumption 1
The sequence \(\{x_i\}\) satisfies
\[ x_i \in x_1 + \mathrm {span}\{f'(x_1),\dots ,f'(x_{i-1})\}, \quad i=2,3,\dots , \]
where \(f'(x_i)\in \partial f(x_i)\) is obtained by evaluating a first-order oracle at \(x_i\).
As noted by Nesterov [22, Page 59], this assumption is not necessary and can be avoided by some additional reasoning.
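For illustration (this example is ours, not part of the paper), Assumption 1 holds, e.g., for the classical subgradient method, since every iterate it produces is the starting point plus a linear combination of previously returned subgradients. A minimal Python sketch:

```python
import numpy as np

# Illustration (ours, not from the paper): the classical subgradient method
# x_{i+1} = x_i - t * f'(x_i) satisfies Assumption 1, since each iterate lies
# in x_1 + span{f'(x_1), ..., f'(x_{i-1})}.

def subgradient_method(oracle, x1, N, step=0.1):
    x, xs, gs = x1.copy(), [x1.copy()], []
    for _ in range(N - 1):
        _, g = oracle(x)        # first-order oracle call: returns (f(x), f'(x))
        gs.append(g)
        x = x - step * g
        xs.append(x.copy())
    return xs, gs

# A simple nonsmooth test function: f(x) = ||x||_1, with subgradient sign(x)
# (taking the value +1 at zero coordinates, a valid choice in [-1, 1]).
oracle = lambda x: (np.abs(x).sum(), np.sign(x) + (x == 0))

x1 = np.zeros(5)
xs, gs = subgradient_method(oracle, x1, N=4)

# Verify Assumption 1: x_i - x_1 lies in span{f'(x_1), ..., f'(x_{i-1})}.
for i in range(1, len(xs)):
    G = np.array(gs[:i]).T                    # columns are the past subgradients
    c, *_ = np.linalg.lstsq(G, xs[i] - x1, rcond=None)
    assert np.allclose(G @ c, xs[i] - x1)     # zero residual: in the span
print("Assumption 1 holds for all iterates")
```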
The lower-complexity result is stated as follows.
Theorem 2
For any \(L,R>0\), \(N,p\in \mathbb {N}\) with \(N\le p\), and any starting point \(x_1\in \mathbb {R}^p\), there exists a convex and Lipschitz-continuous function \(f:\mathbb {R}^p\rightarrow \mathbb {R}\) with Lipschitz constant L and a minimizer \(x^*_f\) with \(\Vert x^*_f-x_1\Vert \le R\), and a first-order oracle \(\mathcal {O}(x)= (f(x), f'(x))\), such that
\[ \min _{1\le i\le N} f(x_i) - f(x^*_f) \ge \frac{LR}{\sqrt{N}} \]
for all sequences \(x_1,\dots ,x_N\) that satisfy Assumption 1.
Proof
The proof proceeds by constructing a “worst-case” function, on which any first-order method that satisfies Assumption 1 will not be able to improve its initial objective value during the first N iterations.
Let \(f_N:\mathbb {R}^p\rightarrow \mathbb {R}\) and \(\bar{f}_N:\mathbb {R}^p\rightarrow \mathbb {R}\) be defined by
\[ f_N(x) := \max _{1\le i\le N} \langle x, e_i\rangle , \qquad \bar{f}_N(x) := L \max \Big ( f_N(x),\; \Vert x\Vert - R - \tfrac{R}{\sqrt{N}} \Big ), \]
then it is easy to verify that \(\bar{f}_N\) is Lipschitz-continuous with constant L and that
\[ \min _{x\in \mathbb {R}^p} \bar{f}_N(x) = -\frac{LR}{\sqrt{N}} \]
is attained for \(x^*\in \mathbb {R}^p\) such that
\[ \langle x^*, e_i\rangle = {\left\{ \begin{array}{ll} -\frac{R}{\sqrt{N}}, & i=1,\dots ,N, \\ 0, & \text {otherwise}, \end{array}\right. } \]
hence \(\Vert x^*\Vert = R\). We equip \(\bar{f}_N\) with the oracle \(\mathcal {O}_N(x)= (\bar{f}_N(x), \bar{f}'_N(x))\) by choosing \(\bar{f}'_N(x)\in \partial \bar{f}_N(x)\) according to:
\[ \bar{f}'_N(x) = {\left\{ \begin{array}{ll} L \frac{x}{\Vert x\Vert }, & \Vert x\Vert - R - \frac{R}{\sqrt{N}} > f_N(x), \\ L\, e_{i^*(x)}, & \text {otherwise}, \end{array}\right. } \tag{8.1} \]
where
\[ i^*(x) := \min \{ i : f_N(x)=\langle x, e_i\rangle \}. \tag{8.2} \]
We also denote
\[ \mathbb {R}^{i,p} := \mathrm {span}\{e_1,\dots ,e_i\} = \{x\in \mathbb {R}^p : \langle x, e_j\rangle = 0,\ j=i+1,\dots ,p\}, \quad i=1,\dots ,p, \]
and \(\mathbb {R}^{0,p}:=\{0\}\).
Now, let \(x_1,\dots ,x_N\) be a sequence that satisfies Assumption 1 with \(f=\bar{f}_N\) and the oracle \(\mathcal {O}_N\), where without loss of generality we assume \(x_1=0\). Then \(i^*(x_1)=1\), so \(\bar{f}'_N(x_1) = L e_1\), and we get \(x_2\in \mathrm {span}\{\bar{f}'_N(x_1)\}=\mathbb {R}^{1,p}\). Now, from \(\langle x_2,e_2\rangle =\dots =\langle x_2,e_N\rangle =0\), we get that \(i^*(x_2)\le 2\), and it follows from (8.1) and (8.2) that \(\bar{f}'_N(x_2)\in \mathbb {R}^{2,p}\). Hence, we conclude from Assumption 1 that \(x_3 \in \mathrm {span}\{\bar{f}'_N(x_1),\bar{f}'_N(x_2)\}\subseteq \mathbb {R}^{2,p}\). Continuing this argument inductively shows that \(x_i \in \mathbb {R}^{i-1,p}\) and \(\bar{f}'_N(x_i)\in \mathbb {R}^{i,p}\) for \(i=1,\dots ,N\); in particular, \(x_i \in \mathbb {R}^{N-1,p}\) for \(i=1,\dots ,N\). Finally, since for every \(x\in \mathbb {R}^{N-1,p}\) we have \(\bar{f}_N(x)\ge L\langle x,e_N\rangle =0\), we immediately get
\[ \bar{f}_N(x_i) - \bar{f}_N(x^*) \ge 0 - \Big (\!-\frac{LR}{\sqrt{N}}\Big ) = \frac{LR}{\sqrt{N}}, \quad i=1,\dots ,N, \]
which completes the proof. \(\square \)
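To complement the proof, the following self-contained Python sketch (ours; an illustration of the construction above, not code from the paper) implements \(\bar{f}_N\) and the oracle \(\mathcal {O}_N\), and checks numerically that a subgradient method started at \(x_1=0\), whose iterates satisfy Assumption 1, cannot reduce the objective below 0 during its first N oracle calls, so its accuracy is no better than \(LR/\sqrt{N}\):

```python
import numpy as np

# Numerical sketch of the worst-case construction above (illustration only):
#   f_N(x)    = max_{1<=i<=N} <x, e_i>,
#   fbar_N(x) = L * max( f_N(x), ||x|| - R - R/sqrt(N) ),
# with the resisting oracle (8.1)-(8.2) returning L*e_{i*(x)} on the f_N branch.

L, R, N, p = 1.0, 1.0, 25, 50   # requires N <= p

def oracle(x):
    fN = np.max(x[:N])
    norm_branch = np.linalg.norm(x) - R - R / np.sqrt(N)
    val = L * max(fN, norm_branch)
    if norm_branch > fN:
        g = L * x / np.linalg.norm(x)
    else:
        i_star = int(np.argmax(x[:N] == fN))  # smallest index attaining the max
        g = np.zeros(p)
        g[i_star] = L
    return val, g

# The minimum -L*R/sqrt(N) is attained at x* = -(R/sqrt(N)) * (e_1+...+e_N),
# which satisfies ||x*|| = R.
x_star = np.zeros(p)
x_star[:N] = -R / np.sqrt(N)
f_star, _ = oracle(x_star)
assert np.isclose(f_star, -L * R / np.sqrt(N))
assert np.isclose(np.linalg.norm(x_star), R)

# Run a subgradient method from x_1 = 0; its iterates satisfy Assumption 1.
x = np.zeros(p)
best = np.inf
for i in range(N):
    val, g = oracle(x)
    best = min(best, val)                     # stays at 0 for the first N calls
    x = x - (R / (L * np.sqrt(N))) * g        # a standard step-size choice

print(best - f_star >= L * R / np.sqrt(N) - 1e-12)   # True: gap >= L*R/sqrt(N)
```

Any other step sizes, or any other method whose iterates stay in the span of the previously returned subgradients, produce the same outcome here: the first N observed values are all nonnegative, exactly as the proof predicts.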
Keywords
- Nonsmooth convex optimization
- Kelley’s cutting-plane method
- Bundle and subgradient methods
- Duality
- Complexity
- Rate of convergence