Stochastic Steffensen method | Computational Optimization and Applications

Stochastic Steffensen method

Abstract

Is it possible for a first-order method, i.e., one that uses only first derivatives, to be quadratically convergent? For univariate loss functions, the answer is yes—the Steffensen method avoids second derivatives and is still quadratically convergent like Newton's method. By incorporating a specific step size we can even push its convergence order beyond quadratic to \(1+\sqrt{2} \approx 2.414\). While such high convergence orders are pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive size, as randomization invariably compromises convergence speed. We introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requiring no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method—note that this is not true for SGD or SLBFGS—and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.
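
The abstract's central idea can be made concrete with the classical univariate Steffensen iteration: applied to the derivative f' of a loss f, it replaces Newton's second derivative with a divided difference assembled from two first-derivative evaluations, yet retains local quadratic convergence near a simple root of f'. The sketch below is purely illustrative and is not the paper's Algorithms 1-4; the function name, test problem, and parameter defaults are ours.

    # Minimal illustrative sketch (not the paper's algorithms): classical
    # Steffensen iteration applied to the derivative fprime of a univariate
    # loss. Only first derivatives are evaluated; the divided difference
    # (fprime(x + g) - fprime(x)) / g stands in for the second derivative.
    import math

    def steffensen_minimize(fprime, x0, tol=1e-12, max_iter=50):
        x = x0
        for _ in range(max_iter):
            g = fprime(x)
            if abs(g) < tol:
                break
            denom = fprime(x + g) - g   # equals g times the divided-difference surrogate
            if denom == 0.0:
                break
            x -= g * g / denom          # Steffensen step: x - g / (divided difference)
        return x

    # Example: minimize f(x) = exp(x) - 2x, whose unique minimizer is log 2.
    print(steffensen_minimize(lambda x: math.exp(x) - 2.0, x0=0.5))  # approx. 0.693147

The abstract also notes that, on a quadratic objective, the proposed stochastic Steffensen methods coincide with the randomized Kaczmarz method of Strohmer and Vershynin [23]. For reference, a generic sketch of that baseline (rows of a consistent system Ax = b sampled with probability proportional to their squared norms) follows; it is likewise our own illustration under standard assumptions, not code from the paper.

    import numpy as np

    def randomized_kaczmarz(A, b, n_iter=2000, seed=0):
        # Solve a consistent linear system A x = b by randomly sampled
        # orthogonal projections onto the hyperplanes <a_i, x> = b_i.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = np.zeros(n)
        row_norms_sq = np.einsum('ij,ij->i', A, A)
        probs = row_norms_sq / row_norms_sq.sum()
        for _ in range(n_iter):
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        return x

    # Usage on a random consistent system: the reported error should be small.
    A = np.random.default_rng(1).normal(size=(200, 10))
    x_true = np.ones(10)
    print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))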

Data availability

We do not analyze or generate any datasets. The numerical experiments in Sect. 5 rely on standard datasets from LIBSVM, which are publicly available at https://github.com/cjlin1/libsvm. The authors declare no conflict of interest.

Notes

  1. SSM and SSBB are adaptive methods, so we restrict our comparison here to other adaptive methods such as SVRG–BB and SLBFGS, as opposed to nonadaptive methods like those in [17, 18].

  2. https://github.com/cjlin1/libsvm.

References

  1. Steffensen, J.F.: Remarks on iteration. Skand. Aktuarietidskr. 1, 64–72 (1933)

  2. Steffensen, J.F.: Further remarks on iteration. Skand. Aktuarietidskr. 28, 44–55 (1945)

  3. Amat, S., Ezquerro, J.A., Hernández-Verón, M.A.: On a Steffensen-like method for solving nonlinear equations. Calcolo 53(2), 171–188 (2016)

  4. Ezquerro, J.A., Hernández-Verón, M.A., Rubio, M.J., Velasco, A.I.: An hybrid method that improves the accessibility of Steffensen’s method. Numer. Algorithms 66(2), 241–267 (2014)

  5. Henrici, P.: Elements of Numerical Analysis. John Wiley, New York (1964)

  6. Huang, H.Y.: Unified approach to quadratically convergent algorithms for function minimization. J. Optim. Theory Appl. 5, 405–423 (1970)

  7. Johnson, L.W., Scholz, D.R.: On Steffensen’s method. SIAM J. Numer. Anal. 5, 296–302 (1968)

  8. Nedzhibov, G.H.: An approach to accelerate iterative methods for solving nonlinear operator equations. In: Applications of Mathematics in Engineering and Economics (AMEE’11). AIP Conf. Proc., vol. 1410, pp. 76–82. Amer. Inst. Phys., Melville (2011)

  9. Nievergelt, Y.: Aitken’s and Steffensen’s accelerations in several variables. Numer. Math. 59(3), 295–310 (1991)

  10. Nievergelt, Y.: The condition of Steffensen’s acceleration in several variables. J. Comput. Appl. Math. 58(3), 291–305 (1995)

  11. Noda, T.: The Aitken-Steffensen method in the solution of simultaneous nonlinear equations. Sūgaku 33(4), 369–372 (1981)

  12. Noda, T.: The Aitken-Steffensen method in the solution of simultaneous nonlinear equations. II. Sūgaku 38(1), 83–85 (1986)

  13. Noda, T.: The Aitken-Steffensen method in the solution of simultaneous nonlinear equations. III. Proc. Jpn. Acad. Ser. A Math. Sci. 62(5), 174–177 (1986)

  14. Noda, T.: The Aitken-Steffensen formula for systems of nonlinear equations. IV. Proc. Jpn. Acad. Ser. A Math. Sci. 66(8), 260–263 (1990)

  15. Noda, T.: The Aitken-Steffensen formula for systems of nonlinear equations. V. Proc. Jpn. Acad. Ser. A Math. Sci. 68(2), 37–40 (1992)

  16. Gill, P.E., Murray, W., Wright, M.H.: Numerical linear algebra and optimization. In: Classics in Applied Mathematics, vol. 83. Society for Industrial and Applied Mathematics, Philadelphia (2021)

  17. Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Hatami, H., McKenzie, P., King, V. (eds.) STOC’17: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. Annual ACM Symposium on Theory of Computing, pp. 1200–1205 (2017)

  18. Woodworth, B., Srebro, N.: Tight complexity bounds for optimizing composite objectives. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems (NIPS 2016). Advances in Neural Information Processing Systems, vol. 29 (2016)

  19. Brezinski, C., Redivo-Zaglia, M.: Extrapolation and Rational Approximation–the Works of the Main Contributors. Springer, Cham (2020)

  20. Householder, A.S.: The Numerical Treatment of a Single Nonlinear Equation. International Series in Pure and Applied Mathematics, McGraw-Hill, New York (1970)

  21. Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. A 57(6), 355–357 (1937)

  22. Kaczmarz, S.: Approximate solution of systems of linear equations. Int. J. Control 57(6), 1269–1271 (1993)

  23. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15(2), 262–278 (2009)

  24. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)

  25. Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems 25, pp. 2672–2680 (2012)

  26. Defazio, A., Bach, F.R., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems 27, pp. 1646–1654 (2014)

  27. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems 26, pp. 315–323 (2013)

  28. Moritz, P., Nishihara, R., Jordan, M.I.: A linearly-convergent stochastic L-BFGS algorithm. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. JMLR Workshop and Conference Proceedings, vol. 51, pp. 249–258 (2016)

  29. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)

  30. Zhao, R., Haskell, W.B., Tan, V.Y.: Stochastic L-BFGS: improved convergence rates and practical acceleration strategies. IEEE Trans. Signal Process. 66(5), 1155–1169 (2018)

  31. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)

  32. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR (2015)

  33. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)

  34. Poljak, B.T.: Some methods of speeding up the convergence of iterative methods. Ž. Vyčisl. Mat. i Mat. Fiz. 4, 791–803 (1964)

  35. Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Netw. 12(1), 145–151 (1999)

  36. Reddi, S.J., Kale, S., Kumar, S.: On the convergence of Adam and beyond. arXiv:1904.09237 (2019)

  37. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)

  38. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms. II. The new algorithm. J. Inst. Math. Appl. 6, 222–231 (1970)

  39. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)

  40. Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24, 23–26 (1970)

  41. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24, 647–656 (1970)

  42. Potra, F.A.: On an iterative algorithm of order \(1.839\cdots \) for solving nonlinear operator equations. Numer. Funct. Anal. Optim. 7(1), 75–106 (1984/85)

  43. Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program. 155(1–2, Ser. A), 549–573 (2016)

  44. Babanezhad, R., Ahmed, M.O., Virani, A., Schmidt, M., Konečný, J., Sallinen, S.: Stop wasting my gradients: practical SVRG. In: Advances in Neural Information Processing Systems 28, pp. 2251–2259 (2015)

  45. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)

  46. Nitanda, A.: Accelerated stochastic gradient descent for minimizing finite sums. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. JMLR Workshop and Conference Proceedings, vol. 51, pp. 195–203 (2016)

  47. Tan, C., Ma, S., Dai, Y., Qian, Y.: Barzilai-Borwein step size for stochastic gradient descent. In: Advances in Neural Information Processing Systems 29, pp. 685–693 (2016)

  48. Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics, Princeton University Press, Princeton (1997)

Acknowledgements

This work is partially supported by DARPA HR00112190040, NSF DMS-1854831, NSF ECCS-2216912, ONR N000142312863, and the Eckhardt Faculty Fund. We thank Nati Srebro for his exceptionally pertinent pointers and the two anonymous referees for their helpful comments. LHL thanks Junjie Yue for helpful discussions.

Author information

Corresponding author

Correspondence to Lek-Heng Lim.

About this article

Cite this article

Zhao, M., Lai, Z. & Lim, LH. Stochastic Steffensen method. Comput Optim Appl 89, 1–32 (2024). https://doi.org/10.1007/s10589-024-00583-7
