Stochastic Steffensen method

Zhao, Minda; Lai, Zehua; Lim, Lek-Heng

doi:10.1007/s10589-024-00583-7

Abstract

Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes: the Steffensen method avoids second derivatives and is still quadratically convergent like Newton's method. By incorporating a specific step size we can even push its convergence order beyond quadratic to \(1+\sqrt{2} \approx 2.414\). While such high convergence orders are pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requiring no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method (note that this is not true for SGD or SLBFGS), and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.
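
To make the univariate claim concrete, the following is a minimal sketch of the classical deterministic Steffensen iteration applied to the stationarity condition \(f'(x) = 0\). It is not the stochastic method of the article, and it omits the step size that yields order \(1+\sqrt{2}\). The function name `steffensen_minimize`, its parameters, and the test objective \(f(x) = e^x - 2x\) are assumptions chosen only for illustration.

```python
import math

def steffensen_minimize(grad, x0, tol=1e-12, max_iter=50):
    """Classical Steffensen iteration for solving grad(x) = 0 (hypothetical helper).

    Uses only first-derivative evaluations: the second derivative in
    Newton's method is replaced by the divided difference
    (grad(x + g) - g) / g, where g = grad(x).
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        denom = grad(x + g) - g
        if denom == 0.0:
            # Divided difference is unusable; stop rather than divide by zero.
            break
        x = x - g * g / denom
    return x

# Illustrative objective (an assumption): f(x) = exp(x) - 2x,
# so f'(x) = exp(x) - 2 and the minimizer is x = ln 2.
x_star = steffensen_minimize(lambda x: math.exp(x) - 2.0, x0=0.5)
print(x_star, math.log(2.0))
```

Each iteration costs two evaluations of the first derivative and none of the second, which is the sense in which the method is first-order.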

Data availability

We do not analyze or generate any datasets. The numerical experiments in Sect. 5 rely on standard datasets in LIBSVM, which are publicly available from https://github.com/cjlin1/libsvm. The authors declare no conflict of interest.

Notes

SSM and SSBB are adaptive methods, and we restrict our comparison here to other adaptive methods like SVRG–BB and SLBFGS as opposed to nonadaptive methods like the ones in [17, 18].
https://github.com/cjlin1/libsvm.

References

1. Steffensen, J.F.: Remarks on iteration. Skand. Aktuarietidskr. 1, 64–72 (1933)
2. Steffensen, J.F.: Further remarks on iteration. Skand. Aktuarietidskr. 28, 44–55 (1945)
3. Amat, S., Ezquerro, J.A., Hernández-Verón, M.A.: On a Steffensen-like method for solving nonlinear equations. Calcolo 53(2), 171–188 (2016)
4. Ezquerro, J.A., Hernández-Verón, M.A., Rubio, M.J., Velasco, A.I.: An hybrid method that improves the accessibility of Steffensen’s method. Numer. Algorithms 66(2), 241–267 (2014)
5. Henrici, P.: Elements of Numerical Analysis. John Wiley, New York (1964)
6. Huang, H.Y.: Unified approach to quadratically convergent algorithms for function minimization. J. Optim. Theory Appl. 5, 405–423 (1970)
7. Johnson, L.W., Scholz, D.R.: On Steffensen’s method. SIAM J. Numer. Anal. 5, 296–302 (1968)
8. Nedzhibov, G.H.: An approach to accelerate iterative methods for solving nonlinear operator equations. In: Applications of Mathematics in Engineering and Economics (AMEE’11). AIP Conf. Proc., vol. 1410, pp. 76–82. Amer. Inst. Phys., Melville (2011)
9. Nievergelt, Y.: Aitken’s and Steffensen’s accelerations in several variables. Numer. Math. 59(3), 295–310 (1991)
10. Nievergelt, Y.: The condition of Steffensen’s acceleration in several variables. J. Comput. Appl. Math. 58(3), 291–305 (1995)
11. Noda, T.: The Aitken-Steffensen method in the solution of simultaneous nonlinear equations. Sūgaku 33(4), 369–372 (1981)
12. Noda, T.: The Aitken-Steffensen method in the solution of simultaneous nonlinear equations. II. Sūgaku 38(1), 83–85 (1986)
13. Noda, T.: The Aitken-Steffensen method in the solution of simultaneous nonlinear equations. III. Proc. Jpn. Acad. Ser. A Math. Sci. 62(5), 174–177 (1986)
14. Noda, T.: The Aitken-Steffensen formula for systems of nonlinear equations. IV. Proc. Jpn. Acad. Ser. A Math. Sci. 66(8), 260–263 (1990)
15. Noda, T.: The Aitken-Steffensen formula for systems of nonlinear equations. V. Proc. Jpn. Acad. Ser. A Math. Sci. 68(2), 37–40 (1992)
16. Gill, P.E., Murray, W., Wright, M.H.: Numerical linear algebra and optimization. In: Classics in Applied Mathematics, vol. 83. Society for Industrial and Applied Mathematics, Philadelphia (2021)
17. Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Hatami, H., McKenzie, P., King, V. (eds.) STOC’17: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. Annual ACM Symposium on Theory of Computing, pp. 1200–1205 (2017)
18. Woodworth, B., Srebro, N.: Tight complexity bounds for optimizing composite objectives. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems (NIPS 2016). Advances in Neural Information Processing Systems, vol. 29 (2016)
19. Brezinski, C., Redivo-Zaglia, M.: Extrapolation and Rational Approximation–the Works of the Main Contributors. Springer, Cham (2020)
20. Householder, A.S.: The Numerical Treatment of a Single Nonlinear Equation. International Series in Pure and Applied Mathematics, McGraw-Hill, New York (1970)
21. Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. A 57(6), 355–357 (1937)
22. Kaczmarz, S.: Approximate solution of systems of linear equations. Int. J. Control 57(6), 1269–1271 (1993)
23. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15(2), 262–278 (2009)
24. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
25. Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems 25, pp. 2672–2680 (2012)
26. Defazio, A., Bach, F.R., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems 27, pp. 1646–1654 (2014)
27. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems 26, pp. 315–323 (2013)
28. Moritz, P., Nishihara, R., Jordan, M.I.: A linearly-convergent stochastic L-BFGS algorithm. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. JMLR Workshop and Conference Proceedings, vol. 51, pp. 249–258 (2016)
29. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)
30. Zhao, R., Haskell, W.B., Tan, V.Y.: Stochastic L-BFGS: improved convergence rates and practical acceleration strategies. IEEE Trans. Signal Process. 66(5), 1155–1169 (2018)
31. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
32. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR (2015)
33. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)
34. Poljak, B.T.: Some methods of speeding up the convergence of iterative methods. Ž. Vyčisl. Mat i Mat. Fiz. 4, 791–803 (1964)
35. Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Netw. 12(1), 145–151 (1999)
36. Reddi, S.J., Kale, S., Kumar, S.: On the convergence of Adam and beyond. arXiv:1904.09237 (2019)
37. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)
38. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms. II. The new algorithm. J. Inst. Math. Appl. 6, 222–231 (1970)
39. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)
40. Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24, 23–26 (1970)
41. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24, 647–656 (1970)
42. Potra, F.A.: On an iterative algorithm of order \(1.839\cdots \) for solving nonlinear operator equations. Numer. Funct. Anal. Optim. 7(1), 75–106 (1984/85)
43. Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program. 155(1–2, Ser. A), 549–573 (2016)
44. Babanezhad, R., Ahmed, M.O., Virani, A., Schmidt, M., Konečný, J., Sallinen, S.: Stop wasting my gradients: practical SVRG. In: Advances in Neural Information Processing Systems 28, pp. 2251–2259 (2015)
45. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)
46. Nitanda, A.: Accelerated stochastic gradient descent for minimizing finite sums. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. JMLR Workshop and Conference Proceedings, vol. 51, pp. 195–203 (2016)
47. Tan, C., Ma, S., Dai, Y., Qian, Y.: Barzilai-Borwein step size for stochastic gradient descent. In: Advances in Neural Information Processing Systems 29, pp. 685–693 (2016)
48. Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics, Princeton University Press, Princeton (1997)

Acknowledgements

This work is partially supported by DARPA HR00112190040, NSF DMS-1854831, NSF ECCS-2216912, ONR N000142312863, and the Eckhardt Faculty Fund. We thank Nati Srebro for his exceptionally pertinent pointers and the two anonymous referees for their helpful comments. LHL thanks Junjie Yue for helpful discussions.

Author information

Minda Zhao, Zehua Lai and Lek-Heng Lim contributed equally to this work.

Authors and Affiliations

School of Industrial and Systems Engineering, Georgia Institute of Technology, 755 Ferst Drive, Atlanta, GA, 30332, USA
Minda Zhao
Department of Mathematics, University of Texas, 2515 Speedway, Austin, TX, 78712, USA
Zehua Lai
Computational and Applied Mathematics Initiative, University of Chicago, 5747 South Ellis Avenue, Chicago, IL, 60637, USA
Lek-Heng Lim

Corresponding author

Correspondence to Lek-Heng Lim.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, M., Lai, Z. & Lim, LH. Stochastic Steffensen method. Comput Optim Appl 89, 1–32 (2024). https://doi.org/10.1007/s10589-024-00583-7

Received: 15 June 2023
Accepted: 12 May 2024
Published: 07 June 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s10589-024-00583-7
