Abstract
A preconditioned gradient scheme is presented for the regularized minimization problem that arises in the approximation of given data by a shallow neural network. The preconditioner is constructed from random normal projections and is tailored to the specific structure of the regularized problem.
The convergence of the preconditioned gradient method is investigated numerically for a synthetic problem with a known local minimizer. The method is also applied to real problems from the Proben1 benchmark set.
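The abstract describes the construction only at this level of detail. As a rough illustration of the general idea, the sketch below builds a right preconditioner from a random normal projection of the (regularized) Jacobian, in the spirit of LSRN (Meng, Saunders, and Mahoney, referenced below). All names, shapes, and the sketch size s are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def lsrn_style_preconditioner(J, lam, s, rng=np.random.default_rng(0)):
    """Right preconditioner for the regularized least-squares model
        min_d ||J d - r||^2 + lam * ||d||^2,
    built from a Gaussian sketch of the stacked matrix [J; sqrt(lam) I]
    (LSRN-style; a hypothetical stand-in for the paper's construction).
    """
    m, n = J.shape
    A = np.vstack([J, np.sqrt(lam) * np.eye(n)])  # (m + n) x n regularized system
    G = rng.standard_normal((s, m + n))           # random normal projection, s >= n
    _, S, Vt = np.linalg.svd(G @ A, full_matrices=False)
    return Vt.T / S                               # N = V * Sigma^{-1}, an n x n matrix

def preconditioned_gradient_step(w, grad, N, lr):
    """One preconditioned gradient step: w <- w - lr * N N^T grad."""
    return w - lr * (N @ (N.T @ grad))
```

In this reading, J would be the Jacobian of the shallow network's output with respect to its weights at the current iterate and grad the gradient of the regularized objective; randomized-sketching theory (Halko, Martinsson, and Tropp, referenced below) suggests that s need only be a small multiple of the parameter count n for the sketch to capture the range of the regularized system with high probability.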
References
Broyden, C.G., Dennis, J.E., Jr., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. IMA J. Appl. Math. 12(3), 223–245 (1973)
Crane, R., Roosta, F.: Invexifying regularization of non-linear least-squares problems. arXiv preprint arXiv:2111.11027 (2021)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(7), 2121–2159 (2011)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Gorbunov, E., Hanzely, F., Richtárik, P.: A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent. In: International Conference on Artificial Intelligence and Statistics, pp. 680–690. PMLR (2020)
Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
Hanke-Bourgeois, M.: Grundlagen der numerischen Mathematik und des wissenschaftlichen Rechnens, 3rd edn. Vieweg + Teubner, Wiesbaden (2009). https://doi.org/10.1007/978-3-8351-9020-7
Herman, G.T., Lent, A., Hurwitz, H.: A storage-efficient algorithm for finding the regularized solution of a large, inconsistent system of equations. IMA J. Appl. Math. 25(4), 361–366 (1980)
Lange, S., Helfrich, K., Ye, Q.: Batch normalization preconditioning for neural network training. arXiv preprint arXiv:2108.01110 (2021)
Meng, X., Saunders, M.A., Mahoney, M.W.: LSRN: a parallel iterative solver for strongly over- or underdetermined systems. SIAM J. Sci. Comput. 36(2), C95–C118 (2014)
Onose, A., Mossavat, S.I., Smilde, H.J.H.: A preconditioned accelerated stochastic gradient descent algorithm. In: 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2020)
Paige, C.C., Saunders, M.A.: Algorithm 583: LSQR: sparse linear equations and least squares problems. ACM Trans. Math. Softw. 8(2), 195–209 (1982)
Prechelt, L.: Proben1: a set of neural network benchmark problems and benchmarking rules. Tech. Rep. 21/94, Fakultät für Informatik, Universität Karlsruhe (1994)
Qiao, Y., Lelieveldt, B.P., Staring, M.: An efficient preconditioner for stochastic gradient descent optimization of image registration. IEEE Trans. Med. Imaging 38(10), 2314–2325 (2019)
Vater, N., Borzì, A.: Training artificial neural networks with gradient and coarse-level correction schemes. In: Nicosia, G., et al. (eds.) International Conference on Machine Learning, Optimization, and Data Science, pp. 473–487. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-95467-3_34
Zhang, J., Fattahi, S., Zhang, R.: Preconditioned gradient descent for over-parameterized nonconvex matrix factorization. Adv. Neural Inf. Process. Syst. 34 (2021)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vater, N., Borzì, A. (2023). Preconditioned Gradient Method for Data Approximation with Shallow Neural Networks. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25890-9
Online ISBN: 978-3-031-25891-6