Abstract
Unraveling the reasons behind the remarkable success and generalization capabilities of deep neural networks remains a formidable challenge. Recent insights from random matrix theory, specifically the spectral analysis of weight matrices in deep neural networks, offer valuable clues. A key finding is that the generalization performance of a neural network is associated with the degree of heavy-tailedness in the spectrum of its weight matrices. To capitalize on this discovery, we introduce a novel regularization technique, termed Heavy-Tailed Regularization, which explicitly promotes a heavier-tailed spectrum in the weight matrices via a regularization penalty. First, we employ the Weighted Alpha and the Stable Rank as penalty terms; both are differentiable, so their gradients can be computed directly, and we introduce two variations of the penalty function to avoid over-regularization. Second, adopting a Bayesian perspective and drawing on results from random matrix theory, we develop two further heavy-tailed regularization methods, which place a power-law prior on the global spectrum and a Fréchet prior on the largest eigenvalue, respectively. We show empirically that heavy-tailed regularization outperforms conventional regularization techniques in terms of generalization performance.
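As a concrete illustration of the first family of penalties, the sketch below adds a stable-rank term, ||W||_F^2 / ||W||_2^2, summed over the 2-D weight matrices of a network, to the task loss. This is a minimal PyTorch sketch under stated assumptions, not the paper's exact formulation: the helper names (stable_rank, heavy_tailed_penalty), the choice to sum over all 2-D weight matrices, and the coefficient reg_coef are illustrative.

import torch
import torch.nn as nn

def stable_rank(weight: torch.Tensor) -> torch.Tensor:
    # Stable rank ||W||_F^2 / ||W||_2^2; differentiable, so autograd can
    # compute its gradient directly when it is used as a penalty term.
    fro_sq = weight.pow(2).sum()
    spec = torch.linalg.matrix_norm(weight, ord=2)  # largest singular value
    return fro_sq / (spec.pow(2) + 1e-12)

def heavy_tailed_penalty(model: nn.Module) -> torch.Tensor:
    # Sum the stable ranks of all 2-D weight matrices in the model.
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for name, param in model.named_parameters():
        if param.ndim == 2 and name.endswith("weight"):
            penalty = penalty + stable_rank(param)
    return penalty

# Illustrative training step: penalizing a large stable rank concentrates the
# spectrum on its leading singular values, i.e., promotes heavier tails.
# reg_coef is a hypothetical hyperparameter, not a value from the paper.
# loss = criterion(model(x), y) + reg_coef * heavy_tailed_penalty(model)
# loss.backward(); optimizer.step()

A Weighted-Alpha penalty could be treated analogously, since it is likewise a differentiable function of the singular values of the weight matrix.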
Acknowledgements
Zeng Li's research is partially supported by NSFC (National Natural Science Foundation of China) Grants No. 12101292 and No. 12031005, and Shenzhen Fundamental Research Program Grant JCYJ20220818100602005.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xiao, X., Li, Z., Xie, C., Zhou, F. (2023). Heavy-Tailed Regularization of Weight Matrices in Deep Neural Networks. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. Lecture Notes in Computer Science, vol. 14263. Springer, Cham. https://doi.org/10.1007/978-3-031-44204-9_20
DOI: https://doi.org/10.1007/978-3-031-44204-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44203-2
Online ISBN: 978-3-031-44204-9