
Training Aware Sigmoidal Optimization

  • Conference paper
Intelligent Systems (BRACIS 2022)

Abstract

Proper optimization of deep neural networks is an open research question, since an optimal procedure for changing the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome, time-consuming trial-and-error procedures to determine hyperparameters such as learning rate decay epochs and learning rate decay rates. Although adaptive learning rate optimizers automate this process, recent studies suggest they may produce overfitting and reduce performance compared with fine-tuned learning rate schedules. Considering that the loss landscapes of deep neural networks present many more saddle points than local minima, we propose the Training Aware Sigmoidal Optimizer (TASO), which consists of a two-phase automated learning rate schedule. The first phase uses a high learning rate to quickly traverse the numerous saddle points, while the second phase uses a low learning rate to slowly approach the center of the local minimum found earlier. We compared the proposed approach with commonly used adaptive learning rate optimizers such as Adam, RMSProp, and Adagrad. Our experiments showed that TASO outperformed all competing methods in both the optimal (i.e., performing hyperparameter validation) and suboptimal (i.e., using default hyperparameters) scenarios.
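The paper's exact formulation is not reproduced on this page, but the two-phase idea described above can be sketched as a sigmoid that interpolates from a high learning rate (fast traversal of saddle-point regions) to a low one (slow convergence into the final local minimum) over the course of training. The Python sketch below is a minimal illustration under that assumption; the function name taso_like_lr and the hyperparameters lr_max, lr_min, transition, and steepness are hypothetical choices for illustration, not the authors' parameterization.

    import math

    def taso_like_lr(epoch, total_epochs, lr_max=0.1, lr_min=0.001,
                     transition=0.5, steepness=10.0):
        """Sigmoidal two-phase learning rate sketch.

        Stays near lr_max during the first phase and decays smoothly to
        lr_min for the second phase. `transition` (fraction of training
        at which the drop is centered) and `steepness` are illustrative
        hyperparameters, not the paper's exact parameterization.
        """
        progress = epoch / max(1, total_epochs)
        # Gate goes from ~1 early in training to ~0 late in training.
        gate = 1.0 / (1.0 + math.exp(steepness * (progress - transition)))
        return lr_min + (lr_max - lr_min) * gate

    # Example: sample the schedule over a 100-epoch run.
    if __name__ == "__main__":
        for epoch in (0, 25, 50, 75, 99):
            print(epoch, round(taso_like_lr(epoch, 100), 4))

With the defaults above, the learning rate stays close to 0.1 for roughly the first half of training and settles near 0.001 by the end, mirroring the high-then-low behavior the abstract describes.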


Notes

  1. https://github.com/anonymous.


Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Cleber Zanchettin.

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Macêdo, D., Dreyer, P., Ludermir, T., Zanchettin, C. (2022). Training Aware Sigmoidal Optimization. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_10


  • DOI: https://doi.org/10.1007/978-3-031-21689-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21688-6

  • Online ISBN: 978-3-031-21689-3

  • eBook Packages: Computer Science, Computer Science (R0)
