
Optimizing nonlinear activation function for convolutional neural networks

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

Activation functions play a critical role in the training and performance of deep convolutional neural networks (CNNs). Currently, the rectified linear unit (ReLU) is the most commonly used activation function for deep CNNs. ReLU is a piecewise linear function that outputs the input directly if it is positive and zero otherwise. In this work, we propose a novel approach that generalizes the ReLU activation function using multiple learnable slope parameters. These slope parameters are optimized for every channel, so each channel learns a more general activation function (a variant of ReLU). We name this activation the fully parametric rectified linear unit (FReLU) and train it with an alternate optimization technique that learns one set of parameters while keeping the other set frozen. Our experiments show that the method outperforms ReLU and its variant activation functions and generalizes across tasks such as image classification, object detection, and action recognition in videos. The Top-1 classification accuracy of FReLU on ImageNet improves over ReLU by 3.75% for MobileNet and \(\sim\)2% for ResNet-50. We also provide several analyses for better interpretability of the proposed activation function.
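The abstract describes FReLU as ReLU generalized by multiple learnable slope parameters optimized per channel. The exact parameterization is given only in the full text, so the following is a minimal NumPy sketch of one plausible reading: a per-channel activation with a learnable slope on each half-axis. The function name `frelu_forward` and the two-slope form are illustrative assumptions, not the paper's definitive formulation.

```python
import numpy as np

def frelu_forward(x, pos_slope, neg_slope):
    """Forward pass of a per-channel ReLU variant with learnable slopes.

    Sketch only: this two-slope form (one slope per half-axis, per
    channel) is an assumption based on the abstract, not the paper's
    exact parameterization.

    x:         activations, shape (N, C, H, W)
    pos_slope: slope applied to positive inputs, shape (C,)
    neg_slope: slope applied to negative inputs, shape (C,)
    """
    a = pos_slope.reshape(1, -1, 1, 1)  # broadcast over batch and spatial dims
    b = neg_slope.reshape(1, -1, 1, 1)
    return np.where(x > 0, a * x, b * x)

# Sanity check: with pos_slope = 1 and neg_slope = 0 the function
# reduces exactly to plain ReLU.
x = np.random.randn(2, 3, 4, 4)
out = frelu_forward(x, np.ones(3), np.zeros(3))
assert np.allclose(out, np.maximum(x, 0))
```

In training, the slope arrays would be framework parameters updated by gradient descent; under the alternating scheme the abstract mentions, one would freeze the network weights while updating the slopes, and vice versa.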



Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Munender Varshney.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Varshney, M., Singh, P. Optimizing nonlinear activation function for convolutional neural networks. SIViP 15, 1323–1330 (2021). https://doi.org/10.1007/s11760-021-01863-z


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11760-021-01863-z
