Abstract
Training classical (plain) deep neural networks (DNNs) with many layers is problematic due to optimization difficulties. Interestingly, skip connections of various forms (e.g. those that sum or concatenate hidden representations or layer outputs) have been shown to allow the successful training of very deep DNNs. Although there is ongoing theoretical work on understanding very deep DNNs that sum the outputs of different layers (e.g. residual networks), to the best of our knowledge no work has studied why DNNs that concatenate the outputs of different layers (e.g. Inception, FractalNet and DenseNet) work. In this paper, we therefore present the first theoretical analysis of very deep DNNs with concatenated hidden representations, based on a general framework that can be extended to specific cases. Our results reveal that DNNs with concatenated hidden representations circumvent the singularity of hidden representations, which is catastrophic for optimization. To substantiate the theoretical results, extensive experiments are reported on standard datasets such as MNIST and CIFAR-10.
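To make the two forms of skip connection concrete, the sketch below contrasts a summation-based (residual-style) block with a concatenation-based (DenseNet-style) block. This is a minimal PyTorch illustration, not the authors' implementation; the module names, layer widths, and growth size are assumptions chosen only for the example.

```python
# Minimal sketch (assumed example, not the paper's code) of the two skip-connection
# forms discussed in the abstract: summation vs. concatenation of layer outputs.
import torch
import torch.nn as nn


class SumSkipBlock(nn.Module):
    """Residual-style block: the input is added to the layer output."""

    def __init__(self, width: int):
        super().__init__()
        self.fc = nn.Linear(width, width)
        self.act = nn.ReLU()

    def forward(self, x):
        # Summation skip: output keeps the same width as the input.
        return self.act(self.fc(x)) + x


class ConcatSkipBlock(nn.Module):
    """DenseNet-style block: the input is concatenated with the layer output."""

    def __init__(self, in_width: int, growth: int):
        super().__init__()
        self.fc = nn.Linear(in_width, growth)
        self.act = nn.ReLU()

    def forward(self, x):
        # Concatenation skip: the hidden representation grows by `growth`
        # features, so the earlier representation is preserved verbatim.
        return torch.cat([x, self.act(self.fc(x))], dim=1)


if __name__ == "__main__":
    x = torch.randn(8, 32)
    print(SumSkipBlock(32)(x).shape)         # torch.Size([8, 32])
    print(ConcatSkipBlock(32, 16)(x).shape)  # torch.Size([8, 48])
```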
Acknowledgments
This work was funded by the National Research Fund (FNR), Luxembourg, under the project reference R-AGR-0424-05-D/Björn Ottersten and CPPP17/IS/11643091/IDform/Aouada.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Oyedotun, O.K., Aouada, D. (2020). Why Do Deep Neural Networks with Skip Connections and Concatenated Hidden Representations Work? In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12534. Springer, Cham. https://doi.org/10.1007/978-3-030-63836-8_32
DOI: https://doi.org/10.1007/978-3-030-63836-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63835-1
Online ISBN: 978-3-030-63836-8
eBook Packages: Computer Science (R0)