Why Do Deep Neural Networks with Skip Connections and Concatenated Hidden Representations Work?

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12534)

Included in the following conference series: International Conference on Neural Information Processing (ICONIP)

Abstract

Training classical (vanilla) deep neural networks (DNNs) with many layers is difficult because of optimization problems. Interestingly, skip connections of various forms (e.g. those that sum or concatenate hidden representations or layer outputs) have been shown to allow the successful training of very deep networks. Although there is ongoing theoretical work on very deep networks that sum the outputs of different layers (e.g. the residual network), to the best of our knowledge none has studied why DNNs that concatenate the outputs of different layers (e.g. Inception, FractalNet and DenseNet) work. We therefore present in this paper the first theoretical analysis of very deep networks with concatenated hidden representations, based on a general framework that can be extended to specific cases. Our results reveal that DNNs with concatenated hidden representations circumvent the singularity of hidden representations, which is catastrophic for optimization. To substantiate the theoretical results, extensive experiments are reported on standard datasets such as MNIST and CIFAR-10.
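The abstract's central claim, that concatenating hidden representations lets a network sidestep singular (rank-deficient) hidden representations, can be illustrated with a small linear-algebra sketch. The NumPy snippet below is our own illustration under simplifying assumptions (linear layers and a deliberately singular weight matrix), not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 100))   # 8-dimensional inputs, 100 samples; rank(X) = 8
W = rng.standard_normal((8, 8))
W[-1] = W[0]                        # duplicate a row so W is singular (rank 7)

# Plain (vanilla) layer: multiplying by the singular W collapses one direction
# of the representation, and no later layer can recover it.
H_plain = W @ X
print(np.linalg.matrix_rank(H_plain))        # 7: a direction is lost

# Concatenation-style skip connection (DenseNet/FractalNet flavour): the layer
# output is stacked with the layer input, so the hidden representation still
# spans the full 8-dimensional input space.
H_cat = np.concatenate([W @ X, X], axis=0)   # shape (16, 100)
print(np.linalg.matrix_rank(H_cat))          # 8: no direction is lost
```

In the plain network the lost direction is gone for good, whereas the concatenated representation always carries a full-rank copy of its input forward, which is the failure mode the paper's analysis targets.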

References

1. Lee, C.-Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial Intelligence and Statistics, pp. 562–570 (2015)
2. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
3. Eldan, R., Shamir, O.: The power of depth for feedforward neural networks. In: Conference on Learning Theory, pp. 907–940 (2016)
4. Safran, I., Shamir, O.: Depth-width tradeoffs in approximating natural functions with neural networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2979–2987. JMLR.org (2017)
5. Oyedotun, O.K., El Rahman Shabayek, A., Aouada, D., Ottersten, B.: Highway network block with gates constraints for training very deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1658–1667 (2018)
6. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Advances in Neural Information Processing Systems, pp. 2377–2385 (2015)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
8. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
9. Oyedotun, O.K., Aouada, D., Ottersten, B., et al.: Training very deep networks via residual learning with stochastic input shortcut connections. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds.) International Conference on Neural Information Processing, vol. 10635, pp. 23–33. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70096-0_3
10. Veit, A., Wilber, M.J., Belongie, S.: Residual networks behave like ensembles of relatively shallow networks. In: Advances in Neural Information Processing Systems, pp. 550–558 (2016)
11. Balduzzi, D., Frean, M., Leary, L., Lewis, J.P., Ma, K.W.D., McWilliams, B.: The shattered gradients problem: if resnets are the answer, then what is the question? In: International Conference on Machine Learning, pp. 342–350 (2017)
12. Greff, K., Srivastava, R.K., Schmidhuber, J.: Highway and residual networks learn unrolled iterative estimation. In: International Conference on Learning Representations (2017)
13. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
14. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
15. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
16. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
17. Jastrzebski, S., Arpit, D., Ballas, N., Verma, V., Che, T., Bengio, Y.: Residual connections encourage iterative inference. In: International Conference on Learning Representations (2018)
18. Chang, B., Meng, L., Haber, E., Tung, F., Begert, D.: Multi-level residual networks from dynamical systems view. In: International Conference on Learning Representations (2018)
19. Zhou, Y., Liang, Y.: Critical points of neural networks: analytical forms and landscape properties. In: International Conference on Learning Representations (2017)
20. Sonoda, S., Murata, N.: Transport analysis of infinitely deep neural network. J. Mach. Learn. Res. 20(1), 31–82 (2019)
21. Nguyen, Q., Hein, M.: Optimization landscape and expressivity of deep CNNs. In: International Conference on Machine Learning, pp. 3730–3739 (2018)
22. Laurent, T., von Brecht, J.: Deep linear networks with arbitrary loss: all local minima are global. In: International Conference on Machine Learning, pp. 2902–2907 (2018)
23. Kawaguchi, K.: Deep learning without poor local minima. In: Advances in Neural Information Processing Systems, pp. 586–594 (2016)
24. Saxe, A.M., McClelland, J.L., Ganguli, S.: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In: International Conference on Learning Representations (2014)
25. Neubauer, A.: A new gradient method for ill-posed problems. Numer. Funct. Anal. Optim. 39(6), 737–762 (2018)
26. Neubauer, A., Scherzer, O.: A convergence rate result for a steepest descent method and a minimal error method for the solution of nonlinear ill-posed problems. Zeitschrift für Analysis und ihre Anwendungen 14(2), 369–377 (1995)
27. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
28. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)
29. Solymosi, J.: The sum of nonsingular matrices is often nonsingular. Linear Algebra Appl. 552, 159–165 (2018)
30. LeCun, Y., Cortes, C.: MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/. Accessed Oct 2019
31. Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-10, CIFAR-100 (Canadian Institute for Advanced Research). http://www.cs.toronto.edu/~kriz/cifar.html. Accessed Oct 2019

Acknowledgments

This work was funded by the National Research Fund (FNR), Luxembourg, under the project reference R-AGR-0424-05-D/Björn Ottersten and CPPP17/IS/11643091/IDform/Aouada.

Author information

Corresponding author

Correspondence to Oyebade K. Oyedotun.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Oyedotun, O.K., Aouada, D. (2020). Why Do Deep Neural Networks with Skip Connections and Concatenated Hidden Representations Work? In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12534. Springer, Cham. https://doi.org/10.1007/978-3-030-63836-8_32

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-63836-8_32

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63835-1

  • Online ISBN: 978-3-030-63836-8

  • eBook Packages: Computer Science, Computer Science (R0)
