Abstract
Training classical (plain) deep neural networks (DNNs) with many layers is problematic due to optimization difficulties. Interestingly, skip connections of various forms (e.g. those that sum or concatenate hidden representations or layer outputs) have been shown to allow the successful training of very deep DNNs. Although there is ongoing theoretical work on understanding very deep DNNs that sum the outputs of different layers (e.g. residual networks), to the best of our knowledge no work has studied why DNNs that concatenate the outputs of different layers (e.g. Inception, FractalNet and DenseNet) work. In this paper, we therefore present the first theoretical analysis of very deep DNNs with concatenated hidden representations, based on a general framework that can be extended to specific cases. Our results reveal that DNNs with concatenated hidden representations circumvent the singularity of hidden representations, which is catastrophic for optimization. To substantiate the theoretical results, extensive experiments are reported on standard datasets such as MNIST and CIFAR-10.
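To make the two forms of skip connection concrete, the sketch below contrasts a summation-based (residual-style) block with a concatenation-based (DenseNet-style) block. This is a minimal PyTorch illustration, not the authors' implementation; the module names, layer widths, and growth size are assumptions chosen only for the example.

```python
# Minimal sketch (assumed example, not the paper's code) of the two skip-connection
# forms discussed in the abstract: summation vs. concatenation of layer outputs.
import torch
import torch.nn as nn


class SumSkipBlock(nn.Module):
    """Residual-style block: the input is added to the layer output."""

    def __init__(self, width: int):
        super().__init__()
        self.fc = nn.Linear(width, width)
        self.act = nn.ReLU()

    def forward(self, x):
        # Summation skip: output keeps the same width as the input.
        return self.act(self.fc(x)) + x


class ConcatSkipBlock(nn.Module):
    """DenseNet-style block: the input is concatenated with the layer output."""

    def __init__(self, in_width: int, growth: int):
        super().__init__()
        self.fc = nn.Linear(in_width, growth)
        self.act = nn.ReLU()

    def forward(self, x):
        # Concatenation skip: the hidden representation grows by `growth`
        # features, so the earlier representation is preserved verbatim.
        return torch.cat([x, self.act(self.fc(x))], dim=1)


if __name__ == "__main__":
    x = torch.randn(8, 32)
    print(SumSkipBlock(32)(x).shape)         # torch.Size([8, 32])
    print(ConcatSkipBlock(32, 16)(x).shape)  # torch.Size([8, 48])
```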
Acknowledgments
This work was funded by the National Research Fund (FNR), Luxembourg, under the project reference R-AGR-0424-05-D/Björn Ottersten and CPPP17/IS/11643091/IDform/Aouada.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Oyedotun, O.K., Aouada, D. (2020). Why Do Deep Neural Networks with Skip Connections and Concatenated Hidden Representations Work? In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12534. Springer, Cham. https://doi.org/10.1007/978-3-030-63836-8_32
DOI: https://doi.org/10.1007/978-3-030-63836-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63835-1
Online ISBN: 978-3-030-63836-8
eBook Packages: Computer Science (R0)