Abstract
Batch normalization (BN) enables us to train various deep neural networks faster. However, training accuracy degrades significantly when the input mini-batch size is small. Accuracy can be recovered by computing a global mean and variance over the whole input batch, but this requires communication across all devices in every BN layer, which greatly slows down training. To address this problem, we propose progressive batch normalization, which achieves a good balance between model accuracy and efficiency in multi-GPU training. Experimental results show that our algorithm obtains significant accuracy improvements over traditional BN without synchronizing data across GPUs, achieving up to 18.4% improvement when training DeepLab for semantic segmentation on 8 GPUs.
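To make the trade-off concrete, the following is a minimal NumPy sketch (not the authors' implementation) contrasting traditional per-GPU BN statistics with globally synchronized statistics; the array shapes, GPU count, and the summation standing in for an all-reduce are assumptions for illustration only.

```python
# Illustrative sketch (assumed setup, not the paper's method): each "GPU" holds a
# small mini-batch. Traditional BN normalizes with per-device statistics, while
# synchronized BN aggregates sums across devices; in a real multi-GPU system that
# aggregation is an all-reduce communication performed in every BN layer.
import numpy as np

rng = np.random.default_rng(0)
num_gpus, per_gpu_batch, channels = 8, 4, 16          # assumed sizes
x = [rng.normal(size=(per_gpu_batch, channels)) for _ in range(num_gpus)]

def local_bn(xi, eps=1e-5):
    # Traditional BN: statistics from the small per-GPU mini-batch only.
    mean = xi.mean(axis=0)
    var = xi.var(axis=0)
    return (xi - mean) / np.sqrt(var + eps)

def sync_bn(xs, eps=1e-5):
    # Synchronized BN: global mean/variance over all devices' samples.
    n = sum(xi.shape[0] for xi in xs)
    s = sum(xi.sum(axis=0) for xi in xs)               # stands in for all-reduce(sum)
    sq = sum((xi ** 2).sum(axis=0) for xi in xs)       # stands in for all-reduce(sum of squares)
    mean = s / n
    var = sq / n - mean ** 2
    return [(xi - mean) / np.sqrt(var + eps) for xi in xs]

local_out = [local_bn(xi) for xi in x]                 # no communication, noisy statistics
sync_out = sync_bn(x)                                  # accurate statistics, per-layer communication
```

The sketch shows only the cost the abstract refers to: synchronized statistics need a cross-device reduction in every BN layer, whereas per-GPU statistics avoid communication at the expense of accuracy when the per-GPU batch is small.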
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments. We acknowledge the support from the Tusimple HPC group and Tsinghua University.
Cite this article
Qin, L., Gong, Y., Tang, T. et al. Training Deep Nets with Progressive Batch Normalization on Multi-GPUs. Int J Parallel Prog 47, 373–387 (2019). https://doi.org/10.1007/s10766-018-0615-5