Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (Journal Article) | OSTI.GOV
U.S. Department of Energy
Office of Scientific and Technical Information

Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Journal Article · Future Generations Computer Systems

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural network workloads: the CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs; however, its importance is heavily dependent on neural network architecture. Furthermore, for weak scaling (sometimes encouraged by restricted GPU memory), NVLink is less important.
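The abstract compares systems by scaling efficiency (strong and weak) and by performance/watt. As a rough illustration of how such metrics are commonly computed, here is a minimal Python sketch. The function names and every number in it are hypothetical placeholders chosen for easy arithmetic; none of them are measurements or definitions taken from the article.

```python
def strong_scaling_efficiency(t1, tn, n):
    """Fixed total problem size: speedup (t1 / tn) divided by device count n."""
    return (t1 / tn) / n


def weak_scaling_efficiency(t1, tn):
    """Fixed work per device: ideal run time stays constant, so efficiency is t1 / tn."""
    return t1 / tn


def images_per_joule(images_per_second, watts):
    """Performance per watt: (images/s) / (J/s) gives images per joule."""
    return images_per_second / watts


# Hypothetical inputs (assumptions for illustration only, not data from the article).
strong = strong_scaling_efficiency(t1=80.0, tn=12.5, n=8)  # one device vs. eight devices
weak = weak_scaling_efficiency(t1=10.0, tn=12.5)           # per-device batch held fixed
ppw = images_per_joule(images_per_second=3000.0, watts=1500.0)

print(f"strong-scaling efficiency: {strong:.2f}")
print(f"weak-scaling efficiency:   {weak:.2f}")
print(f"images per joule:          {ppw:.2f}")
```

In strong scaling the global minibatch is split across devices, so each step moves the same gradient volume while doing less compute per device. In weak scaling each device keeps its full batch, which is one plausible reading of why the abstract finds the interconnect less important in that regime.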

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1617450
Alternate ID(s):
OSTI ID: 1778383
Report Number(s):
PNNL-SA-134513
Journal Information:
Future Generations Computer Systems, Vol. 108, Issue C; ISSN 0167-739X
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 9 works
Citation information provided by Web of Science
