Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science

Decebal Constantin Mocanu et al.

Nat Commun. 2018 Jun 19;9(1):2383. doi: 10.1038/s41467-018-04316-3.

Abstract

Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős–Rényi random graph) of two consecutive layers of neurons into a scale-free topology, during learning. Our method replaces artificial neural networks' fully-connected layers with sparse ones before training, reducing quadratically the number of parameters, with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
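The key structural change is that each fully-connected layer is replaced, before training, by a sparse bipartite Erdős–Rényi graph whose expected number of connections grows linearly rather than quadratically with the layer sizes. A minimal NumPy sketch of such an initialization is given below; it assumes a connection probability of the form ε(n_in + n_out)/(n_in · n_out) with a sparsity hyper-parameter ε, and the function name and default value of ε are illustrative rather than taken from the authors' code.

```python
import numpy as np

def erdos_renyi_mask(n_in, n_out, epsilon=11, rng=None):
    """Boolean mask for a sparse bipartite layer (illustrative sketch).

    Each of the n_in * n_out possible connections is kept independently with
    probability epsilon * (n_in + n_out) / (n_in * n_out), so the expected
    number of weights grows linearly in n_in + n_out instead of quadratically.
    The default epsilon is only an example value, not the paper's setting.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = min(epsilon * (n_in + n_out) / (n_in * n_out), 1.0)
    return rng.random((n_in, n_out)) < p

mask = erdos_renyi_mask(784, 1000)
print(mask.sum(), "of", mask.size, "possible connections kept")
```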


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
An illustration of the SET procedure. For each sparsely connected layer, SC^k (a), of an ANN, at the end of a training epoch the fraction of weights closest to zero is removed (b). Then, new weights are added randomly in the same amount as the ones previously removed (c). Next, a new training epoch is performed (d), and the remove-and-add procedure is repeated. The process continues for a finite number of training epochs, as usual in ANN training.
Algorithm 1: SET pseudocode
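To complement the pseudocode referenced above, here is a hedged NumPy sketch of the per-epoch rewiring step described in the Fig. 1 caption: the fraction ζ of existing weights closest to zero is removed, and the same number of new connections is added at random empty positions. The function name, the weight re-initialization scale, and the default ζ are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def set_rewire(weights, mask, zeta=0.3, rng=None):
    """One SET evolution step on a sparse layer (illustrative sketch).

    weights : dense array holding the current weight values
    mask    : boolean array, True where a connection currently exists
    zeta    : fraction of existing weights (smallest magnitude) to remove
    """
    rng = np.random.default_rng() if rng is None else rng
    existing = np.flatnonzero(mask)
    n_remove = int(zeta * existing.size)

    # Remove the connections whose weights are closest to zero.
    order = np.argsort(np.abs(weights.flat[existing]))
    removed = existing[order[:n_remove]]
    mask.flat[removed] = False
    weights.flat[removed] = 0.0

    # Add the same number of new connections at random empty positions,
    # initialized with small random values (scale assumed for illustration).
    empty = np.flatnonzero(~mask)
    added = rng.choice(empty, size=n_remove, replace=False)
    mask.flat[added] = True
    weights.flat[added] = rng.normal(0.0, 0.01, size=n_remove)
    return weights, mask
```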
Fig. 2
Experiments with RBM variants on the DNA dataset. For each model studied we considered three cases for the number of contrastive divergence steps, nCD = 1 (a–c), nCD = 3 (d–f), and nCD = 10 (g–i), and three cases for the number of hidden neurons, nh = 100 (a, d, g), nh = 250 (b, e, h), and nh = 500 (c, f, i). In each panel, the x axes show the training epochs; the left y axes show the average log-probabilities computed on the test data with AIS; and the right y axes (the stacked bar on the right side of each panel) show the fraction of each model's number of weights (nW) over the summed nW of all three models. Overall, SET-RBM outperforms the other two models in most cases. It is also interesting to see that SET-RBM and RBMFixProb are much more stable and do not exhibit the over-fitting problems of the standard RBM.
Fig. 3
SET-RBM evolution towards a scale-free topology on the DNA dataset. We have considered three cases for the number of contrastive divergence steps, nCD = 1 (a–c), nCD = 3 (d–f), and nCD = 10 (g–i). Also, we considered three cases for the number of hidden neurons, nh = 100 (a, d, g), nh = 250 (b, e, h), and nh = 500 (c, f, i). In each panel, the x axes show the training epochs; the left y axes (red color) show the average log-probabilities computed for SET-RBMs on the test data with AIS; and the right y axes (cyan color) show the p-values computed between the degree distribution of the hidden neurons in SET-RBM and a power-law distribution. We may observe that for models with a high enough number of hidden neurons, the SET-RBM topology always tends to become scale-free.
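The p-values in Fig. 3 quantify how closely the hidden-neuron degree distribution matches a power law. One way such a test might be run in practice is sketched below using the third-party powerlaw package (Clauset–Shalizi–Newman style fitting) on a synthetic degree sequence; this is an assumption about methodology for illustration, not necessarily the authors' exact procedure.

```python
import numpy as np
import powerlaw  # third-party package: pip install powerlaw

# Hypothetical degree sequence of the hidden neurons, e.g. obtained by
# summing a boolean connectivity mask over the visible dimension; a
# synthetic Zipf sample stands in for real data here.
rng = np.random.default_rng(0)
degrees = rng.zipf(2.5, size=500)

# Fit a discrete power law to the degrees and compare it against an
# exponential alternative; p is the significance of the comparison.
fit = powerlaw.Fit(degrees, discrete=True)
R, p = fit.distribution_compare('power_law', 'exponential')
print(f"alpha = {fit.power_law.alpha:.2f}, LLR = {R:.2f}, p = {p:.3f}")
```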
Fig. 4
SET-RBMs connectivity patterns for the visible neurons. a On the MNIST dataset. b On the Caltech 101 16 × 16 dataset. For each dataset, we have analyzed two SET-RBM architectures, i.e. 500 and 2500 hidden neurons. The heat-map matrices are obtained by reshaping the visible neurons vector to match the size of the original input images. In all cases, it can be observed that the connectivity starts from an initial Erdős–Rényi distribution. Then, during the training process, it evolves towards organized patterns which depend on the input images.
Fig. 5
Experiments with MLP variants on three benchmark datasets. a, c, e show model performance in terms of classification accuracy (left y axes) over training epochs (x axes); the right y axes of a, c, e give the p-values computed between the degree distribution of the hidden neurons of the SET-MLP models and a power-law distribution, showing how the SET-MLP topology becomes scale-free over training epochs. b, d, f depict the number of weights of the three models on each dataset. The most striking case is the CIFAR10 dataset (c, d), where the SET-MLP model drastically outperforms the MLP model while having ~100 times fewer parameters.
Fig. 6
Model accuracy using three weight-regularization techniques on the Fashion-MNIST dataset. All models were trained with stochastic gradient descent, with the same hyper-parameters, number of hidden layers (i.e. three), and number of hidden neurons per layer (i.e. 1000). a–c use the ReLU activation function for the hidden neurons and Nesterov momentum; d–f use the ReLU activation function without Nesterov momentum; g–i use the SReLU activation function and Nesterov momentum; and j–l use the SReLU activation function without Nesterov momentum. a, d, g, j present experiments with SET-MLP; b, e, h, k with MLPFixProb; and c, f, i, l with MLP.
Fig. 7
Experiments with CNN variants on the CIFAR10 dataset. a Model performance in terms of classification accuracy (left y axes) over training epochs (x axes). b The number of weights of the three models. The convolutional layers of each model have 287,008 weights in total, while the fully connected (or sparse) layers on top have 8,413,194, 184,842, and 184,842 weights for CNN, CNNFixProb, and SET-CNN, respectively.

