Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science

Decebal Constantin Mocanu et al.

Nat Commun. 2018 Jun 19;9(1):2383. doi: 10.1038/s41467-018-04316-3.

Abstract

Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős–Rényi random graph) of two consecutive layers of neurons into a scale-free topology, during learning. Our method replaces artificial neural networks' fully-connected layers with sparse ones before training, reducing quadratically the number of parameters, with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
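The key structural change is that each fully-connected layer is replaced, before training, by a sparse bipartite Erdős–Rényi graph whose expected number of connections grows linearly rather than quadratically with the layer sizes. A minimal NumPy sketch of such an initialization is given below; it assumes a connection probability of the form ε(n_in + n_out)/(n_in · n_out) with a sparsity hyper-parameter ε, and the function name and default value of ε are illustrative rather than taken from the authors' code.

```python
import numpy as np

def erdos_renyi_mask(n_in, n_out, epsilon=11, rng=None):
    """Boolean mask for a sparse bipartite layer (illustrative sketch).

    Each of the n_in * n_out possible connections is kept independently with
    probability epsilon * (n_in + n_out) / (n_in * n_out), so the expected
    number of weights grows linearly in n_in + n_out instead of quadratically.
    The default epsilon is only an example value, not the paper's setting.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = min(epsilon * (n_in + n_out) / (n_in * n_out), 1.0)
    return rng.random((n_in, n_out)) < p

mask = erdos_renyi_mask(784, 1000)
print(mask.sum(), "of", mask.size, "possible connections kept")
```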


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
An illustration of the SET procedure. For each sparsely connected layer, SC^k (a), of an ANN, at the end of a training epoch the fraction of weights closest to zero is removed (b). Then, new weights are added randomly in the same amount as the ones previously removed (c). Next, a new training epoch is performed (d), and the remove-and-add procedure is repeated. The process continues for a finite number of training epochs, as usual in ANN training.
Algorithm 1: SET pseudocode
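To complement the pseudocode referenced above, here is a hedged NumPy sketch of the per-epoch rewiring step described in the Fig. 1 caption: the fraction ζ of existing weights closest to zero is removed, and the same number of new connections is added at random empty positions. The function name, the weight re-initialization scale, and the default ζ are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def set_rewire(weights, mask, zeta=0.3, rng=None):
    """One SET evolution step on a sparse layer (illustrative sketch).

    weights : dense array holding the current weight values
    mask    : boolean array, True where a connection currently exists
    zeta    : fraction of existing weights (smallest magnitude) to remove
    """
    rng = np.random.default_rng() if rng is None else rng
    existing = np.flatnonzero(mask)
    n_remove = int(zeta * existing.size)

    # Remove the connections whose weights are closest to zero.
    order = np.argsort(np.abs(weights.flat[existing]))
    removed = existing[order[:n_remove]]
    mask.flat[removed] = False
    weights.flat[removed] = 0.0

    # Add the same number of new connections at random empty positions,
    # initialized with small random values (scale assumed for illustration).
    empty = np.flatnonzero(~mask)
    added = rng.choice(empty, size=n_remove, replace=False)
    mask.flat[added] = True
    weights.flat[added] = rng.normal(0.0, 0.01, size=n_remove)
    return weights, mask
```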
Fig. 2
Experiments with RBM variants on the DNA dataset. For each model studied we considered three cases for the number of contrastive divergence steps, nCD = 1 (a–c), nCD = 3 (d–f), and nCD = 10 (g–i), and three cases for the number of hidden neurons, nh = 100 (a, d, g), nh = 250 (b, e, h), and nh = 500 (c, f, i). In each panel, the x axes show the training epochs; the left y axes show the average log-probabilities computed on the test data with AIS; and the right y axes (the stacked bar on the right side of each panel) show the fraction of each model's number of weights (nW) over the summed nW of all three models. Overall, SET-RBM outperforms the other two models in most cases. It is also interesting to see that SET-RBM and RBMFixProb are much more stable and do not exhibit the over-fitting problems of the standard RBM.
Fig. 3
SET-RBM evolution towards a scale-free topology on the DNA dataset. We have considered three cases for the number of contrastive divergence steps, nCD = 1 (a–c), nCD = 3 (d–f), and nCD = 10 (g–i). Also, we considered three cases for the number of hidden neurons, nh = 100 (a, d, g), nh = 250 (b, e, h), and nh = 500 (c, f, i). In each panel, the x axes show the training epochs; the left y axes (red color) show the average log-probabilities computed for SET-RBMs on the test data with AIS; and the right y axes (cyan color) show the p-values computed between the degree distribution of the hidden neurons in SET-RBM and a power-law distribution. We may observe that for models with a high enough number of hidden neurons, the SET-RBM topology always tends to become scale-free.
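The p-values in Fig. 3 quantify how closely the hidden-neuron degree distribution matches a power law. One way such a test might be run in practice is sketched below using the third-party powerlaw package (Clauset–Shalizi–Newman style fitting) on a synthetic degree sequence; this is an assumption about methodology for illustration, not necessarily the authors' exact procedure.

```python
import numpy as np
import powerlaw  # third-party package: pip install powerlaw

# Hypothetical degree sequence of the hidden neurons, e.g. obtained by
# summing a boolean connectivity mask over the visible dimension; a
# synthetic Zipf sample stands in for real data here.
rng = np.random.default_rng(0)
degrees = rng.zipf(2.5, size=500)

# Fit a discrete power law to the degrees and compare it against an
# exponential alternative; p is the significance of the comparison.
fit = powerlaw.Fit(degrees, discrete=True)
R, p = fit.distribution_compare('power_law', 'exponential')
print(f"alpha = {fit.power_law.alpha:.2f}, LLR = {R:.2f}, p = {p:.3f}")
```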
Fig. 4
SET-RBMs connectivity patterns for the visible neurons. a On the MNIST dataset. b On the Caltech 101 16 × 16 dataset. For each dataset, we have analyzed two SET-RBM architectures, i.e. 500 and 2500 hidden neurons. The heat-map matrices are obtained by reshaping the visible neurons vector to match the size of the original input images. In all cases, it can be observed that the connectivity starts from an initial Erdős–Rényi distribution. Then, during the training process, it evolves towards organized patterns which depend on the input images.
Fig. 5
Experiments with MLP variants on three benchmark datasets. a, c, e show model performance in terms of classification accuracy (left y axes) over training epochs (x axes); the right y axes of a, c, e give the p-values computed between the degree distribution of the hidden neurons of the SET-MLP models and a power-law distribution, showing how the SET-MLP topology becomes scale-free over training epochs. b, d, f depict the number of weights of the three models on each dataset. The most striking case is the CIFAR10 dataset (c, d), where the SET-MLP model drastically outperforms the MLP model while having ~100 times fewer parameters.
Fig. 6
Model accuracy using three weight-regularization techniques on the Fashion-MNIST dataset. All models were trained with stochastic gradient descent, with the same hyper-parameters, number of hidden layers (i.e. three), and number of hidden neurons per layer (i.e. 1000). a–c use the ReLU activation function for the hidden neurons and Nesterov momentum; d–f use the ReLU activation function without Nesterov momentum; g–i use the SReLU activation function and Nesterov momentum; and j–l use the SReLU activation function without Nesterov momentum. a, d, g, j present experiments with SET-MLP; b, e, h, k with MLPFixProb; and c, f, i, l with MLP.
Fig. 7
Experiments with CNN variants on the CIFAR10 dataset. a Model performance in terms of classification accuracy (left y axes) over training epochs (x axes). b The number of weights of the three models. The convolutional layers of each model have 287,008 weights in total, while the fully connected (or sparse) layers on top have 8,413,194, 184,842, and 184,842 weights for CNN, CNNFixProb, and SET-CNN, respectively.

