PLoS Comput Biol. 2021 Aug 9;17(8):e1009202. doi: 10.1371/journal.pcbi.1009202. eCollection 2021 Aug.

Dendritic normalisation improves learning in sparsely connected artificial neural networks



Alex D Bird et al. PLoS Comput Biol.

Abstract

Artificial neural networks, taking inspiration from biological neurons, have become an invaluable tool for machine learning applications. Recent studies have developed techniques to effectively tune the connectivity of sparsely-connected artificial neural networks, which have the potential to be more computationally efficient than their fully-connected counterparts and more closely resemble the architectures of biological systems. We here present a normalisation, based on the biophysical behaviour of neuronal dendrites receiving distributed synaptic inputs, that divides the weight of an artificial neuron's afferent contacts by their number. We apply this dendritic normalisation to various sparsely-connected feedforward network architectures, as well as simple recurrent and self-organised networks with spatially extended units. The learning performance is significantly increased, providing an improvement over other widely-used normalisations in sparse networks. The results are two-fold, being both a practical advance in machine learning and an insight into how the structure of neuronal dendritic arbours may contribute to computation.
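As a concrete illustration of the abstract's central idea, the sketch below implements a forward pass through a sparsely connected layer in which each unit's summed input is divided by the number of its afferent contacts. This is a minimal NumPy sketch, not the authors' code: the function name, the binary connectivity mask, and the sigmoid activation are illustrative assumptions.

```python
import numpy as np

def dendritically_normalised_layer(x, weights, mask, bias):
    """Forward pass of a sparse layer with dendritic normalisation (sketch).

    x       : (batch, n_in) input activations
    weights : (n_in, n_out) weights; only entries where mask == 1 are synapses
    mask    : (n_in, n_out) binary connectivity mask
    bias    : (n_out,) biases
    """
    # Number of afferent contacts per unit; this is the divisor introduced
    # by the dendritic normalisation described in the abstract.
    n_afferent = np.maximum(mask.sum(axis=0), 1)     # avoid division by zero

    # Each unit's summed input is divided by its number of incoming contacts,
    # mirroring the reduced excitability of a larger dendritic tree.
    pre_activation = (x @ (weights * mask)) / n_afferent + bias
    return 1.0 / (1.0 + np.exp(-pre_activation))     # sigmoid (assumed)
```

Because the same divisor appears in the backward pass, units with many afferent contacts receive proportionally smaller gradient steps per synapse, which is the effect explored in the figures below.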


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Dendritic normalisation improves learning in sparse artificial neural networks.
A, Schematic of dendritic normalisation. A neuron receives inputs across its dendritic tree (dark grey). In order to receive new inputs, the dendritic tree must expand (light grey), lowering the intrinsic excitability of the cell through increased membrane leak and spatial extent. B, Expected impact of changing local synaptic weight on somatic voltage as a function of dendrite length and hence potential connectivity. Top: Steady-state transfer resistance (Eq 10) for somata of radii 0, 5, 10, and 15 μm. Shaded area shows one standard deviation around the mean in the 0 μm case (Eq 11). Middle: Maximum voltage response to synaptic currents with decay timescales 10, 50, and 100 ms (Eqs 14 and 16). Shaded area shows one standard deviation around the mean in the 100 ms case (Eq 15). Bottom: Total voltage response to synaptic currents with the above timescales (all averages lie on the solid line, Eq 17). Shaded areas show one standard deviation around the mean in each case (Eq 18). Intrinsic dendrite properties are radius r = 1 μm, membrane conductivity g_l = 5 × 10⁻⁵ S/cm², axial resistivity r_a = 100 Ωcm, and specific capacitance c = 1 μF/cm² in all cases, and Δ_syn = 1 mA. C, Schematic of a sparsely-connected artificial neural network. Input units (left) correspond to pixels from the input image. Hidden units (centre) receive connections from some, but not necessarily all, input units. Output units (right) produce a classification probability. D, Example 28 × 28 pixel greyscale images from the MNIST [40] (left) and MNIST-Fashion [41] (right) datasets. The MNIST images are handwritten digits from 0 to 9 and the MNIST-Fashion images have ten classes: T-shirt/top, trousers, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. E, Learning improvement with dendritic normalisation (orange) compared to the unnormalised case (blue). Top row: Log-likelihood cost on training data. Bottom row: Classification accuracy on test data. From left to right: digits with M = 30 hidden neurons, fashion with M = 30, digits with M = 100, fashion with M = 100, digits with M = 300, fashion with M = 300. Solid lines show the mean over 10 trials and shaded areas the mean ± one standard deviation. SET hyperparameters are ε = 0.2 and ζ = 0.15.
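The sparse networks in panels C–E are trained with SET (sparse evolutionary training), whose hyperparameters ε and ζ appear throughout the captions. As a rough reference, the sketch below shows a generic SET-style rewiring step, in which a fraction ζ of the weakest connections is removed and the same number regrown at random empty positions; the function name, initialisation scale, and exact pruning rule are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def set_rewire(weights, mask, zeta=0.15, rng=None):
    """One SET-style rewiring step (sketch): prune the zeta fraction of
    weakest connections and regrow the same number at random vacant sites."""
    rng = np.random.default_rng() if rng is None else rng
    active = np.argwhere(mask == 1)
    n_prune = int(zeta * len(active))

    # Remove the connections with the smallest absolute weight.
    order = np.argsort(np.abs(weights[mask == 1]))
    for i, j in active[order[:n_prune]]:
        mask[i, j] = 0
        weights[i, j] = 0.0

    # Regrow the same number of connections at random vacant positions.
    vacant = np.argwhere(mask == 0)
    for i, j in vacant[rng.choice(len(vacant), size=n_prune, replace=False)]:
        mask[i, j] = 1
        weights[i, j] = rng.normal(scale=0.01)   # small random initial weight
    return weights, mask
```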
Fig 2. Evolution of synaptic weights.
A, Number of efferent contacts from each input neuron (pixel) to neurons in the hidden layer as the weights evolve. The left panels (blue) show the unnormalised case and the right (orange) the normalised case. B, Afferent contacts for the unnormalised (blue) and normalised (orange) cases. From left to right: Distribution of the number of afferent contacts arriving at each hidden neuron, weights, and mean weighted input to each hidden neuron over the test set. All panels show the average over 10 trials on the original MNIST dataset with hyperparameters M = 100, ε = 0.2, and ζ = 0.15. Dashed lines show where the vertical axis has been truncated to preserve the scale.
Fig 3. Improved training in deeper networks and comparison with other norms.
A, Schematic of a sparsely connected network with 3 hidden layers. The output layer is fully connected to the final hidden layer, but all other connections are sparse. B, Learning improvement with dendritic normalisation (orange) compared to the unnormalised control case (blue) for networks with 2 (top) and 3 (bottom, see panel A) sparsely-connected hidden layers, each with M = 100 neurons. Top of each: Log-likelihood cost on training data. Bottom of each: Classification accuracy on test data. C, Schematic of a convolutional neural network [46] with 20 features of size 5 × 5 and 2 × 2 maxpooling, followed by a sparsely connected layer with M = 100 neurons. D, Improved learning in the convolutional network described in C for an unnormalised (blue) and normalised (orange) sparsely-connected layer. Top: Log-likelihood cost on training data. Bottom: Classification accuracy on test data. E, Improved learning in a network with one hidden layer of M = 100 threshold-linear neurons for unnormalised (blue) and normalised (orange) sparsely-connected layers. Top: Log-likelihood cost on training data. Bottom: Classification accuracy on test data. F, Contribution of different norm orders to the learning gradients of neurons with different numbers of afferent connections and different mean absolute connection weights. Norms are (left to right and top to bottom): L0 (dendritic normalisation), L1, L2 [37], joint L1 and L2, joint L0 and L1, and joint L0 and L2 (Eq 6). Values are scaled linearly to have a maximum of 1 for each norm order. G, Comparison of dendritic (orange), heterosynaptic (green, [37]), and joint (red, Eq 6) normalisations. Top: Log-likelihood cost on training data. Bottom: Classification accuracy on test data. H, Comparison of test accuracy under different orders of norm p after (from top to bottom) 1, 5, 10, and 20 epochs. Pink shows constant (Eq 8) and olive variable (Eq 9) excitability. Solid lines show the mean over 20 trials and shaded areas and error bars the mean ± one standard deviation. All results are on the MNIST-Fashion dataset. Hyperparameters are ε = 0.2 and ζ = 0.15.
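Panel F compares how different norm orders scale the learning signal for a hidden unit. Without reproducing the paper's Eq 6, the rough idea can be sketched as dividing each unit's input by a p-norm of its afferent weight vector, where p = 0 (counting the non-zero contacts) recovers the dendritic normalisation. The helper below is illustrative only; the joint norms and exact scaling of Eq 6 are not reproduced here.

```python
import numpy as np

def normalisation_factor(w_col, p):
    """Normalisation factor for one hidden unit's afferent weight vector.

    p = 0 counts the non-zero afferent contacts (dendritic normalisation);
    p >= 1 returns the corresponding Lp norm of the non-zero weights.
    Illustrative only: this does not reproduce the paper's Eq 6.
    """
    nonzero = w_col[w_col != 0]
    if p == 0:
        return float(len(nonzero))
    return float(np.sum(np.abs(nonzero) ** p) ** (1.0 / p))

# Hypothetical afferent weight vector for a unit with four contacts.
w = np.array([0.5, -0.2, 0.0, 0.1, 0.4])
print([normalisation_factor(w, p) for p in (0, 1, 2)])  # L0, L1, L2 factors
```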
Fig 4. Sparse recurrent networks with backpropagation through time.
A, Schematic of a network with dense feedforward and sparse recurrent connectivity. B, Learning improvement with dendritic normalisation (orange) compared to the unnormalised control case (blue) for the above network with M = 50 neurons adding binary numbers up to 250. Top: Mean-square error cost. Bottom: Classification accuracy. Solid lines show the average over 100 repetitions and shaded regions in the bottom graph show the mean ± one standard deviation (truncated to be below an accuracy of 1). C, Final distributions of afferent connectivity degrees (left) and weights (right) in each case after 100 epochs. D, Schematic of a network with sparse feedforward and recurrent connectivity. E, Learning improvement with dendritic normalisation (orange) compared to the unnormalised control case (blue) for the above network with M = 50 neurons adding binary numbers up to 250. Top: Mean-square error cost. Bottom: Classification accuracy. Solid lines show the average over 100 repetitions and shaded regions in the bottom graph show the mean ± one standard deviation (truncated to be below an accuracy of 1). F, Final distributions of recurrent afferent connectivity degrees (left), feedforward weights (right, top in each case), and recurrent weights (right, bottom in each case) after 100 epochs. Hyperparameters are ε = 0.3 and ζ = 0.15.
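The recurrent networks in panels B and E are trained to add pairs of binary numbers up to 250. One plausible way to generate that task is sketched below; the least-significant-bit-first ordering, array shapes, and function name are assumptions rather than the paper's exact setup.

```python
import numpy as np

def binary_addition_batch(batch_size, max_value=250, n_bits=9, rng=None):
    """Generate input/target sequences for a binary-addition task (sketch).

    Two integers up to max_value are drawn at random; inputs are their bits
    and targets are the bits of their sum. Bits are ordered least significant
    first so a recurrent network can propagate the carry forward in time.
    250 + 250 = 500 < 2**9, so 9 bits cover every possible sum.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = rng.integers(0, max_value + 1, size=batch_size)
    b = rng.integers(0, max_value + 1, size=batch_size)

    def to_bits(values):
        # (batch, n_bits) array of 0/1 floats, least significant bit first
        return ((values[:, None] >> np.arange(n_bits)) & 1).astype(float)

    x = np.stack([to_bits(a), to_bits(b)], axis=-1)  # (batch, n_bits, 2) inputs
    y = to_bits(a + b)[..., None]                    # (batch, n_bits, 1) targets
    return x, y
```

At each time step the network would then see one bit of each addend and output the corresponding bit of the sum, holding any carry in its recurrent state.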
Fig 5. Self-organisation in networks of spatially extended spiking neurons.
A, Spatially extended neurons with self-organised recurrent connectivity. Green dendrites belong to excitatory neurons and red dendrites to inhibitory neurons. B, Top: Distributions of dendrite lengths before (light green) and after (dark green) 50 epochs of learning. Bottom: Distributions of the number of afferent contacts before (light green) and after (dark green) 50 epochs of learning. C, Distributions of local synaptic weights before (light green) and after (dark green) 50 epochs of learning. D, Distributions of somatic voltages induced by individual synapses before (light green) and after (dark green) 50 epochs of learning. All distributions are over stimuli with different numbers of repeated elements. E, Prediction performance of the spatially-extended neurons as a function of the number of repeated central elements. Error bars show ± one standard deviation over 5 repetitions.


References

    1. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in Neural Information Processing Systems 25. Curran Associates, Inc.; 2012. p. 1097–1105.
    2. Sutskever I, Vinyals O, Le Q. Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, editors. Advances in Neural Information Processing Systems 27. Curran Associates, Inc.; 2014. p. 3104–3112.
    3. Ardila D, Kiraly A, Bharadwaj S, Choi B, Reicher J, Peng L, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine. 2019;25(6):954–961. doi: 10.1038/s41591-019-0447-x
    4. McCulloch W, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics. 1943;5(4):115–133. doi: 10.1007/BF02478259
    5. Hebb D. The organization of behavior: A neuropsychological theory. Wiley. 1949;93(3):459–460.


Grants and funding

We acknowledge funding through BMBF grants 01GQ1406 (Bernstein Award 2013) to HC and 031L0229 to PJ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.