PLoS Comput Biol. 2021 Aug 9;17(8):e1009202. doi: 10.1371/journal.pcbi.1009202. eCollection 2021 Aug.

Dendritic normalisation improves learning in sparsely connected artificial neural networks



Alex D Bird et al. PLoS Comput Biol.

Abstract

Artificial neural networks, taking inspiration from biological neurons, have become an invaluable tool for machine learning applications. Recent studies have developed techniques to effectively tune the connectivity of sparsely-connected artificial neural networks, which have the potential to be more computationally efficient than their fully-connected counterparts and more closely resemble the architectures of biological systems. We here present a normalisation, based on the biophysical behaviour of neuronal dendrites receiving distributed synaptic inputs, that divides the weight of an artificial neuron's afferent contacts by their number. We apply this dendritic normalisation to various sparsely-connected feedforward network architectures, as well as simple recurrent and self-organised networks with spatially extended units. The learning performance is significantly increased, providing an improvement over other widely-used normalisations in sparse networks. The results are two-fold, being both a practical advance in machine learning and an insight into how the structure of neuronal dendritic arbours may contribute to computation.
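As a concrete illustration of the abstract's central idea, the sketch below implements a forward pass through a sparsely connected layer in which each unit's summed input is divided by the number of its afferent contacts. This is a minimal NumPy sketch, not the authors' code: the function name, the binary connectivity mask, and the sigmoid activation are illustrative assumptions.

```python
import numpy as np

def dendritically_normalised_layer(x, weights, mask, bias):
    """Forward pass of a sparse layer with dendritic normalisation (sketch).

    x       : (batch, n_in) input activations
    weights : (n_in, n_out) weights; only entries where mask == 1 are synapses
    mask    : (n_in, n_out) binary connectivity mask
    bias    : (n_out,) biases
    """
    # Number of afferent contacts per unit; this is the divisor introduced
    # by the dendritic normalisation described in the abstract.
    n_afferent = np.maximum(mask.sum(axis=0), 1)     # avoid division by zero

    # Each unit's summed input is divided by its number of incoming contacts,
    # mirroring the reduced excitability of a larger dendritic tree.
    pre_activation = (x @ (weights * mask)) / n_afferent + bias
    return 1.0 / (1.0 + np.exp(-pre_activation))     # sigmoid (assumed)
```

Because the same divisor appears in the backward pass, units with many afferent contacts receive proportionally smaller gradient steps per synapse, which is the effect explored in the figures below.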


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Dendritic normalisation improves learning in sparse artificial neural networks.
A, Schematic of dendritic normalisation. A neuron receives inputs across its dendritic tree (dark grey). In order to receive new inputs, the dendritic tree must expand (light grey), lowering the intrinsic excitability of the cell through increased membrane leak and spatial extent. B, Expected impact of changing local synaptic weight on somatic voltage as a function of dendrite length and hence potential connectivity. Top: Steady-state transfer resistance (Eq 10) for somata of radii 0, 5, 10, and 15 μm. Shaded area shows one standard deviation around the mean in the 0 μm case (Eq 11). Middle: Maximum voltage response to synaptic currents with decay timescales 10, 50, and 100 ms (Eqs 14 and 16). Shaded area shows one standard deviation around the mean in the 100 ms case (Eq 15). Bottom: Total voltage response to synaptic currents with the above timescales (all averages lie on the solid line, Eq 17). Shaded areas show one standard deviation around the mean in each case (Eq 18). Intrinsic dendrite properties are radius r = 1 μm, membrane conductivity g_l = 5 × 10⁻⁵ S/cm², axial resistivity r_a = 100 Ωcm, and specific capacitance c = 1 μF/cm² in all cases, and Δ_syn = 1 mA. C, Schematic of a sparsely-connected artificial neural network. Input units (left) correspond to pixels from the input image. Hidden units (centre) receive connections from some, but not necessarily all, input units. Output units (right) produce a classification probability. D, Example 28 × 28 pixel greyscale images from the MNIST [40] (left) and MNIST-Fashion [41] (right) datasets. The MNIST images are handwritten digits from 0 to 9 and the MNIST-Fashion images have ten classes: T-shirt/top, trousers, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. E, Learning improvement with dendritic normalisation (orange) compared to the unnormalised case (blue). Top row: Log-likelihood cost on training data. Bottom row: Classification accuracy on test data. From left to right: digits with M = 30 hidden neurons, fashion with M = 30, digits with M = 100, fashion with M = 100, digits with M = 300, fashion with M = 300. Solid lines show the mean over 10 trials and shaded areas the mean ± one standard deviation. SET hyperparameters are ε = 0.2 and ζ = 0.15.
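The sparse networks in panels C–E are trained with SET (sparse evolutionary training), whose hyperparameters ε and ζ appear throughout the captions. As a rough reference, the sketch below shows a generic SET-style rewiring step, in which a fraction ζ of the weakest connections is removed and the same number regrown at random empty positions; the function name, initialisation scale, and exact pruning rule are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def set_rewire(weights, mask, zeta=0.15, rng=None):
    """One SET-style rewiring step (sketch): prune the zeta fraction of
    weakest connections and regrow the same number at random vacant sites."""
    rng = np.random.default_rng() if rng is None else rng
    active = np.argwhere(mask == 1)
    n_prune = int(zeta * len(active))

    # Remove the connections with the smallest absolute weight.
    order = np.argsort(np.abs(weights[mask == 1]))
    for i, j in active[order[:n_prune]]:
        mask[i, j] = 0
        weights[i, j] = 0.0

    # Regrow the same number of connections at random vacant positions.
    vacant = np.argwhere(mask == 0)
    for i, j in vacant[rng.choice(len(vacant), size=n_prune, replace=False)]:
        mask[i, j] = 1
        weights[i, j] = rng.normal(scale=0.01)   # small random initial weight
    return weights, mask
```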
Fig 2. Evolution of synaptic weights.
A, Number of efferent contacts from each input neuron (pixel) to neurons in the hidden layer as the weights evolve. The left panels (blue) show the unnormalised case and the right (orange) the normalised case. B, Afferent contacts for the unnormalised (blue) and normalised (orange) cases. From left to right: Distribution of the number of afferent contacts arriving at each hidden neuron, weights, and mean weighted input to each hidden neuron over the test set. All panels show the average over 10 trials on the original MNIST dataset with hyperparameters M = 100, ε = 0.2, and ζ = 0.15. Dashed lines show where the vertical axis has been truncated to preserve the scale.
Fig 3. Improved training in deeper networks and comparison with other norms.
A, Schematic of a sparsely connected network with 3 hidden layers. The output layer is fully connected to the final hidden layer, but all other connections are sparse. B, Learning improvement with dendritic normalisation (orange) compared to the unnormalised control case (blue) for networks with 2 (top) and 3 (bottom, see panel A) sparsely-connected hidden layers, each with M = 100 neurons. Top of each: Log-likelihood cost on training data. Bottom of each: Classification accuracy on test data. C, Schematic of a convolutional neural network [46] with 20 features of size 5 × 5 and 2 × 2 maxpooling, followed by a sparsely connected layer with M = 100 neurons. D, Improved learning in the convolutional network described in C for an unnormalised (blue) and normalised (orange) sparsely-connected layer. Top: Log-likelihood cost on training data. Bottom: Classification accuracy on test data. E, Improved learning in a network with one hidden layer of M = 100 threshold-linear neurons for unnormalised (blue) and normalised (orange) sparsely-connected layers. Top: Log-likelihood cost on training data. Bottom: Classification accuracy on test data. F, Contribution of different norm orders to the learning gradients of neurons with different numbers of afferent connections and different mean absolute connection weights. Norms are (left to right and top to bottom): L0 (dendritic normalisation), L1, L2 [37], joint L1 and L2, joint L0 and L1, and joint L0 and L2 (Eq 6). Values are scaled linearly to have a maximum of 1 for each norm order. G, Comparison of dendritic (orange), heterosynaptic (green, [37]), and joint (red, Eq 6) normalisations. Top: Log-likelihood cost on training data. Bottom: Classification accuracy on test data. H, Comparison of test accuracy under different orders of norm p after (from top to bottom) 1, 5, 10, and 20 epochs. Pink shows constant (Eq 8) and olive variable (Eq 9) excitability. Solid lines show the mean over 20 trials and shaded areas and error bars the mean ± one standard deviation. All results are on the MNIST-Fashion dataset. Hyperparameters are ε = 0.2 and ζ = 0.15.
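Panel F compares how different norm orders scale the learning signal for a hidden unit. Without reproducing the paper's Eq 6, the rough idea can be sketched as dividing each unit's input by a p-norm of its afferent weight vector, where p = 0 (counting the non-zero contacts) recovers the dendritic normalisation. The helper below is illustrative only; the joint norms and exact scaling of Eq 6 are not reproduced here.

```python
import numpy as np

def normalisation_factor(w_col, p):
    """Normalisation factor for one hidden unit's afferent weight vector.

    p = 0 counts the non-zero afferent contacts (dendritic normalisation);
    p >= 1 returns the corresponding Lp norm of the non-zero weights.
    Illustrative only: this does not reproduce the paper's Eq 6.
    """
    nonzero = w_col[w_col != 0]
    if p == 0:
        return float(len(nonzero))
    return float(np.sum(np.abs(nonzero) ** p) ** (1.0 / p))

# Hypothetical afferent weight vector for a unit with four contacts.
w = np.array([0.5, -0.2, 0.0, 0.1, 0.4])
print([normalisation_factor(w, p) for p in (0, 1, 2)])  # L0, L1, L2 factors
```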
Fig 4. Sparse recurrent networks with backpropagation through time.
A, Schematic of a network with dense feedforward and sparse recurrent connectivity. B, Learning improvement with dendritic normalisation (orange) compared to the unnormalised control case (blue) for the above network with M = 50 neurons adding binary numbers up to 250. Top: Mean-square error cost. Bottom: Classification accuracy. Solid lines show the average over 100 repetitions and shaded regions in the bottom graph show the mean ± one standard deviation (truncated to be below an accuracy of 1). C, Final distributions of afferent connectivity degrees (left) and weights (right) in each case after 100 epochs. D, Schematic of a network with sparse feedforward and recurrent connectivity. E, Learning improvement with dendritic normalisation (orange) compared to the unnormalised control case (blue) for the above network with M = 50 neurons adding binary numbers up to 250. Top: Mean-square error cost. Bottom: Classification accuracy. Solid lines show the average over 100 repetitions and shaded regions in the bottom graph show the mean ± one standard deviation (truncated to be below an accuracy of 1). F, Final distributions of recurrent afferent connectivity degrees (left), feedforward weights (right, top in each case), and recurrent weights (right, bottom in each case) after 100 epochs. Hyperparameters are ε = 0.3 and ζ = 0.15.
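The recurrent networks in panels B and E are trained to add pairs of binary numbers up to 250. One plausible way to generate that task is sketched below; the least-significant-bit-first ordering, array shapes, and function name are assumptions rather than the paper's exact setup.

```python
import numpy as np

def binary_addition_batch(batch_size, max_value=250, n_bits=9, rng=None):
    """Generate input/target sequences for a binary-addition task (sketch).

    Two integers up to max_value are drawn at random; inputs are their bits
    and targets are the bits of their sum. Bits are ordered least significant
    first so a recurrent network can propagate the carry forward in time.
    250 + 250 = 500 < 2**9, so 9 bits cover every possible sum.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = rng.integers(0, max_value + 1, size=batch_size)
    b = rng.integers(0, max_value + 1, size=batch_size)

    def to_bits(values):
        # (batch, n_bits) array of 0/1 floats, least significant bit first
        return ((values[:, None] >> np.arange(n_bits)) & 1).astype(float)

    x = np.stack([to_bits(a), to_bits(b)], axis=-1)  # (batch, n_bits, 2) inputs
    y = to_bits(a + b)[..., None]                    # (batch, n_bits, 1) targets
    return x, y
```

At each time step the network would then see one bit of each addend and output the corresponding bit of the sum, holding any carry in its recurrent state.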
Fig 5. Self-organisation in networks of spatially extended spiking neurons.
A, Spatially extended neurons with self-organised recurrent connectivity. Green dendrites belong to excitatory neurons and red dendrites to inhibitory neurons. B, Top: Distributions of dendrite lengths before (light green) and after (dark green) 50 epochs of learning. Bottom: Distributions of the number of afferent contacts before (light green) and after (dark green) 50 epochs of learning. C, Distributions of local synaptic weights before (light green) and after (dark green) 50 epochs of learning. D, Distributions of somatic voltages induced by individual synapses before (light green) and after (dark green) 50 epochs of learning. All distributions are over stimuli with different numbers of repeated elements. E, Prediction performance of the spatially-extended neurons as a function of the number of repeated central elements. Error bars show ± one standard deviation over 5 repetitions.


References

    1. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in Neural Information Processing Systems 25. Curran Associates, Inc.; 2012. p. 1097–1105.
    2. Sutskever I, Vinyals O, Le Q. Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, editors. Advances in Neural Information Processing Systems 27. Curran Associates, Inc.; 2014. p. 3104–3112.
    3. Ardila D, Kiraly A, Bharadwaj S, Choi B, Reicher J, Peng L, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine. 2019;25(6):954–961. doi: 10.1038/s41591-019-0447-x
    4. McCulloch W, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics. 1943;5(4):115–133. doi: 10.1007/BF02478259
    5. Hebb D. The organization of behavior: A neuropsychological theory. Wiley. 1949;93(3):459–460.


Grants and funding

We acknowledge funding through BMBF grants 01GQ1406 (Bernstein Award 2013) to HC and 031L0229 to PJ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.