Neural Comput. 2021 Nov 12;33(12):3179-3203. doi: 10.1162/neco_a_01439.

A Correspondence Between Normalization Strategies in Artificial and Biological Neural Networks


Yang Shen et al. Neural Comput. 2021.

Abstract

A fundamental challenge at the interface of machine learning and neuroscience is to uncover computational principles that are shared between artificial and biological neural networks. In deep learning, normalization methods such as batch normalization, weight normalization, and their many variants help to stabilize hidden unit activity and accelerate network training, and these methods have been called one of the most important recent innovations for optimizing deep networks. In the brain, homeostatic plasticity represents a set of mechanisms that also stabilize and normalize network activity to lie within certain ranges, and these mechanisms are critical for maintaining normal brain function. In this article, we discuss parallels between artificial and biological normalization methods at four spatial scales: normalization of a single neuron's activity, normalization of the synaptic weights of a neuron, normalization of a layer of neurons, and normalization of a network of neurons. We argue that both types of methods are functionally equivalent, in that both push activation patterns of hidden units toward a homeostatic state in which all neurons are equally used, and that such representations can improve coding capacity, discrimination, and regularization. As a proof of concept, we develop an algorithm inspired by a neural normalization technique called synaptic scaling and show that it performs competitively against existing normalization methods on several data sets. Overall, we hope this bidirectional connection will inspire neuroscientists and machine learning researchers in three ways: to uncover new normalization algorithms based on established neurobiological principles; to help quantify the trade-offs of different homeostatic plasticity mechanisms used in the brain; and to offer insights about how stability may not hinder, but may actually promote, plasticity.
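For concreteness, the batch normalization transform referenced above standardizes each hidden unit's pre-activations across a mini-batch and then applies a learned per-unit affine rescaling. Below is a minimal NumPy sketch of the training-time forward pass; the function and variable names are illustrative and are not taken from the paper.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a mini-batch.

    x     : (batch_size, num_units) pre-activations
    gamma : (num_units,) learned per-unit scale
    beta  : (num_units,) learned per-unit shift
    """
    mu = x.mean(axis=0)                    # per-unit mean over the batch
    var = x.var(axis=0)                    # per-unit variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardize each unit
    return gamma * x_hat + beta            # learned affine rescaling
```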


Figures

Figure 1:
Neural homeostatic plasticity mechanisms across four spatial scales. (A) Normalization of a single neuron's activity. Left: Neuron X has a relatively low firing rate and a high firing threshold, θX, and vice versa for neuron Y. Right: Both neurons can be brought closer to their target firing rate by decreasing θX and increasing θY. (B) Normalization of synaptic weights. Left (synaptic scaling): If a neuron is firing above its target rate, its synapses are multiplicatively decreased, and vice versa if the neuron is firing below its target rate. Right (dendritic normalization): If a synapse increases in size due to strong LTP, its neighboring synapses decrease their size. (C) Normalization of a layer of neurons. Left: Two layers of neurons with feedforward connections and feedback inhibitory connections (not shown). Right: The cumulative distribution of firing rates for neurons in the first layer is exponential with a different mean for different inputs. The activity of neurons in the second layer is normalized such that the means of the three exponentials are approximately the same. (D) Normalization of a network of neurons. Left: Example of a neural circuit with the same units and connections but different activity levels for neurons (purple bars) and different weights (pink arrow thickness) under two different conditions. Right: Despite local variability, the global distributions of firing rates and synaptic weights for the network remain stable (log-normally distributed) under both conditions.
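To make the synaptic-scaling rule in panel B concrete, the sketch below multiplicatively rescales all of a neuron's incoming synapses toward a target firing rate, preserving their relative strengths. This is an illustrative toy model of the biological mechanism only; the update form and the rate parameter eta are assumptions, not the trained Synaptic Scaling layer evaluated in Figures 2 and 3.

```python
import numpy as np

def synaptic_scaling_step(weights, firing_rate, target_rate, eta=0.01):
    """One homeostatic update of a neuron's incoming synaptic weights.

    If the neuron fires above its target rate, all synapses are scaled down
    by a common factor; if it fires below target, they are scaled up.
    Multiplicative scaling preserves the relative strengths of the synapses.
    """
    # factor < 1 when firing above target, > 1 when firing below target
    factor = 1.0 + eta * (target_rate - firing_rate) / target_rate
    return weights * factor

# Example: a neuron firing at 12 Hz with a 5 Hz target has its synapses scaled down.
w = np.array([0.2, 0.5, 0.9])
w_scaled = synaptic_scaling_step(w, firing_rate=12.0, target_rate=5.0)
```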
Figure 2:
Data set: CIFAR-10. Normalization increases performance and drives neural networks toward a “homeostatic” state. (A) Test accuracy (y-axis) versus training iteration (x-axis). Error bars show standard deviation over 10 random initializations. BatchNorm and Synaptic Scaling achieve higher accuracy at the beginning and the end of training compared to all other methods, including Vanilla. (B) The probability of each hidden unit (columns) being activated over all inputs in a batch, computed on every 100th training iteration (rows). Heat maps are shown for hidden units in both fully connected (FC) layers. (C) Distribution of the probabilities that each unit in the first FC layer is activated per input. (D) Histogram of the mean activation values for hidden units in the first FC layer, calculated using the test data set. (E) Distribution of the trained α parameters for Synaptic Scaling, for each FC layer.
Figure 3:
Data set: SVHN. Similar benefits of normalization on a second data set. Synaptic Scaling and BatchNorm have the highest classification accuracy (A), increase coding capacity (B,C: probability of each hidden unit being activated), and increase regularization (D: mean activation values for hidden units). See Figure 2 caption for detailed panel descriptions.
