Developmental and evolutionary constraints on olfactory circuit selection

Naoki Hiratani et al.
Proc Natl Acad Sci U S A. 2022 Mar 15;119(11):e2100600119. doi: 10.1073/pnas.2100600119. Epub 2022 Mar 9.

Abstract

Significance: In this work, we explore the hypothesis that biological neural networks optimize their architecture, through evolution, for learning. We study early olfactory circuits of mammals and insects, which have relatively similar structure but a huge diversity in size. We approximate these circuits as three-layer networks and estimate, analytically, the scaling of the optimal hidden-layer size with input-layer size. We find that both longevity and information in the genome constrain the hidden-layer size, so a range of allometric scalings is possible. However, the experimentally observed allometric scalings in mammals and insects are consistent with biologically plausible values. This analysis should pave the way for a deeper understanding of both biological and artificial networks.

Keywords: model selection; neural circuit; olfaction; statistical learning theory.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
(A) Scaling law in mammalian olfactory circuits. Data points were taken from supplementary tables S2 and S3 of Srinivasan and Stevens (20). (B) Scaling law in invertebrate olfactory circuits. See SI Appendix, section 1.1 for details.
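
The scaling laws in Fig. 1 are power-law fits on log-log axes. As a minimal sketch of how such an allometric exponent is extracted (the data values below are hypothetical placeholders, not the measurements from Srinivasan and Stevens):

    import numpy as np

    # Hypothetical example values; the actual measurements are in
    # supplementary tables S2 and S3 of Srinivasan and Stevens (20).
    L_x = np.array([100.0, 500.0, 1200.0, 1800.0])  # input-layer sizes (e.g., glomeruli)
    L_h = np.array([4e4, 5e5, 1.8e6, 3.5e6])        # hidden-layer sizes (e.g., mitral cells)

    # A power law L_h = a * L_x^beta is linear in log-log coordinates,
    # so the exponent is the slope of a least-squares line fit.
    beta, log_a = np.polyfit(np.log(L_x), np.log(L_h), deg=1)
    print(f"fitted exponent beta ~ {beta:.2f}")     # ~3/2 for the mammalian data in Fig. 1A
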
Fig. 2.
Network models. (A) Olfactory environment (teacher). (B) Olfactory circuit that models the environment (student).
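
In code, the teacher-student pair of Fig. 2 might look as follows. This is an illustrative sketch, not the authors' implementation; the ReLU nonlinearity and noise variance follow the conventions quoted in the Fig. 3 caption, but the weight scalings are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    L_x, L_t, L_h = 50, 500, 300          # input, teacher-hidden, student-hidden sizes
    sigma_t2 = 0.1                        # teacher noise variance (as in Fig. 3)
    relu = lambda u: np.maximum(0.0, u)   # g_t = g_s = ReLU

    # Teacher ("olfactory environment"): fixed weights plus output noise.
    J_t = rng.normal(0.0, 1.0 / np.sqrt(L_x), (L_t, L_x))
    w_t = rng.normal(0.0, 1.0 / np.sqrt(L_t), L_t)

    def teacher(x):
        return w_t @ relu(J_t @ x) + rng.normal(0.0, np.sqrt(sigma_t2))

    # Student ("olfactory circuit"): random expansion J_s, learnable readout w_s.
    J_s = rng.normal(0.0, 1.0 / np.sqrt(L_x), (L_h, L_x))
    w_s = np.zeros(L_h)

    def student(x):
        return w_s @ relu(J_s @ x)
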
Fig. 3.
Generalization error (red; Eq. 12), approximation error (blue; Eq. 9), and estimation error (green; Eq. 11) at L_x = 50 and N = 30,000, for various hidden-layer sizes L_h. Lines are analytical results; points are from numerical simulations (see SI Appendix, section 7.4 for details). Solid and dashed vertical lines are the minima of the generalization error from theory and simulations, respectively. Here, and in all figures except Fig. 4 D and E, both g_t and g_s are rectified linear functions [g_t(u) = g_s(u) = max(0, u)]. In all figures except Fig. 6 we use σ_t^2 = 0.1 for the noise in the teacher circuit, and in all figures the hidden-layer size of the teacher network is fixed at L_t = 500. Error bars represent the SD over 10 simulations.
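
A simulation in the spirit of the points in Fig. 3 can be sketched as follows: fit the student readout on N teacher samples, then estimate the generalization error on held-out samples. The least-squares readout fit stands in for maximum-likelihood estimation under Gaussian noise; this is an illustration, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(1)
    L_x, L_t, L_h, N = 50, 500, 300, 30_000
    sigma_t2 = 0.1
    relu = lambda u: np.maximum(0.0, u)

    J_t = rng.normal(0.0, 1.0 / np.sqrt(L_x), (L_t, L_x))
    w_t = rng.normal(0.0, 1.0 / np.sqrt(L_t), L_t)
    J_s = rng.normal(0.0, 1.0 / np.sqrt(L_x), (L_h, L_x))

    def sample(n):
        """Draw n (odor, response) pairs from the teacher."""
        X = rng.normal(size=(n, L_x))
        y = relu(X @ J_t.T) @ w_t + rng.normal(0.0, np.sqrt(sigma_t2), n)
        return X, y

    X, y = sample(N)                              # training set
    H = relu(X @ J_s.T)                           # student hidden-layer activity
    w_s, *_ = np.linalg.lstsq(H, y, rcond=None)   # readout fit (ML under Gaussian noise)

    X_te, y_te = sample(5_000)                    # fresh samples -> generalization error
    eps_gen = np.mean((relu(X_te @ J_s.T) @ w_s - y_te) ** 2)
    print(f"estimated generalization error at L_h = {L_h}: {eps_gen:.4f}")
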
Fig. 4.
Model behavior under maximum-likelihood estimation. (A) Relationship between the input-layer size, L_x, and the optimal hidden-layer size, L_h^*, at a fixed sample size (N = 30,000). Gray lines are found by optimizing Eq. 12 with respect to L_h; dashed lines are the asymptotic expression derived in SI Appendix, section 5.1. (B) Optimal hidden-layer size, L_h^*, as a function of the input-layer size, L_x, and the sample size, N, from Eq. 12. (C) Scaling at N = 1.65 L_x^1.96. Gray line is theory; black points are from simulations; colored circles are the experimental data from Fig. 1A. Simulations were done only for small L_x, due to their computational cost when L_x is large. (D) Relationship between the hidden-layer size, L_h, and the generalization error, ε_gen, under the logistic activation function (black) and ReLU (gray), at L_x = 50 and N = 30,000. Lines are theory; bars are from simulations. Vertical lines mark the minima (solid, theory; dashed, simulations). Error bars are the SD over 10 simulations. (E) Scaling for the logistic activation function with N = 240 L_x^1.96. Gray line is theory; black points are from simulations; colored circles are the experimental data from Fig. 1A. As in C, simulations were done only for small L_x. (F) Analytical estimate of the L_h^*-L_x scaling versus the L_x-N scaling (y axis, coefficient β in the scaling L_h^* ∝ L_x^β; x axis, coefficient γ in the scaling N ∝ L_x^γ; see SI Appendix, section 5.1 for details). The gray horizontal line is the 3/2 scaling from Fig. 1A. As in Fig. 3, the teacher network had a hidden-layer size of 500 with a ReLU nonlinearity, and the noise was set to σ_t^2 = 0.1.
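
The curves in Fig. 4 A and B come from minimizing the analytical generalization error (Eq. 12) over L_h. A sketch of that outer loop, with a hypothetical stand-in for Eq. 12 (the real expression is in the paper):

    import numpy as np

    def eps_gen(L_h, L_x=50, N=30_000):
        # Hypothetical stand-in with the qualitative shape seen in Fig. 3:
        # approximation error falls with L_h, while estimation error grows
        # with parameters per sample, L_h/N. The paper's Eq. 12 would replace this.
        return 1.0 / (1.0 + L_h / L_x) + L_h * L_x / N

    L_h_grid = np.arange(10, 5_000, 10)
    L_h_star = L_h_grid[np.argmin(eps_gen(L_h_grid))]
    print(f"optimal hidden-layer size L_h^* ~ {L_h_star}")

Repeating this sweep along a sampling trajectory such as N = 1.65 L_x^1.96 traces out a scaling curve like the one in Fig. 4C.
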
Fig. 5.
Model behavior under stochastic gradient descent. (A) Hidden-layer-size dependence of the decay time constants L_1 and L_2, with L_x = 100. (B) Dynamics of the estimation error for various hidden-layer sizes, L_h. Dashed lines, simulations; solid lines, theory. (C) Lifetime-average generalization error, approximation error, and lifetime-average estimation error for various hidden-layer sizes, L_h, at N = 30,000. (D) Optimal hidden-layer size, L_h^*, with N = 30,000. Dashed lines are the asymptotic scaling (SI Appendix, section 5.2). See SI Appendix, Fig. S2B for curves over a range of N and σ_t^2. (E) Optimal hidden-layer size, L_h^*, with N = 19 L_x^1.96. Gray line is theory; black points are from simulations; colored circles are the experimental data from Fig. 1A. As in Fig. 4, simulations were done only for small L_x, due to their computational cost when L_x is large. The discontinuity around L_x ≈ 10 originates from approximations that do not match perfectly around L_h^* ≈ L_x^2/2 (SI Appendix, sections 4.2 and 8). (F) Optimal hidden-layer size, L_h^*, for various initial weight amplitudes, σ_R^2, at N = 30,000. Gray, fixed learning rate; black, adaptive learning rate. Lines are theory; dots are simulations. The initial readout weights were sampled from w_s(0) ~ N(0, σ_R^2/L_h). The horizontal dashed line represents the cutoff of L_h^* in the numerical simulations; when σ_R^2 < 2 under a fixed learning rate, L_h^* exceeds 10^5. In A–C and F the input-layer size was L_x = 100. As in Fig. 3, the teacher network had a hidden-layer size of 500 with a ReLU nonlinearity, and the noise was set to σ_t^2 = 0.1.
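
An online-SGD version of the readout learning in Fig. 5 could be sketched as below. The decaying learning-rate schedule is an assumption standing in for the paper's adaptive rate (analyzed in SI Appendix, section 5.2); initialization follows the w_s(0) ~ N(0, σ_R^2/L_h) convention from panel F.

    import numpy as np

    rng = np.random.default_rng(2)
    L_x, L_t, L_h, N = 100, 500, 300, 30_000
    sigma_t2, sigma_R2 = 0.1, 1.0
    relu = lambda u: np.maximum(0.0, u)

    J_t = rng.normal(0.0, 1.0 / np.sqrt(L_x), (L_t, L_x))
    w_t = rng.normal(0.0, 1.0 / np.sqrt(L_t), L_t)
    J_s = rng.normal(0.0, 1.0 / np.sqrt(L_x), (L_h, L_x))  # fixed random expansion
    w_s = rng.normal(0.0, np.sqrt(sigma_R2 / L_h), L_h)    # w_s(0) ~ N(0, sigma_R^2/L_h)

    for n in range(1, N + 1):
        x = rng.normal(size=L_x)                           # one odor sample per step
        y = w_t @ relu(J_t @ x) + rng.normal(0.0, np.sqrt(sigma_t2))
        h = relu(J_s @ x)
        eta = 1.0 / (L_h + n)          # assumed decaying ("adaptive") schedule
        w_s += eta * (y - w_s @ h) * h                     # SGD step on squared error
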
Fig. 6.
Olfactory circuit augmented with a genetically specified pathway. (A) Schematic of the two-pathway model. In the top pathway, the weights J_p and w_p are hard wired; in the bottom pathway, the weights J_s are random and w_s is learned with adaptive SGD. (B–D) Optimal layer size of the projection neuron-to-Kenyon cell pathway, w_s · g(J_s x), under different model settings. (B) Low-bit synapses were implemented by adding Gaussian noise to J_p and w_p. (C) Low-bit synapses were implemented by discretizing J_p and w_p. (D) Low-bit synapses were implemented by adding noise to J_p and w_p as in B, but w_p was additionally learned from training samples using SGD (SI Appendix, Eq. S145). In B–D, the teacher network had a hidden-layer size of 500 and a ReLU nonlinearity, and we used σ_t^2 = 0.01 and N = 10 L_x^2 trials. For s_b = 2 bits we used G = 2,000, while for s_b = 4 bits we used G = 4,000; for s_b = 0 bits, we simply removed the hard-wired pathway. The width of the hard-wired intermediate layer, L_p, follows from Eq. 18: L_p = G/[s_b(L_x + 1)], rounded up to an integer. See SI Appendix, sections 6 and 7.5 for details.
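
The bookkeeping behind L_p in B–D is simple: with a genome budget of G bits and s_b bits per synapse, each hard-wired intermediate unit costs s_b(L_x + 1) bits (L_x input weights plus one readout weight), so Eq. 18 gives L_p = G/[s_b(L_x + 1)], rounded up. A minimal sketch:

    import math

    def hardwired_layer_size(G: int, s_b: int, L_x: int) -> int:
        """Width of the hard-wired pathway under a genome budget of G bits."""
        if s_b == 0:
            return 0   # s_b = 0 bits: the hard-wired pathway is removed
        return math.ceil(G / (s_b * (L_x + 1)))

    print(hardwired_layer_size(G=2_000, s_b=2, L_x=50))   # 2,000/(2*51) -> 20 units
    print(hardwired_layer_size(G=4_000, s_b=4, L_x=50))   # 4,000/(4*51) -> 20 units
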


References

    1. Mathis A., Herz A. V., Stemmler M., Optimal population codes for space: Grid cells outperform place cells. Neural Comput. 24, 2280–2317 (2012).
    2. Gjorgjieva J., Sompolinsky H., Meister M., Benefits of pathway splitting in sensory coding. J. Neurosci. 34, 12127–12144 (2014).
    3. Litwin-Kumar A., Harris K. D., Axel R., Sompolinsky H., Abbott L. F., Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.e7 (2017).
    4. Akaike H., A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19, 716–723 (1974).
    5. Baum E. B., Haussler D., “What size net gives valid generalization?” in Advances in Neural Information Processing Systems, Touretzky D., Ed. (NIPS, 1988), vol. 1, pp. 81–90.
