PLoS Comput Biol. 2016 Feb 29;12(2):e1004792. doi: 10.1371/journal.pcbi.1004792. eCollection 2016 Feb.

Training Excitatory-Inhibitory Recurrent Neural Networks for Cognitive Tasks: A Simple and Flexible Framework


H Francis Song et al. PLoS Comput Biol. 2016.

Abstract

The ability to simultaneously record from large numbers of neurons in behaving animals has ushered in a new era for the study of the neural circuit mechanisms underlying cognitive functions. One promising approach to uncovering the dynamical and computational principles governing population responses is to analyze model recurrent neural networks (RNNs) that have been optimized to perform the same tasks as behaving animals. Because the optimization of network parameters specifies the desired output but not the manner in which to achieve this output, "trained" networks serve as a source of mechanistic hypotheses and a testing ground for data analyses that link neural computation to behavior. Complete access to the activity and connectivity of the circuit, and the ability to manipulate them arbitrarily, make trained networks a convenient proxy for biological circuits and a valuable platform for theoretical investigation. However, existing RNNs lack basic biological features such as the distinction between excitatory and inhibitory units (Dale's principle), which are essential if RNNs are to provide insights into the operation of biological circuits. Moreover, trained networks can achieve the same behavioral performance but differ substantially in their structure and dynamics, highlighting the need for a simple and flexible framework for the exploratory training of RNNs. Here, we describe a framework for gradient descent-based training of excitatory-inhibitory RNNs that can incorporate a variety of biological knowledge. We provide an implementation based on the machine learning library Theano, whose automatic differentiation capabilities facilitate modifications and extensions. We validate this framework by applying it to well-known experimental paradigms such as perceptual decision-making, context-dependent integration, multisensory integration, parametric working memory, and motor sequence generation. Our results demonstrate the wide range of neural activity patterns and behavior that can be modeled, and suggest a unified setting in which diverse cognitive computations and mechanisms can be studied.
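To make the abstract's description of sign-constrained training concrete, here is a minimal NumPy sketch of one common way to impose Dale's principle during gradient-based optimization: the effective recurrent weight matrix is obtained by rectifying an unconstrained trainable matrix and multiplying by a fixed diagonal sign matrix. The names (`effective_recurrent_weights`, `W_raw`, `D`) and unit counts are illustrative assumptions, not the paper's code.

```python
import numpy as np

N_E, N_I = 80, 20          # hypothetical numbers of excitatory and inhibitory units
N = N_E + N_I

# Fixed diagonal sign matrix: +1 for excitatory presynaptic units (columns),
# -1 for inhibitory presynaptic units.
D = np.diag(np.concatenate([np.ones(N_E), -np.ones(N_I)]))

# Unconstrained trainable parameters; gradients flow through this matrix.
W_raw = 0.1 * np.random.randn(N, N)

def effective_recurrent_weights(W_raw, D):
    """Rectify the trainable weights, then apply per-column signs so that each
    presynaptic unit's outgoing weights share one sign (Dale's principle)."""
    return np.maximum(W_raw, 0.0) @ D

W_rec = effective_recurrent_weights(W_raw, D)   # columns 0..79 >= 0, columns 80..99 <= 0
```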


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Recurrent neural network (RNN).
A trained RNN of excitatory and inhibitory rate units r(t) receives time-varying inputs u(t) and produces the desired time-varying outputs z(t). Inputs encode task-relevant sensory information or internal rules, while outputs indicate a decision in the form of an abstract decision variable, probability distribution, or direct motor output. Only the recurrent units have their own dynamics: inputs are considered to be given and the outputs are read out from the recurrent units. Each unit of an RNN can be interpreted as the temporally smoothed firing rate of a single neuron or the spatial average of a group of similarly tuned neurons.
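A minimal simulation sketch of the rate-unit picture described above, assuming the standard continuous-time form tau * dx/dt = -x + W_rec r + W_in u + noise with rectified-linear rates r = [x]+ and a linear readout z = W_out r; the function name, time step, time constant, and noise scaling are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def simulate_rnn(W_rec, W_in, W_out, u, dt=10.0, tau=100.0, noise_std=0.0, rng=None):
    """Euler-integrate rate-unit dynamics
        tau * dx/dt = -x + W_rec @ r + W_in @ u(t) + noise,   r = max(x, 0),
    and read out z(t) = W_out @ r(t). Time step and time constant in ms (illustrative)."""
    rng = rng or np.random.default_rng()
    T, N = u.shape[0], W_rec.shape[0]
    alpha = dt / tau
    x = np.zeros(N)
    z = np.zeros((T, W_out.shape[0]))
    for t in range(T):
        r = np.maximum(x, 0.0)                               # rectified-linear rates
        noise = noise_std * np.sqrt(2.0 * alpha) * rng.standard_normal(N)
        x = (1.0 - alpha) * x + alpha * (W_rec @ r + W_in @ u[t]) + noise
        z[t] = W_out @ np.maximum(x, 0.0)                    # linear readout of rates
    return z
```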
Fig 2
Fig 2. Perceptual decision-making task.
(A) Inputs (upper) and target outputs (lower) for a perceptual decision-making task with variable stimulus duration, which we refer to as VS here. The choice 1 output must hold low during fixation (fix.), then high during the decision (dec.) period if the choice 1 input is larger than the choice 2 input, and low otherwise; the same applies to the choice 2 output. There are no constraints on the outputs during the stimulus period. (B) Inputs and target outputs for the reaction-time version of the integration task, which we refer to as RT. Here the outputs are encouraged to respond after a short delay following stimulus onset. The reaction time is defined as the time it takes for the outputs to reach a threshold. (C) Psychometric function for the VS version, showing the percentage of trials on which the network chose choice 1 as a function of the signed coherence. Coherence is a measure of the difference between evidence for choice 1 and evidence for choice 2; positive coherence indicates evidence for choice 1 and negative coherence evidence for choice 2. The solid line is a fit to a cumulative Gaussian distribution. (D) Psychometric function for the RT version. (E) Percentage of correct responses as a function of stimulus duration in the VS version, for each nonzero coherence level. (F) Reaction time for correct trials in the RT version as a function of coherence. Inset: Distribution of reaction times on correct trials. (G) Example activity of a single unit in the VS version across all correct trials, averaged within conditions after aligning to stimulus onset. Solid (dashed) lines denote positive (negative) coherence. (H) Example activity of a single unit in the RT version, averaged within conditions and across all correct trials aligned to the reaction time.
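The psychometric fits in panels C and D can be reproduced in spirit with a cumulative-Gaussian fit such as the following SciPy sketch; the example coherence values and choice probabilities are made-up placeholders, not data from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cumulative_gaussian(coh, mu, sigma):
    """Probability of choosing choice 1 as a function of signed coherence."""
    return norm.cdf(coh, loc=mu, scale=sigma)

def fit_psychometric(signed_coherence, p_choice1):
    """Fit a cumulative Gaussian to (signed coherence, fraction choice 1) data."""
    popt, _ = curve_fit(cumulative_gaussian, signed_coherence, p_choice1, p0=[0.0, 10.0])
    return popt                                  # (bias mu, slope parameter sigma)

# Illustrative usage with placeholder values:
coh = np.array([-51.2, -25.6, -12.8, 0.0, 12.8, 25.6, 51.2])
p1  = np.array([0.02, 0.10, 0.30, 0.50, 0.72, 0.91, 0.98])
mu, sigma = fit_psychometric(coh, p1)
```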
Fig 3
Fig 3. Perceptual decision-making networks with different constraints.
(A) Psychometric function (percent choice 1 as a function of signed coherence) and connection weights (input, upper-right; recurrent, upper-left; and output, lower) for a network in which all weights may be positive or negative, trained for a perceptual decision-making task. Connections go from columns (“pre-synaptic”) to rows (“post-synaptic”), with blue representing positive weights and red negative weights. Different color scales (arbitrary units) were used for the input, recurrent, and output matrices but are consistent across the three networks shown. In the psychometric function, solid lines are fits to a cumulative Gaussian distribution. In this and the networks in B and C, self-connections were not allowed. In each case 100 units were trained, but only the 25 units with the largest absolute selectivity index (Eq 30) are shown, ordered from most selective for choice 1 (large positive) to most selective for choice 2 (large negative). (B) A network trained for the same task as in A but with the constraint that excitatory units may only project positive weights and inhibitory units may only project negative weights. All input weights were constrained to be excitatory, and the readout weights, considered to be “long-range,” were nonzero only for excitatory units. All connections except self-connections were allowed, but training resulted in a strongly clustered pattern of connectivity. Units are again sorted by selectivity but separately for excitatory and inhibitory units (20 excitatory, 5 inhibitory). (C) Same as B but with the additional constraint that excitatory recurrent units receiving input for choice 1 and excitatory recurrent units receiving input for choice 2 do not project to one another, and each group sends output to the corresponding choice.
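Sorting units by a selectivity index, as done for the panels above, might look like the sketch below; the d'-like index used here is a common choice shown only for illustration and is not necessarily identical to the paper's Eq 30.

```python
import numpy as np

def selectivity_index(rates_choice1, rates_choice2):
    """d'-like selectivity: mean difference divided by the pooled standard deviation
    across trials. Illustrative definition only; Eq 30 of the paper may differ."""
    m1, m2 = rates_choice1.mean(), rates_choice2.mean()
    v1, v2 = rates_choice1.var(), rates_choice2.var()
    return (m1 - m2) / np.sqrt(0.5 * (v1 + v2) + 1e-12)

def sort_units_by_selectivity(rates, choices):
    """rates: (trials, units) trial-averaged activity; choices: 1 or 2 per trial.
    Returns unit indices ordered from most choice-1-selective to most choice-2-selective."""
    idx = np.array([selectivity_index(rates[choices == 1, i], rates[choices == 2, i])
                    for i in range(rates.shape[1])])
    return np.argsort(-idx)
```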
Fig 4
Fig 4. Context-dependent integration task.
(A) Psychometric function, showing the percentage of trials on which the network chose choice 1 as a function of the signed motion (upper) and signed color (lower) coherence in motion-context (black) and color-context (blue) trials. (B) Average population responses in state space during the stimulus period, projected to the 3-dimensional subspace capturing variance due to choice, motion, and color as in [5]. Only correct trials were included. The task-related axes were obtained through a linear regression analysis. Note that “choice” here has a unit-specific meaning that depends on the preferred choice of the unit as determined by the selectivity index (Eq 30). For both motion (black) and color (blue), coherences increase from light to dark. Upper plots show trials during the motion context, and lower plots show trials during the color context. (C) Normalized responses of four recurrent units during the stimulus period show mixed representation of task variables. Solid lines indicate the preferred choice and dashed lines the nonpreferred choice of each unit. (D) Denoised regression coefficients from the linear regression analysis. By definition, the coefficients for choice are almost exclusively positive.
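A rough sketch, in the spirit of the regression-based projection of [5], of how task-related axes can be estimated from population activity and used to project responses into a low-dimensional subspace; the orthogonalization step and the omission of any denoising are simplifications relative to the published analysis, and all names are illustrative.

```python
import numpy as np

def task_axes(rates, regressors):
    """rates: (samples, units) population activity at selected time points.
    regressors: (samples, k) task variables (e.g., choice, motion coh., color coh.).
    Returns orthogonalized unit-space axes of shape (units, k)."""
    X = np.column_stack([regressors, np.ones(len(regressors))])   # add constant term
    beta, *_ = np.linalg.lstsq(X, rates, rcond=None)              # (k+1, units)
    axes = beta[:-1].T                                            # drop constant -> (units, k)
    Q, _ = np.linalg.qr(axes)                                     # orthogonalize the axes
    return Q

def project_population(rates, axes):
    """Project population activity onto the task-related subspace."""
    return rates @ axes
```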
Fig 5
Fig 5. Constraining the connectivity.
Connectivity after training for the context-dependent integration task (Fig 4), when the connection matrix is (A) unstructured and (B) structured. Both networks consist of 150 units (120 excitatory, 30 inhibitory). In B the units are divided into two equal-sized "areas," each with a local population of inhibitory units (I_S and I_M) that only projects to units in the same area. The "sensory" area (green) receives the excitatory inputs and sends dense, "long-range" excitatory feedforward connections from E_S to E_M in the "motor" area (orange), from which the outputs are read out. The sensory area receives sparse excitatory feedback projections from E_M to E_S.
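One way to encode the two-area architecture in panel B is as a fixed binary connectivity mask applied to the trainable weights; the sketch below is a hypothetical construction with illustrative population sizes and feedback sparsity, not the paper's exact specification.

```python
import numpy as np

def two_area_mask(n_es=60, n_is=15, n_em=60, n_im=15, p_feedback=0.05, rng=None):
    """Binary connectivity mask (rows = postsynaptic, cols = presynaptic) for two
    'areas': sensory (E_S, I_S) and motor (E_M, I_M). Inhibition is local to each
    area, E_S -> E_M feedforward connections are dense, and E_M -> E_S feedback
    connections are sparse. Sizes and sparsity are illustrative."""
    rng = rng or np.random.default_rng()
    n = n_es + n_is + n_em + n_im
    ES = slice(0, n_es)
    EM = slice(n_es + n_is, n_es + n_is + n_em)
    M = np.zeros((n, n))
    # Local all-to-all connectivity within each area (kept simple for illustration).
    for area in (slice(0, n_es + n_is), slice(n_es + n_is, n)):
        M[area, area] = 1.0
    M[EM, ES] = 1.0                                              # dense E_S -> E_M feedforward
    M[ES, EM] = (rng.random((n_es, n_em)) < p_feedback).astype(float)  # sparse E_M -> E_S feedback
    np.fill_diagonal(M, 0.0)                                     # no self-connections
    return M
```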
Fig 6
Fig 6. Multisensory integration task.
(A) Example inputs for visual only (left), auditory only (middle), and multisensory (both visual and auditory, right) trials. Network units receive both positively tuned (increasing function of event rate) and negatively tuned (decreasing function of event rate) inputs; panels here show positively tuned input corresponding to a rate of 13 events/sec, just above the discrimination boundary. As in the single-stimulus perceptual decision-making task, the outputs of the network were required to hold low during “fixation” (before stimulus onset), then the output corresponding to a high rate was required to hold high if the input was above the decision boundary and low otherwise, and vice versa for the output corresponding to a low rate. (B) Psychometric functions (percentage of choice high as a function of the event rate) for visual, auditory, and multisensory trials show multisensory enhancement. (C) Sorted activity on visual only and auditory only trials for three units selective for choice (high vs. low, left), modality (visual vs. auditory, middle), and both (right).
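The positively and negatively tuned inputs described above could be generated, for illustration, as increasing and decreasing functions of the event rate; the linear tuning and parameter values below are assumptions, not the paper's input model.

```python
import numpy as np

def tuned_inputs(event_rate, rate_min=8.0, rate_max=16.0, baseline=0.2, gain=1.0):
    """Positively and negatively tuned scalar input levels as increasing and
    decreasing functions of the event rate (events/sec). Linear form and
    parameter values are illustrative only."""
    x = (event_rate - rate_min) / (rate_max - rate_min)   # normalize rate to [0, 1]
    pos = baseline + gain * x
    neg = baseline + gain * (1.0 - x)
    return pos, neg

pos, neg = tuned_inputs(13.0)   # event rate just above the discrimination boundary
```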
Fig 7
Fig 7. Parametric working memory task.
(A) Sample positively tuned inputs, showing the case where f1 > f2 (upper) and f1 < f2 (lower). Recurrent units also receive corresponding negatively tuned inputs. (B) Percentage of correct responses for different combinations of f1 and f2. This plot also defines the colors used for each condition, labeled by f1, in the remainder of the figure. Due to the overlap in the values of f1, there are 7 distinct colors representing 10 trial conditions. (C) Lower: Correlation of the tuning a1 (see text) at different time points to the tuning in the middle of the first stimulus period (blue) and middle of the delay period (green). Upper: The tuning at the end of delay vs. middle of the first stimulus (left) and the end of delay vs. middle of the delay (right). (D) Single-unit activity for a unit that is positively tuned for f1 during both stimulus periods (left), and for a unit that is positively tuned during the first stimulus period but negatively tuned during the second stimulus period (right). (E) Proportion of significantly tuned units based on a simple linear regression of the firing rates as a function of f1 at each time point.
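A sketch of the tuning analysis in panels C and E: regress each unit's firing rate against f1 at every time point, then correlate the resulting population tuning vector with the tuning at a reference time. The function names and the simple least-squares regression are illustrative assumptions.

```python
import numpy as np

def f1_tuning(rates, f1):
    """Regress each unit's firing rate at one time point against f1.
    rates: (trials, units); f1: (trials,). Returns per-unit slopes a1 of shape (units,)."""
    X = np.column_stack([f1, np.ones_like(f1)])
    beta, *_ = np.linalg.lstsq(X, rates, rcond=None)
    return beta[0]                                       # slope with respect to f1

def tuning_correlation(rates_t, f1, t_ref):
    """Correlate the population tuning vector a1(t) at every time point with the
    tuning vector at a reference time t_ref. rates_t: (time, trials, units)."""
    a_ref = f1_tuning(rates_t[t_ref], f1)
    return np.array([np.corrcoef(f1_tuning(rates_t[t], f1), a_ref)[0, 1]
                     for t in range(rates_t.shape[0])])
```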
Fig 8
Fig 8. Eye-movement sequence execution task.
(A) Task structure (for Sequence 5) and (B) sample inputs to the network. During the intertrial interval (ITI) the network receives only the input indicating the current sequence to be executed. Fixation is indicated by the presence of a fixation input, which corresponds to the central one of 9 possible dot positions on the screen. During each movement, the current dot plus two possible target dots appear. (C) State-space trajectories during the three movements M1, M2, and M3 for each sequence, projected onto the first two principal components (PCs) (71% variance explained; note the different axis scales). The network was run with zero noise to obtain the plotted trajectories. The hierarchical organization of the sequence of movements is reflected in the splitting off of state-space trajectories. Note that all sequences start at fixation, i.e., dot 5 (black), and are clustered here into two groups depending on the first move in the sequence. (D) Example run in which the network continuously executes each of the 8 sequences once in a particular order; the network can execute the sequences in any order. Each sequence is separated by a 1-second ITI during which the eye position returns from the final dot of the previous trial to the central fixation dot. Upper: Eye position in "screen" coordinates. Lower: x- and y-positions of the network's outputs indicating a point on the screen. Note the continuity of the dynamics across trials.
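The state-space trajectories in panel C come from projecting noise-free activity onto the leading principal components; a generic NumPy/SVD sketch of such a projection is shown below (the reshaping convention and array layout are assumptions).

```python
import numpy as np

def project_onto_pcs(trajectories, n_components=2):
    """trajectories: (conditions, time, units) noise-free network activity.
    Pools all conditions and time points, then projects onto the leading PCs."""
    X = trajectories.reshape(-1, trajectories.shape[-1])
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)        # PCA via SVD
    explained = float((s**2)[:n_components].sum() / (s**2).sum())
    pcs = Vt[:n_components].T                                # (units, n_components)
    return Xc @ pcs, explained                               # projected activity, variance fraction
```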
Fig 9
Fig 9. Estimated performance during training for networks in the Results.
(A)-(I) Percentage of correct responses. (J) Error in eye position. For each network, the relevant figure in the main text and a brief description are given. Black lines are for the networks shown in the main text, while gray lines show the performance of 5 additional networks trained for the same tasks but with different initial weights. Red lines indicate the target performance; training terminated when the mean performance over several (usually 5) evaluations of the validation dataset exceeded the target. In I the target performance refers to the minimum, rather than the mean, percentage of correct responses across conditions. The number of recurrent units (green) is indicated for each network. The number of minutes (in "real time") needed for training (blue) is an estimate for a MacBook Pro running OS X Yosemite 10.10.4 with a 2.8 GHz Intel Core i7 CPU and 16 GB of 1600 MHz DDR3 memory. GPUs were not used to train these networks.
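The stopping rule described in the caption, training until the mean validation performance over several evaluations exceeds a target, can be sketched generically as follows; `train_step`, `evaluate`, the evaluation interval, and the target value are placeholders for the user's own setup.

```python
import numpy as np

def train_until_target(train_step, evaluate, target=0.85, window=5, max_iters=10000):
    """Keep training and periodically evaluating on validation trials; stop when the
    mean fraction correct over the last `window` evaluations exceeds the target.
    `train_step` and `evaluate` are user-supplied callables (placeholders here)."""
    history = []
    for it in range(max_iters):
        train_step()
        if it % 100 == 0:                      # evaluation interval (illustrative)
            history.append(evaluate())         # fraction of correct validation trials
            if len(history) >= window and np.mean(history[-window:]) >= target:
                return it, history
    return max_iters, history
```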


References

    1. Barak O, Tsodyks M, Romo R. Neuronal population coding of parametric working memory. J Neurosci. 2010;30:9424–9430. doi: 10.1523/JNEUROSCI.1875-10.2010
    2. Rigotti M, Rubin DBD, Wang XJ, Fusi S. Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Front Comput Neurosci. 2010;4:24. doi: 10.3389/fncom.2010.00024
    3. Rigotti M, Barak O, Warden MR, Wang XJ, Daw ND, Miller EK, et al. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497:585–590. doi: 10.1038/nature12160
    4. Yuste R. From the neuron doctrine to neural networks. Nat Rev Neurosci. 2015;16:487–497. doi: 10.1038/nrn3962
    5. Mante V, Sussillo D, Shenoy KV, Newsome WT. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature. 2013;503:78–84. doi: 10.1038/nature12742


Grants and funding

This work was supported by the Swartz Foundation, Office of Naval Research Grant N00014-13-1-0297, and a Google Computational Neuroscience Grant. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
