Pruning recurrent neural networks replicates adolescent changes in working memory and reinforcement learning
2022 May 31;119(22):e2121331119.
doi: 10.1073/pnas.2121331119. Epub 2022 May 27.

Bruno B Averbeck. Proc Natl Acad Sci U S A. 2022.

Abstract

Adolescent development is characterized by an improvement in multiple cognitive processes. While performance on cognitive operations improves during this period, the ability to learn new skills quickly, for example, a new language, decreases. During this time, there is substantial pruning of excitatory synapses in cortex and specifically in prefrontal cortex. We have trained a series of recurrent neural networks to solve a working memory task and a reinforcement learning (RL) task. Performance on both of these tasks is known to improve during adolescence. After training, we pruned the networks by removing weak synapses. Pruning was done incrementally, and the networks were retrained during pruning. We found that pruned networks trained on the working memory task were more resistant to distraction. The pruned RL networks were able to produce more accurate value estimates and also make optimal choices more consistently. Both results are consistent with developmental improvements on these tasks. Pruned networks, however, learned some, but not all, new problems more slowly. Thus, improvements in task performance can come at the cost of flexibility. Our results show that overproduction and subsequent pruning of synapses is a computationally advantageous approach to building a competent brain.

Keywords: neural network; pruning; reinforcement learning; working memory.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
DMS task and recurrent network. (A) In the DMS task, a cue is presented on either the left or right of a subject. After the cue, there is a delay, during which distractors might be presented. Following the delay, a second cue is presented, after which there is another delay. After the second delay, a cue is given for the subject to respond. If the two cues were on the same side, a match response should be given; if the two cues were on opposite sides, a nonmatch response should be given. In some simulations, probes were delivered during the delay interval. (B) The network has four inputs and one output. Between the input and output is a recurrent layer. Cue 1 (+1 = left, −1 = right) is presented on the first input. Cue 2 is presented on the second input. The third and fourth inputs indicate the start of the trial (1, 1), the cue and delay periods (0, 0), and the response period (0, 1). Note that inputs and outputs were fully connected to the recurrent layer and never pruned; only a subset of connections is shown for clarity. All pruning was done in the recurrent layer. (C and D) When cue 1 and cue 2 are on the same side, the network produces a match response [y(k)=1]. (E and F) When cue 1 and cue 2 are on opposite sides, the network produces a nonmatch response [y(k)=−1]. (G) Training sequence for pruned and unpruned networks. Networks were first trained on the task, yielding a starting network. This starting weight matrix was then either pruned (10%) or, for the unpruned network, copied forward, and retrained. We then pruned an additional 5% of the weights and retrained the network; in parallel, we also retrained the unpruned networks. Because each retraining round introduced new training examples with different noise realizations, both pruned and unpruned networks could be retrained. Thus, at each level of pruning, there was a pruned and an unpruned network that had seen the same number of training examples and started from the same weight matrix.
(H) Average final training loss for pruned and unpruned networks. Note that the x axis indicates prune fraction, but for unpruned networks, this indicates the number of equivalent training episodes since these networks were not pruned. (I) Number of CG training iterations when retraining pruned and unpruned networks.
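The incremental prune-and-retrain schedule in G can be sketched as magnitude pruning of the recurrent weight matrix. The following is a hypothetical NumPy sketch, not the paper's code; `prune_recurrent` and the schedule values are illustrative.

```python
import numpy as np

def prune_recurrent(A, frac):
    """Zero out the weakest fraction `frac` of recurrent weights by magnitude.

    Illustrative sketch of the pruning step in G: weak synapses (small
    |weight|) are removed; input and output connections are never pruned.
    Returns the pruned matrix and a boolean mask of surviving weights.
    """
    flat = np.abs(A).ravel()
    k = int(round(frac * flat.size))
    if k == 0:
        return A.copy(), np.ones(A.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest |weight|
    mask = np.abs(A) > threshold
    return A * mask, mask

# Incremental schedule as in G: prune 10% first, then 5% more each round,
# retraining (not shown) between rounds on fresh noisy examples.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
for frac in (0.10, 0.15, 0.20):
    A, mask = prune_recurrent(A, frac)
```

Because `frac` is a fraction of all recurrent weights, re-pruning the already-pruned matrix with a larger fraction implements the cumulative 10%, 15%, 20%, … schedule.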
Fig. 2.
Weight distribution and factorization of recurrent weight matrix, A. (A) Distribution of weights in the example pruned and unpruned network. (B) Normalized, cumulative spectrum of singular values of the recurrent matrix, A, for the pruned and unpruned network. The matrix A was factored as A = USV, and the singular values were sorted in S from largest to smallest. The plot shows the cumulative sum of the singular values, divided by the total sum of the singular values. (C–F) Example performance of pruned (70%) and unpruned network on example probed and unprobed trials. (C) Performance of unpruned network for condition 4 without a probe. The network generates the correct response. (D) Performance of same unpruned network following a probe. In this case the network gives the incorrect answer. (E) Performance of pruned network without probe. The network produces the correct output. (F) Same network as in E in a probe trial. The pruned network produces the correct output.
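The normalized cumulative spectrum in B follows directly from the SVD. A minimal sketch, assuming the recurrent matrix is available as a NumPy array:

```python
import numpy as np

def cumulative_sv_spectrum(A):
    """Normalized cumulative spectrum of singular values, as plotted in B.

    A is factored as A = U S V; np.linalg.svd returns the singular values
    already sorted from largest to smallest. The running sum is divided by
    the total sum, so the curve rises monotonically to 1.
    """
    s = np.linalg.svd(A, compute_uv=False)
    return np.cumsum(s) / s.sum()

# Toy example: a diagonal matrix whose singular values are 3, 2, 1.
spectrum = cumulative_sv_spectrum(np.diag([3.0, 2.0, 1.0]))
```

A faster rise toward 1 indicates a lower effective rank, i.e., variance concentrated in fewer singular values.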
Fig. 3.
Low-dimensional representation of recurrent dynamics in pruned and unpruned networks. Same networks as shown in Fig. 2. (A) Evolution of activity over time in the first PC for trials of condition 1 (i.e., left/left, match), condition 4 (i.e., right/left, nonmatch), and a probe trial in which the condition 4 trial was probed during the delay (i.e., right/left probe/left). Note that condition 1 would correspond to the correct response if the network were responding to the probe instead of the first cue in condition 4. The perturbed and unperturbed trajectories for condition 4 diverge around time point 30. The symbols below the x axis indicate the times at which the first cue (C1), probe (P), second cue (C2), and response (R) occurred. These times were the same across all conditions. (B) Projection of the same trials on the second PC. (C) Network output, y(k), in each trial. Note that the network produces the wrong response in the probe trial. (D–F) Data from the pruned network. (D) Evolution of activity projected on the first PC over time for the same conditions shown in A. Note that the trajectories diverge less, and only late in the trial. (E) Projection of activity on the second PC. Note that in this dimension there is also little divergence in the activity. (F) In this example the network gives the correct output.
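The low-dimensional trajectories shown here come from projecting recurrent activity onto its leading principal components. A minimal sketch, assuming activity is stored as a time × unit array:

```python
import numpy as np

def project_on_pcs(X, n_pc=2):
    """Project a time x unit activity matrix onto its top principal components.

    Sketch of the analysis behind the figure: activity is mean-centered
    across time, PC axes are found via SVD, and each time point is projected
    onto the leading `n_pc` axes, giving a time x n_pc trajectory.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_pc].T

rng = np.random.default_rng(0)
activity = rng.normal(size=(60, 30))    # 60 time steps, 30 recurrent units
trajectory = project_on_pcs(activity)   # 60 x 2: PC1 and PC2 over time
```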
Fig. 4.
Evolution of activity in first two PCs with vector field overlay. (A) Evolution of activity for condition 4 and the probed condition 4 in the unpruned network. S indicates the start of the time window over which the plot is being shown, which starts two time steps before the probe (P) is given. The vector field around the probe trajectory is also shown. The network activity was perturbed from the mean trajectory in a grid around the mean, indicated by the open circles, and the network update equation was applied to the perturbed activity. The perturbations were carried out by displacing activity off the mean trajectory on the 2D PC plane by a fixed amount. The displaced trajectories were then projected back to the full dimensional space, and the network updates were calculated. These updates were then projected back to the 2D PC space for illustration. The light blue line connects the perturbed point to the subsequent position of the network after one iteration. Note that when lines are parallel to the trajectory, perturbed activity does not return to the unperturbed trajectory. (B) Same as A for the pruned network. Note that the open circles are the same displacement in both coordinate systems in A and B. (C) The contraction metric characterizes the expansion or contraction of the perturbed points after one iteration. First, the average Euclidean distance of the points after one iteration of the network is calculated. This is the spread of the vectors at their end. This was then normalized by the initial spread (which was constant) to calculate the fraction of expansion or contraction. These values are consistent with the LEs calculated over the delay interval. (D) The spectrum of LEs for the pruned and unpruned example networks.
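The contraction metric in C can be sketched as follows. This is a hypothetical implementation; the `update` map, the grid of offsets, and the one-iteration spread follow the description in the caption:

```python
import numpy as np

def contraction_metric(update, x_mean, offsets):
    """Expansion/contraction of perturbed points after one iteration (panel C).

    Points displaced from the mean trajectory by `offsets` are advanced one
    step by the network `update` map; the spread of the advanced points is
    normalized by the (constant) initial spread. Values below 1 indicate
    contraction back toward the trajectory, values above 1 indicate expansion.
    """
    points = x_mean + offsets
    advanced = np.array([update(p) for p in points])
    spread0 = np.mean(np.linalg.norm(points - points.mean(axis=0), axis=1))
    spread1 = np.mean(np.linalg.norm(advanced - advanced.mean(axis=0), axis=1))
    return spread1 / spread0

# Example: a linear update with gain 0.5 halves every perturbation,
# so the metric evaluates to 0.5 (strong contraction).
offsets = np.vstack([np.eye(3), -np.eye(3)])
c = contraction_metric(lambda x: 0.5 * x, np.zeros(3), offsets)
```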
Fig. 5.
Performance of population of pruned and unpruned networks with probes of different strengths and times relative to cue 1. Positive outputs during response time were counted as match responses, and negative outputs were counted as nonmatch responses. Error bars are SEM (n = 400). Data in B, C, E, and F are for a prune fraction of 0.7, indicated in A by the arrow. Bars at the top of A–D indicate the range over which the pruned and unpruned networks differ statistically (P < 0.01). (A) Performance of pruned and unpruned networks as a function of pruning, in probe trials. A correct response corresponds to ignoring the probe. Values are averaged over probes delivered at all time points, all strengths (greater than 0, which is no probe), and all trial conditions. Note the unpruned networks were not pruned, but they were trained to criterion in parallel with the pruned networks. (B) Average fraction of correct responses for pruned and unpruned networks when probes were delivered at different times during the delay. Values shown are for a prune fraction of 0.7 and averaged across probe strength, excluding a strength of 0 (i.e., when no probe was delivered). Note the time period examined with probes is the delay interval, and the first cue was delivered at time points 13 and 14, just before the illustrated data. (C) Average fraction of correct responses as a function of strength of probe. Values shown are averaged across all times at which probes were delivered for networks with a prune fraction of 0.7. A strength of 0 indicates no probe. (D) Maximal LE for pruned and unpruned networks. (E) Average LE of networks that were correct across all probe trials, across conditions, the number of times indicated on the x axis. Note that networks usually failed by not producing the correct response, when probed, for one or more of the conditions, so performance fell into the corresponding bins. Data shown are for a prune fraction of 0.7. 
(F) Average distribution of weights for networks pruned to 0.7. Note the small bump near zero for the pruned networks is for small weight values that fall into the middle bin.
Fig. 6.
Weight distributions and example performance. (A) Distribution of weights in unpruned and pruned (70%) network. Note the distribution does not reach 0 at weights of 0 in this plot because of a binning effect. (B) Scaling of singular values extracted from the A matrix for trained unpruned and pruned network. (C–F) Example performance of unpruned and pruned networks on fixation and choice periods from RL task. (C) Network predictions and optimal Q values for an example block of trials for unpruned network, during fixation phase. y values indicate network output, and q values indicate Q values estimated with model. The four lines (0 to 3) indicate the choice options. (D) Same as C for choice phase. (E) Example network output and Q values for fixation period for pruned network. (F) Same as E for choice phase. (G) Fraction of variance explained by PCs extracted from the latent dynamics, x(k). (H) Example sequence of points in PC space for fixation and choice periods for unpruned network. Note the point clouds for the two epochs are not well separated. S indicates first trial, and plus indicates a rewarded choice. (I) Same as H for pruned network.
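The Q values that serve as the network's target in C–F come from a tabular RL rule. A minimal illustrative sketch follows, using a simple bandit-style delta rule with an assumed learning rate; this is not necessarily the paper's exact Q algorithm:

```python
import numpy as np

def q_update(q, choice, reward, alpha=0.1):
    """One delta-rule update of tabular Q values over the choice options.

    Illustrative sketch: the chosen option's value moves toward the received
    reward by a step of size alpha; unchosen options are left unchanged.
    """
    q = q.copy()
    q[choice] += alpha * (reward - q[choice])
    return q

q = np.zeros(4)   # four choice options, as in panel C
for _ in range(5):
    q = q_update(q, choice=2, reward=1.0)
# q[2] climbs toward 1 over repeated rewarded choices; the other entries
# remain 0, so the Q values track which option is currently best.
```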
Fig. 7.
Performance and accuracy for population of pruned and unpruned RL networks. Bars at the top of A–D and G indicate values for which pruned and unpruned networks differ significantly. A–G are shown for intermediate noise level (0.1). Error bars are SEM, n = 100. (A) Q-value prediction accuracy on new blocks of data. Note that this is not the training loss, because when networks were trained, the target function had added noise. This is the accuracy with which the network predicts the underlying, noise-free Q values. (B) Accuracy (fraction correct) with which networks predict the same choice as the Q algorithm on new blocks of data. (C) Average reward collected per block for pruned and unpruned networks. (D) Average maximum LE for pruned and unpruned networks. (E) Cumulative variance explained for pruned and unpruned networks at a prune fraction of 70%. (F) Distribution of connection strength for pruned and unpruned networks, averaged across all networks. Note that values at 0 are for small nonzero values that fall into the central bin. (G) Mahalanobis distance between centroids for recurrent activity for fixation vs. choice periods. (H) Scatterplot of fraction correct vs. maximal LE, across pruned and unpruned networks, trained at a noise level of 0.1. Fraction correct refers to consistency with Q-learning algorithm shown in B. (I) Same as H for noise level of 1.
