Abstract
Brain networks exist within the confines of resource limitations. As a result, a brain network must overcome the metabolic costs of growing and sustaining the network within its physical space, while simultaneously implementing its required information processing. Here, to observe the effect of these processes, we introduce the spatially embedded recurrent neural network (seRNN). seRNNs learn basic task-related inferences while existing within a three-dimensional Euclidean space, where the communication of constituent neurons is constrained by a sparse connectome. We find that seRNNs converge on structural and functional features that are also commonly found in primate cerebral cortices. Specifically, they converge on solving inferences using modular small-world networks, in which functionally similar units spatially configure themselves to utilize an energetically efficient mixed-selective code. Because these features emerge in unison, seRNNs reveal how many common structural and functional brain motifs are strongly intertwined and can be attributed to basic biological optimization processes. seRNNs incorporate biophysical constraints within a fully artificial system and can serve as a bridge between structural and functional research communities to move neuroscientific understanding forwards.
Main
As they develop, brain networks learn to achieve objectives, from simple functions such as autonomic regulation, to higher-order processes such as solving problems. Many stereotypical features of networks are downstream consequences of resolving challenges and trade-offs they face, across their lifetime1,2 and evolution3,4,5. One example is the optimization of functionality within resource constraints; all brain networks must overcome metabolic costs to grow and sustain the network in physical space, while simultaneously optimizing that network for information processing. This trade-off shapes all brains within and across species, meaning it could be why many brains converge on similar organizational solutions4. As such, the most basic features of both brain organization and network function—such as its sparse and small-world structure, functional modularity, and characteristic neuronal tuning curves—might arise because of this basic optimization problem.
Our understanding of how the brain’s structure and function interact largely comes from observing differences in brain structure, such as across individuals6 or following brain injury7, and then systematically linking these differences to brain function or behavioural outcomes. But how do these relationships between structure, function and behaviour emerge in the first place? To address this question, we need to be able to manipulate experimentally how neural networks form, as they learn to achieve behavioural objectives, to establish the causality of these relationships. Computational models allow us to do this8. They have shown that network modularity can arise through the spatial cost of growing a network9, how orthogonal population dynamics can arise purely through optimizing task performance10 and how predictive coding can arise through limiting a brain’s energy usage11. But we have yet to incorporate both the brain’s anatomy and the brain’s function into a single coherent model, allowing a network to dynamically trade-off its different structural, functional and behavioural objectives in real time.
To achieve this, we introduce spatially embedded recurrent neural networks (seRNNs). An seRNN is optimized to solve a task, making decisions to achieve functional goals. However, as it learns to achieve these goals and to optimize its behavioural performance, its constituent neurons face the kind of resource constraints experienced within biological networks. Neurons must balance their finite resources to grow or prune connections, while the cost of a connection is proportional to its length in three-dimensional (3D) Euclidean space12,13,14,15,16. At the same time, the network attempts to optimize its intra-network communication to allow for efficient propagation of signals17,18,19,20,21. By allowing seRNNs to dynamically manage both their structural and functional objectives simultaneously, while they learn to behave, multiple simple and complex hallmarks of biological brains naturally emerge.
Results
Spatially embedded recurrent neural networks
Our first goal was to create a supervised optimization process that subjects recurrent neural networks (RNNs; ‘RNN modelling’ in Methods) to the constraints of biophysical space while they are optimized for task performance. An established way of influencing a network’s weight matrix while it is optimized for task performance is regularization (Fig. 1a). In regularization, instead of merely optimizing a network’s weights to maximize task performance, one adds an additional regularization term to the optimizer to minimize the strength of a network’s weights. This is related to regularized regression, such as L1 (LASSO) regression, where the sum of the absolute beta weights is minimized to improve a model’s out-of-sample prediction performance. We use the same idea to spatially embed an RNN. We start with fully connected RNNs and while they are trained to maximize task performance, we nudge them to minimize weights that are long in 3D space. To achieve this, we assign every unit in the RNN’s recurrent layer a location in 3D space (Fig. 1b) and regularize a weight more strongly if it belongs to two units that are far apart in Euclidean space. In this pruning process, we also want the network to optimize within-network communication, meaning a weight should be more readily pruned if it does not contribute strongly to the propagation of signals within the network. A standard measure of signal propagation in a (binary) network is communicability, reflecting the shortest routes between all pairs of nodes22 (Fig. 1c; see details in ‘Communicability’ in Supplementary Information). When adapted for a weighted network (weighted communicability19), the communicability value of a network is low when there are strong global core connections supporting short paths across the network while avoiding redundant peripheral connections to achieve sparsity (Fig. 1d). In Supplementary Information (‘Minimizing redundant connectivity by minimizing weighted communicability’), we provide information on how weighted communicability differentially optimizes peripheral and core connection strengths. By combining the spatial distance and weighted communicability terms in an RNN’s regularization while it learns to solve a task, we arrive at seRNNs (Fig. 1e). We provide a detailed walkthrough of the regularization function in ‘seRNN regularization function’ in Methods. While learning to solve a task, seRNNs are nudged to prefer short core weights over long peripheral weights.
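For concreteness, weighted communicability can be computed directly from a weight matrix. The following is a minimal NumPy/SciPy sketch of the Crofts and Higham formulation cited above; the function and variable names are ours, for illustration only.

import numpy as np
from scipy.linalg import expm

def weighted_communicability(W: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Weighted communicability C = exp(S^-1/2 |W| S^-1/2) of a weight matrix W."""
    A = np.abs(W)                              # use absolute connection strengths
    strength = A.sum(axis=1) + eps             # node strengths (eps avoids division by zero)
    S_inv_sqrt = np.diag(1.0 / np.sqrt(strength))
    return expm(S_inv_sqrt @ A @ S_inv_sqrt)   # matrix exponential sums walks of all lengths

A larger entry C[i, j] indicates that units i and j can communicate more easily through direct and indirect paths, which is the quantity the regularizer described above seeks to preserve while pruning redundant peripheral weights.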
To understand how this spatial embedding impacts a network’s structure and function, we set up 2,000 RNNs. Half of the networks were seRNNs trained with the new optimization process described above. The other half were regular RNNs regularized with a standard L1 regularizer minimizing the sum of the absolute weights, to arrive at a population of baseline networks that match seRNNs in overall connectivity strength. In both cases, the regularizer was applied to the hidden recurrent layer of the network and the regularization strength was systematically varied within each subgroup of networks to cover a wide spectrum of regularization strength that is matched across subgroups (Fig. 1e and ‘Regularization strength set-up and network selection’ in Methods). All networks had 100 units in their hidden layer and were trained for 10 epochs. All networks started strongly connected and learned through pruning weights in accordance with their regularization. We trained networks on a one-choice inference task that required networks to develop two fundamental cognitive functions of recurrent networks: remembering task information (‘goal’) and integrating it with new incoming information (‘choices’) (Fig. 1f and ‘Task paradigm’ in Methods).
When training the networks, we found that both types of network manage to learn the task with high accuracy (Fig. 2a). Focusing on networks that successfully solve the task (>90% task accuracy; n = 390 for seRNNs, n = 479 for L1s; see ‘Regularization strength set-up and network selection’ in Methods for discussion of network numbers), we first validated that our optimization process is working. By using L1 networks as a baseline, we observed that both groups decreased in their average connectivity strength (Fig. 2b) but that only seRNNs did so by pruning long-distance connections (Fig. 2c). This is commonly found in empirical brain networks across species and scales23. In addition, we validated that seRNNs successfully focus their pruning process on weights that are less important for the network’s communicative structure, as represented by lower weighted communicability (Fig. 2d). Figure 2e shows an example visualization of one seRNN.
Having shown that the new regularization function in seRNNs has the expected effects on the weight matrix of networks, we next tested which features result from the spatial embedding. Specifically, we tested whether seRNNs show features commonly observed in primate cerebral cortices, including structural motifs such as modularity24,25,26 and small-worldness27,28, before testing for functional clustering of units in space27,28. We then go beyond structural and functional organization and test whether spatial embedding forces networks to implement an energy-efficient mixed-selective code29,30. In short, we wanted to test whether established organization properties of complex brain networks arise when we impose local biophysical constraints.
Modular small-world networks emerge from constraints
We first investigated two key topological characteristics that are commonly found in empirical brain networks across spatial scales and are proposed to facilitate brain function: modularity24,25,26 and small-worldness27,28. Modularity denotes dense intrinsic connectivity within a module but sparse, weak extrinsic connections between modules; small-worldness indicates a short average path length between all node pairs combined with high local clustering.
Computing modularity Q statistics and small-worldness (‘Topological analysis’ in Methods) shows that seRNNs consistently show both increased modularity (Fig. 3a) and small-worldness (Fig. 3b) relative to L1 networks over the course of training. Differences are smaller initially, but later in training, the effect size for differences in modularity is large (at epoch 9, modularity P = 2.24 × 10−82, Cohen’s d = 1.07; Fig. 3a, right) and for small-worldness moderate to large (P = 2.82 × 10−19, Cohen’s d = 0.59; Fig. 3b, right). seRNNs achieve modularity Q statistics within ranges commonly found in empirical human cortical networks31. Both L1 and seRNNs achieve the technical definition of small-worldness of >1 (ref. 32), but seRNNs show a higher value more consistent with empirical networks33. ‘Replication across architectures’ in Supplementary Information shows how the subparts of the regularization interact with the task optimization to shape these structural effects. It is important to note that within the population of seRNNs, we find varying degrees of modularity and small-worldness (Fig. 3a, right, and Fig. 3b, right). We will return to this variability in a later section.
To further validate the structural likeness of seRNNs to empirical neural connectivity, we used generative network models9,34,35,36. These models elucidate which topological wiring rules can accurately approximate observed neural graphs. Corroborating empirical macro- and microscopic data15,35, we find that homophily wiring rules—where neurons preferentially form connections to other neurons that are self-similar in their connectivity profiles—perform best in approximating the topology of seRNNs relative to all other wiring rules (Fig. 3c and additional detail in ‘Generative network modelling of RNNs’ in Supplementary Information).
Functionally related units spatially organize in seRNNs
So far, we have explored how imposing biophysical constraints within seRNNs produces structures that mimic observed networks. However, this ignores the functional roles of neurons or their patterning within the network. We next examined this by exploring the configuration of functionally related neurons in 3D space (Fig. 4a). In brain networks, neurons sharing a tuning profile to a stimulus tend to spatially group37,38. This can be observed on fine-grained cortical surfaces with preferences for stimuli features39 (Fig. 4b) and in whole-brain functional connectivity forming modular network patterns40 (Fig. 4c). In addition, high-resolution recordings in rodents show how the brain keeps many codes localized but also distributes some across the network41. To test whether seRNNs recapitulate functional co-localization, we decoded how much variance of unit activity can be explained by the goal location or choice options, over the course of each trial (‘Decoding’ in Methods). In Fig. 4d, we show a visualization in a representative network and unit-specific preferences over the course of a single trial.
By taking the relative preference for goal versus choice for each unit, we tested whether the relative sensitivity to stimuli was concentrated in parts of the network. We used a spatial permutation test (‘Spatial permutation test’ in Methods) to test whether the Euclidean distance between highly ‘goal’ or ‘choice’ selective neurons was significantly less or more than would be expected by chance. A small Pperm value highlights that functionally similar neurons tend to be significantly clustered in space whereas a large Pperm corresponds to functionally similar neurons being distributed in space (Fig. 4e, top).
We tested for functional co-localization across three time windows of the trial (the total duration of a trial was 50 steps; Fig. 1e): (1) early stage (goal presented, steps 15–20); (2) middle stage (choice options presented, steps 30–35) and (3) late stage (decision point, steps 45–50). At the early stage, when only goal information is presented, neurons code for only the goal information (widespread dark green nodes in Fig. 4d, left). In seRNNs, there is a slight positive skew in Pperm values, suggesting clustering of highly goal-coding neurons (Fig. 4e, middle left). Subsequently, in the middle stage, when choice options are first shown, goal information clusters within a concentrated area of space, leaving the choice information distributed (seen by clustering of green nodes and distribution of brown nodes in Fig. 4d, middle). This is highlighted by a large positive skew in Pperm values for the goal in seRNN networks (Fig. 4e, middle top) and correspondingly the opposite for choice information (Fig. 4e, middle bottom). In the late stage, the clustering of goal information in space dissipates such that by the time a decision must be made, the goal information has now spread out more but still retains some clustering (Fig. 4e, middle right). The choice code remains distributed (Fig. 4e, bottom right). This suggests that seRNNs use their highly modular structure to keep a connected core holding goal information, which needs to be retained across the trial, and that this core is formed from spatially proximal units. The presented choice information is then represented by units outside this core and is dynamically integrated with the information in the core as the decision point approaches. These findings are unique to seRNNs, as L1 Pperm values remain uniform, indicative of no functional organization. The control analysis in Supplementary Fig. 12 shows these findings hold true when variables are treated independently instead of relatively.
Mixed selectivity and energy-efficient coding
So far, we have shown that adding spatial constraints to a network gives rise to patterns of network connectivity that are highly reminiscent of observed biological networks. Nodes functionally co-localize and the spatial embedding causes differences in how they code task-relevant information. This selectivity profile has been widely studied. Studies show that neurons in task-positive brain regions tend to show a mixed selectivity profile, meaning that neurons do not code for a single task variable alone but instead for a mixture of them30,42,43,44. A mixed-selective code is assumed to allow networks to solve complex tasks by increasing the decodability of information from the network’s neurons29,45. There are many ways to quantify selectivity profiles46. One simple method is to calculate the correlation of explained variances of task variables across the population of neurons. These are expected to be uncorrelated, implying a neutrally mixed code where a neuron’s coding preference for one variable does not predict its code for another variable. In single-unit recordings, correlations can be close to zero or sometimes slightly positive47.
We looked at the correlation of selectivities of trained networks (epoch 9) for the goal and choices variables. At the time in the trial when networks make a choice, the median correlation is r = −0.057 for seRNN but r = −0.303 for L1, showing that L1 networks produce an anticorrelated code while seRNNs have a more mixed-selective code (Fig. 5a). It is possible that this effect is driven by the differential connectome structure of the two groups of networks. While a modular and separated network would not automatically mix codes across variables evenly, we find a well-mixed code in seRNNs. The additional highly communicative connections between modules, characteristic of small-world networks, might help seRNNs to organize units in space while retaining a mixed code across the population. ‘Mixed selectivity’ in Supplementary Information shows how networks specifically show a mixed-selective code at the time when the decision is made. As with our structural results, we saw that there is variation across the population of networks (Fig. 5a), where some networks fall neatly on r = 0 and others might show correlated codes. The following section provides an analysis of this variance.
The choice of a neuronal code in populations of neurons is strongly linked to the question of energy demand. As the firing of action potentials uses a substantial amount of energy48, a population of neurons should choose a code with a good trade-off of metabolic cost and information capacity29. To test our networks’ energy consumption, we calculated the mean activation of each unit in a network’s recurrent layer (epoch 9) during the period of information integration (after onset of choices). Then we tested for the difference between seRNNs and L1 networks, controlling for the effect of the average weight strength in the recurrent layer (Fig. 5b). Across most weight strengths, seRNNs showed significantly lower unit activations compared with L1 networks (P < 0.001, t(86,497) = 21.4, 95% confidence interval = [−0.271, −0.226]). Sustaining a mixed-selective code at the time of choice might help downstream integration units to decode information more easily, with fewer unit activations needed to communicate the correct choice. This effect disappears for networks with higher average weights, which have weak regularization and hence weaker spatial embedding.
Constraints cause linked brain-like structure and function
So far, we have seen that seRNNs show a collection of features that are commonly observed in brains but have not previously been related. The caveat not addressed so far is that for any feature we observed in seRNNs, we also see strong variation across the population of networks (for example, Fig. 3b for modularity or Fig. 5a for mixed selectivity). This opens the possibility that these features do not arise in parallel in seRNNs but instead each feature could emerge in its unique subgroup of networks. This would be unlike biological brains, which exist in a critical sweet-spot area49 where all the features described in this paper are observed. In this section, we tested whether all seRNN features co-appear in a similar subset of trained networks, defined by a unique combination of training parameters.
To study the co-occurrence of brain features in seRNNs, we looked at the distribution of feature magnitude across the space of training parameters (regularization strength, number of training epochs passed). Figure 6a shows matrix plots for accuracy (left), total sum of weights (middle left), modularity (middle right) and small-worldness (right) across the entire spectrum of training epochs (x axis) and regularization strengths (y axis). As before, there is variation in the magnitude of features across the population of networks, but now we also see that this variation is structured. Brain-like topology emerges in a sweet-spot of low to medium strength regularization and during the later training epochs (pink box). The schematic in Fig. 6b highlights this space of sparse, highly accurate, modular small-world networks with an example network showing all properties (Fig. 6b, middle right). Above this space (that is, networks with less regularization, highlighted in orange) networks can solve the task and show small-worldness, but remain very dense and lack the modular organization found in empirical brain networks. Below this space (that is, networks with more regularization, highlighted in light blue) networks show extreme sparsity and modularity, but fail to functionally converge on the task and they lose their small-world topology.
Next, we wanted to look at the same ‘sweet spot’ in terms of the network’s functional properties. As the decoding required us to focus this analysis on networks with high task performance (‘Decoding’ in Methods), we use networks with an accuracy >90% at epoch 9. Figure 6c shows the functional results across regularization strengths, highlighting the sweet spot of regularization from Fig. 6a with the pink box. In the first two plots from the top, we show two structural metrics (sparsity and short connection preference). We observed the same distribution when looking at the homophily generative wiring rule (Supplementary Fig. 11b). Looking at mixed selectivity (Fig. 6c, third from top), our analyses revealed that networks show a mixed-selective code at the decision point in the sweet-spot window identified before. Units here show a balanced code with information for both goal and choices (Fig. 6c, bottom), whereas very dense or sparse networks show a preference for either goal or choices information. As such, the density and related modular small-world structure influence the time horizon of information flowing through the network. Dense networks show greater focus on past information, which resonates with how functional networks reconfigure to support memory50. Supplementary Fig. 14 shows a correlation matrix of the pairwise relationships between the features studied here.
Our findings show that there is a critical parameter window in which both structural and functional brain features jointly emerge in seRNNs. Brains are often said to live in a unique but critical niche where all characteristics needed to support their function can exist in parallel51. seRNNs show the same preference for a critical parameter window but also give us the ability to study networks on their way to converging on brain-like characteristics in this critical window.
Discussion
Functioning brains have key organizational features endowing them with computational capacities to perform a broad range of cognitive operations efficiently and flexibly. These include sparse connectivity with a modular small-world structure25,27,52, generatable via homophilic wiring rules34,35,36, with spatially configured functional units that implement a mixed-selective code30,45 and minimize energy expenditure29,48. We argue that these complex hallmarks can be, at least in part, attributed to three forces impacting virtually any brain network: optimization of functional performance in a (task) environment, metabolic/structural costs of the network and signal communication within the network. In this work we have shown that seRNNs allow us to manipulate these optimization goals, demonstrating that seemingly unrelated neuroscientific findings can emerge in unison and appear to have a strong co-dependence. We believe that these findings also have an impact on how we think about the interlinked structural and functional optimization processes in the brain under economic constraints (‘Network economics in structural and functional neuroscience models’ in Supplementary Discussion). Our model provides an important tool to continue the work on jointly studying structure and function in neuroscience models53,54,55,56,57. In addition, our results are relevant for developments on the intersection of neuroscience and artificial intelligence (NeuroAI58) (‘Implications of seRNN findings on artificial intelligence’ in Supplementary Discussion).
There are many areas that we wish to improve on with future research. Principally, our models did not include a substantial amount of biological detail that, while inevitably critical for neuronal functioning, does not speak to the observations we aimed to recapitulate in the present study. Implementing such details, including molecular mechanisms guiding circuit development59 or heterogeneous spiking of neurons60, will probably provide insights into the trade-offs specific to biological brains. The addition of such details will help us expand the applicability of our models to explore the effect of developmental time courses61,62, functional brain specialization63 and how network variability may underpin individual differences64. Beyond these biological details, it will be important to see how different functional goals would have differential effects on structural optimization processes. The simple working memory task used here provides a first realistic cognitive challenge, but it will be interesting to consider seRNNs in continuous choice multi-task environments. Finally, it is unknown what effect the inclusion of biophysical constraints has on the randomness of network structure, although we speculate it would generate less-random network structures, compared with regular task-optimized networks.
The development of seRNNs allowed us to observe the impact of optimizing task control, structural cost and network communication in a model system that can dynamically trade off its structural and functional objectives. This suggests that providing artificial neural networks with a topophysical structure65,66 can enhance our ability to directly link computational models of neural structure and function. We believe that the modelling approach shown to work in seRNNs will speed up innovations in neuroscience by allowing us to systematically study the relationships between features that have all been individually discussed as being of high importance to the brain.
Methods
seRNN regularization function
In a canonical supervised RNN, all the network’s trainable parameters are optimized to minimize the difference between the predicted value and correct value. To achieve this, we define a task loss function (L), which quantifies the prediction error to be minimized to optimize task performance. To produce a network that generalizes well to unseen data, we can add a regularization term. Regularization incentivizes networks to converge on sparse solutions and is commonly applied to neural networks in general67 and neuroscientific network models68,69. For a regularized network, the loss function becomes a combination of both the task loss and the regularization loss. One example of a commonly applied regularization is the L1 regularization, which is also used in LASSO regression70 and incentivizes the network to maximize task performance while concurrently minimizing the sum of all absolute weights in the neural network. If we want to regularize the recurrent weight matrix (W) with the dimensions m × m, where m is the number of units in the recurrent layer, the loss function would be:
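(A reconstruction consistent with the description above: the task loss L plus the γ-weighted sum of the absolute recurrent weights.)

\[
L_{\mathrm{total}} = L + \gamma \sum_{i=1}^{m}\sum_{j=1}^{m} \left|w_{i,j}\right|
\]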
An RNN with this loss function would learn to solve the task with a sparse weight matrix (𝑤𝑖,𝑗), where γ would determine the extent to which the network is forced to converge on a sparse solution. This parameter is called the regularization strength.
Unlike regular RNNs, real brain networks are embedded in a physical space12,13,14. To simulate the pressures caused by existing in a biophysical space, we manipulated the regularization term. We hypothesized that by incorporating constraints that appear common to any biological neural system, we could test whether these local constraints are sufficient to drive a network architecture that more closely resembles observed brain networks. Specifically, we included spatial constraints in two forms—Euclidean and network communication—that we argue are integral to any realistic neural network. To implement this, we first embed units within a 3D space, such that each unit has a corresponding x, y and z coordinate. Using these coordinates, we can generate a Euclidean distance matrix that describes the physical distance between each pair of nodes (Fig. 1b). This allows us to minimize weights multiplied by their Euclidean distance (di,j), thereby incentivizing the network to minimize (costly) long-distance connections. The element-wise matrix multiplication is denoted with the Hadamard product ⊙. Adding this to our optimization term gives us:
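(A reconstruction consistent with the description above, written with the Euclidean distance matrix D.)

\[
L_{\mathrm{total}} = L + \gamma \sum_{i=1}^{m}\sum_{j=1}^{m} \left(\left|W\right| \odot D\right)_{i,j} = L + \gamma \sum_{i=1}^{m}\sum_{j=1}^{m} \left|w_{i,j}\right| d_{i,j}
\]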
The above formalization provides a spatial context for RNN training. In a next step, we want to follow the same approach to incentivize networks to preferably prune weights that are not strongly contributing to the within-network communication structure. We can impose this influence of communication via a weighted communicability term19,22, which computes the extent to which, under a particular network topology, any two nodes are likely to communicate both directly and indirectly over time (Fig. 1c). Now taking this topological communication into account, we get the following loss function:
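(A reconstruction consistent with the description above. The weighted communicability matrix C is written here in the form introduced by Crofts and Higham19, with S the diagonal matrix of node strengths defined in the next paragraph.)

\[
L_{\mathrm{total}} = L + \gamma \sum_{i=1}^{m}\sum_{j=1}^{m} \left(\left|W\right| \odot D \odot C\right)_{i,j}, \qquad C = e^{\,S^{-1/2}\left|W\right|S^{-1/2}}
\]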
Supplementary Figs. 1–5 provide a walkthrough explanation of how this term works and expand on the logic of how constraining the network’s topology can serve as a prior for intra-network communication in sparse networks. Supplementary Fig. 6 specifically highlights the role that communicability has within the network optimization process. Note that in the weighted communicability term C, S is a diagonal matrix with the degree of unit i (degi) on the diagonal (that is, the node strength), which simply acts as a normalization term preventing any one single edge having undue influence19. This is explained in Supplementary Figs. 4 and 5.
Importantly, as all terms (W, D, C) are element-wise multiplied within the regularization term, they are all minimized as part of the training process. Note, it is possible, in principle, to parameterize each part of the above equation to vary the extent to which each term influences network outcomes. However, in this work, we focus on establishing the role of all three terms in tandem. Future work could explore models with greater parameterization to establish optimal relative magnitudes.
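As an illustration of how this combined penalty can be computed from the recurrent weight matrix, the unit coordinates and the regularization strength, the following is a minimal PyTorch-style sketch; the function names are ours, and the released CodeOcean/GitHub implementation may differ in detail (for example, in whether the communicability matrix is treated as a constant during backpropagation).

import torch

def euclidean_matrix(coords: torch.Tensor) -> torch.Tensor:
    """Pairwise Euclidean distances between the 3D unit coordinates (m x 3 -> m x m)."""
    return torch.cdist(coords, coords)

def se_penalty(W: torch.Tensor, D: torch.Tensor, gamma: float) -> torch.Tensor:
    """gamma * sum(|W| * D * C), with C the weighted communicability of |W|."""
    A = W.abs()
    s = A.sum(dim=1) + 1e-8                          # node strengths (small constant avoids division by zero)
    S_inv_sqrt = torch.diag(s.rsqrt())
    C = torch.matrix_exp(S_inv_sqrt @ A @ S_inv_sqrt)  # weighted communicability matrix
    return gamma * (A * D * C).sum()

# In training, the penalty is simply added to the task loss for the recurrent weights:
# loss = task_loss + se_penalty(W_hidden, euclidean_matrix(coords), gamma)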
Task paradigm
The task that networks are presented with is a one-choice inference task requiring networks to remember and integrate information (Fig. 1f). On an abstract level, networks needed to first store a stimulus, integrate it with a second stimulus and make a predefined correct choice. More specifically, networks first observe stimulus A for 20 time steps, followed by a delay for 10 time steps, followed by stimulus B for 20 steps. Agents must then make one choice. This set-up can be interpreted as a one-step navigation task, where agents are presented with the goal location (stimulus A) followed by possible choice directions (stimulus B). The choice to be made is the one moving closer to the goal. Extended Data Table 1 outlines all possible trials and defines whether the given trial is included in the regular version of the task used in the main text.
All stimuli are one-hot encoded with a vector of eight binary digits. The first four define the goal locations and only one of the four digits would be set to one during the goal presentation. The second four binary digits each stand in for one allowed choice direction and two choice directions would be set to one during the choice options presentation. Gaussian noise with a standard deviation of 0.05 is added to all inputs.
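For illustration, one input step of this encoding could be constructed as follows (a NumPy sketch with our own function and variable names; the released trial-generation code may differ).

from typing import Optional, Tuple
import numpy as np

def encode_step(goal: Optional[int], choices: Optional[Tuple[int, int]],
                rng: np.random.Generator) -> np.ndarray:
    x = np.zeros(8)
    if goal is not None:
        x[goal] = 1.0                          # one of the first four digits marks the goal
    if choices is not None:
        for c in choices:
            x[4 + c] = 1.0                     # two of the last four digits mark allowed choices
    return x + rng.normal(0.0, 0.05, size=8)   # additive Gaussian input noise (s.d. 0.05)

rng = np.random.default_rng(0)
goal_input = encode_step(goal=2, choices=None, rng=rng)         # goal-presentation steps
choice_input = encode_step(goal=None, choices=(0, 3), rng=rng)  # choice-options steps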
This task design is a simplified version of a multi-step maze navigation task we have recorded in macaques. A harder version of the task with an extended set of trials is equivalent to the first choice monkeys face in their version of the task. We use the full set of trials for a control calculation in Supplementary Fig. 8. After this first choice, the monkeys then continue the task with a further step to reach the goal and collect the reward. As the goal of this study was to establish the emerging features of seRNNs, here we focus just on the first choice and leave questions relating to the multi-step task to future investigations.
RNN modelling
All recurrent neural networks in this project have 100 units in the hidden layer and are defined by the same basic set of equations:
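(A reconstruction consistent with the dimensions listed below; the recurrent nonlinearity f is not restated in the text and is left generic here, tanh being the standard Elman formulation.)

\[
h_{t} = f\!\left(x_{t}W_{x} + h_{t-1}W_{h} + b_{h}\right), \qquad \widehat{y}_{t} = \sigma\!\left(h_{t}W_{y} + b_{y}\right)
\]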
Here xt is the input vector at time t (1 × 8), Wx is the input layer weight matrix (8 × 100) (Xavier initialization), ht−1 is the activation of hidden layer at time t − 1 (1 × 100) (zeros initialization), Wh is the hidden layer weight matrix (100 × 100) (orthogonal initialization), bh is the bias of hidden layer (1 × 100) (zeros initialization), ht is the activation of hidden layer at time t (1 × 100) (zeros initialization), Wy is the output layer weight matrix (100 × 8) (Xavier initialization), by is the bias of network output (1 × 8) (zeros initialization), σ is the softmax activation function and ŷt is the network output/prediction.
Networks differ in terms of which regularization was applied to their hidden layer and with which regularization strength. Networks are optimized to minimize a cross entropy loss on task performance combined with the regularization penalty using the Adam optimizer (hyperparameters: learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1 × 10−7) for 10 epochs. Note that the network’s choice is only read out once, at the very end of the trial. Each epoch consists of 5,120 problems, batched in blocks of 128 problems.
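The stated configuration corresponds to a training loop along the following lines (a hedged PyTorch-style sketch; make_batches is a hypothetical data iterator, se_penalty and the distance matrix D refer to the sketch in the regularization section above, and the released implementation may differ).

import torch
from torch import nn

rnn = nn.RNN(input_size=8, hidden_size=100, batch_first=True)   # stand-in recurrent layer
readout = nn.Linear(100, 8)
params = list(rnn.parameters()) + list(readout.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-7)
criterion = nn.CrossEntropyLoss()            # applies the softmax internally to the read-out logits

for epoch in range(10):                      # 10 epochs
    for x, y in make_batches(n_problems=5120, batch_size=128):   # hypothetical iterator
        h_seq, _ = rnn(x)                    # x: (batch, 50 steps, 8)
        logits = readout(h_seq[:, -1, :])    # choice read out only at the final time step
        # D: (100, 100) distance matrix and gamma: regularization strength, as in the earlier sketch
        loss = criterion(logits, y) + se_penalty(rnn.weight_hh_l0, D, gamma)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()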
Regularization strength set-up and network selection
The most critical parameter choice in our analyses is the regularization strength. As shown across analyses (for example, Fig. 6), the strength of the regularization has a major influence on all metrics analysed here. While the L1 regularization and the purely Euclidean regularization could be matched by average strength of regularization of the hidden layer, the communicability term of seRNNs makes this challenging due to it being dependent on the current state of the hidden layer and hence changing throughout training. To match the spectrum of regularization strengths in L1 and seRNNs, we used a functional approach. As performance in the task starts to break down as networks become too sparse to effectively remember past stimuli, we matched regularization strength using task performance before looking at any of the other structural or functional metrics. Specifically, we set the regularization spectrum on a linear scale and chose the boundary values so that task performance started to deteriorate half-way through the set of networks (so around the 500th network for the sets of 1,000 networks).
To make both groups comparable, we focus our analyses on networks that achieve >90% task accuracy. For the L1 networks, these were 47.9% of all trained networks and for seRNN networks 39%. Note that this difference in percentages is not meaningful per se and could be eliminated by matching the regularization spectra of both groups more closely. As we focus our analyses on highly functional networks with high task accuracy, matching the regularization spectra of both groups would not have influenced the results. The code repository has an overview file with regularization strengths chosen for different network types. We hope that future implementations of the seRNNs can provide a method for more precise numerical matching between regularization strengths.
Topological analysis
Graph theory network statistics were calculated using the Brain Connectivity Toolbox71, and the mathematical formalisms are provided below. All network statistics were calculated on the hidden RNN weight matrix and all edge weights were taken as the absolute value of the corresponding element of the weight matrix. When the measure in question was binary (for example, small-worldness) a proportional threshold was applied, taking the top 10% of these absolute connections.
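A simple NumPy sketch of this proportional thresholding step (our own illustration; the Brain Connectivity Toolbox implementation handles symmetry and tied weights more carefully):

import numpy as np

def binarize_top_proportion(W: np.ndarray, proportion: float = 0.10) -> np.ndarray:
    """Keep only the strongest `proportion` of absolute connections as a binary graph."""
    A = np.abs(W.copy())
    np.fill_diagonal(A, 0.0)                             # ignore self-connections
    n_keep = int(np.round(proportion * A.size))
    cutoff = np.sort(A, axis=None)[::-1][n_keep - 1]     # weight of the n_keep-th strongest edge
    return (A >= cutoff).astype(int)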
Modularity
The modularity statistic, Q, quantifies the extent to which the network can be subdivided into clearly delineated groups:
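(In the notation defined below, the standard Newman-style formulation used by the Brain Connectivity Toolbox, reconstructed here.)

\[
Q = \frac{1}{l}\sum_{i,j \in N}\left[a_{i,j} - \frac{k_{i}k_{j}}{l}\right]\delta_{m_{i}m_{j}}
\]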
where l is the number of connections, N is the total number of nodes, ai,j is the connection status between nodes i and j (ai,j = 1 when i and j are connected and ai,j = 0 otherwise), ki and kj are the total numbers of connections (degrees) of nodes i and j, mi is the module containing node i, and δmi,mj = 1 if mi = mj, and 0 otherwise. In this work, we tested the modularity using the default resolution parameter of 1.
Small-worldness
Small-worldness refers to a graph property where most nodes are not neighbours of one another, but the neighbours of nodes are likely to be neighbours of each other. This means that most nodes can be reached from every other node in a small number of steps. It is given by:
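(Standard form, reconstructed from the definitions that follow.)

\[
\sigma = \frac{c/c_{\mathrm{rand}}}{l/l_{\mathrm{rand}}}
\]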
where c and crand are the clustering coefficients, and l and lrand are the characteristic path lengths of the respective tested network and a random network with the same size and density of the empirical network. Networks are generally considered as small-world networks at σ > 1. In our work, we computed the random network as the mean statistic across a distribution of n = 1,000 random networks. The characteristic path length is given by:
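(Standard definition, reconstructed; here di,j denotes the shortest path length between nodes i and j, not the Euclidean distance used earlier.)

\[
l = \frac{1}{N(N-1)}\sum_{i \in N}\sum_{j \in N,\, j \neq i} d_{i,j}
\]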
Generative network modelling
We use a technique called generative network modelling to investigate whether the connectome of networks can be recreated by unsupervised wiring rules. The idea is to start from an empty network and probabilistically add connections based on simple wiring equations. The wiring equations are based on the topological structure of the existing network. We follow the approach outlined in refs. 15,35. We provide an overview of this approach in ‘Generative network modelling of RNNs’ in Supplementary Information.
Decoding
To analyse the internal function of our trained recurrent neural networks, we record the hidden state activity of every unit while the network solves a set of 640 trials. Each trial is constituted of 50 steps (as shown in Fig. 1e). For decoding, the activity is averaged in step windows of 5, so that there is a total of 10 time windows. In animal electrophysiology, researchers often look at the explained variance per task variable per unit. To allow for comparison of our networks with findings in the literature, we wanted to extract the same metric. Given the nature of our task, the variables used to predict unit activity (goal, choice options, correct choice) are highly correlated, so that the standard decoding with analysis of variance (ANOVA) would give biased results. Instead, we used a decoding algorithm based on L1 regression, as follows.
(1) Apply cross-validated L1 regression with k-fold cross-validation (5 folds) to set the alpha term with the best cross-validation performance.

(2) Split the dataset via repeated k-fold (3 folds, 2 repeats).

(3) On each (train, test) dataset:

(a) Train L1 regression with the pre-set alpha term.

(b) Calculate explained variance in the test dataset including all predictor variables.

(c) Iteratively set all values of a given set of predictors (for example, all goal predictors) to 0, recalculate the explained variance and calculate the drop of explained variance per predictor group.

(d) Take the mean of the drop of explained variance for each group across splits of the dataset.
This algorithm results in every unit in every network being assigned an explained variance number for every task variable. Note that the decoding cannot reliably work in networks that make too many errors, so that we functionally analyse only networks with a task performance of 90% or above.
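The following scikit-learn sketch illustrates this variance-drop procedure for a single unit and time window; the variable names (X, y, groups) and the exact grouping of predictors are ours, and the released code may differ.

import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import r2_score
from sklearn.model_selection import RepeatedKFold

def variance_drop(X, y, groups, seed=0):
    """X: (trials, predictors), y: (trials,) unit activity, groups: {name: column indices}."""
    alpha = LassoCV(cv=5, random_state=seed).fit(X, y).alpha_      # step 1: pick alpha by 5-fold CV
    drops = {name: [] for name in groups}
    for train, test in RepeatedKFold(n_splits=3, n_repeats=2, random_state=seed).split(X):
        model = Lasso(alpha=alpha).fit(X[train], y[train])         # step 3a: refit with pre-set alpha
        full = r2_score(y[test], model.predict(X[test]))           # step 3b: full explained variance
        for name, cols in groups.items():                          # step 3c: zero one predictor group
            X_zeroed = X[test].copy()
            X_zeroed[:, cols] = 0.0
            drops[name].append(full - r2_score(y[test], model.predict(X_zeroed)))
    return {name: float(np.mean(v)) for name, v in drops.items()}  # step 3d: mean drop per group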
Spatial permutation test
To examine the spatial clustering of decoded task information of neuronal ensembles within the RNNs, we constructed a simple spatial permutation test as follows.
(1) Considering a single RNN hidden layer at a particular task time window (note, explained variances change over the course of the task), compute the relative preference for goal versus choice explained variance for each unit. This is calculated as the goal explained variance minus the choice explained variance.

(2) Between all n ‘goal’ units (that is, positive difference from step 1), compute the Euclidean distance weighted by the decoding for goal information. This, therefore, captures the spatial proximity between goal units weighted by the magnitude of their ‘goal’ information. Average this matrix to compute a summary statistic. This is the observed statistic.

(3) Repeat this procedure 1,000 times, but for a random set of n units taken from the 3D grid space. These 1,000 summary statistics constitute the null distribution.

(4) Compute a permuted P value (Pperm), which is simply the location in which the observed statistic (step 2) sits within the null distribution (step 3), normalized to the range [0, 1]. This value corresponds to how clustered or distributed the observed goal decoding information is in space relative to random chance. A small Pperm means that information is clustered more than chance and vice versa.

(5) Do steps 1–4, but between all ‘choices’ units (that is, negative difference from step 1).

(6) Redo steps 1–5 for all desired time windows that have been decoded. In the current work, we calculated Pperm values for time window 3, time window 6 and time window 9 to reflect different aspects of the task over its sequence.
The above steps were done for all functional RNNs (>90% accuracy) for L1 and seRNNs. We presented distributions of these Pperm values for goals and choices to highlight how goal and choices information is clustered, distributed or random at key points in the sequence of the task. To ensure that we did not bias our findings, we further computed a slight variation of the above statistical test, which allows us to assess the clustering of coding information independently (that is, without computing relative goal versus choice coding, as in step 1 above). As cluster size was now not determined by the direction of coding (as it was previously), we instead used the 50 units with the highest variance-explained values for a given variable. This was selected because this approximately mirrors the cluster sizes achieved in the primary functional clustering analysis. Mirroring the permutation testing approach, we calculated Pperm by ranking the mean Euclidean distance between these units (top 50% coding neurons) in a null distribution of Euclidean distance between 1,000 permuted samples of 50 units. This was done for goal and choice options (to assess replication). This test is advantageous in that it allows for testing variables independently, but disadvantageous in that it does not directly incorporate the coding magnitude into the test statistics. These findings are given in Supplementary Fig. 12.
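To make the procedure concrete, the following NumPy sketch implements one reading of the weighted permutation test described above (steps 2–4); the exact weighting of the distance matrix and the handling of the decoding weights in the null samples are our assumptions, not taken from the released code.

import numpy as np

def weighted_mean_distance(coords, weights):
    """Mean pairwise Euclidean distance, weighted by the units' decoding magnitudes."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    w = np.outer(weights, weights)
    mask = ~np.eye(len(coords), dtype=bool)               # exclude self-distances
    return float((dist * w)[mask].sum() / w[mask].sum())

def spatial_perm_test(coords_all, idx, ev, n_perm=1000, seed=0):
    """coords_all: (n_units, 3) positions; idx: indices of the tested units; ev: their decoding values."""
    rng = np.random.default_rng(seed)
    observed = weighted_mean_distance(coords_all[idx], ev)          # step 2: observed statistic
    null = np.array([
        weighted_mean_distance(
            coords_all[rng.choice(len(coords_all), size=len(idx), replace=False)], ev)
        for _ in range(n_perm)                                       # step 3: random unit sets
    ])
    return float((null < observed).mean())                           # step 4: rank in null gives P_perm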
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
No unique data were used in the production of this paper. Reference values were extracted from the respective cited reference. All data shown in the figures are based on simulations, which are described in the ‘Code availability’ section. The data files generated with simulations that underlie the figures are available in the CodeOcean capsule belonging to this paper (https://doi.org/10.24433/CO.3539394.v2)72.
Code availability
We provide detailed walkthroughs for the training of our recurrent neural networks alongside all the code used to create the plots in this paper on CodeOcean (https://doi.org/10.24433/CO.3539394.v2)72. We provide additional example implementations of seRNNs on GitHub. As new implementations of seRNNs become available, we will add them to this paper’s GitHub repository alongside the implementation used for this project. The GitHub repository is https://github.com/8erberg/spatially-embedded-RNN.
References
Fair, D. A. et al. Functional brain networks develop from a ‘local to distributed’ organization. PLoS Comput. Biol. 5, e1000381 (2009).
Kaiser, M. Mechanisms of connectome development. Trends Cogn. Sci. 21, 703–717 (2017).
Bosman, C. & Aboitiz, F. Functional constraints in the evolution of brain circuits. Front. Neurosci. 9, 303 (2015).
van den Heuvel, M. P., Bullmore, E. T. & Sporns, O. Comparative connectomics. Trends Cogn. Sci. 20, 345–361 (2016).
Hiratani, N. & Latham, P. E. Developmental and evolutionary constraints on olfactory circuit selection. Proc. Natl Acad. Sci. USA 119, e2100600119 (2022).
Mišić, B. et al. Network-level structure–function relationships in human neocortex. Cereb. Cortex 26, 3285–3296 (2016).
Smith, V. et al. Fluid intelligence and naturalistic task impairments after focal brain lesions. Cortex 146, 106–115 (2022).
Doerig, A. et al. The neuroconnectionist research programme. Nat. Rev. Neurosci. https://doi.org/10.1038/s41583-023-00705-w (2023).
Kaiser, M. & Hilgetag, C. C. Modelling the development of cortical systems networks. Neurocomputing 58–60, 297–302 (2004).
Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
Ali, A., Ahmad, N., De Groot, E., Johannes Van Gerven, M. A. & Kietzmann, T. C. Predictive coding is a consequence of energy efficiency in recurrent neural networks. Patterns 3, 100639 (2022).
Barthélemy, M. Spatial networks. Phys. Rep. 499, 1–101 (2011).
Bassett, D. S. & Stiso, J. Spatial brain networks. C. R. Phys. 19, 253–264 (2018).
Bullmore, E. & Sporns, O. The economy of brain network organization. Nat. Rev. Neurosci. 13, 336–349 (2012).
Akarca, D. et al. Homophilic wiring principles underpin neuronal network topology in vitro. Preprint at bioRxiv https://doi.org/10.1101/2022.03.09.483605 (2022).
Song, H. F., Kennedy, H. & Wang, X.-J. Spatial embedding of structural similarity in the cerebral cortex. Proc. Natl Acad. Sci. USA 111, 16580–16585 (2014).
Avena-Koenigsberger, A., Misic, B. & Sporns, O. Communication dynamics in complex brain networks. Nat. Rev. Neurosci. 19, 17–33 (2018).
Laughlin, S. B. & Sejnowski, T. J. Communication in neuronal networks. Science 301, 1870–1874 (2003).
Crofts, J. J. & Higham, D. J. A weighted communicability measure applied to complex brain networks. J. R. Soc. Interface 6, 411–414 (2009).
Griffa, A. et al. The evolution of information transmission in mammalian brain networks. Preprint at bioRxiv https://doi.org/10.1101/2022.05.09.491115 (2022).
Seguin, C., Mansour L, S., Sporns, O., Zalesky, A. & Calamante, F. Network communication models narrow the gap between the modular organization of structural and functional brain networks. NeuroImage 257, 119323 (2022).
Estrada, E. & Hatano, N. Communicability in complex networks. Phys. Rev. E 77, 036111 (2008).
Betzel, R. F., Medaglia, J. D. & Bassett, D. S. Diversity of meso-scale architecture in human and non-human connectomes. Nat. Commun. 9, 346 (2018).
Bertolero, M. A., Yeo, B. T. T. & D’Esposito, M. The modular and integrative functional architecture of the human brain. Proc. Natl Acad. Sci. USA 112, E6798–E6807 (2015).
Park, H.-J. & Friston, K. Structural and functional brain networks: from connections to cognition. Science 342, 1238411 (2013).
Sporns, O. & Betzel, R. F. Modular brain networks. Annu. Rev. Psychol. 67, 613–640 (2016).
Bassett, D. S. & Bullmore, E. T. Small-world brain networks revisited. Neuroscientist 23, 499–516 (2017).
Sporns, O. & Zwi, J. D. The small world of the cerebral cortex. Neuroinformatics 2, 145–162 (2004).
Johnston, W. J., Palmer, S. E. & Freedman, D. J. Nonlinear mixed selectivity supports reliable neural computation. PLoS Comput. Biol. 16, e1007544 (2020).
Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).
Hilger, K., Ekman, M., Fiebach, C. J. & Basten, U. Intelligence is associated with the modular structure of intrinsic brain networks. Sci. Rep. 7, 16088 (2017).
Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
Bassett, D. S. & Sporns, O. Network neuroscience. Nat. Neurosci. 20, 353–364 (2017).
Betzel, R. F. et al. Generative models of the human connectome. NeuroImage 124, 1054–1064 (2016).
Akarca, D., Vértes, P. E., Bullmore, E. T. & Astle, D. E. A generative network model of neurodevelopmental diversity in structural brain organization. Nat. Commun. 12, 4216 (2021).
Vértes, P. E. et al. Simple models of human brain functional networks. Proc. Natl Acad. Sci. USA 109, 5868–5873 (2012).
Kanwisher, N. Functional specificity in the human brain: a window into the functional architecture of the mind. Proc. Natl Acad. Sci. USA 107, 11163–11170 (2010).
Thompson, W. H. & Fransson, P. Spatial confluence of psychological and anatomical network constructs in the human brain revealed by a mass meta-analysis of fMRI activation. Sci. Rep. 7, 44259 (2017).
Waskom, M. L. & Wagner, A. D. Distributed representation of context by intrinsic subnetworks in prefrontal cortex. Proc. Natl Acad. Sci. USA 114, 2030–2035 (2017).
Ji, J. L. et al. Mapping the human brain’s cortical-subcortical functional network organization. NeuroImage 185, 35–57 (2019).
Steinmetz, N. A., Zatka-Haas, P., Carandini, M. & Harris, K. D. Distributed coding of choice, action and engagement across the mouse brain. Nature 576, 266–273 (2019).
Wallach, A., Melanson, A., Longtin, A. & Maler, L. Mixed selectivity coding of sensory and motor social signals in the thalamus of a weakly electric fish. Curr. Biol. 32, 51–63.e3 (2022).
Hirokawa, J., Vaughan, A., Masset, P., Ott, T. & Kepecs, A. Frontal cortex neuron types categorically encode single decision variables. Nature 576, 446–451 (2019).
Whittington, J. C. R., Dorrell, W., Ganguli, S. & Behrens, T. E. J. Disentanglement with biological constraints: a theory of functional cell types. In The Eleventh International Conference on Learning Representations (ICLR, 2023).
Fusi, S., Miller, E. K. & Rigotti, M. Why neurons mix: high dimensionality for higher cognition. Curr. Opin. Neurobiol. 37, 66–74 (2016).
Bernardi, S. et al. The geometry of abstraction in the hippocampus and prefrontal cortex. Cell 183, 954–967.e21 (2020).
Erez, Y. et al. Integrated neural dynamics for behavioural decisions and attentional competition in the prefrontal cortex. Eur. J. Neurosci. 56, 4393–4410 (2022).
Attwell, D. & Laughlin, S. B. An energy budget for signaling in the grey matter of the brain. J. Cereb. Blood Flow Metab. 21, 1133–1145 (2001).
Beggs, J. M. The criticality hypothesis: how local cortical networks might optimize information processing. Phil. Trans. R. Soc. Math. Phys. Eng. Sci. 366, 329–343 (2008).
Cohen, J. R. & D’Esposito, M. The segregation and integration of distinct brain networks and their relationship to cognition. J. Neurosci. 36, 12083–12094 (2016).
O’Byrne, J. & Jerbi, K. How critical is brain criticality? Trends Neurosci. 45, 820–837 (2022).
Yan, C. & He, Y. Driving and driven architectures of directed small-world human brain functional networks. PLoS ONE 6, e23460 (2011).
Suárez, L. E., Richards, B. A., Lajoie, G. & Misic, B. Learning function from structure in neuromorphic networks. Nat. Mach. Intell. 3, 771–786 (2021).
Lindsay, G. W., Rigotti, M., Warden, M. R., Miller, E. K. & Fusi, S. Hebbian learning in a random network captures selectivity properties of the prefrontal cortex. J. Neurosci. 37, 11021–11036 (2017).
Finzi, D., Margalit, E., Kay, K., Yamins, D. L. & Grill-Spector, K. Topographic DCNNs trained on a single self-supervised task capture the functional organization of cortex into visual processing streams. In NeurIPS 2022 Workshop SVRHM (2022); https://openreview.net/forum?id=E1iY-d13smd
Damicelli, F., Hilgetag, C. C. & Goulas, A. Brain connectivity meets reservoir computing. PLOS Comput. Biol. 18, e1010639 (2022).
Goulas, A., Damicelli, F. & Hilgetag, C. C. Bio-instantiated recurrent neural networks: integrating neurobiology-based network topology in artificial networks. Neural Netw. 142, 608–618 (2021).
Zador, A. et al. Catalyzing next-generation artificial intelligence through NeuroAI. Nat. Commun. 14, 1597 (2023).
Moons, L. & De Groef, L. Molecular mechanisms of neural circuit development and regeneration. Int. J. Mol. Sci. 22, 4593 (2021).
Perez-Nieves, N., Leung, V. C. H., Dragotti, P. L. & Goodman, D. F. M. Neural heterogeneity promotes robust learning. Nat. Commun. 12, 5791 (2021).
Baxter, R. A. & Levy, W. B. Constructing multilayered neural networks with sparse, data-driven connectivity using biologically-inspired, complementary, homeostatic mechanisms. Neural Netw. 122, 68–93 (2020).
Chechik, G., Meilijson, I. & Ruppin, E. Synaptic pruning in development: a computational account. Neural Comput. 10, 1759–1777 (1998).
Johnson, M. H. Interactive specialization: a domain-general framework for human functional brain development? Dev. Cogn. Neurosci. 1, 7–21 (2011).
Siugzdaite, R., Bathelt, J., Holmes, J. & Astle, D. E. Transdiagnostic brain mapping in developmental disorders. Curr. Biol. 30, 1245–1257.e4 (2020).
Bassett, D. S. et al. Efficient physical embedding of topologically complex information processing networks in brains and computer circuits. PLoS Comput. Biol. 6, e1000748 (2010).
Sperry, M. M., Telesford, Q. K., Klimm, F. & Bassett, D. S. Rentian scaling for the measurement of optimal embedding of complex networks into physical space. J. Complex Netw. 5, 199–218 (2017).
Hardt, M. & Recht, B. Patterns, Predictions, and Actions (Princeton Univ. Press, 2022).
Kietzmann, T. C. et al. Recurrence is required to capture the representational dynamics of the human visual system. Proc. Natl Acad. Sci. USA 116, 21854–21863 (2019).
Yang, G. R., Joglekar, M. R., Song, H. F., Newsome, W. T. & Wang, X.-J. Task representations in neural networks trained to perform many cognitive tasks. Nat. Neurosci. 22, 297–306 (2019).
Tibshirani, R. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
Rubinov, M. & Sporns, O. Complex network measures of brain connectivity: uses and interpretations. NeuroImage 52, 1059–1069 (2010).
Achterberg, J., Akarca, D., Strouse, D. J., Duncan, J. & Astle, D. E. Capsule for Achterberg & Akarca, et al: Spatially-embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings. CodeOcean https://doi.org/10.24433/CO.3539394.v2 (2023).
Acknowledgements
We thank M. Botvinick for helpful comments and input throughout the development of this project. We thank M. Dillinger for detailed comments on the paper. J.A. receives a Gates Cambridge Scholarship. D.A. receives a Cambridge Trust Vice Chancellor’s Scholarship. D.J.S. is funded by Google DeepMind. The following grants supported this work: UKRI MRC funding (MC_UU_00030/7 and MC-A0606-5PQ41): J.A., D.A., D.E.A. and J.D. James S. McDonnell Foundation Opportunity Award and the Templeton World Charity Foundation, Inc. (funder DOI 501100011730) under the grant TWCF-2022-30510: D.A. and D.E.A. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) license to the text, figures and code relating to this paper.
Author information
Contributions
J.A., D.A., D.J.S., J.D. and D.E.A. were involved in the conceptualization and methodological developments of the study. J.A. and D.A. conducted the analyses. J.A., D.A., D.J.S., J.D. and D.E.A. interpreted the results. J.A. and D.A. wrote the paper in consultation with J.D. and D.E.A.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Bratislav Misic, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–14 and Discussion.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Achterberg, J., Akarca, D., Strouse, D.J. et al. Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings. Nat Mach Intell 5, 1369–1381 (2023). https://doi.org/10.1038/s42256-023-00748-9