1 Introduction

Fluid intelligence (FI) refers to the ability to reason and to solve new problems independently of previously acquired knowledge [1]. Many studies have examined the correlations of brain anatomical structures with human intelligence [1,2,3]. Deciphering the neural mechanisms underlying human intelligence is important for understanding neurocognitive development and has fundamental implications for education and clinical work. For example, different brain regions and anatomical differences have been associated with different learning abilities. In children, anatomical differences in the hippocampus and associated learning and memory regions predicted differences in math skill acquisition [4]. In adults, video game skills were linked to striatal volumes [5], and foreign language skills were linked to white matter of the left insula/prefrontal cortex and inferior parietal cortices [6]. Better characterization of brain biomarkers for different cognitive functions may facilitate the development of targeted training and intervention programs for both typically developing children and those with developmental disabilities.

The Adolescent Brain Cognitive Development (ABCD) Study is the largest long-term study of brain development and child health in the United States [7]. The ABCD Consortium invited researchers to participate in its Neurocognitive Prediction (NP) Challenge, in which contestants were asked to use structural MRI data acquired from the study to predict fluid intelligence scores. For the NP Challenge, data from 4154 subjects were provided to participants for training (3739 samples) and validation (415 samples). Data from 4515 additional subjects were reserved as the test set. This report describes the results of our machine learning analyses of these data.

2 Methods

2.1 Dataset and Features

Fluid intelligence (FI) scores were pre-residualized by the Challenge organizer to remove the effects of total brain volume and sociodemographic variables including data collection site, age at baseline, sex at birth, race/ethnicity, highest parental education, parental income, and parental marital status. FI scores from the training and validation sets were provided to contestants during the Challenge. The Challenge organizer also provided 122 regional brain volumes for all samples in the training, validation and test sets. Briefly, the skull-stripped MRI images were affinely aligned to the SRI 24 atlas [9] and segmented into regions of interest according to the atlas using standardized processing pipelines in the ABCD Study (for details, see [10]).

We performed a factor analysis on the training set. Thirty-six principal factors accounted for 100% of the variance, and the top 18 factors accounted for 80% of the total variance. Varimax-rotated factor scores were generated for the validation and test sets based on the training set factors. In addition to the MRI volumetric scores and factor scores, we included age and sex as features because they are universally available information. All input features were scaled based on the training set’s minimum and maximum values.
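The scaling step can be illustrated with a minimal sketch. It assumes a standard min-max transform to the [0, 1] range, which is one natural reading of "scaled based on the training set's minimum and maximum values"; the function names and the synthetic arrays are hypothetical and are not taken from the original analysis. The key point is that the statistics come from the training set only and are then reused for the validation and test sets.

```python
import numpy as np


def fit_min_max(train):
    """Compute per-feature minimum and range from the training set only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant features (assumption)
    return lo, span


def apply_min_max(x, lo, span):
    """Scale any data set with the statistics learned from the training set."""
    return (x - lo) / span


# Synthetic stand-ins for the feature matrices (hypothetical data).
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 5))
valid = rng.normal(size=(20, 5))

lo, span = fit_min_max(train)
train_scaled = apply_min_max(train, lo, span)
valid_scaled = apply_min_max(valid, lo, span)
```

With this scheme the training features lie exactly in [0, 1], while validation or test values may fall slightly outside that range when they exceed the training extremes.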

2.2 Statistical Machine Learning Methods

Eight different algorithms were investigated using Scikit-Learn’s grid search tool for hyper-parameter optimization: random forest regressor (RFR), stochastic gradient descent regressor (SGD), Lasso linear model with least angle regression (LassoLar), elastic net (EN), multilayer perceptron regressor (MLP), Ridge regression (Ridge), support vector regression (SVR) and Nu support vector regression (NuSVR) with linear, poly or sigmoid kernel. The hyper-parameters available for tuning with the grid search tool are described in the Scikit-Learn online documentation (https://scikit-learn.org). During grid search, models were optimized using the R2 coefficient of the validation samples. R2 is defined as follows:

$$ {\text{R}}^{2} = 1 - \frac{u}{v} $$

where u is the residual sum of squares

$$ u = \sum\nolimits_{i = 1}^{n} {\left( {y_{true,i} - y_{pred,i} } \right)}^{2} $$

and v is the total sum of squares

$$ v = \sum\nolimits_{i = 1}^{n} {\left( {y_{true,i} - \frac{1}{n}\sum\nolimits_{j = 1}^{n} {y_{true,j} } } \right)^{2} } $$

A perfect prediction will have a score of 1.
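As a minimal sketch, the definition above can be computed directly with NumPy. The function name `r2_score` and the toy arrays are illustrative assumptions. The example also shows that predicting the mean gives a score of 0 and that predictions worse than the mean give a negative score, which is why negative validation scores can occur.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """R2 = 1 - u / v, with u the residual and v the total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = np.sum((y_true - y_pred) ** 2)
    v = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - u / v


y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score(y, y)                            # exact predictions
baseline = r2_score(y, np.full_like(y, y.mean()))   # always predict the mean
reversed_order = r2_score(y, y[::-1])               # worse than the mean
```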

We compared the training and validation R2 scores for different models using different input features. The best model was the one with the highest validation score. Among competing candidates, we preferred the one with the lowest training score that was still as good as or better than its validation score. We used this second criterion to reduce overfitting.
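The search and selection procedure can be sketched as follows. This is an illustration under stated assumptions, not the original pipeline: it loops over Scikit-Learn's `ParameterGrid` and scores each candidate on a fixed validation set, the grid values and synthetic data are hypothetical, and `select_model` encodes only one possible reading of the selection rule (keep candidates whose training R2 is at least their validation R2, take the highest validation R2, and break ties by the lowest training R2).

```python
import numpy as np
from sklearn.model_selection import ParameterGrid
from sklearn.svm import NuSVR


def grid_search_fixed_validation(X_tr, y_tr, X_va, y_va, grid):
    """Fit one NuSVR per hyperparameter setting and record both R2 scores."""
    results = []
    for params in ParameterGrid(grid):
        model = NuSVR(**params).fit(X_tr, y_tr)
        results.append({
            "params": params,
            "train_r2": model.score(X_tr, y_tr),  # score() returns R2
            "valid_r2": model.score(X_va, y_va),
        })
    return results


def select_model(results):
    """One interpretation of the selection rule (assumption, see lead-in)."""
    eligible = [r for r in results if r["train_r2"] >= r["valid_r2"]]
    pool = eligible or results  # fall back if no candidate qualifies
    return max(pool, key=lambda r: (r["valid_r2"], -r["train_r2"]))


# Synthetic data standing in for the scaled features (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))
y = X[:, 0] + 0.5 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

grid = {"nu": [0.2, 0.5], "C": [1, 10], "gamma": [0.1, 1.0], "kernel": ["rbf"]}
results = grid_search_fixed_validation(X_tr, y_tr, X_va, y_va, grid)
best = select_model(results)
```

Each entry in `results` corresponds to one dot in a plot of training against validation scores, and `best["params"]` holds the selected setting.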

Predictions of FI scores were obtained by fitting the best model to the training data and computing the predicted values (ypred). The predicted results were also evaluated with the mean squared error (MSE, requested by the Challenge organizer, see below) and the correlation coefficient with the true FI values (ytrue):

$$ MSE = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left( {y_{pred,i} - y_{true,i} } \right)}^{2} $$
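Both evaluation measures are simple to compute. The sketch below assumes the correlation coefficient is Pearson's r, which the text does not state explicitly; the function names and toy values are illustrative only.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between predictions and true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted values (assumption)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 1.5, 3.5, 3.5])

error = mse(y_true, y_pred)       # every residual is 0.5, so MSE is 0.25
corr = pearson_r(y_true, y_pred)  # 4 / sqrt(20), about 0.894
```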

2.3 Learning Curve Analyses

Learning curves were generated by fitting the model to successive deciles of the total training sample. For each decile of sample size, we plotted the training and validation R2. We used learning curve analysis to evaluate the model’s bias and variance, as well as the effect of sample size. This helps to draw inferences about how models might be improved in the future [11].
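A learning curve of this kind can be sketched as below. The assumptions are that the training rows are already in random order, that the training R2 is computed on the same subset used for fitting, and that the validation set stays fixed; the helper name `learning_curve_deciles` and the synthetic data are hypothetical. The NuSVR settings are the ones reported as best in the Results section.

```python
import numpy as np
from sklearn.svm import NuSVR


def learning_curve_deciles(make_model, X_tr, y_tr, X_va, y_va):
    """Fit on the first 10%, 20%, ..., 100% of the training rows."""
    n = len(y_tr)
    rows = []
    for k in range(1, 11):
        m = max(2, (n * k) // 10)          # subset size for this decile
        model = make_model().fit(X_tr[:m], y_tr[:m])
        train_r2 = model.score(X_tr[:m], y_tr[:m])
        valid_r2 = model.score(X_va, y_va)  # fixed validation set
        rows.append((m, train_r2, valid_r2))
    return rows


# Synthetic data standing in for the scaled features (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 5))
y = X[:, 0] + 0.5 * rng.normal(size=400)
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]


def make_model():
    # Hyperparameters as reported for the best model in this report.
    return NuSVR(nu=0.2, C=10, gamma=1, kernel="rbf")


curve = learning_curve_deciles(make_model, X_tr, y_tr, X_va, y_va)
sizes = [m for m, _, _ in curve]
```

Plotting the second and third entries of each row against the first gives the training and validation curves. A persistent gap between them at the largest sample size points to variance (overfitting) rather than bias.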

3 Results

3.1 Grid Search Results

The grid searches of hyperparameter spaces for eight models using three different feature sets returned 24 sets of training and validation R2 scores (Fig. 1). Each dot within each of the 24 sets represents one specific model with a unique combination of hyperparameter settings. NuSVR had the highest validation scores (highest R2 = 0.046). RFR ranked second, but its scores were considerably lower (highest R2 = 0.029). None of the other models yielded predictions with R2 above 0.02. Thus, we only considered the NuSVR model further. For NuSVR, the best feature set used the 36 principal factor scores as inputs. The best hyperparameters were: gamma = 1, nu = 0.2, C = 10, kernel = ‘rbf’.

Fig. 1. Training scores plotted against validation scores (R2) for all eight models using three types of input features. Validation scores below −0.02 were discarded. sMRI, structural MRI.

3.2 Prediction Results

Using the best NuSVR model, we predicted the FI scores for the training and the validation sets. Predicted scores were plotted against the actual FI scores (Fig. 2). The scores were significantly correlated (r = 0.54 for the training samples and 0.21 for the validation samples; p < 0.0001 for both). The mean squared errors (MSE) for the training and validation samples were 68.2 and 68.6, respectively. Removing age and sex as predictors from the model yielded a slightly higher MSE (70.5), but the predictions were not significantly different from the original predictions (F(1, 17314) = 0.72; p = 0.4).

Fig. 2. Predicted fluid intelligence (FI) scores plotted against the actual FI scores for the training and validation sets.

3.3 Learning Curve

The learning curve plots the training and validation R2 for 10 incremental sample sizes (at 10% increments) up to the total number of samples. In an ideal model, the training and validation scores should gradually converge as the sample size increases. We found, however, that our training and validation scores did not fully converge, suggesting that the model has some degree of overfitting at the current sample size and that a larger training sample will be needed to improve accuracy. Having more predictive features or using the current features more efficiently could also increase accuracy (Fig. 3).

Fig. 3. Learning curve of the NuSVR model.

4 Discussion

Using the structural MRI data provided by the ABCD NP Challenge organizer, we developed statistical machine learning algorithms to predict fluid intelligence scores. We identified NuSVR as the best prediction model and found that using 36 principal factors yielded the highest prediction accuracy. The predicted FI scores were significantly, albeit modestly, correlated with the actual scores. Our results show the promise of using structural MRI data to predict fluid intelligence and support prior findings of strong anatomical correlations of brain structures with human intelligence. However, we also found that the current sample size is not adequate and that more training samples will likely help to improve the model’s predictions.

In addition to the sample size limitation, we note that we did not use the T1-weighted MRI images that were also provided for the Challenge. Methods such as convolutional neural networks may be able to extract useful features from the three-dimensional MRI images and thereby improve prediction accuracy.