Abstract
In voxel-based neuroimage analysis, lesion features have been the main focus of disease prediction due to their interpretability with respect to the related diseases. However, we observe that there exists another type of feature, introduced during the preprocessing steps, which we call the “Procedural Bias”. Moreover, such bias can be leveraged to improve classification accuracy. Nevertheless, most existing models either underfit by ignoring the procedural bias or lose interpretability by failing to differentiate such bias from lesion features. In this paper, a novel dual-task algorithm, namely GSplit LBI, is proposed to resolve this problem. By introducing an augmented variable with enforced structural sparsity through a variable splitting term, the estimators for prediction and for selecting lesion features can be optimized separately while mutually monitoring each other following an iterative scheme. Empirical experiments are conducted on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The advantage of the proposed model is verified by the improved stability of the selected lesion features and better classification results.
Keywords
- Voxel-based structural magnetic resonance imaging
- Procedural bias
- Split Linearized Bregman Iteration
- Feature selection
1 Introduction
Usually, the first step of voxel-based neuroimage analysis is to preprocess the T\(_{1}\)-weighted image, e.g. by segmentation and registration of grey matter (GM), white matter (WM) and cerebral spinal fluid (CSF). However, systematic biases due to scanner differences, population differences, etc., can be introduced in this pipeline [2]. Some of them can help discriminate subjects from normal controls (NC), yet may not be directly related to the disease. For example, in structural Magnetic Resonance Imaging (sMRI) images of subjects with Alzheimer’s Disease (AD), after spatial normalization during the simultaneous registration of GM, WM and CSF, the GM voxels surrounding the lateral ventricle and subarachnoid space etc. may be mistakenly enlarged because the CSF space at those locations is enlarged relative to the normal template [2], as shown in Fig. 1. Although these voxels/features are highly correlated with the disease, they cannot be regarded as lesion features in an interpretable model. In this paper we refer to them as “Procedural Bias”, which should be identified but has been neglected in the literature. We observe that it can be harnessed in voxel-based image analysis to improve the prediction of disease.
Together with the procedural bias, the lesion features are vital for the prediction and lesion region analysis tasks, which are commonly addressed by two types of regularization models. The first type, such as general losses with an \(\ell_{2}\) penalty, elastic net [13] and Graphnet [5], selects strongly correlated features to minimize classification error. However, such models do not differentiate between features introduced by the disease and those introduced by the procedural bias, and may also introduce redundant features. Hence, their interpretability is poor and they are prone to overfitting. The second type enforces sparsity, e.g. TV-\(L_{1}\) (a combination of Total Variation [9] and \(L_{1}\)) and in particular \(n^{2}\)GFL [12], imposing strong disease priors on the model parameters in order to capture the lesion features. Although the selected features are disease-relevant and the selection is stable, these models ignore the inevitable procedural bias and hence lose some prediction power.
To incorporate both the prediction task and the selection of lesion features, we propose an iterative dual-task algorithm, namely Generalized Split LBI (GSplit LBI), which enjoys better model selection consistency than the generalized lasso [11]. Specifically, through a variable splitting term inspired by Split LBI [6], two estimators are introduced and split apart: one for prediction and the other for selecting lesion features, both of which can be pursued separately under a gap control. Following an iterative scheme, they mutually monitor each other: the estimator for selecting lesion features is monitored to pursue stable lesion features, while the estimator for prediction is monitored to exploit both the procedural bias and the lesion features to improve prediction. To show the validity of the proposed method, we successfully apply our model to voxel-based sMRI analysis for AD, a challenging problem that attracts increasing attention.
2 Method
2.1 GSplit LBI Algorithm
Our dataset consists of N samples \(\{x_{i},y_{i}\}_{1}^{N}\), where \(x_{i} \in \mathbb {R}^{p}\) is the \(i\)-th neuroimaging sample with p voxels and \(y_{i} \in \{\pm 1\}\) indicates the disease status (\(-1\) for Alzheimer’s disease in this paper). \(X \in \mathbb {R}^{N \times p}\) and \(y \in \mathbb {R}^{N}\) are the concatenations of \(\{x_{i}\}_{i}\) and \(\{y_{i}\}_{i}\). Consider a generalized linear model to predict the disease status (with the intercept parameter \(\beta _{0} \in \mathbb {R}\)), i.e. the logistic model \(\mathrm {logit}\,\mathbb {P}(y_{i}=1 \mid x_{i}) = \beta _{0} + x_{i}^{T}\beta _{pre}\).
A desired estimator \(\beta _{pre} \in \mathbb {R}^p\) should not only fit the data by maximizing the log-likelihood of the logistic regression, but also satisfy the following types of structural sparsity: (1) the number of voxels involved in the disease prediction is small, so \(\beta _{pre}\) is sparse; (2) the voxel activities should be geometrically clustered or 3D-smooth, suggesting a TV-type sparsity on \(D_G \beta _{pre}\), where \(D_{G}\) is a graph difference operator (Footnote 1); (3) the degenerate GM voxels in AD are captured by the nonnegative components of \(\beta _{pre}\). However, the procedural bias may violate these a priori sparsity properties, especially the third one, yet increase the prediction power.
To overcome this issue, we adopt the variable splitting idea of [6] by introducing an auxiliary variable \(\gamma \in \mathbb {R}^{|V|+|E|}\) to achieve these sparsity requirements separately, while controlling the gap from \(D\beta _{pre}\) with the penalty \(S_{\rho }(\beta _{pre},\gamma ) := \Vert D\beta _{pre} - \gamma \Vert _{2}^{2} = \Vert \beta _{pre} - \gamma _{V} \Vert _{2}^{2} + \Vert \rho D_{G}\beta _{pre} - \gamma _{G} \Vert _{2}^{2}\), where \(\displaystyle \gamma = \left[ \begin{array}{cc} \gamma _{V}^T&\gamma _{G}^T \end{array} \right] ^T\) and \(\displaystyle D = \left[ \begin{array}{cc} I&\rho D_{G}^T\end{array} \right] ^T\). Here \(\rho \) controls the trade-off between the different types of sparsity. Our purpose is thus two-fold: (1) use \(\beta _{pre}\) for prediction; (2) enforce sparsity on \(\gamma \). Such a dual-task scheme is illustrated in Fig. 2.
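To make the splitting concrete, below is a minimal sketch (not the authors' implementation) of how the operator \(D\) and the gap penalty \(S_{\rho }\) could be assembled for a 3D voxel grid with a 6-connected neighbourhood; the grid shape, the connectivity and all function names are illustrative assumptions.

```python
# Sketch: assemble D_G, D = [I; rho * D_G] and S_rho(beta, gamma) = ||D beta - gamma||_2^2.
import numpy as np
import scipy.sparse as sp

def graph_difference_operator(shape):
    """D_G : R^|V| -> R^|E| with (D_G beta)(i, j) = beta(i) - beta(j)
    for 6-connected neighbouring voxel pairs (i, j)."""
    idx = np.arange(int(np.prod(shape))).reshape(shape)
    rows, cols, vals = [], [], []
    e = 0
    for axis in range(3):
        src = np.take(idx, np.arange(shape[axis] - 1), axis=axis).ravel()
        dst = np.take(idx, np.arange(1, shape[axis]), axis=axis).ravel()
        for i, j in zip(src, dst):
            rows += [e, e]
            cols += [int(i), int(j)]
            vals += [1.0, -1.0]
            e += 1
    return sp.csr_matrix((vals, (rows, cols)), shape=(e, int(np.prod(shape))))

def build_D(shape, rho):
    """Stack the identity on top of rho * D_G, i.e. D = [I; rho * D_G]."""
    D_G = graph_difference_operator(shape)
    I = sp.identity(D_G.shape[1], format="csr")
    return sp.vstack([I, rho * D_G]).tocsr()

def gap_penalty(D, beta, gamma):
    """S_rho(beta, gamma) = ||D beta - gamma||_2^2."""
    return float(np.sum((D @ beta - gamma) ** 2))
```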
To implement this, we generalize the Split Linearized Bregman Iteration (Split LBI) algorithm in [6] to our setting with generalized linear models (GLM) and the three types of structural sparsity above, hence the name Generalized Split LBI (or GSplit LBI). Algorithm 1 describes the procedure with a new loss
\(\mathcal {L}(\beta _{0},\beta _{pre},\gamma ) = \ell (\beta _{pre};\{x_{i},y_{i}\}_{1}^{N}) + \frac{1}{2\nu }\, S_{\rho }(\beta _{pre},\gamma ),\)
where \(\ell (\beta _{pre};\{x_{i},y_{i}\}_{1}^{N})\) is the negative log-likelihood function of the GLM and \(\nu >0\) tunes the strength of the gap control. The algorithm returns a sequence of estimates as a regularization path, \(\{\beta _0^k, \beta _{pre}^{k},\gamma ^{k},\beta _{les}^{k}\}_{k\ge 0}\). In particular, \(\gamma ^k\) exhibits a variety of sparsity levels, while \(\beta _{pre}^k\) is generically dense with different prediction powers. The projection of \(\beta _{pre}^k\) onto the subspace with the same support as \(\gamma ^k\) gives the estimate \(\beta _{les}^k\), which satisfies the a priori sparsity properties (sparse, 3D-smooth, nonnegative) and is hence regarded as the interpretable lesion features for AD. The remainder of this projection is heavily influenced by the procedural bias; in this paper, the non-zero elements of \(\beta _{pre}^k\) that are negative (\(-1\) denotes the disease label) with comparatively large magnitude are identified as procedural bias, while the others with tiny values can be treated as nuisance or weak features. In summary, \(\beta _{les}\) selects only the lesion features, while \(\beta _{pre}\) additionally captures the procedural bias. Hence, the two kinds of features can be differentiated, as illustrated in Fig. 2.
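Algorithm 1 itself is referenced but not reproduced in this excerpt. As a rough guide to the iterative scheme, the sketch below follows the Split LBI updates of [6] applied to the loss above (logistic negative log-likelihood plus \(\tfrac{1}{2\nu }S_{\rho }\)); the nonnegativity handling of \(\gamma _{V}\), the support projection used for \(\beta _{les}\), and all variable names are our assumptions rather than a verbatim transcription of Algorithm 1.

```python
# Sketch of a Split-LBI-style iteration producing the path {beta_pre^k, gamma^k, beta_les^k}.
import numpy as np

def soft_threshold(z, t=1.0):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def gsplit_lbi_path(X, y, D, nu, kappa, alpha, n_iters):
    """X: (N, p) data, y in {-1, +1}, D: (|V|+|E|, p) splitting operator."""
    n, p = X.shape
    beta0, beta = 0.0, np.zeros(p)          # dense predictor beta_pre (plus intercept)
    z = np.zeros(D.shape[0])                # Bregman/dual variable driving gamma
    gamma = np.zeros(D.shape[0])
    path = []
    for _ in range(n_iters):
        # gradient of the averaged logistic negative log-likelihood
        eta = beta0 + X @ beta
        residual = -y / (1.0 + np.exp(y * eta))
        grad_beta0 = residual.mean()
        grad_beta = X.T @ residual / n + D.T @ (D @ beta - gamma) / nu
        grad_gamma = -(D @ beta - gamma) / nu
        # beta_pre, beta_0: plain gradient descent (kept dense)
        beta0 -= kappa * alpha * grad_beta0
        beta -= kappa * alpha * grad_beta
        # gamma: linearized Bregman step followed by soft-thresholding (sparsity)
        z -= alpha * grad_gamma
        gamma = kappa * soft_threshold(z)
        gamma[:p] = np.maximum(gamma[:p], 0.0)   # nonnegative gamma_V (assumption)
        # beta_les: beta_pre restricted to the support of gamma_V (assumption)
        beta_les = beta * (gamma[:p] != 0)
        path.append((beta0, beta.copy(), gamma.copy(), beta_les))
    return path
```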
2.2 Setting the Parameters
The stopping time \(t^{k}\) (line 10 of Algorithm 1) serves as the regularization parameter, which can be determined via cross-validation to minimize the prediction error [7]. Parameter \(\rho \) trades off geometric clustering against voxel sparsity. Parameters \(\kappa \) and \(\alpha \) are the damping factor and the step size, which should satisfy \(\kappa \alpha \le \nu / \kappa (1 + \nu \Lambda _{H} + \Lambda _{D}^{2})\) to ensure the stability of the iterations. Here \(\Lambda _{(\cdot )}\) denotes the largest singular value of a matrix and H denotes the Hessian matrix of \(\ell (\beta _{0},\beta _{pre};\{x_{i},y_{i}\}_{1}^{N})\).
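As a small worked example of this condition, the snippet below computes the largest admissible step size, reading the inequality as \(\kappa \alpha \le \nu /\big (\kappa (1+\nu \Lambda _{H}+\Lambda _{D}^{2})\big )\) and bounding \(\Lambda _{H}\) by \(\Lambda _{X}^{2}/(4N)\) for the averaged logistic loss; both the parenthesization and the Hessian bound are our assumptions, not statements from the paper.

```python
# Sketch: largest alpha satisfying kappa*alpha <= nu / (kappa*(1 + nu*Lambda_H + Lambda_D**2)).
import numpy as np
from scipy.sparse.linalg import svds

def max_step_size(X, D, nu, kappa):
    Lambda_X = np.linalg.norm(X, 2)                           # largest singular value of X
    Lambda_D = svds(D, k=1, return_singular_vectors=False)[0]  # largest singular value of D
    Lambda_H = Lambda_X ** 2 / (4.0 * X.shape[0])             # bound on the logistic Hessian (assumption)
    return nu / (kappa ** 2 * (1.0 + nu * Lambda_H + Lambda_D ** 2))
```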
Parameter \(\nu \) balances the prediction task against the sparsity enforcement in feature selection. In this paper, it is task-dependent, as shown in Fig. 2. For the prediction of disease, \(\beta _{pre}\) with an appropriately larger value of \(\nu \) may increase the prediction power by harnessing both the lesion features and the procedural bias. For lesion feature analysis, \(\beta _{les}\) with a small value of \(\nu \) helps to enhance the stability of feature selection. For details, please refer to the supplementary information.
3 Experimental Results
We apply our model to AD/NC classification (namely ADNC) and MCI (Mild Cognitive Impairment)/NC classification (namely MCINC), two fundamental challenges in the diagnosis of AD. The data are obtained from the ADNI database (Footnote 2), which is split into 1.5 T and 3.0 T (namely 15 and 30) MRI scan magnetic field strength datasets. The 15 dataset contains 64 AD, 208 MCI and 90 NC subjects, while the 30 dataset contains 66 AD and 110 NC subjects. The DARTEL VBM pipeline [1] is then applied to preprocess the data. Finally, the input features consist of 2,527 voxels of \(8 \times 8 \times 8\) mm\(^{3}\) size whose average values in the GM population template are greater than 0.1. Experiments are designed on the 15ADNC, 30ADNC and 15MCINC tasks.
3.1 Prediction and Path Analysis
10-fold cross-validation is adopted for classification evaluation. Under exactly the same experimental setup, GSplit LBI is compared with other classifiers: SVM, MLDA (a univariate model via t-test + LDA) [3], Graphnet [5], Lasso [10], Elastic Net, TV+L\(_{1}\) and \(n^{2}\)GFL. For each model, the optimal parameters are determined by grid search. For GSplit LBI, \(\rho \) is chosen from \(\{1,2,...,10\}\), \(\kappa \) is set to 10 and \(\alpha = \nu / \kappa (1 + \nu \Lambda _{X}^{2} + \Lambda _{D}^{2})\) (Footnote 3); specifically, \(\nu \) is set to 0.2 (corresponding to \(\nu \nrightarrow 0\) in Fig. 2) (Footnote 4). The regularization coefficient \(\lambda \) ranges over \(\{0, 0.05, 0.1,...,0.95, 1, 10, 10^{2}\}\) for lasso (Footnote 5) and over \(2^{\{-20,-19,...,0,...,20\}}\) for SVM. For the other models, parameters are optimized over \(\lambda : \{0.05,0.1,...,0.95,1,10,10^{2}\}\) and \(\rho : \{0.5,1,...,10\}\) (in addition, the mixing parameter \(\alpha \): \(\{0,0.05,...,0.95\}\) for Elastic Net).
The best accuracy along the path of GSplit LBI and the best accuracy of each counterpart are reported. Table 1 shows that the \(\beta _{pre}\) of our model outperforms the others in all cases. Note that although our accuracies may not be superior to those of models using multi-modality data [8], they are state-of-the-art results for the sMRI modality alone.
The feature selection process can be analyzed together with the prediction accuracy along the path. The result of 30ADNC is used as an illustration in Fig. 3. We can see that \(\beta _{pre}\) (blue curve) outperforms \(\beta _{les}\) (red curve) over the whole path, owing to the additional procedural bias captured by \(\beta _{pre}\). Specifically, at \(\beta _{pre}\)’s highest accuracy (\(t_{5}\)), \(\beta _{pre}\) yields a more than \(8\%\) increase in prediction accuracy. Early stopping regularization at \(t_5\) is desired, since \(\beta _{pre}\) converges to \(\beta _{les}\) in prediction accuracy and overfits as t grows. Recall that positive (negative) features represent degenerate (enlarged) voxels. For \(\beta _{pre}\) at \(t_5\), the voxels commonly selected across folds among the top 150 negative (enlargement) voxels are identified as procedural bias, shown in Fig. 1; most of these GM voxels are enlarged and located near the lateral ventricle or subarachnoid space etc., possibly due to the enlargement of the CSF space at those locations, and are thus distinct from the lesion features.
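As an illustration of how such procedural-bias voxels could be extracted (a sketch under our own naming, not the authors' code): in each cross-validation fold, take the 150 most negative entries of \(\beta _{pre}\) at the chosen stopping time and keep the voxels common to all folds.

```python
# Sketch: common top-150 negative (enlargement) voxels across CV folds.
import numpy as np

def procedural_bias_voxels(beta_pre_folds, top_k=150):
    """beta_pre_folds: assumed array of shape (n_folds, p), one beta_pre per fold."""
    per_fold = []
    for beta in beta_pre_folds:
        order = np.argsort(beta)                        # ascending: most negative first
        top_neg = [i for i in order[:top_k] if beta[i] < 0]
        per_fold.append(set(top_neg))
    return set.intersection(*per_fold)                  # voxels selected in every fold
```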
3.2 Lesion Features Analysis
To quantitatively evaluate the stability of the selected lesion features, the multi-set Dice Coefficient (mDC) (Footnote 6) [4, 12] is applied as a measurement. The 30ADNC task again serves as an example; the mDC is computed for the \(\beta _{les}\) that achieves the highest accuracy under 10-fold cross-validation. As shown in Table 2, when \(\nu = 0.0002\) (corresponding to \(\nu \rightarrow 0\) in Fig. 2), the \(\beta _{les}\) of our model obtains more stable lesion feature selection than the other models while retaining comparable prediction power. Besides, the average number of selected features (line 3 in Table 2) is also recorded. Note that although the elastic net has slightly higher accuracy than \(\beta _{les}\), it selects many more features than necessary.
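For reference, here is a small sketch of the mDC of footnote 6 over the fold-wise supports of \(\beta _{les}\); the input `beta_les_folds` and the function name are assumptions.

```python
# Sketch: mDC = K * |intersection of supports| / sum of support sizes over K folds.
import numpy as np

def multi_set_dice(beta_les_folds):
    supports = [set(np.flatnonzero(b)) for b in beta_les_folds]
    common = set.intersection(*supports)
    return len(supports) * len(common) / sum(len(s) for s in supports)
```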
The selected lesion features are also meaningful: as shown in Fig. 4(a)–(c), they are located in the hippocampus, parahippocampal gyrus and medial temporal lobe etc., which are believed to be regions damaged early in AD patients.
To further investigate the locus of the lesion features, we conduct a coarse-to-fine experiment. Specifically, we project the selected overlapping voxels of \(8 \times 8 \times 8\) mm\(^{3}\) size (shown in Fig. 4(c)) onto an MRI image with finer-scale voxels, i.e. of size \(2 \times 2 \times 2\) mm\(^{3}\). In total, 4,895 voxels serve as input features after the projection. Again, GSplit LBI is run with 10-fold cross-validation. The prediction accuracy of \(\beta _{pre}\) is \(90.34\%\) and on average 446.6 voxels are selected by \(\beta _{les}\). As desired, these voxels belong to parts of the lesion regions, such as those located in the hippocampal tail, as shown in Fig. 4(d).
4 Conclusions
In this paper, a novel iterative dual-task algorithm is proposed to incorporate both disease prediction and lesion feature selection in neuroimage analysis. With a variable splitting term, the estimators for prediction and for selecting lesion features can be pursued separately and mutually monitored under a gap control. The gap here is dominated by the procedural bias, i.e. specific features crucial for prediction yet ignored in a priori disease knowledge. With experimental studies conducted on the 15ADNC, 30ADNC and 15MCINC tasks, we have shown that leveraging the procedural bias can lead to significant improvements in both prediction and model interpretability. In future work, we shall extend our model to other neuroimaging applications, including multi-modality data.
Notes
- 1.
Here \(D_G:\mathbb {R}^V \rightarrow \mathbb {R}^E\) denotes a graph difference operator on \(G=(V,E)\), where V is the node set of voxels and E is the edge set of neighbouring voxel pairs (e.g. within a 3-by-3-by-3 neighbourhood), such that \(D_G(\beta )(i,j):=\beta (i)-\beta (j)\).
- 2.
- 3.
For logit model, \(\alpha < \nu / \kappa (1 + \nu \Lambda _{H}^2 + \nu \Lambda _{X}^2)\) since \(\Lambda _{X} > \Lambda _{H}\).
- 4.
In this experiment, comparable prediction results are obtained for \(\nu \in (0.1,10)\).
- 5.
\(\lambda = 0\) corresponds to the plain logistic regression model.
- 6.
In [12], \(mDC := \frac{10 | \cap _{k=1}^{10} S(k) | }{\sum _{k=1}^{10} | S(k) |}\) where S(k) denotes the support set of \(\beta _{les}\) in k-th fold.
References
Ashburner, J.: A fast diffeomorphic image registration algorithm. Neuroimage 38(1), 95–113 (2007)
Ashburner, J., Friston, K.J.: Why voxel-based morphometry should be used. Neuroimage 14(6), 1238–1243 (2001)
Dai, Z., Yan, C., Wang, Z., Wang, J., Xia, M., Li, K., He, Y.: Discriminative analysis of early Alzheimer’s disease using multi-modal imaging and multi-level characterization with multi-classifier. Neuroimage 59(3), 2187–2195 (2012)
Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)
Grosenick, L., Klingenberg, B., Katovich, K., Knutson, B., Taylor, J.E.: Interpretable whole-brain prediction analysis with GraphNet. Neuroimage 72, 304–321 (2013)
Huang, C., Sun, X., Xiong, J., Yao, Y.: Split LBI: an iterative regularization path with structural sparsity. In: Advances in Neural Information Processing Systems, pp. 3369–3377 (2016)
Osher, S., Ruan, F., Xiong, J., Yao, Y., Yin, W.: Sparse recovery via differential inclusions. Appl. Comput. Harmonic Anal. 41(2), 436–469 (2016)
Peng, J., An, L., Zhu, X., Jin, Y., Shen, D.: Structured sparse kernel learning for imaging genetics based Alzheimer’s disease diagnosis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 70–78 (2016)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58, 267–288 (1996)
Tibshirani, R.J., Taylor, J.E., Candes, E.J., Hastie, T.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)
Xin, B., Hu, L., Wang, Y., Gao, W.: Stable feature selection from brain sMRI. In: AAAI, pp. 1910–1916 (2014)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. Ser. B (Statistical Methodology) 67(2), 301–320 (2005)
Acknowledgements
This work was supported in part by 973-2015CB351800, 2015CB85600, 2012CB825501, NSFC-61625201, 61370004, 11421110001 and Scientific Research Common Program of Beijing Municipal Commission of Education (No. KM201610025013).