Keywords

1 Introduction

Vascular disease diagnosis involves some medical imaging technologies. TOF-MRA is a non-invasive approach and widely used for clinical diagnosis of arteries and lesions [1]. For the cerebrovascular segmentation from TOF-MRA data, the model- and data-driven methods have been reviewed in [2, 3], which generally focused on scale analysis, deformation model, regional growth, statistical model, as well as the state-of-the-art deep-learning methods. These methods are generally affected by initial position deviation, artifacts and noise, and are high dependent on seed points or model parameters. The statistical models have received special attention so far, the popular ones [4,5,6,7] are still affected by exploring the finite mixture model (FMM) and its parameters, and cannot well cope with the variability of different imaging environments. In addition, their energy constraints are not sufficient to detect small-sized vessels or the main vascularity in low signal-to-noise ratio (SNR). Deep-learning methods [8, 9] realized the optimal segmentation on accuracy and completeness, while they are very hard to obtain vascular ground-truths (GT) by densely labeling on complex image context. Thus, a high-efficient statistical modeling is our main research motivation.

Related Works:

The MAP-MRF framework [5,6,7] used FMM and neighborhood constraint as the low- and high-level models respectively. Given the observed data Y and Markov label field X, the framework realizes cerebrovascular segmentation with \( p\left( {X |Y} \right) \propto p\left( {Y |X} \right)P\left( X \right), \) while the terms \( p(Y|X) \) and \( P\left( X \right) \) are modeled respectively as the Markov low- and high-level processes for data likelihood and data prior of the contextual information. Specifically, it is expressed as follows in a discrete mathematical way. The observed data on the lattice space \( \varOmega = \{ s_{n} |n = 1, \ldots ,N\} \) is modeled by FMM on \( Y = \left\{ {y_{s} |s \in \varOmega , y_{s} \in {\mathcal{R}}} \right\} \), where \( y_{s} \) represents the intensity of voxel site s; while the high-level process models label field on the set \( X = \{ x_{s} |s \in \varOmega , x_{s} \in \left\{ {L_{V} ,L_{B} } \right\}\} \) with \( L_{V} \) and \( L_{B} \) being the vessel and background classes. Based on MAP estimation [10], class label can be estimated by \( x_{s} = argmax\left( {p(x_{s} |y_{s} )} \right) \propto argmax\left( {p(y_{s} |x_{s} )p\left( {x_{s} } \right)} \right) \), and the class prior \( P\left( {x_{s} } \right) \) is denoted as the Gibbs distribution with Hammersley-Clifford Theorem [10], i.e., \( e^{{ - U\left( {x_{s} } \right)}} /\sum\nolimits_{{ x_{s} \in \left\{ {L_{V} ,L_{B} } \right\}}} {e^{{ - U\left( {x_{s} } \right)}} } \). The energy function \( U\left( {x_{s} } \right) \) consists of low- and high-level energy forms, i.e., \( U\left( {x_{s} } \right) = U_{1} \left( {x_{s} } \right) + U_{2} \left( {x_{s} ,x_{s'} } \right) \), and \( s' \in \eta_{s} \) is the size of a neighborhood system (NBS). Existing low-level process used Rayleigh and Gaussian functions to construct the FMM [6, 7], where the model parameters were estimated by using the expectation maximization (EM) algorithm. Without loss of generality, there exists a negative correlation in between low-level energy and the likelihood probability for background and vessel class-labels, i.e., \( e^{{ - U_{1} \left( {L_{B} } \right)}} \propto \frac{{\mathop \sum \nolimits_{l} w_{l} p_{l} \left( {y_{s} ;\theta_{l} } \right)}}{{\mathop \sum \nolimits_{l} w_{l} }} \) with \( l = 1,2,3 \), and \( e^{{ - U_{1} \left( {L_{V} } \right)}} \propto p_{l} \left( {y_{s} ;\theta_{l} } \right) \) with \( l = 4 \), where \( \theta_{l} \) is the distribution parameter and the class proportions satisfy \( \sum\nolimits_{l} {w_{l} } = 1 \). The high-level energy is the weighted sum of the potentials on pair-wise sites (PWS), i.e., \( U_{2} \left( {x_{s} ,x_{{s^{\prime}}} } \right) = \sum\nolimits_{{s^{\prime}\epsilon \eta_{s} }} {\beta_{c} E\left( {x_{s} ,x_{{s^{\prime}}} } \right)} \) with regularization parameter \( \beta_{c} \) [7]. Traditional PWS-potential used the delta function [5,6,7] to calculate the PWS-potentials, i.e.,

$$ E\left( {x_{s} ,x_{s'} } \right) = 1 - \delta \left( {x_{s} - x_{s'} } \right), s' \in \eta_{s} $$
(1)

Unfortunately, that lead the existing high-level process only to remove the label-noise and keep the continuity of neighboring labels. With the cube size of 3 × 3 × 3 voxels, the multi-pattern-NBS (MP-NBS) [7] generalized NBS energy by summing PWS-potentials on all of the patterns in clique-classes \( c_{m, j} \) and minimizing the energy by

$$ U_{2} \left( {x_{s} ,x_{s'} } \right) = \mathop {min}\limits_{m} \left[ {\mathop {min}\limits_{j} \sum\nolimits_{{c_{m,j} }} {\beta_{c} E_{{c_{m,j} }} \left( {x_{s} ,x_{s'} } \right)} } \right],\,j = 1, \cdots ,N_{{c_{m} }} ,\,m = 1, \cdots , M $$
(2)

where the traditional 6- and 26-NBSs are the particular cases of these patterns. Given \( f_{{L_{B} }} \) and \( f_{{L_{V} }} \) respectively the above two likelihood probabilities, Zhou et al. [7] expressed their posterior probabilities by using \( p(L|y_{s} ) \propto f_{L} \exp \left( { - U_{2} \left( {L,x_{s'} } \right)} \right) \), \( L = L_{V} \) or \( L_{B} \). The site s is inferred as a vessel site when \( p(L_{V} |y_{s} ) - p(L_{B} |y_{s} ) > 0 \).

So far, the state-of-the-art statistical methods still face following limitations. The low-level process inevitably suffers from large fitting error of intensity-distribution in background region that occupies about 98% of original volume, thus may be impacted at the region of low contrast and/or low SNR. Another, their high-level process overly depends on the low-level one for initialization and iteration updating, and merely uses the summation of the delta functions based second order energy, like Eqs. (1)–(2), to denoise the label field. As a comparison, great difference on the results from between the traditional modeling strategies and ours is demonstrated in Fig. 1.

Fig. 1.
figure 1

Cerebrovascular segmentation results from TOF-MRA data by the existing statistical models and the proposed one. The 2nd and 3rd rows correspond to the yellow and blue boxes of the first row. (a) The maximum intensity projections (MIPs) of the raw data; (b)–(e) results by Lu et al.’s [5], Wen et al.’s [4], Hassouna et al.’s [6], and Zhou et al.’s [7]; (f) results by our low-level model, (g) results by our low- and high-level model. (h) the GTs.

2 Method

Our method steps contribute as: (1) preprocessing with skull-stripping and multi-scale filtering; (2) exerting a Gaussian mixture model (GMM) on the low-level process to generate intensity-based energy; (3) modeling the high-level process by using the vascular shape-prior PWS-potentials and NBS energy; (4) automating the regularization parameter estimation (RPE) by machine-learning algorithm.

2.1 Vascular Shape-Priors

The multiscale filtering [11] of the original data produces vessel Hessian characteristics. Given the original image intensity \( I\left( s \right) \) at a site \( s\left( {x_{1} ,x_{2} ,x_{3} } \right), \) the Hessian matrix is denoted by \( H_{\sigma } \left( s \right) = \sigma^{2} \partial^{2} I_{\sigma } \left( s \right)/\partial x_{i} \partial x_{j} \) with dimensions \( i, j = 1,2,3 \); \( I_{\sigma } \left( s \right) \) is the filtered result with Gaussian kernel at scale \( \sigma \). The eigen-decomposed Hessian matrix has three eigenvalues (\( \lambda_{1, \sigma } ,\lambda_{2, \sigma } ,\lambda_{3, \sigma } \) with \( \left| {\lambda_{1, \sigma } } \right| \le |\lambda_{2, \sigma } | \le \left| {\lambda_{3, \sigma } } \right| \)) and the corresponding eigenvectors \( \left( {\vec{v}_{1, \sigma } , \vec{v}_{2, \sigma } ,\vec{v}_{3, \sigma } } \right) \), where direction of \( \vec{v}_{1, \sigma } \) indicates the direction locally parallel to the vascular ridge with minimum intensity variation. This filter responses to vascular ridges at different scales \( \sigma_{k} \) (with \( k = 1, 2, \ldots \)). The maximized response of each voxel is considered as the filter output and denoted by \( {\mathcal{V}}\left( s \right) \) and the corresponding eigenvectors \( \vec{v}_{1} \left( s \right) \) are obtained as the vascular feature map (VFM) and vascular direction field (VDF), as shown in Fig. 2 (right). Furthermore, the above vascular shape-prior features will be integrated with intensity distributions by our statistical modeling.

Fig. 2.
figure 2

(Left): Gaussian functions fit the intensity of skull-stripped TOF-MRA data. (Right): corresponding to the boxed regions in 2D original image, the VFMs are overlapped with VDFs.

2.2 Statistical Modeling with MAP-MRF

Without loss of generality, modeling cerebrovascular intensity distribution is greatly impacted by the fitting error in medium- and low-intensity range and also produces computational redundancy, while the skull-stripped intracranial area can stabilize the histogram fitting with Gaussian mixture model (GMM).

Low-Level Model.

With skull-stripping algorithm [12], the resultant intensity distribution facilitates to divide the voxels of skull-stripping data into one vessel class (cerebral vasculature) and two background classes (brain tissues and cerebrospinal fluid). We use two Gaussian distributions to model the background classes, and another one for the vessel class, as shown in Fig. 2 (left). At each site \( s \), the GMM of intensity value \( y_{s} \) is expressed as: \( f_{GMM} \left( {y_{s} } \right) = \sum\nolimits_{l = 1}^{3} {\frac{{w_{l} }}{{\sqrt {2\pi \sigma_{l} } }}exp\left( {\frac{{ - \left( {y_{s} - \mu_{l} } \right)}}{{2\sigma_{l}^{2} }}} \right)} \), where the parameters \( \mu_{l} ,\sigma_{l} ,w_{l} \) (l = 1, 2, 3) are automatically estimated by K-means based pre-classification and parameter initialization and are followed by EM algorithm [7]. Note, the GMM does not involve shape-priors, while has the relation of \( e^{{ - U_{I} \left( {x_{s} } \right)}} \propto f_{GMM} \left( {y_{s} } \right) \) on the low-level energy \( U_{I} \left( {x_{s} } \right) \) of vessel or background class.

High-Level Model.

To segment vessels especially regain the latent vessels from low-contrast region, the VFM and VDF (as the shape-priors feature) are employed to construct the Markov high-level model. Unlike traditional PWS-potential in Eq. (1), a new one is pre-analyzed as follows. For the VDF, we denote PWS’s direction vectors by \( \vec{v}_{1} \left( s \right) \) and \( \vec{v}_{1} \left( {s'} \right) \) with \( s' \in \eta_{s} \), the cosine of the angle between the two vectors is \( \frac{{\vec{v}_{1} \left( s \right) \cdot \vec{v}_{1} \left( {s'} \right)}}{{\left| {\vec{v}_{1} \left( s \right)} \right|\left| { \vec{v}_{1} \left( {s^{\prime}} \right)} \right|}} \). Then, we get the first measure, namely, the normalized phase continuity of the two sites, which is denoted by \( {\mathcal{S}}_{s, s'} = 1 - \frac{{\left| {\vec{v}_{1} \left( s \right) \cdot \vec{v}_{1} \left( {s'} \right)} \right|}}{{\left| {\vec{v}_{1} \left( s \right)} \right|\left| {\vec{v}_{1} \left( {s^{\prime}} \right)} \right|}} \). For the VFM, we denote numerical difference of PWS-intensity by \( \nabla {\mathcal{V}}_{s, s'} \), and \( \nabla {\mathcal{V}}_{ \hbox{max} } \) is the maximum value within the NBS, i.e., \( \eta_{s} \). Then, we get the second measure, namely, normalized intensity smoothness of the PWS, which is expressed as \( \nabla {\mathcal{I}}_{s, s'} = \frac{{\left| {\nabla {\mathcal{V}}_{s,s'} } \right|}}{{\left| {\nabla {\mathcal{V}}_{max} } \right|}} \). Combining the two measures, a compound measure of pair-wise sites is formed by using \( {\mathcal{E}}_{{s, s^{\prime}}} = \epsilon_{1} \nabla {\mathcal{I}}_{{s, s^{\prime}}} + \epsilon_{2} {\mathcal{S}}_{{s, s^{\prime}}} \) with \( \epsilon_{1} + \epsilon_{2} = 1, \) where an average assignment will balance the two measures. Traditional minimum of the NBS energy with the PWS-potential of Eq. (1) only to regulate the label values at \( x_{s} \ne x_{s'} \) that mainly situates in the edges of the vessels or the block-shaped structures. Applying the measure \( {\mathcal{E}}_{s, r} \) to either edge (with \( x_{s} \ne x_{s'} \)) or background (with \( x_{s} = x_{s'} = L_{B} \)), a new PWS-potential is derived as

$$ {\mathcal{P}}\left( {x_{s} ,x_{s'} } \right) = \left\{ {\begin{array}{*{20}c} {\begin{array}{*{20}l} 0 \hfill \\ {1 - {\mathcal{E}}_{s, s'} } \hfill \\ {{\mathcal{E}}_{s, s'} } \hfill \\ \end{array} } & {\begin{array}{*{20}l} {x_{s} = x_{s'} = L_{V} } \hfill \\ {x_{s} = x_{s'} = L_{B} } \hfill \\ {x_{s} \ne x_{s'} } \hfill \\ \end{array} } \\ \end{array} } \right. $$
(3)

where \( x_{s} ,x_{s'} \in \left\{ {L_{V} ,L_{B} } \right\} \). The new high-level NBS energy \( U_{II} \left( {x_{s} ,x_{s'} } \right) \) can be derived from Eq. (2), when applying the potential \( {\mathcal{P}}\left( {x_{s} ,x_{s'} } \right) \) to the MP-NBS. Different from Eqs. (1), (3) uses \( {\mathcal{E}}_{s, s'} \) and \( 1 - {\mathcal{E}}_{s, s'} \) respectively at discontinuous edges and background to facilitate rerating the label field, thus improves vessel edges and regain the dim or slender vessels from low-contrast and/or low-SNR region.

Posterior Probability.

Combining the above low- and high-level models with the MP-NBS, the new posterior probability is derived as

$$ x_{s} : \to f_{GMM} \left( {y_{s} } \right) \cdot { \exp }\left( { - U_{II} \left( {x_{s} ,x_{{s^{\prime}}} } \right)} \right) $$
(4)

which combines the intensity information with vascular shape-priors, while the likelihood and prior probabilities is determined by the low- and high-level energy models \( f_{GMM} \left( {y_{s} } \right) \) and \( U_{II} \left( {x_{s} ,x_{s'} } \right) \) respectively. For Eq. (4), the iteration conditional mode [6, 10] and Bayesian rule [6, 7] can be used to iteratively solve a random field, namely, we infer that a site is the vessel site, if it meets \( p\left( {y_{s} |L_{V} } \right)p\left( {L_{V} } \right) > p\left( {y_{s} |L_{B} } \right)p\left( {L_{B} } \right) \). The MRF process facilitates a global convergence within few iterations.

Regularization Parameter.

Given a changeless constant \( \beta_{c} \) balancing the actions in between low- and high-level processes, Eq. (4) may produce uneven segmentation effects for individual data acquisitions. Thus, we estimate the parameter by using the maximum pseudo likelihood [10] with machine-learning, assuming that the spatial voxels meet homogeneous distribution. From an arbitrary initial label field (e.g., matrix zero), optimizing \( \widehat{\beta }_{\text{c}} \) is to minimize the negative logarithm of the pseudo likelihood by

$$ - ln{\text{PL}}\left( {x|\beta } \right) = \mathop \sum \nolimits_{{s{\epsilon}\Omega ,s\prime \in \eta_{s} }} \left[ {U_{I} \left( {x_{s} } \right) + U_{II} \left( {x_{s} ,x_{{s^{\prime}}} } \right) + { \ln }\mathop \sum \nolimits_{{x_{s} }} \exp \left( { - U_{I} \left( {x_{s} } \right) - U_{II} \left( {x_{s} ,x_{{s^{\prime}}} } \right)} \right)} \right] $$
(5)

According to [7], a classical machine-learning process iteratively find the optimal \( \widehat{\beta }_{\text{c}} \) through the gradient descent equation, i.e., \( \beta_{t} = \beta_{t - 1} + \gamma \nabla \left( {ln{\text{PL}}\left( {x|\beta_{t - 1} } \right)} \right) \), with the step-size of \( \gamma \) and the negative gradient \( - \nabla \left( {ln{\text{PL}}\left( {x|\beta } \right)} \right) \) derived as

$$ \frac{{\partial \left( {ln{\text{PL}}\left( {x|\beta } \right)} \right)}}{\partial \beta } = \sum\nolimits_{{s \in\Omega }} {\left( {\frac{{f_{{L_{V} }} N_{{L_{V} }} e^{{ - \beta N_{{L_{V} }} }} + f_{{L_{B} }} N_{{L_{B} }} e^{{ - \beta N_{{L_{B} }} }} }}{{f_{{L_{V} }} e^{{ - \beta N_{{L_{V} }} }} + f_{{L_{B} }} e^{{ - \beta N_{{L_{B} }} }} }} - \delta_{{x_{s} - L_{V} }} N_{{L_{V} }} - \delta_{{x_{s} - L_{B} }} N_{{L_{B} }} } \right)} $$
(6)

where the delta function \( \delta_{{x_{s} - L_{V} }} \) produce 1 at \( x_{s} = L_{V} \), the Gaussian functions \( f_{{L_{V} }} \) and \( f_{{L_{B} }} \) produce the conditional probability \( p\left( {y_{s} |L_{V} } \right) \) and \( p\left( {y_{s} |L_{B} } \right) \) for vessel- and background-class voxels respectively; the likelihood energies \( N_{{L_{V} }} \) and \( N_{{L_{B} }} \) are the derivatives of \( U_{2} \left( {x_{s} ,x_{s'} } \right) \). Given an initial \( \beta_{0} \) = 0, Eq. (6) is iteratively updated until \( \beta_{t} \) and \( \nabla \left( { - { \ln }PL\left( {x|\beta_{t} } \right)} \right) \) no longer change, then the obtained \( \widehat{\beta }_{\text{c}} \) serves as a global parameter.

3 Experimental Results

To validate our algorithms of cerebrovascular segmentation, 109 sets of the public data (http://www.insight-journal.org/midas/community/view/21) of brain TOF-MRA were employed in the experiments. These datasets were acquired on a 3T MR unit, and the size of each volume was 128 slices of 448 × 448 voxels with voxel size of 0.5 × 0.5 × 0.8 mm3. We randomly selected 10 public datasets as training ones to produce their GTs, the remaining ones were for testing. The GTs were obtained by three neurosurgeons using MIMICS (i.e., the medical image processing software). The following experiments were carried out on a PC Inter(R) Core (TM) i7-4790 CPU @ 3.60 GHz, 24 GB-RAM; the main algorithms were coded by MATLAB 2018a; maximum connected regions of segmentation results were used for evaluations.

For the validation of RPE with different PWS-potentials, the gradient descent equation was applied to the training datasets. By using the trial-error-test, we took out 9 candidate beta-values uniformly from the interval [0, 1] to acquire segmentation results and the DSCs. Given \( \beta_{T} \) is the beta-value corresponding to the highest DSC, we can define the range \( \beta_{T} \pm 0.05 \) as the optimal beta-interval (OBI). Well, the results \( \widehat{\beta }_{\text{c}} \) calculated by the PML on the training datasets were found all falling into the OBIs, as shown in Fig. 3, which facilitated to automate the RPE for testing datasets with individual statistical characteristics.

Fig. 3.
figure 3

Trial-error-test based PML-RPE verification on the training data. The left: for Normal001, three sets of DSCs on 6-, 26-, and MP-NBS are calculated with the candidate beta-values. The right: for 10 training datasets, i.e., Normal-001, -003, -005, -007, -009, -018, -033, -047, -051, -077, the estimated beta-values (colored points) all fall into the OBIs (colored line-segments). (Color figure online)

Our experiments consisted of cross validation of 24 composite modes with different low- and high-level models, as shown in Table 1. The cross-comparison was carried through in between their results and the corresponding GTs, where the Dice Similarity Coefficient (DSC) refers to \( {\text{DSC}} = \frac{2TP}{2TP + FP + FN} \) with TP, FP and FN being the number of true positives, false positive, and false negative, respectively. Besides, clinical evaluation with the 99 testing datasets was proceeded through the observations of neurosurgeons. Their evaluation reports were divided into three-level cases, i.e. Good, general, and poor cases according to cerebrovascular network coverage, over-segmentation, and the percentage of pseudo-vascular structures. The visual effects of segmentation results from arbitrary 10 testing datasets were compared in Fig. 4.

Table 1. The DSCs and processing time (s) in cross validation of 24 composite modes.
Fig. 4.
figure 4

Segmentation results from ten testing datasets (Normal-010, -017, -019, -021, -012, -013, -020, -022, -011, -029) by using five methods. The first row represents the MIPs of the raw datasets (a), while the segmentation results are shown in (b) by Lu’s [5], (c) by Wen’s [4], (d) by Hassouna’s [6], (e) by Zhou’s [7], and (f) by ours.

4 Discussion and Conclusions

Cerebrovascular segmentation and manual annotation from TOF-MRA dataset is a very hard and complicated work, which has to deal with a large number of personalized datasets and vast local environments with low-contrast or -SNR. Thus, we implemented cross-validation and numerical assessments on only 10 TOF-MRA datasets that have corresponding GTs by our manual annotation. In Table 1, the MP-NBS generates the highest DSC than other NBSs, while its computation time is longer, therefore, the statistical modeling with better calculation cost performance is to use GMM fitting skull-stripped data and use the shape-prior PWS-potential on the 6-NBS. Through qualitative comparison, Fig. 4 illustrates that all the methods show similar performance on the segmentation of the large size vessel or in high contrast region, while the proposed strategy greatly improves the segmentation effect of the small size vessel in low contrast region, e.g., Figure 4(f). According to the neurosurgeons, the proposed model performs well on most of the 99 testing datasets and is better than the other traditional models. For the RPE, we found that the estimated \( \widehat{\beta }_{\text{c}} \) all fall into the OBIs corresponding to the maximum DSCs, see Fig. 3. For the model-driven methods, our statistical modeling has obtained the best effect among the existing ones that had been reported to win out the level set method in [5, 7].

Summarily, cerebrovascular segmentation from TOF-MRA data is of great significance, and also an open study so far. Our GMM and the optimized PWS-potential improves the traditional low- and high-level models on skull-tripped TOF-MRA data, and produces higher DSCs and better cerebrovascular network coverage. Such a novel statistical modeling strategy develops current MAP-MRF framework, and realizes the vascular segmentation with better accuracy, completeness, and computation speed.