
1 Introduction

Reading the ribs in three-dimensional (3D) computed tomography (CT) scans is a typical task in radiology, e.g. to find bone lesions or to identify fractures. During reading, each of the 24 ribs needs to be followed individually while scrolling through the slices. As a result, this task is time-consuming and rib abnormalities are likely to be overlooked.

In order to assist reading, efficient visualization schemes or methods for navigation support are required. These applications are typically based on the rib centerlines [1, 2]. Despite the generally high contrast of the ribs, automated extraction of their centerlines from CT is challenging: image noise and artifacts impede the extraction, as do other bony structures in close vicinity (most prominently the vertebrae) and severe pathologies. Finally, anatomical labeling of the extracted centerlines (i.e. knowing which one, for example, is the “7th right rib”) is usually desirable. From an algorithmic perspective, this task is trivial if all 24 ribs are correctly extracted, as simply counting the left and right ribs from cranial to caudal would be sufficient. Obviously, the task becomes significantly more challenging once the rib cage is only partially imaged or once a rib is missing (e.g. due to pathologies or a missed detection in a previous step).

A wide range of different approaches has been proposed in the past for rib centerline extraction, partially also including their labeling. Tracing-based approaches, as in [3, 4], aim at iteratively following the ribs. As such approaches rely on an initial seed-point detection per rib, an entire rib is easily missed once the corresponding seed point is not detected. Alternatively, potential rib candidates can first be detected in the entire volume and then grouped into ribs, as done for example in [5]. However, the removal of other falsely detected structures remains a crucial task. Attempts have been made to additionally integrate prior knowledge by means of geometrical rib cage centerline models [2, 6]. Nevertheless, such approaches may struggle with deviations from the model caused by pathologies.

In this paper, we propose a two-stage approach combining deep learning and classic image processing techniques to overcome several of the limitations listed above. Rib probability maps are calculated at first using a fully convolutional neural network (FCNN) (Sect. 2.2), and then the centerlines are reconstructed using a specifically designed centerline extraction algorithm (Sect. 2.3). In particular, three distinct rib probability maps are calculated (first rib, twelfth rib or intermediate rib). By knowing the first and/or twelfth rib, labeling can be solved easily by iterative counting. This scheme also works in case of partial rib cages (for example if only the upper or lower part is shown). Evaluation is carried out on a representative number of 113 cases.

2 Methods

2.1 Data

Our data set consists of a total of 113 image volumes, comprising 59 thorax and 54 full-body CT scans. The data includes a wide range of typical challenges, such as variation in the field of view leading to partly visible or missing ribs (3 patients with the first rib missing, 38 patients with an at least partially missing twelfth rib), various types of rib fractures, spine scoliosis (14 patients), strong contrast uptake around the first rib (33 patients), implants in other bony structures (7 around the sternum, 2 around the spine, and 2 around the femur/humerus), and several different devices with similar intensity to the ribs, such as catheters or cables (57 patients).

In each image, we annotated rib centerlines by manually placing spline control points. The rib centerlines were then obtained using cubic spline interpolation. For each image volume, we generated a label mask by dilating the corresponding centerlines with a radius of 3.0 mm. Four different labels are assigned to the classes background, first rib, twelfth rib and intermediate rib.

Fig. 1. Foveal architecture with three resolution levels. The feature extraction pathways (green), consisting of three CBR blocks, are integrated using CBRU blocks (blue). The final CS block consists of a \(3 \times 3 \times 3\) valid convolution and a soft-max layer. (Color figure online)

2.2 Multi-label Rib Probability Map Generation

For rib detection, we first apply an FCNN in order to generate probability maps which are subsequently fed into the tracing algorithm (Sect. 2.3). More specifically, we formulate our task as a four-class problem, where the network yields for each voxel \(v_{ijk}\) of the volume a four-dimensional vector \(p_{ijk} \in [0,1]^4\). The components \(p_{ijk, 0}, p_{ijk, 1}, p_{ijk, 2}, p_{ijk, 3}\) can be interpreted as the probabilities that the associated voxel belongs to the classes background, first rib (pair), twelfth rib (pair) or intermediate rib (pairs), respectively. Distinct classes for the first and the twelfth rib were introduced to deal with differences in anatomy (especially for the first rib) while significantly simplifying the subsequent labeling procedure. By using the location of the intermediate ribs relative to the first and twelfth ribs, labeling of the ribs can be achieved efficiently. Moreover, knowing the potential location of the first or twelfth rib enables labeling even in cases of partial rib cages. Details are provided in Sect. 2.3 below. We favored the parsimonious four-class learning task over training a neural network to detect each individual rib, which would result in a 25-class (24 ribs plus background) classification problem, for several reasons: (a) the four-class network in combination with our iterative tracing approach seems sufficient for robustly solving the problem at hand, (b) due to the similar appearance of intermediate ribs, we do not expect the 25-class network to identify intermediate ribs reliably, and (c) the 25-class approach would cause a higher memory footprint and runtime during training and inference.

As network architecture, we chose the Foveal network described in [7]. Basically, the network is composed of two different types of layer modules, CBR and CBRU blocks (Fig. 1). A CBR block consists of a \(3 \times 3 \times 3\) valid convolution (C) followed by batch normalization (B) and a rectified linear unit activation (R). A CBRU block is a CBR block followed by an average unpooling layer (U). Since we favor fast network execution times and a moderate GPU memory consumption, we decided to use three resolution layers \(L_\text {H}, L_\text {M}, L_\text {L}\), each composed of three CBR blocks. Differently sized image patches with different isotropic voxel spacings are fed into the layers as input (Table 1). The low and medium resolution pathways \(L_\text {L}, L_\text {M}\) are integrated into the high resolution layer \(L_\text {H}\) using CBRU blocks. Implementation details and further remarks concerning the architecture performance can be found in [7].
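To illustrate the basic building block, a CBR block can be sketched as follows. This is a minimal single-channel NumPy sketch for exposition only; the actual network uses multi-channel convolutions and is implemented in CNTK, and the function name and single-sample batch normalization are our simplifications:

```python
import numpy as np

def cbr_block(x, w, gamma=1.0, beta=0.0, eps=1e-5):
    """One CBR block: 3x3x3 valid convolution (C), batch normalization (B),
    rectified linear unit (R). x: 3-D input patch, w: 3x3x3 kernel.
    Single-channel sketch; the real network has multiple feature channels."""
    D, H, W = x.shape
    # valid convolution: no padding, so each output dimension shrinks by 2
    out = np.zeros((D - 2, H - 2, W - 2))
    for i in range(D - 2):
        for j in range(H - 2):
            for k in range(W - 2):
                out[i, j, k] = np.sum(x[i:i+3, j:j+3, k:k+3] * w)
    # batch normalization, collapsed to a single-sample normalization here
    out = gamma * (out - out.mean()) / np.sqrt(out.var() + eps) + beta
    # ReLU activation
    return np.maximum(out, 0.0)
```

An input patch of size \(8^3\) thus yields a \(6^3\) output, which is why the patch sizes in Table 1 exceed the output size of the network.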

As preprocessing, the CT images are resampled to an isotropic spacing of 1.5 mm using linear interpolation. For image normalization, we used intensity window normalization with an intensity window of [500, 2000] HU and clipping at the window boundaries. The network was trained by minimizing the cross entropy on mini-batches containing eight patches (each at three different resolutions) drawn from eight randomly selected images. In order to compensate for the class imbalance between background and rib voxels, we used the following randomized sampling strategy: 10% of the patch centers were sampled from the bounding box of the first rib pair, 10% from the bounding box of the twelfth rib pair and 30% from the bounding box of the intermediate ribs. The remaining 50% of the patch centers were sampled uniformly from the entire volume. As an update rule, we chose AdaDelta [8] in combination with a learning rate schedule. For data augmentation, the patches were randomly scaled and rotated around all three axes. The neural network was implemented with CNTK version 2.2 and trained for 1200 epochs on a GeForce GTX 1080. The network training could be completed within a few hours, and network inference times ranged from approximately 5 to 20 s, depending on the size of the input CT volume.
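The intensity window normalization described above amounts to clipping and rescaling; a short sketch (the function name is ours, and the resampling step is omitted):

```python
import numpy as np

def window_normalize(volume, lo=500.0, hi=2000.0):
    """Clip CT intensities to the window [lo, hi] HU and rescale to [0, 1].
    Values below lo map to 0, values above hi map to 1."""
    clipped = np.clip(volume.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)
```
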

Table 1. Input configuration of the network layers.

2.3 Centerline Extraction and Labeling

In order to robustly obtain rib centerlines, we designed an algorithm that specifically incorporates the available information from the multi-label probability map. It basically consists of four distinct steps:

  1. Determination of a rib-cage bounding box.
  2. Detection of an initial left and right rib.
  3. Tracing of the detected ribs and detecting neighboring ribs.
  4. Rib labeling.

Steps 1 to 3 are performed on the combined probability map, adding the results of the three non-background classes and limiting the sum to a total probability of 1.0, i.e. to each voxel \(v_{ijk}\) we assign the value \(q_{ijk} := \min \{ p_{ijk, 1} + p_{ijk, 2} + p_{ijk, 3}, 1 \}\).
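In a NumPy implementation this combination is a one-liner; the channel-first layout of `p` is our assumption:

```python
import numpy as np

def combined_rib_probability(p):
    """Combined probability q = min(p1 + p2 + p3, 1) per voxel.
    p has shape (4, D, H, W) with channels: background, first rib,
    twelfth rib, intermediate rib."""
    return np.minimum(p[1] + p[2] + p[3], 1.0)
```
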

Step 1: Bounding Box Detection. Generally, the given CT volume is assumed to cover at least a large portion of the rib cage, but may extend beyond it. Therefore, we first determine a search region in order to identify the visible ribs. Based on the axial image slices, a two-dimensional (2D) bounding rectangle is computed using a probability threshold of 0.5 on the combined probability map. To suppress spurious responses, we require a minimal 2D box size of \(30\,\text {mm} \times 10\,\text {mm}\) for a valid bounding box. From the set of valid 2D bounding boxes, a 3D bounding box is calculated from the largest connected stack in vertical direction. The 3D bounding box is, strictly speaking, not a box, but has inclined faces. Each of the four faces results from a linear regression of the slice-wise determined border positions, which has the advantage of being robust against outliers and of representing, to some extent, the narrowing of the rib cage from the abdomen to the shoulders (Fig. 2(a) and (b)).
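The per-face regression can be sketched as follows, assuming the 2D border positions have already been extracted slice by slice (function name and interface are ours):

```python
import numpy as np

def fit_inclined_face(slice_indices, border_positions):
    """Fit one inclined face of the 3-D bounding box: a linear regression
    of the per-slice border position (e.g. the left box border in mm)
    against the axial slice index. Returns a function z -> border(z)."""
    slope, intercept = np.polyfit(slice_indices, border_positions, deg=1)
    return lambda z: slope * z + intercept
```

Because the face is a regression line rather than the per-slice extrema, single outlier slices barely influence the resulting box.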

Step 2: Initial Rib Detection. From the approximate rib cage bounding box obtained in Step 1, we derive an initial cross-sectional search window to detect the ribs. For that purpose, anchor points \(a_l, a_r\) are chosen at 25% and 75% of the left-to-right extension of the box section at medium axial level. Then, sagittal 2D search regions of spatial extension \(100\,\text {mm} \times 100\,\text {mm}\) centered at \(a_l\) and \(a_r\) are defined (Fig. 2(a) and (b)). In each of these regions, an initialization point exceeding a probability of 0.5 is determined. We remark that this point may be located at the rib border. To locate the rib center, we sample the probability values in a spherical region of 15 mm diameter around the initialization point. Next, the probability weighted center of mass \(c_0\in \mathbb {R}^3\) and the probability weighted covariance matrix \(\varSigma _0 \in \mathbb {R}^{3\times 3}\) of the voxel coordinates are calculated. Finally, we use \(c_0\) as the rib center estimate and the eigenvector \(t_0\in \mathbb {R}^3\) corresponding to the largest eigenvalue of \(\varSigma _0\) as an estimate of the tangential direction. The position \(c_0\) is added to the list of rib centerline control points.
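The weighted center of mass and tangential direction can be computed as follows (a NumPy sketch with our own function name; the sign of the returned eigenvector is arbitrary):

```python
import numpy as np

def rib_center_and_tangent(coords, probs):
    """Probability-weighted center of mass c and tangential direction t.
    coords: (N, 3) voxel coordinates in the spherical region,
    probs:  (N,) probability weights from the combined map."""
    w = probs / probs.sum()
    center = (w[:, None] * coords).sum(axis=0)
    diff = coords - center
    # probability-weighted covariance: sum of weighted outer products
    cov = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigenvector of the largest eigenvalue approximates the rib direction
    tangent = eigvecs[:, np.argmax(eigvals)]
    return center, tangent
```

Since a rib is locally an elongated tube, its points spread mostly along the rib axis, so the dominant eigenvector of the covariance matrix is a natural tangent estimate.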

Fig. 2. (a) Neural network output (green: first rib; red: intermediate rib; blue: twelfth rib) and approximate three-dimensional (3D) bounding box of the rib cage (yellow) in coronal (top) and axial view (bottom). The lower image depicts in light blue the two search regions for rib detection. (b) Schematic representation of the vertical stack of two-dimensional bounding boxes (red) in coronal view and the approximate 3D bounding box of the rib cage resulting from the largest connected stack in vertical direction by linear regression (yellow). The dashed yellow line marks the box section at medium axial level. The two search regions used for initial rib detection are depicted in light blue. (c) Traced ribs (red) are shown on top of a sagittal cross-section of the probability map. The fan-like search regions for neighboring ribs are depicted in yellow. (Color figure online)

Step 3: Rib Tracing and Detection of Neighboring Ribs. Based on the initial rib detection result from Step 2, the rib response in the probability map is traced in an iterated scheme (\(i = 0,1,2,\ldots \)) consisting of the following three actions:

  (a) Starting from \(c_i\), move in tangential direction \(t_i\) until a voxel with combined probability value \(q_{ijk} < 0.5\) is encountered or a maximal moving distance of 7.5 mm is reached.

  (b) Calculate the weighted mean vector \(c_{i+1} \in \mathbb {R}^3\) in a spherical region around the current position. Add \(c_{i+1}\) to the list of rib centerline control points and move to \(c_{i+1}\).

  (c) Calculate the probability weighted covariance matrix \(\varSigma _{i+1} \in \mathbb {R}^{3 \times 3}\) in a spherical region around \(c_{i+1}\) and compute the tangential direction \(t_{i+1} \in \mathbb {R}^3\), see Step 2.

This scheme is iterated until the moving distance in the current iteration falls below a predefined threshold of 3.0 mm. In that case, a forward-looking mechanism is triggered which aims at bridging local drop-outs of the probability response. More precisely, the algorithm searches for a voxel with a combined probability value exceeding 0.5 within a cone-shaped region. This voxel then serves as continuation point for the iterative tracing procedure described above (Fig. 3).
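Action (a) of the scheme above can be sketched as follows. This is a simplified NumPy version that works in voxel units with nearest-neighbour probability lookup, whereas the paper operates in millimetres on the resampled volume; the function name and the unit step size are our assumptions:

```python
import numpy as np

def advance_along_tangent(q, pos, tangent, step=1.0, max_dist=7.5, thresh=0.5):
    """March from `pos` in direction `tangent` through the combined
    probability map `q` until the probability drops below `thresh`,
    the volume border is reached, or `max_dist` has been travelled."""
    t = tangent / np.linalg.norm(tangent)
    travelled = 0.0
    while travelled < max_dist:
        nxt = pos + step * t
        idx = tuple(np.round(nxt).astype(int))  # nearest-neighbour lookup
        out_of_bounds = any(i < 0 or i >= s for i, s in zip(idx, q.shape))
        if out_of_bounds or q[idx] < thresh:
            break
        pos, travelled = nxt, travelled + step
    return pos
```
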

Tracing from the initial starting point \(c_0\) is performed in both possible directions, and the results are finally concatenated, which yields a centerline of the full rib represented by the point sequence \(\{c_0,c_1,\ldots \}\). After the tracing of one rib is completed, the resulting centerline is inserted into the list of rib centerlines L, which is ordered in vertical direction from feet to head.

This collection is extended in a stepwise fashion by detecting adjacent, so far untraced ribs using fan-like 2D search regions anchored at the lowest and highest ribs contained in L (Fig. 2(c)). The initial location of the search fan is 10 mm distal from the rib's starting point at the spine. The rib's tangential vector at this point is used as the normal vector of the fan plane. The fan opening direction within this plane is steered by the intersection point of the previous rib with the fan plane. If only one traced rib is available so far, the fan simply points upward or downward. If a neighboring rib is found within the fan, the iterative scheme described above is applied to trace it. If not, the search fan is moved along the rib in 10 mm steps towards the distal end.

Fig. 3. Schematic representation of the iterative tracing algorithm. Each red point corresponds to a probability weighted mean vector \(c_i\) in the spherical region around the associated preceding position, which is depicted by a yellow point connected by a yellow arrow (see Step 3, b). The black arrows correspond to a movement in tangential direction \(t_i\) (see Step 3, a). The blue triangle depicts the cone-shaped search region used by the forward-looking mechanism. The rib centerline resulting from a spline interpolation of the control points \(c_i\) is depicted by the dashed red line. (Color figure online)

Step 4: Rib Labeling. After extraction of the centerlines, the average probability of all three non-background classes along the centerline is calculated for each found rib. In the optimal case, 12 rib pairs have been found, and the first and twelfth ribs have an average probability above 0.5 for their respective class. In this case, the intermediate ribs are labeled according to their position in the list L. If fewer than 12 ribs were traced, labeling is still possible if either the first or the twelfth rib can be identified. Labeling is not possible if neither the first nor the twelfth rib can be identified and fewer than 10 ribs were traced.
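The decision logic of Step 4 can be sketched as follows. This is a simplification in pure Python: it only derives the label range for one body side, and the fallback for ten or more traced ribs without an identified first or twelfth rib (counting from rib 1) is our assumption, not stated in the text:

```python
def label_ribs(n_ribs, has_first, has_twelfth):
    """Sketch of the Step 4 decision logic for one body side.
    n_ribs: number of traced ribs, ordered from head to feet.
    has_first/has_twelfth: whether the average class probability along
    the topmost/lowest centerline exceeds 0.5 for that class.
    Returns the anatomical labels (1 = first rib, ..., 12 = twelfth rib)
    or None if labeling is not possible."""
    if has_first:                    # count caudally, starting at rib 1
        return list(range(1, n_ribs + 1))
    if has_twelfth:                  # count cranially, ending at rib 12
        return list(range(13 - n_ribs, 13))
    if n_ribs >= 10:                 # assumption: fall back to counting from rib 1
        return list(range(1, n_ribs + 1))
    return None                      # neither end rib identified, too few ribs
```
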

Table 2. Mean, standard deviation, 25% quartile, median and 75% quartile of the sensitivity (Sens.), precision (Prec.) and Dice score (Dice) associated to the predicted class labels first rib, intermediate rib, twelfth rib and the combined class rib across the 4 cross validation test sets.

3 Results

Our pipeline was evaluated using 4-fold cross validation (CV). More precisely, the dataset was randomly shuffled and partitioned into four subsamples of sizes 28, 28, 28 and 29. We trained four different networks by using in each fold three subsamples as training data while retaining a single subsample for testing.

3.1 Multi-label Network

For the evaluation of a probability map \(p_{ijk} = (p_{ijk, 0}, p_{ijk, 1}, p_{ijk, 2}, p_{ijk, 3})\) generated by the neural network, we assigned to each voxel \(v_{ijk}\) a predicted class label \(L^\text {pred}_{ijk}\) based on its maximal class response, i.e. \(L^\text {pred}_{ijk} = {{\,\mathrm{argmax}\,}}_{c=0,1,2,3}\, p_{ijk, c}\). Following the naming convention from Sect. 2.2, the labels 0, 1, 2, 3 correspond to the classes background, first rib, twelfth rib and intermediate rib, respectively. Comparing the predicted class labels with the corresponding ground truth labels \(L^\text {GT}_{ijk}\), yields for each class the number of true positives (TP), false positives (FP), and false negatives (FN), i.e.

$$\begin{aligned} \text {TP}_C&= \vert \{ ijk ~ : ~ L^\text {GT}_{ijk} = C \text { and } L^\text {pred}_{ijk} = C \}\vert ,\\ \text {FP}_C&= \vert \{ ijk ~ : ~ L^\text {GT}_{ijk} \not = C \text { and } L^\text {pred}_{ijk} = C \}\vert ,\\ \text {FN}_C&= \vert \{ ijk ~ : ~ L^\text {GT}_{ijk} = C \text { and } L^\text {pred}_{ijk} \not = C \}\vert , \end{aligned}$$

where \(C \in \{0,1,2,3\}\) denotes the class under consideration. Henceforth, we will omit the class index C in order to simplify the notation. Based on these quantities we compute for each class sensitivity, precision and Dice as follows:

$$\begin{aligned} \begin{aligned} \text {sensitivity}&= \frac{\text {TP}}{\text {TP}+\text {FN}},\\ \text {precision}&= \frac{\text {TP}}{\text {TP}+\text {FP}},\\ \text {Dice}&= \frac{2\text {TP}}{2\text {TP}+\text {FP}+\text {FN}}. \end{aligned} \end{aligned}$$
(1)
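The counts and the measures from (1) can be computed directly from the voxel-wise label maps, for instance as follows (a small NumPy sketch; the function name is ours):

```python
import numpy as np

def class_metrics(gt, pred, c):
    """Sensitivity, precision and Dice for class label c, computed from
    ground truth and predicted label maps of equal shape, as in Eq. (1)."""
    tp = np.sum((gt == c) & (pred == c))
    fp = np.sum((gt != c) & (pred == c))
    fn = np.sum((gt == c) & (pred != c))
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    dice = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, precision, dice
```
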

Table 2 summarizes different descriptive statistics (mean, standard deviation, median and quartiles) of the measures from (1) calculated on the test sets from the 4-fold CV. In order to analyze the overall rib detection rate irrespective of the specific rib class, we introduced an additional label rib which was assigned to each non-background voxel.

As can be seen from Table 2, we obtain a good performance for the overall rib detection, captured, for example, by a median Dice of 0.87 for the label intermediate rib. Let us remark that for thin objects, such as the dilated rib centerlines, the Dice score constitutes a rather sensitive measure. The results indicate that detecting the first and twelfth rib pairs is more difficult for our network. While extraction of the first rib is more challenging due to, e.g., higher noise in the upper thorax and other bony structures in close vicinity (clavicle, shoulder blades, vertebrae), the twelfth rib can be extremely short or is sometimes even entirely missing. For further illustration, Fig. 4 shows the results on selected representative cases. Generally, the ribs are well detected without major false responses in other structures, despite all the different challenges present in the data. The color-coded highlighting of the multi-label detection reveals that the first and twelfth ribs are mostly correctly detected. In a few cases, the network wrongly generated strong responses of the classes first rib or twelfth rib for voxels belonging to the second or eleventh rib pair.

Table 3. Rib-wise evaluation of the method based on the final labeled centerline point sets. A detected rib centerline point counts as true positive only if the correct label was determined. The table shows the summary for the collected 113 cases and reports sensitivity (Sens.), precision (Prec.), point-to-line distance (Dist., in mm) and Dice score (Dice).
Table 4. Rib-wise evaluation of the method based on the final centerline point sets, irrespective of labeling. A detected rib centerline point counts as true positive if it is close enough to a ground truth centerline, irrespective of its label. The table shows the summary for the collected 113 cases and reports sensitivity (Sens.), precision (Prec.), point-to-line distance (Dist., in mm) and Dice score (Dice).
Table 5. Percentage of cases with missed labeled ribs. A rib counts as missed if less than half of the ground truth rib centerline could be detected. A detected rib centerline point counts as true positive only if the correct label was determined.

3.2 Rib Centerlines

For the evaluation of the final centerlines, both the ground truth lines and the automatically determined centerlines were resampled to a uniform point distance of 1.0 mm. A true-positive distance of \(\delta = 5.0\) mm was chosen: if no result point was found within \(\delta \) of a ground truth point (GTP), the GTP was counted as FN. Result points having a corresponding GTP within \(\delta \) were counted as TP, all others as FP. From the TP, FP, and FN values we calculated sensitivity, precision and Dice using (1).
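The matching scheme can be sketched as follows. Note this sketch matches points by nearest-neighbour point-to-point distance, whereas the paper's distance statistics use the point-to-line distance; the function names are ours:

```python
import numpy as np

def match_centerlines(gt_points, result_points, delta=5.0):
    """Count TP/FP/FN between resampled ground truth and result centerline
    points with true-positive distance delta (in mm).
    gt_points, result_points: arrays of shape (N, 3) and (M, 3)."""
    def nearest_dists(a, b):
        # for each point in a, the Euclidean distance to its nearest point in b
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        return d.min(axis=1)
    fn = int(np.sum(nearest_dists(gt_points, result_points) > delta))
    tp = int(np.sum(nearest_dists(result_points, gt_points) <= delta))
    fp = len(result_points) - tp
    return tp, fp, fn
```
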

Tables 3 and 4 summarize our results from the 4-fold cross-validation. The point-wise responses (TP, FP, FN) are aggregated over all cases. The evaluation measures are finally reported on a per-rib basis, as well as for all ribs. The Euclidean distance is measured as the point-to-line distance (in millimeters) between a result point and the ground truth line. Moreover, Table 5 contains the percentage of cases with missed labeled ribs. Here, a rib is counted as missed if less than half of the ground truth rib centerline could be detected. A detected rib centerline point counts as true positive only if the correct label was determined.

Fig. 4. Maximum intensity projections of selected computed tomography volumes overlaid with the multi-label output of the neural network (green: first rib; red: intermediate rib; blue: twelfth rib). The selected cases display common difficulties inherent in the data set, such as (a) pads or (b) cables; internal devices such as (c) pacemakers, (d) stents, (e) spinal implants, and (f) femoral/humeral implants; (g) injected contrast agents; (h) patient shape variations such as scoliosis; and limited fields of view, i.e. (i) a partly missing first rib or (j) a partly missing twelfth rib. (Color figure online)

Fig. 5. Automatically generated centerline splines associated with the fully convolutional neural network outputs displayed in Fig. 4. The selected cases display common difficulties inherent in the data set, such as (a) pads or (b) cables; internal devices such as (c) pacemakers, (d) stents, (e) spinal implants, and (f) femoral/humeral implants; (g) injected contrast agents; (h) patient shape variations such as scoliosis; and limited fields of view, i.e. (i) a partly missing first rib or (j) a partly missing twelfth rib.

With an average Euclidean distance error of 0.723 mm, we obtained an overall result that is generally better than what is reported in the state of the art. However, it needs to be kept in mind that the results are unfortunately not directly comparable, as both the data sets and the evaluation metrics differ significantly across prior work. Similarly to the results obtained on the probability maps, distance errors are significantly higher for the first and twelfth ribs compared to the rest of the rib cage. As discussed, this is caused by the intrinsic challenges of these ribs, but it is certainly also an effect of error propagation, in the sense that the quality of the probability maps also impacts centerline extraction. Interestingly, the results for the right ribs are generally slightly worse than for the left ribs, probably due to a slightly unbalanced data set with more challenges on the right side. Figure 5 shows the centerlines which were automatically generated by our walker algorithm from the corresponding network outputs displayed in Fig. 4.

4 Conclusion

We presented a fully automated two-stage approach for rib centerline extraction and labeling from CT images. First, multi-label probability maps (containing the classes first rib, twelfth rib, intermediate rib and background) are calculated using an FCNN, and then centerlines are extracted from this multi-label information using a tracing algorithm. For assessment, we performed a 4-fold cross validation on a set of 113 cases which includes several cases displaying typical clinical challenges. Comparing the automated extraction results to our manual ground truth annotations, we were able to achieve a Euclidean point-to-line distance error of 0.723 mm. The four-class label detection was crucial for simplifying rib labeling by taking the network responses associated with the classes first rib and twelfth rib into account. In contrast to other approaches, no strong anatomical prior knowledge, e.g. in the form of geometrical models, was explicitly encoded into our pipeline to deal with pathological deviations.

Future work will focus on improving the performance of the neural network by using motion field and registration based data augmentation techniques and a more advanced data-driven image preprocessing which aims at better separating foreground and background voxels.

We would like to mention that the approach outlined above was already extended by training the FCNN to segment a dilated spinal column centerline in addition to the dilated rib centerlines. This additional information is utilized by the walker algorithm in order to generate a spinal column centerline which is subsequently used to improve the performance of the rib tracing, for example by detecting wrongly labeled rib pairs.