
\(\hbox {C}^3\)Fusion: Consistent Contrastive Colon Fusion, Towards Deep SLAM in Colonoscopy

  • Conference paper
Shape in Medical Imaging (ShapeMI 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14350)

Abstract

3D colon reconstruction from Optical Colonoscopy (OC) to detect non-examined surfaces remains an unsolved problem. The challenges arise from the nature of optical colonoscopy data, which is characterized by highly reflective low-texture surfaces, drastic illumination changes, and frequent tracking loss. Recent methods demonstrate compelling results, but suffer from: (1) frangible frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures; or (2) reliance on point-based representations at the cost of scan quality. In this paper, we propose a novel reconstruction framework that addresses these issues end to end, resulting in 3D colon reconstruction that is both quantitatively and qualitatively accurate and robust. Our SLAM approach employs correspondences based on contrastive deep features and deep consistent depth maps; it estimates globally optimized poses, recovers from frequent tracking failures, and builds a globally consistent 3D model, all within a single framework. We perform an extensive experimental evaluation on multiple synthetic and real colonoscopy videos, showing high-quality results and comparisons against relevant baselines.

References

  1. Alyabsi, M., Algarni, M., Alshammari, K.: Trends in colorectal cancer incidence rates in Saudi Arabia (2001–2016) using Saudi national registry: Early- versus late-onset disease. Front. Oncol. 11, 3392 (2021)


  2. Bian, J., et al.: Unsupervised scale-consistent depth and ego-motion learning from monocular video. In: NeurIPS (2019)


3. International Agency for Research on Cancer: Globocan 2020: Cancer fact sheets - colorectal cancer. https://gco.iarc.fr/today/data/factsheets/cancers/10_8_9-Colorectum-fact-sheet.pdf

4. Chen, H.X., Li, K., Fu, Z., Liu, M., Chen, Z., Guo, Y.: Distortion-aware monocular depth estimation for omnidirectional images. IEEE Signal Process. Lett. 28, 334–338 (2021)


  5. Chen, K., et al.: MMDetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)

  6. Chen, R.J., Bobrow, T.L., Athey, T., Mahmood, F., Durr, N.J.: Slam endoscopy enhanced by adversarial depth prediction. In: KDD Workshop on Applied Data Science for Healthcare 2019 (2019)


  7. Choi, S., Zhou, Q.Y., Koltun, V.: Robust reconstruction of indoor scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)


  8. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996)


  9. Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: Bundlefusion: real-time globally consistent 3d reconstruction using on-the-fly surface re-integration. CoRR (2016)


  10. Dai, J., et al.: Deformable convolutional networks. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 764–773 (2017)


  11. DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: self-supervised interest point detection and description. CoRR (2017). http://arxiv.org/abs/1712.07629

  12. Engel, J., Koltun, V., Cremers, D.: Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. (2018)


13. Godard, C., Mac Aodha, O., Firman, M., Brostow, G.J.: Digging into self-supervised monocular depth estimation. In: ICCV (2019)


  14. Gower, J.: Generalized procrustes analysis. Psychometrika 40(1), 33–51 (1975)


  15. Grisetti, G., Kümmerle, R., Stachniss, C., Burgard, W.: A tutorial on graph-based slam. IEEE Intell. Transp. Syst. Mag. 2(4), 31–43 (2010)


  16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016)


  17. Jau, Y.Y., Zhu, R., Su, H., Chandraker, M.: Deep keypoint-based camera pose estimation with geometric constraints. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4950–4957 (2020). https://doi.org/10.1109/IROS45743.2020.9341229

  18. Kabsch, W.: A solution for the best rotation to relate two sets of vectors. Acta Crystallogr. A 32(5), 922–923 (1976)


  19. Kumar, V.R., et al.: Fisheyedistancenet: self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving. 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 574–581 (2020)


20. Liang, Z., Richards, R.: Virtual colonoscopy vs optical colonoscopy. Expert Opin. Med. Diagn. 4(2), 159–169 (2010). PMID: 20473367


  21. Lin, T.Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017)


  22. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3d surface construction algorithm. SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)


  23. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)


  24. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)


  25. Ma, R., et al.: Colon10k: a benchmark for place recognition in colonoscopy. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 1279–1283 (2021). https://doi.org/10.1109/ISBI48211.2021.9433780

  26. Ma, R., et al.: Rnnslam: reconstructing the 3d colon to visualize missing regions during a colonoscopy. Med. Image Anal. 72, 102100 (2021)


27. Mirzaei, H., Panahi, M., Etemad, K., Ghanbari-Motlagh, A., Holakouie-Naini, K.A.: Evaluation of pilot colorectal cancer screening programs in Iran. Iranian J. Epidem. 12(3) (2016)


28. Mohaghegh, P., Ahmadi, F., Shiravandi, M., Nazari, J.: Participation rate, risk factors, and incidence of colorectal cancer in the screening program among the population covered by the health centers in Arak, Iran. Inter. J. Cancer Manag. 14(7), e113278 (2021)


  29. Moshfeghi, K., Mohammadbeigi, A., Hamedi-Sanani, D., Bahrami, M.: Evaluation the role of nutritional and individual factors in colorectal cancer. Zahedan J. Res. Med. Sci. 13(4), e93934 (2011)


  30. Mur-Artal, R., Montiel, J.M.M., Tardós, J.D.: ORB-SLAM: a versatile and accurate monocular SLAM system. CoRR (2015). http://arxiv.org/abs/1502.00956

  31. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. ArXiv (2018)


  32. Ozyoruk, K.B., et al.: Endoslam dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med. Image Anal. 71, 102058 (2021)


33. Rau, A., et al.: Implicit domain adaptation with conditional generative adversarial networks for depth prediction in endoscopy. Int. J. Comput. Assist. Radiol. Surg. 14 (2019)


  34. Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. CoRR (2014). http://arxiv.org/abs/1409.0575

  35. Shao, S., et al.: Self-supervised monocular depth and ego-motion estimation in endoscopy: Appearance flow to the rescue. Med. Image Anal., 102338 (2021)


36. Smith, K., et al.: Data from CT colonography. The Cancer Imaging Archive (2015). https://doi.org/10.7937/K9/TCIA.2015.NWTESAY1

  37. Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS 2016, pp. 1857–1865. Curran Associates Inc., Red Hook, NY, USA (2016)


38. Widya, A.R., Monno, Y., Okutomi, M., Suzuki, S., Gotoda, T., Miki, K.: Learning-based depth and pose estimation for monocular endoscope with loss generalization. CoRR abs/2107.13263 (2021)

  39. Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance-level discrimination. ArXiv (2018)


  40. Yao, H., Stidham, R.W., Gao, Z., Gryak, J., Najarian, K.: Motion-based camera localization system in colonoscopy videos. Med. Image Anal. 73, 102180 (2021)


  41. Zhang, S., Zhao, L., Huang, S., Ye, M., Hao, Q.: A template-based 3d reconstruction of colon structures and textures from stereo colonoscopic images. IEEE Trans. Med. Robotics Bionics 3(1), 85–95 (2021)


42. Zhang, Y., et al.: ColDE: a depth estimation framework for colonoscopy reconstruction (2021)


  43. Zhang, Y., Wang, S., Ma, R., McGill, S.K., Rosenman, J.G., Pizer, S.M.: Lighting enhancement aids reconstruction of colonoscopic surfaces (2021)


44. Zhang, Z., Scaramuzza, D.: A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7244–7251 (2018)


  45. Zhou, Q.Y., Koltun, V.: Dense scene reconstruction with points of interest. ACM Trans. Graph. 32 (2013)


  46. Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. CoRR (2017). http://arxiv.org/abs/1704.07813

Author information

Corresponding author

Correspondence to Erez Posner.

A. Appendix

A.1 Synthetic Data Generation

For the purpose of reproducibility, we state the parameters that were used to build each synthetic sequence using the synthetic colonoscopy simulator [41]. The parameters are summarised in Table 4, where RP stands for Random Path. Note that the simulator does not allow the user to set the seed of the random number generator for the chosen random path.

A.2 Depth Training and Implementation Details

We use the AdamW optimizer [23] with \(\beta _1 = 0.9\), \(\beta _2 = 0.999\). We train the synthetic and Colon10k models for 40 epochs with a batch size of 16 on a 24 GB Nvidia RTX 3090. The initial learning rate is \(10^{-4}\); we halve it at the 16th, 24th and 32nd epochs. For the 3D colon print model, we train for 200 epochs and halve the learning rate at the 80th, 120th and 170th epochs. We center-crop the synthetic images to \(400\times 400\) to remove vignetting effects. The Colon10k images are provided in an undistorted and center-cropped version of \(270\times 216\) pixels. Finally, the cropped image is scaled to \(224\times 224\) before being fed to the network. For the 3D colon print, we employ test-time training due to the scarcity of the data and the fact that the training process is completely self-supervised.
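
As a concrete illustration, the following PyTorch sketch sets up the optimizer and learning-rate schedule described above for the synthetic/Colon10k setting; the network is a placeholder and the training-loop body is omitted.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder network standing in for the actual depth/pose models.
depth_net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1))

# AdamW [23] with beta1 = 0.9, beta2 = 0.999 and an initial learning rate of 1e-4.
optimizer = AdamW(depth_net.parameters(), lr=1e-4, betas=(0.9, 0.999))

# Halve the learning rate at the 16th, 24th and 32nd epochs (40 epochs total);
# for the 3D colon print model the milestones would be 80, 120 and 170 over 200 epochs.
scheduler = MultiStepLR(optimizer, milestones=[16, 24, 32], gamma=0.5)

for epoch in range(40):
    # ... one pass over the training data with batch size 16 goes here ...
    scheduler.step()
```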

To generate the specular reflection mask for each frame, we convert the input frame to the YUV color space, apply a threshold of 90% on the Y channel, and dilate the resulting binary mask with a kernel of 13 pixels.
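
A minimal OpenCV sketch of this mask generation, assuming 8-bit RGB input frames and a square 13-pixel structuring element (the exact kernel shape is not specified above):

```python
import cv2
import numpy as np

def specular_mask(frame_rgb: np.ndarray) -> np.ndarray:
    """Binary specular-reflection mask for an 8-bit RGB frame."""
    # Convert to the YUV color space and take the luma (Y) channel.
    y = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2YUV)[:, :, 0]
    # Threshold at 90% of the intensity range (0.9 * 255 for 8-bit input).
    _, mask = cv2.threshold(y, int(0.9 * 255), 255, cv2.THRESH_BINARY)
    # Dilate with a 13-pixel kernel to cover the borders of the highlights.
    return cv2.dilate(mask, np.ones((13, 13), np.uint8))
```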

We use OpenMMLab's [5] implementations of ResNet [16], deformable convolutions and FPN. All ResNet encoders and the FPN were pre-trained on ImageNet [34]. We use ResNet50 for the depth encoder; for the pose encoder and FPN, we use ResNet18. Deformable convolution layers are applied in the conv3, conv4 and conv5 stages of the depth encoder. We set \(\lambda _{ph-extra}=0.1\), \(\lambda _{dc}=0.1\) and \(\tau =0.01\).
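
The deformable convolutions come from OpenMMLab's implementation; purely as an illustration of the pattern, the sketch below (using torchvision, which is an assumption and not the authors' code) shows a 3x3 deformable convolution whose sampling offsets are predicted by a plain convolution, of the kind that could replace the 3x3 convolutions in the conv3-conv5 stages of the depth encoder:

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableConv3x3(nn.Module):
    """3x3 deformable convolution [10] with offsets predicted by a regular conv."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Two offset channels (dx, dy) per kernel position: 2 * 3 * 3 = 18.
        self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, stride=stride, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))
```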

A.3 Correspondence Matching Qualitative Results

Matching examples of ContraFeat, SIFT [24] and SuperPoint [11] are shown in Fig. 8. ContraFeat tends to produce more correct matches that are spread evenly throughout the image, and is less susceptible to drastic illumination changes.

Fig. 8.

Matching qualitative comparison on the synthetic data. Correct matches and mismatches are drawn as lines in different colors. A match is counted as a mismatch when its correspondence re-projection error is greater than \(1\%\) of the colon's diameter. (Color figure online)

A.4 Comparison of the Estimated Trajectories and Ground Truth Trajectories

Fig. 9 compares the estimated trajectories and the ground truth trajectory on the 3D colon print for DSO [12], our framework using SuperPoint [11], and our proposed method. The pose estimation from the network has arbitrary scale; therefore, we first align the trajectories using a similarity transform [18], followed by first-frame alignment for better visualization and comparison. Note that the trajectory estimated by our framework is more accurate, with loops whose shape closely matches the ground truth trajectory.
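
The similarity alignment step can be sketched as the standard closed-form SVD solution over corresponding camera positions (in the spirit of [18]); the NumPy function below is an illustrative implementation, not the exact evaluation code:

```python
import numpy as np

def align_sim3(src: np.ndarray, dst: np.ndarray):
    """Similarity alignment of an estimated trajectory to ground truth.
    src, dst: (N, 3) arrays of corresponding camera positions.
    Returns (s, R, t) such that s * R @ src_i + t approximates dst_i."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # Cross-covariance and its SVD give the optimal rotation.
    cov = dst_c.T @ src_c / src.shape[0]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # reflection correction
    R = U @ S @ Vt
    # Scale that minimizes the squared alignment error.
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```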

Fig. 9.

Comparison of the estimated trajectories and ground truth trajectories on the 3D colon print sequence.

A.5 Extra Qualitative Depth-Map Prediction Results

In Fig. 10 and Fig. 11 we show extra depth-map predictions of the 3D colon print and Colon10K [25].

A.6 Extra Qualitative 3D Reconstruction Results on Colon10K

In Fig. 12 and Fig. 13 we show extra points-of-view of the 3D reconstructions by our proposed framework on Colon10K [25] and 3D colon print data.

Table 4. Synthetic data creation parameters.

A.7 SuperPoint Training

SuperPoint [11] was trained using the PyTorch implementation of [17], with their suggested improvements that enable end-to-end training: a softargmax at the detector head and a sparse descriptor loss that allows efficient training. Photometric augmentations were adapted to the colon dataset by lowering the contrast, blur, and noise levels to values that enabled the extraction of features even from deeply shadowed areas of the colon. The network was trained for about 100 epochs with a batch size of 10 and a learning rate of 0.0003. The best checkpoint was chosen based on validation-set precision and recall.
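
To illustrate the softargmax used at the detector head (one of the end-to-end training improvements of [17]), the following is a minimal differentiable 2D soft-argmax over a score map; window-based refinement around local maxima and the sparse descriptor loss are omitted, and the temperature is an assumed hyper-parameter:

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(scores: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Differentiable expected (x, y) location from a (B, H, W) keypoint score map."""
    B, H, W = scores.shape
    probs = F.softmax(scores.view(B, -1) / temperature, dim=-1).view(B, H, W)
    ys = torch.arange(H, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(W, dtype=probs.dtype, device=probs.device)
    exp_y = (probs.sum(dim=2) * ys).sum(dim=1)  # expectation over rows
    exp_x = (probs.sum(dim=1) * xs).sum(dim=1)  # expectation over columns
    return torch.stack([exp_x, exp_y], dim=-1)  # (B, 2)
```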

Fig. 10.

Extra depth map prediction results on 3D colon print.

Fig. 11.

Extra depth map prediction results on Colon10K [25]. The left image exhibits a highly specular area with strong motion blur. The middle image exhibits strong illumination differences. The right image exhibits low-texture surfaces. In all three examples, our depth network produces detailed and artifact-free depth maps.

Fig. 12.

Extra reconstruction qualitative results on Colon10K [25] and 3D colon print.

Fig. 13.

Extra points-of-view of the 3D reconstruction results on the Colon10K dataset: proposed framework (top), mesh reconstructed from depth and pose predictions by Godard et al. [13] (bottom).

Fig. 14.

Captured video and re-rendered reconstruction model similarity.

A.8 Supplementary Video Results

In the supplementary video, labeled rgb_tex_geo.mp4, we show the full endoscopic investigation of the 3D colon print while comparing the resemblance between the reconstructed model and the captured RGB images. This is accomplished by re-rendering the reconstructed model using the camera intrinsics, the predicted camera poses and the framework's output mesh. In the video rgb_tex_geo.mp4 we visualise the captured video (left) next to the re-rendered reconstructed model with texture (right). An example can be seen in Fig. 14. An additional camera fly-through video is available, labeled fly_through.mkv, showing the final reconstruction of the 3D colon print.
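
The geometric core of this re-rendering is the pinhole projection of the reconstructed mesh with the camera intrinsics and a predicted world-to-camera pose; the NumPy sketch below only illustrates that projection (a full rasterizer with texture is required for the actual video) and is not the authors' rendering pipeline:

```python
import numpy as np

def project_vertices(vertices: np.ndarray, K: np.ndarray, T_cw: np.ndarray) -> np.ndarray:
    """Project (N, 3) mesh vertices into the image with intrinsics K (3x3)
    and world-to-camera pose T_cw (4x4); returns pixel coordinates (M, 2)."""
    V = np.hstack([vertices, np.ones((vertices.shape[0], 1))])  # homogeneous coords
    cam = (T_cw @ V.T).T[:, :3]                                 # camera-frame points
    cam = cam[cam[:, 2] > 0]                                    # keep points in front of the camera
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]                               # perspective divide
```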

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Posner, E., Zholkover, A., Frank, N., Bouhnik, M. (2023). \(\hbox {C}^3\)Fusion: Consistent Contrastive Colon Fusion, Towards Deep SLAM in Colonoscopy. In: Wachinger, C., Paniagua, B., Elhabian, S., Li, J., Egger, J. (eds) Shape in Medical Imaging. ShapeMI 2023. Lecture Notes in Computer Science, vol 14350. Springer, Cham. https://doi.org/10.1007/978-3-031-46914-5_2

  • DOI: https://doi.org/10.1007/978-3-031-46914-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46913-8

  • Online ISBN: 978-3-031-46914-5
