Abstract
3D colon reconstruction from Optical Colonoscopy (OC) to detect non-examined surfaces remains an unsolved problem. The challenges arise from the nature of optical colonoscopy data, characterized by highly reflective, low-texture surfaces, drastic illumination changes and frequent tracking loss. Recent methods demonstrate compelling results, but suffer from either (1) fragile frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures, or (2) reliance on point-based representations at the cost of scan quality. In this paper, we propose a novel reconstruction framework that addresses these issues end to end and yields quantitatively and qualitatively accurate and robust 3D colon reconstructions. Our SLAM approach, which employs correspondences based on contrastive deep features and deep consistent depth maps, estimates globally optimized poses, is able to recover from frequent tracking failures, and produces a globally consistent 3D model, all within a single framework. We perform an extensive experimental evaluation on multiple synthetic and real colonoscopy videos, showing high-quality results and comparisons against relevant baselines.
References
Alyabsi, M., Algarni, M., Alshammari, K.: Trends in colorectal cancer incidence rates in Saudi Arabia (2001–2016) using Saudi national registry: Early- versus late-onset disease. Front. Oncol. 11, 3392 (2021)
Bian, J., et al.: Unsupervised scale-consistent depth and ego-motion learning from monocular video. In: NeurIPS (2019)
International Agency for Research on Cancer: Globocan 2020: Cancer fact sheets - colorectal cancer. https://gco.iarc.fr/today/data/factsheets/cancers/10_8_9-Colorectum-fact-sheet.pdf
Chen, H.X., Li, K., Fu, Z., Liu, M., Chen, Z., Guo, Y.: Distortion-aware monocular depth estimation for omnidirectional images. IEEE Signal Process. Lett. 28, 334–338 (2021)
Chen, K., et al.: MMDetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
Chen, R.J., Bobrow, T.L., Athey, T., Mahmood, F., Durr, N.J.: Slam endoscopy enhanced by adversarial depth prediction. In: KDD Workshop on Applied Data Science for Healthcare 2019 (2019)
Choi, S., Zhou, Q.Y., Koltun, V.: Robust reconstruction of indoor scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996)
Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: Bundlefusion: real-time globally consistent 3d reconstruction using on-the-fly surface re-integration. CoRR (2016)
Dai, J., et al.: Deformable convolutional networks. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 764–773 (2017)
DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: self-supervised interest point detection and description. CoRR (2017). http://arxiv.org/abs/1712.07629
Engel, J., Koltun, V., Cremers, D.: Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. (2018)
Godard, C., Mac Aodha, O., Firman, M., Brostow, G.J.: Digging into self-supervised monocular depth prediction (October 2019)
Gower, J.: Generalized procrustes analysis. Psychometrika 40(1), 33–51 (1975)
Grisetti, G., Kümmerle, R., Stachniss, C., Burgard, W.: A tutorial on graph-based slam. IEEE Intell. Transp. Syst. Mag. 2(4), 31–43 (2010)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016)
Jau, Y.Y., Zhu, R., Su, H., Chandraker, M.: Deep keypoint-based camera pose estimation with geometric constraints. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4950–4957 (2020). https://doi.org/10.1109/IROS45743.2020.9341229
Kabsch, W.: A solution for the best rotation to relate two sets of vectors. Acta Crystallogr. A 32(5), 922–923 (1976)
Kumar, V.R., et al.: Fisheyedistancenet: self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving. 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 574–581 (2020)
Liang, Z., Richards, R.: Virtual colonoscopy vs optical colonoscopy. Expert Opin. Med. Diagn. 4(2), 159–169 (2010). PMID: 20473367
Lin, T.Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017)
Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3d surface construction algorithm. SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)
Ma, R., et al.: Colon10k: a benchmark for place recognition in colonoscopy. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 1279–1283 (2021). https://doi.org/10.1109/ISBI48211.2021.9433780
Ma, R., et al.: Rnnslam: reconstructing the 3d colon to visualize missing regions during a colonoscopy. Med. Image Anal. 72, 102100 (2021)
Mirzaei, H., Panahi, M., Etemad, K., Ghanbari-Motlagh, A., Holakouie-Naini, K.A.: Evaluation of pilot colorectal cancer screening programs in Iran. Iranian J. Epidem. 12(3) (2016)
Mohaghegh, P., Ahmadi, F., Shiravandi, M., Nazari, J.: Participation rate, risk factors, and incidence of colorectal cancer in the screening program among the population covered by the health centers in Arak, Iran. Inter. J. Cancer Manag. 14(7), e113278 (2021)
Moshfeghi, K., Mohammadbeigi, A., Hamedi-Sanani, D., Bahrami, M.: Evaluation the role of nutritional and individual factors in colorectal cancer. Zahedan J. Res. Med. Sci. 13(4), e93934 (2011)
Mur-Artal, R., Montiel, J.M.M., Tardós, J.D.: ORB-SLAM: a versatile and accurate monocular SLAM system. CoRR (2015). http://arxiv.org/abs/1502.00956
van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. ArXiv (2018)
Ozyoruk, K.B., et al.: Endoslam dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med. Image Anal. 71, 102058 (2021)
Rau, A., et al.: Implicit domain adaptation with conditional generative adversarial networks for depth prediction in endoscopy. Int. J. Comput. Assist. Radiol. Surg. 14 (2019)
Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. CoRR (2014). http://arxiv.org/abs/1409.0575
Shao, S., et al.: Self-supervised monocular depth and ego-motion estimation in endoscopy: Appearance flow to the rescue. Med. Image Anal., 102338 (2021)
Smith, K., et al.: Data from ct colonography. the cancer imaging archive (2015). https://doi.org/10.7937/K9/TCIA.2015.NWTESAY1
Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS 2016, pp. 1857–1865. Curran Associates Inc., Red Hook, NY, USA (2016)
Widya, A.R., Monno, Y., Okutomi, M., Suzuki, S., Gotoda, T., Miki, K.: Learning-based depth and pose estimation for monocular endoscope with loss generalization. CoRR abs/2107.13263 (2021)
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance-level discrimination. ArXiv (2018)
Yao, H., Stidham, R.W., Gao, Z., Gryak, J., Najarian, K.: Motion-based camera localization system in colonoscopy videos. Med. Image Anal. 73, 102180 (2021)
Zhang, S., Zhao, L., Huang, S., Ye, M., Hao, Q.: A template-based 3d reconstruction of colon structures and textures from stereo colonoscopic images. IEEE Trans. Med. Robotics Bionics 3(1), 85–95 (2021)
Zhang, Y., et al.: ColDE: a depth estimation framework for colonoscopy reconstruction (2021)
Zhang, Y., Wang, S., Ma, R., McGill, S.K., Rosenman, J.G., Pizer, S.M.: Lighting enhancement aids reconstruction of colonoscopic surfaces (2021)
Zhang, Z., Scaramuzza, D.: A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7244–7251 (2018)
Zhou, Q.Y., Koltun, V.: Dense scene reconstruction with points of interest. ACM Trans. Graph. 32 (2013)
Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. CoRR (2017). http://arxiv.org/abs/1704.07813
A. Appendix
1.1 A.1 Synthetic Data Generation
For the purpose of reproducibility, we state the parameters that were used to build each synthetic sequence using the synthetic colonoscopy simulator [41]. The parameters are summarised in Table 4, where RP stands for Random Path. Note that the simulator does not allow the user to set the seed of the random number generator that selects the random path.
1.2 A.2 Depth Training and Implementation Details
We use the AdamW optimizer [23], with \(\beta _1 = 0.9\), \(\beta _2 = 0.999\). We train the synthetic and the Colon10k models for 40 epochs, with a batch size of 16 on a 24 GB Nvidia RTX 3090. The initial learning rate is \(10^{-4}\); we halve it at the 16th, 24th and 32nd epochs. For the 3D colon print model, we train for 200 epochs and halve the learning rate at the 80th, 120th and 170th epochs. We center-crop the synthetic images to \(400\times 400\) to remove vignetting effects. The Colon10k images are provided in an undistorted, center-cropped version of \(270\times 216\) pixels. Finally, the cropped image is scaled to \(224\times 224\) before being fed to the network. For the 3D colon print, we employ test-time training due to the scarcity of the data and the fact that the training process is completely self-supervised.
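For reference, the optimizer and step-wise learning-rate schedule above can be set up in PyTorch roughly as follows; this is a minimal sketch that only mirrors the stated hyper-parameters (the `model` placeholder and the loop body are assumptions, not the authors' training code).

```python
import torch

# Minimal sketch of the stated setup: AdamW (beta1=0.9, beta2=0.999),
# initial LR 1e-4, halved at epochs 16, 24 and 32, trained for 40 epochs.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3))  # placeholder network

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[16, 24, 32], gamma=0.5)

for epoch in range(40):
    # ... iterate over batches of size 16, compute the self-supervised losses,
    #     call optimizer.step() per batch ...
    scheduler.step()  # advance the LR schedule once per epoch
```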
To generate the specular reflection mask for each frame, we convert the input frames to YUV color-space and apply a threshold of 90% on the Y channel and dilate the resulting binary mask with a kernel of 13 pixels.
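The mask computation described above can be sketched with OpenCV as follows; the exact thresholding convention (a fixed 90% of the 8-bit luma range rather than a per-frame percentile) and the function name are assumptions.

```python
import cv2
import numpy as np

def specular_mask(frame_bgr: np.ndarray) -> np.ndarray:
    """Sketch of the specular-reflection mask: threshold the Y (luma) channel
    at 90% of its range and dilate the binary mask with a 13-pixel kernel."""
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV)
    y = yuv[..., 0]
    mask = (y >= 0.9 * 255).astype(np.uint8)   # 90% threshold on the Y channel
    kernel = np.ones((13, 13), np.uint8)       # 13-pixel dilation kernel
    return cv2.dilate(mask, kernel)
```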
We use MMLab’s [5] implementation of ResNet [16], deformable convolutions and FPN. All ResNet encoders and the FPN were pre-trained on ImageNet [34]. We use ResNet50 for the depth encoder. For the pose encoder and FPN, we use ResNet18. Deformable convolution layers are applied in the depth encoder stages of conv3, conv4 and conv5. We set \(\lambda _{ph-extra}=0.1\) and \(\lambda _{dc}=0.1\), \(\tau =0.01\).
1.3 A.3 Correspondence Matching Qualitative Results
Matching examples of ContraFeat, SIFT [24] and SuperPoint [11] are shown in Fig. 8. ContraFeat tends to produce more correct matches that are spread evenly throughout the image, and is less susceptible to drastic illumination changes.
1.4 A.4 Comparison of the Estimated Trajectories and Ground Truth Trajectories
Fig. 9 compares the estimated trajectory and the ground truth trajectory on the 3D colon print for DSO [12], our framework using SuperPoint [11] and our proposed method. The pose estimation from the network is of arbitrary scale. Therefore, we first align the two trajectories using a similarity transform [18], followed by first-frame alignment for better visualization and comparison. Note that the trajectory estimated by our framework is more accurate, with loops of shape similar to the ground truth trajectory.
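The similarity-transform alignment step can be sketched with the standard closed-form Umeyama/Procrustes solution; this is a generic implementation under the stated assumptions (estimated trajectory X aligned to ground truth Y), not the authors' exact code.

```python
import numpy as np

def umeyama_alignment(X: np.ndarray, Y: np.ndarray):
    """Similarity (sim(3)) alignment of estimated positions X (N x 3) to
    ground-truth positions Y (N x 3): finds s, R, t such that Y ~ s * R @ x + t."""
    mu_x, mu_y = X.mean(0), Y.mean(0)
    Xc, Yc = X - mu_x, Y - mu_y
    cov = Yc.T @ Xc / X.shape[0]                 # cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid reflections
        S[2, 2] = -1
    R = U @ S @ Vt
    var_x = (Xc ** 2).sum() / X.shape[0]
    s = np.trace(np.diag(D) @ S) / var_x          # scale
    t = mu_y - s * R @ mu_x                       # translation
    return s, R, t
```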
1.5 A.5 Extra Qualitative Depth-Map Prediction Results
In Fig. 10 and Fig. 11 we show additional depth-map predictions on the 3D colon print and Colon10K [25] data.
1.6 A.6 Extra Qualitative 3D Reconstruction Results on Colon10K
In Fig. 12 and Fig. 13 we show additional viewpoints of the 3D reconstructions produced by our proposed framework on the Colon10K [25] and 3D colon print data.
1.7 A.7 SuperPoint Training
SuperPoint [11] was trained using the PyTorch implementation of [17], with their suggested improvements that enable end-to-end training: a soft-argmax at the detector head and a sparse descriptor loss that allows efficient training. Photometric augmentations were adapted to the colon dataset by lowering the contrast, blur and noise levels to values that still enable feature extraction even from the deeper, shadowed areas of the colon. The network was trained for about 100 epochs, with a batch size of 10 and a learning rate of 0.0003. The best checkpoint was chosen based on validation-set precision and recall.
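For illustration, the toned-down photometric augmentations could look roughly like the following torchvision sketch applied to float image tensors in [0, 1]; the concrete strengths are illustrative assumptions, not the values actually used for the colon data.

```python
import torch
from torchvision import transforms

# Hedged sketch of reduced-strength photometric augmentations: mild contrast
# jitter, mild blur and low additive noise so features survive in dark regions.
colon_photometric = transforms.Compose([
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 0.5)),
    transforms.Lambda(lambda x: (x + 0.01 * torch.randn_like(x)).clamp(0, 1)),
])
```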
1.8 A.8 Supplementary Video Results
In the supplementary video rgb_tex_geo.mp4 we show a full endoscopic investigation of the 3D colon print, comparing the resemblance between the reconstructed model and the captured RGB images. This is accomplished by re-rendering the reconstructed model using the camera intrinsics, the predicted camera poses and the framework's output mesh. The video visualises the captured frames (left) next to the re-rendered, textured reconstruction (right); an example can be seen in Fig. 14. An additional camera fly-through video, labeled fly_through.mkv, shows the final reconstruction of the 3D colon print.
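Conceptually, the re-rendering step amounts to placing a virtual camera with the known intrinsics at each predicted pose and rasterizing the output mesh. A minimal sketch using pyrender is shown below; the library choice, placeholder intrinsics and variable names are assumptions for illustration, not the authors' tooling.

```python
import numpy as np
import pyrender
import trimesh

# Hedged re-rendering sketch: render the reconstructed mesh from each predicted
# camera pose with the known intrinsics, for side-by-side comparison with the
# captured RGB frames. Note pyrender uses the OpenGL camera convention, so
# OpenCV-style poses need the usual y/z axis flip.
fx = fy = 200.0; cx = cy = 112.0; W = H = 224        # placeholder intrinsics
mesh = trimesh.load('reconstruction.ply')             # framework's output mesh
scene = pyrender.Scene(ambient_light=np.ones(3))
scene.add(pyrender.Mesh.from_trimesh(mesh))

camera = pyrender.IntrinsicsCamera(fx=fx, fy=fy, cx=cx, cy=cy)
cam_node = scene.add(camera, pose=np.eye(4))
renderer = pyrender.OffscreenRenderer(viewport_width=W, viewport_height=H)

predicted_poses = [np.eye(4)]                          # 4x4 camera-to-world poses
for T_wc in predicted_poses:
    scene.set_pose(cam_node, pose=T_wc)
    color, depth = renderer.render(scene)              # compare against captured RGB
```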