HMD-EgoPose: head-mounted display-based egocentric marker-less tool and hand pose estimation for augmented surgical guidance

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

The success or failure of modern computer-assisted surgery procedures hinges on the precise six-degree-of-freedom (6DoF) position and orientation (pose) estimation of tracked instruments and tissue. In this paper, we present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation and demonstrate state-of-the-art performance on a benchmark dataset for monocular red-green-blue (RGB) 6DoF marker-less hand and surgical instrument pose tracking. Further, we show that our HMD-EgoPose framework enables performant 6DoF pose estimation on a commercially available optical see-through head-mounted display (OST-HMD) through a low-latency streaming approach.

Methods

Our framework utilized an efficient convolutional neural network (CNN) backbone for multi-scale feature extraction and a set of subnetworks to jointly learn the 6DoF pose representation of the rigid surgical drill instrument and the grasping orientation of the hand of a user. To make our approach accessible to a commercially available OST-HMD, the Microsoft HoloLens 2, we created a pipeline for low-latency video and data communication with a high-performance computing workstation capable of optimized network inference.
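A minimal sketch of this single-shot design is shown below: a shared EfficientNet-style backbone feeding small subnetworks that jointly regress the drill's 6DoF pose and the hand grasp orientation from one monocular RGB frame. The specific backbone (EfficientNet-B0), head sizes, and axis-angle parameterization here are illustrative assumptions, not our released implementation; the full code is available in the repository linked under Data availability.

```python
# Illustrative sketch only (not the released implementation): an
# EfficientNet-style backbone with subnetworks that jointly regress the
# drill's 6DoF pose and the hand grasp orientation from a single RGB frame.
import torch
import torch.nn as nn
from torchvision import models

class HMDEgoPoseSketch(nn.Module):
    def __init__(self, feat_dim: int = 1280):
        super().__init__()
        # EfficientNet-B0 stands in for the efficient CNN backbone;
        # its final feature map has 1280 channels.
        self.backbone = models.efficientnet_b0(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Subnetworks for the rigid tool pose and the hand grasp orientation.
        self.drill_rotation = nn.Linear(feat_dim, 3)     # axis-angle rotation
        self.drill_translation = nn.Linear(feat_dim, 3)  # metric translation (x, y, z)
        self.hand_rotation = nn.Linear(feat_dim, 3)      # grasp orientation

    def forward(self, rgb: torch.Tensor):
        # rgb: (B, 3, H, W) -> pooled backbone features -> three regression heads
        f = self.pool(self.backbone(rgb)).flatten(1)
        return self.drill_rotation(f), self.drill_translation(f), self.hand_rotation(f)

# One forward pass on a dummy frame.
rot, trans, grasp = HMDEgoPoseSketch()(torch.randn(1, 3, 256, 256))
```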

Results

HMD-EgoPose outperformed current state-of-the-art approaches on a benchmark dataset for surgical tool pose estimation, achieving an average tool 3D vertex error of 11.0 mm on real data and advancing progress toward a clinically viable marker-free tracking strategy. Through our low-latency streaming approach, we achieved a round-trip latency of 199.1 ms for pose estimation and augmented visualization of the tracked model when integrated with the OST-HMD.
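For context on the 11.0 mm figure, the average 3D vertex error is an ADD-style metric: the tool's model vertices are transformed by the ground-truth and predicted poses, and the vertex-wise Euclidean distances are averaged. The snippet below is a minimal sketch of that computation under stated assumptions (placeholder vertices, no symmetry handling); the exact evaluation protocol follows the benchmark.

```python
# Sketch of the average 3D vertex error (ADD-style metric); the exact
# evaluation protocol follows the benchmark, not this snippet.
import numpy as np

def avg_vertex_error_mm(verts, R_gt, t_gt, R_pred, t_pred):
    """verts: (N, 3) model vertices in meters; R: (3, 3) rotation; t: (3,) translation."""
    gt = verts @ R_gt.T + t_gt        # vertices under the ground-truth pose
    pred = verts @ R_pred.T + t_pred  # vertices under the predicted pose
    return 1000.0 * np.linalg.norm(gt - pred, axis=1).mean()

# Example: with identical rotations, a pure 5 mm translation error
# yields exactly a 5.0 mm average vertex error.
verts = np.random.rand(1000, 3) * 0.1   # placeholder model, ~10 cm extent
I = np.eye(3)
print(avg_vertex_error_mm(verts, I, np.zeros(3), I, np.array([0.005, 0.0, 0.0])))
```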

Conclusion

Our single-shot learning-based approach, which optimized the 6DoF pose based on the joint interaction between the hand of a user and a rigid surgical drill, was robust to occlusion and complex surfaces and improved on current state-of-the-art approaches to marker-less tool and hand pose estimation. Further, we demonstrated the feasibility of our approach for 6DoF object tracking on a commercially available OST-HMD.
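To make the streaming round trip concrete, the sketch below shows one way a client-side round-trip measurement could be structured: send a single length-prefixed encoded frame to the workstation and block until the pose reply arrives. The address, port, and wire format are hypothetical assumptions for illustration; our actual HoloLens 2 client is not implemented in Python.

```python
# Hypothetical sketch of a client-side round-trip latency measurement.
# Host, port, and wire format (length-prefixed JPEG in, 4x4 float32 pose
# matrix out) are assumptions, not the pipeline described in the paper.
import socket
import struct
import time

WORKSTATION = ("192.168.1.50", 9999)  # hypothetical inference workstation

def measure_round_trip_ms(frame_jpeg: bytes) -> float:
    """Send one encoded frame, block for the pose reply, return latency in ms."""
    with socket.create_connection(WORKSTATION) as sock:
        start = time.perf_counter()
        # Length-prefix the frame so the server knows how many bytes to read.
        sock.sendall(struct.pack("!I", len(frame_jpeg)) + frame_jpeg)
        reply = b""
        while len(reply) < 64:  # 4x4 float32 pose matrix = 64 bytes
            chunk = sock.recv(64 - len(reply))
            if not chunk:
                raise ConnectionError("server closed connection")
            reply += chunk
        return (time.perf_counter() - start) * 1000.0
```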


Data availability

The described software is available here: https://github.com/doughtmw/hmd-ego-pose (accessed on 23 February 2022). Additional data are available on request from the corresponding author.


Funding

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery program (RGPIN-2019-06367) and New Frontiers in Research Fund-Exploration (NFRFE-2019-00333). N.R.G. is supported by the National New Investigator (NNI) award from the Heart and Stroke Foundation of Canada (HSFC).

Author information


Corresponding author

Correspondence to Mitchell Doughty.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed consent

This article does not contain patient data.

Human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (mp4 10904 KB)

Supplementary file 2 (mp4 18390 KB)


About this article


Cite this article

Doughty, M., Ghugre, N.R. HMD-EgoPose: head-mounted display-based egocentric marker-less tool and hand pose estimation for augmented surgical guidance. Int J CARS 17, 2253–2262 (2022). https://doi.org/10.1007/s11548-022-02688-y

