HMD-EgoPose: head-mounted display-based egocentric marker-less tool and hand pose estimation for augmented surgical guidance

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

The success or failure of modern computer-assisted surgery procedures hinges on the precise six-degree-of-freedom (6DoF) position and orientation (pose) estimation of tracked instruments and tissue. In this paper, we present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation and demonstrate state-of-the-art performance on a benchmark dataset for monocular red-green-blue (RGB) 6DoF marker-less hand and surgical instrument pose tracking. Further, we show that our HMD-EgoPose framework enables performant 6DoF pose estimation on a commercially available optical see-through head-mounted display (OST-HMD) through a low-latency streaming approach.

Methods

Our framework utilized an efficient convolutional neural network (CNN) backbone for multi-scale feature extraction and a set of subnetworks to jointly learn the 6DoF pose representation of the rigid surgical drill instrument and the grasping orientation of the hand of a user. To make our approach accessible to a commercially available OST-HMD, the Microsoft HoloLens 2, we created a pipeline for low-latency video and data communication with a high-performance computing workstation capable of optimized network inference.
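A minimal sketch of this single-shot design is shown below: a shared EfficientNet-style backbone feeding small subnetworks that jointly regress the drill's 6DoF pose and the hand grasp orientation from one monocular RGB frame. The specific backbone (EfficientNet-B0), head sizes, and axis-angle parameterization here are illustrative assumptions, not our released implementation; the full code is available in the repository linked under Data availability.

```python
# Illustrative sketch only (not the released implementation): an
# EfficientNet-style backbone with subnetworks that jointly regress the
# drill's 6DoF pose and the hand grasp orientation from a single RGB frame.
import torch
import torch.nn as nn
from torchvision import models

class HMDEgoPoseSketch(nn.Module):
    def __init__(self, feat_dim: int = 1280):
        super().__init__()
        # EfficientNet-B0 stands in for the efficient CNN backbone;
        # its final feature map has 1280 channels.
        self.backbone = models.efficientnet_b0(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Subnetworks for the rigid tool pose and the hand grasp orientation.
        self.drill_rotation = nn.Linear(feat_dim, 3)     # axis-angle rotation
        self.drill_translation = nn.Linear(feat_dim, 3)  # metric translation (x, y, z)
        self.hand_rotation = nn.Linear(feat_dim, 3)      # grasp orientation

    def forward(self, rgb: torch.Tensor):
        # rgb: (B, 3, H, W) -> pooled backbone features -> three regression heads
        f = self.pool(self.backbone(rgb)).flatten(1)
        return self.drill_rotation(f), self.drill_translation(f), self.hand_rotation(f)

# One forward pass on a dummy frame.
rot, trans, grasp = HMDEgoPoseSketch()(torch.randn(1, 3, 256, 256))
```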

Results

HMD-EgoPose outperformed current state-of-the-art approaches on a benchmark dataset for surgical tool pose estimation, achieving an average tool 3D vertex error of 11.0 mm on real data and advancing progress toward a clinically viable marker-free tracking strategy. Through our low-latency streaming approach, we achieved a round-trip latency of 199.1 ms for pose estimation and augmented visualization of the tracked model when integrated with the OST-HMD.
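For context on the 11.0 mm figure, the average 3D vertex error is an ADD-style metric: the tool's model vertices are transformed by the ground-truth and predicted poses, and the vertex-wise Euclidean distances are averaged. The snippet below is a minimal sketch of that computation under stated assumptions (placeholder vertices, no symmetry handling); the exact evaluation protocol follows the benchmark.

```python
# Sketch of the average 3D vertex error (ADD-style metric); the exact
# evaluation protocol follows the benchmark, not this snippet.
import numpy as np

def avg_vertex_error_mm(verts, R_gt, t_gt, R_pred, t_pred):
    """verts: (N, 3) model vertices in meters; R: (3, 3) rotation; t: (3,) translation."""
    gt = verts @ R_gt.T + t_gt        # vertices under the ground-truth pose
    pred = verts @ R_pred.T + t_pred  # vertices under the predicted pose
    return 1000.0 * np.linalg.norm(gt - pred, axis=1).mean()

# Example: with identical rotations, a pure 5 mm translation error
# yields exactly a 5.0 mm average vertex error.
verts = np.random.rand(1000, 3) * 0.1   # placeholder model, ~10 cm extent
I = np.eye(3)
print(avg_vertex_error_mm(verts, I, np.zeros(3), I, np.array([0.005, 0.0, 0.0])))
```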

Conclusion

Our single-shot learning-based approach, which optimized the 6DoF pose based on the joint interaction between the hand of a user and a rigid surgical drill, was robust to occlusion and complex surfaces and improved on current state-of-the-art approaches to marker-less tool and hand pose estimation. Further, we demonstrated the feasibility of our approach for 6DoF object tracking on a commercially available OST-HMD.
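To make the streaming round trip concrete, the sketch below shows one way a client-side round-trip measurement could be structured: send a single length-prefixed encoded frame to the workstation and block until the pose reply arrives. The address, port, and wire format are hypothetical assumptions for illustration; our actual HoloLens 2 client is not implemented in Python.

```python
# Hypothetical sketch of a client-side round-trip latency measurement.
# Host, port, and wire format (length-prefixed JPEG in, 4x4 float32 pose
# matrix out) are assumptions, not the pipeline described in the paper.
import socket
import struct
import time

WORKSTATION = ("192.168.1.50", 9999)  # hypothetical inference workstation

def measure_round_trip_ms(frame_jpeg: bytes) -> float:
    """Send one encoded frame, block for the pose reply, return latency in ms."""
    with socket.create_connection(WORKSTATION) as sock:
        start = time.perf_counter()
        # Length-prefix the frame so the server knows how many bytes to read.
        sock.sendall(struct.pack("!I", len(frame_jpeg)) + frame_jpeg)
        reply = b""
        while len(reply) < 64:  # 4x4 float32 pose matrix = 64 bytes
            chunk = sock.recv(64 - len(reply))
            if not chunk:
                raise ConnectionError("server closed connection")
            reply += chunk
        return (time.perf_counter() - start) * 1000.0
```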


Data availability

The described software is available here: https://github.com/doughtmw/hmd-ego-pose (accessed on 23 February 2022). Additional data are available on request from the corresponding author.


Funding

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery program (RGPIN-2019-06367) and New Frontiers in Research Fund-Exploration (NFRFE-2019-00333). N.R.G. is supported by the National New Investigator (NNI) award from the Heart and Stroke Foundation of Canada (HSFC).

Author information


Corresponding author

Correspondence to Mitchell Doughty.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed consent

This article does not contain patient data.

Human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (mp4 10904 KB)

Supplementary file 2 (mp4 18390 KB)


About this article


Cite this article

Doughty, M., Ghugre, N.R. HMD-EgoPose: head-mounted display-based egocentric marker-less tool and hand pose estimation for augmented surgical guidance. Int J CARS 17, 2253–2262 (2022). https://doi.org/10.1007/s11548-022-02688-y

