Abstract
Purpose
The success or failure of modern computer-assisted surgery procedures hinges on precise estimation of the six-degree-of-freedom (6DoF) position and orientation (pose) of tracked instruments and tissue. In this paper, we present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation, and demonstrate state-of-the-art performance on a benchmark dataset for monocular red-green-blue (RGB) 6DoF marker-less hand and surgical instrument pose tracking. Further, we show that the HMD-EgoPose framework can deliver performant 6DoF pose estimation on a commercially available optical see-through head-mounted display (OST-HMD) through a low-latency streaming approach.
Methods
Our framework used an efficient convolutional neural network (CNN) backbone for multi-scale feature extraction and a set of subnetworks to jointly learn the 6DoF pose representation of the rigid surgical drill instrument and the grasping orientation of the user's hand. To make our approach usable on a commercially available OST-HMD, the Microsoft HoloLens 2, we created a pipeline for low-latency video and data communication with a high-performance computing workstation capable of optimized network inference.
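To make the notion of a 6DoF pose concrete, the following is a minimal sketch of how a predicted rotation and translation could be applied to the vertices of a rigid tool model. It assumes the rotation is expressed as an axis-angle vector in radians and the translation in millimetres; the abstract does not state the parameterization used by HMD-EgoPose, and the function names are hypothetical.

```python
import numpy as np


def axis_angle_to_matrix(rotvec):
    """Convert an axis-angle vector (radians) to a 3x3 rotation matrix
    using Rodrigues' formula. The axis-angle form is an assumption."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        # Negligible rotation: return the identity.
        return np.eye(3)
    k = rotvec / theta
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def transform_vertices(vertices, rotvec, translation):
    """Apply a rigid 6DoF pose (3 rotation + 3 translation parameters)
    to an (N, 3) array of model vertices."""
    R = axis_angle_to_matrix(rotvec)
    v = np.asarray(vertices, dtype=float)
    return v @ R.T + np.asarray(translation, dtype=float)
```

The three rotation parameters and three translation parameters together give the six degrees of freedom; other rotation encodings (quaternions, rotation matrices) would serve equally well in this sketch.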
Results
HMD-EgoPose outperformed current state-of-the-art approaches on a benchmark dataset for surgical tool pose estimation, achieving an average tool 3D vertex error of 11.0 mm on real data and furthering progress towards a clinically viable marker-free tracking strategy. Through our low-latency streaming approach, we achieved a round-trip latency of 199.1 ms for pose estimation and augmented visualization of the tracked model when integrated with the OST-HMD.
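As an illustration of the reported metric, the sketch below computes an average 3D vertex error between a predicted and a ground-truth tool pose. It assumes the metric is the mean Euclidean distance between corresponding model vertices after each pose is applied, with coordinates in millimetres; the abstract does not give the exact definition, and the function name is hypothetical.

```python
import numpy as np


def mean_vertex_error(vertices, R_pred, t_pred, R_gt, t_gt):
    """Mean Euclidean distance between model vertices transformed by the
    predicted pose and by the ground-truth pose.

    vertices: (N, 3) model vertices; R_*: (3, 3) rotation matrices;
    t_*: (3,) translations. The result has the same units as the inputs.
    """
    v = np.asarray(vertices, dtype=float)
    pred = v @ np.asarray(R_pred, dtype=float).T + np.asarray(t_pred, dtype=float)
    gt = v @ np.asarray(R_gt, dtype=float).T + np.asarray(t_gt, dtype=float)
    # Per-vertex distance, then average over all vertices.
    return float(np.linalg.norm(pred - gt, axis=1).mean())
```

Because the error is averaged over the vertices of the model, it reflects both rotation and translation mistakes in a single distance value.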
Conclusion
Our single-shot learned approach, which optimized 6DoF pose based on the joint interaction between the user's hand and a rigid surgical drill, was robust to occlusion and complex surfaces and improved on current state-of-the-art approaches to marker-less tool and hand pose estimation. Further, we demonstrated the feasibility of our approach for 6DoF object tracking on a commercially available OST-HMD.
Data availability
The described software is available here: https://github.com/doughtmw/hmd-ego-pose (accessed on 23 February 2022). Additional data is available on request from the corresponding author.
Funding
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery program (RGPIN-2019-06367) and New Frontiers in Research Fund-Exploration (NFRFE-2019-00333). N.R.G. is supported by the National New Investigator (NNI) award from the Heart and Stroke Foundation of Canada (HSFC).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Informed consent
This article does not contain patient data.
Human participants and/or animals
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 10904 KB)
Supplementary file 2 (mp4 18390 KB)
Cite this article
Doughty, M., Ghugre, N.R. HMD-EgoPose: head-mounted display-based egocentric marker-less tool and hand pose estimation for augmented surgical guidance. Int J CARS 17, 2253–2262 (2022). https://doi.org/10.1007/s11548-022-02688-y
DOI: https://doi.org/10.1007/s11548-022-02688-y