{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T01:10:12Z","timestamp":1733188212401,"version":"3.30.0"},"reference-count":59,"publisher":"Tsinghua University Press","issue":"1","license":[{"start":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T00:00:00Z","timestamp":1701302400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T00:00:00Z","timestamp":1701302400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comp. Visual Media"],"published-print":{"date-parts":[[2024,2]]},"abstract":"Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. 
Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.","DOI":"10.1007\/s41095-023-0346-4","type":"journal-article","created":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T07:01:41Z","timestamp":1701327701000},"page":"79-96","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["3D hand pose and shape estimation from monocular RGB via efficient 2D cues"],"prefix":"10.26599","volume":"10","author":[{"given":"Fenghao","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Lin","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Shengling","family":"Li","sequence":"additional","affiliation":[]},{"given":"Wanjuan","family":"Su","sequence":"additional","affiliation":[]},{"given":"Liman","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Wenbing","family":"Tao","sequence":"additional","affiliation":[]}],"member":"11138","published-online":{"date-parts":[[2023,11,30]]},"reference":[{"issue":"4","key":"346_CR1","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1109\/TVCG.2015.2391860","volume":"21","author":"Y Jang","year":"2015","unstructured":"Jang, Y.; Noh, S. T.; Chang, H. J.; Kim, T. K.; Woo, W. 3D finger CAPE: Clicking action and position estimation under self-occlusions in egocentric viewpoint. IEEE Transactions on Visualization and Computer Graphics Vol. 21, No. 
4, 501\u2013510, 2015.","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"issue":"3","key":"346_CR2","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1109\/TVCG.2008.190","volume":"15","author":"T Lee","year":"2009","unstructured":"Lee, T.; Hollerer, T. Multithreaded hybrid feature tracking for markerless augmented reality. IEEE Transactions on Visualization and Computer Graphics Vol. 15, No. 3, 355\u2013368, 2009.","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"key":"346_CR3","first-page":"282","volume-title":"Human-Computer Interaction\u2013INTERACT 2013. Lecture Notes in Computer Science, Vol. 8118.","author":"T Piumsomboon","year":"2013","unstructured":"Piumsomboon, T.; Clark, A.; Billinghurst, M.; Cockburn, A. User-defined gestures for augmented reality. In: Human-Computer Interaction\u2013INTERACT 2013. Lecture Notes in Computer Science, Vol. 8118. Kotz\u00e9, P.; Marsden, G.; Lindgaard, G.; Wesson, J.; Winckler, M. Eds. Springer Berlin Heidelberg, 282\u2013299, 2013."},{"issue":"1","key":"346_CR4","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1007\/s41095-017-0098-0","volume":"4","author":"T Kikuchi","year":"2018","unstructured":"Kikuchi, T.; Endo, Y.; Kanamori, Y.; Hashimoto, T.; Mitani, J. Transferring pose and augmenting background for deep human-image parsing and its applications. Computational Visual Media Vol. 4, No. 1, 43\u201354, 2018.","journal-title":"Computational Visual Media"},{"issue":"1","key":"346_CR5","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s41095-020-0162-z","volume":"6","author":"M Wang","year":"2020","unstructured":"Wang, M.; Lyu, X. Q.; Li, Y. J.; Zhang, F. L. VR content creation and exploration with deep learning: A survey. Computational Visual Media Vol. 6, No. 
1, 3\u201328, 2020.","journal-title":"Computational Visual Media"},{"key":"346_CR6","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1016\/j.neucom.2021.01.045","volume":"437","author":"P F Ren","year":"2021","unstructured":"Ren, P. F.; Sun, H. F.; Huang, W. T.; Hao, J. C.; Cheng, D. X.; Qi, Q.; Wang, J. Y.; Liao, J. X. Spatial-aware stacked regression network for real-time 3D hand pose estimation. Neurocomputing Vol. 437, 42\u201357, 2021.","journal-title":"Neurocomputing"},{"key":"346_CR7","first-page":"752","volume-title":"Computer Vision\u2013ECCV 2020. Lecture Notes in Computer Science, Vol. 12352.","author":"G Moon","year":"2020","unstructured":"Moon, G.; Lee, K. M. I2L-MeshNet: Image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image. In: Computer Vision\u2013ECCV 2020. Lecture Notes in Computer Science, Vol. 12352. Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J. M. Eds. Springer Cham, 752\u2013768, 2020."},{"key":"346_CR8","doi-asserted-by":"crossref","unstructured":"Zhang, X.; Huang, H. S.; Tan, J. C.; Xu, H. M.; Yang, C.; Peng, G. Z.; Wang, L.; Liu, J. Hand image understanding via deep multi-task learning. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 11261\u201311272, 2021.","DOI":"10.1109\/ICCV48922.2021.01109"},{"key":"346_CR9","doi-asserted-by":"crossref","unstructured":"Chen, X. Y.; Liu, Y. F.; Ma, C. Y.; Chang, J. L.; Wang, H. Y.; Chen, T.; Guo, X. Y.; Wan, P. F.; Zheng, W. Camera-space hand mesh recovery via semantic aggregation and adaptive 2D-1D registration. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 13269\u201313278, 2021.","DOI":"10.1109\/CVPR46437.2021.01307"},{"key":"346_CR10","doi-asserted-by":"crossref","unstructured":"Lin, K.; Wang, L. J.; Liu, Z. C. End-to-end human pose and mesh reconstruction with transformers. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1954\u20131963, 2021.","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"346_CR11","doi-asserted-by":"crossref","unstructured":"Tang, X.; Wang, T. Y.; Fu, C. W. Towards accurate alignment in real-time 3D hand-mesh reconstruction. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 11678\u201311687, 2021.","DOI":"10.1109\/ICCV48922.2021.01149"},{"key":"346_CR12","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/j.neucom.2021.12.013","volume":"474","author":"C Y Gao","year":"2022","unstructured":"Gao, C. Y.; Yang, Y. J.; Li, W. S. 3D interacting hand pose and shape estimation from a single RGB image. Neurocomputing Vol. 474, 25\u201336, 2022.","journal-title":"Neurocomputing"},{"issue":"14","key":"346_CR13","doi-asserted-by":"publisher","first-page":"16667","DOI":"10.1007\/s10489-022-03390-x","volume":"52","author":"I Kourbane","year":"2022","unstructured":"Kourbane, I.; Genc, Y. A graph-based approach for absolute 3D hand pose estimation using a single RGB image. Applied Intelligence Vol. 52, No. 14, 16667\u201316682, 2022.","journal-title":"Applied Intelligence"},{"key":"346_CR14","doi-asserted-by":"crossref","unstructured":"Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; Black, M. J. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics Vol. 34, No. 6, Article No. 248, 2015.","DOI":"10.1145\/2816795.2818013"},{"key":"346_CR15","doi-asserted-by":"crossref","unstructured":"Romero, J.; Tzionas, D.; Black, M. J. Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics Vol. 36, No. 6, Article No. 245, 2017.","DOI":"10.1145\/3130800.3130883"},{"key":"346_CR16","doi-asserted-by":"crossref","unstructured":"Kanazawa, A.; Black, M. J.; Jacobs, D. W.; Malik, J. End-to-end recovery of human shape and pose. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7122\u20137131, 2018.","DOI":"10.1109\/CVPR.2018.00744"},{"key":"346_CR17","doi-asserted-by":"crossref","unstructured":"Hasson, Y.; Varol, G.; Tzionas, D.; Kalevatykh, I.; Black, M. J.; Laptev, I.; Schmid, C. Learning joint reconstruction of hands and manipulated objects. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 11799\u201311808, 2019.","DOI":"10.1109\/CVPR.2019.01208"},{"key":"346_CR18","doi-asserted-by":"crossref","unstructured":"Zhou, Y. X.; Habermann, M.; Xu, W. P.; Habibie, I.; Theobalt, C.; Xu, F. Monocular real-time hand shape and motion capture using multi-modal data. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 5345\u20135354, 2020.","DOI":"10.1109\/CVPR42600.2020.00539"},{"key":"346_CR19","unstructured":"Yang, L. X.; Li, J. S.; Xu, W. Q.; Diao, Y. Q.; Lu, C. W. BiHand: Recovering hand mesh with multi-stage bisected hourglass networks. arXiv preprint arXiv:2008.05079, 2020."},{"key":"346_CR20","unstructured":"Kulon, D.; Wang, H. Y.; G\u00fcler, R. A.; Bronstein, M.; Zafeiriou, S. Single image 3D hand reconstruction with mesh convolutions. arXiv preprint arXiv:1905.01326, 2019."},{"key":"346_CR21","doi-asserted-by":"crossref","unstructured":"Lin, K.; Wang, L. J.; Liu, Z. C. Mesh graphormer. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 12919\u201312928, 2021.","DOI":"10.1109\/ICCV48922.2021.01270"},{"key":"346_CR22","doi-asserted-by":"crossref","unstructured":"Ge, L. H.; Ren, Z.; Li, Y. C.; Xue, Z. H.; Wang, Y. Y.; Cai, J. F.; Yuan, J. S. 3D hand shape and pose estimation from a single RGB image. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 10825\u201310834, 2019.","DOI":"10.1109\/CVPR.2019.01109"},{"key":"346_CR23","doi-asserted-by":"crossref","unstructured":"Zhang, B. W.; Wang, Y. G.; Deng, X. M.; Zhang, Y. 
D.; Tan, P.; Ma, C. X.; Wang, H. A. Interacting two-hand 3D pose and shape reconstruction from single color image. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 11334\u201311343, 2021.","DOI":"10.1109\/ICCV48922.2021.01116"},{"key":"346_CR24","doi-asserted-by":"crossref","unstructured":"Zimmermann, C.; Brox, T. Learning to estimate 3D hand pose from single RGB images. In: Proceedings of the IEEE International Conference on Computer Vision, 4913\u20134921, 2017.","DOI":"10.1109\/ICCV.2017.525"},{"key":"346_CR25","doi-asserted-by":"crossref","unstructured":"Spurr, A.; Song, J.; Park, S.; Hilliges, O. Cross-modal deep variational hand pose estimation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 89\u201398, 2018.","DOI":"10.1109\/CVPR.2018.00017"},{"key":"346_CR26","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1007\/978-3-030-01252-6_8","volume-title":"Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11215.","author":"U Iqbal","year":"2018","unstructured":"Iqbal, U.; Molchanov, P.; Breuel, T.; Gall, J.; Kautz, J. Hand pose estimation via latent 2.5D heatmap regression. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11215. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 125\u2013143, 2018."},{"key":"346_CR27","doi-asserted-by":"crossref","unstructured":"Mueller, F.; Bernard, F.; Sotnychenko, O.; Mehta, D.; Sridhar, S.; Casas, D.; Theobalt, C. GANerated hands for real-time 3D hand tracking from monocular RGB. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 49\u201359, 2018.","DOI":"10.1109\/CVPR.2018.00013"},{"key":"346_CR28","first-page":"678","volume-title":"Computer Vision\u2013ECCV 2018. Lecture Notes in Computer Science, Vol. 11210.","author":"Y J Cai","year":"2018","unstructured":"Cai, Y. J.; Ge, L. H.; Cai, J. F.; Yuan, J. S. 
Weakly-supervised 3D hand pose estimation from monocular RGB images. In: Computer Vision\u2013ECCV 2018. Lecture Notes in Computer Science, Vol. 11210. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 678\u2013694, 2018."},{"key":"346_CR29","doi-asserted-by":"crossref","unstructured":"Yang, L. L.; Yao, A. Disentangling latent hands for image synthesis and pose estimation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 9869\u20139878, 2019.","DOI":"10.1109\/CVPR.2019.01011"},{"key":"346_CR30","doi-asserted-by":"crossref","unstructured":"Doosti, B.; Naha, S.; Mirbagheri, M.; Crandall, D. J. HOPE-net: A graph-based model for hand-object pose estimation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 6607\u20136616, 2020.","DOI":"10.1109\/CVPR42600.2020.00664"},{"key":"346_CR31","first-page":"211","volume-title":"Computer Vision\u2013ECCV 2020. Lecture Notes in Computer Science, Vol. 12362.","author":"A Spurr","year":"2020","unstructured":"Spurr, A.; Iqbal, U.; Molchanov, P.; Hilliges, O.; Kautz, J. Weakly supervised 3D hand pose estimation via biomechanical constraints. In: Computer Vision\u2013ECCV 2020. Lecture Notes in Computer Science, Vol. 12362. Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J. M. Eds. Springer Cham, 211\u2013228, 2020."},{"key":"346_CR32","doi-asserted-by":"crossref","unstructured":"Boukhayma, A.; de Bem, R.; Torr, P. H. S. 3D hand shape and pose from images in the wild. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 10835\u201310844, 2019.","DOI":"10.1109\/CVPR.2019.01110"},{"key":"346_CR33","doi-asserted-by":"crossref","unstructured":"Zhang, X.; Li, Q.; Mo, H.; Zhang, W. B.; Zheng, W. End-to-end hand mesh recovery from a monocular RGB image. 
In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 2354\u20132364, 2019.","DOI":"10.1109\/ICCV.2019.00244"},{"key":"346_CR34","doi-asserted-by":"crossref","unstructured":"Baek, S.; Kim, K. I.; Kim, T. K. Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1067\u20131076, 2019.","DOI":"10.1109\/CVPR.2019.00116"},{"key":"346_CR35","doi-asserted-by":"crossref","unstructured":"Kolotouros, N.; Pavlakos, G.; Daniilidis, K. Convolutional mesh regression for single-image human shape reconstruction. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 4496\u20134505, 2019.","DOI":"10.1109\/CVPR.2019.00463"},{"key":"346_CR36","doi-asserted-by":"crossref","unstructured":"Kulon, D.; G\u00fcler, R. A.; Kokkinos, I.; Bronstein, M. M.; Zafeiriou, S. Weakly-supervised mesh-convolutional hand reconstruction in the wild. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 4989\u20134999, 2020.","DOI":"10.1109\/CVPR42600.2020.00504"},{"key":"346_CR37","doi-asserted-by":"crossref","unstructured":"Chen, P.; Chen, Y. J.; Yang, D.; Wu, F. Y.; Li, Q.; Xia, Q. P.; Tan, Y. I2UV-HandNet: Image-to-UV prediction network for accurate and high-fidelity 3D hand mesh modeling. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 12909\u201312918, 2021.","DOI":"10.1109\/ICCV48922.2021.01269"},{"issue":"2","key":"346_CR38","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1007\/s41095-020-0171-y","volume":"6","author":"M P Li","year":"2020","unstructured":"Li, M. P.; Zhou, Z. M.; Liu, X. G. 3D hypothesis clustering for cross-view matching in multi-person motion capture. Computational Visual Media Vol. 6, No. 
2, 147\u2013156, 2020.","journal-title":"Computational Visual Media"},{"key":"346_CR39","doi-asserted-by":"crossref","unstructured":"Pavlakos, G.; Zhu, L. Y.; Zhou, X. W.; Daniilidis, K. Learning to estimate 3D human pose and shape from a single color image. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 459\u2013468, 2018.","DOI":"10.1109\/CVPR.2018.00055"},{"key":"346_CR40","first-page":"20","volume-title":"Computer Vision\u2013ECCV 2018. Lecture Notes in Computer Science, Vol. 11211.","author":"G Varol","year":"2018","unstructured":"Varol, G.; Ceylan, D.; Russell, B.; Yang, J. M.; Yumer, E.; Laptev, I.; Schmid, C. BodyNet: Volumetric inference of 3D human body shapes. In: Computer Vision\u2013ECCV 2018. Lecture Notes in Computer Science, Vol. 11211. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 20\u201338, 2018."},{"key":"346_CR41","doi-asserted-by":"crossref","unstructured":"He, K. M.; Gkioxari, G.; Doll\u00e1r, P.; Girshick, R. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, 2980\u20132988, 2017.","DOI":"10.1109\/ICCV.2017.322"},{"key":"346_CR42","doi-asserted-by":"crossref","unstructured":"Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7132\u20137141, 2018.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"346_CR43","unstructured":"Nair, V.; Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning, 807\u2013814, 2010."},{"key":"346_CR44","doi-asserted-by":"crossref","unstructured":"Chang, J. Y.; Moon, G.; Lee, K. M. V2V-PoseNet: Voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 5079\u20135088, 2018.","DOI":"10.1109\/CVPR.2018.00533"},{"key":"346_CR45","doi-asserted-by":"crossref","unstructured":"Malik, J.; Abdelaziz, I.; Elhayek, A.; Shimada, S.; Ali, S. A.; Golyanik, V.; Theobalt, C.; Stricker, D. HandVoxNet: Deep voxel-based network for 3D hand shape and pose estimation from a single depth map. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7111\u20137120, 2020.","DOI":"10.1109\/CVPR42600.2020.00714"},{"key":"346_CR46","unstructured":"Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In: Proceedings of the 36th International Conference on Machine Learning, 7354\u20137363, 2019."},{"key":"346_CR47","doi-asserted-by":"crossref","unstructured":"Wang, X. L.; Girshick, R.; Gupta, A.; He, K. M. Non-local neural networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7794\u20137803, 2018.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"346_CR48","doi-asserted-by":"crossref","unstructured":"Gong, S. W.; Chen, L.; Bronstein, M.; Zafeiriou, S. SpiralNet: A fast and highly efficient mesh convolution operator. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshop, 4141\u20134148, 2019.","DOI":"10.1109\/ICCVW.2019.00509"},{"key":"346_CR49","first-page":"483","volume-title":"Computer Vision\u2013ECCV 2016. Lecture Notes in Computer Science, Vol. 9912.","author":"A Newell","year":"2016","unstructured":"Newell, A.; Yang, K. Y.; Deng, J. Stacked hourglass networks for human pose estimation. In: Computer Vision\u2013ECCV 2016. Lecture Notes in Computer Science, Vol. 9912. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 483\u2013499, 2016."},{"key":"346_CR50","doi-asserted-by":"crossref","unstructured":"He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770\u2013778, 2016.","DOI":"10.1109\/CVPR.2016.90"},{"key":"346_CR51","first-page":"769","volume-title":"Computer Vision\u2013ECCV 2020. Lecture Notes in Computer Science, Vol. 12352.","author":"H Choi","year":"2020","unstructured":"Choi, H.; Moon, G.; Lee, K. M. Pose2Mesh: Graph convolutional network for 3D human pose and mesh recovery from a 2D human pose. In: Computer Vision\u2013ECCV 2020. Lecture Notes in Computer Science, Vol. 12352. Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J. M. Eds. Springer Cham, 769\u2013787, 2020."},{"key":"346_CR52","doi-asserted-by":"crossref","unstructured":"Chen, Y. J.; Tu, Z. G.; Kang, D.; Bao, L. C.; Zhang, Y.; Zhe, X. F.; Chen, R. Z.; Yuan, J. S. Model-based 3D hand reconstruction via self-supervised learning. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 10446\u201310455, 2021.","DOI":"10.1109\/CVPR46437.2021.01031"},{"key":"346_CR53","doi-asserted-by":"crossref","unstructured":"Zimmermann, C.; Ceylan, D.; Yang, J. M.; Russell, B.; Argus, M. J.; Brox, T. FreiHAND: A dataset for markerless capture of hand pose and shape from single RGB images. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 813\u2013822, 2019.","DOI":"10.1109\/ICCV.2019.00090"},{"issue":"3","key":"346_CR54","doi-asserted-by":"publisher","first-page":"1921","DOI":"10.1609\/aaai.v35i3.16287","volume":"35","author":"M R Li","year":"2021","unstructured":"Li, M. R.; Gao, Y.; Sang, N. Exploiting learnable joint groups for hand pose estimation. Proceedings of the AAAI Conference on Artificial Intelligence Vol. 35, No. 
3, 1921\u20131929, 2021.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"issue":"3","key":"346_CR55","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S. A.; Huang, Z. H.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision Vol. 115, No. 3, 211\u2013252, 2015.","journal-title":"International Journal of Computer Vision"},{"key":"346_CR56","unstructured":"Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014."},{"key":"346_CR57","unstructured":"Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z. M.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems, Article No. 721, 8026\u20138037, 2019."},{"issue":"1","key":"346_CR58","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/BF02291478","volume":"40","author":"J C Gower","year":"1975","unstructured":"Gower, J. C. Generalized Procrustes analysis. Psychometrika Vol. 40, No. 1, 33\u201351, 1975.","journal-title":"Psychometrika"},{"key":"346_CR59","doi-asserted-by":"crossref","unstructured":"Yang, L. L.; Li, S. L.; Lee, D.; Yao, A. Aligning latent spaces for 3D hand pose estimation. 
In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 2335\u20132343, 2019.","DOI":"10.1109\/ICCV.2019.00242"}],"container-title":["Computational Visual Media"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41095-023-0346-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41095-023-0346-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41095-023-0346-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:51:17Z","timestamp":1733187077000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41095-023-0346-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,30]]},"references-count":59,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,2]]}},"alternative-id":["346"],"URL":"https:\/\/doi.org\/10.1007\/s41095-023-0346-4","relation":{},"ISSN":["2096-0433","2096-0662"],"issn-type":[{"type":"print","value":"2096-0433"},{"type":"electronic","value":"2096-0662"}],"subject":[],"published":{"date-parts":[[2023,11,30]]},"assertion":[{"value":"3 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 November 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this 
article.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declaration of competing interest"}}]}}