Abstract
Recent advances in deep learning and computer vision offer an excellent opportunity to investigate high-level visual analysis tasks such as human localization and human pose estimation. Although the performance of human localization and human pose estimation has improved significantly in recent reports, neither is perfect, and erroneous estimates of position and pose must be expected across video frames. Studies on integrating these techniques into a generic pipeline that is robust to such errors are still lacking. This paper addresses that gap. We explored and developed two working pipelines suited to visual-based positioning and pose estimation tasks. The proposed pipelines were analysed on a badminton game. We showed that the concept of tracking by detection can work well, and that errors in position and pose can be handled effectively by linear interpolation of information from nearby frames. The results showed that visual-based positioning and pose estimation can deliver position and pose estimates with good spatial and temporal resolution.
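As a concrete illustration of the two ideas highlighted in the abstract, the minimal sketch below pairs tracking-by-detection (greedily linking per-frame detections by IoU overlap) with linear interpolation of missed frames from the nearest valid neighbouring frames. This is a hypothetical simplification, not the paper's released pipeline; all function and parameter names (e.g. `track_by_detection`, `iou_thresh`) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): tracking-by-detection via greedy
# IoU matching, plus linear interpolation of frames where the detector
# missed or was rejected. Names and thresholds are illustrative assumptions.

import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def track_by_detection(frames, iou_thresh=0.3):
    """frames: list of per-frame detection lists (each detection is a box).
    Returns one track: the box in each frame linked to the previous frame's
    box, with None where no detection matched (a miss to interpolate)."""
    track, prev = [], None
    for dets in frames:
        best = None
        if prev is not None and dets:
            best = max(dets, key=lambda d: iou(prev, d))
            if iou(prev, best) < iou_thresh:
                best = None          # reject a spurious/implausible match
        elif dets:
            best = dets[0]           # initialise the track
        track.append(best)
        prev = best if best is not None else prev
    return track

def interpolate_misses(track):
    """Fill missed frames by linear interpolation between the nearest
    valid frames on either side; leading/trailing misses stay None."""
    track = list(track)
    valid = [i for i, b in enumerate(track) if b is not None]
    for i, b in enumerate(track):
        if b is None:
            left = max((j for j in valid if j < i), default=None)
            right = min((j for j in valid if j > i), default=None)
            if left is not None and right is not None:
                w = (i - left) / (right - left)
                track[i] = tuple((1 - w) * np.array(track[left])
                                 + w * np.array(track[right]))
    return track
```

The same interpolation step applies joint-by-joint to 2D pose keypoints: each joint's (x, y) coordinates in a missed or low-confidence frame can be filled from the nearest frames in which that joint was estimated reliably.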
Notes
1. Action analysis based on a skeleton figure, i.e., a stick-man figure.
2. AR/VR applications can track the position of a head-mounted display in 3D world space either with external sensors (outside-in) or with internal sensors mounted on the display device itself (inside-out).
3. Matterport.
Acknowledgments
This publication is the output of the ASEAN IVO (http://www.nict.go.jp/en/asean_ivo/index.html) project titled "Event Analysis: Applications of computer vision and AI in smart tourism industry" and is financially supported by NICT (http://www.nict.go.jp/en/index.html). We would also like to thank the anonymous reviewers for their constructive comments and suggestions.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Phon-Amnuaisuk, S., Murata, K.T., Kovavisaruch, LO., Lim, TH., Pavarangkoon, P., Mizuhara, T. (2020). Visual-Based Positioning and Pose Estimation. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_68
DOI: https://doi.org/10.1007/978-3-030-63820-7_68
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)