Abstract
Deep CNNs have recently set new standards across computer vision, with specialized architectures for most tasks, including Video Object Segmentation and Pose Tracking. We extend Space-Time Memory Networks to detect multiple object parts simultaneously, which enables the detection of human body parts for multiple persons in videos. With the best configuration evaluated on the PoseTrack18 dataset, the approach reaches an F1-score of 47.6, a satisfactory result that is encouraging for follow-up work.
Acknowledgments
This research work contributes to the French collaborative project TASV (autonomous passengers service train), with SNCF, Alstom Crespin, Thales, Bosch, and SpirOps. It was carried out in the framework of FCS Railenium, Famars, and co-financed by the European Union with the European Regional Development Fund (Hauts-de-France region).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Dufour, R., Meurie, C., Lézoray, O., Mahtani, A. (2022). Space-Time Memory Networks for Multi-person Skeleton Body Part Detection. In: El Yacoubi, M., Granger, E., Yuen, P.C., Pal, U., Vincent, N. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2022. Lecture Notes in Computer Science, vol 13364. Springer, Cham. https://doi.org/10.1007/978-3-031-09282-4_7
DOI: https://doi.org/10.1007/978-3-031-09282-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09281-7
Online ISBN: 978-3-031-09282-4
eBook Packages: Computer Science, Computer Science (R0)