Abstract
Context information is vital for video-based human action recognition. Within a framework that combines a CNN with LSTM, a novel action recognition method is proposed that extracts keyframes via improved density clustering and learns spatio-temporal context information with a Context-Guided BiLSTM. Specifically, keyframes are first extracted by Gini-based density clustering and used as the inputs of the CNN. Second, a deep spatio-temporal bidirectional long short-term memory network, termed Context-Guided BiLSTM and built from bidirectional LSTM blocks, models the temporal dependencies of the spatial features. After learning by the ConvLSTM and the Context-Guided BiLSTM, the outputs of the fusion module are passed to a Softmax layer for action recognition. Experimental results on three benchmark datasets, UCF Sports, UCF11, and JHMDB, show that our approach achieves good recognition results, with a recognition rate better than that of most existing action recognition methods.
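The abstract does not specify how the Gini index enters the clustering, so the exact keyframe-selection procedure cannot be reproduced from this page. As a rough illustration of density-clustering-based keyframe extraction only, the following is a minimal density-peaks selector over per-frame feature vectors; the function name, parameters, and the percentile-based cutoff are all hypothetical choices, not the paper's method:

```python
import numpy as np

def select_keyframes(features, n_keyframes=3, dc_percentile=20):
    """Pick keyframe indices via density-peaks clustering (Rodriguez-Laio style).

    features : (n_frames, dim) array of per-frame descriptors.
    Frames with both high local density and large distance to any
    denser frame are treated as cluster centers, i.e. keyframes.
    """
    n = len(features)
    # Pairwise Euclidean distances between frame descriptors.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Cutoff distance d_c: a low percentile of all pairwise distances.
    dc = np.percentile(d[np.triu_indices(n, k=1)], dc_percentile)
    # Local density rho_i: Gaussian-kernel count of nearby frames.
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    # delta_i: distance to the nearest frame of strictly higher density
    # (for the densest frame, its maximum distance to any frame).
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()
    # Density peaks maximize rho * delta; take the top n_keyframes.
    return np.argsort(rho * delta)[::-1][:n_keyframes]
```

On features forming a few well-separated clusters, the product rho * delta is large only for one representative frame per cluster, which is the intuition behind using cluster centers as keyframes.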
Data availability
All data included in this study are available from the corresponding author upon request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61976237), the China Textile Industry Federation (Grant No. 2018107), and the Key Scientific Research Project of Colleges and Universities in Henan Province (No. 20B120004).
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhou, T., Tao, A., Sun, L. et al. Behavior recognition based on the improved density clustering and context-guided Bi-LSTM model. Multimed Tools Appl 82, 45471–45488 (2023). https://doi.org/10.1007/s11042-023-15501-y