Abstract
Engagement recognition aims to identify an individual's level of participation in a particular activity and has broad applications in fields such as education, healthcare, and driving. However, the performance of current methods is often degraded by excessive data and by distractions. Our Behavior Capture based TRansformer (BCTR) introduces a Transformer-based video analysis approach that exploits frame-level and video-level spatiotemporal details to improve engagement recognition. BCTR features dual branches that detect static and dynamic signs of disengagement, such as eye closure and a lowered head, through refined class tokens. This design allows the model to independently identify critical disengagement indicators, mirroring human observational techniques. As a result, BCTR not only boosts precision but also enriches the interpretability of engagement assessments by making these signs of disengagement explicit. Extensive experimental results demonstrate that BCTR achieves superior performance, particularly in challenging, distraction-rich environments.
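To make the dual-branch, class-token idea concrete, below is a minimal sketch in PyTorch: one branch refines a learnable class token over frame-level tokens (static cues such as eye closure), the other over video-level tokens (dynamic cues such as head lowering), and the two refined tokens are fused for classification. All module names, dimensions, the four-level output, and the concatenation fusion are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a dual-branch, class-token transformer in the spirit
# of BCTR, assuming PyTorch. Names and hyperparameters are hypothetical.
import torch
import torch.nn as nn


class ClassTokenBranch(nn.Module):
    """Transformer encoder whose learnable class token summarizes a token
    sequence (e.g., one family of disengagement cues)."""

    def __init__(self, dim=256, depth=2, heads=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens):  # tokens: (B, N, dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return out[:, 0]  # refined class token, shape (B, dim)


class DualBranchEngagementNet(nn.Module):
    """Static branch reads frame-level tokens (per-frame appearance);
    dynamic branch reads video-level tokens (motion across frames)."""

    def __init__(self, dim=256, num_classes=4):  # 4 levels is an assumption
        super().__init__()
        self.static_branch = ClassTokenBranch(dim)
        self.dynamic_branch = ClassTokenBranch(dim)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, frame_tokens, video_tokens):
        s = self.static_branch(frame_tokens)   # e.g., eye closure cues
        d = self.dynamic_branch(video_tokens)  # e.g., head-lowering cues
        return self.head(torch.cat([s, d], dim=-1))


if __name__ == "__main__":
    model = DualBranchEngagementNet()
    frames = torch.randn(2, 16, 256)  # 16 frame-level tokens per clip
    motion = torch.randn(2, 16, 256)  # 16 video-level tokens per clip
    print(model(frames, motion).shape)  # torch.Size([2, 4])
```

Because each branch exposes its own refined class token, the attention weights attached to those tokens can be inspected per cue, which is one plausible route to the interpretability the abstract describes.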
Acknowledgement
This work is supported by the Ningbo Key Research and Development Program (Grant No. 2023Z057), and the Fundamental Research Funds for the Central Universities (226-2024-00058).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bei, Y. et al. (2025). Behavior Capture Based Explainable Engagement Recognition. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15040. Springer, Singapore. https://doi.org/10.1007/978-981-97-8792-0_17
DOI: https://doi.org/10.1007/978-981-97-8792-0_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8791-3
Online ISBN: 978-981-97-8792-0
eBook Packages: Computer Science (R0)