Abstract
Recent advances in deep learning have substantially improved facial video-based depression recognition, yet existing models still face critical limitations. They struggle with spatially localized facial feature extraction, relying on either a single convolution or a convolution paired with a simplistic attention mechanism, which leads to inadequate recognition of depression-relevant patterns. Moreover, increasing model depth yields ambiguous, overly abstract features while losing crucial dynamic facial details. To address these issues, we propose LMS-VDR, which combines a Multi-Scale Mixed Attention Module (MSMAM) with landmark-based prior knowledge. MSMAM fuses channel and spatial attention into a mixed attention vector block via vector products, and introduces a dense connection mechanism that links the features of each scale directly to the final output, enabling diverse multi-scale feature extraction. The landmarks are first linearly transformed and then combined with the temporal feature sequences, enhancing dynamic temporal feature extraction through our proposed Cross Multi-Head Self-Attention (CMHSA) block. Experiments on the AVEC 2013 and AVEC 2014 datasets validate the method's efficacy, achieving MAE/RMSE of 6.04/7.68 and 5.98/7.59, respectively. LMS-VDR offers a promising direction for clinical depression assessment, demonstrating the potential for meaningful contributions in this critical domain.
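The abstract describes fusing channel and spatial attention into a single attention tensor through a vector product. A minimal NumPy sketch of that idea is given below; the function name, the pooling choices, and the sigmoid gating are illustrative assumptions, not the paper's actual MSMAM implementation, which the abstract does not specify in detail.

```python
import numpy as np


def mixed_attention(feat):
    """Hypothetical sketch of a mixed attention vector block.

    feat: feature map of shape (C, H, W).
    A channel-attention vector (C,) and a spatial-attention map (H, W)
    are each obtained by pooling plus sigmoid gating, then combined
    into one (C, H, W) attention tensor via their outer (vector)
    product, which rescales the input features.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Channel attention: global average pool over spatial dims -> (C,)
    chan = sigmoid(feat.mean(axis=(1, 2)))
    # Spatial attention: average pool over channels -> (H, W)
    spat = sigmoid(feat.mean(axis=0))
    # Vector product of the two gates -> joint (C, H, W) attention
    mixed = chan[:, None, None] * spat[None, :, :]
    return feat * mixed
```

Because both gates lie in (0, 1), the combined attention only attenuates features; regions favored by both the channel and spatial gates are suppressed least.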
Acknowledgement
This study received support from the National Natural Science Foundation of China (Grant No. 61876112) and the Beijing Natural Science Foundation (Grant No. 4242034).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Yang, M. et al. (2025). LMS-VDR: Integrating Landmarks into Multi-scale Hybrid Net for Video-Based Depression Recognition. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15040. Springer, Singapore. https://doi.org/10.1007/978-981-97-8792-0_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8791-3
Online ISBN: 978-981-97-8792-0