Reverse Testing Image Set Model Based Multi-view Human Action Recognition

  • Conference paper
MultiMedia Modeling (MMM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Abstract

Recognizing human activities from videos has become a hot research topic in computer vision. However, many studies show that action recognition based on a single view cannot achieve satisfactory performance, so many researchers have turned their attention to multi-view action recognition, where mining the relationships among different views remains a challenging problem. Image-set-based video face recognition algorithms have shown that image set methods can effectively exploit the complementary properties of images captured from different views and achieve satisfactory performance. Inspired by this, we use image sets to mine the relationships among views in multi-view action recognition. However, studies also show that the number of samples in the gallery and query sets affects the performance of image-set-based video face recognition, where tens to hundreds of samples are typically supplied per set. In multi-view action recognition, by contrast, each query set contains only 3–5 views (samples), which limits the effectiveness of image set methods.

To address this issue, we propose a reverse testing image set model (RTISM) for multi-view human action recognition. First, we extract dense trajectory features for each camera, construct a codebook shared by all cameras using k-means, and then encode the features of each camera with a Bag-of-Words (BoW) weighting scheme. Second, for each query set, we compute a compound distance to each image subset in the gallery set, and the nearest image subset, called RTIS, is added to the query set. Finally, RTISM is optimized so that the query set and the RTIS are reconstructed together by the gallery set; in this way, the relationships among different actions in the gallery set and the complementary properties of different samples in the query set are exploited simultaneously. Large-scale experiments on two public multi-view 3D action datasets, Northwestern UCLA and CVS-MV-RGBD-Single, show that reconstructing the query set over the gallery set is very effective, that adding the RTIS to the query set substantially helps classification, and that RTISM performs comparably to state-of-the-art methods.
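
The pipeline described above can be illustrated with a toy Python sketch. The abstract specifies neither the compound distance nor the exact RTISM objective, so everything below is an assumption rather than the authors' implementation: dense trajectory extraction is skipped (local descriptors are taken as given), the shared codebook comes from plain Lloyd's k-means, BoW encoding uses hard assignment with L1 normalization, the hypothetical `set_distance` (mean pairwise Euclidean distance) stands in for the compound distance, and the joint reconstruction is replaced by a ridge-regularized (collaborative) representation over all gallery vectors followed by a class-wise residual decision. The gallery layout, a list of `(label, subset)` pairs with one BoW vector per view, and all function names are likewise hypothetical.

```python
# Toy sketch only: the compound distance and the reconstruction model are
# HYPOTHETICAL stand-ins, not the RTISM formulation from the paper.
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the centroids serve as the codebook shared by all cameras."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: sq_dist(p, centroids[c]))].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword; L1-normalize the counts."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[min(range(len(codebook)), key=lambda c: sq_dist(d, codebook[c]))] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def set_distance(set_a, set_b):
    """HYPOTHETICAL compound distance: mean pairwise Euclidean distance between two sets."""
    return sum(math.sqrt(sq_dist(a, b)) for a in set_a for b in set_b) / (len(set_a) * len(set_b))


def nearest_subset(query_set, gallery):
    """gallery: list of (label, subset) pairs, each subset a list of per-view BoW vectors."""
    return min(gallery, key=lambda item: set_distance(query_set, item[1]))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def classify_by_reconstruction(query_set, gallery, lam=0.1):
    """Ridge-regularized reconstruction of each query vector over ALL gallery vectors;
    return the class whose own atoms leave the smallest summed residual."""
    atoms = [v for _, subset in gallery for v in subset]
    labels = [lbl for lbl, subset in gallery for _ in subset]
    n = len(atoms)
    # Gram matrix of the gallery atoms plus lam on the diagonal (ridge penalty).
    gram = [[sum(x * y for x, y in zip(atoms[i], atoms[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    residual = {lbl: 0.0 for lbl in labels}
    for q in query_set:
        coef = solve(gram, [sum(x * y for x, y in zip(a, q)) for a in atoms])
        for lbl in residual:
            recon = [0.0] * len(q)
            for c, a, owner in zip(coef, atoms, labels):
                if owner == lbl:
                    recon = [r + c * x for r, x in zip(recon, a)]
            residual[lbl] += sq_dist(q, recon)
    return min(residual, key=residual.get)


def rtism_like_predict(query_set, gallery, lam=0.1):
    """Append the nearest gallery subset (the RTIS) to the query set, then classify the enlarged set."""
    _, rtis = nearest_subset(query_set, gallery)
    return classify_by_reconstruction(query_set + rtis, gallery, lam)
```

For example, with a two-class gallery such as `[("walk", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]), ("wave", [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])]`, the query set `[[0.95, 0.05, 0.0], [0.85, 0.15, 0.0]]` is assigned to `"walk"` by `rtism_like_predict`. The ridge penalty is used here only because it yields a closed-form linear system that a few lines of pure Python can solve; a sparse (ℓ1) coder would be closer to classic representation-based classification. Because the appended RTIS vectors are themselves gallery atoms, they are largely reconstructed by their own class and pull the decision toward the nearest subset's label; how the actual model balances this is not stated in the abstract.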

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 61572357, No. 61502337, No. 61472275, No. 61201234, No. 61202168), the Tianjin Municipal Natural Science Foundation (No. 14JCZDJC31700, No. 13JCQNJC0040), and the Tianjin Education Committee Science and Technology Development Foundation (No. 20120802).

Author information

Correspondence to Z. Gao.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gao, Z., Zhang, Y., Zhang, H., Xu, G.P., Xue, Y.B. (2016). Reverse Testing Image Set Model Based Multi-view Human Action Recognition. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_33

  • DOI: https://doi.org/10.1007/978-3-319-27671-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27670-0

  • Online ISBN: 978-3-319-27671-7

  • eBook Packages: Computer Science, Computer Science (R0)
