Abstract
Recognizing human activities from video has become a hot research topic in computer vision. Many studies show that action recognition from a single view cannot achieve satisfactory performance, so many researchers have turned their attention to multi-view action recognition; however, how to mine the relationships among different views remains a challenging problem. Image-set-based video face recognition has shown that image set algorithms can effectively mine the complementary properties of images from different views and achieve satisfactory performance. Inspired by this, we use image sets to mine the relationships among views in multi-view action recognition. However, studies show that the number of samples in the gallery and query sets affects the performance of image-set-based video face recognition, where each set typically contains from several tens to several hundred samples. In multi-view action recognition, by contrast, each query set contains only 3–5 views (samples), which limits the effectiveness of image set methods.
To address this issue, we propose a reverse testing image set model (RTISM) for multi-view human action recognition. First, we extract dense trajectory features for each camera, construct a codebook shared by all cameras with k-means, and encode each camera's features with a Bag-of-Words (BoW) weighting scheme. Second, for each query set, we compute the compound distance to every image subset in the gallery set and add the nearest image subset (called RTIS) to the query set. Finally, RTISM is optimized so that the query set and the RTIS are jointly reconstructed by the gallery set; in this way, the relationships among different actions in the gallery set and the complementary properties of different samples in the query set are exploited simultaneously. Large-scale experiments on two public multi-view 3D action datasets, Northwestern UCLA and CVS-MV-RGBD-Single, show that reconstructing the query set over the gallery set is very effective and that adding the RTIS to the query set is very helpful for classification. Moreover, the performance of RTISM is comparable to that of state-of-the-art methods.
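The first two stages described above (shared codebook, BoW encoding, and nearest-subset augmentation of the query set) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: all function names are hypothetical, dense trajectory extraction is replaced by descriptor arrays supplied by the caller, and `set_distance` is only an assumed stand-in for the compound distance, which the abstract does not define. The final joint reconstruction step is not shown.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centres."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every point to its nearest centre.
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous centre
                centres[j] = members.mean(axis=0)
    return centres


def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()


def set_distance(query_set, subset):
    """Assumed stand-in for the compound distance: the smallest Euclidean
    distance between any query sample and any sample of the subset."""
    dists = np.linalg.norm(query_set[:, None, :] - subset[None, :, :], axis=2)
    return dists.min()


def augment_query(query_set, gallery_subsets):
    """Pick the nearest gallery subset and append it to the query set."""
    nearest = min(
        gallery_subsets,
        key=lambda name: set_distance(query_set, gallery_subsets[name]),
    )
    return nearest, np.vstack([query_set, gallery_subsets[nearest]])


# Usage with synthetic data: 3 cameras, 50 descriptors of dimension 8 each.
rng = np.random.default_rng(1)
per_camera = [rng.normal(size=(50, 8)) for _ in range(3)]
codebook = kmeans(np.vstack(per_camera), k=16)  # one codebook for all cameras
query_set = np.stack([bow_histogram(d, codebook) for d in per_camera])
gallery_subsets = {  # hypothetical action labels with random BoW-like rows
    "wave": rng.dirichlet(np.ones(16), size=3),
    "sit": rng.dirichlet(np.ones(16), size=3),
}
label, augmented = augment_query(query_set, gallery_subsets)
```

Pooling the descriptors of all cameras before clustering is what makes the codebook shared, so histograms from different views live in the same space and can be compared or stacked into one query set.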
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. 61572357, No. 61502337, No. 61472275, No. 61201234, No. 61202168), the Tianjin Municipal Natural Science Foundation (No. 14JCZDJC31700, No. 13JCQNJC0040), and the Tianjin Education Committee Science and Technology Development Foundation (No. 20120802).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Gao, Z., Zhang, Y., Zhang, H., Xu, G.P., Xue, Y.B. (2016). Reverse Testing Image Set Model Based Multi-view Human Action Recognition. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_33
DOI: https://doi.org/10.1007/978-3-319-27671-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)