Abstract
Humans can appreciate the implicit and explicit contexts of a visual scene within a few seconds. How computers might obtain such interpretations of a visual scene is not well understood, so the question remains whether this ability can be emulated. We investigated activity classification of movie clips using a 3D convolutional neural network (CNN) as well as combinations of a 2D CNN and long short-term memory (LSTM). This work was motivated by the observation that CNNs can effectively learn representations of visual features and LSTMs can effectively learn temporal information; hence, an architecture that combines information from many time slices should provide an effective means of capturing the spatiotemporal features of an image sequence. Eight experiments were carried out on three main architectures: a 3D CNN, ConvLSTM2D, and a pipeline combining a pre-trained CNN with an LSTM. We analyzed the empirical results, followed by a critical discussion of the analyses and suggestions for future research directions in this domain.
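The chapter itself does not include code, but a minimal sketch may help clarify the third architecture named above, the pre-trained CNN-LSTM pipeline: a frozen image CNN extracts per-frame features, and an LSTM aggregates them over time for classification. The sketch below assumes a Keras/TensorFlow setup; the backbone (MobileNetV2), input size, LSTM width, and number of activity classes are illustrative assumptions, not the authors' configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, H, W, C = 8, 224, 224, 3   # eight frames per clip (see Note 1)
NUM_CLASSES = 12                        # hypothetical number of activity classes

# Frozen pre-trained CNN used as a per-frame feature extractor.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet",
    input_shape=(H, W, C), pooling="avg")
backbone.trainable = False

inputs = layers.Input(shape=(NUM_FRAMES, H, W, C))
# Apply the CNN to each frame independently -> (batch, NUM_FRAMES, feat_dim).
x = layers.TimeDistributed(backbone)(inputs)
# The LSTM aggregates the per-frame features over time.
x = layers.LSTM(256)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The 3D CNN and ConvLSTM2D variants would replace the TimeDistributed-plus-LSTM stack with Conv3D or ConvLSTM2D layers operating directly on the frame stack.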
Notes
- 1.
We chose eight frames from each clip, picked at evenly spaced positions. The choice of eight was arbitrary; a frame-sampling sketch is given below.
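As an illustration of the frame-sampling step described in this note, the following is a minimal sketch using OpenCV and NumPy; the function name and interface are hypothetical, not taken from the paper.

```python
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 8) -> np.ndarray:
    """Pick `num_frames` frames at evenly spaced positions across a clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)   # shape: (num_frames, height, width, 3)
```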
Acknowledgments
We wish to thank the Centre for Innovative Engineering, Universiti Teknologi Brunei for the financial support given to this research. We would also like to thank the anonymous reviewers for their constructive comments and suggestions.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Phon-Amnuaisuk, S., Hadi, S., Omar, S. (2020). Exploring Spatiotemporal Features for Activity Classifications in Films. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_47
DOI: https://doi.org/10.1007/978-3-030-63820-7_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science (R0)