Abstract
Building a robust system for predicting emotions from user-generated videos is challenging, owing to the diversity of video content and the high level of abstraction of human emotions. Given the recent success of deep learning (e.g., Convolutional Neural Networks, CNNs) in several visual-recognition competitions, CNNs are a promising tool for tackling tasks tied to human cognitive processing, such as emotion prediction. The emotion wheel, a widely used emotion categorization in psychology, can provide guidance for building a basic cognitive structure for CNN feature learning. In this work, we predict emotions from user-generated videos with the aid of emotion-wheel-guided CNN feature extractors. Experimental results show that the emotion-wheel-guided, CNN-learned features raise the average emotion prediction accuracy to 54.2%, outperforming related state-of-the-art approaches.
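The pipeline the abstract outlines — one feature extractor per emotion-wheel category, fused into a single video descriptor that a classifier then labels — can be sketched as below. Every concrete detail here is a hypothetical stand-in, not the paper's method: fixed random projections replace the fine-tuned CNN branches, the eight labels are Plutchik's basic emotions, mean pooling replaces the paper's temporal aggregation, and a nearest-prototype rule replaces the actual predictor.

```python
import numpy as np

# Plutchik's eight basic emotions, as arranged on the emotion wheel.
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]

rng = np.random.default_rng(0)

# One projection per wheel category, standing in for a CNN branch
# guided toward that category (a toy substitute for learned features).
branches = {e: rng.normal(size=(32, 8)) for e in WHEEL}

def video_descriptor(frames):
    """frames: (n_frames, 32) array of per-frame features.
    Mean-pool over time, pass through every wheel branch, concatenate."""
    pooled = frames.mean(axis=0)                  # temporal average pooling
    parts = [pooled @ branches[e] for e in WHEEL]
    return np.concatenate(parts)                  # 8 branches x 8 dims = 64-dim

# Toy "training set": one synthetic prototype video per emotion,
# with well-separated frame statistics so the sketch is deterministic.
protos = {e: video_descriptor(rng.normal(loc=3 * i, size=(10, 32)))
          for i, e in enumerate(WHEEL)}

def predict(frames):
    """Label a video by its nearest emotion prototype in descriptor space."""
    d = video_descriptor(frames)
    return min(protos, key=lambda e: np.linalg.norm(d - protos[e]))
```

In the paper's actual setting, each branch would be a CNN fine-tuned under emotion-wheel supervision and the final classifier trained on real video data; this sketch only illustrates the guided-feature fusion structure.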
© 2016 Springer International Publishing AG
Cite this paper
Ho, CT., Lin, YH., Wu, JL. (2016). Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science(), vol 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46686-6
Online ISBN: 978-3-319-46687-3
eBook Packages: Computer Science (R0)