The Latent Semantic Power of Labels: Improving Image Classification via Natural Language Semantic

Jia, Haosen; Yao, Hong; Tian, Tian; Yan, Cheng; Li, Shengwen

doi:10.1007/978-3-030-37429-7_18


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11956)

Included in the following conference series:

International Conference on Human Centered Computing


Abstract

One-hot encoding was introduced into image classification because numerical labels are difficult to optimize, and it is now widely used in CNN-based models. However, one-hot encoding neglects the textual semantics of class labels, which relate closely to image characteristics and carry latent connections between images. Inspired by distributional-similarity-based representations from the Natural Language Processing community, we propose a framework that introduces Word2Vec into classic CNN models to improve image classification performance. To exploit the latent semantics of class labels, the classification model uses word-vector representations of the labels in place of traditional one-hot encoding. In evaluation experiments on the CIFAR-10 and CIFAR-100 data sets, a series of representative CNNs were tested as the feature-extraction component of our framework. Experimental results show that the proposed method improves classification accuracy.
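The contrast between one-hot targets and word-vector targets can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three-dimensional vectors in `LABEL_VECTORS` are invented rather than taken from a trained Word2Vec model, `predicted` stands in for a CNN output, and decoding by nearest cosine similarity (the hypothetical helper `decode`) is a common choice for label-embedding classifiers that the abstract does not confirm as the authors' own procedure.

```python
import math

# Hypothetical 3-d "word vectors" for three CIFAR-10 class names.
# These numbers are invented for illustration; real Word2Vec vectors
# are learned from a text corpus and usually have far more dimensions.
LABEL_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.0],
    "truck": [0.0, 0.1, 0.9],
}


def one_hot(label, classes):
    """Traditional target: all distinct classes are mutually orthogonal."""
    return [1.0 if c == label else 0.0 for c in classes]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decode(predicted_vector, label_vectors):
    """Return the class whose word vector is most similar to the prediction."""
    return max(
        label_vectors,
        key=lambda name: cosine(predicted_vector, label_vectors[name]),
    )


classes = list(LABEL_VECTORS)

# One-hot targets carry no notion of similarity between classes.
sim_onehot_cat_dog = cosine(one_hot("cat", classes), one_hot("dog", classes))

# Word-vector targets place semantically related labels close together.
sim_embed_cat_dog = cosine(LABEL_VECTORS["cat"], LABEL_VECTORS["dog"])
sim_embed_cat_truck = cosine(LABEL_VECTORS["cat"], LABEL_VECTORS["truck"])

# Stand-in for a network's output embedding on some image (hypothetical).
predicted = [0.85, 0.15, 0.05]
predicted_label = decode(predicted, LABEL_VECTORS)
```

With these toy vectors, "cat" and "dog" are far more similar to each other than either is to "truck", whereas their one-hot encodings are orthogonal. That extra structure is the kind of label semantics the abstract argues one-hot encoding discards.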

This research was supported by NSFC Projects 61672474 and 61501412, the National Science and Technology Major Project (No. 2017ZX05036-001-010), and the Science and Technology Planning Project of Guangdong Province, China (No. 2018B020207012).



Author information

Authors and Affiliations

School of Computer Science, China University of Geosciences, Wuhan, 430074, China
Haosen Jia, Hong Yao, Tian Tian & Cheng Yan
Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430074, China
Hong Yao & Tian Tian
Faculty of Information Engineering, China University of Geosciences, Wuhan, 430074, China
Shengwen Li


Corresponding author

Correspondence to Hong Yao.

Editor information

Editors and Affiliations

University of Kragujevac, Čačak, Serbia
Danijela Milošević
South China Normal University, Guangzhou, China
Yong Tang
Wuhan University of Technology, Wuhan, Hubei, China
Qiaohong Zu

About this paper

Cite this paper

Jia, H., Yao, H., Tian, T., Yan, C., Li, S. (2019). The Latent Semantic Power of Labels: Improving Image Classification via Natural Language Semantic. In: Milošević, D., Tang, Y., Zu, Q. (eds) Human Centered Computing. HCC 2019. Lecture Notes in Computer Science, vol 11956. Springer, Cham. https://doi.org/10.1007/978-3-030-37429-7_18


DOI: https://doi.org/10.1007/978-3-030-37429-7_18
Published: 12 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37428-0
Online ISBN: 978-3-030-37429-7
eBook Packages: Computer Science, Computer Science (R0)
