Abstract
Most existing Zero-Shot Learning (ZSL) approaches learn direct embeddings from global features or image parts (regions) to the semantic space. Such embeddings, however, fail to capture the appearance relationships between different local regions within a single image. In this paper, we incorporate region-based relation reasoning into ZSL to model the relations among local image regions. Our method, termed Region Graph Embedding Network (RGEN), is trained end-to-end from raw image data. Specifically, RGEN consists of two branches: the Constrained Part Attention (CPA) branch and the Parts Relation Reasoning (PRR) branch. The CPA branch is built upon attention and produces the image regions. To exploit the progressive interactions among these regions, we represent them as a region graph, on which parts relation reasoning is performed with graph convolutions; this forms our PRR branch. To train the model, we introduce a transfer loss and a balance loss, which contrast class similarities and pursue maximum response consistency among seen and unseen outputs, respectively. Extensive experiments on four datasets validate the effectiveness of the proposed method under both the ZSL and generalized ZSL settings.
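As a rough illustration of what graph convolution over a region graph can look like, the following is a minimal NumPy sketch and not the RGEN implementation. The abstract does not specify how edges are built, so the cosine-similarity adjacency with a row-wise softmax is an assumption here, and the function name `region_graph_conv`, the feature sizes, and the random inputs are all hypothetical.

```python
import numpy as np

def region_graph_conv(regions, weight):
    """One graph-convolution step over a set of region features.

    regions: (K, D) array, one D-dimensional feature per image region.
    weight:  (D, D_out) projection matrix (random here, learned in practice).
    """
    # Assumption: edge strength is the cosine similarity between region features.
    unit = regions / (np.linalg.norm(regions, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    # Row-wise softmax turns similarities into a row-stochastic adjacency matrix.
    sim = sim - sim.max(axis=1, keepdims=True)
    adj = np.exp(sim)
    adj = adj / adj.sum(axis=1, keepdims=True)
    # Each region aggregates features from all regions, then project and apply ReLU.
    return np.maximum(adj @ regions @ weight, 0.0)

# Hypothetical sizes: 10 regions with 16-dim features projected to 8 dims.
rng = np.random.default_rng(0)
regions = rng.standard_normal((10, 16))
weight = rng.standard_normal((16, 8))
out = region_graph_conv(regions, weight)
```

Stacking several such steps would let information spread progressively between regions, which is one plausible reading of the "progressive interactions" mentioned in the abstract.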
Notes
1. In this paper, the terms part and region are used interchangeably.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61702163 and 61976116), the Fundamental Research Funds for the Central Universities (No. 30920021135), and the Key Project of Shenzhen Municipal Technology Research (No. JSGG20200103103401723).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Xie, GS. et al. (2020). Region Graph Embedding Network for Zero-Shot Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_33
DOI: https://doi.org/10.1007/978-3-030-58548-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58547-1
Online ISBN: 978-3-030-58548-8
eBook Packages: Computer Science, Computer Science (R0)