Abstract
Few-shot learning aims to train classifiers that can recognize new visual object categories from only a few training examples. Recently, metric-learning-based methods have made promising progress. Relation Network is a metric-based method that uses simple convolutional neural networks to learn deep relationships between image features in order to recognize new objects. However, during the feature comparison phase, Relation Network is sensitive to the spatial positions of the compared objects. Moreover, it learns from single-scale features only, which can lead to poor generalization when the compared objects vary in scale. To address these problems, we extend Relation Network to be position-aware and to integrate multi-scale features, for more robust metric learning and better generalization. In this paper, we propose a novel few-shot learning method called Multi-scale Kronecker-Product Relation Networks (MsKPRN). Our method combines feature maps with spatial correlation maps generated by a Kronecker-product module, which capture position-wise correlations between the compared features, and then feeds them to a relation network module, which captures similarities between the combined features in a multi-scale manner. Extensive experiments demonstrate that the proposed method outperforms related state-of-the-art methods on popular few-shot learning datasets. In particular, MsKPRN improves the accuracy of Relation Network from 50.44 to 57.02 and from 65.63 to 72.06 in the 5-way 1-shot and 5-shot scenarios, respectively. Our code will be available at: https://github.com/mouniraziz/MsKPRN.
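The idea of combining feature maps with position-wise correlation maps at more than one scale can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that the correlation between two feature maps is the cosine similarity between the channel vectors at every pair of spatial positions, and that a second scale is obtained by 2x2 average pooling. The function names (`correlation_map`, `combine`, `avg_pool2`, `multi_scale_inputs`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def correlation_map(x, y):
    """Pairwise position correlation between two feature maps (assumed form).

    x, y: arrays of shape (C, H, W).
    Returns an array of shape (H*W, H, W): channel i holds the cosine
    similarity between position i of x and every position of y.
    """
    c, h, w = x.shape
    xf = x.reshape(c, h * w)
    yf = y.reshape(c, h * w)
    # L2-normalise each channel vector so that dot products are cosines.
    xf = xf / (np.linalg.norm(xf, axis=0, keepdims=True) + 1e-8)
    yf = yf / (np.linalg.norm(yf, axis=0, keepdims=True) + 1e-8)
    corr = xf.T @ yf  # (H*W positions of x, H*W positions of y)
    return corr.reshape(h * w, h, w)


def combine(x, y):
    """Stack both feature maps with their correlation map along channels."""
    return np.concatenate([x, y, correlation_map(x, y)], axis=0)


def avg_pool2(f):
    """2x2 average pooling with stride 2 (assumed way to get a coarser scale)."""
    c, h, w = f.shape
    h2, w2 = h // 2, w // 2
    return f[:, : h2 * 2, : w2 * 2].reshape(c, h2, 2, w2, 2).mean(axis=(2, 4))


def multi_scale_inputs(x, y):
    """Combined inputs at the original scale and at one coarser scale."""
    return [combine(x, y), combine(avg_pool2(x), avg_pool2(y))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support = rng.normal(size=(8, 4, 4))  # toy support feature map
    query = rng.normal(size=(8, 4, 4))    # toy query feature map
    for scale in multi_scale_inputs(support, query):
        print(scale.shape)
```

In a full model, each combined tensor would be passed to a learned relation module that outputs a similarity score; that part is omitted here because the abstract does not specify its architecture.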
Acknowledgements
We would like to thank the anonymous referees for their helpful comments and suggestions.
Funding
This study was funded by the National Natural Science Foundation of China (Grant Nos. 61379109, M1321007) and the Science and Technology Plan of Hunan Province (Grant Nos. 2014GK2018, 2016JC2011).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Cite this article
Abdelaziz, M., Zhang, Z. Multi-scale kronecker-product relation networks for few-shot learning. Multimed Tools Appl 81, 6703–6722 (2022). https://doi.org/10.1007/s11042-021-11735-w
DOI: https://doi.org/10.1007/s11042-021-11735-w