Abstract
Person re-identification (re-id) with unmanned aerial vehicles (UAVs) is of great significance in intelligent surveillance. However, recognizing a person of interest from UAVs is more challenging than conventional person re-id across multiple fixed cameras. Images taken by UAVs have large resolution variations and complex backgrounds because of the rapid movement and constantly changing flight altitudes of UAVs. Some methods propose cross-resolution learning for person images captured by fixed cameras, assuming low-resolution (LR) query images and high-resolution (HR) gallery images. However, they cannot handle the resolution variations in UAV scenarios, where both query and gallery images exhibit significant resolution variations. In this paper, we present a novel multi-resolution feature perception network (MRFPN) to learn discriminative and resolution-robust features for UAV person re-id. First, we introduce a self-attention module that captures full-image context information at the pixel level and produces a pixel context-aware feature map for both HR and LR images, which effectively deals with background clutter. Second, we construct a cross-attention module that learns resolution-robust representations by bi-directionally perceiving the resolution-guided semantic information between HR and LR features. Finally, we design a semantic consistency constraint to limit the difference between HR and LR features. Extensive experiments show the superiority of our method on both UAV and traditional datasets.
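The three components named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, mean pooling to obtain one feature vector per image, and a mean-squared-error form for the consistency term. All function names (`attend`, `self_attention`, `cross_attention`, `mean_pool`, `consistency_loss`) are hypothetical, and the actual MRFPN modules and loss may differ.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections (assumption).

    Each output row is a weighted average of the value rows, weighted by
    the similarity between one query row and every key row.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim)])
    return out


def self_attention(tokens):
    """Every pixel token attends to all tokens of the same image."""
    return attend(tokens, tokens, tokens)


def cross_attention(hr_tokens, lr_tokens):
    """Bi-directional perception: HR queries LR, and LR queries HR."""
    hr_from_lr = attend(hr_tokens, lr_tokens, lr_tokens)
    lr_from_hr = attend(lr_tokens, hr_tokens, hr_tokens)
    return hr_from_lr, lr_from_hr


def mean_pool(tokens):
    """Average the token rows into a single feature vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def consistency_loss(hr_feat, lr_feat):
    """Mean squared difference between HR and LR features (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(hr_feat, lr_feat)) / len(hr_feat)


# Toy pixel tokens: the HR image yields more tokens than the LR image.
hr = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.4], [0.0, 1.0, 0.3]]
lr = [[0.9, 0.1, 0.4], [0.1, 0.8, 0.2]]

hr_ctx, lr_ctx = self_attention(hr), self_attention(lr)
hr_x, lr_x = cross_attention(hr_ctx, lr_ctx)
loss = consistency_loss(mean_pool(hr_x), mean_pool(lr_x))
```

In this sketch the HR and LR branches keep their own token counts, and only the pooled features are compared, so the consistency term does not require the two images to share a spatial size.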
Data Availability Statements
All data included in this study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62171318.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Huang, M., Hou, C., Zheng, X. et al. Multi-resolution feature perception network for UAV person re-identification. Multimed Tools Appl 83, 62559–62580 (2024). https://doi.org/10.1007/s11042-023-17937-8
DOI: https://doi.org/10.1007/s11042-023-17937-8