Abstract
Visible-infrared person re-identification (VI-ReID) is an important but very challenging task in automated video surveillance and forensics. Although existing VI-ReID methods have achieved encouraging results, how to make full use of the useful information contained in cross-modality visible and infrared images has not been well studied. In this paper, we propose an Information Disentanglement based Cross-modal Representation Learning (IDCRL) approach for VI-ReID. Specifically, IDCRL first extracts shared and specific features from the data of each modality by using a shared feature learning module and a specific feature learning module, respectively. To ensure that the shared and specific information is well disentangled, we impose an orthogonality constraint on the shared and specific features of each modality. To make the shared features extracted from the visible and infrared images of the same person highly similar, IDCRL introduces a shared feature consistency constraint. Furthermore, IDCRL uses a modality-aware loss to ensure that useful modality-specific features are extracted effectively from each modality. The obtained shared and specific features are then concatenated as the representation of each image. Finally, an identity loss function and a cross-modal discriminant loss function are employed to enhance the discriminability of the obtained image representation. We conducted comprehensive experiments on the benchmark visible-infrared pedestrian datasets (SYSU-MM01 and RegDB) to evaluate the efficacy of our IDCRL approach. Experimental results demonstrate that IDCRL outperforms the compared state-of-the-art methods. On the SYSU-MM01 dataset, the rank-1 matching rate of our approach reaches 62.35% and 71.64% in the all-search and indoor-search modes, respectively. On the RegDB dataset, the rank-1 matching rate of our approach reaches 76.32% and 75.49% in the visible-to-thermal and thermal-to-visible modes, respectively.
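To make the disentanglement idea concrete, the following is a minimal NumPy sketch of the three operations the abstract names: an orthogonality constraint between shared and specific features, a consistency constraint between the shared features of paired visible and infrared images, and concatenation into a final representation. The exact loss formulations used by IDCRL are not given in the abstract, so the squared-cosine and squared-distance forms below, along with all function names and the toy feature values, are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np


def l2_normalize(features, eps=1e-12):
    """Scale each row (one image's feature vector) to unit length."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / (norms + eps)


def orthogonality_penalty(shared, specific):
    """Mean squared cosine between each image's shared and specific features.

    Assumed form (not taken from the paper): the penalty is 0 when the two
    vectors of every image are orthogonal. Both inputs must share a shape.
    """
    s = l2_normalize(shared)
    p = l2_normalize(specific)
    return float(np.mean(np.sum(s * p, axis=1) ** 2))


def consistency_penalty(shared_visible, shared_infrared):
    """Mean squared distance between shared features of paired images.

    Assumes row i of both arrays comes from the same person.
    """
    diff = shared_visible - shared_infrared
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def build_representation(shared, specific):
    """Concatenate shared and specific features along the feature axis."""
    return np.concatenate([shared, specific], axis=1)


# Toy features for two visible images and their infrared counterparts
# (hypothetical values chosen only to exercise the functions).
shared_vis = np.array([[1.0, 0.0], [0.0, 1.0]])
specific_vis = np.array([[0.0, 2.0], [3.0, 0.0]])
shared_ir = np.array([[1.0, 0.0], [0.0, 1.0]])

orth = orthogonality_penalty(shared_vis, specific_vis)  # orthogonal pairs
cons = consistency_penalty(shared_vis, shared_ir)       # identical shared features
rep = build_representation(shared_vis, specific_vis)    # shape (2, 4)
```

In a training setting, terms like these would be weighted and summed with the modality-aware, identity, and cross-modal discriminant losses; the weighting scheme is not stated in the abstract and is left out here.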
Data Availability
The SYSU-MM01 dataset that supports the findings of this study is available from the corresponding author upon reasonable request. The researchers will need to sign a dataset release agreement before they can obtain the download link. The related page for SYSU-MM01 is https://github.com/wuancong/SYSU-MM01. The RegDB dataset that supports the findings of this study is publicly available online at https://github.com/bismex/HiCMD and https://drive.google.com/file/d/1gnVt9GIQSvium_mcxc7AWLhSXm6lNWsa/view?usp=sharing.
Acknowledgments
This work was supported by the NSFC Project (No. 62176069), Young Scientists Fund of the National Natural Science Foundation of China (No. 62006070), Natural Science Foundation of Henan Province (Nos. 202300410092 and 202300410093), Key Scientific and Technological Project of Henan Province of China (Nos. 222102210204 and 222102210197), and the Excellent Youth Scientific Research Project of Hunan Education Department (No. 21B0582).
Ethics declarations
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, X., Zheng, M., Chen, X. et al. Information disentanglement based cross-modal representation learning for visible-infrared person re-identification. Multimed Tools Appl 82, 37983–38009 (2023). https://doi.org/10.1007/s11042-022-13669-3
DOI: https://doi.org/10.1007/s11042-022-13669-3