Information disentanglement based cross-modal representation learning for visible-infrared person re-identification | Multimedia Tools and Applications

  • 1227: Content-based Image Retrieval
  • Published:
Multimedia Tools and Applications

Abstract

Visible-infrared person re-identification (VI-ReID) is an important but very challenging task in automated video surveillance and forensics. Although existing VI-ReID methods have achieved encouraging results, how to make full use of the information contained in cross-modality visible and infrared images has not been well studied. In this paper, we propose an Information Disentanglement based Cross-modal Representation Learning (IDCRL) approach for VI-ReID. Specifically, IDCRL first extracts shared and specific features from the data of each modality by using a shared feature learning module and a specific feature learning module, respectively. To ensure that the shared and specific information is well disentangled, we impose an orthogonality constraint on the shared and specific features of each modality. To make the shared features extracted from visible and infrared images of the same person highly similar, IDCRL introduces a shared feature consistency constraint. Furthermore, IDCRL uses a modality-aware loss to ensure that useful modality-specific features are extracted effectively from each modality. The resulting shared and specific features are then concatenated to form the representation of each image. Finally, an identity loss and a cross-modal discriminant loss are employed to enhance the discriminability of this representation. We conducted comprehensive experiments on two benchmark visible-infrared pedestrian datasets, SYSU-MM01 and RegDB, to evaluate the efficacy of IDCRL. Experimental results demonstrate that IDCRL outperforms the compared state-of-the-art methods. On SYSU-MM01, the rank-1 matching rate of our approach reaches 62.35% and 71.64% in the all-search and indoor-search modes, respectively. On RegDB, the rank-1 matching rate reaches 76.32% and 75.49% in the visible-to-thermal and thermal-to-visible modes, respectively.
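The abstract names IDCRL's constraints but does not give their formulas. As a rough illustration only, the plain-Python sketch below shows one common way such terms can be written: an orthogonality penalty, taken here as the mean squared cosine similarity between each image's shared and specific features; a shared feature consistency term, taken here as the squared distance between per-identity visible and infrared centres of the shared features; and the concatenation of the two features. These formulations and all function names (`orthogonality_loss`, `shared_consistency_loss`, `image_representation`) are assumptions, not the authors' definitions. The modality-aware, identity and cross-modal discriminant losses are not sketched.

```python
import math
from collections import defaultdict


def _cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def orthogonality_loss(shared, specific):
    """ASSUMED form: mean squared cosine between each image's shared and specific feature.

    Equals zero when every shared vector is orthogonal to the specific vector
    of the same image; computed separately for each modality.
    """
    return sum(_cosine(s, p) ** 2 for s, p in zip(shared, specific)) / len(shared)


def _identity_centres(features, ids):
    """Average the feature vectors of each identity within one modality."""
    sums, counts = {}, defaultdict(int)
    for f, pid in zip(features, ids):
        acc = sums.setdefault(pid, [0.0] * len(f))
        for k, x in enumerate(f):
            acc[k] += x
        counts[pid] += 1
    return {pid: [x / counts[pid] for x in acc] for pid, acc in sums.items()}


def shared_consistency_loss(vis_shared, vis_ids, ir_shared, ir_ids):
    """ASSUMED form: mean squared distance between the visible and infrared centres
    of the shared features, over identities present in both modalities."""
    vis_c = _identity_centres(vis_shared, vis_ids)
    ir_c = _identity_centres(ir_shared, ir_ids)
    common = vis_c.keys() & ir_c.keys()
    if not common:
        return 0.0
    return sum(
        sum((a - b) ** 2 for a, b in zip(vis_c[pid], ir_c[pid])) for pid in common
    ) / len(common)


def image_representation(shared, specific):
    """Concatenate one image's shared and specific features into its final representation."""
    return list(shared) + list(specific)
```

In a training loop, `orthogonality_loss` would be applied to each modality's batch and the terms combined with weighting coefficients; a practical implementation would use a tensor library with automatic differentiation rather than Python lists.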

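The results above are reported as rank-1 matching rates. A minimal sketch of that metric, assuming Euclidean distance between image representations (the function name `rank1_rate` is hypothetical): a query counts as a hit when its single nearest gallery sample has the same identity.

```python
import math


def rank1_rate(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery feature (Euclidean distance) shares their identity."""
    hits = 0
    for q, qid in zip(query_feats, query_ids):
        dists = [math.dist(q, g) for g in gallery_feats]
        # Index of the closest gallery sample; ties resolve to the lowest index.
        nearest = min(range(len(dists)), key=dists.__getitem__)
        hits += int(gallery_ids[nearest] == qid)
    return hits / len(query_feats)
```

For the cross-modal modes, queries come from one modality and the gallery from the other, for example visible queries against a thermal gallery in RegDB's visible-to-thermal mode. The official benchmark protocols add details omitted here, such as repeated random gallery sampling on SYSU-MM01.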

Fig. 1 to Fig. 12 and Algorithm 1 (not reproduced)


Data Availability

The SYSU-MM01 dataset that supports the findings of this study is available from the corresponding author upon reasonable request; researchers must sign a dataset release agreement before they can obtain the download link. The project page for SYSU-MM01 is https://github.com/wuancong/SYSU-MM01. The RegDB dataset that supports the findings of this study is publicly available online at https://github.com/bismex/HiCMD and https://drive.google.com/file/d/1gnVt9GIQSvium_mcxc7AWLhSXm6lNWsa/view?usp=sharing.


Acknowledgments

This work was supported by the NSFC Project (No. 62176069), Young Scientists Fund of the National Natural Science Foundation of China (No. 62006070), Natural Science Foundation of Henan Province (Nos. 202300410092 and 202300410093), Key Scientific and Technological Project of Henan Province of China (Nos. 222102210204 and 222102210197), and the Excellent Youth Scientific Research Project of Hunan Education Department (No. 21B0582).

Author information

Corresponding authors

Correspondence to Xiaopan Chen or Xinyu Zhang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest regarding the publication of this paper.


Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, X., Zheng, M., Chen, X. et al. Information disentanglement based cross-modal representation learning for visible-infrared person re-identification. Multimed Tools Appl 82, 37983–38009 (2023). https://doi.org/10.1007/s11042-022-13669-3


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-022-13669-3

Keywords
