
Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-identification

  • Conference paper

Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12362)

Abstract

Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Owing to large intra-class variations and the cross-modality discrepancy, compounded by a large amount of sample noise, it is difficult to learn discriminative part features. Existing VI-ReID methods therefore tend to learn global representations, which have limited discriminability and weak robustness to noisy images. In this paper, we propose a novel dynamic dual-attentive aggregation (DDAG) learning method that mines both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID. We propose an intra-modality weighted-part attention module that extracts discriminative part-aggregated features by imposing domain knowledge on the part-relationship mining. To enhance robustness against noisy samples, we introduce a cross-modality graph structured attention that reinforces the representation with contextual relations across the two modalities. We also develop a parameter-free dynamic dual aggregation learning strategy that adaptively integrates the two components in a progressive joint-training manner. Extensive experiments demonstrate that DDAG outperforms state-of-the-art methods under various settings.
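To make the intra-modality weighted-part attention idea concrete, the following is a minimal NumPy sketch of one plausible reading of the abstract: a backbone feature map is split into horizontal part stripes, pairwise part affinities produce attention weights, and the context-refined parts are aggregated into a single part-level descriptor. All names, shapes, and the scaled-dot-product affinity are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weighted_part_attention(feat_map, num_parts=6):
    """Sketch of part-level attentive aggregation.

    feat_map: (C, H, W) backbone output, e.g. a ResNet50 conv5 map.
    Returns a single (C,) part-aggregated feature vector.
    """
    C, H, W = feat_map.shape
    # Average-pool each horizontal stripe into a part descriptor: (P, C)
    parts = np.stack([
        feat_map[:, i * H // num_parts:(i + 1) * H // num_parts].mean(axis=(1, 2))
        for i in range(num_parts)
    ])
    # Pairwise part-relation affinities (scaled dot product) -> attention weights
    attn = softmax(parts @ parts.T / np.sqrt(C), axis=-1)   # (P, P), rows sum to 1
    refined = attn @ parts                                  # context-refined parts
    # Aggregate refined parts into one part-level representation
    return refined.mean(axis=0)                             # (C,)

feat = np.random.default_rng(0).standard_normal((2048, 18, 9)).astype(np.float32)
vec = weighted_part_attention(feat)
print(vec.shape)  # prints (2048,)
```

In the paper the part weights are additionally shaped by domain knowledge and trained jointly with the cross-modality graph attention; this sketch only shows the stripe-pooling plus self-attention aggregation pattern.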


Notes

  1. We adopt ResNet50 as the backbone network, following [44, 49, 58].

References

  1. Bai, S., Tang, P., Torr, P.H., Latecki, L.J.: Re-ranking via metric fusion for object retrieval and person re-identification. In: CVPR, pp. 740–749 (2019)

  2. Basaran, E., Gokmen, M., Kamasak, M.E.: An efficient framework for visible-infrared cross modality person re-identification. arXiv preprint arXiv:1907.06498 (2019)

  3. Cao, J., Pang, Y., Han, J., Li, X.: Hierarchical shot detector. In: ICCV, pp. 9705–9714 (2019)

  4. Chen, B., Deng, W., Hu, J.: Mixed high-order attention network for person re-identification. In: ICCV, pp. 371–381 (2019)

  5. Chen, D., et al.: Improving deep visual representation for person re-identification by global and local image-language association. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11220, pp. 56–73. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01270-0_4

  6. Chen, Z., Badrinarayanan, V., Lee, C.Y., Rabinovich, A.: GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: ICML, pp. 793–802 (2018)

  7. Dai, P., Ji, R., Wang, H., Wu, Q., Huang, Y.: Cross-modality person re-identification with generative adversarial training. In: IJCAI, pp. 677–683 (2018)

  8. Fang, P., Zhou, J., Roy, S.K., Petersson, L., Harandi, M.: Bilinear attention networks for person retrieval. In: ICCV, pp. 8030–8039 (2019)

  9. Feng, Z., Lai, J., Xie, X.: Learning modality-specific representations for visible-infrared person re-identification. IEEE TIP 29, 579–590 (2020)

  10. Gong, Y., Zhang, Y., Poellabauer, C., et al.: Second-order non-local attention networks for person re-identification. In: ICCV, pp. 3760–3769 (2019)

  11. Hao, Y., Wang, N., Li, J., Gao, X.: HSME: hypersphere manifold embedding for visible thermal person re-identification. In: AAAI, pp. 8385–8392 (2019)

  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

  13. He, R., Wu, X., Sun, Z., Tan, T.: Learning invariant deep representation for NIR-VIS face recognition. In: AAAI, pp. 2000–2006 (2017)

  14. Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: Interaction-and-aggregation network for person re-identification. In: CVPR, pp. 9317–9326 (2019)

  15. Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: VRSTC: occlusion-free video person re-identification. In: CVPR, pp. 7183–7192 (2019)

  16. Huang, D.A., Frank Wang, Y.C.: Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In: ICCV, pp. 2496–2503 (2013)

  17. Jingya, W., Xiatian, Z., Shaogang, G., Wei, L.: Transferable joint attribute-identity deep learning for unsupervised person re-identification. In: CVPR, pp. 2275–2284 (2018)

  18. Leng, Q., Ye, M., Tian, Q.: A survey of open-world person re-identification. IEEE TCSVT 30(4), 1092–1108 (2019)

  19. Li, D., Wei, X., Hong, X., Gong, Y.: Infrared-visible cross-modal person re-identification with an X modality. In: AAAI, pp. 4610–4617 (2020)

  20. Li, S., Bak, S., Carr, P., Wang, X.: Diversity regularized spatiotemporal attention for video-based person re-identification. In: CVPR, pp. 369–378 (2018)

  21. Li, S., Xiao, T., Li, H., Yang, W., Wang, X.: Identity-aware textual-visual matching with latent co-attention. In: ICCV, pp. 1890–1899 (2017)

  22. Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. In: CVPR, pp. 2285–2294 (2018)

  23. Lin, J.W., Li, H.: HPILN: a feature learning framework for cross-modality person re-identification. arXiv preprint arXiv:1906.03142 (2019)

  24. Liu, C.T., Wu, C.W., Wang, Y.C.F., Chien, S.Y.: Spatially and temporally efficient non-local attention network for video-based person re-identification. In: BMVC (2019)

  25. Liu, H., Cheng, J.: Enhancing the discriminative feature learning for visible-thermal cross-modality person re-identification. arXiv preprint arXiv:1907.09659 (2019)

  26. Liu, X., et al.: HydraPlus-Net: attentive deep features for pedestrian analysis. In: ICCV, pp. 350–359 (2017)

  27. Luo, H., et al.: A strong baseline and batch normalization neck for deep person re-identification. arXiv preprint arXiv:1906.08332 (2019)

  28. Mudunuri, S.P., Venkataramanan, S., Biswas, S.: Dictionary alignment with re-ranking for low-resolution NIR-VIS face recognition. IEEE TIFS 14(4), 886–896 (2019)

  29. Nguyen, D.T., Hong, H.G., Kim, K.W., Park, K.R.: Person recognition system based on a combination of body images from visible light and thermal cameras. Sensors 17(3), 605 (2017)

  30. Pang, M., Cheung, Y.M., Shi, Q., Li, M.: Iterative dynamic generic learning for face recognition from a contaminated single-sample per person. IEEE TNNLS (2020)

  31. Pang, M., Cheung, Y.M., Wang, B., Lou, J.: Synergistic generic learning for face recognition from a contaminated single sample per person. IEEE TIFS 15, 195–209 (2019)

  32. Peng, C., Wang, N., Li, J., Gao, X.: Re-ranking high-dimensional deep local representation for NIR-VIS face recognition. IEEE TIP 28, 4553–4565 (2019)

  33. Santurkar, S., Tsipras, D., Ilyas, A., Madry, A.: How does batch normalization help optimization? In: NeurIPS, pp. 2483–2493 (2018)

  34. Sarfraz, M.S., Stiefelhagen, R.: Deep perceptual mapping for cross-modal face recognition. Int. J. Comput. Vision 122(3), 426–438 (2017)

  35. Shao, R., Lan, X., Li, J., Yuen, P.C.: Multi-adversarial discriminative deep domain generalization for face presentation attack detection. In: CVPR, pp. 10023–10031 (2019)

  36. Shao, R., Lan, X., Yuen, P.C.: Joint discriminative learning of deep dynamic textures for 3D mask face anti-spoofing. IEEE TIFS 14(4), 923–938 (2018)

  37. Si, J., et al.: Dual attention matching network for context-aware feature sequence based person re-identification. In: CVPR, pp. 5363–5372 (2018)

  38. Song, G., Chai, W.: Collaborative learning for deep neural networks. In: NeurIPS, pp. 1837–1846 (2018)

  39. Sun, Y., et al.: Perceive where to focus: learning visibility-aware part-level features for partial person re-identification. In: CVPR, pp. 393–402 (2019)

  40. Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 501–518. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_30

  41. Tay, C.P., Roy, S., Yap, K.H.: AANet: attribute attention network for person re-identifications. In: CVPR, pp. 7134–7143 (2019)

  42. Vaswani, A., et al.: Attention is all you need. In: NeurIPS, pp. 5998–6008 (2017)

  43. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. In: ICLR (2018)

  44. Wang, G., Zhang, T., Cheng, J., Liu, S., Yang, Y., Hou, Z.: RGB-infrared cross-modality person re-identification via joint pixel and feature alignment. In: ICCV, pp. 3623–3632 (2019)

  45. Wang, G., Yuan, Y., Chen, X., Li, J., Zhou, X.: Learning discriminative features with multiple granularities for person re-identification. In: ACM MM, pp. 274–282. ACM (2018)

  46. Wang, N., Gao, X., Sun, L., Li, J.: Bayesian face sketch synthesis. IEEE TIP 26(3), 1264–1274 (2017)

  47. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: CVPR, pp. 7794–7803 (2018)

  48. Wang, Z., Wang, Z., Zheng, Y., Wu, Y., Zeng, W., Satoh, S.: Beyond intra-modality: a survey of heterogeneous person re-identification. In: IJCAI (2020)

  49. Wang, Z., Wang, Z., Zheng, Y., Chuang, Y.Y., Satoh, S.: Learning to reduce dual-level discrepancy for infrared-visible person re-identification. In: CVPR, pp. 618–626 (2019)

  50. Wu, A., Zheng, W.S., Yu, H.X., Gong, S., Lai, J.: RGB-infrared cross-modality person re-identification. In: ICCV, pp. 5380–5389 (2017)

  51. Wu, X., Huang, H., Patel, V.M., He, R., Sun, Z.: Disentangled variational representation for heterogeneous face recognition. In: AAAI, pp. 9005–9012 (2019)

  52. Wu, X., Song, L., He, R., Tan, T.: Coupled deep learning for heterogeneous face recognition. In: AAAI, pp. 1679–1686 (2018)

  53. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: ICML, pp. 2048–2057 (2015)

  54. Yang, W., Huang, H., Zhang, Z., Chen, X., Huang, K., Zhang, S.: Towards rich feature discovery with class activation maps augmentation for person re-identification. In: CVPR, pp. 1389–1398 (2019)

  55. Yao, H., Zhang, S., Hong, R., Zhang, Y., Xu, C., Tian, Q.: Deep representation learning with part loss for person re-identification. IEEE TIP 28(6), 2860–2871 (2019)

  56. Ye, M., Lan, X., Leng, Q., Shen, J.: Cross-modality person re-identification via modality-aware collaborative ensemble learning. IEEE TIP 29, 9387–9399 (2020)

  57. Ye, M., Lan, X., Li, J., Yuen, P.C.: Hierarchical discriminative learning for visible thermal person re-identification. In: AAAI, pp. 7501–7508 (2018)

  58. Ye, M., Lan, X., Wang, Z., Yuen, P.C.: Bi-directional center-constrained top-ranking for visible thermal person re-identification. IEEE TIFS 15, 407–419 (2020)

  59. Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep learning for person re-identification: a survey and outlook. arXiv preprint arXiv:2001.04193 (2020)

  60. Ye, M., Shen, J., Shao, L.: Visible-infrared person re-identification via homogeneous augmented tri-modal learning. IEEE TIFS 16, 728–739 (2020)

  61. Ye, M., Shen, J., Zhang, X., Yuen, P.C., Chang, S.F.: Augmentation invariant and instance spreading feature for softmax embedding. IEEE TPAMI (2020)

  62. Zeng, Z., Wang, Z., Wang, Z., Zheng, Y., Chuang, Y.Y., Satoh, S.: Illumination-adaptive person re-identification. IEEE TMM (2020)

  63. Zhang, X., Yu, F.X., Karaman, S., Zhang, W., Chang, S.F.: Heated-up softmax embedding. arXiv preprint arXiv:1809.04157 (2018)

  64. Zhang, X., et al.: AlignedReID: surpassing human-level performance in person re-identification. arXiv preprint arXiv:1711.08184 (2017)

  65. Zhang, Y., Li, K., Li, K., Zhong, B., Fu, Y.: Residual non-local attention networks for image restoration. In: ICLR (2019)

  66. Zhao, L., Li, X., Zhuang, Y., Wang, J.: Deeply-learned part-aligned representations for person re-identification. In: ICCV, pp. 3219–3228 (2017)

  67. Zheng, F., et al.: Pyramidal person re-identification via multi-loss dynamic training. In: CVPR, pp. 8514–8522 (2019)

  68. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: ICCV, pp. 1116–1124 (2015)


Author information

Corresponding author

Correspondence to Jianbing Shen.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Ye, M., Shen, J., Crandall, D.J., Shao, L., Luo, J. (2020). Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12362. Springer, Cham. https://doi.org/10.1007/978-3-030-58520-4_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58520-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58519-8

  • Online ISBN: 978-3-030-58520-4

  • eBook Packages: Computer Science (R0)
