A Transformer-Based Method for UAV-View Geo-Localization | SpringerLink

A Transformer-Based Method for UAV-View Geo-Localization

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Abstract

Geo-localization is the task of matching a query image depicting the ground view of an unknown location against a set of GPS-tagged satellite-view images. UAV views can mitigate the large visual differences between images taken from different viewpoints. CNN-based approaches have achieved great success in cross-view geo-localization, but they rely on the polar transform and are limited by the local receptive field of convolution. We therefore investigate geo-localization from the perspectives of both feature representation and viewpoint transformation and design an end-to-end, jointly trained multitask network. Building on an efficient, lightweight Transformer structure, we exploit the Transformer's strengths in global information modeling and explicit position encoding, and propose an end-to-end architecture, Transformer-Based for UAV-View Geo-Localization (TUL), which integrates a geo-localization module and a cross-view synthesis module. The geo-localization module combines feature segmentation with region alignment to match images across views. The cross-view synthesis module uses a conditional generative adversarial network (cGAN) to synthesize maps close to real satellite images. By treating matching and synthesis jointly, the network matches images across the two input domains and is biased toward learning latent feature representations that are also useful for image synthesis. In addition, we propose an image augmentation strategy to address the sample imbalance caused by the disparity between the number of satellite images and images from other sources in the University-1652 dataset. Experiments show that our method yields significant performance improvements and achieves state-of-the-art results.
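The abstract mentions an augmentation strategy for sample imbalance but does not describe it; as a hedged sketch of the underlying idea (equalizing per-class counts when each building has only one satellite image but many drone views, as in University-1652), naive oversampling with perturbation could look like the following. The function name and structure are illustrative assumptions, not the authors' code:

```python
import random

def oversample_to_balance(images_per_class, target_count, seed=0):
    """Equalize per-class sample counts by repeating randomly chosen items.
    In a real pipeline each repeated satellite image would be perturbed
    (e.g. flipped, cropped, or color-jittered) rather than duplicated verbatim.
    """
    rng = random.Random(seed)
    balanced = {}
    for cls, items in images_per_class.items():
        pool = list(items)
        while len(pool) < target_count:
            pool.append(rng.choice(items))  # stand-in for an augmented copy
        balanced[cls] = pool
    return balanced

# Example: one satellite image per building vs. many drone views.
data = {"bldg_0001": ["sat_0001.jpg"], "bldg_0002": ["sat_0002.jpg"]}
balanced = oversample_to_balance(data, target_count=4)
```

This only balances sample counts; the choice of perturbations applied to the repeated images would follow whatever augmentations the training pipeline already uses.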



Author information


Corresponding author

Correspondence to Huahu Xu .


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, P., Yang, Z., Chen, X., Xu, H. (2023). A Transformer-Based Method for UAV-View Geo-Localization. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14259. Springer, Cham. https://doi.org/10.1007/978-3-031-44223-0_27


  • DOI: https://doi.org/10.1007/978-3-031-44223-0_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44222-3

  • Online ISBN: 978-3-031-44223-0

