Abstract
Geo-localization is the task of matching a query image depicting the ground view of an unknown location against a set of GPS-tagged satellite-view images. UAV views can mitigate the large visual differences between images taken from different viewpoints. CNN-based approaches have achieved great success in cross-view geo-localization, but they rely on the polar transform and are constrained by the limited receptive field of convolution. We therefore investigate geo-localization in terms of both feature representation and viewpoint transformation and design an end-to-end, jointly trained multitask network. Building on an efficient, lightweight Transformer structure, which provides global information modeling and explicit positional encoding, we propose an end-to-end geo-localization architecture that integrates a geo-localization module and a cross-view synthesis module, called Transformer-based UAV-view geo-localization (TUL). The geo-localization module combines feature segmentation and region alignment to match images across views. The cross-view synthesis module uses a conditional generative adversarial network (cGAN) to synthesize images close to real satellite imagery. Matching and synthesis are trained jointly: matching images across the two input domains biases the network toward latent feature representations that are also useful for image synthesis. Furthermore, we propose an image augmentation strategy to address the sample imbalance caused by the difference between the number of satellite images and images from other sources in the University-1652 dataset. Experiments show that our method brings significant performance improvements and achieves state-of-the-art results.
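The abstract describes a multitask objective that couples cross-view matching with cGAN-based satellite-image synthesis so that one shared representation serves both tasks. Below is a minimal sketch of such a joint training step, not the authors' implementation: it assumes a small CNN stand-in for the paper's Transformer backbone, a location-ID classification loss as the matching term (a common setup on University-1652), a generic conditional-GAN adversarial loss plus L1 as the synthesis term, and placeholder class counts and loss weights.

```python
# Illustrative sketch of joint matching + cGAN synthesis training.
# All module definitions, loss weights, and the class count are assumptions,
# not values or code from the paper.
import torch
import torch.nn as nn

NUM_CLASSES = 701                    # assumed location-ID count (University-1652 train split)
LAMBDA_ADV, LAMBDA_L1 = 1.0, 10.0    # assumed loss weights

class Backbone(nn.Module):
    """Stand-in shared encoder; the paper uses a lightweight Transformer instead."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Toy cGAN generator mapping a UAV image to a satellite-like image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Toy conditional discriminator scoring (UAV condition, satellite image) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1))
    def forward(self, cond, img):
        return self.net(torch.cat([cond, img], dim=1))

backbone, classifier = Backbone(), nn.Linear(256, NUM_CLASSES)
G, D = Generator(), Discriminator()
id_loss, adv_loss, l1_loss = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_gm = torch.optim.Adam(
    list(backbone.parameters()) + list(classifier.parameters()) + list(G.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def train_step(uav, sat, labels):
    """One joint update: matching and synthesis losses drive the same shared parameters."""
    # Discriminator update (standard cGAN real/fake objective).
    real_logits = D(uav, sat)
    fake_logits = D(uav, G(uav).detach())
    d_loss = 0.5 * (adv_loss(real_logits, torch.ones_like(real_logits)) +
                    adv_loss(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Joint matching + synthesis update.
    fake_sat = G(uav)
    gen_logits = D(uav, fake_sat)
    match = id_loss(classifier(backbone(uav)), labels) + \
            id_loss(classifier(backbone(sat)), labels)          # cross-view matching term
    synth = LAMBDA_ADV * adv_loss(gen_logits, torch.ones_like(gen_logits)) + \
            LAMBDA_L1 * l1_loss(fake_sat, sat)                  # cGAN synthesis term
    total = match + synth
    opt_gm.zero_grad(); total.backward(); opt_gm.step()
    return total.item()

# Smoke test with random tensors standing in for paired UAV/satellite crops.
uav, sat = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
labels = torch.randint(0, NUM_CLASSES, (2,))
print(train_step(uav, sat, labels))
```

The point of the shared update is the bias mentioned in the abstract: gradients from both the matching loss and the synthesis loss flow into the same parameters, so the learned features are pushed to support image synthesis as well as retrieval.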
Cite this paper
Wang, P., Yang, Z., Chen, X., Xu, H. (2023). A Transformer-Based Method for UAV-View Geo-Localization. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14259. Springer, Cham. https://doi.org/10.1007/978-3-031-44223-0_27