Abstract
In the digital age, social media platforms have amassed a wealth of user-generated content that contains valuable geographic information. However, irregularities and noise in user-generated text have led to suboptimal performance in traditional text-based user geolocation methods. We propose an unsupervised framework for user geolocation based on Large Language Models (LLMs), which uses the LLMs’ strong text processing abilities to geolocate users from their user-generated text without labeled training data. First, the text is preprocessed with regularization rules and an LLM to denoise and normalize it, thus improving data quality. Next, prompts are designed to guide the LLM in understanding how a user’s text indicates location, thereby profiling users. To refine geolocation accuracy, five independent positioning iterations are conducted, and the most frequently predicted location is taken as the final user location. Through a series of experiments, we demonstrate the potential of large language models for processing noisy text and the effectiveness of geolocating users in an unsupervised setting.
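The pipeline described in the abstract (rule-based cleanup, then repeated LLM queries, then a vote) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: `clean_text`, `geolocate_by_vote`, and the `predict` callable are hypothetical names, the cleanup rules (dropping URLs and @mentions) are assumed examples of "regularization rules", and the LLM call is abstracted as any function that maps text to a location string.

```python
import re
from collections import Counter
from typing import Callable


def clean_text(text: str) -> str:
    """Rule-based denoising (assumed rules): drop URLs and @mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return " ".join(text.split())


def geolocate_by_vote(text: str, predict: Callable[[str], str], runs: int = 5) -> str:
    """Query `predict` `runs` times on the cleaned text and return the most frequent answer.

    `predict` stands in for a prompted LLM call (hypothetical interface).
    Answers are lowercased and stripped so trivially different spellings
    count as the same vote.
    """
    cleaned = clean_text(text)
    answers = [predict(cleaned).strip().lower() for _ in range(runs)]
    # most_common breaks ties in favor of the answer seen first
    return Counter(answers).most_common(1)[0][0]
```

In practice `predict` would wrap a prompt and an LLM client; here it is left as a parameter so the voting step can be exercised with any stub. The default of five runs follows the abstract, while the tie-breaking rule is an assumption, since the abstract does not say how ties are resolved.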
Acknowledgments
This study was funded by the National Natural Science Foundation of China (No. U23A20305) and the Key Research and Development Project of Henan Province (No. 221111321200).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, M., Luo, X., Huang, N. (2025). Social Media User Geolocation Based on Large Language Models. In: Chen, X., Huang, X., Yung, M. (eds) Data Security and Privacy Protection. DSPP 2024. Lecture Notes in Computer Science, vol 15215. Springer, Singapore. https://doi.org/10.1007/978-981-97-8540-7_19
DOI: https://doi.org/10.1007/978-981-97-8540-7_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8539-1
Online ISBN: 978-981-97-8540-7
eBook Packages: Computer Science, Computer Science (R0)