Abstract
The automatic identification of location expressions in social media text is an actively researched task. We present a novel approach to detecting mentions of locations in microblog and social media text. Instead of the traditional methods based on Named Entity Recognition (NER) or Conditional Random Fields (CRF), we propose an approach based on Noun Phrase extraction and n-gram based matching, arguing that it is better suited to noisy microblog text. Our system comprises several individual modules that detect addresses, Points of Interest (e.g. hospitals or universities), distance and direction markers, and location names (e.g. suburbs or countries). Our system won the ALTA 2014 Twitter Location Detection shared task with an F-score of 0.792 for detecting location expressions in a test set of 1,000 tweets, demonstrating its efficacy for this task. A number of directions for future work are discussed.
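To make the idea of n-gram based matching concrete, the following is a minimal sketch, not the system's actual implementation. It assumes a small hypothetical gazetteer (`GAZETTEER`), a hypothetical maximum n-gram length (`MAX_N`), and a simple regular-expression tokenizer; the noun phrase extraction step and the separate address, Point of Interest, and distance/direction modules are omitted. The function name `find_location_mentions` is likewise an illustrative choice.

```python
import re

# Hypothetical toy gazetteer of lower-cased location names (an assumption
# for illustration; the resources used by the real system are not shown here).
GAZETTEER = {"new south wales", "sydney", "blue mountains"}

# Hypothetical upper bound on n-gram length, in tokens.
MAX_N = 3


def tokenize(text):
    """Split text into word-like tokens, dropping punctuation such as '#'."""
    return re.findall(r"[A-Za-z0-9']+", text)


def find_location_mentions(text, gazetteer=GAZETTEER, max_n=MAX_N):
    """Return gazetteer matches in text order, preferring the longest n-gram."""
    tokens = tokenize(text)
    mentions = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in gazetteer:
                mentions.append(candidate)
                i += n  # skip past the matched span to avoid overlaps
                matched = True
                break
        if not matched:
            i += 1
    return mentions


if __name__ == "__main__":
    tweet = "Bushfire near Blue Mountains west of sydney #nswfires"
    print(find_location_mentions(tweet))
```

Matching is case-insensitive here because capitalization in tweets is unreliable, and longer n-grams are tried first so that a multi-word name is not split into shorter matches.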
Notes
- 1. Requests for the data should also be directed to the authors of [5].
- 3. Available for download free of charge under a Creative Commons Attribution license.
- 6. A noun phrase that contains other NPs, for example, within prepositions.
- 7. The web service offers a number of advanced features that can help increase search specificity.
- 8. See [4] for more details about these metrics.
References
Ao, J., Zhang, P., Cao, Y.: Estimating the locations of emergency events from twitter streams. Procedia Comput. Sci. 31, 731–739 (2014)
Berardi, G., Esuli, A., Marcheggiani, D., Sebastiani, F.: ISTI@ TREC microblog track 2011: exploring the use of hashtag segmentation and text quality ranking. In: TREC (2011)
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250. ACM (2008)
Grossman, D.A.: Information Retrieval: Algorithms and Heuristics, vol. 15. Springer, Dordrecht (2004)
Lingad, J., Karimi, S., Yin, J.: Location extraction from disaster-related microblogs. In: Proceedings of the 22nd International Conference on World Wide Web Companion, pp. 1017–1020. International World Wide Web Conferences Steering Committee (2013)
Liu, X., Zhang, S., Wei, F., Zhou, M.: Recognizing named entities in tweets. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 359–367. Association for Computational Linguistics (2011)
Mahmud, J., Nichols, J., Drews, C.: Where is this tweet from? Inferring home locations of twitter users. In: ICWSM (2012)
Malmasi, S., Cahill, A.: Measuring feature diversity in native language identification. In: Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 49–55. Association for Computational Linguistics, Denver, June 2015. http://aclweb.org/anthology/W15-0606
Malmasi, S., Dras, M.: Chinese native language identification. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pp. 95–99. Association for Computational Linguistics, Gothenburg, April 2014. http://aclweb.org/anthology/E14-4019
Malmasi, S., Dras, M.: Large-scale native language identification with cross-corpus evaluation. In: Proceedings of NAACL-HLT 2015, pp. 1403–1409. Association for Computational Linguistics, Denver, June 2015. http://aclweb.org/anthology/N15-1160
Malmasi, S., Wong, S.M.J., Dras, M.: NLI shared task 2013: MQ submission. In: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 124–133. Association for Computational Linguistics, Atlanta, June 2013. http://www.aclweb.org/anthology/W13-1716
Middleton, S., Middleton, L., Modafferi, S.: Real-time crisis mapping of natural disasters using social media (2014)
Molla, D., Karimi, S.: Overview of the 2014 ALTA shared task: identifying expressions of locations in tweets. In: Proceedings of the Australasian Language Technology Workshop (ALTA), pp. 151, Melbourne, Australia (2014)
Norvig, P.: Natural language corpus data. In: Beautiful Data, pp. 219–242 (2009)
Núñez-Redó, M., Díaz, L., Gil, J., González, D., Huerta, J.: Discovery and integration of web 2.0 content into geospatial information infrastructures: a use case in wild fire monitoring. In: Tjoa, A.M., Quirchmayr, G., You, I., Xu, L. (eds.) ARES 2011. LNCS, vol. 6908, pp. 50–68. Springer, Heidelberg (2011)
Ritter, A., Clark, S., Etzioni, O., et al.: Named entity recognition in tweets: an experimental study. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1524–1534. Association for Computational Linguistics (2011)
Tuten, T.L.: Advertising 2.0: social media marketing in a web 2.0 world. Greenwood Publishing Group, New York (2008)
Vieweg, S., Hughes, A.L., Starbird, K., Palen, L.: Microblogging during two natural hazards events: what twitter may contribute to situational awareness. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1079–1088. ACM (2010)
Yin, J., Lampert, A., Cameron, M., Robinson, B., Power, R.: Using social media to enhance emergency situation awareness. IEEE Intell. Syst. 27(6), 52–59 (2012)
Acknowledgments
We would like to thank our three anonymous reviewers for their valuable comments. The data and the task’s original idea are from John Lingad’s Honours project (The University of Sydney) co-supervised with Jie Yin (CSIRO). The shared task prize was sponsored by IBM Research.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Malmasi, S., Dras, M. (2016). Location Mention Detection in Tweets and Microblogs. In: Hasida, K., Purwarianti, A. (eds) Computational Linguistics. PACLING 2015. Communications in Computer and Information Science, vol 593. Springer, Singapore. https://doi.org/10.1007/978-981-10-0515-2_9
DOI: https://doi.org/10.1007/978-981-10-0515-2_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0514-5
Online ISBN: 978-981-10-0515-2
eBook Packages: Computer Science, Computer Science (R0)