Abstract
In this article, we propose a generic method for building thematic datasets from social media. Many research works gather their data from social media, but the extraction processes they use are mostly ad hoc and do not follow a formal or standardized method. We aim to extend current practice by designing an iterative, generic and domain-independent approach to building thematic datasets from social media, with three adjustable dimensions at its core: spatial, temporal and thematic. We test our method on data extracted from Twitter to build a thematic dataset about tourism in a highly touristic region. The dataset is then evaluated with both quantitative and qualitative metrics to assess the value of the method. This use case shows that our domain-independent method is effective at generating thematic datasets from Twitter data.
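As a rough illustration of how the three dimensions could combine, the sketch below filters a handful of made-up posts by a bounding box (spatial), a date window (temporal) and a keyword set (thematic). It is not the authors' implementation: the `Post` and `Filters` types, the `matches` and `build_dataset` functions, the coordinates, the dates and the keywords are all hypothetical, and it performs a single pass, whereas the method described in the abstract is iterative.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """A minimal, hypothetical stand-in for a geotagged social media post."""
    text: str
    created_at: datetime
    lat: float
    lon: float


@dataclass(frozen=True)
class Filters:
    """One assumed setting for each of the three dimensions."""
    bbox: tuple[float, float, float, float]  # spatial: (min_lat, min_lon, max_lat, max_lon)
    start: datetime                          # temporal: inclusive lower bound
    end: datetime                            # temporal: exclusive upper bound
    keywords: frozenset[str]                 # thematic: lowercase vocabulary


def matches(post: Post, f: Filters) -> bool:
    """Return True only if the post satisfies all three dimensions."""
    min_lat, min_lon, max_lat, max_lon = f.bbox
    in_space = min_lat <= post.lat <= max_lat and min_lon <= post.lon <= max_lon
    in_time = f.start <= post.created_at < f.end
    # Naive tokenisation: split on whitespace, strip common punctuation.
    words = {w.strip(".,!?#").lower() for w in post.text.split()}
    on_theme = bool(words & f.keywords)
    return in_space and in_time and on_theme


def build_dataset(posts: list[Post], f: Filters) -> list[Post]:
    """Keep the posts that pass the spatial, temporal and thematic filters."""
    return [p for p in posts if matches(p, f)]


# Made-up sample data; none of these values come from the paper.
filters = Filters(
    bbox=(43.3, -1.8, 43.6, -1.3),
    start=datetime(2019, 6, 1),
    end=datetime(2019, 9, 1),
    keywords=frozenset({"beach", "museum", "hotel"}),
)
posts = [
    Post("Surfing at the beach today!", datetime(2019, 7, 14, 10, 0), 43.48, -1.56),
    Post("Traffic jam again", datetime(2019, 7, 15), 43.49, -1.50),    # off theme
    Post("Lovely museum visit", datetime(2019, 7, 16), 48.85, 2.35),   # outside the box
    Post("Best hotel ever", datetime(2020, 1, 1), 43.48, -1.56),       # outside the window
]
dataset = build_dataset(posts, filters)
```

Because the three settings are grouped in one `Filters` value, any of them can be changed independently between passes, for example by widening the keyword set after inspecting a first result.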
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Masson, M., Sallaberry, C., Agerri, R., Bessagnet, MN., Roose, P., Le Parc Lacayrelle, A. (2022). A Domain-Independent Method for Thematic Dataset Building from Social Media: The Case of Tourism on Twitter. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_2
DOI: https://doi.org/10.1007/978-3-031-20891-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20890-4
Online ISBN: 978-3-031-20891-1
eBook Packages: Computer Science, Computer Science (R0)