A Domain-Independent Method for Thematic Dataset Building from Social Media: The Case of Tourism on Twitter

  • Conference paper
Web Information Systems Engineering – WISE 2022 (WISE 2022)

Abstract

In this article, we propose a generic method for building thematic datasets from social media. Many research works gather their data from social media, but the extraction processes they use are mostly ad hoc and do not follow a formal or standardized method. We aim to extend these processes by designing an iterative, generic, and domain-independent approach built around three adjustable dimensions: spatial, temporal, and thematic. We test our method on data extracted from Twitter to build a thematic dataset about tourism in a highly touristic region. This dataset is then evaluated using both quantitative and qualitative metrics to highlight the value of the method. The application to this use case shows the effectiveness of our domain-independent method for generating thematic datasets from Twitter data.
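To make the three dimensions concrete, the following is a minimal sketch of how a post could be tested against a spatial, a temporal, and a thematic filter. It is not the implementation described in the paper: the `Post` and `Filters` types, the bounding-box test, the keyword-overlap test, and all sample values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical minimal record for a geotagged social media post."""
    text: str
    created_at: datetime
    lat: float
    lon: float


@dataclass
class Filters:
    """Hypothetical container for the three dimensions."""
    bbox: tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)
    start: datetime
    end: datetime
    keywords: set[str]


def matches(post: Post, f: Filters) -> bool:
    """Return True when the post satisfies all three dimensions."""
    min_lat, min_lon, max_lat, max_lon = f.bbox
    in_space = min_lat <= post.lat <= max_lat and min_lon <= post.lon <= max_lon
    in_time = f.start <= post.created_at <= f.end
    # Naive thematic test (an assumption): any keyword appears as a word.
    on_theme = bool(set(post.text.lower().split()) & f.keywords)
    return in_space and in_time and on_theme


def build_dataset(posts: list[Post], f: Filters) -> list[Post]:
    """Keep only the posts that pass the spatial, temporal and thematic filters."""
    return [p for p in posts if matches(p, f)]


# Illustrative values only; none of them come from the paper.
filters = Filters(
    bbox=(43.3, -1.8, 43.6, -1.3),
    start=datetime(2022, 6, 1),
    end=datetime(2022, 8, 31),
    keywords={"beach", "museum", "hotel"},
)
posts = [
    Post("Lovely beach walk this morning", datetime(2022, 7, 14), 43.48, -1.56),
    Post("Traffic jam again", datetime(2022, 7, 15), 43.48, -1.56),
    Post("Best beach ever", datetime(2022, 7, 16), 48.85, 2.35),
    Post("Visited the museum", datetime(2021, 12, 1), 43.48, -1.56),
]
dataset = build_dataset(posts, filters)
```

Because each dimension is an independent predicate, any one of them can be loosened or tightened between passes (for example, by enlarging the keyword set), which is one way to read the iterative aspect mentioned in the abstract.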

Notes

  1. https://developer.twitter.com/en/products/twitter-api/academic-research
  2. https://www.e-unwto.org/doi/book/10.18111/9789284404551
  3. https://www.tweepy.org
  4. https://www.openstreetmap.org/

Author information

Corresponding author

Correspondence to Maxime Masson.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Masson, M., Sallaberry, C., Agerri, R., Bessagnet, MN., Roose, P., Le Parc Lacayrelle, A. (2022). A Domain-Independent Method for Thematic Dataset Building from Social Media: The Case of Tourism on Twitter. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_2

  • DOI: https://doi.org/10.1007/978-3-031-20891-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20890-4

  • Online ISBN: 978-3-031-20891-1

  • eBook Packages: Computer Science, Computer Science (R0)
