Syndromic Classification of Twitter Messages

  • Conference paper
Electronic Healthcare (eHealth 2011)

Abstract

Recent studies have shown a strong correlation between social networking data and national influenza rates. We expanded upon this success to develop an automated text mining system that classifies Twitter messages in real time into six syndromic categories based on key terms from a public health ontology. Ten-fold cross-validation was used to compare Naive Bayes (NB) and Support Vector Machine (SVM) models on a corpus of 7,431 Twitter messages. SVM performed better than NB on four of the six syndromes. The best-performing classifiers achieved moderately strong F1 scores: respiratory, 86.2 (NB); gastrointestinal, 85.4 (SVM, polynomial kernel of degree 2); neurological, 88.6 (SVM, polynomial kernel of degree 1); rash, 86.0 (SVM, polynomial kernel of degree 1); constitutional, 89.3 (SVM, polynomial kernel of degree 1); hemorrhagic, 89.9 (NB). The resulting classifiers were deployed together with the EARS C2 aberration detection algorithm in an experimental online system.
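
The abstract combines three computational pieces: per-syndrome text classifiers, F1-based evaluation, and EARS C2 aberration detection. The Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation. It assumes one binary classifier per syndrome (for example, respiratory vs. not respiratory), lowercase whitespace tokenization, and add-one smoothing for the Naive Bayes model. It omits the SVM models, the ontology-based features and the ten-fold cross-validation. The C2 rule follows its commonly described form: today's count is compared with the mean and standard deviation of a 7-day baseline that ends 2 days earlier, and an alert is raised above 3 standard deviations. The `min_sd` floor is a hypothetical guard against flat baselines, and all function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization
    return text.lower().split()


def train_nb(examples):
    """Train a binary multinomial Naive Bayes model for one syndrome.

    examples: list of (text, is_positive) pairs; both classes must occur.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return True if the positive class has the higher log posterior."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        n_tokens = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)  # log prior
        for tok in tokenize(text):
            # Add-one (Laplace) smoothed log likelihood of each token
            score += math.log((word_counts[label][tok] + 1) / (n_tokens + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


def f1_score(gold, pred):
    """F1 of the positive class on a 0-100 scale, as reported in the abstract."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if p and not g)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


def ears_c2(daily_counts, threshold=3.0, baseline=7, lag=2, min_sd=1.0):
    """Return indices of days flagged by a C2-style rule.

    With the defaults, day t is compared with days t-9..t-3: a 7-day
    baseline separated from t by a 2-day guard band. min_sd is a
    hypothetical floor on the standard deviation, not from the paper.
    """
    alerts = []
    for t in range(baseline + lag, len(daily_counts)):
        window = daily_counts[t - lag - baseline : t - lag]
        mu = statistics.mean(window)
        sd = max(statistics.stdev(window), min_sd)
        if (daily_counts[t] - mu) / sd > threshold:
            alerts.append(t)
    return alerts


if __name__ == "__main__":
    model = train_nb([
        ("fever cough sore throat", True),
        ("bad cough and fever", True),
        ("going to the beach", False),
        ("lunch at the cafe", False),
    ])
    print(predict_nb(model, "cough fever today"))  # True
    print(f1_score([True, True, False, False], [True, False, False, True]))  # 50.0
    print(ears_c2([10, 12, 11, 10, 12, 11, 10, 12, 11, 10, 11, 40]))  # [11]
```

In a deployed setting along the lines the abstract describes, the daily count of messages assigned to one syndrome would be the series passed to `ears_c2`, so that a sudden rise in, say, respiratory tweets is flagged on the day it occurs.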

Copyright information

© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Collier, N., Doan, S. (2012). Syndromic Classification of Twitter Messages. In: Kostkova, P., Szomszor, M., Fowler, D. (eds) Electronic Healthcare. eHealth 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 91. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29262-0_27

  • DOI: https://doi.org/10.1007/978-3-642-29262-0_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29261-3

  • Online ISBN: 978-3-642-29262-0

  • eBook Packages: Computer Science, Computer Science (R0)
