Weakly Supervised and Online Learning of Word Models for Classification to Detect Disaster Reporting Tweets

Palshikar, Girish Keshav; Apte, Manoj; Pandita, Deepak

doi:10.1007/s10796-018-9830-2

Published: 22 February 2018

Volume 20, pages 949–959, (2018)

Information Systems Frontiers

Girish Keshav Palshikar¹,
Manoj Apte¹ &
Deepak Pandita²

Abstract

Social media has quickly established itself as an important means that people, NGOs and governments use to spread information during natural or man-made disasters, mass emergencies and crisis situations. Given this important role, real-time analysis of social media content to locate, organize and use valuable information for disaster management is crucial. In this paper, we propose self-learning algorithms that, with minimal supervision, construct a simple bag-of-words model of information expressed in the news about various natural disasters. Such a model is human-understandable, human-modifiable and usable in a real-time scenario. Since tweets are a different category of documents than news, we next propose a model transfer algorithm, which essentially refines the model learned from news by analyzing a large unlabeled corpus of tweets. We show empirically that model transfer improves the predictive accuracy of the model. We also demonstrate empirically that our model learning algorithm is better than several state-of-the-art semi-supervised learning algorithms. Finally, we present an online algorithm that learns the weights for words in the model and demonstrate the efficacy of the model with word weights.
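The abstract does not give the algorithms themselves, so the following Python sketch only illustrates the general idea of a weighted bag-of-words classifier that is updated online. It is not the authors' method: the class name WeightedWordModel, the seed words, the threshold, the learning rate and the perceptron-style update are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z]+")


def tokens(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(TOKEN.findall(text.lower()))


class WeightedWordModel:
    """Hypothetical weighted bag-of-words classifier (illustrative only)."""

    def __init__(self, seed_words, threshold=1.0, lr=0.5):
        # Each seed word starts with weight 1.0; unseen words count as 0.0.
        self.weights = defaultdict(float)
        for word in seed_words:
            self.weights[word] = 1.0
        self.threshold = threshold  # assumed decision threshold
        self.lr = lr                # assumed learning rate

    def score(self, text):
        """Sum the weights of the distinct words that occur in the text."""
        return sum(self.weights.get(t, 0.0) for t in tokens(text))

    def predict(self, text):
        """Return True if the text is classified as disaster-reporting."""
        return self.score(text) >= self.threshold

    def update(self, text, label):
        """Perceptron-style online step: adjust weights only on a mistake."""
        if self.predict(text) == label:
            return
        step = self.lr if label else -self.lr
        for t in tokens(text):
            self.weights[t] += step
```

Because the model is just a dictionary from words to weights, it stays readable and hand-editable, which matches the human-understandable and human-modifiable property the abstract emphasizes. A model built with WeightedWordModel({"flood"}) would flag "Flood warning issued" immediately, and a call to update with a labeled tweet would shift the weights of that tweet's words only when the current prediction is wrong.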

Notes

http://fire.irsi.res.in/fire/data

Author information

Authors and Affiliations

TCS Research, Tata Consultancy Services Limited, 54B Hadapsar Industrial Estate, Pune, 411013, India
Girish Keshav Palshikar & Manoj Apte
Department of Computer Science, University of Rochester, Rochester, NY, 14623, USA
Deepak Pandita

Corresponding author

Correspondence to Girish Keshav Palshikar.

About this article

Cite this article

Palshikar, G.K., Apte, M. & Pandita, D. Weakly Supervised and Online Learning of Word Models for Classification to Detect Disaster Reporting Tweets. Inf Syst Front 20, 949–959 (2018). https://doi.org/10.1007/s10796-018-9830-2

Issue Date: October 2018
Part of a collection:

Special Issue: Exploitation of Social Media for Emergency Relief and Preparedness: Recent Research and Trends
