Abstract
With the current upsurge in the usage of social media platforms, the trend of using short text (microtext) in place of standard words has risen significantly. The use of microtext poses a considerable performance problem for sentiment analysis, since models are trained on standard words. This paper discusses the impact of coupling sub-symbolic (phonetics) with symbolic (machine learning) Artificial Intelligence to transform out-of-vocabulary (OOV) concepts into their standard in-vocabulary (IV) form. We develop a binary classifier to detect OOV sentences, which are then transformed into the phoneme subspace using a grapheme-to-phoneme converter. We compare phonetic and string distances using the Sørensen similarity algorithm. The phonetically similar IV concepts thus obtained are then used to compute the correct polarity value, which was previously miscalculated because of the presence of microtext. Our proposed framework improves the accuracy of polarity detection by 6% compared with the earlier model. In conclusion, we apply a grapheme-to-phoneme converter to microtext normalization and show its application to sentiment analysis.
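The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes that the Sørensen similarity is the Sørensen–Dice coefficient computed over sets of character bigrams, and that phoneme strings come from some grapheme-to-phoneme converter. The IPA transcriptions below are hand-written, hypothetical inputs, and the helper names (`bigrams`, `sorensen_similarity`, `best_iv_candidate`) and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
def bigrams(s: str) -> set[str]:
    """Return the set of adjacent character pairs in s (a single char yields itself)."""
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def sorensen_similarity(a: str, b: str) -> float:
    """Sørensen–Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over character-bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(x & y) / (len(x) + len(y))


def best_iv_candidate(oov_word: str, oov_phonemes: str, lexicon: dict[str, str]) -> str:
    """Pick the IV word whose phoneme string is most similar to the OOV one.

    Assumption: lexicon maps each IV word to a phoneme string (e.g. from a G2P tool).
    Ties on phonetic similarity are broken by spelling (string) similarity.
    """
    return max(
        lexicon,
        key=lambda w: (sorensen_similarity(oov_phonemes, lexicon[w]),
                       sorensen_similarity(oov_word, w)),
    )


# Hypothetical, hand-written IPA transcriptions standing in for G2P output.
lexicon = {"love": "lʌv", "live": "lɪv", "lose": "luːz"}
print(best_iv_candidate("luv", "lʌv", lexicon))  # love
```

Here the spellings `luv` and `love` share no bigrams (string similarity 0.0), while their hypothetical transcriptions are identical, which is the kind of case where comparing in phoneme space rather than spelling space helps.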
Notes
- 4.
Repetition of a Soundex encoding more than once.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Satapathy, R., Singh, A., Cambria, E. (2019). PhonSenticNet: A Cognitive Approach to Microtext Normalization for Concept-Level Sentiment Analysis. In: Tagarelli, A., Tong, H. (eds) Computational Data and Social Networks. CSoNet 2019. Lecture Notes in Computer Science, vol 11917. Springer, Cham. https://doi.org/10.1007/978-3-030-34980-6_20
DOI: https://doi.org/10.1007/978-3-030-34980-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34979-0
Online ISBN: 978-3-030-34980-6
eBook Packages: Computer Science, Computer Science (R0)