Abstract
The rise of social media over the last decade has made an immense amount of data available, which can be used for sentiment analysis and decision making. The data on blogs, news and review sites, social networks, etc. are so voluminous that manual labeling is not feasible, and an automatic approach is required for their analysis. The sentiment of the masses can be understood by analyzing this large-scale, opinion-rich data. The major issues in applying automated approaches are data unavailability, data sparsity, domain independence and inadequate performance. This research proposes a semi-supervised sentiment analysis approach that combines a lexicon-based methodology with machine learning in order to improve sentiment analysis performance. Mathematical models such as information gain and cosine similarity are employed to revise the sentiment scores defined in SentiWordNet. This research also emphasizes the importance of nouns and employs them as semantic features alongside other parts of speech. The evaluation of performance measures and the comparison with state-of-the-art techniques indicate that the proposed approach is superior.
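The two measures named in the abstract can be illustrated with a minimal Python sketch. It shows only the textbook definitions of information gain (the reduction in class-label entropy from knowing whether a term occurs) and cosine similarity. It does not show how the article combines them to revise SentiWordNet scores, which the abstract does not specify. The function names, the set-of-tokens document representation and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Drop in label entropy once we know whether `term` occurs in a document.

    `docs` is a list of token sets (a hypothetical representation chosen
    for this sketch), aligned with `labels`.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors, assumed for this sketch
    return dot / (norm_u * norm_v)


# Toy corpus (illustrative only, not data from the article).
docs = [
    {"great", "movie"},
    {"great", "plot"},
    {"boring", "movie"},
    {"boring", "plot"},
]
labels = ["pos", "pos", "neg", "neg"]

ig_great = information_gain(docs, labels, "great")  # separates the classes
ig_movie = information_gain(docs, labels, "movie")  # carries no class signal
```

In this toy corpus, "great" appears only in positive documents and so has the maximum information gain of one bit, whereas the noun "movie" is spread evenly across both classes and has none. Whether a given noun is informative therefore depends on the corpus at hand.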
Notes
http://www.noslang.com/dictionary (Last Accessed: April 6, 2016).
http://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html (Last Accessed: April 6, 2016).
http://www.interopia.com/education/all-question-words-in-english/ (Last Accessed: April 6, 2016).
http://nlp.stanford.edu/software/tagger.shtml (Last Accessed: April 6, 2016).
http://download.joachims.org/svm_light/current/svm_light_windows64.zip (Last Accessed: April 7, 2016).
http://sentiwordnet.isti.cnr.it/code/SentiWordNetDemoCode.java (Last Accessed: April 8, 2016).
Cite this article
Khan, F.H., Qamar, U. & Bashir, S. A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet. Knowl Inf Syst 51, 851–872 (2017). https://doi.org/10.1007/s10115-016-0993-1
DOI: https://doi.org/10.1007/s10115-016-0993-1