A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet

Khan, Farhan Hassan; Qamar, Usman; Bashir, Saba

doi:10.1007/s10115-016-0993-1

Regular Paper
Published: 20 September 2016

Volume 51, pages 851–872, (2017)

Knowledge and Information Systems

Farhan Hassan Khan¹,
Usman Qamar¹ &
Saba Bashir¹

Abstract

The advent of social media over the last decade has made an immense amount of data available, and this data can be used for sentiment analysis and decision making. The data on blogs, news and review sites, social networks, etc., are so voluminous that manual labeling is not feasible, and an automatic approach is required for their analysis. Analyzing this large-scale, opinion-rich data makes it possible to understand the sentiment of the masses. The major issues in applying automated approaches are data unavailability, data sparsity, domain independence and inadequate performance. This research proposes a semi-supervised sentiment analysis approach that combines a lexicon-based methodology with machine learning in order to improve sentiment analysis performance. Mathematical models such as information gain and cosine similarity are employed to revise the sentiment scores defined in SentiWordNet. This research also emphasizes the importance of nouns and employs them as semantic features alongside other parts of speech. The evaluation of performance measures and a comparison with state-of-the-art techniques indicate that the proposed approach is superior.
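
The abstract names information gain and cosine similarity but does not give the formulas the authors use to revise SentiWordNet scores. As a minimal sketch, the Python below implements only the textbook definitions of the two measures on a toy corpus. The function names, the set-of-words document representation and the toy data are all assumptions made for illustration; this is not the authors' revision procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`.

    `docs` is a list of sets of words (an assumed representation).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): "good"/"bad" separate the classes, "movie" does not.
docs = [{"good", "movie"}, {"good", "plot"}, {"bad", "movie"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]

ig_good = information_gain(docs, labels, "good")
ig_movie = information_gain(docs, labels, "movie")
sim = cosine_similarity([1, 1], [1, 0])
```

In this toy setting a term that perfectly separates the two classes receives the maximum gain of one bit, while a term spread evenly across both classes receives zero, which is the kind of signal a score-revision step could draw on.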

Notes

http://www.noslang.com/dictionary (Last Accessed: April 6, 2016).
http://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html (Last Accessed: April 6, 2016).
http://www.interopia.com/education/all-question-words-in-english/ (Last Accessed: April 6, 2016).
http://nlp.stanford.edu/software/tagger.shtml (Last Accessed: April 6, 2016).
http://download.joachims.org/svm_light/current/svm_light_windows64.zip (Last Accessed: April 7, 2016).
http://sentiwordnet.isti.cnr.it/code/SentiWordNetDemoCode.java (Last Accessed: April 8, 2016).

Author information

Authors and Affiliations

Department of Computer Engineering, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad, Pakistan
Farhan Hassan Khan, Usman Qamar & Saba Bashir

Corresponding author

Correspondence to Farhan Hassan Khan.

About this article

Cite this article

Khan, F.H., Qamar, U. & Bashir, S. A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet. Knowl Inf Syst 51, 851–872 (2017). https://doi.org/10.1007/s10115-016-0993-1

Received: 23 August 2015
Accepted: 08 September 2016
Published: 20 September 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10115-016-0993-1
