Abstract
Customer reviews are an essential source of information for consumers. Meanwhile, opinion spam has spread widely, and detecting spam reviews has become critically important for ensuring the integrity of the online review ecosystem. Singleton spam reviews (one-time reviews) have proliferated of late, as spammers can create multiple accounts to purposefully cheat the system. Most available techniques fail to detect this cunning form of malicious review, mainly because singleton spammers leave behind few behavioural trails. Available approaches also require extensive feature engineering and expensive manual annotation, and they generalize poorly. Our study of spam reviews found that genuine opinions are usually directed uniformly towards the important aspects of entities. In contrast, spammers attempt to counter the consensus on these aspects while masking their malicious intent by adding more text about less important aspects. Additionally, spammers usually target specific time periods in a product's lifespan to bias public opinion as much as possible. Based on these observations, we present an unsupervised singleton spam review detection model that runs in two steps. First, an unsupervised deep aspect-level sentiment model employing deep Boltzmann machines learns fine-grained opinion representations from review texts. Then, an LSTM network is trained on the learned opinion representations to track the evolution of opinions through the fluctuation of sentiments in a temporal context, followed by the application of a Robust Variational Autoencoder to identify spam instances. Experiments on three benchmark datasets widely used in the literature showed that our approach outperforms strong state-of-the-art baselines.
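
The two observations above (genuine opinions agree on the important aspects, while spam counters that consensus within particular time periods) can be illustrated with a deliberately simplified sketch. It is not the model described in this abstract: the deep Boltzmann machine, the LSTM and the Robust Variational Autoencoder are replaced here by a plain median/MAD deviation score. The function name `deviation_scores`, the `mad_floor` parameter, the 3.5 cut-off and the toy sentiment values are all assumptions made for illustration only.

```python
import numpy as np


def deviation_scores(sentiments, window_ids, mad_floor=0.05):
    """Score each review by its distance from the consensus of its time window.

    sentiments: (n_reviews, n_aspects) aspect-level sentiment values (hypothetical).
    window_ids: (n_reviews,) identifier of the time window each review falls in.
    Returns the largest robust z-score over aspects for every review.
    """
    sentiments = np.asarray(sentiments, dtype=float)
    window_ids = np.asarray(window_ids)
    scores = np.zeros(len(sentiments))
    for w in np.unique(window_ids):
        idx = np.where(window_ids == w)[0]
        block = sentiments[idx]
        # Per-aspect consensus inside this window.
        median = np.median(block, axis=0)
        # Median absolute deviation, floored so near-unanimous aspects
        # do not produce a division by (almost) zero.
        mad = np.maximum(np.median(np.abs(block - median), axis=0), mad_floor)
        z = np.abs(block - median) / (1.4826 * mad)
        scores[idx] = z.max(axis=1)
    return scores


# Toy data: two aspects, two time windows (all values are made up).
toy_sentiments = [
    [0.8, 0.6], [0.7, 0.5], [0.9, 0.6], [0.8, 0.7],
    [-0.9, 0.6],                      # opposes the window-0 consensus on aspect 0
    [-0.5, 0.2], [-0.6, 0.1], [-0.4, 0.2],
]
toy_windows = [0, 0, 0, 0, 0, 1, 1, 1]

scores = deviation_scores(toy_sentiments, toy_windows)
flagged = np.where(scores > 3.5)[0].tolist()  # 3.5 is an assumed cut-off
print(flagged)
```

Scoring within each window rather than over the whole product history keeps a review that merely follows a genuine shift in opinion from being flagged. A learned temporal model, as in the approach summarized above, would replace this fixed windowing.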
Responsible editor: Johannes Fürnkranz.
Cite this article
Shaalan, Y., Zhang, X., Chan, J. et al. Detecting singleton spams in reviews via learning deep anomalous temporal aspect-sentiment patterns. Data Min Knowl Disc 35, 450–504 (2021). https://doi.org/10.1007/s10618-020-00725-5
DOI: https://doi.org/10.1007/s10618-020-00725-5