Abstract
This paper addresses the email spam filtering problem by proposing an approach based on a two-level semantic analysis of email text. In the first level, a deep learning technique based on Word2Vec is used to categorize emails into specific domains (e.g., health, education, finance). This provides a separate conceptual view of spam in each domain. In the second level, we extract a set of latent topics from the email contents and represent them as rules, summarizing each email into compact topics that efficiently discriminate spam from legitimate emails. The experimental study shows promising results in terms of spam detection precision.
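To make the two-level idea concrete, here is a minimal, self-contained Python sketch under strong simplifying assumptions; it is not the authors' implementation. The three-dimensional word vectors, domain seed words, keyword-set "topics" and thresholds are all hypothetical: the vectors stand in for a trained Word2Vec model, and the hand-written keyword sets and rules stand in for the latent topics and rules the paper derives from data. Level 1 averages an email's word vectors and assigns it to the domain whose seed-word centroid has the highest cosine similarity. Level 2 applies that domain's rules, flagging the email as spam when the share of its tokens belonging to a spam-indicative topic reaches a threshold.

```python
import math
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy word vectors standing in for a trained Word2Vec model.
EMBEDDINGS: Dict[str, List[float]] = {
    "doctor": [0.9, 0.1, 0.0], "health": [0.9, 0.0, 0.1], "pills": [0.8, 0.0, 0.2],
    "course": [0.1, 0.9, 0.0], "exam": [0.0, 0.9, 0.1], "student": [0.0, 0.8, 0.1],
    "loan": [0.0, 0.1, 0.9], "bank": [0.1, 0.0, 0.9], "credit": [0.0, 0.2, 0.8],
    "free": [0.3, 0.3, 0.3], "cheap": [0.4, 0.1, 0.4],
}

# Hypothetical seed words; each domain is represented by their mean vector.
DOMAIN_SEEDS: Dict[str, List[str]] = {
    "health": ["doctor", "health", "pills"],
    "education": ["course", "exam", "student"],
    "finance": ["loan", "bank", "credit"],
}

# Hypothetical level-2 rules: (topic name, topic keywords, threshold).
# Rule semantics: IF weight(topic) >= threshold THEN spam.
DOMAIN_RULES: Dict[str, List[Tuple[str, Set[str], float]]] = {
    "health": [("drug_promo", {"cheap", "pills", "free", "delivery"}, 0.3)],
    "education": [("degree_mill", {"diploma", "degree", "buy", "accredited"}, 0.3)],
    "finance": [("easy_money", {"instant", "approval", "free", "money"}, 0.3)],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def mean_vector(words: List[str]) -> Optional[List[float]]:
    """Average the vectors of known words; None if no word has a vector."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


CENTROIDS = {d: mean_vector(seeds) for d, seeds in DOMAIN_SEEDS.items()}


def categorize(tokens: List[str]) -> str:
    """Level 1: pick the domain whose centroid is most similar to the email."""
    vec = mean_vector(tokens)
    if vec is None:
        return "unknown"
    return max(CENTROIDS, key=lambda d: cosine(vec, CENTROIDS[d]))


def topic_weight(tokens: List[str], keywords: Set[str]) -> float:
    """Fraction of the email's tokens that belong to the topic's keyword set."""
    return sum(t in keywords for t in tokens) / len(tokens) if tokens else 0.0


def classify(text: str) -> Tuple[str, str]:
    """Level 2: apply the rules of the predicted domain to the email."""
    tokens = tokenize(text)
    domain = categorize(tokens)
    for _topic, keywords, threshold in DOMAIN_RULES.get(domain, []):
        if topic_weight(tokens, keywords) >= threshold:
            return domain, "spam"
    return domain, "legitimate"


if __name__ == "__main__":
    print(classify("Cheap pills, free delivery, no doctor needed"))
```

Keeping the rules per domain mirrors the abstract's point that each domain gets its own conceptual view of spam: the word "free" only counts toward a spam topic through the rules of the domain the email was assigned to.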
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Saidani, N., Adi, K., Allili, M.S. (2020). Semantic Representation Based on Deep Learning for Spam Detection. In: Benzekri, A., Barbeau, M., Gong, G., Laborde, R., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2019. Lecture Notes in Computer Science, vol 12056. Springer, Cham. https://doi.org/10.1007/978-3-030-45371-8_5
DOI: https://doi.org/10.1007/978-3-030-45371-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45370-1
Online ISBN: 978-3-030-45371-8
eBook Packages: Computer Science, Computer Science (R0)