Abstract
Spam is a widespread concern, yet current spam filtering systems suffer from two problems: they lack semantic analysis, and they are ineffective against bulk-sent (multi-send) spam. This paper proposes a spam filtering model based on latent semantic analysis (LSA) and the secure hash algorithm (SHA). LSA is used to identify the latent feature phrases in spam, which brings semantic analysis into the filtering technique. SHA is then applied to the output of the LSA analysis to generate an "e-mail fingerprint" for multi-send spam, which addresses the low effectiveness of filtering for such mail. We designed a spam filtering system based on this model and evaluated it on an optional dataset. Compared with a filter based on the KNN algorithm, the experimental results show that the system based on LSA and SHA outperforms KNN. The experiments produced the expected results and validate the feasibility and advantages of the new spam filtering method.
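The fingerprinting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the LSA step is assumed to have already produced a list of feature phrases for each message, SHA-1 is chosen only because the abstract does not name a SHA variant, and the names `email_fingerprint` and `FingerprintFilter` are hypothetical.

```python
import hashlib


def email_fingerprint(feature_terms):
    """Hash a message's feature phrases into a hex digest.

    Assumption: feature_terms is the output of a separate LSA step.
    """
    # Normalize case and whitespace, drop empties and duplicates, and sort,
    # so the digest does not depend on phrase order or formatting.
    normalized = sorted({t.strip().lower() for t in feature_terms if t.strip()})
    canonical = "\n".join(normalized).encode("utf-8")
    # SHA-1 is an assumption; the abstract only says "SHA".
    return hashlib.sha1(canonical).hexdigest()


class FingerprintFilter:
    """Remembers fingerprints of known spam and checks new mail against them."""

    def __init__(self):
        self._known = set()

    def mark_spam(self, feature_terms):
        self._known.add(email_fingerprint(feature_terms))

    def is_known_spam(self, feature_terms):
        return email_fingerprint(feature_terms) in self._known


spam_filter = FingerprintFilter()
spam_filter.mark_spam(["cheap loans", "Act Now", "limited offer"])

# A re-sent copy with the same feature phrases in a different order and casing.
resent = spam_filter.is_known_spam(["act now", "limited offer ", "cheap loans"])
# A message with unrelated feature phrases.
unrelated = spam_filter.is_known_spam(["meeting agenda", "quarterly report"])
```

Because the lookup is an exact hash match, a copy of a multi-send message is recognized with a single set lookup, but only if it yields the same feature phrases; how robust the phrases are to small changes in the message depends on the LSA step, which this sketch does not model.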
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, J., Zhang, Q., Yuan, Z., Huang, W., Yan, X., Dong, J. (2008). Research of Spam Filtering System Based on LSA and SHA. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87734-9_38
DOI: https://doi.org/10.1007/978-3-540-87734-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87733-2
Online ISBN: 978-3-540-87734-9
eBook Packages: Computer Science, Computer Science (R0)