Abstract
In this paper we compare four machine learning techniques for spam filtering in blog comments: Naïve Bayes, k-nearest neighbors, neural networks, and support vector machines. We used a corpus of 1021 blog comments, 67% of which are spam, and present the filtering results obtained with 10-fold cross-validation.
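The abstract names Naïve Bayes and 10-fold cross-validation but gives no implementation details. The following is a minimal, stdlib-only sketch, assuming a multinomial Naïve Bayes with add-one (Laplace) smoothing, lower-cased whitespace tokenization, and plain shuffled (unstratified) folds. The helper names (`tokenize`, `fold_sizes`, `k_fold_accuracy`) and the twenty toy comments are hypothetical illustrations, not the authors' code or their 1021-comment corpus. It also shows how 1021 comments divide into ten folds.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased whitespace tokens; the chapter's preprocessing is not described here.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        # Log prior log P(c), estimated from class frequencies in the training fold.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def fold_sizes(n, k):
    # As-equal-as-possible folds: the first n % k folds get one extra item.
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]


def k_fold_accuracy(docs, labels, make_model, k=10, seed=0):
    # Each item is tested exactly once; assumes len(docs) >= k so no fold is empty.
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    accuracies, start = [], 0
    for size in fold_sizes(len(idx), k):
        test = idx[start:start + size]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        start += size
        model = make_model().fit([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(model.predict(docs[i]) == labels[i] for i in test)
        accuracies.append(correct / size)
    return sum(accuracies) / k


# Hypothetical toy comments (NOT the chapter's corpus).
SPAM = [
    "buy cheap pills now", "cheap casino bonus click here",
    "win money fast cheap loans", "cheap viagra free shipping",
    "click here cheap watches", "free money cheap casino",
    "cheap pills buy now click", "earn money cheap work from home",
    "cheap replica bags free", "visit site cheap deals",
]
HAM = [
    "great post thanks for sharing", "interesting post i agree",
    "nice post very useful", "thanks for the post and the links",
    "i disagree with this post", "useful post bookmarked",
    "well written post", "this post answered my question",
    "good post looking forward to more", "helpful post thank you",
]
docs = SPAM + HAM
labels = ["spam"] * len(SPAM) + ["ham"] * len(HAM)

print(fold_sizes(1021, 10))  # [103, 102, 102, 102, 102, 102, 102, 102, 102, 102]
print(k_fold_accuracy(docs, labels, MultinomialNaiveBayes, k=10))
```

With a 67% spam proportion, a trivial classifier that labels every comment as spam would already be right on about 684 of the 1021 comments, so cross-validated accuracy is best read against that baseline. Stratified folds, which keep the spam proportion roughly constant in every fold, are a common alternative to the plain folds used in this sketch.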
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Romero, C., Garcia-Valdez, M., Alanis, A. (2010). A Comparative Study of Blog Comments Spam Filtering with Machine Learning Techniques. In: Melin, P., Kacprzyk, J., Pedrycz, W. (eds) Soft Computing for Recognition Based on Biometrics. Studies in Computational Intelligence, vol 312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15111-8_4
DOI: https://doi.org/10.1007/978-3-642-15111-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15110-1
Online ISBN: 978-3-642-15111-8
eBook Packages: Engineering, Engineering (R0)