Abstract
In this paper we apply multiobjective genetic programming to the cost-sensitive classification task of labelling spam e-mails. We consider three publicly available spam corpora and compare our approach with support vector machines and naïve Bayes classifiers, both of which are held to perform well on the spam filtering problem. We find that, for the high cost ratios of practical interest, our cost-sensitive multiobjective genetic programming gives the best results across a range of performance measures.
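To illustrate why the cost ratio matters, the following minimal sketch computes a cost-weighted error rate in which blocking a legitimate e-mail is counted as `cost_ratio` times worse than letting a spam e-mail through. This is one common formulation in the spam-filtering literature and is an assumption here; the abstract does not state which performance measures the paper uses. The function name `weighted_error`, the label encoding (1 = spam, 0 = legitimate) and the toy data are all hypothetical.

```python
from typing import Sequence


def weighted_error(y_true: Sequence[int], y_pred: Sequence[int],
                   cost_ratio: float) -> float:
    """Cost-weighted error rate (illustrative, not the paper's stated measure).

    Labels: 1 = spam, 0 = legitimate. A legitimate e-mail labelled as spam
    (false positive) is weighted `cost_ratio` times a missed spam e-mail.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    n_legit = sum(1 for t in y_true if t == 0)
    n_spam = len(y_true) - n_legit

    # Each legitimate e-mail is treated as `cost_ratio` e-mails.
    denominator = cost_ratio * n_legit + n_spam
    if denominator == 0:
        raise ValueError("no examples to score")
    return (cost_ratio * false_pos + false_neg) / denominator


# Toy data: two legitimate e-mails followed by two spam e-mails.
y_true = [0, 0, 1, 1]
filter_a = [1, 0, 1, 1]  # blocks one legitimate e-mail
filter_b = [0, 0, 0, 1]  # lets one spam e-mail through

for ratio in (1, 9):
    print(ratio,
          weighted_error(y_true, filter_a, ratio),
          weighted_error(y_true, filter_b, ratio))
```

With equal costs the two hypothetical filters score the same, but at a cost ratio of 9 the filter that blocks a legitimate e-mail is penalised far more heavily, which is why a classifier tuned for plain accuracy need not be the best choice at high cost ratios.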
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Li, H., Niranjan, M., Rockett, P. (2008). Applying Cost-Sensitive Multiobjective Genetic Programming to Feature Extraction for Spam E-mail Filtering. In: O’Neill, M., et al. Genetic Programming. EuroGP 2008. Lecture Notes in Computer Science, vol 4971. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78671-9_28
DOI: https://doi.org/10.1007/978-3-540-78671-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78670-2
Online ISBN: 978-3-540-78671-9
eBook Packages: Computer Science (R0)