Searching for Interacting Features for Spam Filtering

  • Conference paper
Advances in Neural Networks - ISNN 2008 (ISNN 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5263)

Abstract

In this paper, we introduce INTERACT, a novel feature selection method, to select the relevant words of emails for spam filtering, i.e., classifying an email as spam or legitimate. Four traditional feature selection methods from the text categorization domain, Information Gain, Gain Ratio, Chi Squared, and ReliefF, are used for performance comparison. Three classifiers are discussed: Support Vector Machine (SVM), Naïve Bayes, and a novel classifier, Locally Weighted learning with Naïve Bayes (LWNB). Four popular datasets serve as benchmark corpora in our experiments to examine the capabilities of the five feature selection methods and the three classifiers. In our simulations, we find that LWNB improves on Naïve Bayes and achieves better prediction results by learning local models, and its performance is sometimes better than that of the SVM. Our study also shows that INTERACT leads to better classifier performance than the four traditional methods for spam email filtering.
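
As a rough illustration of what word-level feature selection means here, the sketch below ranks words by Information Gain, one of the four traditional baselines named in the abstract. It is not INTERACT and not the authors' implementation. The toy corpus, the set-of-words email representation, and the function names (`entropy`, `information_gain`, `select_top_k`) are assumptions made for this example only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(docs, labels, word):
    """Drop in label entropy from knowing whether `word` occurs in an email."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = ((len(with_word) / n) * entropy(with_word)
                   + (len(without_word) / n) * entropy(without_word))
    return entropy(labels) - conditional

def select_top_k(docs, labels, k):
    """Return the k words with the highest Information Gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:k]

# Toy corpus invented for illustration: each email is a set of words.
docs = [
    {"free", "money", "now"},
    {"free", "offer", "click"},
    {"meeting", "agenda", "now"},
    {"project", "meeting", "notes"},
]
labels = ["spam", "spam", "legit", "legit"]

top = select_top_k(docs, labels, 2)
print(top)
```

In this toy corpus, "free" and "meeting" each separate the two classes perfectly, while "now" appears once in each class and carries no information. A ranking method of this kind scores each word independently of the others; the abstract contrasts such methods with INTERACT, whose selection procedure is not described on this page.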

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, C., Gong, Y., Bie, R., Gao, X. (2008). Searching for Interacting Features for Spam Filtering. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87732-5_55

  • DOI: https://doi.org/10.1007/978-3-540-87732-5_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87731-8

  • Online ISBN: 978-3-540-87732-5

  • eBook Packages: Computer Science, Computer Science (R0)
