
Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

  • Conference paper
Contemporary Computing (IC3 2009)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 40)


Abstract

Medical data mining is the search for relationships and patterns within medical datasets that can provide useful knowledge for effective clinical decisions. Including irrelevant, redundant, and noisy features in the process model results in poor predictive accuracy. Much research in data mining has therefore gone into improving the predictive accuracy of classifiers through feature selection. Feature selection is particularly valuable in medical data mining because diagnosis, a patient-care activity, can then rely on a minimal number of significant features. The objective of this work is to show that selecting the most significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on a reduced feature subset of the diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.
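To make the approach concrete, the following sketch (not the authors' exact procedure) ranks features by the classical F-score criterion of Chen and Lin, keeps the highest-scoring ones, and trains an RBF-kernel SVM through scikit-learn's SVC, which wraps LIBSVM. The file name pima_diabetes.csv, the train/test split, and keeping the top four features are illustrative assumptions, not values reported in the paper.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score, confusion_matrix

    def f_score(X, y):
        # F-score of each column of X for binary labels y in {0, 1}:
        # ((mean_pos - mean)^2 + (mean_neg - mean)^2) / (var_pos + var_neg),
        # computed feature-wise.
        pos, neg = X[y == 1], X[y == 0]
        numer = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
        denom = pos.var(0, ddof=1) + neg.var(0, ddof=1)
        return numer / (denom + 1e-12)          # guard against zero variance

    # Pima Indians Diabetes data: 8 numeric features, binary outcome in column 9.
    data = np.loadtxt("pima_diabetes.csv", delimiter=",")   # assumed local copy
    X, y = data[:, :8], data[:, 8].astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)

    scores = f_score(X_tr, y_tr)
    top = np.argsort(scores)[::-1][:4]          # keep the 4 highest-scoring features
    print("selected feature indices:", top)

    scaler = StandardScaler().fit(X_tr[:, top])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # LIBSVM-backed classifier
    clf.fit(scaler.transform(X_tr[:, top]), y_tr)

    pred = clf.predict(scaler.transform(X_te[:, top]))
    print("accuracy:", accuracy_score(y_te, pred))
    print("confusion matrix (rows = true, cols = predicted):")
    print(confusion_matrix(y_te, pred))

Training the same classifier on all eight features gives the baseline against which the reduced subset can be compared, mirroring the evaluation described in the abstract.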




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sarojini, B., Ramaraj, N., Nickolas, S. (2009). Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection. In: Ranka, S., et al. Contemporary Computing. IC3 2009. Communications in Computer and Information Science, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_51

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-03547-0_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03546-3

  • Online ISBN: 978-3-642-03547-0

  • eBook Packages: Computer Science, Computer Science (R0)
