Abstract
Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. Including irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research in data mining has therefore focused on improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is particularly valuable in medical data mining, because it allows a disease to be diagnosed in this patient-care activity with a minimal number of significant features. The objective of this work is to show that selecting the most significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on a reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces both false negatives and false positives.
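The abstract does not state how feature significance is scored. As a rough illustration only, the sketch below ranks features with the ordinary (linear) F-score of Chen and Lin (2005), which measures how well a single feature separates the positive and negative classes relative to its within-class spread. The Kernel F-Score method named in the title presumably extends this criterion to a kernel-induced feature space; this sketch does not attempt that and is not the authors' method. The function names `f_score` and `select_top_k`, the 1/0 label coding, and the toy data are hypothetical.

```python
from statistics import mean


def f_score(X, y):
    """Linear F-score of each feature (column) in X.

    Assumptions (hypothetical, for illustration): X is a list of rows of floats,
    y uses 1 for the positive class and 0 for the negative class, and each class
    has at least two samples (the within-class variances divide by n - 1).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for i in range(len(X[0])):
        all_i = [row[i] for row in X]
        pos_i = [row[i] for row in pos]
        neg_i = [row[i] for row in neg]
        m, m_pos, m_neg = mean(all_i), mean(pos_i), mean(neg_i)
        # Numerator: distance of each class mean from the overall mean.
        between = (m_pos - m) ** 2 + (m_neg - m) ** 2
        # Denominator: sample variance of the feature inside each class.
        within = (sum((v - m_pos) ** 2 for v in pos_i) / (len(pos_i) - 1)
                  + sum((v - m_neg) ** 2 for v in neg_i) / (len(neg_i) - 1))
        if within == 0:
            # Constant within both classes: perfectly separating if the means differ.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def select_top_k(X, y, k):
    """Keep the k features with the highest F-score (original column order)."""
    scores = f_score(X, y)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[i] for i in keep] for row in X]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[5.0, 1.0], [6.0, 2.0], [5.5, 1.5], [6.5, 1.0],   # positive samples
     [1.0, 1.5], [2.0, 1.0], [1.5, 2.0], [2.5, 1.5]]   # negative samples
y = [1, 1, 1, 1, 0, 0, 0, 0]

scores = f_score(X, y)
keep, X_reduced = select_top_k(X, y, 1)
print([round(s, 3) for s in scores])  # [9.6, 0.02]
print(keep)                           # [0]
```

The retained columns would then form the reduced training set for an SVM such as LibSVM. How many features to keep (k, or a score threshold) is a separate design choice; one common approach is to pick it by cross-validated accuracy of the downstream classifier.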
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sarojini, B., Ramaraj, N., Nickolas, S. (2009). Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection. In: Ranka, S., et al. Contemporary Computing. IC3 2009. Communications in Computer and Information Science, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_51
DOI: https://doi.org/10.1007/978-3-642-03547-0_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03546-3
Online ISBN: 978-3-642-03547-0
eBook Packages: Computer Science, Computer Science (R0)