Abstract
In this paper, we develop a novel feature selection algorithm based on the genetic algorithm (GA) using a specially devised trace-based separability criterion. Based on the scores of class separability and variable separability, this criterion measures the significance of a feature subset independently of any specific classifier. In addition, a mutual information matrix between variables is used as the features for classification, and no prior knowledge of the cardinality of the feature subset is required. Experiments are performed on a standard lung cancer dataset. The obtained solutions are verified with three different classifiers, namely the support vector machine (SVM), the back-propagation neural network (BPNN), and the K-nearest neighbor (KNN) classifier, and compared with those obtained using the whole feature set, the F-score method, and the correlation-based feature selection method. The comparison results show that the proposed intelligent system achieves good diagnostic performance and can serve as a promising tool for lung cancer diagnosis.
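The criterion described in the abstract scores a candidate subset without training any classifier, and the GA searches over subsets of free size. The following is a minimal Python sketch under stated assumptions, not the authors' formulation: the class-separability term is taken to be the common scatter-matrix trace ratio tr(S_b)/tr(S_w); the variable-separability term is replaced by a hypothetical redundancy penalty (mean absolute Pearson correlation among the selected features); and the two are multiplied. The GA operators (binary tournament, uniform crossover, bit-flip mutation, elitism), the function names `class_separability`, `redundancy`, `fitness` and `ga_select`, and all parameter values are illustrative assumptions. The mutual-information features are not modeled. Each chromosome is a 0/1 mask over the features, so the subset size is never fixed in advance.

```python
import random
from itertools import combinations
from math import sqrt


def selected(mask):
    """Indices of the features switched on in a 0/1 chromosome."""
    return [j for j, bit in enumerate(mask) if bit]


def class_separability(X, y, mask):
    """Assumed form: trace ratio tr(S_b) / tr(S_w) on the selected columns."""
    cols = selected(mask)
    n = len(X)
    overall = [sum(row[j] for row in X) / n for j in cols]
    tr_b = tr_w = 0.0
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        mean_c = [sum(r[j] for r in rows) / len(rows) for j in cols]
        # Between-class scatter trace: n_c * ||m_c - m||^2
        tr_b += len(rows) * sum((mc - m) ** 2 for mc, m in zip(mean_c, overall))
        # Within-class scatter trace: sum of ||x - m_c||^2 over class members
        tr_w += sum((r[j] - mc) ** 2 for r in rows for j, mc in zip(cols, mean_c))
    return tr_b / (tr_w + 1e-12)  # epsilon avoids division by zero


def abs_corr(a, b):
    """Absolute Pearson correlation; 0 if either column is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return abs(cov) / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def redundancy(X, mask):
    """Hypothetical stand-in for variable separability: mean |r| between
    every pair of selected columns (0 when fewer than two are selected)."""
    pairs = list(combinations(selected(mask), 2))
    if not pairs:
        return 0.0
    col = {j: [row[j] for row in X] for j in selected(mask)}
    return sum(abs_corr(col[i], col[k]) for i, k in pairs) / len(pairs)


def fitness(X, y, mask):
    """Hypothetical combined score: high class separability, low redundancy."""
    if not any(mask):
        return 0.0
    return class_separability(X, y, mask) * (1.0 - redundancy(X, mask))


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.1, seed=0):
    """GA over 0/1 feature masks; the subset size is never fixed in advance."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def score(mask):
        return fitness(X, y, mask)

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals wins
        return max(rng.sample(pop, 2), key=score)

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism keeps the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover, then independent bit-flip mutation
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Toy data: column 0 separates the classes, column 1 is noise,
    # column 2 is an exact copy of column 0.
    X = [[0.0, 0.5, 0.0], [0.1, 0.9, 0.1], [0.2, 0.1, 0.2],
         [1.0, 0.4, 1.0], [1.1, 0.8, 1.1], [1.2, 0.2, 1.2]]
    y = [0, 0, 0, 1, 1, 1]
    print(ga_select(X, y))
```

On the toy data, the noise column lowers the trace ratio, and keeping both copies of the informative column drives the redundancy penalty to 1 and the score to zero. A single copy of the informative column therefore scores highest. Multiplying the two terms is just one simple way to combine an unbounded trace ratio with a penalty in [0, 1]; a weighted sum would be an equally plausible choice.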
Acknowledgments
This work is supported by the Fundamental Research Funds for the Central Universities (JUDCF12027, JUSRP211A37, JUSRP51323B), the Fund of the State Key Laboratory of ASIC and System in Fudan University (11KF003), the PAPD of Jiangsu Higher Education Institutions, and the Graduate Student Innovation Program for Universities of Jiangsu Province (CXLX12_0734).
Additional information
This article is part of the Topical Collection on Systems-Level Quality Improvement
About this article
Cite this article
Lu, C., Zhu, Z. & Gu, X. An Intelligent System for Lung Cancer Diagnosis Using a New Genetic Algorithm Based Feature Selection Method. J Med Syst 38, 97 (2014). https://doi.org/10.1007/s10916-014-0097-y
DOI: https://doi.org/10.1007/s10916-014-0097-y