Abstract
The most important factors preventing pattern recognition from functioning rapidly and effectively are noisy and inconsistent data in databases. This article presents a new data preparation method based on clustering algorithms for the diagnosis of heart disease and diabetes. In this method, a new modified K-means algorithm is used in a clustering-based data preparation system to eliminate noisy and inconsistent data, and Support Vector Machines are used for classification. The approach was tested on the diagnosis of heart disease and diabetes, which are prevalent within society and figure among the leading causes of death. The data sets used for these diagnoses are the Statlog (Heart), SPECT images and Pima Indians Diabetes data sets obtained from the UCI database. The proposed system achieved classification success rates of 97.87 %, 98.18 % and 96.71 %, respectively, on these data sets. These accuracies were obtained using the 10-fold cross-validation method. According to the results, the proposed method performs very well compared with other reported results and seems very promising for pattern recognition applications.
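The data preparation idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' modified K-means algorithm, so the code below swaps in plain Lloyd's k-means with a deterministic farthest-first initialization, and it assumes one common filtering rule: a sample is treated as noisy when its label disagrees with the majority label of its cluster. The function names, the toy data and the fold layout are all hypothetical. The Support Vector Machine step is omitted; the retained samples would be passed to a classifier and evaluated on the folds.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means (not the paper's modified variant).

    Returns one cluster index per point.
    """
    # Assumption of this sketch: deterministic farthest-first initialization.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    assign = None
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def filter_noisy(points, labels, k=2):
    """Return indices of samples whose label matches their cluster's majority.

    Assumed filtering rule for illustration, not taken from the paper.
    """
    assign = kmeans(points, k)
    majority = {}
    for c in set(assign):
        in_cluster = [lab for lab, a in zip(labels, assign) if a == c]
        majority[c] = Counter(in_cluster).most_common(1)[0][0]
    return [i for i, (lab, a) in enumerate(zip(labels, assign))
            if lab == majority[a]]


def k_fold_indices(n, folds=10):
    """Yield (train, test) index lists for plain k-fold cross-validation."""
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        yield train, test


# Toy data: two well-separated groups, each with one mislabeled sample.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

kept = filter_noisy(points, labels, k=2)
print(kept)  # indices of the samples retained for classification
```

On the toy data, the two mislabeled samples (indices 3 and 7) are dropped and the remaining six are kept. In a real pipeline the number of clusters and the filtering rule are design choices that would need to be validated, since removing samples before evaluation can inflate reported accuracy if the filter is also applied to the test folds.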
Acknowledgment
The authors are grateful to Selcuk University Scientific Research Projects Coordinatorship for support of the manuscript.
Additional information
This article is part of the Topical Collection on Transactional Processing Systems
About this article
Cite this article
Yilmaz, N., Inan, O. & Uzer, M.S. A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases. J Med Syst 38, 48 (2014). https://doi.org/10.1007/s10916-014-0048-7
DOI: https://doi.org/10.1007/s10916-014-0048-7