A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases | Journal of Medical Systems

A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases

  • Transactional Processing Systems
Journal of Medical Systems

Abstract

The most important factors that prevent pattern recognition from functioning rapidly and effectively are noisy and inconsistent data in databases. This article presents a new data preparation method based on clustering algorithms for the diagnosis of heart disease and diabetes. In this method, a new modified K-means algorithm is used in a clustering-based data preparation system to eliminate noisy and inconsistent data, and Support Vector Machines are used for classification. The newly developed approach was tested on the diagnosis of heart disease and diabetes, which are prevalent within society and figure among the leading causes of death. The data sets used for these diagnoses were the Statlog (Heart), SPECT images, and Pima Indians Diabetes data sets obtained from the UCI database. The proposed system achieved classification success rates of 97.87%, 98.18%, and 96.71%, respectively, on these data sets. Classification accuracies were obtained using the 10-fold cross-validation method. According to the results, the proposed method performs very well compared with other reported results and seems very promising for pattern recognition applications.

Acknowledgment

The authors are grateful to Selcuk University Scientific Research Projects Coordinatorship for support of the manuscript.

Author information

Corresponding author

Correspondence to Nihat Yilmaz.

Additional information

This article is part of the Topical Collection on Transactional Processing Systems

About this article

Cite this article

Yilmaz, N., Inan, O. & Uzer, M.S. A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases. J Med Syst 38, 48 (2014). https://doi.org/10.1007/s10916-014-0048-7

  • DOI: https://doi.org/10.1007/s10916-014-0048-7
