Abstract
Retrieving information about a subset of variables or features has attracted the attention of many researchers in data mining. The objective of feature selection (FS) is to improve prediction performance. Research on FS contributes better definitions of features, feature structure, feature ranking, feature selection functions, efficient search techniques, and feature validation methods. In this study, a retrieval method that integrates correlation and linear forward selection algorithms to evaluate and generate a subset of clinical features is presented. The objective of the research is to find the optimal features of a cancer dataset and to classify the disease into one of four cancer stages: one, two, three, or four. The research methodology is based on the data mining and knowledge discovery process and has four phases: pre-processing, resampling, feature selection, and classification. The proposed Bayesian Relevance Feedback (BRF) classifier is also described; it addresses posterior probabilities of zero, with the aim of increasing accuracy in diagnosing cancer stages. The experiments were carried out on an oral cancer dataset using WEKA. Accuracy was compared across several classification algorithms using the 15 optimal features chosen by the hybrid feature selection method. The results show that BRF achieved 97.25% classification accuracy and outperformed the six other classifiers: K-Nearest Neighbors Classifier, Multi Class Classifier, Tree-Random, Multilayer Perceptron, Naïve Bayes, and Support Vector Machine.
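The zero posterior probability issue mentioned in the abstract can be illustrated with a small categorical Naïve Bayes sketch. The abstract does not describe how BRF itself works, so the code below is not the BRF method: it only shows how a feature value never observed with a class drives that class's posterior to zero, and how Laplace (add-one) smoothing, a standard textbook remedy used here as a stand-in, avoids it. The feature values, stage labels, and function names are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict


def train_counts(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values observed for each feature across all classes.
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def posteriors(row, model, alpha=0.0):
    """Naive Bayes posteriors; alpha > 0 applies Laplace smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total  # class prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            n_values = len(feature_values[i])
            # With alpha == 0, an unseen (value, class) pair gives a factor of 0.
            score *= (count + alpha) / (n_c + alpha * n_values)
        scores[label] = score
    norm = sum(scores.values())
    if norm == 0:
        return scores  # every class collapsed to zero; nothing to normalise
    return {label: s / norm for label, s in scores.items()}


# Hypothetical toy data: (smoker, lesion_size) -> cancer stage.
rows = [("yes", "small"), ("no", "small"), ("yes", "large"), ("yes", "large")]
labels = ["one", "one", "four", "four"]
model = train_counts(rows, labels)

query = ("no", "large")  # "large" never seen with stage one, "no" never with stage four
raw = posteriors(query, model, alpha=0.0)
smoothed = posteriors(query, model, alpha=1.0)
print(raw)
print(smoothed)
```

In this toy case the unsmoothed posteriors are zero for both stages, so the classifier cannot rank them, while add-one smoothing gives usable values of roughly 0.4 for stage one and 0.6 for stage four.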
Acknowledgement
This study is partially funded by the JKKLA, Universiti Malaysia Terengganu (UMT).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mohd, F., Abdul Jalil, M., Mohamad Noor, N.M., Ismail, S., Abu Bakar, Z. (2019). The Use of Hybrid Information Retrieve Technique and Bayesian Relevance Feedback Classification on Clinical Dataset. In: Berry, M., Yap, B., Mohamed, A., Köppen, M. (eds) Soft Computing in Data Science. SCDS 2019. Communications in Computer and Information Science, vol 1100. Springer, Singapore. https://doi.org/10.1007/978-981-15-0399-3_16
DOI: https://doi.org/10.1007/978-981-15-0399-3_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0398-6
Online ISBN: 978-981-15-0399-3
eBook Packages: Computer Science, Computer Science (R0)