Abstract
Real-world data are often imbalanced, with large differences in the number of samples per class. The problem is especially pronounced in medical data, where class sizes follow disease prevalence. This paper proposes P-mRMR, an extension of the mRMR (minimum redundancy, maximum relevance) algorithm, to improve feature selection on imbalanced data. Because such data sets often contain attributes with many missing values, P-mRMR incorporates the missing values into the feature selection process itself rather than treating them as a separate preprocessing step, which reduces the complexity of data preprocessing. In the experiments, the algorithms are compared using AUC, the confusion matrix, and the probability of missing values. The results show that the features selected by the improved algorithm give better results in the classifiers.
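For orientation, the base algorithm that P-mRMR extends can be sketched as greedy mRMR with the difference criterion: at each step, pick the feature whose mutual information with the class label, minus its mean mutual information with the features already selected, is largest. This is a minimal sketch of standard mRMR, not the authors' P-mRMR. The helper names `mutual_information` and `mrmr` are hypothetical. It assumes discrete features and a plug-in mutual-information estimate. As an illustrative assumption only, it treats a missing value (`None`) as an ordinary category, so that missingness contributes to relevance and redundancy instead of being imputed.

```python
import math
from collections import Counter
from typing import Hashable, List, Mapping, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats for two aligned discrete sequences.

    Assumption of this sketch: None (a missing value) is counted as its own
    category, so it is neither imputed nor dropped. This is not necessarily
    how P-mRMR handles missing values.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr(features: Mapping[str, Sequence[Hashable]],
         target: Sequence[Hashable], k: int) -> List[str]:
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected: List[str] = []
    remaining = set(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]  # first pick: relevance only
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes ties break deterministically (alphabetically).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, imbalanced example: 2 positives out of 8, with y = a AND c.
y = [0, 0, 0, 1, 0, 0, 0, 1]
a = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of a
feats = {"a": a, "b": list(a),  # "b" duplicates "a" (fully redundant)
         "c": c, "d": [None, 0, None, 1, 0, None, 1, None]}  # None = missing
print(mrmr(feats, y, k=2))  # ['a', 'c']
```

In the toy run, `a`, `b` and `c` are equally relevant to the label. Once `a` is chosen, however, its duplicate `b` is heavily penalized for redundancy. The independent feature `c` is therefore picked second. This redundancy penalty is the property of mRMR that the paper builds on.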
Supported by National Key R&D Program of China (No. 2018YFC0810601), National Key R&D Program of China (No. 2016YFC0901303), National Natural Science Foundation of China (No. 61977005).
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Di, J., Shi, Z. (2020). Prediction Model of Breast Cancer Based on mRMR Feature Selection. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_4
DOI: https://doi.org/10.1007/978-3-030-63820-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)