Abstract
Models based on machine learning algorithms have been developed to detect breast cancer early. Feature selection is commonly applied to improve the performance of these models by selecting only relevant features. However, selecting relevant features in unsupervised learning is much more difficult, because no class labels are available to guide the search for relevant information. This problem has rarely been studied in the literature. This paper presents a hybrid intelligence model that combines cluster analysis algorithms with bio-inspired algorithms, used as feature selectors, for analyzing clinical breast cancer data. Binary versions of the moth flame optimization and whale optimization algorithms are proposed. Two kinds of evaluation criteria are adopted to evaluate the proposed algorithms: clustering-based measurements and statistics-based measurements. The experimental results demonstrate the capability of the proposed bio-inspired feature selection algorithms to produce both meaningful data partitions and significant feature subsets.
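As a rough illustration of the idea summarized above, the following sketch shows how a continuous optimizer position might be turned into a binary feature mask and scored without class labels. It is a minimal sketch under stated assumptions, not the paper's method: the sigmoid transfer function, the plain k-means routine, and the "SSE per selected feature" criterion are all hypothetical choices made for illustration, as are the function names.

```python
import math
import random


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    Assumption: an S-shaped (sigmoid) transfer function; the abstract does
    not say which transfer function the paper uses.
    """
    mask = [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]
    if not any(mask):  # keep at least one feature so clustering is defined
        mask[rng.randrange(len(mask))] = 1
    return mask


def select(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(rows, k, rng, iters=20):
    """Plain k-means; returns the within-cluster sum of squared errors."""
    centers = rng.sample(rows, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda j: sq_dist(row, centers[j]))
            groups[nearest].append(row)
        # Recompute each center as the mean of its group; keep empty groups fixed.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return sum(min(sq_dist(row, c) for c in centers) for row in rows)


def fitness(rows, mask, k=2, seed=0):
    """Hypothetical criterion: SSE per selected feature (lower is better).

    The mask must select at least one feature.
    """
    sse = kmeans_sse(select(rows, mask), k, random.Random(seed))
    return sse / sum(mask)


# Toy data: feature 0 separates two groups, feature 1 is noise.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [5.0, 1.2], [5.1, 8.8], [5.2, 5.1]]
informative = fitness(rows, [1, 0])
noisy = fitness(rows, [0, 1])
```

In a wrapper of this kind, a binary optimizer would repeatedly call `binarize` on each candidate position and keep the masks with the best `fitness`. On the toy data, the mask that keeps only the informative feature scores lower (better) than the mask that keeps only the noise feature.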

















Acknowledgements
We would like to thank the Editor for suggesting that we implement different initialization strategies. We found that these strategies achieve better results for the proposed clinical decision support system model.
Cite this article
Sayed, G.I., Darwish, A. & Hassanien, A.E. Binary Whale Optimization Algorithm and Binary Moth Flame Optimization with Clustering Algorithms for Clinical Breast Cancer Diagnoses. J Classif 37, 66–96 (2020). https://doi.org/10.1007/s00357-018-9297-3
DOI: https://doi.org/10.1007/s00357-018-9297-3