Abstract
Gene selection from microarray gene expression datasets and clustering of samples into groups are important data mining tasks for disease identification. Selecting more interpretable genes from a gene expression dataset is an essential data-preprocessing task that supports the study of cancer. Gene selection during sample clustering is inherently difficult because there is no obvious criterion to guide the search. Simultaneous gene selection and sample clustering is a two-way data analysis technique that has recently gained research attention. Traditional clustering techniques do not handle noisy data well, so clustering algorithms that work on relevant, noise-free data are desirable. Selecting target genes before sample clustering is therefore essential, and it is effective when both tasks are performed simultaneously. In this chapter, an optimal gene subset is selected and sample clustering is performed simultaneously using a Multi-Objective Genetic Algorithm (MOGA). Different versions of MOGA are employed to choose the optimal gene subset, and the natural number of sample clusters is obtained automatically at the end of the process. The non-dominated sorting genetic algorithm (NSGA), the strength Pareto evolutionary algorithm (SPEA), and its modified version SPEA2 are applied for this purpose. The methods use nonlinear hybrid uniform cellular automata to generate the initial population, a tournament selection strategy, a two-point crossover operation, and a suitable jumping gene mutation mechanism to maintain diversity in the population. They use the mutual correlation coefficient and internal and external cluster validation indices as objective functions to find the non-dominated solutions. To measure the cluster validation indices, a clustering algorithm is applied to the data subset associated with each chromosome in the population.
After the genetic algorithm converges, the best solution among the non-dominated solutions is identified; it provides the important genes and categorizes the samples into clusters. The experimental results support the correctness of the proposed simultaneous gene selection and sample categorization method. The quality of the clusters obtained with the different genetic algorithms is assessed by comparing various cluster validation indices.
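As a rough illustration of three building blocks named in the abstract (non-dominated solutions, tournament selection, and two-point crossover), the following minimal Python sketch may help. It is not the chapter's implementation. The function names, the binary chromosome encoding (bit k set when gene k is kept), and the convention that every objective is minimized are assumptions made for this example; an index that should be maximized would be negated first. The ranking and archive mechanisms that distinguish NSGA, SPEA, and SPEA2 are not shown.

```python
import random


def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(objectives):
    """Return indices of the vectors that no other vector dominates."""
    return [
        i for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]


def binary_tournament(population, objectives, rng):
    """Pick two distinct candidates; prefer the one that dominates the other."""
    i, j = rng.sample(range(len(population)), 2)
    if dominates(objectives[i], objectives[j]):
        return population[i]
    if dominates(objectives[j], objectives[i]):
        return population[j]
    return population[rng.choice((i, j))]  # neither dominates: pick at random


def two_point_crossover(parent1, parent2, rng):
    """Exchange the segment between two random cut points (length >= 3)."""
    i, j = sorted(rng.sample(range(1, len(parent1)), 2))
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2


# Hypothetical objective values (lower is better), one tuple per chromosome.
scores = [(0.2, 0.9), (0.4, 0.4), (0.5, 0.5), (0.9, 0.1)]
print(non_dominated(scores))  # [0, 1, 3]: (0.5, 0.5) is dominated by (0.4, 0.4)

rng = random.Random(0)
# Assumed encoding: bit k is 1 when gene k is kept in the subset.
a, b = two_point_crossover([1] * 8, [0] * 8, rng)
print(a, b)
```

In a full run, each chromosome's objective tuple would come from the correlation measure and the cluster validation indices computed on its gene subset; the values above are placeholders only.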
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Das, A.K., Das, S. (2018). A Comparative Study on Different Versions of Multi-Objective Genetic Algorithm for Simultaneous Gene Selection and Sample Categorization. In: Mandal, J., Mukhopadhyay, S., Dutta, P. (eds) Multi-Objective Optimization. Springer, Singapore. https://doi.org/10.1007/978-981-13-1471-1_11
DOI: https://doi.org/10.1007/978-981-13-1471-1_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1470-4
Online ISBN: 978-981-13-1471-1
eBook Packages: Computer Science (R0)