Abstract
Gene selection is a common task in bioinformatics, where data mining and knowledge innovation play a significant role in selecting an optimal set of genes with respect to some useful evaluation functions. Gene selection based on a single-objective genetic algorithm may not provide the best solution because the characteristics of datasets vary. When multiple objective functions are combined, an algorithm generally identifies more important genes than one relying on a single criterion. Here, two criteria are combined and a novel bi-objective genetic algorithm for gene selection is proposed, which effectively reduces the dimensionality of high-volume gene datasets without sacrificing meaningful information. The method uses nonlinear hybrid cellular automata to create the initial population and a novel jumping gene technique for mutation to maintain diversity among the chromosomes of the population. It uses rough set theory and Kullback–Leibler divergence to define two fitness functions, which are conflicting in nature and are employed to approximate a set of Pareto-optimal solutions. The best solutions of the proposed method provide the informative genes used for disease diagnosis. The replacement strategy for creating the next-generation population is based on the Pareto-optimal solutions with respect to both fitness functions. Experimental results on publicly available microarray data demonstrate the importance of the identified genes and the effectiveness of the proposed informative gene selection mechanism.
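As a rough illustration of two ingredients named in the abstract, the following minimal sketch computes a discrete Kullback–Leibler divergence and extracts the non-dominated (Pareto) candidates from a list of two-objective scores. It is not the chapter's implementation: the abstract does not give the exact fitness definitions, so the function names (`kl_divergence`, `dominates`, `pareto_front`), the smoothing constant `eps`, and the assumption that both objectives are maximized are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p || q) in nats.

    p and q are equal-length sequences of probabilities. eps is an
    assumed smoothing constant that avoids division by zero.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def dominates(a, b):
    """True if score tuple a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores):
    """Return the indices of score tuples that no other tuple dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (fitness_1, fitness_2) scores for four candidate gene subsets.
scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
front = pareto_front(scores)
```

Here `front` holds the indices of the candidates that would survive a Pareto-based replacement step under these assumptions; the third candidate is dropped because `(0.5, 0.5)` is at least as good on both objectives and better on one.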
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Das, A.K., Pati, S.K. (2018). Bi-objective Genetic Algorithm with Rough Set Theory for Important Gene Selection in Disease Diagnosis. In: Mandal, J., Mukhopadhyay, S., Dutta, P. (eds) Multi-Objective Optimization. Springer, Singapore. https://doi.org/10.1007/978-981-13-1471-1_13
DOI: https://doi.org/10.1007/978-981-13-1471-1_13
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1470-4
Online ISBN: 978-981-13-1471-1
eBook Packages: Computer Science, Computer Science (R0)