Abstract
We present a novel approach to multivariate feature ranking in the context of microarray data classification that employs a simple genetic algorithm in conjunction with random forest feature importance measures. We demonstrate the performance of the algorithm by comparing it against three popular feature ranking and selection methods on a colon cancer recurrence prediction problem. In addition, we investigate the biological relevance of the selected features, finding functional associations of the corresponding genes with cancer.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Popovic, D., Sifrim, A., Moschopoulos, C., Moreau, Y., De Moor, B. (2013). A Hybrid Approach to Feature Ranking for Microarray Data Classification. In: Iliadis, L., Papadopoulos, H., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2013. Communications in Computer and Information Science, vol 384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41016-1_26
DOI: https://doi.org/10.1007/978-3-642-41016-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41015-4
Online ISBN: 978-3-642-41016-1
eBook Packages: Computer Science (R0)