Abstract
Breast cancer is not a single disease but a collection of distinct diseases arising at one site, which can be distinguished in part by characteristic gene expression signatures. Accurate diagnosis of the specific subtype is critical for ensuring the best possible patient response to therapy. Currently, therapeutic direction is determined from the expression of characteristic receptors; while cost-effective, this method is not robust and reliably predicts only a small number of subtypes. Using the original five subtypes of breast cancer, we hypothesized that machine learning techniques would offer many benefits for feature selection. Unlike existing gene selection approaches, we propose a tree-based approach that conducts gene selection and builds the classifier simultaneously. We conducted computational experiments to select the minimal number of genes that would reliably predict a given subtype. Our results indicate that this modified approach to gene selection yields a small subset of genes that can predict subtypes with greater than 95% overall accuracy. In addition to providing a valuable list of targets for diagnostic purposes, the gene ontologies of the selected genes suggest that these methods have isolated a number of genes potentially involved in breast cancer biology and etiology, and potentially relevant to novel therapeutics.
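The idea of selecting genes and building the classifier in one pass can be illustrated with a small, generic decision tree, in which the genes used at the split nodes are, by construction, the selected subset. The sketch below is not the authors' method: the data are synthetic, the subtype labels `A`, `B`, `C`, the marker genes 0 and 1, and all function names are hypothetical, and Gini impurity is an assumed split criterion chosen only for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (1.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (gene, threshold) minimising weighted Gini impurity, or None."""
    best, best_score = None, float("inf")
    for g in range(len(X[0])):
        values = sorted(set(row[g] for row in X))
        # Using an observed value (except the largest) as threshold
        # guarantees that both children are non-empty.
        for t in values[:-1]:
            left = [lab for row, lab in zip(X, y) if row[g] <= t]
            right = [lab for row, lab in zip(X, y) if row[g] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (g, t), score
    return best

def build_tree(X, y, used, depth=0, max_depth=4):
    """Grow a tree; every gene chosen for a split is recorded in `used`."""
    split = None
    if len(set(y)) > 1 and depth < max_depth:
        split = best_split(X, y)
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    g, t = split
    used.add(g)
    left = [(r, lab) for r, lab in zip(X, y) if r[g] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[g] > t]
    return (
        g, t,
        build_tree([r for r, _ in left], [lab for _, lab in left],
                   used, depth + 1, max_depth),
        build_tree([r for r, _ in right], [lab for _, lab in right],
                   used, depth + 1, max_depth),
    )

def predict(tree, row):
    """Follow the splits until a leaf (a label string) is reached."""
    while isinstance(tree, tuple):
        g, t, left, right = tree
        tree = left if row[g] <= t else right
    return tree

# Synthetic expression data (assumption): 30 genes, 20 samples per subtype.
# Gene 0 is strongly over-expressed in subtype A, gene 1 in subtype B;
# all remaining genes are pure noise.
random.seed(0)
N_GENES = 30
MARKER = {"A": 0, "B": 1}
X, y = [], []
for label in "ABC":
    for _ in range(20):
        row = [random.gauss(0.0, 1.0) for _ in range(N_GENES)]
        if label in MARKER:
            row[MARKER[label]] += 10.0
        X.append(row)
        y.append(label)

used = set()
tree = build_tree(X, y, used)
selected = sorted(used)
# Accuracy on the training samples only; a real study would cross-validate.
accuracy = sum(predict(tree, r) == lab for r, lab in zip(X, y)) / len(y)
print("selected genes:", selected)
print("training accuracy:", accuracy)
```

Because the tree stops splitting once a node is pure, it never touches genes that do not help separate the subtypes, which is what makes the selected subset small in this toy setting.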
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rezaeian, I. et al. (2013). Identifying Informative Genes for Prediction of Breast Cancer Subtypes. In: Ngom, A., Formenti, E., Hao, JK., Zhao, XM., van Laarhoven, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2013. Lecture Notes in Computer Science, vol 7986. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39159-0_13
DOI: https://doi.org/10.1007/978-3-642-39159-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39158-3
Online ISBN: 978-3-642-39159-0
eBook Packages: Computer Science, Computer Science (R0)