Abstract
Finding robust marker genes is one of the key challenges in breast cancer research. Significant signatures identified in independent datasets often show little to no overlap, possibly due to small sample sizes, noise in gene expression measurements, and heterogeneity across patients. To find more robust markers, several studies have analyzed gene expression data by grouping functionally related genes using pathways or protein interaction data. Here we pursue a protein similarity measure based on Pfam protein family information to aid the identification of robust subnetworks for the prediction of metastasis. The proposed protein-to-protein similarities are derived from a protein-to-family network using family HMM profiles. Gene expression data from six breast cancer datasets are then overlaid on the resulting protein-protein sequence similarity network. The results indicate that the captured protein similarities carry predictive capacity, aid interpretation of the resulting signatures, and improve robustness.
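As an illustration of how protein-to-protein similarities might be derived from a protein-to-family network, the sketch below projects a table of protein-to-family scores onto protein pairs. It is not the authors' method: the use of cosine similarity, the `threshold` parameter, the function name `cosine_similarity_network`, and the toy scores and identifiers are all assumptions made for the example, since the abstract does not specify the similarity measure.

```python
import math


def cosine_similarity_network(scores, threshold=0.5):
    """Project a protein-to-family score table onto protein-protein edges.

    scores: dict mapping protein id -> {family id: non-negative score}.
    Returns a dict mapping (protein_a, protein_b) -> cosine similarity,
    keeping only pairs whose similarity is at least `threshold`.
    Cosine similarity is an assumption of this sketch, not the paper's measure.
    """
    proteins = sorted(scores)
    # Euclidean norm of each protein's family-score vector.
    norms = {
        p: math.sqrt(sum(v * v for v in scores[p].values())) for p in proteins
    }
    edges = {}
    for i, a in enumerate(proteins):
        for b in proteins[i + 1:]:
            if norms[a] == 0 or norms[b] == 0:
                continue  # a protein with no family hits has no defined similarity
            # Only families matched by both proteins contribute to the dot product.
            shared = scores[a].keys() & scores[b].keys()
            dot = sum(scores[a][f] * scores[b][f] for f in shared)
            sim = dot / (norms[a] * norms[b])
            if sim >= threshold:
                edges[(a, b)] = sim
    return edges


# Hypothetical scores: P1 and P2 match the same two families, P3 matches another.
toy_scores = {
    "P1": {"FAM_A": 3.0, "FAM_B": 4.0},
    "P2": {"FAM_A": 3.0, "FAM_B": 4.0},
    "P3": {"FAM_C": 5.0},
}
network = cosine_similarity_network(toy_scores)
```

In this toy input, only the pair with identical family profiles is retained as an edge; the pair with no shared family falls below the threshold. In practice the scores would come from matching protein sequences against family HMM profiles, and gene expression values would then be mapped onto the nodes of the resulting network.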
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Babaei, S., van den Akker, E., de Ridder, J., Reinders, M. (2011). Integrating Protein Family Sequence Similarities with Gene Expression to Find Signature Gene Networks in Breast Cancer Metastasis. In: Loog, M., Wessels, L., Reinders, M.J.T., de Ridder, D. (eds) Pattern Recognition in Bioinformatics. PRIB 2011. Lecture Notes in Computer Science, vol 7036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24855-9_22
DOI: https://doi.org/10.1007/978-3-642-24855-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24854-2
Online ISBN: 978-3-642-24855-9
eBook Packages: Computer Science (R0)