Abstract
Efficient and effective analysis of large microarray gene expression datasets is one of the keys to time-critical personalized medicine. The issue we address here is the scalability of data processing software for clustering gene expression data into groups with homogeneous expression profiles. In this paper, we propose FPF-SB, a novel clustering algorithm that combines the Furthest-Point-First (FPF) heuristic for the k-center problem with a stability-based method for determining the number of clusters k. Our algorithm improves on the state of the art: it scales to large datasets without sacrificing output quality.
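The two ingredients named above can be illustrated with a small, hedged sketch. FPF is Gonzalez's greedy 2-approximation for k-center: start from an arbitrary point, then repeatedly add as a new center the point furthest from all current centers, using O(nk) distance computations. The abstract does not spell out FPF-SB's stability criterion, so the sketch substitutes a prediction-strength-style score in the spirit of Tibshirani and Walther: cluster two random halves of the data, classify the test half with the training centers, and measure how often co-clustered test pairs stay together. Everything here is an assumption for illustration, not the authors' implementation: Euclidean distance, the 50/50 split, the 0.8 threshold, and the helper names `fpf`, `assign`, `prediction_strength` and `choose_k` are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    # Assumed metric between two expression profiles.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fpf(points: List[Point], k: int, first: int = 0) -> Tuple[List[int], List[int]]:
    """Furthest-Point-First (Gonzalez) heuristic for k-center.

    Returns (center indices, label per point); label c means the point's
    nearest center is centers[c]. Uses O(n*k) distance computations.
    """
    n = len(points)
    k = min(k, n)
    centers = [first]
    dist = [euclidean(p, points[first]) for p in points]  # distance to nearest center so far
    label = [0] * n
    while len(centers) < k:
        nxt = max(range(n), key=lambda i: dist[i])  # point furthest from every current center
        centers.append(nxt)
        c = len(centers) - 1
        for i in range(n):
            d = euclidean(points[i], points[nxt])
            if d < dist[i]:  # the new center is closer: reassign point i
                dist[i] = d
                label[i] = c
    return centers, label


def assign(points: List[Point], center_points: List[Point]) -> List[int]:
    # Label each point with the index of its nearest center.
    return [min(range(len(center_points)), key=lambda c: euclidean(p, center_points[c]))
            for p in points]


def prediction_strength(points: List[Point], k: int, seed: int = 0) -> float:
    """Prediction-strength-style stability score (a stand-in, not FPF-SB's own)."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train = [points[i] for i in idx[:half]]
    test = [points[i] for i in idx[half:]]
    train_centers, _ = fpf(train, k)
    _, test_label = fpf(test, k)
    # Classify test points with the centers found on the training half.
    pred = assign(test, [train[c] for c in train_centers])
    strength = 1.0
    for c in set(test_label):
        members = [i for i, lab in enumerate(test_label) if lab == c]
        pairs = same = 0
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                pairs += 1
                same += pred[members[a]] == pred[members[b]]
        if pairs:  # singleton test clusters carry no pair information
            strength = min(strength, same / pairs)
    return strength


def choose_k(points: List[Point], k_max: int, threshold: float = 0.8) -> int:
    # Largest k in [2, k_max] whose score reaches the threshold (1 if none does).
    best = 1
    for k in range(2, k_max + 1):
        if prediction_strength(points, k) >= threshold:
            best = k
    return best
```

On three tightly packed, well-separated groups of profiles, `fpf(points, 3)` places one center in each group, and `choose_k(points, 3)` returns 3. A fixed seed keeps the sketch deterministic; a practical version would average the score over several random splits.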
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Geraci, F., Leoncini, M., Montangero, M., Pellegrini, M., Renda, M.E. (2007). FPF-SB: A Scalable Algorithm for Microarray Gene Expression Data Clustering. In: Duffy, V.G. (eds) Digital Human Modeling. ICDHM 2007. Lecture Notes in Computer Science, vol 4561. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73321-8_69
DOI: https://doi.org/10.1007/978-3-540-73321-8_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73318-8
Online ISBN: 978-3-540-73321-8
eBook Packages: Computer Science, Computer Science (R0)