Abstract
Clustering is a major exploratory technique for gene expression data in the post-genomic era. As essential tools within cluster analysis, cluster validation techniques can assess the quality of clustering results and the performance of clustering algorithms, which aids the interpretation of clustering results. In this work, the validation ability of the Silhouette index, Dunn’s index, the Davies-Bouldin index and the figure of merit (FOM) in gene clustering was investigated using public gene expression datasets clustered by hierarchical single-linkage and average-linkage clustering, K-means and self-organizing maps (SOMs). The results show that the Silhouette index and FOM validate both the performance of clustering algorithms and the quality of clustering results well. Dunn’s index should not be used directly in gene clustering validation because of its high susceptibility to outliers. The Davies-Bouldin index affords better validation than Dunn’s index, except for its preference for hierarchical single-linkage clustering.
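To make the comparison of indices concrete, the following is a minimal sketch of three of the internal indices named above, written in plain Python. It assumes Euclidean distance and the common textbook definitions (mean silhouette width, minimum between-cluster distance over maximum cluster diameter for Dunn's index, centroid-based scatter for the Davies-Bouldin index); these may differ from the exact variants used in the paper. The two-dimensional points in `clean` and `noisy` are hypothetical toy data, not gene expression profiles from the study.

```python
import math

def dist(a, b):
    """Euclidean distance between two profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a cluster."""
    return [sum(col) / len(points) for col in zip(*points)]

def silhouette(clusters):
    """Mean silhouette width over all points; higher is better."""
    widths = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                widths.append(0.0)  # usual convention for singletons
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: mean distance to the nearest other cluster
            b = min(sum(dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

def dunn(clusters):
    """Minimum between-cluster distance over maximum diameter; higher is better."""
    min_sep = min(dist(p, q)
                  for i, c1 in enumerate(clusters)
                  for c2 in clusters[i + 1:]
                  for p in c1 for q in c2)
    max_diam = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_sep / max_diam

def davies_bouldin(clusters):
    """Mean worst-case scatter-to-separation ratio; lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k

# Hypothetical toy partition: two compact, well-separated clusters.
clean = [[(0, 0), (0, 1), (1, 0), (1, 1)],
         [(5, 5), (5, 6), (6, 5), (6, 6)]]
# Same partition with a single outlier assigned to the first cluster.
noisy = [clean[0] + [(3, 3)], clean[1]]

for name, index in [("Silhouette", silhouette),
                    ("Dunn", dunn),
                    ("Davies-Bouldin", davies_bouldin)]:
    print(f"{name}: clean={index(clean):.3f} noisy={index(noisy):.3f}")
```

On this toy data a single outlier lowers Dunn's index from about 4.0 to about 0.67, while the mean silhouette width changes far less. This is in line with the susceptibility to outliers noted above, because Dunn's index depends only on the single closest pair of points between clusters and the single widest cluster.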
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, C., Wan, B., Gao, X. (2006). Effectivity of Internal Validation Techniques for Gene Clustering. In: Maglaveras, N., Chouvarda, I., Koutkias, V., Brause, R. (eds) Biological and Medical Data Analysis. ISBMDA 2006. Lecture Notes in Computer Science, vol 4345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946465_5
DOI: https://doi.org/10.1007/11946465_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68063-5
Online ISBN: 978-3-540-68065-9
eBook Packages: Computer Science (R0)