{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,6]],"date-time":"2024-10-06T00:34:56Z","timestamp":1728174896776},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,10,1]]},"abstract":"Abstract\n Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis is found useful for discovering groups of correlated genes potentially co-regulated or associated to the disease or conditions under investigation. Many clustering methods including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods.\n Results: In this paper, six gene clustering methods are evaluated by simulated data from a hierarchical log-normal model with various degrees of perturbation as well as four real datasets. A weighted Rand index is proposed for measuring similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods in the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform other clustering methods both in simulated and real data while hierarchical clustering and SOM perform among the worst. 
Our analysis provides deep insight to the complicated gene clustering problem of expression profile and serves as a practical guideline for routine microarray cluster analysis.\n Contact: ctseng@pitt.edu\n Supplementary information: Supplementary data are available at Bioinformatics online.","DOI":"10.1093\/bioinformatics\/btl406","type":"journal-article","created":{"date-parts":[[2006,8,2]],"date-time":"2006-08-02T18:47:44Z","timestamp":1154544464000},"page":"2405-2412","source":"Crossref","is-referenced-by-count":219,"title":["Evaluation and comparison of gene clustering methods in microarray analysis"],"prefix":"10.1093","volume":"22","author":[{"given":"Anbupalam","family":"Thalamuthu","sequence":"first","affiliation":[{"name":"Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"Indranil","family":"Mukhopadhyay","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"Xiaojing","family":"Zheng","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"George C.","family":"Tseng","sequence":"additional","affiliation":[{"name":"Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA"},{"name":"Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA"}]}],"member":"286","published-online":{"date-parts":[[2006,7,31]]},"reference":[{"key":"2023012409233326800_b1","doi-asserted-by":"crossref","first-page":"13790","DOI":"10.1073\/pnas.191502998","article-title":"Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses","volume":"98","author":"Bhattacharjee","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012409233326800_b2","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1038\/4462","article-title":"Exploring the new world of the genome with DNA microarrays","volume":"21","author":"Brown","year":"1999","journal-title":"Nature Genetics"},{"key":"2023012409233326800_b3","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1091\/mbc.12.2.323","article-title":"Remodeling of yeast genome expression in response to environmental changes","volume":"12","author":"Causton","year":"2001","journal-title":"Mol. Biol. Cell"},{"key":"2023012409233326800_b4","first-page":"93","article-title":"Biclustering of expression data","volume":"8","author":"Cheng","year":"2000","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023012409233326800_b5","doi-asserted-by":"crossref","first-page":"973-980","DOI":"10.1093\/bioinformatics\/btg119","article-title":"Fuzzy C-means method for clustering microarray data","volume":"19","author":"Demb\u00e9l\u00e9","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b6","doi-asserted-by":"crossref","first-page":"RESEARCH0036","DOI":"10.1186\/gb-2002-3-7-research0036","article-title":"A prediction-based resampling method for estimating the number of clusters in a dataset","volume":"3","author":"Dudoit","year":"2002","journal-title":"Genome Biol."},{"key":"2023012409233326800_b7","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012409233326800_b8","article-title":"MCLUST: Software for model-based clustering, density estimation and discriminant analysis","author":"Fraley","year":"2002"},{"key":"2023012409233326800_b9","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based clustering, discriminant analysis, and density estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012409233326800_b10","doi-asserted-by":"crossref","first-page":"3201","DOI":"10.1093\/bioinformatics\/bti517","article-title":"Computational cluster validation in post-genomic data analysis","volume":"21","author":"Handl","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b11","doi-asserted-by":"crossref","first-page":"126","DOI":"10.2307\/2346830","article-title":"A K-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"Appl. Stat."},{"key":"2023012409233326800_b12","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classif."},{"key":"2023012409233326800_b13","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316801","volume-title":"Finding Groups in Data: An Introduction to Cluster Analysis","author":"Kaufman","year":"1990"},{"key":"2023012409233326800_b14","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1189","article-title":"A new type of stochastic-dependence revealed in gene expression data","volume":"5","author":"Klebanov","year":"2006","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012409233326800_b15","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1109\/5.58325","article-title":"The self-organizing map","volume":"78","author":"Kohonen","year":"1990","journal-title":"Proc. 
IEEE"},{"key":"2023012409233326800_b16","doi-asserted-by":"crossref","first-page":"16875","DOI":"10.1073\/pnas.252466999","article-title":"Genome-wide coexpression dynamics: theory and application","volume":"99","author":"Li","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012409233326800_b17","first-page":"281","article-title":"Some methods for classification and analysis of multivariate observations","volume":"1","author":"MacQueen","year":"1967","journal-title":"Proc. fifth Berkeley Symp. Math. Stat. Prob."},{"key":"2023012409233326800_b18","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1093\/bioinformatics\/18.3.413","article-title":"A mixture model-based approach to the clustering of microarray expression data","volume":"18","author":"McLachlan","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b19","doi-asserted-by":"crossref","first-page":"1194","DOI":"10.1093\/bioinformatics\/18.9.1194","article-title":"Bayesian infinite mixture model based clustering of gene expression profiles","volume":"18","author":"Medvedovic","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b20","doi-asserted-by":"crossref","first-page":"1222","DOI":"10.1093\/bioinformatics\/bth068","article-title":"Bayesian mixture model based clustering of replicated microarray data","volume":"20","author":"Medvedovic","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b21","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/BF02294245","article-title":"An examination of procedures for determining number of clusters in a data set","volume":"50","author":"Milligan","year":"1985","journal-title":"Psychometrika"},{"key":"2023012409233326800_b22","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/A:1023949509487","article-title":"A resampling-based method for class discovery and visualization of gene-expression microarray 
data","volume":"52","author":"Monti","year":"2003","journal-title":"Machine Learning"},{"key":"2023012409233326800_b23","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012409233326800_b24","volume-title":"A Language and Environment for Statistical Computing","author":"RDevelopmentCoreTeam","year":"2004"},{"key":"2023012409233326800_b25","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1186\/1471-2105-4-36","article-title":"Cluster stability scores for microarray data in cancer studies","volume":"4","author":"Smolkin","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012409233326800_b26","doi-asserted-by":"crossref","first-page":"8418","DOI":"10.1073\/pnas.0932692100","article-title":"Repeated observation of breast tumor subtypes in independent gene expression data sets","volume":"100","author":"Sorlie","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012409233326800_b27","doi-asserted-by":"crossref","first-page":"3273","DOI":"10.1091\/mbc.9.12.3273","article-title":"Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization","volume":"9","author":"Spellman","year":"1998","journal-title":"Mol. Biol. Cell"},{"key":"2023012409233326800_b28","doi-asserted-by":"crossref","first-page":"2907","DOI":"10.1073\/pnas.96.6.2907","article-title":"Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation","volume":"96","author":"Tamayo","year":"1999","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012409233326800_b29","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1038\/10343","article-title":"Systematic determination of genetic network architecture","volume":"22","author":"Tavazoie","year":"1999","journal-title":"Nature Genetics"},{"key":"2023012409233326800_b30","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a dataset via the Gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Stat. Soc. B"},{"key":"2023012409233326800_b31","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b32","doi-asserted-by":"crossref","first-page":"2549","DOI":"10.1093\/nar\/29.12.2549","article-title":"Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects","volume":"29","author":"Tseng","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012409233326800_b33","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1111\/j.0006-341X.2005.031032.x","article-title":"Tight clustering: a resampling-based approach for identifying stable and tight patterns in data","volume":"61","author":"Tseng","year":"2005","journal-title":"Biometrics"},{"key":"2023012409233326800_b34","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1016\/S0378-3758(02)00388-9","article-title":"A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap","volume":"117","author":"van der Laan","year":"2003","journal-title":"J. Stat. Plann. 
Infer."},{"key":"2023012409233326800_b35","doi-asserted-by":"crossref","first-page":"1977","DOI":"10.1091\/mbc.02-02-0030","article-title":"Identification of genes periodically expressed in the human cell cycle and their expression in tumors","volume":"13","author":"Whitfield","year":"2002","journal-title":"Mol. Biol. Cell"},{"key":"2023012409233326800_b36","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1038\/ng906","article-title":"Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters","volume":"31","author":"Wu","year":"2002","journal-title":"Nat. Genet."},{"key":"2023012409233326800_b37","doi-asserted-by":"crossref","first-page":"e15","DOI":"10.1093\/nar\/30.4.e15","article-title":"Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation","volume":"30","author":"Yang","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012409233326800_b38","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1093\/bioinformatics\/17.10.977","article-title":"Model-based clustering and data transformations for gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b39","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1093\/bioinformatics\/17.4.309","article-title":"Validating clustering for gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012409233326800_b40","doi-asserted-by":"crossref","first-page":"12783","DOI":"10.1073\/pnas.192159399","article-title":"From the cover: transitive functional annotation by shortest-path analysis of gene expression data","volume":"99","author":"Zhou","year":"2002","journal-title":"Proc. Natl Acad. Sci. 
USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/19\/2405\/48840928\/bioinformatics_22_19_2405.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/19\/2405\/48840928\/bioinformatics_22_19_2405.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T10:09:05Z","timestamp":1674554945000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/19\/2405\/241466"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,7,31]]},"references-count":40,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2006,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl406","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2006,10,1]]},"published":{"date-parts":[[2006,7,31]]}}}