Computational cluster validation in post-genomic data analysis
- PMID: 15914541
- DOI: 10.1093/bioinformatics/bti517
Abstract
Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics.
Results: This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.
Availability: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.
Supplementary information: Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.
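Illustrative sketch: The abstract does not name individual validation measures, so the following is only an assumed example of what a computational (internal) validation index can look like. It uses the mean silhouette width, a widely used index from the general data-mining literature, computed with Euclidean distance on a small made-up dataset. The function names, the toy points and the two partitions are hypothetical and are not taken from the paper or its software.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette width of a partition; labels[i] is the cluster of points[i]."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: its silhouette is conventionally 0.
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
matching_partition = [0, 0, 0, 1, 1, 1]   # follows the visible structure
arbitrary_partition = [0, 1, 0, 1, 0, 1]  # ignores the visible structure

print("matching :", mean_silhouette(points, matching_partition))
print("arbitrary:", mean_silhouette(points, arbitrary_partition))
```

A partition that follows the structure in the data should score markedly higher than one that ignores it. Indices of this kind depend on the chosen distance and tend to favour compact, well-separated clusters, so a high score on its own does not establish biological relevance.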