FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data

doi:10.1186/1471-2105-8-3

BMC Bioinformatics. 2007 Jan 4;8:3.

Limin Fu¹, Enzo Medico

Affiliation

¹ Laboratory of Functional Genomics, The Oncogenomics Center, Institute for Cancer Research and Treatment, University of Torino, School of Medicine, 10060 Candiolo, Italy. limin.fu@ircc.it

PMID: 17204155
PMCID: PMC1774579
DOI: 10.1186/1471-2105-8-3

Limin Fu et al. BMC Bioinformatics. 2007.

Abstract

Background: Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this end, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process.

Results: The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Its distinctive elements are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features, named Cluster Supporting Objects, around which the clusters are constructed; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, through an iterative, converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical clustering, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that the data partitions generated by FLAME are not superimposable on those of the other methods and that, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in Gene Expression Data Analysis Studio (GEDAS), a C++ program with a graphical interface for Linux and Windows that can handle very large datasets and is freely available under the GNU General Public License.

Conclusion: The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more distinct from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications other than gene expression analysis.

Figures

**Figure 1**
The key steps of the FLAME algorithm shown on a small simulated dataset ("Starting data"). **Step One:** expression data are used to calculate for each gene a density value corresponding to the average similarity to its nearest neighbors (in the picture, darkness of each spot is proportional to density); Cluster Supporting Objects (CSOs) are then identified as genes with local maximum density and assigned unique membership to themselves. The red and green colors define two CSOs, while the blue color indicates outliers. **Step Two:** for all the other genes, a fuzzy membership vector is approximated from the memberships of their nearest neighbors, until convergence; for each spot, red, green and blue colors are now mixed in accordance with the fuzzy membership of that gene to the two clusters or to the outlier group. **Step Three:** at the end of this process, genes can be assigned to one of the two clusters built around the CSOs or to the outlier group, based on their approximated memberships.
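The three steps in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the GEDAS implementation: the Euclidean distance, the density defined as the inverse mean distance to the `k` nearest neighbors, the quantile-based outlier rule, the inverse-distance neighbor weights, and every name and parameter (`flame_sketch`, `k`, `outlier_quantile`, `n_iter`, `tol`) are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def flame_sketch(X, k=5, outlier_quantile=0.1, n_iter=500, tol=1e-6):
    """FLAME-style clustering sketch; all names and defaults are assumptions."""
    n = len(X)
    # Pairwise Euclidean distances (assumed; the paper may use another measure).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # exclude self from neighbors
    knn = np.argsort(dist, axis=1)[:, :k]          # k nearest neighbors per object
    knn_dist = np.take_along_axis(dist, knn, axis=1)

    # Step one: density, Cluster Supporting Objects (CSOs) and outliers.
    density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)
    is_cso = density > density[knn].max(axis=1)    # local density maximum
    low = density < np.quantile(density, outlier_quantile)
    is_outlier = low & (density < density[knn].min(axis=1))
    cso_idx = np.flatnonzero(is_cso)

    # One membership column per CSO, plus a final column for the outlier group.
    n_groups = len(cso_idx) + 1
    memb = np.full((n, n_groups), 1.0 / n_groups)
    memb[cso_idx] = 0.0
    memb[cso_idx, np.arange(len(cso_idx))] = 1.0   # each CSO belongs to itself
    memb[is_outlier] = 0.0
    memb[is_outlier, -1] = 1.0
    fixed = is_cso | is_outlier

    # Neighbor weights: inverse distance, normalized per object (assumed form).
    w = 1.0 / (knn_dist + 1e-12)
    w /= w.sum(axis=1, keepdims=True)

    # Step two: approximate each membership vector from its neighbors' vectors.
    for _ in range(n_iter):
        new = np.einsum("ik,ikg->ig", w, memb[knn])
        new[fixed] = memb[fixed]                   # CSOs and outliers stay fixed
        converged = np.abs(new - memb).max() < tol
        memb = new
        if converged:
            break

    # Step three: assign each object to its highest-membership group.
    labels = memb.argmax(axis=1)
    return memb, labels, cso_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(50, 1, (30, 2))])
    memberships, labels, csos = flame_sketch(data)
    print(len(csos), "CSOs found; membership matrix shape", memberships.shape)
```

Because the number of membership columns follows from the number of local density maxima, the number of clusters is not supplied by the caller in this sketch; the label equal to the last column index marks the outlier group.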

**Figure 2**
Clustering validation and comparison by 2-Norm FOM. a, 2-Norm FOM on the reduced peripheral blood monocyte dataset. b, 2-Norm FOM on the reduced hypoxia response dataset. c, 2-Norm FOM on the reduced yeast cell cycle dataset.

**Figure 3**
Clustering validation and comparison by Partition Index. a, Partition Index on the reduced peripheral blood monocyte dataset. b, Partition Index on the hypoxia response dataset. c, Partition Index on the yeast cell cycle dataset. d, Partition Index on the mouse tissue dataset.

**Figure 4**
Clustering validation and comparison by Annotation Spreading Index. a, Spreading Index on the hypoxia response dataset. b, Spreading Index on the yeast cell cycle dataset. c, Spreading Index on the mouse tissue dataset.

**Figure 5**
Clustering validation and comparison by Correlation to Average Annotation Profile (CAVA). a, CAVA on the hypoxia response dataset. b, CAVA on the yeast cell cycle dataset. c, CAVA on the mouse tissue dataset.

Cited by

Gene expression profiling of HGF/Met activation in neonatal mouse heart.
Gatti S, Leo C, Gallo S, Sala V, Bucci E, Natale M, Cantarella D, Medico E, Crepaldi T. Transgenic Res. 2013 Jun;22(3):579-93. doi: 10.1007/s11248-012-9667-2. Epub 2012 Dec 6. PMID: 23224784

Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.
Celton M, Malpertuy A, Lelandais G, de Brevern AG. BMC Genomics. 2010 Jan 7;11:15. doi: 10.1186/1471-2164-11-15. PMID: 20056002

Clusterdv: a simple density-based clustering method that is robust, general and automatic.
Marques JC, Orger MB. Bioinformatics. 2019 Jun 1;35(12):2125-2132. doi: 10.1093/bioinformatics/bty932. PMID: 30407500

The thioxotriazole copper(II) complex A0 induces endoplasmic reticulum stress and paraptotic death in human cancer cells.
Tardito S, Isella C, Medico E, Marchiò L, Bevilacqua E, Hatzoglou M, Bussolati O, Franchi-Gazzola R. J Biol Chem. 2009 Sep 4;284(36):24306-19. doi: 10.1074/jbc.M109.026583. Epub 2009 Jun 26. PMID: 19561079

TNF-α promotes invasive growth through the MET signaling pathway.
Bigatto V, De Bacco F, Casanova E, Reato G, Lanzetti L, Isella C, Sarotto I, Comoglio PM, Boccaccio C. Mol Oncol. 2015 Feb;9(2):377-88. doi: 10.1016/j.molonc.2014.09.002. Epub 2014 Sep 26. PMID: 25306394

References

1. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863.
2. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nature Genetics. 1999;22:281–285. doi: 10.1038/10343.
3. Hughes JD, Estep PW, Tavazoie S, Church GM. Computational Identification of Cis-regulatory Elements Associated with Groups of Functionally Related Genes in Saccharomyces cerevisiae. J Mol Biol. 2000;296:1205–1214. doi: 10.1006/jmbi.2000.3519.
4. Handl J, Knowles J, Kell DB. Computational cluster validation in post-genomic data. Bioinformatics. 2005;21:3201–3212. doi: 10.1093/bioinformatics/bti517.
5. Di Gesu V, Giancarlo R, Lo Bosco G, Raimondi A, Scaturro D. GenClust: A Genetic Algorithm for Clustering Gene Expression Data. BMC Bioinformatics. 2005;6:289. doi: 10.1186/1471-2105-6-289.


