FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data
BMC Bioinformatics. 2007 Jan 4:8:3.
doi: 10.1186/1471-2105-8-3.

Limin Fu et al. BMC Bioinformatics. 2007.

Abstract

Background: Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this end, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process.

Results: The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features, named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable on those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software package with a graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under the GNU General Public License.
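Element (i) above can be illustrated with a minimal Python sketch. It is not the GEDAS implementation. It assumes Euclidean distance as the dissimilarity, takes density as the inverse of the mean distance to the k nearest neighbors, and uses an illustrative quantile rule for outliers; the function name find_csos and the parameters k and outlier_quantile are hypothetical.

```python
import numpy as np

def find_csos(X, k=3, outlier_quantile=0.05):
    """Sketch of neighborhood definition and CSO detection (assumed details).

    X is an (n_objects, n_features) array. Returns the per-object density,
    the k-nearest-neighbor index table, the indices of Cluster Supporting
    Objects, and the indices of outliers.
    """
    # Pairwise Euclidean distances (assumption: any similarity could be used).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # an object is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    knn_dist = np.take_along_axis(dist, neighbors, axis=1)

    # Density: high when the k nearest neighbors are close on average.
    density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)

    # CSO: density is a strict local maximum within the neighborhood.
    is_cso = density > density[neighbors].max(axis=1)

    # Outlier (assumed rule): less dense than every neighbor and also
    # below a low quantile of all densities.
    threshold = np.quantile(density, outlier_quantile)
    is_outlier = (density < density[neighbors].min(axis=1)) & (density <= threshold)

    return density, neighbors, np.flatnonzero(is_cso), np.flatnonzero(is_outlier)
```

On a toy dataset with two compact groups and one isolated point, the center of each group would be expected to come out as a CSO and the isolated point as an outlier. The choice of k controls how local the density estimate is, so it also influences how many CSOs, and therefore clusters, are found.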

Conclusion: The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more distinct from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications other than gene expression analysis.

Figures

Figure 1
The key steps of the FLAME algorithm shown on a small simulated dataset ("Starting data"). Step One: expression data are used to calculate for each gene a density value corresponding to the average similarity to its nearest neighbors (in the picture, darkness of each spot is proportional to density); Cluster Supporting Objects (CSOs) are then identified as genes with local maximum density and assigned unique membership to themselves. The red and green colors define two CSOs, while the blue color indicates outliers. Step Two: for all the other genes, a fuzzy membership vector is approximated from the memberships of their nearest neighbors, until convergence; for each spot, red, green and blue colors are now mixed in accordance with the fuzzy membership of that gene to the two clusters or to the outlier group. Step Three: at the end of this process, genes can be assigned to one of the two clusters built around the CSOs or to the outlier group, based on their approximated memberships.
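Steps Two and Three of the caption can be sketched in Python as follows. This is a simplified illustration, not the published implementation: it assumes the neighbor lists, CSOs and outliers are already known, weights all neighbors equally (a similarity-weighted average would be a natural alternative), and uses hypothetical names (propagate_memberships, assign_clusters, max_iter, tol).

```python
import numpy as np

def propagate_memberships(neighbors, cso_idx, outlier_idx, max_iter=500, tol=1e-6):
    """Approximate fuzzy memberships from neighbors until convergence.

    neighbors[i] lists the neighbor indices of object i. Columns of the
    returned matrix are one per CSO, plus a final column for the outlier group.
    """
    n = len(neighbors)
    n_groups = len(cso_idx) + 1
    U = np.full((n, n_groups), 1.0 / n_groups)  # start undecided
    fixed = np.zeros(n, dtype=bool)

    # CSOs get full membership in their own cluster and never change.
    for c, i in enumerate(cso_idx):
        U[i] = 0.0
        U[i, c] = 1.0
        fixed[i] = True
    # Outliers get full membership in the outlier group.
    for i in outlier_idx:
        U[i] = 0.0
        U[i, -1] = 1.0
        fixed[i] = True

    for _ in range(max_iter):
        new_U = U.copy()
        for i in range(n):
            if not fixed[i]:
                # Assumed equal weights: plain average of neighbor memberships.
                new_U[i] = U[neighbors[i]].mean(axis=0)
        change = np.abs(new_U - U).max()
        U = new_U
        if change < tol:
            break
    return U

def assign_clusters(U):
    """Step Three: hard assignment to the group with the largest membership."""
    return U.argmax(axis=1)
```

For example, on a chain of six objects whose two ends are CSOs, the memberships of the interior objects should settle close to a linear blend between the two clusters, and each object is then assigned to the nearer end. Because every update is an average of rows that each sum to one, the membership vectors remain normalized throughout.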
Figure 2
Clustering validation and comparison by 2-Norm FOM. a, 2-Norm FOM on the reduced peripheral blood monocyte dataset. b, 2-Norm FOM on the reduced hypoxia response dataset. c, 2-Norm FOM on the reduced yeast cell cycle dataset.
Figure 3
Clustering validation and comparison by Partition Index. a, Partition Index on the reduced peripheral blood monocyte dataset. b, Partition Index on the hypoxia response dataset. c, Partition Index on the yeast cell cycle dataset. d, Partition Index on the mouse tissue dataset.
Figure 4
Clustering validation and comparison by Annotation Spreading Index. a, Spreading Index on the hypoxia response dataset. b, Spreading Index on the yeast cell cycle dataset. c, Spreading Index on the mouse tissue dataset.
Figure 5
Clustering validation and comparison by Correlation to Average Annotation Profile (CAVA). a, CAVA on the hypoxia response dataset. b, CAVA on the yeast cell cycle dataset. c, CAVA on the mouse tissue dataset.
