Abstract
DNA microarray gene expression studies have recently been used extensively to mine, in a systematic way, biological knowledge hidden in large volumes of gene expression data. In particular, the problem of finding groups of co-expressed genes or samples has been widely investigated, because such groups help characterize unknown gene functions and support more sophisticated tasks, such as modeling biological pathways. Nevertheless, identifying good clusters remains difficult in practice, since many clustering methods require the user to choose the number of target clusters arbitrarily. In this paper we propose a novel approach that systematically identifies good candidates for the number of clusters, so that arbitrariness in cluster generation is minimized. Experimental results on both a synthetic dataset and a real gene expression dataset show the applicability and usefulness of this approach in microarray data mining.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shin, M. (2006). A Novel Approach for Effective Learning of Cluster Structures with Biological Data Applications. In: Dalkilic, M.M., Kim, S., Yang, J. (eds) Data Mining and Bioinformatics. VDMB 2006. Lecture Notes in Computer Science, vol 4316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11960669_2
DOI: https://doi.org/10.1007/11960669_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68970-6
Online ISBN: 978-3-540-68971-3
eBook Packages: Computer Science (R0)