Abstract
DNA microarray technologies allow researchers to examine, discover, and monitor thousands of genes in a single experiment. However, the sheer volume of data produced by microarray studies makes analysis challenging, mainly because of the very high dimensionality of the data. A particular class of clustering algorithms, which exploits information obtained from Principal Component Analysis, has been very successful in dealing with such data. In this paper, we investigate the application of recently proposed projection-based hierarchical clustering algorithms to gene expression microarray data. Besides identifying the clusters present in a data set, these algorithms also determine the number of clusters, and thus require no special knowledge about the data.
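As a rough illustration of what a projection-based split looks like, the sketch below performs a single divisive step in the spirit of principal direction divisive partitioning: the samples are centered, projected onto the first principal direction, and separated by the sign of the projection. It is not the algorithm evaluated in the paper; the function name `principal_direction_split`, the toy data, and the split-at-zero rule are all assumptions made for illustration, and the sketch does not attempt to determine the number of clusters.

```python
import numpy as np


def principal_direction_split(X):
    """One hypothetical divisive step: split the rows of X by the sign of
    their projection onto the first principal direction."""
    X = np.asarray(X, dtype=float)
    # Center the samples so the principal direction passes through the mean.
    centered = X - X.mean(axis=0)
    # The leading right singular vector of the centered data is the
    # first principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[0]
    # Assumed rule: samples with a non-positive projection go to one side.
    return projections <= 0.0, projections


# Toy stand-in for expression data: 20 samples, 200 "genes", two shifted groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=-3.0, scale=0.5, size=(10, 200))
group_b = rng.normal(loc=3.0, scale=0.5, size=(10, 200))
expression = np.vstack([group_a, group_b])

left_mask, projections = principal_direction_split(expression)
print("samples on one side:", np.flatnonzero(left_mask))
print("samples on the other side:", np.flatnonzero(~left_mask))
```

Applying such a step recursively to each resulting subset yields a hierarchy of clusters; which subset to split next and when to stop are left open here.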
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tasoulis, S.K., Plagianakos, V.P., Tasoulis, D.K. (2010). Projection Based Clustering of Gene Expression Data. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_17
DOI: https://doi.org/10.1007/978-3-642-14571-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science (R0)