Abstract
One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after such a method has been applied, the data are projected and visualized in the new coordinate system using scatter plots or profile plots. These methods provide good results if the data have properties that become visible in the new coordinate system but were hard to detect in the original one. Often, however, applying a single method does not suffice to capture all important signals, so several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. The framework includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data from the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression in cells with normal karyotype and in cells with trisomies of human chromosomes 13 and 21.
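The projection step described in the abstract can be illustrated with a minimal sketch of principal component analysis computed by singular value decomposition. This is not the SpRay implementation: the function name `pca_project`, the toy data, and the use of the explained-variance ratio as an interpretation aid are assumptions made for illustration only.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X (samples x variables) onto the leading
    principal components. Hypothetical helper, not part of SpRay."""
    Xc = X - X.mean(axis=0)                      # center every variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # coordinates in the new system
    explained = s**2 / np.sum(s**2)              # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical toy data: 20 samples, 50 variables, one strong latent signal plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 1))
X = 3.0 * latent @ rng.normal(size=(1, 50)) + 0.1 * rng.normal(size=(20, 50))

scores, components, explained = pca_project(X)
print(scores.shape, components.shape)
```

The `scores` array holds the coordinates that would be shown in a scatter plot of the first two components, and `explained` gives one simple measure of how much of the signal each component captures.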
Additional information
Responsible editor: Barbara Hammer, Daniel Keim, Guy Lebanon, Neil Lawrence.
Lehrmann, A., Huber, M., Polatkan, A.C. et al. Visualizing dimensionality reduction of systems biology data. Data Min Knowl Disc 27, 146–165 (2013). https://doi.org/10.1007/s10618-012-0268-8
DOI: https://doi.org/10.1007/s10618-012-0268-8
Keywords
- Dimension reduction
- Principal component analysis
- Independent component analysis
- Local linear embedding
- Systems biology