Visualizing dimensionality reduction of systems biology data

Data Mining and Knowledge Discovery

Abstract

One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after such a method has been applied, the data are projected and visualized in the new coordinate system, using scatter plots or profile plots. These methods provide good results if the data have certain properties that become visible in the new coordinate system but were hard to detect in the original coordinate system. Often, however, a single method does not suffice to capture all important signals, so several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data in the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression of cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21.
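
To make the projection step concrete, the following is a minimal sketch of principal component analysis on an expression matrix, computed via the singular value decomposition. It is not the SpRay implementation: the function name `pca_project`, the synthetic data, and the group effect added to 20 genes are assumptions made purely for illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X (samples x genes) onto the top principal components."""
    Xc = X - X.mean(axis=0)                            # center each gene (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # Xc = U @ diag(s) @ Vt
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates in the new system
    loadings = Vt[:n_components].T                     # gene weights for each component
    explained = (s ** 2) / np.sum(s ** 2)              # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for expression data (assumption): 12 samples, 200 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 200))
X[6:, :20] += 3.0   # hypothetical group effect on the first 20 genes

scores, loadings, explained = pca_project(X)
print(scores.shape, loadings.shape)
print(explained)
```

The columns of `scores` are what a scatter plot in the new coordinate system would display, while `loadings` indicate which genes contribute to each component and `explained` gives the share of variance each component captures.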

Author information

Correspondence to Kay Nieselt.

Additional information

Responsible editor: Barbara Hammer, Daniel Keim, Guy Lebanon, Neil Lawrence.

About this article

Cite this article

Lehrmann, A., Huber, M., Polatkan, A.C. et al. Visualizing dimensionality reduction of systems biology data. Data Min Knowl Disc 27, 146–165 (2013). https://doi.org/10.1007/s10618-012-0268-8

  • DOI: https://doi.org/10.1007/s10618-012-0268-8
