Finding High-Order Correlations in High-Dimensional Biological Data

Chapter in Link Mining: Models, Algorithms, and Applications

Abstract

In many emerging real-life problems, the number of dimensions in a data set can range from thousands to millions. Such a large number of features poses a great challenge to existing high-dimensional data analysis methods. One particular issue is that latent patterns may exist only in subspaces of the full-dimensional space. In this chapter, we discuss the problem of finding correlations hidden in feature subspaces, covering both the linear and the nonlinear case, and we present efficient algorithms for finding such correlated feature subsets.

Notes

  1. CARE stands for finding loCAl lineaR corrElations.

  2. REDUS stands for REDUcible Subspaces.

  3. In this chapter, we assume that the eigenvalues are always arranged in increasing order. Their corresponding eigenvectors are \(\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}\).

  4. This theorem also applies to Hermitian matrices [35]. Here we focus on the covariance matrix, which is symmetric and positive semidefinite.
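
The conventions in notes 3 and 4 can be illustrated with a minimal sketch, using only the Python standard library. It builds a sample covariance matrix and returns its eigenvalues in increasing order. The cyclic Jacobi rotation routine, the function names `covariance_matrix` and `symmetric_eigenvalues`, and the synthetic data are assumptions made for illustration; they are not the chapter's algorithms.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix (features are columns) as a list of lists."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-22):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations,
    returned in increasing order (the convention of note 3)."""
    a = [row[:] for row in matrix]  # work on a copy
    d = len(a)
    for _ in range(max_sweeps):
        # Stop once the squared off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q]; |theta| <= pi/4.
                num, diff = 2.0 * a[p][q], a[q][q] - a[p][p]
                if diff < 0.0:
                    num, diff = -num, -diff
                theta = 0.5 * math.atan2(num, diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(d):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(d):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(d))


# Hypothetical data: the third feature is an exact linear combination
# of the first two, so the covariance matrix is rank-deficient.
rng = random.Random(0)
rows = []
for _ in range(50):
    x, y = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append((x, y, x + 2 * y))

cov = covariance_matrix(rows)
eigenvalues = symmetric_eigenvalues(cov)
print(eigenvalues)  # increasing order; all nonnegative up to rounding error
```

Because the third synthetic feature depends linearly on the other two, the smallest eigenvalue is zero up to floating-point error, and none of the eigenvalues is meaningfully negative, as expected for a positive semidefinite matrix.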

References

  1. M. Eisen, P. Spellman, P. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences USA, 95:14863–14868, 1998.

  2. V. Iyer et al. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83–87, 1999.

  3. L. Parsons, E. Haque, and H. Liu. Subspace clustering for high dimensional data: a review. In KDD Explorations, 6(1):90–105, 2004.

  4. A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97:245–271, 1997.

  5. H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer, Boston, MA, 1998.

  6. L. Yu and H. Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. In Proceedings of International Conference on Machine Learning, 856–863, 2003.

  7. Z. Zhao and H. Liu. Searching for interacting features. In The 20th International Joint Conference on AI, 1156–1161, 2007.

  8. M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003.

  9. T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 1996.

  10. I. Borg and P. Groenen. Modern multidimensional scaling. Springer, New York, 1997.

  11. I. Jolliffe. Principal Component Analysis. Springer, New York, 1986.

  12. S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290 (5500):2323–2326, 2000.

  13. J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290 (5500):2319–2323, 2000.

  14. C. Aggarwal and P. Yu. Finding generalized projected clusters in high dimensional spaces. In SIGMOD, 2000.

  15. E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, and A. Zimek. Deriving quantitative models for correlation clusters. In KDD, 2006.

  16. H. Wang, W. Wang, J. Yang, and Y. Yu. Clustering by pattern similarity in large data sets. In SIGMOD, 2002.

  17. M. Ashburner et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25:25–29, 2000.

  18. X. Zhang, F. Pan, and W. Wang. CARE: Finding local linear correlations in high dimensional data. In ICDE, 130–139, 2008.

  19. K. Fukunaga. Intrinsic dimensionality extraction. In P. R. Krishnaiah and L. N. Kanal, editors, Classification, Pattern Recognition and Reduction of Dimensionality, volume 2 of Handbook of Statistics, pages 347–360. North Holland, Amsterdam, 1982.

  20. F. Camastra and A. Vinciarelli. Estimating intrinsic dimension of data with a fractal-based approach. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(10):1404–1407, 2002.

  21. K. Fukunaga and D. R. Olsen. An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, 20(2):165–171, 1976.

  22. E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems, 2005.

  23. R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD, 94–105, 1998.

  24. C. Aggarwal, J. Wolf, P. Yu, C. Procopiuc, and J. Park. Fast algorithms for projected clustering. In SIGMOD, 61–72, 1999.

  25. C. Chen, A. Fu, and Y. Zhang. Entropy-based subspace clustering for mining numerical data. In SIGKDD, 84–93, 1999.

  26. D. Barbara and P. Chen. Using the fractal dimension to cluster datasets. In KDD, 260–264, 2000.

  27. A. Gionis, A. Hinneburg, S. Papadimitriou, and P. Tsaparas. Dimension induced clustering. In KDD, 2005.

  28. S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. In ICDE, 2003.

  29. B. U. Pagel, F. Korn, and C. Faloutsos. Deflating the dimensionality curse using multiple fractal dimensions. In ICDE, 589, 2000.

  30. A. Belussi and C. Faloutsos. Self-spacial join selectivity estimation using fractal concepts. ACM Transactions on Information Systems, 16(2):161–201, 1998.

  31. C. Faloutsos and I. Kamel. Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension. In PODS, 1994.

  32. G. Golub and C. Van Loan. Matrix computations. Johns Hopkins University Press, Baltimore, MD, 1996.

  33. S. N. Rasband. Chaotic Dynamics of Nonlinear Systems. Wiley, 1990.

  34. M. Schroeder. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman, New York, 1991.

  35. R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge UK, 1985.

  36. D. C. Lay. Linear Algebra and Its Applications. Addison Wesley, 2005.

Corresponding author

Correspondence to Wei Wang.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Zhang, X., Pan, F., Wang, W. (2010). Finding High-Order Correlations in High-Dimensional Biological Data. In: Yu, P., Han, J., Faloutsos, C. (eds) Link Mining: Models, Algorithms, and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6515-8_19
