Abstract
In many emerging real-life problems, the number of dimensions in the data sets can be from thousands to millions. The large number of features poses great challenge to existing high-dimensional data analysis methods. One particular issue is that the latent patterns may only exist in subspaces of the full-dimensional space. In this chapter, we discuss the problem of finding correlations hidden in feature subspaces. Both linear and nonlinear cases will be discussed. We present efficient algorithms for finding such correlated feature subsets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
CARE stands for finding loCAl lineaR corrElations.
- 2.
REDUS stands for REDUcible Subspaces.
- 3.
In this chapter, we assume that the eigenvalues are always arranged in increasing order. Their corresponding eigenvectors are \(\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}\).
- 4.
This theorem also applies to Hermitian matrix [35]. Here we focus on the covariance matrix, which is semi-positive definite and symmetric.
References
M. Eisen, P. Spellman, P. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns, Proceedings of National Acadamy of Science USA, 95:14863–14868, 1998.
V. Iyer and et. al. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83–87, 1999.
L. Parsons, E. Haque, and H. Liu. Subspae clustering for high dimensional data: a review, In KDD Explorations, 6(1): 90–105, 2004.
A. Blum and P. Langley, “Selection of relevant features and examples in machine learning,” Artificial Intelligence, 97: 245–271, 1997.
H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer, Boston, MA, 1998.
L. Yu and H. Liu, Feature selection for high-dimensional data: a fast correlation-based filter solution. In Proceedings of International Conference on Machine Learning, 856–863, 2003.
Z. Zhao and H. Liu. Searching for interacting features, In The 20th International Joint Conference on AI, 1156–1161, 2007.
M. Belkin and P. Niyogi. “laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003.
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 1996.
I. Borg and P. Groenen. Modern multidimensional scaling. Springer, New York, 1997.
I. Jolliffe. Principal Component Analysis. Springer, New York, 1986.
S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290 (5500):2323–2326, 2000.
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290 (5500):2319–2323, 2000.
C. Aggarwal and P. Yu. Finding generalized projected clusters in high dimensional spaces. In SIGMOD, 2000.
E. Achtert, C. Bohm, H.-P. Kriegel, P. Kroger, and A. Zimek. Deriving quantitative models for correlation clusters. In KDD, 2006.
H. Wang, W. Wang, J. Yang, and Y. Yu. Clustering by pattern similarity in large data sets. In SIGMOD, 2002.
M. Ashburner et al. Gene ontology: tool for the unification of biology, The gene ontology consortium, Nature Genetics, 25:25–29, 2000.
X. Zhang, F. Pan, and W. Wang. Care: Finding local linear correlations in high dimensional data. In ICDE, 130–139, 2008.
K. Fukunaga. Intrinsic dimensionality extraction. Classification, Pattern recongnition and Reduction of Dimensionality, Volume 2 of Handbook of Statistics, pages 347–360, P. R. Krishnaiah and L. N. Kanal editors, Amsterdam, North Holland, 1982.
F. Camastra and A. Vinciarelli. Estimating intrinsic dimension of data with a fractal-based approach. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(10):1404–1407, 2002.
K. Fukunaga and D. R. Olsen. An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, 20(2):165–171, 1976.
E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems, 2005.
R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD, 94–105, 1998.
C. Aggarwal, J. Wolf, P. Yu, C. Procopiuc, and J. Park. Fast algorithms for projected clustering. In SIGMOD, 61–72, 1999.
C. Chen, A. Fu, and Y. Zhang. Entropy-based subspace clustering for mining numerical data. In SIGKDD, 84–93, 1999.
D. Barbara and P. Chen. Using the fractal dimension to cluster datasets. In KDD, 260–264, 2000.
A. Gionis, A. Hinneburg, S. Papadimitriou, and P. Tsaparas. Dimension induced clustering. In KDD, 2005.
S. Papadimitriou, H. Kitawaga, P. B. Gibbons, and C. Faloutsos. Loci: Fast outlier detection using the local correlation integral. In ICDE, 2003.
B. U. Pagel, F. Korn, and C. Faloutsos. Deflating the dimensionality curse using multiple fractal dimensions. In ICDE, 589, 2000.
A. Belussi and C. Faloutsos. Self-spacial join selectivity estimation using fractal concepts. ACM Transactions on Information Systems, 16(2):161–201, 1998.
C. Faloutsos and I. Kamel. Beyond uniformity and independence: analysis of r-trees using the concept of fractal dimension. In PODS, 1994.
G. Golub and A. Loan. Matrix computations. Johns Hopkins University Press, Baltimore, MD, 1996.
S. N. Rasband. Chaotic Dynamics of Nonlinear Systems. Wiley, 1990.
M. Schroeder. Fractals, Chaos, Power Lawers: Minutes from an Infinite Paradise. W. H. Freeman, New York, 1991.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge UK, 1985.
D. C. Lay. Linear Algebra and Its Applications. Addison Wesley, 2005.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Zhang, X., Pan, F., Wang, W. (2010). Finding High-Order Correlations in High-Dimensional Biological Data. In: Yu, P., Han, J., Faloutsos, C. (eds) Link Mining: Models, Algorithms, and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6515-8_19
Download citation
DOI: https://doi.org/10.1007/978-1-4419-6515-8_19
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6514-1
Online ISBN: 978-1-4419-6515-8
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)