Abstract
Density-based clustering defines clusters as dense areas in feature space separated by sparsely populated areas. It is known to successfully identify clusters of arbitrary shapes even in noisy data. Today, we face increasingly high-dimensional data, i.e. data objects described by many attributes. Effects attributed to the “curse of dimensionality” mean that in high-dimensional spaces, traditional clustering methods fail to identify meaningful clusters. In little more than a decade, the research field of subspace clustering has established methods for identifying clusters in subsets of the attributes in such high-dimensional spaces. As the number of possible subsets is exponential in the number of attributes, efficient algorithms are crucial. This short survey discusses challenges in this area, and presents models and algorithms for efficient and scalable density-based subspace clustering.
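To make the size of this search space concrete, the following minimal sketch counts and enumerates the non-empty attribute subsets (subspaces) of a d-dimensional data set. It is an illustration only, not one of the algorithms surveyed in the chapter; the function names `count_subspaces` and `enumerate_subspaces` and the attribute labels are hypothetical.

```python
from itertools import combinations


def count_subspaces(d: int) -> int:
    """Number of non-empty subsets of d attributes: 2^d - 1."""
    return 2 ** d - 1


def enumerate_subspaces(attributes):
    """Yield every non-empty attribute subset, lowest dimensionality first."""
    for k in range(1, len(attributes) + 1):
        yield from combinations(attributes, k)


# Hypothetical attribute names, used only for illustration.
attributes = ["a1", "a2", "a3"]
subspaces = list(enumerate_subspaces(attributes))

print(len(subspaces), count_subspaces(len(attributes)))
for d in (10, 20, 50):
    print(d, count_subspaces(d))
```

At 20 attributes there are already more than a million candidate subspaces, which is why exhaustive enumeration of this kind does not scale and pruning or other efficiency techniques are needed.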
Acknowledgments
This work has been supported in part by the Danish Council for Strategic Research, grant 10-092316, and by the Danish Council for Independent Research - Technology and Production Sciences (FTP), grant 10-081972.
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Assent, I. (2015). Efficient Density-Based Subspace Clustering in High Dimensions. In: Masulli, F., Petrosino, A., Rovetta, S. (eds) Clustering High-Dimensional Data. CHDD 2012. Lecture Notes in Computer Science, vol 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_3
DOI: https://doi.org/10.1007/978-3-662-48577-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48576-7
Online ISBN: 978-3-662-48577-4
eBook Packages: Computer Science, Computer Science (R0)