Subspace and projected clustering: experimental evaluation and analysis

Moise, Gabriela; Zimek, Arthur; Kröger, Peer; Kriegel, Hans-Peter; Sander, Jörg

doi:10.1007/s10115-009-0226-y

Subspace and projected clustering: experimental evaluation and analysis

Regular Paper
Published: 08 July 2009

Volume 21, pages 299–326, (2009)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Gabriela Moise¹,
Arthur Zimek²,
Peer Kröger²,
Hans-Peter Kriegel² &
…
Jörg Sander¹

523 Accesses
46 Citations
Explore all metrics

Abstract

Subspace and projected clustering have emerged as a possible solution to the challenges associated with clustering in high-dimensional data. Numerous subspace and projected clustering techniques have been proposed in the literature. A comprehensive evaluation of their advantages and disadvantages is urgently needed. In this paper, we evaluate systematically state-of-the-art subspace and projected clustering techniques under a wide range of experimental settings. We discuss the observed performance of the compared techniques, and we make recommendations regarding what type of techniques are suitable for what kind of problems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

References

Achtert E, Böhm C, Kriegel H-P, Kröger P, Müller-Gorman I, Zimek A (2007) Detection and visualization of subspace cluster hierarchies. In: Ramamohanarao K, Krishna P, Mohania M, Nantajeewarawat E (eds) Proceedings of the 12th international conference on database systems for advanced applications (DASFAA), Bangkok, Thailand, 2007, pp 152–163
Achtert E, Kriegel H-P, Zimek A (2008) ELKI: a software system for evaluation of subspace clustering algorithms. In: Proceedings of the 20th international conference on scientific and statistical database management (SSDBM), Hong Kong, China
Aggarwal C, Hinneburg A, Keim D (2001) On the surprising behavior of distance metrics in high dimensional space. In: Bussche J, Vianu V (eds) Proceedings of the eighth international conference on database theory (ICDT), London, UK, 2001, pp 420–434
Aggarwal C, Procopiuc C, Wolf J, Yu P, Park J (1999) Fast algorithms for projected clustering. In: Delis A, Faloutsos C, Ghandeharizadeh, S (eds) Proceedings of the ACM SIGMOD international conference on management of data, Philadelphia, PA, USA, 1999, pp 61–72
Aggarwal C, Yu P (2000) Finding generalized projected clusters in high dimensional spaces. In: Chen W, Naughton J, Bernstein P (eds) Proceedings of the ACM SIGMOD international conference on management of data, Dallas, TX, USA, 2000, pp 70–81
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high-dimensional data for data mining applications. In: Haas L, Tiwary A (eds) Proceedings of the ACM SIGMOD international conference on management of data, Seattle, WA, USA, 1998, pp 94–105
Agrawal R, Srikan R (1994) Fast algorithms for mining association rules in large databases. In: Bocca J, Jarke M, Zaniolo C (eds) Proceedings of the international conference on very large data bases VLDB, Santiago de Chile, Chile, 1994, pp 487–499
Ankerst M, Breunig M, Kriegel H-P, Sander J (1999) OPTICS: Ordering points to identify the clustering structure. In: Delis A, Faloutsos C, Ghandeharizadeh S (eds) Proceedings of the ACM international conference on management of data (SIGMOD), Philadelphia, PA, USA, 1999, pp 49–60
Assent I, Krieger R, Müller E, Seidl T (2007) DUSC: dimensionality unbiased subspace clustering. In: Proceedings of the seventh international conference on data mining (ICDM), Omaha, NE, USA, 2007, pp 409–414
Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful?. In: Beeri C, Buneman P (eds) Proceedings of the seventh international conference on database theory (ICDT), Jerusalem, Israel, 1999, pp 217–235
Böhm C, Kailing K, Kriegel H-P, Kröger P (2004) Density connected clustering with local subspace preferences. In: Proceedings of the fourth international conference on data mining (ICDM), Brighton, UK, 2004, pp 27–34
Cheng C, Fu A, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: Proceedings of the fifth ACM international conference on knowledge discovery and data mining (SIGKDD), San Diego, CA, USA, 1999, pp 84–93
Dempster A, Laird N, Rubin D (1977) Maximum likelihood for incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1): 1–38
MATH MathSciNet Google Scholar
Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Barbara D, Kamath C (eds) Proceedings of the third SIAM international conference on data mining (SDM), San Francisco, CA, USA, 2003
Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad U (eds) Proceedings of the second ACM international conference on knowledge discovery and data mining (KDD), Portland, OR, USA, 1996, pp 226–231
Gondek D, Hofmann T (2007) Non-redundant data clustering. Knowl Inf Syst 12(1): 1–116
Article Google Scholar
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Academic Press, London
Google Scholar
Hinneburg A, Aggarwal C, Keim D (2000) What is the nearest neighbor in high dimensional spaces? In: Abbadi A, Brodie M, Chakravarthy, Dayal U, Kamel N, Schlageter G, Whang K (eds) Proceedings of the 26th international conference on very large data bases (VLDB), Cairo, Egypt, 2000, pp 506–515
Hinneburg A, Keim D (2003) A general approach to clustering in large databases with noise. Knowl Inf Syst 5(4): 387–415
Article Google Scholar
Kailing K, Kriegel H-P, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. In: Berry M, Dayal U, Kamath C, Skilicorn D (eds) Proceedings of the fourth SIAM international conference on data mining (SDM), Orlando, FL, USA, 2004, pp 1–11
Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analyis. Wiley, New York
Google Scholar
Kriegel H-P, Kröger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Proceedings of the fifth international conference on data mining, Houston, TX, USA, 2005, pp 250–257
Kriegel H-P, Kröger P, Zimek A (2009) Clustering high dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1)
Li T (2008) Clustering based on matrix approximation: a unifying view. Knowl Inf Syst 17(1): 1–133
Article Google Scholar
Liu G, Li J, Sim K, Wong L (2007) Distance based subspace clustering with flexible dimension partitioning. In: Proceedings of the 23rd international conference on data engineering (ICDE), Istanbul, Turkey, 2007, pp 1250–1254
McQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Fifth Berkeley symposium on mathematics, statistics, and probabilistics, vol 1, 1967, pp 281–297
Moise G, Sander J, Ester M (2006) P3C: A robust projected clustering algorithm. In: Proceedings of the sixth international conference on data mining (ICDM), Hong Kong, China, 2006, pp 414–425
Moise G, Sander J, Ester M (2008) Robust projected clustering. Knowl Inf Syst 14(3): 273–298
Article MATH Google Scholar
Moise G, Sander J (2008) Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: Proceedings of the 14th ACM international conference on knowledge discovery and data mining (SIGKDD), Las Vegas, NV, USA, 2008, pp 533–541
Nagesh H, Goil S, Choudhary A (2001) Adaptive grids for clustering massive data sets. In: Proceedings of the first SIAM international conference on data mining (SDM), Chicago, IL, USA, 2001, pp 1–17
Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning databases 1998
Ng K, Fu A, Wong C (2005) Projective clustering by histograms. IEEE Trans Knowl Data Eng 17(3): 369–383
Article Google Scholar
Ng R, Han J(1994) Efficient and effective clustering methods for spatial data mining. In: Bocca J, Jarke M, Zaniolo C (eds) Proceedings of the international conference on very large data bases VLDB, Santiago de Chile, Chile, 1994, pp 487–499
Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. ACM SIGKDD Explor Newsl 6(1): 90–105
Article Google Scholar
Patrikainen A, Meila M (2006) Comparing subspace clusterings. IEEE Trans Knowl Data Eng 18(7): 902–916
Article Google Scholar
Procopiuc C, Jones M, Agarwal P, Murali T (2002) A Monte Carlo algorithm for fast projective clustering. In: Franklin M, Moon B, Ailamaki, A (eds) Proceedings of the ACM international conference on management of data (SIGMOD, Madison, WI, USA, 2002, pp 418–427
R project http://www.r-project.org/
UCI Machine Learning Repository http://archive.ics.uci.edu/ml/
Sequeira K, Zaki M (2004) SCHISM: a new approach for interesting subspace mining. In: Proceedings of the fourth international conference on data mining (ICDM), Brighton, UK, 2004, pp 186–193
Tang J, Chen J, Fu A, Cheung W (2007) Capabilities of outlier detection schemes in large datasets, framework and methodologies. Knowl Inf Syst 11(1): 45–84
Article Google Scholar
Yip K, Cheung D, Ng M (2004) HARP: a practical projected clustering algorithm. IEEE Trans Knowl Data Eng 16(11): 1387–1397
Article Google Scholar
Yip K, Cheung D, Ng M (2005) On discovery of extremely low-dimensional clusters using semi-supervised projected clustering. In: Proceedings of the 21st international conference on data engineering (ICDE), Tokyo, Japan, 2005, pp 329–340
Yip K, Qi P, Schultz M, Cheung D, Cheung K (2006) SemBiosphere: a semantic web approach to recommending microarray clustering services. In: Proceedings of the 11th pacific symposium on biocomputing (PSB), Maui, HI, USA, 2006
Yiu M, Mamoulis N (2005) Iterative projected clustering by subspace mining. IEEE Trans Knowl Data Eng 17(2): 176–189
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
Gabriela Moise & Jörg Sander
Ludwig-Maximilians-Universität München, Munich, Germany
Arthur Zimek, Peer Kröger & Hans-Peter Kriegel

Authors

Gabriela Moise
View author publications
You can also search for this author in PubMed Google Scholar
Arthur Zimek
View author publications
You can also search for this author in PubMed Google Scholar
Peer Kröger
View author publications
You can also search for this author in PubMed Google Scholar
Hans-Peter Kriegel
View author publications
You can also search for this author in PubMed Google Scholar
Jörg Sander
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gabriela Moise.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Moise, G., Zimek, A., Kröger, P. et al. Subspace and projected clustering: experimental evaluation and analysis. Knowl Inf Syst 21, 299–326 (2009). https://doi.org/10.1007/s10115-009-0226-y

Download citation

Received: 01 August 2008
Revised: 27 March 2009
Accepted: 17 May 2009
Published: 08 July 2009
Issue Date: December 2009
DOI: https://doi.org/10.1007/s10115-009-0226-y

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

Subspace and projected clustering: experimental evaluation and analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Efficient Density-Based Subspace Clustering in High Dimensions

Subspace multi-clustering: a review

Efficient Monte Carlo clustering in subspaces

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Subspace and projected clustering: experimental evaluation and analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Efficient Density-Based Subspace Clustering in High Dimensions

Subspace multi-clustering: a review

Efficient Monte Carlo clustering in subspaces

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now