Abstract
Multidimensional multivariate data have been studied in many areas for quite some time. Commonly, the goal of the analysis is not to inspect individual records but to understand the distribution of the records at large and to find clusters of records that exhibit correlations between dimensions or variables. We propose a visualization method that operates on density rather than on individual records. So as not to restrict the search for clusters, we compute density in the given multidimensional space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. For visualization purposes, we propose a method to project the multidimensional clusters to a 2D or 3D layout. The projection method uses an optimized star coordinates layout. The optimization procedure minimizes the overlap of projected clusters and preserves the cluster shapes, compactness, and distribution as far as possible. The star coordinates visualization allows for an interactive analysis of the distribution of clusters and for comprehension of the relations between clusters and the original dimensions. Clusters are visualized using nested sequences of density level sets, leading to a quantitative understanding of information content, patterns, and relationships.
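To make the projection step concrete, the following is a minimal sketch of the standard (non-optimized) 2D star coordinates mapping, in which each dimension is assigned an axis vector and a record is placed at the sum of its coordinates times those vectors. It assumes evenly spaced unit axes and values already scaled to [0, 1]. The function names `star_axes` and `project` are illustrative and do not come from the article, and the optimization of the axis layout described in the abstract is not reproduced here.

```python
import math


def star_axes(d):
    """Return d unit axis vectors evenly spaced on the circle.

    Assumption: this is the default star coordinates layout, not the
    optimized layout proposed in the article.
    """
    return [
        (math.cos(2 * math.pi * i / d), math.sin(2 * math.pi * i / d))
        for i in range(d)
    ]


def project(point, axes):
    """Map a d-dimensional point to 2D.

    The image is the sum of the axis vectors, each weighted by the
    point's coordinate in the corresponding dimension.
    """
    x = sum(value * ax for value, (ax, _) in zip(point, axes))
    y = sum(value * ay for value, (_, ay) in zip(point, axes))
    return (x, y)


if __name__ == "__main__":
    axes = star_axes(4)
    # A point lying on the first dimension lands on the first axis.
    print(project((1.0, 0.0, 0.0, 0.0), axes))
    # With evenly spaced axes, (1, 1, 1, 1) and (0, 0, 0, 0) both land
    # (up to rounding) on the origin, so distinct points can coincide.
    print(project((1.0, 1.0, 1.0, 1.0), axes))
    print(project((0.0, 0.0, 0.0, 0.0), axes))
```

The last two calls show why a fixed layout can make distinct regions of the data overlap in the projection, which is the kind of ambiguity an optimized axis placement is meant to reduce.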
Cite this article
Van Long, T., Linsen, L. Visualizing high density clusters in multidimensional data using optimized star coordinates. Comput Stat 26, 655–678 (2011). https://doi.org/10.1007/s00180-011-0271-3
DOI: https://doi.org/10.1007/s00180-011-0271-3