Visualizing high density clusters in multidimensional data using optimized star coordinates

  • Original Paper
  • Computational Statistics

Abstract

Multidimensional multivariate data have been studied in many areas for quite some time. Commonly, the goal of the analysis is not to inspect individual records but to understand how the records are distributed at large and to find clusters of records that exhibit correlations between dimensions or variables. We propose a visualization method that operates on density rather than on individual records. So as not to restrict the search for clusters, we compute density in the given multidimensional space. Clusters are formed by regions of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. For visualization, we propose a method that projects the multidimensional clusters to a 2D or 3D layout. The projection uses an optimized star coordinates layout. The optimization procedure minimizes the overlap of projected clusters and preserves the cluster shapes, compactness, and distribution as far as possible. The star coordinates visualization allows for interactive analysis of the distribution of clusters and for comprehension of the relations between clusters and the original dimensions. Clusters are visualized using nested sequences of density level sets, leading to a quantitative understanding of information content, patterns, and relationships.
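To make the projection step concrete, the following is a minimal sketch of a plain (unoptimized) 2D star coordinates layout, assuming the standard formulation in which each dimension is assigned a 2D axis vector and a record is mapped to the sum of its normalized coordinates times those vectors. The function names `star_axes` and `star_project`, the min-max normalization, and the synthetic clusters are illustrative assumptions and are not taken from the paper; the paper's optimization of the axis layout is not reproduced here.

```python
import numpy as np

def star_axes(m):
    """Return m unit axis vectors evenly spaced on the circle (shape m x 2)."""
    angles = 2.0 * np.pi * np.arange(m) / m
    return np.column_stack((np.cos(angles), np.sin(angles)))

def star_project(points, axes):
    """Project n x m points to 2D.

    Each column is min-max normalized to [0, 1]; a point is then mapped to
    the sum of its normalized coordinates times the matching axis vector.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0.0] = 1.0  # guard against constant columns
    normalized = (points - lo) / span
    return normalized @ axes

# Two synthetic 4D clusters (hypothetical data): one near the origin,
# one shifted along all four dimensions.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.1, size=(50, 4))
cluster_b = rng.normal(1.0, 0.1, size=(50, 4))
data = np.vstack((cluster_a, cluster_b))

projected = star_project(data, star_axes(4))
center_a = projected[:50].mean(axis=0)
center_b = projected[50:].mean(axis=0)
# Distance between the projected cluster centers in the 2D layout.
center_gap = np.linalg.norm(center_a - center_b)
```

With evenly spaced axes the axis vectors sum to zero, so two clusters that differ by the same offset in every dimension land close together in the 2D layout even though they are well separated in 4D. This kind of overlap is what an optimized axis placement, as described in the abstract, aims to reduce.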



Corresponding author

Correspondence to Tran Van Long.

Cite this article

Van Long, T., Linsen, L. Visualizing high density clusters in multidimensional data using optimized star coordinates. Comput Stat 26, 655–678 (2011). https://doi.org/10.1007/s00180-011-0271-3
