Data depth based clustering analysis (Conference) | OSTI.GOV
U.S. Department of Energy
Office of Scientific and Technical Information

Data depth based clustering analysis

Conference
 [1];  [1];  [1];  [1]
  1. Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States)

This paper proposes a new algorithm, based on data depth, for identifying patterns within data. Such clustering analysis has enormous potential to uncover previously unknown insights in existing data sets. Many clustering algorithms already exist for this purpose. However, most are not affine invariant, so they must be run with different parameters after a data set is rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they lack a global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), detects coherent clusters after a data set is affine transformed without changing any parameter. It is also robust to noise, because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as its parameter is varied. An experimental comparison with leading state-of-the-art alternatives shows that the proposed algorithm outperforms DBSCAN and HDBSCAN in affine invariance, and matches or exceeds the noise robustness of DBSCAN and HDBSCAN. Its robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
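The abstract rests on two properties of data depth: depth values do not change when the data are rotated, scaled, or translated (whereas fixed-radius Euclidean neighborhoods do), and depth assigns low values to outlying points. The sketch below illustrates both with Mahalanobis depth, one standard depth function chosen here only for brevity. The abstract does not say which depth function DBCA uses, and this is not the DBCA algorithm. The helper names (`depths`, `eps_neighbor_counts`, `affine`), the synthetic Gaussian data, the planted outlier, and `eps = 0.5` are all hypothetical.

```python
import math
import random


def mean_cov_2d(points):
    """Sample mean and covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def depths(points):
    """Mahalanobis depth D(p) = 1 / (1 + (p - mu)^T S^-1 (p - mu)) for every point.

    Values near 1 mark central points; values near 0 mark outlying ones.
    """
    (mx, my), (sxx, sxy, syy) = mean_cov_2d(points)
    det = sxx * syy - sxy * sxy
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form via the closed-form inverse of a 2x2 covariance matrix.
        q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        out.append(1.0 / (1.0 + q))
    return out


def eps_neighbor_counts(points, eps):
    """Count other points within Euclidean distance eps (a DBSCAN-style query)."""
    return [
        sum(1 for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xj - xi, yj - yi) <= eps)
        for i, (xi, yi) in enumerate(points)
    ]


def affine(points, a, b, c, d, tx, ty):
    """Apply p -> A p + t with A = [[a, b], [c, d]] and t = (tx, ty)."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
data.append((6.0, 6.0))  # hypothetical planted outlier at index 200

# Rotate by 30 degrees, stretch the rotated x-axis by 5, then translate by (10, -3).
t = math.radians(30)
moved = affine(data, 5 * math.cos(t), -5 * math.sin(t),
               math.sin(t), math.cos(t), 10.0, -3.0)

depth_before, depth_after = depths(data), depths(moved)
count_before = eps_neighbor_counts(data, eps=0.5)
count_after = eps_neighbor_counts(moved, eps=0.5)

# Largest change in any depth value (floating-point round-off only).
print(max(abs(u - v) for u, v in zip(depth_before, depth_after)))
# Index of the shallowest point, before and after the transform.
print(depth_before.index(min(depth_before)), depth_after.index(min(depth_after)))
# Total eps-neighbor counts before and after the stretch.
print(sum(count_before), sum(count_after))
```

Because an invertible affine map x -> Ax + t carries the sample mean to A·mu + t and the covariance to A·S·A^T, the quadratic form (x - mu)^T S^-1 (x - mu) is unchanged, so the depth values agree up to round-off. The stretch never shortens a distance, so the fixed-radius neighbor counts can only fall. This is the kind of change that would force a Euclidean method to retune a parameter like DBSCAN's eps.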

Research Organization:
North Carolina State University, Raleigh, NC (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA), Office of Nonproliferation and Verification Research and Development (NA-22)
DOE Contract Number:
NA0002576
OSTI ID:
1438413
Resource Relation:
Conference: 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Francisco, CA (United States), 31 Oct - 3 Nov 2016
Country of Publication:
United States
Language:
English
