Abstract
Hierarchical clustering with data field can find clusters of various shapes and filter noise from a data set without input parameters. However, its clustering process is complicated, and it cannot effectively handle complex, high-dimensional data. This paper proposes a novel clustering algorithm based on differencing the potential (DP) of the data field. The potential difference designates the nearest object with high potential as the aggregation direction, and the distance between data objects is used to divide the global data set into multiple local clusters. At the same time, noise is identified effectively according to the potential of the data field. Experimental results on eight popular data sets and a facial image data set indicate that the proposed method outperforms existing clustering algorithms on high-dimensional data sets and on data distributed in complex shapes, as well as in noise identification.
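The idea summarized in the abstract can be illustrated with a rough sketch. It is not the authors' algorithm: the abstract gives no formulas, so the Gaussian-kernel potential with unit mass per object, the reading of "high potential" as "higher than the object's own potential", and the parameters `sigma`, `max_link`, and `noise_potential` are all assumptions introduced here for illustration.

```python
import math


def potentials(points, sigma):
    """Potential of each point under an assumed Gaussian kernel, unit mass per point."""
    return [
        sum(math.exp(-(math.dist(p, q) / sigma) ** 2) for q in points)
        for p in points
    ]


def cluster_by_potential_difference(points, sigma, max_link, noise_potential):
    """Link each point to its nearest higher-potential neighbour, then cut long links.

    sigma, max_link and noise_potential are hypothetical knobs of this sketch,
    not parameters taken from the paper. Returns one label per point; -1 is noise.
    """
    phi = potentials(points, sigma)
    n = len(points)
    parent = list(range(n))  # every point starts as its own root

    for i in range(n):
        if phi[i] < noise_potential:
            continue  # low-potential points are treated as noise
        best, best_d = i, math.inf
        for j in range(n):
            # Ties in potential are broken by index so links can never form a cycle.
            higher = (phi[j], j) > (phi[i], i)
            d = math.dist(points[i], points[j])
            if higher and phi[j] >= noise_potential and d < best_d:
                best, best_d = j, d
        # The distance test splits the data set into local clusters.
        if best_d <= max_link:
            parent[i] = best

    labels, roots = [], {}
    for i in range(n):
        if phi[i] < noise_potential:
            labels.append(-1)
            continue
        r = i
        while parent[r] != r:  # follow links up to the local potential peak
            r = parent[r]
        labels.append(roots.setdefault(r, len(roots)))
    return labels
```

In this sketch each chain of links ends at a local potential peak, which serves as the cluster representative. The distance cutoff separates peaks that are far apart, and the potential threshold discards isolated objects as noise.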
Acknowledgements
This work is supported in part by the National Key Research and Development Plan of China (2016YFC0803004), the National Natural Science Fund of China (61472039), and Beijing Major Science and Technology (Z171100005117002).
Cite this article
Wang, S., Wang, S., Yuan, H. et al. Clustering by differencing potential of data field. Computing 100, 403–419 (2018). https://doi.org/10.1007/s00607-018-0605-x
DOI: https://doi.org/10.1007/s00607-018-0605-x
Keywords
- Potential difference
- Data field clustering
- Complex shape cluster
- High dimensional data
- Noise filter
- Potential topology