Abstract
To address the sensitivity of the traditional spectral clustering algorithm to the Gaussian kernel parameter σ, this paper constructs the similarity matrix using a similarity measure based on data density, inspired by the density-sensitive similarity measure. The measure enlarges the distance between pairs of points that lie in different high-density regions and reduces it for pairs of points within the same density region, so that the complex spatial distribution of the data can be captured. Following this idea, we design two similarity measures, neither of which introduces the Gaussian kernel parameter σ; the main difference between them is that the first incorporates a shortest-path distance while the second does not. The second measure shows better overall performance, and experiments confirm that it improves the stability of the whole algorithm. In its final stage, spectral clustering applies k-means (or another traditional clustering algorithm) to the selected eigenvectors, but k-means is sensitive to the initial cluster centers. We therefore design a simple and effective method for optimizing the initial centers, yielding an improved k-means, and apply it within the proposed spectral clustering algorithm. Experimental results on UCI datasets show that the improved k-means makes the clustering even more stable.
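The abstract describes the pipeline only in outline, so the following Python sketch is illustrative rather than the authors' method: it builds a density-sensitive affinity (in the spirit of the density-sensitive measures named as inspiration; the stretching factor rho, the 1/(1+d) distance-to-similarity conversion, and the optional shortest-path step are all assumptions of this sketch), embeds the data with a normalized graph Laplacian, and clusters the eigenvectors with k-means, using k-means++ seeding as a stand-in for the paper's unspecified initial-center optimization.

import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def density_sensitive_affinity(X, rho=2.0, use_shortest_path=True):
    # Stretch Euclidean distances so that segments crossing low-density
    # gaps become long (rho > 1 is a hypothetical stretching factor);
    # optionally replace them with graph shortest paths. No sigma appears.
    d = cdist(X, X)
    stretched = rho ** d - 1.0
    if use_shortest_path:
        stretched = shortest_path(stretched, method="D", directed=False)
    return 1.0 / (1.0 + stretched)      # turn distances into similarities

def spectral_then_kmeans(W, k, seed=0):
    # Normalized spectral embedding, then k-means on the row-normalized
    # eigenvectors; k-means++ seeding stands in for the paper's own
    # initial-center optimization, which the abstract does not specify.
    deg = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=k, init="k-means++", n_init=10,
                  random_state=seed).fit_predict(U)

# Example: two half-moon clusters, a shape plain k-means handles poorly.
from sklearn.datasets import make_moons
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
labels = spectral_then_kmeans(density_sensitive_affinity(X), k=2)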
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yan, J., Cheng, D., Zong, M., Deng, Z. (2014). Improved Spectral Clustering Algorithm Based on Similarity Measure. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science(), vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_50
DOI: https://doi.org/10.1007/978-3-319-14717-8_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science (R0)