A Novel Fuzzy Kernel C-Means Algorithm for Document Clustering

  • Conference paper
Information Retrieval Technology (AIRS 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4993)


Abstract

The Fuzzy Kernel C-Means (FKCM) algorithm can significantly improve accuracy over the classical Fuzzy C-Means (FCM) algorithm when the data are not linearly separable, are high-dimensional, or contain clusters that overlap in the input space. Despite these advantages, several drawbacks limit its use in real-world applications: it can converge to local optima, it is sensitive to outliers, the number of clusters c must be specified in advance, and it converges slowly. To overcome these drawbacks, semi-supervised learning and a validity index are employed. Semi-supervised learning uses a limited amount of labeled data to guide the clustering of a large amount of unlabeled data, which helps FKCM avoid the drawbacks listed above. The number of clusters strongly affects clustering performance, and the optimal number cannot be assumed in advance, especially for large text corpora. A validity function makes it possible to determine a suitable number of clusters during the clustering process. The sparse format, together with the Cscatter and gathering strategy, saves considerable storage space and computation time. Experimental results on the Reuters-21578 benchmark dataset demonstrate that the proposed algorithm is more flexible and more accurate than the state-of-the-art FKCM.
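The abstract does not give the paper's update equations, so the following is only a rough, hypothetical sketch of the ideas it names. It implements one common kernelized FCM variant (Gaussian kernel, prototypes kept in input space), clamps the memberships of labeled samples to one-hot vectors as a simple stand-in for semi-supervision, and picks c by scanning candidates with a Xie-Beni-style validity index computed in kernel space. The kernel width `sigma`, fuzzifier `m`, the farthest-point initialization, the label-clamping scheme, and all function names (`kfcm`, `xie_beni`, `choose_c`) are assumptions, not the authors' method; the sparse-storage optimizations mentioned in the abstract are not modeled.

```python
import math


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def gaussian_kernel(x, v, sigma):
    # K(x, v) = exp(-||x - v||^2 / sigma^2); note K(x, x) = 1.
    return math.exp(-sq_dist(x, v) / sigma ** 2)


def kernel_dist2(x, v, sigma):
    # Feature-space distance ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)).
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


def init_prototypes(data, c, labels):
    # Assumption: labeled samples seed their cluster's prototype (mean of the
    # labeled points); other slots use deterministic farthest-point seeding.
    protos = [None] * c
    groups = {}
    for k, i in labels.items():
        groups.setdefault(i, []).append(data[k])
    for i, pts in groups.items():
        protos[i] = [sum(col) / len(pts) for col in zip(*pts)]
    for i in range(c):
        if protos[i] is None:
            chosen = [p for p in protos if p is not None]
            if not chosen:
                protos[i] = list(data[0])
            else:
                far = max(data, key=lambda x: min(sq_dist(x, p) for p in chosen))
                protos[i] = list(far)
    return protos


def kfcm(data, c, m=2.0, sigma=1.0, labels=None, iters=100, tol=1e-6):
    """Hypothetical Gaussian-kernel FCM; labels = {sample_index: cluster_index}."""
    labels = labels or {}
    n, dim = len(data), len(data[0])
    protos = init_prototypes(data, c, labels)
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update: u_ik proportional to (1 / d2_ik)^(1 / (m - 1)).
        for k, x in enumerate(data):
            if k in labels:  # partial supervision: clamp to one-hot
                for i in range(c):
                    u[i][k] = 1.0 if i == labels[k] else 0.0
                continue
            w = [(1.0 / max(kernel_dist2(x, v, sigma), 1e-12)) ** (1.0 / (m - 1.0))
                 for v in protos]
            s = sum(w)
            for i in range(c):
                u[i][k] = w[i] / s
        # Prototype update: v_i = sum_k u_ik^m K(x_k, v_i) x_k / sum_k u_ik^m K(x_k, v_i).
        shift = 0.0
        for i in range(c):
            coef = [u[i][k] ** m * gaussian_kernel(data[k], protos[i], sigma) for k in range(n)]
            total = sum(coef)
            if total > 0.0:
                new = [sum(coef[k] * data[k][j] for k in range(n)) / total for j in range(dim)]
                shift = max(shift, max(abs(a - b) for a, b in zip(new, protos[i])))
                protos[i] = new
        if shift < tol:
            break
    return u, protos


def xie_beni(data, u, protos, m=2.0, sigma=1.0):
    # Compactness / (n * minimum prototype separation), in kernel space.
    # Lower is better; needs at least two clusters.
    n, c = len(data), len(protos)
    compact = sum(u[i][k] ** m * kernel_dist2(data[k], protos[i], sigma)
                  for i in range(c) for k in range(n))
    sep = min(kernel_dist2(protos[i], protos[j], sigma)
              for i in range(c) for j in range(i + 1, c))
    return compact / (n * max(sep, 1e-12))


def choose_c(data, c_values, m=2.0, sigma=1.0):
    # Run the sketch for each candidate c and keep the lowest index value.
    scores = {}
    for c in c_values:
        u, protos = kfcm(data, c, m=m, sigma=sigma)
        scores[c] = xie_beni(data, u, protos, m=m, sigma=sigma)
    return min(scores, key=scores.get), scores


# Usage: three well-separated toy 2-D blobs standing in for document vectors.
blobs = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2),
         (5, 5), (5.2, 5), (5, 5.2), (5.2, 5.2),
         (0, 5), (0.2, 5), (0, 5.2), (0.2, 5.2)]
best_c, scores = choose_c(blobs, range(2, 5))
print(best_c, scores)
```

On this toy data, c = 2 leaves one blob shared between two prototypes and c = 4 places two prototypes inside one blob, so the index favors c = 3. Real document vectors would be sparse and high-dimensional and would need a tuned `sigma`; the label-clamping scheme here is the simplest form of partial supervision, not necessarily the one used in the paper.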




Editor information

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yin, Y., Zhang, X., Miao, B., Gao, L. (2008). A Novel Fuzzy Kernel C-Means Algorithm for Document Clustering. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_41


  • DOI: https://doi.org/10.1007/978-3-540-68636-1_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68633-0

  • Online ISBN: 978-3-540-68636-1

  • eBook Packages: Computer Science, Computer Science (R0)
