Annals of Computer Science and Information Systems, Volume 35
Logo PTI Logo FedCSIS

Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

Hashtag Discernability - Competitiveness Study of Graph Spectral and Other Clustering Methods

, , ,

DOI: http://dx.doi.org/10.15439/2023F2398

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 759767 ()

Full text

Abstract. Spectral clustering methods are claimed to possess ability to represent clusters of diverse shapes, densities etc. They constitute an approximation to graph cuts of various types (plain cuts, normalized cuts, ratio cuts). They are applicable to unweighted and weighted similarity graphs. We perform an evaluation of these capabilities for clustering tasks of increasing complexity.

References

  1. S. T. Wierzchoń and M. A. Kłopotek, Modern Clustering Algorithms, ser. Studies in Big Data. Springer Verlag, 2018, vol. 34. ISBN 978-3-319-69307-1. http://dx.doi.org/https://doi.org/10.1007/978-3-319-69308-8
  2. P. Łoziński, D. Czerski, and M. A. Kłopotek, “Grammatical case based IS-A relation extraction with boosting for polish,” in Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016, Gdańsk, Poland, September 11-14, 2016, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. A. Maciaszek, and M. Paprzycki, Eds., vol. 8. IEEE, 2016. http://dx.doi.org/10.15439/2016F391 pp. 533–540. [Online]. Available: https://doi.org/10.15439/2016F391
  3. J. Dörpinghaus, S. Schaaf, J. Fluck, and M. Jacobs, “Document clustering using a graph covering with pseudostable sets,” in Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017, Prague, Czech Republic, September 3-6, 2017, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. A. Maciaszek, and M. Paprzycki, Eds., vol. 11, 2017. http://dx.doi.org/10.15439/2017F84 pp. 329–338. [Online]. Available: https://doi.org/10.15439/2017F84
  4. P. Borkowski, M. A. Kłopotek, B. Starosta, S. T. Wierzchoń, and M. Sydow, “Eigenvalue based spectral classification,” PLoS ONE, vol. 18, no. 4, p. e0283413, 2023. http://dx.doi.org/https://doi.org/10.1371/journal.pone.0283413
  5. I. S. Dhillon and D. S. Modha, “Concept decompositions for large sparse text data using clustering,” Machine Learning, vol. 42, no. 1, pp. 143–175, Jan 2001. http://dx.doi.org/https://doi.org/10.1023/A:1007612920971
  6. S. T. Wierzchoń and M. A. Kłopotek, “Spectral cluster maps versus spectral clustering,” in Computer Information Systems and Industrial Management, ser. LNCS, vol. 12133. Springer, 2020. http://dx.doi.org/10.1007/978-3-030-47679-3 40 pp. 472–484.
  7. C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. Fernández del Rı́o, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant, “Array programming with NumPy,” Nature, vol. 585, p. 357–362, 2020. http://dx.doi.org/10.1038/s41586-020-2649-2 https://numpy.org.
  8. P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods, vol. 17, pp. 261–272, 2020. http://dx.doi.org/10.1038/s41592-019-0686-2 https://scipy.org.
  9. L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine learning software: experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013. http://dx.doi.org/10.48550/arXiv.1309.023 pp. 108–122, https://scikit-learn.org.
  10. H. Kim and H. K. Kim, “clustering4docs github repository,” 2020, https://pypi.org/project/soyclustering/. [Online]. Available: https://github.com/lovit/clustering4docs
  11. H. Kim, H. K. Kim, and S. Cho, “Improving spherical k-means for document clustering: Fast initialization, sparse centroid projection, and efficient cluster labeling,” Expert Systems with Applications, vol. 150, p. 113288, 2020. http://dx.doi.org/https://doi.org/10.1016/j.eswa.2020.113288. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417420301135
  12. U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007. http://dx.doi.org/https://doi.org/10.48550/arXiv.0711.0189
  13. P. Macgregor and H. Sun, “A tighter analysis of spectral clustering, and beyond,” in Proceedings of the 39th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, Eds., vol. 162. PMLR, 17–23 Jul 2022. http://dx.doi.org/https://doi.org/10.48550/arXiv.2208.01724 pp. 14 717–14 742. [Online]. Available: https://proceedings.mlr.press/v162/macgregor22a.html
  14. Y. Xu, A. Srinivasan, and L. Xue, A Selective Overview of Recent Advances in Spectral Clustering and Their Applications. Cham: Springer International Publishing, 2021, pp. 247–277. ISBN 978-3-030-72437-5. http://dx.doi.org/10.1007/978-3-030-72437-5_12
  15. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008. http://dx.doi.org/https://doi.org/10.1017/CBO9780511809071
  16. F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborov, and P. Zhang, “Spectral redemption in clustering sparse networks,” in Proc. the National Academy of Sciences, vol. 110
  17. , 2013. http://dx.doi.org/10.48550/arXiv.1306.5550 pp. 20 935–20 940.
  18. H. T. Ali and R. Couillet, “Improved spectral community detection in large heterogeneous networks,” Journal of Machine Learning Research, vol. 18, no. 225, pp. 1–49, 2018. [Online]. Available: http://jmlr.org/papers/v18/17-247.html
  19. A. Saade, F. Krzakala, and L. Zdeborová, “Spectral clustering of graphs with the bethe hessian,” 2014. [Online]. Available: https://arxiv.org/abs/1406.1880. http://dx.doi.org/10.48550/ARXIV.1406.1880
  20. Y. Endo and S. Miyamoto, “Spherical k-means++ clustering,” in Modeling Decisions for Artificial Intelligence, V. Torra and T. Narukawa, Eds. Cham: Springer International Publishing, 2015. http://dx.doi.org/https://doi.org/10.1007/978-3-319-23240-9 9. ISBN 978-3-319-23240-9 pp. 103–114.
  21. S. Ji, D. Xu, L. Guo, M. Li, and D. Zhang, “The seeding algorithm for spherical k-means clustering with penalties,” J. Comb. Optim., vol. 44, no. 3, p. 1977–1994, oct 2022. http://dx.doi.org/10.1007/s10878-020-00569-1. [Online]. Available: https://doi.org/10.1007/s10878-020-00569-1
  22. J. Knittel, S. Koch, and T. Ertl, “Efficient sparse spherical k-means for document clustering,” in Proceedings of the 21st ACM Symposium on Document Engineering, DocEng ’21. ACM, New York, NY, United States, 2021. http://dx.doi.org/https://doi.org/10.1145/3469096.3474937 pp. 1–4.
  23. R. Pratap, A. Deshmukh, P. Nair, and T. Dutt, “A faster sampling algorithm for spherical k-means,” in Proceedings of The 10th Asian Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Zhu and I. Takeuchi, Eds., vol. 95. PMLR, 14–16 Nov 2018, pp. 343–358. [Online]. Available: https://proceedings.mlr.press/v95/pratap18a.html
  24. R. A. Kłopotek, M. A. Kłopotek, and S. T. Wierzchoń, “A feasible k-means kernel trick under non-euclidean feature space,” International Journal of Applied Mathematics and Computer Science, vol. 30, no. 4, pp. 703–715, 2020. http://dx.doi.org/https://doi.org/10.34768/amcs-2020-0052 Online publication date: 1-Dec-2020.
  25. J. Gower, “Some distance properties of latent root and vector methods used in multivariate analysis,” Biometrika, vol. 53(3-4), pp. 325—-338, 1966. http://dx.doi.org/https://doi.org/10.1093/biomet/53.3-4.325