A Dynamic Convergence Criterion for Fast K-means Computations

  • Conference paper
Web Information Systems and Applications (WISA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14883)


Abstract

The K-Means algorithm has effectively promoted the development of intelligent systems and data-driven decision-making through data clustering and analysis. A reasonable convergence judgment directly determines when model training can be terminated, which heavily affects model quality. Many studies address training acceleration and quality improvement, but few focus on this judgment. Current convergence criteria still adopt a centralized judgment strategy based on a single loss value, and the same criterion is simply copied across different optimized K-Means variants, typically the fast Mini-Batch version and the traditional Full-Batch version. Our analysis reveals that such a design cannot guarantee that different variants converge to the same point; that is, it can lead to abnormal situations such as false positives and over-training. To enable fair comparison and guarantee model accuracy, we propose a new dynamic convergence criterion VF (Vote for Freezing) and an optimized version VF+. VF adopts a distributed judgment strategy in which each sample decides, based on the criterion, whether to keep participating in training or to freeze itself. Combined with sample priorities, VF adaptively adjusts the sample freezing threshold, which achieves asymptotic withdrawal of samples and accelerates model convergence. VF+ further introduces parameter freezing thresholds and freezing periods to eliminate redundant distance calculations, thereby improving training efficiency. Experiments on multiple datasets validate the effectiveness of our convergence criterion in terms of training quality and efficiency.
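
The abstract describes VF only at a high level. The sketch below illustrates one way a per-sample "vote for freezing" rule could be layered on Mini-Batch K-Means; it is not the authors' VF/VF+ implementation. The threshold schedule, the freezing period, and all names (e.g. minibatch_kmeans_with_freezing, freeze_period, init_threshold) are illustrative assumptions based only on the abstract.

```python
# Minimal sketch of a per-sample "vote for freezing" criterion on top of
# Mini-Batch K-Means. NOT the paper's VF/VF+ algorithm: the freezing-threshold
# schedule, the freezing period, and all names are illustrative assumptions.
import numpy as np

def minibatch_kmeans_with_freezing(X, k, batch_size=256, n_iters=100,
                                   init_threshold=1e-3, freeze_period=5,
                                   seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centroids = X[rng.choice(n, k, replace=False)].copy()
    counts = np.zeros(k)                      # per-centroid update counts
    prev_dist = np.full(n, np.inf)            # last squared distance to own centroid
    frozen_until = np.zeros(n, dtype=int)     # iteration until which a sample sleeps
    threshold = init_threshold

    for it in range(n_iters):
        active = np.where(frozen_until <= it)[0]   # only unfrozen samples train
        if active.size == 0:
            break                                  # every sample froze: stop training
        batch = rng.choice(active, min(batch_size, active.size), replace=False)

        # Assignment step on the mini-batch (squared Euclidean distances).
        d2 = ((X[batch, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        best = d2[np.arange(batch.size), assign]

        # Distributed judgment: each sample whose per-sample loss barely moved
        # freezes itself for `freeze_period` iterations instead of voting again.
        stable = np.abs(prev_dist[batch] - best) < threshold
        frozen_until[batch[stable]] = it + freeze_period
        prev_dist[batch] = best

        # Standard mini-batch centroid update with per-centroid learning rates.
        for i, c in zip(batch, assign):
            counts[c] += 1
            centroids[c] += (X[i] - centroids[c]) / counts[c]

        # Crude adaptive schedule: relax the threshold as more samples freeze,
        # so the remaining samples withdraw asymptotically.
        threshold = init_threshold * (1 + (n - active.size) / n)

    return centroids
```

In this sketch freezing is purely per-sample and reversible after freeze_period iterations; the abstract's VF+ additionally freezes at the parameter level to skip redundant distance calculations, which this sketch does not model.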


Acknowledgement

This work was supported by the Key R&D Program of Shandong Province, China (No. 2023CXPT020), and the National Natural Science Foundation of China (No. U23A20320 and No. U22A2068).

Author information

Corresponding author

Correspondence to Zhigang Wang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yu, H. et al. (2024). A Dynamic Convergence Criterion for Fast K-means Computations. In: Jin, C., Yang, S., Shang, X., Wang, H., Zhang, Y. (eds) Web Information Systems and Applications. WISA 2024. Lecture Notes in Computer Science, vol 14883. Springer, Singapore. https://doi.org/10.1007/978-981-97-7707-5_17

  • DOI: https://doi.org/10.1007/978-981-97-7707-5_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-7706-8

  • Online ISBN: 978-981-97-7707-5

  • eBook Packages: Computer Science, Computer Science (R0)
