{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T07:32:51Z","timestamp":1725003171993},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"\n \n The quality of K-Means clustering is extremely sensitive to proper initialization. The classic remedy is to apply k-means++ to obtain an initial set of centers that is provably competitive with the optimal solution. Unfortunately, k-means++ requires k full passes over the data which limits its applicability to massive datasets. We address this problem by proposing a simple and efficient seeding algorithm for K-Means clustering. The main idea is to replace the exact D2-sampling step in k-means++ with a substantially faster approximation based on Markov Chain Monte Carlo sampling. We prove that, under natural assumptions on the data, the proposed algorithm retains the full theoretical guarantees of k-means++ while its computational complexity is only sublinear in the number of data points. For such datasets, one can thus obtain a provably good clustering in sublinear time. Extensive experiments confirm that the proposed method is competitive with k-means++ on a variety of real-world, large-scale datasets while offering a reduction in runtime of several orders of magnitude.\n \n <\/jats:p>","DOI":"10.1609\/aaai.v30i1.10259","type":"journal-article","created":{"date-parts":[[2022,6,24]],"date-time":"2022-06-24T05:51:12Z","timestamp":1656049872000},"source":"Crossref","is-referenced-by-count":55,"title":["Approximate K-Means++ in Sublinear Time"],"prefix":"10.1609","volume":"30","author":[{"given":"Olivier","family":"Bachem","sequence":"first","affiliation":[]},{"given":"Mario","family":"Lucic","sequence":"additional","affiliation":[]},{"given":"S. Hamed","family":"Hassani","sequence":"additional","affiliation":[]},{"given":"Andreas","family":"Krause","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2016,2,21]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/10259\/10118","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/10259\/10118","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,24]],"date-time":"2022-06-24T05:51:13Z","timestamp":1656049873000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/10259"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,2,21]]},"references-count":0,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2016,2,18]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v30i1.10259","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2016,2,21]]}}}