
G-2017-33

No-means clustering: A stochastic variant of k-means



Simple, intuitive, and scalable to large problems, k-means clustering is perhaps the most frequently used technique for unsupervised learning. However, global optimization of the k-means objective function is challenging because the algorithm is highly sensitive to its initialization. Exploiting the connection between k-means and Bayesian clustering, we explore the benefits of stochastic optimization to address this issue. Our "no-means" algorithm has provably better mixing time than a natural Gibbs sampler with auxiliary cluster centroids, yet it retains the same computational complexity as the original k-means approach. Comparisons on two benchmark datasets indicate that stochastic search usually produces more homogeneous clusters than the steepest-descent algorithm employed by k-means. The objective function of our no-means method has multiple modes that are not too far apart.
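The abstract does not spell out the algorithmic details. As a rough illustration of replacing k-means's deterministic descent with stochastic search, the sketch below implements the "natural Gibbs sampler with auxiliary cluster centroids" that the abstract uses as a baseline: each point is reassigned at random with probability proportional to exp(-beta * ||x - mu_k||^2), rather than always to its nearest centroid. The function name stochastic_kmeans, the inverse-temperature parameter beta, and the handling of empty clusters are illustrative assumptions; the paper's no-means sampler additionally avoids maintaining the centroids themselves.

```python
import numpy as np

def stochastic_kmeans(X, k, beta=5.0, n_sweeps=50, rng=None):
    """Gibbs-style stochastic variant of k-means (illustrative sketch only).

    Each point is reassigned at random with probability proportional to
    exp(-beta * squared distance to each cluster centroid), instead of the
    deterministic nearest-centroid rule used by k-means.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    z = rng.integers(k, size=n)                    # random initial assignment
    for _ in range(n_sweeps):
        # Auxiliary centroids from the current assignment; an empty cluster
        # falls back to a random point to keep the sketch simple.
        mu = np.stack([X[z == c].mean(axis=0) if np.any(z == c)
                       else X[rng.integers(n)] for c in range(k)])
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logp = -beta * d2
        logp -= logp.max(axis=1, keepdims=True)    # stabilize the softmax
        p = np.exp(logp)
        p /= p.sum(axis=1, keepdims=True)
        # Stochastic reassignment: sample a cluster instead of taking argmin.
        z = np.array([rng.choice(k, p=p[i]) for i in range(n)])
    return z

# Example usage on three synthetic Gaussian blobs (also an assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2))
               for loc in ([0, 0], [3, 3], [0, 3])])
labels = stochastic_kmeans(X, k=3, rng=0)
```

As beta grows, the sampled assignment concentrates on the nearest centroid and each sweep approaches an ordinary k-means iteration, which illustrates the connection between k-means and Bayesian clustering that the abstract exploits.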

13 pages

Document

G1733.pdf (3 MB)