G-2017-33
No-means clustering: A stochastic variant of k-means
Simple, intuitive, and scalable to large problems, k-means clustering is perhaps the most frequently used technique for unsupervised learning. However, global optimization of the k-means objective function is challenging, as the clustering algorithm is highly sensitive to its initial value. Exploiting the connection between k-means and Bayesian clustering, we explore the benefits of stochastic optimization to address this issue. Our "no-means" algorithm has provably superior mixing time to a natural Gibbs sampler with auxiliary cluster centroids, yet it retains the same computational complexity as the original k-means approach.
Comparisons on two benchmark datasets indicate that stochastic search usually produces more homogeneous clusters than the steepest descent algorithm employed by k-means. Our no-means method is particularly effective when the objective function has multiple modes that are not too far apart.
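The abstract only sketches the idea, so what follows is a minimal, hypothetical illustration of a collapsed ("no-means") Gibbs sampler, in which the cluster means are integrated out analytically and only the cluster assignments are sampled. It assumes an isotropic Gaussian mixture with a conjugate normal prior on the means and uniform mixing weights; the function name no_means_gibbs and the parameters sigma2 and tau2 are illustrative, not taken from the paper.

    import numpy as np

    def no_means_gibbs(X, K, n_iters=100, sigma2=1.0, tau2=10.0, rng=None):
        """Collapsed ("no-means") Gibbs sampler for a K-component isotropic
        Gaussian mixture.  The means are marginalized out under a conjugate
        N(0, tau2 I) prior, so only the labels z are sampled.  A sketch of
        the general technique, not the authors' exact model."""
        rng = np.random.default_rng() if rng is None else rng
        n, d = X.shape
        z = rng.integers(K, size=n)                    # random initial labels
        counts = np.bincount(z, minlength=K).astype(float)
        sums = np.zeros((K, d))
        for k in range(K):
            sums[k] = X[z == k].sum(axis=0)

        for _ in range(n_iters):
            for i in range(n):
                # Remove point i from its current cluster's statistics.
                counts[z[i]] -= 1
                sums[z[i]] -= X[i]
                # Posterior over each (marginalized) cluster mean.
                prec = counts / sigma2 + 1.0 / tau2    # shape (K,)
                post_mean = (sums / sigma2) / prec[:, None]
                post_var = 1.0 / prec
                # Log posterior-predictive density of x_i under each cluster
                # (uniform mixing weights assumed, constants dropped).
                pred_var = sigma2 + post_var
                sq = ((X[i] - post_mean) ** 2).sum(axis=1)
                logp = -0.5 * (sq / pred_var + d * np.log(pred_var))
                p = np.exp(logp - logp.max())
                p /= p.sum()
                # Sample the new label instead of taking the argmax,
                # which is what lets the chain escape poor local optima.
                z[i] = rng.choice(K, p=p)
                counts[z[i]] += 1
                sums[z[i]] += X[i]
        return z

Sampling each label from its full conditional, rather than greedily assigning the nearest centroid as k-means does, is what allows the chain to move between nearby modes of the objective; annealing the sampling step toward the most probable label would recover k-means-style hard assignments.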
Published May 2017, 13 pages
Document: G1733.pdf (3 MB)