G-2017-33
No-means clustering: A stochastic variant of \(k\)-means
Simple, intuitive, and scalable to large problems, \(k\)-means clustering is perhaps the most frequently used technique for unsupervised learning. However, global optimization of the \(k\)-means objective function is challenging, as the clustering algorithm is highly sensitive to its initial value. Exploiting the connection between \(k\)-means and Bayesian clustering, we explore the benefits of stochastic optimization to address this issue. Our "\(\texttt{no means}\)" algorithm has provably superior mixing time to a natural Gibbs sampler with auxiliary cluster centroids, yet it retains the same computational complexity as the original \(k\)-means approach.
Comparisons on two benchmark datasets indicate that stochastic search usually produces more homogeneous clusters than the steepest-descent algorithm employed by \(k\)-means. Our \(\texttt{no means}\) method is particularly advantageous when the objective function has multiple modes which are not too far apart.
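The abstract does not spell out the update rule, but it contrasts a Gibbs sampler that carries cluster centroids as auxiliary variables with a "no means" sampler that presumably integrates the centroids out. A minimal Python sketch of that second idea, under stated assumptions, might look as follows; the function name `stochastic_kmeans_sweep`, the inverse temperature `beta`, and the leave-one-out centroid update are all illustrative choices, not the paper's algorithm.

```python
import numpy as np

def stochastic_kmeans_sweep(X, z, k, beta=5.0, rng=None):
    """One stochastic sweep over cluster assignments.

    Hypothetical sketch only: the paper's abstract does not give the
    update rule. Centroids are never kept as explicit auxiliary
    variables; each point is reassigned by sampling in proportion to
    exp(-beta * squared distance to the leave-one-out cluster mean),
    with per-cluster sums and counts maintained incrementally, so one
    sweep costs O(n k d), the same order as a k-means assignment pass.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    xbar = X.mean(axis=0)              # fallback centre for empty clusters
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for i in range(n):
        sums[z[i]] += X[i]
        counts[z[i]] += 1
    for i in range(n):
        sums[z[i]] -= X[i]             # remove point i from its cluster
        counts[z[i]] -= 1
        denom = np.maximum(counts, 1.0)[:, None]
        mu = np.where(counts[:, None] > 0, sums / denom, xbar)
        logp = -beta * ((X[i] - mu) ** 2).sum(axis=1)
        p = np.exp(logp - logp.max())  # stabilised softmax over clusters
        z[i] = rng.choice(k, p=p / p.sum())
        sums[z[i]] += X[i]             # reinsert under the sampled label
        counts[z[i]] += 1
    return z

# Toy usage: three synthetic Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in ([0, 0], [3, 0], [0, 3])])
z = rng.integers(3, size=len(X))
for _ in range(20):
    z = stochastic_kmeans_sweep(X, z, k=3, beta=10.0, rng=rng)
```

As `beta` grows, the sampling step collapses to the deterministic nearest-centroid assignment of \(k\)-means (steepest descent); at finite `beta` the chain can move between nearby modes of the objective, which is the regime the abstract identifies as favourable for stochastic search.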
Published May 2017, 13 pages
Document
G1733.pdf (3 MB)