G-2016-81
On strategies to fix degenerate k-Means-means solutions
, , , and BibTeX reference
The \(k\)
-means is a benchmark algorithm used in cluster analysis. It belongs to the large category of
heuristics based on location-allocation steps that alternately locate cluster centers and allocate data points to them
until no further
improvement is possible. Such heuristics are known to suffer from a phenomenon called degeneracy in which some of the clusters are empty. In this paper, we compare and propose a series of strategies to circumvent degenerate solutions during a \(k\)
-means execution.
Our computational experiments
show that these strategies are effective leading to better clustering solutions in the vast majority of
the cases in which degeneracy appears in \(k\)
-means. Moreover, we compare the use of our fixing strategies within \(k\)
-means against the use of two initialization methods found in the literature. These results demonstrate how useful the proposed strategies can be, specially inside memory-based clustering algorithms.
Published October 2016 , 22 pages