
G-2019-96

A Lagrangian-based score for assessing the quality of pairwise constraints in SSC



Clustering algorithms help identify homogeneous subgroups in data. In some cases, additional information about the relationships among subsets of the data is available. When using a semi-supervised clustering (SSC) algorithm, an expert may supply this information as constraints that guide the algorithm toward a more useful and meaningful solution. For instance, the expert may specify that two points cannot be part of the same cluster (a cannot-link constraint) or that two points must be part of the same cluster (a must-link constraint). A key challenge for users of semi-supervised learning algorithms, however, is that adding inaccurate or conflicting constraints can decrease accuracy, and little is known about how to detect whether expert-imposed constraints are likely to be wrong. In the present work, we propose a method to score each must-link and cannot-link pairwise constraint and to help users identify which constraints should be amended or removed. Using synthetic experimental examples and real data, we show that the scoring method can successfully identify constraints that should be removed.
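To make the two constraint types concrete, the following minimal Python sketch checks which must-link and cannot-link pairs a given cluster assignment violates. It is an illustration only and does not implement the Lagrangian-based score proposed in the paper; the function name `violated_constraints` and the toy labels and pairs are assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two data points


def violated_constraints(
    labels: Sequence[int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Dict[str, List[Pair]]:
    """Return the pairwise constraints that a cluster assignment breaks.

    Hypothetical helper for illustration; not the paper's scoring method.
    """
    violated: Dict[str, List[Pair]] = {"must_link": [], "cannot_link": []}
    for i, j in must_link:
        # A must-link pair is violated when the points land in different clusters.
        if labels[i] != labels[j]:
            violated["must_link"].append((i, j))
    for i, j in cannot_link:
        # A cannot-link pair is violated when the points share a cluster.
        if labels[i] == labels[j]:
            violated["cannot_link"].append((i, j))
    return violated


# Toy assignment of five points to two clusters (made-up data).
labels = [0, 0, 1, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 4), (2, 3)]
result = violated_constraints(labels, must_link, cannot_link)
```

A violation only shows that a constraint disagrees with one particular clustering; it does not by itself indicate whether the constraint or the clustering is at fault, which is the question the proposed score is meant to help answer.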

20 pages

Publication

Data Mining and Knowledge Discovery, 35, 2341–2368, 2021

Document

G1996.pdf (1 MB)