TY - JOUR
T1 - A link-based approach to the cluster ensemble problem
AU - Iam-On, Natthakan
AU - Boongoen, Tossapon
AU - Garrett, Simon
AU - Price, Chris
PY - 2011/12
Y1 - 2011/12
N2 - Cluster ensembles have recently emerged as a powerful alternative to standard cluster analysis, aggregating several input data clusterings to generate a single output clustering, with improved robustness and stability. From the early work, these techniques held great promise; however, most of them generate the final solution based on incomplete information of a cluster ensemble. The underlying ensemble-information matrix reflects only cluster-data point relations, while those among clusters are generally overlooked. This paper presents a new link-based approach to improve the conventional matrix. It achieves this using the similarity between clusters that are estimated from a link network model of the ensemble. In particular, three new link-based algorithms are proposed for the underlying similarity assessment. The final clustering result is generated from the refined matrix using two different consensus functions of feature-based and graph-based partitioning. This approach is the first to address and explicitly employ the relationship between input partitions, which has not been emphasized by recent studies of matrix refinement. The effectiveness of the link-based approach is empirically demonstrated over 10 data sets (synthetic and real) and three benchmark evaluation measures. The results suggest the new approach is able to efficiently extract information embedded in the input clusterings, and regularly illustrate higher clustering quality in comparison to several state-of-the-art techniques.
AB - Cluster ensembles have recently emerged as a powerful alternative to standard cluster analysis, aggregating several input data clusterings to generate a single output clustering, with improved robustness and stability. From the early work, these techniques held great promise; however, most of them generate the final solution based on incomplete information of a cluster ensemble. The underlying ensemble-information matrix reflects only cluster-data point relations, while those among clusters are generally overlooked. This paper presents a new link-based approach to improve the conventional matrix. It achieves this using the similarity between clusters that are estimated from a link network model of the ensemble. In particular, three new link-based algorithms are proposed for the underlying similarity assessment. The final clustering result is generated from the refined matrix using two different consensus functions of feature-based and graph-based partitioning. This approach is the first to address and explicitly employ the relationship between input partitions, which has not been emphasized by recent studies of matrix refinement. The effectiveness of the link-based approach is empirically demonstrated over 10 data sets (synthetic and real) and three benchmark evaluation measures. The results suggest the new approach is able to efficiently extract information embedded in the input clusterings, and regularly illustrate higher clustering quality in comparison to several state-of-the-art techniques.
KW - Clustering
KW - cluster ensembles
KW - cluster relations
KW - data mining
KW - link-based similarity
UR - http://hdl.handle.net/2160/8694
UR - http://www.scopus.com/inward/record.url?scp=80054932597&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2011.84
DO - 10.1109/TPAMI.2011.84
M3 - Article
C2 - 21576752
SN - 0162-8828
VL - 33
SP - 2396
EP - 2409
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 12
M1 - 5765991
ER -