Abstract
Cluster ensembles have recently emerged as a powerful alternative to standard cluster analysis, aggregating several input data clusterings to generate a single output clustering, with improved robustness and stability. From the early work, these techniques held great promise; however, most of them generate the final solution based on incomplete information of a cluster ensemble. The underlying ensemble-information matrix reflects only cluster-data point relations, while those among clusters are generally overlooked. This paper presents a new link-based approach to improve the conventional matrix. It achieves this using the similarity between clusters that are estimated from a link network model of the ensemble. In particular, three new link-based algorithms are proposed for the underlying similarity assessment. The final clustering result is generated from the refined matrix using two different consensus functions of feature-based and graph-based partitioning. This approach is the first to address and explicitly employ the relationship between input partitions, which has not been emphasized by recent studies of matrix refinement. The effectiveness of the link-based approach is empirically demonstrated over 10 data sets (synthetic and real) and three benchmark evaluation measures. The results suggest the new approach is able to efficiently extract information embedded in the input clusterings, and regularly illustrate higher clustering quality in comparison to several state-of-the-art techniques.
Original language | English |
---|---|
Article number | 5765991 |
Pages (from-to) | 2396-2409 |
Number of pages | 14 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Volume | 33 |
Issue number | 12 |
Early online date | 12 May 2011 |
DOIs | |
Publication status | Published - Dec 2011 |
Keywords
- Clustering
- cluster ensembles
- cluster relations
- data mining
- link-based similarity