Abstract
Motivation: It is far from trivial to select the most effective clustering method and its parameterization for a particular set of gene expression data, because the number of possibilities is very large. Although many researchers still prefer to use hierarchical clustering in one form or another, this is often sub-optimal. Cluster ensemble research solves this problem by automatically combining multiple data partitions from different clusterings to improve both the robustness and quality of the clustering result. However, many existing ensemble techniques use an association matrix to summarize sample-cluster co-occurrence statistics, and relations within an ensemble are encapsulated only at a coarse level, while those existing among clusters are completely neglected. Discovering these missing associations may greatly extend the capability of the ensemble methodology for microarray data clustering.

Results: The link-based cluster ensemble (LCE) method, presented here, implements these ideas and demonstrates outstanding performance. Experimental results on real gene expression and synthetic datasets indicate that LCE: (i) usually outperforms the existing cluster ensemble algorithms in individual tests and, overall, is clearly class-leading; (ii) generates excellent, robust performance across different types of data, especially in the presence of noise and imbalanced data clusters; (iii) provides a high-level data matrix that is applicable to many numerical clustering techniques; and (iv) is computationally efficient for large datasets and gene clustering.

Availability: Online supplementary material and an implementation are available at: http://users.aber.ac.uk/nii07/bioinformatics2010.

Contact: [email protected]; [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.
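To make the "association matrix" mentioned in the Motivation concrete, the following is a minimal sketch of the conventional binary sample-by-cluster matrix that such ensemble techniques build from several base clusterings. It is not the LCE method itself; the function name `binary_association_matrix` and the toy partitions are assumptions introduced purely for illustration.

```python
def binary_association_matrix(partitions):
    """Build a binary sample-by-cluster association matrix (illustrative only).

    partitions: list of label lists, one per base clustering, where
    partitions[m][i] is the cluster label of sample i in clustering m.
    Returns (matrix, columns), with columns[j] = (clustering index, label).
    """
    n_samples = len(partitions[0])
    columns = []
    for m, labels in enumerate(partitions):
        if len(labels) != n_samples:
            raise ValueError("all partitions must label the same samples")
        # One column per cluster found in this base clustering.
        for label in sorted(set(labels)):
            columns.append((m, label))
    # Entry is 1 if the sample belongs to that cluster, otherwise 0.
    matrix = [
        [1 if partitions[m][i] == label else 0 for (m, label) in columns]
        for i in range(n_samples)
    ]
    return matrix, columns


# Hypothetical toy ensemble: three base clusterings of four samples.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix, columns = binary_association_matrix(partitions)
for row in matrix:
    print(row)
```

Every entry in this matrix is either 0 or 1, so a sample is treated as equally unrelated to all clusters it does not belong to. This is the coarse-level summary that the abstract describes as neglecting relations among clusters.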
Original language | English |
---|---|
Article number | btq226 |
Pages (from-to) | 1513-1519 |
Number of pages | 7 |
Journal | Bioinformatics |
Volume | 26 |
Issue number | 12 |
Early online date | 05 May 2010 |
DOIs | |
Publication status | Published - 15 Jun 2010 |
Keywords
- Algorithms
- Cluster Analysis
- Gene Expression
- Gene Expression Profiling/methods
- Oligonucleotide Array Sequence Analysis/methods
- Pattern Recognition, Automated