LCE: A link-based cluster ensemble method for improved gene expression data analysis

Natthakan Iam-On*, Tossapon Boongoen, Simon Garrett

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: It is far from trivial to select the most effective clustering method and its parameterization, for a particular set of gene expression data, because there are a very large number of possibilities. Although many researchers still prefer to use hierarchical clustering in one form or another, this is often sub-optimal. Cluster ensemble research solves this problem by automatically combining multiple data partitions from different clusterings to improve both the robustness and quality of the clustering result. However, many existing ensemble techniques use an association matrix to summarize sample-cluster co-occurrence statistics, and relations within an ensemble are encapsulated only at coarse level, while those existing among clusters are completely neglected. Discovering these missing associations may greatly extend the capability of the ensemble methodology for microarray data clustering.

Results: The link-based cluster ensemble (LCE) method, presented here, implements these ideas and demonstrates outstanding performance. Experiment results on real gene expression and synthetic datasets indicate that LCE: (i) usually outperforms the existing cluster ensemble algorithms in individual tests and, overall, is clearly class-leading; (ii) generates excellent, robust performance across different types of data, especially with the presence of noise and imbalanced data clusters; (iii) provides a high-level data matrix that is applicable to many numerical clustering techniques; and (iv) is computationally efficient for large datasets and gene clustering.

Availability: Online supplementary and implementation are available at: http://users.aber.ac.uk/nii07/bioinformatics2010.

Contact: nii07@aber.ac.uk; natthakan@mfu.ac.th.

Supplementary information: Supplementary data are available at Bioinformatics online.

Original language: English
Article number: btq226
Pages (from-to): 1513-1519
Number of pages: 7
Journal: Bioinformatics
Volume: 26
Issue number: 12
Early online date: 05 May 2010
Publication status: Published - 15 Jun 2010

Keywords

  • Algorithms
  • Cluster Analysis
  • Gene Expression
  • Gene Expression Profiling/methods
  • Oligonucleotide Array Sequence Analysis/methods
  • Pattern Recognition, Automated
