Diversity-driven generation of link-based cluster ensemble and application to data classification

Natthakan Iam-On*, Tossapon Boongoen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)

Abstract

Over the past decades, a large number of research studies have concentrated on improving the accuracy of classification models, because several types of classifiers have proved useful in real-life problems, including the prediction of system failure risk and microarray-based cancer diagnosis. Despite this, the accuracy of existing classifiers has been constrained by the uninformative variables typically observed in modern data. In addition to feature selection, one may transform the original data into another representation in which only key feature components are included. Unlike conventional transformation-based techniques found in the literature, this paper presents a novel method that makes use of cluster ensembles, specifically the summarized information matrix, as the transformed data for the subsequent classification step. Among the state-of-the-art methods, the link-based cluster ensemble approach (LCE) provides highly accurate clustering and is therefore employed here. It is uniquely coupled with a diversity-driven generation of the ensemble, which provides informative and diverse sets of clusterings. The performance of this transformation model is evaluated on published synthetic, standard and gene expression datasets, using C4.5, Naive Bayes, KNN, Neural Network and Random Forest classifiers, in comparison with benchmark techniques. The findings suggest that the new model can improve on the classification accuracy obtained with the original data and performs better than the other transformation methods investigated in the empirical study.
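The idea of using a cluster-ensemble summary matrix as transformed data can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-means as the base clustering, uses a randomly chosen number of clusters per ensemble member as a simple stand-in for the diversity-driven generation, and builds only a binary cluster-association matrix, without the link-based refinement that LCE applies. The names `kmeans`, `binary_association_matrix`, `n_clusterings` and `k_range` are hypothetical.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; an empty cluster
        # keeps its previous center.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def binary_association_matrix(points, n_clusterings=5, k_range=(2, 4), seed=0):
    """One row per sample, one column per cluster across all ensemble members.

    Assumption: diversity comes only from a random k per clustering,
    which is a simplification of the generation scheme in the abstract.
    """
    rng = random.Random(seed)
    rows = [[] for _ in points]
    for _ in range(n_clusterings):
        k = rng.randint(*k_range)
        labels = kmeans(points, k, rng)
        for row, label in zip(rows, labels):
            # Entry is 1 if the sample belongs to cluster c of this clustering.
            row.extend(1 if label == c else 0 for c in range(k))
    return rows

# Toy data: two loose groups of 2-D points (illustrative only).
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0)]
matrix = binary_association_matrix(toy)
```

Each row of `matrix` could then replace the corresponding original feature vector as input to any of the classifiers named above. In this simplified form every row contains exactly one 1 per ensemble member.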

Original language: English
Pages (from-to): 8259-8273
Number of pages: 15
Journal: Expert Systems with Applications
Volume: 42
Issue number: 21
Early online date: 25 Jul 2015
Publication status: Published - 30 Nov 2015
Externally published: Yes

Keywords

  • Data classification
  • Ensemble clustering
  • Feature transformation
  • Optimization
