Improving consensus clustering with noise-induced ensemble generation

Patcharaporn Panwong, Tossapon Boongoen, Natthakan Iam-On

Research output: Contribution to journal, Article, peer-review

Abstract

Because noise is perceived negatively, it is commonly eliminated during data cleansing prior to analysis. Other studies instead employ noise-tolerant or robust algorithms to achieve a reliable outcome. Either way, the impact of noise is minimized, thus preserving the integrity of discovered knowledge. On the other hand, making good use of noise has recently been investigated and exploited in different contexts, such as privacy-preserving data mining, single clustering and consensus clustering. Our initial study employed uniform random noise in the process of ensemble generation as a way to increase diversity within an ensemble, and found that improved clustering quality can be obtained at specific levels of noise. To consolidate this finding, this paper investigates a rich collection of random noise functions, which can be used to form perturbed variations of the data within the framework of noise-induced ensemble generation. The effectiveness of this approach under different cases of random noise is demonstrated over benchmark datasets from the UCI repository. The results suggest that the noise-induced strategy is generally better than its baseline counterpart, whilst showing uneven improvement across different data patterns. As such, a guideline is provided to make the best use of the proposed method with any new set of data.
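
The general idea of noise-induced ensemble generation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base clustering algorithm, the consensus function, or the way noise is scaled. The sketch assumes plain k-means as the base clusterer, uniform noise scaled to each attribute's range, and a thresholded co-association matrix as the consensus step. All function names and parameters (`add_uniform_noise`, `level`, `n_members`, and so on) are hypothetical.

```python
import random

def add_uniform_noise(data, level, rng):
    """Return a perturbed copy of data. Assumption: each attribute receives
    noise drawn from U(-level * span, +level * span), span = attribute range."""
    spans = [max(col) - min(col) for col in zip(*data)]
    return [[x + rng.uniform(-level * s, level * s) for x, s in zip(row, spans)]
            for row in data]

def kmeans(data, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); one label per row."""
    centers = [list(c) for c in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assign every row to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(row, centers[c])))
                  for row in data]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def co_association(partitions):
    """Fraction of ensemble members that place points i and j together."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

def consensus_labels(coassoc, threshold=0.5):
    """Assumed consensus step: link points whose co-association exceeds the
    threshold, then treat each connected component as one final cluster."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def noise_induced_consensus(data, k, level, n_members, seed=0):
    """Cluster n_members noisy copies of data, then combine the partitions."""
    rng = random.Random(seed)
    partitions = [kmeans(add_uniform_noise(data, level, rng), k, rng)
                  for _ in range(n_members)]
    return consensus_labels(co_association(partitions))
```

Calling `noise_induced_consensus(data, k=2, level=0.05, n_members=10)` on a list of numeric rows returns one consensus label per row. Setting `level` to 0 removes the perturbation, so that diversity comes only from k-means initialization; this is one plausible way to form a baseline for comparison, though the abstract does not define the baseline used in the paper.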

Original language: English
Article number: 113138
Number of pages: 16
Journal: Expert Systems with Applications
Volume: 146
Early online date: 30 Dec 2019
Publication status: Published - 15 May 2020
Externally published: Yes

Keywords

  • Consensus clustering
  • Attribute noise
  • Ensemble generation
