TY - JOUR
T1 - Improving consensus clustering with noise-induced ensemble generation
AU - Panwong, Patcharaporn
AU - Boongoen, Tossapon
AU - Iam-On, Natthakan
N1 - Funding Information:
This research is part of a PhD dissertation and is partly funded by Mae Fah Luang University. The work is also part of the project IAPP1/100077 - Newton Fund (RAE - TRF): Industry Academia Partnership Programme - 17/18, with Dr Tossapon Boongoen as the PI.
Publisher Copyright:
© 2019
PY - 2020/5/15
Y1 - 2020/5/15
AB - Because noise is perceived negatively, it is commonly eliminated during data cleansing prior to analysis. Some studies instead employ tolerant or robust algorithms to achieve a reliable outcome. Either way, the impact of noise might be minimized, thus preserving the integrity of discovered knowledge. On the other hand, making good use of noise has recently been investigated and exploited in different contexts, such as privacy-preserving data mining, single clustering and consensus clustering. Our initial study employed uniform random noise in the process of ensemble generation as a way to increase diversity within an ensemble, and suggested that improved clustering goodness can be obtained at specific levels of noise. To consolidate this finding, this paper investigates a rich collection of random noise functions, which can be used to form perturbed data variations within the framework of noise-induced ensemble generation. The effectiveness of this approach, under different cases of random noise, is demonstrated over benchmark datasets from the UCI repository. The results suggest that the noise-induced strategy is generally better than its baseline counterpart, whilst showing uneven improvement across different data patterns. As such, a guideline is provided to make the best use of the proposed method with any new set of data.
KW - Consensus clustering
KW - Attribute noise
KW - Ensemble generation
UR - http://www.scopus.com/inward/record.url?scp=85077060447&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.113138
DO - 10.1016/j.eswa.2019.113138
M3 - Article
AN - SCOPUS:85077060447
SN - 0957-4174
VL - 146
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 113138
ER -