TY - GEN
T1 - Improving consensus clustering with noise-induced ensemble generation
T2 - 10th International Conference on Machine Learning and Computing, ICMLC 2018
AU - Panwong, Patcharaporn
AU - Boongoen, Tossapon
AU - Iam-On, Natthakan
N1 - Funding Information:
This research is part of a PhD dissertation and also partly funded by Mae Fah Luang University.
Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/2/26
Y1 - 2018/2/26
N2 - Specific to data mining, and data analysis in general, noise makes it difficult for many conventional models to deliver a trustworthy result. Several studies have been devoted to adjusting existing methods so that they exhibit noise tolerance, while others rely largely on data cleansing prior to the analysis process. Either way, the impact of noise is minimized, thus preserving the quality of the discovered knowledge. In contrast, a few recent studies have reported a benefit of injecting a small amount of noise into the data under examination. Given this insight, the paper introduces an initial and unique study of employing noise in the process of cluster ensemble generation. This noise-induced strategy delivers data perturbation that can be coupled with common generation methods such as a homogeneous ensemble of k-means with different numbers of clusters. In a nutshell, multiple data matrices are created from the original data, each with salt-and-pepper noise locations and uniform-random noise values. This may yield different cluster structures, and hence diversity within an ensemble. Based on an empirical investigation with nine benchmark datasets, the proposed approach shows improved clustering performance compared to basic generation methods.
AB - Specific to data mining, and data analysis in general, noise makes it difficult for many conventional models to deliver a trustworthy result. Several studies have been devoted to adjusting existing methods so that they exhibit noise tolerance, while others rely largely on data cleansing prior to the analysis process. Either way, the impact of noise is minimized, thus preserving the quality of the discovered knowledge. In contrast, a few recent studies have reported a benefit of injecting a small amount of noise into the data under examination. Given this insight, the paper introduces an initial and unique study of employing noise in the process of cluster ensemble generation. This noise-induced strategy delivers data perturbation that can be coupled with common generation methods such as a homogeneous ensemble of k-means with different numbers of clusters. In a nutshell, multiple data matrices are created from the original data, each with salt-and-pepper noise locations and uniform-random noise values. This may yield different cluster structures, and hence diversity within an ensemble. Based on an empirical investigation with nine benchmark datasets, the proposed approach shows improved clustering performance compared to basic generation methods.
KW - Consensus clustering
KW - Ensemble generation
KW - Uniform random noise
UR - http://www.scopus.com/inward/record.url?scp=85048332396&partnerID=8YFLogxK
U2 - 10.1145/3195106.3195154
DO - 10.1145/3195106.3195154
M3 - Conference Proceeding (Non-Journal item)
AN - SCOPUS:85048332396
T3 - ACM International Conference Proceeding Series
SP - 390
EP - 395
BT - Proceedings of 2018 10th International Conference on Machine Learning and Computing, ICMLC 2018
PB - Association for Computing Machinery
Y2 - 26 February 2018 through 28 February 2018
ER -