Abstract
Data science aims to keep pace with a data-intensive lifestyle and with the demand for decision support, which has become common in domains such as medicine, education, and other smart solutions. High-quality data analysis is therefore essential for accurate and effective downstream use. In data clustering specifically, a large body of work has concentrated on modeling a distance metric and a clustering algorithm under the assumption that the data are complete. This assumption does not always hold, because missing values can occur in the dataset under examination. Instead of filling in these values with an imputation method, a recent study successfully uses consensus clustering to overcome the problem without an explicit imputation procedure. This paper extends that framework to link-based consensus clustering, which provides a more refined summarization of the cluster ensemble and hence of the resulting data partition. The approach exhibits promising performance on several benchmark data collections obtained from the UCI repository.
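The general idea of clustering incomplete data through an ensemble, without imputation, can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the base clusterings or the consensus function, so everything below is an assumption made for illustration. Each base partition uses a single feature split at its median and leaves rows with a missing value unlabelled, pairwise agreement is accumulated in a plain co-association matrix, and the final partition is taken as connected components above a threshold. The link-based refinement of the ensemble summary described in the abstract is not implemented here, and the function names `base_partition`, `co_association`, and `consensus_labels` are hypothetical.

```python
from itertools import combinations
from statistics import median


def base_partition(data, feature):
    """Split rows into two clusters at the median of one feature.

    Rows where that feature is missing (None) stay unlabelled (None),
    so no value is ever imputed.
    """
    observed = [row[feature] for row in data if row[feature] is not None]
    cut = median(observed)
    return [None if row[feature] is None else int(row[feature] > cut)
            for row in data]


def co_association(partitions, n):
    """Fraction of base partitions that place rows i and j together,
    counted only over partitions in which both rows are labelled."""
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        votes = [p[i] == p[j] for p in partitions
                 if p[i] is not None and p[j] is not None]
        sim[i][j] = sim[j][i] = sum(votes) / len(votes) if votes else 0.0
    return sim


def consensus_labels(sim, threshold=0.5):
    """Final partition: connected components of the graph whose edges
    are row pairs with co-association above the threshold (assumed rule)."""
    n = len(sim)
    labels = [None] * n
    current = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (illustrative only): two groups, None marks a missing value.
data = [
    [1.0, 1.1, None],
    [0.9, None, 1.0],
    [None, 1.0, 1.2],
    [5.0, 5.2, None],
    [5.1, None, 4.9],
    [None, 4.8, 5.0],
]
partitions = [base_partition(data, f) for f in range(3)]
sim = co_association(partitions, len(data))
labels = consensus_labels(sim)
```

On this toy data the first three rows fall into one consensus cluster and the last three into another, even though every row has a missing entry and no single feature labels all six rows.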
Original language | English |
---|---|
Article number | 7 |
Number of pages | 9 |
Journal | Data-Enabled Discovery and Applications |
Volume | 3 |
Issue number | 1 |
Publication status | Published - 29 Mar 2019 |
Externally published | Yes |