TY - GEN
T1 - A density-based discretization method with inconsistency evaluation
AU - Zhao, Rong
AU - Qu, Yanpeng
AU - Deng, Ansheng
AU - Zwiggelaar, Reyer
N1 - Funding Information:
This work is jointly supported by the National Natural Science Foundation of China (No. 61502068), the China Postdoctoral Science Foundation (No. 2013M541213 and 2015T80239), and the Royal Society International Exchanges Cost Share Award with NSFC (No. IE160875).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/6/8
Y1 - 2018/6/8
N2 - Data used in real-world applications commonly comprise two types: continuous data and discrete data. Continuous data represent a range of values, while discrete data refer to information that shares certain commonality. Because discretized data are general and simple to use, many data mining methods, such as rough set theory and decision trees, are designed to deal with discrete data. Since continuous attributes are abundant in data sets, data discretization is required as an important data processing method. In this paper, a density-based clustering algorithm is used to construct a discretization method. Specifically, to determine a proper number of clusters automatically, the data set is divided into clusters by fast search and find of density peaks. A top-down splitting strategy is then used to discretize the intervals of the attributes. Furthermore, a novel probabilistic inconsistency measure is proposed to evaluate the results of discretization methods. The experimental results demonstrate that the discretization methods selected by the inconsistency measure, which have higher classification accuracy, are better than the other methods. Therefore, the inconsistency measure can be used as an evaluation indicator.
AB - Data used in real-world applications commonly comprise two types: continuous data and discrete data. Continuous data represent a range of values, while discrete data refer to information that shares certain commonality. Because discretized data are general and simple to use, many data mining methods, such as rough set theory and decision trees, are designed to deal with discrete data. Since continuous attributes are abundant in data sets, data discretization is required as an important data processing method. In this paper, a density-based clustering algorithm is used to construct a discretization method. Specifically, to determine a proper number of clusters automatically, the data set is divided into clusters by fast search and find of density peaks. A top-down splitting strategy is then used to discretize the intervals of the attributes. Furthermore, a novel probabilistic inconsistency measure is proposed to evaluate the results of discretization methods. The experimental results demonstrate that the discretization methods selected by the inconsistency measure, which have higher classification accuracy, are better than the other methods. Therefore, the inconsistency measure can be used as an evaluation indicator.
KW - Clustering
KW - Discretization method
KW - Inconsistency measure
UR - http://www.scopus.com/inward/record.url?scp=85049784098&partnerID=8YFLogxK
U2 - 10.1109/ICACI.2018.8377556
DO - 10.1109/ICACI.2018.8377556
M3 - Conference Proceeding (Non-Journal item)
AN - SCOPUS:85049784098
T3 - Proceedings - 2018 10th International Conference on Advanced Computational Intelligence, ICACI 2018
SP - 758
EP - 763
BT - Proceedings - 2018 10th International Conference on Advanced Computational Intelligence, ICACI 2018
PB - IEEE Press
T2 - 10th International Conference on Advanced Computational Intelligence, ICACI 2018
Y2 - 29 March 2018 through 31 March 2018
ER -