TY - JOUR
T1 - Feature grouping and selection
T2 - A graph-based approach
AU - Zheng, Ling
AU - Chao, Fei
AU - Mac Parthaláin, Neil
AU - Zhang, Defu
AU - Shen, Qiang
PY - 2021/2/6
Y1 - 2021/2/6
AB - Most current feature selection techniques are focused on the incremental inclusion or exclusion of single individual features with respect to the candidate feature subset(s). The use of such approaches, where only the individual inclusion/exclusion of features is considered, means that information such as the collaborative contribution or correlation between features may be lost. The result is that the final selected feature subset may contain high levels of inter-feature redundancy, assuming that the key information embedded in the original feature set can still be retained. To address this problem, a general framework based on graph processing and three-way mutual information metrics is proposed in this paper that works by clustering similar features into groups, from which representative features are then drawn. Two different feature selection techniques based on this framework are presented: one by straightforward selection of representative features from the resulting feature groups and the other via a music-inspired metaheuristic search. Comparative experimental evaluation against traditional feature selection techniques over a diverse range of 20 benchmark datasets demonstrates the efficacy of the proposed approach. With these implementations, significant performance gains can be made in terms of classification accuracy in general and dimensionality reduction in particular while retaining feature semantics and considerably lessening the redundancy in the returned feature subsets.
KW - Feature grouping
KW - Feature selection
KW - Graph processing
KW - Harmony search
KW - Minimum spanning tree
UR - http://www.scopus.com/inward/record.url?scp=85092446573&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2020.09.022
DO - 10.1016/j.ins.2020.09.022
M3 - Article
AN - SCOPUS:85092446573
SN - 0020-0255
VL - 546
SP - 1256
EP - 1272
JO - Information Sciences
JF - Information Sciences
ER -