TY - JOUR
T1 - EGFAFS: A Novel Feature Selection Algorithm Based on Explosion Gravitation Field Algorithm
AU - Huang, Lan
AU - Hu, Xuemei
AU - Wang, Yan
AU - Fu, Yuan
N1 - Funding Information:
This research was funded by the National Natural Science Foundation of China (No. 62072212), the National Key Research and Development Program (No. 2018YFC2001302), and the Development Project of Jilin Province of China (No. 20200403172SF, No. 20210504003GH).
Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/6/25
Y1 - 2022/6/25
N2 - Feature selection (FS) is a vital step in data mining and machine learning, especially when analyzing data in a high-dimensional feature space. Gene expression data usually consist of a few samples characterized by a high-dimensional feature space. As a result, they are not well suited to simple methods such as filter-based methods. In this study, we propose a novel feature selection algorithm based on the Explosion Gravitation Field Algorithm, called EGFAFS. To reduce the feature space to an acceptable number of dimensions, we constructed a recommended feature pool using a series of Random Forests based on the Gini index. Furthermore, by paying more attention to the features in the recommended feature pool, we can find the best subset more efficiently. To verify the performance of EGFAFS for FS, we tested it on eight gene expression datasets and compared it with four heuristic-based FS methods (GA, PSO, SA, and DE) and four other FS methods (Boruta, HSICLasso, DNN-FS, and EGSG). The results show that EGFAFS performs better on gene expression data, in terms of the evaluation metrics, than the other eight FS algorithms. The genes selected by EGFAFS play an essential role in the differential co-expression network and in some biological functions, which further demonstrates the success of EGFAFS in solving FS problems on gene expression data.
AB - Feature selection (FS) is a vital step in data mining and machine learning, especially when analyzing data in a high-dimensional feature space. Gene expression data usually consist of a few samples characterized by a high-dimensional feature space. As a result, they are not well suited to simple methods such as filter-based methods. In this study, we propose a novel feature selection algorithm based on the Explosion Gravitation Field Algorithm, called EGFAFS. To reduce the feature space to an acceptable number of dimensions, we constructed a recommended feature pool using a series of Random Forests based on the Gini index. Furthermore, by paying more attention to the features in the recommended feature pool, we can find the best subset more efficiently. To verify the performance of EGFAFS for FS, we tested it on eight gene expression datasets and compared it with four heuristic-based FS methods (GA, PSO, SA, and DE) and four other FS methods (Boruta, HSICLasso, DNN-FS, and EGSG). The results show that EGFAFS performs better on gene expression data, in terms of the evaluation metrics, than the other eight FS algorithms. The genes selected by EGFAFS play an essential role in the differential co-expression network and in some biological functions, which further demonstrates the success of EGFAFS in solving FS problems on gene expression data.
KW - Explosion Gravitation Field Algorithm
KW - feature selection
KW - gene expression data
KW - heuristic algorithm
UR - http://www.scopus.com/inward/record.url?scp=85133260198&partnerID=8YFLogxK
UR - https://github.com/abcair/EGFAFS
U2 - 10.3390/e24070873
DO - 10.3390/e24070873
M3 - Article
C2 - 35885095
AN - SCOPUS:85133260198
VL - 24
JO - Entropy
JF - Entropy
IS - 7
M1 - 873
ER -