Stability Selection using a Genetic Algorithm and Logistic Linear Regression on Healthcare Records

Aleš Zamuda, Christine Zarges, Gregor Stiglic, Goran Hrovat

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding (Non-Journal item)



This paper applies a Genetic Algorithm (GA) to measuring feature importance in machine learning (ML) on a large-scale database. Too many input features may cause over-fitting, so feature selection is desirable. Some ML algorithms have feature selection embedded, e.g., lasso-penalized linear regression or random forests. Others have no such functionality and are sensitive to over-fitting, e.g., unregularized linear regression. The latter algorithms require that suitable features be chosen before learning.

Therefore, we propose a novel stability selection (SS) approach using GA-based feature selection. The proposed SS approach repeatedly applies a GA to a subsample of records and features. Each GA individual is a binary vector marking which features of the subsample are selected. An unregularized logistic linear regression model is then trained and tested on the GA-selected features through cross-validation on the subsample. The GA fitness is the area under the curve (AUC) of this model, which is optimized during a GA run.
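
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the GA operators (binary tournament, uniform crossover, bit-flip mutation, elitism), all parameter values, the function names `run_ga` and `ga_stability_selection`, and the choice to report importance as the share of rounds in which a feature was kept are hypothetical. The fitness is passed in as a callable so that any scoring rule can be plugged in; the toy fitness at the end stands in for a cross-validated AUC.

```python
import random

def run_ga(n_genes, fitness, rng, pop_size=20, generations=15, p_mut=0.05):
    """Evolve binary masks of length n_genes and return the fittest one seen."""
    pop = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best forward
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Uniform crossover, then bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

def ga_stability_selection(n_records, n_features, make_fitness, n_rounds=20,
                           record_frac=0.5, feature_frac=0.5, seed=0):
    """Return, per feature, the share of rounds in which the GA kept it when offered."""
    rng = random.Random(seed)
    kept = [0] * n_features
    offered = [0] * n_features
    for _ in range(n_rounds):
        # Draw a fresh subsample of records and of features for this round.
        rows = rng.sample(range(n_records), max(1, int(record_frac * n_records)))
        cols = rng.sample(range(n_features), max(1, int(feature_frac * n_features)))
        best = run_ga(len(cols), make_fitness(rows, cols), rng)
        for gene, col in zip(best, cols):
            offered[col] += 1
            kept[col] += int(gene)
    return [k / o if o else 0.0 for k, o in zip(kept, offered)]

# Toy fitness (hypothetical): features 0 and 1 are "useful", every other one costs a little.
def toy_fitness(rows, cols):
    return lambda mask: sum((1.0 if c < 2 else -0.5) for g, c in zip(mask, cols) if g)

importance = ga_stability_selection(n_records=100, n_features=8, make_fitness=toy_fitness)
```

In a real run, `make_fitness(rows, cols)` would return a function that fits the model on the subsampled records and columns and reports its cross-validated AUC.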

AUC is assessed with an unregularized logistic regression model on repeatedly subsampled healthcare records from the National (Nationwide) Inpatient Sample (NIS) database, collected under the Healthcare Cost and Utilization Project (HCUP).
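
As a rough illustration of this kind of assessment, the sketch below computes a cross-validated AUC for a given feature mask using a logistic regression fitted without any penalty term. It is written in plain Python so that it is self-contained. The gradient-descent settings, the interleaved 3-fold split, the helper names (`auc`, `fit_logistic`, `predict`, `cv_auc`) and the tiny synthetic dataset are all assumptions made for the example; the HCUP NIS records are not used here.

```python
import math

def auc(y_true, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def predict(w, xi):
    """Sigmoid of the linear score; w[-1] is the intercept."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the log-loss with no regularization term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def cv_auc(X, y, mask, k=3):
    """Mean test AUC over k interleaved folds, using only the masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.5  # no features selected: treat as chance level
    Xs = [[row[j] for j in cols] for row in X]
    fold_aucs = []
    for f in range(k):
        test = set(range(f, len(X), k))
        train = [i for i in range(len(X)) if i not in test]
        w = fit_logistic([Xs[i] for i in train], [y[i] for i in train])
        idx = sorted(test)
        fold_aucs.append(auc([y[i] for i in idx], [predict(w, Xs[i]) for i in idx]))
    return sum(fold_aucs) / k

# Synthetic toy data: feature 0 separates the classes, feature 1 is unrelated noise.
y = [i % 2 for i in range(30)]
X = [[(2 * yi - 1) + ((i * 7) % 5 - 2) * 0.2, ((i * 3) % 7 - 3) / 3.0]
     for i, yi in enumerate(y)]
informative_auc = cv_auc(X, y, [True, False])
```

A function such as `cv_auc` could serve as the fitness of a GA individual, with `mask` being the individual's binary vector.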

Reported results show that averaging the feature importance from the top-4 SS methods and the GA-based SS (GASS) improves these AUC results.
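
One simple way to average feature importance across several methods is sketched below. The abstract does not say how the scores are combined, so the min-max scaling step, the function name `average_importance` and the method names in the example are assumptions made purely for illustration.

```python
def average_importance(importances):
    """Average per-feature importance across methods after min-max scaling each method."""
    scaled = []
    for scores in importances.values():
        lo, hi = min(scores), max(scores)
        span = hi - lo
        # Put each method on a 0..1 scale so that no single method dominates the mean.
        scaled.append([(s - lo) / span if span else 0.0 for s in scores])
    return [sum(col) / len(scaled) for col in zip(*scaled)]

# Hypothetical importance vectors for three features from two methods.
combined = average_importance({
    "method_a": [0.0, 1.0, 0.5],
    "gass": [2.0, 4.0, 2.0],
})
ranking = sorted(range(len(combined)), key=lambda j: combined[j], reverse=True)
```

Here `ranking` lists the feature indices from most to least important under the averaged score.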
Original language: English
Title: GECCO '17
Subtitle: Proceedings of the Genetic and Evolutionary Computation Conference Companion
Place of publication: New York
Publisher: Association for Computing Machinery
Number of pages: 2
ISBN (Print): 978-1-4503-4939-0
Digital Object Identifiers (DOIs)
Status: Published - 15 Jul 2017
Event: GECCO 2017: The Genetic and Evolutionary Computation Conference
Duration: 15 Jul 2017 to 19 Jul 2017


