Machine Learning-Based Stacking Ensemble Model for Prediction of Heart Disease with Explainable AI and K-Fold Cross-Validation: A Symmetric Approach

Sara Qamar Sultan, Nadeem Javaid*, Nabil Alrajeh, Muhammad Aslam

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Heart disease (HD) is one of the most complex and prevalent diseases and among the leading causes of death worldwide. With changes in lifestyle and the environment, its prevalence is rising rapidly. Predicting the disease in its early stages is crucial, as delays in diagnosis can cause serious complications and even death. Machine learning (ML) can be effective in this regard. Many researchers have used different techniques to detect the disease efficiently and to overcome the drawbacks of existing models, and several ensemble models have also been applied. We propose a stacking ensemble model named NCDG, which uses Naive Bayes, Categorical Boosting, and Decision Tree as base learners, with Gradient Boosting serving as the meta-learner. We preprocess the data using a factorization method to convert string columns into integers. We employ the Synthetic Minority Oversampling TEchnique (SMOTE) and BorderLineSMOTE balancing techniques to address class imbalance. Additionally, we implement hard and soft voting using a voting classifier and compare the results with the proposed stacking model. For Artificial Intelligence-based eXplainability of the proposed NCDG model, we use the SHapley Additive exPlanations (SHAP) technique. The results show that NCDG performs better than the existing benchmark techniques: it achieves the highest accuracy, F1-score, precision, and recall, each 0.91, with an execution time of 653 s. Moreover, we use K-Fold Cross-Validation to validate our predicted results. The prediction results and their validation agree closely with each other, which shows our approach to be symmetric.
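
The abstract compares hard and soft voting against the stacking model. As a minimal sketch of how the two voting rules can disagree on the same inputs, the snippet below uses only the Python standard library. The function names `hard_vote` and `soft_vote` are hypothetical, and the probabilities are made-up illustrative values, not results from the paper. It assumes a binary task and a 0.5 decision threshold.

```python
from collections import Counter

def hard_vote(probas, threshold=0.5):
    """Majority vote over each model's thresholded label (binary case)."""
    labels = [1 if p >= threshold else 0 for p in probas]
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas, threshold=0.5):
    """Average the predicted probabilities, then threshold the mean."""
    mean_p = sum(probas) / len(probas)
    return 1 if mean_p >= threshold else 0

# Illustrative (made-up) positive-class probabilities from three base models.
# One model is confident in class 1; two lean weakly toward class 0.
probas = [0.9, 0.4, 0.4]
print(hard_vote(probas))  # 0: two of three models vote for class 0
print(soft_vote(probas))  # 1: the mean probability is about 0.57
```

Hard voting counts each model equally regardless of confidence, while soft voting lets one confident model outweigh two uncertain ones. This is one reason the two schemes can yield different scores on the same data.
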
Original language: English
Article number: 185
Number of pages: 26
Journal: Symmetry
Volume: 17
Issue number: 2
Early online date: 25 Jan 2025
Publication status: Published - 28 Feb 2025

Keywords

  • BorderLineSMOTE
  • heart disease
  • machine learning
  • SHapley Additive exPlanations
  • stacking model
  • VotingClassifier
  • K-Fold Cross-Validation
  • symmetric approach
