Healthc Inform Res > Volume 29(3); 2023 > Article
Predictor attribute | Value^a (male) | Value^a (female) |
---|---|---|
Sex | 5,505 | 1,332 |
Age (yr) | 58.18 | 61.27 |
Erythrocytes (million cells/μL) | 4.3–5.6 | 3.9–5.1 |
Hematocrit (%) | 41–50 | 36–44 |
Hemoglobin (g/dL) | 13–17 | 12–15 |
MCH (pg) | 27.5–33.2 | |
MCHC (g/dL) | 32–36 | |
Leukocytes (10³/μL) | 3.5–10.5 | |
Thrombocytes (10³/μL) | 135–317 | 157–371 |
Diagnosis code | | |
Number of patients with AHD | 4,702 | |
Number of patients with no AHD | 2,135 | |

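The counts and ranges in the table above can be held in a small data structure, which makes the class balance easy to check. The following is a minimal sketch, not the authors' code: the names `SEX_COUNTS`, `DIAGNOSIS_COUNTS`, `class_shares`, and `out_of_range` are hypothetical, and treating the tabulated ranges as normal reference ranges is an assumption.

```python
# Counts transcribed from the table above.
SEX_COUNTS = {"male": 5505, "female": 1332}
DIAGNOSIS_COUNTS = {"AHD": 4702, "no AHD": 2135}

# Ranges transcribed from the table above. Reading them as normal
# reference ranges is an assumption made for this illustration.
# Units: erythrocytes in million cells/uL, hematocrit in %,
# hemoglobin and MCHC in g/dL, MCH in pg,
# leukocytes and thrombocytes in 10^3/uL.
SEX_SPECIFIC_RANGES = {
    "erythrocytes": {"male": (4.3, 5.6), "female": (3.9, 5.1)},
    "hematocrit": {"male": (41, 50), "female": (36, 44)},
    "hemoglobin": {"male": (13, 17), "female": (12, 15)},
    "thrombocytes": {"male": (135, 317), "female": (157, 371)},
}
SHARED_RANGES = {
    "mch": (27.5, 33.2),
    "mchc": (32, 36),
    "leukocytes": (3.5, 10.5),
}


def class_shares(counts):
    """Return each label's share of the total count."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def out_of_range(sex, values):
    """Return the attribute names whose value lies outside the tabulated range."""
    flagged = []
    for name, value in values.items():
        if name in SEX_SPECIFIC_RANGES:
            low, high = SEX_SPECIFIC_RANGES[name][sex]
        else:
            low, high = SHARED_RANGES[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


# Both breakdowns describe the same 6,837 patients.
assert sum(SEX_COUNTS.values()) == sum(DIAGNOSIS_COUNTS.values()) == 6837

for label, share in class_shares(DIAGNOSIS_COUNTS).items():
    print(f"{label}: {share:.1%}")

# Hypothetical patient values, for illustration only.
print(out_of_range("female", {"hemoglobin": 11.2, "mch": 30.0}))
```

Under these counts roughly 69% of patients carry an AHD diagnosis, so a model that always predicts AHD would already reach about that accuracy; this is worth keeping in mind when reading accuracy figures on this dataset.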
Study | ML technique | Dataset | Result and limitation |
---|---|---|---|
Almustafa [20] | NB, SGD, SVM, KNN, DT, AdaBoost | 1,025 patient records from the Cleveland, Hungary, Switzerland, and Long Beach datasets. 14 attributes were used: age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak, slope of the peak exercise ST segment, number of major vessels (fluoroscopy), and defect, along with the class attribute. | The classification algorithms produced very promising classification accuracy on the heart disease dataset. In-depth sensitivity and performance analyses were not performed. |
Park et al. [9] | LRM, CART, CIT, RF | 3,302 patient records from two cohorts (Soonchunhyang University Cheonan Hospital and Kangbuk Samsung Health Study). Attributes were HTN (hypertension), DM (diabetes mellitus), eGFR (estimated glomerular filtration rate), BMI (body mass index), non-HDL (non-high-density lipoprotein) cholesterol, and CACS (coronary artery calcification score). | All models showed acceptable accuracies: LR (70.71%), CART (71.32%), CIT (71.32%), RF (71.02%). The cohorts used in this study had previously been enrolled in other studies, which could introduce bias. |
Su et al. [21] | RF, LR | 498 subjects from a study conducted at Xi’an Medical University. The risk of developing CVD was predicted from the individual’s age, BMI, triglycerides, and diastolic blood pressure (DBP). | The ROC-AUCs were 0.802 for the random forest model and 0.843 for the LR model. A retrospective study with a small number of subjects (n = 498). |
Budholiya et al. [15] | XGBoost, RF, ExtraTree classifiers | The Cleveland Heart Disease dataset obtained from the University of California, Irvine (UCI) online machine learning and data mining repository. Attributes were age, sex, chest pain, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiograph results, maximum heart rate achieved, exercise-induced angina, ST-depression, ST-slope, number of major vessels, thalassemia, and num (target variable). | XGBoost achieved the highest prediction accuracy, 91.8%. The study did not apply interpretable ML methods to understand and explain the prediction results. |
Cao et al. [22] | LR, BP neural network, XGBoost, RF | 553 patients in the Department of Cardiology at a tertiary hospital in Anhui Province. Clinical data sources include patients’ general data, cardiac ultrasound recordings, and laboratory examination results. | The XGBoost model’s prediction value was the best. A retrospective study with a small number of subjects (n = 553). |
Absar et al. [19] | RF, DT, AdaBoost, KNN | The Cleveland Heart Disease dataset obtained from the University of California, Irvine (UCI) online machine learning and data mining repository. Attributes were age, sex, chest pain type, blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate, oldpeak, slope of the peak exercise ST segment, number of major vessels, exercise-induced angina, and Thalach. | AdaBoost achieved the highest prediction accuracy, 100%. The study did not apply interpretable ML methods to understand and explain the prediction results. |
Mahesh et al. [23] | NB, DT, AdaBoost | The Heart Disease dataset from the UCI repository. Attributes were age, sex, chest pain, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiograph, maximum heart rate achieved, exercise-induced angina, oldpeak, slope, major vessels colored by fluoroscopy, and defect type. | The AdaBoost-RF classifier provided 95.47% accuracy in the early detection of heart disease. The study did not apply interpretable ML methods to understand and explain the prediction results. |
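The studies in the table above report either classification accuracy or ROC-AUC, and the two are not directly comparable. As a minimal sketch of how each is computed from labels and model scores, the pure-Python functions below may help; `accuracy` and `roc_auc` are hypothetical helper names, and the labels and scores are made-up toy values rather than data from any of the cited studies.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = disease, 0 = no disease.
y_true = [1, 1, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.7, 0.6, 0.55, 0.2]

# Accuracy depends on a decision threshold; 0.5 is an arbitrary choice here.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy at threshold 0.5: {accuracy(y_true, y_pred):.3f}")
print(f"ROC-AUC: {roc_auc(y_true, scores):.3f}")

# A constant "always positive" model looks acceptable on accuracy
# when positives dominate, but its ROC-AUC is only 0.5.
always_pos = [1] * len(y_true)
print(f"majority baseline accuracy: {accuracy(y_true, always_pos):.3f}")
print(f"majority baseline ROC-AUC: {roc_auc(y_true, [1.0] * len(y_true)):.3f}")
```

Accuracy depends on the chosen threshold and on class balance, whereas ROC-AUC summarizes ranking quality across all thresholds, which is one reason the accuracy figures and the ROC-AUC figures in the table should be read separately.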