Predicting the Risk of Severity and Readmission in Patients with Heart Failure in Indonesia: A Machine Learning Approach
Abstract
Objectives
In Indonesia, the poor prognosis and high hospital readmission rates of patients with heart failure (HF) have yet to receive focused attention. Machine learning (ML) approaches may help address these problems. We aimed to determine which ML models best predict HF severity and hospital readmission and could be used in a patient self-monitoring mobile application.
Methods
In this retrospective cohort study, we collected data on patients admitted with HF to the Siloam Diagram Heart Center in 2020, 2021, and 2022. Data were analyzed using classification methods in the Orange data mining software. Six ML algorithms (artificial neural network [ANN], random forest, gradient boosting, naïve Bayes, decision tree, and logistic regression) were used to predict HF severity and hospital readmission. Model performance was evaluated using the area under the curve (AUC), accuracy, and F1-score.
Results
Of the 543 patients with HF, 3 (0.55%) were excluded due to death on admission. Hospital readmission occurred in 138 of the remaining 540 patients (25.6%). Of the six algorithms tested, ANN performed best in predicting both HF severity (AUC = 1.000, accuracy = 0.998, F1-score = 0.998) and readmission for HF (AUC = 0.998, accuracy = 0.975, F1-score = 0.972). Other studies have reported varying results regarding the best algorithm for predicting hospital readmission in patients with HF.
Conclusions
The ANN algorithm performed best in predicting HF severity and hospital readmissions and will be integrated into a mobile application for patient self-monitoring to prevent readmissions.
I. Introduction
Heart failure (HF) is a complex and potentially fatal clinical syndrome whose signs and symptoms are caused by the heart’s diminished ability to pump blood to the rest of the body. HF results in high medical costs, poor functional capacity and quality of life, and significant morbidity and mortality [1–3]. HF is a global concern, affecting an estimated 64 million people. In 13 European countries, there were an estimated 17 cases of HF for every 1,000 people. In the United States, the prevalence of HF was 2.4% in 2012 and is expected to rise to 3.0% by 2030. Data on HF prevalence in Asia are limited, with estimates ranging from 1.3% to 6.7% [3–5].
Patients with HF have a poor prognosis: 5% to 10% die while hospitalized, 15% within 3 months of hospitalization, and more than 50% within 5 years of their initial hospitalization for HF [6]. Of these patients, 83% are hospitalized at least once and 43% at least four times. Approximately 30% are readmitted within 30 to 60 days of their initial discharge [7]. Despite advances in treatment, the absolute 5-year mortality rate after an HF diagnosis remains approximately 50%. In 2012, the global cost of treating HF exceeded $100 billion [8]. Another study estimated that the annual per-patient cost of HF management in the United States ranged from $10,832 to $17,744, with nationwide expenses of $20.9 billion in 2012 and a projected $53.1 billion by 2030. Hospital readmissions add a significant financial burden to both the healthcare system and patients, a problem that requires increased attention [9].
The Global Burden of Disease Study showed how the disease burden in Indonesia, measured in disability-adjusted life years (DALYs), changed between 1990 and 2016. In 1990, the burden was dominated by diarrheal diseases, part of the group of communicable, maternal, neonatal, and nutritional disorders. By 2006, ischemic heart disease was the leading cause and cerebrovascular disease ranked third; by 2016, they ranked first and second, respectively. Although life expectancy at birth in Indonesia increased by 8 years between 1990 and 2016, this gain was accompanied by a rising prevalence of cardiac and cerebrovascular diseases, which can lead to HF [10].
Machine learning (ML) techniques have been widely used in healthcare, including in the management of conditions such as HF and other cardiovascular diseases, and have proven effective in prediction and classification tasks [11]. By identifying important risk variables, predictive models can flag patients at high risk of hospital readmission and may make it possible to target interventions at those most likely to benefit. Compared with traditional statistical techniques, ML methods can improve predictive performance because they can exploit all available data and the complex relationships within them [12]. Comparing different ML algorithms can also deepen understanding of the models and their real-time applications [13]. In addition, as digital technology and artificial intelligence have advanced, their growing capacity to support clinical judgment has enabled remarkable new capabilities [14].
The objective of this study was to identify the best-performing ML method for predicting HF severity and readmission and to develop recommendations for preventing readmission. Our subsequent goal is to develop a mobile application that helps prevent hospital readmission among patients with HF. Mobile applications can also improve HF management by encouraging patients to adopt preventive strategies [15].
II. Methods
1. Data Collection and Preparation
In this retrospective cohort analysis, we used the 2019 version of the International Classification of Diseases, 10th Revision (ICD-10) codes (specifically I50 for HF) [16] to identify 543 patients with HF admitted to the Siloam Diagram Heart Center in Indonesia from January 2020 to December 2022. Each patient was followed for readmission for 1 year after each admission. Three patients died on admission and were excluded. Data were extracted from medical records using a checklist of 83 elements covering the patient’s identity and history and the clinical examinations performed by a cardiologist and hospital staff. The checklist included demographic variables, past medical history, medical history for the current admission, comorbidities, procedures or implant placement, signs and symptoms, physical examination, and laboratory results. Thirteen elements with little or no available data were removed, leaving 70. Because patient medical records at this hospital were not fully computerized, data were gathered either automatically from electronic medical records or manually from paper records. During the follow-up period, HF severity and the intervals between unplanned hospital readmissions for HF were calculated. The Research and Community Service Ethics Committee granted ethical approval for this study (No. Ket-592/UN2.F10. D11/PPM.00.02/2023).
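The cohort selection and checklist pruning described above can be sketched in pandas. This is a minimal illustration, assuming a hypothetical one-row-per-admission table with illustrative column names (`icd10`, `died_on_admission`, `lvef`, `bnp`) and a hypothetical rule that drops any element missing in more than 80% of records; the study does not report the exact cutoff it used for "little or no data."

```python
import pandas as pd

# Hypothetical extract: one row per admission; column names are illustrative only.
admissions = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "icd10": ["I50.0", "I50.9", "I21.4", "I50.1", "I50.0"],
    "died_on_admission": [False, False, False, True, False],
    "lvef": [35.0, 55.0, 40.0, None, 28.0],
    "bnp": [None, None, None, 450.0, None],
})

def select_cohort(df, max_missing=0.8):
    """Keep HF admissions (ICD-10 I50.*), drop deaths on admission, and
    drop checklist elements whose share of missing values exceeds max_missing
    (0.8 is an assumed threshold)."""
    hf = df[df["icd10"].str.startswith("I50")]
    hf = hf[~hf["died_on_admission"]]
    missing_share = hf.isna().mean()            # fraction missing per column
    sparse = missing_share[missing_share > max_missing].index
    return hf.drop(columns=sparse)

cohort = select_cohort(admissions)
print(cohort)
```

Here patient 3 is dropped for a non-HF code, patient 4 for death on admission, and the `bnp` column because it is empty for every remaining admission.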
2. Predictive Variables
Because the Indonesian health system does not focus specifically on HF, no baseline clinical data were available on which variables predict the severity and readmission rates of patients with HF in Indonesia. We therefore conducted a systematic literature review before the study to identify the necessary variables. We chose ML because of the relatively large number of variables. We included elements commonly used in other studies and grouped them, using a rule-based approach, into eight categories: (1) demographics and socioeconomic profile (age, gender, marital status, payer status, home address, living partner, education, and occupation); (2) medical history (family history of HF, time since HF diagnosis, intensive care unit/high care unit [ICU/HCU] stays during the current admission, total hospital length of stay [LOS], number of admissions for HF in the past year, number of emergency department [ED] visits for HF in the past year, and adherence to monthly outpatient visits after discharge); (3) habits (smoking, alcohol use, dietary salt intake, and drug use); (4) medication therapy (angiotensin-converting enzyme inhibitors/angiotensin II receptor blockers/angiotensin receptor-neprilysin inhibitors, diuretics, beta blockers, and spironolactone); (5) signs, symptoms, and physical assessment (shortness of breath, edema, systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, O2 saturation, jugular venous pressure, left ventricular ejection fraction [LVEF], New York Heart Association [NYHA] class, and body mass index [BMI]); (6) laboratory results (hemoglobin, hematocrit, white blood cell count, glucose, sodium, potassium, blood urea nitrogen, urea, creatinine, and glomerular filtration rate [GFR]); (7) comorbidities (anemia, aortic valve disorder, asthma, cancer, cardiac arrhythmia, cerebrovascular accident/stroke, coronary artery disease [CAD], chronic obstructive pulmonary disease, dementia, depression, diabetes mellitus, dyslipidemia, hypertension, liver disease, lung disease, protein-calorie malnutrition, psychiatric disorder, renal disease, rheumatic disease, thyroid disease, and vascular disease); and (8) medical procedures or implant history (cancer-related procedures, cardiac devices, cardiac operations, coronary angioplasty, and mechanical ventilation devices).
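A rule-based grouping of variables into the eight categories can be expressed as a simple lookup table. The sketch below is hypothetical: the variable identifiers are illustrative abbreviations, and only a subset of the 70 variables is listed, so it shows the mechanism rather than the study's actual rule set.

```python
# Hypothetical rule table: a subset of variables mapped to the eight categories.
CATEGORY_RULES = {
    "demographics_socioeconomic": {"age", "gender", "marital_status", "payer_status", "education"},
    "medical_history": {"time_since_hf_diagnosis", "icu_hcu_stay", "los_days",
                        "admissions_past_year", "ed_visits_past_year"},
    "habits": {"smoking", "alcohol", "salt_intake", "drug_use"},
    "medication": {"acei_arb_arni", "diuretic", "beta_blocker", "spironolactone"},
    "signs_symptoms_physical": {"dyspnea", "edema", "sbp", "dbp", "heart_rate",
                                "lvef", "nyha", "bmi"},
    "lab_results": {"hemoglobin", "sodium", "potassium", "bun", "creatinine", "gfr"},
    "comorbidities": {"anemia", "cad", "diabetes", "dyslipidemia", "hypertension"},
    "procedures_implants": {"cardiac_device", "cardiac_operation", "coronary_angioplasty"},
}

def categorize(variables):
    """Assign each variable to its category; collect variables no rule covers."""
    lookup = {var: cat for cat, members in CATEGORY_RULES.items() for var in members}
    grouped = {cat: [] for cat in CATEGORY_RULES}
    unassigned = []
    for var in variables:
        cat = lookup.get(var)
        if cat is None:
            unassigned.append(var)
        else:
            grouped[cat].append(var)
    return grouped, unassigned

grouped, unassigned = categorize(["age", "creatinine", "lvef", "foo"])
```

Returning the unassigned variables makes gaps in the rule table visible instead of silently dropping them.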
3. Machine Learning Process
The procedure comprised four main steps: data preparation, model selection, model training, and performance evaluation (Figure 1). During data pre-processing, we used a rule-based technique to define two targets: HF severity and readmission for HF. The HF severity index combined several variables: LVEF, NYHA class, creatinine level, and history of CAD. The readmission index was based on the number of hospital admissions and the average interval between admissions. Analyses were performed in the Orange data mining software (version 3.36). A systematic literature review conducted before this study identified some of the best-performing ML algorithms, and we tested six of them: logistic regression, decision tree, random forest, gradient boosting, naïve Bayes, and neural network.
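The study does not publish the rules behind its two target indices, so the following is only a hypothetical sketch of how a rule-based severity label (from LVEF, NYHA class, creatinine, and CAD history) and a readmission label (from the number of admissions and the mean interval between them) might be derived. Every threshold here (LVEF < 40%, NYHA III-IV, creatinine > 1.5 mg/dL, a 2-point cutoff, and a 180-day mean interval) is an assumption chosen for illustration.

```python
from statistics import mean

def severity_label(lvef, nyha_class, creatinine, has_cad):
    """Hypothetical points-based HF severity rule (thresholds are illustrative)."""
    points = (
        int(lvef < 40)            # reduced LVEF (%)
        + int(nyha_class >= 3)    # NYHA class III or IV
        + int(creatinine > 1.5)   # creatinine in mg/dL
        + int(bool(has_cad))      # history of coronary artery disease
    )
    return "severe" if points >= 2 else "non-severe"

def readmission_label(admission_days, max_mean_interval=180):
    """Hypothetical readmission rule; admission_days are days since cohort entry."""
    if len(admission_days) < 2:
        return "no readmission"
    days = sorted(admission_days)
    intervals = [later - earlier for earlier, later in zip(days, days[1:])]
    return "readmission" if mean(intervals) <= max_mean_interval else "no readmission"

print(severity_label(30, 3, 1.2, False))   # two points: low LVEF and NYHA III
print(readmission_label([10, 70, 100]))    # mean interval of 45 days
```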
4. Machine Learning Algorithms
The classification approach incorporated multiple ML models. Logistic regression is a widely used technique for binary classification tasks and has strong diagnostic ability [17]. Despite its extensive use in medical diagnostics, it has several drawbacks. First, because it assumes a linear relationship between the predictors and the log-odds of the outcome, it may perform poorly on nonlinear data. Second, outliers in the data can strongly affect its reliability. Third, it is prone to overfitting when sample sizes are small, and because its coefficients are point estimates, their reliability is uncertain. A decision tree is a classifier built by repeatedly splitting the training set on features to predict class labels [18]. This supervised method divides data into subgroups according to specific criteria and can be used for both regression and classification. It provides a graphical depiction of every possible outcome as a tree of conditions, in which each condition determines the next branch taken [19]. The random forest method, an ensemble classifier, trains several weak classifiers by repeatedly sampling predictors and observations at random [20]; it builds multiple decision trees that are used to classify observations and to identify variables that are important for prediction. Gradient boosting combines weaker predictors into an integrated, weighted ensemble [12]. Boosting algorithms iteratively combine weak learners (i.e., learners that are only marginally better than random chance) into a strong learner; gradient boosting extends this idea by fitting each new learner to the gradient of a loss function, which for squared-error loss is simply the residual of the current ensemble.
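To make the boosting idea concrete, the sketch below implements gradient boosting for squared-error loss from scratch, using scikit-learn depth-1 regression trees (stumps) as weak learners. It is a minimal illustration of the technique, not the gradient boosting implementation used in Orange; the number of rounds, the learning rate, and the toy sine-wave data are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Minimal gradient boosting for squared-error loss using depth-1 trees (stumps)."""
    y = np.asarray(y, dtype=float)
    base = y.mean()                              # start from a constant prediction
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred                     # negative gradient of 0.5 * (y - pred) ** 2
        stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)  # weak learner
        pred += learning_rate * stump.predict(X)                      # shrunken update
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_gradient_boosting(model, X):
    """Sum the constant start and every shrunken stump prediction."""
    base, learning_rate, stumps = model
    pred = np.full(len(X), base)
    for stump in stumps:
        pred += learning_rate * stump.predict(X)
    return pred

# Toy usage on a nonlinear 1-D signal (synthetic data)
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.sin(6.0 * X[:, 0])
model = fit_gradient_boosting(X, y)
baseline_mse = np.mean((y - y.mean()) ** 2)
boosted_mse = np.mean((y - predict_gradient_boosting(model, X)) ** 2)
print(baseline_mse, boosted_mse)
```

Each round fits a stump to the current residuals and adds a shrunken copy of its predictions, so the training error falls below that of the constant baseline.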
Extreme gradient boosting (XGBoost) is a highly scalable decision tree ensemble based on gradient boosting. Like gradient boosting, XGBoost minimizes a loss function through an additive expansion of the objective function; it also adds a regularization term to control tree complexity, and its base classifiers are decision trees [18]. It trains weak classifiers one after another, each correcting the errors of the previous prediction, until no further improvement is obtained. Naïve Bayes models are generative models that assume that the features, or “events,” are conditionally independent given the class. Although they are among the most basic models in ML, naïve Bayes classifiers still perform admirably in practical applications [21]. An artificial neural network (ANN) is modeled after biological learning systems, which consist of interconnected nerve cells called neurons. A previously reported ANN model offered several benefits, including significantly improved prediction of hospital readmission risk; because it was based on real-time data from the electronic health record, it could be applied at the time of hospital discharge, and it was also compact and not susceptible to model drift [22]. Each of these approaches performs differently depending on the data; therefore, no single approach is universally best [9]. We compared our ANN results with those of other studies, including one that evaluated deep uncertainty networks (DUNs) against logistic regression, gradient boosting, and maxout networks using 10-fold cross-validation [23] and one that compared ML algorithms with traditional statistical models [24].
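The six algorithms above were run in Orange; as a rough analogue, the sketch below compares scikit-learn counterparts of them under stratified 10-fold cross-validation, averaging AUC, accuracy, and F1-score across folds. The mapping from Orange widgets to scikit-learn estimators, the model settings, the resampling scheme, and the synthetic data generated with `make_classification` are all assumptions for illustration, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed scikit-learn counterparts of the six algorithms (near-default settings).
MODELS = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0),
    ),
}
METRICS = ("roc_auc", "accuracy", "f1")

def compare_models(X, y, n_splits=10):
    """Mean AUC, accuracy, and F1-score of each model over stratified k folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    summary = {}
    for name, model in MODELS.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=METRICS)
        summary[name] = {m: float(scores[f"test_{m}"].mean()) for m in METRICS}
    return summary

# Synthetic stand-in with the study's dimensions (540 patients, 70 variables, ~26% positives).
X, y = make_classification(
    n_samples=540, n_features=70, n_informative=10, weights=[0.74], random_state=0
)
results = compare_models(X, y)
for name, s in sorted(results.items(), key=lambda kv: kv[1]["roc_auc"], reverse=True):
    print(f"{name:20s} AUC={s['roc_auc']:.3f}  acc={s['accuracy']:.3f}  F1={s['f1']:.3f}")
```

Scaling is wrapped in a pipeline for the logistic regression and neural network so that each fold's scaler is fitted only on that fold's training data.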
5. Statistical Analysis and Performance Criteria
The performance of the ML algorithms was assessed using the area under the curve (AUC), accuracy, and F1-score. Predictive models were built over multiple runs, and the average value of each evaluation criterion was computed. The algorithms compared were logistic regression, decision tree, random forest, gradient boosting, naïve Bayes, and neural network. Accuracy was the proportion of correctly predicted instances among all instances. The F1-score was the harmonic mean of precision and recall. The AUC was the area under the receiver operating characteristic (ROC) curve, which reflects the model’s ability to distinguish between classes.
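The three evaluation criteria can be computed directly from their definitions. In the sketch below, AUC uses its rank-based interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The toy labels and scores are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Proportion of correctly predicted instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: true labels, model scores, and labels thresholded at 0.5
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
y_pred = [int(s >= 0.5) for s in scores]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

For this toy data, accuracy is 5/6, the F1-score is 0.8 (precision 1.0, recall 2/3), and AUC is 8/9, because one positive case (score 0.35) ranks below one negative case (score 0.4).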
III. Results
1. Baseline Data and Variable Classification Results
The characteristics of the patients with HF are shown in Table 1 according to the eight categories of variables.
2. Statistical Data Results
Of the 540 patients with HF, 138 (25.6%) had at least one hospital readmission. Table 2 shows the clinical characteristics of the patients with HF.
At baseline, the mean age of patients was 61.7 ± 11.8 years, the mean time since HF diagnosis was 2.1 ± 1.7 years, the mean hospital LOS was 3.3 ± 1.8 days, the mean number of admissions in the past year was 0.5 ± 0.7, and the mean number of emergency department visits was 0.6 ± 1.0.
Vital signs and clinical measurements included a mean systolic blood pressure of 131.3 ± 24.8 mmHg, diastolic blood pressure of 79.4 ± 15.8 mmHg, heart rate of 86.0 ± 17.8 beats per minute, respiratory rate of 20.3 ± 2.8 breaths per minute, and oxygen saturation of 96.6% ± 2.7%. The mean LVEF was 38.8% ± 14.3%, and the mean BMI was 25.7 ± 4.8 kg/m².
The mean laboratory values were as follows: hemoglobin, 12.7 ± 2.0 g/dL; hematocrit, 38.4% ± 5.7%; white blood cell count, 8.6 ± 3.2 × 10³/μL; glucose, 147.7 ± 59.1 mg/dL; sodium, 137.1 ± 4.5 mmol/L; potassium, 3.9 ± 0.7 mmol/L; blood urea nitrogen, 23.4 ± 14.3 mg/dL; urea, 50.0 ± 30.6 mg/dL; creatinine, 1.5 ± 1.1 mg/dL; and estimated glomerular filtration rate, 59.5 ± 22.1 mL/min/1.73 m².
3. Machine Learning Performance Evaluation
Model performance is shown in Tables 3 and 4.
For the HF severity index, the neural network performed best on all three measures: AUC, accuracy, and F1-score. The neural network provided the highest AUC (1.000), and its accuracy and F1-score were both approximately 1.000. Naïve Bayes had the lowest AUC, accuracy, and F1-score.
For the readmission index, the neural network performed best on all three measures (AUC, accuracy, and F1-score). The neural network and random forest produced the highest AUC (0.988), followed by logistic regression and the decision tree. The neural network also had the highest accuracy and F1-score. Naïve Bayes had the lowest AUC, accuracy, and F1-score.
Feature importance was computed both for the 70 individual variables and for the eight variable categories.
The three most important variables for predicting HF severity were creatinine level, LVEF, and CAD. The three most important variables for predicting hospital readmission were the number of admissions for HF within 1 year, dyslipidemia, and hospital LOS (Figure 2).
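The study does not state how feature importance was computed. One common model-agnostic option is permutation importance, which measures how much a metric (here, AUC) drops when one variable's values are shuffled on held-out data. The sketch below applies scikit-learn's `permutation_importance` to a random forest on synthetic data, with variable names borrowed from the readmission results; the data-generating rule, in which prior admissions drive the outcome most strongly, is an assumption and not study data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 540
feature_names = ["admissions_past_year", "dyslipidemia", "los_days", "noise"]
X = np.column_stack([
    rng.poisson(0.5, n),          # prior HF admissions (count)
    rng.integers(0, 2, n),        # dyslipidemia (0/1)
    rng.normal(3.3, 1.8, n),      # length of stay in days
    rng.normal(0.0, 1.0, n),      # pure noise, unrelated to the outcome
])
# Hypothetical outcome model: prior admissions have the strongest effect.
logit = -1.5 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * (X[:, 2] - 3.3)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=20, random_state=0
)
ranking = sorted(zip(feature_names, result.importances_mean), key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name:22s} {score:.3f}")
```

Computing importance on held-out data, rather than the training set, avoids rewarding variables the forest merely memorized.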
Among the eight categories, the top three predictors of HF severity were laboratory results, comorbidities, and medical history, and the top three predictors of hospital readmission were physical assessment, comorbidities, and medical history (Figure 3).
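Category-level importance can be obtained either by retraining on categorized features or by aggregating variable-level scores. The sketch below shows the latter under a stated assumption: a category's importance is taken as the sum of its variables' importance scores. The scores and the variable-to-category mapping are illustrative, not the study's.

```python
from collections import defaultdict

def category_importance(variable_importance, variable_to_category):
    """Sum variable-level importance within each category (assumed aggregation rule)."""
    totals = defaultdict(float)
    for var, score in variable_importance.items():
        totals[variable_to_category.get(var, "uncategorized")] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative numbers only (not the study's importance scores)
importance = {"creatinine": 0.30, "gfr": 0.10, "lvef": 0.25, "nyha": 0.05,
              "cad": 0.20, "time_since_hf_diagnosis": 0.10}
mapping = {"creatinine": "lab_results", "gfr": "lab_results",
           "lvef": "signs_symptoms_physical", "nyha": "signs_symptoms_physical",
           "cad": "comorbidities", "time_since_hf_diagnosis": "medical_history"}
ranked = category_importance(importance, mapping)
print(ranked)
```

Summing favors categories that contain many variables; permuting all variables in a category jointly is an alternative that measures each category's contribution as a single unit.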
When AUC scores were compared, our models tended to have higher AUCs than those of Golas et al. [23] for similar algorithms. Our logistic regression model achieved an AUC of 0.992, compared with 0.664 ± 0.015 in the study by Golas et al. [23], and our gradient boosting model produced an AUC of 0.958, compared with 0.650 ± 0.011 in that study (Table 5).
IV. Discussion
Many studies worldwide have sought to predict readmission in patients with HF, with varying study characteristics and results. This study aimed to identify the best ML method for predicting both HF severity and the risk of hospital readmission in patients with HF. For both tasks, the neural network algorithm performed best (AUC = 1.000 and 0.998, respectively), exceeding the AUC found in our systematic literature review (0.930). Other algorithms also performed well (AUC > 0.900): the decision tree, random forest, and gradient boosting for HF severity prediction, and the decision tree, random forest, and logistic regression for readmission prediction.
Golas et al. [23] compared various ML methods for predicting readmission in patients with HF, focusing on the performance of DUNs relative to other techniques such as logistic regression, gradient boosting, and maxout networks. Evaluated with 10-fold cross-validation, DUNs achieved the highest mean AUC (0.705), outperforming maxout networks (0.695), logistic regression (0.664), and gradient boosting (0.650) [23]. Frizzell et al. [24] compared the efficacy of ML algorithms with that of traditional statistical models in predicting 30-day readmission for HF. Analyzing 56,477 patients from the Get With The Guidelines-Heart Failure (GWTG-HF) dataset, they found that ML models such as tree-augmented naïve Bayes, random forest, and gradient-boosted models did not significantly outperform logistic regression or least absolute shrinkage and selection operator (LASSO) models [24]. The differences between the outcomes of these studies and the present study suggest that the most effective ML method can vary across populations. Further analysis is therefore needed to improve the predictive performance of these models in a way that generalizes to broader populations.
In analyzing the predictive importance of the eight feature categories, we found that the three most important categories for predicting HF severity were laboratory results, comorbidities, and medical history, whereas those for predicting readmission were physical assessment, comorbidities, and medical history. We also evaluated all 70 original variables before categorization. The three most important variables for predicting severity were creatinine level, LVEF, and a diagnosis of CAD, and those for predicting readmission were the number of admissions for HF in the past year, dyslipidemia, and hospital LOS. By comparison, Frizzell et al. [24] found that low LVEF, ischemic heart disease, and hospital LOS were significant variables in their logistic regression model, whereas admission history was not. These results will inform further research and the development of tools that aid in the management of HF.
This study had several limitations. First, the data were obtained from a single center, the Siloam Diagram Heart Center, which is still developing its inpatient electronic medical records; manual data collection could have affected the results, and missing data could have affected the performance of the ML methods. Second, some clinically specific variables needed for prediction (e.g., specific physical examination results, laboratory results, or variations in medication cost coverage) were not fully available. Nevertheless, our research showed that ML techniques are effective at predicting HF severity and hospital readmission. These results can help identify patients at high risk of severe HF and hospital readmission for HF. Raising physician awareness of the benefits of early HF management can lower costs and improve quality of life.
In conclusion, this study showed that the ANN outperformed other ML models in predicting HF severity and hospital readmission in patients with HF. This model will therefore be integrated into a self-monitoring mobile application for patients with HF. These findings may also help researchers design larger-scale studies that use the best-performing ML methods to benefit individuals with HF.
Acknowledgments
This paper is part of the first author’s PhD thesis in Public Health and was supported by Universitas Indonesia (grant number NKB-213/UN2.RST/HKP.05.00/2023).
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.