The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities.
The data were extracted from the 2014 National Inpatient Sample (NIS) of the Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Diseases-7 (KCD-7; code L89*). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables.
The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 days. Other predictors were the presence of an infectious wound dressing, followed by having fewer than 3.5 diagnoses and the presence of a simple dressing. Among diagnoses, “injuries to the hip and thigh” was the top predictor, ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7.
These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data.
Pressure ulcers (PUs) are defined as “localized injuries of the skin and/or underlying tissue over a bony prominence due to pressure and shear” [
In Korea, long-term care facilities have rapidly increased since the implementation in 2008 of long-term care insurance, a social insurance for Korean older people, which provides physical and social support for older people and helps relieve caregivers' financial burdens [
To reduce the occurrence of PUs, prevention is emphasized more than treatment is [
With the widespread use of electronic health records, big data are digitally collected and stored across healthcare sectors [
However, most big data sets, such as the HIRA NPS, are complex and are not collected for research purposes. This makes them difficult to manage with common and traditional tools and methods [
Data mining is the process of selecting, exploring, and modeling large amounts of data to uncover events and characteristics for accurate predictions of future data [
Therefore, the purpose of this study was to explore the factors associated with PUs among elderly patients admitted to Korean long-term care facilities according to the 2014 Health Insurance Review and Assessment Service National Inpatient Sample (HIRA NIS) using decision tree analysis.
Data were extracted from the 2014 NIS provided by HIRA (HIRA-NIS-2014-0071). HIRA claims data are national data collected from healthcare providers all over the country for reimbursements for healthcare services [
We first cleaned and preprocessed the data. An expert data manager who had skills to deal with big data analytics tools, such as Apache Hadoop and machine learning, was involved in this process. Preprocessing included initial filtering based on inclusion criteria and research interests, variable transformation and binarization, and table joining.
The HIRA NIS data consists of 5 tables: general specifications, health services, diagnosis information, outpatient prescription information, and health service provider information [
Variables that were not useful were excluded after an initial review; among healthcare services, only medication/prescription, injection/procedure, and treatment/operation were included. Variables whose raw values were too granular to use without classification were categorized, and each category was dichotomized into a new variable. For example, drug codes in a medication variable were categorized into 34 mid-classes according to Korean drug classification [
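The categorize-then-dichotomize step described above can be illustrated with a minimal Python sketch. It is not the MapReduce or R code used in the study: the drug codes, the class labels, and the function name `dichotomize` are hypothetical and stand in for the actual Korean drug classification.

```python
# Hypothetical mapping from raw drug codes to mid-class labels.
# Illustrative only; these are not real Korean drug classification codes.
DRUG_TO_CLASS = {
    "D001": "analgesic",
    "D002": "analgesic",
    "D003": "antibiotic",
}


def dichotomize(records, mapping):
    """Turn (patient_id, raw_code) rows into one 0/1 indicator per category."""
    categories = sorted(set(mapping.values()))
    flags = {}
    for patient_id, code in records:
        # Every patient gets a row with all indicators initialised to 0.
        row = flags.setdefault(patient_id, dict.fromkeys(categories, 0))
        category = mapping.get(code)
        if category is not None:  # unmapped codes are ignored in this sketch
            row[category] = 1
    return flags


if __name__ == "__main__":
    rows = [("p1", "D001"), ("p1", "D003"), ("p2", "D002")]
    print(dichotomize(rows, DRUG_TO_CLASS))
```

Each category becomes its own binary variable per patient, which is one way a single coded column can expand into many model inputs.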
Initial machine learning was performed considering all selected variables. Based on the analysis results, a few meaningless variables were removed to improve interpretation and performance.
The outcome predicted by the decision tree model was the prevalence of PUs as defined by the KCD-7 (code L89*) [
A decision tree was generated using R 3.3.1 to visualize associated factors and explore the patients most at risk of PUs. A decision tree can help to identify sub-populations with and without PUs through easily interpreted grouping rules [
The 10-fold cross-validation method was used to minimize the bias associated with the random sampling of the training data. In the 10-fold cross-validation, the data set was divided into 10 parts, and then 9 parts were used for training and 1 part was used for testing. The process was then repeated until all parts were tested. The goal of this process was to determine which data mining algorithm performs best so we could use it to generate our target predictive model [
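The fold rotation described above can be sketched as follows. This is a minimal Python illustration of k-fold splitting, not the R procedure used in the study; the function name `k_fold_indices` and the fixed seed are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # The remaining k-1 folds form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(20, k=10):
        print(len(train), len(test))
```

A model would be fitted on each `train` set and scored on the matching `test` set, with the scores averaged over the 10 rounds.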
Three performance measures were used to evaluate the models: accuracy, sensitivity, and specificity. Accuracy is the ability to differentiate between patient and healthy cases correctly. Sensitivity is the ability to identify patient cases correctly. Specificity is the ability to identify healthy cases correctly [
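As a worked illustration of these three measures, the sketch below computes them from true and predicted labels. It assumes labels are coded 1 for patient (PU) cases and 0 for healthy cases and that both classes are present; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Assumes 1 = patient case, 0 = healthy case, and that y_true contains
    at least one case of each class (otherwise a division by zero occurs).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(classification_metrics(truth, preds))
```

In this toy example, 3 of 4 patient cases and 4 of 6 healthy cases are classified correctly, so sensitivity is 3/4, specificity is 4/6, and accuracy is 7/10.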
Statistical analysis examined associations between PUs and predictors identified by decision tree analysis: chi-square analysis for categorical variables and independent
This study was a secondary analysis using HIRA NIS. The NIS data obtained from HIRA were de-identified and did not contain any patient-specific information. Furthermore, this study was reviewed and exempted by the Kyungpook National University ethical committee (IRB No. 2016-0104).
The group most likely to have PUs comprised the 1,391 patients who stayed in the hospital for 0.5 days or longer and had an infectious wound dressing. The second PU group included 1,212 patients. Patients who stayed in the hospital for 0.5 days or longer, who did not have an infectious wound dressing, who had 3.5 or more diagnoses, who did not have a simple dressing, who did not have injuries to the hip and thigh, and who had 5.5 or more diagnoses were more likely to have PUs when their total hospital cost exceeded US $2,000 (2,200,000 Korean won).
This study explored the factors associated with the development of PUs using a data mining approach. The data were extracted from the HIRA NIS. A decision tree was generated with 15,856 cases and 830 variables. The decision tree displayed 15 subgroups with 8 variables showing good prediction performance. First of all, this study highlighted the usefulness of the data mining approach in managing and analyzing healthcare big data, such as the HIRA NIS data. Data mining accurately identified meaningful associations between an outcome and many variables.
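The two rule paths described above can be written as nested conditions, which is how a fitted decision tree assigns a patient to a subgroup. The sketch below encodes only these two paths; the other subgroups of the tree are not described in the text and are therefore returned as `None`. The function name, argument names, and path labels are hypothetical.

```python
def described_pu_path(length_of_stay_days, infected_wound_dressing,
                      n_diagnoses, simple_dressing, hip_thigh_injury,
                      total_cost_krw):
    """Return a label for the two PU-prone rule paths described in the text.

    Returns None for every other branch, because the remaining subgroups
    of the tree are not spelled out in the text.
    """
    if length_of_stay_days < 0.5:
        return None
    if infected_wound_dressing:
        # Stay of 0.5 days or longer with an infectious wound dressing.
        return "infected_dressing_path"
    # The text lists splits at 3.5 and at 5.5 diagnoses on this path;
    # together they reduce to the single condition n_diagnoses >= 5.5.
    if (n_diagnoses >= 5.5
            and not simple_dressing
            and not hip_thigh_injury
            and total_cost_krw > 2_200_000):
        return "high_cost_path"
    return None


if __name__ == "__main__":
    print(described_pu_path(3, False, 6, False, False, 2_500_000))
```

Reading the tree as explicit rules like this is what makes the subgroups easy to interpret: each leaf corresponds to one conjunction of threshold tests.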
The length of stay was the top variable associated with PUs. Moreover, the group with PU had a significantly longer length of stay. These results were similar to those of previous PU studies. PU could be a significant factor that prolongs the length of stay beyond expectations based on diagnosis at admission [
Infectious wound dressings and simple dressings were the second- and fourth-most commonly associated variables with PU. The results are quite reasonable because wound dressing is a main component of PU care. Dressings are used to keep a wound bed moist or to keep the periwound dry and prevent maceration to facilitate healing [
The number of medical diagnoses was an important variable for predicting PUs. The splits at fewer than 3.5 and fewer than 5.5 diagnoses ranked 3rd and 6th, respectively. The number of medical diagnoses can be considered a measure of comorbidity. The results confirmed that comorbidity is a risk factor for PUs [
We found that total hospital cost was a factor associated with PUs, and this is supported in the literature. Multiple factors, including the prolonged stay, labor of healthcare providers, and treatment material costs can increase hospital costs [
Overall, the results of this study with big data confirmed other previous study results related to PUs; our results appear to be more valid and generalizable than those of previous studies. PUs are associated with length of stay, number of diagnoses, and total hospital costs. Moreover, elderly patients with PUs are more likely to have simple dressings or infectious wound dressings. Eventually, a longer length of stay or additional procedures, such as changing the dressing, could lead to increased hospital costs for PU patients. Therefore, the importance of PU prevention to alleviate the financial burden of long-term care facilities is highlighted by this study.
Contrary to our expectation, the results did not show any drug associated with PUs even though previous research has identified specific drugs that seem to increase the incidence of PUs [
Big data continue to impact healthcare; such information can be used to improve patient care and clinical decision-making. However, big data are very large and complex, and they are hard to manage with traditional manipulation methods. This study suggested that the use of data mining in healthcare big data can minimize the time spent manually searching while minimizing insignificant studies; it can identify association [
In conclusion, this study used a decision tree to find factors associated with PUs using the 2014 NIS, which is data from the HIRA. A decision tree was generated with 15,856 cases and 830 variables. The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. These results support those of previous studies that showed length of stay, comorbidity, and total hospital cost were associated with developing PUs. Moreover, wound dressings were commonly used to treat PUs. Finally, this study showed that data mining methods, such as decision tree analysis, could identify outcome variables in a big data set with many variables.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2015R1C1A2A01054883).
Values are presented as number (%) or mean (range).
KCD-7: Korean Standard Classification of Diseases-7, OPD: outpatient department.
aThe only top or meaningful value was described. bTop 3 variables among the variables were selected and described.
Values are presented as number (%) or mean (range).
aBilling codes. bKorean Standard Classification of Disease 7 (KCD-7) codes.