OBJECTIVE: The purpose of this study was to explore the potential application of a Bayesian network, an emerging data mining technique, in predicting outcomes using large healthcare databases.
METHODS: The HIV Cost and Services Utilization Study (HCSUS) dataset, consisting of 2,864 HIV-positive adults in the US, was used. A total of 35 variables were selected, including one output variable, defined as more than one hospitalization in six months, which represents a sub-optimal pattern of healthcare utilization in HIV care. HUGIN Researcher 6.2™ was used to develop a Bayesian network model with two learning algorithms: 1) the Necessary Path Condition (NPC) algorithm to construct the Bayesian network structure, and 2) the Expectation-Maximization (EM) algorithm to estimate the parameters.
RESULTS: The area under the Receiver Operating Characteristic (ROC) curve was 0.72. The accuracy of the prediction model was 0.66. Sensitivity and specificity were 0.65 and 0.66, respectively.
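For readers less familiar with these measures, the following is a minimal sketch of how sensitivity, specificity, accuracy, and the area under the ROC curve can be computed from predicted probabilities. It is an illustration only, not the study's code: the labels, scores, the 0.5 classification threshold, and all function names are hypothetical assumptions, and the toy values are unrelated to the HCSUS data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding the predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a random
    negative case (ties count one half); equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = more than one hospitalization in six months.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]

tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold=0.5)
sensitivity = tp / (tp + fn)            # true positive rate
specificity = tn / (tn + fp)            # true negative rate
accuracy = (tp + tn) / (tp + fp + tn + fn)
auc = roc_auc(y_true, y_score)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"accuracy={accuracy:.2f} auc={auc:.4f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the area under the ROC curve summarizes discrimination across all thresholds.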
CONCLUSION: The Bayesian network showed fair performance in this prediction problem. This study provides researchers with new insight into working with large datasets, which continue to grow in the number of cases and variables. Repeated testing and refinement of Bayesian network modeling techniques with other health outcomes in large databases is recommended.