Identifying Patterns of Depression Comorbidities Using Association Rule Learning: Insights from Maryland Medicaid Data
Abstract
Objectives
This study aimed to identify association rules in patients with multiple chronic conditions, with a focus on patterns involving depression, a highly prevalent psychiatric disorder and a significant risk factor for suicide. Understanding comorbidity patterns in patients with depression is critical for targeting screening efforts, enabling early diagnosis, and improving chronic disease management.
Methods
Maryland Medicaid claims data from 2021 to 2022 were analyzed to examine the co-occurrence of depression with 62 other chronic conditions using association rule learning. Analyses were stratified by sex and age group to identify patterns specific to demographic subgroups. Thresholds for case numbers and confidence levels were applied to ensure that identified rules were both clinically meaningful and statistically robust.
Results
The number of association rules increased markedly with advancing age and was higher among women than among men in each adult age group. In total, 582 association rules were identified, providing important insights into comorbidity structures.
Conclusions
This study demonstrates the utility of association rule learning for detecting clinically relevant patterns of depression comorbidities, including variations by age and sex. The identified rules could inform clinical practice by improving targeted screening, facilitating early diagnosis, and guiding management strategies for patients with multiple chronic conditions.
I. Introduction
Depression is one of the most prevalent mental illnesses worldwide [1], with 9% of United States adults experiencing a major depressive episode in 2022 [2]. Often termed a silent killer, depression may remain undetected or undiagnosed for extended periods [3]. This condition can profoundly disrupt patients’ lives. Despite available screening recommendations [4] and effective treatments [5], depression remains significantly underdiagnosed [6].
Individuals with multiple chronic conditions (MCCs), defined as the simultaneous presence of two or more chronic diseases, are more likely to experience depression than those with fewer health problems [7]. A recent report from the Centers for Disease Control and Prevention highlighted that “an increasing proportion of people in America are dealing with MCCs; 42% have 2 or more” [8]. Identifying patterns of chronic disease may therefore support improved screening, earlier diagnosis, and better management of chronic conditions.
Association rule learning, widely applied in healthcare [9–11], is a powerful data-mining approach because of its ability to uncover hidden patterns and relationships in large datasets [12,13]. This method is particularly valuable in healthcare due to the abundance of structured data (e.g., patient charts, medical claims) collected over time, where complex interactions between multiple conditions influence outcomes. Identifying association rules and applying them to individual patient characteristics may enable a more comprehensive understanding of potentially undiagnosed conditions. This, in turn, can promote more effective and personalized treatment strategies, improve allocation of healthcare resources, and enhance patient management. Furthermore, association rule mining is adaptable to evolving datasets, making it an important tool for continuous learning and changes in clinical practice.
Previous studies have investigated different numbers of chronic conditions across varying population sizes, largely depending on data availability. For example, Li et al. [14] examined 15 chronic conditions in 11,500 patients, whereas Birk et al. [15] analyzed 21 conditions among 2,311 patients. To our knowledge, this study is the first to apply association rule learning to examine the associations between depression and a broad range of 62 chronic conditions using a large-scale healthcare database encompassing approximately 1.3 million patients.
Association rules can reveal important relationships that may not be apparent with traditional analytical techniques, such as the co-occurrence of chronic conditions. Although other analytical methods, such as cluster analysis [16] and logistic regression [17], have been employed to study MCCs, each presents limitations. Cluster analysis can reveal grouping patterns, but the resulting clusters often lack clinical interpretability and do not generate if-then decision logic. Logistic regression is well-suited for estimating the probability of a target condition but is less effective at capturing the complex, interdependent nature of co-occurring diseases. Moreover, when the number of chronic conditions is large, building separate logistic models for each condition is impractical, and multivariate models may face dimensionality challenges. By contrast, association rule learning provides a powerful, unsupervised approach for identifying coexisting patterns among chronic conditions without requiring a predefined target variable, while also offering clinically interpretable if-then rules. In this study, we applied association rule learning to Maryland Medicaid enrollees to identify all association rules for depression that met predefined case-count and confidence thresholds, including rules specific to age and sex subgroups.
The Hilltop Institute has access to Maryland Medicaid data as a business associate to the Maryland Department of Health, and this research was conducted with the department's approval and in compliance with the business associate agreement. The research adhered to Health Insurance Portability and Accountability Act requirements and other federal regulations governing human subjects research. Methodological and ethical considerations were consistent with the University of Maryland, Baltimore County’s commitment to ethical research practices.
II. Methods
Using the Maryland Medicaid claims database from 2021 to 2022, we applied association rule learning to examine the co-occurrence of depression with 62 other chronic conditions. The analysis was stratified by sex and age group in a retrospective, observational cohort study. Depression and the 62 chronic conditions were represented as binary variables, with a value of 1 indicating presence and 0 indicating absence.
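The binary representation described above can be illustrated with a minimal sketch. The condition names and the three toy enrollees are hypothetical and stand in for depression plus the 62 other conditions; the study's actual data preparation was done in SAS and is not reproduced here.

```python
# Hypothetical short list of conditions (the study used depression plus 62 others).
CONDITIONS = ["depression", "anxiety", "hypertension", "obesity"]


def encode_enrollee(diagnosed):
    """Return one 0/1 indicator per condition: 1 = present, 0 = absent."""
    return {name: int(name in diagnosed) for name in CONDITIONS}


# Toy enrollees: each entry is the set of conditions found for one person.
enrollees = [
    {"depression", "anxiety"},
    {"hypertension"},
    set(),
]

# One row per enrollee, one binary column per condition.
matrix = [encode_enrollee(diagnosed) for diagnosed in enrollees]

# Prevalence of each condition is the column mean of the indicator matrix.
prevalence = {
    name: sum(row[name] for row in matrix) / len(matrix) for name in CONDITIONS
}
```

Each row of `matrix` plays the role of one "transaction" in association rule learning, which is why a simple presence/absence encoding is sufficient input for the method.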
1. Study Population
Medicaid is a public health insurance program provided through a federal–state partnership and covers low-income individuals, children, pregnant women, and people with disabilities. To ensure complete healthcare claims information, we included only full-benefit Medicaid enrollees who maintained continuous enrollment during the study period. We excluded enrollees who were dually eligible for Medicare (a federal insurance program covering Americans aged 65 and older, as well as people with certain disabilities or end-stage renal disease), because Medicare serves as the primary payer for these individuals.
We constructed chronic condition indicators as of December 31, 2022, using Medicaid claims and encounter data. The definitions for the 62 chronic conditions were based on the Chronic Condition Data Warehouse methodology [18,19]. We selected conditions defined exclusively by diagnosis codes and excluded those with overlapping definitions (e.g., we used depressive disorder rather than depression, bipolar, or other depressive mood disorders).
2. Data Mining
We reported descriptive statistics of the study population stratified by age as of December 31, 2022. Included characteristics were sex (male, female), race (Asian, Black, White, Hispanic, Native American, Pacific Islander, or Alaskan Native), basis of Medicaid eligibility (expansion adults under the Affordable Care Act, non-disabled adult or child, disabled, children, foster care), and the 63 chronic conditions.
We identified the total number of potentially relevant association rules and highlighted the top 10 rules in six categories defined by sex (male, female) and age group (children <18 years, young adults 18–44 years, and middle-aged adults 45–64 years). The complete lists of rules for each category are provided in the supplemental materials. To extract potentially relevant association rules between depression and other chronic conditions, we set thresholds requiring (1) at least 1,000 cases for the condition combination and (2) a confidence level of at least 0.6. In association rule learning, confidence is defined as the conditional probability of the consequent (i.e., depression) given the presence of the antecedent condition or conditions. The dataset was divided into a training dataset (80% of participants, selected through simple random sampling) and a holdout testing dataset (20% of participants, reserved for validation). Association rules were derived from the training dataset and subsequently applied to the testing dataset. The actual confidence value of each rule in the testing dataset was compared with the predicted confidence value from the training dataset. Relative errors were then calculated to assess the accuracy of the rules.
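The procedure in this paragraph can be sketched in plain Python. This is an illustrative brute-force search, not the MLxtend routine used in the study; the function names, the toy records, the `max_len` cap, and the reduced case threshold are all assumptions made for the example. In the study, the same steps would be run separately within each sex-by-age stratum with thresholds of 1,000 cases and 0.6 confidence.

```python
import random
from itertools import combinations

TARGET = "depression"  # the consequent of every rule


def split_train_test(records, train_fraction=0.8, seed=0):
    """Simple random sampling: shuffle a copy, then cut at train_fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rule_stats(records, antecedent):
    """Return (cases, confidence) for the rule antecedent -> TARGET.

    cases      = number of records containing the antecedent and TARGET
    confidence = cases / number of records containing the antecedent
    """
    with_antecedent = [r for r in records if antecedent <= r]
    cases = sum(1 for r in with_antecedent if TARGET in r)
    confidence = cases / len(with_antecedent) if with_antecedent else 0.0
    return cases, confidence


def mine_rules(records, min_cases, min_confidence, max_len=3):
    """Keep every antecedent (up to max_len conditions) meeting both thresholds."""
    conditions = sorted(set().union(*records) - {TARGET})
    rules = {}
    for size in range(1, max_len + 1):
        for combo in combinations(conditions, size):
            antecedent = frozenset(combo)
            cases, confidence = rule_stats(records, antecedent)
            if cases >= min_cases and confidence >= min_confidence:
                rules[antecedent] = confidence
    return rules


# Toy data: each record is the set of conditions for one enrollee.
records = (
    [{"anxiety", "depression"}] * 8
    + [{"anxiety"}] * 2
    + [{"hypertension"}] * 10
)

train, test = split_train_test(records)
# Thresholds scaled down for the toy data (the study used 1,000 cases and 0.6).
train_rules = mine_rules(train, min_cases=3, min_confidence=0.6)
# Holdout ("actual") confidence for each rule learned on the training data.
holdout_confidence = {a: rule_stats(test, a)[1] for a in train_rules}
```

Enumerating all combinations is only practical for a handful of conditions; with 62 antecedent candidates, a frequent-itemset algorithm such as Apriori prunes the search by discarding combinations whose case counts already fall below the threshold.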
Data preparation and management were conducted using SAS 9.4 software (SAS Institute, Cary, NC, USA), and the association rule analysis was performed in Python with the MLxtend package. The complete data-processing workflow is illustrated in Figure 1. Steps shown in dashed rectangles represent planned future work.
III. Results
The database constructed for this study included 1,296,851 Medicaid enrollees, of whom 691,595 (53.3%) were female and 605,256 (46.7%) were male. The age distribution comprised 555,636 (42.9%) children (<18 years), 533,960 (41.2%) young adults (18–44 years), and 207,255 (15.9%) middle-aged adults (45–64 years). More detailed demographic and program eligibility information is provided in Supplementary Table S1.
Among all participants, depression was the most frequently diagnosed chronic condition, with a prevalence of 13.9%. This was followed by anxiety disorder (13.3%), obesity (10.9%), hypertension (10.7%), and hyperlipidemia (9.6%).
We assessed the balance of key characteristics, including sex, age group, race, and five mental health conditions, between the training and testing datasets, as shown in Supplementary Table S2. All examined variables demonstrated well-balanced distributions across datasets. Applying the case-count and confidence thresholds to the training dataset, we identified potentially meaningful association rules for each sex-by-age-group combination (Supplementary Tables S3–S6). In total, 582 association rules met these thresholds. No rules were found for children (<18 years), while 195 rules were identified for young women (18–44 years), 70 for young men, 237 for middle-aged (45–64 years) women, and 80 for middle-aged men. Within each adult age group, women exhibited a greater number of association rules than men.
For young women, 195 association rules were identified. Table 1 presents the top 10 rules ranked by descending confidence. An association rule is denoted as antecedent → consequent, where both antecedent and consequent may consist of one or multiple conditions. For example, the first rule in Table 1, (anxiety, drug use disorders, schizophrenia) → depression, indicates that the antecedent is the combination of anxiety, drug use disorders, and schizophrenia, while the consequent is depression. For young women with this combination of conditions, the probability of depression as a coexisting condition was 0.96. Individuals with this antecedent were 3.99 times more likely to be diagnosed with depression than would be expected if antecedent and consequent occurrences were independent. This rule was derived from 5,109 cases in the training dataset. When validated in the holdout testing dataset, the calculated confidence was 0.94, closely approximating the predicted confidence of 0.96, with an absolute percentage error (APE) of 2.19%. Across all 195 association rules for young women, the mean absolute percentage error (MAPE) was 2.01% (standard deviation [SD] 1.66%). Supplementary Table S3 in the supplementary material contains all 195 rules, and Supplementary Table S7 provides short names and full definitions for all 63 chronic conditions referenced.
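The arithmetic behind the worked example above can be checked with a short sketch. The inputs are the rounded figures quoted in the text; the choice of the holdout value as the denominator of the APE is an assumption, since the text does not state which value was used.

```python
def lift(confidence, consequent_prevalence):
    """Lift = confidence / P(consequent); a value of 1.0 indicates independence."""
    return confidence / consequent_prevalence


def absolute_percentage_error(actual, predicted):
    """APE in percent, relative to the actual (holdout) value (assumed convention)."""
    return abs(actual - predicted) / actual * 100


# Rounded figures reported for the first rule in Table 1.
train_confidence = 0.96  # predicted confidence from the training dataset
test_confidence = 0.94   # actual confidence in the holdout testing dataset
reported_lift = 3.99

# Prevalence of depression implied by the reported confidence and lift.
implied_prevalence = train_confidence / reported_lift

# APE recomputed from the rounded confidences.
ape = absolute_percentage_error(test_confidence, train_confidence)
```

With these rounded inputs, the implied prevalence of depression among young women in the training data is roughly 0.24, and the APE comes out near 2.1%. The small gap from the reported 2.19% is presumably due to the paper using unrounded confidence values.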
For young men, the top 10 association rules ranked by descending confidence are shown in Table 2. In the holdout validation dataset, the MAPE for all 70 association rules was 1.96% (SD 2.01%).
For middle-aged women, 237 significant rules were identified. The top 10 rules, ranked by descending confidence, are presented in Table 3. The MAPE for all 237 rules was 2.02% (SD 1.74%).
For middle-aged men, 80 significant association rules were identified. Table 4 shows the top 10 rules in descending order of confidence. The MAPE for all 80 rules was 1.81% (SD 1.55%).
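The MAPE and SD figures reported for each group summarize the per-rule errors. A minimal sketch of that aggregation follows; the confidence pairs are hypothetical, and the use of the sample standard deviation and of the holdout value as the denominator are assumptions, since the text does not specify either.

```python
from statistics import mean, stdev


def mape_and_sd(pairs):
    """Return (MAPE, SD) in percent for (actual, predicted) confidence pairs."""
    apes = [abs(actual - predicted) / actual * 100 for actual, predicted in pairs]
    return mean(apes), stdev(apes)  # stdev = sample SD (assumed)


# Hypothetical (holdout, training) confidence pairs for three rules.
pairs = [(0.80, 0.82), (0.90, 0.90), (0.75, 0.72)]
mape, sd = mape_and_sd(pairs)
```

A MAPE near 2% with an SD of similar size, as reported for all four groups, indicates that most rules transfer to the holdout data with errors of a few percent at most.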
IV. Discussion
This study demonstrates that association rule learning can uncover potentially meaningful patterns of chronic conditions from large-scale healthcare data on Maryland Medicaid enrollees. By focusing on depression, we identified 582 association rules that met thresholds for case counts and confidence levels. Our findings illustrate how association rule learning can be applied within demographic subgroups to identify patterns specific to those groups. For example, in analyses stratified by age and sex, we observed both commonalities, such as frequent co-occurrence with anxiety or post-traumatic stress disorder (PTSD), and subgroup-specific differences in condition combinations. Association rules for middle-aged men and women included conditions such as hypertension, which is more prevalent in older populations. In contrast, tobacco use appeared in seven of the top 10 rules for younger men, whereas other groups had only two or three top rules involving tobacco use. This suggests that tobacco use may be a particularly important factor for younger men.
Prior research has shown that chronic conditions can lead to depression and, conversely, that depression can contribute to the onset of chronic disease [20]. The association rules identified in this study could be readily incorporated into clinical practice to improve detection of depression and to support management or prevention of related conditions. For example, if a young adult female has diagnoses of anxiety, drug use disorder, and schizophrenia, a decision support tool could flag that she has a very high probability (0.96) of also having depression. In this scenario, her clinician could initiate or expand screening for depression if it has not yet been diagnosed.
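The decision support scenario above can be sketched as a simple subset match between a patient's recorded conditions and the rule antecedents. The function name and the rule table layout are hypothetical; the first rule uses the confidence reported for young women, while the second rule and its confidence are invented purely for illustration. The sketch also assumes the rule table has already been selected for the patient's sex and age group.

```python
# Rule table for one demographic stratum: antecedent -> confidence of depression.
RULES = {
    # Reported in the text for young women.
    frozenset({"anxiety", "drug use disorders", "schizophrenia"}): 0.96,
    # Hypothetical rule and value, for illustration only.
    frozenset({"anxiety", "ptsd"}): 0.85,
}


def flag_depression(patient_conditions, rules):
    """Return the highest-confidence matching rule, or None.

    A rule matches when every antecedent condition is present in the
    patient's record. Patients already diagnosed with depression are skipped.
    """
    conditions = set(patient_conditions)
    if "depression" in conditions:
        return None
    matches = [(conf, ante) for ante, conf in rules.items() if ante <= conditions]
    if not matches:
        return None
    confidence, antecedent = max(matches, key=lambda match: match[0])
    return {"antecedent": sorted(antecedent), "confidence": confidence}


patient = {"anxiety", "drug use disorders", "schizophrenia", "obesity"}
alert = flag_depression(patient, RULES)
```

An alert of this kind would be a prompt for screening rather than a diagnosis, in line with the interpretation of the rules as indicators of potential relationships.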
Validation of the association rules with holdout testing data showed strong predictive performance. For instance, among middle-aged women, the MAPE of the 237 rules was 2.02%, meaning that the predicted probability of coexisting depression deviated from the observed probability by only about 2% on average. The consistently small MAPEs across all four demographic groups support the utility of association rules for predicting comorbid conditions involving depression.
The association rules revealed by this analysis should be interpreted as indicators of potential relationships, suitable for guiding further screening or testing, rather than definitive diagnostic tools. This study focused specifically on Maryland Medicaid claims data to derive subgroup-specific insights. Although some rules aligned with existing guidelines and research [21], the findings were not reviewed by clinical experts, which limits the assessment of their clinical validity and relevance.
Our analysis was restricted to full-benefit Medicaid enrollees to ensure complete claims information. Consequently, the findings may not be generalizable to populations covered by commercial insurance or to uninsured individuals. Nonetheless, the large number of significant rules identified suggests that applying similar methods in other datasets could provide valuable insights into unique comorbidity patterns within different populations.
While association rule learning is widely used in data mining to identify relationships in large datasets, it has inherent limitations. These include scalability challenges, difficulties handling high dimensionality, the inability to establish causality, and the risk of overfitting. Strategies to mitigate these issues include leveraging parallel computing for scalability, applying dimensionality reduction or subsampling for high-dimensional data, and using pruning techniques or cross-validation to reduce overfitting [22]. Implementing such strategies strengthens the robustness and applicability of association rule learning in diverse healthcare contexts.
In this study, association rule learning applied to a large healthcare database produced many rules with high predictive accuracy and clinical relevance for depression, a common chronic condition. The results could be incorporated into web-based applications or integrated with existing clinical decision support tools or electronic health record systems. Such tools could combine demographic and clinical data (such as age, sex, and current health conditions) to generate reports on potentially undiagnosed conditions or conditions at risk of developing. This information could guide clinical decision-making, support targeted screening, and refine treatment plans. Regularly updating the learning process would enable rules to reflect newly available data and shifting comorbidity patterns, thereby improving care for high-risk patients. Although the identified rules demonstrated strong predictive performance, expert clinical review remains essential for interpreting novel or unexpected findings and assessing their clinical importance. As a next step, we plan to collaborate with clinicians to validate these rules in real-world practice and to further explore their clinical utility and relevance.
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgments
This work was supported by the Maryland Department of Health (USA).
Supplementary Materials
Supplementary materials can be found via https://doi.org/10.4258/hir.2025.31.4.388.
