Utility of Treatment Pattern Analysis Using a Common Data Model: A Scoping Review
Abstract
Objectives
We aimed to derive observational research evidence on treatment patterns through a scoping review of common data model (CDM)-based publications.
Methods
We searched the medical literature databases PubMed and EMBASE, as well as the Observational Health Data Sciences and Informatics (OHDSI) website, for papers published between January 1, 2010 and August 21, 2023 to identify research papers relevant to our topic.
Results
Eighteen articles satisfied the inclusion criteria for this scoping review. We summarized study characteristics such as phenotypes, patient numbers, data periods, countries, Observational Medical Outcomes Partnership (OMOP) CDM databases, and definitions of index date and target cohort. Type 2 diabetes mellitus emerged as the most frequently studied disease, covered in five articles, followed by hypertension and depression, each addressed in four articles. Biguanides, with metformin as the primary drug, were the most commonly prescribed first-line treatments for type 2 diabetes mellitus. Most studies utilized sunburst plots to visualize treatment patterns, whereas two studies used Sankey plots. Various software tools were employed for treatment pattern analysis, including JavaScript, the open-source ATLAS by OHDSI, R code, and the R package “TreatmentPatterns.”
Conclusions
This study provides a comprehensive overview of research on treatment patterns using the CDM, highlighting the growing importance of OMOP CDM in enabling multinational observational network studies and advancing collaborative research in this field.
I. Introduction
The common data model (CDM) is a standardized data structure that transforms heterogeneous data sources into a uniform format. This standardization facilitates data manipulation and enables the sharing of results through standardized analytical techniques [1]. Various CDMs have been developed in the medical field, including the Sentinel CDM by the United States Food and Drug Administration [2], the National Patient-Centered Clinical Outcomes Research Network CDM [3], and the Observational Medical Outcomes Partnership (OMOP) CDM. The OMOP CDM, introduced by the OMOP consortium in 2008, has gained widespread adoption globally due to its standardized terminology and focus on openness and sharing [4,5]. The Observational Health Data Sciences and Informatics (OHDSI) community plays a key role in the continuous development and implementation of the OMOP CDM. It also manages a distributed research network that utilizes the CDM [6,7].
After the OMOP consortium released its CDM in the early 2010s, there were initially only a few retrospective observational studies utilizing this model. However, as the OMOP CDM became more widely adopted, there has been a consistent and significant increase in research activity over the past decade. Currently, more than 100 research papers are published annually, and from 2010 to 2022, over 12,000 citations have been accumulated [8]. Researchers appreciate the flexibility that OMOP CDM databases offer, allowing them to create various analytical environments tailored to their specific areas of interest. This flexibility facilitates conducting experiments on the same research topic across multiple institutions, as well as performing analyses on the same subject at different institutions [9,10]. Notably, the CDM is used for a variety of research purposes, including clinical characterization, treatment pattern analysis, population-level effect estimation, and patient-level prediction.
A scoping review is a method of knowledge synthesis that employs a systematic and iterative process to identify and synthesize the literature on a specific topic, whether it is well-established or emerging [11]. Scoping reviews can serve several purposes, but primarily aim to map the scope, variety, and characteristics of extant research, and to identify potential gaps within the topic area [12,13]. These reviews are designed to chart the existing literature on a subject, pinpointing key concepts, identifying gaps, and categorizing types of evidence. This provides a comprehensive snapshot of research trends. With its broad scope, a scoping review can encompass various methodologies and results, thereby steering future research and shaping policy and practice [14,15].
There have been relatively few scoping reviews on the CDM. One scoping review of the OMOP CDM summarized meta-information such as journal fields, countries, publication years, and research topics by screening titles and abstracts of studies published over a 5-year period [16]. Another study conducted a scoping review that focused on the overall design process of CDM studies from 2000 to March 2022 and proposed a conceptual model for CDM development methods [17]. This study aims to conduct a full-text-based scoping review of studies that utilize OMOP CDM data for treatment pattern analysis, with a particular focus on cohort definitions and analytical methodologies. Previous systematic reviews of treatment pattern analysis did not utilize the CDM but instead focused on research involving electronic health records (EHRs), clinical registries, or claims databases [18,19]. Such studies required separate analyses for each database, leading to significant time and effort spent conducting the same analyses across multiple databases. Recently, the CDM has been employed as an effective tool to overcome the limitations observed in previous research related to treatment patterns. Studies on treatment pattern analysis using the CDM were able to conduct and publish collaborative research more easily, as they shared a common data structure and analysis code [9,10].
This scoping review examined studies that analyzed treatment patterns using the CDM. This approach enables a comprehensive examination of recent studies and a deeper understanding of the evolving research landscape concerning CDM-based treatment patterns. Our study therefore aimed to gather evidence on CDM-based research focused on analyzing treatment patterns and to provide insights for its clinical application.
II. Methods
1. Literature Search Strategy and Selection
We searched the medical literature databases PubMed and EMBASE, as well as the OHDSI publication website, to identify eligible observational cohort studies on treatment pattern analysis using OMOP CDM databases. The studies were published between January 1, 2010, and August 21, 2023. The inclusion criteria were: (1) studies that conducted analyses using the OMOP, OHDSI, CDM, or “Common Data Model”; (2) studies that reported treatment patterns, treatment pathways, or sunburst plots as results of these analyses; and (3) studies focusing on cohort characterization research. The exclusion criteria were: (1) studies that were not retrospective cohort studies; (2) gray literature, such as conference abstracts; and (3) publications before 2010. After removing duplicates, we screened titles and abstracts and conducted thorough full-text reviews based on the inclusion and exclusion criteria. The results of these reviews are presented in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist, as shown in Supplementary Table S1 [20]. Supplementary Table S2 provides a detailed overview of the search executed in this study.
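The selection steps described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used in this review: the record fields (`title`, `abstract`, `published`, `source`), the sample records, and the keyword groups are assumptions introduced for the example, and an automated pre-screen of this kind would only narrow the set of records before the manual title, abstract, and full-text review.

```python
from datetime import date

# Search window stated in the Methods section.
START, END = date(2010, 1, 1), date(2023, 8, 21)

# Keyword groups loosely mirroring inclusion criteria (1) and (2).
# The exact terms are assumptions made for this sketch.
CDM_TERMS = ("omop", "ohdsi", "cdm", "common data model")
PATTERN_TERMS = ("treatment pattern", "treatment pathway", "sunburst")


def normalize(title: str) -> str:
    """Lowercase a title and keep only letters and digits so that
    trivially different duplicates compare equal."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def deduplicate(records):
    """Keep the first record seen for each normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def passes_screen(rec) -> bool:
    """Pre-screen: publication date inside the window and at least one
    term from each keyword group in the title or abstract."""
    text = f"{rec['title']} {rec['abstract']}".lower()
    in_window = START <= rec["published"] <= END
    return (in_window
            and any(term in text for term in CDM_TERMS)
            and any(term in text for term in PATTERN_TERMS))


# Hypothetical records, not taken from the actual search results.
records = [
    {"title": "Treatment pathways in an OMOP network", "abstract": "...",
     "published": date(2016, 7, 1), "source": "PubMed"},
    {"title": "Treatment Pathways in an OMOP Network.", "abstract": "...",
     "published": date(2016, 7, 1), "source": "EMBASE"},
    {"title": "A randomized trial of drug X", "abstract": "No data model used.",
     "published": date(2019, 3, 5), "source": "PubMed"},
    {"title": "Sunburst plots of OHDSI data", "abstract": "treatment pattern",
     "published": date(2009, 5, 1), "source": "EMBASE"},
]

unique = deduplicate(records)
candidates = [rec for rec in unique if passes_screen(rec)]
print(len(records), len(unique), len(candidates))
```

Matching duplicates on a normalized title is a deliberately simple choice; matching on a DOI or PubMed identifier would be more reliable where those fields are available.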
2. Data Extraction and Data Analysis
From the selected literature, we extracted and summarized study phenotypes including target diseases, drugs, patient numbers, data periods, countries of origin, OMOP CDM source database information, and definitions of index dates and target cohorts for each study. We specifically visualized the frequency of target diseases reported in the literature using a histogram. Additionally, we illustrated the drug classes involved in treatment pattern analyses for studies that focused on frequently reported target diseases such as type 2 diabetes mellitus, hypertension, and depression. We initially categorized the medications used in treating type 2 diabetes mellitus, hypertension, and depression into their respective upper drug classes. These drug classes were displayed in a scatter histogram using the matplotlib, pandas, and numpy packages in Python version 3.11.5. Moreover, we collected and presented examples of sunburst plots and Sankey diagrams that illustrate the treatment pattern results analyzed in these studies. To assess whether different studies used varying techniques for treatment pattern analysis, we summarized the use of visualization methods such as sunburst plots and Sankey diagrams, as well as the types of analytical software used to represent treatment pathways. Specifically, we examined studies focusing on type 2 diabetes mellitus, hypertension, and depression to explore how cohort definitions and index date definitions vary for the same diseases across studies.
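The tabulation behind these figures can be sketched as follows. The review used matplotlib, pandas, and numpy; the sketch below uses only the Python standard library to stay dependency-free, and the study labels and rows are hypothetical placeholders, not the extracted data. It counts how many studies address each target disease and builds a study-by-drug-class presence matrix of the kind that could be passed to a plotting library.

```python
from collections import Counter

# Hypothetical extraction sheet: one row per (study, target disease, drug class).
extracted = [
    ("study_A", "type 2 diabetes mellitus", "biguanides"),
    ("study_A", "type 2 diabetes mellitus", "sulfonylureas"),
    ("study_A", "hypertension", "ARBs"),
    ("study_B", "type 2 diabetes mellitus", "biguanides"),
    ("study_B", "type 2 diabetes mellitus", "insulin"),
    ("study_C", "depression", "SSRIs"),
]

# Count each disease once per study, regardless of how many drug classes
# that study reported for it.
study_disease_pairs = {(study, disease) for study, disease, _ in extracted}
disease_counts = Counter(disease for _, disease in study_disease_pairs)


def presence_matrix(rows, disease):
    """Return (studies, drug classes, 0/1 matrix) for one target disease.
    matrix[i][j] is 1 if study i reported drug class j."""
    studies = sorted({s for s, d, _ in rows if d == disease})
    classes = sorted({c for _, d, c in rows if d == disease})
    present = {(s, c) for s, d, c in rows if d == disease}
    matrix = [[int((s, c) in present) for c in classes] for s in studies]
    return studies, classes, matrix


# Text rendering of the frequency chart.
for disease, n in disease_counts.most_common():
    print(f"{disease:<28}{'#' * n} {n}")

studies, classes, matrix = presence_matrix(extracted, "type 2 diabetes mellitus")
print(classes)
for study, row in zip(studies, matrix):
    print(study, row)
```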
III. Results
1. Characteristics of the Selected Literature
The literature search strategy and selection criteria for the included studies are depicted in Figure 1. From the 1,145 records identified through database searches, 18 articles satisfied the inclusion criteria for the scoping review. The characteristics of these 18 studies are presented in Table 1 [9,10,21–36], including the definition of the target disease, patient numbers, investigated medications and treatments, data periods, countries, and OMOP CDM source databases. Supplementary Table S3 presents the target diseases in the selected articles, along with the index dates and target cohorts defined in each study. The index date is defined as the time of first exposure to the medication or the diagnosis of the disease, while the definition of the target cohort varied across the studies.
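As an illustration of how an index date and a target cohort might be operationalized, the sketch below takes the index date as the earliest qualifying event (a first drug exposure or a first diagnosis) and keeps only patients with enough observation time before and after it. The 365-day windows, the function names, and the patient records are assumptions made for the example; the included studies used differing definitions.

```python
from datetime import date, timedelta


def index_date(event_dates):
    """Index date: the earliest qualifying event (first exposure or
    first diagnosis), or None if the patient has no such event."""
    return min(event_dates) if event_dates else None


def in_target_cohort(idx, obs_start, obs_end, pre_days=365, post_days=365):
    """Require continuous observation before and after the index date.
    The default window lengths are illustrative assumptions."""
    if idx is None:
        return False
    enough_history = idx - obs_start >= timedelta(days=pre_days)
    enough_follow_up = obs_end - idx >= timedelta(days=post_days)
    return enough_history and enough_follow_up


# Hypothetical patients: qualifying event dates and observation period.
patients = {
    "p1": {"events": [date(2015, 3, 10), date(2014, 6, 1)],
           "obs": (date(2012, 1, 1), date(2018, 12, 31))},
    "p2": {"events": [date(2016, 2, 1)],
           "obs": (date(2015, 9, 1), date(2019, 1, 1))},
    "p3": {"events": [],
           "obs": (date(2010, 1, 1), date(2020, 1, 1))},
}

index_dates = {pid: index_date(p["events"]) for pid, p in patients.items()}
cohort = [pid for pid, p in patients.items()
          if in_target_cohort(index_dates[pid], *p["obs"])]
print(index_dates["p1"], cohort)
```

Here `p2` is excluded because fewer than 365 days of observation precede the index date, and `p3` is excluded because no qualifying event exists.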

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart diagram. CDM: common data model, OMOP: Observational Medical Outcomes Partnership.
2. Target Diseases and Drug Classes of the Selected Literature
A total of 14 target diseases were identified across 18 selected articles. Among these, type 2 diabetes mellitus was the most frequently targeted disease, appearing in five articles, followed by hypertension and depression, which were featured in four articles each (Figure 2). In the studies concerning type 2 diabetes mellitus, all included five drug classes: biguanides, dipeptidyl peptidase 4 (DPP-4) inhibitors, insulin, sulfonylureas, and thiazolidinediones [9,10,21,22,28,29] (Figure 3). The analysis of treatment patterns revealed a predominant use of biguanides (e.g., metformin) as the primary therapeutic agents (Supplementary Figure S1A). For hypertension and depression, a broad range of drug classes was utilized in the selected studies (Figure 3). The primary drug class used as first-line agents for hypertension varied across the studies (Supplementary Figure S1B). Selective serotonin reuptake inhibitors (SSRIs) were commonly prescribed as the sole first-line treatment for depression, with no progression to second- or third-line treatments (Supplementary Figure S1C).

Figure 3. Drug classes for type 2 diabetes mellitus, hypertension, and depression in the articles. (A) Type 2 diabetes mellitus. (B) Hypertension. (C) Depression. DPP-4: dipeptidyl peptidase 4, GLP-1: glucagon-like peptide-1, SGLT2: sodium-glucose co-transporter-2, ACE: angiotensin-converting enzyme, ARBs: angiotensin receptor blockers, CCBs: calcium channel blockers, NDRIs: norepinephrine-dopamine reuptake inhibitors, SSRIs: selective serotonin reuptake inhibitors, SARIs: serotonin antagonist and reuptake inhibitors, SNRIs: serotonin-norepinephrine reuptake inhibitors, TeCAs: tetracyclic antidepressants, TCAs: tricyclic antidepressants.
3. Methods Used for Analyses of Treatment Patterns
The variability in studies on treatment patterns can be attributed to several factors, including differences in drug classification systems, the inclusion of combination therapies, and variations in analysis methods. These methods range from the use of open-source software like ATLAS, to programming with R code, SQL, and JavaScript (Table 2). As detailed in Table 2, the majority of studies employed sunburst plots for visualization, whereas two studies opted for Sankey plots. The software choices for these visualizations varied; sunburst plots were generated using JavaScript, ATLAS by OHDSI, R source code, and the open-source R package “TreatmentPatterns.” For Sankey plots, the “networkD3” package in R was predominantly used. Additionally, the definition of the target cohort varied, encompassing pre- and post-index periods and differing cohort characteristics such as age, index year, renal function, and disease severity (Table 3).
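To make the data behind these plots concrete, the sketch below derives treatment pathways from drug exposure records and aggregates them in the two shapes that sunburst and Sankey plots typically consume. It is a simplified illustration in Python, not the method of any included study or of the TreatmentPatterns package: the exposure records are hypothetical, combination therapies and minimum exposure durations are ignored, and a return to a previously used drug class is not counted as a new line of therapy.

```python
from collections import Counter
from datetime import date


def pathway(exposures, max_lines=3):
    """Order (start_date, drug_class) exposures by date and collapse them
    into the sequence of distinct drug classes, truncated to max_lines."""
    steps = []
    for _, drug_class in sorted(exposures):
        if drug_class not in steps:
            steps.append(drug_class)
    return tuple(steps[:max_lines])


# Hypothetical post-index exposures per patient.
exposures_by_patient = {
    "p1": [(date(2015, 1, 1), "biguanides"), (date(2016, 1, 1), "sulfonylureas")],
    "p2": [(date(2014, 5, 1), "biguanides")],
    "p3": [(date(2017, 3, 1), "biguanides"), (date(2017, 9, 1), "sulfonylureas"),
           (date(2018, 2, 1), "insulin")],
    "p4": [(date(2016, 6, 1), "DPP-4 inhibitors"), (date(2016, 1, 1), "biguanides")],
}

pathways = {pid: pathway(exp) for pid, exp in exposures_by_patient.items()}

# Sunburst input: number of patients per complete pathway. Each ring of
# the plot corresponds to one position (line of therapy) in the tuple.
pathway_counts = Counter(pathways.values())

# First-line distribution, i.e. the innermost ring.
first_line = Counter(p[0] for p in pathways.values() if p)

# Sankey input: counts of transitions between consecutive lines of therapy,
# keyed by ((line, class), (line + 1, class)).
links = Counter()
for steps in pathways.values():
    for line, (src, dst) in enumerate(zip(steps, steps[1:]), start=1):
        links[((line, src), (line + 1, dst))] += 1

print(pathway_counts)
print(first_line)
print(links)
```

Keeping the line of therapy in each Sankey node key prevents a drug class used as second-line therapy by one patient from being merged with the same class used as first-line therapy by another.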
IV. Discussion
This study was a scoping review of the literature on the OMOP CDM maintained by OHDSI, with a particular emphasis on treatment pattern analysis. Between January 1, 2010, and August 21, 2023, we identified 18 studies that analyzed treatment patterns using the OMOP CDM. The earliest of these studies targeted three diseases: type 2 diabetes mellitus, hypertension, and depression [9]. This initial research spurred numerous subsequent studies on treatment patterns for other chronic diseases, culminating in a significant body of literature that predominantly explores these three conditions.
Although prescription rates varied across studies, metformin, a biguanide, was the most commonly prescribed first-line treatment for type 2 diabetes mellitus. This study confirmed that the recommendation of metformin as an initial oral antidiabetic agent in major clinical guidelines for diabetes mellitus aligns with the actual treatment patterns observed in real-world clinical data [37,38]. In a study that conducted a subgroup analysis based on estimated glomerular filtration rate (eGFR), metformin was commonly used as a first-line treatment in the group with eGFR ≥ 60 mL/min/1.73 m². In contrast, DPP-4 inhibitors were more commonly used in groups with lower eGFR values. However, the patterns of second- and third-line treatments for patients with type 2 diabetes mellitus varied across institutions and countries, highlighting the importance of the follow-up period for accurately identifying these therapies [28]. The source of data for the CDM database can differ between multicenter and single-center sources, as well as by the national characteristics of hospitals. For instance, a study based on the CDM from a single tertiary hospital in Singapore included several combination drugs in its analysis of treatment patterns, demonstrating significant heterogeneity compared to other studies on patients with type 2 diabetes mellitus [29] (Supplementary Figure S1A).
Unlike those for diabetes mellitus, first-line treatment patterns for hypertension vary widely across studies. This inconsistency arises because hypertension treatment guidelines [39] permit the selection of various medications, with choices tailored to the patient’s underlying conditions and age. Therefore, the first-line agents differ depending on the characteristics of the patients included in each study. For patients with depression, SSRIs are commonly used as the first-line treatment because they are relatively safe compared with other medications. Guidelines also recommend SSRIs or serotonin-norepinephrine reuptake inhibitors as the first-line antidepressants [40]. Even with underlying conditions such as arrhythmia or stroke, escitalopram, an SSRI, is recommended as the first choice. As a second-choice medication, another SSRI, sertraline, is frequently used [41].
Through this scoping review, we synthesized findings from various studies that employed the CDM to analyze treatment patterns. Although studies dealing with the same diseases, such as diabetes mellitus, hypertension, and depression, often used similar index dates, the operational definitions of target cohorts varied significantly. This variation indicates that directly comparing treatment pattern analysis results for the same disease across different studies may be challenging. Furthermore, the analytical techniques, including the software and visualizations used for analyzing treatment patterns, showed considerable diversity. Many studies reused analytical methods from prior research. To promote more robust and collaborative research, it is essential to create an environment that encourages the sharing of analytical techniques and code, thus enhancing collaboration among researchers using standardized CDM data.
While the study provides valuable insights, it has some limitations. The OMOP CDM databases primarily relied on EHR data, with only a limited amount of data derived from claims databases. In South Korea, one study successfully standardized claims data from the Health Insurance Review and Assessment Service into the OMOP CDM database [27]. We anticipate that as the transformation and utilization of the OMOP CDM based on claims data continue to grow, future studies will be able to compare and further explore OMOP CDM databases that incorporate both claims and EHR data. Furthermore, the current literature using OMOP CDM data in this study does not fully capture the variations in drug dosage and usage observed in real clinical settings, underscoring the need for future research to include clinical data that reflects these variations.
Despite the limitations of this research, it also offers significant academic contributions. This study focuses on analyzing treatment patterns using CDM databases, which is different from studies that use EHR or claims databases. By leveraging the strengths of CDM resources, this approach benefits from the implementation of uniform data structures and formats. Such uniformity enables multicenter studies to use the same data analysis code, which facilitates the comparison of treatment patterns across various cohorts, institutions, or time periods. This capability is particularly valuable for researchers who aim to identify trends, variations, guideline adoption, or best practices in healthcare. Consequently, the adoption of CDM has the potential to enhance the efficiency, accuracy, and comprehensiveness of healthcare data analysis, providing researchers with a powerful tool for advancing medical knowledge and improving patient outcomes [42]. However, based on the studies reviewed so far, the analysis of treatment patterns has been limited to specific disease groups, indicating that the current analytical framework is still insufficient to fully support CDM-based studies. Therefore, strategies to further facilitate and expand the use of CDM should be developed. Additionally, many of the studies included in the treatment pattern analysis utilized data from domestic hospitals, highlighting the need for further research that leverages multicenter CDM data from both domestic and international institutions.
This study conducted a scoping review of the literature on the OHDSI OMOP CDM, with a specific focus on treatment pattern analysis. By examining these studies from various perspectives, we identified emerging trends and provided a comprehensive overview. The findings highlight the increasing importance of the OMOP CDM in facilitating network studies using observational patient data from multiple countries. Analyzing these treatment pattern studies can aid research teams in efficiently identifying relevant topics for observational network studies and promote advancements in collaborative research within this field.
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgments
This research was supported by a grant of the Medical data-driven hospital support project through the Korea Health Information Service (KHIS) funded by the Ministry of Health & Welfare, the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RE-2023-00241887), and the NAVER Digital Bio Innovation Research Fund from NAVER Corporation (Grant No. 3720230025).