Development and Verification of Time-Series Deep Learning for Drug-Induced Liver Injury Detection in Patients Taking Angiotensin II Receptor Blockers: A Multicenter Distributed Research Network Approach

Healthc Inform Res. 2023;29(3):246-255
Publication date (electronic) : 2023 July 31
doi :
1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
2Medical Informatics Collaborative Unit, Department of Research Affairs, Yonsei University College of Medicine, Seoul, Korea
3Healthcare Data Science Center, Konyang University Hospital, Daejeon, Korea
4Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Korea
5Department of Statistics, Korea University, Seoul, Korea
6Healthcare AI Team, National Cancer Center, Goyang, Korea
7Transdisciplinary Department of Medicine & Advanced Technology, Seoul National University Hospital, Seoul, Korea
8Division of Allergy and Immunology, Department of Internal Medicine, Institute of Allergy, Yonsei University College of Medicine, Seoul, Korea
Corresponding Author: Yu Rang Park, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03772, Korea. Tel: +82-2-2228-2493, E-mail: (
*These authors contributed equally to this work.
Received 2023 May 18; Revised 2023 July 20; Accepted 2023 July 23.



The objective of this study was to develop and validate a multicenter-based, multi-model, time-series deep learning model for predicting drug-induced liver injury (DILI) in patients taking angiotensin receptor blockers (ARBs). The study leveraged a national-level multicenter approach, utilizing electronic health records (EHRs) from six hospitals in Korea.


A retrospective cohort analysis was conducted using EHRs from six hospitals in Korea, comprising a total of 10,852 patients whose data were converted to the Common Data Model. The study assessed the incidence rate of DILI among patients taking ARBs and compared it to a control group. Temporal patterns of important variables were analyzed using an interpretable time-series model.


The overall incidence rate of DILI among patients taking ARBs was found to be 1.09%. The incidence rates varied for each specific ARB drug and institution, with valsartan having the highest rate (1.24%) and olmesartan having the lowest rate (0.83%). The DILI prediction models showed varying performance, measured by the average area under the receiver operating characteristic curve, with telmisartan (0.93), losartan (0.92), and irbesartan (0.90) exhibiting higher classification performance. The aggregated attention scores from the models highlighted the importance of variables such as hematocrit, albumin, prothrombin time, and lymphocytes in predicting DILI.


Implementing a multicenter-based time-series classification model provided evidence that could be valuable to clinicians regarding temporal patterns associated with DILI in ARB users. This information supports informed decisions regarding appropriate drug use and treatment strategies.

I. Introduction

Adverse drug reactions (ADRs) are a significant concern for public health, as they can cause hospital admissions and rank among the leading causes of death [1,2]. According to the Food and Drug Administration, the number of ADRs has been steadily increasing over the years, tripling from 2006 to 2014 [3]. Drug-induced liver injury (DILI), in particular, stands out as one of the primary reasons underlying ADRs in real-world treatment, significantly affecting patient safety and drug development [4,5]. Despite the importance of predicting DILI risk for ensuring safety, adverse hepatic effects on health remain unpredictable, and there is insufficient evidence to support risk factors for DILI resulting from medications [4,6].

Therefore, many researchers have focused on identifying early hepatotoxic risk for future intervention using artificial intelligence (AI) and big data [4,5,7]. In 2021, Jaganathan et al. [4] reported an accuracy of 0.811 using a support vector machine trained on molecular-level descriptors. Chen et al. [7] developed a multi-source prediction model using the ResNet-18 deep neural network. However, these studies had two main limitations. The first is the lack of standardized multicenter validation, which would yield more reliable results. Multicenter studies face hurdles because of the risk of leaking patient information and the many laws that protect it. The second limitation is the black-box nature of AI. Most research has utilized conventional statistical methods or deep learning techniques [4,5,7]. Although the performance of these methods is sufficiently high, their predictions lack explainability, and this absence makes the models difficult to implement in clinical environments.

Multicenter research has been conducted using many standardized models, such as the Patient-Centered Outcomes Research Network, the National Institutes of Health Common Data Elements, and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) [8–10]. CDM-based studies have shown effective results because the CDM offers a data structure and terminology system optimized for multicenter studies. OMOP-CDM is superior to other CDM models in terms of content coverage, integrity, and integration. Most medical center data have been converted into OMOP-CDM [11]. Furthermore, in 2016, the US Food and Drug Administration (FDA) Sentinel, a specialized system for drug surveillance [12], was initiated. While research using this system has been conducted on specific areas such as pharmacoepidemiology and hemorrhage in the USA, its application to DILI has not been explored [13].

In terms of model explainability, several techniques such as Grad-CAM, Shapley values, and partial dependence plots have been suggested [14–16]. However, most of these methodologies were primarily designed for image or tabular data. The interpretability multivariate long short-term memory (IMV-LSTM) model, which provides time-based explanations, was recently published [17]. To the best of our knowledge, there has been no application of time-based explanations in DILI research.

To address these research gaps, we developed and validated a multicenter-based explainable time-series AI model for predicting DILI using data from six hospitals in Korea.

II. Methods

1. Study Design

This study is a retrospective cohort analysis using a standardized CDM of electronic health records (EHRs) from six hospitals in South Korea to predict DILI. The data sources include Severance Hospital (SH), Gangnam Severance Hospital (GSH), Konyang University Hospital (KYUH), Ajou University Hospital (AJUH), Seoul National University Cancer Hospital (SNUH), and the National Cancer Center (NCC). The study utilized OMOP-CDM version 5.3.1. To identify risk factors for DILI, we constructed cohorts for each hospital and drug. The CDM-based distributed research networks (DRNs) encompassed a population of approximately 12.47 million individuals from 1994 to 2021. From this dataset, we curated a final cohort of 15,236 subjects, comprising 3,809 cases and 11,427 controls. The detailed study design can be found in Supplementary Table S1.

This study was approved by the Institutional Review Committee of Severance Hospital (No. 4-2021-1209), Gangnam Severance Hospital (No. 3-2021-0005), Konyang University Hospital (No. KYUH 2021-10-003-001), Ajou University Hospital (No. AJIRB-MED-MDB-21-676), Seoul National University Cancer Hospital (No. E-2207-151-1342), and the National Cancer Center (No. NCC2022-0184).

2. Definition of Drug-Induced Liver Injury

In this research, we employed criteria for defining DILI classification stages that align with the “injury” category. These were: (1) an alanine aminotransferase (ALT) elevation ≥5 times the upper limit of normal (ULN), (2) an alkaline phosphatase (ALP) elevation ≥2 times the ULN, or (3) an ALT elevation ≥3 times the ULN accompanied by a total bilirubin concentration above 2 times the ULN [13].
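As a minimal sketch of how the three criteria could be applied to one set of laboratory results, the function below compares each value with its ULN. The function name, argument names, and the ULN values in the example are hypothetical and are not taken from the study code; real ULN values depend on the laboratory.

```python
def meets_dili_criteria(alt, alp, tbl, alt_uln, alp_uln, tbl_uln):
    """Return True if any of the three 'injury' criteria is met.

    alt, alp, tbl: measured ALT, ALP, and total bilirubin values.
    *_uln: the corresponding upper limits of normal (laboratory specific).
    """
    alt_ratio = alt / alt_uln
    alp_ratio = alp / alp_uln
    tbl_ratio = tbl / tbl_uln
    return (
        alt_ratio >= 5                          # criterion 1
        or alp_ratio >= 2                       # criterion 2
        or (alt_ratio >= 3 and tbl_ratio > 2)   # criterion 3
    )


# Hypothetical ULN values, for illustration only.
ULN = {"alt": 40.0, "alp": 120.0, "tbl": 1.2}

# ALT is 3.25 x ULN and bilirubin is 2.5 x ULN, so criterion 3 applies.
flag = meets_dili_criteria(130, 100, 3.0, ULN["alt"], ULN["alp"], ULN["tbl"])
print(flag)
```

Keeping the criteria as ratios to the ULN means the same function works across hospitals that use different reference ranges.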

3. Cohort Definition

This study aimed to predict DILI by focusing on six selected drugs in the category of angiotensin II receptor blockers (ARBs): losartan, candesartan, telmisartan, olmesartan, irbesartan, and valsartan. These specific drugs were carefully selected from a pool of eight ARBs commonly reported in the literature and frequently encountered in hospitals [18,19]. In addition, a drug was selected as a target drug if at least 20 cases of DILI were recorded in the six hospitals participating in the study (Table 1).

For the case cohort, we included patients who had been administered any one of the six ARBs. The index date for this target cohort was defined as the date of the first ARB administration. Cases were patients who met the criteria for DILI within 60 days after the index date. The control cohort was defined as patients who did not exhibit DILI within 60 days after the index date. To minimize confounding between cases and controls, we performed propensity score matching (PSM) using the K-nearest neighbor algorithm based on age, sex, and baseline liver function tests (LFTs) at the time of enrollment, maintaining a 1:3 ratio of cases to controls. The LFTs utilized in the matching process included aspartate aminotransferase (AST), ALT, ALP, and total bilirubin (TBL). The inclusion criteria required a visit record at least 30 days prior to the index date and at least two LFTs within the 60 days preceding the index date during the pre-observation period. Patients whose measured LFT values exceeded the ULN were excluded. An overview of the cohort construction process is presented in Figure 1. For accessibility, the cohort definitions created in ATLAS are available as JSON files on GitHub [20].
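The matching step can be illustrated with a small sketch. It assumes that a propensity score has already been estimated for every patient from age, sex, and baseline LFTs, and it uses greedy nearest-neighbor matching without replacement; both the greedy strategy and all names below are assumptions made for illustration, not the study's implementation.

```python
def match_controls(case_scores, control_scores, ratio=3):
    """Greedy nearest-neighbor matching on the propensity score.

    case_scores / control_scores: dicts mapping patient id -> score.
    Each control is used at most once (matching without replacement).
    Returns a dict mapping each case id to a list of `ratio` control ids.
    """
    available = dict(control_scores)
    matches = {}
    for case_id, score in case_scores.items():
        if len(available) < ratio:
            raise ValueError("not enough controls left to match")
        # Sort the remaining controls by distance to this case's score.
        nearest = sorted(available, key=lambda cid: abs(available[cid] - score))
        chosen = nearest[:ratio]
        matches[case_id] = chosen
        for cid in chosen:
            del available[cid]
    return matches


# Hypothetical scores for two cases and eight candidate controls.
cases = {"case_a": 0.30, "case_b": 0.70}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.35, "c4": 0.66,
            "c5": 0.71, "c6": 0.75, "c7": 0.10, "c8": 0.95}
matched = match_controls(cases, controls)
print(matched)
```

With a 1:3 ratio, every matched case contributes three controls, which is why the post-matching cohorts contain three times as many controls as cases.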

Figure 1

The overall flowchart for predicting drug-induced liver injury (DILI) events. SH: Severance Hospital, GSH: Gangnam Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Cancer Hospital, NCC: National Cancer Center, ULN: upper limit of normal, IMV-LSTM: interpretability multivariate long short-term memory.

4. Candidate Predictors for the Time-Series

In this study, we extracted candidate predictors from various domains within the OMOP-CDM by querying per-patient observational data using Python’s SQL query tools. Candidate variables were selected from all concepts used in the person domain of the CDM (sex, age) and in four main domains: measurement, drug exposure, condition occurrence, and procedure occurrence. We handled laboratory tests as continuous variables and the rest as dichotomous variables. To select the predictors, we conducted statistical tests to assess the significance of the difference between the cohort’s enrollment time and the onset date of DILI. For continuous variables, we employed the paired t-test, while for dichotomous variables, we used the McNemar test. To organize the data in a time-series format, we created a table in which the candidate variables were pivoted into columns and dates were represented in rows. Missing values were handled by forward-filling for laboratory test values and diagnoses, and by zero-filling for medications and treatments. To predict DILI, the sequential data were segmented using a 4-week window with a 2-week shift into the prediction period.
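The imputation and windowing steps can be sketched as follows. The example assumes one row per week, so that a 4-week window is four rows and a 2-week shift is a stride of two rows; this granularity, the function names, and the sample values are assumptions made for illustration.

```python
def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def zero_fill(values):
    """Treat a missing medication or procedure record as 'not given' (0)."""
    return [0 if value is None else value for value in values]


def sliding_windows(rows, size, step):
    """Return windows of `size` rows, advancing `step` rows each time."""
    return [rows[i:i + size] for i in range(0, len(rows) - size + 1, step)]


# Hypothetical weekly series for one patient (None marks a missing entry).
alt = forward_fill([22, None, 25, None, None, 31, None, 40])
drug = zero_fill([1, None, None, 1, None, None, 1, None])

rows = list(zip(alt, drug))          # one (lab value, drug flag) row per week
windows = sliding_windows(rows, size=4, step=2)
print(len(windows), windows[1])
```

Forward-filling suits laboratory values and diagnoses because the last known state is the best available estimate, whereas a missing medication record is more plausibly an absence of exposure.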

5. DILI Prediction Modeling

For DILI prediction modeling, we utilized an advanced LSTM model called IMV-LSTM, which is designed to predict and interpret multivariate time-series data [17]. As illustrated in Supplementary Figure S1, IMV-LSTM enhances the conventional LSTM model by considering the temporal aspect of each variable. It expands the hidden states for each variable of the multivariate time series, enabling the computation of variable attention and temporal attention scores. These scores reflect the importance of both variables and time in the model’s interpretation.

For DILI prediction, independent datasets were curated for each drug within each hospital. These datasets were then partitioned into training, testing, and validation sets at a ratio of 6:2:2. Each model was trained for up to 200 epochs with a batch size of 64 and a learning rate of 0.001. We trained with the Adam optimizer and, to mitigate overfitting, applied early stopping after 20 epochs. The performance of each model was evaluated using the area under the receiver operating characteristic curve (AUROC) on its internal test set. Supplementary metrics, namely accuracy, precision, F1-score, and the area under the precision-recall curve (AUPRC), were also reported to provide a comprehensive assessment of model performance.
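The data split and the stopping rule can be sketched without any deep learning framework. The sketch below assumes a random split and reads "early stopping after 20 epochs" as a patience of 20 epochs without improvement in validation loss; that reading, the class and function names, and the simulated losses are assumptions, not details confirmed by the study.

```python
import random


def split_622(items, seed=0):
    """Shuffle and split items into train, test, and validation sets (6:2:2)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_test = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


train, test, valid = split_622(range(100))

# Simulated validation losses: improvement for three epochs, then a plateau.
val_losses = [1.0, 0.8, 0.7] + [0.75] * 197   # 200 epochs in total
stopper = EarlyStopping(patience=20)
stopped_epoch = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        stopped_epoch = epoch
        break
print(len(train), len(test), len(valid), stopped_epoch)
```

In this simulation the last improvement occurs at epoch 2, so training halts 20 epochs later instead of running all 200 epochs.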

In this study, DILI prediction models were created for each hospital and drug, and each model had a different selection of candidate variables. To interpret the predictors in each model, variable-wise attention scores and temporal-wise attention scores were extracted from all trained models. These scores were then aggregated by calculating an overall temporal attention score, which was obtained by taking a weighted average of the temporal attention value over the variable attention value for each predictor variable. The resulting scores were plotted as a heatmap for interpretation.
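One way to read this aggregation is as a weighted average across models, in which each model's temporal attention vector for a predictor is weighted by that model's variable attention for the same predictor. The sketch below follows that reading; the data layout, the function name, and the numbers are hypothetical, and it assumes positive weights and equal window lengths across models.

```python
def aggregate_temporal_attention(model_outputs):
    """Weighted average of temporal attention across models.

    model_outputs: list of dicts, one per trained model, each holding
      'variable_attention': {predictor: weight} and
      'temporal_attention': {predictor: [score per time step]}.
    Returns {predictor: [aggregated score per time step]}.
    """
    sums, weights = {}, {}
    for output in model_outputs:
        for name, weight in output["variable_attention"].items():
            temporal = output["temporal_attention"][name]
            acc = sums.setdefault(name, [0.0] * len(temporal))
            for step, score in enumerate(temporal):
                acc[step] += weight * score
            weights[name] = weights.get(name, 0.0) + weight
    return {name: [total / weights[name] for total in acc]
            for name, acc in sums.items()}


# Two hypothetical models; the second one did not select albumin.
models = [
    {"variable_attention": {"hematocrit": 0.6, "albumin": 0.4},
     "temporal_attention": {"hematocrit": [0.1, 0.2, 0.3, 0.4],
                            "albumin": [0.25, 0.25, 0.25, 0.25]}},
    {"variable_attention": {"hematocrit": 0.2},
     "temporal_attention": {"hematocrit": [0.3, 0.2, 0.2, 0.3]}},
]
aggregated = aggregate_temporal_attention(models)
print(aggregated["hematocrit"])
```

Because each model has its own set of candidate variables, the aggregation only averages over the models in which a given predictor actually appears.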

After assembling the cohort from individual institutions through the DRNs, the execution code utilized by the primary hospital, which was publicly available on GitHub [20], was shared with each participating hospital. Subsequently, the code was executed, and only non-sensitive results were obtained and consolidated.

III. Results

1. Demographic and Clinical Characteristics

In this study, a total of 336,680 patients were included in the cohort across six institutions. Among them, 3,833 patients were identified as experiencing DILI, resulting in an overall incidence rate of 1.15% for all ARBs. Among the drugs, losartan (1.30%) had the highest incidence rate, followed by valsartan (1.28%), candesartan (1.21%), irbesartan (1.07%), telmisartan (1.0%), and olmesartan (0.85%). Regarding the incidence by hospital, NCC had the highest incidence (6%), followed by SH (1.52%), AJUH (1.34%), GSH (1.21%), KYUH (1.08%), and SNUH (0.5%). However, olmesartan (14 cases) and irbesartan (10 cases), with fewer than 20 case samples in the NCC, were excluded from the analysis.
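The overall and hospital-level rates quoted above are consistent with dividing the number of DILI cases by the number of patients without DILI in Table 1. The short check below reproduces three of them under that assumption; the function name is hypothetical.

```python
def rate_percent(cases, non_cases):
    """Cases per 100 patients without DILI, rounded to two decimals."""
    return round(100 * cases / non_cases, 2)


overall = rate_percent(3833, 332847)   # all ARBs, all hospitals
sh = rate_percent(1652, 108657)        # Severance Hospital
ncc = rate_percent(270, 4503)          # National Cancer Center
print(overall, sh, ncc)
```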

2. Model Performance

To evaluate the DILI prediction model, the AUROCs for each drug and each hospital are shown in Figure 2. Telmisartan had the highest average AUROC (0.93; 95% confidence interval [CI], 0.91–0.96), followed by irbesartan (0.90; 95% CI, 0.85–0.97), losartan (0.89; 95% CI, 0.85–0.95), olmesartan (0.89; 95% CI, 0.83–0.95), and candesartan (0.83; 95% CI, 0.73–0.95), with valsartan having the lowest average AUROC (0.79; 95% CI, 0.68–0.91). The results indicate that model performance for a given drug varied considerably across hospitals. For example, candesartan had the highest AUROC at SH (0.96; 95% CI, 0.95–0.98) but the lowest at SNUH (0.61; 95% CI, 0.51–0.71). Irbesartan showed the highest performance at GSH (0.97; 95% CI, 0.92–1.00) but the lowest performance at KYUH (0.78; 95% CI, 0.66–0.89).
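For readers unfamiliar with the metric, the AUROC can be computed as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below implements that definition together with a percentile bootstrap interval. The source does not state how its confidence intervals were obtained, so the bootstrap is an assumption, and the labels and scores are invented for illustration.

```python
import random


def auroc(labels, scores):
    """Probability that a random case outranks a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples lacking a class are skipped."""
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical test set: 3 cases and 5 controls with predicted scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5]
auc = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(round(auc, 3), round(lower, 3), round(upper, 3))
```

Small test sets produce wide intervals, which is one plausible reason why hospitals with few cases show less stable AUROC estimates.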

Figure 2

Receiver operating characteristic (ROC) curves of the drug-induced liver injury (DILI) prediction model for each hospital and each drug: (A) losartan, (B) candesartan, (C) telmisartan, (D) olmesartan, (E) irbesartan, and (F) valsartan. SH: Severance Hospital, GSH: Gangnam Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Cancer Hospital, NCC: National Cancer Center, AUC: area under the ROC curve.

To confirm the robustness of the DILI prediction model, additional performance metrics were calculated for the trained models. These metrics are shown in Table 2, with an overall average AUPRC of 0.76, an F1 score of 0.71, an accuracy of 0.85, and a precision of 0.79. Telmisartan at KYUH had the highest AUPRC value (0.95), followed by candesartan and olmesartan at SH (0.91). However, there were some poorly trained or overfitted models based on the F1-score, including candesartan at SNUH (0.17), valsartan at SNUH (0.12), and valsartan at NCC (0.23).
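The gap between accuracy and F1-score in the weaker models can be illustrated with a small worked example. With a 1:3 ratio of cases to controls, a model that misses most cases can still reach an accuracy of 0.75 while its F1-score stays low. The predictions below are invented for illustration, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for binary labels (1 = DILI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# 3 cases and 9 controls; the model finds 1 case and raises 1 false alarm.
y_true = [1, 1, 1] + [0] * 9
y_pred = [1, 0, 0] + [1] + [0] * 8
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting F1-score and AUPRC alongside AUROC and accuracy therefore helps expose models that perform poorly on the minority class.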

3. Aggregated Attention Scores of the DILI Prediction Model

To interpret the DILI prediction model, we present the temporal attention values of each contributing variable, aggregated with weights across the models for each institution and drug (Figure 3). Hematocrit in the last week (0.36) showed the highest attention score, followed by albumin (0.34), hypertensive disorder (0.33), prothrombin time (0.32), lymphocytes (0.32), and cholesterol (0.3). These variables displayed an increasing trend in their attention scores. In addition, the temporal pattern was verified by visualizing the distribution of the actual data of the matching variables. The attention scores for all variables across all hospitals are presented in Supplementary Table S2.

Figure 3

Temporal attention score of important features of the drug-induced liver injury (DILI) prediction model (A) and the distribution of actual data (B).

IV. Discussion

In this study, we developed and validated a DILI prediction model using IMV-LSTM, which provides time-based explanations, with data from six hospitals based on a CDM, allowing a multicenter study without data transfer. We confirmed the association between ARBs and DILI, consistent with previous literature reporting an incidence rate of less than 2%. We also observed subtle differences in the occurrence rates among different ARB drugs. The time-series-based learning model achieved a high average AUROC of 0.87 across all drugs and hospitals (Table 2), indicating strong predictive performance. A comprehensive interpretation of the trained models highlighted the impact of indicators such as hematocrit, albumin, hypertensive disorder, prothrombin time, and lymphocytes, whose attention scores increased from 4 weeks to 1 week prior to the occurrence of DILI. However, considering the influence of other biases, further examination is necessary. This study holds significance as it adopted the protocol used in the FDA Sentinel for clinical post-marketing surveillance purposes and adapted it to the DRN setting, which is operated by national agencies [21]. This aligns the study with established protocols and enhances its applicability for real-world monitoring of drug safety.

A multicenter study requires a standardized process and terminological system. We used the most common and major DILI ADR terminology set and protocol, which can be a cornerstone for further research. Moreover, we shared the defined SQL query, specification documentation, and ATLAS definitions on GitHub. Finally, we have distributed open-source packages for the public to contribute to DILI research.

There have been few multicenter-based studies on this issue. To the best of our knowledge, this is the first multicenter, national-level study using a CDM for DILI prediction, an approach that improves reliability, yields more meaningful results from big data, and protects patients’ private information from leaking. In particular, we adopted a time-related attention mechanism to reveal the importance of variables at each time point. Explainability is one of the essential components for clinical implementation, and our study results can provide patient-level explanations, which is a strong point for future applications.

Nonetheless, our study has some limitations. First, the overall incidence of DILI was relatively low, at less than 2%. To address this, we employed PSM and utilized multicenter data to enhance the robustness and validity of our analysis. Second, the CDM had certain limitations in terms of the coverage and granularity of specific DILI-related features. As a result, we may not have been able to consider the full range of variables that could contribute to the prediction of DILI. Despite these limitations, our study provides valuable insights into the prediction of DILI in patients taking ARBs by leveraging multicenter data and a time-series deep learning approach. Future research could advance our understanding of DILI and our ability to predict it by implementing federated learning and utilizing multi-institutional DRNs at the national level.


Conflict of Interest

Rae Woong Park is an editorial member of Healthcare Informatics Research; however, he was not involved in the peer reviewer selection, evaluation, or decision process of this article. Otherwise, no potential conflict of interest relevant to this article was reported.


This study was supported by the Bio-Industrial Technology Development Program (No. 20014841), funded by the Ministry of Trade, Industry & Energy (MOTIE, South Korea). We would like to thank the MCU (Medical informatics Collaborative Unit) members of Yonsei University College of Medicine for their assistance in data analysis.


1. Liu Y, Aickelin U. Feature selection in detection of adverse drug reactions from the Health Improvement Network (THIN) database. arXiv [Preprint]. 2014 Sep 2.
2. Montastruc JL, Lafaurie M, de Canecaude C, Durrieu G, Sommet A, Montastruc F, et al. Fatal adverse drug reactions: a worldwide perspective in the World Health Organization pharmacovigilance database. Br J Clin Pharmacol 2021;87(11):4334–40.
3. Sonawane KB, Cheng N, Hansen RA. Serious adverse drug events reported to the FDA: analysis of the FDA adverse event reporting system 2006–2014 database. J Manag Care Spec Pharm 2018;24(7):682–90.
4. Jaganathan K, Tayara H, Chong KT. Prediction of drug-induced liver toxicity using SVM and optimal descriptor sets. Int J Mol Sci 2021;22(15):8073.
5. Liu A, Walter M, Wright P, Bartosik A, Dolciami D, Elbasir A, et al. Prediction and mechanistic analysis of drug-induced liver injury (DILI) based on chemical structure. Biol Direct 2021;16(1):6.
6. Chalasani N, Bjornsson E. Risk factors for idiosyncratic drug-induced liver injury. Gastroenterology 2010;138(7):2246–59.
7. Chen Z, Jiang Y, Zhang X, Zheng R, Qiu R, Sun Y, et al. The prediction approach of drug-induced liver injury: response to the issues of reproducible science of artificial intelligence in real-world applications. Brief Bioinform 2022;23(4):bbac196.
8. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc 2014;21(4):578–82.
9. Huser V, Amos L. Analyzing real-world use of research common data elements. AMIA Annu Symp Proc 2018;2018:602–8.
10. Voss EA, Makadia R, Matcho A, Ma Q, Knoll C, Schuemie M, et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J Am Med Inform Assoc 2015;22(3):553–64.
11. Ryu B, Yoo S, Kim S, Choi J. Development of prediction models for unplanned hospital readmission within 30 days based on common data model: a feasibility study. Methods Inf Med 2021;60(S 02):e65–75.
12. Ball R, Robb M, Anderson SA, Dal Pan G. The FDA’s sentinel initiative: a comprehensive approach to medical product surveillance. Clin Pharmacol Ther 2016;99(3):265–8.
13. The Council for International Organizations of Medical Sciences (CIOMS). Drug-induced liver injury (DILI): current status and future directions for drug development and the post-market setting. Geneva, Switzerland: CIOMS; 2020.
14. Goldstein A, Kapelner A, Bleich J, Pitkin E. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. J Comput Graph Stat 2015;24(1):44–65.
15. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017;30:4765–74.
16. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017 Oct 22–29; Venice, Italy. p. 618–26.
17. Guo T, Lin T, Antulov-Fantulin N. Exploring interpretable LSTM neural networks over multi-variable data. In: Proceedings of the 36th International Conference on Machine Learning (ICML); 2019 Jun 9–15; Long Beach, CA. p. 2494–504.
18. Barreras A, Gurk-Turner C. Angiotensin II receptor blockers. Proc (Bayl Univ Med Cent) 2003;16(1):123–6.
19. Hill RD, Vaidya PN. Angiotensin II receptor blockers (ARB). Treasure Island (FL): StatPearls Publishing; 2019.
20. DigitalHealthcareLab. MOACDM [Internet]. Seoul, Korea: DigitalHealthcareLab; 2023. [cited 2023 Jul 27]. Available from:
21. Wang SV, Gagne JJ, Maro JC, Eworuke E, Kattinakere S, Kulldorff M, et al. Development and evaluation of a global propensity score for data mining with tree-based scan statistics (Sentinel Methods Protocol). Toledo (OH): The Sentinel System; 2018.

Table 1

Cohort population before and after propensity score matching

Values in each cell: DILI cases / patients without DILI / rate (%)

Hospital | Valsartan | Losartan | Candesartan | Telmisartan | Olmesartan | Irbesartan | Total

Before propensity score matching
SH (n = 110,309) | 463 / 27,055 / 1.71 | 350 / 22,207 / 1.58 | 379 / 23,707 / 1.6 | 216 / 16,350 / 1.32 | 131 / 11,713 / 1.12 | 113 / 7,625 / 1.48 | 1,652 / 108,657 / 1.52
GSH (n = 34,420) | 124 / 9,181 / 1.35 | 96 / 7,262 / 1.32 | 83 / 6,734 / 1.23 | 51 / 4,536 / 1.12 | 29 / 3,692 / 0.79 | 27 / 2,605 / 1.04 | 410 / 34,010 / 1.21
KYUH (n = 28,615) | 37 / 3,870 / 0.96 | 71 / 6,069 / 1.17 | 52 / 4,515 / 1.15 | 34 / 3,173 / 1.07 | 75 / 6,377 / 1.18 | 37 / 4,305 / 0.86 | 306 / 28,309 / 1.08
AJUH (n = 49,451) | 116 / 8,090 / 1.43 | 64 / 4,411 / 1.45 | 133 / 8,396 / 1.58 | 182 / 15,375 / 1.18 | 77 / 6,899 / 1.12 | 81 / 5,627 / 1.44 | 653 / 48,798 / 1.34
SNUH (n = 109,112) | 110 / 20,366 / 0.54 | 216 / 21,775 / 0.68 | 57 / 16,155 / 0.35 | 81 / 18,402 / 0.44 | 28 / 12,427 / 0.23 | 50 / 9,445 / 0.53 | 542 / 108,570 / 0.5
NCC (n = 4,773) | 36 / 814 / 4.42 | 163 / 2,222 / 7.34 | 24 / 475 / 5.05 | 23 / 572 / 4.02 | 14 / 317 / 4.42 | 10 / 103 / 9.71 | 270 / 4,503 / 6
Total (n = 336,680) | 886 / 69,376 / 1.28 | 960 / 79,946 / 1.30 | 728 / 59,982 / 1.21 | 587 / 58,408 / 1 | 354 / 41,425 / 0.85 | 318 / 29,710 / 1.07 | 3,833 / 332,847 / 1.15

After propensity score matching
SH (n = 6,608) | 463 / 1,389 / 33 | 350 / 1,050 / 33 | 379 / 1,137 / 33 | 216 / 648 / 33 | 131 / 397 / 33 | 113 / 339 / 33 | 1,652 / 4,956 / 33
GSH (n = 1,640) | 124 / 372 / 33 | 96 / 288 / 33 | 83 / 249 / 33 | 51 / 153 / 33 | 29 / 87 / 33 | 27 / 81 / 33 | 410 / 1,230 / 33
KYUH (n = 1,224) | 37 / 111 / 33 | 71 / 213 / 33 | 52 / 156 / 33 | 34 / 102 / 33 | 75 / 225 / 33 | 37 / 111 / 33 | 306 / 918 / 33
AJUH (n = 2,612) | 116 / 348 / 33 | 64 / 192 / 33 | 133 / 399 / 33 | 182 / 546 / 33 | 77 / 231 / 33 | 81 / 243 / 33 | 653 / 1,959 / 33
SNUH (n = 2,168) | 110 / 330 / 33 | 216 / 648 / 33 | 57 / 171 / 33 | 81 / 243 / 33 | 28 / 84 / 33 | 50 / 150 / 33 | 542 / 1,626 / 33
NCC (n = 1,080) | 36 / 108 / 33 | 163 / 489 / 33 | 24 / 72 / 33 | 23 / 69 / 33 | - / - / * | - / - / * | 246 / 738 / 33
Total (n = 15,332) | 886 / 2,658 / 33 | 960 / 2,880 / 33 | 728 / 2,184 / 33 | 587 / 1,761 / 33 | 340 / 1,020 / 33 | 308 / 924 / 33 | 3,809 / 11,427 / 33

DILI: drug-induced liver injury, SH: Severance Hospital, GSH: Gangnam Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Hospital, NCC: National Cancer Center.


*Excluded if there are fewer than 20 cases in the case group (NCC: olmesartan, irbesartan).

Table 2

Performance metrics of the DILI prediction model for each hospital and each drug

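Drug | Hospital | AUROC | AUPRC | F1-score | Accuracy | Precision

Candesartan | GSH | 0.95 | 0.87 | 0.84 | 0.91 | 0.89
Candesartan | SH | 0.96 | 0.91 | 0.88 | 0.93 | 0.93
Candesartan | KYUH | 0.84 | 0.84 | 0.77 | 0.84 | 0.87
Candesartan | AJUH | 0.94 | 0.85 | 0.81 | 0.90 | 0.88
Candesartan | SNUH | 0.61 | 0.41 | 0.17 | 0.70 | 0.46
Candesartan | NCC | 0.70 | 0.75 | 0.55 | 0.81 | 0.92
Candesartan | All hospitals | 0.83 | 0.77 | 0.67 | 0.85 | 0.83

Irbesartan | GSH | 0.97 | 0.72 | 0.96 | 0.97 | 0.96
Irbesartan | SH | 0.94 | 0.88 | 0.86 | 0.90 | 0.84
Irbesartan | KYUH | 0.78 | 0.69 | 0.56 | 0.64 | 0.65
Irbesartan | AJUH | 0.93 | 0.79 | 0.75 | 0.87 | 0.80
Irbesartan | SNUH | 0.90 | 0.86 | 0.80 | 0.87 | 0.88
Irbesartan | NCC | - | - | - | - | -
Irbesartan | All hospitals | 0.90 | 0.79 | 0.79 | 0.85 | 0.83

Losartan | GSH | 0.91 | 0.46 | 0.83 | 0.89 | 0.86
Losartan | SH | 0.95 | 0.87 | 0.84 | 0.90 | 0.83
Losartan | KYUH | 0.91 | 0.82 | 0.79 | 0.87 | 0.81
Losartan | AJUH | 0.90 | 0.77 | 0.73 | 0.87 | 0.78
Losartan | SNUH | 0.92 | 0.66 | 0.82 | 0.88 | 0.85
Losartan | NCC | 0.76 | 0.57 | 0.50 | 0.76 | 0.54
Losartan | All hospitals | 0.89 | 0.69 | 0.75 | 0.86 | 0.78

Olmesartan | GSH | 0.90 | 0.79 | 0.75 | 0.82 | 0.75
Olmesartan | SH | 0.97 | 0.91 | 0.90 | 0.93 | 0.90
Olmesartan | KYUH | 0.88 | 0.75 | 0.70 | 0.83 | 0.76
Olmesartan | AJUH | 0.91 | 0.77 | 0.72 | 0.88 | 0.82
Olmesartan | SNUH | 0.76 | 0.70 | 0.59 | 0.74 | 0.72
Olmesartan | NCC | - | - | - | - | -
Olmesartan | All hospitals | 0.88 | 0.78 | 0.73 | 0.84 | 0.79

Telmisartan | GSH | 0.90 | 0.73 | 0.67 | 0.78 | 0.71
Telmisartan | SH | 0.95 | 0.89 | 0.87 | 0.92 | 0.87
Telmisartan | KYUH | 0.96 | 0.95 | 0.93 | 0.95 | 0.96
Telmisartan | AJUH | 0.95 | 0.83 | 0.81 | 0.90 | 0.81
Telmisartan | SNUH | 0.89 | 0.80 | 0.76 | 0.82 | 0.75
Telmisartan | NCC | - | - | - | - | -
Telmisartan | All hospitals | 0.93 | 0.84 | 0.81 | 0.87 | 0.82

Valsartan | GSH | 0.90 | 0.85 | 0.81 | 0.87 | 0.84
Valsartan | SH | 0.94 | 0.86 | 0.81 | 0.89 | 0.87
Valsartan | KYUH | 0.70 | 0.68 | 0.59 | 0.76 | 0.68
Valsartan | AJUH | 0.95 | 0.88 | 0.86 | 0.92 | 0.86
Valsartan | SNUH | 0.59 | 0.46 | 0.12 | 0.68 | 0.56
Valsartan | NCC | 0.64 | 0.43 | 0.23 | 0.75 | 0.50
Valsartan | All hospitals | 0.82 | 0.75 | 0.64 | 0.82 | 0.76

All drugs | All hospitals | 0.87 | 0.76 | 0.71 | 0.85 | 0.79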

DILI: drug-induced liver injury, GSH: Gangnam Severance Hospital, SH: Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Hospital, NCC: National Cancer Center, AUROC: area under the receiver operating characteristic curve, AUPRC: area under the precision-recall curve.