Development and Verification of Time-Series Deep Learning for Drug-Induced Liver Injury Detection in Patients Taking Angiotensin II Receptor Blockers: A Multicenter Distributed Research Network Approach

Article information

Healthc Inform Res. 2023;29(3):246-255

Publication date (electronic) : 2023 July 31

doi : https://doi.org/10.4258/hir.2023.29.3.246

Suncheol Heo¹,*, Jae Yong Yu¹,*, Eun Ae Kang², Hyunah Shin³, Kyeongmin Ryu³, Chungsoo Kim⁴, Yebin Chegal⁵, Hyojung Jung⁶, Suehyun Lee³, Rae Woong Park⁴, Kwangsoo Kim⁷, Yul Hwangbo⁶, Jae-Hyun Lee⁸, Yu Rang Park¹

¹Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea

²Medical Informatics Collaborative Unit, Department of Research Affairs, Yonsei University College of Medicine, Seoul, Korea

³Healthcare Data Science Center, Konyang University Hospital, Daejeon, Korea

⁴Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Korea

⁵Department of Statistics, Korea University, Seoul, Korea

⁶Healthcare AI Team, National Cancer Center, Goyang, Korea

⁷Transdisciplinary Department of Medicine & Advanced Technology, Seoul National University Hospital, Seoul, Korea

⁸Division of Allergy and Immunology, Department of Internal Medicine, Institute of Allergy, Yonsei University College of Medicine, Seoul, Korea

Corresponding Author: Yu Rang Park, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03772, Korea. Tel: +82-2-2228-2493, E-mail: yurangpark@yuhs.ac (https://orcid.org/0000-0002-4210-2094)

*These authors contributed equally to this work.

Received 2023 May 18; Revised 2023 July 20; Accepted 2023 July 23.

Abstract

Objectives

The objective of this study was to develop and validate a multicenter-based, multi-model, time-series deep learning model for predicting drug-induced liver injury (DILI) in patients taking angiotensin II receptor blockers (ARBs). The study leveraged a national-level multicenter approach, utilizing electronic health records (EHRs) from six hospitals in Korea.

Methods

A retrospective cohort analysis was conducted using EHRs from six hospitals in Korea, comprising a total of 10,852 patients whose data were converted to the Common Data Model. The study assessed the incidence rate of DILI among patients taking ARBs and compared it to a control group. Temporal patterns of important variables were analyzed using an interpretable time-series model.

Results

The overall incidence rate of DILI among patients taking ARBs was found to be 1.09%. The incidence rates varied for each specific ARB drug and institution, with valsartan having the highest rate (1.24%) and olmesartan having the lowest rate (0.83%). The DILI prediction models showed varying performance, measured by the average area under the receiver operating characteristic curve, with telmisartan (0.93), losartan (0.92), and irbesartan (0.90) exhibiting higher classification performance. The aggregated attention scores from the models highlighted the importance of variables such as hematocrit, albumin, prothrombin time, and lymphocytes in predicting DILI.

Conclusions

Implementing a multicenter-based time-series classification model provided evidence that could be valuable to clinicians regarding temporal patterns associated with DILI in ARB users. This information supports informed decisions regarding appropriate drug use and treatment strategies.

Keywords: Adverse Drug Reaction; Time-Series Classification; Distributed Research Network; Common Data Model; Multicenter Study

I. Introduction

Adverse drug reactions (ADRs) are a significant concern for public health, as they can cause hospital admissions and rank among the leading causes of death [1,2]. According to the Food and Drug Administration, the number of ADRs has been steadily increasing over the years, tripling from 2006 to 2014 [3]. Drug-induced liver injury (DILI), in particular, stands out as one of the primary reasons underlying ADRs in real-world treatment, significantly affecting patient safety and drug development [4,5]. Despite the importance of predicting DILI risk for ensuring safety, adverse hepatic effects on health remain unpredictable, and there is insufficient evidence to support risk factors for DILI resulting from medications [4,6].

Therefore, many researchers have focused on identifying early hepatotoxic risk for future intervention using artificial intelligence (AI) and big data [4,5,7]. In 2021, Jaganathan et al. [4] reported an accuracy of 0.811 using a molecular-level support vector machine. Chen et al. [7] developed a multi-source-based prediction model using the ResNet-18 deep neural network. However, these studies had two main limitations. The first is the lack of standardized multicenter validation, which would yield more reliable results. Multicenter studies face hurdles because of the risk of leaking patient information and the many laws that protect it. The second limitation lies in the black-box nature of AI. Most research has utilized conventional statistical methods or deep learning techniques [4,5,7]. Although the performance of these methods is sufficiently high, their predictions lack explainability, which makes the models difficult to implement in clinical environments.

Multicenter research has been conducted using standardized models such as the Patient-Centered Outcomes Research Network, the National Institutes of Health Common Data Elements, and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) [8–10]. CDM-based studies showed effective results due to the advantage of having an optimized data structure and terminology system for multicenter studies. OMOP-CDM is superior to other CDM models in terms of content coverage, integrity, and integration. Most medical center data have been converted into OMOP-CDM [11]. Furthermore, in 2016, the US Food and Drug Administration (FDA) Sentinel, a specialized system for drug surveillance [12], was initiated. While research using this system has been conducted on specific areas such as pharmacoepidemiology and hemorrhage in the USA, its application for DILI has not been explored [13].

In terms of model explainability, several techniques such as Grad-CAM, Shapley values, and partial dependence plots have been suggested [14–16]. However, most of these methodologies were primarily designed for image or tabular data. The interpretability multivariate long short-term memory (IMV-LSTM) model, which provides time-based explanations, was recently published [17]. To the best of our knowledge, there has been no application of time-based explanations in DILI research.

To address these research gaps, we developed and validated a multicenter-based explainable time-series AI model for predicting DILI using data from six hospitals in Korea.

II. Methods

1. Study Design

This study is a retrospective cohort analysis using a standardized CDM of electronic health records (EHRs) from six hospitals in South Korea to predict DILI. The data sources include Severance Hospital (SH), Gangnam Severance Hospital (GSH), Konyang University Hospital (KYUH), Ajou University Hospital (AJUH), Seoul National University Cancer Hospital (SNUH), and the National Cancer Center (NCC). The study utilized OMOP-CDM version 5.3.1. In order to identify risk factors for DILI, we constructed cohorts based on each hospital and drug. The CDM-based distributed research networks (DRNs) encompassed a population of approximately 12.47 million individuals from 1994 to 2021. From this dataset, we curated a final cohort consisting of 15,236 subjects, comprising 3,809 cases and 11,427 controls. The detailed study design can be found in Supplementary Table S1.

This study was approved by the Institutional Review Committee of Severance Hospital (No. 4-2021-1209), Gangnam Severance Hospital (No. 3-2021-0005), Konyang University Hospital (No. KYUH 2021-10-003-001), Ajou University Hospital (No. AJIRB-MED-MDB-21-676), Seoul National University Cancer Hospital (No. E-2207-151-1342), and the National Cancer Center (No. NCC2022-0184).

2. Definition of Drug-Induced Liver Injury

In this research, we employed criteria for defining DILI classification stages that align with the “injury” category. These were: (1) an alanine aminotransferase (ALT) elevation ≥5 times the upper limit of normal (ULN), (2) an alkaline phosphatase (ALP) elevation ≥2 times the ULN, or (3) an ALT ≥3 times the ULN accompanied by a total bilirubin concentration above 2 times the ULN [13].
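The three criteria above can be written as a single rule. The following is a minimal sketch, assuming each laboratory value has already been divided by its own ULN; the `LiverPanel` class and `meets_dili_criteria` function are hypothetical names used only for illustration and are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass
class LiverPanel:
    """Lab values expressed as multiples of the upper limit of normal (ULN)."""
    alt_x_uln: float  # alanine aminotransferase / ULN
    alp_x_uln: float  # alkaline phosphatase / ULN
    tbl_x_uln: float  # total bilirubin / ULN


def meets_dili_criteria(panel: LiverPanel) -> bool:
    """Return True if any of the three criteria listed in the text is met."""
    alt_alone = panel.alt_x_uln >= 5
    alp_alone = panel.alp_x_uln >= 2
    alt_with_bilirubin = panel.alt_x_uln >= 3 and panel.tbl_x_uln > 2
    return alt_alone or alp_alone or alt_with_bilirubin


# ALT at 3.5x ULN only counts when bilirubin is strictly above 2x ULN.
print(meets_dili_criteria(LiverPanel(3.5, 1.0, 2.1)))
print(meets_dili_criteria(LiverPanel(3.5, 1.0, 2.0)))
```

Note that the third criterion uses a strict inequality for bilirubin ("above 2 times the ULN"), whereas the ALT and ALP thresholds are inclusive.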

3. Cohort Definition

This study aimed to predict DILI by focusing on six selected drugs in the category of angiotensin II receptor blockers (ARBs): losartan, candesartan, telmisartan, olmesartan, irbesartan, and valsartan. These specific drugs were carefully selected from a pool of eight ARBs commonly reported in the literature and frequently encountered in hospitals [18,19]. In addition, a drug was selected as a target drug if at least 20 cases of DILI were recorded in the six hospitals participating in the study (Table 1).

Table 1

Cohort population before and after propensity score matching

For the case cohort, we included patients who had been administered any one of the six ARBs. The index date for this target cohort was determined as the initial administration date of the ARBs. Initially, we included patients who met the criteria for DILI within 60 days after the index date. The control cohort was defined as patients who did not exhibit DILI within 60 days after the index date. To minimize confounding factors compared to cases, we performed propensity score matching (PSM) using the K-nearest neighbor algorithm based on age, sex, and baseline liver function tests (LFTs) at the time of enrollment, maintaining a 1:3 ratio between cases and controls. The LFTs utilized in the matching process included aspartate aminotransferase (AST), ALT, ALP, and total bilirubin (TBL). The inclusion criteria mandated that the visit record be at least 30 days prior to the index date, with the patient having undergone at least two LFTs within 60 days preceding the index date during the pre-observation period. Exclusion criteria encompassed cases where the measured LFT value exceeded the ULN value. To provide an overview of the cohort construction process, we have presented a diagram in Figure 1. For accessibility, the cohort definitions created in ATLAS are available as JSON files on GitHub [20].
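The 1:3 matching step can be illustrated with a small sketch. It assumes a propensity score has already been estimated for every patient and uses greedy nearest-neighbor matching without replacement; both are simplifying assumptions, and the study's actual K-nearest neighbor procedure may differ. The function name, patient identifiers, and scores are hypothetical.

```python
def match_controls(case_scores, control_scores, ratio=3):
    """Greedy nearest-neighbour matching without replacement.

    case_scores / control_scores: dicts mapping patient id -> propensity score.
    Returns a dict mapping each case id to a list of `ratio` control ids.
    """
    available = dict(control_scores)
    matches = {}
    for case_id, score in case_scores.items():
        # Rank the remaining controls by distance to this case's score.
        nearest = sorted(available, key=lambda cid: abs(available[cid] - score))[:ratio]
        if len(nearest) < ratio:
            raise ValueError("not enough controls left to keep the ratio")
        matches[case_id] = nearest
        for cid in nearest:
            del available[cid]  # each control is used at most once
    return matches


# Illustrative scores only; these are not study data.
cases = {"case_1": 0.30, "case_2": 0.70}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.35, "c4": 0.65,
            "c5": 0.72, "c6": 0.90, "c7": 0.10}
matched = match_controls(cases, controls)
print(matched)
```

Matching without replacement keeps the control count at exactly three times the case count, which is the pattern seen in the "After" rows of Table 1.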

Figure 1

The overall flowchart for predicting drug-induced liver injury (DILI) events. SH: Severance Hospital, GSH: Gangnam Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Cancer Hospital, NCC: National Cancer Center, ULN: upper limit of normal, IMV-LSTM: interpretability multivariate long short-term memory.

4. Candidate Predictors for the Time-Series

In this study, we extracted candidate predictors from various domains within the OMOP-CDM by querying per-patient observational data using Python’s SQL query tools. Candidate variables were selected from all concepts used in the person domain of the CDM (sex, age), and in four main domains: measurement, drug exposure, condition occurrence, and procedure occurrence. We handled laboratory tests as continuous variables and the rest as dichotomous variables. To select the predictors, we conducted statistical tests to assess the significance of the change in each variable between the cohort’s enrollment time and the onset date of DILI. For continuous variables, we employed the paired t-test, while for dichotomous variables, we used the McNemar test. To organize the data in a time-series format, we created a table in which the candidate variables were pivoted into columns and dates were represented in rows. Missing values were handled by forward-filling for laboratory test values and diagnoses, and zero-filling for medications and treatments. To predict DILI, the sequential data were split into 4-week windows with a 2-week shift into the prediction period.
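The imputation and windowing steps described above can be sketched as follows. This is an illustration under stated assumptions: one row per week, a window of four rows, and a shift of two rows. The helper names and the sample values are hypothetical, and the study's actual time granularity is not specified here.

```python
def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def zero_fill(values):
    """Treat a missing medication or treatment record as 'not given' (0)."""
    return [0 if v is None else v for v in values]


def sliding_windows(rows, window, shift):
    """Split a sequence into fixed-length windows that advance by `shift`."""
    return [rows[i:i + window] for i in range(0, len(rows) - window + 1, shift)]


# Illustrative weekly series for one patient (not study data).
alt_weekly = [21.0, None, None, 35.0, None, 40.0, None, 44.0]
drug_weekly = [1, None, 1, None, None, 1, None, None]

alt_filled = forward_fill(alt_weekly)
drug_filled = zero_fill(drug_weekly)
windows = sliding_windows(list(zip(alt_filled, drug_filled)), window=4, shift=2)
print(len(windows), windows[0])
```

Forward-filling suits laboratory values and diagnoses because the last known state is the best available estimate until a new record appears, whereas an absent medication record is more naturally read as no exposure.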

5. DILI Prediction Modeling

For DILI prediction modeling, we utilized an advanced LSTM model called the IMV-LSTM. This model is designed to predict and interpret multivariate time series data [17]. As illustrated in Supplementary Figure S1, the IMV-LSTM enhances the conventional LSTM model by considering the temporal aspect of each variable. It uses multivariate time series data to expand hidden states for each variable, enabling the computation of variable attention and temporal attention scores. These scores reflect the importance of both variables and time in the model’s interpretation.

For DILI prediction, independent datasets were curated for each drug within each hospital. These datasets were then partitioned into training, testing, and validation sets at a ratio of 6:2:2. Each model was trained for up to 200 epochs with the Adam optimizer, a batch size of 64, and a learning rate of 0.001. To mitigate overfitting, we applied early stopping after 20 epochs. The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) on its internal test set. Additionally, supplementary metrics such as accuracy, precision, F1-score, and the area under the precision-recall curve (AUPRC) are presented to provide a comprehensive assessment of model performance.
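The data split and the stopping rule can be sketched without any deep learning framework. The example below assumes that "early stopping after 20 epochs" means a patience of 20 epochs without improvement in validation loss; that reading is an assumption, as are the function names and the synthetic loss curve.

```python
import random


def split_622(items, seed=0):
    """Shuffle and split into 60% train, 20% validation, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def stopping_epoch(val_losses, patience=20):
    """Return the 1-based epoch at which training would stop."""
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_losses)  # patience never exhausted


train, val, test = split_622(range(100))
# Synthetic curve: improvement stops after epoch 2.
losses = [1.0, 0.9] + [0.95] * 30
print(len(train), len(val), len(test), stopping_epoch(losses))
```

With a patience rule, the 200-epoch setting acts as an upper bound rather than a fixed training length.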

In this study, DILI prediction models were created for each hospital and drug, and each model had a different selection of candidate variables. To interpret the predictors in each model, variable-wise attention scores and temporal-wise attention scores were extracted from all trained models. These scores were then aggregated by calculating an overall temporal attention score, which was obtained by taking a weighted average of the temporal attention value over the variable attention value for each predictor variable. The resulting scores were plotted as a heatmap for interpretation.
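One possible reading of this aggregation is sketched below: for each predictor, the temporal attention vectors from all models that include it are averaged, with each model's variable attention for that predictor as the weight. The exact weighting used in the study is not spelled out in the text, so this formula, the function name, and the attention values are all assumptions for illustration.

```python
def aggregate_temporal_attention(models):
    """Weighted average of temporal attention across models.

    `models` is a list of dicts: variable name -> (variable_attention,
    [temporal attention per time step]). A variable that is absent from
    a model simply contributes nothing for that model.
    """
    totals, weights = {}, {}
    for model in models:
        for name, (var_attn, temporal) in model.items():
            acc = totals.setdefault(name, [0.0] * len(temporal))
            for t, value in enumerate(temporal):
                acc[t] += var_attn * value
            weights[name] = weights.get(name, 0.0) + var_attn
    return {name: [v / weights[name] for v in acc] for name, acc in totals.items()}


# Two hypothetical hospital-drug models with four weekly time steps each.
models = [
    {"hematocrit": (0.2, [0.1, 0.2, 0.3, 0.4])},
    {"hematocrit": (0.6, [0.2, 0.2, 0.2, 0.4]),
     "albumin": (0.4, [0.25, 0.25, 0.25, 0.25])},
]
aggregated = aggregate_temporal_attention(models)
print(aggregated["hematocrit"])
```

Because each model selects its own candidate variables, normalizing by the summed variable attention keeps predictors that appear in few models comparable with those that appear in many.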

After assembling the cohort from individual institutions through the DRNs, the execution code utilized by the primary hospital, which was publicly available on GitHub [20], was shared with each participating hospital. Subsequently, the code was executed, and only non-sensitive results were obtained and consolidated.

III. Results

1. Demographic and Clinical Characteristics

In this study, a total of 336,680 patients were included in the cohort across six institutions. Among them, 3,833 patients were identified as experiencing DILI, resulting in an overall incidence rate of 1.15% for all ARBs. Among the drugs, losartan (1.30%) had the highest incidence rate, followed by valsartan (1.28%), candesartan (1.21%), irbesartan (1.07%), telmisartan (1.0%), and olmesartan (0.85%). Regarding the incidence by hospital, NCC had the highest incidence (6%), followed by SH (1.52%), AJUH (1.34%), GSH (1.21%), KYUH (1.08%), and SNUH (0.5%). However, olmesartan (14 cases) and irbesartan (10 cases), with fewer than 20 case samples in the NCC, were excluded from the analysis.
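The reported rates can be checked against the counts in Table 1. The short sketch below uses the per-hospital DILI and non-DILI counts from the "Before" rows; the helper names are hypothetical. It suggests that the percentages in the table correspond to DILI cases divided by non-DILI patients rather than by the whole cohort, although the denominator is not stated explicitly in the text.

```python
# (DILI, non-DILI) counts per hospital, copied from the "Before" rows of Table 1.
before_matching = {
    "SH": (1652, 108657),
    "GSH": (410, 34010),
    "KYUH": (306, 28309),
    "AJUH": (653, 48798),
    "SNUH": (542, 108570),
    "NCC": (270, 4503),
}


def pct_of_non_dili(dili, non_dili):
    """DILI cases per 100 non-DILI patients."""
    return round(100 * dili / non_dili, 2)


def pct_of_cohort(dili, non_dili):
    """DILI cases per 100 patients in the whole cohort."""
    return round(100 * dili / (dili + non_dili), 2)


total_dili = sum(d for d, _ in before_matching.values())
total_non_dili = sum(n for _, n in before_matching.values())

print(total_dili, total_non_dili)
print(pct_of_non_dili(total_dili, total_non_dili))  # matches the reported 1.15
print(pct_of_cohort(total_dili, total_non_dili))    # slightly lower
```

The two denominators differ only in the second decimal place at this incidence level, but the distinction matters when comparing against rates from other studies.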

2. Model Performance

To evaluate the DILI predictive model, the AUROCs for each drug and each hospital are shown in Figure 2. Telmisartan had the highest average AUROC (0.93; 95% confidence interval [CI], 0.91–0.96), followed by irbesartan (0.90; 95% CI, 0.85–0.97), losartan (0.89; 95% CI, 0.85–0.95), olmesartan (0.89; 95% CI, 0.83–0.95), and candesartan (0.83; 95% CI, 0.73–0.95), with valsartan having the lowest average AUROC (0.79; 95% CI, 0.68–0.91). The results indicate distinct variations in drug performance across different hospitals. For example, candesartan had the highest AUROC at SH (0.96; 95% CI, 0.95–0.98) but the lowest at SNUH (0.61; 95% CI, 0.51–0.71). Irbesartan showed the highest performance at GSH (0.97; 95% CI, 0.92–1.00) but the lowest performance at KYUH (0.78; 95% CI, 0.66–0.89).

Figure 2

Receiver operating characteristic (ROC) curves of the drug-induced liver injury (DILI) prediction model for each hospital and each drug: (A) losartan, (B) candesartan, (C) telmisartan, (D) olmesartan, (E) irbesartan, and (F) valsartan. SH: Severance Hospital, GSH: Gangnam Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Cancer Hospital, NCC: National Cancer Center, AUC: area under the ROC curve.

To confirm the robustness of the DILI prediction model, additional performance metrics were calculated for the trained models. These metrics are shown in Table 2, with an overall average AUPRC of 0.76, an F1 score of 0.71, an accuracy of 0.85, and a precision of 0.79. Telmisartan at KYUH had the highest AUPRC value (0.95), followed by candesartan and olmesartan at SH (0.91). However, there were some poorly trained or overfitted models based on the F1-score, including candesartan at SNUH (0.17), valsartan at SNUH (0.12), and valsartan at NCC (0.23).
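The relationship between these metrics can be illustrated with a small, self-contained sketch. The AUROC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control; precision, F1-score, and accuracy are computed at a fixed threshold. The labels and scores are invented for illustration, the 0.5 threshold is an assumption, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Precision, recall, F1 and accuracy after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(labels)}


# Eight synthetic patients at the 1:3 case-to-control ratio.
labels = [1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.1, 0.45, 0.3, 0.6, 0.05]
area = auroc(labels, scores)
metrics = threshold_metrics(labels, scores)
print(round(area, 3), metrics)
```

At a 1:3 ratio, a model that labels every patient as non-DILI would already reach an accuracy of 0.75 with an F1-score of 0, which is why the F1-score is the more informative signal for the weaker models noted above.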

Table 2

Performance metrics of the DILI prediction model for each hospital and each drug

3. Aggregated Attention Scores of the DILI Prediction Model

In order to interpret the DILI prediction model, we demonstrated each contributor variable’s temporal attention values, which were weighted aggregations from the model for each institution and drug (Figure 3). The last week of hematocrit (0.36) showed the highest attention scores, followed by albumin (0.34), hypertensive disorder (0.33), prothrombin time (0.32), lymphocytes (0.32), and cholesterol (0.3). These variables displayed an increasing trend in their attention scores. In addition, the temporal pattern was verified by visualizing the distribution of the actual data of the matching variables. The attention scores for all variables across all hospitals are presented in Supplementary Table S2.

Figure 3

Temporal attention score of important features of the drug-induced liver injury (DILI) prediction model (A) and the distribution of actual data (B).

IV. Discussion

In this study, we developed and validated a DILI prediction model using IMV-LSTM, which provides time-based explanations, with CDM-based data from six hospitals in a multicenter study without data transfer. We confirmed the association between ARBs and DILI, consistent with previous literature reporting an incidence rate of less than 2%. We also observed subtle differences in the occurrence rates among different ARB drugs. The time-series-based learning model achieved a high overall average AUROC of 0.87, indicating strong predictive performance. A comprehensive interpretation of the trained models highlighted the impact of indicators such as hematocrit, albumin, hypertensive disorder, prothrombin time, and lymphocytes, whose attention scores increased from 4 weeks to 1 week prior to the occurrence of DILI. However, considering the influence of other biases, further examination is necessary. This study holds significance as it adopted the protocol used in the FDA Sentinel for clinical post-marketing surveillance purposes and adapted it to the DRN setting, which is operated by national agencies [21]. This aligns the study with established protocols and enhances its applicability for real-world monitoring of drug safety.

A multicenter study requires a standardized process and terminological system. We used the most common and major DILI ADR terminology set and protocol, which can be a cornerstone for further research. Moreover, we shared the defined SQL query, specification documentation, and ATLAS definitions on GitHub. Finally, we have distributed open-source packages for the public to contribute to DILI research.

There have been few multicenter-based studies on this issue. To the best of our knowledge, this is the first multicenter and national-level study using a CDM for DILI prediction, which is important for reliability, providing more significant results with big data and protecting patients’ private information from leaking. In particular, we adopted a time-related attention mechanism to reveal the importance of variables at each time point. Explainability is one of the essential components for clinical implementation, and our study results can provide patient-level explanations, which is a strong point for future applications.

Nonetheless, there are some limitations of our study. Firstly, the overall incidence of DILI was relatively low, being less than 2%. Despite this limitation, we employed PSM and utilized multicenter data to enhance the robustness and validity of our analysis. Secondly, the CDM had certain limitations in terms of the coverage and granularity of specific DILI-related features. As a result, we may not have been able to consider a wide range of variables that could potentially contribute to the prediction of DILI. Despite these limitations, our study provides valuable insights into the prediction of DILI in patients taking ARBs by leveraging multicenter data and utilizing a comprehensive time-series deep learning approach. Future research could advance our understanding of DILI and ability to predict DILI by implementing federated learning and utilizing multi-institutional DRNs at the national level.

Notes

Conflict of Interest

Rae Woong Park is an editorial member of Healthcare Informatics Research; however, he was not involved in the peer reviewer selection, evaluation, or decision process of this article. Otherwise, no potential conflict of interest relevant to this article was reported.

Acknowledgments

This study was supported by the Bio-Industrial Technology Development Program (No. 20014841), funded by the Ministry of Trade, Industry & Energy (MOTIE, South Korea). We would like to thank the MCU (Medical informatics Collaborative Unit) members of Yonsei University College of Medicine for their assistance in data analysis.

Supplementary Materials

Supplementary materials can be found via https://doi.org/10.4258/hir.2023.29.3.246

hir-2023-29-3-246-Supplementary-Fig-S1.pdf

hir-2023-29-3-246-Supplementary-Table-S1.pdf

hir-2023-29-3-246-Supplementary-Table-S2.xlsx

References

1. Liu Y, Aickelin U. Feature selection in detection of adverse drug reactions from the Health Improvement Network (THIN) database. arXiv [Preprint] 2014. Sep. 2. https://doi.org/10.48550/arXiv.1409.0775.

2. Montastruc JL, Lafaurie M, de Canecaude C, Durrieu G, Sommet A, Montastruc F, et al. Fatal adverse drug reactions: a worldwide perspective in the World Health Organization pharmacovigilance database. Br J Clin Pharmacol 2021;87(11):4334–40. https://doi.org/10.1111/bcp.14851.

3. Sonawane KB, Cheng N, Hansen RA. Serious adverse drug events reported to the FDA: analysis of the FDA adverse event reporting system 2006–2014 database. J Manag Care Spec Pharm 2018;24(7):682–90. https://doi.org/10.18553/jmcp.2018.24.7.682.

4. Jaganathan K, Tayara H, Chong KT. Prediction of drug-induced liver toxicity using SVM and optimal descriptor sets. Int J Mol Sci 2021;22(15):8073. https://doi.org/10.3390/ijms22158073.

5. Liu A, Walter M, Wright P, Bartosik A, Dolciami D, Elbasir A, et al. Prediction and mechanistic analysis of drug-induced liver injury (DILI) based on chemical structure. Biol Direct 2021;16(1):6. https://doi.org/10.1186/s13062-020-00285-0.

6. Chalasani N, Bjornsson E. Risk factors for idiosyncratic drug-induced liver injury. Gastroenterology 2010;138(7):2246–59. https://doi.org/10.1053/j.gastro.2010.04.001.

7. Chen Z, Jiang Y, Zhang X, Zheng R, Qiu R, Sun Y, et al. The prediction approach of drug-induced liver injury: response to the issues of reproducible science of artificial intelligence in real-world applications. Brief Bioinform 2022;23(4):bbac196. https://doi.org/10.1093/bib/bbac196.

8. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc 2014;21(4):578–82. https://doi.org/10.1136/amiajnl-2014-002747.

9. Huser V, Amos L. Analyzing real-world use of research common data elements. AMIA Annu Symp Proc 2018;2018:602–8.

10. Voss EA, Makadia R, Matcho A, Ma Q, Knoll C, Schuemie M, et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J Am Med Inform Assoc 2015;22(3):553–64. https://doi.org/10.1093/jamia/ocu023.

11. Ryu B, Yoo S, Kim S, Choi J. Development of prediction models for unplanned hospital readmission within 30 days based on common data model: a feasibility study. Methods Inf Med 2021;60(S 02):e65–75. https://doi.org/10.1055/s-0041-1735166.

12. Ball R, Robb M, Anderson SA, Dal Pan G. The FDA’s sentinel initiative: a comprehensive approach to medical product surveillance. Clin Pharmacol Ther 2016;99(3):265–8. https://doi.org/10.1002/cpt.320.

13. The Council for International Organizations of Medical Sciences (CIOMS). Drug-induced liver injury (DILI): current status and future directions for drug development and the post-market setting. Geneva, Switzerland: CIOMS; 2020. https://doi.org/10.56759/ojsg8296.

14. Goldstein A, Kapelner A, Bleich J, Pitkin E. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. J Comput Graph Stat 2015;24(1):44–65. https://doi.org/10.1080/10618600.2014.907095.

15. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017;30:4765–74.

16. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017 Oct 22–29; Venice, Italy. p. 618–26. https://doi.org/10.1109/ICCV.2017.74.

17. Guo T, Lin T, Antulov-Fantulin N. Exploring interpretable LSTM neural networks over multi-variable data. In: Proceedings of the 36th International Conference on Machine Learning (ICML); 2019 Jun 9–15; Long Beach, CA. p. 2494–504.

18. Barreras A, Gurk-Turner C. Angiotensin II receptor blockers. Proc (Bayl Univ Med Cent) 2003;16(1):123–6. https://doi.org/10.1080/08998280.2003.11927893.

19. Hill RD, Vaidya PN. Angiotensin II receptor blockers (ARB). Treasure Island (FL): StatPearls Publishing; 2019.

20. DigitalHealthcareLab. MOACDM [Internet]. Seoul, Korea: DigitalHealthcareLab; 2023. [cited at 2023 Jul 27]. Available from: https://github.com/DigitalHealthcareLab/22MOACDM.

21. Wang SV, Gagne JJ, Maro JC, Eworuke E, Kattinakere S, Kulldorff M, et al. Development and evaluation of a global propensity score for data mining with tree-based scan statistics (Sentinel Methods Protocol). Toledo (OH): The Sentinel System; 2018.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1

Cohort population before and after propensity score matching

Propensity score matching	Drug																		Total

	Valsartan			Losartan			Candesartan			Telmisartan			Olmesartan			Irbesartan

	DILI	Non-DILI	%	DILI	Non-DILI	%	DILI	Non-DILI	%	DILI	Non-DILI	%	DILI	Non-DILI	%	DILI	Non-DILI	%	DILI	Non-DILI	%

Before
SH (n = 110,309)	463	27,055	1.71	350	22,207	1.58	379	23,707	1.6	216	16,350	1.32	131	11,713	1.12	113	7,625	1.48	1,652	108,657	1.52
GSH (n = 34,420)	124	9,181	1.35	96	7,262	1.32	83	6,734	1.23	51	4,536	1.12	29	3,692	0.79	27	2,605	1.04	410	34,010	1.21
KYUH (n = 28,615)	37	3,870	0.96	71	6,069	1.17	52	4,515	1.15	34	3,173	1.07	75	6,377	1.18	37	4,305	0.86	306	28,309	1.08
AJUH (n = 49,451)	116	8,090	1.43	64	4,411	1.45	133	8,396	1.58	182	15,375	1.18	77	6,899	1.12	81	5,627	1.44	653	48,798	1.34
SNUH (n = 109,112)	110	20,366	0.54	216	31,775	0.68	57	16,155	0.35	81	18,402	0.44	28	12,427	0.23	50	9,445	0.53	542	108,570	0.5
NCC (n = 4,773)	36	814	4.42	163	2,222	7.34	24	475	5.05	23	572	4.02	14	317	4.42	10	103	9.71	270	4,503	6
Total (n = 336,680)	886	69,376	1.28	960	73,946	1.30	728	59,982	1.21	587	58,408	1	354	41,425	0.85	318	29,710	1.07	3,833	332,847	1.15

After
SH (n = 6,608)	463	1,389	33	350	1,050	33	379	1,137	33	216	648	33	131	393	33	113	339	33	1,652	4,956	33
GSH (n = 1,640)	124	372	33	96	288	33	83	249	33	51	153	33	29	87	33	27	81	33	410	1,230	33
KYUH (n = 1,224)	37	111	33	71	213	33	52	156	33	34	102	33	75	225	33	37	111	33	306	918	33
AJUH (n = 2,612)	116	348	33	64	192	33	133	399	33	182	546	33	77	231	33	81	243	33	653	1,959	33
SNUH (n = 2,168)	110	330	33	216	648	33	57	171	33	81	243	33	28	84	33	50	150	33	542	1,626	33
NCC (n = 984)	36	108	33	163	489	33	24	72	33	23	69	33	-	-	*	-	-	*	246	738	33
Total (n = 15,236)	886	2,658	33	960	2,880	33	728	2,184	33	587	1,761	33	340	1,020	33	308	924	33	3,809	11,427	33

DILI: drug-induced liver injury, SH: Severance Hospital, GSH: Gangnam Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Hospital, NCC: National Cancer Center.

*Excluded if there were fewer than 20 cases in the case group (NCC: olmesartan, irbesartan).

Table 2

Performance metrics of the DILI prediction model for each hospital and each drug

Drug	Hospital	AUROC	AUPRC	F1-score	Accuracy	Precision
Candesartan	GSH	0.95	0.87	0.84	0.91	0.89
	SH	0.96	0.91	0.88	0.93	0.93
	KYUH	0.84	0.84	0.77	0.84	0.87
	AJUH	0.94	0.85	0.81	0.90	0.88
	SNUH	0.61	0.41	0.17	0.70	0.46
	NCC	0.70	0.75	0.55	0.81	0.92
	All hospitals	0.83	0.77	0.67	0.85	0.83

Irbesartan	GSH	0.97	0.72	0.96	0.97	0.96
	SH	0.94	0.88	0.86	0.90	0.84
	KYUH	0.78	0.69	0.56	0.64	0.65
	AJUH	0.93	0.79	0.75	0.87	0.80
	SNUH	0.90	0.86	0.80	0.87	0.88
	NCC	-	-	-	-	-
	All hospitals	0.90	0.79	0.79	0.85	0.83

Losartan	GSH	0.91	0.46	0.83	0.89	0.86
	SH	0.95	0.87	0.84	0.90	0.83
	KYUH	0.91	0.82	0.79	0.87	0.81
	AJUH	0.90	0.77	0.73	0.87	0.78
	SNUH	0.92	0.66	0.82	0.88	0.85
	NCC	0.76	0.57	0.50	0.76	0.54
	All hospitals	0.89	0.69	0.75	0.86	0.78

Olmesartan	GSH	0.90	0.79	0.75	0.82	0.75
	SH	0.97	0.91	0.90	0.93	0.90
	KYUH	0.88	0.75	0.70	0.83	0.76
	AJUH	0.91	0.77	0.72	0.88	0.82
	SNUH	0.76	0.70	0.59	0.74	0.72
	NCC	-	-	-	-	-
	All hospitals	0.88	0.78	0.73	0.84	0.79

Telmisartan	GSH	0.90	0.73	0.67	0.78	0.71
	SH	0.95	0.89	0.87	0.92	0.87
	KYUH	0.96	0.95	0.93	0.95	0.96
	AJUH	0.95	0.83	0.81	0.90	0.81
	SNUH	0.89	0.80	0.76	0.82	0.75
	NCC	-	-	-	-	-
	All hospitals	0.93	0.84	0.81	0.87	0.82

Valsartan	GSH	0.90	0.85	0.81	0.87	0.84
	SH	0.94	0.86	0.81	0.89	0.87
	KYUH	0.70	0.68	0.59	0.76	0.68
	AJUH	0.95	0.88	0.86	0.92	0.86
	SNUH	0.59	0.46	0.12	0.68	0.56
	NCC	0.64	0.43	0.23	0.75	0.50
	All hospitals	0.82	0.75	0.64	0.82	0.76

All drugs	All hospitals	0.87	0.76	0.71	0.85	0.79

DILI: drug-induced liver injury, GSH: Gangnam Severance Hospital, SH: Severance Hospital, KYUH: Konyang University Hospital, AJUH: Ajou University Hospital, SNUH: Seoul National University Hospital, NCC: National Cancer Center, AUROC: area under the receiver operating characteristic curve, AUPRC: area under the precision-recall curve.