### I. Introduction

*in-silico* computational DDI estimation approaches and to estimate potential DDIs. Commonly known similarity-based computational DDI estimation methods were used to discover new potential DDIs. The drug interaction profile was found to be a better predictor of DDIs than drug side effects and protein similarities between DDI pairs [4].

*t*-test was applied using the SPSS program to analyze all output results. Subsequently, mostly tree-based ML algorithms were applied in the Python programming language to predict the dexketoprofen outputs. Seven ML methods were compared to find the best one for estimating the optimal dexketoprofen pharmaceutical dosage formulation. The predicted system output was then tested by specialists in the laboratory, and the results were evaluated.

### II. Methods

### 1. Dataset Preparation

### 2. Statistical Analysis and ML Models used in the Dexketoprofen Dataset

*t*-test was used when the data comprised two independent groups with normal distributions [7,8]. Otherwise, the Mann-Whitney U test was used [9]. Moreover, if the two Eudragit coating amounts (15.16% and 17.34%) are found to differ from each other, that finding will support decision-making in the subsequent steps.
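The test-selection rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the study ran its statistics in SPSS, the Shapiro-Wilk test is an assumed choice of normality check (the text does not name one), and the function name `compare_two_groups` and the sample values are hypothetical.

```python
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Choose and run a two-sample test for two independent groups.

    Returns the name of the test that was used and its p-value.
    """
    # Normality check on each group (Shapiro-Wilk is an assumed choice).
    normal = (stats.shapiro(a).pvalue > alpha
              and stats.shapiro(b).pvalue > alpha)
    if normal:
        # Levene's test decides whether equal variances can be assumed.
        equal_var = stats.levene(a, b).pvalue > alpha
        result = stats.ttest_ind(a, b, equal_var=equal_var)
        return "t-test", result.pvalue
    # Non-normal data: fall back to the rank-based Mann-Whitney U test.
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", result.pvalue


# Hypothetical measurements for the two coating groups (not study data).
group_1516 = [61.2, 58.9, 63.4, 60.1, 59.7, 62.3, 60.8, 61.9]
group_1734 = [60.4, 62.1, 59.3, 61.7, 63.0, 60.2, 58.8, 61.1]
test_name, p_value = compare_two_groups(group_1516, group_1734)
print(test_name, round(p_value, 3))
```

A *p*-value above `alpha` would be read as no statistically significant difference between the two coating amounts at that significance level.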

#### 1) Pre-processing for the dexketoprofen dataset

*t*-test and Levene test [19] were implemented for hardness values. The Mann-Whitney U test was implemented for friability and disintegration time values.

#### 2) ML models for the dexketoprofen dataset

#### 3) Evaluation criteria

*R*² (coefficient of determination) and root mean square error (RMSE). The model with the best *R*² and RMSE for each output was selected and saved for final predictions. The obtained results were filtered according to the criteria determined by the Food and Drug Administration (FDA) [20] and the International Council for Harmonisation (ICH) Q6 series [21]. These criteria were friability <1%, disintegration <30 seconds, and a dissolution rate of 100%. The inputs determined after the filtering process were transferred to experts for testing.
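As a rough illustration of the two evaluation metrics and the filtering step, the sketch below computes the coefficient of determination and RMSE from their standard definitions and then keeps only the candidate formulations that satisfy the three stated criteria. The dictionary keys, the candidate values, and the encoding of "a dissolution rate of 100%" as `>= 100.0` are assumptions made for this example, not details taken from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def meets_criteria(pred):
    """Acceptance criteria quoted in the text (key names are hypothetical)."""
    return (pred["friability_pct"] < 1.0
            and pred["disintegration_s"] < 30.0
            and pred["dissolution_pct"] >= 100.0)


# Hypothetical predicted outputs for three candidate formulations.
candidates = [
    {"id": "A", "friability_pct": 0.4, "disintegration_s": 25.0, "dissolution_pct": 100.0},
    {"id": "B", "friability_pct": 1.2, "disintegration_s": 20.0, "dissolution_pct": 100.0},
    {"id": "C", "friability_pct": 0.3, "disintegration_s": 45.0, "dissolution_pct": 98.0},
]
accepted = [c["id"] for c in candidates if meets_criteria(c)]
print(accepted)
```

Only candidates that pass all three checks would be forwarded for laboratory testing.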

### III. Results

### 1. Statistical Analysis of the Dexketoprofen Dataset

#### 1) Hardness values between groups

*t*-test was implemented with the Levene test, which yielded a *p*-value of 0.435. Since this value was greater than 0.05, equal variances between the groups were assumed. The *p*-value obtained using the *t*-test (0.504) was substantially greater than 0.05. Therefore, hardness showed no statistically significant difference between the tablet coating groups at a 5% significance level.

#### 2) Friability values between groups

*p*-value (0.640) that substantially exceeded the threshold of 0.05. Thus, friability had no statistically significant difference between tablet coating groups at a 5% significance level.

#### 3) Disintegration time values between groups

*p*-value (0.993) that was far higher than 0.05. Therefore, disintegration time showed no statistically significant difference between the tablet coating groups at a 5% significance level.

### 2. Machine Learning Models Based on the Dexketoprofen Dataset

*R*²) of 99% and an RMSE of 2.88. For friability, the model’s explanatory power was 92% (*R*²) and the RMSE was 0.02. For disintegration time, the model’s explanatory power (*R*²) was 97% and the RMSE was 10.09. The explanatory power for dissolution varied based on the time range; as shown in Table 2, the RMSE values were distributed between 1.89 and 5.92 and the *R*² values ranged from 0.65 to 0.94. All ML model results of the outputs are shown in Table 3 in detail.

*scikit-learn* library by choosing the model with the best predictive success for the dependent variable. The key step is to find the model with the lowest RMSE and then determine input importance from that model's properties. Graphical interpretations of feature importance for the hardness, friability, and disintegration time outputs are given in Figures 1–3.
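The selection procedure described here (pick the model with the lowest RMSE, then read its feature importances) might look roughly like the sketch below. The three regressors are stand-ins, since the seven compared methods are not listed in this section, and the data and the feature names `input_a`, `input_b`, and `input_c` are synthetic placeholders rather than the dexketoprofen inputs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the first input dominates the output by construction.
rng = np.random.default_rng(0)
feature_names = ["input_a", "input_b", "input_c"]  # hypothetical names
X = rng.uniform(0, 1, size=(200, 3))
y = 50 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed set of tree-based candidates.
models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each candidate and record its RMSE on the held-out split.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    rmse_by_model[name] = float(np.sqrt(mse))

# Keep the model with the lowest RMSE and read its feature importances.
best_name = min(rmse_by_model, key=rmse_by_model.get)
importances = dict(zip(feature_names, models[best_name].feature_importances_))
print(best_name, importances)
```

For these tree-based estimators, `feature_importances_` is normalized to sum to 1, so the values can be plotted directly as relative contributions.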

*t*-test to determine whether there were statistically significant differences between them. The *t*-test *p*-value was 0.548, which exceeded 0.05, meaning that there was no significant difference between the actual values and predicted values.

### IV. Discussion

*t*-test were successfully implemented beforehand in the pre-evaluation period. The proposed approach eliminated the need for many trials and reduced consumption of the limited supply of active ingredients, which has significant implications for cost and time. However, one limitation of the study is the difficulty of finding the targeted global optimal values with limited data. Another principal difficulty is that the new dataset produced for forecasting was too large. Developing a new method in this area could provide faster results. This line of research can be extended by developing new models and statistical analyses so that new medicine formulations can be designed to meet specific requirements.