### I. Introduction

*t*-test results showed that the best-performing classification method was SMO. The same approach was also adopted by Prabowo et al. [29]. That research also investigated the computational performance of the intelligence algorithm. It differed from the research of Nahar et al. [27] and Akrami et al. [28] in the randomization performed before the 10-fold cross-validation: the process was repeated 10 times, and the end result was the average of those 10 runs. A concept similar to that of Nahar et al. [27] was also adopted by Setiawan et al. [30], who compared the performance of feature selection methods in classifying 5 levels of coronary heart disease using the naive Bayes and J48 (C4.5) classification methods.

### II. Methods

### 1. Data and Data Processing

### 2. Synthetic Minority Over-sampling Technique

*k* = 5 (nearest neighbors), and the over-sampling rate was adjusted according to the amount of data at each level, using the healthy level as a reference. This means that each level was brought up to the size of the healthy level using SMOTE.
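The over-sampling step described above can be sketched as a minimal SMOTE routine: each synthetic sample is an interpolation between a minority-class point and one of its *k* = 5 nearest neighbors, repeated until the class reaches the size of the healthy (reference) level. The helper name, its `rng` parameter, and the interpolation details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def smote_oversample(X, n_new, k=5, rng=None):
    """Generate n_new synthetic samples for one minority class X by
    interpolating between each chosen sample and a random one of its
    k nearest neighbours (a minimal SMOTE sketch)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n = len(X)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    k = min(k, n - 1)
    nn = np.argsort(d, axis=1)[:, :k]    # indices of k nearest neighbours
    synth = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        j = rng.integers(n)              # pick a random minority sample
        nb = X[nn[j, rng.integers(k)]]   # pick one of its neighbours
        gap = rng.random()               # interpolation factor in [0, 1)
        synth[i] = X[j] + gap * (nb - X[j])
    return synth
```

In use, `n_new` would be set per level to the difference between the healthy-level count and that level's count, so that all five levels end up the same size.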

### 3. Model Intelligence System Based K-Star Classifier

(K\*). The K\* algorithm divides *n* data points into *k* groups, where each data point is assigned to the group whose members are, on average, closest to it. The K\* algorithm is an instance-based learner that uses entropy to measure distance [33]. The advantage of using entropy is that it provides a consistent approach to handling real-valued attributes, symbolic attributes, and missing values. The K\* algorithm is similar to the k-NN algorithm in that it measures the closeness of data, but it does so using entropy.
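The instance-based idea can be sketched as follows. Note this is a deliberately simplified stand-in: it lets every training instance vote with a distance-decayed weight (a Gaussian kernel), whereas the real K\* of Cleary and Trigg derives its distance from entropy over transformation probabilities; the function name and `beta` parameter are assumptions for illustration.

```python
import numpy as np

def kstar_like_predict(X_train, y_train, x, beta=1.0):
    """Instance-based prediction in the spirit of K*: every training
    instance votes for its class with a weight that decays with its
    distance to the query point x (simplified Gaussian kernel, not
    K*'s entropic transformation-probability distance)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    d2 = np.sum((X_train - np.asarray(x, dtype=float)) ** 2, axis=1)
    w = np.exp(-beta * d2)                     # distance-decayed vote weight
    classes = np.unique(y_train)
    # the class with the largest summed weight wins
    scores = [w[y_train == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```

Unlike plain k-NN, no hard neighbour cutoff is needed: all instances contribute, with far-away ones contributing negligibly.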

### 4. Performance Evaluation of the K-Star Intelligence System Based on Classification

*k* subsets, each containing data from every class. From the *k* subsets, one subset was taken for testing and the remaining *k* − 1 subsets were used for training. This was done alternately so that each subset was used once for testing. The value used in this study was *k* = 10, so the performance was the average result over 10 rounds of training and testing. Performance was measured in terms of sensitivity, specificity, PPV, NPV, AUC, and F-measure. An explanation of each performance parameter is given as follows:
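These parameters (apart from AUC, which requires ranked scores) can all be computed from the one-vs-rest confusion-matrix counts of each class; a minimal sketch, with an illustrative function name:

```python
def binary_metrics(tp, fp, tn, fn):
    """Performance parameters from confusion-matrix counts
    (per class, one-vs-rest in the multiclass case)."""
    sensitivity = tp / (tp + fn)        # recall / true-positive rate
    specificity = tn / (tn + fp)        # true-negative rate
    ppv = tp / (tp + fp)                # positive predictive value (precision)
    npv = tn / (tn + fn)                # negative predictive value
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return sensitivity, specificity, ppv, npv, f_measure
```

For the 10-fold setting, these would be computed on each test fold and averaged over the 10 rounds.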

### III. Results

K\* algorithm. In the first stage, the R-SCOR-RD was not used, while in the second stage it was. The test results produced without the R-SCOR-RD generated the confusion matrix shown in Table 2.
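A multiclass confusion matrix of the kind shown in Table 2 is built by tallying (actual level, predicted level) pairs; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Multiclass confusion matrix: rows = actual level,
    columns = predicted level."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```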

### IV. Discussion

K\* algorithm with the R-SCOR-RD treatment (the proposed system) and without it. Second, we discuss the comparison of the proposed system with those of previous studies that adopted binary and multiclass classification approaches. The first comparison concerns the system of diagnosis without conducting R-SCOR-RD before classification. Figure 4 shows that the differences in sensitivity, specificity, PPV, NPV, AUC, and F-measure were significant: a *t*-test of the statistical significance of the difference produced *p* = 0.00757 (*p* < 0.05), meaning that there were significant differences before and after using the R-SCOR-RD. The significant difference is explained by the fact that, under a data imbalance problem, machine learning yields good prediction accuracy for training classes with large numbers of members, while classes with few members have poor accuracy [31].
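The significance test above can be sketched as a paired *t*-test on per-fold scores with and without the resampling step. The per-fold F-measures below are purely hypothetical illustration values, not the paper's data; |t| is compared against the two-sided critical value for df = n − 1 (2.262 at α = 0.05, df = 9).

```python
import numpy as np

def paired_t(a, b):
    """Paired t-statistic for two matched sets of per-fold scores.
    Compare |t| against the two-sided critical value for df = n - 1
    (2.262 at alpha = 0.05 with df = 9)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = len(d)
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))

# hypothetical per-fold F-measures, for illustration only
with_rd    = [0.82, 0.79, 0.81, 0.83, 0.78, 0.80, 0.84, 0.79, 0.81, 0.80]
without_rd = [0.62, 0.60, 0.65, 0.63, 0.59, 0.61, 0.66, 0.60, 0.64, 0.62]
t = paired_t(with_rd, without_rd)
```

The pairing matters: each fold's two scores come from the same test subset, so the test is applied to the fold-wise differences rather than to two independent samples.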

*p* < 0.05.

*p*-value was less than 0.05.

K\* algorithm. The resulting performance showed an average sensitivity of 80.1%, specificity of 95%, PPV of 80.1%, NPV of 95%, AUC of 87.5%, and F-measure of 80.1%. This performance is better than that of the systems proposed in previous studies, many of which used a binary classification approach without considering the data imbalance problem.