I. Introduction
Infant survival, physical and mental growth, and maternal health status and history are associated with a vital health indicator: birth weight [1]. The World Health Organization classifies low birth weight (LBW) as a birth weight below 2,500 g in a live-born infant. Annually, more than 20 million infants (15%–20%) are born with LBW worldwide. An objective of the World Health Organization is to reduce the number of LBW infants by 30% by 2025 [2]. Rates of LBW have been reported at approximately 7%, 16.5%, and 18.6% of births in developed countries, less developed or developing countries, and least developed countries, respectively [3]. In Iran, an LBW prevalence of 8.5% was reported in a systematic review in 2020. According to that study, the highest percentage of LBW was found in Hamadan at 19.1% in 2007 [4]. A major public health problem, LBW has a variety of short- and long-term consequences. Infants with LBW are about 20 times more likely to die than heavier infants [5]. In addition to fetal and neonatal mortality, LBW is associated with higher risk of several adverse outcomes, including intellectual disability, impaired cognitive development, and future chronic health problems such as diabetes and cardiovascular diseases [6]. Specific maternal characteristics before and during pregnancy may provide a basis for predicting LBW. Although many researchers have endeavored to identify factors contributing to LBW, the causes differ by region, with poor fetal growth (due to poor maternal nutrition before and during pregnancy) a major cause in less developed regions and prematurity (due to high maternal age, multiparity, cesarean section, and smoking) in more developed ones [7]. LBW is also influenced by many other factors discussed in previous studies, such as maternal educational level, residence (urban or rural), family income, maternal occupation and health status, birth order, miscarriage, interpregnancy interval, and multiple pregnancies.
Recently, machine learning (ML), an important branch of artificial intelligence, has been widely used in many fields. In particular, breakthroughs have been made with ML methods in medical diagnosis and outcome prediction [8,9]. These methods are generally categorized into supervised or unsupervised learning. In a supervised ML method, a model is first trained on a variety of features related to a known outcome. The model can then make outcome predictions based on new data. When studying a discrete outcome (such as normal birth weight [NBW] versus LBW), the fitted model is termed a classification algorithm. Various ML methods have been proposed to improve the precision of data classification. Unlike traditional parametric statistical methods, ML techniques require no distributional assumptions about the dataset and are well suited for large datasets [10]. Nevertheless, each proposed ML method has specific features for outcome classification and estimation, and its performance may vary across conditions and datasets. Therefore, the purpose of this study was to determine the best technique for LBW prediction by comparing the predictive performance of five popular supervised ML methods (decision tree [DT], random forest [RF], artificial neural network [ANN], support vector machine [SVM], and logistic regression [LR]) using a dataset of neonates born at Fatemieh Hospital in Hamadan, Iran. Moreover, we aimed to determine the most important factors associated with LBW.
II. Methods
1. Data Collection
In this retrospective cross-sectional study, we selected a random sample of 800 infants born at Fatemieh Hospital in the city of Hamadan. For data collection, we used a researcher-designed questionnaire based on case records available on the Iranian Maternal and Neonatal Network (IMaN Net) in 2017. The IMaN Net was designed by Iran’s Ministry of Health and Medical Education to evaluate the maternal and neonatal health status in Iran. After collecting the data, we excluded multiple pregnancies, stillbirths, and infants who died for any reason before discharge from the hospital or who had at least one abnormality. After this exclusion, 741 infants were included in the study, and the associated information was extracted as follows: (1) Maternal data included place of residence (urban or rural); maternity insurance (insured or uninsured); delivery type (cesarean section or vaginal); maternal age at delivery (< 18, 18–35, or > 35 years); gestational age in weeks; preterm delivery (yes [< 37 weeks] or no [≥ 37 weeks of gestation]); consanguinity (yes or no); pregnancy risk factors such as chronic blood pressure, hepatitis, thyroid disease, cardiovascular disease, and preeclampsia/eclampsia (yes or no); gravida; parity (i.e., number of previous live and non-live births); number of abortions; and number of previous live births. (2) Neonatal data included sex (male or female) and birth weight in grams.
Infants were classified using a binary outcome (1 for LBW and 0 for NBW), with a birth weight of 2,500 g as a threshold.
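The outcome coding can be sketched in a few lines. This is an illustration only: the function name and the example weights are hypothetical, and the strict inequality follows the World Health Organization definition of LBW as a weight below 2,500 g.

```python
LBW_THRESHOLD_G = 2500  # threshold stated in the text


def label_lbw(birth_weight_g: float) -> int:
    """Return 1 for LBW (weight below 2,500 g) and 0 for NBW."""
    return 1 if birth_weight_g < LBW_THRESHOLD_G else 0


# Hypothetical birth weights in grams, not taken from the study data
weights = [1980, 2499, 2500, 3420]
labels = [label_lbw(w) for w in weights]
```

Under this convention, an infant weighing exactly 2,500 g is coded as NBW.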
2. Data Preprocessing and Missing Values
Before the analysis, the data were evaluated to ensure that no outliers were present. However, some missing values were found for seven variables, ranging from 0.13% to 5% of the dataset. The mean and median were used to impute quantitative and qualitative variables, respectively.
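A minimal sketch of this imputation rule is given below, assuming that qualitative variables are stored as numeric codes. The helper name, the use of `None` for missing entries, and the example values are hypothetical. The lower median (`median_low`) is used here as an assumption so that the imputed value is always an existing category code; the study does not state which median variant was applied.

```python
from statistics import mean, median_low


def impute(values, how):
    """Fill None entries with the mean or the (lower) median of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if how == "mean" else median_low(observed)
    return [fill if v is None else v for v in values]


# Hypothetical columns: a quantitative variable and a coded qualitative one
gestational_age = impute([39, None, 37, 41], "mean")
residence = impute([0, 1, None, 1, 1], "median")  # 0 = rural, 1 = urban (assumed coding)
```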
3. Machine Learning Classifiers
In classification, class imbalance and a bias toward the majority class may lead to misclassification. Thus, the data were first evaluated for imbalance, and the Synthetic Minority Oversampling TEchnique (SMOTE) was then used for the ML methods as an efficient algorithm for data balancing [11]. In this technique, the minority class is oversampled by creating synthesized samples according to the similarities between pairs of the existing minority instances.
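The core idea of SMOTE can be illustrated with a short sketch: each synthetic sample is placed at a random position on the line segment between a minority instance and one of its nearest minority neighbors. This is a simplified illustration for continuous features only and is not the implementation in the themis package; the function name, the default `k`, and the example points are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class instances: (gestational age in weeks, gravida)
lbw_points = [(37.0, 2.0), (38.0, 1.0), (36.0, 3.0), (39.0, 0.0)]
new_points = smote(lbw_points, n_new=6, k=2)
```

Because each synthetic point is a convex combination of two existing instances, it always lies within the range of the observed minority data.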
1) Decision Tree (DT)
A simple ML technique, DT learning generates a tree-like structure by repeatedly splitting the dataset based on a criterion that maximizes the separation of the data [12]. This technique is first executed at distant parts of the tree and then returns to its beginning according to a method termed retrograde return [13]. DTs play an important role in medical diagnosis. We used recursive partitioning and regression trees with the tuning parameter of “cp” (the complexity parameter).
2) Random forest (RF)
The RF algorithm is based on an ensemble of a large number of decision trees, which are decorrelated through the random selection of predictors at each split, and combines the decisions of individual trees to produce accurate, stable results [14]. We used Breiman and Cutler’s RF method with the tuning parameter of “mtry” (the number of randomly selected predictors at each split).
3) Artificial neural network (ANN)
An ANN is a mathematical model designed to simulate the structure and function of biological neural networks in the brain [15]. Each ANN consists of a set of specially arranged neurons acting in coordination to solve a problem. This method is among the best for medical assessment and diagnosis due to its minimal error and maximal confidence. We used a single-hidden-layer neural network with tuning parameters of “size” (hidden units) and “decay” (weight decay).
4) Support vector machine (SVM)
SVMs use a decision boundary termed the hyperplane to separate classes. The hyperplane is located at a maximum distance from the closest data points of each class. These points are known as support vectors [16]. Two uses of SVMs in clinical medical research are prediction models for disease diagnosis and prognosis based on a specific diagnosis. We used an SVM with a polynomial kernel and tuning parameters of “degree” (polynomial degree), “scale” (scale), and “C” (cost).
5) Logistic regression (LR)
LR, a special case of generalized linear modeling, is extensively used for binary outcomes in epidemiology and medicine. By fitting data to a logistic function, the probability of an occurrence may be predicted [17]. We used binary LR with the enter method.
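The logistic function that maps a linear predictor to a probability can be written as a brief sketch. The intercept, coefficients, and predictor values below are hypothetical and are not estimates from this study.

```python
import math


def predict_probability(intercept, coefs, x):
    """Logistic model: P(outcome = 1) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model with two predictors: preterm delivery (0/1) and
# number of previous abortions
p_lbw = predict_probability(-2.0, [1.5, 0.4], [1, 2])
```

A predicted probability is converted to a class label (LBW or NBW) by comparing it with a cutoff; 0.5 is a common default, although the cutoff used in the study is not stated.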
4. Evaluation Criteria
To compare the performance of the utilized classifiers, we divided the data into training (70% of the data) and test (30%) sets and repeated this process 10 times. Then, the performance of the trained models was evaluated using the test set based on criteria of sensitivity (or recall), specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and accuracy as follows:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
PLR = Sensitivity / (1 − Specificity)
NLR = (1 − Sensitivity) / Specificity
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Here, a true positive (TP) indicates LBW neonates that were correctly identified as LBW, a false positive (FP) indicates NBW neonates that were incorrectly identified as LBW, a true negative (TN) indicates NBW neonates that were correctly identified as NBW, and a false negative (FN) indicates LBW neonates that were incorrectly identified as NBW.
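Under the standard definitions of these criteria, all five can be computed from the four confusion-matrix counts. The sketch below uses hypothetical counts rather than results from this study, and the function name is illustrative.

```python
def evaluate(tp, fp, tn, fn):
    """Compute the five evaluation criteria from confusion-matrix counts.

    PLR is undefined when specificity equals 1, and NLR is undefined when
    specificity equals 0; this sketch does not guard against those cases.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": sensitivity / (1 - specificity),
        "nlr": (1 - sensitivity) / specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical test-set counts with LBW as the positive class
metrics = evaluate(tp=40, fp=50, tn=150, fn=10)
```

A PLR well above 1 and an NLR close to 0 both indicate that the classifier output substantially changes the odds of LBW.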
5. Hyperparameter Tuning
To find the optimum values of the hyperparameters for the methods (DT, RF, ANN, and SVM), we applied a 10-fold cross-validation strategy to the training set. The hyperparameter values that yielded the maximum area under the receiver operating characteristic curve were chosen. We repeated this process 10 times with different training and test partitions.
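The resampling scheme (10 repetitions of a 70/30 split, with 10-fold cross-validation inside each training set) can be sketched as index bookkeeping. The helper names are hypothetical, and the splits here are simple random partitions; whether the study stratified the partitions by outcome is not stated, so stratification is omitted.

```python
import random


def train_test_split(n, test_frac=0.30, seed=0):
    """Randomly partition indices 0..n-1 into training and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_folds(indices, k=10):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield fit, validation


n_infants = 741  # sample size reported in the text
for repeat in range(10):
    train, test = train_test_split(n_infants, seed=repeat)
    for fit, validation in k_folds(train):
        # Placeholder: train each candidate hyperparameter setting on `fit`
        # and score it on `validation`; the best setting is then refit on
        # `train` and evaluated once on `test`.
        pass
```

Keeping the test set outside the cross-validation loop means that the reported test performance is not influenced by the hyperparameter search.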
6. Variable Importance
In this study, depending on the ML model, different methods were used to compute variable importance on a numerical scale from 0 to 100. For DT, it has been stated that “an overall measure of variable importance is the sum of the goodness of split measures for each split for which it was the primary variable, plus goodness (adjusted agreement) for all splits in which it was a surrogate” [18]. For RF, the increase in the percentage of instances in which a case was out-of-bag and misclassified when the variable was permuted was considered to indicate variable importance. For ANN, the variable importance was computed based on the weights method [19]. For SVM, variable importance was calculated using the area under the receiver operating characteristic curve. For LR, the absolute value of the Wald statistic corresponding to the model was used to compute variable importance.
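One common way to place model-specific importance scores on a shared 0 to 100 scale is min-max rescaling, sketched below. That the study used exactly this rescaling is an assumption, and the raw scores and variable names are hypothetical.

```python
def scale_importance(raw):
    """Rescale raw importance scores so the smallest is 0 and the largest is 100."""
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all scores are equal
    return {name: 100.0 * (score - lo) / span for name, score in raw.items()}


# Hypothetical raw scores (for example, absolute Wald statistics from LR)
raw_scores = {"gestational_age": 8.0, "abortions": 3.0, "sex": 0.5}
scaled = scale_importance(raw_scores)
```

After rescaling, scores are comparable in rank within a model, but values from different models still reflect different underlying measures and should be compared with caution.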
7. Software
SPSS version 25 (IBM Corp., Armonk, NY, USA) was used to calculate descriptive statistics, after which R software (version 4.2.1; R Foundation for Statistical Computing, Vienna, Austria; packages: caret, themis, rpart, randomForest, nnet, and kernlab) was used to apply ML classifiers to the dataset and measure the evaluation criteria. p-values of less than 0.05 were considered to indicate significance for all statistical inferences.
8. Ethics Approval and Consent to Participate
The data were collected from the IMaN Net. Therefore, a waiver of informed consent was awarded for this study. All methods were carried out in accordance with relevant guidelines and regulations, and the study was approved by the Ethical Committee of the Hamadan University of Medical Sciences (No. IR.UMSHA.REC.1401.779).
IV. Discussion
Different classification approaches for LBW have been utilized in several studies. In the United Arab Emirates, Khan et al. [20] assessed the performance of several ML algorithms. Through 5-fold cross-validation, they showed that the RF approach was superior to alternatives in birth weight estimation with regard to mean absolute error (294.53 g). However, the best classification performance was achieved using LR with SMOTE regarding accuracy (90.24%), precision (87.6%), recall (90.2%), and F1-score (0.89). Maternal diabetes, hypertension, and gestational age were found to be vital factors in the classification of LBW. In a study by Zahirzada and Lavangnananda [21], five popular ML techniques (k-nearest neighbor, naive Bayes [NB], ANN, RF, and SVM) were applied to data obtained from the Afghanistan Demographic and Health Survey to determine the most effective strategy for predicting LBW. The data were divided into a training set (80% of the data) and a test set (20%). For both rural and urban areas, RF was the best method in terms of all four evaluation metrics (accuracy, area under the curve, precision, and recall). In a study by Borson et al. [22] in Bangladesh, six classification techniques (LR, NB, RF, SVM, k-nearest neighbor, and multilayer perceptron artificial neural network [MLP-ANN]) were used to predict infant LBW. According to 10-fold cross-validation, LR and SVM exhibited the greatest accuracy, at 80.3%. The highest precision (0.80) and F-measure (0.89) were obtained using SVM, while MLP-ANN was associated with the greatest recall (0.81). A train-test split (75:25) analysis also indicated that the MLP-ANN, SVM, and LR methods performed almost identically, yielding the highest accuracy (81.6%), precision (0.81), recall (0.81), and F-measure (0.89) compared to the other classifiers. Senthilkumar and Paulraj [23] employed a cross-validation technique to evaluate the performance of six data mining algorithms (LR, NB, RF, SVM, ANN, and classification tree) using data gathered at Baystate Medical Center in Springfield, MA, USA. They demonstrated that compared to other data mining methods, the classification tree method provided superior overall prediction accuracy (89.95%), specificity (72.88%), area under the curve (93.80%), F-value (93.04%), and precision (88.81%). The highest recall was provided by RF, with a value of 99.23%. Last maternal weight (in pounds) before pregnancy and maternal age were the two main factors associated with LBW.
Numerous studies have indicated that low gestational age is one of the most critical risk factors for LBW [3]. In addition, similar to the present study, other research has indicated that poor obstetric history (such as past abortion) is associated with LBW. In a study by Brown et al. [24], women with one, two, and three or more prior abortions were, respectively, 2.8, 4.6, and 9.5 times more likely to have LBW infants than those who had never had an abortion. Recently, Ghelichkhani et al. [25] assessed the maternal risk factors for preterm delivery (gestational age < 37 weeks) at Hamadan’s Fatemieh Hospital. Their study revealed a history of abortion to be one of the most critical factors associated with preterm delivery. A study conducted by Cogendez et al. [26] showed that early detection of congenital and acquired intrauterine causes of abortion is possible with post-abortion hysteroscopy. Therefore, we anticipate that timely diagnosis of the causes of abortion and proper intervention can play a vital role in reducing preterm delivery and LBW. In a study by Demelash et al. [6], the rates of LBW for primigravida, multigravida, and grand multigravida cases were 47.3%, 38.7%, and 14%, respectively. This finding aligned with our study. The other factor related to LBW in the present study was consanguinity. Poorolajal et al. [27] performed a meta-analysis to explore the effect of consanguinity on LBW. Their findings showed that consanguineous marriage can increase the risk of LBW. In many nations, such as in North America, it is forbidden or even illegal to marry close relatives; however, in other nations, especially those in Asia, the Middle East, and Africa, it may be preferred [28]. Lower maternal age at delivery has also been repeatedly found to be significantly related to adverse neonatal outcomes, specifically a higher prevalence of LBW. This can be explained by the fact that younger mothers are less likely to receive adequate prenatal care than older ones [29]. In the present study, female neonates had a higher risk of LBW than male infants. This may result from the greater lean body mass and lower body fat seen in male neonates relative to female infants or the Y chromosome’s influence on the weight of the male fetus [30].
The present study had some limitations. First, we had no data regarding several important maternal characteristics, such as prenatal care, nutritional status, body mass index, interpregnancy interval, and financial status. Second, we could not include some features, such as maternal education, in the analysis due to a high percentage of missing values. Third, because considering each pregnancy risk factor (such as chronic blood pressure and cardiovascular disease) separately led to extremely unbalanced factor distributions, we considered all pregnancy risk factors as a single combined factor. Fourth, this study may have been vulnerable to potential bias in the evaluation of performance criteria, as the data were obtained from a retrospective registry-based study.
The results of this study showed that LR outperformed the other ML classifiers. Using promising classifiers to identify key LBW-related factors can allow medical practitioners to take preventative steps to minimize LBW. Based on the results, facilitating timely diagnosis of causes of abortion, providing genetic counseling to consanguineous couples, and improving care before and during pregnancy (especially for young mothers) can play an important role in reducing LBW. Additionally, the results of this study could be used to design an online mobile application to predict LBW risk in pregnant women. This would assist healthcare practitioners in the timely detection of mothers at high risk of giving birth to LBW infants and help provide them with appropriate interventions. In addition, researchers should consider the factors noted in the limitations section in further studies.