ORIGINAL ARTICLE Open Access
Machine learning approach for predicting under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey
Fikrewold H. Bitew (1, *), Samuel H. Nyarko (1, 2), Lloyd Potter (1, 2) and Corey S. Sparks (1)
* Correspondence: fikre.wold@gmail.com
(1) Department of Demography, College for Health, Community & Policy, University of Texas at San Antonio, 501 W. Cesar Chavez Blvd, San Antonio, TX 78207, USA
Full list of author information is available at the end of the article
Abstract
There is a dearth of literature on the use of machine learning models to predict
important under-five mortality risks in Ethiopia. In this study, we showed spatial
variations of under-five mortality and used machine learning models to predict its
important sociodemographic determinants in Ethiopia. The study data were drawn
from the 2016 Ethiopian Demographic and Health Survey. We used three machine
learning models (random forests, logistic regression, and K-nearest neighbors)
as well as one traditional logistic regression model to predict under-five mortality
determinants. Measures of model accuracy and receiver operating characteristic
curves were used to evaluate the predictive power of each machine learning model.
The descriptive results show that there are considerable regional
variations in under-five mortality rates in Ethiopia. The under-five mortality prediction
ability was found to be between 46.3 and 67.2% for the models considered, with the
random forest model (67.2%) showing the best performance. The best predictive
model shows that household size, time to the source of water, breastfeeding status,
number of births in the preceding 5 years, sex of a child, birth intervals, antenatal
care, birth order, type of water source, and mother's body mass index play an
important role in under-five mortality levels in Ethiopia. The random forest machine
learning model produces a better predictive power for estimating under-five
mortality risk factors and may help to improve policy decision-making in this regard.
Childhood survival chances can be improved considerably by using these important
factors to inform relevant policies.
Keywords: Machine learning, Under-five mortality, Determinants, Ethiopia
Bitew et al. Genus (2020) 76:37
https://doi.org/10.1186/s41118-020-00106-2
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Introduction
Globally, an estimated 5.4 million children under the age of 5 are said to have died in 2017 alone (UNICEF, WHO, World Bank Group, and United Nations, 2018). Meanwhile, the global under-five mortality rate is said to have declined by 58%, from 93 deaths per 1000 live births in 1990 to 39 in 2017 (UNICEF, WHO, World Bank Group, and United Nations, 2018). Yet the under-five mortality rate in low-income countries was 69 deaths per 1000 live births in 2017, almost 14 times the rate in high-income countries (5 deaths per 1000 live births) (UNICEF, WHO, World Bank Group, and United Nations, 2018). It has been observed that more than half of these deaths are due to infectious diseases (such as pneumonia and diarrhea) that are preventable and treatable through simple, affordable interventions (World Health Organization, 2017).
Despite the considerable improvements over the past decades, sub-Saharan Africa remains the region with the highest level of under-five mortality in the world, with about half of the global under-five mortality burden (UNICEF, WHO, World Bank Group, and United Nations, 2018). Ethiopia appears to have the fifth-highest number of newborn deaths in the world, following India, Pakistan, Nigeria, and the Democratic Republic of Congo (UNICEF, 2017). It is estimated that about 472,000 children die in Ethiopia each year before their fifth birthday, which places Ethiopia sixth among the countries in the world in terms of absolute numbers of under-five deaths (Federal Ministry of Health, 2005). In Ethiopia, the under-five mortality rate has declined by two-thirds, from the 1990 figure of 204 per 1000 live births to 58 per 1000 live births in 2016, thus achieving the target for Millennium Development Goal 4 (MDG 4) (You, Hug, Ejdemyr, Idele, et al., 2015). Despite this achievement, the under-five mortality rate in Ethiopia remains higher than those of many low- and middle-income countries (LMIC).
Previous studies using traditional regression models have provided much evidence on the socioeconomic and demographic factors that are associated with under-five mortality in Ethiopia (Ayele & Zewotir, 2016; Ayele, Zewotir, & Mwambi, 2017; Bereka, Habtewold, & Nebi, 2017). In this study, we predict the important determinants of under-five mortality in Ethiopia using non-traditional regression models drawing on nationally representative data. Specifically, we employed machine learning techniques to predict under-five mortality risks in the study sample. The main aim of this study is to show the spatial distribution of under-five mortality and the potential of machine learning algorithms in predicting important sociodemographic factors underlying the spatial variations in under-five mortality. As such, we initially develop a spatial visualization of the under-five mortality rate by region in Ethiopia. This is to visually highlight the spatial disparities in under-five mortality in the country while predicting the most important factors underlying these disparities. This study informs and strengthens appropriate extant policies or intervention strategies aimed at reducing under-five mortality in the country. It also underscores the potential role of the machine learning approach in demographic research.
Methods
Data source
This study is based on data from the 2016 Ethiopian Demographic and Health Survey (EDHS), the most recent in the demographic and health survey series that is conducted every five years. The EDHS is a nationally representative household survey that collects data on a wide range of population, health, and nutrition indicators to improve maternal and child health in Ethiopia (Central Statistical Agency (CSA) [Ethiopia] and ICF International, 2016). The survey used a multi-stage stratified sampling technique based on the 2007 National Population and Housing Census of Ethiopia to select respondents from a total of 624 clusters (187 urban and 437 rural) (Central Statistical Agency (CSA) [Ethiopia] and ICF International, 2016). The unit of analysis is under-five children with a total sample size of 10,641 selected from 645 clusters across Ethiopia. These children's data were obtained from mothers' retrospective reports about their children who died before age 5 within the 5 years preceding the survey (2011 to 2016).
Study variables and measurements
In this study, the outcome variable, under-five mortality, was measured as a binary outcome. Thus, under-five mortality was measured as being alive (coded as 0) or dead (coded as 1) for all the models.
The predictors (features) used in this study include individual, household, community, and health service factors. The individual-level factors consisted of maternal and child characteristics. Maternal factors include mother's age at birth (< 20, > 20), education (no education, primary, secondary/higher), contraceptive use (yes/no), and mother's body mass index (BMI) (underweight, normal, overweight). Child factors include whether the child was wanted (wanted then, wanted later, not at all), sex of the child, birth order (1–2, 3/later), births in the last 5 years, and previous birth interval (< 2, 2–4, > 4 years), as well as whether the child was breastfed within 1 h of birth. The household factors used are the source of drinking water (improved/unimproved), time to the water source, toilet facility (improved/unimproved), household wealth index (low, middle, high), and household size. The community factors comprised residence type (urban/rural) and geographical region (Tigray, Afar, Amhara, Oromia, Somali, Benishangul-Gumuz, Southern Nations Nationalities and People Region (SNNPR), Gambella, Harari, Dire Dawa, and Addis Ababa). The health service factors include antenatal visits (0, 1–4, 5+ visits), place/mode of delivery services (facility with caesarean section (CS) services, facility without CS, home), and postnatal visits within 2 months after delivery (yes/no). The selection of these predictor variables was based on information from existing literature on the subject (Aheto, 2019; Bereka et al., 2017; Yaya, Bishwajit, Okonofua, & Uthman, 2018).
Analytic strategy
The R programming language (version 3.6.0) and the caret package (Kuhn, 2020) were used to perform the data processing and analysis. We first developed a spatial map of crude under-five mortality rates by region in Ethiopia to document the regional disparities in under-five mortality in the country. In this regard, we estimated the under-five mortality rates by region and then merged them with an Ethiopian regional shapefile before mapping them.
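The regional rate calculation described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the record layout (region, death indicator, survey weight), the function name `crude_u5mr_by_region`, and the toy values are all assumptions made for the example, and the merge with a regional shapefile is not reproduced.

```python
from collections import defaultdict

def crude_u5mr_by_region(records):
    """Return weighted under-five deaths per 1000 live births for each region.

    Each record is a (region, died, weight) tuple, where `died` is 1 if the
    child died before age 5 and 0 otherwise. This layout is hypothetical.
    """
    deaths = defaultdict(float)
    births = defaultdict(float)
    for region, died, weight in records:
        births[region] += weight          # weighted live births
        deaths[region] += weight * died   # weighted under-five deaths
    return {r: 1000.0 * deaths[r] / births[r] for r in births}

# Toy records for illustration only; these are not EDHS values.
toy_records = [
    ("Afar", 1, 1.0), ("Afar", 0, 1.0), ("Afar", 0, 2.0),
    ("Tigray", 0, 1.0), ("Tigray", 0, 1.0),
]
rates = crude_u5mr_by_region(toy_records)
```

The resulting dictionary maps each region name to a rate, which is the shape of table one would join to regional boundaries before mapping.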
We also used three widely used machine learning (ML) algorithms, namely logistic regression, random forest (RF), and K-nearest neighbors (KNN) models, to predict under-five mortality determinants in Ethiopia and compared the results of the best algorithm to the results of the traditional logistic regression model. These three models were selected for various reasons. Logistic regression is typically used to analyze binary data and is commonly used as an inferential tool in population health research, but it can also be used as a binary classification model. The KNN model was chosen based on its ability to detect linear and nonlinear boundaries between groups. K is the number of nearest neighbors, which is the core deciding factor in this classifier. The model relies on finding the best value of k so that the k closest observations are used to predict the value of a given observation. Thus, when k = 1, the new data object is simply assigned to the class of its nearest neighbor. The "nearness" of observations is most often measured using the Euclidean distance between observations, even though various other numerical measures exist (Ali, Neagu, & Trundle, 2019; Larose, 2015). The main concept behind KNN is to calculate the distances between the test and training data samples in order to identify each test sample's nearest neighbors. The RF model is commonly used in machine learning because it is highly flexible and often provides better predictive performance. The RF model repeatedly resamples the training data set, each time using a random set of predictor variables to produce decision trees. After many of these trees are formed, the forest is examined to see which variables consistently produce a better prediction. These groups of relatively uncorrelated models can produce ensemble predictions that are more accurate than any of the individual predictions. This is because the trees protect each other from their errors (as long as they do not all constantly err in the same direction).
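The KNN decision rule described above (Euclidean distance, then a majority vote among the k closest training observations) can be sketched in a few lines. This is a minimal illustration in Python, not the caret implementation used in the study; the function names and the two-feature toy data are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Assign `query` the majority class among its k nearest training points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy training data (two hypothetical features; 0 = alive, 1 = dead).
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = [0, 0, 1, 1, 1]

pred_k1 = knn_predict(train_x, train_y, (0.2, 0.1), k=1)  # nearest neighbor only
pred_k3 = knn_predict(train_x, train_y, (0.8, 0.9), k=3)  # vote among three
```

With k = 1 the query simply inherits the class of its single nearest neighbor, as stated in the text; larger k smooths the decision boundary at the cost of local detail.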
ML was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. It allows computers to learn from complex data sources and potentially find previously unseen insights without being explicitly programmed where to look (Elisa, 2018). It can also be used to automate tasks by building analytical models using algorithms that iteratively learn from data. In demographic parlance, ML appears to address some of the major challenges in demographic research by helping to draw insights from available datasets collected for different purposes at different points in time, which in most cases may be challenging to incorporate in the traditional techniques. It may also be used to predict future occurrences of the principal components of population change (fertility, mortality, and migration) and associated factors. As such, ML techniques can be used both to predict previously identified proximate correlates and new "significant" demographic variables and to shed light on how important previously used variables are in terms of prediction.
In this regard, we randomly sampled 80% of the total sample as training data, which was used with 10-fold cross-validation to tune the model parameters. The remaining 20% random sample was used as test data to estimate the measures of model performance. Because the outcome is unbalanced (there is a low fraction of under-five mortality in the data), the data were down-sampled so that the training set contained equal proportions of children who were alive after 5 years and children who had died before 5 years. Model accuracy metrics such as sensitivity, specificity, positive predictive value, and negative predictive value were calculated to show how well the models perform in terms of predicting the dead and alive cases. Sensitivity ("positivity in health") refers to the proportion of subjects who died (reference standard positive) that give positive test results. Specificity ("negativity in health") is the proportion of subjects who are alive (reference standard negative) that give negative test results. Positive predictive value is the proportion of positive results that are true positives (i.e., truly dead), whereas negative predictive value is the proportion of negative results that are true negatives (i.e., truly alive). Predictive values vary depending on the prevalence of the target condition in the population being studied, even if the sensitivity and specificity remain the same (Price & Christenson, 2007).
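The 80/20 split and the down-sampling of the majority class can be sketched as follows. This is a minimal illustration in Python, assuming that only the training portion is balanced and that the majority class is thinned at random; the function name, the seed, and the toy outcome vector are assumptions made for the example, and the study's own R/caret procedure may differ in detail.

```python
import random

def split_and_downsample(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx); the training classes are balanced.

    `labels` holds 0 (alive) or 1 (dead) for each observation. The test
    indices keep the original class imbalance.
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]

    dead = [i for i in train if labels[i] == 1]
    alive = [i for i in train if labels[i] == 0]
    minority, majority = (dead, alive) if len(dead) <= len(alive) else (alive, dead)

    # Keep every minority case and an equally sized random subset of the majority.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced, test

# Toy outcome with 5% deaths; not the EDHS data.
labels = [1] * 50 + [0] * 950
train_idx, test_idx = split_and_downsample(labels)
```

Leaving the test set untouched means that performance is measured at the original class prevalence, which matters for the predictive values discussed above.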
Metrics such as the area under the curve (AUC) and the receiver operating characteristic (ROC) curve were also used to evaluate model performance in distinguishing between the dead and alive cases. The ROC curves compare sensitivity versus specificity across a range of values to determine the ability to predict a dichotomous outcome. The AUC is a measure of the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve (Florkowski, 2008). Thus, the higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes (Florkowski, 2008).
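One way to make the AUC concrete is its rank interpretation: it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way in Python; the function name and the toy scores are assumptions made for the example, and this is not the routine used in the study.

```python
def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities of death; 1 = dead, 0 = alive.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = auc_from_scores(y_true, scores)
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of the two classes.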
The results of all the models were weighted using person weights provided by the data. For the traditional logistic regression model, we infer the importance and significance of predictors using odds ratios and confidence intervals derived from the model estimation, while for the ML models, the Mean Decrease in Gini was calculated for each variable, which is a measure of variable importance for these models. The top 10 categories of variables based on their Mean Decrease in Gini were automatically generated and then presented in diagrams for each ML model.
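In a random forest, the Mean Decrease in Gini for a variable aggregates, over the trees, how much each split on that variable reduces node impurity. The sketch below shows only the per-split quantity for a binary outcome, in Python; the function names and the toy node are assumptions made for the example, and the averaging over trees and the scaling applied by the software are not reproduced.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Toy node with 3 deaths among 8 children, split on a hypothetical variable.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]
decrease = gini_decrease(parent, left, right)
```

A variable whose splits repeatedly yield large decreases separates dead and alive cases well and therefore ranks high in the importance plots.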
Results
Descriptive results of the background characteristics
Table 1 shows the results of under-five mortality by the sample characteristics. Of the 10,641 under-five children in the sample, there appears to be a significant difference in mortality prevalence between the sexes, with female children experiencing higher mortality (6.7%) than males (4.2%). There were also considerable differences by birth interval, with under-five mortality being more prevalent among children with birth intervals of less than 2 years (9.3%) than among children with birth intervals of 2–4 years and over 4 years (4.45% and 4.53%, respectively). Under-five mortality was also significantly more prevalent among children using unimproved water sources (5.8%) than among those who used improved water sources (2.9%). Significant differences were also observed regarding antenatal visits and postnatal care, with under-five mortality being considerably more prevalent among children whose mothers did not receive antenatal (5.6%) and postnatal care (4.2%). Children who were breastfed more than 1 h after birth had a significantly higher prevalence of death (9.8%) than those breastfed within 1 h of birth (4.5%), while there was also evidence of a significant difference in under-five mortality regarding the number of people in the household. The rest of the characteristics did not show any significant difference in mortality prevalence among their categories.
Spatial distribution of under-five mortality
Figure 1 shows the spatial distribution of crude under-five mortality rates by regions in Ethiopia. The under-five mortality rate in the map is presented as the number of under-five deaths per 1000 live births. The Afar region recorded the highest under-five mortality rate of 125 per 1000 live births, followed by Benishangul-Gumuz and Somali, which recorded 98 and 94 per 1000 live births, respectively. The lowest under-five mortality rate is recorded in Addis Ababa, with a rate of 39 per 1000 live births.
Table 1 Descriptive statistics of child mortality outcome by study characteristics, EDHS 2016 (N = 10,641)
Characteristics                       Child alive   Child dead    Chi-square test
                                      before age 5  before age 5  of equality
                                      Percent/mean  Percent/mean
Child dead/alive                      94.9          5.1
Child sex                                                         p = 0.0001
  Male                                95.8          4.2
  Female                              93.3          6.7
Birth order                                                       p = 0.71
  1st or 2nd                          94.7          5.3
  3rd or higher                       94.4          5.6
Birth interval                                                    p = 0.0001
  < 2 years                           90.7          9.3
  2–4 years                           95.5          4.45
  > 4 years                           95.5          4.53
Mother's age at first birth                                       p = 0.47
  < 20 years                          94.3          5.7
  > 20 years                          94.9          5.1
Age of the mother                                                 p = 0.94
  15–19                               94.9          5.0
  20–34                               94.4          5.6
  35–49                               94.6          5.4
Residence                                                         p = 0.28
  Rural                               94.4          5.6
  Urban                               95.7          4.3
Education                                                         p = 0.34
  No education                        94.1          5.9
  Primary                             95.1          4.8
  Secondary and higher                95.5          4.5
Wealth index                                                      p = 0.63
  Low                                 94.7          5.3
  Middle                              94.7          5.3
  High                                94.1          5.9
Water source                                                      p = 0.51
  Unimproved                          94.3          5.73
  Improved                            94.8          5.23
Time to water source                  167.4         164.6         p = 0.89*
Toilet facility                                                   p = 0.005
  Unimproved                          94.2          5.8
  Improved                            97.1          2.9
Place and mode of delivery services                               p = 0.07
  Facility with CS delivery           96.1          3.9
  Facility without CS delivery        94.1          5.9
  Home                                94.2          5.8
Contraceptive use                                                 p = 0.23
  Yes                                 95.1          4.9
  No                                  94.2          5.8
Child wanted                                                      p = 0.74
  Then                                94.6          5.4
  Later                               94.5          5.5
  Not at all                          93.6          6.4
Antenatal visits                                                  p = 0.002
  No visit                            94.4          5.6
  1–4 visits                          96.7          3.3
  5+ visits                           97.6          2.4
Postnatal care visits                                             p = 0.009
  No                                  95.8          4.2
  Yes                                 98.3          1.7
Region                                                            p = 0.43
  Oromia                              94.2          5.8
  Addis Ababa                         95.9          4.1
  Afar                                91.5          8.5
  Amhara                              94.9          5.1
  Ben-Gumuz                           94.2          5.8
  Dire Dawa                           93.7          6.3
  Gambella                            93.3          6.7
  Harari                              94.5          5.5
  SNNP                                93.3          6.7
  Somali                              96.7          3.3
  Tigray                              93.8          6.2
Mother's BMI                                                      p = 0.41
  Underweight                         93.7          6.3
  Normal                              94.6          5.4
  Overweight                          95.4          4.6
Breastfed                                                         p = 0.0001
  Within an hour of birth             95.5          4.5
  Greater than an hour of birth       90.2          9.8
Household size                        6.1           5.4           p = 0.0001*
NB all estimates include sample design and person weights, per DHS instructions. *t test was used instead of chi-square. Significant variables are in bold
Predicting under-five mortality
Below, we report the results of the three machine learning models (logistic regression, random forests, and the K-nearest neighbor models) (Table 2). The under-five mortality prediction accuracy was found to be low for all models, between 46.3 and 67.2% accuracy on the test dataset, with the RF model having the highest overall accuracy. The RF model had high sensitivity, meaning that it was more accurate in identifying dead cases, but had low specificity, meaning that it was poor in identifying the alive cases. However, the model correctly identified 70% of the real dead cases (28/(28+12)) and 67% of real alive cases (698/(698+343)), which means that the RF model is relatively better at predicting both real positive (dead) and negative (alive) cases. The logistic and KNN models both show lower overall accuracy (59.9 and 46.3%, respectively), and lower sensitivity, specificity, and positive as well as negative predictive values.
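The accuracy metrics defined in the Methods can be recomputed directly from a confusion matrix. The sketch below does so in Python for the random forest counts reported in Table 2, under the assumption that "dead" is the positive class, as stated in the text; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard metrics from confusion-matrix counts (positive class = dead)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # dead cases correctly predicted dead
        "specificity": tn / (tn + fp),   # alive cases correctly predicted alive
        "ppv": tp / (tp + fp),           # predicted dead that are truly dead
        "npv": tn / (tn + fn),           # predicted alive that are truly alive
    }

# Random forest counts from Table 2: observed dead -> 12 predicted alive,
# 28 predicted dead; observed alive -> 698 predicted alive, 343 predicted dead.
rf = classification_metrics(tp=28, fn=12, fp=343, tn=698)
rf_percent = {name: round(100 * value, 1) for name, value in rf.items()}
```

Under this convention the counts give an accuracy of about 67.2%, a sensitivity of about 70%, and a specificity of about 67%, in line with the 28/(28+12) and 698/(698+343) ratios quoted in the text; the 98.3% and 7.5% figures listed in Table 2 correspond to what this sketch labels negative and positive predictive value.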
A visualization of the receiver operating characteristic (ROC) curves is shown in Fig. 2. Among the three machine learning models employed in this study, the curve of the RF model shows the highest AUC value, indicating that it is the best of the models at classifying dead and alive cases.
Figures 3, 4, and 5 show the variable importance measures, measured by the scaled mean decrease in the Gini coefficient for each variable, as calculated during the k-fold cross-validation process. This is an effective measure of how important a variable is for predicting under-five mortality across all the cross-validation estimates. The three figures show very similar results, with household size (nhh) and breastfeeding behavior (bfeed) being among the top 3 variables in all three models. Other important factors that appeared in the top five variables are the time to water source (time_water), number of births (births5_ys), birth interval (b_interval), and child sex (male).
Table 2 Model accuracy metrics for all models as evaluated on the test data
                            Random Forest       Logistic regression KNN model
Confusion matrix            Predicted           Predicted           Predicted
                            Alive     Dead      Alive     Dead      Alive     Dead
Observed alive              698       343       632       418       475       566
Observed dead               12        28        16        24        14        26
Metric (%)
Accuracy                    67.2                59.9                46.3
Sensitivity                 98.3                97.5                97.1
Specificity                 7.5                 5.34                4.4
Positive predictive value   70.0                59.9                45.6
Negative predictive value   67.0                60.0                65.0
AUC                         72.0                66.1                55.5
Fig. 1 Spatial distribution of crude under-five mortality rates by regions in Ethiopia. Source: Created by the authors from EDHS estimates. Map legend: U5 mortality rate, in bins from 30–40 to 110–130. Base map: Leaflet, © OpenStreetMap contributors, © CARTO
Unlike the ML model results presented above, the traditional logistic regression model is the only one that allows direct interpretation of the model coefficients. Table 3 shows the estimated odds ratios and confidence intervals for the model parameters. Factors associated with under-five mortality were sex, birth order, birth interval, water source, place of delivery, antenatal visit, postnatal care, breastfeeding, and household size. Increased risks of under-five mortality were found among males, higher birth order children, and children born in a facility without C-section services. Conversely, reduced risks were found among children with longer birth intervals, those with improved water sources, and those who received antenatal and postnatal care, as well as those from larger households.
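The odds ratios and confidence intervals in Table 3 are related to the underlying logit coefficients by exponentiation. The sketch below illustrates this in Python, assuming a Wald-type 95% interval with z = 1.96 on the log-odds scale; the function names are hypothetical, and the survey-weighted estimation used in the study may rely on a slightly different reference distribution.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Turn a logit coefficient and its standard error into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=1.96):
    """Recover the log-odds standard error implied by a reported interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

# Table 3 reports, for male children, an odds ratio of 2.018 with a
# 95% confidence interval of 1.398 to 2.913.
beta_male = math.log(2.018)
se_male = se_from_ci(1.398, 2.913)
or_male, lower_male, upper_male = odds_ratio_ci(beta_male, se_male)
```

Because the interval is symmetric on the log scale, rebuilding it from the implied standard error reproduces the reported bounds to within rounding, which is a quick consistency check on tabulated results.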
Fig. 2 ROC Curves for the three models
Fig. 3 Variable importance measures for the random Forest model
Bitew et al. Genus (2020) 76:37 Page 9 of 16
Discussion
This study briefly described spatial variations in under-five mortality and predicted
under-five mortality risks in Ethiopia using machine learning techniques. The spatial
map provides evidence of considerable regional disparities in under-five mortality rates
in Ethiopia similar to what has been observed in Ghana (Aheto, 2019). Tigray and some
regions in the central part of the country show the lowest under-five mortality rates
whereas regions in the eastern and western parts of the country have the highest
under-five mortality rates. Providing evidence on the spatial variations of under-five
mortality in the country may provide the need to better understand the underlying risk
factors. Regarding the predictive analysis, the prediction accuracies and AUC statistics
are found to be highest for the RF model. The RF model shows a higher predictive
power compared to the other ML models included in this study. In this regard, the RF
model shows that household size, time to the water source, breastfeeding behavior,
births in the preceding 5 years, sex of a child, birth intervals, birth order, antenatal
Fig. 4 Variable importance measures for logistic regression model
Fig. 5 Variable importance measures for K-nearest neighbor model
Bitew et al. Genus (2020) 76:37 Page 10 of 16
Table 3 Logistic regression analysis of under-five mortality in Ethiopia
Variables Odds ratio Lower 95 % CI Upper 95% CI pvalue
(Intercept) 0.033 0.006 0.193 0.0001
Mothers age first birth (Ref: < 20)
> 20 0.600 0.353 1.018 0.059
Sex (Ref: female)
Male 2.018 1.398 2.913 0.0001
Birth order (Ref: 1st/2nd)
3rd or higher 2.129 1.131 4.008 0.020
Birth interval (Ref: < 2)
24 years 0.527 0.309 0.898 0.019
> 4 years 0.385 0.190 0.779 0.008
Time to water source 1.000 0.999 1.000 0.244
Water source (Ref: unimproved)
Improved 0.585 0.348 0.985 0.044
Toilet facility (Ref: improved)
Unimproved 1.713 0.744 3.943 0.206
Births in last 5 years 1.163 0.744 1.816 0.508
Residence (Ref: rural)
Urban 0.527 0.181 1.541 0.243
Mothers education (Ref: no education)
Primary 0.928 0.513 1.680 0.805
Secondary/higher 1.856 0.480 7.178 0.370
Wealth index (Ref: low)
Middle 1.342 0.698 2.581 0.378
High 1.694 0.937 3.064 0.082
Contraceptive use (Ref: using)
Not using 1.174 0.735 1.876 0.502
Region
Addis Ababa 1.124 0.485 2.605 0.786
Afar 0.573 0.228 1.435 0.235
Amhara 0.885 0.354 2.211 0.794
Ben-Gumuz 1.494 0.587 3.803 0.400
Dire Dawa 1.021 0.408 2.554 0.965
Gambella 0.623 0.243 1.597 0.325
Harari 1.175 0.495 2.790 0.715
SNNP 1.221 0.376 3.960 0.740
Somali 1.504 0.287 7.881 0.629
Tigray 1.733 0.519 5.787 0.372
Mothers BMI (Ref: normal)
Overweight 0.527 0.170 1.640 0.269
Underweight 1.402 0.868 2.264 0.168
Place of delivery (Ref: fac with CS delivery)
Facility without CS delivery 2.850 1.182 6.869 0.020
Home 1.185 0.617 2.275 0.610
Bitew et al. Genus (2020) 76:37 Page 11 of 16
visits, type of water source, and mothers BMI are the top 10 important predictors of
under-five mortality in Ethiopia. The important role played by some of these factors in
under-five mortality levels is widely documented in the literature (Abir, Agho, Page,
Milton, & Dibley, 2015; Dendup, Zhao, & Dema, 2018; Howell, Holla, & Waidmann,
2016; Yaya et al., 2018).
In comparison, the findings of the best performing ML model appear to be virtually
consistent with the traditional logistic regression analysis which also shows that a
childs sex, birth interval, birth order, water source, place of delivery, antenatal visits,
postnatal care, household size, and breastfeeding behavior play a significant role in
under-five mortality levels in Ethiopia. Only the number of births in the preceding
5 years and the mothers BMI appear to play an important role in the ML models
but play an insignificant role in the traditional logistic regression analysis. This is
an indication that ML models may produce some new variablesor previously un-
seen insights by the traditional regression models which may play a crucial role in
policy decision making. From the traditional logistic regression findings, male chil-
dren have shown a significantly higher risk of dying before age 5 compared with
female children. This is consistent with the finding of a cross-sectional study con-
ducted in Bangladesh (Abir et al., 2015). It has been shown that male children
have an increased risk of dying in the first month of life because of high vulner-
ability to infectious disease. This may be because female neonates are more likely
to develop early fetal lung maturity in the first week of life, which may result in a
lower incidence of respiratory diseases in female compared with male neonates
(Khoury, Marks, McCarthy, & Zaro, 1985). Also, higher birth order of children ap-
pears to be associated with a significantly higher risk of under-five mortality.
Analogously, the unfavorable effect of higher birth order on childhood survival
chances has been well documented in Africa (Howell et al., 2016)aswellassome
parts of Asia (Dendup et al., 2018;Hong&Hor,2013)andmayprovideabetter
understanding of the spatial variations in the country.
Bitew et al. Genus (2020) 76:37 Page 12 of 16
Furthermore, the risk of under-five mortality is significantly higher among
children born after a birth interval of less than 2 years than among those born
after an interval of more than 2 years. Consistent with this, there is much
evidence that longer birth intervals improve the survival chances of succeeding
children (Kozuki & Walker, 2013; Yaya et al., 2018). A short preceding birth
interval can be said to influence under-five mortality through three main
mechanisms. The first is maternal depletion caused by closely spaced births. The
second is competition for scarce household resources among children, while the
third is the transmission of infectious diseases between the closely spaced
children (Majumder, May, & Pant, 1997). While the first mechanism is biological,
the last two are said to be behavioral effects of a short preceding birth
interval (Koenig, Phillips, Campbell, & Dsouza, 1990).
Table 3 Logistic regression analysis of under-five mortality in Ethiopia (Continued)
Variables                                Odds ratio   Lower 95% CI   Upper 95% CI   p value
Antenatal visits (Ref: no visit)
  1–4 visits                             0.616        0.381          0.995          0.048
  5+ visits                              0.437        0.208          0.917          0.029
Postnatal care (Ref: no)
  Yes                                    0.264        0.080          0.872          0.029
Child wanted (Ref: wanted then)
  Wanted later                           0.768        0.369          1.599          0.482
  Not at all                             1.407        0.749          2.642          0.289
Breastfeeding (Ref: > an hour of birth)
  Within 1 h of birth                    0.242        0.147          0.398          0.0001
Household size                           0.498        0.345          0.719          0.0001
NB significant variables are in bold
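As a reading aid for Table 3, the following minimal Python sketch shows how a reported odds ratio and its 95% confidence interval relate to the underlying logistic regression coefficient. It assumes, without confirmation from the study, that the intervals are Wald-type (symmetric on the log-odds scale). The function name wald_summary is hypothetical, and the inputs are the published values from the first antenatal-visit row of Table 3 (odds ratio 0.616, 95% CI 0.381 to 0.995).

```python
import math


def wald_summary(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Back out the log-odds coefficient, its standard error, the Wald z
    statistic, and a two-sided p value from a reported odds ratio and CI.

    Assumption (not stated in the source): the CI is a Wald interval,
    i.e. exp(beta +/- z_crit * SE).
    """
    beta = math.log(odds_ratio)  # coefficient on the log-odds scale
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = beta / se
    # Two-sided p value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value


# Published values for the first antenatal-visit row of Table 3.
beta, se, z, p_value = wald_summary(0.616, 0.381, 0.995)
print(f"beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p_value:.3f}")
```

Under this assumption, the recovered p value should be close to the 0.048 reported for that row, and the negative coefficient corresponds to an odds ratio below 1, that is, a protective association.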
Additionally, this study finds that the use of unimproved drinking water is associated
with an increased risk of under-five mortality. Lack of access to clean water is
considered one of the important factors contributing to more than 80% of child
deaths in the world (UNICEF, 2018). There is also considerable evidence from studies
in developing countries that show that household sanitation and a clean water supply
promote child health and survival (Ezeh, Agho, Dibley, Hall, & Page, 2014; Mugo,
Agho, Zwi, Damundu, & Dibley, 2018). In Ethiopia, the proportion of the population
using improved drinking water sources is only 57%, and those who use improved sani-
tation are less than 5% (World Health Organization, 2017). This may have serious im-
plications for variations in under-five mortality in the country. This study further
provides evidence that children whose mothers do not use any contraceptives have a
significantly higher risk of under-five mortality than their counterparts whose mothers
use modern contraceptives.
This study also finds that delivery in health facilities without CS services and at home
is associated with a higher under-five mortality risk. This may be mainly related to
dealing with delivery complications that may raise under-five mortality risks. Health fa-
cilities with CS services are very scarce in Ethiopia, and where they are available, trans-
portation challenges encourage women to deliver at home even when facility-based
delivery is available at a minimal cost (Shiferaw, Spigt, Godefrooij, Melkamu, & Tekie,
2013). Moreover, the study finds a positive effect of antenatal and postnatal care
checkups on under-five survival chances. This is consistent with the significant associ-
ation observed between antenatal and postnatal care and lower under-five mortality
risk in the literature (Bitew & Nyarko, 2019; Machio, 2018). The implication is that
children whose mothers do not receive antenatal and postnatal care services may be
exposed to more proximate under-five mortality risk factors, such as congenital and
infectious diseases, than their counterparts. This study has also shown a considerable
positive effect of early initiation of breastfeeding on childhood survival chances.
Breastfeeding has long been shown to be an important protective factor against
under-five mortality, particularly in developing countries (Azuine, Murray, Alsafi, &
Singh, 2015; Nyarko, Tanle, & Kumi-Kyereme, 2014) and may play a key part in childhood
survival interventions in Ethiopia. Quite surprisingly, larger household size appears
to be associated with reduced under-five mortality risk in this study, contrary to what
is documented in the literature (Dendup et al., 2018). However, this may well be
explained by some household-level contextual factors in the country, such as the
availability of considerable social support from parents and siblings.
This study is not without limitations. The survey comprised only surviving women,
and since neonatal and maternal mortalities may occur concurrently, this may have led
to an underestimation of the under-five mortality rates. In addition, unlike the
traditional regression models, the ML results are largely uninterpretable because
they yield no regression coefficients and therefore no direction of effect. In effect,
the ML models in the current study rank variables by the importance of their role in
predicting under-five mortality levels. In this case, extant empirical literature
from studies using the traditional methodologies may be used to determine the
direction of effect of these important variables. There are also possible recall
biases or non-disclosure of deaths by mothers, which may lead to an underestimation
of the number of deaths. Nevertheless, machine learning techniques are
considered to be very useful in predicting population health and other phenomena and
lead to better policy decisions (Ashrafian & Darzi, 2018; Holzinger, 2017).
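The point about direction of effect can be illustrated with a small sketch. The counts below are hypothetical, not EDHS data, and the decrease in Gini impurity is used here only as one common basis for random forest variable importance, which may differ from the importance measure used in the study. Mirroring a protective factor into a harmful one reverses the odds ratio but leaves the impurity-based importance unchanged.

```python
def gini(deaths, total):
    """Gini impurity of a binary outcome with the given number of deaths."""
    p = deaths / total
    return 2 * p * (1 - p)


def gini_decrease(d0, n0, d1, n1):
    """Impurity decrease from splitting on a binary factor.

    d0, n0: deaths and children in the unexposed group (factor = 0)
    d1, n1: deaths and children in the exposed group (factor = 1)
    """
    n = n0 + n1
    parent = gini(d0 + d1, n)
    children = (n0 / n) * gini(d0, n0) + (n1 / n) * gini(d1, n1)
    return parent - children


def odds_ratio(d0, n0, d1, n1):
    """Odds of death in the exposed group relative to the unexposed group."""
    return (d1 / (n1 - d1)) / (d0 / (n0 - d0))


# Hypothetical counts: 100 children per group.
protective = (40, 100, 20, 100)  # exposure halves the death proportion
harmful = (20, 100, 40, 100)     # mirrored: exposure doubles it

for label, counts in (("protective", protective), ("harmful", harmful)):
    print(label,
          f"odds ratio={odds_ratio(*counts):.3f}",
          f"Gini decrease={gini_decrease(*counts):.4f}")
```

Both scenarios give the same importance score, while the odds ratio falls below 1 in the first and above 1 in the second. This is why the direction of an important variable has to come from a regression model or from the existing literature.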
Conclusions
The findings show that considerable regional disparities in under-five mortality rates
persist in Ethiopia, with the highest rates being found in the Afar, Benishangul
Gumuz, and Somali regions. Also, the RF model provides a moderately better predictive
power than the logistic regression and KNN ML models in predicting under-five mor-
tality determinants in Ethiopia. Even though the RF model and the traditional logistic
regression model have shown similar factors, the RF model appears to reveal some im-
portant factors that may not be identified by the traditional logistic regression model.
This model may, therefore, offer better policy directions regarding under-five
childhood survival. Thus, household size, time to the water source, breastfeeding
behavior, number of births in the past 5 years, sex of a child, birth intervals,
antenatal visits, birth order, type of water source, and mother's BMI may play an
important role in under-five survival chances in Ethiopia. This study highlights the
use of machine learning algorithms to predict and better understand very important
under-five mortality risk factors to improve crucial policy directions. As a
corollary, ML methods may also apply to
other areas of demographic research including fertility and migration studies. Our find-
ings reinforce the need to focus on the most important predicted factors including
breastfeeding, birth interval control, and antenatal care among others in developing
policies aimed at enhancing childhood survival chances. Also, based on the findings,
expanding access to improved drinking water will help to substantially reduce future
under-five mortality levels in Ethiopia.
Abbreviations
AUC: Area under curve; BMI: Body mass index; CS: Caesarean section; EDHS: Ethiopian Demographic and Health
Survey; KNN: K-Nearest Neighbors; LMIC: Low- and middle-income countries; MDG: Millennium Development Goal;
ML: Machine learning; RF: Random forest; ROC: Receiver operating characteristic; SNNPR: Southern Nations Nationalities
and People Region
Authors' contributions
FHB conceived and designed the study. FHB and CSS performed the analysis with technical support from SHN. FHB
wrote the initial draft of the manuscript with technical support from SHN, LP, and CSS. All authors critically reviewed
the manuscript for important intellectual content and then approved the final version of the manuscript for
publication.
Funding
No funding was received for this study.
Availability of data and materials
The datasets analyzed in this study are freely available at the DHS Program repository
Competing interests
The authors declare that they have no competing interests.
Author details
1 Department of Demography, College for Health, Community & Policy, University of Texas at San Antonio, 501 W.
Cesar Chavez Blvd, San Antonio, TX 78207, USA. 2 Institute for Demographic and Socioeconomic Research, The
University of Texas at San Antonio, 501 W. Cesar Chavez Blvd, San Antonio, TX 78207, USA.
Received: 30 April 2020 Accepted: 16 October 2020
References
Abir, T., Agho, K. E., Page, A. N., Milton, A. H., & Dibley, M. J. (2015). Risk factors for under-five mortality: evidence from
Bangladesh Demographic and Health Survey, 2004–2011. BMJ Open, 5(8), e006722.
Aheto, J. M. K. (2019). Predictive model and determinants of under-five child mortality: evidence from the 2014 Ghana
Demographic and Health Survey. BMC Public Health, 19, 64.
Ali, N., Neagu, D., & Trundle, P. (2019). Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets.
SN Applied Sciences, 1(12), 1559.
Ashrafian, H., & Darzi, A. (2018). Transforming health policy through machine learning. PLoS Medicine, 15(11), e1002692.
Ayele, D. G., & Zewotir, T. T. (2016). Childhood mortality spatial distribution in Ethiopia. Journal of Applied Statistics, 43(15),
2813–2828.
Ayele, D. G., Zewotir, T. T., & Mwambi, H. (2017). Survival analysis of under-five mortality using Cox and frailty models in
Ethiopia. Journal of Health, Population, & Nutrition, 36(1), 25.
Azuine, R. E., Murray, J., Alsafi, N., & Singh, G. K. (2015). Exclusive breastfeeding and under-five mortality, 2006–2014: A cross-
national analysis of 57 low- and middle-income countries. International Journal of MCH AIDS, 4(1), 13–21.
Bereka, S. G., Habtewold, F. G., & Nebi, T. D. (2017). Under-five mortality of children and its determinants in Ethiopian Somali
Regional State, Eastern Ethiopia. Health Science Journal, 11, 3.
Bitew, F., & Nyarko, S. H. (2019). Modern contraceptive use and intention to use: implication for under-five mortality in
Ethiopia. Heliyon, 5, e02295.
Central Statistical Agency (CSA) [Ethiopia], & ICF International (2016). Ethiopia Demographic and Health Survey 2016. Addis
Ababa, Ethiopia, Calverton, MD, USA: Central Statistical Agency, ICF International.
Dendup, T., Zhao, Y., & Dema, D. (2018). Factors associated with under-five mortality in Bhutan: an analysis of the Bhutan
National Health Survey 2012. BMC Public Health, 18, 1375.
Elisa, N. (2018). Could Machine Learning be used to address Africa's Challenges? International Journal of Computer
Applications, 180(18), 0975–8887.
Ezeh, O. K., Agho, K. E., Dibley, M. J., Hall, J., & Page, A. N. (2014). The impact of water and sanitation on childhood mortality in
Nigeria: evidence from demographic and health surveys, 2003–2013. International Journal of Environmental Research and
Public Health, 11(9), 9256–9272.
Federal Ministry of Health (2005). National Strategy for Child Survival in Ethiopia. Addis Ababa: Federal Ministry of Health.
Florkowski, C. M. (2008). Sensitivity, specificity, receiver-operating characteristic (ROC) curves, and likelihood ratios:
communicating the performance of diagnostic tests. The Clinical Biochemist Reviews, 29(Suppl 1), S83.
Holzinger, A. (2017). Introduction to machine learning and knowledge extraction (MAKE). Machine Learning and Knowledge
Extraction, 1(1), 1–20.
Hong, R., & Hor, D. (2013). Factors associated with the decline of under-five mortality in Cambodia, 2000–2010: Further analysis of
the Cambodia Demographic and Health Surveys. Calverton: ICF International.
Howell, E. M., Holla, N., & Waidmann, T. (2016). Being the younger child in a large African family: a study of birth order as a
risk factor for poor health using the demographic and health surveys for 18 countries. BMC Nutrition, 2, 61.
Khoury, M. J., Marks, J. S., McCarthy, B. J., & Zaro, S. M. (1985). Factors affecting the sex differential in neonatal mortality: the
role of respiratory distress syndrome. American Journal of Obstetrics and Gynecology, 151(6), 777–782.
Koenig, M. A., Phillips, J. F., Campbell, O. M., & Dsouza, S. (1990). Birth intervals and childhood mortality in rural Bangladesh.
Demography, 27(2), 251–265.
Kozuki, N., & Walker, N. (2013). Exploring the association between short/long preceding birth intervals and child mortality:
using reference birth interval children of the same mother as comparison. BMC Public Health, 13, S6.
Kuhn, M. (2020). Caret: Classification and Regression Training. R package version 6.0-85. https://CRAN.R-project.org/package=caret.
Larose, D. T. (2015). Data mining and predictive analytics. New York: Wiley.
Machio, P. M. (2018). Determinants of neonatal and under-five mortality in Kenya: Do antenatal and skilled delivery care
services matter? Journal of African Development, 20(1), 59–67.
Majumder, A. K., May, M., & Pant, P. D. (1997). Infant and child mortality determinants in Bangladesh: Are they changing?
Journal of Biosocial Science, 29(4), 385–399.
Mugo, N. S., Agho, K. E., Zwi, A. B., Damundu, E. Y., & Dibley, M. J. (2018). Determinants of neonatal, infant, and under-five
mortality in a war-affected country: analysis of the 2010 Household Health Survey in South Sudan. BMJ Global Health,
3(1), e000510.
Nyarko, S. H., Tanle, A., & Kumi-Kyereme, A. (2014). Determinants of childhood mortality in Ghana. International Journal of
Social Science Research, 3, 61–77.
Price, C. P., & Christenson, R. H. (2007). Evidence-based laboratory medicine: principles, practice, and outcomes (2nd ed.).
Washington DC: American Association for Clinical Chemistry Press.
Shiferaw, S., Spigt, M., Godefrooij, M., Melkamu, Y., & Tekie, M. (2013). Why do women prefer home births in Ethiopia? BMC
Pregnancy and Childbirth, 13, 5.
UNICEF (2017). The State of the World's Children. https://www.unicef.org/sowc/. Accessed March 15, 2019.
UNICEF (2018). Every Child Alive. The urgent need to end newborn deaths. Geneva, Switzerland: UNICEF.
UNICEF, WHO, World Bank Group & United Nations (2018). Levels and trends in child mortality report 2018. New York: UNICEF.
World Health Organization (2017). World health statistics 2017: Monitoring health for the SDGs, and Sustainable Development
Goals. Geneva: WHO.
Yaya, S., Bishwajit, G., Okonofua, F., & Uthman, O. A. (2018). Under five mortality patterns and associated maternal risk factors
in sub-Saharan Africa: A multi-country analysis. PLoS ONE, 13(10), e0205977.
You, D., Hug, L., Ejdemyr, S., Idele, P., et al. (2015). Global, regional, and national levels and trends in under-five mortality
between 1990 and 2015, with scenario-based projections to 2030: a systematic analysis by the UN Inter-agency Group
for Child Mortality Estimation. Lancet, 386(10010), 2275–2286.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Bitew et al. Genus (2020) 76:37 Page 16 of 16
... The under-5 mortality rate continued to decrease from 6.5% in 2004 to 4.4% in 2011 and the projections suggest further reductions to 17.6 per 1000 live births by 2030 [13,15]. The Under 5 Mortality declines from 82.5 to 41.0 per 1000 live births during 1994-2014 and projected to further reduces to 17.6 per 1000 live births by 2030 [15]. Despite these positive trends, childhood mortality remains relatively high in Bangladesh, and studies have identi ed crucial factors like mother's education, breastfeeding practices, wealth index, TV watching, and birth order for targeted interventions [1]. ...
... In a study utilizing Machine Learning techniques and the NFHS-IV (India) dataset, factors in uencing under-ve mortality encompassed living children, survival duration, wealth index, infant size at birth, recent births within a ve-year span, total children born, maternal education, and birth order[16]. The most effective predictive model highlights household size, water access, duration of breastfeeding, recent births, child's gender, birth spacing, antenatal care, birth order, water source type, and maternal body mass index as crucial factors in under-ve mortality levels in Ethiopia[17].Although many previous studies identi ed a number of risk factors associated with under-5 mortality. No study shows the application of Advanced of Machine Learning (AML) for BDHS data 2017-18 for nding determinants and prediction of under-5 mortality in Bangladesh. ...
... [17]. Religion stands as a pivotal social institution in uencing child health and mortality rates [22] our study shows that religion is an important factors under-5 mortality. ...
Preprint
Full-text available
Background: Under-5 mortality is a vital social indicator of a country's development and long-term economic viability. The most underlying factors contributing under-5 mortality is a concern in developing countries like Bangladesh. There has been extensive research conducted on under-5 mortality. The prevailing approach employed thus far primarily relies on traditional logistic regression analysis, which have demonstrated limited predictive effectiveness. Advance Machine Learning (AML) methods provide accurate prediction of under-5 mortalities. This study utilized Machine Learning techniques to forecast the mortality rate among children under the age of five in Bangladesh. Methods: The data for the study were drawn from the Bangladesh Demographic Health Survey 2017–18 data. Python version 3.0 software was utilized to implement and evaluate various Machine Learning (ML) techniques, including Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN) and Support Vector Machine (SVM). Boruta algorithm for selecting best features by using Boruta packages of R programming language. Furthermore, the SPSS Version 17 was used for analyzing conventional methods. Various matrices, like confusion matrix, accuracy, precision, recall, F1 score and the Area Under the Receiver Operating Characteristic Curve (AUROC) was utilized as a metric to assess the effectiveness or performance of predictive models. Results: We opted for t2xhe Random Forest (RF) model is the best predictive model of under-5 mortality in Bangladesh with accuracy (95.97%), recall (11%), precision (40%), F1 score (18%), and AUROC (75%). Our predictive models showed that Currently breastfeeding, Wealth index, Religion, Birth order number, Number of household members, Place of delivery, Type of toilet facility, Type of cooking fuel are the 8 top determinants of under-5 mortality in Bangladesh. 
Conclusions: Machine Learning methods were utilized to create the most optimal predictive model enabling the classification of hidden information that remained undetectable through traditional statistical methods. In our Study the Random Forest model was the best models for predicting under-5 mortality in Bangladesh.
... The researchers conclude that Random Forest (RF) is the best algorithm among others [43]. A study that used the Ethiopian Demographic and Health Survey 2016 data carried out by Bitew et al. (2020) found that the Extreme Gradient (xgbTree) algorithm gave a better result [44]. Use of artificial intelligence on the Indian Demographic Health Survey dataset [2005][2006] to identify the likelihood correlation with malnutrition done by Khare et al. (2017) [45]. ...
... The researchers conclude that Random Forest (RF) is the best algorithm among others [43]. A study that used the Ethiopian Demographic and Health Survey 2016 data carried out by Bitew et al. (2020) found that the Extreme Gradient (xgbTree) algorithm gave a better result [44]. Use of artificial intelligence on the Indian Demographic Health Survey dataset [2005][2006] to identify the likelihood correlation with malnutrition done by Khare et al. (2017) [45]. ...
Article
Full-text available
Background This paper presents an in-depth examination of malnutrition in women in Bangladesh. Malnutrition in women is a major public health issue related to different diseases and has negative repercussions for children, such as premature birth, decreased infection resistance, and an increased risk of death. Moreover, malnutrition is a severe problem in Bangladesh. Data from the Bangladesh Demographic Health Survey (BDHS) conducted in 2017-18 was used to identify risk factors for malnourished women and to create a machine learning-based strategy to detect their nutritional status. Methods A total of 17022 women participants are taken to conduct the research. All the participants are from different regions and different ages. A chi-square test with a five percent significance level is used to identify possible risk variables for malnutrition in women and six machine learning-based classifiers (Naïve Bayes, two types of Decision Tree, Logistic Regression, Random Forest, and Gradient Boosting Machine) were used to predict the malnutrition of women. The models are being evaluated using different parameters like accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$F_1$$\end{document} score, and area under the curve (AUC). Results Descriptive data showed that 45% of the population studied were malnourished women, and the chi-square test illustrated that all fourteen variables are significantly associated with malnutrition in women and among them, age and wealth index had the most influence on their nutritional status, while water source had the least impact. Random Forest had an accuracy of 60% and 60.2% for training and test data sets, respectively. 
CART and Gradient Boosting Machine also had close accuracy like Random Forest but based on other performance metrics such as kappa and F1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$F_1$$\end{document} scores Random Forest got the highest rank among others. Also, it had the highest accuracy and F1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$F_1$$\end{document} scores in k-fold validation along with the highest AUC (0.604). Conclusion The Random Forest (RF) approach is a reasonably superior machine learning-based algorithm for forecasting women’s nutritional status in Bangladesh in comparison to other ML algorithms investigated in this work. The suggested approach will aid in forecasting which women are at high susceptibility to malnutrition, hence decreasing the strain on the healthcare system.
... The use of machine learning algorithms to extract knowledge from large amounts of data can open new avenues of understanding the complex relationships between social determinants and health outcomes. A few algorithms have been used to understand the relationship between social determinants and infant mortality or under-five child mortality [12][13][14][15] . There is a few evidence in Latin America, to our knowledge, extensively examined the impact of social determinants of health on child well-being using longitudinal data spanning the last two decades. ...
... We randomly divided the total data into 70% to train the models and 30% to obtain variable importance, cumulative dependency, and performance metrics. We trained into a random forest model 17 as one of the best algorithms for predicting infantile mortality or under-five child mortality in other studies [12][13][14] . To train the random forest we used k-fold cross-validation technique with threefold. ...
Article
Full-text available
The reduction of child mortality rates remains a significant global public health challenge, particularly in regions with high levels of inequality such as Latin America. We used machine learning (ML) algorithms to explore the relationship between social determinants and child under-5 mortality rates (U5MR) in Brazil, Ecuador, and Mexico over two decades. We created a municipal-level cohort from 2000 to 2019 and trained a random forest model (RF) to estimate the relative importance of social determinants in predicting U5MR. We conducted a sensitivity analysis training two more ML models and presenting the mean square error, root mean square error, and median absolute deviation. Our findings indicate that poverty, illiteracy, and the Gini index were the most important variables for predicting U5MR according to the RF. Furthermore, non-linear relationships were found mainly for Gini index and U5MR. Our study suggests that long-term public policies to reduce U5MR in Latin America should focus on reducing poverty, illiteracy, and socioeconomic inequalities. This research provides important insights into the relationships between social determinants and child mortality rates in Latin America. The use of ML algorithms, combined with large longitudinal data, allowed us to evaluate the effects of social determinants on health more carefully than traditional models.
... In the region including Ethiopia, many studies produced information on different healthcare issues such as antenatal care (ANC) utilization status of mothers , the postnatal care (PNC) visits of mothers (Sahle, 2016), access to tetanus toxoid (TT) immunization of mothers , predicting undernutrition status of U5 children (Markos, Doyore, Yifiru, & Haidar, 2014), predicting the CD4 count status of patients under ART (Mariam & Mariam, 2015), predicting the level of anaemia among women (Dejene, Abuhay, & Bogale, 2022), and predicting U5 mortality (Bitew, Nyarko, Potter, & Sparks, 2020) by using different machine-learning algorithms and AI-driven models. In addition to this, a few studies have applied them to the detection and classification of COVID-19 cases from X-ray images (Ayalew, Salau, Abeje, & Enyew, 2022;Erdaw & Tachbele, 2021). ...
Thesis
Full-text available
COVID-19 severely affected Eastern Africa as of other parts of the world, which significantly disrupted social and economic activities in the region. The objective of this study was to predict mortality due to COVID-19 using an artificial intelligence-driven ensemble and boosting models in Eastern Africa. The study used two years of daily data collected consecutively. At the preprocessing stage, the dataset was split into training and verification sets. In the modelling, sensitivity analysis, the development of single black box AI-driven models, the development of ensemble and boosting models, and the comparison of ensemble models with single models were conducted. In the sensitivity analysis, four inputs were selected and used to run the single models, and accordingly, the coefficients of determination (DC) for ANFIS, FFNN, SVM, and MLR were, respectively, 0.9273, 0.8586, 0.8490, and 0.7956. Another four inputs were used to create the boosting method: AdaBoost, KNN, ANN-6, and SVM were shown to have determination coefficients of 0.9422, 0.8618, 0.8629, and 0.7171, respectively. The ANFIS ensemble and AdaBoost algorithms proved to be the most effective at enhancing the prediction performance of the single AI-driven models with non-linear ensemble techniques. This demonstrates the ability of ensemble and boosting models to predict COVID-19 mortality in Eastern Africa.
... In this reference, Yagcı ranked the algorithms from highest to lowest accuracy as RF, NN, SVM, LR, ANN, and KNN, which aligns with the conclusions drawn in this study. Conversely, [25] found that the RF model demonstrated moderately better predictive power compared to the LR and KNN models. The proposed RF model achieved a 73.8% accuracy in predicting students' final exam grades. ...
Article
Full-text available
This study employs machine learning algorithms to predict the determinants of stu-dents' academic performance in Somaliland using data from the 2021/2022 academic year. The educational landscape in Somaliland is characterized by challenges in implementing quality education, especially at the secondary level. This research aims to uncover factors influencing student performance in national secondary exams and compare the effectiveness of machine learning algorithms, including random forest, naive Bayes, and logistic regression. It utilizes a dataset encompassing 14,342 students and examines variables such as student residence, administrative region, school type, and gender. The results indicate that the random forest model outperforms other algorithms , achieving 73.8% accuracy in predicting student performance. The traditional logistic regression analysis further highlights the impact of region, residence type, and school type on academic outcomes. These findings contribute to understanding the factors affecting students' performance in Somaliland and provide insights for educational policy and interventions.
... According to the (Kementerian Kesehatan Republik Indonesia, 2021), the decrease in the number of visits to under-fives coughing and shortness of breath at public health center is more due to the impact of the COVID-19 pandemic than other factors. Several studies related to under-five mortality due to pneumonia have been carried out, such as research by (Noor et al. (2016); Ramandey et al. (2018); Vicasco and Handayani (2020)) whose research used a quantitative analytic approach with a case control study; Woldeamanuel and Aga (2021) used Hurdle Negative Binomial (HNB) regression analysis; Tadesse Zeleke (2022) used propensity score analysis and Poisson regression; Bitew et al. (2020) used machine learning models such as random forests, logistic regression, and K-nearest neighbors. As well as research by Yatnaningtyas et al. (2016) which uses Geographically Weighted Negative Binomial Regression (GWNBR). ...
Article
Full-text available
Pneumonia is one of the main causes of under-five mortality in Indonesia. In under-fives, pneumoniais the number one killer in the world. Meanwhile, in Indonesia, it ranks second after diarrhea. Onaverage, the disease affects half a million children a year. This study aims to identify and analyze therisk of variables that affect the number of under-five mortality due to pneumonia in Indonesia in 2021.The novelty of this research focuses on the macro variables used, making it easier for policy makers tomake decisions. The research method used is negative binomial regression. The results showed that thehighest number of under-five mortality due to pneumonia was in Central Java Province. Meanwhile, thelowest was in Jambi Province, South Sumatra, Riau Islands, DKI Jakarta, North Kalimantan, SoutheastSulawesi, and Papua. The per capita income significantly reduces the number of under-five mortalitydue to pneumonia, while the number of under-fives with severe pneumonia significantly reduces thenumber of under-five mortality due to pneumonia in Indonesia. The government needs attention toreduce the death rate of children under five due to pneumonia by providing social protection in thefields of health and education for underprivileged communities.
... According to Abedin [13] the research highlights the signi cant impact of women's decision-making is in uencing on fertility outcomes. Some of the studies highlight the applications of machine learning methods in various eld of demography [16][17][18][19][20]. In a study shows that a woman's age, education level, occupation, and location are signi cant determining factors for the survival of a child during birth delivery [23]. ...
Preprint
Full-text available
Background Fertility is a social indicator that represents a country's growth and economic sustainability. The fertility rate of a country refers to the average number of children born to a woman during her lifetime. It is an important demographic indicator that influences population dynamics, economic growth, social welfare, and public policy. This research leverages advanced machine learning methodologies to achieve more precise predictions of fertility and fertility determinants in Bangladesh. Methods The dataset utilized in this study was sourced from the Bangladesh Demographic Health Survey (BDHS) conducted in 2017–18. The Python 3.0 programming language was used to implement and test machine learning (ML) models: Random Forests (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machine (SVM), XGBoost, LightGBM and Neural Network (NN). We used the Boruta feature selection algorithm via R programming language packages. Conventional methods were analyzed using SPSS Version 25 and the R programming language. The predictive models' performance was evaluated and compared using metrics such as the macro average and weighted average of the Confusion Matrix, Accuracy, F1 Score, Precision, Recall, Area Under the Receiver Operating Characteristics Curve (AUROC) and K-fold cross-validation. Results We preferred the Support Vector Machine (SVM) model of fertility in Bangladesh, with macro average recall (93%), precision (89%), and F1 score (90%), weighted average recall (97%), precision (96%), and F1 score (96%), and K-fold accuracy (95.9%). Our predictive models showed that Access to mass media, Husband/partner's education level, Highest educational level, Number of household members, Body Mass Index of mother, Number of living children and Son or daughter died stand out as the key determinants influencing fertility in Bangladesh.
Conclusions In constructing advanced predictive models, Machine Learning methods surpass conventional statistical approaches in classifying concealed information. In our study, the Support Vector Machine (SVM) emerged as the top-performing model for fertility prediction in Bangladesh.
... This suggests that the ML models may reveal previously unknown insights beyond traditional logistic regression approaches. Specifically, ML models could identify new influential variables for policy decision making that are missed by standard statistical methods (37). While the core findings aligned, ML provided the additional benefit of highlighting novel and potentially crucial MN deficiency factors not captured by traditional logistic regression. ...
Article
Introduction Micronutrient (MN) deficiencies are a major public health problem in developing countries including Ethiopia, leading to childhood morbidity and mortality. Effective implementation of programs aimed at reducing MN deficiencies requires an understanding of the important drivers of suboptimal MN intake. Therefore, this study aimed to identify important predictors of MN deficiency among children aged 6–23 months in Ethiopia using machine learning algorithms. Methods This study employed data from the 2019 Ethiopia Mini Demographic and Health Survey (2019 EMDHS) and included a sample of 1,455 children aged 6–23 months for analysis. Machine Learning (ML) methods, including Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Neural Network (NN), and Naïve Bayes (NB), were used to prioritize risk factors for MN deficiency prediction. Performance metrics including accuracy, sensitivity, specificity, and Area Under the Receiver Operating Characteristic (AUROC) curves were used to evaluate model prediction performance. Results The RF model was the best-performing ML model in predicting child MN deficiency, with an AUROC of 80.01% and accuracy of 72.41% in the test data. The RF algorithm identified the eastern region of Ethiopia, poorest wealth index, no maternal education, lack of media exposure, home delivery, and younger child age as the top prioritized risk factors in their order of importance for MN deficiency prediction. Conclusion The RF algorithm outperformed other ML algorithms in predicting child MN deficiency in Ethiopia. Based on the findings of this study, improving women’s education, increasing exposure to mass media, introducing MN-rich foods in early childhood, enhancing access to health services, and targeted intervention in the eastern region are strongly recommended to significantly reduce child MN deficiency.
... After Nigeria, India, Pakistan, and the Democratic Republic of the Congo, Ethiopia appears to have the fifth-highest rate of U5M worldwide in 2019 [6]. Ethiopia ranks sixth in the world in terms of the total number of deaths of children under five, with an estimated 472,000 children passing away before turning five each year [7]. ...
Article
One important measure of a nation's level of development is its under-five mortality (U5M) rate. Notwithstanding notable reductions in the U5M rate, around 5.6 million children worldwide still pass away before turning five each year. According to the 2016 Ethiopian Demographic and Health Survey (EDHS) report, 67 children out of every 1,000 live births passed away before turning five years old. This study used data from the 2016 EDHS to investigate factors associated with U5M in Ethiopia. A weighted total of 10,641 under-five children were included in this study. Descriptive statistics were reported using tables and graphs. To find important variables influencing U5M, a multi-level hurdle negative binomial model with an additional random effect was fitted. The following were found to be statistically significant factors for U5M in Ethiopia: maternal education status, place of delivery, husband/partners' educational status, place of residence, household wealth index, birth type, preceding birth interval, number of under-five children, sex of child, age of mother at first birth, source of drinking water, immunization coverage, child diarrhea status, ANC and PNC visits, and use of contraceptives. According to the findings, improving female educational opportunities, resolving regional differences, and encouraging mothers to give birth in medical facilities would all play a significant role in reducing the burden of U5M. Furthermore, the results of this study support the idea that implementing multi-sectoral interventions to enhance access to drinking water, prenatal and postnatal care, spacing of births, child immunization programs, and contraceptive use will significantly lower Ethiopia's rates of U5M in the future.
Policymakers and health planners should prioritize addressing preventable factors for under-five mortality in order to curtail and meet the Sustainable Development Goal (SDG) targets for under-five mortality in Ethiopia.
Chapter
Prediction of mortality is an important problem for making plans related to health and insurance systems. In this work, mortality of Africa, America, East Asia and Pacific, Europe and Central Asia, Europe alone, South Asia regions have been studied and predictions are made using fourteen machine learning techniques. These are linear, polynomial, ridge, Bayesian ridge, lasso, elastic net, k-nearest neighbors, support vector (with linear, polynomial and radial basis function kernels), decision tree, random forest, gradient boosting and artificial neural network regressors. The results are compared based on the coefficient of determination and the accuracy values. The best predicting algorithm varies from one region to another. On the other hand, the best accuracy (99.32%) and coefficient of determination (0.9931) are obtained for Africa region and using k-nearest neighbor regressor.
Article
Distance-based algorithms are widely used for data classification problems. The k-nearest neighbour classification (k-NN) is one of the most popular distance-based algorithms. This classification is based on measuring the distances between the test sample and the training samples to determine the final classification output. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous datasets, where data can be described as a mixture of numerical and categorical features. For the sake of simplicity, this work considers only one type of categorical data, which is binary data. In this paper, several similarity measures have been defined based on a combination between well-known distances for both numerical and binary data, and to investigate k-NN performances for classifying such heterogeneous data sets. The experiments used six heterogeneous datasets from different domains and two categories of measures. Experimental results showed that the proposed measures performed better for heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data need personalised similarity measures adapted to the data characteristics.
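The idea of classifying mixed numerical and binary data by distance can be illustrated with a minimal sketch. The abstract does not specify the similarity measures the authors proposed, so the combination below (Euclidean distance on the numerical part plus a weighted Hamming distance on the binary part) is only an assumed stand-in, and the names `mixed_distance`, `knn_predict`, and `weight`, along with the toy data, are hypothetical.

```python
import math
from collections import Counter


def mixed_distance(a_num, a_bin, b_num, b_bin, weight=1.0):
    """Assumed combined measure: Euclidean on numerical features
    plus a weighted Hamming count on binary features."""
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a_num, b_num)))
    hamming = sum(x != y for x, y in zip(a_bin, b_bin))
    return euclid + weight * hamming


def knn_predict(train, query_num, query_bin, k=3):
    """Majority vote among the k training rows closest to the query.

    train is a list of (numerical_features, binary_features, label) tuples.
    """
    ranked = sorted(
        train,
        key=lambda row: mixed_distance(row[0], row[1], query_num, query_bin),
    )
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy heterogeneous dataset (illustrative values only).
train = [
    ((0.1, 0.2), (0, 0), "a"),
    ((0.2, 0.1), (0, 1), "a"),
    ((0.9, 0.8), (1, 1), "b"),
    ((0.8, 0.9), (1, 0), "b"),
    ((0.85, 0.95), (1, 1), "b"),
]
prediction = knn_predict(train, (0.15, 0.15), (0, 0), k=3)
```

In this sketch the `weight` parameter controls how much a binary mismatch counts relative to numerical distance; numerical features would normally be rescaled first so that neither part dominates.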
Article
Objective: High under-five mortality has been identified as a major problem in many developing countries including Ethiopia. The main purpose of this study is to examine the effect of modern contraceptive use on under-five mortality in Ethiopia. Methods: The study draws on data from the 2011 and 2016 Ethiopia Demographic and Health Surveys. The Kaplan-Meier survival function was used to demonstrate the survival probabilities of children while a multivariate analysis using the Cox proportional hazards model was used to estimate the under-five mortality risks for various predictors. Results: The results show consistently higher survival probabilities for children of mothers who use modern contraceptives for all survival periods. Significant predictors of under-five mortality include modern contraceptive use, tetanus vaccinations, mother's age, child's sex, parity, postnatal checkup, marital status, and source of drinking water. Conclusion: Modern contraceptive use has a notable implication for the chances of under-five survival in Ethiopia. This underscores the importance of modern contraceptive use in the pursuit of a substantial reduction in under-five mortality in the country.
Article
Background Globally, millions of children aged below 5 years die every year and some of these deaths could have been prevented. Though a global problem, under-five mortality is also a major public health problem in Ghana with a rate of 60 deaths per 1000 live births. Identification of drivers of mortality among children aged below 5 years is an important problem that needs to be addressed because it could help inform health policy and intervention strategies aimed at achieving the United Nations SDG Goal 3 target 2. The aim of this study is to develop a predictive model and to identify determinants of under-five mortality. Method The 2014 Ghana Demographic and Health Survey data was used in this study. Analyses were conducted on 5884 children. The outcome variable is child survival status (alive or dead). Single level binary logistic and multilevel logistic regression models were employed to investigate determinants of under-five mortality. The fit of the model was checked using Variance Inflation Factor and Likelihood Ratio tests. The Receiver Operating Characteristic curve was used to assess the predictive ability of the models. A p-value < 0.05 was used to declare statistical significance. Results The study observed 289 (4.91%) deaths among children aged below 5 years. The study produced a good predictive model and identified an increase in the total number of children ever born, the number of births in the last 5 years, and mothers who did not intend to use contraceptives as critical risk factors that increase the odds of under-five mortality. Also, being born as a multiple birth and residing in certain geographical regions of Ghana were associated with increased odds of under-five mortality. Maternal education and being a female child decreased the odds of under-five mortality. No significant unobserved household-level variations in under-five mortality were found. The spatial map revealed regional differences in crude under-five mortality rate in the country.
Conclusion This study identified critical risk factors for under-five mortality and strongly highlights the need for family planning, improvement in maternal education and addressing regional disparities in child health which could help inform health policy and intervention strategies aimed at improving child survival. Electronic supplementary material The online version of this article (10.1186/s12889-019-6390-4) contains supplementary material, which is available to authorized users.
Article
Background As an important marker for health equity and access, under-five mortality (UFM) is a primary measure for socioeconomic development. The importance of reducing UFM has been further emphasized in an ambitious target under Sustainable Development Goals. The factors influencing UFM are not adequately understood in Bhutan. Methods The most recent dataset of the Bhutan National Health Survey (BNHS) 2012 was used in this study. Multiple logistic regression analysis using a backwards elimination approach was performed to identify significant factors influencing UFM. All statistical analyses were adjusted for the complex study design due to the multistage stratified cluster sampling used in BNHS. Results Bhutan’s UFM rate was 37 per 1000 live births. The weighted mean age of the children was 7.3 years (SD: 1.53; range: 3–12). Mother’s age, household size, access to electricity and sanitation, residential region, and parity were the key factors associated with UFM. The UFM risk was significantly lower in children born to mothers aged 36–40 years, 41–45 years, and > 45 years when compared to that in children born to mothers aged < 26 years. The likelihood of mortality was 66% lower (95% CI: 0.21–0.55) among children born in households with > 5 members. Children born in households without electricity and safe sanitation had a significantly higher risk of death, by 81 and 49% respectively. Relative to those born in the west, children born in the central and eastern regions were 1.72 (95% CI: 1.07–2.77) and 2.09 (95% CI: 1.46–2.99) times more likely to die, respectively. Children born to mothers who gave birth to > 2 children were significantly more likely to die than their counterparts. 
Conclusion These findings suggest that younger mother’s age, the higher number of births and being born in the central and eastern regions are associated with a higher UFM risk, whereas a larger household size and access to electricity and safe sanitation are key factors associated with lower UFM risk in Bhutan. Women empowerment, health education and strategies promoting maternal and child health in rural areas need to be scaled-up. Additionally, socioeconomic development programs should seek to reduce regional disparities. Electronic supplementary material The online version of this article (10.1186/s12889-018-6308-6) contains supplementary material, which is available to authorized users.
Article
In their Perspective, Ara Darzi and Hutan Ashrafian give us a tour of the future policymaker's machine learning toolkit.
Article
Background The under-5 mortality rate in the sub-Saharan region has remained unabated. Worse still, information on the regional trend and associated determinants is not readily available. Knowledge of the trend and determinants of under-5 mortality is essential for effective design of intervention programmes that will enhance child survival. We aimed to examine the mortality patterns in under-5 children and maternal factors associated with under-5 deaths. Methods Demographic and Health Survey (DHS) data from five sub-Saharan African countries (Chad, Democratic Republic of Congo, Mali, Niger and Zimbabwe) were used in this study. The sample consisted of 68,085 women aged 15–49 years with at least one history of childbirth. The outcome variable was under-five mortality rate. Relevant information on maternal factors was extracted for analysis. Multivariable Cox proportional hazards regression was used to model maternal factors associated with under-five mortality. Results The current under-5 mortality rate (per 1,000 live births) was 133 in Republic of Chad, 104 in Democratic Republic of Congo, 95 in Mali, 127 in Niger, and 69 in Zimbabwe. Several maternal and child level factors were found to be significantly associated with under-five mortality. Lack of spousal support (not currently married) resulted in an increase in under-five mortality (Chad- Hazard Ratio [HR] = 1.11, 95%CI = 0.97–1.25; DR Congo- HR = 1.24, 95%CI = 1.11–1.40; Mali- HR = 2.43, 95%CI = 1.63–3.64; Niger- HR = 1.59, 95%CI = 1.24–2.03; Zimbabwe- HR = 1.33, 95%CI = 1.06–1.67). Delivery by caesarean section was significantly associated with under-five mortality (Chad- HR = 1.32, 95%CI = 1.00–1.77; DR Congo- HR = 1.20, 95%CI = 1.01–1.43; Mali- HR = 1.42, 95%CI = 1.08–1.85; Niger- HR = 1.43, 95%CI = 1.06–1.92; Zimbabwe- HR = 1.49, 95%CI = 1.03–2.15).
Conclusion Despite concerted effort by government and several stakeholders in health to improve childhood survival, the rate of under-5 mortality is still high. Our findings provided evidence on the contribution of maternal age, place of residence, household wealth index, level of education, employment, marital status, religious background, birth type, birth order and interval, sex and size of child, place and mode of delivery, to Under-5 mortality rate in SSA. The position of prominent risk factors for under-five mortality should be addressed through effective design of timely and efficient intervention aimed at reducing childhood mortality.
Article
Background Under-five children born in the fragile and war-affected setting of South Sudan face a high risk of death, as reflected in the country's high under-five mortality. In South Sudan, health inequities and inequitable conditions of daily living play a significant role in childhood mortality. This study examines factors associated with under-five mortality in South Sudan. Methods The study population includes 8125 singleton, live-born, under-five children born in South Sudan within 5 years prior to the 2010 South Sudan Household Survey. Factors associated with neonatal, infant and under-five deaths were examined using generalised linear latent and mixed models with the logit link and binomial family that adjusted for cluster and survey weights. Results The multivariate analysis showed that mothers who reported a previous death of a child reported significantly higher risk of neonatal (adjusted OR (AOR)=3.74, 95% confidence interval (CI) (2.88 to 4.87), P<0.001), infant (AOR=3.19, 95% CI (2.62 to 3.88), P<0.001) and under-five deaths (AOR=3.07, 95% CI (2.58 to 3.64), P<0.001). Other associated factors included urban dwellers (AOR=1.37, 95% CI (1.01 to 1.87), P=0.045) for neonatal, (AOR=1.35, 95% CI (1.08 to 1.69), P=0.009) for infants and (AOR=1.39, 95% CI (1.13 to 1.71), P=0.002) for under-five death. Unimproved sources of drinking water were significantly associated with neonatal mortality (AOR=1.91, 95% CI (1.11 to 3.31), P=0.02). Conclusions This study suggested that the conditions and circumstances into which a child is born, and in which the child lives, play a role in under-five mortality, such as higher mortality among children born to teenage mothers. Ensuring equitable healthcare service delivery to all disadvantaged populations of children in both urban and rural areas is essential but remains a challenge, while violence continues in South Sudan.
Article
Kenya lags behind its East Africa partners in reducing childhood mortality. Childhood mortality can be prevented or reduced if women have access to quality care during conception, pregnancy, and in intra-partum and post-natal periods. In Kenya, few women use adequate antenatal care services and many still deliver at home. Using the Kenya Demographic and Health Survey data, this study investigates the effects of adequate use of antenatal and skilled delivery care services on neonatal and under-five mortality. Two-stage residual inclusion and control function approaches are used. The main finding is that adequate antenatal care and skilled assistance during delivery reduce neonatal and under-five mortality. Thus, policies that promote use of maternal health services such as increasing women's education and reducing average distances to health facilities should be implemented.
Article
Machine learning can be both experience-based and supervised learning. Machine learning helps in designing systems that can make decisions in a more optimized form and work in the most efficient way. In the field of machine learning one considers the important question of how to make machines able to “learn”. Learning in this context is understood as inductive inference, where one observes examples that represent incomplete information about some “statistical phenomenon”. Advanced economies are already using machine learning to solve problems like medical diagnosis, improving e-commerce conversion rates, traffic congestion, saving cows from bad drivers and improving healthcare, while Africa lags conspicuously behind. African leaders may be aware of this. Whether they possess the foresight to see and take people through is, however, debatable. This paper discusses how machine learning could be used to address Africa’s challenges by highlighting how some of the major challenges can be solved using certain machine learning techniques. Major challenges facing the African continent are identified and machine learning techniques that could address them are briefly highlighted. While it does have some frightening implications, machine learning applications are among the many ways this technology can improve our lives.