Available via license: CC BY-NC-ND 4.0
Received August 2, 2021, accepted August 24, 2021, date of publication August 27, 2021, date of current version September 3, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3108551
Associating Measles Vaccine Uptake Classification
and Its Underlying Factors Using an Ensemble of
Machine Learning Models
MD. KAMRUL HASAN 1, MD. TASNIM JAWAD 1, AISHWARIYA DUTTA 2, MD. ABDUL AWAL 3,
MD. AKHTARUL ISLAM 4, MEHEDI MASUD 5, (Senior Member, IEEE),
AND JEHAD F. AL-AMRI 6
1 Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
2 Department of Biomedical Engineering (BME), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
3 Electronics and Communication Engineering (ECE) Discipline, Khulna University (KU), Khulna 9208, Bangladesh
4 Statistics Discipline, Khulna University (KU), Khulna 9208, Bangladesh
5 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia
6 Department of Information Technology, College of Computer and Information Technology, Taif University, Taif 21994, Saudi Arabia
Corresponding author: Md. Abdul Awal (m.awal@ece.ku.ac.bd)
This work was supported by Taif University Researchers Supporting Project, Taif University, Taif, Saudi Arabia, under
Grant TURSP-2020/211.
This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was
granted by the ICF Institutional Review Board (ICF-IRB).
ABSTRACT Measles is a significant public health issue responsible for high mortality
around the globe, especially in developing countries. Using nationally representative demographic
and health survey data, measles vaccine utilization has been classified, and its underlying factors have been
identified, through an ensemble Machine Learning (ML) approach. First, missing values are imputed
using various approaches, and then several feature selection techniques are applied to identify the
crucial attributes for predicting measles vaccination. A grid search hyperparameter optimization technique
is applied to tune the critical hyperparameters of different ML models, such as Naive Bayes,
random forest, decision tree, XGBoost, and LightGBM. The classification performance of each optimized
ML model, as well as of their ensembles, is reported on our proposed BDHS dataset. Individually,
the optimized LightGBM provides the highest precision and AUC of 79.90 % and 77.80 %, respectively. This
result improves when the optimized LightGBM is ensembled with XGBoost, providing a precision and AUC
of 84.60 % and 80.0 %, respectively. Our results reveal that statistical median imputation combined with
XGBoost-based attribute selection and the LightGBM classifier provides the best individual result.
Performance improves further when the proposed weighted ensemble of XGBoost and LightGBM
is adopted with the same preprocessing, and this ensemble is recommended for measles vaccine utilization. The significance
of our proposed approach is that it uses a minimal set of attributes collected from the child and their family
members and yields 80.0 % accuracy, making it easily explainable by caregivers and healthcare personnel.
Finally, our predictive model provides an early detection procedure to help national policymakers enforce
new policies with specific rules and regulations. The data and source codes that support the findings of this
study are available at https://github.com/kamruleee51/measles_vaccine_uptake.
INDEX TERMS Attribute selection, measles vaccine uptake classification, measles BDHS data, missing
value imputation, weighted ensemble ML model.
The associate editor coordinating the review of this manuscript and approving it for publication was Emre Koyuncu.
I. INTRODUCTION
Measles is a highly contagious viral disease that is very
common in developing countries and is associated with
significant mortality and morbidity [1], [2]. Although this viral
disease is vaccine-preventable, measles remains a leading cause
of child death among vaccine-preventable diseases, and its case-fatality rate is up to 10.0 % [3]–[5].
This vaccine-preventable disease is a crucial public health
issue in sub-Saharan Africa and South-East Asia, including
VOLUME 9, 2021
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ 119613
M. K. Hasan et al.: Associating Measles Vaccine Uptake Classification and Its Underlying Factors
Bangladesh [6], [7]. Measles causes more than 0.1 million deaths every year, and in the first three months of 2019,
measles cases increased by 300.0 % compared with 2018 [1],
[8], [9]. To reduce measles and increase community-level
immunity, 95.0 % coverage with two doses of measles vaccine is crucial.
Such coverage would reduce measles-related mortality and lead to elimination [10]–[13]. The World Health Organization (WHO) (2017) defines
measles elimination as ‘‘The interruption of measles transmission in a defined geographical area that has lasted at least 12 months and is verified
after it has been sustained for at least 36 months’’ [14].
In Bangladesh, measles vaccination coverage was 88.0 %
among children below the age of one year, and the Fourth
Health, Population and Nutrition Sector Program (HPNSP)
sets a goal of 90.0 % coverage by 2022 [15]. A crucial step
toward increasing the vaccination rate is to recognize the
factors that influence the utilization of measles
vaccination [16], [17]. Existing literature has revealed several
such factors related to measles vaccination uptake [6],
[17]–[19].
This study focuses on recognizing the factors contributing
to the non-utilization of measles vaccination among children
in Bangladesh. We have employed Machine Learning (ML)
techniques on four consecutive Bangladesh Demographic and
Health Survey (BDHS) datasets from 2007 to 2017-18.
Compared with other methods frequently applied to variable selection problems,
ML may accelerate the recognition of appropriate features related to the non-utilization of
the measles vaccine. It may also improve
the prediction accuracy of the final classification
model [20]. For example, the authors in [21] used
the Synthetic Minority Over-Sampling Technique (SMOTE)
to address class imbalance
and obtained a true-positive rate of 93.90 %, with
false-positive and false-negative rates of 5.80 % and
5.10 %, respectively. To evaluate the factors
that place individuals at a higher risk of measles, the authors in [22]
utilized ML techniques and found that contact with measles
patients, age, rhinorrhea, vaccination, male sex, cough,
conjunctivitis, ethnicity, and fever were the crucial factors
associated with measles disease. The authors
in [23] adopted a LASSO (Least Absolute Shrinkage
and Selection Operator) logistic regression model on
electronic health records to identify measles vaccine-resistant
families and obtained 72.0 % precision, using 25
features based on the history of the child and their family
members. The authors in [24] employed an ML approach based
on area-level features to predict vaccine hesitancy for
a broad range of vaccine-preventable diseases, including
measles. They found that random forest
outperformed the gradient boosting
machine, LASSO, and neural network. The authors in [25]
explored and identified associated features to predict measles
non-vaccination from the Philippine National Demographic
and Health Survey data. They employed an Elastic Net ML
model using 32 relevant attributes comprising geographic
location, socioeconomic condition, and features related to
children and family information, and obtained
an accuracy, sensitivity, and specificity of 79.02 %, 97.73 %,
and 23.41 %, respectively. The review article in [26]
explored the usefulness of data mining and ML
approaches for studying the clinical significance of measles and
for its prediction. The authors in [27] applied a multiple linear regression model
and found that the factors associated with measles
uptake were parenting and knowledge, nutritional status, and
behavior. The authors of [28] applied a logistic regression
model to find the association between socioeconomic
characteristics and measles uptake and revealed that measles
vaccine utilization rates are highly socially determined.
A study in Germany [29] illustrated a positive relationship between child
daycare centers, maternal and paternal education, and
measles vaccine uptake. Finally, a systematic review of primary studies
in [30] discovered that community
health, peer judgment, confidence in experts and vaccines,
responsibility toward children, and measles severity are
strongly associated with measles, mumps, and rubella vaccine uptake. Research on measles and
its vaccine using ML approaches remains limited. To the
best of our knowledge, this article, using our proposed BDHS data, is the first such attempt in Bangladesh. The significant
contributions and key topics covered by this article are as
follows:
• Proposing nationally representative demographic and
health survey measles data from Bangladesh, called the
BDHS dataset.
• Developing a framework for linking measles vaccine
uptake classification and its underlying factors.
• Incorporating integral preprocessing steps, namely
missing value imputation and attribute selection strategies.
• Optimizing the hyperparameters of different ML-based
models and proposing a weighted ensemble ML model
for the target task of this article.
• Conducting complete ablation studies on the preprocessing
and classifier selection to recommend the best possible framework for measles vaccine
utilization.
The remaining sections of the article are arranged as follows: Section II describes the proposed BDHS dataset
and framework. Section III presents the results
of extensive experiments along with their possible
explanations. Finally, Section IV concludes the article
with future research directions.
II. MATERIALS AND METHODS
This section describes the materials and methodologies of the article. Section II-A presents the proposed
datasets, which were collected in Bangladesh. Section II-B
explains the proposed framework, incorporating missing
value imputation (see Section II-B1), attribute selection
(see Section II-B2), and different ML classifiers with
the proposed ensemble classifier (see Section II-B3).
Section II-B3 also describes the hyperparameter optimization for different ML models. Finally, Section II-B4 defines the
evaluation indices used in the experiments.
A. PROPOSED DATASETS
1) CLINICAL INTERPRETATION OF MEASLES
Measles is caused by an RNA respiratory virus of the genus
Morbillivirus in the family Paramyxoviridae [31]–[33].
According to the WHO clinical case definition, a measles case
is any individual with fever and a generalized maculopapular rash, together with cough, coryza, or conjunctivitis
[31], [34], [35]. Sometimes, tiny white spots on the buccal mucosa,
called Koplik spots, can be observed in measles
[31], [34]. The symptoms of measles, namely fever (which may be as high as 40 °C), cough,
conjunctivitis, and rash, are similar
to those of other seasonal respiratory infections [28],
[36]. This similarity of symptoms may explain why measles cases can increase
rapidly and spread quickly through close contact
and routine interaction in public places
[36], [37]. The measles virus infects individuals through
respiratory droplets produced by sneezing or coughing,
or through direct contact. These tiny droplets and small-particle
aerosols can remain in the air for prolonged periods,
and an infected person typically remains contagious until four days after the
rash appears [31]–[33]. Therefore, vaccine utilization to
prevent measles is crucial for building herd immunity in the
community.
2) DATA SOURCES AND VARIABLES
This study used four consecutive nationally representative
Demographic and Health Surveys of Bangladesh, conducted
in 2007, 2011, 2014, and 2017-18 [15], [38]–[40].
These datasets were collected under the authority of the National Institute
of Population Research and Training (NIPORT)
of the Ministry of Health and Family Welfare (MOHFW).
A Bangladeshi research organization, Mitra and Associates,
implemented the survey using a two-stage stratified
cluster sampling technique. In the first stage, the country was
divided into enumeration areas (EAs), from which a sample was selected. In
the second stage, households were selected within each selected EA.
For instance, in the 2017-18 survey, a list
of 675 EAs was established in the
first stage, with 250 in urban and 425 in rural areas.
In the second stage, an average of 30 households was selected
per EA. The BDHS 2017-18 was conducted using five
types of questionnaires. In this study, we used data from
the woman’s questionnaire. This questionnaire was based on
the model questionnaires developed for the worldwide DHS-7
Program. It was adjusted to the circumstances and requirements of
Bangladesh and took into account the content of the instruments
employed in earlier DHS surveys in Bangladesh [15].
Our question of interest concerned children’s immunizations. During this survey, women were asked questions
regarding their socioeconomic characteristics (for instance,
age, education, religion, and media exposure), reproductive
history, knowledge of uses and sources of family planning
methods, antenatal, delivery, postnatal, and newborn care,
husbands’ background, etc. [15]. Note that we used
publicly accessible, de-identified secondary datasets in this study. These data were collected following
the ethical procedures documented on the DHS website
(https://dhsprogram.com/) and are now published at Harvard
Dataverse [41]. Therefore, this study did not seek a separate ethical review
endorsement.
Dependent Variable: We consider measles vaccine uptake as the dependent
variable, with two categories. Children who received the
measles vaccine were categorized as ‘‘Yes’’, and those who
did not were categorized as ‘‘No’’.
Fig. 1 shows the prevalence rate of
measles vaccine uptake in the different divisions of Bangladesh. Children
from the Barisal division recorded the lowest prevalence
(59.67 %) of measles uptake, whereas children from the
Rajshahi division showed the highest prevalence (67.91 %).
In the Dhaka and Khulna divisions, the rates were
65.21 % and 63.91 %, respectively.
Independent Variable: Table 1 lists the different
independent variables, which are classified as categorical and
continuous attributes.
B. PROPOSED METHODOLOGIES
Fig. 2 displays our proposed framework for Measles
Disease Classification (MDC). The framework incorporates two
crucial preprocessing steps: missing value imputation
and attribute selection. We apply different imputation and
attribute selection techniques to perform complete ablation
studies on the proposed BDHS datasets. After preprocessing, the BDHS datasets
are partitioned into $K$ folds.
$K-1$ folds are used for training and for
tuning the hyperparameters in an inner loop with the
grid search algorithm [42]. In the outer loop (repeated $K$ times),
the classifier with the best hyperparameters is evaluated on the unseen test fold. Since the
proposed BDHS datasets contain imbalanced class samples,
stratified $K$-fold cross-validation [43] is adopted to
preserve the original class ratio in each fold. After training,
all the ML models are evaluated on the unseen test data. The resulting prediction
probabilities ($P_i, \forall i \in \{1, \dots, N\}$, where $N$ is the number of candidate
classifiers for ensembling) are then used to build an ensemble
classifier for the MDC. The following sections describe the
integrated parts of the proposed framework in Fig. 2.
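The outer and inner splitting described above can be sketched with the Python standard library alone. This is a minimal illustration under our own assumptions, not the authors' released code. `stratified_k_fold` is a hypothetical helper. It deals each class's shuffled indices round-robin across the $K$ folds, so every test fold keeps roughly the overall class ratio.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Hypothetical helper: split sample indices into k (train, test) pairs
    whose test folds preserve the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold receives about n_c / k
        # samples of class c, keeping the overall class ratio.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


# Nested use: the outer split is for evaluation; the inner split, built on the
# outer training labels only, would drive the grid search.
labels = [1] * 60 + [0] * 40
for train, test in stratified_k_fold(labels, k=5):
    inner = stratified_k_fold([labels[i] for i in train], k=3)
```

The inner indices refer to positions within the outer training set, so they must be mapped back through `train` before use.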
1) MISSING VALUE IMPUTATION
Real-world datasets often include missing values,
encoded as NaNs, blanks, undefined, null, or other
placeholders, for various reasons [44]. There are many
methods for handling missing values,
such as case deletion (Raw), missing data imputation, and
model-based prediction [45]. The last of these, namely
FIGURE 1. The prevalence rate of measles uptake in the different divisions of Bangladesh in the proposed BDHS dataset. Heatmap intensity
encodes higher to lower prevalence rates.
model-based prediction, suffers from several complications:
it fails on complex, blended patterns and
needs a long time to converge [46]. Therefore,
this article integrates statistical imputation methods, namely Median and
Mode, into the proposed framework in Fig. 2, as they are simple and fast [46]. The steps
applied for Filling Missing Values (FMV) are presented
in Algorithm 1.
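Algorithm 1 The Steps for Achieving the Applied FMV
Technique
Input: The n-dimensional uncurated data, $X_{in} \in \mathbb{R}^n$, and
outcome, $Y \in \{0, 1\}$.
Output: The n-dimensional curated data, $X_{out} \in \mathbb{R}^n$
1. Estimate the class-wise median or mode as $M_{C_i}, \forall i \in \{0, 1\}$
2. Impute the missing values as
$$X_{out}(x) = \begin{cases} M_{C_i}, & \text{if } x \text{ is missing and its sample belongs to class } i \in \{0, 1\}, \\ x, & \text{otherwise}, \end{cases}$$
where $x \in X_{in}$ is an observation of $X_{in}$ lying in the
n-dimensional attribute space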
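Algorithm 1 can be illustrated for a single attribute column with a short Python sketch. As our own convention, it assumes that missing entries are encoded as `None` and that a label is available for every sample. `impute_by_class` is a hypothetical name.

```python
from statistics import median, mode


def impute_by_class(column, labels, strategy="median"):
    """Replace missing entries (None) of one attribute with the median or mode
    of the observed values that share the sample's class label."""
    stat = median if strategy == "median" else mode
    fill = {}
    for c in set(labels):
        observed = [x for x, y in zip(column, labels) if y == c and x is not None]
        fill[c] = stat(observed)  # raises StatisticsError if a class has no observed value
    return [fill[y] if x is None else x for x, y in zip(column, labels)]


print(impute_by_class([1, None, 3, 10, None, 30], [0, 0, 0, 1, 1, 1]))
```

The fill value depends on the class label, so this class-wise variant needs labels wherever it is applied. An unlabeled sample would instead need a label-free statistic, such as the overall training median.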
2) ATTRIBUTE SELECTION
The accuracy of ML models may initially increase as more attributes are added.
Beyond a certain point, however, the curse of
dimensionality sets in, and performance decreases as the
dimension grows. When the number of attributes increases without increasing the
number of samples, the attribute space becomes
sparser. This pushes the ML models to overfit and lose generalization
capacity [43]. Additionally, constructing models from
datasets with many attributes is more computationally
demanding [47]. Therefore, it is essential to incorporate
attribute reduction techniques into a classification framework
to build a generic ML model. Among supervised, semi-supervised, and unsupervised
Attribute Selection (AS) techniques, supervised methods usually perform best [43], [48]. This article applies the four most
commonly employed supervised AS methods to reduce
attribute redundancy: Fisher Score (FS) [49], RF [50],
LGB [51], and XGB [52]. We use them to conduct ablation studies
on our BDHS datasets, and they are briefly detailed in the
following paragraphs.
a: FS ATTRIBUTE SELECTOR
The core intention of the FS is to find a subset of attributes
such that the distances between data points in different classes are
as large as possible, while the distances between data
points in the same class are as small as possible [49]. The
steps of the FS scheme for AS are given in
Algorithm 2.
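The Fisher score of Algorithm 2 can be computed per attribute with a few lines of standard-library Python. This is a sketch under our own assumptions: attributes are passed column-wise as plain lists, and population variances are used for $(\sigma_k^j)^2$. `fisher_score` and `select_top_m` are hypothetical names, not the authors' code.

```python
from statistics import mean, pvariance


def fisher_score(feature, labels):
    """F(x^j) = sum_k n_k (mu_k - mu)^2 / sum_k n_k sigma_k^2 for one attribute."""
    mu = mean(feature)
    num = den = 0.0
    for c in set(labels):
        vals = [x for x, y in zip(feature, labels) if y == c]
        num += len(vals) * (mean(vals) - mu) ** 2  # between-class spread
        den += len(vals) * pvariance(vals)         # within-class spread
    # Zero within-class spread: treat the attribute as maximally discriminative.
    return float("inf") if den == 0 else num / den


def select_top_m(columns, labels, m):
    """Indices of the m attributes with the largest Fisher scores."""
    scores = [fisher_score(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:m]
```

For instance, the attribute `[1, 2, 3, 7, 8, 9]` with labels `[0, 0, 0, 1, 1, 1]` scores 13.5, whereas an attribute whose class means coincide scores 0.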
b: RF ATTRIBUTE SELECTOR
RF, a tree-based strategy, is employed for AS in our
framework in Fig. 2, as it directly ranks the attributes by
how much they improve node purity, that is, decrease
impurity, over all trees. Nodes with the greatest
reduction in impurity occur at the start of the trees, while
nodes with a small drop in impurity occur at the end of
the trees. Thus, by pruning the trees below a particular
node, a subset of the essential attributes can be selected.
TABLE 1. Description of the independent attributes (categorical and continuous) utilized in this research. A χ2-test is used for categorical attributes to
assess their relationship with the dependent variable, measles uptake, whereas the mean ± std is used to describe continuous variables. The
respondent is the mother of the child whose vaccine utilization is considered.
FIGURE 2. The complete workflow of the study, where the training dataset is further divided to perform grid search optimization for finding
the best hyperparameters of the ML models.
The applied steps for the RF-based AS are displayed
in Algorithm 3.
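Algorithm 3 perturbs the OOB observations of each tree, which requires access to a forest's internals. As a simpler, model-agnostic stand-in, the sketch below implements permutation importance: the mean drop in accuracy when one attribute's values are shuffled. This is a related but different technique, not the exact OOB procedure of Algorithm 3. `predict` stands for any hypothetical fitted classifier.

```python
import random


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when one attribute's column is shuffled.
    `predict` maps one row (a list of attribute values) to a class label."""
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    scores = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between attribute j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(permuted))
        scores.append(sum(drops) / n_repeats)
    return scores


def top_m(scores, m):
    """Indices of the m attributes with the largest importance scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]
```

As in step 6 of Algorithm 3, the top-m attributes by score would then be kept.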
c: LGB AND XGB ATTRIBUTE SELECTORS
The feature importances obtained from LGB and XGB
are likely to be more accurate, as these models are considerably more reliable
than linear models [52]. Both models use regularized
learning and cache-aware block-structure tree learning for
ensemble learning. The gain represents the improvement contributed by each tree split,
and averaging the gains of the splits on an attribute gives
its final feature importance score. Finally, the
top-m ranked features are selected by their gain (see
Algorithms 7 and 8).
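The average-gain ranking can be illustrated once the per-split gains are known. The sketch below assumes a hypothetical list of `(feature_index, gain)` records collected from all trees. A trained model would supply these itself; XGBoost, for example, reports an average gain per feature through `Booster.get_score(importance_type="gain")`.

```python
from collections import defaultdict


def average_gain_importance(splits):
    """splits: (feature_index, gain) pairs, one per split in all trees.
    Returns {feature_index: mean gain over the splits that use that feature}."""
    total, count = defaultdict(float), defaultdict(int)
    for feature, gain in splits:
        total[feature] += gain
        count[feature] += 1
    return {f: total[f] / count[f] for f in total}


def top_m_by_gain(splits, m):
    """Indices of the m features with the highest average gain."""
    importance = average_gain_importance(splits)
    return sorted(importance, key=importance.get, reverse=True)[:m]
```

With the toy records `[(0, 4.0), (1, 1.0), (0, 2.0), (2, 5.0), (1, 3.0)]`, the average gains are 3.0, 2.0, and 5.0, so the top two features are 2 and 0.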
3) CLASSIFIERS AND HYPERPARAMETER OPTIMIZATION
Different ML classifiers, namely Gaussian Naive Bayes
(GNB), Bernoulli Naive Bayes (BNB), Decision Tree (DT),
Random Forest (RF), XGBoost (XGB), and LightGBM (LGB),
are trained and evaluated for measles classification
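Algorithm 2 The Implementing Steps for the Involved FS
Technique
Input: The d-dimensional data, $X_{in} \in \mathbb{R}^{n \times d}$, and
outcome, $Y \in \{0, 1\}$.
Output: The reduced m-dimensional data, $X_{out} \in \mathbb{R}^{n \times m}$,
where $m < d$
1. Estimate the Fisher score ($F$) of the $j$th feature $x^j \in \mathbb{R}^{1 \times n}$ of $X_{in}$ as
$$F(x^j) = \frac{\sum_{k=1}^{c} n_k \times (\mu_k^j - \mu^j)^2}{(\sigma^j)^2},$$
where $(\sigma^j)^2 = \sum_{k=1}^{c} n_k \times (\sigma_k^j)^2$, $c$ is the number of classes ($c = 2$ here), and $n_k$ is the number of samples in the $k$th class. The $k$th-class mean and standard deviation
of the $j$th feature are $\mu_k^j$ and $\sigma_k^j$, while $\mu^j$ and $\sigma^j$ denote the
mean and standard deviation of the whole data set
corresponding to the $j$th feature.
2. Select the top-m ranked features with the largest scores ($F$)
and store them in $X_{out}$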
Algorithm 3 The Steps for Implementing the Applied RF
Technique
Input: The d-dimensional data, $X_{in} \in \mathbb{R}^{n \times d}$, and
outcome, $Y \in \{0, 1\}$.
Output: The reduced m-dimensional data, $X_{out} \in \mathbb{R}^{n \times m}$,
where $m < d$
1. Compute the Out-of-Bag (OOB) error of a tree.
2. If the parent node $k$ is split on attribute $X$, randomly assign each observation to the child
nodes with probability $\hat{P}_k$, where $\hat{P}_k$ is the
relative frequency of observations that initially went in
the same direction of the tree.
3. Recompute the OOB error of the tree (following step 2).
4. Compute the difference between the original and
recomputed OOB errors.
5. Repeat steps 1–4 for each tree and use the average
difference over all trees as the overall importance score
($F$).
6. Select the top-m ranked features with the largest scores ($F$)
and store them in $X_{out}$.
in the proposed framework. The following paragraphs
describe the algorithmic steps of these ML
classifiers.
a: GNB & BNB CLASSIFIER
Naive Bayes methods are supervised learning algorithms
based on applying Bayes’ theorem with the assumption of
conditional independence between every pair of attributes
given the value of the class variable. We employ two
variants of this classifier, GNB and BNB. The former
uses a Gaussian function as the likelihood of the
attributes, whereas the latter uses multivariate
Bernoulli distributions. The steps for implementing these
two Bayesian classifiers are illustrated in Algorithm 4.
Algorithm 4 The Steps of Implementing GNB & BNB
Classifiers
Input: The d-dimensional data $X \in \mathbb{R}^{n \times d}$ with $n$
samples, and target $Y \in \mathbb{R}^{n \times 1}$
Output: The posterior probability $P \in [0, 1]$ of the unseen
test set $x$, satisfying $\sum_{i=1}^{C} P_i = 1$, where $C = 2$ is the number of classes
1. Compute the prior as $P(Y = C_j) = \frac{n_j}{n}, \forall j \in \{1, \dots, C\}$, where $n_j$ is
the number of samples in the $j$th class
2. Estimate the output posterior probability as
$$P(C_j \mid X) = \frac{P(X \mid C_j) \times P(Y = C_j)}{P(X)},$$
where $P(X \mid C_j)$ is the
likelihood of the predictor for a given class ($\forall j \in \{1, \dots, C\}$)
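The Gaussian variant of Algorithm 4 can be sketched in standard-library Python. Only GNB is shown, and the names `fit_gnb` and `predict_proba_gnb` are hypothetical. The small `eps` added to each variance is our own assumption, to avoid division by zero. Log-probabilities are used and normalized at the end, which plays the role of dividing by $P(X)$.

```python
import math
from statistics import mean, pvariance


def fit_gnb(X, y, eps=1e-9):
    """Estimate each class prior and the per-class Gaussian mean/variance of every attribute."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))  # one tuple per attribute
        model[c] = (
            len(rows) / len(y),                      # prior P(Y = c) = n_c / n
            [mean(col) for col in cols],
            [pvariance(col) + eps for col in cols],  # eps guards against zero variance
        )
    return model


def predict_proba_gnb(model, x):
    """Posterior P(c | x) from Bayes' theorem under conditional independence."""
    log_joint = {}
    for c, (prior, mus, variances) in model.items():
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - mu) ** 2 / (2 * v)
            for xi, mu, v in zip(x, mus, variances)
        )
        log_joint[c] = math.log(prior) + log_lik
    top = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_joint.items()}
    evidence = sum(unnorm.values())  # proportional to P(X)
    return {c: p / evidence for c, p in unnorm.items()}
```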
b: RF CLASSIFIER
RF models apply the bagging method to the individual trees in
the ensemble. They repeatedly draw a random sample
with replacement from the training set and fit trees to these
samples. The number of trees in the ensemble is a free
parameter that can be readily tuned using the
out-of-bag error. The algorithmic steps for developing the RF
classifier are defined in Algorithm 5.
Algorithm 5 The Steps of Implementing RF Classifier
Input: The d-dimensional data $X \in \mathbb{R}^{n \times d}$ with $n$
samples, and target $Y \in \mathbb{R}^{n \times 1}$
Output: The posterior probability $P \in [0, 1]$ of the unseen
test set $x$, satisfying $\sum_{i=1}^{C} P_i = 1$, where $C = 2$ is the number of classes
1. for $b = 1, \dots, N$ (n_Bagging) do
2. Draw a bootstrap sample $(X_b, Y_b)$ from the given $(X \in \mathbb{R}^{n \times d}, Y \in \mathbb{R}^{n \times 1})$
3. Grow a random-forest tree $T_b$ on $X_b$ and $Y_b$ by
recursively repeating the following steps until
the minimum node size $n_{min}$ is reached:
1) Randomly select $m$ variables from the given $d$
variables
2) Pick the best variable or split-point among the $m$
variables
3) Split the node into two daughter nodes
Output the ensemble of trees $\{T_b\}_1^N$
4. The posterior probability is $\hat{P}^N_{RF}(x) = \text{Voting}\{\hat{P}_k(x)\}_1^N$,
where $\hat{P}_k(x)$ is the class prediction of the $k$th tree.
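The bagging and voting structure of Algorithm 5 can be sketched compactly if each tree is simplified to a depth-1 tree (a stump). This simplification is our own assumption for brevity; the algorithm itself grows each tree until the minimum node size $n_{min}$ is reached. `fit_forest` and the other names are hypothetical. Each stump sees a bootstrap sample and $m$ randomly chosen attributes, and the posterior is the fraction of votes per class.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-1 tree: best (feature, threshold) among `features` by weighted Gini,
    returned as (feature, threshold, left_class, right_class)."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [c for row, c in zip(X, y) if row[j] <= t]
            right = [c for row, c in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            g = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or g < best[0]:
                best = (g, j, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        major = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), major, major)
    return best[1:]


def fit_forest(X, y, n_bagging=25, m=1, seed=0):
    """Bagging: each stump sees a bootstrap sample and m random attributes."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_bagging):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        feats = rng.sample(range(d), m)             # random attribute subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba_forest(forest, x):
    """Posterior estimate: fraction of trees voting for each class."""
    votes = Counter(left if x[j] <= t else right for j, t, left, right in forest)
    return {c: v / len(forest) for c, v in votes.items()}
```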
c: DT CLASSIFIER
DT builds classification models in a tree structure, breaking
down a data set into smaller and smaller subsets. The final
result is a tree with decision nodes and leaf nodes, where
a decision node has two or more branches and a leaf node
represents a classification or decision. The topmost decision
node in a tree, which corresponds to the best predictor, is called the root
node. Algorithm 6 explains the steps of a DT model.
Algorithm 6 The Steps of Implementing DT Classifier
Input: The d-dimensional data $X \in \mathbb{R}^{n \times d}$ with $n$
samples, and target $Y \in \mathbb{R}^{n \times 1}$
Output: The posterior probability $P \in [0, 1]$ of the unseen
test set $x$, satisfying $\sum_{i=1}^{C} P_i = 1$, where $C = 2$ is the number of classes
1. Split $\theta = (j, t_m)$ into $Q_{left}(\theta)$ and $Q_{right}(\theta)$ subsets,
where $\theta$ consists of a feature $j$ and a threshold $t_m$
2. Compute the impurity at the $m$th node using an impurity
function ($H$),
$$G(Q, \theta) = \frac{n_{left}}{N_m} H(Q_{left}(\theta)) + \frac{n_{right}}{N_m} H(Q_{right}(\theta)),$$
where
$$H = \sum_{C} p_{mC} \times (1 - p_{mC}) \quad \text{or} \quad H = -\sum_{C} p_{mC} \times \log(p_{mC}), \qquad p_{mC} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = C)$$
3. Minimize the impurity by selecting the parameters
$\theta^* = \arg\min_\theta G(Q, \theta)$
4. Repeat the above processes for the subsets $Q_{left}(\theta^*)$ and
$Q_{right}(\theta^*)$ until $N_m < \text{min\_samples}$ or
$N_m = 1$
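Steps 1–3 of Algorithm 6 for a single node can be sketched directly from the formulas: both impurity functions $H$, the weighted impurity $G(Q, \theta)$, and the search for $\theta^*$ over every feature and observed threshold. The names are hypothetical, and the recursion of step 4 is omitted for brevity. Ties in $G$ are resolved in favor of the first candidate found.

```python
import math
from collections import Counter


def gini(labels):
    """H = sum_C p_mC (1 - p_mC)."""
    n = len(labels)
    return sum((k / n) * (1 - k / n) for k in Counter(labels).values())


def entropy(labels):
    """H = -sum_C p_mC log(p_mC)."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def split_impurity(X, y, j, t, H=gini):
    """G(Q, theta) for theta = (j, t): size-weighted impurity of the two subsets."""
    left = [c for row, c in zip(X, y) if row[j] <= t]
    right = [c for row, c in zip(X, y) if row[j] > t]
    n_m = len(y)
    return sum(len(q) / n_m * H(q) for q in (left, right) if q)


def best_split(X, y, H=gini):
    """theta* = argmin_theta G(Q, theta) over all features and observed thresholds."""
    candidates = ((j, t) for j in range(len(X[0])) for t in sorted({row[j] for row in X}))
    return min(candidates, key=lambda th: split_impurity(X, y, *th, H=H))
```

For `X = [[1, 5], [2, 5], [3, 6], [4, 6]]` with labels `[0, 0, 1, 1]`, both impurity functions select $\theta^* = (0, 2)$, a pure split with $G = 0$.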
d: XGB CLASSIFIER
XGB falls under the category of boosting techniques in
ensemble learning, which combine multiple models to improve prediction
accuracy. In this boosting technique, the errors made
by previous models are corrected by succeeding models, which are
added to the ensemble with appropriate weights. The steps for the XGB
classifier implementation are given in Algorithm 7.
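The generic gradient boosting loop of Algorithm 7 can be illustrated for the binary log loss on a single attribute. This is a simplified sketch, not XGBoost itself, and it rests on several assumptions of our own:
- the base learner is a least-squares regression stump;
- a fixed learning rate replaces the line search for $\gamma_m$;
- $F$ is interpreted as the log-odds of class 1.

For this loss, $F_0$ is the log-odds of the class-1 rate, and the pseudo-residuals are $y_i - \sigma(F(x_i))$. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regression_stump(x, r):
    """Least-squares stump on one attribute: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(x))[:-1]:  # keep both sides non-empty (needs >= 2 distinct values)
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def gradient_boost(x, y, n_iterations=50, learning_rate=0.3):
    """Binary log-loss gradient boosting; F(x) is the log-odds of class 1."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # F_0: the constant minimizing the total log loss
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_iterations):
        # Pseudo-residuals r_i = -dL/dF = y_i - sigmoid(F(x_i)) for the log loss.
        r = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]
        t, lv, rv = fit_regression_stump(x, r)
        stumps.append((t, lv, rv))
        # Fixed shrinkage stands in for the line search over gamma_m.
        F = [Fi + learning_rate * (lv if xi <= t else rv) for xi, Fi in zip(x, F)]
    return f0, learning_rate, stumps


def predict_proba(model, xi):
    f0, learning_rate, stumps = model
    F = f0 + sum(learning_rate * (lv if xi <= t else rv) for t, lv, rv in stumps)
    return sigmoid(F)  # posterior P(y = 1 | x)
```

On the toy data `x = [1, 2, 3, 4, 5, 6]` with labels `[0, 0, 0, 1, 1, 1]`, the fitted model assigns a probability well above 0.5 to `x = 6` and well below 0.5 to `x = 1`.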
e: LGB CLASSIFIER
LGB is also a gradient boosting framework built on decision
tree algorithms. It applies two techniques, Gradient-Based
One-Side Sampling (GOSS) and Exclusive Feature
Bundling (EFB), and benefits from both leaf-wise and level-wise
strategies. These techniques accelerate the
training process of LGB [54], [55]. Algorithm 8 describes the steps
of the LGB classifier.
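The GOSS resampling in step 4 of Algorithm 8 is short enough to sketch on its own. The sketch:
- keeps the top $a$ fraction of samples by absolute gradient;
- randomly picks a $b$ fraction of the remainder;
- attaches the $(1-a)/b$ factor that the gain estimate in step 5 applies to the small-gradient samples.

`goss_sample` is a hypothetical helper, not LightGBM's internal API, and it assumes $b \le 1 - a$.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Gradient-based One-Side Sampling: return (indices, weights) where the
    large-gradient samples get weight 1 and the randomly picked small-gradient
    samples get weight (1 - a) / b to keep gradient sums roughly unbiased."""
    n = len(gradients)
    top_n, rand_n = int(a * n), int(b * n)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    A = order[:top_n]                                      # large-gradient samples
    B = random.Random(seed).sample(order[top_n:], rand_n)  # random small-gradient samples
    weights = [1.0] * len(A) + [(1 - a) / b] * len(B)
    return A + B, weights
```

With ten gradients `[0.0, 0.1, ..., 0.9]` and the defaults, the two largest (indices 9 and 8) are always kept, one further index is drawn at random, and that one is up-weighted by 8.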
f: ENSEMBLE CLASSIFIER
The six ML models described earlier are
employed for the ensemble models. Ensembling can boost the
performance of ML-based classifiers [43], [56] and has been shown to
outperform individual models in many applications, such as pneumonia and
diabetic retinopathy classification [57], [58]. In ensembling
approaches, aggregating the outputs of different
models can improve the precision of measles vaccine uptake
prediction. The output of each model, $P_j \in \mathbb{R}^C, \forall j \in \{1, 2, \dots, m = 6\}$ ($m$ is the number of classifiers), assigns
$C = 2$ confidence values $y_i \in [0, 1]$ $(i = 1, 2)$ to the unseen
test data, where $\sum_{i=1}^{C} y_i = 1$. The weighted
Algorithm 7 The Steps of Implementing XGB Classifier
Input: The d-dimensional data $X \in \mathbb{R}^{n \times d}$ with $n$
samples, and target $Y \in \mathbb{R}^{n \times 1}$
Output: The posterior probability $P \in [0, 1]$ of the unseen
test set $x$, satisfying $\sum_{i=1}^{C} P_i = 1$, where $C = 2$ is the number of classes
1. Initialize the model with a constant value:
$F_0(x) = \arg\min_\gamma \sum_{i=1}^{N} L(Y_i, \gamma)$ [53], where $L(Y, F(x))$ is
the differentiable loss function and $N$ is the number of
samples
2. for $m = 1, \dots, M$ (n_Iterations) do
3. Compute the pseudo-residuals $r_{im} = -\left[\frac{\partial L(Y_i, F(X_i))}{\partial F(X_i)}\right]_{F = F_{m-1}}$,
where $i = 1, 2, \dots, N$
4. Fit a base tree $h_m$ using the training set $\{(X_i, r_{im})\}_{i=1}^{N}$
5. Compute the multiplier $\gamma_m$ by
$\gamma_m = \arg\min_\gamma \sum_{i=1}^{N} L(Y_i, F_{m-1}(X_i) + \gamma h_m(X_i))$
6. Update the model by $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$
7. $F_M(x)$ yields the desired posterior probability, $P \in [0, 1]$
aggregation of various ML models was conducted employing
the equation as in (1).
$$P^{en}_i = \frac{\sum_{j=1}^{m=6} (W_j \times P_{ij})}{\sum_{i=1}^{C=2} \sum_{j=1}^{m=6} (W_j \times P_{ij})}, \qquad (1)$$
where the weight $W_j$ is the $j$th classifier’s AUC. We choose
the AUC as the weight for the proposed ensemble classifier
because weighted soft voting requires a class-unbiased metric as the weight.
The output of the ensemble model, $Y \in \mathbb{R}^C$, holds the confidence
values $P^{en}_i \in [0, 1]$. The final class label of an unseen sample
of our BDHS datasets, $X \in \mathbb{R}^n$, from the ensemble model is
$C_i$ if $P^{en}_i = \max(Y(X))$.
g: HYPERPARAMETER OPTIMIZATION
The performance of ML algorithms depends critically
on identifying a good set of hyperparameters, as those
algorithms are sensitive to many hyperparameters [43],
[59], [60]. Grid search [42] is the most basic
method. The user specifies a finite set of values
for each hyperparameter, and the grid search evaluates the
Cartesian product of these sets [60]. Let
$P = (p_1, p_2, \dots, p_m)$ be the space of problem parameters
over which we maximize a performance metric. A simple way to
set up a grid search consists of defining a vector of lower
bounds $L = (l_1, l_2, \dots, l_m)$ and a vector of upper bounds
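Algorithm 8 The Steps of Implementing LGB Classifier
Input: The d-dimensional data $X \in \mathbb{R}^{n \times d}$ with $n$
samples, and target $Y \in \mathbb{R}^{n \times 1}$
Output: The posterior probability $P \in [0, 1]$ of the unseen
test set $x$, satisfying $\sum_{i=1}^{C} P_i = 1$, where $C = 2$ is the number of classes
1. Combine the mutually exclusive features of $X \in \mathbb{R}^{n \times d}$ by
the exclusive feature bundling technique and set
$\theta_0(x) = \arg\min_C \sum_{i=1}^{n} L(Y_i, C)$
2. for $m = 1, \dots, M$ (no_Iteration) do
3. Calculate the absolute gradient values as
$r_i = \left| \frac{\partial L(y_i, \theta(x_i))}{\partial \theta(x_i)} \right|_{\theta(x) = \theta_{m-1}(x)}, \forall i \in \{1, \dots, n\}$
4. Resample the data set using the GOSS process as
top_n = a × len(X), rand_n = b × len(X),
sorted = GetSortedIndices(abs(r)),
A = sorted[1 : top_n],
B = RandomPick(sorted[top_n : len(X)], rand_n),
and $\hat{X} = A + B$, where $a$ and $b$ are the large- and small-gradient
data sampling ratios, respectively.
5. Estimate the information gain as
$$V_j(d) = \frac{1}{n}\left( \frac{\left(\sum_{x_i \in A_l} r_i + \frac{1-a}{b} \sum_{x_i \in B_l} r_i\right)^2}{n_l^j(d)} + \frac{\left(\sum_{x_i \in A_r} r_i + \frac{1-a}{b} \sum_{x_i \in B_r} r_i\right)^2}{n_r^j(d)} \right)$$
6. Build a new decision tree $\hat{\theta}_m(x)$ on the set $\hat{X}$
7. Update $\theta_m(x) = \theta_{m-1}(x) + \hat{\theta}_m(x)$
8. Finally, the obtained $\theta_M(x)$ is the desired posterior
probability, $P \in [0, 1]$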
$U = (u_1, u_2, \dots, u_m)$ for the components of $P$. It involves
taking $n$ equally spaced points in each interval of the form
$[l_i, u_i], \forall i \in \{1, \dots, m\}$, including $l_i$ and $u_i$. This creates a
total of $n^m$ possible grid points to check. Finally, once
the metric has been evaluated at every grid point, the point with the maximum
value is chosen. Table 3 lists the hyperparameters
of the six ML models that are optimized in this
article.
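The grid construction just described ($n$ equally spaced points per interval, Cartesian product, arg-max) can be sketched as follows. `score` stands for any hypothetical function mapping a parameter tuple to a validation metric. The names `linspace` and `grid_search` are our own.

```python
import itertools


def linspace(lo, hi, n):
    """n equally spaced points in [lo, hi], both endpoints included (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def grid_search(score, lower, upper, n):
    """Evaluate `score` on the Cartesian product of n points per parameter
    (n ** m grid points for m parameters) and return the best point and its score."""
    axes = [linspace(lo, hi, n) for lo, hi in zip(lower, upper)]
    best = max(itertools.product(*axes), key=score)
    return best, score(best)
```

With two parameters and five points each, 25 grid points are evaluated. For real classifiers with discrete hyperparameters, scikit-learn's `GridSearchCV` (given a `param_grid` dictionary and, for example, a `StratifiedKFold` splitter as `cv`) performs the same exhaustive search. Whether the authors used it is not stated in this section.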
4) EVALUATION INDICES
The experiments in this article are evaluated
using various metrics, namely Sensitivity (Sn), Precision
(Pr), Accuracy (Acc), and the ROC curve with its AUC
value [61], [62]. The first three metrics measure the true-positive
rate, the positive predictive value, and the proportion of correctly
classified samples among all samples, respectively. A ROC curve
shows the performance of a classification model at all
classification thresholds, whereas the AUC expresses the
degree of separability achieved by the classifier. Since all
the experiments are conducted using $k$-fold cross-validation,
the final evaluation metrics are estimated using the
equation in (2) [63], [64].
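$$\text{Metric}=\frac{1}{K}\sum_{n=1}^{K}P_n\pm\sqrt{\frac{\sum_{n=1}^{K}\left(P_n-\bar{P}\right)^2}{K-1}},\tag{2}$$
where $K$ is the number of folds, $P_n\in\mathbb{R}$ ($n=1,\dots,K$) is the performance metric of the $n$-th fold, and $\bar{P}$ is its mean over the $K$ folds.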
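The threshold-based metrics, the AUC, and the aggregation in Eq. (2) can be illustrated with a short, dependency-free Python sketch. The helper names are hypothetical. Binary labels are assumed to be 0/1, with 1 as the positive class. The AUC is computed with the rank-based (Mann–Whitney) formulation rather than by integrating a plotted ROC curve; with ties counted as one half, this gives the same value as the trapezoidal ROC area.

```python
import statistics


def fold_metrics(y_true, y_pred):
    """Sensitivity (Sn), precision (Pr), and accuracy (Acc) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "Sn": tp / (tp + fn) if tp + fn else 0.0,
        "Pr": tp / (tp + fp) if tp + fp else 0.0,
        "Acc": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def aggregate_folds(per_fold):
    """Eq. (2): mean over K folds and the sample standard deviation (K - 1 denominator)."""
    return statistics.mean(per_fold), statistics.stdev(per_fold)
```

For example, passing the K per-fold AUCs to `aggregate_folds` returns the two numbers reported as "mean ± standard deviation".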
III. RESULTS AND DISCUSSION
This section presents the experiments of this article and their results in several subsections. The best missing value imputation and attribute selection methods are selected through comprehensive ablation studies in Sections III-A and III-B, respectively. The hyperparameters of the different ML models are optimized in Section III-C. Finally, Section III-D describes the results obtained from the individual ML models and the proposed weighted ensemble classifiers, with complete ablation studies. The effectiveness of the proposed classifier is also validated with a statistical ANOVA test in that subsection.
A. FILLING MISSING VALUES
To handle the missing values (see Section II-B1), we applied three strategies, as presented in Table 2: Raw (removing the affected samples), Median (imputing the median value), and Mode (imputing the most frequent value). We used four different BDHS datasets (see Section II-A) and six separate ML classifiers to conduct ablation studies on the FMV methods and to choose the best-performing FMV technique for measles classification. The experimental results in Table 2 reveal that the Median and Mode techniques outperform the Raw method in most cases by a significant margin, while the Raw method beats them in the remaining cases by a small margin. Inspection of all the BDHS datasets (as explained in Section II-A) reveals that missing values make up only a small fraction of the total samples (13.7 %). Moreover, only one of the nineteen features, antenatal visits (A19), contains missing values. Since both the number of missing values and the number of attributes containing them are small, the AUCs obtained from all the classifiers on all the datasets are almost the same for all the FMV strategies, with the Median and Mode methods slightly better in most cases (see Table 2).
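The three FMV strategies compared in Table 2 can be sketched for a single attribute as follows. This is an illustrative, standard-library-only version. It assumes each sample is a dict with missing entries stored as None. The function name and the toy A19 values in the check are hypothetical and do not come from the BDHS data.

```python
import statistics


def fill_missing(rows, attr, strategy):
    """Apply the Raw, Median, or Mode FMV strategy to one attribute.

    rows: list of dicts (one per sample); missing entries of `attr` are None.
    """
    observed = [r[attr] for r in rows if r[attr] is not None]
    if strategy == "raw":
        # Raw: drop every sample whose value is missing.
        return [r for r in rows if r[attr] is not None]
    if strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)  # most frequent observed value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Build new dicts so the input rows are left unchanged.
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]
```

In a cross-validated setting, the median or mode would typically be computed on the training folds only, so that no information leaks from the test fold.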
Visual inspection of Fig. 3 further shows that the A19 feature approximately follows a normal distribution in all the BDHS datasets, so its mode, median, and mean have similar values. These similar median and mode values explain the similar AUCs obtained with the Median and Mode FMV methods for all the datasets and classifiers. Since the Median method outperforms the other two FMV methods (see Table 2), it is applied in the rest of the experiments of this article.
TABLE 2. Extensive experimental results in terms of AUC for the missing value imputation, employing three imputation methods, four different BDHS
datasets, and six different classifiers, where the best imputation method for each dataset and classifier is underlined with a blue color.
FIGURE 3. Normal distribution of the A19 attribute, the only attribute containing missing values, in all the BDHS datasets, where (a) is for BDHS-2007, (b) for BDHS-2011, (c) for BDHS-2014, and (d) for BDHS-2020.
B. ATTRIBUTE SELECTION
AS methods have been integrated into the recommended framework to find the smallest subset of features that yields increased performance. However, the proper AS method cannot be chosen without ablation studies, as the performance of these methods often varies across applications. This article explores four distinct AS methods that do not transform the attributes (thus preserving their interpretation), together with six different classifiers, in a complete ablation study for the measles uptake classification task. Fig. 4 displays the AS results from the different experiments.
The AS results from the FS-based method show that the LGB classifier achieves its highest AUC of approximately 0.75 using the top 13∼14 attributes (see Fig. 4(a)). The other classifiers also reach their highest AUC at that number of features. The RF-based method shows its highest performance using the top 9∼11 attributes, with a maximum AUC of 0.74 for the same LGB classifier (see Fig. 4(b)). Another AS method, LGB-based AS, reaches its maximum AUC of 0.74 at the top 7∼8 attributes for the LGB classifier (see Fig. 4(c)). Although FS outperforms the RF- and LGB-based approaches by an AUC margin of 0.01, it requires approximately twice as many attributes as the LGB-based scheme. The last method, XGB-based AS, yields the best AUC of roughly 0.76 for the same LGB classifier with the top 3∼5 attributes (see Fig. 4(d)).
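For reference, the FS criterion scores each attribute by itself, which is why it cannot account for interactions between attributes. Below is a minimal sketch of the classical Fisher score: the class-size-weighted scatter of the class means around the overall mean, divided by the class-size-weighted within-class variance. The function names and the toy data in the check are hypothetical.

```python
import statistics


def fisher_score(feature, labels):
    """Classical Fisher score of one attribute with respect to the class labels."""
    mu = statistics.fmean(feature)
    between = within = 0.0
    for c in set(labels):
        xs = [x for x, y in zip(feature, labels) if y == c]
        between += len(xs) * (statistics.fmean(xs) - mu) ** 2
        within += len(xs) * statistics.pvariance(xs)
    if within == 0:
        # Constant attribute: no discriminative power; perfectly separated: unbounded.
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_by_fisher(columns, labels):
    """Rank attribute names from most to least discriminative, each scored independently."""
    scores = {name: fisher_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```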
All the results in Fig. 4 demonstrate that the XGB-based AS method outperforms the RF- and LGB-based techniques by an AUC margin of 0.02, and the FS-based method by 0.01. The FS-based AS method scores the discriminative power of each feature independently of the others and does not account for their mutual information, which leads to poorer MDC results. Like the FS-based method, the RF-based approach also yields lower MDC results, as it assigns importance to attributes without considering their correlation. Notably, the classifiers reach their highest AUC at the top 3∼5 attributes when the XGB-based AS approach is employed. All the panels of Fig. 4 also make clear that almost all the classifiers follow the same pattern as the number of attributes varies, yielding their best results at the same attribute numbers. The AS experiments thus quantitatively support the MDC attribute ranking produced by the XGB-based AS process, namely A13, A14, A1, A17, A19, A7, A12, A11, A16, A18, A8, A6, A4, A15, A3, A2, A9, A5, and A10 (from high to low importance), where the first 3∼5 attributes yield the best AUCs for the MDC. This ranking is plausible, as it places at the top the features related to the respondents' number of children ever born, age at first birth, current age, birth order, and antenatal visits during pregnancy. Since the XGB-based AS scheme yields the best results for the measles classification with fewer attributes, it is used in the rest of the experiments in this article.

FIGURE 4. AUC versus the number of features of the proposed BDHS dataset, employing four distinct attribute selection algorithms and six individual classifiers. The attribute numbers are varied from top 2 ∼19 to explore their characteristics in the proposed BDHS dataset.
TABLE 3. The tuned hyperparameters of six ML models with the highest possible AUC for the MDC.
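Choosing the number of attributes from an importance ranking, as in the sweeps of Fig. 4, amounts to scoring each top-k prefix and keeping the best one. The sketch below assumes a caller-supplied `evaluate` function, which is hypothetical (for example, the cross-validated AUC of a classifier trained on the given attributes), and uses the XGB-based ranking reported above. The sweep starts at k = 2, matching the range shown in Fig. 4. The toy scoring functions in the check are for illustration only.

```python
XGB_RANKING = ["A13", "A14", "A1", "A17", "A19", "A7", "A12", "A11", "A16", "A18",
               "A8", "A6", "A4", "A15", "A3", "A2", "A9", "A5", "A10"]


def best_top_k(ranking, evaluate, k_min=2):
    """Return (k, score) for the top-k prefix of `ranking` with the highest score.

    evaluate: callable mapping a list of attribute names to a score such as AUC.
    Ties keep the smaller k, i.e., the more compact attribute subset.
    """
    best_k, best_score = None, float("-inf")
    for k in range(k_min, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score
```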
C. HYPERPARAMETER OPTIMIZATION
The best FMV and AS methods obtained in the two previous experiments are used for the hyperparameter optimization of the six ML models to attain the maximum possible AUCs. Table 3 lists the ML models' hyperparameters with their optimized values, obtained with a grid search strategy in the proposed framework. The optimized hyperparameter values are selected from a grid of predefined values by a search algorithm that maximizes the AUC for the MDC, as described in Section II-B3.
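A grid search of this kind can be sketched with `itertools.product`, which enumerates every combination of the per-hyperparameter candidate values. The sketch follows the interval form described in Section II-B3: n equally spaced points per interval, endpoints included, which gives n^m combinations for m hyperparameters. The objective is a hypothetical stand-in for the cross-validated AUC, and the function name is illustrative.

```python
import itertools


def grid_search(lower, upper, n, objective):
    """Maximize `objective` over an n-point-per-dimension grid on [lower_i, upper_i].

    Evaluates all n**m grid points for m = len(lower) hyperparameters (n >= 2).
    """
    axes = [[lo + j * (hi - lo) / (n - 1) for j in range(n)]
            for lo, hi in zip(lower, upper)]
    best_point, best_value = None, float("-inf")
    for point in itertools.product(*axes):
        value = objective(point)
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value
```

For predefined value sets such as those in Table 3, the `axes` would simply be the listed candidate values of each hyperparameter, and the rest of the loop stays the same.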
TABLE 4. The measles classification results employing six separate ML models and the proposed weighted ensemble models, incorporating missing value imputation, attribute selection, and hyperparameter optimization. The best metrics obtained from a single ML model are presented in bold, and those obtained from the proposed ensemble models are underlined in blue.
D. CLASSIFIERS
Table 4 presents the measles classification results obtained on the proposed BDHS datasets with different ML models, using the best-performing FMV and AS methods.
a: INDIVIDUAL ML CLASSIFIERS
Among the tree-based classifiers, RF and DT, the RF model outperforms the DT model in three of the four metrics by significant margins. Although the DT model is less biased towards the positive class, the RF model performs far better in terms of Acc and AUC. Technically, by averaging many decorrelated trees, the RF model mainly reduces the variance component of the error rather than the bias component of a single DT model. This behavior is reflected in the measles classification of this article: the DT model wins in terms of positive predictive value (Pr), while the RF model outperforms it in terms of Acc. Furthermore, comparing the boosting-based classifiers (XGB and LGB), the LGB model achieves higher Sn, Acc, and AUC, while the XGB model has a slightly better positive predictive value (Pr) (see Table 4). Although both the XGB and LGB models are based on the boosting mechanism, the XGB model cannot handle categorical attributes by itself, unlike LGB or CatBoost [65], [66]. This likely explains why LGB is the winning model for the given BDHS dataset, which mainly holds categorical attributes. Overall, among all the single ML models, the LGB model performs best at measles classification on the proposed BDHS dataset when the proposed preprocessing and hyperparameter optimization are applied (see the first six rows in Table 4). This result demonstrates the superiority of the LGB model for measles classification in terms of accuracy and AUC.
b: ENSEMBLING ML CLASSIFIERS
To further enhance the measles classification results, we performed an ablation study to build an ensemble classifier, since such classifiers have been shown to provide better results (see details in Section II-B3). Table 4 displays the results for all the proposed weighted ensemble models. First, we aggregate the models within each family (Bayesian, tree-based, and boosting) to build three ensemble models, where the AUC of each individual model acts as its weight in the aggregation. Among these three models, the proposed LGB+XGB wins three of the four metrics (Pr, Acc, and AUC) by a large margin (see the 7th∼9th rows of Table 4). Although the GNB+BNB model achieves 100.0 % Sn, it predicts all the samples as positive, as its positive predictive value (Pr) equals the positive-class prior probability ($P_{pos}$) ($Pr=P_{pos}=0.749$). These results indicate that an ensemble of Bayesian models is not a suitable choice for a dataset with strong inter-class homogeneity (see the class similarity in the BDHS dataset in Fig. 5), as confirmed experimentally in this article.

FIGURE 5. 2D visualization of the proposed BDHS dataset demonstrating the inter-class homogeneity using principal component analysis, where the x-axis and y-axis respectively denote the first and second principal components.
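The AUC-weighted aggregation used to build these ensembles can be sketched as a weighted average of the candidate models' positive-class probabilities. Dividing by the sum of the weights, which keeps the output in [0, 1], is an assumption of this sketch. The function name and numbers are hypothetical; Section II-B3 gives the authors' exact formulation.

```python
def weighted_ensemble(probabilities, aucs):
    """AUC-weighted average of per-model positive-class probabilities.

    probabilities: one list per model, each holding P(positive) for the same samples.
    aucs: one validation AUC per model, used as that model's aggregation weight.
    """
    if len(probabilities) != len(aucs):
        raise ValueError("need exactly one AUC weight per model")
    total = sum(aucs)
    return [
        sum(w * p for w, p in zip(aucs, sample_probs)) / total
        for sample_probs in zip(*probabilities)
    ]
```

Because the output is a convex combination, if every member assigns probabilities above the decision threshold to all samples, the ensemble also labels every sample positive. This matches the GNB+BNB behavior described above.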
FIGURE 6. The ROC curves of two different ensemble models, such as (a) LGB+XGB and (b) GNB+BNB+XGB+LGB, for the measles
classification utilizing the proposed approach.
Secondly, the weighted aggregation of two different model families, namely Bayesian with tree-based, Bayesian with boosting-based, and tree- with boosting-based, shows that the proposed GNB+BNB+XGB+LGB increases the overall accuracy, with reduced Sn, Pr, and AUC (see the 10th∼12th rows of Table 4). The other two models, GNB+BNB+DT+RF and DT+RF+XGB+LGB, do not benefit from this type of ensembling. The ROC curves in Fig. 6 help explain the superiority of the LGB+XGB and GNB+BNB+XGB+LGB models.
Although these two ROC curves give almost the same AUC values, they mainly differ in their accuracy points (see the red cross points in both ROC curves). The left ROC curve, for the LGB+XGB model, shows a true-positive rate of around 88.0 % with a false-positive rate of 47.0 % at its accuracy point (see the blue dashed line in the left figure). Similarly, the right ROC curve, for the GNB+BNB+XGB+LGB model, shows a true-positive rate of around 98.0 % with a false-positive rate of 70.0 % at its accuracy point (see the blue dashed line in the right figure). These results indicate that increasing the true-positive rate by 10.0 percentage points requires accepting a false-positive rate about 23.0 percentage points higher, which is not a favorable trade-off in a medical diagnostic application. Therefore, the LGB+XGB model better balances the true- and false-positive rates, providing the highest AUC of 0.80. Thirdly, the weighted ensemble of the Bayesian-, tree-, and boosting-based models cannot further improve the classification results; instead, it reduces the performance. We also explore the two AS techniques, LGB- and XGB-based AS, on all the proposed ensemble models; the results are visualized in Fig. 7.
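The trade-off between the two accuracy points can be made concrete with a small calculation. For an ROC operating point (TPR, FPR) and a positive-class prior p, accuracy is p × TPR + (1 − p) × (1 − FPR), and precision is p × TPR / (p × TPR + (1 − p) × FPR). The sketch below uses the approximate curve readings quoted above and the prior $P_{pos}=0.749$ reported for the BDHS data. Because the readings are approximate, the outputs are rough estimates only.

```python
def operating_point_metrics(tpr, fpr, prior_pos):
    """Accuracy and precision implied by an ROC operating point and the class prior."""
    tp = prior_pos * tpr                  # share of all samples that are true positives
    fp = (1.0 - prior_pos) * fpr          # share that are false positives
    tn = (1.0 - prior_pos) * (1.0 - fpr)  # share that are true negatives
    return tp + tn, tp / (tp + fp)


PRIOR = 0.749  # positive-class prior reported for the BDHS data
for name, tpr, fpr in [("LGB+XGB", 0.88, 0.47), ("GNB+BNB+XGB+LGB", 0.98, 0.70)]:
    acc, pr = operating_point_metrics(tpr, fpr, PRIOR)
    print(f"{name}: accuracy ~ {acc:.3f}, precision ~ {pr:.3f}")
```

With these readings, LGB+XGB gives roughly 0.79 accuracy and 0.85 precision. GNB+BNB+XGB+LGB gains some accuracy (roughly 0.81) but loses precision (also roughly 0.81). A classifier that labels everything positive (TPR = FPR = 1) has precision equal to the prior, which is the GNB+BNB case.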
The AS results in Fig. 7 again exhibit a pattern similar to that in Section III-B. The results with a varying number of attributes on all the proposed models (see Fig. 7) confirm that the XGB-based AS method again outperforms the LGB-based AS process, providing a maximum AUC of 0.80. All the models exhibit a similar pattern as the number of attributes varies, with the best results obtained by the XGB+LGB classifier using the top-5 attributes. The attribute ranking obtained with the XGB-based AS method and the proposed XGB+LGB classifier is as plausible as in Section III-B, ranking highly the respondents' number of children ever born, age at first birth, current age, birth order, and antenatal visits during pregnancy.
Furthermore, the experimental results of the different classification models, using the proposed best preprocessing, have been validated with a statistical ANOVA test and 10-fold cross-validation. Fig. 8 shows the box-and-whisker plot of the AUC values from this validation test. For the ANOVA test, a significance level of $\alpha=0.05$ is applied: the null hypothesis (all classifiers' mean AUCs are equal) is rejected if the p-value is ≤ 0.05. The ANOVA test yields a p-value of $7.93\times10^{-38}$ (≤ 0.05), so the null hypothesis is rejected in favor of the alternative hypothesis, strongly indicating that not all the means are equal (as also displayed in Fig. 8). A post hoc t-test with Bonferroni correction is then applied after the ANOVA test to identify the better classification models in the recommended classification system, which confirms the superiority of the proposed weighted ensemble XGB+LGB classifier.
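To reproduce this kind of test, the one-way ANOVA F statistic and a Bonferroni-adjusted threshold can be computed as below. This is an illustrative sketch with hypothetical function names. It returns the F statistic and its degrees of freedom rather than the p-value, which requires the F distribution (for example, `scipy.stats.f_oneway` returns both). It also assumes that the post hoc step covers all pairwise comparisons among the 13 models in Fig. 8, which the text does not state explicitly.

```python
import statistics


def one_way_anova_f(groups):
    """One-way ANOVA F statistic and (df_between, df_within) for k groups.

    Each group could hold the per-fold AUCs of one classifier.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


def bonferroni_alpha(alpha, n_models):
    """Per-comparison significance level for all pairwise tests among n_models."""
    n_pairs = n_models * (n_models - 1) // 2
    return alpha / n_pairs
```

For 13 models there are 78 pairs, so each pairwise t-test would be judged at 0.05 / 78 under this assumption.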
c: YEAR-WISE CROSS-FOLD VALIDATION
All the previous results were obtained on a single-year BDHS dataset using 5-fold cross-validation, whereas we have proposed BDHS datasets for four years ($n=4$) (see Section II-A). We therefore evaluate the proposed framework, incorporating missing value imputation, the AS method, and the proposed weighted ensemble model, on all the BDHS datasets, where each year's data acts as one fold. In this experiment, the $i$-th year dataset ($i=1,\dots,n$) is used as the test set and the remaining three datasets as the training set, iterating $n=4$ times to test all the data in a year-wise fashion. In this way, we validated the predictions of our proposed framework and demonstrated its generalization capability. The ROC curve in Fig. 9 presents the results of this experiment, showing that the proposed framework achieves an average AUC of 0.781 with a standard deviation of 0.005. Although the average AUC using all the BDHS datasets is lower than that obtained with individual datasets, the standard deviation (inter-fold variation) is much lower. This result reveals that using more samples improves the model's generalizability, with significantly less inter-fold variation.

FIGURE 7. AUC versus the number of features of the proposed BDHS dataset, employing two distinct AS algorithms and the proposed weighted ensembling classifiers. The attribute numbers are varied from top 2 ∼19 to explore their characteristics in the proposed BDHS dataset.
FIGURE 8. Box and Whisker plot of the AUC values obtained from 10-fold cross-validation on different ML-based classifiers, where Model-1 to Model-13, respectively, denote the GNB, BNB, RF, DT, XGB, LGB, GNB+BNB, RF+DT, XGB+LGB, GNB+BNB+RF+DT, RF+DT+LGB+XGB, GNB+BNB+LGB+XGB, and GNB+BNB+RF+DT+XGB+LGB classifiers.
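The year-wise protocol is a leave-one-group-out split in which each survey year forms one group. Below is a minimal sketch, assuming the samples are already grouped by survey year (BDHS-2007, 2011, 2014, and 2020). The function names and the `train_and_score` callable (train on one split, return its test AUC) are hypothetical. The summary uses the mean and sample standard deviation, as in Eq. (2).

```python
import statistics


def leave_one_year_out(samples_by_year):
    """Yield (test_year, train_samples, test_samples), holding out each year once."""
    years = sorted(samples_by_year)
    for test_year in years:
        train = [s for y in years if y != test_year for s in samples_by_year[y]]
        yield test_year, train, list(samples_by_year[test_year])


def year_wise_summary(samples_by_year, train_and_score):
    """Mean and sample standard deviation of the per-year test scores."""
    scores = [train_and_score(train, test)
              for _, train, test in leave_one_year_out(samples_by_year)]
    return statistics.mean(scores), statistics.stdev(scores)
```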
d: FRAMEWORK SUPERIORITY COMPARED TO OTHER STUDIES
A direct comparison of the recommended framework with published frameworks is not possible, as we used our newly proposed BDHS datasets (see dataset details in Section II-A).
FIGURE 9. The ROC curve of the best-performing ensemble model, LGB+XGB, for the measles classification using the proposed approach and all the BDHS datasets.
However, to the best of our knowledge, this is the first attempt to propose an AI-based framework for this task using nationally representative demographic and health survey data from Bangladesh. Additionally, this article focuses on identifying the factors contributing to the non-utilization of measles vaccination among children in Bangladesh. For reference, the authors in [25] used Philippine National Demographic and Health Survey data with 32 relevant attributes, comprising geographic location, socioeconomic condition, and child and family information, and obtained an accuracy of 79.02 %. Another article [23] reported 72.0 % precision using 25 attributes based on the history of the child and their family members. In contrast, our framework achieved an accuracy of 78.70 % and a precision of 84.60 % using only 3∼5 attributes, namely the respondents' number of children ever born, age at first birth, current age, birth order number, and antenatal visits during pregnancy. These comparisons suggest the advantage of the recommended AI-based system, which provides comparable accuracy and higher precision with far fewer attributes, although the underlying datasets differ.
IV. CONCLUSION
This article designs and optimizes a novel ML-based framework for measles vaccine uptake classification and associates it with its underlying factors. The whole study is based on the newly proposed BDHS datasets. The recommended framework reveals that a weighted ensemble of ML models successfully enhances the classification results, as it aggregates the output probabilities of the candidate models using AUC-based weights. Furthermore, integrating missing value imputation and attribute selection as preprocessing steps also improves the results. Choosing these preprocessing methods is critical and requires a complete ablation study to determine the most suitable ones. Moreover, compared to other studies, our research provides a model with comparable accuracy and higher precision using only 3∼5 attributes, namely the respondents' number of children ever born, age at first birth, current age, birth order number, and antenatal visits during pregnancy, which are easily explainable. We hope that this study will help national policymakers give more importance to these attributes and ensure ‘‘herd immunity’’ in the community.
CONFLICT OF INTEREST
The authors have no conflicts of interest to disclose regarding this research.
AUTHOR CONTRIBUTIONS
Md. Kamrul Hasan and Md. Abdul Awal conceived of the
presented idea and planned the experiments. Md. Abdul
Awal and Md. Akhtarul Islam conceptualized the original
idea. Md. Kamrul Hasan and Md. Abdul Awal designed the model and the computational framework and analyzed the data, and Md. Kamrul Hasan carried out the implementation. Md. Kamrul Hasan, Md. Tasnim Jawad, and Aishwariya Dutta carried out the experiments. Md. Kamrul Hasan, Md. Tasnim Jawad, Aishwariya Dutta, Md. Abdul Awal, and Md. Akhtarul Islam wrote the manuscript with support from Mehedi Masud and Jehad F. Al-Amri, and Mehedi Masud and Jehad F. Al-Amri edited the manuscript. All authors provided
critical feedback and helped shape the research, analysis,
and manuscript. Md. Kamrul Hasan, Md. Abdul Awal, and
Md. Akhtarul Islam supervised the project.
MATERIAL AVAILABILITY
The data were collected in accordance with the ethical procedures described on the DHS website (https://dhsprogram.com/) and are now published at Harvard Dataverse [41]. Therefore, this study did not seek a separate ethical review. The data and source code that support the findings of this study are available at https://github.com/kamruleee51/measles_vaccine_uptake.
REFERENCES
[1] W. J. Moss, ‘‘Measles,’’ Lancet, vol. 390, no. 10111, pp. 2490–2502,
2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140673617314630
[2] H. Q. McLean, A. P. Fiebelkorn, J. L. Temte, and G. S. Wallace,
‘‘Prevention of measles, rubella, congenital rubella syndrome, and
mumps, 2013: Summary recommendations of the Advisory Committee
on Immunization Practices (ACIP),’’ Morbidity Mortality Weekly Rep.,
Recommendations Rep., vol. 62, no. 4, pp. 1–34, 2013.
[3] R. Fernandez, A. Rammohan, and N. Awofeso, ‘‘Correlates of first dose
of measles vaccination delivery and uptake in Indonesia,’’ Asian Pacific J.
Tropical Med., vol. 4, no. 2, pp. 140–145, Feb. 2011.
[4] S. Izadi, S.-M. Zahraie, and M. Sartipi, ‘‘An investigation into a measles
outbreak in southeast Iran,’’ Jpn. J. Infectious Diseases, vol. 65, no. 1,
pp. 45–51, 2012.
[5] A. Mahamud, A. Burton, M. Hassan, J. A. Ahmed, J. B. Wagacha,
P. Spiegel, C. Haskew, R. B. Eidex, S. Shetty, S. Cookson,
C. Navarro-Colorado, and J. L. Goodson, ‘‘Risk factors for measles
mortality among hospitalized Somali refugees displaced by famine,
Kenya, 2011,’’ Clin. Infectious Diseases, vol. 57, no. 8, pp. e160–e166,
Oct. 2013.
[6] N. Sheikh, M. Sultana, N. Ali, R. Akram, R. Mahumud, M. Asaduzzaman,
and A. Sarker, ‘‘Coverage, timelines, and determinants of incomplete
immunization in Bangladesh,’’ Tropical Med. Infectious Disease, vol. 3,
no. 3, p. 72, Jun. 2018.
[7] R. E. Black, S. Cousens, H. L. Johnson, J. E. Lawn, I. Rudan, D. G. Bassani,
P. Jha, H. Campbell, C. F. Walker, R. Cibulskis, T. Eisele, L. Liu, and
C. Mathers, ‘‘Global, regional, and national causes of child mortality in
2008: A systematic analysis,’’ Lancet, vol. 375, no. 9730, pp. 1969–1987,
Jun. 2010.
[8] New Measles Surveillance Data for 2019, World Health Organization,
Geneva, Switzerland, 2019, vol. 24.
[9] A. C. Kantner, S. H. van Wees, E. M. G. Olsson, and S. Ziaei, ‘‘Factors
associated with measles vaccination status in children under the age of
three years in a post-Soviet context: A cross-sectional study using the DHS
VII in Armenia,’’ BMC Public Health, vol. 21, no. 1, pp. 1–10, Dec. 2021.
[10] P. Plans-Rubió, ‘‘Why does measles persist in Europe?’’ Eur. J. Clin.
Microbiol. Infectious Diseases, vol. 36, no. 10, pp. 1899–1906, Oct. 2017.
[11] Y. Hu, Y. Chen, Y. Wang, and H. Liang, ‘‘Evaluation of potentially
achievable vaccination coverage of the second dose of measles containing
vaccine with simultaneous administration and risk factors for missed
opportunities among children in Zhejiang province, East China,’’ Hum.
Vaccines Immunotherapeutics, vol. 14, no. 4, pp. 875–880, Apr. 2018.
[12] P. Plans-Rubió, ‘‘Low percentages of measles vaccination coverage with
two doses of vaccine and low herd immunity levels explain measles
incidence and persistence of measles in the European union in 2017–
2018,’’ Eur. J. Clin. Microbiol. Infectious Diseases, vol. 38, no. 9,
pp. 1719–1729, Sep. 2019.
[13] J. P. Higgins, K. Soares-Weiser, J. A. López-López, A. Kakourou,
K. Chaplin, H. Christensen, N. K. Martin, J. A. Sterne, and A. L. Reingold,
‘‘Association of BCG, DTP, and measles containing vaccines with
childhood mortality: Systematic review,’’ Brit. Med. J., vol. 355, Oct. 2016,
Art. no. i5170.
[14] Organisation mondiale de la Santé, ‘‘Measles vaccines: WHO position paper – April 2017 – Note de synthèse de l’OMS sur les vaccins contre la rougeole – avril 2017,’’
Weekly Epidemiolog. Record = Relevé épidémiologique hebdomadaire,
vol. 92, no. 17, pp. 205–227, 2017.
[15] Bangladesh Demographic and Health Survey 2017–18: Key Indicators,
National Institute of Population Research and Training (NIPORT), Dhaka,
Bangladesh, 2019.
[16] M. D. C. Tauil, A. P. S. Sato, and E. A. Waldman, ‘‘Factors associated with
incomplete or delayed vaccination across countries: A systematic review,’’
Vaccine, vol. 34, no. 24, pp. 2635–2643, May 2016.
[17] S. Bhattacherjee, P. Dasgupta, A. Mukherjee, and S. Dasgupta, ‘‘Vaccine
hesitancy for childhood vaccinations in slum areas of Siliguri, India,’’
Indian J. Public Health, vol. 62, no. 4, p. 253, 2018.
[18] R. Rossi, ‘‘Do maternal living arrangements influence the vaccination
status of children age 12–23 months? A data analysis of demographic
health surveys 2010–11 from Zimbabwe,’’ PLoS ONE, vol. 10, no. 7,
Jul. 2015, Art. no. e0132357.
[19] S. Walsh, D. R. Thomas, B. W. Mason, and M. R. Evans, ‘‘The impact of
the media on the decision of parents in south Wales to accept measles-
mumps-rubella (MMR) immunization,’’ Epidemiol. Infection, vol. 143,
no. 3, pp. 550–560, Feb. 2015.
[20] S. Engebretsen and J. Bohlin, ‘‘Statistical predictions with glmnet,’’ Clin.
Epigenetics, vol. 11, no. 1, pp. 1–3, Dec. 2019.
[21] W. M. T. W. Ahmad, N. Ghani, and S. M. Drus, ‘‘Handling imbalanced
class problem of measles infection risk prediction model,’’ Int. J. Eng. Adv.
Technol., vol. 9, no. 1, pp. 3431–3435, 2019.
[22] J. Nazari, P.-S. Fathi, N. Sharahi, M. Taheri, P. Amini, and
A. Almasi-Hashiani, ‘‘Evaluating measles incidence rates using machine
learning and time series methods in the center of Iran; 1997–2020,’’
Tech. Rep., 2020.
[23] A. Bell, A. Rich, M. Teng, T. Oreskovic, N. B. Bras, L. Mestrinho,
S. Golubovic, I. Pristas, and L. Zejnilovic, ‘‘Proactive advising: A machine
learning driven approach to vaccine hesitancy,’’ in Proc. IEEE Int. Conf.
Healthcare Informat. (ICHI), Jun. 2019, pp. 1–6.
[24] V. Carrieri, R. Lagravinese, and G. Resce, ‘‘Predicting vaccine hesitancy
from area-level indicators: A machine learning approach,’’ MedRxiv,
Mar. 2021.
[25] O. O. Bucaro, ‘‘Exploring relevant features associated with
measles nonvaccination using a machine learning approach,’’
Tech. Rep., 2020. [Online]. Available: https://www.diva-portal.org/smash/get/diva2:1461628/FULLTEXT01.pdf
[26] A. S. Rao, D. A. D’Mello, R. Anand, and S. Nayak, ‘‘Clinical significance
of measles and its prediction using data mining techniques: A systematic
review,’’ in Advances in Artificial Intelligence and Data Engineering.
Singapore: Springer, 2021, pp. 737–759.
[27] A. Susilowati, Y. Wijayanti, and I. M. Sudana, ‘‘The influencing risk
factors of measles in Bantul regency,’’ Public Health Perspective J., vol. 4,
no. 2, pp. 129–140, 2019.
[28] V. D. Kien, H. Van Minh, K. B. Giang, V. Q. Mai, N. T. Tuan,
and M. B. Quam, ‘‘Trends in childhood measles vaccination highlight
socioeconomic inequalities in Vietnam,’’ Int. J. Public Health, vol. 62,
no. S1, pp. 41–49, Feb. 2017.
[29] C. Hagemann, A. Streng, A. Kraemer, and J. G. Liese, ‘‘Heterogeneity
in coverage for measles and varicella vaccination in toddlers—Analysis
of factors influencing parental acceptance,’’ BMC Public Health, vol. 17,
no. 1, pp. 1–10, Dec. 2017.
[30] A. B. Wilder-Smith and K. Qureshi, ‘‘Resurgence of measles in Europe:
A systematic review on parental attitudes and beliefs of measles vaccine,’’
J. Epidemiol. Global Health, vol. 10, no. 1, p. 46, 2019.
[31] D. E. Griffin, ‘‘Measles vaccine,’’ Viral Immunol., vol. 31, no. 2, pp. 86–95,
2018.
[32] R. D. de Vries, A. W. Mesman, T. B. Geijtenbeek, W. P. Duprex, and
R. L. de Swart, ‘‘The pathogenesis of measles,’’ Current Opinion Virol.,
vol. 2, no. 3, pp. 248–255, 2012.
[33] R. Buchanan and D. J. Bonthius, ‘‘Measles virus and associated central
nervous system sequelae,’’ Seminars Pediatric Neurol., vol. 19, no. 3,
pp. 107–114, Sep. 2012.
[34] J. C. Bester, ‘‘Measles and measles vaccination: A review,’’ JAMA
Pediatrics, vol. 170, no. 12, pp. 1209–1215, 2016.
[35] W. J. Moss and D. E. Griffin, ‘‘Global measles elimination,’’ Nature Rev.
Microbiol., vol. 4, no. 12, pp. 900–908, Dec. 2006.
[36] L. K. Tannous, G. Barlow, and N. H. Metcalfe, ‘‘A short clinical
review of vaccination against measles,’’ JRSM Open, vol. 5, no. 4, 2014,
Art. no. 2054270414523408.
[37] R. T. Perry and N. A. Halsey, ‘‘The clinical significance of measles: A
review,’’ J. Infectious Diseases, vol. 189, no. 1, pp. S4–S16, May 2004.
[38] Bangladesh Demographic and Health Survey, Mitra and Associates
(Firm), M. I. I. for Resource Development Demographic and Health
Survey, National Institute of Population Research and Training (NIPORT),
Dhaka, Bangladesh, 2011.
[39] Bangladesh Demographic and Health Survey 2014: Key Indicators,
National Institute of Population Research and Training (NIPORT), Mitra,
Associates, and II, Dhaka, Bangladesh, 2015.
[40] Bangladesh Demographic Health Survey, 2007, National Institute of
Population Research and Training (NIPORT), Mitra, Associates, (Firm),
and Macro International, Dhaka, Bangladesh, 2009.
[41] M. K. Hasan, M. T. Jawad, A. Dutta, M. A. Awal, M. A. Islam,
M. Masud, and J. F. Al-Amri, ‘‘Measles,’’ Harvard Dataverse, V1,
Tech. Rep. UNF:6:CG4S8sYltZv8Btm5uCF/aA==[fileUNF], 2021, doi:
10.7910/DVN/S76AZS.
[42] D. Krstajic, L. J. Buturovic, D. E. Leahy, and S. Thomas, ‘‘Cross-validation
pitfalls when selecting and assessing regression and classification models,’’
J. Cheminform., vol. 6, no. 1, pp. 1–15, Dec. 2014.
[43] M. K. Hasan, M. A. Alam, D. Das, E. Hossain, and M. Hasan, ‘‘Diabetes
prediction using ensembling of different machine learning classifiers,’’
IEEE Access, vol. 8, pp. 76516–76531, 2020.
[44] A. Purwar and S. K. Singh, ‘‘Hybrid prediction model with missing
value imputation for medical data,’’ Expert Syst. Appl., vol. 42, no. 13,
pp. 5621–5631, Aug. 2015.
[45] P. J. García-Laencina, J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal,
‘‘Pattern classification with missing data: A review,’’ Neural Comput.
Appl., vol. 19, no. 2, pp. 263–282, 2010.
[46] T. Aljuaid and S. Sasi, ‘‘Proper imputation techniques for missing values in
data sets,’’ in Proc. Int. Conf. Data Sci. Eng. (ICDSE), Aug. 2016, pp. 1–5.
[47] F. Korn, B.-U. Pagel, and C. Faloutsos, ‘‘On the ‘dimensionality curse’
and the ‘self-similarity blessing,’’’ IEEE Trans. Knowl. Data Eng., vol. 13,
no. 1, pp. 96–111, Jan./Feb. 2001.
[48] A. Jovic, K. Brkic, and N. Bogunovic, ‘‘A review of feature selection
methods with applications,’’ in Proc. 38th Int. Conv. Inf. Commun.
Technol., Electron. Microelectron. (MIPRO), May 2015, pp. 1200–1205.
[49] Q. Gu, Z. Li, and J. Han, ‘‘Generalized Fisher score for feature
selection,’’ 2012, arXiv:1202.3725. [Online]. Available: http://arxiv.org/abs/1202.3725
[50] B. H. Menze, B. M. Kelm, R. Masuch, U. Himmelreich, P. Bachert,
W. Petrich, and F. A. Hamprecht, ‘‘A comparison of random forest and
its Gini importance with standard chemometric methods for the feature
selection and classification of spectral data,’’ BMC Bioinf., vol. 10, no. 1,
pp. 1–16, 2009.
[51] Y. Ye, C. Liu, N. Zemiti, and C. Yang, ‘‘Optimal feature selection for EMG-
based finger force estimation using LightGBM model,’’ in Proc. 28th
IEEE Int. Conf. Robot Hum. Interact. Commun. (RO-MAN), Oct. 2019,
pp. 1–7.
[52] C. Chen, Q. Zhang, B. Yu, Z. Yu, P. J. Lawrence, Q. Ma, and
Y. Zhang, ‘‘Improving protein-protein interactions prediction accuracy
using XGBoost feature selection and stacked ensemble classifier,’’
Comput. Biol. Med., vol. 123, Aug. 2020, Art. no. 103899.
[53] T. Chen and C. Guestrin, ‘‘XGBoost: A scalable tree boosting system,’’
in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining,
Aug. 2016, pp. 785–794.
[54] M. Ustuner and F. Balik Sanli, ‘‘Polarimetric target decompositions and
light gradient boosting machine for crop classification: A comparative
evaluation,’’ ISPRS Int. J. Geo-Inf., vol. 8, no. 2, p. 97, Feb. 2019.
[55] A. A. Taha and S. J. Malebary, ‘‘An intelligent approach to credit card
fraud detection using an optimized light gradient boosting machine,’’ IEEE
Access, vol. 8, pp. 25579–25587, 2020.
[56] S.-L. Hsieh, S.-H. Hsieh, P.-H. Cheng, C.-H. Chen, K.-P. Hsu, I.-S. Lee,
Z. Wang, and F. Lai, ‘‘Design ensemble machine learning model for breast
cancer diagnosis,’’ J. Med. Syst., vol. 36, no. 5, pp. 2841–2847, Oct. 2012.
[57] N. Sikder, M. Masud, A. K. Bairagi, A. S. M. Arif, A.-A. Nahid,
and H. A. Alhumyani, ‘‘Severity classification of diabetic retinopathy
using an ensemble learning algorithm through analyzing retinal images,’’
Symmetry, vol. 13, no. 4, p. 670, Apr. 2021.
[58] M. Masud, A. K. Bairagi, A.-A. Nahid, N. Sikder, S. Rubaiee, A. Ahmed,
and D. Anand, ‘‘A pneumonia diagnosis scheme based on hybrid
features extracted from chest radiographs using an ensemble learning
algorithm,’’ J. Healthcare Eng., vol. 2021, pp. 1–11, Feb. 2021, doi:
10.1155/2021/8862089.
[59] M. A. Awal, M. Masud, M. S. Hossain, A. A.-M. Bulbul,
S. M. H. Mahmud, and A. K. Bairagi, ‘‘A novel Bayesian optimization-
based machine learning framework for COVID-19 detection from inpatient
facility data,’’ IEEE Access, vol. 9, pp. 10263–10281, 2021.
[60] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar,
‘‘Hyperband: A novel bandit-based approach to hyperparameter
optimization,’’ J. Mach. Learn. Res., vol. 18, no. 1, pp. 6765–6816, 2017.
[61] R. Dai, W. Zhang, W. Tang, E. Wynendaele, Q. Zhu, Y. Bin,
B. De Spiegeleer, and J. Xia, ‘‘BBPpred: Sequence-based prediction of
blood-brain barrier peptides with feature representation learning and
logistic regression,’’ J. Chem. Inf. Model., vol. 61, no. 1, pp. 525–534,
Jan. 2021.
[62] N. Cheng, M. Li, L. Zhao, B. Zhang, Y. Yang, C.-H. Zheng, and J. Xia,
‘‘Comparison and integration of computational methods for deleterious
synonymous mutation prediction,’’ Briefings Bioinf., vol. 21, no. 3,
pp. 970–981, May 2020, doi: 10.1093/bib/bbz047.
[63] M. K. Hasan, T. A. Aleef, and S. Roy, ‘‘Automatic mass classification
in breast using transfer learning of deep convolutional neural network
and support vector machine,’’ in Proc. IEEE Region 10 Symp. (TENSYMP),
Jun. 2020, pp. 110–113.
[64] M. A. Awal, M. S. Hossain, K. Debjit, N. Ahmed, R. D. Nath,
G. M. M. Habib, M. S. Khan, M. A. Islam, and M. A. P. Mahmud,
‘‘An early detection of asthma using BOMLA detector,’’ IEEE Access,
vol. 9, pp. 58403–58420, 2021.
[65] A. V. Dorogush, V. Ershov, and A. Gulin, ‘‘CatBoost: Gradient boosting
with categorical features support,’’ 2018, arXiv:1810.11363. [Online].
Available: http://arxiv.org/abs/1810.11363
[66] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu,
‘‘LightGBM: A highly efficient gradient boosting decision tree,’’ in Proc.
Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 3146–3154.
MD. KAMRUL HASAN received the B.Sc.
and M.Sc. degrees in electrical and electronic
engineering (EEE) from Khulna University of
Engineering & Technology (KUET), in 2014 and
2017, respectively, and the M.Sc. degree in
medical imaging and application (MAIA) from the
University of Burgundy, France, the University of
Cassino and Southern Lazio, Italy, and the
University of Girona, Spain, as an Erasmus Scholar,
in 2019. He is currently working as an Assistant
Professor with the EEE Department, KUET. His research interests include
medical image and data analysis, machine learning, deep convolutional
neural networks, medical image reconstruction, augmented reality, and
surgical robotics in minimally invasive surgery. He currently supervises
several undergraduate students working on the classification, segmentation,
and registration of medical images of different modalities. His previous work
has been published in journals such as Medical Image Analysis (MIA;
Elsevier), Computers in Biology and Medicine (CBM; Elsevier), Artificial
Intelligence in Medicine (AIIM; Elsevier), Biomedical Signal Processing
and Control (BSPC; Elsevier), and IEEE ACCESS.
MD. TASNIM JAWAD was born in Rangpur,
Bangladesh, in 2000. He is currently pursuing
the B.Sc. degree in electrical and electronic
engineering with Khulna University of Engineering
& Technology. He is also taking supplementary
courses in machine learning and deep learning from
online educational providers such as Coursera and
Udemy. His current research interests include
image classification, audio classification, medical
image processing, convolutional neural networks,
recurrent neural networks, and generative adversarial networks.
AISHWARIYA DUTTA received the B.Sc. degree
in biomedical engineering (BME) from Khulna
University of Engineering & Technology (KUET),
where she is currently pursuing the master’s
degree with the Department of Biomedical
Engineering (BME). She has published one
conference paper in the 4th International Joint
Conference on Advances in Computational
Intelligence (IJCACI), in 2020, and also coauthored
one international journal article. Her research
interests include machine learning and its applications, deep learning,
biomedical imaging, biomedical signal processing, and nanotechnology in
bioengineering.
MD. ABDUL AWAL received the B.Sc. degree in
electronics and communication engineering (ECE)
from the ECE Discipline, Khulna University,
in 2009, the M.Sc. degree in biomedical
engineering from Khulna University of Engineering
& Technology, in 2011, and the Ph.D. degree
in biomedical engineering from The University
of Queensland, Australia, in 2018. He is
currently working as an Associate Professor with
the ECE Discipline, Khulna University, Khulna,
Bangladesh. He is also involved in several projects as a Principal
Investigator or Co-Investigator and supervises several undergraduate
and post-graduate students. His research interests include signal processing,
especially biomedical signal processing, big data analysis, image processing,
time-frequency analysis, machine learning algorithms, deep learning,
optimization, and computational intelligence in biomedical engineering. He has
published more than 40 papers in internationally accredited journals and
conferences.
MD. AKHTARUL ISLAM received the B.Sc. and
M.S. degrees in statistics, biostatistics & informatics
from Dhaka University, Dhaka, Bangladesh,
in 2012 and 2013, respectively. He is currently
working as an Assistant Professor with the
Statistics Discipline, Khulna University, Khulna,
Bangladesh. He has authored or coauthored
around 12 publications in different peer-reviewed
journals. His research interests include
biostatistics, epidemiology, public health, infectious
disease, meta-analysis, statistical computing, and multivariate analysis.
MEHEDI MASUD (Senior Member, IEEE)
received the Ph.D. degree in computer science
from the University of Ottawa, Canada. He is
currently a Full Professor with the Department
of Computer Science, Taif University, Taif, Saudi
Arabia. He has authored or coauthored around
50 publications, including refereed IEEE, ACM,
Springer, and Elsevier journals, conference papers,
books, and book chapters. His research interests
include cloud computing, distributed algorithms,
data security, data interoperability, formal methods, and cloud and
multimedia for healthcare. He has served as a Technical Program Committee
Member of different international conferences. He is a recipient of a number
of awards, including the Research in Excellence Award from Taif University.
He is on the Associate Editorial Board of IEEE ACCESS and International
Journal of Knowledge Society Research (IJKSR). He is an Editorial Board
Member of Journal of Software. He has also served as a Guest Editor of the
ComSIS journal and the Journal of Universal Computer Science (JUCS). He is
a member of ACM.
JEHAD F. AL-AMRI received the degree from
the Centre for Computing and Social
Responsibility, De Montfort University. He is currently
an Associate Professor with the Department of
Information Technology, Faculty of Computers
and Information Technology, Taif University,
Saudi Arabia. His research interests include cloud
computing security, multimedia security, image
encryption, steganography, and medical image
processing.