International Journal of Electronics and Communications System
Volume 1, Issue 2, 17-22.
ISSN: 2798-2610
http://ejournal.radenintan.ac.id/index.php/IJECS/index
DOI: 10.24042/ijecs.v1i2.10393
Corresponding author:
Md. Monirul Islam, Uttara University, Dhaka-1230, BANGLADESH. monirul@uttarauniversity.edu.bd
© 2021 The Author(s). Open Access. This article is under the CC BY SA license (https://creativecommons.org/licenses/by-sa/4.0/)
Stroke prediction analysis using machine learning classifiers and
feature technique
Md. Monirul Islam *
Uttara University, Dhaka-1230,
BANGLADESH
Sharmin Akter
Atish Dipankar University of Science
& Technology, Dhaka-1230,
BANGLADESH
Md. Rokunojjaman
Chongqing University of Technology,
Chongqing 400054, CHINA
Jahid Hasan Rony
Dhaka University of Engineering and
Technology, Gazipur, Gazipur-1700,
BANGLADESH
Al Amin
Chongqing University of
Technology, Chongqing 400054,
CHINA
Article Info
Article history:
Received: October 4, 2021
Revised: December 8, 2021
Accepted: December 15, 2021
Abstract
Stroke is a fatal brain disease that can cause death within 3 to 10 hours.
However, most stroke mortality can be prevented by identifying the nature of the
stroke and reacting to it promptly through smart health systems. In this paper, a
machine learning approach is proposed for predicting whether a patient has a
stroke, in which the Random Forest classifier outperforms state-of-the-art
models including Logistic Regression, Decision Tree Classifier (DTC), and K-NN.
We conduct the experiments on a dataset that has 5110 observations with 12
attributes. We also apply exploratory data analysis (EDA) for preprocessing and a
feature technique for balancing the dataset. Finally, a cloud-based mobile app
collects user data, analyzes it, and reports the possibility of stroke to alert the
person, with a precision of 96%, recall of 96%, and F1-score of 96%. This
user-friendly system can be a lifesaver, as the person gets an essential warning
very easily by providing very little information from anywhere with a mobile device.
Keywords:
Feature Technique,
Random Forest Classifier,
Stroke disease
To cite this article: M. M. Islam, S. Akter, M. Rokunojjaman, J. H. Rony, A. Al Amin, and S. Kar, “Stroke
Prediction Analysis using Machine Learning Classifiers and Feature Technique,” Int. J. Electron. Commun. Syst.,
vol. 1, no. 2, pp. 17-22, 2021.
INTRODUCTION
A stroke occurs when blood flow to a portion of the brain is interrupted [1]. A loss of blood
circulation to some brain areas causes a stroke, which is also known as a brain attack [2].
Blockage of a blood vessel by a clot (thrombosis) is the major cause of stroke. The portion of
the brain supplied by the blocked vessel is deprived of blood and oxygen. The brain cells die
as a result of the lack of blood and oxygen, and the part of the body they regulate ceases
working [3]. Stroke is a major cause of death and disability in the United States. Ischemic
embolic and hemorrhagic strokes account for the majority of strokes. An ischemic embolic
stroke happens when a blood clot that forms elsewhere in the body, often in the heart,
travels through the circulatory system and becomes lodged in a smaller brain artery.
The other type is hemorrhagic stroke, which occurs when a blood vessel in the brain leaks or
ruptures [4]. The use of various predictive indicators to predict the outcome of a stroke
could help doctors identify high-risk patients and reduce morbidity. Overweight, physical
inactivity, diabetes, and other parameters such as age, sex, and race can be used to predict
the possibility of stroke. In addition, machine learning offers an option, particularly
for large-scale multi-institutional data that
can be readily incorporated into a prediction [5] as new data become
available.
Smartphones can play a vital role in establishing a link between the healthcare system
and the global population. Mobile apps are user-friendly and popular in the current world.
According to statistics, there are more than 3.2 billion smartphone users. As a result, a
mobile app could be one of the most popular and effective mediums. In the case of stroke, a
disease that is avoidable through awareness, a smartphone could be the easiest way to reach
people.
Machine learning, a subfield of artificial intelligence concerned with improving
algorithms that learn how predictions depend on data, has a growing range of applications
in bioinformatics. Bioinformatics deals with computational and mathematical
approaches for understanding and processing biological data. Machine learning has been
applied in six biological domains. To assist in the analysis of stroke,
machine learning algorithms are used to examine neuroimaging data. The diagnosis and
treatment of stroke in underdeveloped countries is extremely difficult due to a lack of
diagnostic technologies and a scarcity of doctors and other resources, which impedes the
accurate prediction and treatment of patients. Recently, computer technology and
machine learning approaches have been developed with this goal in mind, to improve
a system's ability to assist doctors in the initial phases of disease decision-making [6].
Our motivation is to improve stroke prediction in order to prevent casualties and to make it
accessible to everyone.
Among the various studies in this area, the authors of [7] designed stroke prediction
scores as risk assessment tools and as interactive web-based Java applets. These Java
applets enable risk calculations and can be run interactively in any web browser that
supports Java 1.1. With this method, patient data can simply be entered into a computer,
which uses complex statistical models to produce instant risk score calculations. The
authors of [8] examined the utility of echoplanar perfusion imaging and
diffusion-weighted imaging in predicting stroke outcome in patients with acute
hemispheric infarction. Patients with type 2 diabetes have an increased risk of stroke.
The authors of [9] examined stroke predictors and the effects of atorvastatin on
certain stroke subtypes in type 2 diabetes in the Collaborative Atorvastatin Diabetes
Study, using Cox regression models to evaluate the impact of atorvastatin on stroke and
to assess the risk factors associated with stroke. Another study determined how many
self-measurements of blood pressure at home are needed, based on their predictive
value for the risk of stroke. In [10], the authors designed and compared several machine
learning methods that can predict the outcome of endovascular intervention in acute
anterior circulation ischemic stroke. The authors of [11] developed a cyber-physical
system (CPS) to detect the onset of stroke in patients who are at high risk or who have
previously survived a stroke. The CPS sends the recorded data to the doctor and issues a
warning when a stroke is detected.
Furthermore, that system works on data acquired from the patient's brain by an
electroencephalography (EEG) sensor. The authors of [12] developed a machine learning
(ML) model, evaluated against thresholding, to predict follow-up infarction in patients
with acute ischemic stroke. The authors of [13] determined the optimal number of
self-measurements of blood pressure at home based on its predictive value for stroke
risk. A Cox proportional hazards regression model, adjusted for possible confounding
factors, was used to investigate the prognostic significance of blood pressure for the
risk of stroke.
The authors of [14] developed a predictive model that is both accurate and highly
interpretable. The predictive models take the form of sparse decision lists, which
consist of a series of if ... then ... statements, where the if statement defines a
partition of a set of features and the then statement corresponds to the predicted
outcome of interest. In [15], the authors predicted stroke occurrence using a large
population-based electronic medical claims (EMC) database and compared a deep neural
network (DNN) with three other ML methods. The authors of [16] compared the Cox
proportional hazards model with an automatic stroke prediction approach on the
Cardiovascular Health Study (CHS) dataset. The authors of [17] developed a hybrid
machine learning method to predict stroke from incomplete and imbalanced physiological
data for clinical diagnosis. With this method, the whole process involves
two steps. First, random forest regression is used to impute missing values before
classification. Second, automated hyperparameter optimization (AutoHPO) based on a
deep neural network (DNN) predicts stroke on the imbalanced dataset. The authors of [18]
apply machine learning principles to large existing datasets to effectively predict
stroke based on potentially modifiable risk factors, with the aim of developing
applications that provide personalized warnings and lifestyle information about stroke
risk factors based on each user's stroke risk level. The authors of [19] hypothesized
that the degree of stenosis, plaque surface irregularity, echolucency, and texture,
combined in a total risk score (TPR), are predictors of ischemic stroke. In [20], three
classification algorithms, namely decision tree, naive Bayes, and neural network, were
used to predict stroke, producing models that performed better than general statistics
and an adequate model for identification.
This paper proposes a stroke prediction analysis using machine learning algorithms
on a healthcare dataset that includes various kinds of risk factors.
The rest of the paper is organized as follows: the methodology is stated in the next
section. The study outcome and discussion are in the results and discussion section.
Finally, the paper concludes with future scope.
METHOD
Figure 1 shows the detailed block diagram
of the proposed methodology.
Figure 1. Block Diagram of the Proposed
Methodology
Dataset Description
The utilized dataset [21] contains 5110
observations with 12 attributes. The attributes
are id, gender, age, hypertension, heart_disease,
ever_married, work_type, Residence_type,
avg_glucose_level, bmi, smoking_status,
and stroke. Stroke is the dependent variable, and
the others are independent variables.
Exploratory Data Analysis (EDA)
EDA often uses data visualization approaches to analyze and examine datasets
and summarize their key characteristics. It helps determine how best to handle
data sources to get the needed answers, making it easier for data scientists to
discover patterns, spot anomalies, test hypotheses, or check assumptions. In this
step, we identified the missing values, counted the data, dropped the id column,
and explored each variable.
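The EDA steps listed above (counting missing values, dropping the id column, and inspecting the target) can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the sample rows are hypothetical, only a few of the dataset's columns are shown, and the helper names `missing_counts`, `drop_column`, and `class_counts` are assumptions introduced here.

```python
from collections import Counter

# Hypothetical sample rows; column names follow the dataset description.
rows = [
    {"id": 1, "gender": "Male", "age": 67.0, "bmi": 36.6, "stroke": 1},
    {"id": 2, "gender": "Female", "age": 61.0, "bmi": None, "stroke": 1},
    {"id": 3, "gender": "Female", "age": 49.0, "bmi": 34.4, "stroke": 0},
    {"id": 4, "gender": "Male", "age": 30.0, "bmi": 27.1, "stroke": 0},
]


def missing_counts(records):
    """Count missing (None) values per column."""
    counts = Counter()
    for rec in records:
        for col, val in rec.items():
            counts[col] += 1 if val is None else 0
    return dict(counts)


def drop_column(records, column):
    """Return copies of the records without the given column."""
    return [{k: v for k, v in rec.items() if k != column} for rec in records]


def class_counts(records, target="stroke"):
    """Count how often each value of the target variable occurs."""
    return Counter(rec[target] for rec in records)


missing = missing_counts(rows)      # which columns have gaps
cleaned = drop_column(rows, "id")   # the id carries no predictive information
counts = class_counts(cleaned)      # class balance of the target
print(missing)
print(counts)
```

The same three checks scale to the full dataset once it is loaded into a list of records or a data frame.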
Feature Techniques
Feature engineering means transforming raw data into features that better represent
the underlying problem to the predictive models and improve model accuracy on unseen
data. Many techniques can be employed, including NearMiss, SMOTE, Tomek Links, etc.
This paper utilized the synthetic minority over-sampling technique (SMOTE) after
preprocessing the dataset in the EDA step.
The target variable has 201 stroke occurrences
and 4908 non-occurrence patients.
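To make the oversampling idea concrete, the following is a minimal sketch of the SMOTE interpolation step: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The paper does not state which implementation or parameters were used, so the function `smote`, the neighbour count `k`, and the sample values are assumptions, and only numeric features are handled.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical (age, avg_glucose_level) pairs for the minority (stroke) class.
stroke_cases = [(67.0, 228.7), (80.0, 105.9), (74.0, 70.1), (69.0, 94.4)]
new_cases = smote(stroke_cases, n_new=6)
balanced_minority = stroke_cases + new_cases
print(len(balanced_minority))
```

With the class counts reported above, fully balancing the classes would require 4908 - 201 = 4707 synthetic stroke samples.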
Machine Learning Analysis
This paper utilized various machine
learning (ML) models, including Naïve Bayes,
Random Forest, and AdaBoost. Among
them, the Random Forest model achieves
the best accuracy, so the Random Forest model is
described here.
Random Forest (RF)
RF is a supervised learning algorithm. It builds a "forest" from an ensemble of decision
trees that are usually trained with the "bagging" method. The basic premise of bagging
is that combining several learning models improves the overall result. An
advantage of RF is that it can be used for both classification and regression problems,
which make up most current machine learning systems. A random forest has almost the
same hyperparameters as a decision tree or a bagging classifier. Fortunately, there is
no need to combine a decision tree with a bagging classifier, because the random forest
classifier can be used directly. Regression tasks can be handled with the algorithm's
regressor. The RF adds additional randomness to the model
as the trees grow. Instead of searching for the most important feature when
splitting a node, it searches for the best feature among a random subset of
features. This diversity generally leads to a better model. Therefore, the
algorithm for splitting a node in a random forest considers only a
random subset of the features. The trees can be made even more random by
using random thresholds for each feature instead of searching for the best
possible threshold [22].
The random forest training algorithm applies the general technique of bootstrap
aggregating, or bagging, to tree learners. Figure 2
demonstrates the concept of a random forest
model in which Tree 1 and Tree 2 predict Class
X, so the majority vote (the predicted output) is
Class X.
Figure 2. Random Forest
After training, the prediction for an unseen sample x' can be
produced by averaging the predictions of all B
individual regression trees f_b on x':

\[ \hat{f}(x') = \frac{1}{B} \sum_{b=1}^{B} f_b(x') \]

or by taking the majority vote in the case of
classification trees [24].
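The bagging resample and the two aggregation rules described above can be sketched as follows. The three "trees" are hypothetical stand-ins (simple threshold rules, not trained models), and the function names are assumptions introduced for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (the bagging resample)."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Return the mean of the per-tree regression outputs."""
    return sum(predictions) / len(predictions)


# Hypothetical stand-ins for three trained trees: each maps a sample to a class.
trees = [
    lambda x: "X" if x["age"] > 60 else "Y",
    lambda x: "X" if x["avg_glucose_level"] > 150 else "Y",
    lambda x: "X" if x["bmi"] > 40 else "Y",
]

sample = {"age": 67, "avg_glucose_level": 228.7, "bmi": 36.6}
votes = [tree(sample) for tree in trees]   # two trees vote X, one votes Y
label = majority_vote(votes)               # the forest's prediction

rng = random.Random(0)
resample = bootstrap_sample(list(range(10)), rng)  # training set for one tree
print(votes, label)
```

Each tree in a real forest would be trained on its own bootstrap resample, so the trees disagree on some inputs and the vote smooths out their individual errors.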
User Interface
User data are collected through a mobile
app. Users enter gender, age, work_type,
heart_disease, hypertension, ever_married,
Residence_type, bmi, avg_glucose_level, and
smoking_status through the app. The mobile app
interface is shown in Figure 3.
User data are stored in the Cloud Firestore
database. After processing, the result is
stored in Firestore and shown to the
user.
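One way to picture the record that the app collects is a small typed structure that is validated and then converted to a plain dictionary, which is the shape a document database stores. This is only a sketch: the class name `StrokeInput`, the validation limits, and the example values are assumptions, and no Firestore call is shown because the paper does not describe the storage code.

```python
from dataclasses import dataclass, asdict


@dataclass
class StrokeInput:
    """The ten user-supplied fields listed in the text."""
    gender: str
    age: float
    hypertension: int       # 0 or 1
    heart_disease: int      # 0 or 1
    ever_married: str
    work_type: str
    Residence_type: str
    avg_glucose_level: float
    bmi: float
    smoking_status: str

    def validate(self):
        # The limits below are illustrative assumptions, not from the paper.
        if not 0 <= self.age <= 120:
            raise ValueError("age out of range")
        if self.hypertension not in (0, 1) or self.heart_disease not in (0, 1):
            raise ValueError("binary flags must be 0 or 1")
        if self.avg_glucose_level <= 0 or self.bmi <= 0:
            raise ValueError("measurements must be positive")
        return self


record = StrokeInput(
    "Male", 67, 0, 1, "Yes", "Private", "Urban", 228.69, 36.6, "formerly smoked"
).validate()
document = asdict(record)  # plain dict, ready to be stored as one document
print(document)
```

Validating on the client before upload keeps malformed records out of the database and away from the model.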
Figure 3. Mobile App
RESULTS AND DISCUSSION
The Python programming language is used to
implement the proposed model and the
other models used for comparison. It
is very useful for data analysis and provides
many different methods. For each model, we
used 80% of the data for training and
20% for testing. We take precision, recall, and
F1-score as performance metrics.
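The 80/20 split can be sketched as below. Whether the authors stratified the split by class is not stated; stratification is an assumption made here because the stroke class is rare and a purely random split could leave very few positive cases in the test set. The function name `stratified_split`, the seed, and the toy data are also assumptions.

```python
import random


def stratified_split(rows, labels, test_ratio=0.2, seed=42):
    """Hold out test_ratio of each class for testing, the rest for training."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


# Toy data: 80 negative and 20 positive samples.
labels = [0] * 80 + [1] * 20
rows = list(range(100))
(train_X, train_y), (test_X, test_y) = stratified_split(rows, labels)
print(len(train_X), len(test_X), test_y.count(1))
```

Both partitions keep the original class proportions, so metrics computed on the test set reflect the same imbalance as the full data.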
Precision (P): P is the ratio of the positive
cases correctly predicted to all cases predicted as positive.
A low false positive rate corresponds to high
precision. It is a measure of a classifier's
exactness. It is defined mathematically in Equation 1,
where TP is the number of true positives and FP the number of false positives.

\[ P = \frac{TP}{TP + FP} \qquad (1) \]
Recall (R): R is the ratio of positive
cases correctly predicted to all actual positive
cases. It is a measure of a
classifier's completeness. R is defined mathematically in Equation 2,
where TP is the number of true positives and FN the number of false negatives.

\[ R = \frac{TP}{TP + FN} \qquad (2) \]
F1-Score: the F1-score is the harmonic mean of precision (P) and
recall (R). When the class distribution in the dataset is uneven,
F1 is usually more useful than accuracy. It is given
in Equation 3, and the resulting
scores can be seen in Table 1.

\[ F1 = \frac{2 \cdot P \cdot R}{P + R} \qquad (3) \]
Table 1. Result Accuracies
ML Model                    Precision (%)   Recall (%)   F1-Score (%)
Logistic Regression [23]    87              87           87
DTC [24]                    93              93           93
K-NN [25]                   90              91           90
Random Forest (proposed)    96              96           96
Table 1 summarizes the results.
The Random Forest model gives the highest
score, 96%, on all performance metrics.
DTC takes second place with 93% on all metrics,
K-NN takes third place with 90% precision, 91% recall,
and 90% F1-score, and logistic regression
achieves 87% on all metrics.
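The three metrics can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below uses hypothetical label vectors and a hypothetical function name, `precision_recall_f1`; it illustrates the definitions only and does not reproduce the reported results.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, f1)
```

Here three of the four predicted positives are correct and three of the four actual positives are found, so all three metrics equal 0.75. Applying the F1 formula to a precision of 0.90 and a recall of 0.91 gives about 0.905, which is consistent with the 90% reported for K-NN in Table 1.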
CONCLUSION
This paper presented a machine learning
approach applied to a stroke dataset. The Random
Forest model showed the best performance, with
precision 96%, recall 96%, and F1-score 96%,
outperforming state-of-the-art models
including logistic regression, decision tree
classifier, and K-NN. The utilized dataset is
imbalanced; therefore, the SMOTE feature
technique is used to process the data. In the
future, we plan to analyze the dataset
using deep learning methods and try to
improve the accuracy.
REFERENCES
[1] “Stroke (Causes, Symptoms, and
Complications) - Assignment Point.”
https://www.assignmentpoint.com/science
/medical/stroke-causes-symptoms-and
(accessed May 16, 2021).
[2] Mayo Clinic, “Stroke - Symptoms and
Causes,” Mayo Clinic, Nov. 06, 2020.
https://www.mayoclinic.org/diseases-
conditions/stroke/symptoms-causes/syc-
20350113.
[3] B. Wedro, “Stroke Warning Signs,
Symptoms, Treatment, Types & Causes,”
MedicineNet, 2019.
https://www.medicinenet.com/stroke_sym
ptoms_and_treatment/article.htm
[4] H. Rodgers, “Stroke,” Neurological
Rehabilitation, pp. 427-433, 2013, doi:
10.1016/b978-0-444-52901-5.00036-8
[5] H. Asadi, R. Dowling, B. Yan, and P. Mitchell,
“Machine Learning for Outcome Prediction
of Acute Ischemic Stroke Post Intra-Arterial
Therapy,” PLoS ONE, vol. 9, no. 2, p. e88225,
Feb. 2014, doi:
10.1371/journal.pone.0088225.
[6] P. Govindarajan, R. K. Soundarapandian, A.
H. Gandomi, R. Patan, P. Jayaraman, and R.
Manikandan, “Classification of stroke
disease using machine learning algorithms,”
Neural Computing and Applications, vol. 32,
no. 3, pp. 817-828, Jan. 2019, doi:
10.1007/s00521-019-04041-y.
[7] T. Lumley, R. A. Kronmal, M. Cushman, T. A.
Manolio, and S. Goldstein, “A stroke
prediction score in the elderly,” Journal of
Clinical Epidemiology, vol. 55, no. 2, pp.
129-136, Feb. 2002, doi: 10.1016/s0895-4356(01)00434-6.
[8] P. A. Barber et al., “Prediction of stroke
outcome with echoplanar perfusion- and
diffusion-weighted MRI,” Neurology, vol. 51,
no. 2, pp. 418-426, Aug. 1998, doi:
10.1212/wnl.51.2.418.
[9] G. A. Hitman et al., “Stroke prediction and
stroke prevention with atorvastatin in the
Collaborative Atorvastatin Diabetes Study
(CARDS),” Diabetic Medicine, vol. 24, no. 12,
pp. 1313-1321, Dec. 2007, doi:
10.1111/j.1464-5491.2007.02268.x.
[10] H. Asadi, R. Dowling, B. Yan, and P. Mitchell,
“Machine Learning for Outcome Prediction
of Acute Ischemic Stroke Post Intra-Arterial
Therapy,” PLoS ONE, vol. 9, no. 2, p. e88225,
Feb. 2014, doi:
10.1371/journal.pone.0088225.
[11] A. Laghari, Z. A. Memon, S. Ullah and I.
Hussain, "Cyber Physical System for Stroke
Detection," in IEEE Access, vol. 6, pp.
37444-37453, 2018, doi:
10.1109/ACCESS.2018.2851540.
[12] H. Kuang et al., "Computed Tomography
Perfusion-Based Machine Learning Model
Better Predicts Follow-Up Infarction in
Patients With Acute Ischemic Stroke,"
Stroke, vol. 52, no. 1, pp. 223-231, Jan.
2021, doi: 10.1161/strokeaha.120.030092.
[13] T. Ohkubo et al., “How many times should
blood pressure be measured at home for
better prediction of stroke risk? Ten-year
follow-up results from the Ohasama study,”
Journal of Hypertension, vol. 22, no. 6, pp.
1099-1104, Jun. 2004, doi:
10.1097/00004872-200406000-00009.
[14] B. Letham, C. Rudin, T. H. McCormick, and D.
Madigan, “Interpretable classifiers using
rules and Bayesian analysis: Building a
better stroke prediction model,” Annals of
Applied Statistics, vol. 9, no. 3, pp. 1350-1371,
Sep. 2015, doi: 10.1214/15-AOAS848.
[15] C. Hung, W. Chen, P. Lai, C. Lin and C. Lee,
"Comparing deep neural network and other
machine learning algorithms for stroke
prediction in a large-scale population-based
electronic medical claims database," 2017
39th Annual International Conference of
the IEEE Engineering in Medicine and
Biology Society (EMBC), 2017, pp. 3110-
3113, doi: 10.1109/EMBC.2017.8037515.
[16] A. Khosla, Y. Cao, C. C.-Y. Lin, H.-K. Chiu, J.
Hu, and H. Lee, “An integrated machine
learning approach to stroke prediction,”
Proceedings of the 16th ACM SIGKDD
international conference on Knowledge
discovery and data mining - KDD ’10, 2010,
doi: 10.1145/1835804.1835830.
[17] T. Liu, W. Fan, and C. Wu, “A hybrid machine
learning approach to cerebral stroke
prediction based on imbalanced medical
dataset,” Artificial Intelligence in Medicine,
vol. 101, p. 101723, Nov. 2019, doi:
10.1016/j.artmed.2019.101723.
[18] M. Monteiro et al., "Using Machine Learning
to Improve the Prediction of Functional
Outcome in Ischemic Stroke Patients," in
IEEE/ACM Transactions on Computational
Biology and Bioinformatics, vol. 15, no. 6,
pp. 1953-1959, 1 Nov.-Dec. 2018, doi:
10.1109/TCBB.2018.2811471.
[19] P. Prati et al., “Carotid Plaque Morphology
Improves Stroke Risk Prediction:
Usefulness of a New Ultrasonographic
Score,” Cerebrovascular Diseases, vol. 31,
no. 3, pp. 300-304, 2011, doi:
10.1159/000320852.
[20] T. Kansadub, S. Thammaboosadee, S.
Kiattisin and C. Jalayondeja, "Stroke risk
prediction model based on demographic
data," 2015 8th Biomedical Engineering
International Conference (BMEiCON), 2015,
pp. 1-3, doi:
10.1109/BMEiCON.2015.7399556.
[21] “Stroke Prediction Dataset,” kaggle.com.
https://www.kaggle.com/fedesoriano/stroke-prediction-dataset.
[22] MM. Islam, MA. Kashem, J. Uddin, “Fish
survival prediction in an aquatic
environment using random forest model,”
IAES International Journal of Artificial
Intelligence (IJ-AI), vol. 10, no. 3, pp. 614-622,
2021, doi: 10.11591/ijai.v10.i3.pp614-622.
[23] Md. M. Islam, J. Uddin, M. A. Kashem, F.
Rabbi, and Md. W. Hasnat, “Design and
Implementation of an IoT System for
Predicting Aqua Fisheries Using Arduino
and KNN,” Intelligent Human Computer
Interaction, pp. 108-118, 2021, doi:
10.1007/978-3-030-68452-5_11.
[24] A. Esmael, M. Elsherief, and K. Eltoukhy,
“Predictive Value of the Alberta Stroke
Program Early CT Score (ASPECTS) in the
Outcome of the Acute Ischemic Stroke and
Its Correlation with Stroke Subtypes,
NIHSS, and Cognitive Impairment,” Stroke
Research and Treatment, vol. 2021, pp. 1-10,
Jan. 2021, doi: 10.1155/2021/5935170.
[25] C. Y. Baek, W. N. Chang, B. Y. Park, K. B. Lee,
K. Y. Kang, and M. R. Choi, “Effects of dual-
task gait treadmill training on gait ability,
dual-task interference, and fall efficacy in
people with stroke: A Randomized
Controlled Trial,” Physical Therapy, Feb.
2021, doi: 10.1093/ptj/pzab067.
... Authors in their study propose a machine learning model with K-Nearest Neighbours, Decision Tree, and Logistic Regression 36 . Exploratory data analysis is applied for preprocessing and uses the SMOTE technique to balance the dataset. ...
... Random Forest 30 achieves an impressive accuracy of 94.46% on the imbalanced dataset, while Support Vector Machine 31 reaches an accuracy of 95.49%. Additionally, Random Forest is studied [32][33][34]36 with accuracies ranging from 95.50% to 96.00%. The proposed RXLM 35 model achieves an accuracy of 96.34% on the balanced dataset. ...
Article
Full-text available
Strokes are a leading global cause of mortality, underscoring the need for early detection and prevention strategies. However, addressing hidden risk factors and achieving accurate prediction become particularly challenging in the presence of imbalanced and missing data. This study encompasses three imputation techniques to deal with missing data. To tackle data imbalance, it employs the synthetic minority oversampling technique (SMOTE). The study initiates with a baseline model and subsequently employs an extensive range of advanced models. This study thoroughly evaluates the performance of these models by employing k-fold cross-validation on various imbalanced and balanced datasets. The findings reveal that age, body mass index (BMI), average glucose level, heart disease, hypertension, and marital status are the most influential features in predicting strokes. Furthermore, a Dense Stacking Ensemble (DSE) model is built upon previous advanced models after fine-tuning, with the best-performing model as a meta-classifier. The DSE model demonstrated over 96% accuracy across diverse datasets, with an AUC score of 83.94% on imbalanced imputed dataset and 98.92% on balanced one. This research underscores the remarkable performance of the DSE model, compared to the previous research on the same dataset. It highlights the model's potential for early stroke detection to improve patient outcomes.
... In recent times, studies of data from different sensors and wearable devices like smart watches, smart glasses, etc. are bringing important insights. Smartphones are also being used to track behavioral data [78]. In the detection of mental disorders, these types of data help us to learn the association between physical or environmental responses and the mental condition of a person. ...
... After training with the processed data set, the scientists created a robust stroke incidence classifier and compared it to others. In this research work (Islam et al. 2021), the RF classifier outperformed the baseline methods in predicting stroke. Experiments are run on 5110 observations with 12 characteristics. ...
Article
Full-text available
The paper addresses the challenge of imbalanced classification in the context of cerebrovascular diseases, including stroke, transient ischemic attack (TIA), and vascular dementia. The imbalanced nature of cerebrovascular disease datasets poses significant challenges to conventional machine learning algorithms, making precise diagnosis and effective management difficult. The aim of the paper is to propose a novel approach, the INTEL_SS algorithm, which combines ensemble learning techniques with Support Vector Machine-Synthetic Minority Over-sampling Technique (SVM-SMOTE) to effectively handle the imbalanced nature of cerebrovascular disease datasets. The goal is to improve the accuracy of diagnosis and management of cerebrovascular diseases through advanced machine learning techniques. The proposed methodology involves several key steps, including preprocessing, SVM-SMOTE, and ensemble learning. Preprocessing techniques are used to improve the quality of the dataset, SVM-SMOTE is employed to address class imbalance, and ensemble learning methods such as bagging, boosting, and stacking are utilized to improve overall classification performance. The experimental results demonstrate that the INTEL_SS algorithm outperforms existing methods in terms of accuracy, precision, recall, F1-score, and AUC-ROC. Performance metrics are used to assess the effectiveness of the proposed approach, and the results consistently show the superiority of INTEL_SS compared to state-of-the-art imbalanced classification algorithms. The paper concludes that the INTEL_SS algorithm has the potential to enhance the diagnosis and management of cerebrovascular diseases, offering new opportunities to apply machine learning techniques to improve healthcare outcomes.
... The improvised random forest algorithm presented in this method provides maximum accuracy of 96.9%. Islam et al. [14] developed a system using Random forest algorithm for stroke detection. For feature analysis, the synthetic minority over-sampling approach was used. ...
Article
Full-text available
p class="CM12">This study develops a technique to predict brain strokes using magnetic resonance imaging (MRI). Worldwide, brain stroke is a leading factor in death and long-term impairment. The impact of stroke on the life of survivors is substantial, often resulting in disability. Stroke analysis performed manually takes a lot of time and is subject to intra- and inter-operator variability. Consequently, this work aims to create a computer-based system for the prediction of stroke utilizing deep learning techniques, which help in timely diagnosis. The MRI images are preferred as it provides images of good contrast and no ionizing radiations are used in this imaging method. The deep learning methods included in this proposed work are DenseNet-121, Xception, LeNet, ResNet-50 and VGG-16. The DenseNet-121 classifier outperformed other classifiers and achieved acccuracy of 96%. The outcomes of the proposed approach for stroke prediction in IOT healthcare systems show that improved performance is attained using deep learning methods.</p
Article
Full-text available
Stroke is a disease which cause the death of brain cells, so that the part of the body controlled by the brain loses its function. If not treated immediately, this disease can cause long-term disability, brain damage, and death. In this research, stroke prediction was carried out on the Stroke dataset acquired from the Kaggle dataset using various machine learning models. Then, data sampling techniques are used to handle data imbalance problems in the stroke dataset, which include Random Undersampling, Random Oversampling, and SMOTE techniques. Pearson Correlation and Principal Component Analysis are also used for dimensional reduction and analyzing the important features that are most influential in predicting stroke. Pearson Correlation produces five attributes that have the highest Pearson coefficient, namely age, hypertension, heart disease, blood sugar level, and marital status. Experimental results have demonstrated that the utilization of RUS, ROS, and SMOTE sampling techniques can significantly boost the F1-Score testing by an impressive 43.44%, 34.44%, and 35.55% respectively, as compared to experiments conducted without implementing any data sampling techniques. The highest F1-Score testing was achieved using the Support Vector Machine and Gaussian Naïve Bayes models, namely 0.83.
Article
The death of brain cells occurs when blood flow to a particular area of the brain is abruptly cut off, resulting in a stroke. Early recognition of stroke symptoms is essential to prevent strokes and promote a healthy lifestyle. FAST tests (looking for abnormalities in the face, arms, and speech) have limitations in reliability and accuracy for diagnosing strokes. This research employs machine learning (ML) techniques to develop and assess multiple ML models to establish a robust stroke risk prediction framework. This research uses a stacking-based ensemble method to select the best three machine learning (ML) models and combine their collective intelligence. An empirical evaluation of a publicly available stroke prediction dataset demonstrates the superior performance of the proposed stacking-based ensemble model, with only one misclassification. The experimental results reveal that the proposed stacking model surpasses other state-of-the-art research, achieving accuracy, precision, F1-score of 99.99%, recall of 100%, receiver operating characteristics (ROC), Mathews correlation coefficient (MCC), and Kappa scores 1.0. Furthermore, Shapley's Additive Explanations (SHAP) are employed to analyze the predictions of the black-box machine learning (ML) models. The findings highlight that age, BMI, and glucose level are the most significant risk factors for stroke prediction. These findings contribute to the development of more efficient techniques for stroke prediction, potentially saving many lives.
Conference Paper
Diabetes and heart disease are some of the most critical diseases for human beings. Lots of people are suffering from these two diseases. Early-stage diagnosing of these diseases is very essential for doctors and patients. Machine learning (ML) can play a vital role in this section. To this, ML algorithms can analyze the health data using various Data analytics tools. In this paper, we have found out the prediction of heart disease and diabetes patients. To validate the experimental analysis, we analyzed two datasets named diabetes dataset and heart disease prediction dataset in two popular analytics tools including WEKA and Python. Also, we used 6 supervised machine learning (SML) classifiers named Random forest (RF), Naive Bayes (NB), Decision Tree Classifier (DTC), Logistic regression (LR), K-NN, and support vector machine (SVM) for predicting heart and diabetes diseases. As a performance scale, we used accuracy, precision, recall, and F1 measure. In the case of diabetes disease, Random Forest outperforms the performance metrics by achieving 81% accuracy in python and DTC outperforms by placing 65% in Weka. On the other hand, in case of heart disease, LR achieves the highest score of 75% accuracy in Python and DTC gets the highest value of 79% accuracy in Weka. At last, the comparison result is shown between WEKA and Python tools in this paper. We got better results in Python than in the WEKA tool for the diabetes data set.
Article
In the real world, it is very difficult for fish farmers to select the right fish species for aquaculture in a specific aquatic environment. The main goal of this research is to build a machine learning model that can predict the most suitable fish species for an aquatic environment. In this paper, we use a model based on random forest. To validate the model, we used a dataset of aquatic environments for 11 different fish species. To predict the fish species, we used characteristics of the aquatic environment including pH, temperature, and turbidity. As performance metrics, we measured accuracy, TP rate, and kappa statistic. Experimental results demonstrate that the proposed random forest-based prediction model achieves an accuracy of 88.48%, a kappa statistic of 87.11%, and a TP rate of 88.5% on the tested dataset. In addition, we compare the proposed model with state-of-the-art models: J48, random forest, KNN, and classification and regression trees (CART). The proposed model outperforms the existing models, exhibiting a higher accuracy score, TP rate, and kappa statistic. Keywords: Accuracy prediction; Aquaculture; Fish survival; Random forest model; Supervised machine learning. This is an open-access article under the CC BY-SA license.
Chapter
This paper presents an Internet of Things (IoT) system that uses a K Nearest Neighbors machine learning model to select fish species by analyzing a fish dataset. We used a cloud server to store real-time data from the sensors. We built a dynamic website that gives information about the various fish species living in an aquatic environment. The website is connected to the cloud server, so anyone can easily view it as a web application. Users can therefore easily decide on the next step and see which kinds of fish are surviving in the water. To construct the proposed IoT system, we used five sensors: MQ7, pH, temperature, ultrasonic, and turbidity. These sensors are connected to an Arduino Uno. The real-time sensor data from the water environment is stored on the cloud server as a CSV file. In this study, we used a ThingSpeak server. End users in fish farming can easily monitor conditions remotely using the proposed IoT system.
Article
Objectives: This study aimed to correlate ASPECTS with mortality and morbidity in patients with acute middle cerebral artery territory infarction and to determine the cutoff value of ASPECTS that may predict the outcome. Methods: 150 patients diagnosed with acute middle cerebral artery territory infarction were involved in this study. Risk factors, initial NIHSS, and GCS were determined. An initial or follow-up noncontrast CT brain was done and assessed by ASPECTS. Outcomes were determined by mRS during the follow-up of cases after 3 months. Correlations of ASPECTS and outcome variables were assessed by Spearman correlation. Logistic regression analysis and ROC curve analysis were done to detect the cutoff value of ASPECTS that predicts unfavorable outcomes. Results: The most common subtypes of ischemic stroke were lacunar stroke in 66 patients (44%), cardioembolic stroke in 39 patients (26%), and LAA stroke in 30 cases (20%). Cardioembolic stroke had a statistically significantly lower ASPECTS than other types of ischemic stroke (P < 0.05). Spearman correlation showed that lower ASPECTS values (worse outcome) were more common in older patients and associated with lower initial GCS. ASPECTS values were inversely correlated with initial NIHSS, inpatient stay, inpatient complications, mortality, and mRS. The ASPECTS cutoff value determined for the prediction of unfavorable outcomes was ≤7. Binary logistic regression analysis found that ASPECTS ≤ 7 was significantly associated with an approximately fourfold increased risk of poor outcomes (OR 3.95, 95% CI 2.09-11.38, P < 0.01). Conclusions: ASPECTS is a valuable and appropriate technique for the evaluation of the prognosis in acute ischemic stroke. Patients with high ASPECTS values are more likely to attain favorable outcomes, and the cutoff value of ASPECTS is a strong predictor of unfavorable outcomes. This trial is registered with ClinicalTrials.gov NCT04235920.
Article
Background and Purpose: Prediction of infarct extent among patients with acute ischemic stroke (AIS) using computed tomography perfusion (CTP) is defined by predefined discrete CTP thresholds. Our objective is to develop a threshold-free CTP-based machine learning model to predict follow-up infarct in AIS patients. Methods: 68 patients from the PRoveIT study were used to derive a machine learning (ML) model using random forest to predict follow-up infarction voxel by voxel, and 137 patients from the HERMES study were used to test the derived ML model. Average map, Tmax, cerebral blood flow (CBF), cerebral blood volume, and time variables, including stroke onset-to-imaging and imaging-to-reperfusion time, were used as features to train the ML model. Spatial and volumetric agreement between the ML model predicted follow-up infarct and actual follow-up infarct were assessed. Relative CBF<0.3 threshold using RAPID software and time-dependent Tmax thresholds were compared to the ML model. Results: In the test cohort (137 patients), median follow-up infarct volume predicted by the ML model was 30.9 mL (interquartile range (IQR): 16.4–54.3 mL), compared to a median 29.6 mL (IQR: 11.1–70.9 mL) of actual follow-up infarct volume. The Pearson correlation coefficient between the two measurements was 0.80 (95% confidence interval: 0.74–0.86, P<0.001), while the volumetric difference was -3.2 mL (IQR: -16.7–6.1 mL). The volumetric difference with the ML model was smaller than with the rCBF<0.3 threshold and the time-dependent Tmax threshold (P<0.001). Conclusions: A machine learning model using CTP data and time estimates follow-up infarction in AIS patients better than current methods.
Article
This paper presents a prototype to classify stroke that combines text mining tools and machine learning algorithms. Machine learning can be portrayed as a significant tracker in areas like surveillance, medicine, data management with the aid of suitably trained machine learning algorithms. Data mining techniques applied in this work give an overall review about the tracking of information with respect to semantic as well as syntactic perspectives. The proposed idea is to mine patients’ symptoms from the case sheets and train the system with the acquired data. In the data collection phase, the case sheets of 507 patients were collected from Sugam Multispecialty Hospital, Kumbakonam, Tamil Nadu, India. Next, the case sheets were mined using tagging and maximum entropy methodologies, and the proposed stemmer extracts the common and unique set of attributes to classify the strokes. Then, the processed data were fed into various machine learning algorithms such as artificial neural networks, support vector machine, boosting and bagging and random forests. Among these algorithms, artificial neural networks trained with a stochastic gradient descent algorithm outperformed the other algorithms with a higher classification accuracy of 95% and a smaller standard deviation of 14.69.
Article
Stroke is one of the fatal diseases that affect the brain and cause death within 3 to 10 hours. However, most deaths caused by stroke can be avoided if intelligent health systems identify the nature of the stroke and react to it in a timely manner. State-of-the-art Cyber Physical Systems (CPS) enable interaction between the physical and computational worlds to identify any anomaly in the physical world and respond to it. The response of a CPS may vary depending upon the context of the physical world. Extensive research has been done in this area from the perspective of Wireless Sensor Networks, Body Area Networks, and wearable smart devices. This article proposes a Cyber Physical System for detecting the occurrence of stroke in patients who have a high risk of stroke or have survived a stroke before. The developed CPS sends recorded data to the doctor and alerts the doctor when a stroke occurs. The proposed system operates on data acquired from EEG sensors on the patient's brain. This article aims to decrease the human mortality rate due to stroke and to bridge the gaps in CPS caused by interdisciplinary isolation. The disciplines involved in the development of a CPS include communication networks, pattern recognition, software engineering, mathematics, and biomedical engineering, among others.
Article
Ischemic stroke is a leading cause of disability and death worldwide among adults. The individual prognosis after stroke is extremely dependent on the treatment decisions physicians take during the acute phase. In the last five years, several scores such as ASTRAL, DRAGON, and THRIVE have been proposed as tools to help physicians predict the patient's functional outcome after a stroke. These scores are rule-based classifiers that use features available when the patient is admitted to the emergency room. In this paper, we apply machine learning techniques to the problem of predicting the functional outcome of ischemic stroke patients three months after admission. We show that a pure machine learning approach achieves only a marginally higher Area Under the ROC Curve (AUC) ($0.808 \pm 0.085$) than that of the best score ($0.771 \pm 0.056$) when using the features available at admission. However, we observed that by progressively adding features available at further points in time, we can significantly increase the AUC to a value above 0.90. We conclude that the results obtained validate the use of the scores at the time of admission, but also point to the importance of using more features, which require more advanced methods, when possible.
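The AUC compared in this abstract has a simple probabilistic reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the function name `auc_from_scores` and the toy labels and scores are illustrative assumptions, not data from the cited paper.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    This O(n_pos * n_neg) form is meant for clarity, not for large datasets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs
# are ranked correctly.
example_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading differences such as 0.771 versus 0.808.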
Article
Objective: This study aimed to investigate the effects of dual-task gait training using a treadmill on gait ability, dual-task interference (DTI), and fall efficacy in people with stroke. Methods: Patients with chronic stroke (N = 34) were recruited and randomly allocated to the experimental or control group. Both groups underwent gait training on a treadmill and a cognitive task. In the experimental group, gait training was conducted in conjunction with the cognitive task, whereas in the control group, the training and the cognitive task were conducted separately. Each intervention was provided for 60 minutes, twice a week, for a period of 6 weeks for both groups. The primary outcomes were as follows: gait parameters (speed, stride, variability, and cadence) under single-task and dual-task conditions, correct response rate (CRR) under single-task and dual-task conditions, and dual-task cost (DTC) in gait parameters and CRR. The secondary outcome was the fall efficacy scale. Results: Dual-task gait training using a treadmill improved all gait parameters in the dual-task condition, speed, stride, and variability in the single-task condition, and CRR in both conditions. Differences between the groups were observed in speed, stride, and variability in the dual-task condition. Furthermore, dual-task gait training on a treadmill improved DTC in speed, variability, and cadence along with that in CRR, indicating true improvement of DTC, which led to significant improvement in DTC in speed and variability compared with single-task training. Conclusions: Dual-task gait treadmill training was more effective in improving gait ability in dual-task training and DTI than single-task training involving gait and cognitive task separately in people with chronic stroke.
Article
Background and objective: Cerebral stroke has become a significant global public health issue in recent years. The ideal solution is prevention in advance by controlling related metabolic factors. However, it is difficult for medical staff to decide whether special precautions are needed for a potential patient based only on the monitoring of physiological indicators unless they are obviously abnormal. This paper develops a hybrid machine learning approach to predict cerebral stroke for clinical diagnosis based on physiological data with incompleteness and class imbalance. Methods: The process involves two steps. First, random forest regression is adopted to impute missing values before classification. Second, an automated hyperparameter optimization (AutoHPO) based on a deep neural network (DNN) is applied to stroke prediction on an imbalanced dataset. Results: The medical dataset contains 43,400 records of potential patients, including 783 occurrences of stroke. The false negative rate of our prediction approach is only 19.1%, an average reduction of 51.5% in comparison to other traditional approaches. The false positive rate, accuracy, and sensitivity of the proposed approach are 33.1%, 71.6%, and 67.4%, respectively. Conclusion: The approach proposed in this paper has effectively reduced the false negative rate with a relatively high overall accuracy, which means a successful decrease in the misdiagnosis rate for stroke prediction. The results are more reliable and valid as a reference in stroke prognosis, and can also be acquired conveniently at a low cost.
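The class imbalance described in this abstract (783 stroke cases among 43,400 records) explains why the false negative rate is emphasized over accuracy. As a minimal sketch, the code below evaluates a hypothetical classifier that predicts "no stroke" for every record; the function name `error_rates` and this trivial baseline are assumptions for illustration and are not part of the cited study.

```python
def error_rates(tp, fp, fn, tn):
    """Accuracy, false negative rate, and false positive rate from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Share of actual stroke cases that the classifier misses.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        # Share of non-stroke cases that are wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Dataset sizes quoted in the abstract.
TOTAL_RECORDS = 43_400
STROKE_CASES = 783

# Hypothetical baseline: always predict "no stroke", so every stroke case
# becomes a false negative and every other record a true negative.
baseline = error_rates(
    tp=0,
    fp=0,
    fn=STROKE_CASES,
    tn=TOTAL_RECORDS - STROKE_CASES,
)
print(baseline)
```

This baseline scores above 98% accuracy while missing every stroke, so a model with lower overall accuracy but a much lower false negative rate can still be the more useful one clinically.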
Conference Paper
Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is growing interest in applying machine learning (ML) techniques to clinical problems, the use of deep learning in healthcare has only recently gained attention. Deep learning, such as the deep neural network (DNN), has achieved impressive results in speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexity of its framework. Furthermore, it has not yet been demonstrated to achieve better performance than other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we use a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The results show that DNN and gradient boosting decision tree (GBDT) achieve similarly high prediction accuracies, which are better than those of the logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results using smaller amounts of patient data than the GBDT method.