Stroke Prediction Analysis using Machine Learning Classifiers and Feature Technique

December 2021
International Journal of Electronics and Communications Systems 1(2):1-7

DOI:10.24042/ijecs.v1i2.10393

License
CC BY-SA 4.0

Authors:

Md. Monirul Islam

Daffodil International University

Sharmin Akter

Atish Dipankar University of Science and Technology

Md Rokunojjaman

Chongqing University of Technology

Jahid Hasan Rony

Dhaka University of Engineering & Technology

International Journal of Electronics and Communications System

Volume 1, Issue 2, 17-22.

ISSN: 2798-2610

http://ejournal.radenintan.ac.id/index.php/IJECS/index

DOI: 10.24042/ijecs.v1i2.10393

 Corresponding author:

Md. Monirul Islam, Uttara University, Dhaka-1230, BANGLADESH. monirul@uttarauniversity.edu.bd

Stroke prediction analysis using machine learning classifiers and feature technique

Md. Monirul Islam *
Uttara University, Dhaka-1230, BANGLADESH

Sharmin Akter
Atish Dipankar University of Science & Technology, Dhaka-1230, BANGLADESH

Md. Rokunojjaman
Chongqing University of Technology, Chongqing 400054, CHINA

Jahid Hasan Rony
Dhaka University of Engineering and Technology, Gazipur, Gazipur-1700, BANGLADESH

Al Amin
Chongqing University of Technology, Chongqing 400054, CHINA

Susmita Kar
Dhaka University of Engineering and Technology, Gazipur, Gazipur-1700, BANGLADESH

Article Info

Article history:
Received: October 4, 2021
Revised: December 8, 2021
Accepted: December 15, 2021

Keywords:
Feature Technique, Random Forest Classifier, Stroke disease

Abstract

Stroke is one of the fatal brain diseases and can cause death within 3 to 10 hours. However, most stroke mortality can be prevented by identifying the nature of the stroke and reacting to it promptly through smart health systems. In this paper, a machine learning approach is proposed for predicting whether a patient has a stroke, in which the Random Forest classifier outperforms state-of-the-art models including Logistic Regression, Decision Tree Classifier (DTC), and K-NN. We conduct the experiments on a dataset that has 5110 observations with 12 attributes. We also apply EDA for preprocessing and feature techniques for balancing the dataset. Finally, a cloud-based mobile app collects user data, analyzes it, and reports the possibility of stroke to alert the person, with a precision of 96%, a recall of 96%, and an F1-score of 96%. This user-friendly system can be a lifesaver, as the person gets an essential warning very easily by providing very little information from anywhere with a mobile device.

To cite this article: M. M. Islam, S. Akter, M. Rokunojjaman, J. H. Rony, A. Al Amin, and S. Kar, “Stroke Prediction Analysis using Machine Learning Classifiers and Feature Technique,” Int. J. Electron. Commun. Syst., vol. 1, no. 2, pp. 17-22, 2021.

INTRODUCTION

A stroke occurs when blood flow to a portion of the brain is interrupted [1]. This loss of blood circulation to some brain areas is also known as a brain attack [2]. The major cause of stroke is a clot blocking a blood vessel in the brain (thrombosis). The portion of the brain supplied by the blocked vessel is then deprived of blood and oxygen. The brain cells die as a result of the lack of blood and oxygen, and the part of the body they regulate stops working [3]. Stroke is a major cause of death and disability in the United States. Ischemic embolic and hemorrhagic strokes account for the majority of strokes. An ischemic embolic stroke happens when a blood clot that formed outside the brain travels through the circulatory system and becomes lodged in a smaller brain artery. The other type is hemorrhagic stroke, which occurs when a blood vessel in the brain leaks or ruptures [4]. Using various predictive indicators to predict the outcome of a stroke could help doctors identify high-risk patients and reduce morbidity. Overweight, physical inactivity, diabetes, and other parameters such as age, sex, and race can be used to predict the possibility of stroke. Machine learning offers a further option, particularly for large-scale multi-institutional data, since freshly available data can readily be included in a forecast [5].

Smartphones can play a vital role in establishing a connection between the healthcare system and the global population. Mobile apps are user-friendly and widely used. According to statistics, there are more than 3.2 billion smartphone users. As a result, a mobile app could be one of the most popular and effective mediums. Because stroke is a disease that can be avoided through awareness, a smartphone could be an easy way to reach people.

Machine learning, a subfield of artificial intelligence concerned with improving algorithms that learn how to make predictions from data, has a growing range of applications in bioinformatics. Bioinformatics deals with computational and mathematical approaches for understanding and processing biological data. Machine learning has been applied in six biological domains. To assist in the analysis of stroke, machine learning algorithms are used to examine neuroimaging data. The diagnosis and treatment of stroke in underdeveloped countries is extremely difficult owing to a lack of diagnostic technologies and a scarcity of doctors and other resources, which impedes the accurate prediction and treatment of patients. Recently, computer technology and machine learning approaches have been developed with the goal of improving systems that assist doctors in the initial phases of disease decision-making [6]. Our motivation is to improve stroke prediction in order to prevent casualties and to make it accessible to everyone.

Among the various studies in this area, stroke prediction rules were implemented in [7] as risk assessment tools in the form of interactive web-based Java applets. These Java applets perform the risk calculations and can be run interactively in any web browser that supports Java 1.1. With this method, patient data can simply be entered into a computer, which uses complex statistical models to produce risk scores instantly. The authors of [8] examined the utility of echoplanar perfusion imaging and diffusion-weighted magnetic resonance imaging in predicting stroke outcome in patients with acute hemispheric infarction. Patients with type 2 diabetes have an increased risk of stroke. The study in [9] examined stroke predictors and the effects of atorvastatin on specific stroke subtypes in type 2 diabetes within the Collaborative Atorvastatin Diabetes Study, using Cox regression models to evaluate the effect of atorvastatin on stroke and to assess the risk factors associated with stroke. Another study determined how many self-measurements of blood pressure at home are needed, based on their predictive value for stroke risk. In [10], several machine learning methods were designed and compared for predicting the outcome of endovascular intervention in acute anterior circulation ischemic stroke. The authors of [11] developed a cyber-physical system (CPS) to detect stroke onset in patients who are at high risk or who have previously survived a stroke. The CPS sends the recorded data to the doctor and raises an alert when a stroke is detected.

Furthermore, the proposed system works on data acquired from the patient's brain by an electroencephalography (EEG) sensor. The authors of [12] developed a computed tomography perfusion-based machine learning (ML) model to predict follow-up infarction in patients with acute ischemic stroke. The authors of [13] determined the optimal number of self-measurements of blood pressure at home based on their predictive value for stroke risk. To this end, a Cox proportional hazards regression model, adjusted for possible confounding factors, was used to investigate the prognostic significance of blood pressure for the risk of stroke.

The authors of [14] developed a highly accurate and highly interpretable predictive model. The model takes the form of a sparse decision list, that is, a series of if...then... statements in which each if clause defines a partition of the feature space and each then clause gives the predicted outcome of interest. In [15], the authors predicted stroke occurrence using a large population-based electronic medical claims (EMC) database and compared a deep neural network (DNN) with three other ML methods. The authors of [16] compared the Cox proportional hazards model with an automatic stroke prediction approach on the Cardiovascular Health Study (CHS) dataset. A hybrid machine learning method was developed in [17] to predict stroke from incomplete and imbalanced physiological data for clinical diagnosis. The whole process involves two steps. First, random forest regression is used to estimate missing values before classification. Second, a deep neural network (DNN) with automated hyperparameter optimization (AutoHPO) predicts stroke on the imbalanced data set. The authors of [18] applied machine learning principles to existing large data sets to predict strokes effectively from potentially modifiable risk factors, with the aim of developing applications that provide personalized warnings and lifestyle information based on each user's stroke risk level. The authors of [19] hypothesized that the degree of stenosis, the irregularity of the plaque surface, echolucency, and consistency, combined into a total risk score (TPR), are predictors of ischemic stroke. In [20], three classification algorithms, namely decision tree, naive Bayes, and neural network, were used to build stroke prediction models that go beyond general statistics, and an adequate model for identification was obtained.

This paper proposes a stroke prediction analysis that applies machine learning algorithms to a healthcare dataset containing various kinds of risk factors.

The rest of the paper is organized as follows. The methodology is stated in the next section. The study outcomes and their discussion are given in the results and discussion section. Finally, the paper concludes with the future scope.

METHOD

Figure 1 shows the detailed block diagram of the proposed methodology.

Figure 1. Block Diagram of the Proposed Methodology

Dataset Description

The dataset used [21] contains 5110 observations with 12 attributes. The attributes are id, gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, smoking_status, and stroke. The stroke attribute is the dependent variable, and the others are independent variables.

Exploratory Data Analysis (EDA)

EDA analyzes and examines data sets and summarizes their key characteristics, often using data visualization. It helps data scientists determine how best to handle the data sources to get the needed answers, making it easier to find patterns, spot anomalies, test hypotheses, and check assumptions. In this step, we identified the missing values, obtained the data counts, dropped the id column, and explored each variable.
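The EDA steps named above (counting missing values, counting classes, dropping the id column) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the paper. The four rows are made up for the example, and the assumption that a missing BMI is written as `N/A` is ours.

```python
import csv
import io
from collections import Counter

# Illustrative rows only (an assumption, not a sample of the paper's data).
# "N/A" is assumed to mark a missing BMI value.
SAMPLE_CSV = """id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
1,Male,67,0,1,Yes,Private,Urban,228.69,36.6,formerly smoked,1
2,Female,61,0,0,Yes,Self-employed,Rural,202.21,N/A,never smoked,1
3,Female,35,0,0,No,Private,Urban,85.10,24.3,smokes,0
4,Male,50,1,0,Yes,Govt_job,Rural,99.50,N/A,Unknown,0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_counts(rows, missing_tokens=("", "N/A")):
    """Count missing entries per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value.strip() in missing_tokens:
                counts[column] += 1
    return dict(counts)


def drop_column(rows, column):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


rows = load_rows(SAMPLE_CSV)
missing = missing_counts(rows)          # which columns have gaps
rows = drop_column(rows, "id")          # the id carries no predictive signal
class_counts = Counter(row["stroke"] for row in rows)
print(missing, dict(class_counts))
```

On the real dataset the same three checks would show which attributes need imputation and how unevenly the stroke classes are distributed.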

Feature Techniques

Feature engineering means transforming raw data into features that better represent the underlying problem to the predictive models and improve model accuracy on unseen data. Many techniques can be employed to deal with class imbalance, including NearMiss, SMOTE, Tomek Links, etc. This paper used the synthetic minority over-sampling technique (SMOTE) after preprocessing the dataset in the EDA step. The target variable has 201 stroke occurrences and 4908 non-occurrence patients.
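To make the idea behind SMOTE concrete, the sketch below generates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified pure-Python illustration under our own assumptions (numeric features only, Euclidean distance, toy values), not the implementation used in the paper. With the class counts reported above, a 1:1 balance would require 4908 - 201 = 4707 synthetic stroke samples.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy numeric features (age, avg_glucose_level); the values are illustrative.
majority = [(30.0, 80.0), (45.0, 95.0), (25.0, 70.0),
            (52.0, 110.0), (38.0, 88.0), (41.0, 92.0)]
minority = [(67.0, 228.0), (72.0, 190.0), (80.0, 210.0)]

new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the range spanned by the minority class. Categorical attributes such as work_type would need to be encoded first or handled by a SMOTE variant designed for them.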

Machine Learning Analysis

This paper evaluated several machine learning (ML) models, including Naïve Bayes, Random Forest, and AdaBoost. Among them, the Random Forest model achieved the best accuracy, so it is the model described here.

Random Forest (RF)

RF is a supervised learning algorithm. It creates a "forest" from a collection of decision trees that are usually trained using a "bagging" process. The basic premise of the bagging method is that combining different learning models can improve the overall result. An advantage of RF is that it can solve both classification and regression problems, which make up most existing machine learning systems. Random forests have almost the same hyperparameters as decision trees or bagging classifiers. Conveniently, a random forest classifier can be used directly instead of combining a decision tree with a bagging classifier, and the algorithm's regressor handles regression tasks. RF adds additional randomness to the model as the trees grow. Instead of searching for the most relevant feature when splitting a node, it looks for the best feature within a random selection of features. The resulting diversity generally leads to better models. In other words, the algorithm for splitting nodes in a random forest considers only a random subset of the features. The trees can be made even more random by using random thresholds for each feature instead of searching for the best possible threshold [22].

The random forest training algorithm applies the general technique of bootstrap aggregating, or bagging, to tree learners. Figure 2 demonstrates the concept of a random forest model in which Tree 1 and Tree 2 both predict Class X, so the majority vote, and hence the predicted output, is Class X.

Figure 2. Random Forest

After training, predictions for an unseen sample x' can be produced by averaging the predictions of all the individual regression trees on x':

$$\hat{f}(x') = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$$

where B is the number of trees and f_b is the b-th trained tree, or by taking the majority vote in the case of classification trees [24].
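The three ingredients described in this section (bootstrap sampling, a random subset of features per tree, and a majority vote) can be sketched in a few lines. This is a deliberately reduced illustration under our own assumptions: each "tree" is a one-level decision stump chosen by misclassification count rather than a full tree grown with an impurity criterion, and the data are toy values. It is not the model trained in the paper.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single-feature threshold split, searched only over `features`."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]             # bootstrap sample
        features = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(
            fit_stump([X[i] for i in idx], [y[i] for i in idx], features)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    return majority([predict_stump(stump, row) for stump in forest])


# Toy data: (age, avg_glucose_level) -> stroke label; values are illustrative.
X = [(25, 80), (32, 85), (41, 90), (38, 95),
     (66, 180), (71, 200), (78, 210), (82, 230)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, (20, 60)), predict_forest(forest, (90, 250)))
```

Using an odd number of trees avoids tied votes in a two-class problem. For regression, `predict_forest` would average the tree outputs instead of voting, as in the averaging formula for regression trees.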

User Interface

User data are collected through a mobile app. Users enter gender, age, work_type, heart_disease, hypertension, ever_married, Residence_type, BMI, average glucose level, and smoking_status in the app. Figure 3 shows the mobile app interface. The user data are stored in the Cloud Firestore database. After processing, the result is stored in Firestore and shown to the user.

Figure 3. Mobile App
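The paper does not describe the format in which the app submits the user's answers, so the following is only a hypothetical sketch of how the ten inputs could be validated and packed into a single record before being stored. The function name `build_record`, the allowed category values, and the field spelling are assumptions made for this illustration, and no database call is shown.

```python
# Hypothetical category values (assumed for illustration).
ALLOWED = {
    "gender": {"Male", "Female", "Other"},
    "ever_married": {"Yes", "No"},
    "work_type": {"Private", "Self-employed", "Govt_job",
                  "children", "Never_worked"},
    "Residence_type": {"Urban", "Rural"},
    "smoking_status": {"formerly smoked", "never smoked",
                       "smokes", "Unknown"},
}
NUMERIC = ("age", "avg_glucose_level", "bmi")
BINARY = ("hypertension", "heart_disease")


def build_record(form):
    """Validate raw form input and return a dict ready to be stored."""
    record = {}
    for field, allowed in ALLOWED.items():
        if form[field] not in allowed:
            raise ValueError(f"{field}: unexpected value {form[field]!r}")
        record[field] = form[field]
    for field in NUMERIC:
        value = float(form[field])
        if value <= 0:
            raise ValueError(f"{field} must be positive")
        record[field] = value
    for field in BINARY:
        if form[field] not in (0, 1):
            raise ValueError(f"{field} must be 0 or 1")
        record[field] = int(form[field])
    return record


example_form = {
    "gender": "Female", "age": "54", "hypertension": 0, "heart_disease": 0,
    "ever_married": "Yes", "work_type": "Private", "Residence_type": "Urban",
    "avg_glucose_level": "101.3", "bmi": "27.4", "smoking_status": "never smoked",
}
record = build_record(example_form)
print(record)
```

Validating on the client or server side before storage keeps malformed entries from reaching the classifier, which expects exactly the attributes it was trained on.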

RESULTS AND DISCUSSION

The Python programming language is used to implement the proposed model and the other models used for comparison. It is well suited to data analysis and offers many methods. For each model, we used 80% of the data for training and 20% for testing. We take precision, recall, and F1-score as performance metrics.
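An 80/20 split of the kind described above can be sketched as follows. The paper does not say whether the split was stratified. The version below is stratified (each class is split separately so that both parts keep similar class proportions), which is our assumption and a common choice for imbalanced data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with similar class proportions in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 40 negatives and 10 positives (illustrative only).
labels = [0] * 40 + [1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

If oversampling is used, it is usually applied to the training part only, so that the test part still reflects the original class distribution.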

Precision (P): P is the ratio of correctly predicted positive cases to all cases predicted as positive. A low false positive rate corresponds to high precision, so P is a measure of a classifier's exactness. It is defined mathematically in Equation 1, where TP and FP are the numbers of true positives and false positives.

$$P = \frac{TP}{TP + FP} \quad (1)$$

Recall (R): R is the ratio of correctly predicted positive cases to all actual positive cases. It is a measure of a classifier's completeness. R is defined mathematically in Equation 2, where TP and FN are the numbers of true positives and false negatives.

$$R = \frac{TP}{TP + FN} \quad (2)$$

F1-Score: F1 is the harmonic mean of precision and recall. When the class distribution in the data set is uneven, F1 is usually more useful than accuracy. It is defined mathematically in Equation 3, where P is precision and R is recall, and the resulting scores are listed in Table 1.

$$F1 = \frac{2 \cdot P \cdot R}{P + R} \quad (3)$$

Table 1. Result Accuracies

ML Model                     Precision (%)   Recall (%)   F1-Score (%)
Logistic Regression [23]
DTC [24]
K-NN [25]                    90              90           90
Random Forest (proposed)     96              96           96

Table 1 reports the results. The random forest model gives the highest score, 96%, on all performance metrics. DTC ranks second with 93%, K-NN ranks third with 90% on the performance metrics, and logistic regression achieves 87%.
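As a worked check of the precision, recall, and F1-score definitions used in this section, the sketch below computes the three metrics from true and predicted labels. Here TP, FP, and FN denote the counts of true positives, false positives, and false negatives. The labels are toy values chosen for the example and are unrelated to the results in Table 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = 2PR/(P+R)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, f1)  # TP = 3, FP = 1, FN = 1, so all three equal 0.75
```

When precision and recall are equal, as in this example, the harmonic mean equals the same value, which is why a model can report identical figures for all three metrics.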

CONCLUSION

This paper presented a machine learning approach to stroke prediction on a stroke dataset. The Random Forest model showed the best performance, with a precision of 96%, a recall of 96%, and an F1-score of 96%, outperforming state-of-the-art models including logistic regression, decision tree classifier, and K-NN. Because the dataset is imbalanced, SMOTE is used to balance the data. In the future, we plan to analyze the dataset using deep learning methods and try to improve the accuracy.

REFERENCES

[1] “Stroke (Causes, Symptoms, and Complications) - Assignment Point.” https://www.assignmentpoint.com/science/medical/stroke-causes-symptoms-and (accessed May 16, 2021).

[2] Mayo Clinic, “Stroke - Symptoms and Causes,” Mayo Clinic, Nov. 06, 2020. https://www.mayoclinic.org/diseases-conditions/stroke/symptoms-causes/syc-20350113.

[3] B. Wedro, “Stroke Warning Signs, Symptoms, Treatment, Types & Causes,” MedicineNet, 2019. https://www.medicinenet.com/stroke_symptoms_and_treatment/article.htm

[4] H. Rodgers, “Stroke,” Neurological Rehabilitation, pp. 427–433, 2013, doi: 10.1016/b978-0-444-52901-5.00036-8.

[5] H. Asadi, R. Dowling, B. Yan, and P. Mitchell, “Machine Learning for Outcome Prediction of Acute Ischemic Stroke Post Intra-Arterial Therapy,” PLoS ONE, vol. 9, no. 2, p. e88225, Feb. 2014, doi: 10.1371/journal.pone.0088225.

[6] P. Govindarajan, R. K. Soundarapandian, A. H. Gandomi, R. Patan, P. Jayaraman, and R. Manikandan, “Classification of stroke disease using machine learning algorithms,” Neural Computing and Applications, vol. 32, no. 3, pp. 817–828, Jan. 2019, doi: 10.1007/s00521-019-04041-y.

[7] T. Lumley, R. A. Kronmal, M. Cushman, T. A. Manolio, and S. Goldstein, “A stroke prediction score in the elderly,” Journal of Clinical Epidemiology, vol. 55, no. 2, pp. 129–136, Feb. 2002, doi: 10.1016/s0895-4356(01)00434-6.

[8] P. A. Barber et al., “Prediction of stroke outcome with echoplanar perfusion- and diffusion-weighted MRI,” Neurology, vol. 51, no. 2, pp. 418–426, Aug. 1998, doi: 10.1212/wnl.51.2.418.

[9] G. A. Hitman et al., “Stroke prediction and stroke prevention with atorvastatin in the Collaborative Atorvastatin Diabetes Study (CARDS),” Diabetic Medicine, vol. 24, no. 12, pp. 1313–1321, Dec. 2007, doi: 10.1111/j.1464-5491.2007.02268.x.

[10] H. Asadi, R. Dowling, B. Yan, and P. Mitchell, “Machine Learning for Outcome Prediction of Acute Ischemic Stroke Post Intra-Arterial Therapy,” PLoS ONE, vol. 9, no. 2, p. e88225, Feb. 2014, doi: 10.1371/journal.pone.0088225.

[11] A. Laghari, Z. A. Memon, S. Ullah, and I. Hussain, “Cyber Physical System for Stroke Detection,” IEEE Access, vol. 6, pp. 37444–37453, 2018, doi: 10.1109/ACCESS.2018.2851540.

[12] H. Kuang et al., “Computed Tomography Perfusion–Based Machine Learning Model Better Predicts Follow-Up Infarction in Patients With Acute Ischemic Stroke,” Stroke, vol. 52, no. 1, pp. 223–231, Jan. 2021, doi: 10.1161/strokeaha.120.030092.

[13] T. Ohkubo et al., “How many times should blood pressure be measured at home for better prediction of stroke risk? Ten-year follow-up results from the Ohasama study,” Journal of Hypertension, vol. 22, no. 6, pp. 1099–1104, Jun. 2004, doi: 10.1097/00004872-200406000-00009.

[14] B. Letham, C. Rudin, T. H. McCormick, and D. Madigan, “Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model,” Annals of Applied Statistics, vol. 9, no. 3, pp. 1350–1371, Sep. 2015, doi: 10.1214/15-AOAS848.

[15] C. Hung, W. Chen, P. Lai, C. Lin, and C. Lee, “Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database,” 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017, pp. 3110–3113, doi: 10.1109/EMBC.2017.8037515.

[16] A. Khosla, Y. Cao, C. C.-Y. Lin, H.-K. Chiu, J. Hu, and H. Lee, “An integrated machine learning approach to stroke prediction,” Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’10, 2010, doi: 10.1145/1835804.1835830.

[17] T. Liu, W. Fan, and C. Wu, “A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset,” Artificial Intelligence in Medicine, vol. 101, p. 101723, Nov. 2019, doi: 10.1016/j.artmed.2019.101723.

[18] M. Monteiro et al., “Using Machine Learning to Improve the Prediction of Functional Outcome in Ischemic Stroke Patients,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 15, no. 6, pp. 1953–1959, Nov.-Dec. 2018, doi: 10.1109/TCBB.2018.2811471.

[19] P. Prati et al., “Carotid Plaque Morphology Improves Stroke Risk Prediction: Usefulness of a New Ultrasonographic Score,” Cerebrovascular Diseases, vol. 31, no. 3, pp. 300–304, 2011, doi: 10.1159/000320852.

[20] T. Kansadub, S. Thammaboosadee, S. Kiattisin, and C. Jalayondeja, “Stroke risk prediction model based on demographic data,” 2015 8th Biomedical Engineering International Conference (BMEiCON), 2015, pp. 1–3, doi: 10.1109/BMEiCON.2015.7399556.

[21] “Stroke Prediction Dataset,” kaggle.com. https://www.kaggle.com/fedesoriano/stroke-prediction-dataset.

[22] M. M. Islam, M. A. Kashem, and J. Uddin, “Fish survival prediction in an aquatic environment using random forest model,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 10, no. 3, pp. 614–622, 2021, doi: 10.11591/ijai.v10.i3.pp614-622.

[23] Md. M. Islam, J. Uddin, M. A. Kashem, F. Rabbi, and Md. W. Hasnat, “Design and Implementation of an IoT System for Predicting Aqua Fisheries Using Arduino and KNN,” Intelligent Human Computer Interaction, pp. 108–118, 2021, doi: 10.1007/978-3-030-68452-5_11.

[24] A. Esmael, M. Elsherief, and K. Eltoukhy, “Predictive Value of the Alberta Stroke Program Early CT Score (ASPECTS) in the Outcome of the Acute Ischemic Stroke and Its Correlation with Stroke Subtypes, NIHSS, and Cognitive Impairment,” Stroke Research and Treatment, vol. 2021, pp. 1–10, Jan. 2021, doi: 10.1155/2021/5935170.

[25] C. Y. Baek, W. N. Chang, B. Y. Park, K. B. Lee, K. Y. Kang, and M. R. Choi, “Effects of dual-task gait treadmill training on gait ability, dual-task interference, and fall efficacy in people with stroke: A Randomized Controlled Trial,” Physical Therapy, Feb. 2021, doi: 10.1093/ptj/pzab067.

Predictive modelling and identification of key risk factors for stroke using machine learning

Article

Full-text available

May 2024

Strokes are a leading global cause of mortality, underscoring the need for early detection and prevention strategies. However, addressing hidden risk factors and achieving accurate prediction become particularly challenging in the presence of imbalanced and missing data. This study encompasses three imputation techniques to deal with missing data. To tackle data imbalance, it employs the synthetic minority oversampling technique (SMOTE). The study initiates with a baseline model and subsequently employs an extensive range of advanced models. This study thoroughly evaluates the performance of these models by employing k-fold cross-validation on various imbalanced and balanced datasets. The findings reveal that age, body mass index (BMI), average glucose level, heart disease, hypertension, and marital status are the most influential features in predicting strokes. Furthermore, a Dense Stacking Ensemble (DSE) model is built upon previous advanced models after fine-tuning, with the best-performing model as a meta-classifier. The DSE model demonstrated over 96% accuracy across diverse datasets, with an AUC score of 83.94% on imbalanced imputed dataset and 98.92% on balanced one. This research underscores the remarkable performance of the DSE model, compared to the previous research on the same dataset. It highlights the model's potential for early stroke detection to improve patient outcomes.

A comprehensive review of predictive analytics models for mental illness using machine learning algorithms

Article

Full-text available

Jun 2024

Balancing cerebrovascular disease data with integrated ensemble learning and SVM-SMOTE

Article

Full-text available

Mar 2024

The paper addresses the challenge of imbalanced classification in the context of cerebrovascular diseases, including stroke, transient ischemic attack (TIA), and vascular dementia. The imbalanced nature of cerebrovascular disease datasets poses significant challenges to conventional machine learning algorithms, making precise diagnosis and effective management difficult. The aim of the paper is to propose a novel approach, the INTEL_SS algorithm, which combines ensemble learning techniques with Support Vector Machine-Synthetic Minority Over-sampling Technique (SVM-SMOTE) to effectively handle the imbalanced nature of cerebrovascular disease datasets. The goal is to improve the accuracy of diagnosis and management of cerebrovascular diseases through advanced machine learning techniques. The proposed methodology involves several key steps, including preprocessing, SVM-SMOTE, and ensemble learning. Preprocessing techniques are used to improve the quality of the dataset, SVM-SMOTE is employed to address class imbalance, and ensemble learning methods such as bagging, boosting, and stacking are utilized to improve overall classification performance. The experimental results demonstrate that the INTEL_SS algorithm outperforms existing methods in terms of accuracy, precision, recall, F1-score, and AUC-ROC. Performance metrics are used to assess the effectiveness of the proposed approach, and the results consistently show the superiority of INTEL_SS compared to state-of-the-art imbalanced classification algorithms. The paper concludes that the INTEL_SS algorithm has the potential to enhance the diagnosis and management of cerebrovascular diseases, offering new opportunities to apply machine learning techniques to improve healthcare outcomes.

Deep learning-based approach for prediction of brain stroke from MR images for IoT in healthcare

Article

Full-text available

Jan 2024

p class="CM12">This study develops a technique to predict brain strokes using magnetic resonance imaging (MRI). Worldwide, brain stroke is a leading factor in death and long-term impairment. The impact of stroke on the life of survivors is substantial, often resulting in disability. Stroke analysis performed manually takes a lot of time and is subject to intra- and inter-operator variability. Consequently, this work aims to create a computer-based system for the prediction of stroke utilizing deep learning techniques, which help in timely diagnosis. The MRI images are preferred as it provides images of good contrast and no ionizing radiations are used in this imaging method. The deep learning methods included in this proposed work are DenseNet-121, Xception, LeNet, ResNet-50 and VGG-16. The DenseNet-121 classifier outperformed other classifiers and achieved acccuracy of 96%. The outcomes of the proposed approach for stroke prediction in IOT healthcare systems show that improved performance is attained using deep learning methods.</p

Analysis of Data and Feature Processing on Stroke Prediction using Wide Range Machine Learning Model

Article

Full-text available

Apr 2024

Stroke is a disease which cause the death of brain cells, so that the part of the body controlled by the brain loses its function. If not treated immediately, this disease can cause long-term disability, brain damage, and death. In this research, stroke prediction was carried out on the Stroke dataset acquired from the Kaggle dataset using various machine learning models. Then, data sampling techniques are used to handle data imbalance problems in the stroke dataset, which include Random Undersampling, Random Oversampling, and SMOTE techniques. Pearson Correlation and Principal Component Analysis are also used for dimensional reduction and analyzing the important features that are most influential in predicting stroke. Pearson Correlation produces five attributes that have the highest Pearson coefficient, namely age, hypertension, heart disease, blood sugar level, and marital status. Experimental results have demonstrated that the utilization of RUS, ROS, and SMOTE sampling techniques can significantly boost the F1-Score testing by an impressive 43.44%, 34.44%, and 35.55% respectively, as compared to experiments conducted without implementing any data sampling techniques. The highest F1-Score testing was achieved using the Support Vector Machine and Gaussian Naïve Bayes models, namely 0.83.

A stroke prediction framework using explainable ensemble learning

Article

Feb 2024
COMPUT METHOD BIOMEC

The death of brain cells occurs when blood flow to a particular area of the brain is abruptly cut off, resulting in a stroke. Early recognition of stroke symptoms is essential to prevent strokes and promote a healthy lifestyle. FAST tests (looking for abnormalities in the face, arms, and speech) have limitations in reliability and accuracy for diagnosing strokes. This research employs machine learning (ML) techniques to develop and assess multiple ML models to establish a robust stroke risk prediction framework. This research uses a stacking-based ensemble method to select the best three machine learning (ML) models and combine their collective intelligence. An empirical evaluation of a publicly available stroke prediction dataset demonstrates the superior performance of the proposed stacking-based ensemble model, with only one misclassification. The experimental results reveal that the proposed stacking model surpasses other state-of-the-art research, achieving accuracy, precision, F1-score of 99.99%, recall of 100%, receiver operating characteristics (ROC), Mathews correlation coefficient (MCC), and Kappa scores 1.0. Furthermore, Shapley's Additive Explanations (SHAP) are employed to analyze the predictions of the black-box machine learning (ML) models. The findings highlight that age, BMI, and glucose level are the most significant risk factors for stroke prediction. These findings contribute to the development of more efficient techniques for stroke prediction, potentially saving many lives.

Advanced Interoperable Framework for Real-Time Predictive Analysis Leveraging Machine Learning and IoT in Smart Health Monitoring Systems

Conference Paper

Nov 2023

Optimization Based Random Forest Algorithm Modification for Detecting Monkeypox Disease

Conference Paper

Oct 2023

Diabetes and heart disease prediction using machine learning classifiers based on Weka, python

Conference Paper

Jan 2023

Diabetes and heart disease are among the most critical diseases affecting human beings, and many people suffer from them. Early-stage diagnosis of these diseases is essential for doctors and patients. Machine learning (ML) can play a vital role in this area: ML algorithms can analyze health data using various data analytics tools. In this paper, we predict heart disease and diabetes in patients. To validate the experimental analysis, we analyzed two datasets, a diabetes dataset and a heart disease prediction dataset, in two popular analytics tools, WEKA and Python. We used six supervised machine learning (SML) classifiers, namely Random Forest (RF), Naive Bayes (NB), Decision Tree Classifier (DTC), Logistic Regression (LR), K-NN, and Support Vector Machine (SVM), to predict heart disease and diabetes. As performance measures, we used accuracy, precision, recall, and F1 measure. For diabetes, Random Forest performs best in Python with 81% accuracy, and DTC performs best in WEKA with 65%. For heart disease, LR achieves the highest accuracy in Python at 75%, and DTC achieves the highest accuracy in WEKA at 79%. Finally, the paper compares the results obtained with the WEKA and Python tools. We obtained better results in Python than in WEKA for the diabetes dataset.

Stroke prediction analysis using federated machine learning

Conference Paper

Jul 2023

Fish survival prediction in an aquatic environment using random forest model

Article

Sep 2021

In the real world, it is very difficult for fish farmers to select the right fish species for aquaculture in a specific aquatic environment. The main goal of this research is to build a machine learning model that can predict the most suitable fish species for an aquatic environment. In this paper, we use a model based on random forest. To validate the model, we used a dataset of aquatic environments for 11 different fish species. To predict the fish species, we used different characteristics of the aquatic environment, including pH, temperature, and turbidity. As performance metrics, we measured accuracy, TP rate, and kappa statistic. Experimental results demonstrate that the proposed random forest-based prediction model achieves an accuracy of 88.48%, a kappa statistic of 87.11%, and a TP rate of 88.5% on the tested dataset. In addition, we compare the proposed model with the state-of-the-art models J48, random forest, KNN, and classification and regression trees (CART). The proposed model outperforms the existing models, exhibiting a higher accuracy, TP rate, and kappa statistic. Keywords: Accuracy prediction, Aquaculture, Fish survival, Random forest model, Supervised machine learning. This is an open-access article under the CC BY-SA license.

Design and Implementation of an IoT System for Predicting Aqua Fisheries Using Arduino and KNN

Chapter

Feb 2021

This paper presents an Internet of Things (IoT) system that uses a K-Nearest Neighbors machine learning model to select fish species by analyzing a fish dataset. We used a cloud server to store real-time data from the sensors. We built a dynamic website that gives information on the various fish species living in an aquatic environment. The website is connected to the cloud server, so anyone can easily view it as a web application. Users can therefore easily decide on the next step and see which kinds of fish can survive in the water. To construct the proposed IoT system, we used five sensors: MQ7, pH, temperature, ultrasonic, and turbidity. These sensors are connected to an Arduino Uno. The real-time sensor data on the water environment is stored on the cloud server as a CSV file. In this study, we used a ThingSpeak server. End users in fish farming can easily monitor conditions remotely using the proposed IoT system.

Predictive Value of the Alberta Stroke Program Early CT Score (ASPECTS) in the Outcome of the Acute Ischemic Stroke and Its Correlation with Stroke Subtypes, NIHSS, and Cognitive Impairment

Article

Jan 2021

Objectives: This study is aimed at correlating ASPECTS with mortality and morbidity in patients with acute middle cerebral artery territory infarction and at determining the cutoff value of ASPECTS that may predict the outcome. Methods: 150 patients diagnosed with acute middle cerebral artery territory infarction were involved in this study. Risk factors, initial NIHSS, and GCS were determined. An initial or follow-up noncontrast CT brain was done and assessed by ASPECTS. Outcomes were determined by mRS during the follow-up of cases after 3 months. Correlations of ASPECTS and outcome variables were done by Spearman correlation. Logistic regression analysis and ROC curve were done to detect the cutoff value of ASPECTS that predicts unfavorable outcomes. Results: The most common subtypes of ischemic strokes were lacunar stroke in 66 patients (44%), cardioembolic stroke in 39 patients (26%), and LAA stroke in 30 cases (20%). The cardioembolic stroke had a statistically significant lower ASPECT score than other types of ischemic strokes (P < 0.05). Spearman correlation showed that lower ASPECTS values (worse outcome) were more in older patients and associated with lower initial GCS. ASPECTS values were inversely correlated with initial NIHSS, inpatient stay, inpatient complications, mortality, and mRS. The ASPECTS cutoff value determined for the prediction of unfavorable outcomes was equal to ≤7. The binary logistic regression analysis detected that patients with ASPECTS ≤ 7 were significantly associated with about fourfold increased risk of poor outcomes (OR 3.95, 95% CI 2.09-11.38, and P < 0.01). Conclusions: ASPECTS is a valuable and appropriate technique for the evaluation of the prognosis in acute ischemic stroke. Patients with high ASPECTS values are more likely to attain favorable outcomes, and the cutoff value of ASPECTS is a strong predictor for unfavorable outcomes. This trial is registered with ClinicalTrials.gov NCT04235920.

CT Perfusion Based Machine Learning Model Better Predicts Follow-up Infarction in Patients with Acute Ischemic Stroke

Article

Dec 2020
STROKE

Background and Purpose: Prediction of infarct extent among patients with acute ischemic stroke (AIS) using computed tomography perfusion (CTP) is defined by predefined discrete CTP thresholds. Our objective is to develop a threshold-free CTP-based machine learning model to predict follow-up infarct in AIS patients. Methods: 68 patients from the PRoveIT study were used to derive a machine learning (ML) model using random forest to predict follow-up infarction voxel by voxel, and 137 patients from the HERMES study were used to test the derived ML model. Average map, Tmax, cerebral blood flow (CBF), cerebral blood volume, and time variables, including stroke onset-to-imaging and imaging-to-reperfusion time, were used as features to train the ML model. Spatial and volumetric agreement between the ML model predicted follow-up infarct and actual follow-up infarct were assessed. The relative CBF<0.3 threshold using RAPID software and time-dependent Tmax thresholds were compared to the ML model. Results: In the test cohort (137 patients), median follow-up infarct volume predicted by the ML model was 30.9 mL (interquartile range (IQR): 16.4–54.3 mL), compared to a median 29.6 mL (IQR: 11.1–70.9 mL) of actual follow-up infarct volume. The Pearson correlation coefficient between the two measurements was 0.80 (95% confidence interval: 0.74–0.86, P<0.001), while the volumetric difference was -3.2 mL (IQR: -16.7–6.1 mL). The volumetric difference with the ML model was smaller than with the rCBF<0.3 threshold and the time-dependent Tmax threshold (P<0.001). Conclusions: A machine learning model using CTP data and time estimates follow-up infarction in AIS patients better than current methods.

RETRACTED ARTICLE: Classification of stroke disease using machine learning algorithms

Article

Jan 2019
NEURAL COMPUT APPL

This paper presents a prototype to classify stroke that combines text mining tools and machine learning algorithms. Machine learning can be portrayed as a significant tracker in areas like surveillance, medicine, data management with the aid of suitably trained machine learning algorithms. Data mining techniques applied in this work give an overall review about the tracking of information with respect to semantic as well as syntactic perspectives. The proposed idea is to mine patients’ symptoms from the case sheets and train the system with the acquired data. In the data collection phase, the case sheets of 507 patients were collected from Sugam Multispecialty Hospital, Kumbakonam, Tamil Nadu, India. Next, the case sheets were mined using tagging and maximum entropy methodologies, and the proposed stemmer extracts the common and unique set of attributes to classify the strokes. Then, the processed data were fed into various machine learning algorithms such as artificial neural networks, support vector machine, boosting and bagging and random forests. Among these algorithms, artificial neural networks trained with a stochastic gradient descent algorithm outperformed the other algorithms with a higher classification accuracy of 95% and a smaller standard deviation of 14.69.

Cyber Physical System for Stroke Detection

Article

Jun 2018

Stroke is one of the fatal diseases that affect the brain and cause death within 3 to 10 hours. However, most of the deaths caused by a stroke can be avoided by identifying the nature of the stroke and reacting to it in a timely manner through intelligent health systems. State-of-the-art Cyber Physical Systems (CPS) enable interaction between the physical and computational worlds to identify any anomaly in the physical world and respond to it. The response of a CPS may vary depending upon the context of the physical world. Extensive research has been done in this area from the perspective of Wireless Sensor Networks, Body Area Networks, and wearable smart devices. This article proposes a Cyber Physical System for detecting the occurrence of stroke in patients who have a high risk of stroke or have survived a stroke before. The developed CPS sends recorded data to the doctor and alerts the doctor when a stroke occurs. The proposed system operates on EEG sensor data acquired from the patient's brain. This article aims to decrease the human mortality rate due to stroke and to bridge the gaps in CPS caused by interdisciplinary isolation. The disciplines involved in the development of a CPS include communication networks, pattern recognition, software engineering, mathematics, and biomedical fields.

Using Machine Learning to Improve the Prediction of Functional Outcome in Ischemic Stroke Patients

Article

Mar 2018

Ischemic stroke is a leading cause of disability and death worldwide among adults. The individual prognosis after stroke is extremely dependent on treatment decisions physicians take during the acute phase. In the last five years, several scores such as the ASTRAL, DRAGON, and THRIVE have been proposed as tools to help physicians predict the patient functional outcome after a stroke. These scores are rule-based classifiers that use features available when the patient is admitted to the emergency room. In this paper, we apply machine learning techniques to the problem of predicting the functional outcome of ischemic stroke patients, three months after admission. We show that a pure machine learning approach achieves only a marginally superior Area Under the ROC Curve (AUC) ( $0.808\pm 0.085$ ) than that of the best score ( $0.771\pm 0.056$ ) when using the features available at admission. However, we observed that by progressively adding features available at further points in time, we can significantly increase the AUC to a value above 0.90. We conclude that the results obtained validate the use of the scores at the time of admission, but also point to the importance of using more features, which require more advanced methods, when possible.
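The Area Under the ROC Curve (AUC) compared in the abstract above equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The following is a minimal sketch of that pairwise definition. The function name `roc_auc` is hypothetical, and the quadratic loop is chosen for readability rather than speed. It is not the evaluation code used in the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth outcomes
    scores: iterable of predicted scores (higher means more likely positive)
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four positive-negative pairs correctly and returns 0.75.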

Effects of Dual-Task Gait Treadmill Training on Gait Ability, Dual-Task Interference, and Fall Efficacy in People With Stroke: A Randomized Controlled Trial

Article

Feb 2021

Objective: This study aimed to investigate the effects of dual-task gait training using a treadmill on gait ability, dual-task interference, and fall efficacy in people with stroke. Methods: Patients with chronic stroke (N = 34) were recruited and randomly allocated to the experimental or control group. Both groups underwent gait training on a treadmill and a cognitive task. In the experimental group, gait training was conducted in conjunction with the cognitive task, whereas in the control group, the training and the cognitive task were conducted separately. Each intervention was provided for 60 minutes, twice a week, for a period of 6 weeks for both groups. The primary outcomes were as follows: gait parameters (speed, stride, variability, and cadence) under single-task and dual-task conditions, correct response rate (CRR) under single-task and dual-task conditions, and dual-task cost (DTC) in gait parameters and CRR. The secondary outcome was the fall efficacy scale. Results: Dual-task gait training using a treadmill improved all gait parameters in the dual-task condition, speed, stride, and variability in the single-task condition, and CRR in both conditions. Difference between the groups was observed in speed, stride, and variability in the dual-task condition. Furthermore, dual-task gait training on a treadmill improved DTC in speed, variability, and cadence along with that in CRR, indicating true improvement of DTC, which led to significant improvement in DTC in speed and variability compared with single-task training. Conclusions: Dual-task gait treadmill training was more effective in improving gait ability in dual-task training and DTI than single-task training involving gait and cognitive task separately in people with chronic stroke.

A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset

Article

Oct 2019
ARTIF INTELL MED

Background and objective: Cerebral stroke has become a significant global public health issue in recent years. The ideal solution to this concern is to prevent it in advance by controlling related metabolic factors. However, it is difficult for medical staff to decide whether special precautions are needed for a potential patient based only on the monitoring of physiological indicators unless they are obviously abnormal. This paper develops a hybrid machine learning approach to predict cerebral stroke for clinical diagnosis based on physiological data with incompleteness and class imbalance. Methods: Two steps are involved in the whole process. Firstly, random forest regression is adopted to impute missing values before classification. Secondly, an automated hyperparameter optimization (AutoHPO) based on a deep neural network (DNN) is applied to stroke prediction on an imbalanced dataset. Results: The medical dataset contains 43,400 records of potential patients, which include 783 occurrences of stroke. The false negative rate of our prediction approach is only 19.1%, a reduction of 51.5% on average in comparison to other traditional approaches. The false positive rate, accuracy, and sensitivity of the proposed approach are 33.1%, 71.6%, and 67.4%, respectively. Conclusion: The approach proposed in this paper has effectively reduced the false negative rate with a relatively high overall accuracy, which means a successful decrease in the misdiagnosis rate for stroke prediction. The results are more reliable and valid as a reference in stroke prognosis, and can also be acquired conveniently at a low cost.

Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database

Conference Paper

Jul 2017

Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep learning in healthcare has only recently gained attention. Deep learning, such as the deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities in its framework. Furthermore, this method has not yet been demonstrated to achieve better performance than other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The results show that DNN and gradient boosting decision tree (GBDT) achieve similarly high prediction accuracies, which are better than those of the logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results using smaller amounts of patient data than the GBDT method.
