* For correspondence.
Journal of Environmental Protection and Ecology 22, No 5, 1825–1835 (2021)
Atmospheric pollution
NOVEL HYBRID ARTIFICIAL INTELLIGENCE-BASED
ALGORITHM TO DETERMINE THE EFFECTS OF AIR
POLLUTION ON HUMAN ELECTROENCEPHALOGRAM
SIGNALS
M. A. BERLINa*, NIRAJ UPADHAYAYAb, ALI ALGAHTANIc,
VINEET TIRTHc, SAIFUL ISLAMd, K. MURALIe,
PRAVIN R. KSHIRSAGARf*, BUI THANH HUNGg,
PRASUN CHAKRABARTIh, PANKAJ DADHEECHi
aDepartment of Computer Science and Engineering, RMD Engineering College,
Chennai, Tamil Nadu, India
E-mail: mba.cse@rmd.ac.in
bDepartment of Computer Science, J. B. Institute of Engineering and
Technology, 500 075 Telangana, India
cDepartment of Mechanical Engineering, College of Engineering, King Khalid
University, 61 411 Abha, Kingdom of Saudi Arabia; Research Centre for
Advanced Materials Science (RCAMS), King Khalid University, Guraiger,
61 413 Abha, Asir, Kingdom of Saudi Arabia
dDepartment of Civil Engineering College of Engineering, King Khalid
University, 61 413 Abha, Asir, Kingdom of Saudi Arabia
eDepartment of Electronics and Communication Engineering, Vijaya Institute
of Technology for Women, Enikepadu, Andhra Pradesh, Hyderabad, Telengana,
India
fDepartment of Electronics and Communication Engineering, AVN Institute of
Technology & Engineering, Hyderabad, Telengana, India
gData Analytics & Artificial Intelligence Laboratory – DAAI Lab, Artificial
Intelligence and Information System Programme, Institute of Engineering –
Technology, Thu Dau Mot University, 06 Tran Van On Street, Phu Hoa District,
Thu Dau Mot City, Binh Duong Province, Vietnam
hProvost and Institute Endowed Distinguished Senior Chair, Techno India NJR
Institute of Technology, 313 003 Udaipur, Rajasthan, India
iComputer Science and Engineering, Swami Keshvanand Institute of Technology
(SKIT), Management and Gramothan, 302 017 Jaipur, Rajasthan, India
Abstract. Air pollution is a serious global issue. The complicated, heterogeneous mix of gases,
fluids, and particulate matter, with growing pollutant emissions from cars, manufacturers, and
houses, damages the atmosphere and also damages human health. In preliminary research, the risk
of coronary heart disease was consistently raised for short- and long-term exposure to existing
levels of ambient particulate matter. These challenges can be modelled with artificial
intelligence (AI). The benefit of AI methods is that they can resolve the problem without
knowledge of the conceptual link between the input and output data, even when the data are
partial. The AI approach may be used to accurately identify coronary disease and numerous other
diseases, and so help to prevent heart problems. This paper provides a comprehensive analysis of
the data on air pollution and cardiovascular disease for healthcare experts and regulatory
authorities, and also differentiates individuals with cardiovascular disease from healthy
individuals. A decision support system based on AI technology can help physicians better diagnose
heart patients. With features chosen by the correlation-based feature subset selection algorithm,
the logistic regression classifier with 20-fold cross-validation exhibited the greatest accuracy,
92%.
Keywords: environmental factors, air pollution, cardiovascular diseases, artificial intelligence (AI),
correlation-based feature subset selection, relief feature selection algorithm, logistic regression, Naïve
Bayes, k-nearest neighbour algorithm.
AIMS AND BACKGROUND
Air pollution is an important risk factor for the respiratory and cardiovascular systems
and for the environmental conditions that affect human health. Environmental protection
authorities in different nations have recently laid down national ambient air quality
standards for the preservation of public health and the environment1,2. Particulate matter
(PM), ground-level ozone (O3), carbon monoxide (CO), sulphur oxides (SO2), nitrogen oxides
(NOx) and lead are defined in these standards as “critical pollutants”. Among these six
contaminants, O3 is one of the most toxic, with detrimental impacts on human health and
agricultural development3. Several researchers in the field of air pollution prediction have
recently developed and applied prediction models, such as statistical models, artificial
neural networks (ANN), grey models, and other hybrid techniques4–6. With such predictions,
governments, companies, and households can not only respond more promptly and economically
to air pollution issues but also gain more insight into more expensive corrective actions6–8.
Kshirsagar et al.4,9,10 discussed the electrocardiogram (ECG), a bio-electric signal that
records the electrical activity of the heart over time. Timely and precise diagnosis is
essential in identifying cardiac illnesses and choosing a suitable therapy for a patient.
ECG signals are utilised as parameters for cardiac disease identification, with the majority
of the data being derived from the MIT-BIH dataset and Physio Data Net. The ECG signal is
preprocessed using the Wavelet toolbox, which is also utilised to remove noise from the ECG
signal. Animesh Hazra et al.11 reviewed existing research on cardiac disease prediction
utilising data mining methods, analysed the combinations of mining algorithms employed, and
determined which strategies are beneficial and successful. In addition, certain possible
improvements to the prediction model were discussed. Suriya Beguma et al.12 addressed the
application of several techniques to a heart disease dataset. The Random Forest classifier,
a machine learning method, demonstrated its accuracy and reliability in their proposed
system. Skrzypski et al.1 suggested an air pollution prediction system based on artificial
neural networks. The level of the toxic gases, or a given range of levels (class of air
quality), can be predicted2. Comparable results suggest that air pollution is a key avoidable
cause of increasing respiratory problems and irregular heartbeats.
EXPERIMENTAL
Hazardous air pollutants cause heart disorders such as the arterial blockage behind a heart
attack (arterial occlusion) and the death of cardiac tissue through lack of oxygen, resulting
in a potential loss of heart function (infarct formation). Various classification systems for
the diagnosis of cardiac disease have been evaluated on the complete feature set and on
selected features. Feature selection algorithms such as Relief, maximum relevance – minimum
redundancy, and the least absolute shrinkage and selection operator (LASSO) have been used,
and the performance of the classifiers has been checked on the selected features. Logistic
regression (LR), k-nearest neighbour (K-NN), and Naïve Bayes (NB) were used as common
artificial intelligence classifiers11,12. The approach for model validation and the analysis
of the performance measures are shown further in Fig. 2.
Several possible mechanisms have been proposed by which air pollution has immediate effects
on the cardiovascular system, blood, and lung receptors, as indicated in Fig. 1, or negative
impacts caused by oxidative stress and chronic inflammation. Altered autonomic function may
also lead to arterial plaque destabilisation and the onset of heart arrhythmias. Severe
cardiovascular reactions to PM exposure, such as myocardial infarction, may be a consequence
of these direct effects1.
Particulate matter (PM) is a collection of solid and liquid particles which influence human
health significantly depending on their size: PM10, with diameters < 10 μm, reaches the
lungs, while PM2.5, with diameters < 2.5 μm, penetrates further into the lung (Ref. 6).
[Figure 1 diagram. Pathway panel: air pollution (PM10, PM2.5) → lung inflammation, blood
translocation, autonomic regulation → heart disease. System panel: heart disease database →
pre-processing → feature extraction and reduction (CFS, Relief, LASSO) → AI algorithm (LR,
KNN, NB) with the k-fold cross-validation method → model prediction: existence or absence of
heart disease caused due to air pollution.]
Fig. 1. Block diagram of a hybrid intelligent system for predicting heart disease caused due to air
pollution
Database. The largest dataset available from the 2016 PhysioNet Challenge was used to test
the proposed technique. The PhysioNet database comprises six datasets (A to F) containing
3240 raw heart sound samples in total. These samples were obtained separately by various
research groups in different countries, in clinical and non-clinical (i.e. home visit)
settings, using heterogeneous sensor devices9. The dataset includes clean and noisy heart
sounds. The recordings were obtained from both healthy individuals and patients with several
cardiac problems, in particular coronary artery disease and heart valve disease. Subjects
included children, adults, and elderly people from various age groups. The heart sound
recordings ranged in length from 5 s to just over 120 s (Ref. 11). The typical sounds in the
heart normally consist of a first sound (S1) and a second sound (S2). S1 arises when the
mitral and tricuspid valves close due to the sudden increase in pressure within the
ventricles at the start of the isovolumetric ventricular contraction. S2 arises when the
aortic and pulmonary valves close at the beginning of diastole12.
Preprocessing of heart sound. The heart sound recorded by an electronic stethoscope often
contains background noise. The preprocessing of cardiac sound is a critical phase in the
automatic analysis of heartbeat recordings. It exposes the underlying physiological structure
of the cardiac signal through the identification of anomalies in the meaningful regions of
the phonocardiogram (PCG) signal and enables the automated detection of pathological events7.
Segmentation is the detection of the precise locations of the first and second heart sounds
(S1 and S2). The key objective of this procedure is to ensure that the incoming heartbeat is
aligned correctly before classification, because this improves the recognition score
significantly. The input of this method is a sequence of ECG beats. We take a few simple
steps to extract ECG beats from a given ECG signal4,5:
1. Normalise the ECG signal to the range from zero to one.
2. Find the set of R-peaks of the electrocardiogram from the corresponding annotation file
in the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia
database.
3. Split the continuous ECG signal into a sequence of heartbeats based on the peaks
obtained, and attach a label from the annotation file to every heartbeat.
4. Resize each heartbeat to a fixed length (280 samples).
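The four steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, the peak positions and labels are assumed to have been read from an annotation file already, and cutting the signal at the midpoints between consecutive peaks is an assumed segmentation rule that the text does not specify.

```python
import numpy as np

def normalise(signal):
    """Step 1: scale a 1-D signal linearly to the range [0, 1]."""
    signal = np.asarray(signal, dtype=float)
    span = signal.max() - signal.min()
    if span == 0:
        return np.zeros_like(signal)
    return (signal - signal.min()) / span

def split_beats(signal, peaks, labels):
    """Step 3: cut the signal at the midpoints between consecutive peaks
    (assumed rule) and pair every segment with its annotation label."""
    peaks = np.asarray(peaks)
    mids = (peaks[:-1] + peaks[1:]) // 2
    bounds = np.concatenate(([0], mids, [len(signal)]))
    return [(signal[bounds[i]:bounds[i + 1]], labels[i])
            for i in range(len(peaks))]

def resize(beat, length=280):
    """Step 4: resample one beat to a fixed length by linear interpolation."""
    old = np.linspace(0.0, 1.0, num=len(beat))
    new = np.linspace(0.0, 1.0, num=length)
    return np.interp(new, old, beat)

def extract_beats(signal, peaks, labels, length=280):
    """Run steps 1, 3 and 4; step 2 (reading the peaks) is done by the caller."""
    scaled = normalise(signal)
    beats = split_beats(scaled, peaks, labels)
    X = np.array([resize(beat, length) for beat, _ in beats])
    y = [label for _, label in beats]
    return X, y
```

Each row of the returned array is one fixed-length heartbeat, so the array can be passed directly to a feature selection or classification routine.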
FEATURE EXTRACTION AND REDUCTION
One of the main steps in classification is to extract and reduce features, since
classification cannot be accurate unless the right features are selected. This
dimensionality reduction helps to reduce the database size while speeding up the inference
procedure, for a large database in particular. For that purpose, there are various
algorithms: correlation-based feature subset selection (CFS); the Relief feature selection
algorithm; and the least absolute shrinkage and selection operator (LASSO).
Correlation-based feature subset selection (CFS): “A good feature subset contains features
that are uncorrelated with (not predictive of) one another, yet highly correlated with the
class”13,14. A feature evaluation function, based on test theory assumptions, gives the
estimate defined in equation (1):
$$ r_{fc} = \frac{k\,\overline{r_{fc}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}} \tag{1} $$
where $r_{fc}$ is the correlation between the summed features and the class variable; $k$ –
the number of features; $\overline{r_{fc}}$ – the average of the correlations between the
features and the class variable; $\overline{r_{ff}}$ – the average inter-correlation between
features.
Continuous features are converted into categorical attributes using supervised discretisation
as a preprocessing step before they are used in classification tasks, so that equation (1)
can be applied uniformly to nominal or categorical and to continuous or ordinal features.
Information gain is used to estimate the degree of association between nominal features.
Furthermore, 2^n subsets of the features are possible (n is the number of initial features).
Particularly for a large feature set, it would therefore be unrealistic to examine every
subset to find the best one. In different applications, both filter-type and wrapper-type
feature selection methods use correlation-based approaches.
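As an illustration of equation (1), the sketch below computes the merit of a feature subset and uses it in a greedy forward search, which avoids enumerating every subset. It rests on stated assumptions: the absolute Pearson correlation is swapped in for the information-gain-based measure described above, no feature is constant, and the function names are hypothetical.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Merit of a feature subset according to equation (1).
    Assumption: correlations are absolute Pearson coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = len(subset)
    if k == 0:
        return 0.0
    # Average feature-class correlation.
    r_fc = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_fc)
    # Average feature-feature inter-correlation over all pairs.
    pairs = [abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
             for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean(pairs)
    return float(k * r_fc / np.sqrt(k + k * (k - 1) * r_ff))

def greedy_cfs(X, y):
    """Forward selection: keep adding the feature that raises the merit most,
    and stop as soon as no remaining feature improves it."""
    remaining = list(range(np.asarray(X).shape[1]))
    chosen, best = [], 0.0
    while remaining:
        score, j = max((cfs_merit(X, y, chosen + [j]), j) for j in remaining)
        if score <= best:
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen, best
```

The denominator grows with the inter-correlation of the chosen features, so a redundant feature adds little merit even when it is itself well correlated with the class.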
Relief feature selection algorithm. Relief is a distance-based method used to assign weights
to features. For a randomly selected instance, it finds the nearest instance of the same
class (nearest hit) and the nearest instance of a different class (nearest miss), and
observes the differences in feature values between the selected instance and each of these
neighbours. The greater the probability that a feature takes a different value on the
nearest miss, the greater the weight of the feature; conversely, the weight of the feature
is inversely related to the probability that it takes a different value on the nearest hit.
The weight of feature X, denoted by D[U], is described in equation (2):
D[U] = P(different value of X | nearest instance from the different class) –
P(different value of X | nearest instance from same class). (2)
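Equation (2) can be estimated by sampling, as in the following minimal sketch. It assumes numeric features rescaled to [0, 1], a Manhattan distance for finding neighbours, and at least two instances per class; the function name and the parameters `n_iter` and `seed` are hypothetical and not taken from the paper.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, seed=0):
    """Estimate one Relief weight per feature by repeated sampling."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Rescale every feature to [0, 1] so that differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    rng = np.random.default_rng(seed)
    n, d = Xs.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)                       # randomly selected instance
        dist = np.abs(Xs - Xs[i]).sum(axis=1)     # Manhattan distance to all
        dist[i] = np.inf                          # never pick the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest hit
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest miss
        # Reward differences to the miss, penalise differences to the hit.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n_iter
```

Features with the largest weights separate the classes best and are the ones retained.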
Least absolute shrinkage and selection operator. The least absolute shrinkage and selection
operator (LASSO) picks features by updating the absolute values of the feature coefficients;
features whose coefficients become zero are removed from the subset. The LASSO works well
with low feature coefficient values14.
AI classification techniques are used to distinguish heart patients from healthy individuals.
Some popular classification algorithms are discussed briefly with their theoretical basis:
logistic regression; Naïve Bayes; k-nearest neighbour.
Logistic regression. The logistic regression hidden semi-Markov model (HSMM) is similar to an
HSMM with SVM-based emission probabilities and allows better discrimination between the
various states. Logistic regression is a binary classifier that uses a logistic function to
map the feature space or predictor variables to the binary response variable7,9,11. The
logistic function σ(α) is defined as:
$$ \sigma(\alpha) = \frac{1}{1 + \exp(-\alpha)}. \tag{3} $$
With the logistic function, it is possible to define the probability of a state or class
$\alpha_j$ given the input observation $O_t$:
$$ P[q_t = \alpha_j \mid O_t] = \sigma(w' O_t), \tag{4} $$
where $w$ is the vector of weights applied to every observation or input feature. The
classifier is fitted to the dataset by weighted least squares10. The probability of each
observation given the state for the one-versus-all logistic regression, $b_j(O_t)$, is found
by using Bayes’ rule:
$$ b_j(O_t) = P[O_t \mid q_t = \alpha_j] = \frac{P[q_t = \alpha_j \mid O_t]\, P(O_t)}{P(\alpha_j)}. \tag{5} $$
$P(O_t)$ can be estimated from a multivariate normal distribution fitted to the entire
training data, and $P(\alpha_j)$ is the initial state probability distribution.
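Equations (3) and (4) can be illustrated with a small binary classifier. In this sketch plain gradient descent on the log-loss is swapped in for the least-squares fitting mentioned above, purely to keep the example short; the function names, learning rate `lr` and step count `n_steps` are assumptions, not the authors' settings.

```python
import numpy as np

def sigmoid(a):
    """Logistic function of equation (3)."""
    return 1.0 / (1.0 + np.exp(-a))

def add_bias(X):
    """Prepend a column of ones so that the first weight acts as an intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, y, lr=0.1, n_steps=2000):
    """Fit the weight vector w by gradient descent on the mean log-loss
    (assumed optimiser, used here instead of least squares)."""
    Xb = add_bias(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        p = sigmoid(Xb @ w)                  # equation (4) for every sample
        w -= lr * Xb.T @ (p - y) / len(y)    # gradient of the mean log-loss
    return w

def predict_proba(w, X):
    """Probability of the positive class for every row of X."""
    return sigmoid(add_bias(X) @ w)
```

A sample is assigned to the positive class when the returned probability exceeds 0.5.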
Naïve Bayes. Naïve Bayes (NB) is a supervised learning technique based on Bayes’ theorem. It
assigns a new feature vector to the class with the highest probability. The NB uses the
training data to estimate the likelihood of a feature vector for a given class5,13. The class
of a new vector is then determined from its likelihood value. NB is widely used for text
classification. The most probable class is allocated to the test instance using the Bayes
rule:
$$ G(A \mid X) = \frac{G(X \mid A)\, G(A)}{G(X)}, \tag{6} $$
where G(A|X) is the posterior probability of the class; G(A) – the prior probability of the
class; G(X|A) – the likelihood, which is the probability of the predictor given the class,
and G(X) – the prior probability of the predictor.
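The Bayes rule of equation (6) leads to the following minimal classifier. A Gaussian likelihood per feature is assumed here because the text does not state which likelihood model was used, and the function names are hypothetical. G(X) is the same for every class, so it is dropped when the classes are compared.

```python
import numpy as np

def fit_nb(X, y):
    """Store the prior G(A) and per-feature mean and variance for each class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # A small constant keeps the variance strictly positive.
        model[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return model

def predict_nb(model, X):
    """Pick the class with the largest log G(X|A) + log G(A)."""
    X = np.asarray(X, dtype=float)
    classes = list(model)
    scores = []
    for c in classes:
        prior, mean, var = model[c]
        # Features are treated as independent, so log-likelihoods add up.
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)
        scores.append(np.log(prior) + log_lik)
    return np.array(classes)[np.argmax(scores, axis=0)]
```

Working with logarithms avoids numerical underflow when many feature likelihoods are multiplied.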
k-Nearest neighbour. The k-nearest neighbour approach (KNN) classifies a data point from its
closest neighbours, where the value k defines the number of nearest neighbours considered
when identifying the class of the data point14,15. Structure-based variants exploit the
underlying structure of the data and need less computation on the training data. In the
structure-less variant, all the data are divided into test data and training data, and the
distance between each sample point and all the training points is calculated; the training
points with the smallest distances are the nearest neighbours.
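A structure-less KNN classifier of this kind fits in a few lines. The sketch assumes the Euclidean distance and a simple majority vote, neither of which is specified in the text, and the function name is hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify every test point by a majority vote among its k nearest
    training points (Euclidean distance assumed)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    preds = []
    for x in X_test:
        dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # distance to all points
        nearest = np.argsort(dist)[:k]                     # indices of k closest
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

No training step is needed: all the work is done at prediction time, which is why the cost grows with the size of the training set.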
RESULT AND DISCUSSION
This section addresses the classification procedures and their results from several
perspectives. First, we evaluated the performance of different artificial intelligence
algorithms, including logistic regression, KNN, and Naïve Bayes, on the complete set of
heart disease features. Second, the CFS, Relief, and LASSO feature selection algorithms were
used to select the essential features. Third, the performance of the classifiers was
measured on the selected features. The k-fold method was used for cross-validation, and
performance evaluation metrics were applied to check the output of the classification
algorithms. This research carried out tests with different values of k for the KNN
classifier; at k = 12 the performance of KNN, as shown in Fig. 2, was excellent.
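The k-fold validation scheme can be sketched as follows. The shuffling, the seed and the `fit`/`predict` callback interface are assumptions made for illustration; the paper does not describe how the folds were drawn.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=20, seed=0):
    """Shuffle the sample indices and split them into n_folds parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return np.array_split(order, n_folds)

def cross_validate(fit, predict, X, y, n_folds=20, seed=0):
    """Mean accuracy over the folds. `fit(X, y)` returns a model and
    `predict(model, X)` returns labels (hypothetical callback interface)."""
    X = np.asarray(X)
    y = np.asarray(y)
    folds = k_fold_indices(len(y), n_folds, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold is held out once; the other folds form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        pred = np.asarray(predict(model, X[test_idx]))
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))
```

With 20 folds each sample is used for testing exactly once and for training 19 times.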
Fig. 2. Performance of KNN for different values of k
The relation between temperature, humidity, heart rate, the resulting risk levels, and the
human comfort decisions that can be applied to the constructed continuous monitoring system
is described in Table 1. The 20-fold CV classification metrics of the heart-disease
algorithms are described in Table 2.
Table 1. Boolean decision-making table based on patient status
Temperature sensor (°C) | Humidity (%) | Human perception | Pulse rate sensor (per min) | Action taken | Level of risk
<37 | 31–41 | comfortable | 60 to 100 | no action | 1
37–38 | 41–46 | comfortable for many people | 40–60 or 100–120 | informed family members | 2
>38 | 46–52 | uncomfortable for many people at upper age | 40–60 or 100–120 | informed to a local doctor | 3
>38 | >52 | high humid, extremely uncomfortable | <40 or >120 | emergency | 4
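The rows of Table 1 can be read as a simple rule set. The sketch below is one possible interpretation, not the authors' monitoring logic: it assumes that the overall risk level is the highest level triggered by any single reading, that shared boundary values (for example 41 % humidity) belong to the lower level, and that humidity below 31 % counts as level 1. All function names are hypothetical.

```python
# Actions per risk level, worded after Table 1.
ACTIONS = {
    1: "no action",
    2: "inform family members",
    3: "inform a local doctor",
    4: "emergency",
}

def temperature_level(temp_c):
    """Temperature alone reaches at most level 3 in Table 1."""
    if temp_c < 37:
        return 1
    return 2 if temp_c <= 38 else 3

def humidity_level(humidity_pct):
    """Humidity bands 31-41, 41-46, 46-52 and >52 (boundaries assumed inclusive)."""
    if humidity_pct <= 41:
        return 1
    if humidity_pct <= 46:
        return 2
    return 3 if humidity_pct <= 52 else 4

def pulse_level(pulse_per_min):
    """Pulse 60-100 is normal; 40-60 or 100-120 is level 2; otherwise level 4."""
    if 60 <= pulse_per_min <= 100:
        return 1
    if 40 <= pulse_per_min <= 120:
        return 2
    return 4

def assess(temp_c, humidity_pct, pulse_per_min):
    """Overall level = highest level of any reading (assumed combination rule)."""
    level = max(temperature_level(temp_c),
                humidity_level(humidity_pct),
                pulse_level(pulse_per_min))
    return level, ACTIONS[level]
```

Under these assumptions a reading taken from the middle of each row of Table 1 reproduces that row's risk level.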
Table 2. 20-fold CV classification metrics of heart-disease algorithms
Predictive model | Accuracy (%) | Specificity (%) | Sensitivity (%)
Logistic regression | 92.00 | 89.98 | 88.12
Naïve Bayes | 89.34 | 92.45 | 86.45
KNN | 85.57 | 88.01 | 84.56
Fig. 3. Accuracy performance of various classifiers with 20-fold CV
Fig. 4. Specificity performance of various classifiers with 20-fold CV
Fig. 5. Sensitivity performance of various classifiers with 20-fold CV
Fig. 6. Performance of various classifiers with 20-fold CV
The predictive performance of LR was 92% (accuracy) as shown in Fig. 3, 88.12% (sensitivity)
as shown in Fig. 5, and 89.98% (specificity) as illustrated in Fig. 4. The second classifier
was NB, with 92.45% (specificity), 86.45% (sensitivity), and 89.34% (accuracy). K-NN, with
88.01% (specificity), 84.56% (sensitivity), and 85.57% (accuracy), was the third classifier.
The performance of the classifiers with 20-fold CV is shown together in Fig. 6. As shown
there, LR beats the other two classifiers in terms of accuracy and sensitivity, while NB has
the highest specificity in Table 2.
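For reference, the three metrics reported above follow directly from the confusion matrix of a binary classifier. The sketch uses hypothetical function names and assumes that the label 1 marks the presence of heart disease.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate).
    Assumes both classes occur in y_true, so no denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Sensitivity measures how many patients with the disease are detected, whereas specificity measures how many healthy individuals are correctly cleared, so the two can rank classifiers differently.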
CONCLUSIONS
Air pollution is a serious worldwide problem. Increasing pollutant emissions from cars,
industry, and homes are detrimental to the atmosphere and impact human lives as well. The
system was tested on the Cleveland heart disease dataset. Three popular classifiers, namely
logistic regression, KNN, and NB, were used together with three feature selection
algorithms, Relief, CFS and LASSO, which select the essential features. The k-fold
cross-validation approach was used as the validation scheme. Different performance measures
were also used to verify the performance of the classifiers. With features chosen by the
correlation-based feature subset selection algorithm, logistic regression with 20-fold
cross-validation displayed the highest accuracy of 92%.
Acknowledgements. The authors thankfully acknowledge the Deanship of Scientific Research, King
Khalid University (KKU), Abha, Asir, Kingdom of Saudi Arabia for funding this research under
the grant number R.G.P.2/89/41. This work has been done remotely at DAAI Lab, Thu Dau Mot
University Vietnam and i3 LABs, Techno India NJR Institute of Technology, India.
REFERENCES
1. Pravin Kshirsagar et al.: Operational Collection Strategy for Monitoring Smart Waste Manage-
ment System Using Shortest Path Algorithm. J Environ Prot Ecol, 22 (2), 566 (2021).
2. R. J. LAUMBACH, H. M. KIPEN: Respiratory Health Effects of Air Pollution: Update on
Biomass Smoke and Traffic Pollution. J Allergy Clin Immunol, 129, 3 (2012).
3. T. M. CHEN, J. GOKHALE, S. SHOFER, W. G. KUSCHNER: Outdoor Air Pollution: Nitrogen
Dioxide, Sulfur Dioxide, and Carbon Monoxide Health Effects. Am J Med Sci, 333 (4), 249
(2007).
4. P. KSHIRSAGAR, S. AKOJWAR: Novel Approach for Classification and Prediction of Non
Linear Chaotic Databases. In: Proceedings of the 2016 International Conference on Electrical,
Electronics, and Optimization Techniques, 2016, 514-518. DOI: 10.1109/ICEEOT.2016.7755667.
5. NEERAJ SATHAWANE, PRAVIN KSHIRSAGAR: ECG Signals for Chaotic Diagnosis Using
ANN, PSO and Wavelet. International Journal of Electronics, Communication & Instrumentation
Engineering Research and Development (IJECIERD), 4 (2), 127 (2014).
6. C. A. POPE, R. T. BURNETT, G. D. THURSTON: Cardiovascular Mortality and Long-term
Exposure to Particulate Air Pollution: Epidemiological Evidence of General Pathophysiological
Pathways of Disease. Circulation, 109, 71 (2004).
7. SUMITRA SANGWAN, TAZEEM AHMAD KHAN: Review Paper Automatic Console for
Disease Prediction Using Integrated Module of A-priori and k-mean through ECG Signal. Int J
Technol Res Eng, 2 (7) ISSN(Online): 2347-4718, 1368 (2015).
8. A. PETERS, D. W. DOCKERY, J. E. MULLER: Increased Particulate Air Pollution and the
Triggering of Myocardial Infarction. Circulation. 103, 2810 (2001).
9. P. R. KSHIRSAGAR, S. G. AKOJWAR, R. DHANORIYA: Classification of ECG-signals Using
Artificial Neural Networks. In: Proceedings of International Conference on Intelligent
Technologies and Engineering Systems, Lecture Notes in Electrical Engineering, vol. 345.
Springer, Cham,
2014.
10. PRAVIN KSHIRSAGAR, SUDHIR AKOJWAR: Hybrid Heuristic Optimization for Benchmark
Datasets. Int J Comput Appl, 146 (7), 11 (2016).
11. ANIMESH HAZRA, SUBRATA KUMAR MANDALL: Heart Disease Diagnosis and Prediction
Using Machine Learning and Data Mining Techniques: a Review. Adv Comput Sci Technol, 10
(7), 2137 (2017).
12. SURIYA BEGUMA, FAROOQ AHMED SIDDIQUEB, RAJESH TIWARIC: A Study for Predict-
ing Heart Disease Using Machine Learning. Turk J Comput Math Educ, 12 (10), 4584 (2021).
13. P. R. KSHIRSAGAR, S. G. AKOJWAR: Prediction of Neurological Disorders Using Optimized
Neural Network. In: Proceedings of the International Conference on Signal Processing, Com-
munication, Power and Embedded System, 3–5 October, Paralakhemundi, India, 2016, 1695-169.
DOI: 10.1109/SCOPES.2016.7955731.
14. PRAVIN KSHIRSAGAR, SUDHIR AKOJWAR: Classification and Detection of Neurological
Disorders Using ICA and AR as Feature Extractor. Int J Eng Sci, 1 (1), 1 (2015).
15. PRAVIN KSHIRSAGAR, SUDHIR AKOJWAR, NIDHI BAJAJ: A Hybridised Neural Network
and Optimisation Algorithms for Prediction and Classification of Neurological Disorders. Int J
Biomed Eng Technol, 28 (4), 307 (2020). DOI: 10.1504/IJBET.2018.095981.
Received 26 May 2021
Revised 25 June 2021