Asian Journal of Computer and Information Systems (ISSN: 2321 5658)
Volume 9 Issue 4, October 2021
Asian Online Journals (www.ajouronline.com) 29
Utilisation of Machine Learning Techniques in Testing and
Training of Different Medical Datasets
Maad M. Mijwil1, Israa Ezzat Salem2 and Rana A. Abttan3
1 Computer Techniques Engineering Department, Baghdad College of Economic Sciences University
Baghdad, Iraq
Email: mr.maad.alnaimiy [AT] baghdadcollege.edu.iq
1 Computer Techniques Engineering Department, Baghdad College of Economic Sciences University
Baghdad, Iraq
Email: israa.ezzat [AT] baghdadcollege.edu.iq
1 Computer Techniques Engineering Department, Baghdad College of Economic Sciences University
Baghdad, Iraq
Email: rana.ali.abttan [AT] baghdadcollege.edu.iq
_________________________________________________________________________________
ABSTRACT On our planet, chemical waste increases day after day and new types of it keep emerging. Together with
the high level of toxic pollution, the difficulty of daily life, the worsening psychological state of humans, and other
factors, this has led to the emergence of many diseases that affect humans, including deadly ones like COVID-19.
Symptoms may or may not appear in a person; some people know their condition, while others neglect their
health through lack of knowledge, which may lead to death or to a disease that remains chronic for life.
In this regard, the authors apply machine learning techniques (Support Vector Machine, C5.0 Decision
Tree, K-Nearest Neighbours, and Random Forest), chosen for their influence in the medical sciences, to identify the
technique that gives the highest accuracy in detecting diseases. Such a technique will help to recognise
symptoms and diagnose them correctly. This article covers datasets from the UCI machine learning repository,
namely the Wisconsin Breast Cancer dataset, Chronic Kidney disease dataset, Immunotherapy dataset, Cryotherapy
dataset, Hepatitis dataset and COVID-19 dataset. In the results section, the techniques are compared
to find out which performs best and which performs worst in analysing the dataset
of each disease.
Keywords Disease, Machine Learning Techniques, COVID-19, Symptoms, Medical Datasets.
_________________________________________________________________________________
1. INTRODUCTION
A doctor or specialist examines a patient to ascertain his or her condition when a disorder in
physical or psychological function affects the patient's well-being and performance. A disease is usually
associated with specific signs or symptoms. For example, flu is usually associated with symptoms
such as headache, runny nose, and fever. Patients frequently do not differentiate between a disease and its symptoms.
Some diseases occur more frequently at certain times of the year; these are colloquially called seasonal
diseases [1]. The most common seasonal illnesses are bronchitis and influenza [2].
Figure 1: CT scan images of patients with COVID-19 disease.
Presently, the volume of data is growing dramatically, and its complexity increases day by day. Analysing
it and extracting useful statistics by hand is challenging, which is why suitable computer-based
techniques are being sought to solve the problem. Medical data is one instance of this problem because
this data becomes more complicated as diseases spread widely across the world [3]. It has become difficult to manage,
especially with the spread of COVID-19 and the resulting increase in infections and deaths
[4]. This has forced doctors and specialists to look for techniques that help them significantly in diagnosing
patients and determining their condition quickly and accurately. Machine learning techniques are among them.
Machine learning [5] is evolving and growing in the world of healthcare, and healthcare [6] is one of the
most vital areas witnessing remarkable advances in machine learning techniques. Recently, machine learning has
been adopted to predict and analyse medical datasets because of its speed, accuracy, and low cost [7]. For example, it has
been widely applied to analysing chest images of patients with COVID-19 [8-11]. These techniques can be
trained to analyse such images, locate abnormalities, and point to the areas of the
lung where the virus has spread, giving a high-quality analysis [12]. With these advanced technologies, clinicians can be better
informed when analysing patient information [13]. Machine learning can also predict diseases such as stroke, breast
cancer and many others at an early stage, which makes these techniques of great value to doctors. Figure 1 shows a set of CT scan
images of people with COVID-19 [14].
The main contribution of this article is an investigation of the performance of machine learning
techniques (Support Vector Machine, C5.0 Decision Tree, K-Nearest Neighbours, and Random Forest) in
analysing a set of binary (two-class) datasets chosen from the University of California, Irvine (UCI) machine learning
repository. The aim is to find the technique that gives the best results for each disease, so that it can
support doctors and specialists. This work is conducted using Python, a high-level programming language
that Guido van Rossum began developing in 1989 while working at Centrum Wiskunde & Informatica (CWI). This
language is widely used in artificial intelligence.
The rest of this article is organised as follows. Section two reviews recent studies that apply machine
learning techniques to medical datasets obtained from the UCI machine learning repository. Section three discusses the
techniques and materials used in this research. Section four covers the results obtained through experiments and
compares the techniques. Finally, Section five presents the conclusions and future work.
2. LITERATURE SURVEY
This section presents several previous works that take the same approach as the current paper and that
influenced the authors. The researchers have not found a similar published
study that covers the same medical datasets chosen from the UCI repository website, nor a study that applies the same
set of techniques used in this paper, which makes this paper unique.
The first is a 2016 study by Aswal et al. from India [15], who recommend implementing machine
learning techniques (Support Vector Machine, C5.0 decision tree, K-Nearest Neighbour) on medical datasets from the
UCI machine learning repository, namely the Indian Liver Patient Dataset, Hepatitis Dataset, Thyroid Disease Dataset,
Lung Cancer Dataset, and Pima Indians Diabetes Dataset. Their research finds that the Support
Vector Machine performs best. In another paper, published on IEEE Xplore in 2017, Islam et al. [16] propose machine learning
techniques (K-Nearest Neighbours and Support Vector Machine) to diagnose breast cancer using the Wisconsin
breast cancer dataset. This study achieved an accuracy of more than 98% with the support vector machine and more than
97% with K-Nearest Neighbours. Cahyani and Muslim [17] improve
the C4.5 algorithm for chronic kidney disease diagnosis by adding two steps,
discretization and correlation-based feature selection. Their approach succeeded in analysing the disease data,
obtaining an accuracy of more than 97%, which is an impressive result. In another study, Eedi and Kolla [18] propose
employing machine learning techniques (K-Nearest Neighbour, Random Forest, Naïve Bayes, Logistic Regression, and
Decision Tree) to detect breast cancer, using the Breast Cancer Wisconsin (Diagnostic) dataset
from the UCI machine learning repository. They find that the random forest technique performs best,
with more than 93% accuracy. The last study covered in this section is by
Kumar et al. [19], who apply a Support Vector Machine with
genetic programming to datasets from the UCI repository, namely BUPA liver disorder, chronic kidney disease
(CKD), fertility, and Wisconsin diagnostic breast cancer (WDBC). The authors obtain accuracies of
75.36%, 85.0%, 99.12%, and 100% for BUPA, Fertility, WDBC, and CKD, respectively.
3. MATERIALS AND TECHNIQUES
This section is divided into two parts: the first describes the repository from which the data is taken, and the
second covers the techniques utilised in this article. The UCI Machine Learning Repository
[20] is a website affiliated with the University of California that hosts nearly 600 free datasets for researchers and
authors in the machine learning community. These datasets can be used freely on one condition, which is to
cite the source of the data and the repository. Table 1 gives a concise description of the
datasets utilised in this comparison, with their numbers of attributes and instances.
Table 1: Description of the datasets
Datasets                        Attributes    Instances
Wisconsin Breast Cancer [21]    32            569
Chronic Kidney disease [22]     25            400
Immunotherapy [23]              8             90
Cryotherapy [24]                7             90
Hepatitis [25]                  19            155
COVID-19 [26]                   7             14
In the second part, each machine learning technique utilised in this article is concisely discussed. The
techniques are outlined below.
Support Vector Machine (SVM)
SVM [27] is one of the most widespread supervised machine learning techniques. It was introduced in 1992 by three scientists:
Bernhard Boser, Isabelle Guyon, and Vladimir Vapnik. The classifier is applied to both classification and regression and
builds its decision function from linear equations. It can predict with high accuracy while
avoiding overfitting to the data. SVMs can be summarised as systems that use a hypothesis space of linear functions in a
high-dimensional feature space and are trained with methods from optimisation theory that implement a learning bias derived
from statistical learning theory. The technique uses hyperplanes to separate the classes in the dataset and supports various
kernels, such as polynomial, sigmoid, radial basis function, and linear.
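The four kernels named above can be written out directly. The following is a minimal sketch in plain Python, not the authors' implementation: the parameter names gamma, coef0 and degree and their default values are assumptions chosen for illustration, and decision_value only evaluates an already trained model (the support vectors, coefficients and bias would come from the optimisation step, which is not shown).

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # (gamma * <x, y> + coef0) ** degree; defaults are illustrative assumptions
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)

def decision_value(x, support_vectors, dual_coefs, bias, kernel):
    """Kernel SVM decision function: sum_i c_i * K(s_i, x) + b.

    The sign of the returned value selects one of the two classes.
    """
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + bias
```

Swapping the kernel argument changes the shape of the separating surface while the decision function itself stays the same, which is why the choice of kernel is usually treated as a tunable setting.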
C5.0 Decision Tree (C5.0 DT)
C5.0 [28] is an updated and revised version of the C4.5 decision tree. The tree creates its branches
using the information gain measure: when the tree model is built, each attribute split is chosen to
maximise the information gained. Information gain is computed from entropy, which measures the impurity of a set of
examples by multiplying the probability of each class by the logarithm of that probability and summing over the classes
with the sign reversed. Entropy values are calculated for the features of the main tree and of each sub-tree,
and the process continues until no further division within the tree is required. The most
significant characteristics of this version of the decision tree are its ability to create a large number of branches to
accommodate more data, its lower memory consumption, and its faster execution.
Unfortunately, this technique does not perform well on small datasets.
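The entropy-based split criterion described above can be sketched as follows. This is an illustrative sketch of information gain on categorical attributes only, not the full C5.0 algorithm; the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Index of the attribute (column) with the largest information gain."""
    n_attributes = len(rows[0])
    gains = [information_gain(labels, [row[j] for row in rows])
             for j in range(n_attributes)]
    return max(range(n_attributes), key=gains.__getitem__)
```

A tree builder would call best_attribute on the current subset, split on that column, and recurse on each branch until the labels in a branch are pure or no attributes remain.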
K-Nearest Neighbours (K-NN)
K-NN [29] is one of the easiest machine learning techniques to implement. Classification
is done by identifying the nearest neighbours of a query example and using
these neighbours to determine the class of the query. Before running it, the value of k must be specified;
by default it is set to 5. The query is then assigned the class held by the majority of its k closest neighbours.
Often it is necessary to take more than one neighbour into account. Because the training examples are required at run-time,
they must be stored in memory, so the technique is sometimes called Memory-Based Classification. A disadvantage of
this technique is that it is a lazy learning method: induction is deferred to run-time. The technique
uses a distance measure to calculate the distance between two points, the best known of which is the
Euclidean distance.
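The procedure above is short enough to write out in full. The following is a minimal sketch in plain Python, assuming numeric features and the default k = 5 mentioned in the text; it is not the authors' code, and the function names are hypothetical.

```python
import math
from collections import Counter

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # All training examples are kept in memory and ranked at query time,
    # which is what makes K-NN a lazy, memory-based method.
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: euclidean_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with four points of class "A" near the origin and four of class "B" near (5, 5), a query at (0.5, 0.5) gathers four "A" votes and one "B" vote and is therefore labelled "A".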
Random Forest (RF)
In 2001, Leo Breiman introduced a scheme that he called random forests [30], whose goal is to build an ensemble of
predictors from decision trees that grow in randomly selected subspaces of the data. The technique can be defined as a set
of tree predictors in which each tree depends on the values of a random vector sampled
independently and with the same distribution for all trees in the forest. Its generalisation
error depends on the strength of the individual trees in the forest and the correlation between them. Selecting
a random group of features to split each node yields error rates that compare favourably with Adaptive
Boosting while being more robust to noise. The technique also computes internal estimates of strength,
correlation, and error, which are used to show the response to increasing the number of features used in splitting.
It can also be used for regression. It often achieves high accuracy with modest processing time, although its
performance varies between datasets.
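The two sources of randomness described above, bootstrap sampling of the examples and random selection of features, can be illustrated with a deliberately simplified sketch. The assumptions are significant: each "tree" is reduced to a single-split stump on one randomly chosen numeric feature, only two classes are handled, and all names and defaults (n_trees, seed) are hypothetical. A real random forest grows full trees and draws a fresh random feature subset at every node.

```python
import random
from collections import Counter

def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) examples with replacement."""
    indices = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in indices], [labels[i] for i in indices]

def fit_stump(rows, labels, feature):
    """One-split tree: threshold halfway between the two class means (binary only)."""
    classes = sorted(set(labels))
    if len(classes) == 1:  # degenerate sample: always predict the only class seen
        return (feature, None, classes[0], classes[0])
    means = {c: sum(r[feature] for r, l in zip(rows, labels) if l == c) / labels.count(c)
             for c in classes}
    low, high = sorted(classes, key=means.get)
    return (feature, (means[low] + means[high]) / 2, low, high)

def stump_predict(stump, row):
    feature, threshold, low, high = stump
    if threshold is None:
        return low
    return low if row[feature] <= threshold else high

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def forest_predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]
```

Because every stump sees a different resample and possibly a different feature, individual errors are only weakly correlated, and the majority vote is more stable than any single stump.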
4. EXPERIMENTAL RESULTS
In this section, the results of each technique are presented and its performance is evaluated on
several measures: Testing Accuracy, Training Accuracy, Testing Time, and Training Time. Figure 2 shows the workflow
of this article in terms of input, processing, and output for all the medical data. Tables 2 to 5 display the evaluation
results of each technique on the medical datasets. The computer used for this work has the following
specifications: Intel® Core™ i5-1130G7 processor (4 cores), 512 GB SSD, 16 GB RAM, and Python
v.3.7 with Spyder IDE v.4.2.1 running on Windows 10 Home, build 19041, 64-bit (last updated in February 2021).
Figure 2: The stages of this work
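The four measures named in this section can be collected with a small harness such as the sketch below. It is an illustration under stated assumptions rather than the procedure actually used: the article does not give the train/test split ratio, so the 70/30 hold-out split, the fixed seed, and the majority-class baseline used to exercise the harness are all hypothetical.

```python
import random
import time
from collections import Counter

def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle the examples and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

def evaluate(fit, predict, rows, labels):
    """Return accuracy and wall-clock time for training and testing."""
    x_train, y_train, x_test, y_test = train_test_split(rows, labels)

    start = time.perf_counter()
    model = fit(x_train, y_train)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    test_predictions = [predict(model, row) for row in x_test]
    testing_time = time.perf_counter() - start

    train_predictions = [predict(model, row) for row in x_train]
    return {
        "testing_accuracy": accuracy(y_test, test_predictions),
        "training_accuracy": accuracy(y_train, train_predictions),
        "testing_time": testing_time,
        "training_time": training_time,
    }

# Hypothetical baseline used only to exercise evaluate(): always predicts
# the most frequent training label.
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, row):
    return model
```

Any classifier that exposes a fit function and a predict function with these shapes can be passed to evaluate, so the same harness can be reused for every technique and dataset in the comparison.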
Table 2: Execution evaluation of SVM
Medical Datasets           Testing Accuracy    Training Accuracy    Testing Time
Wisconsin Breast Cancer    0.96114795214       0.9536719818323      0.04125
Chronic Kidney disease     0.97578491347       0.9347588136591      0.05125
Immunotherapy              0.76261839462       0.7162193529         0.013425
Cryotherapy                0.92333333333       0.82131145451        0.013425
Hepatitis                  0.8361859564        0.7861292128         0.013425
COVID-19                   0.913968222222      0.891968222222       0.04125
Table 3: Execution evaluation of C5.0
Medical Datasets           Testing Accuracy    Training Accuracy    Testing Time
Wisconsin Breast Cancer    0.9381429581        0.887820375481       0.034825
Chronic Kidney disease     0.9741222222        0.9537388134491      0.05125
Immunotherapy              0.9332432782613     0.881323282321       0.015625
Cryotherapy                0.976210000         0.890301386712       0.05125
Hepatitis                  0.88264867336       0.8119202925         0.05125
COVID-19                   0.71622412555       0.66731424444        0.066125
Table 4: Execution evaluation of K-NN
Medical Datasets           Testing Accuracy    Training Accuracy    Testing Time
Wisconsin Breast Cancer    0.94102895133891    0.93119309670342     0.066125
Chronic Kidney disease     0.985222222222      0.925223452898       0.013425
Immunotherapy              0.8132435886611     0.7955811238125      0.013425
Cryotherapy                0.98888888888       0.97848589843        0.04125
Hepatitis                  0.826086956         0.7611940298         0.04125
COVID-19                   0.66666666666       0.63218467925        0.07125
Table 5: Execution evaluation of RF
Medical Datasets           Testing Accuracy    Training Accuracy    Testing Time
Wisconsin Breast Cancer    0.8891288722        0.9021282721         0.066125
Chronic Kidney disease     0.875444444         0.915243742          0.074875
Immunotherapy              0.9846421835        0.9280745516         0.066125
Cryotherapy                0.733333333         0.7321862671         0.066125
Hepatitis                  0.928808192         0.928908288          0.074875
COVID-19                   0.9888281133        0.9828222111         0.074875
5. CONCLUSIONS AND FUTURE DIRECTIONS
Health is an invaluable blessing. As Anne Wilson Schaef, an American
clinical psychologist, puts it: “Good health is not something we can buy. However, it can be an extremely valuable
savings account.” In this article, machine learning techniques are utilised to analyse medical datasets
chosen from the UCI repository. The article aims to study the performance of each technique on these data, since
each dataset differs from the others in its attributes and instances. Table 6 summarises the performance
of each technique on a four-level index: excellent execution, good execution, fair execution, and
inadequate execution. In future, other techniques can be applied to other data, or to the same data, in
order to assess their strength in analysing medical data.
Table 6: The effect of executing all techniques
Medical Datasets           Excellent Execution    Good Execution    Fair Execution    Inadequate Execution
Wisconsin Breast Cancer    SVM                    K-NN              C5.0              RF
Chronic Kidney disease     C5.0                   K-NN              SVM               RF
Immunotherapy              RF                     C5.0              K-NN              SVM
Cryotherapy                K-NN                   C5.0              SVM               RF
Hepatitis                  RF                     C5.0              SVM               K-NN
COVID-19                   RF                     SVM               C5.0              K-NN
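A four-level index of this kind can be derived mechanically from per-technique scores. The sketch below assumes the techniques are ordered by testing accuracy alone; the article does not state the criterion behind the index, so this is one plausible reading rather than the authors' procedure. The values are the testing accuracies of two datasets from Tables 2 to 5, rounded to four decimals, and the names TESTING_ACCURACY, LEVELS and rank_techniques are hypothetical.

```python
# Testing accuracies for two datasets, rounded from Tables 2 to 5.
TESTING_ACCURACY = {
    "Wisconsin Breast Cancer": {"SVM": 0.9611, "C5.0": 0.9381, "K-NN": 0.9410, "RF": 0.8891},
    "COVID-19": {"SVM": 0.9140, "C5.0": 0.7162, "K-NN": 0.6667, "RF": 0.9888},
}

LEVELS = ["Excellent", "Good", "Fair", "Inadequate"]

def rank_techniques(scores):
    """Map each index level to a technique, best testing accuracy first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return dict(zip(LEVELS, ordered))

for dataset, scores in TESTING_ACCURACY.items():
    print(dataset, rank_techniques(scores))
```

Under this assumption, the ordering for both datasets shown agrees with the corresponding rows of the index.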
6. REFERENCES
[1] Grassly N. C. and Fraser C., Seasonal infectious disease epidemiology, Proceedings. Biological sciences, vol.273,
no.1600, pp:2541-2550, July 2006. https://doi.org/10.1098/rspb.2006.3604
[2] Tate M. D., Deng Y., Jones J. E., Anderson G. P., Brooks A. G., and Reading P. C., Neutrophils Ameliorate Lung
Injury and the Development of Severe Disease during Influenza Infection, The Journal of Immunology, vol. 183,
pp:7441-7450, November 2009. https://doi.org/10.4049/jimmunol.0902497
[3] Pandey S. C., Data Mining Techniques for Medical Data: A Review, In Proceedings of International Conference
on Signal Processing, Communication, Power and Embedded System (SCOPES), pp:1-12, Paralakhemundi, India, 3-
5 October 2016. https://doi.org/10.1109/SCOPES.2016.7955586
[4] Jia Q., Guo Y., Wang G., and Barnes S. J., Big Data Analytics in the Fight against Major Public Health Incidents
(Including COVID-19): A Conceptual Framework, International Journal of Environmental Research and Public
Health, vol.17, no.6161, pp:1-20, August 2020. https://doi.org/10.3390/ijerph17176161
[5] Jones L. D., Golan D., Hanna S. A., and Ramachandran M., Artificial intelligence, machine learning and the
evolution of healthcare: A bright future or cause for concern?, Bone & Joint Research, vol.7, no.33,pp:223-225,
March 2018. https://doi.org/10.1302/2046-3758.73.BJR-2017-0147.R1
[6] Schmidt J., Marques M. R. G., Botti S., and Marques M. A. L., Recent Advances and Applications of Machine
Learning in Solid-State Materials Science, NPJ Computational Materials, vol.5, no.83, pp:1-11, August 2019.
https://doi.org/10.1038/s41524-019-0221-0
[7] Battineni G., Sagaro G. G., Chinatalapudi N., Amenta F., “Applications of Machine Learning Predictive Models in
the Chronic Disease Diagnosis,” Journal of Personalized Medicine, vol.10, no.21, pp:1-11, March 2020.
https://doi.org/10.3390/jpm10020021
[8] Pham T. D., Classification of COVID-19 chest X-rays with deep learning: new models or fine tuning? Health
Information Science and Systems, vol.9, no. 2, November 2020. https://doi.org/10.1007/s13755-020-00135-3
[9] Mijwil, M. M., “Implementation of Machine Learning Techniques for the Classification of Lung X-Ray Images
Used to Detect COVID-19 in Humans,” Iraqi Journal of Science, vol.62, no.6., pp: 2099-2109, 2 July 2021.
https://doi.org/10.24996/ijs.2021.62.6.35.
[10] Mijwil, M. M. and Al-Zubaidi, E. A., “Medical Image Classification for Coronavirus Disease (COVID-19) Using
Convolutional Neural Networks,” Iraqi Journal of Science, vol.62, no.8, pp: 2740-2747, 31 August 2021.
https://doi.org/10.24996/ijs.2021.62.8.27.
[11] Mijwil, M. M., Alsaadi, A. S, and Aggarwal K., “Differences and Similarities Between Coronaviruses: A
Comparative Review,” Asian Journal of Pharmacy, Nursing and Medical Sciences, vol.9, no.4, pp:49-61. 10
September 2021. https://doi.org/10.24203/ajpnms.v9i4.6696
[12] Borkowski A. A., Viswanadhan N. A., Thomas L. B., Guzman R. D., Deland L. A., and Mastorides S. M., Using
Artificial Intelligence for COVID-19 Chest X-ray Diagnosis, Federal practitioner: for the health care professionals
of the VA, DoD, and PHS, vol.37, no.9, pp:398-404, September 2020. https://doi.org/10.12788/fp.0045
[13] Sidey-Gibbons J. A. M. and Sidey-Gibbons C. J., Machine Learning in Medicine: A Practical Introduction, BMC
Medical Research Methodology, vol.19, no.64, pp:1-18, March 2019. https://doi.org/10.1186/s12874-019-0681-4
[14] Mishra A. K., Das S. K., Roy P., and Bandyopadhyay S., “Identifying COVID19 from Chest CT Images: A Deep
Convolutional Neural Networks Based Approach,” Journal of Healthcare Engineering, vol.2020, ID. 8843664,
pp:1-7, August 2020. https://doi.org/10.1155/2020/8843664
[15] Aswal S., Ahuja N. J., and Ritika, “Experimental analysis of traditional classification algorithms on bio medical
datasets,” In Proceedings of International Conference on Next Generation Computing Technologies (NGCT), pp:1-
6, Dehradun, India, 14-16 October 2016. https://doi.org/10.1109/NGCT.2016.7877478
[16] Islam M., Iqbal I., Haque R., and Hasan K., “Prediction of breast cancer using support vector machine and K-Nearest
neighbors,” In Proceedings of International Conference on Region 10 Humanitarian Technology (R10-HTC), pp:1-
6, Dhaka, Bangladesh, 21-23 December 2017. https://doi.org/10.1109/R10-HTC.2017.8288944
[17] Cahyani N., and Muslim M. A., “Increasing Accuracy of C4.5 Algorithm by Applying Discretization and
Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis,” Journal of Telecommunication,
Electronic and Computer Engineering, vol.12, no.1, pp:25-32, March 2020.
[18] Eedi H. and Kolla M., “Machine Learning Approaches for Healthcare Data Analysis,” Journal of Critical Reviews,
vol.7, no.4, pp:806-811, February 2020. http://dx.doi.org/10.31838/jcr.07.04.149
[19] Kumar A., Sinha N., and Bhardwaj A., “A Novel Fitness Function in Genetic Programming for Medical Data
Classification,” Journal of Biomedical Informatics, vol. 112, December 2020.
https://doi.org/10.1016/j.jbi.2020.103623
[20] Dua D. and Graff C., UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of
California, School of Information and Computer Science, 2019.
[21] Street W. N., Wolberg W. H., and Mangasarian O. L., “Nuclear feature extraction for breast tumor diagnosis,”
IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, vol.1905, pp:861-870,
California, United States, 1993.
[22] Thomas R., Kanso A., and Sedor J. R., Chronic Kidney Disease and Its Complications, Primary Care: Clinics in
Office Practice, vol.35, pp:329-344, 2008. https://doi.org/10.1016/j.pop.2008.01.008
[23] Khozeimeh F., Azad F. J., Oskouei Y. M., Jafari M., Tehranian S., Alizadehsani R., and Layegh P., Intralesional
Immunotherapy Compared to Cryotherapy in The Treatment of Warts, International Journal of Dermatology,
vol.56, pp:474-478, 2017. https://doi.org/10.1111/ijd.13535
[24] Khozeimeh F., Alizadehsani R., Roshanzamir M., Khosravi A., Layegh P., and Nahavandi S., An expert system for
selecting wart treatment method, Computers in Biology and Medicine, vol. 81, pp:167-175, February 2017.
https://doi.org/10.1016/j.compbiomed.2017.01.001
[25] Cestnik G., Kononenko I., and Bratko I., Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In:
Bratko, I. and Lavrac, N., Eds., Progress in Machine Learning, Sigma Press, Wilmslow, pp:31-45, 1987.
[26] Wu Y., Chen C., and Chan Y., “The outbreak of COVID-19: An overview,” Journal of the Chinese Medical
Association, vol.83, no.3, pp:217-220, March 2020. https://doi.org/10.1097/JCMA.0000000000000270
[27] Yu W., Liu T., Valdez R., Gwinn M., and Khoury M. J., Application of support vector machine modeling for
prediction of common diseases: the case of diabetes and pre-diabetes, BMC Medical Informatics and Decision
Making, vol.10, no.16, pp:1-7, March 2010. https://doi.org/10.1186/1472-6947-10-16
[28] Rajeswari S., and Suthendran K., C5.0: Advanced Decision Tree (ADT) classification model for agricultural data
analysis on cloud, Computers and Electronics in Agriculture, vol.156, pp:530-539, December 2018.
https://doi.org/10.1016/j.compag.2018.12.013
[29] Zhang Z., Introduction to machine learning: k-nearest neighbors, Annals of Translational Medicine, vol.4, no.11,
pp:1-7, June 2016. https://doi.org/10.21037/atm.2016.03.37
[30] Biau G. (Editor: Yu B.), Analysis of a Random Forests Model, Journal of Machine Learning Research, vol.13,
pp:1063-1095, April 2012.
... From a health perspective, advanced technologies and methods improve the quality of medical operations, communicate well with patients, keep track of their health status [39][40][41][42], and contribute to quality improvement. Some specific ways AI can help medical diagnosis were also listed, including image analysis, medical history analysis, treatment recommendations, and predictive modeling [19,43]. ...
Article
Full-text available
In today’s digitalized era, embracing new and emerging technologies is a requirement to remain competitive. The present research investigates the adoption of artificial intelligence (AI) by the elderly in the European landscape, emphasizing the importance of individuals’ digital skills. As has already been globally recognized, the most imminent demographic challenge is no longer represented by the rapid growth of the population but by its aging. Thus, the paper initially analyzed European perspectives on AI adoption, also discussing the importance of focusing on seniors. A bibliometric analysis was required afterward, and the review of the resulting relevant scientific publications uncovered gaps in understanding the relationship between older individuals and AI, particularly in terms of digital competence. Further exploration considered the EU population’s digital literacy and cultural influences using Hofstede’s model, while also identifying potential ways to improve the elderly’s digital skills and promote the adoption of AI. Results indicate a growing interest in AI adoption among the elderly, underscoring the urgent need for digital skills development. The imperative of personalized approach implementations, such as specialized courses, personalized training sessions, or mentoring programs, was underscored. Moreover, the importance of targeted strategies and collaborative efforts to ensure equitable participation in the digital age was identified as a prerequisite for AI adoption by seniors. In terms of potential implications, the research can serve as a starting point for various stakeholders in promoting an effective and sustainable adoption of AI among older citizens in the EU.
... This data will be used to examine the rate of diabetes in ethnic Indian populations. Basic data processing is carried out first, including missing value replacement, feature standardization, and dataset partitioning (for instance, keeping 70% of the data for training and 30% for testing) [18]. Using principal component analysis (PCA), which stands for "principal component analysis," we are able to reduce the dimensionality of the dataset. ...
Article
Full-text available
This paper presents an adaptation of the Multi-Layer Perceptron (MLP) algorithm for use in predicting diabetes risk. The aim is to enhance the accuracy and generalizability of the model by incorporating preprocessing techniques, dimensionality reduction using Principal Component Analysis (PCA), and improvements in optimization and regularization. Several factors, including glucose level, pregnancy, blood pressure, and body mass index, are taken into account when analyzing the PIMA Indian Diabetes dataset. Modern optimization methods, dropout regularization, and an adaptive learning rate are incorporated into the modified MLP model to fine-tune the model's weights and boost its predictive abilities. The effectiveness of the modified MLP algorithm is evaluated by comparing its performance with baseline machine learning methods and the original MLP algorithm in terms of accuracy, sensitivity, and specificity. The results of this study can improve the quality of healthcare provided to people at risk for developing diabetes and thus contribute to the development of better prediction models for the disease.
... With the current algorithms, a number of newly established processes are involved in the automation of text classification [27]. The most typical strategies utilised for this objective include linear regression [28], naïve Bayes [29], support vector machine [30], and decision tree [31]. NLP involves the development of algorithms and models that enable computers to process human language and the ability to process it. ...
Research Proposal
Full-text available
Text classification is the process of setting records into classes that have already been set up based on what they say. It automatically puts texts in natural languages into categories that have already been set up. Text classification is the most crucial part of text retrieval systems, which find texts based on what the user requests, and text understanding systems, which change the text in some way, like by making summaries, answering questions, or pulling out data. Existing algorithms that use supervised learning to classify text automatically need enough examples to learn well. The algorithms for data mining are used to classify texts, as well as a review of the work that has been done on classifying texts. Design/Methodology/Approach: Data mining algorithms that are used to classify texts were talked about, and studies that looked at how these algorithms were used to classify texts were looked at, with a focus on comparative studies. Findings: No classifier can always do the best job because different datasets and situations lead to different classification accuracy.
... The authors worked by applying SVM, C5.0 DT, KNN, and RF classification algorithms. The authors have shown that RF worked best for the hepatitis dataset with 92.88% accuracy [19]. ...
Article
Hepatitis is among the deadliest diseases on the planet. Machine learning approaches can contribute toward diagnosing hepatitis disease based on a few characteristics. On the UCI dataset, authors assessed distinct classifiers' performance in order to develop a systematic strategy for hepatitis disease diagnosis. The classifiers used are support vector machine, logistic regression (LR), K-nearest neighbor, and random forest. The classifiers were employed without class balancing and in conjunction with class balancing using SMOTE strategy. Both studies, classification without class balancing and with class balancing, were compared in terms of different performance parameters. After adopting class balancing, the efficiency of classifiers improved significantly. LR with SMOTE provided the highest level of accuracy (93.18%).
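The class-balancing step mentioned in this abstract can be illustrated with a minimal, SMOTE-style sketch. It assumes the basic idea of SMOTE (synthesising a minority-class point by interpolating between a minority sample and one of its nearest minority neighbours); the function name `smote_like_oversample`, the parameter `k`, and the sample data are hypothetical and are not taken from the cited study, which may have used a library implementation.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (illustrative only)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature tuples
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority-class samples
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing records.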
... Algorithms are used to extract meaningful results from this data. Machine learning is favoured for creating algorithms because it can develop intelligent algorithms capable of accomplishing excellent and wonderful assignments [8] [9]. Artificial intelligence includes artificial neural network algorithms that simulate the human brain, and it has entered our applications through deepfakes, which create fake and unreal content and are the most amazing idea of the last twenty years. ...
Article
Artificial intelligence is one of the most popular and influential sciences in many fields. It works continuously to modernise computer systems so that they operate with high efficiency and think as a human thinks. In addition, this science seeks to make the work of the machine simulate the work of the human brain in thinking and making decisions, according to the environment in which it operates. Therefore, it has become necessary to have artificial intelligence applications in all areas, including education, especially electronic English language teaching. In this regard, the most influential applications and programs that contribute to the development of teaching English electronically, and their effectiveness in developing e-learning, will be reviewed. This article concluded that there are applications of artificial intelligence in teaching English electronically that are of great importance and have a great future in the development of language teaching.
... Information technology systems automatically learn patterns and relationships from data and gain without being explicitly programmed. Machine learning has been successfully supported in business, investigation, and improvement for many years [32][33][34][35]. Furthermore, machine learning can automatically produce knowledge, train algorithms, identify relationships, and recognize unknown patterns. ...
Preprint
In the present period, various words relating to artificial intelligence, machine learning, and deep learning are commonly used in business, healthcare, industry, and the military, among others. In these fields, accurate data modeling and analysis are vital regardless of the size of the data. Given the rapid expansion and vast development of public life, however, the use of big data is complicated and requires a tremendous amount of human work to deal with and extract useful information. Thus, the role of artificial intelligence begins with the analysis of large data sets using scientific approaches, particularly machine learning, in order to uncover decision-making patterns and eliminate human interaction. In this sense, the role of artificial intelligence, machine learning, and deep learning is fast increasing in importance. In this paper, the authors highlight these sciences by addressing how to build and implement them in a variety of decision-making contexts. In addition, the impact of artificial intelligence on healthcare and the benefits this science offers in the future are emphasized. This article indicates that these sciences have a significant impact, particularly in healthcare, as well as the capacity to develop and enhance their decision-making approach. Moreover, artificial intelligence is an essential science, particularly in the face of the unknown future.
... In addition, deep learning techniques aim to automatically extract features from data and abstractions and identify the most critical data required [13][14][15][16]. These techniques are practical when it comes to dealing with large amounts of unsupervised data, learning to represent data naturally, and pursuing goals in data classification and decision-making with high accuracy [17][18][19][20]. ...
Article
Deep learning has become a favoured trend in many applications serving humanity in the past few years. Since deep learning can learn from and be trained on huge amounts of unlabelled data, it has been applied in many fields, including the medical field. In this article, the most noteworthy applications of deep learning are presented briefly: image recognition, automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and bioinformatics. The report concluded that these applications have a significant and vital role in all areas of life.
Chapter
This chapter explores the transformative role of artificial intelligence (AI) in ovarian cancer prediction and prognosis. It aims to analyze complex datasets, predict patient outcomes, and optimize treatment pathways. The study critically examines existing research on ovarian cancer, highlighting gaps and challenges in current prognostic methodologies. The findings demonstrate significant advancements in AI application in ovarian cancer prediction, highlighting its transformative potential in research and clinical practice. AI not only enhances understanding of ovarian cancer complexities, but also offers personalized and optimized treatment strategies, offering hope for improved patient outcomes and overall survival rates.
Article
Background / Purpose: The development of computer systems that can carry out tasks that traditionally require human intelligence is referred to as artificial intelligence (AI). It entails the development of intelligent machines that can reason, learn, solve issues, and make judgements. A fast-developing topic, AI has enormous ramifications for many different businesses and facets of society. By leveraging advanced algorithms and data analysis techniques, AI systems can process and interpret large amounts of information in real-time, enabling them to extract valuable insights and patterns that may be difficult for humans to perceive. AI technologies have a wide range of applications across multiple domains, including healthcare, finance, transportation, manufacturing, education, entertainment, and agriculture. When referring to AI in the context of agriculture, we mean the use of advanced analytics and computational algorithms to analyse massive volumes of agricultural data, anticipate the future, and give farmers and stakeholders useful information. The main goal of applying AI to agriculture is to increase efficiency, sustainability, and productivity across a range of farming operations, and to create smart and efficient systems that can monitor, analyze, and control water resources in real-time, leading to improved water management and sustainable agricultural practices, thereby addressing the challenges faced by the agricultural sector. AI offers significant potential to optimize water usage, enhance crop productivity, and mitigate environmental impact. In this paper, IBM, a significant provider of services in the sector of agriculture in recent years, is examined. Objective: In this case study, artificial intelligence is the main topic with particular emphasis on IBM's agricultural technology.
Design/Methodology/Approach: Academic works published in a variety of peer-reviewed journals, conferences, and business websites provided the necessary information and specifics for this case study on IBM. Findings/Result: This study is primarily concerned with the usefulness and significance of AI in the modern world and with the demand for and necessity of the numerous resources provided by IBM. Discussion topics include the company's business plan, varied results, top clientele, and numerous service types. Originality/Value: The analysis gives a concise description of IBM, the types of data collected and managed, information on artificial intelligence (AI), and the numerous AI services offered by IBM. Paper Type: Case study on the importance of storage and computing requirements for AI services offered by different service providers, with a focus on IBM.
Article
Nowadays, artificial intelligence, machine learning, and deep learning are among the most popular and applied topics in many scientific and life fields that serve humanity. This science seeks to impose itself strongly on the various activities and academic circles and information science. Artificial intelligence techniques have proven their worth to be important in many fields, especially in the medical fields, business administration, military applications, communications, and many others. In short, artificial intelligence is from another world that will be of great importance in the future. In this article, the importance of digitisation and artificial intelligence in the healthcare sector will be addressed, what services they provide to this sector, and how they contribute to the service of healthcare workers and patient satisfaction. This article concluded that artificial intelligence and digital technologies are of great importance in the healthcare sector and can never be dispensed with.
Article
Today, humans are fighting powerful, active, and persistent viruses named coronaviruses. These viruses emerged in 2002, have continued to spread, and have changed their chains dramatically since then. They share many features, but there are also structural differences between them. The main reason coronaviruses caused a pandemic is that the disease is easily transmitted by droplets near infected people, which accelerates the spread of the virus worldwide. The more that is known about the coronaviruses that have profoundly affected humanity in the past and present, and about the diseases they cause, the greater the benefit in helping to design an immune response or preventive vaccine against these viruses in the near future. This article briefly covers coronaviruses, how they started and spread, and the differences and similarities between them. The information in this investigation is taken from published articles and the World Health Organization and is reviewed here. The goal is to document this information for future reference.
Article
The coronavirus is a family of viruses that cause different dangerous diseases that lead to death. Two types of this virus have been previously found: SARS-CoV, which causes a severe respiratory syndrome, and MERS-CoV, which causes a respiratory syndrome in the Middle East. The latest coronavirus, which originated in the Chinese city of Wuhan, is associated with the COVID-19 pandemic. It is a new kind of coronavirus that can harm people and was first discovered in Dec. 2019. According to the statistics of the World Health Organization (WHO), the number of people infected with this serious disease has reached more than seven million worldwide. In Iraq, the number of people infected had reached more than twenty-two thousand as of April 2020. In this article, we applied convolutional neural networks (ConvNets) to classify computed tomography (CT) coronavirus images, in order to assist medical staff in hospitals in categorizing chest CT-coronavirus images at an early stage. The ConvNets are able to automatically learn and extract features from the medical image dataset. The objective of this study is to train the GoogleNet ConvNet architecture, using the COVID-CT dataset, to classify 425 CT-coronavirus images. The experimental results show that the validation accuracy of GoogleNet in training the dataset is 82.14% with an elapsed time of 74 minutes and 37 seconds.
Article
COVID-19 (Coronavirus disease-2019), commonly called Coronavirus or CoV, is a dangerous disease caused by the SARS-CoV-2 virus. It is one of the most widespread zoonotic diseases around the world, which started from one of the wet markets in Wuhan city. Its symptoms are similar to those of the common flu, including cough, fever, muscle pain, shortness of breath, and fatigue. This article suggests implementing machine learning techniques (Random Forest, Logistic Regression, Naïve Bayes, Support Vector Machine) in Python to classify a series of chest X-ray images that include viral pneumonia, COVID-19, and healthy (not infected) cases in humans. The study includes more than 1400 images collected from the Kaggle platform. The experimental outcomes of this study confirmed that the support vector machine technique has high accuracy and excellent performance in the classification of the disease, as reflected by values of 91.8% accuracy, 91.7% sensitivity, 95.9% specificity, 91.8% F1-score, and 97.6% AUC.
Article
Background and objectives: Chest X-ray data have been found to be very promising for assessing COVID-19 patients, especially for resolving emergency-department and urgent-care-center overcapacity. Deep-learning (DL) methods in artificial intelligence (AI) play a dominant role as high-performance classifiers in the detection of the disease using chest X-rays. Given that many new DL models are being developed for this purpose, the objective of this study is to investigate the fine tuning of pretrained convolutional neural networks (CNNs) for the classification of COVID-19 using chest X-rays. If fine-tuned pretrained CNNs can provide equivalent or better classification results than other more sophisticated CNNs, then the deployment of AI-based tools for detecting COVID-19 using chest X-ray data can be more rapid and cost-effective. Methods: Three pretrained CNNs, which are AlexNet, GoogleNet, and SqueezeNet, were selected and fine-tuned without data augmentation to carry out 2-class and 3-class classification tasks using 3 public chest X-ray databases. Results: In comparison with other recently developed DL models, the 3 pretrained CNNs achieved very high classification results in terms of accuracy, sensitivity, specificity, precision, \(F_1\) score, and area under the receiver-operating-characteristic curve. Conclusion: AlexNet, GoogleNet, and SqueezeNet require the least training time among pretrained DL models, but with suitable selection of training parameters, excellent classification results can be achieved without data augmentation by these networks. The findings contribute to the urgent need for harnessing the pandemic by facilitating the deployment of AI tools that are fully automated and readily available in the public domain for rapid implementation.
Article
Major public health incidents such as COVID-19 typically have characteristics of being sudden, uncertain, and hazardous. If a government can effectively accumulate big data from various sources and use appropriate analytical methods, it may quickly respond to achieve optimal public health decisions, thereby ameliorating negative impacts from a public health incident and more quickly restoring normality. Although there are many reports and studies examining how to use big data for epidemic prevention, there is still a lack of an effective review and framework of the application of big data in the fight against major public health incidents such as COVID-19, which would be a helpful reference for governments. This paper provides clear information on the characteristics of COVID-19, as well as key big data resources, big data for the visualization of pandemic prevention and control, close contact screening, online public opinion monitoring, virus host analysis, and pandemic forecast evaluation. A framework is provided as a multidimensional reference for the effective use of big data analytics technology to prevent and control epidemics (or pandemics). The challenges and suggestions with respect to applying big data for fighting COVID-19 are also discussed.
Article
Coronavirus Disease (COVID19) is a fast-spreading infectious disease that is currently causing a healthcare crisis around the world. Due to the current limitations of the reverse transcription-polymerase chain reaction (RT-PCR) based tests for detecting COVID19, recently radiology imaging based ideas have been proposed by various works. In this work, various Deep CNN based approaches are explored for detecting the presence of COVID19 from chest CT images. A decision fusion based approach is also proposed, which combines predictions from multiple individual models, to produce a final prediction. Experimental results show that the proposed decision fusion based approach is able to achieve above 86% results across all the performance metrics under consideration, with average AUROC and F1-Score being 0.883 and 0.867, respectively. The experimental observations suggest the potential applicability of such Deep CNN based approach in real diagnostic scenarios, which could be of very high utility in terms of achieving fast testing for COVID19.
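The decision fusion idea described above, combining predictions from several individual models into one final prediction, can be sketched as follows. The abstract does not state which fusion rule was used, so simple majority voting is assumed here purely for illustration; the function `majority_vote` and the three prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Fuse per-model label lists into one label per sample by majority vote."""
    fused = []
    # zip(*...) groups the votes of all models for the same sample
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        fused.append(counts.most_common(1)[0][0])
    return fused

# Hypothetical binary predictions (1 = COVID-19, 0 = non-COVID) for four scans
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
fused = majority_vote([model_a, model_b, model_c])
print(fused)  # [1, 0, 1, 1]
```

With an odd number of models and two classes, a tie cannot occur; other fusion rules, such as averaging predicted probabilities, would need a different function.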
Article
This paper reviews applications of machine learning (ML) predictive models in the diagnosis of chronic diseases. Chronic diseases (CDs) are responsible for a major portion of global health costs. Patients who suffer from these diseases need lifelong treatment. Nowadays, predictive models are frequently applied in the diagnosis and forecasting of these diseases. In this study, we reviewed the state-of-the-art approaches that encompass ML models in the primary diagnosis of CD. This analysis covers 453 papers published between 2015 and 2019, and our document search was conducted in the PubMed (Medline) and Cumulative Index to Nursing and Allied Health Literature (CINAHL) libraries. Ultimately, 22 studies were selected to present all modeling methods in a precise way that explains CD diagnosis and usage models of individual pathologies with associated strengths and limitations. Our outcomes suggest that there are no standard methods to determine the best approach in real-time clinical practice since each method has its advantages and disadvantages. Among the methods considered, support vector machines (SVM), logistic regression (LR), and clustering were the most commonly used. These models are highly applicable in the classification and diagnosis of CD and are expected to become more important in medical practice in the near future.
Article
Data mining is a research technique for finding interesting patterns in the hidden information of a database. In the health sector, data mining can be used to diagnose a disease from the patient's medical data record. This research used a Chronic Kidney Disease (CKD) dataset obtained from the UCI machine learning repository. In this dataset, almost half of the attributes are continuous numeric types. Continuous attributes can lead to low accuracy because of the unlimited number of values they can take; hence, they need to be transformed into discrete data. In certain cases, using all attributes can produce a low level of accuracy because some attributes are irrelevant and do not correlate with the target class. Therefore, these attributes need to be selected in advance to get more accurate results. One of the techniques of data mining is classification, and one classification algorithm is C4.5. The purpose of this study is to increase the accuracy of the C4.5 algorithm by applying discretization and Correlation-Based Feature Selection (CFS) for chronic kidney disease diagnosis. An improved accuracy has been achieved by applying discretization and CFS. Discretization was used to handle the continuous values, while CFS was used for attribute selection. An experiment was conducted using WEKA (Waikato Environment for Knowledge Analysis). The application of discretization and CFS in C4.5 resulted in an increase in accuracy of 0.5 percentage points: C4.5 alone has an accuracy of 97%, C4.5 with discretization 97.25%, and C4.5 with discretization and CFS 97.5%.
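The discretization step described above, which turns a continuous attribute into a small number of discrete categories, can be sketched as below. The abstract does not say which discretization scheme was applied in WEKA, so equal-width binning is an assumption made for illustration; the function `equal_width_bins` and the attribute values are hypothetical and not taken from the CKD dataset.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1 using
    equal-width intervals over the observed range (illustrative only)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything falls into a single bin
        return [0 for _ in values]
    bins = []
    for v in values:
        idx = int((v - lo) / width)
        # The maximum value would land in bin n_bins, so clamp it
        bins.append(min(idx, n_bins - 1))
    return bins

# Hypothetical continuous attribute values
serum_creatinine = [0.5, 1.0, 1.5, 4.5, 8.5]
bins = equal_width_bins(serum_creatinine, n_bins=4)
print(bins)  # [0, 0, 0, 2, 3]
```

A decision-tree learner such as C4.5 can then treat the bin index as a categorical attribute instead of searching over every possible split point of the raw values.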
Article
One of the most exciting tools that have entered the material science toolbox in recent years is machine learning. This collection of statistical methods has already proved to be capable of considerably speeding up both fundamental and applied research. At present, we are witnessing an explosion of works that develop and apply machine learning to solid-state systems. We provide a comprehensive overview and analysis of the most recent research in this topic. As a starting point, we introduce machine learning principles, algorithms, descriptors, and databases in materials science. We continue with the description of different machine learning approaches for the discovery of stable materials and the prediction of their crystal structure. Then we discuss research in numerous quantitative structure–property relationships and various approaches for the replacement of first-principle methods by machine learning. We review how active learning and surrogate-based optimization can be applied to improve the rational design process and related examples of applications. Two major questions are always the interpretability of and the physical understanding gained from machine learning models. We consider therefore the different facets of interpretability and their importance in materials science. Finally, we propose solutions and future research paths for various challenges in computational materials science.
Article
Background: Coronavirus disease-19 (COVID-19), caused by a novel member of the coronavirus family, is a respiratory disease that rapidly reached pandemic proportions with high morbidity and mortality. In only a few months, it has had a dramatic impact on society and world economies. COVID-19 has presented numerous challenges to all aspects of health care, including reliable methods for diagnosis, treatment, and prevention. Initial efforts to contain the spread of the virus were hampered by the time required to develop reliable diagnostic methods. Artificial intelligence (AI) is a rapidly growing field of computer science with many applications for health care. Machine learning is a subset of AI that uses deep learning with neural network algorithms. It can recognize patterns and achieve complex computational tasks often far quicker and with increased precision than can humans. Methods: In this article, we explore the potential for the simple and widely available chest X-ray (CXR) to be used with AI to diagnose COVID-19 reliably. Microsoft CustomVision is an automated image classification and object detection system that is a part of Microsoft Azure Cognitive Services. We utilized publicly available CXR images for patients with COVID-19 pneumonia, pneumonia from other etiologies, and normal CXRs as a dataset to train Microsoft CustomVision. Results: Our trained model overall demonstrated 92.9% sensitivity (recall) and positive predictive value (precision), with results for each label showing sensitivity and positive predictive value at 94.8% and 98.9% for COVID-19 pneumonia, 89% and 91.8% for non-COVID-19 pneumonia, 95% and 88.8% for normal lung. We then validated the program using CXRs of patients from our institution with confirmed COVID-19 diagnoses along with non-COVID-19 pneumonia and normal CXRs. Our model performed with 100% sensitivity, 95% specificity, 97% accuracy, 91% positive predictive value, and 100% negative predictive value. 
Conclusions: We have used a readily available, commercial platform to demonstrate the potential of AI to assist in the successful diagnosis of COVID-19 pneumonia on CXR images. The findings have implications for screening and triage, initial diagnosis, monitoring disease progression, and identifying patients at increased risk of morbidity and mortality. Based on the data, a website was created to demonstrate how such technologies could be shared and distributed to others to combat entities such as COVID-19 moving forward.
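The performance measures reported in this abstract (sensitivity, specificity, accuracy, positive predictive value, and negative predictive value) all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `diagnostic_metrics` and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # recall: true positives found
        "specificity": tn / (tn + fp),            # true negatives found
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                    # positive predictive value (precision)
        "npv": tn / (tn + fn),                    # negative predictive value
    }

# Hypothetical counts for illustration only
metrics = diagnostic_metrics(tp=45, fp=10, tn=90, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, while the predictive values also depend on how common the disease is in the evaluated sample, which is why all five are usually reported together.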