Utilisation of Machine Learning Techniques in Testing and Training of Different Medical Datasets

November 2021
Asian Journal of Computer and Information Systems 9(4):29-34

DOI:10.24203/ajcis.v9i4.6765

License
CC BY-NC 4.0

Authors:

Maad M. Mijwil

Baghdad College of Economic Sciences University

Israa Ezzat Salem

Baghdad College of Economic Sciences University

Rana Abttan

University of Baghdad

Asian Journal of Computer and Information Systems (ISSN: 2321 – 5658)

Volume 9 – Issue 4, October 2021

Asian Online Journals (www.ajouronline.com) 29

Utilisation of Machine Learning Techniques in Testing and Training of Different Medical Datasets

Maad M. Mijwil1, Israa Ezzat Salem2 and Rana A. Abttan3

1 Computer Techniques Engineering Department, Baghdad College of Economic Sciences University

Baghdad, Iraq

Email: mr.maad.alnaimiy [AT] baghdadcollege.edu.iq

2 Computer Techniques Engineering Department, Baghdad College of Economic Sciences University

Baghdad, Iraq

Email: israa.ezzat [AT] baghdadcollege.edu.iq

3 Computer Techniques Engineering Department, Baghdad College of Economic Sciences University

Baghdad, Iraq

Email: rana.ali.abttan [AT] baghdadcollege.edu.iq

_________________________________________________________________________________

ABSTRACT— Chemical waste increases day after day and new types of it keep emerging. Together with high levels of toxic pollution, the difficulty of daily life, rising psychological stress, and other factors, this has led to the emergence of many diseases that affect humans, including deadly ones such as COVID-19. Symptoms may or may not appear in a person; some people know their condition, while others neglect their health through lack of knowledge, which may lead to death or to a lifelong chronic disease. In this regard, the authors apply four machine learning techniques (Support Vector Machine, C5.0 Decision Tree, K-Nearest Neighbours, and Random Forest), chosen for their influence in medical sciences, to identify the technique that gives the highest accuracy in detecting diseases. Such a technique can help to recognise symptoms and diagnose them correctly. This article covers six datasets from the UCI machine learning repository, namely the Wisconsin Breast Cancer, Chronic Kidney Disease, Immunotherapy, Cryotherapy, Hepatitis and COVID-19 datasets. In the results section, the techniques are compared to find out which performs best and which performs worst on the dataset of each disease.

Keywords— Disease, Machine Learning Techniques, COVID-19, Symptoms, Medical Datasets.

_________________________________________________________________________________

1. INTRODUCTION

A doctor or specialist examines a patient to establish whether there is a disorder of physical or psychological function that affects the patient's well-being and performance. A disease is usually associated with specific signs or symptoms that appear in the patient. For example, flu is usually associated with symptoms such as headache, runny nose, and fever. Patients frequently do not differentiate between a disease and its symptoms. Some diseases occur more frequently at certain times of the year and are colloquially called seasonal diseases [1]. The most common seasonal illnesses are bronchitis and influenza [2].

Figure 1: CT-scan images of patients with COVID-19 disease.

Presently, the volume of data is growing dramatically, and its complexity increases day by day. Analysing it and extracting useful statistics by hand is challenging, which is why suitable computer-based techniques are sought. Medical data is a case in point, because it becomes more complicated as diseases spread more widely around the world [3]. It has become difficult to manage, especially with the spread of COVID-19 and the rise in infections and deaths [4]. This has pushed doctors and specialists to look for techniques that help them diagnose patients and determine their condition quickly and accurately. Machine learning techniques are among them. Machine learning [5] is evolving and growing in the world of healthcare, and healthcare [6] is one of the most vital areas witnessing remarkable advances in machine learning techniques. Recently, machine learning has been adopted to predict and analyse medical datasets because of its speed, accuracy, and low cost [7]. For example, it has been widely applied to analysing chest images of patients with COVID-19 [8-11]. These techniques can be trained to examine such images, locate abnormalities, and point to the areas of the lung where the virus has spread, giving a high-quality analysis [12]. With these advanced technologies, clinicians can be better informed when analysing patient information [13]. Machine learning can also predict diseases such as stroke, breast cancer and many others at an early stage, which makes these techniques of great value to doctors. Figure 1 shows a set of CT-scan images of people with COVID-19 disease [14].

The main contribution of this article is an investigation of the performance of four machine learning techniques (Support Vector Machine, C5.0 Decision Tree, K-Nearest Neighbours, and Random Forest) on a set of binary datasets chosen from the University of California, Irvine (UCI) machine learning repository, in order to identify the technique that gives the best results for each disease and can therefore support doctors and specialists. This work is conducted using Python, a high-level programming language that Guido van Rossum began developing at the Centrum Wiskunde & Informatica (CWI) research centre in the late 1980s. The language is widely used in artificial intelligence.

The rest of this article is organised as follows. Section two reviews a set of recent studies that apply machine learning techniques to medical datasets obtained from the UCI machine learning repository. Section three discusses the materials and techniques used in this research. Section four covers the experimental results and the comparison between the techniques. Section five presents the conclusions and future work.

2. LITERATURE SURVEY

This section presents several previous works that share the aims of the current paper and that influenced the authors. The authors have not found a published study that covers the same medical datasets chosen from the UCI repository, nor one that applies the same set of techniques to them, which distinguishes this paper.

In a 2016 study from India, Aswal et al. [15] implement machine learning techniques (Support Vector Machine, C5.0 decision tree, and K-Nearest Neighbour) on medical datasets from the UCI machine learning repository, namely the Indian Liver Patient, Hepatitis, Thyroid Disease, Lung Cancer, and Pima Indians Diabetes datasets. Their research finds that the Support Vector Machine performs best. In a paper published on IEEE Xplore in 2017, Islam et al. [16] propose machine learning techniques (K-Nearest Neighbours and Support Vector Machine) to diagnose breast cancer using the Wisconsin breast cancer dataset. This study achieves an accuracy of more than 98% with the support vector machine and more than 97% with K-Nearest Neighbours. Cahyani and Muslim [17] improve the C4.5 algorithm for chronic kidney disease diagnosis by adding two steps, discretization and correlation-based feature selection, and obtain an accuracy of more than 97%. Eedi and Kolla [18] employ machine learning techniques (K-Nearest Neighbour, Random Forest, Naïve Bayes, Logistic Regression, and Decision Tree) on the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI machine learning repository. Their research finds that random forest performs best, with more than 93% accuracy. The last study covered in this section is by Kumar et al. [19], who apply a Support Vector Machine with genetic programming to datasets from the UCI repository, namely BUPA liver disorder, chronic kidney disease (CKD), fertility, and Wisconsin diagnostic breast cancer (WDBC). The authors report accuracies for BUPA, Fertility, WDBC, and CKD of 75.36%, 85.0%, 99.12%, and 100%, respectively.

3. MATERIALS AND TECHNIQUES

This section is divided into two parts: the first describes the repository from which the data is taken, and the second describes the techniques utilised in this article. The UCI Machine Learning Repository [20] is a website affiliated with the University of California, Irvine that offers nearly 600 free datasets to researchers and authors in the machine learning community. These datasets can be used freely on one condition, which is to cite both the source of the data and the repository. Table 1 gives a concise description of the datasets utilised in this comparison, with their numbers of attributes and instances.

Table 1: Dataset’s description

Datasets | Attributes | Instances
Wisconsin Breast Cancer [21] | | 569
Chronic Kidney disease [22] | | 400
Immunotherapy [23] | |
Cryotherapy [24] | |
Hepatitis [25] | | 155
COVID-19 [26] | |

In the second part, the importance of each technique utilised in this article is concisely discussed, where a set of machine

learning techniques are utilised, which are outlined below.

Support Vector Machine (SVM)

SVM [27] is one of the most widespread supervised machine learning techniques. It was introduced in 1992 by three scientists: Bernhard Boser, Isabelle Guyon, and Vladimir Vapnik. The classifier is applied to both classification and regression and works with linear decision functions. It can predict with high accuracy while avoiding overfitting to the data. SVMs can be summarised as learning systems that use a hypothesis space of linear functions in a high-dimensional feature space and are trained with an algorithm from optimization theory that implements a learning bias derived from statistical learning theory. The technique uses hyperplanes to separate the classes in the dataset and supports various kernels, such as polynomial, sigmoid, radial basis function, and linear.
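The four kernels named above can be written down in a few lines. The following is a minimal sketch in plain Python, not the code used in this work: the function names and the default values of gamma, coef0 and degree are assumptions chosen for illustration, and the decision function takes support vectors and dual coefficients as given rather than training them.

```python
import math

def dot(x, z):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def poly_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    # Polynomial kernel: (gamma * <x, z> + coef0) ** degree
    return (gamma * dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # Radial basis function kernel: exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    # Sigmoid kernel: tanh(gamma * <x, z> + coef0)
    return math.tanh(gamma * dot(x, z) + coef0)

def decision(x, support_vectors, dual_coefs, bias, kernel):
    """Signed decision value: sum_i coef_i * K(sv_i, x) + bias.

    dual_coefs are assumed to already include the class sign
    (alpha_i * y_i); the sign of the result gives the predicted side
    of the hyperplane.
    """
    return sum(c * kernel(sv, x)
               for sv, c in zip(support_vectors, dual_coefs)) + bias
```

Swapping the kernel argument changes the shape of the decision boundary without changing the rest of the classifier, which is why the kernel is usually treated as a tunable choice per dataset.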

C5.0 Decision Tree (C5.0 DT)

C5.0 [28] is an updated and revised version of the C4.5 decision tree. It grows branches using the information gain measure: when the tree model is built, each split is made on the attribute that yields the maximum information gain. The information of a set is obtained by multiplying the probability of each class by the logarithm of that probability and summing over the classes; this entropy is the impurity measure of an attribute. Entropy values are calculated for the whole tree or for a sub-tree, and the gain of each candidate attribute follows from them. The process continues until no further division within the tree is required. The most significant characteristics of this version of the decision tree are its ability to create a large number of branches to accommodate large amounts of data, its lower memory consumption, and its faster execution. Unfortunately, this technique does not work well with small datasets.
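The entropy and information gain calculation described above can be sketched as follows. This is not an implementation of C5.0 and not the code used in this work; it only illustrates the splitting criterion, and the attribute names in the usage note are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: -sum over classes of p * log2(p)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the groups
    obtained by splitting on rows[i][attribute]."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """Attribute with the maximum information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

For example, with four records described by the hypothetical attributes "fever" and "cough", where the class follows "fever" exactly, splitting on "fever" gains one full bit while splitting on "cough" gains nothing, so `best_attribute` selects "fever".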

K-Nearest Neighbours (K-NN)

K-NN [29] is one of the easiest machine learning techniques to implement. Classification is done by identifying the nearest neighbours of a query example and using those neighbours to determine the class of the query. Before use, the value of k must be specified; it is set to 5 by default. The query is then assigned the class held by the majority of its k closest neighbours. It is often necessary to take more than one neighbour into account. Because the training examples are needed at run time, they must be stored in memory, so the technique is sometimes called Memory-Based Classification. A disadvantage is that it is a lazy learning method: induction is delayed until run time. The technique uses a distance measure to calculate the distance between two points, the best known of which is the Euclidean distance.
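The procedure described above fits in a few lines. The sketch below is an illustration in plain Python rather than the code used in this work; the function names are assumptions, and k defaults to 5 as stated in the text.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_points, train_labels, query, k=5):
    """Majority class among the k training points closest to query.

    Lazy learning: nothing is fitted in advance, so every stored
    example is compared with the query at prediction time.
    """
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because all distances are computed at query time, prediction cost grows with the number of stored examples, which is the practical side of the lazy learning disadvantage noted above.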

Random Forest (RF)

In the early 2000s, Leo Breiman introduced a scheme that he called a random forest [30], whose goal is to build an ensemble predictor from a set of decision trees that grow in randomly selected subspaces of the data. The technique can be defined as a combination of tree predictors in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. Its generalization error depends on the strength of the individual trees in the forest and on the correlation between them. Splitting each node on a random selection of features yields error rates that compare favourably with Adaptive Boosting while being more robust to noise. The technique also computes internal estimates of strength, correlation, and error, which are used to show the response to increasing the number of features used in splitting. The technique can also be used for regression. In many applications it gives high accuracy with modest processing time.
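The two sources of randomness described above, a bootstrap sample of the data and a random selection of features, can be sketched with a deliberately simplified forest. The code below is an illustration only, not the code used in this work: each "tree" is a one-split decision stump instead of a fully grown tree, and the function names, n_trees, n_features and seed are assumptions.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """One-split tree: choose the (feature, threshold) pair with the
    fewest training errors among the given features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = (Counter(right).most_common(1)[0][0]
                           if right else left_label)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]

def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), n_features)     # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(left if row[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]
```

Individual stumps trained on different bootstrap samples disagree near the class boundary, and the majority vote averages those disagreements out; a full random forest applies the same idea with deeper trees and a fresh feature subset at every node.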

4. EXPERIMENTAL RESULTS

In this section, the results of each technique are presented and its performance is evaluated on four measures: testing accuracy, training accuracy, testing time, and training time. Figure 2 shows the workflow of this article in terms of input, processing and output of all medical data. Tables 2 to 5 give the evaluation results of each technique on the medical datasets. The experiments were run on a computer with the following specifications: Intel® Core™ i5-1130G7 processor (4 cores), 512 GB SSD, 16 GB RAM, Python v.3.7 with Spyder IDE v.4.2.1, running on Windows 10 Home build 19041, 64-bit (last updated in February 2021).
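The four measures listed above could be collected with a small harness such as the one below. This is a sketch under assumptions, not the procedure used in this work: the article does not state how the data was split or how the times were measured, so the `fit` interface, the use of `time.perf_counter`, and the placeholder majority-class classifier are all illustrative choices.

```python
import time

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    correct = sum(predict(row) == y for row, y in zip(rows, labels))
    return correct / len(labels)

def evaluate(fit, train_rows, train_labels, test_rows, test_labels):
    """fit(rows, labels) must return a predict(row) function."""
    start = time.perf_counter()
    predict = fit(train_rows, train_labels)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    testing_accuracy = accuracy(predict, test_rows, test_labels)
    testing_time = time.perf_counter() - start

    return {
        "testing_accuracy": testing_accuracy,
        "training_accuracy": accuracy(predict, train_rows, train_labels),
        "testing_time": testing_time,
        "training_time": training_time,
    }

def fit_majority(rows, labels):
    """Placeholder classifier: always predicts the most frequent label."""
    majority = max(set(labels), key=labels.count)
    return lambda row: majority
```

Any classifier that can be wrapped as a `fit` function returning a `predict` function can be passed to `evaluate`, so the same harness would serve all four techniques and all six datasets.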

Figure 2: The stages of this work

Table 2: Execution evaluation of SVM

Medical Datasets | Testing Accuracy | Training Accuracy | Testing Time | Training Time
Wisconsin Breast Cancer | 0.96114795214 | 0.9536719818323 | 0.04125
Chronic Kidney disease | 0.97578491347 | 0.9347588136591 | 0.05125
Immunotherapy | 0.76261839462 | 0.7162193529 | 0.013425
Cryotherapy | 0.92333333333 | 0.82131145451 | 0.013425
Hepatitis | 0.8361859564 | 0.7861292128 | 0.013425
COVID-19 | 0.913968222222 | 0.891968222222 | 0.04125

Table 3: Execution evaluation of C5.0

Medical Datasets | Testing Accuracy | Training Accuracy | Testing Time | Training Time
Wisconsin Breast Cancer | 0.9381429581 | 0.887820375481 | 0.034825
Chronic Kidney disease | 0.9741222222 | 0.9537388134491 | 0.05125
Immunotherapy | 0.9332432782613 | 0.881323282321 | 0.015625
Cryotherapy | 0.976210000 | 0.890301386712 | 0.05125
Hepatitis | 0.88264867336 | 0.8119202925 | 0.05125
COVID-19 | 0.71622412555 | 0.66731424444 | 0.066125

Table 4: Execution evaluation of K-NN

Medical Datasets | Testing Accuracy | Training Accuracy | Testing Time | Training Time
Wisconsin Breast Cancer | 0.94102895133891 | 0.93119309670342 | 0.066125
Chronic Kidney disease | 0.985222222222 | 0.925223452898 | 0.013425
Immunotherapy | 0.8132435886611 | 0.7955811238125 | 0.013425
Cryotherapy | 0.98888888888 | 0.97848589843 | 0.04125
Hepatitis | 0.826086956 | 0.7611940298 | 0.04125
COVID-19 | 0.66666666666 | 0.63218467925 | 0.07125

Table 5: Execution evaluation of RF

Medical Datasets | Testing Accuracy | Training Accuracy | Testing Time | Training Time
Wisconsin Breast Cancer | 0.8891288722 | 0.9021282721 | 0.066125
Chronic Kidney disease | 0.875444444 | 0.915243742 | 0.074875
Immunotherapy | 0.9846421835 | 0.9280745516 | 0.066125
Cryotherapy | 0.733333333 | 0.7321862671 | 0.066125
Hepatitis | 0.928808192 | 0.928908288 | 0.074875
COVID-19 | 0.9888281133 | 0.9828222111 | 0.074875
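The testing accuracies in Tables 2 to 5 can be compared per dataset with a short script. The sketch below copies those values rounded to four decimal places and orders the techniques from highest to lowest testing accuracy. Ranking by testing accuracy alone is an assumption made for this illustration; the article does not state the criterion behind its own summary of the results, so this ordering should not be read as a reproduction of it.

```python
# Testing accuracy from Tables 2 to 5, rounded to four decimal places.
TESTING_ACCURACY = {
    "Wisconsin Breast Cancer": {"SVM": 0.9611, "C5.0": 0.9381, "K-NN": 0.9410, "RF": 0.8891},
    "Chronic Kidney disease": {"SVM": 0.9758, "C5.0": 0.9741, "K-NN": 0.9852, "RF": 0.8754},
    "Immunotherapy": {"SVM": 0.7626, "C5.0": 0.9332, "K-NN": 0.8132, "RF": 0.9846},
    "Cryotherapy": {"SVM": 0.9233, "C5.0": 0.9762, "K-NN": 0.9889, "RF": 0.7333},
    "Hepatitis": {"SVM": 0.8362, "C5.0": 0.8826, "K-NN": 0.8261, "RF": 0.9288},
    "COVID-19": {"SVM": 0.9140, "C5.0": 0.7162, "K-NN": 0.6667, "RF": 0.9888},
}

def rank_techniques(scores):
    """Technique names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

RANKING = {dataset: rank_techniques(scores)
           for dataset, scores in TESTING_ACCURACY.items()}

if __name__ == "__main__":
    for dataset, order in RANKING.items():
        print(f"{dataset}: {' > '.join(order)}")
```

On these figures no single technique leads on every dataset, which is the reason for comparing them dataset by dataset.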

5. CONCLUSIONS AND FUTURE DIRECTIONS

Health is an invaluable blessing. Anne Wilson Schaef, an American clinical psychologist, puts it this way: “Good health is not something we can buy. However, it can be an extremely valuable savings account”. In this article, machine learning techniques are utilised to analyse medical datasets chosen from the UCI repository. The article aims to study the performance of each technique on these datasets, each of which has different attributes and instances. Table 6 summarises the performance of each technique on a four-level index: excellent, good, fair, and inadequate execution. In the future, other techniques can be applied to the same or to other datasets in order to assess their strength in analysing medical data.

Table 6: The effect of executing all techniques

Medical Datasets | Excellent Execution | Good Execution | Fair Execution | Inadequate Execution
Wisconsin Breast Cancer | SVM | K-NN | C5.0
Chronic Kidney disease | C5.0 | K-NN | SVM
Immunotherapy | C5.0 | K-NN | SVM
Cryotherapy | K-NN | C5.0 | SVM
Hepatitis | C5.0 | SVM | K-NN
COVID-19 | SVM | C5.0 | K-NN

6. REFERENCES

[1] Grassly N. C. and Fraser C., “Seasonal infectious disease epidemiology,” Proceedings. Biological sciences, vol.273, no.1600, pp: 2541–2550, July 2006. https://doi.org/10.1098/rspb.2006.3604

[2] Tate M. D., Deng Y., Jones J. E., Anderson G. P., Brooks A. G., and Reading P. C., “Neutrophils Ameliorate Lung Injury and the Development of Severe Disease during Influenza Infection,” The Journal of Immunology, vol. 183, pp:7441-7450, November 2009. https://doi.org/10.4049/jimmunol.0902497

[3] Pandey S. C., “Data Mining Techniques for Medical Data: A Review,” In Proceedings of International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), pp:1-12, Paralakhemundi, India, 3-5 October 2016. https://doi.org/10.1109/SCOPES.2016.7955586

[4] Jia Q., Guo Y., Wang G., and Barnes S. J., “Big Data Analytics in the Fight against Major Public Health Incidents (Including COVID-19): A Conceptual Framework,” International Journal of Environmental Research and Public Health, vol.17, no.6161, pp:1-20, August 2020. https://doi.org/10.3390/ijerph17176161

[5] Jones L. D., Golan D., Hanna S. A., and Ramachandran M., “Artificial intelligence, machine learning and the evolution of healthcare: A bright future or cause for concern?,” Bone & Joint Research, vol.7, no.33, pp:223-225, March 2018. https://doi.org/10.1302/2046-3758.73.BJR-2017-0147.R1

[6] Schmidt J., Marques M. R. G., Botti S., and Marques M. A. L., “Recent Advances and Applications of Machine Learning in Solid-State Materials Science,” NPJ Computational Materials, vol.5, no.83, pp:1-11, August 2019. https://doi.org/10.1038/s41524-019-0221-0

[7] Battineni G., Sagaro G. G., Chinatalapudi N., Amenta F., “Applications of Machine Learning Predictive Models in the Chronic Disease Diagnosis,” Journal of Personalized Medicine, vol.10, no.21, pp:1-11, March 2020. https://doi.org/10.3390/jpm10020021

[8] Pham T. D., “Classification of COVID-19 chest X-rays with deep learning: new models or fine tuning?” Health Information Science and Systems, vol.9, no. 2, November 2020. https://doi.org/10.1007/s13755-020-00135-3

[9] Mijwil, M. M., “Implementation of Machine Learning Techniques for the Classification of Lung X-Ray Images Used to Detect COVID-19 in Humans,” Iraqi Journal of Science, vol.62, no.6., pp: 2099-2109, 2 July 2021. https://doi.org/10.24996/ijs.2021.62.6.35.

 
[10] Mijwil, M. M. and Al-Zubaidi, E. A., “Medical Image Classification for Coronavirus Disease (COVID-19) Using Convolutional Neural Networks,” Iraqi Journal of Science, vol.62, no.8, pp: 2740-2747, 31 August 2021. https://doi.org/10.24996/ijs.2021.62.8.27.

[11] Mijwil, M. M., Alsaadi, A. S, and Aggarwal K., “Differences and Similarities Between Coronaviruses: A Comparative Review,” Asian Journal of Pharmacy, Nursing and Medical Sciences, vol.9, no.4, pp:49-61. 10 September 2021. https://doi.org/10.24203/ajpnms.v9i4.6696

[12] Borkowski A. A., Viswanadhan N. A., Thomas L. B., Guzman R. D., Deland L. A., and Mastorides S. M., “Using Artificial Intelligence for COVID-19 Chest X-ray Diagnosis,” Federal practitioner: for the health care professionals of the VA, DoD, and PHS, vol.37, no.9, pp: 398–404, September 2020. https://doi.org/10.12788/fp.0045

[13] Sidey-Gibbons J. A. M. and Sidey-Gibbons C. J., “Machine Learning in Medicine: A Practical Introduction,” BMC Medical Research Methodology, vol.19, no.64, pp:1-18, March 2019. https://doi.org/10.1186/s12874-019-0681-4

[14] Mishra A. K., Das S. K., Roy P., and Bandyopadhyay S., “Identifying COVID19 from Chest CT Images: A Deep Convolutional Neural Networks Based Approach,” Journal of Healthcare Engineering, vol.2020, ID. 8843664, pp:1-7, August 2020. https://doi.org/10.1155/2020/8843664

[15] Aswal S., Ahuja N. J., and Ritika, “Experimental analysis of traditional classification algorithms on bio medical dtatasets,” In Proceedings of International Conference on Next Generation Computing Technologies (NGCT), pp:1-6, Dehradun, India, 14-16 October 2016. https://doi.org/10.1109/NGCT.2016.7877478

[16] Islam M., Iqbal I., Haque R., and Hasan K., “Prediction of breast cancer using support vector machine and K-Nearest neighbors,” In Proceedings of International Conference on Region 10 Humanitarian Technology (R10-HTC), pp:1-6, Dhaka, Bangladesh, 21-23 December 2017. https://doi.org/10.1109/R10-HTC.2017.8288944

[17] Cahyani N., and Muslim M. A., “Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis,” Journal of Telecommunication, Electronic and Computer Engineering, Vol.12 No.1, pp:25-32, March 2020.

[18] Eedi H. and Kolla M. “Machine Learning Approaches for Healthcare Data Analysis,” Journal of Critical Reviews, vol.7, no.4, pp:806-81,1 February 2020. http://dx.doi.org/10.31838/jcr.07.04.149

[19] Kumar A., Sinha N., and Bhardwaj A., “A Novel Fitness Function in Genetic Programming for Medical Data Classification,” Journal of Biomedical Informatics, vol. 112, December 2020. https://doi.org/10.1016/j.jbi.2020.103623

[20] Dua D. and Graff C., UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2019.

[21] Street W. N., Wolberg W. H., and Mangasarian O. L., “Nuclear feature extraction for breast tumor diagnosis,” IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, vol.1905, pp:861-870, California, United States, 1993.

[22] Thomas R., Kanso A., and Sedor J. R., “Chronic Kidney Disease and Its Complications,” Primary Care: Clinics in Office Practice, vol.35, pp: 329–344, 2008. https://doi.org/10.1016/j.pop.2008.01.008

[23] Khozeimeh F., Azad F. J., Oskouei Y. M., M. Jafari, S. Tehranian S., Alizadehsani R., and Layegh P., “Intralesional Immunotherapy Compared to Cryotherapy in The Treatment of Warts,” International Journal of Dermatology, vol.56, pp:474–478. 2017, https://doi.org/10.1111/ijd.13535

[24] Khozeimeh F., Alizadehsani R., Roshanzamir M., Khosravi A., Layegh P., and Nahavandi S., “An expert system for selecting wart treatment method,” Computers in Biology and Medicine, vol. 81, pp:167-175, February 2017. https://doi.org/10.1016/j.compbiomed.2017.01.001

[25] Cestnik G., Konenenko I., and Bratko I., Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In: Bratko, I. and Lavrac, N., Eds., Progress in Machine Learning, Sigma Press, Wilmslow, pp: 31-45, 1987.

[26] Wu Y., Chen C., and Chan Y., “The outbreak of COVID-19: An overview,” Journal of the Chinese Medical Association, vol.83, no.3, pp:217-220. March 2020. https://doi.org/10.1097/JCMA.0000000000000270

[27] Yu W., Liu T., Valdez R., Gwinn M., and Khoury M. J., “Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes,” BMC Medical Informatics and Decision Making, vol.10, no.16, pp:1-7, March 2010. https://doi.org/10.1186/1472-6947-10-16

[28] Rajeswari S., and Suthendran K., “C5.0: Advanced Decision Tree (ADT) classification model for agricultural data analysis on cloud,” Computers and Electronics in Agriculture, vol.156, pp:530-539, December 2018. https://doi.org/10.1016/j.compag.2018.12.013

[29] Zhang Z., “Introduction to machine learning: k-nearest neighbors,” Annals of Translational Medicine, vol.4, no.11, pp:1-7, June 2016. https://doi.org/10.21037/atm.2016.03.37

[30] Biau G., and Editor: Yu B., “Analysis of a Random Forests Model,” Journal of Machine Learning Research, vol.13, pp:1063-1095, April 2012.
 
 

Perspectives on Artificial Intelligence Adoption for European Union Elderly in the Context of Digital Skills Development

Article

Full-text available

May 2024

In today’s digitalized era, embracing new and emerging technologies is a requirement to remain competitive. The present research investigates the adoption of artificial intelligence (AI) by the elderly in the European landscape, emphasizing the importance of individuals’ digital skills. As has already been globally recognized, the most imminent demographic challenge is no longer represented by the rapid growth of the population but by its aging. Thus, the paper initially analyzed European perspectives on AI adoption, also discussing the importance of focusing on seniors. A bibliometric analysis was required afterward, and the review of the resulting relevant scientific publications uncovered gaps in understanding the relationship between older individuals and AI, particularly in terms of digital competence. Further exploration considered the EU population’s digital literacy and cultural influences using Hofstede’s model, while also identifying potential ways to improve the elderly’s digital skills and promote the adoption of AI. Results indicate a growing interest in AI adoption among the elderly, underscoring the urgent need for digital skills development. The imperative of personalized approach implementations, such as specialized courses, personalized training sessions, or mentoring programs, was underscored. Moreover, the importance of targeted strategies and collaborative efforts to ensure equitable participation in the digital age was identified as a prerequisite for AI adoption by seniors. In terms of potential implications, the research can serve as a starting point for various stakeholders in promoting an effective and sustainable adoption of AI among older citizens in the EU.

Disease Prediction Using a Modified Multi-Layer Perceptron Algorithm in Diabetes

Article

Full-text available

Sep 2023

This paper presents an adaptation of the Multi-Layer Perceptron (MLP) algorithm for use in predicting diabetes risk. The aim is to enhance the accuracy and generalizability of the model by incorporating preprocessing techniques, dimensionality reduction using Principal Component Analysis (PCA), and improvements in optimization and regularization. Several factors, including glucose level, pregnancy, blood pressure, and body mass index, are taken into account when analyzing the PIMA Indian Diabetes dataset. Modern optimization methods, dropout regularization, and an adaptive learning rate are incorporated into the modified MLP model to fine-tune the model's weights and boost its predictive abilities. The effectiveness of the modified MLP algorithm is evaluated by comparing its performance with baseline machine learning methods and the original MLP algorithm in terms of accuracy, sensitivity, and specificity. The results of this study can improve the quality of healthcare provided to people at risk for developing diabetes and thus contribute to the development of better prediction models for the disease.

Effectual Text Classification in Data Mining: A Practical Approach

Research Proposal

May 2023

Text classification is the process of assigning records to predefined classes based on their content. It automatically places natural-language texts into categories that have already been set up. Text classification is the most crucial part of text retrieval systems, which find texts based on what the user requests, and of text understanding systems, which transform the text in some way, for example by producing summaries, answering questions, or extracting data. Existing algorithms that use supervised learning to classify text automatically need enough examples to learn well. This work covers the data mining algorithms used to classify texts and reviews prior work on text classification. Design/Methodology/Approach: Data mining algorithms used to classify texts are discussed, and studies of how these algorithms were applied to text classification are examined, with a focus on comparative studies. Findings: No classifier always performs best, because different datasets and situations lead to different classification accuracy.

A systematic method for diagnosis of hepatitis disease using machine learning

Article

Jan 2023
Innovat Syst Software Eng

Hepatitis is among the deadliest diseases on the planet. Machine learning approaches can contribute toward diagnosing hepatitis based on a few characteristics. Using the UCI dataset, the authors assessed the performance of distinct classifiers in order to develop a systematic strategy for hepatitis diagnosis. The classifiers used are support vector machine, logistic regression (LR), K-nearest neighbor, and random forest. The classifiers were employed both without class balancing and with class balancing using the SMOTE strategy. The two settings were compared in terms of different performance parameters. After adopting class balancing, the efficiency of the classifiers improved significantly. LR with SMOTE provided the highest level of accuracy (93.18%).
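To illustrate the class-balancing idea mentioned in this abstract, the following is a minimal sketch of SMOTE-style oversampling: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote_like_oversample`, the parameter values, and the toy data are assumptions made for illustration; this is not the cited authors' implementation, and a real study would more likely use an established library.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy minority-class samples (made-up values, two features each)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
new_points = smote_like_oversample(minority, n_new=4)
```

Because every synthetic point is an interpolation of two real minority samples, the new points stay inside the region already occupied by the minority class rather than being exact duplicates.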

Artificial Intelligence Applications in English Language Teaching: A Short Survey

Article

Jan 2023

Artificial intelligence is one of the most popular and influential sciences in many fields. It works continuously to modernise computer systems so that they operate with high efficiency and think as a human does. In addition, this science seeks to make the machine simulate the way the human brain thinks and makes decisions according to its environment. Therefore, it has become necessary to have artificial intelligence applications in all areas, including education, and especially in teaching English electronically. In this regard, the most influential applications and programs that contribute to teaching English electronically, and their effectiveness in developing e-learning, are reviewed. The article concludes that there are applications of artificial intelligence in teaching English electronically that are of great importance and have a promising future in the development of language teaching.

Revolutionizing the Future: Artificial Intelligence's Impact to Society

Preprint

Nov 2022

Renato Racelis Maaliw III

In the present period, various words relating to artificial intelligence, machine learning, and deep learning are commonly used in business, healthcare, industry, and the military, among others. In these fields, accurate data modeling and analysis are vital regardless of the size of the data. Given the rapid expansion and vast development of public life, however, the use of big data is complicated and requires a tremendous amount of human work to deal with and extract useful information. Thus, the role of artificial intelligence begins with the analysis of large data sets using scientific approaches, particularly machine learning, in order to uncover decision-making patterns and eliminate human interaction. In this sense, the role of artificial intelligence, machine learning, and deep learning is fast increasing in importance. In this paper, the authors highlight these sciences by addressing how to build and implement them in a variety of decision-making contexts. In addition, the impact of artificial intelligence on healthcare and the benefits this science offers in the future are emphasized. This article indicates that these sciences have a significant impact, particularly in healthcare, as well as the capacity to develop and enhance their decision-making approach. Moreover, artificial intelligence is an essential science, particularly in the face of the unknown future.

Deep Learning Applications and Their Worth: A Short Review

Article

Nov 2022

Deep learning has become a favoured trend in many applications serving humanity in the past few years. Because deep learning can learn from and be trained on huge amounts of unlabelled data, it has been applied in many fields, including the medical field. In this article, the most noteworthy applications of deep learning are presented briefly: image recognition, automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and bioinformatics. The report concludes that these applications have a significant and vital role in all areas of life.

Transformative Insights: Harnessing Artificial Intelligence for Enhanced Ovarian Cancer Prediction and Prognosis

Chapter

Apr 2024

Rita Komalasari

This chapter explores the transformative role of artificial intelligence (AI) in ovarian cancer prediction and prognosis. It aims to analyze complex datasets, predict patient outcomes, and optimize treatment pathways. The study critically examines existing research on ovarian cancer, highlighting gaps and challenges in current prognostic methodologies. The findings demonstrate significant advancements in AI application in ovarian cancer prediction, highlighting its transformative potential in research and clinical practice. AI not only enhances understanding of ovarian cancer complexities, but also offers personalized and optimized treatment strategies, offering hope for improved patient outcomes and overall survival rates.

Revolutionizing Agriculture: A Case Study of IBM's AI Innovations

Article

Nov 2023

Background / Purpose: The development of computer systems that can carry out tasks that traditionally require human intelligence is referred to as artificial intelligence (AI). It entails the development of intelligent machines that can reason, learn, solve problems, and make judgements. A fast-developing field, AI has enormous ramifications for many different businesses and facets of society. By leveraging advanced algorithms and data analysis techniques, AI systems can process and interpret large amounts of information in real time, enabling them to extract valuable insights and patterns that may be difficult for humans to perceive. AI technologies have a wide range of applications across multiple domains, including healthcare, finance, transportation, manufacturing, education, entertainment, and agriculture. When referring to AI in the context of agriculture, we mean the use of advanced analytics and computational algorithms to analyse massive volumes of agricultural data, anticipate the future, and give farmers and stakeholders useful information. The main goal of applying AI to agriculture is to increase efficiency, sustainability, and productivity across a range of farming operations, and to create smart and efficient systems that can monitor, analyze, and control water resources in real time, leading to improved water management and sustainable agricultural practices, thereby addressing the challenges faced by the agricultural sector. AI offers significant potential to optimize water usage, enhance crop productivity, and mitigate environmental impact. In this paper, IBM, a significant provider of services in the agriculture sector in recent years, is examined. Objective: In this case study, artificial intelligence is the main topic, with particular emphasis on IBM's agricultural technology.
Design/Methodology/Approach: Academic works published in a variety of peer-reviewed journals, conferences, and business websites provided the necessary information and specifics for this case study on IBM. Findings/Result: This study is primarily concerned with the usefulness and significance of AI in the modern world. It addresses the demand for and necessity of the numerous resources provided by IBM; discussion topics include the company's business plan, varied results, top clientele, and numerous service types. Originality/Value: The analysis gives a concise description of IBM, the types of data collected and managed, information on artificial intelligence (AI), and the numerous AI services offered by IBM. Paper Type: Case study on the importance of storage and computing requirements for AI services offered by different service providers, with a focus on IBM.

The Significance of Digitalisation and Artificial Intelligence in The Healthcare Sector: A Review

Article

Nov 2022

Nowadays, artificial intelligence, machine learning, and deep learning are among the most popular and applied topics in many scientific and life fields that serve humanity. This science seeks to impose itself strongly on the various activities and academic circles and information science. Artificial intelligence techniques have proven their worth to be important in many fields, especially in the medical fields, business administration, military applications, communications, and many others. In short, artificial intelligence is from another world that will be of great importance in the future. In this article, the importance of digitisation and artificial intelligence in the healthcare sector will be addressed, what services they provide to this sector, and how they contribute to the service of healthcare workers and patient satisfaction. This article concluded that artificial intelligence and digital technologies are of great importance in the healthcare sector and can never be dispensed with.

Differences and Similarities between Coronaviruses: A Comparative Review

Article

Sep 2021

Today, humans fight powerful and active viruses that do not know defeat, named coronaviruses. These viruses started in 2002, have continued to grow, and have changed their chains dramatically until now. They are known for having many features in common, and there are also structural differences between them. The most important reason that coronaviruses have turned into a pandemic is that the disease is easily transmitted by droplets near infected people, which leads to the virus spreading faster worldwide. The more details that are known about the coronaviruses that have profoundly affected humanity in the past and present, and about the diseases they cause, the greater the benefit in helping to design an immune response or preventive vaccine against these viruses in the near future. In this article, coronaviruses, how they started and spread, and the differences and similarities between them are briefly covered. The information in this investigation is taken from articles and from the World Health Organization and is reviewed here. The goal is to document this information for future reference.

Medical Image Classification for Coronavirus Disease (COVID-19) Using Convolutional Neural Networks

Article

Aug 2021

The coronavirus is a family of viruses that cause different dangerous diseases that can lead to death. Two types of this virus were previously identified: SARS-CoV, which causes a severe respiratory syndrome, and MERS-CoV, which causes Middle East respiratory syndrome. The latest coronavirus, which originated in the Chinese city of Wuhan, is associated with the COVID-19 pandemic. It is a new kind of coronavirus that can harm people and was first discovered in Dec. 2019. According to the statistics of the World Health Organization (WHO), the number of people infected with this serious disease has reached more than seven million worldwide. In Iraq, the number of people infected had reached more than twenty-two thousand by April 2020. In this article, we have applied convolutional neural networks (ConvNets) to computed tomography (CT) coronavirus images to assist medical staff in hospitals in categorizing chest CT-coronavirus images at an early stage. The ConvNets are able to automatically learn and extract features from the medical image dataset. The objective of this study is to train the GoogleNet ConvNet architecture, using the COVID-CT dataset, to classify 425 CT-coronavirus images. The experimental results show that the validation accuracy of GoogleNet in training the dataset is 82.14% with an elapsed time of 74 minutes and 37 seconds.

Implementation of Machine Learning Techniques for the Classification of Lung X-Ray Images Used to Detect COVID-19 in Humans

Article

Jul 2021

Maad M. Mijwil

COVID-19 (Coronavirus disease-2019), commonly called Coronavirus or CoV, is a dangerous disease caused by the SARS-CoV-2 virus. It is one of the most widespread zoonotic diseases around the world and started from one of the wet markets in Wuhan city. Its symptoms are similar to those of the common flu, including cough, fever, muscle pain, shortness of breath, and fatigue. This article suggests implementing machine learning techniques (Random Forest, Logistic Regression, Naïve Bayes, Support Vector Machine) in Python to classify a series of chest X-ray images that include viral pneumonia, COVID-19, and healthy (not infected) cases in humans. The study includes more than 1400 images collected from the Kaggle platform. The experimental outcomes of this study confirmed that the support vector machine technique has high accuracy and excellent performance in the classification of the disease, as reflected by values of 91.8% accuracy, 91.7% sensitivity, 95.9% specificity, 91.8% F1-score, and 97.6% AUC.

Classification of COVID-19 chest X-rays with deep learning: new models or fine tuning?

Article

Nov 2020

Tuan D. Pham

Background and objectives: Chest X-ray data have been found to be very promising for assessing COVID-19 patients, especially for resolving emergency-department and urgent-care-center overcapacity. Deep-learning (DL) methods in artificial intelligence (AI) play a dominant role as high-performance classifiers in the detection of the disease using chest X-rays. Given that many new DL models are being developed for this purpose, the objective of this study is to investigate the fine tuning of pretrained convolutional neural networks (CNNs) for the classification of COVID-19 using chest X-rays. If fine-tuned pretrained CNNs can provide equivalent or better classification results than other more sophisticated CNNs, then the deployment of AI-based tools for detecting COVID-19 using chest X-ray data can be more rapid and cost-effective. Methods: Three pretrained CNNs, which are AlexNet, GoogleNet, and SqueezeNet, were selected and fine-tuned without data augmentation to carry out 2-class and 3-class classification tasks using 3 public chest X-ray databases. Results: In comparison with other recently developed DL models, the 3 pretrained CNNs achieved very high classification results in terms of accuracy, sensitivity, specificity, precision, \(F_1\) score, and area under the receiver-operating-characteristic curve. Conclusion: AlexNet, GoogleNet, and SqueezeNet require the least training time among pretrained DL models, but with suitable selection of training parameters, excellent classification results can be achieved without data augmentation by these networks. The findings contribute to the urgent need for harnessing the pandemic by facilitating the deployment of AI tools that are fully automated and readily available in the public domain for rapid implementation.

Big Data Analytics in the Fight against Major Public Health Incidents (Including COVID-19): A Conceptual Framework

Article

Aug 2020
Int J Environ Res Publ Health

Major public health incidents such as COVID-19 typically have characteristics of being sudden, uncertain, and hazardous. If a government can effectively accumulate big data from various sources and use appropriate analytical methods, it may quickly respond to achieve optimal public health decisions, thereby ameliorating negative impacts from a public health incident and more quickly restoring normality. Although there are many reports and studies examining how to use big data for epidemic prevention, there is still a lack of an effective review and framework of the application of big data in the fight against major public health incidents such as COVID-19, which would be a helpful reference for governments. This paper provides clear information on the characteristics of COVID-19, as well as key big data resources, big data for the visualization of pandemic prevention and control, close contact screening, online public opinion monitoring, virus host analysis, and pandemic forecast evaluation. A framework is provided as a multidimensional reference for the effective use of big data analytics technology to prevent and control epidemics (or pandemics). The challenges and suggestions with respect to applying big data for fighting COVID-19 are also discussed.

Identifying COVID19 from Chest CT Images: A Deep Convolutional Neural Networks Based Approach

Article

Aug 2020

Coronavirus Disease (COVID19) is a fast-spreading infectious disease that is currently causing a healthcare crisis around the world. Due to the current limitations of the reverse transcription-polymerase chain reaction (RT-PCR) based tests for detecting COVID19, recently radiology imaging based ideas have been proposed by various works. In this work, various Deep CNN based approaches are explored for detecting the presence of COVID19 from chest CT images. A decision fusion based approach is also proposed, which combines predictions from multiple individual models, to produce a final prediction. Experimental results show that the proposed decision fusion based approach is able to achieve above 86% results across all the performance metrics under consideration, with average AUROC and F1-Score being 0.883 and 0.867, respectively. The experimental observations suggest the potential applicability of such Deep CNN based approach in real diagnostic scenarios, which could be of very high utility in terms of achieving fast testing for COVID19.
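The decision-fusion idea described in this abstract, combining the predictions of several individual models into one final prediction, can be sketched with simple majority voting. The abstract does not state which fusion rule the cited work uses, so majority voting is an assumption here, and the function name `majority_vote` and the toy predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Fuse per-model label lists into one list by taking, for each sample,
    the label predicted by the most models (hypothetical helper)."""
    fused = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns [(label, count)] for the top label
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Toy binary predictions from three models for four images (made-up values);
# 1 = COVID19 present, 0 = absent
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
fused = majority_vote([model_a, model_b, model_c])
```

With an odd number of models and two classes, every sample has a strict majority, so no tie-breaking rule is needed in this sketch.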

Applications of Machine Learning Predictive Models in the Chronic Disease Diagnosis

Article

Mar 2020

This paper reviews applications of machine learning (ML) predictive models in the diagnosis of chronic diseases. Chronic diseases (CDs) are responsible for a major portion of global health costs. Patients who suffer from these diseases need lifelong treatment. Nowadays, predictive models are frequently applied in the diagnosis and forecasting of these diseases. In this study, we reviewed the state-of-the-art approaches that encompass ML models in the primary diagnosis of CD. This analysis covers 453 papers published between 2015 and 2019, and our document search was conducted in the PubMed (Medline) and Cumulative Index to Nursing and Allied Health Literature (CINAHL) libraries. Ultimately, 22 studies were selected to present all modeling methods in a precise way that explains CD diagnosis and usage models of individual pathologies with associated strengths and limitations. Our outcomes suggest that there are no standard methods to determine the best approach in real-time clinical practice, since each method has its advantages and disadvantages. Among the methods considered, support vector machines (SVM), logistic regression (LR), and clustering were the most commonly used. These models are highly applicable in the classification and diagnosis of CD and are expected to become more important in medical practice in the near future.

Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis

Article

Mar 2020

Data mining is a research technique for finding interesting patterns in hidden information in a database. In the health sector, data mining can be used to diagnose a disease from the patient's medical data record. This research used a Chronic Kidney Disease (CKD) dataset obtained from the UCI machine learning repository. In this dataset, almost half of the attributes are continuous numeric types. Continuous attributes can lead to low accuracy because of their unlimited range of values; hence, they need to be transformed into discrete data. In certain cases, using all attributes can produce a low level of accuracy because some are irrelevant and have no correlation with the target class. Therefore, attributes need to be selected in advance to get more accurate results. One of the techniques of data mining is classification, and one of the classification algorithms is C4.5. The purpose of this study is to increase the accuracy of the C4.5 algorithm by applying discretization and Correlation-Based Feature Selection (CFS) for chronic kidney disease diagnosis. Discretization was used to handle the continuous values, while CFS was used for attribute selection. An experiment was conducted using WEKA (Waikato Environment for Knowledge Analysis). The application of discretization and CFS in C4.5 resulted in an increase in accuracy of 0.5%. C4.5 alone has an accuracy of 97%. The accuracy of C4.5 with discretization was 97.25%, and the accuracy of C4.5 with discretization and CFS was 97.5%.

Recent advances and applications of machine learning in solid-state materials science

Article

Aug 2019

One of the most exciting tools that have entered the material science toolbox in recent years is machine learning. This collection of statistical methods has already proved to be capable of considerably speeding up both fundamental and applied research. At present, we are witnessing an explosion of works that develop and apply machine learning to solid-state systems. We provide a comprehensive overview and analysis of the most recent research in this topic. As a starting point, we introduce machine learning principles, algorithms, descriptors, and databases in materials science. We continue with the description of different machine learning approaches for the discovery of stable materials and the prediction of their crystal structure. Then we discuss research in numerous quantitative structure–property relationships and various approaches for the replacement of first-principle methods by machine learning. We review how active learning and surrogate-based optimization can be applied to improve the rational design process and related examples of applications. Two major questions are always the interpretability of and the physical understanding gained from machine learning models. We consider therefore the different facets of interpretability and their importance in materials science. Finally, we propose solutions and future research paths for various challenges in computational materials science.

Using Artificial Intelligence for COVID-19 Chest X-ray Diagnosis

Article

Sep 2020
Fed Pract

Background: Coronavirus disease-19 (COVID-19), caused by a novel member of the coronavirus family, is a respiratory disease that rapidly reached pandemic proportions with high morbidity and mortality. In only a few months, it has had a dramatic impact on society and world economies. COVID-19 has presented numerous challenges to all aspects of health care, including reliable methods for diagnosis, treatment, and prevention. Initial efforts to contain the spread of the virus were hampered by the time required to develop reliable diagnostic methods. Artificial intelligence (AI) is a rapidly growing field of computer science with many applications for health care. Machine learning is a subset of AI that uses deep learning with neural network algorithms. It can recognize patterns and achieve complex computational tasks often far quicker and with increased precision than can humans. Methods: In this article, we explore the potential for the simple and widely available chest X-ray (CXR) to be used with AI to diagnose COVID-19 reliably. Microsoft CustomVision is an automated image classification and object detection system that is a part of Microsoft Azure Cognitive Services. We utilized publicly available CXR images for patients with COVID-19 pneumonia, pneumonia from other etiologies, and normal CXRs as a dataset to train Microsoft CustomVision. Results: Our trained model overall demonstrated 92.9% sensitivity (recall) and positive predictive value (precision), with results for each label showing sensitivity and positive predictive value at 94.8% and 98.9% for COVID-19 pneumonia, 89% and 91.8% for non-COVID-19 pneumonia, 95% and 88.8% for normal lung. We then validated the program using CXRs of patients from our institution with confirmed COVID-19 diagnoses along with non-COVID-19 pneumonia and normal CXRs. Our model performed with 100% sensitivity, 95% specificity, 97% accuracy, 91% positive predictive value, and 100% negative predictive value. 
Conclusions: We have used a readily available, commercial platform to demonstrate the potential of AI to assist in the successful diagnosis of COVID-19 pneumonia on CXR images. The findings have implications for screening and triage, initial diagnosis, monitoring disease progression, and identifying patients at increased risk of morbidity and mortality. Based on the data, a website was created to demonstrate how such technologies could be shared and distributed to others to combat entities such as COVID-19 moving forward.
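As a worked illustration of how the diagnostic metrics named in this abstract relate to a confusion matrix, the sketch below computes sensitivity, specificity, accuracy, positive predictive value, and negative predictive value from four counts. The counts used (10 true positives, 1 false positive, 19 true negatives, 0 false negatives) are hypothetical; they were chosen only because they round to the validation percentages reported above, and the abstract does not state the study's actual counts. The function name `diagnostic_metrics` is likewise an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, returned as fractions in [0, 1]."""
    return {
        "sensitivity": tp / (tp + fn),                # recall: true positives found
        "specificity": tn / (tn + fp),                # true negatives found
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # all correct calls
        "ppv": tp / (tp + fp),                        # precision
        "npv": tn / (tn + fn),                        # negative predictive value
    }

# Hypothetical counts, NOT taken from the cited study
m = diagnostic_metrics(tp=10, fp=1, tn=19, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

The example also shows why a test can reach 100% sensitivity and 100% negative predictive value while its positive predictive value stays lower: a single false positive affects specificity and precision but not the two metrics that depend on false negatives.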

Recommended publications

A diagnostic testing for people with appendicitis using machine learning techniques

Evaluation Machine-Learning Approaches for Classification of Cryotherapy and Immunotherapy Datasets

Comparative Analysis of Classification Method for Wart Treatment Method

Human Papillomavirus Targeted Immunotherapy Outcome Prediction Using Machine Learning