
CLASSIFYING THE SUPERVISED MACHINE LEARNING AND COMPARING THE PERFORMANCES OF THE ALGORITHMS

January 2024
International Journal of Advanced Research 12(1):422-438


DOI:10.21474/IJAR01/18138

License
CC BY 4.0

Authors:

Rathindra Nath Mohalder

Jessore University of Science and Technology

Md. Alam Hossain

Jessore University of Science and Technology

Nazmul Hossain

Jessore University of Science and Technology

Supervised Learning (SL), also known as Supervised Machine Learning (SML), is a subclass of Artificial Intelligence (AI) and Machine Learning (ML). It is defined by its use of labelled datasets to train algorithms that classify data or predict outcomes accurately. As input data are fed into a supervised machine learning model, the model adjusts its weights until it has been fitted appropriately, which happens as part of the cross-validation process. Supervised machine learning helps organizations solve many kinds of real-world problems. SML searches for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future cases. Supervised classification is one of the tasks most frequently carried out by intelligent systems. This paper discusses different categories of supervised machine learning classification techniques, compares different supervised learning algorithms, and identifies the most effective classification algorithm for a given set of instances, dataset and variables or features. Eight SML algorithms are considered: Artificial Neural Network (ANN), Bayesian Networks, K-nearest Neighbor (KNN), Random Forest, Decision Tree (DT), Linear Regression, Support Vector Machine (SVM) and Logistic Regression. The eight algorithms are implemented in Python, and a sample dataset is used with every algorithm to assess its performance. The algorithms are assessed on three outcomes: throughput, response time and accuracy. Supervised learning methods depend on pre-defined parameters, and performance metrics play an important role in identifying the ability and capacity of any machine learning algorithm. The results show that Decision Tree gives the best prediction performance in this paper, with the best accuracy, response time and throughput. After DT, the next most accurate SML algorithms are Logistic Regression and SVM.


ISSN: 2320-5407 Int. J. Adv. Res. 12(01), 422-438

Journal Homepage: www.journalijar.com

Article DOI:10.21474/IJAR01/18138

DOI URL: http://dx.doi.org/10.21474/IJAR01/18138

RESEARCH ARTICLE

CLASSIFYING THE SUPERVISED MACHINE LEARNING AND COMPARING THE PERFORMANCES OF THE ALGORITHMS

Rathindra Nath Mohalder1, Dr. Md. Alam Hossain2 and Nazmul Hossain3

1. Research Scholar, Department of Computer Science and Engineering, Jashore University of Science and Technology.

2. Professor, Department of Computer Science and Engineering, Jashore University of Science and Technology.

3. Assistant Professor, Department of Computer Science and Engineering, Jashore University of Science and Technology.

Manuscript Info

Manuscript History

Received: 11 November 2023

Final Accepted: 14 December 2023

Published: January 2024

Key words:- SL, SML, AL, ML, ANN, KNN, DT, SVM, KD, KDD, IDA, SLM


Corresponding Author:- Nazmul Hossain

Address:- Assistant Professor, Department of Computer Science and Engineering,

Jashore University of Science and Technology.


Introduction:-

Machine learning (ML) algorithms have long been used effectively to build predictive models from data. The expert-driven disciplines that apply ML algorithms and data mining tools are both knowledge-intensive and data-rich. The information society is now growing rapidly. Computers affect every sector of society, including medical research and the economy, and they routinely change the way people work and open up new areas of practice. The study of modern information technology and its use in specific domains has driven the creation and improvement of contemporary programming languages and systems [1].

Figure 01:- The Method of SML [6].

Learning from data has been studied in several disciplines: data mining (DM), intelligent data analysis (IDA), knowledge discovery in databases (KDD) and machine learning (ML). KDD research is generally concerned with the whole knowledge discovery process, including pre-processing, data mining and post-processing. ML research, by contrast, concentrates on the learning process itself and, in particular, on automated learning.

Although all four fields have made great progress in extracting knowledge automatically from data, none of them has directly addressed the difficult problem of integrating domain knowledge into every stage of the knowledge discovery (KD) process. ML is among the fastest-growing areas of computer science and engineering, with wide-ranging applications. It refers to the automatic detection of meaningful patterns in a dataset. ML tools are concerned with giving programs the ability to learn and adapt [2]. ML has become a mainstay of information technology and, through it, a central although usually hidden part of our lives. As ever larger amounts of data become available, there is good reason to believe that data analysis will become an even more important component of technical systems.

People are often prone to making errors during analysis when trying to establish relationships among multiple features [3]. Machine learning and data mining are closely related, and appropriate learning methods can draw different kinds of insight from both. The steady creation of new data makes ML methods more practical by the day. ML comprises separate methods for unsupervised and supervised learning; SML is especially common in classification problems, where the aim is to get the computational system to learn a classification scheme that we have defined [4]. Machine learning is well suited to uncovering structure hidden in large volumes of data. It promises to extract value from large and disparate data sources with far less reliance on human direction, because it is data-driven and runs at machine scale.

ML is well suited to the complexity of handling disparate data sources and the wide variety of variables and large amounts of data involved, and it improves as datasets grow. Free from the limits of human-scale thinking and analysis, machine learning can discover and display the patterns hidden in a dataset [5].

A standard formulation of the supervised learning task is the classification problem: the learner is required to learn (that is, to approximate the behavior of) a function that maps a vector into one of several classes by looking at a number of input-output examples of that function. Inductive machine learning is the process of learning a set of rules from instances (a training dataset) or, more generally, of creating a classifier that can be used to generalize to new instances. The process of applying SML to a real-world problem is shown in Figure 1. This work examines different types of machine learning methods and identifies the most efficient algorithm in terms of accuracy, throughput and response time.

Classification of Supervised Machine Learning Algorithms:-

According to [7], the SML algorithms most often used for classification include Artificial Neural Networks, Bayesian Networks, Decision Tree, K-nearest Neighbor, Support Vector Machine, Random Forest, Linear Regression and Logistic Regression, among others.

Figure 02:- Artificial neural network (ANN) architecture.

A. Artificial Neural Network (ANN):

In recent years, methods and ideas from many separate disciplines have converged on ANNs, making them a particularly attractive research field [8]. The neuron is the basic unit of a nervous system and performs both communication and computation. An ANN is a functional imitation of a simplified model of the biological neuron. Its aim is to reproduce intelligent data evaluation techniques such as classification, generalization and pattern recognition using simple, distributed and robust processing units called Processing Elements (PE) or artificial neurons [9]. A chief benefit of the ANN approach is that the domain knowledge is distributed across the neurons and data processing is carried out in a parallel, distributed manner [10]. ANNs are highly parallel data processing tools that can learn the functional dependencies in a dataset [8]. They can represent highly non-linear behavior and, once trained, can classify new data much faster than would be possible by solving the model analytically. An ANN is built from artificial neurons.

Every artificial neuron has a processing node ('body'), drawn as a circle in Figure 02, together with connections from other neurons ('dendrites') and connections to other neurons ('axons'), drawn as arrows. In the last step, the output neuron takes the weighted sum of its inputs and applies a non-linear function to that sum. The result of this function is the output of the complete ANN [11].
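The weighted-sum-then-activation step described above can be sketched in a few lines of Python. This is only an illustration: the network shape, the `tanh` activation and all weight values are assumptions chosen for the example, not the configuration used in the paper.

```python
import math

def neuron(inputs, weights, bias, activation=math.tanh):
    """Apply a non-linear activation to the weighted sum of the inputs plus a bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

def forward(inputs, hidden_layer, output_neuron):
    """Forward pass through one hidden layer and a single output neuron.

    hidden_layer is a list of (weights, bias) pairs; output_neuron is one such pair.
    """
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron(hidden, out_weights, out_bias)

# Made-up weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_layer = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output_neuron = ([1.0, -1.0], 0.0)
y = forward([1.0, 2.0], hidden_layer, output_neuron)
```

Training would adjust the weights and biases; only the forward computation is shown here.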


Figure 03:- Possible hierarchical Bayesian network structures. The gray-colored nodes represent the latent variables and the white-colored nodes represent the observed variables.

B. Bayesian Networks (BNs):

An essential step in developing artificial intelligence is to enable a machine to describe how the world behaves by building an internal model from a dataset. An important aspect of this is learning the dependency graph of a BN from a dataset. This task, known as structure learning, is NP-hard [12] and remains an active research topic. In short, it can be understood as selecting one graph from a set of candidate structures that could have generated the dataset. A broad review of the available software tools is also presented in [13]. A central challenge in artificial intelligence is to develop systems that can generate a model describing the domain knowledge while taking into account the most probable structures learned from the dataset.

A BN is a directed acyclic graph (DAG) in which the nodes represent random variables and the edges represent conditional dependencies. Bayes' theorem is:

$$P(H \mid E) = \frac{P(H)\,P(E \mid H)}{P(E)}$$

Here, $P(H \mid E)$ is the posterior probability of the hypothesis $H$ given the evidence $E$; $P(H)$ is the prior probability of $H$; $P(E \mid H)$ is the likelihood of the evidence $E$ if the hypothesis $H$ is true; and $P(E)$ is the prior probability that the evidence itself is true.
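As a small worked example of Bayes' theorem, the sketch below computes a posterior for a binary hypothesis. It assumes that P(E) is obtained from the law of total probability over H and not-H; the function name and the numbers are illustrative only.

```python
def posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Return P(H|E) = P(H) * P(E|H) / P(E) for a binary hypothesis H.

    P(E) is expanded as P(E|H) * P(H) + P(E|not H) * (1 - P(H)).
    """
    evidence = (likelihood_e_given_h * prior_h
                + likelihood_e_given_not_h * (1.0 - prior_h))
    return prior_h * likelihood_e_given_h / evidence

# Illustrative numbers: a rare hypothesis (1%) with fairly reliable evidence.
p = posterior(prior_h=0.01, likelihood_e_given_h=0.9, likelihood_e_given_not_h=0.05)
```

With these numbers the posterior stays well below one half, because the low prior outweighs the strong likelihood.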

Strict and loose hierarchical Bayesian networks (HBNs) have been adopted for the supervised classification of instances and for the modelling of variables [14].

C. Decision Tree (DT):

DT is one of the important techniques in machine learning and has been applied in many sectors and applications. Three well-known decision tree algorithms are C4.5, CART and ID3 [15]. Classification is the task of assigning objects to classes and has many applications. A tree consists of a root, branches and leaves. It is a predictive model used in machine learning, data mining and statistics. Tree models in which the target variable takes a finite set of values are called classification trees; in these trees, leaves represent class labels and branches represent the conjunctions of features that lead to those class labels. Decision trees can be constructed relatively quickly compared with other classification methods [16].
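To make tree construction concrete, the following sketch shows how a CART-style learner can choose a single split on one numeric feature by minimizing weighted Gini impurity. The Gini criterion is the usual CART choice for classification, but the paper does not say which criterion it used, and the feature values and labels below are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    n = len(values)
    # Splitting at the maximum value would leave the right side empty, so skip it.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Invented example: one numeric feature and three hypothetical classes.
feature = [17.0, 18.0, 22.0, 24.0, 27.0, 31.0]
label = ["under", "under", "normal", "normal", "over", "over"]
threshold, impurity = best_threshold(feature, label)
```

A full tree would apply the same search recursively to each side of the chosen split until a stopping rule is met.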

A decision tree has a tree-like structure. The CART (Classification and Regression Tree) algorithm is used to construct the tree.

A set of analyses was used to compare the performance of each algorithm, including boosted decision tree regression. Boosted decision tree regression trains a model using the Multiple Additive Regression Trees (MART) algorithm. Each tree depends on the trees built before it, which is how boosting works, as shown in Figure 04 [17].
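The idea that each tree depends on the earlier ones can be illustrated with a minimal boosting loop. This is a simplified sketch under stated assumptions (squared-error loss, one-split regression stumps, a single numeric feature, a hypothetical learning rate), not the MART implementation referred to in [17].

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the errors left by the previous ones."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

model = boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
```

Each new stump is trained on what the current ensemble still gets wrong, so the training error shrinks as trees are added.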


Figure 06:- Architecture of support vector machine.

D. Support Vector Machine (SVM):

SVM is an important ML algorithm that is commonly applied to pattern recognition problems such as image classification and network traffic recognition.

Much research is under way to improve Quality of Service (QoS) and security. Recent work in this area has used SVM, which is reported to classify network traffic better than other classifiers because it generalizes well; [18] gives an overview of SVM, its concepts and its applications. SVM is a powerful learning method for binary classification. Its main task is to find the optimal hyperplane, the one that best separates the data into two classes [19]. Multi-class classification is now achieved by combining multiple binary support vector machines. The architecture of SVM is shown in Figure 06.
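For a linear SVM, the separating hyperplane is described by a weight vector and an offset, and classification reduces to checking which side of the hyperplane a point falls on. The sketch below shows that decision rule and the margin width 2/||w||; the hyperplane is hand-picked for illustration and is not the result of training.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign determines the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two supporting hyperplanes w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 - 3 = 0 (chosen by hand, not learned).
w, b = [1.0, 1.0], -3.0
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while the training points stay on the correct side.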

E. K-nearest Neighbor (KNN):

Distance-based algorithms are widely used for dataset classification problems, and KNN classification is the most popular of them. It commonly uses Euclidean distance, although different datasets require a similarity measure adapted to their own features [20]. Classification is the SML task that maps an input dataset onto defined classes or groups. The main principle of a classification rule is that every object in the dataset is assigned to a group and that each object is assigned to exactly one group. Here, K is the number of nearest neighbors considered in KNN.
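A minimal KNN classifier with Euclidean distance and majority voting might look as follows. The data points and labels are invented for the example and do not come from the paper's dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(zip(train_points, train_labels),
                         key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Invented (height, weight) pairs with hypothetical class labels.
points = [(150, 45), (155, 50), (160, 52), (175, 90), (180, 95), (185, 100)]
labels = ["normal", "normal", "normal", "over", "over", "over"]
```

Because the vote depends on raw distances, features on very different scales are usually standardized first.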

F. Linear Regression:

Linear regression is one of the most common and widely used statistical and ML algorithms. It is used to identify linear relationships involving one or more predictors. There are two types of linear regression: simple regression and multiple linear regression (MLR). Researchers have studied both polynomial and linear regression and compared their effectiveness with the aim of improving precision and prediction [21].

Machine learning [22-23] is widely used in diverse sectors to solve problems that cannot easily be handled by explicit programming. Linear regression [24] is a mathematical technique for quantifying and evaluating the relationship between measured attributes. Regression and correlation are therefore techniques that help a researcher understand the association between two attributes and quantify the influence of disorders [22, 25, 26]. Linear regression [27] is widely used among mathematical techniques. It can be used to build a prediction model and to assess the effect of multiple input attributes.

Figure 07:-K-nearest Neighbour (KNN) algorithm workflow.

Figure 8.1 shows an architectural model of linear regression as a forerunner of the neural network. The computational unit computes a weighted sum of its inputs. By a common convention, the bias is treated as just another weight whose corresponding input node has the constant value +1 [28].

Linear regression has also been generalized to multivariate linear regression, in which multiple variables y1, ..., yp are predicted, as shown in Figure 8.2 (for example, three predicted variables y1, y2, y3) [28].
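The weighted-sum view of linear regression, with the bias treated as a weight on a constant +1 input, can be sketched for the single-predictor case. The closed-form least-squares fit below is a standard technique; the function names and data are illustrative.

```python
def fit_simple(xs, ys):
    """Ordinary least squares for y = w0 * 1 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1

def predict(weights, x):
    """Weighted sum over the inputs (1, x): the bias w0 multiplies the constant +1."""
    inputs = (1.0, x)
    return sum(w * v for w, v in zip(weights, inputs))

# Points lying exactly on y = 1 + 2x.
weights = fit_simple([1, 2, 3, 4], [3, 5, 7, 9])
```

In the multivariate case each predicted variable has its own weight vector, and the same weighted sum is computed once per output.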

Figure 8.1:- Architectural model of linear regression.


Figure 8.2:- Architectural model of multi-variation linear regression.

Figure 09:-Architecture of a Logistic Regression Model.

G. Logistic Regression:

Logistic regression has been widely applied in many fields of research, such as health science, where it is used to study the risk factors associated with disease. Some surveys, such as Demographic and Health Surveys, use complex sampling designs, i.e., multistage, stratified and probability sampling with unequal weights. These complex designs must be taken into account to obtain reliable results, yet this is a common issue that is not well analyzed in the literature [29].

Logistic regression is a discriminative probabilistic model. It produces the posterior probability distribution $P(Y \mid X)$, where $Y$ is the target variable and $X$ is the features: given $X$, it returns a probability distribution over $Y$. Figure 09 shows an architectural model of logistic regression. In binary classification, the output of the sigmoid function $\sigma$ is interpreted as the probability that a sample belongs to the positive class: $\sigma(z) = P(y = 1 \mid x; w)$, where $z = w^{T}x$ is the linear combination of the weights and the sample's features. This algorithm is widely used for classification [30].
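The probability interpretation of the sigmoid output can be shown directly. In this sketch the weight vector is hypothetical and the feature vector is assumed to carry a leading 1 so that the first weight acts as the bias.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Estimate P(y = 1 | x; w) as sigmoid(w^T x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

def predict(w, x, threshold=0.5):
    """Assign the positive class when the estimated probability reaches the threshold."""
    return 1 if predict_proba(w, x) >= threshold else 0

# Hypothetical weights: bias first, then two feature weights.
w = [-4.0, 1.0, 2.0]
```

For example, `predict_proba(w, [1.0, 2.0, 1.0])` evaluates the sigmoid at z = 0, which lies exactly on the decision boundary.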


Figure 10:- Random forest Architecture.

Figure 11:- Features of the SML Model [34].

H. Random Forest (RF):

The Random Forest algorithm is an ensemble method that combines the outputs of many randomly built classification trees. Two elements of randomness are introduced when the trees are built. First, every tree is built using a random bootstrap sample of the training dataset. Second, in the standard Random Forest formulation, only a random subset of the features is considered at each split. Predictions for unseen data are made by taking a majority vote of the individual trees. Random Forest packages in Python were used for the implementation [31]. Each tree in RF is therefore constructed from a random sample of the dataset. Figure 10 shows the Random Forest model architecture. In other words, a random forest aggregates the predictions of all its sub-trees: by majority vote for classification, or by averaging for regression.
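Bootstrap sampling and majority voting can be sketched as follows. To stay short, each "tree" here is a one-split stump on a single numeric feature, so the random feature selection of a full Random Forest is not shown; the data, the seed and the helper names are assumptions made for the example.

```python
import random
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label."""
    return Counter(predictions).most_common(1)[0][0]

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(sample):
    """One-split tree on a single numeric feature, chosen to minimize training errors."""
    best = None
    for t in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t] or left
        left_label, right_label = majority_vote(left), majority_vote(right)
        errors = sum(y != (left_label if x <= t else right_label) for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]

def forest_predict(forest, x):
    """Classify x by majority vote over all trees."""
    return majority_vote([tree(x) for tree in forest])

# Invented (feature, label) rows.
rows = [(17, "under"), (18, "under"), (19, "under"),
        (26, "over"), (28, "over"), (30, "over")]
forest = fit_forest(rows)
```

Individual stumps trained on unlucky samples may be poor, but the vote across many of them is much more stable.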


Features of Supervised Machine Learning Algorithms:

SML methods are applicable to many domains. Some application-oriented ML studies can be found in [32], [33]. In general, SVMs and neural networks tend to perform better when dealing with continuous features and multi-dimensional data. Logic-based methods, on the other hand, tend to perform better when dealing with discrete or categorical features. SVMs and ANN models need a large sample to reach their maximum accuracy, whereas Naive Bayes may need only a relatively small dataset.

Figure 12:- SML working process [36].

Related Works:-

An SML model has two parts: a training part and a testing part. In the training part, sample datasets are taken as input, their features are learned by the learner or learning algorithm, and the learning model is built [35]. In the testing part, the learned model drives the execution engine, which makes predictions for the test or production data. The labelled data are the model's main output and give the final classified or predicted data.

The SML process (Figure 12) is the usual approach in classification problems, where the aim is for the machine to learn a classification scheme that we have defined.

In classification the labels are discrete; in regression the label is continuous. The data for this research were taken from a BMI calculation chart. This dataset was selected because of its accuracy and because it has been anonymized (de-identified), so confidentiality is ensured.

The dataset has 4 columns and 1128 rows, giving 4512 values in total. The four columns are height, weight, BMI and outcome; the outcome depends on the values in the other three columns. Height is recorded in centimeters and weight in pounds. Height and weight are integers and BMI is a floating-point number, whereas the outcome (target) column contains strings. To obtain reliable results, every model is run 100 times and the average is taken as the final result.
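Because height is stored in centimeters and weight in pounds, computing BMI requires a unit conversion. The sketch below uses the standard BMI definition (kilograms divided by meters squared); whether the dataset's BMI column was produced in exactly this way is an assumption, and the function name is hypothetical.

```python
LB_TO_KG = 0.45359237  # kilograms per international avoirdupois pound

def bmi_from_cm_lb(height_cm, weight_lb):
    """Body mass index in kg/m^2 from height in centimeters and weight in pounds."""
    height_m = height_cm / 100.0
    weight_kg = weight_lb * LB_TO_KG
    return weight_kg / (height_m ** 2)

example_bmi = bmi_from_cm_lb(170, 154)
```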

Proposed Efficient Model:-

This research has two main objectives. The first is to compare several supervised machine learning algorithms: Random Forest, Support Vector Machines, Naive Bayes, K-nearest Neighbor (KNN), Logistic Regression, Linear Regression, Artificial Neural Networks (ANN) and Decision Trees. The second is to improve the accuracy of the final results using hyperparameter tuning and cross-validation. The architecture of the proposed approach is shown in Figure 13.

(i) Comparison of Methods

This paper discusses the performance of eight supervised machine learning algorithms in terms of accuracy, response time and throughput. We consider an algorithm best when its accuracy and throughput are high and its response time is low. The algorithms studied give notably different results for throughput, response time and accuracy.

(ii) Proposed Model

The analysis yields clear results for accuracy, response time and throughput. Different supervised machine learning algorithms perform differently on these three measures. Two algorithms, Random Forest and Decision Tree, give the best results. Random Forest has the highest accuracy, about 93.36%, but performs poorly on response time and throughput. Decision Tree has the second-highest accuracy, about 90.71%, and performs well on response time and throughput.

Figure 13:- Proposed Model using hyperparameter tuning.

We therefore apply the proposed method to obtain the best algorithm from among these supervised machine learning algorithms, focusing on Decision Tree and Random Forest. Because Random Forest is accurate but slow, while Decision Tree is nearly as accurate and much faster, we aim to improve the accuracy of Decision Tree using hyperparameter tuning and cross-validation.

Hyperparameter tuning with cross-validation is a standard way of improving an algorithm's performance. Hyperparameters are set before the actual training process begins.

Tuning is done by trying various values for the hyperparameters, training the model with each, and keeping the values that improve the accuracy of the target algorithm. In other words, hyperparameter tuning selects a particular combination of hyperparameters according to performance on the available data. It is a fundamental requirement for getting accurate and meaningful results from machine learning algorithms. Figure 13 shows the things to consider, the model tuning method and the workflow.

Cross-validation (CV), also known as rotation estimation, is a model validation method for assessing how well the results of a statistical analysis generalize. The aim is to produce a model that generalizes to an independent test set. It helps to estimate how well a model will predict and how accurately it will perform in a machine learning application. In cross-validation, a model is trained on a dataset of known data and then tested on data it has not seen. To this end, CV sets aside part of the dataset as a validation set for testing the model during the training stage.
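The rotation of training and validation sets can be sketched without any library. The version below uses k contiguous folds and leaves the model-fitting and scoring functions to the caller; the number of folds and the helper names are assumptions, since the paper does not state its cross-validation settings.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size

def cross_val_score(fit, score, xs, ys, k=5):
    """Average validation score over k folds; `fit` and `score` are caller-supplied."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, 3))
```

Every sample is used for validation exactly once. In practice the rows are usually shuffled (or stratified by class) before the folds are formed.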


Hyper-parameters are an important factor in model performance: setting appropriate values improves model efficiency. This research uses GridSearchCV to find the optimal hyper-parameter values that give the model's best performance.

GridSearchCV is a method of hyper-parameter tuning that identifies the optimal values for a model. It tries every combination of the values passed in a dictionary and evaluates the algorithm for each combination using cross-validation. This yields the accuracy or loss for each combination of hyper-parameters, and the combination that performs best is chosen.

A model's parameters are internal to the model. They are needed to make predictions, and they are estimated or learned by the model during training. Hyper-parameters, on the other hand, are parameters that are specified explicitly to control the training process.
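A grid search over decision tree hyper-parameters with scikit-learn's `GridSearchCV` might look like the sketch below. The data are synthetic stand-ins generated with `make_classification`, and the grid values and the 5-fold setting are hypothetical, because the paper does not list the dataset file or the values it searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the paper's BMI dataset is not reproduced here.
X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Hypothetical grid: every combination of these values is evaluated.
param_grid = {
    "max_depth": [2, 4, 6, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_              # best combination found
cv_accuracy = search.best_score_               # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)   # accuracy on held-out data
```

The held-out test split is kept out of the search so that `test_accuracy` is not biased by the tuning itself.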

Evaluation of methods results and Efficiency Analysis:-

Results:-

Table 1:- Eight Supervised Learning Method and Proposed Model performances table on a sample dataset.

| Algorithm's Name | Throughput | Response Time (millisecond) | Accuracy (%) |
|---|---|---|---|
| Proposed Model | 110874.5251 | 0.001990 | 95.13 |
| Decision Tree | 112873.6251 | 0.001993 | 90.71 |
| Random Forest | 11886.35081 | 0.018929 | 93.36 |
| Support Vector Machine (SVM) | 12546.826 | 0.017933 | 90.27 |
| K-near Neighbor (KNN) | 14116.13683 | 0.015939 | 88.05 |
| Logistic Regression | 75621.27675 | 0.002975 | 84.51 |
| Naive Bayes | 15055.07527 | 0.014945 | 79.65 |
| Linear Regression | 75284.94194 | 0.002989 | 57.96 |
| Artificial Neural Networks (ANN) | 3893.936746 | 0.057782 | 53.10 |

Figure 14:- Eight supervised learning methods and proposed model accuracy graph.

Accuracy

Accuracy refers to the correctness of an individual measurement. It is determined by comparing the measurement with the accepted or true value; an accurate measurement is close to the true value.

Figure 15:- Eight supervised learning methods and proposed model Response time graph.

Response time

Response time is the interval between a terminal operator's completion of an inquiry and the receipt of a response. It includes the time taken to send the inquiry, to process it and to send the response back to the terminal. Response time is commonly used to measure the performance of interactive algorithms.

Figure 16:- Eight supervised learning methods and proposed model throughput graph.


Throughput

Throughput is a measure of how much data a method can process in a given amount of time. Related measures of algorithm productivity include the speed with which a specific workload can be completed. Response time, by contrast, is the time between a single interactive user request and the receipt of the response.
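One plausible way to measure the three metrics for a trained classifier is sketched below. The exact definitions used for Table 1 are not given in the paper, so the choices here (mean time per prediction as response time, predictions per second as throughput, averaging over repeated runs) are assumptions.

```python
import time

def evaluate(predict, samples, labels):
    """Return (accuracy, mean response time in seconds, throughput in samples/second)."""
    start = time.perf_counter()
    predictions = [predict(x) for x in samples]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    response_time = elapsed / len(samples)
    throughput = len(samples) / elapsed if elapsed > 0 else float("inf")
    return accuracy, response_time, throughput

def average_runs(predict, samples, labels, runs=100):
    """Repeat the measurement and average each metric over the runs."""
    results = [evaluate(predict, samples, labels) for _ in range(runs)]
    return tuple(sum(metric) / runs for metric in zip(*results))

# Toy predictor and data, for illustration only.
toy_accuracy, toy_response_time, toy_throughput = average_runs(
    lambda x: x > 0, [-1, 2, 3, -4], [False, True, False, False], runs=5)
```

Under these definitions throughput is simply the reciprocal of the mean response time, so a fast model scores well on both.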

Discussion:-

Table 1 lists the eight supervised learning methods (ANN, Bayesian Networks, DT, KNN, SVM, Random Forest, Linear Regression and Logistic Regression) together with the proposed model. We implemented all of them in Python, ran them on a sample dataset, and measured the throughput, response time (millisecond) and accuracy (%) of each. Each supervised learning method shows its own performance profile. From this performance analysis, we find that the proposed model performs best among the methods compared. Figure 14 shows that the proposed model has the best accuracy, about 95.13%. The second highest accuracy is that of Random Forest, about 93.36%, and the lowest is that of ANN, 53.10%. To obtain the best accuracy, we used hyper-parameter tuning with cross-validation. Overall, the proposed model gives the best result on our dataset.
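Hyper-parameter tuning with cross-validation can be sketched as follows: each candidate value is scored by its mean accuracy over held-out folds, and the best-scoring value is kept. This is an illustration only, not the authors' implementation. It uses the standard library, a one-dimensional k-nearest-neighbour rule, and synthetic data, and every name in it is hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def cross_val_accuracy(data, k, n_folds=5):
    """Mean accuracy of k-NN over n_folds held-out folds."""
    scores = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [pair for i, pair in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / n_folds


def grid_search(data, candidates, n_folds=5):
    """Return (best_k, best_score) over the candidate values of k."""
    results = {k: cross_val_accuracy(data, k, n_folds) for k in candidates}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


# Synthetic data (an assumption): class 1 tends to have larger x.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(50)]
rng.shuffle(data)

best_k, best_score = grid_search(data, candidates=[1, 3, 5, 7])
print(f"best k = {best_k}, cross-validated accuracy = {best_score:.2%}")
```

Scoring on held-out folds rather than on the training data keeps the chosen value from simply reflecting how well the model memorises its training set.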

Conclusion and Future Work:-

Machine learning classification requires careful tuning of the parameters and, at the same time, a large number of instances in the dataset. Building the model for a method is not the difficulty; achieving correct classification is. Moreover, the best learning method for one dataset does not guarantee the same accuracy, throughput and response time on another dataset whose characteristics are logically different. The key question in machine learning classification is therefore not whether one learning method is superior to another, but under which conditions a particular model can significantly outperform the others on a given application problem. Meta-learning moves in this direction by searching for functions that map datasets to algorithm performance. To this end, it uses a set of attributes, called meta-attributes, to describe the characteristics of learning tasks, and looks for correlations between these attributes and the performance of learning algorithms. Some characteristics of learning tasks are the ratio of categorical attributes, the total number of instances, the entropy of the classes and the ratio of missing values, which belong to a more comprehensive list of statistical and information measures for datasets.

Considering the strengths and limitations of each technique, the possibility of combining two or more methods to solve a problem should be investigated. The aim is to use the strengths of one model to compensate for the weaknesses of another. If we are interested only in the best possible classification accuracy, it may be difficult or impossible to find a single classifier that performs as well as a good ensemble of classifiers. The SVM, DT and Logistic Regression algorithms can give good accuracy, response time and throughput across many data instances and attributes. A supervised machine learning algorithm should offer the best accuracy, minimum response time and maximum throughput.
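The meta-attributes named in this section (number of instances, ratio of categorical attributes, ratio of missing values, entropy of classes) are simple to compute. The sketch below uses a hypothetical dataset layout, a list of dictionaries with `None` for missing values, which is an assumption made for illustration.

```python
import math
from collections import Counter


def meta_attributes(rows, categorical_columns, class_column):
    """Compute four simple meta-attributes of a tabular dataset.

    rows: list of dicts, one per instance; None marks a missing value.
    categorical_columns: names of the attributes treated as categorical.
    class_column: name of the target attribute (not counted as a feature).
    """
    features = [name for name in rows[0] if name != class_column]
    n_cells = len(rows) * len(features)
    n_missing = sum(row[name] is None for row in rows for name in features)

    class_counts = Counter(row[class_column] for row in rows)
    entropy = -sum(
        (count / len(rows)) * math.log2(count / len(rows))
        for count in class_counts.values()
    )

    return {
        "instances": len(rows),
        "categorical_ratio": len(categorical_columns) / len(features),
        "missing_ratio": n_missing / n_cells,
        "class_entropy_bits": entropy,
    }


# Hypothetical four-row dataset, for illustration only.
rows = [
    {"age": 25, "colour": "red", "label": "yes"},
    {"age": None, "colour": "blue", "label": "no"},
    {"age": 40, "colour": "red", "label": "yes"},
    {"age": 31, "colour": None, "label": "no"},
]
stats = meta_attributes(rows, categorical_columns=["colour"], class_column="label")
print(stats)
```

For this toy dataset the two classes are balanced, so the class entropy is 1 bit, and two of the eight feature cells are missing.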

This paper uses sample datasets and predicts their target values with different supervised machine learning algorithms. In future work, we plan to use a wider variety of large datasets and to analyze how the performance of the SML algorithms depends on other parameters.
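The idea of combining two or more methods, raised in the conclusion, is often realised as a majority vote over the individual predictions. The sketch below shows that generic technique with three hypothetical rule-based classifiers. It is not a description of the paper's proposed model, whose construction is not given in this section.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by the most classifiers for one sample.

    Ties are resolved in favour of the label that was predicted first.
    """
    predictions = [clf(sample) for clf in classifiers]
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical base classifiers, each a simple threshold rule.
def rule_a(x):
    return 1 if x[0] > 0.5 else 0


def rule_b(x):
    return 1 if x[1] > 0.5 else 0


def rule_c(x):
    return 1 if x[0] + x[1] > 1.2 else 0


ensemble = [rule_a, rule_b, rule_c]
print(majority_vote(ensemble, [0.7, 0.4]))  # rule_a votes 1, the other two vote 0
print(majority_vote(ensemble, [0.9, 0.6]))  # all three vote 1
```

An odd number of base classifiers avoids ties in a two-class problem.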

References:-

1. Shafagat Mahmudova, “Analysis of Software Performance Enhancement and Development of Algorithm”, International Journal of Innovative Science and Research Technology, Volume 4, Issue 1, ISSN No: 2456-2165, (January 2019).

2. Alican Dogan, Derya Birant, “Machine Learning and Data Mining in Manufacturing”, Expert Systems with Applications, Volume 166, 114060, (15 March 2021).

3. Ritu Sharma, Kavya Sharma, Apurva Khanna, “Study of Supervised Learning and Unsupervised Learning”, International Journal for Research in Applied Science & Engineering Technology (IJRASET), Volume 8, Issue VI, ISSN: 2321-9653, (June 2020).

4. Keith A. Brown, Sarah Brittman, Nicolò Maccaferri, Deep Jariwala, and Umberto Celano, “Machine Learning in Nanoscience: Big Data at Small Scales”, Nano Letters, DOI: 10.1021/acs.nanolett.9b04090, (January 2020).

5. Jose F. Rodrigues Jr, Larisa Florea, Maria C. F. de Oliveira, Dermot Diamond, Osvaldo N. Oliveira Jr, “Big data and machine learning for materials science”, Discover Materials, Springer, (19 April 2021).


6. Oluwakemi Christiana Abikoye, Gyunka Benjamin Aruwa, “The Threat of Split-Personality Android Malware on Developing Economy”, Published by University of the West of Scotland, Vol. 22, No. 1, (2018).

7. Mohammed H. Alsharif, Anabi Hilary Kelechi, Khalid Yahya and Shehzad Ashraf Chaudhry, “Machine Learning Algorithms for Smart Data Analysis in Internet of Things Environment: Taxonomies and Research Trends”, Symmetry 2020, 12, 88; doi:10.3390/sym12010088, (2 January 2020).

8. Egba Anwaitu Fraser, Okonkwo Obikwelu R., “Artificial Neural Networks for Medical Diagnosis: A Review of Recent Trends”, International Journal of Computer Science and Engineering Survey (IJCSES), Vol. 11, No. 3, (June 2020).

9. Roza Dastres and Mohsen Soori, “Artificial Neural Network Systems”, International Journal of Imaging and Robotics (formerly “International Journal of Imaging”, ISSN 0974-0627), Volume 21, Issue No. 2, ISSN 2231–525X, (2021).

10. Brigitta Nagy, Dorián László Galata, Attila Farkas and Zsombor Kristóf Nagy, “Application of Artificial Neural Networks in the Process Analytical Technology of Pharmaceutical Manufacturing—a Review”, The AAPS Journal, American Association of Pharmaceutical Scientists, Springer, (Published online 14 June 2022).

11. Nida Shahid, Tim Rappon, Whitney Berta, “Applications of artificial neural networks in health care organizational decision-making: A scoping review”, PLOS ONE, (February 19, 2019).

12. Marco Scutari, Claudia Vitolo, Allan Tucker, “Learning Bayesian networks from big data with greedy search: computational complexity and efficient implementation”, Statistics and Computing (2019) 29:1095–1108, Springer Nature, (Published online 15 February 2019).

13. Mauro Scanagatta, Antonio Salmerón, Fabio Stella, “A survey on Bayesian network structure learning from data”, Progress in Artificial Intelligence, Springer Nature, Received: 14 March 2019 / Accepted: 20 May 2019, (2019). https://doi.org/10.1007/s13748-019-00194-y

14. Hasna Njah, Salma Jamoussi, Walid Mahdi, “Deep Bayesian network architecture for Big Data mining”, Concurrency Computat Pract Exper., Wiley, (2018).

15. Harsh H. Patel, Purvi Prajapati, “Study and Analysis of Decision Tree Based Classification Algorithms”, International Journal of Computer Sciences and Engineering, Vol. 6, Issue 10, E-ISSN: 2347-2693, (31 October 2018).

16. Puvvada Hruthik, Sai Nivas Rao, Tuta Sri Sai Kailash, Akula Bala Chandra, Chavi Ralha, “Meal Plan Prediction Using Decision Tree Classifier: A Review”, International Research Journal of Modernization in Engineering Technology and Science, Volume 04, Issue 05, e-ISSN: 2582-5208, (May 2022).

17. Jean-Pierre Briot, Gaëtan Hadjeres and François Pachet, “Deep Learning Techniques for Music Generation – A Survey”, Cornell University, arXiv:1709.01620v1 [cs.SD], (5 Sep 2017).

18. Jair Cervantes, Farid Garcia-Lamont, Lisbeth Rodríguez-Mazahua, Asdrubal Lopez, “A comprehensive survey on support vector machine classification: Applications, challenges and trends”, Neurocomputing, Elsevier, Volume 408, Pages 189-215, (30 September 2020).

19. N. Chandra Sekhar Reddy, Purna Chandra Rao Vemuri, A. Govardhan, “An Empirical Study on Support Vector Machines for Intrusion Detection”, International Journal of Emerging Trends in Engineering Research, Volume 7, No. 10, ISSN 2347-3983, (October 2019).

20. Najat Ali, Daniel Neagu, Paul Trundle, “Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets”, Springer Nature, (Published online 6 November 2019).

21. Dastan Hussen Maulud, Adnan Mohsin Abdulazeez, “A Review on Linear Regression Comprehensive in Machine Learning”, Journal of Applied Science and Technology Trends, Vol. 01, No. 04, pp. 140–147, (2020).

22. D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari, “Machine learning and Region Growing for Breast Cancer Segmentation”, International Conference on Advanced Science and Engineering (ICOASE), pp. 88-93, (2019).

23. Bargarai, F., Abdulazeez, A., Tiryaki, V., & Zeebaree, D., “Management of Wireless Communication Systems Using Artificial Intelligence-Based Software Defined Radio”, International Journal of Interactive Mobile Technologies (iJIM), Vol. 14, No. 13, (2020).

24. D. M. Abdulqader, A. M. Abdulazeez, and D. Q. Zeebaree, “Machine Learning Supervised Algorithms of Gene Selection: A Review”, Technology Reports of Kansai University, ISSN: 04532198, Volume 62, Issue 03, (April 2020).


25. Zebari, D. A., Zeebaree, D. Q., Abdulazeez, A. M., Haron, H., & Hamed, H. N. A., “Improved Threshold Based and Trainable Fully Automated Segmentation for Breast Cancer Boundary and Pectoral Muscle in Mammogram Images”, IEEE Access, 8, 203097-203116, (2020).

26. A. Abdulazeez, M. A. Sulaiman, and D. Q. Zeebaree, “Evaluating Data Mining Classification Methods Performance in Internet of Things Applications”, Journal of Soft Computing and Data Mining, vol. 1, pp. 11-25, (2020).

27. H.-I. Lim, “A Linear Regression Approach to Modeling Software Characteristics for Classifying Similar Software”, IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), pp. 942-943, (2019).

28. Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet, “Deep Learning Techniques for Music Generation”, Springer, ISBN: 978-3-319-70162-2, eBook ISBN: 978-3-319-70163-9, Series ISSN: 2509-6575, (2019).

29. Ernest Yeboah Boateng and Daniel A. Abaye, “A Review of the Logistic Regression Model with Emphasis on Medical Research”, Journal of Data Analysis and Information Processing, 7, 190-207, ISSN Online: 2327-7203, ISSN Print: 2327-7211, (2019).

30. Renato Torres, Orlando Ohashi and Gustavo Pessin, “A Machine-Learning Approach to Distinguish Passengers and Drivers Reading While Driving”, Sensors 2019, 19, 3174; doi:10.3390/s19143174, MDPI, (19 July 2019).

31. Bette Loef, Albert Wong, Nicole A. H. Janssen, Maciek Strak, Jurriaan Hoekstra, H. Susan J. Picavet, H. C. Hendriek Boshuizen, W. M. Monique Verschuren & Gerrie-Cor M. Herber, “Using random forest to identify longitudinal predictors of health in a 30-year cohort study”, Scientific Reports (2022) 12:10372, (2022).

32. Michał Wieczorek, Jakub Siłka, Dawid Połap, Marcin Woźniak, Robertas Damaševičius, “Real-time neural network based predictor for cov19 virus spread”, PLOS ONE, (December 17, 2020).

33. Ian H., “Data Mining Practical Machine Learning Tools and Techniques, Second Edition”, Morgan, (2005).

34. Alan Brnabic and Lisa M. Hess, “Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making”, BMC Medical Informatics and Decision Making, (2021).
