Arabian Journal for Science and Engineering
https://doi.org/10.1007/s13369-020-04872-1
RESEARCH ARTICLE-COMPUTER ENGINEERING AND COMPUTER SCIENCE
Training Feed-Forward Multi-Layer Perceptron Artificial Neural
Networks with a Tree-Seed Algorithm
Ahmet Cevahir Cinar¹ (accinar@selcuk.edu.tr; ahmetcevahircinar@gmail.com)
¹ Department of Computer Engineering, Faculty of Technology, Selcuk University, 42075 Konya, Turkey
Received: 28 March 2020 / Accepted: 13 August 2020
© King Fahd University of Petroleum & Minerals 2020
Abstract
The artificial neural network (ANN) is the most popular research area in neural computing. A multi-layer perceptron (MLP) is an ANN that has hidden layers. Feed-forward (FF) ANNs are commonly used for classification and regression. Training of FF MLP ANNs is generally performed with the backpropagation (BP) algorithm, whose main disadvantage is that it can become trapped in local minima. Nature-inspired optimizers have mechanisms for escaping local minima. The tree-seed algorithm (TSA) is an effective population-based swarm intelligence algorithm that mimics the relationship between trees and their seeds. Exploration and exploitation are controlled by the search tendency, a parameter peculiar to TSA. In this work, we train FF MLP ANNs with TSA for the first time. TSA is compared with particle swarm optimization, gray wolf optimizer, genetic algorithm, ant colony optimization, evolution strategy, population-based incremental learning, artificial bee colony, and biogeography-based optimization. The experimental results show that TSA is the best in terms of mean classification rates and outperformed its opponents on 18 problems.
Keywords Tree-seed algorithm · Multi-layer perceptron · Training neural network · Artificial neural network · Neural networks · Nature-inspired algorithms
1 Introduction
Neural computing mimics the human brain, the most complex organ of the human body [1]. Neural networks simulate the connections in the human brain [2], and this simulation is named the artificial neural network (ANN). Basically, an ANN takes inputs, computes on them, and produces outputs; this process is a learning process. Learning has two main types: supervised and unsupervised. In supervised learning, the training data have output labels, but in unsupervised learning the data have no output labels. An ANN learns a balanced mapping between inputs and outputs. In the literature, there are various types of networks, such as feed-forward (FF) [3], Kohonen [4], radial-basis function (RBF) [5], recurrent neural [6], and spiking neural [7] networks. An FF network propagates signals in only one direction. In an FF network, the association of inputs and outputs is provided by weights and biases. If an FF ANN has hidden layers, it is named a multi-layer perceptron (MLP) [8, 9]. An MLP has three layers: input, hidden, and output. Training data are used for learning the hidden weights between the attributes and the class labels. Deterministic and stochastic learning approaches are used for training an ANN.
The gradient-based methods and the backpropagation (BP) algorithm are deterministic methods [10]. If the training data do not change, deterministic methods produce the same results. The deterministic methods are simple and fast. Stochastic methods try to improve the learning during the iterations; thus, their time usage is higher than that of the deterministic methods, but they give better results. The main drawback of the deterministic methods is their dependency on the initial solutions. Stochastic optimization techniques start with random solutions. These random solutions are evolved in every iteration, and their main advantage is avoiding local optima. Nature-inspired optimizers are in the stochastic optimization techniques group. Most of these methods are multi-solution-based algorithms, but some of them, like hill climbing [11] and simulated annealing [12, 13], are single-solution-based algorithms. TSA, particle swarm optimization (PSO), gray wolf optimizer (GWO), genetic algorithm (GA), ant colony optimization (ACO), evolution strategy (ES), population-based incremental learning (PBIL), artificial bee colony (ABC), and biogeography-based optimization (BBO) are some of the multi-solution nature-inspired optimizers. These algorithms are used not only in training FF ANN MLPs but also in various applications like feed formulation [14], the traveling salesman problem [15], layout problems [16], and combinatorial problems [17]. Also, ANN is used in many applications like forecasting [18], classification [19, 20], estimation [21, 22], and prediction [23].
The ANN is used not only for classifying stationary signals but also non-stationary signals [24]. Various techniques are used in the classification of non-stationary signals; for example, Koh and Woo [24] combined an ensemble technique and multi-view learning for classifying non-stationary signals; Boashash and Ouelha [25] focused on extracting information from non-stationary signals; Delsy et al. [26] extracted features from non-stationary signals and classified them with a backpropagation network. According to the No Free Lunch (NFL) theorem [27], a single nature-inspired optimizer cannot solve all optimization problems successfully. Therefore, since GA was proposed in 1975, more than 300 nature-inspired algorithms have been proposed. Every nature-inspired optimizer has its own peculiar properties, so in this work we want to prove the success of TSA in training MLPs. TSA is an effective solver on low-dimensional problems; in this work, we modify the basic TSA for solving large-scale MLP training.
The remainder of the paper is organized as follows: Sect. 1.1 gives the main contribution of the study. In Sect. 2, the related works are given. FF MLP ANN and TSA are examined in Sects. 3.1 and 3.2, respectively. The experimental setup and information about the datasets are given in Sect. 4. The results and discussion are located in Sect. 5. Finally, in Sect. 6 we conclude the work.
1.1 The Main Contribution of the Study

• TSA is used for training the FF MLP ANN for the first time.
• TSA is compared with 8 metaheuristic algorithms on 18 different datasets (6 to 6786 dimensions, 4 to 7400 samples) and outperforms them.
• TSA finds eligible weights and biases of the FF MLP ANN.
• The parameter adjustment for the basic TSA increased the mean classification accuracy.
• TSA is the best solver on 18 different types of datasets in terms of mean classification rates.
2 Literature Review
Neural computing and nature-inspired optimizers constitute a huge research domain. Training feed-forward multi-layer perceptron artificial neural networks is not a fresh idea, but it is a much-discussed, alive, and growing research problem in the literature. Therefore, in this section, we focus only on recent applications related to the training of FF MLP ANNs with nature-inspired optimizers and on the literature of TSA.
Wienholt [28] uses ES for minimizing the system error of an MLP. Seiffert [29] proposes a GA approach for avoiding local minima when training MLPs in 2001. Mendes et al. [30] use PSO for training MLPs on classification and regression tasks in 2002. In 2005, Blum and Socha [31] extend ACO to pattern classification on medical data. Karaboga et al. [32] use ABC for training FF ANNs; five function approximation problems are used in the experiments, and ABC outperforms BP and GA in this work. Mirjalili et al. [33] hybridize PSO and the gravitational search algorithm, naming the result PSOGSA. In that work, MLP is trained with PSOGSA, and the obtained results are compared with PSO and GSA; PSOGSA is better than PSO and GSA in terms of convergence, training error, and classification rate. Mirjalili et al. [34] train MLP with BBO in 2014. Five classification and six approximation datasets are used in the experiments. BBO outperforms PSO, GA, ACO, ES, and PBIL; also, the obtained results are compared with the BP algorithm and the extreme learning
machine. Mirjalili [8] investigates the effectiveness of the
GWO on training MLP in 2015. Five classification and three
function approximation datasets are used for determining
the performance of GWO. GWO creates better results than
PSO, GA, ACO, ES, and PBIL. Amirsadri et al. [35] com-
bine BP and GWO for training MLP. The Lévy flight technique is used to improve the exploration capability of GWO, as in [36]. BP increases the exploitation capability of GWO. The
success of the proposed model is shown on 12 classification
and function-approximation datasets. Xu et al. [37] modify
ABC with the global best-guided approach for continuous
optimization problems, and this method is named ABC-ISB.
ABC-ISB is compared with ABC variants in the literature. In
that work, basic ABC and ABC-ISB are compared on training MLP, and ABC-ISB creates promising results. Zhang et al. [38] optimize the weights and biases of an MLP with an improved GWO in 2019, and their approach is named RSMGWO. RSMGWO uses a random opposition learning strategy for avoiding local optima. Nineteen different cancer-related datasets are used in the experiments, and RSMGWO produces competitive results. Heidari et al. [39] use the ant lion optimizer for training MLP in 2020, and their approach is named ALOMLP. ALOMLP outperforms GA, PBIL, DE, and PSO in this work. Dalwinder et al. [40] weight the features of the datasets to increase the classification rate in 2020. In this work, an ant lion optimizer is used for training the MLPs, and three breast cancer datasets are used in the experimental setup. The obtained results show that this paradigm increases the classification rate. Faris et al. [41] train MLP with the multi-verse optimizer (MVO). Nine different
bio-medical datasets selected from the UCI machine learning
repository are used in experiments. MVO is compared with
GA, PSO, DE, firefly, and cuckoo search algorithms. The
experimental results show that MVO produces comparable results. Metaheuristic algorithms are used not only for time-series prediction but also in image processing [42], electrical machine design [43], and the optimization of energy consumption in wireless sensor networks [44].
TSA is another iterative continuous search algorithm, proposed by Kiran [45] in 2015. In the literature, TSA is used in a wide range of research areas, such as constrained versions of TSA [46, 47], engineering optimization problems solved with TSA [48–58], RBF network training and applications with TSA [48, 59], parallel versions of TSA [60, 61], image processing with TSA [62–65], binary optimization with TSA [66–68], improved versions of TSA [50, 69–76], feature selection with TSA [77], and discrete versions of TSA [15].
Until now, there has been no work in the literature on training MLP with TSA; our main motivation is to present the effectiveness of TSA for training MLPs.
3 Materials and Methods
3.1 Feed-Forward Neural Network and Multi-Layer
Perceptron
The FF neural network is a neural network that has only one direction of signal flow between its neurons. If an NN has hidden layers, it is named an MLP. In this study, vector representation is used for individuals. The individual for the 2–3–1 MLP presented in Fig. 1 is
X = [W13, W23, W14, W24, W15, W25, W36, W46, W56, θ1, θ2, θ3, θ4].

Fig. 1 The structure of the 2–3–1 MLP

The dimension is calculated as ((InputNumber + OutputNumber + 1) × HiddenNodesNumber) + 1. For the 2–3–1 MLP this gives (2 + 1 + 1) × 3 + 1 = 13, i.e., 9 weights and 4 biases, matching the XOR13 row of Table 1.
Mean square error (MSE) over all training samples is used as the objective function. Equation 1 shows this calculation:

MSE = (Σ_{t=1..T} Σ_{i=1..m} (R_i^t − C_i^t)²) / T   (1)

where T is the number of training samples, m is the number of outputs, C_i^t is the created output value of the ith output for the tth training sample, and R_i^t is the real output value of the ith output for the tth training sample.
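To make the encoding and the objective concrete, the following is a minimal sketch (Python with NumPy; helper names such as `decode_2_3_1` and `mse_objective` are our own illustrations, and the sigmoid activations in the hidden and output layers are an assumption, not stated above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_2_3_1(ind):
    """Split a 13-dimensional individual into the weights and biases of Fig. 1."""
    ind = np.asarray(ind, dtype=float)
    W_ih = ind[0:6].reshape(3, 2)   # input->hidden weights W13..W25 (one row per hidden node)
    W_ho = ind[6:9].reshape(1, 3)   # hidden->output weights W36, W46, W56
    b_h = ind[9:12]                 # hidden biases theta1..theta3
    b_o = ind[12:13]                # output bias theta4
    return W_ih, W_ho, b_h, b_o

def mse_objective(ind, X, R):
    """Eq. 1: mean over the T training samples of the summed squared output error."""
    W_ih, W_ho, b_h, b_o = decode_2_3_1(ind)
    H = sigmoid(X @ W_ih.T + b_h)   # hidden activations, shape (T, 3)
    C = sigmoid(H @ W_ho.T + b_o)   # created outputs,    shape (T, 1)
    return np.mean(np.sum((R - C) ** 2, axis=1))

# XOR truth table with a 2-3-1 MLP (the XOR13 setting of Table 1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
R = np.array([[0], [1], [1], [0]], dtype=float)
print(mse_objective(np.random.uniform(-10, 10, 13), X, R))
```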
3.2 Tree-Seed Algorithm
TSA was proposed by Kiran [45] in 2015 for solving unconstrained continuous optimization problems. TSA simulates the relationship between trees and their seeds. TSA is a population-based swarm intelligence technique. It has two peculiar parameters: the search tendency (ST) and the number of seeds (NS). ST controls the seed creation direction. The population is named the stand in TSA. Kiran [45] recommends that NS be between 10% and 25% of the stand size, but if necessary one can change this number. In TSA, the trees and seeds correspond to possible solutions of an optimization problem. At the initialization phase, the population is created randomly in a predetermined search space. The trees and seeds are D-dimensional vectors, where D is the dimensionality of the optimization problem. The search process is a trade-off between exploration and exploitation; if this trade-off is balanced, the algorithm creates more qualified solutions. In TSA, this balance is controlled by the ST parameter through two different seed creation formulas, given in Eqs. 2 and 3, respectively.
Seed(k, j) = Tree(i, j) + (Best(j) − Tree(r, j)) × Rand(−1, 1)   (2)

Seed(k, j) = Tree(i, j) + (Tree(i, j) − Tree(r, j)) × Rand(−1, 1)   (3)

where k is the index of the seed, j is the index of the dimension, r is the index of the random neighbor tree, Best is the best tree obtained so far, and Rand(−1, 1) is a random number between −1 and 1. Equation 2 provides the exploitation, and Eq. 3 provides the exploration. The detailed
pseudocode of the basic TSA is given in Fig. 2.
Determine the number of trees (N)
Determine the search tendency (ST) parameter
Determine the maximum function evaluation number (Maxfes)
D is the dimensionality of the problem
Initialize the trees
Evaluate the trees
Fes=N
WHILE Fes<Maxfes
FOR i=1 to N
Determine the number of seeds (NS), between 10% and 25% of the population size
Select a random neighbor tree (r) that does not equal the current tree
FOR k=1 to NS
FOR j=1 to D
IF rand<ST
Seed(k,j)=Tree(i,j)+rand(-1,1)*(Best(j)-Tree(r,j))
Relocate the seeds if cross the search space boundaries
ELSE
Seed(k,j)=Tree(i,j)+rand(-1,1)*(Tree(i,j)-Tree(r,j))
Relocate the seeds if cross the search space boundaries
END
END
END
Determine the best seed with a greedy selection mechanism
If the best seed is better than its tree, then the tree is removed from the search space and the
best seed becomes a tree
END
Determine the best tree with a greedy selection mechanism
END
Fig. 2 The detailed pseudocode of the basic TSA
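As a concrete companion to the pseudocode in Fig. 2, the following is a minimal, self-contained sketch of the basic TSA loop (Python with NumPy; the function name `tsa_minimize` is our own, and clipping to the bounds stands in for the "relocate the seeds" step):

```python
import numpy as np

def tsa_minimize(f, dim, low, high, n_trees=10, st=0.1, max_fes=12_500, rng=None):
    rng = rng or np.random.default_rng()
    trees = rng.uniform(low, high, (n_trees, dim))   # initialize the stand
    fits = np.array([f(t) for t in trees])
    fes = n_trees
    best = trees[fits.argmin()].copy()               # best tree obtained so far
    while fes < max_fes:
        for i in range(n_trees):
            # NS between 10% and 25% of the stand size
            ns = int(rng.integers(max(1, n_trees // 10), max(2, n_trees // 4) + 1))
            r = rng.choice([x for x in range(n_trees) if x != i])  # random neighbor tree
            seeds = np.empty((ns, dim))
            for k in range(ns):
                for j in range(dim):
                    if rng.random() < st:   # Eq. 2: move toward the best tree (exploitation)
                        seeds[k, j] = trees[i, j] + rng.uniform(-1, 1) * (best[j] - trees[r, j])
                    else:                   # Eq. 3: move around the current tree (exploration)
                        seeds[k, j] = trees[i, j] + rng.uniform(-1, 1) * (trees[i, j] - trees[r, j])
            seeds = np.clip(seeds, low, high)        # relocate seeds that cross the boundaries
            seed_fits = np.array([f(s) for s in seeds])
            fes += ns
            k_best = int(seed_fits.argmin())
            if seed_fits[k_best] < fits[i]:          # greedy selection: best seed replaces its tree
                trees[i], fits[i] = seeds[k_best], seed_fits[k_best]
        best = trees[fits.argmin()].copy()           # determine the best tree
    return best, float(fits.min())
```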
3.3 Training MLP with TSA

This section describes in depth how to train an FF MLP with TSA. TSA is a continuous optimization algorithm, and Sect. 3.2 gives detailed information about it. The main aim is to determine the optimum parameters of the MLP; these parameters are clearly explained in Sect. 3.1. At the initialization phase, these values are started as a random vector. After that, this vector is optimized by TSA, and finally the optimized parameters of an MLP are produced by TSA. The flowchart of the proposed method is given in Fig. 3.
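Under the assumptions of the sketches above (the hypothetical `mse_objective` and `tsa_minimize` helpers), the whole training procedure reduces to one call; for instance, for the XOR13 setting of Table 1:

```python
# Train the 2-3-1 XOR MLP: a 13-dimensional search in [-10, 10] (Table 1, XOR13 row)
weights, err = tsa_minimize(lambda w: mse_objective(w, X, R),
                            dim=13, low=-10, high=10, n_trees=10, st=0.1)
print(f"final training MSE: {err:.3e}")
```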
3.4 The Computational Complexity of the Proposed
Method
The computational complexity of the proposed method is related to the structure of the MLP, the number of instances in the training data, the stand size, the maximum number of function evaluations, and the number of seeds. The Big-O notation of the computational complexity of the proposed method is given in Eq. 4:

O(TSA, MLP) = O(Maxfes × (O(MLP) + O(TSA)))   (4)

where Maxfes is the maximum number of function evaluations, O(MLP) is the Big-O notation of the MLP, calculated as in Eq. 5, and O(TSA) is the Big-O notation of TSA, calculated as in Eq. 6:

O(MLP) = O(t × (h + o))   (5)
Fig. 3 The flowchart of the proposed method
Table 1 The details of datasets
No Dataset name Number of attributes MLP structure Dimensions Weight numbers Bias numbers Range
1 XOR6 2 2-2-1 6 6 weights 0 biases [-100, 100]
2 XOR9 2 2-2-1 9 6 weights 3 biases [-10, 10]
3 XOR13 2 2-3-1 13 9 weights 4 biases [-10, 10]
4 3-bit Parity 3 3-3-1 16 12 weights 4 biases [-10, 10]
5 4-bit Enc. Dec 4 4-2-4 22 16 weights 6 biases [-10, 10]
6 3-bits XOR 3 3-7-1 36 28 weights 8 biases [-10, 10]
7 Sigmoid 1 1-15-1 46 30 weights 16 biases [-10, 10]
8 Cosine 1 1-15-1 46 31 weights 17 biases [-10, 10]
9 Sine 1 1-15-1 46 32 weights 18 biases [-10, 10]
10 Balloon 4 4-9-1 55 45 weights 10 biases [-10, 10]
11 Iris 4 4-9-3 75 63 weights 12 biases [-10, 10]
12 Breast Cancer 9 9-19-1 210 190 weights 20 biases [-10, 10]
13 Heart 22 22-45-1 1082 1035 weights 46 biases [-10, 10]
14 Banknote 4 4-9-1 55 45 weights 10 biases [-10, 10]
15 Diabetic 19 19-39-1 820 780 weights 40 biases [-10, 10]
16 Twonorm 20 20-41-1 903 861 weights 42 biases [-10, 10]
17 Ringnorm 20 20-41-1 903 861 weights 42 biases [-10, 10]
18 Spambase 57 57-115-1 6786 6670 weights 116 biases [-10, 10]
where t is the number of instances in the training data, h is the number of hidden nodes in the MLP, and o is the number of output values. In this work, h and o are smaller than t, so in the worst case O(MLP) ≈ t.

O(TSA) = O(N × NS × D)   (6)

where N is the stand size, NS is the number of seeds, and D is the dimensionality of the training dataset. In the best case NS = N/10, and in the worst case NS = N/4; NS must be smaller than N. Generally, D is smaller than t; to ease the calculation we suppose D ≈ t. So, in the worst case, O(TSA) ≈ N × N/4 × t.

The overall computational complexity of the proposed method is given in Eq. 7:

O(TSA, MLP) = O(Maxfes × (t + N × N/4 × t))   (7)
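As an illustrative instance of Eq. 7 (our own arithmetic, not a figure from the paper), take the XOR13 setting of Tables 1 and 3, where N = 10, t = 4, and Maxfes = 12,500; the worst-case bound evaluates to

O(TSA, MLP) ≈ 12,500 × (4 + 10 × (10/4) × 4) = 12,500 × 104 = 1.3 × 10⁶.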
4 Experimental Setup

In this section, we give the details of our experimental setup. The details of the datasets (the name of the dataset, the number of attributes, the MLP structure for training, the dimension, the total number of weights, the total number of biases, and the search space range) are given in Table 1. To determine the performance of the algorithms, 18 different datasets (XOR6, XOR9, XOR13, 3-bit Parity, 4-bit Encoder Decoder, 3-bits XOR, Sigmoid, Cosine, Sine, Balloon, Iris, Breast Cancer, Heart, Banknote, Diabetic, Twonorm, Ringnorm, and Spambase) are used in the experiments. The large datasets (more than 1000 training/test samples) are discussed in Sect. 5.3. There is no strict rule for selecting the number of hidden nodes; Eq. 8 is used for determining it:

H = 2 × I + 1   (8)

where H is the number of hidden nodes of the MLP and I is the number of input nodes. For example, the Banknote dataset has I = 4 inputs, so H = 2 × 4 + 1 = 9, which yields the 4-9-1 structure in Table 1. For the function approximation datasets, the number of hidden nodes is set to 15.
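A tiny sketch (Python; `mlp_structure` is our own illustrative helper, assuming the single-output structures that dominate Table 1) combining Eq. 8 with the dimension formula of Sect. 3.1:

```python
def mlp_structure(n_inputs, n_outputs=1, approx=False):
    # Eq. 8 for classification datasets; 15 hidden nodes for function approximation
    hidden = 15 if approx else 2 * n_inputs + 1
    # Dimension formula from Sect. 3.1
    dim = (n_inputs + n_outputs + 1) * hidden + 1
    return hidden, dim

print(mlp_structure(4))   # 4 inputs  -> (9, 55): the 4-9-1 Balloon/Banknote rows of Table 1
print(mlp_structure(57))  # 57 inputs -> (115, 6786): the Spambase row of Table 1
```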
In this work, we use nine algorithms. All specific param-
eters of these algorithms are listed in Table 2.
The maximum iteration numbers, the maximum numbers of function evaluations, the population sizes (for PSO, GWO, GA, ACO, ES, PBIL, and BBO), the stand size of TSA, and the colony size of ABC for every dataset are given in Table 3.
The information about training/test samples is given in
Table 4.
The datasets are mapped to the [-1, +1] space with the min–max normalization method, formulated as in Eq. 9:

X' = ((X − xmin) × (1 − (−1))) / (xmax − xmin) + (−1)   (9)
Table 2 Specific parameters of the algorithms used in this work

TSA: Search tendency (ST) = 0.1; Number of seeds = N×0.1 to N×0.25 (N: stand size)
GWO: a (linearly decreased) = 2 to 0
ABC: Colony size (CS) = N/2 (N: population size); Limit = CS×D (D: dimension of the problem)
BBO: Habitat modification probability = 1; Immigration probability bounds per gene = [0, 1]; Step size for numerical integration of probabilities = 1; Max immigration (I) and max emigration (E) = 1; Mutation probability = 0.005
PSO: Cognitive constant (C1) = 1; Social constant (C2) = 1; Inertia constant (w) = 0.3
GA (real coded, roulette wheel selection): Single-point crossover probability = 1; Uniform mutation probability = 0.01
ACO: Initial pheromone (τ0) = 1.00E-06; Pheromone update constant (Q) = 20; Pheromone constant (q0) = 1; Global pheromone decay rate (pg) = 0.9; Local pheromone decay rate (pt) = 0.5; Pheromone sensitivity (α) = 1; Visibility sensitivity (β) = 5
ES: Lambda = 10; Sigma = 1
PBIL: Learning rate = 0.05; Good population member = 1; Bad population member = 0; Elitism parameter = 1; Mutation probability = 0.1
where X' is the mapped value, X is the real value, xmax is the maximum value of the dataset, and xmin is the minimum value of the dataset.
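A one-line check of Eq. 9 (Python with NumPy; a sketch under the assumption that the normalization is applied per attribute):

```python
import numpy as np

def minmax_to_pm1(col):
    """Eq. 9: map a column of raw attribute values into [-1, +1]."""
    col = np.asarray(col, dtype=float)
    return (col - col.min()) * (1 - (-1)) / (col.max() - col.min()) + (-1)

print(minmax_to_pm1([2.0, 3.0, 4.0]))  # [-1.  0.  1.]
```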
4.1 Balloon Dataset
The balloon dataset, about blowing up a balloon, has 4
attributes (color, size, act, and age) and 20 training/test sam-
ples (4 repeated). If the balloon is inflated, the output is 1;
otherwise, the output is zero. The string input variables are
converted to binary format. The color values are yellow and
purple, the size values are small and large, the act values are
stretch and dip, and the age values are adult and child. The
files related to the dataset can be found in https://archive.ics.
uci.edu/ml/datasets/Balloons.
4.2 Iris Dataset
The Iris dataset, about class of iris plant, has 4 attributes
(sepal length, sepal width, petal length, and petal width) and
150 training/test samples. If the class is Iris Setosa the output is -1, if the class is Iris Versicolour the output is 0, and if the class is Iris Virginica the output is 1. The input variables are mapped between -1 and 1 with the min–max normalization method mentioned before. The files related
to the dataset can be found in https://archive.ics.uci.edu/ml/
datasets/Iris.
4.3 Breast Cancer Dataset
The Breast cancer dataset, about patients who have cancer or not, has 10 attributes (id, clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses) and 599 training/100 test samples. If the cancer is benign, the output is 0; if the cancer is malignant, the output is 1. The input variables are converted to continuous variables between -1 and 1 with the min–max normalization method mentioned before. The files related
to the dataset can be found in https://archive.ics.uci.edu/ml/
datasets/breast+cancer+wisconsin+(original).
4.4 Heart Dataset
The Heart dataset, about patients who have heart disease or not, has 22 attributes (binary features extracted from images) and 267 training/test samples. In this work, we use only the first 80 training/test samples. If the patient is normal, the output is 0; if the patient is abnormal, the output is 1. The
files related to the dataset can be found in https://archive.ics.
uci.edu/ml/datasets/spect+heart.
Table 3 The population/colony/stand sizes and iteration/function evaluation numbers
No Dataset name Population size Maximum iteration number Maximum function evaluations Colony size Stand size
1 XOR6 50 250 12,500 25 10
2 XOR9 50 250 12,500 25 10
3 XOR13 50 250 12,500 25 10
4 3-bit Parity 50 250 12,500 25 10
5 4-bit Enc. Dec 50 250 12,500 25 10
6 3-bits XOR 50 250 12,500 25 10
7 Sigmoid 200 250 50,000 100 50
8 Cosine 200 250 50,000 100 50
9 Sine 200 250 50,000 100 50
10 Balloon 50 250 12,500 25 10
11 Iris 200 250 50,000 100 50
12 Breast cancer 200 250 50,000 100 50
13 Heart 200 250 50,000 100 50
14 Banknote 50 250 12,500 25 10
15 Diabetic 50 250 12,500 25 10
16 Twonorm 50 250 12,500 25 10
17 Ringnorm 50 250 12,500 25 10
18 Spambase 50 250 12,500 25 10
Table 4 The training/test samples information
No Dataset name Training samples Test samples NOTrS NOTeS
1 XOR6 (0 0; 0 1; 1 0; 1 1) → (0; 1; 1; 0) Same as training samples 4 4
2 XOR9 (0 0; 0 1; 1 0; 1 1) → (0; 1; 1; 0) Same as training samples 4 4
3 XOR13 (0 0; 0 1; 1 0; 1 1) → (0; 1; 1; 0) Same as training samples 4 4
4 3-bit Parity (000; 001; 010; 011; 100; 101; 110; 111) → (0; 1; 1; 0; 1; 0; 0; 1) Same as training samples 8 8
5 4-bit Enc. Dec (0001; 0010; 0100; 1000) → (0001; 0010; 0100; 1000) Same as training samples 4 4
6 3-bits XOR (000; 001; 010; 011; 100; 101; 110; 111) → (0; 1; 1; 0; 1; 0; 0; 1) Same as training samples 8 8
7 Sigmoid x in [-3:0.1:3] x in [-3:0.05:3] 61 121
8 Cosine x in [1.25:0.05:2.75] x in [1.25:0.04:2.75] 31 38
9 Sine x in [-2π:0.1:2π] x in [-2π:0.05:2π] 126 252
10 Balloon The details are given in Sect. 4.1 Same as training samples 20 20
11 Iris The details are given in Sect. 4.2 Same as training samples 150 150
12 Breast cancer The details are given in Sect. 4.3 Not the same as training samples 599 100
13 Heart The details are given in Sect. 4.4 Same as training samples 80 80
14 Banknote The details are given in Sect. 4.5 Same as training samples 1372 1372
15 Diabetic The details are given in Sect. 4.6 Same as training samples 1151 1151
16 Twonorm The details are given in Sect. 4.7 Same as training samples 7400 7400
17 Ringnorm The details are given in Sect. 4.8 Same as training samples 7400 7400
18 Spambase The details are given in Sect. 4.9 Same as training samples 4601 4601
NOTrS: number of training samples; NOTeS: number of test samples
Table 5 Experimental results of the XOR6 dataset
XOR6 ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 1.98E-23 2.40E-23 2.97E-29 6.73E-29 2.34E-29 1.05E-29 1.24E-29 1.05E-29 1.05E-29
Best 1.06E-29 1.24E-29 2.97E-29 1.06E-29 1.24E-29 1.03E-29 1.24E-29 1.03E-29 1.03E-29
Worst 3.99E-22 7.16E-22 2.97E-29 5.48E-28 8.41E-29 1.09E-29 1.24E-29 1.09E-29 1.09E-29
SD 7.39E-23 1.31E-22 2.28E-44 1.16E-28 2.40E-29 1.66E-31 2.85E-45 1.91E-31 1.88E-31
Median 5.10E-26 8.29E-29 2.97E-29 2.51E-29 1.24E-29 1.05E-29 1.24E-29 1.05E-29 1.05E-29
Mean time 1.3564 1.6653 1.3946 1.4546 1.8817 1.4412 0.7653 2.2254 2.5986
Friedman rank 8.4 7.8 6.6 5.8 5.5 1.9 4.7 2.3 1.9
Manual rank 2 3 4 2 3 1 3 1 1
Wilcoxon 1.92E-06 1.73E-06 1.73E-06 2.13E-06 1.73E-06 4.91E-01 1.73E-06 2.06E-01 0.00E+00
Classification rate (%) 100 100 100 100 100 100 100 100 100
4.5 Banknote Authentication Dataset
The Banknote authentication dataset is related to whether the
banknote is valid or invalid. It has four continuous attributes
(variance of wavelet transformed image, skewness of wavelet
transformed image, curtosis of wavelet transformed image,
the entropy of image), and 1372 training/test samples. The
files related to the dataset can be found in https://archive.ics.
uci.edu/ml/datasets/banknote+authentication.
4.6 Diabetic Retinopathy Debrecen Dataset
The Diabetic Retinopathy Debrecen dataset contains infor-
mation about people who have diabetic retinopathy or
not. It has 19 continuous and integer attributes and 1151
training/test samples. The files related to the dataset can
be found in https://archive.ics.uci.edu/ml/datasets/Diabetic+
Retinopathy+Debrecen+Data+Set.
4.7 Twonorm Dataset
The Twonorm dataset is an artificial dataset that has 20 con-
tinuous attributes and 7400 training/test samples. The files
related to the dataset can be found in https://www.cs.toronto.
edu/~delve/data/twonorm/desc.html
4.8 Ringnorm Dataset
The Ringnorm dataset is an artificial dataset that has 20 con-
tinuous attributes and 7400 training/test samples. The files
related to the dataset can be found in https://www.cs.toronto.
edu/~delve/data/ringnorm/desc.html.
4.9 Spambase Dataset
The Spambase dataset is about classifying emails as spam
or not. It has 57 continuous or integer attributes and 4601
training/test samples. The files related to the dataset can be
found in https://archive.ics.uci.edu/ml/datasets/Spambase/.
5 Results and Discussion
All obtained results and the discussion about them are located in this section. The best training results and the best classification rates are highlighted with bold text and an italic background in Tables 5–17 and 21–24. Statistical tests are important for determining the significant differences between the obtained results. In this work, two different statistical tests are conducted: the Wilcoxon signed rank test and Friedman's test. The results obtained from 30 runs are used in these tests. The significance level is taken as 5% (0.05), and the p values of the Wilcoxon signed rank test and the mean rank values of Friedman's test are located in Tables 5–17 and 21–24. The large datasets (more than 1000 training/test samples) are discussed in Sect. 5.3.
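For reproducibility, the two tests can be computed along these lines (a sketch using SciPy; the per-run error arrays are placeholders, not the paper's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder data: final training MSEs of 30 independent runs per algorithm
tsa = rng.lognormal(-20, 1, 30)
pso = rng.lognormal(-19, 1, 30)
gwo = rng.lognormal(-20, 1, 30)

# Wilcoxon signed rank test: paired comparison of an opponent against the base method
print(stats.wilcoxon(tsa, pso).pvalue)

# Friedman's test over all compared algorithms (the source of the mean rank values)
print(stats.friedmanchisquare(tsa, pso, gwo))
```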
The experimental results for the XOR6 dataset are given
in Table 5. TSA, PSO, and GWO share the first position in
terms of mean training error results. The classification rate is
100% for all methods. Therefore, XOR6 is not an identifier
problem.
The experimental results for the XOR9 dataset are given
in Table 6. ABC, BBO, GA, PBIL, and GWO share the first
position in terms of mean training error results. The classifi-
cation rate is 100% for all methods. Therefore, XOR9 is not
an identifier problem.
The experimental results for the XOR13 dataset are given
in Table 7. ABC, BBO, GA, and GWO share the first position
in terms of mean training error results. The classification
rate is 100% for all methods. Therefore, XOR13 is not an
identifier problem.
Table 6 Experimental results of the XOR9 dataset
XOR9 ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 2.70E-09 2.30E-06 2.69E-09 1.86E-05 2.69E-09 2.89E-06 3.23E-09 1.75E-08 2.94E-09
Best 2.69E-09 1.74E-08 2.69E-09 3.11E-09 2.69E-09 2.69E-09 2.69E-09 2.70E-09 2.70E-09
Worst 2.71E-09 2.58E-05 2.69E-09 0.00028 2.69E-09 5.12E-05 5.75E-09 1.15E-07 3.67E-09
SD 3.03E-12 4.84E-06 4.21E-25 5.34E-05 1.08E-24 1.12E-05 8.34E-10 2.41E-08 2.60E-10
Median 2.69E-09 8.89E-07 2.69E-09 7.10E-07 2.69E-09 2.70E-09 2.69E-09 9.30E-09 2.83E-09
Mean time 1.4393 1.4955 1.4015 1.4390 1.8169 1.4570 0.7771 2.2102 2.7048
Friedman rank 3.9 8.4 1.7 8.2 1.7 4.6 4.0 6.8 5.6
Manual rank 1 4 1 3 1 1 1 2 2
Wilcoxon 1.73E-06 1.73E-06 1.73E-06 1.73E-06 1.73E-06 6.16E-04 6.58E-01 4.73E-06 0.00E+00
Classification rate (%) 100 100 100 100 100 100 100 100 100
Table 7 Experimental results of the XOR13 dataset
XOR13 ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 1.43E-13 2.20E-07 1.40E-13 1.12E-06 1.73E-09 4.51E-08 5.95E-10 1.32E-09 6.72E-10
Best 1.40E-13 4.49E-11 1.40E-13 6.51E-10 1.40E-13 1.40E-13 1.52E-13 2.54E-13 1.55E-13
Worst 1.59E-13 1.72E-06 1.40E-13 2.24E-05 2.16E-08 6.81E-07 3.96E-09 2.32E-08 5.21E-09
SD 4.41E-15 3.82E-07 2.57E-29 4.15E-06 4.17E-09 1.61E-07 1.03E-09 4.57E-09 1.50E-09
Median 1.41E-13 5.03E-08 1.40E-13 2.28E-08 1.40E-13 1.42E-13 6.08E-13 1.34E-11 3.92E-13
Mean time 2.0267 2.1107 2.0286 1.8804 2.4000 2.0502 0.9473 2.7743 3.7753
Friedman rank 2.8 8.3 1.4 8.1 3.5 4.6 5.2 5.8 5.3
Manual rank 1 5 1 6 1 1 2 4 3
Wilcoxon 1.73E-06 2.60E-06 1.73E-06 3.88E-06 5.30E-01 9.26E-01 8.13E-01 1.92E-01 0.00E+00
Classification rate (%) 100 100 100 100 100 100 100 100 100
Table 8 Experimental results of the 3-bit Parity dataset
3-bit Parity ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 1.49E-13 2.74E-05 1.44E-13 2.00E-05 6.77E-10 1.39E-09 1.13E-09 3.34E-07 3.87E-09
Best 1.32E-13 5.41E-08 1.44E-13 2.43E-09 1.40E-13 1.34E-13 6.04E-13 7.00E-12 4.01E-13
Worst 2.24E-13 0.00017 1.44E-13 2.99E-05 1.41E-09 2.67E-09 1.73E-08 9.13E-06 2.44E-08
SD 2.54E-14 4.03E-05 2.57E-29 1.26E-05 6.99E-10 1.12E-09 3.51E-09 1.66E-06 5.78E-09
Median 1.37E-13 7.79E-06 1.44E-13 2.99E-05 3.58E-10 1.40E-09 3.42E-12 1.43E-08 2.07E-09
Mean time 1.9296 2.3495 2.0153 2.0680 2.5710 2.2769 1.0279 2.7993 4.0024
Friedman rank 1.6 8.5 2.2 8.4 3.6 4.5 4.2 6.6 5.4
Manual rank 1 9 4 8 3 2 6 7 5
Wilcoxon 1.73E-06 1.73E-06 1.73E-06 1.73E-06 4.11E-03 6.27E-02 6.64E-04 3.32E-04 0.00E+00
Classification rate (%) 50.00 62.50 50.00 62.50 50.00 50.00 62.50 62.50 50.00
The experimental results for the 3-bit Parity dataset are
given in Table 8. ABC is the best in terms of mean training
error results. But the classification rate of ABC is 50%. ACO,
ES, PBIL, and PSO have the same classification rate (62.5%).
The best trained model cannot produce the best classification
accuracy.
The experimental results for the 4-bit Encoder Decoder
dataset are given in Table 9. All algorithms are trapped in
the same local minima, and all of them produce the same
classification rates. Thus, the 4-bit Encoder Decoder is not
an identifier problem. This problem has four output values;
therefore, our model cannot be appropriate for solving the
4-bit Encoder Decoder problem.
The experimental results for the 3-bits XOR dataset are
given in Table 10. GA is the best in terms of mean training
Table 9 Experimental results of the 4-bit Encoder Decoder dataset
4-bit Encoder Decoder ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 6.23E-02 6.23E-02 6.23E-02 6.25E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02
Best 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02
Worst 6.23E-02 6.24E-02 6.23E-02 6.37E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02
SD 1.09E-07 2.31E-05 2.82E-17 3.76E-04 1.40E-05 1.60E-07 1.18E-05 3.85E-06 2.32E-06
Median 6.23E-02 6.23E-02 6.23E-02 6.24E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02 6.23E-02
Mean time 2.2708 2.9314 2.4063 2.4090 2.7637 2.7607 1.1166 3.1452 4.7394
Friedman rank 2.8 8.0 1.5 8.5 5.2 2.3 7.3 5.3 4.1
Manual rank 1 1 1 1 1 1 1 1 1
Wilcoxon 3.41E-05 1.73E-06 1.73E-06 1.73E-06 2.41E-03 9.71E-05 1.73E-06 1.04E-03 0.00E+00
Classification rate (%) 25 25 25 25 25 25 25 25 25
Table 10 Experimental results of the 3-bits XOR dataset
3-bits XOR ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 2.97E-17 2.63E-06 3.14E-30 9.28E-10 7.43E-20 7.11E-15 3.84E-12 7.37E-10 1.15E-10
Best 5.93E-20 2.41E-09 3.14E-30 3.40E-15 1.36E-30 3.68E-26 4.83E-19 4.81E-13 1.94E-18
Worst 9.91E-17 2.09E-05 3.14E-30 1.38E-09 2.23E-18 1.35E-13 1.13E-10 1.37E-08 2.75E-09
SD 2.76E-17 4.62E-06 2.14E-45 3.99E-10 4.07E-19 2.78E-14 2.06E-11 2.53E-09 5.06E-10
Median 1.70E-17 5.28E-07 3.14E-30 6.91E-10 1.88E-25 5.64E-22 1.44E-15 4.22E-11 8.68E-14
Mean time 3.1274 4.3180 3.4139 3.4170 3.5980 4.1621 1.3527 3.8864 7.0275
Friedman rank 3.9 9.0 1.1 7.8 2.0 3.4 5.1 7.0 5.7
Manual rank 4 9 2 7 1 3 5 8 6
Wilcoxon 2.35E-06 1.73E-06 1.73E-06 3.41E-05 1.73E-06 5.31E-05 1.48E-02 1.96E-03 0.00E+00
Classification rate (%) 100 100 100 100 100 100 100 100 100
Table 11 Experimental results of the Sigmoid dataset
Sigmoid ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 2.46E-01 2.48E-01 2.46E-01 2.49E-01 2.47E-01 2.47E-01 2.47E-01 2.47E-01 2.47E-01
Best 2.46E-01 2.47E-01 2.46E-01 2.47E-01 2.46E-01 2.46E-01 2.47E-01 2.47E-01 2.47E-01
Worst 2.46E-01 2.51E-01 2.46E-01 2.53E-01 2.49E-01 2.47E-01 2.48E-01 2.48E-01 2.48E-01
SD 2.15E-05 1.13E-03 8.47E-17 1.20E-03 5.79E-04 1.67E-04 3.04E-04 3.26E-04 3.47E-04
Median 2.46E-01 2.48E-01 2.46E-01 2.49E-01 2.47E-01 2.46E-01 2.47E-01 2.47E-01 2.47E-01
Mean time 95.9492 34.0330 35.7778 31.2610 31.3496 35.0565 23.0235 47.3332 39.8589
Friedman rank 1.1 7.9 2.6 8.6 4.7 2.8 5.5 5.8 6.2
Manual rank 1 2 1 2 1 1 2 2 2
Wilcoxon 1.73E-06 2.60E-05 1.73E-06 2.60E-06 5.67E-03 3.18E-06 1.75E-02 2.99E-01 0.00E+00
Classification rate (%) 100.00 94.21 100.00 100.00 100.00 100.00 100.00 100.00 100.00
error results. The classification rate is 100% for all methods.
Therefore, 3-bits XOR is not an identifier problem.
The experimental results for the Sigmoid dataset are given
in Table 11. ABC, BBO, GA, and GWO share the first posi-
tion in terms of mean training error results. The classification
rate is 100% for all methods except ACO. Therefore, Sigmoid
is not an identifier problem.
The experimental results for the Cosine dataset are given
in Table 12. GWO and ABC share the best position in terms
of mean training error results. PSO is the best in terms of the
classification. Cosine is an identifier problem because every
algorithm produces different results.
The experimental results for the Sine dataset are given in
Table 13. ABC is in the best position in terms of mean training
error results. GA is the best in terms of the classification. Sine
Table 12 Experimental results of the Cosine dataset
Cosine ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 1.77E-01 1.86E-01 1.78E-01 1.95E-01 1.79E-01 1.76E-01 1.81E-01 1.82E-01 1.81E-01
Best 1.76E-01 1.79E-01 1.78E-01 1.82E-01 1.77E-01 1.76E-01 1.78E-01 1.78E-01 1.78E-01
Worst 1.77E-01 1.97E-01 1.78E-01 2.13E-01 1.83E-01 1.77E-01 1.85E-01 1.85E-01 1.83E-01
SD 3.48E-04 4.46E-03 5.65E-17 8.32E-03 1.51E-03 4.70E-04 1.82E-03 1.53E-03 1.34E-03
Median 1.76E-01 1.86E-01 1.78E-01 1.95E-01 1.78E-01 1.76E-01 1.81E-01 1.81E-01 1.81E-01
Mean time 88.9853 29.4610 31.5415 26.7391 26.9503 30.9972 18.8338 43.6489 33.0181
Friedman rank 1.7 7.7 3.6 8.7 3.7 1.3 5.7 6.6 5.9
Manual rank 1 4 3 5 2 1 3 3 3
Wilcoxon 1.73E-06 1.24E-05 2.13E-06 1.73E-06 1.92E-06 1.73E-06 9.26E-01 1.48E-02 0.00E+00
Classification rate (%) 97.37 78.95 92.11 76.32 94.74 97.37 92.11 100.00 84.21
Table 13 Experimental results of the Sine dataset
Sine ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 4.13E-01 4.52E-01 4.41E-01 4.49E-01 4.54E-01 4.53E-01 4.35E-01 4.40E-01 4.46E-01
Best 4.00E-01 4.43E-01 4.41E-01 4.30E-01 4.45E-01 4.44E-01 4.20E-01 4.22E-01 4.34E-01
Worst 4.27E-01 4.56E-01 4.41E-01 4.56E-01 4.60E-01 4.56E-01 4.47E-01 4.52E-01 4.53E-01
SD 6.20E-03 4.12E-03 1.13E-16 7.51E-03 3.99E-03 1.86E-03 6.38E-03 7.42E-03 5.31E-03
Median 4.14E-01 4.54E-01 4.41E-01 4.52E-01 4.55E-01 4.53E-01 4.35E-01 4.42E-01 4.48E-01
Mean time 88.4847 36.3617 38.3013 33.9895 34.4315 36.8322 26.4933 48.3780 42.7450
Friedman rank 1.0 7.0 3.6 6.4 8.1 7.2 2.7 3.8 5.1
Manual rank 1 7 6 4 9 8 2 3 5
Wilcoxon 1.73E-06 2.61E-04 1.15E-04 7.52E-02 2.16E-05 2.60E-06 1.64E-05 3.38E-03 0.00E+00
Classification rate (%) 59.52 57.54 56.75 60.32 67.06 56.75 66.67 60.71 57.94
Table 14 Experimental results of the Balloon dataset
Balloon ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 6.22E-17 4.13E-07 2.22E-31 8.22E-10 3.15E-23 7.44E-19 1.65E-13 1.33E-10 8.46E-12
Best 1.39E-20 3.99E-10 2.22E-31 4.68E-14 3.40E-40 1.10E-31 4.10E-21 1.43E-15 2.08E-17
Worst 5.68E-16 4.51E-06 2.22E-31 1.55E-09 9.45E-22 8.07E-18 1.87E-12 1.19E-09 1.73E-10
SD 1.09E-16 8.96E-07 4.45E-47 3.86E-10 1.72E-22 2.13E-18 4.45E-13 2.44E-10 3.25E-11
Median 3.62E-17 8.62E-08 2.22E-31 1.03E-09 9.86E-34 1.05E-22 2.55E-15 3.11E-11 6.73E-14
Mean time 10.3630 6.2161 4.8502 4.9285 4.5307 6.0645 1.9197 4.9773 12.5299
Friedman rank 4.3 8.9 1.8 8.0 1.3 2.9 5.1 6.9 5.8
Manual rank 5 9 3 8 1 2 4 7 6
Wilcoxon 2.35E-06 1.73E-06 1.73E-06 1.92E-06 1.73E-06 1.73E-06 6.04E-03 1.97E-05 0.00E+00
Classification rate (%) 50 0 50 50 50 50 50 50 0
is an identifier problem because every algorithm produces
different results.
The experimental results for the Balloon dataset are given
in Table 14. GA is in the best position in terms of mean train-
ing error results. The best trained model of ACO and TSA
cannot classify the test data. ABC, BBO, ES, GWO, PBIL,
and PSO have a 50% classification rate. Balloon is an iden-
tifier problem because every algorithm produces different
results.
The experimental results for the Iris dataset are given in Table 15. All algorithms except ACO and GA are trapped in the same local minima. ABC is the best in terms of the classification. This problem has three output values; therefore, our model cannot be appropriate for solving the Iris problem.
Table 15 Experimental results of the Iris dataset
Iris ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 2.50E-01 2.84E-01 2.50E-01 2.51E-01 3.52E-01 2.50E-01 2.50E-01 2.51E-01 2.52E-01
Best 2.50E-01 2.54E-01 2.50E-01 2.50E-01 2.54E-01 2.50E-01 2.50E-01 2.50E-01 2.50E-01
Worst 2.51E-01 3.32E-01 2.50E-01 2.54E-01 4.91E-01 2.50E-01 2.51E-01 2.55E-01 2.56E-01
SD 1.64E-04 2.02E-02 1.69E-16 8.61E-04 7.18E-02 2.90E-05 1.43E-04 1.03E-03 1.58E-03
Median 2.50E-01 2.84E-01 2.50E-01 2.50E-01 3.19E-01 2.50E-01 2.50E-01 2.51E-01 2.51E-01
Mean time 255.9866 37.5246 42.3152 33.8377 33.6497 41.2012 20.9516 49.5602 56.7811
Friedman rank 3.5 8.2 1.5 5.0 8.8 1.8 3.6 6.2 6.5
Manual rank 1 2 1 1 2 1 1 1 1
Wilcoxon 5.22E-06 1.92E-06 1.73E-06 4.86E-05 1.73E-06 1.73E-06 2.35E-06 8.97E-02 0.00E+00
Classification rate (%) 20.67 5.33 12.00 8.67 3.33 11.33 10.00 4.67 20.00
Table 16 Experimental results of the Breast Cancer dataset
Breast Cancer ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 9.36E-03 1.10E-02 2.35E-03 3.76E-02 1.82E-03 1.34E-03 2.55E-02 2.76E-02 2.92E-02
Best 3.95E-03 9.28E-03 2.35E-03 3.49E-02 1.18E-03 1.14E-03 1.24E-02 2.11E-02 1.67E-02
Worst 1.56E-02 1.41E-02 2.35E-03 4.07E-02 8.17E-03 1.55E-03 3.28E-02 3.49E-02 3.39E-02
SD 2.89E-03 1.96E-03 8.82E-19 1.59E-03 1.32E-03 1.26E-04 4.92E-03 3.18E-03 3.45E-03
Median 9.23E-03 9.64E-03 2.35E-03 3.74E-02 1.46E-03 1.35E-03 2.62E-02 2.76E-02 2.96E-02
Mean time 592.9885 178.6869 200.6069 181.4763 170.7403 208.1356 145.6965 189.3432 232.0039
Friedman rank 4.3 4.7 2.9 9.0 1.8 1.4 6.8 6.8 7.4
Manual rank 4 5 3 9 2 1 6 8 7
Wilcoxon 1.73E-06 1.73E-06 1.73E-06 1.73E-06 1.73E-06 1.73E-06 7.27E-03 2.18E-02 0.00E+00
Classification rate (%) 0 100 100 95 0 0 52 1 77
Table 17 Experimental results of the Heart dataset
Heart ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 6.49E-25 1.57E-17 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Best 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Worst 1.94E-23 4.70E-16 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
SD 3.55E-24 8.59E-17 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Median 1.12E-33 4.48E-33 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Mean time 562.8594 462.9802 636.0618 466.6571 457.8536 632.8333 225.8350 494.0318 356.5683
Friedman rank 6.7 6.8 4.5 4.5 4.5 4.5 4.5 4.5 4.5
Manual rank 1 1 1 1 1 1 1 1 1
Wilcoxon 6.10E-05 6.10E-05 1.00E+00 1.00E+00 1.00E+00 1.00E+00 1.00E+00 1.00E+00 0.00E+00
Classification rate (%) 100 0 100 100 59.09 100 0 0 100
The experimental results for the Breast Cancer dataset
are given in Table 16. GWO is the best in terms of mean
training error results, but its trained model cannot classify
the test data. ACO and BBO share the best position in terms
of the classification. According to these results, it is seen that breast cancer is a challenging problem.
The experimental results for the Heart dataset are given in
Table 17. All algorithms achieved zero error, but ACO,
PBIL, and PSO cannot classify the test data. The classifica-
tion rate of GA is 59.09%. ABC, BBO, ES, GWO, and TSA
classify the test data successfully.
Table 18 The manual ranks overview
Manual ranks ABC ACO BBO ES GA GWO PBIL PSO TSA
XOR6 2 3 4 2 3 1 3 1 1
XOR9 1 4 1 3 1 1 1 2 2
XOR13 1 5 1 6 1 1 2 4 3
3-bit Parity 1 9 4 8 3 2 6 7 5
4-bit Enc. Dec 1 1 1 1 1 1 1 1 1
3-bits XOR 4 9 2 7 1 3 5 8 6
Sigmoid 1 2 1 2 1 1 2 2 2
Cosine 1 4 3 5 2 1 3 3 3
Sine 1 7 6 4 9 8 2 3 5
Balloon 5 9 3 8 1 2 4 7 6
Iris 1 2 1 1 1 1 1 1 1
Breast cancer 4 5 3 9 2 1 6 8 7
Heart 1 1 1 1 1 1 1 1 1
Total ranks 24 61 31 57 27 24 37 48 43
Table 19 The Friedman ranks overview
Friedman ranks ABC ACO BBO ES GA GWO PBIL PSO TSA
XOR6 8.4 7.8 6.6 5.8 5.5 1.9 4.7 2.3 1.9
XOR9 3.9 8.4 1.7 8.2 1.7 4.6 4.0 6.8 5.6
XOR13 2.8 8.3 1.4 8.1 3.5 4.6 5.2 5.8 5.3
3-bit Parity 1.6 8.5 2.2 8.4 3.6 4.5 4.2 6.6 5.4
4-bit Enc. Dec 2.8 8.0 1.5 8.5 5.2 2.3 7.3 5.3 4.1
3-bits XOR 3.9 9.0 1.1 7.8 2.0 3.4 5.1 7.0 5.7
Sigmoid 1.1 7.9 2.6 8.6 4.7 2.8 5.5 5.8 6.2
Cosine 1.7 7.7 3.6 8.7 3.7 1.3 5.7 6.6 5.9
Sine 1.0 7.0 3.6 6.4 8.1 7.2 2.7 3.8 5.1
Balloon 4.3 8.9 1.8 8.0 1.3 2.9 5.1 6.9 5.8
Iris 3.3 8.6 1.0 8.1 4.0 2.0 7.1 5.9 5.0
Breast cancer 4.3 4.7 2.9 9.0 1.8 1.4 6.8 6.8 7.4
Heart 6.7 6.8 4.5 4.5 4.5 4.5 4.5 4.5 4.5
Total ranks 45.9 101.6 34.4 100.1 49.6 43.5 68.0 74.0 68.1
General rank 3 9 1 8 4 2 5 7 6
Table 20 The classification rates overview
Dataset ABC ACO BBO ES GA GWO PBIL PSO TSA
XOR6 100 100 100 100 100 100 100 100 100
XOR9 100 100 100 100 100 100 100 100 100
XOR13 100 100 100 100 100 100 100 100 100
3-bit Parity 50.00 62.50 50.00 62.50 50.00 50.00 62.50 62.50 50.00
4-bit Enc. Dec 25 25 25 25 25 25 25 25 25
3-bits XOR 100 100 100 100 100 100 100 100 100
Sigmoid 100.00 94.21 100.00 100.00 100.00 100.00 100.00 100.00 100.00
Cosine 97.37 78.95 92.11 76.32 94.74 97.37 92.11 100.00 84.21
Sine 59.52 57.54 56.75 60.32 67.06 56.75 66.67 60.71 57.94
Balloon 50 0 50 50 50 50 50 50 0
Iris 20.67 5.33 12.00 8.67 3.33 11.33 10.00 4.67 20.00
Breast cancer 0 100 100 95 0 0 52 1 77
Heart 100 0 100 100 59.0909 100 0 0 100
Mean CR 69.4 63.3 75.8 75.2 65.3 68.5 66.0 61.8 70.3
Table 21 Experimental results of the Balloon dataset for TSA
Balloon N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Mean 4.08E-10 1.26E-10 7.19E-11 5.27E-09 5.17E-09 5.97E-09 2.67E-08 1.71E-08 2.62E-08
Best 9.43E-19 9.81E-21 4.95E-20 1.51E-16 7.38E-14 2.79E-15 3.75E-14 1.01E-14 5.59E-14
Worst 6.51E-09 3.23E-09 1.95E-09 7.56E-08 3.41E-08 1.57E-07 3.29E-07 2.01E-07 1.99E-07
SD 1.47E-09 5.91E-10 3.56E-10 1.44E-08 1.01E-08 2.86E-08 6.56E-08 4.31E-08 5.94E-08
Median 8.42E-14 4.81E-15 8.00E-14 4.06E-10 6.68E-10 5.13E-11 2.41E-09 4.89E-10 9.13E-10
Mean time 8.0720 8.0343 8.7865 5.0810 4.9332 4.7059 4.6307 4.5278 4.2975
Friedman rank 2.8 2.4 2.5 5.9 6.2 5.1 7.3 6.5 6.3
Manual rank 3 1 2 4 9 5 7 6 8
Wilcoxon 0.00E+00 2.54E-01 4.05E-01 8.19E-05 1.80E-05 6.84E-03 9.32E-06 1.24E-05 1.74E-04
Classification rate (%) 50.00 50.00 50.00 50.00 50.00 50.00 50.00 0.00 50.00
Table 22 Experimental results of the Iris dataset for TSA
Iris N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Mean 2.51E-01 2.51E-01 2.51E-01 2.55E-01 2.54E-01 2.53E-01 2.54E-01 2.53E-01 2.53E-01
Best 2.50E-01 2.50E-01 2.50E-01 2.51E-01 2.50E-01 2.50E-01 2.50E-01 2.50E-01 2.50E-01
Worst 2.54E-01 2.56E-01 2.55E-01 2.69E-01 2.69E-01 2.66E-01 2.76E-01 2.64E-01 2.58E-01
SD 8.92E-04 1.15E-03 1.23E-03 5.24E-03 3.94E-03 3.42E-03 4.71E-03 3.30E-03 2.25E-03
Median 2.50E-01 2.50E-01 2.50E-01 2.52E-01 2.52E-01 2.52E-01 2.53E-01 2.52E-01 2.52E-01
Mean time 122.8270 122.7315 107.2986 42.7832 41.5848 41.1891 31.9500 31.3707 30.3779
Friedman rank 2.6 2.8 3.0 6.5 5.9 6.1 6.3 5.8 5.9
Manual rank 1 1 1 2 1 1 1 1 1
Wilcoxon 0.00E+00 6.58E-01 7.19E-01 1.97E-05 2.16E-05 3.72E-05 4.29E-06 4.86E-05 1.64E-05
Classification rate (%) 16.67 27.33 9.33 30.00 20.67 7.33 14.00 30.67 4.00
Table 23 Experimental results of the Cancer dataset for TSA
Cancer N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Mean 2.55E-02 2.49E-02 2.54E-02 3.05E-02 2.89E-02 2.93E-02 3.03E-02 3.08E-02 3.00E-02
Best 1.62E-02 1.50E-02 1.25E-02 2.44E-02 1.51E-02 2.43E-02 2.24E-02 2.17E-02 2.19E-02
Worst 3.39E-02 3.44E-02 3.31E-02 3.47E-02 3.41E-02 3.44E-02 3.59E-02 3.62E-02 3.66E-02
SD 4.25E-03 4.32E-03 5.16E-03 2.63E-03 4.34E-03 3.02E-03 3.64E-03 3.61E-03 3.69E-03
Median 2.58E-02 2.54E-02 2.60E-02 3.04E-02 2.98E-02 2.91E-02 3.08E-02 3.21E-02 3.06E-02
Mean time 442.0449 337.4356 362.2962 231.0281 253.7927 311.0903 217.8359 213.5434 227.1923
Friedman rank 3.3 2.9 3.7 6.4 5.5 5.3 5.9 6.1 6.0
Manual rank 4 2 1 9 3 8 7 5 6
Wilcoxon 0.00E+00 5.58E-01 8.61E-01 3.41E-05 1.71E-03 1.38E-03 1.15E-04 1.15E-04 3.06E-04
Classification rate (%) 77.00 74.00 0.00 89.00 77.00 81.00 76.00 77.00 87.00
Table 24 Experimental results of the 3-bit parity dataset for TSA
3-bit Parity N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Mean 1.70E-09 1.99E-06 2.06E-08 4.60E-06 1.66E-06 8.88E-07 1.39E-05 8.09E-06 6.14E-06
Best 6.41E-13 8.06E-13 8.19E-13 1.17E-11 1.54E-11 7.66E-11 5.49E-11 2.68E-09 4.83E-12
Worst 5.34E-09 5.95E-05 5.28E-07 5.17E-05 1.44E-05 1.89E-05 7.07E-05 8.19E-05 5.04E-05
SD 1.85E-09 1.09E-05 9.60E-08 1.15E-05 3.29E-06 3.54E-06 2.18E-05 1.85E-05 1.29E-05
Median 1.26E-09 2.52E-09 2.56E-09 2.28E-08 8.53E-08 1.66E-08 1.76E-06 4.26E-07 4.15E-07
Mean time 1.8920 1.8433 1.8087 1.2872 1.2497 1.1923 1.2163 1.2349 1.1735
Friedman rank 2.2 2.8 2.8 5.5 5.5 5.2 7.2 7.0 6.9
Manual rank 1 2 3 5 6 8 7 9 4
Wilcoxon 0.00E+00 1.59E-01 5.71E-02 4.29E-06 6.34E-06 6.98E-06 2.13E-06 1.73E-06 2.13E-06
Classification rate (%) 50.00 75.00 50.00 75.00 50.00 50.00 50.00 62.50 75.00
Table 25 Experimental results of the 4-bit Encoder Decoder dataset for TSA
4-bit Encoder Decoder N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Mean 2.06E-02 2.05E-02 2.06E-02 2.05E-02 2.06E-02 2.05E-02 2.07E-02 2.06E-02 2.06E-02
Best 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02
Worst 2.23E-02 2.11E-02 2.17E-02 2.09E-02 2.18E-02 2.06E-02 2.20E-02 2.15E-02 2.10E-02
SD 3.51E-04 1.11E-04 2.65E-04 8.37E-05 2.64E-04 1.42E-05 3.22E-04 1.85E-04 1.16E-04
Median 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02 2.05E-02
Mean time 2.5172 2.3700 2.3480 1.8992 1.8609 1.7707 1.8014 1.7484 1.7475
Friedman rank 3.6 3.1 3.4 5.4 5.2 5.1 6.9 6.3 6.0
Manual rank 1 1 1 1 1 1 1 1 1
Wilcoxon 0.00E+00 1.85E-01 6.88E-01 3.00E-02 3.68E-02 3.16E-02 8.31E-04 3.61E-03 9.84E-03
Classification rate (%) 25.00 25.00 25.00 25.00 25.00 25.00 25.00 25.00 25.00
For the overall analysis, the manual ranks overview is given in Table 18, the Friedman ranks overview is given in Table 19, and the classification rates overview is given in Table 20. ABC and GWO have the same total rank value in terms of mean training error results. BBO is the best in terms of Friedman rank and classification rank values. Every dataset has a different type of search space, and an algorithm that solves one problem well cannot solve every other problem. This situation is proved by Wolpert and Macready [27], and it is known as the no free lunch theorems for optimization. The mean classification rate of TSA is 70.3%, which is a comparable result. In these experiments, we use fixed stand sizes (10 and 50) and a fixed ST (0.1) for TSA. These two peculiar parameters affect the results of the algorithm. In the next experiment, we analyze different stand sizes and ST values for the identifier datasets (Balloon, Iris, Cancer, Parity, EncDec, Cosine, and Sine).
5.1 The Parameter Adjustment for TSA
In this section, we adjust the peculiar parameters of TSA for the 7 identifier datasets (Balloon, Iris, Cancer, Parity, EncDec, Cosine, and Sine). In the experiments, we use 10, 50, and 100 as stand sizes and 0.1, 0.5, and 0.9 as ST values. The base method for the Wilcoxon signed rank test is N = 10 and ST = 0.1.
The experimental results of the Balloon dataset for the TSA are given in Table 21. The N = 10 and ST = 0.5 variant is the best in terms of mean training error results. All variants except N = 100 and ST = 0.5 have the same classification rate (50%).
Table 26 Experimental results of the Cosine dataset for TSA
Cosine N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Mean 1.80E-01 1.80E-01 1.81E-01 1.82E-01 1.82E-01 1.82E-01 1.82E-01 1.82E-01 1.82E-01
Best 1.77E-01 1.77E-01 1.78E-01 1.78E-01 1.78E-01 1.78E-01 1.78E-01 1.77E-01 1.79E-01
Worst 1.83E-01 1.83E-01 1.86E-01 1.89E-01 1.85E-01 1.86E-01 1.87E-01 1.87E-01 1.85E-01
SD 1.56E-03 1.64E-03 1.88E-03 2.13E-03 1.73E-03 1.98E-03 2.46E-03 2.12E-03 1.69E-03
Median 1.80E-01 1.80E-01 1.81E-01 1.82E-01 1.82E-01 1.82E-01 1.82E-01 1.82E-01 1.82E-01
Mean time 73.7480 66.3652 60.0203 36.9796 55.8863 50.7422 27.1082 27.2170 25.9935
Friedman rank 3.7 4.0 4.2 5.0 5.7 5.6 5.9 5.2 5.5
Manual rank 1 1 2 2 2 2 2 1 3
Wilcoxon 0.00E+00 7.81E-01 3.39E-01 1.04E-02 1.20E-03 2.96E-03 3.32E-04 2.30E-02 5.32E-03
Classification rate (%) 36.84 36.84 100.00 36.84 36.84 100.00 44.74 100.00 36.84
Table 27 Experimental results of the Sine dataset for TSA
Sine N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Mean 4.44E-01 4.43E-01 4.40E-01 4.44E-01 4.44E-01 4.44E-01 4.46E-01 4.45E-01 4.43E-01
Best 4.29E-01 4.23E-01 4.23E-01 4.17E-01 4.27E-01 4.29E-01 4.33E-01 4.29E-01 4.24E-01
Worst 4.54E-01 4.53E-01 4.54E-01 4.54E-01 4.53E-01 4.53E-01 4.53E-01 4.54E-01 4.54E-01
SD 6.67E-03 7.45E-03 8.59E-03 8.24E-03 6.32E-03 6.96E-03 6.23E-03 6.60E-03 8.10E-03
Median 4.46E-01 4.45E-01 4.42E-01 4.47E-01 4.44E-01 4.46E-01 4.48E-01 4.47E-01 4.43E-01
Mean time 56.8857 56.5048 55.8578 37.4354 54.5648 52.6769 52.1033 48.5141 38.4682
Friedman rank 5.3 4.6 3.9 5.4 4.9 5.1 5.5 5.4 4.9
Manual rank 5 2 2 1 4 5 6 5 3
Wilcoxon 0.00E+00 4.17E-01 7.52E-02 9.26E-01 8.13E-01 9.43E-01 2.21E-01 4.41E-01 6.00E-01
Classification rate (%) 100.00 52.78 51.98 55.95 100.00 51.98 100.00 89.68 51.98
Table 28 The Friedman ranks overview for the TSA variants
Dataset N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Balloon 2.8 2.4 2.5 5.9 6.2 5.1 7.3 6.5 6.3
Iris 2.6 2.8 3.0 6.5 5.9 6.1 6.3 5.8 5.9
Cancer 3.3 2.9 3.7 6.4 5.5 5.3 5.9 6.1 6.0
Parity 2.2 2.8 2.8 5.5 5.5 5.2 7.2 7.0 6.9
EncDec 3.6 3.1 3.4 5.4 5.2 5.1 6.9 6.3 6.0
Cosine 3.7 4.0 4.2 5.0 5.7 5.6 5.9 5.2 5.5
Sine 5.3 4.6 3.9 5.4 4.9 5.1 5.5 5.4 4.9
Total FR 23.5 22.6 23.5 40.2 38.9 37.5 45.0 42.4 41.4
Table 29 The classification rates overview for the TSA variants
Dataset N=10 ST=0.1 N=10 ST=0.5 N=10 ST=0.9 N=50 ST=0.1 N=50 ST=0.5 N=50 ST=0.9 N=100 ST=0.1 N=100 ST=0.5 N=100 ST=0.9
Balloon 50.00 50.00 50.00 50.00 50.00 50.00 50.00 0.00 50.00
Iris 16.67 27.33 9.33 30.00 20.67 7.33 14.00 30.67 4.00
Cancer 77.00 74.00 0.00 89.00 77.00 81.00 76.00 77.00 87.00
Parity 50.00 75.00 50.00 75.00 50.00 50.00 50.00 62.50 75.00
EncDec 25.00 25.00 25.00 25.00 25.00 25.00 25.00 25.00 25.00
Cosine 36.84 36.84 100.00 36.84 36.84 100.00 44.74 100.00 36.84
Sine 100.00 52.78 51.98 55.95 100.00 51.98 100.00 89.68 51.98
Mean CR 50.8 48.7 40.9 51.7 51.4 52.2 51.4 55.0 47.1
Table 30 The classification rates overview
Dataset ABC ACO BBO ES GA GWO PBIL PSO TSA
XOR6 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
XOR9 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
XOR13 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
3-bit Parity 50.00 62.50 50.00 62.50 50.00 50.00 62.50 62.50 62.50
4-bit Enc. Dec 25.00 25.00 25.00 25.00 25.00 25.00 25.00 25.00 25.00
3-bits XOR 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
Sigmoid 100.00 94.21 100.00 100.00 100.00 100.00 100.00 100.00 100.00
Cosine 97.37 78.95 92.11 76.32 94.74 97.37 92.11 100.00 100.00
Sine 59.52 57.54 56.75 60.32 67.06 56.75 66.67 60.71 89.68
Balloon 50.00 0.00 50.00 50.00 50.00 50.00 50.00 50.00 0.00
Iris 20.67 5.33 12.00 8.67 3.33 11.33 10.00 4.67 30.67
Breast cancer 0.00 100.00 100.00 95.00 0.00 0.00 52.00 1.00 77.00
Heart 100.00 0.00 100.00 100.00 59.09 100.00 0.00 0.00 100.00
Banknote 66.69 62.76 55.54 60.28 55.54 55.54 55.54 61.95 55.54
Diabetic 51.26 58.12 47.87 51.69 51.26 43.96 54.04 54.74 52.13
Twonorm 84.58 73.61 86.16 85.95 95.95 92.95 89.85 83.64 88.59
Ringnorm 63.24 55.04 60.69 64.18 78.19 69.15 67.64 66.59 63.08
Spambase 42.19 43.06 40.62 42.84 42.84 39.40 41.88 42.21 40.69
Mean CR 67.25 62.01 70.93 71.26 65.17 66.19 64.85 61.83 71.38
The experimental results of the Iris dataset for the TSA are given in Table 22. All variants except the N = 50 and ST = 0.1 variant are trapped in the same local minima in terms of mean training error results. The N = 100 and ST = 0.5 variant is the best in terms of classification rates.
The experimental results of the Cancer dataset for the TSA are given in Table 23. The N = 10 and ST = 0.9 variant is the best in terms of mean training error results. The N = 50 and ST = 0.1 variant is the best in terms of classification rates.
The experimental results of the 3-bit Parity dataset for the TSA are given in Table 24. The N = 10 and ST = 0.1 variant is the best in terms of mean training error results. The N = 10 and ST = 0.5, N = 50 and ST = 0.1, and N = 100 and ST = 0.9 variants share the best position in terms of classification rates.
The experimental results of the 4-bit Encoder Decoder dataset for the TSA are given in Table 25. All variants are trapped in the same local minima, and all of them produce the same classification rates.
The experimental results of the Cosine dataset for the TSA are given in Table 26. The N = 10 and ST = 0.1, N = 10 and ST = 0.5, and N = 100 and ST = 0.5 variants share the best position in terms of mean training error results. The N = 10 and ST = 0.9, N = 50 and ST = 0.9, and N = 100 and ST = 0.5 variants share the best position in terms of classification rates.
The experimental results of the Sine dataset for the TSA are given in Table 27. The N = 50 and ST = 0.1 variant is the best in terms of mean training error results. The N = 10 and ST = 0.1, N = 50 and ST = 0.5, and N = 100 and ST = 0.1 variants share the best position in terms of classification rates.
Table 31 The classification rates of the Cancer dataset for the TSA variants
Run No N10 ST 0.1 N10 ST 0.5 N10 ST 0.9 N50 ST 0.1 N50 ST 0.5 N50 ST 0.9 N100 ST 0.1 N100 ST 0.5 N100 ST 0.9
1 79 91 90 87 90 88 81 88 53
2 74 74 91 85 92 82 83 87 57
3 57 80 74 88 61 86 89 83 25
4 84 81 66 33 86 94 87 77 83
5 79 89 83 88 79 90 84 87 82
6 82 65 83 74 84 95 90 84 84
7 88 36 69 77 83 83 79 48 77
8 84 78 83 92 85 83 84 79 82
9 84 84 90 82 84 81 90 90 83
10 84 88 91 79 76 81 79 96 88
11 94 77 85 89 82 85 80 88 89
12 82 79 59 84 83 82 87 87 87
13 95 89 41 84 85 86 88 84 42
14 87 77 86 92 90 81 85 92 83
15 86 76 73 92 86 78 84 88 83
16 86 78 52 83 88 84 86 82 64
17 56 88 47 82 77 86 85 78 81
18 84 78 87 89 83 88 25 75 82
19 78 81 82 91 42 82 89 77 39
20 78 80 34 85 83 84 84 70 90
21 93 85 80 76 87 92 93 81 81
22 87 86 85 45 85 85 56 81 89
23 83 80 37 86 76 82 13 81 82
24 77 80 69 89 90 91 76 87 83
25 78 82 81 83 79 73 88 79 45
26 67 26 78 87 74 91 86 84 84
27 86 81 0 74 90 81 92 53 80
28 79 83 87 83 80 85 89 85 85
29 67 87 83 81 61 70 87 86 85
30 88 61 84 88 84 90 83 63 63
Mean CR 80.87 77.33 71.67 81.60 80.83 84.63 80.07 80.67 74.37
Rank 3 7 9 2 4 1 6 5 8
Max CR 95 91 91 92 92 95 93 96 90
Min CR 56 26 0 33 42 70 13 48 25
For the overall analysis of the TSA variants, the Friedman ranks overview is given in Table 28 and the classification rates overview is given in Table 29. The N10, ST 0.5 variant is the best in terms of mean Friedman rank, and the N100, ST 0.5 variant is the best in terms of classification rate. The combined classification rates of all algorithms are given in Table 30.
According to the classification rates in Table 30, TSA is the best classifier in this experimental setting: it is the best solver on the 18 datasets of different types in terms of mean classification rate. The second is ES, the third is BBO, the fourth is ABC, the fifth is GWO, the sixth is GA, the seventh is PBIL, the eighth is ACO, and the last is PSO.
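As an aside, mean Friedman-style ranks such as those in Table 28 and in the rank rows of Tables 32–36 can be reproduced from the raw per-run errors. The following is a minimal sketch under the assumption that a runs-by-algorithms error matrix is available; it uses SciPy's rankdata, and lower rank means better:

```python
import numpy as np
from scipy.stats import rankdata

def mean_friedman_ranks(errors):
    """errors: array of shape (n_runs, n_algorithms) holding the
    training error of each algorithm in each run. Every run (row)
    is ranked separately, ties receive the average rank, and the
    per-run ranks are then averaged per algorithm."""
    ranks = np.apply_along_axis(rankdata, 1, errors)
    return ranks.mean(axis=0)

# Toy example with 3 runs of 3 algorithms:
errors = np.array([[0.10, 0.20, 0.15],
                   [0.05, 0.30, 0.05],
                   [0.20, 0.10, 0.30]])
print(mean_friedman_ranks(errors))  # [1.5, 2.33..., 2.17...]
```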
5.2 The Deep Run Analyses for TSA
In the aforementioned experiments, we used the best-trained model to classify the test data. A closer look shows that the best-trained model is not always the best in the test phase. The classification rates of the Cancer dataset for the TSA variants are given in Table 31.
The maximum classification rate is 96%, obtained with N100 and ST 0.5. According to these results, for better classification we should not use only the best-trained model; the models obtained from the other runs should also be examined. The convergence graph for the Cancer dataset is given in Fig. 4.

Fig. 4 The convergence graph for the Cancer dataset
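A minimal sketch of this deep-run inspection is given below; train and accuracy are hypothetical helpers standing in for one training run of the optimizer and for the test-set evaluation, respectively:

```python
def deep_run_analysis(train, accuracy, n_runs=30):
    """train(seed) -> (model, training_error); accuracy(model) -> test
    classification rate. Collects all runs and contrasts the model
    chosen by training error with the model chosen by test accuracy."""
    results = []
    for seed in range(n_runs):
        model, train_err = train(seed)
        results.append((train_err, accuracy(model), model))
    best_by_train = min(results, key=lambda r: r[0])
    best_by_test = max(results, key=lambda r: r[1])
    # The two selections often differ: the lowest training error
    # does not guarantee the highest classification rate.
    return best_by_train, best_by_test
```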
According to Fig. 4, TSA achieves the best mean squared error in the training phase for the Cancer dataset. ES, PSO, PBIL, and ACO are trapped in local minima early, whereas ABC, BBO, GA, GWO, and TSA continue the search process until the termination criterion is met. This graph proves the success of TSA once again.
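Convergence curves such as Fig. 4 can be reproduced by logging the best training MSE at every iteration of each optimizer. A minimal matplotlib sketch, assuming a history dictionary of such logs, is:

```python
import matplotlib.pyplot as plt

def plot_convergence(history, title="Cancer dataset"):
    """history: dict mapping an algorithm name to the list of best
    training MSE values recorded at each iteration of one run."""
    for name, curve in sorted(history.items()):
        plt.plot(curve, label=name)
    plt.yscale("log")            # errors differ by orders of magnitude
    plt.xlabel("Iteration")
    plt.ylabel("Best training MSE")
    plt.title(title)
    plt.legend()
    plt.show()
```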
In this type of study, there are two main kinds of limitation: general and method-specific. The general limitations apply to all of the compared methods: determining the population size, trapping into local optima, and setting an effective exploration-exploitation ratio. The specific limitations of TSA concern its peculiar parameters: the search tendency (ST) and the number of seeds (NS). The search tendency controls the scheme used to create new candidate solutions, and the number of seeds controls the exploitation around the current solutions. In this work, we analyzed the search tendency parameter, and the experimental results showed that 0.5 is a good value for it. This means that the two candidate-solution creation equations are used half-and-half.
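For concreteness, a sketch of the seed-creation step that ST controls is given below. It follows the dimension-wise update rules reported for the original TSA (Kiran, 2015), so it is an illustration under that assumption rather than the exact implementation used here:

```python
import numpy as np

def generate_seed(trees, i, best, st, low, high):
    """Create one seed for tree i. trees: (n_trees, dim) stand,
    best: (dim,) best tree so far, st: search tendency in (0, 1)."""
    n_trees, dim = trees.shape
    r = np.random.randint(n_trees - 1)
    r = r + 1 if r >= i else r                  # random tree other than i
    alpha = np.random.uniform(-1.0, 1.0, dim)   # scaling factors in [-1, 1]
    use_best = np.random.rand(dim) < st         # per-dimension ST decision
    seed = np.where(use_best,
                    trees[i] + alpha * (best - trees[r]),      # exploitation
                    trees[i] + alpha * (trees[i] - trees[r]))  # exploration
    return np.clip(seed, low, high)             # respect the search bounds
```

With ST = 0.5 the two branches are taken about equally often, which is exactly the half-and-half usage noted above.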
5.3 The Experiments on Large Datasets
In this section, the experiments on five large datasets are given. The experimental results of the Banknote, Diabetic, Twonorm, Ringnorm, and Spambase datasets are presented in Tables 32, 33, 34, 35, and 36, respectively.
The experimental results for the Banknote dataset are given in Table 32. BBO achieves the best mean training error, and ABC produces the maximum classification rate.
The experimental results for the Diabetic dataset are given in Table 33. GA attains the lowest single-run training error, and ACO produces the maximum classification rate.
The experimental results for the Twonorm dataset are given in Table 34. GA attains the lowest single-run training error and also produces the maximum classification rate.
The experimental results for the Ringnorm dataset are given in Table 35. GA again attains the lowest single-run training error and produces the maximum classification rate.
The experimental results for the Spambase dataset are given in Table 36. All algorithms reach the optimum mean training error, and ACO produces the maximum classification rate.
Table 32 Experimental results of the Banknote dataset
Banknote ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 4.39E−20 7.84E−24 1.38E−87 1.02E−53 1.88E−79 1.31E−71 7.72E−69 1.95E−41 6.79E−55
Best 4.81E−34 5.00E−36 1.38E−87 2.37E−70 1.38E−87 1.51E−87 2.95E−87 1.80E−51 1.61E−74
Worst 6.43E−19 1.71E−22 1.38E−87 4.12E−53 4.96E−78 2.04E−70 2.30E−67 5.28E−40 2.04E−53
SD 1.55E−19 3.22E−23 9.08E−103 1.49E−53 9.10E−79 4.79E−71 4.19E−68 9.65E−41 3.72E−54
Median 1.35E−22 2.04E−28 1.38E−87 8.53E−61 1.65E−87 2.43E−79 9.67E−78 1.46E−45 6.19E−66
Mean time 91.9752 5.7195 4.7103 4.7717 4.5926 5.5933 2.8472 4.8154 42.1367
Friedman rank 8.9 8.1 1.0 5.9 2.1 3.1 3.8 7.0 5.1
Manual rank 8 7 1 5 1 2 3 6 4
Wilcoxon 1.73E−06 1.73E−06 1.73E−06 0.001287 1.73E−06 2.13E−06 1.73E−06 1.73E−06 0
Classification rate (%) 66.69 62.76 55.54 60.28 55.54 55.54 55.54 61.95 55.54
Table 33 Experimental results of the Diabetic dataset
Diabetic ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 6.81E−02 2.08E−01 2.14E−05 1.52E−01 3.44E−02 7.82E−02 1.39E−01 1.42E−01 1.43E−01
Best 4.07E−03 1.58E−01 2.14E−05 1.21E−01 6.77E−06 3.67E−02 1.06E−01 9.14E−02 6.54E−02
Worst 1.50E−01 2.51E−01 2.14E−05 1.84E−01 1.00E−01 1.31E−01 1.87E−01 1.72E−01 1.76E−01
SD 3.01E−02 2.48E−02 6.89E−21 1.61E−02 2.95E−02 2.20E−02 1.87E−02 2.38E−02 2.29E−02
Median 6.24E−02 2.11E−01 2.14E−05 1.58E−01 3.26E−02 7.77E−02 1.41E−01 1.46E−01 1.44E−01
Mean time 298.6501 57.9424 47.1950 47.5073 37.6225 61.5118 17.7858 38.4946 195.2061
Friedman rank 3.3 8.8 1.1 7.1 2.3 3.5 6.0 6.5 6.4
Manual rank 3 9 2 8 1 4 7 6 5
Wilcoxon 2.35E−06 1.92E−06 1.73E−06 0.135908 1.73E−06 1.92E−06 0.158855 0.975387 0
Classification rate (%) 51.26 58.12 47.87 51.69 51.26 43.96 54.04 54.74 52.13
Table 34 Experimental results of the Twonorm dataset
Twonorm ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 1.43E−03 1.27E−01 9.37E−107 3.78E−04 6.10E−59 5.06E−54 3.19E−14 1.50E−06 4.01E−08
Best 1.99E−17 4.18E−02 9.37E−107 1.29E−20 1.10E−139 1.63E−94 4.83E−31 9.20E−18 1.28E−23
Worst 4.30E−02 2.01E−01 9.37E−107 7.26E−03 1.83E−57 1.49E−52 9.34E−13 1.43E−05 1.13E−06
SD 7.85E−03 3.93E−02 2.46E−122 1.48E−03 3.34E−58 2.72E−53 1.70E−13 4.00E−06 2.06E−07
Median 9.37E−17 1.22E−01 9.37E−107 2.87E−08 1.48E−107 4.42E−69 1.18E−21 4.17E−09 3.32E−14
Mean time 2112.1370 97.5102 87.5008 87.7857 77.1834 102.8828 55.6546 77.8253 927.3152
Friedman rank 5.5 9.0 1.5 7.3 1.6 2.9 4.2 7.2 5.7
Manual rank 8 9 2 6 1 3 4 7 5
Wilcoxon 0.007731 1.73E−06 1.73E−06 0.000529 1.73E−06 1.73E−06 6.32E−05 9.32E−06 0
Classification rate (%) 84.58 73.61 86.16 85.95 95.95 92.95 89.85 83.64 88.59
Table 35 Experimental results of the Ringnorm dataset
Ringnorm ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 2.13E−02 1.69E−01 6.51E−28 1.03E−01 8.88E−10 5.60E−12 3.18E−02 7.47E−02 5.43E−02
Best 6.21E−15 1.00E−01 6.51E−28 5.52E−02 1.44E−32 6.81E−24 3.46E−07 3.05E−02 2.61E−03
Worst 5.02E−02 2.26E−01 6.51E−28 1.41E−01 2.66E−08 1.48E−10 1.06E−01 1.08E−01 1.12E−01
SD 2.43E−02 3.20E−02 9.12E−44 2.33E−02 4.86E−09 2.72E−11 2.47E−02 2.23E−02 2.72E−02
Median 1.32E−03 1.68E−01 6.51E−28 1.06E−01 3.39E−18 6.39E−17 3.45E−02 7.82E−02 5.05E−02
Mean time 2115.7338 97.4103 87.4961 87.9626 76.6757 102.8440 55.4910 77.9352 928.1096
Friedman rank 4.4 9.0 1.1 7.7 2.3 2.7 5.0 6.9 5.9
Manual rank 5 3 6 1 7 9 8 4 2
Wilcoxon 4.07E−05 1.73E−06 1.73E−06 3.18E−06 1.73E−06 1.73E−06 0.003854 0.002255 0
Classification rate (%) 63.24 55.04 60.69 64.18 78.19 69.15 67.64 66.59 63.08
Table 36 Experimental results of the Spambase dataset
Spambase ABC ACO BBO ES GA GWO PBIL PSO TSA
Mean 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Best 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Worst 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
SD 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Median 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Mean time 2751.0268 468.9487 418.0443 428.8673 344.7158 538.8551 154.8249 341.6761 1754.3480
Friedman rank 1 1 1 1 1 1 1 1 1
Manual rank 1 1 1 1 1 1 1 1 1
Wilcoxon 1 1 1 1 1 1 1 1 0
Classification rate (%) 42.19 43.06 40.62 42.84 42.84 39.40 41.88 42.21 40.69
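The Wilcoxon rows of Tables 32–36 report pairwise signed-rank p-values of each opponent against TSA over the 30 runs (the 0 in the TSA column marks the reference algorithm). A minimal SciPy sketch, assuming the per-run training errors are available, is:

```python
from scipy.stats import wilcoxon

def wilcoxon_vs_reference(errors_by_alg, ref="TSA"):
    """errors_by_alg: dict mapping algorithm name to its list of
    per-run training errors. Returns a two-sided p-value for every
    opponent compared with the reference algorithm."""
    ref_errors = errors_by_alg[ref]
    pvalues = {}
    for name, errs in errors_by_alg.items():
        if name == ref:
            continue
        try:
            _, p = wilcoxon(errs, ref_errors)
        except ValueError:
            p = 1.0  # all paired differences are zero, as for Spambase
        pvalues[name] = p
    return pvalues
```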
6 Conclusion

In this paper, an FF MLP ANN is trained by TSA for the first time. TSA is a population-based swarm intelligence algorithm with two peculiar parameters, ST and NS. ST controls the exploration and exploitation progress of the algorithm, and NS provides better intensification around the current solutions. The FF MLP ANN is converted to a vector, and TSA optimizes this vector. Eighteen different datasets (XOR6, XOR9, XOR13, 3-bit Parity, 4-bit Encoder Decoder, 3-bits XOR, Sigmoid, Cosine, Sine, Balloon, Iris, Breast Cancer, Heart, Banknote, Diabetic, Twonorm, Ringnorm, and Spambase) are used in the experiments. TSA is compared with PSO, GWO, GA, ACO, ES, PBIL, ABC, and BBO. The experimental results show that TSA is the best in terms of mean classification rates and outperforms its opponents on the 18 problems. The obtained results are confirmed by two statistical tests, the Wilcoxon signed-rank test and the Friedman test. Generally speaking, swarm-based methods suffer from low exploration, but TSA has an efficient exploration mechanism. In future studies, improved versions of TSA could be used for training FF MLP ANNs.
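The vector encoding mentioned above (the FF MLP ANN converted to a single vector that TSA optimizes) can be sketched as follows; a single hidden layer with sigmoid activations and an MSE fitness are assumptions made here for illustration, not necessarily the exact network configuration used in every experiment:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(vector, n_in, n_hid, n_out):
    """Split a flat parameter vector into MLP weights and biases."""
    a = n_in * n_hid
    b = a + n_hid
    c = b + n_hid * n_out
    w1 = vector[:a].reshape(n_in, n_hid)    # input -> hidden weights
    b1 = vector[a:b]                        # hidden biases
    w2 = vector[b:c].reshape(n_hid, n_out)  # hidden -> output weights
    b2 = vector[c:]                         # output biases
    return w1, b1, w2, b2

def mse_fitness(vector, X, y, n_hid):
    """Fitness of one tree/seed position: the mean squared error of
    the decoded network on the training set (to be minimized)."""
    n_in, n_out = X.shape[1], y.shape[1]
    w1, b1, w2, b2 = decode(vector, n_in, n_hid, n_out)
    hidden = sigmoid(X @ w1 + b1)
    output = sigmoid(hidden @ w2 + b2)
    return float(np.mean((output - y) ** 2))
```

The dimension of the search space seen by the optimizer is then n_in*n_hid + n_hid + n_hid*n_out + n_out.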
Funding The authors wish to thank the Scientific Research Projects Coordinatorship at Selcuk University and the Scientific and Technological Research Council of Turkey for their institutional support.
Compliance with ethical standards
Conflicts of interest The authors declare that they have no conflict of
interest.