Citation: Gharehchopogh, F.S.; Khargoush, A.A. A Chaotic-Based Interactive Autodidactic School Algorithm for Data Clustering Problems and Its Application on COVID-19 Disease Detection. Symmetry 2023, 15, 894. https://doi.org/10.3390/sym15040894
Academic Editors: Jeng-Shyang Pan, Zhixun Su and Alexander Shelupanov
Received: 18 February 2023; Revised: 13 March 2023; Accepted: 7 April 2023; Published: 10 April 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
A Chaotic-Based Interactive Autodidactic School Algorithm for
Data Clustering Problems and Its Application on COVID-19
Disease Detection
Farhad Soleimanian Gharehchopogh * and Aysan Alavi Khargoush
Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia 5716963896, Iran
*Correspondence: bonab.farhad@gmail.com; Tel.: +98-91-4176-4427
Abstract: In many disciplines, including pattern recognition, data mining, machine learning, image analysis, and bioinformatics, data clustering is a common analytical tool for data statistics. Most conventional clustering techniques converge slowly and frequently become trapped in local optima. In this regard, population-based metaheuristic algorithms are used to overcome the problem of getting trapped in local optima and to increase the convergence speed. This paper proposes an asymmetric, population-based approach to clustering: the Interactive Autodidactic School (IAS), a population-based metaheuristic and asymmetry algorithm, is used to solve the clustering problem. The chaotic IAS algorithm also increases exploitation and generates a better population. In the proposed model, ten different chaotic maps and the intra-cluster summation fitness function are used to improve the results of the IAS. According to the simulation findings, the IAS based on the Chebyshev chaotic map outperformed the other chaotic IAS variants and other metaheuristic algorithms. The efficacy of the proposed model is finally highlighted by comparing its performance with optimization algorithms in terms of fitness function and convergence rate; the algorithm can also be used in different engineering problems. Moreover, the Binary IAS (BIAS) is used to detect coronavirus disease 2019 (COVID-19). The results demonstrate that the accuracy of BIAS on the COVID-19 dataset is 96.25%.
Keywords: interactive autodidactic school algorithm; chaotic maps; data clustering; optimization
1. Introduction
One of the main scientific fields of machine learning and data mining is data clustering.
It involves separating a set of objects into groups of similar objects [1]. In other words, data
clustering is a branch of unsupervised learning and an automatic process that divides
samples into categories whose members are similar. Data clustering aims to represent an extensive dataset with a smaller number of representative samples or clusters; this simplifies the data for modeling and plays a significant role in exploration and data mining. Clustering means identifying similar classes of objects. It also makes it possible to identify the dense and scattered areas in the object space, discover the general distribution pattern, and find the correlation properties between the data. Clustering techniques combine visible samples within clusters
that meet two main criteria: (1) each group or cluster is homogeneous; and (2) each group
or cluster must be different from other clusters. The most crucial clustering techniques are
hierarchical, distribution, partition, density, fuzzy, and graph-based clustering [2,3].
Using asymmetric similarities and dissimilarities is one solution to data clustering. For these measures to accurately reflect the hierarchical asymmetric relationships between items in the studied dataset, they must be applied in algorithms in an appropriate manner; that is, their use should be consistent with the data's hierarchical linkages. This can be accomplished with asymmetry coefficients and cluster coefficients inserted into the formulas for symmetric measures, i.e., by building the asymmetric measures on top of the symmetric ones. The asymmetry coefficients and cluster coefficients should guarantee the consistency of the hierarchy: in the case of similarities, they should yield greater values in the direction from a more specific notion to a more generic one.
Clustering means assigning samples to different cluster centers based on proximity and intra-cluster similarity. K-means clustering is widely used as one of the classical methods because of its easy implementation and low computational cost [4]. However, K-means requires the number of clusters to be specified beforehand, whereas in many practical applications users have no information about the number of clusters. If the clustering algorithm tries different numbers of clusters to find the optimal configuration, finding the correct number becomes time-consuming and challenging. Therefore, to overcome this limitation, intelligent clustering methods should automatically determine the optimal number of clusters and obtain a better partitioning [5].
Optimization algorithms are critical computational tools in engineering, and their application has grown significantly over the past decades. Optimization algorithms are divided into analytical and metaheuristic methods. Analytical approaches, also called gradient-based algorithms, are deterministic and always offer the same optimal solution from the same starting point [6]. Although these numerical methods work well in solving optimization problems, they have three significant drawbacks compared with metaheuristic methods [7]. First, numerical methods cannot be used when the fitness function and constraints are discrete, since their gradients are not defined. Second, numerical methods may get trapped in local minima due to their dependence on the value of the starting point. Finally, numerical methods are unstable and unreliable when the fitness function and constraints have multiple or sharp peaks. Researchers have therefore turned to new stochastic approaches with specific features, instead of traditional analytical techniques, to solve complex engineering optimization problems.
Metaheuristic algorithms are essential in solving optimization problems; they are among the most successful methods for solving various complex optimization problems. These algorithms provide near-optimal solutions for optimization problems. Metaheuristic algorithms are often inspired by the systems and behavior of animals in nature, such as flocks of birds, ant colonies, and fish schools; the behavior of the members of these algorithms mimics the way the inspiring creatures search for the best food sources. Most metaheuristic optimization algorithms share similar characteristics: they are stochastic (random-walk) algorithms, independent of gradient information, iterative, and applicable to continuous and discrete problems. The performance of any metaheuristic algorithm depends on the complexity of the cost function and the constraints that define the feasible search space. Metaheuristic algorithms have been used to solve various optimization problems and have been successful in many of them, including clustering. Classical clustering algorithms such as k-means often converge to local optima and have slow convergence rates on larger datasets. Clustering-based algorithms use swarm-based metaheuristic methods to overcome such issues. Swarm- or population-based metaheuristic approaches strive to achieve the optimal clustering solution in a reasonable time [8].
The IAS is a novel metaheuristic algorithm proposed by Jahangiri in 2020 [9]. It simulates the interactions of a group of students trying to learn without the help of a teacher; thus, an autodidactic school is created. To explore the search space looking for the optimal solution, the IAS, as with other population-based algorithms, iteratively uses a population in which the best student is called the leader and the rest of the community are called the followers. This paper implements an improved IAS based on chaotic maps on various clustering datasets.
The proposed model is appraised on different benchmark test functions to analyze its efficiency and accuracy. The experimental results demonstrate that the performance of the proposed model is improved in terms of global search and convergence rate. The proposed model is analyzed considering statistical criteria such as the best, worst, and average solutions and the standard deviation. Moreover, its convergence is compared with other metaheuristic algorithms such as the Artificial Bee Colony (ABC) [10], Bat Algorithm (BA) [11], Crow Search Algorithm (CSA) [12], and Artificial Electric Field Algorithm (AEFA) [13]. Then, the IAS is extended to transfer the continuous search space to a binary one using the S-shaped transfer function. Furthermore, the BIAS was applied in a case study to detect coronavirus disease 2019 (COVID-19). The experimental results prove that BIAS is more efficient than other comparative algorithms in searching the problem space and selecting the most compelling features. The contributions of this paper are as follows:
• Increasing the discovery of the optimal solution in the proposed model, with a balance between exploration and exploitation provided by chaotic maps;
• Providing an improved version of the IAS for the data clustering problem based on chaotic maps;
• Evaluating the proposed model on 20 UCI datasets;
• Assessing the proposed model based on fitness function and convergence rate;
• Developing the BIAS as the binary version of the IAS using the V-shaped transfer function to find valuable features from COVID-19 data (a minimal binarization sketch follows this list);
• Comparing the proposed model with ABC, BA, CSA, and AEFA;
• Applying the BIAS in a case study to detect COVID-19.
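Since the abstract and this list refer to S-shaped and V-shaped transfer functions respectively, the following minimal Python sketch assumes the common S-shaped sigmoid option; it maps a continuous student position to a 0/1 feature-selection mask, and all names are illustrative rather than taken from the paper:

```python
import numpy as np

def s_shaped_binarize(position, rng=np.random.default_rng(0)):
    """Map a continuous position to a binary feature mask using the
    S-shaped (sigmoid) transfer function T(x) = 1 / (1 + exp(-x))."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(position, dtype=float)))
    # A feature is selected (bit = 1) with probability T(x)
    return (rng.random(probs.shape) < probs).astype(int)

mask = s_shaped_binarize([2.1, -0.7, 0.0, 1.5])  # e.g., [1, 0, 1, 1] (stochastic)
```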
The rest of the paper is organized as follows. In Section 2, related works in clustering by metaheuristic algorithms are surveyed. Section 3 describes material and methods such as the IAS algorithm and chaotic maps. Section 4 proposes a new version of the improved IAS algorithm based on chaotic maps for data clustering. In Section 5, the performance of the proposed model is compared with other algorithms on the clustering datasets. Section 6 establishes the actual application of the proposed BIAS for extracting essential features from the COVID-19 dataset. Finally, Section 7 provides concluding remarks and suggestions for future research.
2. Related Works
This section presents the subject’s background and related literature in data clustering
using metaheuristic algorithms. Here, the aim is to review recent data clustering improve-
ments using metaheuristic algorithms. Therefore, the related works are presented below in
the order of publishing time.
Ahmadi et al. [14] presented an improved version of the Grey Wolf Optimizer (GWO)
algorithm for clustering problems. A modified GWO has been proposed to address some
metaheuristic algorithms’ challenges. This modification includes a balancing approach
between exploring and exploiting the GWO and a local search for the best solution. The re-
sults show that the proposed model has a lower intra-cluster distance than other algorithms
and a mean error of about 11%, which is the lowest among all comparison algorithms.
Ashish et al. [15] proposed a fast and efficient parallel BA for data clustering using a MapReduce architecture. The parallel BA is very efficient and helpful since it uses
an evolutionary approach to clustering instead of other algorithms, such as k-means; it also
enjoys high speed due to Hadoop architecture. The results of various experiments show
that the parallel BA performs better than Particle Swarm Optimization (PSO); it performs
faster than other comparative algorithms when the number of nodes increases.
The applicability of the Cuttlefish Algorithm (CFA) to clustering issues was examined in [16]. Additionally, it was demonstrated that the CFA can find
the optimal cluster centers. The technique has prevented the cluster centers from readily
becoming trapped in a local minimum, a significant drawback of the K-means. The CFA was
used as a search method to reduce the clustering metrics. The performance of the CFA-Clustering model was assessed on the Shapes and UCI real-world datasets and compared with three well-known algorithms: the Genetic Algorithm (GA), PSO, and K-means. The
empirical findings show that, for the most part, the CFA-Clustering approach outperforms
the other methods.
An asymmetric version of the k-means clustering algorithm [17] arises from the use of dissimilarities that are asymmetric by definition (for example, the Kullback–Leibler divergence).
Cuckoo and krill herd algorithms are utilized on k-means++ to improve cluster quality and create optimized clusters [18]. Performance parameters such as accuracy, error rate, f-measure, CPU time, standard deviation, cluster-quality checks, and so forth are used to measure the clustering potential of these algorithms. The results demonstrated the high performance of the newly designed algorithm.
Zhang et al. [19] proposed an improved K-means algorithm based on canopy density in 2018 to improve the K-means algorithm's accuracy and stability and to address the issue of selecting the best starting seeds and the optimal number K of clusters. The first step is to compute the density of the sample datasets, the average sample distance inside clusters, and the distance between clusters. The density-maximum sampling point is then selected as the first cluster center, and the density cluster is removed from the sample datasets. The K-means technique uses the density canopy as a pre-processing step, and the output is utilized to determine the cluster number and the starting clustering centers. Comparative results show that the improved K-means
algorithm based on canopy density has obtained better clustering results. The improved
K-means algorithm based on canopy density is less sensitive to noisy data than the K-
means algorithm, the canopy-based K-means algorithm, the semi-supervised K-means++
algorithm, and the K-means-u algorithm. The clustering accuracy of the proposed canopy
density-based K-means algorithm is improved by an average of 30.7%, 6.1%, 5.3%, and 3.7%
in the UCI dataset, respectively, and by 44.3%, 3.6%, 9.6%, and 8.9%, respectively, in the
simulated dataset with the improved noise signal. It enjoys a more accurate performance
than comparative algorithms.
To exploit the advantages of the ABC and K-means algorithms, Kummer et al. [20] proposed a hybrid algorithm combining them, called the MABCKM algorithm. The hybrid MABCKM algorithm modifies the solutions generated by ABC and
considers them as the initial solutions for the K-means algorithm. According to the results
obtained from comparing the performance of MABCKM, K-means, and ABC algorithms
on different datasets taken from the UCI repository, it is clear that MABCKM outperforms
other comparative algorithms.
The Whale Clustering Optimization Algorithm (WOA) was proposed for clustering data [21]. The results of WOA are compared with the well-known k-means clustering
method and other standard stochastic algorithms such as PSO, ABC, Differential Evolution
(DE), and GA clustering. The proposed model was checked using one artificial and seven
real benchmark datasets from the UCI. Simulations have proven that the proposed model
could successfully be used for data clustering.
Qaddoura et al. [22] presented an improved version of the GA's evolutionary behavior
as well as the advanced performance of the nearest neighbor search technique for clustering
problems based on allocation and selection mechanisms. The success of evolutionary
algorithms in solving various machine learning problems, including clustering, has been
proven. The proposed model’s objective was to improve the quality of clustering results
by identifying a solution that maximizes differentiation between different clusters and
coherence between data points within the same cluster. Various experiments show that
the proposed model works well with the Silhouette coefficient’s fitness function and
outperforms other algorithms.
Zhou et al. [23] presented an enhanced version of the symbiotic organism search
(SOS) algorithm to solve data clustering. It evokes the symbiotic interaction strategies
used by organisms in the ecosystem to survive and spread. This paper implemented
the proposed model on ten standard UCI machine-learning repository datasets. Various experiments showed that the SOS algorithm performed better than other algorithms in accuracy and precision.
Rahnema and Gharehchopogh proposed an improved ABC based on the whale optimization algorithm for data clustering in 2020 [2]. In this paper, two random and elite memories are used in the ABC to overcome the problems of poor exploration and late convergence. Finally, the proposed model was evaluated by being implemented on ten standard datasets taken from the UCI machine learning repository. Ewees et al. presented an improved chaotic version of the Multi-Verse Harris Hawks Optimization (CMVHHO) [24]. The primary purpose of this algorithm was to use chaotic maps to determine optimized values of the main parameters of the Harris Hawks algorithm. In addition, chaos was used as a local search approach to improve the ability to exploit the search space. It was tested using several different chaotic maps. Experimental results show that the Circle chaotic map is the best function among all available functions, since it improved the performance of the proposed model and had a positive effect on its behavior.
Chen et al. presented a chaotic-based dynamic weighted PSO algorithm [25]. The proposed model introduces a chaotic map and a dynamic weight to modify the search process; the dynamic weight, derived from the fitness function, increases the search accuracy and performance of the proposed model. Various experiments show that the proposed model outperformed nature-inspired and PSO algorithms on almost all functions.
To overcome the shortcomings of the Fruit Fly Optimization (FFO) algorithm [26], Zhang et al. proposed a new version of the FFO using the Gaussian mutation operator and
Zhang et al. proposed a new version of the FFO using the Gaussian mutation operator and
the local chaotic search strategy. The Gaussian mutation operator is integrated into the FFO
algorithm to prevent premature convergence and improve the exploration process. Then,
a chaotic local search approach is adopted to increase the group’s local search ability; the
results prove that the proposed model works better than the basic FFO algorithm.
In this section, important clustering literature using metaheuristic algorithms was
reviewed. Most of these works have considered the clustering problem an optimization
problem and applied a metaheuristic algorithm to solve it; in addition, the intra-cluster summation was used as the fitness function. Some authors have used a
combination of genetic operators and other methods, while others have employed chaotic
and quantum mapping to improve exploitation and convergence. Considering the literature
reviewed in this paper, an enhanced version of the IAS based on chaotic maps is proposed
for the clustering problem.
3. Material and Method
3.1. IAS Algorithm
As with other population-based algorithms, the IAS randomly generates an initial population, called students [9]. A specific problem's upper and lower limit values determine students' eligibility for inclusion in the IAS. The student with the highest performance (minimum score) in each step takes the position of "leader student", or simply "leader". In IAS optimization, the best performance corresponds to the minimum value of the cost function. However, this position can be reassigned to another, more skilled student at any point in the process. The method of student generation and assessment of student eligibility in school can be described as Algorithm 1.
Algorithm 1 The method of student generation and assessment of student eligibility
1: For i = 1 : N_student
2:   S_i = LB + r_i(0,1) × (UB − LB); M_i = |f(S_i)|
3: End For
4: f(LS) = min{M_i}
where S_i is the ith generated student, LB and UB are the lower and upper limits of the variables, respectively, r_i(0,1) is a random number between 0 and 1, N_student is the number of students, M_i is the score of the ith student, and LS is the leader student.
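For concreteness, a minimal Python sketch of Algorithm 1 is given below; the cost function, bounds, and population size are placeholder assumptions, not values from the paper:

```python
import numpy as np

def init_students(f, lb, ub, n_students, rng=np.random.default_rng(0)):
    """Generate students uniformly in [LB, UB] and score them (Algorithm 1)."""
    dim = len(lb)
    # S_i = LB + r_i(0,1) * (UB - LB)
    students = lb + rng.random((n_students, dim)) * (ub - lb)
    scores = np.abs([f(s) for s in students])   # M_i = |f(S_i)|
    leader = students[np.argmin(scores)]        # f(LS) = min{M_i}
    return students, scores, leader

# Placeholder sphere cost function on a 2-D search space:
students, scores, leader = init_students(
    f=lambda x: float(np.sum(x**2)),
    lb=np.zeros(2), ub=np.ones(2), n_students=20)
```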
Autodidactic/self-learning sessions in this interactive school are held in three stages:
individual training, group training, and new student challenges.
Individual Training Session: First, a random group of two follower students is selected.
Then, they discuss it one by one with the leader student. The student’s knowledge will
increase in peer-to-peer discussions with the leader. Accordingly, an individual training
session can be formulated as described in Algorithm 2:
Algorithm 2 Individual Training Session
1: For i = 1 : N_student
2:   Randomly select one student S_j, where i ≠ j
3:   TS*_i = TS_i + r_i(1,2) × (LS − IC_i × TS_i)
4:   TS*_j = TS_j + r_j(1,2) × (LS − IC_j × TS_j)
5: End for
6: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where TS_i and TS_j are the first and second follower students, respectively; IC_i and IC_j are the inherent competencies of the first and second students, respectively; and r_i(1,2) and r_j(1,2) are two different random vectors between 1 and 2. Individual competencies (IC_i and IC_j) are randomly determined as 1 or 2.
Collective Training Session: After the individual training session, each follower student
has the opportunity to review the contents of the last session and interact with other
follower students in the same group to resolve the unclear points of the lesson. In addition
to the knowledge level of individually trained students, their social abilities, such as
communication skills, teamwork, and collaboration, referred to as collective competencies,
can significantly impact the effectiveness of group learning. Accordingly, the group training
session can be formulated as described in Algorithm 3.
Algorithm 3 Collective Training Session
1: For i = 1 : N_student
2:   CC_ij = (CC_i × TS_i + CC_j × TS_j) / (CC_i + CC_j)
3:   TS*_i = TS_i + r_i(1,2) × (LS − CC_i × CC_ij)
4:   TS*_j = TS_j + r_j(1,2) × (LS − CC_j × CC_ij)
5: End for
6: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where CC_ij is defined as the collective ability of the group as a team, based on the weighted average of the students' competencies. Moreover, r_i(1,2) and r_j(1,2) are two different random vectors between 1 and 2. Students' collective competencies (CC_i and CC_j) are randomly set to 1 or 2.
Challenge of the New Student: In some optimization problems, due to the complex nature of the cost function, the gradual improvement of follower students may be limited to a specific area of the design space solely around the leader student (i.e., the current temporary/local optimum), while still being far from the permanent/global optimum. Such a stagnating loop hinders the optimization process and will probably fail to find the global optimum. The new student challenge is introduced to complement the algorithm and provide a more dynamic and exploratory IAS, creating an ongoing rebellion against the current leader. If the new student is more skilled than the current leader student, they take on the role of leader. The new student challenge can be formulated as described in Algorithm 4.
Algorithm 4 New student challenge
1: NS = LB + R × (UB − LB)
2: MF1 = round(r(0,1))
3: MF2 = 1 − MF1
4: LS* = MF1 × LS + MF2 × NS
5: Accept LS* if it achieves a better mark than LS
where NS is a new student; MF1 and MF2 are the first and second corrective factors, respectively; r(0,1) is a random vector between 0 and 1; and LS* is the new leader of the school.
The process (including all three sessions) is repeated until the termination criteria are
met. At the end of the process, each student has to have communicated with the leader
at least once. In both individual and group training sessions, groups of two students are
randomly selected in the search space to interact with the leader and themselves. Proper
selection of regulatory parameters, such as the number of students and the number of iterations, can lead to faster detection of the global optimum. The more students exist in the autodidactic
school, the more likely there will be elite students among them. In addition, the number of
sessions held is equal to the number of students in the school. Hence, the population in
this IAS has a significant effect on increasing the knowledge level of students.
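To make the loop structure concrete, the sketch below combines the individual training session and the new-student challenge in Python, using the notation above; the collective session follows the same accept-if-better pattern and is omitted for brevity, and the cost function and bounds are placeholders:

```python
import numpy as np

def ias_step(f, students, leader, lb, ub, rng):
    """One IAS iteration over randomly paired students (cf. Algorithms 2 and 4)."""
    n, dim = students.shape
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # partner with i != j
        ic_i, ic_j = rng.integers(1, 3, size=2)          # competencies in {1, 2}
        r_i, r_j = 1 + rng.random((2, dim))              # random vectors in (1, 2)
        # Individual training: move both students toward the leader
        cand_i = students[i] + r_i * (leader - ic_i * students[i])
        cand_j = students[j] + r_j * (leader - ic_j * students[j])
        for idx, cand in ((i, cand_i), (j, cand_j)):
            if f(cand) < f(students[idx]):               # accept only better marks
                students[idx] = cand
    # New-student challenge: a random newcomer may dethrone the leader
    ns = lb + rng.random(dim) * (ub - lb)
    if f(ns) < f(leader):
        leader = ns
    return students, leader
```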
3.2. Chaotic Maps
Chaotic maps have been used in various stochastic and optimization algorithms to address their problems [27]. This section introduces the ten chaotic maps used to improve the IAS.
Each chaotic map has unique features, described and formulated in Table 1. All the chaotic maps employed in this paper start from an initial point of 0.7, although the initial point of a chaotic map can be any number between 0 and 1.
Table 1. Functions of Chaotic Maps.
Method  Chaotic Map  Mathematical Model  Range
CIAS-1  Chebyshev  $p_{q+1} = \cos\left(q \cos^{-1}(p_q)\right)$  (−1, 1)
CIAS-2  Circle  $p_{q+1} = \mathrm{mod}\left(p_q + d - \frac{c}{2\pi}\sin(2\pi p_q),\, 1\right)$, c = 0.5 and d = 0.2  (0, 1)
CIAS-3  Gauss/mouse  $p_{q+1} = \begin{cases} 1 & p_q = 0 \\ \frac{1}{\mathrm{mod}(p_q,\, 1)} & \text{otherwise} \end{cases}$  (0, 1)
CIAS-4  Iterative  $p_{q+1} = \sin\left(\frac{c\pi}{p_q}\right)$, c = 0.7  (−1, 1)
CIAS-5  Logistic  $p_{q+1} = c\, p_q (1 - p_q)$, c = 4  (0, 1)
CIAS-6  Piecewise  $p_{q+1} = \begin{cases} p_q / l & 0 \le p_q < l \\ (p_q - l)/(0.5 - l) & l \le p_q < 0.5 \\ (1 - l - p_q)/(0.5 - l) & 0.5 \le p_q < 1 - l \\ (1 - p_q)/l & 1 - l \le p_q < 1 \end{cases}$  (0, 1)
CIAS-7  Sine  $p_{q+1} = \frac{c}{4}\sin(\pi p_q)$, c = 4  (0, 1)
CIAS-8  Singer  $p_{q+1} = \mu\left(7.86 p_q - 23.31 p_q^2 + 28.75 p_q^3 - 13.302875 p_q^4\right)$, μ = 1.07  (0, 1)
CIAS-9  Sinusoidal  $p_{q+1} = c\, p_q^2 \sin(\pi p_q)$, c = 2.3  (0, 1)
CIAS-10  Tent  $p_{q+1} = \begin{cases} p_q / 0.7 & p_q < 0.7 \\ \frac{10}{3}(1 - p_q) & \text{otherwise} \end{cases}$  (0, 1)
Table 1 lists the chaotic maps proposed to improve the IAS. The proposed model uses chaotic maps to create the initial population and to generate random parameters.
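As an illustration, two of the maps in Table 1 can be implemented as short Python functions; the initial point of 0.7 follows the paper, while treating the Chebyshev order as the iteration index is an assumption:

```python
import math

def chebyshev_map(p0=0.7, n=5):
    """Chebyshev map: p_{q+1} = cos(q * arccos(p_q)), values in (-1, 1)."""
    p, seq = p0, []
    for q in range(1, n + 1):
        p = math.cos(q * math.acos(p))
        seq.append(p)
    return seq

def logistic_map(p0=0.7, n=5, c=4.0):
    """Logistic map: p_{q+1} = c * p_q * (1 - p_q), values in (0, 1)."""
    p, seq = p0, []
    for _ in range(n):
        p = c * p * (1 - p)
        seq.append(p)
    return seq
```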
4. Proposed Model
The IAS is one of the most successful optimization algorithms. However, it fails to
work effectively in global optimization and finding the best solution. The main reason may
be the generation of an inadequate initial population and random parameters. Due to the
ergodic nature and lack of correct iteration of chaotic maps, better global and local searches
can be performed than random searches that rely primarily on probability. As a result,
this paper presents different versions of the IAS based on other chaotic maps to solve the
clustering problem. The flowchart of the proposed model is shown in Figure 1.
Figure 1. Flowchart of the proposed model.
4.1. Pre-Processing
The pre-processing step includes data conversion and data normalization. For datasets
where the data is of string type, the label-encoder method is used to convert string data to
numeric data. Once the string data is converted to numeric data, the data normalization is
carried out. The MinMax method is the most popular standard normalization method that
transfers data to the space between 0 and 1, as given in Equation (1).
$X_{normal} = \dfrac{X_{value} - Min_{X_{value}}}{Max_{X_{value}} - Min_{X_{value}}}$ (1)
In Equation (1), X_value is the initial value of a feature in the dataset, and X_normal refers to the normalized feature. The Max_Xvalue and Min_Xvalue parameters represent the feature's largest and smallest values.
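A minimal Python sketch of Equation (1), assuming features are stored column-wise and string features have already been label-encoded; the guard against constant features is an addition for robustness, not part of the paper:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each feature (column) of X into [0, 1] per Equation (1)."""
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # avoid division by zero
    return (X - xmin) / span
```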
A dataset $D = \{(x_1, L_1), (x_2, L_2), \ldots, (x_m, L_m)\}$ with m samples is defined according to Equation (2).

$D = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1d} & L_1 \\ x_{21} & x_{22} & \cdots & x_{2d} & L_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} & L_m \end{bmatrix}$ (2)

In Equation (2), $(x_i, L_i)$ is the ith sample of D, $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]$ is the feature vector of the ith sample, and $L_i$ is the label of the ith sample.
4.2. Chaotic-Based Population Generation
First, the IAS based on chaotic maps must generate a suitable initial population
to improve the algorithm’s convergence rate. Therefore, student generation and assess-
ment of students’ competence in school can be described as Equation (3) according to the
chaotic maps.
$S_{ij} = lb_i + chomap_i(0,1) \times (ub_i - lb_i)$ (3)
where $S_{ij}$ is the jth variable of the ith generated student; lb and ub are the lower and upper bounds, respectively; and $chomap_i(0,1)$ is a number between 0 and 1 generated by one of the chaotic maps listed in Table 1. Thus, the IAS generates its population based on chaotic maps from the very beginning.
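A minimal sketch of Equation (3), assuming a scalar chaotic map whose successive values drive all draws; the map choice and bounds are placeholders:

```python
import numpy as np

def chaotic_init(chomap, p0, lb, ub, n_students):
    """Initialize students from a chaotic sequence instead of uniform noise (Eq. (3))."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim, p = len(lb), p0
    students = np.empty((n_students, dim))
    for i in range(n_students):
        for j in range(dim):
            p = chomap(p)                                # next chaotic value in (0, 1)
            students[i, j] = lb[j] + p * (ub[j] - lb[j])
    return students

# Example with the logistic map as the chaotic generator:
pop = chaotic_init(lambda p: 4 * p * (1 - p), p0=0.7, lb=[0, 0], ub=[1, 1], n_students=20)
```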
4.3. Chaotic-Based Individual Training Session
In the second step, the IAS uses chaotic sequences instead of random numbers to
improve the convergence speed of the algorithm in different iterations. Therefore, according
to the chaotic maps, the individual training session can be described as Algorithm 5.
Algorithm 5 Chaotic-Based Individual Training
1: For i = 1 : N_student
2:   Randomly select one student S_j, where i ≠ j
3:   h_j = 1 + chomap_j(0,1)
4:   h_i = 1 + chomap_i(0,1)
5:   TS*_i = TS_i + h_i × (LS − IC_i × TS_i)
6:   TS*_j = TS_j + h_j × (LS − IC_j × TS_j)
7: End for
8: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where TS_i and TS_j are the first and second students, and h_j and h_i are two different chaotic vectors between 1 and 2 generated by the chaotic maps (listed in Table 1). Individual competencies (IC_i and IC_j) are randomly set to 1 or 2, so there is no need to use chaotic maps for them.
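Under the same assumptions, the chaotic individual training step of Algorithm 5 might be sketched as follows; pairing, acceptance, and the chaotic step factors mirror the pseudocode above, while the function names are illustrative:

```python
import numpy as np

def chaotic_individual_training(f, students, leader, chomap, p, rng):
    """Algorithm 5 sketch: move paired students toward the leader using
    chaotic step factors h = 1 + chomap(0,1) instead of random ones."""
    n, dim = students.shape
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # partner with i != j
        ic_i, ic_j = rng.integers(1, 3, size=2)          # IC stays random (1 or 2)
        for idx, ic in ((i, ic_i), (j, ic_j)):
            h = np.empty(dim)
            for d in range(dim):
                p = chomap(p)                            # chaotic value in (0, 1)
                h[d] = 1 + p                             # chaotic vector in (1, 2)
            cand = students[idx] + h * (leader - ic * students[idx])
            if f(cand) < f(students[idx]):               # accept if better mark
                students[idx] = cand
    return students, p
```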
4.4. Chaotic-Based Group Training Session
In the third step, the IAS uses chaotic sequences instead of random numbers to improve
the convergence speed of the algorithm in different iterations. Therefore, according to the
chaotic maps, the group training session can be described as Algorithm 6.
Algorithm 6 Chaotic-Based Group Training
1: For i = 1 : N_student
2:   h_j = 1 + chomap_j(0,1)
3:   h_i = 1 + chomap_i(0,1)
4:   CC_ij = (CC_i × TS_i + CC_j × TS_j) / (CC_i + CC_j)
5:   TS*_i = TS_i + h_i × (LS − CC_i × CC_ij)
6:   TS*_j = TS_j + h_j × (LS − CC_j × CC_ij)
7: End for
8: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where CC_ij is defined as the collective ability of the group as a team, based on the weighted average of the students' competencies, and h_j and h_i are two different chaotic vectors between 1 and 2 generated by the chaotic maps (listed in Table 1). Students' collective competencies (CC_i and CC_j) are randomly set to 1 or 2, so there is no need to use chaotic maps for them.
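A matching sketch of Algorithm 6; the only changes relative to the individual session are the weighted group solution CC_ij and the term the students move along, and again all names are illustrative:

```python
import numpy as np

def chaotic_group_training(f, students, leader, chomap, p, rng):
    """Algorithm 6 sketch: blend each pair into a weighted group solution
    CC_ij, then move both members toward the leader with chaotic factors."""
    n, dim = students.shape
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        cc_i, cc_j = rng.integers(1, 3, size=2)          # CC stays random (1 or 2)
        cc_ij = (cc_i * students[i] + cc_j * students[j]) / (cc_i + cc_j)
        for idx, cc in ((i, cc_i), (j, cc_j)):
            h = np.empty(dim)
            for d in range(dim):
                p = chomap(p)
                h[d] = 1 + p                             # chaotic vector in (1, 2)
            cand = students[idx] + h * (leader - cc * cc_ij)
            if f(cand) < f(students[idx]):
                students[idx] = cand
    return students, p
```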
4.5. Chaotic-Based New Student Challenge
In the fourth step of the IAS, chaotic sequences are used instead of random numbers to improve the convergence speed of the algorithm across iterations. Therefore, according to the chaotic maps, the new student challenge can be described as Algorithm 7.
Algorithm 7 Chaotic-Based New Student
1: NS = lb_i + chomap_i(0,1) × (ub_i − lb_i)
2: m = chomap_i(0,1)
3: MF1 = round(m)
4: MF2 = 1 − MF1
5: LS* = MF1 × LS + MF2 × NS
6: Accept LS* if it achieves a better mark than LS
In Algorithm 7, a new solution NS is generated entirely by chaotic maps, and MF1 and MF2 are the first and second corrective factors, generated from the chaotic variable m. The key point of this step is that, instead of random numbers, the chaotic sequence generated by the chaotic maps is applied to increase the exploitation of the proposed model.
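A sketch of Algorithm 7 under the same assumptions; because MF1 = round(m) is 0 or 1, the candidate leader is either the old leader or the chaotic newcomer NS:

```python
import numpy as np

def chaotic_new_student(f, leader, chomap, p, lb, ub):
    """Algorithm 7 sketch: a chaotic newcomer NS challenges the leader."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    ns = np.empty(dim)
    for d in range(dim):
        p = chomap(p)
        ns[d] = lb[d] + p * (ub[d] - lb[d])   # NS drawn from the chaotic sequence
    p = chomap(p)
    mf1 = round(p)                            # MF1 = round(m), MF2 = 1 - MF1
    cand = mf1 * np.asarray(leader, float) + (1 - mf1) * ns
    if f(cand) < f(leader):                   # accept if better mark than LS
        leader = cand
    return leader, p
```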
4.6. Formation of Clusters
For the proposed model, each student vector expresses a solution with a certain number of cluster centers, ranging from C_min to C_max. The decision variables are encoded as real-valued strings and regarded as cluster centers. Assuming that the dimension of the dataset is d, the maximum length of a student vector is K_max × d. For each student vector whose cluster number is c, the first c × d entries are evaluated as effective cluster-center solutions, and the remaining variables are invalid. Figure 2 shows the format of the students' initial population for clustering. In the IAS, the candidate solution is determined as $\{X^1_j(k), X^2_j(k), \ldots, X^d_j(k)\}$, where $k = 1, 2, \ldots, P$. Here, P denotes the number of iterations.
Figure 2 shows that if a dataset has two clusters, different solutions are generated to find the two clusters. In each solution, a set of features is formed for the center of each cluster. Each solution is evaluated, and at the end, the solution with the best fitness (closest distance) is selected as the optimal solution.
Figure 2. Format of the initial population of students for clustering.
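The encoding of Figure 2 can be made concrete with a small helper that slices a flat student vector into cluster centers; the helper name and the toy values are illustrative:

```python
import numpy as np

def decode_centers(student, c, d):
    """Interpret the first c*d entries of a student vector as c cluster
    centers of dimension d; the remaining entries are ignored (invalid)."""
    return np.asarray(student, dtype=float)[:c * d].reshape(c, d)

centers = decode_centers(np.arange(10.0), c=2, d=4)  # 2 centers; last 2 entries unused
```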
4.7. Fitness Function of Clustering
In our proposed model, the fitness function of clustering, called intra-cluster summation, is employed using the Euclidean distance function, the most popular and valid distance criterion in clustering. It is calculated as Equation (4).

$distance(O_i, O_j) = \left(\sum_{p=1}^{m}\left(O_{ip} - O_{jp}\right)^2\right)^{1/2}$ (4)
In Equation (4), the variable m indicates the number of features, O_ip represents the value of feature p of the object O_i, and O_jp represents the value of feature p of the object O_j. This function minimizes the distance between each object and the center of the cluster to which it is allocated, generating compact groups. The intra-cluster summation is defined by Equation (5).
$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n} W_{ij} \times \sqrt{\sum_{p=1}^{m}\left(O_{jp} - O_{ip}\right)^2}$ (5)
Here, if W_ij is 1, the object O_j is in cluster i; otherwise, O_j is not in cluster i. The variable k shows the number of clusters, the variable n indicates the number of objects, and the variable m shows the number of features. Note that O_ip here denotes the value of feature p of the center of the ith cluster.
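A minimal Python sketch of Equations (4) and (5), with the hard assignment W_ij realized by taking the nearest center for each object (a standard reading of the intra-cluster summation):

```python
import numpy as np

def sse_fitness(data, centers):
    """Intra-cluster summation (Eq. (5)): assign each object to its nearest
    center by Euclidean distance (Eq. (4)) and sum those distances."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances, shape (n_objects, k_clusters)
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())   # W_ij selects the closest cluster

# Toy example: two 2-D objects and two candidate centers
print(sse_fitness([[0, 0], [2, 2]], [[1, 0], [2, 2]]))  # 1.0
```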
5. Results and Discussion
An IAS based on ten chaotic maps (i.e., CIAS) was presented in the previous section.
In this section, statistical criteria such as the fitness function’s minimum value and the
fitness function’s convergence rate are considered to compare the proposed model and
other algorithms. Here, ten versions of the proposed CIAS algorithm are first compared
with each other in terms of statistical criteria. The best version is considered an improved
or superior version. Then it is compared with other metaheuristic methods such as the
BA, CSA, ABC, and AEFA. Therefore, more details about the implementation, parameters,
criteria, comparison, and evaluation of the proposed CIAS algorithm for the clustering
problems are given here.
5.1. Dataset
All clustering datasets used here to evaluate the improved version of the IAS based on chaotic maps are listed in Table 2, together with the number of features and samples of each of these 20 UCI clustering datasets.
Table 2. Clustering Dataset.
No. Datasets Number of Features Number of Samples
1 Balance Scale 4 625
2 Blood 4 748
3 breast 30 569
4 CMC 9 1473
5 Dermatology 34 366
6 Glass 9 214
7 Haberman’s Survival 3 306
8 hepatitis 19 155
9 Iris 4 150
10 Libras 90 360
11 lung cancer 32 56
12 Madelon 500 2600
13 ORL 1024 400
14 seeds 7 210
15 speech 310 125
16 Statlog (Heart) 13 270
17 Steel 33 1941
18 Vowel 3 871
19 wine 13 178
20 Wisconsin 9 699
5.2. Simulation Environment and Parameters Determination
The proposed CIAS approaches and the comparative algorithms are implemented using MATLAB 2019 on a system with 8 GB of RAM, a Core i5 CPU (2.4 GHz), and a 64-bit operating system. For a fair comparison, the quantitative parameters of the proposed CIAS approaches, the BA, CSA, ABC, and AEFA are set the same (see Table 3). In addition, the qualitative parameters of each algorithm are set to their standard values.
Table 3 shows that the initial values of the population size and the number of iterations are the same for all algorithms, and the values of the other parameters are set as standard. The different versions of the IAS based on chaotic maps (i.e., CIAS-1, CIAS-2, ..., CIAS-10) are compared with each other in terms of statistical criteria. The evaluations and comparisons of the different versions of the IAS based on chaotic maps are provided below. The convergence rate for implementing the various versions of the IAS based on chaotic maps on 10 clustering datasets is presented in Figure 3.
The results related to the convergence rate of the different versions of the proposed model on the 20 datasets indicate that: (1) IAS-2 performed better on the BLOOD and DERMATOLOGY datasets, and IAS-1 performed better on the BLOOD and CANCER datasets; (2) IAS-1 performed better on the IRIS and WINE datasets, and IAS-4 performed better on the STEEL and IRIS datasets; (3) IAS-4 performed better on the GLASS, HABERMAN, and BREASTEW datasets, and IAS-1 performed better on the BREASTEW and HABERMAN datasets; (4) IAS-1 performed better on the HEART and LUNG CANCER datasets, and IAS-2 performed better on the HABERMAN dataset; (5) IAS-1 performed better on the VOWEL dataset, and IAS-2 performed better on the SEEDS dataset. Overall, the convergence results of the different chaotic-map-based versions of the IAS on the whole collection of datasets show that IAS-1, IAS-2, IAS-4, and IAS-6 improved on the other versions. To further evaluate the different versions of the IAS based on chaotic maps, the worst, best, and average solutions of the algorithms' populations are compared, as shown in Table 4.
Table 3. Values of initial parameters.
ABC [10]: Limit = 5D; Population size = 20; Number of onlookers = 20; Iterations = 100
BA [11]: r = 0.5; A = 0.8; Population size = 20; Iterations = 100
CSA [12]: AP = 0.8; Population size = 20; Iterations = 100
AEFA [13]: FCheck = 1; Population size = 20; Iterations = 100
Proposed Model: Population size = 20; Iterations = 100
Figure 3. Convergence rate of different versions of IAS based on chaotic maps.
Table 4. Results related to the worst, best, and average solutions for the population of different versions of the IAS.
Dataset Results IAS-1 IAS-2 IAS-3 IAS-4 IAS-5 IAS-6 IAS-7 IAS-8 IAS-9 IAS-10
Blood
Worst 4.21E+05 8.46E+05 8.46E+05 4.20E+05 8.47E+05 8.46E+05 8.47E+05 8.46E+05 8.46E+05 8.46E+05
Best 4.10E+05 4.10E+05 4.12E+05 4.10E+05 4.18E+05 4.13E+05 4.12E+05 4.15E+05 4.20E+05 4.13E+05
Avg 4.14E+05 4.93E+05 4.93E+05 4.15E+05 5.46E+05 5.20E+05 5.22E+05 6.30E+05 6.55E+05 5.20E+05
Cancer
Worst 4.25E+03 4.45E+03 3.59E+03 3.63E+03 4.36E+03 3.94E+03 4.71E+03 4.24E+03 4.25E+03 5.36E+03
Best 3.28E+03 3.93E+03 3.30E+03 3.26E+03 3.68E+03 3.42E+03 3.94E+03 3.77E+03 3.50E+03 3.72E+03
Avg 3.82E+03 4.20E+03 3.44E+03 3.48E+03 4.08E+03 3.62E+03 4.40E+03 4.03E+03 3.83E+03 5.05E+03
CMC
Worst 9.70E+03 9.91E+03 1.38E+04 1.01E+04 1.32E+04 1.26E+04 1.31E+04 1.38E+04 1.38E+04 1.31E+04
Best 8.08E+03 7.79E+03 7.40E+03 7.33E+03 7.80E+03 7.60E+03 7.31E+03 6.93E+03 7.66E+03 7.15E+03
Avg 9.11E+03 8.67E+03 1.16E+04 8.85E+03 1.10E+04 1.09E+04 1.01E+04 1.18E+04 1.21E+04 1.05E+04
Dermatology
Worst 4.95E+03 3.54E+03 1.21E+04 4.76E+03 4.65E+03 4.00E+03 7.71E+03 4.94E+03 5.05E+03 4.37E+03
Best 3.05E+03 2.75E+03 2.83E+03 2.87E+03 3.31E+03 3.03E+03 2.96E+03 3.08E+03 2.93E+03 2.90E+03
Avg 3.75E+03 3.27E+03 1.15E+04 3.69E+03 3.80E+03 3.41E+03 3.99E+03 3.79E+03 3.71E+03 3.51E+03
Iris
Worst 2.29E+02 3.03E+02 2.97E+02 2.38E+02 2.75E+02 2.88E+02 2.86E+02 3.03E+02 3.03E+02 2.84E+02
Best 1.67E+02 1.71E+02 1.75E+02 1.55E+02 2.00E+02 1.47E+02 2.05E+02 1.85E+02 1.97E+02 1.71E+02
Avg 1.97E+02 2.55E+02 2.51E+02 1.96E+02 2.43E+02 2.39E+02 2.57E+02 2.66E+02 2.57E+02 2.29E+02
Orl
Worst 9.55E+05 7.65E+05 5.28E+05 5.59E+05 5.69E+05 7.77E+05 5.72E+05 7.32E+05 6.38E+05 7.67E+05
Best 8.44E+05 7.35E+05 5.23E+05 5.36E+05 5.55E+05 7.44E+05 5.54E+05 6.70E+05 5.73E+05 7.51E+05
Avg 9.37E+05 7.62E+05 5.27E+05 5.45E+05 5.64E+05 7.71E+05 5.65E+05 7.01E+05 6.28E+05 7.60E+05
Steel
Worst 3.8E+09 4.64E+09 4.18E+09 3.8E+09 4.64E+09 4.26E+09 4.64E+09 4.64E+09 4.64E+09 4.64E+09
Best 2.40E+06 2.42E+06 2.48E+06 2.36E+06 2.55E+06 2.48E+06 2.51E+06 2.55E+06 2.37E+06 2.44E+06
Avg 3.29E+09 3.97E+09 3.85E+09 3.05E+09 3.92E+09 3.72E+09 3.85E+09 3.96E+09 3.98E+09 3.71E+09
Wine
Worst 2.26E+04 2.90E+04 3.51E+04 2.27E+04 8.38E+04 3.83E+04 4.28E+04 8.38E+04 8.38E+04 8.38E+04
Best 1.74E+04 1.82E+04 1.77E+04 1.75E+04 1.75E+04 1.78E+04 1.79E+04 1.79E+04 1.91E+04 1.82E+04
Avg 2.01E+04 2.18E+04 3.40E+04 2.00E+04 3.31E+04 2.58E+04 2.52E+04 3.64E+04 3.83E+04 3.35E+04
Balance Scale
Worst 1.49E+03 1.51E+03 2.33E+03 1.63E+03 1.54E+03 1.52E+03 1.51E+03 1.53E+03 1.79E+03 1.51E+03
Best 1.45E+03 1.45E+03 1.52E+03 1.47E+03 1.46E+03 1.44E+03 1.45E+03 1.45E+03 1.48E+03 1.45E+03
Avg 1.47E+03 1.49E+03 2.15E+03 1.55E+03 1.49E+03 1.48E+03 1.48E+03 1.50E+03 1.61E+03 1.48E+03
Breasts
Worst 2.37E+03 2.99E+03 2.69E+03 2.44E+03 3.22E+03 2.79E+03 3.13E+03 2.55E+03 2.50E+03 2.79E+03
Best 2.22E+03 2.69E+03 2.58E+03 2.29E+03 2.84E+03 2.53E+03 2.68E+03 2.43E+03 2.43E+03 2.47E+03
Avg 2.30E+03 2.89E+03 2.63E+03 2.37E+03 3.10E+03 2.73E+03 2.98E+03 2.51E+03 2.47E+03 2.63E+03
Glass
Worst 8.46E+02 8.86E+02 9.65E+02 8.76E+02 1.19E+03 1.15E+03 1.19E+03 1.19E+03 1.20E+03 1.20E+03
Best 5.52E+02 5.87E+02 5.89E+02 5.15E+02 5.31E+02 5.73E+02 6.19E+02 5.65E+02 5.93E+02 6.10E+02
Avg 8.13E+02 7.72E+02 9.35E+02 7.44E+02 9.74E+02 8.87E+02 9.86E+02 1.02E+03 1.09E+03 8.79E+02
Haberman
Worst 3.61E+03 4.47E+03 4.46E+03 3.64E+03 4.14E+03 4.46E+03 5.64E+03 4.16E+03 4.28E+03 4.52E+03
Best 2.70E+03 2.84E+03 3.07E+03 2.73E+03 3.00E+03 3.14E+03 3.21E+03 2.78E+03 3.01E+03 2.79E+03
Avg 3.14E+03 3.85E+03 3.43E+03 3.29E+03 3.65E+03 3.78E+03 3.77E+03 3.70E+03 3.78E+03 3.91E+03
Heart
Worst 1.97E+04 2.72E+04 3.33E+04 1.97E+04 4.22E+04 3.46E+04 4.17E+04 4.22E+04 4.15E+04 3.62E+04
Best 2.40E+06 2.42E+06 2.48E+06 2.36E+06 1.46E+04 1.43E+04 1.42E+04 1.37E+04 1.42E+04 1.31E+04
Avg 3.29E+09 3.97E+09 3.85E+09 3.05E+09 2.43E+04 2.18E+04 2.75E+04 2.71E+04 3.03E+04 1.99E+04
Hepatitis
Worst 2.26E+04 2.90E+04 3.51E+04 2.27E+04 2.22E+04 2.25E+04 1.96E+04 2.25E+04 2.24E+04 2.27E+04
Best 1.74E+04 1.82E+04 1.77E+04 1.75E+04 1.35E+04 1.31E+04 1.34E+04 1.36E+04 1.32E+04 1.34E+04
Avg 2.01E+04 2.18E+04 3.40E+04 2.00E+04 1.83E+04 1.78E+04 1.72E+04 1.90E+04 1.79E+04 1.75E+04
Libras
Worst 1.49E+03 1.51E+03 2.33E+03 1.63E+03 1.05E+03 9.16E+02 9.21E+02 6.11E+02 7.15E+02 7.38E+02
Best 1.45E+03 1.45E+03 1.52E+03 1.47E+03 6.89E+02 8.72E+02 6.69E+02 5.78E+02 6.88E+02 5.97E+02
Avg 1.47E+03 1.49E+03 2.15E+03 1.55E+03 8.82E+02 8.94E+02 8.62E+02 5.87E+02 7.07E+02 6.60E+02
Lung Cancer
Worst 2.37E+03 2.99E+03 2.69E+03 2.44E+03 1.98E+02 1.87E+02 2.03E+02 2.19E+02 1.97E+02 2.07E+02
Best 2.22E+03 2.69E+03 2.58E+03 2.29E+03 1.69E+02 1.70E+02 1.80E+02 1.79E+02 1.66E+02 1.76E+02
Avg 2.30E+03 2.89E+03 2.63E+03 2.37E+03 1.88E+02 1.80E+02 1.95E+02 1.97E+02 1.83E+02 1.93E+02
Madelon
Worst 8.46E+02 8.86E+02 9.65E+02 8.76E+02 1.95E+06 1.83E+06 1.86E+06 1.82E+06 1.84E+06 1.82E+06
Best 5.52E+02 5.87E+02 5.89E+02 5.15E+02 1.94E+06 1.83E+06 1.84E+06 1.82E+06 1.82E+06 1.82E+06
Avg 8.13E+02 7.72E+02 9.35E+02 7.44E+02 1.95E+06 1.83E+06 1.85E+06 1.82E+06 1.83E+06 1.82E+06
Seeds
Worst 3.61E+03 4.47E+03 4.46E+03 3.64E+03 7.75E+02 8.30E+02 8.41E+02 7.97E+02 8.58E+02 7.19E+02
Best 2.70E+03 2.84E+03 3.07E+03 2.73E+03 5.29E+02 5.15E+02 4.93E+02 5.29E+02 5.28E+02 5.33E+02
Avg 3.14E+03 3.85E+03 3.43E+03 3.29E+03 6.63E+02 6.42E+02 6.66E+02 6.96E+02 7.47E+02 6.29E+02
Speech
Worst 1.97E+04 2.72E+04 3.33E+04 1.97E+04 6.58E+12 6.58E+12 6.52E+12 6.58E+12 6.58E+12 6.55E+12
Best 2.40E+06 2.42E+06 2.48E+06 2.36E+06 3.68E+12 2.54E+12 3.02E+12 3.31E+12 3.45E+12 3.46E+12
Avg 3.29E+09 3.97E+09 3.85E+09 3.05E+09 5.40E+12 5.06E+12 5.09E+12 5.03E+12 4.42E+12 5.07E+12
Vowel
Worst 2.26E+04 2.90E+04 3.51E+04 2.27E+04 5.83E+05 4.49E+05 5.72E+05 5.76E+05 6.92E+05 4.43E+05
Best 1.74E+04 1.82E+04 1.77E+04 1.75E+04 2.45E+05 2.50E+05 2.60E+05 2.32E+05 2.37E+05 2.23E+05
Avg 2.01E+04 2.18E+04 3.40E+04 2.00E+04 4.04E+05 3.46E+05 3.87E+05 4.38E+05 4.46E+05 3.31E+05
IAS-1 achieved better results than the other versions in 60% of the clustering datasets; IAS-2, IAS-4, and IAS-6 each succeeded in 10% of the clustering datasets. Regarding the average fitness-function results for the different versions of the IAS, IAS-1 exceeded the other versions in 67% of the clustering datasets. The chaotic functions tend to reach the point closest to the objective function by finding optimal solutions.
5.3. Comparison of the Proposed Model with Other Metaheuristics
In this section, the first chaotic map-based IAS (IAS-1), called CIAS, is compared with
other basic metaheuristic algorithms in terms of different statistical criteria. The results of
other evaluations and comparisons are given below. The results related to the convergence
rate of the proposed model and comparative metaheuristic algorithms implemented on
10 datasets are shown in Figure 4.
The results related to the convergence rate of the proposed model and the fifth group
of comparative algorithms show that the proposed CIAS algorithm performed better than
the other metaheuristic algorithms in two of the fifth group of datasets. The results related
to the convergence rate of the proposed model and comparative algorithms implemented
on the whole dataset indicate that the proposed CIAS approach has achieved better results.
The proposed CIAS model performed better in 75% of clustering datasets. The results
related to the worst, best, and average solutions for the population of the proposed model
and other comparative algorithms are presented in Table 5.
Figure 4. Convergence rate of the proposed model and other comparative algorithms.
Table 5. Results related to the worst, best, and average solutions for the population of the proposed model and other comparative algorithms.
Dataset CSA ABC BA AEFA CIAS
Blood
worst 4.10E+05 3.90E+06 6.01E+05 4.88E+06 4.41E+05
best 4.08E+05 4.11E+05 6.01E+05 4.85E+05 4.41E+05
avg 4.09E+05 1.13E+06 6.01E+05 1.99E+06 4.41E+05
Cancer
worst 4.43E+03 9.45E+03 5.85E+03 3.57E+03 2.96E+03
best 4.09E+03 3.57E+03 5.81E+03 3.56E+03 2.96E+03
avg 4.30E+03 5.78E+03 5.82E+03 3.56E+03 2.96E+03
CMC
worst 6.47E+03 1.05E+04 7.72E+03 6.74E+03 5.53E+03
best 6.30E+03 5.95E+03 7.69E+03 6.74E+03 5.53E+03
avg 6.35E+03 7.75E+03 7.70E+03 6.74E+03 5.53E+03
Dermatology
worst 2.97E+03 3.51E+03 3.08E+03 3.14E+03 2.25E+03
best 2.97E+03 3.16E+03 3.07E+03 3.13E+03 2.24E+03
avg 2.97E+03 3.35E+03 3.07E+03 3.14E+03 2.25E+03
Iris
worst 1.06E+02 3.60E+02 1.50E+02 1.07E+02 9.67E+01
best 1.03E+02 1.22E+02 1.46E+02 1.05E+02 9.67E+01
avg 1.04E+02 2.21E+02 1.48E+02 1.07E+02 9.67E+01
Orl
worst 5.01E+05 7.77E+05 6.36E+05 7.33E+05 5.03E+05
best 5.00E+05 7.68E+05 6.36E+05 7.26E+05 5.03E+05
avg 5.00E+05 7.74E+05 6.36E+05 7.30E+05 5.03E+05
Steel
worst 2.99E+09 9.93E+09 6.82E+09 2.98E+10 5.81E+09
best 2.95E+09 2.15E+09 6.82E+09 6.30E+09 5.81E+09
avg 2.97E+09 3.40E+09 6.82E+09 1.85E+10 5.81E+09
Wine
worst 1.72E+04 1.83E+04 1.71E+04 5.35E+04 1.63E+04
best 1.71E+04 1.65E+04 1.71E+04 1.90E+04 1.63E+04
avg 1.72E+04 1.72E+04 1.71E+04 5.02E+04 1.63E+04
balance scale
worst 1.43E+03 1.72E+03 1.45E+03 1.43E+03 1.43E+03
best 1.43E+03 1.44E+03 1.44E+03 1.43E+03 1.43E+03
avg 1.43E+03 1.52E+03 1.44E+03 1.43E+03 1.43E+03
breasts
worst 3.43E+03 6.08E+03 3.05E+03 2.36E+03 2.02E+03
best 3.39E+03 2.32E+03 3.03E+03 2.36E+03 2.02E+03
avg 3.41E+03 4.34E+03 3.04E+03 2.36E+03 2.02E+03
glass
worst 4.37E+02 6.30E+02 3.69E+02 4.11E+02 2.53E+02
best 3.91E+02 3.07E+02 3.65E+02 4.10E+02 2.53E+02
avg 4.10E+02 5.03E+02 3.67E+02 4.11E+02 2.53E+02
Haberman
worst 2.62E+03 1.11E+04 2.94E+03 2.59E+03 2.57E+03
best 2.59E+03 2.62E+03 2.93E+03 2.59E+03 2.57E+03
avg 2.61E+03 3.90E+03 2.93E+03 2.59E+03 2.57E+03
heart
worst 1.10E+04 3.01E+04 1.45E+04 1.19E+04 1.06E+04
best 1.08E+04 1.07E+04 1.45E+04 1.13E+04 1.06E+04
avg 1.09E+04 1.37E+04 1.45E+04 1.18E+04 1.06E+04
Hepatitis
worst 1.24E+04 1.25E+04 1.48E+04 1.93E+04 1.18E+04
best 1.20E+04 1.18E+04 1.48E+04 1.48E+04 1.18E+04
avg 1.22E+04 1.21E+04 1.48E+04 1.93E+04 1.18E+04
Libras
worst 5.87E+02 9.16E+02 7.34E+02 7.78E+02 5.41E+02
best 5.85E+02 8.71E+02 7.23E+02 7.78E+02 5.40E+02
avg 5.86E+02 8.92E+02 7.26E+02 7.78E+02 5.41E+02
lung Cancer
worst 1.59E+02 1.71E+02 1.64E+02 1.65E+02 1.38E+02
best 1.58E+02 1.60E+02 1.63E+02 1.65E+02 1.38E+02
avg 1.59E+02 1.66E+02 1.63E+02 1.65E+02 1.38E+02
Madelon
worst 1.86E+06 3.91E+06 2.85E+06 2.67E+06 1.91E+06
best 1.86E+06 3.64E+06 2.85E+06 2.52E+06 1.90E+06
avg 1.86E+06 3.77E+06 2.85E+06 2.59E+06 1.90E+06
seeds
worst 3.77E+02 1.04E+03 3.69E+02 3.68E+02 3.12E+02
best 3.67E+02 3.72E+02 3.63E+02 3.65E+02 3.12E+02
avg 3.71E+02 5.29E+02 3.64E+02 3.66E+02 3.12E+02
speech
worst 4.65E+12 2.41E+12 6.92E+12 3.63E+13 3.00E+12
best 3.71E+12 2.16E+12 6.92E+12 7.07E+12 3.00E+12
avg 4.18E+12 2.26E+12 6.92E+12 1.68E+13 3.00E+12
vowel
worst 1.71E+05 3.73E+05 2.55E+05 4.16E+05 1.62E+05
best 1.69E+05 1.92E+05 2.55E+05 2.09E+05 1.62E+05
avg 1.70E+05 2.58E+05 2.55E+05 3.27E+05 1.62E+05
The worst, best, and average population solutions for the proposed model and the other comparative algorithms demonstrate that CIAS outperformed the competitors on all three measures across the clustering datasets. In this section, the simulation and parameter determination were first carried out. Then the different chaotic-map-based versions of the IAS (CIAS-1, CIAS-2, ..., CIAS-10) were compared on various statistical criteria. Further evaluations and comparisons showed that the Chebyshev chaotic map achieved better results than the other chaotic maps. The Chebyshev-based IAS was then compared with basic metaheuristic algorithms such as BA, CSA, ABC, and AEFA. The results of these experiments indicate that the Chebyshev-based IAS has better convergence and performance than the other basic metaheuristic algorithms.
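For concreteness, the Chebyshev map referred to above is commonly written in the chaotic-metaheuristics literature as x(k+1) = cos(k · arccos(x(k))). The minimal Python sketch below (the paper itself was implemented in MATLAB) illustrates how such a sequence could be generated and rescaled to drive an algorithm's random components; the exact variant used in the paper is an assumption here.

```python
import math

def chebyshev_sequence(x0: float, n: int) -> list:
    """Generate n values of the Chebyshev chaotic map.

    Common formulation: x_{k+1} = cos(k * arccos(x_k)), x in [-1, 1].
    The exact variant used in the paper is assumed, not confirmed.
    """
    xs, x = [], x0
    for k in range(1, n + 1):
        x = math.cos(k * math.acos(x))
        xs.append(x)
    return xs

# Rescale the chaotic values from [-1, 1] into [0, 1] so they can
# stand in for uniform random numbers inside the metaheuristic.
unit_values = [(v + 1.0) / 2.0 for v in chebyshev_sequence(0.7, 100)]
```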
6. Real Application: Binary CIAS on COVID-19 Dataset
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19) (https://github.com/Atharva-Peshkar/Covid-19-Patient-Health-Analytics, accessed on 22 January 2023), an exceptionally infectious and dangerous illness. In December 2019, Wuhan, China, was the location of the first confirmed case, which was quickly followed by rapid global spread. Due to the escalating number of likely COVID-19 acute respiratory cases and the disease's high fatality rates, the World Health Organization (WHO) proclaimed COVID-19 a worldwide catastrophe. It is vital to develop effective processes that consistently identify potential cases of COVID-19 to halt its spread and partially alleviate the global crisis, enabling likely patients to be isolated from the general population. Several optimization approaches are being developed as part of the response to the COVID-19 pandemic; they can be separated into distinct categories: screening, monitoring, prediction, and diagnosis. In recent times, a significant number of diagnostic procedures that detect COVID-19 by exploiting efficient features taken from clinical datasets have been developed. So far, various models such as BE-WOA [28], the Binary Simulated Normal Distribution Optimizer (BSNDO) [29], and Artificial Gorilla Troop Optimization (AGTO) [30] have been proposed for the diagnosis of COVID-19.
The applicability and performance of the BIAS are tested on the novel coronavirus 2019 dataset, a pre-processed and released version of the original COVID-19 dataset. The results of these evaluations are discussed in the sections that follow. Table 6 describes the pre-processed dataset. In that table, the column labeled "diff_sym_hos" contains the number of days between the date on which symptoms were first observed (the column "sym_on" in the raw dataset) and the date on which the patient checked into the hospital (the column "hosp_vis" in the original dataset). All categorical columns in the pre-processed dataset were label-encoded by assigning a number to each distinct categorical value in the column. The dataset includes 864 cases and 14 attributes.
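As a hedged illustration of this pre-processing, the sketch below assumes pandas and raw columns named sym_on and hosp_vis, per the description above; the file name is a placeholder, and the actual pipeline used by the dataset's authors may differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column names ("sym_on", "hosp_vis") follow the text;
# the file name is a placeholder.
df = pd.read_csv("covid19_raw.csv", parse_dates=["sym_on", "hosp_vis"])

# Days elapsed between first symptoms and hospital admission.
df["diff_sym_hos"] = (df["hosp_vis"] - df["sym_on"]).dt.days

# Label-encode each categorical column: one integer per distinct value.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))
```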
The experiment’s results were repeated 20 times to evaluate the BIAS’s performance
compared to other algorithms. The K Nearest Neighbor (KNN) classifier was used with k
equal to 3 and the 10-fold cross-validation approach to construct the classification model
for every algorithm.
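A minimal sketch of this evaluation protocol, assuming Python with scikit-learn rather than the paper's MATLAB implementation, could look as follows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def subset_accuracy(X: np.ndarray, y: np.ndarray, mask: np.ndarray) -> float:
    """Mean 10-fold cross-validated accuracy of a 3-NN classifier
    restricted to the feature columns selected by a 0/1 mask."""
    knn = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(knn, X[:, mask.astype(bool)], y, cv=10).mean()
```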
Table 6. Description of the novel coronavirus 2019 dataset.
No. Features Name Description
1 Location The location where patients belong to
2 Country The country where patients belong to
3 Gender The gender of patients
4 Age The ages of the patients
5 vis_wuhan (Yes: 1, No: 0) Whether the patients visited Wuhan, China
6 from_wuhan (Yes: 1, No: 0) Whether the patients are from Wuhan, China
7 symptom 1 Fever
8 symptom 2 Cough
9 symptom 3 Cold
10 symptom 4 Fatigue
11 symptom 5 Body pain
12 symptom 6 Malaise
13 diff_sym_hos The difference in days between the symptoms being noticed and admission to the hospital
14 Class The class of a patient: either death or recovery
6.1. Fitness Function
The main challenge is determining which features from a dataset will help a classifier correctly identify the category to which a sample belongs [31,32]. While selecting essential features, we must automatically rule out those that are redundant. When the selected feature subset is used for classification, we can maximize the classification accuracy of the classification problem [33]. In this paper, BIAS is used to identify the most helpful feature subset, and a classifier is then used to determine how accurately this feature subset can be classified. Let ACC stand for the classification accuracy of the model as determined by the classifier, Da for the dimension of the feature subset, and Na for the total number of attributes in the initial dataset. The classification error is therefore (1 − ACC), and the proportion of features chosen from the complete dataset is Da/Na. The fitness function is defined according to Equation (6).

FF = α × (1 − ACC) + (1 − α) × (Da/Na)    (6)

In Equation (6), α ∈ [0, 1] denotes the weight given to the classification error.
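A direct transcription of Equation (6) might look as follows; the value of α is not reported in this section, so the 0.99 default below is an assumption borrowed from common feature-selection practice, not a value taken from the paper.

```python
def fitness(acc: float, n_selected: int, n_total: int,
            alpha: float = 0.99) -> float:
    """Equation (6): FF = alpha * (1 - ACC) + (1 - alpha) * Da / Na.

    alpha = 0.99 is an assumed default, not a value from the paper.
    """
    return alpha * (1.0 - acc) + (1.0 - alpha) * (n_selected / n_total)
```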
6.2. Transfer Function
Since FS is a binary optimization problem, each decision variable must end up as 0 or 1, where 0 indicates that the feature is not selected because it is unnecessary and 1 indicates that it is chosen because it is beneficial. However, the continuous positions produced by the search agents may fall outside this range. A binarization function must therefore be applied to each agent to guarantee that the output always falls within the selected range. The sigmoid (S-shaped) transfer function carries out this task in BIAS and is defined according to Equation (7).

T(x) = 1 / (1 + e^(−x))    (7)
X_d(t) = 1, if rand < T(X_d(t))
X_d(t) = 0, if rand ≥ T(X_d(t))    (8)
The transfer function maps its input to the range [0, 1]. If its output is greater than rand, a random number drawn from a uniform distribution between 0 and 1, the dimension's value is set to 1. If the output is equal to or lower than rand, the value is set to 0, and since the attribute is deemed unnecessary, it is not considered.
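Equations (7) and (8) together amount to the following per-dimension binarization, sketched here in Python for illustration.

```python
import math
import random

def binarize(x: float) -> int:
    """Equations (7)-(8): sigmoid transfer plus stochastic rounding.

    T(x) = 1 / (1 + e^(-x)); the dimension becomes 1 when a uniform
    random number is below T(x), and 0 otherwise.
    """
    t = 1.0 / (1.0 + math.exp(-x))
    return 1 if random.random() < t else 0

# A continuous agent position is binarized dimension by dimension.
feature_mask = [binarize(v) for v in [0.8, -1.2, 2.5, 0.1]]
```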
6.3. Evaluation Criteria
The BIAS’s effectiveness was evaluated based on accuracy, Recall, precision, F-measure,
and the total amount of features (selection size).
Precision:
The significance of the results is defined by the accuracy of the results,
which is represented as the ratio of successfully predicted positive observations to the total
number of positive observations.
Precision =TP
TP +FP (9)
Recall:
The term “recall” refers to the proportion of accurately predicted affirmative
observations relative to the total number of observations in an actual class that answer “yes”.
Recall =TP
TP +FN (10)
F-measure:
The F1 Score is another method for determining the correctness of an
experiment. It is calculated using the weighted mean of the Precision and Recall scores. As
a result, this score considers the possibility of both false positives and negatives.
FMeasure =2×Precision ×Recall
Precision +Recall (11)
Accuracy:
Accuracy is the measurable statistic that correctly classifies the occurrence
instance, and it is simply a ratio of predicted correct observations to the total sample size.
It is the performance measure that is the most intuitive to assess.
Accuracy =TP +TN
TP +TN +FP +FN (12)
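These four criteria can be computed directly from confusion-matrix counts, as in the following illustrative sketch.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Equations (9)-(12) computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2.0 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}
```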
According to the accuracy values in Table 7, BIAS achieved the highest average accuracy, 96.25%, while BABC, BBA, BCSA, and BAEFA reached 91.95%, 92.48%, 95.32%, and 94.79%, respectively.
Compared to the other algorithms, the proposed model performed significantly better in recall, precision, accuracy, and F-measure, as evidenced by the experimental findings. Within the search space, BIAS investigates regions that are relatively close to the global optimum. During the exploration and exploitation phases, the BIAS versions search the most promising areas of the search space. According to BIAS's search-history analyses, the distribution of candidate solution points around the global optimum is denser than that of BCSA. Figure 5 compares the performance of BIAS with the other algorithms based on recall, precision, F-measure, and accuracy.
Table 7. Comparison of the BIAS and other algorithms based on the accuracy.
Models Iterations Precision Recall F-Measure Accuracy
BABC 100 91.15 91.24 91.19 91.21
200 91.48 91.62 91.55 91.95
BBA 100 92.29 92.38 92.33 92.15
200 92.37 92.51 92.44 92.48
BCSA 100 94.14 94.26 94.20 94.71
200 95.25 95.38 95.31 95.32
BAEFA 100 94.06 94.19 94.09 94.36
200 94.52 94.63 94.57 94.79
Proposed Model 100 95.53 95.76 95.64 95.84
200 96.04 96.35 96.19 96.25
Figure 5. Performance comparison of BIAS with other algorithms.
The BIAS optimization algorithm was used to produce the best possible feature set, displayed in Table 8. BIAS identifies the best potential subset of six of the thirteen provided features, which are used to predict COVID-19 positivity in individuals exhibiting various symptoms. Compared to the input feature set of 14 features (Table 6), 8 features were removed; the vast majority of the removed features pertain to personal information such as age, sex, etc. The results of the suggested model under different feature selections are presented in Table 8. With five features, the accuracy ranges from 98.41% to 98.68%. With six features, the highest accuracy is 98.23% and the lowest is 98.06%, with corresponding recall and precision scores of 98.35% and 98.26% (Table 8). Over seven distinct features, the accuracy ranges from 97.76% to 98.31%; with eight features, it ranges from 97.52% to 97.65%. When ten features are selected from the feature space, precision is 97.29%, recall 97.48%, F-measure 97.38%, and accuracy 97.52%. With 11 features, the highest accuracy is 96.92% and the lowest is 96.84%. According to the findings, the proposed model attains a better accuracy percentage with fewer features than its competitors.
Table 8. Results of the BIAS Feature Selection.
Features Precision Recall F-Measure Accuracy
5 98.32 98.43 98.37 98.68
5 98.41 98.47 98.44 98.41
6 98.26 98.35 98.30 98.23
6 98.14 98.46 98.30 98.06
7 98.45 98.55 98.50 98.31
7 97.35 97.49 97.42 97.76
8 97.50 97.58 97.54 97.65
8 97.14 97.30 97.22 97.52
9 97.32 97.58 97.45 97.41
10 97.29 97.48 97.38 97.52
10 97.06 97.19 97.12 97.13
11 96.58 96.67 96.62 96.84
11 96.61 96.75 96.68 96.92
12 96.35 96.42 96.38 96.56
12 96.42 96.56 96.49 96.42
13 96.11 96.20 96.15 96.25
7. Conclusions and Future Works
The IAS is a population-based metaheuristic optimization algorithm with three robust
operators: individual training sessions, group training sessions, and new student challenges.
This paper presented an improved chaotic IAS to solve data clustering problems. First, ten different chaotic maps were used to generate different versions of the IAS (i.e., CIAS-1, CIAS-2, ..., and CIAS-10). Next, 20 valid UCI clustering datasets were used to evaluate the proposed approaches. In addition, the intra-cluster summation fitness function was defined as the fitness function for the proposed model and the other comparative algorithms. The improved chaotic IAS was implemented in MATLAB 2019 with an initial population of 20 and 100 iterations. First, CIAS-1, CIAS-2, ..., and CIAS-10 were compared on different criteria; the various evaluations and comparisons showed that the Chebyshev-chaotic-map-based version performed better. Finally, the Chebyshev-based IAS was compared with other basic metaheuristic methods such as the BA, the CSA, the ABC, and the AEFA. The various experiments showed that this version of the IAS has better convergence and performance than the other basic metaheuristic algorithms. Furthermore, BIAS was tested on a COVID-19 dataset for detecting the coronavirus disease. Future research will consider a multi-objective IAS with chaotic maps for solving high-dimensional data clustering.
Author Contributions: Conceptualization, F.S.G. and A.A.K.; methodology, F.S.G.; software, A.A.K.; validation, F.S.G. and A.A.K.; formal analysis, A.A.K.; investigation, A.A.K.; resources, F.S.G.; data curation, F.S.G.; writing—original draft preparation, A.A.K.; writing—review and editing, F.S.G.; visualization, A.A.K.; supervision, F.S.G.; project administration, F.S.G.; funding acquisition, F.S.G. All authors have read and agreed to the published version of the manuscript.
Funding: This paper received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: All data used in this manuscript were downloaded from the UCI repository.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Sorkhabi, L.B.; Gharehchopogh, F.S.; Shahamfar, J. A systematic approach for pre-processing electronic health records for mining: Case study of heart disease. Int. J. Data Min. Bioinform. 2020, 24, 97–120. [CrossRef]
2. Arasteh, B.; Abdi, M.; Bouyer, A. Program source code comprehension by module clustering using combination of discretized gray wolf and genetic algorithms. Adv. Eng. Softw. 2022, 173, 103252. [CrossRef]
3. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S.; Ewees, A.A.; Abualigah, L.; Elaziz, M.A. MTV-MFO: Multi-Trial Vector-Based Moth-Flame Optimization Algorithm. Symmetry 2021, 13, 2388. [CrossRef]
4. Izci, D. A novel improved atom search optimization algorithm for designing power system stabilizer. Evol. Intell. 2022, 15, 2089–2103. [CrossRef]
5. Ekinci, S.; Izci, D.; Al Nasar, M.R.; Abu Zitar, R.; Abualigah, L. Logarithmic spiral search based arithmetic optimization algorithm with selective mechanism and its application to functional electrical stimulation system control. Soft Comput. 2022, 26, 12257–12269. [CrossRef]
6. Arasteh, B.; Sadegi, R.; Arasteh, K. Bölen: Software module clustering method using the combination of shuffled frog leaping and genetic algorithm. Data Technol. Appl. 2021, 55, 251–279. [CrossRef]
7. Arasteh, B.; Sadegi, R.; Arasteh, K. ARAZ: A software modules clustering method using the combination of particle swarm optimization and genetic algorithms. Intell. Decis. Technol. 2020, 14, 449–462. [CrossRef]
8. Gharehchopogh, F.S.; Gholizadeh, H. A comprehensive survey: Whale Optimization Algorithm and its applications. Swarm Evol. Comput. 2019, 48, 1–24. [CrossRef]
9. Jahangiri, M.; Hadianfard, M.A.; Najafgholipour, M.A.; Jahangiri, M.; Gerami, M.R. Interactive autodidactic school: A new metaheuristic optimization algorithm for solving mathematical and structural design optimization problems. Comput. Struct. 2020, 235, 106268. [CrossRef]
10. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report TR06; Erciyes University: Ercis, Turkey, 2005.
11. Yang, X.-S. A new metaheuristic bat-inspired algorithm. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010); Springer: Berlin/Heidelberg, Germany, 2010; pp. 65–74.
12. Askarzadeh, A. A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Comput. Struct. 2016, 169, 1–12. [CrossRef]
13. Yadav, A.; Kumar, N. Artificial electric field algorithm for engineering optimization problems. Expert Syst. Appl. 2020, 149, 113308. [CrossRef]
14. Ahmadi, R.; Ekbatanifard, G.; Bayat, P. A Modified Grey Wolf Optimizer Based Data Clustering Algorithm. Appl. Artif. Intell. 2021, 35, 63–79. [CrossRef]
15. Ashish, T.; Kapil, S.; Manju, B. Parallel bat algorithm-based clustering using mapreduce. In Networking Communication and Data Knowledge Engineering; Springer: Berlin/Heidelberg, Germany, 2018; pp. 73–82.
16. Eesa, A.S.; Orman, Z. A new clustering method based on the bio-inspired cuttlefish optimization algorithm. Expert Syst. 2020, 37, e12478. [CrossRef]
17. Olszewski, D. Asymmetric k-means algorithm. In Adaptive and Natural Computing Algorithms; Lecture Notes in Computer Science; Dobnikar, A., Lotric, U., Ster, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6594, pp. 1–10.
18. Aggarwal, S.; Singh, P. Cuckoo and krill herd-based k-means++ hybrid algorithms for clustering. Expert Syst. 2019, 36, e12353. [CrossRef]
19. Zhang, G.; Zhang, C.; Zhang, H. Improved K-means algorithm based on density Canopy. Knowl. Based Syst. 2018, 145, 289–297. [CrossRef]
20. Kumar, A.; Kumar, D.; Jarial, S. A novel hybrid K-means and artificial bee colony algorithm approach for data clustering. Decis. Sci. Lett. 2018, 7, 65–76. [CrossRef]
21. Nasiri, J.; Khiyabani, F.M. A whale optimization algorithm (WOA) approach for clustering. Cogent Math. Stat. 2018, 5, 1483565. [CrossRef]
22. Qaddoura, R.; Faris, H.; Aljarah, I. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 8387–8412. [CrossRef]
23. Zhou, Y.; Wu, H.; Luo, Q.; Abdel-Baset, M. Automatic data clustering using nature-inspired symbiotic organism search algorithm. Knowl. Based Syst. 2019, 163, 546–557. [CrossRef]
24. Ewees, A.A.; Elaziz, M.A. Performance analysis of Chaotic Multi-Verse Harris Hawks Optimization: A case study on solving engineering problems. Eng. Appl. Artif. Intell. 2020, 88, 103370. [CrossRef]
25. Chen, K.; Zhou, F.; Liu, A. Chaotic dynamic weight particle swarm optimization for numerical function optimization. Knowl. Based Syst. 2018, 139, 23–40. [CrossRef]
26. Zhang, X.; Xu, Y.; Yu, C.; Heidari, A.A.; Li, S.; Chen, H.; Li, C. Gaussian mutational chaotic fruit fly-built optimization and feature selection. Expert Syst. Appl. 2019, 141, 112976. [CrossRef]
27. Gharehchopogh, F.S.; Nadimi-Shahraki, M.H.; Barshandeh, S.; Abdollahzadeh, B.; Zamani, H. CQFFA: A Chaotic Quasi-oppositional Farmland Fertility Algorithm for Solving Engineering Optimization Problems. J. Bionic Eng. 2022, 20, 158–183. [CrossRef]
28. Nadimi-Shahraki, M.H.; Zamani, H.; Mirjalili, S. Enhanced whale optimization algorithm for medical feature selection: A COVID-19 case study. Comput. Biol. Med. 2022, 148, 105858. [CrossRef]
29. Ahmed, S.; Sheikh, K.H.; Mirjalili, S.; Sarkar, R. Binary Simulated Normal Distribution Optimizer for feature selection: Theory and application in COVID-19 datasets. Expert Syst. Appl. 2022, 200, 116834. [CrossRef]
30. Piri, J.; Mohapatra, P.; Acharya, B.; Gharehchopogh, F.S.; Gerogiannis, V.C.; Kanavos, A.; Manika, S. Feature Selection Using Artificial Gorilla Troop Optimization for Biomedical Data: A Case Analysis with COVID-19 Data. Mathematics 2022, 10, 2742. [CrossRef]
31. Nadimi-Shahraki, M.H.; Fatahi, A.; Zamani, H.; Mirjalili, S. Binary Approaches of Quantum-Based Avian Navigation Optimizer to Select Effective Features from High-Dimensional Medical Data. Mathematics 2022, 10, 2770. [CrossRef]
32. Nadimi-Shahraki, M.H.; Fatahi, A.; Zamani, H.; Mirjalili, S.; Oliva, D. Hybridizing of Whale and Moth-Flame Optimization Algorithms to Solve Diverse Scales of Optimal Power Flow Problem. Electronics 2022, 11, 831. [CrossRef]
33. Nadimi-Shahraki, M.H.; Moeini, E.; Taghian, S.; Mirjalili, S. DMFO-CD: A Discrete Moth-Flame Optimization Algorithm for Community Detection. Algorithms 2021, 14, 314. [CrossRef]
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.