Citation: Gharehchopogh, F.S.; Khargoush, A.A. A Chaotic-Based Interactive Autodidactic School Algorithm for Data Clustering Problems and Its Application on COVID-19 Disease Detection. Symmetry 2023, 15, 894. https://doi.org/10.3390/sym15040894
Academic Editors: Jeng-Shyang Pan, Zhixun Su and Alexander Shelupanov
Received: 18 February 2023; Revised: 13 March 2023; Accepted: 7 April 2023; Published: 10 April 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
A Chaotic-Based Interactive Autodidactic School Algorithm for
Data Clustering Problems and Its Application on COVID-19
Disease Detection
Farhad Soleimanian Gharehchopogh * and Aysan Alavi Khargoush
Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia 5716963896, Iran
*Correspondence: bonab.farhad@gmail.com; Tel.: +98-91-4176-4427
Abstract: In many disciplines, including pattern recognition, data mining, machine learning, image analysis, and bioinformatics, data clustering is a common analytical tool for data statistics. Most conventional clustering techniques converge slowly and frequently become trapped in local optima. In this regard, population-based metaheuristic algorithms are used to overcome the problem of getting trapped in local optima and to increase the convergence speed. This paper proposes an asymmetric, population-based approach to clustering: the Interactive Autodidactic School (IAS), a population-based metaheuristic and asymmetry algorithm, is used to solve the clustering problem. The chaotic IAS algorithm also increases exploitation and generates a better population. In the proposed model, ten different chaotic maps and the intra-cluster summation fitness function are used to improve the results of the IAS. According to the simulation findings, the IAS based on the Chebyshev chaotic map outperformed the other chaotic IAS variants and other metaheuristic algorithms. The efficacy of the proposed model is finally highlighted by comparing its performance with optimization algorithms in terms of fitness function and convergence rate; the algorithm can also be used in different engineering problems. Moreover, the Binary IAS (BIAS) is used to detect coronavirus disease 2019 (COVID-19). The results demonstrate that the accuracy of BIAS on the COVID-19 dataset is 96.25%.
Keywords: interactive autodidactic school algorithm; chaotic maps; data clustering; optimization
1. Introduction
One of the main scientific fields of machine learning and data mining is data clustering.
It involves separating a set of objects into groups of similar objects [1]. In other words, data
clustering is a branch of unsupervised learning and an automatic process that divides
samples into categories whose members are similar. Data clustering aims to represent an extensive dataset with a smaller number of representative samples or clusters; this simplifies the data for modeling and plays a significant role in exploration and data mining. Clustering means identifying similar classes of objects. It also makes it possible to identify the dense and scattered areas in the object space, discover the general distribution pattern, and find the correlation properties between the data. Clustering techniques combine visible samples within clusters
that meet two main criteria: (1) each group or cluster is homogeneous; and (2) each group
or cluster must be different from other clusters. The most crucial clustering techniques are
hierarchical, distribution, partition, density, fuzzy, and graph-based clustering [2,3].
Using asymmetric similarities and dissimilarities is one solution to data clustering. For these measures to accurately reflect the hierarchical asymmetric relationships between items in the studied dataset, they must be applied in algorithms in an appropriate manner; that is, their use should be consistent with the data's hierarchical linkages. This can be accomplished with asymmetry coefficients and cluster coefficients inserted into the formulas for symmetric measures, i.e., by building the asymmetric measures on top of the symmetric ones. The asymmetry coefficients and cluster coefficients should guarantee the consistency of the hierarchy: in the case of similarities, they should yield greater values in the direction from a more specific notion to a more generic one.
Clustering means assigning samples to different cluster centers based on proximity and intra-cluster similarity. K-means clustering is widely used as one of the classical methods because of its easy implementation and low computational cost [4]. However, K-means requires the number of clusters to be specified beforehand, whereas in many practical applications users have no information about the number of clusters. If the clustering algorithm tries different numbers of clusters to find the optimal configuration, finding the correct number becomes time-consuming and challenging. Therefore, to overcome this limitation, intelligent clustering methods should automatically determine the optimal number of clusters and obtain a better partitioning [5].
Optimization algorithms are critical computational tools in engineering, and their application has grown significantly over the past decades. Optimization algorithms are divided into analytical and metaheuristic methods. Analytical approaches, also called gradient-based algorithms, are deterministic and always offer the same optimal solution from the same starting point [6]. Although these numerical methods work well in solving optimization problems, they have three significant drawbacks compared with metaheuristic methods [7]. First, numerical methods cannot be used when the fitness function and constraints are discrete, since their gradients are not defined. Second, numerical methods may get trapped in local minima due to their dependence on the value of the starting point. Finally, numerical methods are unstable and unreliable when the fitness function and constraints have multiple or sharp peaks. Researchers have therefore turned to new stochastic approaches with specific features, instead of traditional analytical techniques, to solve complex engineering optimization problems.
Metaheuristic algorithms are essential in solving optimization problems; they are among the most successful methods for solving various complex optimization problems. These algorithms provide near-optimal solutions for optimization problems. Metaheuristic algorithms are often inspired by the systems and behavior of animals in nature, such as flocks of birds, ant colonies, and fish schools; the behavior of the members of these algorithms mimics the way the inspiring creatures search for the best food sources. Most metaheuristic optimization algorithms share similar characteristics: they are stochastic (random-walk) algorithms, independent of gradient information, iterative, and applicable to continuous and discrete problems. The performance of any metaheuristic algorithm depends on the complexity of the cost function and the constraints that define the feasible search space. Metaheuristic algorithms have been used to solve various optimization problems and have been successful in many of them, including clustering. Classical clustering algorithms such as k-means often converge to local optima and have slow convergence rates on larger datasets. Clustering-based algorithms use swarm-based metaheuristic methods to overcome such issues. Swarm- or population-based metaheuristic approaches strive to achieve the optimal clustering solution in a reasonable time [8].
The IAS is a novel metaheuristic algorithm proposed by Jahangiri in 2020 [9]. It simulates the interactions of a group of students trying to learn without the help of a teacher; thus, an autodidactic school is created. To explore the search space looking for the optimal solution, the IAS, as with other population-based algorithms, iteratively uses a population in which the best student is called the leader and the rest of the community are called the followers. This paper implements an improved IAS based on chaotic maps on various clustering datasets.
The proposed model is appraised on different benchmark test functions to analyze its efficiency and accuracy. The experimental results demonstrate that the performance of the proposed model is improved in terms of global search and convergence rate. The proposed model is analyzed considering statistical criteria such as the best, worst, and average solutions and the standard deviation. Moreover, its convergence is compared with other metaheuristic algorithms such as the Artificial Bee Colony (ABC) [10], Bat Algorithm (BA) [11], Crow Search Algorithm (CSA) [12], and Artificial Electric Field Algorithm (AEFA) [13]. Then, the IAS is extended to transfer the continuous search space to a binary one using the S-shaped transfer function. Furthermore, the BIAS was applied in a case study to detect coronavirus disease 2019 (COVID-19). The experimental results prove that BIAS is more efficient than other comparative algorithms in searching the problem space and selecting the most compelling features. The contributions of this paper are as follows:
• Increasing the discovery of the optimal solution in the proposed model, with a balance between exploration and exploitation provided by chaotic maps;
• Providing an improved version of the IAS for the data clustering problem based on chaotic maps;
• Evaluating the proposed model on 20 UCI datasets;
• Assessing the proposed model based on fitness function and convergence rate;
• Developing the BIAS as the binary version of the IAS using the V-shaped transfer function to find valuable features from COVID-19 data (a minimal binarization sketch follows this list);
• Comparing the proposed model with ABC, BA, CSA, and AEFA;
• Applying the BIAS in a case study to detect COVID-19.
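Since the abstract and this list refer to S-shaped and V-shaped transfer functions respectively, the following minimal Python sketch assumes the common S-shaped sigmoid option; it maps a continuous student position to a 0/1 feature-selection mask, and all names are illustrative rather than taken from the paper:

```python
import numpy as np

def s_shaped_binarize(position, rng=np.random.default_rng(0)):
    """Map a continuous position to a binary feature mask using the
    S-shaped (sigmoid) transfer function T(x) = 1 / (1 + exp(-x))."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(position, dtype=float)))
    # A feature is selected (bit = 1) with probability T(x)
    return (rng.random(probs.shape) < probs).astype(int)

mask = s_shaped_binarize([2.1, -0.7, 0.0, 1.5])  # e.g., [1, 0, 1, 1] (stochastic)
```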
The rest of the paper is organized as follows. In Section 2, related works in clustering by metaheuristic algorithms are surveyed. Section 3 describes material and methods such as the IAS algorithm and chaotic maps. Section 4 proposes a new version of the improved IAS algorithm based on chaotic maps for data clustering. In Section 5, the performance of the proposed model is compared with other algorithms on the clustering datasets. Section 6 establishes the actual application of the proposed BIAS for extracting essential features from the COVID-19 dataset. Finally, Section 7 provides concluding remarks and suggestions for future research.
2. Related Works
This section presents the subject’s background and related literature in data clustering
using metaheuristic algorithms. Here, the aim is to review recent data clustering improve-
ments using metaheuristic algorithms. Therefore, the related works are presented below in
the order of publishing time.
Ahmadi et al. [14] presented an improved version of the Grey Wolf Optimizer (GWO)
algorithm for clustering problems. A modified GWO has been proposed to address some
metaheuristic algorithms’ challenges. This modification includes a balancing approach
between exploring and exploiting the GWO and a local search for the best solution. The re-
sults show that the proposed model has a lower intra-cluster distance than other algorithms
and a mean error of about 11%, which is the lowest among all comparison algorithms.
Ashish et al. [15] proposed a fast and efficient parallel BA for data clustering using a MapReduce architecture. The parallel BA is very efficient and helpful since it uses
an evolutionary approach to clustering instead of other algorithms, such as k-means; it also
enjoys high speed due to Hadoop architecture. The results of various experiments show
that the parallel BA performs better than Particle Swarm Optimization (PSO); it performs
faster than other comparative algorithms when the number of nodes increases.
The applicability of the Cuttlefish Algorithm (CFA) to clustering issues was examined in [16]. Additionally, it was demonstrated that the CFA can find
the optimal cluster centers. The technique has prevented the cluster centers from readily
becoming trapped in a local minimum, a significant drawback of the K-means. The CFA was
used as a search method to reduce the clustering metrics. The performance of the CFA-Clustering model was assessed on the Shapes and UCI real-world datasets and compared with three well-known algorithms: the Genetic Algorithm (GA), PSO, and K-means. The
empirical findings show that, for the most part, the CFA-Clustering approach outperforms
the other methods.
An asymmetric version of the k-means clustering algorithm [17] arises from the use of dissimilarities that are asymmetric by definition (for example, the Kullback–Leibler divergence).
Cuckoo and krill herd algorithms are utilized on k-means++ to improve cluster quality and create optimized clusters [18]. Performance parameters such as accuracy, error rate, f-measure, CPU time, standard deviation, cluster-quality checks, and so forth are used to measure the clustering potential of these algorithms. The results demonstrated the high performance of the newly designed algorithm.
Zhang et al. [19] proposed an improved K-means algorithm based on canopy density in 2018 to improve the K-means algorithm's accuracy and stability and to address the issue of selecting the best starting seeds and the optimal number K of clusters. The first step is to compute the density of the sample datasets, the average sample distance inside clusters, and the distance between clusters. The density-maximum sampling point is then selected as the first cluster center, and the density cluster is removed from the sample datasets. The K-means technique uses the density canopy as a pre-processing step, and the output is utilized to determine the cluster number and the starting clustering centers. Comparative results show that the improved K-means
algorithm based on canopy density has obtained better clustering results. The improved
K-means algorithm based on canopy density is less sensitive to noisy data than the K-
means algorithm, the canopy-based K-means algorithm, the semi-supervised K-means++
algorithm, and the K-means-u algorithm. The clustering accuracy of the proposed canopy
density-based K-means algorithm is improved by an average of 30.7%, 6.1%, 5.3%, and 3.7%
in the UCI dataset, respectively, and by 44.3%, 3.6%, 9.6%, and 8.9%, respectively, in the
simulated dataset with the improved noise signal. It enjoys a more accurate performance
than comparative algorithms.
To exploit the advantages of the ABC and K-means algorithms, Kummer et al. [20] proposed a hybrid algorithm combining them, called the MABCKM algorithm. The hybrid MABCKM algorithm modifies the solutions generated by ABC and
considers them as the initial solutions for the K-means algorithm. According to the results
obtained from comparing the performance of MABCKM, K-means, and ABC algorithms
on different datasets taken from the UCI repository, it is clear that MABCKM outperforms
other comparative algorithms.
The Whale Clustering Optimization Algorithm (WOA) was proposed for clustering data [21]. The results of WOA are compared with the well-known k-means clustering
method and other standard stochastic algorithms such as PSO, ABC, Differential Evolution
(DE), and GA clustering. The proposed model was checked using one artificial and seven
real benchmark datasets from the UCI. Simulations have proven that the proposed model
could successfully be used for data clustering.
Qaddoura et al. [22] presented an improved version of the GA's evolutionary behavior
as well as the advanced performance of the nearest neighbor search technique for clustering
problems based on allocation and selection mechanisms. The success of evolutionary
algorithms in solving various machine learning problems, including clustering, has been
proven. The proposed model’s objective was to improve the quality of clustering results
by identifying a solution that maximizes differentiation between different clusters and
coherence between data points within the same cluster. Various experiments show that
the proposed model works well with the Silhouette coefficient’s fitness function and
outperforms other algorithms.
Zhou et al. [23] presented an enhanced version of the symbiotic organism search
(SOS) algorithm to solve data clustering. It evokes the symbiotic interaction strategies
used by organisms in the ecosystem to survive and spread. This paper implemented
the proposed model on ten standard UCI machine-learning repository datasets. Various experiments showed that the SOS algorithm performed better than other algorithms in accuracy and precision.
Rahnema and Gharehchopogh proposed an improved ABC based on the whale optimization algorithm for data clustering in 2020 [2]. In this paper, two random and elite memories are used in the ABC to overcome the problems of poor exploration and late convergence. Finally, the proposed model was evaluated by being implemented on ten standard datasets taken from the UCI machine learning repository. Ewees et al. presented an improved chaotic version of the Multi-Verse Harris Hawks Optimization (CMVHHO) [24]. The primary purpose of this algorithm was to use chaotic maps to determine optimized values of the main parameters of the Harris Hawks algorithm. In addition, chaos was used as a local search approach to improve the ability to exploit the search space. It was tested using several different chaotic maps. Experimental results show that the Circle chaotic map is the best function among all available functions, since it improved the performance of the proposed model and had a positive effect on its behavior.
Chen et al. presented a chaotic-based dynamic weighted PSO algorithm [25]. The proposed model introduces a chaotic map and a dynamic weight to modify the search process; the dynamic weight, derived from the fitness function, increases the search accuracy and performance of the proposed model. Various experiments show that the proposed model outperformed nature-inspired and PSO algorithms on almost all functions.
To overcome the shortcomings of the Fruit Fly Optimization (FFO) algorithm [26], Zhang et al. proposed a new version of the FFO using the Gaussian mutation operator and
Zhang et al. proposed a new version of the FFO using the Gaussian mutation operator and
the local chaotic search strategy. The Gaussian mutation operator is integrated into the FFO
algorithm to prevent premature convergence and improve the exploration process. Then,
a chaotic local search approach is adopted to increase the group’s local search ability; the
results prove that the proposed model works better than the basic FFO algorithm.
In this section, important clustering literature using metaheuristic algorithms was
reviewed. Most of these works have considered the clustering problem an optimization
problem and applied a metaheuristic algorithm to solve it; in addition, the intra-cluster summation was used as the fitness function. Some authors have used a
combination of genetic operators and other methods, while others have employed chaotic
and quantum mapping to improve exploitation and convergence. Considering the literature
reviewed in this paper, an enhanced version of the IAS based on chaotic maps is proposed
for the clustering problem.
3. Material and Method
3.1. IAS Algorithm
As with other population-based algorithms, the IAS randomly generates an initial population, called students [9]. A specific problem's upper and lower limit values determine students' eligibility for inclusion in the IAS. The student with the highest performance (minimum score) in each step takes the position of "leader student", or simply "leader". In IAS optimization, the best performance corresponds to the minimum value of the cost function. However, this position can be reassigned to another, more skilled student at any point in the process. The method of student generation and assessment of student eligibility in school can be described as Algorithm 1.
Algorithm 1 The method of student generation and assessment of student eligibility
1: For i = 1 : N_student
2:   S_i = LB + r_i(0,1) × (UB − LB); M_i = |f(S_i)|
3: End For
4: f(LS) = min{M_i}
where S_i is the ith generated student, LB and UB are the lower and upper limits of the variables, respectively, r_i(0,1) is a random number between 0 and 1, N_student is the number of students, M_i is the score of the ith student, and LS is the leader student.
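For concreteness, a minimal Python sketch of Algorithm 1 is given below; the cost function, bounds, and population size are placeholder assumptions, not values from the paper:

```python
import numpy as np

def init_students(f, lb, ub, n_students, rng=np.random.default_rng(0)):
    """Generate students uniformly in [LB, UB] and score them (Algorithm 1)."""
    dim = len(lb)
    # S_i = LB + r_i(0,1) * (UB - LB)
    students = lb + rng.random((n_students, dim)) * (ub - lb)
    scores = np.abs([f(s) for s in students])   # M_i = |f(S_i)|
    leader = students[np.argmin(scores)]        # f(LS) = min{M_i}
    return students, scores, leader

# Placeholder sphere cost function on a 2-D search space:
students, scores, leader = init_students(
    f=lambda x: float(np.sum(x**2)),
    lb=np.zeros(2), ub=np.ones(2), n_students=20)
```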
Autodidactic/self-learning sessions in this interactive school are held in three stages:
individual training, group training, and new student challenges.
Individual Training Session: First, a random group of two follower students is selected.
Then, they discuss it one by one with the leader student. The student’s knowledge will
increase in peer-to-peer discussions with the leader. Accordingly, an individual training
session can be formulated as described in Algorithm 2:
Algorithm 2 Individual Training Session
1: For i = 1 : N_student
2:   Randomly select one student S_j, where i ≠ j
3:   TS*_i = TS_i + r_i(1,2) × (LS − IC_i × TS_i)
4:   TS*_j = TS_j + r_j(1,2) × (LS − IC_j × TS_j)
5: End for
6: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where TS_i and TS_j are the first and second follower students, respectively; IC_i and IC_j are the inherent competencies of the first and second students, respectively; and r_i(1,2) and r_j(1,2) are two different random vectors between 1 and 2. Individual competencies (IC_i and IC_j) are randomly determined as 1 or 2.
Collective Training Session: After the individual training session, each follower student
has the opportunity to review the contents of the last session and interact with other
follower students in the same group to resolve the unclear points of the lesson. In addition
to the knowledge level of individually trained students, their social abilities, such as
communication skills, teamwork, and collaboration, referred to as collective competencies,
can significantly impact the effectiveness of group learning. Accordingly, the group training
session can be formulated as described in Algorithm 3.
Algorithm 3 Collective Training Session
1: For i = 1 : N_student
2:   CC_ij = (CC_i × TS_i + CC_j × TS_j) / (CC_i + CC_j)
3:   TS*_i = TS_i + r_i(1,2) × (LS − CC_i × CC_ij)
4:   TS*_j = TS_j + r_j(1,2) × (LS − CC_j × CC_ij)
5: End for
6: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where CC_ij is defined as the collective ability of the group as a team, based on the weighted average of the students' competencies. Moreover, r_i(1,2) and r_j(1,2) are two different random vectors between 1 and 2. Students' collective competencies (CC_i and CC_j) are randomly set to 1 or 2.
Challenge of the New Student: In some optimization problems, due to the complex nature of the cost function, the gradual improvement of follower students may be limited to a specific area of the design space solely around the leader student (i.e., the current temporary/local optimum), while still being far from the permanent/global optimum. Such a stagnating loop hinders the optimization process and will probably fail to find the global optimum. The new student challenge is introduced to complement the algorithm and provide a more dynamic and exploratory IAS, creating an ongoing rebellion against the current leader. If the new student is more skilled than the current leader student, they take on the role of leader. The new student challenge can be formulated as described in Algorithm 4.
Algorithm 4 New student challenge
1: NS = LB + R × (UB − LB)
2: MF1 = round(r(0,1))
3: MF2 = 1 − MF1
4: LS* = MF1 × LS + MF2 × NS
5: Accept LS* if it achieves a better mark than LS
where NS is a new student; MF1 and MF2 are the first and second corrective factors, respectively; r(0,1) is a random vector between 0 and 1; and LS* is the new leader of the school.
The process (including all three sessions) is repeated until the termination criteria are
met. At the end of the process, each student has to have communicated with the leader
at least once. In both individual and group training sessions, groups of two students are
randomly selected in the search space to interact with the leader and themselves. Proper
selection of regulatory parameters, such as the number of students and the number of iterations, can lead to faster detection of the global optimum. The more students exist in the autodidactic
school, the more likely there will be elite students among them. In addition, the number of
sessions held is equal to the number of students in the school. Hence, the population in
this IAS has a significant effect on increasing the knowledge level of students.
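To make the loop structure concrete, the sketch below combines the individual training session and the new-student challenge in Python, using the notation above; the collective session follows the same accept-if-better pattern and is omitted for brevity, and the cost function and bounds are placeholders:

```python
import numpy as np

def ias_step(f, students, leader, lb, ub, rng):
    """One IAS iteration over randomly paired students (cf. Algorithms 2 and 4)."""
    n, dim = students.shape
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # partner with i != j
        ic_i, ic_j = rng.integers(1, 3, size=2)          # competencies in {1, 2}
        r_i, r_j = 1 + rng.random((2, dim))              # random vectors in (1, 2)
        # Individual training: move both students toward the leader
        cand_i = students[i] + r_i * (leader - ic_i * students[i])
        cand_j = students[j] + r_j * (leader - ic_j * students[j])
        for idx, cand in ((i, cand_i), (j, cand_j)):
            if f(cand) < f(students[idx]):               # accept only better marks
                students[idx] = cand
    # New-student challenge: a random newcomer may dethrone the leader
    ns = lb + rng.random(dim) * (ub - lb)
    if f(ns) < f(leader):
        leader = ns
    return students, leader
```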
3.2. Chaotic Maps
Chaotic maps have been used in various stochastic and optimization algorithms to address their problems [27]. This section introduces the ten chaotic maps used to improve the IAS.
Each chaotic map has unique features, described and formulated in Table 1. All the chaotic maps employed in this paper start from an initial point of 0.7, although the initial point of a chaotic map can be any number between 0 and 1.
Table 1. Functions of Chaotic Maps.
Method  Chaotic Map  Mathematical Model  Range
CIAS-1  Chebyshev  $p_{q+1} = \cos\left(q \cos^{-1}(p_q)\right)$  (−1, 1)
CIAS-2  Circle  $p_{q+1} = \mathrm{mod}\left(p_q + d - \frac{c}{2\pi}\sin(2\pi p_q),\, 1\right)$, c = 0.5 and d = 0.2  (0, 1)
CIAS-3  Gauss/mouse  $p_{q+1} = \begin{cases} 1 & p_q = 0 \\ \frac{1}{\mathrm{mod}(p_q,\, 1)} & \text{otherwise} \end{cases}$  (0, 1)
CIAS-4  Iterative  $p_{q+1} = \sin\left(\frac{c\pi}{p_q}\right)$, c = 0.7  (−1, 1)
CIAS-5  Logistic  $p_{q+1} = c\, p_q (1 - p_q)$, c = 4  (0, 1)
CIAS-6  Piecewise  $p_{q+1} = \begin{cases} p_q / l & 0 \le p_q < l \\ (p_q - l)/(0.5 - l) & l \le p_q < 0.5 \\ (1 - l - p_q)/(0.5 - l) & 0.5 \le p_q < 1 - l \\ (1 - p_q)/l & 1 - l \le p_q < 1 \end{cases}$  (0, 1)
CIAS-7  Sine  $p_{q+1} = \frac{c}{4}\sin(\pi p_q)$, c = 4  (0, 1)
CIAS-8  Singer  $p_{q+1} = \mu\left(7.86 p_q - 23.31 p_q^2 + 28.75 p_q^3 - 13.302875 p_q^4\right)$, μ = 1.07  (0, 1)
CIAS-9  Sinusoidal  $p_{q+1} = c\, p_q^2 \sin(\pi p_q)$, c = 2.3  (0, 1)
CIAS-10  Tent  $p_{q+1} = \begin{cases} p_q / 0.7 & p_q < 0.7 \\ \frac{10}{3}(1 - p_q) & \text{otherwise} \end{cases}$  (0, 1)
Table 1 lists the chaotic maps proposed to improve the IAS. The proposed model uses chaotic maps to create the initial population and to generate random parameters.
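As an illustration, two of the maps in Table 1 can be implemented as short Python functions; the initial point of 0.7 follows the paper, while treating the Chebyshev order as the iteration index is an assumption:

```python
import math

def chebyshev_map(p0=0.7, n=5):
    """Chebyshev map: p_{q+1} = cos(q * arccos(p_q)), values in (-1, 1)."""
    p, seq = p0, []
    for q in range(1, n + 1):
        p = math.cos(q * math.acos(p))
        seq.append(p)
    return seq

def logistic_map(p0=0.7, n=5, c=4.0):
    """Logistic map: p_{q+1} = c * p_q * (1 - p_q), values in (0, 1)."""
    p, seq = p0, []
    for _ in range(n):
        p = c * p * (1 - p)
        seq.append(p)
    return seq
```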
4. Proposed Model
The IAS is one of the most successful optimization algorithms. However, it fails to
work effectively in global optimization and finding the best solution. The main reason may
be the generation of an inadequate initial population and random parameters. Due to the
ergodic nature and lack of correct iteration of chaotic maps, better global and local searches
can be performed than random searches that rely primarily on probability. As a result,
this paper presents different versions of the IAS based on other chaotic maps to solve the
clustering problem. The flowchart of the proposed model is shown in Figure 1.
Figure 1. Flowchart of the proposed model.
4.1. Pre-Processing
The pre-processing step includes data conversion and data normalization. For datasets
where the data is of string type, the label-encoder method is used to convert string data to
numeric data. Once the string data is converted to numeric data, the data normalization is
carried out. The MinMax method is the most popular standard normalization method that
transfers data to the space between 0 and 1, as given in Equation (1).
$X_{normal} = \dfrac{X_{value} - Min_{X_{value}}}{Max_{X_{value}} - Min_{X_{value}}}$ (1)
In Equation (1), X_value is the initial value of a feature in the dataset, and X_normal refers to the normalized feature. The Max_Xvalue and Min_Xvalue parameters represent the feature's largest and smallest values.
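A minimal Python sketch of Equation (1), assuming features are stored column-wise and string features have already been label-encoded; the guard against constant features is an addition for robustness, not part of the paper:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each feature (column) of X into [0, 1] per Equation (1)."""
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # avoid division by zero
    return (X - xmin) / span
```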
A dataset $D = \{(x_1, L_1), (x_2, L_2), \ldots, (x_m, L_m)\}$ with m samples is defined according to Equation (2).

$D = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1d} & L_1 \\ x_{21} & x_{22} & \cdots & x_{2d} & L_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} & L_m \end{bmatrix}$ (2)

In Equation (2), $(x_i, L_i)$ is the ith sample of D, $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]$ is the feature vector of the ith sample, and $L_i$ is the label of the ith sample.
4.2. Chaotic-Based Population Generation
First, the IAS based on chaotic maps must generate a suitable initial population
to improve the algorithm’s convergence rate. Therefore, student generation and assess-
ment of students’ competence in school can be described as Equation (3) according to the
chaotic maps.
$S_{ij} = lb_i + chomap_i(0,1) \times (ub_i - lb_i)$ (3)
where $S_{ij}$ is the jth variable of the ith generated student; lb and ub are the lower and upper bounds, respectively; and $chomap_i(0,1)$ is a number between 0 and 1 generated by one of the chaotic maps listed in Table 1. Thus, the IAS generates its population based on chaotic maps from the very beginning.
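A minimal sketch of Equation (3), assuming a scalar chaotic map whose successive values drive all draws; the map choice and bounds are placeholders:

```python
import numpy as np

def chaotic_init(chomap, p0, lb, ub, n_students):
    """Initialize students from a chaotic sequence instead of uniform noise (Eq. (3))."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim, p = len(lb), p0
    students = np.empty((n_students, dim))
    for i in range(n_students):
        for j in range(dim):
            p = chomap(p)                                # next chaotic value in (0, 1)
            students[i, j] = lb[j] + p * (ub[j] - lb[j])
    return students

# Example with the logistic map as the chaotic generator:
pop = chaotic_init(lambda p: 4 * p * (1 - p), p0=0.7, lb=[0, 0], ub=[1, 1], n_students=20)
```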
4.3. Chaotic-Based Individual Training Session
In the second step, the IAS uses chaotic sequences instead of random numbers to
improve the convergence speed of the algorithm in different iterations. Therefore, according
to the chaotic maps, the individual training session can be described as Algorithm 5.
Algorithm 5 Chaotic-Based Individual Training
1: For i = 1 : N_student
2:   Randomly select one student S_j, where i ≠ j
3:   h_j = 1 + chomap_j(0,1)
4:   h_i = 1 + chomap_i(0,1)
5:   TS*_i = TS_i + h_i × (LS − IC_i × TS_i)
6:   TS*_j = TS_j + h_j × (LS − IC_j × TS_j)
7: End for
8: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where TS_i and TS_j are the first and second students, and h_j and h_i are two different chaotic vectors between 1 and 2 generated by the chaotic maps (listed in Table 1). Individual competencies (IC_i and IC_j) are randomly set to 1 or 2, so there is no need to use chaotic maps for them.
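Under the same assumptions, the chaotic individual training step of Algorithm 5 might be sketched as follows; pairing, acceptance, and the chaotic step factors mirror the pseudocode above, while the function names are illustrative:

```python
import numpy as np

def chaotic_individual_training(f, students, leader, chomap, p, rng):
    """Algorithm 5 sketch: move paired students toward the leader using
    chaotic step factors h = 1 + chomap(0,1) instead of random ones."""
    n, dim = students.shape
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # partner with i != j
        ic_i, ic_j = rng.integers(1, 3, size=2)          # IC stays random (1 or 2)
        for idx, ic in ((i, ic_i), (j, ic_j)):
            h = np.empty(dim)
            for d in range(dim):
                p = chomap(p)                            # chaotic value in (0, 1)
                h[d] = 1 + p                             # chaotic vector in (1, 2)
            cand = students[idx] + h * (leader - ic * students[idx])
            if f(cand) < f(students[idx]):               # accept if better mark
                students[idx] = cand
    return students, p
```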
4.4. Chaotic-Based Group Training Session
In the third step, the IAS uses chaotic sequences instead of random numbers to improve
the convergence speed of the algorithm in different iterations. Therefore, according to the
chaotic maps, the group training session can be described as Algorithm 6.
Algorithm 6 Chaotic-Based Group Training
1: For i = 1 : N_student
2:   h_j = 1 + chomap_j(0,1)
3:   h_i = 1 + chomap_i(0,1)
4:   CC_ij = (CC_i × TS_i + CC_j × TS_j) / (CC_i + CC_j)
5:   TS*_i = TS_i + h_i × (LS − CC_i × CC_ij)
6:   TS*_j = TS_j + h_j × (LS − CC_j × CC_ij)
7: End for
8: Accept TS*_i and TS*_j if they achieve better marks than TS_i and TS_j
where CC_ij is defined as the collective ability of the group as a team, based on the weighted average of the students' competencies, and h_j and h_i are two different chaotic vectors between 1 and 2 generated by the chaotic maps (listed in Table 1). Students' collective competencies (CC_i and CC_j) are randomly set to 1 or 2, so there is no need to use chaotic maps for them.
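A matching sketch of Algorithm 6; the only changes relative to the individual session are the weighted group solution CC_ij and the term the students move along, and again all names are illustrative:

```python
import numpy as np

def chaotic_group_training(f, students, leader, chomap, p, rng):
    """Algorithm 6 sketch: blend each pair into a weighted group solution
    CC_ij, then move both members toward the leader with chaotic factors."""
    n, dim = students.shape
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        cc_i, cc_j = rng.integers(1, 3, size=2)          # CC stays random (1 or 2)
        cc_ij = (cc_i * students[i] + cc_j * students[j]) / (cc_i + cc_j)
        for idx, cc in ((i, cc_i), (j, cc_j)):
            h = np.empty(dim)
            for d in range(dim):
                p = chomap(p)
                h[d] = 1 + p                             # chaotic vector in (1, 2)
            cand = students[idx] + h * (leader - cc * cc_ij)
            if f(cand) < f(students[idx]):
                students[idx] = cand
    return students, p
```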
4.5. Chaotic-Based New Student Challenge
In the fourth step of the IAS, chaotic sequences are used instead of random numbers to improve the convergence speed of the algorithm across iterations. Therefore, according to the chaotic maps, the new student challenge can be described as Algorithm 7.
Algorithm 7 Chaotic-Based New Student
1: NS = lb_i + chomap_i(0,1) × (ub_i − lb_i)
2: m = chomap_i(0,1)
3: MF1 = round(m)
4: MF2 = 1 − MF1
5: LS* = MF1 × LS + MF2 × NS
6: Accept LS* if it achieves a better mark than LS
In Algorithm 7, a new solution NS is generated entirely by chaotic maps, and MF1 and MF2 are the first and second corrective factors, generated from the chaotic variable m. The key point of this step is that, instead of random numbers, the chaotic sequence generated by the chaotic maps is applied to increase the exploitation of the proposed model.
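A sketch of Algorithm 7 under the same assumptions; because MF1 = round(m) is 0 or 1, the candidate leader is either the old leader or the chaotic newcomer NS:

```python
import numpy as np

def chaotic_new_student(f, leader, chomap, p, lb, ub):
    """Algorithm 7 sketch: a chaotic newcomer NS challenges the leader."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    ns = np.empty(dim)
    for d in range(dim):
        p = chomap(p)
        ns[d] = lb[d] + p * (ub[d] - lb[d])   # NS drawn from the chaotic sequence
    p = chomap(p)
    mf1 = round(p)                            # MF1 = round(m), MF2 = 1 - MF1
    cand = mf1 * np.asarray(leader, float) + (1 - mf1) * ns
    if f(cand) < f(leader):                   # accept if better mark than LS
        leader = cand
    return leader, p
```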
4.6. Formation of Clusters
For the proposed model, each student vector expresses a solution with a certain number of cluster centers, ranging from C_min to C_max. The decision variables are encoded as real-valued strings and regarded as cluster centers. Assuming that the dimension of the dataset is d, the maximum length of a student vector is K_max × d. For each student vector whose cluster number is c, the first c × d entries are evaluated as effective cluster-center solutions, and the remaining variables are invalid. Figure 2 shows the format of the students' initial population for clustering. In the IAS, the candidate solution is determined as $\{X^1_j(k), X^2_j(k), \ldots, X^d_j(k)\}$, where $k = 1, 2, \ldots, P$. Here, P denotes the number of iterations.
Figure 2 shows that if a dataset has two clusters, different solutions are generated to find the two clusters. In each solution, a set of features is formed for the center of each cluster. Each solution is evaluated, and at the end, the solution with the best fitness (closest distance) is selected as the optimal solution.
Figure 2. Format of the initial population of students for clustering.
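The encoding of Figure 2 can be made concrete with a small helper that slices a flat student vector into cluster centers; the helper name and the toy values are illustrative:

```python
import numpy as np

def decode_centers(student, c, d):
    """Interpret the first c*d entries of a student vector as c cluster
    centers of dimension d; the remaining entries are ignored (invalid)."""
    return np.asarray(student, dtype=float)[:c * d].reshape(c, d)

centers = decode_centers(np.arange(10.0), c=2, d=4)  # 2 centers; last 2 entries unused
```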
4.7. Fitness Function of Clustering
In our proposed model, the fitness function of clustering, called intra-cluster summation, is employed using the Euclidean distance function, the most popular and valid distance criterion in clustering. It is calculated as Equation (4).

$distance(O_i, O_j) = \left(\sum_{p=1}^{m}\left(O_{ip} - O_{jp}\right)^2\right)^{1/2}$ (4)
In Equation (4), the variable m indicates the number of features, O_ip represents the value of feature p of the object O_i, and O_jp represents the value of feature p of the object O_j. This function minimizes the distance between each object and the center of the cluster to which it is allocated, generating compact groups. The intra-cluster summation is defined by Equation (5).
$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n} W_{ij} \times \sqrt{\sum_{p=1}^{m}\left(O_{jp} - O_{ip}\right)^2}$ (5)
Here, if W_ij is 1, the object O_j is in cluster i; otherwise, O_j is not in cluster i. The variable k shows the number of clusters, the variable n indicates the number of objects, and the variable m shows the number of features. Note that O_ip here denotes the value of feature p of the center of the ith cluster.
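A minimal Python sketch of Equations (4) and (5), with the hard assignment W_ij realized by taking the nearest center for each object (a standard reading of the intra-cluster summation):

```python
import numpy as np

def sse_fitness(data, centers):
    """Intra-cluster summation (Eq. (5)): assign each object to its nearest
    center by Euclidean distance (Eq. (4)) and sum those distances."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances, shape (n_objects, k_clusters)
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())   # W_ij selects the closest cluster

# Toy example: two 2-D objects and two candidate centers
print(sse_fitness([[0, 0], [2, 2]], [[1, 0], [2, 2]]))  # 1.0
```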
5. Results and Discussion
An IAS based on ten chaotic maps (i.e., CIAS) was presented in the previous section.
In this section, statistical criteria such as the fitness function’s minimum value and the
fitness function’s convergence rate are considered to compare the proposed model and
other algorithms. Here, ten versions of the proposed CIAS algorithm are first compared
with each other in terms of statistical criteria. The best version is considered an improved
or superior version. Then it is compared with other metaheuristic methods such as the
BA, CSA, ABC, and AEFA. Therefore, more details about the implementation, parameters,
criteria, comparison, and evaluation of the proposed CIAS algorithm for the clustering
problems are given here.
5.1. Dataset
All clustering datasets used here to evaluate the improved version of the IAS based on chaotic maps are listed in Table 2, together with the number of features and samples of each of these 20 UCI clustering datasets.
Table 2. Clustering Dataset.
No. Datasets Number of Features Number of Samples
1 Balance Scale 4 625
2 Blood 4 748
3 breast 30 569
4 CMC 9 1473
5 Dermatology 34 366
6 Glass 9 214
7 Haberman’s Survival 3 306
8 hepatitis 19 155
9 Iris 4 150
10 Libras 90 360
11 lung cancer 32 56
12 Madelon 500 2600
13 ORL 1024 400
14 seeds 7 210
15 speech 310 125
16 Statlog (Heart) 13 270
17 Steel 33 1941
18 Vowel 3 871
19 wine 13 178
20 Wisconsin 9 699
5.2. Simulation Environment and Parameters Determination
The proposed CIAS approaches and the comparative algorithms are implemented using MATLAB 2019 on a system with 8 GB of RAM, a Core i5 CPU (2.4 GHz), and a 64-bit operating system. For a fair comparison, the quantitative parameters of the proposed CIAS approaches, the BA, CSA, ABC, and AEFA are set the same (see Table 3). In addition, the qualitative parameters of each algorithm are set to their standard values.
Table 3 shows that the initial values of the population size and the number of iterations are the same for all algorithms, and the values of the other parameters are set as standard. The different versions of the IAS based on chaotic maps (i.e., CIAS-1, CIAS-2, ..., CIAS-10) are compared with each other in terms of statistical criteria. The evaluations and comparisons of the different versions of the IAS based on chaotic maps are provided below. The convergence rate for implementing the various versions of the IAS based on chaotic maps on 10 clustering datasets is presented in Figure 3.
The results related to the convergence rate of the different versions of the proposed model on the 20 datasets indicate that: (1) IAS-2 performed better on the BLOOD and DERMATOLOGY datasets, and IAS-1 performed better on the BLOOD and CANCER datasets; (2) IAS-1 performed better on the IRIS and WINE datasets, and IAS-4 performed better on the STEEL and IRIS datasets; (3) IAS-4 performed better on the GLASS, HABERMAN, and BREASTEW datasets, and IAS-1 performed better on the BREASTEW and HABERMAN datasets; (4) IAS-1 performed better on the HEART and LUNG CANCER datasets, and IAS-2 performed better on the HABERMAN dataset; (5) IAS-1 performed better on the VOWEL dataset, and IAS-2 performed better on the SEEDS dataset. Overall, the convergence results of the different chaotic-map-based versions of the IAS on the whole collection of datasets show that IAS-1, IAS-2, IAS-4, and IAS-6 improved on the other versions. To further evaluate the different versions of the IAS based on chaotic maps, the worst, best, and average solutions of the algorithms' populations are compared, as shown in Table 4.
Table 3. Values of initial parameters.
ABC [10]: Limit = 5D; Population size = 20; Number of onlookers = 20; Iterations = 100
BA [11]: r = 0.5; A = 0.8; Population size = 20; Iterations = 100
CSA [12]: AP = 0.8; Population size = 20; Iterations = 100
AEFA [13]: FCheck = 1; Population size = 20; Iterations = 100
Proposed Model: Population size = 20; Iterations = 100
Figure 3. Convergence rate of different versions of IAS based on chaotic maps.
Table 4. Results related to the worst, best, and average solutions for the population of different versions of the IAS.
Dataset Results IAS-1 IAS-2 IAS-3 IAS-4 IAS-5 IAS-6 IAS-7 IAS-8 IAS-9 IAS-10
Blood
Worst 4.21E+05 8.46E+05 8.46E+05 4.20E+05 8.47E+05 8.46E+05 8.47E+05 8.46E+05 8.46E+05 8.46E+05
Best 4.10E+05 4.10E+05 4.12E+05 4.10E+05 4.18E+05 4.13E+05 4.12E+05 4.15E+05 4.20E+05 4.13E+05
Avg 4.14E+05 4.93E+05 4.93E+05 4.15E+05 5.46E+05 5.20E+05 5.22E+05 6.30E+05 6.55E+05 5.20E+05
Cancer
Worst 4.25E+03 4.45E+03 3.59E+03 3.63E+03 4.36E+03 3.94E+03 4.71E+03 4.24E+03 4.25E+03 5.36E+03
Best 3.28E+03 3.93E+03 3.30E+03 3.26E+03 3.68E+03 3.42E+03 3.94E+03 3.77E+03 3.50E+03 3.72E+03
Avg 3.82E+03 4.20E+03 3.44E+03 3.48E+03 4.08E+03 3.62E+03 4.40E+03 4.03E+03 3.83E+03 5.05E+03
CMC
Worst 9.70E+03 9.91E+03 1.38E+04 1.01E+04 1.32E+04 1.26E+04 1.31E+04 1.38E+04 1.38E+04 1.31E+04
Best 8.08E+03 7.79E+03 7.40E+03 7.33E+03 7.80E+03 7.60E+03 7.31E+03 6.93E+03 7.66E+03 7.15E+03
Avg 9.11E+03 8.67E+03 1.16E+04 8.85E+03 1.10E+04 1.09E+04 1.01E+04 1.18E+04 1.21E+04 1.05E+04
Dermatology
Worst 4.95E+03 3.54E+03 1.21E+04 4.76E+03 4.65E+03 4.00E+03 7.71E+03 4.94E+03 5.05E+03 4.37E+03
Best 3.05E+03 2.75E+03 2.83E+03 2.87E+03 3.31E+03 3.03E+03 2.96E+03 3.08E+03 2.93E+03 2.90E+03
Avg 3.75E+03 3.27E+03 1.15E+04 3.69E+03 3.80E+03 3.41E+03 3.99E+03 3.79E+03 3.71E+03 3.51E+03
Iris
Worst 2.29E+02 3.03E+02 2.97E+02 2.38E+02 2.75E+02 2.88E+02 2.86E+02 3.03E+02 3.03E+02 2.84E+02
Best 1.67E+02 1.71E+02 1.75E+02 1.55E+02 2.00E+02 1.47E+02 2.05E+02 1.85E+02 1.97E+02 1.71E+02
Avg 1.97E+02 2.55E+02 2.51E+02 1.96E+02 2.43E+02 2.39E+02 2.57E+02 2.66E+02 2.57E+02 2.29E+02
Orl
Worst 9.55E+05 7.65E+05 5.28E+05 5.59E+05 5.69E+05 7.77E+05 5.72E+05 7.32E+05 6.38E+05 7.67E+05
Best 8.44E+05 7.35E+05 5.23E+05 5.36E+05 5.55E+05 7.44E+05 5.54E+05 6.70E+05 5.73E+05 7.51E+05
Avg 9.37E+05 7.62E+05 5.27E+05 5.45E+05 5.64E+05 7.71E+05 5.65E+05 7.01E+05 6.28E+05 7.60E+05
Steel
Worst 3.8E+09 4.64E+09 4.18E+09 3.8E+09 4.64E+09 4.26E+09 4.64E+09 4.64E+09 4.64E+09 4.64E+09
Best 2.40E+06 2.42E+06 2.48E+06 2.36E+06 2.55E+06 2.48E+06 2.51E+06 2.55E+06 2.37E+06 2.44E+06
Avg 3.29E+09 3.97E+09 3.85E+09 3.05E+09 3.92E+09 3.72E+09 3.85E+09 3.96E+09 3.98E+09 3.71E+09
Wine
Worst 2.26E+04 2.90E+04 3.51E+04 2.27E+04 8.38E+04 3.83E+04 4.28E+04 8.38E+04 8.38E+04 8.38E+04
Best 1.74E+04 1.82E+04 1.77E+04 1.75E+04 1.75E+04 1.78E+04 1.79E+04 1.79E+04 1.91E+04 1.82E+04
Avg 2.01E+04 2.18E+04 3.40E+04 2.00E+04 3.31E+04 2.58E+04 2.52E+04 3.64E+04 3.83E+04 3.35E+04
Balance Scale
Worst 1.49E+03 1.51E+03 2.33E+03 1.63E+03 1.54E+03 1.52E+03 1.51E+03 1.53E+03 1.79E+03 1.51E+03
Best 1.45E+03 1.45E+03 1.52E+03 1.47E+03 1.46E+03 1.44E+03 1.45E+03 1.45E+03 1.48E+03 1.45E+03
Avg 1.47E+03 1.49E+03 2.15E+03 1.55E+03 1.49E+03 1.48E+03 1.48E+03 1.50E+03 1.61E+03 1.48E+03
Breasts
Worst 2.37E+03 2.99E+03 2.69E+03 2.44E+03 3.22E+03 2.79E+03 3.13E+03 2.55E+03 2.50E+03 2.79E+03
Best 2.22E+03 2.69E+03 2.58E+03 2.29E+03 2.84E+03 2.53E+03 2.68E+03 2.43E+03 2.43E+03 2.47E+03
Avg 2.30E+03 2.89E+03 2.63E+03 2.37E+03 3.10E+03 2.73E+03 2.98E+03 2.51E+03 2.47E+03 2.63E+03
Glass
Worst 8.46E+02 8.86E+02 9.65E+02 8.76E+02 1.19E+03 1.15E+03 1.19E+03 1.19E+03 1.20E+03 1.20E+03
Best 5.52E+02 5.87E+02 5.89E+02 5.15E+02 5.31E+02 5.73E+02 6.19E+02 5.65E+02 5.93E+02 6.10E+02
Avg 8.13E+02 7.72E+02 9.35E+02 7.44E+02 9.74E+02 8.87E+02 9.86E+02 1.02E+03 1.09E+03 8.79E+02
Haberman
Worst 3.61E+03 4.47E+03 4.46E+03 3.64E+03 4.14E+03 4.46E+03 5.64E+03 4.16E+03 4.28E+03 4.52E+03
Best 2.70E+03 2.84E+03 3.07E+03 2.73E+03 3.00E+03 3.14E+03 3.21E+03 2.78E+03 3.01E+03 2.79E+03
Avg 3.14E+03 3.85E+03 3.43E+03 3.29E+03 3.65E+03 3.78E+03 3.77E+03 3.70E+03 3.78E+03 3.91E+03
Heart
Worst 1.97E+04 2.72E+04 3.33E+04 1.97E+04 4.22E+04 3.46E+04 4.17E+04 4.22E+04 4.15E+04 3.62E+04
Best 2.40E+06 2.42E+06 2.48E+06 2.36E+06 1.46E+04 1.43E+04 1.42E+04 1.37E+04 1.42E+04 1.31E+04
Avg 3.29E+09 3.97E+09 3.85E+09 3.05E+09 2.43E+04 2.18E+04 2.75E+04 2.71E+04 3.03E+04 1.99E+04
Hepatitis
Worst 2.26E+04 2.90E+04 3.51E+04 2.27E+04 2.22E+04 2.25E+04 1.96E+04 2.25E+04 2.24E+04 2.27E+04
Best 1.74E+04 1.82E+04 1.77E+04 1.75E+04 1.35E+04 1.31E+04 1.34E+04 1.36E+04 1.32E+04 1.34E+04
Avg 2.01E+04 2.18E+04 3.40E+04 2.00E+04 1.83E+04 1.78E+04 1.72E+04 1.90E+04 1.79E+04 1.75E+04
Libras
Worst 1.49E+03 1.51E+03 2.33E+03 1.63E+03 1.05E+03 9.16E+02 9.21E+02 6.11E+02 7.15E+02 7.38E+02
Best 1.45E+03 1.45E+03 1.52E+03 1.47E+03 6.89E+02 8.72E+02 6.69E+02 5.78E+02 6.88E+02 5.97E+02
Avg 1.47E+03 1.49E+03 2.15E+03 1.55E+03 8.82E+02 8.94E+02 8.62E+02 5.87E+02 7.07E+02 6.60E+02
Lung Cancer
Worst 2.37E+03 2.99E+03 2.69E+03 2.44E+03 1.98E+02 1.87E+02 2.03E+02 2.19E+02 1.97E+02 2.07E+02
Best 2.22E+03 2.69E+03 2.58E+03 2.29E+03 1.69E+02 1.70E+02 1.80E+02 1.79E+02 1.66E+02 1.76E+02
Avg 2.30E+03 2.89E+03 2.63E+03 2.37E+03 1.88E+02 1.80E+02 1.95E+02 1.97E+02 1.83E+02 1.93E+02
Madelon
Worst 8.46E+02 8.86E+02 9.65E+02 8.76E+02 1.95E+06 1.83E+06 1.86E+06 1.82E+06 1.84E+06 1.82E+06
Best 5.52E+02 5.87E+02 5.89E+02 5.15E+02 1.94E+06 1.83E+06 1.84E+06 1.82E+06 1.82E+06 1.82E+06
Avg 8.13E+02 7.72E+02 9.35E+02 7.44E+02 1.95E+06 1.83E+06 1.85E+06 1.82E+06 1.83E+06 1.82E+06
Seeds
Worst 3.61E+03 4.47E+03 4.46E+03 3.64E+03 7.75E+02 8.30E+02 8.41E+02 7.97E+02 8.58E+02 7.19E+02
Best 2.70E+03 2.84E+03 3.07E+03 2.73E+03 5.29E+02 5.15E+02 4.93E+02 5.29E+02 5.28E+02 5.33E+02
Avg 3.14E+03 3.85E+03 3.43E+03 3.29E+03 6.63E+02 6.42E+02 6.66E+02 6.96E+02 7.47E+02 6.29E+02
Speech
Worst 1.97E+04 2.72E+04 3.33E+04 1.97E+04 6.58E+12 6.58E+12 6.52E+12 6.58E+12 6.58E+12 6.55E+12
Best 2.40E+06 2.42E+06 2.48E+06 2.36E+06 3.68E+12 2.54E+12 3.02E+12 3.31E+12 3.45E+12 3.46E+12
Avg 3.29E+09 3.97E+09 3.85E+09 3.05E+09 5.40E+12 5.06E+12 5.09E+12 5.03E+12 4.42E+12 5.07E+12
Vowel
Worst 2.26E+04 2.90E+04 3.51E+04 2.27E+04 5.83E+05 4.49E+05 5.72E+05 5.76E+05 6.92E+05 4.43E+05
Best 1.74E+04 1.82E+04 1.77E+04 1.75E+04 2.45E+05 2.50E+05 2.60E+05 2.32E+05 2.37E+05 2.23E+05
Avg 2.01E+04 2.18E+04 3.40E+04 2.00E+04 4.04E+05 3.46E+05 3.87E+05 4.38E+05 4.46E+05 3.31E+05
IAS-1 achieved better results than the other versions in 60% of the clustering datasets; IAS-2, IAS-4, and IAS-6 each succeeded in 10% of the clustering datasets. Regarding the average fitness-function results for the different versions of the IAS, IAS-1 exceeded the other versions in 67% of the clustering datasets. The chaotic functions tend to reach the point closest to the objective function by finding optimal solutions.
5.3. Comparison of the Proposed Model with Other Metaheuristics
In this section, the first chaotic map-based IAS (IAS-1), called CIAS, is compared with
other basic metaheuristic algorithms in terms of different statistical criteria. The results of
other evaluations and comparisons are given below. The results related to the convergence
rate of the proposed model and comparative metaheuristic algorithms implemented on
10 datasets are shown in Figure 4.
The results related to the convergence rate of the proposed model and the fifth group
of comparative algorithms show that the proposed CIAS algorithm performed better than
the other metaheuristic algorithms in two of the fifth group of datasets. The results related
to the convergence rate of the proposed model and comparative algorithms implemented
on the whole dataset indicate that the proposed CIAS approach has achieved better results.
The proposed CIAS model performed better in 75% of clustering datasets. The results
related to the worst, best, and average solutions for the population of the proposed model
and other comparative algorithms are presented in Table 5.
Figure 4. Convergence rate of the proposed model and other comparative algorithms.
Table 5. Results related to the worst, best, and average solutions for the population of the proposed model and other comparative algorithms.
Dataset CSA ABC BA AEFA CIAS
Blood
worst 4.10E+05 3.90E+06 6.01E+05 4.88E+06 4.41E+05
best 4.08E+05 4.11E+05 6.01E+05 4.85E+05 4.41E+05
avg 4.09E+05 1.13E+06 6.01E+05 1.99E+06 4.41E+05
Cancer
worst 4.43E+03 9.45E+03 5.85E+03 3.57E+03 2.96E+03
best 4.09E+03 3.57E+03 5.81E+03 3.56E+03 2.96E+03
avg 4.30E+03 5.78E+03 5.82E+03 3.56E+03 2.96E+03
CMC
worst 6.47E+03 1.05E+04 7.72E+03 6.74E+03 5.53E+03
best 6.30E+03 5.95E+03 7.69E+03 6.74E+03 5.53E+03
avg 6.35E+03 7.75E+03 7.70E+03 6.74E+03 5.53E+03
Dermatology
worst 2.97E+03 3.51E+03 3.08E+03 3.14E+03 2.25E+03
best 2.97E+03 3.16E+03 3.07E+03 3.13E+03 2.24E+03
avg 2.97E+03 3.35E+03 3.07E+03 3.14E+03 2.25E+03
Iris
worst 1.06E+02 3.60E+02 1.50E+02 1.07E+02 9.67E+01
best 1.03E+02 1.22E+02 1.46E+02 1.05E+02 9.67E+01
avg 1.04E+02 2.21E+02 1.48E+02 1.07E+02 9.67E+01
Orl
worst 5.01E+05 7.77E+05 6.36E+05 7.33E+05 5.03E+05
best 5.00E+05 7.68E+05 6.36E+05 7.26E+05 5.03E+05
avg 5.00E+05 7.74E+05 6.36E+05 7.30E+05 5.03E+05
Steel
worst 2.99E+09 9.93E+09 6.82E+09 2.98E+10 5.81E+09
best 2.95E+09 2.15E+09 6.82E+09 6.30E+09 5.81E+09
avg 2.97E+09 3.40E+09 6.82E+09 1.85E+10 5.81E+09
Wine
worst 1.72E+04 1.83E+04 1.71E+04 5.35E+04 1.63E+04
best 1.71E+04 1.65E+04 1.71E+04 1.90E+04 1.63E+04
avg 1.72E+04 1.72E+04 1.71E+04 5.02E+04 1.63E+04
balance scale
worst 1.43E+03 1.72E+03 1.45E+03 1.43E+03 1.43E+03
best 1.43E+03 1.44E+03 1.44E+03 1.43E+03 1.43E+03
avg 1.43E+03 1.52E+03 1.44E+03 1.43E+03 1.43E+03
breasts
worst 3.43E+03 6.08E+03 3.05E+03 2.36E+03 2.02E+03
best 3.39E+03 2.32E+03 3.03E+03 2.36E+03 2.02E+03
avg 3.41E+03 4.34E+03 3.04E+03 2.36E+03 2.02E+03
glass
worst 4.37E+02 6.30E+02 3.69E+02 4.11E+02 2.53E+02
best 3.91E+02 3.07E+02 3.65E+02 4.10E+02 2.53E+02
avg 4.10E+02 5.03E+02 3.67E+02 4.11E+02 2.53E+02
Haberman
worst 2.62E+03 1.11E+04 2.94E+03 2.59E+03 2.57E+03
best 2.59E+03 2.62E+03 2.93E+03 2.59E+03 2.57E+03
avg 2.61E+03 3.90E+03 2.93E+03 2.59E+03 2.57E+03
heart
worst 1.10E+04 3.01E+04 1.45E+04 1.19E+04 1.06E+04
best 1.08E+04 1.07E+04 1.45E+04 1.13E+04 1.06E+04
avg 1.09E+04 1.37E+04 1.45E+04 1.18E+04 1.06E+04
Hepatitis
worst 1.24E+04 1.25E+04 1.48E+04 1.93E+04 1.18E+04
best 1.20E+04 1.18E+04 1.48E+04 1.48E+04 1.18E+04
avg 1.22E+04 1.21E+04 1.48E+04 1.93E+04 1.18E+04
Libras
worst 5.87E+02 9.16E+02 7.34E+02 7.78E+02 5.41E+02
best 5.85E+02 8.71E+02 7.23E+02 7.78E+02 5.40E+02
avg 5.86E+02 8.92E+02 7.26E+02 7.78E+02 5.41E+02
lung Cancer
worst 1.59E+02 1.71E+02 1.64E+02 1.65E+02 1.38E+02
best 1.58E+02 1.60E+02 1.63E+02 1.65E+02 1.38E+02
avg 1.59E+02 1.66E+02 1.63E+02 1.65E+02 1.38E+02
Madelon
worst 1.86E+06 3.91E+06 2.85E+06 2.67E+06 1.91E+06
best 1.86E+06 3.64E+06 2.85E+06 2.52E+06 1.90E+06
avg 1.86E+06 3.77E+06 2.85E+06 2.59E+06 1.90E+06
seeds
worst 3.77E+02 1.04E+03 3.69E+02 3.68E+02 3.12E+02
best 3.67E+02 3.72E+02 3.63E+02 3.65E+02 3.12E+02
avg 3.71E+02 5.29E+02 3.64E+02 3.66E+02 3.12E+02
speech
worst 4.65E+12 2.41E+12 6.92E+12 3.63E+13 3.00E+12
best 3.71E+12 2.16E+12 6.92E+12 7.07E+12 3.00E+12
avg 4.18E+12 2.26E+12 6.92E+12 1.68E+13 3.00E+12
vowel
worst 1.71E+05 3.73E+05 2.55E+05 4.16E+05 1.62E+05
best 1.69E+05 1.92E+05 2.55E+05 2.09E+05 1.62E+05
avg 1.70E+05 2.58E+05 2.55E+05 3.27E+05 1.62E+05
The worst, best, and average population solutions for the proposed model and the other comparative algorithms demonstrate that CIAS outperformed the competitors on all three measures across the clustering datasets. In this section, the simulation and parameter determination were first carried out. Then the different chaotic-map-based versions of the IAS (CIAS-1, CIAS-2, ..., CIAS-10) were compared on various statistical criteria. Further evaluations and comparisons showed that the Chebyshev chaotic map achieved better results than the other chaotic maps. The Chebyshev-based IAS was then compared with basic metaheuristic algorithms such as BA, CSA, ABC, and AEFA. The results of these experiments indicate that the Chebyshev-based IAS has better convergence and performance than the other basic metaheuristic algorithms.
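For concreteness, the Chebyshev map referred to above is commonly written in the chaotic-metaheuristics literature as x(k+1) = cos(k · arccos(x(k))). The minimal Python sketch below (the paper itself was implemented in MATLAB) illustrates how such a sequence could be generated and rescaled to drive an algorithm's random components; the exact variant used in the paper is an assumption here.

```python
import math

def chebyshev_sequence(x0: float, n: int) -> list:
    """Generate n values of the Chebyshev chaotic map.

    Common formulation: x_{k+1} = cos(k * arccos(x_k)), x in [-1, 1].
    The exact variant used in the paper is assumed, not confirmed.
    """
    xs, x = [], x0
    for k in range(1, n + 1):
        x = math.cos(k * math.acos(x))
        xs.append(x)
    return xs

# Rescale the chaotic values from [-1, 1] into [0, 1] so they can
# stand in for uniform random numbers inside the metaheuristic.
unit_values = [(v + 1.0) / 2.0 for v in chebyshev_sequence(0.7, 100)]
```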
6. Real Application: Binary CIAS on COVID-19 Dataset
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19) (https://github.com/Atharva-Peshkar/Covid-19-Patient-Health-Analytics, accessed on 22 January 2023), an exceptionally infectious and dangerous illness. In December 2019, Wuhan, China, was the location of the first confirmed case, which was quickly followed by rapid global spread. Due to the escalating number of likely COVID-19 acute respiratory cases and the disease's high fatality rates, the World Health Organization (WHO) proclaimed COVID-19 a worldwide catastrophe. It is vital to develop effective processes that consistently identify potential cases of COVID-19 to halt its spread and partially alleviate the global crisis, enabling likely patients to be isolated from the general population. Several optimization approaches are being developed as part of the response to the COVID-19 pandemic; they can be separated into distinct categories: screening, monitoring, prediction, and diagnosis. In recent times, a significant number of diagnostic procedures that detect COVID-19 by exploiting efficient features taken from clinical datasets have been developed. So far, various models such as BE-WOA [28], the Binary Simulated Normal Distribution Optimizer (BSNDO) [29], and Artificial Gorilla Troop Optimization (AGTO) [30] have been proposed for the diagnosis of COVID-19.
The applicability and performance of the BIAS are tested on the novel coronavirus 2019 dataset, a pre-processed and released version of the original COVID-19 dataset. The results of these evaluations are discussed in the sections that follow. Table 6 describes the pre-processed dataset. In that table, the column labeled "diff_sym_hos" contains the number of days between the date on which symptoms were first observed (the column "sym_on" in the raw dataset) and the date on which the patient checked into the hospital (the column "hosp_vis" in the original dataset). All categorical columns in the pre-processed dataset were label-encoded by assigning a number to each distinct categorical value in the column. The dataset includes 864 cases and 14 attributes.
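As a hedged illustration of this pre-processing, the sketch below assumes pandas and raw columns named sym_on and hosp_vis, per the description above; the file name is a placeholder, and the actual pipeline used by the dataset's authors may differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column names ("sym_on", "hosp_vis") follow the text;
# the file name is a placeholder.
df = pd.read_csv("covid19_raw.csv", parse_dates=["sym_on", "hosp_vis"])

# Days elapsed between first symptoms and hospital admission.
df["diff_sym_hos"] = (df["hosp_vis"] - df["sym_on"]).dt.days

# Label-encode each categorical column: one integer per distinct value.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))
```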
The experiment’s results were repeated 20 times to evaluate the BIAS’s performance
compared to other algorithms. The K Nearest Neighbor (KNN) classifier was used with k
equal to 3 and the 10-fold cross-validation approach to construct the classification model
for every algorithm.
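A minimal sketch of this evaluation protocol, assuming Python with scikit-learn rather than the paper's MATLAB implementation, could look as follows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def subset_accuracy(X: np.ndarray, y: np.ndarray, mask: np.ndarray) -> float:
    """Mean 10-fold cross-validated accuracy of a 3-NN classifier
    restricted to the feature columns selected by a 0/1 mask."""
    knn = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(knn, X[:, mask.astype(bool)], y, cv=10).mean()
```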
Table 6. Description of the novel coronavirus 2019 dataset.
No. Features Name Description
1 Location The location where patients belong to
2 Country The country where patients belong to
3 Gender The gender of patients
4 Age The ages of the patients
5 vis_wuhan (Yes: 1, No: 0) Whether the patients visited Wuhan, China
6 from_wuhan (Yes: 1, No: 0) Whether the patients are from Wuhan, China
7 symptom 1 Fever
8 symptom 2 Cough
9 symptom 3 Cold
10 symptom 4 Fatigue
11 symptom 5 Body pain
12 symptom 6 Malaise
13 diff_sym_hos The difference in days between the symptoms being noticed and admission to the hospital
14 Class The class of a patient: either death or recovery
6.1. Fitness Function
The main challenge is determining which features from a dataset will help a classifier correctly identify the category to which a sample belongs [31,32]. While selecting essential features, we must automatically rule out those that are redundant. When the selected feature subset is used for classification, we can maximize the classification accuracy of the classification problem [33]. In this paper, BIAS is used to identify the most helpful feature subset, and a classifier is then used to determine how accurately this feature subset can be classified. Let ACC stand for the classification accuracy of the model as determined by the classifier, Da for the dimension of the feature subset, and Na for the total number of attributes in the initial dataset. The classification error is therefore (1 − ACC), and the proportion of features chosen from the complete dataset is Da/Na. The fitness function is defined according to Equation (6).

FF = α × (1 − ACC) + (1 − α) × (Da/Na)    (6)

In Equation (6), α ∈ [0, 1] denotes the weight given to the classification error.
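A direct transcription of Equation (6) might look as follows; the value of α is not reported in this section, so the 0.99 default below is an assumption borrowed from common feature-selection practice, not a value taken from the paper.

```python
def fitness(acc: float, n_selected: int, n_total: int,
            alpha: float = 0.99) -> float:
    """Equation (6): FF = alpha * (1 - ACC) + (1 - alpha) * Da / Na.

    alpha = 0.99 is an assumed default, not a value from the paper.
    """
    return alpha * (1.0 - acc) + (1.0 - alpha) * (n_selected / n_total)
```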
6.2. Transfer Function
Since FS is a binary optimization problem, each decision variable must end up as 0 or 1, where 0 indicates that the feature is not selected because it is unnecessary and 1 indicates that it is chosen because it is beneficial. However, the continuous positions produced by the search agents may fall outside this range. A binarization function must therefore be applied to each agent to guarantee that the output always falls within the selected range. The sigmoid (S-shaped) transfer function carries out this task in BIAS and is defined according to Equation (7).

T(x) = 1 / (1 + e^(−x))    (7)
X_d(t) = 1, if rand < T(X_d(t))
X_d(t) = 0, if rand ≥ T(X_d(t))    (8)
The transfer function maps its input to the range [0, 1]. If its output is greater than rand, a random number drawn from a uniform distribution between 0 and 1, the dimension's value is set to 1. If the output is equal to or lower than rand, the value is set to 0, and since the attribute is deemed unnecessary, it is not considered.
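Equations (7) and (8) together amount to the following per-dimension binarization, sketched here in Python for illustration.

```python
import math
import random

def binarize(x: float) -> int:
    """Equations (7)-(8): sigmoid transfer plus stochastic rounding.

    T(x) = 1 / (1 + e^(-x)); the dimension becomes 1 when a uniform
    random number is below T(x), and 0 otherwise.
    """
    t = 1.0 / (1.0 + math.exp(-x))
    return 1 if random.random() < t else 0

# A continuous agent position is binarized dimension by dimension.
feature_mask = [binarize(v) for v in [0.8, -1.2, 2.5, 0.1]]
```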
6.3. Evaluation Criteria
The BIAS’s effectiveness was evaluated based on accuracy, Recall, precision, F-measure,
and the total amount of features (selection size).
Precision:
The significance of the results is defined by the accuracy of the results,
which is represented as the ratio of successfully predicted positive observations to the total
number of positive observations.
Precision =TP
TP +FP (9)
Recall:
The term “recall” refers to the proportion of accurately predicted affirmative
observations relative to the total number of observations in an actual class that answer “yes”.
Recall =TP
TP +FN (10)
F-measure:
The F1 Score is another method for determining the correctness of an
experiment. It is calculated using the weighted mean of the Precision and Recall scores. As
a result, this score considers the possibility of both false positives and negatives.
FMeasure =2×Precision ×Recall
Precision +Recall (11)
Accuracy:
Accuracy is the measurable statistic that correctly classifies the occurrence
instance, and it is simply a ratio of predicted correct observations to the total sample size.
It is the performance measure that is the most intuitive to assess.
Accuracy =TP +TN
TP +TN +FP +FN (12)
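These four criteria can be computed directly from confusion-matrix counts, as in the following illustrative sketch.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Equations (9)-(12) computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2.0 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}
```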
According to the accuracy values in Table 7, BIAS achieved the highest average accuracy, 96.25%, while BABC, BBA, BCSA, and BAEFA reached 91.95%, 92.48%, 95.32%, and 94.79%, respectively.
Compared to the other algorithms, the proposed model performed significantly better in recall, precision, accuracy, and F-measure, as evidenced by the experimental findings. Within the search space, BIAS investigates regions that are relatively close to the global optimum. During the exploration and exploitation phases, the BIAS versions search the most promising areas of the search space. According to BIAS's search-history analyses, the distribution of candidate solution points around the global optimum is denser than that of BCSA. Figure 5 compares the performance of BIAS with the other algorithms based on recall, precision, F-measure, and accuracy.
Table 7. Comparison of the BIAS and other algorithms based on the accuracy.
Models Iterations Precision Recall F-Measure Accuracy
BABC 100 91.15 91.24 91.19 91.21
200 91.48 91.62 91.55 91.95
BBA 100 92.29 92.38 92.33 92.15
200 92.37 92.51 92.44 92.48
BCSA 100 94.14 94.26 94.20 94.71
200 95.25 95.38 95.31 95.32
BAEFA 100 94.06 94.19 94.09 94.36
200 94.52 94.63 94.57 94.79
Proposed Model 100 95.53 95.76 95.64 95.84
200 96.04 96.35 96.19 96.25
Figure 5. Performance comparison of BIAS with other algorithms.
The BIAS optimization algorithm was used to produce the best possible feature set, displayed in Table 8. BIAS identifies the best potential subset of six of the thirteen provided features, which are used to predict COVID-19 positivity in individuals exhibiting various symptoms. Compared to the input feature set of 14 features (Table 6), 8 features were removed; the vast majority of the removed features pertain to personal information such as age, sex, etc. The results of the suggested model under different feature selections are presented in Table 8. With five features, the accuracy ranges from 98.41% to 98.68%. With six features, the highest accuracy is 98.23% and the lowest is 98.06%, with corresponding recall and precision scores of 98.35% and 98.26% (Table 8). Over seven distinct features, the accuracy ranges from 97.76% to 98.31%; with eight features, it ranges from 97.52% to 97.65%. When ten features are selected from the feature space, precision is 97.29%, recall 97.48%, F-measure 97.38%, and accuracy 97.52%. With 11 features, the highest accuracy is 96.92% and the lowest is 96.84%. According to the findings, the proposed model attains a better accuracy percentage with fewer features than its competitors.
Table 8. Results of the BIAS Feature Selection.
Features Precision Recall F-Measure Accuracy
5 98.32 98.43 98.37 98.68
5 98.41 98.47 98.44 98.41
6 98.26 98.35 98.30 98.23
6 98.14 98.46 98.30 98.06
7 98.45 98.55 98.50 98.31
7 97.35 97.49 97.42 97.76
8 97.50 97.58 97.54 97.65
8 97.14 97.30 97.22 97.52
9 97.32 97.58 97.45 97.41
10 97.29 97.48 97.38 97.52
10 97.06 97.19 97.12 97.13
11 96.58 96.67 96.62 96.84
11 96.61 96.75 96.68 96.92
12 96.35 96.42 96.38 96.56
12 96.42 96.56 96.49 96.42
13 96.11 96.20 96.15 96.25
7. Conclusions and Future Works
The IAS is a population-based metaheuristic optimization algorithm with three robust
operators: individual training sessions, group training sessions, and new student challenges.
This paper presented an improved chaotic IAS to solve data clustering problems. First, ten different chaotic maps were used to generate different versions of the IAS (i.e., CIAS-1, CIAS-2, ..., and CIAS-10). Next, 20 valid UCI clustering datasets were used to evaluate the proposed approaches. In addition, the intra-cluster summation fitness function was defined as the fitness function for the proposed model and the other comparative algorithms. The improved chaotic IAS was implemented in MATLAB 2019 with an initial population of 20 and 100 iterations. First, CIAS-1, CIAS-2, ..., and CIAS-10 were compared on different criteria; the various evaluations and comparisons showed that the Chebyshev-chaotic-map-based version performed better. Finally, the Chebyshev-based IAS was compared with other basic metaheuristic methods such as the BA, the CSA, the ABC, and the AEFA. The various experiments showed that this version of the IAS has better convergence and performance than the other basic metaheuristic algorithms. Furthermore, BIAS was tested on a COVID-19 dataset for detecting the coronavirus disease. Future research will consider a multi-objective IAS with chaotic maps for solving high-dimensional data clustering.
Author Contributions: Conceptualization, F.S.G. and A.A.K.; methodology, F.S.G.; software, A.A.K.; validation, F.S.G. and A.A.K.; formal analysis, A.A.K.; investigation, A.A.K.; resources, F.S.G.; data curation, F.S.G.; writing—original draft preparation, A.A.K.; writing—review and editing, F.S.G.; visualization, A.A.K.; supervision, F.S.G.; project administration, F.S.G.; funding acquisition, F.S.G. All authors have read and agreed to the published version of the manuscript.
Funding: This paper received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: All data used in this manuscript were downloaded from the UCI repository.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Sorkhabi, L.B.; Gharehchopogh, F.S.; Shahamfar, J. A systematic approach for pre-processing electronic health records for mining: Case study of heart disease. Int. J. Data Min. Bioinform. 2020, 24, 97–120. [CrossRef]
2. Arasteh, B.; Abdi, M.; Bouyer, A. Program source code comprehension by module clustering using combination of discretized gray wolf and genetic algorithms. Adv. Eng. Softw. 2022, 173, 103252. [CrossRef]
3. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S.; Ewees, A.A.; Abualigah, L.; Elaziz, M.A. MTV-MFO: Multi-Trial Vector-Based Moth-Flame Optimization Algorithm. Symmetry 2021, 13, 2388. [CrossRef]
4. Izci, D. A novel improved atom search optimization algorithm for designing power system stabilizer. Evol. Intell. 2022, 15, 2089–2103. [CrossRef]
5. Ekinci, S.; Izci, D.; Al Nasar, M.R.; Abu Zitar, R.; Abualigah, L. Logarithmic spiral search based arithmetic optimization algorithm with selective mechanism and its application to functional electrical stimulation system control. Soft Comput. 2022, 26, 12257–12269. [CrossRef]
6. Arasteh, B.; Sadegi, R.; Arasteh, K. Bölen: Software module clustering method using the combination of shuffled frog leaping and genetic algorithm. Data Technol. Appl. 2021, 55, 251–279. [CrossRef]
7. Arasteh, B.; Sadegi, R.; Arasteh, K. ARAZ: A software modules clustering method using the combination of particle swarm optimization and genetic algorithms. Intell. Decis. Technol. 2020, 14, 449–462. [CrossRef]
8. Gharehchopogh, F.S.; Gholizadeh, H. A comprehensive survey: Whale Optimization Algorithm and its applications. Swarm Evol. Comput. 2019, 48, 1–24. [CrossRef]
9. Jahangiri, M.; Hadianfard, M.A.; Najafgholipour, M.A.; Jahangiri, M.; Gerami, M.R. Interactive autodidactic school: A new metaheuristic optimization algorithm for solving mathematical and structural design optimization problems. Comput. Struct. 2020, 235, 106268. [CrossRef]
10. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report TR06; Erciyes University: Ercis, Turkey, 2005.
11. Yang, X.-S. A new metaheuristic bat-inspired algorithm. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010); Springer: Berlin/Heidelberg, Germany, 2010; pp. 65–74.
12. Askarzadeh, A. A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Comput. Struct. 2016, 169, 1–12. [CrossRef]
13. Yadav, A.; Kumar, N. Artificial electric field algorithm for engineering optimization problems. Expert Syst. Appl. 2020, 149, 113308. [CrossRef]
14. Ahmadi, R.; Ekbatanifard, G.; Bayat, P. A Modified Grey Wolf Optimizer Based Data Clustering Algorithm. Appl. Artif. Intell. 2021, 35, 63–79. [CrossRef]
15. Ashish, T.; Kapil, S.; Manju, B. Parallel bat algorithm-based clustering using mapreduce. In Networking Communication and Data Knowledge Engineering; Springer: Berlin/Heidelberg, Germany, 2018; pp. 73–82.
16. Eesa, A.S.; Orman, Z. A new clustering method based on the bio-inspired cuttlefish optimization algorithm. Expert Syst. 2020, 37, e12478. [CrossRef]
17. Olszewski, D. Asymmetric k-means algorithm. In Adaptive and Natural Computing Algorithms; Lecture Notes in Computer Science; Dobnikar, A., Lotric, U., Ster, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6594, pp. 1–10.
18. Aggarwal, S.; Singh, P. Cuckoo and krill herd-based k-means++ hybrid algorithms for clustering. Expert Syst. 2019, 36, e12353. [CrossRef]
19. Zhang, G.; Zhang, C.; Zhang, H. Improved K-means algorithm based on density Canopy. Knowl. Based Syst. 2018, 145, 289–297. [CrossRef]
20. Kumar, A.; Kumar, D.; Jarial, S. A novel hybrid K-means and artificial bee colony algorithm approach for data clustering. Decis. Sci. Lett. 2018, 7, 65–76. [CrossRef]
21. Nasiri, J.; Khiyabani, F.M. A whale optimization algorithm (WOA) approach for clustering. Cogent Math. Stat. 2018, 5, 1483565. [CrossRef]
22. Qaddoura, R.; Faris, H.; Aljarah, I. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 8387–8412. [CrossRef]
23. Zhou, Y.; Wu, H.; Luo, Q.; Abdel-Baset, M. Automatic data clustering using nature-inspired symbiotic organism search algorithm. Knowl. Based Syst. 2019, 163, 546–557. [CrossRef]
24. Ewees, A.A.; Elaziz, M.A. Performance analysis of Chaotic Multi-Verse Harris Hawks Optimization: A case study on solving engineering problems. Eng. Appl. Artif. Intell. 2020, 88, 103370. [CrossRef]
25. Chen, K.; Zhou, F.; Liu, A. Chaotic dynamic weight particle swarm optimization for numerical function optimization. Knowl. Based Syst. 2018, 139, 23–40. [CrossRef]
26. Zhang, X.; Xu, Y.; Yu, C.; Heidari, A.A.; Li, S.; Chen, H.; Li, C. Gaussian mutational chaotic fruit fly-built optimization and feature selection. Expert Syst. Appl. 2019, 141, 112976. [CrossRef]
27. Gharehchopogh, F.S.; Nadimi-Shahraki, M.H.; Barshandeh, S.; Abdollahzadeh, B.; Zamani, H. CQFFA: A Chaotic Quasi-oppositional Farmland Fertility Algorithm for Solving Engineering Optimization Problems. J. Bionic Eng. 2022, 20, 158–183. [CrossRef]
28. Nadimi-Shahraki, M.H.; Zamani, H.; Mirjalili, S. Enhanced whale optimization algorithm for medical feature selection: A COVID-19 case study. Comput. Biol. Med. 2022, 148, 105858. [CrossRef]
29. Ahmed, S.; Sheikh, K.H.; Mirjalili, S.; Sarkar, R. Binary Simulated Normal Distribution Optimizer for feature selection: Theory and application in COVID-19 datasets. Expert Syst. Appl. 2022, 200, 116834. [CrossRef]
30. Piri, J.; Mohapatra, P.; Acharya, B.; Gharehchopogh, F.S.; Gerogiannis, V.C.; Kanavos, A.; Manika, S. Feature Selection Using Artificial Gorilla Troop Optimization for Biomedical Data: A Case Analysis with COVID-19 Data. Mathematics 2022, 10, 2742. [CrossRef]
31. Nadimi-Shahraki, M.H.; Fatahi, A.; Zamani, H.; Mirjalili, S. Binary Approaches of Quantum-Based Avian Navigation Optimizer to Select Effective Features from High-Dimensional Medical Data. Mathematics 2022, 10, 2770. [CrossRef]
32. Nadimi-Shahraki, M.H.; Fatahi, A.; Zamani, H.; Mirjalili, S.; Oliva, D. Hybridizing of Whale and Moth-Flame Optimization Algorithms to Solve Diverse Scales of Optimal Power Flow Problem. Electronics 2022, 11, 831. [CrossRef]
33. Nadimi-Shahraki, M.H.; Moeini, E.; Taghian, S.; Mirjalili, S. DMFO-CD: A Discrete Moth-Flame Optimization Algorithm for Community Detection. Algorithms 2021, 14, 314. [CrossRef]
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.