A Divide-and-Conquer Strategy
for Adaptive Neuro-Fuzzy Inference
System Learning Using Metaheuristic
Algorithm
Mohd Najib Mohd Salleh, Kashif Hussain and Noreen Talpur
Abstract Adaptive neuro-fuzzy inference system (ANFIS) has produced promising
results in model approximation. The core of ANFIS computation lies in the training
of its parameters, and metaheuristic algorithms have been successfully employed for
this task. Conventionally, each population individual in a metaheuristic algorithm,
regarded as an ANFIS model with candidate parameters, is evaluated for its fitness
on the complete training set. This makes ANFIS parameter training computationally
expensive when the dataset is large. This paper proposes a divide-and-conquer
strategy in which each population individual is given a piece of the training set,
instead of the complete set, to train and evaluate ANFIS model fitness. The proposed
training approach is evaluated on testing accuracy as well as training computational
complexity. Experiments on several classification problems reveal that the proposed
methodology reduced training computational complexity by about 93%, while the
resulting ANFIS rules achieved testing accuracy comparable to that of the
conventional training approach.
Keywords Neuro-fuzzy systems · Fuzzy inference · Metaheuristic algorithms · Learning
1 Introduction
Computationally intelligent techniques such as neural networks and fuzzy logic have
been successfully used for modeling nonlinear problems in engineering, science,
medicine, business, etc. [1]. The fusion of neural networks and fuzzy logic has resulted
in highly accurate, adaptive, and flexible neuro-fuzzy systems (NFSs). An NFS uses the
learning ability of a neural network and the reasoning ability of a fuzzy inference system to
generate fuzzy IF-THEN rules that best map inputs to the desired output. The performance of
any NFS depends on how the network learns its weights (parameters). Hence, the learning
algorithm employed in an NFS is crucial to successfully modeling a problem [2]. There
has been extensive research pursuing high accuracy and efficient learning in NFSs.
According to [3], an efficient NFS maintains (a) fast parameter learning that yields the
smallest possible error and (b) low network complexity for lower computational cost.
M. N. M. Salleh (B) · K. Hussain · N. Talpur
Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia
e-mail: najib@uthm.edu.my
© Springer Nature Singapore Pte Ltd. 2019
V. Piuri et al. (eds.), Intelligent and Interactive Computing, Lecture Notes in Networks
and Systems 67, https://doi.org/10.1007/978-981-13-6031-2_18
Adaptive neuro-fuzzy inference system (ANFIS) [4] is based on the Takagi–Sugeno–Kang
(TSK) inference mechanism. It is the most successful among the NFSs, having been
applied to almost all types of data modeling problems including regression,
classification, approximation, and control problems [2, 5]. However, because it relies
on gradient-based learning, ANFIS suffers from poor learning performance, especially on
problems where parameter optimization is highly multimodal and non-convex. Moreover,
the gradient-based learning algorithm in standard ANFIS is slow and prone to suboptimal
parameter tuning. To address this, researchers have proposed numerous alternatives for
ANFIS learning using metaheuristic algorithms. This research also uses a metaheuristic
algorithm, particle swarm optimization (PSO), for ANFIS parameter training, but with an
approach different from that typically applied in the literature.
The subsequent section discusses ANFIS and its standard learning algorithm, and
highlights the alternative ANFIS learning algorithms proposed in the literature.
Section 3 explains the proposed divide-and-conquer approach for the
training of ANFIS parameters using PSO. The experimental environment is presented
in Sect. 4. Section 5 discusses results, and the study is concluded in Sect. 6.
2 Adaptive Neuro-fuzzy Inference System
Jang [4] introduced the adaptive neuro-fuzzy inference system (ANFIS) in 1993. It is
a neural network-type structure with fuzzy logic embedded in the first and fourth layers.
The trainable parameters of the ANFIS architecture are contained in the first and fourth
layers, and the remaining three layers perform inference operations to connect the IF
and THEN parts of a fuzzy rule. A fuzzy rule in ANFIS may be expressed as shown
in Fig. 1. It has two parts: the IF part, also called the premise part, and the
THEN part, known as the consequent part. The k parameters in these two parts are
trained by the learning algorithm. This implies that if an ANFIS architecture has r rules,
then k · r parameters will be trained, meaning that the ANFIS training algorithm will
incur enormous computational cost when the number of rules grows exponentially in a
problem with a large number of input variables.
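To make this growth concrete, the following sketch counts rules and trainable parameters when every combination of membership functions forms a rule. It assumes Gaussian membership functions with two premise parameters each and first-order consequents with one coefficient per input plus a constant, as in the rule of Fig. 1. The function name and the way parameters are split between premise and consequent are illustrative assumptions, not a formula taken from this paper.

```python
def anfis_parameter_count(n_inputs: int, mfs_per_input: int) -> dict:
    """Count rules and trainable parameters for a grid-style rule base."""
    # One rule per combination of membership functions across inputs
    n_rules = mfs_per_input ** n_inputs
    # Assumed: two premise parameters (center, width) per membership function
    premise = n_inputs * mfs_per_input * 2
    # Assumed: one coefficient per input plus a constant term, per rule
    consequent = n_rules * (n_inputs + 1)
    return {
        "rules": n_rules,
        "premise": premise,
        "consequent": consequent,
        "total": premise + consequent,
    }


for n in (2, 4, 7):
    print(n, anfis_parameter_count(n, mfs_per_input=2))
```

With two membership functions per input, going from 2 to 7 inputs raises the rule count from 4 to 128, and the consequent parameters dominate the total.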
The layer-wise illustration of the ANFIS architecture is given in Fig. 2. The first layer
of ANFIS takes crisp inputs and converts them into degrees of membership for the linguistic
terms defined. In Fig. 2, $\mu_{A_1}$, $\mu_{A_2}$ and $\mu_{B_1}$, $\mu_{B_2}$ are membership
functions for input variables x and y, respectively. The shape of the membership function
is chosen by the user, based on experience or expert opinion. It can be trapezoidal,
triangular, Gaussian, generalized bell-shaped, or any other user-defined shape. Every
node in this layer contains parameters that are updated during the training process. For
example, in the case of the Gaussian membership function (Fig. 3), the trainable
parameters are the center (c) and width (σ).
Fig. 1 IF-THEN rule in ANFIS: IF x is A1 AND y is B1 THEN f = xp + yq + r (the IF clause
is the premise part, the THEN clause is the consequent part)
Fig. 2 ANFIS architecture with two inputs, four rules, and one output (Layers 1 to 5: inputs
x, y; membership functions A1, A2, B1, B2; product nodes Π; normalization nodes N; rule
outputs f1 to f4 with fi = xpi + yqi + ri; summation node Σ; premise part trained by GD,
consequent part by LSE)
Fig. 3 Gaussian-type membership function
The output $O_{1,i}$ of every node in the first layer is a membership degree, as expressed
in (1), where $\mu_{A_i}$ and $\mu_{B_i}$ are membership functions:
$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \qquad \text{or} \qquad O_{1,i} = \mu_{B_i}(y), \quad i = 3, 4 \tag{1}$$
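As an illustration of a first-layer node, the sketch below evaluates a Gaussian membership function with center c and width σ. The parameterization exp(−(x − c)² / (2σ²)) is a common form and is assumed here, since the text does not state which variant is used. The function name is illustrative.

```python
import numpy as np


def gaussian_mf(x, c, sigma):
    """Membership degree of x in a Gaussian fuzzy set with center c and width sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)


# Membership degrees of three crisp inputs in a fuzzy set centered at 0.5
degrees = gaussian_mf([0.0, 0.5, 1.0], c=0.5, sigma=0.25)
print(degrees)
```

The degree equals 1 at the center and decays symmetrically on both sides, so moving c shifts the fuzzy set and changing σ widens or narrows it.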
The second layer simply performs the product operation to calculate the firing
strength $w_i$ of a fuzzy rule. The output $O_{2,i}$ of each node in this layer is the product
of a combination of membership degrees related to the input variables:
$$O_{2,i} = w_i = \mu_{A_i}(x) \cdot \mu_{B_i}(y), \quad i = 1, 2, 3, 4 \tag{2}$$
In this layer, the rule base is generated with the help of a partitioning method;
clustering and grid partitioning are the methods mainly used [6]. This research uses
grid partitioning to generate $m^n$ rules, where $m$ is the number of membership functions per
input and $n$ is the number of input variables. The strength of each rule is normalized
against all rules in the rule base in the third layer, using (3):
$$O_{3,i} = \bar{w}_i = \frac{w_i}{\sum_{j=1}^{n} w_j}, \quad i = 1, 2, 3, 4 \tag{3}$$
where $\bar{w}_i$ is the normalized firing strength of the $i$th rule and $n$ in the summation
denotes the total number of rules. The fourth layer is the consequent part of the fuzzy
rules. The output $O_{4,i}$ of each rule is calculated using a linear polynomial
equation, as expressed in (4):
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (x p_i + y q_i + r_i), \quad i = 1, 2, 3, 4 \tag{4}$$
where $p_i$, $q_i$, and $r_i$ are the consequent parameters associated with the $i$th rule. These
parameters are trained by the learning algorithm. In the last layer, the function of the
only node $O_5$ is to aggregate the outputs of the rules calculated in the previous layer (5):
$$O_5 = \sum_{i=1}^{n} O_{4,i} \tag{5}$$
where $n$ is the total number of rules (such as the four rules in Fig. 2). It is evident from
(5) that the output of each individual rule contributes directly to the aggregated output
of ANFIS.
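The five layers can be sketched as a single forward pass. The code below is a minimal NumPy illustration, assuming Gaussian membership functions and a grid-partitioned rule base in which each rule pairs one membership function per input. The function name, array layout, and rule ordering are assumptions made for the example.

```python
import itertools

import numpy as np


def anfis_forward(X, centers, sigmas, consequents):
    """Five-layer ANFIS forward pass with grid-partitioned rules.

    X:           (M, n) crisp inputs
    centers:     (n, m) Gaussian centers, m membership functions per input
    sigmas:      (n, m) Gaussian widths
    consequents: (m**n, n + 1) rows [p_1, ..., p_n, r], one row per rule
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    m = centers.shape[1]
    # Layer 1: membership degrees, shape (M, n, m)
    mu = np.exp(-0.5 * ((X[:, :, None] - centers[None]) / sigmas[None]) ** 2)
    # Layer 2: firing strength of each rule (product over its membership degrees)
    rules = list(itertools.product(range(m), repeat=n))
    w = np.stack(
        [np.prod([mu[:, j, k] for j, k in enumerate(rule)], axis=0) for rule in rules],
        axis=1,
    )
    # Layer 3: normalize firing strengths across rules
    w_bar = w / w.sum(axis=1, keepdims=True)
    # Layer 4: first-order rule outputs, weighted by normalized strength
    f = X @ consequents[:, :-1].T + consequents[:, -1]
    # Layer 5: aggregate over rules
    return (w_bar * f).sum(axis=1)


# Two inputs, two membership functions per input, hence four rules
centers = np.array([[0.0, 1.0], [0.0, 1.0]])
sigmas = np.full((2, 2), 0.5)
# Every rule shares the consequent f = x + 2y + 0.5, so the output reduces to f
consequents = np.tile([1.0, 2.0, 0.5], (4, 1))
X = np.array([[0.2, 0.7], [0.9, 0.1]])
print(anfis_forward(X, centers, sigmas, consequents))
```

Because the normalized firing strengths sum to one, identical consequents make the output independent of the premise parameters, which gives a convenient sanity check for an implementation.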
2.1 Standard ANFIS Learning Method
To achieve minimum error, the ANFIS learning algorithm updates the trainable membership
function parameters (such as c, σ) and consequent parameters (such as p, q, r). The
standard learning algorithm uses a hybrid of gradient descent (GD) and least squares
estimator (LSE), applied in a two-pass learning procedure. During the first (forward)
pass, LSE tunes the consequent parameters, while in the second (backward) pass, GD
updates the membership function parameters to further approximate the model with
better accuracy.
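The LSE pass can be viewed as an ordinary least-squares problem, because the ANFIS output in (5) is linear in the consequent parameters once the normalized firing strengths are fixed. The sketch below solves it in one batch with `numpy.linalg.lstsq`. This is one way to obtain the estimate and is not necessarily the procedure used in [4]. The function name and array layout are assumptions.

```python
import numpy as np


def lse_consequents(X, w_bar, y):
    """Estimate consequent parameters by least squares for fixed firing strengths.

    X:     (M, n) inputs
    w_bar: (M, R) normalized firing strengths from the third layer
    y:     (M,) target outputs
    Returns an (R, n + 1) array with rows [p_1, ..., p_n, r].
    """
    M, n = X.shape
    R = w_bar.shape[1]
    # Append a column of ones so the constant term r is estimated as well
    X1 = np.hstack([X, np.ones((M, 1))])
    # Design matrix: one block of (n + 1) columns per rule, scaled by its strength
    A = (w_bar[:, :, None] * X1[:, None, :]).reshape(M, R * (n + 1))
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta.reshape(R, n + 1)
```

In a hybrid scheme this solve would be repeated whenever the premise parameters, and hence the firing strengths, change.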
Training of ANFIS parameters is often formulated as an optimization problem. In
this regard, according to [5], three main approaches to ANFIS learning are
found in the literature: gradient-based learning, heuristic algorithms, and hybrid
methods combining derivative-based and heuristic approaches. Since derivative-based
learning approaches are prone to finding suboptimal solutions [7], many researchers prefer
metaheuristic approaches. The next section highlights some of the works proposing
heuristic methods for the training of ANFIS parameters for solving classification
problems.
2.2 Heuristic Approaches for ANFIS Learning
To counter the suboptimal performance of derivative-based methods in ANFIS
training, various heuristic and metaheuristic algorithms have been utilized in the
literature. Premise and consequent parameters are often optimized by devising a combined
search space and letting a metaheuristic algorithm find the set of parameters that
produces the best results for the ANFIS network.
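One simple way to form such a combined search space is to concatenate all premise and consequent parameters into a single real-valued vector that the metaheuristic treats as one candidate solution. The layout below (centers, then widths, then consequents), the Gaussian premise parameters, and the helper names are illustrative assumptions, not a scheme prescribed by the cited works.

```python
import numpy as np


def pack(centers, sigmas, consequents):
    """Flatten premise and consequent parameters into one candidate vector."""
    return np.concatenate([centers.ravel(), sigmas.ravel(), consequents.ravel()])


def unpack(vector, n_inputs, mfs_per_input):
    """Recover the parameter arrays from a candidate vector (inverse of pack)."""
    n_mf = n_inputs * mfs_per_input
    n_rules = mfs_per_input ** n_inputs
    centers = vector[:n_mf].reshape(n_inputs, mfs_per_input)
    sigmas = vector[n_mf:2 * n_mf].reshape(n_inputs, mfs_per_input)
    consequents = vector[2 * n_mf:].reshape(n_rules, n_inputs + 1)
    return centers, sigmas, consequents


# Two inputs with two membership functions each: 4 + 4 + 12 = 20 dimensions
vector = pack(np.zeros((2, 2)), np.ones((2, 2)), np.zeros((4, 3)))
print(vector.shape)
```

The dimension of the search space is then the length of this vector, which is what the metaheuristic has to explore.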
Zheng et al. [8] employed ANFIS for the identification of earthquake victims in
disaster management operations in China. To optimize the network parameters,
the authors proposed a novel differential biogeography-based optimization (DBBO)
algorithm. When compared with other evolutionary algorithms, DBBO produced
better classification results. Obo et al. [9] classified human gestures for
human–computer interaction using ANFIS, where an evolutionary algorithm was used to train
the parameters. The mine blast algorithm (MBA) was used for ANFIS parameter training
to solve classification problems [10]. That research first improved MBA and then
applied it to both premise and consequent parameters. With an application in medical
science, Thangavel and Mohideen [11] proposed ant colony optimization (ACO) for
improving the learning of ANFIS parameters. The research reported improved ANFIS
performance on a mammogram classification problem, identifying breast cancer cases
more accurately. Rini et al. [12] proposed a modified ANFIS with linguistic hedges in
the membership functions. The research improved the network accuracy by optimizing all
the trainable parameters using PSO, and the authors reported better accuracy on
different classification problems.
Based on this brief literature review, it is evident that metaheuristic algorithms have
been successfully applied to the parameter training of ANFIS models, and that
there is potential for further improvement in this area of research.
3 Proposed Method for ANFIS Learning
From the literature review above, it is clear that metaheuristic algorithms
have been successfully used for the ANFIS parameter optimization problem. In most of
these works, each population individual in a metaheuristic algorithm is taken as an ANFIS
model, meaning that a population individual maintains a set of parameters (premise
and consequent parameters). When evaluated on the objective function (generally, an error
measure), each individual is given the entire training set to evaluate model fitness. After
training, the best-fit model is tested on the testing set to validate model accuracy. This
means that every population individual in a metaheuristic algorithm must process all
M instances of the training set during the parameter training process. Therefore,
if there are N population individuals in a metaheuristic algorithm, all of them will run
the complete training set T times, once per iteration. This is illustrated in Fig. 4a.
Fig. 4 Conventional and proposed approach for ANFIS parameters training using metaheuristics:
(a) conventional approach, (b) proposed divide-and-conquer approach. Diagram labels: shuffle
training set; population individuals 1 to N; training set in (a), training set chunks 1 to N
in (b); iterations t = 1 to t = T; particle with best fitness value
In contrast to the conventional approach found in the existing literature, in the proposed
approach (Fig. 4b) the training set is shuffled and split into N chunks in each iteration t, so
that each population individual is given a new batch of training instances instead of the
whole training set. The idea behind shuffling the instances every time before splitting
is that, over the iterations, every population individual is exposed to the whole of the
data, so that a best-fit model can be generated. At the end of the iterations, the population
individual with the best objective function value is selected for testing on the testing
instances. Note that the number of chunks equals the population size: if
there are N population individuals, N chunks of training data are prepared.
As an example, we used PSO in this study. A comprehensive description of PSO
can be found in [13]. Algorithm 1 presents the pseudocode of the PSO implementation
for ANFIS training in this study. The ANFIS training procedure starts with shuffling
the classification dataset and partitioning it into training and testing sets. Then, PSO
initializes particles with random positions in the search space. The search process is
performed in iterations until the stop criteria are reached (usually, a maximum number of
iterations). In each iteration, the training dataset is first shuffled and then split
into N chunks (N being the swarm size), and each particle is assigned one chunk of the
training set. For example, if the training set has 100 instances and the swarm size
is 10, then each chunk comprises 100/10 = 10 instances. Each particle
is then evaluated on its designated chunk to obtain its fitness value, and the particle's
personal best and the swarm's global best positions are updated. After the iterations,
the particle with the best solution is taken as the trained ANFIS model, which is then
tested on the testing set for validation. In testing, the trained ANFIS model
is assessed on the number of instances correctly identified.
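The shuffle-and-split step described above can be sketched as follows, using the 100-instance, swarm-size-10 example from the text. The helper name is hypothetical. `numpy.array_split` is used so that training-set sizes not divisible by the swarm size still yield N chunks whose sizes differ by at most one, which is an assumption about how remainders are handled.

```python
import numpy as np


def shuffled_chunks(n_instances, n_chunks, rng):
    """Return n_chunks index arrays covering a random permutation of the training set."""
    order = rng.permutation(n_instances)
    return np.array_split(order, n_chunks)


rng = np.random.default_rng(0)
chunks = shuffled_chunks(100, 10, rng)
# Ten chunks of ten instances; particle i would be evaluated on chunks[i]
print([len(c) for c in chunks])
```

Calling the helper again in the next iteration produces a different permutation, so each particle receives a fresh batch of instances.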
4 Experimental Settings
Several experiments were performed to evaluate the proposed approach to training
ANFIS parameters using metaheuristic algorithms. We solved classification problems
with ANFIS using three classification datasets from the UCI (University of
California, Irvine) Machine Learning Repository [14]: Habermans, Iris Flower, and
Vertebral, and three from the KEEL (knowledge extraction based on evolutionary learning)
repository [15]: Banana, Phoneme, and Appendicitis. The datasets were divided into training
and testing sets in a ratio of approximately 70:30. Table 1 presents the dataset details
together with the ANFIS and PSO settings.
Algorithm 1: PSO Steps for ANFIS Training
1: Partition dataset into training and testing sets
2: Initialize N particles with random positions
3: Do
4:   Shuffle and partition training set into N chunks
5:   Assign each particle with one chunk of training set
6:   Evaluate fitness of each particle on assigned training set
7:   Assign personal best and global best particles
8:   Update velocity and position of the particles
9: Until maximum iterations or stop criteria reached
10: Apply global best particle on testing set and evaluate results
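The loop in Algorithm 1 can be sketched in Python as below. This is a minimal illustration rather than the implementation used in the experiments: the initialization range, the zero initial velocities, and the linearly decreasing inertia weight are assumptions, and a linear-model MSE stands in for the ANFIS forward pass so that the block stays self-contained. The function and variable names are hypothetical.

```python
import numpy as np


def pso_chunked(fitness, dim, X_train, y_train, n_particles=15, max_iter=50,
                c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """PSO in which each particle is scored on its own chunk of the training set.

    fitness(position, X, y) must return a scalar error to be minimized.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))  # assumed init range
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_fit = np.full(n_particles, np.inf)
    gbest_pos, gbest_fit = pos[0].copy(), np.inf

    for t in range(max_iter):
        # Steps 4-5: shuffle the training set and give each particle one chunk
        chunks = np.array_split(rng.permutation(len(X_train)), n_particles)
        # Step 6: evaluate every particle on its own chunk only
        fit = np.array([fitness(pos[i], X_train[idx], y_train[idx])
                        for i, idx in enumerate(chunks)])
        # Step 7: update personal bests and the global best
        improved = fit < pbest_fit
        pbest_pos[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        best = int(np.argmin(pbest_fit))
        if pbest_fit[best] < gbest_fit:
            gbest_pos, gbest_fit = pbest_pos[best].copy(), float(pbest_fit[best])
        # Step 8: velocity and position update (inertia decreases linearly, assumed)
        w = w_max - (w_max - w_min) * t / max(max_iter - 1, 1)
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = pos + vel
    return gbest_pos, gbest_fit


def linear_mse(position, X, y):
    """Placeholder fitness: MSE of a linear model, standing in for ANFIS."""
    return float(np.mean((X @ position[:-1] + position[-1] - y) ** 2))


data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(150, 2))
y = X @ np.array([1.0, -2.0]) + 0.5
best_pos, best_fit = pso_chunked(linear_mse, dim=3, X_train=X, y_train=y)
print(best_pos, best_fit)
```

Because fitness values recorded in different iterations are computed on different chunks, the comparison with the stored personal and global bests is only approximate. The sketch follows Algorithm 1 as written and does not attempt to correct for this.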
Table 1 Experimental settings
Datasets         Habermans (3 Attributes, 306 Instances, 2 Classes)
                 Iris Flower (4 Attributes, 150 Instances, 3 Classes)
                 Vertebral (6 Attributes, 310 Instances, 3 Classes)
                 Banana (2 Attributes, 5300 Instances, 2 Classes)
                 Phoneme (5 Attributes, 5404 Instances, 2 Classes)
                 Appendicitis (7 Attributes, 106 Instances, 2 Classes)
ANFIS settings   Membership function shape                 Gaussian
                 Membership functions per input            2
                 Rule generation method                    Grid partitioning
PSO settings     Swarm size                                15
                 Cognition factor C1, social factor C2     2
                 Inertia weight range                      [0.4, 0.9]
                 Maximum iterations                        50

The objective function for ANFIS training is the mean squared error (MSE). After
the model is trained, it is evaluated on the testing set for the number of instances
correctly identified; hence, testing accuracy is computed as (6):
$$\text{Accuracy}\ (\%) = \frac{\text{Correct}_N}{\text{Instances}_T} \times 100 \tag{6}$$
where $\text{Correct}_N$ and $\text{Instances}_T$ are the number of testing instances correctly
identified and the total number of testing instances, respectively. The main motivation for
the proposed methodology is to reduce the computational complexity of ANFIS training. We
measure the training computational complexity ($CC_T$) of the conventional and the proposed
approaches using (7):
$$CC_T = \text{Train}_P \times N \times \text{MaxItr} \tag{7}$$
where $\text{Train}_P$, $N$, and MaxItr are the number of training instances executed by each
particle, the swarm size, and the maximum number of iterations of the PSO algorithm,
respectively.
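As a worked illustration of (7), the sketch below computes the complexity measure for both approaches and the resulting reduction. It uses the swarm size and iteration limit from Table 1 and a training set of 105 instances, which corresponds to 70% of the 150 Iris Flower instances. It assumes that in the proposed approach each particle processes the training-set size divided by N, using integer division; the function name is hypothetical.

```python
def training_complexity(train_per_particle, swarm_size, max_itr):
    """CC_T = Train_P x N x MaxItr, as in (7)."""
    return train_per_particle * swarm_size * max_itr


train_size, swarm_size, max_itr = 105, 15, 50

# Conventional: every particle processes the whole training set
conventional = training_complexity(train_size, swarm_size, max_itr)
# Proposed: every particle processes one of N chunks (integer division assumed)
proposed = training_complexity(train_size // swarm_size, swarm_size, max_itr)
reduction = 100.0 * (1.0 - proposed / conventional)

print(conventional, proposed, round(reduction))  # 78750 5250 93
```

The swarm size and iteration count cancel in the ratio, so the relative reduction depends only on the fraction of the training set seen by each particle, which is roughly 1/N.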
5 Results
The proposed divide-and-conquer strategy for ANFIS training using PSO was implemented
on six classification datasets, alongside the conventional implementation of PSO for
ANFIS training. The results are compared for accuracy and computational cost of ANFIS
training. Because PSO is a stochastic method, a single run may not reflect the actual
performance; hence, the experiments on each dataset were run 10 times and the mean of
the results is reported in this section. Table 2 shows the performance results of
ANFIS-PSO with the proposed and conventional methodologies for comparison. In terms of
training MSE and testing accuracy, the two approaches differ only slightly: testing
accuracy differs by at most three percentage points (Banana), and on Phoneme the proposed
approach is marginally better. However, from the perspective of the computational
complexity of the training process given in Table 3, the proposed methodology reduced
computational complexity significantly. Overall, around 93% of the computational
complexity of the training process was eliminated.

Table 2 Performance results of ANFIS-PSO on classification problems
Problem        Proposed approach                      Conventional approach
               Training MSE   Testing accuracy (%)    Training MSE   Testing accuracy (%)
Habermans      0.0098         97                      0.0100         98
Iris Flower    0.0047         100                     0.0049         100
Vertebral      0.0108         96                      0.0100         98
Banana         0.0139         94                      0.0105         97
Phoneme        0.0122         94                      0.0152         93
Appendicitis   0.0070         99                      0.0050         100

Table 3 ANFIS parameters training computational complexity
Problem        Training computational complexity            Complexity reduced by (%)
               Proposed approach   Conventional approach
Habermans      4200                64,200                    93
Iris Flower    2100                31,500                    93
Vertebral      4200                65,100                    94
Banana         74,100              1,113,000                 93
Phoneme        75,600              1,134,900                 93
Appendicitis   1500                22,200                    93
6 Conclusions and Future Directions
A new approach for optimizing ANFIS parameters has been proposed in this study.
Unlike the traditional approach, where population individuals in metaheuristic algorithms
are given the complete training set to evaluate model fitness, we proposed a
divide-and-conquer strategy that significantly reduced training computational complexity. In
this paper, the training set is split into a number of chunks equal to the population size,
meaning that each population individual is given one chunk of the training set instead of
the whole dataset. The proposed approach reduced training computational complexity
by about 93% on the classification problems considered, whereas classification accuracy
remained comparable to that of the conventional method. The proposed methodology can be
readily implemented with more efficient population-based metaheuristic algorithms, which is
a future direction of this study.
Acknowledgements The authors would like to thank Universiti Tun Hussein Onn Malaysia
(UTHM), Malaysia for supporting this research under Postgraduate Incentive Research Grant, Vote
No. U560.
References
1. Subramanian K, Savitha R, Suresh S (2014) A complex-valued neuro-fuzzy inference system
and its learning mechanism. Neurocomputing 123:110–120
2. Ghosh S, Biswas S, Sarkar D, Sarkar PP (2014) A novel Neuro-fuzzy classification technique
for data mining. Egypt Inf J 15(3):129–147
3. Shihabudheen KV, Pillai GN (2018) Recent advances in neuro-fuzzy system: a survey. Knowl-
Based Syst 152:136–162
4. Jang JS (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man
Cybern 23(3):665–685
5. Karaboga D, Kaya E (2018) Adaptive network based fuzzy inference system (ANFIS) training
approaches: a comprehensive survey. Artif Intell Rev 1–31
6. Kisi O, Shiri J, Karimi S, Adnan RM (2018) Three different adaptive neuro fuzzy computing
techniques for forecasting long-period daily streamflows. In: Big data in engineering applica-
tions. Springer, Singapore, pp 303–321
7. Salleh MNM, Hussain K, Naseem R, Uddin J (2016) Optimization of ANFIS using artificial
bee colony algorithm for classification of Malaysian SMEs. In: International conference on
soft computing and data mining. Springer, Cham, pp 21–30
8. Zheng YJ, Ling HF, Chen SY, Xue JY (2015) A hybrid neuro-fuzzy network based on dif-
ferential biogeography-based optimization for online population classification in earthquakes.
IEEE Trans Fuzzy Syst 23(4):1070–1083
9. Obo T, Loo CK, Seera M, Kubota N (2016) Hybrid evolutionary neuro-fuzzy approach based
on mutual adaptation for human gesture recognition. Appl Soft Comput 42:377–389
10. Salleh MNM, Hussain K (2016) Accelerated mine blast algorithm for ANFIS training for
solving classification problems. Int J Softw Eng Appl 10(6):161–168
11. Thangavel K, Mohideen AK (2016) Mammogram classification using ANFIS with ant colony
optimization based learning. In: Annual convention of the computer society of India. Springer,
Singapore, pp 141–152
12. Rini DP, Shamsuddin SM, Yuhaniz SS (2016) Particle swarm optimization for ANFIS inter-
pretability and accuracy. Soft Comput 20(1):251–262
13. Poli R, Kennedy J, Blackwell T (2007) Particle swarm optimization. Swarm Intell 1(1):33–57
14. Dua D, Taniskidou EK (2017) UCI machine learning repository. University of California,
School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml
15. Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011)
KEEL data-mining software tool: data set repository, integration of algorithms and experimental
analysis framework. J Multiple-Valued Logic Soft Comput 17(2–3):255–287