ORIGINAL ARTICLE
A multi-objective Artificial Bee Colony algorithm for cost-sensitive
subset selection
Emrah Hancer¹
Received: 20 May 2021 / Accepted: 6 May 2022 / Published online: 30 May 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
Abstract
Feature selection typically aims to select a feature subset that maximally contributes to the performance of a subsequent process (such as clustering, classification or regression). Most current feature selection methods treat all features in a dataset as equally important while evaluating candidate feature subsets in the solution space. However, this assumption may not be appropriate, since each feature comes with its own impact and importance; in particular, each feature may incur a different cost with respect to specific purposes. To address this issue, we introduce an improved multi-objective artificial bee colony-based cost-sensitive subset selection method that simultaneously minimizes two conflicting objectives: the classification error rate and the feature cost. According to the results on well-known benchmarks, the proposed cost-sensitive subset selection approach outperforms recently introduced multi-objective variants of artificial bee colony, particle swarm optimization and genetic algorithms. To the best of our knowledge, this work is one of the earliest studies on multi-objective cost-sensitive subset selection in the literature.
Keywords Cost-Sensitive Learning · Feature Selection · Artificial Bee Colony · Multi-Objective Optimization
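To make the bi-objective formulation in the abstract concrete, the following minimal sketch (not the authors' implementation) scores a single candidate subset on the two objectives to be minimized, classification error rate and total feature cost; the KNN classifier, the synthetic cost vector and the benchmark dataset are illustrative assumptions.

```python
# Hedged sketch: evaluating one candidate subset on the two conflicting objectives
# named in the abstract (classification error rate and feature cost).
# The classifier, the synthetic cost vector and the dataset are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(42)
feature_costs = rng.uniform(0.1, 1.0, size=X.shape[1])  # hypothetical per-feature costs

def objectives(mask):
    """Return (error_rate, total_cost) for a boolean feature-selection mask."""
    if not mask.any():
        return 1.0, 0.0
    accuracy = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()
    return 1.0 - accuracy, feature_costs[mask].sum()

mask = rng.random(X.shape[1]) < 0.5   # a random candidate subset
print(objectives(mask))               # both objectives are to be minimized
```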
1 Introduction
Feature selection is one of the key issues in data science and machine learning applications such as clustering [1, 2], classification [3, 4] and regression [5, 6]. The overall goal is to select a subset of the input features in the dataset according to predefined performance metrics. Reducing the dimensionality of the dataset mitigates the adverse effects of noisy information, improves generalization ability and reduces the computational complexity of the subsequent learning process. Accordingly, feature selection has undergone considerable development during the last decade [7].
In terms of the evaluation strategy, feature selection methods can be grouped into three fundamental categories: filters, wrappers and embedded methods. Filters evaluate features by examining the intrinsic information in the dataset; Max-Relevance Min-Redundancy (mRmR) [8], ReliefF [9], information gain [10] and the Laplacian Score [11] are the most representative filter algorithms.
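As a concrete illustration of the filter idea, the sketch below ranks features by mutual information with the class label, an information-gain-style criterion; it is a generic example rather than an implementation of any of the cited methods, and the dataset and the number of retained features are assumptions.

```python
# A minimal filter-style sketch: rank features by mutual information with the class
# label (an information-gain-like criterion) without training any classifier.
# The dataset and the number of features to keep are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(scores)[::-1][:10]   # keep the 10 highest-scoring features
print("Top-ranked feature indices:", top_k)
```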
Wrappers use a learning algorithm as a black box to evaluate candidate feature subsets; representative examples are sequential floating forward–backward selection [12] and support vector machine recursive feature elimination (SVM-RFE) [13].
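The following sketch illustrates the wrapper idea in the spirit of SVM-RFE using scikit-learn's generic recursive feature elimination; the dataset and the number of features to keep are assumptions rather than settings taken from the cited work.

```python
# A minimal wrapper-style sketch in the spirit of SVM-RFE: a linear SVM is used as a
# black box whose weights drive recursive elimination of the weakest features.
# The dataset and the number of retained features are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10, step=1)
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))
```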
Embedded methods incorporate feature selection into the learning process itself, i.e., they combine selection and learning in a single procedure; regularized regression-based methods [14] are an example of this category.
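A minimal sketch of the embedded idea, assuming an L1-regularized (lasso-style) logistic regression in which features with non-zero coefficients are implicitly selected during training; the dataset and the regularization strength are illustrative choices.

```python
# A minimal sketch of embedded selection via L1-regularized (lasso-style) logistic
# regression; features with non-zero coefficients are retained as part of training.
# The dataset and the regularization strength C are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("Embedded selection kept features:", np.flatnonzero(model.coef_[0]))
```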
Although such feature selection methods have achieved promising results and have therefore been widely adopted in data mining applications, most of them still suffer from local convergence problems. To alleviate these problems, researchers have frequently applied evolutionary computation (EC) techniques to feature selection, since their global search characteristics allow them to explore the solution space thoroughly. The most representative EC techniques applied to feature selection are genetic algorithms (GA) [15], genetic programming (GP) [16], ant colony optimization (ACO) [17] and particle swarm optimization (PSO).
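For illustration, the sketch below runs a very small GA-style wrapper search over binary feature masks; it is a generic example of the EC approach, not the multi-objective ABC method proposed in this paper, and the population size, mutation rate and classifier are placeholder assumptions.

```python
# A minimal, generic GA-style wrapper for feature-subset search (an illustration of the
# EC approach described above, not the algorithm proposed in this paper). Population
# size, mutation rate and the KNN classifier are placeholder assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Wrapper evaluation: cross-validated accuracy of KNN on the selected features.
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, n_features)).astype(bool)  # random initial population
for generation in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]                   # keep the better half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(n_features) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(n_features) < 0.05                # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("Best subset size:", best.sum(), "accuracy:", round(fitness(best), 3))
```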
✉ Emrah Hancer
ehancer@mehmetakif.edu.tr; emrahhanc@gmail.com
¹ Department of Software Engineering, Mehmet Akif Ersoy University, Burdur 15039, Turkey
Neural Computing and Applications (2022) 34:17523–17537
https://doi.org/10.1007/s00521-022-07407-x