
A multi-objective Artificial Bee Colony algorithm for cost-sensitive subset selection


ORIGINAL ARTICLE

Emrah Hancer, Department of Software Engineering, Mehmet Akif Ersoy University, Burdur 15039, Turkey
(ehancer@mehmetakif.edu.tr; emrahhanc@gmail.com)

Received: 20 May 2021 / Accepted: 6 May 2022 / Published online: 30 May 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
Neural Computing and Applications (2022) 34:17523–17537. https://doi.org/10.1007/s00521-022-07407-x
Abstract

Feature selection typically aims to select a feature subset that maximally contributes to the performance of a further process (such as clustering, classification or regression). Most current feature selection methods treat all features in the dataset as equally important while evaluating candidate feature subsets in the solution space. However, this assumption may not be appropriate, since each feature in a dataset comes with its own impact and importance. In particular, each feature may incur a different cost to achieve some specific purpose. To address this issue, we introduce an improved multi-objective artificial bee colony-based cost-sensitive subset selection method that simultaneously minimizes two conflicting objectives: the classification error rate and the feature cost. According to the results on well-known benchmarks, the proposed cost-sensitive subset selection approach outperforms recently introduced multi-objective variants of the artificial bee colony, particle swarm optimization and genetic algorithms. To the best of our knowledge, this work is one of the earliest studies on multi-objective cost-sensitive subset selection in the literature.
Keywords: Cost-sensitive learning · Feature selection · Artificial bee colony · Multi-objective optimization
1 Introduction
Feature selection is one of the key issues in data science and machine learning applications, such as clustering [1, 2], classification [3, 4] and regression [5, 6]. The overall goal is to select a subset of the input features in the dataset according to predefined performance metrics. Reducing the dimensionality of the dataset prevents the adverse effects of noisy information, improves generalization ability and reduces the computational complexity of the subsequent learning process. Consequently, feature selection has undergone great development during the last decade [7].
In terms of evaluation, feature selection methods fall into three fundamental categories: filters, wrappers and embedded methods. Filters evaluate features by examining the intrinsic information in the dataset. Max-Relevance Min-Redundancy (mRMR) [8], ReliefF [9], information gain [10] and Laplacian Score [11] are the most representative filter algorithms. Wrappers use a learning algorithm as a black box to evaluate candidate feature subsets. Representative examples of wrappers are sequential floating forward-backward selection [12] and support vector machine recursive feature elimination (SVM-RFE) [13]. Embedded methods incorporate feature selection into the learning process itself, i.e., they combine the selection and learning stages into a single process; regularized regression-based methods [14] are an example of this category. Although all such feature selection methods have achieved promising results and have therefore been widely used in data mining applications, most of them still suffer from local convergence problems. To alleviate these problems, researchers have frequently applied evolutionary computation (EC) techniques to feature selection tasks, since their global search characteristics allow them to investigate a solution space thoroughly. The most representative EC techniques applied to feature selection are genetic algorithms (GA) [15], genetic programming (GP) [16], ant colony optimization (ACO) [17] and particle swarm optimization (PSO).
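
To make the wrapper idea and this article's two objectives concrete, below is a minimal sketch of a bi-objective, wrapper-style evaluation of a candidate feature subset. It assumes a binary feature mask, a per-feature cost vector and scikit-learn's KNeighborsClassifier as the black-box learner; all names and parameters are illustrative, not the paper's implementation.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def evaluate_subset(mask, X, y, costs, cv=5):
        """Return the two objectives to be minimized for a binary feature
        mask: (classification error rate, normalized feature cost)."""
        selected = np.flatnonzero(mask)
        if selected.size == 0:
            return 1.0, 0.0  # empty subset: worst error, zero cost
        accuracy = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                                   X[:, selected], y, cv=cv).mean()
        error_rate = 1.0 - accuracy                       # objective 1
        norm_cost = costs[selected].sum() / costs.sum()   # objective 2
        return error_rate, norm_cost

A multi-objective optimizer such as ABC then searches the space of masks for subsets that trade off these two values, rather than collapsing them into a single score.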
Article
The massive growth of data in recent years has led to challenges in data mining and machine learning tasks. One of the major challenges is the selection of relevant features from the original set of available features that maximally improves the learning performance over that of the original feature set. This issue has attracted researchers’ attention, resulting in a variety of successful feature selection approaches in the literature. Although several surveys on unsupervised learning (e.g., clustering) exist, many works concerning unsupervised feature selection (e.g., evolutionary computation-based feature selection for clustering) are missing from these surveys, so the strengths and weaknesses of those approaches remain unidentified. In this paper, we introduce a comprehensive survey on feature selection approaches for clustering, reflecting the advantages and disadvantages of current approaches from different perspectives and identifying promising trends for future research.
Article
The diagnosis and prognosis of patients with severe chronic disorders of consciousness are still challenging issues and a high rate of misdiagnosis is evident. Hence, new tools are needed for an accurate diagnosis, which will also have an impact on the prognosis. In recent years, functional Magnetic Resonance Imaging (fMRI) has been gaining more and more importance in diagnosing this patient group. Resting state scans in particular, i.e., examinations during which the patient does not perform any task, seem promising for these patient groups. After preprocessing the resting state fMRI data with a standard pipeline, we extracted the correlation matrices of 132 regions of interest. The aim was to find the regions of interest that contributed most to the distinction between the different patient groups and healthy controls. We performed feature selection using a genetic algorithm and a support vector machine. Moreover, we show that by using for classification only those regions of interest that are most often selected by our algorithm, we obtain much better classifier performance.
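
A minimal sketch of the selection-frequency idea described above, assuming one binary mask per genetic algorithm run; the names and the 60% threshold are illustrative, not the study's actual settings.

    import numpy as np

    def selection_frequency(masks):
        """Fraction of runs in which each feature (here, an ROI) was
        selected; masks is a list of equal-length binary arrays."""
        return np.mean(np.stack(masks), axis=0)

    # Keep only the ROIs selected in at least 60% of the runs:
    # stable_rois = np.flatnonzero(selection_frequency(masks) >= 0.6)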
Article
Background: Support vector machines (SVM) are a powerful tool to analyze data with a number of predictors approximately equal to or larger than the number of observations. Originally, however, the application of SVM to biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables, and creating predictor models based on only the most relevant variables is essential in biomedical research. Substantial work has since been done to allow assessment of variable importance in SVM models, but it has focused on SVM implemented with linear kernels, whereas the power of SVM as a prediction model is associated with the flexibility generated by non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and SVM for survival analysis. Results: The proposed algorithms allow visualization of each of the RFE iterations and, hence, identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. When comparing the truly most relevant variables with the variable ranks produced by each algorithm in simulation studies, the three proposed algorithms generally performed better than the gold-standard RFE for non-linear kernels. Generally, RFE-pseudo-samples outperformed the other methods, even when variables were assumed to be correlated in all tested scenarios. Conclusions: The proposed approaches, particularly RFE-pseudo-samples, can be applied accurately to select variables and to assess the direction and strength of associations when analyzing biomedical data with SVM for categorical or time-to-event responses, and they perform better than the classical RFE of Guyon for realistic scenarios about the structure of biomedical data.
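
For reference, the classical linear-kernel SVM-RFE that these extensions build on is available as scikit-learn's RFE; a minimal sketch on synthetic data (dataset and parameters are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)
    # Classical SVM-RFE: the linear kernel exposes coef_, whose
    # magnitudes drive the recursive elimination and final ranking.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=5).fit(X, y)
    print(rfe.ranking_)  # rank 1 marks the selected features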
Article
In recent years, a variety of feature selection methods based on evolutionary computation (EC) techniques have been developed for classification due to their robustness and search ability. However, previous EC-based feature selection research mostly focuses on enhancing the prediction accuracy without taking the costs associated with the learning process into consideration. In other words, the impact of using EC-based feature selection to improve cost-sensitive classification performance has not been well studied. Further, to the best of our knowledge, no EC-based filter method for cost-sensitive feature selection can be found in the literature. We therefore design a cost-sensitive evaluation criterion using the principles of a fuzzy mutual information estimator and then adopt the criterion within a differential evolution framework. According to a variety of experiments conducted on various benchmarks, the proposed filter can effectively minimize both the classification error rate and the feature cost by removing irrelevant and distracting features from the dataset in a reasonable time.
Chapter
Feature selection in supervised classification is a crucial task in many biomedical applications. Most of the existing approaches assume that all features have the same cost. However, in many medical applications, this assumption may be inappropriate, as the acquisition of the value of some features can be costly. For example, in a medical diagnosis, each diagnostic value extracted by a clinical test is associated with its own cost. Costs can also refer to non-financial aspects, for example, the decision between an invasive exploratory surgery and a simple blood test. In such cases, the goal is to select a subset of features associated with the class variable (e.g., the occurrence of disease) within the assumed user-specified budget. We consider a general information theoretic framework that allows controlling the costs of features. The proposed criterion consists of two components: the first one describes the feature relevance and the second one is a penalty for its cost. We introduce a cost factor that controls the trade-off between these two components. We propose a procedure in which the optimal value of the cost factor is chosen in a data-driven way. The experiments on artificial and real medical datasets indicate that, when the budget is limited, the proposed approach is superior to existing traditional feature selection methods. The proposed framework has been implemented in an open source library (Python package: https://github.com/kaketo/bcselector).
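
As a rough illustration of how such a two-component criterion with a cost factor might drive selection under a budget, here is a minimal sketch that uses mutual information as a stand-in for the chapter's relevance term; the function, the fixed lam and all names are assumptions for illustration, not the chapter's exact procedure (which chooses the cost factor in a data-driven way).

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def greedy_budgeted_selection(X, y, costs, budget, lam=0.5):
        """Greedily add the feature maximizing relevance - lam * cost
        until the budget is spent or no candidate scores positively."""
        relevance = mutual_info_classif(X, y)  # stand-in relevance term
        selected, spent = [], 0.0
        remaining = set(range(X.shape[1]))
        while remaining:
            affordable = [f for f in remaining if spent + costs[f] <= budget]
            if not affordable:
                break
            best = max(affordable, key=lambda f: relevance[f] - lam * costs[f])
            if relevance[best] - lam * costs[best] <= 0:
                break  # no remaining feature improves the criterion
            selected.append(best)
            spent += costs[best]
            remaining.remove(best)
        return selected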
Article
Today’s real-world data mostly involves incomplete, inconsistent and/or irrelevant information, which makes it difficult to transform into an understandable format. Data preprocessing is a proven discipline in data mining for dealing with such issues. One of the typical tasks in data preprocessing, feature selection aims to reduce the dimensionality of the data and thereby contributes to further processing. Feature selection is widely used to enhance the performance of a supervised learning algorithm (e.g., classification) but is rarely used in unsupervised tasks (e.g., clustering). This paper introduces a new multi-objective differential evolution approach that finds relatively homogeneous clusters, without prior knowledge of the number of clusters, using a smaller number of the available features in the data. To analyze the goodness of the introduced approach, several experiments are conducted on various real-world and synthetic benchmarks using a variety of clustering approaches. The analyses under several different criteria suggest that our method can significantly improve the clustering performance while reducing the dimensionality at the same time.
Article
In this paper, we propose a multi-objective differential evolution-based filter approach for feature selection that interconnects fuzzy- and kernel-based information theory measures to find feature subsets that optimally respond to the targets. In contrast to the existing filter approaches using the principles of information theory and rough set theory, our approach can be applied to continuous datasets without discretisation. Moreover, to our knowledge, our study is the first in the literature to employ fuzzy and kernel measures to form a filter criterion for feature selection. We demonstrate various favourable results on a variety of benchmark datasets and also show that our approach can better search the dimensionality space to reach the maximum predictive power for the response.
Article
Since different features may require different costs, the cost-sensitive feature selection problem has become more and more important in real-world applications. Generally, it includes two main conflicting objectives, i.e., maximizing the classification performance and minimizing the feature cost. However, most existing approaches treat this task as a single-objective optimization problem. To satisfy various requirements of decision-makers, this paper studies a multi-objective feature selection approach, called the two-archive multi-objective artificial bee colony algorithm (TMABC-FS). Two new operators, i.e., convergence-guiding search for employed bees and diversity-guiding search for onlooker bees, are proposed for obtaining a group of non-dominated feature subsets with good distribution and convergence. Two archives, i.e., the leader archive and the external archive, are employed to enhance the search capability of the different kinds of bees. The proposed TMABC-FS is validated on several datasets from UCI and compared with two traditional algorithms and three multi-objective methods. Results show that TMABC-FS is an efficient and robust optimization method for solving cost-sensitive feature selection problems.
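
Both TMABC-FS and the method proposed in this article return non-dominated feature subsets under the two objectives above. A minimal sketch of the underlying Pareto-dominance test on (error rate, cost) pairs, with both objectives minimized:

    def dominates(a, b):
        """True if solution a Pareto-dominates b: no worse in every
        objective and strictly better in at least one."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def non_dominated(solutions):
        """Filter a list of (error_rate, cost) pairs to its Pareto front."""
        return [s for s in solutions
                if not any(dominates(t, s) for t in solutions if t is not s)]

    # non_dominated([(0.10, 0.8), (0.12, 0.3), (0.15, 0.9)])
    # -> [(0.10, 0.8), (0.12, 0.3)]; (0.15, 0.9) is dominated by both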