
A multi-objective Artificial Bee Colony algorithm for cost-sensitive subset selection


ORIGINAL ARTICLE

Emrah Hancer, Department of Software Engineering, Mehmet Akif Ersoy University, Burdur 15039, Turkey
(ehancer@mehmetakif.edu.tr; emrahhanc@gmail.com)

Received: 20 May 2021 / Accepted: 6 May 2022 / Published online: 30 May 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
Neural Computing and Applications (2022) 34:17523–17537. https://doi.org/10.1007/s00521-022-07407-x
Abstract

Feature selection typically aims to select a feature subset that maximally contributes to the performance of a further process (such as clustering, classification or regression). Most current feature selection methods treat all features in the dataset as equally important while evaluating candidate feature subsets in the solution space. However, this assumption may not be appropriate, since each feature in a dataset comes with its own impact and importance. In particular, each feature may incur a different cost to achieve some specific purpose. To address this issue, we introduce an improved multi-objective artificial bee colony-based cost-sensitive subset selection method that simultaneously minimizes two conflicting objectives: the classification error rate and the feature cost. According to the results on well-known benchmarks, the proposed cost-sensitive subset selection approach outperforms recently introduced multi-objective variants of the artificial bee colony, particle swarm optimization and genetic algorithms. To the best of our knowledge, this work is one of the earliest studies on multi-objective cost-sensitive subset selection in the literature.
Keywords: Cost-sensitive learning · Feature selection · Artificial bee colony · Multi-objective optimization
1 Introduction
Feature selection is one of the key issues in data science and machine learning applications, such as clustering [1, 2], classification [3, 4] and regression [5, 6]. The overall goal is to select a subset of the input features in the dataset according to predefined performance metrics. Reducing the dimensionality of the dataset prevents the adverse effects of noisy information, improves generalization ability and reduces the computational complexity of the subsequent learning process. Consequently, feature selection has undergone great development during the last decade [7].
In terms of evaluation, feature selection methods fall into three fundamental categories: filters, wrappers and embedded methods. Filters evaluate features by examining the intrinsic information in the dataset. Max-Relevance Min-Redundancy (mRMR) [8], ReliefF [9], information gain [10] and Laplacian Score [11] are the most representative filter algorithms. Wrappers use a learning algorithm as a black box to evaluate candidate feature subsets. Representative examples of wrappers are sequential floating forward-backward selection [12] and support vector machine recursive feature elimination (SVM-RFE) [13]. Embedded methods incorporate feature selection into the learning process itself, i.e., they combine the selection and learning stages into a single process; regularized regression-based methods [14] are an example of this category. Although all such feature selection methods have achieved promising results and have therefore been widely used in data mining applications, most of them still suffer from local convergence problems. To alleviate these problems, researchers have frequently applied evolutionary computation (EC) techniques to feature selection tasks, since their global search characteristics allow them to investigate a solution space thoroughly. The most representative EC techniques applied to feature selection are genetic algorithms (GA) [15], genetic programming (GP) [16], ant colony optimization (ACO) [17] and particle swarm optimization (PSO).
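
To make the wrapper idea and this article's two objectives concrete, below is a minimal sketch of a bi-objective, wrapper-style evaluation of a candidate feature subset. It assumes a binary feature mask, a per-feature cost vector and scikit-learn's KNeighborsClassifier as the black-box learner; all names and parameters are illustrative, not the paper's implementation.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def evaluate_subset(mask, X, y, costs, cv=5):
        """Return the two objectives to be minimized for a binary feature
        mask: (classification error rate, normalized feature cost)."""
        selected = np.flatnonzero(mask)
        if selected.size == 0:
            return 1.0, 0.0  # empty subset: worst error, zero cost
        accuracy = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                                   X[:, selected], y, cv=cv).mean()
        error_rate = 1.0 - accuracy                       # objective 1
        norm_cost = costs[selected].sum() / costs.sum()   # objective 2
        return error_rate, norm_cost

A multi-objective optimizer such as ABC then searches the space of masks for subsets that trade off these two values, rather than collapsing them into a single score.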
Article
The massive growth of data in recent years has led to challenges in data mining and machine learning tasks. One of the major challenges is the selection of relevant features from the original set of available features that maximally improves the learning performance over that of the original feature set. This issue has attracted researchers’ attention, resulting in a variety of successful feature selection approaches in the literature. Although several surveys on unsupervised learning (e.g., clustering) exist, many works concerning unsupervised feature selection (e.g., evolutionary computation-based feature selection for clustering) are missing from these surveys, so the strengths and weaknesses of those approaches remain unidentified. In this paper, we introduce a comprehensive survey on feature selection approaches for clustering, reflecting the advantages and disadvantages of current approaches from different perspectives and identifying promising trends for future research.
Article
The diagnosis and prognosis of patients with severe chronic disorders of consciousness are still challenging issues and a high rate of misdiagnosis is evident. Hence, new tools are needed for an accurate diagnosis, which will also have an impact on the prognosis. In recent years, functional Magnetic Resonance Imaging (fMRI) has been gaining more and more importance in diagnosing this patient group. Resting state scans in particular, i.e., examinations during which the patient does not perform any task, seem promising for these patient groups. After preprocessing the resting state fMRI data with a standard pipeline, we extracted the correlation matrices of 132 regions of interest. The aim was to find the regions of interest that contributed most to the distinction between the different patient groups and healthy controls. We performed feature selection using a genetic algorithm and a support vector machine. Moreover, we show that by using for classification only those regions of interest that are most often selected by our algorithm, we obtain much better classifier performance.
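
A minimal sketch of the selection-frequency idea described above, assuming one binary mask per genetic algorithm run; the names and the 60% threshold are illustrative, not the study's actual settings.

    import numpy as np

    def selection_frequency(masks):
        """Fraction of runs in which each feature (here, an ROI) was
        selected; masks is a list of equal-length binary arrays."""
        return np.mean(np.stack(masks), axis=0)

    # Keep only the ROIs selected in at least 60% of the runs:
    # stable_rois = np.flatnonzero(selection_frequency(masks) >= 0.6)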
Article
Background: Support vector machines (SVM) are a powerful tool to analyze data with a number of predictors approximately equal to or larger than the number of observations. Originally, however, the application of SVM to biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables, and creating predictor models based on only the most relevant variables is essential in biomedical research. Substantial work has since been done to allow assessment of variable importance in SVM models, but it has focused on SVM implemented with linear kernels, whereas the power of SVM as a prediction model is associated with the flexibility generated by non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and SVM for survival analysis. Results: The proposed algorithms allow visualization of each of the RFE iterations and, hence, identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. When comparing the truly most relevant variables with the variable ranks produced by each algorithm in simulation studies, the three proposed algorithms generally performed better than the gold-standard RFE for non-linear kernels. Generally, RFE-pseudo-samples outperformed the other methods, even when variables were assumed to be correlated in all tested scenarios. Conclusions: The proposed approaches, particularly RFE-pseudo-samples, can be applied accurately to select variables and to assess the direction and strength of associations when analyzing biomedical data with SVM for categorical or time-to-event responses, and they perform better than the classical RFE of Guyon for realistic scenarios about the structure of biomedical data.
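
For reference, the classical linear-kernel SVM-RFE that these extensions build on is available as scikit-learn's RFE; a minimal sketch on synthetic data (dataset and parameters are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)
    # Classical SVM-RFE: the linear kernel exposes coef_, whose
    # magnitudes drive the recursive elimination and final ranking.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=5).fit(X, y)
    print(rfe.ranking_)  # rank 1 marks the selected features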
Article
In recent years, a variety of feature selection methods based on evolutionary computation (EC) techniques have been developed for classification due to their robustness and search ability. However, previous EC-based feature selection research mostly focuses on enhancing the prediction accuracy without taking the costs associated with the learning process into consideration. In other words, the impact of using EC-based feature selection to improve cost-sensitive classification performance has not been well studied. Further, to the best of our knowledge, no EC-based filter method for cost-sensitive feature selection can be found in the literature. We therefore design a cost-sensitive evaluation criterion using the principles of a fuzzy mutual information estimator and then adopt the criterion within a differential evolution framework. According to a variety of experiments conducted on various benchmarks, the proposed filter can effectively minimize both the classification error rate and the feature cost by removing irrelevant and distracting features from the dataset in a reasonable time.
Chapter
Feature selection in supervised classification is a crucial task in many biomedical applications. Most of the existing approaches assume that all features have the same cost. However, in many medical applications, this assumption may be inappropriate, as the acquisition of the value of some features can be costly. For example, in a medical diagnosis, each diagnostic value extracted by a clinical test is associated with its own cost. Costs can also refer to non-financial aspects, for example, the decision between an invasive exploratory surgery and a simple blood test. In such cases, the goal is to select a subset of features associated with the class variable (e.g., the occurrence of disease) within the assumed user-specified budget. We consider a general information theoretic framework that allows controlling the costs of features. The proposed criterion consists of two components: the first one describes the feature relevance and the second one is a penalty for its cost. We introduce a cost factor that controls the trade-off between these two components. We propose a procedure in which the optimal value of the cost factor is chosen in a data-driven way. The experiments on artificial and real medical datasets indicate that, when the budget is limited, the proposed approach is superior to existing traditional feature selection methods. The proposed framework has been implemented in an open source library (Python package: https://github.com/kaketo/bcselector).
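
As a rough illustration of how such a two-component criterion with a cost factor might drive selection under a budget, here is a minimal sketch that uses mutual information as a stand-in for the chapter's relevance term; the function, the fixed lam and all names are assumptions for illustration, not the chapter's exact procedure (which chooses the cost factor in a data-driven way).

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def greedy_budgeted_selection(X, y, costs, budget, lam=0.5):
        """Greedily add the feature maximizing relevance - lam * cost
        until the budget is spent or no candidate scores positively."""
        relevance = mutual_info_classif(X, y)  # stand-in relevance term
        selected, spent = [], 0.0
        remaining = set(range(X.shape[1]))
        while remaining:
            affordable = [f for f in remaining if spent + costs[f] <= budget]
            if not affordable:
                break
            best = max(affordable, key=lambda f: relevance[f] - lam * costs[f])
            if relevance[best] - lam * costs[best] <= 0:
                break  # no remaining feature improves the criterion
            selected.append(best)
            spent += costs[best]
            remaining.remove(best)
        return selected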
Article
Today’s real-world data mostly involves incomplete, inconsistent and/or irrelevant information, which makes it difficult to transform into an understandable format. Data preprocessing is a proven discipline in data mining for dealing with such issues. One of the typical tasks in data preprocessing, feature selection aims to reduce the dimensionality of the data and thereby contributes to further processing. Feature selection is widely used to enhance the performance of a supervised learning algorithm (e.g., classification) but is rarely used in unsupervised tasks (e.g., clustering). This paper introduces a new multi-objective differential evolution approach that finds relatively homogeneous clusters, without prior knowledge of the number of clusters, using a smaller number of the available features in the data. To analyze the goodness of the introduced approach, several experiments are conducted on various real-world and synthetic benchmarks using a variety of clustering approaches. The analyses under several different criteria suggest that our method can significantly improve the clustering performance while reducing the dimensionality at the same time.
Article
In this paper, we propose a multi-objective differential evolution-based filter approach for feature selection that interconnects fuzzy- and kernel-based information theory measures to find feature subsets that optimally respond to the targets. In contrast to the existing filter approaches using the principles of information theory and rough set theory, our approach can be applied to continuous datasets without discretisation. Moreover, to our knowledge, our study is the first in the literature to employ fuzzy and kernel measures to form a filter criterion for feature selection. We demonstrate various favourable results on a variety of benchmark datasets and also show that our approach can better search the dimensionality space to reach the maximum predictive power for the response.
Article
Since different features may require different costs, the cost-sensitive feature selection problem has become more and more important in real-world applications. Generally, it includes two main conflicting objectives, i.e., maximizing the classification performance and minimizing the feature cost. However, most existing approaches treat this task as a single-objective optimization problem. To satisfy various requirements of decision-makers, this paper studies a multi-objective feature selection approach, called the two-archive multi-objective artificial bee colony algorithm (TMABC-FS). Two new operators, i.e., convergence-guiding search for employed bees and diversity-guiding search for onlooker bees, are proposed for obtaining a group of non-dominated feature subsets with good distribution and convergence. Two archives, i.e., the leader archive and the external archive, are employed to enhance the search capability of the different kinds of bees. The proposed TMABC-FS is validated on several datasets from UCI and compared with two traditional algorithms and three multi-objective methods. Results show that TMABC-FS is an efficient and robust optimization method for solving cost-sensitive feature selection problems.
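
Both TMABC-FS and the method proposed in this article return non-dominated feature subsets under the two objectives above. A minimal sketch of the underlying Pareto-dominance test on (error rate, cost) pairs, with both objectives minimized:

    def dominates(a, b):
        """True if solution a Pareto-dominates b: no worse in every
        objective and strictly better in at least one."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def non_dominated(solutions):
        """Filter a list of (error_rate, cost) pairs to its Pareto front."""
        return [s for s in solutions
                if not any(dominates(t, s) for t in solutions if t is not s)]

    # non_dominated([(0.10, 0.8), (0.12, 0.3), (0.15, 0.9)])
    # -> [(0.10, 0.8), (0.12, 0.3)]; (0.15, 0.9) is dominated by both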