Article

A Genetic Algorithm-Based Feature Selection

Authors:
  • Department of Primary Industries and Regional Development, Perth, Western Australia

Abstract

This article details the exploration and application of a Genetic Algorithm (GA) for feature selection. In particular, a binary GA was used for dimensionality reduction to enhance the performance of the classifiers concerned. In this work, one hundred (100) features were extracted from a set of images in the Flavia dataset (a publicly available dataset). The extracted features are Zernike Moments (ZM), Fourier Descriptors (FD), Legendre Moments (LM), Hu 7 Moments (Hu7M), Texture Properties (TP) and Geometrical Properties (GP). The main contributions of this article are (1) detailed documentation of the GA Toolbox in MATLAB and (2) the development of a GA-based feature selector using a novel fitness function (kNN-based classification error), which enabled the GA to obtain a combinatorial set of features giving rise to optimal accuracy. The results were compared with those of various feature selectors in the WEKA software, and the GA-based selector achieved better classification accuracy than the WEKA feature selectors in many cases.
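The abstract names a kNN-based classification error as the GA's fitness function. The article's implementation is in MATLAB and its exact settings are not stated here, so the following is only a minimal Python sketch under stated assumptions: a chromosome is a 0/1 mask over feature columns, the error is estimated by leave-one-out, distances are Euclidean, and `k` defaults to 1. The function name `knn_loo_error` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_loo_error(X: Sequence[Sequence[float]], y: Sequence[int],
                  mask: Sequence[int], k: int = 1) -> float:
    """Leave-one-out k-NN error rate using only the columns where mask[j] == 1.

    Assumptions (not from the article): Euclidean distance, majority vote,
    leave-one-out validation, and an empty subset scored as the worst fitness.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty feature subset cannot classify anything
    n = len(X)
    errors = 0
    for i in range(n):
        # Distances from sample i to every other sample on the selected columns
        neighbors: List[Tuple[float, int]] = []
        for m in range(n):
            if m != i:
                d = math.sqrt(sum((X[i][j] - X[m][j]) ** 2 for j in cols))
                neighbors.append((d, y[m]))
        neighbors.sort(key=lambda pair: pair[0])
        # Majority vote among the k nearest neighbors
        votes = Counter(label for _, label in neighbors[:k])
        if votes.most_common(1)[0][0] != y[i]:
            errors += 1
    return errors / n
```

A GA would minimize this value over masks. On a toy set where only column 0 separates the classes, the mask `[1, 0]` scores 0.0 while `[0, 1]` scores 1.0, so the search is pushed towards the informative feature.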
... This iterative process persists until a stop criterion is met, signaling the completion of execution and the provision of an optimal solution. Genetic algorithms (GA), particle swarm optimization (PSO), and migrating birds optimization (MBO) are prominent examples of population-based techniques widely adopted in feature selection [19]-[21]. In the subsequent section, we closely examine the MBO method because of its relevance and effectiveness in this context. ...
... The number of tours is set at 3 (m=3), and 2 denotes the number of entries to be changed. MBOSU-FS is systematically compared with concurrent feature selection methods: particle swarm optimization for feature selection (PSOFS) [32] and genetic algorithm-based feature selection (GAFS) [19]. To ensure a fair comparison, all experiments use the same number of fitness evaluations [26]. ...
Article
The feature selection task is a crucial phase in data analysis, aiming to identify a minimized set of relevant features for the target class, thereby eliminating irrelevant and redundant attributes used for model training. While population-based feature selection approaches offer prominent solutions for classification performance, their computational time can be prohibitive. To mitigate delays and optimize resource utilization, this study adopts machine learning operations (MLOps). MLOps involves the seamless transition of experimental Machine Learning models into production, serving them to end users and automating the feature selection phase. This paper introduces a novel feature selection method based on improved migrating bird optimization and its automated variant integrated into MLOps. Experiments conducted on six medical datasets validate the effectiveness of our proposed feature selection method in improving the outcomes of medical diagnosis systems. The results showcase satisfactory performance in terms of classification compared to concurrent feature selection algorithms.
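The citing snippet above fixes two MBO parameters: three tours (m = 3) and two entries changed when a neighbor is generated. As a hedged illustration of that second parameter only, here is a minimal Python sketch of neighbor generation and one bird's improvement step over binary feature masks; the flock formation, neighbor sharing between birds, and tour/leader rotation of the full method are omitted, and `flip_neighbor` and `neighborhood_step` are hypothetical names.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]


def flip_neighbor(mask: Mask, n_changes: int, rng: random.Random) -> Mask:
    """Copy `mask` and flip `n_changes` distinct, randomly chosen entries (0 <-> 1)."""
    neighbor = mask[:]
    for j in rng.sample(range(len(mask)), n_changes):
        neighbor[j] = 1 - neighbor[j]
    return neighbor


def neighborhood_step(mask: Mask, fitness: Callable[[Mask], float], n_neighbors: int,
                      n_changes: int, rng: random.Random) -> Tuple[Mask, List[Mask]]:
    """Simplified single-bird move (minimization): generate n_neighbors neighbors,
    move to the best one if it lowers the fitness, and return the unused
    neighbors sorted best-first (MBO would share some of them with the next bird)."""
    neighbors = sorted((flip_neighbor(mask, n_changes, rng) for _ in range(n_neighbors)),
                       key=fitness)
    if fitness(neighbors[0]) < fitness(mask):
        return neighbors[0], neighbors[1:]
    return mask, neighbors
```

With `n_changes=2`, every neighbor differs from its parent in exactly two positions, which keeps the search local in the space of feature subsets.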
... The genetic algorithm solves constrained and unconstrained optimization problems. Genetic algorithms are population-based heuristic methods inspired by natural evolution [66]. Genetic algorithms use three important steps: selection, reproduction, and termination. ...
Article
Diabetes complications are classified as macrovascular and microvascular diseases. Microvascular complications in type 2 diabetic patients commonly occur as diabetic retinopathy, diabetic neuropathy, and diabetic nephropathy. Therefore, detecting these microvascular complications from clinical datasets is very important. The paper proposes a machine learning model for predicting and detecting microvascular diseases in type 2 diabetic patients. In the initial stage, the data are preprocessed. After preprocessing, feature selection is performed using the Improved Enhanced Coati Optimization algorithm. The optimal features selected by this algorithm are applied to various classification algorithms, and results are obtained for traditional classifiers such as XGB, KNN, SVM, RF, AdaBoost, Tree, and ANN. For the classification of diabetic retinopathy, the selected features are age, sex, BMI, BP, FPS, family history, and medical adherence. Similarly, sex, SP, FPS, family history, and onset age are used to classify diabetic nephropathy, while HbA1C and FPS are used to classify diabetic neuropathy. Accuracy was measured on both training and testing data. The Random Forest classifier, with AdaBoost as the feature-selecting estimator, gave the best results for type 2 diabetic patients: 99.9% and 94.78% for diabetic retinopathy, and 99.8% and 95.44% for diabetic nephropathy and diabetic neuropathy. In the proposed methodology, the fitness function for feature selection is chosen based on the optimal accuracy obtained from the feature-selecting estimator, AdaBoost. In the Coati optimizer, feature selection is driven by a fitness function that minimizes the error.
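The citing snippet for this entry lists three GA steps: selection, reproduction, and termination. A minimal binary GA in Python makes them concrete. This is a generic sketch, not the method of any work listed here: tournament selection, single-point crossover, bit-flip mutation, elitism, and a fixed generation budget as the termination criterion are all assumed choices, and `binary_ga` is a hypothetical name. Lower fitness is better, so a classification error such as a kNN error can be plugged in directly.

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]


def tournament(pop: List[Bits], fits: List[float], rng: random.Random, size: int = 2) -> Bits:
    """Selection: sample `size` individuals and return the fittest (lowest value)."""
    contenders = rng.sample(range(len(pop)), size)
    return pop[min(contenders, key=lambda i: fits[i])]


def binary_ga(fitness: Callable[[Bits], float], n_bits: int, pop_size: int = 20,
              generations: int = 50, p_cross: float = 0.8, p_mut: float = 0.02,
              seed: int = 0) -> Tuple[Bits, float]:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    best_i = min(range(pop_size), key=lambda i: fits[i])
    best, best_fit = pop[best_i][:], fits[best_i]
    for _ in range(generations):  # termination: fixed generation budget
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            a, b = tournament(pop, fits, rng), tournament(pop, fits, rng)
            # Reproduction, part 1: single-point crossover
            if rng.random() < p_cross and n_bits > 1:
                cut = rng.randint(1, n_bits - 1)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Reproduction, part 2: bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        fits = [fitness(ind) for ind in pop]
        i = min(range(pop_size), key=lambda j: fits[j])
        if fits[i] < best_fit:
            best, best_fit = pop[i][:], fits[i]
    return best, best_fit
```

Elitism guarantees that the best fitness never gets worse from one generation to the next; other stop criteria (a stall counter, a target error) would replace the fixed loop.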
... The work [54] utilized a binary GA to select features extracted from images in the Flavia dataset. Various GA approaches have been proposed for FS across different problem sizes [55,56], including hybrid GA methods integrating wrapper and embedded FS techniques for classification tasks. ...
Article
High-dimensional datasets often harbor redundant, irrelevant, and noisy features that degrade the performance of classification algorithms. Feature selection (FS) mitigates this issue by identifying and retaining only the most pertinent features, thus reducing dataset dimensions. In this study, we propose an FS approach based on black hole algorithms (BHOs) augmented with a mutation technique, termed MBHO. BHO typically comprises two primary phases. During the exploitation phase, a set of stars is iteratively modified based on existing solutions, with the best star selected as the "black hole". In the exploration phase, stars nearing the event horizon are replaced, preventing the algorithm from being trapped in local optima. To address the challenges that this randomness can introduce, we introduce inversion mutation. Moreover, we enhance a widely used objective function for wrapper feature selection by integrating two new terms based on the correlation among selected features and between features and classification labels. Additionally, we employ a transfer function, the V2 transfer function, to convert continuous values into discrete ones, thereby enhancing the search process. Our approach undergoes rigorous evaluation experiments on fourteen benchmark datasets and compares favorably against Binary Cuckoo Search (BCS), Mutual Information Maximization (MIM), Joint Mutual Information (JMI), and minimum Redundancy Maximum Relevance (mRMR). The results demonstrate the efficacy of our proposed model in selecting superior features that enhance classifier performance metrics. Thus, MBHO is presented as a viable alternative to the existing state-of-the-art approaches. We make our implementation source code available for community use and further development.
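Two of the MBHO ingredients named above are standard operators that can be shown in isolation. The V2 transfer function is commonly defined as V2(x) = |tanh(x)|, and inversion mutation reverses the segment between two random cut points. How MBHO wires them into the star updates is not described here, so this Python sketch assumes the usual V-shaped rule (flip the current bit with probability V2(x), otherwise keep it); the function names are hypothetical.

```python
import math
import random
from typing import List


def v2_transfer(x: float) -> float:
    """V-shaped transfer function V2(x) = |tanh(x)|, mapping a real value into [0, 1]."""
    return abs(math.tanh(x))


def binarize_v_shaped(position: List[float], current_bits: List[int],
                      rng: random.Random) -> List[int]:
    """Assumed V-shaped update: flip each bit with probability V2(x), else keep it."""
    return [1 - b if rng.random() < v2_transfer(x) else b
            for x, b in zip(position, current_bits)]


def inversion_mutation(bits: List[int], rng: random.Random) -> List[int]:
    """Reverse the segment between two distinct random cut points (inclusive)."""
    i, j = sorted(rng.sample(range(len(bits)), 2))
    return bits[:i] + bits[i:j + 1][::-1] + bits[j + 1:]
```

A component near zero leaves its bit almost certainly unchanged, while a large magnitude in either direction almost certainly flips it. Inversion mutation reorders bits without changing how many features are selected, which perturbs the subset while keeping its size.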
... The work [56] utilized a binary GA to select features extracted from images in the Flavia dataset. Various GA approaches have been proposed for FS across different problem sizes [57,58], including hybrid GA methods integrating wrapper and embedded FS techniques for classification tasks. ...
Preprint
High-dimensional datasets are highly likely to have redundant, irrelevant, and noisy features that negatively affect the performance of classification algorithms. Selecting the most relevant features and reducing the dimensions of datasets by removing the undesired features is a dimensionality reduction technique called Feature Selection (FS). In this paper, we propose an FS approach based on the Black Hole Algorithm (BHO) with a mutation technique, called MBHO. Generally, BHO contains two major phases. In the exploitation phase, a set of stars is modified according to some rule and, according to some objective function, the best star is selected as the black hole, which attracts the other stars. Furthermore, when a star gets close to the event horizon, it is swallowed and a new one is randomly generated in the search space; this constitutes the exploration phase. However, randomness may cause the algorithm to fall into the trap of local optima, and to overcome such complications, inversion mutation is used. Furthermore, we modify an objective function widely used in previous wrapper feature selection works by combining two new terms based on the correlation among the selected subset of features and between the features and the classification label. We also use a transfer function, known as the V2 transfer function, to convert continuous values into discrete ones to enhance the search. We assess our approach via extensive evaluation experiments using fourteen benchmark datasets. We benchmark it against a wrapper FS approach called Binary Cuckoo Search (BCS) and three filter-based FS approaches (namely Mutual Information Maximisation (MIM), Joint Mutual Information (JMI), and minimum Redundancy Maximum Relevance (mRMR)). Our evaluation has shown that the proposed model is an effective approach for FS, selecting better features that enhance the classifiers' performance metrics.
Thus, MBHO can be used as an alternative to the existing state-of-the-art approaches. We release the source code of our implementation for the community to build on with new methods and datasets.
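The "widely utilized objective function" for wrapper feature selection that MBHO extends is commonly written as a weighted sum of classification error and the fraction of selected features, with the weight on the error close to 1 (0.99 is a frequent choice). The following Python sketch shows only that common baseline form as an assumption; MBHO's two additional correlation terms are not specified here and are not reproduced. The function name is hypothetical.

```python
from typing import Sequence


def wrapper_objective(error_rate: float, mask: Sequence[int], alpha: float = 0.99) -> float:
    """Common baseline wrapper FS objective (lower is better):
    alpha * error_rate + (1 - alpha) * |selected| / |all features|."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_selected = sum(1 for bit in mask if bit)
    return alpha * error_rate + (1.0 - alpha) * n_selected / len(mask)
```

Because `alpha` dominates, the error term decides most comparisons; the size term mainly breaks ties in favor of smaller subsets with the same error.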
Article
Remote monitoring of the status of beehives is essential for efficient beekeeping, leading to less workload for the beekeeper and, because the hives need not be opened as frequently, less stress for the colonies. Sound analysis, using machine learning models of various paradigms, is a common feature of so-called smart hives. Most of these models are aimed at swarming prediction. Swarming of a colony, a fundamental phenomenon in the reproductive process of bees, can cause substantial losses in the production of the apiary, so its prediction is of utmost importance. However, especially in nomadic beekeeping, where the apiary is moved to the countryside without access to electricity or a good internet connection, the prediction models should run on-site with as low energy consumption as possible, using the internet connection only to send alerts to the beekeeper. Such settings require lightweight models, which can be achieved by using simpler prediction models and/or only the most important audio features. In this paper, the importance of audio features for swarming prediction is investigated using a genetic algorithm. Various machine learning models are trained using the selected features and used to predict swarming on real-world data collected in one Hungarian apiary. This experimental evaluation is the main contribution of this paper. While genetic algorithms are commonly used for feature selection, to the best of the authors' knowledge they have not yet been used in the beekeeping domain.
Article
Bridge deterioration is attributed to inadequate maintenance budgets, ineffective restoration strategies, and rapidly changing climatic conditions. Considering that the latter can significantly exacerbate the deterioration initiated by the former, there is a pressing need to develop climate change-informed management and resilience quantification/enhancement strategies for bridges. Although climate variability and bridge deterioration are logically related, an explicit mathematical representation of their connection is still missing. The current study proposes a predictive climate-induced bridge deterioration framework based on data-driven approaches. Through the application of this framework, a closed-form expression linking bridge condition to its intrinsic characteristics, traffic volumes, and climate indices is generated. To demonstrate the framework's utility, it was applied to evaluate the conditions of concrete bridges in Ontario, Canada. The generated model efficiently reproduced the actual bridge conditions between 2000 and 2020 using a 70/30 training-to-testing split. Further interpretation of the model demonstrated that the variability in bridge deterioration prediction is controlled by intrinsic characteristics, followed by climate characteristics and loading conditions, respectively. The model was subsequently applied to the period from 2022 to 2050, revealing accelerated deterioration in the near future under different climate change projections. Owing to its generic nature, the presented framework can be applied to other infrastructure systems, when relevant data are available, to devise effective climate-informed management, resilience quantification and enhancement, and rehabilitation strategies.
Chapter
Supporting human activities using automated systems is an inherent aspect of making work more efficient. Familiar support systems are found in process industries, driver assistance, and human–robot collaboration. Recently, exoskeletons have also been established to support human handling in manufacturing or to assist disabled persons with mobility. An important enabler for good human support is understanding the human intentions that need to be supported. Intentions need to be recognized by the support system early enough to generate the support function in a timely manner. Moreover, support functions need to be specific to the user's intention, i.e., they need to provide the assistance the user really needs, when it is needed, in the given contextual conditions. For this, support systems need to be able to identify users' behavior and then interpret it and assist the user with the right content. This work presents an approach to identify user intentions and then generate appropriate assistance. The approach makes practical use of task recognition based on pupillary dynamics and artificial neural networks. To realize context-related user support, a concept for a context-related user support system using binocular see-through displays is presented. To identify tasks during their performance, characteristic features have been gathered from the literature. Based on these basic parameters, further features have been developed for context identification. An artificial neural network is trained with the compiled and developed features to determine their suitability for classifying activities. To generate task-specific eye-tracking data, activities are defined that correspond to typical industrial activities. During the performance of the defined tasks, pupillary dynamics are recorded for 9 subjects using an eye-tracking system. From these data sets, the features are extracted and an artificial neural network is trained to generate a classifier that can classify the corresponding activity in situ.
In order to optimize the classification performance, a genetic algorithm (GA) is used to identify the features and feature sets with the highest relevance for the task identification using the artificial neural network.
Preprint
Human activity recognition (HAR) methods are becoming increasingly crucial for observing daily human actions in areas such as aged care, investigations, intelligent homes, healthcare, and sports. Smart devices contain various sensors, such as gyroscopes, motion sensors, and accelerometers, which are widely used inertial sensors that can detect various human physical conditions. Many studies on human action recognition have been conducted recently. Smartphone sensor data generate high-dimensional feature vectors that may be used to detect human actions. However, not all of the vectors are vital in the detection phase, and including all of them leads to the 'curse of dimensionality'. This study proposes a hybrid feature selection technique that combines wrapper and filter approaches. The technique employs sequential floating forward search (SFFS) with a Genetic Algorithm (GA) to extract the characteristics needed for enhanced activity detection. The selected features are then fed into a fuzzy-based recurrent neural network (FRNN) classifier to generate nonlinear classifiers using deep learning features for training and testing. A benchmark dataset is used to investigate the proposed model. The proposed system uses limited hardware resources effectively and accurately identifies activities.
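The hybrid above builds on sequential floating forward search (SFFS). How the study couples SFFS with the GA is not described here, so this Python sketch covers only a generic SFFS: after each forward inclusion, features are conditionally removed while the smaller subset beats the best subset previously recorded at that size. The criterion (to be maximized) is a stand-in for, say, a classifier's accuracy; `sffs` is a hypothetical name.

```python
from typing import Callable, Dict, FrozenSet, Set


def sffs(n_features: int, criterion: Callable[[FrozenSet[int]], float],
         target_size: int) -> FrozenSet[int]:
    """Sequential floating forward search that maximizes `criterion` over feature subsets."""
    if not 0 < target_size <= n_features:
        raise ValueError("target_size must be in 1..n_features")
    selected: Set[int] = set()
    best_by_size: Dict[int, float] = {}  # best criterion value recorded for each subset size
    while len(selected) < target_size:
        # Inclusion: add the single feature whose addition scores highest
        candidates = [f for f in range(n_features) if f not in selected]
        added = max(candidates, key=lambda f: criterion(frozenset(selected | {f})))
        selected.add(added)
        size = len(selected)
        best_by_size[size] = max(best_by_size.get(size, float("-inf")),
                                 criterion(frozenset(selected)))
        # Conditional exclusion: drop the least useful feature while the reduced
        # subset beats the best subset previously recorded at the smaller size
        first = True
        while len(selected) > 2:
            worst = max(selected, key=lambda f: criterion(frozenset(selected - {f})))
            if first and worst == added:
                break  # do not immediately undo the inclusion just made
            first = False
            reduced = frozenset(selected - {worst})
            score = criterion(reduced)
            if score <= best_by_size.get(len(reduced), float("-inf")):
                break
            selected = set(reduced)
            best_by_size[len(reduced)] = score
    return frozenset(selected)
```

The floating exclusion step is what distinguishes SFFS from plain forward selection: an early choice that later turns out to be poor can be removed. Each exclusion must strictly improve a recorded best, so the search always terminates.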
Article
Considering the difficulty of parameter determination in the original support vector machine (SVM), the genetic algorithm (GA) is used to select the parameters of the SVM automatically and the orthogonal method is utilized to determine the best GA parameters. In view of the characteristics of apple leaf disease images, the SVM based on optimized GA is applied to apple leaf disease recognition. Firstly, the color and texture characteristics of apple disease leaf images are extracted as feature vectors. Then, Kernel principal component analysis (KPCA)-based feature selection is performed to identify the best features. Finally, the proposed optimized GA-SVM is used to classify apple leaf disease images. The orthogonal experimental results demonstrate that the proposed KPCA/GA-SVM model recognition ratios are 98.14%, 94.05%, and 97.96% for apple mosaic virus, apple rust and apple alternaria leaf spot, respectively. Compared with the recognition methods based on SVM, GA or PCA, the proposed approach shows higher performance.
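The abstract uses an "orthogonal method" to choose the GA's own parameters. Assuming this is a Taguchi-style design (an assumption, not confirmed by the abstract), three GA parameters at three levels each fit an L9(3^4) orthogonal array: nine runs instead of 27 for a full factorial. This Python sketch builds the standard L9 array and picks, for each factor, the level with the highest mean response. The response function stands in for the GA-SVM recognition rate, and the function names are hypothetical.

```python
from itertools import product
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def l9_array() -> List[Tuple[int, int, int, int]]:
    """Standard Taguchi L9(3^4) array, levels coded 0, 1, 2 (columns a, b, a+b, 2a+b mod 3)."""
    return [(a, b, (a + b) % 3, (2 * a + b) % 3) for a, b in product(range(3), repeat=2)]


def best_levels(levels: Sequence[Sequence[float]],
                response: Callable[..., float]) -> List[float]:
    """Run the nine L9 experiments and, per factor, keep the level with the highest mean response."""
    if not 1 <= len(levels) <= 4 or any(len(lv) != 3 for lv in levels):
        raise ValueError("expected 1-4 factors with exactly 3 levels each")
    runs = l9_array()
    n_factors = len(levels)
    # One experiment per row: factor f is set to the level coded in column f
    results = [response(*(levels[f][row[f]] for f in range(n_factors))) for row in runs]
    chosen = []
    for f in range(n_factors):
        # Each level appears in exactly three runs, balanced against the other factors
        level_means = [mean(r for row, r in zip(runs, results) if row[f] == lvl)
                       for lvl in range(3)]
        chosen.append(levels[f][level_means.index(max(level_means))])
    return chosen
```

For example, `best_levels([[10, 20, 40], [0.6, 0.8, 0.9], [0.1, 0.05, 0.01]], run_ga_svm)` would suggest a population size, crossover probability, and mutation probability, where `run_ga_svm` is a hypothetical function returning the recognition rate for those settings. Level means identify the best levels reliably when factor effects are roughly additive.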
Article
The purpose of this paper is to inform researchers working on algorithms for large-scale systems about the efficiency of evolutionary algorithms as optimization techniques that robustly explore large search spaces and find near-global optima. These evolutionary algorithms (EAs) can be an alternative to numerical methods in difficult optimization problems, such as complex systems whose phenomena are difficult to model due to uncertainty, noise, or too little knowledge of the real problem. In such cases, EAs are robust procedures for overcoming these difficulties. In general, these algorithms are not just an alternative to traditional methods; nowadays they are used in hybridised form to complement and extend numerical methods. The convergence of heuristic optimization techniques is not affected by the continuity or differentiability of the functions to be optimized. These algorithms only require evaluations of the function at points in the search space. Applications of these evolutionary algorithms have been more convincing than their theory, which is still weak, though progressing. This paper is divided into two parts. A large number of references are included to enhance the presentation of the material. Finally, we describe the main aspects of solving an optimization problem of interest in the aerospace industry by Genetic Algorithms. The problem considered is the optimum design of an airfoil shape, which is an inverse problem consisting of finding the shape for a given pressure distribution on the airfoil.
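The claim that EAs need only function evaluations, and are unaffected by continuity or differentiability, can be illustrated with one of the simplest EAs, a (1+1) evolution strategy. This Python sketch is a generic illustration chosen here, not the paper's airfoil method. The test function `step_abs` is a hypothetical example that is discontinuous and has no gradient at its minimum, and the 1/5-rule constants (window of 20 trials, factors 1.5 and 0.82) are assumed values.

```python
import math
import random
from typing import Callable, List, Tuple


def step_abs(v: List[float]) -> float:
    """Discontinuous, non-differentiable test function with minimum 0 at the origin."""
    return sum(abs(t) + math.floor(abs(t)) for t in v)


def one_plus_one_es(f: Callable[[List[float]], float], x0: List[float],
                    sigma: float = 1.0, iterations: int = 2000,
                    seed: int = 0) -> Tuple[List[float], float]:
    """(1+1) evolution strategy with a 1/5-success step-size rule.
    Only f-values are used: no gradients and no continuity assumptions."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    successes = 0
    for t in range(1, iterations + 1):
        # Mutation: add Gaussian noise to every coordinate of the parent
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy <= fx:  # elitist selection: keep the child only if it is no worse
            x, fx = y, fy
            successes += 1
        if t % 20 == 0:
            # 1/5 rule: widen the search if >20% of the last 20 trials succeeded, else narrow it
            sigma *= 1.5 if successes / 20 > 0.2 else 0.82
            successes = 0
    return x, fx
```

Calling `one_plus_one_es(step_abs, [3.7, -2.2])` never returns a value worse than the starting point, because of elitist selection, and it needs nothing from `step_abs` except its values.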
Article
In order to solve numerous practical navigational, geodetic and astro-geodetic problems, it is necessary to transform geocentric Cartesian coordinates into geodetic coordinates or vice versa. Transforming geodetic coordinates into geocentric Cartesian coordinates is straightforward. Transforming geocentric Cartesian coordinates into geodetic coordinates, on the other hand, is rather difficult, because it is very hard to define a mathematical relationship between the geodetic latitude (φ) and the geocentric Cartesian coordinates (X, Y, Z). In this paper, a new algorithm, the Differential Search Algorithm (DS), is presented to solve the problem of transforming geocentric Cartesian coordinates into geodetic coordinates, and its performance is compared with those of classical methods (i.e., 3, 4, 30, 32, 33, 77, 2, 59 and 40) and Computational-Intelligence algorithms (i.e., ABC, JDE, JADE, SADE, EPSDE, GSA, PSO2011, and CMA–ES). The statistical tests performed to compare performance indicate that the DS algorithm transforms geocentric Cartesian coordinates into geodetic coordinates more successfully than all of the classical methods and Computational-Intelligence algorithms used in this paper.
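The asymmetry described above is easy to see in code. The direct transform is closed-form, while the inverse needs an iterative or search-based solution for latitude. This Python sketch shows the direct formulas and one classical fixed-point iteration for the inverse; it is not the DS algorithm. The WGS84 ellipsoid constants are an assumption, since the abstract does not name an ellipsoid, and the function names are hypothetical.

```python
import math
from typing import Tuple

# WGS84 ellipsoid (assumed; the abstract does not state which ellipsoid is used)
A = 6378137.0                 # semi-major axis [m]
F = 1.0 / 298.257223563       # flattening
E2 = F * (2.0 - F)            # first eccentricity squared


def geodetic_to_cartesian(lat: float, lon: float, h: float) -> Tuple[float, float, float]:
    """Closed-form direct transform; angles in radians, heights in metres."""
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime-vertical radius of curvature
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + h) * math.sin(lat)
    return x, y, z


def cartesian_to_geodetic(x: float, y: float, z: float,
                          iterations: int = 10) -> Tuple[float, float, float]:
    """Classical fixed-point iteration for latitude; returns (lat, lon, h).
    Not suitable very close to the poles, where cos(lat) -> 0."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                  # distance from the rotation axis
    lat = math.atan2(z, p * (1.0 - E2))   # starting value assuming h = 0
    for _ in range(iterations):
        n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - E2 * n / (n + h)))
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return lat, lon, h
```

The iteration follows from dividing Z = (N(1 - e²) + h) sin φ by p = (N + h) cos φ, which gives tan φ = Z / (p (1 - e² N / (N + h))). Since N and h both depend on φ, the latitude has to be refined repeatedly. A round trip through both functions recovers the original coordinates to well below a millimetre at mid-latitudes.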