Figure - uploaded by Barnali Sahu
Filter feature selection methods (non-soft computing methods)

Source publication
Article
Full-text available
Background: This paper studies the relevance of feature selection algorithms in microarray data for effective analysis. With no loss of generality, we present a list of feature selection algorithms and propose a generic categorizing framework that systematically groups algorithms into categories. The generic categorizing framework is based on searc...

Context in source publication

Context 1
... The filter individual-ranking technique belongs to the univariate methods, while the wrapper, embedded, and hybrid methods belong to the multivariate methods. Filter individual-ranking (univariate) methods as well as bivariate methods are non-soft-computing feature selection methods (Table 3). The wrapper FS technique uses a subset-generation strategy. ...
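The filter individual-ranking idea described above can be sketched in a few lines: each feature is scored independently against the class label, and features are ranked by score. The toy data and the correlation-based scorer below are illustrative assumptions, not the exact criterion used in the source article.

```python
# Minimal sketch of a filter individual-ranking (univariate) method:
# score each feature on its own against the class label, then rank.

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def rank_features(samples, labels):
    """Rank feature indices by |correlation| with the label, best first."""
    n_features = len(samples[0])
    scores = [abs(pearson([s[j] for s in samples], labels))
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])

# Toy data: feature 0 tracks the label, feature 1 is constant per class pair.
samples = [[1.0, 5.0, 0.2], [2.0, 4.0, 0.9], [3.0, 5.0, 0.1], [4.0, 4.0, 0.8]]
labels = [0, 0, 1, 1]
print(rank_features(samples, labels))  # feature 0 ranks first
```

Because each feature is evaluated in isolation, the ranking ignores feature interactions — exactly the univariate limitation that motivates the multivariate (wrapper, embedded, hybrid) methods named above.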

Citations

... Feature selection has numerous applications in decision making across distinct fields such as e-commerce, business, government services, graph theory, metal industries, medical applications, disease-oriented prediction and treatment, gene selection, microarray data, etc. As shown in Table 1, the feature selection process involves a series of steps: search direction, search strategy, evaluation strategy, stopping criteria, and validation (Sahu, Dehuri, and Jagadev 2018). For effective feature selection, features are generally classified as redundant, noisy, weakly relevant, strongly relevant, or irrelevant (Yu and Liu 2004). ...
... Stopping criteria end the feature selection process using constraints such as a fixed feature count, a number of iterations, or evaluation-function-based iterations (Sahu, Dehuri, and Jagadev 2018). Validation process ...
Article
Full-text available
Diabetes mellitus is a metabolic disorder with serious consequences in various parts of the human body, such as the eye, heart, kidney, nerves, foot, etc. The identification of consistent features helps us to assess their impact on various organs of the human body and to prevent further damage when detected at an early stage. Selecting appropriate features in the data set has potential benefits such as accuracy, minimized complexity in terms of storage and computation, and positive decision-making. The left-out features might still contain information that would be useful for analysis. For effective analysis, all features should therefore be studied in plausible ways, such as applying several feature selection (FS) methods with and without standardization. This article focuses on analyzing the critical factors of diabetes by using univariate, wrapper, and brute-force FS techniques. To identify critical features, we used information gain, chi-square, RFE, and correlation on the NIDDK data. Distinct machine learning models were then applied to both phases of the feature sets. The study was carried out in two phases to evaluate the efficacy of the techniques employed. Performance was assessed using accuracy, F1-score, and recall metrics.
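As a hedged illustration of one of the univariate criteria named above, the chi-square statistic for a single categorical feature against a class label can be computed from the observed and expected cell counts of a contingency table. The toy vectors below are made up for demonstration and are not the NIDDK data.

```python
# Sketch of chi-square scoring for one categorical feature vs. a class label:
# compare observed (feature, label) co-occurrence counts with the counts
# expected under independence. Higher scores suggest more relevance.
from collections import Counter

def chi_square(feature, labels):
    """Chi-square statistic of a categorical feature against the class label."""
    n = len(labels)
    obs = Counter(zip(feature, labels))      # observed cell counts
    f_tot = Counter(feature)                 # row marginals
    l_tot = Counter(labels)                  # column marginals
    stat = 0.0
    for f in f_tot:
        for l in l_tot:
            expected = f_tot[f] * l_tot[l] / n
            stat += (obs.get((f, l), 0) - expected) ** 2 / expected
    return stat

# A feature perfectly aligned with the label scores higher than an unrelated one.
aligned = chi_square([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = chi_square([0, 1, 0, 1], [0, 0, 1, 1])
print(aligned, unrelated)  # aligned score is high, unrelated score is 0
```

In practice one would rank all features by this score and keep the top k, which is how chi-square is typically used as a univariate filter.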
... Hence, there must be a pre-processing stage for gene selection on the microarray dataset [108]. ...
... a. reducing the processing time that may be required to analyze these data, b. noise avoidance and c. preventing irrelevant genes from being used as classifiers [108,109]. ...
Article
Full-text available
Cancer is one of the most devastating health conditions in the world. In the diagnosis and treatment of the various forms of cancer, studies have shown that early detection by clinical methods usually takes a considerably lengthy time. This motivates the search for an alternative non-clinical diagnosis of cancer cases using microarray technology. Therefore, this study develops an efficient feature selection and classification method for high-dimensional microarray cancer data by combining Genetic Algorithms (GA) and Deep Belief Networks (DBN). The study employed a GA for selecting the most informative gene biomarkers and a DBN for the classification of biological samples. In a Monte Carlo experiment, the simulated and real-life microarray datasets were partitioned into 95% training and 5% test samples at 100 to 1000 epochs. The classifier was constructed using the training datasets, while its efficiency was assessed on the test sample using the misclassification error rate, sensitivity, specificity, and receiver operating characteristic analysis. The GA and DBN were implemented using the Caret and Deepnet R statistical packages. The proposed GADBN method, applied to simulated and real-life datasets, yielded out-of-bag average classification accuracies of 98.8% and 93.1%, respectively. The proposed GADBN outperformed the other existing classifiers and was more efficient, with a smaller misclassification error rate (MER) for the Leukaemia 1 (0.18), Prostate 1 (0.35), and Prostate 3 (0.09) datasets than some of the existing methods under the various performance indices considered. The proposed model is a powerful and effective instrument for identifying useful features in microarray cancer data and classifying the cancer types accordingly.
... Relevance selects a subset of relevant features from among those that are strongly relevant, weakly relevant, and irrelevant. Redundancy determines and eliminates redundant features from the subset of relevant features by producing a final subset of features (Sahu, Dehuri and Jagadev, 2018) (Aziz, Verma and Srivastava, 2016). Feature selection is more important than the classifier used, though the classifiers are the main component in microarray data analysis. ...
Chapter
Full-text available
Cancer is becoming a serious public health problem due to its increasing prevalence and fatality rate around the world. To diagnose such critical diseases, microarray technology has become a trend. It is necessary to find a fast and accurate method for cancer diagnosis and drug discovery that helps in eradicating the disease from the body. The raw microarray gene expression data contain an enormous number of features with a small sample size, making classification of the dataset into an accurate class a challenging task. These microarray data also contain noisy, irrelevant, and redundant genes that result in poor diagnosis and classification. Hence, researchers have employed various machine learning algorithms to retrieve the most relevant features from the gene expression data. Thus, this chapter gives a comprehensive study of microarray gene expression data with feature selection and classification algorithms, and finally, future challenges are discussed.
... The value of the features is diminished, and more genes are discovered to be linked with the disease than those that were previously identified. This increases the complexity of the problem, causes computational strain, and generates useless noise in classification methods [23], [24]. As a result, identifying a minimal number of genes is crucial, known as informative genes that may be sufficient for acceptable classification. ...
... Entropy, a basic quantity of information theory, is used as an equation for computing the similarity of the characteristics: if the samples are completely homogeneous, their entropy is 0, while the entropy of evenly divided samples is 1 [24], [31]. Because of the dataset's high dimension and small sample size, classifying these data is difficult. ...
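The entropy rule quoted above (0 for a completely homogeneous sample set, 1 for an evenly split binary set) can be checked with a few lines of Python; the label vectors are illustrative.

```python
# Shannon entropy of a label vector, in bits (log base 2).
import math

def entropy(labels):
    """Entropy of the class distribution of a label sequence."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

print(entropy([1, 1, 1, 1]))  # homogeneous set -> 0.0
print(entropy([0, 0, 1, 1]))  # evenly split binary set -> 1.0
```

Information-gain feature scoring builds directly on this quantity: the gain of a feature is the entropy of the labels minus the weighted entropy of the labels after splitting on that feature.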
Article
Full-text available
The problem with using microarray technology to detect diseases is that not every gene is analytically necessary. The presence of non-essential gene data adds a computing load to the detection method. Therefore, the purpose of this study is to reduce the high-dimensional data size by determining the most critical genes involved in Alzheimer's disease progression. The study also aims to predict patients using a subset of genes that cause Alzheimer's disease. This paper uses feature selection techniques like information gain (IG) and a novel metaheuristic optimization technique based on a swarm algorithm derived from nomadic people's behavior (NPO). The suggested method mimics the structure of these people's movements and their search for new food sources. It is mostly based on a multi-swarm method: there are several clans, each seeking the best foraging opportunities. Prediction is carried out after selecting the informative genes using the support vector machine (SVM), which is frequently used in a variety of prediction tasks. Prediction accuracy was used to evaluate the suggested system's performance. The results indicate that the NPO algorithm with the SVM model returns high accuracy based on the gene subset from the IG and NPO methods. © 2023 Institute of Advanced Engineering and Science. All rights reserved.
... However, these models can be inefficient when applied to microarray datasets because these are composed of thousands of genes (features) and only a minimal number of samples [13]. Therefore, to minimize this imbalance, several feature selection methods can be applied to the microarray dataset, namely filter, wrapper, and embedded methods [31]. Wrapper and embedded methods rely on learning methods and therefore have an expensive computational cost [32]. ...
Article
Full-text available
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
... are worth indicating, where the use of evaluation criteria together with the class label leads to enhanced correlation between features and the class and reduced similarity among features [5]. ...
Article
Full-text available
The popular modified graph clustering ant colony optimization (ACO) algorithm (MGCACO) performs feature selection (FS) by grouping highly correlated features. However, the MGCACO has problems in local search, thus limiting the search for optimal feature subset. Hence, an enhanced feature clustering with ant colony optimization (ECACO) algorithm is proposed. The improvement constructs an ACO feature clustering method to obtain clusters of highly correlated features. The ACO feature clustering method utilizes the ability of various mechanisms, such as local and global search to provide highly correlated features. The performance of ECACO was evaluated on six benchmark datasets from the University California Irvine (UCI) repository and two deoxyribonucleic acid microarray datasets, and its performance was compared against that of five benchmark metaheuristic algorithms. The classifiers used are random forest, k-nearest neighbors, decision tree, and support vector machine. Experimental results on the UCI dataset show the superior performance of ECACO compared with other algorithms in all classifiers in terms of classification accuracy. Experiments on the microarray datasets, in general, showed that the ECACO algorithm outperforms other algorithms in terms of average classification accuracy. ECACO can be utilized for FS in classification tasks for high-dimensionality datasets in various application domains such as medical diagnosis, biological classification, and health care systems.
... The heuristic method applies a greedy approach to select features, and the feature subsets are created using an incremental strategy. Sequential forward selection, sequential backward selection, and sequential floating forward selection are some heuristic techniques [11]. Metaheuristic algorithms help find the global best with less computational time when searching for the best features in the FS problem. ...
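The greedy, incremental strategy described above can be sketched as sequential forward selection: at each step, the feature whose addition maximizes an evaluation function joins the subset. The additive utility scorer and feature names below are stand-in assumptions; a real wrapper would evaluate subsets with classifier accuracy.

```python
# Sketch of sequential forward selection: grow the subset greedily,
# adding one best-scoring feature per iteration until k are chosen.

def sequential_forward_selection(features, evaluate, k):
    """Greedily build a subset of size k, one best feature at a time."""
    selected = []
    remaining = list(features)
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy evaluation: each (hypothetical) feature has an individual utility,
# and a subset's score is just the sum of its members' utilities.
utility = {"gene_a": 0.9, "gene_b": 0.1, "gene_c": 0.5}
subset = sequential_forward_selection(
    utility, lambda s: sum(utility[f] for f in s), 2)
print(subset)  # the two highest-utility features
```

Because the additions are never undone, this greedy scheme can miss interacting feature pairs — the limitation that floating variants (SFFS) and the metaheuristics mentioned above try to address.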
Article
Full-text available
Computations that mimic nature are known as nature-inspired computing. Nature presents a wealthy source of thoughts and ideas for computing, and naturally inspired techniques have been found to provide machine solutions to complex problems. One of the challenging issues among researchers is high-dimensional data, which contains a large number of unwanted, redundant, and irrelevant features. These redundant or unwanted features reduce the accuracy of machine learning models; therefore, metaheuristic techniques are now being used to solve this problem. The paper presents both a survey and a comparison of five metaheuristic algorithms for feature selection. A wrapper-based feature selection approach using five nature-inspired techniques has been applied: binary versions of the swarm-based nature-inspired algorithms (NIAs) particle swarm optimization, the whale optimization algorithm (WOA), grey wolf optimization (GWO), the firefly algorithm, and the bat algorithm. WOA and GWO are recent algorithms used for finding optimal feature subsets when there is no empirical information. An S-shaped transfer function is used to convert continuous values to binary form, and K-nearest neighbor is used to calculate the classification accuracy of selected feature subsets. To validate the results of the selected NIAs, eleven benchmark datasets from the UCI repository are used. The strength of each NIA has been verified using nonparametric tests, namely the Friedman rank and Holm tests. The p values obtained show that WOA is statistically significant and performs better than the other models.
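The S-shaped transfer function mentioned in the abstract can be sketched as follows: a sigmoid maps each continuous swarm position to a probability, which is compared against a random draw to produce a 0/1 feature-selection mask. The thresholding scheme and seed below are illustrative assumptions, not the exact update rule of any particular NIA.

```python
# Sketch of an S-shaped transfer function for binary swarm algorithms:
# sigmoid(position) gives the probability that a feature bit is set to 1.
import math
import random

def s_shape(x):
    """Sigmoid transfer function mapping a real position to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng):
    """Convert a continuous position vector to a 0/1 feature mask."""
    return [1 if rng.random() < s_shape(x) else 0 for x in position]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
mask = binarize([4.0, -4.0, 0.0], rng)
print(mask)  # large positive positions tend to 1, large negative to 0
```

Each candidate mask is then scored by a wrapper objective (here, K-nearest-neighbor classification accuracy on the selected columns), which drives the swarm update.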
... The process of feature selection generates {Y(i) | i = 1, 2, ..., p}, where Y(i) represents the new subset of features and p is the number of features in the subset, with p ≤ m. There are three types of feature selection methods: filter, wrapper, and embedded approaches [46], [47]. Figure 11 shows the proposed computational model framework. The feature selection stage uses GA for dimensionality reduction. ...
... [26][27][28] The main intention of feature selection is to decide on a subset of features that diminishes redundancy and boosts relevance to the class labels in terms of classification. [29] Hence, feature selection upgrades learning performance, drops computational complexity, constructs finer global models, and diminishes expected storage. It picks a cluster of features from the initial feature set without any changeover and retains the physical implications of the initial features. ...
Article
Full-text available
Classification algorithm selection is an important concern for breast cancer diagnosis. The traditional routine of adopting a single performance metric for evaluating classifiers is not adequate in the case of microarray gene expression datasets. This paper introduces an MCDM technique to evaluate classification algorithms in breast cancer forecasting by considering different performance measures along with the feature space. An empirical study is designed to support an overall assessment of classifiers on microarray datasets using a well-known MCDM technique. TOPSIS is used to rank classifiers on 11 prominent assessment criteria. First, the rank order of 20 classifiers along the 11 assessment criteria is generated. Then the topmost classifiers are identified based on their performances, highlighting the role of feature selection in the overall process and supporting a genuine assessment of classifiers beyond any solitary performance criterion. Results indicate that AdaBoostM1 and Iterative Classifier Optimizer are graded as the topmost classifiers without and with feature selection, respectively, based on their performances on different measures. Furthermore, the proposed MCDM-based model can reconcile distinct or even inconsistent evaluation performances to reach a group consensus in a complicated decision-making environment.