Figure - available from: Journal of Intelligent Information Systems
Critical difference diagrams showing the ranks after applying feature selection over the 38 real datasets. For feature selection methods that require a threshold, the option to keep 10% of the features is indicated by ‘-10’, the option to keep 20% is indicated by ‘-20’, and the option ‘-log’ refers to using log₂.

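The caption's threshold options map directly to a feature count. A minimal sketch, assuming ‘-log’ means keeping log₂(M) features for a dataset with M features; the function name and option strings below are hypothetical, mirroring the caption:

```python
import math

def n_features_to_keep(M: int, option: str) -> int:
    """Map a caption threshold option to the number of features kept (M = total features)."""
    if option == "-10":                    # keep 10% of the features
        return max(1, round(0.10 * M))
    if option == "-20":                    # keep 20% of the features
        return max(1, round(0.20 * M))
    if option == "-log":                   # keep log2(M) features (assumed reading of '-log')
        return max(1, round(math.log2(M)))
    raise ValueError(f"unknown option: {option}")

print(n_features_to_keep(200, "-10"))      # 20
print(n_features_to_keep(200, "-log"))     # 8 (log2(200) ≈ 7.6)
```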

Source publication
Article
The growth of Big Data has resulted in an overwhelming increase in the volume of data available, including the number of features. Feature selection, the process of selecting relevant features and discarding irrelevant ones, has been successfully used to reduce the dimensionality of datasets. However, with numerous feature selection approaches in t...

Similar publications

Article
We propose a variable selection method for multivariate hidden Markov models with continuous responses that are partially or completely missing at a given time occasion. Through this procedure, we achieve a dimensionality reduction by selecting the subset of the most informative responses for clustering individuals and simultaneously choosing the o...
Article
The choice of the factorization rank of a matrix is critical, e.g., in dimensionality reduction, filtering, clustering, deconvolution, etc., because selecting a rank that is too high amounts to adjusting the noise, while selecting a rank that is too low results in the oversimplification of the signal. Numerous methods for selecting the factorizatio...
Article
Melanoma, a widespread and hazardous form of cancer, has prompted researchers to prioritize dermoscopic image‐based algorithms for classifying skin lesions. Recently, there has been a growing trend in using pre‐trained convolutional neural networks for detecting skin lesions. However, the features extracted from these classifiers may include irrele...
Article
Hierarchical multi-label classification problems typically deal with datasets with many attributes and labels, which can negatively impact the classifier performance. The application of dimensionality reduction methods can significantly improve the performance of classifiers. Dimensionality reduction can be performed by feature extraction or featur...
Article
This paper introduces a novel graph-based filter method for automatic feature selection (abbreviated as GB-AFS) for multi-class classification tasks. The method determines the minimum combination of features required to sustain prediction performance while maintaining complementary discriminating abilities between different classes. It does not req...

Citations

... This method is known as feature selection based on Mutual Information Maximization (MIM). MIM is widely used and, in most cases, has filtering-quality performance similar to that of other information-theoretic filtering algorithms such as Joint Mutual Information (JMI) and minimum Redundancy Maximum Relevance (mRMR) [1], [10], [11]. However, MIM feature selection has the critical advantage of requiring only one pass over the dataset and simpler histograms, and for a dataset of N samples and M features its complexity is O(NM), as opposed to O(NM²) for JMI and mRMR. ...
Conference Paper
Feature selection is the data analysis process that selects a smaller and curated subset of the original dataset by filtering out data (features) which are irrelevant or redundant. The most important features can be ranked and selected based on statistical measures, such as mutual information. Feature selection not only reduces the size of the dataset and the execution time for training Machine Learning (ML) models, but it can also improve the accuracy of the inference. This paper analyses mutual-information-based feature selection for resource-constrained FPGAs and proposes FINESSD, a novel approach that can be deployed for near-storage acceleration. This paper highlights that the Mutual Information Maximization (MIM) algorithm does not require multiple passes over the data while being a good trade-off between accuracy and FPGA resources, when approximated appropriately. The new FPGA accelerator for MIM generated by FINESSD can fully utilize the NVMe bandwidth of a modern SSD and perform feature selection without requiring full dataset transfers onto the main processor. The evaluation using a Samsung SmartSSD over small, large and out-of-core datasets shows that, compared to the mainstream multiprocessing Python ML libraries and an optimized C library, FINESSD yields up to 35x and 19x speedup respectively while being more than 70x more energy efficient for large, out-of-core datasets.
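
To make the MIM scheme discussed in the citation above concrete, here is a minimal sketch of its core idea: score each feature independently against the label and keep the top k. The scoring step uses scikit-learn's mutual_info_classif (a kNN-based estimator), which is an assumption for illustration; the single-pass histogram estimator described above would replace it in a resource-constrained setting. The mim_select helper and the toy data are hypothetical, not FINESSD's implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mim_select(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k features with the highest I(X_i; y)."""
    scores = mutual_info_classif(X, y)      # one relevance score per feature
    return np.argsort(scores)[::-1][:k]     # indices of the top-k features

# Toy usage: only feature 3 carries information about the label,
# so it should appear first in the selected indices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 3] > 0).astype(int)
print(mim_select(X, y, 3))
```

Because each feature is scored independently of the others, one sweep over the N×M data suffices, which is the O(NM) single-pass property the citation contrasts with the O(NM²) cost of JMI and mRMR.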