JOURNAL OF INFORMATION SYSTEMS RESEARCH AND INNOVATION
http://seminar.utmspace.edu.my/jisri/
ISSN: 2289-1358
Review of Feature Selection for Solving Classification Problems

Norshafarina Binti Omar (1), e-mail: norshafarina.omar@gmail.com
Fatimatufaridah Binti Jusoh (2), e-mail: efaridah88@gmail.com
Mohd Shahizan Bin Othman (3), e-mail: shahizan@fsksm.utm.my
Roliana Binti Ibrahim (4), e-mail: roliana@utm.my

Author(s) Contact Details:
(1, 2, 3, 4) Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
Abstract — Classification of data across different domains has been extensively researched and is one of the basic methods for distinguishing one group from another. A classifier can infer the class of unseen instances by analyzing their structural similarity to a given dataset with known classes. The reliability of classification results is a crucial issue: the higher the accuracy of the generated results, the better the classifier. Researchers constantly seek to increase classification accuracy, either through existing techniques or through the development of new ones, and different processes are applied to improve classification performance. While most existing work on this task aims at improving the classifier itself, this paper focuses on reducing the number of features in the dataset by selecting only the relevant features before passing the dataset to the classifier. This motivates the need for methods capable of selecting the relevant features with minimal information loss; the aim is to reduce the workload of the classifier through feature selection. With the focus on classification accuracy, this paper highlights and discusses the concept, abilities and application of feature selection for various classification problems. The review shows that classification with feature selection achieves significantly better accuracy than classification without it.
Keywords – accuracy; feature selection; classification; classifier; dataset
1. INTRODUCTION
Classification is one of the most important tasks in real-world problems, with the intention of finding the underlying patterns of the data and making use of them [1]. We are often flooded by data yet lack information, and data clearly cannot tell us anything without processing [2]. The central idea of classification is to learn from a given dataset in which patterns are provided together with their classes; the output of the classifier is a model or hypothesis that captures the relationship between the attributes and the class [3]. Complex classification problems are likely to present large numbers of features, many of which will be redundant for the classification task. If the number of features is very large, the classifier will take more time to classify the dataset. Classification therefore requires careful consideration of the dataset before it is given to the classifier: it is better to include only the necessary features than to add many irrelevant ones, since the latter makes the classification process much harder. It is thus very useful to have methods capable of selecting the most relevant and informative features needed to produce accurate and reliable outcomes for classification problems.
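To make the idea of keeping only the necessary features concrete, the following sketch ranks features by a simple class-mean separation score and keeps only the top-k. The scoring rule and the tiny dataset are illustrative assumptions for this review, not a method taken from any of the surveyed works.

```python
# Minimal filter-style feature selection sketch: score each feature by how far
# apart the two class means are, then keep the k highest-scoring features.

def feature_scores(rows, labels):
    """Score feature j by |mean over class 0 - mean over class 1|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        c0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        c1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        scores.append(abs(sum(c0) / len(c0) - sum(c1) / len(c1)))
    return scores

def select_top_k(rows, labels, k):
    """Return the (sorted) indices of the k most discriminative features."""
    scores = feature_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Feature 0 separates the classes; feature 1 is pure noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.9], [1.0, 1.1]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 1))  # feature 0 is kept
```

Only the surviving columns would then be handed to the classifier, which is the whole premise of the works reviewed below.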
This paper is organized as follows. Section 1 presents the introduction. Section 2 describes the research methodology used in this paper. Section 3 presents the idea of feature selection for classification. Based on the observations made in Section 3, Section 4 presents and discusses experimental results from the literature that support the idea of feature selection for solving classification problems. Finally, Section 5 gives concluding remarks.
2. METHODOLOGY OF STUDY
This section briefly explains the methodology adopted in this research and discusses the steps taken in comparing the performance of classification with and without feature selection. The problem situation in classification and the corresponding solution are summarized in Table 1.
TABLE 1: Summary of problem situation and solution

Problem Situation: Huge amounts of data with numerous features go through classification, placing a heavy workload on the classifiers. Some of the features are irrelevant and unnecessary, which decreases the classifiers' accuracy.

Problem Solution: Feature selection is adopted to find the significant features. It reduces the workload of the classifiers, which also improves classification accuracy.
As stated in Table 1, feature selection is adopted to solve the classification problem by selecting the most significant features, which in turn improves classifier performance. The research methodology is illustrated in Figure 1 and detailed below.

FIGURE 1: Overview of Research Methodology

This research went through two phases to produce this review of feature selection for solving classification problems; the methodology is based only on the previous experiments listed in Table 3. Phase 1 focused on identifying the main problem in classification. Based on the literature, classifiers do not work well on datasets with many features, and the use of a classifier alone is not good enough in terms of classification accuracy. To solve this problem, [7, 14, 17, 18, 19, 20, 21, 22, 23] aim to reduce the number of features before giving the data to the classifier, by applying feature selection to choose the most significant features and remove the unnecessary ones. Phase 2 focused on classification using feature selection. Previous researchers carried out experiments to prove the effectiveness of feature selection for solving classification problems: the significant features obtained in the previous step are kept, and a classifier is used to classify the data restricted to those features. An interesting observation from the works listed in Table 3 is that [14, 19, 20, 21, 22, 23] conducted several experiments by varying the feature selection method. The outcome of this phase is classification performance in terms of accuracy, which then feeds the comparison of classification performance with and without feature selection. Finally, the analysis is made based on the classification accuracy measured by the classifier. The ideas of feature selection for classification are discussed in the next section.
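The with/without comparison at the heart of this methodology can be sketched on a toy example: a 1-nearest-neighbour classifier is scored by leave-one-out accuracy on a hypothetical dataset in which one feature is informative and one is loud noise. The data and classifier choice here are illustrative assumptions, not drawn from the surveyed experiments.

```python
# Leave-one-out accuracy of a 1-nearest-neighbour classifier, used to compare
# classification on the full feature set against the reduced one.

def loo_1nn_accuracy(X, y):
    """For each point, predict its class from its nearest other point."""
    correct = 0
    for i, (row, label) in enumerate(zip(X, y)):
        others = [(r, lab) for j, (r, lab) in enumerate(zip(X, y)) if j != i]
        dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
        nearest = min(others, key=lambda o: dist(o[0], row))
        correct += nearest[1] == label
    return correct / len(X)

# Feature 0 separates the classes; feature 1 is noise that dominates distances.
X_full = [[0.0, 0.0], [0.2, 5.0], [1.0, 5.1], [0.8, 0.1]]
y = [0, 0, 1, 1]
X_reduced = [[r[0]] for r in X_full]   # "feature selection" keeps feature 0 only

print(loo_1nn_accuracy(X_full, y))     # 0.0 - the noise misleads every neighbour
print(loo_1nn_accuracy(X_reduced, y))  # 1.0 - dropping the noise fixes it
```

On this contrived data the reduced feature set turns a useless classifier into a perfect one, which is the pattern (in milder form) reported by the experiments in Table 3.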
3. FEATURE SELECTION FOR CLASSIFICATION
Feature selection is of considerable importance in pattern classification, data analysis, information retrieval, machine learning and data mining applications. [4] achieved impressive classification results with a classifier alone; however, the approach works well only for data without many features, and classification tasks involving many features were beyond their consideration. [4] therefore suggested that other techniques such as feature selection may be needed when thousands of attributes are involved, in order to choose a relevant subset before giving the data to the classifier. As shown in Figure 2, classification with feature selection takes the full set of dataset features as input, and the final output is a classification pattern based on the features selected in the preceding feature selection step.
FIGURE 2: Feature Selection for Classification (input: full dataset features → feature selection methods → output: selected features; input: selected features → classifier techniques → output: classification pattern)
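The two-stage pipeline of Figure 2 can be sketched as composed functions, with the selected feature columns as the interface between the stages. The nearest-centroid classifier and the hard-coded selected subset below are illustrative stand-ins for the surveyed methods, chosen only to keep the sketch self-contained.

```python
# Figure 2 as code: full dataset features -> feature selection -> selected
# features -> classifier -> classification pattern.

def project(row, keep):
    """Keep only the selected feature columns."""
    return [row[j] for j in keep]

def nearest_centroid_fit(rows, labels):
    """Compute one centroid per class over the given (already reduced) rows."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def nearest_centroid_predict(centroids, row):
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cls: dist(centroids[cls], row))

X_full = [[0.0, 9.1], [0.2, 0.3], [1.0, 8.8], [0.8, 0.1]]  # full dataset features
y = [0, 0, 1, 1]
keep = [0]                                  # pretend a FS method chose feature 0
X_sel = [project(r, keep) for r in X_full]  # output of the feature selection stage
model = nearest_centroid_fit(X_sel, y)      # input to the classifier stage
print(nearest_centroid_predict(model, project([0.9, 9.0], keep)))  # -> 1
```

Any of the methods in Table 2 could, in principle, replace the hard-coded `keep` list; the classifier stage only ever sees the selected columns.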
Many feature selection methods for classification are reviewed in this paper, with particular emphasis on their abilities. Feature selection is often applied to optimize classification performance. For instance, when distinguishing between healthy and cancer patients based on their gene expression profiles, feature selection provides a practical solution by reducing the size of datasets that would otherwise be unsuitable for further processing [5, 6]. The aim of feature selection is to find useful features to represent the data and remove non-relevant ones, and thereby to simplify the implementation of the pattern classifier itself by determining which features should be made available to it. Furthermore, feature selection tends to speed up the classifier and improve response times by reducing the input dimensionality [7, 8, 9, 10]. Feature selection can also improve the quality of classification in terms of accuracy, and these methods work by selecting a small number of features from a huge feature space [11, 12]. [7] believed that feature selection methods are particularly desirable to facilitate the interpretability of the outcome. [13] summarized the potential benefits of feature selection: it facilitates data understanding, reduces measurement and storage requirements, reduces computational processing time, and reduces the dimensionality of the data to improve classification performance. In this paper, four existing feature selection methods are compared in terms of their abilities. Table 2 highlights several abilities of rough set theory (RST), particle swarm optimization (PSO), genetic algorithms (GA) and fuzzy-rough feature selection (FRFS). Discrete particle swarm optimization (DPSO), new particle swarm optimization (NPSO) and chaotic binary particle swarm optimization (CBPSO) are extensions of PSO; sequential genetic algorithms (SGA) and hybrid genetic algorithms (HGA) are extensions of GA. Other abilities of these feature selection methods may exist, but the discussion is based only on the previous works listed in Table 3 in Section 4. Thus, this paper discusses the abilities of feature selection methods in classification problems in order to find the optimal features for better classification performance.
TABLE 2: Abilities of feature selection methods for classification

RST [7, 14]
• Only the facts hidden in the data are analyzed.
• It finds a minimal knowledge representation.
• No additional knowledge about the data, such as expert knowledge, is required (it is able to discover data dependencies).

PSO [15]
• Powerful exploration ability until the optimal solution is found.
• The particle swarm has memory, and knowledge of the solution is retained by all particles.
• Requires only simple mathematical operators.
• Running time is affected less by the number of features.

GA [7, 16, 11]
• Quite effective for rapid search of large, nonlinear and poorly understood spaces.
• A population of solutions can be modified at the same time.
• Provides several optimal or close-to-optimal feature subsets as output.

FRFS [7]
• Allows greater flexibility when handling noisy and real-valued data.
• Maintains the underlying semantics of the feature set, ensuring the resulting output is interpretable and the inference explainable.
• Produces significantly less complex rules while achieving higher classification accuracy.
• Capable of reducing dataset dimensionality by removing redundant features that would otherwise increase rule complexity and the time for the induction process itself.
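As a rough illustration of the GA row in Table 2, the following toy genetic algorithm evolves bit-string feature masks with elitism, one-point crossover and bit-flip mutation. The fitness function (class-mean separation minus a small size penalty) and all parameters are assumptions made for this sketch, not taken from the cited works.

```python
# Toy GA feature selector: a bit-string encodes which features are kept;
# fitness rewards class separation on the chosen features and penalises
# subset size, so minimal informative subsets win.
import random

def fitness(mask, X, y):
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    sep = 0.0
    for j in chosen:
        c0 = [r[j] for r, lab in zip(X, y) if lab == 0]
        c1 = [r[j] for r, lab in zip(X, y) if lab == 1]
        sep += abs(sum(c0) / len(c0) - sum(c1) / len(c1))
    return sep - 0.01 * len(chosen)  # mild penalty for larger subsets

def ga_select(X, y, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, X, y))
    for _ in range(generations):
        nxt = [best[:]]                    # elitism: always keep the best mask
        while len(nxt) < pop_size:
            a, b = rng.sample(pop, 2)      # pick two parents
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]      # one-point crossover
            if rng.random() < 0.2:         # occasional bit-flip mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda m: fitness(m, X, y))
    return [j for j, bit in enumerate(best) if bit]

# Feature 0 is informative; features 1-3 are constant noise.
X = [[0.0, 1, 1, 1], [0.1, 1, 1, 1], [0.9, 1, 1, 1], [1.0, 1, 1, 1]]
y = [0, 0, 1, 1]
print(ga_select(X, y))
```

Because several masks can achieve near-identical fitness, a GA naturally yields multiple close-to-optimal subsets across runs, which matches the GA ability listed in Table 2.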
4. APPLICATION OF FEATURE SELECTION FOR CLASSIFICATION TASKS
In order to gain insight into how feature selection works in classification problems, we present a survey of the relevant literature. Table 3 gives a brief description of previous works on feature selection for classification tasks. Classifiers are adopted to show the effectiveness of the feature selection methods in classification problems. The purpose of the survey is to compare classifier performance using the full feature set against the optimal features produced by the feature selection methods.
TABLE 3: Examples of works on feature selection for classification

| Ref. | Feature Selection | Classifier | #Full Features | #Reduced Features | Accuracy (%) a | Accuracy (%) b | Remarks |
|------|-------------------|------------|----------------|-------------------|----------------|----------------|---------|
| [7] | FRFS | SMO | 38 | 11 | 71.80 | 72.50 | Classification accuracy is increased with feature selection methods. |
| [14] | RST + DPSO | SVM | 25 | 6 | 93.20 | 94.80 | The integration of RST-DPSO is capable of searching the optimal features for protein classification. |
| [14] | RST | SVM | 25 | 11 | 93.20 | 92.40 | The use of RST alone does not improve the average classification accuracy. |
| [17] | PSO | SVM | 30 | 13 | Not stated | 95.61 | PSO coupled with SVM presents a very good performance for pattern classification compared to other feature selection methods. |
| [18] | PSO | SVM | 98 (Dataset 1) | 14 | 4.16 | 84.16 | Classification performance with the selected features far surpasses that with the full feature set. |
| [18] | PSO | SVM | 98 (Dataset 2) | 12 | 4.16 | 90.00 | |
| [19] | GA | Back Propagation Network (BPN) | 114 | Not stated | Not stated | Not stated | Feature selection can efficiently search for optimal features; however, NPSO-BPN selects better features than GA. |
| [19] | NPSO | BPN | 114 | Not stated | Not stated | Not stated | |
| [20] | GA | SVM | 23 (Dataset 1) | 10 | Not stated | 77.40 | Classification accuracy rates for IFS perform significantly better than PSO-SVM and GA+SVM. |
| [20] | GA | SVM | 17 (Dataset 2) | 12 | Not stated | 88.10 | |
| [20] | Improved Feature Selection (IFS) | SVM | 23 (Dataset 1) | 9 | 75.90 | 80.20 | |
| [20] | Improved Feature Selection (IFS) | SVM | 17 (Dataset 2) | 10 | 85.80 | 89.60 | |
| [20] | PSO | SVM | 23 (Dataset 1) | 9 | Not stated | 76.80 | |
| [20] | PSO | SVM | 17 (Dataset 2) | 10 | Not stated | 72.30 | |
| [21] | Signal-to-noise ratio (SNR) ranking | SVM | 50 | 5 | Not stated | 97.50 | K-means + SNR feature selection coupled with SVM and k-NN performed well, which helps to enhance classification accuracy. |
| [21] | SNR | k-nearest neighbor (k-NN) | 50 | Not stated | Not stated | 95.40 | |
| [21] | K-means + SNR | SVM | 50 | Not stated | Not stated | 99.30 | |
| [21] | K-means + SNR | k-NN | 50 | Not stated | Not stated | 99.30 | |
| [22] | Ant Colony Optimization (ACO) | ANN | Not stated | Not stated | Not stated | 84.23 | ACO-based feature selection outperformed GA-based feature selection in terms of classification accuracy. |
| [22] | GA | ANN | Not stated | Not stated | Not stated | 83.47 | |
| [23] | Sequential forward search (SFS) | 1-NN | 36 | 22 | Not stated | 90.45 | Experimental results show that classification with feature selection provides a sufficient approximation to the true optimal solution. |
| [23] | Sequential genetic algorithm (SGA) | 1-NN | 36 | 22 | Not stated | 91.36 | |
| [23] | Hybrid genetic algorithm (HGA) | 1-NN | 36 | 22 | Not stated | 91.37 | |
| [23] | Chaotic binary particle swarm optimization (CBPSO) | 1-NN | 36 | 21 | Not stated | 91.45 | |

a: average classification accuracy with full features; b: average classification accuracy with reduced features
It can be seen that almost all the methods aim at reducing the number of features in order to improve classification accuracy; feature selection is thus needed to provide reasonable accuracy, removing non-relevant features without compromising classification accuracy. Based on the works listed, the classification accuracy obtained with feature selection outperformed that obtained with the full feature set. The results are very encouraging and show that reducing the features does not harm classification performance; on the contrary, performance is highly improved. An issue highlighted by this review is the number of selected features used for classification. Based on their experiments, [12] concluded that selecting too few features does not work well, while selecting too many defeats the purpose of feature reduction; they recommended that the number of selected features should lie within an acceptable range, especially for classification. Moreover, the experiments of [24] indicate that some classifier techniques such as SVM work well for data that do not have many features. However, [7] stated that feature selection is not restricted to datasets with huge numbers of features; it has also been applied to small and medium-sized datasets. [25] stated that a well-designed feature selection method will choose a small feature set that is highly predictive of the outcome, and that different feature selection algorithms will produce different sets of relevant features and different accuracies.
5. CONCLUSION
Based on a review of the existing literature, it may be appropriate to suggest that the best feature selection method for classification tasks is PSO, which achieves better classification accuracy than many other feature selection methods; however, the other methods also produced comparatively reasonable accuracy. Our goal, again, is to highlight the important role of feature selection in accurately classifying a dataset, so that the classifier can provide real benefit by extracting useful knowledge. Classification accuracy can be significantly improved with feature selection, and the surveyed feature selection methods were applied with promising results.
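Since the conclusion singles out PSO, a minimal binary-PSO feature selector can be sketched as follows. The sigmoid transfer rule is the standard binary-PSO bit update, while the fitness function, the toy data and all constants are illustrative assumptions rather than any specific variant (DPSO, NPSO, CBPSO) surveyed above.

```python
# Toy binary PSO: each particle carries a velocity per feature, and a sigmoid
# of the velocity gives the probability that the feature bit is switched on.
import math
import random

def fitness(mask, X, y):
    """Class-mean separation over the chosen features, minus a size penalty."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    sep = 0.0
    for j in chosen:
        c0 = [r[j] for r, lab in zip(X, y) if lab == 0]
        c1 = [r[j] for r, lab in zip(X, y) if lab == 1]
        sep += abs(sum(c0) / len(c0) - sum(c1) / len(c1))
    return sep - 0.01 * len(chosen)

def bpso_select(X, y, n_particles=6, iters=30, seed=1):
    rng = random.Random(seed)
    n = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=lambda m: fitness(m, X, y))[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] += 2 * r1 * (pbest[i][d] - pos[i][d]) \
                           + 2 * r2 * (gbest[d] - pos[i][d])
                prob = 1 / (1 + math.exp(-vel[i][d]))   # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            if fitness(pos[i], X, y) > fitness(pbest[i], X, y):
                pbest[i] = pos[i][:]
        gbest = max(pbest, key=lambda m: fitness(m, X, y))[:]
    return [j for j, bit in enumerate(gbest) if bit]

X = [[0.0, 1, 1], [0.1, 1, 1], [0.9, 1, 1], [1.0, 1, 1]]  # only feature 0 informative
y = [0, 0, 1, 1]
print(bpso_select(X, y))
```

The swarm's shared global best is what gives PSO the "memory retained by all particles" ability noted in Table 2.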
REFERENCES
[1] D.J. Hand, H. Mannila, P. Smyth, "Principles of Data Mining," MIT Press, 2001.
[2] H. Hasan, N.M. Tahir, "Feature selection of breast cancer based on principal component analysis," 6th International Colloquium on Signal Processing and Its Applications, 2010.
[3] A. Ahmad, "Data transformation for decision tree ensembles," PhD Thesis, University of Manchester, 2009.
[4] C.W. Hsu, C.C. Chang, C.J. Lin, "A practical guide to support vector classification."
[5] E.P. Xing, "Feature selection in microarray analysis: a practical approach to microarray data analysis," Kluwer Academic Publishers, 2003.
[6] M. Xiong, W. Li, J. Zhao, L. Jin, E. Boerwinkle, "Feature (gene) selection in gene expression-based tumor classification," Molecular Genetics and Metabolism, vol. 73(3), pp. 239–247, 2001.
[7] R. Jensen, "Combining rough and fuzzy sets for feature selection," PhD Thesis, University of Edinburgh, 2005.
[8] W.H. Au, K.C.C. Chan, "An effective algorithm for discovering fuzzy rules in relational databases," in Proceedings of the 7th IEEE International Conference on Fuzzy Systems, pp. 1314–1319, 1998.
[9] S. Chen, S.L. Lee, C. Lee, "A new method for generating fuzzy rules from numerical data for handling classification problems," Applied Artificial Intelligence, vol. 15(7), pp. 645–664, 2001.
[10] R. Jensen, Q. Shen, "Aiding fuzzy rule induction with fuzzy-rough attribute reduction," in Proceedings of the 2002 UK Workshop on Computational Intelligence, pp. 81–88, 2002.
[11] M. Kudo, J. Sklansky, "Comparison of algorithms that select features for pattern classifiers," Pattern Recognition, vol. 33(1), pp. 25–41, 2000.
[12] X. Sun, Y. Liu, J. Li, J. Zhu, X. Liu, H. Chen, "Using cooperative game theory to optimize the feature selection problem," Neurocomputing, 2012.
[13] T.P. Ling, "Iterative Bayesian model averaging for patients' survival analysis," Bachelor Degree Thesis, Universiti Teknologi Malaysia, 2010.
[14] S.A. Rahman, A.A. Bakar, Z.A.M. Hussein, "Filter-wrapper approach to feature selection using RST-DPSO for mining protein function," 2nd Conference on Data Mining and Optimization, 2009.
[15] X. Wang, J. Yang, X. Teng, W. Xia, R. Jensen, "Feature selection based on rough sets and particle swarm optimization," Pattern Recognition Letters, vol. 28, 2007.
[16] W. Siedlecki, J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognition Letters, vol. 10(5), pp. 335–347, 1989.
[17] C.J. Tu, L.Y. Chuang, J.Y. Chang, C.H. Yang, "Feature selection using PSO-SVM," IAENG International Journal of Computer Science, vol. 33, 2007.
[18] R.M. Sharkawy, K. Ibrahim, M.M.A. Salama, R. Bartnikas, "Particle swarm optimization feature selection for the classification of conducting particles in transformer oil," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 18, 2011.
[19] K. Geetha, K. Thanushkodi, A.K. Kumar, "New particle swarm optimization for feature selection and classification of microcalcifications in mammograms," International Conference on Signal Processing, Communication and Networking, Madras Institute of Technology, Anna University, Chennai, India, pp. 458–463, 2008.
[20] Y. Liu, G. Wang, H. Chen, H. Dong, X. Zhu, S. Wang, "An improved particle swarm optimization for feature selection," Journal of Bionic Engineering, vol. 8, 2011.
[21] D. Mishra, B. Sahu, "Feature selection for cancer classification: a signal-to-noise ratio approach," International Journal of Scientific and Engineering Research, vol. 2, 2011.
[22] A.A. Ahmed, "Feature subset selection using ant colony optimization," International Journal of Computational Intelligence, pp. 53–58, 2005.
[23] C.S. Yang, L.Y. Chuang, J.C. Li, C.H. Yang, "Chaotic maps in binary particle swarm optimization for feature selection," IEEE Conference on Soft Computing in Industrial Applications, 2008.
[24] C.W. Hsu, C.C. Chang, C.J. Lin, "A practical guide to support vector classification," 2003.
[25] A. Annest, R.E. Bumgarner, A.E. Raftery, K.Y. Yeung, "Iterative Bayesian model averaging: a method for the application of survival analysis to high-dimensional microarray data," BMC Bioinformatics, vol. 10(72), 2009.