Review of Feature Selection for Solving Classification Problems
Norshafarina Binti Omar (1), e-mail: norshafarina.omar@gmail.com
Fatimatufaridah Binti Jusoh (2), e-mail: efaridah88@gmail.com
Mohd Shahizan Bin Othman (3), e-mail: shahizan@fsksm.utm.my
Roliana Binti Ibrahim (4), e-mail: roliana@utm.my
Author(s) Contact Details:
(1,2,3,4) Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
Abstract: Classification of data across different domains has been extensively researched and is one of the basic methods for distinguishing one group from another, as we need to know which instance belongs to which group. Classification can infer the class of an unseen dataset by analyzing its structural similarity to a given dataset with known classes. The reliability of classification results is a crucial issue: the higher the accuracy of the generated results, the better the classifier. Researchers are constantly seeking to increase classification accuracy, either through existing techniques or through the development of new ones, and different processes are applied to improve classification performance. While most existing methods aim at improving the classifier itself, this paper focuses on reducing the number of features in a dataset by selecting only the relevant features before the dataset is given to the classifier. This motivates the need for methods capable of selecting the relevant features with minimal information loss; the aim is to reduce the workload of the classifier by using feature selection. With the focus on classification accuracy, this paper highlights and discusses the concept, abilities and application of feature selection for various classification problems. From the review, classification with feature selection has shown impressive results, with significantly better accuracy than classification without feature selection.
Keywords – accuracy; feature selection; classification; classifier; dataset
1. INTRODUCTION
Classification is one of the most important tasks in real-world problems, with the intention of finding the underlying patterns of the data and making use of the found patterns [1]. We are often flooded with data yet lack information, and data clearly cannot tell us anything without processing [2]. The central idea of classification is to learn from a given dataset in which patterns with their classes are provided; the output of the classifier is a model, or hypothesis, that captures the relationship between the attributes and the class [3]. Complex classification problems are likely to present large numbers of features, many of which will be redundant for the classification task. If the number of features is very large, the classifier will take more time to classify the dataset. Classification therefore requires careful consideration of the dataset before the data is given to the classifier. It is better to consider only the necessary features rather than adding many irrelevant ones, since irrelevant features make the classification process much harder. It is thus very useful to have methods capable of selecting the most relevant and informative features needed to produce accurate and reliable outcomes for classification problems.
This paper is organized as follows. Section 1 presents the introduction. Section 2 describes the research methodology used in this paper. Section 3 presents the idea of feature selection for classification. Based on the observations made in Section 3, Section 4 presents and discusses experimental results from the literature that support the idea of feature selection for solving classification problems. Finally, Section 5 gives concluding remarks.
2. METHODOLOGY OF STUDY
This section briefly explains the methodology adopted in this research. It also discusses the research steps taken in comparing the performance of classification with and without feature selection. The problem situation in classification and its solution are summarized in Table 1.
TABLE 1: Summary of problem situation and solution

Problem Situation: Huge amounts of data with numerous features go through classification, giving the classifiers a heavy workload. Some of the features are irrelevant and unnecessary, which decreases the classification accuracy.

Problem Solution: Feature selection is adopted to find the significant features. It reduces the workload of the classifiers and thereby improves the classification accuracy.
As stated in Table 1, to solve the classification problem, feature selection is adopted to select the most significant features and thereby improve the performance of classifiers. The research methodology is illustrated in Figure 1 and its details follow.
FIGURE 1: Overview of Research Methodology
This research went through two phases to produce the review of feature selection for solving classification problems. The methodology is based only on the previous experiments listed in Table 3. Phase 1 focused on identifying the main problem in classification. Based on the previous literature, classifiers do not work well for datasets that have many features; the use of a classifier alone is not good enough in terms of classification accuracy. To solve this problem, [7, 14, 17, 18, 19, 20, 21, 22, 23] aim to reduce the number of features before giving the data to the classifier. This is done by applying feature selection to select the most significant features and remove the unnecessary ones. Phase 2 focused on classification using feature selection. Previous researchers carried out experiments to demonstrate the efficiency of feature selection for solving classification problems: the significant features identified in the previous step are kept, and a classifier is used to classify the data using only those selected features. An interesting observation from the previous works listed in Table 3 is that [14, 19, 20, 21, 22, 23] conducted several experiments manipulating different feature selection methods. The outcome of this phase is classification performance in terms of accuracy, and the result leads to a comparison of classification performance with and without feature selection. Finally, the analysis is made based on the classification accuracy obtained with each classifier. Apart from that, the ideas of feature selection for classification are briefly discussed in the next section.
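To make this two-phase comparison concrete, the following minimal sketch contrasts a classifier trained on the full feature set with the same classifier trained on a reduced set. It uses scikit-learn on synthetic data; the dataset, the ANOVA-F filter and all parameter values are illustrative assumptions, not taken from the surveyed experiments.

```python
# Minimal with/without feature selection comparison (illustrative only).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in dataset: 50 features, only 10 of them informative.
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           n_redundant=10, random_state=0)

# Phase 1 baseline: the classifier alone, on the full feature set.
full = SVC(kernel="linear")
acc_full = cross_val_score(full, X, y, cv=10).mean()

# Phase 2: feature selection (here a simple ANOVA-F filter) before the classifier.
reduced = make_pipeline(SelectKBest(f_classif, k=10), SVC(kernel="linear"))
acc_reduced = cross_val_score(reduced, X, y, cv=10).mean()

print(f"full features:    {acc_full:.3f}")    # accuracy without feature selection
print(f"reduced features: {acc_reduced:.3f}") # accuracy with feature selection
```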
3. FEATURE SELECTION FOR CLASSIFICATION
Feature selection is of considerable importance in pattern classification, data analysis, information retrieval, machine learning and data mining applications. [4] achieved impressive classification results with a classifier alone; however, the approach works well only for data that do not have many features, and classification tasks involving many features were beyond its consideration. [4] therefore suggested that other techniques, such as feature selection, may be needed when thousands of attributes are involved, in order to choose a relevant subset before giving the data to the classifier. As shown in Figure 2, the process of classification with feature selection takes the full set of dataset features as input; the final output is a classification pattern based on the features selected in the preceding feature selection step.
FIGURE 2: Feature Selection for Classification (input: full dataset features -> feature selection methods -> selected features -> classifier techniques -> output: classification pattern)
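A plain-Python rendering of this dataflow may help fix the roles of the two stages; `select_features` and `train_classifier` are placeholders for any of the selection methods and classifiers surveyed below, and the function name is our own.

```python
def classify_with_feature_selection(X, y, select_features, train_classifier):
    """Figure 2 as code: full features -> selected features -> classifier.

    X is assumed to be a NumPy array of shape (n_samples, n_features).
    """
    selected = select_features(X, y)        # e.g. RST, PSO, GA or FRFS
    X_reduced = X[:, selected]              # keep only the selected columns
    model = train_classifier(X_reduced, y)  # learn the classification pattern
    return model, selected
```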
Many feature selection methods for classification are reviewed critically in this paper, with particular emphasis on their abilities. Feature selection is often applied to optimize classification performance. For instance, to distinguish between healthy and cancer patients based on their gene expression profiles, feature selection provides a sufficient solution by reducing the size of datasets that would otherwise be unsuitable for further processing [5, 6]. The aim of feature selection is to find useful features to represent the data and remove non-relevant ones, as well as to simplify the implementation of the pattern classifier itself, by determining which features should be made available to the classifier. Furthermore, feature selection tends to speed up the processing rate of the classifier and, by reducing the input dimensionality, leads to improved response times [7, 8, 9, 10]. Additionally, feature selection can improve the quality of classification in terms of accuracy, and these methods also work to select a small number of features from a huge feature space [11, 12]. [7] believed that feature selection methods are particularly desirable to facilitate the interpretability of the outcome. [13] summarized the potential benefits of feature selection: it facilitates data understanding, reduces measurement and storage requirements, reduces computational processing time, and reduces the dimensionality of the data to improve classification performance. In this paper, four existing feature selection methods are compared in terms of their abilities. Table 2 highlights several abilities of rough set theory (RST), particle swarm optimization (PSO), genetic algorithms (GA) and fuzzy-rough feature selection (FRFS). Discrete particle swarm optimization (DPSO), new particle swarm optimization (NPSO) and chaotic binary particle swarm optimization (CBPSO) are extensions of PSO, while sequential genetic algorithms (SGA) and hybrid genetic algorithms (HGA) are extensions of GA. Other abilities of these methods may exist, but the discussion is based only on the previous works listed in Table 3 in Section 4. Thus, this paper discusses the abilities of feature selection methods in classification problems in order to find the optimal features for better classification performance. The wrapper-style methods among them (PSO, GA and their variants) share one building block: a fitness function that scores a candidate feature subset by the accuracy a classifier achieves on it, as sketched below.
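A minimal sketch of such a subset-scoring function, using a cross-validated 1-NN classifier as in the experiments of [23]; the classifier choice, fold count and function name are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def subset_fitness(mask, X, y):
    """Score a boolean feature mask by the cross-validated accuracy
    a 1-NN classifier achieves on the masked columns of X."""
    if not mask.any():        # an empty subset cannot be evaluated
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=1)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()
```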
TABLE 2: Abilities of feature selection methods for classification

RST [7, 14]
- Only the facts hidden in the data are analyzed.
- It finds a minimal knowledge representation.
- No additional knowledge about the data, such as an expert, is required (able to discover data dependencies).

PSO [15]
- Powerful exploration ability until the optimal solution is found.
- The particle swarm has memory, and knowledge of the solution is retained by all particles.
- Requires only simple mathematical operators.
- Running time is affected less by the number of features.

GA [7, 16, 11]
- Quite effective for rapid search of large, nonlinear and poorly understood spaces.
- A population of solutions can be modified at the same time.
- Provides several optimal or close-to-optimal feature subsets as output.

FRFS [7]
- Allows greater flexibility when handling noisy and real-valued data.
- Maintains the underlying semantics of the feature set, ensuring that the resulting output is interpretable and the inference explainable.
- Produces significantly less complex rules while achieving higher classification accuracy.
- Capable of reducing dataset dimensionality, removing redundant features that would otherwise increase rule complexity and the time needed for the induction process itself.
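As an example of the wrapper methods in Table 2, below is a compact sketch of basic binary PSO over feature masks, reusing the `subset_fitness` function above. The sigmoid velocity transfer is the standard BPSO update rule; the variants in Table 2 (DPSO, NPSO, CBPSO) mainly alter the velocity or inertia update. All hyperparameter values here are illustrative assumptions.

```python
import numpy as np

def bpso_select(X, y, fitness, n_particles=20, n_iter=30,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic binary PSO: each particle is a 0/1 feature mask scored by `fitness`."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = (rng.random((n_particles, n_feat)) < 0.5).astype(float)  # 0/1 masks
    vel = np.zeros_like(pos)
    pbest_fit = np.array([fitness(p.astype(bool), X, y) for p in pos])
    pbest = pos.copy()
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        # Pull each particle toward its personal best and the global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        # Sigmoid transfer turns velocities into bit-flip probabilities.
        pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        fit = np.array([fitness(p.astype(bool), X, y) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest.astype(bool)   # mask of the selected features

# Usage with the fitness sketch above: mask = bpso_select(X, y, subset_fitness)
```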
4. APPLICATION OF FEATURE SELECTION FOR CLASSIFICATION TASKS
In order to gain insight into how feature selection works in classification problems, we present a survey of the relevant literature. Table 3 gives a brief description of previous works on feature selection for classification tasks. Classifiers are adopted to show the effectiveness of feature selection methods in classification problems. The purpose of the survey is to compare classifier performance using the full feature set against the optimal features produced by feature selection methods.
TABLE 3: Example works of feature selection for classification
(a: average accuracy with full features; b: average accuracy with reduced features)

[7] FRFS with SMO: 38 full features reduced to 11; accuracy 71.80 (a), 72.50 (b). Remark: classification accuracy is increased with feature selection methods.

[14] RST+DPSO with SVM: 25 full features reduced to 6; accuracy 93.20 (a), 94.80 (b).
[14] RST alone with SVM: 25 full features reduced to 11; accuracy 93.20 (a), 92.40 (b). Remark: the integration of RST-DPSO is capable of searching for the optimal features for protein classification, whereas the use of RST alone does not improve the average classification accuracy.

[17] PSO with SVM: 30 full features reduced to 13; accuracy not stated (a), 95.61 (b). Remark: feature selection (PSO) coupled with SVM performs very well on pattern classification problems compared to other feature selection methods.

[18] PSO with SVM: dataset 1, 98 full features reduced to 14, accuracy 4.16 (a), 84.16 (b); dataset 2, 98 full features reduced to 12, accuracy 4.16 (a), 90.00 (b). Remark: classification performance with the selected features far surpassed that of the classifier used alone.

[19] GA with back-propagation network (BPN): 114 full features; reduced count and accuracies not stated.
[19] NPSO with BPN: 114 full features; reduced count and accuracies not stated. Remark: feature selection can efficiently search for optimal features, but NPSO-BPN selects better features than GA.

[20] GA with SVM: dataset 1, 23 full features reduced to 10, accuracy not stated (a), 77.40 (b); dataset 2, 17 full features reduced to 12, accuracy not stated (a), 88.10 (b).
[20] Improved Feature Selection (IFS) with SVM: dataset 1, 23 full features reduced to 9, accuracy 75.90 (a), 80.20 (b); dataset 2, 17 full features reduced to 10, accuracy 85.80 (a), 89.60 (b).
[20] PSO with SVM: dataset 1, 23 full features reduced to 9, accuracy not stated (a), 76.80 (b); dataset 2, 17 full features reduced to 10, accuracy not stated (a), 72.30 (b). Remark: the classification accuracy of IFS is significantly superior to that of PSO+SVM and GA+SVM.

[21] Signal-to-noise ratio (SNR) ranking with SVM: 50 full features reduced to 5; accuracy not stated (a), 97.50 (b).
[21] SNR with k-nearest neighbor (k-NN): 50 full features reduced to 5; accuracy not stated (a), 95.40 (b).
[21] K-means + SNR with SVM: 50 full features reduced to 5; accuracy not stated (a), 99.30 (b).
[21] K-means + SNR with k-NN: 50 full features reduced to 5; accuracy not stated (a), 99.30 (b). Remark: K-means + SNR feature selection coupled with SVM and k-NN performed well and helps to enhance classification accuracy.

[22] Ant colony optimization (ACO) with ANN: feature counts not stated; accuracy not stated (a), 84.23 (b).
[22] GA with ANN: feature counts not stated; accuracy not stated (a), 83.47 (b). Remark: ACO-based feature selection outperformed GA-based feature selection in terms of classification accuracy.

[23] Sequential forward search (SFS) with 1-NN: 36 full features reduced to 22; accuracy not stated (a), 90.45 (b).
[23] Sequential genetic algorithm (SGA) with 1-NN: 36 full features reduced to 22; accuracy not stated (a), 91.36 (b).
[23] Hybrid genetic algorithm (HGA) with 1-NN: 36 full features reduced to 22; accuracy not stated (a), 91.37 (b).
[23] Chaotic binary particle swarm optimization (CBPSO) with 1-NN: 36 full features reduced to 21; accuracy not stated (a), 91.45 (b). Remark: the experimental results show that classification with feature selection provides a sufficient approximation to the true optimal solution.
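The SNR filter used by [21] can be stated compactly: each feature (gene) is scored by the class-separation statistic |mu1 - mu2| / (s1 + s2) and the top-ranked features are kept. Below is a sketch for the two-class case; the function name and the use of 0/1 class labels are our assumptions.

```python
import numpy as np

def snr_rank(X, y, k):
    """Return the indices of the k features with the highest signal-to-noise
    ratio |mu1 - mu2| / (s1 + s2) between the two classes (labels 0 and 1)."""
    c0, c1 = X[y == 0], X[y == 1]
    snr = np.abs(c0.mean(axis=0) - c1.mean(axis=0)) / (c0.std(axis=0) + c1.std(axis=0))
    return np.argsort(snr)[::-1][:k]   # best-scoring feature indices first
```

In Table 3, [21] passes the five top-ranked genes out of 50 to SVM and k-NN classifiers.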
It can be seen that almost all methods aim at reducing the number of features in order to improve classification accuracy. Feature selection is thus needed to provide reasonable classification accuracy: it removes the non-relevant features without compromising accuracy. In the works listed, the classification accuracy obtained with feature selection methods outperformed that obtained with the full feature set. The results are very encouraging and show that reducing the features does not hurt classification performance; on the contrary, performance is highly improved. One issue highlighted by this review is the number of selected features used for classification. Based on their experiments, [12] conclude that selecting too few features does not work well, while selecting too many features defeats the purpose of feature reduction altogether. They recommend that the number of selected features be kept within an acceptable range, especially in classification. Moreover, the experiments of [24] indicate that some classifier techniques, such as SVM, work well for data that do not have many features. However, [7] stated that feature selection is not restricted to datasets with huge numbers of features; it has also been applied to small and medium-sized datasets. [25] stated that a well-designed feature selection method will choose a small feature set that is highly predictive of the outcome. Different feature selection algorithms will produce different sets of relevant features and will provide different accuracies [25].
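The "acceptable range" recommendation of [12] can be checked empirically by sweeping the number of selected features and observing where accuracy peaks. The sketch below reuses the synthetic X, y and the linear SVM from the earlier sketch; this is an illustrative setup, not one of the surveyed experiments.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Accuracy typically drops when k is too small and plateaus (or degrades)
# when k is needlessly large.
for k in (2, 5, 10, 20, 50):
    pipe = make_pipeline(SelectKBest(f_classif, k=k), SVC(kernel="linear"))
    print(k, round(cross_val_score(pipe, X, y, cv=10).mean(), 3))
```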
5. CONCLUSION
Based on a review of the existing literature, it may be appropriate to suggest that the best feature selection method for classification tasks is PSO. PSO achieves better selection, in terms of classification accuracy, than many other feature selection methods, although other methods produced comparatively reasonable accuracy too. Our goal, again, is to show the important role of feature selection in classifying datasets accurately, so that classifiers can be of great benefit in extracting useful knowledge. Classification accuracy can be significantly improved with feature selection, and the feature selection methods surveyed here were applied with promising results.
REFERENCES
[1] D. Hand, H. Mannila, P. Smyth, "Principles of Data Mining," MIT Press, 2001.
[2] H. Hasan, N. M. Tahir, "Feature selection of breast cancer based on principal component analysis," 6th International Colloquium on Signal Processing and Its Applications, 2010.
[3] A. Ahmad, "Data transformation for decision tree ensembles," PhD thesis, University of Manchester, 2009.
[4] C. W. Hsu, C. C. Chang, C. J. Lin, "A practical guide to support vector classification," 2003.
[5] E. P. Xing, "Feature selection in microarray analysis," in A Practical Approach to Microarray Data Analysis, Kluwer Academic Publishers, 2003.
[6] M. Xiong, W. Li, J. Zhao, L. Jin, E. Boerwinkle, "Feature (gene) selection in gene expression-based tumor classification," Molecular Genetics and Metabolism, vol. 73(3), pp. 239–247, 2001.
[7] R. Jensen, "Combining rough and fuzzy sets for feature selection," PhD thesis, University of Edinburgh, 2005.
[8] W. H. Au, K. C. C. Chan, "An effective algorithm for discovering fuzzy rules in relational databases," in Proceedings of the 7th IEEE International Conference on Fuzzy Systems, pp. 1314–1319, 1998.
[9] S. Chen, S. L. Lee, C. Lee, "A new method for generating fuzzy rules from numerical data for handling classification problems," Applied Artificial Intelligence, vol. 15(7), pp. 645–664, 2001.
[10] R. Jensen, Q. Shen, "Aiding fuzzy rule induction with fuzzy-rough attribute reduction," in Proceedings of the 2002 UK Workshop on Computational Intelligence, pp. 81–88, 2002.
[11] M. Kudo, J. Sklansky, "Comparison of algorithms that select features for pattern classifiers," Pattern Recognition, vol. 33(1), pp. 25–41, 2000.
[12] X. Sun, Y. Liu, J. Li, J. Zhu, X. Liu, H. Chen, "Using cooperative game theory to optimize the feature selection problem," Neurocomputing, 2012.
[13] T. P. Ling, "Iterative Bayesian model averaging for patients' survival analysis," Bachelor's thesis, Universiti Teknologi Malaysia, 2010.
[14] S. A. Rahman, A. A. Bakar, Z. A. M. Hussein, "Filter-wrapper approach to feature selection using RST-DPSO for mining protein function," 2nd Conference on Data Mining and Optimization, 2009.
[15] X. Wang, J. Yang, X. Teng, W. Xia, R. Jensen, "Feature selection based on rough sets and particle swarm optimization," Pattern Recognition Letters, vol. 28, 2007.
[16] W. Siedlecki, J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognition Letters, vol. 10(5), pp. 335–347, 1989.
[17] C. J. Tu, L. Y. Chuang, J. Y. Chang, C. H. Yang, "Feature selection using PSO-SVM," IAENG International Journal of Computer Science, vol. 33, 2007.
[18] R. M. Sharkawy, K. Ibrahim, M. M. A. Salama, R. Bartnikas, "Particle swarm optimization feature selection for the classification of conducting particles in transformer oil," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 18, 2011.
[19] K. Geetha, K. Thanushkodi, A. K. Kumar, "New particle swarm optimization for feature selection and classification of microcalcifications in mammograms," International Conference on Signal Processing, Communication and Networking, Madras Institute of Technology, Anna University, Chennai, India, pp. 458–463, 2008.
[20] Y. Liu, G. Wang, H. Chen, H. Dong, X. Zhu, S. Wang, "An improved particle swarm optimization for feature selection," Journal of Bionic Engineering, vol. 8, 2011.
[21] D. Mishra, B. Sahu, "Feature selection for cancer classification: a signal-to-noise ratio approach," International Journal of Scientific and Engineering Research, vol. 2, 2011.
[22] A. A. Ahmed, "Feature subset selection using ant colony optimization," International Journal of Computational Intelligence, pp. 53–58, 2005.
[23] C. S. Yang, L. Y. Chuang, J. C. Li, C. H. Yang, "Chaotic maps in binary particle swarm optimization for feature selection," IEEE Conference on Soft Computing in Industrial Applications, 2008.
[24] C. W. Hsu, C. C. Chang, C. J. Lin, "A practical guide to support vector classification," 2003.
[25] A. Annest, R. E. Bumgarner, A. E. Raftery, K. Y. Yeung, "Iterative Bayesian model averaging: a method for the application of survival analysis to high-dimensional microarray data," BMC Bioinformatics, vol. 10(72), 2009.