JOURNAL OF INFORMATION SYSTEMS RESEARCH AND INNOVATION
http://seminar.utmspace.edu.my/jisri/
ISSN: 2289-1358
Review of Feature Selection for Solving Classification Problems

Norshafarina Binti Omar (1), e-mail: norshafarina.omar@gmail.com
Fatimatufaridah Binti Jusoh (2), e-mail: efaridah88@gmail.com
Mohd Shahizan Bin Othman (3), e-mail: shahizan@fsksm.utm.my
Roliana Binti Ibrahim (4), e-mail: roliana@utm.my

Author(s) Contact Details:
(1, 2, 3, 4) Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
Abstract — Classification of data across different domains has been extensively researched and is one of the basic methods for distinguishing one group from another. A classifier can infer the class of unseen instances by analyzing their structural similarity to a given dataset with known classes. The reliability of classification results is a crucial issue: the higher the accuracy of the generated results, the better the classifier. Researchers constantly seek to increase classification accuracy, either through existing techniques or through the development of new ones, and different processes are applied to improve classification performance. While most existing work on this task aims at improving the classifier itself, this paper focuses on reducing the number of features in the dataset by selecting only the relevant features before passing the dataset to the classifier. This motivates the need for methods capable of selecting the relevant features with minimal information loss; the aim is to reduce the workload of the classifier through feature selection. With the focus on classification accuracy, this paper highlights and discusses the concept, abilities and application of feature selection for various classification problems. The review shows that classification with feature selection achieves significantly better accuracy than classification without it.
Keywords – accuracy; feature selection; classification; classifier; dataset
1. INTRODUCTION
Classification is one of the most important tasks in real-world problems, with the intention of finding the underlying patterns of the data and making use of them [1]. We are often flooded by data yet lack information, and data clearly cannot tell us anything without processing [2]. The central idea of classification is to learn from a given dataset in which patterns are provided together with their classes; the output of the classifier is a model or hypothesis that captures the relationship between the attributes and the class [3]. Complex classification problems are likely to present large numbers of features, many of which will be redundant for the classification task. If the number of features is very large, the classifier will take more time to classify the dataset. Classification therefore requires careful consideration of the dataset before it is given to the classifier: it is better to include only the necessary features than to add many irrelevant ones, since the latter makes the classification process much harder. It is thus very useful to have methods capable of selecting the most relevant and informative features needed to produce accurate and reliable outcomes for classification problems.
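To make the idea of keeping only the necessary features concrete, the following sketch ranks features by a simple class-mean separation score and keeps only the top-k. The scoring rule and the tiny dataset are illustrative assumptions for this review, not a method taken from any of the surveyed works.

```python
# Minimal filter-style feature selection sketch: score each feature by how far
# apart the two class means are, then keep the k highest-scoring features.

def feature_scores(rows, labels):
    """Score feature j by |mean over class 0 - mean over class 1|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        c0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        c1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        scores.append(abs(sum(c0) / len(c0) - sum(c1) / len(c1)))
    return scores

def select_top_k(rows, labels, k):
    """Return the (sorted) indices of the k most discriminative features."""
    scores = feature_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Feature 0 separates the classes; feature 1 is pure noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.9], [1.0, 1.1]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 1))  # feature 0 is kept
```

Only the surviving columns would then be handed to the classifier, which is the whole premise of the works reviewed below.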
This paper is organized as follows. Section 1 presents the introduction. Section 2 describes the research methodology used in this paper. Section 3 presents the idea of feature selection for classification. Based on the observations made in Section 3, Section 4 presents and discusses experimental results from the literature that support the idea of feature selection for solving classification problems. Finally, Section 5 gives concluding remarks.
2. METHODOLOGY OF STUDY
This section briefly explains the methodology adopted in this research and discusses the steps taken in comparing the performance of classification with and without feature selection. The problem situation in classification and the corresponding solution are summarized in Table 1.
TABLE 1: Summary of problem situation and solution

Problem Situation: Huge amounts of data with numerous features go through classification, placing a heavy workload on the classifiers. Some of the features are irrelevant and unnecessary, which decreases the classifiers' accuracy.

Problem Solution: Feature selection is adopted to find the significant features. It reduces the workload of the classifiers, which also improves classification accuracy.
As stated in Table 1, feature selection is adopted to solve the classification problem by selecting the most significant features, which in turn improves classifier performance. The research methodology is illustrated in Figure 1 and detailed below.

FIGURE 1: Overview of Research Methodology

This research went through two phases to produce this review of feature selection for solving classification problems; the methodology is based only on the previous experiments listed in Table 3. Phase 1 focused on identifying the main problem in classification. Based on the literature, classifiers do not work well on datasets with many features, and the use of a classifier alone is not good enough in terms of classification accuracy. To solve this problem, [7, 14, 17, 18, 19, 20, 21, 22, 23] aim to reduce the number of features before giving the data to the classifier, by applying feature selection to choose the most significant features and remove the unnecessary ones. Phase 2 focused on classification using feature selection. Previous researchers carried out experiments to prove the effectiveness of feature selection for solving classification problems: the significant features obtained in the previous step are kept, and a classifier is used to classify the data restricted to those features. An interesting observation from the works listed in Table 3 is that [14, 19, 20, 21, 22, 23] conducted several experiments by varying the feature selection method. The outcome of this phase is classification performance in terms of accuracy, which then feeds the comparison of classification performance with and without feature selection. Finally, the analysis is made based on the classification accuracy measured by the classifier. The ideas of feature selection for classification are discussed in the next section.
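The with/without comparison at the heart of this methodology can be sketched on a toy example: a 1-nearest-neighbour classifier is scored by leave-one-out accuracy on a hypothetical dataset in which one feature is informative and one is loud noise. The data and classifier choice here are illustrative assumptions, not drawn from the surveyed experiments.

```python
# Leave-one-out accuracy of a 1-nearest-neighbour classifier, used to compare
# classification on the full feature set against the reduced one.

def loo_1nn_accuracy(X, y):
    """For each point, predict its class from its nearest other point."""
    correct = 0
    for i, (row, label) in enumerate(zip(X, y)):
        others = [(r, lab) for j, (r, lab) in enumerate(zip(X, y)) if j != i]
        dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
        nearest = min(others, key=lambda o: dist(o[0], row))
        correct += nearest[1] == label
    return correct / len(X)

# Feature 0 separates the classes; feature 1 is noise that dominates distances.
X_full = [[0.0, 0.0], [0.2, 5.0], [1.0, 5.1], [0.8, 0.1]]
y = [0, 0, 1, 1]
X_reduced = [[r[0]] for r in X_full]   # "feature selection" keeps feature 0 only

print(loo_1nn_accuracy(X_full, y))     # 0.0 - the noise misleads every neighbour
print(loo_1nn_accuracy(X_reduced, y))  # 1.0 - dropping the noise fixes it
```

On this contrived data the reduced feature set turns a useless classifier into a perfect one, which is the pattern (in milder form) reported by the experiments in Table 3.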
3. FEATURE SELECTION FOR CLASSIFICATION
Feature selection is of considerable importance in pattern classification, data analysis, information retrieval, machine learning and data mining applications. [4] achieved impressive classification results with a classifier alone; however, the approach works well only for data without many features, and classification tasks involving many features were beyond their consideration. [4] therefore suggested that other techniques such as feature selection may be needed when thousands of attributes are involved, in order to choose a relevant subset before giving the data to the classifier. As shown in Figure 2, classification with feature selection takes the full set of dataset features as input, and the final output is a classification pattern based on the features selected in the preceding feature selection step.
FIGURE 2: Feature Selection for Classification (input: full dataset features → feature selection methods → output: selected features; input: selected features → classifier techniques → output: classification pattern)
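The two-stage pipeline of Figure 2 can be sketched as composed functions, with the selected feature columns as the interface between the stages. The nearest-centroid classifier and the hard-coded selected subset below are illustrative stand-ins for the surveyed methods, chosen only to keep the sketch self-contained.

```python
# Figure 2 as code: full dataset features -> feature selection -> selected
# features -> classifier -> classification pattern.

def project(row, keep):
    """Keep only the selected feature columns."""
    return [row[j] for j in keep]

def nearest_centroid_fit(rows, labels):
    """Compute one centroid per class over the given (already reduced) rows."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def nearest_centroid_predict(centroids, row):
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cls: dist(centroids[cls], row))

X_full = [[0.0, 9.1], [0.2, 0.3], [1.0, 8.8], [0.8, 0.1]]  # full dataset features
y = [0, 0, 1, 1]
keep = [0]                                  # pretend a FS method chose feature 0
X_sel = [project(r, keep) for r in X_full]  # output of the feature selection stage
model = nearest_centroid_fit(X_sel, y)      # input to the classifier stage
print(nearest_centroid_predict(model, project([0.9, 9.0], keep)))  # -> 1
```

Any of the methods in Table 2 could, in principle, replace the hard-coded `keep` list; the classifier stage only ever sees the selected columns.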
Many feature selection methods for classification are reviewed in this paper, with particular emphasis on their abilities. Feature selection is often applied to optimize classification performance. For instance, when distinguishing between healthy and cancer patients based on their gene expression profiles, feature selection provides a practical solution by reducing the size of datasets that would otherwise be unsuitable for further processing [5, 6]. The aim of feature selection is to find useful features to represent the data and remove non-relevant ones, and thereby to simplify the implementation of the pattern classifier itself by determining which features should be made available to it. Furthermore, feature selection tends to speed up the classifier and improve response times by reducing the input dimensionality [7, 8, 9, 10]. Feature selection can also improve the quality of classification in terms of accuracy, and these methods work by selecting a small number of features from a huge feature space [11, 12]. [7] believed that feature selection methods are particularly desirable to facilitate the interpretability of the outcome. [13] summarized the potential benefits of feature selection: it facilitates data understanding, reduces measurement and storage requirements, reduces computational processing time, and reduces the dimensionality of the data to improve classification performance. In this paper, four existing feature selection methods are compared in terms of their abilities. Table 2 highlights several abilities of rough set theory (RST), particle swarm optimization (PSO), genetic algorithms (GA) and fuzzy-rough feature selection (FRFS). Discrete particle swarm optimization (DPSO), new particle swarm optimization (NPSO) and chaotic binary particle swarm optimization (CBPSO) are extensions of PSO; sequential genetic algorithms (SGA) and hybrid genetic algorithms (HGA) are extensions of GA. Other abilities of these feature selection methods may exist, but the discussion is based only on the previous works listed in Table 3 in Section 4. Thus, this paper discusses the abilities of feature selection methods in classification problems in order to find the optimal features for better classification performance.
TABLE 2: Abilities of feature selection methods for classification

RST [7, 14]
• Only the facts hidden in the data are analyzed.
• It finds a minimal knowledge representation.
• No additional knowledge about the data, such as expert knowledge, is required (it is able to discover data dependencies).

PSO [15]
• Powerful exploration ability until the optimal solution is found.
• The particle swarm has memory, and knowledge of the solution is retained by all particles.
• Requires only simple mathematical operators.
• Running time is affected less by the number of features.

GA [7, 16, 11]
• Quite effective for rapid search of large, nonlinear and poorly understood spaces.
• A population of solutions can be modified at the same time.
• Provides several optimal or close-to-optimal feature subsets as output.

FRFS [7]
• Allows greater flexibility when handling noisy and real-valued data.
• Maintains the underlying semantics of the feature set, ensuring the resulting output is interpretable and the inference explainable.
• Produces significantly less complex rules while achieving higher classification accuracy.
• Capable of reducing dataset dimensionality by removing redundant features that would otherwise increase rule complexity and the time for the induction process itself.
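As a rough illustration of the GA row in Table 2, the following toy genetic algorithm evolves bit-string feature masks with elitism, one-point crossover and bit-flip mutation. The fitness function (class-mean separation minus a small size penalty) and all parameters are assumptions made for this sketch, not taken from the cited works.

```python
# Toy GA feature selector: a bit-string encodes which features are kept;
# fitness rewards class separation on the chosen features and penalises
# subset size, so minimal informative subsets win.
import random

def fitness(mask, X, y):
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    sep = 0.0
    for j in chosen:
        c0 = [r[j] for r, lab in zip(X, y) if lab == 0]
        c1 = [r[j] for r, lab in zip(X, y) if lab == 1]
        sep += abs(sum(c0) / len(c0) - sum(c1) / len(c1))
    return sep - 0.01 * len(chosen)  # mild penalty for larger subsets

def ga_select(X, y, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, X, y))
    for _ in range(generations):
        nxt = [best[:]]                    # elitism: always keep the best mask
        while len(nxt) < pop_size:
            a, b = rng.sample(pop, 2)      # pick two parents
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]      # one-point crossover
            if rng.random() < 0.2:         # occasional bit-flip mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda m: fitness(m, X, y))
    return [j for j, bit in enumerate(best) if bit]

# Feature 0 is informative; features 1-3 are constant noise.
X = [[0.0, 1, 1, 1], [0.1, 1, 1, 1], [0.9, 1, 1, 1], [1.0, 1, 1, 1]]
y = [0, 0, 1, 1]
print(ga_select(X, y))
```

Because several masks can achieve near-identical fitness, a GA naturally yields multiple close-to-optimal subsets across runs, which matches the GA ability listed in Table 2.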
4. APPLICATION OF FEATURE SELECTION FOR CLASSIFICATION TASKS
In order to gain insight into how feature selection works in classification problems, we present a survey of the relevant literature. Table 3 gives a brief description of previous works on feature selection for classification tasks. Classifiers are adopted to show the effectiveness of the feature selection methods in classification problems. The purpose of the survey is to compare classifier performance using the full feature set against the optimal features produced by the feature selection methods.
TABLE 3: Examples of works on feature selection for classification

| Ref. | Feature Selection | Classifier | #Full Features | #Reduced Features | Accuracy (%) a | Accuracy (%) b | Remarks |
|------|-------------------|------------|----------------|-------------------|----------------|----------------|---------|
| [7] | FRFS | SMO | 38 | 11 | 71.80 | 72.50 | Classification accuracy is increased with feature selection methods. |
| [14] | RST + DPSO | SVM | 25 | 6 | 93.20 | 94.80 | The integration of RST-DPSO is capable of searching the optimal features for protein classification. |
| [14] | RST | SVM | 25 | 11 | 93.20 | 92.40 | The use of RST alone does not improve the average classification accuracy. |
| [17] | PSO | SVM | 30 | 13 | Not stated | 95.61 | PSO coupled with SVM presents a very good performance for pattern classification compared to other feature selection methods. |
| [18] | PSO | SVM | 98 (Dataset 1) | 14 | 4.16 | 84.16 | Classification performance with the selected features far surpasses that with the full feature set. |
| [18] | PSO | SVM | 98 (Dataset 2) | 12 | 4.16 | 90.00 | |
| [19] | GA | Back Propagation Network (BPN) | 114 | Not stated | Not stated | Not stated | Feature selection can efficiently search for optimal features; however, NPSO-BPN selects better features than GA. |
| [19] | NPSO | BPN | 114 | Not stated | Not stated | Not stated | |
| [20] | GA | SVM | 23 (Dataset 1) | 10 | Not stated | 77.40 | Classification accuracy rates for IFS perform significantly better than PSO-SVM and GA+SVM. |
| [20] | GA | SVM | 17 (Dataset 2) | 12 | Not stated | 88.10 | |
| [20] | Improved Feature Selection (IFS) | SVM | 23 (Dataset 1) | 9 | 75.90 | 80.20 | |
| [20] | Improved Feature Selection (IFS) | SVM | 17 (Dataset 2) | 10 | 85.80 | 89.60 | |
| [20] | PSO | SVM | 23 (Dataset 1) | 9 | Not stated | 76.80 | |
| [20] | PSO | SVM | 17 (Dataset 2) | 10 | Not stated | 72.30 | |
| [21] | Signal-to-noise ratio (SNR) ranking | SVM | 50 | 5 | Not stated | 97.50 | K-means + SNR feature selection coupled with SVM and k-NN performed well, which helps to enhance classification accuracy. |
| [21] | SNR | k-nearest neighbor (k-NN) | 50 | Not stated | Not stated | 95.40 | |
| [21] | K-means + SNR | SVM | 50 | Not stated | Not stated | 99.30 | |
| [21] | K-means + SNR | k-NN | 50 | Not stated | Not stated | 99.30 | |
| [22] | Ant Colony Optimization (ACO) | ANN | Not stated | Not stated | Not stated | 84.23 | ACO-based feature selection outperformed GA-based feature selection in terms of classification accuracy. |
| [22] | GA | ANN | Not stated | Not stated | Not stated | 83.47 | |
| [23] | Sequential forward search (SFS) | 1-NN | 36 | 22 | Not stated | 90.45 | Experimental results show that classification with feature selection provides a sufficient approximation to the true optimal solution. |
| [23] | Sequential genetic algorithm (SGA) | 1-NN | 36 | 22 | Not stated | 91.36 | |
| [23] | Hybrid genetic algorithm (HGA) | 1-NN | 36 | 22 | Not stated | 91.37 | |
| [23] | Chaotic binary particle swarm optimization (CBPSO) | 1-NN | 36 | 21 | Not stated | 91.45 | |

a: average classification accuracy with full features; b: average classification accuracy with reduced features
It can be seen that almost all the methods aim at reducing the number of features in order to improve classification accuracy; feature selection is thus needed to provide reasonable accuracy, removing non-relevant features without compromising classification accuracy. Based on the works listed, the classification accuracy obtained with feature selection outperformed that obtained with the full feature set. The results are very encouraging and show that reducing the features does not harm classification performance; on the contrary, performance is highly improved. An issue highlighted by this review is the number of selected features used for classification. Based on their experiments, [12] concluded that selecting too few features does not work well, while selecting too many defeats the purpose of feature reduction; they recommended that the number of selected features should lie within an acceptable range, especially for classification. Moreover, the experiments of [24] indicate that some classifier techniques such as SVM work well for data that do not have many features. However, [7] stated that feature selection is not restricted to datasets with huge numbers of features; it has also been applied to small and medium-sized datasets. [25] stated that a well-designed feature selection method will choose a small feature set that is highly predictive of the outcome, and that different feature selection algorithms will produce different sets of relevant features and different accuracies.
5. CONCLUSION
Based on a review of the existing literature, it may be appropriate to suggest that the best feature selection method for classification tasks is PSO, which achieves better classification accuracy than many other feature selection methods; however, the other methods also produced comparatively reasonable accuracy. Our goal, again, is to highlight the important role of feature selection in accurately classifying a dataset, so that the classifier can provide real benefit by extracting useful knowledge. Classification accuracy can be significantly improved with feature selection, and the surveyed feature selection methods were applied with promising results.
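Since the conclusion singles out PSO, a minimal binary-PSO feature selector can be sketched as follows. The sigmoid transfer rule is the standard binary-PSO bit update, while the fitness function, the toy data and all constants are illustrative assumptions rather than any specific variant (DPSO, NPSO, CBPSO) surveyed above.

```python
# Toy binary PSO: each particle carries a velocity per feature, and a sigmoid
# of the velocity gives the probability that the feature bit is switched on.
import math
import random

def fitness(mask, X, y):
    """Class-mean separation over the chosen features, minus a size penalty."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    sep = 0.0
    for j in chosen:
        c0 = [r[j] for r, lab in zip(X, y) if lab == 0]
        c1 = [r[j] for r, lab in zip(X, y) if lab == 1]
        sep += abs(sum(c0) / len(c0) - sum(c1) / len(c1))
    return sep - 0.01 * len(chosen)

def bpso_select(X, y, n_particles=6, iters=30, seed=1):
    rng = random.Random(seed)
    n = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=lambda m: fitness(m, X, y))[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] += 2 * r1 * (pbest[i][d] - pos[i][d]) \
                           + 2 * r2 * (gbest[d] - pos[i][d])
                prob = 1 / (1 + math.exp(-vel[i][d]))   # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            if fitness(pos[i], X, y) > fitness(pbest[i], X, y):
                pbest[i] = pos[i][:]
        gbest = max(pbest, key=lambda m: fitness(m, X, y))[:]
    return [j for j, bit in enumerate(gbest) if bit]

X = [[0.0, 1, 1], [0.1, 1, 1], [0.9, 1, 1], [1.0, 1, 1]]  # only feature 0 informative
y = [0, 0, 1, 1]
print(bpso_select(X, y))
```

The swarm's shared global best is what gives PSO the "memory retained by all particles" ability noted in Table 2.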
REFERENCES
[1] D.J. Hand, H. Mannila, P. Smyth, "Principles of Data Mining," MIT Press, 2001.
[2] H. Hasan, N.M. Tahir, "Feature selection of breast cancer based on principal component analysis," 6th International Colloquium on Signal Processing and Its Applications, 2010.
[3] A. Ahmad, "Data transformation for decision tree ensembles," PhD Thesis, University of Manchester, 2009.
[4] C.W. Hsu, C.C. Chang, C.J. Lin, "A practical guide to support vector classification."
[5] E.P. Xing, "Feature selection in microarray analysis: a practical approach to microarray data analysis," Kluwer Academic Publishers, 2003.
[6] M. Xiong, W. Li, J. Zhao, L. Jin, E. Boerwinkle, "Feature (gene) selection in gene expression-based tumor classification," Molecular Genetics and Metabolism, vol. 73(3), pp. 239–247, 2001.
[7] R. Jensen, "Combining rough and fuzzy sets for feature selection," PhD Thesis, University of Edinburgh, 2005.
[8] W.H. Au, K.C.C. Chan, "An effective algorithm for discovering fuzzy rules in relational databases," in Proceedings of the 7th IEEE International Conference on Fuzzy Systems, pp. 1314–1319, 1998.
[9] S. Chen, S.L. Lee, C. Lee, "A new method for generating fuzzy rules from numerical data for handling classification problems," Applied Artificial Intelligence, vol. 15(7), pp. 645–664, 2001.
[10] R. Jensen, Q. Shen, "Aiding fuzzy rule induction with fuzzy-rough attribute reduction," in Proceedings of the 2002 UK Workshop on Computational Intelligence, pp. 81–88, 2002.
[11] M. Kudo, J. Sklansky, "Comparison of algorithms that select features for pattern classifiers," Pattern Recognition, vol. 33(1), pp. 25–41, 2000.
[12] X. Sun, Y. Liu, J. Li, J. Zhu, X. Liu, H. Chen, "Using cooperative game theory to optimize the feature selection problem," Neurocomputing, 2012.
[13] T.P. Ling, "Iterative Bayesian model averaging for patients' survival analysis," Bachelor Degree Thesis, Universiti Teknologi Malaysia, 2010.
[14] S.A. Rahman, A.A. Bakar, Z.A.M. Hussein, "Filter-wrapper approach to feature selection using RST-DPSO for mining protein function," 2nd Conference on Data Mining and Optimization, 2009.
[15] X. Wang, J. Yang, X. Teng, W. Xia, R. Jensen, "Feature selection based on rough sets and particle swarm optimization," Pattern Recognition Letters, vol. 28, 2007.
[16] W. Siedlecki, J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognition Letters, vol. 10(5), pp. 335–347, 1989.
[17] C.J. Tu, L.Y. Chuang, J.Y. Chang, C.H. Yang, "Feature selection using PSO-SVM," IAENG International Journal of Computer Science, vol. 33, 2007.
[18] R.M. Sharkawy, K. Ibrahim, M.M.A. Salama, R. Bartnikas, "Particle swarm optimization feature selection for the classification of conducting particles in transformer oil," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 18, 2011.
[19] K. Geetha, K. Thanushkodi, A.K. Kumar, "New particle swarm optimization for feature selection and classification of microcalcifications in mammograms," International Conference on Signal Processing, Communication and Networking, Madras Institute of Technology, Anna University, Chennai, India, pp. 458–463, 2008.
[20] Y. Liu, G. Wang, H. Chen, H. Dong, X. Zhu, S. Wang, "An improved particle swarm optimization for feature selection," Journal of Bionic Engineering, vol. 8, 2011.
[21] D. Mishra, B. Sahu, "Feature selection for cancer classification: a signal-to-noise ratio approach," International Journal of Scientific and Engineering Research, vol. 2, 2011.
[22] A.A. Ahmed, "Feature subset selection using ant colony optimization," International Journal of Computational Intelligence, pp. 53–58, 2005.
[23] C.S. Yang, L.Y. Chuang, J.C. Li, C.H. Yang, "Chaotic maps in binary particle swarm optimization for feature selection," IEEE Conference on Soft Computing in Industrial Applications, 2008.
[24] C.W. Hsu, C.C. Chang, C.J. Lin, "A practical guide to support vector classification," 2003.
[25] A. Annest, R.E. Bumgarner, A.E. Raftery, K.Y. Yeung, "Iterative Bayesian model averaging: a method for the application of survival analysis to high-dimensional microarray data," BMC Bioinformatics, vol. 10(72), 2009.