Test-Cost-Sensitive Attribute Reduction Based on Neighborhood Rough Set
Hong Zhao, Fan Min, and William Zhu
Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China.
Email: hongzhao2012@163.com, minfanphd@163.com, williamfengzhu@gmail.com
Corresponding author. Tel.: +86 133 7690 8359
Abstract—Recent research in machine learning and data mining has produced a wide variety of algorithms for cost-sensitive learning. Most existing rough set methods on this issue deal with nominal attributes, because nominal attributes produce equivalence relations and are therefore easy to process. However, in real applications, datasets often contain numerical attributes. Numerical attributes are more complex than nominal ones and require more computational resources; consequently, the corresponding learning tasks are more challenging. This paper deals with test-cost-sensitive attribute reduction for numerical valued decision systems. Since the neighborhood rough set model has achieved success in numerical data processing, we adopt it to define the minimal test cost reduct problem. Due to the complexity of the new problem, heuristic algorithms are needed to find a sub-optimal solution. We propose a heuristic information function that is the sum of the positive region and a weighted test cost. When the test cost is not considered, this function degrades to the positive region, which is the most commonly used heuristic in classical rough set theory. Three metrics are adopted to evaluate the performance of reduction algorithms from a statistical viewpoint. Experimental results show that the proposed method takes advantage of test costs and therefore produces satisfactory results.
Keywords-Cost-sensitive learning, neighborhood, rough set,
reduction, heuristic algorithm.
I. INTRODUCTION
In practical data mining applications, it is well known that redundant data make the mining task rather difficult. Attribute reduction is a successful technique for removing such data and facilitating the mining task. This issue has attracted much attention in recent years [1], [2], [3], [4]. Different definitions of reducts and their respective optimality metrics are applicable to different fields. When the test cost is not considered, attribute reduction algorithms have been proposed to deal with nominal data [5], [6], [7]. On the other hand, attribute reduction algorithms based on the neighborhood rough set model have been proposed to deal with numerical valued decision systems [8], [9], [10].
Recently, the test-cost-sensitive attribute reduction problem was proposed in [11], [12], [13]. This problem has wide application since the collection of data is not free, and there is a test cost for each data item [14], [15]. The test-cost-sensitive attribute reduction algorithm framework in [16] is devoted to this problem. The algorithm of [11] employs a user-specified factor λ to adjust the heuristic information function based on the test cost of each attribute.
The performance of the algorithm is satisfactory. However, the data are limited to nominal attributes. Since numerical data are widespread in real-world applications, there is a clear need to handle them.
In this paper we define the test-cost-sensitive attribute reduction problem on numerical data. Because the neighborhood rough set model is successful in dealing with numerical data, we adopt this theory for our problem definition. In order to facilitate neighborhood threshold settings, data items are normalized first. Test costs are also normalized to facilitate the definition of the heuristic function.
In most existing works, the positive region alone is used as the heuristic information in the neighborhood model. We have a more complex data model due to test costs. Specifically, we use the sum of the positive region and a weighted test cost as the heuristic information, with which a new heuristic algorithm is designed. The value of the positive region is a number in the range [0, 1]. Hence, the weighted test cost after normalization can adjust the positive-region-based heuristic information within a small range. In order to ensure that the positive region remains dominant, the adjustment uses addition instead of the commonly used multiplication.
The Iris dataset with various test-cost settings is employed to study the performance of our algorithm. Since the dataset contains no test cost information, we use three distribution functions to generate test costs; the three functions correspond to different applications. Moreover, we adopt three metrics to evaluate the performance of the reduction algorithms from a statistical viewpoint. Experimental results show that our algorithm can generate a minimal test cost reduct in most cases, because the proposed method takes advantage of test costs. Experiments are undertaken using an open-source software package called COSER (cost-sensitive rough sets) [17].
The rest of the paper is organized as follows: Section II presents the preliminary concepts needed in the other parts. Section III presents the attribute reduction algorithm based on the neighborhood rough set. Experimental analysis is given in Section IV. Conclusions come in Section V.
II. PRELIMINARIES
This section introduces the preliminary knowledge of the paper. First, the neighborhood rough set decision system is reviewed. Then, the test-cost-sensitive decision system is discussed.
A. Neighborhood rough set decision systems
Formally, a decision system can be written as a 5-tuple $S = \langle U, C, D, \{V_a\}, \{I_a\}\rangle$, where $U$ is the nonempty set called a universe, and $C$ and $D$ are the nonempty sets of variables called conditional attributes and decision attributes, respectively. $V_a$ is the set of values for each $a \in C \cup D$, and $I_a: U \to V_a$ is an information function for each $a \in C \cup D$. We often denote $\{V_a \mid a \in C \cup D\}$ and $\{I_a \mid a \in C \cup D\}$ by $V$ and $I$, respectively.
Definition 1: [8] Given arbitrary $x_i \in U$ and $B \subseteq C$, the neighborhood $\delta_B(x_i)$ of $x_i$ is defined as
$$\delta_B(x_i) = \{x_j \mid x_j \in U, \Delta_B(x_i, x_j) \le \delta\}, \qquad (1)$$
where $\Delta$ is a distance function. For any $x_1, x_2, x_3 \in U$, it satisfies:
(1) $\Delta(x_1, x_2) \ge 0$;
(2) $\Delta(x_1, x_2) = 0$ if and only if $x_1 = x_2$;
(3) $\Delta(x_1, x_2) = \Delta(x_2, x_1)$;
(4) $\Delta(x_1, x_3) \le \Delta(x_1, x_2) + \Delta(x_2, x_3)$.
A detailed survey on distance functions can be found in [18].
If the attributes generate a neighborhood relation over the universe, the decision system is called a neighborhood decision system, denoted by $NDS = \langle U, C, D, V, I\rangle$.
Definition 2: [10] Given a neighborhood decision system $NDS$, let $X_1, X_2, \ldots, X_N$ be the object subsets with decisions $1$ to $N$, and let $\delta_B(x_i)$ be the neighborhood information granule containing $x_i$ and generated by the attributes $B \subseteq C$. Then the positive region (POS) of the decision is defined as
$$POS_B(D) = \{x_i \mid \delta_B(x_i) \subseteq X, x_i \in U\}. \qquad (2)$$
The size of the neighborhood depends on the threshold $\delta$. When $\delta = 0$, the samples in the same neighborhood granule are equivalent to each other. In this case, neighborhood rough sets are a natural generalization of Pawlak rough sets.
B. Attribute significance and reduction with neighborhood model
The dependency degree of $D$ to $B$ is defined as the ratio of consistent objects:
$$\gamma_B(D) = \frac{|POS_B(D)|}{|U|}. \qquad (3)$$
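To make Definitions 1 and 2 and Equation (3) concrete, the following Python sketch computes δ-neighborhoods, the neighborhood positive region, and the dependency degree. The Euclidean distance, the NumPy representation, and the toy data are illustrative assumptions, not choices stated in the paper.

```python
import numpy as np

def neighborhood(X, i, B, delta):
    """delta_B(x_i): indices of samples within distance delta of x_i
    on the attribute subset B (Euclidean distance assumed)."""
    dist = np.sqrt(((X[:, B] - X[i, B]) ** 2).sum(axis=1))
    return np.where(dist <= delta)[0]

def positive_region(X, y, B, delta):
    """POS_B(D): samples whose neighborhood is consistent, i.e. every
    neighbor carries the same decision as the sample (Equation (2))."""
    pos = []
    for i in range(len(X)):
        nbr = neighborhood(X, i, B, delta)
        if np.all(y[nbr] == y[i]):
            pos.append(i)
    return pos

def dependency(X, y, B, delta):
    """gamma_B(D) = |POS_B(D)| / |U|  (Equation (3))."""
    return len(positive_region(X, y, B, delta)) / len(X)

if __name__ == "__main__":
    # Toy normalized data: 4 objects, 2 conditional attributes, binary decision.
    X = np.array([[0.10, 0.20], [0.15, 0.22], [0.80, 0.90], [0.82, 0.88]])
    y = np.array([0, 0, 1, 1])
    print(dependency(X, y, B=[0, 1], delta=0.1))  # 1.0: all neighborhoods consistent
```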
A number of definitions of relative reducts exist [19], [20],
[21] for different rough set models. This paper employs the
definition based on the positive region.
Definition 3: [8] Given a neighborhood decision system $NDS = \langle U, C, D, V, I\rangle$ and $B \subseteq C$, the attribute subset $B$ is a relative reduct if
(1) $\gamma_B(D) = \gamma_C(D)$;
(2) $\forall a \in B$, $\gamma_B(D) > \gamma_{B - \{a\}}(D)$.
C. Test-cost-sensitive decision systems
Since we have assumed that the tests are undertaken in parallel, we consider the most widely used model, as follows:
Definition 4: [16] A test-cost-independent decision system (TCI-DS) $S$ is the 6-tuple
$$S = (U, C, D, \{V_a\}, \{I_a\}, c), \qquad (4)$$
where $a \in C \cup D$, $U$ is the nonempty set called a universe, $C$ and $D$ are the nonempty sets of variables called conditional attributes and decision attributes, respectively, $V_a$ is the set of values for each $a \in C \cup D$, $I_a: U \to V_a$ is an information function for each $a \in C \cup D$, and $c: C \to \mathbf{R}^+ \cup \{0\}$ is the test cost function. Test costs are independent of one another, that is, $c(B) = \sum_{a \in B} c(a)$ for any $B \subseteq C$.
III. TEST-COST-SENSITIVE ATTRIBUTE REDUCTION BASED ON NEIGHBORHOOD DECISION SYSTEM
In this section, we first discuss attribute value and test cost normalization. Then, the problem of test-cost-sensitive attribute reduction based on a neighborhood decision system is proposed.
A. Attribute value normalization
To design test-cost-sensitive attribute reduction based on a neighborhood decision system, we need to set the threshold δ, which determines the size of the neighborhood. Setting the threshold is the most important problem in neighborhood-based classification. In order to facilitate neighborhood threshold settings, the values of the attributes are normalized first.
Example 1: Table I presents a decision system for Iris, whose conditional attributes are numerical. Here $U = \{x_1, x_2, x_3, x_4, \ldots, x_{149}, x_{150}\}$, $C = \{$Sepal-length, Sepal-width, Petal-length, Petal-width$\}$, and $D = \{$Class$\}$. In order to compute distances over the conditional attributes, we first normalize every attribute to the range $[0, 1]$ (a normalization sketch is given after Table I).
Table I
AN EXAMPLE NUMERICAL VALUED ATTRIBUTE DECISION TABLE (IRIS)

Object | Sepal-length | Sepal-width | Petal-length | Petal-width | Class
x_1    | 0.23529      | 0.77273     | 0.14286      | 0.04762     | 0
x_2    | 0.29412      | 0.72727     | 0.11905      | 0.04762     | 0
x_3    | 0.35294      | 0.09091     | 0.38095      | 0.42857     | 0.5
x_4    | 0.64706      | 0.31818     | 0.52381      | 0.52381     | 0.5
x_5    | 0.41176      | 0.31818     | 0.50000      | 0.42857     | 0.5
...    | ...          | ...         | ...          | ...         | ...
x_149  | 0.58824      | 0.54545     | 0.85714      | 1.00000     | 1
x_150  | 0.44118      | 0.27273     | 0.64286      | 0.71429     | 1
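The normalization used in Example 1 is a per-attribute min-max rescaling. A minimal sketch follows; the raw values in the demo are illustrative placeholders, not rows taken from the dataset.

```python
import numpy as np

def normalize_columns(X):
    """Rescale every conditional attribute to [0, 1] (min-max normalization),
    as done for the Iris attributes in Table I."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_rng = X.max(axis=0) - col_min
    col_rng[col_rng == 0] = 1.0          # guard against constant attributes
    return (X - col_min) / col_rng

if __name__ == "__main__":
    raw = [[5.1, 3.5, 1.4, 0.2],          # illustrative raw Iris-like rows
           [4.9, 3.0, 1.4, 0.2],
           [6.3, 3.3, 6.0, 2.5]]
    print(normalize_columns(raw))
```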
B. Test cost normalization
For statistical purposes, three different schemes to produce random test costs are adopted: the uniform distribution, the normal distribution, and the Pareto distribution. For simplicity, test costs are integers ranging from $M$ to $N$, and are evaluated independently [16].
In order to facilitate the definition of the heuristic function, we normalize the test cost values. Let $B \subseteq C$ and $a_i \in B$. Then
$$c^*_i = (c_i - mincost)/(maxcost - mincost) \qquad (5)$$
is the normalized cost of attribute $a_i$, where $c_i$ is the test cost of attribute $a_i$, and $mincost$ and $maxcost$ are the minimum and maximum costs over all conditional attributes, respectively.
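The cost generation and Equation (5) can be sketched as follows. The paper only states that costs are integers in [M, N] drawn from the three distributions, so the concrete distribution parameters below are assumptions made for illustration.

```python
import numpy as np

def random_test_costs(n_attrs, M=1, N=100, scheme="uniform", seed=None):
    """Generate integer test costs in [M, N] under one of the three schemes
    (distribution parameters are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    if scheme == "uniform":
        c = rng.integers(M, N + 1, size=n_attrs)
    elif scheme == "normal":
        c = rng.normal(loc=(M + N) / 2, scale=(N - M) / 6, size=n_attrs)
    elif scheme == "pareto":
        c = M + rng.pareto(a=2.0, size=n_attrs) * (N - M) / 10
    else:
        raise ValueError(scheme)
    return np.clip(np.rint(c), M, N).astype(int)

def normalize_costs(c):
    """Equation (5): c*_i = (c_i - mincost) / (maxcost - mincost)."""
    c = np.asarray(c, dtype=float)
    spread = c.max() - c.min()
    return np.zeros_like(c) if spread == 0 else (c - c.min()) / spread

if __name__ == "__main__":
    costs = random_test_costs(4, scheme="pareto", seed=0)
    print(costs, normalize_costs(costs))
```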
C. Test-cost-sensitive attribute reduction based on the neighborhood decision system
Most heuristic algorithms for attribute reduction share the same structure; their differences lie in the heuristic function [2]. We now define the heuristic function based on the positive region and a weighted test cost.
Definition 5: Let $S = (U, C, D, \{V_a\}, \{I_a\}, c)$ be a test-cost-sensitive neighborhood decision system, where $c^*$ is the normalized version of $c$. Let $B \subseteq C$, $a_i \in (C - B)$, and let $c^*_i$ be defined as in Equation (5).
We propose our positive region and weighted test cost function as follows:
$$SIG_{tc}(a_i, B, D, c^*_i) = SIG(a_i, B, D) + (1 - c^*_i)\rho, \qquad (6)$$
where $\rho$ is the regulatory factor of the test cost; if $\rho = 0$, $SIG_{tc}$ and $SIG$ are equivalent, where
$$SIG(a_i, B, D) = \gamma_{B \cup \{a_i\}}(D). \qquad (7)$$
$SIG_{tc}$ is the heuristic function with test costs, and $SIG$ is the one that does not take test costs into account.
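Equations (6) and (7) translate directly into code. The sketch below assumes a Euclidean distance and min-max normalized attributes; the function names (gamma, sig, sig_tc) are our own labels, not names used by the paper or by COSER.

```python
import numpy as np

def gamma(X, y, B, delta):
    """Dependency degree gamma_B(D) under the neighborhood model (Equation (3))."""
    if not B:
        return 0.0
    pos = 0
    for i in range(len(X)):
        d = np.sqrt(((X[:, B] - X[i, B]) ** 2).sum(axis=1))  # Euclidean, assumed
        pos += np.all(y[d <= delta] == y[i])                  # consistent neighborhood
    return pos / len(X)

def sig(X, y, a, B, delta):
    """SIG(a_i, B, D) = gamma_{B union {a_i}}(D)  (Equation (7))."""
    return gamma(X, y, B + [a], delta)

def sig_tc(X, y, a, B, delta, c_star, rho=0.01):
    """SIG_tc(a_i, B, D, c*_i) = SIG(a_i, B, D) + (1 - c*_i) * rho  (Equation (6))."""
    return sig(X, y, a, B, delta) + (1.0 - c_star[a]) * rho

if __name__ == "__main__":
    X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
    y = np.array([0, 0, 1, 1])
    print(sig_tc(X, y, a=0, B=[], delta=0.15, c_star=np.array([0.0, 1.0])))
```

Because the normalized cost lies in [0, 1] and ρ is small, the additive term only perturbs the positive-region score, which is the design intent described above.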
We now propose a heuristic algorithm based on the positive region and a weighted test cost to find a reduct with minimal test cost. A framework of our heuristic method is shown in Algorithm 1. In Algorithm 1, if $sig_m = 1$, the algorithm finds the best solution. The proposed algorithm has stable performance from a statistical perspective.
According to Equation (1), if we let δ = 0, the neighborhood rough set degenerates to the classical one. In this case, the proposed test-cost reduct method can be used to deal with both nominal attributes and numerical ones without discretization.
IV. EXPERIMENTS
The complexity of classification depends not only on the given feature space, but also on the granularity level [8]. In this paper, the user can set the granularity level through the parameter δ. In one set of experiments, we let δ = 0.005, 0.008, 0.011, ..., 0.029.
Algorithm 1 Test-cost-sensitive attribute reduction based on the neighborhood decision system
Input: $(U, C, D, \{V_a\}, \{I_a\}, c)$ and $\delta$, where $\delta$ is the threshold that controls the size of the neighborhood
Output: A reduct with minimal test cost, $red$
Method:
1: $red = \emptyset$;
2: $sig_m = 1$, $sig_t = 0$;
3: while ($sig_m \neq 1$ || $sig_m \neq sig_t$) do
4:   $sig_t = sig_m$;
5:   for each $a_i \in (C - red)$ do
6:     Compute $SIG_{tc}(a_i, red, D, c^*_i)$;
7:   end for
8:   Select $a_m$ and $c^*_m$ with the maximal $SIG_{tc}(a_m, red, D, c^*_m)$;
9:   Compute $sig_m = SIG(a_m, red, D)$;
10:  $red = red \cup \{a_m\}$;
11: end while
12: if $sig_m = 1$ then
13:   return $red$;
14: end if
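The following is an illustrative Python reading of Algorithm 1, under the same assumptions as the earlier sketches (Euclidean distance, min-max normalized attributes and costs). The loop is read as stopping when full dependency is reached or when adding an attribute no longer improves the positive region; this is an interpretation of the pseudocode, not the COSER implementation.

```python
import numpy as np

def gamma(X, y, B, delta):
    """Neighborhood dependency degree gamma_B(D) (Equation (3))."""
    if not B:
        return 0.0
    pos = 0
    for i in range(len(X)):
        d = np.sqrt(((X[:, B] - X[i, B]) ** 2).sum(axis=1))
        pos += np.all(y[d <= delta] == y[i])
    return pos / len(X)

def tcs_reduction(X, y, c_star, delta, rho=0.01):
    """Greedily add the attribute maximizing
    SIG_tc = gamma_{red union {a}}(D) + (1 - c*_a) * rho  (Algorithm 1)."""
    red, sig_m, sig_t = [], 0.0, -1.0
    while sig_m != 1 and sig_m != sig_t:    # full dependency reached, or no gain
        sig_t = sig_m
        candidates = [a for a in range(X.shape[1]) if a not in red]
        if not candidates:                  # every attribute already selected
            break
        best = max(candidates,
                   key=lambda a: gamma(X, y, red + [a], delta)
                                 + (1 - c_star[a]) * rho)
        sig_m = gamma(X, y, red + [best], delta)
        red.append(best)
    # Algorithm 1 only returns red when sig_m = 1.
    return red if sig_m == 1 else None

if __name__ == "__main__":
    X = np.array([[0.1, 0.2, 0.9], [0.2, 0.1, 0.8], [0.9, 0.8, 0.1], [0.8, 0.9, 0.2]])
    y = np.array([0, 0, 1, 1])
    print(tcs_reduction(X, y, c_star=np.array([0.0, 1.0, 0.5]), delta=0.15))
```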
In Equation (6), ρ is the regulatory factor of the test cost. We let ρ = 0.01, which keeps the influence of the test cost small but nonzero; in other words, it lets SIG play the major role in the function.
For each distribution, we generate 100 sets of test costs, and for each test cost setting there are 9 δ settings.
A. Evaluation metrics
In order to remove the influence of subjective and objective factors, three evaluation metrics are adopted to compare the performances. These are the finding optimal factor (FOF), the maximal exceeding factor (MEF), and the average exceeding factor (AEF). Detailed definitions can be found in [16]. When the algorithm runs with different test cost settings, we evaluate the obtained reducts with these metrics.
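The paper defers the exact definitions of FOF, MEF and AEF to [16]. A common reading, assumed here for illustration only, is: FOF is the fraction of test cost settings for which the reduct found has minimal test cost; the exceeding factor of a reduct R is (c(R) - c(R*)) / c(R*), where R* is a minimal test cost reduct; MEF and AEF are its maximum and average over all settings.

```python
def exceeding_factor(cost, optimal_cost):
    """Relative amount by which a reduct's test cost exceeds the optimum
    (assumed reading of the exceeding factor in [16])."""
    return (cost - optimal_cost) / optimal_cost

def evaluate(costs, optimal_costs):
    """costs[k]: test cost of the reduct found for the k-th cost setting;
    optimal_costs[k]: minimal possible test cost for that setting."""
    efs = [exceeding_factor(c, o) for c, o in zip(costs, optimal_costs)]
    fof = sum(ef == 0 for ef in efs) / len(efs)   # finding optimal factor
    mef = max(efs)                                 # maximal exceeding factor
    aef = sum(efs) / len(efs)                      # average exceeding factor
    return fof, mef, aef

if __name__ == "__main__":
    print(evaluate(costs=[10, 12, 15], optimal_costs=[10, 10, 15]))
```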
B. Statistical results
The performance of the algorithm differs across test cost distributions. Figure 1 shows the results for the finding optimal factor. This metric is both qualitative and quantitative: first, it only counts optimal solutions; second, it is computed statistically. Figure 2 shows the results for the maximal exceeding factor. The maximal exceeding factor reflects the worst case of the algorithm, and it should be viewed as a statistical metric. Figure 3 shows the average exceeding factor, which displays the overall performance of the algorithm from a statistical perspective. On the whole, the algorithm performs best with the Normal test cost distribution.
Figure 1. Finding optimal factor (FOF). [x-axis: delta (0.005 to 0.029); curves: Uniform, Normal, Pareto]
Figure 2. Maximal exceeding factor (MEF). [x-axis: delta (0.005 to 0.029); curves: Uniform, Normal, Pareto]
C. Performance comparison
If we let ρ = 0, Equation (6) degrades to Equation (7). Here we consider only ρ ≠ 0. Figures 4, 5 and 6 compare the heuristic information function with test cost against the one without. Experimental results on the Iris dataset with various test-cost settings show a performance improvement of the information function SIG_tc over SIG.
V. CONCLUSION
This study has proposed a new test-cost-sensitive attribute reduction problem. We formally defined the minimal test cost reduct problem for numerical valued decision systems. The new problem has practical areas of application because real-world datasets often contain numerical attributes. The proposed solution to this problem is based on the neighborhood rough set model.
Figure 3. Average exceeding factor (AEF). [x-axis: delta (0.005 to 0.029); curves: Uniform, Normal, Pareto]
Figure 4. Finding optimal factor with the Uniform distribution. [x-axis: delta; curves: Without Cost, With Cost]
We also design
a heuristic information function based on the positive region and the weighted test costs to obtain effective results. With this function, a new heuristic algorithm is designed. Experimental results show that the proposed method is able to find a low-cost test set.
ACKNOWLEDGMENTS
This work is supported in part by the Fujian Province Foundation of Higher Education under Grant No. JK2010036, the Fujian Province Foundation of Serving the Construction of the Economic Zone on the West Side of the Straits, the National Science Foundation of China under Grant Nos. 60873077 and 61170128, the Natural Science Foundation of Fujian Province, China under Grant No. 2011J01374, and the Education Department of Fujian Province under Grant No. JA11176.
Figure 5. Finding optimal factor with the Normal distribution. [x-axis: delta; curves: Without Cost, With Cost]
Figure 6. Finding optimal factor with the Pareto distribution. [x-axis: delta; curves: Without Cost, With Cost]
REFERENCES
[1] M. Dash and H. Liu, “Consistency-based search in feature
selection,” Artificial Intelligence, vol. 151, pp. 155–176, 2003.
[2] Y. Yao, Y. Zhao, and J. Wang, “On reduct construction
algorithms,” in Rough Set and Knowledge Technology, 2006,
pp. 297–304.
[3] W. Zhu and F. Wang, “Reduction and axiomization of cover-
ing generalized rough sets,” Information Sciences, vol. 152,
no. 1, pp. 217–230, 2003.
[4] Y. Yao and Y. Zhao, “Attribute reduction in decision-theoretic
rough set models,” Information Sciences, vol. 178, no. 17, pp.
3356–3373, 2008.
[5] H. Li, W. Zhang, and H. Wang, “Classification and reduction
of attributes in concept lattices,” in Granular Computing,
2006, pp. 142–147.
[6] Q. Liu, F. Li, F. Min, M. Ye, and G. Yang, “An efficient reduction algorithm based on new conditional information entropy,” Control and Decision (in Chinese), vol. 20, no. 8, pp. 878–882, 2005.
[7] G. Wang, H. Yu, and D. Yang, “Decision table reduction based on conditional information entropy,” Chinese Journal of Computers, vol. 25, no. 7, pp. 759–766, 2002.
[8] Q. Hu, D. Yu, J. Liu, and C. Wu, “Neighborhood rough set
based heterogeneous feature subset selection,” Information
Sciences, vol. 178, no. 18, pp. 3577–3594, 2008.
[9] Q. Hu, D. Yu, and Z. Xie, “Numerical attribute reduction
based on neighborhood granulation and rough approximation
(in chinese),” Journal of Software, vol. 19, no. 3, pp. 640–649,
March 2008.
[10] Q. Hu, D. Yu, and Z. Xie, “Neighborhood classifiers,” Expert Systems with Applications, vol. 34, pp. 866–876, 2008.
[11] F. Min, H. He, Y. Qian, and W. Zhu, “Test-cost-sensitive attribute reduction,” Information Sciences, vol. 181, pp. 4928–4942, November 2011.
[12] H. He, F. Min, and W. Zhu, “Attribute reduction in test-cost-sensitive decision systems with common-test-costs,” in ICMLC, vol. 1, 2011, pp. 432–436.
[13] H. He and F. Min, “Accumulated cost based test-cost-sensitive attribute reduction,” in RSFDGrC, ser. LNAI, vol. 6743, 2011, pp. 244–247.
[14] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery in databases,” AI Magazine, vol. 17, pp. 37–54, 1996.
[15] P. D. Turney, “Cost-sensitive classification: Empirical evalu-
ation of a hybrid genetic decision tree induction algorithm,”
Journal of Artificial Intelligence Research, vol. 2, pp. 369–
409, 1995.
[16] F. Min and Q. Liu, “A hierarchical model for test-cost-sensitive decision systems,” Information Sciences, vol. 179, no. 14, pp. 2442–2452, 2009.
[17] F. Min, W. Zhu, and H. Zhao, “COSER: Cost-sensitive rough sets,” http://grc.fjzs.edu.cn/~fmin/coser/index.html, 2011.
[18] D. R. Wilson and T. R. Martinez, “Improved heterogeneous distance functions,” Journal of Artificial Intelligence Research, vol. 6, pp. 1–34, 1997.
[19] Z. Pawlak, “Rough sets,” International Journal of Computer and Information Sciences, vol. 11, pp. 341–356, 1982.
[20] D. Slezak, “Approximate entropy reducts,” Fundamenta Informaticae, vol. 53, no. 3-4, pp. 365–390, 2002.
[21] Y. Qian, J. Liang, W. Pedrycz, and C. Dang, “Positive approximation: An accelerator for attribute reduction in rough set theory,” Artificial Intelligence, vol. 174, no. 9-10, pp. 597–618, 2010.