Test-Cost-Sensitive Attribute Reduction Based on Neighborhood Rough Set
Hong Zhao, Fan Min∗, and William Zhu
Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China.
Email: hongzhao2012@163.com, minfanphd@163.com, williamfengzhu@gmail.com
Abstract—Recent research in machine learning and data mining has produced a wide variety of algorithms for cost-sensitive learning. Most existing rough set methods on this issue deal with nominal attributes, because nominal attributes produce equivalence relations and are therefore easy to process. However, in real applications, datasets often contain numerical attributes. Numerical attributes are more complex than nominal ones and require more computational resources, so the respective learning tasks are more challenging. This paper deals with test-cost-sensitive attribute reduction for numerical valued decision systems. Since the neighborhood rough set model has been successful in numerical data processing, we adopt it to define the minimal test cost reduct problem. Due to the complexity of the new problem, heuristic algorithms are needed to find sub-optimal solutions. We propose a kind of heuristic information, namely the sum of the positive region and the weighted test cost. When the test cost is not considered, this information degrades to the positive region, the most commonly used heuristic in classical rough set theory. Three metrics are adopted to evaluate the performance of reduction algorithms from a statistical viewpoint. Experimental results show that the proposed method takes advantage of test costs and therefore produces satisfactory results.
Keywords-Cost-sensitive learning, neighborhood, rough set,
reduction, heuristic algorithm.
I. INTRODUCTION
In practical data mining applications, it is well known
that redundant data make the mining task rather difficult.
Attribute reduction is a successful technique to remove
them and facilitate the mining task. This issue has attracted
much attention in recent years [1], [2], [3], [4]. Different
definitions of reducts and respective optimal metrics are
applicable to different fields. When the test cost is not considered, attribute reduction algorithms have been proposed to deal with nominal data [5], [6], [7]. On the other hand, attribute reduction algorithms based on the neighborhood rough set model have been proposed to deal with numerical valued decision systems [8], [9], [10].
Recently, the test-cost-sensitive attribute reduction problem was proposed in [11], [12], [13]. This problem has wide application since the collection of data is not free, and there is a test cost for each data item [14], [15]. The
test-cost-sensitive attribute reduction algorithm framework
in [16] is devoted to this problem. The algorithm of [11] employs a user-specified factor λ to adjust the heuristic information function based on the test cost of each attribute.
∗Corresponding author. Tel.: +86 133 7690 8359
The performance of that algorithm is satisfactory; however, it is limited to nominal data. Since numerical data are widespread in the real world, there is a clear need to handle them as well.
In this paper we define the test-cost-sensitive attribute reduction problem on numerical data. Because the neighborhood rough set model is successful in dealing with numerical data, we adopt it for our problem definition. In order to facilitate neighborhood threshold settings, data items are normalized first. Test costs are also normalized, to facilitate the definition of the heuristic function.
In most existing works, the positive region alone is used as the heuristic information in the neighborhood model. We have a more complex data model due to test costs. Specifically, we use the sum of the positive region and the weighted test cost as the heuristic information, with which a new heuristic algorithm is designed. The value of the positive region is a number in the range [0, 1]; hence the normalized weighted test cost can adjust the heuristic information based on the positive region within a small range. In order to ensure the leading role of the positive region, the adjustment uses addition instead of the more commonly used multiplication.
The Iris dataset with various test-cost settings is employed to study the performance of our algorithm. Since the dataset contains no test cost settings, we use three distribution functions to generate test costs; the three functions correspond to different applications. Moreover, we adopt three metrics to evaluate the performance of the reduction algorithms from a statistical viewpoint. Experimental results show that our algorithm can generate a minimal test cost reduct in most cases, because the proposed method takes advantage of test costs. Experiments are undertaken using an open source software package called COSER (cost-sensitive rough sets) [17].
The rest of the paper is organized as follows. Section II presents the concepts needed in the other parts. Section III presents the attribute reduction algorithm based on the neighborhood rough set. Experimental analysis is given in Section IV. Conclusions come in Section V.
II. PRELIMINARIES
This section introduces the preliminary knowledge of the paper. First, neighborhood rough set decision systems are reviewed. Then, test-cost-sensitive decision systems are discussed.
A. Neighborhood rough set decision systems
Formally, a decision system can be written as a 5-tuple S = <U, C, D, {Va}, {Ia}>, where U is the nonempty set called a universe, and C and D are the nonempty sets of variables called conditional attributes and decision attributes, respectively. Va is the set of values for each a ∈ C ∪ D, and Ia : U → Va is an information function for each a ∈ C ∪ D. We often denote {Va | a ∈ C ∪ D} and {Ia | a ∈ C ∪ D} by V and I, respectively.
Definition 1: [8] Given arbitrary xi ∈ U and B ⊆ C, the neighborhood δB(xi) of xi is defined as:
δB(xi) = {xj | xj ∈ U, ΔB(xi, xj) ≤ δ}, (1)
where Δ is a distance function. For all x1, x2, x3 ∈ U, it satisfies:
(1) Δ(x1, x2) ≥ 0;
(2) Δ(x1, x2) = 0 if and only if x1 = x2;
(3) Δ(x1, x2) = Δ(x2, x1);
(4) Δ(x1, x3) ≤ Δ(x1, x2) + Δ(x2, x3).
A detailed survey on distance functions can be found in [18].
If the attributes generate a neighborhood relation over the universe, the decision system is called a neighborhood decision system. It is denoted by NDS = <U, C, D, V, I>.
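To make Definition 1 concrete, the neighborhood of an object can be sketched in a few lines of Python (our own illustration, not part of the paper's COSER implementation; the Euclidean distance is assumed as the metric Δ, and X is a list of normalized attribute vectors):

```python
from math import sqrt

def neighborhood(X, i, B, delta):
    """delta_B(x_i) of Equation (1): indices of all objects within
    distance delta of object i, measured on the attribute subset B.
    Euclidean distance is assumed; any metric satisfying (1)-(4) works."""
    def dist(a, b):
        return sqrt(sum((a[k] - b[k]) ** 2 for k in B))
    return [j for j in range(len(X)) if dist(X[i], X[j]) <= delta]
```

For example, with X = [[0.0, 0.0], [0.1, 0.0], [0.9, 0.9]], the call neighborhood(X, 0, [0, 1], 0.2) returns [0, 1]: the first two objects lie within δ = 0.2 of each other, while the third does not.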
Definition 2: [10] Given a neighborhood decision system NDS, let X1, X2, ..., XN be the object subsets with decisions 1 to N, and let δB(xi) be the neighborhood information granule including xi and generated by the attributes B ⊆ C. Then the positive region (POS) of the decision is defined as
POSB(D) = {xi | δB(xi) ⊆ Xj for some j ∈ {1, ..., N}, xi ∈ U}. (2)
The size of the neighborhood depends on threshold δ.
When δ= 0, the samples in the same neighborhood granule
are equivalent to each other. In this case, the neighborhood
rough sets are a natural generalization of Pawlak rough sets.
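A minimal sketch of Definition 2 in Python (our own illustration, with the same Euclidean-distance assumption; an object belongs to the positive region when its neighborhood falls inside a single decision class):

```python
from math import sqrt

def positive_region(X, y, B, delta):
    """POS_B(D) of Equation (2): the objects whose delta-neighborhood on
    the attribute subset B is pure, i.e. contained in one decision class."""
    def nb(i):
        d = lambda a, b: sqrt(sum((a[k] - b[k]) ** 2 for k in B))
        return [j for j in range(len(X)) if d(X[i], X[j]) <= delta]
    return [i for i in range(len(X))
            if len({y[j] for j in nb(i)}) == 1]
```

With δ = 0 each neighborhood shrinks to the set of identical objects, which is exactly the Pawlak case mentioned above.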
B. Attribute significance and reduction with neighborhood
model
The dependency degree of D to B is defined as the ratio of consistent objects:
γB(D) = |POSB(D)| / |U|. (3)
A number of definitions of relative reducts exist [19], [20],
[21] for different rough set models. This paper employs the
definition based on the positive region.
Definition 3: [8] Given a neighborhood decision system NDS = <U, C, D, V, I> and B ⊆ C, the attribute subset B is a relative reduct if
(1) γB(D) = γC(D);
(2) ∀a ∈ B, γB(D) > γB−{a}(D).
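Definition 3 can be checked mechanically. The sketch below (our own illustration, again assuming Euclidean distance) computes γB(D) from Equation (3) and tests both the joint sufficiency condition (1) and the individual necessity condition (2):

```python
from math import sqrt

def gamma(X, y, B, delta):
    """Dependency degree gamma_B(D) = |POS_B(D)| / |U| (Equation (3))."""
    def nb(i):
        d = lambda a, b: sqrt(sum((a[k] - b[k]) ** 2 for k in B))
        return [j for j in range(len(X)) if d(X[i], X[j]) <= delta]
    pure = sum(len({y[j] for j in nb(i)}) == 1 for i in range(len(X)))
    return pure / len(X)

def is_reduct(X, y, B, delta):
    """Definition 3: B preserves the full dependency degree, and removing
    any single attribute from B strictly decreases it."""
    C = list(range(len(X[0])))
    if gamma(X, y, B, delta) != gamma(X, y, C, delta):
        return False
    return all(gamma(X, y, B, delta) > gamma(X, y, [b for b in B if b != a], delta)
               for a in B)
```

On a toy table where only the first attribute varies, [0] is a relative reduct, while [0, 1] fails the minimality condition (2).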
C. Test-cost-sensitive decision systems
Since we assume that tests are undertaken in parallel, we consider the most widely used model, as follows:
Definition 4: [16] A test-cost-independent decision system (TCI-DS) S is the 6-tuple:
S = (U, C, D, {Va}, {Ia}, c), (4)
where a ∈ C ∪ D, U is the nonempty set called a universe, and C and D are the nonempty sets of variables called conditional attributes and decision attributes, respectively. Va is the set of values for each a ∈ C ∪ D, Ia : U → Va is an information function for each a ∈ C ∪ D, and c : C → R+ ∪ {0} is the test cost function. Test costs are independent of one another, that is, c(B) = Σ_{a∈B} c(a) for any B ⊆ C.
III. TEST-COST-SENSITIVE ATTRIBUTE REDUCTION BASED ON NEIGHBORHOOD DECISION SYSTEM
In this section, we first discuss attribute value and test cost normalization. Then the problem of test-cost-sensitive attribute reduction based on a neighborhood decision system is proposed.
A. Attribute value normalization
To design test-cost-sensitive attribute reduction based on a neighborhood decision system, we need to set the threshold δ, which determines the size of the neighborhood. Setting this threshold is the most important problem in neighborhood-based classification. In order to facilitate neighborhood threshold settings, the values of the attributes are normalized first.
Example 1: Table I presents a decision system for Iris, whose conditional attributes are numerical. Here U = {x1, x2, x3, x4, ..., x149, x150}, C = {Sepal-length, Sepal-width, Petal-length, Petal-width}, and D = {Class}. In order to compute distances between conditional attribute values, every attribute is first normalized from its original range into [0, 1].
Table I
AN EXAMPLE NUMERICAL VALUE ATTRIBUTE DECISION TABLE (IRIS)

Patient  Sepal-length  Sepal-width  Petal-length  Petal-width  Class
x1       0.23529       0.77273      0.14286       0.04762      0
x2       0.29412       0.72727      0.11905       0.04762      0
x3       0.35294       0.09091      0.38095       0.42857      0.5
x4       0.64706       0.31818      0.52381       0.52381      0.5
x5       0.41176       0.31818      0.50000       0.42857      0.5
...      ...           ...          ...           ...          ...
x149     0.58824       0.54545      0.85714       1.00000      1
x150     0.44118       0.27273      0.64286       0.71429      1
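The normalization behind Table I is plain per-attribute min-max scaling; a sketch in Python (our own illustration, assuming each column has at least two distinct values):

```python
def normalize(X):
    """Min-max rescale every conditional attribute (column) of X to [0, 1]."""
    cols = list(zip(*X))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [[(v - lo[k]) / (hi[k] - lo[k]) for k, v in enumerate(row)]
            for row in X]
```

For instance, normalize([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]) yields [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]].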
B. Test cost normalization
For statistical purposes, three different schemes are adopted to produce random test costs: the uniform distribution, the normal distribution, and the Pareto distribution. For simplicity, test costs are integers ranging from M to N, and are generated independently [16].
In order to facilitate the definition of the heuristic function, we normalize the test cost values.
Let B ⊆ C and ai ∈ B. Then
c*i = (ci − mincost)/(maxcost − mincost) (5)
is the normalized cost of attribute ai, where ci is the test cost of attribute ai, and mincost and maxcost are the minimum and maximum costs over all conditional attributes, respectively.
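Equation (5) is the same min-max scaling applied to the cost vector; a sketch (our own illustration, assuming maxcost > mincost):

```python
def normalize_costs(costs):
    """Equation (5): rescale raw test costs into [0, 1]."""
    lo, hi = min(costs), max(costs)
    return [(c - lo) / (hi - lo) for c in costs]
```

For example, raw costs [2, 4, 10] normalize to [0.0, 0.25, 1.0].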
C. Test-cost-sensitive attribute reduction based on neighborhood decision system
Most heuristic attribute reduction algorithms share the same structure; their differences lie in the heuristic function [2]. We now define the heuristic function based on the positive region and the weighted test cost.
Definition 5: Let S* = (U, C, D, {Va}, {Ia}, c*) be a test-cost-sensitive neighborhood decision system, where c* is the normalized version of c. Let B ⊆ C and ai ∈ (C − B), with c*i defined as in Equation (5).
We propose our positive region and weighted test cost function as follows:
SIGtc(ai, B, D, c*i) = SIG(ai, B, D) + (1 − c*i) × ρ, (6)
where ρ is the regulatory factor of the test cost; if ρ = 0, SIGtc and SIG are equivalent, where
SIG(ai, B, D) = γB∪{ai}(D). (7)
SIGtc is the heuristic function with test costs, and SIG is the one that does not take test costs into account.
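Equation (6) is a one-liner once γ is available; a sketch (our own illustration, with the dependency degree γB∪{ai}(D) passed in as a precomputed number):

```python
def sig_tc(sig, c_star, rho=0.01):
    """Equation (6): SIGtc = SIG + (1 - c*_i) * rho, where SIG is the
    dependency degree gamma_{B union {a_i}}(D) of Equation (7).
    rho = 0 recovers the cost-blind heuristic SIG."""
    return sig + (1 - c_star) * rho
```

A cheaper test (smaller c*i) yields a larger bonus, so among attributes with equal positive regions the cheapest one wins, while any substantial difference in positive region dominates the small ρ-scaled adjustment.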
Now we propose a heuristic algorithm based on the positive region and the weighted test cost to find a reduct with minimal test cost. A framework of our heuristic method is shown in Algorithm 1. In Algorithm 1, if sigm = 1, the algorithm has found a solution. The proposed algorithm performs stably from a statistical perspective.
According to Equation (1), if δ = 0, the neighborhood rough set degenerates to the classical one. In this case, the proposed test-cost reduction method can deal with both nominal attributes and numerical ones without discretization.
IV. EXPERIMENTS
The complexity of classification depends not only on the given feature space, but also on the granularity level [8]. In this paper, the user can set the granularity level through the parameter δ in one set of experiments. We let δ = 0.005, 0.008, 0.011, ..., 0.029.
Algorithm 1 Test-cost-sensitive attribute reduction based on a neighborhood decision system
Input: (U, C, D, {Va}, {Ia}, c*) and δ, the threshold that controls the size of the neighborhood
Output: A reduct red with minimal test cost
Method:
1: red = ∅;
2: sigm = −1, sigt = 0;
3: while (sigm ≠ 1 and sigm ≠ sigt) do
4:    sigt = sigm;
5:    for each ai ∈ (C − red) do
6:       Compute SIGtc(ai, red, D, c*i);
7:    end for
8:    Select am and c*m with the maximal SIGtc(am, red, D, c*m);
9:    Compute sigm = SIG(am, red, D);
10:   red = red ∪ {am};
11: end while
12: if sigm = 1 then
13:   return red;
14: end if
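Algorithm 1 can be sketched end to end in Python (our own illustration, not the authors' COSER implementation; Euclidean distance is assumed, and the loop condition is read as a conjunction so the search stops once γ reaches 1 or stops improving):

```python
from math import sqrt

def gamma(X, y, B, delta):
    """gamma_B(D): fraction of objects whose delta-neighborhood on B is pure."""
    def nb(i):
        d = lambda a, b: sqrt(sum((a[k] - b[k]) ** 2 for k in B))
        return [j for j in range(len(X)) if d(X[i], X[j]) <= delta]
    return sum(len({y[j] for j in nb(i)}) == 1 for i in range(len(X))) / len(X)

def tcs_reduction(X, y, c_star, delta, rho=0.01):
    """Greedy search of Algorithm 1; returns red on success, None otherwise."""
    red, sig_m, sig_t = [], -1.0, 0.0
    while sig_m != 1 and sig_m != sig_t:
        sig_t = sig_m
        cand = [a for a in range(len(X[0])) if a not in red]
        if not cand:  # all attributes used without reaching gamma = 1
            break
        # pick the attribute maximizing SIGtc = gamma(red + [a]) + (1 - c*_a) * rho
        a_m = max(cand, key=lambda a: gamma(X, y, red + [a], delta)
                                      + (1 - c_star[a]) * rho)
        sig_m = gamma(X, y, red + [a_m], delta)
        red.append(a_m)
    return red if sig_m == 1 else None
```

On a toy dataset where the first attribute alone separates the classes, the search returns [0] after a single iteration; when no attribute subset reaches full dependency, it returns None, mirroring lines 12-14 of the pseudocode.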
In Equation (6), ρ is the regulatory factor of the test cost. We let ρ = 0.01, which is small enough to limit the influence of the test cost while still being necessary; in other words, it lets SIG play the major role in the function.
For each distribution, we generate 100 sets of test costs, and for each test cost setting, there are 9 δ settings.
A. Evaluation metrics
In order to limit the influence of subjective and objective factors, three evaluation metrics are adopted to compare the performances: the finding optimal factor (FOF), the maximal exceeding factor (MEF), and the average exceeding factor (AEF). Their detailed definitions can be found in [16]. When the algorithm runs with different test cost settings, we evaluate the obtained reducts with these metrics.
B. Statistical results
For different test cost distributions, the performance of
the algorithm is different. Figure 1 shows the results of
finding optimal factors. This metric is both qualitative and
quantitative. First, it only counts optimal solutions. Second,
it is computed statistically. Figure 2 shows the results of maximal exceeding factors. The maximal exceeding factor describes the worst case of the algorithm, and should also be viewed as a statistical metric. Figure 3 shows the average exceeding factors, which display the overall performance of the algorithm from a statistical perspective. On the whole,
with the Normal test cost distribution, the algorithm has the
best performance.
Figure 1. Finding optimal factor (FOF). [Plot of FOF versus δ (0.005 to 0.029) for the Uniform, Normal, and Pareto distributions; image omitted.]
Figure 2. Maximal exceeding factor (MEF). [Plot of MEF versus δ (0.005 to 0.029) for the Uniform, Normal, and Pareto distributions; image omitted.]
C. Performance comparison
If ρ = 0, Equation (6) degrades to Equation (7); here we consider only ρ ≠ 0. Figure 4, Figure 5 and Figure 6 compare the heuristic information function with test cost against the one without it. Experimental results on the Iris dataset with various test-cost settings show the performance improvement of the information function SIGtc over SIG.
V. CONCLUSION
This study has proposed a new test-cost-sensitive attribute reduction problem. We formally defined the minimal test cost reduct problem for numerical valued decision systems. The new problem has practical areas of application because real-world datasets often contain numerical attributes. The proposed solution to this problem is
Figure 3. Average exceeding factor (AEF). [Plot of AEF versus δ (0.005 to 0.029) for the Uniform, Normal, and Pareto distributions; image omitted.]
Figure 4. Finding optimal factor with the Uniform distribution. [Plot of FOF versus δ, with and without cost; image omitted.]
based on the neighborhood rough set model. We also design a heuristic information function based on the positive region and the weighted test costs to obtain effective results. With this function, a new heuristic algorithm is designed. Experimental results show that the proposed method is able to find a low-cost test set.
ACKNOWLEDGMENTS
This work is in part supported by the Fujian Province Foundation of Higher Education under Grant No. JK2010036, the Fujian Province Foundation of Serving the Construction of the Economic Zone on the West Side of the Straits, the National Science Foundation of China under Grant Nos. 60873077 and 61170128, the Natural Science Foundation of Fujian Province, China under Grant No. 2011J01374, and the Education Department of Fujian Province under Grant
Figure 5. Finding optimal factor with the Normal distribution. [Plot of FOF versus δ, with and without cost; image omitted.]
Figure 6. Finding optimal factor with the Pareto distribution. [Plot of FOF versus δ, with and without cost; image omitted.]
No. JA11176.
REFERENCES
[1] M. Dash and H. Liu, “Consistency-based search in feature
selection,” Artificial Intelligence, vol. 151, pp. 155–176, 2003.
[2] Y. Yao, Y. Zhao, and J. Wang, “On reduct construction
algorithms,” in Rough Set and Knowledge Technology, 2006,
pp. 297–304.
[3] W. Zhu and F. Wang, “Reduction and axiomization of covering generalized rough sets,” Information Sciences, vol. 152, no. 1, pp. 217–230, 2003.
[4] Y. Yao and Y. Zhao, “Attribute reduction in decision-theoretic
rough set models,” Information Sciences, vol. 178, no. 17, pp.
3356–3373, 2008.
[5] H. Li, W. Zhang, and H. Wang, “Classification and reduction
of attributes in concept lattices,” in Granular Computing,
2006, pp. 142–147.
[6] Q. Liu, F. Li, F. Min, M. Ye, and G. Yang, “An efficient
reduction algorithm based on new conditional information
entropy,” Control and Decision (in Chinese), vol. 20, no. 8,
pp. 878–882, 2005.
[7] G. Wang, H. Yu, and D. Yang, “Decision table reduction
based on conditional information entropy,” Chinese Journal
of Computers, vol. 2, no. 7, pp. 759–766, 2002.
[8] Q. Hu, D. Yu, J. Liu, and C. Wu, “Neighborhood rough set
based heterogeneous feature subset selection,” Information
Sciences, vol. 178, no. 18, pp. 3577–3594, 2008.
[9] Q. Hu, D. Yu, and Z. Xie, “Numerical attribute reduction
based on neighborhood granulation and rough approximation
(in chinese),” Journal of Software, vol. 19, no. 3, pp. 640–649,
March 2008.
[10] Q. Hu, D. Yu, and Z. Xie, “Neighborhood classifiers,” Expert Systems with Applications, vol. 34, pp. 866–876, 2008.
[11] F. Min, H. He, Y. Qian, and W. Zhu, “Test-cost-sensitive
attribute reduction,” Information Sciences, vol. 181, pp. 4928–
4942, November 2011.
[12] H. He, F. Min, and W. Zhu, “Attribute reduction in test-
cost-sensitive decision systems with common-test-costs,” in
ICMLC, v1, 2011, pp. 432–436.
[13] H. He and F. Min, “Accumulated cost based test-cost-sensitive
attribute reduction,” in RSFDGrC, ser. LNAI, vol. 6743, 2011,
pp. 244–247.
[14] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data
mining to knowledge discovery in databases,” AI Magazine,
vol. 17, pp. 37–54, 1996.
[15] P. D. Turney, “Cost-sensitive classification: Empirical evalu-
ation of a hybrid genetic decision tree induction algorithm,”
Journal of Artificial Intelligence Research, vol. 2, pp. 369–
409, 1995.
[16] F. Min and Q. Liu, “A hierarchical model for test-cost-
sensitive decision systems,” Information Sciences, vol. 179,
no. 14, pp. 2442–2452, 2009.
[17] F. Min, W. Zhu, and H. Zhao, “COSER: Cost-sensitive rough sets, http://grc.fjzs.edu.cn/˜fmin/coser/index.html,” 2011.
[18] D. R. Wilson and T. R. Martinez, “Improved heterogeneous distance functions,” Journal of Artificial Intelligence Research, vol. 6, pp. 1–34, 1997.
[19] Z. Pawlak, “Rough sets,” International Journal of Computer
and Information Sciences, vol. 11, pp. 341–356, 1982.
[20] D. Slezak, “Approximate entropy reducts,” Fundamenta In-
formaticae, vol. 53, no. 3-4, pp. 365–390, 2002.
[21] Y. Qian, J. Liang, W. Pedrycz, and C. Dang, “Positive
approximation: An accelerator for attribute reduction in rough
set theory,” Artificial Intelligence, vol. 174, no. 9-10, pp. 597–
618, 2010.