Test-Cost-Sensitive Attribute Reduction Based on Neighborhood Rough Set
Hong Zhao, Fan Min∗, and William Zhu
Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China.
Email: hongzhao2012@163.com, minfanphd@163.com, williamfengzhu@gmail.com
Abstract—Recent research in machine learning and data mining has produced a wide variety of algorithms for cost-sensitive learning. Most existing rough set methods on this issue deal with nominal attributes, because nominal attributes produce equivalence relations and are therefore easy to process. However, in real applications, datasets often contain numerical attributes. Numerical attributes are more complex than nominal ones and require more computational resources, so the respective learning tasks are more challenging. This paper deals with test-cost-sensitive attribute reduction for numerical valued decision systems. Since the neighborhood rough set model has been successful in numerical data processing, we adopt it to define the minimal test cost reduct problem. Due to the complexity of the new problem, heuristic algorithms are needed to find sub-optimal solutions. We propose a kind of heuristic information, namely the sum of the positive region and the weighted test cost. When the test cost is not considered, this information degrades to the positive region, the most commonly used heuristic in classical rough set theory. Three metrics are adopted to evaluate the performance of reduction algorithms from a statistical viewpoint. Experimental results show that the proposed method takes advantage of test costs and therefore produces satisfactory results.
Keywords-Cost-sensitive learning, neighborhood, rough set,
reduction, heuristic algorithm.
I. INTRODUCTION
In practical data mining applications, it is well known
that redundant data make the mining task rather difficult.
Attribute reduction is a successful technique to remove
them and facilitate the mining task. This issue has attracted
much attention in recent years [1], [2], [3], [4]. Different
definitions of reducts and respective optimal metrics are
applicable to different fields. When the test cost is not considered, attribute reduction algorithms have been proposed to deal with nominal data [5], [6], [7]. On the other hand, attribute reduction algorithms based on the neighborhood rough set model have been proposed to deal with numerical valued decision systems [8], [9], [10].
Recently, the test-cost-sensitive attribute reduction problem was proposed in [11], [12], [13]. This problem has wide application since the collection of data is not free, and there is a test cost for each data item [14], [15]. The
test-cost-sensitive attribute reduction algorithm framework
in [16] is devoted to this problem. The algorithm of [11] employs a user-specified factor λ to adjust the heuristic information function based on the test cost of each attribute.
∗Corresponding author. Tel.: +86 133 7690 8359
The performance of that algorithm is satisfactory; however, it is limited to nominal data. Since numerical data are widespread in the real world, there is a clear need to handle them as well.
In this paper we define the test-cost-sensitive attribute reduction problem on numerical data. Because the neighborhood rough set model is successful in dealing with numerical data, we adopt it for our problem definition. In order to facilitate neighborhood threshold settings, data items are normalized first. Test costs are also normalized, to facilitate the definition of the heuristic function.
In most existing works, the positive region alone is used as the heuristic information in the neighborhood model. We have a more complex data model due to test costs. Specifically, we use the sum of the positive region and the weighted test cost as the heuristic information, with which a new heuristic algorithm is designed. The value of the positive region is a number in the range [0, 1]; hence the normalized weighted test cost can adjust the heuristic information based on the positive region within a small range. In order to ensure the leading role of the positive region, the adjustment uses addition instead of the more commonly used multiplication.
The Iris dataset with various test-cost settings is employed to study the performance of our algorithm. Since the dataset contains no test cost settings, we use three distribution functions to generate test costs; the three functions correspond to different applications. Moreover, we adopt three metrics to evaluate the performance of the reduction algorithms from a statistical viewpoint. Experimental results show that our algorithm can generate a minimal test cost reduct in most cases, because the proposed method takes advantage of test costs. Experiments are undertaken using an open source software package called COSER (cost-sensitive rough sets) [17].
The rest of the paper is organized as follows. Section II presents the concepts needed in the other parts. Section III presents the attribute reduction algorithm based on the neighborhood rough set. Experimental analysis is given in Section IV. Conclusions come in Section V.
II. PRELIMINARIES
This section introduces the preliminary knowledge of the paper. First, neighborhood rough set decision systems are reviewed. Then, test-cost-sensitive decision systems are discussed.
A. Neighborhood rough set decision systems
Formally, a decision system can be written as a 5-tuple S = <U, C, D, {Va}, {Ia}>, where U is the nonempty set called a universe, and C and D are the nonempty sets of variables called conditional attributes and decision attributes, respectively. Va is the set of values for each a ∈ C ∪ D, and Ia : U → Va is an information function for each a ∈ C ∪ D. We often denote {Va | a ∈ C ∪ D} and {Ia | a ∈ C ∪ D} by V and I, respectively.
Definition 1: [8] Given arbitrary xi ∈ U and B ⊆ C, the neighborhood δB(xi) of xi is defined as:
δB(xi) = {xj | xj ∈ U, ΔB(xi, xj) ≤ δ}, (1)
where Δ is a distance function. For all x1, x2, x3 ∈ U, it satisfies:
(1) Δ(x1, x2) ≥ 0;
(2) Δ(x1, x2) = 0 if and only if x1 = x2;
(3) Δ(x1, x2) = Δ(x2, x1);
(4) Δ(x1, x3) ≤ Δ(x1, x2) + Δ(x2, x3).
A detailed survey on distance functions can be found in [18].
If the attributes generate a neighborhood relation over the universe, the decision system is called a neighborhood decision system. It is denoted by NDS = <U, C, D, V, I>.
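To make Definition 1 concrete, the neighborhood of an object can be sketched in a few lines of Python (our own illustration, not part of the paper's COSER implementation; the Euclidean distance is assumed as the metric Δ, and X is a list of normalized attribute vectors):

```python
from math import sqrt

def neighborhood(X, i, B, delta):
    """delta_B(x_i) of Equation (1): indices of all objects within
    distance delta of object i, measured on the attribute subset B.
    Euclidean distance is assumed; any metric satisfying (1)-(4) works."""
    def dist(a, b):
        return sqrt(sum((a[k] - b[k]) ** 2 for k in B))
    return [j for j in range(len(X)) if dist(X[i], X[j]) <= delta]
```

For example, with X = [[0.0, 0.0], [0.1, 0.0], [0.9, 0.9]], the call neighborhood(X, 0, [0, 1], 0.2) returns [0, 1]: the first two objects lie within δ = 0.2 of each other, while the third does not.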
Definition 2: [10] Given a neighborhood decision system NDS, let X1, X2, ..., XN be the object subsets with decisions 1 to N, and let δB(xi) be the neighborhood information granule including xi and generated by the attributes B ⊆ C. Then the positive region (POS) of the decision is defined as
POSB(D) = {xi | δB(xi) ⊆ Xj for some j ∈ {1, ..., N}, xi ∈ U}. (2)
The size of the neighborhood depends on threshold δ.
When δ= 0, the samples in the same neighborhood granule
are equivalent to each other. In this case, the neighborhood
rough sets are a natural generalization of Pawlak rough sets.
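A minimal sketch of Definition 2 in Python (our own illustration, with the same Euclidean-distance assumption; an object belongs to the positive region when its neighborhood falls inside a single decision class):

```python
from math import sqrt

def positive_region(X, y, B, delta):
    """POS_B(D) of Equation (2): the objects whose delta-neighborhood on
    the attribute subset B is pure, i.e. contained in one decision class."""
    def nb(i):
        d = lambda a, b: sqrt(sum((a[k] - b[k]) ** 2 for k in B))
        return [j for j in range(len(X)) if d(X[i], X[j]) <= delta]
    return [i for i in range(len(X))
            if len({y[j] for j in nb(i)}) == 1]
```

With δ = 0 each neighborhood shrinks to the set of identical objects, which is exactly the Pawlak case mentioned above.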
B. Attribute significance and reduction with neighborhood
model
The dependency degree of D to B is defined as the ratio of consistent objects:
γB(D) = |POSB(D)| / |U|. (3)
A number of definitions of relative reducts exist [19], [20],
[21] for different rough set models. This paper employs the
definition based on the positive region.
Definition 3: [8] Given a neighborhood decision system NDS = <U, C, D, V, I> and B ⊆ C, the attribute subset B is a relative reduct if
(1) γB(D) = γC(D);
(2) ∀a ∈ B, γB(D) > γB−{a}(D).
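Definition 3 can be checked mechanically. The sketch below (our own illustration, again assuming Euclidean distance) computes γB(D) from Equation (3) and tests both the joint sufficiency condition (1) and the individual necessity condition (2):

```python
from math import sqrt

def gamma(X, y, B, delta):
    """Dependency degree gamma_B(D) = |POS_B(D)| / |U| (Equation (3))."""
    def nb(i):
        d = lambda a, b: sqrt(sum((a[k] - b[k]) ** 2 for k in B))
        return [j for j in range(len(X)) if d(X[i], X[j]) <= delta]
    pure = sum(len({y[j] for j in nb(i)}) == 1 for i in range(len(X)))
    return pure / len(X)

def is_reduct(X, y, B, delta):
    """Definition 3: B preserves the full dependency degree, and removing
    any single attribute from B strictly decreases it."""
    C = list(range(len(X[0])))
    if gamma(X, y, B, delta) != gamma(X, y, C, delta):
        return False
    return all(gamma(X, y, B, delta) > gamma(X, y, [b for b in B if b != a], delta)
               for a in B)
```

On a toy table where only the first attribute varies, [0] is a relative reduct, while [0, 1] fails the minimality condition (2).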
C. Test-cost-sensitive decision systems
Since we assume that tests are undertaken in parallel, we consider the most widely used model, as follows:
Definition 4: [16] A test-cost-independent decision system (TCI-DS) S is the 6-tuple:
S = (U, C, D, {Va}, {Ia}, c), (4)
where a ∈ C ∪ D, U is the nonempty set called a universe, and C and D are the nonempty sets of variables called conditional attributes and decision attributes, respectively. Va is the set of values for each a ∈ C ∪ D, Ia : U → Va is an information function for each a ∈ C ∪ D, and c : C → R+ ∪ {0} is the test cost function. Test costs are independent of one another, that is, c(B) = Σ_{a∈B} c(a) for any B ⊆ C.
III. TEST-COST-SENSITIVE ATTRIBUTE REDUCTION BASED ON NEIGHBORHOOD DECISION SYSTEM
In this section, we first discuss attribute value and test cost normalization. Then the problem of test-cost-sensitive attribute reduction based on a neighborhood decision system is proposed.
A. Attribute value normalization
To design test-cost-sensitive attribute reduction based on a neighborhood decision system, we need to set the threshold δ, which determines the size of the neighborhood. Setting this threshold is the most important problem in neighborhood-based classification. In order to facilitate neighborhood threshold settings, the values of the attributes are normalized first.
Example 1: Table I presents a decision system for Iris, whose conditional attributes are numerical. Here U = {x1, x2, x3, x4, ..., x149, x150}, C = {Sepal-length, Sepal-width, Petal-length, Petal-width}, and D = {Class}. In order to compute distances between conditional attribute values, every attribute is first normalized from its original range into [0, 1].
Table I
AN EXAMPLE NUMERICAL VALUE ATTRIBUTE DECISION TABLE (IRIS)

Patient  Sepal-length  Sepal-width  Petal-length  Petal-width  Class
x1       0.23529       0.77273      0.14286       0.04762      0
x2       0.29412       0.72727      0.11905       0.04762      0
x3       0.35294       0.09091      0.38095       0.42857      0.5
x4       0.64706       0.31818      0.52381       0.52381      0.5
x5       0.41176       0.31818      0.50000       0.42857      0.5
...      ...           ...          ...           ...          ...
x149     0.58824       0.54545      0.85714       1.00000      1
x150     0.44118       0.27273      0.64286       0.71429      1
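The normalization behind Table I is plain per-attribute min-max scaling; a sketch in Python (our own illustration, assuming each column has at least two distinct values):

```python
def normalize(X):
    """Min-max rescale every conditional attribute (column) of X to [0, 1]."""
    cols = list(zip(*X))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [[(v - lo[k]) / (hi[k] - lo[k]) for k, v in enumerate(row)]
            for row in X]
```

For instance, normalize([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]) yields [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]].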
B. Test cost normalization
For statistical purposes, three different schemes are adopted to produce random test costs: the uniform distribution, the normal distribution, and the Pareto distribution. For simplicity, test costs are integers ranging from M to N, and are generated independently [16].
In order to facilitate the definition of the heuristic function, we normalize the test cost values.
Let B ⊆ C and ai ∈ B. Then
c*i = (ci − mincost)/(maxcost − mincost) (5)
is the normalized cost of attribute ai, where ci is the test cost of attribute ai, and mincost and maxcost are the minimum and maximum costs over all conditional attributes, respectively.
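Equation (5) is the same min-max scaling applied to the cost vector; a sketch (our own illustration, assuming maxcost > mincost):

```python
def normalize_costs(costs):
    """Equation (5): rescale raw test costs into [0, 1]."""
    lo, hi = min(costs), max(costs)
    return [(c - lo) / (hi - lo) for c in costs]
```

For example, raw costs [2, 4, 10] normalize to [0.0, 0.25, 1.0].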
C. Test-cost-sensitive attribute reduction based on neighborhood decision system
Most heuristic attribute reduction algorithms share the same structure; their differences lie in the heuristic function [2]. We now define the heuristic function based on the positive region and the weighted test cost.
Definition 5: Let S* = (U, C, D, {Va}, {Ia}, c*) be a test-cost-sensitive neighborhood decision system, where c* is the normalized version of c. Let B ⊆ C and ai ∈ (C − B), with c*i defined as in Equation (5).
We propose our positive region and weighted test cost function as follows:
SIGtc(ai, B, D, c*i) = SIG(ai, B, D) + (1 − c*i) × ρ, (6)
where ρ is the regulatory factor of the test cost; if ρ = 0, SIGtc and SIG are equivalent, where
SIG(ai, B, D) = γB∪{ai}(D). (7)
SIGtc is the heuristic function with test costs, and SIG is the one that does not take test costs into account.
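Equation (6) is a one-liner once γ is available; a sketch (our own illustration, with the dependency degree γB∪{ai}(D) passed in as a precomputed number):

```python
def sig_tc(sig, c_star, rho=0.01):
    """Equation (6): SIGtc = SIG + (1 - c*_i) * rho, where SIG is the
    dependency degree gamma_{B union {a_i}}(D) of Equation (7).
    rho = 0 recovers the cost-blind heuristic SIG."""
    return sig + (1 - c_star) * rho
```

A cheaper test (smaller c*i) yields a larger bonus, so among attributes with equal positive regions the cheapest one wins, while any substantial difference in positive region dominates the small ρ-scaled adjustment.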
Now we propose a heuristic algorithm based on the positive region and the weighted test cost to find a reduct with minimal test cost. A framework of our heuristic method is shown in Algorithm 1. In Algorithm 1, if sigm = 1, the algorithm has found a solution. The proposed algorithm performs stably from a statistical perspective.
According to Equation (1), if δ = 0, the neighborhood rough set degenerates to the classical one. In this case, the proposed test-cost reduction method can deal with both nominal attributes and numerical ones without discretization.
IV. EXPERIMENTS
The complexity of classification depends not only on the given feature space, but also on the granularity level [8]. In this paper, the user can set the granularity level through the parameter δ in one set of experiments. We let δ = 0.005, 0.008, 0.011, ..., 0.029.
Algorithm 1 Test-cost-sensitive attribute reduction based on a neighborhood decision system
Input: (U, C, D, {Va}, {Ia}, c*) and δ, the threshold that controls the size of the neighborhood
Output: A reduct red with minimal test cost
Method:
1: red = ∅;
2: sigm = −1, sigt = 0;
3: while (sigm ≠ 1 and sigm ≠ sigt) do
4:    sigt = sigm;
5:    for each ai ∈ (C − red) do
6:       Compute SIGtc(ai, red, D, c*i);
7:    end for
8:    Select am and c*m with the maximal SIGtc(am, red, D, c*m);
9:    Compute sigm = SIG(am, red, D);
10:   red = red ∪ {am};
11: end while
12: if sigm = 1 then
13:   return red;
14: end if
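Algorithm 1 can be sketched end to end in Python (our own illustration, not the authors' COSER implementation; Euclidean distance is assumed, and the loop condition is read as a conjunction so the search stops once γ reaches 1 or stops improving):

```python
from math import sqrt

def gamma(X, y, B, delta):
    """gamma_B(D): fraction of objects whose delta-neighborhood on B is pure."""
    def nb(i):
        d = lambda a, b: sqrt(sum((a[k] - b[k]) ** 2 for k in B))
        return [j for j in range(len(X)) if d(X[i], X[j]) <= delta]
    return sum(len({y[j] for j in nb(i)}) == 1 for i in range(len(X))) / len(X)

def tcs_reduction(X, y, c_star, delta, rho=0.01):
    """Greedy search of Algorithm 1; returns red on success, None otherwise."""
    red, sig_m, sig_t = [], -1.0, 0.0
    while sig_m != 1 and sig_m != sig_t:
        sig_t = sig_m
        cand = [a for a in range(len(X[0])) if a not in red]
        if not cand:  # all attributes used without reaching gamma = 1
            break
        # pick the attribute maximizing SIGtc = gamma(red + [a]) + (1 - c*_a) * rho
        a_m = max(cand, key=lambda a: gamma(X, y, red + [a], delta)
                                      + (1 - c_star[a]) * rho)
        sig_m = gamma(X, y, red + [a_m], delta)
        red.append(a_m)
    return red if sig_m == 1 else None
```

On a toy dataset where the first attribute alone separates the classes, the search returns [0] after a single iteration; when no attribute subset reaches full dependency, it returns None, mirroring lines 12-14 of the pseudocode.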
In Equation (6), ρ is the regulatory factor of the test cost. We let ρ = 0.01, which is small enough to limit the influence of the test cost while still being necessary; in other words, it lets SIG play the major role in the function.
For each distribution, we generate 100 sets of test costs, and for each test cost setting, there are 9 δ settings.
A. Evaluation metrics
In order to limit the influence of subjective and objective factors, three evaluation metrics are adopted to compare the performances: the finding optimal factor (FOF), the maximal exceeding factor (MEF), and the average exceeding factor (AEF). Their detailed definitions can be found in [16]. When the algorithm runs with different test cost settings, we evaluate the obtained reducts with these metrics.
B. Statistical results
For different test cost distributions, the performance of
the algorithm is different. Figure 1 shows the results of
finding optimal factors. This metric is both qualitative and
quantitative. First, it only counts optimal solutions. Second,
it is computed statistically. Figure 2 shows the results of maximal exceeding factors. The maximal exceeding factor describes the worst case of the algorithm, and should also be viewed as a statistical metric. Figure 3 shows the average exceeding factors, which display the overall performance of the algorithm from a statistical perspective. On the whole,
with the Normal test cost distribution, the algorithm has the
best performance.
Figure 1. Finding optimal factor (FOF). [Plot of FOF versus δ (0.005 to 0.029) for the Uniform, Normal, and Pareto distributions; image omitted.]
Figure 2. Maximal exceeding factor (MEF). [Plot of MEF versus δ (0.005 to 0.029) for the Uniform, Normal, and Pareto distributions; image omitted.]
C. Performance comparison
If ρ = 0, Equation (6) degrades to Equation (7); here we consider only ρ ≠ 0. Figure 4, Figure 5 and Figure 6 compare the heuristic information function with test cost against the one without it. Experimental results on the Iris dataset with various test-cost settings show the performance improvement of the information function SIGtc over SIG.
V. CONCLUSION
This study has proposed a new test-cost-sensitive attribute reduction problem. We formally defined the minimal test cost reduct problem for numerical valued decision systems. The new problem has practical areas of application because real-world datasets often contain numerical attributes. The proposed solution to this problem is
Figure 3. Average exceeding factor (AEF). [Plot of AEF versus δ (0.005 to 0.029) for the Uniform, Normal, and Pareto distributions; image omitted.]
Figure 4. Finding optimal factor with the Uniform distribution. [Plot of FOF versus δ, with and without cost; image omitted.]
based on the neighborhood rough set model. We also design a heuristic information function based on the positive region and the weighted test costs to obtain effective results. With this function, a new heuristic algorithm is designed. Experimental results show that the proposed method is able to find a low-cost test set.
ACKNOWLEDGMENTS
This work is in part supported by the Fujian Province Foundation of Higher Education under Grant No. JK2010036, the Fujian Province Foundation of Serving the Construction of the Economic Zone on the West Side of the Straits, the National Science Foundation of China under Grant Nos. 60873077 and 61170128, the Natural Science Foundation of Fujian Province, China under Grant No. 2011J01374, and the Education Department of Fujian Province under Grant
Figure 5. Finding optimal factor with the Normal distribution. [Plot of FOF versus δ, with and without cost; image omitted.]
Figure 6. Finding optimal factor with the Pareto distribution. [Plot of FOF versus δ, with and without cost; image omitted.]
No. JA11176.
REFERENCES
[1] M. Dash and H. Liu, “Consistency-based search in feature
selection,” Artificial Intelligence, vol. 151, pp. 155–176, 2003.
[2] Y. Yao, Y. Zhao, and J. Wang, “On reduct construction
algorithms,” in Rough Set and Knowledge Technology, 2006,
pp. 297–304.
[3] W. Zhu and F. Wang, “Reduction and axiomization of covering generalized rough sets,” Information Sciences, vol. 152, no. 1, pp. 217–230, 2003.
[4] Y. Yao and Y. Zhao, “Attribute reduction in decision-theoretic
rough set models,” Information Sciences, vol. 178, no. 17, pp.
3356–3373, 2008.
[5] H. Li, W. Zhang, and H. Wang, “Classification and reduction
of attributes in concept lattices,” in Granular Computing,
2006, pp. 142–147.
[6] Q. Liu, F. Li, F. Min, M. Ye, and G. Yang, “An efficient
reduction algorithm based on new conditional information
entropy,” Control and Decision (in Chinese), vol. 20, no. 8,
pp. 878–882, 2005.
[7] G. Wang, H. Yu, and D. Yang, “Decision table reduction
based on conditional information entropy,” Chinese Journal
of Computers, vol. 2, no. 7, pp. 759–766, 2002.
[8] Q. Hu, D. Yu, J. Liu, and C. Wu, “Neighborhood rough set
based heterogeneous feature subset selection,” Information
Sciences, vol. 178, no. 18, pp. 3577–3594, 2008.
[9] Q. Hu, D. Yu, and Z. Xie, “Numerical attribute reduction
based on neighborhood granulation and rough approximation
(in chinese),” Journal of Software, vol. 19, no. 3, pp. 640–649,
March 2008.
[10] Q. Hu, D. Yu, and Z. Xie, “Neighborhood classifiers,” Expert Systems with Applications, vol. 34, pp. 866–876, 2008.
[11] F. Min, H. He, Y. Qian, and W. Zhu, “Test-cost-sensitive
attribute reduction,” Information Sciences, vol. 181, pp. 4928–
4942, November 2011.
[12] H. He, F. Min, and W. Zhu, “Attribute reduction in test-
cost-sensitive decision systems with common-test-costs,” in
ICMLC, v1, 2011, pp. 432–436.
[13] H. He and F. Min, “Accumulated cost based test-cost-sensitive
attribute reduction,” in RSFDGrC, ser. LNAI, vol. 6743, 2011,
pp. 244–247.
[14] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data
mining to knowledge discovery in databases,” AI Magazine,
vol. 17, pp. 37–54, 1996.
[15] P. D. Turney, “Cost-sensitive classification: Empirical evalu-
ation of a hybrid genetic decision tree induction algorithm,”
Journal of Artificial Intelligence Research, vol. 2, pp. 369–
409, 1995.
[16] F. Min and Q. Liu, “A hierarchical model for test-cost-
sensitive decision systems,” Information Sciences, vol. 179,
no. 14, pp. 2442–2452, 2009.
[17] F. Min, W. Zhu, and H. Zhao, “COSER: Cost-sensitive rough sets, http://grc.fjzs.edu.cn/˜fmin/coser/index.html,” 2011.
[18] D. R. Wilson and T. R. Martinez, “Improved heterogeneous distance functions,” Journal of Artificial Intelligence Research, vol. 6, pp. 1–34, 1997.
[19] Z. Pawlak, “Rough sets,” International Journal of Computer
and Information Sciences, vol. 11, pp. 341–356, 1982.
[20] D. Slezak, “Approximate entropy reducts,” Fundamenta In-
formaticae, vol. 53, no. 3-4, pp. 365–390, 2002.
[21] Y. Qian, J. Liang, W. Pedrycz, and C. Dang, “Positive
approximation: An accelerator for attribute reduction in rough
set theory,” Artificial Intelligence, vol. 174, no. 9-10, pp. 597–
618, 2010.