IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 14, NO. 2, APRIL 2006 191
Fuzzy Probabilistic Approximation Spaces and Their
Information Measures
Qinghua Hu,Daren Yu,Zongxia Xie, and Jinfu Liu
Abstract—Rough set theory has proven to be an efficient tool for modeling and reasoning with uncertain information. By introducing probability into fuzzy approximation spaces, this paper proposes a theory of fuzzy probabilistic approximation spaces, which combines three types of uncertainty (probability, fuzziness, and roughness) into one rough set model. We introduce Shannon's entropy to measure the information quantity implied in a Pawlak approximation space, and then present a novel representation of Shannon's entropy with a relation matrix. Based on the modified formulas, some generalizations of the entropy are proposed to calculate the information in a fuzzy approximation space and a fuzzy probabilistic approximation space, respectively. As a result, this work forms uniform representations of approximation spaces and their information measures.
Index Terms—Approximation space, fuzzy set, information measure, probability distribution, rough set.
I. INTRODUCTION
ROUGH set methodology has witnessed great success in modeling imprecise and incomplete information.
The basic idea of this method hinges on classifying objects of
discourse into classes containing indiscernible objects with re-
spect to some attributes. Then the indiscernible classes, also
called knowledge granules, are used to approximate the unseen
object sets. In this framework, an attribute set is viewed as a
family of knowledge, which partitions the universe into some
knowledge granules or elemental concepts. Any attribute or attribute set $B$ induces a partition $U/B$ of the universe. We say that knowledge $P$ is finer than knowledge $Q$ if $U/P$ is a refinement of $U/Q$. An arbitrary subset $X$ of the universe can be approximated by two sets $\underline{B}X$ and $\overline{B}X$, called the lower approximation and upper approximation, respectively. If $X$ can be precisely approximated by some knowledge granules of the partition, where $\underline{B}X = \overline{B}X$, the set $X$ is called a definable set; otherwise we say $X$ is a rough set. The approximating power of an information system depends on the knowledge $B$. The finer the knowledge is, the more accurately $X$ can be approximated.
This process closely resembles human reasoning. In real life, objects are drawn together by indistinguishability, similarity, or proximity and are named with a concept. A concept system is then formed and used to approximately describe unseen objects. Partition, granulation, and approximation are methods widely used in human reasoning [10], [39]. Rough set methodology presents a novel paradigm for dealing with uncertainty and has been applied to feature selection [1], [2], knowledge reduction [3], [36], [38], rule extraction [4]–[6], uncertainty reasoning [7], [8], and granulation computing [9], [33], [34], [42].
Manuscript received May 16, 2004; revised March 16, 2005 and June 8, 2005.
The authors are with Harbin Institute of Technology, Harbin, Heilongjiang Province 150001, China (e-mail: huqinghua@hcms.hit.edu.cn; yudaren@hcms.hit.edu.cn; xiezongxia@hcms.hit.edu.cn; liujinfu@hcms.hit.edu.cn).
Digital Object Identifier 10.1109/TFUZZ.2005.864086
In Pawlak's rough set model, fuzziness and probability are not taken into consideration. The model works only on nominal data, since crisp equivalence relations and equivalence classes are its foundation [8], [37]. However, real-world applications usually involve real-valued data and fuzzy information. To deal with fuzziness, some generalizations of Pawlak's model were proposed; the theories on
rough and fuzzy sets were put together. Rough-fuzzy sets and
fuzzy-rough sets were introduced in [11], [12], [35] and ana-
lyzed in detail [13]–[16]. The generalized methods were applied
to hybrid data reduction [41], mining stock price [17], vocabu-
lary for information retrieval [18] and fuzzy decision rule ex-
traction [19].
Both the classical rough set theory and its fuzzy generalizations implicitly assume that the objects in the universe are equally probable; namely, the objects are uniformly distributed and the probability of each object is $1/n$, where $n$ is the number of objects. In fact, this assumption only
holds if the information about the probability of the objects
is totally ignored. Sometimes, there is a probability distribu-
tion on the object or event set [23], [40]. A theory on proba-
bilistic approximation space or a probabilistic rough set model
is expected in this case. For example, there is an information
system about the disease flu, which is described with three at-
tributes: headache, muscle pain, and temperature. The values of
the attributes headache, muscle pain and flu are yes and no, and
those of the attribute temperature are high and normal. There
are cases in all. If there are not any samples about the
disease, but a probability distribution of the 16 cases, then the
theory about probabilistic approximation spaces is desirable for
reasoning with uncertainty of roughness and randomness. A probability distribution over the universe lays a foundation for employing statistical techniques in the rough set model, which may lead to a tool for dealing with inconsistency or noise in data.
In the rough set framework, attributes are called knowledge,
which is used to form a concept system of the universe. The knowledge introduced by an attribute set is embodied in the partition of the referential universe. The more knowledge there is, the finer the partition will be, and correspondingly the better the approximation of a subset of the universe. Attributes induce an
order or a structure of universe of discourse, which decreases
uncertainty or chaos of the universe. Given a universe , a prob-
ability distribution on , and some nominal, real-value or fuzzy
attributes, an interesting problem arises: how do we measure the knowledge quantity introduced by an attribute set in the approximation space? In other words, we are interested in con-
structing a measure to compute the discernibility power induced
by a family of attributes. Such a measure makes it possible to compare the knowledge quantities formed by different attributes, and helps us find the important attribute sets and the redundancy of an
information system. Hartley captured the intuitive idea that the more possible outcomes an experiment has, the less predictable it is. Shannon [20] defined a measure for a random variable
within the frame of communication theory. Forte and Kampe
[21], [22] gave an axiomatic information measure, in which the word “information” was associated both with measures of events and with measures of partitions, and suggested that the uncertainty measure is associated with a family of partitions of a given referential space. Zadeh [23] introduced a new uncertainty measure
for fuzzy probabilistic space. Yager introduced some measures
to calculate uncertainty implied in similarity relation [24]. In
[25] a measure, suitable to operate on fuzzy equivalence relation
domains, was introduced. Uncertainty measure on fuzzy parti-
tions was analyzed in documents [26], [27], [37]. In this paper,
Shannon’s entropy is first introduced to compute the knowledge
quantity of nominal attributes in Pawlak’s approximation space,
and then an extended information measure will be presented,
which is suitable for the spaces where fuzzy attributes or fuzzy
relations are defined on. Based on the extension, the solutions
to measuring the information in fuzzy and fuzzy probabilistic
hybrid approximation spaces are presented.
The rest of the paper is organized as follows. Some defini-
tions in classical approximation spaces are reviewed in Sec-
tion II. We introduce fuzzy probabilistic approximation spaces
in Section III. Shannon’s entropy is applied to calculating the
information quantity in a classical approximation space in Sec-
tion IV. Then we redefine the formulae of Shannon’s entropy
with a matrix representation and extend it to the fuzzy cases. The
information measures for fuzzy approximation spaces and fuzzy
probabilistic approximation spaces are presented in Section V.
Finally, the conclusions and discussion are given in Section VI.
II. PRELIMINARIES
In this section, we will review some basic definitions in rough
set theory.
Definition 1: $\langle U, A, V, f \rangle$ is called an approximation space, where $U$ is the universe, a nonempty finite set of objects; $A$ is a family of attributes, also called knowledge about the universe; $V$ is the value domain of $A$; and $f: U \times A \to V$ is an information function.
An approximation space is also called an information system.
Any subset $B$ of knowledge $A$ defines an equivalence (also called indiscernibility) relation on $U$:
$$IND(B) = \{(x, y) \in U \times U \mid \forall a \in B,\; f(x, a) = f(y, a)\}. \qquad (1)$$
$IND(B)$ will generate a partition of $U$. We denote the partition induced by attributes $B$ as
$$U/IND(B) = \{[x]_B \mid x \in U\} \qquad (2)$$
where $[x]_B$ is the equivalence class containing $x$; the elements in $[x]_B$ are indiscernible or equivalent with respect to knowledge $B$. Equivalence classes, also called elemental concepts or information granules, are used to characterize arbitrary subsets of $U$.
Definition 2: An arbitrary subset $X$ of $U$ is characterized by two unions of elemental concepts, $\underline{B}X$ and $\overline{B}X$, called lower and upper approximations, respectively:
$$\underline{B}X = \bigcup\{[x]_B \mid [x]_B \subseteq X\}, \qquad \overline{B}X = \bigcup\{[x]_B \mid [x]_B \cap X \neq \emptyset\}. \qquad (3)$$
The lower approximation $\underline{B}X$ is the greatest union of classes $[x]_B$ contained in $X$, and the upper approximation $\overline{B}X$ is the least union of classes $[x]_B$ containing $X$. The lower approximation is sometimes also called the positive region, denoted $POS_B(X)$.
We say $U/P$ is a refinement of $U/Q$ if there is a partial order
$$U/P \preceq U/Q. \qquad (4)$$
Theorem 1: If $P \subseteq Q$, then $U/Q$ is a refinement of $U/P$.
Theorem 2: If $U/P$ is a refinement of $U/Q$, then $\underline{Q}X \subseteq \underline{P}X$ and $\overline{P}X \subseteq \overline{Q}X$ for any $X \subseteq U$.
If $\underline{B}X = \overline{B}X$, that is to say, $X$ can be accurately characterized with knowledge $B$, we say the set $X$ is definable; otherwise $X$ is indefinable and we say $X$ is a rough set. $BN_B(X) = \overline{B}X - \underline{B}X$ is called the boundary set. A set is definable if it is a finite union of some elemental concepts, which lets it be precisely characterized with respect to knowledge $B$. Theorems 1 and 2 show that the more knowledge we have, the finer the partition we get; accordingly, the more accurately a subset can be approximated and the smaller the boundary will be.
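The lower and upper approximations of Definition 2 can be sketched in a few lines of Python. This is an illustrative reading only, with hypothetical function names; a partition is represented as a list of disjoint sets.

```python
def lower_approx(blocks, X):
    """Union of the equivalence classes (blocks) fully contained in X."""
    return set().union(*([b for b in blocks if b <= X] or [set()]))

def upper_approx(blocks, X):
    """Union of the equivalence classes that intersect X."""
    return set().union(*([b for b in blocks if b & X] or [set()]))

blocks = [{1, 2}, {3}, {4, 5, 6}]     # a partition of U = {1, ..., 6}
X = {1, 2, 3, 4}
print(lower_approx(blocks, X))        # {1, 2, 3}
print(upper_approx(blocks, X))        # {1, 2, 3, 4, 5, 6}
```

Here $X$ is rough because the two approximations differ; their set difference is the boundary.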
Definition 3: Given $B \subseteq A$ and $a \in B$, if $IND(B) = IND(B - \{a\})$, we say knowledge $a$ is redundant in $B$. Otherwise, we say knowledge $a$ is indispensable. If each $a$ in $B$ is indispensable, we say $B$ is independent. If a set $B \subseteq A$ is independent and $IND(B) = IND(A)$, we say $B$ is a reduct of $A$.
A reduct of an information system has the same discernibility
or representation power as that of the original system; however
the reduct has a concise representation with respect to the orig-
inal data.
There is often more than one reduct in an information system.
The common elements of all reducts are called the core of the
information system. The core is the set of attributes that cannot be deleted from the system without decreasing its discernibility.
Definition 4: An information system is called a decision table if the attribute set $A = C \cup D$, where $C$ is the condition attribute set and $D$ is the decision attribute set. We define the dependency between $C$ and $D$ as
$$\gamma_C(D) = \frac{|POS_C(D)|}{|U|} = \frac{\left|\bigcup_i \underline{C}\,Y_i\right|}{|U|} \qquad (5)$$
where $|\cdot|$ denotes the cardinality of a set and $Y_i$ is the $i$th equivalence class induced by $D$. Given $a \in C$, we say $a$ is redundant relative to $D$ in $C$ if $\gamma_C(D) = \gamma_{C - \{a\}}(D)$; otherwise $a$ is indispensable. If every $a \in C$ is indispensable, we say $C$ is independent with respect to the decision $D$.
Dependency measures the capability of the condition attributes to characterize the decision and can be used as a significance measure of condition attributes with respect to the decision. $\gamma_C(D) = 1$ means that the decision can be approximated precisely by the knowledge granules induced by the attribute set $C$.
Definition 5: Given $B \subseteq C$, we say $B$ is the $D$-relative reduct of $C$ if $B$ satisfies
1) $\gamma_B(D) = \gamma_C(D)$;
2) $B$ is independent relative to $D$.
The first term guarantees that the power of $B$ to approximate $D$ is the same as that of $C$; the second term means that there is no redundant attribute in $B$.
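The dependency of Definition 4 can be sketched as follows. This is a hedged illustration under the crisp, uniform-probability setting; `positive_region` and `dependency` are hypothetical names.

```python
def positive_region(c_blocks, d_blocks):
    """POS_C(D): union of the C-classes contained in some D-class."""
    pos = set()
    for b in c_blocks:
        if any(b <= d for d in d_blocks):
            pos |= b
    return pos

def dependency(c_blocks, d_blocks, universe):
    """gamma(C, D) = |POS_C(D)| / |U|, as in (5)."""
    return len(positive_region(c_blocks, d_blocks)) / len(universe)

U = {1, 2, 3, 4, 5, 6}
c_blocks = [{1, 2}, {3, 4}, {5, 6}]   # partition induced by C
d_blocks = [{1, 2, 3}, {4, 5, 6}]     # partition induced by D
print(dependency(c_blocks, d_blocks, U))   # 2/3: the block {3, 4} straddles two decisions
```

A reduct search per Definition 5 would then drop attributes one at a time and keep only those whose removal lowers this value.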
III. FUZZY PROBABILISTIC APPROXIMATION SPACES
Pawlak’s approximation spaces work on the domain where
crisp equivalence relations are defined. In this section, we will
integrate three types of uncertainty: probability, fuzziness, and
roughness together, and present the definition of fuzzy proba-
bilistic approximation spaces.
Definition 6: Given a nonempty finite set $U$, $R$ is a fuzzy binary relation over $U$, denoted by a matrix
$$M(R) = (r_{ij})_{n \times n} \qquad (6)$$
where $r_{ij} = R(x_i, x_j) \in [0, 1]$ is the relation value between $x_i$ and $x_j$. We say $R$ is a fuzzy equivalence relation if, $\forall x, y, z \in U$, $R$ satisfies
1) Reflexivity: $R(x, x) = 1$;
2) Symmetry: $R(x, y) = R(y, x)$;
3) Transitivity: $R(x, z) \geq \max_y \min\{R(x, y), R(y, z)\}$.
Some operations on relation matrices $R_1 = (r_{ij})$ and $R_2 = (s_{ij})$ are defined as
1) $R_1 = R_2 \Leftrightarrow r_{ij} = s_{ij}$, $\forall i, j$;
2) $R_1 \cup R_2 = (\max\{r_{ij}, s_{ij}\})_{n \times n}$;
3) $R_1 \cap R_2 = (\min\{r_{ij}, s_{ij}\})_{n \times n}$;
4) $R_1 \subseteq R_2 \Leftrightarrow r_{ij} \leq s_{ij}$, $\forall i, j$.
A crisp equivalence relation induces a crisp partition of the
universe and generates a family of crisp equivalence classes.
Correspondingly, a fuzzy equivalence relation generates a
fuzzy partition of the universe and a series of fuzzy equivalence
classes, which are also called fuzzy knowledge granules [10],
[39], [43].
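The three properties of a fuzzy equivalence relation can be checked mechanically. The sketch below assumes the usual max-min transitivity, $R(x,z) \ge \max_y \min\{R(x,y), R(y,z)\}$, which is the standard formula for fuzzy equivalence relations; the function names are hypothetical.

```python
def max_min_compose(R):
    """Max-min self-composition: (R o R)[i][j] = max_k min(R[i][k], R[k][j])."""
    n = len(R)
    return [[max(min(R[i][k], R[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_fuzzy_equivalence(R, tol=1e-12):
    """Reflexivity, symmetry, and max-min transitivity (Definition 6)."""
    n = len(R)
    reflexive = all(abs(R[i][i] - 1.0) < tol for i in range(n))
    symmetric = all(abs(R[i][j] - R[j][i]) < tol
                    for i in range(n) for j in range(n))
    comp = max_min_compose(R)
    transitive = all(comp[i][j] <= R[i][j] + tol
                     for i in range(n) for j in range(n))
    return reflexive and symmetric and transitive

R = [[1.0, 0.8, 0.8],
     [0.8, 1.0, 0.9],
     [0.8, 0.9, 1.0]]
print(is_fuzzy_equivalence(R))   # True
```

Transitivity fails, for example, if two objects are each strongly related to a third but weakly related to each other.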
Definition 7: The fuzzy partition of the universe generated by a fuzzy equivalence relation $R$ is defined as
$$U/R = \{[x_i]_R \mid x_i \in U\} \qquad (7)$$
where $[x_i]_R = r_{i1}/x_1 + r_{i2}/x_2 + \cdots + r_{in}/x_n$.
$[x_i]_R$ is the fuzzy equivalence class containing $x_i$, and $r_{ij}$ is the degree of $x_j$ being equivalent to $x_i$. Here, “$+$” means the union of elements.
In this case, $[x_i]_R$ is a fuzzy set, and the family of classes $[x_i]_R$ forms a fuzzy concept system of the universe. This system will be used to approximate object subsets of the universe.
Example 1: Assume is an object set,
is a fuzzy equivalence relation on :
Then, the equivalence classes are
Theorem 3: Given a set $U$ and a fuzzy equivalence relation $R$ on $U$, $\forall x_i, x_j \in U$, we have
1) ;
2)
Theorem 4: Given a set $U$ and two fuzzy equivalence relations $P$ and $Q$ on $U$, we have
Definition 8: A three-tuple $\langle U, P, \mathcal{R} \rangle$ is a fuzzy probabilistic approximation space or a fuzzy probabilistic information system, where $U$ is a nonempty and finite set of objects, called the universe; $P$ is a probability distribution over $U$; and $\mathcal{R}$ is a family of fuzzy equivalence relations defined on $U$.
Definition 9: Given a fuzzy probabilistic approximation space $\langle U, P, \mathcal{R} \rangle$ and a fuzzy subset $X$ of $U$, the lower approximation and upper approximation of $X$ are denoted by $\underline{R}X$ and $\overline{R}X$; the memberships of $x$ to them are defined as
$$\mu_{\underline{R}X}(x) = \inf_{y \in U} \max\{1 - R(x, y),\, \mu_X(y)\}, \qquad \mu_{\overline{R}X}(x) = \sup_{y \in U} \min\{R(x, y),\, \mu_X(y)\} \qquad (8)$$
where $\vee$ and $\wedge$ mean the max and min operators, respectively, and $\mu_X(y)$ means the membership of $y$ to $X$; see [28]. These definitions are rational extensions of several existing models. Let us derive the other models from these definitions.
Case 1: $X$ is a crisp subset of $U$ and $R$ is a crisp equivalence relation on $U$:
These definitions are consistent with Pawlak’s rough set
model in this case.
Case 2: $X$ is a fuzzy subset of $U$ and $R$ is a crisp equivalence relation on $U$:
Here, the rough sets are called rough fuzzy sets.
Case 3: $X$ is a crisp subset of $U$ and $R$ is a fuzzy equivalence relation on $U$:
From the previous analysis, we can conclude that the defini-
tions of lower and upper approximations of fuzzy sets in fuzzy
information systems are rational generalizations of the classical
model.
The membership of an object $x$ to the fuzzy positive region is
$$\mu_{POS_B(D)}(x) = \sup_{Y \in U/D} \mu_{\underline{B}Y}(x). \qquad (9)$$
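Under the standard min/max reading of Dubois and Prade's operators [11], [12] (the exact typeset formula is lost in this copy, so treat this as a hedged sketch with hypothetical names), the memberships of (8) can be computed row by row from the relation matrix:

```python
def lower_membership(R, X, i):
    """Membership of x_i in the lower approximation of fuzzy set X:
    inf over y of max(1 - R(x_i, y), X(y))."""
    return min(max(1.0 - r, x) for r, x in zip(R[i], X))

def upper_membership(R, X, i):
    """Membership of x_i in the upper approximation of X:
    sup over y of min(R(x_i, y), X(y))."""
    return max(min(r, x) for r, x in zip(R[i], X))

R = [[1.0, 0.3], [0.3, 1.0]]   # a fuzzy equivalence relation on U = {x1, x2}
X = [0.9, 0.2]                 # a fuzzy subset of U
print(lower_membership(R, X, 0))   # min(max(0, 0.9), max(0.7, 0.2)) = 0.7
print(upper_membership(R, X, 0))   # max(min(1, 0.9), min(0.3, 0.2)) = 0.9
```

With a crisp relation and crisp $X$ (all values 0 or 1), both formulas collapse to the Pawlak case, as the case analysis above states.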
Definition 10: Given a fuzzy probabilistic information system $\langle U, P, \mathcal{R} \rangle$, where $B$ and $D$ are two subsets of the attribute set, the dependency degree of $D$ on $B$ is defined as
$$\gamma_B(D) = \sum_{i=1}^{n} p(x_i)\, \mu_{POS_B(D)}(x_i). \qquad (10)$$
The difference between fuzzy approximation spaces and fuzzy probabilistic approximation spaces is the introduction of a probability distribution over $U$. This yields a more general extension of Pawlak's approximation space. The classic approximation space takes the uniform-distribution assumption, so $p(x_i) = 1/n$, $i = 1, \ldots, n$. Then
$$\gamma_B(D) = \frac{1}{n}\sum_{i=1}^{n} \mu_{POS_B(D)}(x_i).$$
This formula is the same as that in a fuzzy approximation space [28], which shows that the fuzzy probabilistic approximation space degrades to a fuzzy approximation space when the equal-probability assumption is satisfied.
Definition 11: Given $B \subseteq A$ and $a \in B$, where $U/B$ and $U/(B - \{a\})$ are two fuzzy partitions, we say knowledge $a$ is redundant or superfluous in $B$ if $U/B = U/(B - \{a\})$. Otherwise, we say knowledge $a$ is indispensable. If every $a$ belonging to $B$ is indispensable, we say $B$ is independent. If the attribute subset $B$ is independent and $U/B = U/A$, we say $B$ is a reduct of $A$.
Definition 12: Given $B \subseteq C$, where $D$ is the decision attribute set, $\forall a \in B$, $a$ is redundant in $B$ relative to $D$ if $\gamma_B(D) = \gamma_{B - \{a\}}(D)$; otherwise $a$ is indispensable. $B$ is independent if every $a \in B$ is indispensable; otherwise $B$ is dependent. $B$ is a reduct of $C$ relative to $D$ if $B$ satisfies
1) $\gamma_B(D) = \gamma_C(D)$;
2) $B$ is independent relative to $D$.
Comparing the fuzzy probabilistic approximation space with the fuzzy approximation space, we find that the fundamental difference lies in computing the cardinality of fuzzy sets, such as fuzzy equivalence classes, fuzzy lower approximations, and fuzzy upper approximations. Accordingly, this leads to a difference in defining the dependency function. Finding dependency in data is a fundamental problem in machine learning and data mining, and the difference in dependency leads to great changes in reasoning with uncertainty. In a classical fuzzy approximation space, we assume the objects are uniformly distributed and $p(x_i) = 1/n$. In the fuzzy probabilistic approximation space, the probability of $x_i$ is $p(x_i)$. When $p(x_i) = 1/n$, the fuzzy probabilistic approximation space degrades to a fuzzy approximation space, and if the equivalence relation and the object subset to be approximated are both crisp, we get Pawlak's approximation space.
IV. SHANNON'S ENTROPIES ON PAWLAK'S APPROXIMATION SPACE
In the framework of rough set methodology, knowledge is regarded as the discernibility power of the attributes. An attribute set forms an equivalence relation and correspondingly generates a partition of the universe and a family of concepts. The quantity of knowledge measures the fineness of the partition: the finer the partition is, the more knowledge about the universe we have, and accordingly the finer the approximation we obtain. In this section, we introduce Shannon's information measure to compute the knowledge quantity of a crisp attribute set or a crisp partition of $U$.
Given a universe $U$ and two attribute sets $A$, $B$, we take the partitions $U/A = \{X_1, \ldots, X_m\}$ and $U/B = \{Y_1, \ldots, Y_k\}$ as two random variables in the $\sigma$-algebra. The probability distributions of $A$ and $B$ are defined as
$$P(X_i) = \frac{|X_i|}{|U|}, \quad i = 1, \ldots, m \qquad (11)$$
and
$$P(Y_j) = \frac{|Y_j|}{|U|}, \quad j = 1, \ldots, k. \qquad (12)$$
Correspondingly, the joint probability of $A$ and $B$ is
$$P(X_i \cap Y_j) = \frac{|X_i \cap Y_j|}{|U|}. \qquad (13)$$
Definition 13: The information quantity of an attribute set $B$ is defined as
$$H(B) = -\sum_{j=1}^{k} P(Y_j) \log_2 P(Y_j). \qquad (14)$$
Definition 14: The joint entropy of $A$ and $B$ is defined as
$$H(A, B) = -\sum_{i=1}^{m}\sum_{j=1}^{k} P(X_i \cap Y_j) \log_2 P(X_i \cap Y_j). \qquad (15)$$
Definition 15: The conditional entropy of $B$ conditioned to $A$ is defined as
$$H(B \mid A) = -\sum_{i=1}^{m}\sum_{j=1}^{k} P(X_i \cap Y_j) \log_2 P(Y_j \mid X_i) \qquad (16)$$
where $P(Y_j \mid X_i) = |X_i \cap Y_j| / |X_i|$.
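These three measures can be sketched directly on partitions; the function names are hypothetical, and the conditional form uses the identity $H(B \mid A) = H(A, B) - H(A)$ from Theorem 5.

```python
from math import log2

def entropy(blocks, n):
    """H(B) = -sum_j p_j * log2(p_j) with p_j = |Y_j| / n."""
    return -sum((len(b) / n) * log2(len(b) / n) for b in blocks if b)

def joint_entropy(a_blocks, b_blocks, n):
    """H(A, B): entropy of the nonempty intersections X_i ∩ Y_j."""
    cells = [a & b for a in a_blocks for b in b_blocks if a & b]
    return entropy(cells, n)

def conditional_entropy(a_blocks, b_blocks, n):
    """H(B | A) = H(A, B) - H(A)."""
    return joint_entropy(a_blocks, b_blocks, n) - entropy(a_blocks, n)

A = [{1, 2}, {3, 4}]
B = [{1, 3}, {2, 4}]
print(entropy(A, 4))                  # 1.0
print(joint_entropy(A, B, 4))         # 2.0: the joint partition is four singletons
print(conditional_entropy(A, B, 4))   # 1.0: knowing A tells nothing about B here
```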
Theorem 5: $H(A, B) = H(A) + H(B \mid A)$.
Theorem 6: Given a universe $U$ and two attribute sets $A$ and $B$ on $U$, if $U/A$ is a refinement of $U/B$, then
1) $H(A) \geq H(B)$;
2) $H(B \mid A) = 0$;
3) $H(A, B) = H(A)$.
Proof: The first two terms are straightforward; here we just give the proof of the third term. Since $U/A$ is a refinement of $U/B$, every block $Y_j \in U/B$ is a union of blocks of $U/A$, so each nonempty intersection $X_i \cap Y_j$ equals $X_i$. Hence the joint distribution of $(A, B)$ coincides with the distribution of $A$, and $H(A, B) = H(A)$.
Theorem 7: Given $B \subseteq A$ and $a \in B$, if $a$ is redundant, then $H(B - \{a\}) = H(B)$; otherwise $H(B - \{a\}) < H(B)$.
Proof: If attribute $a$ is redundant, $U/(B - \{a\})$ is the same partition as $U/B$, so the two probability distributions coincide and $H(B - \{a\}) = H(B)$. Otherwise $U/B$ strictly refines $U/(B - \{a\})$, and by Theorem 6, $H(B - \{a\}) < H(B)$.
Theorem 8: Given an approximation space $\langle U, A \rangle$, $B \subseteq A$ is a reduct of $A$ if $B$ satisfies
1) $H(B) = H(A)$;
2) $\forall a \in B$: $H(B - \{a\}) < H(B)$.
Theorem 9: Given a decision table ,,if
, then .
Theorem 10: Given a decision table $\langle U, C \cup D \rangle$, where $C$ is the condition attribute set and $D$ is the decision, $B \subseteq C$, $a \in C$. $a$ is redundant if $H(D \mid C - \{a\}) = H(D \mid C)$; $a$ is indispensable if $H(D \mid C - \{a\}) > H(D \mid C)$. $B$ is a reduct of the decision table if $B$ satisfies
1) $H(D \mid B) = H(D \mid C)$;
2) $\forall a \in B$: $H(D \mid B - \{a\}) > H(D \mid B)$.
Example 2: Consider the decision Table I, where
TABLE I
HIRING DATA
The partitions of by , , , and are
First, we calculate the dependency between the condition at-
tributes and decision attribute with Definition 5 and find that
there are two reducts of the system: and
As we know, information entropy is greater than 0. The
above computation shows the decision attribute can be totally
precisely approximated if we have attribute set .
will make no refinement to the partition by . There-
fore no knowledge will be brought into the system by or
. and and
are less than . According to Theorem 7,
we know is a reduct of the decision table. Analogously
the set is also a reduct.
V. INFORMATION MEASURES ON FUZZY PROBABILISTIC APPROXIMATION SPACES
Shannon’s information entropy just works in the case where a
crisp equivalence relation or a crisp partition is defined. It is suit-
able for Pawlak’s approximation space. In this section, a novel
formula to compute Shannon’s entropy with a crisp relation ma-
trix is presented, and then generalized to the fuzzy cases. Fur-
thermore, we will propose another generalization applicable to
the case where a probability distribution is defined on the uni-
verse and use the proposed entropies to measure the information
in fuzzy probabilistic approximation spaces.
A. Shannon’s Entropy for Crisp Equivalence Relations
Given a crisp approximation space $\langle U, A \rangle$, an arbitrary relation $R$ can be denoted by a relation matrix $M(R) = (r_{ij})_{n \times n}$, where $r_{ij}$ is the relation value between elements $x_i$ and $x_j$. If $R$ satisfies $r_{ii} = 1$; $r_{ij} = r_{ji}$; and $r_{ij} = 1, r_{jk} = 1 \Rightarrow r_{ik} = 1$, we say $R$ is an equivalence relation and $M(R)$ is an equivalence relation matrix.
Then the equivalence class containing $x_i$ with respect to $R$ is written as
$$[x_i]_R = (r_{i1}, r_{i2}, \ldots, r_{in}) \qquad (17)$$
where $r_{ij} = 0$ or $1$. “1” means that $x_j$ is indiscernible from $x_i$ with respect to the relation $R$ and belongs to the equivalence class; “0” means $x_j$ does not belong to the class. The cardinality of $[x_i]_R$ is defined as
$$|[x_i]_R| = \sum_{j=1}^{n} r_{ij}. \qquad (18)$$
Definition 16: Given an approximation space $\langle U, A \rangle$ and an arbitrary equivalence relation $R$ on $U$ denoted by a relation matrix $M(R)$, we define the information measure for relation $R$ as
$$H(R) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_R|}{n} \qquad (19)$$
where $|[x_i]_R| = \sum_{j=1}^{n} r_{ij}$.
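A hedged sketch of this matrix form: the cardinality of the class of $x_i$ is the $i$th row sum of the relation matrix, and the measure averages $-\log_2$ of the relative class sizes. For a crisp equivalence matrix, this agrees with Shannon's entropy of the induced partition, since each class of size $m$ contributes $m$ identical terms.

```python
from math import log2

def matrix_entropy(R):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), where |[x_i]_R|
    is the i-th row sum of the relation matrix (cf. (19))."""
    n = len(R)
    return -sum(log2(sum(row) / n) for row in R) / n

# Crisp relation with classes {x1, x2} and {x3}:
R = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
print(matrix_entropy(R))   # equals -(2/3)log2(2/3) - (1/3)log2(1/3)
```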
Example 3: There is an object set , and a relation
matrix of the set induced by a nominal attribute
Then, the equivalence class of can be written as
And the information entropy of is calculated by
Intuitively, the object set is divided into two classes
. The information quantity of the relation
is
We can find that the information entropy computed with Definition 16 is equivalent to Shannon's entropy for crisp relations.
Theorem 11: Given an approximation space $\langle U, A \rangle$ and $B \subseteq A$, let $R$ be the equivalence relation generated by attributes $B$. Then we have $H(R) = H(B)$.
Theorem 12: Given an approximation space $\langle U, A \rangle$ and $P, Q \subseteq A$, let $R_P$, $R_Q$ be the two equivalence relations generated by attributes $P$ and $Q$, and let $[x_i]_P$ and $[x_i]_Q$ be the equivalence classes containing $x_i$ induced by $R_P$ and $R_Q$. The joint entropy of $P$ and $Q$ is
$$H(P, Q) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P \cap [x_i]_Q|}{n}.$$
Here, $[x_i]_P \cap [x_i]_Q$ means the elementwise minimum of the two class vectors, and $|\cdot|$ means the cardinality in (18).
Theorem 13: With the notation of Theorem 12, the conditional entropy of $Q$ conditioned to $P$ is
$$H(Q \mid P) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P \cap [x_i]_Q|}{|[x_i]_P|}.$$
Proof: Please see the Appendix.
The work above recasts Shannon's information measures in a relation matrix representation. This reformulation brings great advantages when generalizing them to the fuzzy cases.
B. Information Measure for Fuzzy Relations
As we know, fuzziness exists in many real-world applications.
Dubois et al. presented the definitions of fuzzy approximation
spaces [11], [12]. In this section, we will present a generalization of Shannon's entropy. The novel measure has the same form as Shannon's and can work in the case where fuzzy equivalence relations are defined.
Given a finite set $U$, $B$ is a fuzzy attribute set in $U$, which generates a fuzzy equivalence relation $R$ on $U$. The fuzzy relation matrix is denoted by $M(R) = (r_{ij})_{n \times n}$, where $r_{ij}$ is the relation value of $x_i$ and $x_j$.
The fuzzy partition generated by the fuzzy equivalence relation is
$$U/R = \{[x_i]_R \mid x_i \in U\} \qquad (20)$$
where $[x_i]_R = r_{i1}/x_1 + r_{i2}/x_2 + \cdots + r_{in}/x_n$.
Remark that $r_{ij}$ takes a value in the range $[0, 1]$ here. This is the key difference between the crisp set theory and the fuzzy one. As to a fuzzy partition induced by a fuzzy equivalence relation, the equivalence class is a fuzzy set, and “$+$” means the union operator in this case. The cardinality of the fuzzy set $[x_i]_R$ can be calculated with
$$|[x_i]_R| = \sum_{j=1}^{n} r_{ij} \qquad (21)$$
which appears to be a natural generalization of the crisp case.
Definition 17: The information quantity of a fuzzy attribute set or a fuzzy equivalence relation $R$ is defined as
$$H(R) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 p_i \qquad (22)$$
where $p_i = |[x_i]_R|/n$, called a fuzzy relative frequency, and $n$ is the number of objects in $U$.
This measure has the same form as Shannon's one as defined in Definition 16, but it has been generalized to the fuzzy case. The formula of the information measure forms a map $H: M(R) \mapsto \mathbb{R}^{+}$, where $M(R)$ is an equivalence relation matrix and $\mathbb{R}^{+}$ is the nonnegative real-number set. This map builds a foundation on which we can compare the discernibility power, partition power, or approximating power of multiple fuzzy equivalence relations. The entropy value increases monotonically with the discernibility power of the fuzzy attributes.
Definition 18: Given $\langle U, A \rangle$, $P$, $Q$ are two subsets of $A$. $[x_i]_P$ and $[x_i]_Q$ are the fuzzy equivalence classes containing $x_i$ generated by $P$, $Q$, respectively. The joint entropy of $P$ and $Q$ is defined as
$$H(P, Q) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P \cap [x_i]_Q|}{n}. \qquad (23)$$
Definition 19: Given $\langle U, A \rangle$, where $A$ is the fuzzy attribute set and $P$, $Q$ are two subsets of $A$, the conditional entropy of $Q$ conditioned to $P$ is defined as
$$H(Q \mid P) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P \cap [x_i]_Q|}{|[x_i]_P|}. \qquad (24)$$
Theorem 14: $H(Q \mid P) = H(P, Q) - H(P)$.
Theorem 15:
1) $H(R) \geq 0$, where “$=$” holds if and only if $r_{ij} = 1$, $\forall i, j$;
2) $H(P, Q) = H(Q, P)$;
3) $H(P, Q) \geq \max\{H(P), H(Q)\}$;
4) $H(Q \mid P) \geq 0$.
C. Information Quantity on Fuzzy Probabilistic Approximation
Space
Shannon's entropy and the proposed measure work on the assumption that all the objects are equally probable. In practice, the probabilities of elements in the universe differ. In this section, we will give a generalization for the case where a probability distribution is defined on $U$.
Given a fuzzy probabilistic approximation space $\langle U, P, A \rangle$, $A$ is the fuzzy attribute set, which generates a family of fuzzy equivalence relations on $U$; $P$ is the probability distribution over $U$, and $p_i$ is the probability of object $x_i$. A fuzzy equivalence relation $R$ generated by an attribute subset $B \subseteq A$ is denoted by a relation matrix $M(R) = (r_{ij})_{n \times n}$, where $r_{ij} \in [0, 1]$.
Definition 20: The expected cardinality of a fuzzy equivalence class $[x_i]_R$ is defined as
$$\overline{|[x_i]_R|} = \sum_{j=1}^{n} p_j\, r_{ij}. \qquad (25)$$
Definition 21: The information quantity of a fuzzy attribute set or fuzzy equivalence relation $R$ is defined as
$$H(R) = -\sum_{i=1}^{n} p_i \log_2 \overline{|[x_i]_R|}. \qquad (26)$$
This measure is identical to Yager's entropy [24] in form, but different in goal. The information measure we give computes the discernibility power of a fuzzy attribute set or a fuzzy equivalence relation when a probability distribution is defined on $U$, while Yager's entropy measures the semantics of a fuzzy similarity relation.
Here, we present a smooth generalization of the definitions of joint entropy and conditional entropy in Shannon's information theory.
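One consistent reading of Definitions 20-21, sketched below, weights each object by its probability both inside the expected cardinality and outside the logarithm; with $p_i = 1/n$ it reduces to Definition 17. The function name is hypothetical.

```python
from math import log2

def prob_fuzzy_entropy(R, p):
    """H(R) = -sum_i p_i * log2( sum_j p_j * r_ij ): entropy based on the
    expected cardinality of each fuzzy class (cf. (25)-(26))."""
    n = len(R)
    return -sum(p[i] * log2(sum(p[j] * R[i][j] for j in range(n)))
                for i in range(n))

R = [[1.0, 0.3], [0.3, 1.0]]
print(prob_fuzzy_entropy(R, [0.5, 0.5]))   # uniform case: matches Definition 17
print(prob_fuzzy_entropy(R, [0.9, 0.1]))   # a skewed distribution changes the value
```

This is exactly the degradation property claimed above: the same relation carries different information quantities under different probability distributions.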
Definition 22: Given $\langle U, P, A \rangle$, $B_1$, $B_2$ are two subsets of $A$. The fuzzy equivalence relations induced by $B_1$, $B_2$ are denoted by $R = (r_{ij})$ and $S = (s_{ij})$. The joint entropy of $B_1$ and $B_2$ is defined as
$$H(R, S) = -\sum_{i=1}^{n} p_i \log_2 \left( \sum_{j=1}^{n} p_j \min\{r_{ij}, s_{ij}\} \right) \qquad (27)$$
where $\min\{r_{ij}, s_{ij}\}$ is the $(i, j)$ element of $R \cap S$.
Definition 23: The conditional entropy of $S$ to $R$ is defined as
$$H(S \mid R) = -\sum_{i=1}^{n} p_i \log_2 \frac{\sum_{j=1}^{n} p_j \min\{r_{ij}, s_{ij}\}}{\sum_{j=1}^{n} p_j r_{ij}}. \qquad (28)$$
Theorem 16: $H(S \mid R) = H(R, S) - H(R)$.
The forms of the proposed information measures are identical to those of Shannon's; however, they can be used to measure the information generated by a fuzzy attribute set, a fuzzy equivalence relation, or a fuzzy partition.
The previous work presents an information measure for fuzzy
equivalence relations when a probability distribution is defined.
Here, we will apply it to the fuzzy probabilistic approximation
space.
Theorem 17: Given a fuzzy probabilistic approximation space $\langle U, P, A \rangle$, where $A$ is a fuzzy attribute set and $P$ is the probability distribution on $U$, let $B_1$, $B_2$ be two subsets of $A$, and let the fuzzy equivalence relations induced by $B_1$, $B_2$ be denoted by $R$ and $S$. Then, we have
1) ;
2) ;
3) or ;
4) or
Theorem 18: Given a fuzzy information system $\langle U, A \rangle$, $B \subseteq A$, $a \in B$: $H(B - \{a\}) = H(B)$ if $a$ is redundant; $H(B - \{a\}) < H(B)$ if $a$ is indispensable. $B$ is a reduct of $A$ if $B$ satisfies
1) $H(B) = H(A)$;
2) $\forall a \in B$: $H(B - \{a\}) < H(B)$.
Theorem 19: Given a fuzzy decision system $\langle U, C \cup D \rangle$, $B$ is a subset of $C$. $\forall a \in B$: $H(D \mid B - \{a\}) = H(D \mid B)$ if $a$ is redundant in $B$ relative to $D$; $H(D \mid B - \{a\}) > H(D \mid B)$ if $a$ is indispensable. $B$ is a reduct of $C$ relative to $D$ if $B$ satisfies
1) $H(D \mid B) = H(D \mid C)$;
2) $\forall a \in B$: $H(D \mid B - \{a\}) > H(D \mid B)$.
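A hedged sketch of the redundancy test in Theorem 19: drop one attribute's relation, recompute the conditional entropy of the decision, and compare. All helper names are hypothetical, and the entropy form follows (22)-(24) in the uniform-probability case.

```python
from functools import reduce
from math import log2

def fuzzy_entropy(R):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), row-sum cardinality."""
    n = len(R)
    return -sum(log2(sum(row) / n) for row in R) / n

def intersect(R1, R2):
    """Elementwise min of two relation matrices (the relation R1 ∩ R2)."""
    return [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(R1, R2)]

def conditional_entropy(R_cond, R_dec):
    """H(D | B) = H(B ∩ D) - H(B)."""
    return fuzzy_entropy(intersect(R_cond, R_dec)) - fuzzy_entropy(R_cond)

def is_redundant(relations, R_dec, name, tol=1e-10):
    """Attribute `name` is redundant relative to the decision if dropping
    its relation leaves H(D | B) unchanged."""
    full = reduce(intersect, relations.values())
    rest = reduce(intersect, (R for k, R in relations.items() if k != name))
    return abs(conditional_entropy(rest, R_dec)
               - conditional_entropy(full, R_dec)) < tol

# Attribute "b" duplicates "a", so it adds no discernibility:
Ra = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
Rd = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]
print(is_redundant({"a": Ra, "b": [r[:] for r in Ra]}, Rd, "b"))   # True
```

Iterating this check over all attributes, and then over the remaining subset, yields the relative reducts described in Example 4.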
Example 4: Given a set $U$ with a probability distribution $P$, some fuzzy equivalence relations on $U$ are shown as follows:
where , and are fuzzy equivalence ma-
trices induced by fuzzy condition attributes , , and ,
is the relation matrix induced by decision .
First, let’s not take the decision into account, and analyze the
approximation space without the decision .
Looking at the two relations, we find that although their relation matrices are similar, the information quantities are different. The difference comes from the probability distribution of the objects. The probabilities of and are greater than that of .
and are discernible as to , so the total discernibility power of relation is greater than that of , and
and
We have .
We can conclude and are independent and
have the same discernibility power as , respec-
tively. So and are two reducts
From Theorem 19, and are reducts of the ap-
proximation space.
In the same way, we can find and are relative
reducts of the space.
VI. CONCLUSION AND DISCUSSION
The contribution of this paper is twofold. On the one hand, we generalize fuzzy approximation spaces to fuzzy probabilistic approximation spaces by introducing a probability distribution on $U$. On the other hand, we recast Shannon's information mea-
sures into relation matrix representations and extend them to
the fuzzy probabilistic approximation spaces. The proposed def-
initions of fuzzy probabilistic approximation spaces integrate
three types of uncertainty: fuzziness, probability and roughness
into one framework. The analysis shows that the fuzzy proba-
bilistic approximation space will degrade to fuzzy approxima-
tion space if the uniform distribution assumption holds. Further-
more, an approximation space is Pawlak’s one if equivalence
relations and the subsets to be approximated are crisp. Therefore
the fuzzy probabilistic approximation spaces unify the represen-
tations of the approximation spaces. Accordingly, the informa-
tion measures for fuzzy probabilistic approximation spaces give
uniform formulas to calculate the information quantity of the
spaces.
The probability characterizes the uncertainty of randomness of event sets and is an efficient tool for dealing with inconsistency and noise in data. Introducing probability into an approximation space opens a gate for applying statistical techniques to rough set methodology, which may lead to a tool for handling randomness, incompleteness, inconsistency, and vagueness in real-world applications.
APPENDIX
Theorem 13: Given a set $U$ with $n$ elements and two crisp equivalence relation matrices $P$, $Q$, the families of equivalence classes generated by $P$ and $Q$ are denoted by $\{X_s\}$ and $\{Y_t\}$, respectively. The equivalence classes containing $x_i$ are denoted by $[x_i]_P$ and $[x_i]_Q$; then we have
$$H(Q \mid P) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P \cap [x_i]_Q|}{|[x_i]_P|}$$
where $H(Q \mid P)$ is the conditional entropy of Definition 15.
Proof: Because the following properties hold between the classes $[x_i]_P$ and the blocks $X_s$ of the partition generated by $P$:
•	$\forall x_i \in U$, there exists a unique $X_s$ such that $[x_i]_P = X_s$, so $|[x_i]_P| = |X_s|$;
•	$\forall x_i, x_j \in X_s$, $[x_i]_P = [x_j]_P$, so each block $X_s$ contributes $|X_s|$ identical terms to a sum over $i$;
we have
$$-\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P|}{n} = -\sum_{s} \frac{|X_s|}{n} \log_2 \frac{|X_s|}{n} = H(P). \qquad (29)$$
Now we just require proving the joint part. In just the same way, the nonempty intersections $[x_i]_P \cap [x_i]_Q$ are the blocks $X_s \cap Y_t$ of the partition generated by $P \cup Q$, and each such block contributes $|X_s \cap Y_t|$ identical terms, so
$$-\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P \cap [x_i]_Q|}{n} = H(P, Q). \qquad (30)$$
Combining (29) with (30), $H(P, Q) - H(P) = -\frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{|[x_i]_P \cap [x_i]_Q|}{|[x_i]_P|}$, and we reach the conclusion.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for
their helpful and insightful comments and suggestions.
REFERENCES
[1] R. W. Swiniarski and L. Hargis, “Rough sets as a front end of neural-network texture classifiers,” Neurocomput., vol. 36, no. 1–4, pp. 85–102, 2001.
[2] R. W. Swiniarski and A. Skowron, “Rough set methods in feature selection and recognition,” Pattern Recog. Lett., vol. 24, no. 6, pp. 833–849, 2003.
[3] Q. Hu, D. Yu, and Z. Xie, “Reduction algorithms for hybrid data based
on fuzzy rough set approaches,”in Proc. 2004 Int. Conf. Machine
Learning and Cybernetics, pp. 1469–1474.
[4] S. Tsumoto, “Automated extraction of hierarchical decision rules from
clinical databases using rough set model,”Expert Syst. Appl., vol. 24,
no. 2, pp. 189–197, 2003.
[5] N. Zhong, J. Dong, and S. Ohsuga, “Rule discovery by soft induction
techniques,”Neurocomput., vol. 36, no. 1–4, pp. 171–204.
[6] T. P. Hong, L. Tseng, and S. Wang, “Learning rules from incomplete
training examples by rough sets,”Expert Syst. Appl., vol. 22, no. 4, pp.
285–293, 2002.
[7] L. Polkowski and A. Skowron, “Rough mereology: A new paradigm
for approximate reasoning. Intern,”J. Approx. Reason., vol. 15, no. 4,
pp. 333–365, 1996.
[8] Z. Pawlak, “Rough sets, decision algorithms and Bayes’theorem,”Eur.
J. Oper. Res., vol. 136, no. 1, pp. 181–189, 2002.
[9] Z. Pawlak, “Granularity of knowledge, indiscernibility and rough
sets,”in Proc. 1998 IEEE Int. Conf. Fuzzy Systems, 1998, pp.
106–110.
[10] L. Zadeh, “Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic,” Fuzzy Sets Syst., vol. 90, pp. 111–127, 1997.
[11] D. Dubois and H. Prade, “Rough fuzzy sets and fuzzy rough sets,”Int.
J. Gen. Syst., vol. 17, no. 2–3, pp. 191–209, 1990.
[12] D. Dubois and H. Prade, “Putting fuzzy sets and rough sets together,”
in Intelligent Decision Support, R. Slowiniski, Ed. Dordrecht, The
Netherlands: Kluwer, 1992, pp. 203–232.
[13] N. Morsi Nehad and M. M. Yakout, “Axiomatics for fuzzy rough
sets,”Fuzzy Sets Syst., vol. 100, no. 1–3, pp. 327–342, November
16, 1998.
[14] R. Anna Maria and E. E. Kerre, “A comparative study of fuzzy rough
sets,”Fuzzy Sets Syst., vol. 126, no. 2, pp. 137–155, 2002.
[15] W. Wu and W. Zhang, “Constructive and axiomatic approaches of
fuzzy approximation operators,”Inform. Sci., vol. 159, no. 3–4, pp.
233–254, 2004.
[16] J. Mi and W. Zhang, “An axiomatic characterization of a fuzzy gener-
alization of rough sets,”Inform. Sci., vol. 160, no. 1–4, pp. 235–249,
2004.
[17] Y.-F. Wang, “Mining stock price using fuzzy rough set system,”Expert
Syst. Appl., vol. 24, no. 1, pp. 13–23, 2003.
[18] S. Padmini and R. Miguel et al.,“Vocabulary mining for information
retrieval: rough sets and fuzzy sets,”Inform. Process. Manage., vol. 37,
no. 1, pp. 15–38.
[19] Q. Shen and A. Chouchoulas, “A rough-fuzzy approach for generating
classification rules,”Pattern Recog., vol. 35, no. 11, pp. 2425–2438,
2002.
[20] C. Shannon and W. Weaver, The Mathematical Theory of Communi-
cation. Champaign, IL: Univ. Illinois Press, 1964.
[21] B. Forte, “Measure of information: The general axiomatic theory,”
RIRO, vol. R2, no. 3, pp. 63–90, 1969.
[22] J. Kampe de Feriet and B. Forte, “Information etc Probabilite CRAS
Paris,”in ser. A, vol. 265, 1967, pp. 110–114, 143-146.
[23] L. Zadeh, “Probability measures of fuzzy events,”J. Math. Anal. Appl.,
vol. 23, pp. 421–427, 1968.
[24] R. Yager, “Entropy measures under similarity relations,”Int. J. Gen.
Syst., vol. 20, pp. 341–358, 1992.
[25] E. Hernandez and J. Recasens, “A reformulation of entropy in the pres-
ence of indistinguishability operators,”Fuzzy Sets Syst., vol. 128, pp.
185–196, 2002.
[26] R. Mesiar and J. Rybarik, “Entropy of fuzzy partitions: a general
model,”Fuzzy Sets Syst., vol. 99, pp. 73–79, 1998.
[27] C. Bertoluzza, V. Doldi, and G. Naval, “Uncertainty measure on fuzzy
partitions,”Fuzzy Sets Syst., vol. 142, pp. 105–116, 2004.
[28] R. Jensen and Q. Shen, “Fuzzy-rough attribute reduction with appli-
cation to web categorization,”Fuzzy Sets Syst., vol. 141, pp. 469–485,
2004.
[29] J. F. Peters, Z. Pawlak, and A. Skowron, “A rough set approach to mea-
suring information granules,”in Computer Software and Applications
Conf., 2002, pp. 1135–1139.
[30] R. Yager, “On the entropy of fuzzy measures,”IEEE Trans. Fuzzy Syst.,
vol. 8, no. 4, pp. 453–461, Aug. 2000.
[31] S. Greco, B. Matarazzo, and R. Słowin
´ski, “Rough sets methodology
for sorting problems in presence of multiple attributes and criteria,”
Eur. J. Oper. Res., vol. 138, no. 2, pp. 247–259, 2002.
[32] ——, “Fuzzy extension of the rough set approach to multicriteria and multiattribute sorting,” in Preferences and Decisions Under Incomplete Knowledge. Heidelberg, Germany: Physica-Verlag, 2000, pp. 131–151.
[33] Y. Yao, “Information granulation and rough set approximation,” Int. J. Intell. Syst., vol. 16, no. 1, pp. 87–104, 2001.
[34] T. Y. Lin, “From rough sets and neighborhood systems to information granulation and computing in words,” in Proc. Eur. Congr. Intelligent Techniques and Soft Computing, Sep. 8–12, 1997, pp. 1602–1606.
[35] W. Pedrycz, “Shadowed sets: Bridging fuzzy and rough sets,” in Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron, Eds. Berlin, Germany: Springer-Verlag, 1999.
[36] G. Wang, H. Yu, and D. Yang, “Decision table reduction based on conditional information entropy,” Chinese J. Comp., vol. 25, no. 7, pp. 1–9, 2002.
[37] Q. Hu and D. Yu, “Entropies of fuzzy indiscernibility relation and its operations,” Int. J. Uncertainty, Fuzziness Knowledge-Based Syst., vol. 12, no. 5, pp. 575–589, 2004.
[38] D. Li, B. Zhang, and Y. Leung, “On knowledge reduction in inconsistent decision information systems,” Int. J. Uncertainty, Fuzziness Knowledge-Based Syst., vol. 12, no. 5, pp. 651–672, 2004.
[39] L. Zadeh, “Fuzzy logic equals computing with words,” IEEE Trans. Fuzzy Syst., vol. 4, no. 2, pp. 103–111, Apr. 1996.
[40] J. Casasnovas and J. Torrens, “An axiomatic approach to scalar cardinalities of fuzzy sets,” Fuzzy Sets Syst., vol. 133, no. 2, pp. 193–209, 2003.
[41] Q. Hu, D. Yu, and Z. Xie, “Information-preserving hybrid data reduction based on fuzzy-rough techniques,” Pattern Recognit. Lett., vol. 27, no. 5, pp. 414–423, 2006.
[42] L. Zadeh, “A new direction in AI—Toward a computational theory of perceptions,” AI Mag., vol. 22, no. 1, pp. 73–84, 2001.
[43] Y. Zhang, “Constructive granular systems with universal approximation and fast knowledge discovery,” IEEE Trans. Fuzzy Syst., vol. 13, no. 1, pp. 48–57, Feb. 2005.
Qinghua Hu received the M.S. degree in power engi-
neering from Harbin Institute of Technology, Harbin,
China, in 2002. He is currently working toward the
Ph.D. degree at Harbin Institute of Technology.
His research interests are focused on data mining
and knowledge discovery in historical record
database of power plants with fuzzy and rough
techniques. He has authored or coauthored more
than 20 journal and conference papers in the areas of
machine learning, data mining, and rough set theory.
Daren Yu was born in Datong, China, in 1966. He
received the M.Sc. and D.Sc. degrees from Harbin
Institute of Technology, Harbin, China, in 1988 and
1996, respectively.
Since 1988, he has been with the School of Energy
Science and Engineering, Harbin Institute of Tech-
nology. His main research interests are in modeling,
simulation, and control of power systems. He has
published more than one hundred conference and
journal papers on power control and fault diagnosis.
Zongxia Xie , photograph and biography not available at the time of publication.
Jinfu Liu , photograph and biography not available at the time of publication.