IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 14, NO. 2, APRIL 2006
Fuzzy Probabilistic Approximation Spaces and Their
Information Measures
Qinghua Hu, Daren Yu, Zongxia Xie, and Jinfu Liu
Abstract—Rough set theory has proven to be an efficient tool for modeling and reasoning with uncertain information. By introducing probability into fuzzy approximation spaces, a theory of fuzzy probabilistic approximation spaces is proposed in this paper, which combines three types of uncertainty, probability, fuzziness, and roughness, into one rough set model. We introduce Shannon's entropy to measure the information quantity implied in a Pawlak approximation space, and then present a novel representation of Shannon's entropy with a relation matrix. Based on the modified formulas, some generalizations of the entropy are proposed to calculate the information in a fuzzy approximation space and a fuzzy probabilistic approximation space, respectively. As a result, uniform representations of approximation spaces and their information measures are formed with this work.
Index Terms—Approximation space, fuzzy set, information measure, probability distribution, rough set.
I. INTRODUCTION
ROUGH set methodology has witnessed great success in modeling imprecise and incomplete information. The basic idea of the method hinges on classifying the objects of discourse into classes containing indiscernible objects with respect to some attributes. The indiscernible classes, also called knowledge granules, are then used to approximate unseen object sets. In this framework, an attribute set is viewed as a family of knowledge, which partitions the universe into knowledge granules or elemental concepts. Any attribute or attribute set $B$ can induce a partition $U/B$ of the universe $U$. We say that knowledge $P$ is finer than knowledge $Q$ if $U/P$ is a refinement of $U/Q$. An arbitrary subset $X$ of the universe can be approximated by two sets, $\underline{B}X$ and $\overline{B}X$, called the lower approximation and upper approximation, respectively. If $X$ can be precisely approximated by some knowledge granules of the partition, where $\underline{B}X=\overline{B}X$, the set $X$ is called a definable set; otherwise we say $X$ is a rough set. The approximating power of an information system depends on the knowledge $B$: the finer the knowledge is, the more accurately $X$ can be approximated. This process is much like the reasoning of the human mind. In real life, objects are drawn together by indistinguishability, similarity, or proximity and named with a concept. A concept system is then formed and used to approximately describe unseen objects. Partition, granulation, and approximation are methods widely used in human reasoning [10], [39]. Rough set methodology presents a novel paradigm for dealing with uncertainty and has been applied to feature selection [1], [2], knowledge reduction [3], [36], [38], rule extraction [4]–[6], uncertainty reasoning [7], [8], and granular computing [9], [33], [34], [42].

Manuscript received May 16, 2004; revised March 16, 2005 and June 8, 2005.
The authors are with Harbin Institute of Technology, Harbin, Heilongjiang Province 150001, China (e-mail: huqinghua@hcms.hit.edu.cn; yudaren@hcms.hit.edu.cn; xiezongxia@hcms.hit.edu.cn; liujinfu@hcms.hit.edu.cn).
Digital Object Identifier 10.1109/TFUZZ.2005.864086
In Pawlak's rough set model, fuzziness and probability are not taken into consideration. Pawlak's model works only in the nominal data domain, for crisp equivalence relations and equivalence classes are the foundation of the model [8], [37]. However, real-valued data and fuzzy information are common in real-world applications. To deal with fuzziness, some generalizations of Pawlak's model were proposed, putting the theories of rough and fuzzy sets together. Rough-fuzzy sets and fuzzy-rough sets were introduced in [11], [12], [35] and analyzed in detail in [13]–[16]. The generalized methods have been applied to hybrid data reduction [41], mining stock prices [17], vocabulary mining for information retrieval [18], and fuzzy decision rule extraction [19].
Both the theory of classical rough sets and its fuzzy generalizations implicitly make the assumption that the objects in the universe are equally probable; namely, the objects are uniformly distributed and the probability of each object is $1/n$, where $n$ is the number of objects. In fact, this assumption holds only if the information about the probability of the objects is totally ignored. Sometimes there is a probability distribution on the object or event set [23], [40]. A theory of probabilistic approximation spaces, or a probabilistic rough set model, is expected in this case. For example, consider an information system about the disease flu, which is described with three attributes: headache, muscle pain, and temperature. The values of the attributes headache, muscle pain, and flu are yes and no, and those of the attribute temperature are high and normal. There are 16 cases in all. If there are no samples of the disease, but a probability distribution over the 16 cases, then a theory of probabilistic approximation spaces is desirable for reasoning with the combined uncertainty of roughness and randomness. A probability distribution over the universe lays a foundation for employing statistical techniques in the rough set model, which may lead to a tool for dealing with inconsistency or noise in data.
In the rough set framework, attributes are called knowledge, which is used to form a concept system of the universe. The knowledge introduced by an attribute set is embodied in the partition of a referential universe. The more knowledge there is, the finer the partition will be, and correspondingly we can get a better approximation of a subset of the universe. Attributes induce an order, or a structure, on the universe of discourse, which decreases the uncertainty of the universe. Given a universe $U$, a probability distribution on $U$, and some nominal, real-valued, or fuzzy attributes, an interesting problem comes forth: How do we measure the knowledge quantity introduced by an attribute set in the approximation space? In other words, it is interesting to
construct a measure to compute the discernibility power induced by a family of attributes. Such a measure makes it possible to compare the knowledge quantity formed by different attributes, and helps us find the important attribute sets and the redundancy of an information system. Hartley captured the intuitive idea that the more possible results an experiment has, the less it can be predicted. Shannon [20] defined a measure of a random variable within the frame of communication theory. Forte and Kampé de Fériet [21], [22] gave an axiomatic information measure, where the word "information" was associated both with measures of events and with measures of partitions, and suggested that the uncertainty measure is associated with a family of partitions of a given referential space. Zadeh [23] introduced a new uncertainty measure for fuzzy probabilistic spaces. Yager introduced some measures to calculate the uncertainty implied in a similarity relation [24]. In [25], a measure suitable to operate on fuzzy equivalence relation domains was introduced. Uncertainty measures on fuzzy partitions were analyzed in [26], [27], [37]. In this paper, Shannon's entropy is first introduced to compute the knowledge quantity of nominal attributes in Pawlak's approximation space, and then an extended information measure is presented, which is suitable for spaces where fuzzy attributes or fuzzy relations are defined. Based on the extension, solutions to measuring the information in fuzzy and fuzzy probabilistic hybrid approximation spaces are presented.
The rest of the paper is organized as follows. Some definitions for classical approximation spaces are reviewed in Section II. We introduce fuzzy probabilistic approximation spaces in Section III. Shannon's entropy is applied to calculating the information quantity in a classical approximation space in Section IV. We then redefine the formulas of Shannon's entropy with a matrix representation and extend them to the fuzzy cases. The information measures for fuzzy approximation spaces and fuzzy probabilistic approximation spaces are presented in Section V. Finally, conclusions and discussion are given in Section VI.
II. PRELIMINARIES
In this section, we will review some basic definitions in rough set theory.
Definition 1: $\langle U, A, V, f\rangle$ is called an approximation space, where $U$ is the universe, a finite nonempty set of objects; $A$ is a family of attributes, also called knowledge about the universe; $V$ is the value domain of $A$; and $f$ is an information function $f: U\times A\to V$.

An approximation space is also called an information system. Any subset $B$ of knowledge $A$ defines an equivalence (also called indiscernibility) relation on $U$

$$\mathrm{IND}(B)=\{(x,y)\in U\times U \mid \forall a\in B,\ f(x,a)=f(y,a)\}.\qquad(1)$$

$\mathrm{IND}(B)$ will generate a partition of $U$. We denote the partition induced by attributes $B$ as

$$U/\mathrm{IND}(B)=\{[x_i]_B \mid x_i\in U\}\qquad(2)$$

where $[x_i]_B$ is the equivalence class containing $x_i$; the elements in $[x_i]_B$ are indiscernible or equivalent with respect to knowledge $B$. Equivalence classes, also named elemental concepts or information granules, are used to characterize arbitrary subsets of $U$.
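To make the construction concrete, here is a minimal sketch (ours, not the paper's; the toy table and attribute names are hypothetical, echoing the flu illustration of the Introduction) of computing the partition $U/\mathrm{IND}(B)$ of Definition 1:

```python
# A sketch of U/IND(B): group objects by their value vectors on B.
from collections import defaultdict

def partition(table, attrs):
    """Return the blocks of U/IND(B) as sets of object indices."""
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return [set(b) for b in blocks.values()]

# Hypothetical nominal table (not the paper's data).
U = [{"headache": "yes", "temp": "high",   "flu": "yes"},
     {"headache": "yes", "temp": "high",   "flu": "yes"},
     {"headache": "no",  "temp": "normal", "flu": "no"},
     {"headache": "no",  "temp": "high",   "flu": "no"}]

print(partition(U, ["headache"]))          # [{0, 1}, {2, 3}]
print(partition(U, ["headache", "temp"]))  # finer: [{0, 1}, {2}, {3}]
```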
Definition 2: An arbitrary subset $X$ of $U$ is characterized by two unions of elemental concepts, $\underline{B}X$ and $\overline{B}X$, called lower and upper approximations, respectively

$$\underline{B}X=\bigcup\{[x_i]_B \mid [x_i]_B\subseteq X\},\quad \overline{B}X=\bigcup\{[x_i]_B \mid [x_i]_B\cap X\neq\emptyset\}.\qquad(3)$$

The lower approximation $\underline{B}X$ is the greatest union of classes $[x_i]_B$ contained in $X$, and the upper approximation $\overline{B}X$ is the least union of classes $[x_i]_B$ containing $X$. The lower approximation is also sometimes called the positive region, denoted as $\mathrm{POS}_B(X)$.

We say $U/P$ is a refinement of $U/Q$ if there is a partial order

$$U/P\preceq U/Q \iff \forall P_i\in U/P,\ \exists Q_j\in U/Q \text{ such that } P_i\subseteq Q_j.\qquad(4)$$
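A minimal sketch of Definition 2, under the same conventions as the previous snippet (blocks are sets of object indices; the data is hypothetical):

```python
# Lower/upper approximations of X as unions of partition blocks, eq. (3).
def lower_upper(blocks, X):
    lower = set().union(*(b for b in blocks if b <= X))   # blocks inside X
    upper = set().union(*(b for b in blocks if b & X))    # blocks meeting X
    return lower, upper

X = {0, 1, 2}
print(lower_upper([{0, 1}, {2}, {3}], X))  # ({0, 1, 2}, {0, 1, 2}): definable
print(lower_upper([{0, 1}, {2, 3}], X))    # ({0, 1}, {0, 1, 2, 3}): rough
```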
Theorem 1: If $U/P\preceq U/Q$, then for any $X\subseteq U$, $\underline{Q}X\subseteq\underline{P}X$ and $\overline{P}X\subseteq\overline{Q}X$.

Theorem 2: For any $X\subseteq U$, $\underline{B}X\subseteq X\subseteq\overline{B}X$.
If $\underline{B}X=\overline{B}X$, that is to say, $X$ can be accurately characterized with knowledge $B$, we say set $X$ is definable; otherwise, $X$ is indefinable and we say $X$ is a rough set. $BN_B(X)=\overline{B}X-\underline{B}X$ is called the boundary set. A set is definable if it is a finite union of some elemental concepts, which lets $X$ be precisely characterized with respect to knowledge $B$. Theorem 1 shows that the more knowledge we have, the finer the partition we will get; accordingly, the more accurately a subset can be approximated and the smaller the boundary we will get.
Definition 3: Given knowledge $B\subseteq A$ and $a\in B$, if $\mathrm{IND}(B-\{a\})=\mathrm{IND}(B)$, we say knowledge $a$ is redundant in $B$. Otherwise, we say knowledge $a$ is indispensable. If each $a$ in $B$ is indispensable, we say $B$ is independent. If a set $B\subseteq A$ is independent and $\mathrm{IND}(B)=\mathrm{IND}(A)$, we say $B$ is a reduct of $A$.
A reduct of an information system has the same discernibility, or representation power, as the original system; however, the reduct has a more concise representation than the original data.

There is often more than one reduct in an information system. The common elements of all reducts are called the core of the information system. The core is the attribute set that cannot be deleted from the system without decreasing the discernibility of the system.
Definition 4: An information system is called a decision table if the attribute set $A=C\cup D$, where $C$ is the condition attribute set and $D$ is the decision attribute set. We define the dependency between $C$ and $D$ as

$$\gamma_C(D)=\frac{|\mathrm{POS}_C(D)|}{|U|}\qquad(5)$$

where $|\cdot|$ denotes the cardinality of a set and $\mathrm{POS}_C(D)=\bigcup_i \underline{C}Y_i$, where $Y_i$ is the $i$th equivalence class induced by $D$. Given $a\in C$, we say $a$ is redundant relative to $D$ in $C$ if $\gamma_C(D)=\gamma_{C-\{a\}}(D)$; otherwise $a$ is indispensable. If each $a\in C$ is indispensable, we say $C$ is independent with respect to the decision $D$.
Dependency measures the capability of the condition attributes $C$ to characterize the decision $D$ and can be used as a significance measure of condition attributes with respect to the decision. $\gamma_C(D)=1$ means that the decision can be approximated precisely by the knowledge granules induced by the attribute set $C$.

Definition 5: Given $B\subseteq C$, we say $B$ is the $D$-relative reduct if $B$ satisfies

1) $\gamma_B(D)=\gamma_C(D)$;
2) $B$ is independent relative to $D$.

The first term guarantees that the power of $B$ to approximate $D$ is the same as that of $C$; the second term means that there is no redundant attribute in $B$.
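Continuing the same toy sketch (reusing `partition`, `lower_upper`, and the hypothetical table `U` from the snippets above), the dependency of (5) can be computed as:

```python
# gamma_C(D) = |POS_C(D)| / |U|, with POS_C(D) the union of the lower
# approximations of the decision classes, as in Definition 4.
def dependency(table, C, D):
    cond_blocks = partition(table, C)
    pos = set()
    for Y in partition(table, D):              # each decision class
        pos |= lower_upper(cond_blocks, Y)[0]  # its lower approximation
    return len(pos) / len(table)

print(dependency(U, ["headache"], ["flu"]))  # 1.0: decision fully determined
print(dependency(U, ["temp"], ["flu"]))      # 0.25: only one object classified
```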
III. FUZZY PROBABILISTIC APPROXIMATION SPACES
Pawlak's approximation spaces work on domains where crisp equivalence relations are defined. In this section, we will integrate three types of uncertainty, probability, fuzziness, and roughness, and present the definition of fuzzy probabilistic approximation spaces.
Definition 6: Given a nonempty finite set $U$, $R$ is a fuzzy binary relation over $U$, denoted by a matrix

$$M(R)=(r_{ij})_{n\times n}\qquad(6)$$

where $r_{ij}=R(x_i,x_j)\in[0,1]$ is the relation value between $x_i$ and $x_j$. We say $R$ is a fuzzy equivalence relation if, $\forall x_i, x_j, x_k\in U$, $R$ satisfies

1) Reflexivity: $r_{ii}=1$;
2) Symmetry: $r_{ij}=r_{ji}$;
3) Transitivity: $r_{ik}\geq \max_j \min(r_{ij}, r_{jk})$.

Some operations on relation matrices are defined as

1) $R=S \iff r_{ij}=s_{ij}$, $\forall i, j$;
2) $R\cup S=(r_{ij}\vee s_{ij})_{n\times n}$;
3) $R\cap S=(r_{ij}\wedge s_{ij})_{n\times n}$;
4) $R\subseteq S \iff r_{ij}\leq s_{ij}$, $\forall i, j$.
A crisp equivalence relation induces a crisp partition of the
universe and generates a family of crisp equivalence classes.
Correspondingly, a fuzzy equivalence relation generates a
fuzzy partition of the universe and a series of fuzzy equivalence
classes, which are also called fuzzy knowledge granules [10],
[39], [43].
Definition 7: The fuzzy partition of the universe generated by a fuzzy equivalence relation $R$ is defined as

$$U/R=\{[x_i]_R \mid x_i\in U\}\qquad(7)$$

where $[x_i]_R=r_{i1}/x_1+r_{i2}/x_2+\cdots+r_{in}/x_n$. $[x_i]_R$ is the fuzzy equivalence class containing $x_i$, and $r_{ij}$ is the degree to which $x_j$ is equivalent to $x_i$. Here, "$+$" means union of elements. In this case, $[x_i]_R$ is a fuzzy set, and the family of classes $[x_i]_R$ forms a fuzzy concept system of the universe. This system will be used to approximate object subsets of the universe.
Example 1: Assume $U=\{x_1, x_2, \ldots, x_n\}$ is an object set and $R$ is a fuzzy equivalence relation on $U$ given by its relation matrix. Then each equivalence class $[x_i]_R$ is read off from the $i$th row of the matrix.
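Since the matrix of Example 1 is not reproduced in this copy, the following sketch uses a small hypothetical matrix of our own to check the three conditions of Definition 6 and to read off the fuzzy equivalence classes of Definition 7:

```python
import numpy as np

R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.2],
              [0.2, 0.2, 1.0]])

reflexive = bool(np.all(np.diag(R) == 1))
symmetric = bool(np.allclose(R, R.T))
# max-min transitivity: R(x_i, x_k) >= max_j min(R(x_i, x_j), R(x_j, x_k))
closure = np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)
transitive = bool(np.all(R >= closure - 1e-12))
print(reflexive, symmetric, transitive)   # True True True

for i in range(len(R)):                   # row i encodes the class [x_i]_R
    print(f"[x_{i+1}]_R =", R[i])
```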
Theorem 3: Given a set $U$, $R$ is a fuzzy equivalence relation on $U$. $\forall x_i, x_j\in U$, we have

1) $\mu_{[x_i]_R}(x_i)=1$;
2) $\mu_{[x_i]_R}(x_j)=\mu_{[x_j]_R}(x_i)$.

Theorem 4: Given a set $U$, $P$ and $Q$ are two fuzzy equivalence relations on $U$; then $P\cap Q$ is also a fuzzy equivalence relation on $U$, and $[x_i]_{P\cap Q}=[x_i]_P\cap[x_i]_Q$.
Definition 8: A three-tuple $\langle U, \mathbb{P}, \mathcal{R}\rangle$ is a fuzzy probabilistic approximation space, or a fuzzy probabilistic information system, where $U$ is a nonempty and finite set of objects, called the universe; $\mathbb{P}$ is a probability distribution over $U$; and $\mathcal{R}$ is a family of fuzzy equivalence relations defined on $U$.

Definition 9: Given a fuzzy probabilistic approximation space $\langle U, \mathbb{P}, \mathcal{R}\rangle$, $X$ is a fuzzy subset of $U$ and $R\in\mathcal{R}$. The lower approximation and upper approximation of $X$ are denoted by $\underline{R}X$ and $\overline{R}X$; the memberships of $x$ to them are defined as

$$\mu_{\underline{R}X}(x)=\inf_{u\in U}\max\{1-R(x,u),\ \mu_X(u)\},\quad \mu_{\overline{R}X}(x)=\sup_{u\in U}\min\{R(x,u),\ \mu_X(u)\}\qquad(8)$$

where $\inf$ and $\sup$ mean the $\min$ and $\max$ operators, respectively, and $\mu_X(u)$ means the membership of $u$ to $X$; see [28]. These definitions are a rational extension of several models. Let us derive the other models from these definitions.
Case 1: $X$ is a crisp subset and $R$ is a crisp equivalence relation on $U$. Then $\mu_{\underline{R}X}(x)=1$ if $[x]_R\subseteq X$ and $0$ otherwise, and $\mu_{\overline{R}X}(x)=1$ if $[x]_R\cap X\neq\emptyset$ and $0$ otherwise.
These definitions are consistent with Pawlak's rough set model in this case.

Case 2: $X$ is a fuzzy subset of $U$ and $R$ is a crisp equivalence relation on $U$. Then

$$\mu_{\underline{R}X}(x)=\inf_{u\in[x]_R}\mu_X(u),\quad \mu_{\overline{R}X}(x)=\sup_{u\in[x]_R}\mu_X(u).$$

Here, the rough sets are called rough fuzzy sets.

Case 3: $X$ is a crisp subset of $U$ and $R$ is a fuzzy equivalence relation on $U$. Then

$$\mu_{\underline{R}X}(x)=\inf_{u\notin X}(1-R(x,u)),\quad \mu_{\overline{R}X}(x)=\sup_{u\in X}R(x,u).$$

From the previous analysis, we can conclude that the definitions of lower and upper approximations of fuzzy sets in fuzzy information systems are rational generalizations of the classical model.

The membership of an object $x$ to the fuzzy positive region of a decision $D$ with respect to attributes $B$ is

$$\mu_{\mathrm{POS}_B(D)}(x)=\sup_{X\in U/D}\mu_{\underline{B}X}(x).\qquad(9)$$
Definition 10: Given a fuzzy probabilistic information system $\langle U, \mathbb{P}, A\rangle$, $B$ and $D$ are two subsets of the attribute set $A$. The dependency degree of $D$ on $B$ is defined as

$$\gamma_B(D)=\sum_{x\in U}p(x)\,\mu_{\mathrm{POS}_B(D)}(x).\qquad(10)$$

The difference between fuzzy approximation spaces and fuzzy probabilistic approximation spaces is the introduction of a probability distribution $\mathbb{P}$ over $U$. This leads to a more general generalization of Pawlak's approximation space. The classic approximation space makes the uniform-distribution assumption, so $p(x)=1/n$, $\forall x\in U$. Then

$$\gamma_B(D)=\frac{1}{n}\sum_{x\in U}\mu_{\mathrm{POS}_B(D)}(x).$$

This formula is the same as that in a fuzzy approximation space [28], which shows that the fuzzy probabilistic approximation space will degrade to a fuzzy approximation space when the equal-probability assumption is satisfied.
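A minimal sketch of (8)-(10), assuming the min/max reading given in the text; the relation matrix, decision classes, and probabilities below are hypothetical:

```python
import numpy as np

def lower_upper_memberships(R, muX):
    """Eq. (8): fuzzy lower/upper approximation memberships of X."""
    low = np.min(np.maximum(1 - R, muX[None, :]), axis=1)
    up = np.max(np.minimum(R, muX[None, :]), axis=1)
    return low, up

def gamma(R, decision_classes, p):
    """Eqs. (9)-(10): positive-region memberships, probability-weighted."""
    pos = np.max([lower_upper_memberships(R, muX)[0]
                  for muX in decision_classes], axis=0)
    return float(np.sum(p * pos))

R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.2],
              [0.2, 0.2, 1.0]])
Ys = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])]  # crisp decision
print(gamma(R, Ys, np.full(3, 1/3)))              # uniform case: 0.4
print(gamma(R, Ys, np.array([0.5, 0.25, 0.25])))  # probabilities shift it: 0.35
```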
Definition 11: Given $\langle U, \mathbb{P}, A\rangle$, $B\subseteq A$, and $a\in B$, where $U/(B-\{a\})$ and $U/B$ are two fuzzy partitions, we say knowledge $a$ is redundant, or superfluous, in $B$ if $U/(B-\{a\})=U/B$. Otherwise, we say knowledge $a$ is indispensable. If every $a$ belonging to $B$ is indispensable, we say $B$ is independent. If the attribute subset $B$ is independent and $U/B=U/A$, we say $B$ is a reduct of $A$.

Definition 12: Given $\langle U, \mathbb{P}, C\cup D\rangle$, $B$ is a subset of $C$. $\forall a\in B$, $a$ is redundant in $B$ relative to $D$ if $\gamma_{B-\{a\}}(D)=\gamma_B(D)$; otherwise $a$ is indispensable. $B$ is independent if every $a\in B$ is indispensable; otherwise $B$ is dependent. $B$ is a relative reduct of $C$ if $B$ satisfies

1) $\gamma_B(D)=\gamma_C(D)$;
2) $\forall a\in B$: $\gamma_{B-\{a\}}(D)<\gamma_B(D)$.
Comparing the fuzzy probabilistic approximation space with the fuzzy approximation space, we find that the foundational difference is in computing the cardinality of fuzzy sets, such as fuzzy equivalence classes, fuzzy lower approximations, and fuzzy upper approximations. Accordingly, it leads to a difference in defining the dependency function. Finding the dependency in data is a foundational problem in machine learning and data mining, and the difference in dependency will lead to great changes in reasoning with uncertainty. In a classical fuzzy approximation space, we assume the objects are uniformly distributed and $p(x_i)=1/n$. In the fuzzy probabilistic approximation space, the probability of $x_i$ is $p_i$. When the probability $p_i=1/n$, the fuzzy probabilistic approximation space degrades to a fuzzy approximation space, and if the equivalence relation and the object subset to be approximated are both crisp, we get a Pawlak approximation space.
IV. SHANNON'S ENTROPIES ON PAWLAK'S APPROXIMATION SPACE
Knowledge is viewed as the discernibility power of the attributes in the framework of rough set methodology. An attribute set forms an equivalence relation and correspondingly generates a partition of the universe and a family of concepts. The quantity of knowledge measures the fineness degree of the partition: the finer the partition is, the more knowledge about the universe we have, and accordingly the finer an approximation we will have. In this section, we will introduce Shannon's information measure to compute the knowledge quantity of a crisp attribute set, or a crisp partition of $U$.
Given a universe $U$ and two attribute sets $P$, $Q$, we take the partitions $U/P=\{X_1, X_2, \ldots, X_m\}$ and $U/Q=\{Y_1, Y_2, \ldots, Y_k\}$ as two random variables in the $\sigma$-algebra. The probability distributions of $P$ and $Q$ are defined as

$$[P;p]=\begin{pmatrix}X_1 & X_2 & \cdots & X_m\\ p(X_1) & p(X_2) & \cdots & p(X_m)\end{pmatrix}\qquad(11)$$

and

$$[Q;p]=\begin{pmatrix}Y_1 & Y_2 & \cdots & Y_k\\ p(Y_1) & p(Y_2) & \cdots & p(Y_k)\end{pmatrix}\qquad(12)$$

where $p(X_i)=|X_i|/|U|$, $i=1,\ldots,m$, and $p(Y_j)=|Y_j|/|U|$, $j=1,\ldots,k$. Correspondingly, the joint probability distribution of $P$ and $Q$ is

$$[P\cup Q;p]=\begin{pmatrix}X_i\cap Y_j\\ p(X_i\cap Y_j)\end{pmatrix}_{i,j}\qquad(13)$$

where $p(X_i\cap Y_j)=|X_i\cap Y_j|/|U|$.
Definition 13: The information quantity of attributes $P$ is defined as

$$H(P)=-\sum_{i=1}^{m}p(X_i)\log p(X_i).\qquad(14)$$

Definition 14: The joint entropy of $P$ and $Q$ is defined as

$$H(P\cup Q)=-\sum_{i=1}^{m}\sum_{j=1}^{k}p(X_i\cap Y_j)\log p(X_i\cap Y_j).\qquad(15)$$

Definition 15: The conditional entropy of $Q$ conditioned to $P$ is defined as

$$H(Q\mid P)=-\sum_{i=1}^{m}\sum_{j=1}^{k}p(X_i\cap Y_j)\log p(Y_j\mid X_i)\qquad(16)$$

where $p(Y_j\mid X_i)=|X_i\cap Y_j|/|X_i|$.
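A minimal sketch of Definitions 13-15 on two partitions of a four-object universe (the data is ours; base-2 logarithms are assumed):

```python
from math import log2

def H(blocks, n):                     # eq. (14)
    return -sum(len(X) / n * log2(len(X) / n) for X in blocks)

def H_joint(P, Q, n):                 # eq. (15)
    cells = [len(X & Y) for X in P for Y in Q if X & Y]
    return -sum(c / n * log2(c / n) for c in cells)

def H_cond(Q, P, n):                  # eq. (16), via Theorem 5 below
    return H_joint(P, Q, n) - H(P, n)

n, P, Q = 4, [{0, 1}, {2, 3}], [{0, 1}, {2}, {3}]
print(H(P, n), H(Q, n))               # 1.0 1.5
print(H_cond(Q, P, n))                # 0.5
print(H_cond(P, Q, n))                # 0.0: U/Q refines U/P
```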
Theorem 5: $H(Q\mid P)=H(P\cup Q)-H(P)$.

Theorem 6: Given a universe $U$, $P$ and $Q$ are two attribute sets on $U$. If $U/P\preceq U/Q$, then

1) $H(P)\geq H(Q)$;
2) $H(P\cup Q)=H(P)$;
3) $H(Q\mid P)=0$, where $U/P\preceq U/Q$ means $U/P$ is a refinement of $U/Q$.
Proof: The first two terms are straightforward. Here we just give the proof of the third term.

Take the probability distributions of knowledge $P$ and $Q$ to be $[P;p]$ and $[Q;p]$ as above. Without loss of generality, since $U/P\preceq U/Q$, each $Y_j$ is a union of some classes $X_i$, so for any $i$ and $j$ either $X_i\cap Y_j=\emptyset$ or $X_i\subseteq Y_j$. Here we have $p(Y_j\mid X_i)=0$ or $1$, so every nonzero term of (16) vanishes and

$$H(Q\mid P)=0.$$
Theorem 7: Given $a\in B\subseteq A$, if $a$ is redundant, then $H(B-\{a\})=H(B)$; otherwise $H(B-\{a\})<H(B)$.

Proof: Consider the probability distributions of $B$ and $B-\{a\}$. If attribute $a$ is redundant, $U/(B-\{a\})$ is the same as $U/B$, so the two distributions coincide. Then $H(B-\{a\})=H(B)$.
Theorem 8: Given an approximation space $\langle U, A\rangle$, $B\subseteq A$ is a reduct if $B$ satisfies

1) $H(B)=H(A)$;
2) $\forall a\in B$: $H(B-\{a\})<H(B)$.

Theorem 9: Given a decision table $\langle U, C\cup D\rangle$ and $B\subseteq C$, if $\gamma_B(D)=\gamma_C(D)$, then $H(D\mid B)=H(D\mid C)$.

Theorem 10: Given a decision table $\langle U, C\cup D\rangle$, where $C$ is the condition attribute set and $D$ is the decision, $a\in C$, $B\subseteq C$. $a$ is redundant if $H(D\mid C-\{a\})=H(D\mid C)$; $C$ is independent if $\forall a\in C$: $H(D\mid C-\{a\})>H(D\mid C)$. $B$ is a reduct of the decision table if $B$ satisfies

1) $H(D\mid B)=H(D\mid C)$;
2) $\forall a\in B$: $H(D\mid B-\{a\})>H(D\mid B)$.
Example 2: Consider the decision table shown in Table I (hiring data).

TABLE I
HIRING DATA
The partitions of $U$ by the condition attributes and by the decision are computed first. Calculating the dependency between the condition attributes and the decision attribute with Definition 5, we find that there are two reducts of the system.

As we know, information entropy is greater than 0. The above computation shows that the decision attribute can be totally and precisely approximated if we have either reduct attribute set. The remaining attributes make no refinement to the partition induced by a reduct; therefore, no knowledge will be brought into the system by them, and the entropies of the proper subsets of a reduct are less than that of the reduct itself. According to Theorem 7, we know the first attribute set is a reduct of the decision table. Analogously, the second set is also a reduct.
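Since Table I is not reproduced in this copy, the following sketch applies the redundancy test of Theorem 10 to the hypothetical four-row table from the Section II snippets (reusing `partition`, `H`, `H_cond`, and the table `U` defined there):

```python
# H(D|B) computed from the partitions induced by attribute name lists.
def H_cond_attrs(table, B, D):
    n = len(table)
    return H_cond(partition(table, D), partition(table, B), n)

C, D = ["headache", "temp"], ["flu"]
base = H_cond_attrs(U, C, D)
for a in C:
    rest = [b for b in C if b != a]
    status = ("redundant" if abs(H_cond_attrs(U, rest, D) - base) < 1e-12
              else "indispensable")
    print(a, status)   # on this toy data: temp is redundant, headache is not
```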
V. INFORMATION MEASURES ON FUZZY PROBABILISTIC APPROXIMATION SPACES
Shannon's information entropy works only in the case where a crisp equivalence relation, or a crisp partition, is defined; it is suitable for Pawlak's approximation space. In this section, a novel formula to compute Shannon's entropy with a crisp relation matrix is presented and then generalized to the fuzzy cases. Furthermore, we will propose another generalization applicable to the case where a probability distribution is defined on the universe, and use the proposed entropies to measure the information in fuzzy probabilistic approximation spaces.
A. Shannon's Entropy for Crisp Equivalence Relations

Given a crisp approximation space $\langle U, A\rangle$, an arbitrary relation $R$ can be denoted by a relation matrix

$$M(R)=(r_{ij})_{n\times n}$$

where $r_{ij}$ is the relation value between elements $x_i$ and $x_j$. If $R$ satisfies $r_{ii}=1$; $r_{ij}=r_{ji}$; and $r_{ij}=1$, $r_{jk}=1\Rightarrow r_{ik}=1$, then we say $R$ is an equivalence relation and $M(R)$ is an equivalence relation matrix.

Then the equivalence class containing $x_i$ with respect to $R$ is written as

$$[x_i]_R=r_{i1}/x_1+r_{i2}/x_2+\cdots+r_{in}/x_n\qquad(17)$$

where $r_{ij}=0$ or $1$. "$1$" means that $x_j$ is indiscernible from $x_i$ with respect to the relation $R$ and belongs to the equivalence class; "$0$" means $x_j$ does not belong to the class. The cardinality of $[x_i]_R$ is defined as

$$|[x_i]_R|=\sum_{j=1}^{n}r_{ij}.\qquad(18)$$

Definition 16: Given an approximation space $\langle U, A\rangle$ and an arbitrary equivalence relation $R$ on $U$, denoted by a relation matrix $M(R)$, we define the information measure for relation $R$ as

$$H(R)=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_R|}{n}\qquad(19)$$

where $n=|U|$.
Example 3: Consider an object set $U$ and the relation matrix $M(R)$ induced on it by a nominal attribute. The equivalence class of each $x_i$ can be written from the $i$th row of $M(R)$, and the information entropy of $R$ is calculated by (19). Intuitively, the object set is divided into two classes in this example, and computing Shannon's entropy (14) from the corresponding two-block partition gives the same information quantity. We find that the information entropy of Definition 16 is equivalent to Shannon's entropy for crisp relations.
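A minimal sketch of the equivalence claimed above, on a hypothetical crisp relation matrix whose classes have sizes 2, 1, and 1:

```python
import numpy as np
from math import log2

def H_matrix(R):                      # Definition 16, eqs. (18)-(19)
    n = len(R)
    return -sum(log2(c / n) for c in R.sum(axis=1)) / n

R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
print(H_matrix(R))                                         # 1.5
# Shannon's entropy (14) of the partition {{x1, x2}, {x3}, {x4}}:
print(-(2/4*log2(2/4) + 1/4*log2(1/4) + 1/4*log2(1/4)))    # 1.5 as well
```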
Theorem 11: Given an approximation space $\langle U, A\rangle$, $B\subseteq A$, $R$ is the equivalence relation generated by attributes $B$. Then we have $H(R)=H(B)$.

Theorem 12: Given an approximation space $\langle U, A\rangle$, $P, Q\subseteq A$, $R_P$, $R_Q$ are two equivalence relations generated by attributes $P$ and $Q$. $[x_i]_P$ and $[x_i]_Q$ are the equivalence classes containing $x_i$ induced by $P$ and $Q$. The joint entropy of $P$ and $Q$ is

$$H(P\cup Q)=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_P\cap[x_i]_Q|}{n}.$$

Here, $P\cup Q$ means the union of the attribute sets and $[x_i]_P\cap[x_i]_Q$ means the intersection of the two equivalence classes.

Theorem 13: Given an approximation space $\langle U, A\rangle$, $P, Q\subseteq A$, $R_P$, $R_Q$ are two equivalence relations generated by attributes $P$ and $Q$. $[x_i]_P$ and $[x_i]_Q$ are the equivalence classes containing $x_i$ induced by $P$ and $Q$. The conditional entropy of $Q$ conditioned to $P$ is

$$H(Q\mid P)=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_P\cap[x_i]_Q|}{|[x_i]_P|}.$$

Proof: Please see the Appendix.
The previous work reforms Shannon's information measures into a relation matrix representation. This reformation brings great advantages when generalizing them to the fuzzy cases.
B. Information Measure for Fuzzy Relations
As we know, fuzziness exists in many real-world applications. Dubois et al. presented the definitions of fuzzy approximation spaces [11], [12]. In this section, we will present a generalization of Shannon's entropy. The novel measure has the same form as Shannon's and can work in the case where fuzzy equivalence relations are defined.

Given a finite set $U$, $B$ is a fuzzy attribute set in $U$, which generates a fuzzy equivalence relation $R$ on $U$. The fuzzy relation matrix is denoted by

$$M(R)=(r_{ij})_{n\times n}$$

where $r_{ij}$ is the relation value of $x_i$ and $x_j$. The fuzzy partition generated by the fuzzy equivalence relation is

$$U/R=\{[x_i]_R \mid x_i\in U\}\qquad(20)$$

where $[x_i]_R=r_{i1}/x_1+r_{i2}/x_2+\cdots+r_{in}/x_n$.

Remark that $r_{ij}$ takes a value in the range $[0, 1]$ here. This is the key difference between the crisp set theory and the fuzzy one. As to a fuzzy partition induced by a fuzzy equivalence relation, the equivalence class is a fuzzy set, and "$+$" means the union operator in this case. The cardinality of the fuzzy set $[x_i]_R$ can be calculated with

$$|[x_i]_R|=\sum_{j=1}^{n}r_{ij}\qquad(21)$$

which appears to be a natural generalization of the crisp case.
Definition 17: The information quantity of a fuzzy attribute set, or of a fuzzy equivalence relation, is defined as

$$H(R)=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_R|}{n}\qquad(22)$$

where $|[x_i]_R|/n$ is called a fuzzy relative frequency and $n$ is the number of objects in $U$.

This measure has the same form as the Shannon-style one of Definition 16, but it has been generalized to the fuzzy case. The formula of the information measure forms a map $H: M\to \mathbb{R}^+$, where $M$ is an equivalence relation matrix and $\mathbb{R}^+$ is the nonnegative real-number set. This map builds a foundation on which we can compare the discernibility power, partition power, or approximating power of multiple fuzzy equivalence relations. The entropy value increases monotonically with the discernibility power of the fuzzy attributes.
Definition 18: Given $\langle U, A\rangle$, $P$, $Q$ are two subsets of $A$. $[x_i]_P$ and $[x_i]_Q$ are the fuzzy equivalence classes containing $x_i$ generated by $P$ and $Q$, respectively. The joint entropy of $P$ and $Q$ is defined as

$$H(P\cup Q)=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_P\cap[x_i]_Q|}{n}.\qquad(23)$$
Definition 19: Given $\langle U, A\rangle$, $A$ is the fuzzy attribute set, and $P$, $Q$ are two subsets of $A$. The conditional entropy of $Q$ conditioned to $P$ is defined as

$$H(Q\mid P)=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_P\cap[x_i]_Q|}{|[x_i]_P|}.\qquad(24)$$
Theorem 14: $H(Q\mid P)=H(P\cup Q)-H(P)$.

Theorem 15:

1) $H(P)\geq 0$; equality holds if and only if $r_{ij}=1$, $\forall i, j$;
2) $H(P)\leq \log n$;
3) $H(P\cup Q)\geq \max\{H(P), H(Q)\}$;
4) $H(Q\mid P)\geq 0$.
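A minimal sketch of Definitions 17-19, with fuzzy classes intersected by element-wise min, on hypothetical matrices; it checks Theorem 14 numerically:

```python
import numpy as np
from math import log2

def H_fuzzy(R):                                    # eq. (22)
    n = len(R)
    return -sum(log2(c / n) for c in R.sum(axis=1)) / n

def H_fuzzy_joint(P, Q):                           # eq. (23)
    n = len(P)
    return -sum(log2(c / n) for c in np.minimum(P, Q).sum(axis=1)) / n

def H_fuzzy_cond(Q, P):                            # eq. (24): H(Q|P)
    cpq = np.minimum(P, Q).sum(axis=1)
    cp = P.sum(axis=1)
    return -sum(log2(a / b) for a, b in zip(cpq, cp)) / len(P)

P = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.2], [0.2, 0.2, 1.0]])
Q = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.3], [0.3, 0.3, 1.0]])
print(abs(H_fuzzy_cond(Q, P) - (H_fuzzy_joint(P, Q) - H_fuzzy(P))) < 1e-12)
# True, as Theorem 14 asserts
```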
C. Information Quantity on Fuzzy Probabilistic Approximation Space

Shannon's entropy and the proposed measure work on the assumption that all the objects are equally probable. In practice, the probabilities of the elements in the universe differ. In this section, we will give a generalization for the case where a probability distribution is defined on $U$.
Given a fuzzy probabilistic approximation space $\langle U, \mathbb{P}, A\rangle$, $A$ is the fuzzy attribute set, which generates a family of fuzzy equivalence relations on $U$; $\mathbb{P}$ is the probability distribution over $U$, and $p_i$ is the probability of object $x_i$. A fuzzy equivalence relation $R$ generated by the attribute subset $B\subseteq A$ is denoted by a relation matrix

$$M(R)=(r_{ij})_{n\times n}$$

where $r_{ij}\in[0,1]$ and $r_{ij}=R(x_i,x_j)$.
Definition 20: The expected cardinality of a fuzzy equivalence class $[x_i]_R$ is defined as

$$|[x_i]_R|=\sum_{j=1}^{n}p_j\,r_{ij}.\qquad(25)$$

Definition 21: The information quantity of a fuzzy attribute set, or fuzzy equivalence relation, is defined as

$$H(R)=-\sum_{i=1}^{n}p_i\log|[x_i]_R|.\qquad(26)$$
This measure is identical to Yager's entropy [24] in form, but different in goal: the information measure we give is to compute the discernibility power of a fuzzy attribute set, or a fuzzy equivalence relation, when a probability distribution is defined on $U$, while Yager's entropy is to measure the semantics of a fuzzy similarity relation.

Here, we will present a smooth generalization of the definitions of joint entropy and conditional entropy in Shannon's information theory to this probabilistic setting.
Definition 22: Given $\langle U, \mathbb{P}, A\rangle$, $P$, $Q$ are two subsets of $A$. The fuzzy equivalence relations induced by $P$, $Q$ are denoted by $R_P$ and $R_Q$. The joint entropy of $P$ and $Q$ is defined as

$$H(P\cup Q)=-\sum_{i=1}^{n}p_i\log|[x_i]_P\cap[x_i]_Q|\qquad(27)$$

where $|[x_i]_P\cap[x_i]_Q|=\sum_{j=1}^{n}p_j\min\{\mu_{[x_i]_P}(x_j),\ \mu_{[x_i]_Q}(x_j)\}$.

Definition 23: The conditional entropy of $Q$ conditioned to $P$ is defined as

$$H(Q\mid P)=-\sum_{i=1}^{n}p_i\log\frac{|[x_i]_P\cap[x_i]_Q|}{|[x_i]_P|}\qquad(28)$$

where $|[x_i]_P|=\sum_{j=1}^{n}p_j\,\mu_{[x_i]_P}(x_j)$ and $|[x_i]_P\cap[x_i]_Q|$ is as in (27).
Theorem 16: $H(Q\mid P)=H(P\cup Q)-H(P)$.

The forms of the proposed information measures are identical to those of Shannon's; however, they can be used to measure the information generated by a fuzzy attribute set, a fuzzy equivalence relation, or a fuzzy partition.
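A minimal sketch of Definitions 20-23 on the hypothetical matrix used earlier; with a uniform distribution, (26) reproduces the fuzzy entropy (22):

```python
import numpy as np

def H_prob(R, p):
    """Eqs. (25)-(26): expected cardinalities R @ p, probability-weighted."""
    return float(-(p * np.log2(R @ p)).sum())

def H_prob_cond(Q, P, p):
    """Eq. (28): H(Q|P) with min-intersected fuzzy classes."""
    return float(-(p * np.log2((np.minimum(P, Q) @ p) / (P @ p))).sum())

P = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.2], [0.2, 0.2, 1.0]])
print(H_prob(P, np.full(3, 1/3)))            # uniform: equals H_fuzzy(P) above
print(H_prob(P, np.array([0.6, 0.2, 0.2])))  # shifts once objects differ
```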
The previous work presents an information measure for fuzzy equivalence relations when a probability distribution is defined. Here, we will apply it to the fuzzy probabilistic approximation space.
Theorem 17: Given a fuzzy probabilistic approximation space $\langle U, \mathbb{P}, A\rangle$, $A$ is a fuzzy attribute set and $\mathbb{P}$ is the probability distribution on $U$. $P$, $Q$ are two subsets of $A$, and the fuzzy equivalence relations induced by $P$, $Q$ are denoted by $R_P$ and $R_Q$. Then we have

1) $H(P\cup Q)=H(Q\cup P)$;
2) $H(Q\mid P)=H(P\cup Q)-H(P)$;
3) $H(P\cup Q)\geq H(P)$ or $H(Q)$;
4) $H(Q\mid P)\geq 0$ or $H(P\mid Q)\geq 0$.
Theorem 18: Given a fuzzy information system $\langle U, \mathbb{P}, A\rangle$, $B\subseteq A$, $a\in B$: $H(B-\{a\})=H(B)$ if $a$ is redundant; $H(B-\{a\})<H(B)$ if $a$ is indispensable. $B$ is a reduct if $B$ satisfies

1) $H(B)=H(A)$;
2) $\forall a\in B$: $H(B-\{a\})<H(B)$.
Theorem 19: Given a fuzzy information system $\langle U, \mathbb{P}, C\cup D\rangle$, $B$ is a subset of $C$. $\forall a\in B$: $H(D\mid B-\{a\})=H(D\mid B)$ if $a$ is redundant in $B$ relative to $D$; $H(D\mid B-\{a\})>H(D\mid B)$ if $a$ is indispensable. $B$ is a reduct of $C$ relative to $D$ if $B$ satisfies

1) $H(D\mid B)=H(D\mid C)$;
2) $\forall a\in B$: $H(D\mid B-\{a\})>H(D\mid B)$.
Example 4: Given a set $U$ with a probability distribution over its objects, some fuzzy equivalence relations on $U$ are given by their relation matrices as follows:
where the first matrices are fuzzy equivalence matrices induced by the fuzzy condition attributes, and the last is the relation matrix induced by the decision $D$.

First, let us not take the decision into account, and analyze the approximation space without the decision $D$. Looking at the first two relations, we find that, although their relation matrices are similar, their information quantities are different. The difference comes from the probability distribution of the objects: the objects that one relation discerns carry greater probabilities than those the other discerns, so the total discernibility power of the first relation is greater than that of the second, and the computed entropies reflect this ordering.

We can conclude that certain pairs of the condition relations are independent and have the same discernibility power as the whole relation set, respectively; so, by Theorem 18, they form two reducts of the approximation space.

In the same way, using Theorem 19 with the conditional entropies, we can find the relative reducts of the space with respect to the decision $D$.
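A minimal sketch of the redundancy test of Theorem 19, reusing `H_prob_cond` from the previous snippet; relations are combined by element-wise min as in Theorem 4, and all matrices passed in are hypothetical:

```python
import numpy as np

def combine(mats):
    """Element-wise min: the relation induced by the union of attributes."""
    out = mats[0]
    for m in mats[1:]:
        out = np.minimum(out, m)
    return out

def redundant(mats, a, D, p, tol=1e-12):
    """True if H(D | B - {a}) == H(D | B) for the a-th relation in B."""
    rest = [m for i, m in enumerate(mats) if i != a]
    return abs(H_prob_cond(D, combine(rest), p)
               - H_prob_cond(D, combine(mats), p)) < tol
```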
VI. CONCLUSION AND DISCUSSION
The contribution of this paper is two-fold. On the one hand, we generalize fuzzy approximation spaces to fuzzy probabilistic approximation spaces by introducing a probability distribution on $U$. On the other hand, we reform Shannon's information measures into relation matrix representations and extend them to fuzzy probabilistic approximation spaces. The proposed definitions of fuzzy probabilistic approximation spaces integrate three types of uncertainty, fuzziness, probability, and roughness, into one framework. The analysis shows that a fuzzy probabilistic approximation space will degrade to a fuzzy approximation space if the uniform-distribution assumption holds. Furthermore, an approximation space is Pawlak's if the equivalence relations and the subsets to be approximated are crisp. Therefore, fuzzy probabilistic approximation spaces unify the representations of the approximation spaces. Accordingly, the information measures for fuzzy probabilistic approximation spaces give uniform formulas to calculate the information quantity of the spaces.

Probability characterizes the uncertainty of randomness of event sets and is an efficient tool to deal with inconsistency and noise in data. Introducing probability into an approximation space opens a gate for applying statistical techniques to rough set methodology, which may lead to a tool for handling randomness, incompleteness, inconsistency, and vagueness in real-world applications.
APPENDIX
Theorem 13: Given a set $U$ with $n$ elements and two crisp equivalence relation matrices $R_P$ and $R_Q$, the families of equivalence classes generated by $P$ and $Q$ are denoted by $U/P=\{X_1,\ldots,X_m\}$ and $U/Q=\{Y_1,\ldots,Y_k\}$, respectively. The equivalence classes containing $x_i$ are denoted by $[x_i]_P$ and $[x_i]_Q$; then we have

$$H(Q\mid P)=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_P\cap[x_i]_Q|}{|[x_i]_P|}$$

where $H(Q\mid P)$ is the conditional entropy of Definition 15.

Proof: The following properties hold between $U/P=\{X_1,\ldots,X_m\}$ and the classes $[x_i]_P$:

1) $\forall x_i\in U$, there exists exactly one $X_s\in U/P$ such that $[x_i]_P=X_s$, that is, $|[x_i]_P|=|X_s|$;
2) exactly $|X_s|$ objects $x_i$ share the class $X_s$;
3) likewise, $\forall x_i$, there exist $X_s$ and $Y_t$ such that $[x_i]_P\cap[x_i]_Q=X_s\cap Y_t$, and exactly $|X_s\cap Y_t|$ objects share this intersection.

Then we have

$$-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_P|}{n}=-\sum_{s=1}^{m}\frac{|X_s|}{n}\log\frac{|X_s|}{n}=H(P).\qquad(29)$$

Now, we just require proving the joint term. Just as before, grouping the objects by the cells $X_s\cap Y_t$, we have

$$-\frac{1}{n}\sum_{i=1}^{n}\log\frac{|[x_i]_P\cap[x_i]_Q|}{n}=-\sum_{s=1}^{m}\sum_{t=1}^{k}\frac{|X_s\cap Y_t|}{n}\log\frac{|X_s\cap Y_t|}{n}=H(P\cup Q).\qquad(30)$$

Combining (29) with (30) and Theorem 5, we can reach the conclusion.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for
their helpful and insightful comments and suggestions.
REFERENCES
[1] R. W. Swiniarski and L. Hargis, "Rough sets as a front end of neural-networks texture classifiers," Neurocomput., vol. 36, no. 1–4, pp. 85–102, 2001.
[2] R. W. Swiniarski and A. Skowron, "Rough set methods in feature selection and recognition," Pattern Recog. Lett., vol. 24, no. 6, pp. 833–849, 2003.
[3] Q. Hu, D. Yu, and Z. Xie, "Reduction algorithms for hybrid data based on fuzzy rough set approaches," in Proc. 2004 Int. Conf. Machine Learning and Cybernetics, pp. 1469–1474.
[4] S. Tsumoto, "Automated extraction of hierarchical decision rules from clinical databases using rough set model," Expert Syst. Appl., vol. 24, no. 2, pp. 189–197, 2003.
[5] N. Zhong, J. Dong, and S. Ohsuga, "Rule discovery by soft induction techniques," Neurocomput., vol. 36, no. 1–4, pp. 171–204, 2001.
[6] T. P. Hong, L. Tseng, and S. Wang, "Learning rules from incomplete training examples by rough sets," Expert Syst. Appl., vol. 22, no. 4, pp. 285–293, 2002.
[7] L. Polkowski and A. Skowron, "Rough mereology: A new paradigm for approximate reasoning," Int. J. Approx. Reason., vol. 15, no. 4, pp. 333–365, 1996.
[8] Z. Pawlak, "Rough sets, decision algorithms and Bayes' theorem," Eur. J. Oper. Res., vol. 136, no. 1, pp. 181–189, 2002.
[9] Z. Pawlak, "Granularity of knowledge, indiscernibility and rough sets," in Proc. 1998 IEEE Int. Conf. Fuzzy Systems, 1998, pp. 106–110.
[10] L. Zadeh, "Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic," Fuzzy Sets Syst., vol. 90, no. 2, pp. 111–127, 1997.
[11] D. Dubois and H. Prade, "Rough fuzzy sets and fuzzy rough sets," Int. J. Gen. Syst., vol. 17, no. 2–3, pp. 191–209, 1990.
[12] D. Dubois and H. Prade, "Putting fuzzy sets and rough sets together," in Intelligent Decision Support, R. Slowinski, Ed. Dordrecht, The Netherlands: Kluwer, 1992, pp. 203–232.
[13] N. N. Morsi and M. M. Yakout, "Axiomatics for fuzzy rough sets," Fuzzy Sets Syst., vol. 100, no. 1–3, pp. 327–342, 1998.
[14] A. M. Radzikowska and E. E. Kerre, "A comparative study of fuzzy rough sets," Fuzzy Sets Syst., vol. 126, no. 2, pp. 137–155, 2002.
[15] W. Wu and W. Zhang, "Constructive and axiomatic approaches of fuzzy approximation operators," Inform. Sci., vol. 159, no. 3–4, pp. 233–254, 2004.
[16] J. Mi and W. Zhang, "An axiomatic characterization of a fuzzy generalization of rough sets," Inform. Sci., vol. 160, no. 1–4, pp. 235–249, 2004.
[17] Y.-F. Wang, "Mining stock price using fuzzy rough set system," Expert Syst. Appl., vol. 24, no. 1, pp. 13–23, 2003.
[18] P. Srinivasan, M. E. Ruiz, et al., "Vocabulary mining for information retrieval: Rough sets and fuzzy sets," Inform. Process. Manage., vol. 37, no. 1, pp. 15–38, 2001.
[19] Q. Shen and A. Chouchoulas, "A rough-fuzzy approach for generating classification rules," Pattern Recog., vol. 35, no. 11, pp. 2425–2438, 2002.
[20] C. Shannon and W. Weaver, The Mathematical Theory of Communication. Champaign, IL: Univ. Illinois Press, 1964.
[21] B. Forte, "Measure of information: The general axiomatic theory," RIRO, vol. R2, no. 3, pp. 63–90, 1969.
[22] J. Kampé de Fériet and B. Forte, "Information et probabilité," C. R. Acad. Sci. Paris, ser. A, vol. 265, pp. 110–114, 143–146, 1967.
[23] L. Zadeh, "Probability measures of fuzzy events," J. Math. Anal. Appl., vol. 23, pp. 421–427, 1968.
[24] R. Yager, "Entropy measures under similarity relations," Int. J. Gen. Syst., vol. 20, pp. 341–358, 1992.
[25] E. Hernandez and J. Recasens, "A reformulation of entropy in the presence of indistinguishability operators," Fuzzy Sets Syst., vol. 128, pp. 185–196, 2002.
[26] R. Mesiar and J. Rybarik, "Entropy of fuzzy partitions: A general model," Fuzzy Sets Syst., vol. 99, pp. 73–79, 1998.
[27] C. Bertoluzza, V. Doldi, and G. Naval, "Uncertainty measure on fuzzy partitions," Fuzzy Sets Syst., vol. 142, pp. 105–116, 2004.
[28] R. Jensen and Q. Shen, "Fuzzy-rough attribute reduction with application to web categorization," Fuzzy Sets Syst., vol. 141, pp. 469–485, 2004.
[29] J. F. Peters, Z. Pawlak, and A. Skowron, "A rough set approach to measuring information granules," in Proc. Computer Software and Applications Conf., 2002, pp. 1135–1139.
[30] R. Yager, "On the entropy of fuzzy measures," IEEE Trans. Fuzzy Syst., vol. 8, no. 4, pp. 453–461, Aug. 2000.
[31] S. Greco, B. Matarazzo, and R. Słowiński, "Rough sets methodology for sorting problems in presence of multiple attributes and criteria," Eur. J. Oper. Res., vol. 138, no. 2, pp. 247–259, 2002.
[32] ——, "Fuzzy extension of the rough set approach to multicriteria and multiattribute sorting," in Preferences and Decisions Under Incomplete Knowledge. Heidelberg, Germany: Physica-Verlag, 2000, pp. 131–151.
[33] Y. Yao, "Information granulation and rough set approximation," Int. J. Intell. Syst., vol. 16, no. 1, pp. 87–104, 2001.
[34] T. Y. Lin, "From rough sets and neighborhood systems to information granulation and computing in words," in Proc. Eur. Congr. Intelligent Techniques and Soft Computing, Sep. 8–12, 1997, pp. 1602–1606.
[35] W. Pedrycz, "Shadowed sets: Bridging fuzzy and rough sets," in Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron, Eds. Berlin, Germany: Springer-Verlag, 1999.
[36] G. Wang, H. Yu, and D. Yang, "Decision table reduction based on conditional information entropy," Chinese J. Comput., vol. 25, no. 7, pp. 1–9, 2002.
[37] Q. Hu and D. Yu, "Entropies of fuzzy indiscernibility relation and its operations," Int. J. Uncertainty, Fuzziness Knowledge-Based Syst., vol. 12, no. 5, pp. 575–589, 2004.
[38] D. Li, B. Zhang, and Y. Leung, "On knowledge reduction in inconsistent decision information systems," Int. J. Uncertainty, Fuzziness Knowledge-Based Syst., vol. 12, no. 5, pp. 651–672, 2004.
[39] L. Zadeh, "Fuzzy logic equals computing with words," IEEE Trans. Fuzzy Syst., vol. 4, no. 2, pp. 103–111, Apr. 1996.
[40] J. Casasnovas and J. Torrens, "An axiomatic approach to scalar cardinalities of fuzzy sets," Fuzzy Sets Syst., vol. 133, no. 2, pp. 193–209, 2003.
[41] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognit. Lett., vol. 27, no. 5, pp. 414–423, 2006.
[42] L. Zadeh, "A new direction in AI: Toward a computational theory of perceptions," AI Mag., vol. 22, no. 1, pp. 73–84, 2001.
[43] Y. Zhang, "Constructive granular systems with universal approximation and fast knowledge discovery," IEEE Trans. Fuzzy Syst., vol. 13, no. 1, pp. 48–57, Feb. 2005.
Qinghua Hu received the M.S. degree in power engineering from Harbin Institute of Technology, Harbin, China, in 2002. He is currently working toward the Ph.D. degree at Harbin Institute of Technology.
His research interests are focused on data mining and knowledge discovery in historical record databases of power plants with fuzzy and rough techniques. He has authored or coauthored more than 20 journal and conference papers in the areas of machine learning, data mining, and rough set theory.
Daren Yu was born in Datong, China, in 1966. He received the M.Sc. and D.Sc. degrees from Harbin Institute of Technology, Harbin, China, in 1988 and 1996, respectively.
Since 1988, he has been with the School of Energy Science and Engineering, Harbin Institute of Technology. His main research interests are in modeling, simulation, and control of power systems. He has published more than one hundred conference and journal papers on power control and fault diagnosis.
Zongxia Xie , photograph and biography not available at the time of publication.
Jinfu Liu , photograph and biography not available at the time of publication.