Cost-sensitive methods of constructing hierarchical classifiers

Wojciech Penar (1) and Michal Wozniak (2)

(1) Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
(2) Chair of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
Email: [wojciech.penar, michal.wozniak]@pwr.wroc.pl
DOI: 10.1111/j.1468-0394.2010.00515.x
Abstract: The cost of the future exploitation of a decision support system plays a key role in its design. This paper deals with the problem of feature value acquisition cost for such systems. We present a modification of a cost-sensitive learning method for decision-tree induction with a fixed attribute acquisition cost limit. Properties of the concept are established in computer experiments conducted on chosen benchmark databases from the UCI Machine Learning Repository and on a real medical decision task. The results confirm that, for some decision problems, our proposition allows us to obtain a classifier of the same quality as one obtained without a cost limit, but cheaper to exploit.
Keywords: decision tree, cost-sensitive classifier, cost limit
1. Introduction
During the design of computer decision support systems, the cost of their design and exploitation plays a key role. The cost of exploitation can be considered as the expense of incorrect diagnosis or the expense of feature value acquisition. The first problem is typical for decision theory, where we want to find the classifier for which the cost of misclassification is the lowest (Duda et al., 2001). Of course, we have to define a so-called loss function in advance, which gives the cost of each kind of misclassification. In this paper we concentrate on the case where the cost depends on the real expense of acquiring feature values for decision making (Turney, 1995; Greiner et al., 2002). This cost can be measured in monetary units or time units. A typical example of cost-sensitive classification is medical diagnosis, where physicians would like to balance the costs of various tests with the expected benefits, or have to make a diagnosis quickly on the basis of low-cost (quickly measured) features because therapeutic action must be taken without delay. Let us note that for many decision tasks there is no difficulty in making high-quality medical decisions on the basis of expensive medical tests; one of the main tasks in designing computer-aided diagnosis is therefore finding the balance between the cost of exploitation and the quality of diagnosis. As we stated, the problem of cost-sensitive decision making arises frequently in medicine (Núñez, 1988), industrial production processes (Verdenius, 1991), robotics (Tan & Schlimmer, 1989), technological diagnosis (Lirov & Yue, 1991) and many other fields, e.g. electronic equipment testing and real-time computer systems. Additionally, for many typical diagnosis
systems we cannot exceed a cost limit, which means that the maximum diagnosis cost (or time) is fixed. Our paper addresses the most popular inductive learning method (Mitchell, 1997), top-down decision-tree induction. We propose cost-sensitive modifications of the decision-tree induction algorithm that respect, on the one hand, the cost of feature value acquisition and, on the other, a maximum cost limit.
The content of the work is as follows. Section 2 presents the idea of the decision-tree induction algorithm and related work that takes cost into account during the learning process. Section 3 describes our modification of the cost-sensitive approach to the decision-tree induction algorithm. In the following section the results of experimental investigations of the algorithms are presented. The last part concludes the paper.
2. Cost-sensitive decision-tree learning
Let us discuss the methods used in this paper. First we present the idea of the top-down decision-tree induction algorithm (Quinlan, 1986); related modifications of the algorithm that respect the feature value acquisition cost will then be presented. The basic idea involved in any multi-stage approach is to break up a complex decision into several simpler classifications (Safavian & Landgrebe, 1991). Decision-tree classifiers are one possible approach to multi-stage pattern recognition. The synthesis of a hierarchical classifier is a complex problem (Burduk, 2010). It involves specification of the following components (Mui & Fu, 1980):
(1) design of a decision-tree structure;
(2) feature selection used at each non-terminal
node of the decision tree;
(3) choice of decision rules for performing the
classification.
The approach under consideration is based on the top-down induction algorithm and focuses on decision-tree construction. In each node only one test, on an individual feature, is performed. Decision-tree induction algorithms have been used for many years (Mitchell, 1997; Alpaydin, 2010). In general, they approximate a discrete-valued target function, an approach well adapted to the classification task. Decision-tree induction is one of the most important methods for classification and achieves very good classification quality in many practical decision support systems. Many decision-tree algorithms have been developed; the most famous are CART (Breiman et al., 1984), ID3 (Quinlan, 1986) and its successor C4.5 (Quinlan, 1993).
2.1. Idea of the ID3 learning algorithm
As we mentioned above, ID3 is a typical decision-tree algorithm. It introduces information entropy as the measure for choosing the splitting attribute. It builds the tree from the root down to the leaves, in the top-down sequence shown in the pseudocode presented below.
function ID3(examples, target_concept, attributes)
parameters: examples - the learning set; target_concept - the concept to be learned; attributes - the list of attributes
[1] Create a Root node for the tree
[2] IF all examples belong to the same class
[3] THEN return the single-node tree Root with this class label
[4] IF the set of attributes is empty
[5] THEN return the single-node tree Root with label = most common value of the label in examples
[6] Choose 'the best' attribute A from the set of attributes
[7] FOR EACH possible value vi of attribute A
[8] Add a new tree branch below Root, corresponding to the test A = vi
[9] Let Evi be the subset of examples that have value vi for A
[10] IF Evi is empty
[11] THEN below this new branch add a leaf node with label = most common value of the label in examples
[12] ELSE below this new branch add a new subtree by calling ID3(Evi, target_concept, attributes - {A})
[13] END
[14] RETURN Root
The central choice in the ID3 algorithm is selecting the best attribute to test at each node of the tree. The algorithm uses the information gain, which measures how well a given attribute separates the training examples according to the target classification. This measure is based on the Shannon entropy of the set S:

entropy(S) = -\sum_{i=1}^{M} p_i \log_2 p_i    (1)

where p_i is the proportion of S belonging to class i. The information gain of an attribute A relative to the collection of examples S is defined as

gain(S, A) = entropy(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} entropy(S_v)    (2)

where values(A) is the set of all possible values of attribute A and S_v is the subset of S for which A = v.
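To make the above concrete, the following is a minimal Python sketch of equations (1) and (2) and of the ID3 pseudocode, assuming discrete-valued attributes; examples are represented as dictionaries mapping attribute names to values, and all identifiers are illustrative rather than taken from any published implementation.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, equation (1)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(examples, labels, attr):
    """Information gain of attribute attr, equation (2).
    examples is a list of dicts mapping attribute name -> value."""
    n = len(examples)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [lab for e, lab in zip(examples, labels) if e[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    """Top-down induction: returns a class label (leaf) or a pair
    (attribute, {value: subtree}) for an internal node."""
    if len(set(labels)) == 1:           # steps [2]-[3]: all examples in one class
        return labels[0]
    if not attributes:                  # steps [4]-[5]: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, labels, a))   # step [6]
    branches = {}
    # steps [7]-[12]; only observed values are branched on, so the empty
    # subset handled in steps [10]-[11] cannot occur in this sketch
    for v in {e[best] for e in examples}:
        pairs = [(e, lab) for e, lab in zip(examples, labels) if e[best] == v]
        sub_ex, sub_lab = zip(*pairs)
        branches[v] = id3(list(sub_ex), list(sub_lab),
                          [a for a in attributes if a != best])
    return (best, branches)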
2.2. Cost-sensitive modification of the information gain function

As we mentioned earlier, the C4.5 algorithm is an extended version of ID3. It improves the attribute selection measure, avoids data overfitting, reduces error by pruning, handles attributes with different weights, improves computing efficiency, handles missing values and continuous attributes, and performs other functions. Instead of the information gain of ID3, C4.5 uses an information gain ratio (Quinlan, 1993). One of the main advantages of this kind of decision tree is that the obtained tree can easily be converted into a set of rules (each path from the root to a leaf is a decision rule). This form of knowledge is the most popular knowledge representation in expert systems (Liebowitz, 1998). Several modifications of the function that evaluates attributes, taking the feature acquisition cost into consideration, can be found in the literature; some well-known propositions are presented in Table 1.

Table 1: Cost-sensitive modifications of the information gain function

Name (source) | Measure | Comment
EG2 (Núñez, 1991) | ICF(S, A) = \frac{2^{gain(S, A)} - 1}{(cost(A) + 1)^\omega} | ω is the strength of the bias toward the lowest-cost attributes. For ω = 0 the feature acquisition cost is ignored and ICF behaves like the gain function; for ω = 1 the cost plays the most important role.
CS-ID3 (Tan & Schlimmer, 1989, 1990; Tan, 1993) | CS(S, A) = \frac{gain(S, A)^2}{cost(A)} |
IDX (Norton, 1989) | IDX(S, A) = \frac{gain(S, A)}{cost(A)} |
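Expressed as code, the three measures of Table 1 might look as follows; this sketch reuses the gain() function from the previous listing, takes attribute costs as a dictionary, and the function names are our own rather than taken from any of the cited papers.

def icf(examples, labels, attr, cost, omega):
    """EG2 (Núñez, 1991): Information Cost Function; omega in [0, 1]
    controls the strength of the bias toward low-cost attributes."""
    return (2 ** gain(examples, labels, attr) - 1) / (cost[attr] + 1) ** omega

def cs_id3_measure(examples, labels, attr, cost):
    """CS-ID3 (Tan & Schlimmer): squared gain per unit cost."""
    return gain(examples, labels, attr) ** 2 / cost[attr]

def idx_measure(examples, labels, attr, cost):
    """IDX (Norton, 1989): gain per unit cost."""
    return gain(examples, labels, attr) / cost[attr]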
3. Propositions of the new method
We propose the following modification of the decision-tree algorithm to respect the cost limit. We use the proposition known as EG2 (Núñez, 1991), with various values of ω, instead of the information gain function to determine which attribute is chosen for node creation (Penar & Wozniak, 2007). First, we add the following input parameters:

- COST-LIMIT, the limit on the cost connected with classifier exploitation (an initial value of COST-LIMIT is fixed by experts);
- MAX-COST-LIMIT, the maximum cost limit connected with classifier exploitation;
- STEP, the increment by which the cost limit is raised;
- EXPECTED-QUALITY, the required quality of the classifier.
We then propose to modify lines [4] to [6] of the ID3 algorithm; the same applies to any other algorithm whose code uses a function evaluating the information gain of the chosen attribute (for C4.5, for example, the function that is modified is the so-called information gain ratio). The modified steps read:

[4] IF the set of attributes is empty, or the cost of the previously chosen attributes plus the cost of every remaining attribute exceeds COST-LIMIT
[5] THEN return the single-node tree Root with label = most common value of the label in examples
[6] Choose 'the best' attribute A from the set of attributes for which the summarized cost of attributes does not exceed the cost limit
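One possible reading of the modified steps [4] to [6] in code: the attribute choice is restricted to the attributes whose acquisition cost still fits within the remaining budget, here scored with the EG2 measure from the previous listing; the helper below and its signature are assumptions made for illustration.

def choose_attribute(examples, labels, attributes, cost, spent, cost_limit, omega=1.0):
    """Return the best affordable attribute, or None if no attribute fits the
    limit (the caller then creates a majority-class leaf, as in step [5]).
    spent is the summarized cost of the attributes already chosen on the path."""
    affordable = [a for a in attributes if spent + cost[a] <= cost_limit]
    if not affordable:
        return None
    return max(affordable, key=lambda a: icf(examples, labels, a, cost, omega))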
Let CS-ID3 be the name of the top-down induction algorithm modified in this way. Taking advantage of this algorithm, we propose the following wrapper procedure:

[1] Call CS-ID3(examples, target_concept, attributes, COST-LIMIT)
[2] Evaluate the obtained tree
[3] IF quality of the classifier < EXPECTED-QUALITY
[4] THEN COST-LIMIT := COST-LIMIT + STEP
[5] ELSE return tree
[6] FI
[7] IF COST-LIMIT exceeds MAX-COST-LIMIT
[8] THEN return tree and the information that EXPECTED-QUALITY was not achieved
[9] FI
[10] Go to [1]
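The wrapper can be sketched in Python as follows, where cs_id3() and evaluate() stand for the cost-sensitive induction procedure and some quality estimate (e.g. cross-validated accuracy); both names are placeholders, not a published API.

def train_with_cost_limit(examples, labels, attributes, cost,
                          cost_limit, max_cost_limit, step, expected_quality):
    """Raise the cost limit by step until the induced tree reaches the
    expected quality or the maximum cost limit is exceeded."""
    while True:
        tree = cs_id3(examples, labels, attributes, cost, cost_limit)  # [1], assumed helper
        if evaluate(tree, examples, labels) >= expected_quality:       # [2]-[5], assumed helper
            return tree, True
        cost_limit += step                                             # [4]
        if cost_limit > max_cost_limit:                                # [7]-[8]
            return tree, False  # expected quality not achieved within the budget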
4. Experimental investigation
The aim of the experiments was to compare the errors and sizes of decision-tree classifiers obtained via the C4.5 procedure when the classification (attribute acquisition) cost limit is taken into account. We carried out two groups of experiments: the first was performed on benchmark learning sets, while the second evaluated a cost-sensitive decision tree on a medical diagnosis problem. The conditions of the experiments were as follows.

- All experiments were run for different cost limits and different values of ω for the EG2 modification.
- For the experiments we chose the pruned decision tree obtained via C4.5 (Quinlan, 1993).
- The rule post-pruning method was used for pruning.
- All experiments were carried out using a modified version of Quinlan's implementation of C4.5 (we modified the C4.5 source code for this purpose) and our own software created in the Matlab environment using PRTools (Duin et al., 2004) and the classification toolbox (Stork & Yom-Tov, 2004).
- Probabilities of errors of the classifiers were estimated using the 10-fold cross-validation method.
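For completeness, a sketch of the 10-fold cross-validation error estimate used here; train() and predict() are placeholders for the (cost-sensitive) induction and classification routines.

import random

def cv_error(examples, labels, train, predict, k=10, seed=0):
    """Estimate the probability of error by k-fold cross-validation."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k disjoint test folds
    errors = 0
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in idx if i not in held_out]
        model = train([examples[i] for i in tr], [labels[i] for i in tr])
        errors += sum(predict(model, examples[i]) != labels[i] for i in fold)
    return errors / len(examples)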
4.1. Experiments on benchmark databases
For all experiments, five databases from the UCI Machine Learning Repository (Newman et al., 1998) were chosen. They concern different fields of medicine and differ from each other in the number of examples, the number and types of attributes, and the number of classes. The details of the tested databases are presented in Table 2. The results of the experimental investigations are presented in Figures 1-5.

Table 2: Description of databases used in experiments

Database | Number of examples | Number of attributes | Number of classes
1. Pima Indians diabetes | 768 | 8 | 2
2. Heart disease | 303 | 13 | 2
3. Hepatitis | 155 | 19 | 2
4. Liver disorders | 345 | 5 | 2
5. Thyroid disease | 7200 | 20 | 3

[Figure 1: Classification error and decision-tree size versus maximum classification cost for the thyroid database.]
[Figure 2: Classification error and decision-tree size versus maximum classification cost for the liver database.]
[Figure 3: Classification error and decision-tree size versus maximum classification cost for the heart disease database.]
[Figure 4: Classification error and decision-tree size versus maximum classification cost for the hepatitis database.]
[Figure 5: Classification error and decision-tree size versus maximum classification cost for the Pima Indian diabetes database.]
4.2. Type of hypertension diagnosis
For hypertension therapy it is very important to recognize the state of the patient and to choose the correct treatment. The physician is responsible for deciding whether the hypertension is of an essential or a secondary type (so-called first-level diagnosis). Senior physicians from the Hôpital Broussais Hypertension Clinic and the Wroclaw Medical Academy suggest 30% as an acceptable error rate for first-level diagnosis. The presented project was developed together with the Service d'Informatique Médicale of the University Paris VI. All data were obtained from the medical database ARTEMIS, which contains the data of patients with hypertension treated at the Hôpital Broussais in Paris. The mathematical model was simplified; however, our experts from the Hôpital Broussais and the Wroclaw Medical Academy regarded this diagnosis problem as very useful. It led to the following classification of hypertension types:
(1) essential hypertension (abbreviation: essential);
(2) fibroplastic renal artery stenosis (abbreviation: fibro);
(3) atheromatous renal artery stenosis (abbreviation: athero);
(4) Conn's syndrome (abbreviation: conn);
(5) renal cystic disease (abbreviation: poly);
(6) pheochromocytoma (abbreviation: pheo).
Although the set of symptoms necessary to correctly assess existing hypertension is quite wide, in practice the diagnosis uses the results of 19 examinations (covering general information about the patient, blood pressure measurements and basic biochemical data), listed in Table 3. The results of the classifier for diagnosis of hypertension type are presented in Figure 6.

Table 3: Clinical features considered

Number | Feature | Cost
1 | Sex | 1
2 | Body weight | 1
3 | Height | 1
4 | Cigarette smoker | 1
5 | Limb ache | 1
6 | Alcohol | 1
7 | Systolic blood pressure | 1
8 | Diastolic blood pressure | 1
9 | Maximal systolic blood pressure | 1
10 | Carotid or lumbar murmur | 1
11 | Effusion | 1
12 | Artery stenosis | 1
13 | Heart failure | 1
14 | Palpitation | 1
15 | Cholesterol | 8
16 | Serum creatinine | 6
17 | Serum potassium | 5
18 | Serum sodium | 5
19 | Uric acid | 6

[Figure 6: Classification error and decision-tree size versus maximum classification cost for type of hypertension diagnosis.]
4.3. Evaluation of the experimental results
In general, the results of the simulations did not surprise us. The quality of the classifiers strongly depends on the cost limit: it increased as the cost limit rose. We also noted that the quality of the classifiers stopped improving once certain values of the cost limit were exceeded.
For some databases, like thyroid (Figure 1) and liver (Figure 2), we observed that there were values of the cost limit for which the classifiers obtained were good enough. Increasing them provided slightly better classifiers but
tree size grew rapidly (especially for the thyroid
database), which might cause the classifiers to be
overtrained.
We have to mention, however, an interesting phenomenon observed in some experiments. For example, for the hepatitis (Figure 4) and Pima Indian diabetes (Figure 5) databases we noted that an increasing cost limit did not result in an improvement of classification quality. A cheap classifier (for a cost limit of about 20) had good quality and was based on a small decision-tree structure; the small tree size protected the classifier from overfitting. The observed effect of decreasing tree size and error rate with increasing cost limit was caused by the availability of new, more expensive features, which were better than the cheaper ones used previously. The cheap features were responsible for building an overgrown tree: when the cost limit increased, the cheap attributes became available again and caused tree overgrowth and a lower quality of classifier.
The same observation was made in the second experiment (Figure 6). A high-quality and very cheap classifier was obtained for a cost limit of about 12-15 (the maximum cost limit is 44), depending on the ω value. The second advantage of this classifier is that its tree size is rather small, which could protect against overfitting. A similar problem of computer-aided diagnosis of hypertension type was described by Blinowska et al. (1991), but they used another mathematical model and implemented Bayes's decision rule. They obtained a better classifier
than ours: its frequency of correct classification of the secondary type of hypertension is about 85% (ours was about 70%). The advantage of our proposition is that it is simpler and extremely cheap compared with the model in the above-mentioned paper.
5. Conclusion
The idea of a cost-sensitive learning method for decision-tree induction with a fixed cost limit was presented in this paper. The properties of the proposed concept were established in computer experiments conducted on five benchmark databases from the medical area and on a real medical diagnosis task. The results did not surprise us, but we noted some interesting properties of the method under consideration. We hope that the presented idea will be helpful in constructing real decision systems, especially decision-aided systems in which the cost of classification plays an important role.
Acknowledgements
This work was supported by the Polish State Committee for Scientific Research under a grant realized in the years 2006-2009.
References
ALPAYDIN, E. (2010) Introduction to Machine Learning,
London: MIT Press.
BLINOWSKA, A., G. CHATELLIER, J. BERNIER and M. LAVRIL (1991) Bayesian statistics as applied to hypertension diagnosis, IEEE Transactions on Biomedical Engineering, 38, 699-706.
BREIMAN, L., J.H. FRIEDMAN, R.A. OLSHEN and C.J. STONE (1984) Classification and Regression Trees, Belmont, CA: Wadsworth.
BURDUK, R. (2010) Classification error in Bayes multistage recognition task with fuzzy observations, Pattern Analysis and Applications, 13, 85-91.
DUDA, R.O., P.E. HART and D.G. STORK (2001) Pattern Classification, New York: Wiley-Interscience.
DUIN, R.P.W., P. JUSZCZAK, P. PACLIK, E. PEKALSKA, D. DE RIDDER and D.M.J. TAX (2004) PRTools4, A Matlab Toolbox for Pattern Recognition, Delft: Delft University of Technology.
GREINER, R., A. GROVE and D. ROTH (2002) Learning cost-sensitive active classifiers, Artificial Intelligence, 139, 137-174.
LIEBOWITZ, J. (ed.) (1998) The Handbook of Applied Expert Systems, Boca Raton, FL: CRC Press.
LIROV, Y. and O.C. YUE (1991) Automated network troubleshooting knowledge acquisition, Journal of Applied Intelligence, 1, 121-132.
MITCHELL, T.M. (1997) Machine Learning, New York: McGraw-Hill.
MUI, J. and K.S. FU (1980) Automated classification of nucleated blood cells using a binary tree classifier, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2, 429-443.
NEWMAN, D.J., S. HETTICH, C.L. BLAKE and C.J. MERZ (1998) UCI Repository of Machine Learning Databases (http://www.ics.uci.edu/~mlearn/MLRepository.html), Irvine, CA: Department of Information and Computer Science, University of California.
NORTON, S.W. (1989) Generating better decision trees, in Proceedings of the 11th International Joint Conference on Artificial Intelligence IJCAI-89, San Francisco, CA: Morgan Kaufmann, 800-805.
NÚÑEZ, M. (1988) Economic induction: a case study, in Proceedings of the 3rd European Working Session on Learning EWSL-88, San Mateo, CA: Morgan Kaufmann, 139-145.
NÚÑEZ, M. (1991) The use of background knowledge in decision tree induction, Machine Learning, 6, 231-250.
PENAR, W. and M. WOZNIAK (2007) Experiments on classifiers obtained via decision tree induction methods with different attribute acquisition cost limit, in Computer Recognition Systems 2, M. Kurzynski et al. (eds), Berlin: Springer, 371-377.
QUINLAN, J.R. (1986) Induction of decision trees, Machine Learning, 1, 81-106.
QUINLAN, J.R. (1993) C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann.
SAFAVIAN, S.R. and D. LANDGREBE (1991) A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man, and Cybernetics, 21, 660-674.
STORK, D.G. and E. YOM-TOV (2004) Computer Manual in MATLAB to Accompany Pattern Classification, New York: Wiley-Interscience.
TAN, M. (1993) Cost-sensitive learning of classification knowledge and its applications in robotics, Machine Learning, 13, 7-33.
TAN, M. and J. SCHLIMMER (1989) Cost-sensitive concept learning of sensor use in approach and recognition, in Proceedings of the 6th International Workshop on Machine Learning ML-89, San Francisco, CA: Morgan Kaufmann, 392-395.
TAN, M. and J. SCHLIMMER (1990) CSL: a cost-sensitive learning system for sensing and grasping objects, in Proceedings of the IEEE International Conference on Robotics and Automation, New York: IEEE Press, 858-863.
TURNEY, P.D. (1995) Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm, Journal of Artificial Intelligence Research, 2, 369-409.
VERDENIUS, F. (1991) A method for inductive cost optimization, in Proceedings of the 5th European Working Session on Learning EWSL-91, New York: Springer, 179-191.
The authors
Wojciech Penar
Wojciech Penar is a PhD student in the Institute of Computer Engineering, Control and Robotics, Faculty of Electronics, Wroclaw University of Technology, Poland. He received an MS degree in computer science from the Wroclaw University of Technology in 2004. His research focuses on distributed systems, communication networks and machine learning.
Michal Wozniak
Michal Wozniak is Professor of Computer Science in the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Technology, Poland. He received an MS degree in biomedical engineering in 1992 from the Wroclaw University of Technology, and PhD and DSc (habilitation) degrees in computer science in 1996 and 2007, respectively, from the same university. His research focuses on multiple classifier systems,
machine learning, data and web mining, Bayes compound theory, distributed algorithms, computer and network security, and teleinformatics. Professor Wozniak has published over 120 papers and two books and has edited three books. He is editor-in-chief of the International Journal of Computer Networks and Communications and associate editor of several international journals, including Pattern Analysis and Applications, Expert Systems, Information Fusion, Logic Journal of the IGPL and the International Journal of Communication Networks and Distributed Systems. He serves on the programme committees of numerous international conferences. His works have been transitioned into commercial applications. Professor Wozniak has been involved in many research projects related to machine learning, computer networks and telemedicine. Moreover, he has been a consultant on several commercial projects for well-known Polish companies and for the Polish public administration. Professor Wozniak is a member of the IEEE (Computational Intelligence Society and Systems, Man and Cybernetics Society) and the IBS (International Biometric Society). For a more detailed profile see http://www.kssk.pwr.wroc.pl/pracownicy/michal.wozniak-en.