Cost-sensitive methods of constructing hierarchical classifiers

Wojciech Penar (1) and Michal Wozniak (2)

(1) Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
(2) Chair of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
Email: [wojciech.penar, michal.wozniak]@pwr.wroc.pl
DOI: 10.1111/j.1468-0394.2010.00515.x
Abstract: The cost of the future exploitation of a decision support system plays a key role in its design. This paper deals with the problem of feature value acquisition cost for such systems. We present a modification of a cost-sensitive learning method for decision-tree induction with a fixed attribute acquisition cost limit. Properties of the concept are established in computer experiments conducted on chosen benchmark databases from the UCI Machine Learning Repository and on a real medical decision task. The results confirm that, for some decision problems, our proposition allows us to obtain a classifier of the same quality as one obtained without a cost limit, but cheaper to exploit.
Keywords: decision tree, cost-sensitive classifier, cost limit
1. Introduction
During the design of computer decision support systems, the cost of their design and exploitation plays a key role. The cost of exploitation can be considered as the expense of incorrect diagnosis or the expense of feature value acquisition. The first problem is typical for decision theory, where we want to find the classifier for which the cost of misclassification is the lowest (Duda et al., 2001). Of course, we have to define a so-called loss function in advance, which gives the cost of each kind of misclassification. In this paper we concentrate on the case where the cost depends on the real expense of acquiring feature values for decision making (Turney, 1995; Greiner et al., 2002). This cost can be measured in monetary units or time units. A typical example of cost-sensitive classification is medical diagnosis, where physicians would like to balance the costs of various tests with the expected benefits, or have to make a diagnosis quickly on the basis of low-cost (quickly measured) features because therapeutic action must be taken without delay. Let us note that for many decision tasks there is no difficulty in making high-quality medical decisions on the basis of expensive medical tests; one of the main tasks in designing computer-aided diagnosis is therefore finding the balance between the cost of exploitation and the quality of diagnosis. As we stated, the problem of cost-sensitive decision making arises frequently in medicine (Núñez, 1988), industrial production processes (Verdenius, 1991), robotics (Tan & Schlimmer, 1989), technological diagnosis (Lirov & Yue, 1991) and many other fields, e.g. electronic equipment testing and real-time computer systems. Additionally, for many typical diagnosis
systems we cannot exceed a cost limit, which means that the maximum diagnosis cost (or time) is fixed. Our paper addresses the most popular inductive learning method (Mitchell, 1997), top-down decision-tree induction. We propose cost-sensitive modifications of the decision-tree induction algorithm that respect, on the one hand, the cost of feature value acquisition and, on the other, a maximum cost limit.
The content of the work is as follows. Section 2 presents the idea of the decision-tree induction algorithm and related work that takes cost into account during the learning process. Section 3 describes our modification of the cost-sensitive approach to the decision-tree induction algorithm. In the following section the results of experimental investigations of the algorithms are presented. The last part concludes the paper.
2. Cost-sensitive decision-tree learning
Let us discuss the methods used in this paper. First we present the idea of the top-down decision-tree induction algorithm (Quinlan, 1986); related modifications of the algorithm that respect the feature value acquisition cost will then be presented. The basic idea involved in any multi-stage approach is to break up a complex decision into several simpler classifications (Safavian & Landgrebe, 1991). Decision-tree classifiers are one possible approach to multi-stage pattern recognition. The synthesis of a hierarchical classifier is a complex problem (Burduk, 2010). It involves specification of the following components (Mui & Fu, 1980):
(1) design of a decision-tree structure;
(2) feature selection used at each non-terminal
node of the decision tree;
(3) choice of decision rules for performing the
classification.
The approach under consideration is based on the top-down induction algorithm and focuses on decision-tree construction. In each node only one test, on an individual feature, is performed. Decision-tree induction algorithms have been used for many years (Mitchell, 1997; Alpaydin, 2010). In general, they approximate a discrete-valued target function, an approach well adapted to the classification task. Decision-tree induction is one of the most important methods for classification and achieves very good classification quality in many practical decision support systems. Many decision-tree algorithms have been developed; the most famous are CART (Breiman et al., 1984), ID3 (Quinlan, 1986) and its successor C4.5 (Quinlan, 1993).
2.1. Idea of the ID3 learning algorithm
As we mentioned above, ID3 is a typical decision-tree algorithm. It introduces information entropy as the measure for choosing the splitting attribute. It builds the tree from the root down to the leaves, in the top-down sequence shown in the pseudocode presented below.
function ID3(examples, target_concept, attributes)
parameters: examples - the learning set; target_concept - the concept to be learned; attributes - the list of attributes
[1] Create a Root node for the tree
[2] IF all examples belong to the same class
[3] THEN return the single-node tree Root with this class label
[4] IF the set of attributes is empty
[5] THEN return the single-node tree Root with label = most common value of the label in examples
[6] Choose 'the best' attribute A from the set of attributes
[7] FOR EACH possible value vi of attribute A
[8] Add a new tree branch below Root, corresponding to the test A = vi
[9] Let Evi be the subset of examples that have value vi for A
[10] IF Evi is empty
[11] THEN below this new branch add a leaf node with label = most common value of the label in examples
[12] ELSE below this new branch add a new subtree by calling ID3(Evi, target_concept, attributes - {A})
[13] END
[14] RETURN Root
The central choice in the ID3 algorithm is selecting the best attribute to test at each node of the tree. The algorithm uses the information gain, which measures how well a given attribute separates the training examples according to the target classification. This measure is based on the Shannon entropy of the set S:

entropy(S) = -\sum_{i=1}^{M} p_i \log_2 p_i    (1)

where p_i is the proportion of S belonging to class i. The information gain of an attribute A relative to the collection of examples S is defined as

gain(S, A) = entropy(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} entropy(S_v)    (2)

where values(A) is the set of all possible values of attribute A and S_v is the subset of S for which A = v.
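To make the above concrete, the following is a minimal Python sketch of equations (1) and (2) and of the ID3 pseudocode, assuming discrete-valued attributes; examples are represented as dictionaries mapping attribute names to values, and all identifiers are illustrative rather than taken from any published implementation.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, equation (1)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(examples, labels, attr):
    """Information gain of attribute attr, equation (2).
    examples is a list of dicts mapping attribute name -> value."""
    n = len(examples)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [lab for e, lab in zip(examples, labels) if e[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    """Top-down induction: returns a class label (leaf) or a pair
    (attribute, {value: subtree}) for an internal node."""
    if len(set(labels)) == 1:           # steps [2]-[3]: all examples in one class
        return labels[0]
    if not attributes:                  # steps [4]-[5]: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, labels, a))   # step [6]
    branches = {}
    # steps [7]-[12]; only observed values are branched on, so the empty
    # subset handled in steps [10]-[11] cannot occur in this sketch
    for v in {e[best] for e in examples}:
        pairs = [(e, lab) for e, lab in zip(examples, labels) if e[best] == v]
        sub_ex, sub_lab = zip(*pairs)
        branches[v] = id3(list(sub_ex), list(sub_lab),
                          [a for a in attributes if a != best])
    return (best, branches)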
2.2. Cost-sensitive modification of the information gain function

As we mentioned earlier, the C4.5 algorithm is an extended version of ID3. It improves the attribute selection measure, avoids data overfitting, reduces error by pruning, handles attributes with different weights, improves computing efficiency, handles missing values and continuous attributes, and performs other functions. Instead of the information gain of ID3, C4.5 uses an information gain ratio (Quinlan, 1993). One of the main advantages of this kind of decision tree is that the obtained tree can easily be converted into a set of rules (each path from the root to a leaf is a decision rule). This form of knowledge is the most popular knowledge representation in expert systems (Liebowitz, 1998). Several modifications of the function that evaluates attributes, taking the feature acquisition cost into consideration, can be found in the literature; some well-known propositions are presented in Table 1.

Table 1: Cost-sensitive modifications of the information gain function

Name (source) | Measure | Comment
EG2 (Núñez, 1991) | ICF(S, A) = \frac{2^{gain(S, A)} - 1}{(cost(A) + 1)^\omega} | ω is the strength of the bias toward the lowest-cost attributes. For ω = 0 the feature acquisition cost is ignored and ICF behaves like the gain function; for ω = 1 the cost plays the most important role.
CS-ID3 (Tan & Schlimmer, 1989, 1990; Tan, 1993) | CS(S, A) = \frac{gain(S, A)^2}{cost(A)} |
IDX (Norton, 1989) | IDX(S, A) = \frac{gain(S, A)}{cost(A)} |
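Expressed as code, the three measures of Table 1 might look as follows; this sketch reuses the gain() function from the previous listing, takes attribute costs as a dictionary, and the function names are our own rather than taken from any of the cited papers.

def icf(examples, labels, attr, cost, omega):
    """EG2 (Núñez, 1991): Information Cost Function; omega in [0, 1]
    controls the strength of the bias toward low-cost attributes."""
    return (2 ** gain(examples, labels, attr) - 1) / (cost[attr] + 1) ** omega

def cs_id3_measure(examples, labels, attr, cost):
    """CS-ID3 (Tan & Schlimmer): squared gain per unit cost."""
    return gain(examples, labels, attr) ** 2 / cost[attr]

def idx_measure(examples, labels, attr, cost):
    """IDX (Norton, 1989): gain per unit cost."""
    return gain(examples, labels, attr) / cost[attr]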
3. Propositions of the new method
We propose the following modification of the decision-tree algorithm to respect the cost limit. We use the proposition known as EG2 (Núñez, 1991), with various values of ω, instead of the information gain function to determine which attribute is chosen for node creation (Penar & Wozniak, 2007). First, we add the following input parameters:

- COST-LIMIT, the limit on the cost connected with classifier exploitation (an initial value of COST-LIMIT is fixed by experts);
- MAX-COST-LIMIT, the maximum cost limit connected with classifier exploitation;
- STEP, the increment by which the cost limit is raised;
- EXPECTED-QUALITY, the required quality of the classifier.
We then propose to modify lines [4] to [6] of the ID3 algorithm; the same applies to any other algorithm whose code uses a function evaluating the information gain of the chosen attribute (for C4.5, for example, the function that is modified is the so-called information gain ratio). The modified steps read:

[4] IF the set of attributes is empty, or the cost of the previously chosen attributes plus the cost of every remaining attribute exceeds COST-LIMIT
[5] THEN return the single-node tree Root with label = most common value of the label in examples
[6] Choose 'the best' attribute A from the set of attributes for which the summarized cost of attributes does not exceed the cost limit
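One possible reading of the modified steps [4] to [6] in code: the attribute choice is restricted to the attributes whose acquisition cost still fits within the remaining budget, here scored with the EG2 measure from the previous listing; the helper below and its signature are assumptions made for illustration.

def choose_attribute(examples, labels, attributes, cost, spent, cost_limit, omega=1.0):
    """Return the best affordable attribute, or None if no attribute fits the
    limit (the caller then creates a majority-class leaf, as in step [5]).
    spent is the summarized cost of the attributes already chosen on the path."""
    affordable = [a for a in attributes if spent + cost[a] <= cost_limit]
    if not affordable:
        return None
    return max(affordable, key=lambda a: icf(examples, labels, a, cost, omega))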
Let CS-ID3 be the name of the top-down induction algorithm modified in this way. Taking advantage of this algorithm, we propose the following wrapper procedure:

[1] Call CS-ID3(examples, target_concept, attributes, COST-LIMIT)
[2] Evaluate the obtained tree
[3] IF quality of the classifier < EXPECTED-QUALITY
[4] THEN COST-LIMIT := COST-LIMIT + STEP
[5] ELSE return tree
[6] FI
[7] IF COST-LIMIT exceeds MAX-COST-LIMIT
[8] THEN return tree and the information that EXPECTED-QUALITY was not achieved
[9] FI
[10] Go to [1]
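The wrapper can be sketched in Python as follows, where cs_id3() and evaluate() stand for the cost-sensitive induction procedure and some quality estimate (e.g. cross-validated accuracy); both names are placeholders, not a published API.

def train_with_cost_limit(examples, labels, attributes, cost,
                          cost_limit, max_cost_limit, step, expected_quality):
    """Raise the cost limit by step until the induced tree reaches the
    expected quality or the maximum cost limit is exceeded."""
    while True:
        tree = cs_id3(examples, labels, attributes, cost, cost_limit)  # [1], assumed helper
        if evaluate(tree, examples, labels) >= expected_quality:       # [2]-[5], assumed helper
            return tree, True
        cost_limit += step                                             # [4]
        if cost_limit > max_cost_limit:                                # [7]-[8]
            return tree, False  # expected quality not achieved within the budget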
4. Experimental investigation
The aim of the experiments was to compare the errors and sizes of decision-tree classifiers obtained via the C4.5 procedure when the classification (attribute acquisition) cost limit is taken into account. We carried out two groups of experiments: the first was performed on benchmark learning sets, while the second evaluated a cost-sensitive decision tree on a medical diagnosis problem. The conditions of the experiments were as follows.

- All experiments were run for different cost limits and different values of ω for the EG2 modification.
- For the experiments we chose the pruned decision tree obtained via C4.5 (Quinlan, 1993).
- The rule post-pruning method was used for pruning.
- All experiments were carried out using a modified version of Quinlan's implementation of C4.5 (we modified the C4.5 source code for this purpose) and our own software created in the Matlab environment using PRTools (Duin et al., 2004) and the classification toolbox (Stork & Yom-Tov, 2004).
- Probabilities of errors of the classifiers were estimated using the 10-fold cross-validation method.
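For completeness, a sketch of the 10-fold cross-validation error estimate used here; train() and predict() are placeholders for the (cost-sensitive) induction and classification routines.

import random

def cv_error(examples, labels, train, predict, k=10, seed=0):
    """Estimate the probability of error by k-fold cross-validation."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k disjoint test folds
    errors = 0
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in idx if i not in held_out]
        model = train([examples[i] for i in tr], [labels[i] for i in tr])
        errors += sum(predict(model, examples[i]) != labels[i] for i in fold)
    return errors / len(examples)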
4.1. Experiments on benchmark databases
For all experiments, five databases from the UCI Machine Learning Repository (Newman et al., 1998) were chosen. They concern different fields of medicine and differ from each other in the number of examples, the number and types of attributes, and the number of classes. The details of the tested databases are presented in Table 2. The results of the experimental investigations are presented in Figures 1-5.

Table 2: Description of databases used in experiments

Database | Number of examples | Number of attributes | Number of classes
1. Pima Indians diabetes | 768 | 8 | 2
2. Heart disease | 303 | 13 | 2
3. Hepatitis | 155 | 19 | 2
4. Liver disorders | 345 | 5 | 2
5. Thyroid disease | 7200 | 20 | 3

[Figure 1: Classification error and decision-tree size versus maximum classification cost for the thyroid database.]
[Figure 2: Classification error and decision-tree size versus maximum classification cost for the liver database.]
[Figure 3: Classification error and decision-tree size versus maximum classification cost for the heart disease database.]
[Figure 4: Classification error and decision-tree size versus maximum classification cost for the hepatitis database.]
[Figure 5: Classification error and decision-tree size versus maximum classification cost for the Pima Indian diabetes database.]
4.2. Type of hypertension diagnosis
For hypertension therapy it is very important to recognize the state of the patient and to choose the correct treatment. The physician is responsible for deciding whether the hypertension is of an essential or a secondary type (so-called first-level diagnosis). Senior physicians from the Hôpital Broussais Hypertension Clinic and the Wroclaw Medical Academy suggest 30% as an acceptable error rate for first-level diagnosis. The presented project was developed together with the Service d'Informatique Médicale of the University Paris VI. All data were obtained from the medical database ARTEMIS, which contains the data of patients with hypertension treated at the Hôpital Broussais in Paris. The mathematical model was simplified; however, our experts from the Hôpital Broussais and the Wroclaw Medical Academy regarded this diagnosis problem as very useful. It led to the following classification of hypertension types:
(1) essential hypertension (abbreviation: essential);
(2) fibroplastic renal artery stenosis (abbreviation: fibro);
(3) atheromatous renal artery stenosis (abbreviation: athero);
(4) Conn's syndrome (abbreviation: conn);
(5) renal cystic disease (abbreviation: poly);
(6) pheochromocytoma (abbreviation: pheo).
Although the set of symptoms necessary to correctly assess existing hypertension is quite wide, in practice the diagnosis uses the results of 19 examinations (covering general information about the patient, blood pressure measurements and basic biochemical data), listed in Table 3. The results of the classifier for diagnosis of hypertension type are presented in Figure 6.

Table 3: Clinical features considered

Number | Feature | Cost
1 | Sex | 1
2 | Body weight | 1
3 | Height | 1
4 | Cigarette smoker | 1
5 | Limb ache | 1
6 | Alcohol | 1
7 | Systolic blood pressure | 1
8 | Diastolic blood pressure | 1
9 | Maximal systolic blood pressure | 1
10 | Carotid or lumbar murmur | 1
11 | Effusion | 1
12 | Artery stenosis | 1
13 | Heart failure | 1
14 | Palpitation | 1
15 | Cholesterol | 8
16 | Serum creatinine | 6
17 | Serum potassium | 5
18 | Serum sodium | 5
19 | Uric acid | 6

[Figure 6: Classification error and decision-tree size versus maximum classification cost for type of hypertension diagnosis.]
4.3. Evaluation of the experimental results
In general, the results of the simulations did not surprise us. The quality of the classifiers strongly depends on the cost limit: it increased as the cost limit rose. We also noted that the quality of the classifiers stopped improving once certain values of the cost limit were exceeded.
For some databases, like thyroid (Figure 1) and liver (Figure 2), we observed that there were values of the cost limit for which the classifiers obtained were good enough. Increasing them provided slightly better classifiers but
tree size grew rapidly (especially for the thyroid
database), which might cause the classifiers to be
overtrained.
We have to mention, however, an interesting phenomenon observed in some experiments. For example, for the hepatitis (Figure 4) and Pima Indian diabetes (Figure 5) databases we noted that an increasing cost limit did not result in an improvement of classification quality. A cheap classifier (for a cost limit of about 20) had good quality and was based on a small decision-tree structure; the small tree size protected the classifier from overfitting. The observed effect of decreasing tree size and error rate with increasing cost limit was caused by the availability of new, more expensive features, which were better than the cheaper ones used previously. The cheap features were responsible for building an overgrown tree: when the cost limit increased, the cheap attributes became available again and caused tree overgrowth and a lower quality of classifier.
The same observation was made in the second experiment (Figure 6). A high-quality and very cheap classifier was obtained for a cost limit of about 12-15 (the maximum cost limit is 44), depending on the ω value. The second advantage of this classifier is that its tree size is rather small, which could protect against overfitting. A similar problem of computer-aided diagnosis of hypertension type was described by Blinowska et al. (1991), but they used another mathematical model and implemented Bayes's decision rule. They obtained a better classifier
than ours: its frequency of correct classification of the secondary type of hypertension is about 85% (ours was about 70%). The advantage of our proposition is that it is simpler and extremely cheap compared with the model in the above-mentioned paper.
5. Conclusion
The idea of a cost-sensitive learning method for decision-tree induction with a fixed cost limit was presented in this paper. The properties of the proposed concept were established in computer experiments conducted on five benchmark databases from the medical area and on a real medical diagnosis task. The results did not surprise us, but we noted some interesting properties of the method under consideration. We hope that the presented idea will be helpful in constructing real decision systems, especially decision-aided systems in which the cost of classification plays an important role.
Acknowledgements
This work was supported by the Polish State Committee for Scientific Research under a grant realized in the years 2006-2009.
References
ALPAYDIN, E. (2010) Introduction to Machine Learning,
London: MIT Press.
BLINOWSKA, A., G. CHATELLIER, J. BERNIER and M. LAVRIL (1991) Bayesian statistics as applied to hypertension diagnosis, IEEE Transactions on Biomedical Engineering, 38, 699-706.
BREIMAN, L., J.H. FRIEDMAN, R.A. OLSHEN and C.J. STONE (1984) Classification and Regression Trees, Belmont, CA: Wadsworth.
BURDUK, R. (2010) Classification error in Bayes multistage recognition task with fuzzy observations, Pattern Analysis and Applications, 13, 85-91.
DUDA, R.O., P.E. HART and D.G. STORK (2001) Pattern Classification, New York: Wiley-Interscience.
DUIN, R.P.W., P. JUSZCZAK, P. PACLIK, E. PEKALSKA, D. DE RIDDER and D.M.J. TAX (2004) PRTools4, A Matlab Toolbox for Pattern Recognition, Delft: Delft University of Technology.
GREINER, R., A. GROVE and D. ROTH (2002) Learning cost-sensitive active classifiers, Artificial Intelligence, 139, 137-174.
LIEBOWITZ, J. (ed.) (1998) The Handbook of Applied Expert Systems, Boca Raton, FL: CRC Press.
LIROV, Y. and O.C. YUE (1991) Automated network troubleshooting knowledge acquisition, Journal of Applied Intelligence, 1, 121-132.
MITCHELL, T.M. (1997) Machine Learning, New York: McGraw-Hill.
MUI, J. and K.S. FU (1980) Automated classification of nucleated blood cells using a binary tree classifier, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2, 429-443.
NEWMAN, D.J., S. HETTICH, C.L. BLAKE and C.J. MERZ (1998) UCI Repository of Machine Learning Databases (http://www.ics.uci.edu/~mlearn/MLRepository.html), Irvine, CA: Department of Information and Computer Science, University of California.
NORTON, S.W. (1989) Generating better decision trees, in Proceedings of the 11th International Joint Conference on Artificial Intelligence IJCAI-89, San Francisco, CA: Morgan Kaufmann, 800-805.
NÚÑEZ, M. (1988) Economic induction: a case study, in Proceedings of the 3rd European Working Session on Learning EWSL-88, San Mateo, CA: Morgan Kaufmann, 139-145.
NÚÑEZ, M. (1991) The use of background knowledge in decision tree induction, Machine Learning, 6, 231-250.
PENAR, W. and M. WOZNIAK (2007) Experiments on classifiers obtained via decision tree induction methods with different attribute acquisition cost limit, in Computer Recognition Systems 2, M. Kurzynski et al. (eds), Berlin: Springer, 371-377.
QUINLAN, J.R. (1986) Induction of decision trees, Machine Learning, 1, 81-106.
QUINLAN, J.R. (1993) C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann.
SAFAVIAN, S.R. and D. LANDGREBE (1991) A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man, and Cybernetics, 21, 660-674.
STORK, D.G. and E. YOM-TOV (2004) Computer Manual in MATLAB to Accompany Pattern Classification, New York: Wiley-Interscience.
TAN, M. (1993) Cost-sensitive learning of classification knowledge and its applications in robotics, Machine Learning, 13, 7-33.
TAN, M. and J. SCHLIMMER (1989) Cost-sensitive concept learning of sensor use in approach and recognition, in Proceedings of the 6th International Workshop on Machine Learning ML-89, San Francisco, CA: Morgan Kaufmann, 392-395.
TAN, M. and J. SCHLIMMER (1990) CSL: a cost-sensitive learning system for sensing and grasping objects, in Proceedings of the IEEE International Conference on Robotics and Automation, New York: IEEE Press, 858-863.
TURNEY, P.D. (1995) Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm, Journal of Artificial Intelligence Research, 2, 369-409.
VERDENIUS, F. (1991) A method for inductive cost optimization, in Proceedings of the 5th European Working Session on Learning EWSL-91, New York: Springer, 179-191.
The authors
Wojciech Penar
Wojciech Penar is a PhD student in the Institute of Computer Engineering, Control and Robotics, Faculty of Electronics, Wroclaw University of Technology, Poland. He received an MS degree in computer science from the Wroclaw University of Technology in 2004. His research focuses on distributed systems, communication networks and machine learning.
Michal Wozniak
Michal Wozniak is Professor of Computer Science in the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Technology, Poland. He received an MS degree in biomedical engineering in 1992 from the Wroclaw University of Technology, and PhD and DSc (habilitation) degrees in computer science in 1996 and 2007, respectively, from the same university. His research focuses on multiple classifier systems,
machine learning, data and web mining, Bayes compound theory, distributed algorithms, computer and network security, and teleinformatics. Professor Wozniak has published over 120 papers and two books and has edited three books. He is editor-in-chief of the International Journal of Computer Networks and Communications and associate editor of several international journals, including Pattern Analysis and Applications, Expert Systems, Information Fusion, Logic Journal of the IGPL and the International Journal of Communication Networks and Distributed Systems. He serves on the programme committees of numerous international conferences. His works have been transitioned into commercial applications. Professor Wozniak has been involved in many research projects related to machine learning, computer networks and telemedicine. Moreover, he has been a consultant on several commercial projects for well-known Polish companies and for the Polish public administration. Professor Wozniak is a member of the IEEE (Computational Intelligence Society and Systems, Man and Cybernetics Society) and the IBS (International Biometric Society). For a more detailed profile see http://www.kssk.pwr.wroc.pl/pracownicy/michal.wozniak-en.