Conference Paper · PDF Available

Hierarchical Rules for a Hierarchical Classifier

Abstract

A system for extracting rules from a complex hierarchical classifier is proposed in this paper. Several methods exist for extracting rules from trained artificial neural networks (ANNs), but these methods do not scale well, i.e. results are satisfactory only for small problems. For complicated problems, hundreds of rules are produced, which are hard to manage. In this paper a hierarchical classifier with a tree-like structure and simple ANNs at its nodes is presented, which splits the original problem into several overlapping sub-problems. The node classifiers are all weak (i.e. with accuracy only better than random), and their errors are corrected at lower levels. Each sub-problem consists of examples that were hard to separate. Such an architecture classifies better than single-network models. At the same time, if–then rules are extracted which answer only which sub-problem a given example belongs to. Such rules, by introducing hierarchy, are simpler and easier to modify by hand, and give better insight into the behaviour of the original classifier.
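To make the routing idea concrete, here is a minimal sketch of rules that answer only which sub-problem an example belongs to, as the abstract describes. All names (`Rule`, `route`, the attribute tests) are illustrative assumptions, not taken from the paper:

```python
# Sketch of the routing idea: extracted rules do not predict a final class,
# they only decide which overlapping sub-problem (child node) an example
# should be passed to.  Rule shapes and attributes are hypothetical.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    condition: Callable[[dict], bool]   # if-part: tests example attributes
    subproblem: str                     # then-part: target sub-problem id

def route(example: dict, rules: List[Rule]) -> List[str]:
    """Return every sub-problem whose rule fires; because sub-problems
    overlap, more than one rule may match a single example."""
    return [r.subproblem for r in rules if r.condition(example)]

# Hypothetical rules for a 2-attribute problem.
rules = [
    Rule(lambda e: e["x1"] > 0.5, "subproblem-A"),
    Rule(lambda e: e["x2"] < 0.2, "subproblem-B"),
]
print(route({"x1": 0.7, "x2": 0.1}, rules))  # ['subproblem-A', 'subproblem-B']
```

Because an example may satisfy several rules, classification can fork into several sub-problems, matching the overlapping partition described in the abstract.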
... Multi-stage or hierarchical classification (Giusti et al., 2002; Podolak, 2007; Kurzyński, 1988) is widely used in many complex multi-category classification tasks. Existing research shows such techniques can potentially achieve the right trade-off between accuracy and resource allocation (Giusti et al., 2002; Podolak, 2007). Our proposed hierarchical system has a tree-like structure with three different types of classifier at the nodes (see Figure 1). ...
Conference Paper
Full-text available
Entity sense disambiguation becomes difficult with few or even zero training instances available, a situation known in machine learning as the imbalanced learning problem. To overcome this problem, we create a new set of reliable training instances from a dictionary, called dictionary-based prototypes. A hierarchical classification system with a tree-like structure is designed to learn from both the prototypes and the training instances, and three different types of classifiers are employed. In addition, supervised dimensionality reduction is conducted in a similarity-based space. Experimental results show our system outperforms three baseline systems by at least 8.3% as measured by macro F1 score.
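The abstract reports macro F1, which averages per-class F1 scores without weighting, so rare senses count as much as frequent ones. A self-contained sketch of the measure (labels are illustrative):

```python
# Macro F1: compute F1 per class, then take the unweighted mean.

def macro_f1(y_true, y_pred):
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec  = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

print(macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "b"]))  # ~0.389
```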
... When studying the properties of HC, it became apparent that the overall accuracy depends on the clustering found by the algorithm, reflected in the correct value found with $Cl_{mod}$. The actual clusterings are found using machine learning approaches [4, 3]. We have noted that the actual number of possible clusterings is not known, and this became the motivation for this work. ...
Article
Full-text available
This paper presents a new combinatorial problem which emerged from studies on a hierarchical classifier, an artificial intelligence classification model. We introduce the notion of a proper clustering and show how to count their number in the special case when 3 clusters are allowed. An algorithm that generates all clusterings is given. We also show that the proposed approach can be generalized to any number of clusters and can be automated. Finally, we show the relationship between the problem of counting clusterings and the Dedekind problem.
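The abstract does not define "proper clustering" here, so the following brute-force enumerator rests on one assumed reading that is at least consistent with the Dedekind connection it mentions: a family of non-empty class subsets that covers all K classes and forms an antichain (no cluster contained in another). Treat the counts as illustrative only:

```python
# Brute-force enumeration sketch under an ASSUMED definition of a proper
# clustering: 3 distinct non-empty subsets of the K classes that together
# cover every class, with no subset contained in another (an antichain).

from itertools import combinations

def proper_clusterings(K, n_clusters=3):
    classes = frozenset(range(K))
    subsets = [frozenset(s) for r in range(1, K + 1)
               for s in combinations(range(K), r)]
    found = []
    for fam in combinations(subsets, n_clusters):   # unordered families
        if frozenset().union(*fam) != classes:
            continue                                # must cover every class
        if any(a < b for a in fam for b in fam):    # antichain condition
            continue
        found.append(fam)
    return found

print(len(proper_clusterings(K=4)))  # number of 3-cluster families for K=4
```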
... An interesting transformation of a frame-based representation with uncertainty into a Bayesian model is described in [21]. A method for converting artificial neural networks to rules is presented in [22]; a related approach, the extraction of hierarchical rules from neural networks, is shown in [23]. The extraction of rules from a support vector machine model is described in [24]. ...
Conference Paper
Full-text available
In this paper the B2R algorithm, which converts Bayesian networks into sets of rules, is proposed. It is tested on several data sets with various configurations, and the results show that accuracy remains similar to that of the original Bayesian networks even after pruning a large number of rules. This makes it possible to exploit the advantages of both knowledge representation techniques.
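The abstract gives no details of B2R itself, so the sketch below shows only the general conditional-probability-table-to-rules idea, not the actual algorithm: one candidate rule per parent configuration, pruned by a confidence threshold (a stand-in for the pruning the abstract mentions). All structures are illustrative:

```python
# NOT the actual B2R algorithm; a generic CPT-to-rules sketch.

def cpt_to_rules(node, parents, cpt, threshold=0.8):
    """cpt maps (parent_value_tuple, node_value) -> probability."""
    rules = []
    values = sorted({v for (_, v) in cpt})
    parent_configs = sorted({cfg for (cfg, _) in cpt})
    for cfg in parent_configs:
        for v in values:
            p = cpt.get((cfg, v), 0.0)
            if p >= threshold:                      # prune weak rules
                cond = " AND ".join(f"{par}={val}"
                                    for par, val in zip(parents, cfg))
                rules.append(f"IF {cond} THEN {node}={v}  [p={p:.2f}]")
    return rules

cpt = {(("yes",), "wet"): 0.9, (("yes",), "dry"): 0.1,
       (("no",),  "wet"): 0.2, (("no",),  "dry"): 0.8}
for r in cpt_to_rules("grass", ["rain"], cpt):
    print(r)
```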
... A detailed discussion of this issue is given in [10], where a new notion of weakness for multiclass classification was introduced and analyzed. The HC model was introduced earlier in [11,12]. In this paper we introduce a new method for computing the risk estimation $\hat{R}(Cl_{HC})$ of HC. ...
Conference Paper
Full-text available
We describe the Hierarchical Classifier (HC), which is a hybrid architecture [1] built with the help of supervised training and unsupervised problem clustering. We prove a theorem giving the estimation $\hat{R}$ of HC risk. The proof works because of an improved way of computing cluster weights, introduced in this paper. Experiments show that $\hat{R}$ is correlated with HC real error. This allows us to use $\hat{R}$ as the approximation of HC risk without evaluating HC subclusters. We also show how $\hat{R}$ can be used in efficient clustering algorithms by comparing HC architectures with different methods of clustering.
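The abstract states that $\hat{R}$ combines sub-cluster information through improved cluster weights but does not give its closed form, so the following is only one assumed shape: the HC risk estimate as a weighted average of per-sub-cluster risks:

```python
# Minimal sketch of ONE assumed form of the risk estimate \hat{R}.

def risk_estimate(subcluster_risks, cluster_weights):
    """Weighted average of per-sub-cluster risks; weights are assumed
    non-negative and summing to 1 (e.g. the fraction of examples routed
    to each sub-cluster)."""
    assert abs(sum(cluster_weights) - 1.0) < 1e-9
    return sum(w * r for w, r in zip(cluster_weights, subcluster_risks))

# Three sub-clusters: risks measured on held-out data, weights from routing.
print(risk_estimate([0.30, 0.15, 0.22], [0.5, 0.3, 0.2]))  # 0.239
```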
Article
Full-text available
The notion of a weak classifier, as one which is "a little better" than a random one, was first introduced for 2-class problems [1]. Extensions to K-class problems are known. All are based on relative activations for correct and incorrect classes and do not take into account the final choice of the answer. A new understanding and definition is proposed here, which takes into account only the final classification choice that must be made. It is shown that for a K-class classifier to be called "weak", it needs to achieve a risk value lower than 1/K. This approach considers only the probability of the final answer choice, not the actual activations.
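Since the definition above depends only on the final answer choice, not on class activations, a weakness check reduces to an empirical risk computation over hard decisions. A minimal sketch, using the 1/K threshold quoted in the abstract:

```python
# Weakness test based on final answers only, per the abstract's criterion.

def is_weak(y_true, y_pred, K):
    """True iff the empirical risk (fraction of wrong final answers)
    is below 1/K, the bound stated in the abstract."""
    risk = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
    return risk < 1.0 / K

print(is_weak([0, 1, 2, 0, 1], [0, 1, 2, 0, 2], K=3))  # risk 0.2 < 1/3
```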
Article
In this paper a novel complex classifier architecture is proposed. The architecture has a hierarchical tree-like structure with simple artificial neural networks (ANNs) at each node. The actual structure for a given problem is not preset but is built during training.

The training algorithm's ability to build the tree-like structure is based on the assumption that when a weak classifier (i.e., one that classifies only slightly better than a random classifier) is trained and examples from any two output classes are frequently mismatched, then they must carry similar information and constitute a sub-problem. After each ANN has been trained, its incorrect classifications are analyzed and new sub-problems are formed. Consequently, new ANNs are built for each of these sub-problems and form another layer of the hierarchical classifier.

An important feature of the hierarchical classifier proposed in this work is that the problem partition forms overlapping sub-problems. Thus, the classification follows not just a single path from the root, but may fork, enhancing the power of the classification. It is shown how to combine the results of these individual classifiers.
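The sub-problem formation step described above can be sketched from a confusion matrix: classes that a trained node frequently mixes up are grouped together. The thresholded confusion-graph grouping below is an illustrative stand-in for the paper's actual procedure:

```python
# Sketch: form overlapping sub-problems from frequently confused classes.

import numpy as np

def confused_subproblems(confusion, threshold):
    """confusion[i, j] = how often class i was labelled as class j.
    For each class, form a sub-problem from the classes it is most often
    mixed up with; sub-problems may overlap."""
    K = confusion.shape[0]
    mix = confusion + confusion.T           # symmetric confusion strength
    np.fill_diagonal(mix, 0)
    subproblems = []
    for i in range(K):
        group = {i} | {j for j in range(K) if mix[i, j] >= threshold}
        if len(group) > 1 and group not in subproblems:
            subproblems.append(group)
    return subproblems

conf = np.array([[50,  8,  1],
                 [ 9, 40,  2],
                 [ 0,  3, 47]])
print(confused_subproblems(conf, threshold=5))  # [{0, 1}, {0, 1, 2}, {1, 2}]
```

Note how class 1 appears in all three groups: an example of one class ending up in several sub-problems, as the overlapping partition requires.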
Conference Paper
Full-text available
A novel architecture for a hierarchical classifier (HC) is defined. The objective is to combine several weak classifiers to form a strong one, but the approach differs from known ones such as AdaBoost: the training set is split on the basis of the previous classifier's misclassifications between output classes. The problem is split into overlapping sub-problems, each classifying into a different set of output classes. This allows for a task-size reduction, as each sub-problem is smaller in the sense of having fewer output classes, and for higher accuracy. The groups of output classes overlap, so examples from a single class may end up in several sub-problems. It is shown that this approach ensures that such a hierarchical classifier achieves better accuracy. A notion of generalized accuracy is introduced. Sub-problem generation is simple, as it is performed with a clustering algorithm operating on classifier outputs. We propose to use the Growing Neural Gas algorithm [1] because of its good adaptiveness.
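The abstract clusters node-classifier outputs with Growing Neural Gas; GNG is too long to reproduce here, so this sketch plainly swaps in k-means on the per-class mean output vectors. The overlap comes from assigning a class to every cluster whose centre is close enough, not only the nearest one; the `slack` parameter is an assumption of this sketch:

```python
# Overlapping class groups from clustering classifier outputs
# (k-means used in place of the paper's Growing Neural Gas).

import numpy as np
from sklearn.cluster import KMeans

def cluster_classes(outputs, labels, K, n_clusters=2, slack=1.2):
    """outputs: (n_examples, K) classifier activations; labels: true class.
    Returns overlapping groups of class indices."""
    means = np.vstack([outputs[labels == c].mean(axis=0) for c in range(K)])
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(means)
    dists = km.transform(means)             # class-to-centre distances
    nearest = dists.min(axis=1)
    groups = [set(np.where(dists[:, j] <= slack * nearest)[0])
              for j in range(n_clusters)]
    return groups

rng = np.random.default_rng(0)
outputs = rng.random((120, 4))
labels = rng.integers(0, 4, 120)
print(cluster_classes(outputs, labels, K=4))
```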
Article
This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
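The core of this construction is that a majority vote over three suitably trained hypotheses turns error b into 3b² − 2b³ < b (for 0 < b < 1/2), and recursing on the vote drives the error arbitrarily low. The sketch below only traces that recurrence numerically; it is not the full boosting algorithm:

```python
# Trace of the error recurrence behind the weak-to-strong conversion:
# a three-way majority vote maps error b to 3*b**2 - 2*b**3.

def boosted_error(b, depth):
    """Error after `depth` levels of the three-hypothesis majority vote."""
    for _ in range(depth):
        b = 3 * b**2 - 2 * b**3
    return b

b = 0.45                       # a weak learner: barely better than random
for d in range(5):
    print(d, round(boosted_error(b, d), 6))
# The error shrinks towards 0 even though the base learner is only
# slightly better than guessing.
```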
Book
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.