Article

Induction in Noisy Domains

Authors: Peter Clark, Tim Niblett (The Turing Institute, Glasgow)

Abstract

This paper examines the induction of classification rules from examples using real-world data. Real-world data is almost always characterized by two features which are important for the design of an induction algorithm. Firstly, there is often noise present, for example due to imperfect measuring equipment used to collect the data. Secondly, the description language is often incomplete, such that examples with identical descriptions in the language will not always be members of the same class. Many induction systems make the 'noiseless domain' assumption that the examples do not contain errors and the description language is complete, and consequently constrain their search for rules to those for which no counterexamples exist in the data used for induction. However, in real-world domains correlations between attributes and classes in a data set are rarely without exceptions. To locate such correlations and induce rules describing them, it is also necessary to consider rules which may not classify all the training examples correctly. This paper firstly discusses some of the problems presented by noise and proposes a top-down induction algorithm for induction in real-world domains. Secondly, an experimental comparison of this algorithm with other induction systems is presented using three sets of real-world medical data.


... Predicting recurrence is important for assisting the identification of patients with critical prognosis and minimising unnecessary therapies. We have chosen this domain because real data from a public dataset is available and has been used repeatedly in the machine learning literature from 1986 up to 2011 ([12] [4]). It includes 286 instances of real patients who went through a breast cancer operation (9 records contain incomplete values). ...
... Table 3 clearly emphasises the high prediction rate of our model against machine-learning (ML) classifiers. Our approach does not require any training/learning and the output is always the same (unlike ML classifiers). Each case is evaluated independently by this CAF and the size of the dataset is negligible. ...
... Results are then averaged over all folds, giving the cross-validation estimate of the accuracy. x% split: x% of the records of the dataset are used for training the classifier and the remaining (100-x)% are used to test the model and check its predictive capacity. We recall that in the experiments we have evaluated just one expert's knowledge, which is not trained to fit the data but is used to build the CAF ...
Conference Paper
Full-text available
This study investigates the role of defeasible reasoning and argumentation theory for decision support in the health-care sector. The main objective is to support clinicians with a tool for taking plausible and rational medical decisions that can be better justified and explained. The basic principles of argumentation theory are described and demonstrated in a well-known health scenario: the breast cancer recurrence problem. It is shown how to translate clinical evidence in the form of arguments, how to define defeat relations among them and how to create a formal argumentation framework. Acceptability semantics are then applied over this framework to compute the justification status of arguments. It is demonstrated how this process can enhance clinician decision-making. In detail, the designed framework is developed according to the knowledge base of an interviewed expert in cancers, and according to the evidence available in a well-known dataset. This dataset has been frequently applied by machine learning techniques to test their capacity to predict the recurrence of breast cancers removed from 286 real patients, and it has been used here to evaluate our argument-based approach. An encouraging 74% predictive accuracy is compared against the accuracy of well-established machine-learning classifiers that performed equally well as, or worse than, our argument-based approach. This result is extremely promising, not only because it demonstrates how a knowledge-based paradigm can perform as well as state-of-the-art learning-based paradigms, but also because it appears to have a better explanatory capacity and a higher degree of intuitiveness that might be appealing to clinicians.
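The acceptability computation mentioned above can be made concrete with a small sketch. The following Python fragment computes Dung's grounded extension of an abstract argumentation framework (a set of arguments plus a defeat relation) by iterating the characteristic function to its least fixed point; grounded semantics is one possible choice of acceptability semantics, and the data encoding and function names here are illustrative, not taken from the paper.

    def grounded_extension(arguments, attacks):
        """arguments: iterable of argument ids; attacks: set of (attacker, target) pairs."""
        attackers = {a: {x for (x, t) in attacks if t == a} for a in arguments}

        def defended(a, s):
            # a is defended by s if every attacker of a is itself attacked by a member of s
            return all(any((d, b) in attacks for d in s) for b in attackers[a])

        s = set()
        while True:
            nxt = {a for a in arguments if defended(a, s)}
            if nxt == s:
                return s  # least fixed point: the grounded extension
            s = nxt

    # Example: C attacks B, B attacks A; the grounded extension is {A, C}.
    print(grounded_extension({"A", "B", "C"}, {("B", "A"), ("C", "B")}))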
... This paper investigates how to adapt classical classification rule learning approaches to subgroup discovery, by exploiting the information about class membership in training examples. This paper shows how this can be achieved by appropriately modifying the well-known CN2 rule learning algorithm [4, 5, 3], which we have implemented in Java and incorporated in the WEKA data mining environment [16]. The modified CN2 algorithm and its experimental evaluation in selected domains of the UCI Repository of Machine Learning Databases [12] are outlined. ...
... The CN2 Rule Induction Algorithm. CN2 is an algorithm for inducing propositional classification rules [4, 5]. CN2 consists of two main procedures: the search procedure that performs beam search in order to find a single rule and the control procedure that repeatedly executes the search. ...
... The performance of different variants of the CN2 rule induction algorithm was measured using 10-fold stratified cross-validation. In particular, we compared the CN2-SD subgroup discovery algorithm with the standard CN2 algorithm (CN2-standard, described in [4, 5, 3]) and the CN2 algorithm using WRAcc (CN2-WRAcc, described in [15]). All these variants of the CN2 algorithm were first re-implemented in the WEKA data mining environment [16], because the use of the same system makes the comparisons more impartial. ...
Article
Full-text available
Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can also be adapted to subgroup discovery. This paper shows how this can be achieved by modifying the CN2 rule learning algorithm. Modifications include a new covering algorithm (the weighted covering algorithm), a new search heuristic (weighted relative accuracy), probabilistic classification of instances, and a new measure for evaluating the results of subgroup discovery (area under the ROC curve). The main advantage of the proposed approach is that each rule with high weighted accuracy represents a 'chunk' of knowledge about the problem, due to the appropriate tradeoff between accuracy and coverage, achieved through the use of the weighted relative accuracy heuristic. Moreover, unlike the classical covering algorithm, in which only the first few induced rules may be of interest as subgroup descriptors with sufficient coverage (since subsequently induced rules are induced from biased example subsets), the subsequent rules induced by the weighted covering algorithm allow for discovering interesting subgroup properties of the entire population. Experimental results on 17 UCI datasets are very promising, demonstrating big improvements in the number of induced rules, rule coverage and rule significance, as well as smaller improvements in rule accuracy and area under the ROC curve.
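Since several of the excerpts above refer to the weighted covering algorithm and the weighted relative accuracy heuristic, a minimal sketch may help. It assumes rules are predicates over an instance's features and that a learn_one_rule search routine (e.g. a CN2-style beam search maximising WRAcc) is supplied; the names and the weight-decay scheme are illustrative rather than the exact CN2-SD implementation.

    def wracc(examples, weights, covers, target):
        """WRAcc(H <- B) = p(B) * (p(H|B) - p(H)), computed on example weights.
        examples: list of (features, cls) pairs; weights: parallel list of floats;
        covers: predicate over features representing the rule body B."""
        total = sum(weights)
        covered = [(w, cls) for (x, cls), w in zip(examples, weights) if covers(x)]
        cov_w = sum(w for w, _ in covered)
        if cov_w == 0:
            return 0.0
        pos_w = sum(w for w, cls in covered if cls == target)
        p_target = sum(w for (x, cls), w in zip(examples, weights) if cls == target) / total
        return (cov_w / total) * (pos_w / cov_w - p_target)

    def weighted_covering(examples, target, learn_one_rule, gamma=0.5, max_rules=10):
        """Covered positives are down-weighted rather than removed, so later
        rules can still describe subgroups of the entire population."""
        weights = [1.0] * len(examples)
        rules = []
        for _ in range(max_rules):
            rule = learn_one_rule(examples, weights, target)  # assumed beam search over conditions
            if rule is None:
                break
            rules.append(rule)
            weights = [w * gamma if rule(x) and cls == target else w
                       for (x, cls), w in zip(examples, weights)]
        return rules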
... Decision trees can attribute symbolic decisions to new samples. A decision tree is a method for producing a rule base and, in fact, a method of representing knowledge (Clark & Niblett, 1987). Traditional decision trees are a powerful approach in machine learning, but they are of limited effectiveness in the presence of noise, missing attribute values in sample descriptions, high-cardinality attributes, insufficient sets of training samples, and inadequate partitioning of the value space for some attributes, as well as when numerical decisions are needed. ...
... Numerous studies have been performed on adapting inference procedures to imperfect or inconsistent trees (Quinlan, 1984; Quinlan, 1987; Quinlan, 1986; Mingers, 1989a; Mingers, 1989b; Clark and Niblett, 1987). Nonetheless, no method is predominant. ...
Chapter
Learning is the ability to improve behavior based on former experiences and observations. Nowadays, mankind continuously attempts to train computers for its purposes and make them smarter through training and experiments. Machine learning is a branch of artificial intelligence that aims at machines able to extract knowledge (learn) from the environment. Classical and fuzzy classification, as subcategories of machine learning, play an important role in reaching these goals in this area. In the present chapter, we undertake to elaborate and explain some useful and efficient methods of classical versus fuzzy classification. Moreover, we compare them, investigating their advantages and disadvantages.
... CN2 (Clark and Niblett, 1987; Clark and Niblett, 1989) is an algorithm designed to induce "if...then..." rules in domains where there might be noise. It consists of two main procedures: a search algorithm performing a beam search for a good rule, and a control algorithm for repeatedly executing the search. ...
... First, the interpretation of each rule is dependent on the rules that precede it. As pointed out by Clark and Niblett (1987), it is difficult for an expert to understand the true meaning of a rule far down in the list, especially with a large number of rules. Second, on each iteration, fewer training examples are available for the learning algorithm, which hinders the algorithm's ability to learn. ...
Article
Full-text available
Rules are commonly used in the field of Machine Learning because they are simple and intuitive. They can overlap and thus allow various different resolution strategies. An open issue is how to select the best strategy and how that can produce reliable probabilistic classification. The idea that this work proposes is to build a decision tree from an overlapping set of rules, using the rules as attributes, and to utilize the information extracted from that tree to find a better strategy and improve the performance of the rule set. The construction and evaluation of the tree is made through the use of Receiver Operating Characteristic (ROC) analysis, a technique for visualizing, organizing and selecting classifiers based on their performance. Various methods are proposed that consist of different search heuristics, splitting criteria and algorithms for choosing a resolution strategy. The most promising combination proved to be the creation of decision lists derived from a decision tree generated using AUCsplit, a splitting criterion based on ROC analysis. Experimental results on 23 UCI data sets show a significant increase in the Areas Under the ROC curve (AUCs), while classification accuracy is maintained at high values and rule set sizes are substantially reduced. The method also compared adequately against other good probability estimators. Even though the project produced encouraging results, there is still a lot of work to be done in this direction, as the idea of building a decision tree from a rule set and translating back to the rule set has not been systematically investigated.
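A sketch of the core construction described in this abstract: each learned rule becomes a binary attribute ("does the rule fire on this instance?") and a decision tree is grown over those attributes. scikit-learn has no AUCsplit criterion, so entropy is used here as a stand-in, and the helper names are ours.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def rules_to_features(X, rules):
        # X: list of instances; rules: list of predicates instance -> bool
        return np.array([[1 if rule(x) else 0 for rule in rules] for x in X])

    def tree_over_rules(X, y, rules):
        F = rules_to_features(X, rules)
        tree = DecisionTreeClassifier(criterion="entropy")  # stand-in for AUCsplit
        tree.fit(F, y)
        return tree

    # tree.predict_proba(rules_to_features(X_new, rules)) then yields the
    # probabilistic classification that the overlapping rule set alone cannot resolve.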
... In each case, the mean or mode is used (in the event of a tie in the mode version, a random selection is used) to fill in the missing values, based on the particular attribute in question, conditioned on the particular decision class the attribute belongs to. There are many variations on this theme, and the interested reader is directed to [3,4] for an extended discussion on this critical issue. Once missing values are handled, the next step is to discretise the dataset. ...
... We were able to achieve a high classification rate for this dataset, with an average of 89%. Our results provide reasonable classification accuracy, surpassing several reported values [2,3]. In the process of classifying the data, we were also able to reduce the dimensionality of the dataset to 7 attributes. ...
Article
Full-text available
This is an electronic version of a paper presented at the International Symposium on Health Informatics and Bioinformatics: HIBIT'05, 10-12 Nov 2005, Antalya, Turkey. Abstract: In this paper, we describe a rough sets approach to classification and attribute extraction for a small biomedical dataset. The dataset contains 148 entries with 19 attributes on patients that were suspected to have a lymphoma. Our primary goal was to be able to create a set of rules that allow the prediction of the decision class based on the values of relevant attributes. Our preliminary study of this dataset indicated that seven of the 19 attributes were predictive in this dataset. Our classification accuracy was approximately 85%, with a high sensitivity and specificity. In addition to the promising classification results, rough sets provided a means of dimensionality reduction and rule generation.
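The class-conditional imputation described in the excerpt above (fill a missing value with the mean or mode of the attribute, computed within the instance's decision class) can be sketched in a few lines of pandas; the column handling is illustrative.

    import pandas as pd

    def impute_by_class(df: pd.DataFrame, class_col: str) -> pd.DataFrame:
        out = df.copy()
        for col in out.columns:
            if col == class_col:
                continue
            if pd.api.types.is_numeric_dtype(out[col]):
                fill = out.groupby(class_col)[col].transform("mean")
            else:
                # first mode per class; a random tie-break, as the excerpt
                # mentions, could be substituted here
                fill = out.groupby(class_col)[col].transform(lambda s: s.mode().iloc[0])
            out[col] = out[col].fillna(fill)
        return out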
... Rule learning typically consists of two main procedures: the search procedure that performs search in order to find a single rule, and the control procedure that repeatedly executes the search. In the propositional rule learner CN2 [5,6], for instance, the search procedure performs beam search using the classification accuracy of the rule as a heuristic function. The accuracy of rule H ← B is equal to the conditional probability of head H given that the body B is satisfied: Acc(H ← B) = p(H | B). ...
... In the case of ties, we make the appropriate number of steps up and to the right at once, drawing a diagonal line segment. A description of this method applied to decision tree induction can be found in [8]. ...
Conference Paper
Full-text available
Relational rule learning is typically used in solving classification and prediction tasks. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach, applicable to subgroup discovery in individual-centered domains, was successfully applied to two standard ILP problems (East-West trains and KRK) and a real-life telecommunications application.
... A 78% accuracy rate was found by Cestnik et al. Zhang and Su [9] examined the ranking, or Area Under the Curve (AUC), of decision tree learning algorithms and Naïve Bayes. Different performance aspects are measured and illustrated using different metrics [21]. ...
Article
The major aim of this research is to rank the best-performing features in order to classify the software estimation dataset using SVM, Naïve Bayes, Random forest, Decision tree, and KNN classifiers, and to evaluate their accuracy. Two steps are involved in the classification process: first, the dataset with all attributes is analyzed; second, the information gain methodology is used to rank the attributes, and only the highly rated ones are used to generate the classification model. Using several folds of cross-validation, we assess the accuracy rank of the SVM, Naïve Bayes, Decision tree, Random forest, and KNN classifiers.
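A compact sketch of the two-step procedure the abstract describes, using scikit-learn: attributes are ranked by information gain (approximated here by mutual information), the top-k are kept, and the five classifiers are compared by cross-validated accuracy. The dataset, k and fold count are placeholders.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    def rank_and_evaluate(X, y, k=10, folds=10):
        # X: numpy array of shape (n_samples, n_features); y: class labels
        gain = mutual_info_classif(X, y)
        top = np.argsort(gain)[::-1][:k]  # indices of the k best-ranked attributes
        models = {"SVM": SVC(), "NB": GaussianNB(), "DT": DecisionTreeClassifier(),
                  "RF": RandomForestClassifier(), "KNN": KNeighborsClassifier()}
        return {name: cross_val_score(m, X[:, top], y, cv=folds).mean()
                for name, m in models.items()}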
... Then, we suppose a hospital plans to use Lymphography to screen some patients for lymphoma. The example contains 148 objects z_i, 18 conditional attributes c_j, 1 decision attribute d, and 4 possible final diagnostic classes (Clark and Niblett, 1987), where d = {d_1, d_2, d_3, d_4} stands for normal find, metastases, malign lymph and fibrosis, respectively. The conditional attributes refer to the data of disease conditions, which are obviously benefit attributes. ...
Article
Full-text available
In recent years, various medical diagnosis problems have been addressed from the perspective of multi-attribute decision making. Among them, three-way decision theory can provide a novel scheme to solve medical diagnosis issues under the framework of multi-attribute decision making by considering transforming relationships between loss functions and decision matrices. In this paper, we primarily explore a three-way decision method with tolerance dominance relations in ordered decision information systems. In existing three-way decision models, all objects can be divided into two states; we utilize decision attributes to obtain the set of two states in ordered decision information systems. Then, in order to improve the accuracy of patient classifications, the paper simultaneously considers the influence of loss and gain functions for each object, and uses loss and gain functions to obtain net profit functions as new measurement functions. Meanwhile, a class of three-way decisions in terms of multi-attribute decision-making rules based on a tolerance dominance relation is established. In light of the proposed three-way decision method, we further construct a multi-attribute decision-making method using tolerance dominance relations, and the constructed method is applied to a medical diagnosis issue of Lymphography. Finally, a comparison analysis and an experimental evaluation are performed to illustrate the feasibility and effectiveness of the presented methodology.
... Decision trees can attribute symbolic decisions to new samples. Automatic rule induction systems for inducing classification rules have already proved valuable as tools for assisting in the task of knowledge acquisition for expert systems [1]. ...
Chapter
Classification of time series signals can be crucial for many practical applications. While existing classifiers may accurately classify pure signals, the existence of noise can significantly disturb their classification accuracy. We propose a novel classification approach that uses multiple wavelets together with an ensemble of classifiers to return high classification accuracy even for noisy signals. The proposed technique has two main steps. In Step 1, we convert raw signals into a useful dataset by applying multiple wavelet transforms, each from a different wavelet family or all from the same family with differing filter lengths. In Step 2, we apply the dataset processed in Step 1 to an ensemble of classifiers. We test on 500 noisy signals from five different classes. Our experimental results demonstrate the effectiveness of the proposed technique on noisy signals, compared to approaches that use either raw signals or a single wavelet transform.
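The two-step scheme lends itself to a short sketch: Step 1 concatenates summary features from several wavelet transforms of each raw signal, and Step 2 trains an ensemble on the result. The wavelet names, the summary statistics and the choice of a random forest as the ensemble are assumptions for illustration, not the chapter's exact configuration.

    import numpy as np
    import pywt
    from sklearn.ensemble import RandomForestClassifier

    def wavelet_features(signal, wavelets=("db2", "db8", "sym4", "coif3")):
        # one single-level DWT per wavelet family; filter lengths differ across families
        feats = []
        for w in wavelets:
            cA, cD = pywt.dwt(signal, w)
            feats.extend([cA.mean(), cA.std(), cD.mean(), cD.std(),
                          np.sum(cA ** 2), np.sum(cD ** 2)])  # simple energy summaries
        return np.array(feats)

    def train_ensemble(signals, labels):
        X = np.array([wavelet_features(s) for s in signals])
        return RandomForestClassifier(n_estimators=200).fit(X, labels)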
... Let us call this resulting noise description-noise. Following, for example, [3], the main reason for description-noise may be a language used to represent attribute values that is not expressive enough to model different levels of knowledge granularity. In such a case, erroneous or missing attribute values may be introduced by users of a system who are required to provide very specific values, but whose level of knowledge of the domain is too general to precisely describe the observation by the appropriate value of an attribute. ...
... In fact, the difference between r = ⌈n/2⌉ and the LM bound remains about the same. What this result suggests is that, if n is large and we happen to know that the concept we intend to learn can be represented by a decision tree of low rank (for practical reasons we are only interested in relatively high values), the bound can be exploited. Since we apply binary coding for representing attribute-value pairs, a total of n = 25 bits are required to encode the 7 attributes. Figure 4a indicates the number of patterns required using a decision tree of rank r = 2, a neural network with 1 hidden unit, and LM with a priori knowledge (apk), that is, domain encoding knowledge. ...
Article
The purpose of this paper is to introduce a simple learning model that allows one to draw conclusions about the number of distinct training examples required to learn some boolean function with at least a given accuracy and probability, across a general class of learning algorithms. The motivation for this work stems from the inability of learning-theoretic models to suggest reasonable sample bounds. Reducing sample size is essential given the expected costs of having patterns labeled by a teacher (e.g., a human expert). The derived results are then extended from learning functions to learning concepts to make the analysis more realistic. The importance of domain-specific knowledge in learning concepts is discussed and incorporated into the model in the form of identifying impossible training patterns. Several possible sources for these impossibilities are pointed out. The paper concludes with two representative examples.
... Note that the examples may turn out to be inconsistent: crossing x1 with x2 according to a mask c may produce children that outperform the parents, while crossing the same x1 with another parent under the same crossover mask c may produce children that perform worse; since an example contains the description of only one parent, two examples with the same description may therefore belong to different classes, i.e., be inconsistent. The risk of inconsistency increases if the examples are described only by the mask of the operator concerned. These inconsistencies, which remain marginal in practice, do not penalize the proposed approach, insofar as many learning algorithms (including DiVS) can handle inconsistencies [50, 9]. ...
... Induction is performed using only representative training examples, either selected by the expert or automatically, as was done by the ESEL system (Clark and Niblett, 1986) for the task of soybean diagnosis. However, selection of noise based on expert advice is not practically feasible for data mining because of cost constraints, the non-availability of experts, and the difficulty of identifying noisy examples in large datasets. ...
Article
Full-text available
A Dynamic Rough Set based Decision Tree Induction (ROT) model is proposed to deal with the noise present in real datasets. The paper explores variants of ROT models for learning classification rules. The required set of classification rules is aimed at helping to identify households which are vulnerable to food shortage. The classification rules are desired to be as simple as possible. In this paper, the classical rough set method, the C4.5 algorithm, the hybrid algorithm ROT and its variants, as well as the dynamic ROT model, are used for mining rules from a real dataset. The experimental results are compared graphically with those of the base algorithms based on the performance parameters classification accuracy, complexity, number of rules, and the CS score for the resulting classifier. The accuracy obtained using Linear Discriminant Analysis is used as a benchmark for comparing the accuracy of the proposed model, called dynamic ROT. The performance of the proposed model is observed to be better for the real dataset.
... a) Experts: 85; b) AQ15: 80-82. P. Clark et al. (1987) [13]: a) Simple Bayes: 83; b) CN2: 82. G. Cestnik et al., Knowledge-Elicitation Tool (1987) [54]: 76. ...
Article
Learning time is an important factor when designing any computationally intelligent algorithms for classification, medication, control, etc. Recently, the Extreme Learning Machine has been proposed, which significantly reduces the amount of time needed to train a Neural Network. It has been widely used for many applications. This paper surveys ELM and its applications. I. INTRODUCTION. Neural Networks have been extensively used in many fields due to their ability to approximate complex nonlinear mappings directly from the input samples, and to provide models for a large class of natural and artificial phenomena that are difficult to handle using classical parametric techniques. There are many algorithms for training Neural Networks, like Back-propagation, Support Vector Machines (SVM) [41], Hidden Markov Models (HMM), etc. One of the disadvantages of Neural Networks is the learning time. Recently, Huang et al. [25], [67] proposed a new learning algorithm for the Single Layer Feedforward Neural Network architecture called the Extreme Learning Machine (ELM), which overcomes the problems caused by gradient-descent-based algorithms such as Back-propagation applied in ANNs. ELM can significantly reduce the amount of time needed to train a Neural Network. This paper presents a survey of the Extreme Learning Machine (ELM). The paper is organized as follows: Section 2 describes the working of ELM, and Section 3 presents the learning of ELM. Applications of ELM are reviewed in Section 4, and Section 5 concludes the paper.
... In (Prati, Baranauskas & Monard 2001) the standard PBM format is described in greater detail, together with a library of scripts that convert the concept-representation languages of the main symbolic ML algorithms: ID3 (Quinlan 1986), C4.5 (Quinlan 1988), C4.5rules (Quinlan 1988), C5.0/See5, CN2 (Clark & Niblett 1987; Clark & Boswell 1989; Clark & Boswell 1991), OC1 (Murthy, Kasif & Salzberg 1994), Ripper (Cohen 1995 ...
... Recent comparative studies of several selection measures for decision-tree induction (Mingers, 1989) show that Quinlan's Gain Ratio generates the smallest trees. We have compared our distance-based criterion with Quinlan's Gain Ratio using data of two medical domains (see Table 1) already used by other researchers (Clark & Niblett, 1987;Cestnik et al., 1987) to compare their inductive algorithms. In each domain we have taken different proportions (60%, 70%, 80%) of randomly selected examples for training and the remaining (40%, 30%, 20%) for testing. ...
... Our algorithm is based on an adaptation of the standard propositional rule learner CN2 [16, 13]. Its search procedure used in learning a single rule performs beam search, starting from the empty conjunct, successively adding conditions (relational features). ...
Article
Full-text available
We propose a methodology for predictive classification from gene expression data, able to combine the robustness of high-dimensional statistical classification methods with the comprehensibility and interpretability of simple logic-based models. We first construct a robust classifier combining contributions of a large number of gene expression values, and then search for compact summarizations of subgroups among genes associated in the classifier with a given class. The subgroups are described by means of relational logic features extracted from publicly available gene annotations. The curse of dimensionality pertaining to the gene-expression-based classification problem due to the large number of attributes (genes) is turned into an advantage in the secondary subgroup discovery task, as here the original attributes become learning examples.
... Our algorithm is based on an adaptation of the standard propositional rule learner CN2 [3, 4]. Its search procedure used in learning a single rule performs beam search, starting from the empty conjunct, successively adding conditions (relational features). ...
Article
Full-text available
We propose a methodology for predictive classification from gene expression data, able to combine the robustness of high-dimensional statistical classification methods with the comprehensibility and interpretability of simple logic-based models. We first construct a robust classifier combining contributions of a large number of gene expression values, and then (meta-)mine the classifier for compact summarizations of subgroups among genes associated with a given class therein. The subgroups are described by means of relational logic features extracted from publicly available gene ontology information. The curse of dimensionality pertaining to the gene-expression-based classification problem due to the large number of attributes (genes) is turned into an advantage in the secondary meta-mining task, as here the original attributes become learning examples. We cross-validate the proposed method on two classification problems: (i) acute lymphoblastic leukemia (ALL) vs. acute myeloid leukemia (AML), (ii) seven subclasses of ALL.
... We used the Ljubljana Breast Cancer Dataset [34], a set of 286 instances of real patient data with a binary outcome (Recurrence/No Recurrence) and 9 possible predictive attributes. This dataset has been used in the past for several machine learning projects [18, 4]. The dataset contains the following variables: ...
Article
We present a new framework for combining logic with probability, and demonstrate the application of this framework to breast cancer prognosis. Background knowledge concerning breast cancer prognosis is represented using logical arguments. This background knowledge and a database are used to build a Bayesian net that captures the probabilistic relationships amongst the variables. Causal hypotheses gleaned from the Bayesian net in turn generate new arguments. The Bayesian net can be queried to help decide when one argument attacks another. The Bayesian net is used to perform the prognosis, while the argumentation framework is used to provide a qualitative explanation of the prognosis.
... In the presence of inconsistency, any version space becomes empty (it is said to collapse) and hence the learning algorithm can fail by trivialization. In fact, as noticed by Clark and Niblett [6], very few real-world problems operate under consistent conditions. Inconsistency may arise due to the imperfectness of the "training set". ...
Conference Paper
Full-text available
A central issue in logical concept induction is the prospect of inconsistency. This problem may arise due to noise in the training data, or because the target concept does not fit the underlying concept class. In this paper, we introduce the paradigm of inductive belief merging, which handles this issue within a uniform framework. The key idea is to base learning on a belief merging operator that selects the concepts which are as close as possible to the set of training examples. From a computational perspective, we apply this paradigm to robust k-DNF learning. To this end, we develop a greedy algorithm which approximates the optimal concepts to within a logarithmic factor. The time complexity of the algorithm is polynomial in the size of k. Moreover, the method is bidirectional and returns one maximally specific concept and one maximally general concept. We present experimental results showing the effectiveness of our algorithm on both nominal and numerical datasets.
... The principal differences between zeroth-order and first-order supervised learning systems are the form of the training data and the way that a learned theory is expressed. Data for zeroth-order learning programs such as ASSISTANT [Cestnik, Kononenko and Bratko, 1986], CART [Breiman, Friedman, Olshen and Stone, 1984], CN2 [Clark and Niblett, 1987] and C4.5 [Quinlan, 1992] comprise preclassified cases, each described by its values for a fixed collection of attributes. These systems develop theories, in the form of decision trees or production rules, that relate a case's class to its attribute values. ...
Chapter
FOIL is a learning system that constructs Horn clause programs from examples. This paper summarises the development of FOIL from 1989 up to early 1993 and evaluates its effectiveness on a non-trivial sequence of learning tasks taken from a Prolog programming text. Although many of these tasks are handled reasonably well, the experiment highlights some weaknesses of the current implementation. Areas for further research are identified.
Article
An implicational base is knowledge extracted from a formal context. The implicational base of a formal context consists of attribute implications which are sound, complete, and non-redundant with respect to the formal context. Non-redundant means that no attribute implication in the implicational base can be inferred from the others. However, sometimes some attribute implications in the implicational base can be inferred from the others together with prior knowledge. Regarding knowledge discovery, such attribute implications should not be considered as new knowledge and should be ignored in the implicational base. In other words, such attribute implications are redundant with respect to prior knowledge. One sort of prior knowledge is a set of constraints that restricts some attributes in the data; in a formal context, constraints restrict some attributes of objects in the formal context. This article proposes a method to generate a non-redundant implicational base of a formal context with constraints restricting the formal context. In this case, non-redundant means that the implicational base does not contain any attribute implication which can be inferred from the others together with the information in the constraints. The article also proposes a formulation for checking redundant attribute implications, encoding the problem as a satisfiability (SAT) problem so that it can be solved by a SAT solver. After implementation, an experiment shows that the proposed method is able to check redundant attribute implications and generates a non-redundant implicational base of a formal context with constraints.
Chapter
Classification of time series signals has become an important construct and has many practical applications. With existing classifiers, we may be able to classify signals accurately; however, that accuracy may decline when using a reduced number of attributes. Transforming the data and then undertaking a dimensionality reduction may improve the quality of the data analysis, decrease the time required for classification and simplify models. We propose an approach which chooses suitable wavelets to transform the data, then combines the output from these transformations to construct a dataset, and then applies ensemble classifiers to it. We demonstrate this on different data sets across different classifiers and use different evaluation methods. Our experimental results demonstrate the effectiveness of the proposed technique, compared to approaches that use either raw signal data or a single wavelet transform. Keywords: Signal classification; Energy distribution; Wavelets; Ensembles
Article
Class label noise is a critical component of data quality that directly inhibits the predictive performance of machine learning algorithms. While many data-level and algorithm-level methods exist for treating label noise, the challenges associated with big data call for new and improved methods. This survey addresses these concerns by providing an extensive literature review on treating label noise within big data. We begin with an introduction to the class label noise problem and traditional methods for treating label noise. Next, we present 30 methods for treating class label noise in a range of big data contexts, i.e. high volume, high variety, and high velocity problems. The surveyed works include distributed solutions capable of operating on data sets of arbitrary sizes, deep learning techniques for large-scale data sets with limited clean labels, and streaming techniques for detecting class noise in the presence of concept drift. Common trends and best practices are identified in each of these areas, implementation details are reviewed, empirical results are compared across studies when applicable, and references to 17 open-source projects and programming packages are provided. An emphasis on label noise challenges, solutions, and empirical results as they relate to big data distinguishes this work as a unique contribution that will inspire future research and guide machine learning practitioners.
Conference Paper
In this paper, we propose a modified version of the Naïve Bayes Style Possibilistic Classifier (NBSPC), which has already been suggested for making decisions from the categorical and subjective medical information included in the lymphography dataset of the University of California Irvine (UCI). Like the former NBSPC, the modified classifier combines the structure of the Naïve Bayes Classifier (NBC), a good classifier for discrete features, with possibility theory, a powerful framework for belief estimation from subjective data. However, unlike the former NBSPC, which uses the minimum as a fusion operator, the proposed classifier fuses possibilistic beliefs using the generalized minimum-based algorithm, which has recently been proposed to deal with heterogeneous medical data. Experimental evaluations on the lymphography dataset show that the proposed G-Min-based NBSPC outperforms the former NBSPC as well as the main classification techniques which have been used in related work.
Chapter
The main aim of the article is to compare the results obtained using four different methods of combining classifiers in a dispersed decision-making system. In the article the following fusion methods are used: the majority vote, the weighted majority vote, the Borda count method and the highest rank method. Two of these methods are used when the individual classifier generates a class label, and two are used when the individual classifier produces a ranking of classes instead of a unique class choice. All of these methods were tested in a situation where we have access to data from the medical field and these data are in a dispersed form. The use of dispersed medical data is very important because it is a common situation that medical data from one domain are collected in many different medical centers. It would be good to be able to use all this accumulated knowledge at the same time.
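For reference, here are minimal sketches of the four fusion rules compared in the chapter. The first two assume each local classifier outputs a class label; the last two assume it outputs a complete ranking of the class labels, best first. The function names are ours.

    from collections import Counter

    def majority_vote(labels):
        return Counter(labels).most_common(1)[0][0]

    def weighted_majority_vote(labels, weights):
        score = Counter()
        for lab, w in zip(labels, weights):
            score[lab] += w
        return score.most_common(1)[0][0]

    def borda_count(rankings, classes):
        # each ranking awards a class (n_classes - 1 - position) points
        n = len(classes)
        score = {c: 0 for c in classes}
        for r in rankings:
            for pos, c in enumerate(r):
                score[c] += n - 1 - pos
        return max(score, key=score.get)

    def highest_rank(rankings, classes):
        # each class is credited with its best (lowest) position across rankings
        best = {c: min(r.index(c) for r in rankings) for c in classes}
        return min(best, key=best.get)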
Chapter
This chapter gives a brief overview of inductive rule learning and may therefore serve as a guide through the rest of the book. Later chapters will expand upon the material presented here and discuss advanced approaches, whereas this chapter only presents the core concepts. The chapter describes search heuristics and rule quality criteria, the basic covering algorithm, illustrates classification rule learning on simple propositional learning problems, shows how to use the learned rules for classifying new instances, and introduces the basic evaluation criteria and methodology for rule-set evaluation.
Chapter
This chapter introduces three case studies of big data. In particular, the methods and techniques introduced in Chaps. 3, 4, 5 and 6 are evaluated through theoretical analysis and empirical validation using large data sets in terms of accuracy, efficiency and interpretability.
Conference Paper
The article discusses issues related to a decision-making system using dispersed knowledge. In the proposed system, the classification process for a test object can be described as follows. In the first step, we investigate how particular classifiers classify the test object. We describe this using probability vectors over decision classes. We cluster classifiers with respect to similarities of the probability vectors. In the paper a new approach is proposed in which the clustering process consists of two stages, and three types of relations between classifiers are defined: friendship, conflict and neutrality. In the first stage initial groups are created; such a group contains classifiers that are in the friendship relation. In the second stage, classifiers which are in the neutrality relation are attached to the existing groups. In the experiments a situation is considered in which medical data from one domain are collected in many medical centers. We want to use all of the collected data at the same time in order to make global decisions.
Conference Paper
The main aim of the article is to present a decision-making system using dispersed knowledge. The article introduces a system with dynamically generated coalitions. The local knowledge bases, on the basis of which a similar classification for the test object is made, are combined into a coalition. In the proposed system, the classification process can be divided into several steps. In the first step we describe the classification of a test object, made on the basis of each local knowledge base, by probability vectors over decision classes. We cluster local knowledge bases with respect to similarities of the probability vectors. For every cluster, we find a kind of combined information. Finally, we classify the test object using the method for conflict analysis. The main aim of the paper is to present the results of experiments on medical data. In the experiments a situation is considered in which medical data from one domain are collected in many medical centers. We want to use all of the collected data at the same time in order to make global decisions.
Conference Paper
This paper investigates a Naïve Bayes Style Possibilistic Classifier (NBSPC) for making decisions from the categorical and subjective medical information included in the lymphography dataset of the University of California Irvine (UCI). The main focus of the work is to improve the classification accuracy. NBSPC simultaneously relies on the structure of the Naïve Bayes classifier, a good classifier for categorical features, and on possibility theory, an interesting framework to model and fuse subjective medical data. Possibilistic measures are estimated within the NBSPC using maximum likelihood estimation and then the probability-possibility transformation method of Dubois et al. Results show that the proposed classifier outperforms other classification techniques which have already been evaluated on the same data.
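The min-based fusion step at the heart of this classifier can be sketched as follows; the per-feature possibility tables (obtained, per the abstract, from maximum likelihood estimates via the Dubois et al. probability-possibility transform) are assumed to be given, and the data layout is illustrative.

    def classify(instance, classes, poss):
        """instance: tuple of categorical values; poss[c][j][v] is the possibility
        of value v for feature j given class c. The minimum t-norm fuses the
        per-feature degrees; the class with the highest fused possibility wins."""
        fused = {c: min(poss[c][j][v] for j, v in enumerate(instance)) for c in classes}
        return max(fused, key=fused.get)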
Article
Full-text available
In this paper we are concerned with the problem of acquiring knowledge by integration. Our aim is to construct an integrated knowledge base from several separate sources. The need to merge knowledge bases can arise, for example, when knowledge bases are acquired independently from interactions with several domain experts. As opinions of different domain experts may differ, the knowledge bases constructed in this way will normally differ too. A similar problem can also arise whenever separate knowledge bases are generated by learning algorithms. The objective of integration is to construct one system that exploits all the knowledge that is available and has a good performance. The aim of this paper is to discuss the methodology of knowledge integration, describe the implemented system (INTEG.3), and present some concrete results which demonstrate the advantages of this method.
Article
The effects of the number of attributes used to describe the data set, the number of examples included in the training set, and a post-pruning mechanism on the predictive power of classification rules for the automatic assignment of river water pollution levels were studied. In the induction experiments, the original ID3 algorithm embedded in the Knowledge Maker environment was extended with a post-pruning mechanism. In order to facilitate the evaluation of the developed classification rules, the algorithm of Reingold and Tilford for tidier drawing of trees was implemented. The results showed that efficient classification rules, in comparison with experts' class assignments, can already be derived from 500 examples of baseline data, each example being described by 5 attributes.
Article
Kernel methods, as alternatives to component analysis, are mathematical tools that provide a higher-dimensional representation for feature recognition and image analysis problems. In machine learning, the kernel trick is a method for converting a linear classification learning algorithm into a non-linear one, by mapping the original observations into a higher-dimensional space so that the use of a linear classifier in the new space is equivalent to a non-linear classifier in the original space. In this dissertation we present the performance results of several continuous distribution function kernels, lattice oscillation model kernels, Kelvin function kernels, and orthogonal polynomial kernels on select benchmarking databases. In addition, we develop methods to analyze the use of these kernels for projection analysis applications: principal component analysis, independent component analysis, and optimal projection analysis. We compare the performance results with known kernel methods on several benchmarks. Empirical results show that several of these kernels outperform other previously suggested kernels on these data sets. Additionally, we develop a genetic algorithm-based kernel optimal projection analysis method which, through extensive testing, demonstrates a ten percent average improvement in performance on all data sets over the kernel principal component analysis projection. We also compare our kernel methods for kernel eigenface representations with previous techniques. Finally, we analyze the benchmark databases used here to determine whether we can aid in the selection of a particular kernel that would perform optimally based on the statistical characteristics of each database.
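A small illustration of the kernel trick as described above: a linear algorithm (here scikit-learn's SVC fed a precomputed Gram matrix) becomes non-linear in the original space once dot products are replaced by a kernel function. The Gaussian (RBF) kernel is used as the example; none of this is specific to the dissertation's proposed kernels.

    import numpy as np
    from sklearn.svm import SVC

    def rbf_gram(A, B, gamma=0.5):
        # K[i, j] = exp(-gamma * ||A_i - B_j||^2): an implicit high-dimensional map
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def fit_predict(X_train, y_train, X_test):
        clf = SVC(kernel="precomputed").fit(rbf_gram(X_train, X_train), y_train)
        return clf.predict(rbf_gram(X_test, X_train))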
Article
In this article, the properties and usability of a combined operator based on t-norms and t-conorms are studied in medical data classification. It is shown how this operator can be weighted with differential evolution and aggregated with a generalized mean, and what kinds of measures for comparison can be achieved from this procedure. New operators suitable for comparison measures are suggested, namely combination measures based on the use of Frank- and Yu-type t-norms and t-conorms.
Article
Since Asperger's Syndrome was formally recognised in 1994, several novels featuring characters with the syndrome have appeared. Bill Greenwell's article discusses these books in providing a context for a closer consideration of the British publishing sensation of 2003, Mark Haddon's 'The Curious Incident of the Dog in the Night-Time'. The reasons for the success of this suburban comedy, Greenwell argues, include the consequences for the reader of Haddon's choice of the sufferer from Asperger's as narrator, especially the generation of unconscious humour and the range of literary forms he uses to tell his story. Nicholas Tucker adds an Afterword from his perspective as an educational psychologist as well as a literary critic, finding in the novel a rich mixture of heroism, mystery and love mediated through narrative ingenuity.
Article
Machine learning is concerned with enabling computer programs automatically to improve their performance at some tasks through experience. Manufacturing is an area where the application of machine learning can be very fruitful. However, little has been published about the use of machine-learning techniques in the manufacturing domain. This paper evaluates several machine-learning techniques and examines applications in which they have been successfully deployed. Special attention is given to inductive learning, which is among the most mature of the machine-learning approaches currently available. Current trends and recent developments in machine-learning research are also discussed. The paper concludes with a summary of some of the key research issues in machine learning.
Article
One subfield of machine learning is the induction of a representation of a concept from positive and negative examples of the concept. Given a set of training examples, the goal of the inductive system is to create a description capable of classifying the training examples, yet general enough to accurately predict the classification of unseen examples. Often the original attributes describing the instances are inadequate to capture important regularities in the concept. New descriptors, constructed through the application of operators to the original attributes, can provide the proper vocabulary to create concise concept representations at the right level of generalization to be highly predictive. Constructive induction is the process of generating and applying new descriptors during inductive learning. The large number of possible constructive operators and combinations of attributes defines an enormous search space for the inductive process. Knowledge about the concept or problem domain can be used to guide the construction of new descriptors. This thesis lays the foundation of opportunistic constructive induction in the context of decision-tree assembly, providing a framework for dynamically applying fragments of knowledge to produce potentially useful descriptors or hypotheses.
Article
Many existing inductive learning systems have been developed under the assumption that the learning tasks are performed in a noise-free environment. To cope with most real-world problems, it is important that a learning system be equipped with the capability to handle uncertainty. In this paper, we first identify the various sources of uncertainty that may be encountered in a noisy problem domain. Next, we present a method for the efficient acquisition of classification rules from training instances which may contain inconsistent, incorrect, or missing information. This algorithm consists of three phases: (i) the detection of inherent patterns in a set of noisy training data; (ii) the construction of classification rules based on these patterns; and (iii) the use of these rules to predict the class membership of an object. The method has been implemented in a system known as APACS (automatic pattern analysis and classification system). This system has been tested using both real-life and simulated data, and its performance is found to be superior to many existing systems in terms of efficiency and classification accuracy. Being able to handle uncertainty in the learning process, the proposed algorithm can be employed for applications in real-world problem domains involving noisy data.
Article
Full-text available
Using optimization tools such as genetic algorithms (GAs) to construct a fuzzy expert system (FES) focusing only on its accuracy, without considering comprehensibility, may result in a system that is not easy to understand. To exploit the transparency features of FESs for explanation in higher-level knowledge representation, a FES should provide high comprehensibility while preserving its accuracy. The completeness of fuzzy sets and rule structures should also be considered to guarantee that every data point has a response output. This paper proposes some quantitative measures to determine the degrees of accuracy, comprehensibility, and completeness of FESs. These quantitative measures are then used as a fitness function for a genetic algorithm in optimally constructing a FES.
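An illustrative fitness function in the spirit the abstract describes: a weighted blend of accuracy, comprehensibility and completeness degrees, each scaled to [0, 1]. The component measures and weights here are placeholders, not the paper's exact quantitative measures.

    def fitness(accuracy, n_rules, max_rules, coverage,
                w_acc=0.5, w_compr=0.25, w_compl=0.25):
        # accuracy, coverage in [0, 1]; n_rules <= max_rules assumed
        comprehensibility = 1.0 - n_rules / max_rules  # fewer rules -> easier to grasp
        completeness = coverage                        # fraction of inputs with a firing rule
        return w_acc * accuracy + w_compr * comprehensibility + w_compl * completeness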
Article
Full-text available
In a previous paper (Lopez de Mantaras, 1991), we introduced a new information-theoretic attribute selection method for decision tree induction. This method consists of computing, for each node, a distance between the partition generated by the values of each candidate attribute in the node and the correct partition of the subset of training examples in this node. The chosen attribute is the one whose corresponding partition is closest to the correct partition (i.e. the partition that perfectly classifies the training data). In that paper we also formally proved that this distance is not biased towards attributes with a large number of values in the sense specified by Quinlan (Quinlan, 1986), and we also had some initial experimental evidence that the predictive accuracy of the induced trees was not significantly different from that obtained with the most widely used information-theoretic attribute selection measures, that is, Quinlan's Gain and Quinlan's Gain Ratio. However, it seemed that the distance induced smaller trees, especially when the attributes had different numbers of values. In that paper we could not confirm that the differences were statistically significant due to the small number of experiments we had performed. Now in this paper we report experimental results that allow us to confirm that the distance induces trees whose size, without losing accuracy, is not significantly different from those obtained using Quinlan's Gain but smaller than those obtained with Quinlan's Gain Ratio. These experimental results are supported by a statistical analysis performed using two statistical hypothesis tests: the sign test and the signed rank test.
Article
This paper introduces a new modified approach to instance-based learning theory. Instance-based learning is augmented by neighborhood spheres and multi-pass training to improve both generalization capabilities and storage requirements. Two models for creating neighborhood spheres are investigated and put in perspective with the IBL instance-based learner. The IBL system considered here is based on the proximity algorithm, the growth additive algorithm and a noise-resistant modification of the growth additive algorithm. The experiments described herein address the similarity of the MPIL and the IBL algorithms, but also point out significant differences in the approach to reducing storage requirements and increasing generalization. A time complexity analysis of the proposed multi-pass instance-based learning approach is provided. Several domains are used in this study, including a real-world domain in CMOS wafer fault diagnosis, to allow for a comparison of these two approaches. Finally, the task of knowledge extraction in the form of rules is addressed.
Article
This paper describes FOIL, a system that learns Horn clauses from data expressed as relations. FOIL is based on ideas that have proved effective in attribute-value learning systems, but extends them to a first-order formalism. This new system has been applied successfully to several tasks taken from the machine learning literature.
Article
This paper presents a new approach to the intelligent navigation of a mobile robot. The hybrid control architecture described combines properties of purely reactive and behaviour-based systems, providing the ability both to learn automatically behaviours from inception, and to capture these in a distributed hierarchy of decision tree networks. The robot is first trained in the simplest world which has no obstacles, and is then trained in successively more complex worlds, using the knowledge acquired in the previous worlds. Each world representing the perceptual space is thus directly mapped on a unique rule layer which represents in turn the robot action space encoded in a distinct decision tree. A major advantage of the current implementation, compared with the previous work, is that the generated rules are easily understood by human users. The paper demonstrates that the proposed behavioural decomposition approach provides efficient management of complex knowledge, and that the learning mechanism is able to cope with noise and uncertainty in sensory data.
Chapter
In this paper we briefly survey the problems arising in learning concept descriptions from examples in domains affected by uncertainty and vagueness. A programming environment, called SMART-SHELL, is also presented: it addresses these problems, exploiting fuzzy logic. This is achieved by supplying the learning system with the capability of handling a fuzzy relational database, containing the extensional representation of the acquired logic formulas.
Article
Knowledge-based systems are based on often defective knowledge, whether this knowledge is acquired from experts or learned from examples. This paper presents a strategy designed to cope with defective knowledge: given a set of rules, it builds a similarity function over the work space of the problem. This similarity function, together with a set of examples, then enables case-based reasoning through a K-nearest-neighbour-like process. Compared to other case-based reasoning techniques, the advantage of this approach is the following: the "topology" of the space is automatically induced from the given rules, instead of being explicitly provided (and tuned) by the expert.
Chapter
This chapter provides an introduction to Learning Classifier Systems before reviewing a number of historical uses in data mining. An overview of the rest of the volume is then presented.
Chapter
A series of experiments dealing with the discovery of efficient classification procedures from large numbers of examples is described, with a case study from the chess end game king-rook versus king-knight. After an outline of the inductive inference machinery used, the paper reports on trials leading to correct and very fast attribute-based rules for the relations lost 2-ply and lost 3-ply. On another tack, a model of the performance of an idealized induction system is developed and its somewhat surprising predictions compared with observed results. The paper ends with a description of preliminary work on the automatic specification of relevant attributes.
Article
Inductively derived decision rules which correctly classify all legal Black-to-move positions in the king and pawn versus king endgame are derived using a technique of ‘structured’ induction.
Article
The problem of formulating general concepts from specific training examples has long been a major focus of machine learning research. While most previous research has focused on empirical methods for generalizing from a large number of training examples using no domain-specific knowledge, in the past few years new methods have been developed for applying domain-specific knowledge to formulate valid generalizations from single training examples. The characteristic common to these methods is that their ability to generalize from a single example follows from their ability to explain why the training example is a member of the concept being learned. This paper proposes a general, domain-independent mechanism, called EBG, that unifies previous approaches to explanation-based generalization. The EBG method is illustrated in the context of several example problems, and used to contrast several existing systems for explanation-based generalization. The perspective on explanation-based generalization afforded by this general method is also used to identify open research problems in this area.
Article
Typescript. Thesis (M.S.)--University of Illinois at Urbana-Champaign, 1984. Includes bibliographical references (leaves 105-106).
Kononenko I., Bratko I., Roskar E. (1984) Experiments in automatic learning of medical diagnostic rules. Presented at the International School for the Synthesis of Expert Knowledge Workshop 1984, Bled, Yugoslavia. Also published as a Technical Report, Faculty of Electrical Engineering, E. Kardelj University, Ljubljana, Yugoslavia, 1984.
Wald A. (1947) Sequential Analysis. New York: Wiley.
Kodratoff Y., Tecuci G. (1986) Rule Learning in DISCIPLE. Proceedings of EWSL 1986. Orsay: Université de Paris-Sud.
Gascuel O. (1986) PLAGE: A way to give and use knowledge in learning. Proceedings of EWSL 1986. Orsay: Université de Paris-Sud.
Haiech J., Quinqueton J., Sallantin J. (1986) Concept formation from sequential data. Proceedings of EWSL 1986. Orsay: Université de Paris-Sud.
Quinlan J. (1986) Learning from noisy data. In Machine Learning vol. 2, ed. R. Michalski, J. Carbonell and T. Mitchell. Palo Alto, CA: Tioga.
Niblett T., Bratko I. (1986) Learning decision rules in noisy domains. Presented at Expert Systems 86, Brighton, 15-18 Dec., and published in the Conference Proceedings.