Article

Induction in Noisy Domains

Authors: Peter Clark, Tim Niblett (The Turing Institute, Glasgow)

Abstract

This paper examines the induction of classification rules from examples using real-world data. Real-world data is almost always characterized by two features which are important for the design of an induction algorithm. Firstly, there is often noise present, for example due to imperfect measuring equipment used to collect the data. Secondly, the description language is often incomplete, such that examples with identical descriptions in the language will not always be members of the same class. Many induction systems make the 'noiseless domain' assumption that the examples do not contain errors and the description language is complete, and consequently constrain their search for rules to those for which no counterexamples exist in the data used for induction. However, in real-world domains correlations between attributes and classes in a data set are rarely without exceptions. To locate such correlations and induce rules describing them, it is also necessary to consider rules which may not classify all the training examples correctly. This paper firstly discusses some of the problems presented by noise and proposes a top-down induction algorithm for induction in real-world domains. Secondly, an experimental comparison of this algorithm with other induction systems is presented using three sets of real-world medical data.


... Predicting recurrence is important for assisting the identification of patients with critical prognosis and minimising unnecessary therapies. We have chosen this domain because real data from a public dataset is available and has been used repeatedly in the machine learning literature from 1986 up to 2011 ([12] [4]). It includes 286 instances of real patients who went through a breast cancer operation (9 records contain incomplete values). ...
... Table 3 clearly emphasises the high prediction rate of our model against machine-learning (ML) classifiers. Our approach does not require any training/learning and the output is always the same (unlike ML classifiers). Each case is evaluated independently by this CAF and the size of the dataset is negligible. ...
... Results are then averaged over all folds, giving the cross-validation estimate of the accuracy. x% split: x% of the records of the dataset are used for training the classifier and the remaining (100-x)% are used to test the model and check its predictive capacity. We recall that in the experiments we have evaluated just one expert's knowledge, which is not trained to fit the data but is used to build the CAF ...
Conference Paper
Full-text available
This study investigates the role of defeasible reasoning and argumentation theory for decision support in the health-care sector. The main objective is to support clinicians with a tool for taking plausible and rational medical decisions that can be better justified and explained. The basic principles of argumentation theory are described and demonstrated in a well-known health scenario: the breast cancer recurrence problem. It is shown how to translate clinical evidence in the form of arguments, how to define defeat relations among them and how to create a formal argumentation framework. Acceptability semantics are then applied over this framework to compute the justification status of arguments. It is demonstrated how this process can enhance clinician decision-making. In detail, the designed framework is developed according to the knowledge base of an interviewed expert in cancers, and according to the evidence available in a well-known dataset. This dataset has been frequently applied by machine learning techniques to test their capacity to predict the recurrence of breast cancers removed from 286 real patients, and it has been used here to evaluate our argument-based approach. An encouraging 74% predictive accuracy is compared against the accuracy of well-established machine-learning classifiers that performed equally well as, or worse than, our argument-based approach. This result is extremely promising, not only because it demonstrates how a knowledge-based paradigm can perform as well as state-of-the-art learning-based paradigms, but also because it appears to have a better explanatory capacity and a higher degree of intuitiveness that might be appealing to clinicians.
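The acceptability computation mentioned above can be made concrete with a small sketch. The following Python fragment computes Dung's grounded extension of an abstract argumentation framework (a set of arguments plus a defeat relation) by iterating the characteristic function to its least fixed point; grounded semantics is one possible choice of acceptability semantics, and the data encoding and function names here are illustrative, not taken from the paper.

    def grounded_extension(arguments, attacks):
        """arguments: iterable of argument ids; attacks: set of (attacker, target) pairs."""
        attackers = {a: {x for (x, t) in attacks if t == a} for a in arguments}

        def defended(a, s):
            # a is defended by s if every attacker of a is itself attacked by a member of s
            return all(any((d, b) in attacks for d in s) for b in attackers[a])

        s = set()
        while True:
            nxt = {a for a in arguments if defended(a, s)}
            if nxt == s:
                return s  # least fixed point: the grounded extension
            s = nxt

    # Example: C attacks B, B attacks A; the grounded extension is {A, C}.
    print(grounded_extension({"A", "B", "C"}, {("B", "A"), ("C", "B")}))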
... This paper investigates how to adapt classical classification rule learning approaches to subgroup discovery, by exploiting the information about class membership in training examples. This paper shows how this can be achieved by appropriately modifying the well-known CN2 rule learning algorithm [4, 5, 3], which we have implemented in Java and incorporated in the WEKA data mining environment [16]. The modified CN2 algorithm and its experimental evaluation in selected domains of the UCI Repository of Machine Learning Databases [12] are outlined. ...
... The CN2 Rule Induction Algorithm. CN2 is an algorithm for inducing propositional classification rules [4, 5]. CN2 consists of two main procedures: the search procedure that performs beam search in order to find a single rule and the control procedure that repeatedly executes the search. ...
... The performance of different variants of the CN2 rule induction algorithm was measured using 10-fold stratified cross-validation. In particular, we compared the CN2-SD subgroup discovery algorithm with the standard CN2 algorithm (CN2-standard, described in [4, 5, 3]) and the CN2 algorithm using WRAcc (CN2-WRAcc, described in [15]). All these variants of the CN2 algorithm were first re-implemented in the WEKA data mining environment [16], because the use of the same system makes the comparisons more impartial. ...
Article
Full-text available
Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can also be adapted to subgroup discovery. This paper shows how this can be achieved by modifying the CN2 rule learning algorithm. Modifications include a new covering algorithm (the weighted covering algorithm), a new search heuristic (weighted relative accuracy), probabilistic classification of instances, and a new measure for evaluating the results of subgroup discovery (area under the ROC curve). The main advantage of the proposed approach is that each rule with high weighted accuracy represents a 'chunk' of knowledge about the problem, due to the appropriate tradeoff between accuracy and coverage, achieved through the use of the weighted relative accuracy heuristic. Moreover, unlike the classical covering algorithm, in which only the first few induced rules may be of interest as subgroup descriptors with sufficient coverage (since subsequently induced rules are induced from biased example subsets), the subsequent rules induced by the weighted covering algorithm allow for discovering interesting subgroup properties of the entire population. Experimental results on 17 UCI datasets are very promising, demonstrating big improvements in the number of induced rules, rule coverage and rule significance, as well as smaller improvements in rule accuracy and area under the ROC curve.
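Since several of the excerpts above refer to the weighted covering algorithm and the weighted relative accuracy heuristic, a minimal sketch may help. It assumes rules are predicates over an instance's features and that a learn_one_rule search routine (e.g. a CN2-style beam search maximising WRAcc) is supplied; the names and the weight-decay scheme are illustrative rather than the exact CN2-SD implementation.

    def wracc(examples, weights, covers, target):
        """WRAcc(H <- B) = p(B) * (p(H|B) - p(H)), computed on example weights.
        examples: list of (features, cls) pairs; weights: parallel list of floats;
        covers: predicate over features representing the rule body B."""
        total = sum(weights)
        covered = [(w, cls) for (x, cls), w in zip(examples, weights) if covers(x)]
        cov_w = sum(w for w, _ in covered)
        if cov_w == 0:
            return 0.0
        pos_w = sum(w for w, cls in covered if cls == target)
        p_target = sum(w for (x, cls), w in zip(examples, weights) if cls == target) / total
        return (cov_w / total) * (pos_w / cov_w - p_target)

    def weighted_covering(examples, target, learn_one_rule, gamma=0.5, max_rules=10):
        """Covered positives are down-weighted rather than removed, so later
        rules can still describe subgroups of the entire population."""
        weights = [1.0] * len(examples)
        rules = []
        for _ in range(max_rules):
            rule = learn_one_rule(examples, weights, target)  # assumed beam search over conditions
            if rule is None:
                break
            rules.append(rule)
            weights = [w * gamma if rule(x) and cls == target else w
                       for (x, cls), w in zip(examples, weights)]
        return rules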
... Decision trees can attribute symbolic decisions to new samples. A decision tree is a method for producing a rule base and, in fact, a method of representing knowledge (Clark & Niblett, 1987). Traditional decision trees are a powerful approach in machine learning, but they are of limited effectiveness in the presence of noise, missing attribute values in sample descriptions, high-cardinality attributes, insufficient sets of training samples, and inadequate partitioning of the value space for some attributes, as well as when numerical decisions are needed. ...
... Numerous studies have been performed on adapting inference procedures to imperfect or inconsistent trees (Quinlan, 1984; Quinlan, 1987; Quinlan, 1986; Mingers, 1989a; Mingers, 1989b; Clark and Niblett, 1987). Nonetheless, no method is predominant. ...
Chapter
Learning is the ability to improve behavior based on former experiences and observations. Nowadays, mankind continuously attempts to train computers for its purposes and make them smarter through training and experiments. Machine learning is a branch of artificial intelligence that aims at machines able to extract knowledge (learn) from the environment. Classical and fuzzy classification, as subcategories of machine learning, play an important role in reaching these goals in this area. In the present chapter, we undertake to elaborate and explain some useful and efficient methods of classical versus fuzzy classification. Moreover, we compare them, investigating their advantages and disadvantages.
... CN2 (Clark and Niblett, 1987; Clark and Niblett, 1989) is an algorithm designed to induce "if...then..." rules in domains where there might be noise. It consists of two main procedures: a search algorithm performing a beam search for a good rule, and a control algorithm for repeatedly executing the search. ...
... First, the interpretation of each rule is dependent on the rules that precede it. As pointed out by Clark and Niblett (1987), it is difficult for an expert to understand the true meaning of a rule far down in the list, especially with a large number of rules. Second, on each iteration, fewer training examples are available for the learning algorithm, which hinders the algorithm's ability to learn. ...
Article
Full-text available
Rules are commonly used in the field of Machine Learning because they are simple and intuitive. They can overlap and thus allow various different resolution strategies. An open issue is how to select the best strategy and how that can produce reliable probabilistic classification. The idea that this work proposes is to build a decision tree from an overlapping set of rules, using the rules as attributes, and to utilize the information extracted from that tree to find a better strategy and improve the performance of the rule set. The construction and evaluation of the tree is made through the use of Receiver Operating Characteristic (ROC) analysis, a technique for visualizing, organizing and selecting classifiers based on their performance. Various methods are proposed that consist of different search heuristics, splitting criteria and algorithms for choosing a resolution strategy. The most promising combination proved to be the creation of decision lists derived from a decision tree generated using AUCsplit, a splitting criterion based on ROC analysis. Experimental results on 23 UCI data sets show a significant increase in the Areas Under the ROC curve (AUCs), while classification accuracy is maintained at high values and rule set sizes are substantially reduced. The method also compared adequately against other good probability estimators. Even though the project produced encouraging results, there is still a lot of work to be done in this direction, as the idea of building a decision tree from a rule set and translating back to the rule set has not been systematically investigated.
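A sketch of the core construction described in this abstract: each learned rule becomes a binary attribute ("does the rule fire on this instance?") and a decision tree is grown over those attributes. scikit-learn has no AUCsplit criterion, so entropy is used here as a stand-in, and the helper names are ours.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def rules_to_features(X, rules):
        # X: list of instances; rules: list of predicates instance -> bool
        return np.array([[1 if rule(x) else 0 for rule in rules] for x in X])

    def tree_over_rules(X, y, rules):
        F = rules_to_features(X, rules)
        tree = DecisionTreeClassifier(criterion="entropy")  # stand-in for AUCsplit
        tree.fit(F, y)
        return tree

    # tree.predict_proba(rules_to_features(X_new, rules)) then yields the
    # probabilistic classification that the overlapping rule set alone cannot resolve.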
... In each case, the mean or mode is used (in the event of a tie in the mode version, a random selection is used) to fill in the missing values, based on the particular attribute in question, conditioned on the particular decision class the attribute belongs to. There are many variations on this theme, and the interested reader is directed to [3,4] for an extended discussion on this critical issue. Once missing values are handled, the next step is to discretise the dataset. ...
... We were able to achieve a high classification rate for this dataset, with an average of 89%. Our results provide reasonable classification accuracy, surpassing several reported values [2,3]. In the process of classifying the data, we were also able to reduce the dimensionality of the dataset to 7 attributes. ...
Article
Full-text available
This is an electronic version of a paper presented at the International Symposium on Health Informatics and Bioinformatics: HIBIT'05, 10-12 Nov 2005, Antalya, Turkey. Abstract: In this paper, we describe a rough sets approach to classification and attribute extraction for a small biomedical dataset. The dataset contains 148 entries with 19 attributes on patients that were suspected to have a lymphoma. Our primary goal was to be able to create a set of rules that allow the prediction of the decision class based on the values of relevant attributes. Our preliminary study of this dataset indicated that seven of the 19 attributes were predictive in this dataset. Our classification accuracy was approximately 85%, with a high sensitivity and specificity. In addition to the promising classification results, rough sets provided a means of dimensionality reduction and rule generation.
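The class-conditional imputation described in the excerpt above (fill a missing value with the mean or mode of the attribute, computed within the instance's decision class) can be sketched in a few lines of pandas; the column handling is illustrative.

    import pandas as pd

    def impute_by_class(df: pd.DataFrame, class_col: str) -> pd.DataFrame:
        out = df.copy()
        for col in out.columns:
            if col == class_col:
                continue
            if pd.api.types.is_numeric_dtype(out[col]):
                fill = out.groupby(class_col)[col].transform("mean")
            else:
                # first mode per class; a random tie-break, as the excerpt
                # mentions, could be substituted here
                fill = out.groupby(class_col)[col].transform(lambda s: s.mode().iloc[0])
            out[col] = out[col].fillna(fill)
        return out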
... Rule learning typically consists of two main procedures: the search procedure that performs search in order to find a single rule, and the control procedure that repeatedly executes the search. In the propositional rule learner CN2 [5,6], for instance, the search procedure performs beam search using the classification accuracy of the rule as a heuristic function. The accuracy of rule H ← B is equal to the conditional probability of head H given that the body B is satisfied: Acc(H ← B) = p(H | B). ...
... In the case of ties, we make the appropriate number of steps up and to the right at once, drawing a diagonal line segment. A description of this method applied to decision tree induction can be found in [8]. ...
Conference Paper
Full-text available
Relational rule learning is typically used in solving classification and prediction tasks. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach, applicable to subgroup discovery in individual-centered domains, was successfully applied to two standard ILP problems (East-West trains and KRK) and a real-life telecommunications application.
... A 78% accuracy rate was found by Cestnik et al. Zhang and Su [9] examined the ranking, or Area Under the Curve (AUC), of decision tree learning algorithms and Naïve Bayes. Different performance aspects are measured and illustrated using different metrics [21]. ...
Article
The major aim of this research is to rank the best-performing features in order to classify the software estimation dataset using SVM, Naïve Bayes, Random forest, Decision tree, and KNN classifiers, and to evaluate their accuracy. Two steps are involved in the classification process: first, the dataset with all attributes is analyzed; second, the information gain methodology is used to rank the attributes, and only the highly rated ones are used to generate the classification model. Using several folds of cross-validation, we assess the accuracy rank of the SVM, Naïve Bayes, Decision tree, Random forest, and KNN classifiers.
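A compact sketch of the two-step procedure the abstract describes, using scikit-learn: attributes are ranked by information gain (approximated here by mutual information), the top-k are kept, and the five classifiers are compared by cross-validated accuracy. The dataset, k and fold count are placeholders.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    def rank_and_evaluate(X, y, k=10, folds=10):
        # X: numpy array of shape (n_samples, n_features); y: class labels
        gain = mutual_info_classif(X, y)
        top = np.argsort(gain)[::-1][:k]  # indices of the k best-ranked attributes
        models = {"SVM": SVC(), "NB": GaussianNB(), "DT": DecisionTreeClassifier(),
                  "RF": RandomForestClassifier(), "KNN": KNeighborsClassifier()}
        return {name: cross_val_score(m, X[:, top], y, cv=folds).mean()
                for name, m in models.items()}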
... Then, we suppose a hospital plans to use Lymphography to screen some patients for lymphoma. The example contains 148 objects z_i, 18 conditional attributes c_j, 1 decision attribute d, and 4 possible final diagnostic classes (Clark and Niblett, 1987), where d = {d_1, d_2, d_3, d_4} stands for normal find, metastases, malign lymph and fibrosis, respectively. The conditional attributes refer to the data of disease conditions, which are obviously benefit attributes. ...
Article
Full-text available
In recent years, various medical diagnosis problems have been addressed from the perspective of multi-attribute decision making. Among them, three-way decision theory can provide a novel scheme to solve medical diagnosis issues under the framework of multi-attribute decision making by considering transforming relationships between loss functions and decision matrices. In this paper, we primarily explore a three-way decision method with tolerance dominance relations in ordered decision information systems. In existing three-way decision models, all objects can be divided into two states; we utilize decision attributes to obtain the set of two states in ordered decision information systems. Then, in order to improve the accuracy of patient classifications, the paper simultaneously considers the influence of loss and gain functions for each object, and uses loss and gain functions to obtain net profit functions as new measurement functions. Meanwhile, a class of three-way decisions in terms of multi-attribute decision-making rules based on a tolerance dominance relation is established. In light of the proposed three-way decision method, we further construct a multi-attribute decision-making method using tolerance dominance relations, and the constructed method is applied to a medical diagnosis issue of Lymphography. Finally, a comparison analysis and an experimental evaluation are performed to illustrate the feasibility and effectiveness of the presented methodology.
... Decision trees can attribute symbolic decisions to new samples. Automatic rule induction systems for inducing classification rules have already proved valuable as tools for assisting in the task of knowledge acquisition for expert systems [1]. ...
Chapter
Classification of time series signals can be crucial for many practical applications. While existing classifiers may accurately classify pure signals, the existence of noise can significantly disturb their classification accuracy. We propose a novel classification approach that uses multiple wavelets together with an ensemble of classifiers to return high classification accuracy even for noisy signals. The proposed technique has two main steps. In Step 1, we convert raw signals into a useful dataset by applying multiple wavelet transforms, each from a different wavelet family or all from the same family with differing filter lengths. In Step 2, we apply the dataset processed in Step 1 to an ensemble of classifiers. We test on 500 noisy signals from five different classes. Our experimental results demonstrate the effectiveness of the proposed technique on noisy signals, compared to approaches that use either raw signals or a single wavelet transform.
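The two-step scheme lends itself to a short sketch: Step 1 concatenates summary features from several wavelet transforms of each raw signal, and Step 2 trains an ensemble on the result. The wavelet names, the summary statistics and the choice of a random forest as the ensemble are assumptions for illustration, not the chapter's exact configuration.

    import numpy as np
    import pywt
    from sklearn.ensemble import RandomForestClassifier

    def wavelet_features(signal, wavelets=("db2", "db8", "sym4", "coif3")):
        # one single-level DWT per wavelet family; filter lengths differ across families
        feats = []
        for w in wavelets:
            cA, cD = pywt.dwt(signal, w)
            feats.extend([cA.mean(), cA.std(), cD.mean(), cD.std(),
                          np.sum(cA ** 2), np.sum(cD ** 2)])  # simple energy summaries
        return np.array(feats)

    def train_ensemble(signals, labels):
        X = np.array([wavelet_features(s) for s in signals])
        return RandomForestClassifier(n_estimators=200).fit(X, labels)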
... Let us call this resulting noise description-noise. Following, for example, [3], the main reason for description-noise may be a language used to represent attribute values that is not expressive enough to model different levels of knowledge granularity. In such a case, erroneous or missing attribute values may be introduced by users of a system who are required to provide very specific values, but whose level of knowledge of the domain is too general to precisely describe the observation by the appropriate value of an attribute. ...
... In fact, the difference between r = ⌈n/2⌉ and the LM bound remains about the same. What this result suggests is that, if n is large and we happen to know that the concept we intend to learn can be represented by a decision tree of low rank (for practical reasons we are only interested in relatively high values), the bound can be exploited. Since we apply binary coding for representing attribute-value pairs, a total of n = 25 bits are required to encode the 7 attributes. Figure 4a indicates the number of patterns required using a decision tree of rank r = 2, a neural network with 1 hidden unit, and LM with a priori knowledge (apk), that is, domain encoding knowledge. ...
Article
The purpose of this paper is to introduce a simple learning model that allows one to draw conclusions about the number of distinct training examples required to learn some boolean function with at least a given accuracy and probability, across a general class of learning algorithms. The motivation for this work stems from the inability of learning-theoretic models to suggest reasonable sample bounds. Reducing sample size is essential given the expected costs of having patterns labeled by a teacher (e.g., a human expert). The derived results are then extended from learning functions to learning concepts to make the analysis more realistic. The importance of domain-specific knowledge in learning concepts is discussed and incorporated into the model in the form of identifying impossible training patterns. Several possible sources for these impossibilities are pointed out. The paper concludes with two representative examples.
... Note that the examples may turn out to be inconsistent: crossing x1 with x2 according to a mask c may produce children that outperform the parents, while crossing the same x1 with another parent under the same crossover mask c may produce children that perform worse; since an example contains the description of only one parent, two examples with the same description may therefore belong to different classes, i.e., be inconsistent. The risk of inconsistency increases if the examples are described only by the mask of the operator concerned. These inconsistencies, which remain marginal in practice, do not penalize the proposed approach, insofar as many learning algorithms (including DiVS) can handle inconsistencies [50, 9]. ...
... Induction is performed using only representative training examples, either selected by the expert or automatically, as was done by the ESEL system (Clark and Niblett, 1986) for the task of soybean diagnosis. However, selection of noise based on expert advice is not practically feasible for data mining because of cost constraints, the non-availability of experts, and the difficulty of identifying noisy examples in large datasets. ...
Article
Full-text available
A Dynamic Rough Set based Decision Tree Induction (ROT) model is proposed to deal with the noise present in real datasets. The paper explores variants of ROT models for learning classification rules. The required set of classification rules is aimed at helping to identify households which are vulnerable to food shortage. The classification rules are desired to be as simple as possible. In this paper, the classical rough set method, the C4.5 algorithm, the hybrid algorithm ROT and its variants, as well as the dynamic ROT model, are used for mining rules from a real dataset. The experimental results are compared graphically with those of the base algorithms based on the performance parameters classification accuracy, complexity, number of rules, and the CS score for the resulting classifier. The accuracy obtained using Linear Discriminant Analysis is used as a benchmark for comparing the accuracy of the proposed model, called dynamic ROT. The performance of the proposed model is observed to be better for the real dataset.
... a) Experts: 85; b) AQ15: 80-82. P. Clark et al. (1987) [13]: a) Simple Bayes: 83; b) CN2: 82. G. Cestnik et al., Knowledge-Elicitation Tool (1987) [54]: 76. ...
Article
Learning time is an important factor when designing any computationally intelligent algorithms for classification, medication, control, etc. Recently, the Extreme Learning Machine has been proposed, which significantly reduces the amount of time needed to train a Neural Network. It has been widely used for many applications. This paper surveys ELM and its applications. I. INTRODUCTION. Neural Networks have been extensively used in many fields due to their ability to approximate complex nonlinear mappings directly from the input samples, and to provide models for a large class of natural and artificial phenomena that are difficult to handle using classical parametric techniques. There are many algorithms for training Neural Networks, like Back-propagation, Support Vector Machines (SVM) [41], Hidden Markov Models (HMM), etc. One of the disadvantages of Neural Networks is the learning time. Recently, Huang et al. [25], [67] proposed a new learning algorithm for the Single Layer Feedforward Neural Network architecture called the Extreme Learning Machine (ELM), which overcomes the problems caused by gradient-descent-based algorithms such as Back-propagation applied in ANNs. ELM can significantly reduce the amount of time needed to train a Neural Network. This paper presents a survey of the Extreme Learning Machine (ELM). The paper is organized as follows: Section 2 describes the working of ELM, and Section 3 presents the learning of ELM. Applications of ELM are reviewed in Section 4, and Section 5 concludes the paper.
... In (Prati, Baranauskas & Monard 2001) the standard PBM format is described in greater detail, together with a library of scripts that convert the concept-representation languages of the main symbolic ML algorithms: ID3 (Quinlan 1986), C4.5 (Quinlan 1988), C4.5rules (Quinlan 1988), C5.0/See5, CN2 (Clark & Niblett 1987; Clark & Boswell 1989; Clark & Boswell 1991), OC1 (Murthy, Kasif & Salzberg 1994), Ripper (Cohen 1995 ...
... Recent comparative studies of several selection measures for decision-tree induction (Mingers, 1989) show that Quinlan's Gain Ratio generates the smallest trees. We have compared our distance-based criterion with Quinlan's Gain Ratio using data of two medical domains (see Table 1) already used by other researchers (Clark & Niblett, 1987;Cestnik et al., 1987) to compare their inductive algorithms. In each domain we have taken different proportions (60%, 70%, 80%) of randomly selected examples for training and the remaining (40%, 30%, 20%) for testing. ...
... Our algorithm is based on an adaptation of the standard propositional rule learner CN2 [16, 13]. Its search procedure used in learning a single rule performs beam search, starting from the empty conjunct, successively adding conditions (relational features). ...
Article
Full-text available
We propose a methodology for predictive classification from gene expression data, able to combine the robustness of high-dimensional statistical classification methods with the comprehensibility and interpretability of simple logic-based models. We first construct a robust classifier combining contributions of a large number of gene expression values, and then search for compact summarizations of subgroups among genes associated in the classifier with a given class. The subgroups are described by means of relational logic features extracted from publicly available gene annotations. The curse of dimensionality pertaining to the gene-expression-based classification problem due to the large number of attributes (genes) is turned into an advantage in the secondary subgroup discovery task, as here the original attributes become learning examples.
... Our algorithm is based on an adaptation of the standard propositional rule learner CN2 [3, 4]. Its search procedure used in learning a single rule performs beam search, starting from the empty conjunct, successively adding conditions (relational features). ...
Article
Full-text available
We propose a methodology for predictive classification from gene expression data, able to combine the robustness of high-dimensional statistical classification methods with the comprehensibility and interpretability of simple logic-based models. We first construct a robust classifier combining contributions of a large number of gene expression values, and then (meta-)mine the classifier for compact summarizations of subgroups among genes associated with a given class therein. The subgroups are described by means of relational logic features extracted from publicly available gene ontology information. The curse of dimensionality pertaining to the gene-expression-based classification problem due to the large number of attributes (genes) is turned into an advantage in the secondary meta-mining task, as here the original attributes become learning examples. We cross-validate the proposed method on two classification problems: (i) acute lymphoblastic leukemia (ALL) vs. acute myeloid leukemia (AML), (ii) seven subclasses of ALL.
... We used the Ljubljana Breast Cancer Dataset [34], a set of 286 instances of real patient data with a binary outcome (Recurrence/No Recurrence) and 9 possible predictive attributes. This dataset has been used in the past for several machine learning projects [18, 4]. The dataset contains the following variables: ...
Article
We present a new framework for combining logic with probability, and demonstrate the application of this framework to breast cancer prognosis. Background knowledge concerning breast cancer prognosis is represented using logical arguments. This background knowledge and a database are used to build a Bayesian net that captures the probabilistic relationships amongst the variables. Causal hypotheses gleaned from the Bayesian net in turn generate new arguments. The Bayesian net can be queried to help decide when one argument attacks another. The Bayesian net is used to perform the prognosis, while the argumentation framework is used to provide a qualitative explanation of the prognosis.
... In the presence of inconsistency, any version space becomes empty (it is said to collapse) and hence the learning algorithm can fail by trivialization. In fact, as noticed by Clark and Niblett [6], very few real-world problems operate under consistent conditions. Inconsistency may arise due to the imperfectness of the "training set". ...
Conference Paper
Full-text available
A central issue in logical concept induction is the prospect of inconsistency. This problem may arise due to noise in the training data, or because the target concept does not fit the underlying concept class. In this paper, we introduce the paradigm of inductive belief merging, which handles this issue within a uniform framework. The key idea is to base learning on a belief merging operator that selects the concepts which are as close as possible to the set of training examples. From a computational perspective, we apply this paradigm to robust k-DNF learning. To this end, we develop a greedy algorithm which approximates the optimal concepts to within a logarithmic factor. The time complexity of the algorithm is polynomial in the size of k. Moreover, the method is bidirectional and returns one maximally specific concept and one maximally general concept. We present experimental results showing the effectiveness of our algorithm on both nominal and numerical datasets.
... The principal differences between zeroth-order and first-order supervised learning systems are the form of the training data and the way that a learned theory is expressed. Data for zeroth-order learning programs such as ASSISTANT [Cestnik, Kononenko and Bratko, 1986], CART [Breiman, Friedman, Olshen and Stone, 1984], CN2 [Clark and Niblett, 1987] and C4.5 [Quinlan, 1992] comprise preclassified cases, each described by its values for a fixed collection of attributes. These systems develop theories, in the form of decision trees or production rules, that relate a case's class to its attribute values. ...
Chapter
FOIL is a learning system that constructs Horn clause programs from examples. This paper summarises the development of FOIL from 1989 up to early 1993 and evaluates its effectiveness on a non-trivial sequence of learning tasks taken from a Prolog programming text. Although many of these tasks are handled reasonably well, the experiment highlights some weaknesses of the current implementation. Areas for further research are identified.
Article
An implicational base is knowledge extracted from a formal context. The implicational base of a formal context consists of attribute implications which are sound, complete, and non-redundant with respect to the formal context. Non-redundant means that no attribute implication in the implicational base can be inferred from the others. However, sometimes some attribute implications in the implicational base can be inferred from the others together with prior knowledge. Regarding knowledge discovery, such attribute implications should not be considered as new knowledge and should be ignored in the implicational base. In other words, such attribute implications are redundant with respect to prior knowledge. One sort of prior knowledge is a set of constraints that restricts some attributes in the data; in a formal context, constraints restrict some attributes of objects in the formal context. This article proposes a method to generate a non-redundant implicational base of a formal context with constraints restricting the formal context. In this case, non-redundant means that the implicational base does not contain any attribute implication which can be inferred from the others together with the information in the constraints. The article also proposes a formulation for checking redundant attribute implications, encoding the problem as a satisfiability (SAT) problem so that it can be solved by a SAT solver. After implementation, an experiment shows that the proposed method is able to check redundant attribute implications and generates a non-redundant implicational base of a formal context with constraints.
Chapter
Classification of time series signals has become an important construct and has many practical applications. With existing classifiers, we may be able to classify signals accurately; however, that accuracy may decline when using a reduced number of attributes. Transforming the data and then undertaking a dimensionality reduction may improve the quality of the data analysis, decrease the time required for classification and simplify models. We propose an approach which chooses suitable wavelets to transform the data, then combines the output from these transformations to construct a dataset, and then applies ensemble classifiers to it. We demonstrate this on different data sets across different classifiers and use different evaluation methods. Our experimental results demonstrate the effectiveness of the proposed technique, compared to approaches that use either raw signal data or a single wavelet transform. Keywords: Signal classification; Energy distribution; Wavelets; Ensembles
Article
Class label noise is a critical component of data quality that directly inhibits the predictive performance of machine learning algorithms. While many data-level and algorithm-level methods exist for treating label noise, the challenges associated with big data call for new and improved methods. This survey addresses these concerns by providing an extensive literature review on treating label noise within big data. We begin with an introduction to the class label noise problem and traditional methods for treating label noise. Next, we present 30 methods for treating class label noise in a range of big data contexts, i.e. high volume, high variety, and high velocity problems. The surveyed works include distributed solutions capable of operating on data sets of arbitrary sizes, deep learning techniques for large-scale data sets with limited clean labels, and streaming techniques for detecting class noise in the presence of concept drift. Common trends and best practices are identified in each of these areas, implementation details are reviewed, empirical results are compared across studies when applicable, and references to 17 open-source projects and programming packages are provided. An emphasis on label noise challenges, solutions, and empirical results as they relate to big data distinguishes this work as a unique contribution that will inspire future research and guide machine learning practitioners.
Conference Paper
In this paper, we propose a modified version of the Naïve Bayes Style Possibilistic Classifier (NBSPC), which has already been suggested for making decisions from the categorical and subjective medical information included in the lymphography dataset of the University of California Irvine (UCI). Like the former NBSPC, the modified classifier combines the structure of the Naïve Bayes Classifier (NBC), a good classifier for discrete features, with possibility theory, a powerful framework for belief estimation from subjective data. However, unlike the former NBSPC, which uses the minimum as a fusion operator, the proposed classifier fuses possibilistic beliefs using the generalized minimum-based algorithm, which has recently been proposed to deal with heterogeneous medical data. Experimental evaluations on the lymphography dataset show that the proposed G-Min-based NBSPC outperforms the former NBSPC as well as the main classification techniques which have been used in related work.
Chapter
The main aim of the article is to compare the results obtained using four different methods of combining classifiers in a dispersed decision-making system. In the article the following fusion methods are used: the majority vote, the weighted majority vote, the Borda count method and the highest rank method. Two of these methods are used when the individual classifier generates a class label, and two are used when the individual classifier produces a ranking of classes instead of a unique class choice. All of these methods were tested in a situation where we have access to data from the medical field and these data are in a dispersed form. The use of dispersed medical data is very important because it is a common situation that medical data from one domain are collected in many different medical centers. It would be good to be able to use all this accumulated knowledge at the same time.
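For reference, here are minimal sketches of the four fusion rules compared in the chapter. The first two assume each local classifier outputs a class label; the last two assume it outputs a complete ranking of the class labels, best first. The function names are ours.

    from collections import Counter

    def majority_vote(labels):
        return Counter(labels).most_common(1)[0][0]

    def weighted_majority_vote(labels, weights):
        score = Counter()
        for lab, w in zip(labels, weights):
            score[lab] += w
        return score.most_common(1)[0][0]

    def borda_count(rankings, classes):
        # each ranking awards a class (n_classes - 1 - position) points
        n = len(classes)
        score = {c: 0 for c in classes}
        for r in rankings:
            for pos, c in enumerate(r):
                score[c] += n - 1 - pos
        return max(score, key=score.get)

    def highest_rank(rankings, classes):
        # each class is credited with its best (lowest) position across rankings
        best = {c: min(r.index(c) for r in rankings) for c in classes}
        return min(best, key=best.get)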
Chapter
This chapter gives a brief overview of inductive rule learning and may therefore serve as a guide through the rest of the book. Later chapters will expand upon the material presented here and discuss advanced approaches, whereas this chapter only presents the core concepts. The chapter describes search heuristics and rule quality criteria, the basic covering algorithm, illustrates classification rule learning on simple propositional learning problems, shows how to use the learned rules for classifying new instances, and introduces the basic evaluation criteria and methodology for rule-set evaluation.
Chapter
This chapter introduces three case studies of big data. In particular, the methods and techniques introduced in Chaps. 3, 4, 5 and 6 are evaluated through theoretical analysis and empirical validation using large data sets in terms of accuracy, efficiency and interpretability.
Conference Paper
The article discusses issues related to a decision-making system using dispersed knowledge. In the proposed system, the classification process for a test object can be described as follows. In the first step, we investigate how particular classifiers classify the test object. We describe this using probability vectors over decision classes. We cluster classifiers with respect to similarities of the probability vectors. In the paper a new approach is proposed in which the clustering process consists of two stages, and three types of relations between classifiers are defined: friendship, conflict and neutrality. In the first stage initial groups are created; such a group contains classifiers that are in the friendship relation. In the second stage, classifiers which are in the neutrality relation are attached to the existing groups. In the experiments a situation is considered in which medical data from one domain are collected in many medical centers. We want to use all of the collected data at the same time in order to make global decisions.
Conference Paper
The main aim of the article is to present a decision-making system using dispersed knowledge. The article introduces a system with dynamically generated coalitions. The local knowledge bases, on the basis of which a similar classification for the test object is made, are combined into a coalition. In the proposed system, the classification process can be divided into several steps. In the first step we describe the classification of a test object, made on the basis of each local knowledge base, by probability vectors over decision classes. We cluster local knowledge bases with respect to similarities of the probability vectors. For every cluster, we find a kind of combined information. Finally, we classify the test object using the method for conflict analysis. The main aim of the paper is to present the results of experiments on medical data. In the experiments a situation is considered in which medical data from one domain are collected in many medical centers. We want to use all of the collected data at the same time in order to make global decisions.
Conference Paper
This paper investigates a Naïve Bayes Style Possibilistic Classifier (NBSPC) for making decisions from the categorical and subjective medical information included in the lymphography dataset of the University of California Irvine (UCI). The main focus of the work is to improve the classification accuracy. NBSPC simultaneously relies on the structure of the Naïve Bayes classifier, a good classifier for categorical features, and on possibility theory, an interesting framework to model and fuse subjective medical data. Possibilistic measures are estimated within the NBSPC using maximum likelihood estimation and then the probability-possibility transformation method of Dubois et al. Results show that the proposed classifier outperforms other classification techniques which have already been evaluated on the same data.
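The min-based fusion step at the heart of this classifier can be sketched as follows; the per-feature possibility tables (obtained, per the abstract, from maximum likelihood estimates via the Dubois et al. probability-possibility transform) are assumed to be given, and the data layout is illustrative.

    def classify(instance, classes, poss):
        """instance: tuple of categorical values; poss[c][j][v] is the possibility
        of value v for feature j given class c. The minimum t-norm fuses the
        per-feature degrees; the class with the highest fused possibility wins."""
        fused = {c: min(poss[c][j][v] for j, v in enumerate(instance)) for c in classes}
        return max(fused, key=fused.get)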
Article
Full-text available
In this paper we are concerned with the problem of acquiring knowledge by integration. Our aim is to construct an integrated knowledge base from several separate sources. The need to merge knowledge bases can arise, for example, when knowledge bases are acquired independently from interactions with several domain experts. As opinions of different domain experts may differ, the knowledge bases constructed in this way will normally differ too. A similar problem can also arise whenever separate knowledge bases are generated by learning algorithms. The objective of integration is to construct one system that exploits all the knowledge that is available and has a good performance. The aim of this paper is to discuss the methodology of knowledge integration, describe the implemented system (INTEG.3), and present some concrete results which demonstrate the advantages of this method.
Article
The effects of the number of attributes used to describe the data set, the number of examples included in the training set, and a post-pruning mechanism on the predictive power of classification rules for the automatic assignment of river water pollution levels were studied. In the induction experiments, the original ID3 algorithm embedded in the Knowledge Maker environment was extended with a post-pruning mechanism. In order to facilitate the evaluation of the developed classification rules, the algorithm of Reingold and Tilford for tidier drawing of trees was implemented. The results showed that efficient classification rules, in comparison with experts' class assignments, can already be derived from 500 examples of baseline data, each example being described by 5 attributes.
Article
Kernel methods, as alternatives to component analysis, are mathematical tools that provide a higher-dimensional representation for feature recognition and image analysis problems. In machine learning, the kernel trick is a method for converting a linear classification learning algorithm into a non-linear one, by mapping the original observations into a higher-dimensional space so that the use of a linear classifier in the new space is equivalent to a non-linear classifier in the original space. In this dissertation we present the performance results of several continuous distribution function kernels, lattice oscillation model kernels, Kelvin function kernels, and orthogonal polynomial kernels on select benchmarking databases. In addition, we develop methods to analyze the use of these kernels for projection analysis applications: principal component analysis, independent component analysis, and optimal projection analysis. We compare the performance results with known kernel methods on several benchmarks. Empirical results show that several of these kernels outperform other previously suggested kernels on these data sets. Additionally, we develop a genetic algorithm-based kernel optimal projection analysis method which, through extensive testing, demonstrates a ten percent average improvement in performance on all data sets over the kernel principal component analysis projection. We also compare our kernel methods for kernel eigenface representations with previous techniques. Finally, we analyze the benchmark databases used here to determine whether we can aid in the selection of a particular kernel that would perform optimally based on the statistical characteristics of each database.
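A small illustration of the kernel trick as described above: a linear algorithm (here scikit-learn's SVC fed a precomputed Gram matrix) becomes non-linear in the original space once dot products are replaced by a kernel function. The Gaussian (RBF) kernel is used as the example; none of this is specific to the dissertation's proposed kernels.

    import numpy as np
    from sklearn.svm import SVC

    def rbf_gram(A, B, gamma=0.5):
        # K[i, j] = exp(-gamma * ||A_i - B_j||^2): an implicit high-dimensional map
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def fit_predict(X_train, y_train, X_test):
        clf = SVC(kernel="precomputed").fit(rbf_gram(X_train, X_train), y_train)
        return clf.predict(rbf_gram(X_test, X_train))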
Article
In this article, the properties and usability of a combined operator based on t-norms and t-conorms are studied in medical data classification. It is shown how this operator can be weighted with differential evolution and aggregated with a generalized mean, and what kinds of measures for comparison can be achieved from this procedure. New operators suitable for comparison measures are suggested, namely combination measures based on the use of Frank- and Yu-type t-norms and t-conorms.
Article
Since Asperger's Syndrome was formally recognised in 1994, several novels featuring characters with the syndrome have appeared. Bill Greenwell's article discusses these books in providing a context for a closer consideration of the British publishing sensation of 2003, Mark Haddon's 'The Curious Incident of the Dog in the Night-Time'. The reasons for the success of this suburban comedy, Greenwell argues, include the consequences for the reader of Haddon's choice of the sufferer from Asperger's as narrator, especially the generation of unconscious humour and the range of literary forms he uses to tell his story. Nicholas Tucker adds an Afterword from his perspective as an educational psychologist as well as a literary critic, finding in the novel a rich mixture of heroism, mystery and love mediated through narrative ingenuity.
Article
Machine learning is concerned with enabling computer programs automatically to improve their performance at some tasks through experience. Manufacturing is an area where the application of machine learning can be very fruitful. However, little has been published about the use of machine-learning techniques in the manufacturing domain. This paper evaluates several machine-learning techniques and examines applications in which they have been successfully deployed. Special attention is given to inductive learning, which is among the most mature of the machine-learning approaches currently available. Current trends and recent developments in machine-learning research are also discussed. The paper concludes with a summary of some of the key research issues in machine learning.
Article
One subfield of machine learning is the induction of a representation of a concept from positive and negative examples of the concept. Given a set of training examples, the goal of the inductive system is to create a description capable of classifying the training examples, yet general enough to accurately predict the classification of unseen examples. Often the original attributes describing the instances are inadequate to capture important regularities in the concept. New descriptors, constructed through the application of operators to the original attributes, can provide the proper vocabulary to create concise concept representations at the right level of generalization to be highly predictive. Constructive induction is the process of generating and applying new descriptors during inductive learning. The large number of possible constructive operators and combinations of attributes defines an enormous search space for the inductive process. Knowledge about the concept or problem domain can be used to guide the construction of new descriptors. This thesis lays the foundation of opportunistic constructive induction in the context of decision-tree assembly, providing a framework for dynamically applying fragments of knowledge to produce potentially useful descriptors or hypotheses.
Article
Many existing inductive learning systems have been developed under the assumption that the learning tasks are performed in a noise-free environment. To cope with most real-world problems, it is important that a learning system be equipped with the capability to handle uncertainty. In this paper, we first identify the various sources of uncertainty that may be encountered in a noisy problem domain. Next, we present a method for the efficient acquisition of classification rules from training instances which may contain inconsistent, incorrect, or missing information. This algorithm consists of three phases: (i) the detection of inherent patterns in a set of noisy training data; (ii) the construction of classification rules based on these patterns; and (iii) the use of these rules to predict the class membership of an object. The method has been implemented in a system known as APACS (automatic pattern analysis and classification system). This system has been tested using both real-life and simulated data, and its performance is found to be superior to many existing systems in terms of efficiency and classification accuracy. Being able to handle uncertainty in the learning process, the proposed algorithm can be employed for applications in real-world problem domains involving noisy data.
Article
Full-text available
Using optimization tools such as genetic algorithms (GAs) to construct a fuzzy expert system (FES) focusing only on its accuracy, without considering comprehensibility, may result in a system that is not easy to understand. To exploit the transparency features of FESs for explanation in higher-level knowledge representation, a FES should provide high comprehensibility while preserving its accuracy. The completeness of fuzzy sets and rule structures should also be considered to guarantee that every data point has a response output. This paper proposes some quantitative measures to determine the degrees of accuracy, comprehensibility, and completeness of FESs. These quantitative measures are then used as a fitness function for a genetic algorithm in optimally constructing a FES.
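An illustrative fitness function in the spirit the abstract describes: a weighted blend of accuracy, comprehensibility and completeness degrees, each scaled to [0, 1]. The component measures and weights here are placeholders, not the paper's exact quantitative measures.

    def fitness(accuracy, n_rules, max_rules, coverage,
                w_acc=0.5, w_compr=0.25, w_compl=0.25):
        # accuracy, coverage in [0, 1]; n_rules <= max_rules assumed
        comprehensibility = 1.0 - n_rules / max_rules  # fewer rules -> easier to grasp
        completeness = coverage                        # fraction of inputs with a firing rule
        return w_acc * accuracy + w_compr * comprehensibility + w_compl * completeness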
Article
Full-text available
In a previous paper (Lopez de Mantaras, 1991), we introduced a new information-theoretic attribute selection method for decision tree induction. This method consists of computing, for each node, a distance between the partition generated by the values of each candidate attribute in the node and the correct partition of the subset of training examples in this node. The chosen attribute is the one whose corresponding partition is closest to the correct partition (i.e. the partition that perfectly classifies the training data). In that paper we also formally proved that this distance is not biased towards attributes with a large number of values in the sense specified by Quinlan (Quinlan, 1986), and we also had some initial experimental evidence that the predictive accuracy of the induced trees was not significantly different from that obtained with the most widely used information-theoretic attribute selection measures, that is, Quinlan's Gain and Quinlan's Gain Ratio. However, it seemed that the distance induced smaller trees, especially when the attributes had different numbers of values. In that paper we could not confirm that the differences were statistically significant due to the small number of experiments we had performed. Now in this paper we report experimental results that allow us to confirm that the distance induces trees whose size, without losing accuracy, is not significantly different from those obtained using Quinlan's Gain but smaller than those obtained with Quinlan's Gain Ratio. These experimental results are supported by a statistical analysis performed using two statistical hypothesis tests: the sign test and the signed rank test.
Article
This paper introduces a new modified approach to instance-based learning theory. Instance-based learning is augmented by neighborhood spheres and multi-pass training to improve both generalization capabilities and storage requirements. Two models for creating neighborhood spheres are investigated and put in perspective with the IBL instance-based learner. The IBL system considered here is based on the proximity algorithm, the growth additive algorithm and a noise-resistant modification of the growth additive algorithm. The experiments described herein address the similarity of the MPIL and the IBL algorithms, but also point out significant differences in the approach to reducing storage requirements and increasing generalization. A time complexity analysis of the proposed multi-pass instance-based learning approach is provided. Several domains are used in this study, including a real-world domain in CMOS wafer fault diagnosis, to allow for a comparison of these two approaches. Finally, the task of knowledge extraction in the form of rules is addressed.
Article
This paper describes FOIL, a system that learns Horn clauses from data expressed as relations. FOIL is based on ideas that have proved effective in attribute-value learning systems, but extends them to a first-order formalism. This new system has been applied successfully to several tasks taken from the machine learning literature.
Article
This paper presents a new approach to the intelligent navigation of a mobile robot. The hybrid control architecture described combines properties of purely reactive and behaviour-based systems, providing the ability both to learn automatically behaviours from inception, and to capture these in a distributed hierarchy of decision tree networks. The robot is first trained in the simplest world which has no obstacles, and is then trained in successively more complex worlds, using the knowledge acquired in the previous worlds. Each world representing the perceptual space is thus directly mapped on a unique rule layer which represents in turn the robot action space encoded in a distinct decision tree. A major advantage of the current implementation, compared with the previous work, is that the generated rules are easily understood by human users. The paper demonstrates that the proposed behavioural decomposition approach provides efficient management of complex knowledge, and that the learning mechanism is able to cope with noise and uncertainty in sensory data.
Chapter
In this paper we briefly survey the problems arising in learning concept descriptions from examples in domains affected by uncertainty and vagueness. A programming environment, called SMART-SHELL, is also presented: it addresses these problems, exploiting fuzzy logic. This is achieved by supplying the learning system with the capability of handling a fuzzy relational database, containing the extensional representation of the acquired logic formulas.
Article
Knowledge-based systems are based on often defective knowledge, whether this knowledge is acquired from experts or learned from examples. This paper presents a strategy designed to cope with defective knowledge: given a set of rules, it builds a similarity function over the work space of the problem. This similarity function, together with a set of examples, then enables case-based reasoning through a K-nearest-neighbour-like process. Compared to other case-based reasoning techniques, the advantage of this approach is the following: the "topology" of the space is automatically induced from the given rules, instead of being explicitly provided (and tuned) by the expert.
Chapter
This chapter provides an introduction to Learning Classifier Systems before reviewing a number of historical uses in data mining. An overview of the rest of the volume is then presented.
Chapter
A series of experiments dealing with the discovery of efficient classification procedures from large numbers of examples is described, with a case study from the chess end game king-rook versus king-knight. After an outline of the inductive inference machinery used, the paper reports on trials leading to correct and very fast attribute-based rules for the relations lost 2-ply and lost 3-ply. On another tack, a model of the performance of an idealized induction system is developed and its somewhat surprising predictions compared with observed results. The paper ends with a description of preliminary work on the automatic specification of relevant attributes.
Article
Inductively derived decision rules which correctly classify all legal Black-to-move positions in the king and pawn versus king endgame are derived using a technique of ‘structured’ induction.
Article
The problem of formulating general concepts from specific training examples has long been a major focus of machine learning research. While most previous research has focused on empirical methods for generalizing from a large number of training examples using no domain-specific knowledge, in the past few years new methods have been developed for applying domain-specific knowledge to formulate valid generalizations from single training examples. The characteristic common to these methods is that their ability to generalize from a single example follows from their ability to explain why the training example is a member of the concept being learned. This paper proposes a general, domain-independent mechanism, called EBG, that unifies previous approaches to explanation-based generalization. The EBG method is illustrated in the context of several example problems, and used to contrast several existing systems for explanation-based generalization. The perspective on explanation-based generalization afforded by this general method is also used to identify open research problems in this area.
Article
Typescript. Thesis (M.S.)--University of Illinois at Urbana-Champaign, 1984. Includes bibliographical references (leaves 105-106).
Kononenko I., Bratko I., Roskar E. (1984) Experiments in automatic learning of medical diagnostic rules. Presented at the International School for the Synthesis of Expert Knowledge Workshop 1984, Bled, Yugoslavia. Also published as a Technical Report, Faculty of Electrical Engineering, E. Kardelj University, Ljubljana, Yugoslavia, 1984.
Wald A. (1947) Sequential Analysis. New York: Wiley.
Kodratoff Y., Tecuci G. (1986) Rule Learning in DISCIPLE. Proceedings of EWSL 1986. Orsay: Université de Paris-Sud.
Gascuel O. (1986) PLAGE: A way to give and use knowledge in learning. Proceedings of EWSL 1986. Orsay: Université de Paris-Sud.
Haiech J., Quinqueton J., Sallantin J. (1986) Concept formation from sequential data. Proceedings of EWSL 1986. Orsay: Université de Paris-Sud.
Quinlan J. (1986) Learning from noisy data. In Machine Learning vol. 2, ed. R. Michalski, J. Carbonell and T. Mitchell. Palo Alto, CA: Tioga.
Niblett T., Bratko I. (1986) Learning decision rules in noisy domains. Presented at Expert Systems 86, Brighton, 15-18 Dec., and published in the Conference Proceedings.