Figure 3: Backus–Naur form grammar that generates classifiers

Source publication
Conference Paper
Utilising the expressive power of S-Expressions in Learning Classifier Systems often prohibitively increases the search space, due to the increased flexibility over the ternary alphabet. Selection of appropriate S-Expression functions through domain knowledge improves scaling, as expected. Considering the Cognitive Systems roots, abstraction was includ...

Context in source publication

Context 1
... episodes were learnt as in a classical XCS to produce a maximally general, accurate population, termed the Optimum Population Set (OPS). This OPS then became the environment, where each generalised rule was a message passed to the S-Expression-based XCS, termed S-XCS. This could be contrasted with a single-population version (without OPS) that learnt directly from the raw environmental messages, again utilising the S-Expression alphabet, in order to discover any benefits of abstraction. Following the S-Expression paradigm [7][8], the classifiers' building blocks consist of either non-terminals (i.e. functions) or terminals (i.e. variables or constants), see figure 3. These can be tailored to the domain, e.g. VALUEAT, which returns the bit value at a referenced position, and ADDROF, which returns the integer counterpart of a string of bits. Tailored functions rely upon domain knowledge, which reduces the applicability of the technique unless they are used together with other general functions.

• AND, OR, NOT – Binary functions; the first two receive two arguments and the last only one. Arguments can be either binary values taken directly from the leaf nodes of the condition trees, or integers (values >1 are treated as the binary 'true' or '1') when the functions are located in the middle of the trees.

• PLUS, MINUS, MULTIPLY, DIVIDE, POWEROF – Arithmetic functions; all receive two arguments apart from the final one, which receives only one. These arguments can be either binary values or integers, as above. The POWEROF function treats its argument as an exponent of base two.

• VALUEAT, ADDROF – Domain-specific functions; VALUEAT receives one argument (a pointer to an address, capped to the string length) and returns the environmental 'value' at the referenced address. ADDROF receives two arguments marking the beginning and end (as positions in the environmental string) of the binary string to be converted to an integer. The ordering of the two arguments does not matter, as the lowest index is always treated as the least significant bit and the highest index as the most significant bit.

There is no match set when using S-Expressions, since the classifiers represent a complete solution by using variables rather than predetermined bit strings with fixed actions. The actions are therefore computed (similar to piece-wise linear approximators). It is noted that VALUEAT points to a value and that it is impossible to consistently point to an incorrect value, which does not favour the low-prediction classifiers that occur in a classical XCS. Other methods were also altered appropriately, e.g. classifier Covering, or removed completely, e.g. Subsumption Deletion. Covering was triggered when the average fitness in an action set fell below 0.5, and created a random valid tree. In initial tests numerosity tended to be small due to the diversity of the S-Expressions, which tended to distort fitness measurements. Thus numerosity was removed, and absolute accuracy was used as fitness instead of relative accuracy. The update function, including accuracy-based fitness and action selection in exploit trials, is the same as in classical XCS. The first system created was S_XCS, which uses only the three binary functions (AND, OR, NOT); its Covering procedure created one classifier per invocation. The second system, S_XCS1, was created in order to investigate the effect of tailored building blocks (e.g. ADDROF, VALUEAT) and to determine whether an increased variety of functions causes bloating or prevents learning.
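The function and terminal set above lends itself to a compact interpreter. The following is a minimal Python sketch assuming a nested-tuple encoding of the condition trees; the dispatch structure, the division and negative-exponent guards, and all names are illustrative assumptions rather than the authors' implementation.

```python
# Minimal evaluator sketch for the S-expression building blocks listed
# above. Expressions are nested tuples, e.g. ("AND", 1, ("NOT", 0)); the
# representation and guard choices are assumptions, not the paper's code.

def evaluate(expr, env):
    """Evaluate an S-expression against a binary environmental string `env`
    (a sequence of 0/1 ints); returns an int."""
    if isinstance(expr, int):                  # terminal: constant
        return expr
    op, *args = expr
    vals = [evaluate(a, env) for a in args]

    def truthy(v):                             # values > 1 count as binary 'true'
        return v >= 1

    if op == "AND":
        return int(truthy(vals[0]) and truthy(vals[1]))
    if op == "OR":
        return int(truthy(vals[0]) or truthy(vals[1]))
    if op == "NOT":
        return int(not truthy(vals[0]))
    if op == "PLUS":
        return vals[0] + vals[1]
    if op == "MINUS":
        return vals[0] - vals[1]
    if op == "MULTIPLY":
        return vals[0] * vals[1]
    if op == "DIVIDE":
        return vals[0] // vals[1] if vals[1] else 0   # zero guard: assumption
    if op == "POWEROF":                        # single argument: base-two exponent
        return 2 ** max(vals[0], 0)            # negative guard: assumption
    if op == "VALUEAT":                        # pointer capped to the string length
        return env[max(0, min(vals[0], len(env) - 1))]
    if op == "ADDROF":                         # bits between two positions;
        lo, hi = sorted(vals[:2])              # lowest index is the LSB
        hi = min(hi, len(env) - 1)
        return sum(env[k] << (k - lo) for k in range(lo, hi + 1))
    raise ValueError(f"unknown function: {op}")
```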
System parameters are: learning rate β=0.2, initial error ε=0.5, accuracy parameters α=0.1, ε₀=10, ν=5, GA threshold θ_GA=25, deletion threshold θ_del=20, mutation rate μ=0.01, τ=0.2, crossover probability χ=1, population size N=400 [2]. S_XCS discovers the Disjunctive Normal Form of the 3-MUX problem (see figure 4 and table 1), but fails to scale. S_XCS1, however, does scale, see figures 5 and 6. Provided sufficient time is allowed, the system will discover the most compact functions suited to the domain, see table 2. This time may be greatly reduced by the use of abstraction, where generalisation is accomplished using the ternary alphabet (where possible) and semantic knowledge is then gained by training on the optimum set, see figure 7. The OPS cannot be learnt practically for >135 MUX, so the known ternary solution was used in figure 7 and table 3, which therefore shows only the potential scalability of S-Expressions. The best discovered solutions are shown in tables 2 and 3 (highest fitness in bold). Unnecessary functions, such as POWEROF, are more likely to appear without abstraction. Using S_XCS1 with a limited set of the most suited functions (discovered through abstraction) did not provide a significant advantage, comparing figure 8 with figure 4. Only the hypothetical case of figure 7 demonstrates that optimum performance is possible as the domain scales. XCS scales well on the MUX problems as the learning is of polynomial complexity [9]. However, XCS still struggles to solve the 135-MUX problem, whereas humans are capable of understanding the concept behind all MUX problems. The S_XCS system does exhibit potential in developing solutions that resemble the Disjunctive or Conjunctive Normal Form of the k-MUX problem, but it did not scale well. The reason was that no partially correct classifiers entered the population, rendering the Genetic Algorithm redundant, so only Covering could create a correct classifier. The problem with competent classifiers not entering the population was that the environmental variables changed at every epoch and the functions (AND, OR, ...
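To make the reported 3-MUX result concrete, the sketch below writes down a Disjunctive Normal Form of the 3-bit multiplexer using the function set above and checks it exhaustively with the evaluator from the previous sketch. Treating bit 0 as the address bit that selects between bits 1 and 2 is an assumed convention and may differ from the paper's indexing.

```python
# Hypothetical DNF of the 3-bit multiplexer in the tuple encoding above:
# output = (NOT address) AND data0  OR  address AND data1.
mux3_dnf = ("OR",
            ("AND", ("NOT", ("VALUEAT", 0)), ("VALUEAT", 1)),
            ("AND", ("VALUEAT", 0), ("VALUEAT", 2)))

# Exhaustive check over all 2^3 = 8 environmental strings.
for n in range(8):
    env = [(n >> k) & 1 for k in range(3)]   # env[0] is the assumed address bit
    assert evaluate(mux3_dnf, env) == env[1 + env[0]]
```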

Similar publications

Chapter
Effective formulation of control strategies requires knowledge of the responsiveness of ozone to emissions of its two main precursors, nitrogen oxides (NOx) and volatile organic compounds (VOC). While responsiveness depends nonlinearly on an array of spatially and temporally variable factors, a large body of research has sought to classify ozone fo...

Citations

... Thus, it inhibits XCS from describing complex patterns in hierarchical problems. Fortunately, XCS allows encoding its rules using rich representations, such as tree-based programs [1,13,16]. ...
Preprint
Multitask Learning is a learning paradigm that deals with multiple different tasks in parallel and transfers knowledge among them. XOF, a Learning Classifier System using tree-based programs to encode building blocks (meta-features), constructs and collects features with rich discriminative information for classification tasks in an observed list. This paper seeks to facilitate the automation of feature transferring in between tasks by utilising the observed list. We hypothesise that the best discriminative features of a classification task carry its characteristics. Therefore, the relatedness between any two tasks can be estimated by comparing their most appropriate patterns. We propose a multiple-XOF system, called mXOF, that can dynamically adapt feature transfer among XOFs. This system utilises the observed list to estimate the task relatedness. This method enables the automation of transferring features. In terms of knowledge discovery, the resemblance estimation provides insightful relations among multiple data. We experimented with mXOF on various scenarios, e.g. representative Hierarchical Boolean problems, classification of distinct classes in the UCI Zoo dataset, and unrelated tasks, to validate its abilities of automatic knowledge-transfer and estimating task relatedness. Results show that mXOF can estimate the relatedness between multiple tasks reasonably well, aiding learning performance through dynamic feature transferring.
... LCSs need to be extended with rich encodings for the system rules to cope with complex problems. Among them, the direction of using tree-based programs has been investigated in various research [1,5,15]. This is of interest because they can cover complex sets of data instances with readability. ...
Conference Paper
In complex classification problems, constructed features with rich discriminative information can simplify decision boundaries. Code Fragments (CFs) produce GP-tree-like constructed features that can represent decision boundaries effectively in Learning Classifier Systems (LCSs). But the search space for useful CFs is vast due to this richness in boundary creation, which is impractical. Online Feature-generation (OF) improves the search of useful CFs by growing promising CFs from a dynamic list of preferable CFs based on the ability to produce accurate and generalised, i.e. high-fitness, classifiers. However, the previous preference for high-numerosity CFs did not encapsulate information about the applicability of CFs directly. Consequently, learning performances of OF with an accuracy-based LCS (termed XOF) struggled to progress in the final learning phase. The hypothesis is that estimating the CF-fitness of CFs based on classifier fitness will aid the search for useful constructed features. This is anticipated to drive the search of decision boundaries efficiently, and thereby improve learning performances. Experiments on large-scale and hierarchical Boolean problems show that the proposed systems learn faster than traditional LCSs regarding the number of generations and time consumption. Tests on real-world datasets demonstrate its capability to find readable and useful features to solve practical problems.
... The multiplexer problem is a complex and difficult problem due to epistasis and its large search space. An early attempt at scaling was the S-XCS system that utilizes optimal populations of rules, which are learned in the same way as classical XCS [12]. These optimal rules are then fed to S-XCS as messages thus enabling abstraction. ...
... These optimal rules are then fed to S-XCS as messages thus enabling abstraction. The system uses human constructed functions such as Multiply, Divide, PowerOf, ValueAt, AddrOf, among others [12]. Although these key functions provide the system with the building blocks to piece together the necessary knowledge blocks, they have an inherent bias and might not be available to the system in large problem domains. ...
Conference Paper
Learning classifier systems (LCSs) originated from artificial cognitive systems research, but migrated such that LCSs became powerful classification techniques. Modern LCSs can be used to extract building blocks of knowledge in order to solve more difficult problems in the same or a related domain. Past work showed that the reuse of knowledge through the adoption of code fragments, GP-like sub-trees, into the XCS learning classifier system framework could provide advances in scaling. However, unless the pattern underlying the complete domain can be described by the selected LCS representation of the problem, a limit of scaling will eventually be reached. This is due to LCSs' 'divide and conquer' approach utilizing rule-based solutions, which entails an increasing number of rules (subclauses) to describe a problem as it scales. Inspired by human problem-solving abilities, the novel work in this paper seeks to reuse learned knowledge and learned functionality to scale to complex problems by transferring them from simpler problems. Progress is demonstrated on the benchmark Multiplexer (Mux) domain, although the developed approach is applicable to other scalable domains. The fundamental axioms necessary for learning are proposed. The methods for transfer learning in LCSs are developed. Also, learning is recast as a decomposition into a series of sub-problems. Results show that from a conventional tabula rasa, with only a vague notion of what subordinate problems might be relevant, it is possible to learn a general solution to any n-bit Mux problem for the first time. This is verified by tests on the 264-, 521- and 1034-bit Mux problems.
... Furthermore, increased dimensionality of the problem, resulting in increased search space, demands large memory space and leads to much longer training times, and eventually restricts LCS to a limit in problem size. By explicitly feeding the domain knowledge to an LCS, scalability can be achieved but it adds bias and restricts use in multiple domains [3]. ...
... The proposed system will be tested on four different Boolean problem domains: 1) multiplexer, 2) majority-on, 3) carry, and 4) even-parity problems. The multiplexer domain is a multimodal and epistatic problem domain. ...
... XCS is a formulation of LCS that uses accuracy-based fitness to learn the problem by forming a complete mapping of states and actions to rewards. In XCS, the learning agent evolves a population [P] of classifiers, where each classifier consists of a rule and a set of associated parameters estimating the quality of the rule. Each rule is of the form if condition then action, having two parts: a condition and the corresponding action. ...
Article
Evolutionary computation techniques have had limited capabilities in solving large-scale problems due to the large search space demanding large memory and much longer training times. In the work presented here, a genetic-programming-like rich encoding scheme has been constructed to identify building blocks of knowledge in a learning classifier system. The fitter building blocks from the learning system trained against smaller problems have been utilized in a higher complexity problem in the domain to achieve scalable learning. The proposed system has been examined and evaluated on four different Boolean problem domains: 1) multiplexer, 2) majority-on, 3) carry, and 4) even-parity problems. The major contribution of this paper is to successfully extract useful building blocks from smaller problems and reuse them to learn more complex large-scale problems in the domain; e.g. the 135-bit multiplexer problem, where the number of possible instances is $2^{135} \approx 4 \times 10^{40}$, is solved by reusing the extracted knowledge from the learned lower-level solutions in the domain. Autonomous scaling is, for the first time, shown to be possible in learning classifier systems. It improves effectiveness and reduces the number of training instances required in large problems, but requires more time due to its sequential build-up of knowledge.
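The citing snippets above describe the classical XCS classifier structure: an if-condition-then-action rule plus parameters estimating its quality. That structure can be written down compactly; the Python sketch below uses the conventional XCS parameter names (prediction, error, fitness, numerosity) and a ternary-string condition, which are assumptions drawn from standard XCS descriptions rather than from the cited article.

```python
# A minimal sketch of a classical XCS classifier: a ternary condition,
# an action, and quality-estimate parameters. Field names follow common
# XCS conventions and are assumptions here.
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str         # e.g. "01#1#0", where '#' matches either bit
    action: int            # the advocated action
    prediction: float      # estimated payoff
    error: float           # estimated prediction error
    fitness: float         # accuracy-based fitness
    numerosity: int = 1    # micro-classifiers this macro-classifier represents

    def matches(self, state: str) -> bool:
        """The condition matches when every non-'#' symbol equals the state bit."""
        return all(c in ("#", s) for c, s in zip(self.condition, state))
```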
... ough the course of evolution as the system constructs the required building blocks to find solutions. However, when logical disjunctions are involved, optimality is unattainable because the symbolic conditions highly overlap, resulting in classifiers sharing their fitness with other classifiers and thereby lowering the fitness values (Lanzi, 2007). Ioannides and Browne (2008) later extended this approach to also include arithmetic functions (i.e., PLUS, MINUS, MULTIPLY, DIVIDE, and POWEROF) as well as domain specific functions (i.e., VALUEAT and ADDROF) to solve a number of Multiplexer problems. ...
Article
A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to artificial neural networks. This paper presents results from an investigation into using a temporally dynamic symbolic representation within the XCSF learning classifier system. In particular, dynamical arithmetic networks are used to represent the traditional condition-action production system rules to solve continuous-valued reinforcement learning problems and to perform symbolic regression, finding competitive performance with traditional genetic programming on a number of composite polynomial tasks. In addition, the network outputs are later repeatedly sampled at varying temporal intervals to perform multistep-ahead predictions of a financial time series.
... To test the capabilities of the system in real-valued domains, several encodings have been proposed, including intervals (Wilson 2000, 2001), ellipsoids (Butz 2005), and convex hulls (Lanzi and Wilson 2006). Also, there has been some work on representations that use a semantical approach, such as first-order logic (Mellor 2005), messy conditions (Lanzi and Perrucci 1999a) and S-expressions (Browne and Ioannides 2007; Lanzi and Perrucci 1999b). ...
Article
This paper reports an exhaustive analysis performed over two specific Genetics-based Machine Learning systems: BioHEL and GAssist. These two systems share many mechanisms and operators, but at the same time, they apply two different learning paradigms (the Iterative Rule Learning approach and the Pittsburgh approach, respectively). The aim of this paper is to: (a) propose standard configurations for handling small and large datasets, (b) compare the two systems in terms of learning capabilities, complexity of the obtained solutions and learning time, (c) determine the areas of the problem space where each one of these two systems performs better, and (d) compare them with other well-known machine learning algorithms. The results show that it is possible to find standard configurations for both systems. With these configurations the systems perform up to the standards of other state-of-the-art machine learning algorithms such as Support Vector Machines. Moreover, we identify the problem domains where each one of these systems has advantages and disadvantages, and propose ways to improve the systems based on this analysis.
... Furthermore, we could even construct a tree-like program structure from the rules, since each of them corresponds to a single if statement in a normal programming language. There are similarities between our RBGP and some special types of LCS, like Browne's abstracted LCS [13] and S-expression-based LCS [41]. The two most fundamental differences lie in the semantics of both the rules and the approach: in RBGP, a rule may directly manipulate symbols and invoke external procedures with (at most) two in/out arguments. ...
Conference Paper
In this paper we introduce a new approach for genetic programming, called rule-based genetic programming, or RBGP for short. A program evolved in the RBGP syntax is a list of rules. Each rule consists of two conditions, combined with a logical operator, and an action part. Such rules are independent of each other in terms of position (mostly) and cardinality (always). This reduces epistasis drastically and hence the genetic reproduction operations are much more likely to produce good results than in other Genetic Programming methodologies. In order to verify the utility of our idea, we apply RBGP to a hard problem in distributed systems. With it, we are able to obtain emergent algorithms for mutual exclusion at a distributed critical section.
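Taking the abstract's description literally, an RBGP rule couples two conditions through a logical operator and fires an action. Below is a toy sketch of that shape only; the state representation and every name are assumed for illustration, since the authors' actual encoding operates on symbols and external procedures.

```python
# A toy rendering of the rule shape described in the RBGP abstract:
# two conditions joined by a logical operator, plus an action part.
# Representation and names are assumptions, not the authors' encoding.
from dataclasses import dataclass
from typing import Callable
import operator

@dataclass
class RBGPRule:
    cond_a: Callable[[dict], bool]      # first condition over program state
    op: Callable[[bool, bool], bool]    # logical combinator, e.g. and/or
    cond_b: Callable[[dict], bool]      # second condition
    action: Callable[[dict], None]      # action executed when the rule fires

    def step(self, state: dict) -> None:
        if self.op(self.cond_a(state), self.cond_b(state)):
            self.action(state)

# Example: "if x > 0 and flag then increment y" as one position-independent rule.
rule = RBGPRule(lambda s: s["x"] > 0, operator.and_,
                lambda s: s["flag"],
                lambda s: s.update(y=s["y"] + 1))
```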
... Recent years have seen an explosion in the quantity and diversity of LCS research. Advances have been made on various frontiers, including different condition representations beyond the traditional binary/ternary rules (rules for continuous attributes [80], hyperellipsoids [28], representations based on S-expressions [78, 21], etc.), other problem classes (function approximation tasks [76, 86], clustering [109]), smarter exploration mechanisms [36, 84, 10], and various theoretical advances [34, 26, 91, 94]. The main meeting point of the LCS community, the International Workshop on Learning Classifier Systems, celebrated its 10th edition in 2007. ...
... Other alternatives are using rule representations based on fuzzy logic [39], decision trees and synthetic instances used as the core of a nearest neighbor classifier [81], or hyperellipsoid conditions [28, 35]. Another kind of representation advance is the use of symbolic expressions to define classifier conditions [77, 2, 78, 21]. This kind of representation may be the most flexible one, in the sense that it can specify the most diverse types of problem subspaces. ...
Conference Paper
Over recent years, research on Learning Classifier Systems (LCSs) has become more and more pronounced and diverse. There have been significant advances of the LCS field on various fronts, including system understanding, representations, computational models, and successful applications. In comparison to other machine learning techniques, the advantages of LCSs have become more pronounced: (1) rule comprehensibility and thus knowledge extraction is straightforward; (2) online learning is possible; (3) local minima are avoided due to the evolutionary learning component; (4) distributed solution representations evolve; and (5) larger problem domains can be handled. After the tenth edition of the International Workshop on LCSs, more than ever before, we are looking towards an exciting future. More diverse and challenging applications, efficiency enhancements, studies of dynamical systems, and applications to cognitive control approaches appear imminent. The aim of this paper is to provide a look back at the LCS field, whereby we place our emphasis on the recent advances. Moreover, we take a glimpse ahead by discussing future challenges and opportunities for successful system applications in various domains.
Article
Human intelligence can simultaneously process many tasks with the ability to accumulate and reuse knowledge. Recent advances in artificial intelligence, such as Transfer, Multitask and Layered Learning, seek to replicate these abilities. However, humans must specify the task order, which is often difficult, particularly with uncertain domain knowledge. This work introduces a Continual-learning system (ConCS) such that, given an open-ended set of problems, once each is solved its solution can contribute to solving further problems. The hypothesis is that the Evolutionary Computation approach of Learning Classifier Systems (LCSs) can form this system due to its niched, cooperative rules. A collaboration of parallel LCSs identifies sets of patterns linking features to classes that can be reused in related problems automatically. Results from distinct Boolean and integer classification problems, with varying interrelations, show that by combining knowledge from simple problems, complex problems can be solved at increasing scales. 100% accuracy is achieved for the problems tested regardless of the order of task presentation. This includes problems intractable for previous approaches, e.g. n-bit Majority-on. A major contribution is that human guidance is now unnecessary to determine the task learning order. Furthermore, the system automatically generates the curricula for learning the most difficult tasks.