Figure 3: Backus–Naur form grammar that generates classifiers

Source publication
Conference Paper
Utilising the expressive power of S-Expressions in Learning Classifier Systems often prohibitively increases the search space, due to the increased flexibility over the ternary alphabet. Selection of appropriate S-Expression functions through domain knowledge improves scaling, as expected. Considering the Cognitive Systems roots, abstraction was includ...

Context in source publication

Context 1
... episodes were learnt as in a classical XCS to produce a maximally general, accurate population, termed the Optimum Population Set (OPS). This OPS then became the environment, where each generalised rule was a message passed to the S-Expression-based XCS, termed S-XCS. This could be contrasted with a single-population version (without OPS) that learnt directly from the raw environmental messages, again utilising the S-Expression alphabet, in order to discover any benefits of abstraction. Following the S-Expression paradigm [7][8], the classifiers' building blocks consist of either non-terminals (i.e. functions) or terminals (i.e. variables or constants), see figure 3. These can be tailored to the domain, e.g. VALUEAT, which returns the bit value at a referenced position, and ADDROF, which returns the integer counterpart of a string of bits. Tailored functions rely upon domain knowledge, which reduces the applicability of the technique unless they are used together with other general functions.

• AND, OR, NOT – Binary functions; the first two receive two arguments and the last only one. Arguments can be either binary values taken directly from the leaf nodes of the condition trees, or integers (values >1 are treated as the binary 'true' or '1') when the functions are located in the middle of the trees.

• PLUS, MINUS, MULTIPLY, DIVIDE, POWEROF – Arithmetic functions; all receive two arguments apart from the final one, which receives only one. These arguments can be either binary values or integers, as above. The POWEROF function treats its argument as an exponent of base two.

• VALUEAT, ADDROF – Domain-specific functions; VALUEAT receives one argument (a pointer to an address, capped to the string length) and returns the environmental 'value' at the referenced address. ADDROF receives two arguments marking the beginning and end (as positions in the environmental string) of the binary string to be converted to an integer. The ordering of the two arguments does not matter, as the lowest index is always treated as the least significant bit and the highest index as the most significant bit.

There is no match set when using S-Expressions, since the classifiers represent a complete solution by using variables rather than predetermined bit strings with fixed actions. The actions are therefore computed (similar to piece-wise linear approximators). It is noted that VALUEAT points to a value and that it is impossible to consistently point to an incorrect value, which does not favour the low-prediction classifiers that occur in a classical XCS. Other methods were also altered appropriately, e.g. classifier Covering, or removed completely, e.g. Subsumption Deletion. Covering was triggered when the average fitness in an action set fell below 0.5, and created a random valid tree. In initial tests numerosity tended to be small due to the diversity of the S-Expressions, which tended to distort fitness measurements. Thus numerosity was removed, and absolute accuracy was used as fitness instead of relative accuracy. The update function, including accuracy-based fitness and action selection in exploit trials, is the same as in classical XCS. The first system created was S_XCS, which uses only the three binary functions (AND, OR, NOT); its Covering procedure created one classifier per invocation. The second system, S_XCS1, was created in order to investigate the effect of tailored building blocks (e.g. ADDROF, VALUEAT) and to determine whether an increased variety of functions causes bloating or prevents learning.
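The function and terminal set above lends itself to a compact interpreter. The following is a minimal Python sketch assuming a nested-tuple encoding of the condition trees; the dispatch structure, the division and negative-exponent guards, and all names are illustrative assumptions rather than the authors' implementation.

```python
# Minimal evaluator sketch for the S-expression building blocks listed
# above. Expressions are nested tuples, e.g. ("AND", 1, ("NOT", 0)); the
# representation and guard choices are assumptions, not the paper's code.

def evaluate(expr, env):
    """Evaluate an S-expression against a binary environmental string `env`
    (a sequence of 0/1 ints); returns an int."""
    if isinstance(expr, int):                  # terminal: constant
        return expr
    op, *args = expr
    vals = [evaluate(a, env) for a in args]

    def truthy(v):                             # values > 1 count as binary 'true'
        return v >= 1

    if op == "AND":
        return int(truthy(vals[0]) and truthy(vals[1]))
    if op == "OR":
        return int(truthy(vals[0]) or truthy(vals[1]))
    if op == "NOT":
        return int(not truthy(vals[0]))
    if op == "PLUS":
        return vals[0] + vals[1]
    if op == "MINUS":
        return vals[0] - vals[1]
    if op == "MULTIPLY":
        return vals[0] * vals[1]
    if op == "DIVIDE":
        return vals[0] // vals[1] if vals[1] else 0   # zero guard: assumption
    if op == "POWEROF":                        # single argument: base-two exponent
        return 2 ** max(vals[0], 0)            # negative guard: assumption
    if op == "VALUEAT":                        # pointer capped to the string length
        return env[max(0, min(vals[0], len(env) - 1))]
    if op == "ADDROF":                         # bits between two positions;
        lo, hi = sorted(vals[:2])              # lowest index is the LSB
        hi = min(hi, len(env) - 1)
        return sum(env[k] << (k - lo) for k in range(lo, hi + 1))
    raise ValueError(f"unknown function: {op}")
```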
System parameters are: learning rate β=0.2, initial error ε=0.5, accuracy parameters α=0.1, ε₀=10, ν=5, GA threshold θ_GA=25, deletion threshold θ_del=20, mutation rate μ=0.01, τ=0.2, crossover probability χ=1, population size N=400 [2]. S_XCS discovers the Disjunctive Normal Form of the 3-MUX problem (see figure 4 and table 1), but fails to scale. S_XCS1, however, does scale, see figures 5 and 6. Provided sufficient time is allowed, the system will discover the most compact functions suited to the domain, see table 2. This time may be greatly reduced by the use of abstraction, where generalisation is accomplished using the ternary alphabet (where possible) and semantic knowledge is then gained by training on the optimum set, see figure 7. The OPS cannot be learnt practically for >135 MUX, so the known ternary solution was used in figure 7 and table 3, which therefore shows only the potential scalability of S-Expressions. The best discovered solutions are shown in tables 2 and 3 (highest fitness in bold). Unnecessary functions, such as POWEROF, are more likely to appear without abstraction. Using S_XCS1 with a limited set of the most suited functions (discovered through abstraction) did not provide a significant advantage, comparing figure 8 with figure 4. Only the hypothetical case of figure 7 demonstrates that optimum performance is possible as the domain scales. XCS scales well on the MUX problems as the learning is of polynomial complexity [9]. However, XCS still struggles to solve the 135-MUX problem, whereas humans are capable of understanding the concept behind all MUX problems. The S_XCS system does exhibit potential in developing solutions that resemble the Disjunctive or Conjunctive Normal Form of the k-MUX problem, but it did not scale well. The reason was that no partially correct classifiers entered the population, rendering the Genetic Algorithm redundant, so only Covering could create a correct classifier. The problem with competent classifiers not entering the population was that the environmental variables changed at every epoch and the functions (AND, OR, ...
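To make the reported 3-MUX result concrete, the sketch below writes down a Disjunctive Normal Form of the 3-bit multiplexer using the function set above and checks it exhaustively with the evaluator from the previous sketch. Treating bit 0 as the address bit that selects between bits 1 and 2 is an assumed convention and may differ from the paper's indexing.

```python
# Hypothetical DNF of the 3-bit multiplexer in the tuple encoding above:
# output = (NOT address) AND data0  OR  address AND data1.
mux3_dnf = ("OR",
            ("AND", ("NOT", ("VALUEAT", 0)), ("VALUEAT", 1)),
            ("AND", ("VALUEAT", 0), ("VALUEAT", 2)))

# Exhaustive check over all 2^3 = 8 environmental strings.
for n in range(8):
    env = [(n >> k) & 1 for k in range(3)]   # env[0] is the assumed address bit
    assert evaluate(mux3_dnf, env) == env[1 + env[0]]
```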

Similar publications

Chapter
Effective formulation of control strategies requires knowledge of the responsiveness of ozone to emissions of its two main precursors, nitrogen oxides (NOx) and volatile organic compounds (VOC). While responsiveness depends nonlinearly on an array of spatially and temporally variable factors, a large body of research has sought to classify ozone fo...

Citations

... Thus, it inhibits XCS from describing complex patterns in hierarchical problems. Fortunately, XCS allows encoding its rules using rich representations, such as tree-based programs [1,13,16]. ...
Preprint
Multitask Learning is a learning paradigm that deals with multiple different tasks in parallel and transfers knowledge among them. XOF, a Learning Classifier System using tree-based programs to encode building blocks (meta-features), constructs and collects features with rich discriminative information for classification tasks in an observed list. This paper seeks to facilitate the automation of feature transferring in between tasks by utilising the observed list. We hypothesise that the best discriminative features of a classification task carry its characteristics. Therefore, the relatedness between any two tasks can be estimated by comparing their most appropriate patterns. We propose a multiple-XOF system, called mXOF, that can dynamically adapt feature transfer among XOFs. This system utilises the observed list to estimate the task relatedness. This method enables the automation of transferring features. In terms of knowledge discovery, the resemblance estimation provides insightful relations among multiple data. We experimented with mXOF on various scenarios, e.g. representative Hierarchical Boolean problems, classification of distinct classes in the UCI Zoo dataset, and unrelated tasks, to validate its abilities of automatic knowledge-transfer and estimating task relatedness. Results show that mXOF can estimate the relatedness between multiple tasks reasonably well, aiding learning performance through dynamic feature transferring.
... LCSs need to be extended with rich encodings for the system rules to cope with complex problems. Among them, the direction of using tree-based programs has been investigated in various research [1,5,15]. This is of interest because they can cover complex sets of data instances with readability. ...
Conference Paper
In complex classification problems, constructed features with rich discriminative information can simplify decision boundaries. Code Fragments (CFs) produce GP-tree-like constructed features that can represent decision boundaries effectively in Learning Classifier Systems (LCSs). But the search space for useful CFs is vast due to this richness in boundary creation, which is impractical. Online Feature-generation (OF) improves the search of useful CFs by growing promising CFs from a dynamic list of preferable CFs based on the ability to produce accurate and generalised, i.e. high-fitness, classifiers. However, the previous preference for high-numerosity CFs did not encapsulate information about the applicability of CFs directly. Consequently, learning performances of OF with an accuracy-based LCS (termed XOF) struggled to progress in the final learning phase. The hypothesis is that estimating the CF-fitness of CFs based on classifier fitness will aid the search for useful constructed features. This is anticipated to drive the search of decision boundaries efficiently, and thereby improve learning performances. Experiments on large-scale and hierarchical Boolean problems show that the proposed systems learn faster than traditional LCSs regarding the number of generations and time consumption. Tests on real-world datasets demonstrate its capability to find readable and useful features to solve practical problems.
... The multiplexer problem is a complex and difficult problem due to epistasis and its large search space. An early attempt at scaling was the S-XCS system that utilizes optimal populations of rules, which are learned in the same way as classical XCS [12]. These optimal rules are then fed to S-XCS as messages thus enabling abstraction. ...
... These optimal rules are then fed to S-XCS as messages thus enabling abstraction. The system uses human constructed functions such as Multiply, Divide, PowerOf, ValueAt, AddrOf, among others [12]. Although these key functions provide the system with the building blocks to piece together the necessary knowledge blocks, they have an inherent bias and might not be available to the system in large problem domains. ...
Conference Paper
Learning classifier systems (LCSs) originated from artificial cognitive systems research, but migrated such that LCSs became powerful classification techniques. Modern LCSs can be used to extract building blocks of knowledge in order to solve more difficult problems in the same or a related domain. Past work showed that the reuse of knowledge through the adoption of code fragments, GP-like sub-trees, into the XCS learning classifier system framework could provide advances in scaling. However, unless the pattern underlying the complete domain can be described by the selected LCS representation of the problem, a limit of scaling will eventually be reached. This is due to LCSs' 'divide and conquer' approach utilizing rule-based solutions, which entails an increasing number of rules (subclauses) to describe a problem as it scales. Inspired by human problem-solving abilities, the novel work in this paper seeks to reuse learned knowledge and learned functionality to scale to complex problems by transferring them from simpler problems. Progress is demonstrated on the benchmark Multiplexer (Mux) domain, although the developed approach is applicable to other scalable domains. The fundamental axioms necessary for learning are proposed. The methods for transfer learning in LCSs are developed. Also, learning is recast as a decomposition into a series of sub-problems. Results show that from a conventional tabula rasa, with only a vague notion of what subordinate problems might be relevant, it is possible to learn a general solution to any n-bit Mux problem for the first time. This is verified by tests on the 264-, 521- and 1034-bit Mux problems.
... Furthermore, increased dimensionality of the problem, resulting in increased search space, demands large memory space and leads to much longer training times, and eventually restricts LCS to a limit in problem size. By explicitly feeding the domain knowledge to an LCS, scalability can be achieved but it adds bias and restricts use in multiple domains [3]. ...
... The proposed system will be tested on four different Boolean problem domains: 1) multiplexer, 2) majority-on, 3) carry, and 4) even-parity problems. The multiplexer domain is a multimodal and epistatic problem domain. ...
... XCS is a formulation of LCS that uses accuracy-based fitness to learn the problem by forming a complete mapping of states and actions to rewards. In XCS, the learning agent evolves a population [P] of classifiers, where each classifier consists of a rule and a set of associated parameters estimating the quality of the rule. Each rule is of the form if condition then action, having two parts: a condition and the corresponding action. ...
Article
Evolutionary computation techniques have had limited capabilities in solving large-scale problems due to the large search space demanding large memory and much longer training times. In the work presented here, a genetic-programming-like rich encoding scheme has been constructed to identify building blocks of knowledge in a learning classifier system. The fitter building blocks from the learning system trained against smaller problems have been utilized in a higher complexity problem in the domain to achieve scalable learning. The proposed system has been examined and evaluated on four different Boolean problem domains: 1) multiplexer, 2) majority-on, 3) carry, and 4) even-parity problems. The major contribution of this paper is to successfully extract useful building blocks from smaller problems and reuse them to learn more complex large-scale problems in the domain; e.g. the 135-bit multiplexer problem, where the number of possible instances is $2^{135} \approx 4 \times 10^{40}$, is solved by reusing the extracted knowledge from the learned lower-level solutions in the domain. Autonomous scaling is, for the first time, shown to be possible in learning classifier systems. It improves effectiveness and reduces the number of training instances required in large problems, but requires more time due to its sequential build-up of knowledge.
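The citing snippets above describe the classical XCS classifier structure: an if-condition-then-action rule plus parameters estimating its quality. That structure can be written down compactly; the Python sketch below uses the conventional XCS parameter names (prediction, error, fitness, numerosity) and a ternary-string condition, which are assumptions drawn from standard XCS descriptions rather than from the cited article.

```python
# A minimal sketch of a classical XCS classifier: a ternary condition,
# an action, and quality-estimate parameters. Field names follow common
# XCS conventions and are assumptions here.
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str         # e.g. "01#1#0", where '#' matches either bit
    action: int            # the advocated action
    prediction: float      # estimated payoff
    error: float           # estimated prediction error
    fitness: float         # accuracy-based fitness
    numerosity: int = 1    # micro-classifiers this macro-classifier represents

    def matches(self, state: str) -> bool:
        """The condition matches when every non-'#' symbol equals the state bit."""
        return all(c in ("#", s) for c, s in zip(self.condition, state))
```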
... ough the course of evolution as the system constructs the required building blocks to find solutions. However, when logical disjunctions are involved, optimality is unattainable because the symbolic conditions highly overlap, resulting in classifiers sharing their fitness with other classifiers and thereby lowering the fitness values (Lanzi, 2007). Ioannides and Browne (2008) later extended this approach to also include arithmetic functions (i.e., PLUS, MINUS, MULTIPLY, DIVIDE, and POWEROF) as well as domain specific functions (i.e., VALUEAT and ADDROF) to solve a number of Multiplexer problems. ...
Article
A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to artificial neural networks. This paper presents results from an investigation into using a temporally dynamic symbolic representation within the XCSF learning classifier system. In particular, dynamical arithmetic networks are used to represent the traditional condition-action production system rules to solve continuous-valued reinforcement learning problems and to perform symbolic regression, finding competitive performance with traditional genetic programming on a number of composite polynomial tasks. In addition, the network outputs are later repeatedly sampled at varying temporal intervals to perform multistep-ahead predictions of a financial time series.
... To test the capabilities of the system in real-valued domains, several encodings have been proposed, including intervals (Wilson 2000, 2001), ellipsoids (Butz 2005), and convex hulls (Lanzi and Wilson 2006). Also, there has been some work on representations that use a semantical approach, such as first-order logic (Mellor 2005), messy conditions (Lanzi and Perrucci 1999a) and S-expressions (Browne and Ioannides 2007; Lanzi and Perrucci 1999b). ...
Article
This paper reports an exhaustive analysis performed over two specific Genetics-based Machine Learning systems: BioHEL and GAssist. These two systems share many mechanisms and operators, but at the same time, they apply two different learning paradigms (the Iterative Rule Learning approach and the Pittsburgh approach, respectively). The aim of this paper is to: (a) propose standard configurations for handling small and large datasets, (b) compare the two systems in terms of learning capabilities, complexity of the obtained solutions and learning time, (c) determine the areas of the problem space where each one of these two systems performs better, and (d) compare them with other well-known machine learning algorithms. The results show that it is possible to find standard configurations for both systems. With these configurations the systems perform up to the standards of other state-of-the-art machine learning algorithms such as Support Vector Machines. Moreover, we identify the problem domains where each one of these systems has advantages and disadvantages, and propose ways to improve the systems based on this analysis.
... Furthermore, we could even construct a tree-like program structure from the rules, since each of them corresponds to a single if statement in a normal programming language. There are similarities between our RBGP and some special types of LCS, like Browne's abstracted LCS [13] and S-expression-based LCS [41]. The two most fundamental differences lie in the semantics of both the rules and the approach: in RBGP, a rule may directly manipulate symbols and invoke external procedures with (at most) two in/out arguments. ...
Conference Paper
In this paper we introduce a new approach for genetic programming, called rule-based genetic programming, or RBGP for short. A program evolved in the RBGP syntax is a list of rules. Each rule consists of two conditions, combined with a logical operator, and an action part. Such rules are independent of each other in terms of position (mostly) and cardinality (always). This reduces epistasis drastically and hence the genetic reproduction operations are much more likely to produce good results than in other Genetic Programming methodologies. In order to verify the utility of our idea, we apply RBGP to a hard problem in distributed systems. With it, we are able to obtain emergent algorithms for mutual exclusion at a distributed critical section.
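Taking the abstract's description literally, an RBGP rule couples two conditions through a logical operator and fires an action. Below is a toy sketch of that shape only; the state representation and every name are assumed for illustration, since the authors' actual encoding operates on symbols and external procedures.

```python
# A toy rendering of the rule shape described in the RBGP abstract:
# two conditions joined by a logical operator, plus an action part.
# Representation and names are assumptions, not the authors' encoding.
from dataclasses import dataclass
from typing import Callable
import operator

@dataclass
class RBGPRule:
    cond_a: Callable[[dict], bool]      # first condition over program state
    op: Callable[[bool, bool], bool]    # logical combinator, e.g. and/or
    cond_b: Callable[[dict], bool]      # second condition
    action: Callable[[dict], None]      # action executed when the rule fires

    def step(self, state: dict) -> None:
        if self.op(self.cond_a(state), self.cond_b(state)):
            self.action(state)

# Example: "if x > 0 and flag then increment y" as one position-independent rule.
rule = RBGPRule(lambda s: s["x"] > 0, operator.and_,
                lambda s: s["flag"],
                lambda s: s.update(y=s["y"] + 1))
```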
... Recent years have seen an explosion in the quantity and diversity of LCS research. Advances have been made on various frontiers, including different condition representations beyond the traditional binary/ternary rules (rules for continuous attributes [80], hyperellipsoids [28], representations based on S-expressions [78, 21], etc.), other problem classes (function approximation tasks [76, 86], clustering [109]), smarter exploration mechanisms [36, 84, 10], and various theoretical advances [34, 26, 91, 94]. The main meeting point of the LCS community, the International Workshop on Learning Classifier Systems, celebrated its 10th edition in 2007. ...
... Other alternatives are using rule representations based on fuzzy logic [39], decision trees and synthetic instances used as the core of a nearest neighbor classifier [81], or hyperellipsoid conditions [28, 35]. Another kind of representation advance is the use of symbolic expressions to define classifier conditions [77, 2, 78, 21]. This kind of representation may be the most flexible one, in the sense that it can specify the most diverse types of problem subspaces. ...
Conference Paper
Over recent years, research on Learning Classifier Systems (LCSs) has become more and more pronounced and diverse. There have been significant advances of the LCS field on various fronts, including system understanding, representations, computational models, and successful applications. In comparison to other machine learning techniques, the advantages of LCSs have become more pronounced: (1) rule comprehensibility and thus knowledge extraction is straightforward; (2) online learning is possible; (3) local minima are avoided due to the evolutionary learning component; (4) distributed solution representations evolve; and (5) larger problem domains can be handled. After the tenth edition of the International Workshop on LCSs, more than ever before, we are looking towards an exciting future. More diverse and challenging applications, efficiency enhancements, studies of dynamical systems, and applications to cognitive control approaches appear imminent. The aim of this paper is to provide a look back at the LCS field, whereby we place our emphasis on the recent advances. Moreover, we take a glimpse ahead by discussing future challenges and opportunities for successful system applications in various domains.
Article
Human intelligence can simultaneously process many tasks with the ability to accumulate and reuse knowledge. Recent advances in artificial intelligence, such as Transfer, Multitask and Layered Learning, seek to replicate these abilities. However, humans must specify the task order, which is often difficult, particularly with uncertain domain knowledge. This work introduces a Continual-learning system (ConCS) such that, given an open-ended set of problems, once each is solved its solution can contribute to solving further problems. The hypothesis is that the Evolutionary Computation approach of Learning Classifier Systems (LCSs) can form this system due to its niched, cooperative rules. A collaboration of parallel LCSs identifies sets of patterns linking features to classes that can be reused in related problems automatically. Results from distinct Boolean and integer classification problems, with varying interrelations, show that by combining knowledge from simple problems, complex problems can be solved at increasing scales. 100% accuracy is achieved for the problems tested regardless of the order of task presentation. This includes problems intractable for previous approaches, e.g. n-bit Majority-on. A major contribution is that human guidance is now unnecessary to determine the task learning order. Furthermore, the system automatically generates the curricula for learning the most difficult tasks.