Figure 2: A Parse Tree Representation
Source publication
Conference Paper
Full-text available
With the continuing exponential growth of the Internet and the more recent growth of business Intranets, the commercial world is becoming increasingly aware of the problem of electronic information overload. This has encouraged interest in developing agents/softbots that can act as electronic personal assistants and can develop and adapt representa...

Context in source publication

Context 1
... any part of a tree can be interchanged with another part and the tree remains valid, which is perfect for flexible evolution and mutation. An example parse tree is shown in Figure 2. ...
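To see why that interchangeability (the closure property) matters for evolution, here is a minimal Python sketch, entirely our own illustration rather than the cited system's code, of a parse tree in which any subtree can be swapped for any other and the result is still a valid, evaluable expression:

import copy
import random

class Node:
    """A parse-tree node: an operator with children, or a terminal."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

    def evaluate(self, env):
        if not self.children:
            # Terminals are variables (looked up in env) or constants.
            return env.get(self.value, self.value)
        ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
        args = [c.evaluate(env) for c in self.children]
        return ops[self.value](*args)

def all_nodes(tree):
    """Every node is a legal crossover point."""
    nodes = [tree]
    for child in tree.children:
        nodes.extend(all_nodes(child))
    return nodes

def subtree_crossover(parent_a, parent_b):
    """Graft a random subtree of parent_b onto a random point in a copy
    of parent_a. Because every subtree is itself a valid expression,
    the offspring is always a valid tree (closure)."""
    child = copy.deepcopy(parent_a)
    target = random.choice(all_nodes(child))
    donor = copy.deepcopy(random.choice(all_nodes(parent_b)))
    target.value, target.children = donor.value, donor.children
    return child

# (x + 3) * y evaluated with x = 2, y = 4 gives 20.
tree = Node("*", [Node("+", [Node("x"), Node(3)]), Node("y")])
print(tree.evaluate({"x": 2, "y": 4}))

other = Node("+", [Node("y"), Node(1)])
offspring = subtree_crossover(tree, other)
print(offspring.evaluate({"x": 2, "y": 4}))  # still a valid expression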

Similar publications

Article
Full-text available
Advanced Technology Laboratories, in conjunction with Lockheed Martin Maritime Systems and Sensors and Lockheed Martin Aeronautics Advanced Development Programs, performed a set of experiments in cooperation with the U.S. Navy involving collaborative, unmanned surface and air vehicle mission execution. Our multi-domain, collaborative research foc...
Article
Full-text available
The increasing complexity of production logistic systems has led to an emergence of new decentralized control concepts. The Collaborative Research Center 637 (CRC 637) investigates the advantages and limitations of autonomous control as one of these concepts. This research mainly focuses on control strategies consisting of precise descriptions o...
Article
Full-text available
To realise autonomous control for transport networks, attempts are made to transfer well-known and proven routing protocols from data communication to transport problems. Here, structural differences between data and transportation networks prevent a direct transfer of the protocols. In transportation networks not one but several diverse and parti...
Conference Paper
Full-text available
The German Collaborative Research Centre 637 'Autonomous Cooperating Logistic Processes' tries to make a paradigm shift from central planning to autonomous control in the field of logistics. Among other things, autonomous routing algorithms based on internet routing protocols are developed. The Distributed Logistics Routing Protocol (DLRP) was orig...

Citations

... GAs have also been in use for some time to generate rules for text classification [21,22,23] and clustering [11,24,25], which have the advantage of being explainable. The simple disjunctive search queries produced by the eSQ system are easy to understand and are potentially modifiable by a human analyst. ...
Preprint
Full-text available
We present a novel, hybrid approach for clustering text databases. We use a genetic algorithm to generate and evolve a set of search queries in Apache Lucene format. Clusters are formed as the set of documents matched by a search query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query). Where queries contain more than one word, we have found it useful to assign one word to be the root and constrain the query construction such that the set of documents returned by any additional query word intersects with the set returned by the root word. Multiword queries are interpreted disjunctively. We also describe how a gene can be used to determine the number of clusters (k). Not all documents in a collection are returned by any of the search queries in a set, so once the search query evolution is completed a second stage is performed whereby a KNN algorithm is applied to assign all unassigned documents to their nearest cluster. We describe the method and present results using 8 text datasets, comparing effectiveness with well-known existing algorithms. We note that the search query format has the qualitative benefits of being interpretable and providing an explanation of cluster construction.
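As a rough illustration of the two objectives described in the abstract (maximising documents returned while minimising overlap), a fitness based on unique query hits might look like the following sketch. The function name and the structure of its inputs are our own assumptions; matched stands in for the result of running each evolved Lucene query against the index.

from collections import Counter

def fitness(matched, num_docs):
    """Score a set of search queries used as clusters. matched holds one
    set of returned document ids per query; num_docs is the collection
    size. Documents hit by exactly one query count in favour; documents
    hit by several queries (cluster overlap) count against."""
    hits = Counter(doc for docs in matched for doc in docs)
    unique = sum(1 for n in hits.values() if n == 1)
    overlapping = sum(1 for n in hits.values() if n > 1)
    return (unique - overlapping) / num_docs

# Three queries over a 10-document collection; document 2 overlaps.
print(fitness([{0, 1, 2}, {2, 3, 4}, {5, 6}], 10))  # 0.5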
... GAs have also been used to generate rules for text classification [16][17][18][19] and clustering [20], which have the advantage of being transparent and explainable. The eSQ system presented here has a novel fitness test based entirely on the count of unique query hits. ...
Conference Paper
Full-text available
We present a novel, hybrid approach for clustering text databases. We use a genetic algorithm to generate and evolve a set of single word search queries in Apache Lucene format. Clusters are formed as the set of documents matching a search query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query in a set). Optionally, the number of clusters can be specified in advance, which will normally result in an improvement in performance. Not all documents in a collection are returned by any of the search queries in a set, so once the search query evolution is completed a second stage is performed whereby a KNN algorithm is applied to assign all unassigned documents to their nearest cluster. We describe the method and compare effectiveness with other well-known existing systems on 8 different text datasets. We note that search query format has the qualitative benefits of being interpretable and providing an explanation of cluster construction.
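The second-stage KNN step described above can be pictured as a simple nearest-neighbour assignment. This is a minimal sketch under our own assumptions (cosine similarity over hypothetical document vectors), not the authors' implementation.

import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_unmatched(doc_vec, clustered, k=3):
    """Assign a document that no query matched to the cluster holding the
    majority of its k nearest neighbours. clustered is a list of
    (vector, cluster_id) pairs for already-clustered documents."""
    neighbours = sorted(clustered, key=lambda p: cosine(doc_vec, p[0]),
                        reverse=True)[:k]
    labels = [cid for _, cid in neighbours]
    return max(set(labels), key=labels.count)

clustered = [([1.0, 0.0], "A"), ([0.9, 0.1], "A"), ([0.0, 1.0], "B")]
print(assign_unmatched([1.0, 0.05], clustered))  # "A"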
... Genetic methods have been in use for some time in the area of document classification [5], [17]. We have previously described a system whereby Apache Lucene search queries were evolved from a set of training documents in order to classify documents in a collection [9]. ...
... This is a novel method for classifying documents, in which agents develop a parse-tree representation of a user's specific information need. Another remarkable property of the work is its continual training process: the user's feedback helps the agent adapt to the user's long-term information requirements [23,25]. ...
... The paper deals with spam detection based on a Reverse Polish Notation (RPN) [1], [2] expression-based Linear Genetic Programming (LGP) [3]-[5] approach, compared against a Naïve Bayesian classifier [6]-[8]. ...
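For readers unfamiliar with RPN, the sketch below shows a generic stack-based evaluator for postfix expressions in Python. It is our own illustration of the notation, not the cited LGP system, and the feature names f1 and f2 are hypothetical.

def eval_rpn(tokens, env):
    """Evaluate a Reverse Polish Notation token list with a stack:
    operands are pushed; each operator pops its two arguments."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        elif tok in env:
            stack.append(env[tok])
        else:
            stack.append(float(tok))
    return stack.pop()

# "(f1 + f2) * 3" in postfix; f1 and f2 are hypothetical message features.
print(eval_rpn(["f1", "f2", "+", "3", "*"], {"f1": 0.2, "f2": 0.3}))  # 1.5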
... Earlier systems include Magi [25], which used decision trees to automatically route new messages to the relevant folders; RIPPER [1], which automatically learnt rules to classify email into categories (spam was not mentioned); and Genetic Document Classifier [2], which used a classical GP to route inbound documents to interested research groups within a large organisation. ...
... The Naïve Bayesian Classifier assumes that f1, f2, ..., fn are conditionally independent given the class, which gives (2): P(C | f1, ..., fn) ∝ P(C) ∏i P(fi | C) (2), where both P(fi | C) and P(C) are relative frequencies which can be calculated from the training corpus. ...
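Equation (2) is straightforward to operationalise. The following sketch, with our own hypothetical names and a crude probability floor in place of proper smoothing, estimates P(C) and P(fi | C) as relative frequencies from a training corpus and classifies by the product rule:

from collections import Counter, defaultdict

def train(docs):
    """docs is a list of (features, label) pairs. Returns the
    relative-frequency estimates for P(C) and P(f | C) of equation (2)."""
    class_counts = Counter(label for _, label in docs)
    feature_counts = defaultdict(Counter)
    for features, label in docs:
        feature_counts[label].update(features)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    likelihoods = {c: {f: n / sum(counts.values())
                       for f, n in counts.items()}
                   for c, counts in feature_counts.items()}
    return priors, likelihoods

def classify(features, priors, likelihoods, floor=1e-6):
    """Return the class maximising P(C) * prod_i P(f_i | C); the floor
    stands in for unseen features (a real system would smooth properly)."""
    def score(c):
        p = priors[c]
        for f in features:
            p *= likelihoods[c].get(f, floor)
        return p
    return max(priors, key=score)

training = [(["cheap", "pills"], "spam"), (["meeting", "agenda"], "ham")]
priors, likelihoods = train(training)
print(classify(["cheap", "pills"], priors, likelihoods))  # spam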
Conference Paper
Full-text available
We investigate a machine learning algorithm and the Bayesian classifier in the spam-filtering context. The paper shows the advantage of using Reverse Polish Notation (RPN) expressions with feature extraction compared to the traditional Naïve Bayesian classifier used for spam detection, assuming the same features. The performance of the two is investigated using a public corpus and a recent private spam collection, concluding that the system based on RPN LGP (Linear Genetic Programming) gave better results than two popularly used open-source Bayesian spam filters.
... A filter or a query is applied to information, with the result being the specific documents or files related to the filter's purpose. This is useful because classification and routing techniques are what compose filtering [6]. Clack creates an automatic document classification system for businesses by using filtering. ...
... While this thesis does not seek to build a full-fledged semantic or "smart" desktop, it does seek to create a useful part of that goal. Elimination of tedious work is often a helpful and useful goal [6]. ...
... 6 shows that the results for the large set of the Reuters corpus are quite different. The simple classifier performs best at 25%, followed by the NB and HMM at 25% and 18%. ...
Article
Document classification is used to sort and label documents. This gives users quicker access to relevant data. Users that work with a large inflow of documents spend time filing and categorizing them to allow for easier procurement. The Automatic Classification and Document Filing (ACDF) system proposed here is designed to allow users working with files or documents to rely on the system to classify and store them with little manual attention. By using a system built on Hidden Markov Models, the documents in a smaller desktop environment are categorized with better results than the traditional Naive Bayes implementation of classification.
... The example clearly indicates that readability and modifiability have recognized value to commercial classification products and that the production of readable rules with high accuracy is a worthwhile objective in text classification research. Generally, attempts to produce classification systems that are human-understandable have involved the production of a set of rules which are used for classification purposes [6], [7], [8], [9], [10], [11]. Often, the set of rules is quite large, which reduces some of the qualitative advantages because it will be harder for a human to comprehend or modify the classifier. ...
... Both are stochastic search methods inspired by biological evolution. The evolution will require a fitness test based on some measure of classification accuracy [6], [7], [9], [11], [12], [13]. The basic idea we introduce here is that each individual will encode a candidate solution in a search query format. ...
Conference Paper
Full-text available
Human readable text classifiers have a number of advantages over classifiers based on complex and opaque mathematical models. For some time now search queries or rules have been used for classification purposes, either constructed manually or automatically. We have performed experiments using genetic algorithms to evolve text classifiers in search query format with the combined objective of classifier accuracy and classifier readability. We have found that a small set of disjunct Lucene SpanFirst queries effectively meet both goals. This kind of query evaluates to true for a document if a particular word occurs within the first N words of a document. Previously researched classifiers based on queries using combinations of words connected with OR, AND and NOT were found to be generally less accurate and (arguably) less readable. The approach is evaluated using standard test sets Reuters-21578 and Ohsumed and compared against several classification algorithms.
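The SpanFirst semantics are easy to state directly: a clause matches when a given word occurs within the first N tokens of a document, and the evolved classifiers combine such clauses disjunctively. The sketch below models that behaviour in plain Python; it is illustrative only (a real system would issue a Lucene SpanFirst query against an index), and all names are our own.

def span_first(doc_text, word, n):
    """True iff word occurs within the first n tokens of the document,
    mirroring the behaviour described for a SpanFirst query."""
    return word.lower() in doc_text.lower().split()[:n]

def classify(doc_text, clauses):
    """clauses is a list of (word, n) pairs; the classifier fires when
    any clause matches (the disjunctive combination described above)."""
    return any(span_first(doc_text, word, n) for word, n in clauses)

# Hypothetical classifier for a 'grain' category:
print(classify("Wheat prices rose sharply today in Chicago trading",
               [("wheat", 5), ("grain", 10)]))  # True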
... Although GP has been used in a textual environment [4][5] it has not previously been used to evolve search query classifiers for large text datasets. ...
Conference Paper
Full-text available
We describe a method for generating accurate, compact, human understandable text classifiers. Text datasets are indexed using Apache Lucene and Genetic Programs are used to construct Lucene search queries. Genetic programs acquire fitness by producing queries that are effective binary classifiers for a particular category when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from classification tasks.
... Genetic Programming has also been applied to text classification through the use of a parse-tree. In [6], Clack et al. used GP to route inbound documents to a central classifier which autonomously sent documents to interested research groups within a large organization. The central classifier used a parse tree to match the aspects of a document to nodes of the tree, which ultimately leads to a single numerical value, the classification or "confidence value", during evaluation. ...
Conference Paper
Full-text available
This paper shows how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity -- five derived from the citation information of the collection, and three derived from the structural content -- and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. Effectiveness of the similarity functions discovered through simple majority voting is better than that of content-based as well as combination-based Support Vector Machine classifiers. Experiments also were conducted to compare the performance between GP techniques and other fusion techniques such as Genetic Algorithms (GA) and linear fusion. Empirical results show that GP was able to discover better similarity functions than GA or other fusion techniques.
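Simple majority voting over several discovered similarity functions, as used above, can be sketched as follows; the similarity functions here are hypothetical stand-ins for GP-discovered ones.

def majority_vote(doc, category, similarity_fns, threshold=0.5):
    """Each discovered similarity function casts a vote on whether doc
    belongs to category; the majority decides."""
    votes = sum(1 for f in similarity_fns if f(doc, category) >= threshold)
    return votes > len(similarity_fns) / 2

# Hypothetical stand-ins for three GP-discovered similarity functions:
fns = [lambda d, c: 0.8, lambda d, c: 0.4, lambda d, c: 0.9]
print(majority_vote("doc-1", "H.3", fns))  # True: two of three vote yes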
... We then provide information concerning the implementation of our application and the initial results we have obtained on a text classification task. Although GP has been used in a textual environment ([8]; [9]) it has not previously been used to evolve compressed classifiers based on evolving N-Gram patterns. ...
... • Functions for identifying words that are ADJACENT in the text or NEAR one another. • New functions together with numeric terminals for identifying frequency information may be introduced [8]. Functions such as '>' return a Boolean value based on the frequency of a particular N-Gram in comparison to an integer terminal. ...
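The '>'-style frequency function mentioned in that context can be pictured with a minimal sketch; the counter below is our own illustration, not the cited function set.

def ngram_count(text, ngram):
    """Count (possibly overlapping) occurrences of a character N-gram."""
    n = len(ngram)
    return sum(1 for i in range(len(text) - n + 1)
               if text[i:i + n] == ngram)

def gt(text, ngram, threshold):
    """A '>'-style GP function: true when the N-gram's frequency in the
    document exceeds an integer terminal."""
    return ngram_count(text, ngram) > threshold

print(gt("programming in programs", "gram", 1))  # True: occurs twice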
Article
Full-text available
We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.
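A precision/recall-based fitness of the kind described can be sketched as an F1 score over a training set; predicted and relevant are our own hypothetical names for the documents a rule matches and the documents labelled as in-category.

def fitness(predicted, relevant):
    """F1-style fitness for an evolved rule: predicted is the set of
    training documents the rule matches; relevant is the set labelled
    as belonging to the category."""
    if not predicted or not relevant:
        return 0.0
    tp = len(predicted & relevant)
    precision = tp / len(predicted)
    recall = tp / len(relevant)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(fitness({1, 2, 3}, {2, 3, 4}))  # 0.667: precision 2/3, recall 2/3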