The syntax of grammar rules.

Source publication

Basic Gene Grammars and DNA-ChartParser for language processing of Escherichia coli promoter DNA sequences

Article

Mar 2001

Motivation: The field of 'DNA linguistics' has emerged from pioneering work in computational linguistics and molecular biology. Most formal grammars in this field are expressed using Definite Clause Grammars but these have computational limitations which must be overcome. The present study provides a new DNA parsing system, comprising a logic gram...

Context 1

... rule syntax Figure 1 shows the syntax of basic grammar rules. A grammar rule consists of a single left-hand-side (LHS) category, an arrow symbol, and one or more right-hand-side (RHS) categories (rhs cat 1, ..., rhs cat n.). ...
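The rule shape described in this excerpt (one LHS category, an arrow, one or more RHS categories) can be sketched as a small validator. This is a minimal illustration, not the parser from the source publication: the arrow symbol `-->`, the comma separator, the optional trailing period, and the category names in the usage example are all assumptions, and `GrammarRule` and `parse_rule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: the excerpt does not show the arrow symbol; "-->" is a placeholder.
ARROW = "-->"


@dataclass(frozen=True)
class GrammarRule:
    """A rule with exactly one LHS category and one or more RHS categories."""
    lhs: str
    rhs: Tuple[str, ...]


def parse_rule(text: str) -> GrammarRule:
    """Parse 'lhs --> rhs_1, ..., rhs_n.' into a GrammarRule (hypothetical syntax)."""
    body = text.strip()
    # Assumption: a rule may end with a period, which is not part of a category.
    if body.endswith("."):
        body = body[:-1]

    lhs, arrow, rhs_text = body.partition(ARROW)
    if not arrow:
        raise ValueError("missing arrow symbol")

    lhs = lhs.strip()
    # Enforce the single-LHS constraint stated in the excerpt.
    if "," in lhs or len(lhs.split()) != 1:
        raise ValueError("exactly one LHS category is required")

    # Assumption: RHS categories are comma-separated.
    rhs = tuple(part.strip() for part in rhs_text.split(","))
    if any(not part for part in rhs):
        raise ValueError("one or more non-empty RHS categories are required")

    return GrammarRule(lhs, rhs)


if __name__ == "__main__":
    # Category names below are invented for illustration only.
    print(parse_rule("promoter --> minus_35, spacer, minus_10."))
```

Rejecting rules with an empty or multi-category LHS at parse time keeps the "single LHS category" constraint explicit, so later processing can rely on it.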

A syntactic component for Vietnamese language processing

Article

Jun 2015

This paper presents the development of a grammar and a syntactic parser for the Vietnamese language. We first discuss the construction of a lexicalized tree-adjoining grammar using an automatic extraction approach. We then present the construction and evaluation of a deep syntactic parser based on the extracted grammar. This is a complete system th...

Parsing Bangla Grammar Using Context Free Grammar

Article

Jan 2013

Parsing plays a very prominent role in computational linguistics. Parsing a Bangla sentence is a primary need in Bangla language processing. This chapter describes the Context Free Grammar (CFG) for parsing Bangla language, and hence, a Bangla parser is proposed based on the Bangla grammar. This approach is very simple to apply in Bangla sentences,...

Trailfinder - A Case Study in Extracting Spatial Information Using Deep Language Processing.

Conference Paper

Jan 2004

The present paper reports on an end-to-end application using a deep processing grammar to extract spatial and temporal information of prepositional and adverbial expressions from running text. The extraction process is based on the full understanding of the input text. It is represented in a formalism standard for unification-based gramm...

A Bangla Semantic Parser Using Context-Free-Grammar

Conference Paper

Sep 2017

This research work describes a computer system for understanding the parsing of Bangla sentences. It draws on recent developments in Natural Language Processing (NLP) research to look at the past, present, and future of NLP technology in a new light. Research on Bangla Language Processing (BLP) started in the late 1980s in Bangladesh and it...

Fig. 9. Analysis of the prepositional phrase in EsTxala for the sentence...

Fig. 14. Analysis of the noun subordinate clause in EsTxala for the example...

Fig. 15. Analysis of the adverbial subordinate clause in CaTxala for the example...

Fig. 22. Analysis of the coordinate construction according to EsTxala for the...

Consideraciones sobre la naturaleza de los núcleos sintácticos. Hacia una representación sintáctica de dependencias [Considerations on the nature of syntactic heads. Towards a syntactic dependency representation]

Article

Jan 2012

ABSTRACT In automatic syntactic parsing, defining linguistic criteria for grammars based on linguistic knowledge makes it possible to develop coherent and consistent resources. The construction of EsTxala and CaTxala, two dependency grammars of Spanish and Catalan for FreeLing (a tool environment for the Processing of...

A Survey on Syntactic Pattern Recognition Methods in Bioinformatics

Article

Mar 2024

Mariusz Flasiński

Formal tools and models of syntactic pattern recognition which are used in bioinformatics are introduced and characterized in the paper. They include, among others: stochastic (string) grammars and automata, hidden Markov models, programmed grammars, attributed grammars, stochastic tree grammars, Tree Adjoining Grammars (TAGs), algebraic dynamic programming, NLC- and NCE-type graph grammars, and algebraic graph transformation systems. The survey of applications of these formal tools and models in bioinformatics is presented.

Prolog Meets Biology

Chapter

Jun 2023

This paper provides an overview of the use of Prolog and its derivatives to sustain research and development in the fields of bioinformatics and computational biology. A number of applications in this domain have been enabled by the declarative nature of Prolog and the combinatorial nature of the underlying problems. The paper provides a summary of some relevant applications as well as potential directions that the Prolog community can continue to pursue in this important domain. The presentation is organized in two parts: “small,” which explores studies in biological components and systems, and “large,” that discusses the use of Prolog to handle biomedical knowledge and data. A concrete encoding example is presented and the effective implementation in Prolog of a widely used approximated search technique, large neighborhood search, is presented.

Analysing bio-art’s epistemic landscape: from metaphoric to post-metaphoric structure

Article

Mar 2022
BioSocieties

Diaa Ahmed Mohamed Ahmedien

Since its emergence, bio-art has developed numerous metaphors central to the transfer of concepts of modern biology, genetics, and genomics to the public domain that reveal several cultural, ethical, and social variations in their related themes. This article assumes that a general typology of metaphors developed by practices related to bio-art can be categorised into two categories: pictorial and operational metaphors. Through these, information regarding several biological issues is transferred to the public arena. Based on the analysis, this article attempts to answer the following questions: How does bio-art develop metaphors to advance epistemic and discursive agendas that constitute public understanding of a set of deeply problematic assumptions regarding how today’s biology operates? Under the influence of today’s synthetic biology, could bio-media operationally reframe these epistemic agendas by reframing complex and multi-layered metaphors towards post-metaphoric structures? Finally, what are the scientific, cultural, and social implications of reframing?

Formal Modeling and Mining of Big Data

Article

May 2017

Aljoharah A Algwaiz

As data collection technologies advance and memory storage costs decline, the volume of data collected has soared. Scientists and investigators are collecting all possible data for fear of missing important information. As this data collection trend has grown, researchers have studied data mining and analysis to find the most efficient way to mine data. Various valuable data mining techniques can be found in the literature, such as Support Vector Machines (SVM), Neural Networks (ANN), and Formal Methods (Grammars). Grammars are very valuable for analyzing structured data and describing it in a condensed manner. However, few have used them for data mining, even though they have many benefits. In this research we present an approach to mining big data. First, a grammar is inferred to build a structural model that describes the data. Then, in the next phase, a probabilistic context-free grammar is inferred to model more complex structures. Given an input sequence, the model parses it and generates the probability of that sequence being part of the class based on its structural characteristics. Grammatical concatenation is used when sub-structures exist within the class’s structural description. The model then accepts or rejects the input as part of the data’s class by comparing the probability to a preset threshold. Finally, this is applied to a large heterogeneous data set by inferring multiple grammars. After building a grammatical model for each class, the algorithm parses multiple points in the large set. It then uses the probabilistic grammars to classify these data into smaller sets that share similar structural characteristics. If more than one class accepts a data point, it is assigned to the highest-ranking class. Biological data, DNA and proteins, were used for experimentation in this research.

A Grammar Inference Approach for Predicting Kinase Specific Phosphorylation Sites

Article

Apr 2015
PLOS ONE

Kinase-mediated phosphorylation is a key post-translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour-intensive and expensive. Also, the manifold increase in protein sequences in databanks over the years necessitates improved high-speed and accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods have been proposed by various researchers for predicting phosphorylation sites, but there remains much scope for improvement. In this communication, we present a simple and novel method based on a Grammatical Inference (GI) approach to automate the prediction of kinase-specific phosphorylation sites. In this regard, we have used a popular GI algorithm, Alergia, to infer Deterministic Stochastic Finite State Automata (DSFA), which equivalently represent the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase-specific manner. It performs significantly better than other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs better, which indicates that our method is robust and has potential for predicting phosphorylation sites in a kinase-specific manner.

A Composite Method Based on Formal Grammar and DNA Structural Features in Detecting Human Polymerase II Promoter Region

Article

Feb 2013
PLOS ONE

An important step in understanding gene regulation is to identify the promoter regions where transcription factor binding takes place. Predicting a promoter region de novo has been a theoretical goal for many researchers for a long time. A number of in silico methods exist to predict promoter regions de novo, but most still suffer from various shortcomings, a major one being the selection of appropriate features that distinguish promoter regions from non-promoters. In this communication, we propose a new composite method that predicts promoter sequences based on the interrelationship between structural profiles of DNA and primary sequence elements of the promoter regions. We show that a Context Free Grammar (CFG) can formalize the relationships between different primary sequence features, and by utilizing the CFG we demonstrate that an efficient parser can be constructed for extracting these relationships from DNA sequences to distinguish true promoter sequences from non-promoter sequences. Along with the CFG, we have extracted the structural features of the promoter region to improve the efficiency of our prediction system. Extensive experiments performed on different datasets reveal that our method is effective in predicting promoter sequences on a genome-wide scale and performs satisfactorily compared to other promoter prediction techniques.

Dichotomy in the definition of prescriptive information suggests both prescribed data and prescribed algorithms: Biosemiotics applications in genomic systems

Article

Mar 2012
Theor Biol Med Model

The fields of molecular biology and computer science have cooperated over recent years to create a synergy between the cybernetic and biosemiotic relationship found in cellular genomics to that of information and language found in computational systems. Biological information frequently manifests its "meaning" through instruction or actual production of formal bio-function. Such information is called prescriptive information (PI). PI programs organize and execute a prescribed set of choices. Closer examination of this term in cellular systems has led to a dichotomy in its definition suggesting both prescribed data and prescribed algorithms are constituents of PI. This paper looks at this dichotomy as expressed in both the genetic code and in the central dogma of protein synthesis. An example of a genetic algorithm is modeled after the ribosome, and an examination of the protein synthesis process is used to differentiate PI data from PI algorithms.

Recent Progresses in the Linguistic Modeling of Biological Sequences Based on Formal Language Theory

Article

Mar 2011

Treating genomes just as languages raises the possibility of producing concise generalizations about information in biological sequences. Grammars used in this way would constitute a model of underlying biological processes or structures, and may, in fact, serve as an appropriate tool for theory formation. The increasing number of biological sequences being produced further highlights a growing need to develop grammatical systems in bioinformatics. The intent of this review is therefore to list some bibliographic references on recent progress in the grammatical modeling of biological sequences. The review also contains sections that briefly introduce basic formal language theory, such as the Chomsky hierarchy, for non-experts in computational linguistics, and provides some helpful pointers for starting a deeper investigation into this field.

Computational inference of grammars for larger-than-gene structures from annotated gene sequences

Article

Mar 2011
BIOINFORMATICS

Larger than gene structures (LGS) are DNA segments that include at least one gene and often other segments such as inverted repeats and gene promoters. Mobile genetic elements (MGE) such as integrons are LGS that play an important role in horizontal gene transfer, primarily in Gram-negative organisms. Known LGS have a profound effect on organism virulence, antibiotic resistance and other properties of the organism due to the number of genes involved. Expert-compiled grammars have been shown to be an effective computational representation of LGS, well suited to automating annotation, and supporting de novo gene discovery. However, development of LGS grammars by experts is labour intensive and restricted to known LGS. Objectives: This study uses computational grammar inference methods to automate LGS discovery. We compare the ability of six algorithms to infer LGS grammars from DNA sequences annotated with genes and other short sequences. We compared the predictive power of learned grammars against an expert-developed grammar for gene cassette arrays found in Class 1, 2 and 3 integrons, which are modular LGS containing up to 9 of about 240 cassette types. Using a Bayesian generalization algorithm our inferred grammar was able to predict > 95% of MGE structures in a corpus of 1760 sequences obtained from Genbank (F-score 75%). Even with 100% noise added to the training and test sets, we obtained an F-score of 68%, indicating that the method is robust and has the potential to predict de novo LGS structures when the underlying gene features are known. http://www2.chi.unsw.edu.au/attacca.

Modeling Biological Structures via Abstract Grammars to Solve Common Problems in Computational Biology

Thesis

Nov 2010

David James Russell

Grammars are generally understood to be the set of rules that define the relationships between elements of a language. However, grammars can also be used to elucidate structural relationships within sequences constructed from any finite alphabet. In this work abstract grammars are used to model the primary and secondary structures present in biological data. These grammar models are inferred and applied to efficiently solve various sequence analysis problems in computational biology, including multiple sequence alignment, fragment assembly, database redundancy removal, and structural prediction. The primary structures, or sequential ordering of symbols, of biological data are first modeled with Lempel-Ziv (LZ) grammars. The results are used to construct a grammar-based sequence distance metric which can be used to compare biological sequences by comparing their inferred grammars. This concept is applied to solve several problems involving biological sequence analysis, including multiple sequence alignment and phylogenetic clustering. The higher-level secondary structures of biological sequences are then modeled via two novel grammar inference methods. The resulting context-free grammars are used to estimate structural pieces within biological sequences, which can in turn be used as supplemental information to help guide various sequence analysis algorithms. The use of this approach to develop algorithms for various sequence analysis tasks demonstrates the viability and versatility of using abstract grammars to model biological data.
