Definition of a syntactico-semantic feature (dotted line) in the ontology.

Source publication

Genic Interaction Extraction by Reasoning on an Ontology

Article

Jan 2008

Information Extraction (IE) systems have been proposed in recent years to extract genic interactions from bibliographical resources. But they are limited to single interaction relations, and have to face a trade-off between recall and precision, by focusing either on specific interactions (for precision), or general and unspecified interactio...

Context 1

... layer also allows the introduction of classes which may be semantically irrelevant from a domain ontology point of view but factorize concepts that share common properties, and thus factorize what would otherwise be multiple inference rules. This is exemplified in figure 5, which shows the definition of a "biological actor" (bio actor) class, where a "gene", a "protein" and a "gene family" share common syntactical contexts in biological articles. Figure 3 illustrates a final representation combining semantic features (a protein instance "GerE") and syntactic ones (a subject "subj:V-N" relation between "GerE" and "stimulate", an instance of the "regulation" concept). ...
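The factorization described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the representation used in the source publication: the class names, the `SyntacticRelation` type, the `REGULATION_VERBS` set and the `is_regulation_agent` rule are all hypothetical. Only the concepts "bio actor", "gene", "protein", "gene family", "regulation", the instance "GerE", the verb "stimulate" and the relation label "subj:V-N" come from the text.

```python
from dataclasses import dataclass


class BioActor:
    """Hypothetical superclass grouping concepts that share syntactic contexts."""

    def __init__(self, name: str) -> None:
        self.name = name


class Gene(BioActor):
    pass


class Protein(BioActor):
    pass


class GeneFamily(BioActor):
    pass


@dataclass(frozen=True)
class SyntacticRelation:
    """A dependency between a verb and a noun, e.g. label 'subj:V-N'."""

    label: str
    verb: str
    noun: BioActor


# Assumed, illustrative set of verbs that are instances of the "regulation" concept.
REGULATION_VERBS = {"stimulate"}


def is_regulation_agent(rel: SyntacticRelation) -> bool:
    """One rule written against BioActor instead of one rule per subclass."""
    return (
        rel.label == "subj:V-N"
        and rel.verb in REGULATION_VERBS
        and isinstance(rel.noun, BioActor)
    )


gere = Protein("GerE")
relation = SyntacticRelation("subj:V-N", "stimulate", gere)
print(is_regulation_agent(relation))
```

Because the rule tests for `BioActor`, the same check applies unchanged to a `Gene` or a `GeneFamily` subject, which is the sense in which the shared class replaces several near-identical rules.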

Self Training Wrapper Induction with Linked Data

Conference Paper

Sep 2014

This work explores the usage of Linked Data for Web scale Information Extraction, with focus on the task of Wrapper Induction. We show how to effectively use Linked Data to automatically generate training material and build a self-trained Wrapper Induction method. Experiments on a publicly available dataset demonstrate that for covered domains, our...

Information Extraction and Metallogenic Prediction of Qiangduo area in Tibet based on Multi-source Remote Sensing Data

Research

Apr 2015

Liu Xinxing

A Study in Domain-Independent Information Extraction for Disaster Management

Conference Paper

Jan 2014

During and after natural disasters, detailed information about their impact is a key for successful relief operations. In the 21st century, such information can be found on the Web, traditionally provided by news agencies and recently through social media by affected people themselves. Manual information acquisition from such texts requires ongoing...

Annotation Free Information Extraction from Semi-structured Documents

Article

Oct 2002

The vast amount of online information available has led to renewed interest in information extraction (IE) systems that analyze input documents to produce a structured representation of selected information from the documents. However, the design of an IE system differs greatly according to its input: from unrestricted free-text to semi-structured...

Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation

Article

Jan 2010

Information Extraction (IE) is becoming increasingly useful, but it is a costly task to discover and annotate novel events, event arguments, and event types. We exploit both monolingual texts and bilingual sentence-aligned parallel texts to cluster event triggers and discover novel event types. We then generate event argument annotations semi-autom...

Close Integration of ML and NLP Tools in BioAlvis for Semantic Search in Bacteriology

Article

Apr 2012

This paper focuses on the use of corpus-based machine learning (ML) methods for fine-grained semantic annotation of text. The state of the art in semantic annotation in Life Science, as in other technical and scientific domains, takes advantage of recent breakthroughs in the development of natural language processing (NLP) platforms. The resources required to run such platforms include named entity dictionaries, terminologies, grammars and ontologies. The demand for domain-specific, comprehensive and low-cost resources led to the intensive use of ML methods. The precise specification of the ML task goal and target knowledge, and the adequate normalization of the training corpus representation, can notably increase the quality of the acquired knowledge. We argue in this paper that integrated ML-NLP architectures facilitate such specifications. We illustrate our demonstration with four representative NLP tasks that are part of the BioAlvis semantic annotation platform. Their impact on the quality of the semantic annotation is qualified through the evaluation of an IR application in Bacteriology.

Etude fonctionnelle d’un centre d’interactions protéiques chez Bacillus subtilis par une approche intégrée

Thesis

Jun 2009

Elodie Marchadier

The entire complement of proteins expressed by a genome forms the proteome. The proteome is organized in structured networks of protein interactions: the interactome. In these networks, most of the proteins have few interactions whereas a few proteins have many connections: these proteins are called centres of interactions or hubs. This thesis focused on an important biological question: understanding the biological function of a cluster of hubs (CoH), discovered in Bacillus subtilis and located at the interface of several essential cellular processes: DNA replication, cell division, chromosome segregation, stress response and biogenesis of the bacterial cell wall. The partners of the proteins of the cluster of hubs were first identified by the yeast two-hybrid technique, which helped us to define it rigorously in a network composed of 287 proteins connected by 787 interactions. This network shows many proteins in a new context, thereby facilitating the functional analysis of individual proteins and of the links between the major cellular processes. After a study of the genomic context of the genes of the CoH, an integrative biology approach was initiated by analyzing heterogeneous transcriptome data available in public databases. Statistical analysis of these data identified groups of genes co-regulated with the genes of the cluster of hubs. At first, the analysis of correlations between the expression of genes across various conditions was performed on the basis of classical statistics such as unsupervised classification. This first analysis allowed us to associate genes in the CoH with functional groups, and to validate and identify regulons. It also enabled us to highlight the limitations of this approach and the need to resort to methods allowing identification of the conditions in which genes are co-regulated.
To this end, we have (i) generated transcriptome data to promote the differential expression of genes coding for CoH proteins and (ii) used bi-clustering methods to identify groups of genes co-expressed in a wide range of conditions. This led us to identify associations of expression in specific conditions among the genes of the CoH. It has therefore been possible to combine two approaches, the study of the transcriptome and the study of the interactome, both conducted systematically across the whole genome. The integration of these two kinds of data allowed us to clarify the functional context of genes of interest and to make assumptions about the nature of the interactions between the proteins of the cluster of hubs. The cluster finally appears to be composed of a few groups of co-expressed proteins (party hubs), which can interact together, and of other proteins expressed in an uncorrelated manner (date hubs). The CoH could form a large group of date hubs whose function could be to ensure the connection between basic cellular processes, whatever the environmental conditions to which B. subtilis is exposed. Generating and processing such a data set is a major scientific challenge: it requires the mobilization of skills, knowledge, and tools to reach a better understanding of living organisms. The constituted data set may be used to implement other statistical methods. All of this will provide methods to ultimately extract information from the large data sets that are currently being produced. This is the major issue of integrative biology.
