Figure 2 - uploaded by Érick Alphonse
Example of a semantic representation resulting from the IE system.

Source publication
Article
Information Extraction (IE) systems have been proposed in recent years, to extract genic interactions from bibliographical resources. But they are limited to single interaction relations, and have to face a trade-off between recall and precision, by focusing either on specific interactions (for precision), or general and unspecified interactio...

Context in source publication

Context 1
... a protein results from the expression of a gene ("product of"), and a protein complex results from the assembly of several proteins ("complex with"). Figure 2 shows, on an example sentence, the result of the IE system provided as instances of the ontology. Note that, as a normalized representation of the text, not all the meaning is kept: for instance, we do not stress anymore about the "DNA binding" nature of the "GerE" protein; the fact that the transcription happens from "several" promoters is lost. ...
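
The idea of returning extraction results as ontology instances linked by relations such as "product of" and "complex with" can be sketched as follows. This is only an illustrative sketch and not the representation used in the source publication: the class names, the relation labels `product_of` and `complex_with`, the gene name `gerE`, and the placeholder partner `ProteinB` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """An instance of an ontology concept (hypothetical structure)."""
    concept: str  # ontology concept, e.g. "Gene" or "Protein"
    name: str     # normalized name of the entity


@dataclass(frozen=True)
class Relation:
    """A labelled, directed relation between two instances."""
    label: str
    source: Instance
    target: Instance


def to_triples(relations):
    """Return the relations as sorted (label, source name, target name) triples."""
    return sorted((r.label, r.source.name, r.target.name) for r in relations)


# Illustrative instances; "gerE" and "ProteinB" are assumed names.
gene = Instance("Gene", "gerE")
protein = Instance("Protein", "GerE")
partner = Instance("Protein", "ProteinB")
assembly = Instance("ProteinComplex", "GerE-ProteinB")

relations = [
    Relation("product_of", protein, gene),        # protein results from gene expression
    Relation("complex_with", assembly, protein),  # complex assembled from proteins
    Relation("complex_with", assembly, partner),
]

triples = to_triples(relations)
for triple in triples:
    print(triple)
```

Because only concept instances and relations are stored, details such as a modifier on a protein name or the quantity of promoters have no slot in this structure, which mirrors the loss of meaning noted in the excerpt.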

Similar publications

Conference Paper
This work explores the usage of Linked Data for Web scale Information Extraction, with focus on the task of Wrapper Induction. We show how to effectively use Linked Data to automatically generate training material and build a self-trained Wrapper Induction method. Experiments on a publicly available dataset demonstrate that for covered domains, our...
Research
Information Extraction and Metallogenic Prediction of Qiangduo area in Tibet based on Multi-source Remote Sensing Data
Conference Paper
During and after natural disasters, detailed information about their impact is a key for successful relief operations. In the 21st century, such information can be found on the Web, traditionally provided by news agencies and recently through social media by affected people themselves. Manual information acquisition from such texts requires ongoing...
Article
The vast amount of online information available has led to renewed interest in information extraction (IE) systems that analyze input documents to produce a structured representation of selected information from the documents. However, the design of an IE system differs greatly according to its input: from unrestricted free-text to semi-structured...
Article
Information Extraction (IE) is becoming increasingly useful, but it is a costly task to discover and annotate novel events, event arguments, and event types. We exploit both monolingual texts and bilingual sentence-aligned parallel texts to cluster event triggers and discover novel event types. We then generate event argument annotations semi-autom...

Citations

... where Protein, Interaction_Action and Gene are ontology concepts, and obj and subject are syntactic dependencies. Many complex gene interaction cases are handled with the same method including those involving regulon membership and promoter binding (detailed method in [29]). Relation extraction rules are learned by the supervised Inductive Logic Programming method, LP-Propal. ...
Article
This paper focuses on the use of corpus-based machine learning (ML) methods for fine-grained semantic annotation of text. The state of the art in semantic annotation in the life sciences, as in other technical and scientific domains, takes advantage of recent breakthroughs in the development of natural language processing (NLP) platforms. The resources required to run such platforms include named entity dictionaries, terminologies, grammars and ontologies. The demand for domain-specific, comprehensive and low-cost resources has led to the intensive use of ML methods. The precise specification of the ML task goal and target knowledge, and the adequate normalization of the training corpus representation, can notably increase the quality of the acquired knowledge. We argue in this paper that integrated ML-NLP architectures facilitate such specifications. We illustrate our demonstration with four representative NLP tasks that are part of the BioAlvis semantic annotation platform. Their impact on the quality of the semantic annotation is qualified through the evaluation of an IR application in Bacteriology.
Thesis
The entire complement of proteins expressed by a genome forms the proteome. The proteome is organized in structured networks of protein interactions: the interactome. In these networks, most proteins have few interactions whereas a few proteins have many connections; these proteins are called centres of interactions, or hubs. This thesis focuses on an important biological question: understanding the biological function of a cluster of hubs (CoH) discovered in Bacillus subtilis, which is located at the interface of several essential cellular processes: DNA replication, cell division, chromosome segregation, stress response and biogenesis of the bacterial cell wall. The partners of the proteins of the cluster of hubs were first identified by the yeast two-hybrid technique, which helped us define the cluster rigorously within a network of 287 proteins connected by 787 interactions. This network shows many proteins in a new context, thereby facilitating the functional analysis of individual proteins and of the links between the major cellular processes. After a study of the genomic context of the CoH genes, an integrative biology approach was initiated by analyzing heterogeneous transcriptome data available in public databases. Statistical analysis of these data identified groups of genes co-regulated with the genes of the cluster of hubs. At first, correlations between gene expression across various conditions were analyzed using classical statistics such as unsupervised classification. This first analysis allowed us to associate CoH genes with functional groups, and to validate and identify regulons. It also highlighted the limitations of this approach and the need for methods that identify the conditions in which genes are co-regulated.
To this end, we (i) generated transcriptome data to promote the differential expression of genes coding for CoH proteins and (ii) used bi-clustering methods to identify groups of genes co-expressed in a wide range of conditions. This led us to identify associations of expression in specific conditions among the genes of the CoH. It was therefore possible to combine two approaches, the study of the transcriptome and the study of the interactome, both conducted systematically across the whole genome. The integration of these two kinds of data allowed us to clarify the functional context of genes of interest and to make assumptions about the nature of the interactions between the proteins of the cluster of hubs. The cluster ultimately appears to be composed of a few groups of co-expressed proteins (party hubs), which can interact together, and other proteins expressed in an uncorrelated manner (date hubs). The CoH could form a large group of date hubs whose function could be to ensure the connection between basic cellular processes, whatever the environmental conditions to which B. subtilis is exposed. Generating and processing such a data set is a major scientific challenge: it requires the mobilization of skills, knowledge, and tools to reach a better understanding of living organisms. The resulting data set may be used to implement other statistical methods. All of this will provide methods to ultimately extract information from the large data sets that are currently being produced. This is the major issue of integrative biology.