Figure 5 - uploaded by Érick Alphonse
Source publication
Information Extraction (IE) systems have been proposed in recent years to extract genic interactions from bibliographical resources. But they are limited to single interaction relations, and have to face a trade-off between recall and precision, by focusing either on specific interactions (for precision), or general and unspecified interactio...
Context in source publication
Context 1
... layer also allows the introduction of classes that may be semantically irrelevant from a domain-ontology point of view but factorize concepts that share common properties, and thus factorize what would otherwise be multiple inference rules. This is exemplified in figure 5, which shows the definition of a "biological actor" (bio actor) class, where a "gene", a "protein" and a "gene family" share common syntactic contexts in biological articles. Figure 3 illustrates a final representation combining semantic features (a protein instance "GerE") and syntactic ones (a subject "subj:V-N" relation between "GerE" and "stimulate", an instance of the "regulation" concept). ...
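The factorization idea in this excerpt can be illustrated with a minimal sketch. It is not the authors' implementation: the `IS_A` table, the `is_a` helper, the `Dependency` record and the `agents_of_regulation` rule are hypothetical names introduced here, the spelling `bio_actor` is an assumption, and so is the choice of the verb as head of the "subj:V-N" relation. The point is only that one rule written against the shared parent class replaces one rule per subclass.

```python
from dataclasses import dataclass

# Hypothetical mini-ontology (child concept -> parent concept).
# "bio_actor" stands for the factorizing class from the excerpt;
# the "interaction" parent is an illustrative assumption.
IS_A = {
    "gene": "bio_actor",
    "protein": "bio_actor",
    "gene_family": "bio_actor",
    "regulation": "interaction",
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or inherits from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


@dataclass(frozen=True)
class Dependency:
    label: str      # syntactic relation, e.g. "subj:V-N"
    head: str       # governing word (assumed here to be the verb)
    dependent: str  # dependent word (assumed here to be the noun)


def agents_of_regulation(concepts, dependencies):
    """Single rule stated over bio_actor instead of one rule per subclass."""
    return [
        (d.dependent, d.head)
        for d in dependencies
        if d.label == "subj:V-N"
        and is_a(concepts.get(d.head), "regulation")
        and is_a(concepts.get(d.dependent), "bio_actor")
    ]


# Semantic layer: word -> concept; syntactic layer: dependency relations.
concepts = {"GerE": "protein", "stimulate": "regulation"}
dependencies = [Dependency("subj:V-N", head="stimulate", dependent="GerE")]

print(agents_of_regulation(concepts, dependencies))
```

Because the rule tests membership in `bio_actor`, tagging "GerE" as a gene or a gene family instead of a protein would match without any change to the rule.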
Similar publications
This work explores the usage of Linked Data for Web scale Information Extraction, with focus on the task of Wrapper Induction. We show how to effectively use Linked Data to automatically generate training material and build a self-trained Wrapper Induction method. Experiments on a publicly available dataset demonstrate that for covered domains, our...
Information Extraction and Metallogenic Prediction of Qiangduo area in Tibet based on Multi-source Remote Sensing Data
During and after natural disasters, detailed information about their impact is key to successful relief operations.
In the 21st century, such information can be found on the Web, traditionally provided by news agencies and recently
through social media by affected people themselves. Manual information acquisition from such texts requires ongoing...
The vast amount of online information available has led to renewed interest in information extraction (IE) systems that analyze input documents to produce a structured representation of selected information from the documents. However, the design of an IE system differs greatly according to its input: from unrestricted free-text to semi-structured...
Information Extraction (IE) is becoming increasingly useful, but it is a costly task to discover and annotate novel events, event arguments, and event types. We exploit both monolingual texts and bilingual sentence-aligned parallel texts to cluster event triggers and discover novel event types. We then generate event argument annotations semi-autom...
Citations
... where Protein, Interaction_Action and Gene are ontology concepts, and obj and subject are syntactic dependencies. Many complex gene interaction cases are handled with the same method including those involving regulon membership and promoter binding (detailed method in [29]). Relation extraction rules are learned by the supervised Inductive Logic Programming method, LP-Propal. ...
This paper focuses on the use of corpus-based machine learning (ML) methods for fine-grained semantic annotation of text. The state of the art in semantic annotation in Life Science, as in other technical and scientific domains, takes advantage of recent breakthroughs in the development of natural language processing (NLP) platforms. The resources required to run such platforms include named-entity dictionaries, terminologies, grammars and ontologies. The demand for domain-specific, comprehensive and low-cost resources led to the intensive use of ML methods. The precise specification of the ML task goal and target knowledge, and the adequate normalization of the training corpus representation, can notably increase the quality of the acquired knowledge. We argue in this paper that integrated ML-NLP architectures facilitate such specifications. We illustrate our demonstration with four representative NLP tasks that are part of the BioAlvis semantic annotation platform. Their impact on the quality of the semantic annotation is qualified through the evaluation of an IR application in Bacteriology.
The entire complement of proteins expressed by a genome forms the proteome. The proteome is
organized in structured networks of protein interactions: the interactome. In these networks, most of
the proteins have few interactions whereas a few proteins have many connections: these proteins are
called centres of interactions or hubs.
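As an illustration of the hub notion, the sketch below counts the distinct partners of each protein in a list of pairwise interactions and keeps those at or above a cut-off. The function name `find_hubs`, the `min_degree` cut-off and the protein labels are hypothetical; the text does not state which threshold defines a hub.

```python
from collections import Counter


def find_hubs(interactions, min_degree):
    """Return {protein: degree} for proteins with at least `min_degree` partners.

    `interactions` is an iterable of (protein_a, protein_b) pairs.
    Duplicate pairs and self-interactions are ignored.
    """
    # Undirected, de-duplicated edges.
    edges = {frozenset(pair) for pair in interactions if pair[0] != pair[1]}
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {protein: k for protein, k in degree.items() if k >= min_degree}


# Toy network with made-up protein labels (not data from the thesis).
toy_interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P5", "P1"), ("P1", "P5"),  # last pair duplicates the previous one
]

print(find_hubs(toy_interactions, min_degree=3))
```

In this toy network only P1 has many partners, while every other protein has one or two.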
This thesis focused on an important biological question: understanding the biological function of a
cluster of hubs (CoH) discovered in Bacillus subtilis and located at the interface of several
essential cellular processes: DNA replication, cell division, chromosome segregation, stress response
and biogenesis of the bacterial cell wall.
The partners of the proteins of the cluster of hubs were first identified by the yeast two-hybrid
technique, which helped us define the cluster rigorously within a network of 287 proteins connected by 787
interactions. This network shows many proteins in a new context, thereby facilitating the functional analysis
of individual proteins and of the links between the major cellular processes.
After a study of the genomic context of the genes of the CoH, an integrative biology approach
was initiated by analyzing heterogeneous transcriptome data available in public databases.
Statistical analysis of these data identified groups of genes co-regulated with the genes of the cluster of
hubs. At first, the correlations between the expression of genes across various conditions
were analyzed with classical statistical methods such as unsupervised classification. This first
analysis allowed us to associate genes of the CoH with functional groups, and to validate and identify
regulons. It also highlighted the limitations of this approach and the need to resort to
methods that identify the conditions in which genes are co-regulated.
To this end, we (i) generated transcriptome data to promote the differential expression of genes
coding for CoH proteins and (ii) used bi-clustering methods to identify groups of genes co-expressed
in a wide range of conditions. This led us to identify associations of expression in specific conditions
among the genes of the CoH.
It was therefore possible to combine two approaches, the study of the transcriptome and that of the
interactome, both conducted systematically across the whole genome. The
integration of these two kinds of data allowed us to clarify the functional context of the genes of interest
and to make assumptions about the nature of the interactions between the proteins of the cluster of hubs. The cluster
ultimately appears to be composed of a few groups of co-expressed proteins (party hubs), which can interact together,
and other proteins expressed in an uncorrelated manner (date hubs). The CoH could form a large group
of date hubs whose function could be to ensure the connection between basic cellular processes,
whatever environmental conditions B. subtilis may be exposed to.
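The party/date distinction can be sketched as a test on co-expression: a hub whose expression profile is, on average, strongly correlated with those of its partners is labelled a party hub, otherwise a date hub. This is a minimal sketch under stated assumptions, not the analysis carried out in the thesis: the `pearson` and `classify_hub` helpers, the 0.5 threshold and the expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify_hub(hub, partners, expression, threshold=0.5):
    """Label a hub "party" or "date" from its mean correlation with its partners.

    `threshold` is an illustrative assumption, not a value from the text.
    """
    mean_r = sum(
        pearson(expression[hub], expression[p]) for p in partners
    ) / len(partners)
    return "party" if mean_r >= threshold else "date"


# Made-up expression profiles across four conditions.
expression = {
    "H1": [1.0, 2.0, 3.0, 4.0],
    "A": [2.0, 4.0, 6.0, 8.0],   # rises with H1
    "B": [1.0, 3.0, 5.0, 7.0],   # rises with H1
    "C": [4.0, 3.0, 2.0, 1.0],   # falls as H1 rises
}

print(classify_hub("H1", ["A", "B"], expression))  # partners co-expressed with the hub
print(classify_hub("H1", ["A", "C"], expression))  # partners with opposite trends
```

Averaging over all partners is one simple design choice; it ignores the condition-specific co-expression that bi-clustering is meant to capture.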
Generating and processing such a data set is a major scientific challenge: it requires the mobilization
of skills, knowledge, and tools to reach a better understanding of living organisms. The resulting
data set may be used to implement other statistical methods. All of this will provide methods to
ultimately extract information from the large data sets that are currently being produced. This is the major
issue of integrative biology.