Fig 5 - uploaded by Yang Liu
Example of Query Input

Source publication
Article
Full-text available
The incompatibilities among complex data formats and the various schemas used by the biological databases that house these data are becoming a bottleneck in biological research. For example, biological data formats vary from simple words (e.g. gene names) and numbers (e.g. molecular weight) to sequence strings (e.g. nucleic acid sequences), to even more complex...

Context in source publication

Context 1
... Input Type is a subset of the BAO Property Class. For the given query example, Figure 5 shows that the envelope polyprotein GP160 precursor of input type PROTEIN-NAME and HIV-1 of input type ORGANISM-NAME are entered as query input. ...

Similar publications

Article
Full-text available
The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integr...

Citations

... However, maintaining and updating the static links between various databases is a challenge. Furthermore, the only queries that can be answered by information-linkage-based systems are those that fall within the scope of the pre-existing static links [1,2]. ...
... We provide a mapping between various DTDs using ontological concepts as their common terms, thereby establishing a relation between the DTDs. We use BAO [5,2], a three-dimensional domain ontology for the Biological and Chemical Information Integration System BACIIS [4]. This system integrates life sciences databases, mainly Swissprot [12], GenBank [13], PDB [14], and OMIM [15]. ...
Conference Paper
Full-text available
Several biological databases exist that use different formats for storing data. Furthermore, each database has its own schema and query interface. No standard tools exist for converting data from one format to another, which makes querying multiple heterogeneous databases a difficult task. Since XML provides a standard format for sharing and exchanging data on the WWW, it is used to share information in biological databases. Various DTDs are defined for biological databases, but they do not have a unified format: the same data is represented in different forms under different DTDs. This paper provides a mapping between various biological XML DTDs and an ontology so that data can be correlated across the DTDs.
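The mapping idea above can be sketched in a few lines: source-specific DTD element names are relabeled with shared ontology concepts so that records from different databases become comparable. This is a minimal illustration; the element names and the table itself are hypothetical, not the actual BAO concepts or database DTDs.

```python
# Each source DTD names the same biological concept differently; a shared
# table of ontology concepts (hypothetical names) bridges them.
DTD_TO_ONTOLOGY = {
    "swissprot": {"Entry/Protein": "PROTEIN-NAME", "Entry/Org": "ORGANISM-NAME"},
    "genbank":   {"Seq/definition": "PROTEIN-NAME", "Seq/organism": "ORGANISM-NAME"},
}

def to_ontology_record(source: str, record: dict) -> dict:
    """Relabel a source-specific record with the shared ontology concepts."""
    mapping = DTD_TO_ONTOLOGY[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

# Records arriving under two different DTDs unify under the common terms.
a = to_ontology_record("swissprot", {"Entry/Protein": "GP160", "Entry/Org": "HIV-1"})
b = to_ontology_record("genbank", {"Seq/definition": "GP160", "Seq/organism": "HIV-1"})
assert a == b
```

Once relabeled, a query phrased in ontology terms can be answered against either source without knowing its DTD.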
... Therefore, to process refinement queries in a comprehensive and correct manner, we must assume an overlapping coverage of the global schema by different sources and deal with data absence and inconsistency. The objective of this paper is to describe the solution proposed by BACIIS, the Biological and Chemical Information Integration System [5-7], for providing data provenance for result records and for supporting queries over integrated results. Section 2 briefly introduces the BACIIS system and its main data integration features. ...
Conference Paper
Full-text available
For execution of complex biological queries, data integration systems often use several intermediate data sources because the domain coverage of individual sources is limited. The quality of intermediate sources differs greatly depending on the curation method, the frequency of updates, and the breadth of domain coverage, all of which affect the quality of the results. Therefore, integration systems should provide data provenance, i.e. information about the path used to obtain every record in the result. Furthermore, since the query capabilities of web-accessible sources are limited, integration systems need to support refinement queries of finer granularity issued over the integrated data. However, unlike the individual sources, integration systems have to handle the absence of data and conflicts in the integrated data caused by inconsistencies among the sources. This paper describes the solution proposed by BACIIS, the Biological and Chemical Information Integration System, for providing data provenance and for supporting refinement queries over integrated data. Semantic correspondence between records from different sources is defined based on the links connecting these data sources, including cross-references. Two characteristics of semantic correspondence, namely degree and cardinality, are identified based on the closeness of the links that exist between data records and on the mappings between the domains of the data records, respectively. An algorithm based on semantic correspondence is presented to handle absence of data and conflicts in the integrated data.
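The provenance-plus-conflict-handling idea can be sketched as a small merge routine: each field of the integrated record remembers which source supplied it, absent values are filled from later sources, and disagreements are surfaced rather than silently overwritten. The field names, source names, and the first-wins conflict policy are all illustrative assumptions, not BACIIS's actual algorithm.

```python
# Sketch: merge cross-referenced records while tracking provenance.
# Policy (assumed for illustration): first non-null value wins; later
# disagreeing values are recorded as conflicts instead of overwriting.
def merge_with_provenance(records):
    """records: list of (source_name, field_dict) pairs.
    Returns (merged_record, provenance_map, conflict_list)."""
    merged, provenance, conflicts = {}, {}, []
    for source, rec in records:
        for field, value in rec.items():
            if value is None:
                continue  # absent data: another source may supply it
            if field not in merged:
                merged[field] = value
                provenance[field] = source  # the path that produced this value
            elif merged[field] != value:
                conflicts.append((field, source, value))  # inter-source inconsistency
    return merged, provenance, conflicts

merged, prov, conf = merge_with_provenance([
    ("Swissprot", {"mol_weight": 120, "sequence": None}),
    ("PDB",       {"mol_weight": 118, "sequence": "MKV"}),
])
```

Here `merged` carries one value per field, `prov` answers "which source did this come from?", and `conf` exposes the inconsistency on `mol_weight` for a refinement query to inspect.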
... We present here the unifying model used by HKIS. We do not aim to propose a new complete conceptual model for biological and biomedical data (see Cornell et al., 2003 and Davidson et al., 2000) or a new ontology (see Ben Miled et al., 2003 or Backer et al., 1999), but instead to provide the main biological entities that would be addressed in our application domain, the study of cancer. The biologists involved in the project identified the entities considered to be important. ...
... However, it does not offer the vocabulary to describe specific services in the biological domain. In recent years there have been several studies, including research on an ontology offering vocabularies to describe data types and concepts in the biological domain (Miled et al., 2003) as well as research on an ontology providing vocabularies to describe biological Web services (Wilkinson et al., 2003). The biological Web services ontology used in our architecture has been adopted from Wroe et al. (2003). ...
Article
Full-text available
Considerable effort has been devoted within the bioinformatics community to the problem of integration and interoperability among distributed, heterogeneous biological resources on the Web. Recently, Web services technology and its applications have become a promising alternative for addressing integration and interoperability in the bioinformatics domain. However, Web services alone cannot find an appropriate service for the end user; the smart discovery of biological Web services is best accomplished using an ontology. In this paper, an integrated discovery process for biological Web services is proposed. We used DAML+OIL to support biological Web service descriptions and DAML-S to create a service ontology. These classifications of Web services are used to support semantic service matching and discovery by an SMA (Semantic Matching Agent). To illustrate the integrated biological Web services discovery process, some scenarios based on the BLAST (Basic Local Alignment Search Tool) Web services are presented. Future efforts will include implementing a Bio-Web services portal and its components, expanding the ontology, and comparing the precision and recall of our approach with other discovery techniques.
... Mediators encapsulate the knowledge needed to retrieve and present a particular type of information to users, such as computerized clinical records, DNA sequences, etc. The BACIIS system (Miled et al., 2003) is a clear example of a system in this category. The main drawback of these systems is that they are less intuitive for the user than those based on virtual conceptual schemas. ...
... As can be seen in Figure 2.5, the central component of the system is a domain ontology (Miled et al., 2003). This ontology is used to reach a terminological consensus among the different databases related to the application domain, in this case biology and chemistry. ...
Article
Full-text available
The so-called "information society" and the rapid growth of the Web have fostered the emergence of numerous online sources containing large amounts of data and information. This makes it necessary to create new methods and tools that facilitate integrated access to all these resources over the Internet. This doctoral thesis presents a set of methods and tools whose purpose is to integrate structured sources (typically relational databases) with unstructured sources (such as collections of plain-text documents). It builds on the author's previous work developing OntoFusion, a system that integrates structured sources following an approach based on virtual repositories and domain models. As they stand, the methods and tools provided by OntoFusion cannot be used to integrate both types of sources, since unstructured sources lack 1) a physical data model describing them, and 2) an information retrieval mechanism capable of executing queries formulated against that data model. To solve these problems, this work proposes: 1) a method for deriving, from an unstructured source, a domain model that describes its content, and 2) an information retrieval model for unstructured sources that can be integrated with data retrieval from structured sources. This retrieval model, called the Ontological Index Model (MIO), is based on the most widely used retrieval model of recent decades: the vector space model (VSM).
The joint use of these two components, together with the methods and tools developed in the context of INFOGENMED, suggests that it is possible to integrate structured and unstructured sources following an approach based on virtual repositories and domain models. To verify this hypothesis experimentally, an integration experiment was carried out with a set of structured and unstructured sources, concluding that it is indeed possible to integrate both types of sources following the approach proposed in this work. Likewise, to evaluate the performance of the new information retrieval model, a comparative experiment between the MIO and the VSM was conducted. The results show empirically that the MIO outperforms the VSM on two test document collections. The conclusion drawn from these experiments is that using the knowledge contained in the domain models associated with the test collections positively influences the information retrieval process.
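The contrast between the VSM and a domain-model-aware index can be sketched in miniature: both rank documents by cosine similarity over term vectors, but the ontological variant up-weights terms that appear in the domain model. The vocabulary, the boost factor, and the weighting scheme here are invented for illustration and are not the thesis's actual MIO formulation.

```python
# Toy sketch: cosine ranking where terms from a (hypothetical) domain
# model receive extra weight, in the spirit of adding ontological
# knowledge on top of the plain vector space model.
import math
from collections import Counter

DOMAIN_TERMS = {"protein", "sequence", "genome"}  # assumed domain model
BOOST = 2.0  # illustrative up-weighting for domain-model terms

def vector(text):
    """Term-frequency vector with a boost for domain-model terms."""
    counts = Counter(text.lower().split())
    return {t: c * (BOOST if t in DOMAIN_TERMS else 1.0) for t, c in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

q = vector("protein sequence")
```

With this weighting, a document sharing the domain terms of the query outranks one that shares none, exactly as in the plain VSM, but the boost lets domain knowledge shift the ranking among partially matching documents.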
Article
The exponential increase in the amount of data available in several domains, and the need to process such data, make many problems computationally intensive. Consequently, sequential analysis is infeasible, hence the need for parallel processing. Over the last few years, the widespread deployment of multicore architectures, accelerators, grids, clusters, and other powerful architectures such as FPGAs and ASICs has encouraged researchers to write parallel algorithms using available parallel computing paradigms to solve such problems. The major challenge now is to take advantage of these architectures irrespective of their heterogeneity, because designing an execution model that can unify all computing resources is still very difficult. Moreover, scheduling tasks to run efficiently on heterogeneous architectures still needs much research. Existing solutions tend to focus on individual architectures or deal only with heterogeneity between CPUs and GPUs, but in reality heterogeneous systems often go further. Up to now, very cumbersome manual adaptation has been required to exploit these heterogeneous architectures. The aim of this paper is to propose a functional-level design of a multiagent-based framework that deals with the heterogeneity of the hardware architectures and parallel computing paradigms deployed to solve such problems. Bioinformatics is selected as a case study.
Chapter
New experimental methods allow researchers in molecular and systems biology to rapidly generate ever larger amounts of data. This data is often made publicly available on the Internet, and although it is extremely useful, we are not using it to its full capacity. One important reason is that we still lack good ways to connect or integrate information from different resources. One kind of resource is the more than 1000 data sources freely available on the Web; as most are developed and maintained independently, they are highly heterogeneous, and their information is updated frequently. Other kinds of resources that are not yet so well known or commonly used are ontologies and standards. Ontologies aim to define a common terminology for a domain of interest. Standards provide a way to exchange data between data sources and tools, even if the internal representations of the data differ. In this chapter we argue that ontological knowledge and standards should be used for data integration. We describe properties of the different types of data sources, ontological knowledge, and standards available on the Web, and discuss how this knowledge can support integrated access to multiple biological data sources. Further, we present an integration approach that combines the identified ontological knowledge and standards with traditional information integration techniques; current integration approaches cover only parts of the suggested approach. We also discuss in more detail the components of the model on which much recent work has been done: ontology-based data source integration, ontology alignment, and integration using standards. Although many of the discussions in this chapter are general, our examples come mainly from work done within the REWERSE working group on Adding Semantics to the Bioinformatics Web.
Conference Paper
The life sciences are a promising application area for semantic web technologies, as they offer large online structured and unstructured data repositories along with ontologies that structure this knowledge. We briefly give an overview of biomedical ontologies and show how they can help to locate, retrieve, and integrate biomedical data. Annotating literature with ontology terms is an important problem in supporting such ontology-based searches; we review the steps involved in this text mining task and introduce the ontology-based search engine GoPubMed. As the underlying data sources evolve, so do the ontologies, and we give a brief overview of different approaches supporting the semi-automatic evolution of ontologies.