Figure 5. An example of parse-tree building: the parse tree is constructed from the normalized regular expression, then STAR nodes are converted and ALL nodes are merged.


Source publication
Article
As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL, a W3C-recommended query language for RDF databases, has become an important query language for querying the bioinformatics knowled...

Contexts in source publication

Context 1
... the second step of the algorithm, we can construct a parse tree using this normalized regular expression. An example of the parse tree for the normalized regular expression is shown in Figure 5(a). The leaf nodes in the parse tree contain grams that are separated by OR, AND, or STAR operators. ...
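The excerpt above describes the shape of that parse tree: grams at the leaves and OR, AND, or STAR operators at the inner nodes. Below is a minimal Python sketch of such a tree; the Node class, its field names, and the example expression are illustrative assumptions, not the data structures used in the paper.

class Node:
    def __init__(self, op=None, gram=None, children=None):
        self.op = op              # "AND", "OR", "STAR", or None for a leaf
        self.gram = gram          # the gram string held by a leaf node
        self.children = children or []

    def is_leaf(self):
        return self.op is None

# A hand-built tree for a hypothetical normalized expression
# (abc|def)(ghi)*  ->  AND( OR(abc, def), STAR(ghi) )
tree = Node("AND", children=[
    Node("OR", children=[Node(gram="abc"), Node(gram="def")]),
    Node("STAR", children=[Node(gram="ghi")]),
])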
Context 2
... is because STAR nodes can be represented as all combinations of their descendant nodes, and the number of possible grams for the STAR nodes is infinite. The shaded nodes in Figure 5(b) are the converted ones. After the conversion, the nodes that have the ALL nodes as their children are merged into AND or OR nodes with their children. ...
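Continuing the sketch above (same Node class), the following sketch replaces STAR nodes with an ALL marker and then simplifies the tree. The merge rules assumed here are the usual identities x AND ALL = x and x OR ALL = ALL; the paper's exact merging procedure may differ.

ALL = "ALL"   # sentinel that matches every triple; replaces STAR nodes

def convert_star(node):
    # Leaves are kept as they are.
    if node.is_leaf():
        return node
    # A STAR node can produce infinitely many grams, so it becomes ALL.
    if node.op == "STAR":
        return ALL
    children = [convert_star(c) for c in node.children]
    if node.op == "AND":
        # ALL does not restrict an AND, so it can be dropped.
        children = [c for c in children if c is not ALL]
        if not children:
            return ALL
    elif node.op == "OR":
        # ALL absorbs an OR.
        if any(c is ALL for c in children):
            return ALL
    if len(children) == 1:
        return children[0]
    return Node(node.op, children=children)

pruned = convert_star(tree)   # AND(OR(abc, def), STAR(ghi)) -> OR(abc, def)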
Context 3
... an IDXAND or an IDXOR is a logical AND or OR operation between the two sets of triple IDs that come from its child operators. For example, Figure 6(a) represents the converted parse tree in Figure 5. The AND and OR nodes are converted into IDXAND and IDXOR operators, and each leaf node g is converted into an IDXSCAN(g) operator. ...
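As a rough illustration of that operator tree, the sketch below evaluates the pruned parse tree from the previous sketches against a gram index: IDXSCAN(g) is modeled as a posting-list lookup, and IDXAND/IDXOR as set intersection and union. The dict-based gram index and the toy data are assumptions for illustration only, not the paper's implementation.

def evaluate(node, index):
    # `index` maps each gram to the set of triple IDs whose literals contain it.
    # The tree is assumed to have been pruned of STAR/ALL nodes already.
    if node.is_leaf():
        return index.get(node.gram, set())          # IDXSCAN(g)
    results = [evaluate(c, index) for c in node.children]
    if node.op == "AND":
        return set.intersection(*results)           # IDXAND
    if node.op == "OR":
        return set.union(*results)                  # IDXOR
    raise ValueError("unexpected operator: %s" % node.op)

gram_index = {"abc": {1, 2, 5}, "def": {2, 7}}
candidates = evaluate(pruned, gram_index)
print(candidates)   # {1, 2, 5, 7} -- candidate triples, still to be verified against the regex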

Similar publications

Conference Paper
Users need better ways to explore large complex linked data resources. Using SPARQL requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology and URIs for entities of interest. Natural language question answering systems solve the problem, but these are still subjects of research. The Schema agnost...

Citations

... In addition, there are languages to perform searches on an ontology, two of which are the most widely used. On the one hand, there is SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language), a query language for RDF [41]. As stated in [42], this language suffers from an incomplete understanding of OWL semantics, although it offers acceptable results in most cases when the ontology is expressed in RDF serialization. ...
Article
This paper presents a system based on ontologies for the definition of alarms in sensor systems. Although the authors consider the ontology and the system interesting in themselves, special emphasis is placed on their experience with the software tools used and the technology related to the Semantic Web. This approach is intended to illustrate the process for readers who are not familiar with this field and who may consider the use of ontologies for their next developments. An ontology has been designed in the OWL language, complemented with SWRL rules and search elements in SQWRL. The combination has offered satisfactory results in simulation. Other interesting contributions of the work are: a survey of the available literature on the use of Semantic Web technologies and ontologies for the detection of events from sensor data; a study of the tools and vocabularies to be used to create a system that interfaces with ontologies; and an educational method, reporting the authors' experience, to help university students understand this topic.
... These properties are more index-friendly than other operations such as quantifiers. Building an index for these kinds of operations can improve search speed by orders of magnitude, as proposed by [22,23]. Moreover, SPARQL 1.1 has added further string functions such as STRSTARTS, STRENDS, CONTAINS, STRBEFORE and STRAFTER. ...
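As a small illustration of why predicates like STRSTARTS are index-friendly, the sketch below answers a prefix query with a single range scan over a sorted list of literals; the sorted-list index and the sample data are hypothetical and stand in for whatever index structure an engine would actually use.

import bisect

literals = sorted(["alanine", "alkaline", "glucose", "glycine", "glycogen"])

def strstarts(prefix):
    # All literals sharing `prefix` form one contiguous range in sorted order.
    lo = bisect.bisect_left(literals, prefix)
    hi = bisect.bisect_left(literals, prefix + "\uffff")   # upper bound of the prefix range
    return literals[lo:hi]

print(strstarts("gly"))   # ['glycine', 'glycogen'] -- no full scan of the literals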
Article
SPARQL is the standardised query language for the RDF data model. To process literal strings, filter expressions can be used with regexes. However, regex evaluation can be slow due to its computational complexity. As an initial step towards this area, we present an analysis of regex queries from a large real-world set of queries posed on different SPARQL endpoints representing different domains. We report our findings and deliver some suggestions that can help the performance of regex queries within SPARQL.
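For intuition about the cost the abstract above refers to, the toy sketch below shows the naive evaluation an unindexed regex FILTER implies: every stored literal is run through the regex engine. The data and pattern are made up for illustration only.

import re

literals = ["insulin receptor", "hemoglobin beta", "insulin-like growth factor"]
pattern = re.compile(r"insulin.*factor")

# Without an index, every literal is scanned and matched: O(n) regex evaluations.
matches = [s for s in literals if pattern.search(s)]
print(matches)   # ['insulin-like growth factor']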
... The various language elements of OWL are described in detail in [9]. SPARQL [10] is a query language for RDF and a formal standard that supports semantic search on ontology data. ...
Article
With the increasing growth of users and the exponential propagation of messages, social network services such as Twitter and Facebook can be used to analyze social trends, people's interests and personal preferences. Recently, social network analysis based on Semantic Web technology has become a hot research topic. In this paper, we propose a sentiment-oriented method that takes advantage of an emotion ontology to reason out Twitter users' basic emotions and retrieves the YAGO ontology to explain associative topics. We implement a prototype system that visualizes the analysis result, which is human-readable as well as machine-understandable. Index Terms - Semantic Web technology, Twitter Mining, Social Network Analysis, emotion analysis, Social Semantic Network.
Article
This paper presents an indoor navigation support system based on the Building Information Models (BIM) paradigm. Although BIM is initially defined for the Architecture, Engineering and Construction/Facility Management (AEC/FM) industry, the authors believe that it can provide added value in this context. To this end, the authors will focus on the Industry Foundation Classes (IFC) standard for the formal representation of BIM. The approach followed in this paper will be based on the ifcOWL ontology, which translates the IFC schemas into Ontology Web Language (OWL). Several modifications of this ontology have been proposed, consisting of the inclusion of new items, SWRL rules and SQWRL searches. This way of expressing the elements of a building can be used to code information that is very useful for navigation, such as the location of elements related to the actions desired by the user. It is important to note that this design is intended to be used as a complement to other well-known tools and techniques for indoor navigation. The proposed modifications have been successfully tested in a variety of simulated and real scenarios. The main limitation of the proposal is the immense amount of information contained in the ifcOWL ontology, which causes difficulties involving its processing and the time necessary to perform operations on it. Those elements that are considered important have been selected, removing those that seem secondary to navigation. This procedure will result in a significant reduction in the storage and semantic processing of the information. Thus, for a system with 1000 individuals (in the ontological sense), the processing time is about 90 s. The authors regard this time as acceptable, since in most cases the tasks involved can be considered part of the system initialization, meaning they will only be executed once at the beginning of the process.
Chapter
The rate at which toxicological data are generated is continually becoming more rapid, and the volume of data generated is growing dramatically. This is due in part to advances in software solutions and cheminformatics approaches which increase the availability of open data from chemical, biological, toxicological and high-throughput screening resources. However, the amplified pace and capacity of data generation achieved by these novel techniques present challenges for organising and analysing the data output. Big Data in Predictive Toxicology discusses these challenges as well as the opportunities of new techniques encountered in data science. It addresses the nature of toxicological big data, their storage, analysis and interpretation. It also details how these data can be applied in toxicity prediction, modelling and risk assessment. This title is of particular relevance to researchers and postgraduates working and studying in the fields of computational methods, applied and physical chemistry, cheminformatics, biological sciences, predictive toxicology and safety and hazard assessment.
Article
In recent years, various proposals have been put forth to formalize the Grafcet graphical language. The objective of this paper is to propose an ontology-based approach to formalize this language. The authors have implemented a semi-coarse-grained ontology written in the Ontology Web Language and have tested it by including it in an existing educational tool for teaching the Grafcet language for use in programmable logic controllers.
Chapter
It is often assumed that Big Data resources are too large and complex for human comprehension and that the analysis of Big Data is best left to software programs. Not so. When data analysts go straight to the complex calculations before performing a simple estimation, they will find themselves accepting wildly unrealistic results. For comparison purposes, there is nothing quite like a simple and intuitive estimate to pull an overly eager analyst back to reality. Often, the simple act of looking at a stripped-down version of the problem opens a new approach that can drastically reduce computation time. In some situations, analysts will find that a point is reached at which higher refinements in methods yield diminishing returns. When everyone has used their most advanced algorithms to make an accurate prediction, they may find that their best effort offers little improvement over a simple estimator. This chapter reviews simple methods for analyzing complex data.
Article
Comparative genomics is essentially a form of data mining in large collections of n-ary relations between genomic elements. Increases in the number of sequenced genomes create a stress on comparative genomics that grows, at worst geometrically, for every increase in sequence data. Even modestly sized labs now routinely obtain several genomes at a time and, like large consortia, expect to be able to perform all-against-all analyses as part of these new multi-genome strategies. In order to address the needs at all levels, it is necessary to rethink the algorithmic frameworks and data storage technologies used for comparative genomics. To meet these challenges of scale, in this thesis we develop novel methods based on NoSQL and MapReduce technologies. Using a characterization of the kinds of data used in comparative genomics, and a study of usage patterns for their analysis, we define a practical formalism for genomic Big Data, implement it using the Cassandra NoSQL platform, and evaluate its performance. Furthermore, using two quite different global analyses in comparative genomics, we define two strategies for adapting these applications to the MapReduce paradigm and derive new algorithms. For the first, identifying gene fusion and fission events in phylogenies, we reformulate the problem as a bounded parallel traversal that avoids high-latency graph-based algorithms. For the second, consensus clustering to identify protein families, we define an iterative sampling procedure that quickly converges to the desired global result. We implement both of these new algorithms in the Hadoop MapReduce platform and evaluate their performance. The performance is competitive and scales much better than existing solutions, but requires particular (and future) effort in devising specific algorithms.
Article
The Resource Description Framework (RDF) is widely used for sharing biomedical resources, such as the online protein database UniProt or gene database GeneOntology. SPARQL is the native query language for RDF databases and it features regular expressions in queries for which the exact values are either irrelevant or unknown. A recent paper by Lee et al. presented an efficient indexing support for such queries adopting multigram indexes for regular expressions. In this paper we contribute to their work by addressing index updates. As a result, we identify a major performance problem of straightforward implementations and design a new algorithm that utilizes unique properties of multigram indexes. Our contributions can be summarized as follows: 1) we propose an efficient update algorithm for regular expression indexes in RDF databases; 2) we build a prototype system for the proposed framework in C++; 3) we conduct extensive experiments to demonstrate the properties of our algorithm. The experiments show that our algorithm outperforms the straightforward implementations by an order of magnitude.
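For readers unfamiliar with the underlying structure, the sketch below shows the kind of data structure such update algorithms maintain: a multigram index mapping each gram to the posting set of triple IDs whose literal contains it, with a naive insert/delete serving as the baseline the cited paper's algorithm improves on. The gram length, helper names and sample data are illustrative assumptions, not taken from the paper.

def grams(text, q=3):
    # All overlapping q-grams of a literal (q = 3 chosen arbitrarily here).
    return {text[i:i + q] for i in range(len(text) - q + 1)}

index = {}   # gram -> set of triple IDs whose literal contains that gram

def insert_literal(triple_id, literal):
    for g in grams(literal):
        index.setdefault(g, set()).add(triple_id)

def delete_literal(triple_id, literal):
    # Naive update: touch the posting list of every gram of the old literal.
    for g in grams(literal):
        postings = index.get(g)
        if postings:
            postings.discard(triple_id)
            if not postings:
                del index[g]

insert_literal(1, "hemoglobin")
insert_literal(2, "myoglobin")
delete_literal(1, "hemoglobin")
print(sorted(index))   # only the grams of "myoglobin" remain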