SPARQL Update for Complex Event Processing

Authors:
  • Dinador Ltd.

Abstract

Complex event processing is currently done primarily with proprietary definition languages. Future smart environments will require collaboration of multi-platform sensors operated by multiple parties. The goal of my research is to verify the applicability of standard-compliant SPARQL for complex event processing tasks. If successful, semantic web standards RDF, SPARQL and OWL with their established base of tools have many other benefits for event processing including support for interconnecting disjoint vocabularies, enriching event information with linked open data and reasoning over semantically annotated content. A software platform capable of continuous incremental evaluation of multiple parallel SPARQL queries is a key enabler of the approach.
Mikko Rinne
Distributed Systems Group,
Department of Computer Science and Engineering,
Aalto University, School of Science, Finland
mikko.rinne@aalto.fi
Keywords: Complex event processing, SPARQL, RDF, Rete-algorithm
1 Smart Cities Need SPARQL
Smart environments of the future will need to interconnect billions of sensors
based on platforms from multiple vendors, operated by different companies,
public authorities or individuals. To mitigate the need for overlapping sensors
producing duplicate measurements, interoperation of different platforms should
be maximized. Highly distributed, loosely coupled solutions based on common
standards are needed in such open environments. Event processing systems based
on proprietary definition languages have difficulty adapting to multi-vendor
contexts.
The benefit of RDF in complex event processing is that it provides a flexible
representation of heterogeneous events in an open distributed environment,
where new sensors must be able to add new information fields without breaking
compatibility with existing applications. SPARQL, tailor-made to query RDF, was
augmented in SPARQL 1.1 Update by the powerful capability to insert selected
data into named triple stores. When combined with a continuous query processing
engine, INSERT gives SPARQL queries memory and the capability to communicate
and collaborate with each other. As a result, interconnected SPARQL queries can
be used to create complex event processing applications capable of handling
layered and heterogeneous representations of event instances. Taking into
account their other benefits, the semantic web standards RDF, SPARQL and OWL
form a very promising base for complex event processing.
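The way INSERT lets queries collaborate can be illustrated with a minimal
sketch, written in Python so that it runs stand-alone. The vocabulary
(temperature, status, smoke, alarm), the threshold of 30 and all function names
are hypothetical, and the SPARQL fragments in the comments are only rough
equivalents. The naive forward-chaining loop stands in for a continuous query
engine; it is not the Rete-based matching used by Instans.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Triple, Set[Triple]], Iterable[Triple]]


def rule_hot(new: Triple, store: Set[Triple]) -> Iterable[Triple]:
    # Roughly: INSERT { ?room :status "hot" }
    #          WHERE  { ?room :temperature ?t FILTER(?t > 30) }
    room, predicate, value = new
    if predicate == "temperature" and float(value) > 30.0:
        yield (room, "status", "hot")


def rule_alarm(new: Triple, store: Set[Triple]) -> Iterable[Triple]:
    # Roughly: INSERT { ?room :alarm "fire-risk" }
    #          WHERE  { ?room :status "hot" ; :smoke "yes" }
    # This rule consumes the triple that rule_hot inserted.
    room, predicate, _ = new
    if predicate in ("status", "smoke"):
        if (room, "status", "hot") in store and (room, "smoke", "yes") in store:
            yield (room, "alarm", "fire-risk")


def feed(store: Set[Triple], rules: List[Rule], triple: Triple) -> List[Triple]:
    """Add one triple and propagate it through all rules.

    Returns the triples that the rules inserted as a consequence.
    """
    added: List[Triple] = []
    pending = deque([triple])
    while pending:
        current = pending.popleft()
        if current in store:
            continue  # already known, nothing new to propagate
        store.add(current)
        added.append(current)
        for rule in rules:
            pending.extend(rule(current, store))
    return [t for t in added if t != triple]
```

Feeding ("room1", "smoke", "yes") derives nothing by itself. Feeding
("room1", "temperature", "35") afterwards makes the first rule insert the
status triple, which in turn lets the second rule insert the alarm triple: the
store acts as shared memory between the two rules.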
R. Cudré-Mauroux et al. (Eds.): ISWC 2012, LNCS 7650, pp. 453–456.
(C) Springer-Verlag Berlin Heidelberg 2012
The final publication is available at link.springer.com:
http://link.springer.com/chapter/10.1007/978-3-642-35173-0_38
In the Distributed Systems Group we are working on an incremental continuous
SPARQL query processor based on the Rete-algorithm [5]. The Instans¹ platform
supports selected parts of the SPARQL 1.1 Query and Update specifications. The
first generation of Instans was coded in Scala² [1, 8–10]. Instans is currently
being ported to Lisp, where the Rete-net is compiled through macro expansion in
the setup phase into executable Lisp code. The Scala version reached
notification delays of 5-14 ms for the cases tested, but first measurements
indicate that the Lisp version would be 100-200 times faster.
In event processing, detecting the events which didn’t happen is as important
as detecting the ones that did. Missing events are sometimes referred to as
“no-events” or “absence patterns” [4]. A “timed events” mechanism is
implemented with special predicate values used to mark input to a timer queue.
Events in the timer queue can be set to trigger either after a relative time or
at absolute points in time. A triggered event can be used to set a new timed
event, supporting periodic operations. The whole interface is SPARQL-compliant:
the triggering of a timer changes a corresponding triple predicate from
“waiting” to “triggered”, and the change is detectable in a SPARQL query.
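The timed events mechanism described above can be sketched as follows. This is
an illustration under stated assumptions, not the Instans interface: the class
and method names, the triple layout (subject, state, due time) and the
"absent" helper are all hypothetical. Only the idea that a timer flips a
predicate from "waiting" to "triggered" comes from the text.

```python
import heapq
from typing import List, Set, Tuple

Triple = Tuple[str, str, float]  # (subject, "waiting" | "triggered", due time)


class TimedEventStore:
    """Triple set plus a timer queue; all names here are hypothetical."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self._queue: List[Tuple[float, str]] = []  # min-heap of (due, subject)

    def set_absolute(self, subject: str, due: float) -> None:
        # Timer that triggers at an absolute point in time.
        self.triples.add((subject, "waiting", due))
        heapq.heappush(self._queue, (due, subject))

    def set_relative(self, subject: str, now: float, delay: float) -> None:
        # Timer that triggers after a relative time.
        self.set_absolute(subject, now + delay)

    def advance(self, now: float) -> List[str]:
        """Fire all due timers by flipping "waiting" to "triggered"."""
        fired: List[str] = []
        while self._queue and self._queue[0][0] <= now:
            due, subject = heapq.heappop(self._queue)
            self.triples.discard((subject, "waiting", due))
            self.triples.add((subject, "triggered", due))
            fired.append(subject)
        return fired


def absent(store: TimedEventStore, seen: Set[str]) -> List[str]:
    """Subjects whose timer has triggered although no event from them was seen."""
    return sorted(
        s for (s, state, _) in store.triples
        if state == "triggered" and s not in seen
    )
```

A query that matches the "triggered" state while finding no corresponding event
detects an absence. Calling set_relative again for a subject returned by
advance would give periodic operation.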
2 Related Activities
Other research teams have been looking into streaming SPARQL, e.g. C-SPARQL³
[3] and CQELS⁴ [7]. Some differences to our approach are:
  • Individual triples: “Data stream processing” focuses on individual
    time-annotated triples. We assume heterogeneous event formats, where it may
    not be known at the time of writing an event processing application what
    information future sensors are going to include in an event. The
    possibility to layer events is also of critical importance.
  • Extensions: All other solutions extend SPARQL, typically with time-based
    windowing or processing a stream order of data. We have used no extensions.
  • Repetition of queries: Other solutions define queries on windows based on
    time or number of triples, together with a repetition rate at which the
    queries are re-run. Our approach is based on continuous and incremental
    matching of queries, where a particular segment can be isolated by
    filtering.
Sparkweave⁵ [6] applies SPARQL queries to RDF format data using an extended
Rete-algorithm, but focuses on inference and fast data stream processing of
individual triples instead of heterogeneous events. Sparkweave v. 1.1 also
doesn’t have support for SPARQL 1.1 features such as SPARQL Update.
¹ Incremental eNgine for STANding Sparql, http://cse.aalto.fi/instans/
² http://www.scala-lang.org/
³ http://streamreasoning.org/download
⁴ http://code.google.com/p/cqels/
⁵ https://github.com/skomazec/Sparkweave
The Prolog-based ETALIS has a SPARQL compiler front-end called “EP-SPARQL”
[2], but it is more limited than the Prolog notation and doesn’t support (at
the time of writing) SPARQL 1.1 features such as SPARQL Update, which is
critical for our study. EP-SPARQL concentrates on operations on event
sequences.
No other system based on collaborative SPARQL queries is known to us. Current
systems in the research community mainly concentrate on running one query at a
time⁶. Even those that allow multiple simultaneous queries to be registered do
not expect the queries to communicate during runtime.
3 Measuring Success
Our event processing work focuses on two main components:
1. Approach: Multiple collaborating SPARQL queries and update rules
processing heterogeneous events expressed in RDF.
2. Implementation (Instans): Incremental continuous query engine based
on the Rete-algorithm.
The overall target of the approach is to make it easy to create and maintain
efficient event processing applications for open and heterogeneous
environments. Research questions relate to finding good principles and
patterns for SPARQL queries used in event processing, creating a mapping to
SPARQL for the main operations needed in event processing (e.g. filtering,
splitting, enrichment, aggregation, pattern detection), developing efficient
methods of linking event information with background knowledge, adopting
ontology-based inference mechanisms in event processing and comparing to other
event processing approaches.
An example application, “Fast Flowers Delivery”, is presented in [4]. It is a
logistics management system, where flower stores send requests to an
independent pool of drivers to deliver flowers to customers. Drivers are
selected based on location and ranking. Ranking involves a periodic reporting
system. Our next target is to verify that SPARQL has all the elements in place
to also support this kind of event processing application. Once the example
cases have been confirmed to work, generalized solution patterns will be
defined for the complex event processing elements found in the literature,
using SPARQL building blocks.
Measuring the success of the implementation can be approached with:
  • Implementation efficiency (compared to other Rete implementations)
  • Algorithmic efficiency (Rete compared to other ways of processing SPARQL
    queries)
  • Performance of the approach (compared to other event processing systems)
Targets for empirical studies are e.g. latency (notification time), throughput,
memory consumption, system load, continuous operation over extended time
periods and energy efficiency (especially when operating over sensors). In
addition to empirical comparisons, this work is expected to provide an
understanding of garbage build-up in the system and solutions to improve
performance compared to basic Rete.
⁶ e.g. Jena (http://incubator.apache.org/jena/), Sesame (http://www.openrdf.org/)
As a first step, comparisons with C-SPARQL have been carried out and
documented on the project homepage using an example “close friends” service
[8], but since C-SPARQL is based on repeated execution of queries on windows,
the results are very difficult to compare. A “notification delay” in C-SPARQL
is dominated by the window repetition rate. Trying to minimize delay by
increasing the repetition rate leads to wasted computing resources and
duplicate detections. Even then, the format of C-SPARQL allows queries to be
executed at most once per second (far too often for most applications),
whereas the notification delays for Instans have been clocking in at 5-14 ms
(depending on hardware).
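Why the repetition rate dominates the delay can be seen in a simplified model,
sketched here under assumptions that are not from the measurements: the query
is re-run at exact multiples of the repetition period, evaluation itself takes
no time, and an event arriving exactly at a run is included in that run. The
function names are hypothetical.

```python
import math
from typing import Iterable


def windowed_delay(arrival: float, period: float) -> float:
    """Time from an event's arrival until the next periodic query run."""
    next_run = math.ceil(arrival / period) * period
    return next_run - arrival


def mean_delay(arrivals: Iterable[float], period: float) -> float:
    """Average notification delay over a set of arrival times."""
    delays = [windowed_delay(t, period) for t in arrivals]
    return sum(delays) / len(delays)
```

In this model an event arriving 0.25 s after a run waits 0.75 s when the query
repeats once per second, and the delay can approach one full period. Halving
the period halves the worst case but doubles the number of query runs, most of
which find nothing new. An incremental engine has no such period: its delay is
the matching time alone.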
Both the approach and the implementation would be involved in testing the
ease of deployment and management of the system in a distributed way in an open
environment. Related questions are the processing and memory requirements of
the implementation, arrangements for communication between distributed
deployments and any security-related issues specific to the approach.
Based on our verifications, both the approach and Instans look very promising.
References
1. Abdullah, H., Rinne, M., Törmä, S., Nuutila, E.: Efficient matching of SPARQL
subscriptions using Rete. In: Proceedings of the 27th Symposium On Applied
Computing. Riva del Garda, Italy (Mar 2012)
2. Anicic, D., Fodor, P., Rudolph, S., Stojanovic, N.: EP-SPARQL: a unified
language for event processing and stream reasoning. pp. 635–644. WWW ’11, ACM,
Hyderabad, India (2011)
3. Barbieri, D.F., Braga, D., Ceri, S., Grossniklaus, M.: An execution environment
for C-SPARQL queries. In: Proceedings of the 13th International Conference on
Extending Database Technology - EDBT ’10. p. 441. Lausanne, Switzerland (2010)
4. Etzion, O., Niblett, P., Luckham, D.: Event Processing in Action. Manning
Publications (Jul 2010)
5. Forgy, C.L.: Rete: A fast algorithm for the many pattern/many object pattern
match problem. Artificial Intelligence 19(1), 17–37 (Sep 1982)
6. Komazec, S., Cerri, D.: Towards Efficient Schema-Enhanced Pattern Matching over
RDF Data Streams. In: 10th ISWC. Springer, Bonn, Germany (2011)
7. Le-Phuoc, D., Dao-Tran, M., Parreira, J.X., Hauswirth, M.: A native and adaptive
approach for unified processing of linked streams and linked data. In: ISWC’11.
pp. 370–388. Springer-Verlag Berlin (Oct 2011)
8. Rinne, M., Abdullah, H., Törmä, S., Nuutila, E.: Processing Heterogeneous RDF
Events with Standing SPARQL Update Rules. In: Meersman, R., Dillon, T. (eds.)
OTM 2012 Conferences, Part II. pp. 793–802. Springer-Verlag (2012)
9. Rinne, M., Nuutila, E., Törmä, S.: INSTANS: High-Performance Event Processing
with Standard RDF and SPARQL. In: Poster in International Semantic Web
Conference 2012. Boston, MA (2012)
10. Rinne, M., Törmä, S., Nuutila, E.: SPARQL-Based Applications for RDF-Encoded
Sensor Data. In: 5th International Workshop on Semantic Sensor Networks (2012)