
SPARQL Update for Complex Event Processing

November 2012


DOI:10.1007/978-3-642-35173-0_38

Conference: 11th International Semantic Web Conference (ISWC)
At: Boston, USA
Volume: 7650 in Lecture Notes in Computer Science

Authors:

Mikko Rinne

Dinador Ltd.




Mikko Rinne

Distributed Systems Group,

Department of Computer Science and Engineering,

Aalto University, School of Science, Finland

mikko.rinne@aalto.fi

Abstract. Complex event processing is currently done primarily with proprietary definition languages. Future smart environments will require collaboration of multi-platform sensors operated by multiple parties. The goal of my research is to verify the applicability of standard-compliant SPARQL for complex event processing tasks. If successful, the semantic web standards RDF, SPARQL and OWL, with their established base of tools, offer many further benefits for event processing, including support for interconnecting disjoint vocabularies, enriching event information with linked open data and reasoning over semantically annotated content. A software platform capable of continuous incremental evaluation of multiple parallel SPARQL queries is a key enabler of the approach.

Keywords: Complex event processing, SPARQL, RDF, Rete-algorithm

1 Smart Cities Need SPARQL

Smart environments of the future will need to interconnect billions of sensors based on platforms from multiple vendors and operated by different companies, public authorities or individuals. To reduce the need for overlapping sensors that produce duplicate measurements, interoperation between platforms should be maximized. Such open environments need highly distributed, loosely coupled solutions based on common standards. Event processing systems based on proprietary definition languages struggle to adapt to multi-vendor contexts.

The benefit of RDF in complex event processing is that it provides a flexible representation of heterogeneous events in an open distributed environment, where new sensors must be able to add new information fields without breaking compatibility with existing applications. SPARQL, tailor-made for querying RDF, was extended in SPARQL 1.1 Update with the powerful capability to insert selected data into named graphs. When combined with a continuous query processing engine, INSERT gives SPARQL queries memory and the capability to communicate and collaborate with each other. As a result, interconnected SPARQL queries can be used to create complex event processing applications capable of handling layered and heterogeneous representations of event instances. Taking their other benefits into account as well, the semantic web standards RDF, SPARQL and OWL form a very promising base for complex event processing.
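To illustrate how INSERT gives standing queries memory and lets them collaborate, the following Python sketch models two rules as functions over a set of quads. It is a toy under stated assumptions: the graph names (`input`, `derived`, `out`), the `ex:` predicates and the threshold are hypothetical, the SPARQL in the comments is only approximate, and the naive loop that re-applies every rule until nothing new is inserted stands in for the incremental Rete evaluation used by Instans.

```python
# Toy model of standing SPARQL Update rules that communicate through named
# graphs. Graph names and ex: predicates are hypothetical.
# A quad is (graph, subject, predicate, object).

def high_reading(store):
    # Roughly: INSERT { GRAPH <derived> { ?e ex:alarm ?s } }
    #          WHERE  { GRAPH <input> { ?e ex:sensor ?s ; ex:value ?v } FILTER(?v > 30) }
    sensors = {(s, o) for g, s, p, o in store if g == "input" and p == "ex:sensor"}
    values = {(s, o) for g, s, p, o in store if g == "input" and p == "ex:value"}
    for event, sensor in sensors:
        for event2, value in values:
            if event == event2 and value > 30:
                yield ("derived", event, "ex:alarm", sensor)


def repeated_alarm(store):
    # Second layer: reads only what the first rule inserted, never the raw input.
    alarms = [(s, o) for g, s, p, o in store if g == "derived" and p == "ex:alarm"]
    for e1, s1 in alarms:
        for e2, s2 in alarms:
            if s1 == s2 and e1 < e2:
                yield ("out", s1, "ex:repeatedAlarm", f"{e1},{e2}")


def run_to_fixpoint(store, rules):
    """Re-apply all rules until none inserts a new quad (naive, not Rete)."""
    while True:
        new = {quad for rule in rules for quad in rule(store)} - store
        if not new:
            return store
        store |= new


store = set()
events = [
    [("input", "ev1", "ex:sensor", "s1"), ("input", "ev1", "ex:value", 35)],
    # Heterogeneous event: the extra ex:battery field is simply ignored by the rules.
    [("input", "ev2", "ex:sensor", "s1"), ("input", "ev2", "ex:value", 20),
     ("input", "ev2", "ex:battery", "low")],
    [("input", "ev3", "ex:sensor", "s1"), ("input", "ev3", "ex:value", 40)],
]
for event in events:
    store |= set(event)
    run_to_fixpoint(store, [high_reading, repeated_alarm])

print(sorted(q for q in store if q[0] == "out"))
# [('out', 's1', 'ex:repeatedAlarm', 'ev1,ev3')]
```

The second rule never sees raw input: it reacts only to what the first rule inserted, which is the layering of events described above, and the extra field in `ev2` does not break either rule.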

R. Cudré-Mauroux et al. (Eds.): ISWC 2012, LNCS 7650, pp. 453–456.

The final publication is available at link.springer.com:

http://link.springer.com/chapter/10.1007/978-3-642-35173-0_38


In the Distributed Systems Group we are working on an incremental continuous SPARQL query processor based on the Rete algorithm [5]. The Instans¹ platform supports selected parts of the SPARQL 1.1 Query and Update specifications. The first generation of Instans was written in Scala² [1, 8–10]. Instans is currently being ported to Lisp, where the Rete net is compiled into executable Lisp code through macro expansion in the setup phase. The Scala version reached notification delays of 5–14 ms in the cases tested, but first measurements indicate that the Lisp version would be 100–200 times faster.
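The core idea of Rete [5] is to keep partial matches in memories so that each arriving fact is matched only against what is already stored, instead of re-running the query. The Python sketch below shows one two-input join node for a basic graph pattern sharing the variable `?e`. The class and callback names are hypothetical and the sketch does not reflect Instans internals; a full Rete network chains and shares many such nodes.

```python
from collections import defaultdict


class TwoPatternJoin:
    """Toy Rete-style join node for { ?e <p1> ?x . ?e <p2> ?y }.

    Each alpha memory stores the triples matching one pattern, indexed by the
    shared variable ?e. An arriving triple is joined only against the opposite
    memory, so earlier work is reused instead of re-running the whole query.
    """

    def __init__(self, p1, p2, on_match):
        self.p1, self.p2 = p1, p2
        self.left = defaultdict(list)   # ?e -> objects of <p1> triples
        self.right = defaultdict(list)  # ?e -> objects of <p2> triples
        self.on_match = on_match        # called once per new (?e, ?x, ?y)

    def add_triple(self, s, p, o):
        if p == self.p1:
            self.left[s].append(o)
            for y in self.right[s]:
                self.on_match(s, o, y)
        if p == self.p2:
            self.right[s].append(o)
            for x in self.left[s]:
                self.on_match(s, x, o)


matches = []
join = TwoPatternJoin("ex:sensor", "ex:value", lambda e, x, y: matches.append((e, x, y)))
join.add_triple("ev1", "ex:sensor", "s1")  # stored in the left memory, no match yet
join.add_triple("ev1", "ex:value", 35)     # joins with the stored token at once
join.add_triple("ev2", "ex:value", 20)     # waits in the right memory for a partner
print(matches)  # [('ev1', 's1', 35)]
```

Each new match is reported as soon as the triple completing it arrives, which is what makes low notification delays possible without periodic re-execution.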

In event processing it is equally important to detect the events that did not happen as the ones that did. Missing events are sometimes referred to as “no-events” or “absence patterns” [4]. A “timed events” mechanism is implemented, with special predicate values marking input to a timer queue. Events in the timer queue can be set to trigger either after a relative time or at absolute points in time. A triggered event can be used to set a new timed event, supporting periodic operations. The whole interface is SPARQL-compliant: when a timer triggers, the predicate of the corresponding triple changes from “waiting” to “triggered”, and the change can be detected in a SPARQL query.
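The timed-events mechanism described above can be sketched in Python as a priority queue of due times whose entries are exposed as triples. This is an illustration under assumptions, not the Instans interface: the predicate names `:waiting` and `:triggered`, the timer identifier and the integer clock are hypothetical.

```python
import heapq


class TimerQueue:
    """Toy timer queue. Each timer is exposed as a triple whose predicate moves
    from ":waiting" to ":triggered" (hypothetical names) when it fires."""

    def __init__(self):
        self._heap = []     # (due_time, sequence_no, timer_id)
        self._seq = 0
        self.triples = {}   # timer_id -> (timer_id, predicate, due_time)

    def set_absolute(self, timer_id, due):
        heapq.heappush(self._heap, (due, self._seq, timer_id))
        self._seq += 1
        self.triples[timer_id] = (timer_id, ":waiting", due)

    def set_relative(self, timer_id, now, delay):
        self.set_absolute(timer_id, now + delay)

    def advance(self, now):
        """Fire every timer due at or before `now`; return the changed triples."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            due, _, timer_id = heapq.heappop(self._heap)
            if self.triples.get(timer_id) != (timer_id, ":waiting", due):
                continue  # stale entry: the timer was re-set after this push
            self.triples[timer_id] = (timer_id, ":triggered", due)
            fired.append(self.triples[timer_id])
        return fired


queue = TimerQueue()
queue.set_relative(":report", now=0, delay=10)
log = []
for now in range(0, 35, 5):
    for timer_id, predicate, due in queue.advance(now):
        log.append((now, timer_id, predicate))
        # A rule reacting to ":triggered" re-arms the timer: periodic operation.
        queue.set_relative(timer_id, now=due, delay=10)
print(log)
# [(10, ':report', ':triggered'), (20, ':report', ':triggered'), (30, ':report', ':triggered')]
```

Because a fired timer only changes a triple, a standing query can match the `:triggered` state like any other data, and re-arming from that query yields a periodic timer.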

2 Related Activities

Other research teams have been looking into streaming SPARQL, e.g. C-SPARQL³ [3] and CQELS⁴ [7]. Some differences from our approach are:

– Individual triples: “Data stream processing” focuses on individual time-annotated triples. We assume heterogeneous event formats, where it may not be known at the time of writing an event processing application what information future sensors will include in an event. The possibility to layer events is also of critical importance.

– Extensions: All other solutions extend SPARQL, typically with time-based windowing or with processing data in stream order. We have used no extensions.

– Repetition of queries: Other solutions define queries on windows based on time or number of triples, together with a repetition rate at which the queries are re-run. Our approach is based on continuous and incremental matching of queries, where a particular segment can be isolated by filtering.
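As an illustration of isolating a time segment by filtering instead of by windows, the Python sketch below reports each pair of events from the same sensor that lie at most a given number of seconds apart, at the moment the second event arrives. The event fields, the 10 s gap and the function names are hypothetical; the time test plays the role a SPARQL `FILTER` on timestamps would play in a standing query.

```python
def make_pair_detector(max_gap):
    """Continuously report pairs of events from the same sensor whose timestamps
    differ by at most `max_gap` seconds, at the moment the second event arrives.

    The time test plays the role of a FILTER such as FILTER(?t2 - ?t1 <= 10);
    nothing is re-run on a schedule. Events are assumed to arrive in timestamp
    order, and old events are never evicted (a real engine would need to
    garbage-collect them)."""
    seen = []  # (event_id, sensor, timestamp) of every earlier event

    def on_event(event_id, sensor, timestamp):
        matches = [(old_id, event_id) for old_id, old_sensor, old_ts in seen
                   if old_sensor == sensor and timestamp - old_ts <= max_gap]
        seen.append((event_id, sensor, timestamp))
        return matches

    return on_event


detect = make_pair_detector(max_gap=10)
print(detect("e1", "s1", 0))   # []
print(detect("e2", "s2", 4))   # []
print(detect("e3", "s1", 7))   # [('e1', 'e3')]
print(detect("e4", "s1", 30))  # []: e1 and e3 are more than 10 s older
```

Each qualifying pair is reported exactly once and without waiting for a window boundary; the price is that stored events must eventually be discarded, which relates to the garbage build-up question raised in Section 3.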

Sparkweave⁵ [6] applies SPARQL queries to RDF data using an extended Rete algorithm, but focuses on inference and fast data stream processing of individual triples instead of heterogeneous events. Sparkweave v. 1.1 also lacks support for SPARQL 1.1 features such as SPARQL Update.

¹ Incremental eNgine for STANding Sparql, http://cse.aalto.fi/instans/

² http://www.scala-lang.org/

³ http://streamreasoning.org/download

⁴ http://code.google.com/p/cqels/

⁵ https://github.com/skomazec/Sparkweave

The Prolog-based ETALIS has a SPARQL compiler front end called “EP-SPARQL” [2], but it is more limited than the Prolog notation and (at the time of writing) does not support SPARQL 1.1 features such as SPARQL Update, which is critical for our study. EP-SPARQL concentrates on operations on event sequences.

No other system based on collaborative SPARQL queries is known to us. Current systems in the research community mainly concentrate on running one query at a time⁶. Even those that allow registering multiple simultaneous queries do not expect the queries to communicate at runtime.

3 Measuring Success

Our event processing work focuses on two main components:

1. Approach: Multiple collaborating SPARQL queries and update rules processing heterogeneous events expressed in RDF.

2. Implementation (Instans): Incremental continuous query engine based on the Rete algorithm.

The overall target of the approach is to make it easy to create and maintain efficient event processing applications for open and heterogeneous environments. The research questions relate to finding good principles and patterns for SPARQL queries used in event processing, creating a mapping to SPARQL for the main operations needed in event processing (e.g. filtering, splitting, enrichment, aggregation, pattern detection), developing efficient methods of linking event information with background knowledge, adopting ontology-based inference mechanisms in event processing, and comparing the approach with other event processing approaches.
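To make the intended mapping more concrete, the Python sketch below walks three of the listed operations (filtering, enrichment with background knowledge, aggregation) over a handful of events, with comments naming the SPARQL construct each step roughly corresponds to. It is a hypothetical illustration, not the mapping this research aims to define: the sensor-to-room background facts, the `ex:` predicates and the threshold are invented, and the steps run as one batch for brevity, whereas in Instans each step would be a standing rule evaluated incrementally.

```python
from collections import Counter

# Hypothetical background knowledge (e.g. fetched from linked open data).
BACKGROUND = {("s1", "ex:locatedIn", "room101"), ("s2", "ex:locatedIn", "room102")}


def filter_events(events, threshold):
    """Filtering, roughly FILTER(?value > threshold) in a WHERE clause."""
    return [ev for ev in events if ev[2] > threshold]


def enrich(event, background):
    """Enrichment: join the event with background triples about its sensor,
    as a WHERE clause spanning an event graph and a background graph might."""
    event_id, sensor, _ = event
    return [(event_id, "ex:room", room)
            for s, p, room in background if s == sensor and p == "ex:locatedIn"]


def count_by_room(triples):
    """Aggregation, roughly SELECT ?room (COUNT(?e) AS ?n) ... GROUP BY ?room."""
    return Counter(room for _, p, room in triples if p == "ex:room")


# Events are (event_id, sensor, value); values are made-up examples.
events = [("ev1", "s1", 35), ("ev2", "s2", 12), ("ev3", "s1", 41),
          ("ev4", "s2", 33), ("ev5", "s9", 50)]
hot = filter_events(events, threshold=30)                     # ev2 is dropped
enriched = [t for ev in hot for t in enrich(ev, BACKGROUND)]  # ev5: unknown sensor
print(count_by_room(enriched))  # Counter({'room101': 2, 'room102': 1})
```

Note that enrichment behaves like an inner join: `ev5`, whose sensor has no background facts, silently drops out, which an application may or may not want (an `OPTIONAL` pattern would keep it).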

An example application, “Fast Flowers Delivery”, is presented in [4]. It is a logistics management system in which flower stores send requests to an independent pool of drivers to deliver flowers to customers. Drivers are selected based on location and ranking, and the ranking involves a periodic reporting system. Our next target is to verify that SPARQL has all the elements in place to support this kind of event processing application as well. Once the example cases have been confirmed to work, we will define generalized solution patterns, built from SPARQL building blocks, for the complex event processing elements found in the literature.

Measuring the success of the implementation can be approached with:

– Implementation efficiency (compared to other Rete implementations)

– Algorithmic efficiency (Rete compared to other ways of processing SPARQL queries)

– Performance of the approach (compared to other event processing systems)

Targets for empirical studies include latency (notification time), throughput, memory consumption, system load, continuous operation over extended time periods and energy efficiency (especially when operating over sensors). In addition to empirical comparisons, this work is expected to improve the understanding of garbage build-up in the system and to provide solutions for improving performance compared to basic Rete.

⁶ e.g. Jena (http://incubator.apache.org/jena/), Sesame (http://www.openrdf.org/)
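Some of the empirical targets listed above (latency, throughput, memory) could be collected with a small harness like the following Python sketch. It is hypothetical and deliberately simple: `handler` stands in for an engine's input call and is assumed to emit its notifications synchronously, only in-process time is measured, and `tracemalloc` counts Python allocations only, so system load, long-running behaviour and energy use would need other tools.

```python
import statistics
import time
import tracemalloc


def measure(handler, events):
    """Feed `events` to `handler` one by one and report latency, throughput and
    peak traced memory. `handler` is a stand-in for an engine's input call."""
    if not events:
        raise ValueError("need at least one event to measure")
    latencies = []
    tracemalloc.start()
    start = time.perf_counter()
    for event in events:
        t0 = time.perf_counter()
        handler(event)  # assumed to produce its notifications before returning
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "median_latency_ms": statistics.median(latencies) * 1000,
        "max_latency_ms": max(latencies) * 1000,
        "throughput_events_per_s": len(events) / elapsed if elapsed > 0 else float("inf"),
        "peak_memory_kib": peak / 1024,
    }


received = []
stats = measure(received.append, list(range(10_000)))  # trivial stand-in handler
print(stats)
```

Reporting the maximum next to the median matters for event processing, since occasional slow notifications (for example during garbage build-up) are hidden by averages.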

As a first step, comparisons with C-SPARQL have been carried out and documented on the project homepage using an example “close friends” service [8], but since C-SPARQL is based on repeated execution of queries on windows, the results are very difficult to compare. A “notification delay” in C-SPARQL is dominated by the window repetition rate. Trying to minimize delay by increasing the repetition rate leads to wasted computing resources and duplicate detections. Even then, the C-SPARQL format only allows queries to be executed at most once per second (already far too often for most applications), whereas notification delays for Instans have been 5–14 ms (depending on hardware).
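The effect of the repetition rate can be quantified with simple arithmetic, sketched here in Python under explicit assumptions: executions happen at fixed multiples of the step, execution itself takes no time, and a sliding window's width is a multiple of its step. The function names and example arrival times are hypothetical and are not taken from the C-SPARQL comparison.

```python
def window_delay_ms(arrival_ms, step_ms):
    """Wait from an event's arrival until the next scheduled re-execution,
    assuming executions at t = 0, step, 2*step, ... that take no time."""
    next_run = -(-arrival_ms // step_ms) * step_ms  # round up to a multiple of step
    return next_run - arrival_ms


def sightings(width_ms, step_ms):
    """How many executions of a sliding window (width, step) contain one event,
    assuming width is a multiple of step: each of them reports it again."""
    return width_ms // step_ms


arrivals = [100, 250, 999]  # arrival times in ms (made-up example values)
print([window_delay_ms(a, 1000) for a in arrivals])  # [900, 750, 1]
print(sightings(width_ms=10_000, step_ms=1000))      # 10
```

With arrivals spread uniformly over a 1 s step, the expected wait is half the step, about 500 ms, before any query execution time is added. Shrinking the step lowers that wait but multiplies both the number of executions and the repeated sightings of each event, which is the trade-off between delay, wasted computation and duplicate detections described above.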

Both the approach and the implementation would be involved in testing the ease of deploying and managing the system in a distributed way in an open environment. Related questions concern the processing and memory requirements of the implementation, arrangements for communication between distributed deployments, and any security issues specific to the approach.

Based on our verifications, both the approach and Instans look very promising.

References

1. Abdullah, H., Rinne, M., Törmä, S., Nuutila, E.: Efficient matching of SPARQL subscriptions using Rete. In: Proceedings of the 27th Symposium On Applied Computing. Riva del Garda, Italy (Mar 2012)

2. Anicic, D., Fodor, P., Rudolph, S., Stojanovic, N.: EP-SPARQL: a unified language for event processing and stream reasoning. pp. 635–644. WWW ’11, ACM, Hyderabad, India (2011)

3. Barbieri, D.F., Braga, D., Ceri, S., Grossniklaus, M.: An execution environment for C-SPARQL queries. In: Proceedings of the 13th International Conference on Extending Database Technology - EDBT ’10. p. 441. Lausanne, Switzerland (2010)

4. Etzion, O., Niblett, P., Luckham, D.: Event Processing in Action. Manning Publications (Jul 2010)

5. Forgy, C.L.: Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19(1), 17–37 (Sep 1982)

6. Komazec, S., Cerri, D.: Towards Efficient Schema-Enhanced Pattern Matching over RDF Data Streams. In: 10th ISWC. Springer, Bonn, Germany (2011)

7. Le-Phuoc, D., Dao-Tran, M., Parreira, J.X., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: ISWC’11. pp. 370–388. Springer-Verlag Berlin (Oct 2011)

8. Rinne, M., Abdullah, H., Törmä, S., Nuutila, E.: Processing Heterogeneous RDF Events with Standing SPARQL Update Rules. In: Meersman, R., Dillon, T. (eds.) OTM 2012 Conferences, Part II. pp. 793–802. Springer-Verlag (2012)

9. Rinne, M., Nuutila, E., Törmä, S.: INSTANS: High-Performance Event Processing with Standard RDF and SPARQL. In: Poster in International Semantic Web Conference 2012. Boston, MA (2012)

10. Rinne, M., Törmä, S., Nuutila, E.: SPARQL-Based Applications for RDF-Encoded Sensor Data. In: 5th International Workshop on Semantic Sensor Networks (2012)
