Figure 3. Input/Output behavior of query.

Source publication
Conference Paper
Continuous query systems are an intuitive way for users to access streaming data in large-scale scientific applications containing many hundreds of streams. A challenge in these systems is to join streams in such a way that memory is conserved. Storing events that could not possibly participate in a join any longer wastes memory and limits scalabil...

Context in source publication

Context 1
... microbenchmarks break down the overhead of the join window algorithm. In the scenario, depicted in Figure 3, a single query running in a quoblet container accepts two input streams, D and R, and joins them on timestamp to produce the aggregate event <D R>. ...
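A minimal sketch of that scenario, assuming event records that carry a timestamp used as the join key (the `Event` class, stream names, and matching tolerance below are illustrative, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass
class Event:
    stream: str       # source stream name, e.g. "D" or "R"
    timestamp: float  # event timestamp used as the join key
    payload: dict

def join_on_timestamp(d_events, r_events, tolerance=0.0):
    """Pair each D event with every R event whose timestamp matches
    (within `tolerance`), producing the aggregate events <D R>."""
    joined = []
    for d in d_events:
        for r in r_events:
            if abs(d.timestamp - r.timestamp) <= tolerance:
                joined.append((d, r))  # one aggregate event <D R>
    return joined

# Illustrative use with two tiny streams
d_stream = [Event("D", t, {"val": t * 10}) for t in (1.0, 2.0, 3.0)]
r_stream = [Event("R", t, {"val": t + 0.5}) for t in (2.0, 3.0, 4.0)]
print(join_on_timestamp(d_stream, r_stream))  # pairs at t=2.0 and t=3.0
```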

Similar publications

Article
As maritime logistics becomes global, issues related to real-time monitoring of problems that can occur during transport have come to the fore. As RFID technology has been used to monitor the transport process and a significant amount of stream data is fed into middleware, the function for processing it in rea...
Conference Paper
In the last decade, Stream Processing Engines (SPEs) have emerged as a new processing paradigm that can process huge amounts of data while retaining low latency and high throughput. Yet, it is often necessary to join streaming data with traditional databases to provide more contextual information for end-users and applications. The major probl...
Article
Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying of data whose structure is complex or is not fully known to the user, or integrating XML data sources w...
Chapter
Linked Stream Data has emerged as an effort to represent dynamic, time-dependent data streams following the principles of Linked Data. Given the increasing number of available stream data sources like sensors and social network services, Linked Stream Data allows an easy and seamless integration, not only among heterogeneous stream data, but also be...
Conference Paper
Continuous queries over stream data are persistent queries that continuously output results as they arrive. Query processing in Data Stream Management Systems (DSMSs) has to meet various Quality-of-Service (QoS) requirements. In many data stream applications, processing delay is the most critical quality requirement, since the value of query results decreases dramatically ov...

Citations

... Events are then pushed through the join operator. Each join operator internally appends a cost operator that samples the input streams to detect their rates, for use in calculating the join window size [24]. Joins in Calder are a Cartesian product followed by a time-based comparison. ...
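A rough sketch of the rate-sampling idea described above, assuming the window is sized to cover a fixed join horizon at the observed rate (the sampling scheme and sizing rule are illustrative, not Calder's actual formula):

```python
from collections import deque

class CostOperator:
    """Samples arrival times on one input stream and estimates its rate,
    which the join can consult when sizing its time-based window."""
    def __init__(self, sample_size=100):
        self.arrivals = deque(maxlen=sample_size)

    def observe(self, arrival_time):
        self.arrivals.append(arrival_time)

    def rate(self):
        """Events per second over the sampled arrivals (0 if too few samples)."""
        if len(self.arrivals) < 2:
            return 0.0
        span = self.arrivals[-1] - self.arrivals[0]
        return (len(self.arrivals) - 1) / span if span > 0 else 0.0

def window_size(cost_op, join_horizon_sec):
    """Retain enough events to cover `join_horizon_sec` of stream time
    at the observed rate (illustrative sizing rule)."""
    return max(1, int(cost_op.rate() * join_horizon_sec))

cost = CostOperator()
for t in (0.0, 0.5, 1.0, 1.5, 2.0):   # one event every 0.5 s
    cost.observe(t)
print(cost.rate(), window_size(cost, join_horizon_sec=10))  # 2.0 events/s, 20
```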
... A naive approach to detecting a missing stream is to wait for a preset period of time and, if the data does not arrive within that period, consider the stream missing. This approach does not generalize, however, because stream rates may change dynamically (STORM mode and CLEAR mode for weather sensors [24]), so a single preset timeout may not be appropriate for different streams or even for subsets of the same stream. It is reasonable to assume that the inter-arrival time of a time series falls within a particular range averaging around the set generation time interval. ...
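A hedged sketch of the adaptive alternative: derive the timeout from each stream's recently observed inter-arrival times rather than from a single preset value (the history length and slack factor below are assumptions for illustration):

```python
from collections import deque

class MissingStreamDetector:
    """Flags a stream as missing when the gap since its last event greatly
    exceeds its recently observed inter-arrival time."""
    def __init__(self, history=20, slack=3.0):
        self.gaps = deque(maxlen=history)  # recent inter-arrival times
        self.last_arrival = None
        self.slack = slack                 # tolerated multiple of the mean gap

    def observe(self, arrival_time):
        if self.last_arrival is not None:
            self.gaps.append(arrival_time - self.last_arrival)
        self.last_arrival = arrival_time

    def is_missing(self, now):
        if self.last_arrival is None or not self.gaps:
            return False  # not enough history to decide
        mean_gap = sum(self.gaps) / len(self.gaps)
        return (now - self.last_arrival) > self.slack * mean_gap

det = MissingStreamDetector()
for t in (0.0, 1.0, 2.0, 3.0):         # events arriving once per second
    det.observe(t)
print(det.is_missing(now=3.5), det.is_missing(now=12.0))  # False, True
```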
Article
Sensors and instruments are an important source of real-time data. However, sensor networks and instruments and their delivery systems can fail due to intrusion attacks, node failures, link failures, or problems in the measuring instruments. Missing data can cause prediction inaccuracies or problems in continuous event processing. Estimation techniques can approximate missing data in a stream, thus enabling a continuous flow of data when the stream goes down temporarily. We propose Kalman filters for predicting missing events in sensor streams, specifically, with the dynamic linear model. Our study compares the Kalman filter-based approach to reservoir sampling and histogram-based approaches. We show that Kalman filtering is promising and has the least root mean squared error for most cases. We introduce a novel solution for inserting this approximation technique into an SQL-based events processing system as a new query operator. Our experimental analysis shows that the prediction operator has low overhead and is effective in estimating missing events in weather data streams, specifically, the METAR streams.
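A minimal one-dimensional dynamic linear model of the kind the abstract describes, here a local-level Kalman filter whose prediction step fills in a missing event (the noise variances and the tiny example series are placeholders; the paper's actual model may differ):

```python
class LocalLevelKalman:
    """1-D Kalman filter (local-level dynamic linear model): a hidden level
    follows a random walk and each event is a noisy observation of it."""
    def __init__(self, init_level=0.0, init_var=1.0,
                 process_var=1e-2, obs_var=1.0):
        self.level, self.var = init_level, init_var
        self.q, self.r = process_var, obs_var   # placeholder noise variances

    def predict(self):
        """Time update: project the level forward; this value is what
        fills in a missing event."""
        self.var += self.q
        return self.level

    def update(self, observation):
        """Measurement update when an event actually arrives."""
        k = self.var / (self.var + self.r)       # Kalman gain
        self.level += k * (observation - self.level)
        self.var *= (1.0 - k)
        return self.level

# Fill a gap in a temperature-like series (None marks a missing event).
kf = LocalLevelKalman(init_level=20.0)
series = [20.1, 20.3, None, 20.8]
estimates = []
for x in series:
    predicted = kf.predict()                     # prediction step
    estimates.append(predicted if x is None else kf.update(x))
print(estimates)
```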
... As shown in [17,16], a time-based sliding window can be implemented at low cost and, when tied to the input stream rate, can be more intuitive for users than a count-based sliding window, where the window size is defined as the number of events of interest. ...
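As an illustration of the distinction, a small sketch of a time-based window that evicts events by age rather than by count (the 10-second span is an arbitrary example):

```python
from collections import deque

class TimeBasedWindow:
    """Keeps only the events whose timestamps fall within the last
    `span_sec` seconds, regardless of how many events that is."""
    def __init__(self, span_sec=10.0):
        self.span = span_sec
        self.events = deque()  # (timestamp, event) in arrival order

    def insert(self, timestamp, event):
        self.events.append((timestamp, event))
        self._evict(timestamp)

    def _evict(self, now):
        while self.events and now - self.events[0][0] > self.span:
            self.events.popleft()  # too old to participate in any join

# A count-based window, by contrast, would simply be deque(maxlen=N).
```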
Conference Paper
With the recent explosive growth of sensors and instruments, large-scale data-intensive and computation-intensive applications are emerging, especially in scientific fields. Helping scientists to efficiently, even in real time, process queries over those large-scale scientific streams is thus in great demand. However, query optimization for high-volume stream applications, in particular its core component, the evaluation model, has not been systematically studied. We observe that evaluating stream query plans should consider three aspects: output rate, computation cost, and memory consumption. However, to our knowledge, no existing research on evaluating stream query plans considers all three metrics. In this paper, we propose a new combined optimization goal which leverages all these aspects and develop a multi-model based optimization framework to accomplish this goal. Specifically, we build three models to evaluate a plan's output rate, computation cost, and memory consumption, respectively. Based on these three models, we search for an optimal plan while considering the system's computation resource and memory constraints. We also experimentally evaluate our optimization framework.
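A toy sketch of the kind of constrained plan selection the abstract outlines: each candidate plan is scored by three per-plan estimates, and the search keeps only plans that fit the resource budgets (the plan records, weights, and objective are hypothetical, not the paper's formulation):

```python
from dataclasses import dataclass

@dataclass
class PlanEstimate:
    name: str
    output_rate: float  # results/sec, from the output-rate model
    cpu_cost: float     # CPU units/sec, from the computation-cost model
    memory: float       # MB, from the memory-consumption model

def best_plan(candidates, cpu_budget, mem_budget, w_rate=1.0, w_cpu=0.1):
    """Pick the feasible plan maximizing a combined score
    (illustrative objective, not the paper's actual formula)."""
    feasible = [p for p in candidates
                if p.cpu_cost <= cpu_budget and p.memory <= mem_budget]
    if not feasible:
        return None
    return max(feasible, key=lambda p: w_rate * p.output_rate - w_cpu * p.cpu_cost)

plans = [PlanEstimate("join-first", 90.0, 40.0, 200.0),
         PlanEstimate("filter-first", 80.0, 20.0, 120.0)]
print(best_plan(plans, cpu_budget=30.0, mem_budget=150.0).name)  # "filter-first"
```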
... Optimizations have been applied to yield memory savings, for instance in [13, 14, 16]. The SPE architecture uses an underlying storage and/or transport medium that can be files [12, 15], a publish-subscribe system [17], or sockets [18]. ...
... We can see that query execution consumes a small fraction of total service time. More complex queries may consume longer execution time, but this confirms earlier results [29] and also confirms our earlier finding that service time is dependent on the rates of the input streams when joins are involved [17]. ...
Conference Paper
The use of real-time data streams in data-driven computational science is driving the need for stream processing tools that work within the architectural framework of the larger application. Data stream processing systems are beginning to emerge in the commercial space, but these systems fail to address the needs of large-scale scientific applications. In this paper we illustrate the unique needs of large-scale data driven computational science through an example taken from weather prediction and forecasting. We apply a realistic workload from this application against our Calder stream processing system to determine effective throughput, event processing latency, data access scalability, and deployment latency.
... The Calder system enables execution of SQL-based continuous queries on data streams. It uses a query planner service that optimizes and distributes queries to the computational nodes [11]; sophisticated algorithms to join streams with asynchronous arrival rates [16,17]; and a Kalman filter based approximation technique that estimates the input values when there are gaps in stream data [23]. ...
Article
Workflow-driven, dynamically adaptive e-Science is a form of scientific investigation often using a Service-Oriented Architecture (SOA) paradigm, designed to use large-scale computational resources on-the-fly to execute workflows consisting of parallel models, analysis, and visualization tasks. In the Linked Environments for Atmospheric Discovery (LEAD) project, with which our team is involved, our research has centered around event processing and mining of observational and model generated weather data such that users can dynamically trigger regional weather forecasts on-demand in response to developing weather. In this paper we describe stream provenance in complex event processing (CEP) systems. Specifically, we give an information model and architecture for stream provenance capture and collection, and evaluate the provenance service for perturbation and scalability.
... While the user's input is required to expose the adaptation parameter, adaptation to the real-time constraint associated with analyzing streaming data is performed automatically by the system. Another effort in supporting stream processing on the Grid via middleware is the dQUOB project [99,100,101]. This system enables continuous processing of SQL queries on data streams. ...
Conference Paper
In recent years, there has been a growing trend towards supporting more tightly coupled applications on the grid, including scientific workflows, applications that use pipelined or data-flow like processing, and distributed streaming applications. As availability of resources can vary over time in a grid environment, dynamic reallocation of resources is very important for these applications, particularly because of their long-running nature, and because they often require large-volume data transfers between processing stages. This paper considers the problem of supporting and efficiently implementing dynamic resource allocation for tightly-coupled and pipelined applications in a grid environment. We provide an alternative to basic checkpointing, using the notion of light-weight summary structure (LSS), to enable efficient migration. The idea behind LSS is that at certain points during the execution of a processing stage, the state of the program can be summarized by a small amount of memory. This allows us to perform low-cost process migration, as long as such memory can be identified by an application developer, and migration is performed only at these points. Our implementation and evaluation of LSS-based process migration has been in the context of the GATES (grid-based adaptive execution on streams) middleware that we have been developing. We also present an algorithm for dynamic resource allocation and describe an architecture for resource monitoring and allocation. We have extensively evaluated our implementation using three stream data processing applications, and show that the use of LSS allows efficient process migration.
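A highly simplified sketch of the LSS idea, assuming a processing stage whose state at a safe migration point reduces to a few running aggregates (the stage, field names, and serialization are invented for illustration):

```python
import pickle

class RunningAverageStage:
    """A stream-processing stage whose state at a safe migration point is just
    a small summary: the running sum and count of processed values."""
    def __init__(self, total=0.0, count=0):
        self.total, self.count = total, count

    def process(self, batch):
        for v in batch:
            self.total += v
            self.count += 1
        return self.total / self.count if self.count else 0.0

    def summary(self):
        """Light-weight summary structure: far smaller than a full checkpoint."""
        return pickle.dumps({"total": self.total, "count": self.count})

    @classmethod
    def resume(cls, blob):
        """Rebuild the stage on the destination node from the summary alone."""
        s = pickle.loads(blob)
        return cls(s["total"], s["count"])

# Migrate between batches: serialize the summary, ship it, resume elsewhere.
stage = RunningAverageStage()
stage.process([1.0, 2.0, 3.0])
migrated = RunningAverageStage.resume(stage.summary())
print(migrated.process([4.0]))  # continues from the migrated state
```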
Conference Paper
This paper describes how we have used a self-adapting middleware to implement a distributed and adaptive volume rendering application. The middleware we have used is GATES (grid-based adaptive execution on streams), which allows processing of streaming data in a distributed environment. A challenge in supporting such an application on streaming data is to balance the visualization quality and the speed of processing, which can be automatically done by the GATES middleware. We describe how we divide the application into a number of processing stages, and what adaptation parameters we use. Our experimental studies have focused on evaluating the self-adaptation enabled by the middleware, and measuring the overhead associated with the use of middleware.
Conference Paper
We have architected and evaluated a new kind of data resource, one that is composed of a logical collection of ephemeral data streams that could be viewed as a collection of publish-subscribe "channels" over which rich data-access and semantic operations can be performed. This paper contributes new insight to stream processing under the highly asynchronous stream workloads often found in data-driven scientific applications, and presents insights gained through porting a distributed stream processing system to a grid services framework. Experimental results reveal limits on stream processing rates that are directly tied to differences in stream rates.