Conference Paper

Supporting Dynamic Migration in Tightly Coupled Grid Applications


Abstract

In recent years, there has been a growing trend towards supporting more tightly coupled applications on the grid, including scientific workflows, applications that use pipelined or data-flow like processing, and distributed streaming applications. As availability of resources can vary over time in a grid environment, dynamic reallocation of resources is very important for these applications, particularly because of their long-running nature, and because they often require large-volume data transfers between processing stages. This paper considers the problem of supporting and efficiently implementing dynamic resource allocation for tightly-coupled and pipelined applications in a grid environment. We provide an alternative to basic checkpointing, using the notion of light-weight summary structure (LSS), to enable efficient migration. The idea behind LSS is that at certain points during the execution of a processing stage, the state of the program can be summarized by a small amount of memory. This allows us to perform low-cost process migration, as long as such memory can be identified by an application developer, and migration is performed only at these points. Our implementation and evaluation of LSS-based process migration has been in the context of the GATES (grid-based adaptive execution on streams) middleware that we have been developing. We also present an algorithm for dynamic resource allocation, and describe an architecture for resource monitoring and allocation. We have extensively evaluated our implementation using three stream data processing applications, and show that the use of LSS allows efficient process migration.
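As a rough illustration of the LSS idea (a minimal sketch; the class, field, and function names are hypothetical and are not the GATES API), the state needed to resume a processing stage is captured as a small summary object at a designated safe point, serialized, and restored on the destination node:

```python
import pickle

class LightweightSummary:
    """Hypothetical light-weight summary structure (LSS): the small amount of
    state that suffices to resume a processing stage at a designated safe point."""
    def __init__(self, items_seen=0, running_sum=0.0, centroids=None):
        self.items_seen = items_seen          # e.g. count of stream items processed so far
        self.running_sum = running_sum        # e.g. a partial aggregate
        self.centroids = centroids or []      # e.g. partial clustering state

def run_stage(stream, summary, migrate_requested):
    """Process a stream; at each safe point the whole program state is the LSS,
    so migration only needs to ship the (small) serialized summary."""
    for item in stream:
        summary.items_seen += 1
        summary.running_sum += item
        if migrate_requested():               # safe point reached and migration asked for
            return pickle.dumps(summary)      # ship only the LSS, not a full process image
    return None

# On the destination node: summary = pickle.loads(received_bytes); run_stage(rest, summary, ...)
```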


... Existing frameworks [98,97,43,129,55,141,34] ...
... The algorithm then performs a three-way comparison between timeInPrevSchedule, (timeInPrevCluster + intraClusterCost) and (bestTime + interClusterCost) to decide if multiClusterInterval_j has to be executed on sameSchedule, scheduleOnSameCluster or bestSchedule (lines 33–41). The algorithm continues performing the above steps until it includes all the phases in MCRP. ...
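For illustration, a minimal sketch of that three-way comparison (the variable names follow the snippet above; everything around them is assumed):

```python
def choose_schedule(time_in_prev_schedule, time_in_prev_cluster, intra_cluster_cost,
                    best_time, inter_cluster_cost):
    """Decide where the next multi-cluster interval should run, charging the cost
    of moving within or across clusters (an illustrative sketch, not the paper's code)."""
    options = {
        "sameSchedule": time_in_prev_schedule,
        "scheduleOnSameCluster": time_in_prev_cluster + intra_cluster_cost,
        "bestSchedule": best_time + inter_cluster_cost,
    }
    return min(options, key=options.get)   # cheapest estimated completion wins
```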
... While we adhere to the general principles of confining the executions of such applications in a cluster at a point of application execution, we claim and show that efficient scheduling and rescheduling mechanisms for execution of different phases of the applications in different clusters can bring performance benefits to the tightly-coupled applications when executed on multi-cluster grids. Existing frameworks [98,97,43,129,55,141,34,114] for enabling MPI-based tightly-coupled parallel applications on grids do not adequately deal with application dynamics for large-scale parallel applications, have large application execution overheads, and are generally not suitable for multi-cluster grids that can consist of batch systems. These techniques are not practical for grid systems where node availability can frequently vary. ...
Article
Full-text available
As computational grids have become popular and ubiquitous, users have access to a large number and different types of geographically distributed grid resources. Many computational grid frameworks are composed of multiple distributed sites, with each site consisting of one or more dedicated or non-dedicated clusters. Jobs submitted to a grid are handled by a metascheduler, which interacts with the local schedulers of the clusters for scheduling jobs to the individual clusters. Computational grids have been found to be powerful research-beds for execution of various kinds of parallel applications. When a parallel application is submitted to a grid, the metascheduler has to choose a set of resources from a cluster for application execution. To select the best set of resources for application execution, it is important to determine the performance of the application. Accurate performance estimates of an application are essential in assisting a grid metascheduler to efficiently schedule user jobs. Thus, models that predict execution times of parallel applications on a set of resources, and a search procedure (scheduling strategy) that selects the best set of machines within a cluster for application execution, are of importance for enabling parallel applications on grids. For efficient execution of large scientific parallel applications consisting of multiple phases, performance models of the individual phases should be obtained. Efficient rescheduling strategies that can use the per-phase models to adapt the parallel applications to application and resource dynamics are necessary for maintaining high performance of the applications on grids. A practical and robust grid computing infrastructure that integrates components related to application and resource monitoring, performance modeling, and scheduling and rescheduling techniques is highly essential for large-scale deployment and high performance of scientific applications on grid systems, and hence for fostering high performance computing. This thesis focuses on developing performance models for predicting execution times of parallel problems/subproblems on dedicated and non-dedicated grid resources. The thesis also constructs robust scheduling and rescheduling strategies in a grid metascheduler that can use the performance models for efficient execution of large scientific parallel applications on dynamic grids. Finally, the thesis builds a practical and robust grid middleware infrastructure which integrates components related to performance modeling, scheduling and rescheduling, monitoring and migration frameworks for large-scale deployment and use of high performance applications on grids. The thesis consists of four main components. In the first part of the thesis, we have developed a comprehensive set of performance modeling strategies to predict the execution times of tightly-coupled parallel applications on a set of resources in a dedicated or non-dedicated cluster. The main purpose of our prediction strategies is to aid grid metaschedulers in making scheduling decisions. Our performance modeling strategies, based on linear regression, can deal with non-dedicated systems where the loads can change during application executions. Our models do not require detailed knowledge and instrumentation of the applications and can be constructed without the involvement of application developers. The strategies are intended for rapid and large-scale deployment of parallel applications on non-dedicated grid systems.
We have evaluated our strategies on 8, 16, 24 and 32-node clusters with random loads and load traces from a grid system. Our performance modeling strategies gave less than 30% average percentage prediction errors in all cases, which is reasonable for non-dedicated systems. We also found that scheduling based on the predictions by our strategies will result in perfect scheduling in many cases. For modeling large-scale scientific applications, we use execution profiles, automatic program analysis, and manual analysis of significant portions of the application's code to identify the different phases of applications. We then adopt our performance modeling strategies to predict execution times for the different phases of the tightly-coupled parallel applications on a set of resources in a dedicated or non-dedicated cluster. Our experiments show that using combinations of performance models of the phases gives 18%–70% more accurate predictions than using single performance models for the applications. In the second part of the thesis, we have devised, evaluated and compared algorithms for scheduling tightly-coupled parallel applications on multi-cluster grids. Our algorithms use performance models that predict the execution times of parallel applications for evaluations of candidate schedules. In this work, we propose a novel algorithm called Box Elimination (BE) that searches a space of performance model parameters to determine efficient schedules. By eliminating large search space regions containing poorer solutions at each step and searching high quality solutions, our algorithm is able to generate efficient schedules within a few seconds, even for clusters of 512 processors. By means of a large number of real and simulation experiments, we compared our algorithm with popular optimization techniques. We show that our algorithm generates up to 80% more efficient schedules than other algorithms and the resulting execution times are more robust against performance modeling errors. The third part of the thesis deals with policies for rescheduling long-running multi-phase parallel applications in response to application and resource dynamics. In this work, we use our performance modeling and scheduling strategies to derive rescheduling plans for executing multi-phase parallel applications on grids. A rescheduling plan consists of potential points in application execution for rescheduling and schedules of resources for application execution between two consecutive rescheduling points. We have developed three algorithms, namely an incremental algorithm, a divide-and-conquer algorithm and a genetic algorithm, for deriving a rescheduling plan for a parallel application execution. We have also developed an algorithm that uses rescheduling plans derived on different clusters to form a single coherent rescheduling plan for application execution on a grid consisting of multiple clusters. The rescheduling plans generated by our algorithms are highly efficient, leading to application execution times that are higher than the execution times of a brute-force method by less than 10%. We also find that rescheduling in response to changing application and resource dynamics, using the rescheduling plans for multi-cluster grids generated by our algorithms, gives much lower execution times than executing the applications on a single schedule throughout application execution.
In the final part of the thesis, we have developed a practical grid middleware framework called MerITA (Middleware for Performance Improvement of Tightly Coupled Parallel Applications on Grids), a system for effective execution of tightly-coupled parallel applications on multi-cluster grids consisting of dedicated or non-dedicated, interactive or batch systems. The framework brings together performance modeling for automatically determining the characteristics of parallel applications, scheduling strategies that use the performance models for efficient mapping of applications to resources, rescheduling policies for determining the points in application execution at which executing applications can be rescheduled to different sets of resources to obtain performance improvement, and a checkpointing library for enabling rescheduling.
... With the development of high performance systems with massive numbers of processors [1] and long running scalable scientific applications that can use the processors for executions [2], the mean time between failures (MTBF) of the processors used for a single application execution has tremendously decreased [3]. Hence many checkpointing systems have been developed to enable fault tolerance for application executions [4], [5], [6], [7], [8]. A checkpointing system periodically saves the state of an application execution. ...
Article
Selecting optimal intervals of checkpointing an application is important for minimizing the run time of the application in the presence of system failures. Most of the existing efforts on checkpointing interval selection were developed for sequential applications, while few efforts deal with parallel applications where the applications are executed on the same number of processors for the entire duration of execution. Some checkpointing systems support parallel applications where the number of processors on which the applications execute can be changed during the execution. We refer to these kinds of parallel applications as malleable applications. In this paper, we develop a performance model for malleable parallel applications that estimates the amount of useful work performed in unit time (UWT) by a malleable application in the presence of failures as a function of checkpointing interval. We use this performance model function with different intervals and select the interval that maximizes the UWT value. By conducting a large number of simulations with the traces obtained on real supercomputing systems, we show that the checkpointing intervals determined by our model can lead to high efficiency of applications in the presence of failures.
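As a rough sketch of that selection step (the UWT model below is a generic first-order approximation supplied only for illustration; it is not the paper's model), candidate intervals are evaluated and the one maximizing useful work per unit time is kept:

```python
def best_checkpoint_interval(candidate_intervals, uwt_model):
    """Return the checkpointing interval that maximizes the useful-work-per-unit-time model."""
    return max(candidate_intervals, key=uwt_model)

def example_uwt(interval, checkpoint_cost=30.0, mtbf=3600.0):
    """Crude illustrative UWT: fraction of time doing real work, discounted by the
    expected rework after a failure (about half an interval on average)."""
    overhead = checkpoint_cost / (interval + checkpoint_cost)
    expected_rework = (interval / 2.0) / mtbf
    return (1.0 - overhead) * (1.0 - expected_rework)

interval = best_checkpoint_interval(range(60, 3601, 60), lambda t: example_uwt(float(t)))
```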
... A second direction for our future research would be to provide our middleware with support for dynamic computation migration and fault-tolerance. Recently our research group has published work on supporting migration [31] and fault-tolerance [137] in the context of the GATES grid middleware aimed at supporting applications processing distributed data streams [30]. Fault-tolerance, in particular, is achieved through the use of a Lightweight Summary Structure (LSS) that summarizes intermediate results of data stream processing, which allows the middleware (provided that a fault has been detected) to re-start processing of a particular stream at a new computational resource. ...
... Various efforts exist for scheduling parallel applications on multiple resources [1][2][3][4][5][6][7]. Some parallel applications are loosely coupled [8][9][10] where the interactions among the parallel tasks are negligible whereas some parallel applications are tightly coupled [11][12][13][14][15]. ...
Article
Although various strategies have been developed for scheduling parallel applications with independent tasks, very little work exists for scheduling tightly coupled parallel applications on cluster environments. In this paper, we compare four different strategies based on performance models of tightly coupled parallel applications for scheduling the applications on clusters. In addition to algorithms based on existing popular optimization techniques, we also propose a new algorithm called Box Elimination that searches the space of performance model parameters to determine the best schedule of machines. By means of real and simulation experiments, we evaluated the algorithms on single cluster and multi-cluster setups. We show that our Box Elimination algorithm generates up to 80% more efficient schedules than other algorithms. We also show that the execution times of the schedules produced by our algorithm are more robust against the performance modeling errors. Copyright © 2009 John Wiley & Sons, Ltd.
... Aurora* is a framework for distributed processing of data streams, but only within a single administrative domain [12]. Earlier, it has been shown how GATES can support process migration to deal with changes in availability of CPU cycles or network bandwidth [11], but not for handling complete failure of a processing node. ...
Conference Paper
This paper considers the problem of supporting and efficiently implementing fault-tolerance for tightly-coupled and pipelined applications, especially streaming applications, in a grid environment. We provide an alternative to basic checkpointing and use the notion of Light-weight Summary Structure (LSS) to enable efficient failure-recovery. The idea behind LSS is that at certain points during the execution of a processing stage, the state of the program can be summarized by a small amount of memory. This allows us to store copies of LSS for enabling failure-recovery, which yields low-overhead fault-tolerance. Our work can be viewed as an optimization and adaptation of the idea of application-level checkpointing to a different execution environment, and for a different class of applications. Our implementation and evaluation of LSS-based failure-recovery has been in the context of the GATES (Grid-based AdapTive Execution on Streams) middleware. An observation we use for providing very low overhead support for fault-tolerance is that algorithms analyzing data streams are only allowed to take a single pass over data, which means they only perform approximate processing. Therefore, we believe that in supporting fault-tolerant execution for these applications, it is acceptable to not analyze a small number of packets of data during failure-recovery. We show how we perform failure-recovery and also demonstrate how we could use additional buffers to limit data loss during the recovery procedure. We also present an efficient algorithm for allocating a new computation resource for failure-recovery at runtime. We have extensively evaluated our implementation using three stream data processing applications, and shown that the use of LSS allows effective and low-overhead failure-recovery.
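A minimal sketch of how an LSS copy plus a bounded buffer could support this style of recovery (the names, buffer size, and replication step are assumptions, not the GATES implementation):

```python
from collections import deque

class RecoverableStage:
    """Keeps the latest LSS copy plus a bounded buffer of packets seen since then,
    so recovery replays only that window and anything older is simply dropped."""
    def __init__(self, buffer_size=1000):
        self.last_summary = None
        self.buffer = deque(maxlen=buffer_size)   # overflow is discarded: acceptable data loss

    def on_packet(self, packet):
        self.buffer.append(packet)
        # ... normal single-pass processing of `packet` would happen here ...

    def on_safe_point(self, summary):
        self.last_summary = summary               # e.g. replicate the LSS to a backup node
        self.buffer.clear()                       # packets before the summary need not be replayed

    def recover(self):
        # Restart the stage elsewhere from the last summary, replaying buffered packets only.
        return self.last_summary, list(self.buffer)
```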
... There are two reasons. (1) With the maturity of checkpointing and migration techniques [10,11,35,55], together with our currency-based resource representation and reputation-based scheduling, HOURS can introduce highly efficient automated rescheduling, which is promising for future Grid platform development. It will greatly reduce the intervention of humans for resubmission, so as to improve the productivity of the Grid. ...
Article
The obstacle to the Grid becoming prevalent is the difficulty of using, configuring and maintaining it, which requires excessive IT knowledge, workload, and human intervention. At the same time, inter-operation amongst Grids is on track. As the core of Grid systems, resource management must be autonomic and inter-operational to be sustainable for future Grid computing. For this purpose, we introduce HOURS, a reputation-driven economic framework for Grid resource management. HOURS is designed to tackle the difficulty of automatic rescheduling, self-protection, incentives, heterogeneous resource sharing, reservation, and SLA in Grid computing. In this paper, we focus on designing a reputation-based resource scheduler, and use emulation to test its performance with real job traces and node failure traces. To describe the HOURS framework completely, a preliminary multiple-currency-based economic model is also introduced in this paper, with which future extension and improvement can be easily integrated into the framework. The results demonstrate that our scheduler can reduce the job failure rate significantly, and the average number of job resubmissions, which is the most important metric in this paper that affects the system performance and resource utilization from the perspective of users, can be reduced from 3.82 to 0.70 compared to simple sequence resource selection.
Article
Full-text available
LTE (Long-Term Evolution) is gaining momentum with its success in smart grids. Many time-critical applications can be hosted in mobile grids using LTE. High degrees of intermittent connectivity and disconnections are prevalent in a mobile grid. When a mobile node disconnects, its processes become orphans and require migration. This paper proposes a non-DHT based heap overlay to determine a destination host for the migrated process to continue computing with minimum disruption. The overlay facilitates structured communication in the search for a destination node. Simulation results indicate that the proposed approach has constant time complexity, in contrast to random or learning-based techniques. It is also found to have reduced maintenance overheads. Further, a dedicated bandwidth allocation is proposed for exporting the process state from the source to the identified destination.
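A toy sketch of the heap-overlay selection idea (the scoring weights and node attributes are invented for illustration, and the paper's overlay maintenance protocol is not shown): keeping candidate hosts in a heap makes the best destination available at the root in constant time.

```python
import heapq

class HostNode:
    def __init__(self, name, free_cpu, battery, connectivity):
        self.name = name
        # Lower score = better migration target; the weighting here is purely illustrative.
        self.score = -(0.5 * free_cpu + 0.3 * battery + 0.2 * connectivity)
    def __lt__(self, other):
        return self.score < other.score

overlay = []
for node in (HostNode("n1", 0.8, 0.9, 0.7),
             HostNode("n2", 0.4, 0.5, 0.9),
             HostNode("n3", 0.9, 0.6, 0.8)):
    heapq.heappush(overlay, node)

destination = overlay[0]   # root of the heap: current best host for the orphaned process
print(destination.name)
```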
Conference Paper
Distributed stream processing applications (DSPAs) are applications that process real-time data in a distributed environment. These applications contain processing elements (PEs) and different kinds of streams in cluster or cloud environments. However, such applications may experience hotspots and bursts of data within a very short time when running in a cluster or cloud. Furthermore, resources such as memory and CPU power in each node can become imbalanced because these applications typically run for a long time. Considering the two problems above, in this paper we propose a paralleled-PEs method to relieve hotspots and bursting data in DSPAs. We also propose a dynamic, situation-aware method that can be combined with the paralleled-PEs method to relieve imbalance and improve resource utilization. We implement our methods using shell scripts and Java packages. In our experiments, we deploy these two methods on the S4 framework, an open-source stream computation platform. The evaluation shows that our methods achieve lower delay, better resource utilization, and higher throughput than other methods.
Article
Using Virtual Machines (VMs) as a computing resource within a Service Oriented Architecture (SOA) creates a variety of new issues and challenges. Traditionally, parallel task scheduling algorithms focus only on handling CPU resources, but with the use of VMs there are many more resource properties to monitor and manage. The objective of this paper is to address these challenges with a multi-dimensional scheduling algorithm for VMs within a SOA. To do this, we deploy a testbed SOA environment with VMs that are capable of being registered, indexed, allocated, accessed and controlled by our new parallel task scheduling algorithms.
Article
A rapid growth in the storage capacity requirements at a computer center can lead to the installation of additional disk racks. The challenging task is not the installation, but to migrate old data to the new storage pools. A framework to parallelize the data migration process, using Linux clusters connected to Storage Area Network storage, is presented. A Linux tool to efficiently parallelize data migration, utilizing the High Performance Computing environment, is developed. Results show that using multiple nodes and multiple data copying streams per node achieves significant speedup factors over manual copying. The tool is demonstrated on four nodes using 178 data copying streams, achieving a speedup factor close to seven. The tool is scalable and capable of higher speedup factors with more available data moving nodes. Keywords: Parallel data migration, DMover tool, Linux cluster, Storage area network
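A toy sketch of the fan-out idea behind such a tool (paths, stream counts, and the single-node scope are made up; the actual DMover tool distributes copy streams across multiple cluster nodes over the SAN):

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_one(pair):
    src, dst = pair
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    shutil.copy2(src, dst)                 # one "copy stream"
    return dst

def migrate(files, old_pool, new_pool, streams=8):
    """Copy files from the old storage pool to the new one using several
    concurrent copy streams (illustrative single-node version)."""
    pairs = [(os.path.join(old_pool, f), os.path.join(new_pool, f)) for f in files]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return list(pool.map(copy_one, pairs))
```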
Article
Full-text available
Despite the recent surge of research in query processing over data streams, little attention has been devoted to defining precise semantics for continuous queries over streams. We first present an abstract semantics based on several building blocks: formal definitions for streams and relations, mappings among them, and any relational query language. From these basics we define a precise interpretation for continuous queries over streams and relations. We then propose a concrete language, CQL (for Continuous Query Language), which instantiates the abstract semantics using SQL as the relational query language and window specifications derived from SQL-99 to map from streams to relations. We identify some equivalences that can be used to rewrite CQL queries for optimization, and we discuss some additional implementation issues arising from the language and its semantics. We are implementing CQL as part of a general-purpose Data Stream Management System at Stanford.
Article
Full-text available
This paper gives an overview of our research in building rare class prediction models for identifying known intrusions and their variations and anomaly/outlier detection schemes for detecting novel attacks whose nature is unknown. Experimental results on the KDDCup'99 data set have demonstrated that our rare class predictive models are much more efficient in the detection of intrusive behavior than standard classification techniques. Experimental results on the DARPA 1998 data set, as well as on live network traffic at the University of Minnesota, show that the new techniques show great promise in detecting novel intrusions. In particular, during the past few months our techniques have been successful in automatically identifying several novel intrusions that could not be detected using state-of-the-art tools such as SNORT. In fact, many of these have been on the CERT/CC list of recent advisories and incident notes.
Conference Paper
Full-text available
Processor virtualization via migratable objects is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. CHARM++ is an early language/system that supports migratable objects. This paper describes Adaptive MPI (or AMPI), an MPI implementation and extension, that supports processor virtualization. AMPI implements virtual MPI processes (VPs), several of which may be mapped to a single physical processor. AMPI includes a powerful runtime support system that takes advantage of the degree of freedom afforded by allowing it to assign VPs onto processors. With this runtime system, AMPI supports such features as automatic adaptive overlapping of communication and computation, automatic load balancing, flexibility of running on an arbitrary number of processors, and checkpoint/restart support. It also inherits communication optimization from the CHARM++ framework. This paper describes AMPI, illustrates its performance benefits through a series of benchmarks, and shows that AMPI is a portable and mature MPI implementation that offers various performance benefits to dynamic applications.
Conference Paper
Full-text available
We describe a multi-perspective vision studio as a flexible high performance framework for solving complex image processing and machine vision problems on multi-view image sequences. The studio abstracts multi-view image data from image sequence acquisition facilities, stores and catalogs sequences in a high performance distributed database, allows customization of back-end processing services, and can serve custom client applications, thus helping make multi-view video sequence processing efficient and generic. To illustrate our approach, we describe two multi-perspective studio applications, and discuss performance and scalability results.
Conference Paper
Full-text available
Stream processing fits a large class of new applications for which conventional DBMSs fall short. Because many stream-oriented systems are inherently geographically distributed and because distribution offers scalable load management and higher availability, future stream processing systems will operate in a distributed fashion. They will run across the Internet on computers typically owned by multiple cooperating administrative domains. This paper describes the architectural challenges facing the design of large-scale distributed stream processing systems, and discusses novel approaches for addressing load management, high availability, and federated operation issues. We describe two stream processing systems, Aurora* and Medusa, which are being designed to explore complementary solutions to these challenges. This paper discusses the architectural issues facing the design of large-scale distributed stream processing systems. We begin in Section 2 with a brief description of our centralized stream processing system, Aurora (4). We then discuss two complementary efforts to extend Aurora to a distributed environment: Aurora* and Medusa. Aurora* assumes an environment in which all nodes fall under a single administrative domain. Medusa provides the infrastructure to support federated operation of nodes across administrative boundaries. After describing the architectures of these two systems in Section 3, we consider three design challenges common to both: infrastructures and protocols supporting communication amongst nodes (Section 4), load sharing in response to variable network conditions (Section 5), and high availability in the presence of failures (Section 6). We also discuss high-level policy specifications employed by the two systems in Section 7. For all of these issues, we believe that the push-based nature of stream-based applications not only raises new challenges but also offers the possibility of new domain-specific solutions.
Article
Full-text available
In this paper we address the problem of automatically generating job workflows for the Grid. These workflows describe the execution of a complex application built from individual application components. In our work we have developed two workflow generators: the first (the Concrete Workflow Generator CWG) maps an abstract workflow defined in terms of application-level components to the set of available Grid resources. The second generator (Abstract and Concrete Workflow Generator, ACWG) takes a wider perspective and not only performs the abstract to concrete mapping but also enables the construction of the abstract workflow based on the available components. This system operates in the application domain and chooses application components based on the application metadata attributes. We describe our current ACWG based on AI planning technologies and outline how these technologies can play a crucial role in developing complex application workflows in Grid environments. Although our work is preliminary, CWG has already been used to map high energy physics applications onto the Grid. In one particular experiment, a set of production runs lasted 7 days and resulted in the generation of 167,500 events by 678 jobs. Additionally, ACWG was used to map gravitational physics workflows, with hundreds of nodes onto the available resources, resulting in 975 tasks, 1365 data transfers and 975 output files produced.
Conference Paper
Full-text available
"Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources-what we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy, and we discuss the central role played by the intergrid protocols that enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.
Conference Paper
Full-text available
The model for data mining on streaming data assumes that there is a buffer of fixed length and a data stream of infinite length and the challenge is to extract patterns, changes, anomalies, and statistically significant structures by examining the data one time and storing records and derived attributes of length less than N. As data grids, data webs, and semantic webs become more common, mining distributed streaming data will become more and more important. The first step when presented with two or more distributed streams is to merge them using a common key. In this paper, we present two algorithms for merging streaming data using a common key. We also present experimental studies showing these algorithms scale in practice to OC-12 networks.
Conference Paper
Full-text available
Continuous query systems are an intuitive way for users to access streaming data in large-scale scientific applications containing many hundreds of streams. A challenge in these systems is to join streams in such a way that memory is conserved. Storing events that could not possibly participate in a join any longer wastes memory and limits scalability of the query processing system. This paper reports an experiment we conducted to validate an algorithm we developed for adaptive rate, adjustable join windows. We posit that a rate-based strategy can result in memory savings, can be sufficiently responsive to rapid changes in stream rates, and can execute with suitably low overhead. Based on the results, we conclude that the algorithm adds between 0.007% and 2.6% overhead, with significant gains in memory utilization possible depending on the particular workload.
Conference Paper
Full-text available
With the advent of grid computing, more and more high-end computational resources become available for use to a scientist. While this opens up new avenues for scientific research, it makes reliability and fault tolerance of such a system a non-trivial task, especially for long running distributed applications. In order to solve this problem, we present a distributed user-defined checkpointing mechanism within the XCAT3 system. XCAT3 is a framework for Common Component Architecture (CCA) based components consistent with current Grid standards. We describe in detail the algorithms and APIs that are added to XCAT3 in order to support distributed checkpointing. Our approach ensures that the checkpoints are platform independent, minimal in size, and always available during component failures. In addition, our algorithms maintain correctness in the presence of failures and scale well with the number of components and checkpoint size.
Conference Paper
Full-text available
Summary form only given. Computational grids have been proposed as the next generation computing platform for solving large-scale problems in science, engineering, and commerce. There is an enormous amount of interest in applications, called grid workflows in which a number of otherwise independent programs are run in a "pipeline". In practice, there are a number of different mechanisms that can be used to couple the models, ranging from loosely coupled file based IO to tightly coupled message passing. We propose a flexible IO architecture that provides a wide range of mechanisms for building grid workflows without the need for any source code modification and without the need to fix them at design time. Further, the architecture works with legacy applications. We evaluate the performance of our prototype system using a workflow in computational mechanics.
Conference Paper
Full-text available
The dQUOB system's conceptualization of data streams as a database, and its SQL interface to data streams, is an intuitive way for users to think about their data needs in a large scale application containing hundreds if not thousands of data streams. Experience with dQUOB has shown the need for more aggressive memory management to achieve the scalability we desire. This paper addresses the problem with a two-fold solution. The first is replacement of the existing first-come first-served scheduling algorithm with an earliest-job-first algorithm, which we demonstrate to yield better average service time. The second is an introspection algorithm that sets and adapts the sizes of join windows in response to knowledge acquired at runtime about event rates. In addition to the potential for significant improvements in memory utilization, the algorithm presented here also provides a means by which the user can reason about join window sizes. Wide area measurements demonstrate the adaptive capability required by the introspection technique.
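A sketch of the rate-based window adjustment idea (the update rule, bounds, and names are assumptions rather than dQUOB's actual introspection algorithm): the join window is resized as the observed event rate drifts from the rate it was sized for.

```python
def adapt_join_window(current_window, observed_rate, sized_for_rate,
                      min_window=16, max_window=4096):
    """Scale the join window with the runtime event rate, clamped to sane bounds
    (purely illustrative update rule)."""
    if sized_for_rate <= 0:
        return current_window
    proposed = int(current_window * observed_rate / sized_for_rate)
    return max(min_window, min(max_window, proposed))
```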
Conference Paper
Full-text available
Global computational grids bring together distributed computation/communication resources. Beyond this, we envision the emergence of global `service grids', which provide a `market' of application-level distributed services for clients to discover and to request. We study the issue of wide-area service discovery in service grids. We start with an existing basic wide-area service discovery framework. The framework adopts a scalable architecture consisting of a hierarchy of Discovery Servers. We then identify problems with the basic framework, and propose our enhancement of query responsiveness and QoS awareness. The key techniques we introduce include: (1) the addition of QoS feedback capability to clients; and (2) the caching and propagation of discovery results with QoS feedback in the discovery server hierarchy. With these techniques, the enhanced service discovery framework will be faster in finding qualified service providers. Furthermore, it will select a `good' (with respect to the QoS to be delivered) service provider for each querying client, based on QoS feedback
Conference Paper
Full-text available
Data-intensive, interactive applications are an important class of metacomputing (Grid) applications. They are characterized by large, time-varying data flows between data providers and consumers. The topic of this paper is the runtime adaptation of data streams, in response to changes in resource availability and/or in end user requirements, with the goal of continually providing to consumers data at the levels of quality they require. Our approach is one that associates computational objects with data streams. Runtime adaptation is achieved by adjusting objects' actions on streams, by splitting and merging objects, and by migrating them (and the streams on which they operate) across machines and network links. Adaptive streams also react to changes in resource availability detected by online monitoring
Conference Paper
Full-text available
Presents a distributed discovery method allowing individual nodes to gather information about resources in a wide-area distributed system made up of autonomous systems linked together by a network technology substrate. We introduce an algorithm and a model for distributed awareness and a framework for the dynamic assembly of agents monitoring network resources. Whenever an agent needs detailed information about the individual components of another system, it uses the information gathered by the distributed awareness mechanism to identify the target system, then creates a description of a monitoring agent that is capable of providing the information about remote resources, and sends this description to the remote site. There, an agent factory dynamically assembles the monitoring agent. This solution is scalable and is suitable for heterogeneous environments where the architecture and the hardware resources of individual nodes differ, where the services provided by the system are diverse, and where the bandwidth and latency of the communication links cover a broad range
Conference Paper
Full-text available
This paper discusses Nimrod, a tool for performing parametrised simulations over networks of loosely coupled workstations. Using Nimrod the user interactively generates a parametrised experiment. Nimrod then controls the distribution of jobs to machines and the collection of results. A simple graphical user interface which is built for each application allows the user to view the simulation in terms of their problem domain. The current version of Nimrod is implemented above OSF DCE and runs on DEC Alpha and IBM RS6000 workstations (including a 22 node SP2). Two different case studies are discussed as an illustration of the utility of the system
Conference Paper
The notion of agent has of late become popular in the Grid community, as exemplified by several workshops on the use of agents in the Grid. What are agents for the Grid? What is the difference between agents and Web services? These are questions that we address by describing a port of the SoFAR agent framework to Web services in the context of a bioinformatics Grid. In this first paper, we focus our discussion solely on issues at the transport layer. Through an agent communication language (ACL) and an abstract communication model, we have been able to define a generic API to communications, and are able to support multiple protocols, including the XML protocol, the transport mechanism of Web services. This approach facilitates the development of applications, makes our environment future-proof, and promotes the openness of our Grid architecture to third-party developers.
Conference Paper
Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or institutions: what are sometimes called virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations are challenging problems due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Consequently, information services are a vital part of any Grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and hence for planning and adapting application behavior. We present here an information services architecture that addresses performance, security, scalability, and robustness requirements. Our architecture defines simple low-level enquiry and registration protocols that make it easy to incorporate individual entities into various information structures, such as aggregate directories that support a variety of different query languages and discovery strategies. These protocols can also be combined with other Grid protocols to construct additional higher-level services and capabilities such as brokering, monitoring, fault detection, and troubleshooting. Our architecture has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.
Article
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.
Article
this document to be the first to solve a problem known as NUG30. [11] NUG30 is a quadratic assignment problem that was first proposed in 1968 as one of the most difficult combinatorial optimization challenges, but remained unsolved for 32 years because of its complexity.
Article
We have implemented both checkpointing and migration of processes under UNIX as a part of the Condor package. Checkpointing, remote execution, and process migration are different, but closely related ideas; the relationship between these ideas is explored. A unique feature of the Condor implementation of these items is that they are accomplished entirely at user level. Costs and benefits of implementing these features without kernel support are presented. Portability issues, and the mechanisms we have devised to deal with these issues, are discussed in concrete terms. The limitations of our implementation, and possible avenues to relieve some of these limitations, are presented.
Conference Paper
We propose a new resource discovery protocol, REALTOR, which is based on a combination of pull-based and push-based resource information dissemination. REALTOR has been designed for real-time component-based distributed applications in very dynamic or adverse environments. REALTOR supports survivability and information assurance by allowing the migration of components to safe locations under emergencies like external attack, malfunction, or lack of resources. Simulation studies show that under normal and heavy load conditions REALTOR remains very effective in finding available resources with a reasonably low communication overhead. REALTOR: 1) effectively locates resources under highly dynamic conditions, 2) has an overhead that is system-size independent, and 3) works well in highly adverse environments. We evaluate the effectiveness of a REALTOR implementation as part of Agile Objects, an infrastructure for real-time capable, highly mobile Java components.
Article
Recent research on programming models for developing applications on the Grid has proposed component-based models as a viable approach, in which an application is composed of multiple interacting computational objects. We have been developing a framework, called filter-stream programming, for building data-intensive applications that query, analyze and manipulate very large datasets in a distributed environment. In this model, the processing structure of an application is represented as a set of processing units, referred to as filters. In this paper, we develop the problem of scheduling instances of a filter group. A filter group is a set of filters collectively performing a computation for an application. In particular, we seek the answer to the following question: should a new instance be created, or an existing one reused? We experimentally investigate the effects on performance of instantiating multiple filter groups under varying application characteristics.
Conference Paper
This chapter discusses a framework for clustering evolving data streams. The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream render most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream problem. Although such methods address the scalability issues of the clustering problem, they are generally blind to the evolution of the data and do not address the following issues: (1) the quality of the clusters is poor when the data evolves considerably over time. (2) A data stream clustering algorithm requires much greater functionality in discovering and exploring clusters over different portions of the stream. The widely used practice of viewing data stream clustering algorithms as a class of one-pass clustering algorithms is not very useful from an application point of view. The chapter discusses a fundamentally different philosophy for data stream clustering which is guided by application-centered requirements. It divides the clustering process into an online component, which periodically stores detailed summary statistics and an offline component, which uses only this summary statistics. The problems of efficient choice, storage, and use of this statistical data for a fast data stream turns out to be quite tricky. The concepts of a pyramidal time frame in conjunction with a micro-clustering approach are used.
Conference Paper
This paper introduces monitoring applications, which we will show differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS that is currently under construction at Brandeis University, Brown University, and M.I.T. We describe the basic system architecture, a stream-oriented set of operators, optimization tactics, and support for real-time operation.
Conference Paper
The education industry has a very poor record of productivity gains. In this brief article, I outline some of the ways the teaching of a college course in database systems could be made more efficient, and staff time used more productively. These ideas ...
Conference Paper
Relational query optimizers have traditionally relied upon table cardinalities when estimating the cost of the query plans they consider. While this approach has been and continues to be successful, the advent of the Internet and the need to execute queries over streaming sources requires a different approach, since for streaming inputs the cardinality may not be known or may not even be knowable (as is the case for an unbounded stream.) In view of this, we propose shifting from a cardinality-based approach to a rate-based approach, and give an optimization framework that aims at maximizing the output rate of query evaluation plans. This approach can be applied to cases where the cardinality-based approach cannot be used. It may also be useful for cases where cardinalities are known, because by focusing on rates we are able not only to optimize the time at which the last result tuple appears, but also to optimize for the number of answers computed at any specified time after the query evaluation commences. We present a preliminary validation of our rate-based optimization framework on a prototype XML query engine, though it is generic enough to be used in other database contexts. The results show that rate-based optimization is feasible and can indeed yield correct decisions.
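A hedged sketch of what comparing plans by output rate might look like (the pipeline rate model here is a simple stand-in, not the paper's cost model): each candidate plan is scored by the tuple rate it can sustain at its output, and the highest-rate plan wins.

```python
def output_rate(input_rate, operators):
    """Estimate tuples/second leaving a pipeline of (selectivity, cost_per_tuple)
    operators: throughput is capped by processing cost, then thinned by selectivity."""
    rate = input_rate
    for selectivity, cost_per_tuple in operators:
        rate = min(rate, 1.0 / cost_per_tuple) * selectivity
    return rate

def pick_plan(plans, input_rate):
    # Each plan is a list of (selectivity, cost_per_tuple) stages.
    return max(plans, key=lambda plan: output_rate(input_rate, plan))
```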
Article
The recent development of gigabit networking technology, combined with the proliferation of low-cost, high-performance microprocessors, has given rise to metacomputing environments. These environments can combine many thousands of hosts, from hundreds of administrative domains, connected by transnational and world-wide networks. Managing the resources in such a system is a complex task, but is necessary to efficiently and economically execute user programs. In this paper, we describe the resource management portions of the Legion metacomputing system, including the basic model and its implementation. These mechanisms are flexible both in their support for system-level resource management and in their adaptability for user-level scheduling policies. We show this by implementing a simple scheduling policy and demonstrating how it can be adapted to more complex algorithms. Keywords: parallel and distributed systems, task scheduling, resource management, autonomy
Article
A number of applications increasingly rely on, or can potentially benefit from, analysis and monitoring of data streams. To support the processing of streaming data in a Grid environment, we have been developing a middleware system called GATES (Grid-based AdapTive Execution on Streams). Our target applications are those involving high-volume data streams and requiring distributed processing of data arising from a distributed set of sources. This paper addresses the problem of resource allocation in the GATES system. Although resource discovery and resource allocation have been active topics in the Grid community, the pipelined processing and real-time constraints required by distributed streaming applications pose new challenges. We present a resource allocation algorithm that is based on minimal spanning trees. We evaluate the algorithm experimentally and demonstrate that it results in configurations that are very close to optimal, and significantly better than most other possible configurations.
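For illustration, a minimal-spanning-tree computation of the kind such an allocation could build on (Prim's algorithm over a symmetric link-cost matrix; how GATES then maps pipeline stages onto the tree is not shown, and the cost matrix is hypothetical):

```python
import heapq

def minimum_spanning_tree(cost, source=0):
    """Prim's algorithm over a symmetric cost matrix (e.g. link latencies or
    inverse bandwidths between candidate nodes); returns the MST edges."""
    n = len(cost)
    visited = {source}
    frontier = [(cost[source][j], source, j) for j in range(n) if j != source]
    heapq.heapify(frontier)
    edges = []
    while frontier and len(visited) < n:
        weight, u, v = heapq.heappop(frontier)
        if v in visited:
            continue
        visited.add(v)
        edges.append((u, v, weight))
        for j in range(n):
            if j not in visited:
                heapq.heappush(frontier, (cost[v][j], v, j))
    return edges
```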
Article
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course traveled by research ideas as they grow into production systems.
Conference Paper
There are many application classes where the users are flexible with respect to the output quality. At the same time, there are other constraints, such as the need for real-time or interactive response, which are more crucial. This paper presents and evaluates a runtime algorithm for supporting adaptive execution for such applications. The particular domain we target is distributed data mining on streaming data. This work has been done in the context of a middleware system called GATES (grid-based adaptive execution on streams) that we have been developing. The self-adaptation algorithm we present and evaluate in this paper has the following characteristics. First, it carefully evaluates the long-term load at each processing stage. It considers different possibilities for the load at a processing stage and its next stages, and decides if the value of an adaptation parameter needs to be modified, and if so, in which direction. To find the ideal new value of an adaptation parameter, it performs a binary search on the specified range of the parameter. To evaluate the self-adaptation algorithm in our middleware, we have implemented two streaming data mining applications. The main observations from our experiments are as follows. First, our algorithm is able to quickly converge to stable values of the adaptation parameter, for different data arrival rates, and independent of the specified initial value. Second, in a dynamic environment, the algorithm is able to adapt the processing rapidly. Finally, in both static and dynamic environments, the algorithm clearly outperforms the algorithm described in our earlier work and an obvious alternative, which is based on linear-updates.
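A sketch of the binary-search step on an adaptation parameter's specified range (the load-feedback predicate is a stand-in; the algorithm's long-term load evaluation across stages is not reproduced here):

```python
def tune_parameter(low, high, overloaded, iterations=20):
    """Binary search over an adaptation parameter's allowed range [low, high].
    `overloaded(value)` is assumed to report whether the processing stage falls
    behind the data arrival rate at that setting; returns the most accurate
    (here, largest) value that still keeps up."""
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if overloaded(mid):
            high = mid     # too aggressive for current load: back off
        else:
            low = mid      # keeping up: try a higher-accuracy setting
    return low
```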
Conference Paper
The Computational Grid is a promising platform for the efficient execution of parameter sweep applications over large parameter spaces. To achieve performance on the Grid, such applications must be scheduled so that shared data files are strategically placed to maximize reuse, and so that the application execution can adapt to the deliverable performance potential of target heterogeneous, distributed and shared resources. Parameter sweep applications are an important class of applications and would greatly benefit from the development of Grid middleware that embeds a scheduler for performance and targets Grid resources transparently. In this paper we describe a user-level Grid middleware project, the AppLeS Parameter Sweep Template (APST), that uses application-level scheduling techniques [1] and various Grid technologies to allow the efficient deployment of parameter sweep applications over the Grid. We discuss several possible scheduling algorithms and detail our software design. We then describe our current implementation of APST using systems like Globus [2], NetSolve [3] and the Network Weather Service [4], and present experimental results.
Conference Paper
Increasingly, a number of applications rely on, or can potentially benefit from, analysis and monitoring of data streams. Moreover, many of these applications involve high volume data streams and require distributed processing of data arising from a distributed set of sources. Thus, we believe that a grid environment is well suited for flexible and adaptive analysis of these streams. This paper reports the design and initial evaluation of a middleware for processing distributed data streams. Our system is referred to as GATES (grid-based adaptive execution on streams). This system is designed to use the existing grid standards and tools to the extent possible. It flexibly achieves the best accuracy that is possible while maintaining the real-time constraint on the analysis. We have developed a self-adaptation algorithm for this purpose. Results from a detailed evaluation of this system demonstrate the benefits of distributed processing, and the effectiveness of our self-adaptation algorithm.
Conference Paper
Wide area network (WAN)-based distributed computing (DC) has become an active field of research, especially under the label of grid computing. On the other hand, research on WAN-based real-time (RT) DC has remained in an embryonic stage. However, the situation is changing. The continuous increase in the availability of optical channels, called lambdas, along with a steady decrease of their costs, have given rise to efforts to establish an optical network infrastructure in which the possibility of dynamically allocating entire end-to-end light-paths to different message streams created by a moderate number of important applications is real. With such an infrastructure, there is no major obstacle to facilitating RT DC. The notion of RT distributed virtual computer (DVC), which can also be viewed as an RT sub-grid, is introduced. The challenging issues in adapting some promising technologies established for RT DC in LAN environments to enable realization of RT DVCs in WAN environments are then discussed. The programming model and resource management are the focus of the discussion.
Conference Paper
Research on programming models for developing applications in the Grid has proposed component-based models as a viable approach, in which an application is composed of multiple interacting computational objects. We have been developing a framework, called filter-stream programming, for building data-intensive applications that query, analyze and manipulate very large data sets in a distributed environment. In this model, the processing structure of an application is represented as a set of processing units, referred to as filters. We develop the problem of scheduling instances of a filter group. A filter group is a set of filters collectively performing a computation for an application. In particular we seek the answer to the following question: should a new instance be created, or an existing one reused? We experimentally investigate the effects of instantiating multiple filter groups on performance under varying application characteristics
Conference Paper
Checkpointing of parallel applications can be used as the core technology to provide process migration. Both checkpointing and migration are important issues for parallel applications on networks of workstations. The CoCheck environment which we present in this paper introduces a new approach to providing checkpointing and migration for parallel applications. CoCheck sits on top of the message passing library and achieves consistency at a level above the message passing system. It uses an existing single-process checkpointer which is available for a wide range of systems. Hence, CoCheck can be easily adapted both to different message passing systems and to new machines.
Conference Paper
Adaptive load distribution is necessary for parallel applications to co-exist effectively with other jobs in a network of shared, heterogeneous workstations. We present three methods that provide such support for PVM applications. Two of these methods, MPVM (migratable PVM) and UPVM (user-level PVM), adapt to changes in the workstation environment by transparently migrating the virtual processors (VPs) of the parallel application. A VP in MPVM is a Unix process, while UPVM defines lightweight process-like VPs. The third method, ADM (adaptive data movement), is a programming methodology for writing programs that perform adaptive load distribution through data movement. These methods are discussed and compared in terms of effectiveness, usability and performance
Conference Paper
This paper describes an approach to supporting efficient processor virtualization and dynamic load balancing for message-based, parallel programs. Specifically, a user-level process package (UPVM) for SPMD-style PVM applications is presented. UPVM supports light-weight virtual processors that are transparently and independently migratable. It also implements a source-code compatible PVM interface, which means that existing PVM programs only need to be recompiled and re-linked. The performance of UPVM is discussed and compared with that of standard PVM