Article

SNAP: A Protocol for Negotiation of Service Level Agreements and Coordinated Resource Management in

Abstract

A fundamental problem with distributed applications is to map activities such as computation or data transfer onto a set of resources that will meet the application's requirement for performance, cost, security, or other quality of service metrics. An application or client must engage in a multi-phase negotiation process with resource managers, as it discovers, reserves, acquires, configures, monitors, and potentially renegotiates resource access. Current approaches to resource management tend to specialize for specific classes of resource (processor, network, etc.), and have addressed coordination across resources in a limited fashion, if at all. We present a generalized resource management model in which resource interactions are mapped onto a well-defined set of platform-independent service level agreements (SLAs). We instantiate this model in the Service Negotiation and Acquisition Protocol (SNAP), which provides lifetime management and at-most-once creation semantics for remote SLAs. The result is a resource management framework for distributed systems that we believe is more powerful and general than current approaches. We explain how SNAP can be deployed within the context of the Globus Toolkit.
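
The two guarantees named in the abstract, lifetime management and at-most-once creation of remote SLAs, can be illustrated with a small sketch. This is not SNAP's actual message format or API: the class and method names (`SLAManager`, `create`, `set_lifetime`) and the use of a client-chosen identifier to detect retried requests are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SLA:
    sla_id: str        # client-chosen identifier (hypothetical naming)
    terms: dict        # opaque agreement terms
    expires_at: float  # absolute expiry time, in seconds


class SLAManager:
    """Toy resource manager: idempotent creation plus expiring agreements."""

    def __init__(self):
        self._slas = {}

    def create(self, sla_id, terms, lifetime, now):
        self._reap(now)
        existing = self._slas.get(sla_id)
        if existing is not None:
            # A retried request (e.g. after a lost reply) must not create
            # a second agreement: return the one already in place.
            return existing
        sla = SLA(sla_id, dict(terms), now + lifetime)
        self._slas[sla_id] = sla
        return sla

    def set_lifetime(self, sla_id, lifetime, now):
        self._reap(now)
        sla = self._slas[sla_id]  # raises KeyError if unknown or expired
        sla.expires_at = now + lifetime
        return sla

    def active(self, now):
        self._reap(now)
        return sorted(self._slas)

    def _reap(self, now):
        # Agreements whose lifetime has elapsed are reclaimed.
        expired = [k for k, s in self._slas.items() if s.expires_at <= now]
        for k in expired:
            del self._slas[k]
```

In this sketch the caller passes `now` explicitly so the behaviour is deterministic; a client that wants to keep an agreement alive would call `set_lifetime` before the current expiry, and an agreement that is never refreshed simply disappears.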


... The second case assumes availability of information on job start and execution times by means of mechanisms such as advance resource reservation and job runtime and queue waiting time prediction. There are many attempts towards this approach [6,34,22,35] in Grid research. ...
... Traditional computing resources are exposed by resource providers through well-defined remote interfaces of computing services on which users perform certain operations, in particular job submission, control and monitoring. For this reason, a lot of efforts around resource management in Grids have so far been focused only on the definition of standard resource management protocols exposing resources remotely as well as standard mechanisms for expressing resource and job requirements [5,6]. ...
... Although most of available Grid environments do not provide knowledge about job start and completion times there are attempts in a Grid community to realize Grid scheduling based on QoS and agreements instead of "best effort" approaches. To overcome these limitations many efforts have been made on issues such as service level agreements (SLA) and negotiations [6], advanced resource reservations [34] and performance prediction [33] as described in Section 2. Aforementioned trends and needs in a Grid community show that efficient methods of Grid scheduling for such a model are gaining more and more importance. In this section we present a multicriteria approach to this problem. ...
... In [89], Czajkowski et al. define a protocol for SLA negotiation in the context of Grid computing. Within this definition there exists a resource description language, which is largely equivalent to the one presented in this Chapter. ...
... One difference between the two is the additional semantics in the present work for resource clustering in virtual, dynamic groups. In addition, advance reservation primitives were presented here, which were not within the scope of [89]. ...
Article
The present dissertation concerns the area of Service Computing. More specifically, it contributes to the topic of enabling IT service stacks with dependability, such that they can be used even further in pragmatic business environments and applications. The instrument used for this purpose is a Service Level Agreement (SLA). The main focus is on SLA Hierarchies, which reflect corresponding Service Hierarchies. SLAs may be established manually, or automatically among software agents; it is mainly the latter case that is considered here. The thesis contributes by means of a formal problem definition for the construction of SLA hierarchies using a translation process, a management architecture, a formal model for defining penalties and a representation that facilitates the processing of SLAs. Using these tools, it is shown that automated SLA management in hierarchical setups is possible, through an application to Multi-Domain Infrastructure-as-a-Service. Within this specific technical area, different SLA-based resource capacity planning approaches are examined via simulation -- both for online and offline planning. The former case concerns normal runtime operations, and the thesis examines two greedy algorithms with regard to their energy-savings efficiency and their performance. In the latter case, a resource-scarce environment is simulated with the purpose of minimizing penalties from already established SLAs. This is achieved via formally-defined combinatorial models, which are solved and compared to two greedy algorithms.
... (c) Determining the specific amount of concession to each negotiator's trading partner separately, instead of the same amount to all. Although there are many agent-based systems for negotiation in e-commerce (e.g., just to name a few: NDF [32], 2-phase negotiation [33], service negotiation [34], Kasbah [35], Tete-à-Tete [36], MDA and EMDA [37][38][39][40], Zhao and Li [41], SNAP [42][43][44] and An [45]), the strategies of most of them make the same concession amount for all negotiators' trading partners. In contrast, our work considers different concession amount for different negotiator's trading partners (by applying a multicriteria decision function) which provides more flexibility in keeping the chance of making deal (by computing rational and sufficiently minimum price) with more than one opponent. ...
... Whereas the agents in NDF [32], 2-phase negotiation [33], service negotiation [34], Kasbah [35], Tete-à-Tete (extended Kasbah, which focuses on multiple-issue negotiation rather than single-issue negotiation) [36], MDA and EMDA [37][38][39][40], Zhao and Li [41] and An [45] considered the issue of time constraint, the agents in SNAP [42][43][44] and policy-driven negotiation [46] did not consider this issue in designing the agents. ...
... 2-phase negotiation [33], MDA and EMDA [37][38][39][40] and [45] modeled market dynamics in their concession making strategies, but NDF [32], service negotiation [34], Kasbah [35], Tete-à-Tete [36], SNAP [42][43][44] and policy-driven negotiation [46] and [41] did not consider the market factors in making concession amount. ...
Article
Providing an efficient resource allocation mechanism is a challenge for computational grids due to large-scale resource sharing and the fact that Grid Resource Owners (GROs) and Grid Resource Consumers (GRCs) may have different goals, policies, and preferences. In a real world market, various economic models exist for setting the price of grid resources, based on supply-and-demand and their value to the consumers. In this paper, we discuss the use of a multiagent-based negotiation model for interaction between GROs and GRCs. For realizing this approach, we designed the Market- and Behavior-driven Negotiation Agents (MBDNAs). The negotiation strategies adopted by MBDNAs take into account the following factors: Competition, Opportunity, Deadline and Negotiator’s Trading Partner’s Previous Concession Behavior. In our experiments, we compare MBDNAs with MDAs (Market-Driven Agent), NDF (Negotiation Decision Function) and Kasbah in terms of the following metrics: total task completion and budget spent. The results show that by taking the proposed negotiation model into account, MBDNAs outperform MDAs, NDF and Kasbah.
... A grid user can submit a job directly to a resource or can use a middleware that chooses a suitable resource for his job. Although some approaches in [Czajkowski et al. 2002] [Raman et al. 1998] [GRMS 2004] achieve the same goal, that is, to allocate a job to a resource, they lack a negotiation process to give intelligence to the resource allocation problem. This negotiation process should allow users and resources to define the conditions about how the grid service should be delivered. ...
... The work in [Czajkowski et al. 2002] presents the development of a protocol for negotiation called Service Negotiation and Acquisition Protocol (SNAP). SNAP uses three types of Service Level Agreements (SLAs): Task, Resource, and Binding. ...
... Works in [Czajkowski et al. 2002] [Raman et al. 1998] [GRMS 2004] do not present a suitable negotiation process. We believe that it is a way to give flexibility to the job submission by adding guarantees for the service delivery. ...
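
The three SLA types named in the excerpt above (Task, Resource, and Binding) can be pictured as three small record types, with a binding tying one task to one resource. The following is only a sketch: the field names and the capacity check inside `bind` are illustrative assumptions, not definitions taken from the SNAP paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSLA:
    task_id: str
    required: dict   # what the task needs, e.g. {"cpus": 2}


@dataclass(frozen=True)
class ResourceSLA:
    resource_id: str
    capacity: dict   # what the provider has promised, e.g. {"cpus": 4}


@dataclass(frozen=True)
class BindingSLA:
    task_id: str
    resource_id: str


def bind(task, resource):
    """Return a BindingSLA if the resource promise covers the task, else None.

    The coverage rule used here (every required amount must be available)
    is an assumption made for this example.
    """
    for name, amount in task.required.items():
        if resource.capacity.get(name, 0) < amount:
            return None
    return BindingSLA(task.task_id, resource.resource_id)
```

Keeping the three records separate means a task description and a resource promise can each be negotiated on their own and associated later, which is the separation the citing works describe.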
Article
Full-text available
Computational grids offer access to a large number of computational resources for the execution of parallel and distributed applications. This work presents the architecture of the HyperGrid, a platform for the execution of distributed and parallel applications based on the message passing paradigm and written with MPI (Message Passing Interface). The HyperGrid is based on a virtual hypercube, that is, a virtual network over the Internet. The Virtual Distributed Hypercube Algorithm (DiVHA) is used to maintain the hypercube and to monitor the system resources. The hypercube provides the necessary resource location transparency and also hides resource heterogeneity, providing a fault-tolerant environment that is capable of self-reconfiguration when faults occur.
... The goal of this research is to design and build a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications. Monitoring, querying, and reacting to changes in the state of a distributed system are core components of applications such as system management [11,36,71,80,84,93], service placement [35,94], grid scheduling [4,7,8,17,21,47,34,32,54,76], data sharing and caching [61,70,74,79,83,100], sensor monitoring and control [48,56], multicast tree formation [14,15,86,78,81], and naming and request routing [16,19]. We therefore speculate that an SDIMS in a networked system would provide a "distributed operating systems backbone" and facilitate the development and deployment of new distributed services. ...
... Using an existing grid information system API will allow a broad range of existing applications and services to make immediate use of our improved abstraction as well as enable new, more demanding grid applications. We will focus particular attention on developing a grid scheduler [4,7,8,17,21,47,34,32,54,76] based on the Community Scheduling Framework (CSF) [77] suitable for deployment by the Texas Advanced Computing Center across both campus-scale and state-scale grids. This scheduler will use SDIMS both to monitor system health and performance within clusters and to aggregate information across federations of clusters to guide global scheduling decisions. ...
... The goal of the proposed research is to design and build a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications. Monitoring, querying, and reacting to changes in the state of a distributed system are core components of applications such as system management [11,36,71,80,84,93], service placement [35,94], grid scheduling [4,7,8,17,21,47,34,32,54,76], data sharing and caching [61,70,74,79,83,100], sensor monitoring and control [48,56], multicast tree formation [14,15,86,78,81], and naming and request routing [16,19]. We therefore speculate that an SDIMS in a networked system would provide a "distributed operating systems backbone" and facilitate the development and deployment of new distributed services. ...
Article
A Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems can serve as a basic building block for a broad range of large-scale distributed applications, simplifying the design, development, and deployment of such services. In this document, we outline four key requirements such an aggregation system should satisfy to be useful as a general middleware building block: scalability with both nodes and data attributes, flexibility to accommodate a broad range of services, administrative autonomy and isolation for availability and security, and robustness to reconfigurations in the system. We propose a new aggregation framework that leverages Distributed Hash Tables (DHTs) and a new aggregation abstraction that builds on a previously proposed abstraction in Astrolabe. We also present details of several significant applications that we propose to build on top of SDIMS.
... Service Negotiation and Acquisition Protocol (SNAP) is a well-known protocol aimed at managing access to and use of distributed computing resources in a coordinated fashion by means of Service Level Agreements (SLAs) [36]. SNAP coordinates the resource management through three types of SLAs, which separate task requirements, resource capabilities, and binding of tasks to resources. ...
... Therefore, we believe that most of the future work on resource co-allocation will continue to follow this approach as well. In addition, we have seen more researchers working on negotiation mechanisms for co-allocation requests in order to better satisfy user demand and resource provider requirements [36,44,75,107]. Negotiation is an important mechanism to avoid providers disclosing private information, such as load and resource capabilities, to the metascheduler. ...
... An increasing number of researchers are working on negotiation mechanisms for co-allocation requests in order to better satisfy user demand and resource provider requirements [36,44,75,107]. In addition, negotiation is an important mechanism to avoid providers disclosing private information, such as load and resource capabilities, to the metascheduler. This thesis proposed the use of execution offers for deadline-constrained BoT applications, which is the initial step towards the development of negotiation protocols. ...
Article
Full-text available
In this paper a new approach for quantum computer simulations is presented. The proposal is creating a simulator where the main concern is not simply the results of the algorithm for a given input. Instead, this simulator will imitate, as close as possible, the internal behavior of a real quantum computer. In order to do that, Distributed Computing is necessary.
... A grid user can submit a job directly to a resource or can use a middleware that chooses a suitable resource for his job. Although some approaches in [Czajkowski et al. 2002][Raman et al. 1998][GRMS 2004] achieve the same goal, that is, to allocate a job to a resource, they lack a negotiation process to give intelligence to the resource allocation problem. This negotiation process should allow users and resources to define the conditions about how the grid service should be delivered. ...
... The work in [Czajkowski et al. 2002] presents the development of a protocol for negotiation called Service Negotiation and Acquisition Protocol (SNAP). SNAP uses three types of Service Level Agreements (SLAs): Task, Resource, and Binding. ...
... Works in [Czajkowski et al. 2002][Raman et al. 1998][GRMS 2004] do not present a suitable negotiation process. We believe that it is a way to give flexibility to the job submission by adding guarantees for the service delivery. ...
Article
Full-text available
Grid technology allows the sharing of resources within groups of individuals or organizations. Job submission in a grid initially requires the identification of a list of servers that meet a certain job description. Next, it is necessary to select the best server from this list. No current research associates the choice of the server with the service delivery conditions. In order to incorporate quality into the grid service, it is important to know when the job will finish and what cost and quality factors are involved. We present here a Multi-Agent System that chooses the best place to run a grid job by making use of negotiation. The prediction of job execution is achieved with a case-based reasoning technique and the negotiation flexibility is delimited by resource policies. Our approach models different forms of negotiation, identified as multi-issue, bilateral and chaining negotiations.
... After that, the SLA becomes a standard protocol of business applications and Web Services [9]. Generally, two main specifications are designed to describe the SLA: 1) the Service Negotiation and Acquisition Protocol (SNAP), which supports reliable management of remote SLAs and describes the negotiation process in the system [10]; and 2) the conceptual SLA framework for Cloud Computing, which describes the main characteristics of SLAs in Cloud Computing and explains the SLA parameters specified by metrics for the four types of cloud services (i.e., IaaS, SaaS, PaaS, Storage as a Service) [6]. ...
... Community Scheduler, which is an entity that acts as an intermediary between the community and its resources, and File Transfer, which restricts the activity of sending and receiving requests from/to the user (e.g. transferring a file) with a deadline time [10]. ...
Article
Full-text available
The Service Level Agreement (SLA) has become an important issue, especially for Cloud Computing and online services that are based on the ‘pay-as-you-use’ model. Establishing Service Level Agreements (SLAs), which can be defined as a negotiation between the service provider and the user, is needed for many types of current applications, such as E-Learning systems. The work in this paper presents an idea for optimizing the SLA parameters to serve any E-Learning system over the Cloud Computing platform, defining the negotiation process, a suitable framework, and the sequence diagram to accommodate E-Learning systems.
... Researchers [3,5,7,8,9,10] also proposed several negotiation strategies. In [3], three useful negotiation strategies including "greedy", "bumping" and "NCost" have been proposed. ...
... Wainer et al. proposed strategies depending on availability of user information, time and preferences [8]. Kraus, Czajkowski proposed game theoretic and SLA based strategies [9,10]. ...
Article
Full-text available
Resolving conflicts using automatic negotiation for agent-based meeting scheduling is a challenge. In order to negotiate with all meeting participants strategically, a set of negotiation strategies and a strategy selection model are required. This research focuses on developing a strategy selection model for selecting an appropriate strategy from a set of different strategies to resolve or avoid meeting conflicts. The strategy selection model is based on analyzing historical data, current meeting scheduling, participants’ profile and preference data using AI techniques.
... Ideally, we believe that WS-Agreement should adopt a hybrid solution between these extremes. Some discovery processes will undoubtedly be subsumed by matchmakers, brokers, and other negotiating intermediaries [3,5]. However, abstracted characterization of providers should be possible in order to use existing registry models such as OGSI's ServiceGroup [8]. ...
... Beyond the practical issue of generic program code for reuse in WS-Agreement implementations, some use cases that we envision for WS-Agreement require generic behavior in deployed implementations. While intermediating negotiators may be specialized for a given problem domain [5], e.g., the recently announced Platform CSF for computational job metascheduling, our experience with co-allocators [4] suggests that a number of intermediary behaviors are generic; in fact, much of the complexity of intermediaries comes from fault-handling behavior during the asynchronous negotiation in addition to any domain-specific planning mechanisms. We believe that an important category of WS-Agreement implementation will provide generic brokering or aggregate-scheduling capability "out of the box," while admitting run-time extension of the domain-specific negotiating details. ...
Article
The GRAAP working group of the Global Grid Forum is drafting a specification for the management of resources and services using negotiated service level agreements in a Web services environment (WS-Agreement). This memo discusses ongoing design considerations for this activity, focusing on the desire to strike a balance between goals for flexibility, reusability, and interoperability of systems utilizing the WS-Agreement interface. If WS-Agreement services are to be discovered and utilized by clients in a large-scale environment, their extended negotiation capabilities, policies, and offered services must be available for search, inspection, and comparison. The GRAAP working group faces difficult design challenges to achieve these goals while utilizing Web services technologies for term and constraint languages, negotiation messages, negotiator characterization, and negotiator discovery.
... Distributed systems are one of the most important technological achievements in recent years [23,24], which have had a significant impact on the development of modern computing. The scope of distributed systems applications in everyday life is very wide [25,26], from local systems such as cars, ships, aircraft to global systems of millions of nodes used for data processing services; from simple built-in systems consisting of very small and simple sensors to those containing powerful computational components [27]; from built-in systems to those that support advanced interactive user interfaces. ...
Article
Full-text available
Electrical capacitance tomography (ECT) is one of the non-invasive visualization techniques which can be used for industrial process monitoring. However, acquiring images through 3D ECT often requires performing time-consuming complex computations on large matrices. Therefore, a new parallel approach for 3D ECT image reconstruction is proposed, which is based on the application of multi-GPU, multi-node algorithms in a heterogeneous distributed system. This solution speeds up the required data processing. A distributed measurement system with a new framework for parallel computing and a special plugin dedicated to ECT are presented in the paper. The computing system architecture and its main features are described. Both data distribution and transmission between the computing nodes are discussed. System performance was measured using the LBP and Landweber reconstruction algorithms, which were implemented as a part of the ECT plugin. Application of the framework with a new network communication layer reduced data transfer times significantly and improved the overall system efficiency.
... Service Level Agreements (SLAs): There has been a significant amount of research on various topics related to SLAs. The usage of resource management in grids has been considered in [28]; issues related to specification of SLAs have been considered in [29]; and topics related to the economic aspects of SLA usage for service provisioning through negotiation between consumers and providers are considered in [30]. A common characteristic (and/or inherent assumption) in the above-referenced body of prior work is that the customer's SLAs are immutable. ...
Preprint
Full-text available
In hosting environments such as IaaS clouds, desirable application performance is usually guaranteed through the use of Service Level Agreements (SLAs), which specify minimal fractions of resource capacities that must be allocated for use for proper operation. Arbitrary colocation of applications with different SLAs on a single host may result in inefficient utilization of the host's resources. In this paper, we propose that periodic resource allocation and consumption models be used for a more granular expression of SLAs. Our proposed SLA model has the salient feature that it exposes flexibilities that enable the IaaS provider to safely transform SLAs from one form to another for the purpose of achieving more efficient colocation. Towards that goal, we present MorphoSys: a framework for a service that allows the manipulation of SLAs to enable efficient colocation of workloads. We present results from extensive trace-driven simulations of colocated Video-on-Demand servers in a cloud setting. The results show that potentially significant reductions in wasted resources (by as much as 60%) are possible using MorphoSys.
... Network Resource Negotiation: RNAP [26] and SNAP [27] are two examples of negotiation protocols for networking and Grid computing resources, respectively. Both protocols are based on querying the resource provider for the availability of a resource before making a reservation. ...
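
The query-then-reserve pattern described in this excerpt can be sketched as follows. The `Provider` class and the `negotiate` helper are hypothetical names for illustration and do not reproduce RNAP or SNAP messages. The sketch also assumes that a reservation may still be refused after a positive availability answer, since the provider's state can change between the two calls.

```python
class Provider:
    """Toy resource provider with a fixed pool of interchangeable units."""

    def __init__(self, name, free_units):
        self.name = name
        self.free_units = free_units

    def query(self, units):
        # Availability check only: nothing is held for the caller.
        return self.free_units >= units

    def reserve(self, units):
        # Re-check at reservation time; availability may have changed.
        if self.free_units < units:
            return False
        self.free_units -= units
        return True


def negotiate(providers, units):
    """Return the name of the first provider that accepts, or None."""
    for p in providers:
        if p.query(units) and p.reserve(units):
            return p.name
    return None
```

Separating the query from the reservation lets a client compare several providers cheaply before committing, at the cost of having to handle a reservation that fails despite an earlier positive answer.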
Article
Full-text available
To interconnect research facilities across wide geographic areas, network operators deploy science networks, also referred to as Research and Education (R&E) networks. These networks allow experimenters to establish dedicated circuits between research facilities for transferring large amounts of data, by using advanced reservation systems. Intercontinental dedicated circuits typically require coordination between multiple administrative domains, which need to reach an agreement on a suitable advance reservation. The success rate of finding an advance reservation decreases as the number of participant domains increases for traditional systems because the circuit is composed over a single path. To improve provisioning of multi-domain advance reservations, we propose an architecture for end-to-end service orchestration in multi-domain science networks that leverages software-defined exchanges (SDX) for providing multi-path, multi-domain advance reservations. We have implemented an orchestrator for multi-path, multi-domain advance reservations and an SDX to support these services. Our orchestration architecture enables multi-path, multi-domain advance reservations and improves the reservation success rate from 50% in single path systems to 99% when four paths are available.
... In general, SLA management has been studied in the past few years and it has been mainly concentrated on the definition of languages and the specification of standards for SLA [6,23]. However, these standards are still evolving as they present some limitations. ...
Conference Paper
Business services arguably play a central role in service-based information systems as they fill in the gap between the technicality of Service-Oriented Architecture and the business aspects captured in Enterprise Architecture. Business services have distinctive features that are not typically observed in Web services, e.g. significant portions of the functionality of business services might be executed in a human-mediated fashion. As such, a service level agreement (SLA) should be described as a mixture of human-mediated functionality (e.g., service penalty) and computer-interpretable measurement (e.g., reliability, payment). In this paper, we propose a formal framework for reasoning about the SLAs from the perspective of services bundling, the practice of innovatively organizing business services into a bulkier service offering that creates new values. Specifically, we (a) represent the multi-level SLA of a business service in terms of service reliability, payment and penalty using the mathematical structure of a semiring; (b) provide formality for aggregating SLAs of the constituent services that make up the service bundling; (c) make multi-level SLAs of a bundled service technically comparable. The main contribution of this work is a machinery for handling a large number of SLAs generated through services bundling, allowing service consumers to pick the right service offering according to their preference.
... As the participating parties of grid market are independent bodies, negotiation activities are required [2]. An application or client must engage in a multi-phase negotiation process with resource managers, as it discovers, reserves, acquires, configures, monitors and potentially renegotiates resource access [3]. In this context, negotiation has emerged as one of the important activities in the grid market. ...
Article
Problem statement: As grid resources are geographically distributed, efficient resource discovery and management has become one of the important requirements. Besides, Grid users are independent identities and negotiation is necessary for reconciling their diverse characteristics. Therefore a special mechanism was required to negotiate and discover the required resource, or a similar resource as an alternative when discovery fails. Moreover, the quality of the service being provided in the grid environment depends on both functional and Non-Functional Requirements (NFR), but conflicts between NFRs are not yet resolved effectively. The objectives were to discover the requested resources for the requester, to provide compromise alternate resources by negotiation when resource discovery fails so as to increase the success rate of the agent, to provide knowledge for efficient management of resources, and to improve quality of service by considering NFRs. Approach: A system, Agent Based Grid Resource Discovery with Negotiated Alternate Solution and Non-Functional Requirement Preferences (AGRD_NFRP), was proposed to provide an expeditious resource and the most relevant alternate resource when discovery fails. Four types of intelligent and mobile agents were proposed for judicious management of resources to the advantage of resource providers and requesters in ensuring speedy execution of processes. Resource discovery, negotiation and alternate solution were handled by these agents. In order to improve the quality of the service, the non-functional requirements of the grid user request with their preferences were identified and conflicts among them were analyzed using fuzzy rules. Results: The results showed that the proposed AGRD_NFRP system produces a consistently higher success rate by providing alternate solutions and getting knowledge from the cognitive agent. Quality of the service was enriched by prioritizing the preferences of the grid user.
Conclusion: On numerous occasions, grid users face non-availability of high-end resources for completing the task on hand. In this context, the approach outlined in this research is most appropriate, convenient and efficient. The AGRD_NFRP system proposed herein played a crucial role in bridging the seemingly wide gap between resource requester and resource owners.
... In 2002 Czajkowski et al. [67] presented their work on a Service Negotiation and Acquisition Protocol, called SNAP. This protocol aims to allow remote management of Service Level Agreements beyond the borders of different resource providers. ...
Thesis
Full-text available
This thesis describes a Service Level Agreement schema for the High Performance Computing domain and the corresponding architecture for SLA management, both of which are developed on the basis of three different use cases.
... Commitment and assurances are specified in terms of Service Level Agreements (SLAs). SLAs either provide some measurable capability or perform a specific task, and thereby allow Grid users to know what is expected from a service without requiring detailed knowledge of the service providers' policies [64,65]. ...
... It is to be noted that there is a deficiency of re-allocation, which results in a significant challenge for users and brokers, because agents are out of communication among themselves and have no knowledge of the resources, while both users and brokers need agents to accomplish resource allocation in a short period of time. So utilizing quasi-transactional mechanisms, namely advance reservation and agreement for resource allocation (Czajkowski et al., 2002), may result in decomposition of the co-allocation problem into a set of simpler independent operation types with separate learning states for each agent. ...
Article
Full-text available
As computing technology improves and the accessibility of computing resources increases, the demands put on resources get higher and higher. A grid is a large-scale, heterogeneous, dynamic collection of independent systems, geographically distributed and interconnected with high speed networks. In a grid, resource allocation is the process of allocating user jobs to CPUs. These jobs are divided into tasks which are allocated to different computers on the grid for execution. Resource allocation is one of the critical features of grid technology. We found that resource heterogeneity has a great impact on resource allocation, which is quite significant in terms of performance, reliability, robustness and scalability. Indeed, the system robustness increases as the system complexity increases. In other words, resource allocation is also an NP complete problem where there is no final solution. The main objective of this study is to review the various grid resource allocation strategies, which will in turn serve as a guide for researchers and our vision for future research directions. To facilitate further developments in the area, it is essential to survey and review the existing body of knowledge. Therefore, in this chapter, we have studied and classified various ways to achieve an optimum solution. Drawing on operations research management (game theory and the transportation method), which has been widely used in grid resource allocation for optimum solutions, we will design and evaluate a new algorithm for resource allocation using either simulation or a real grid environment.
... Traditionally for outsourced infrastructure the Service Level Agreement (SLA) contract provided knowledge of the quality of service expected comparable with internal service provision (Willcocks et al., 2011a). Such contracts were mechanisms for negotiating the relationship between IT vendor and client, establishing trust and anticipated risk (Czajkowski et al., 2002;Buyya et al., 2009b). At present, however, cloud SLAs are often weak and ineffectual for this purpose: 'In the cloud market space, meaningful SLAs are few and far between, and even when a vendor does have one, most of the time it is toothless' (Durkee, 2010). ...
Article
Full-text available
Cloud computing has become central to current discussions about corporate information technology. To assess the impact that cloud may have on enterprises, it is important to evaluate the claims made in the existing literature and critically review these claims against empirical evidence from the field. To this end, this paper provides a framework within which to locate existing and future research on cloud computing. This framework is structured around a series of technological and service ‘desires’, that is, characteristics of cloud that are important for cloud users. The existing literature on cloud computing is located within this framework and is supplemented with empirical evidence from interviews with cloud providers and cloud users that were undertaken between 2010 and 2012. The paper identifies a range of research questions that arise from the analysis.
... It leverages the Globus Monitoring and Discovering System (MDS) to aggregate resource information and enforces scheduling policies based on an auctioning mechanism [92]. To process resource negotiation, it is required that resource brokers use a common protocol like SNAP [93], and the negotiation result is difficult to predict. Gridway's scheduling system follows the "greedy approach", implemented by the round-robin algorithm. ...
Article
Full-text available
Biomedical, translational and clinical research through increasingly complex computational modeling and simulation generate enormous potential for personalized medicine and therapy, and an insatiable demand for advanced cyberinfrastructure. Metascheduling that provides integrated interfaces to computation, data, and workflow management in a scalable fashion is essential to advanced pervasive computing environment that enables mass participation and collaboration through virtual organizations (VOs). Avian Flu Grid (AFG) is a VO dedicated to members from the international community to
... A language was described for unambiguous and precise specification of SLAs, and a monitoring engine that aggregates measurements from multiple sites. The SNAP protocol [4] was developed for negotiating SLAs and coordinating resource management in distributed systems. It presents a model for managing the process of negotiating access to, and the use of, resources through the definition of a framework within which reservation, acquisition and task submission can be expressed for any resource in a uniform fashion. ...
Article
Full-text available
Recent advances in Grid computing have led to real world deployments of grid implementations in the eScience and commercial domains. With the increasing demands on these resources, the role of Service Level Agreements (SLA) becomes hugely important, as an SLA is the means used to define the terms of usage and obligations for the relevant parties. However, the current methods used to manage grid resources are not capable of supporting SLAs. The system has to be robust and be capable of adapting to changing resource demands from continuous SLA requests. In this paper we present our adaptive approach to SLA management for Grid systems based on the adaptive mechanisms of cognitive control in the human brain and an ontological approach to SLA decomposition.
... QoS guarantees based on advance reservations were investigated in [2]. Service Level Agreement protocols for Web services, and multi-agent systems are specified in [3][4][5][6][7]. A general-purpose XML based policy framework is defined in [8]. ...
Article
As a result of ever-increasing network performance, and growing needs in the science community for large-scale computations on huge data sets, the research fields of parallel computing and distributed computing merged into what became known as Grid computing. A Grid is a system that coordinates decentralized resources, such as CPU, disk space and bandwidth, while leveraging ubiquitous communication protocols to solve complex problems. Due to the typically large scale of the Grid, the resources are often heterogeneous in nature, which puts a high burden on the infrastructure a.k.a. middleware that shields the end users from the back-end complexity. On the other hand the feature heterogeneity of the Grid resources also provides an enormous opportunity to leverage a multitude of Quality of Service (QoS) characteristics. In order to reason about provided as well as expected QoS of resources, a common language must be used. These languages are often referred to as policy languages. The contract between a resource and a user that describes promised QoS is frequently referred to as a Service Level Agreement (SLA). It is a non-trivial task to make sure that SLA contracts in a Grid environment are adhered to, because it often spans many software and hardware abstraction layers. Coordinated monitoring and management software is thus vital for such a system. Policing contracts is not simply a matter of making sure that users don't exceed their QoS grants, and that resources fulfill their promises. Such a static view may in fact lead to worse resource utilization and less service offered to the user. Instead intelligent policing agents could be used by the infrastructure as an adaptive, self-healing, autonomous mechanism to optimize various resource or user goals. This more dynamic and less restrictive way of reasoning fits well into the area of soft computing and rule based systems.
Fuzzy logic, for example, facilitates decision-making based on approximations, uncertainties, and conflicting data with smooth transitions between policy choices. In conjunction with a learning and adaptive system, such as a neural network for populating the fuzzy rule base with knowledge about usage, it could serve as an agent framework baseline.
Problem Statement: Resource providers may want to optimize utilization, whereas resource users may want to optimize response time while minimizing cost. These goals can be contradictory though, and may hence be dealt with by separate agents. The problem that this work is addressing is how independent distributed agents who leverage ubiquitous Grid and Multi Agent protocols and standards work can automatically refine contracts, based on monitored performance, usage records and soft computing decision-making, rule-based algorithms, such as fuzzy logic theory and neural networks. The emphasis will be on the SLA trade-offs, and languages and methodologies for implementing and describing such trade-offs. Simulations will be focused on
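... (4) QoS guarantee mechanisms. That is, the grid "delivers nontrivial qualities of service" [20], which reflects QoS in a broad sense, including security, reliability, and performance. Current grid QoS research has extended to the three layers above, for example: using service level agreements to describe the QoS requirements of users on tasks and of tasks on resources [21,22]; using agreement-based resource management mechanisms and task scheduling mechanisms that support specific QoS metrics to strengthen the virtual organization's end-to-end QoS guarantees [23]; and introducing mechanisms such as overload protection, load balancing, and differentiated services at the abstract resource layer to support QoS. ...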
Article
Grid is a new paradigm of Internet computing to share distributed resources and collaborate among them. A web service-based approach for Grid can improve the extensibility and interoperability of Grid system. In this paper, a layered Grid functional model is discussed; within the OGSA (open grid service architecture) framework, a Web service-based Grid architecture is presented. An approach of integrating web services and Grid technology is proposed. Web service workflow technology is used to model the task of the Grid application and its requirement on resource services. The architecture of a Web service-based Grid supporting environment called WebSASE4G is introduced, which gives a new approach to Web service based Grid architecture.
... There exist a number of other projects working on related issues, some trying to cover the complete resource brokering problem, other focusing on more specific features, such as to enable resource advance reservations, negotiation for service level agreements, allocation of resources for interactive use, scheduling algorithms etc. For examples, see [1,2,3,4,7,13]. ...
Article
Full-text available
This contribution presents the ongoing development of a resource manager for use in early production grids. Even though our main focus is to develop a stable brokering facility for current production grids, we also address features needed in further improved resource managers for future enhanced grid infrastructures. The primary target environment is the NorduGrid platform, comprising around 20 parallel systems in 5 countries, available for production grid jobs 24 hours a day. Application characteristics considered include serial, parallel, and coordinated multi-resource jobs running in sequence or in parallel, all types in either interactive or non-interactive mode. The brokering process aims to minimize the time to delivery for each individual job and is based on a number of new features including reservation capability, information about currently used or reserved capacity, benchmark-scaled time predictions, and queue adaptation capability. We present the basic motivations for all these features and discuss various issues regarding their implementations in the current grid environment.
... SLA Languages have been proposed in [18][19][20] and are now converging within the Grid community in the WS-Agreement specification in the Global Grid Forum. Some initial experiments on WS-Agreement templates have been performed in [21]. ...
Article
In this paper we present the requirements of a national computing Grid. In particular we discuss the issues involved in managing complex policies of multiple stakeholders in such a large-scale, dynamic, and heterogeneous Grid. We also propose a Service Level Agreement (SLA) and agent-based architecture to address these issues. This work is a continuation of the work performed and experiences gained when we developed a Grid accounting system for the Swedish national Grid network, called SweGrid, which provides the foundation for the investigation presented here. We conclude that many SLA concepts fit very well within the SweGrid network to address some of the issues of the current system. Future work includes prototyping parts of the SLA framework and running simulations before eventually deploying it in the SweGrid production environment.
... This represents an agreement specification for a desired level of performance or time constraint for their application. At present we use a request() / agree() protocol similar to that specified within the SNAP framework [Czajkowski et al. 2002]. The SLA Manager enters into an agreement with the Resource Broker, which provides a reservation guarantee with the resource provider; this is a Resource Service Level Agreement (RSLA). ...
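A request/agree exchange of the kind this excerpt mentions can be pictured with a small in-memory sketch. It is illustrative only: the class names, fields, and the client-chosen identifier used to make repeated requests harmless are assumptions made for this example, not the SNAP wire protocol or the cited system's API. It models the two properties the SNAP abstract names, lifetime management and at-most-once creation.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    sla_id: str        # client-chosen identifier (hypothetical field)
    terms: str         # opaque description of the requested service level
    expires_at: float  # absolute expiry time, in seconds


class SlaManager:
    """Toy in-memory manager (hypothetical; not a real SNAP implementation)."""

    def __init__(self):
        self._agreements = {}

    def request(self, sla_id, terms, lifetime, now):
        """Agree to a request, or return the live agreement already stored under sla_id."""
        existing = self._agreements.get(sla_id)
        if existing is not None and existing.expires_at > now:
            return existing  # a retried request does not create a second SLA
        agreement = Agreement(sla_id, terms, now + lifetime)
        self._agreements[sla_id] = agreement
        return agreement

    def is_active(self, sla_id, now):
        """An agreement is active only until its lifetime runs out."""
        agreement = self._agreements.get(sla_id)
        return agreement is not None and agreement.expires_at > now


manager = SlaManager()
first = manager.request("job-42", "finish within 600 s", lifetime=60.0, now=0.0)
retry = manager.request("job-42", "finish within 600 s", lifetime=60.0, now=5.0)
```

Passing `now` explicitly, instead of reading a clock, keeps the sketch deterministic; a real service would also need persistence and authentication, which are out of scope here.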
Article
Full-text available
Users of Grid systems often need to attach Quality of Service (QoS) information such as time or performance constraints to guarantee timely execution of their application. Grid resources have varying quality and reliability and can easily be swamped by competing applications. If this coincides with the user's execution, their results may be delayed. In support of this we propose a Service Level Agreement (SLA) management system including resource reservation and run-time adaptation. Our system has the capability of predicting the execution time of an SLA bound application before and during runtime. A historical usage record for auditing and prediction of future execution times is also described. Through experimental analysis we show our solution is capable of predicting with some accuracy the execution time of SLA bound applications before and during runtime. Mechanisms for automated monitoring and violation capture are presented showing how performance and time constraints can be validated. Adaptation through migration is proved useful in reducing the execution time of our application when the CPU load available to that application is reduced.
... The SLA either provides some measurable capability or performs a specific task. An SLA allows the user to know what is expected from a service without requiring detailed knowledge of the provider's policies [5,6]. ...
Article
Full-text available
Service Level Agreements (SLAs) are introduced to overcome the shortcomings of the best-effort approach in Grid computing and make Grid computing more attractive for commercial uses. Yet commercial Grid providers are not keen to adopt SLAs, since there is a risk of SLA violation, which will result in a penalty fee. This paper analyses failure data collected from three different Grid sites. We study the statistics of the data including the root cause, the mean time to repair and time between failures. We find that software and hardware failures are the largest contributors, and that the time to repair varies, depending on the root cause, from 13 hours in network errors to around 46 hours in unknown errors. We also find that the repair time is well modelled by a Weibull distribution. From the analysis of the historical data we find that the distribution of time between failures in a Grid system is well modelled by a Weibull distribution with decreasing hazard rate, and this can be used by a resource provider to predict the risk of failure.
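To make "Weibull distribution with decreasing hazard rate" concrete, the sketch below evaluates the standard Weibull hazard h(t) = (k/λ)(t/λ)^(k-1) and survival function S(t) = exp(-(t/λ)^k). The hazard falls over time exactly when the shape parameter k is below 1. The parameter values used here (k = 0.7, λ = 100 hours) are made up for illustration and are not the values fitted in the paper.

```python
import math


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    """Probability that no failure has occurred by time t: exp(-(t/scale)**shape)."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be non-negative; shape and scale must be positive")
    return math.exp(-((t / scale) ** shape))


# Illustrative parameters only (assumed, not taken from the paper).
SHAPE, SCALE_HOURS = 0.7, 100.0
hazards = [weibull_hazard(t, SHAPE, SCALE_HOURS) for t in (10.0, 100.0, 1000.0)]
risk_of_failure_within_24h = 1.0 - weibull_survival(24.0, SHAPE, SCALE_HOURS)
```

With a shape below 1 the entries of `hazards` shrink as t grows, which is the "decreasing hazard rate" behaviour: the longer the system has run since the last failure, the lower its instantaneous failure rate.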
... Grid Reputation Service is responsible for the acquisition, storing, dissemination, retrieving and aggregation of first-hand reputation ratings, while Grid Contract Service provides a supervising mechanism and helps with the negotiation between service providers and consumers. Besides, we adopt the Service Negotiation and Acquisition Protocol (SNAP) proposed in [7], which gives 3 kinds of service level agreement: RSLA (Resource Service Level Agreement), TSLA (Task Service Level Agreement) and BSLA (Binding Service Level Agreement). The 3 agreements supplement each other and clarify an interaction's context, which makes them ideal containers for first-hand reputation ratings. ...
Article
Summary: The problem of resource selection in Grid is challenging because of the wide range of selection and the high degree of strangeness. Efficient resource sharing and utilization cannot be achieved without the guarantee of a higher degree of trust relationship. In this paper, a reputation mechanism is introduced to resource selection in Grid, which aims to leverage the guarantee of trustworthiness and reliability. According to the fact that reputation is multi-faceted and uncertain, and guided by the evaluation and decision making ideas from fuzzy partial ordering, the proposed approach performs fuzzy partial order modeling on each resource provider's multi-faceted reputation, integrates the overall information, and chooses proper resource providers according to the final integrative order. Compared with other methods, this approach has better overall consideration.
... The SNAP supports the reliable management of remote SLAs [Czajkowski 2002]. ...
Article
Full-text available
Quality attribute requirements play an important role in service selection in a service-oriented architecture environment. It is easy to envision finding several services that fulfill the functional requirements but fail to meet important quality attribute measures. Service level agreements provide the architect with a specification of the verifiable quality characteristics that the service will provide. Such a specification allows the architect to select the service that best supports the system's quality attribute requirements. This report surveys the state of practice in service level agreement specification and offers guidelines on how to assure that services are provided with high availability, security, performance, and other required qualities. In addition, this report discusses the quality properties that have been expressed in service level agreements, and it considers the mechanisms used to assure service quality by contract and those used by service providers to achieve and monitor quality properties. Also, this report examines the support for expressing service quality requirements in existing service technologies and describes areas where more research is needed.
... The full exploitation of the infrastructure by a number of different user groups will indeed require several concurrent operating policies running across different virtual organisations and over different geographical sites. It will only be in meeting these predefined policies (and therefore the service-level agreements) that the notion of a trusted ubiquitous system will be established [2,11]. There are several research areas which support the development and delivery of universally available and trusted services. ...
Article
Full-text available
This paper reports on two strands of work that are being undertaken as part of the EPSRC funded DOPCHE project. The paper focuses on open software architectures for dynamic operating policies and a performance model used to find optimal operating policies.
... A language was described for unambiguous and precise specification of SLAs, and a monitoring engine that aggregates measurements from multiple sites. The SNAP protocol [7] was developed for negotiating SLAs and coordinating resource management in distributed systems. It presents a model for managing the process of negotiating access to and the use of resources through the definition of a framework within which reservation, acquisition and task submission can be expressed for any resource in a uniform fashion. ...
Article
Full-text available
Current implementations of grids exist in the research and academic communities, where applications are generally characterised as being computationally intensive usually involving large amounts of data. For these purposes and in this environment a 'best effort' resource guarantee is sufficient. However, as grid uses mature in both the academic and commercial arenas, grid providers will require some form of service level management to address these issues. This paper presents a service level approach to grid resource management for SLAs, focussing on the management of SLAs and modelling of resources required by these processes.
... Traditionally for outsourced infrastructure the Service Level Agreement (SLA) contract provided knowledge of the quality of service expected comparable with internal service provision (Willcocks et al., 2011a). Such contracts were mechanisms for negotiating the relationship between IT vendor and client, establishing trust and anticipated risk (Czajkowski et al., 2002;Buyya et al., 2009b). At present, however, cloud SLAs are often weak and ineffectual for this purpose: 'In the cloud market space, meaningful SLAs are few and far between, and even when a vendor does have one, most of the time it is toothless' (Durkee, 2010). ...
Article
Full-text available
This paper reviews how technological artefacts are employed within Knowledge Management interventions. The paper first describes the nature of technology within Knowledge Management practice. It then draws upon a categorisation of knowledge management as either functionalist or interpretivist to consider the use of technology either in encoding knowledge objects, or in supporting personalisation and the emergence of communities of practice. The paper then draws upon phenomenological writings, in particular the work of Martin Heidegger, in order to consider the way in which individuals engage with technology and how this impacts upon the desire of knowledge management technology. Finally the paper concludes by calling for future research to consider the situated design of technology for Knowledge Management.
... The broker hides all the complexity of grids from users. Similarly, there is also a need of legal support that can resolve various conflicts between providers and consumers, such as violation of Service level Agreement (SLA) [15]. Thus, the legal support can come from some authoritative agency such as country government. ...
Chapter
Market-oriented computing has gained a lot of attention from both industry and academia. Grid computing is the major paradigm that supports market-oriented computing and can thus make the vision of computing as a utility a reality. The most important challenge in enabling utility Grids is resource management and scheduling. Over the last decade many researchers have tried to address many issues within resource management and scheduling, but it still looks far away from the original vision. Thus, to find the gaps and direct future research, this chapter summarizes and classifies all the important works through a comprehensive taxonomy. This chapter also presents a survey of the most popular market-oriented resource management systems, with the research gaps that still need to be filled. This survey is intended to help researchers make a cooperative effort towards the goal of utility grids and to provide insights for extending and reusing the existing grid middleware.
... These agents negotiate a common SLA based on application requirements and a site's free resources, or they are used for reservation of resources before they are really needed. The paper [5] defines monitoring grid applications using SLAs, and in [6] a protocol for SLA negotiation was defined. Solutions like [7] or [8] have one agent as a part of the resource broker and monitoring agents on sites. ...
Conference Paper
Full-text available
Highly demanding applications running on grids need carefully prepared environments. The real-time High Energy Physics (HEP) application from the Int.eu.grid project is a good example of an application with requirements difficult to fulfill by typical grid environments. In the paper we present Service Level Agreement metrics which are used by the application's dedicated virtual organization (HEP VO) to sign SLAs with service providers. HEP VO with signed SLAs is able to guarantee sufficient service quality for the application. These SLAs are enforced using the presented VO Portal.
... Research on future generation Grids is moving its focus from the basic infrastructure that enables the allocation of resources in a dynamic and distributed environment in a transparent way to more advanced management systems that accept and process complex jobs and workflows consisting of numerous sub-tasks and even provide guarantees for the completion of such jobs. In this context, the introduction of service level agreements (SLA) enables flexible mechanisms for describing the quality-of-service (QoS) supported by the multiple resources involved, including mechanisms for negotiating SLAs [5]. The introduction of SLAs implies prices for resource usage and also penalty fees that must be paid when the assured QoS is not met. ...
Chapter
Full-text available
In this paper, we describe the architecture of the virtual resource manager VRM, a management system designed to reside on top of local resource management systems for cluster computers and other kinds of resources. The most important feature of the VRM is its capability to handle quality-of-service (QoS) guarantees and service-level agreements (SLAs). The particular emphasis of the paper is on the various opportunities to deal with local autonomy for resource management systems not supporting SLAs. As local administrators may not want to hand over complete control to the Grid management, it is necessary to define strategies that deal with this issue. Local autonomy should be retained as much as possible while providing reliability and QoS guarantees for Grid applications, e.g., specified as SLAs.
... Applied to the Grid, there are a number of automated "discover, find and bind" protocols and demonstration systems (for example those described in [7] and [17]) some of which use software agents to perform the negotiation to gain access to the services being negotiated for. Whilst these are certainly useful, we assert that there must be a limit determined by the cost, risk and value of the negotiated item, above which humans must get involved. ...
Conference Paper
Full-text available
We describe a number of strategies for a future service oriented market place. We describe the SLA’s role within the service framework, and how it enables customers to make value judgements regarding the quality of a service. We also discuss the complexity of too much choice from both the customer and provider points of view, and advocate a “discrete offer” approach. We discuss the “cost of negotiation” and argue that it must be carefully balanced with the cost, value and risk of the offering being negotiated for. We add to the negotiation analysis with presentation and discussion of some results showing a simulated Grid market place and show that it is possible for service providers to deny themselves work through attempting to offer a high quality guaranteed service.
... 4 For flexibility, certain terms of the contract can be negotiated. 5,6,7 The subscription is tracked during the fulfillment process to make sure that the service level guarantees agreed upon in the SLA are adhered to, a process referred to as compliance monitoring. Data on service and resource usage, as well as data on any violation of service level guarantees, are used in billing and reporting. ...
Article
Full-text available
In this paper we describe a framework for providing customers of Web services differentiated levels of service through the use of automated management and service level agreements (SLAs). The framework comprises the Web Service Level Agreement (WSLA) language, designed to specify SLAs in a flexible and individualized way, a system to provision resources based on service level objectives, a workload management system that prioritizes requests according to the associated SLAs, and a system to monitor compliance with the SLA. This framework was implemented as the utility computing services part of the IBM Emerging Technologies Tool Kit, which is publicly available on the IBM alphaWorks™ Web site.
... A simple two-step request–response schema is provided for SLA negotiation in Cremona. Similarly, [11] presents a model and protocol for negotiating SLAs over accessing resources in distributed environments. Devoted to presenting the requirements of precision and flexibility for SLA specification [16] and analysing the SLA monitoring model from the XML-specific aspect [23], HP proposes an automated and distributed SLA monitoring engine that considers both provider side and client side measurement of SLA and deals with the scenarios where web service providers work and contract with each other to fulfil the customer's request [13]. ...
Article
In the web services environment, service level agreements (SLAs) refer to mutually agreed understandings and expectations between service consumers and providers on the service provision. Although management of SLAs is critical to wide adoption of web services technologies in the real world, support for it is very limited nowadays, especially in web service composition scenarios. There is a lack of adequate frameworks and technologies supporting various SLA operations such as SLA formation, enforcement, and recovery. This paper presents a novel agent-based framework which utilises the agents' ability of negotiation, interaction, and cooperation to facilitate autonomous SLA management in the context of service composition provision. Based on this framework, mechanisms for autonomous SLA operations are proposed and discussed. Results from simulations show that by integrating agents and web services the framework can address issues of SLA management drawn from sophisticated service composition scenarios.
Article
Full-text available
The Distributed Computing Column covers the theory of systems that are composed of a number of interacting computing elements. These include problems of communication and networking, databases, distributed shared memory, multiprocessor architectures, operating systems, verification, internet, and the web. This issue consists of the paper "Distributed Computing Research Issues in Grid Computing" by Henri Casanova. Many thanks to Henri for contributing to this issue.
Article
Quality of service (QoS) can be a critical element for achieving the business goals of a service provider, for the acceptance of a service by the user, or for guaranteeing service characteristics in a composition of services, where a service is defined as either a software or a software-support (i.e., infrastructural) service which is available on any type of network or electronic channel. The goal of this article is to compare the approaches to QoS description in the literature, where several models and metamodels are included. We consider a large spectrum of models and metamodels to describe service quality, ranging from ontological approaches to define quality measures, metrics, and dimensions, to metamodels enabling the specification of quality-based service requirements and capabilities as well as of SLAs (Service-Level Agreements) and SLA templates for service provisioning. Our survey is performed by inspecting the characteristics of the available approaches to reveal which are the consolidated ones and which are the ones specific to given aspects and to analyze where the need for further research and investigation lies. The approaches here illustrated have been selected based on a systematic review of conference proceedings and journals spanning various research areas in computer science and engineering, including: distributed, information, and telecommunication systems, networks and security, and service-oriented and grid computing.
Article
As resource management becomes a hot research topic in the Grid computing area, current research focuses on solving the heterogeneity of the grid environment, but research on enhancing the efficiency of resource management while delivering seamless QoS (quality of service) is scarce. In addition, current research on Grid QoS focuses on importing QoS results from multimedia networks to support Grid QoS. For that reason, a hierarchical structure of grid QoS is proposed in this paper. QoS parameters are newly classified into five categories, and they can be measured at the VO (virtual organization) layer. Then, by making use of SNAP (service negotiation and acquisition protocol), QoS parameter mapping and conversion based on the hierarchical structure model are analyzed. Finally, the research on Grid QoS is applied to scheduling heuristics to improve on the Min-Min algorithm. The simulation results show that QoS-based resource management can effectively improve grid resource utilization and the success rate of service requests in a dynamic service-oriented grid.
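For orientation, the baseline Min-Min heuristic that this abstract sets out to improve can be sketched as below. This is a minimal illustration of the well-known baseline only, not the QoS-aware variant proposed by the authors; the function name `min_min` and the `etc` matrix layout (expected time to compute task `t` on machine `m`, assumed non-empty) are assumptions introduced here.

```python
from typing import Dict, List, Tuple


def min_min(etc: List[List[float]]) -> Tuple[Dict[int, int], float]:
    """Baseline Min-Min: etc[t][m] is the expected run time of task t on machine m.

    Returns (task -> machine assignment, makespan).
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    assignment: Dict[int, int] = {}
    while unscheduled:
        # Smallest completion time over all (task, machine) pairs, which is the
        # task whose best-case completion time is minimal, on its best machine.
        finish, task, machine = min(
            (ready[m] + etc[t][m], t, m)
            for t in unscheduled
            for m in range(n_machines)
        )
        assignment[task] = machine
        ready[machine] = finish
        unscheduled.remove(task)
    return assignment, max(ready)
```

A QoS-guided variant would typically restrict the candidate machines of each task to those satisfying its QoS constraints before applying the same selection step; that restriction is not shown here.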
Conference Paper
Service Level Agreements (SLAs) are a vital instrument in service-oriented architectures to reserve service capacity at a defined service quality level. Provisioning systems enable service managers to automatically configure resources such as servers, storage, and routers based on a configuration specification. Hence, agreement provisioning is a vital step in managing the life-cycle of agreement-driven services. Deriving detailed resource quantities from arbitrary SLA specifications is a difficult task and requires detailed models of algorithmic behavior of service implementations and capacity of a – potentially heterogeneous – resource environment, which are typically not available today. However, if we look at, e.g., data centers today, system administrators often know the quality-of-service properties of known system configurations and modifications thereof and can write the corresponding provisioning specifications. This paper proposes an approach that leverages the knowledge of existing data center configurations, defines templates of provisioning specifications, and rules on how to fill these templates based on an SLA specification. The approach is agnostic to the specific SLA language and provisioning specification format used, if based on XML.
Conference Paper
With the convergence of grid computing and Web services, the grid has extended its territory from the traditional computing grid to the service-oriented grid, which aims to realize coordinated resource sharing and problem solving through service selection and composition. Therefore, selecting credible services for applications becomes a key issue in the grid environment. Current research on trust inherits the conception of trust from P2P networks, which is too coarse-grained and subjective to satisfy the requirements of the grid environment. In this paper, a novel trust model considering the user's QoS constraints is proposed. Two new concepts, namely trustworthiness of service and satisfactoriness of service, are introduced into the proposed trust model, which are used to describe the resources' capabilities and users' satisfaction respectively. The proposed trust model applies a Bayesian network to evaluate services' trustworthiness, taking into account users' multiple QoS metrics. Simulation results show that the establishment of trust is more efficient. Also, when using the proposed trust model for service selection, experimental results indicate that it outperforms other models in terms of query success rate and user's satisfaction.
Article
We extend a measurement-based admission control algorithm suggested for predictive service to provide advance reservations for guaranteed and predictive service, while retaining the attractive features of predictive service. The admission decision for advance reservations is based on information about flows that overlap in time. For flows that have not yet started, the requested values are used, and for those that have already started, measurements are used. This allows us to estimate the network load accurately for the near future. To provide advance reservations we ask users to include durations in their requests. We present simulation results to show that predictive service with advance reservations provides utilization levels significantly higher than those for guaranteed service, and comparable to those for predicted service without advance reservations. Those utilization levels are reached without any preemption of other admitted flows. Finally, we discuss how to set up advance reservations over multiple hops in the Internet using resource reservation setup protocols.
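The admission rule outlined in this abstract (requested values for flows that have not started, measurements for flows that have) can be sketched roughly as follows. This is an illustrative simplification for a single link, not the authors' algorithm; the `Flow` fields, the `admit` function, and the fixed `capacity` threshold are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flow:
    start: float                      # reservation start time
    end: float                        # start plus the requested duration
    requested: float                  # requested rate, e.g. in Mbit/s
    measured: Optional[float] = None  # measured rate, set once the flow is running

    def load(self) -> float:
        # Running flows count with their measured rate,
        # future flows with their requested rate.
        return self.measured if self.measured is not None else self.requested


def overlaps(a: Flow, b: Flow) -> bool:
    return a.start < b.end and b.start < a.end


def admit(new: Flow, admitted: List[Flow], capacity: float) -> bool:
    """Accept `new` only if the estimated load stays within capacity for its whole lifetime."""
    relevant = [f for f in admitted if overlaps(f, new)]
    # The load only rises when a flow starts, so checking those instants suffices.
    check_points = {new.start} | {f.start for f in relevant if f.start > new.start}
    for t in check_points:
        load = sum(f.load() for f in relevant if f.start <= t < f.end)
        if load + new.requested > capacity:
            return False
    return True
```

Because every request carries a duration, only flows whose intervals overlap the new one enter the estimate, which is what makes the advance decision possible without preempting admitted flows.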
Conference Paper
"Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources, which we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy, and we discuss the central role played by the intergrid protocols that enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.
Conference Paper
We present the design, implementation, and experimental results of our soft real time (SRT) system for multimedia applications on top of a general purpose UNIX environment. The SRT system supports multiple CPU service classes for the real time processes based on their processor usage pattern, including the periodic constant processing time (PCPT) class and the periodic variable processing time (PVPT) class. It also provides the following features: reservation and processing time guarantees for the service classes; overrun protection and scheduling algorithm; and system-initiated adaptation strategies. The other unique feature of the SRT system is its easy portability to any operating system with real time extensions, because it is implemented purely in user space without any modifications to the kernel. We have implemented the SRT system on the Solaris 2.6 operating system with scheduling overhead under 400 µs and with good performance guarantees.
Conference Paper
The development of applications and tools for high-performance “computational grids” is complicated by the heterogeneity and frequently dynamic behavior of the underlying resources; by the complexity of the applications themselves, which often combine aspects of supercomputing and distributed computing; and by the need to achieve high levels of performance. The Globus toolkit has been developed with the goal of simplifying this application development task, by providing implementations of various core services deemed essential for high-performance distributed computing. In this paper, we describe two large applications developed with this toolkit: a distributed interactive simulation and a teleimmersion system. We describe the process used to develop the applications, review the lessons learned and draw conclusions regarding the effectiveness of the toolkit approach
Conference Paper
A distributed, parallel implementation of the widely used Modular Semi-Automated Forces (ModSAF) Distributed Interactive Simulation (DIS) is presented, with scalable parallel processors (SPPs) used to simulate more than 50,000 individual vehicles. The single-SPP code is portable and has been used on a variety of different SPP architectures for simulations with up to 15,000 vehicles. A general metacomputing framework for DIS on multiple SPPs is discussed and results are presented for an initial system using explicit Gateway processes to manage communications among the SPPs. These 50K-vehicle simulations utilized 1,904 processors at six sites across seven time zones, including platforms from three manufacturers. Ongoing activities to both simplify and enhance the metacomputing system using Globus are described
Article
We propose architectural mechanisms for structuring host communication software to provide QoS guarantees. We present and evaluate a QoS sensitive communication subsystem architecture for end hosts that provides real time communication support for generic network hardware. This architecture provides services for managing communication resources for guaranteed QoS (real time) connections, such as admission control, traffic enforcement, buffer management, and CPU and link scheduling. The architecture design is based on three key goals: maintenance of QoS guarantees on a per connection basis, overload protection between established connections, and fairness in delivered performance to best effort traffic. Using this architecture we implement real time channels, a paradigm for real time communication services in packet switched networks. The proposed architecture features a process per channel model that associates a channel handler with each established channel. The model employed for handler execution is one of “cooperative” preemption, where an executing handler yields the CPU to a waiting higher priority handler at well defined preemption points. The architecture provides several configurable policies for protocol processing and overload protection. We present extensions to the admission control procedure for real time channels to account for cooperative preemption and overlap between protocol processing and link transmission at a sending host. We evaluate the implementation to demonstrate the efficacy with which the architecture maintains QoS guarantees on outgoing traffic while adhering to the stated design goals
Article
In this paper we present a middleware infrastructure, called DataCutter, that enables processing of scientific datasets stored in archival storage systems across a wide-area network. DataCutter provides support for subsetting of datasets through multidimensional range queries, and application specific aggregation on scientific datasets stored in an archival storage system. We also present experimental results from a prototype implementation. 1 Introduction Increasingly powerful computers have made it possible for computational scientists and engineers to model physical phenomena in great detail. As a result, overwhelming amounts of data are being generated by scientific and engineering simulations. In addition, large amounts of data are being gathered by sensors of various sorts, attached to devices such as satellites and microscopes. The primary goal of generating data through large scale simulations or sensors is to better understand the causes and effects of physical phenomena...
Article
The ability of operating system and network infrastructure to provide end-to-end quality of service (QoS) guarantees in multimedia is a major acceptance factor for various distributed multimedia applications due to the temporal audio-visual and sensory information in these applications. Our constraints on the end-to-end guarantees are (1) QoS should be achieved on a general-purpose platform with a real-time extension support, and (2) QoS should be application-controllable. In order to achieve the users' acceptance requirements and to satisfy our constraints on the multimedia systems, we need a QoS-compliant resource management which supports QoS negotiation, admission and reservation mechanisms in an integrated and accessible way. In this paper we present a new resource model and a time-variant QoS management, which are the major components of the QoS-compliant resource management. The resource model incorporates the resource scheduler and a new component, the resource broker, which provides negotiation, admission and reservation capabilities for sharing resources such as CPU, network or memory corresponding to requested QoS. The resource brokers are intermediary resource managers; when combined with the resource schedulers, they provide a more predictable and finer granularity control of resources to the applications during the end-to-end multimedia communication than what is available in current general-purpose networked systems. Furthermore, this paper presents the QoS-aware resource management model called QualMan, as a loadable middleware, its design, implementation, results, tradeoffs, and experiences. There are tradeoffs when comparing our QualMan QoS-aware resource management in middleware and other QoS-supporting resource management solutions...
Conference Paper
A new generation of specialized scientific instruments called synchrotron light sources allow the imaging of materials at very fine scales. However, in contrast to a traditional microscope, interactive use has not previously been possible because of the large amounts of data generated and the considerable computation required to translate this data into a useful image. The authors describe a new software architecture that uses high-speed networks and supercomputers to enable quasi-real-time and hence interactive analysis of synchrotron light source data. This architecture uses technologies provided by the Globus computational grid toolkit to allow dynamic creation of a reconstruction pipeline that transfers data from a synchrotron source beamline to a preprocessing station, next to a parallel reconstruction system, and then to multiple visualization stations. Collaborative analysis tools allow multiple users to control data visualization. As a result, local and remote scientists can see and discuss preliminary results just minutes after data collection starts. The implications for more efficient use of this scarce resource and for more effective science appear tremendous.
Article
This document describes a simple client/server model for supporting policy control over QoS signaling protocols. The model does not make any assumptions about the methods of the policy server, but is based on the server returning decisions to policy requests. The model is designed to be extensible so that other kinds of policy clients may be supported in the future. However, this document makes no claims that it is the only or the preferred approach for enforcing future types of policies.
Article
The ability to reserve real-time connections in advance is essential in all distributed multiparty applications (i.e., applications involving multiple human beings) using a network that controls admissions to provide good quality of service. This paper discusses the requirements of the clients of an advance reservation service, and a distributed design for such a service. The design is described within the context of the Tenet Real-Time Protocol Suite 2, a suite being developed for multiparty communication, which will offer advance reservation capabilities to its clients, based on the principles and the mechanisms proposed in the paper. Simulation results providing useful data about the performance and some of the properties of these mechanisms are also presented. We conclude that the one described here is a viable approach to constructing an advance reservation service within the context of the Tenet Suites as well as that of other solutions to the multiparty real-time communication problem.
Article
Distributed multimedia (MM) applications such as video-on-demand and teleconferencing provide services with different quality of service (QoS) requirements. Hence, the user should be able to negotiate the desired QoS depending on his/her needs, the end-system characteristics and his/her financial capacity. In response to a service request with the desired QoS, most QoS negotiation approaches return an acceptance or a simple rejection of the request. More specifically, they provide the user only with the QoS that can be supported at the time the request is made and assume that the service is requested for indefinite duration. This paper describes work on a new QoS negotiation approach with future reservations (NAFUR) that decouples the starting time of the service from the time the service request is made and requires that the duration of the requested service be specified. NAFUR computes the QoS that can be supported at the time the service request is made and at certain carefully chosen later times. As an example, if the requested QoS cannot be supported at the time the service request is made, the proposed approach computes the earliest time at which the user can start the service with the desired QoS. NAFUR will help to increase (a) the flexibility of the system by providing the user with more choices, and (b) the system resource utilization, and the availability of the system, by encouraging the sharing of the resources, e.g. multicast for video-on-demand systems. Furthermore, it provides the flexibility to incorporate (a) a range of resource reservation schemes and scheduling policies, and (b) a range of new system component technologies.
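The "earliest time at which the service can start" computation mentioned in this abstract can be illustrated with a small sketch. It is not the NAFUR algorithm itself: it assumes a single resource with a fixed `capacity`, bookings given as hypothetical `(start, end, amount)` tuples over half-open intervals, and a request described only by `demand` and `duration`.

```python
from typing import List, Optional, Tuple

Booking = Tuple[float, float, float]  # (start, end, reserved amount), interval [start, end)


def peak_usage(bookings: List[Booking], start: float, end: float) -> float:
    """Highest total reserved amount at any instant in [start, end)."""
    # Usage only rises at booking start times, so those instants are enough.
    points = {start} | {s for s, _, _ in bookings if start < s < end}
    return max(sum(a for s, e, a in bookings if s <= t < e) for t in points)


def earliest_start(bookings: List[Booking], capacity: float, demand: float,
                   duration: float, now: float) -> Optional[float]:
    """Earliest time >= now at which `demand` fits for the whole `duration`."""
    if demand > capacity:
        return None
    # Free capacity only grows when a booking ends, so the earliest feasible
    # start is either `now` or the end time of some existing booking.
    candidates = sorted({now} | {e for _, e, _ in bookings if e > now})
    for t in candidates:
        if peak_usage(bookings, t, t + duration) + demand <= capacity:
            return t
    return None
```

Instead of a plain rejection, a negotiator built on such a function can answer a request with "not now, but at time t", which is the extra choice the abstract describes.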
Article
The virtual synchrony model for group communication is a powerful paradigm for building distributed applications. Implementations of virtual synchrony usually use failure detectors and failure recovery protocols. In applications that require a large number of groups, significant performance gains can be attained if these groups share the resources required to provide virtual synchrony. A service that maps multiple user groups onto a small number of instances of a virtually synchronous implementation is called a light-weight group service. This paper describes a new design for the light-weight group protocols that enables such service to function transparently. We discuss how these protocols can be applied in dynamic environments, where group mappings cannot be defined a priori and may change over time. We show that it is possible to establish mappings that promote resource sharing and, at the same time, minimize interference. These mappings can be established in an automated manner, using heuristics applied locally at each node. Experiments using an implementation in the Horus system show that significant performance improvements can be achieved with this approach.
Article
High-end networked applications such as distance visualization, distributed data analysis, and advanced collaborative environments have demanding quality of service (QoS) requirements. Particular challenges include concurrent flows with different QoS specifications, high-bandwidth flows, application-level monitoring and control, and end-to-end QoS across networks and other devices. We describe a QoS architecture and implementation that together help to address these challenges. The General-purpose Architecture for Reservation and Allocation (GARA) supports flow-specific QoS specification, immediate and advance reservation, and online monitoring and control of both individual resources and heterogeneous resource ensembles. Mechanisms provided by the Globus Toolkit are used to address resource discovery and security issues when resources span multiple administrative domains. Our prototype GARA implementation builds on differentiated services mechanisms to enable the coordinated management of two distinct flow types—foreground media flows and background bulk transfers—as well as the co-reservation of networks, CPUs, and storage systems. We present results obtained on a wide area differentiated services testbed that demonstrate our ability to deliver QoS for realistic flows.
Conference Paper
A new generation of specialized scientific instruments called synchrotron light sources allow the imaging of materials at very fine scales. However, in contrast to a traditional microscope, interactive use has not previously been possible because of the large amounts of data generated and the considerable computation required translating this data into a useful image. We describe a new software architecture that uses high-speed networks and supercomputers to enable quasi-real-time and...
Article
XML Schema Part 0: Primer is a non-normative document intended to provide an easily readable description of the XML Schema facilities, and is oriented towards quickly understanding how to create schemas using the XML Schema language. XML Schema Part 1: Structures and XML Schema Part 2: Datatypes provide the complete normative description of the XML Schema language. This primer describes the language features through numerous examples which are complemented by extensive references to the normative texts.
Conference Paper
High-end networked applications such as distance visualization, distributed data analysis, and advanced collaborative environments have demanding quality of service (QoS) requirements. We focus on making policy decisions when users attempt to make reservations for network bandwidth across several administrative network domains that are controlled by a bandwidth broker. We present a signalling protocol that facilitates the establishment of a distributed policy decision point as well as the establishment of a direct signalling channel between the source and end domains
Conference Paper
Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or institutions: what are sometimes called virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations are challenging problems due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Consequently, information services are a vital part of any Grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and hence for planning and adapting application behavior. We present an information services architecture that addresses performance, security, scalability, and robustness requirements. Our architecture defines simple low-level enquiry and registration protocols that make it easy to incorporate individual entities into various information structures, such as aggregate directories that support a variety of different query languages and discovery strategies. These protocols can also be combined with other Grid protocols to construct additional higher-level services and capabilities such as brokering, monitoring, fault detection, and troubleshooting. Our architecture has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.
Conference Paper
Computational grids are enabling collaboration between scientists and organizations to generate and archive extremely large datasets across shared, distributed resources. There is a need to visually explore such data throughout the life-cycle of projects. Practical exploration of large datasets requires visualization tools that can function in the same grid environment in which the data is created and stored. Resource management interfaces are an important structural component of grid computing environments because they enable uniform access to the wide variety of resources necessary for scientific work. We describe a new advance-reservation system for graphics resources; and an application of existing grid technology to create general-purpose active storage systems. We report our experience with prototype infrastructure and application components, involving experiments coupling end-to-end resources for interactive visual exploration of large data in representative distributed environments
Conference Paper
Federated distributed systems present new challenges to resource management, which cannot be met by conventional systems that employ relatively static resource models and centralized allocators. We previously argued that matchmaking provides an elegant and robust resource management solution for these highly dynamic environments (R. Raman et al., 1998). Although matchmaking is powerful and flexible, it cannot accommodate multiparty policies (e.g., co-allocation). The authors present Gang-Matching, a multilateral matchmaking formalism to address this deficiency.
Conference Paper
Data-intensive applications in the Condor high-throughput computing (HTC) environment can place heavy demands on network resources for checkpointing and remote data access. We have developed mechanisms to monitor, control and schedule network usage in Condor. By managing network resources, these mechanisms provide administrative control over Condor's network usage and improve the execution efficiency of Condor applications
Conference Paper
Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high throughput computing. Obstacles include heterogeneity of resources, which makes uniform allocation algorithms difficult to formulate, and distributed ownership, leading to widely varying allocation policies. Faced with these problems, we developed and implemented the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources. Novel aspects of the framework include a semi-structured data model that combines schema, data, and query in a simple but powerful specification language, and a clean separation of the matching and claiming phases of resource allocation. The representation and protocols result in a robust, scalable and flexible framework that can evolve with changing resources. The framework was designed to solve real problems encountered in the deployment of Condor, a high throughput computing system developed at the University of Wisconsin-Madison. Condor is heavily used by scientists at numerous sites around the world. It derives much of its robustness and efficiency from the matchmaking architecture.
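The symmetric matching idea behind this framework can be illustrated with a short sketch. It does not use the classad language or any Condor API: the `Ad` class, the `matches` and `matchmake` functions, and the use of plain Python callables for requirements and rank are all assumptions made for illustration, and the separate claiming phase is not modelled.

```python
from typing import Any, Callable, Dict, List, Optional

Attrs = Dict[str, Any]


class Ad:
    """An advertisement: attributes plus a constraint on, and a preference over, the other party."""

    def __init__(self, attrs: Attrs,
                 requirements: Callable[[Attrs, Attrs], bool],
                 rank: Callable[[Attrs, Attrs], float] = lambda me, other: 0.0):
        self.attrs = attrs
        self.requirements = requirements  # must hold for the other party's attributes
        self.rank = rank                  # higher value means a more desirable partner


def matches(a: Ad, b: Ad) -> bool:
    # A match is symmetric: each ad's requirements must accept the other ad.
    return a.requirements(a.attrs, b.attrs) and b.requirements(b.attrs, a.attrs)


def matchmake(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    """Return the compatible offer that the request ranks highest, or None."""
    compatible = [o for o in offers if matches(request, o)]
    if not compatible:
        return None
    return max(compatible, key=lambda o: request.rank(request.attrs, o.attrs))
```

Because both sides state their own requirements, a resource owner's policy (for example, refusing jobs from a particular user) is enforced by the same mechanism as the job's hardware constraints, which reflects the decentralized ownership the abstract emphasizes.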
Conference Paper
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and correcting faults, the implementation of these techniques in a particular context can be difficult. Hence, we propose a fault detection service designed to be incorporated, in a modular fashion, into distributed computing systems, tools, or applications. This service uses well-known techniques based on unreliable fault detectors to detect and report component failure, while allowing the user to trade off timeliness of reporting against false positive rates. We describe the architecture of this service, report on experimental results that quantify its cost and accuracy, and describe its use in two applications: monitoring the status of system components of the GUSTO computational grid testbed, and as part of the NetSolve network-enabled numerical solver.
Conference Paper
The Globus project is a multi-institutional research effort that seeks to enable the construction of computational grids providing pervasive, dependable, and consistent access to high-performance computational resources, despite geographical distribution of both resources and users. Computational grid technology is being viewed as a critical element of future high-performance computing environments that will enable entirely new classes of computation-oriented applications, much as the World Wide Web fostered the development of new classes of information-oriented applications. The authors report on the status of the Globus project as of early 1998. They describe the progress that has been achieved to date in the development of the Globus toolkit, a set of core services for constructing grid tools and applications. They also discuss the Globus Ubiquitous Supercomputing Testbed (GUSTO) that they have constructed to enable large-scale evaluation of Globus technologies, and they review early experiences with the development of large-scale grid applications on the GUSTO testbed
Conference Paper
We propose architectural mechanisms for structuring host communication software to provide QoS guarantees. In particular, we present and evaluate a QoS sensitive communication subsystems architecture for end hosts that provides real time communication support for generic network hardware. This architecture provides services for managing communication resources for guaranteed QoS (real time) connections, such as admission control, traffic enforcement, buffer management, and CPU and link scheduling. The design of the architecture is based on three key goals: maintenance of QoS guarantees on a per connection basis, overload protection between established connections, and fairness in delivered performance to best effort traffic. Using this architecture we implement real time channels, a paradigm for real time communication services in packet switched networks. We evaluate the implementation to demonstrate the efficacy with which the architecture maintains QoS guarantees while adhering to the stated design goals. The evaluation also demonstrates the need for specific features and policies provided in the architecture
Article
The differentiated services architecture has been proposed for providing different levels of services and has received wide attention. A packet in a diff-serv domain is classified into a class of service according to its contract profile and treated differently by its class. While many studies have addressed issues on the diff-serv architecture (e.g., dropper, marker, classifier and shaper), there have been few attempts to analytically understand a flow's behavior in a diff-serv network. We propose simple models of TCP behavior in a diff-serv network. Our models quantitatively characterize TCP throughput as functions of the contract rate, the packet-drop rate and the round-trip time in either two-drop precedence or three-drop precedence network. We also extend our models to aggregated flows. The models are validated through a number of simulations
Article
The Internet2 project is a partnership of over 130 U.S. universities, 40 corporations, and 30 other organizations. Since its inception, one of the primary technical objectives of Internet2 has been to engineer scalable, interoperable, and administrable interdomain QoS to support an evolving set of new advanced networked applications. Applications like distance learning, remote instrument access and control, advanced scientific visualization, and networked collaboratories will allow universities to fulfill their research and education missions into the future, but only if the network QoS these applications require can be ensured. To meet this challenge, the Internet2 QBone initiative has brought together a dedicated group of U.S. university and federal agency networks, international research networks, engineers, researchers, and applications developers to build a testbed for interdomain IP differentiated services. This article presents the engineering motivations behind DiffServ and its adoption by Internet2, provides an overview of the QBone architecture, and describes its anticipated deployment, including plans for a trial interdomain bandwidth brokering architecture. Security aspects are considered together with an inter-bandwidth broker reservation signaling protocol.
Article
New cell-switched network technologies and multimedia peripherals enable distributed applications with strict real-time requirements such as remote control with feedback. Time-bounded network communications services are necessary, but not sufficient, to meet application-to-application real-time requirements. Real-time communication must be coupled with real-time computing support at the network end-points. An end-point architecture for the computation/communications coupling must be flexible and robust to support a diversity of applications. The OMEGA architecture, when coupled with cell-switched networks (or others which can make bandwidth and delay guarantees), can approximate the behavior of dedicated microcontrollers connected by dedicated circuits in support of an application. The essence of the OMEGA architecture is resource reservation and management within the set of multimedia endpoints. Communications is preceded by a call set-up period where requirements, expressed in terms of Quality of Service (QoS) parameters, are negotiated, and guarantees are made at several logical levels, such as between applications and the network subsystem, applications and the operating system, and the network subsystem and the operating system. This establishes customized connections and allocation of resources appropriate to the application requirements and OS/network capabilities. To facilitate this resource management process, a new paradigm called the 'QoS Brokerage' is used. This paradigm requires new services and protocols across all layers of the protocol stack (i.e., the higher layers of B-ISDN), as well as re-architecting the application/network interface. A prototype of OMEGA has been implemented and tested with a master/slave telerobotics application using a dedicated 155 Mbps ATM LAN. 
This application employs media with highly diverse QoS requirements and therefore provides a good platform for testing how closely one can approximate a dedicated circuit and controller with workstation hosts and cell-switching. Experience with this implementation has helped to identify new challenges to extending these techniques to a larger domain of applications and systems, and raises several new research questions.
Article
Resource management offers Quality-of-Service reliability for time-critical continuous-media applications. Existing resource management systems in the Internet and ATM domains only provide means to reserve resources starting with the reservation attempt and lasting for an unspecified duration. However, for several applications such as video conferencing, the ability to reserve the required resources in advance is of great advantage. This paper outlines a new model for resource reservation in advance. We identify and discuss issues to be resolved for allowing resource reservation in advance. We show how the resource reservation in advance scheme can be embedded in a general architecture and describe the design and implementation of a resource management system providing reservation in advance functionality. 1 Introduction Computer systems used for continuous media processing must cope with streams having data rates of several Mbits/s and must provide timely processing guaran...
Article
In "Grids" and "collaboratories," we find distributed communities of resource providers and resource consumers, within which often complex and dynamic policies govern who can use which resources for which purpose. We propose a new approach to the representation, maintenance, and enforcement of such policies that provides a scalable mechanism for specifying and enforcing these policies. Our approach allows resource providers to delegate some of the authority for maintaining fine-grained access control policies to communities, while still maintaining ultimate control over their resources. We also describe a prototype implementation of this approach and an application in a data management context.
Article
Building on both Grid and Web services technologies, the Open Grid Services Architecture (OGSA) defines mechanisms for creating, managing, and exchanging information among entities called Grid services. Succinctly, a Grid service is a Web service that conforms to a set of conventions (interfaces and behaviors) that define how a client interacts with a Grid service. These conventions, and other OGSA mechanisms associated with Grid service creation and discovery, provide for the controlled, fault resilient, and secure management of the distributed and often long-lived state that is commonly required in advanced distributed applications. In a separate document, we have presented in detail the motivation, requirements, structure, and applications that underlie OGSA. Here we focus on technical details, providing a full specification of the behaviors and Web Services Description Language (WSDL) interfaces that define a Grid service.
Article
In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic "virtual organizations" formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. We present an Open Grid Services Architecture that addresses these challenges. Building on concepts and technologies from the Grid and Web services communities, this architecture defines a uniform exposed service semantics (the Grid service); defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities. The Open Grid Services Architecture also defines, in terms of Web Services Description Language (WSDL) interfaces and associated conventions, mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification. Service bindings can support reliable invocation, authentication, authorization, and delegation, if required. Our presentation complements an earlier foundational article, "The Anatomy of the Grid," by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how our architecture can be applied within commercial computing as a basis for distributed system integration--within and across organizational domains.
Article
The Globus project is a multi-institutional research effort that seeks to enable the construction of computational grids providing pervasive, dependable, and consistent access to high-performance computational resources, despite geographical distribution of both resources and users. Computational grid technology is being viewed as a critical element of future high-performance computing environments that will enable entirely new classes of computation-oriented applications, much as the World Wide Web fostered the development of new classes of information-oriented applications. In this paper, we report on the status of the Globus project as of early 1998. We describe the progress that has been achieved to date in the development of the Globus toolkit, a set of core services for constructing grid tools and applications. We also discuss the Globus Ubiquitous Supercomputing Testbed (GUSTO) that we have constructed to enable large-scale evaluation of Globus technologies, and review early exp...
Article
Reservation and adaptation are two well-known and effective techniques for enhancing the end-to-end performance of network applications. However, both techniques also have limitations, particularly when dealing with high-bandwidth, dynamic flows: fixed-capability reservations tend to be wasteful of resources and hinder graceful degradation in the face of congestion, while adaptive techniques fail when congestion becomes excessive. We propose an approach to quality of service (QoS) that overcomes these difficulties by combining features of reservations and adaptation. In this approach, a combination of online control interfaces for resource management, a sensor permitting online monitoring, and decision procedures embedded in resources enable a rich variety of dynamic feedback interactions between applications and resources. We describe a QoS architecture, GARA, that has been extended to support these mechanisms, and use three examples of application-level adaptive strategies to show ho...
Article
State-of-the-art and emerging scientific applications require fast access to large quantities of data and commensurately fast computational resources. Both resources and data are often distributed in a wide-area network with components administered locally and independently. Computations may involve hundreds of processes that must be able to acquire resources dynamically and communicate efficiently. This paper analyzes the unique security requirements of large-scale distributed (grid) computing and develops a security policy and a corresponding security architecture. An implementation of the architecture within the Globus metacomputing toolkit is discussed. 1 Introduction Large-scale distributed computing environments, or "computational grids" as they are sometimes termed [4], couple computers, storage systems, and other devices to enable advanced applications such as distributed supercomputing, teleimmersion, computer-enhanced instruments, and distributed data mining [2]. Grid applica...
Article
Metacomputing systems are intended to support remote and/or concurrent use of geographically distributed computational resources. Resource management in such systems is complicated by five concerns that do not typically arise in other situations: site autonomy and heterogeneous substrates at the resources, and application requirements for policy extensibility, co-allocation, and online control. We describe a resource management architecture that addresses these concerns. This architecture distributes the resource management problem among distinct local manager, resource broker, and resource coallocator components and defines an extensible resource specification language to exchange information about requirements. We describe how these techniques have been implemented in the context of the Globus metacomputing toolkit and used to implement a variety of different resource management strategies. We report on our experiences applying our techniques in a large testbed, GUSTO, incorporating ...
Article
Data-intensive applications in the Condor High Throughput Computing environment can place heavy demands on network resources for checkpointing and remote data access. We have developed mechanisms to monitor, control, and schedule network usage in Condor. By managing network resources, these mechanisms provide administrative control over Condor's network usage and improve the execution efficiency of Condor applications. 1 Introduction Until recently, the Condor research project has focused on the challenges of managing usage of CPU resources for High Throughput Computing (HTC) [4]. However, as the amount of physical memory available to HTC applications has dramatically increased, HTC environments have become an attractive platform for applications which are more data-intensive. As these applications place greater demands on the network, it has become important for Condor to manage usage of network resources in addition to CPU resources to enforce administrative network policies and to...
M. Beynon, R. Ferreira, T. M. Kurc, A. Sussman, and J. H. Saltz. Datacutter: Middleware for filtering very large scientific datasets on archival storage systems. In IEEE Symposium on Mass Storage Systems, pages 119-134, 2000.
I. Foster, C. Kesselman, G. Tsudik, and S. Tuecke. A security architecture for computational grids. In ACM Conference on Computer and Communications Security, pages 83-91. ACM Press, 1998.
R. Guérin and H. Schulzrinne. Network quality of service. In [16], pages 479-503.
T. Kurc, Ü. Çatalyürek, C. Chang, A. Sussman, and J. Saltz. Exploration and visualization of very large datasets with the Active Data Repository. Technical Report CS-TR-4208, University of Maryland, 2001.
L. Pearlman, V. Welch, I. Foster, C. Kesselman, and S. Tuecke. A community authorization service for group collaboration. In The IEEE 3rd International Workshop on Policies for Distributed Systems and Networks, June 2002.