Article

SNAP: A Protocol for Negotiation of Service Level Agreements and Coordinated Resource Management in

January 2002

Authors:

Ian Foster

University of Chicago

Volker Sander

FH Aachen University of Applied Sciences

(5 authors in total; 3 not shown)

A fundamental problem with distributed applications is to map activities such as computation or data transfer onto a set of resources that will meet the application's requirements for performance, cost, security, or other quality of service metrics. An application or client must engage in a multi-phase negotiation process with resource managers, as it discovers, reserves, acquires, configures, monitors, and potentially renegotiates resource access. Current approaches to resource management tend to specialize for specific classes of resource (processor, network, etc.), and have addressed coordination across resources in a limited fashion, if at all. We present a generalized resource management model in which resource interactions are mapped onto a well-defined set of platform-independent service level agreements (SLAs). We instantiate this model in the Service Negotiation and Acquisition Protocol (SNAP), which provides lifetime management and at-most-once creation semantics for remote SLAs. The result is a resource management framework for distributed systems that we believe is more powerful and general than current approaches. We explain how SNAP can be deployed within the context of the Globus Toolkit.
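The abstract's at-most-once creation guarantee can be illustrated with a small sketch. It makes one hypothetical assumption: the client attaches its own unique request identifier to each creation request. Under that assumption, a retry after a lost reply returns the SLA already created instead of producing a duplicate. Lifetime management is reduced here to an expiry time that can be extended. The names (`ResourceManager`, `create_sla`, `extend`, `expire`) are illustrative and are not the SNAP message set.

```python
import uuid
from dataclasses import dataclass


@dataclass
class SLA:
    sla_id: str
    terms: dict
    expires_at: float  # absolute expiry time, in the caller's clock units


class ResourceManager:
    """Hypothetical manager showing at-most-once SLA creation with lifetimes."""

    def __init__(self):
        # Maps the client's request identifier to the SLA it created.
        self._by_request = {}

    def create_sla(self, request_id, terms, lifetime, now):
        # A retried request (same identifier) gets the existing SLA back,
        # so at most one SLA is created per live identifier.
        existing = self._by_request.get(request_id)
        if existing is not None:
            return existing
        sla = SLA(sla_id=str(uuid.uuid4()), terms=dict(terms), expires_at=now + lifetime)
        self._by_request[request_id] = sla
        return sla

    def extend(self, request_id, lifetime, now):
        # Lifetime renewal: push the expiry time forward.
        sla = self._by_request[request_id]
        sla.expires_at = now + lifetime
        return sla

    def expire(self, now):
        # Drop SLAs whose lifetime has elapsed (and forget their identifiers).
        self._by_request = {
            rid: sla for rid, sla in self._by_request.items() if sla.expires_at > now
        }
```

Forgetting an identifier when its SLA expires means a very late retry would create a fresh SLA. A real implementation has to decide how long identifiers are remembered.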

Scheduling jobs on the grid - multicriteria approach

Article

Jan 2006

Automated hierarchical service level agreements

Article

Apr 2011

Constantinos Kotsokalis

The present dissertation concerns the area of Service Computing. More specifically, it contributes to the topic of enabling IT service stacks with dependability, such that they can be used even further in pragmatic business environments and applications. The instrument used for this purpose is a Service Level Agreement (SLA). The main focus is on SLA Hierarchies, which reflect corresponding Service Hierarchies. SLAs may be established manually, or automatically among software agents; it is mainly the latter case that is considered here. The thesis contributes by means of a formal problem definition for the construction of SLA hierarchies using a translation process, a management architecture, a formal model for defining penalties and a representation that facilitates the processing of SLAs. Using these tools, it is shown that automated SLA management in hierarchical setups is possible, through an application to Multi-Domain Infrastructure-as-a-Service. Within this specific technical area, different SLA-based resource capacity planning approaches are examined via simulation -- both for online and offline planning. The former case concerns normal runtime operations, and the thesis examines two greedy algorithms with regard to their energy-savings efficiency and their performance. In the latter case, a resource-scarce environment is simulated with the purpose of minimizing penalties from already established SLAs. This is achieved via formally-defined combinatorial models, which are solved and compared to two greedy algorithms.

Negotiation strategies considering market, time and behavior functions for resource allocation in computational grid

Article

Dec 2013

Providing an efficient resource allocation mechanism is a challenge in computational grids, because of large-scale resource sharing and the fact that Grid Resource Owners (GROs) and Grid Resource Consumers (GRCs) may have different goals, policies, and preferences. In a real-world market, various economic models exist for setting the price of grid resources, based on supply and demand and on their value to consumers. In this paper, we discuss the use of a multiagent-based negotiation model for interaction between GROs and GRCs. To realize this approach, we designed Market- and Behavior-driven Negotiation Agents (MBDNAs). The negotiation strategies adopted by MBDNAs take into account the following factors: Competition, Opportunity, Deadline, and the Negotiator's Trading Partner's Previous Concession Behavior. In our experiments, we compare MBDNAs with MDAs (Market-Driven Agents), NDF (Negotiation Decision Function) and Kasbah in terms of two metrics: total task completion and budget spent. The results show that, using the proposed negotiation model, MBDNAs outperform MDAs, NDF and Kasbah.
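NDF, one of the baselines named above, builds offers from time-dependent tactics. A minimal sketch of the commonly used polynomial form, seen from a buyer's side, follows. The parameter names are illustrative, and the sketch does not reproduce the MBDNA strategy, which adds market and behavior factors. The offer moves from `price_min` toward `price_max` as the deadline `t_max` approaches. `beta` controls the concession curve, and `kappa` sets how much of the range is conceded at the start.

```python
def ndf_offer(t, t_max, price_min, price_max, beta, kappa=0.0):
    """Buyer's offer at time t under a polynomial time-dependent tactic.

    beta < 1: Boulware (concede late); beta > 1: Conceder (concede early).
    kappa is the fraction of the price range conceded at t = 0.
    """
    ratio = min(t, t_max) / t_max  # elapsed fraction of the negotiation, capped at 1
    alpha = kappa + (1.0 - kappa) * ratio ** (1.0 / beta)
    return price_min + alpha * (price_max - price_min)
```

With `t_max=10`, `price_min=100`, `price_max=200` and `beta=0.5`, the offers at t = 0, 5, 10 are 100.0, 125.0 and 200.0. With `beta=2`, the offer at t = 5 is already about 170.7.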

HyperGrid: A Grid Architecture Based on a Virtual Hypercube

Article

May 2005

Computational grids offer access to a large number of computational resources for the execution of parallel and distributed applications. This work presents the architecture of HyperGrid, a platform for the execution of distributed and parallel applications that are based on the message-passing paradigm and written with MPI (Message Passing Interface). HyperGrid is based on a virtual hypercube, that is, a virtual network over the Internet. The Virtual Distributed Hypercube Algorithm (DiVHA) is used to maintain the hypercube and to monitor system resources. The hypercube provides the necessary resource location transparency and also hides resource heterogeneity, providing a fault-tolerant environment that is capable of self-reconfiguration when faults occur.
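The sketch below shows the basic structure a virtual hypercube relies on. DiVHA itself is not shown, and the function names are hypothetical. In a `dim`-dimensional hypercube, node identifiers are `dim`-bit integers, two nodes are neighbors exactly when their identifiers differ in one bit, and a message can reach any node by correcting differing bits one hop at a time.

```python
def neighbors(node, dim):
    """Nodes one hop away: flip each of the dim bits in turn."""
    return [node ^ (1 << i) for i in range(dim)]


def route(src, dst, dim):
    """Bit-fixing route: correct differing bits from lowest to highest."""
    path = [src]
    cur = src
    for i in range(dim):
        if (cur ^ dst) & (1 << i):  # bit i still differs from the destination
            cur ^= 1 << i
            path.append(cur)
    return path
```

For example, `route(6, 1, 3)` yields `[6, 7, 5, 1]`. The hop count equals the number of differing bits, so it is never more than `dim`. Fixing the bits in a different order yields alternative paths, which is one source of the fault tolerance a hypercube can offer.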

Research Challenges for a Scalable Distributed Information Management System

Article

A Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems can serve as a basic building block for a broad range of large-scale distributed applications, simplifying the design, development, and deployment of such services. In this document, we outline four key requirements that such an aggregation system should satisfy to be useful as a general middleware building block: scalability with both nodes and data attributes, flexibility to accommodate a broad range of services, administrative autonomy and isolation for availability and security, and robustness to reconfigurations in the system. We propose a new aggregation framework that leverages Distributed Hash Tables (DHTs) and a new aggregation abstraction that builds on a previously proposed abstraction in Astrolabe. We also present details of several significant applications that we propose to build on top of SDIMS.

Adaptive Co-Allocation of Distributed Resources for Parallel Applications

Article

Simulating the Quantum Fourier Transform with Distributed Computing

Article

In this paper, a new approach to quantum computer simulation is presented. The proposal is to create a simulator whose main concern is not simply the result of the algorithm for a given input. Instead, this simulator will imitate, as closely as possible, the internal behavior of a real quantum computer. Distributed computing is necessary to achieve this.
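The following minimal, single-machine state-vector sketch makes "imitating the internal behavior" concrete. It applies the textbook QFT circuit gate by gate: Hadamards, controlled phase rotations, and a final qubit reversal. Every intermediate state is therefore observable, rather than only the final output. It is not the authors' simulator and distributes nothing. Qubit `q` is assumed to be bit `q` of the amplitude index.

```python
import cmath
import math


def apply_h(state, q):
    """Hadamard on qubit q (bit q of the amplitude index)."""
    s = 1 / math.sqrt(2)
    for i in range(len(state)):
        if not i & (1 << q):  # visit each (bit q = 0, bit q = 1) pair once
            j = i | (1 << q)
            a, b = state[i], state[j]
            state[i], state[j] = s * (a + b), s * (a - b)


def apply_cphase(state, control, target, angle):
    """Multiply amplitudes where both qubits are 1 by exp(i * angle)."""
    phase = cmath.exp(1j * angle)
    mask = (1 << control) | (1 << target)
    for i in range(len(state)):
        if (i & mask) == mask:
            state[i] *= phase


def apply_swap(state, q1, q2):
    """Exchange the values of two qubits."""
    for i in range(len(state)):
        if (i >> q1) & 1 and not (i >> q2) & 1:
            j = i ^ (1 << q1) ^ (1 << q2)
            state[i], state[j] = state[j], state[i]


def qft(amplitudes):
    """Gate-by-gate QFT: |x> -> N^(-1/2) * sum_k exp(2*pi*i*x*k/N) |k>."""
    size = len(amplitudes)
    n = size.bit_length() - 1
    if size == 0 or size != 1 << n:
        raise ValueError("state length must be a power of two")
    state = [complex(a) for a in amplitudes]
    for t in range(n - 1, -1, -1):          # most significant qubit first
        apply_h(state, t)
        for c in range(t - 1, -1, -1):      # controlled rotations from lower qubits
            apply_cphase(state, c, t, math.pi / 2 ** (t - c))
    for i in range(n // 2):                 # undo the circuit's qubit reversal
        apply_swap(state, i, n - 1 - i)
    return state
```

For two qubits, `qft([0, 1, 0, 0])` gives approximately `[0.5, 0.5j, -0.5, -0.5j]`, which matches the discrete Fourier transform of the basis state |1>.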

Agent-based negotiation for Resource Allocation in Grid

Article

Grid technology allows resources to be shared within groups of individuals or organizations. Submitting a job to a grid first requires identifying a list of servers that meet a given job description; the best server must then be selected from this list. None of the current research associates the choice of server with the service delivery conditions. To incorporate quality into the grid service, it is important to know when the job will finish and what cost and quality factors are involved. We present a Multi-Agent System that chooses the best place to run a grid job by means of negotiation. Job execution is predicted with a case-based reasoning technique, and negotiation flexibility is delimited by resource policies. Our approach models different forms of negotiation, identified as multi-issue, bilateral and chaining negotiations.

SLA for E-Learning System Based on Cloud Computing

Article

Oct 2015

The Service Level Agreement (SLA) has become an important issue, especially for Cloud Computing and online services based on the ‘pay-as-you-use’ model. Establishing SLAs, which can be defined as a negotiation between the service provider and the user, is needed for many current applications, such as E-Learning systems. This paper presents an approach to optimizing the SLA parameters for any E-Learning system on a Cloud Computing platform, defining the negotiation process, a suitable framework, and a sequence diagram to accommodate E-Learning systems.

Selecting Negotiation Strategies for Meeting Scheduling Using a Model Based Approach

Article

Dec 2012

S. M. Mozammal Hossain

Resolving conflicts using automatic negotiation for agent-based meeting scheduling is a challenge. In order to negotiate with all meeting participants strategically, a set of negotiation strategies and a strategy selection model are required. This research focuses on developing a strategy selection model for selecting an appropriate strategy from a set of different strategies to resolve or avoid meeting conflicts. The strategy selection model is based on analyzing historical data, current meeting scheduling, participants’ profile and preference data using AI techniques.

Interoperability and Reuse with WS-Agreement: Problems for Discussion as of January 20, 2004

Article

The GRAAP working group of the Global Grid Forum is drafting a specification for the management of resources and services using negotiated service level agreements in a Web services environment (WS-Agreement). This memo discusses ongoing design considerations for this activity, focusing on the desire to strike a balance between goals for flexibility, reusability, and interoperability of systems utilizing the WS-Agreement interface. If WS-Agreement services are to be discovered and utilized by clients in a large-scale environment, their extended negotiation capabilities, policies, and offered services must be available for search, inspection, and comparison. The GRAAP working group faces difficult design challenges to achieve these goals while utilizing Web services technologies for term and constraint languages, negotiation messages, negotiator characterization, and negotiator discovery.

Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System

Article

Jan 2020
SENSORS-BASEL

Electrical capacitance tomography (ECT) is a non-invasive visualization technique that can be used for industrial process monitoring. However, acquiring images through 3D ECT often requires time-consuming, complex computations on large matrices. Therefore, a new parallel approach to 3D ECT image reconstruction is proposed, based on multi-GPU, multi-node algorithms in a heterogeneous distributed system. This solution speeds up the required data processing. A distributed measurement system with a new framework for parallel computing and a special plugin dedicated to ECT is presented in the paper. The computing system architecture and its main features are described. Both data distribution and transmission between the computing nodes are discussed. System performance was measured using the LBP and Landweber reconstruction algorithms, which were implemented as part of the ECT plugin. Applying the framework with a new network communication layer reduced data transfer times significantly and improved overall system efficiency.

MORPHOSYS: Efficient Colocation of QoS-Constrained Workloads in the Cloud

Preprint

Dec 2019

In hosting environments such as IaaS clouds, the desired application performance is usually guaranteed through Service Level Agreements (SLAs), which specify the minimal fractions of resource capacities that must be allocated for proper operation. Arbitrary colocation of applications with different SLAs on a single host may result in inefficient utilization of the host's resources. In this paper, we propose that periodic resource allocation and consumption models be used for a more granular expression of SLAs. Our proposed SLA model has the salient feature that it exposes flexibilities that enable the IaaS provider to safely transform SLAs from one form to another in order to achieve more efficient colocation. Towards that goal, we present MorphoSys: a framework for a service that allows the manipulation of SLAs to enable efficient colocation of workloads. We present results from extensive trace-driven simulations of colocated Video-on-Demand servers in a cloud setting. The results show that potentially significant reductions in wasted resources (by as much as 60%) are possible using MorphoSys.

Orchestrating Intercontinental Advance Reservations with Software-Defined Exchanges

Article

Nov 2017
FUTURE GENER COMP SY

To interconnect research facilities across wide geographic areas, network operators deploy science networks, also referred to as Research and Education (R&E) networks. These networks allow experimenters to establish dedicated circuits between research facilities for transferring large amounts of data, using advance reservation systems. Intercontinental dedicated circuits typically require coordination between multiple administrative domains, which need to reach an agreement on a suitable advance reservation. In traditional systems, the success rate of finding an advance reservation decreases as the number of participating domains increases, because the circuit is composed over a single path. To improve provisioning of multi-domain advance reservations, we propose an architecture for end-to-end service orchestration in multi-domain science networks that leverages software-defined exchanges (SDX) to provide multi-path, multi-domain advance reservations. We have implemented an orchestrator for multi-path, multi-domain advance reservations and an SDX to support these services. Our orchestration architecture enables multi-path, multi-domain advance reservations and improves the reservation success rate from 50% in single-path systems to 99% when four paths are available.

Aligning Service Level Agreements with Service-Oriented Enterprise Architecture

Conference Paper

Oct 2017

Aggregating Service Level Agreements in Services Bundling: A Semiring-Based Approach

Conference Paper

Nov 2016
Lect Notes Comput Sci

Business services arguably play a central role in service-based information systems, as they fill the gap between the technicality of Service-Oriented Architecture and the business aspects captured in Enterprise Architecture. Business services have distinctive features that are not typically observed in Web services; for example, significant portions of the functionality of business services might be executed in a human-mediated fashion. As such, a service level agreement (SLA) should be described as a mixture of human-mediated functionality (e.g., service penalty) and computer-interpretable measurement (e.g., reliability, payment). In this paper, we propose a formal framework for reasoning about SLAs from the perspective of services bundling: the practice of innovatively organizing business services into a bulkier service offering that creates new value. Specifically, we (a) represent the multi-level SLA of a business service in terms of service reliability, payment and penalty using the mathematical structure of a semiring; (b) formalize the aggregation of the SLAs of the constituent services that make up a service bundle; and (c) make the multi-level SLAs of a bundled service technically comparable. The main contribution of this work is machinery for handling the large number of SLAs generated through services bundling, allowing service consumers to pick the right service offering according to their preferences.
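The abstract does not give its exact semirings, so the following is only a hypothetical illustration of the idea. A semiring pairs a "combine" operation (here, aggregating constituents into one bundle) with a "choose" operation (here, picking among alternative offerings). Two standard instances appear below. Reliability uses ([0, 1], max, ×) and assumes constituents fail independently and are all required. Payment uses (min, +). The `Semiring` class and the `RELIABILITY` and `PAYMENT` names are illustrative.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Iterable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # choose among alternative offerings
    times: Callable[[float, float], float]  # combine constituents of one bundle
    zero: float                             # identity of plus
    one: float                              # identity of times

    def bundle(self, values: Iterable[float]) -> float:
        return reduce(self.times, values, self.one)

    def best(self, values: Iterable[float]) -> float:
        return reduce(self.plus, values, self.zero)


# Reliability ([0, 1], max, x): constituents assumed independent and all
# required, so reliabilities multiply; the most reliable alternative wins.
RELIABILITY = Semiring(plus=max, times=lambda a, b: a * b, zero=0.0, one=1.0)

# Payment (min, +): prices add up within a bundle; the cheapest alternative wins.
PAYMENT = Semiring(plus=min, times=lambda a, b: a + b, zero=float("inf"), one=0.0)
```

Take two alternative bundles built from services with reliabilities (0.99, 0.95) and (0.98, 0.97). `RELIABILITY.bundle` gives about 0.9405 and 0.9506 respectively, and `RELIABILITY.best` picks the second. Penalties, or comparing bundles on several SLA levels at once, would need further structure, such as a product of semirings.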

Agent Based Grid Resource Discovery with Negotiated Alternate Solution and Non-Functional Requirement Preferences

Article

Mar 2009
J Comput Sci

Muthuchelvi

Problem statement: As grid resources are geographically distributed, efficient resource discovery and management has become an important requirement. Moreover, grid users are independent entities, and negotiation is necessary to reconcile their diverse characteristics. A special mechanism was therefore required to negotiate and discover the required resource, or a similar resource as an alternative when discovery fails. In addition, the quality of the service provided in a grid environment depends on both functional and Non-Functional Requirements (NFRs), but conflicts between NFRs are not yet resolved effectively. The objectives were to discover the requested resources for the requester; to provide compromised alternate resources through negotiation when resource discovery fails, thereby increasing the success rate of the agent; to provide knowledge for efficient management of resources; and to improve quality of service by considering NFRs. Approach: A system called Agent Based Grid Resource Discovery with Negotiated Alternate Solution and Non-Functional Requirement Preferences (AGRD_NFRP) was proposed to provide an expeditious resource, and the most relevant alternate resource when discovery fails. Four types of intelligent and mobile agents were proposed for judicious management of resources to the advantage of resource providers and requesters, ensuring speedy execution of processes. Resource discovery, negotiation and alternate solutions were handled by these agents. To improve the quality of service, the non-functional requirements of the grid user's request and their preferences were identified, and conflicts among them were analyzed using fuzzy rules. Results: The results showed that the proposed AGRD_NFRP system consistently produces a higher success rate by providing alternate solutions and obtaining knowledge from the cognitive agent. Quality of service was enriched by prioritizing the preferences of the grid user.
Conclusion: On numerous occasions, grid users face the unavailability of high-end resources for completing the task at hand. In this context, the approach outlined in this research is most appropriate, convenient and efficient. The proposed AGRD_NFRP system played a crucial role in bridging the seemingly wide gap between resource requesters and resource owners.

Enhanced SLA management in the high performance computing domain

Thesis

Jan 2011

Bastian Koller

This thesis describes a Service Level Agreement schema for the High Performance Computing domain and the corresponding architecture for SLA management, both developed on the basis of three different use cases.

Risk Assessment Models for Resource Failure in Grid Computing

Article

Abdullah Alsoghayer

Grid Resource Allocation: A Review

Article

Jun 2012
Res J Inform Tech

As computing technology improves, access to computing resources increases, and the demands placed on resources grow ever higher. A grid is a large-scale, heterogeneous, dynamic collection of independent systems, geographically distributed and interconnected with high-speed networks. In a grid, resource allocation is the process of allocating user jobs to CPUs: jobs are divided into tasks, which are allocated to different computers on the grid for execution. Resource allocation is one of the critical features of grid technology. We found that resource heterogeneity has a great impact on resource allocation, which is significant in terms of performance, reliability, robustness and scalability. Indeed, system robustness increases as system complexity increases. Resource allocation is also an NP-complete problem for which there is no definitive solution. The main objective of this study is to review the various grid resource allocation strategies, which will serve as a guide for researchers and inform our vision of future research directions. To facilitate further developments in the area, it is essential to survey and review the existing body of knowledge; in this chapter, we have therefore studied and classified various ways to achieve an optimum solution. Building on operations research methods (game theory and the transportation method), which have been widely used in grid resource allocation to find optimum solutions, we will design and evaluate a new resource allocation algorithm using either simulation or a real grid environment.

A Critical Review of Cloud Computing: Researching Desires and Realities

Article

Sep 2012

Cloud computing has become central to current discussions about corporate information technology. To assess the impact that cloud may have on enterprises, it is important to evaluate the claims made in the existing literature and critically review these claims against empirical evidence from the field. To this end, this paper provides a framework within which to locate existing and future research on cloud computing. This framework is structured around a series of technological and service ‘desires’, that is, characteristics of cloud that are important for cloud users. The existing literature on cloud computing is located within this framework and is supplemented with empirical evidence from interviews with cloud providers and cloud users that were undertaken between 2010 and 2012. The paper identifies a range of research questions that arise from the analysis.

Cyberinfrastructure for biomedical applications: Metascheduling as an essential component for pervasive computing

Article

Jan 2009

Biomedical, translational and clinical research, through increasingly complex computational modeling and simulation, generates enormous potential for personalized medicine and therapy, and an insatiable demand for advanced cyberinfrastructure. Metascheduling that provides integrated interfaces to computation, data, and workflow management in a scalable fashion is essential to an advanced pervasive computing environment that enables mass participation and collaboration through virtual organizations (VOs). Avian Flu Grid (AFG) is a VO dedicated to members from the international community to

Adaptive Service Level Management for Grids

Article

Recent advances in Grid computing have led to real-world deployments of grid implementations in the eScience and commercial domains. With the increasing demands on these resources, the role of Service Level Agreements (SLAs) becomes extremely important, as an SLA is the means used to define the terms of usage and the obligations of the relevant parties. However, the current methods used to manage grid resources are not capable of supporting SLAs. An SLA management system has to be robust and capable of adapting to changing resource demands from continuous SLA requests. In this paper, we present our adaptive approach to SLA management for Grid systems, based on the adaptive mechanisms of cognitive control in the human brain and an ontological approach to SLA decomposition.

Position Paper: An Agent-Based Quality of Service Negotiation Framework for Grid Environments - Semi Automatic Service Level Agreement Adaptation

Article

Thomas Sandholm

As a result of ever-increasing network performance, and growing needs in the science community for large-scale computations on huge data sets, the research fields of parallel computing and distributed computing merged into what became known as Grid computing. A Grid is a system that coordinates decentralized resources, such as CPU, disk space and bandwidth, while leveraging ubiquitous communication protocols to solve complex problems. Due to the typically large scale of the Grid, the resources are often heterogeneous in nature, which puts a high burden on the infrastructure (the middleware) that shields end users from back-end complexity. On the other hand, the heterogeneity of Grid resources also provides an enormous opportunity to leverage a multitude of Quality of Service (QoS) characteristics. To reason about both the provided and the expected QoS of resources, a common language must be used; such languages are often referred to as policy languages. The contract between a resource and a user that describes the promised QoS is frequently referred to as a Service Level Agreement (SLA). Making sure that SLA contracts in a Grid environment are adhered to is a non-trivial task, because an SLA often spans many software and hardware abstraction layers. Coordinated monitoring and management software is thus vital for such a system. Policing contracts is not simply a matter of making sure that users don't exceed their QoS grants and that resources fulfill their promises. Such a static view may in fact lead to worse resource utilization and less service offered to the user. Instead, intelligent policing agents could be used by the infrastructure as an adaptive, self-healing, autonomous mechanism to optimize various resource or user goals. This more dynamic and less restrictive way of reasoning fits well into the area of soft computing and rule-based systems.
Fuzzy logic, for example, facilitates decision-making based on approximations, uncertainties, and conflicting data, with smooth transitions between policy choices. In conjunction with a learning and adaptive system, such as a neural network that populates the fuzzy rule base with knowledge about usage, it could serve as an agent framework baseline.

2 Problem Statement

Resource providers may want to optimize utilization, whereas resource users may want to optimize response time while minimizing cost. These goals can be contradictory, however, and may hence be handled by separate agents. This work addresses how independent, distributed agents that leverage ubiquitous Grid and multi-agent protocols and standards can automatically refine contracts based on monitored performance, usage records, and soft-computing, rule-based decision-making algorithms such as fuzzy logic and neural networks. The emphasis will be on SLA trade-offs, and on languages and methodologies for implementing and describing such trade-offs. Simulations will be focused on
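The position paper proposes fuzzy logic but gives no concrete rules. The following is therefore a hypothetical sketch of how "smooth transitions between policy choices" can arise. Observed utilization is mapped to triangular "low / medium / high" sets, and each set recommends an adjustment to a user's QoS grant. The result is the membership-weighted average of those recommendations, in the style of zero-order Sugeno inference. The set boundaries and the ±0.2 adjustments are invented for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets over observed utilization (0..1) and the
# adjustment each rule recommends for a user's QoS grant (+ = loosen).
RULES = [
    ((-0.5, 0.0, 0.5), +0.2),   # utilization low    -> grant more
    ((0.0, 0.5, 1.0), 0.0),     # utilization medium -> keep grant
    ((0.5, 1.0, 1.5), -0.2),    # utilization high   -> tighten grant
]


def adjust_grant(utilization):
    """Zero-order Sugeno inference: membership-weighted average of rule outputs."""
    weights = [tri(utilization, *mf) for mf, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * out for w, (_, out) in zip(weights, RULES)) / total
```

At 75% utilization, the "medium" and "high" sets each have membership 0.5, so the recommended adjustment is -0.1, halfway between the two rules rather than a hard switch. A learning component could later tune the set boundaries and outputs from usage records.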

Web Service-Based Grid Architecture and Its Supporting Environment

Article

Jul 2004

The Grid is a new paradigm of Internet computing for sharing distributed resources and collaborating across them. A web service-based approach to the Grid can improve the extensibility and interoperability of Grid systems. In this paper, a layered Grid functional model is discussed, and a Web service-based Grid architecture is presented within the OGSA (Open Grid Services Architecture) framework. An approach to integrating web services and Grid technology is proposed. Web service workflow technology is used to model the tasks of a Grid application and its requirements on resource services. The architecture of a Web service-based Grid supporting environment called WebSASE4G is introduced, offering a new approach to Web service-based Grid architecture.

Resource Management for Early Production Grids

Article

This contribution presents the ongoing development of a resource manager for use in early production grids. Even though our main focus is to develop a stable brokering facility for current production grids, we also address features needed in further improved resource managers for future enhanced grid infrastructures. The primary target environment is the NorduGrid platform, comprising around 20 parallel systems in 5 countries, available for production grid jobs 24 hours a day. Application characteristics considered include serial, parallel, and coordinated multi-resource jobs running in sequence or in parallel, all types in either interactive or non-interactive mode. The brokering process aims to minimize the time to delivery for each individual job and is based on a number of new features including reservation capability, information about currently used or reserved capacity, benchmark-scaled time predictions, and queue adaptation capability. We present the basic motivations for all these features and discuss various issues regarding their implementations in the current grid environment.
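A broker that minimizes time to delivery using benchmark-scaled predictions might rank clusters roughly as follows. This is a hypothetical sketch, not the NorduGrid broker. It assumes runtime scales inversely with a single benchmark score. The `Cluster` fields (`benchmark`, `queue_wait`, `stage_in`) are illustrative stand-ins for the capacity and reservation information the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    benchmark: float       # hypothetical speed score; higher is faster
    queue_wait: float      # estimated seconds until the job can start
    stage_in: float = 0.0  # estimated seconds to transfer input data


def predicted_delivery(cluster, ref_runtime, ref_benchmark):
    """Stage-in + queue wait + runtime scaled from a reference machine."""
    runtime = ref_runtime * ref_benchmark / cluster.benchmark
    return cluster.stage_in + cluster.queue_wait + runtime


def select_cluster(clusters, ref_runtime, ref_benchmark):
    """Pick the cluster with the earliest predicted time to delivery."""
    return min(clusters, key=lambda c: predicted_delivery(c, ref_runtime, ref_benchmark))
```

Consider a job that takes 3600 s on a reference machine with score 100. A cluster scoring 200 with a 1000 s queue (2800 s total) beats an idle cluster scoring 100 (3600 s). With a 2000 s queue, the faster cluster loses (3800 s), which is why the broker needs current or reserved capacity information and not just raw speed.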

Service Level Agreement Requirements of an Accounting-Driven Computational Grid

Article

Thomas Sandholm

In this paper, we present the requirements of a national computing Grid. In particular, we discuss the issues involved in managing complex policies of multiple stakeholders in such a large-scale, dynamic, and heterogeneous Grid. We also propose a Service Level Agreement (SLA) and agent-based architecture to address these issues. This work is a continuation of the work performed and experiences gained when we developed a Grid accounting system for the Swedish national Grid network, called SweGrid, which provides the foundation for the investigation presented here. We conclude that many SLA concepts fit very well within the SweGrid network to address some of the issues of the current system. Future work includes prototyping parts of the SLA framework and running simulations before eventually deploying it in the SweGrid production environment.

Predictive adaptation for service level agreements on the grid

Article

Mar 2006

Users of Grid systems often need to attach Quality of Service (QoS) information, such as time or performance constraints, to guarantee timely execution of their applications. Grid resources vary in quality and reliability and can easily be swamped by competing applications. If this coincides with a user's execution, their results may be delayed. To address this, we propose a Service Level Agreement (SLA) management system that includes resource reservation and run-time adaptation. Our system can predict the execution time of an SLA-bound application before and during runtime. A historical usage record for auditing and for predicting future execution times is also described. Through experimental analysis, we show that our solution can predict, with some accuracy, the execution time of SLA-bound applications before and during runtime. Mechanisms for automated monitoring and violation capture are presented, showing how performance and time constraints can be validated. Adaptation through migration is shown to be useful in reducing the execution time of our application when the CPU capacity available to that application is reduced.
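The two prediction points described above (before and during runtime) and migration-triggered adaptation can be sketched as follows. Everything here is a hypothetical simplification, not the authors' predictor. The pre-run estimate assumes historical runtimes were recorded at full CPU share and scale inversely with the share now available. The in-run estimate assumes progress is linear. The deadline is measured from job start.

```python
from statistics import mean


def predict_before(history_runtimes, cpu_share):
    """Pre-run estimate: mean past runtime (at full CPU) scaled by the share now available."""
    return mean(history_runtimes) / cpu_share


def predict_total(elapsed, fraction_done):
    """In-run estimate: linear extrapolation from the progress made so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed / fraction_done


def should_migrate(elapsed, fraction_done, deadline):
    """Flag a likely SLA violation: predicted completion exceeds the deadline."""
    return predict_total(elapsed, fraction_done) > deadline
```

With past runs of 100, 120 and 110 s and half the CPU available, the pre-run estimate is 220 s. If a job is only 25% done after 60 s against a 200 s deadline, the extrapolated 240 s total would trigger migration.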

Probabilistic Risk Assessment for Resource Provision in Grid

Article

Service Level Agreements (SLAs) are introduced to overcome the shortages of best-effort approach in Grid computing and make Grid computing more attractive for commercial uses. Yet commercial Grid providers are not keen to adopt SLAs, since there is a risk of SLA violation, which will result in a penalty fee. This paper analyses failure data collected from three different Grid sites. We study the statistics of the data including the root cause, the mean time to repair and time between failures. We find that software and hardware failures are the largest contributors, and that the time to repair varies, depending on the root cause, from 13 hours in network errors to around 46 hours in unknown errors. We also find that the repair time is well modelled by a Weibull distribution. From the analysis of the historical data we find that the distribution between failures in a Grid system is well modelled by a Weibull distribution with decreasing hazard rate, and this can be used by a resource provider to predicate the risk of failure.
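A Weibull model of time between failures can be turned into a per-job risk estimate roughly as follows. The shape and scale values used below are hypothetical, not the fitted values from the paper. A shape below 1 gives the decreasing hazard rate the abstract reports. The quantity a provider would price into an SLA is the conditional probability that a resource which has already survived to `t0` fails during the job's window.

```python
import math


def weibull_survival(t, shape, scale):
    """P(no failure before time t)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at t > 0; decreasing in t when shape < 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def failure_risk(t0, duration, shape, scale):
    """P(failure in (t0, t0 + duration] | no failure up to t0)."""
    return 1.0 - weibull_survival(t0 + duration, shape, scale) / weibull_survival(t0, shape, scale)
```

With the hypothetical values `shape=0.7` and `scale=100` (hours), a 24-hour job on a node that has been up for 200 hours carries a lower failure risk than the same job on a freshly restarted node. With `shape=1` the model reduces to the memoryless exponential case, and uptime makes no difference.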

Towards Trustworthy Resource Selection in Grid: A Fuzzy Partial Ordering based Approach

Article

The problem of resource selection in a Grid is challenging because of the wide range of selection and the high degree of unfamiliarity involved. Efficient resource sharing and utilization cannot be achieved without the guarantee of a higher degree of trust. In this paper, a reputation mechanism is introduced into resource selection in the Grid, with the aim of strengthening guarantees of trustworthiness and reliability. Because reputation is multi-faceted and uncertain, the proposed approach, guided by the evaluation and decision-making ideas of fuzzy partial ordering, builds a fuzzy partial order model of each resource provider's multi-faceted reputation, integrates the overall information, and chooses suitable resource providers according to the final integrated order. Compared with other methods, this approach takes the relevant factors into account more comprehensively.

Service Level Agreements in Service-Oriented Architecture Environments

Article

Sep 2008

Quality attribute requirements play an important role in service selection in a service-oriented architecture environment. It is easy to envision finding several services that fulfill the functional requirements but fail to meet important quality attribute measures. Service level agreements provide the architect with a specification of the verifiable quality characteristics that the service will provide. Such a specification allows the architect to select the service that best supports the system's quality attribute requirements. This report surveys the state of practice in service level agreement specification and offers guidelines on how to assure that services are provided with high availability, security, performance, and other required qualities. In addition, this report discusses the quality properties that have been expressed in service level agreements, and it considers the mechanisms used to assure service quality by contract and those used by service providers to achieve and monitor quality properties. Also, this report examines the support for expressing service quality requirements in existing service technologies and describes areas where more research is needed.

Dynamic Operating Policies for Commercial Hosting Environments

Article

Aug 2006

This paper reports on two strands of work being undertaken as part of the EPSRC-funded DOPCHE project. The paper focuses on open software architectures for dynamic operating policies and on a performance model used to find optimal operating policies.

SLA management and resource modelling for grid computing

Article

Current implementations of grids exist in the research and academic communities, where applications are generally computationally intensive and usually involve large amounts of data. For these purposes and in this environment, a 'best effort' resource guarantee is sufficient. However, as grid use matures in both the academic and commercial arenas, grid providers will require some form of service level management to offer guarantees beyond best effort. This paper presents a service level approach to grid resource management for SLAs, focussing on the management of SLAs and the modelling of the resources required by these processes.

The Use of Technology within Knowledge Management: A Review

Article

Dec 2006

Will Venters

This paper reviews how technological artefacts are employed within Knowledge Management interventions. The paper first describes the nature of technology within Knowledge Management practice. It then draws upon a categorisation of knowledge management as either functionalist or interpretivist to consider the use of technology either in encoding knowledge objects or in supporting personalisation and the emergence of communities of practice. Next, the paper draws upon phenomenological writings, in particular the work of Martin Heidegger, to consider the way in which individuals engage with technology and how this affects the design of knowledge management technology. Finally, the paper concludes by calling for future research to consider the situated design of technology for Knowledge Management.

Market‐Oriented Resource Management and Scheduling: A Taxonomy and Survey

Chapter

Jul 2011

Market-oriented computing has gained a lot of attention from both industry and academia. Grid computing is the major paradigm that supports market-oriented computing and can thus make the vision of computing as a utility a reality. The most important challenge in enabling utility Grids is resource management and scheduling. Over the last decade, many researchers have tried to address issues in resource management and scheduling, but the original vision still seems far away. Thus, to identify the gaps and direct future research, this chapter summarizes and classifies the important work through a comprehensive taxonomy. The chapter also surveys the most popular market-oriented resource management systems and identifies research gaps that still need to be filled. This survey is intended to help researchers make a cooperative effort towards the goal of utility grids and to provide insights for extending and reusing existing grid middleware.

Service Level Agreement Metrics for Real-Time Application on the Grid

Conference Paper

May 2008

Highly demanding applications running on grids need carefully prepared environments. The real-time High Energy Physics (HEP) application from the Int.eu.grid project is a good example of an application whose requirements are difficult for typical grid environments to fulfill. In this paper we present Service Level Agreement metrics that the application's dedicated virtual organization (HEP VO) uses to sign SLAs with service providers. With signed SLAs, the HEP VO is able to guarantee sufficient service quality for the application. These SLAs are enforced using the VO Portal presented in the paper.

The Virtual Resource Manager: Local Autonomy Versus QoS Guarantees for Grid Applications

Chapter

Jan 2006

In this paper, we describe the architecture of the virtual resource manager (VRM), a management system designed to reside on top of local resource management systems for cluster computers and other kinds of resources. The most important feature of the VRM is its capability to handle quality-of-service (QoS) guarantees and service-level agreements (SLAs). The particular emphasis of the paper is on the various opportunities to deal with local autonomy for resource management systems that do not support SLAs. As local administrators may not want to hand over complete control to the Grid management, it is necessary to define strategies that deal with this issue. Local autonomy should be retained as much as possible while providing reliability and QoS guarantees for Grid applications, e.g., specified as SLAs.

Strategies for the Service Market Place

Conference Paper

Aug 2007

We describe a number of strategies for a future service-oriented market place. We describe the SLA's role within the service framework and how it enables customers to make value judgements regarding the quality of a service. We also discuss the complexity of too much choice from both the customer and provider points of view, and advocate a “discrete offer” approach. We discuss the “cost of negotiation” and argue that it must be carefully balanced against the cost, value and risk of the offering being negotiated for. We extend the negotiation analysis by presenting and discussing results from a simulated Grid market place, and show that it is possible for service providers to deny themselves work by attempting to offer a high-quality guaranteed service.

Web services on demand: WSLA-driven automated management

Article

Feb 2004
IBM SYST J

In this paper we describe a framework for providing customers of Web services differentiated levels of service through the use of automated management and service level agreements (SLAs). The framework comprises the Web Service Level Agreement (WSLA) language, designed to specify SLAs in a flexible and individualized way, a system to provision resources based on service level objectives, a workload management system that prioritizes requests according to the associated SLAs, and a system to monitor compliance with the SLA. This framework was implemented as the utility computing services part of the IBM Emerging Technologies Tool Kit, which is publicly available on the IBM alphaWorks™ Web site.

Lifetime service level agreement management with autonomous agents for services provision

Article

Jul 2009
INFORM SCIENCES

In the web services environment, a service level agreement (SLA) refers to the mutually agreed understandings and expectations between service consumers and providers regarding service provision. Although SLA management is critical to the wide adoption of web services technologies in the real world, support for it is currently very limited, especially in web service composition scenarios. There is a lack of adequate frameworks and technologies supporting SLA operations such as SLA formation, enforcement, and recovery. This paper presents a novel agent-based framework that uses the agents' abilities of negotiation, interaction, and cooperation to facilitate autonomous SLA management in the context of service composition provision. Based on this framework, mechanisms for autonomous SLA operations are proposed and discussed. Simulation results show that by integrating agents and web services, the framework can address SLA management issues arising in sophisticated service composition scenarios.

Portfolio Scheduling for Managing Operational and Disaster-Recovery Risks in Virtualized Datacenters Hosting Business-Critical Workloads

Conference Paper

Jun 2019

Advanced Information Systems Engineering: 19th International Conference, CAiSE 2007, Trondheim, Norway, June 11-15, 2007. Proceedings

Book

Jan 2007
Lect Notes Comput Sci

ACM SIGACT news distributed computing column 8

Article

Sep 2002

Sergio Rajsbaum

The Distributed Computing Column covers the theory of systems that are composed of a number of interacting computing elements. These include problems of communication and networking, databases, distributed shared memory, multiprocessor architectures, operating systems, verification, internet, and the web. This issue consists of the paper "Distributed Computing Research Issues in Grid Computing" by Henri Casanova. Many thanks to Henri for contributing to this issue.

Quality and Service Level Agreements in Service-Oriented Architectures

Article

Jan 2008

Κωνσταντίνος Τσερπές

A Survey on Service Quality Description

Article

Jul 2013

Quality of service (QoS) can be a critical element for achieving the business goals of a service provider, for the acceptance of a service by the user, or for guaranteeing service characteristics in a composition of services, where a service is defined as either a software or a software-support (i.e., infrastructural) service that is available on any type of network or electronic channel. The goal of this article is to compare the approaches to QoS description in the literature, which include several models and metamodels. We consider a large spectrum of models and metamodels to describe service quality, ranging from ontological approaches that define quality measures, metrics, and dimensions, to metamodels enabling the specification of quality-based service requirements and capabilities as well as of SLAs (Service-Level Agreements) and SLA templates for service provisioning. Our survey inspects the characteristics of the available approaches to reveal which are consolidated and which are specific to given aspects, and to analyze where further research and investigation are needed. The approaches illustrated here were selected through a systematic review of conference proceedings and journals spanning various research areas in computer science and engineering, including distributed, information, and telecommunication systems, networks and security, and service-oriented and grid computing.

QoS-Based Grid Resource Management

Article

Nov 2006

As resource management becomes a hot research topic in Grid computing, current research focuses on handling the heterogeneity of the grid environment, but research on making resource management more efficient while delivering seamless QoS (quality of service) is scarce. In addition, current research on Grid QoS focuses on importing related results on QoS from multimedia networks to support Grid QoS. To address this, a hierarchical structure of grid QoS is proposed in this paper. QoS parameters are newly classified into five categories, and they can be measured at the VO (virtual organization) layer. Then, using SNAP (Service Negotiation and Acquisition Protocol), the paper analyzes QoS parameter mapping and conversion based on the hierarchical structure model. Finally, the Grid QoS research is applied to scheduling heuristics to improve the Min-Min algorithm. Simulation results show that QoS-based resource management can effectively improve grid resource utilization and the service request success rate in a dynamic service-oriented grid.
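
The Min-Min heuristic mentioned above can be sketched as follows, together with one simple way of making it QoS-aware: a task may only be mapped to machines whose offered QoS level meets the task's requirement. This filter is an assumption for illustration only; the abstract does not say exactly how the authors modified Min-Min, and the integer QoS levels and the expected-time-to-compute (ETC) table are hypothetical inputs.

```python
from typing import Dict, Optional, Tuple


def qos_min_min(
    etc: Dict[str, Dict[str, float]],  # etc[task][machine]: expected execution time
    task_qos: Dict[str, int],          # minimum QoS level each task requires (hypothetical scale)
    machine_qos: Dict[str, int],       # QoS level each machine offers
) -> Dict[str, Tuple[str, float]]:
    """Map each task to (machine, completion time) with Min-Min over QoS-feasible machines."""
    ready = {m: 0.0 for m in machine_qos}  # time at which each machine becomes free
    unscheduled = set(etc)
    schedule: Dict[str, Tuple[str, float]] = {}
    while unscheduled:
        best: Optional[Tuple[float, str, str]] = None  # (completion time, task, machine)
        for task in sorted(unscheduled):
            feasible = [m for m in etc[task] if machine_qos[m] >= task_qos[task]]
            if not feasible:
                raise ValueError(f"no machine meets the QoS requirement of {task}")
            # Phase 1: the machine giving this task its minimum completion time.
            machine = min(feasible, key=lambda m: ready[m] + etc[task][m])
            completion = ready[machine] + etc[task][machine]
            # Phase 2: keep the task whose minimum completion time is smallest overall.
            if best is None or completion < best[0]:
                best = (completion, task, machine)
        completion, task, machine = best
        schedule[task] = (machine, completion)
        ready[machine] = completion
        unscheduled.remove(task)
    return schedule
```

Filtering before the two Min-Min phases leaves the heuristic otherwise unchanged; a task with no feasible machine is reported as an error rather than silently placed on a lower-QoS machine.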

Grid Economics and Business Models, 4th International Workshop, GECON 2007, Rennes, France, August 28, 2007, Proceedings

Conference Paper

Jan 2007

SOG: A Self-Organized Grouping Infrastructure for Grid Resource Discovery

Article

Anand Padmanabhan

Template-Based Automated Service Provisioning – Supporting the Agreement-Driven Service Life-Cycle

Conference Paper

Dec 2005
Lect Notes Comput Sci

Service Level Agreements (SLAs) are a vital instrument in service-oriented architectures to reserve service capacity at a defined service quality level. Provisioning systems enable service managers to automatically configure resources such as servers, storage, and routers based on a configuration specification. Hence, agreement provisioning is a vital step in managing the life-cycle of agreement-driven services. Deriving detailed resource quantities from arbitrary SLA specifications is a difficult task and requires detailed models of the algorithmic behavior of service implementations and of the capacity of a – potentially heterogeneous – resource environment, which are typically not available today. However, if we look at, e.g., data centers today, system administrators often know the quality-of-service properties of known system configurations and modifications thereof and can write the corresponding provisioning specifications. This paper proposes an approach that leverages the knowledge of existing data center configurations, defines templates of provisioning specifications, and defines rules on how to fill these templates based on an SLA specification. The approach is agnostic to the specific SLA language and provisioning specification format used, as long as they are based on XML.
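
A minimal sketch of the template idea, assuming a made-up XML SLA format and two hand-written rules: the element names, the template, and the per-server throughput figure are all hypothetical, and the paper's approach is explicitly independent of any particular SLA language or provisioning format. What the sketch shows is the shape of the process: read service level objectives from the SLA, apply rules, and fill the placeholders of a known-good provisioning template.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical SLA document; element and attribute names are illustrative only.
SLA_XML = """<sla>
  <objective name="responseTimeMs" value="200"/>
  <objective name="requestsPerSecond" value="450"/>
</sla>"""

# Hypothetical provisioning template whose placeholders the rules fill in.
PROVISIONING_TEMPLATE = Template(
    "<provision><servers>$servers</servers><tier>$tier</tier></provision>"
)


def fill_template(sla_xml: str, per_server_rps: int = 100) -> str:
    """Apply two assumed rules to the SLA objectives and fill the provisioning template."""
    root = ET.fromstring(sla_xml)
    objectives = {o.get("name"): float(o.get("value")) for o in root.iter("objective")}
    # Rule 1 (assumed): enough servers to carry the agreed throughput, rounded up.
    servers = -(-int(objectives["requestsPerSecond"]) // per_server_rps)
    # Rule 2 (assumed): tight response-time objectives map to a faster hardware tier.
    tier = "fast" if objectives["responseTimeMs"] <= 250 else "standard"
    return PROVISIONING_TEMPLATE.substitute(servers=servers, tier=tier)
```

For the sample document, `fill_template(SLA_XML)` yields a specification with 5 servers on the "fast" tier; in practice the rules would encode what administrators already know about existing data center configurations.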

A Novel Trust Model Based on Bayesian Network for Service-Oriented Grid

Conference Paper

Jul 2009

With the convergence of grid computing and Web services, grid has extended its territory from the traditional computing grid to the service-oriented grid, which aims to realize coordinated resource sharing and problem solving through service selection and composition. Therefore, selecting credible services for applications becomes a key issue in the grid environment. Current research on trust inherits the concept of trust from P2P networks, which is too coarse-grained and subjective to satisfy the requirements of the grid environment. In this paper, a novel trust model that considers the user's QoS constraints is proposed. Two new concepts, namely trustworthiness of service and satisfactoriness of service, are introduced into the proposed trust model; they describe the resources' capabilities and the users' satisfaction, respectively. The proposed trust model applies a Bayesian Network to evaluate services' trustworthiness, taking into account users' multiple QoS metrics. Simulation results show that trust is established more efficiently. Also, when the proposed trust model is used for service selection, experimental results indicate that it outperforms other models in terms of query success rate and user satisfaction.

Advance reservations for predictive service in the Internet

Article

May 1997

We extend a measurement-based admission control algorithm suggested for predictive service to provide advance reservations for guaranteed and predictive service, while retaining the attractive features of predictive service. The admission decision for advance reservations is based on information about flows that overlap in time. For flows that have not yet started, the requested values are used, and for those that have already started, measurements are used. This allows us to estimate the network load accurately for the near future. To provide advance reservations, we ask users to include durations in their requests. We present simulation results to show that predictive service with advance reservations provides utilization levels significantly higher than those for guaranteed service, and comparable to those for predictive service without advance reservations. Those utilization levels are reached without any preemption of other admitted flows. Finally, we discuss how to set up advance reservations over multiple hops in the Internet using resource reservation setup protocols.
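
The admission test described above can be sketched for a single link: load is summed over flows that overlap the requested interval, using the measured rate for flows that have already started and the requested rate for the rest. This is a simplification under stated assumptions: one link, a fixed capacity, and a single measured rate per flow instead of the paper's measurement-based load estimator; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flow:
    start: float                           # reservation start time
    end: float                             # users must state a duration, so every flow ends
    requested_rate: float                  # rate the user asked for
    measured_rate: Optional[float] = None  # filled in once the flow is running


def load_at(flows: List[Flow], t: float, now: float) -> float:
    """Estimated link load at time t: measured rate for started flows, requested otherwise."""
    total = 0.0
    for f in flows:
        if f.start <= t < f.end:
            started = f.start <= now and f.measured_rate is not None
            total += f.measured_rate if started else f.requested_rate
    return total


def admit(flows: List[Flow], new: Flow, capacity: float, now: float) -> bool:
    """Admit `new` only if estimated load stays within capacity over its whole interval."""
    # Load only increases at flow start times, so its maximum over [new.start, new.end)
    # is reached at new.start or at some existing flow's start inside that interval.
    checkpoints = {new.start} | {f.start for f in flows if new.start < f.start < new.end}
    return all(load_at(flows, t, now) + new.requested_rate <= capacity for t in checkpoints)
```

Because durations are declared, every flow has an end time, which is what makes the overlap check possible. Using a measured rate below the requested one can let a request through that would be rejected if only requested rates were counted, which is the intuition behind the higher utilization reported above.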

The Anatomy of the Grid: Enabling Scalable Virtual Organizations

Conference Paper

Aug 2001
INT J HIGH PERFORM C

"Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources-what we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy, and we discuss the central role played by the intergrid protocols that enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.

CPU service classes for multimedia applications

Conference Paper

Aug 1999

We present the design, implementation, and experimental results of our soft real-time (SRT) system for multimedia applications on top of a general-purpose UNIX environment. The SRT system supports multiple CPU service classes for real-time processes based on their processor usage patterns, including the periodic constant processing time (PCPT) class and the periodic variable processing time (PVPT) class. It also provides the following features: reservation and processing time guarantees for the service classes; overrun protection and a scheduling algorithm; and system-initiated adaptation strategies. Another unique feature of the SRT system is its easy portability to any operating system with real-time extensions, because it is implemented purely in user space without any modifications to the kernel. We have implemented the SRT system on the Solaris 2.6 operating system with a scheduling overhead under 400 µs and with good performance guarantees.

Application Experiences with the Globus Toolkit

Conference Paper

Aug 1998

The development of applications and tools for high-performance “computational grids” is complicated by the heterogeneity and frequently dynamic behavior of the underlying resources; by the complexity of the applications themselves, which often combine aspects of supercomputing and distributed computing; and by the need to achieve high levels of performance. The Globus toolkit has been developed with the goal of simplifying this application development task, by providing implementations of various core services deemed essential for high-performance distributed computing. In this paper, we describe two large applications developed with this toolkit: a distributed interactive simulation and a teleimmersion system. We describe the process used to develop the applications, review the lessons learned and draw conclusions regarding the effectiveness of the toolkit approach.

Implementing distributed synthetic forces simulations in metacomputing environments

Conference Paper

Apr 1998

A distributed, parallel implementation of the widely used Modular Semi-Automated Forces (ModSAF) Distributed Interactive Simulation (DIS) is presented, with scalable parallel processors (SPPs) used to simulate more than 50,000 individual vehicles. The single-SPP code is portable and has been used on a variety of different SPP architectures for simulations with up to 15,000 vehicles. A general metacomputing framework for DIS on multiple SPPs is discussed and results are presented for an initial system using explicit Gateway processes to manage communications among the SPPs. These 50K-vehicle simulations utilized 1,904 processors at six sites across seven time zones, including platforms from three manufacturers. Ongoing activities to both simplify and enhance the metacomputing system using Globus are described.

Structuring communication software for quality-of-service guarantees

Article

Nov 1997

We propose architectural mechanisms for structuring host communication software to provide QoS guarantees. We present and evaluate a QoS-sensitive communication subsystem architecture for end hosts that provides real-time communication support for generic network hardware. This architecture provides services for managing communication resources for guaranteed-QoS (real-time) connections, such as admission control, traffic enforcement, buffer management, and CPU and link scheduling. The architecture design is based on three key goals: maintenance of QoS guarantees on a per-connection basis, overload protection between established connections, and fairness in delivered performance to best-effort traffic. Using this architecture we implement real-time channels, a paradigm for real-time communication services in packet-switched networks. The proposed architecture features a process-per-channel model that associates a channel handler with each established channel. The model employed for handler execution is one of “cooperative” preemption, where an executing handler yields the CPU to a waiting higher-priority handler at well-defined preemption points. The architecture provides several configurable policies for protocol processing and overload protection. We present extensions to the admission control procedure for real-time channels to account for cooperative preemption and overlap between protocol processing and link transmission at a sending host. We evaluate the implementation to demonstrate the efficacy with which the architecture maintains QoS guarantees on outgoing traffic while adhering to the stated design goals.

DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems

Article

Apr 2000

In this paper we present a middleware infrastructure, called DataCutter, that enables processing of scientific datasets stored in archival storage systems across a wide-area network. DataCutter provides support for subsetting of datasets through multidimensional range queries, and application-specific aggregation on scientific datasets stored in an archival storage system. We also present experimental results from a prototype implementation.

1 Introduction

Increasingly powerful computers have made it possible for computational scientists and engineers to model physical phenomena in great detail. As a result, overwhelming amounts of data are being generated by scientific and engineering simulations. In addition, large amounts of data are being gathered by sensors of various sorts, attached to devices such as satellites and microscopes. The primary goal of generating data through large scale simulations or sensors is to better understand the causes and effects of physical phenomena...

QoS-Aware Resource Management for Distributed Multimedia Applications

Article

Feb 2001

The ability of the operating system and network infrastructure to provide end-to-end quality of service (QoS) guarantees in multimedia is a major acceptance factor for various distributed multimedia applications, due to the temporal audio-visual and sensory information in these applications. Our constraints on the end-to-end guarantees are (1) QoS should be achieved on a general-purpose platform with real-time extension support, and (2) QoS should be application-controllable. In order to achieve the users' acceptance requirements and to satisfy our constraints on the multimedia systems, we need a QoS-compliant resource management which supports QoS negotiation, admission and reservation mechanisms in an integrated and accessible way. In this paper we present a new resource model and a time-variant QoS management, which are the major components of the QoS-compliant resource management. The resource model incorporates the resource scheduler and a new component, the resource broker, which provides negotiation, admission and reservation capabilities for sharing resources such as CPU, network or memory corresponding to requested QoS. The resource brokers are intermediary resource managers; when combined with the resource schedulers, they provide a more predictable and finer-grained control of resources to the applications during end-to-end multimedia communication than what is available in current general-purpose networked systems. Furthermore, this paper presents the QoS-aware resource management model called QualMan, as a loadable middleware, its design, implementation, results, tradeoffs, and experiences. There are tradeoffs when comparing our QualMan QoS-aware resource management in middleware and other QoS-supporting resource management solutions...

The Anatomy of the Grid - Enabling Scalable Virtual Organizations

Article

Mar 2001

Co-allocation services for computational grids

Article

Jan 1999

AAA Authorization Application Examples

Article

Aug 2000

Real-time analysis, visualization, and steering of microtomography experiments at photon sources

Conference Paper

Feb 2000

A new generation of specialized scientific instruments called synchrotron light sources allows the imaging of materials at very fine scales. However, in contrast to a traditional microscope, interactive use has not previously been possible because of the large amounts of data generated and the considerable computation required to translate this data into a useful image. The authors describe a new software architecture that uses high-speed networks and supercomputers to enable quasi-real-time and hence interactive analysis of synchrotron light source data. This architecture uses technologies provided by the Globus computational grid toolkit to allow dynamic creation of a reconstruction pipeline that transfers data from a synchrotron source beamline to a preprocessing station, next to a parallel reconstruction system, and then to multiple visualization stations. Collaborative analysis tools allow multiple users to control data visualization. As a result, local and remote scientists can see and discuss preliminary results just minutes after data collection starts. The implications for more efficient use of this scarce resource and for more effective science appear tremendous.

Network Quality of Service

Chapter

Jan 1998

The COPS (common open policy service) protocol

Article

Jan 2000

This document describes a simple client/server model for supporting policy control over QoS signaling protocols. The model does not make any assumptions about the methods of the policy server, but is based on the server returning decisions to policy requests. The model is designed to be extensible so that other kinds of policy clients may be supported in the future. However, this document makes no claims that it is the only or the preferred approach for enforcing future types of policies.

Distributed Advance reservation of Real-time Connections

Article

May 1997

The ability to reserve real-time connections in advance is essential in all distributed multiparty applications (i.e., applications involving multiple human beings) using a network that controls admissions to provide good quality of service. This paper discusses the requirements of the clients of an advance reservation service, and a distributed design for such a service. The design is described within the context of the Tenet Real-Time Protocol Suite 2, a suite being developed for multiparty communication, which will offer advance reservation capabilities to its clients, based on the principles and the mechanisms proposed in the paper. Simulation results providing useful data about the performance and some of the properties of these mechanisms are also presented. We conclude that the one described here is a viable approach to constructing an advance reservation service within the context of the Tenet Suites as well as that of other solutions to the multiparty real-time communication problem.

A quality of service negotiation approach with future reservations (NAFUR): a detailed study

Article

Feb 1970
COMPUT NETW

Distributed multimedia (MM) applications such as video-on-demand and teleconferencing provide services with different quality of service (QoS) requirements. Hence, the user should be able to negotiate the desired QoS depending on his/her needs, the end-system characteristics and his/her financial capacity. In response to a service request with the desired QoS, most QoS negotiation approaches return an acceptance or a simple rejection of the request. More specifically, they provide the user only with the QoS that can be supported at the time the request is made and assume that the service is requested for an indefinite duration. This paper describes work on a new QoS negotiation approach with future reservations (NAFUR) that decouples the starting time of the service from the time the service request is made and requires that the duration of the requested service be specified. NAFUR makes it possible to compute the QoS that can be supported at the time the service request is made and at carefully chosen later times. For example, if the requested QoS cannot be supported at the time the service request is made, the proposed approach can compute the earliest time at which the user can start the service with the desired QoS. NAFUR will help to increase (a) the flexibility of the system, by providing the user with more choices, and (b) the system resource utilization and the availability of the system, by encouraging the sharing of resources, e.g. multicast for video-on-demand systems. Furthermore, it provides the flexibility to incorporate (a) a range of resource reservation schemes and scheduling policies, and (b) a range of new system component technologies.
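
One piece of NAFUR, computing the earliest time at which the desired QoS can start, can be sketched for a single resource with a bandwidth capacity and a list of existing reservations. The (start, end, bandwidth) tuple representation and the function names are hypothetical; the sketch relies on the same requirement as NAFUR that every request states its duration. Free capacity only grows when a reservation ends, so the earliest feasible start is either the request time or the end of some existing reservation.

```python
from typing import List, Optional, Tuple

# (start, end, bandwidth) of an existing reservation; hypothetical representation.
Reservation = Tuple[float, float, float]


def peak_usage(reservations: List[Reservation], start: float, end: float) -> float:
    """Highest reserved bandwidth at any instant in [start, end)."""
    # Usage only rises at reservation start times, so checking `start` and every
    # reservation start inside the interval is enough to find the maximum.
    points = [start] + [s for (s, _, _) in reservations if start < s < end]
    return max(sum(bw for (s, e, bw) in reservations if s <= t < e) for t in points)


def earliest_start(
    reservations: List[Reservation],
    capacity: float,
    bandwidth: float,
    duration: float,
    request_time: float,
) -> Optional[float]:
    """Earliest t >= request_time at which `bandwidth` fits for [t, t + duration), else None."""
    # Candidate starts: the request time itself and every reservation end after it.
    candidates = sorted({request_time} | {e for (_, e, _) in reservations if e > request_time})
    for t in candidates:
        if peak_usage(reservations, t, t + duration) + bandwidth <= capacity:
            return t
    return None
```

Returning `None` covers the case where the requested bandwidth exceeds the capacity outright; a fuller negotiation would instead offer the user a reduced QoS, which NAFUR also considers but this sketch does not.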

A Dynamic Light-Weight Group Service

Article

Dec 2000

The virtual synchrony model for group communication is a powerful paradigm for building distributed applications. Implementations of virtual synchrony usually use failure detectors and failure recovery protocols. In applications that require a large number of groups, significant performance gains can be attained if these groups share the resources required to provide virtual synchrony. A service that maps multiple user groups onto a small number of instances of a virtually synchronous implementation is called a light-weight group service. This paper describes a new design for the light-weight group protocols that enables such service to function transparently. We discuss how these protocols can be applied in dynamic environments, where group mappings cannot be defined a priori and may change over time. We show that it is possible to establish mappings that promote resource sharing and, at the same time, minimize interference. These mappings can be established in an automated manner, using heuristics applied locally at each node. Experiments using an implementation in the Horus system show that significant performance improvements can be achieved with this approach.

End-to-end quality of service for high-end applications

Article

Jun 2000
COMPUT COMMUN

High-end networked applications such as distance visualization, distributed data analysis, and advanced collaborative environments have demanding quality of service (QoS) requirements. Particular challenges include concurrent flows with different QoS specifications, high-bandwidth flows, application-level monitoring and control, and end-to-end QoS across networks and other devices. We describe a QoS architecture and implementation that together help to address these challenges. The General-purpose Architecture for Reservation and Allocation (GARA) supports flow-specific QoS specification, immediate and advance reservation, and online monitoring and control of both individual resources and heterogeneous resource ensembles. Mechanisms provided by the Globus Toolkit are used to address resource discovery and security issues when resources span multiple administrative domains. Our prototype GARA implementation builds on differentiated services mechanisms to enable the coordinated management of two distinct flow types—foreground media flows and background bulk transfers—as well as the co-reservation of networks, CPUs, and storage systems. We present results obtained on a wide area differentiated services testbed that demonstrate our ability to deliver QoS for realistic flows.

Real-time Analysis, Visualization, and Steering of Microtomography Experiments at Photon Sources

Conference Paper

Jan 1999

A Resource Management Architecture for Metacomputing Systems.

Conference Paper

Jan 1998

Web services description language (WSDL) 1.1

Article

Mar 2001

XML schema part 0: Primer second edition

Article

Jan 2004

XML Schema Part 0: Primer is a non-normative document intended to provide an easily readable description of the XML Schema facilities, and is oriented towards quickly understanding how to create schemas using the XML Schema language. XML Schema Part 1: Structures and XML Schema Part 2: Datatypes provide the complete normative description of the XML Schema language. This primer describes the language features through numerous examples which are complemented by extensive references to the normative texts.

End-to-end provision of policy information for network QoS

Conference Paper

Feb 2001

High-end networked applications such as distance visualization, distributed data analysis, and advanced collaborative environments have demanding quality of service (QoS) requirements. We focus on making policy decisions when users attempt to make reservations for network bandwidth across several administrative network domains that are controlled by a bandwidth broker. We present a signalling protocol that facilitates the establishment of a distributed policy decision point as well as the establishment of a direct signalling channel between the source and end domains.

Grid Information Services for Distributed Resource Sharing

Conference Paper

Feb 2001

Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or institutions: what are sometimes called virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations are challenging problems due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Consequently, information services are a vital part of any Grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and hence for planning and adapting application behavior. We present an information services architecture that addresses performance, security, scalability, and robustness requirements. Our architecture defines simple low-level enquiry and registration protocols that make it easy to incorporate individual entities into various information structures, such as aggregate directories that support a variety of different query languages and discovery strategies. These protocols can also be combined with other Grid protocols to construct additional higher-level services and capabilities such as brokering, monitoring, fault detection, and troubleshooting. Our architecture has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.

Practical resource management for grid-based visual exploration

Conference Paper

Feb 2001

Computational grids are enabling collaboration between scientists and organizations to generate and archive extremely large datasets across shared, distributed resources. There is a need to visually explore such data throughout the life-cycle of projects. Practical exploration of large datasets requires visualization tools that can function in the same grid environment in which the data is created and stored. Resource management interfaces are an important structural component of grid computing environments because they enable uniform access to the wide variety of resources necessary for scientific work. We describe a new advance-reservation system for graphics resources and an application of existing grid technology to create general-purpose active storage systems. We report our experience with prototype infrastructure and application components, involving experiments coupling end-to-end resources for interactive visual exploration of large data in representative distributed environments.

Resource management through multilateral matchmaking

Conference Paper

Feb 2000

Federated distributed systems present new challenges to resource management, which cannot be met by conventional systems that employ relatively static resource models and centralized allocators. We previously argued that matchmaking provides an elegant and robust resource management solution for these highly dynamic environments (R. Raman et al., 1998). Although powerful and flexible, matchmaking cannot accommodate multiparty policies (e.g., co-allocation). The authors present Gang-Matching, a multilateral matchmaking formalism to address this deficiency.

Managing network resources in Condor

Conference Paper

Feb 2000

Data-intensive applications in the Condor high-throughput computing (HTC) environment can place heavy demands on network resources for checkpointing and remote data access. We have developed mechanisms to monitor, control and schedule network usage in Condor. By managing network resources, these mechanisms provide administrative control over Condor's network usage and improve the execution efficiency of Condor applications.

Matchmaking: Distributed resource management for high throughput computing

Conference Paper

Aug 1998

Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high throughput computing. Obstacles include heterogeneity of resources, which makes uniform allocation algorithms difficult to formulate, and distributed ownership, leading to widely varying allocation policies. Faced with these problems, we developed and implemented the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources. Novel aspects of the framework include a semi-structured data model that combines schema, data, and query in a simple but powerful specification language, and a clean separation of the matching and claiming phases of resource allocation. The representation and protocols result in a robust, scalable and flexible framework that can evolve with changing resources. The framework was designed to solve real problems encountered in the deployment of Condor, a high throughput computing system developed at the University of Wisconsin-Madison. Condor is heavily used by scientists at numerous sites around the world. It derives much of its robustness and efficiency from the matchmaking architecture.
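The matching phase described above can be illustrated with a toy sketch of bilateral matchmaking. This is not the ClassAd language or Condor's implementation: real classads use their own expression language, whereas here Python dictionaries stand in for ads and Python callables stand in for the `Requirements` and `Rank` expressions. The names `matches`, `best_match`, `job`, and `machines` are hypothetical. A match requires that each side's requirements accept the other side, which is how both the job owner's and the resource owner's policies are respected.

```python
from typing import Any, Dict, List, Optional

# A hypothetical stand-in for a classad: plain attributes plus a "Requirements"
# predicate and an optional "Rank" function, each called as f(self_ad, other_ad).
Ad = Dict[str, Any]


def matches(a: Ad, b: Ad) -> bool:
    """Bilateral match: each ad's Requirements must accept the other ad."""
    return bool(a["Requirements"](a, b)) and bool(b["Requirements"](b, a))


def best_match(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    """Matching phase only: pick the compatible offer the request ranks highest.

    Claiming (the two parties contacting each other to confirm) would happen
    afterwards and is not modeled here.
    """
    compatible = [o for o in offers if matches(request, o)]
    if not compatible:
        return None
    rank = request.get("Rank", lambda me, other: 0)
    return max(compatible, key=lambda o: rank(request, o))


job: Ad = {
    "Owner": "alice",
    "ImageSize": 512,
    "Requirements": lambda me, other: other["Arch"] == "x86_64"
                    and other["Memory"] >= me["ImageSize"],
    "Rank": lambda me, other: other["Memory"],
}

machines: List[Ad] = [
    {"Name": "m1", "Arch": "x86_64", "Memory": 256,
     "Requirements": lambda me, other: True},
    {"Name": "m2", "Arch": "x86_64", "Memory": 2048,
     "Requirements": lambda me, other: other["Owner"] != "mallory"},
    {"Name": "m3", "Arch": "x86_64", "Memory": 4096,
     "Requirements": lambda me, other: other["Owner"] == "bob"},
]

if __name__ == "__main__":
    # m1 has too little memory for the job; m3's owner policy rejects alice.
    print(best_match(job, machines)["Name"])  # m2
```

Note that `m3` has the most memory and would be ranked highest by the job, yet it is excluded because its owner's policy rejects the job: this symmetry is what distinguishes matchmaking from a scheduler that only checks the job's constraints.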

A fault detection service for wide area distributed computations

Conference Paper

Aug 1998

The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and correcting faults, the implementation of these techniques in a particular context can be difficult. Hence, we propose a fault detection service designed to be incorporated, in a modular fashion, into distributed computing systems, tools, or applications. This service uses well-known techniques based on unreliable fault detectors to detect and report component failure, while allowing the user to trade off timeliness of reporting against false positive rates. We describe the architecture of this service, report on experimental results that quantify its cost and accuracy, and describe its use in two applications: monitoring the status of system components of the GUSTO computational grid testbed, and as part of the NetSolve network-enabled numerical solver.
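The timeliness versus false-positive trade-off mentioned above can be seen in a minimal heartbeat-timeout detector. This is a hypothetical illustration of the general unreliable-failure-detector technique, not the service's actual protocol or API; `HeartbeatDetector`, `heartbeat`, and `suspected` are illustrative names, and timestamps are passed in explicitly instead of read from a clock so the behavior is deterministic.

```python
from typing import Dict, List


class HeartbeatDetector:
    """Timeout-based, unreliable failure detector (illustrative sketch).

    Components send heartbeats; a component is *suspected* once no heartbeat
    has arrived for longer than `timeout`. Suspicion can be wrong (a slow
    component is not a failed one) and is withdrawn if a heartbeat arrives later.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, component: str, now: float) -> None:
        """Record that `component` was heard from at time `now`."""
        self.last_seen[component] = now

    def suspected(self, now: float) -> List[str]:
        """Components silent for longer than the timeout, in sorted order."""
        return sorted(c for c, t in self.last_seen.items()
                      if now - t > self.timeout)


if __name__ == "__main__":
    for timeout in (3.0, 10.0):
        d = HeartbeatDetector(timeout)
        d.heartbeat("a", 0.0)
        d.heartbeat("b", 0.0)
        d.heartbeat("a", 4.0)  # "b" stays silent after t = 0
        print(timeout, d.suspected(now=5.0))
    # 3.0 ['b']   (short timeout: reports quickly, but more false positives)
    # 10.0 []     (long timeout: fewer false positives, slower reporting)
```

The single `timeout` parameter is the knob the abstract refers to: shrinking it reports real failures sooner but also flags components that are merely slow.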

The Globus Project: A Status Report

Conference Paper

Apr 1998

The Globus project is a multi-institutional research effort that seeks to enable the construction of computational grids providing pervasive, dependable, and consistent access to high-performance computational resources, despite geographical distribution of both resources and users. Computational grid technology is being viewed as a critical element of future high-performance computing environments that will enable entirely new classes of computation-oriented applications, much as the World Wide Web fostered the development of new classes of information-oriented applications. The authors report on the status of the Globus project as of early 1998. They describe the progress that has been achieved to date in the development of the Globus toolkit, a set of core services for constructing grid tools and applications. They also discuss the Globus Ubiquitous Supercomputing Testbed (GUSTO) that they have constructed to enable large-scale evaluation of Globus technologies, and they review early experiences with the development of large-scale grid applications on the GUSTO testbed.

Structuring communication software for quality-of-service guarantees

Conference Paper

Jan 1997

We propose architectural mechanisms for structuring host communication software to provide QoS guarantees. In particular, we present and evaluate a QoS-sensitive communication subsystem architecture for end hosts that provides real-time communication support for generic network hardware. This architecture provides services for managing communication resources for guaranteed-QoS (real-time) connections, such as admission control, traffic enforcement, buffer management, and CPU and link scheduling. The design of the architecture is based on three key goals: maintenance of QoS guarantees on a per-connection basis, overload protection between established connections, and fairness in delivered performance to best-effort traffic. Using this architecture, we implement real-time channels, a paradigm for real-time communication services in packet-switched networks. We evaluate the implementation to demonstrate the efficacy with which the architecture maintains QoS guarantees while adhering to the stated design goals. The evaluation also demonstrates the need for specific features and policies provided in the architecture.

Modeling TCP Behavior in a Differentiated Services Network

Article

Mar 2001

The differentiated services architecture has been proposed for providing different levels of services and has received wide attention. A packet in a diff-serv domain is classified into a class of service according to its contract profile and treated differently by its class. While many studies have addressed issues on the diff-serv architecture (e.g., dropper, marker, classifier and shaper), there have been few attempts to analytically understand a flow's behavior in a diff-serv network. We propose simple models of TCP behavior in a diff-serv network. Our models quantitatively characterize TCP throughput as functions of the contract rate, the packet-drop rate and the round-trip time in either a two-drop-precedence or a three-drop-precedence network. We also extend our models to aggregated flows. The models are validated through a number of simulations.

Internet2 QBone: Building a testbed for differentiated services

Article

Oct 1999

The Internet2 project is a partnership of over 130 U.S. universities, 40 corporations, and 30 other organizations. Since its inception, one of the primary technical objectives of Internet2 has been to engineer scalable, interoperable, and administrable interdomain QoS to support an evolving set of new advanced networked applications. Applications like distance learning, remote instrument access and control, advanced scientific visualization, and networked collaboratories will allow universities to fulfill their research and education missions into the future, but only if the network QoS these applications require can be ensured. To meet this challenge, the Internet2 QBone initiative has brought together a dedicated group of U.S. university and federal agency networks, international research networks, engineers, researchers, and applications developers to build a testbed for interdomain IP differentiated services. This article presents the engineering motivations behind DiffServ and its adoption by Internet2, provides an overview of the QBone architecture, and describes its anticipated deployment, including plans for a trial inter-domain bandwidth brokering architecture. Security aspects are considered together with an inter-bandwidth broker reservation signaling protocol.

Design, Implementation and Experiences of the OMEGA End-Point Architecture

Article

Oct 1996

New cell-switched network technologies and multimedia peripherals enable distributed applications with strict real-time requirements such as remote control with feedback. Time-bounded network communications services are necessary, but not sufficient, to meet application-to-application real-time requirements. Real-time communication must be coupled with real-time computing support at the network end-points. An end-point architecture for the computation/communications coupling must be flexible and robust to support a diversity of applications. The OMEGA architecture, when coupled with cell-switched networks (or others which can make bandwidth and delay guarantees), can approximate the behavior of dedicated microcontrollers connected by dedicated circuits in support of an application. The essence of the OMEGA architecture is resource reservation and management within the set of multimedia endpoints. Communication is preceded by a call set-up period where requirements, expressed in terms of Quality of Service (QoS) parameters, are negotiated, and guarantees are made at several logical levels, such as between applications and the network subsystem, applications and the operating system, and the network subsystem and the operating system. This establishes customized connections and allocation of resources appropriate to the application requirements and OS/network capabilities. To facilitate this resource management process, a new paradigm called the 'QoS Brokerage' is used. This paradigm requires new services and protocols across all layers of the protocol stack (i.e., the higher layers of B-ISDN), as well as re-architecting the application/network interface. A prototype of OMEGA has been implemented and tested with a master/slave telerobotics application using a dedicated 155 Mbps ATM LAN.
This application employs media with highly diverse QoS requirements and therefore provides a good platform for testing how closely one can approximate a dedicated circuit and controller with workstation hosts and cell-switching. Experience with this implementation has helped to identify new challenges to extending these techniques to a larger domain of applications and systems, and raises several new research questions.

Concepts for Resource Reservation in Advance

Article

Feb 1997

Resource management offers Quality-of-Service reliability for time-critical continuous-media applications. Currently, existing resource management systems in the Internet and ATM domain only provide means to reserve resources starting with the reservation attempt and lasting for an unspecified duration. However, for several applications such as video conferencing, the ability to reserve the required resources in advance is of great advantage. This paper outlines a new model for resource reservation in advance. We identify and discuss issues to be resolved for allowing resource reservation in advance. We show how the resource reservation in advance scheme can be embedded in a general architecture and describe the design and implementation of a resource management system providing reservation in advance functionality. 1 Introduction Computer systems used for continuous-media processing must cope with streams having data rates of several Mbits/s and must provide timely processing guaran...

A Community Authorization Service for Group Collaboration

Article

Aug 2002

In "Grids" and "collaboratories," we find distributed communities of resource providers and resource consumers, within which often complex and dynamic policies govern who can use which resources for which purpose. We propose a new approach to the representation, maintenance, and enforcement of such policies that provides a scalable mechanism for specifying and enforcing these policies. Our approach allows resource providers to delegate some of the authority for maintaining fine-grained access control policies to communities, while still maintaining ultimate control over their resources. We also describe a prototype implementation of this approach and an application in a data management context.

Grid Service Specification

Article

Mar 2002

Building on both Grid and Web services technologies, the Open Grid Services Architecture (OGSA) defines mechanisms for creating, managing, and exchanging information among entities called Grid services. Succinctly, a Grid service is a Web service that conforms to a set of conventions (interfaces and behaviors) that define how a client interacts with a Grid service. These conventions, and other OGSA mechanisms associated with Grid service creation and discovery, provide for the controlled, fault resilient, and secure management of the distributed and often long-lived state that is commonly required in advanced distributed applications. In a separate document, we have presented in detail the motivation, requirements, structure, and applications that underlie OGSA. Here we focus on technical details, providing a full specification of the behaviors and Web Services Description Language (WSDL) interfaces that define a Grid service.

The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration

Article

Jul 2002

In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic "virtual organizations" formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. We present an Open Grid Services Architecture that addresses these challenges. Building on concepts and technologies from the Grid and Web services communities, this architecture defines a uniform exposed service semantics (the Grid service); defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities. The Open Grid Services Architecture also defines, in terms of Web Services Description Language (WSDL) interfaces and associated conventions, mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification. Service bindings can support reliable invocation, authentication, authorization, and delegation, if required. Our presentation complements an earlier foundational article, "The Anatomy of the Grid," by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how our architecture can be applied within commercial computing as a basis for distributed system integration, within and across organizational domains.

The Globus Project: A Status Report

Article

Sep 1999
FUTURE GENER COMP SY

The Globus project is a multi-institutional research effort that seeks to enable the construction of computational grids providing pervasive, dependable, and consistent access to high-performance computational resources, despite geographical distribution of both resources and users. Computational grid technology is being viewed as a critical element of future high-performance computing environments that will enable entirely new classes of computation-oriented applications, much as the World Wide Web fostered the development of new classes of information-oriented applications. In this paper, we report on the status of the Globus project as of early 1998. We describe the progress that has been achieved to date in the development of the Globus toolkit, a set of core services for constructing grid tools and applications. We also discuss the Globus Ubiquitous Supercomputing Testbed (GUSTO) that we have constructed to enable large-scale evaluation of Globus technologies, and review early exp...

A Quality of Service Architecture that Combines Resource Reservation and Application Adaptation

Article

Jul 2000

Reservation and adaptation are two well-known and effective techniques for enhancing the end-to-end performance of network applications. However, both techniques also have limitations, particularly when dealing with high-bandwidth, dynamic flows: fixed-capability reservations tend to be wasteful of resources and hinder graceful degradation in the face of congestion, while adaptive techniques fail when congestion becomes excessive. We propose an approach to quality of service (QoS) that overcomes these difficulties by combining features of reservations and adaptation. In this approach, a combination of online control interfaces for resource management, a sensor permitting online monitoring, and decision procedures embedded in resources enable a rich variety of dynamic feedback interactions between applications and resources. We describe a QoS architecture, GARA, that has been extended to support these mechanisms, and use three examples of application-level adaptive strategies to show ho...

A Security Architecture for Computational Grids

Article

Feb 2000

State-of-the-art and emerging scientific applications require fast access to large quantities of data and commensurately fast computational resources. Both resources and data are often distributed in a wide-area network with components administered locally and independently. Computations may involve hundreds of processes that must be able to acquire resources dynamically and communicate efficiently. This paper analyzes the unique security requirements of large-scale distributed (grid) computing and develops a security policy and a corresponding security architecture. An implementation of the architecture within the Globus metacomputing toolkit is discussed. 1 Introduction Large-scale distributed computing environments, or "computational grids" as they are sometimes termed [4], couple computers, storage systems, and other devices to enable advanced applications such as distributed supercomputing, teleimmersion, computer-enhanced instruments, and distributed data mining [2]. Grid applica...

A Resource Management Architecture for Metacomputing Systems

Article

1998
Lect Notes Comput Sci

Metacomputing systems are intended to support remote and/or concurrent use of geographically distributed computational resources. Resource management in such systems is complicated by five concerns that do not typically arise in other situations: site autonomy and heterogeneous substrates at the resources, and application requirements for policy extensibility, co-allocation, and online control. We describe a resource management architecture that addresses these concerns. This architecture distributes the resource management problem among distinct local manager, resource broker, and resource co-allocator components and defines an extensible resource specification language to exchange information about requirements. We describe how these techniques have been implemented in the context of the Globus metacomputing toolkit and used to implement a variety of different resource management strategies. We report on our experiences applying our techniques in a large testbed, GUSTO, incorporating ...

Managing Network Resources in Condor

Article

Jun 2000

Data-intensive applications in the Condor High Throughput Computing environment can place heavy demands on network resources for checkpointing and remote data access. We have developed mechanisms to monitor, control, and schedule network usage in Condor. By managing network resources, these mechanisms provide administrative control over Condor's network usage and improve the execution efficiency of Condor applications. 1 Introduction Until recently, the Condor research project has focused on the challenges of managing usage of CPU resources for High Throughput Computing (HTC) [4]. However, as the amount of physical memory available to HTC applications has dramatically increased, HTC environments have become an attractive platform for applications which are more data-intensive. As these applications place greater demands on the network, it has become important for Condor to manage usage of network resources in addition to CPU resources to enforce administrative network policies and to...

DataCutter: Middleware for filtering very large scientific datasets on archival storage systems

Jan 2000
119-134

M Beynon
R Ferreira
T M Kurc
A Sussman
J H Saltz

M. Beynon, R. Ferreira, T. M. Kurc, A. Sussman, and J. H. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In IEEE Symposium on Mass Storage Systems, pages 119-134, 2000.

A security architecture for computational grids

Jan 1998
83-91

I Foster
C Kesselman
G Tsudik
S Tuecke

I. Foster, C. Kesselman, G. Tsudik, and S. Tuecke. A security architecture for computational grids. In ACM Conference on Computer and Communications Security, pages 83-91. ACM Press, 1998.

Network quality of service

479-503

R Guérin
H Schulzrinne

R. Guérin and H. Schulzrinne. Network quality of service. In [16], pages 479-503.

Exploration and visualization of very large datasets with the Active Data Repository

Jan 2001

T Kurc
Ü Çatalyürek
C Chang
A Sussman
J Saltz

T. Kurc, Ümit Çatalyürek, C. Chang, A. Sussman, and J. Saltz. Exploration and visualization of very large datasets with the Active Data Repository. Technical Report CS-TR-4208, University of Maryland, 2001.

A community authorization service for group collaboration

Jun 2002

L Pearlman
V Welch
I Foster
C Kesselman
S Tuecke

L. Pearlman, V. Welch, I. Foster, C. Kesselman, and S. Tuecke. A community authorization service for group collaboration. In The IEEE 3rd International Workshop on Policies for Distributed Systems and Networks, June 2002.


Recommended publications

SNAP: A Protocol for Negotiating Service Level Agreements and Coordinating Resource