Article

Hierarchical Filtering-based Monitoring Architecture for Large-Scale Distributed Systems

... In recent years, several mechanisms have been proposed for QoS monitoring [4, 14]. In addition, based on the information offered by QoS monitoring, a lot of studies have been conducted to perform further QoS related analysis or operations [6, 7, 12]. However, most of the current QoS monitoring related mechanisms are concerned with end-to-end QoS monitoring or based on the assumption that necessary QoS information, such as QoS distribution information, can be obtained from other mechanisms. ...
... However, since the flow may cross several network segments that provide different levels of QoS, only the QoS distribution monitoring approach can further isolate the network segment(s) causing the degradation. Lastly, in addition to the functions supported in the traditional network monitoring model, the monitoring application in the QoS monitoring model may provide further QoS-related analysis and operations, such as identifying QoS problems [12], adjusting the monitoring system [6] and reconfiguring the network system [7]. ...
... However, the mechanism through which the agent could monitor the end-to-end QoS was not discussed. Ehab Al-Shaer [6] proposed an event-driven dynamic monitoring approach for multimedia networks. The task of detecting primitive and composite events is distributed among dedicated monitoring agents as in [12]. ...
Conference Paper
Future integrated services networks will need to provide quality of service (QoS) guarantees to multimedia applications. To ensure that the contracted QoS is sustained, it is not sufficient to just commit resources. QoS monitoring is required to detect and locate the degradation of QoS performance. In addition, the distribution of QoS, instead of simply end-to-end QoS, needs to be monitored. In QoS distribution monitoring, the distribution of QoS experienced by a real-time flow in different network segments is monitored. This paper presents a brief survey of current QoS monitoring-related mechanisms, followed by a discussion of the challenges involved in providing QoS distribution monitoring. Several approaches are then proposed to meet these challenges.
... In addition, the monitoring/filtering tasks in such systems are usually static and do not support defining programmable management actions. In [1], we present a survey and evaluation of a number of proposed monitoring and filtering systems. This work tries to bridge this gap by designing and developing a monitoring architecture that explicitly addresses the challenges and requirements imposed by managing large-scale distributed systems. Each component in the overall system is accounted for, from instrumentation, user subscriptions and event filtering to information dissemination and management reaction. ...
... This feature enables the consumer to control the monitoring granularity and thereby minimize its intrusiveness. In particular, consumers can subscribe to a small number of filters; these filters may, however, activate other filters when a specific event pattern is detected [1]. Therefore, the monitoring model supports dynamically activating/deactivating the appropriate monitoring operations (or filters) at the right time (event), relieving the system environment from the overhead of launching multiple filters or monitoring requests simultaneously. ...
... Figure 3 shows the formal definition of the monitor action in BNF. In [1], we present further discussion and examples of the HiFi monitoring language specification. ...
Conference Paper
With the increasing complexity of large-scale distributed (LSD) systems, an efficient monitoring mechanism has become an essential service for improving the performance and reliability of such complex applications. The paper presents a scalable, dynamic, flexible and nonintrusive monitoring architecture for managing LSD systems. This architecture, referred to as the HiFi monitoring system, detects and classifies interesting primitive and composite events and performs either a corrective or a steering action. When appropriate, information is also disseminated to management applications, such as reactive control tools. The outlined solution offers improvements over related work by supporting new monitoring techniques, such as hierarchical filtering-based monitoring and filter incarnation, that improve the monitoring scalability and dynamism required for managing LSD systems. The HiFi monitoring system has been implemented and used at Old Dominion University for monitoring and steering Interactive Remote Instruction (IRI), a large-scale distributed multimedia system for distance learning.
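The hierarchical filtering and filter incarnation techniques named in this abstract can be sketched roughly as follows. This is a minimal, illustrative Python sketch, not HiFi's actual interface: the event schema, filter names and agent API are assumptions. The key idea is that a filter can carry other filters that are only activated ("incarnated") once the first filter matches, and that matched events travel up a hierarchy of agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Filter:
    name: str
    predicate: Callable[[dict], bool]
    # Filters to "incarnate" (activate) once this filter first matches.
    incarnates: List["Filter"] = field(default_factory=list)

class FilteringAgent:
    """One node in the filtering hierarchy; matched events travel upward."""
    def __init__(self, filters, parent=None):
        self.filters = list(filters)
        self.parent = parent
        self.delivered = []  # events that reached the top of the hierarchy

    def process(self, event):
        for f in list(self.filters):      # snapshot: incarnated filters
            if f.predicate(event):        # only see *later* events
                self.filters.extend(f.incarnates)
                f.incarnates = []
                self._forward(event)

    def _forward(self, event):
        if self.parent is not None:
            self.parent.process(event)
        else:
            self.delivered.append(event)

# Usage: the leaf agent starts watching for "jitter" only after a "loss"
# event has been observed, sparing the cost of an always-on jitter filter.
jitter = Filter("jitter", lambda e: e["type"] == "jitter")
loss = Filter("loss", lambda e: e["type"] == "loss", incarnates=[jitter])
root = FilteringAgent([Filter("all", lambda e: True)])
leaf = FilteringAgent([loss], parent=root)

for ev in [{"type": "jitter"}, {"type": "loss"}, {"type": "jitter"}]:
    leaf.process(ev)
# The first jitter event is dropped; loss and the second jitter reach the root.
```

The dynamism the abstract claims shows up here as the growing `filters` list: monitoring operations are switched on by event patterns rather than being launched all at once.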
... It also provides a flexible and adjustable event reporting mechanism to facilitate the instrumentation process. A survey and evaluation of related work in monitoring and event filtering can be found in [3,4]. We classify related work on monitoring distributed systems into three classes: hardware monitoring such as [13], software monitoring such as [7,17], and hybrid monitoring such as [20]. ...
... Each MA receives its delegated monitoring tasks (subfilters) [11] and configures itself accordingly by inserting the subfilter into its internal filtering representation, such as a directed acyclic graph (DAG) [5] or a Petri net (PN) [9] (see [3,4] for more details). The LMAs and DMAs (in the monitoring agent network) work cooperatively for monitoring the target application based on the subscription requests. ...
... In this section, we present a brief description of the functionality, design and implementation of each component in the monitoring system; more details can be found in [3]. ...
Article
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. In an LDM environment, a large number of events is generated by the system components during execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LDM systems and providing status information required for managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application's environment. This complicates the management decision process, making monitoring of LDM systems an intricate task. Furthermore, different media streams in LDM systems may have different management requirements that must be considered in the monitoring architecture. In this paper, we present a scalable high-performance monitoring architecture for LDM systems using a hierarchical event filtering...
Article
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. Monitoring events generated by LDM systems is necessary for observing the runtime behavior of LDM systems and providing status information required for managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application's environment. Furthermore, different media streams in LDM systems may have different management requirements that must be considered in the monitoring architecture. In this paper, we present a scalable high-performance monitoring architecture for LDM systems using a hierarchical event filtering mechanism to detect and classify interesting local and global events and perform the appropriate action associated with each event or disseminate the monitoring information to the corresponding end-point management applications. We also describe how this mon...
... Our monitor uses an efficient event filtering mechanism to classify and detect generated events and to reduce the large volume of event traffic that may be generated by an LSD application, thereby minimizing the monitoring overhead (intrusiveness). A survey of monitoring and event filtering related work can be found in [1,2]. We classify related work on monitoring distributed systems into three classes: hardware monitoring such as [9], software monitoring such as [5,11], and hybrid monitoring such as [8]. ...
... Spec. figure) to its assigned MA. Each MA receives the delegated monitoring tasks (subfilters) and configures itself accordingly by inserting the subfilter into its internal filtering representation, such as a directed acyclic graph (DAG) [3] or a Petri net (PN) [6] (see [1,2] for more details). The LMAs and DMAs (in the monitoring agent network) work cooperatively for monitoring the target application based on the subscription requests. ...
... In this section, we briefly describe the major components of the monitoring system: the Instrumentation, Subscription Service, Event Processing and Control components [2]. In the following, we present a brief description of the functionality and design of each component. ...
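As a rough illustration of the DAG-style internal filtering representation mentioned in the snippets above, the following Python sketch inserts subfilters into a shared graph so that a predicate common to several subscriptions is stored and evaluated once. The node labels, event fields and subscriber names are hypothetical, and this is far simpler than a real filtering engine.

```python
class Node:
    def __init__(self, label, test):
        self.label = label
        self.test = test
        self.children = []
        self.subscribers = []  # consumers notified when this node matches

class FilterDAG:
    """Shared-predicate filter graph: a common subfilter is stored once and
    reused by every subscription that needs it."""
    def __init__(self):
        self.root = Node("root", lambda e: True)
        self._index = {}  # label -> node, enables sharing across subscriptions

    def insert(self, parent, label, test, subscriber=None):
        node = self._index.setdefault(label, Node(label, test))
        if node not in parent.children:
            parent.children.append(node)
        if subscriber is not None:
            node.subscribers.append(subscriber)
        return node

    def evaluate(self, event, node=None):
        node = node or self.root
        hits = []
        for child in node.children:
            if child.test(event):          # prune: children unseen on mismatch
                hits.extend(child.subscribers)
                hits.extend(self.evaluate(event, child))
        return hits

# Usage: two managers share the "type == qos" prefix node.
dag = FilterDAG()
qos = dag.insert(dag.root, "type==qos", lambda e: e["type"] == "qos")
dag.insert(qos, "delay>100", lambda e: e["delay"] > 100, subscriber="manager-A")
dag.insert(qos, "loss>0.01", lambda e: e.get("loss", 0) > 0.01, subscriber="manager-B")
```

A non-QoS event fails at the shared prefix node, so neither leaf predicate is ever evaluated; this pruning is what makes the DAG representation cheaper than testing every subscription independently.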
Article
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed (LSD) systems. In an LSD environment, a large number of events is generated by the system components during execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LSD systems and providing status information required for debugging, tuning and managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application's environment, which complicates the management decision process and thereby makes monitoring LSD systems an intricate task. In this paper, we present a scalable high-performance monitoring architecture for LSD systems using an efficient event filtering mechanism to detect and classify interesting local and global events and disseminate the monitoring information to the corresponding endp...
Article
With the increasing complexity of large-scale distributed (LSD) systems, an efficient monitoring mechanism has become an essential service for improving the performance and reliability of such complex applications. This paper presents a scalable, dynamic, flexible and nonintrusive monitoring architecture for managing LSD systems. This architecture, referred to as the HiFi monitoring system, detects and classifies interesting primitive and composite events and performs either a corrective or a steering action. When appropriate, information is also disseminated to management applications, such as reactive control tools. The outlined solution offers improvements over related work by supporting new monitoring techniques, such as hierarchical filtering-based monitoring and filter incarnation, that improve the monitoring scalability and dynamism required for managing large-scale distributed systems. The HiFi monitoring system has been implemented and u...
... HiFi: HiFi [7] uses an event-based abstraction for modeling and monitoring the behavior of distributed applications. It provides for the specification of the events to be observed at run-time. ...
... Whereas HiFi uses a multi-level hierarchy, Wabash only employs a single level. For applications where the frequency of the events is low and the number of components in the system is small, the single level monitoring architecture performs better than a multi-level hierarchy [7]. ...
Article
In this paper, we present the various types of users and their management needs in the context of SmartHomes. We describe an infrastructure consisting of architectures for the management of embedded and distributed applications in a SmartHome. We also present criteria for evaluating various management architectures and discuss the evaluation of the described architectures.
... 4,15 In addition, based on the information offered by QoS monitoring, a lot of studies have been conducted to perform further QoS-related analysis or operations. 6,7,12 However, most of the current QoS monitoring-related mechanisms are concerned with end-to-end QoS monitoring or based on the assumption that necessary QoS information, such as QoS distribution information, can be obtained from other mechanisms. Few of them address the problem of QoS distribution monitoring directly. ...
... (4) Hierarchical monitoring: Hierarchical monitoring is an important approach to improving a monitoring system's scalability, as in RMON2 18 and in the work of Ehab Al-Shaer. 6 We believe that it can also be integrated into QoS distribution monitoring systems. For example, relevant monitors are arranged hierarchically in the RTANS. ...
Article
This paper presents a brief survey of current QoS monitoring-related mechanisms, followed by a discussion of the challenges involved in providing QoS distribution monitoring. Several approaches are then proposed to meet these challenges. Finally, the issues that remain open are discussed. Copyright © 2000 John Wiley & Sons, Ltd. Introduction Computer networks are evolving to support multimedia applications with diverse performance requirements. To provide quality of service (QoS) guarantees to these applications and ensure that the agreed QoS is sustained, it is not sufficient to just commit resources, since QoS degradation is often unavoidable. Any fault or weakening of the performance of a network element may result in the degradation of the contracted QoS. Thus, QoS monitoring is required to track the ongoing QoS, compare the monitored QoS against the expected performance, detect possible QoS degradation, and then tune n
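The monitoring loop this abstract describes, comparing monitored QoS against expected performance to detect possible degradation, reduces in its simplest form to a threshold check like the following sketch. The metric names and values are illustrative only, and the "lower is better" assumption is ours, not the paper's.

```python
def detect_degradation(measured, contracted):
    """Compare monitored QoS metrics against contracted targets and report
    violations. Assumes 'lower is better' metrics such as delay and loss;
    metric names are illustrative, not taken from the paper."""
    violations = {}
    for metric, target in contracted.items():
        value = measured.get(metric)
        if value is not None and value > target:
            violations[metric] = {"measured": value, "contracted": target}
    return violations

# Usage: delay exceeds its contracted bound, loss does not.
report = detect_degradation(
    {"delay_ms": 150, "loss_ratio": 0.001},
    {"delay_ms": 100, "loss_ratio": 0.01},
)
```

QoS *distribution* monitoring, the paper's actual focus, would run such a check per network segment rather than only end-to-end, so the violating segment can be isolated.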
... In order to audit whether the QoS performance conforms to the SLA, QoS performance monitoring has received more attention in the past few years. According to the traffic information collected, these approaches can be classified into two categories: end-to-end QoS monitoring (Brownlee, 1997; Brownlee et al., 1997; Ehab, 1998; Mourelatou and Bouloutas, 1994; Schulzrinne et al., 1996; Waldbusser, 1997) and QoS distribution monitoring (Jiang et al., 1999). In end-to-end QoS monitoring, traffic information is collected on both ends (sender and receiver) of the monitored flow. ...
Chapter
Full-text available
An overview on emerging IP traffic monitoring is presented. Important parameters to characterize the traffic, network, and QoS are discussed. The infrastructure and methodology to measure those parameters directly or to compute them based on other measurements are described. We also present a discourse on coping with the challenge of new transport architectures and technologies. In summary, a framework of IP traffic monitoring is presented.
... The evaluation criteria are: architecture, middleware instrumentation for monitoring communication behaviour, support for analysis of concurrent activities and the overhead incurred by the monitoring system. The systems evaluated are: OLT [36], HiFi [37], MOTEL [16,38,39] and MIMO [40]. Of these systems, OLT is a commercial tool available from IBM and the others are academic research projects with some industrial participation. ...
... They showed that a management system is capable of identifying the cause of performance degradation by correlating the information from these QoS monitoring agents. Ehab Al-Shaer et al. [4] looked at an event-driven dynamic monitoring approach for multimedia networks. ...
Conference Paper
Today, there are a large number of bandwidth-hungry applications which cause congestion and delay in networks. Delivery delays adversely affect critical applications, especially those with real-time requirements, e.g. video-conferencing. Hence, more effective quality of service (QoS) and network performance monitoring is required in order to quickly identify and locate performance bottlenecks. An integrated remote monitoring application that monitors both QoS and network performance has been developed. This application is based on the IETF Real Time Flow Monitoring Architecture (RTFM).
... Algorithms in the HiFi system (Al-Shaer, Abdel-Wahab, and Maly, 1997) are intended to reside on different sites of a distributed system to serve as separate filtering functions. The system supports three kinds of event filtering procedures: identity-based (which determines the generator), content-based (which checks for a valid attribute value), and correlation-based (which checks for a given relationship among a set of events). ...
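The three filtering procedures named in this snippet can be sketched as simple predicates. This is a hedged illustration: the event fields, producer names and the correlation relationship are invented for the example and are not taken from HiFi.

```python
def identity_filter(event, producer):
    """Identity-based: pass events from a particular generator."""
    return event["producer"] == producer

def content_filter(event, attribute, is_valid):
    """Content-based: pass events whose attribute holds a valid value."""
    return is_valid(event.get(attribute))

def correlation_filter(events, relation):
    """Correlation-based: pass a set of events satisfying a relationship."""
    return relation(events)

# Usage with hypothetical QoS events:
events = [
    {"producer": "host-1", "type": "loss", "value": 0.02},
    {"producer": "host-2", "type": "jitter", "value": 35},
]
from_host1 = [e for e in events if identity_filter(e, "host-1")]
high_jitter = [
    e for e in events
    if content_filter(e, "value", lambda v: v is not None and v > 30)
]
# Correlation: both a loss and a jitter event occur in the same set.
degraded = correlation_filter(
    events,
    lambda es: any(e["type"] == "loss" for e in es)
    and any(e["type"] == "jitter" for e in es),
)
```

In a real system each kind would be compiled into the agent's internal filter representation rather than applied as ad-hoc list comprehensions, but the three predicate shapes are the essence of the classification.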
Article
Full-text available
Modern interaction systems are usually event-driven. New input devices often require new event types, and handling input from the user becomes increasingly more complex. Frequently, the WIMP (Windows, Icons, Menus, Pointer) paradigm widely used today is not suitable for interactive applications, such as virtual reality applications, that use more than the standard mouse and keyboard input devices. In this paper, we present the design and implementation of the Dynamic Event Model for Interactive System (DEMIS). DEMIS is a middleware between the operating system and the application that supports various input device events while using generic event recognition to detect composite events.
Chapter
The increasing complexity of current software systems is creating the need to find new ways to check the correct functioning of models during runtime. Runtime verification helps ensure that a system is working as expected even after being deployed, which is essential when dealing with systems working in critical or autonomous scenarios. This paper presents an improvement to an existing tool, named CRESCO, linking it with another tool to enable periodic verification based on event logs. These logs help determine whether the functioning of the system has been adequate since the last periodic check. If the system is determined to be working incorrectly, new code files are automatically generated from the traces of the log file, so they can be swapped in when a faulty scenario occurs. Thanks to this improvement, the CRESCO components are able to evaluate their correctness and adapt themselves at runtime, making the system more robust against unforeseen faulty scenarios.
Thesis
Full-text available
Cyber-Physical Systems (CPSs) are embedded computing systems in which computation interacts closely with the physical world through sensors and actuators. CPSs are used to control context-aware systems. These are complex systems that can have different configurations, and their control strategy can be configured depending on the environmental data and the current situation of the context. Therefore, in current industrial environments, the software of embedded and Cyber-Physical systems has to cope with increasing complexity, uncertain scenarios and safety requirements at runtime. The UML State Machine is a powerful formalism to model the logical behaviour of these types of systems, and in Model Driven Engineering (MDE) we can generate code automatically from these models. MDE aims to overcome the complexity of software construction by allowing developers to work with high-level models of software systems instead of low-level code. However, determining and evaluating the runtime behaviour and performance of models of CPSs using commercial MDE tools is a challenging task. Such tools provide little support to observe at model level the execution of the code generated from the model, and to collect the runtime information necessary to, for example, check whether defined safety properties are met. One solution to address these requirements is having the software components' information in model terms at runtime (models@run.time). Work on models@run.time seeks to extend the applicability of models produced in MDE approaches to the runtime environment. Having the model at runtime is the first step towards runtime verification. Runtime verification can be performed using the information of model elements (current state, event, next state, etc.)
This thesis aims at advancing the current practice of automatically generating Unified Modeling Language State Machine (UML-SM) based software components that are able to provide their internal information in model terms at runtime. Regarding automation, we propose a tool-supported methodology to automatically generate these software components. As for runtime monitoring, verification and adaptation, we propose an externalized runtime module that is able to monitor and verify the correctness of the software components based on their internal status in model terms, at component and system level. In addition, if an error is detected, the runtime adaptation module is activated and the safe adaptation process starts in the involved software components. All things considered, the overall safety level of the software components and CPSs is enhanced.
Article
Full-text available
Computerized services are the driving force behind everyday business for many companies; it is of the utmost importance that these services are available during business hours, because downtime costs serious money. Most computerized services today are based on a distributed architecture because of the many benefits of such an architecture. There is a downside to distributed architectures, though: they have an incomplete-observability problem, resulting in tough decision making and difficult control of systems built according to the architecture. This paper describes the design of a business continuity monitoring model, developed to cope with software, hardware and operator failures by reducing the time required to detect, diagnose and repair a problem in a distributed architecture. It is based on a three-tier model combined with five monitoring domains distilled from a standard distributed architecture. A prototype was developed to test the model in a real environment.
Article
Full-text available
The purpose of this paper therefore is to provide conditions under which a decentralized optimization framework is as good as a centralized framework. In particular, we show that there is no loss of quality in the optimal self-management of complex information systems when a decentralized approach is used, and we provide a foundation for the decentralized approach to designing and implementing autonomic systems with self-* properties. Another purpose of our study is to investigate in more detail the interactions between system components at different levels of this hierarchical decentralized framework for optimal self-management. Specifically, we consider a negotiation scheme where additional information is passed between the CM and the AEs in order to significantly increase the efficiency with which the optimization algorithms compute the optimal solution. We then exploit a representative example of our general mathematical framework to investigate other fundamental properties of decentralized optimal self-management in practice, including phase transitions, chaotic behavior, stability and computational complexity.
Article
The objective of this paper is the presentation of the basic concepts of monitoring real-time systems. It starts with a presentation of real-time systems with a special view on real-time data. Special attention is paid to the notion of temporal consistency of real-time data (because these data are observed and collected by monitoring systems), which consists of absolute and relative consistency. Relative consistency is especially important for monitoring systems, because during correlation of collected monitoring data, these systems must be certain that the data are relatively consistent. Otherwise, the gathered monitoring information would not correctly represent the behavior of the monitored system at the intended abstraction level. Another goal of this paper is the presentation of a survey of the real-time monitoring research area. The last section of this paper presents case studies in which different monitoring systems are described, including the monitoring approaches used for monitoring time-triggered systems that are based on the time-triggered architecture.
Conference Paper
Every day, our society becomes more dependent on complex software systems with high availability requirements, such as those present in telecommunications, air traffic control, power plants and distribution lines, among others. In order to facilitate the task of maintaining and evolving such systems, dynamic software architecture infrastructures have recently been on the research agenda. However, the complexity and dynamic evolution of dependable systems bring challenges for verification. Some of these challenges are associated with modifications in the set of properties being verified and also in the types of analysis being performed during system operation. In this work, we present a multiple-specification, architecture-based approach for software monitoring that allows the adaptation of analysis tasks in order to properly handle the challenges mentioned above.
Conference Paper
Full-text available
Performance measurement of large distributed multiagent systems (MAS) offers challenges that must be addressed explicitly in the agent infrastructure. Performance data is widely distributed and voluminous, and poor data collection can impact the operation of the system itself. However, performance metrics are essential to internal system function, e.g., autonomous adaptation to dynamic environments, as well as to external assessment. In this paper we describe the tools, techniques, and results of performance characterization of the Cougaar distributed agent architecture. These techniques include infrastructure instrumentation, plugin-based instrumentation of agents, and dynamic control of metric collection. We introduce multiple redundant "channels" for metric delivery, each serving separate quality of service requirements. We present our techniques for instrumenting the agent society, justify the metrics chosen, and describe the tools developed for collecting these metrics. We also present results from distributed agent societies comprising hundreds of agents.
Conference Paper
Full-text available
Adaptive Distributed Systems (ADSs) are distributed systems that can evolve their behaviors based on changes in their environments. In this work, we discuss security issues and propose security metrics in the context of ADSs. A key premise with adaptation of distributed systems is that in order to detect changes, information must be collected by monitoring the system and its environment. How monitoring should be done, what should be monitored, and the impact monitoring may have on the security mechanism of the target system need to be carefully considered. Conversely, the impact of the implementation of the security mechanism on the adaptation of the distributed system is also assessed. We propose security metrics that can be used to quantify the impact of monitoring on the security mechanism of the target distributed system.
Article
This paper presents two schemes, relevant monitor (RM)-based and improved relevant monitor (IRM)-based, for QoS distribution monitoring. With these schemes, when monitoring a real-time flow, a network manager can locate relevant monitors that are metering the flow. Copyright © 2000 John Wiley & Sons, Ltd. Introduction Providing quality of service (QoS) guarantees is an important requirement for multimedia networks. To maintain agreed QoS, it is not sufficient to just commit resources, because QoS degradation can be caused by many factors and is often unavoidable; e.g. any fault or weakening of the performance of a network element may result in the degradation of contracted QoS. Thus, performance management is required to ensure that the contracted QoS is sustained. 1 To date, there has been a considerable amount of research within the field of QoS management support for multimedia networks, including the service model,
Article
Full-text available
Event filtering is an essential element of event management applications. In event management environments, filtering mechanisms are employed to track the events generated by applications at run-time and perform the corresponding appropriate actions. Several key application domains, such as system and network management, distributed system toolkits, communication protocols and active databases, utilize event filtering for various management purposes. The goal of this paper is to describe the object-oriented design and implementation of an adaptive event filtering framework which can be integrated and reused efficiently to develop event management applications for various domain environments. In our approach, the event filtering framework captures the common components and design patterns of event management in different domains. The major contribution of this work is to provide a flexible event filtering framework that can be efficiently adapted to different domain-specific requirements with minimal development effort. In this paper, we also present examples of using the event filtering framework for developing event management applications in different domains.
Article
Software runtime monitoring has been used to increase the dependability of software. This paper focuses on software runtime monitoring techniques and tools. A generic software runtime monitoring model is presented, consisting of five basic elements: Monitored Object Features, Monitoring Access Methods, Execution Relationships, Runtime Monitor, and Platform Dependencies. The paper characterizes each element by a set of features; based on these features, researchers can use the model to understand and analyze runtime monitoring techniques and tools. The objective is to help researchers and users identify the differences among, and the basic principles of, software runtime monitoring techniques and tools. The paper also maps techniques to features; from this mapping, one can see development trends in the techniques and tools, such as which features receive more attention and which receive less.
Chapter
Full-text available
For users of image management systems, and especially for the user who doesn't know what he wants until he sees it, these systems should be organized in such a way as to support intelligent browsing so that the user will be satisfied in the shortest amount of time. It is our belief that intelligent browsing should be mediated by the standard paradigms of image similarity as well as by an appropriate organization of metadata, including annotations and self-describing image regions.
Technical Report
Full-text available
This memorandum describes RTP, the real-time transport protocol. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers. Most of the text in this memorandum is identical to RFC 1889 which it obsoletes. There are no changes in the packet formats on the wire, only changes to the rules and algorithms governing how the protocol is used. The biggest change is an enhancement to the scalable timer algorithm for calculating when to send RTCP packets in order to minimize transmission in excess of the intended rate when many participants join a session simultaneously.
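A monitor that meters RTP flows starts by decoding the 12-byte fixed header that RFC 3550 (and RFC 1889 before it) defines; the sequence number and timestamp fields are what loss and jitter metering are computed from. A minimal parser:

```python
import struct

def parse_rtp_header(packet: bytes):
    """Parse the 12-byte fixed RTP header defined in RFC 3550 / RFC 1889."""
    if len(packet) < 12:
        raise ValueError("packet shorter than fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # always 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,               # used for loss detection
        "timestamp": ts,               # used for jitter estimation
        "ssrc": ssrc,                  # identifies the flow's source
    }

# A hand-built example packet: version 2, marker set, payload type 96
# (dynamic), sequence 1000, timestamp 160, SSRC 0xDEADBEEF.
pkt = struct.pack("!BBHII", 0x80, 0x80 | 96, 1000, 160, 0xDEADBEEF)
hdr = parse_rtp_header(pkt)
```

RTCP receiver reports then carry the statistics derived from these fields (fraction lost, interarrival jitter) back toward the sender, which is what makes RTP flows observable by third-party monitors in the first place.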
Article
Full-text available
This work deals with some issues concerned in the debugging of concurrent programs. A set of desirable characteristics for a debugger for concurrent languages is deduced from a review of the differences between the debugging of concurrent programs and that of sequential ones. A debugger for a concurrent language, based upon CSP, is then described. The debugger makes it possible to compare a description of the expected program behavior to the actual behavior. The description of the behavior is given in terms of expressions composed by events and/or assertions on the process state. The developed formalism is able to describe behaviors at various levels of abstraction. Lastly, some guidelines for the implementation of the debugger are given and a detailed example of program debugging is analyzed.
Article
Full-text available
Effective management of a local area network (LAN) requires not only a protocol to manage the active entities, but also a means to monitor the LAN channel. This is especially true in shared-channel LANs, such as Ethernet, where the behavior of the LAN as a whole may be impractical to deduce from the states of the individual hosts. Passive monitoring can be done using either a dedicated system or a general-purpose system. Dedicated monitors have been favored for several reasons, but recent workstations, when carefully programmed, are sufficiently powerful to serve this function. Using a workstation offers high-performance graphics and a more flexible environment for collecting and presenting LAN behavior.
Conference Paper
Full-text available
Network management systems built on a client/server model centralize responsibilities in client manager processes, with server agents playing restrictive support roles. As a result, managers must micro-manage agents through primitive steps, resulting in ineffective distribution of management responsibilities, failure-prone management bottlenecks, and limitations for real time responsiveness. We present a more flexible paradigm, the Manager-Agent Delegation (MAD) framework. It supports the ability to extend the functionality of servers (agents) at execution time, allowing flexible distribution of management responsibilities in a distributed environment. MAD can store and instantiate delegated scripts, and provides a concurrent runtime environment, where they can execute asynchronously without requiring the manager's intervention. A delegation protocol allows a manager to transfer programs, create process instances, and control their execution. We describe the delegation model, its application to network management, and the design of a prototype implementation.
Conference Paper
Full-text available
This paper describes a novel approach to event correlation in networks based on coding techniques. Observable symptom events are viewed as a code that identifies the problems that caused them; correlation is performed by decoding the set of observed symptoms. The coding approach has been implemented in the SMARTS Event Management System (SEMS), a server running under Sun Solaris 2.3. Preliminary benchmarks of the SEMS demonstrate that the coding approach provides a speedup of at least two orders of magnitude over other published correlation systems. In addition, it is resilient to high rates of symptom loss and false alarms. Finally, the coding approach scales well to very large domains involving thousands of problems.
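The decoding idea can be sketched in a few lines (the codebook below is invented for illustration, not taken from SEMS): each problem is "coded" by the set of symptoms it produces, and diagnosis picks the problem whose code is closest to the observed symptom set, which is what gives resilience to lost symptoms and false alarms.

```python
# Hypothetical codebook: problem -> set of symptoms it produces.
# Decoding = nearest code under Hamming distance (symmetric difference),
# so a missing or spurious symptom shifts the distance by only 1.

CODEBOOK = {
    "link_failure": {"s1", "s2"},
    "router_crash": {"s2", "s3", "s4"},
    "config_error": {"s4"},
}

def decode(observed):
    """Return the problem whose symptom code is closest to `observed`."""
    def distance(code):
        return len(code ^ observed)   # symmetric difference
    return min(CODEBOOK, key=lambda p: distance(CODEBOOK[p]))

# Symptom "s4" was lost in transit; {"s2", "s3"} still decodes correctly.
diagnosis = decode({"s2", "s3"})
```

Here the observed set is at distance 1 from the router_crash code but at distance 2 or more from every other code, so the diagnosis survives the lost symptom.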
Conference Paper
Full-text available
This paper describes a pattern-based approach to building packet classifiers. One novelty of the approach is that it can be implemented efficiently in both software and hardware. A performance study shows that the software implementation is about twice as fast as existing mechanisms, and that the hardware implementation is currently able to keep up with OC-12 (622 Mbps) network links and is likely to operate at gigabit speeds in the near future.
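The software/hardware duality comes from the shape of the patterns themselves. A rough sketch of the idea (the cell format here is an assumption for illustration, not the paper's exact representation): each pattern is a list of (offset, mask, value) cells compared against raw packet bytes, which maps equally well onto a software loop or parallel hardware comparators.

```python
# Sketch of a pattern-based classifier (cell format hypothetical):
# a pattern is a list of (offset, mask, value) cells over packet bytes.

class Pattern:
    def __init__(self, name, cells):
        self.name = name
        self.cells = cells            # [(offset, mask, value), ...]

    def matches(self, packet: bytes) -> bool:
        return all(
            off < len(packet) and (packet[off] & mask) == value
            for off, mask, value in self.cells
        )

def classify(packet, patterns):
    """Return the name of the first matching pattern, else 'default'."""
    for p in patterns:
        if p.matches(packet):
            return p.name
    return "default"

# IPv4 version nibble (0x4-) at offset 0, IP protocol field at offset 9:
# 6 = TCP, 17 = UDP.
patterns = [
    Pattern("ipv4_tcp", [(0, 0xF0, 0x40), (9, 0xFF, 6)]),
    Pattern("ipv4_udp", [(0, 0xF0, 0x40), (9, 0xFF, 17)]),
]
pkt = bytes([0x45] + [0] * 8 + [6] + [0] * 10)   # minimal fake IPv4/TCP header
label = classify(pkt, patterns)
```

Because every cell is a masked byte comparison at a fixed offset, a hardware version can evaluate all cells of all patterns in parallel, which is what lets such classifiers track line rate.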
Article
Full-text available
In this article we describe IRI and the lessons learned deploying it. We first deployed IRI to teach a fall 1995 graduate course in software metrics. We then evaluated it in terms of logistics, reliability, performance, and usability, performing off-line experiments to try out new features and to develop protocols that would improve IRI use. We subsequently reengineered IRI into an open architecture with a published specification as version 1.0, which became available just recently (http://www.cs.odu.edu/tele/iri). We used version 1.0 to teach a junior-level software engineering course in the fall 1996 semester.
Article
Full-text available
The monitoring of distributed systems involves the collection, interpretation, and display of information concerning the interactions among concurrently executing processes. This information and its display can support the debugging, testing, performance evaluation, and dynamic documentation of distributed systems. General problems associated with monitoring are outlined in this paper, and the architecture of a general purpose, extensible, distributed monitoring system is presented. Three approaches to the display of process interactions are described: textual traces, animated graphical traces, and a combination of aspects of the textual and graphical approaches. The roles that each of these approaches fulfill in monitoring and debugging distributed systems are identified and compared. Monitoring tools for collecting communication statistics, detecting deadlock, controlling the non-deterministic execution of distributed systems, and for using protocol specifications in monitoring are also described. Our discussion is based on experience in the development and use of a monitoring system within a distributed programming environment called Jade. Jade was developed within the Computer Science Department of the University of Calgary and is now being used to support teaching and research at a number of university and research organizations.
Article
Full-text available
This paper describes scalable reliable multicast (SRM), a reliable multicast framework for light-weight sessions and application level framing. The algorithms of this framework are efficient, robust, and scale well to both very large networks and very large sessions. The SRM framework has been prototyped in wb, a distributed whiteboard application, which has been used on a global scale with sessions ranging from a few to a few hundred participants. The paper describes the principles that have guided the SRM design, including the IP multicast group delivery model, an end-to-end, receiver-based model of reliability, and the application level framing protocol model. As with unicast communications, the performance of a reliable multicast delivery algorithm depends on the underlying topology and operational environment. We investigate that dependence via analysis and simulation, and demonstrate an adaptive algorithm that uses the results of previous loss recovery events to adapt the control parameters used for future loss recovery. With the adaptive algorithm, our reliable multicast delivery algorithm provides good performance over a wide range of underlying topologies.
Article
Full-text available
Object-oriented programming is as much a different way of designing programs as it is a different way of designing programming languages. This paper describes what it is like to design systems in Smalltalk. In particular, since a major motivation for object-oriented programming is software reuse, this paper describes how classes are developed so that they will be reusable.
Article
Full-text available
The main problems associated with debugging concurrent programs are increased complexity, the 'probe effect', nonrepeatability, and the lack of a synchronized global clock. The probe effect refers to the fact that any attempt to observe the behavior of a distributed system may change the behavior of that system. For some parallel programs, different executions with the same data will result in different results even without any attempt to observe the behavior. Even when the behavior can be observed, in many systems the lack of a synchronized global clock makes the results of the observation difficult to interpret. This paper discusses these and other problems related to debugging concurrent programs and presents a survey of current techniques used in debugging concurrent programs. Systems using three general techniques are described: traditional or breakpoint style debuggers, event monitoring systems, and static analysis systems. In addition, techniques for limiting, organizing, and displaying a large amount of data produced by the debugging systems are discussed.
Article
This work discusses some issues in the debugging of concurrent programs. A set of desirable characteristics of a debugger for concurrent languages is deduced from an examination of the differences between the debugging of concurrent programs and that of sequential ones. A debugger for a concurrent language derived from CSP is then presented. It is based upon a semantic model of the supported language. The debugger makes it possible to compare a description of the program behaviour to the actual behaviour, as well as to evaluate assertions on the process state. The description of the behaviour is given in a formalism whose semantics is also specified. The formalism can specify program behaviours at various abstraction levels. Lastly, some guidelines for the implementation of the debugger are given and a detailed example of program description is analyzed.
Article
The concept of a trigger is central to any active database. Upon the occurrence of a trigger event, the trigger is "fired", i.e., the trigger action is executed. We describe a model and a language for specifying basic and composite trigger events in the context of an object-oriented database. The specified events can be detected efficiently using finite automata. We integrate our model with O++, the database programming language for the Ode object database being developed at AT&T Bell Labs. We propose a new Event-Action model, which folds into the event specification the condition part of the well-known Event-Condition-Action model and avoids the multiple coupling modes between the event, condition, and action trigger components.
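Detection by finite automata can be sketched briefly (the class below is an illustrative assumption, not the paper's mechanism): a composite event such as "E1 followed by E2" becomes an automaton whose state records how much of the sequence has been seen, advancing one step per matching primitive event.

```python
# Hypothetical sketch: detecting the composite event "deposit followed
# by withdraw" with a small automaton. The state is simply the index of
# the next primitive event awaited; other events are ignored.

class SequenceDetector:
    """Fires when the events in `sequence` occur in order, possibly
    interleaved with unrelated events."""
    def __init__(self, sequence):
        self.sequence = sequence
        self.state = 0                # index of the next awaited event

    def feed(self, event):
        if event == self.sequence[self.state]:
            self.state += 1
            if self.state == len(self.sequence):
                self.state = 0        # reset; composite event fired
                return True
        return False

det = SequenceDetector(["deposit", "withdraw"])
fired = [e for e in ["login", "deposit", "balance", "withdraw"]
         if det.feed(e)]
```

The automaton fires exactly once here, on the final "withdraw". Because each primitive event advances the automaton by at most one transition, detection cost is constant per event regardless of how complex the composite expression is.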
Article
This paper describes the architecture of the interactive debugging system DISDEB, which is intended to debug programs on a multi-microprocessor system constituting a node of the Selenia Mara architecture. DISDEB requires neither changes in or additions to the code produced by the compiler nor heavy modifications to the operating system kernel. Moreover, the use of ad hoc hardware provided with autonomous processing power allows the user to monitor and control the execution of both concurrent and distributed processes and their interactions, while, in most cases, maintaining the real-time operation of the target Mara system.
Book
This book considers such algorithms as the least mean-square (LMS) algorithm, different versions of the Kalman algorithm, the recursive least squares (RLS) algorithm, the fast transversal filters (FTF) algorithm, the exact least squares lattice (LSL) algorithm, and the recursive-QR decomposition-LS algorithm; offers thorough discussion of the Wiener and Kalman filter theories and considers the structures of transversal filter, lattice predictor, and systolic array; and explores such applications as adaptive prediction, adaptive equalization, system identification, analysis of superimposed sinusoids in noise, adaptive detection, and adaptive beam forming.
Article
This paper concerns the design of a flexible and efficient packet monitoring program for analyzing traffic patterns and gathering statistics on a packet network. This monitor operates in real time, using an analyzer which is an interpretive pseudo-machine driving object-oriented data collection programs. The pseudo-program for the interpreter is “compiled” from configuration commands written in a monitoring control language.
Article
This symbolic run-time debugger for Ada provides facilities for observing and manipulating the execution of a monitored program, including its concurrent aspects. The debugger can be used interactively, and also as a monitoring program to control the application. A feature of this project is the use of relational algebra for defining compiler and kernel interfaces and for handling debugger information. The implementation is based on an Ada task to interface with the debugging operator and a set of user-defined Ada monitoring tasks. A prototype of the debugger was completed as a part of ART, a relational translator and interpreter for Ada.
Article
In this paper we describe the design and implementation of an integrated monitoring and debugging system for a distributed real-time computer system. The monitor provides continuous, transparent monitoring capabilities throughout a real-time system's lifecycle with bounded, minimal, predictable interference by using software support. The monitor is flexible enough to observe both high-level events that are operating system- and application-specific, as well as low-level events such as shared variable references. We present a novel approach to monitoring shared variable references that provides transparent monitoring with low overhead. The monitor is designed to support tasks such as debugging real-time applications, aiding real-time task scheduling, and measuring system performance. Since debugging distributed real-time applications is particularly difficult, we describe how the monitor can be used to debug distributed and parallel applications by deterministic execution replay.
Article
Making a database system active to meet the requirements of a wide range of applications entails developing an expressive event specification language and its implementation. Extant systems support mostly database events and in some cases a few predefined events.This paper discusses an event specification language (termed Snoop) for active databases. We define an event, distinguish between events and conditions, classify events into a class hierarchy, identify primitive events, and introduce a small number of event operators for constructing composite (or complex) events. Snoop supports temporal, explicit, and composite events in addition to the traditional database events. The novel aspect of our work lies not only in supporting a rich set of events and event expressions, but also in the notion of parameter contexts. Essentially, parameter contexts augment the semantics of composite events for computing their parameters. For concreteness, we present parameter computation for the relational model. Finally, we show how a contingency plan that includes time constraints can be supported without stepping outside of the framework proposed in this paper.
Conference Paper
A method is described for actively interfacing an Object-Oriented Database Management System (OODBMS) to application programs. The method, called a database monitor, observes how values of derived or stored attributes of database objects change over time. Whenever such a value change is observed, the OODBMS invokes tracking procedures within running application programs. The OODBMS associates tracking procedures with the object attributes they monitor, and it invokes the appropriate tracking procedures when data changes. Use is made of atomic transactions in the OODBMS. The applicability of monitors is localized both in time and space, so that only a minimal amount of data is monitored during as short a time as possible. Such localization reduces the frequency of tracking procedure invocation, makes it easy to add and remove monitors dynamically, and permits efficient implementation. To demonstrate these ideas, an implementation is described for the Iris OODBMS. The implementation uses a technique of partial view materialization for efficiency.
Conference Paper
The paper describes a prototype event correlation application developed at the IBM European Networking Center. In heterogeneous networks with integrated network management systems the symptoms of the failure of a single network resource could be detected and reported independently by many different system components. As a result, a single network failure triggers numerous event reports, with no indication which one of them (if any) reports an actual failure. The application described here assists in finding the actual failure by analyzing and structuring such event reports. The analysis consists of correlation of reports resulting from the same failure and ordering correlated reports to indicate the resource where the failure has occurred with a high probability, thereby decreasing the complexity of the operator task. It relies on a homogeneous model of interconnected, heterogeneous networks and explores relationships among physical and logical network resources to perform its task. The application uses the OSI standardized management framework and management communication protocol, and OSI-based managed objects.
Conference Paper
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. Monitoring events generated by LDM systems is necessary for observing the runtime behavior of LDM systems and providing status information required for managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application environment. Furthermore, different media streams in …
Article
Event Based Behavioral Abstraction (EBBA) is a high-level debugging approach which treats debugging as a process of creating models of actual behavior from the activity of the system and comparing these to models of expected system behavior. The differences between the actual and expected models are used to characterize erroneous system behavior and direct further investigation. A set of EBBA-based tools has been implemented that users can employ to construct libraries of behavior models and investigate the behavior of an errorful system through these models. EBBA evolves naturally as a cooperative distributed program that can take better advantage of computational power available in a network computer system to enhance debugging tool transparency, reduce latency and uncertainty for fundamental debugging activities and accommodate diverse, heterogeneous architectures.
Article
There is a pressing need for network management systems capable of handling faults. In this paper, we propose to use a set of independent observers to detect faults in communication systems that are modeled by finite-state machines. An algorithm for constructing these observers and a fast real-time fault detection mechanism used by each observer are given. Since these observers run in parallel and independently, one immediate benefit is that of graceful degradation-one failed observer will not cause collapse of the fault management system. In addition, each observer has a simpler structure than the original system and can be operated at higher speed. This approach has the potential to be incorporated into the fault management system for a high-speed communication system.
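The observer idea can be sketched compactly (the transition table below is an invented example, not from the paper): each observer replays the observed event sequence against a finite-state model of the system and reports a fault as soon as an event is impossible in the current state.

```python
# Sketch of one independent observer over a hypothetical protocol FSM.
# The observer flags a fault the moment an observed event has no legal
# transition from the current state.

TRANSITIONS = {                        # (state, event) -> next state
    ("idle", "connect"): "open",
    ("open", "data"): "open",
    ("open", "close"): "idle",
}

def observe(events, start="idle"):
    """Replay `events`; return ('fault', index) or ('ok', final_state)."""
    state = start
    for i, ev in enumerate(events):
        nxt = TRANSITIONS.get((state, ev))
        if nxt is None:
            return ("fault", i)        # event ev is illegal in this state
        state = nxt
    return ("ok", state)

# "data" after "close" is illegal: the connection is already idle.
verdict = observe(["connect", "data", "close", "data"])
```

Because each observer only needs its own (typically much smaller) transition table, several observers can run in parallel and independently, which is what yields the graceful degradation the paper highlights: one failed observer leaves the others' detection intact.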
Article
The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
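The clock-synchronization rules from this paper are simple enough to state directly in code: increment the local counter on every local event or send, and on receipt set the counter to one more than the maximum of the local value and the timestamp carried by the message.

```python
# Minimal Lamport logical clock, following the happened-before rules:
# tick on local/send events; on receive, take max(local, msg) + 1.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event or a send; return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Advance past both the local clock and the message's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()            # process a sends a message at time 1
b.tick()                     # process b has an unrelated local event
t_recv = b.receive(t_send)   # b's clock jumps to max(1, 1) + 1 = 2
```

The guarantee is one-directional: if event x happened before event y, then clock(x) < clock(y); ties between causally unrelated events are broken by an arbitrary total order (e.g., process IDs), which is all the paper's mutual-exclusion example needs.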
Article
In debugging distributed programs a distinction is made between an observed error and the program fault, or bug, that caused the error. Testing reveals an error; debugging is the process of tracing the error through time and space to the bug that caused it. A program is considered to be in error when some state of computation violates a safety requirement of the program. Expressing safety requirements in such a way that a computation can be monitored for safe behavior is thus a basic preliminary step in the testing-debugging cycle. Safety requirements are usually expressed as predicates. When a state of the computation violates such a safety predicate, that state can be said to be in error. A predicate logic is proposed that permits the specification of relationships between distributed predicates. This increases the scope and precision of situation-specific conditions that can be specified and detected. It also permits the specification of safety primitives such as P unless Q using distributed predicates. Thus a distributed program can be directly monitored for satisfaction and violation of safety requirements. Breakpoint conditions and predicates expressing safety may hold over a number of states of a program. A breakpoint state is meaningful if the causal relationships of events included in the breakpoint are unambiguous. At least two such states exist for each condition: the minimal and the maximal prefix of the computation at which the predicate holds. These states are specifiable as part of a breakpoint definition in the logic presented.
Article
A framework is a generic application that allows the creation of different applications from an application domain. Due to the inherent flexibility and variability of a framework, framework design is much more complex than application design. Experience shows that the complexity of framework design is reduced by separating clearly different issues: the design of a class model for an application from the framework domain; the analysis and specification of the domain variability and flexibility; and its stepwise implementation by a sequence of generalizing transformations. Since application design is a well-known activity, the article will concentrate on the specification of the variable aspects, on the design of a local class structure that provides each with the required variability, and on how to transform a class structure for generalization. When developing a framework, don't plan all design activities in one development cycle. Framework development should be based on experience, nobody will develop a useful framework from scratch in one development cycle. Therefore, the design activities should be distributed over different development cycles.