Article

Hierarchical Filtering-based Monitoring Architecture for Large-Scale Distributed Systems

... In recent years, several mechanisms have been proposed for QoS monitoring [4, 14]. In addition, based on the information offered by QoS monitoring, a lot of studies have been conducted to perform further QoS related analysis or operations [6, 7, 12]. However, most of the current QoS monitoring related mechanisms are concerned with end-to-end QoS monitoring or based on the assumption that necessary QoS information, such as QoS distribution information, can be obtained from other mechanisms. ...
... However, since the flow may cross several network segments that provide different levels of QoS, only the QoS distribution monitoring approach can further isolate the network segment(s) causing the degradation. Lastly, in addition to the functions supported in the traditional network monitoring model, the monitoring application in the QoS monitoring model may provide further QoS-related analysis and operations, such as identifying QoS problems [12], adjusting the monitoring system [6] and reconfiguring the network system [7]. ...
... However, the mechanism through which the agent could monitor the end-to-end QoS was not discussed. Ehab Al-Shaer [6] proposed an event-driven dynamic monitoring approach for multimedia networks. The task of detecting primitive and composite events is distributed among dedicated monitoring agents as in [12]. ...
Conference Paper
Future integrated services networks will need to provide quality of service (QoS) guarantees to multimedia applications. To ensure that the contracted QoS is sustained, it is not sufficient to just commit resources. QoS monitoring is required to detect and locate the degradation of QoS performance. In addition, the distribution of QoS, instead of simply end-to-end QoS, needs to be monitored. In QoS distribution monitoring, the distribution of QoS experienced by a real-time flow in different network segments is monitored. This paper presents a brief survey of current QoS monitoring-related mechanisms, followed by a discussion of the challenges involved in providing QoS distribution monitoring. Several approaches are then proposed to meet these challenges.
... In addition, the monitoring/filtering tasks in such systems are usually static and do not support defining programmable management actions. In [1], we present a survey and evaluation of a number of proposed monitoring and filtering systems. This work tries to bridge this gap by designing and developing a monitoring architecture that explicitly addresses the challenges and requirements imposed by managing large-scale distributed systems. Each component in the overall system is accounted for, from instrumentation, user subscriptions and event filtering to information dissemination and management reaction. ...
... This feature enables the consumer to control the monitoring granularity and thereby minimize its intrusiveness. In particular, consumers can subscribe to a small number of filters; these filters may, however, activate other filters when a specific event pattern is detected [1]. Therefore, the monitoring model supports dynamically activating/deactivating the appropriate monitoring operations (or filters) at the right time (event), relieving the system environment from the overhead of launching multiple filters or monitoring requests simultaneously. ...
... Figure 3 shows the formal definition of the monitor action in BNF. In [1], we present further discussion and examples of the HiFi monitoring language specification. ...
Conference Paper
With the increasing complexity of large-scale distributed (LSD) systems, an efficient monitoring mechanism has become an essential service for improving the performance and reliability of such complex applications. The paper presents a scalable, dynamic, flexible and nonintrusive monitoring architecture for managing LSD systems. This architecture, referred to as the HiFi monitoring system, detects and classifies interesting primitive and composite events and performs either a corrective or a steering action. When appropriate, information is also disseminated to management applications, such as reactive control tools. The outlined solution offers improvements over related work by supporting new monitoring techniques, such as hierarchical filtering-based monitoring and filter incarnation, that improve the monitoring scalability and dynamism required for managing LSD systems. The HiFi monitoring system has been implemented and used at Old Dominion University for monitoring and steering Interactive Remote Instruction (IRI), a large-scale distributed multimedia system for distance learning.
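The hierarchical filtering and filter incarnation techniques named in this abstract can be sketched roughly as follows. This is a minimal, illustrative Python sketch, not HiFi's actual interface: the event schema, filter names and agent API are assumptions. The key idea is that a filter can carry other filters that are only activated ("incarnated") once the first filter matches, and that matched events travel up a hierarchy of agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Filter:
    name: str
    predicate: Callable[[dict], bool]
    # Filters to "incarnate" (activate) once this filter first matches.
    incarnates: List["Filter"] = field(default_factory=list)

class FilteringAgent:
    """One node in the filtering hierarchy; matched events travel upward."""
    def __init__(self, filters, parent=None):
        self.filters = list(filters)
        self.parent = parent
        self.delivered = []  # events that reached the top of the hierarchy

    def process(self, event):
        for f in list(self.filters):      # snapshot: incarnated filters
            if f.predicate(event):        # only see *later* events
                self.filters.extend(f.incarnates)
                f.incarnates = []
                self._forward(event)

    def _forward(self, event):
        if self.parent is not None:
            self.parent.process(event)
        else:
            self.delivered.append(event)

# Usage: the leaf agent starts watching for "jitter" only after a "loss"
# event has been observed, sparing the cost of an always-on jitter filter.
jitter = Filter("jitter", lambda e: e["type"] == "jitter")
loss = Filter("loss", lambda e: e["type"] == "loss", incarnates=[jitter])
root = FilteringAgent([Filter("all", lambda e: True)])
leaf = FilteringAgent([loss], parent=root)

for ev in [{"type": "jitter"}, {"type": "loss"}, {"type": "jitter"}]:
    leaf.process(ev)
# The first jitter event is dropped; loss and the second jitter reach the root.
```

The dynamism the abstract claims shows up here as the growing `filters` list: monitoring operations are switched on by event patterns rather than being launched all at once.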
... It also provides a flexible and adjustable event reporting mechanism to facilitate the instrumentation process. A survey and evaluation of related work in monitoring and event filtering can be found in [3,4]. We classify related work on monitoring distributed systems into three classes: hardware monitoring such as [13], software monitoring such as [7,17], and hybrid monitoring such as [20]. ...
... Each MA receives its delegated monitoring tasks (subfilters) [11] and configures itself accordingly by inserting the subfilter into its internal filtering representation, such as a directed acyclic graph (DAG) [5] or a Petri net (PN) [9] (see [3,4] for more details). The LMAs and DMAs (in the monitoring agent network) work cooperatively for monitoring the target application based on the subscription requests. ...
... In this section, we present a brief description of the functionality, design and implementation of each component in the monitoring system; more details can be found in [3]. ...
Article
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. In an LDM environment, a large number of events is generated by the system components during execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LDM systems and providing status information required for managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application's environment. This complicates the management decision process, making monitoring of LDM systems an intricate task. Furthermore, different media streams in LDM systems may have different management requirements that must be considered in the monitoring architecture. In this paper, we present a scalable high-performance monitoring architecture for LDM systems using a hierarchical event filtering...
Article
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. Monitoring events generated by LDM systems is necessary for observing the runtime behavior of LDM systems and providing status information required for managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application's environment. Furthermore, different media streams in LDM systems may have different management requirements that must be considered in the monitoring architecture. In this paper, we present a scalable high-performance monitoring architecture for LDM systems using a hierarchical event filtering mechanism to detect and classify interesting local and global events and perform the appropriate action associated with each event or disseminate the monitoring information to the corresponding end-point management applications. We also describe how this mon...
... Our monitor uses an efficient event filtering mechanism to classify and detect generated events and to reduce the large volume of event traffic that may be generated by an LSD application, thereby minimizing the monitoring overhead (intrusiveness). A survey of monitoring and event filtering related work can be found in [1,2]. We classify related work on monitoring distributed systems into three classes: hardware monitoring such as [9], software monitoring such as [5,11], and hybrid monitoring such as [8]. ...
... Spec. figure) to its assigned MA. Each MA receives the delegated monitoring tasks (subfilters) and configures itself accordingly by inserting the subfilter into its internal filtering representation, such as a directed acyclic graph (DAG) [3] or a Petri net (PN) [6] (see [1,2] for more details). The LMAs and DMAs (in the monitoring agent network) work cooperatively for monitoring the target application based on the subscription requests. ...
... In this section, we briefly describe the major components of the monitoring system: the Instrumentation, Subscription Service, Event Processing and Control components [2]. In the following, we present a brief description of the functionality and design of each component. ...
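As a rough illustration of the DAG-style internal filtering representation mentioned in the snippets above, the following Python sketch inserts subfilters into a shared graph so that a predicate common to several subscriptions is stored and evaluated once. The node labels, event fields and subscriber names are hypothetical, and this is far simpler than a real filtering engine.

```python
class Node:
    def __init__(self, label, test):
        self.label = label
        self.test = test
        self.children = []
        self.subscribers = []  # consumers notified when this node matches

class FilterDAG:
    """Shared-predicate filter graph: a common subfilter is stored once and
    reused by every subscription that needs it."""
    def __init__(self):
        self.root = Node("root", lambda e: True)
        self._index = {}  # label -> node, enables sharing across subscriptions

    def insert(self, parent, label, test, subscriber=None):
        node = self._index.setdefault(label, Node(label, test))
        if node not in parent.children:
            parent.children.append(node)
        if subscriber is not None:
            node.subscribers.append(subscriber)
        return node

    def evaluate(self, event, node=None):
        node = node or self.root
        hits = []
        for child in node.children:
            if child.test(event):          # prune: children unseen on mismatch
                hits.extend(child.subscribers)
                hits.extend(self.evaluate(event, child))
        return hits

# Usage: two managers share the "type == qos" prefix node.
dag = FilterDAG()
qos = dag.insert(dag.root, "type==qos", lambda e: e["type"] == "qos")
dag.insert(qos, "delay>100", lambda e: e["delay"] > 100, subscriber="manager-A")
dag.insert(qos, "loss>0.01", lambda e: e.get("loss", 0) > 0.01, subscriber="manager-B")
```

A non-QoS event fails at the shared prefix node, so neither leaf predicate is ever evaluated; this pruning is what makes the DAG representation cheaper than testing every subscription independently.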
Article
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed (LSD) systems. In an LSD environment, a large number of events is generated by the system components during execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LSD systems and providing status information required for debugging, tuning and managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application's environment, which complicates the management decision process and thereby makes monitoring LSD systems an intricate task. In this paper, we present a scalable high-performance monitoring architecture for LSD systems using an efficient event filtering mechanism to detect and classify interesting local and global events and disseminate the monitoring information to the corresponding endp...
Article
With the increasing complexity of large-scale distributed (LSD) systems, an efficient monitoring mechanism has become an essential service for improving the performance and reliability of such complex applications. This paper presents a scalable, dynamic, flexible and nonintrusive monitoring architecture for managing LSD systems. This architecture, referred to as the HiFi monitoring system, detects and classifies interesting primitive and composite events and performs either a corrective or a steering action. When appropriate, information is also disseminated to management applications, such as reactive control tools. The outlined solution offers improvements over related work by supporting new monitoring techniques, such as hierarchical filtering-based monitoring and filter incarnation, that improve the monitoring scalability and dynamism required for managing large-scale distributed systems. The HiFi monitoring system has been implemented and u...
... HiFi: HiFi [7] uses an event-based abstraction for modeling and monitoring the behavior of distributed applications. It provides for the specification of the events to be observed at run-time. ...
... Whereas HiFi uses a multi-level hierarchy, Wabash only employs a single level. For applications where the frequency of the events is low and the number of components in the system is small, the single level monitoring architecture performs better than a multi-level hierarchy [7]. ...
Article
In this paper, we present the various types of users and their management needs in the context of SmartHomes. We describe an infrastructure consisting of architectures for the management of embedded and distributed applications in a SmartHome. We also present criteria for evaluating various management architectures and discuss the evaluation of the described architectures.
... 4,15 In addition, based on the information offered by QoS monitoring, a lot of studies have been conducted to perform further QoS-related analysis or operations. 6,7,12 However, most of the current QoS monitoring-related mechanisms are concerned with end-to-end QoS monitoring or based on the assumption that necessary QoS information, such as QoS distribution information, can be obtained from other mechanisms. Few of them address the problem of QoS distribution monitoring directly. ...
... (4) Hierarchical monitoring: Hierarchical monitoring is an important approach to improving a monitoring system's scalability, as in RMON2 18 and in the work of Ehab Al-Shaer. 6 We believe that it can also be integrated into QoS distribution monitoring systems. For example, relevant monitors are arranged hierarchically in the RTANS. ...
Article
This paper presents a brief survey of current QoS monitoring-related mechanisms, followed by a discussion of the challenges involved in providing QoS distribution monitoring. Several approaches are then proposed to meet these challenges. Finally, the issues that remain open are discussed. Copyright © 2000 John Wiley & Sons, Ltd. Introduction Computer networks are evolving to support multimedia applications with diverse performance requirements. To provide quality of service (QoS) guarantees to these applications and ensure that the agreed QoS is sustained, it is not sufficient to just commit resources, since QoS degradation is often unavoidable. Any fault or weakening of the performance of a network element may result in the degradation of the contracted QoS. Thus, QoS monitoring is required to track the ongoing QoS, compare the monitored QoS against the expected performance, detect possible QoS degradation, and then tune n
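The monitoring loop this abstract describes, comparing monitored QoS against expected performance to detect possible degradation, reduces in its simplest form to a threshold check like the following sketch. The metric names and values are illustrative only, and the "lower is better" assumption is ours, not the paper's.

```python
def detect_degradation(measured, contracted):
    """Compare monitored QoS metrics against contracted targets and report
    violations. Assumes 'lower is better' metrics such as delay and loss;
    metric names are illustrative, not taken from the paper."""
    violations = {}
    for metric, target in contracted.items():
        value = measured.get(metric)
        if value is not None and value > target:
            violations[metric] = {"measured": value, "contracted": target}
    return violations

# Usage: delay exceeds its contracted bound, loss does not.
report = detect_degradation(
    {"delay_ms": 150, "loss_ratio": 0.001},
    {"delay_ms": 100, "loss_ratio": 0.01},
)
```

QoS *distribution* monitoring, the paper's actual focus, would run such a check per network segment rather than only end-to-end, so the violating segment can be isolated.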
... In order to audit whether the QoS performance conforms to the SLA, QoS performance monitoring has received more attention in the past few years. According to the traffic information collected, these approaches can be classified into two categories: end-to-end QoS monitoring (Brownlee, 1997; Brownlee et al., 1997; Ehab, 1998; Mourelatou and Bouloutas, 1994; Schulzrinne et al., 1996; Waldbusser, 1997) and QoS distribution monitoring (Jiang et al., 1999). In end-to-end QoS monitoring, traffic information is collected on both ends (sender and receiver) of the monitored flow. ...
Chapter
Full-text available
An overview on emerging IP traffic monitoring is presented. Important parameters to characterize the traffic, network, and QoS are discussed. The infrastructure and methodology to measure those parameters directly or to compute them based on other measurements are described. We also present a discourse on coping with the challenge of new transport architectures and technologies. In summary, a framework of IP traffic monitoring is presented.
... The evaluation criteria are: architecture, middleware instrumentation for monitoring communication behaviour, support for analysis of concurrent activities and the overhead incurred by the monitoring system. The systems evaluated are: OLT [36], HiFi [37], MOTEL [16,38,39] and MIMO [40]. Of these systems, OLT is a commercial tool available from IBM and the others are academic research projects with some industrial participation. ...
... They showed that a management system is capable of identifying the cause of performance degradation by correlating the information from these QoS monitoring agents. Ehab Al-Shaer et al. [4] looked at an event-driven dynamic monitoring approach for multimedia networks. ...
Conference Paper
Today, there are a large number of bandwidth-hungry applications which cause congestion and delay in networks. Delivery delays adversely affect critical applications, especially those with real-time requirements, e.g. video-conferencing. Hence, more effective quality of service (QoS) and network performance monitoring is required in order to quickly identify and locate performance bottlenecks. An integrated remote monitoring application that monitors both QoS and network performance has been developed. This application is based on the IETF Real Time Flow Monitoring Architecture (RTFM).
... Algorithms in the HiFi system (Al-Shaer, Abdel-Wahab, and Maly, 1997) are intended to reside on different sites of a distributed system to serve as separate filtering functions. The system supports three kinds of event filtering procedures: identity-based (which determines the generator), content-based (which checks for a valid attribute value), and correlation-based (which checks for a given relationship among a set of events). ...
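The three filtering procedures named in this snippet can be sketched as simple predicates. This is a hedged illustration: the event fields, producer names and the correlation relationship are invented for the example and are not taken from HiFi.

```python
def identity_filter(event, producer):
    """Identity-based: pass events from a particular generator."""
    return event["producer"] == producer

def content_filter(event, attribute, is_valid):
    """Content-based: pass events whose attribute holds a valid value."""
    return is_valid(event.get(attribute))

def correlation_filter(events, relation):
    """Correlation-based: pass a set of events satisfying a relationship."""
    return relation(events)

# Usage with hypothetical QoS events:
events = [
    {"producer": "host-1", "type": "loss", "value": 0.02},
    {"producer": "host-2", "type": "jitter", "value": 35},
]
from_host1 = [e for e in events if identity_filter(e, "host-1")]
high_jitter = [
    e for e in events
    if content_filter(e, "value", lambda v: v is not None and v > 30)
]
# Correlation: both a loss and a jitter event occur in the same set.
degraded = correlation_filter(
    events,
    lambda es: any(e["type"] == "loss" for e in es)
    and any(e["type"] == "jitter" for e in es),
)
```

In a real system each kind would be compiled into the agent's internal filter representation rather than applied as ad-hoc list comprehensions, but the three predicate shapes are the essence of the classification.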
Article
Full-text available
Modern interaction systems are usually event-driven. New input devices often require new event types, and handling input from the user becomes increasingly more complex. Frequently, the WIMP (Windows, Icons, Menus, Pointer) paradigm widely used today is not suitable for interactive applications, such as virtual reality applications, that use more than the standard mouse and keyboard input devices. In this paper, we present the design and implementation of the Dynamic Event Model for Interactive System (DEMIS). DEMIS is a middleware between the operating system and the application that supports various input device events while using generic event recognition to detect composite events.
Chapter
The increasing complexity of current software systems is creating the need to find new ways to check the correct functioning of models during runtime. Runtime verification helps ensure that a system is working as expected even after being deployed, which is essential when dealing with systems working in critical or autonomous scenarios. This paper presents an improvement to an existing tool, named CRESCO, linking it with another tool to enable periodic verification based on event logs. These logs help determine whether the functioning of the system has been adequate since the last periodic check. If the system is determined to be working incorrectly, new code files are automatically generated from the traces of the log file, so they can be swapped in when a faulty scenario occurs. Thanks to this improvement, the CRESCO components are able to evaluate their correctness and adapt themselves at runtime, making the system more robust against unforeseen faulty scenarios.
Thesis
Full-text available
Cyber-Physical Systems (CPSs) are embedded computing systems in which computation interacts closely with the physical world through sensors and actuators. CPSs are used to control context-aware systems. These are complex systems that can have different configurations, and their control strategy can be configured depending on the environmental data and the current situation of the context. Therefore, in current industrial environments, the software of embedded and Cyber-Physical systems has to cope with increasing complexity, uncertain scenarios and safety requirements at runtime. The UML State Machine is a powerful formalism to model the logical behaviour of these types of systems, and in Model Driven Engineering (MDE) we can generate code automatically from these models. MDE aims to overcome the complexity of software construction by allowing developers to work with high-level models of software systems instead of low-level code. However, determining and evaluating the runtime behaviour and performance of models of CPSs using commercial MDE tools is a challenging task. Such tools provide little support to observe at model level the execution of the code generated from the model, and to collect the runtime information necessary to, for example, check whether defined safety properties are met. One solution to address these requirements is having the software components' information in model terms at runtime (models@run.time). Work on models@run.time seeks to extend the applicability of models produced in MDE approaches to the runtime environment. Having the model at runtime is the first step towards runtime verification. Runtime verification can be performed using the information of model elements (current state, event, next state, etc.)
This thesis aims at advancing the current practice of automatically generating Unified Modeling Language State Machine (UML-SM) based software components that are able to provide their internal information in model terms at runtime. Regarding automation, we propose a tool-supported methodology to automatically generate these software components. As for runtime monitoring, verification and adaptation, we propose an externalized runtime module that is able to monitor and verify the correctness of the software components based on their internal status in model terms, at component and system level. In addition, if an error is detected, the runtime adaptation module is activated and the safe adaptation process starts in the involved software components. All things considered, the overall safety level of the software components and CPSs is enhanced.
Article
Full-text available
Computerized services are the driving force behind everyday business for many companies; it is of the utmost importance that these services are available during business hours, because downtime costs serious money. Most computerized services today are based on a distributed architecture because of the many benefits of such an architecture. There is a downside to distributed architectures, though: they have an incomplete-observability problem, resulting in tough decision making and difficult control of systems built according to the architecture. This paper describes the design of a business continuity monitoring model, developed to cope with software, hardware and operator failures by reducing the time required to detect, diagnose and repair a problem in a distributed architecture. It is based on a three-tier model combined with five monitoring domains distilled from a standard distributed architecture. A prototype was developed to test the model in a real environment.
Article
Full-text available
The purpose of this paper therefore is to provide conditions under which a decentralized optimization framework is as good as a centralized framework. In particular, we show that there is no loss of quality in the optimal self-management of complex information systems when a decentralized approach is used, and we provide a foundation for the decentralized approach to designing and implementing autonomic systems with self-* properties. Another purpose of our study is to investigate in more detail the interactions between system components at different levels of this hierarchical decentralized framework for optimal self-management. Specifically, we consider a negotiation scheme where additional information is passed between the CM and the AEs in order to significantly increase the efficiency with which the optimization algorithms compute the optimal solution. We then exploit a representative example of our general mathematical framework to investigate other fundamental properties of decentralized optimal self-management in practice, including phase transitions, chaotic behavior, stability and computational complexity.
Article
The objective of this paper is the presentation of the basic concepts of monitoring real-time systems. It starts with a presentation of real-time systems with a special view on real-time data. Special attention is paid to the notion of temporal consistency of real-time data (because these data are observed and collected by monitoring systems), which consists of absolute and relative consistency. Relative consistency is especially important for monitoring systems, because during correlation of collected monitoring data, these systems must be certain that the data are relatively consistent. Otherwise, the gathered monitoring information would not correctly represent the behavior of the monitored system at the intended abstraction level. Another goal of this paper is the presentation of a survey of the real-time monitoring research area. The last section of this paper presents case studies in which different monitoring systems are described, including the monitoring approaches used for monitoring time-triggered systems that are based on the time-triggered architecture.
Conference Paper
Every day, our society becomes more dependent on complex software systems with high availability requirements, such as those present in telecommunications, air traffic control, power plants and distribution lines, among others. In order to facilitate the task of maintaining and evolving such systems, dynamic software architecture infrastructures have recently been on the research agenda. However, the complexity and dynamic evolution of dependable systems bring challenges for verification. Some of these challenges are associated with modifications in the set of properties being verified and also in the types of analysis being performed during system operation. In this work, we present a multiple-specification, architecture-based approach for software monitoring that allows the adaptation of analysis tasks in order to properly handle the challenges mentioned above.
Conference Paper
Full-text available
Performance measurement of large distributed multiagent systems (MAS) offers challenges that must be addressed explicitly in the agent infrastructure. Performance data is widely distributed and voluminous, and poor data collection can impact the operation of the system itself. However, performance metrics are essential to internal system function, e.g., autonomous adaptation to dynamic environments, as well as to external assessment. In this paper we describe the tools, techniques, and results of performance characterization of the Cougaar distributed agent architecture. These techniques include infrastructure instrumentation, plugin-based instrumentation of agents, and dynamic control of metric collection. We introduce multiple redundant "channels" for metric delivery, each serving separate quality of service requirements. We present our techniques for instrumenting the agent society, justify the metrics chosen, and describe the tools developed for collecting these metrics. We also present results from distributed agent societies comprising hundreds of agents.
Conference Paper
Full-text available
Adaptive Distributed Systems (ADSs) are distributed systems that can evolve their behaviors based on changes in their environments. In this work, we discuss security issues and propose security metrics in the context of ADSs. A key premise with adaptation of distributed systems is that in order to detect changes, information must be collected by monitoring the system and its environment. How monitoring should be done, what should be monitored, and the impact monitoring may have on the security mechanism of the target system need to be carefully considered. Conversely, the impact of the implementation of the security mechanism on the adaptation of the distributed system is also assessed. We propose security metrics that can be used to quantify the impact of monitoring on the security mechanism of the target distributed system.
Article
This paper presents two schemes, relevant monitor (RM)-based and improved relevant monitor (IRM)-based, for QoS distribution monitoring. With these schemes, when monitoring a real-time flow, a network manager can locate relevant monitors that are metering the flow. Copyright © 2000 John Wiley & Sons, Ltd. Introduction Providing quality of service (QoS) guarantees is an important requirement for multimedia networks. To maintain agreed QoS, it is not sufficient to just commit resources, because QoS degradation can be caused by many factors and is often unavoidable; e.g. any fault or weakening of the performance of a network element may result in the degradation of contracted QoS. Thus, performance management is required to ensure that the contracted QoS is sustained. 1 To date, there has been a considerable amount of research within the field of QoS management support for multimedia networks, including the service model,
Article
Full-text available
Event filtering is an essential element of event management applications. In event management environments, filtering mechanisms are employed to track the events generated by applications at run-time and perform the corresponding appropriate actions. Several key application domains, such as system and network management, distributed system toolkits, communication protocols and active databases, utilize event filtering for various management purposes. The goal of this paper is to describe the object-oriented design and implementation of an adaptive event filtering framework which can be integrated and reused efficiently to develop event management applications for various domain environments. In our approach, the event filtering framework captures the common components and design patterns of event management in different domains. The major contribution of this work is to provide a flexible event filtering framework that can be efficiently adapted to different domain-specific requirements with minimal development effort. In this paper, we also present examples of using the event filtering framework for developing event management applications in different domains.
Article
Software runtime monitoring has been used to increase the dependability of software. This paper focuses on software runtime monitoring techniques and tools. A generic software runtime monitoring model is presented, consisting of five basic elements: Monitored Object Features, Monitoring Access Methods, Execution Relationships, Runtime Monitor, and Platform Dependencies. The paper characterizes each element by a set of features; based on these features, researchers can use the model to understand and analyze runtime monitoring techniques and tools. The objective is to help researchers and users identify the differences among, and the basic principles of, software runtime monitoring techniques and tools. The paper also maps techniques to features; from this mapping, one can see development trends in the techniques and tools, such as which features receive more attention and which receive less.
Chapter
Full-text available
For users of image management systems, and especially for the user who doesn't know what he wants until he sees it, these systems should be organized in such a way as to support intelligent browsing so that the user will be satisfied in the shortest amount of time. It is our belief that intelligent browsing should be mediated by the standard paradigms of image similarity as well as by an appropriate organization of metadata, including annotations and self-describing image regions.
Technical Report
Full-text available
This memorandum describes RTP, the real-time transport protocol. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers. Most of the text in this memorandum is identical to RFC 1889 which it obsoletes. There are no changes in the packet formats on the wire, only changes to the rules and algorithms governing how the protocol is used. The biggest change is an enhancement to the scalable timer algorithm for calculating when to send RTCP packets in order to minimize transmission in excess of the intended rate when many participants join a session simultaneously.
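A monitor that meters RTP flows starts by decoding the 12-byte fixed header that RFC 3550 (and RFC 1889 before it) defines; the sequence number and timestamp fields are what loss and jitter metering are computed from. A minimal parser:

```python
import struct

def parse_rtp_header(packet: bytes):
    """Parse the 12-byte fixed RTP header defined in RFC 3550 / RFC 1889."""
    if len(packet) < 12:
        raise ValueError("packet shorter than fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # always 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,               # used for loss detection
        "timestamp": ts,               # used for jitter estimation
        "ssrc": ssrc,                  # identifies the flow's source
    }

# A hand-built example packet: version 2, marker set, payload type 96
# (dynamic), sequence 1000, timestamp 160, SSRC 0xDEADBEEF.
pkt = struct.pack("!BBHII", 0x80, 0x80 | 96, 1000, 160, 0xDEADBEEF)
hdr = parse_rtp_header(pkt)
```

RTCP receiver reports then carry the statistics derived from these fields (fraction lost, interarrival jitter) back toward the sender, which is what makes RTP flows observable by third-party monitors in the first place.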
Article
Full-text available
This work deals with some issues concerned in the debugging of concurrent programs. A set of desirable characteristics for a debugger for concurrent languages is deduced from a review of the differences between the debugging of concurrent programs and that of sequential ones. A debugger for a concurrent language, based upon CSP, is then described. The debugger makes it possible to compare a description of the expected program behavior to the actual behavior. The description of the behavior is given in terms of expressions composed by events and/or assertions on the process state. The developed formalism is able to describe behaviors at various levels of abstraction. Lastly, some guidelines for the implementation of the debugger are given and a detailed example of program debugging is analyzed.
Article
Full-text available
Effective management of a local area network (LAN) requires not only a protocol to manage the active entities, but also a means to monitor the LAN channel. This is especially true in shared-channel LANs, such as Ethernet, where the behavior of the LAN as a whole may be impractical to deduce from the states of the individual hosts. Passive monitoring can be done using either a dedicated system or a general-purpose system. Dedicated monitors have been favored for several reasons, but recent workstations, when carefully programmed, are sufficiently powerful to serve this function. Using a workstation offers high-performance graphics and a more flexible environment for collecting and presenting LAN behavior.
Conference Paper
Full-text available
Network management systems built on a client/server model centralize responsibilities in client manager processes, with server agents playing restrictive support roles. As a result, managers must micro-manage agents through primitive steps, resulting in ineffective distribution of management responsibilities, failure-prone management bottlenecks, and limitations for real time responsiveness. We present a more flexible paradigm, the Manager-Agent Delegation (MAD) framework. It supports the ability to extend the functionality of servers (agents) at execution time, allowing flexible distribution of management responsibilities in a distributed environment. MAD can store and instantiate delegated scripts, and provides a concurrent runtime environment, where they can execute asynchronously without requiring the manager's intervention. A delegation protocol allows a manager to transfer programs, create process instances, and control their execution. We describe the delegation model, its application to network management, and the design of a prototype implementation.
Conference Paper
Full-text available
This paper describes a novel approach to event correlation in networks based on coding techniques. Observable symptom events are viewed as a code that identifies the problems that caused them; correlation is performed by decoding the set of observed symptoms. The coding approach has been implemented in the SMARTS Event Management System (SEMS), a server running under Sun Solaris 2.3. Preliminary benchmarks of the SEMS demonstrate that the coding approach provides a speedup of at least two orders of magnitude over other published correlation systems. In addition, it is resilient to high rates of symptom loss and false alarms. Finally, the coding approach scales well to very large domains involving thousands of problems.
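The decoding idea can be sketched in a few lines (the codebook below is invented for illustration, not taken from SEMS): each problem is "coded" by the set of symptoms it produces, and diagnosis picks the problem whose code is closest to the observed symptom set, which is what gives resilience to lost symptoms and false alarms.

```python
# Hypothetical codebook: problem -> set of symptoms it produces.
# Decoding = nearest code under Hamming distance (symmetric difference),
# so a missing or spurious symptom shifts the distance by only 1.

CODEBOOK = {
    "link_failure": {"s1", "s2"},
    "router_crash": {"s2", "s3", "s4"},
    "config_error": {"s4"},
}

def decode(observed):
    """Return the problem whose symptom code is closest to `observed`."""
    def distance(code):
        return len(code ^ observed)   # symmetric difference
    return min(CODEBOOK, key=lambda p: distance(CODEBOOK[p]))

# Symptom "s4" was lost in transit; {"s2", "s3"} still decodes correctly.
diagnosis = decode({"s2", "s3"})
```

Here the observed set is at distance 1 from the router_crash code but at distance 2 or more from every other code, so the diagnosis survives the lost symptom.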
Conference Paper
Full-text available
This paper describes a pattern-based approach to building packet classifiers. One novelty of the approach is that it can be implemented efficiently in both software and hardware. A performance study shows that the software implementation is about twice as fast as existing mechanisms, and that the hardware implementation is currently able to keep up with OC-12 (622 Mbps) network links and is likely to operate at gigabit speeds in the near future.
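The software/hardware duality comes from the shape of the patterns themselves. A rough sketch of the idea (the cell format here is an assumption for illustration, not the paper's exact representation): each pattern is a list of (offset, mask, value) cells compared against raw packet bytes, which maps equally well onto a software loop or parallel hardware comparators.

```python
# Sketch of a pattern-based classifier (cell format hypothetical):
# a pattern is a list of (offset, mask, value) cells over packet bytes.

class Pattern:
    def __init__(self, name, cells):
        self.name = name
        self.cells = cells            # [(offset, mask, value), ...]

    def matches(self, packet: bytes) -> bool:
        return all(
            off < len(packet) and (packet[off] & mask) == value
            for off, mask, value in self.cells
        )

def classify(packet, patterns):
    """Return the name of the first matching pattern, else 'default'."""
    for p in patterns:
        if p.matches(packet):
            return p.name
    return "default"

# IPv4 version nibble (0x4-) at offset 0, IP protocol field at offset 9:
# 6 = TCP, 17 = UDP.
patterns = [
    Pattern("ipv4_tcp", [(0, 0xF0, 0x40), (9, 0xFF, 6)]),
    Pattern("ipv4_udp", [(0, 0xF0, 0x40), (9, 0xFF, 17)]),
]
pkt = bytes([0x45] + [0] * 8 + [6] + [0] * 10)   # minimal fake IPv4/TCP header
label = classify(pkt, patterns)
```

Because every cell is a masked byte comparison at a fixed offset, a hardware version can evaluate all cells of all patterns in parallel, which is what lets such classifiers track line rate.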
Article
Full-text available
In this article we describe IRI and the lessons learned deploying it. We first deployed IRI to teach a fall 1995 graduate course in software metrics. We then evaluated it in terms of logistics, reliability, performance, and usability, performing off-line experiments to try out new features and to develop protocols that would improve IRI use. We subsequently reengineered IRI into an open architecture with a published specification as version 1.0, which became available just recently (http://www.cs.odu.edu/tele/iri). We used version 1.0 to teach a junior-level software engineering course in the fall 1996 semester.
Article
Full-text available
The monitoring of distributed systems involves the collection, interpretation, and display of information concerning the interactions among concurrently executing processes. This information and its display can support the debugging, testing, performance evaluation, and dynamic documentation of distributed systems. General problems associated with monitoring are outlined in this paper, and the architecture of a general purpose, extensible, distributed monitoring system is presented. Three approaches to the display of process interactions are described: textual traces, animated graphical traces, and a combination of aspects of the textual and graphical approaches. The roles that each of these approaches fulfill in monitoring and debugging distributed systems are identified and compared. Monitoring tools for collecting communication statistics, detecting deadlock, controlling the non-deterministic execution of distributed systems, and for using protocol specifications in monitoring are also described. Our discussion is based on experience in the development and use of a monitoring system within a distributed programming environment called Jade. Jade was developed within the Computer Science Department of the University of Calgary and is now being used to support teaching and research at a number of university and research organizations.
Article
Full-text available
This paper describes scalable reliable multicast (SRM), a reliable multicast framework for light-weight sessions and application level framing. The algorithms of this framework are efficient, robust, and scale well to both very large networks and very large sessions. The SRM framework has been prototyped in wb, a distributed whiteboard application, which has been used on a global scale with sessions ranging from a few to a few hundred participants. The paper describes the principles that have guided the SRM design, including the IP multicast group delivery model, an end-to-end, receiver-based model of reliability, and the application level framing protocol model. As with unicast communications, the performance of a reliable multicast delivery algorithm depends on the underlying topology and operational environment. We investigate that dependence via analysis and simulation, and demonstrate an adaptive algorithm that uses the results of previous loss recovery events to adapt the control parameters used for future loss recovery. With the adaptive algorithm, our reliable multicast delivery algorithm provides good performance over a wide range of underlying topologies.
Article
Full-text available
Object-oriented programming is as much a different way of designing programs as it is a different way of designing programming languages. This paper describes what it is like to design systems in Smalltalk. In particular, since a major motivation for object-oriented programming is software reuse, this paper describes how classes are developed so that they will be reusable.
Article
Full-text available
The main problems associated with debugging concurrent programs are increased complexity, the 'probe effect', nonrepeatability, and the lack of a synchronized global clock. The probe effect refers to the fact that any attempt to observe the behavior of a distributed system may change the behavior of that system. For some parallel programs, different executions with the same data will result in different results even without any attempt to observe the behavior. Even when the behavior can be observed, in many systems the lack of a synchronized global clock makes the results of the observation difficult to interpret. This paper discusses these and other problems related to debugging concurrent programs and presents a survey of current techniques used in debugging concurrent programs. Systems using three general techniques are described: traditional or breakpoint style debuggers, event monitoring systems, and static analysis systems. In addition, techniques for limiting, organizing, and displaying a large amount of data produced by the debugging systems are discussed.
Article
This work discusses some issues in the debugging of concurrent programs. A set of desirable characteristics of a debugger for concurrent languages is deduced from an examination of the differences between the debugging of concurrent programs and that of sequential ones. A debugger for a concurrent language derived from CSP is then presented. It is based upon a semantic model of the supported language. The debugger makes it possible to compare a description of the program behaviour to the actual behaviour, as well as to evaluate assertions on the process state. The description of the behaviour is given in a formalism whose semantics is also specified. The formalism can specify program behaviours at various abstraction levels. Lastly, some guidelines for the implementation of the debugger are given and a detailed example of program description is analyzed.
Article
The concept of a trigger is central to any active database. Upon the occurrence of a trigger event, the trigger is "fired", i.e., the trigger action is executed. We describe a model and a language for specifying basic and composite trigger events in the context of an object-oriented database. The specified events can be detected efficiently using finite automata. We integrate our model with O++, the database programming language for the Ode object database being developed at AT&T Bell Labs. We propose a new Event-Action model, which folds into the event specification the condition part of the well-known Event-Condition-Action model and avoids the multiple coupling modes between the event, condition, and action trigger components.
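Detection by finite automata can be sketched briefly (the class below is an illustrative assumption, not the paper's mechanism): a composite event such as "E1 followed by E2" becomes an automaton whose state records how much of the sequence has been seen, advancing one step per matching primitive event.

```python
# Hypothetical sketch: detecting the composite event "deposit followed
# by withdraw" with a small automaton. The state is simply the index of
# the next primitive event awaited; other events are ignored.

class SequenceDetector:
    """Fires when the events in `sequence` occur in order, possibly
    interleaved with unrelated events."""
    def __init__(self, sequence):
        self.sequence = sequence
        self.state = 0                # index of the next awaited event

    def feed(self, event):
        if event == self.sequence[self.state]:
            self.state += 1
            if self.state == len(self.sequence):
                self.state = 0        # reset; composite event fired
                return True
        return False

det = SequenceDetector(["deposit", "withdraw"])
fired = [e for e in ["login", "deposit", "balance", "withdraw"]
         if det.feed(e)]
```

The automaton fires exactly once here, on the final "withdraw". Because each primitive event advances the automaton by at most one transition, detection cost is constant per event regardless of how complex the composite expression is.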
Article
This paper describes the architecture of the interactive debugging system DISDEB, which is intended to debug programs on a multi-microprocessor system constituting a node of the Selenia Mara architecture. DISDEB requires neither changes in or additions to the code produced by the compiler nor heavy modifications to the operating system kernel. Moreover, the use of ad hoc hardware provided with autonomous processing power allows the user to monitor and control the execution of both concurrent and distributed processes and their interactions, while, in most cases, maintaining the real-time operation of the target Mara system.
Book
This book considers such algorithms as the least mean-square (LMS) algorithm, different versions of the Kalman algorithm, the recursive least squares (RLS) algorithm, the fast transversal filters (FTF) algorithm, the exact least squares lattice (LSL) algorithm, and the recursive-QR decomposition-LS algorithm; offers thorough discussion of the Wiener and Kalman filter theories and considers the structures of transversal filter, lattice predictor, and systolic array; and explores such applications as adaptive prediction, adaptive equalization, system identification, analysis of superimposed sinusoids in noise, adaptive detection, and adaptive beam forming.
Article
This paper concerns the design of a flexible and efficient packet monitoring program for analyzing traffic patterns and gathering statistics on a packet network. This monitor operates in real time, using an analyzer which is an interpretive pseudo-machine driving object-oriented data collection programs. The pseudo-program for the interpreter is “compiled” from configuration commands written in a monitoring control language.
Article
This symbolic run-time debugger for Ada provides facilities for observing and manipulating the execution of a monitored program, including its concurrent aspects. The debugger can be used interactively, and also as a monitoring program to control the application. A feature of this project is the use of relational algebra for defining compiler and kernel interfaces and for handling debugger information. The implementation is based on an Ada task to interface with the debugging operator and a set of user-defined Ada monitoring tasks. A prototype of the debugger was completed as a part of ART, a relational translator and interpreter for Ada.
Article
In this paper we describe the design and implementation of an integrated monitoring and debugging system for a distributed real-time computer system. The monitor provides continuous, transparent monitoring capabilities throughout a real-time system's lifecycle with bounded, minimal, predictable interference by using software support. The monitor is flexible enough to observe both high-level events that are operating system- and application-specific, as well as low-level events such as shared variable references. We present a novel approach to monitoring shared variable references that provides transparent monitoring with low overhead. The monitor is designed to support tasks such as debugging real-time applications, aiding real-time task scheduling, and measuring system performance. Since debugging distributed real-time applications is particularly difficult, we describe how the monitor can be used to debug distributed and parallel applications by deterministic execution replay.
Article
Making a database system active to meet the requirements of a wide range of applications entails developing an expressive event specification language and its implementation. Extant systems support mostly database events and in some cases a few predefined events.This paper discusses an event specification language (termed Snoop) for active databases. We define an event, distinguish between events and conditions, classify events into a class hierarchy, identify primitive events, and introduce a small number of event operators for constructing composite (or complex) events. Snoop supports temporal, explicit, and composite events in addition to the traditional database events. The novel aspect of our work lies not only in supporting a rich set of events and event expressions, but also in the notion of parameter contexts. Essentially, parameter contexts augment the semantics of composite events for computing their parameters. For concreteness, we present parameter computation for the relational model. Finally, we show how a contingency plan that includes time constraints can be supported without stepping outside of the framework proposed in this paper.
Conference Paper
A method is described for actively interfacing an Object-Oriented Database Management System (OODBMS) to application programs. The method, called a database monitor, observes how values of derived or stored attributes of database objects change over time. Whenever such a value change is observed, the OODBMS invokes tracking procedures within running application programs. The OODBMS associates tracking procedures with the object attributes they monitor, and it invokes the appropriate tracking procedures when data changes. Use is made of atomic transactions in the OODBMS. The applicability of monitors is localized both in time and space, so that only a minimal amount of data is monitored during as short a time as possible. Such localization reduces the frequency of tracking procedure invocation, makes it easy to add and remove monitors dynamically, and permits efficient implementation. To demonstrate these ideas, an implementation is described for the Iris OODBMS. The implementation uses a technique of partial view materialization for efficiency.
Conference Paper
The paper describes a prototype event correlation application developed at the IBM European Networking Center. In heterogeneous networks with integrated network management systems the symptoms of the failure of a single network resource could be detected and reported independently by many different system components. As a result, a single network failure triggers numerous event reports, with no indication which one of them (if any) reports an actual failure. The application described here assists in finding the actual failure by analyzing and structuring such event reports. The analysis consists of correlation of reports resulting from the same failure and ordering correlated reports to indicate the resource where the failure has occurred with a high probability, thereby decreasing the complexity of the operator task. It relies on a homogeneous model of interconnected, heterogeneous networks and explores relationships among physical and logical network resources to perform its task. The application uses the OSI standardized management framework and management communication protocol, and OSI-based managed objects.
Conference Paper
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. Monitoring events generated by LDM systems is necessary for observing the runtime behavior of LDM systems and providing status information required for managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application environment. Furthermore, different media streams in …
Article
Event Based Behavioral Abstraction (EBBA) is a high-level debugging approach which treats debugging as a process of creating models of actual behavior from the activity of the system and comparing these to models of expected system behavior. The differences between the actual and expected models are used to characterize erroneous system behavior and direct further investigation. A set of EBBA-based tools has been implemented that users can employ to construct libraries of behavior models and investigate the behavior of an errorful system through these models. EBBA evolves naturally as a cooperative distributed program that can take better advantage of computational power available in a network computer system to enhance debugging tool transparency, reduce latency and uncertainty for fundamental debugging activities and accommodate diverse, heterogeneous architectures.
Article
There is a pressing need for network management systems capable of handling faults. In this paper, we propose to use a set of independent observers to detect faults in communication systems that are modeled by finite-state machines. An algorithm for constructing these observers and a fast real-time fault detection mechanism used by each observer are given. Since these observers run in parallel and independently, one immediate benefit is that of graceful degradation-one failed observer will not cause collapse of the fault management system. In addition, each observer has a simpler structure than the original system and can be operated at higher speed. This approach has the potential to be incorporated into the fault management system for a high-speed communication system.
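The observer idea can be sketched compactly (the transition table below is an invented example, not from the paper): each observer replays the observed event sequence against a finite-state model of the system and reports a fault as soon as an event is impossible in the current state.

```python
# Sketch of one independent observer over a hypothetical protocol FSM.
# The observer flags a fault the moment an observed event has no legal
# transition from the current state.

TRANSITIONS = {                        # (state, event) -> next state
    ("idle", "connect"): "open",
    ("open", "data"): "open",
    ("open", "close"): "idle",
}

def observe(events, start="idle"):
    """Replay `events`; return ('fault', index) or ('ok', final_state)."""
    state = start
    for i, ev in enumerate(events):
        nxt = TRANSITIONS.get((state, ev))
        if nxt is None:
            return ("fault", i)        # event ev is illegal in this state
        state = nxt
    return ("ok", state)

# "data" after "close" is illegal: the connection is already idle.
verdict = observe(["connect", "data", "close", "data"])
```

Because each observer only needs its own (typically much smaller) transition table, several observers can run in parallel and independently, which is what yields the graceful degradation the paper highlights: one failed observer leaves the others' detection intact.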
Article
The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
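The clock-synchronization rules from this paper are simple enough to state directly in code: increment the local counter on every local event or send, and on receipt set the counter to one more than the maximum of the local value and the timestamp carried by the message.

```python
# Minimal Lamport logical clock, following the happened-before rules:
# tick on local/send events; on receive, take max(local, msg) + 1.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event or a send; return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Advance past both the local clock and the message's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()            # process a sends a message at time 1
b.tick()                     # process b has an unrelated local event
t_recv = b.receive(t_send)   # b's clock jumps to max(1, 1) + 1 = 2
```

The guarantee is one-directional: if event x happened before event y, then clock(x) < clock(y); ties between causally unrelated events are broken by an arbitrary total order (e.g., process IDs), which is all the paper's mutual-exclusion example needs.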
Article
In debugging distributed programs a distinction is made between an observed error and the program fault, or bug, that caused the error. Testing reveals an error; debugging is the process of tracing the error through time and space to the bug that caused it. A program is considered to be in error when some state of computation violates a safety requirement of the program. Expressing safety requirements in such a way that a computation can be monitored for safe behavior is thus a basic preliminary step in the testing-debugging cycle. Safety requirements are usually expressed as predicates. When a state of the computation violates such a safety predicate, that state can be said to be in error. A predicate logic is proposed that permits the specification of relationships between distributed predicates. This increases the scope and precision of situation-specific conditions that can be specified and detected. It also permits the specification of safety primitives such as P unless Q using distributed predicates. Thus a distributed program can be directly monitored for satisfaction and violation of safety requirements. Breakpoint conditions and predicates expressing safety may hold over a number of states of a program. A breakpoint state is meaningful if the causal relationships of events included in the breakpoint are unambiguous. At least two such states exist for each condition: the minimal and the maximal prefix of the computation at which the predicate holds. These states are specifiable as part of a breakpoint definition in the logic presented.
Article
A framework is a generic application that allows the creation of different applications from an application domain. Due to the inherent flexibility and variability of a framework, framework design is much more complex than application design. Experience shows that the complexity of framework design is reduced by separating clearly different issues: the design of a class model for an application from the framework domain; the analysis and specification of the domain variability and flexibility; and its stepwise implementation by a sequence of generalizing transformations. Since application design is a well-known activity, the article will concentrate on the specification of the variable aspects, on the design of a local class structure that provides each with the required variability, and on how to transform a class structure for generalization. When developing a framework, don't plan all design activities in one development cycle. Framework development should be based on experience, nobody will develop a useful framework from scratch in one development cycle. Therefore, the design activities should be distributed over different development cycles.