Hon Fung Li's research while affiliated with Concordia University Montreal and other places

Publications (33)

Article
In this paper, we introduce SPM (Software-built Parallel Machines), a model to create software-based virtual parallel machines. With SPM, an application developer simply selects all the required virtual parallel machines from the repository and implements the intended parallel algorithms directly without any need of complex mappings, as if the...
Conference Paper
This paper explores the use of locality of dependencies in large-scale distributed systems towards developing efficient checkpoint strategies. Dependencies among processes evolve into message interactions, which often spread and affect recovery dependencies and logging requirements. On the other hand, message interactions are usually localized with...
Article
Distributed multi-agent systems are usually large-scale, involving a large number of agents and messages. Existing checkpoint and recovery strategies are not quite favorable to such systems due to either global recovery spread or runtime logging overhead associated with these strategies. This paper presents our work on the design of correct and eff...
Chapter
We avoid state explosion in model checking of delay-insensitive VLSI systems by not using states. Systems are networks of communicating finite-state nonsequential processes with well-behaved nondeterministic choice. A specification strategy based on partial orders allows precise description of the branching and recurrence structure of processes. Pr...
Conference Paper
Complexity of parallel application development has been one of the major obstacles towards the mainstream adoption of parallel programming. In order to hide some of these complexities, researchers have been actively investigating the pattern-based approaches to parallel programming. As reusable components, patterns are intended to ease the design a...
Conference Paper
In spite of the advent of high performance parallel computers and commodity clusters, complexity of parallel application development remains one of the major obstacles towards the mainstream adoption of parallel computing. Researchers are constantly investigating different approaches to reduce parallel application development time and increase pro...
Conference Paper
With the advent of hardware technologies, high-performance parallel computers and commodity clusters are becoming affordable. However, complexity of parallel application development remains one of the major obstacles towards the mainstream adoption of parallel computing. As one of the solution techniques, researchers are actively investigating the...
Conference Paper
Application of pattern-based approaches to parallel programming is an active area of research today. The main objective of pattern-based approaches to parallel programming is to facilitate the reuse of frequently occurring structures for parallelism whereby a user supplies mostly the application specific code-components and the programming environm...
Conference Paper
FATMAD is a fault-tolerant multi-agent development framework that is built on top of a mobile agent platform (Jade). FATMAD aims to satisfy the needs of two communities of users: Jade application developers and fault-tolerant protocol developers. Application-level fault tolerance incurs significant development-time cost. FATMAD is based on a generi...
Conference Paper
The main objective of all design-patterns based approaches to parallel programming is to facilitate the reuse of frequently occurring structures and behaviors of parallelism, which are traditionally known as architectural and algorithmic skeletons respectively. In order to improve the flexibility and extensibility aspects of a specific skeleton-bas...
Article
Program Slicing is a well-known decomposition technique that transforms a large program into a smaller one that contains only statements relevant to the computation of a selected function. In this paper, we present two novel predicate-based dynamic slicing algorithms for message passing programs. Unlike more traditional slicing criteria that focus...
Conference Paper
There has been a significant amount of research activity on the programmability aspects of distributed software. However, not much thought has been given to post-deployment monitoring of distributed software. In this paper, we describe DAMon, an infrastructure for monitoring distributed applications. DAMon is an event-dri...
Article
Numerous shared memory consistency models have appeared for the purpose of obtaining better shared memory parallel computers, ones which suffer less from long memory latency. This paper uses the primitive notion of program-order and value-order to define global view. Using this as a seed, various consistency models evolve and form hierarchies of mo...
Article
Program Slicing is a well-known decomposition technique that transforms a large program into a smaller one that contains only statements relevant to the computation of a selected function. In this paper, we present a novel predicate-based dynamic slicing algorithm for message passing programs. Unlike the more traditional slicing criteria that focus...
Article
This paper addresses the problems of state space decomposition and predicate detection in a distributed computation involving asynchronous messages. We introduce a natural communication dependency which leads to the definition of the communication graph. This abstraction proves to be a useful tool to decompose the state lattice of a distributed com...
Conference Paper
Program slicing is a well-known decomposition technique that transforms a large program into a smaller one that contains only statements relevant to the computation of a selected function. We present a novel predicate-based dynamic slicing algorithm for message passing programs. Unlike the more traditional slicing criteria that focus only on the pa...
Article
Cause-and-effect precedence can be established between the reading of a value written by another process. Together with the local program order, the local view of a process is defined. The local views of all processes can be composed together to form the global view of a DSM. We propose a hierarchical set of augmentation rules to be applied to the glob...
Chapter
Technology, architecture, and application trends are clear for all to see. Many have observed the increasing complexity of hardware designs, and the presence of higher-level functionality on chip. Design (i.e. product) cycles have become much shorter. Both for reasons of avoiding catastrophic failure, and because the cost of fabrication is so high,...
Book
CHARME '97 is the ninth in a series of working conferences devoted to the development and use of formal techniques in digital hardware design and verification. This series is held in collaboration with IFIP WG 10.5. Previous meetings were held in Europe every other year.
Conference Paper
Timed behavior automata are finite-state generators of timed behaviors, which are infinite timing-constrained pomsets of system events. Automatic verification is not showing inclusion of infinitary timed string languages. Rather, model checking starts by linking specification mirror and implementation network; verification is showing satisfaction o...
Article
A shape matching technique based on the straight line Hough transform (SLHT) is presented. In the θ-ρ space, the transform can be expressed as the sum of the translation term and the intrinsic term. This formulation allows the translation, rotation, and intrinsic parameters of the curve to be easily decoupled. A shape signature, called the scalable...
Conference Paper
Timed behavior automata allow surprisingly efficient model checking of delay-constrained reactive systems when partial-order methods for delay-insensitive systems are adapted for real time. The complexity of timing verification is a sensitive function of the precise abstraction of real time used in the model. Untimed behavior automata [14] are modi...
Conference Paper
Practicing verifiers of finite-state concurrent systems should be able to adapt our partial-order methods for verifying delay-insensitive systems to other verification problems. We answer the question, is it possible to control state explosion arising from various sources during automatic verification (model checking) of delay-insensitive systems?...
Article
Several designs are presented for VLSI dictionary machines that combine both a linear (modify) network and a logarithmic (query) network with a novel idea for separation of concerns. The initial design objectives included: (1) single-cycle operability of host-issued modify and query commands (no compress instructions), (2) complete processor utiliz...
Article
The problem of updating the simulation clock in distributed discrete event simulation is investigated. Three objectives for efficiently updating the simulation clock in processes modeled by the strongly connected components of a process graph are identified. An optimal solution is proposed for one of the objectives. The optimization problem for ano...
Chapter
There has been considerable recent interest in concurrency modelling of delay-insensitive VLSI systems, which abandon global clocks and rely on communicating asynchronous circuits. We present an approach based on partial-order semantics for abstract specification and composition of reactive hardware processes, and proving the correctness of network...
Article
Systolic algorithms suitable for VLSI implementation for recognizing handwritten characters using shape features are presented. Local shape features, namely start and end points, edge types and their join-relations in the contours of a given character, are first extracted using a systolic algorithm. The global features consisting of the actual sequ...
Article
Presents a critical study of two approaches, the classical RC-cut approach and H.T. Kung and M.S. Lam's (Proc. 1984 MIT Conf. Advanced Res. VLSI p.74-83, 1984) RCS-cut approach, for reconfiguring faulty systolic arrays. The amount of cell (processing element) redundancy needed to ensure successful reconfiguration into an n × n array is considered....
Article
A combined methodology is presented for specifying abstract synchronous data types and proving the correctness of systolic network implementations. It is shown that an extension of the Parnas trace method of specifying software modules containing distinct access programs yields a natural method of specifying abstract synchronous data types that pos...
Article
Backward error recovery or rollback recovery is a well-known technique used in the design of reliable uni-processor computer systems. However, use of this technique for design of reliable distributed computer systems could result in uncontrolled rollback which is known as the domino effect. Here, we formally present the reasons for the domino effec...

Citations

... Following the notations used in [54,55], a run (execution) of a program is represented by a partially ordered multi-set (POMSET) <E, A, D, L> such that: E = set of events; A = set of actions (program statements); D = (control/data/communication) dependency ordering among the events; and L = labeling function: E → A. These definitions are illustrated with the help of the sample program in Figure 1. It is assumed that blocking receive is used. ...
... As part of their extension of the SDG, Larson and Harrold [12] used object-oriented programs to illustrate the SDG. For every class in an object-oriented program, they created Class Dependence Graphs (ClDGs). ...
... On one hand, Petri networks [14, 13] and their interpretation called signal transition graphs (STG) [16, 1] have provided powerful representations of partial orders and causal relations in logic circuits. On the other hand, trace semantics, communicating sequential processes (CSP) [6], and related approaches [22, 24, 2, 11, 15] are preferred in modelling complex systems. A formalism sharing both qualities would be particularly helpful for efficiently simulating and implementing causal relations in asynchronous circuits of high complexity. ...
... These distributed computing architectures require efficient fault-tolerance techniques for improving service availability in case of sequential node failures [5,7,9,18,22]. For this purpose, checkpointing and message logging may satisfy the requirement with much less computing resources during failure-free operation than process replication [12]. ...
... To handle drops and delays, we take inspiration from subsequent work (e.g., Li et al. [114]) and classical network assumptions. While snapshot protocols exist for other, more relaxed system models, they typically require massive storage, delay messages, or limit the gathered state to packet/byte counts. ...
... Furthermore, the reachability problem has been explored by Schröter and Esparza in [52]. Probst and Li in [44,53] relied on "pomtrees" (i.e., a form of partial-orders) to verify delay-insensitive VLSI systems using what is called "behavior machines". ...
... It makes use of bash and Python scripts, and communicates with the Z3 solver via files and command line. Our benchmark is the well-known dining philosophers problem [41] and its extension with clocks [48], slightly modified to give some agents a choice in selecting actions. The system consists of n = 2p + 1 agents given by timed automata, where the agents from 1 to p model philosophers (see Fig. 1), the subsequent p agents stand for the consecutive forks, and the last agent is the lackey who coordinates the philosophers' access to the dining room (see Fig. 2). ...
... They used a simulator to test their algorithm. Wei et al. [18] offered a purely theoretical framework to combine checkpointing in atomic subgroups of tasks with message logging. They based their algorithm on a checkpoint dependency graph (CDG) and formally demonstrated different properties of this technique. ...
... Recent research efforts demonstrate that parallel computation of the views is the most effective solution [6, 11, 19]. During our previous research on parallel computation for clusters [2, 3, 4, 5, 6], we have found that data retrieval time is still the most critical factor for the applications described above. The retrieval time determines how fast the face recognition system can work or how frequently the OLAP application can revise a view from the dynamic raw data. ...