Article

Threads primer: a guide to multi-threaded programming

... There are a huge number of parallel programming models and languages taking different approaches [33]. Some of them, such as the Message Passing Interface (MPI) [38], arrange intercommunication via messages sent to and received from other threads, while most of them use some kind of shared memory for exchanging data between threads [39,40]. They differ in how this shared memory is presented to threads, how exclusive and concurrent data access is handled, how synchronization is organized, and how race conditions and deadlocks are avoided [5,6,23,41]. ...
... POSIX threads (Pthreads) is a set of C language interfaces (functions, header files) for threaded programming [39]. It allows a program to control multiple different threads of computational work that overlap in time. ...
... Synchronization can be done via mutual exclusion locks, semaphores, join functions and barriers, which can be implemented with a set of corresponding provided functions. Threads in different processes can be synchronized via synchronization variables in shared memory [39]. In a typical SMP system, such as the Apple MacBook Pro and Apple iMac Pro running the Apple macOS operating system used in this paper, threads are periodically assigned to the processor cores with the least amount of work. ...
Article
Full-text available
Commercial multicore central processing units (CPUs) integrate a number of processor cores on a single chip to support parallel execution of computational tasks. Multicore CPUs can improve performance over single cores for independent parallel tasks nearly linearly as long as sufficient bandwidth is available. Ideal speedup is, however, difficult to achieve when dense intercommunication between the cores or complex memory access patterns are required. This is caused by expensive synchronization and thread switching, and insufficient latency tolerance. These facts guide programmers away from straightforward parallel processing patterns toward complex and error-prone programming techniques. To address these problems, we have introduced the Thick control flow (TCF) Processor Architecture. TCF is an abstraction of parallel computation that combines self-similar threads into computational entities. In this paper, we compare the performance and programmability of an entry-level TCF processor and two Intel Skylake multicore CPUs on commonly used parallel kernels to find out how well our architecture solves these issues that greatly reduce the productivity of parallel software development. Code examples are given and programming experiences recorded.
... Communication is achieved by explicit calls to functions for sending and receiving messages; some programming languages support these operations through special syntax. Synchronization between subtasks is achieved as a part of communication, by using blocking send and receive, or by using barriers [39]. Communication in a message-passing application may be visualized by a graph, with vertices representing tasks and (directed) edges representing communication between the tasks. ...
... We focus on processes and threads, which are used for exploiting multiple CPUs, scheduling, communication and synchronization mechanisms. Our presentation synthesizes the material that can be found in textbooks on operating systems [5] or in reference materials such as [67,39,68]. ...
... Upon receipt of a message, a worker calls the burn_cycles function, shown in figure 5.7, with argument wT0, to consume the specified amount of CPU time, and then it sends a reply message back to p0; the T0 constant has been explained in section 5.2.1.4. In the meantime, p0 waits to receive a reply from all workers, and then begins the next round of distributing work to workers, thus acting as a barrier [39]. The total workload executed by the workers is W = nmw = nmT/d CPU seconds. ...
Article
The appearance of commodity multi-core processors has spawned a wide interest in parallel programming, which is widely regarded as more challenging than sequential programming. KPNs are a model of concurrency that relies exclusively on message passing, and that has some advantages over parallel programming tools in wide use today: simplicity, graphical representation, and determinism. Because of determinism, it is possible to reliably reproduce faults, an otherwise notoriously difficult problem with parallel programs. KPNs have gained acceptance in the simulation and signal-processing communities. In this thesis, we investigate the applicability of KPNs to implementing general-purpose parallel computations for multi-core machines. In particular, we investigate 1) how KPNs can be used for modeling general-purpose problems; 2) how an efficient KPN run-time can be implemented; 3) what KPN scheduling strategies give good run-time performance. For these purposes, we have developed Nornir, an efficient run-time system for executing KPNs. With Nornir, we show that it is possible to develop a high-performance KPN run-time for multi-core machines. We experimentally demonstrate that problems expressed in the Kahn model resemble very much their sequential implementations, yet perform much better than when expressed in the MapReduce model, which has become widely recognized as a simple parallel programming model. Lastly, we use Nornir to evaluate several load-balancing methods: static assignment, work-stealing, our improvement of work-stealing, and a method based on graph partitioning. The understanding brought by this evaluation is significant not only in the context of the Kahn model, but also in the more general context of load-balancing (potentially distributed) applications written in message-passing style.
... For instance, one would need methods to resolve possible conflicts due to simultaneous modifications by different processors of the data stored in the same memory location. This problem is usually solved using mutual exclusion locks (see Lewis and Berg, 1996, for more details on this topic). There is however generally no need to explicitly resolve conflicts when processors are accessing memory locations for reading purposes only, as computer systems typically contain built-in switching circuits that automatically resolve such conflicts. ...
... The development of codes on shared-memory machines is usually done using multi-threading techniques (see Lewis and Berg, 1996, for more details on this topic). Many processors can simultaneously execute threads, which have access to a single copy of the algorithm code and data. ...
... Access to joint data for writing purposes must be done in a way that avoids simultaneous access conflicts and preserves the order of data updating as specified in the algorithm. Shared-memory implementations of this paper were developed using the Solaris Multithreading (MT) library (Lewis and Berg, 1996). Parallel codes for distributed memory machines are a collection of sequential processes that communicate with each other. ...
Article
The development of intelligent transportation systems (ITS) and the resulting need for the solution of a variety of dynamic traffic network models and management problems require faster-than-real-time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete time dynamic networks from all nodes and all departure times to one destination node. The algorithm is known as algorithm DOT and has an optimal worst-case running-time complexity. This implies that no algorithm with a better worst-case computational complexity can be discovered. Consequently, in order to derive algorithms to solve all-to-one shortest path problems in dynamic networks, one would need to explore avenues other than the design of sequential solution algorithms only. The use of commercially available high-performance computing platforms to develop parallel implementations of sequential algorithms is an example of such an avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared-memory and two message-passing dynamic shortest path algorithm implementations, which are derived from algorithm DOT using the following parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded using two types of parallel computing environments: a message-passing environment based on the parallel virtual machine (PVM) library and a multi-threading environment based on the SUN Microsystems Multi-Threads (MT) library. We also develop a time-based parallel version of algorithm DOT for the case of minimum time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an 'ideal' theoretical parallel machine.
The performance of the implementations is analyzed and evaluated using large transportation networks and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared-memory machine containing eight processors. Satisfactory speed-ups over the running time of the sequential algorithms are achieved, in particular on shared-memory machines. Numerical results indicate that shared-memory computers constitute the most appropriate type of parallel computing platform for the computation of dynamic shortest paths for real-time ITS applications.
... A programming technique well suited to implementing irregular applications is based on networks of communicating lightweight processes. A lightweight process (or "thread") [107,103] is an abstraction for expressing multiprogramming (i.e., multiple independent flows of execution) within a system process. The qualifier "lightweight" comes from the fact that managing these entities has a very low cost compared to system processes. ...
... On shared-memory architectures, parallel programming is often based on multithreading techniques [107,108,103,30]. In this model, the parallel program is composed of multiple concurrent flows of execution manipulating data placed in a common memory. ...
Article
Numerical simulation applications requiring the resolution of Partial Differential Equation (PDE) problems are often parallelized using domain decomposition methods. These mathematical methods are well adapted to parallel computing; however, their effective exploitation on parallel machines becomes difficult when the applications have an irregular behavior. This is the case, for example, when the mathematical problems are solved over complex geometries or when one uses mesh refinement techniques. A programming technique that is useful to cope with irregular parallel applications is multithreading. In this thesis we perform a thorough study of the use of this programming paradigm for solving PDE problems through domain decomposition methods, and we show that a generic algorithmic formulation of these methods is possible. One of our main contributions resides in the design and implementation of a programming harness called Ahpik, allowing for easy development of applications relying on domain decomposition methods. This programming environment provides a generic support that is adaptable to many mathematical methods, which can be synchronous or asynchronous, overlapping or non-overlapping. Its object-oriented design makes it possible to encapsulate implementation details concerning the management of threads and communications, which eases the task of developing new methods. We validate the Ahpik environment on the resolution of some classical PDE problems, and in particular on one large problem in computational fluid dynamics.
... Programs that solve several independent tasks which use the same resources need to be synchronized to prevent deadlocks and failures within the computation. Programs that solve independent tasks generally use one thread per task [56]. ...
... An explanation of this effect could be the fact that females and males use different strategies in navigation [40]. Females tend to remember path descriptions with the help of landmarks [56], and they are also more sensitive to verbal interference than males. ...
... Two alternative multi-threading approaches have been realised. Native threading uses the pthreads [25] library, whereas green threading implements scheduling and thread management within the VM itself. For memory management, GCs applying mark/sweep and reference counting [22] have been implemented. ...
Article
Full-text available
CSOM/PL is a software product line (SPL) derived from applying multi-dimensional separation of concerns (MDSOC) techniques to the domain of high-level language virtual machine (VM) implementations. For CSOM/PL, we modularised CSOM, a Smalltalk VM implemented in C, using VMADL (virtual machine architecture description language). Several features of the original CSOM were encapsulated in VMADL modules and composed in various combinations. In an evaluation of our approach, we show that applying MDSOC and SPL principles to a domain as complex as that of VMs is not only feasible but beneficial, as it improves understandability, maintainability, and configurability of VM implementations without harming performance.
... The first operating systems to implement this feature were microkernel-based [11], and until 1991 no OS had a user-level library for creating and using multiple threads [3]. Only from 1996 onward did the major operating systems ...
... The software was completely developed in the C language. The parallel implementation was made through the POSIX API (Lewis and Berg, 1996). The C compiler utilized was the GNU compiler collection (GCC). ...
Article
The comparison and assessment of similarity across metagenomes is still an open problem. Uncultivated samples suffer from high variability, thus making it difficult for heuristic sequence comparison methods to find precise matches in reference databases. Finer methods are required to provide higher accuracy and certainty, although these come at the expense of larger computation times. Therefore, in this work, we present our software for the highly parallel, fine-grained pairwise alignment of metagenomes. First, an analysis of the computational limitations of performing coarse-grained global alignments in a parallel manner is described, and a solution is discussed and employed by our proposal. Second, we show that our development is competitive with state-of-the-art software in terms of speed and consumption of resources, while achieving more accurate results. In addition, the parallel scheme adopted is tested, showing a performance of up to 98% efficiency while using up to 64 cores. Sequential optimizations are also tested and show a speedup of 9× over our previous proposal.
... Another concept applicable to the multi-thread context is thread safety. A piece of code is thread-safe when it can manipulate shared data structures in such a way that it ensures safe execution across multiple threads at the same time (Lewis and Berg, 1995). Similarly, variant applications can be executed from a single instance of a multi-tenant SaaS system that must manipulate shared data structures. ...
Conference Paper
Full-text available
SaaS (Software as a Service) is a service delivery model in which an application can be provided on demand via the Internet. Multi-tenant architecture is essential for SaaS because it enables multiple customers, so-called tenants, to share the system's resources in a transparent way to reduce costs and customize the software layer, resulting in variant applications. Despite the popularity of this model, there have been few cases of evaluation of software testing in cloud computing. Many researchers argue that traditional software testing may not be a suitable way of validating cloud applications owing to the high degree of customization, the dynamic environment and multi-tenancy. User Acceptance Testing (UAT) evaluates the external quality of a product and complements previous testing activities. The main focus of this paper is on investigating the ability of parallel and automated UAT to detect faults with regard to the number of tenants. Thus, our aim is to evaluate to what extent the ability to detect faults varies if a different number of variant applications is executed. A case study was designed with a multi-tenant application called iCardapio and a testing framework created through Selenium and JUnit extensions. The results showed a significant difference in terms of detected faults when single-tenant and multi-tenant test scenarios were included.
... This implies that we have 'four flows of execution'. Each of these flows of execution is called a thread [3]. So, in this figure, we have four threads. ...
Conference Paper
Full-text available
Today, everything has gone distributed for many types of server applications. We have Web servers, application servers, database servers, file servers, and mail servers that maintain worker queues and thread pools to handle large numbers of short tasks that arrive from remote sources. In this paper we analyze multithreaded programs, focusing on ways to improve the efficiency of analyzing interactions between threads. A multithreaded program contains two or more parts that can run concurrently, and each part can handle a different task at the same time, making optimal use of the available resources. Each task is independent of the others. Multithreading is based on the idea of multitasking in applications, where specific operations within a single application are further divided into individual threads. This application of multithreading was developed using the Eclipse IDE. Eclipse consists of a base workspace and an extensible plug-in system for customizing the environment.
... This allows the same application client to be compatible with a variety of object server implementations. Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of thread-based interest clients (Lewis and Berg 1996). That is, the Alert Manager extends the monolithic one-to-one relationship between the IS Server and an IS client into one which supports a one-to-many relationship. ...
Article
Not unlike King Arthur relying on the infamous Round Table as the setting for consultation with his most trusted experts, agent-based, decision-support systems provide human decision makers with a means of solving complex problems through collaboration with collections of both human and computer-based expert agents. The Round Table Framework provides a formalized architecture together with a set of development and execution tools which can be utilized to design, develop, and execute agent-based, decision-support applications. Based on a three-tier architecture, Round Table incorporates forefront technologies including distributed-object servers, inference engines, and web-based presentation to provide a framework for collaborative, agent-based decision making systems.
... This allows the same application client to be compatible with a variety of object server implementations. Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of threadbased interest clients (Lewis and Berg 1996). In this respect, the Alert Manager extends the monolithic one-to-one relationship between the Subscription Server and its clients into one that supports a one-to-many relationship. ...
Article
Full-text available
This report describes work performed by CDM Technologies Inc. in conjunction with the Collaborative Agent Design (CAD) Research Center of California Polytechnic State University (Cal Poly), San Luis Obispo, for the Office of Naval Research (ONR), on the SEAWAY experimental system for planning, gaming and executing maritime logistic operations from a sea base. SEAWAY incorporates three fundamental concepts that distinguish it from existing (i.e., legacy) command and control applications. First, it is a collaborative system in which computer-based agents assist human operators by monitoring, analyzing and reasoning about events in near real-time. Second, SEAWAY includes an ontological model of the sea base that represents the behavioral characteristics and relationships among real world entities such as sea base ships, inbound supply ships, supplies and equipment, infrastructure objects (terrain, intermediate embarkation ports, supply points, roads, and rivers), and abstract notions. This object model provides the essential common language that binds all SEAWAY components into an integrated and adaptive decision-support system. Third, SEAWAY provides no ready made solutions that may not be applicable to the problems that will occur in the real world. Instead, the agents represent a powerful set of tools that together with the human operators can adjust themselves to the problem situations that cannot be predicted in advance. In this respect, SEAWAY is an adaptive logistic command and control system that supports planning, execution and training functions concurrently. SEAWAY is an experimental maritime logistic decision-support system that is intended to provide near real-time adaptive command and control in sustaining joint forces from the sea during contingencies. 
It is based on satisfying the dynamic requirements of joint forces operating ashore, with the ability to provide: offload planning and dynamic re-planning; visibility on all items en route by sea and warehoused at the sea base; track and respond to the dynamic logistic support requirements cycle originating with the supported force ashore; coordinate and control the ship-to-shore ship-to-objective, and ship-to-unit delivery of supplies ashore through a near real-time transport composite operational picture; track supplies and execute reorder; and, provide a full range of warehousing and cargo churning functions aboard the ships of the sea base.
... A first implementation was carried out using shared-memory segments between several processes [91]. Currently, lightweight processes (or threads) are the most efficient means of multiprogramming [61]. The lightweight process programming interface has been standardized for the basic operations (POSIX 1003 standard [43]). ...
Article
In their traditional flavor, Distributed Shared Memory (DSM) libraries allow a number of separate processes to share a common address space using a consistency protocol according to a semantics specified by some given consistency model: sequential consistency, release consistency, etc. The processes may usually be physically distributed among a number of computing nodes interconnected through some communication library. Most approaches to DSM programming assume that the DSM library and the underlying architecture are fixed, and that it is up to the programmer to fit his program to them. This static view does not allow experimentation with alternative implementations. The contribution of this thesis consists in proposing a generic implementation and experimentation platform called DSM-PM, which allows both the application and the underlying DSM consistency protocol to be co-designed and tuned for performance. This platform is entirely implemented in software, in user space. It is portable across a large number of cluster architectures. It provides the basic blocks for implementing and evaluating a large number of multithreaded consistency protocols within a unified framework. Three consistency models are currently supported: sequential consistency, release consistency and Java consistency. Several performance studies have been carried out with multiple multithreaded applications on different clusters, in order to evaluate the proposed consistency protocols. The platform has been validated as a target for a Java compiling system for distributed architectures, called Hyperion.
... The PAWIAN environment was implemented on a Sun Sparc1000 with 4 processors. The parallelism was realized by multithreading techniques (MT) [11] [12]. ...
Article
The increase of processing speed is an important goal in the development of image recognition systems, especially in the case of the recognition of complex objects. The use of parallel computers offers possibilities for the acceleration of the necessary algorithms. However, only little research has been done in the area of high-level image recognition. In our contribution, parallel knowledge-based processing in a hybrid image recognition system is presented. It is based on parallel search strategies and can also be applied to other knowledge-based image recognition systems. The implementation was done on a multiprocessor workstation. Our strategies were confirmed by run time measurements.
Keywords: control algorithm, hybrid image recognition, multithreading, parallel search, knowledge-based systems
1 INTRODUCTION Besides the improvement of the performance of image recognition systems, the increase of processing speed is still an important field of research. This concerns in particular...
... Thus, POSIX Threads (Pthreads) is a library which defines an API to create and manipulate threads in C/C++. Pthreads is an IEEE POSIX (Portable Operating System Interface) standard for threads, which provides efficient ways to expand a running process into new concurrent threads of execution and can run efficiently on computer systems with multiple processors and/or multi-core processors [14]. ...
Article
Full-text available
Advances in multi-core CPUs and in Graphics Processing Units (GPUs) are attracting a lot of attention from the scientific community due to their parallel processing power in conjunction with their low cost. In recent years the resolution of inverse thermal problems (ITP) has been gaining increasing importance and attention in simulation-based applied science and engineering. However, the resolution of these problems is very sensitive to random errors and the computational cost is high. In an attempt to improve the computational performance of solving an ITP, the computational power of multi-core architectures was used and analysed, mainly those offered by the GPU via the Compute Unified Device Architecture (CUDA) and multi-core CPUs via Pthreads. Also, we developed the implementation of the Preconditioned Conjugate Gradient method as a kernel on the GPU to solve several sparse linear systems. Our CUDA and Pthreads-based systems are, respectively, two and four times faster than the serial version, while maintaining comparable convergence behaviour.
... These commands provide information about the threads, and assist in controlling the execution of threads. A more comprehensive list of commands and information about threads can be found in [1][2]. ...
Article
Full-text available
This paper covers the two most popular methods for implementing parallel code: Threads and the Message Passing Interface (MPI). Both methods are discussed in detail to provide information about their implementation issues. An in-depth look is taken into the parallelization libraries that are widely used among programmers. The paper also describes how to write parallel code using these methods. In addition, two characteristics of parallel computing, synchronization and load balancing, are explored. Finally, a performance study of both methods is presented.
... In the current paper, it is shown that this method can be used to provide effective computations on parallel machines using multithreaded model. The libraries OpenMP [9] and PThreads [4] are often used to perform parallel computations on parallel machines with shared memory. For example, the PLASMA package [1] uses the PThreads library to perform linear numerical algebra computations (including the solution of systems of linear equations) in parallel on real-valued vectors and matrices. ...
Conference Paper
In this paper, an approach to the solution of systems of interval linear equations with the use of parallel machines is presented, based on a parallel multithreaded model and the "interval extended zero" method. This approach not only allows us to decrease the undesirable excess width effect, but also makes it possible to avoid inverted interval solutions. The efficiency of this method has already been proved for non-parallel systems. Here it is shown that it can also be used to perform efficient calculations on parallel machines using the multithreaded model.
... Several techniques generally used for achieving software parallelism are based on threads and the Message Passing Interface [2], [3], [4], [5]. These methods provide parallel thread or process execution by means of intra-node and inter-node communication. ...
Conference Paper
Full-text available
As the information society changes, the digital world is making use of larger bulks of data and more complex operations that need to be executed. This trend has led to overcoming processor speed limits by introducing multiprocessor systems. In spite of hardware-level parallelism, software has evolved with various techniques for achieving parallel program execution. Executing a program in parallel can be done efficiently only if the program code follows certain rules. There are many techniques, which tend to provide varying processing speeds. The aim of this paper is to test the Matlab, OpenMPI and Pthreads methods on single-processor, multi-processor, GRID and cluster systems and suggest the optimal method for each particular system.
... A thread [96,97] is a series of instructions within a process, which can be executed as a program. Typically, a process contains only one thread which runs sequentially, sharing the processor with other processes via timeslicing or some other scheduling scheme. ...
Article
This thesis presents a study of applications and techniques for molecular dynamics simulations. Three studies are presented that are intended to improve our ability to simulate larger systems more realistically. A comparison study of two and three-body potential models for liquid and amorphous SiO2 is presented. The structural, vibrational, and dynamic properties of the substance are compared using two- and three-body potential energy models against experimental results. The three-body interaction does poorly at reproducing the experimental phonon density of states, but better at reproducing the Si-O-Si bond angle distribution. The three-body interaction also produces much higher diffusivities than the two-body interactions. A study of tabulated functions in molecular dynamics is presented. Results show that the use of tabulated functions as a method for accelerating the force and potential energy calculation can be advantageous for interactions above a certain complexity level. The decrease in precision due to the use of tabulated functions is negligible when the tables are sufficiently large. Finally, an investigation into the benefits of multi-threaded programming for molecular dynamics is presented.
... In order to exploit all the parallelism and processing power available, specific software infrastructure is needed. For multithreaded programming [1], low-level libraries such as pthreads [2] and Windows Threads [3] and high-level libraries like OpenMP [4] are the most popular options for instrumenting parallel software. Additionally, communication mechanisms and APIs such as MPI [5], PVM [6], and CRL [7] are widely used in distributed application development, offering a variety of primitives for point-to-point and collective operations, as well as process control, startup and shutdown. ...
Article
Full-text available
High-performance application development is increasing due to breakthrough advances in microprocessor and power management technologies and in network speed and reliability. Accordingly, distributed parallel applications make use of message-passing interfaces and multithreaded programming libraries. Nevertheless, drawbacks in message-passing implementations limit the use of thread-safe network communication. This paper presents a thread-safe message-passing interface based on the MPI Standard, assuring correct message ordering and sender/receiver synchronization.
... The master-slave parallel API design is very similar in idea to the streams and substreams design, though it is based on the master-slave model of execution Java uses for threads [18]. This design controls the creation and seeding of sequences in a random number cycle by first initialising all the shared data in a master class. ...
Article
Abstract Scientific computing has long been pushing the boundaries of computational requirements in computer science. An important aspect of scientific computing is the generation of large quantities of random numbers, especially in parallel to take advantage of parallel architectures. Many science and engineering programs require random numbers for applications like Monte Carlo simulation. One environment suitable for parallel computing is Java, though it is rarely used for scientific applications due to its perceived slowness compared to compiled languages like C. Through research and recommendations, Java is slowly being shaped into a viable language for such computationally intensive applications. Java has the potential for such large scale applications, since it is a modern language with a large programmer base and many well received features such as built-in support for parallelism using threads. With improved performance from better compilers, Java is becoming more commonly used for scientific computing, but Java still lacks a number of features like optimised scientific software libraries. This project looks at the effectiveness and efficiency of implementing a parallel random number library in Java using threads, and explores the options for creating a high-quality parallel generator. The parallel random number generator library extends the current java.util.Random to add features, like generator selection, and has been implemented as a set of high-quality generators that can be used sequentially or in parallel without requiring synchronisation. The implementation is efficient, with a selection of tests verifying both efficiency and effectiveness. This project has produced a viable parallel Random API implementation that can be used in parallel scientific applications efficiently and effectively, unlike the current standard Java random generators. Acknowledgements I would like to thank my Supervisor Paul Coddington for all his help and patience
... The simplest form of parallel programming consists of having several instruction streams (threads) that execute while manipulating data stored in a common memory [85,84,81,18]. It is therefore naturally the type of programming most commonly used on SMP machines. ...
Article
Load balancing and data distribution are major problems to solve in order to implement a parallel application. They require choosing when and where computations are performed, and the efficiency of the application depends on these choices. We solve this "scheduling problem" with a recently proposed model: malleable tasks. The introduction to the domain of parallel computing covers the main drawbacks of some standard models; namely, fine-grained modeling of an application requires, in these models, accurate modeling of data exchanges, and the resulting scheduling problems seem, in our opinion, difficult to handle. The malleable task model treats an application as a set of parallel tasks, each executed simultaneously by several processors. The application is still modeled as a standard task graph, but communications are taken into account implicitly in the execution time of each malleable task. We claim this approach simplifies the scheduling problem both practically and theoretically. This document first presents the scheduling of independent malleable tasks. We analyze previous work and propose a new algorithm using almost two shelves with a performance guarantee of 3/2; an average-case analysis of the algorithms is also presented. Some previous results for problems with precedence constraints in related models are recalled, and we propose a first approach to the problem of chains of malleable tasks. Then, an ocean stream simulation is introduced, and the practical use of the malleable task model to schedule this simulation is finally described.
... The experiments consist of measuring the message passing and application execution times for the RM3D application kernel before and after incorporating our optimizations. Multi-threading is an approach which can best exploit the parallelism inherent in SAMR applications. The advantages of using threads are discussed in detail in a number of publications [21] [12]. Among the obvious are the easy use of multiple processors if available, latency hiding, and cheap inter-thread (as opposed to inter-process) communication and synchronization. ...
Article
Abstract of the thesis 'Architecture Specific Communication Optimizations for Structured Adaptive Mesh-Refinement Applications' by Taher Saif. Thesis Director: Professor Manish Parashar. Dynamic Structured Adaptive Mesh Refinement (SAMR) techniques for solving partial differential equations provide a means for concentrating computational effort on appropriate regions in the computational domain. Parallel implementations of these techniques typically partition the adaptive heterogeneous grid hierarchy...
... Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of thread-based interest clients (Lewis and Berg 1996). In other words, the Alert Manager extends the monolithic one-to-one relationship between the Information Server and its client into one which supports a one-to-many relationship. ...
Article
Full-text available
This report describes work performed by the Collaborative Agent Design Research Center for the US Marine Corps Warfighting Laboratory (MCWL), on the IMMACCS experimental decision-support system. IMMACCS (Integrated Marine Multi-Agent Command and Control System) incorporates three fundamental concepts that distinguish it from existing (i.e., legacy) command and control applications. First, it is a collaborative system in which computer-based agents assist human operators by monitoring, analyzing, and reasoning about events in near real-time. Second, IMMACCS includes an ontological model of the battlespace that represents the behavioral characteristics and relationships among real world entities such as friendly and enemy assets, infrastructure objects (e.g., buildings, roads, and rivers), and abstract notions. This object model provides the essential common language that binds all IMMACCS components into an integrated and adaptive decision-support system. Third, IMMACCS provides no ready made solutions that may not be applicable to the problems that will occur in the real world. Instead, the agents represent a powerful set of tools that together with the human operators can adjust themselves to the problem situations that cannot be predicted in advance. In this respect, IMMACCS is an adaptive command and control system that supports planning, execution and training functions concurrently. The report describes the nature and functional requirements of military command and control, the architectural features of IMMACCS that are designed to support these operational requirements, the capabilities of the tools (i.e., agents) that IMMACCS offers its users, and the manner in which these tools can be applied. Finally, the performance of IMMACCS during the Urban Warrior Advanced Warfighting Experiment held in California in March, 1999, is discussed from an operational viewpoint.
... This allows the same application client to be compatible with a variety of object server implementations. Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of thread-based interest clients (Lewis and Berg 1996). In this respect, the Alert Manager extends the monolithic one-to-one relationship between the Subscription Server and its clients into one that supports a one-to-many relationship. ...
Article
Full-text available
This report provides an overview description of the Integrated Cooperative Decision-Making (ICDM) software toolkit for the development of intelligent decision-support applications. More technical descriptions of ICDM are contained in a companion CDM Technical Report (CDM-18-04) entitled: ‘The ICDM Development Toolkit: Technical Description’. ICDM is an application development framework and toolkit for decision-support systems incorporating software agents that collaborate with each other and human users to monitor changes (i.e., events) in the state of problem situations, generate and evaluate alternative plans, and alert human users to immediate and developing resource shortages, failures, threats, and similar adverse conditions. A core component of any ICDM-based application is a virtual representation of the real world problem (i.e., decision-making) domain. This virtual representation takes the form of an internal information model, commonly referred to as an ontology. By providing context (i.e., data plus relationships) the ontology is able to support the automated reasoning capabilities of rule-based software agents. 
Principal objectives that are realized to varying degrees by the ICDM Toolkit include: support of an ontology-based, information-centric system environment that limits internal communications to changes in information; ability to automatically ‘push’ changes in information to clients, based on individual subscription profiles that are changeable during execution; ability of clients to assign priorities to their subscription profiles; ability of clients to generate information queries in addition to their standing subscription-based requests; automatic management of object relationships (i.e., associations) during the creation, deletion and editing of objects; support for the management of internal communication transmissions through load balancing, self-diagnosis, self-association and self-healing capabilities; and, the ability to interface with external data sources through translators and ontological facades. Most importantly, the ICDM Toolkit is designed to support the machine generation of significant portions of both the server and client side code of an application. This is largely accomplished with scripts that automatically build an application engine by integrating Toolkit components with the ontological properties derived from the internal information model. In this respect, an ICDM-based application consists of loosely coupled, generic services (e.g., subscription, query, persistence, agent engine), which in combination with the internal domain-specific information model are capable of satisfying the functional requirements of the application field. 
Particular ICDM design notions and features that have been incorporated in response to the increasing need for achieving interoperability among heterogeneous systems include: support for overarching ontologies in combination with more specialized, domain-specific, lower level facades; compliance with Defense Information Infrastructure (DII) Common Operating Environment (COE) segmentation principles, and their recent transition to the more challenging information-centric objectives of the Global Information Grid (GIG) Enterprise Services (GES) environment; seamless transition from one functional domain to another; operational integration to allow planning, rehearsal, execution, gaming, and modeling functions to be supported within the same application; and, system diagnosis with the objective of ensuring graceful degradation through self-monitoring, self-diagnosis, and failure alert capabilities. An ICDM-based software development process offers at least four distinct advantages over current data-centric software development practices. First, it provides a convenient structured transition to information-centric software applications and systems in which computer-based agents with reasoning capabilities assist human users to accelerate the tempo and increase the accuracy of decision-making activities. Second, ICDM allows software developers to automatically generate a significant portion of the code, leaving essentially only the domain-specific user-interface functions and individual agents to be designed and coded manually. Third, ICDM disciplines the software development process by shifting the focus from implementation to design, and by structuring the process into clearly defined stages. Each of these stages produces a set of verifiable artifacts, including a well defined and comprehensive documentation trail. 
Finally, ICDM provides a development platform for achieving interoperability by formalizing a common language and compatible representation across multiple applications.
... To overcome the heavy overhead for OS thread switching and synchronization, some runtime systems implement light-weight user-level thread libraries that allow the user to create OS-transparent user-level threads and to explicitly manage thread scheduling and synchronous switching without invoking the OS kernel. However, a conventional user-level thread library, such as the fiber utility [4] in the Windows OS, is unable to perform eventdriven preemptive scheduling. In addition to the ability to synchronously switch between threads, VMT provides applications with the new capability to directly observe for and react to microarchitectural events on a logical processor without any OS involvement. ...
Article
Helper threading is a technology to accelerate a program by exploiting a processor's multithreading capability to run "assist" threads. Previous experiments on hyper-threaded processors have demonstrated significant speedups by using helper threads to prefetch hard-to-predict delinquent data accesses. In order to apply this technique to processors that do not have built-in hardware support for multithreading, we introduce virtual multithreading (VMT), a novel form of switch-on-event user-level multithreading, capable of fly-weight multiplexing of event-driven thread executions on a single processor without additional operating system support. The compiler plays a key role in minimizing synchronization cost by judiciously partitioning register usage among the user-level threads. The VMT approach makes it possible to launch dynamic helper thread instances in response to long-latency cache miss events, and to run helper threads in the shadow of cache misses when the main thread would be otherwise stalled. The concept of VMT is prototyped on an Itanium® 2 processor using features provided by the Processor Abstraction Layer (PAL) firmware mechanism already present in currently shipping processors. On a 4-way MP physical system equipped with VMT-enabled Itanium 2 processors, helper threading via the VMT mechanism can achieve significant performance gains for a diverse set of real-world workloads, ranging from single-threaded workstation benchmarks to heavily multithreaded large scale decision support systems (DSS) using the IBM DB2 Universal Database. We measure a wall-clock speedup of 5.8% to 38.5% for the workstation benchmarks, and 5.0% to 12.7% on various queries in the DSS workload.
... IEEE defines a POSIX standard API, referred to as Pthreads (IEEE 1003.1c), for thread creation and synchronization [4]. Many contemporary systems, including Linux, Solaris, and Mac OS X, implement Pthreads. ...
Article
Full-text available
This paper describes the experience of the authors in using SDL threads to develop computer science and engineering course materials that cover multi-threaded programming. The courses include data structures, operating systems, computer graphics and video game programming. The techniques developed are also used in the work of students' independent study projects and master's projects. In particular, they have been used to support an NSF CPATH grant to revitalize computer science education and promote computational thinking. This paper includes 3 simple example programs that illustrate threading concepts.
... More important, however, is the overhead associated with context switching and synchronization. This context switching time can be as low as 5 µs in the case of an application-level thread context switch, and up to 300 µs when synchronizing two lightweight processes on a condition variable [7]. The RTI design allows for the runtime configuration of the underlying thread model used. ...
Article
A recent DMSO (Defense Modeling and Simulation Office) initiative resulted in a new RTI design and build effort. This paper describes the design constructs used in the RTI 2.0 architecture and the driving principles used throughout the design process. Key architectural features are identified and analyzed in terms of meeting the RTI's set of requirements. Concepts such as system scalability, runtime performance, federation-specific tuning, reliability, and maintainability are discussed within the confines of the RTI 2.0 architecture. This paper presents information representing the HLA development process underway by the DMSO and the DoD AMG (Architecture Management Group).
Chapter
Full-text available
This study offers a step-by-step practical procedure, from the analysis of the current status of the spare parts inventory system to advanced service level analysis by means of a simulation-optimization technique, for a real-world case study associated with a seaport. The remarkable variety and immense diversity, on one hand, and extreme complexities not only in consumption patterns but also in the supply of spare parts in an international port with technically advanced port operator machinery, on the other hand, have convinced the managers to deal with this issue in a structural framework. The huge amount of available data requires cleaning and classification in order to process it properly and derive reorder point (ROP) estimations, reorder quantity (ROQ) estimations, and the associated service level analysis. Finally, from 247,000 items used over nine years, 1416 inventory items are selected as a result of ABC analysis integrated with the analytic hierarchy process (AHP), yielding the main items that need to be kept under strict inventory control. The ROPs and the pertinent quantities are simulated by Arena software for all the main items, each of which took approximately 30 minutes of run time on a personal computer to determine near-optimal estimations.
Thesis
Full-text available
Many contemporary composers and sound artists are using sensing systems based on control-voltage-to-MIDI converters and laptop computers running algorithmic composition software to create interactive instruments and responsive environments. Using an integrated device that encapsulates the entire system for performance can reduce latency, improve system stability, and reduce setup complexity. This research addresses the issues of how one can develop such a device, including the techniques one would use to make the design easily upgradeable as newer technologies become available, the programming interface that should be employed for use by artists and composers, the knowledge bases and specialist expert skills that can be utilised to gain the required information to design and build such devices, and the low-cost hardware and software development tools appropriate for such a task. This research resulted in the development of the Smart Controller, a portable hardware/software device that allows performers to create music using programmable logic control. The device can be programmed remotely through the use of a patch editor or Workbench, an independent computer application that simulates and communicates with the hardware. The Smart Controller responds to input control voltages, Open Sound Control, and MIDI messages, producing output control voltages, Open Sound Control, and MIDI messages (depending upon the patch). The Smart Controller is a stand-alone device—a powerful, reliable, and compact instrument—capable of reducing the number of electronic modules required in a live performance or sound installation, particularly the requirement for a laptop computer.
The success of this research was significantly facilitated by the use of the iterative development technique known as the Unified Process instead of the traditional Waterfall model, and by the use of the RTEMS real-time operating system, originally designed for guided missile systems, as the underlying scheduling system for the embedded hardware.
Conference Paper
Deadlock occurs when all threads of a program remain in their current state and cannot move forward. These threads execute concurrently on multi-core CPUs. As the execution order of their code lines is uncertain, it is extremely difficult to locate the exact position where deadlock occurs without modifying the source code. C/C++, Qt, and Java are three commonly used programming languages in Linux. This paper presents an intelligent scheme for locating deadlock in these languages. By modifying the kernels of pthreads, Qt, and OpenJDK, we redesign three kinds of resource functions: mutex, lock, and semaphore. At runtime, the file names and line numbers of these functions called by a user's program are written to a shared memory database called Redis. The data in Redis can be fetched by two tools. One graphical tool is responsible for displaying the usage of resources and performing deadlock analysis. The other is used to detect deadlock periodically and write it to a journal file, or to notify users by mail or short message. A plugin is also developed for each of QtCreator and Eclipse; both tools can be started from either plugin. The deadlock detection method does not need to modify the source code of a user program, which greatly facilitates the user in determining the location of deadlock.
Article
The Decision Support Workshop of May 2-4, 2000 held in San Luis Obispo, Cal., was the second in a series that was started one year earlier as a joint project of the Office of Naval Research and the Collaborative Agent Design Research Center of Cal Poly. The goal of this series of Workshops is to provide a forum where connections can be established on one hand between developers and proponents of decision support tools, with potential users such as managers of large, complex organizations/systems on the other. Clearly, the military belong to this class of users and it is therefore not surprising that ONR has a vested interest in promoting research in this particular field. It is also clear that the class of potential users is not restricted to the military - in fact civilian government bodies as well as business and industry entities should be strongly interested in adopting these tools (and their future refinements) for their own specific purposes. The list of the speakers and the topics presented during the Workshop does indeed attest to the variety of areas where decision support systems are already in use. This Workshop has concentrated on the human-computer interaction. Although computers are after all man-made devices, there is a peculiarity in the way humans interact with a computer that has no parallel in human-human interactions. This was brought out in an interesting talk by Dr. Ron DeMarco. Other areas where computers play a major role included the topic of how information is handled, secured, and assured. Since the basis of all decision making is accurate , uncontaminated information, this is a very important topic that was excellently treated by Mr. Steve York and Ms. Virginia Wiggins in their presentations. Other highlights included a thought-provoking talk by RADM C. L. Munns that raised many questions concerning decision support in the Fleet. 
An interesting description of the risks of misusing information technology was given, with his usual verve, by Dr. Gary Klein. The reader of these Proceedings will find other excellent discussions of decision support systems, in particular the agent-based ones described by the senior staff of CADRC.
Article
This report provides an overview description of the Toolkit for Information Representation and Agent Collaboration (TIRAC™) software framework for the development of intelligent decision-support applications. More technical descriptions of TIRAC™ are contained in a companion CDM Technical Report (CDM-19-03) entitled: ‘The TIRAC™ Development Toolkit: Technical Description’. TIRAC™ is an application development framework and toolkit for decision-support systems incorporating software agents that collaborate with each other and human users to monitor changes (i.e., events) in the state of problem situations, generate and evaluate alternative plans, and alert human users to immediate and developing resource shortages, failures, threats, and similar adverse conditions. A core component of any TIRAC-based application is a virtual representation of the real world problem (i.e., decision-making) domain. This virtual representation takes the form of an internal information model, commonly referred to as an ontology. By providing context (i.e., data plus relationships) the ontology is able to support the automated reasoning capabilities of rule-based software agents. 
Principal objectives that are realized to varying degrees by the TIRAC™ toolkit include: support of an ontology-based, information-centric, distributed system environment that limits internal communications to changes in information; ability to automatically ‘push’ changes in information to clients, based on individual subscription profiles that are changeable during execution; ability of clients to assign priorities to their subscription profiles; ability of clients to generate information queries in addition to their standing subscription-based requests; automatic management of object relationships (i.e., associations) during the creation, deletion and editing of objects; support for the management of internal communication transmissions through load balancing, self-diagnosis, self-association and self-healing capabilities; and, the ability to interface with external data sources through translators and ontological facades. Most importantly, the TIRAC™ toolkit is designed to support the machine generation of significant portions of both the server and client side code of an application. This is largely accomplished with scripts that automatically build an application engine by integrating toolkit components with the ontological properties derived from the internal information model. In this respect, an TIRAC-based application consists of loosely coupled, generic services (e.g., subscription, query, persistence, agent engine), which in combination with the internal domain-specific information model are capable of satisfying the functional requirements of the application field. 
Particular TIRAC™ design notions and features that have been incorporated in response to the increasing need for achieving interoperability among heterogeneous systems include: support for overarching ontologies in combination with more specialized, domain-specific, lower level facades; compliance with Defense Information Infrastructure (DII) Common Operating Environment (COE) segmentation principles, and their recent transition to the more challenging information-centric objectives of the Global Information Grid (GIG) Enterprise Services (GES) environment; seamless transition from one functional domain to another; operational integration to allow planning, rehearsal, execution, gaming, and modeling functions to be supported within the same application; and, system diagnosis with the objective of ensuring graceful degradation through self-monitoring, self-diagnosis, and failure alert capabilities. An TIRAC-based software development process offers at least four distinct advantages over current data-centric software development practices. First, it provides a convenient structured transition to information-centric software applications and systems in which computer-based agents with reasoning capabilities assist human users to accelerate the tempo and increase the accuracy of decision-making activities. Second, TIRAC™ allows software developers to automatically generate a significant portion of the code, leaving essentially only the domain-specific user-interface functions and individual agents to be designed and coded manually. Third, TIRAC™ disciplines the software development process by shifting the focus from implementation to design, and by structuring the process into clearly defined stages. Each of these stages produces a set of verifiable artifacts, including a well defined and comprehensive documentation trail. 
Finally, TIRAC™ provides a development platform for achieving interoperability by formalizing a common language and compatible representation across multiple applications.
Chapter
One major problem with non-rigid image registration techniques is their high computational cost. Because of this, these methods have found limited application to clinical situations where fast execution is required, e.g., intra-operative imaging. This paper applies a parallel implementation of a non-rigid image registration algorithm to pre- and intra-operative MR images and quantitatively analyzes its scaling properties. The method computes the intra-operative brain deformation in about one minute using 64 CPUs on a 128-CPU shared-memory supercomputer (SGI Origin 3800). The serial component is no more than 2 percent of the total computation time, allowing a speedup of at least a factor of 50. In most cases, the theoretical limit of the speedup is substantially higher (up to 132-fold in the application examples presented in this paper). Our parallel algorithm is therefore capable of solving non-rigid registration problems with short execution time requirements and may be considered an important step in the application of such techniques to clinically important problems such as the computation of brain deformation during cranial image-guided surgery.
Article
INTRODUCTION A thread, also known as a lightweight process, allows numerous paths of execution through a program to be traversed at the same time. Multitasking is the ability to run numerous processes on a single CPU, in a similar fashion to how threads appear to execute in parallel in a multithreaded system. The multithreaded methodology is a programming paradigm that is well suited to parallel and distributed processing. Multithreading is supported on the majority of modern computer systems, so an optimal multithreaded implementation is highly desirable. The multithreading philosophy has impacted many areas of computer science, and its application has shown benefits in areas such as artificial intelligence. A concept that needs to be understood is the difference between threaded implementations that support user-level threads and those that support kernel-level threads. While both user-level and kernel-level threads are considered lightweight processes, the ...
Article
Full-text available
This paper summarizes an interdisciplinary, fourth-semester, undergraduate course in which the development of a small, autonomous robot serves as the focus application. The disciplines of microprocessors, programming, digital and analog electronics, mathematical modeling, dynamical systems, and control theory are the elements of this course. The aim of the course is to teach the basic theory and how to complete an engineering design project from specification to working model of the specified product. During the course students work in teams and build the robots, which perform a compulsory task and a free task. The course ends with a competition: the 3 STARS Robot Race. The competition to design the best robot is one of the most important motivating factors. We conclude with a discussion of the evaluation results and the students' own opinions of this learning method.
Article
An embedded positioning system that can be applied to seismic exploration is designed. To meet the characteristics of field work and the demand for real-time data transmission, the system is composed of positioning terminals, supplemented by earthquake decision-making experts and a monitoring center. Positioning data are transmitted and queried in real time between the mobile terminals and the monitoring center via mobile phone messages.
Conference Paper
We propose a platform-independent multithreaded routing method for FPGAs with two aspects: a single high-fanout net is routed in parallel within itself, and several low-fanout nets are routed in parallel with one another. Routing high-fanout nets usually takes considerable time because of the large physical area enclosed by their bounding boxes and the tens of terminals to connect. Therefore, one high-fanout net is partitioned into several subnets with fewer terminals and smaller bounding boxes, which are routed in parallel. Low-fanout nets, with their intrinsically small bounding boxes and few terminals, can hardly be divided; instead, low-fanout nets whose bounding boxes do not overlap are routed concurrently. A new graph, named the bounding box graph, is used to facilitate the selection of nets to route concurrently. In this graph, each vertex stands for a net, and an edge between two vertices means that the represented nets have overlapping bounding boxes. Several strategies are introduced to balance the load among threads and ensure deterministic results. Routing time scales down with an increasing number of threads. On a 4-core processor, this technique improves run-time by ~1.9× with routing quality degrading by no more than 2.3%.
Conference Paper
A platform-independent multithreaded routing method for FPGAs is proposed in this paper. Specifically, the method includes two aspects for maximal parallelization. First, a high-fanout net, which usually takes considerable time to route due to its large bounding box and number of terminals, is partitioned into several subnets to be routed in parallel. Second, low-fanout nets with non-overlapping bounding boxes are identified and routed in parallel as well, to further speed up the routing process. A bounding box graph is constructed to facilitate the selection of nets to route concurrently. In addition, load-balancing and synchronization strategies are introduced to raise routing efficiency and ensure deterministic results. Experiments on different platforms and benchmarks with various combinations of high- and low-fanout nets are carried out. This technique improves run-time by ~1.9× with routing quality degrading by no more than 2.3% on a quad-core processor platform.
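The bounding box graph idea can be sketched as a tiny, hypothetical Python model: nets conflict when their axis-aligned bounding boxes overlap, and a batch of mutually non-conflicting nets (an independent set in that graph) can be routed concurrently. Box coordinates, function names, and the greedy selection heuristic are illustrative; the paper's actual load-balancing and determinism strategies are not reproduced here.

```python
from itertools import combinations

def overlaps(a, b) -> bool:
    """True if two axis-aligned boxes (xmin, ymin, xmax, ymax) intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def conflict_graph(boxes):
    """Bounding box graph: edge (i, j) iff nets i and j overlap."""
    edges = set()
    for i, j in combinations(range(len(boxes)), 2):
        if overlaps(boxes[i], boxes[j]):
            edges.add((i, j))
    return edges

def concurrent_batch(boxes, edges):
    """Greedily pick mutually non-overlapping nets to route in parallel."""
    chosen = []
    for i in range(len(boxes)):
        if all((min(i, j), max(i, j)) not in edges for j in chosen):
            chosen.append(i)
    return chosen

boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 0, 5, 1), (0, 4, 1, 5)]
edges = conflict_graph(boxes)
print(edges)                           # {(0, 1)}: only nets 0 and 1 overlap
print(concurrent_batch(boxes, edges))  # [0, 2, 3] can be routed concurrently
```

Each batch returned by `concurrent_batch` would be handed to a pool of router threads; net 1 waits for the next batch because its box overlaps net 0's.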
Article
Ten years ago the CAD Research Center at California Polytechnic State University in San Luis Obispo, California identified a standard framework for agent-based decision-support systems. Employing the inter-process and inference engine technologies of the time, the CAD Research Center termed this 'blueprint' the Integrated Collaborative Decision-Making (ICDM) framework. Over the past twelve years ICDM has been successfully used as a foundation in several systems. These systems focus on a wide range of application domains, including architectural design and ship cargo stowage. The success of the ICDM framework, in conjunction with the availability of newer technologies, has prompted an evolutionary leap in the ICDM architecture. Capitalizing on recently introduced technologies such as distributed object servers and web-based computing, the second generation of ICDM promises to maintain its position on the technological cutting edge. This paper describes this second stage in the evolution of the ICDM framework.
Article
Full-text available
Proceedings of a decision-support workshop hosted by the Collaborative Agent Design Research Center of the California Polytechnic State University, San Luis Obispo on May 2-4, 2000. Includes 16 papers by military, government and industry experts in the design, development and utilization of military decision-support systems. With a theme on The Human-Computer Partnership in Decision-Support the proceedings are divided into two sections. Section One includes formal presentations and papers, and Section Two provides a summary of Open Forum discussions that took place on the afternoons of the first two days of the workshop. Papers cover topics dealing with: information representation; information security; information superiority; information assurance; information misuse; communication networks; warfighting experimentation with decision-support systems; and, evolutionary computing applications to decision-support software systems. Discussions focused on: Expeditionary Command and Control Users; Appropriate R&D Directions; System design Requirements; and, Communication Infrastructure.
Article
This paper presents a portable mechanism for vectorization of a Hardware Description Language (HDL) in a multiprocessing environment. Each functional module in the environment is atomized and placed in a centralized event queue using the traditional dynamic event-scheduling mechanism. During the execution phase, however, the set of events at a particular time instant forms what we call a 'rope' of independent events. Each event in this rope is simulated by creating an independent thread in the environment. On a multiprocessing operating system (OS) this means that, depending on the availability of processor time slices, independent threads created from the same process will run in parallel on different processors without any special input from the program, ensuring a platform-independent, portable mechanism. As there is no direct interaction between the program and the OS kernel, it is possible to port the code even to a uniprocessor machine, with worst-case performance equal to that of a sequential simulator.
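The 'rope' formation — popping every event scheduled at the current time instant and simulating each on its own thread before advancing simulated time — might be sketched like this (the event queue, event bodies, and all names are hypothetical, not taken from the paper):

```python
import heapq
import threading

results = []
results_lock = threading.Lock()

def make_event(name):
    def run():
        with results_lock:
            results.append(name)     # stand-in for evaluating one HDL module
    return run

# (time, sequence-number, callback); the sequence number breaks heap ties.
queue = [(0, 0, make_event("a")), (0, 1, make_event("b")), (5, 2, make_event("c"))]
heapq.heapify(queue)

while queue:
    now = queue[0][0]
    rope = []                        # all events at the current time instant
    while queue and queue[0][0] == now:
        rope.append(heapq.heappop(queue)[2])
    # One thread per independent event in the rope; join before advancing time.
    threads = [threading.Thread(target=ev) for ev in rope]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(sorted(results))
```

Events "a" and "b" share a time instant and run concurrently in either order; the join barrier guarantees "c" runs only after the rope at time 0 has completed.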
Article
Sampling-based algorithms have become the favored approach for solving path and motion planning problems due to their ability to deal successfully with complex, high-dimensional planning scenarios. This thesis presents an overview of existing sampling-based path planning methods for tree-structured rigid robots. The two most prominent algorithm families, Rapidly-exploring Dense Trees and Probabilistic Roadmaps, are examined with respect to their computational parallelizability. In addition, a parallel cell-based roadmap planning algorithm is proposed, which utilizes a novel dimensionality reduction technique for configuration space grids. The described methods are benchmarked on a number of 2D scenarios using a newly developed path planning library. The results show that significant speedups can be achieved on average, but that the individual algorithms scale very differently. Critical points in the current implementation are discussed and future improvements are suggested.
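Of the parallelization opportunities such planners offer, roadmap sampling is the most straightforward: collision checks on random configurations are independent of one another, so they can run on separate threads. A hedged sketch of that idea (the obstacle, sample counts, and all names are illustrative, not taken from the thesis):

```python
import random
import threading

def collision_free(q) -> bool:
    """Hypothetical collision checker: rejects samples inside a square obstacle."""
    x, y = q
    return not (0.4 <= x <= 0.6 and 0.4 <= y <= 0.6)

roadmap_nodes = []
nodes_lock = threading.Lock()

def sampler(n_samples: int, seed: int) -> None:
    rng = random.Random(seed)        # per-thread RNG: no shared mutable state
    for _ in range(n_samples):
        q = (rng.random(), rng.random())
        if collision_free(q):        # the expensive check runs concurrently
            with nodes_lock:
                roadmap_nodes.append(q)

threads = [threading.Thread(target=sampler, args=(500, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(roadmap_nodes))            # most of the 2000 samples survive
```

Only the brief append is serialized; in a real planner the costly collision check dominates, which is where the parallel speedup comes from.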
Chapter
Multimedia data is ever increasing, and efficient and effective solutions in multimedia computing and processing are therefore highly sought after. In this paper, we address the problem of analysing and processing multimedia data in a distributed fashion using multiple intelligent agents that communicate via a blackboard interface. We propose a system with three different kinds of agents. A Distributor agent splits multimedia data into smaller segments before placing them on the blackboard. Worker agents retrieve these segments and process them in a distributed fashion. An Accumulator agent then reconstructs the processed multimedia output. Co-ordination of agents is achieved by means of reactive behaviour and communication via the blackboard, thus removing the need for a dedicated control module and associated overheads.
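The three agent roles can be sketched with a thread-safe queue standing in for the blackboard (the segment size, the uppercase "processing", and all names are illustrative):

```python
import queue
import threading

blackboard = queue.Queue()           # shared blackboard: segments to process
done = queue.Queue()                 # processed segments

def distributor(data: str, segment_size: int) -> None:
    """Distributor agent: split input into indexed segments on the blackboard."""
    for i in range(0, len(data), segment_size):
        blackboard.put((i, data[i:i + segment_size]))

def worker() -> None:
    """Worker agent: pull segments off the blackboard and process them."""
    while True:
        try:
            idx, segment = blackboard.get_nowait()
        except queue.Empty:
            return
        done.put((idx, segment.upper()))   # stand-in for real media processing

def accumulator(n_segments: int) -> str:
    """Accumulator agent: reassemble processed segments in original order."""
    parts = sorted(done.get() for _ in range(n_segments))
    return "".join(seg for _, seg in parts)

data = "multimedia data split into segments"
distributor(data, 8)
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

result = accumulator((len(data) + 7) // 8)
print(result)                        # MULTIMEDIA DATA SPLIT INTO SEGMENTS
```

The queue itself coordinates the agents — workers simply react to whatever segments appear on the blackboard — mirroring the paper's point that no dedicated control module is needed.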