Article

Threads primer: a guide to multi-threaded programming

... There are a huge number of parallel programming models and languages taking different approaches [33]. Some of them, such as the Message Passing Interface (MPI) [38], arrange intercommunication via messages sent to and received from other threads, while most of them use some kind of shared memory for exchanging data between threads [39,40]. They differ in how this shared memory is presented to threads, how exclusive and concurrent data access is handled, how synchronization is organized, and how race conditions and deadlocks are avoided [5,6,23,41]. ...
... POSIX threads (Pthreads) is a set of C language interfaces (functions, header files) for threaded programming [39]. It allows a program to control multiple different threads of computational work that overlap in time. ...
... Synchronization can be done via mutual exclusion locks, semaphores, join functions and barriers, which can be implemented with a set of corresponding provided functions. Threads in different processes can be synchronized via synchronization variables in shared memory [39]. In a typical SMP system, such as the Apple MacBook Pro and Apple iMac Pro running the Apple macOS operating system used in this paper, threads are periodically assigned to the processor cores with the least amount of work. ...
Article
Full-text available
Commercial multicore central processing units (CPUs) integrate a number of processor cores on a single chip to support parallel execution of computational tasks. Multicore CPUs can improve performance over single cores for independent parallel tasks nearly linearly as long as sufficient bandwidth is available. Ideal speedup is, however, difficult to achieve when dense intercommunication between the cores or complex memory access patterns are required. This is caused by expensive synchronization and thread switching, and insufficient latency tolerance. These facts guide programmers away from straightforward parallel processing patterns toward complex and error-prone programming techniques. To address these problems, we have introduced the Thick control flow (TCF) Processor Architecture. TCF is an abstraction of parallel computation that combines self-similar threads into computational entities. In this paper, we compare the performance and programmability of an entry-level TCF processor and two Intel Skylake multicore CPUs on commonly used parallel kernels to find out how well our architecture solves these issues that greatly reduce the productivity of parallel software development. Code examples are given and programming experiences recorded.
... Communication is achieved by explicit calls to functions for sending and receiving messages; some programming languages support these operations through special syntax. Synchronization between subtasks is achieved as a part of communication, by using blocking send and receive, or by using barriers [39]. Communication in a message-passing application may be visualized by a graph, with vertices representing tasks and (directed) edges representing communication between the tasks. ...
... We focus on processes and threads, which are used for exploiting multiple CPUs, scheduling, communication and synchronization mechanisms. Our presentation synthesizes the material that can be found in textbooks on operating systems [5] or in reference materials such as [67,39,68]. ...
... Upon receipt of a message, a worker calls the burn_cycles function, shown in figure 5.7, with argument wT0, to consume the specified amount of CPU time, and then it sends a reply message back to p0; the T0 constant has been explained in section 5.2.1.4. In the meantime, p0 waits to receive a reply from all workers, and then begins the next round of distributing work to workers, thus acting as a barrier [39]. The total workload executed by the workers is W = nmw = nmT/d CPU seconds. ...
Article
The appearance of commodity multi-core processors has spawned a wide interest in parallel programming, which is widely regarded as more challenging than sequential programming. KPNs are a model of concurrency that relies exclusively on message passing, and that has some advantages over parallel programming tools in wide use today: simplicity, graphical representation, and determinism. Because of determinism, it is possible to reliably reproduce faults, an otherwise notoriously difficult problem with parallel programs. KPNs have gained acceptance in the simulation and signal-processing communities. In this thesis, we investigate the applicability of KPNs to implementing general-purpose parallel computations for multi-core machines. In particular, we investigate 1) how KPNs can be used for modeling general-purpose problems; 2) how an efficient KPN run-time can be implemented; 3) what KPN scheduling strategies give good run-time performance. For these purposes, we have developed Nornir, an efficient run-time system for executing KPNs. With Nornir, we show that it is possible to develop a high-performance KPN run-time for multi-core machines. We experimentally demonstrate that problems expressed in the Kahn model resemble very much their sequential implementations, yet perform much better than when expressed in the MapReduce model, which has become widely recognized as a simple parallel programming model. Lastly, we use Nornir to evaluate several load-balancing methods: static assignment, work-stealing, our improvement of work-stealing, and a method based on graph partitioning. The understanding brought by this evaluation is significant not only in the context of the Kahn model, but also in the more general context of load-balancing (potentially distributed) applications written in message-passing style.
... For instance, one would need methods to resolve possible conflicts due to simultaneous modifications by different processors of the data stored in the same memory location. This problem is usually solved using mutual exclusion locks (see Lewis and Berg, 1996, for more details on this topic). There is however generally no need to explicitly resolve conflicts when processors are accessing memory locations for reading purposes only, as computer systems typically contain built-in switching circuits that automatically resolve such conflicts. ...
... The development of codes on shared-memory machines is usually done using multi-threading techniques (see Lewis and Berg, 1996, for more details on this topic). Many processors can simultaneously execute threads, which have access to a single copy of the algorithm code and data. ...
... Access to joint data for writing purposes must be done in a way that avoids simultaneous access conflicts and preserves the order of data updating as specified in the algorithm. Shared-memory implementations of this paper were developed using the Solaris Multithreading (MT) library (Lewis and Berg, 1996). Parallel codes for distributed memory machines are a collection of sequential processes that communicate with each other. ...
Article
The development of intelligent transportation systems (ITS) and the resulting need for the solution of a variety of dynamic traffic network models and management problems require faster-than-real-time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete time dynamic networks from all nodes and all departure times to one destination node. The algorithm is known as algorithm DOT and has an optimal worst-case running-time complexity. This implies that no algorithm with a better worst-case computational complexity can be discovered. Consequently, in order to derive algorithms to solve all-to-one shortest path problems in dynamic networks, one would need to explore avenues other than the design of sequential solution algorithms only. The use of commercially available high-performance computing platforms to develop parallel implementations of sequential algorithms is an example of such an avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared-memory and two message-passing dynamic shortest path algorithm implementations, which are derived from algorithm DOT using the following parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded using two types of parallel computing environments: a message-passing environment based on the parallel virtual machine (PVM) library and a multi-threading environment based on the SUN Microsystems Multi-Threads (MT) library. We also develop a time-based parallel version of algorithm DOT for the case of minimum time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an 'ideal' theoretical parallel machine.
The performance of the implementations is analyzed and evaluated using large transportation networks and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared-memory machine containing eight processors. Satisfactory speed-ups over the running time of the sequential algorithms are achieved, in particular on shared-memory machines. Numerical results indicate that shared-memory computers constitute the most appropriate type of parallel computing platform for the computation of dynamic shortest paths for real-time ITS applications.
... A programming technique well suited to implementing irregular applications is based on networks of communicating lightweight processes. A lightweight process (or "thread") [107,103] is an abstraction for expressing multiprogramming (i.e., multiple independent flows of execution) within a system process. The qualifier "lightweight" comes from the fact that managing these entities has a very low cost compared to system processes. ...
... On shared-memory architectures, parallel programming is often based on multithreading techniques [107,108,103,30]. In this model, the parallel program is composed of multiple concurrent flows of execution manipulating data placed in a common memory. ...
Article
Numerical simulation applications requiring the resolution of Partial Differential Equation (PDE) problems are often parallelized using domain decomposition methods. These mathematical methods are well adapted to parallel computing; however, their effective exploitation on parallel machines becomes difficult when the applications have an irregular behavior. This is the case, for example, when the mathematical problems are solved over complex geometries or when one uses mesh refinement techniques. A programming technique that is useful to cope with irregular parallel applications is multithreading. In this thesis we perform a thorough study of the use of this programming paradigm for solving PDE problems through domain decomposition methods, and we show that a generic algorithmic formulation of these methods is possible. One of our main contributions resides in the design and implementation of a programming harness called Ahpik, allowing for easy development of applications relying on domain decomposition methods. This programming environment provides a generic support that is adaptable to many mathematical methods, which can be synchronous or asynchronous, overlapping or non-overlapping. Its object-oriented design makes it possible to encapsulate implementation details concerning the management of threads and communications, which eases the task of developing new methods. We validate the Ahpik environment on the resolution of some classical PDE problems, and in particular on one large problem in computational fluid dynamics.
... Programs that solve several independent tasks which use the same resources need to be synchronized to prevent deadlocks and failures within the computation. Programs that solve independent tasks generally use one thread per task [56]. ...
... An explanation of this effect could be the fact that females and males use different strategies in navigation [40]. Females tend to remember path descriptions with the help of landmarks [56], and they are also more sensitive to verbal interference than males. ...
... Two alternative multi-threading approaches have been realised. Native threading uses the pthreads [25] library, whereas green threading implements scheduling and thread management within the VM itself. For memory management, GCs applying mark/sweep and reference counting [22] have been implemented. ...
Article
Full-text available
CSOM/PL is a software product line (SPL) derived from applying multi-dimensional separation of concerns (MDSOC) techniques to the domain of high-level language virtual machine (VM) implementations. For CSOM/PL, we modularised CSOM, a Smalltalk VM implemented in C, using VMADL (virtual machine architecture description language). Several features of the original CSOM were encapsulated in VMADL modules and composed in various combinations. In an evaluation of our approach, we show that applying MDSOC and SPL principles to a domain as complex as that of VMs is not only feasible but beneficial, as it improves understandability, maintainability, and configurability of VM implementations without harming performance.
... The first operating systems to implement this feature were microkernel-based [11], and until 1991 no OS had a user-level library for creating and using multiple threads [3]. Only from 1996 onward did the major operating systems ...
... The software was completely developed in the C language. The parallel implementation was made through the POSIX API (Lewis and Berg, 1996). The C compiler utilized was the GNU compiler collection (GCC). ...
Article
The comparison and assessment of similarity across metagenomes is still an open problem. Uncultivated samples suffer from high variability, thus making it difficult for heuristic sequence comparison methods to find precise matches in reference databases. Finer methods are required to provide higher accuracy and certainty, although these come at the expense of larger computation times. Therefore, in this work, we present our software for the highly parallel, fine-grained pairwise alignment of metagenomes. First, an analysis of the computational limitations of performing coarse-grained global alignments in a parallel manner is described, and a solution is discussed and employed by our proposal. Second, we show that our development is competitive with state-of-the-art software in terms of speed and consumption of resources, while achieving more accurate results. In addition, the parallel scheme adopted is tested, showing a performance of up to 98% efficiency while using up to 64 cores. Sequential optimizations are also tested and show a speedup of 9× over our previous proposal.
... Another concept applicable to the multi-thread context is thread safety. A piece of code is thread-safe when it can manipulate shared data structures in such a way that it ensures safe execution across multiple threads at the same time (Lewis and Berg, 1995). Similarly, variant applications can be executed from a single instance of a multi-tenant SaaS system that must manipulate shared data structures. ...
Conference Paper
Full-text available
SaaS (Software as a Service) is a service delivery model in which an application can be provided on demand via the Internet. Multi-tenant architecture is essential for SaaS because it enables multiple customers, so-called tenants, to share the system's resources in a transparent way to reduce costs and customize the software layer, resulting in variant applications. Despite the popularity of this model, there have been few cases of evaluation of software testing in cloud computing. Many researchers argue that traditional software testing may not be a suitable way of validating cloud applications owing to the high degree of customization, the dynamic environment and multi-tenancy. User Acceptance Testing (UAT) evaluates the external quality of a product and complements previous testing activities. The main focus of this paper is on investigating the ability of parallel and automated UAT to detect faults with regard to the number of tenants. Thus, our aim is to evaluate to what extent the ability to detect faults varies if a different number of variant applications is executed. A case study was designed with a multi-tenant application called iCardapio and a testing framework created through Selenium and JUnit extensions. The results showed a significant difference in terms of detected faults when single-tenant and multi-tenant test scenarios were included.
... This implies that we have 'four flows of execution'. Each of these flows of execution is called a thread [3]. So, in this figure, we have four threads. ...
Conference Paper
Full-text available
Today, everything has gone distributed for many types of server applications. We have Web servers, application servers, database servers, file servers, and mail servers that maintain worker queues and thread pools to handle large numbers of short tasks that arrive from remote sources. In this paper we analyze multithreaded programs, focusing on ways to improve the efficiency of analyzing interactions between threads. A multithreaded program contains two or more parts that can run concurrently, and each part can handle a different task at the same time, making optimal use of the available resources. Each task is independent of the others. Multithreading is based on the idea of multitasking in applications, where specific operations within a single application are further divided into individual threads. This application of multithreading was developed using the Eclipse IDE. Eclipse consists of a base workspace and an extensible plug-in system for customizing the environment.
... This allows the same application client to be compatible with a variety of object server implementations. Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of thread-based interest clients (Lewis and Berg 1996). That is, the Alert Manager extends the monolithic one-to-one relationship between the IS Server and an IS client into one which supports a one-to-many relationship. ...
Article
Not unlike King Arthur relying on the infamous Round Table as the setting for consultation with his most trusted experts, agent-based, decision-support systems provide human decision makers with a means of solving complex problems through collaboration with collections of both human and computer-based expert agents. The Round Table Framework provides a formalized architecture together with a set of development and execution tools which can be utilized to design, develop, and execute agent-based, decision-support applications. Based on a three-tier architecture, Round Table incorporates forefront technologies including distributed-object servers, inference engines, and web-based presentation to provide a framework for collaborative, agent-based decision making systems.
... This allows the same application client to be compatible with a variety of object server implementations. Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of threadbased interest clients (Lewis and Berg 1996). In this respect, the Alert Manager extends the monolithic one-to-one relationship between the Subscription Server and its clients into one that supports a one-to-many relationship. ...
Article
Full-text available
This report describes work performed by CDM Technologies Inc. in conjunction with the Collaborative Agent Design (CAD) Research Center of California Polytechnic State University (Cal Poly), San Luis Obispo, for the Office of Naval Research (ONR), on the SEAWAY experimental system for planning, gaming and executing maritime logistic operations from a sea base. SEAWAY incorporates three fundamental concepts that distinguish it from existing (i.e., legacy) command and control applications. First, it is a collaborative system in which computer-based agents assist human operators by monitoring, analyzing and reasoning about events in near real-time. Second, SEAWAY includes an ontological model of the sea base that represents the behavioral characteristics and relationships among real world entities such as sea base ships, inbound supply ships, supplies and equipment, infrastructure objects (terrain, intermediate embarkation ports, supply points, roads, and rivers), and abstract notions. This object model provides the essential common language that binds all SEAWAY components into an integrated and adaptive decision-support system. Third, SEAWAY provides no ready made solutions that may not be applicable to the problems that will occur in the real world. Instead, the agents represent a powerful set of tools that together with the human operators can adjust themselves to the problem situations that cannot be predicted in advance. In this respect, SEAWAY is an adaptive logistic command and control system that supports planning, execution and training functions concurrently. SEAWAY is an experimental maritime logistic decision-support system that is intended to provide near real-time adaptive command and control in sustaining joint forces from the sea during contingencies. 
It is based on satisfying the dynamic requirements of joint forces operating ashore, with the ability to provide: offload planning and dynamic re-planning; visibility on all items en route by sea and warehoused at the sea base; track and respond to the dynamic logistic support requirements cycle originating with the supported force ashore; coordinate and control the ship-to-shore ship-to-objective, and ship-to-unit delivery of supplies ashore through a near real-time transport composite operational picture; track supplies and execute reorder; and, provide a full range of warehousing and cargo churning functions aboard the ships of the sea base.
... A first implementation was carried out using shared-memory segments between several processes [91]. Currently, lightweight processes (or threads) are the most efficient means of multiprogramming [61]. The lightweight process programming interface has been standardized for the basic operations (POSIX 1003 standard [43]). ...
Article
In their traditional flavor, Distributed Shared Memory (DSM) libraries allow a number of separate processes to share a common address space using a consistency protocol according to a semantics specified by some given consistency model: sequential consistency, release consistency, etc. The processes may usually be physically distributed among a number of computing nodes interconnected through some communication library. Most approaches to DSM programming assume that the DSM library and the underlying architecture are fixed, and that it is up to the programmer to fit his program to them. This static view does not allow experimentation with alternative implementations. The contribution of this thesis consists in proposing a generic implementation and experimentation platform called DSM-PM, which allows both the application and the underlying DSM consistency protocol to be co-designed and tuned for performance. This platform is entirely implemented in software, in user space. It is portable across a large number of cluster architectures. It provides the basic blocks for implementing and evaluating a large number of multithreaded consistency protocols within a unified framework. Three consistency models are currently supported: sequential consistency, release consistency and Java consistency. Several performance studies have been carried out with multiple multithreaded applications on different clusters, in order to evaluate the proposed consistency protocols. The platform has been validated as a target for a Java compiling system for distributed architectures, called Hyperion.
... The PAWIAN environment was implemented on a Sun Sparc1000 with 4 processors. The parallelism was realized by multithreading techniques (MT) [11] [12]. ...
Article
The increase of processing speed is an important goal in the development of image recognition systems, especially in the case of the recognition of complex objects. The use of parallel computers offers possibilities for the acceleration of the necessary algorithms. However, only little research has been done in the area of high-level image recognition. In our contribution, parallel knowledge-based processing in a hybrid image recognition system is presented. It is based on parallel search strategies and can also be applied to other knowledge-based image recognition systems. The implementation was done on a multiprocessor workstation. Our strategies were confirmed by run time measurements.
Keywords: control algorithm, hybrid image recognition, multithreading, parallel search, knowledge-based systems
1 INTRODUCTION Besides the improvement of the performance of image recognition systems, the increase of processing speed is still an important field of research. This concerns in particular...
... Thus, POSIX Threads (Pthreads) is a library which defines an API to create and manipulate threads in C/C++. Pthreads is an IEEE POSIX (Portable Operating System Interface) standard for threads, which provides efficient ways to expand a running process into new concurrent threads of execution and can run efficiently on computer systems with multiple processors and/or multi-core processors [14]. ...
Article
Full-text available
Advances in multi-core CPUs and in Graphics Processing Units (GPUs) are attracting a lot of attention from the scientific community due to their parallel processing power in conjunction with their low cost. In recent years the resolution of inverse thermal problems (ITP) has been gaining increasing importance and attention in simulation-based applied science and engineering. However, the resolution of these problems is very sensitive to random errors and the computational cost is high. In an attempt to improve the computational performance of solving an ITP, the computational power of multi-core architectures was used and analysed, mainly those offered by the GPU via the Compute Unified Device Architecture (CUDA) and multi-core CPUs via Pthreads. Also, we developed the implementation of the Preconditioned Conjugate Gradient method as a kernel on the GPU to solve several sparse linear systems. Our CUDA and Pthreads-based systems are, respectively, two and four times faster than the serial version, while maintaining comparable convergence behaviour.
... These commands provide information about the threads, and assist in controlling the execution of threads. A more comprehensive list of commands and information about threads can be found in [1][2]. ...
Article
Full-text available
This paper covers the two most popular methods for implementing parallel code: Threads and the Message Passing Interface (MPI). Both methods are discussed in detail to provide information about their implementation issues. An in-depth look is taken into the parallelization libraries that are widely used among programmers. The paper also describes how to write parallel code using these methods. In addition, two characteristics of parallel computing, synchronization and load balancing, are explored. Finally, a performance study of both methods is presented.
... In the current paper, it is shown that this method can be used to provide effective computations on parallel machines using multithreaded model. The libraries OpenMP [9] and PThreads [4] are often used to perform parallel computations on parallel machines with shared memory. For example, the PLASMA package [1] uses the PThreads library to perform linear numerical algebra computations (including the solution of systems of linear equations) in parallel on real-valued vectors and matrices. ...
Conference Paper
In this paper, an approach to the solution of systems of interval linear equations with the use of parallel machines is presented, based on a parallel multithreaded model and the "interval extended zero" method. This approach not only allows us to decrease the undesirable excess width effect, but also makes it possible to avoid inverted interval solutions. The efficiency of this method has already been proved for non-parallel systems. Here it is shown that it can also be used to perform efficient calculations on parallel machines using the multithreaded model.
... Several techniques generally used for achieving software parallelism are based on threads and the Message Passing Interface [2], [3], [4], [5]. These methods provide parallel thread or process execution by means of intra-node and inter-node communication. ...
Conference Paper
Full-text available
As the information society changes, the digital world is making use of larger bulks of data and more complex operations that need to be executed. This trend has led to overcoming processor speed limits by introducing multiprocessor systems. In spite of hardware-level parallelism, software has evolved with various techniques for achieving parallel program execution. Executing a program in parallel can be done efficiently only if the program code follows certain rules. There are many techniques, which tend to provide varying processing speeds. The aim of this paper is to test the Matlab, OpenMPI and Pthreads methods on single-processor, multi-processor, GRID and cluster systems and suggest the optimal method for each particular system.
... A thread [96,97] is a series of instructions within a process, which can be executed as a program. Typically, a process contains only one thread which runs sequentially, sharing the processor with other processes via timeslicing or some other scheduling scheme. ...
Article
This thesis presents a study of applications and techniques for molecular dynamics simulations. Three studies are presented that are intended to improve our ability to simulate larger systems more realistically. A comparison study of two and three-body potential models for liquid and amorphous SiO2 is presented. The structural, vibrational, and dynamic properties of the substance are compared using two- and three-body potential energy models against experimental results. The three-body interaction does poorly at reproducing the experimental phonon density of states, but better at reproducing the Si-O-Si bond angle distribution. The three-body interaction also produces much higher diffusivities than the two-body interactions. A study of tabulated functions in molecular dynamics is presented. Results show that the use of tabulated functions as a method for accelerating the force and potential energy calculation can be advantageous for interactions above a certain complexity level. The decrease in precision due to the use of tabulated functions is negligible when the tables are sufficiently large. Finally, an investigation into the benefits of multi-threaded programming for molecular dynamics is presented.
... In order to exploit all the parallelism and processing power available, specific software infrastructure is needed. For multithreaded programming [1], low-level libraries such as pthreads [2] and Windows Threads [3] and high-level libraries like OpenMP [4] are the most popular options for instrumenting parallel software. Additionally, communication mechanisms and APIs such as MPI [5], PVM [6], and CRL [7] are widely used in distributed application development, offering a variety of primitives for point-to-point and collective operations, as well as process control, startup and shutdown. ...
Article
Full-text available
High-performance application development is increasing due to breakthrough advances in microprocessor and power management technologies and in network speed and reliability. Accordingly, distributed parallel applications make use of message-passing interfaces and multithreaded programming libraries. Nevertheless, drawbacks in message-passing implementations limit the use of thread-safe network communication. This paper presents a thread-safe message-passing interface based on the MPI Standard, assuring correct message ordering and sender/receiver synchronization.
... The master-slave parallel API design is very similar in idea to the streams and substreams design, though it is based on the master-slave model of execution Java uses for threads [18]. This design controls the creation and seeding of sequences in a random number cycle by first initialising all the shared data in a master class. ...
Article
Abstract Scientific computing has long been pushing the boundaries of computational requirements in computer science. An important aspect of scientific computing is the generation of large quantities of random numbers, especially in parallel to take advantage of parallel architectures. Many science and engineering programs require random numbers for applications like Monte Carlo simulation. One environment suitable for parallel computing is Java, though it is rarely used for scientific applications due to its perceived slowness compared to compiled languages like C. Through research and recommendations, Java is slowly being shaped into a viable language for such computationally intensive applications. Java has the potential for such large scale applications, since it is a modern language with a large programmer base and many well received features such as built-in support for parallelism using threads. With improved performance from better compilers, Java is becoming more commonly used for scientific computing, but Java still lacks a number of features like optimised scientific software libraries. This project looks at the effectiveness and efficiency of implementing a parallel random number library in Java using threads, and explores the options for creating a high-quality parallel generator. The parallel random number generator library extends the current java.util.Random to add features, like generator selection, and has been implemented as a set of high-quality generators that can be used sequentially or in parallel without requiring synchronisation. The implementation is efficient, with a selection of tests verifying both efficiency and effectiveness. This project has produced a viable parallel Random API implementation that can be used in parallel scientific applications efficiently and effectively, unlike the current standard Java random generators. Acknowledgements I would like to thank my Supervisor Paul Coddington for all his help and patience
... The simplest form of parallel programming consists of having several instruction streams (threads) that execute while manipulating data stored in a common memory [85,84,81,18]. It is therefore naturally the type of programming most commonly used on SMP machines. ...
Article
Load balancing and data distribution are major problems to solve in order to implement a parallel application. They require choosing when and where computations are performed, and the efficiency of the application depends on these choices. We solve this "scheduling problem" with a recently proposed model: malleable tasks. The introduction to the domain of parallel computing covers the main drawbacks of some standard models; namely, fine-grained modeling of an application requires, in these models, accurate modeling of data exchanges, and the resulting scheduling problems seem, in our opinion, difficult to handle. The malleable task model treats an application as a set of parallel tasks, each executed simultaneously by several processors. The application is still modeled as a standard task graph, but communications are taken into account implicitly in the execution time of each malleable task. We claim this approach simplifies the scheduling problem both practically and theoretically. This document first presents the scheduling of independent malleable tasks. We analyze previous work and propose a new algorithm using almost two shelves with a performance guarantee of 3/2; an average-case analysis of the algorithms is also presented. Some previous results for problems with precedence constraints in related models are recalled, and we propose a first approach to the problem of chains of malleable tasks. Then, an ocean stream simulation is introduced, and the practical use of the malleable task model to schedule this simulation is finally described.
... The experiments consist of measuring the message passing and application execution times for the RM3D application kernel before and after incorporating our optimizations. Multi-threading is an approach which can best exploit the parallelism inherent in SAMR applications. The advantages of using threads are discussed in detail in a number of publications [21] [12]. Among the obvious are the easy use of multiple processors if available, latency hiding, and cheap inter-thread (as opposed to inter-process) communication and synchronization. ...
Article
Abstract of the thesis 'Architecture Specific Communication Optimizations for Structured Adaptive Mesh-Refinement Applications' by Taher Saif. Thesis Director: Professor Manish Parashar. Dynamic Structured Adaptive Mesh Refinement (SAMR) techniques for solving partial differential equations provide a means for concentrating computational effort on appropriate regions in the computational domain. Parallel implementations of these techniques typically partition the adaptive heterogeneous grid hierarchy...
... Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of thread-based interest clients (Lewis and Berg 1996). In other words, the Alert Manager extends the monolithic one-to-one relationship between the Information Server and its client into one which supports a one-to-many relationship. ...
Article
Full-text available
This report describes work performed by the Collaborative Agent Design Research Center for the US Marine Corps Warfighting Laboratory (MCWL), on the IMMACCS experimental decision-support system. IMMACCS (Integrated Marine Multi-Agent Command and Control System) incorporates three fundamental concepts that distinguish it from existing (i.e., legacy) command and control applications. First, it is a collaborative system in which computer-based agents assist human operators by monitoring, analyzing, and reasoning about events in near real-time. Second, IMMACCS includes an ontological model of the battlespace that represents the behavioral characteristics and relationships among real world entities such as friendly and enemy assets, infrastructure objects (e.g., buildings, roads, and rivers), and abstract notions. This object model provides the essential common language that binds all IMMACCS components into an integrated and adaptive decision-support system. Third, IMMACCS provides no ready made solutions that may not be applicable to the problems that will occur in the real world. Instead, the agents represent a powerful set of tools that together with the human operators can adjust themselves to the problem situations that cannot be predicted in advance. In this respect, IMMACCS is an adaptive command and control system that supports planning, execution and training functions concurrently. The report describes the nature and functional requirements of military command and control, the architectural features of IMMACCS that are designed to support these operational requirements, the capabilities of the tools (i.e., agents) that IMMACCS offers its users, and the manner in which these tools can be applied. Finally, the performance of IMMACCS during the Urban Warrior Advanced Warfighting Experiment held in California in March, 1999, is discussed from an operational viewpoint.
... This allows the same application client to be compatible with a variety of object server implementations. Second, the Alert Manager interface allows subscribers to effectively decompose themselves into a dynamic collection of thread-based interest clients (Lewis and Berg 1996). In this respect, the Alert Manager extends the monolithic one-to-one relationship between the Subscription Server and its clients into one that supports a one-to-many relationship. ...
Article
Full-text available
This report provides an overview description of the Integrated Cooperative Decision-Making (ICDM) software toolkit for the development of intelligent decision-support applications. More technical descriptions of ICDM are contained in a companion CDM Technical Report (CDM-18-04) entitled: ‘The ICDM Development Toolkit: Technical Description’. ICDM is an application development framework and toolkit for decision-support systems incorporating software agents that collaborate with each other and human users to monitor changes (i.e., events) in the state of problem situations, generate and evaluate alternative plans, and alert human users to immediate and developing resource shortages, failures, threats, and similar adverse conditions. A core component of any ICDM-based application is a virtual representation of the real world problem (i.e., decision-making) domain. This virtual representation takes the form of an internal information model, commonly referred to as an ontology. By providing context (i.e., data plus relationships) the ontology is able to support the automated reasoning capabilities of rule-based software agents. 
Principal objectives that are realized to varying degrees by the ICDM Toolkit include: support of an ontology-based, information-centric system environment that limits internal communications to changes in information; ability to automatically ‘push’ changes in information to clients, based on individual subscription profiles that are changeable during execution; ability of clients to assign priorities to their subscription profiles; ability of clients to generate information queries in addition to their standing subscription-based requests; automatic management of object relationships (i.e., associations) during the creation, deletion and editing of objects; support for the management of internal communication transmissions through load balancing, self-diagnosis, self-association and self-healing capabilities; and, the ability to interface with external data sources through translators and ontological facades. Most importantly, the ICDM Toolkit is designed to support the machine generation of significant portions of both the server and client side code of an application. This is largely accomplished with scripts that automatically build an application engine by integrating Toolkit components with the ontological properties derived from the internal information model. In this respect, an ICDM-based application consists of loosely coupled, generic services (e.g., subscription, query, persistence, agent engine), which in combination with the internal domain-specific information model are capable of satisfying the functional requirements of the application field. 
Particular ICDM design notions and features that have been incorporated in response to the increasing need for achieving interoperability among heterogeneous systems include: support for overarching ontologies in combination with more specialized, domain-specific, lower level facades; compliance with Defense Information Infrastructure (DII) Common Operating Environment (COE) segmentation principles, and their recent transition to the more challenging information-centric objectives of the Global Information Grid (GIG) Enterprise Services (GES) environment; seamless transition from one functional domain to another; operational integration to allow planning, rehearsal, execution, gaming, and modeling functions to be supported within the same application; and, system diagnosis with the objective of ensuring graceful degradation through self-monitoring, self-diagnosis, and failure alert capabilities. An ICDM-based software development process offers at least four distinct advantages over current data-centric software development practices. First, it provides a convenient structured transition to information-centric software applications and systems in which computer-based agents with reasoning capabilities assist human users to accelerate the tempo and increase the accuracy of decision-making activities. Second, ICDM allows software developers to automatically generate a significant portion of the code, leaving essentially only the domain-specific user-interface functions and individual agents to be designed and coded manually. Third, ICDM disciplines the software development process by shifting the focus from implementation to design, and by structuring the process into clearly defined stages. Each of these stages produces a set of verifiable artifacts, including a well defined and comprehensive documentation trail. 
Finally, ICDM provides a development platform for achieving interoperability by formalizing a common language and compatible representation across multiple applications.
... To overcome the heavy overhead for OS thread switching and synchronization, some runtime systems implement light-weight user-level thread libraries that allow the user to create OS-transparent user-level threads and to explicitly manage thread scheduling and synchronous switching without invoking the OS kernel. However, a conventional user-level thread library, such as the fiber utility [4] in the Windows OS, is unable to perform eventdriven preemptive scheduling. In addition to the ability to synchronously switch between threads, VMT provides applications with the new capability to directly observe for and react to microarchitectural events on a logical processor without any OS involvement. ...
Article
Helper threading is a technology to accelerate a program by exploiting a processor's multithreading capability to run "assist" threads. Previous experiments on hyper-threaded processors have demonstrated significant speedups by using helper threads to prefetch hard-to-predict delinquent data accesses. In order to apply this technique to processors that do not have built-in hardware support for multithreading, we introduce virtual multithreading (VMT), a novel form of switch-on-event user-level multithreading, capable of fly-weight multiplexing of event-driven thread executions on a single processor without additional operating system support. The compiler plays a key role in minimizing synchronization cost by judiciously partitioning register usage among the user-level threads. The VMT approach makes it possible to launch dynamic helper thread instances in response to long-latency cache miss events, and to run helper threads in the shadow of cache misses when the main thread would be otherwise stalled. The concept of VMT is prototyped on an Itanium® 2 processor using features provided by the Processor Abstraction Layer (PAL) firmware mechanism already present in currently shipping processors. On a 4-way MP physical system equipped with VMT-enabled Itanium 2 processors, helper threading via the VMT mechanism can achieve significant performance gains for a diverse set of real-world workloads, ranging from single-threaded workstation benchmarks to heavily multithreaded large scale decision support systems (DSS) using the IBM DB2 Universal Database. We measure a wall-clock speedup of 5.8% to 38.5% for the workstation benchmarks, and 5.0% to 12.7% on various queries in the DSS workload.
... IEEE defines a POSIX standard API, referred to as Pthreads (IEEE 1003.1c), for thread creation and synchronization [4]. Many contemporary systems, including Linux, Solaris, and Mac OS X, implement Pthreads. ...
Article
Full-text available
This paper describes the experience of the authors in using SDL threads to develop computer science and engineering course materials that cover multi-threaded programming. The courses include data structures, operating systems, computer graphics and video game programming. The techniques developed are also used in the work of students' independent study projects and master's projects. In particular, they have been used to support an NSF CPATH grant to revitalize computer science education and promote computational thinking. This paper includes 3 simple example programs that illustrate threading concepts.
... More important, however, is the overhead associated with context switching and synchronization. This context switching time can be as low as 5 µs in the case of an application-level thread context switch, and up to 300 µs when synchronizing two lightweight processes on a condition variable [7]. The RTI design allows for the runtime configuration of the underlying thread model used. ...
Article
A recent DMSO (Defense Modeling and Simulation Office) initiative resulted in a new RTI design and build effort. This paper describes the design constructs used in the RTI 2.0 architecture and the driving principles used throughout the design process. Key architectural features are identified and analyzed in terms of meeting the RTI's set of requirements. Concepts such as system scalability, runtime performance, federation-specific tuning, reliability, and maintainability are discussed within the confines of the RTI 2.0 architecture. This paper presents information representing the HLA development process underway by the DMSO and the DoD AMG (Architecture Management Group).
Chapter
Full-text available
This study offers a step-by-step practical procedure, from the analysis of the current status of the spare parts inventory system to advanced service level analysis by means of a simulation-optimization technique, for a real-world case study associated with a seaport. The remarkable variety and immense diversity, on one hand, and extreme complexities not only in consumption patterns but also in the supply of spare parts in an international port with technically advanced port operator machinery, on the other hand, have convinced the managers to deal with this issue in a structural framework. The huge amount of available data requires cleaning and classification in order to process it properly and derive reorder point (ROP) estimations, reorder quantity (ROQ) estimations, and the associated service level analysis. Finally, from 247,000 items used over nine years, 1416 inventory items are selected as a result of ABC analysis integrated with the analytic hierarchy process (AHP), yielding the main items that need to be kept under strict inventory control. The ROPs and the pertinent quantities are simulated by Arena software for all the main items, each of which took approximately 30 minutes of run time on a personal computer to determine near-optimal estimations.
Thesis
Full-text available
Many contemporary composers and sound artists are using sensing systems based on control-voltage-to-MIDI converters and laptop computers running algorithmic composition software to create interactive instruments and responsive environments. Using an integrated device that encapsulates the entire system for performance can reduce latency, improve system stability, and reduce setup complexity. This research addresses the issues of how one can develop such a device, including the techniques one would use to make the design easily upgradeable as newer technologies become available, the programming interface that should be employed for use by artists and composers, the knowledge bases and specialist expert skills that can be utilised to gain the required information to design and build such devices, and the low-cost hardware and software development tools appropriate for such a task. This research resulted in the development of the Smart Controller, a portable hardware/software device that allows performers to create music using programmable logic control. The device can be programmed remotely through the use of a patch editor or Workbench, an independent computer application that simulates and communicates with the hardware. The Smart Controller responds to input control voltages, Open Sound Control, and MIDI messages, producing output control voltages, Open Sound Control, and MIDI messages (depending upon the patch). The Smart Controller is a stand-alone device—a powerful, reliable, and compact instrument—capable of reducing the number of electronic modules required in a live performance or sound installation, particularly the requirement for a laptop computer.
The success of this research was significantly facilitated by the use of the iterative development technique known as the Unified Process instead of the traditional Waterfall model, and by the use of the RTEMS real-time operating system, originally designed for guided missile systems, as the underlying scheduling system for the embedded hardware.
Conference Paper
Deadlock occurs when all threads of a program remain in their current state and cannot move forward. These threads execute concurrently on multi-core CPUs. As the execution order of their code lines is uncertain, it is extremely difficult to locate the exact position where deadlock occurs without modifying the source code. C/C++, Qt, and Java are three commonly used programming languages in Linux. This paper presents an intelligent scheme for locating deadlock in these languages. By modifying the kernels of pthreads, Qt, and OpenJDK, we redesign three kinds of resource functions: mutex, lock, and semaphore. At runtime, the file names and line numbers of these functions called by a user's program are written to a shared memory database called Redis. The data in Redis can be fetched by two tools. One graphical tool is responsible for displaying the usage of resources and performing deadlock analysis. The other is used to detect deadlock periodically and write it to a journal file, or to notify users by mail or short message. A plugin is also developed for each of QtCreator and Eclipse; both tools can be started from either plugin. The deadlock detection method does not need to modify the source code of a user program, which greatly facilitates the user in determining the location of deadlock.
Article
The Decision Support Workshop of May 2-4, 2000 held in San Luis Obispo, Cal., was the second in a series that was started one year earlier as a joint project of the Office of Naval Research and the Collaborative Agent Design Research Center of Cal Poly. The goal of this series of Workshops is to provide a forum where connections can be established on one hand between developers and proponents of decision support tools, with potential users such as managers of large, complex organizations/systems on the other. Clearly, the military belong to this class of users and it is therefore not surprising that ONR has a vested interest in promoting research in this particular field. It is also clear that the class of potential users is not restricted to the military - in fact civilian government bodies as well as business and industry entities should be strongly interested in adopting these tools (and their future refinements) for their own specific purposes. The list of the speakers and the topics presented during the Workshop does indeed attest to the variety of areas where decision support systems are already in use. This Workshop has concentrated on the human-computer interaction. Although computers are after all man-made devices, there is a peculiarity in the way humans interact with a computer that has no parallel in human-human interactions. This was brought out in an interesting talk by Dr. Ron DeMarco. Other areas where computers play a major role included the topic of how information is handled, secured, and assured. Since the basis of all decision making is accurate , uncontaminated information, this is a very important topic that was excellently treated by Mr. Steve York and Ms. Virginia Wiggins in their presentations. Other highlights included a thought-provoking talk by RADM C. L. Munns that raised many questions concerning decision support in the Fleet. 
An interesting description of the risks of misusing information technology was given, with his usual verve, by Dr. Gary Klein. The reader of these Proceedings will find other excellent discussions of decision support systems, in particular the agent-based ones described by the senior staff of CADRC.
Article
This report provides an overview description of the Toolkit for Information Representation and Agent Collaboration (TIRAC™) software framework for the development of intelligent decision-support applications. More technical descriptions of TIRAC™ are contained in a companion CDM Technical Report (CDM-19-03) entitled: ‘The TIRAC™ Development Toolkit: Technical Description’. TIRAC™ is an application development framework and toolkit for decision-support systems incorporating software agents that collaborate with each other and human users to monitor changes (i.e., events) in the state of problem situations, generate and evaluate alternative plans, and alert human users to immediate and developing resource shortages, failures, threats, and similar adverse conditions. A core component of any TIRAC-based application is a virtual representation of the real world problem (i.e., decision-making) domain. This virtual representation takes the form of an internal information model, commonly referred to as an ontology. By providing context (i.e., data plus relationships) the ontology is able to support the automated reasoning capabilities of rule-based software agents. 
Principal objectives that are realized to varying degrees by the TIRAC™ toolkit include: support of an ontology-based, information-centric, distributed system environment that limits internal communications to changes in information; ability to automatically ‘push’ changes in information to clients, based on individual subscription profiles that are changeable during execution; ability of clients to assign priorities to their subscription profiles; ability of clients to generate information queries in addition to their standing subscription-based requests; automatic management of object relationships (i.e., associations) during the creation, deletion and editing of objects; support for the management of internal communication transmissions through load balancing, self-diagnosis, self-association and self-healing capabilities; and, the ability to interface with external data sources through translators and ontological facades. Most importantly, the TIRAC™ toolkit is designed to support the machine generation of significant portions of both the server and client side code of an application. This is largely accomplished with scripts that automatically build an application engine by integrating toolkit components with the ontological properties derived from the internal information model. In this respect, an TIRAC-based application consists of loosely coupled, generic services (e.g., subscription, query, persistence, agent engine), which in combination with the internal domain-specific information model are capable of satisfying the functional requirements of the application field. 
Particular TIRAC™ design notions and features that have been incorporated in response to the increasing need for achieving interoperability among heterogeneous systems include: support for overarching ontologies in combination with more specialized, domain-specific, lower level facades; compliance with Defense Information Infrastructure (DII) Common Operating Environment (COE) segmentation principles, and their recent transition to the more challenging information-centric objectives of the Global Information Grid (GIG) Enterprise Services (GES) environment; seamless transition from one functional domain to another; operational integration to allow planning, rehearsal, execution, gaming, and modeling functions to be supported within the same application; and, system diagnosis with the objective of ensuring graceful degradation through self-monitoring, self-diagnosis, and failure alert capabilities. An TIRAC-based software development process offers at least four distinct advantages over current data-centric software development practices. First, it provides a convenient structured transition to information-centric software applications and systems in which computer-based agents with reasoning capabilities assist human users to accelerate the tempo and increase the accuracy of decision-making activities. Second, TIRAC™ allows software developers to automatically generate a significant portion of the code, leaving essentially only the domain-specific user-interface functions and individual agents to be designed and coded manually. Third, TIRAC™ disciplines the software development process by shifting the focus from implementation to design, and by structuring the process into clearly defined stages. Each of these stages produces a set of verifiable artifacts, including a well defined and comprehensive documentation trail. 
Finally, TIRAC™ provides a development platform for achieving interoperability by formalizing a common language and compatible representation across multiple applications.
Chapter
One major problem with non-rigid image registration techniques is their high computational cost. Because of this, these methods have found limited application to clinical situations where fast execution is required, e.g., intra-operative imaging. This paper applies a parallel implementation of a non-rigid image registration algorithm to pre- and intra-operative MR images and quantitatively analyzes its scaling properties. The method computes the intra-operative brain deformation in about one minute using 64 CPUs on a 128-CPU shared-memory supercomputer (SGI Origin 3800). The serial component is no more than 2 percent of the total computation time, allowing a speedup of at least a factor of 50. In most cases, the theoretical limit of the speedup is substantially higher (up to 132-fold in the application examples presented in this paper). Our parallel algorithm is therefore capable of solving non-rigid registration problems with short execution time requirements and may be considered an important step in the application of such techniques to clinically important problems such as the computation of brain deformation during cranial image-guided surgery.
Article
INTRODUCTION A thread, also known as a lightweight process, allows numerous paths of execution through a program to be traversed at the same time. Multitasking is the ability to run numerous processes on a single CPU, in a similar fashion to how threads appear to execute in parallel in a multithreaded system. The multithreaded methodology is a programming paradigm that is well suited to parallel and distributed processing. Multithreading is supported on the majority of modern computer systems, so an optimal multithreaded implementation is highly desirable. The multithreading philosophy has impacted many areas of computer science, and its application has shown benefits in areas such as artificial intelligence. A concept that needs to be understood is the difference between threaded implementations that support user-level threads and those that support kernel-level threads. While both user-level and kernel-level threads are considered lightweight processes, the ...
Article
Full-text available
This paper summarizes an interdisciplinary, fourth-semester, undergraduate course in which the development of a small, autonomous robot serves as the focus application. The disciplines of microprocessors, programming, digital and analog electronics, mathematical modeling, dynamical systems, and control theory are the elements of this course. The aim of the course is to teach the basic theory and how to complete an engineering design project from specification to working model of the specified product. During the course students work in teams and build the robots, which perform a compulsory task and a free task. The course ends with a competition: the 3 STARS Robot Race. The competition to design the best robot is one of the most important motivating factors. We conclude with a discussion of the evaluation results and the students' own opinions of this learning method.
Article
An embedded positioning system that can be applied to seismic exploration is designed. To meet the characteristics of field work and the demand for real-time data transmission, the system is composed of positioning terminals, supplemented by earthquake decision-making experts and a monitoring center. Positioning data are transmitted and queried in real time between the mobile terminals and the monitoring center via mobile phone messages.
Conference Paper
We propose a platform-independent multithreaded routing method for FPGAs with two aspects: a single high-fanout net is routed in parallel within itself, and several low-fanout nets are routed in parallel with one another. Routing high-fanout nets usually takes considerable time because of the large physical area enclosed by their bounding boxes and the tens of terminals to connect. Therefore, one high-fanout net is partitioned into several subnets with fewer terminals and smaller bounding boxes, which are routed in parallel. Low-fanout nets, with their intrinsically small bounding boxes and few terminals, can hardly be divided; instead, low-fanout nets whose bounding boxes do not overlap are routed concurrently. A new graph, named the bounding box graph, is used to facilitate the selection of nets to route concurrently. In this graph, each vertex stands for a net, and an edge between two vertices means that the represented nets have overlapping bounding boxes. Several strategies are introduced to balance the load among threads and ensure deterministic results. Routing time scales down with an increasing number of threads. On a 4-core processor, this technique improves run-time by ~1.9× with routing quality degrading by no more than 2.3%.
Conference Paper
A platform-independent multithreaded routing method for FPGAs is proposed in this paper. Specifically, the method includes two aspects for maximal parallelization. First, a high-fanout net, which usually takes considerable time to route due to its large bounding box and number of terminals, is partitioned into several subnets to be routed in parallel. Second, low-fanout nets with non-overlapping bounding boxes are identified and routed in parallel as well, to further speed up the routing process. A bounding box graph is constructed to facilitate the selection of nets to route concurrently. In addition, load-balancing and synchronization strategies are introduced to raise routing efficiency and ensure deterministic results. Experiments on different platforms and benchmarks with various combinations of high- and low-fanout nets are carried out. This technique improves run-time by ~1.9× with routing quality degrading by no more than 2.3% on a quad-core processor platform.
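The bounding box graph idea can be sketched as a tiny, hypothetical Python model: nets conflict when their axis-aligned bounding boxes overlap, and a batch of mutually non-conflicting nets (an independent set in that graph) can be routed concurrently. Box coordinates, function names, and the greedy selection heuristic are illustrative; the paper's actual load-balancing and determinism strategies are not reproduced here.

```python
from itertools import combinations

def overlaps(a, b) -> bool:
    """True if two axis-aligned boxes (xmin, ymin, xmax, ymax) intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def conflict_graph(boxes):
    """Bounding box graph: edge (i, j) iff nets i and j overlap."""
    edges = set()
    for i, j in combinations(range(len(boxes)), 2):
        if overlaps(boxes[i], boxes[j]):
            edges.add((i, j))
    return edges

def concurrent_batch(boxes, edges):
    """Greedily pick mutually non-overlapping nets to route in parallel."""
    chosen = []
    for i in range(len(boxes)):
        if all((min(i, j), max(i, j)) not in edges for j in chosen):
            chosen.append(i)
    return chosen

boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 0, 5, 1), (0, 4, 1, 5)]
edges = conflict_graph(boxes)
print(edges)                           # {(0, 1)}: only nets 0 and 1 overlap
print(concurrent_batch(boxes, edges))  # [0, 2, 3] can be routed concurrently
```

Each batch returned by `concurrent_batch` would be handed to a pool of router threads; net 1 waits for the next batch because its box overlaps net 0's.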
Article
Ten years ago the CAD Research Center at California Polytechnic State University in San Luis Obispo, California identified a standard framework for agent-based decision-support systems. Employing the inter-process and inference engine technologies of the time, the CAD Research Center termed this 'blueprint' the Integrated Collaborative Decision-Making (ICDM) framework. Over the past twelve years ICDM has been successfully used as a foundation in several systems. These systems focus on a wide range of application domains, including architectural design and ship cargo stowage. The success of the ICDM framework, in conjunction with the availability of newer technologies, has prompted an evolutionary leap in the ICDM architecture. Capitalizing on recently introduced technologies such as distributed object servers and web-based computing, the second generation of ICDM promises to maintain its position on the technological cutting edge. This paper describes this second stage in the evolution of the ICDM framework.
Article
Full-text available
Proceedings of a decision-support workshop hosted by the Collaborative Agent Design Research Center of the California Polytechnic State University, San Luis Obispo on May 2-4, 2000. Includes 16 papers by military, government and industry experts in the design, development and utilization of military decision-support systems. With a theme on The Human-Computer Partnership in Decision-Support the proceedings are divided into two sections. Section One includes formal presentations and papers, and Section Two provides a summary of Open Forum discussions that took place on the afternoons of the first two days of the workshop. Papers cover topics dealing with: information representation; information security; information superiority; information assurance; information misuse; communication networks; warfighting experimentation with decision-support systems; and, evolutionary computing applications to decision-support software systems. Discussions focused on: Expeditionary Command and Control Users; Appropriate R&D Directions; System design Requirements; and, Communication Infrastructure.
Article
This paper presents a portable mechanism for vectorization of a Hardware Description Language (HDL) in a multiprocessing environment. Each functional module in the environment is atomized and placed in a centralized event queue using the traditional dynamic event-scheduling mechanism. During the execution phase, however, the set of events at a particular time instant forms what we call a 'rope' of independent events. Each event in this rope is simulated by creating an independent thread in the environment. On a multiprocessing operating system (OS) this means that, depending on the availability of processor time slices, independent threads created from the same process will run in parallel on different processors without any special input from the program, ensuring a platform-independent, portable mechanism. As there is no direct interaction between the program and the OS kernel, it is possible to port the code even to a uniprocessor machine, with worst-case performance equal to that of a sequential simulator.
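The 'rope' formation — popping every event scheduled at the current time instant and simulating each on its own thread before advancing simulated time — might be sketched like this (the event queue, event bodies, and all names are hypothetical, not taken from the paper):

```python
import heapq
import threading

results = []
results_lock = threading.Lock()

def make_event(name):
    def run():
        with results_lock:
            results.append(name)     # stand-in for evaluating one HDL module
    return run

# (time, sequence-number, callback); the sequence number breaks heap ties.
queue = [(0, 0, make_event("a")), (0, 1, make_event("b")), (5, 2, make_event("c"))]
heapq.heapify(queue)

while queue:
    now = queue[0][0]
    rope = []                        # all events at the current time instant
    while queue and queue[0][0] == now:
        rope.append(heapq.heappop(queue)[2])
    # One thread per independent event in the rope; join before advancing time.
    threads = [threading.Thread(target=ev) for ev in rope]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(sorted(results))
```

Events "a" and "b" share a time instant and run concurrently in either order; the join barrier guarantees "c" runs only after the rope at time 0 has completed.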
Article
Sampling-based algorithms have become the favored approach for solving path and motion planning problems due to their ability to deal successfully with complex, high-dimensional planning scenarios. This thesis presents an overview of existing sampling-based path planning methods for tree-structured rigid robots. The two most prominent algorithm families, Rapidly-exploring Dense Trees and Probabilistic Roadmaps, are examined with respect to their computational parallelizability. In addition, a parallel cell-based roadmap planning algorithm is proposed, which utilizes a novel dimensionality reduction technique for configuration space grids. The described methods are benchmarked on a number of 2D scenarios using a newly developed path planning library. The results show that significant speedups can be achieved on average, but that the individual algorithms scale very differently. Critical points in the current implementation are discussed and future improvements are suggested.
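Of the parallelization opportunities such planners offer, roadmap sampling is the most straightforward: collision checks on random configurations are independent of one another, so they can run on separate threads. A hedged sketch of that idea (the obstacle, sample counts, and all names are illustrative, not taken from the thesis):

```python
import random
import threading

def collision_free(q) -> bool:
    """Hypothetical collision checker: rejects samples inside a square obstacle."""
    x, y = q
    return not (0.4 <= x <= 0.6 and 0.4 <= y <= 0.6)

roadmap_nodes = []
nodes_lock = threading.Lock()

def sampler(n_samples: int, seed: int) -> None:
    rng = random.Random(seed)        # per-thread RNG: no shared mutable state
    for _ in range(n_samples):
        q = (rng.random(), rng.random())
        if collision_free(q):        # the expensive check runs concurrently
            with nodes_lock:
                roadmap_nodes.append(q)

threads = [threading.Thread(target=sampler, args=(500, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(roadmap_nodes))            # most of the 2000 samples survive
```

Only the brief append is serialized; in a real planner the costly collision check dominates, which is where the parallel speedup comes from.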
Chapter
Multimedia data is ever increasing, and efficient and effective solutions in multimedia computing and processing are therefore highly sought after. In this paper, we address the problem of analysing and processing multimedia data in a distributed fashion using multiple intelligent agents that communicate via a blackboard interface. We propose a system with three different kinds of agents. A Distributor agent splits multimedia data into smaller segments before placing them on the blackboard. Worker agents retrieve these segments and process them in a distributed fashion. An Accumulator agent then reconstructs the processed multimedia output. Co-ordination of agents is achieved by means of reactive behaviour and communication via the blackboard, thus removing the need for a dedicated control module and associated overheads.
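The three agent roles can be sketched with a thread-safe queue standing in for the blackboard (the segment size, the uppercase "processing", and all names are illustrative):

```python
import queue
import threading

blackboard = queue.Queue()           # shared blackboard: segments to process
done = queue.Queue()                 # processed segments

def distributor(data: str, segment_size: int) -> None:
    """Distributor agent: split input into indexed segments on the blackboard."""
    for i in range(0, len(data), segment_size):
        blackboard.put((i, data[i:i + segment_size]))

def worker() -> None:
    """Worker agent: pull segments off the blackboard and process them."""
    while True:
        try:
            idx, segment = blackboard.get_nowait()
        except queue.Empty:
            return
        done.put((idx, segment.upper()))   # stand-in for real media processing

def accumulator(n_segments: int) -> str:
    """Accumulator agent: reassemble processed segments in original order."""
    parts = sorted(done.get() for _ in range(n_segments))
    return "".join(seg for _, seg in parts)

data = "multimedia data split into segments"
distributor(data, 8)
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

result = accumulator((len(data) + 7) // 8)
print(result)                        # MULTIMEDIA DATA SPLIT INTO SEGMENTS
```

The queue itself coordinates the agents — workers simply react to whatever segments appear on the blackboard — mirroring the paper's point that no dedicated control module is needed.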