Valerie Taylor
Texas A&M University | TAMU · Department of Computer Science and Engineering

PhD

About

145 Publications
14,294 Reads
3,371 Citations

Publications (145)
Article
Energy-efficient scientific applications require insight into how high-performance computing system features impact the applications' power and performance. This insight results from the development of performance and power models. When used with an earthquake simulation and an aerospace application, a proposed modeling framework reduces energy con...
Conference Paper
The CORAL Scalable Science Benchmarks are full applications that are expected to test the full scale of future CORAL systems. It is important to better understand the power and performance characteristics of these benchmarks before using them to test future CORAL systems. In this paper we present an in-depth analysis of the power and performance characteristi...
Article
Understanding workload behavior plays an important role in performance studies. The growing complexity of applications and architectures has increased the gap among application developers, performance engineers, and hardware designers. To reduce this gap, we propose SKOPE, a SKeleton framework for Performance Exploration, that produces a descriptiv...
Article
Full-text available
Many/multi-core supercomputers provide a natural programming paradigm for hybrid MPI/OpenMP scientific applications. In this paper, we investigate the performance characteristics of five hybrid MPI/OpenMP scientific applications (two NAS Parallel benchmarks Multi-Zone SP-MZ and BT-MZ, an earthquake simulation PEQdyna, an aerospace application PMLB...
Article
Full-text available
The academic performance and engagement of youth from under-represented ethnic groups (African American, Latino, and Indigenous) in science, technology, engineering, and mathematics (STEM) show statistically large gaps in comparison with their White and Asian peers. Some of these differences can be attributed to the direct impact of economic forces...
Article
In this paper, we present the Energy-Aware Modeling and Optimization Methodology (E-AMOM) framework, which develops models of runtime and power consumption based upon performance counters and uses these models to identify energy-based optimizations for scientific applications. E-AMOM utilizes predictive models to employ run-time Dynamic Voltage and...
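For illustration, a counter-based power model of the kind E-AMOM builds can be sketched as a least-squares fit; the counter rates and wattages below are hypothetical, not E-AMOM's actual inputs or coefficients:

```python
import numpy as np

# Hypothetical training data: rows are application runs, columns are
# per-second hardware counter rates (e.g., cache misses, FLOPs, TLB misses).
counters = np.array([
    [1.2e7, 3.4e9, 2.1e5],
    [0.8e7, 2.9e9, 1.7e5],
    [2.3e7, 4.1e9, 3.0e5],
    [1.9e7, 3.8e9, 2.6e5],
])
measured_power = np.array([212.0, 198.5, 240.3, 231.1])  # watts

# Fit power ~ b0 + b1*c1 + b2*c2 + b3*c3 by ordinary least squares.
X = np.hstack([np.ones((counters.shape[0], 1)), counters])
coeffs, *_ = np.linalg.lstsq(X, measured_power, rcond=None)

# Predict power for a new run from its counter rates.
new_run = np.array([1.5e7, 3.6e9, 2.4e5])
predicted = coeffs[0] + coeffs[1:] @ new_run
print(f"predicted power: {predicted:.1f} W")
```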
Conference Paper
MuMMI (Multiple Metrics Modeling Infrastructure) is an environment that facilitates systematic measurement, modeling, and prediction of performance, power consumption, and performance-power tradeoffs for parallel systems. MuMMI builds upon three existing frameworks: Prophesy for performance modeling and prediction of parallel applicat...
Conference Paper
The MuMMI (Multiple Metrics Modeling Infrastructure) project provides an infrastructure that facilitates systematic measurement, modeling, and prediction of performance, power consumption, and performance-power tradeoffs for parallel systems. In this paper, we present the MuMMI framework, which consists of an Instrumentor, Databases, and an Analyzer. The MuM...
Conference Paper
In this paper, we investigate the performance characteristics of five hybrid MPI/OpenMP scientific applications (two NAS Parallel benchmarks Multi-Zone SP-MZ and BT-MZ, an earthquake simulation PEQdyna, an aerospace application PMLB and a 3D particle-in-cell application GTC) on a large-scale multithreaded Blue Gene/Q supercomputer at Argonne Nation...
Article
Earthquakes are one of the most destructive natural hazards on our planet Earth. Huge earthquakes striking offshore may cause devastating tsunamis, as evidenced by the 11 March 2011 Japan (moment magnitude Mw 9.0) and the 26 December 2004 Sumatra (Mw 9.1) earthquakes. Earthquake prediction (in terms of the precise time, place, and magnitude of a...
Article
In this paper, we integrate a 3D mesh generator into the simulation and use MPI to parallelize it, illustrate an element-based partitioning scheme for explicit finite element methods, and, based on this partitioning scheme and what we learned from our previous work, implement our hybrid MPI/OpenMP finite element earthquake simu...
Conference Paper
Full-text available
Surrogate-based Workload Application Performance Projection (SWAPP) is a framework for performance projections of High Performance Computing (HPC) applications using benchmark data. Performance projections of HPC applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in th...
Article
Full-text available
Modern large-scale scientific computation problems must execute in a parallel computational environment to achieve acceptable performance. Target parallel environments range from the largest tightly-coupled supercomputers to heterogeneous clusters of workstations. Grid technologies make Internet execution more likely. Hierarchical and heterogeneous...
Article
Seeking a comprehensive view of minority student demographics to determine what programs and policies are needed to promote diversity.
Article
Full-text available
Predictive models enable a better understanding of the performance characteristics of applications on multicore systems. Previous work has utilized performance counters in a system-centered approach to model power consumption for the system, CPU, and memory components. Often, these approaches use the same group of counters across different applicat...
Article
Full-text available
Energy consumption is a major concern with high-performance multicore systems. In this paper, we explore the energy consumption and performance (execution time) characteristics of different parallel implementations of scientific applications. In particular, the experiments focus on message-passing interface (MPI)-only versus hybrid MPI/OpenMP imple...
Conference Paper
In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore clusters: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP...
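As a rough sketch of such a model (with made-up parameters rather than the paper's calibrated ones), runtime can be decomposed into compute time, memory time under bandwidth contention, and a latency/bandwidth communication term:

```python
# Minimal bandwidth-contention runtime model; all parameters hypothetical.

def predicted_time(bytes_moved, peak_bw_gbs, cores_sharing,
                   compute_time_s, msgs, latency_s, msg_bytes, net_bw_gbs):
    """Runtime = compute + contended memory time + communication time."""
    # Effective per-core bandwidth degrades as more cores share the bus.
    effective_bw = peak_bw_gbs / cores_sharing
    memory_time = bytes_moved / (effective_bw * 1e9)
    # Parameterized communication model: latency term + volume / bandwidth.
    comm_time = msgs * latency_s + msg_bytes / (net_bw_gbs * 1e9)
    return compute_time_s + memory_time + comm_time

print(predicted_time(bytes_moved=4e10, peak_bw_gbs=25.6, cores_sharing=4,
                     compute_time_s=12.0, msgs=1e4, latency_s=2e-6,
                     msg_bytes=8e8, net_bw_gbs=5.0))
```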
Article
Full-text available
Chip multiprocessors (CMP) are widely used for high performance computing and are being configured in a hierarchical manner to compose a CMP compute node in a CMP system. Such a CMP system provides a natural programming paradigm for hybrid MPI/OpenMP applications. In this paper, we use OpenMP to parallelize a sequential earthquake simulation code f...
Article
The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used for data sharing among the cores that comprise a node and MPI can be used for the communication b...
Article
Introducing CMD-IT, a new center focused on synergistic activities related to ethnic minorities and people with disabilities.
Chapter
Contents: Introduction; Cosmology SAMR Applications; Design of DistDLB; Experiments; Conclusion and Future Work; Acknowledgments; References
Chapter
Contents: Introduction; Major Issues Related to Mesh Partitioning for Distributed Systems; Description of PART; Parallel Simulated Annealing; Experiments; Previous Work; Conclusion and Future Work; References
Conference Paper
Chip multiprocessors (CMP) are widely used for high performance computing and are being configured in a hierarchical manner to compose a CMP compute node in a parallel system. OpenMP parallel programming within such a CMP node can take advantage of the globally shared address space and on-chip high inter-core bandwidth and low inter-core latency. I...
Conference Paper
Full-text available
Performance projections of High Performance Computing (HPC) applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in the design of future systems, enable them to compare the application performance across different existing and future systems, and help HPC users with syst...
Article
Full-text available
Chip multiprocessors (CMP) are widely used for high performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system. A major challenge to be addressed is efficient use of such cluster systems for large-scale scientific applications. In this paper, we quantify the performance gap result...
Conference Paper
Full-text available
Resource sharing and implementation of software stack for emerging multicore processors introduce performance and scaling challenges for large-scale scientific applications, particularly on systems with thousands of processing elements. Traditional performance optimization, tuning and modeling techniques that rely on uniform representation of compu...
Conference Paper
Full-text available
Chip multiprocessors (CMP) are widely used for high performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system. A major challenge to be addressed is efficient use of such cluster systems for large-scale scientific applications. In this paper, we quantify the performance gap resulti...
Conference Paper
One major challenge for grid environments is how to efficiently utilize geographically distributed resources given the large communication latency introduced by the wide area networks interconnecting different sites. In this paper, we use optical networks to connect four clusters from three different sites: Texas A&M University, University of Illinois at C...
Article
In this work, we study the performance and reliability of a crossbar molecular switch nanomemory demultiplexer and present the results. In particular, we investigate the impact on the performance of a crossbar nanomemory demultiplexer of implementing a combination of error correction coding and multi-switch junction fault tolerance schemes...
Article
Full-text available
Nanoscale elements are fabricated using bottom-up processes, and as such are prone to high levels of defects. Therefore, fault-tolerance is crucial for the realization of practical nanoscale devices. In this paper, we investigate a fault-tolerance scheme that utilizes redundancies in the rows and columns of a nanoscale crossbar molecular switch mem...
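A minimal sketch of the underlying idea, with a hypothetical interface: logical rows that test defective are remapped onto spare physical rows (the paper's scheme covers both rows and columns):

```python
# Remap defective rows of a crossbar memory onto spare rows; the function
# and its arguments are illustrative, not the paper's implementation.

def build_row_map(defective_rows, num_rows, num_spares):
    """Map each logical row to a working physical row, using spares."""
    spares = list(range(num_rows, num_rows + num_spares))
    row_map = {}
    for row in range(num_rows):
        if row in defective_rows:
            if not spares:
                raise RuntimeError("not enough spare rows to repair device")
            row_map[row] = spares.pop(0)   # substitute a spare row
        else:
            row_map[row] = row             # healthy row maps to itself
    return row_map

print(build_row_map(defective_rows={2, 5}, num_rows=8, num_spares=2))
```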
Article
Currently, clusters of shared memory symmetric multiprocessors (SMPs) are one of the most common parallel computing systems, for which some existing environments have between 8 and 32 processors per node. Examples of such environments include some supercomputers: DataStar p655 (P655 and P655m) and P690 at the San Diego Supercomputing Center, and Sea...
Conference Paper
This paper presents a performance and reliability analysis of a scaled crossbar molecular switch memory and demultiplexer. In particular, we compare our multi-switch junction fault tolerance scheme with a banking defect tolerance scheme. Results indicate that delay and power scale linearly with the number of redundant molecular switch junctions...
Article
Full-text available
We report on some of the interactions between two SciDAC projects: The National Computational Infrastructure for Lattice Gauge Theory (USQCD), and the Performance Engineering Research Institute (PERI). Many modern scientific programs consistently report the need for faster computational resources to maintain global competitiveness. However, as the...
Article
Full-text available
As part of the Performance Engineering Research Institute (PERI) effort, the Performance Database Working Group, which involves PERI researchers as well as outside researchers at the University of Oregon, Portland State University, and Texas A&M University, has developed technology for storing performance data collected by a number of performance m...
Conference Paper
The Lattice Boltzmann method is widely used in simulating fluid flows. In this paper, we present the performance analysis, modeling and prediction of a parallel multiblock Lattice Boltzmann application on up to 512 processors on three SMP clusters: two IBM SP systems at San Diego Supercomputing Center (DataStar - p655 and p690) and one IBM SP syste...
Chapter
Full-text available
We present a technique for deriving predictions for the run times of parallel applications from the run times of “similar” applications that have executed in the past. The novel aspect of our work is the use of search techniques to determine those application characteristics that yield the best definition of similarity for the purpose of making pre...
Conference Paper
Nanoscale elements are fabricated using bottom-up processes, and as such are prone to high levels of defects. Therefore, fault-tolerance is crucial for the realization of practical nanoscale devices. In this paper, we investigate a fault tolerance scheme that utilizes redundancies in the rows and columns of a nanoscale crossbar molecular switch mem...
Conference Paper
Nanoscale elements are fabricated using bottom-up processes, and as such they are prone to high levels of defects. Defect-tolerance will play a crucial role in the realization of practical nanoscale devices. In this paper we investigate the performance impact of combining a molecular switch junction with an ECC demultiplexer to allow for enhanced f...
Article
Cosmology SAMR simulations have played a prominent role in the field of astrophysics. The emerging distributed computing systems provide an economic alternative to the traditional parallel machines, and enable scientists to conduct cosmological simulations that require vast computing power. An important issue of conducting distributed cosmological...
Article
The Prophesy system is a performance analysis and modeling infrastructure that allows users to record many different parameters relevant to an application's performance. A key component of the Prophesy system is the web-based automated performance modeling system, which allows a developer to quickly gain insight into the performance of an application code
Conference Paper
Full-text available
It is anticipated that self assembled ultra-dense nanomemories will be more susceptible to manufacturing defects and transient faults than conventional CMOS-based memories, thus the need exists for fault-tolerant memory architectures. The development of such architectures will require intense analysis in terms of achievable performance measures-pow...
Conference Paper
Full-text available
Distributed systems are available and provide vast compute and data resources to users. With the availability of multiple resources, one of the major issues to be addressed is site selection. Users have access to many resource sites from which to select for execution of applications. In this paper, we quantify the advantages of using performanc...
Chapter
We present two novel dynamic load balancing schemes for SAMR applications: one is for parallel systems denoted as parallel DLB and the other is for distributed systems denoted as distributed DLB. Parallel DLB scheme divides the load balancing process into two steps: moving-grid phase and splitting-grid phase. Distributed DLB scheme takes into consi...
Article
It is anticipated that self assembled ultra-dense nanomemories will be more susceptible to manufacturing defects and transient faults than conventional CMOS-based memories, thus the need exists for fault-tolerant memory architectures. The development of such architectures will require intense analysis in terms of achievable performance measures - p...
Article
We present a technique for predicting the run times of parallel applications based upon the run times of “similar” applications that have executed in the past. The novel aspect of our work is the use of search techniques to determine those application characteristics that yield the best definition of similarity for the purpose of making predictions...
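A minimal sketch of the approach with made-up job records: search over subsets of job characteristics for the definition of "similarity" that minimizes prediction error, then predict a new job's run time from the matching historical runs. Exhaustive search and a leave-one-in error estimate stand in for the paper's search techniques here:

```python
from itertools import combinations

history = [
    {"user": "a", "app": "cfd", "nodes": 64,  "runtime": 3600},
    {"user": "a", "app": "cfd", "nodes": 64,  "runtime": 3500},
    {"user": "b", "app": "cfd", "nodes": 128, "runtime": 2100},
    {"user": "b", "app": "qcd", "nodes": 128, "runtime": 7400},
    {"user": "a", "app": "qcd", "nodes": 64,  "runtime": 7100},
]
features = ["user", "app", "nodes"]

def predict(job, keys):
    """Mean runtime of historical jobs matching `job` on all `keys`."""
    similar = [h["runtime"] for h in history
               if all(h[k] == job[k] for k in keys)]
    return sum(similar) / len(similar)

def error(keys):
    # Simplified (leave-one-in) mean absolute error over history.
    return sum(abs(predict(h, keys) - h["runtime"]) for h in history) / len(history)

best = min((keys for r in range(1, len(features) + 1)
            for keys in combinations(features, r)), key=error)
print("best similarity definition:", best)
print("prediction:", predict({"user": "a", "app": "cfd", "nodes": 64}, best))
```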
Conference Paper
Full-text available
Summary form only given. Kernel coupling quantifies the interaction between adjacent kernels and chains of kernels in an application. A kernel can be a loop, procedure or file. In our previous work, we used the kernel coupling values to identify how to combine the execution times of the individual kernels that compose the application to predict the executi...
Article
A typical cosmological simulation requires a large amount of compute power, which is hard to satisfy with a single machine. Distributed systems provide the opportunity to execute such large-scale applications. As part of the iGrid Research Demonstration 2002, we explored a large-scale cosmology application on a distributed system composed of two su...
Article
Performance is an important issue with any application, especially grid applications. Efficient execution of applications requires insight into how the system features impact the performance of the applications. This insight generally results from significant experimental analysis and possibly the development of performance models. This paper prese...
Article
Over the last decade, processors have made enormous gains in speed. But increases in the speed of secondary and tertiary storage devices have not kept pace with these gains. The result is that secondary and tertiary storage access times dominate the execution time of data-intensive computations. Therefore, in scientific computations, efficient data...
Article
With the increasing number of scientific applications manipulating huge amounts of data, effective high-level data management is an increasingly important problem. Unfortunately, so far the solutions to the high-level data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance...
Article
Full-text available
Performance is an important issue with any application, especially grid applications. Efficient execution of applications requires insight into how the system features impact the performance of the applications. This insight generally results from significant experimental analysis and possibly the development of performance models. This paper prese...
Conference Paper
Full-text available
Kernel coupling refers to the effect that kernel i has on kernel j in relation to running each kernel in isolation. The two kernels can correspond to adjacent kernels or a chain of three or more kernels in the control flow of an application. In previous work, we used kernel coupling to provide insights on where further algorithm and code implementati...
Article
Adaptive mesh refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present...
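A toy sketch of a moving-grid rebalancing step in this spirit (not the paper's actual scheme): grids migrate from the most loaded processor to the least loaded one as long as the move reduces imbalance:

```python
# grids_per_proc: list of lists of grid workloads, one inner list per processor.

def rebalance(grids_per_proc):
    def load(p):
        return sum(grids_per_proc[p])
    while True:
        procs = range(len(grids_per_proc))
        src = max(procs, key=load)   # most loaded processor
        dst = min(procs, key=load)   # least loaded processor
        # Candidate grids: moving them must not overshoot the balance point.
        movable = [g for g in grids_per_proc[src]
                   if g > 0 and load(src) - g >= load(dst) + g]
        if not movable:
            return grids_per_proc
        g = max(movable)             # move the largest grid that still helps
        grids_per_proc[src].remove(g)
        grids_per_proc[dst].append(g)

print(rebalance([[9, 7, 5], [2], [3, 1]]))
```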
Article
Full-text available
Large distributed systems such as Computational and Data Grids require that a substantial amount of monitoring data be collected for a variety of tasks such as fault detection, performance analysis, performance tuning, performance prediction, and scheduling. Some tools are currently available and others are being developed for collecting and forwarding...
Article
Traditional performance optimization techniques have focused on finding the kernel in an application that is the most time consuming and attempting to optimize it. In this paper, we focus on an optimization technique with a more global perspective of the application. In particular, we present a methodology for measuring the interaction, or coupling...
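One common formulation, shown here with made-up timings: the coupling of kernels i and j is the ratio of their measured combined time to the sum of their isolated times:

```python
# c_ij = T(i,j) / (T(i) + T(j)), where T(i) and T(j) are times of kernels
# i and j run in isolation and T(i,j) is the time of the pair run together.
# c < 1 suggests constructive interaction (e.g., cache reuse across
# kernels); c > 1 suggests destructive interference.

def coupling(t_i, t_j, t_ij):
    return t_ij / (t_i + t_j)

t_i, t_j, t_ij = 4.2, 3.1, 6.5   # seconds; illustrative measurements
c = coupling(t_i, t_j, t_ij)
print(f"coupling = {c:.3f} ({'constructive' if c < 1 else 'destructive'})")
```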
Article
Mesh partitioning for homogeneous systems has been studied extensively; however, mesh partitioning for distributed systems is a relatively new area of research. To ensure efficient execution on a distributed system, the heterogeneities in the processor and network performance must be taken into consideration in the partitioning process; equal size...
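A minimal sketch of the heterogeneity-aware idea (the speeds are hypothetical, and the actual partitioner also weighs network performance): partition sizes are made proportional to measured processor speeds rather than kept equal:

```python
def partition_sizes(num_elements, proc_speeds):
    """Assign each processor a share of elements proportional to its speed."""
    total = sum(proc_speeds)
    sizes = [int(num_elements * s / total) for s in proc_speeds]
    sizes[0] += num_elements - sum(sizes)  # hand rounding remainder to one proc
    return sizes

# Four processors, two of them twice as fast as the others.
print(partition_sizes(100000, [2.0, 2.0, 1.0, 1.0]))
```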
Conference Paper
Full-text available
Performance models provide significant insight into the performance relationships between an application and the system used for execution. The major obstacle to developing performance models is the lack of knowledge about the performance relationships between the different functions that compose an application. This paper addresses the issue by us...
Conference Paper
In this paper we investigate the data access patterns and file I/O behaviors of a production cosmology application that uses the adaptive mesh refinement (AMR) technique for its domain decomposition. This application was originally developed using the Hierarchical Data Format (HDF version 4) I/O library, and since HDF4 does not provide parallel I/O faci...
Article
Full-text available
Dynamic load balancing (DLB) for parallel systems has been studied extensively; however, DLB for distributed systems is relatively new. To efficiently utilize computing resources provided by distributed systems, an underlying DLB scheme must address both heterogeneous and dynamic features of distributed systems. In this paper, we propose a DLB schem...
Article
Shortest path algorithms are required by several transportation applications; furthermore, the shortest path computation in these applications can account for a large percentage of the total execution time. Since these algorithms are very computationally intense, parallel processing can provide the compute power and memory required to solve large p...
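For reference, the sequential label-correcting kernel that such parallel algorithms distribute across processors looks roughly like this (the graph and weights are made up):

```python
from collections import deque

# Graph as an adjacency list: {node: [(neighbor, edge_weight), ...]}.

def label_correcting(graph, source):
    dist = {v: float("inf") for v in graph}
    dist[source] = 0.0
    queue = deque([source])
    while queue:                      # in parallel versions, each processor
        u = queue.popleft()           # scans its own queue, and termination
        for v, w in graph[u]:         # detection decides when all are idle
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                queue.append(v)
    return dist

g = {"a": [("b", 2.0), ("c", 5.0)], "b": [("c", 1.0)], "c": []}
print(label_correcting(g, "a"))
```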
Article
Large distributed systems such as Computational and Data Grids require that a substantial amount of monitoring data be collected for a variety of tasks such as fault detection, performance analysis, performance tuning, performance prediction, and scheduling. Some tools are currently available and others are being developed for collecting and forwarding...
Article
Full-text available
This document presents a simple case study of a Grid performance system based on the Grid Monitoring Architecture (GMA) being developed by the Grid Forum Performance Working Group. It describes how the various system components would interact for a very basic monitoring scenario, and is intended to introduce people to the terminology and concepts p...
Article
Mesh partitioning is an important step for parallel scientific applications, in particular finite element analyses. A good partitioner will minimize both the time spent on local computation and on interprocessor communication. It is often the case that these two goals cannot be satisfied simultaneously. In this paper, we use analytical and experime...
Article
Full-text available
Computation of shortest paths is an integral component of many applications such as transportation planning and VLSI design. Frequently, a shortest path algorithm is selected for a given application based on the performance of the algorithm for a set of test networks. The performance of this algorithm, however, can be significantly different for ne...
Article
Shortest path computation is required by a large number of applications such as VLSI, transportation and communication networks. These applications, which are often very complex and have sparse networks, generally use parallel labeling shortest path algorithms. Such algorithms, when implemented on a distributed memory machine, require termination d...
Article
Shortest path computation is required by a large number of applications such as VLSI, transportation and communication networks. These applications, which often use parallel processing, require an efficient parallel shortest path algorithm. The experimental work related to parallel shortest path algorithms has focused on the development of efficien...
Conference Paper
Full-text available
Dynamic load balancing (DLB) for parallel systems has been studied extensively; however, DLB for distributed systems is relatively new. To efficiently utilize computing resources provided by distributed systems, an underlying DLB scheme must address both heterogeneous and dynamic features of distributed systems. In this paper, we propose a DLB sche...
Conference Paper
Full-text available
Performance models provide significant insight into the performance relationships between an application and the system, either parallel or distributed, used for execution. The development of models often requires significant time, sometimes months; this is especially the case for detailed models. This paper presents ou...
Conference Paper
Full-text available
Adaptive Mesh Refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper we present...
Article
Full-text available
Adaptive Mesh Refinement (AMR) is a type of multiscale algorithm that dynamically achieves high resolution in localized regions of multidimensional numerical simulations. A dynamic load balancing (DLB) scheme for structured AMR applications was proposed in (19). Unfortunately, the overhead introduced by this DLB scheme is significant. Further, a p...
Article
Computer architecture and programming are disciplines that require extensive experimentation with computer tools such as simulators and compilers. At the authors' universities, several tools are being incorporated in courses at the junior and senior levels by using a powerful, web-based network-computing system as a computational and educational re...
Article
Some computational grid applications have very large resource requirements and need simultaneous access to resources from more than one parallel computer. Current scheduling systems do not provide mechanisms to gain such simultaneous access without the help of human administrators of the computer systems. In this work, we propose and evaluate sever...
Article
Full-text available
In this paper we develop a performance model for analyzing the end-to-end lag in a combined supercomputer/virtual environment. We first present a general model and then use this model to analyze the lag of an interactive, immersive visualization of a scientific application. This application consists of a finite element simulation executed on an IBM...
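A minimal additive sketch of such a lag model (the components and numbers are hypothetical, not the paper's calibrated model): end-to-end lag is the sum of simulation time per update, network transfer of the results, and rendering time:

```python
def end_to_end_lag(sim_time_s, result_bytes, net_bw_bytes_s,
                   net_latency_s, render_time_s):
    # Transfer time = fixed latency + payload volume / bandwidth.
    transfer = net_latency_s + result_bytes / net_bw_bytes_s
    return sim_time_s + transfer + render_time_s

lag = end_to_end_lag(sim_time_s=0.50, result_bytes=2e6,
                     net_bw_bytes_s=12.5e6, net_latency_s=0.002,
                     render_time_s=0.033)
print(f"end-to-end lag: {lag * 1e3:.0f} ms")
```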
Conference Paper
On many computers, a request to run a job is not serviced immediately but instead is placed in a queue and serviced only when resources are released by preceding jobs. In this paper, we build on runtime prediction techniques that we developed in previous research to explore two problems. The first problem is to predict how long applications will wa...
Article
Full-text available
Efficient execution of applications requires insight into how the system features affect the performance of the application. For distributed systems, the task of gaining this insight is complicated by the complexity of the system features. This insight generally results from significant experimental analysis and possibly the development of performa...
Article
Full-text available
With the increasing number of scientific applications manipulating huge amounts of data, effective high-level data management is an increasingly important problem. Unfortunately, so far the solutions to the high-level data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance...
Article
Computer architecture and programming are disciplines that require extensive experimentation with computer tools, such as simulators and compilers. At the authors' universities, several tools are being incorporated in courses at the junior and senior levels by using a powerful, web-based network-computing system as a computational and educational r...
Article
Over the last decade, processors have made enormous gains in speed. But increases in the speed of secondary and tertiary storage devices have not kept pace with these gains. The result is that secondary and tertiary storage access times dominate the execution time of data-intensive computations. Therefore, in scientific computations, efficient data...
Conference Paper
The authors present a technique for deriving predictions for the run times of parallel applications from the run times of similar applications that have executed in the past. The novel aspect of the work is the use of search techniques to determine those application characteristics that yield the best definition of similarity for the purpose of mak...
Article
Full-text available
Traditional performance optimization techniques have focused on finding the kernel in an application that is the most time consuming and attempting to optimize it. In this paper we focus on optimization techniques with a more global perspective of the application. In particular, we present a methodology for measuring the interaction or coupling bet...
Article
Full-text available
Virtual prototyping involves a synthesis of engineering methodology and immersive, three-dimensional visualization technology. Ideally, this is a process in which computational models are used in place of physical models in the development of a new product or design concept. If used successfully, virtual prototyping can lead to more rapid product d...
Conference Paper
With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or pro...
Article
Mesh partitioning for distributed systems differs from partitioning for homogeneous systems in that both system and application heterogeneities need to be taken into consideration. In this paper, we focus on the issue of optimal number of partitions with local and remote communication. This is an important issue due to the fact that local and remot...
Article
Full-text available
Traditional performance optimization techniques have focused on finding the kernel in a program that is the most time consuming and attempting to optimize it. We introduce a methodology for measuring and representing the interaction, or coupling, between kernels that improves upon the accuracy of the traditional method. Then we demonstrate the bene...
Article
Shortest path computation is required by a large number of applications such as VLSI, transportation and communication networks. These applications, which are often very complex and have sparse networks, generally use parallel labeling shortest path algorithms. Such algorithms, when implemented on a distributed memory machine, require termination d...
Article
In this paper we present a model of the communication of parallel, 2-D finite element problems implemented on the Intel Delta. Our communication model consists of start-up time, transmission latency, and processor wait time due to synchronization. We find that the wait time can account for up to 25% of the total communication time...
Article
Full-text available
The computationally-intensive step of the finite element method is the solution of a linear system of equations. Very large and very sparse system matrices result from three-dimensional finite-element applications. The sparsity must be exploited for efficient use of memory and computational components (e.g., floating-point units) in executing the s...
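To illustrate what exploiting sparsity means in this setting: a matrix-vector product in compressed sparse row (CSR) form, the kernel at the heart of iterative solvers, touches only the stored nonzeros (the example matrix is made up):

```python
import numpy as np

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in CSR form."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        # Row i's nonzeros occupy values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 example: [[4, 0, 1], [0, 3, 0], [1, 0, 5]] stored as CSR.
values  = np.array([4.0, 1.0, 3.0, 1.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(csr_matvec(values, col_idx, row_ptr, np.array([1.0, 2.0, 3.0])))
```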
Article
The use of parallel processors for implementing the finite element method has made feasible the analyses of large applications, especially three-dimensional applications. The speedup, however, is limited by the interprocessor communication requirements. In this paper we analyze the effects of interprocessor communications on the resultant speedup of...
