Anthony Chan
Hang Seng Management College · Business School

2,747 Reads · 388 Citations

Publications

Exascale workload characterization and architecture implications

Conference Paper

Apr 2013

Emerging exascale architectures bring forth new challenges related to heterogeneous systems, power, energy, cost, and resilience. These new challenges require a shift from conventional paradigms in understanding how to best exploit and optimize these features and limitations. Our objective is to identify the top few dominant characteristics in a set...

Abstract: An Exascale Workload Study

Conference Paper

Nov 2012

Amdahl's law has been one of the factors influencing speedup in high performance computing over the last few decades. While Amdahl's approach of optimizing the 10% of the code where 90% of the execution time is spent has worked very well in the past, new challenges related to emerging exascale heterogeneous architectures, combined with stringent p...
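As a hedged illustration that is not taken from the paper: the 90/10 rule quoted in this abstract corresponds to Amdahl's law with p = 0.9. If a fraction p of the run time is accelerated by a factor s, the overall speedup is S = 1 / ((1 - p) + p / s), so even an infinitely fast hot region caps the speedup at 1 / (1 - p) = 10. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, factor: float) -> float:
    """Overall speedup when `parallel_fraction` of run time is sped up by `factor`.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    serial = 1.0 - parallel_fraction  # the part that is not accelerated
    return 1.0 / (serial + parallel_fraction / factor)


if __name__ == "__main__":
    # 90% of execution time lives in the "hot" 10% of the code.
    p = 0.9
    for s in (2, 10, 100, 1_000_000):
        print(f"hot region {s:>9,}x faster -> overall {amdahl_speedup(p, s):.3f}x")
```

As the factor grows, the printed speedup climbs toward 10 but never reaches it, because the untouched 10% of the run time remains.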

Poster: An Exascale Workload Study

Conference Paper

Nov 2012

The Importance of Non-Data-Communication Overheads in MPI

Article

Feb 2010

With processor speeds no longer doubling every 18-24 months owing to the exponential increase in power consumption and heat dissipation, modern HEC systems tend to rely less on the performance of single processing units. Instead, they rely on achieving high performance by using the parallelism of a massive number of low-frequency/low-power proces...

Improving Resource Availability by Relaxing Network Allocation Constraints on Blue Gene/P

Conference Paper

Sep 2009

Torus networks are prevalent on leadership-class petascale systems. While many systems use a single shared torus for a full system, Blue Gene/P systems provide the ability to partition the system torus into a series of independent, isolated tori for individual jobs. While this approach provides substantially improved network behavior for thos...

Toward message passing for a million processes: Characterizing MPI on a massive scale Blue Gene/P

Article

Sep 2009

Upcoming exascale capable systems are expected to comprise more than a million processing elements. As researchers continue to work toward architecting these systems, it is becoming increasingly clear that these systems will utilize a significant amount of shared hardware between processing units; this includes shared caches, memory and network compo...

Communication Analysis of Parallel 3D FFT for Flat Cartesian Meshes on Large Blue Gene Systems

Conference Paper

Dec 2008

Parallel 3D FFT is a commonly used numerical method in scientific computing. P3DFFT is a recently proposed implementation of parallel 3D FFT that is designed to allow scalability to massively large systems such as Blue Gene. While there has been recent work that demonstrates such scalability on regular Cartesian meshes (equal length in each dimensi...

Non-data-communication Overheads in MPI: Analysis on Blue Gene/P

Conference Paper

Sep 2008

Modern HEC systems, such as Blue Gene/P, rely on achieving high performance by using the parallelism of a massive number of low-frequency/low-power processing cores. This means that the local pre- and post-communication processing required by the MPI stack might not be very fast, owing to the slow processing cores. Similarly, small amounts of ser...

An Efficient Format for Nearly Constant-Time Access to Arbitrary Time Intervals in Large Trace Files

Article

Jan 2008

A powerful method to aid in understanding the performance of parallel applications uses log or trace files containing time-stamped events and states (pairs of events). These trace files can be very large, often hundreds or even thousands of megabytes. Because of the cost of accessing and displaying such files, other methods are often used that redu...
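The abstract is truncated before it describes the format itself, so the following is only a generic sketch, assuming a trace split into time-ordered frames plus a small sorted index of frame start times and byte offsets. It is not the paper's on-disk layout, and `FrameIndex`, `frames_overlapping`, and `offsets_for` are hypothetical names. Two binary searches locate the frames that overlap a requested time window, so a viewer can seek straight to them instead of scanning the whole file.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class FrameIndex:
    """Hypothetical index over a trace file split into time-ordered frames.

    Frame i covers the half-open interval [starts[i], starts[i + 1]); the last
    frame runs to the end of the trace. Frame i's records begin at byte
    offset offsets[i] in the file.
    """
    starts: List[float]   # frame start timestamps, strictly increasing
    offsets: List[int]    # byte offset of each frame in the trace file

    def frames_overlapping(self, t0: float, t1: float) -> range:
        """Indices of frames that intersect the window [t0, t1).

        Two binary searches, so the lookup costs O(log F) for F frames,
        independent of how many events the file holds.
        """
        if t1 <= t0:
            return range(0)
        # Last frame starting at or before t0 (or frame 0 if t0 precedes all).
        lo = max(bisect_right(self.starts, t0) - 1, 0)
        # First frame starting at or after t1 is excluded.
        hi = bisect_left(self.starts, t1)
        return range(lo, hi)

    def offsets_for(self, t0: float, t1: float) -> List[int]:
        """Byte offsets a reader would seek to for the window [t0, t1)."""
        return [self.offsets[i] for i in self.frames_overlapping(t0, t1)]
```

For example, with frames starting at 0, 10, 20, and 30 seconds, a request for [12, 25) touches only frames 1 and 2, and the search cost grows with the logarithm of the frame count rather than with file size.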

Early Experiments with the OpenMP/MPI Hybrid Programming Model

Conference Paper

Jan 2008

The paper describes some very early experiments on new architectures that support the hybrid programming model. The results are promising in that OpenMP threads interact with MPI as desired, allowing OpenMP-agnostic tools to be used. They explore three environments: a 'typical' Linux cluster, a new large-scale machine from SiCortex, and the new IBM...

Jumpshot-4 Users Guide

Article

Aug 2007

A Portable Method for Finding User Errors in the Usage of MPI Collective Operations

Article

May 2007

An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather runtime information about performance characteristics. Here we present a profiling library whose purpose is to detect user errors in the use of MPI's collective operations. While some...

Collective Error Detection for MPI Collective Operations

Conference Paper

Mar 2005

An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather performance data on MPI programs. Here we present a profiling library whose purpose is to detect user errors in the use of MPI’s collective operations. While some errors can be detect...
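Both abstracts describe a profiling library that intercepts collective calls to detect user errors. As a rough sketch of the consistency check only, assuming a collective such as MPI_Bcast where every rank must pass the same root and the same type signature (flattened here to one basic type and an element count, a simplification), the function below compares each rank's recorded arguments against rank 0's. Ranks are simulated with a plain Python list so no MPI installation is needed. `CollectiveCall` and `check_collective` are hypothetical names; a real profiling library would run such a check inside a wrapper that forwards to the `PMPI_` entry point, and would need extra communication to compare records across ranks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CollectiveCall:
    """What one rank passed to a collective call (simplified, hypothetical)."""
    name: str                 # e.g. "MPI_Bcast"
    root: Optional[int]       # None for rootless collectives such as MPI_Allreduce
    basic_type: str           # element type after flattening the datatype
    count: int                # number of basic elements sent or received
    op: Optional[str] = None  # reduction operator, if any


def check_collective(calls: List[CollectiveCall]) -> List[str]:
    """Compare every rank's call with rank 0's and describe mismatches.

    calls[r] is the call made by rank r. An empty result means the
    recorded arguments are consistent across all ranks.
    """
    if not calls:
        return []
    errors = []
    ref = calls[0]
    for rank, call in enumerate(calls[1:], start=1):
        for field in ("name", "root", "basic_type", "count", "op"):
            mine, theirs = getattr(call, field), getattr(ref, field)
            if mine != theirs:
                errors.append(
                    f"rank {rank}: {field} {mine!r} differs from rank 0's {theirs!r}"
                )
    return errors
```

For instance, if rank 2 passes root 1 while the other ranks pass root 0, the check reports a single root mismatch for rank 2. In collectives such as MPI_Gather some arguments are significant only at the root, so a fuller checker would need per-collective rules.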

From Trace Generation to Visualization: A Performance Framework for Distributed Parallel Systems

Article

Sep 2000

In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM SP systems, a self-defining interval file format, an API for framework extensions, utilities for merging and statistics generation, and a visualization tool with preview and multiple time-space diagrams. The tr...

Scalable log files for parallel program trace data (DRAFT)

Article