Anthony Chan

Hang Seng Management College · Business School

17 Publications
2,747 Reads
388 Citations

Publications (17)
Conference Paper
Emerging exascale architectures bring forth new challenges related to heterogeneous systems power, energy, cost, and resilience. These new challenges require a shift from conventional paradigms in understanding how to best exploit and optimize these features and limitations. Our objective is to identify the top few dominant characteristics in a set...
Conference Paper
Amdahl's law has been one of the factors influencing speedup in high performance computing over the last few decades. While Amdahl's approach of optimizing (10% of the code is where 90% of the execution time is spent) has worked very well in the past, new challenges related to emerging exascale heterogeneous architectures, combined with stringent p...
Conference Paper
Amdahl's law has been one of the factors influencing speedup in high performance computing over the last few decades. While Amdahl's approach of optimizing (10% of the code is where 90% of the execution time is spent) has worked very well in the past, new challenges related to emerging exascale heterogeneous architectures, combined with stringent p...
Article
Full-text available
With processor speeds no longer doubling every 18-24 months owing to the exponential increase in power consumption and heat dissipation, modern HEC systems tend to rely less on the performance of single processing units. Instead, they rely on achieving high performance by using the parallelism of a massive number of low-frequency/low-power proces...
Conference Paper
Full-text available
Torus networks are prevalent on leadership-class petascale systems. While many systems use a single shared torus for a full system, Blue Gene/P systems provide the ability to partition the system torus into a series of independent, isolated tori for individual jobs. While this approach provides substantially improved network behavior for thos...
Article
Full-text available
Upcoming exascale-capable systems are expected to comprise more than a million processing elements. As researchers continue to work toward architecting these systems, it is becoming increasingly clear that these systems will utilize a significant amount of shared hardware between processing units; this includes shared caches, memory and network compo...
Conference Paper
Full-text available
Parallel 3D FFT is a commonly used numerical method in scientific computing. P3DFFT is a recently proposed implementation of parallel 3D FFT that is designed to allow scalability to massively large systems such as Blue Gene. While there has been recent work that demonstrates such scalability on regular Cartesian meshes (equal length in each dimensi...
Conference Paper
Full-text available
Modern HEC systems, such as Blue Gene/P, rely on achieving high performance by using the parallelism of a massive number of low-frequency/low-power processing cores. This means that the local pre- and post-communication processing required by the MPI stack might not be very fast, owing to the slow processing cores. Similarly, small amounts of ser...
Article
Full-text available
A powerful method to aid in understanding the performance of parallel applications uses log or trace files containing time-stamped events and states (pairs of events). These trace files can be very large, often hundreds or even thousands of megabytes. Because of the cost of accessing and displaying such files, other methods are often used that redu...
Conference Paper
Full-text available
The paper describes some very early experiments on new architectures that support the hybrid programming model. The results are promising in that OpenMP threads interact with MPI as desired, allowing OpenMP-agnostic tools to be used. They explore three environments: a 'typical' Linux cluster, a new large-scale machine from SiCortex, and the new IBM...
Article
Full-text available
An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather runtime information about performance characteristics. Here we present a profiling library whose purpose is to detect user errors in the use of MPI's collective operations. While some...
Conference Paper
Full-text available
An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather performance data on MPI programs. Here we present a profiling library whose purpose is to detect user errors in the use of MPI’s collective operations. While some errors can be detect...
Article
In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM SP systems, a self-defining interval file format, an API for framework extensions, utilities for merging and statistics generation, and a visualization tool with preview and multiple time-space diagrams. The tr...
