About
142 Publications
41,005 Reads
2,832 Citations
Publications (142)
The next-generation Sunway supercomputer employs the SW26010pro processor, which features a specialized on-chip heterogeneous architecture. Applications with significant hotspots can benefit from the large computational capacity of Sunway many-core architectures through careful, intensive manual many-core parallelization. However, some pro...
The next-generation Sunway supercomputer employs the SW26010pro processor, which features a specialized on-chip heterogeneous architecture. Applications with significant hotspots can benefit from the substantial improvement in computational capacity offered by Sunway many-core architectures through careful, intensive manual many-core parallelization. Howev...
The underestimation of cloud fraction, especially the low stratus cloud fraction (LSC) over the eastern oceans, remains a problem in most AGCMs. This study investigated potential improvements through perturbing nine moist physical parameters, using uniform sampling and Latin hypercube sampling methods, and quantified the parametric uncertainty and...
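Latin hypercube sampling, named in the abstract above, can be illustrated with a short generic sketch. It assumes each perturbed parameter has a simple (low, high) range; the function name `latin_hypercube` and the example ranges are hypothetical, and this is not the study's own sampling code. Each parameter range is cut into as many equal strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per parameter.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points by Latin hypercube sampling.

    bounds is a list of (low, high) ranges, one per parameter. Each range is
    split into n_samples equal strata, and every stratum of every parameter
    is sampled exactly once.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum [k/n, (k+1)/n) of the unit range.
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose the per-parameter columns into a list of sample points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical ranges for two perturbed parameters.
samples = latin_hypercube(5, [(0.5, 1.5), (1.0e-4, 1.0e-3)], seed=42)
for point in samples:
    print(point)
```

Compared with plain uniform sampling, this guarantees that every slice of each parameter's range is visited once, which is why it is often preferred when each sample costs a full model run.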
Molecular dynamics (MD) simulations of biological systems are playing an increasingly important role in the research of pathogens and drugs. Most MD methods for biological simulations rely on listed bonds, which describe interactions among specific groups of atoms identified by atom tags (unique tags regardless of the storage location). However, efficient...
The Single Column Atmospheric Model (SCAM) is an essential tool for analyzing and improving the physics schemes of CAM. Although it already greatly reduces the computational cost relative to the full CAM, the exponentially growing parameter space makes a combined analysis or tuning of multiple parameters difficult. In this paper, we propose a hybrid framework...
Variations in the performance of parallel and distributed systems are becoming increasingly challenging. The runtimes of different executions can vary greatly even with a fixed number of computing nodes. Many HPC applications on supercomputers exhibit such variance. This not only leads to unpredictable execution times, but also renders the system’s...
With the increasing complexity of scientific computing, it is imperative to make High Performance Computing (HPC) more efficient and easier to use. Scientific workflows have been introduced to that end, but the current infrastructure still needs optimization. In this paper, we discuss the current problems based on scientific computing scenarios an...
The tridiagonal solver is an important kernel used in a wide range of applications and is well supported in mainstream numerical libraries. Quite a few parallel algorithms have been developed, but the best-performing algorithm may vary across architectures as well as input sizes. Targeting this algorithm-selection challenge, we present a model guide...
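For reference, the serial Thomas algorithm is the usual baseline against which parallel tridiagonal algorithms (cyclic reduction, parallel cyclic reduction, and others) are compared. The sketch below is a generic version, not the model-guided selection method described above; the function name and the diagonal-array layout are assumptions, and it assumes a diagonally dominant system so that no pivoting is needed.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    Assumes diagonal dominance, so no pivoting is performed.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: remove the sub-diagonal row by row.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution from the last unknown upward.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# 3x3 system with diagonal 4 and off-diagonals 1; exact solution is [1, 2, 3].
print(thomas_solve([0.0, 1.0, 1.0], [4.0, 4.0, 4.0], [1.0, 1.0, 0.0], [6.0, 12.0, 14.0]))
```

Both sweeps carry a dependency from one row to the next, which is exactly why parallel alternatives exist and why the best choice depends on architecture and input size.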
Hybrid modeling that combines data-driven techniques and numerical methods is an emerging and promising research direction for efficient climate simulation. However, previous works lack practical platforms, making the development of hybrid models a challenging programming problem. Furthermore, the lack of standard data sets and evaluation metrics may hamper...
In climate models, subgrid parameterizations of convection and clouds are among the main causes of biases in simulated precipitation and atmospheric circulation. In recent years, owing to the rapid development of data science, machine learning (ML) parameterizations for convection and clouds have been shown to have the potential to perf...
Background
Large uncertainty in modeling land carbon (C) uptake heavily impedes the accurate prediction of the global C budget. Identifying the uncertainty sources among models is crucial for model improvement yet has been difficult due to multiple feedbacks within Earth System Models (ESMs). Here we present a Matrix-based Ensemble Model Inter-comp...
The tridiagonal solver is an important kernel and is widely supported in mainstream numerical libraries. While parallel algorithms have been studied for many-core architectures, the performance of current algorithms and implementations is still hindered by input-size sensitivity and limited cross-platform portability. In this paper, we propose a novel algo...
Molecular dynamics (MD) simulations are playing an increasingly important role in many areas ranging from chemical materials to biological molecules. With the continuing development of MD models, the potentials are getting larger and more complex. In this paper, we focus on the reactive force field (ReaxFF) potential from LAMMPS to optimize the com...
The Community Atmosphere Model (CAM) has been ported, redesigned, and scaled to the full system of the Sunway TaihuLight, and provides peta-scale climate modeling performance. Based on a novel domain decomposition method, we have fully optimized the complete model code by using both OpenACC refactoring and more aggressive and finer-grained Athread...
With semiconductor technology gradually approaching its physical and thermal limits, recent supercomputers have adopted major architectural changes to continue increasing the performance through more power-efficient heterogeneous many-core systems. Examples include Sunway TaihuLight that has four management processing elements (MPEs) and 256 comput...
A team effort to develop a Community Integrated Earth System Model (CIESM) was initiated in China in 2012. The model was based on NCAR Community Earth System Model (Version 1.2.1) with several novel developments and modifications aimed to overcome some persistent systematic biases, such as the double Intertropical Convergence Zone problem and under...
Large-scale molecular dynamics (MD) simulations on supercomputers play an increasingly important role in many research areas. With the capability of simulating charge equilibration (QEq), bonds and so on, the reactive force field (ReaxFF) enables the precise simulation of chemical reactions. Compared to first-principles molecular dynamics (FPM...
The ever-growing complexity of HPC applications and computer architectures makes it more costly than ever to learn application behaviors. In this paper, we propose APMT, an Automatic Performance Modeling Tool, to understand and predict performance efficiently in the regimes of interest to developers and performance analysts while outperforming...
In the original version of this article, the second author’s first name was misspelled as Zhipeng. The correct spelling is Zipeng. The sixth author’s first name was misspelled as Jirong. The correct spelling is Jinrong. The correct version is as follows: LICOM Model Datasets for the CMIP6 Ocean Model Intercomparison Project Pengfei LIN1,4, Zipeng Y...
The Sunway TaihuLight supercomputer has been installed for several years, and many applications have been ported to or built for TaihuLight. Initially, most applications running on TaihuLight had regular memory access patterns, such as dense linear algebra, structured grids and dynamic programming. By 2018, developers had published a g...
The datasets of two Ocean Model Intercomparison Project (OMIP) simulation experiments from the LASG/IAP Climate Ocean Model, version 3 (LICOM3), forced by two different sets of atmospheric surface data, are described in this paper. The experiment forced by CORE-II (Co-ordinated Ocean–Ice Reference Experiments, Phase II) data (1948–2009) is called O...
With semiconductor technology gradually approaching its physical and heat limits, recent supercomputers have adopted major architectural changes to continue increasing the performance through more power-efficient heterogeneous many-core systems. Examples include Sunway TaihuLight that has four Management Processing Elements (MPEs) and...
Uncertain parameters in physical parameterizations of general circulation models (GCMs) greatly impact model performance. In recent years, automatic parameter optimization has been introduced for tuning model performance of GCMs, but most of the optimization methods are unconstrained optimization methods under a given performance indicator. Therefo...
GROMACS is one of the most popular molecular dynamics (MD) applications and is widely used in the study of chemical and biomolecular systems. Like other MD applications, it requires long run times for large-scale simulations. Therefore, many high-performance platforms have been employed to accelerate it, such as Knights Landing (KNL), Cell Pro...
The ability to predict the climate system is highly dependent on the efficient integration of observations and simulations of the Earth, which is regarded as a canonical example of a cyber-physical system. The climate system model, the simulation engine in this cyber-physical system, is one of the most challenging applications in scientific computi...
The Weather Research and Forecasting (WRF) Model is one of the most widely used mesoscale numerical weather prediction systems and is designed for both atmospheric research and operational forecasting applications. However, it is an extremely time-consuming application: running a single simulation takes researchers days to weeks as the simulation size sc...
As scientific applications are increasingly ported to GPUs to benefit from both the powerful computing capacity and high throughput, accelerating explicit solvers for GPU-based finite volume methods is gaining more and more attention. In this paper, based on the detailed analysis of the FVM algorithm, we present a set of novel optimization methods,...
Uncertain parameters in physical parameterizations of General Circulation Models (GCMs) greatly impact model performance. In recent years, automatic parameter optimization has been introduced for tuning model performance of GCMs but most of the optimization methods are unconstrained optimization methods under a given performance indicator, so that...
Tropical cyclone (TC) genesis is a problem of great significance in climate and weather research. Although various environmental conditions necessary for TC genesis have been recognized for a long time, prediction of TC genesis remains a challenge due to complex and stochastic processes involved during TC genesis. Different from traditional statist...
We introduce NAMSG, an adaptive first-order algorithm for training neural networks. The method is efficient in computation and memory, and straightforward to implement. It computes the gradients at configurable remote observation points, in order to expedite the convergence by adjusting the step size for directions with different curvatures, in the...
In this paper, we propose an efficient time-space-domain optimized (OptTS) finite difference scheme to model 2D and 3D scalar wave propagation. It adopts piecewise constant interpolation coefficients for several consecutive Courant number ranges, which avoids the extra time costs caused by loading the coefficients consecutively according to differe...
Coastal areas, where sea breezes are prevalent, generally have good wind power resources and are favorable sites for wind farms. Corkscrew sea breezes, having greater wind power than backdoor sea breezes, dominate the local circulations in summer over the coastal area of Jiangsu province, China. Daily Weather Research and Forecasting simulations wer...
Traditional trial-and-error tuning of uncertain parameters in global atmospheric general circulation models (GCMs) is time consuming and subjective. This study explores the feasibility of automatic optimization of GCM parameters for fast physics by using short-term hindcasts. An automatic workflow is described and applied to the Community Atmospher...
The sparse triangular solver (SpTRSV) is one of the most essential kernels in many scientific and engineering applications. Efficiently parallelizing the SpTRSV on modern many-core architectures is considerably difficult due to the inherent dependencies in the computation and discontinuous memory accesses. Achieving high performance of SpTRSV is even more ch...
Soil organic carbon (SOC) has a significant effect on carbon emissions and climate change. However, the current SOC prediction accuracy of most models is very low. Most evaluation studies indicate that the prediction error mainly comes from parameter uncertainties, which can be reduced by parameter calibration. Data assimilation techniques have be...
Sparse Matrix-Vector Multiplication (SpMV) is an essential computation kernel for many data-analytic workloads running in both supercomputers and data centers. The intrinsic irregularity of SpMV makes high performance difficult to achieve, especially when porting to new architectures. In this paper, we present our work on designing and implementing...
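The irregularity mentioned above is easiest to see in a plain compressed sparse row (CSR) SpMV. The sketch below is a generic serial reference, not the implementation the paper describes; the function name and the three-array CSR layout (`indptr`, `indices`, `data`, the same convention SciPy uses) are assumptions for illustration.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    Row i's nonzeros are data[indptr[i]:indptr[i + 1]], located in the
    columns listed in indices[indptr[i]:indptr[i + 1]].
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Row lengths vary and indices[k] drives an indirect gather from x:
        # this is the irregular access pattern that hurts performance.
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
print(csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0]))
```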
Owing to its advantages in scalability and reliability, the floating random walk (FRW) algorithm has been widely adopted for calculating the capacitances among three-dimensional (3-D) conductors. This is evidenced by the industrial practice of interconnect capacitance extraction during the design of high-performance very large-scale integrated (VLSI)...
Electromagnetic transient (EMT) simulation is the most accurate and most computationally intensive type of simulation for power systems. Past research has shown the potential of accelerating such simulations using graphics processing units (GPUs). In this paper, an efficient GPU-based parallel EMT simulator is designed. Thread-oriented model transformations are first propose...
Traditional trial-and-error tuning of uncertain parameters in global atmospheric General Circulation Models (GCMs) is time consuming and subjective. This study explores the feasibility of automatic optimization of GCM parameters for fast physics by using short-term hindcasts. An automatic workflow is described and applied to the Community Atmospheri...
Performance variance is becoming increasingly challenging on current large-scale HPC systems. Even with a fixed number of computing nodes, the execution time of several runs can vary significantly. Many parallel programs executing on supercomputers suffer from such variance. Performance variance not only causes unpredictable performance requirement vi...
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world applications. Currently, much research on parallel SpTRSV focuses on level-set construction for reducing the number of inter-level synchronizations. However, the out-of-control data reuse and high cost for global memory or shared cache access in inter-level syn...
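The level-set construction referred to above can be sketched generically: each row of a lower-triangular matrix is assigned a level one higher than the deepest row it depends on, rows within a level are independent, and a synchronization is needed only between levels. This is a textbook serial illustration under assumed CSR inputs and hypothetical function names; it does not reproduce the paper's handling of data reuse or synchronization cost.

```python
def build_levels(n, indptr, indices):
    """Group the rows of a lower-triangular CSR matrix into level sets.

    Row i's level is one more than the highest level of any row j < i
    that it depends on (nonzero L[i, j]); rows in one level are independent.
    """
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def sptrsv_levelset(n, indptr, indices, data, b):
    """Solve L x = b level by level; assumes every row stores its diagonal."""
    x = [0.0] * n
    for rows in build_levels(n, indptr, indices):
        # Rows in this loop could run in parallel; a barrier would follow it.
        for i in rows:
            s, diag = b[i], None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < i:
                    s -= data[k] * x[j]
                elif j == i:
                    diag = data[k]
            x[i] = s / diag
    return x


# L = [[2, 0, 0], [0, 1, 0], [1, 1, 4]]: rows 0 and 1 form level 0, row 2 level 1.
print(build_levels(3, [0, 1, 2, 5], [0, 1, 0, 1, 2]))
print(sptrsv_levelset(3, [0, 1, 2, 5], [0, 1, 0, 1, 2], [2.0, 1.0, 1.0, 1.0, 4.0], [2.0, 2.0, 15.0]))
```

Fewer, wider levels mean fewer synchronizations, which is the quantity level-set construction tries to reduce.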
The original version of this Article contained an error in Figure 2. In panel a, the x axis of the graph was incorrectly labeled 'precipitation bias', and should have read 'negative precipitation bias'. This error has been corrected in both the PDF and HTML versions of the Article.
The response of surface winds over the equatorial Pacific to cloud-related parameters is quantified by using a uniform sampling method and conducting a large number of perturbed parameter simulations. The results show that the surface winds are highly sensitive to, and even linearly dependent on, some parameters that include the precipitation effici...
This paper reports our large-scale nonlinear earthquake simulation software on Sunway TaihuLight. Our innovations include: (1) a customized parallelization scheme that employs the 10 million cores efficiently at both the process and the thread levels; (2) an elaborate memory scheme that integrates on-chip halo exchange through register communication...
Memory accesses limit the performance and scalability of countless applications. Many design and optimization efforts will benefit from an in-depth understanding of memory access behavior, which is not offered by extant access tracing and profiling methods.
In this paper, we adopt a holistic memory access profiling approach to enable a better under...
To investigate the impacts of uncertain parameters on simulated Pacific Walker circulation (PWC), a large number of perturbed parameter simulations are conducted using GAMIL2 (the Grid-point Atmospheric Model of IAP/LASG, version 2), and three different PWC indices are selected. The results show that the influences of some parameters on PWC are dep...
Climate models show a conspicuous summer warm and dry bias over the central United States. Using results from 19 climate models in the Coupled Model Intercomparison Project Phase 5 (CMIP5), we report a persistent dependence of warm bias on dry bias with the precipitation deficit leading the warm bias over this region. The precipitation deficit is a...
Soil organic carbon (SOC) has a significant effect on carbon emissions and climate change. However, the current SOC prediction accuracy of most models is very low. Most evaluation studies indicate that the prediction error mainly comes from parameter uncertainties, which can be substantially reduced by parameter calibration. Data assimilation technique...
The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration...
FPGA-based reconfigurable dataflow engines provide a novel architecture to achieve breakthroughs in both time and energy to solution in numerical simulations. This article presents an efficient dataflow methodology for solving the Euler atmospheric dynamic equations, an essential step for mesoscale atmospheric simulation. The authors present custom...
In this paper, we study the problem of keyword search with access control over encrypted data in cloud computing. We first propose a scalable framework where a user can use their attribute values and a search query to locally derive a search capability, and a file can be retrieved only when its keywords match the query and the user’s attribute values c...
An ultra-scalable fully-implicit solver is developed for stiff time-dependent problems arising from the hyperbolic conservation laws in nonhydrostatic atmospheric dynamics. In the solver, we propose a highly efficient hybrid domain-decomposed multigrid preconditioner that can greatly accelerate the convergence rate at the extreme scale. For solvin...
The Sunway TaihuLight supercomputer is the world’s first system with a peak performance greater than 100 PFlops. In this paper, we provide a detailed introduction to the TaihuLight system. In contrast with other existing heterogeneous supercomputers, which include both CPU processors and PCIe-connected many-core accelerators (NVIDIA GPU or Intel Xe...
In climate change studies, the atmospheric model is an essential component for building a high-resolution climate simulation system. While the accuracy of atmospheric simulations has long been limited by the computational capabilities of CPU platforms, the heterogeneous platforms equipped with accelerators are becoming promising candidates for achi...
The tridiagonal solver is an important kernel in many scientific and engineering applications. Although quite a few parallel algorithms have been exploited recently, challenges still remain when solving tridiagonal systems on many-core architectures. In this paper, quantitative analysis is conducted to guide the selection of algorithms on different...
Physical parameterization is one of the most important sources of uncertainties in the current climate system models. With the increasing complexity of models and the diverse requirements for climate studies, the a priori, manual model tuning method for physical parameterization has become a bottleneck to further improving the climate system model....
In the present study, the LASG/IAP Climate system Ocean Model version 2 (LICOM2) was implemented to replace the original ocean component in the Community Earth System Model version 1.0.4 (CESM1) to form a new coupled model referred to as CESM1+LICOM2. The simulation results from a 300-yr preindustrial experiment by using this model were evaluated a...
An overview of the Chinese National Key Basic Research Project entitled “Development and Evaluation of High-Resolution Climate System Models” under grant No. 2010CB951900 is presented. The background and the objectives of the project are introduced. The main progress made in the past 5 years of the project is the development of “one system” and “tw...
The ensemble method is effective at reducing model uncertainties. In this work, a novel ensemble technology has been developed and applied to the coupling process in the climate system model, forming a flexible multi-model ensemble coupling platform. This platform can couple an ensemble of multiple atmospheric models or multiple re...
This book is based on the project "Development and Validation of High Resolution Climate System Models" with the support of the National Key Basic Research Project under grant No. 2010CB951900. It demonstrates the major advances in the development of new, dynamical Atmospheric General Circulation Model (AGCM) and Ocean General Circulation Model (OG...
Physical parameterizations in general circulation models (GCMs), having various uncertain parameters, greatly impact model performance and model climate sensitivity. Traditional manual and empirical tuning of these parameters is time-consuming and ineffective. In this study, a "three-step" methodology is proposed to automatically and effectively ob...
In this work, an ultra-scalable algorithm is designed and optimized to accelerate a 3D compressible Euler atmospheric model on the CPU-MIC hybrid system of Tianhe-2. We first reformulate the mesoscale model to avoid long-latency operations, and then employ carefully designed inter-node and intra-node domain decomposition algorithms to achieve balance...
Physical parameterizations in General Circulation Models (GCMs), having various uncertain parameters, greatly impact model performance and model climate sensitivity. Traditional manual and empirical tuning of these parameters is time consuming and ineffective. In this study, a "three-step" methodology is proposed to automatically and effectively ob...
Stencils are among the most important and time-consuming kernels in many applications. While stencil optimization has been a well-studied topic on CPU platforms, achieving higher performance and efficiency for the evolving numerical stencils on the more recent multi-core and many-core architectures is still an important issue. In this paper, we exp...
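As a concrete instance of the kind of kernel meant here, the sketch below applies a generic 2D five-point Jacobi stencil with fixed boundary values. The choice of stencil and the function name are assumptions for illustration only; they are not the numerical stencils studied in the paper. Each output point depends only on its four neighbours from the previous sweep, so the inner loops are fully parallel, but every point is read several times, which is why stencils tend to be memory-bound.

```python
def jacobi_5pt(grid, sweeps):
    """Apply Jacobi sweeps of the 2D five-point stencil to interior points.

    Boundary values are held fixed. Each sweep reads the old grid and writes
    a new one (double buffering), so updates within a sweep are independent.
    """
    rows, cols = len(grid), len(grid[0])
    cur = [row[:] for row in grid]
    for _ in range(sweeps):
        nxt = [row[:] for row in cur]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                nxt[i][j] = 0.25 * (cur[i - 1][j] + cur[i + 1][j]
                                    + cur[i][j - 1] + cur[i][j + 1])
        cur = nxt
    return cur


# One sweep on a 3x3 grid whose boundary is all ones.
print(jacobi_5pt([[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]], 1))
```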
An Interactive Ensemble (IE) platform was established based on a Standard Coupled (SC) climate model with seven atmosphere–land model realizations coupled to a single ocean model and a single sea ice model. The IE strategy reduces stochastic noise generated by atmospheric dynamics and therefore can be used to estimate the impact of atmospheric pert...
Scientific data analysis and visualization have become key components of today's large-scale simulations. Due to the rapidly increasing data volume and awkward I/O patterns among highly structured files, known serial methods/tools cannot scale well and usually lead to poor performance on traditional architectures. In this paper, we propose a ne...
The tridiagonal system solver is an important kernel in many scientific and engineering applications. Even though quite a few parallel algorithms and implementations have been proposed in recent years, challenges still remain when solving large-scale tridiagonal systems on heterogeneous supercomputers. In this paper, a hierarchical algorithm framework S...
Atmospheric modeling is an essential issue in the study of climate change. However, due to the complicated algorithmic and communication models, scientists and researchers are facing tough challenges in finding efficient solutions to solve the atmospheric equations. In this paper, we accelerate a solver for the three-dimensional Euler atmospheric...
Numerical weather forecasting is one of the most efficient means of reducing the effects of unexpected weather events. With increasing prediction precision and time-critical requirements, high-performance computing technologies have improved considerably. However, I/O has become a significant performance bottleneck when scaling up to thousands of processe...
This paper presents a hybrid algorithm for the petascale global simulation of atmospheric dynamics on Tianhe-2, the world's current top-ranked supercomputer developed by China's National University of Defense Technology (NUDT). Tianhe-2 is equipped with both Intel Xeon CPUs and Intel Xeon Phi accelerators. A key idea of the hybrid algorithm is to e...
The chaotic atmospheric circulations and the ocean–atmosphere coupling may both cause variations in the North Atlantic Oscillation (NAO). This study uses an interactive ensemble (IE) coupled model to study the contribution of the atmospheric noise and coupling to the monthly variability of the NAO. In the IE model, seven atmospheric general circula...
One of the most essential and challenging components in climate modeling is the atmospheric model. To solve multiphysical atmospheric equations, developers have to face extremely complex stencil kernels that are costly in terms of both computing and memory resources. This article aims to accelerate the solution of global shallow water equations (SW...
Distributed watershed ecohydrological modelling, which involves massive data and intensive computation, places a rising demand on high-performance computing. Until now, model parallelisation has mainly been conducted at sub-basin granularity, which has low parallel efficiency and tends to cause load imbalance. Few studies have been conducted at a granularity of grid...
One of the most essential and challenging components in a climate system model is the atmospheric model. To solve the multi-physical atmospheric equations, developers have to face extremely complex stencil kernels. In this paper, we propose a hybrid CPU-FPGA algorithm that applies single and multiple FPGAs to compute the upwind stencil for the glob...
This paper presents a novel strategy to improve the scalability of the barotropic mode in the Parallel Ocean Program (POP), by theoretically analyzing the barotropic communications bottleneck. POP discretizes the elliptic equations of the barotropic mode into a linear system Ax=b and solves it using the Preconditioned Conjugate Gradient (PCG) met...
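For readers unfamiliar with PCG, the sketch below is a generic textbook version for a small dense symmetric positive-definite system Ax=b. It is not POP's solver: the diagonal (Jacobi) preconditioner, the dense list-of-lists matrix, and the function name `pcg` are all assumptions chosen to keep the example self-contained and serial. The comments mark the inner products; in a distributed-memory run each of these typically becomes a global reduction, one plausible source of the communication cost that grows with process count.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a dense SPD matrix A.

    Uses M = diag(A) (Jacobi) as an assumed stand-in preconditioner.
    """
    n = len(b)

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(v):
        return [dot(row, v) for row in A]

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # applies M^-1
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        ap = matvec(p)  # local work (a halo exchange in a distributed code)
        alpha = rz / dot(p, ap)  # inner product -> global reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:  # convergence check -> global reduction
            break
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)  # inner product -> global reduction
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```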
This paper discusses performance optimization on the dynamical core of global numerical weather prediction model in Global/Regional Assimilation and Prediction System (GRAPES). GRAPES is a new generation of numerical weather prediction system developed and currently used by the China Meteorological Administration. The computational performance of the dy...
Developing highly scalable algorithms for global atmospheric modeling is becoming increasingly important as scientists seek to understand the behavior of the global atmosphere at extreme scales. Nowadays, heterogeneous architectures based on both processors and accelerators are becoming an important solution for large-scale computing. However, large-...
Cloud computing reduces large capital outlays on facility purchases and eliminates complex system management for users. To protect data confidentiality in cloud utilization, sensitive data are usually stored in encrypted form, making traditional search service on plaintext inapplicable. Thus, enabling keyword search over encrypted data becomes a...
Sea ice is an important component in the Earth’s climate system. Coupled climate system models are indispensable tools for the study of sea ice, its internal processes, interaction with other components, and projection of future changes. This paper evaluates the simulation of sea ice by the Flexible Global Ocean-Atmosphere-Land System model Grid-po...
As the only method to study long-term climate trends and predict potential climate risks, climate modeling is becoming a key research topic among governments and research organizations. One of the most essential and challenging components in climate modeling is the atmospheric model. To cover high resolution in climate simulation...