Figure 9 - uploaded by David Borhani
An electron density map. The green contours are associated with potassium ions, red contours with water molecules, and grey-blue contours with the atoms of proteins. 

Source publication
Conference Paper
Full-text available
As parallel algorithms and architectures drive the longest molecular dynamics (MD) simulations towards the millisecond scale, traditional sequential post-simulation data analysis methods are becoming increasingly untenable. Inspired by the programming interface of Google's MapReduce, we have built a new parallel analysis framework called HiMach, wh...

Contexts in source publication

Context 1
... of electrons, thus illustrating the atomic structure of proteins and other macromolecules. While electron density maps are generally obtained through physical experiments such as X-ray diffraction measurements or electron microscopy reconstructions, MD simulations have introduced a new way of constructing electron density maps, as shown in Fig. 9. The iso-electron-density contours, shown as chicken-wire meshes, approximately represent the locations of potassium ions and water molecules within a conduction channel outlined by the rectangular box. The electron density map of an MD trajectory is defined as the average of the electron density maps associated with individual ...
Context 2
... electron density map defines a frame's 3D spatial (probability) distribution of electrons, thus illustrating the atomic structure of proteins and other macromolecules. While electron density maps are generally obtained through physical experiments such as X-ray diffraction measurements or electron microscopy reconstructions, MD simulations have introduced a new way of constructing electron density maps, as shown in Fig. 9. The iso-electron-density contours, shown as chicken-wire meshes, approximately represent the locations of potassium ions and water molecules within a conduction channel outlined by the rectangular box. The electron density map of an MD trajectory is defined as the average of the electron density maps associated with individual ...
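As a rough illustration of the trajectory-averaged density described in these contexts, the sketch below bins the ion positions of each frame onto a 3D grid and averages the per-frame grids. The file names, the atom-name selection, the grid extents, and the use of plain occupancy binning (rather than a proper electron-density kernel with per-atom electron counts) are simplifying assumptions for illustration, not the procedure used in the source publication.

```python
import numpy as np
import MDAnalysis as mda

# Hypothetical input files; replace with an actual topology/trajectory pair.
u = mda.Universe("system.psf", "trajectory.dcd")
ions = u.select_atoms("name K POT")          # assumed names for potassium ions

# A coarse 3D grid over an assumed region of interest (the conduction channel).
edges = [np.linspace(lo, hi, 41) for lo, hi in [(-20, 20), (-20, 20), (-30, 30)]]
density_sum = np.zeros((40, 40, 40))

# Accumulate a per-frame occupancy histogram, then average over frames.
# (A real electron density map would smear each atom by its electron count;
# this is only a schematic frame-by-frame average.)
n_frames = 0
for ts in u.trajectory:
    hist, _ = np.histogramdd(ions.positions, bins=edges)
    density_sum += hist
    n_frames += 1

average_density = density_sum / max(n_frames, 1)
```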

Citations

... HDF5/DMS [31] uses the HDF5 and PnetCDF I/O library calls to expose a virtual file to staging nodes. GLEAN [141,144] is another example of a simulation/visualization coupling tool initiated by making HDF5 and PnetCDF I/O library calls. DataSpaces [48] stores the simulation data on staging nodes with a spatially coherent layout and acts as a server for client applications. ...
Thesis
Numerical simulations are complex programs that allow scientists to solve, simulate, and model complex phenomena. High Performance Computing (HPC) is the domain in which these complex and heavy computations are performed on large-scale computers, also called supercomputers. Nowadays, most scientific fields need supercomputers to undertake their research; this is the case for cosmology, physics, biology, and chemistry. Recently, we have observed a convergence between Big Data/Machine Learning and HPC. Applications coming from these emerging fields (for example, those using Deep Learning frameworks) are becoming highly compute-intensive, and HPC facilities have emerged as an appropriate solution to run them. From the large variety of existing applications has risen a necessity for all supercomputers: they must be generic and compatible with all kinds of applications. Computing nodes themselves also come in a wide variety, from CPU to GPU, with specific nodes designed to perform dedicated computations. Each category of node is designed to perform very fast operations of a given type (for example, vector or matrix computation).

Supercomputers are used in a competitive environment: multiple users simultaneously connect and request sets of computing resources to run their applications. This competition for resources is managed by the machine itself via a specific program called the scheduler, which reviews, assigns, and maps the different user requests. Each user asks for (that is, pays for the use of) access to the resources of the supercomputer in order to run an application, and is granted access to some resources for a limited amount of time. This means that users need to estimate how many compute nodes they want to request and for how long, which is often difficult to decide.

In this thesis, we provide solutions and strategies to tackle these issues. We propose mathematical models, scheduling algorithms, and resource partitioning strategies in order to optimize high-throughput applications running on supercomputers. We focus on two types of applications in the context of the HPC/Big Data convergence: data-intensive and irregular (or stochastic) applications.

Data-intensive applications represent typical HPC frameworks. These applications are made up of two main components. The first one is called simulation, a very compute-intensive code that generates a tremendous amount of data by simulating a physical or biological phenomenon. The second component is called analytics, during which sub-routines post-process the simulation output to extract, generate, and save the final result of the application. We propose to optimize these applications by designing automatic resource partitioning and scheduling strategies for both components. To do so, we use the well-known in situ paradigm, which consists in scheduling both components together in order to reduce the huge cost of saving all simulation data on disks. We propose automatic resource partitioning models and scheduling heuristics to improve the overall performance of in situ applications.

Stochastic applications are applications whose execution time depends on the input, whereas in usual data-intensive applications the makespan of simulation and analytics is not affected by such parameters. Stochastic jobs originate from Big Data or Machine Learning workloads, whose performance is highly dependent on the characteristics of the input data, and they have recently appeared on HPC platforms. However, the uncertainty of their execution time remains a strong limitation when using supercomputers: the user needs to estimate how long the job will run on the machine and enters this estimate as a first reservation value. If the job does not complete successfully within this first reservation, the user has to resubmit it, this time requesting a longer reservation.
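The resubmission issue described in this abstract can be made concrete with a small sketch: given a discrete distribution of possible job runtimes, the expected cost of a sequence of increasing reservation requests is obtained by summing, for each possible runtime, the cost of every reservation attempted until one is long enough. The runtime distribution, the cost model (paying each attempted reservation in full plus a fixed resubmission overhead), and all numbers below are illustrative assumptions, not the thesis's actual model.

```python
# Hypothetical illustration: expected cost of a sequence of increasing
# reservation requests for a job whose runtime is uncertain.

runtimes = {2: 0.5, 5: 0.3, 9: 0.2}   # hypothetical runtime (hours) -> probability
reservations = [3, 6, 10]             # increasing reservation sequence (hours)
resubmit_overhead = 0.5               # assumed fixed cost per resubmission

def expected_cost(reservations, runtimes, overhead):
    """Sum, over possible runtimes, the cost of all reservations tried."""
    total = 0.0
    for runtime, prob in runtimes.items():
        cost = 0.0
        for attempt, t in enumerate(reservations):
            cost += t + (overhead if attempt > 0 else 0.0)
            if t >= runtime:          # this reservation is long enough; job finishes
                break
        total += prob * cost
    return total

print(expected_cost(reservations, runtimes, resubmit_overhead))
```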
... Perhaps more importantly, MPI allows us to implement our matrix computations of interest with significantly less inherent communication overhead. Indeed, each iteration of MapReduce includes a shuffle operation that is equivalent to an MPI_Alltoall call [7], [8], and any given implementation may require multiple iterations, meaning multiple shuffles. ...
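The equivalence between a MapReduce shuffle and MPI_Alltoall noted in this context can be illustrated with a minimal mpi4py sketch on a toy word-count-style workload; the key-to-rank hashing and the data are illustrative, not taken from the cited papers.

```python
import zlib
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Map phase (illustrative): each rank emits (key, value) pairs from its own data.
local_pairs = [(f"key{i}", 1) for i in range(rank, rank + 4)]

# Shuffle phase: route each pair to the rank that owns its key; the single
# all-to-all exchange plays the role of one MapReduce shuffle.
outgoing = [[] for _ in range(size)]
for key, value in local_pairs:
    owner = zlib.crc32(key.encode()) % size   # deterministic key-to-rank mapping
    outgoing[owner].append((key, value))
incoming = comm.alltoall(outgoing)            # one collective == one shuffle

# Reduce phase: each rank aggregates the values for the keys it owns.
counts = {}
for bucket in incoming:
    for key, value in bucket:
        counts[key] = counts.get(key, 0) + value
print(rank, counts)
```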
Preprint
The Singular Value Decomposition (SVD) is one of the most important matrix factorizations, enjoying a wide variety of applications across numerous domains. In statistics and data analysis, common applications of the SVD include Principal Components Analysis (PCA) and linear regression. Usually these applications arise on data that has far more rows than columns, so-called "tall/skinny" matrices. In the big data analytics context, this may take the form of hundreds of millions to billions of rows with only a few hundred columns. There is therefore a need for fast, accurate, and scalable tall/skinny SVD implementations that can fully utilize modern computing resources. To that end, we present a survey of three different algorithms for computing the SVD for these kinds of tall/skinny data layouts, using MPI for communication. We contextualize these with common big data analytics techniques, principally PCA. Finally, we present both CPU and GPU timing results from the Summit supercomputer, and discuss possible alternative approaches.
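One common way to compute a tall/skinny SVD with MPI, sketched below with mpi4py and NumPy, is the Gram-matrix (cross-products) approach: each rank holds a block of rows, the small k x k matrix A^T A is summed with a single Allreduce, and the right singular vectors come from its eigendecomposition. This is a generic sketch and not necessarily one of the three algorithms surveyed in the preprint; the matrix sizes are assumptions.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Each rank owns a block of rows of a tall/skinny matrix A (illustrative sizes).
rng = np.random.default_rng(comm.Get_rank())
A_local = rng.standard_normal((100_000, 50))

# Form the small 50 x 50 Gram matrix G = A^T A with one Allreduce.
G = A_local.T @ A_local
comm.Allreduce(MPI.IN_PLACE, G, op=MPI.SUM)

# Eigendecomposition of G gives the right singular vectors and singular values.
eigvals, V = np.linalg.eigh(G)
order = np.argsort(eigvals)[::-1]
sigma = np.sqrt(np.maximum(eigvals[order], 0.0))
V = V[:, order]

# Each rank can then recover its block of the left singular vectors.
U_local = (A_local @ V) / sigma   # assumes no zero singular values

if comm.Get_rank() == 0:
    print("leading singular values:", sigma[:5])
```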
... These trajectories need to be analyzed using statistical mechanics approaches 6,7 but because of the increasing size of data, trajectory analysis is becoming a bottleneck in typical biomolecular simulation scientific workflows 8 . Many data analysis tools and libraries have been developed to extract the desired information from the output trajectories from MD simulations [9][10][11][12][13][14][15][16][17][18][19][20][21][22] but few can efficiently use modern High Performance Computing (HPC) resources to accelerate the analysis stage. MD trajectory analysis primarily requires reading of data from the file system; the processed output data are typically negligible in size compared to the input data and therefore we exclusively investigate the reading aspects of trajectory I/O (i.e., the "I"). ...
Article
The performance of biomolecular molecular dynamics simulations has steadily increased on modern high‐performance computing resources but acceleration of the analysis of the output trajectories has lagged behind so that analyzing simulations is becoming a bottleneck. To close this gap, we studied the performance of trajectory analysis with message passing interface (MPI) parallelization and the Python MDAnalysis library on three different Extreme Science and Engineering Discovery Environment (XSEDE) supercomputers where trajectories were read from a Lustre parallel file system. Strong scaling performance was impeded by stragglers, MPI processes that were slower than the typical process. Stragglers were less prevalent for compute‐bound workloads, thus pointing to file reading as a bottleneck for scaling. However, a more complicated picture emerged in which both the computation and the data ingestion exhibited close to ideal strong scaling behavior whereas stragglers were primarily caused by either large MPI communication costs or long times to open the single shared trajectory file. We improved overall strong scaling performance by either subfiling (splitting the trajectory into separate files) or MPI‐IO with parallel HDF5 trajectory files. The parallel HDF5 approach resulted in near ideal strong scaling on up to 384 cores (16 nodes), thus reducing trajectory analysis times by two orders of magnitude compared with the serial approach.
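A minimal sketch of the kind of MPI-parallel per-frame trajectory analysis studied in this article, using mpi4py and MDAnalysis: each rank opens the shared trajectory, processes its own block of frames, and the partial results are gathered on rank 0. The file names, the radius-of-gyration observable, and the simple block decomposition are assumptions for illustration, not the authors' exact benchmark.

```python
import numpy as np
from mpi4py import MPI
import MDAnalysis as mda

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical topology/trajectory; every rank opens the shared trajectory file.
u = mda.Universe("topology.pdb", "trajectory.xtc")
protein = u.select_atoms("protein")

# Block decomposition: rank r processes frames [start, stop).
n_frames = len(u.trajectory)
per_rank = (n_frames + size - 1) // size
start, stop = rank * per_rank, min((rank + 1) * per_rank, n_frames)

# Per-frame computation (here: radius of gyration) on this rank's block.
local_rg = [protein.radius_of_gyration() for ts in u.trajectory[start:stop]]

# Gather the partial results on rank 0 and stitch them in frame order.
all_rg = comm.gather(local_rg, root=0)
if rank == 0:
    rg = np.concatenate([np.asarray(chunk) for chunk in all_rg if chunk])
    print("frames analyzed:", rg.size)
```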
... In the past several years, new frameworks emerged, providing parallel algorithms for analyzing MD trajectories. HiMach [126], by the D. E. Shaw Research group, extends Google's MapReduce for parallel trajectory data analysis. Pteros-2.0 ...
Chapter
Our project is at the interface of Big Data and HPC (High-Performance Big Data computing), and this paper describes a collaboration between seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-Performance and Big Data computing, with several different application areas or communities driving the requirements for software systems and algorithms. We describe the base architecture, including HPC-ABDS, the High-Performance Computing enhanced Apache Big Data Stack, and an application use case study identifying key features that determine software and algorithm requirements. We summarize middleware including the Harp-DAAL collective communication layer, the Twister2 Big Data toolkit, and pilot jobs. Then we present the SPIDAL Scalable Parallel Interoperable Data Analytics Library and our work on it in core machine-learning, image processing, and the application communities: Network science, Polar Science, Biomolecular Simulations, Pathology, and Spatial systems. We describe basic algorithms and their integration in end-to-end use cases.
... At the same time, access times for capacity storage hard disks and flash drives remain almost constant year-to-year. An emerging reality we need to confront is that we are fast approaching a point in time when it will be infeasible to comb through all the data we are generating in order to extract insight [1][2][3][4]. Fortunately, computational power, as defined in FLOPS, continues to increase in each cluster [5][6][7]. This has led the research community to move towards performing computation on data in-situ, i.e., as it streams to storage [8]. ...
Conference Paper
Full-text available
We are approaching a point in time when it will be infeasible to catalog and query data after it has been generated. This trend has fueled research on in-situ data processing (i.e. operating on data as it is streamed to storage). One important example of this approach is in-situ data indexing. Prior work has shown the feasibility of indexing at scale as a two-step process. First, one partitions data by key across the CPU cores of a parallel job. Then each core indexes its subset as data is persisted. Online partitioning requires transferring data over the network so that it can be indexed and stored by the core responsible for the data. This approach is becoming increasingly costly as new computing platforms emphasize parallelism instead of individual core performance that is crucial for communication libraries and systems software in general. In addition to indexing, scalable online data partitioning is also useful in other contexts such as load balancing and efficient compression. We present FilterKV, an efficient data management scheme for fast online data partitioning of key-value (KV) pairs. FilterKV reduces the total amount of data sent over the network and to storage. We achieve this by: (a) partitioning pointers to KV pairs instead of the KV pairs themselves and (b) using a compact format to represent and store KV pointers. Results from LANL show that FilterKV can reduce total write slowdown (including partitioning overhead) by up to 3x across 4096 CPU cores.
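The pointer-partitioning idea in this abstract can be illustrated schematically: instead of shuffling full key-value pairs, each core writes values to a local log and exchanges only compact pointer records (key hash, origin rank, local offset, length). The record layout, the hash function, and the sample data below are illustrative assumptions, not FilterKV's actual format.

```python
import struct
import zlib

# Illustrative KV pairs produced on one core.
local_kv = {b"particle:42": b"x=1.0,y=2.0,z=3.0",
            b"particle:7":  b"x=4.0,y=5.0,z=6.0"}

def partition_pointers(local_kv, rank, n_parts):
    """Write values to a local log and emit compact pointer records per partition.

    Each pointer record packs (key hash, origin rank, offset, length) into
    16 bytes, so only these small records need to cross the network during
    partitioning; the values themselves stay on the core that produced them.
    """
    log = bytearray()
    outgoing = [[] for _ in range(n_parts)]
    for key, value in local_kv.items():
        offset = len(log)
        log.extend(value)
        key_hash = zlib.crc32(key)
        record = struct.pack("<IIII", key_hash, rank, offset, len(value))
        outgoing[key_hash % n_parts].append(record)
    return bytes(log), outgoing

log, outgoing = partition_pointers(local_kv, rank=0, n_parts=4)
print([len(bucket) for bucket in outgoing], "pointer records;", len(log), "bytes kept local")
```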
... These trajectories need to be analyzed using statistical mechanics approaches [6,7] but because of the increasing size of data, trajectory analysis is becoming a bottleneck in typical biomolecular simulation scientific workflows [8]. Many data analysis tools and libraries have been developed to extract the desired information from the output trajectories from MD simulations [9][10][11][12][13][14][15][16][17][18][19][20][21][22] but few can efficiently use modern High Performance Computing (HPC) resources to accelerate the analysis stage. MD trajectory analysis primarily requires reading of data from the file system; the processed output data are typically negligible in size compared to the input data and therefore we exclusively investigate the reading aspects of trajectory I/O (i.e., the "I"). ...
... Different approaches to parallelizing the analysis of MD trajectories have been proposed. HiMach [14] introduces a scalable and flexible parallel Python framework for dealing with massive MD trajectories by combining and extending Google's MapReduce and the VMD analysis tool [11]. HiMach's runtime is responsible for parallelizing and distributing the Map and Reduce classes to the assigned cores. ...
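The Map/Reduce split described in this context can be sketched in plain Python; this is a schematic of the programming model only, not the actual HiMach API, and the channel-occupancy observable and toy data are assumptions.

```python
from collections import defaultdict

def map_frame(frame_index, frame):
    """Per-frame data acquisition: emit (key, value) pairs for one frame."""
    # Illustrative observable: count ions inside an assumed channel region.
    n_inside = sum(1 for x, y, z in frame["ion_positions"] if -15.0 < z < 15.0)
    yield ("ions_in_channel", n_inside)

def reduce_values(key, values):
    """Cross-frame analysis: aggregate the values emitted under one key."""
    values = list(values)
    return key, sum(values) / len(values)

def run(trajectory):
    """Tiny sequential driver; a parallel runtime would distribute both phases."""
    buckets = defaultdict(list)
    for i, frame in enumerate(trajectory):
        for key, value in map_frame(i, frame):
            buckets[key].append(value)
    return dict(reduce_values(k, vs) for k, vs in buckets.items())

# Toy trajectory: two frames with made-up ion coordinates.
toy = [{"ion_positions": [(0, 0, 5), (0, 0, 40)]},
       {"ion_positions": [(1, 1, -3)]}]
print(run(toy))
```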
... Zazen is implemented in a parallel disk cache system and avoids the overhead associated with querying metadata servers by reading data in parallel from local disks. This approach has also been used to improve the performance of HiMach [14]. It, however, advocates a specific architecture where a parallel supercomputer, which runs the simulations, immediately pushes the trajectory data to a local analysis cluster where trajectory fragments are cached on node-local disks. ...
Preprint
Full-text available
The performance of biomolecular molecular dynamics (MD) simulations has steadily increased on modern high performance computing (HPC) resources but acceleration of the analysis of the output trajectories has lagged behind so that analyzing simulations is increasingly becoming a bottleneck. To close this gap, we studied the performance of parallel trajectory analysis with MPI and the Python MDAnalysis library on three different XSEDE supercomputers where trajectories were read from a Lustre parallel file system. We found that strong scaling performance was impeded by stragglers, MPI processes that were slower than the typical process and that therefore dominated the overall run time. Stragglers were less prevalent for compute-bound workloads, thus pointing to file reading as a crucial bottleneck for scaling. However, a more complicated picture emerged in which both the computation and the ingestion of data exhibited close to ideal strong scaling behavior whereas stragglers were primarily caused by either large MPI communication costs or long times to open the single shared trajectory file. We improved overall strong scaling performance by two different approaches to file access, namely subfiling (splitting the trajectory into as many trajectory segments as number of processes) and MPI-IO with Parallel HDF5 trajectory files. Applying these strategies, we obtained near ideal strong scaling on up to 384 cores (16 nodes). We summarize our lessons-learned in guidelines and strategies on how to take advantage of the available HPC resources to gain good scalability and potentially reduce trajectory analysis times by two orders of magnitude compared to the prevalent serial approach.
... Computational biologists have considered the map/reduce model as an alternative to traditional HPC parallelization approaches to speed up post-hoc MD trajectory analysis. HiMach [20] was the first MD analysis framework to provide parallel execution capabilities inspired by Google's MapReduce. The authors initially considered Hadoop as a target but, due to its poor performance, developed a dedicated map/reduce framework based on MPI and Python, using VMD for the analysis kernels. ...
Conference Paper
Full-text available
In this paper, an on-line parallel analytics framework is proposed to process and store in transit all the data generated by a Molecular Dynamics (MD) simulation, using staging nodes in the same cluster that executes the simulation. Implementing and deploying such a parallel workflow with standard HPC tools, while managing problems such as data partitioning and load balancing, can be a hard task for scientists. In this paper we propose to leverage Apache Flink, a scalable stream processing engine from the Big Data domain, in this HPC context. Flink makes it possible to program analyses within a simple window-based map/reduce model, while the runtime takes care of deployment, load balancing, and fault tolerance. We build a complete in transit analytics workflow, connecting an MD simulation to Apache Flink and to a distributed database, Apache HBase, to persist all the desired data. To demonstrate the expressivity of this programming model and its suitability for HPC scientific environments, two common analyses in the MD field have been implemented. We assessed the performance of this framework, concluding that it can handle simulations of the sizes used in the literature while providing an effective and versatile tool for scientists to easily incorporate on-line parallel analytics into their current workflows.
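The window-based map/reduce model mentioned in this abstract can be illustrated without Flink: the sketch below groups a stream of per-frame records into fixed-size windows, applies a map to each record, and reduces each window. It mirrors the programming model only; the actual workflow uses Apache Flink's windowing operators, and the per-frame metric is an assumption.

```python
from statistics import mean

def windows(stream, size):
    """Group a stream of records into consecutive fixed-size windows."""
    window = []
    for record in stream:
        window.append(record)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window

def map_record(record):
    # Illustrative per-frame metric: kinetic-energy-like scalar from velocities.
    return sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in record["velocities"])

def reduce_window(mapped_values):
    # Per-window aggregate (persisted to the database in the real workflow).
    return mean(mapped_values)

# Toy stream of frames with made-up velocities.
frame_stream = ({"velocities": [(0.1 * i, 0.0, 0.2)]} for i in range(10))
for w in windows(frame_stream, size=4):
    print(reduce_window([map_record(r) for r in w]))
```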
... In the past several years, new frameworks have emerged providing parallel algorithms for analyzing MD trajectories. HiMach [120], by the D. E. Shaw Research group, extends Google's MapReduce for parallel trajectory data analysis. Pteros-2.0 ...
Technical Report
Full-text available
Our project is at the interface of Big Data and HPC (High-Performance Big Data computing), and this paper describes a collaboration between seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-Performance and Big Data computing, with several different application areas or communities driving the requirements for software systems and algorithms. We describe the base architecture, including HPC-ABDS, the High-Performance Computing enhanced Apache Big Data Stack, and an application use case study identifying key features that determine software and algorithm requirements. We summarize middleware including the Harp-DAAL collective communication layer, the Twister2 Big Data toolkit, and pilot jobs. Then we present the SPIDAL Scalable Parallel Interoperable Data Analytics Library and our work on it in core machine-learning, image processing, and the application communities: Network science, Polar Science, Biomolecular Simulations, Pathology, and Spatial systems. We describe basic algorithms and their integration in end-to-end use cases.
... During the last years, several frameworks have emerged providing parallel algorithms for analyzing MD trajectories. Some of those frameworks are [28], HiMach [29], Pteros 2.0 [30], MD-Traj [31], and nMoldyn-3 [32]. We compare these frameworks with our approach with respect to the parallelization techniques used. ...
... HiMach [29] was developed by the D. E. Shaw Research group to provide a parallel analysis framework for MD simulations and extends Google's MapReduce. The HiMach API defines trajectories and performs per-frame data acquisition (Map) and cross-frame analysis (Reduce). ...
Conference Paper
Full-text available
Different parallel frameworks for implementing data analysis applications have been proposed by the HPC and Big Data communities. In this paper, we investigate three task-parallel frameworks, Spark, Dask, and RADICAL-Pilot, with respect to their ability to support data analytics on HPC resources, and compare them to MPI. We investigate the data analysis requirements of Molecular Dynamics (MD) simulations, which are significant consumers of supercomputing cycles and produce immense amounts of data: a typical large-scale MD simulation of a physical system of O(100k) atoms over μsecs can produce from O(10) GB to O(1000) GB of data. We propose and evaluate different approaches to parallelizing a representative set of MD trajectory analysis algorithms, in particular the computation of path similarity and leaflet identification. We evaluate Spark, Dask, and RADICAL-Pilot with respect to their abstractions and runtime engine capabilities to support these algorithms. We provide a conceptual basis for comparing and understanding the different frameworks, enabling users to select the optimal system for each application. We also provide a quantitative performance analysis of the different algorithms across the three frameworks.
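As a sketch of how one of the named analyses (path similarity) can be parallelized with one of the evaluated frameworks, the code below uses dask.delayed to compute pairwise Hausdorff distances between a set of paths. The random stand-in paths and the choice of the symmetric Hausdorff metric are assumptions for illustration, not the paper's exact algorithm or data.

```python
import numpy as np
from dask import delayed, compute
from scipy.spatial.distance import directed_hausdorff

def hausdorff(p, q):
    """Symmetric Hausdorff distance between two paths (n_frames x n_coords)."""
    return max(directed_hausdorff(p, q)[0], directed_hausdorff(q, p)[0])

# Illustrative stand-ins for reduced MD paths (e.g., flattened CA coordinates).
rng = np.random.default_rng(0)
paths = [rng.standard_normal((200, 30)) for _ in range(8)]

# Build the task graph for the upper triangle of the distance matrix ...
tasks = {(i, j): delayed(hausdorff)(paths[i], paths[j])
         for i in range(len(paths)) for j in range(i + 1, len(paths))}

# ... and execute the independent pairwise computations in parallel.
results = compute(*tasks.values(), scheduler="threads")

# Assemble the symmetric all-pairs distance matrix.
D = np.zeros((len(paths), len(paths)))
for (i, j), d in zip(tasks.keys(), results):
    D[i, j] = D[j, i] = d
print(D.round(2))
```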