Figure 7 - uploaded by Akihiko Konagaya
The sustained performance of the Protein Explorer with the direct summation algorithm. The calculation time per step (left axis) and the efficiency are plotted. The solid and dashed lines show the petaflops system and the 2-Tflops system, respectively.

Source publication
Conference Paper
Full-text available
We are developing the 'Protein Explorer' system, a petaflops special-purpose computer system for molecular dynamics simulations. The Protein Explorer is a PC cluster equipped with special-purpose engines that calculate nonbonded interactions between atoms, which is the most time-consuming part of the simulations. A dedicated LSI 'MDGRAPE-3 chip' pe...

Context in source publication

Context 1
... model is based on the direct summation algorithm, which will show the best sustained performance. Figure 7 shows the sustained performance of the Protein Explorer. The total time $T = T_{\mathrm{PE}} + T_{\mathrm{host}} + T_{\mathrm{comm}} + T_{\mathrm{MPI}}$ and the efficiency $T_{\mathrm{PE}}/T$ are plotted. ...
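As a rough illustration of the two plotted quantities, the sketch below adds the four time components into the total time per step and forms the efficiency as the share spent in the first component. The function name `sustained_efficiency` and the timing values are hypothetical, chosen only so the arithmetic is easy to follow; they are not measurements from the paper, and the excerpt itself does not define what each component covers.

```python
def sustained_efficiency(t_pe, t_host, t_comm, t_mpi):
    """Return (total time per step, efficiency) from the four time components.

    Total time is T = T_PE + T_host + T_comm + T_MPI and
    efficiency is T_PE / T, as in the excerpt above.
    """
    total = t_pe + t_host + t_comm + t_mpi
    return total, t_pe / total


# Hypothetical timings in seconds per step (illustrative only).
total, efficiency = sustained_efficiency(0.5, 0.25, 0.125, 0.125)
print(f"time per step = {total} s, efficiency = {efficiency:.0%}")
```

With these made-up inputs, half of each step is spent in the first component, so the efficiency is 50%. Shrinking any of the other three terms raises the efficiency toward 1.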

Citations

... The ASIC designs discussed follow a similar trajectory to FPGA-based development. In the earlier period of the work, the MDGRAPE [118,128,130] and MD-Engine [120] designs implemented brute-force nonbonded calculations, even if they were part of a larger more complex simulation workflow as is the case in ref 120. The Anton design implemented the full-md simulation on chip coupled with a more sophisticated PME nonbonded force calculation. ...
... ASICs are also orders of magnitude more expensive to build, with the costs for full clusters given in the millions of USD [118,134]. The justifications for development of ASICs have been cited as being a choice between developing better algorithms versus developing specialized hardware [117,118,120,130]. The large upfront costs were amortized by the expected improvement in potential simulation throughput versus general processors which was demonstrated for refs 40 and 141. ...
Article
Full-text available
Atomistic Molecular Dynamics (MD) simulations provide researchers the ability to model biomolecular structures such as proteins and their interactions with drug-like small molecules with greater spatiotemporal resolution than is otherwise possible using experimental methods. MD simulations are notoriously expensive computational endeavors that have traditionally required massive investment in specialized hardware to access biologically relevant spatiotemporal scales. Our goal is to summarize the fundamental algorithms that are employed in the literature to then highlight the challenges that have affected accelerator implementations in practice. We consider three broad categories of accelerators: Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs). These categories are comparatively studied to facilitate discussion of their relative trade-offs and to gain context for the current state of the art. We conclude by providing insights into the potential of emerging hardware platforms and algorithms for MD.
... However, most scientists cannot afford a supercomputer with such high computing power [154]. Other supercomputers with special-purpose hardware such as MDGRAPE [155], MD Engine [156] and FASTRUN [157] have been designed to speed up the most computationally expensive parts of MD simulation. More recently, computer systems with powerful graphics processing units (GPUs) have been introduced, allowing MD simulations to run at minimal cost [158]. ...
Article
Full-text available
With the financial requirements and high time associated with bringing a commercial drug to the market, the application of computer-aided drug design has been recognized as a powerful technology in the drug discovery pipeline. In accelerating drug discovery, molecular modeling techniques have experienced considerable growth in computational capabilities over the last decade. Pharmaceutical companies and academic research organizations are currently using various computational modeling techniques to lower the cost and time required for the discovery of an effective drug. In this article, we focus on reviewing three key components of molecular modeling (Molecular Docking, Molecular Dynamics, and ADMET modeling), their applications, and limitations in small-molecule drug discovery. We discussed the technicalities encircling molecular dynamics and docking, the algorithms used to develop the docking software, and the models explored by these algorithms coupled with their scoring functions. We also reviewed the influence of molecular dynamics simulations (all atoms and coarse-grained molecular dynamics simulations) in drug discovery and also elucidated how the ensembles generated from MD simulations could pave the way for novel drug discovery. Furthermore, we briefly explain the role played by pharmacokinetics and pharmacodynamics profiling in discovering new leads for therapeutic efficacy. Besides the computational success of molecular modeling in drug discovery, we highlighted the experimental corroboration of in silico discovered drug candidates. However, as there is hardly a drug in the market discovered primarily with the use of computational modeling, we concluded the review by proposing possible solutions that could foster the advancement and clinical success of drugs.
... See, e.g., Refs. [38,39,48,49,54]. These attempts have made it possible to perform EFFMD for systems up to a spatial scale of sub-millimeters (twenty trillion atoms) [55] or a temporal scale of up to milliseconds [49]. ...
Preprint
Full-text available
We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that the GPU version is 7 times faster than the CPU version with the same power consumption. The code can scale up to the entire Summit supercomputer. For a copper system of 113,246,208 atoms, the code can perform one nanosecond MD simulation per day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such unprecedented ability to perform MD simulation with ab initio accuracy opens up the possibility of studying many important issues in materials and molecules, such as heterogeneous catalysis, electrochemical cells, irradiation damage, crack propagation, and biochemical reactions.
... There exist only a few dedicated systems that have been built to enhance MD simulation performance. Two older ones are the MD-ENGINE [40] and MDGRAPE-3 [41]. The MD-ENGINE is a simple ASIC coprocessor that evaluates nonbonded forces only (with a non-optimized Ewald sum method which is $O(N^2)$). ...
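For orientation, an all-pairs evaluation of nonbonded forces costs $O(N^2)$ because every pair of atoms is visited once. The following is a minimal Python sketch of that scaling only, not the MD-ENGINE or MDGRAPE-3 implementation. It assumes a bare Coulomb term in reduced units (Coulomb constant set to 1), ignores periodic images and the Ewald splitting mentioned in the excerpt, and the function name `direct_coulomb_forces` is hypothetical.

```python
import math


def direct_coulomb_forces(positions, charges):
    """All-pairs Coulomb forces in reduced units (Coulomb constant = 1).

    positions: list of (x, y, z) tuples; charges: list of floats.
    The double loop visits N*(N-1)/2 pairs, hence the O(N^2) cost.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Separation vector pointing from atom j to atom i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            scale = charges[i] * charges[j] / (r2 * math.sqrt(r2))
            for k in range(3):
                forces[i][k] += scale * d[k]  # force on i due to j
                forces[j][k] -= scale * d[k]  # Newton's third law
    return forces


# Two unit charges 2 length units apart repel with magnitude 1/r^2 = 0.25.
forces = direct_coulomb_forces([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])
print(forces)
```

Doubling the number of atoms roughly quadruples the pair count, which is why this part of the calculation is the usual target for dedicated hardware.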
Preprint
Full-text available
Classical molecular dynamics (MD) simulations are important tools in life and material sciences since they allow studying chemical and biological processes in detail. However, the inherent scalability problem of particle-particle interactions and the sequential dependency of subsequent time steps render MD computationally intensive and difficult to scale. To this end, specialized FPGA-based accelerators have been repeatedly proposed to ameliorate this problem. However, to date none of the leading MD simulation packages fully support FPGA acceleration and a direct comparison of GPU versus FPGA accelerated codes has remained elusive so far. With this report, we aim at clarifying this issue by comparing measured application performance on GPU-dense compute nodes with performance and cost estimates of an FPGA-based single-node system. Our results show that an FPGA-based system can indeed outperform a similarly configured GPU-based system, but the overall application-level speedup remains in the order of 2x due to software overheads on the host. Considering the price for GPU and FPGA solutions, we observe that GPU-based solutions provide the better cost/performance tradeoff, and hence pure FPGA-based solutions are likely not going to be commercially viable. However, we also note that scaled multi-node systems could potentially benefit from a hybrid composition, where GPUs are used for compute intensive parts and FPGAs for latency and communication sensitive tasks.
... Molecular dynamics (MD) simulation [1,2,3,4,5,6,7,8,9] is one of the past decade's most important tools in that it enables biology scientists and researchers to explore human health and diseases. In order to observe those critical biology phenomena, it iteratively simulates the motions of the molecular dynamics at an atomic level. ...
... We can divide the prior studies into the following three main categories: 1) ASIC-based acceleration; 2) GPU-based acceleration; and 3) FPGA-based acceleration. First, several ASIC-based machines have been built to accelerate the MD simulation, including MD-GRAPE [4], GRAPE-based Protein Explorer [5], and Anton from D.E. Shaw [6,7,8]. ...
... Molecular dynamics (MD) simulation [1,2,3,4,5,6,7,8,9] is one of the most important tools for observing those critical biology phenomena. Basically, it simulates the motions of the molecular systems at an atomic level by $10^{6}$ to $10^{12}$ iterations for practical usage, which makes it very time consuming. ...
Article
Full-text available
Molecular dynamics (MD) simulation is one of the past decade's most important tools for enabling biology scientists and researchers to explore human health and diseases. However, due to the computation complexity of the MD algorithm, it takes weeks or even months to simulate a comparatively simple biology entity on conventional multicore processors. The critical path in molecular dynamics simulations is the force calculation between particles inside the simulated environment, which has abundant parallelism. Among various acceleration platforms, FPGA is an attractive alternative because of its low power and high energy efficiency. However, due to its high programming cost using RTL, none of the mainstream MD software packages has yet adopted FPGA for acceleration. In this paper we revisit the FPGA acceleration of MD in high-level synthesis (HLS) so as to provide affordable programming cost. Our experience with the MD acceleration demonstrates that HLS optimizations such as loop pipelining, module duplication and memory partitioning are essential to improve the performance, achieving a speedup of 9.5X compared to a 12-core CPU. More importantly, we observe that even the fully optimized HLS design can still be 2X slower than the reference RTL architecture due to the common dynamic (conditional) data flow behavior that is not yet supported by current HLS tools. To support such behavior, we further customize an array of processing elements together with a data-driven streaming network through a common RTL template, and fully automate the design flow. Our final experimental results demonstrate a 19.4X performance speedup and 39X energy efficiency for the widely used ApoA1 MD benchmark on the Convey HC1ex FPGA compared to a 12-core Intel Xeon server.
... As an acceleration technique, lookup table methods have been widely applied in MD research. Andrea et al. [10] used a lookup table method to calculate long-range forces in their simulation of water, and Taiji et al. [11] employed a lookup table method to evaluate forces in the L-J (6-12) potential and the Coulomb force in their "Protein Explorer" system. Furthermore, lookup table methods have been widely used in MD software packages including Desmond [12], CHARMM [13], GROMACS [14], NAMD [15], and so on. ...
Article
Full-text available
A critical challenge for molecular dynamics simulations of chemical or biological systems is to improve the calculation efficiency while retaining sufficient accuracy. The main bottleneck in improving the efficiency is the evaluation of nonbonded pairwise interactions. We propose a new piecewise lookup table method for rapid and accurate calculation of interatomic nonbonded pairwise interactions. The piecewise lookup table allows nonuniform assignment of table nodes according to the slope of the potential function and the pair interaction distribution. The proposed method assigns the nodes more reasonably than in general lookup tables, and thus improves the accuracy while requiring fewer nodes. To obtain the same level of accuracy, our piecewise lookup table accelerates the calculation via the efficient usage of cache memory. This new method is straightforward to implement and should be broadly applicable. Graphical Abstract Illustration of piecewise lookup table method
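To make the lookup table idea concrete, the sketch below tabulates the Lennard-Jones (6-12) potential on a uniform grid and answers queries by linear interpolation. This is a plain uniform-table baseline under stated assumptions, not the piecewise, nonuniform node assignment proposed in the abstract above. The function names, the reduced units (epsilon = sigma = 1), the range [0.9, 3.0], and the node count are all hypothetical choices for illustration.

```python
def lj(r, eps=1.0, sigma=1.0):
    """Lennard-Jones 6-12 potential, evaluated directly."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def build_table(r_min, r_max, n_nodes):
    """Tabulate lj() at n_nodes uniformly spaced distances in [r_min, r_max]."""
    step = (r_max - r_min) / (n_nodes - 1)
    values = [lj(r_min + i * step) for i in range(n_nodes)]
    return r_min, r_max, step, values


def lookup(table, r):
    """Linearly interpolate the tabulated potential at distance r."""
    r_min, r_max, step, values = table
    if not (r_min <= r <= r_max):
        raise ValueError("r is outside the tabulated range")
    x = (r - r_min) / step
    i = min(int(x), len(values) - 2)  # clamp so that i + 1 stays in range
    frac = x - i
    return values[i] + frac * (values[i + 1] - values[i])


table = build_table(0.9, 3.0, 2101)  # node spacing of about 0.001
r = 1.2345
error = abs(lookup(table, r) - lj(r))
print(f"interpolation error at r={r}: {error:.2e}")
```

A uniform grid spends the same node density everywhere, including the flat tail of the potential where few nodes are needed. That inefficiency is the motivation the abstract gives for assigning nodes nonuniformly.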
... The development of the GRAPE systems started with accelerators for astrophysical N-body simulations, and they have been extended for MD simulations such as MDGRAPE [16]. In this paper, we report on the architecture of the fourth-generation special-purpose computer for MD simulations, MDGRAPE-4. The MDGRAPE-4 is a successor to MDGRAPE-3 [17,18], but the architecture has been changed dramatically. Figure 1 illustrates the architectural transitions of GRAPE/MDGRAPE systems. The GRAPE-4 for astrophysics, which was completed in 1995, was the first tera floating-point operations per second (TFLOPS) machine [14]. ...
... The force summation is performed in a 32-bit fixed-point format. Although the MDGRAPE-4 force calculation pipeline has a structure that is quite similar to that of the MDGRAPE-3 pipeline [17,26], the following major differences exist. (i) MDGRAPE-4 calculates Coulomb and van der Waals force and potentials simultaneously, whereas the MDGRAPE-3 pipeline can evaluate only one of the four terms at a time. ...
Article
Full-text available
We are developing the MDGRAPE-4, a special-purpose computer system for molecular dynamics (MD) simulations. MDGRAPE-4 is designed to achieve strong scalability for protein MD simulations through the integration of general-purpose cores, dedicated pipelines, memory banks and network interfaces (NIFs) to create a system on chip (SoC). Each SoC has 64 dedicated pipelines that are used for non-bonded force calculations and run at 0.8 GHz. Additionally, it has 65 Tensilica Xtensa LX cores with single-precision floating-point units that are used for other calculations and run at 0.6 GHz. At peak performance levels, each SoC can evaluate 51.2 G interactions per second. It also has 1.8 MB of embedded shared memory banks and six network units with a peak bandwidth of 7.2 GB/s for the three-dimensional torus network. The system consists of 512 (8×8×8) SoCs in total, which are mounted on 64 node modules with eight SoCs. The optical transmitters/receivers are used for internode communication. The expected maximum power consumption is 50 kW. While the MDGRAPE-4 software is still being improved, we plan to run MD simulations on MDGRAPE-4 in 2014. The MDGRAPE-4 system will enable long-time molecular dynamics simulations of small systems. It is also useful for multiscale molecular simulations where the particle simulation parts often become bottlenecks.
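The per-SoC peak quoted above can be checked with a short calculation. The sketch assumes each pipeline completes one interaction per clock cycle, which is an assumption that happens to reproduce the stated 51.2 G interactions per second. The system-wide figure it prints is derived here by multiplying by the SoC count and is not a number given in the abstract. All variable names are hypothetical.

```python
# Figures taken from the abstract above.
pipelines_per_soc = 64
pipeline_clock_hz = 800_000_000  # 0.8 GHz
socs_per_module = 8
node_modules = 64

# Assumption: one pairwise interaction per pipeline per clock cycle.
interactions_per_soc = pipelines_per_soc * pipeline_clock_hz

total_socs = socs_per_module * node_modules  # matches the 8 x 8 x 8 torus
interactions_system = interactions_per_soc * total_socs

print(f"per SoC : {interactions_per_soc / 1e9:.1f} G interactions/s")
print(f"SoCs    : {total_socs}")
print(f"system  : {interactions_system / 1e12:.1f} T interactions/s (derived)")
```

The derived system figure is an upper bound under the same assumption. Sustained rates depend on communication and host overheads that this arithmetic does not model.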
... Using proper computational methods (molecular dynamics Kholmirzo T. Kholmurodov protein structures associated with diseases of a radiobiological nature (MD)) and efficiently implementing them or special-purpose MD machines, it is possible an adequate study. In the following sections strate an efficient use of molecular simulation of radiobiological objects (the p53 oncoprotein, visual pigment rhodopsin, cyclin-dependent kinases teins) [2][3][4][5][6][7][8][9][10][11][12][13][14]. ...
Article
Full-text available
The induced mutations in biological molecules, such as DNA and proteins, have quite a different nature (environmental factors, viruses, ionizing radiation, mutagenic chemicals, inherited genetic alterations, etc.). Induced mutations can destroy the existing chemical (hydrogen) bonds in the native molecular structures or, on the contrary, create new chemical (hydrogen) bonds that do not normally exist there. In protein structures, the cause of such changes might be the substitution of one or several specific amino acid residues (point mutations). At the atomic level, the replacement of one amino acid residue by another causes essential modifications of the molecular force fields of the environment, which can break important hydrogen bonds underlying the structural stability of biological molecules. In this work, based on molecular dynamics (MD) method, we demonstrate the effect of mutational structure changes on several biological protein models (the p53 oncoprotein, visual pigment rhodopsin, cyclin-dependent kinase, and recA protein). Molecular dynamics simulation is a powerful tool in investigating the structure properties of biological molecules on the atomic and molecular levels, and it has been widely used to study the structural conformational behavior of proteins. We also discuss the scenario of the mutation effects associated with different kinds of diseases that could develop and take place in physiological conditions.
... Several projects including FASTRUN (Fine et al., 1991), MDGRAPE (Taiji et al., 2003), and MD Engine (Toyoda et al., 1999) each have produced special-purpose hardware to support the acceleration of the most computationally expensive stages of an MD simulation. The Anton supercomputer (Shaw, 2009) mentioned previously is producing the most dramatic performance improvements in MD simulations to date achieving from microseconds up to one millisecond of chemical simulation time of a virtual system of over ten thousand atoms making up a small protein enveloped by water molecules. ...
Thesis
Full-text available
The quest to understand the mechanisms of the origin of life on Earth could be enhanced by computer simulations of plausible stages in the emergence of life from non-life at the molecular level. This class of simulation could then support testing and validation through parallel laboratory chemical experiments. This combination of a computational, or “cyber” component and a parallel effort investigation in chemical abiogenesis could be termed a cyberbiogenesis approach. The central technological challenge to cyberbiogenesis endeavours is to design computer simulation models permitting de novo emergence of prebiotic and biological virtual molecular structures and processes through multiple thresholds of complexity. This thesis takes on the challenge of designing, implementing and analyzing one such simulation model. This model can be described concisely as: distributed processing and global optimization through the method of search coupled with stochastic hill climbing supporting emergent phenomena within small volume, short time frame molecular dynamics simulations. The original contributions to knowledge made by this work are to frame computational origins of life endeavours historically; postulate and describe one concrete design to test a hypothesis surrounding this class of computation; present results from a prototype system, the EvoGrid, built to execute a range of experiments which test the hypothesis; and propose a road map and societal considerations for future computational origins of life endeavours.
... These algorithms can be divided into those that compute an estimate for the true value of ρ, and those that perform hypothesis testing (i.e., selecting between the hypothesis that the model satisfies the formula, versus the hypothesis that it does not). We note that generating each sample path can be very time-consuming in some domains (e.g., [13,53,56]), including modeling biochemical systems. Hence, it is important to sample as few traces from the model as possible. ...
Article
The stochastic dynamics of biochemical reaction networks can be modeled using a number of succinct formalisms all of whose semantics are expressed as Continuous Time Markov Chains (CTMC). While some kinetic parameters for such models can be measured experimentally, most are estimated by either fitting to experimental data or by performing ad hoc, and often manual search procedures. We consider an alternative strategy to the problem, and introduce algorithms for automatically synthesizing the set of all kinetic parameters such that the model satisfies a given high-level behavioral specification. Our algorithms, which integrate statistical model checking and abstraction refinement, can also report the infeasibility of the model if no such combination of parameters exists. Behavioral specifications can be given in any finitely monitorable logic for stochastic systems, including the probabilistic and bounded fragments of linear and metric temporal logics. The correctness of our algorithms is established using a novel combination of arguments based on survey sampling and uniform continuity. We prove that the probability of a measurable set of paths is uniformly and jointly continuous with respect to the kinetic parameters. Under a suitable technical condition, we also show that the unbiased statistical estimator for the probability of a measurable set of paths is monotonic in the parameter space. We apply our algorithms to two benchmark models of biochemical signaling, and demonstrate that they can efficiently find parameter regimes satisfying a given high-level behavioral specification. In particular, we show that our algorithms can synthesize up to 6 parameters, simultaneously, which is more than that reported by any other synthesis algorithm for stochastic systems. 
Moreover, when parameter estimation is desired, as opposed to synthesis, we show that our approach can scale to even higher dimensional spaces, by identifying the single parameter combination that maximizes the probability of the behavior being true in an 11-dimensional system.