Edmond Chow's research while affiliated with Georgia Institute of Technology and other places


Publications (87)


Figure 2: The left panel displays the observed targets y_i and ground-truth function values f(x_i) for 30 points x_i uniformly distributed in the interval [−5, 5]. The targets are generated by y_i = f(x_i) + ϵ_i, where f(x) = 3x + 2 sin(2πx) and ϵ_i ∼ N(0, 1). The dataset is randomly split into 80% training and 20% testing data. The middle panel illustrates predictions and 95% confidence intervals from a single-stage GP using a zero-mean prior. The right panel presents results from the two-stage GP approach. The exact GP and the two-stage GP are both trained with the same optimizer and learning rate; more details can be found in Appendix B. Only 50% of the data are covered by the 95% confidence interval of the exact GP, whereas our method covers 96.67% of the data.
Figure 4: Uncertainty quantification results in RMSE.
Figure 5: Uncertainty quantification results in accuracy (%).
Figure 6: The left panel displays the observed targets y_i and true function values f(x_i) for 30 points x_i uniformly distributed in the interval [−5, 5]. The targets are generated by y_i = f(x_i) + ϵ_i, where f(x) = 3|x|^{3/2} + 2 sin(2πx) and ϵ_i ∼ N(0, 1). The middle panel illustrates predictions and 95% confidence intervals from a zero-mean Gaussian process (GP). The right panel presents results from the proposed two-stage GP approach (Algorithm 1). The exact GP underfits the data, with its 95% confidence interval covering only 66.7% of the data, while the two-stage GP covers 96.67%.
Figure 7: Contour plot of the NLL for the UCI Wine dataset, illustrating the pairwise contours of the optimal lengthscale, outputscale, and noise. Each row shows contours between two parameters: lengthscale vs. noise, outputscale vs. noise, and lengthscale vs. outputscale. Optimal hyperparameters for data subsets of 10%, 50%, 80%, and 100% are marked with a red dot, square, diamond, and cross, respectively. Lighter areas indicate higher NLL values, while darker areas signify lower NLL values. All contour plots are computed on the full training dataset.


Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling
  • Preprint
  • File available

May 2024 · 13 Reads

Shifan Zhao · Ji Yang · [...]

Gaussian Process Regression (GPR) is widely used in statistics and machine learning for prediction tasks requiring uncertainty measures. Its efficacy depends on the appropriate specification of the mean function, the covariance kernel function, and the associated hyperparameters. Severe misspecifications can lead to inaccurate results and problematic consequences, especially in safety-critical applications. However, a systematic approach to handling these misspecifications is lacking in the literature. In this work, we propose a general framework to address these issues. First, we introduce a flexible two-stage GPR framework that separates mean prediction and uncertainty quantification (UQ) to prevent mean misspecification, which can introduce bias into the model. Second, kernel function misspecification is addressed through a novel automatic kernel search algorithm, supported by theoretical analysis, that selects the optimal kernel from a candidate set. Additionally, we propose a subsampling-based warm-start strategy for hyperparameter initialization to improve efficiency and avoid hyperparameter misspecification. At much lower computational cost, our subsampling-based strategy can yield performance competitive with or better than training exclusively on the full dataset. Combining all these components, we recommend two GPR methods, exact and scalable, designed to match available computational resources and specific UQ requirements. Extensive evaluation on real-world datasets, including UCI benchmarks and a safety-critical medical case study, demonstrates the robustness and precision of our methods.
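
As a rough illustration of the two-stage idea on the toy problem of Figure 2, the sketch below fits a parametric mean first and then a zero-mean GP on the residuals for uncertainty quantification. It assumes a fixed RBF kernel with hand-picked hyperparameters; the paper's automatic kernel search and subsampling-based warm start are not reproduced here.

```python
# A minimal two-stage GP sketch, assuming a fixed RBF kernel and
# hand-picked hyperparameters (illustrative only).
import numpy as np

def rbf(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(30, 1))
y = 3 * X[:, 0] + 2 * np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 1, 30)

# Stage 1: estimate the mean (here: least squares on a linear basis).
basis = lambda Z: np.hstack([Z, np.ones_like(Z)])
coef, *_ = np.linalg.lstsq(basis(X), y, rcond=None)

# Stage 2: zero-mean GP on the residuals for uncertainty quantification.
resid = y - basis(X) @ coef
noise = 1.0
L = np.linalg.cholesky(rbf(X, X) + noise * np.eye(len(X)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))

Xs = np.linspace(-5, 5, 200)[:, None]              # test inputs
Ks = rbf(Xs, X)
pred_mean = basis(Xs) @ coef + Ks @ alpha          # stage-1 mean + GP correction
v = np.linalg.solve(L, Ks.T)
pred_var = 1.0 + noise - (v ** 2).sum(axis=0)      # RBF prior variance is 1
band = 1.96 * np.sqrt(pred_var)                    # 95% predictive interval
```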





Data‐driven linear complexity low‐rank approximation of general kernel matrices: A geometric approach

July 2023

·

10 Reads

·

3 Citations

Numerical Linear Algebra with Applications

A general, rectangular kernel matrix may be defined as K_ij = κ(x_i, y_j), where κ(x, y) is a kernel function and where X = {x_i}_{i=1}^m and Y = {y_j}_{j=1}^n are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points X and Y are large and are arbitrarily distributed, such as away from each other, "intermingled", identical, and so forth. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where X corresponds to the training data and Y corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix, thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linearly with respect to the size of the data for a fixed approximation rank. The main idea in this paper is to geometrically select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
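
For intuition, a landmark-based cross approximation of such a rectangular kernel matrix might look like the sketch below. Uniform random landmark selection stands in for the geometric selection the paper analyzes, and all sizes and names are illustrative.

```python
# A minimal sketch of a landmark-based cross approximation
# K ~= K(X, Y_J) * pinv(K(X_I, Y_J)) * K(X_I, Y); random landmarks are a
# stand-in for the paper's geometric selection.
import numpy as np

def kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / sigma**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))            # e.g., training points
Y = rng.normal(size=(1500, 10))            # e.g., test points, possibly intermingled
k = 50                                     # target rank

I = rng.choice(len(X), k, replace=False)   # landmark rows
J = rng.choice(len(Y), k, replace=False)   # landmark columns

U = kernel(X, Y[J])                        # m x k block of K
C = np.linalg.pinv(kernel(X[I], Y[J]))     # k x k core, pseudo-inverted
V = kernel(X[I], Y)                        # k x n block of K

# Apply the rank-k approximation to a vector without ever forming K:
v = rng.normal(size=len(Y))
Kv_approx = U @ (C @ (V @ v))              # O((m + n)k) work per product
```

Only O((m + n)k) kernel evaluations are needed to build the three factors, which is the linear scaling in the data size that the abstract refers to.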


GPU acceleration of local and semilocal density functional calculations in the SPARC electronic structure code

May 2023

·

11 Reads

·

5 Citations

The Journal of Chemical Physics

We present a Graphics Processing Unit (GPU)-accelerated version of the real-space SPARC electronic structure code for performing Kohn-Sham density functional theory calculations within the local density and generalized gradient approximations. In particular, we develop a modular math-kernel based implementation for NVIDIA architectures wherein the computationally expensive operations are carried out on the GPUs, with the remainder of the workload retained on the central processing units (CPUs). Using representative bulk and slab examples, we show that relative to CPU-only execution, GPUs enable speedups of up to 6× and 60× in node and core hours, respectively, bringing time to solution down to less than 30 s for a metallic system with over 14 000 electrons and enabling significant reductions in computational resources required for a given wall time.
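
The offload pattern described above, with orchestration kept on the CPU and the FLOP-heavy dense kernels pushed to the GPU, can be illustrated in a few lines. The sketch below uses CuPy and synthetic arrays; it is a pattern illustration only, not SPARC's actual C/CUDA implementation.

```python
# An illustrative CPU/GPU offload sketch (assuming CuPy is installed).
# The dense subspace projection Hs = V^T (H V) runs on the GPU, while the
# small projected eigenproblem stays on the CPU.
import numpy as np
import cupy as cp

n, nev = 20_000, 200                 # grid points and states (toy sizes)
rng = np.random.default_rng(0)
V = rng.random((n, nev))             # trial orbitals, resident in host memory
HV = rng.random((n, nev))            # H applied to V (stencil work)

V_d, HV_d = cp.asarray(V), cp.asarray(HV)   # host -> device transfers
Hs_d = V_d.T @ HV_d                         # dense GEMM executes on the GPU
Hs = cp.asnumpy(Hs_d)                       # device -> host for the small solve

# The nev x nev projected eigenproblem is cheap and stays on the CPU.
Hs = 0.5 * (Hs + Hs.T)               # symmetrize (exact V^T H V is symmetric)
eigvals, eigvecs = np.linalg.eigh(Hs)
```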


Version 2.0.0 -- SPARC: Simulation Package for Ab-initio Real-space Calculations

May 2023

·

208 Reads

SPARC is an accurate, efficient, and scalable real-space electronic structure code for performing ab initio Kohn-Sham density functional theory calculations. Version 2.0.0 of the software provides increased efficiency, and includes spin-orbit coupling, dispersion interactions, and advanced semilocal/hybrid exchange-correlation functionals. These new features further expand the range of physical applications amenable to first principles investigation using SPARC.


An Adaptive Factorized Nyström Preconditioner for Regularized Kernel Matrices

April 2023

·

30 Reads

The spectrum of a kernel matrix significantly depends on the parameter values of the kernel function used to define the kernel matrix. This makes it challenging to design a preconditioner for a regularized kernel matrix that is robust across different parameter values. This paper proposes the Adaptive Factorized Nyström (AFN) preconditioner. The preconditioner is designed for the case where the rank k of the Nyström approximation is large, i.e., for kernel function parameters that lead to kernel matrices with eigenvalues that decay slowly. AFN deliberately chooses a well-conditioned submatrix to solve with and corrects a Nyström approximation with a factorized sparse approximate matrix inverse. This makes AFN efficient for kernel matrices with large numerical ranks. AFN also adaptively chooses the size of this submatrix to balance accuracy and cost.
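
A bare-bones version of the underlying idea, a Nyström approximation of the regularized kernel matrix applied as a CG preconditioner through the Woodbury identity, is sketched below. Uniform landmark selection and a plain Woodbury solve stand in for AFN's adaptive choice of the submatrix and its factorized sparse correction, which are not reproduced here.

```python
# A minimal Nystrom-type preconditioner sketch for (K + mu*I) x = b in CG;
# not the AFN algorithm itself (no adaptive submatrix, no sparse correction).
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def gauss_kernel(A, B, lengthscale=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
n, k, mu = 2000, 100, 1e-2
X = rng.normal(size=(n, 5))
K = gauss_kernel(X, X)                  # formed densely only for this demo
b = rng.normal(size=n)

S = rng.choice(n, k, replace=False)     # landmark ("Nystrom") points
Kns = K[:, S]                           # n x k
Kss = K[np.ix_(S, S)] + 1e-10 * np.eye(k)

# Woodbury: (Kns Kss^{-1} Kns^T + mu I)^{-1} r = (r - Kns M^{-1} Kns^T r) / mu,
# where M = mu * Kss + Kns^T Kns.
M = mu * Kss + Kns.T @ Kns
M_inv_KnsT = np.linalg.solve(M, Kns.T)  # k x n, precomputed once

def apply_prec(r):
    return (r - Kns @ (M_inv_KnsT @ r)) / mu

P = LinearOperator((n, n), matvec=apply_prec)
x, info = cg(K + mu * np.eye(n), b, M=P)   # preconditioned CG solve
```

When the kernel parameters make the eigenvalues of K decay slowly, a fixed small rank k here degrades quickly; that regime is exactly what AFN's adaptive submatrix size and sparse correction are designed to handle.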


GPU acceleration of local and semilocal density functional calculations in the SPARC electronic structure code

February 2023

·

11 Reads

We present a GPU-accelerated version of the real-space SPARC electronic structure code for performing Kohn-Sham density functional theory calculations within the local density and generalized gradient approximations. In particular, we develop a modular math kernel based implementation for NVIDIA architectures wherein the computationally expensive operations are carried out on the GPUs, with the remainder of the workload retained on the CPUs. Using representative bulk and slab examples, we show that GPUs enable speedups of up to 6x relative to CPU-only execution, bringing time to solution down to less than 30 seconds for a metallic system with over 14,000 electrons, and enabling significant reductions in computational resources required for a given wall time.


Figure 6: Accuracy comparison of different geometric selection schemes for constructing two-sided data-driven low-rank factorizations on the kernel matrix defined by the Gas Sensor dataset (d = 128) and a Gaussian kernel with bandwidth σ₁ ≈ 307.5.
Figure 7: Accuracy comparison of one-sided vs. two-sided data-driven factorizations on the kernel matrix defined by the Gas Sensor dataset (d = 128) and a Gaussian kernel with bandwidth σ₁ ≈ 307.5.
Figure 8: Accuracy comparison of one-sided data-driven factorizations (DD-ANC and DD-FPS) with ACA on kernel matrices defined by the Covertype dataset (d = 54) and a Gaussian kernel with three different bandwidths σ.
Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach

December 2022

·

38 Reads

A general, rectangular kernel matrix may be defined as $K_{ij} = \kappa(x_i, y_j)$ where $\kappa(x, y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are not well-separated (e.g., the points in $X$ and $Y$ may be "intermingled"). Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix, thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly, i.e., with computational complexity $O(m)$ or $O(n)$ for a fixed accuracy or rank. The main idea in this paper is to geometrically select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
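
One concrete geometric selection rule is farthest point sampling (FPS), sketched below as a complement to the cross-approximation sketch given with the journal version above. The paper's data-driven variants (e.g., the DD-FPS and DD-ANC schemes named in the figures) may differ from this plain version in detail.

```python
# A minimal farthest point sampling (FPS) sketch: greedily pick the point
# farthest from everything selected so far. O(nk) total work.
import numpy as np

def farthest_point_sampling(X, k, seed=0):
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]          # arbitrary starting point
    d = np.linalg.norm(X - X[idx[0]], axis=1)  # distance to the selected set
    for _ in range(k - 1):
        idx.append(int(d.argmax()))            # farthest remaining point
        d = np.minimum(d, np.linalg.norm(X - X[idx[-1]], axis=1))
    return np.array(idx)                       # k landmark indices

X = np.random.default_rng(1).normal(size=(5000, 8))
landmarks = farthest_point_sampling(X, 64)
```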


Citations (60)


... boundary conditions can naturally be accommodated, are perhaps the most mature and commonly employed to date. In particular, these methods can significantly outperform their planewave counterparts for local/semilocal exchange-correlation functionals, with increasing advantages as the number of processors is increased [35][36][37] . Furthermore, they have been scaled to large systems containing up to a million atoms [38][39][40] . ...

Reference:

Efficient real space formalism for hybrid density functionals
SPARC v2.0.0: Spin-orbit coupling, dispersion interactions, and advanced exchange–correlation functionals
  • Citing Article
  • May 2024

Software Impacts

... In this paper, we pursue an exact solution approach for (1.1) with iterative methods. Fast matrix-vector multiplications by K for the iterative solver are available through fast transforms [19,39] and hierarchical matrix methods [2,5,14,7,30]. This paper specifically addresses the problem of preconditioning for the iterative solver. ...

Data-Driven Construction of Hierarchical Matrices With Nested Bases
  • Citing Article
  • July 2023

SIAM Journal on Scientific Computing

... It is desirable to develop algorithms that have linear dependence on N_s and N_t. In this category, there are certain versions of Adaptive Cross Approximation (ACA) [2,3], the black-box fast multipole method [11], and Nyström approximation [6], as well as certain analytic techniques such as multipole expansions [14], Taylor expansions, equivalent densities [35], and the proxy point method [34]. ...

Data‐driven linear complexity low‐rank approximation of general kernel matrices: A geometric approach
  • Citing Article
  • July 2023

Numerical Linear Algebra with Applications

... In this context, the choice/development of functionals that have the best balance between accuracy and computational cost is a worthy subject for future research. The implementation of Δ_OF-MLFF on GPUs is likely to significantly bring down the solution times, as demonstrated recently for the associated Kohn-Sham calculations [95], making it another subject worthy of future research. From an MLFF perspective, the current findings suggest that orbital-free DFT and other fast physical approximations can provide a valuable complement to machine learning techniques, indicating that renewed focus on improving the speed, accuracy, and general applicability of orbital-free DFT is warranted. ...

GPU acceleration of local and semilocal density functional calculations in the SPARC electronic structure code
  • Citing Article
  • May 2023

The Journal of Chemical Physics

... Q-Next is a diagonalization-free [130][131][132][133] approach for accelerating convergence of the Fock matrix in the SCF algorithm that replaces the diagonalization step of DIIS-based convergence acceleration algorithms with more scalable, close-to-peak-FLOP-performance matrix multiplications [134]. Q-Next is based on the idea that the convergence of the wave function in the SCF procedure can be obtained by minimizing the energy with respect to orbital rotations that mix the molecular orbitals while retaining the orthonormality. ...

Pseudodiagonalization Method for Accelerating Nonlinear Subspace Diagonalization in Density Functional Theory
  • Citing Article
  • May 2022

Journal of Chemical Theory and Computation

... boundary conditions can naturally be accommodated, are perhaps the most mature and commonly employed to date. In particular, these methods can significantly outperform their planewave counterparts for local/semilocal exchange-correlation functionals, with increasing advantages as the number of processors is increased [35][36][37] . Furthermore, they have been scaled to large systems containing up to a million atoms [38][39][40] . ...

SPARC: Simulation Package for Ab-initio Real-space Calculations
  • Citing Article
  • July 2021

SoftwareX

... More theory and discussion on symmetric formulations of integral equations (including hypersingular integrals) can be found in [25]. One challenge for solving (1.2) is that A usually has a large condition number [37,38], and this paper is concerned with solving (1.2) iteratively using domain decomposition preconditioners. ...

Efficient Construction of an HSS Preconditioner for Symmetric Positive Definite $\mathcal{H}^2$ Matrices
  • Citing Article
  • April 2021

SIAM Journal on Matrix Analysis and Applications

... Such a unique feature allows for considering network protocols where no (or less) control is needed to ensure data transmission, which may considerably reduce communication latency. Efforts are therefore continuously made to assess and increase the practical potential of asynchronous computing (see, e.g., latest works [1][2][3][4][5][6][7]). ...

Scalable Asynchronous Domain Decomposition Solvers
  • Citing Article
  • December 2020

SIAM Journal on Scientific Computing

... H2Pack [7,21] is used to provide linear-complexity matrix-vector multiplications associated with large-scale K for 3D datasets, with a relative error threshold of 1e−8. We utilized a brute-force parallel FPS algorithm on the global dataset. ...

H2Pack: High-Performance H² Matrix Package for Kernel Matrices Using the Proxy Point Method
  • Citing Article
  • December 2020

ACM Transactions on Mathematical Software