Fig. 4. Localization of computation in Algorithm 4.1 based on the hierarchy. When adding i to the ordering, only consider indices j such that dist(x_i, x_j) ≤ ρℓ[i]. These indices form a subset of the children of the coarse-level index k whenever ρℓ[k] ≥ dist(x_i, x_k) + ρℓ[i]. Thus, the search for candidate indices j can be restricted to the children of k.
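The criterion follows from the triangle inequality. Assuming the children of k are exactly the points within distance ρℓ[k] of x_k, any candidate j satisfies
$$\operatorname{dist}(x_k, x_j) \le \operatorname{dist}(x_k, x_i) + \operatorname{dist}(x_i, x_j) \le \operatorname{dist}(x_k, x_i) + \rho\,\ell[i] \le \rho\,\ell[k],$$
so every such j is indeed among the children of k.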


Source publication
Article
Full-text available
Dense kernel matrices $\Theta \in \mathbb{R}^{N \times N}$ obtained from point evaluations of a covariance function $G$ at locations $\{ x_{i} \}_{1 \leq i \leq N}$ arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions of elliptic boundary value problems and approximately equally spaced samp...

Citations

... Multiscale and hierarchical ideas have also been applied to seek a full-scale approximation of Θ with near-linear complexity. They include hierarchical (H-) matrices [23,25,24] and variants [38,1,2,36,49,48,41,18] that rely on the low-rank structure of the off-diagonal blocks at different scales; wavelet-based methods [5,19] that use the sparsity of Θ in the wavelet basis; multiresolution predictive processes [32]; and Vecchia approximations [73,33] and sparse Cholesky factorizations [69,68] that rely on the approximately sparse correlations conditioned on carefully ordered points. ...
... The screening effect implies that approximate conditional independence of a spatial random field is likely to occur under a suitable ordering of points. The line of work [55,56,69] provides quantitative exponential decay results for the conditional covariance in the setting of a coarse-to-fine ordering of data points, laying the theoretical groundwork for [68]. ...
... Indeed, the off-diagonal entries exhibit exponential decay. A rigorous proof of the quantitative decay can be found in [69], where the measurements consist of Dirac functionals only and the kernel function is the Green's function of a differential operator subject to Dirichlet boundary conditions. The proof of Theorem 6.1 in [69] effectively implies that ...
Article
Full-text available
In recent years, there has been widespread adoption of machine learning-based approaches to automate the solving of partial differential equations (PDEs). Among these approaches, Gaussian processes (GPs) and kernel methods have garnered considerable interest due to their flexibility, robust theoretical guarantees, and close ties to traditional methods. They can transform the solving of general nonlinear PDEs into solving quadratic optimization problems with nonlinear, PDE-induced constraints. However, the complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel and its partial derivatives, which result from the PDE constraints and for which fast algorithms are scarce. The primary goal of this paper is to provide a near-linear complexity algorithm for working with such kernel matrices. We present a sparse Cholesky factorization algorithm for these matrices based on the near-sparsity of the Cholesky factor under a novel ordering of pointwise and derivative measurements. The near-sparsity is rigorously justified by directly connecting the factor to GP regression and exponential decay of basis functions in numerical homogenization. We then employ the Vecchia approximation of GPs, which is optimal in the Kullback-Leibler divergence, to compute the approximate factor. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. We integrate sparse Cholesky factorizations into optimization algorithms to obtain fast solvers of the nonlinear PDE. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs and kernel methods.
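The KL-optimal inverse Cholesky factor described above admits a closed-form, column-by-column construction once a sparsity pattern has been fixed. The following NumPy sketch illustrates that formula under the assumption that the points are already ordered (coarse-to-fine / reverse-maximin) and that sparsity[i] lists the retained row indices of column i with i itself first; names and conventions are illustrative, not the authors' code.

import numpy as np

def kl_optimal_inverse_cholesky(theta, sparsity):
    # Sparse L with L @ L.T approximating inv(theta), built column by column via
    # L[s_i, i] = theta[s_i, s_i]^{-1} e_1 / sqrt(e_1^T theta[s_i, s_i]^{-1} e_1).
    N = theta.shape[0]
    L = np.zeros((N, N))
    for i, s in enumerate(sparsity):
        s = np.asarray(s)
        block = theta[np.ix_(s, s)]          # small dense sub-kernel matrix
        e1 = np.zeros(len(s)); e1[0] = 1.0   # unit vector for the entry i itself
        col = np.linalg.solve(block, e1)     # theta[s, s]^{-1} e_1
        L[s, i] = col / np.sqrt(col[0])      # e_1^T col = col[0] > 0 since block is SPD
    return L

Each column requires only a small dense solve of size |s_i|, which is what makes the overall factorization near-linear when |s_i| = O(log^d(N/ε)).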
... While these studies initially "attracted little attention among numerical analysts" [67], they were revived in the fields of Information-Based Complexity [68], Bayesian Numerical Analysis [69], and more recently in Probabilistic Numerics [61,70]. This connection between inference and numerical approximation is also central to Bayesian/decision-theoretic approaches to solving ODEs [71] and PDEs [38], to identifying operator-adapted wavelets [59], to designing fast solvers for kernel matrices [72][73][74], and to parameter estimation [75]. ...
Preprint
The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \phi(\xi)$ where $\phi: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F\circ \phi(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.
... Since the eigenvalues of elliptic solution operators follow a power law, these methods require poly(1/ε) matrix-vector products to obtain an ε-approximation of the operator. In contrast, Schäfer et al. [32] showed that hidden sparsity of the solution operator results in an ε-approximation from only poly(log(1/ε)) carefully crafted matrix-vector products. This speedup amounts to an exponential reduction in the number of matrix-vector products. ...
... It is well-known that the solution operators of elliptic PDEs are dense, owing to the long-range interactions produced by diffusion. However, Schäfer et al. [32] show that when represented in a multiresolution basis ordered from coarse to fine, solution operators of elliptic PDEs have almost sparse Cholesky factors. This phenomenon is illustrated in Fig. 5. ...
... No rigorous guarantees exist for the accuracy of LU reconstruction applied to eddy diffusivity matrices. However, Schäfer et al. [32] show that a wide range of diffusion-like operators, including those produced by fractional-order Matérn or Cauchy kernels, produce sparse Cholesky factors, despite the lack of theory supporting this observation. ...
... Since the eigenvalues of elliptic solution operators follow a power law, these methods require poly(1/ε) matrix-vector products to obtain an ε approximation of the operator. In contrast, Schäfer et al. [23] showed that hidden sparsity of the solution operator results in an ε approximation from only poly(log(1/ε)) carefully crafted matrix-vector products. This speedup amounts to an exponential reduction in the number of matrix-vector products. ...
... It is well-known that the solution operators of elliptic PDEs are dense, owing to the long-range interactions produced by diffusion. However, Schäfer et al. [23] show that when represented in a multiresolution basis ordered from coarse to fine, solution operators of elliptic PDEs have almost sparse Cholesky factors. This phenomenon is illustrated in fig. 5. ...
... Columns of L are recovered from matrix-vector products, and rows of U are recovered from matrix-transpose-vector products (transpose-vector products), which can be computed by solving the adjoint equation of L. No rigorous guarantees exist for the accuracy of LU reconstruction applied to eddy diffusivity matrices. However, Schäfer et al. [23] show that a wide range of diffusion-like operators, including those produced by fractional-order Matérn or Cauchy kernels, produce sparse Cholesky factors, despite the lack of theory supporting this observation. ...
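The column recovery alluded to here rests on a standard probing idea: once the operator is (approximately) sparse with a known pattern — which is what the peeling step is meant to expose — columns whose supports do not overlap can all be read off from a single matrix-vector product. A minimal sketch of that idea (hypothetical helper names; not the Fast MFM implementation itself):

import numpy as np

def recover_sparse_columns(matvec, supports, n):
    # matvec: callable x -> A @ x (e.g., one fine-scale simulation per call)
    # supports[j]: assumed row support of column j of the (approximately) sparse operator A
    A = np.zeros((n, n))
    remaining = list(range(n))
    while remaining:
        group, used = [], set()
        for j in remaining:                  # greedily collect columns with disjoint supports
            rows = set(supports[j])
            if not rows & used:
                group.append(j)
                used |= rows
        probe = np.zeros(n)
        probe[group] = 1.0                   # sum of unit vectors e_j over the group
        y = matvec(probe)                    # one product recovers every column in the group
        for j in group:
            A[np.asarray(supports[j]), j] = y[np.asarray(supports[j])]
        remaining = [j for j in remaining if j not in group]
    return A

With local (banded-like) supports, the number of groups, and hence of matrix-vector products, is independent of n.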
Preprint
Full-text available
The macroscopic forcing method (MFM) of Mani and Park and similar methods for obtaining turbulence closure operators, such as the Green's function-based approach of Hamba, recover reduced solution operators from repeated direct numerical simulations (DNS). MFM has been used to quantify RANS-like operators for homogeneous isotropic turbulence and turbulent channel flows. Standard algorithms for MFM force each coarse-scale degree of freedom (i.e., degree of freedom in the RANS space) and conduct a corresponding fine-scale simulation (i.e., DNS), which is expensive. We combine this method with an approach recently proposed by Schäfer and Owhadi (2023) to recover elliptic integral operators from a polylogarithmic number of matrix-vector products. The resulting Fast MFM introduced in this work applies sparse reconstruction to expose local features in the closure operator and reconstructs this coarse-grained differential operator in only a few matrix-vector products and correspondingly, a few MFM simulations. For flows with significant nonlocality, the algorithm first "peels" long-range effects with dense matrix-vector products to expose a local operator. We demonstrate the algorithm's performance for scalar transport in a laminar channel flow and momentum transport in a turbulent one. For these, we recover eddy diffusivity operators at 1% of the cost of computing the exact operator via a brute-force approach for the laminar channel flow problem and 13% for the turbulent one. We observe that we can reconstruct these operators with an increase in accuracy by about a factor of 100 over randomized low-rank methods. We glean that for problems in which the RANS space is reducible to one dimension, eddy diffusivity and eddy viscosity operators can be reconstructed with reasonable accuracy using only a few simulations, regardless of simulation resolution or degrees of freedom.
... We order the spatial locations s 1 , . . . , s n , and hence the columns of Y according to a maximin ordering (Guinness, 2018; Schäfer et al., 2021b), which sequentially adds to the ordering the location that maximizes the minimum distance from the locations already in the ordering. ...
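A naive O(N^2) version of the maximin ordering described above can be sketched as follows (the heap-based variant of Schäfer et al. brings this to near-linear complexity; the choice of the first point is a convention and is taken arbitrarily here):

import numpy as np

def maximin_ordering(X):
    # Orders the rows of X (N x d) so that each newly added location maximizes
    # its minimum distance to the locations already in the ordering.
    N = X.shape[0]
    order = [0]                                    # arbitrary starting point
    dist = np.linalg.norm(X - X[0], axis=1)        # distance of each point to the selected set
    for _ in range(N - 1):
        i = int(np.argmax(dist))                   # farthest remaining point
        order.append(i)
        dist = np.minimum(dist, np.linalg.norm(X - X[i], axis=1))
    return np.array(order)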
... Recent results based on elliptic boundary-value problems (Schäfer et al., 2021b) imply that the Cholesky entry (u_i)_j, corresponding to the jth nearest neighbor, decays exponentially as a function of j for Matérn covariance functions whose spectral densities are the reciprocal of a polynomial (ignoring edge effects). With the motivation to capture this exponential decay for the Cholesky entries, we arrive at the same functional form for the entries v_ij of the diagonal matrix V_i in (14). ...
Preprint
Full-text available
Single-cell RNA-sequencing technologies may provide valuable insights into the understanding of the composition of different cell types and their functions within a tissue. Recent technologies such as spatial transcriptomics enable the measurement of gene expressions at the single cell level along with the spatial locations of these cells in the tissue. Dimension-reduction and spatial clustering are two of the most common exploratory analysis strategies for spatial transcriptomic data. However, existing dimension reduction methods may lead to a loss of inherent dependency structure among genes at any spatial location in the tissue and hence do not provide insights of gene co-expression pattern. In spatial transcriptomics, the matrix-variate gene expression data, along with spatial co-ordinates of the single cells, provides information on both gene expression dependencies and cell spatial dependencies through its row and column covariances. In this work, we propose a flexible Bayesian approach to simultaneously estimate the row and column covariances for the matrix-variate spatial transcriptomic data. The posterior estimates of the row and column covariances provide data summaries for downstream exploratory analysis. We illustrate our method with simulations and two analyses of real data generated from a recent spatial transcriptomic platform. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells.
... Given their long training times, ANN-based methods may not be competitive with FEM in low dimensions [33]. In contrast, GP-based methods can achieve near-linear complexity when combined with fast algorithms for kernel methods such as the sparse Cholesky factorization [71,72,12]. In some applications, these algorithms can be competitive (both in terms of complexity and accuracy) even when compared to highly optimized algebraic multigrid solvers such as AMGCL and Trilinos [10]. ...
Preprint
Full-text available
We introduce a priori Sobolev-space error estimates for the solution of nonlinear, and possibly parametric, PDEs using Gaussian process and kernel based methods. The primary assumptions are: (1) a continuous embedding of the reproducing kernel Hilbert space of the kernel into a Sobolev space of sufficient regularity; and (2) the stability of the differential operator and the solution map of the PDE between corresponding Sobolev spaces. The proof is articulated around Sobolev norm error estimates for kernel interpolants and relies on the minimizing norm property of the solution. The error estimates demonstrate dimension-benign convergence rates if the solution space of the PDE is smooth enough. We illustrate these points with applications to high-dimensional nonlinear elliptic PDEs and parametric PDEs. Although some recent machine learning methods have been presented as breaking the curse of dimensionality in solving high-dimensional PDEs, our analysis suggests a more nuanced picture: there is a trade-off between the regularity of the solution and the presence of the curse of dimensionality. Therefore, our results are in line with the understanding that the curse is absent when the solution is regular enough.
... These localized reduced basis functions are known as Wannier functions in the physics literature [49] and can be interpreted as linear combinations of eigenfunctions that are localized in both frequency space and the physical domain, akin to wavelets. The hierarchical generalization of numerical homogenization [54] (gamblets) has led to the current state of the art for operator compression of linear elliptic [63,65] and parabolic/hyperbolic PDEs [59]. In particular, for arbitrary (and possibly unknown) elliptic PDEs, [64] shows that the solution operator (i.e., the Green's function) can be approximated in near-linear complexity to accuracy ε from only O(log^{d+1}(1/ε)) solutions of the PDE. ...
... One could also adapt our framework to non-vanilla kernel methods such as random features or inducing point methods to provide a low-complexity alternative to NNs in the large-data regime. Finally, since the proposed approach is essentially a generalization of GP Regression to the infinite-dimensional setting, we anticipate that some of the hierarchical techniques of [54,63,65] could be extended to this setting and provide a better cost-accuracy trade-off than current methods. ...
Preprint
We present a general kernel-based framework for learning operators between Banach spaces along with a priori error analysis and comprehensive numerical comparisons with popular neural net (NN) approaches such as Deep Operator Net (DeepONet) [Lu et al.] and Fourier Neural Operator (FNO) [Li et al.]. We consider the setting where the input/output spaces of the target operator $\mathcal{G}^\dagger\,:\, \mathcal{U}\to \mathcal{V}$ are reproducing kernel Hilbert spaces (RKHS), the data comes in the form of partial observations $\phi(u_i), \varphi(v_i)$ of input/output functions $v_i=\mathcal{G}^\dagger(u_i)$ ($i=1,\ldots,N$), and the measurement operators $\phi\,:\, \mathcal{U}\to \mathbb{R}^n$ and $\varphi\,:\, \mathcal{V} \to \mathbb{R}^m$ are linear. Writing $\psi\,:\, \mathbb{R}^n \to \mathcal{U}$ and $\chi\,:\, \mathbb{R}^m \to \mathcal{V}$ for the optimal recovery maps associated with $\phi$ and $\varphi$, we approximate $\mathcal{G}^\dagger$ with $\bar{\mathcal{G}}=\chi \circ \bar{f} \circ \phi$ where $\bar{f}$ is an optimal recovery approximation of $f^\dagger:=\varphi \circ \mathcal{G}^\dagger \circ \psi\,:\,\mathbb{R}^n \to \mathbb{R}^m$. We show that, even when using vanilla kernels (e.g., linear or Mat\'{e}rn), our approach is competitive in terms of cost-accuracy trade-off and either matches or beats the performance of NN methods on a majority of benchmarks. Additionally, our framework offers several advantages inherited from kernel methods: simplicity, interpretability, convergence guarantees, a priori error estimates, and Bayesian uncertainty quantification. As such, it can serve as a natural benchmark for operator learning.
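At its core, the finite-dimensional map $\bar{f}$ above is an ordinary (vector-valued) kernel interpolant / ridge regressor between the measurement spaces $\mathbb{R}^n$ and $\mathbb{R}^m$. A minimal sketch of that ingredient with a vanilla RBF kernel (the measurement operators $\phi, \varphi$ and the decoders $\psi, \chi$ are problem-specific and omitted; the kernel choice, names, and nugget are illustrative, not the paper's code):

import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def fit_fbar(Phi_U, Phi_V, lengthscale=1.0, nugget=1e-8):
    # Kernel ridge / optimal-recovery approximation of f : R^n -> R^m from data
    # (phi(u_i), varphi(v_i)) stacked as rows of Phi_U (N x n) and Phi_V (N x m).
    K = rbf(Phi_U, Phi_U, lengthscale) + nugget * np.eye(len(Phi_U))
    alpha = np.linalg.solve(K, Phi_V)              # one coefficient column per output dimension
    return lambda x: rbf(np.atleast_2d(x), Phi_U, lengthscale) @ alpha

The learned operator is then, conceptually, $\bar{\mathcal{G}}(u) = \chi(\bar{f}(\phi(u)))$.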
... Two main approaches are available for solving PDEs as learning problems: (i) artificial neural network (ANN)-based approaches, with physics-informed neural networks [14][15] as a prototypical example, and (ii) GP-based approaches, with Gamblets [16][17][18] as a prototypical example. Although GP-based approaches are more theoretically well-founded [9] and have a long history of interplay with numerical approximation [8,[19][20][21]], they were essentially limited to linear/quasi-linear/time-dependent PDEs and have only recently been generalized to arbitrary nonlinear PDEs [22] (and computational graphs [23]). ...
... Therefore, we use the identity E(|ξ(x) − ξ(y)|^2) = Tr(K(x, x) + K(y, y) − 2K(x, y)) (42) to incorporate the aforementioned velocity-increment power laws. Considering the scenario in which G(x, y) is stationary (as in (37)), i.e., G(x, y) = ψ(x − y) for a function ψ, (20) and (28) reduce to the particular construction of Subsection 5.1 in Ref. [26], i.e., ...
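For a mean-zero vector-valued GP $\xi$ with matrix-valued covariance $K(x,y) = \mathbb{E}[\xi(x)\xi(y)^{\top}]$, the quoted identity is immediate:
$$\mathbb{E}\,|\xi(x)-\xi(y)|^{2} = \mathbb{E}\big[\xi(x)^{\top}\xi(x)\big] + \mathbb{E}\big[\xi(y)^{\top}\xi(y)\big] - 2\,\mathbb{E}\big[\xi(x)^{\top}\xi(y)\big] = \operatorname{Tr} K(x,x) + \operatorname{Tr} K(y,y) - 2\operatorname{Tr} K(x,y),$$
using $\mathbb{E}[\xi(x)^{\top}\xi(y)] = \operatorname{Tr}\mathbb{E}[\xi(x)\xi(y)^{\top}] = \operatorname{Tr} K(x,y)$.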
... Although the sparse Cholesky factorization algorithms introduced in Refs. [19] and [20] can be adapted to potentially reduce the inversion cost to O(N ln^{2d} N), we have not employed this strategy here. ...
Article
Full-text available
We present a Gaussian process (GP) approach, called Gaussian process hydrodynamics (GPH) for approximating the solution to the Euler and Navier-Stokes (NS) equations. Similar to smoothed particle hydrodynamics (SPH), GPH is a Lagrangian particle-based approach that involves the tracking of a finite number of particles transported by a flow. However, these particles do not represent mollified particles of matter but carry discrete/partial information about the continuous flow. Closure is achieved by placing a divergence-free GP prior ξ on the velocity field and conditioning it on the vorticity at the particle locations. Known physics (e.g., the Richardson cascade and velocity increment power laws) is incorporated into the GP prior by using physics-informed additive kernels. This is equivalent to expressing ξ as a sum of independent GPs ξ l , which we call modes, acting at different scales (each mode ξ l self-activates to represent the formation of eddies at the corresponding scales). This approach enables a quantitative analysis of the Richardson cascade through the analysis of the activation of these modes, and enables us to analyze coarse-grain turbulence statistically rather than deterministically. Because GPH is formulated by using the vorticity equations, it does not require solving a pressure equation. By enforcing incompressibility and fluid-structure boundary conditions through the selection of a kernel, GPH requires significantly fewer particles than SPH. Because GPH has a natural probabilistic interpretation, the numerical results come with uncertainty estimates, enabling their incorporation into an uncertainty quantification (UQ) pipeline and adding/removing particles (quanta of information) in an adapted manner. The proposed approach is suitable for analysis because it inherits the complexity of state-of-the-art solvers for dense kernel matrices and results in a natural definition of turbulence as information loss. Numerical experiments support the importance of selecting physics-informed kernels and illustrate the major impact of such kernels on the accuracy and stability. Because the proposed approach uses a Bayesian interpretation, it naturally enables data assimilation and predictions and estimations by mixing simulation data and experimental data.
... When the number of data points n is small, solution methods based on dense matrix factorizations are the most efficient. When n is large, a common approach is to solve (1.1) using a sparse or low-rank approximation to K [32,33,25]. In this paper, we pursue an exact solution approach for (1.1) with iterative methods. ...
... The geometric interpretation of this measure is the radius of the largest empty ball in Ω that does not intersect X_k. This implies that an X_k with a smaller fill distance better fills out Ω. Since K_{22} + µI − K_{12}^T (K_{11} + µI)^{-1} K_{12} can be considered as the conditional covariance matrix of X\X_k conditioned on X_k [33,32], the screening effect [20,28] implies that a smaller h_{X_k} often yields a K_{22} + µI − K_{12}^T (K_{11} + µI)^{-1} K_{12} with more entries of small magnitude. ...
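The Schur complement in this excerpt is straightforward to form explicitly for moderate n, which makes the screening effect easy to inspect numerically. A small NumPy sketch under illustrative choices (exponential kernel, random points, hypothetical sizes):

import numpy as np

def exp_kernel(A, B, ell=0.2):
    # Matern-1/2 (exponential) kernel matrix between the rows of A and B
    r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-r / ell)

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 2))
k, mu = 80, 1e-4
Xk, Xr = X[:k], X[k:]                          # conditioning set X_k and remaining points

K11 = exp_kernel(Xk, Xk) + mu * np.eye(k)
K12 = exp_kernel(Xk, Xr)
K22 = exp_kernel(Xr, Xr) + mu * np.eye(len(Xr))

# Conditional covariance of X \ X_k given X_k (the Schur complement from the excerpt);
# the screening effect predicts many small-magnitude entries when X_k has a small fill distance.
S = K22 - K12.T @ np.linalg.solve(K11, K12)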
... A naive implementation of FPS for selecting k samples from n points in R^d scales as O(dk^2 n). The scaling can be reduced to O(ρ^d n log n) by using an algorithm [33] that keeps the distance information in a heap and only updates part of the heap when a new point is added to the set X_k. Here, ρ is a constant greater than 1 that controls the efficiency of the sampling process. ...
Preprint
Full-text available
The spectrum of a kernel matrix significantly depends on the parameter values of the kernel function used to define the kernel matrix. This makes it challenging to design a preconditioner for a regularized kernel matrix that is robust across different parameter values. This paper proposes the Adaptive Factorized Nystr\"om (AFN) preconditioner. The preconditioner is designed for the case where the rank k of the Nystr\"om approximation is large, i.e., for kernel function parameters that lead to kernel matrices with eigenvalues that decay slowly. AFN deliberately chooses a well-conditioned submatrix to solve with and corrects a Nystr\"om approximation with a factorized sparse approximate matrix inverse. This makes AFN efficient for kernel matrices with large numerical ranks. AFN also adaptively chooses the size of this submatrix to balance accuracy and cost.
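For reference, the Nyström ingredient described here (not the full AFN construction, which in addition selects a well-conditioned landmark submatrix and adds a factorized sparse approximate-inverse correction) can be applied as a preconditioner for K + µI via the Woodbury identity. A hedged sketch with illustrative names:

import numpy as np

def nystrom_preconditioner(K, landmarks, mu):
    # Returns a function applying (K_hat + mu*I)^{-1}, where
    # K_hat = K[:, I] @ inv(K[I, I]) @ K[I, :] is the Nystrom approximation
    # built from the landmark index set I.
    U = K[:, landmarks]                         # N x k
    W = K[np.ix_(landmarks, landmarks)]         # k x k (AFN chooses this block to be well conditioned)
    M = mu * W + U.T @ U                        # k x k core matrix from the Woodbury identity
    def apply(x):
        # (mu*I + U W^{-1} U^T)^{-1} x = (x - U @ solve(M, U.T @ x)) / mu
        return (x - U @ np.linalg.solve(M, U.T @ x)) / mu
    return apply

Used inside, e.g., preconditioned CG, a single application as written costs O(Nk + k^3).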
... Multiscale and hierarchical ideas have also been applied to seek a full-scale approximation of Θ with near-linear complexity. They include hierarchical (H-) matrices [21,23,22] and variants [34,1,2,32,42] that rely on the low-rank structure of the off-diagonal blocks at different scales; wavelet-based methods [4,17] that use the sparsity of Θ in the wavelet basis; multiresolution predictive processes [28]; and Vecchia approximations [66,29] and sparse Cholesky factorizations [62,61] that rely on the approximately sparse correlations conditioned on carefully ordered points. ...
... The screening effect implies that approximate conditional independence of a spatial random field is likely to occur under a suitable ordering of points. The line of work [48,49,62] provides quantitative exponential decay results for the conditional covariance in the setting of a coarse-to-fine ordering of data points, laying the theoretical groundwork for [61]. ...
... Indeed, the off-diagonal entries exhibit exponential decay. A rigorous proof of the quantitative decay can be found in [62], where the measurements consist of Dirac functionals only and the kernel function is the Green's function of a differential operator subject to Dirichlet boundary conditions. The proof of Theorem 6.1 in [62] effectively implies that ...
Preprint
Full-text available
We study the computational scalability of a Gaussian process (GP) framework for solving general nonlinear partial differential equations (PDEs). This framework transforms solving PDEs into solving a quadratic optimization problem with nonlinear constraints. Its complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel of the GP and its partial derivatives at collocation points. We present a sparse Cholesky factorization algorithm for such kernel matrices based on the near-sparsity of the Cholesky factor under a new ordering of Diracs and derivative measurements. We rigorously identify the sparsity pattern and quantify the exponentially convergent accuracy of the corresponding Vecchia approximation of the GP, which is optimal in the Kullback-Leibler divergence. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. With the sparse factors, gradient-based optimization methods become scalable. Furthermore, we can use the oftentimes more efficient Gauss-Newton method, for which we apply the conjugate gradient algorithm with the sparse factor of a reduced kernel matrix as a preconditioner to solve the linear system. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Amp\`ere equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs.