Elliptic PDE with varying noise variances σ². Total number of density evaluations in all layers (a), reciprocal sample size (b), and IACT (c). Tempering is carried out with β₀ = 0.1σ², βₖ₊₁ = √10 · βₖ. TT parameters: n = 16, TT rank 20, one TT-cross iteration.

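As a quick illustration of the geometric tempering schedule stated in the caption, the sketch below generates the ladder of bridging parameters. The stopping rule at β = 1 (the untempered posterior) is an assumption for this demo, not stated in the caption.

```python
import numpy as np

def tempering_schedule(sigma2, beta_max=1.0):
    """Geometric tempering ladder from the caption:
    beta_0 = 0.1 * sigma^2, beta_{k+1} = sqrt(10) * beta_k,
    capped at beta_max (assumed to be 1, the untempered posterior)."""
    betas = [0.1 * sigma2]
    while betas[-1] < beta_max:
        betas.append(min(np.sqrt(10.0) * betas[-1], beta_max))
    return np.array(betas)

print(tempering_schedule(1e-3))  # ~[1e-4, 3.16e-4, 1e-3, ..., 1.0]
```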

Source publication
Article
Characterising intractable high-dimensional random variables is one of the fundamental challenges in stochastic computation. The recent surge of transport maps offers a mathematical foundation and new insights for tackling this challenge by coupling intractable random variables with tractable reference random variables. This paper generalises the f...

Context in source publication

Context 1
... carry out an additional test with decreasing noise variance σ². In Fig. 10, we fix m = 15² and vary σ² from 10⁻¹ to 10⁻⁵. In this experiment, fixing TT ranks becomes insufficient for representing posterior densities with low observation noise. In particular, the piecewise linear basis does not have sufficient accuracy for the case of the smallest noise variance. In contrast, the Fourier basis can still ...

Similar publications

Preprint
Hierarchical models with gamma hyperpriors provide a flexible, sparsity-promoting framework to bridge $L^1$ and $L^2$ regularizations in Bayesian formulations of inverse problems. Despite the Bayesian motivation for these models, existing methodologies are limited to maximum a posteriori estimation. The potential to perform uncertainty quant...

Citations

... In normalizing flows, the map is considered to be a generative model similar to neural networks [30], while the TM approach mostly uses polynomials [31]. However, there have also been advancements in using tensor trains to decompose the posterior density, thus increasing the efficiency; see [32,33] for a discussion and comparison with MCMC methods. We will focus in this paper on the polynomial formulation introduced in [34]. ...
... However, the number of points must also be chosen so that accuracy does not suffer. Because the reference density is easily computable, several quadrature schemes apply here [32]. We investigated two random sampling approaches, namely Monte Carlo (MC) and Latin hypercube sampling (LHS), as well as Gauss-Hermite integration and sparse grids based on Smolyak's algorithm [40,41]. ...
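A hedged illustration of the sampling rules named in this snippet, on a toy integrand rather than the cited study's model: Monte Carlo, Latin hypercube (via scipy's QMC module), and a tensorized Gauss-Hermite rule for a 2D standard-normal expectation.

```python
import numpy as np
from scipy.stats import qmc, norm

d, n = 2, 64
f = lambda x: np.exp(-0.5 * np.sum(x**2, axis=-1))  # toy integrand, exact value 0.5

# Monte Carlo: standard normal samples
x_mc = np.random.default_rng(0).standard_normal((n, d))
est_mc = f(x_mc).mean()

# Latin hypercube: uniform strata mapped to normal via the inverse CDF
x_lhs = norm.ppf(qmc.LatinHypercube(d=d, seed=0).random(n))
est_lhs = f(x_lhs).mean()

# Gauss-Hermite: tensorize the 1D rule; E[f(X)] = pi^{-d/2} sum w f(sqrt(2) t)
nodes, weights = np.polynomial.hermite.hermgauss(8)
T1, T2 = np.meshgrid(nodes, nodes, indexing='ij')
W = np.outer(weights, weights)
est_gh = np.sum(W * f(np.sqrt(2) * np.stack([T1, T2], axis=-1))) / np.pi

print(est_mc, est_lhs, est_gh)
```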
... These allow for a slower convergence towards the density of interest, so that multiple modes are more likely to be captured. This concept was recently shown to work with transport maps [32,43]. Instead of computing a single direct map, a series of maps is chained together. ...
Article
In this paper, an alternative to solving Bayesian inverse problems for structural health monitoring based on a variational formulation with so-called transport maps is examined. The Bayesian inverse formulation is a widely used tool in structural health monitoring applications. While Markov Chain Monte Carlo (MCMC) methods are often implemented in these settings, they require many model evaluations, which can become quite costly. We focus here on recent developments in the field of transport theory, where the problem is formulated as finding a deterministic, invertible mapping between an easy-to-evaluate reference density and the posterior. The resulting variational formulation can be solved with integration and optimization methods. We develop a general formulation for the application of transport maps to vibration-based structural health monitoring. Further, we study the influence of different integration approaches on the efficiency and accuracy of the transport map approach and compare it to the Transitional MCMC algorithm, a widely used method for structural identification. Both methods are applied to a lower-dimensional dynamic model with uni- and multi-modal properties, as well as to a higher-dimensional neural-network surrogate of an airplane structure. We find that transport maps offer a significant increase in accuracy and efficiency when used in the right circumstances.
... A particularly interesting example is non-parametric estimation of probability densities in low-rank tensor formats [90]. Among the approaches used to ensure nonnegativity are ad-hoc corrections [26] and squared low-rank TT approximations [20,78]; our idea can serve as an alternative. ...
Preprint
We study the convergence of specific inexact alternating projections for two non-convex sets in a Euclidean space. The σ-quasioptimal metric projection (σ ≥ 1) of a point x onto a set A consists of the points of A whose distance to x is at most σ times the minimal distance dist(x, A). We prove that quasioptimal alternating projections, when one or both projections are quasioptimal, converge locally and linearly for super-regular sets with transversal intersection. The theory is motivated by the successful application of alternating projections to low-rank matrix and tensor approximation. We focus on two problems -- nonnegative low-rank approximation and low-rank approximation in the maximum norm -- and develop fast alternating-projection algorithms for matrices and tensor trains based on cross approximation and acceleration techniques. The numerical experiments confirm that the proposed methods are efficient and suggest that they can be used to regularise various low-rank computational routines.
... In particular, we leverage tensor-train orthogonalizations to improve the linear scaling in the recently proposed squared inverse Rosenblatt transport [10] method for interpolating probability densities with guaranteed non-negativity. From another perspective, our method can be viewed as an improvement over TT-based generative model [11], by augmenting it with a neural-network flow model, as TT has a limited representation power. ...
... The details of obtaining C as a TT are deferred for now to Section 3.1.4. We remark here that although we intend to obtain samples from p₀(x) = q₀(x)², we never explicitly form its coefficient tensor as a TT, as doing so would require us to work with core tensors of size r² × n × r² [11]. Instead we only work directly with the core tensors of C. In particular, we demonstrate that by putting the cores of C in right-left orthogonal form, the marginalizations in variables xₖ₊₁, ... ...
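The right-left orthogonal form mentioned in this snippet is the standard TT orthogonalization sweep. A minimal numpy sketch, assuming cores of shape (r_{k-1}, n_k, r_k); this is an illustration of the generic procedure, not the authors' implementation.

```python
import numpy as np

def right_left_orthogonalize(cores):
    """Sweep right to left: after the sweep every core but the first
    has orthonormal rows when flattened to (r_{k-1}, n_k * r_k)."""
    cores = [c.copy() for c in cores]
    for k in range(len(cores) - 1, 0, -1):
        r0, n, r1 = cores[k].shape
        # LQ-decompose the unfolding M = cores[k].reshape(r0, n*r1)
        # via QR of its transpose: M^T = Q R, so M = R^T Q^T.
        Q, R = np.linalg.qr(cores[k].reshape(r0, n * r1).T)
        cores[k] = Q.T.reshape(-1, n, r1)  # rows now orthonormal
        # absorb the triangular factor R^T into the neighbouring core
        cores[k - 1] = np.einsum('amb,cb->amc', cores[k - 1], R)
    return cores
```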
... 11 and summarized in Table 4. Several typical sample configurations of the 2-dimensional Ginzburg-Landau model drawn from (a) the trained TF and (b) the trained NF. ...
Preprint
Fueled by the expressive power of deep neural networks, normalizing flows have achieved spectacular success in generative modeling, or learning to draw new samples from a distribution given a finite dataset of training samples. Normalizing flows have also been applied successfully to variational inference, wherein one attempts to learn a sampler based on an expression for the log-likelihood or energy function of the distribution, rather than on data. In variational inference, the unimodality of the reference Gaussian distribution used within the normalizing flow can cause difficulties in learning multimodal distributions. We introduce an extension of normalizing flows in which the Gaussian reference is replaced with a reference distribution that is constructed via a tensor network, specifically a matrix product state or tensor train. We show that by combining flows with tensor networks on difficult variational inference tasks, we can improve on the results obtained by using either tool without the other.
... The idea is to compute the exact KR rearrangement of an approximate density of the target random variable. For instance, tensor methods are used in [14,15,21] to construct the approximate density. By doing so, this approach bypasses the highly nonlinear optimization procedure, and the associated error analysis is often well established. ...
... In this case, it is often computationally infeasible to directly approximate the target density function with standard approximation tools. Following [14], one way to circumvent this challenge is to introduce a layered construction technique that builds a composition of KR rearrangements, i.e., T = Q^(1) ∘ ⋯ ∘ Q^(L). Each map Q^(ℓ) is greedily constructed as the KR rearrangement from the reference random variable U to the preconditioned bridging random variable with law (Q^(1) ∘ ⋯ ∘ Q^(ℓ−1))^♯ ν_{X^(ℓ)}. ...
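To make the layered construction concrete, here is a minimal sketch of composing and inverting a chain of maps. The 1D affine layers are hypothetical stand-ins for the KR rearrangements Q^(ℓ); only the composition order is taken from the snippet.

```python
import numpy as np

# Hypothetical layers: (forward, inverse) pairs standing in for Q^(l).
layers = [(lambda u, a=a, b=b: a * u + b,        # Q_l(u)
           lambda x, a=a, b=b: (x - b) / a)      # Q_l^{-1}(x)
          for a, b in [(2.0, 0.0), (1.5, -1.0), (0.5, 3.0)]]

def push_forward(u):
    """T = Q^(1) ∘ ... ∘ Q^(L): apply the innermost layer Q^(L) first."""
    for fwd, _ in reversed(layers):
        u = fwd(u)
    return u

def pull_back(x):
    """T^{-1} = (Q^(L))^{-1} ∘ ... ∘ (Q^(1))^{-1}."""
    for _, inv in layers:
        x = inv(x)
    return x

u = np.random.default_rng(1).standard_normal(5)
assert np.allclose(pull_back(push_forward(u)), u)  # round trip recovers u
```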
... In this paper, we present generalizations and improvements of the function approximation methods used to build the KR rearrangement from the density. Instead of using tensor methods as in the seminal papers [14,15], we propose in Section 2 to approximate the target density using a sparse tensor-product basis via a least-squares formulation. More specifically, we approximate the square root of the unnormalized target density to ensure nonnegativity of the density approximation by squaring the resulting function. ...
Preprint
Transport map methods offer a powerful statistical learning tool that can couple a target high-dimensional random variable with some reference random variable using invertible transformations. This paper presents new computational techniques for building the Knothe--Rosenblatt (KR) rearrangement based on general separable functions. We first introduce a new construction of the KR rearrangement -- with guaranteed invertibility in its numerical implementation -- based on approximating the density of the target random variable using tensor-product spectral polynomials and downward closed sparse index sets. Compared to other constructions of KR rearrangements based on either multi-linear approximations or nonlinear optimizations, our new construction relies only on a weighted least-squares approximation procedure. Then, inspired by the recently developed deep tensor trains (Cui and Dolgov, Found. Comput. Math. 22:1863--1922, 2022), we enhance the approximation power of sparse polynomials by preconditioning the density approximation problem using compositions of maps. This is particularly suitable for high-dimensional and concentrated probability densities commonly seen in many applications. We approximate the complicated target density by a composition of self-reinforced KR rearrangements, in which previously constructed KR rearrangements -- based on the same approximation ansatz -- are used to precondition the density approximation problem for building each new KR rearrangement. We demonstrate the efficiency of our proposed methods and the importance of using the composite map on several inverse problems governed by ordinary differential equations (ODEs) and partial differential equations (PDEs).
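The downward closed sparse index sets mentioned in this abstract can be illustrated with the total-degree set, a standard example; the sketch below is generic and not the paper's specific construction.

```python
from itertools import product

def total_degree_set(d, p):
    """Total-degree multi-index set {i in N^d : i_1 + ... + i_d <= p}.
    It is downward closed: for any index i in the set with i_k > 0,
    the 'ancestor' i - e_k also belongs to the set."""
    return [i for i in product(range(p + 1), repeat=d) if sum(i) <= p]

indices = total_degree_set(d=3, p=2)
print(len(indices))  # binomial(3 + 2, 2) = 10 multi-indices
```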
... 3. Squared-tensor-train algorithms, debiasing and smoothing. In this section, we review the squared-tensor-train technique for building order-preserving transport maps, which was originally introduced in [6]. Then, we integrate the resulting transport maps into the recursive procedure defined in Section 2.4 to sequentially solve the filtering problem (1.5) and the parameter estimation problem (1.6) with sample-based debiasing. ...
... Proof. The original proof is given in Proposition 2 of [6]. We illustrate the idea here for the sake of completeness. ...
Preprint
Numerous real-world applications involve the filtering problem: one aims to sequentially estimate the states of a (stochastic) dynamical system from incomplete, indirect, and noisy observations over time to forecast and control the underlying system. Examples can be found in econometrics, meteorology, robotics, bioinformatics, and beyond. In addition to the filtering problem, it is often of interest to estimate some parameters that govern the evolution of the system. Both the filtering and the parameter estimation can be naturally formalized under the Bayesian framework. However, the Bayesian solution poses some significant challenges. For example, the most widely used particle filters can suffer from particle degeneracy and the more robust ensemble Kalman filters rely on the rather restrictive Gaussian assumptions. Exploiting the interplay between the low-rank tensor structure (tensor train) and Markov property of the filtering problem, we present a new approach for tackling Bayesian filtering and parameter estimation altogether. We also explore the preconditioning method to enhance the tensor-train approximation power. Our approach aims at exact Bayesian solutions and does not suffer from particle degeneracy.
... We present a deep importance sampling scheme that is suitable for high-dimensional rare event problems. It employs the deep inverse Rosenblatt transport developed in Dolgov et al. (2020) and Cui & Dolgov (2021) to adaptively approximate the optimal importance density using a composition of order-preserving maps. When the optimal importance density is multi-modal and concentrated in the tails of the input distribution, the composite structure is able to adapt to those complicated features. ...
... It requires only O(dnr²) evaluations of the density ρ* and O(dnr³) floating-point operations, where n = max_k n_k and r = max_k r_k. For more details see Dolgov et al. (2020) and Cui & Dolgov (2021). In general, the maximal rank r depends on the dimension d and can be large when the density ρ* concentrates in some part of its domain, but some theoretical results exist that provide rank bounds. ...
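A related minimal sketch: evaluating a TT at a single multi-index is a chain of (1×r)(r×r) products, i.e. O(dr²) flops per point; looping over n entries per core gives counts of the O(dnr²) flavour quoted above. The toy cores below are illustrative only.

```python
import numpy as np

def tt_eval(cores, idx):
    """Evaluate a tensor train at one multi-index by chaining
    (1 x r) @ (r x r') matrix products, O(d r^2) flops per point."""
    v = cores[0][:, idx[0], :]          # shape (1, r_1)
    for G, i in zip(cores[1:], idx[1:]):
        v = v @ G[:, i, :]              # (1, r) @ (r, r')
    return v.item()

# toy TT: d = 4 cores, mode size n = 5, ranks (1, 3, 3, 3, 1)
rng = np.random.default_rng(0)
shapes = [(1, 5, 3), (3, 5, 3), (3, 5, 3), (3, 5, 1)]
cores = [rng.standard_normal(s) for s in shapes]
print(tt_eval(cores, (0, 1, 2, 3)))
```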
... for some τ > 0. The additional term τλ(x) guarantees that supp(ρ*) ⊆ supp(p), and thus the importance sampling estimator defined by the approximate density p is unbiased. The following lemma, whose original proof is given in Cui & Dolgov (2021), shows how to choose τ as a function of the error in g in the L²-norm, so as to control the overall error of the approximate density p in Hellinger distance. ...
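A hedged 1D illustration of the defensive term τλ(x): the mixture below is strictly positive wherever the reference λ is, which keeps the importance weights finite. The choices of g, λ, τ and the toy target are assumptions for the demo, not the paper's.

```python
import numpy as np
from scipy.stats import norm

tau = 1e-2
lam = norm(0.0, 3.0)                     # reference density lambda
g = lambda x: np.exp(-x**2)              # stand-in density approximation

def p_unnorm(x):
    # defensive mixture: g^2 can vanish, tau * lambda never does
    return g(x)**2 + tau * lam.pdf(x)

rho_star = norm(0.5, 0.4).pdf            # toy target density
xs = np.linspace(-5.0, 5.0, 2001)
Z = np.sum(p_unnorm(xs)) * (xs[1] - xs[0])    # grid normalization (1D demo)
weights = rho_star(xs) / (p_unnorm(xs) / Z)   # importance weights stay bounded
print(weights.max())
```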
Preprint
We propose a deep importance sampling method that is suitable for estimating rare event probabilities in high-dimensional problems. We approximate the optimal importance distribution in a general importance sampling problem as the pushforward of a reference distribution under a composition of order-preserving transformations, in which each transformation is formed by a squared tensor-train decomposition. The squared tensor-train decomposition provides a scalable ansatz for building order-preserving high-dimensional transformations via density approximations. The use of a composition of maps moving along a sequence of bridging densities alleviates the difficulty of directly approximating concentrated density functions. To compute expectations over unnormalized probability distributions, we design a ratio estimator that estimates the normalizing constant using a separate importance distribution, again constructed via a composition of transformations in tensor-train format. This offers better theoretical variance reduction compared with self-normalized importance sampling, and thus opens the door to efficient computation of rare event probabilities in Bayesian inference problems. Numerical experiments on problems constrained by differential equations show little to no increase in computational complexity as the event probability goes to zero, and allow the computation of hitherto unattainable estimates of rare event probabilities for complex, high-dimensional posterior densities.
... In this work, we focus on the low-rank tensor train representation to construct the random projection f. Tensor decompositions are widely used for data compression [5,19-24]. The Tensor Train (TT) decomposition, also called Matrix Product States (MPS) [25-28], offers the following benefits: low-rank TT-formats can provide compact representations of projection matrices and efficient basic linear-algebra operations such as matrix-by-vector products [29]. ...
Article
This work proposes a Tensor Train Random Projection (TTRP) method for dimension reduction, where pairwise distances can be approximately preserved. Our TTRP is systematically constructed through a Tensor Train (TT) representation with TT-ranks equal to one. Based on the tensor train format, this random projection method can speed up the dimension reduction procedure for high-dimensional datasets and incurs lower storage costs with little loss in accuracy compared with existing methods. We provide a theoretical analysis of the bias and the variance of TTRP, which shows that this approach is an expected isometric projection with bounded variance, and we show that the scaling Rademacher variable is an optimal choice for generating the corresponding TT-cores. Detailed numerical experiments with synthetic datasets and the MNIST dataset are conducted to demonstrate the efficiency of TTRP.
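A rank-1 TT random projection can be sketched as mode-by-mode contractions with per-mode Rademacher vectors, since a rank-1 TT row is a Kronecker product of mode vectors. The 1/√m scaling and the toy shapes below are assumptions for illustration, not TTRP's exact construction.

```python
import numpy as np

def ttrp_rank1(X, m, rng):
    """Project a tensorized input X to m dimensions: each projection
    row is a Kronecker product of per-mode Rademacher vectors,
    applied as successive mode contractions."""
    y = np.empty(m)
    for i in range(m):
        t = X
        for n in X.shape:
            u = rng.choice([-1.0, 1.0], size=n)      # Rademacher core
            t = np.tensordot(u, t, axes=([0], [0]))  # contract first mode
        y[i] = t / np.sqrt(m)                        # assumed scaling
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(2 * 3 * 4).reshape(2, 3, 4)  # vector as a tensor
print(ttrp_rank1(x, m=8, rng=rng))
```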
... - Square root trick: One can obtain a tensor train approximation ĝ of √f₀, as e.g. applied in Cui and Dolgov (2021). Subsequently squaring the result yields the desired non-negative tensor train approximation ĝ² of f₀. ...
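The square root trick is easy to demonstrate on a 1D stand-in, with a Chebyshev fit playing the role of the tensor-train approximation of √f₀; the density f₀ and degree below are illustrative assumptions.

```python
import numpy as np

# Approximate sqrt(f0) in a linear ansatz, then square: the squared
# approximation is non-negative by construction.
f0 = lambda x: np.exp(-50 * (x - 0.3)**2)   # concentrated toy density
xs = np.linspace(-1, 1, 400)
ghat = np.polynomial.chebyshev.Chebyshev.fit(xs, np.sqrt(f0(xs)), deg=30)
approx = ghat(xs)**2                        # >= 0 everywhere
print(np.max(np.abs(approx - f0(xs))))      # small uniform error
```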
Article
This paper presents a novel method for the accurate functional approximation of possibly highly concentrated probability densities. It is based on the combination of several modern techniques such as transport maps and low-rank approximations via a nonintrusive tensor train reconstruction. The central idea is to carry out computations for statistical quantities of interest such as moments based on a convenient representation of a reference density for which accurate numerical methods can be employed. Since the transport from target to reference can usually not be determined exactly, one has to cope with a perturbed reference density due to a numerically approximated transport map. By the introduction of a layered approximation and appropriate coordinate transformations, the problem is split into a set of independent approximations in separately chosen orthonormal basis functions, combining the notions of h- and p-refinement (i.e. "mesh size" and polynomial degree). An efficient low-rank representation of the perturbed reference density is achieved via the Variational Monte Carlo method. This nonintrusive regression technique reconstructs the map in the tensor train format. An a priori convergence analysis with respect to the error terms introduced by the different (deterministic and statistical) approximations in the Hellinger distance and the Kullback–Leibler divergence is derived. Important applications are presented, and in particular the context of Bayesian inverse problems is illuminated, which is a main motivation for the developed approach. Several numerical examples illustrate the efficacy with densities of different complexity and degrees of perturbation of the transport to the reference density. The (superior) convergence is demonstrated in comparison to Monte Carlo and Markov Chain Monte Carlo methods.
... Consequently, the knowledge of the (approximate) transport map can help to significantly alleviate the high computational burden that e.g. is typical for Markov chain Monte Carlo methods. With this in mind, functional representations in a polynomial basis exploiting the beneficial structure of the Knothe-Rosenblatt transform were for instance developed in [74,21,20]. For the formulation of the variational problem, the Kullback-Leibler divergence is used. ...
Preprint
An unsupervised learning approach for the computation of an explicit functional representation of a random vector $Y$ is presented, which relies only on a finite set of samples with unknown distribution. Motivated by recent advances in computational optimal transport for estimating Wasserstein distances, we develop a new Wasserstein multi-element polynomial chaos expansion (WPCE). It relies on the minimization of a regularized empirical Wasserstein metric known as the debiased Sinkhorn divergence. As a requirement for an efficient polynomial basis expansion, a suitable (minimal) stochastic coordinate system $X$ has to be determined, with the aim to identify ideally independent random variables. This approach generalizes representations through diffeomorphic transport maps to the case of non-continuous and non-injective model classes $\mathcal{M}$ with different input and output dimensions, yielding the relation $Y=\mathcal{M}(X)$ in distribution. Moreover, since the used PCE grows exponentially in the number of random coordinates of $X$, we introduce an appropriate low-rank format given as stacks of tensor trains, which alleviates the curse of dimensionality, leading to only linear dependence on the input dimension. By the choice of the model class $\mathcal{M}$ and the smooth loss function, higher-order optimization schemes become possible. It is shown that the relaxation to a discontinuous model class is necessary to explain multimodal distributions. Moreover, the proposed framework is applied to a numerical upscaling task, considering a computationally challenging microscopic random non-periodic composite material. This leads to a tractable effective macroscopic random field in adopted stochastic coordinates.
... However, the latter is a complicated multivariate function, which lacks the independent-variable parametrization necessary to set up the discretization and TT approximation. (A possible solution using optimal transport [4] can be a matter of future research.) As a proof of concept, we simplify the distribution to independent uniforms, reflecting the means and variances estimated by ABC. ...
Preprint
This article develops a new algorithm named TTRISK to solve high-dimensional risk-averse optimization problems governed by differential equations (ODEs and/or PDEs) under uncertainty. As an example, we focus on the so-called Conditional Value at Risk (CVaR), but the approach is equally applicable to other coherent risk measures. Both the full and reduced space formulations are considered. The algorithm is based on low-rank tensor approximations of random fields discretized using stochastic collocation. To avoid non-smoothness of the objective function underpinning the CVaR, we propose an adaptive strategy to select the width parameter of the smoothed CVaR to balance the smoothing and tensor approximation errors. Moreover, an unbiased Monte Carlo CVaR estimate can be computed by using the smoothed CVaR as a control variate. To accelerate the computations, we introduce an efficient preconditioner for the KKT system in the full space formulation. The numerical experiments demonstrate that the proposed method enables accurate CVaR optimization constrained by large-scale discretized systems. In particular, the first example consists of an elliptic PDE with random coefficients as constraints. The second example is motivated by a realistic application to devise a lockdown plan for the United Kingdom under COVID-19. The results indicate that the risk-averse framework is feasible with the tensor approximations under tens of random variables.
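The smoothed CVaR described in this abstract follows the Rockafellar-Uryasev minimization formula; a minimal sketch with a softplus-smoothed plus function is given below. The fixed width ε and the toy loss model are assumptions standing in for the paper's adaptive width selection.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# CVaR_alpha(X) = min_t { t + E[(X - t)_+] / (1 - alpha) };
# replace (.)_+ with a softplus of width eps to smooth the objective.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)   # toy losses
alpha, eps = 0.95, 1e-2

def smoothed_cvar_objective(t):
    softplus = eps * np.logaddexp(0.0, (X - t) / eps)  # ~ max(X - t, 0)
    return t + softplus.mean() / (1.0 - alpha)

res = minimize_scalar(smoothed_cvar_objective, bounds=(0.0, 10.0),
                      method='bounded')
print('smoothed CVaR_0.95 ~', res.fun)
```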