Fig 2 - uploaded by Alex Kostinski
Covariance-matrix condition number κ₁ versus α for ρ(τ) = r^(τ^α). The correlation coefficient r = 0.98 is constant throughout. Note the high sensitivity of κ₁ for α = 2.


Source publication
Article
Full-text available
The authors examine why the Gaussian autocorrelation-function model, widely used in remote sensing, yields a particularly ill-conditioned sample-covariance matrix in the case of many strongly correlated samples. They explore the question numerically and relate the magnitude of the matrix condition number to the n...

Contexts in source publication

Context 1
... us next explore the "Gaussian anomaly" systematically, by computing the eigenvalues and the condition number as a function of α as it approaches 2. The results are shown in Fig. 2 for various dimensions, and r = 0.98. Again, the Gaussian anomaly is clear. We see that reducing α from 2 to 1.9 reduces the condition number by several orders of magnitude (note that α = 2.0 is not quite resolved in Fig. ...
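The effect is easy to reproduce; below is a minimal numerical sketch, assuming the power-exponential correlation model ρ(k) = r^(k^α) for n equidistant samples described above (function names are illustrative, not from the paper):

```python
import numpy as np

def corr_matrix(n, r=0.98, alpha=2.0):
    """Correlation matrix R_ij = r**(|i - j|**alpha) for n equidistant samples."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    return r ** (k ** alpha)

n = 20
for alpha in (1.0, 1.9, 2.0):
    R = corr_matrix(n, alpha=alpha)
    # At alpha = 2 (the Gaussian case) the matrix is numerically singular in
    # double precision, mirroring the "not quite resolved" remark above.
    print(f"alpha = {alpha:.1f}: cond(R) = {np.linalg.cond(R):.3e}")
```

Lowering α from 2.0 to 1.9 drops the reported condition number by many orders of magnitude, consistent with the excerpt.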

Citations

... A ubiquitous problem in the use of GPs is the ill-conditioning of their covariance matrices [1,12,30]. This problem arises with many kernels, including the Gaussian kernel. ...
... For Fig. 1b the black line is Eq. (18), the solid green line is the mean of the surrogate from Eq. (12), and the light green area represents ±2σ_f(x), where σ_f comes from Eq. (13). For Fig. 1a the mean and standard deviation of the gradient-free GP are calculated with equations analogous to Eqs. (12) and (13) that omit the gradient evaluations and use the gradient-free kernel matrix. ...
Preprint
Full-text available
Gaussian processes provide probabilistic surrogates for various applications including classification, uncertainty quantification, and optimization. Using a gradient-enhanced covariance matrix can be beneficial since it provides a more accurate surrogate relative to its gradient-free counterpart. An acute problem for Gaussian processes, particularly those that use gradients, is the ill-conditioning of their covariance matrices. Several methods have been developed to address this problem for gradient-enhanced Gaussian processes, but they have various drawbacks, such as limiting the data that can be used, imposing a minimum distance between evaluation points in the parameter space, or constraining the hyperparameters. In this paper a new method is presented that applies a diagonal preconditioner to the covariance matrix along with a modest nugget to ensure that the condition number of the covariance matrix is bounded, while avoiding the drawbacks listed above. Optimization results for a gradient-enhanced Bayesian optimizer with the Gaussian kernel are compared for the new method, a baseline method that constrains the hyperparameters, and a rescaling method that increases the distance between evaluation points. The Bayesian optimizer with the new method converges the optimality, i.e., the $\ell_2$ norm of the gradient, an additional 5 to 9 orders of magnitude relative to when the baseline method is used, and it does so in fewer iterations than with the rescaling method. The new method is available in the open-source Python library GpGradPy, which can be found at https://github.com/marchildon/gpgradpy/tree/paper_precon. All of the figures in this paper can be reproduced with this library.
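As a rough illustration of the general idea (a diagonal preconditioner plus a modest nugget), here is a minimal 1-D sketch. The kernel-derivative formulas are the standard ones for the Gaussian kernel; this is not the paper's exact preconditioner or bound (those are implemented in GpGradPy):

```python
import numpy as np

def grad_enhanced_gauss(x, theta=10.0):
    """Gradient-enhanced Gaussian kernel matrix for 1-D points x,
    with k(x, y) = exp(-theta * (x - y)**2)."""
    d = np.subtract.outer(x, x)                        # d_ij = x_i - x_j
    K = np.exp(-theta * d**2)                          # cov(f_i, f_j)
    Ky = 2.0 * theta * d * K                           # cov(f_i, f'_j) = dk/dy
    Kx = -2.0 * theta * d * K                          # cov(f'_i, f_j) = dk/dx
    Kxy = (2.0 * theta - 4.0 * theta**2 * d**2) * K    # cov(f'_i, f'_j)
    return np.block([[K, Ky], [Kx, Kxy]])

x = np.linspace(0.0, 1.0, 10)
Kfull = grad_enhanced_gauss(x)

# Jacobi (diagonal) preconditioning equilibrates the function and gradient
# blocks, whose diagonals differ in scale (1 versus 2*theta); the nugget eta
# then bounds the smallest eigenvalue away from zero.
eta = 1e-8
P = np.diag(1.0 / np.sqrt(np.diag(Kfull)))
Kpre = P @ Kfull @ P + eta * np.eye(Kfull.shape[0])

print(f"cond(K)             = {np.linalg.cond(Kfull):.3e}")
print(f"cond(P K P + eta I) = {np.linalg.cond(Kpre):.3e}")
```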
... The problem of ill-conditioned gradient-free covariance matrices for GPs is common for a wide range of kernels [11,28], including the Gaussian kernel [1]. The factors that cause the ill-conditioning of the gradient-free covariance matrix for GPs have been studied in detail in [4]. ...
Article
Full-text available
Gaussian processes (GPs) are used for numerous different applications, including uncertainty quantification and optimization. Ill-conditioning of the covariance matrix for GPs is common with the use of various kernels, including the Gaussian, rational quadratic, and Matérn kernels. A common approach to overcome this problem is to add a nugget along the diagonal of the covariance matrix. For GPs that are not constructed with gradients, it is straightforward to derive a nugget value that guarantees the condition number of the covariance matrix to be below a user-set threshold. However, for gradient-enhanced GPs, there are no existing practical bounds to select a nugget that guarantee that the condition number of the gradient-enhanced covariance matrix is below a user-set threshold. In this paper a novel approach is taken to bound the condition number of the covariance matrix for GPs that use the Gaussian kernel. This is achieved by using non-isotropic rescaling for the data and a modest nugget value. This non-intrusive method works for GPs applied to problems of any dimension and it allows all data points to be kept. The method is applied to a Bayesian optimizer using a gradient-enhanced GP to achieve deep convergence. Without this method, the high condition number constrains the hyperparameters for the GP and this is shown to impede the convergence of the optimizer. It is also demonstrated that applying this method to the rational quadratic and Matérn kernels alleviates the ill-conditioning of their gradient-enhanced covariance matrices. Implementation of the method is straightforward and clearly described in the paper.
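The gradient-free nugget bound mentioned here follows a standard argument. As a sketch: for an $n \times n$ correlation matrix $K$ with eigenvalues $\lambda_{\max} \ge \lambda_{\min} \ge 0$, adding a nugget $\eta$ gives

$$\kappa(K + \eta I) = \frac{\lambda_{\max} + \eta}{\lambda_{\min} + \eta} \le \frac{\lambda_{\max} + \eta}{\eta} \le \kappa_{\max} \quad \text{whenever} \quad \eta \ge \frac{\lambda_{\max}}{\kappa_{\max} - 1},$$

and since $\lambda_{\max} \le \operatorname{tr}(K) = n$, the choice $\eta = n/(\kappa_{\max} - 1)$ always suffices. No such simple trace bound is available once gradient blocks enter the matrix, which is the gap this paper addresses.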
... A lower value of q, e.g., q = 1, is more suitable for a rougher response, as it permits a more substantial difference in function values between adjacent points. Some studies (Kostinski & Koivunen, 2000; Zimmermann, 2015) have demonstrated that the Gaussian correlation model is exceptionally prone to ill-conditioning, while exponential correlation models with a proper power q can effectively mitigate this issue. Furthermore, some numerical centering operations for normalization are often necessary. ...
... Furthermore, some numerical centering operations for normalization are often necessary. The reader may refer to (Kostinski & Koivunen, 2000; Zimmermann, 2015) for more related discussion. In this paper, we have presented an appropriate normalization method for the multi-fidelity input data in AMF-PCK modeling, as introduced in Section 3.4. ...
Article
Full-text available
The multi-fidelity metamodeling method can dramatically improve the efficiency of metamodeling for computationally expensive engineering problems when multiple levels of fidelity data are available. In this paper, an efficient and novel adaptive multi-fidelity sparse polynomial chaos-Kriging (AMF-PCK) metamodeling method is proposed for accurate global approximation. This approach, by first using low-fidelity computations, builds the PCK model as a model trend for the high-fidelity function and captures the relative importance of the significant sparse polynomial bases selected by least-angle regression (LAR). Then, using high-fidelity model evaluations, the developed method utilizes the trend information to adaptively refine a scaling PCK model using an adaptive correction polynomial expansion-Gaussian process modeling. Here, the most relevant sparse polynomial basis set and the optimal correction expansion are adaptively identified and constructed based on a devised nested leave-one-out cross-validation-based LAR procedure. As a result, the optimal AMF-PCK metamodel is adaptively established, combining high flexibility with strong nonlinear modeling ability. Moreover, an adaptive sequential sampling approach is specially developed to further improve the multi-fidelity metamodeling efficiency. The developed method is evaluated on several benchmark functions and two practically challenging transonic aerodynamic modeling applications. A comprehensive comparison with the popular hierarchical Kriging, universal Kriging and LAR-PCK approaches demonstrates that the proposed method is the most efficient and provides the best global approximation accuracy, with particular superiority for quantities of interest in multi-modal and highly nonlinear landscapes. This novel method is very promising for efficient uncertainty analysis and surrogate-based optimization of expensive engineering problems.
... For a positive-definite, Hermitian matrix, the condition number is defined as the ratio of its maximum and minimum eigenvalues [61]. A matrix is well conditioned when its condition number is small. ...
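As a worked example tied to the figure above: the $2 \times 2$ correlation matrix

$$A = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$$

has eigenvalues $1 \pm r$, so $\kappa(A) = (1 + r)/(1 - r)$; with the $r = 0.98$ used in Fig. 2 this already gives $\kappa = 99$, and the growth compounds rapidly with dimension.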
Preprint
Full-text available
This paper investigates regularized estimation of Kronecker-structured covariance matrices (CM) for complex elliptically symmetric (CES) data. To obtain a well-conditioned estimate of the CM, we add penalty terms of Kullback-Leibler divergence to the negative log-likelihood function of the associated complex angular Gaussian (CAG) distribution. This is shown to be equivalent to regularizing Tyler's fixed-point equations by shrinkage. A sufficient condition for the existence of a solution is discussed. An iterative algorithm is applied to solve the resulting fixed-point iterations and its convergence is proved. To address the critical problem of tuning the shrinkage factors, we then introduce three methods exploiting oracle approximating shrinkage (OAS) and cross-validation (CV). The proposed algorithms are applied to CM estimation for polarization-space-time adaptive processing (PSTAP) in the context of heterogeneous clutter, where the clutter can be modeled using CES distributions with a Kronecker product-structured CM. When the training samples are limited, the proposed estimator, referred to as the robust shrinkage Kronecker estimator (RSKE), shows better performance than several existing methods. Simulations are conducted to validate the proposed estimator and demonstrate its high performance.
... Indeed, the conditioning depends on the number of points and on the maximum distance between those points. As a result, covariance matrices built from this kernel are particularly ill-conditioned when the training points are strongly correlated [Kostinski et Koivunen, 2000]. ...
Thesis
Reliability analyses of complex engineering problems often involve very long computation times and require advanced numerical methods. Estimating the probability of failure with active-learning Gaussian-process approaches is one way to reduce computation times drastically. These approaches classify a population of samples using the constructed Gaussian-process model. Two sources of uncertainty therefore influence the failure-probability estimator: the metamodel approximation and the sampling variability. In this thesis, we propose a methodology for quantifying the sensitivity of the failure-probability estimator to these two sources of uncertainty. A methodology that focuses enrichment on reducing the dominant source of uncertainty, together with a stopping criterion based on the global error, is proposed. The approach is extended to the estimation of rare events. Another avenue studied concerns reducing the unit numerical cost of evaluating the training points. We therefore propose a coupling between active-learning approaches and reduced-basis model reduction. An adaptive methodology is proposed that chooses, through a coupling criterion, whether a reduced solution can be used in place of the complex numerical model.
... In fact, the condition number depends on the number of sample points and the maximum distance between them. Covariance matrices built with this kernel are then particularly ill-conditioned when the training data are strongly correlated [41], which is the case when an active learning algorithm is used for GP-based reliability analysis. For this reason, in this paper the Matérn 5/2 kernel (see Eq. (7)) will be used. ...
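The contrast is easy to check numerically; here is a small sketch using the textbook kernel formulas (length scale and spacing chosen only for illustration):

```python
import numpy as np

def gaussian(d, ell=1.0):
    """Gaussian (squared-exponential) kernel."""
    return np.exp(-d**2 / (2.0 * ell**2))

def matern52(d, ell=1.0):
    """Matern 5/2 kernel."""
    s = np.sqrt(5.0) * d / ell
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

x = np.linspace(0.0, 1.0, 30)              # closely spaced -> strongly correlated
D = np.abs(np.subtract.outer(x, x))
print(f"Gaussian   : cond = {np.linalg.cond(gaussian(D)):.3e}")
print(f"Matern 5/2 : cond = {np.linalg.cond(matern52(D)):.3e}")
```

The Gaussian kernel matrix is numerically singular here, while the Matérn 5/2 matrix, though still ill-conditioned, is many orders of magnitude better behaved.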
Preprint
Running a reliability analysis on engineering problems involving complex numerical models can be computationally very expensive, requiring advanced simulation methods to reduce the overall numerical cost. Gaussian process based active learning methods for reliability analysis have emerged as a promising way of reducing this computational cost. The learning phase of these methods consists in building a Gaussian process surrogate model of the performance function and using the uncertainty structure of the Gaussian process to enrich this surrogate model iteratively. For that purpose a learning criterion has to be defined. The estimation of the probability of failure is then typically obtained by classifying a population evaluated on the final surrogate model. Hence, the estimator of the probability of failure carries two different uncertainty sources, related to the surrogate model approximation and to the sampling-based integration technique. In this paper, we propose a methodology to quantify the sensitivity of the probability of failure estimator to both uncertainty sources. This analysis also enables control of the whole error associated with the failure probability estimate and thus provides an accuracy criterion for the estimation. An active learning approach is therefore introduced that integrates this analysis to reduce the main source of error and stops when the global variability is sufficiently low. The approach is proposed for both a Monte Carlo based method and an importance sampling based method, seeking to improve the estimation of rare event probabilities. The performance of the proposed strategy is then assessed on several examples.
... The exponential and spline correlation functions presented in Table 1 are considered in this study, and maximum likelihood estimation is used to obtain the correlation coefficient θ_t in the range 0.1 to 10. For the exponential function, the parameter α is set to 1.9 to avoid the ill-conditioning of the exponential correlation matrix, as suggested by Refs. (Koivunen and Kostinski 1998; Kostinski and Koivunen 2000; Zimmermann 2015). ...
Article
Full-text available
The efficiency of optimization for high-dimensional problems has been improved by metamodeling techniques in multidisciplinary design over the past decades. In this study, comparative studies are carried out for high-dimensional problems on the accuracy of four popular metamodeling methods: Kriging (KRG), radial basis functions (RBF), least-squares support vector regression (LSSVR), and cut-high-dimensional model representation (cut-HDMR). In addition, HDMR methods with different basis functions are considered, including KRG-HDMR, RBF-HDMR and SVR-HDMR. Four factors that might influence the quality of the metamodels are considered: the parameter interactions of the problem, sample size, noise level, and sampling strategy. The results show that LSSVR with a Gaussian kernel, using a Latin hypercube sampling (LHS) strategy, constructs more accurate metamodels than KRG. RBF with a Gaussian basis function performs poorly within the group. Generally, cut-HDMR methods perform much better than the other metamodeling methods when handling functions with weak parameter interaction, but not better when handling functions with strong parameter interaction.
... The inherent ill-conditioning of covariance matrices has been investigated in the literature in different applications [22,35]. In DA applications, the behavior of the condition number with respect to sampling distance, number of data points, and domain size for Gaussian-type covariances has been studied in [27,39]. ...
Article
We consider the Variational Data Assimilation (VarDA) problem in an operational framework, namely as it arises when employed for the analysis of temperature and salinity variations in data collected in closed and semi-enclosed seas. We present a computing approach for solving the main computational kernel at the heart of the VarDA problem which outperforms the technique currently employed by operational oceanographic software. The new approach is obtained by means of Tikhonov regularization. We provide a sensitivity analysis of this approach and also study its performance in terms of the accuracy gain on the computed solution. We provide validations on two realistic oceanographic data sets.
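In generic notation (a sketch of the standard mechanism, not necessarily the paper's exact formulation), Tikhonov regularization replaces the least-squares kernel $\min_x \|Ax - b\|_2^2$ with

$$\min_x \; \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2, \qquad x_\lambda = (A^T A + \lambda^2 I)^{-1} A^T b,$$

which improves the conditioning of the normal-equations matrix from $\kappa(A^T A) = \sigma_1^2/\sigma_n^2$ to $(\sigma_1^2 + \lambda^2)/(\sigma_n^2 + \lambda^2)$, at the cost of a bias controlled by $\lambda$.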
... We observe that quantifying observation error correlations is not a straightforward problem and remains a research issue [20,43]. Secondly, even when good estimates of the errors can be made, there were conditioning problems with the minimization that had to be overcome [17-19,24,25]. Observation errors are usually assumed uncorrelated because diagonal matrices are simple and numerically efficient. ...
Article
Full-text available
Data assimilation (DA) is a methodology for combining mathematical models simulating complex systems (the background knowledge) and measurements (the reality or observational data) in order to improve the estimate of the system state (the forecast). DA is an inverse, ill-posed problem that usually has to handle a huge amount of data, making it a large and computationally expensive problem. Here we focus on scalable methods that make DA applications feasible for huge numbers of background data and observations. We present a scalable, highly parallel algorithm for solving variational DA. We provide a mathematical formalization of this approach and also study the performance of the resulting algorithm.
... In [18] the special structure of the Gaussian correlation matrix was investigated for equidistant multivariate observations. The exceptionally poor conditioning of the Gaussian correlation model was termed the Gaussian anomaly in [17] and has been explored numerically in [19] in the setting of a univariate process based on uniform sample data. In this case, explicit expressions for the eigenvalues and eigenvectors of the circulant Gaussian correlation matrices exist, see [6, p. 5, eqs. ...
... It turns out that the key difference between the Gaussian model and the other members of the exponential correlation family in quantifying the condition number growth is that the derivative of the Gaussian correlation matrix with respect to the range parameter is rank deficient, being a standard Euclidean distance matrix [20,21], while the derivatives of the exponential correlation matrices are regular for generic sample point sets. Thus, all phenomena observed in [19] are explained theoretically. ...
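The observation can be made concrete with a short sketch in standard notation: writing the Gaussian correlation matrix as $R_{ij}(\theta) = \exp(-\theta \|x_i - x_j\|^2)$, its derivative with respect to the range parameter is

$$\frac{\partial R_{ij}}{\partial \theta} = -\|x_i - x_j\|^2 \, \exp(-\theta \|x_i - x_j\|^2), \qquad \left.\frac{\partial R}{\partial \theta}\right|_{\theta = 0} = -D, \quad D_{ij} = \|x_i - x_j\|^2,$$

where $D$ is a squared Euclidean distance matrix, whose rank is at most $d + 2$ for points in $\mathbb{R}^d$. The derivative is therefore rank-deficient whenever $n > d + 2$, whereas the corresponding derivative for exponents below 2 is generically full rank.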
Article
Full-text available
Spatial correlation matrices appear in a large variety of applications. For example, they are an essential component of spatial Gaussian processes, also known as spatial linear models or Kriging estimators, which are powerful and well-established tools for a multitude of engineering applications such as the design and analysis of computer experiments, geostatistical problems and meteorological tasks. In radial basis function interpolation, Gaussian correlation matrices arise frequently as interpolation matrices from the Gaussian radial kernel function. In the field of data assimilation in numerical weather prediction, such matrices arise as background error covariances. Over the past thirty years, it was observed by several authors from several fields that the Gaussian correlation model is exceptionally prone to suffer from ill-conditioning, but a quantitative theoretical explanation for this anomaly was lacking. In this paper, a proof for the special position of the Gaussian correlation matrix is given. The theoretical findings are illustrated by numerical experiment.