Article

Some theoretical properties of the O-PLS method


Abstract

The objective of this paper is to present new properties of the orthogonal projections to latent structures (O-PLS) method developed by Trygg and Wold (J. Chemometrics 2002; 16: 119–128). The original orthogonal signal correction (OSC) filter of Wold et al. (Chemometrics Intell. Lab. Syst. 1998; 44: 175–185) removes systematic variation from X that is unrelated to Y. O-PLS is a more restrictive OSC filter. O-PLS removes only systematic variation in X explained in each PLS component that is not correlated with Y. O-PLS is a slight modification of the NIPALS PLS algorithm, which should make O-PLS a generally applicable preprocessing and filtering method. The computation of the O-PLS components under the constraint of being correlated with one PLS component imposes particular properties on the space spanned by the O-PLS components. This paper is divided into two main sections. First we give an application of O-PLS on near-infrared reflectance spectra of soil samples, showing some graphical properties. Then we give the mathematical justifications of these properties. Copyright © 2004 John Wiley & Sons, Ltd.
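As a rough illustration of the filter described in the abstract, the following Python sketch removes y-orthogonal components from X for a single response, using the commonly described O-PLS steps layered on a NIPALS PLS iteration. The function name `opls_filter`, the synthetic data, and the use of NumPy are assumptions made for illustration; this is not the authors' code.

```python
import numpy as np

def opls_filter(X, y, n_orth):
    """Remove n_orth y-orthogonal components from X (single response y).

    Hypothetical helper; X and y are mean-centred internally.
    Returns the filtered X plus orthogonal scores, loadings and weights.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    T_o, P_o, W_o = [], [], []
    for _ in range(n_orth):
        w = X.T @ y                      # PLS weight: direction of covariance with y
        w /= np.linalg.norm(w)
        t = X @ w                        # PLS score
        p = X.T @ t / (t @ t)            # PLS loading
        w_o = p - (w @ p) * w            # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                    # orthogonal score
        p_o = X.T @ t_o / (t_o @ t_o)    # orthogonal loading
        X = X - np.outer(t_o, p_o)       # remove the y-orthogonal variation
        T_o.append(t_o)
        P_o.append(p_o)
        W_o.append(w_o)
    return X, np.column_stack(T_o), np.column_stack(P_o), np.column_stack(W_o)

# Synthetic example (made-up data, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 8))
y_demo = X_demo[:, 0] + 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=30)
X_filt, T_o, P_o, W_o = opls_filter(X_demo, y_demo, n_orth=2)
y_c = y_demo - y_demo.mean()
print(np.abs(T_o.T @ y_c).max())  # orthogonal scores should be uncorrelated with y
```

In this sketch each orthogonal weight is built from the PLS loading of the current component, which is one way to read the abstract's remark that the O-PLS components are computed under a constraint tied to a PLS component.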


... for LC and for no cancer (NC), respectively (software default, SIMCA v.14.1)) were first analysed by principal component analysis (PCA) to inspect the data for potential biases, such as clusters or outliers, which could skew findings [22]. Orthogonal projections to latent structures (OPLS) discriminant analysis (detailed description below) with cross-validation (CV) was then carried out to class-separate the data between the predicted (LC vs. NC) and orthogonal (structured noise) states [23][24][25][26] (SIMCA v.14.1). Univariate associations to LC were analysed with binary logistic regression, and proportional (e.g. ...
... orthogonal projections to latent structures (OPLS) discriminant analysis. An OPLS modelling approach was utilised to analyse variables (descriptors) covarying with outcome (LC or NC) [23][24][25][26]. Analyses were performed with SIMCA v.14.1, Umetrics™ Suite, Sartorius Stedim Biotech. ...
... Signs of early satiety before diagnosis and treatment, for example, were a major early LC predictor in the current study that has, to our knowledge, not been identified before. Our specific, in-depth and complex investigation allowed for key descriptors to surface, and such an approach requires an advanced method like OPLS to handle the magnitude of variables by projection instead of being directly influenced by, or needing to control for, the number of variables [23][24][25][26]. As a potential tool for use in clinical practice, the 70 variables identified may at a later stage be administered as a questionnaire to individuals exhibiting respiratory-related distress, whereby the resulting OPLS risk-prediction score may be used to flag patients for specialized diagnostic workup. ...
Article
The aim of this study was to identify a combination of early predictive symptoms/sensations attributable to primary lung cancer (LC). An interactive e-questionnaire comprised of pre-diagnostic descriptors of first symptoms/sensations was administered to patients referred for suspected LC. Respondents were included in the present analysis only if they later received a primary LC diagnosis or had no cancer; and inclusion of each descriptor required ≥4 observations. Fully-completed data from 506/670 individuals later diagnosed with primary LC (n = 311) or no cancer (n = 195) were modelled with orthogonal projections to latent structures (OPLS). After analysing 145/285 descriptors, meeting inclusion criteria, through randomised seven-fold cross-validation (six-fold training set: n = 433; test set: n = 73), 63 provided best LC prediction. The most-significant LC-positive descriptors included a cough that varied over the day, back pain/aches/discomfort, early satiety, appetite loss, and having less strength. Upon combining the descriptors with the background variables current smoking, a cold/flu or pneumonia within the past two years, female sex, older age, a history of COPD (positive LC-association); antibiotics within the past two years, and a history of pneumonia (negative LC-association); the resulting 70-variable model had accurate cross-validated test set performance: area under the ROC curve = 0.767 (descriptors only: 0.736/background predictors only: 0.652), sensitivity = 84.8% (73.9/76.1%, respectively), specificity = 55.6% (66.7/51.9%, respectively). In conclusion, accurate prediction of LC was found through 63 early symptoms/sensations and seven background factors. Further research and precision in this model may lead to a tool for referral and LC diagnostic decision-making. https://www.nature.com/articles/s41598-019-52915-x
... By extracting variation from its computed PLS components that is uncorrelated (orthogonal) to the responses, OPLS produces a more interpretable regression model compared to PLS. In fact, when trained on the same data and responses, an OPLS model and a PLS model with the same total number of components will show no difference in predictive ability [8]. Despite its relative novelty to the field, the enhanced interpretability of OPLS over PLS has made it a popular method in exploratory studies of spectroscopic datasets of complex chemical mixtures (e.g., metabolomics [9], food and soil science [10], and chemical process control [11]). ...
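The statement that a PLS model and an OPLS model with the same total number of components predict identically can be checked numerically for the single-response case. The sketch below is an illustration under stated assumptions: the helper names `pls1_fitted` and `opls_fitted`, the synthetic data, and the restriction to fitted values on the training set are all choices made here, not taken from the cited work.

```python
import numpy as np

def pls1_fitted(X, y, n_comp):
    """Fitted values of NIPALS PLS1 with n_comp components (centred inputs)."""
    X = X.copy()
    y_hat = np.zeros_like(y)
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        y_hat += (y @ t) / (t @ t) * t   # regress y on the new (orthogonal) score
        X -= np.outer(t, p)              # deflate X
    return y_hat

def opls_fitted(X, y, n_orth):
    """Fitted values of single-y OPLS: n_orth orthogonal + 1 predictive component."""
    X = X.copy()
    for _ in range(n_orth):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        w_o = p - (w @ p) * w            # orthogonal weight
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                    # orthogonal score
        X -= np.outer(t_o, X.T @ t_o / (t_o @ t_o))  # filter X
    w = X.T @ y
    w /= np.linalg.norm(w)
    t_p = X @ w                          # single predictive score
    return (y @ t_p) / (t_p @ t_p) * t_p

# Synthetic, mean-centred data (made up for illustration)
rng = np.random.default_rng(1)
Xc = rng.normal(size=(40, 10))
Xc -= Xc.mean(axis=0)
yc = Xc @ rng.normal(size=10) + 0.2 * rng.normal(size=40)
yc -= yc.mean()

# 3 PLS components versus 1 predictive + 2 orthogonal OPLS components
diff = np.abs(pls1_fitted(Xc, yc, 3) - opls_fitted(Xc, yc, 2)).max()
print(diff)  # expected to be at the level of rounding error
```

The comparison uses three PLS components against one predictive plus two orthogonal OPLS components, so both models have the same total complexity.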
... where w is constrained to unit norm and Z is the orthogonal projector for the super scores where the predictive and orthogonal loadings for each matrix are interrelated by a set of predictive and orthogonal super scores, respectively: (8) where each $E_i$ is a data residual matrix that holds all variation in $X_i$ not explained by the model. Concatenation of all block-level matrices together in equation (8) results in a top-level consensus model, which is in fact equivalent to an OPLS model trained on the partitioned data matrix X: (9) Like PLS and MB-PLS, an MB-OPLS model contains a second equation that relates the predictive super scores and responses: (10) where C is the response loadings matrix that relates the super scores to the responses, and F is the response residual matrix that holds Y-variation not captured by the model. ...
... In effect, by computing φ as the fraction of orthogonal variation to remove from its loadings, MB-OPLS yields the same orthogonal weights ($w_o$) as OPLS of the concatenated matrix. Therefore, because $w_o$ equals the column-wise concatenation of all weights $w_{o,i}$, it is then apparent that the orthogonal super scores extracted by MB-OPLS are identical to those from OPLS of the concatenated matrix X, as illustrated in the following equation: (15) From this equivalence, and the fact that steps (2)–(11) and (21) in MB-OPLS constitute an MB-PLS iteration, we arrive at the equivalence between MB-OPLS and OPLS. Thus, orthogonality between the responses and orthogonal super scores $t_o$ computed by MB-OPLS is also ensured. ...
Article
Methods of multiblock bilinear factorizations have increased in popularity in chemistry and biology as recent increases in the availability of information-rich spectroscopic platforms have made collecting multiple spectroscopic observations per sample a practicable possibility. Of the existing multiblock methods, consensus PCA (CPCA-W) and multiblock PLS (MB-PLS) have been shown to bear desirable qualities for multivariate modeling, most notably their computability from single-block PCA and PLS factorizations. While MB-PLS is a powerful extension to the nonlinear iterative partial least squares (NIPALS) framework, it still spreads predictive information across multiple components when response-uncorrelated variation exists in the data. The OnPLS extension to O2PLS provides a means of simultaneously extracting predictive and uncorrelated variation from a set of matrices, but is more suited to unsupervised data discovery than regression. We describe the union of NIPALS MB-PLS with an orthogonal signal correction (OSC) filter, called MB-OPLS, and illustrate its equivalence to single-block OPLS for regression and discriminant analysis.
... Verron et al. [1] established that the single response partial least squares regression (PLS1) and the single response orthogonal projections to latent structures (OPLS) models with the same number of components give identical predictions, and Ergon [2] demonstrated that the weights $W_{\mathrm{opls}}$ associated with the y-orthogonal components found by the OPLS algorithm in [3] are identical (modulo signs) to the ordinary PLS1 weights after excluding the first column of W. The exact relationship for the weights is $W_{\mathrm{opls}} = [-w_2 \;\; -w_3 \;\; \cdots$ ...
... Trygg and Wold [3] (who appear not to have realized this when their paper was published) took out a patent on their OPLS algorithm in spite of the fact that their description clearly misses a proper analysis of the optimization problem being solved as well as a derivation of the associated regression coefficients corresponding to the original X variables. Tapp and Kemsley (who obviously understood the consequences of the conclusions in [1,2]) published two papers [4,5], both with the message that the findings of these papers imply that everything found by the OPLS algorithm can be calculated from a PLS1 model of the same complexity and vice versa. ...
... A, however, correspond to y-orthogonal phenomena or noise. Until the relatively recent publications [1,2], this seems to have been an ignored fact in most applications of PLS. Comparisons with prior knowledge (in particular about spectra associated with relevant phenomena, substances and/or chemicals) are of course the key to obtaining meaningful interpretations of both orthogonal phenomena and regression coefficients $w_1$ and b. ...
Article
It is well known that the predictions of orthogonal projections to latent structures (OPLS) and partial least squares regression (PLS1) are identical in the single-response case. The present paper presents an approach to identification of the complete y-orthogonal structure by starting from the viewpoint of standard PLS1 regression. Three alternative non-deflating OPLS algorithms and a modified principal component analysis (PCA)-driven method (including MATLAB code) are presented. The first algorithm implements a postprocessing routine of the standard PLS1 solution where QR factorization applied to a shifted version of the non-orthogonal scores is the key to express the OPLS solution. The second algorithm finds the OPLS model directly by an iterative procedure. By a rigorous mathematical argument, we explain that orthogonal filtering is a ‘built-in’ property of the traditional PLS1 regression coefficients. Consequently, the capabilities of OPLS with respect to improving the predictions (also for new samples) compared with PLS1 are non-existent. The PCA-driven method is based on the fact that truncating off one dimension from the row subspace of X results in a matrix Xorth with y-orthogonal columns and a rank of one less than the rank of X. The desired truncation corresponds exactly to the first X deflation step of Martens' non-orthogonal PLS algorithm. The significant y-orthogonal structure of X found by PCA of Xorth is split into two fundamental parts: one part that is significantly contributing to correct the first PLS score toward y and one part that is not. The third and final OPLS algorithm presented is a modification of Martens' non-orthogonal algorithm into an efficient dual PLS1–OPLS algorithm. Copyright © 2014 John Wiley & Sons, Ltd.
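The truncation step behind the PCA-driven method can be illustrated with a few lines of NumPy. This is a minimal sketch, assuming that the truncated dimension is the first PLS weight direction w1 = X'y / ||X'y|| and that the first non-orthogonal deflation step is X − (X w1) w1'; the function name `remove_w1_direction` and the synthetic data are hypothetical, and the paper's own implementation is in MATLAB.

```python
import numpy as np

def remove_w1_direction(X, y):
    """Project the rows of X onto the complement of w1 = X'y / ||X'y||."""
    w1 = X.T @ y
    w1 /= np.linalg.norm(w1)
    return X - np.outer(X @ w1, w1)

# Synthetic, mean-centred data (made up for illustration)
rng = np.random.default_rng(2)
X = rng.normal(size=(25, 6))
X -= X.mean(axis=0)
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=25)
y -= y.mean()

X_orth = remove_w1_direction(X, y)
print(np.abs(X_orth.T @ y).max())                               # columns of X_orth vs y
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X_orth))  # rank before and after
```

Under these assumptions every column of `X_orth` is orthogonal to y, because X_orth'y = X'y − w1 (w1'X'y) vanishes, and one dimension of the row space is lost.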
... To alleviate the aforementioned problem, many approaches have been proposed on the basis of either reducing the dimensionality of irrelevant X variables or modifying the original PLS algorithm using different inner relation descriptors [8][9][10][11][12][13][14][15][16][17][18][19][20][21]. For instance, variable selection methods are introduced to objectively identify variables that contribute useful information and/or eliminate variables containing high-level noise [9][10][11]. ...
... O-PLS can be seen as a dimensionality reduction algorithm that removes the irrelevant systematic variation to facilitate model interpretation. Although the O-PLS model is simpler than the PLS model, it confers no predictive performance advantage over traditional PLS [14][15][16]. Conversely, Wold proposed polynomial and spline PLS algorithms, which retain the framework of linear PLS but modify the relationship between the predictor and the response LVs to be nonlinear [17,19]. Indahl proposed the powered PLS (PPLS) algorithm, which provided a stepwise optimization over a set of candidate loading weights on the basis of powers of correlations and standard deviations, and the method often finds models with fewer and more interpretable components than PLS [8]. ...
... OPLS-DA [41] divides the variables into predictive and orthogonal information with orthogonal signal correction (OSC) technology [42]. Compared with PLS-DA, it provides better visualization and interpretation and has been widely used in modeling and biomarker discovery for metabolomics [43,44]. However, PLS-DA and OPLS-DA are suitable for linear and collinear datasets and may fail on nonlinear datasets. ...
Article
Untargeted metabolomics based on liquid chromatography coupled with mass spectrometry (LC–MS) can detect thousands of features in samples and produce highly complex datasets. The accurate extraction of meaningful features and the building of discriminant models are two crucial steps in the data analysis pipeline of untargeted metabolomics. In this study, pure ion chromatograms were extracted from a liquor dataset and left-sided colon cancer (LCC) dataset by K-means-clustering-based Pure Ion Chromatogram extraction method version 2.0 (KPIC2). Then, the nonlinear low-dimensional embedding by uniform manifold approximation and projection (UMAP) showed the separation of samples from different groups in reduced dimensions. The discriminant models were established by extreme gradient boosting (XGBoost) based on the features extracted by KPIC2. Results showed that features extracted by KPIC2 achieved 100% classification accuracy on the test sets of the liquor dataset and the LCC dataset, which demonstrated the rationality of the XGBoost model based on KPIC2 compared with the results of XCMS (92% and 96% for liquor and LCC datasets respectively). Finally, XGBoost can achieve better performance than the linear method and traditional nonlinear modeling methods on these datasets. UMAP and XGBoost are integrated into KPIC2 package to extend its performance in complex situations, which are not only able to effectively process nonlinear dataset but also can greatly improve the accuracy of data analysis in non-target metabolomics.
... In an OPLS-DA model, the variation from matrix X that is not correlated to Y is removed. Therefore, some authors consider OPLS-DA better than PLS-DA for interpreting the analysis, although both methods have the same predictive power (Trygg and Wold, 2002; Verron et al., 2004; Tapp and Kemsley, 2009). Some authors have reported the applicability of unsupervised (Zhang et al., 2015) and supervised (Pan et al., 2010) chemometrics to discriminate samples and to discover biomarkers from GC-MS data. ...
Article
Bioguided isolation to discriminate antimicrobial compounds from volatile oils is a time- and money-consuming process. Considering the limitations of the classical methods, it would be a great improvement to use chemometric techniques to identify putative biomarkers from volatile oils. For this purpose, antimicrobial assays of volatile oils extracted from different plant species were carried out against Streptococcus mutans. Eight volatile oils that showed different antimicrobial effects (inactive, weakly active, moderately active and very active) were selected in this work. The volatile oils’ composition was determined by GC–MS-based metabolomic analysis. Orthogonal projection to latent structures discriminant analysis and decision tree analysis were carried out to assess the metabolites that were highly correlated with a good antimicrobial activity. Initially, the GC–MS metabolomic data were pretreated by different methods such as centering, autoscaling, Pareto scaling, level scaling and power transformation. Level scaling was selected by orthogonal projection to latent structures discriminant analysis as the best pretreatment according to the validation results. Based on these data, a decision tree analysis was also carried out using the same pretreatment. Both techniques (orthogonal projection to latent structures discriminant analysis and decision tree) pointed to palmitic acid as a discriminant biomarker for the antimicrobial activity of the volatile oils against S. mutans. Additionally, orthogonal projection to latent structures discriminant analysis and decision tree predicted as “very active” the antimicrobial activity of volatile oils that did not belong to the training group. This predicted result is in agreement with our experimental result (MIC = 31.25 μg ml⁻¹). The present study can contribute to the development of useful strategies to help identify antimicrobial constituents of complex oils.
... This gives rise to much better interpretability, as the orthogonal variation is not accounted for in the prediction (Trygg et al. 2002). Nevertheless, it is well known that single-response OPLS and single-response PLS-r yield identical predictions (Verron et al. 2004, Indahl 2014). ...
Thesis
Medicinal plants constitute an unfailing source of compounds (natural products – NPs) utilised in medicine for the prevention and treatment of various diseases. The introduction of new technologies and methods in the field of natural products chemistry enabled the development of high throughput methodologies for the chemical composition determination of plant extracts, evaluation of their properties and the exploration of their potentials as drug candidates. Lately, metabolomics, an integrated approach incorporating the advantages of modern analytical technologies and the power of bioinformatics, has proven to be an efficient tool in systems biology. In particular, the application of metabolomics for the discovery of new bioactive compounds constitutes an emerging field in natural products chemistry. In this context, the Acronychia genus of the Rutaceae family was selected based on its well-known traditional use as an antimicrobial, antipyretic, antispasmodic and anti-inflammatory therapeutic agent. Modern chromatographic, spectrometric and spectroscopic methods were utilised for the exploration of their metabolite content following three basic axes constituting the three chapters of this thesis. Briefly, the first chapter describes the phytochemical investigation of Acronychia pedunculata, the identification of secondary metabolites contained in this species and evaluation of their biological properties. The second chapter refers to the development of analytical methods for the identification of acetophenones (chemotaxonomic markers of the genus) and to the dereplication strategies for the chemical characterisation of extracts by UHPLC-HRMSn. The third chapter focuses on the application of metabolomic methodologies (LC-MS & NMR) for comparative analysis (between different species, origins, organs), chemotaxonomic studies (between species) and compound-activity correlations.
... OPLS-DA supposes that only the variance related to the response is useful for modeling [167]. It gives better visualization and interpretation than PLS-DA [168] and has been widely applied in modeling and biomarker discovery in metabolomics [169–171]. ...
Article
This review focuses on recent and potential advances in chemometric methods in relation to data processing in metabolomics, especially for data generated from mass spectrometric techniques. Metabolomics is gradually being regarded a valuable and promising biotechnology rather than an ambitious advancement. Herein, we outline significant developments in metabolomics, especially in the combination with modern chemical analysis techniques, and dedicated statistical, and chemometric data analytical strategies. Advanced skills in the preprocessing of raw data, identification of metabolites, variable selection, and modeling are illustrated. We believe that insights from these developments will help narrow the gap between the original dataset and current biological knowledge. We also discuss the limitations and perspectives of extracting information from high-throughput datasets.
... By far, the most popular one is the OPLS-DA (orthogonal PLS-DA) method which usually generates models that are easier to interpret compared to the models generated by PLS-DA [53]. However, it should be mentioned that given the same data set PLS-DA and OPLS-DA only show differences in the interpretability of the model and not in the classification or predictive abilities [54]. From a mathematical point of view, the main difference is that PLS-DA separates the X variability in two parts: systematic and residual; while OPLS-DA separates the X variability in three parts: predictive (correlated to Y), orthogonal (uncorrelated to Y) and residual. ...
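The two-part versus three-part split of the X variability described in this excerpt can be written compactly. The notation below is an assumption chosen for illustration (T and P for scores and loadings, subscripts p and o for the predictive and orthogonal parts, E for residuals, Y mean-centred); the excerpt itself gives no formulas.

```latex
\begin{align}
\text{PLS-DA:}  \quad & X = T P^{\top} + E \\
\text{OPLS-DA:} \quad & X = T_{p} P_{p}^{\top} + T_{o} P_{o}^{\top} + E,
                  \qquad T_{o}^{\top} Y = 0
\end{align}
```

Here the condition on the orthogonal scores expresses that this part of X is uncorrelated with Y, while the predictive part carries the Y-related variation.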
Article
Lipids are a broad group of biomolecules involved in diverse critical biological roles such as cellular membrane structure, energy storage or cell signaling and homeostasis. Lipidomics is the -omics science that pursues the comprehensive characterization of lipids present in a biological sample. Different analytical strategies such as nuclear magnetic resonance or mass spectrometry with or without previous chromatographic separation are currently used to analyze the lipid composition of a sample. However, current analytical techniques provide a vast amount of data which complicates the interpretation of results without the use of advanced data analysis tools. The choice of the appropriate chemometric method is essential to extract valuable information from the crude data as well as to interpret the lipidomic results in the biological context studied. The present work summarizes the diverse methods of analysis that can be used to study lipidomic data, from statistical inference tests to more sophisticated multivariate analysis methods. In addition to the theoretical description of the methods, application of various methods to a particular lipidomic data set as well as literature examples are presented. Copyright © 2015 Elsevier B.V. All rights reserved.
... It is important to note that OPLS-DA gives prediction results similar to those of PLS-DA (Tapp and Kemsley, 2009). But OPLS-DA has better visualization and interpretation ability since fewer latent variables are required to explain the same variation of the data compared to PLS-DA (Verron et al., 2004). ...
Article
This review focuses on the recent and potential advances of currently available chemometric methods in relation to data processing in plant metabolomics, especially for the data generated from mass spectrometry (MS) techniques. Recently, plant metabolomics has gradually come to be regarded as a valuable and promising biotechnology rather than an ambitious advancement. We here outline some significant developments of plant metabolomics, especially in combination with modern chemical analysis techniques and dedicated statistical and chemometric data analysis strategies. The advanced skills in the preprocessing of raw data, identification of metabolites, variable selection and modeling are illustrated. We believe that the insights into these developments are helpful to narrow down the knowledge gap between the molecular organization and metabolism control of plants. We here also discuss the limitations and perspectives in extracting information from high-throughput datasets. Copyright © 2014. Published by Elsevier Inc.
... This is definitively not the case. OPLS modeling was shown to have the same predictive power as PLS modeling at an early stage in its development [58]. As long as different models capture roughly identical amounts of variation, their predictive power will be fairly similar. ...
... For multivariate calibration the main benefit is thus reduction of the number of PLS components rather than improved predictions. Ergon presented an alternative to OPLS using ordinary PLS and similarity transformation as a post-processing method (112). ...
Article
This review covers the area of multivariate calibration; from pre-processing of data prior to modeling and applications of regression methods for calibration and prediction. The importance of pre-treatment of data is highlighted with many of the recently developed methods together with traditional methods. Several articles provide comparisons between different pre-processing methods. Methods for data from coupled chromatographic methods, which have found increasing use and where data pre-processing is a prerequisite for multivariate modeling, are also included. Many of the novel chemometric methods deal with model complexity and interpretation. A diverse set of applications are also presented and references are also given to early papers, making it possible to acquire a deeper knowledge of methods of interest.
... The main drawback of this approach is that the calibration set does not necessarily contain all the variability due to the scatter effect. Moreover, the PLS algorithm is subject to the same constraint so OSC does not add any value to the calibration quality if it is based on a PLS (Verron et al., 2004). ...
Chapter
Soil carbon sequestration is one possible way of reducing greenhouse gas emissions in the atmosphere. However, to evaluate the real benefits offered by these methods (new agricultural practices, reforestation, etc.), there is a need for rapid, precise, and low-cost analytical tools. Near-infrared spectroscopy (NIRS) is now commonly used to measure different physical and chemical parameters of soils, including carbon content. However, prediction model accuracy is insufficient for NIRS to replace routine laboratory analysis and/or to make in situ measurements, whatever the type of soil. One of the biggest issues that need to be addressed concerns the calibration process: how does the mathematical method or the sample selection influence the model quality? In most cases, little thought is put into the choice of the mathematical method, which is often made empirically (test and try). It is therefore essential to return to fundamental laws governing spectrum formation in order to optimize calibration. Indeed, the light/matter interactions are at the basis of the resulting linear modeling. This chapter reviews and discusses the basic theoretical concepts underpinning NIRS and linear chemometric modeling in the specific context of soil: (i) light scattering due to soil particles causes departure from the assumed linear relationship between the spectrum and the carbon content, and (ii) the other classical linear regression assumptions (constant residual variance, normal error distribution, etc.) are also put into question. Regarding these specific issues, the different chemometric methods presented as possible solutions to build better calibration models are discussed, from linear methods associated with various preprocessing, local methods, or nonlinear methods.
... The predictive ability for PLS and OPLS is the same for the same number of PLS components, implying that OPLS models the same subspace as obtained by standard PLS regression. Verron et al. [9] showed that this was indeed the case. Ergon [10] showed that the same predictive score and loading vectors as in ...
Article
Target projection (TP), also called target rotation (TR), was introduced to facilitate interpretation of latent-variable regression models. Orthogonal partial least squares (OPLS) regression and PLS post-processing by similarity transform (PLS + ST) represent two alternative algorithms for the same purpose. In addition, OPLS and PLS + ST provide components to explain systematic variation in X orthogonal to the response. We show that, for the same number of components, OPLS and PLS + ST provide score and loading vectors for the predictive latent variable that are the same as for TP except for a scaling factor. Furthermore, we show how the TP approach can be extended to become a hybrid of latent-variable (LV) regression and exploratory LV analysis and thus embrace systematic variation in X unrelated to the response. Principal component analysis (PCA) of the residual variation after removal of the target component is here used to extract the orthogonal components, but X-tended TP (XTP) permits other criteria for decomposition of the residual variation. If PCA is used for decomposing the orthogonal variation in XTP, the variance of the major orthogonal components obtained for OPLS and XTP is observed to be almost the same, showing the close relationship between the methods. The XTP approach is tested and compared with OPLS for a three-component mixture analyzed by infrared spectroscopy and a multicomponent mixture measured by near infrared spectroscopy in a reactor. Copyright © 2008 John Wiley & Sons, Ltd.
Article
Oral administration of chitooligosaccharides (COS) has been reported to alleviate colitis in mice. However, the mechanism of action of COS with specific polymerization degree on gut inflammation and metabolism remains unclear. This study aimed to investigate the effects of chitobiose (COS2), chitotetraose (COS4), and chitohexaose (COS6) on colitis, and to elucidate their underlying mechanisms. COS2, COS4, and COS6 were able to significantly alleviate colonic injury and inflammation levels. COS6 has the best anti-inflammatory effect. Furthermore, COS6 could down-regulate the level of indoleamine-2,3-dioxygenase1 (IDO1) and restore the levels of indole, indoleacetic-3-acid (IAA), and indole-3-carbaldehyde (I3A) in the cecum of chronic colitis mice (p < 0.05), thereby regulating tryptophan metabolism. In the aromatic hydrocarbon receptor-IL-22 (AHR-IL-22) pathway, although there were differences between chronic colitis and acute colitis mice, COS intervention could restore the AHR-IL-22 pathway to normal, promote the expression of MUC2, and repair the intestinal mucosal barrier. In conclusion, the results of this study suggested that COS had a good inhibitory effect on IDO1 under inflammation and the changes of AHR and IL-22 levels at different stages of disease development. This provides new insights into the potential use of COS as a functional food for improving intestinal inflammation and metabolism.
Article
The rhizoma of Anemarrhenae asphodeloides has a long history of hypoglycemic use in Chinese traditional medicine. In this paper, normal INS-1 pancreatic beta cells were treated with 400 μmol/L H2O2 to establish an experimental model of oxidative damage. Quercetin was used as a positive control drug, and mangiferin and its ethanolic extract were selected as therapeutic agents in the oxidative damage model to evaluate the ameliorative effect of the active ingredients of Anemarrhenae asphodeloides rhizoma on oxidative damage in INS-1 pancreatic β-cells. A qualitative analysis method for the membrane phospholipids of INS-1 pancreatic beta cells was built, and 82 phospholipids were identified based on UPLC/Q-TOF MS technology, which could provide a database for further statistical analysis. OPLS-DA was used to screen the phospholipid biomarkers from the raw data. The biological significance of these biomarkers is explored, and the toxic effect of the effective components of Anemarrhenae asphodeloides rhizoma on oxidatively damaged INS-1 pancreatic beta cells is discussed.
Thesis
Metabolomics is the science designed to comprehensively study the metabolome, the repertoire of small molecule metabolites, which gives a comprehensive snapshot of the physiological state of the biofluid, extracts or cells studied. Measuring metabolites by using metabolomics is a key complement to genome, transcriptome and proteome studies, which may improve our understanding of how genetics, environment, the microbiome, disease, drug exposure, diet, and lifestyle influence the phenotype. One important application of metabolomics in clinical research is the discovery of novel biomarkers. The present PhD thesis focuses on biomarker discovery by applying metabolomics. The objectives were: (1) by using NMR and UPLC-HRMS based metabolomic and lipidomic profiling, to identify novel plasma biomarkers, if any, that characterize the different stages of non-alcoholic fatty liver disease (NAFLD), and (2) by combining UPLC-HRMS based untargeted metabolomics with an epidemiological approach, to identify plasma biomarkers associated with the risk of developing prostate cancer (PCa) within the following decade.
Article
Metabolomics is the science of studying small molecules (metabolites) in biological systems with the aim of gaining insight into cells, biofluids and organisms. Chemometric methods are powerful tools to address data problems generated in metabolomic studies and to extract valuable information. This review focuses mainly on a range of chemometric methods used for processing of metabolomics data generated from gas chromatography-mass spectrometry (GC-MS) and comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS). Herein, essential skills used for preprocessing of raw data, multivariate resolution, pattern recognition, variable selection and identification of metabolites are thoroughly discussed to tackle metabolomic problems. Additionally, different metabolite extraction methods for various biological samples are discussed prior to chemometric analysis. To present a clear picture, examples from the literature between 2010 and 2020 are used to illustrate each of the mentioned topics.
Article
Orthogonal Projection to Latent Structures (OPLS) is a preprocessing method that was presented as an improvement of the PLS algorithm from which it was derived. Nevertheless, according to the literature, its added value is questionable both for prediction and interpretation. To contribute to a better understanding, we investigated the relationship between OPLS and the Net Analyte Signal (NAS). For four numerical applications, the matrix obtained after the OPLS deflation tended towards a matrix of rank 1 when the number of removed dimensions increased. Therefore, the row-vectors of this matrix are collinear to the NAS, and so the usual one-latent-variable PLS1 regression following the OPLS preprocessing can be replaced by almost any regression method. Moreover, the interpretation relies on a vector of rank one derived from the deflated matrix, which does not bring more than the regular PLS regression vector.
Article
Background: Computational studies on 2-phenazinamines with their protein targets have been carried out to design compounds with potential anticancer activity and selectivity for the specific BCR-ABL tyrosine kinase. Methods: This has been achieved through G-QSAR and molecular docking studies. Computational chemistry was done using VLife MDS 4.3 and Autodock 4.2. 2D structures of ligands were drawn using Chemdraw 2D Ultra 8.0 and were converted into 3D. These were optimized using a semi-empirical method called MOPAC. The protein structure was downloaded as a PDB file from the RCSB Protein Data Bank. PYMOL was used for studying the binding interactions. The G-QSAR models generated were found to possess training (r2=0.8074), cross-validation (q2=0.6521), and external validation (pred_r2=0.5892) statistics which proved their statistical significance. Accordingly, the newly designed series of 2-phenazinamines, viz. 3-chloro-4-aryl-1-(phenazin-7-yl) azetidin-2-ones (4a-4e), were subjected to wet lab synthesis. Alternatively, docking studies were also conducted which showed binding interactions of some derivatives with > 30% higher binding energy values than the standard anticancer drug imatinib. The lower energy values obtained for these derivatives indicate energetically favorable interaction with the protein binding site as compared to standard imatinib. G-QSAR and molecular docking studies predicted better anticancer activity for the synthesized azetidine derivatives of 2-phenazinamines (4a-4e) as compared to the standard drug. Results and conclusion: It is therefore surmised that the molecular manipulations at appropriate sites of these derivatives suggested by structure activity relationship data will prove to be beneficial in raising anticancer potential.
Article
Effective methods often rely on simple mathematical operators. Among these operators, orthogonal projections have been widely used because of their simplicity in compensating for detrimental factors. This efficiency depends largely on the way these tools are prepared. This article links the mathematical basics of orthogonal projections to the notion of vectoral subspaces, highlighting which information should be removed in the process and the important practical properties concerned with optimizing this technique. This review covers several methods involving orthogonal projections and focuses specifically on their practical use. This concerns the identification of detrimental information and its removal together with adjusting the dimension of the projection. The methodology discussed in this review will enable the reader to optimize orthogonal projections for any given situation. The concept and importance of orthogonal projections are presented and situated within pretreatments and calibrations. The key points of orthogonal projections are noted: identifying the right information, then building a basis of the subspace to remove detrimental information.
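As an illustration of the idea summarized above (build a basis of the subspace carrying the detrimental information, then project it out), here is a minimal NumPy sketch. It is not taken from the cited review: the matrix `D` of detrimental directions, the helper name `orthogonal_projector`, and the random data are all hypothetical, and choosing `D` and its dimension is exactly the practical problem the review discusses.

```python
import numpy as np

def orthogonal_projector(D):
    """Return P = I - Q Q^T, where Q is an orthonormal basis of the columns of D.

    D is a hypothetical (n_vars x k) matrix whose columns span the
    subspace of detrimental information to be removed.
    """
    Q, _ = np.linalg.qr(D)  # reduced QR: Q has orthonormal columns spanning col(D)
    return np.eye(D.shape[0]) - Q @ Q.T

rng = np.random.default_rng(0)
n_samples, n_vars, k = 20, 50, 2
D = rng.normal(size=(n_vars, k))          # assumed detrimental directions
X = rng.normal(size=(n_samples, n_vars))  # synthetic "spectra", one per row

P = orthogonal_projector(D)
X_corr = X @ P  # each row of X with its component along col(D) removed
```

After the projection, every corrected row is orthogonal to each column of `D`, and applying `P` a second time changes nothing, which is the defining property of a projector.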
Article
Multivariate techniques based on projection methods such as Principal Component Analysis and Partial Least Squares (PLS) regression are widely applied in metabolomics. However, the effects of confounding factors and the presence of specific clusters in the data could force the projection to produce inefficient representations in the latent space, preventing the identification of the most relevant data variation. To overcome this issue, we introduce a general framework for projection methods, allowing an easy integration of orthogonal constraints, which help in reducing the effect of uninformative variations. In particular, the discussed algorithms address different scenarios. When known confounding factors can be explicitly encoded into a proper constraint matrix, orthogonally Constrained Principal Component Analysis (oCPCA) and orthogonally Constrained PLS2 (oCPLS2) can be used. Orthogonal PLS (OPLS) and post‐transformation of PLS2 (ptPLS2), instead, are suited to problems in which a constraint matrix cannot be defined. Finally, a data integration task is considered: Orthogonal two‐block PLS (O2PLS) and Orthogonal Wold's two-block Mode A PLS (OPLS‐W2A) are used to identify the common variation between two data sets.
Article
Foodomics is a newly developed discipline that has become more and more important in the last years where focus on food and the understanding of food systems has increased significantly. In this review, the flow of a typical foodomics study will be followed with a focus on the core components, where chemometric expertise is more deeply involved. These are: how to acquire sound data, how to exploit an experimental design, how to use classification in a proper way, how to look at more analytical platforms at the same time and, not the least, how to understand the limitations when interpreting the developed models. For each of these phases, the most common data issues will be highlighted and some of the most recent chemometric methods that are able to help solving them, will be presented.
Article
Partial Least Squares (PLS) is a wide class of regression methods aiming at modelling relationships between sets of observed variables by means of latent variables. Specifically, PLS2 was developed to correlate two blocks of data, the X-block representing the independent or explanatory variables and the Y-block representing the dependent or response variables. Lately, OPLS was introduced to further reduce model complexity by removing Y-orthogonal sources of variation from X in the latent space, thus improving data interpretation through the generated predictive latent variables. Nevertheless, relationships between PLS2 and OPLS in case of multiple Y-response have not yet been fully explored. With this perspective and taking inspiration from some basic mathematical properties of PLS2, we here present a novel and general approach consisting in a post-transformation of PLS2 (ptPLS2), which results in a decomposition of the latent space into orthogonal and predictive components, while preserving the same goodness of fit and predictive ability of PLS2. Additionally, we discuss the application of ptPLS2 approach to two metabolomic data sets extracted from earlier published studies and its advantages in model interpretation as compared with the ‘standard’ PLS approach.
Article
Model-based preprocessing using orthogonal signal correction (OSC) is a procedure with a well-defined objective: to remove variation that is linearly independent (orthogonal) of a given response matrix. The properties and applicability of OSC filtering are discussed. Recent extensions of OSC, including orthogonal projection to latent structures (OPLS) (comparable to projection to latent structures (PLS) with an integral OSC filter), O2PLS (bidirectional OPLS), and K-OPLS (nonlinear OPLS), are outlined and discussed.
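To make the "PLS with an integral OSC filter" description concrete, the following is a minimal sketch of removing one Y-orthogonal component for a single response, following the commonly described NIPALS-style O-PLS steps. The function name, the synthetic data, and the restriction to one component and one response are assumptions made for illustration; this is not the reference implementation of any package.

```python
import numpy as np

def opls_remove_one_component(X, y):
    """Remove one y-orthogonal component from column-centred X (single centred y)."""
    w = X.T @ y / (y @ y)             # PLS weight vector
    w /= np.linalg.norm(w)
    t = X @ w                         # predictive scores
    p = X.T @ t / (t @ t)             # predictive loadings
    w_o = p - (w @ p) * w             # part of p orthogonal to w (w has unit norm)
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o                     # orthogonal scores
    p_o = X.T @ t_o / (t_o @ t_o)     # orthogonal loadings
    X_filtered = X - np.outer(t_o, p_o)
    return X_filtered, t_o, p_o

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
y = X[:, 0] + 0.1 * rng.normal(size=30)
X = X - X.mean(axis=0)                # column-centre X
y = y - y.mean()                      # centre y

X_f, t_o, p_o = opls_remove_one_component(X, y)
```

Because `w` is proportional to `X.T @ y` and `w_o` is constructed orthogonal to `w`, the removed scores `t_o` are orthogonal to `y` up to rounding, which is the filtering property the abstract refers to.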
Article
A rapid and reliable method based on ultra-performance liquid chromatography coupled with electrospray ionization/quadrupole-time-of-flight mass spectrometry (U-HPLC/Q-TOF-MS) has been used for the first time to analyze the changes of plasma phospholipids in type 2 diabetes mellitus (T2DM) mice after administration of berberine and pomegranate seed oil (PSO). The separation of plasma phospholipids was carried out on an Acquity U-HPLC BEH C18 column (2.1 mm × 50 mm, 1.7 μm, Waters) by linear gradient elution using a mobile phase consisting of 10 mM ammonium formate in water and an acetonitrile:isopropanol (1:1, v/v) mixed solution with 0.25% water and 10 mM ammonium formate added. The method demonstrated good precision and reproducibility, and linear regression analysis showed good linearity. Potential biomarkers were discovered based on their mass spectra and chemometric methods. The results demonstrated that the proposed U-HPLC/Q-TOF-MS method was successfully applied to analyze the dynamic changes of phospholipid components in the plasma of T2DM mice after drug treatment and could provide a useful database meriting further study in humans and investigation of the pharmacological actions of drugs.
Article
The amount of information collected and analyzed in biochemical and bioanalytical research has exploded over the last few decades, due in large part to the increasing availability of analytical instrumentation that yields information-rich spectra. Datasets from Nuclear Magnetic Resonance (NMR), Mass Spectrometry (MS), infrared (IR) or Raman spectroscopy may easily carry tens to hundreds of thousands of potentially correlated variables observed from only a few samples, making the application of classical statistical methods inappropriate, if not impossible. Drawing useful biochemical conclusions from these unique sources of data requires the use of specialized multivariate data handling techniques. Unfortunately, proper implementation of many new multivariate algorithms requires domain knowledge in mathematics, statistics, digital signal processing, and software engineering in addition to analytical chemical and biochemical expertise. As a consequence, analysts using multivariate statistical methods were routinely required to chain together multiple commercial software packages and fashion small ad hoc software solutions to interpret a single dataset. This has been especially true in the field of NMR metabolomics, where no single software package, free or otherwise, was capable of completing all operations required to transform raw instrumental data into a set of validated, informative multivariate models. Therefore, while many powerful methods exist in published literature to statistically treat and model multivariate spectral data, few are readily available for immediate use by the community as a whole. This dissertation describes the development of an end-to-end software solution for the handling and multivariate statistical modeling of spectroscopic data, called MVAPACK, and a set of novel spectral data acquisition, processing and treatment algorithms whose creation was expedited by MVAPACK. 
A final foray into the potential existence of n-pi* interactions in proteins is also presented. Advisor: Robert Powers
Chapter
OnPLS was recently proposed as a general extension of O2PLS for applications in multiblock and path model analysis. OnPLS is very similar to O2PLS in the case with two matrices, but generalizes symmetrically to cases with more than two matrices without giving preference to any matrix. OnPLS extracts a minimal number of globally joint components that exhibit maximal covariance and correlation. A number of locally joint components are also extracted. These are shared between some matrices, but not between all. These components are also maximally covarying with maximal correlation. The variation that remains after the joint and locally joint variation has been extracted is unique to a particular matrix. This unique variation is orthogonal to all other matrices and captures phenomena specific to its matrix. The method's utility has been demonstrated by its application to synthetic datasets with very good results in terms of its ability to decompose the matrices. It has been shown that OnPLS affords a reduced number of globally joint components and increased intercorrelations of scores, and that it greatly facilitates interpretation of the models. Preliminary applications to real data have also given positive results. The results are similar to previous results using other multiblock and path model methods, but afford an increased interpretability because of the locally joint and unique components.
Chapter
Metabolomics, the characterization of low-molecular-weight compounds in a system, is arising as a powerful tool in food authentication. Metabolomic analyses, which can be performed either in a targeted or untargeted way, allow differentiation between classes and sometimes even identification of different compounds. Gas and liquid chromatography coupled to mass spectrometry and nuclear magnetic resonance are the analytical techniques commonly used in metabolomics. They generate highly complex signals that need to be deconvoluted to extract the maximum information possible. In this chapter, we describe the common workflow of a metabolomic analysis together with examples of approaches in the field of food origin authentication.
Conference Paper
Considering the time-varying nature of an industrial process, a soft sensor based on fast moving window algorithm was developed. The proposed approach adapted the parameters of the inferential model with the dissimilarities between the new and oldest data and incorporated them into the proposed kernel algorithm for the orthogonal projections to latent structures (OPLS). The computational loading of the model adaptation was therefore independent on the window size. Since the non-correlated systematic variation in the predictor variables is removed by the OPLS method, it reduces the complexity of the prediction model, furthermore, clarifying the correlation between the predictor and the response variables. A simulated example of a continuous stirred tank reactor (CSTR) with feedback control systems illustrated that the process characteristics captured by the OPLS could be adapted to accommodate a nonlinear process.
Article
Partial least squares (PLS) is a widely used algorithm in the field of chemometrics. In calibration studies, a PLS variant called orthogonal projection to latent structures (O-PLS) has been shown to successfully reduce the number of model components while maintaining good prediction accuracy, although no theoretical analysis exists demonstrating its applicability in this context. Using a discrete formulation of the linear mixture model known as Beer's law, we explicitly analyze O-PLS solution properties for calibration data. We find that, in the absence of noise and for large n, O-PLS solutions are simpler but just as accurate as PLS solutions for systems in which analyte and background concentrations are uncorrelated. However, the same is not true for the most general chemometric data in which correlations between the analyte and background concentrations are nonzero and pure profiles overlap. On the contrary, forcing the removal of orthogonal components may actually degrade interpretability of the model. This situation can also arise when the data are noisy and n is small, because O-PLS may identify and model the noise as orthogonal when it is statistically uncorrelated with the analytes. For the types of data arising from systems biology studies, in which the number of response variables may be much greater than the number of observations, we show that O-PLS is unlikely to discover orthogonal variation whether or not it exists. In this case, O-PLS and PLS solutions are the same. Copyright © 2011 John Wiley & Sons, Ltd.
Article
Predicting crop yield in-season over large areas before harvest is an important topic in agricultural decision-making. This study compares the performance of partial least squares regression (PLSR) for predicting rice (Oryza sativa L.) yield using different signal correction methods on canopy reflectance spectral data. These signal correction methods include the standard normal variate (SNV) transformation, multiplicative scatter correction (MSC), the orthogonal signal correction algorithm with leave-one-out cross-validation (OSCCV), and orthogonal projections to latent structures (O-PLS). Data were acquired over a wavelength range of 350–1100 nm. However, the influence of the intra-variance based on measured dates appeared in the original spectra. Using these pre-processing methods effectively reduced the influence of noise and increased the performance of the final PLSR model. Although SNV and MSC had good predictive ability, they could not clearly identify intra-variance effects. Conversely, the PLSR models with OSC and O-PLS were based on only one component, and could be interpreted in terms of crop parameters. Moreover, the Y-orthogonal component of O-PLS clearly identified intra-variance based on measured dates and provided superior modelling ability. The results of this study show that the O-PLS method is a useful tool for correction and interpretation when constructing a PLSR model for predicting rice yield in-season using canopy reflectance data.
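Of the signal correction methods compared above, the standard normal variate transformation is the simplest to write down: each spectrum is centred and scaled by its own mean and standard deviation. The sketch below assumes spectra stored one per row and uses the sample standard deviation (`ddof=1`), which is a common but not universal convention. The synthetic spectra are hypothetical and only illustrate that additive offsets and multiplicative scaling are removed.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each row (spectrum) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Two hypothetical spectra sharing one shape but differing in scale and offset
wavelengths = np.linspace(350.0, 1100.0, 76)
base = np.sin(wavelengths / 120.0)
spectra = np.vstack([2.0 * base + 1.0, 0.5 * base - 3.0])

Z = snv(spectra)
```

Both rows of `Z` coincide after the transformation, since they differ only by a positive scale factor and an offset. Unlike OSC or O-PLS, SNV never uses the response, which is consistent with the abstract's remark that it could not isolate the date-related variation.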
Article
To improve the accuracy in discriminating plant species or genotypes in the field with canopy spectral data, a number of statistical methods incorporating measurement techniques have been developed. This study analyzed canopy reflectance spectra collected at the booting stage by using partial least square regression in combination with discriminant analysis (PLS-DA) to establish a classification model for the discrimination of three mega rice cultivars. To improve the model's capability to interpret and sharpen the separation between cultivars, PLS-DA was combined with orthogonal projection to the latent structure (O-PLS) to derive the OPLS-DA models by removing noise and the Y-orthogonal variation. The ground-based high-resolution reflectance spectra (330–1030 nm) were acquired from paddy field experiments during the growing periods, and were recalculated at intervals of 10 nm. With the PLS-DA approach, the total accuracy for discriminating three cultivars in the calibration datasets was 90% and was above 80% for individual cultivars. In the validation datasets, a similar capability for cultivar discrimination was obtained for both pooled and individual cultivars. However, the Y-orthogonal variation might be embedded within the PLS-DA model. Using the OPLS-DA approach, the large variation within rice cultivars (the intra variation) was effectively removed to improve the performance of both group separation and model establishment. The overall accuracy reached 100% in the calibration datasets and had superior discrimination than the PLS-DA model in the validation datasets. Therefore, the OPLS-DA method is recommended for establishing a classification model for the cultivar discrimination of rice in the vegetative phase using remotely sensed canopy reflectance spectra.
Article
Soft independent modeling of class analogy (SIMCA) has been the standard statistical modeling Umetrics has released SIMCA-P+ Version 12, which is a multivariate data analysis (MVDA) software that performs cluster analysis and partial least squares (PLS) regression with classification trees, as well as recently developed modeling techniques such as orthogonal projections to latent structures. In this paper we briefly discuss the general capabilities of the SIMCA software package and focus on the PLS and OPLS statistical modeling techniques.
Article
This article concerns two chemometric modeling methods – the well-known partial least squares regression and the comparatively recently-devised orthogonal projections to latent structures (OPLS). We discuss their similarities and differences with a focus on the usage of OPLS in the analytical-chemistry literature.
Article
Partial least squares or projection to latent structures (PLS) has been used in multivariate statistical process monitoring similar to principal component analysis. Standard PLS often requires many components or latent variables (LVs), which contain variations orthogonal to Y and useless for predicting Y. Further, the X-residual of PLS usually has quite large variations, thus is not proper to monitor with the Q-statistic. To reduce false alarm and missing alarm rates of faults related to Y, a total projection to latent structures (T-PLS) algorithm is proposed in this article. The new structure divides the X-space into four parts instead of two parts in standard PLS. The properties of T-PLS are studied in detail, including its relationship to the orthogonal PLS. Further study shows the space decomposition on X-space induced by T-PLS. Fault detection policy is developed based on the T-PLS. Case studies on two simulation examples show the effectiveness of the T-PLS based fault detection methods. © 2009 American Institute of Chemical Engineers AIChE J, 2010
Article
In this paper a novel signal-preprocessing technique that combines the local and multiscale properties of the wavelet prism with the global filtering capability of orthogonal signal correction (OSC) is presented for the pretreatment of spectroscopic data. In this hybrid method, referred to as wavelet OSC (WOSC), a separate OSC filter is applied to each frequency component generated from a wavelet prism decomposition. The combination of wavelet prism and OSC is shown to be complementary, as the prediction results obtained in subsequent partial least squares (PLS) calibrations using WOSC are superior to those obtained using either the wavelet prism or OSC independently. The ability of WOSC to remove undesirable background effects and enhance subsequent PLS models is demonstrated with two near-infrared data sets. Copyright © 2005 John Wiley & Sons, Ltd.
Article
This work presents a new method for variable selection in complex spectral profiles. The method is validated by comparing samples from cerebrospinal fluid (CSF) with the same samples spiked with peptide and protein standards at different concentration levels. Partial least squares discriminant analysis (PLS-DA) attempts to separate two groups of samples by regressing on a y-vector consisting of zeros and ones in the PLS decomposition. In most cases, several PLS components are needed to optimize the discrimination between groups. This creates difficulties for the interpretation of the model. By using the y-vector as a target, it is possible to transform the PLS components to obtain a single predictive target-projected component analogously to the predictive component in orthogonal partial least squares discriminant analysis (OPLS-DA). By calculating the ratio between explained and residual variance of the spectral variables on the target-projected component, a selectivity ratio plot is obtained that can be used for variable selection. Used on whole mass spectral profiles of pure and spiked CSF, we can detect peptide in the low molecular mass range (740–9000 Da) at least down to 400 pM level without severe problems with false biomarker candidates. Similarly, we detect added proteins at least down to 2 nM level in the medium mass range (6000–17,500 Da). Target projection represents the optimal way to fit a latent variable decomposition to a known target, but the selectivity ratio plot can be used for OPLS as well as other methods that produce a single predictive component. Comparison with some commonly used tools for variable selection shows that the selectivity ratio plot has the best performance. This observation is attributed to the fact that target projection utilizes both the predictive ability (regression coefficients) and the explanatory ability (spectral variance/covariance matrix) for the calculation of the selectivity ratio.
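The selectivity ratio described above can be sketched in a few lines: project X onto the normalized regression vector to get a single target-projected component, then divide explained by residual variance for each variable. In the sketch below the regression vector comes from ordinary least squares purely as a stand-in for a PLS regression vector, and the data, the spiked variable index, and the function name are hypothetical.

```python
import numpy as np

def selectivity_ratio(X, b):
    """Explained / residual variance per variable on the target-projected component."""
    t = X @ b / np.linalg.norm(b)      # target-projected scores
    p = X.T @ t / (t @ t)              # target-projected loadings
    X_tp = np.outer(t, p)              # part of X explained by the single component
    explained = (X_tp ** 2).sum(axis=0)
    residual = ((X - X_tp) ** 2).sum(axis=0)
    return explained / residual

rng = np.random.default_rng(2)
y = np.repeat([0.0, 1.0], 20)          # two groups coded as zeros and ones
X = rng.normal(size=(40, 5))
X[:, 2] += 3.0 * y                     # only variable 2 carries group information

Xc = X - X.mean(axis=0)
yc = y - y.mean()
# Assumption: OLS coefficients stand in for the PLS regression vector
b, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

sr = selectivity_ratio(Xc, b)
```

In this toy setting the spiked variable receives by far the largest ratio. With real spectral profiles the number of variables usually exceeds the number of samples, so a PLS model rather than least squares would be needed to obtain `b`.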
Article
Several methods for orthogonal signal correction (OSC) based on pre-processing of the modeling data have been developed in recent years, and OPLS (orthogonal projections to latent structures) is a well known algorithm. The main result from these methods is a reduction in the number of final components in partial least squares (PLS) regression, while the predictions are virtually unchanged (identical for OPLS). This raises the question whether the same or similar results can be obtained in a more direct way using an ordinary PLS model as starting point, and as shown in the present paper this can indeed be done by use of a simple similarity transformation. This post-processing PLS+ST method is compared with OPLS, assuming a single response variable. The PLS+ST factorization of the data matrix X is just a similarity transformation of the non-orthogonalized PLS factorization, while OPLS is a similarity transformation of the orthogonalized PLS factorization. The predictions are therefore identical, but the residuals are somewhat different. A theoretically founded modification of the orthogonalized PLS factorization, and a corresponding modification of OPLS, leads to identical factorizations for all these methods, within similarity transformations. The PLS+ST vs. OPLS comparison also leads to an alternative post-processing method, using the ordinary PLS algorithm twice, with predetermined and permuted loading weights vectors in the second step. A limited comparison with post-processing using principal components of predictions (PCP) or canonical correlation analysis (CCA) is included.
Article
In this work, we consider a data array encoding interactions between two sets of observations respectively referred to as "subjects" and "objects". Besides, descriptions of subjects and objects are available through two variable sets. We propose a geometrically grounded exploratory technique to analyze the interactions using descriptions of subjects and objects: interactions are modelled using a hierarchy of subject-factors and object-factors built up from these descriptions. Our method bridges the gap between those of Chessel (RLQ analysis) and Martens (L-PLS), although it only has rank 1 components in common with them.
Article
In 1991, soil samples were taken from the long-term (40 years old) field trial at Ultuna in order to investigate soil P status and the distribution of its various forms. Among the treatments investigated, two were inorganic PK additions only – one to continuous fallow (PK-fallow) and the other to cropped fields (PK). There were also treatments amended with PK in combination with applications of straw, green manure composed of grass (GM), farmyard manure (FYM) or sewage sludge (SS). A total of 720, 720, 883, 1154, 1941 and 6617 kg P ha-1 had been supplied in the PK-fallow, PK, Straw, GM, FYM and SS treatments, respectively, up to 1991. The soil P distribution was determined by step-wise fractionation using anion exchange resin (resin-P), sodium bicarbonate (bicarb-P), sodium hydroxide (hyd-P), and HCl (HCl-P). Finally, the soil was digested to obtain residual P (resid-P). The amendments resulted in a significant (p=0.05) enrichment of total P in soils relative to the initial value. A breakdown of the bicarb-P and hyd-P into inorganic P (Pi) and organic P (Po) was manifested as considerable transformations within these P compartments compared with the initial values. Thus, total Pi (resin-P, bicarb-Pi, hyd-Pi, HCl-P, resid-P)/total Po (bicarb-Po, hyd-Po) ratios markedly decreased in all treatments relative to control. The two P compartments were significantly and negatively (p=0.05) correlated. On average, the total Po increase was about 380 mg kg-1 (range 270–715). The results suggested that an equilibrium between Pi immobilization and Po mineralization was difficult to attain under any of the experimental management regimes used, which exclude inorganic N application. The balance sheet calculations revealed P deficits ranging from about 10 to 60 kg ha-1, indicating that some P had migrated to the subsoil.
Article
In the present paper, the concept of orthogonal signal correction (OSC) as a spectral preprocessing method is discussed and a number of OSC algorithms that have appeared are compared from a theoretical viewpoint. Since all of these algorithms had some problems concerning the orthogonality towards Y, non-optimal amount of variance removed from X, or a non-attainable solution, a new direct OSC algorithm (DOSC) is introduced. DOSC was originally developed as a direct method solely based on least squares steps that had none of the problems mentioned above. The first practical results with the new method, however, were not encouraging due to the complete orthogonality constraint. If this orthogonality constraint is loosened, the method improves considerably and simplifies the calibration model for the prediction of Y.
Article
A multivariate method called direct orthogonalization is proposed for removing factors that describe irrelevant phenomena from data in calibration situations. The method is suggested for improving regression of data sets with systematic, but irrelevant, variations. The method is applied to FT-IR spectral data measured on dry pectin powder samples with the purpose of predicting the degree of esterification. Direct orthogonalization is compared with piecewise multiplicative scatter correction (PMSC) schemes and second order derivatives on the predictive performance of principal component regression (PCR) and partial least squares regression (PLSR) models. When applying direct orthogonalization to the FT-IR spectral data under investigation, the number of significant PLSR and PCR components was lowered significantly while facilitating a qualitative discussion of the scatter phenomena, and at the same time providing a means to identify outliers prior to prediction. In terms of root mean square error of prediction (RMSEP), the proposed method resulted in error measures at the same level as the applied PMSC schemes. Application of second order derivatives to the same data resulted in significantly poorer models.
Article
Orthogonal signal correction (OSC) is a technique for pre-processing of, for example, NIR-spectra before they are subjected to a multivariate calibration. With OSC the X-matrix is corrected by a subtraction of variation that is orthogonal to the calibration Y-matrix. This correction can then be applied to new spectra that are going to be used in predictions. The aim of this study is to investigate if the OSC transform makes the spectra less dependent on instrument variation. This may result in easier calibration model transfer between different instruments without creating or re-analysing the whole calibration sample set. OSC was applied to NIR-spectra that were used in a calibration for the water content in a pharmaceutical product. Partial Least Squares calibrations were then compared to other calibration models with uncorrected spectra, models with spectra subjected to multiplicative signal correction, and a number of other transfer methods. The performance of OSC was on the same level as for piece-wise direct standardisation and spectral offset correction for each individual instrument and PLS-models with both instruments included.
Article
The O2-PLS method is derived from the basic partial least squares projections to latent structures (PLS) prediction approach. The importance of the covariation matrix (YᵀX) is pointed out in relation to both the prediction model and the structured noise in both X and Y. Structured noise in X (or Y) is defined as the systematic variation of X (or Y) not linearly correlated with Y (or X). Examples in spectroscopy include baseline, drift and scatter effects. If structured noise is present in X, the existing latent variable regression (LVR) methods, e.g. PLS, will have weakened score–loading correspondence beyond the first component. This negatively affects the interpretation of model parameters such as scores and loadings. The O2-PLS method models and predicts both X and Y and has an integral orthogonal signal correction (OSC) filter that separates the structured noise in X and Y from their joint X–Y covariation used in the prediction model. This leads to a minimal number of predictive components with full score–loading correspondence and also an opportunity to interpret the structured noise. In both a real and a simulated example, O2-PLS and PLS gave very similar predictions of Y. However, the interpretation of the prediction models was clearly improved with O2-PLS, because structured noise was present. In the NIR example, O2-PLS revealed a strong water peak and baseline offset in the structured noise components. In the simulated example the O2-PLS plot of observed versus predicted Y-scores (u vs uhat) showed good predictions. The corresponding loading vectors provided good interpretation of the covarying analytes in X and Y. Copyright © 2003 John Wiley & Sons, Ltd.
Article
In this paper the O-PLS method [1] has been modified to further improve its interpretational functionality to give (a) estimates of the pure constituent profiles in X as well as model (b) the Y-orthogonal variation in X, (c) the X-orthogonal variation in Y and (d) the joint X–Y covariation. It is also predictive in both ways, X ↔ Y. We call this the O2-PLS approach. In earlier papers we discussed the improved interpretation using O-PLS compared to the partial least squares projections to latent structures (PLS) when systematic Y-orthogonal variation in X exists, i.e. when a PLS model has more components than the number of Y variables. In this paper we show how the parameters in the PLS model are affected and to what degree the interpretational ability of the PLS components changes with the amount of Y-orthogonal variation. In both real and synthetic examples, the O2-PLS method provided improved interpretation of the model and gave a good estimate of the pure constituent profiles, and the prediction ability was similar to the standard PLS model. The method is discussed from geometric and algebraic points of view, and a detailed description of this modified O2-PLS method is given and reviewed. Copyright © 2002 John Wiley & Sons, Ltd.
Article
Six different algorithms for orthogonal signal correction (OSC) are studied and compared both from an algorithmic point of view and from a prediction and analysis point of view. The algorithms have appeared under the names OSC (three alternative algorithms), direct orthogonalization (DO) and orthogonal projection to latent structures (OPLS). These algorithms can be divided into two groups. The first group has the ability to reduce the number of PLS components in the calibration models significantly by removing only one orthogonal component. The second group reduces the complexity of the calibration model by one PLS component for each orthogonal component removed. The methods are evaluated and compared using both simulated and real calibration data sets. In some cases the OSC algorithms can have quite different behaviors, such as when non-linearities are present. However, in all cases we have studied, none of the OSC algorithms provided a significant improvement in the calibration models over using PLS on the raw data. The main advantage with OSC may lie in the possibly easier interpretation and understanding from the analysis of corrected data. Analysis of the orthogonal information removed with OSC might also be beneficial. Copyright © 2002 John Wiley & Sons, Ltd.
Article
A generic preprocessing method for multivariate data, called orthogonal projections to latent structures (O-PLS), is described. O-PLS removes variation from X (descriptor variables) that is not correlated to Y (property variables, e.g. yield, cost or toxicity). In mathematical terms this is equivalent to removing systematic variation in X that is orthogonal to Y. In an earlier paper, Wold et al. (Chemometrics Intell. Lab. Syst. 1998; 44: 175–185) described orthogonal signal correction (OSC). In this paper a method with the same objective but with different means is described. The proposed O-PLS method analyzes the variation explained in each PLS component. The non-correlated systematic variation in X is removed, making interpretation of the resulting PLS model easier and with the additional benefit that the non-correlated variation itself can be analyzed further. As an example, near-infrared (NIR) reflectance spectra of wood chips were analyzed. Applying O-PLS resulted in reduced model complexity with preserved prediction ability, effective removal of non-correlated variation in X and, not least, improved interpretational ability of both correlated and non-correlated variation in the NIR spectra. Copyright © 2002 John Wiley & Sons, Ltd.
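For a single response variable, removing one Y-orthogonal component can be sketched as follows. This is a hedged illustration assuming mean-centred data and a NIPALS-style first PLS component; the function name `opls_filter_one_component` and all variable names are hypothetical and are not taken from the paper.

```python
import numpy as np

def opls_filter_one_component(X, y):
    """Hypothetical helper: remove one y-orthogonal component from X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    Xc = X - X.mean(axis=0)          # assumption: column mean-centring
    yc = y - y.mean()

    # First PLS weight, score and loading for a single y.
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    p = Xc.T @ t / (t @ t)

    # The part of the loading that is not along w carries the
    # variation in this component that is unrelated to y.
    w_orth = p - (w @ p) * w
    w_orth /= np.linalg.norm(w_orth)
    t_orth = Xc @ w_orth
    p_orth = Xc.T @ t_orth / (t_orth @ t_orth)

    # Filtered X: the orthogonal component is subtracted.
    X_filtered = Xc - np.outer(t_orth, p_orth)
    return X_filtered, t_orth, p_orth, w_orth
```

Since `w` is proportional to the covariance between X and y and `w_orth` is orthogonal to `w`, the score `t_orth` is orthogonal to the centred y, which is the property the filter relies on. A new centred spectrum `x_new` could be filtered in the same way as `x_new - (x_new @ w_orth) * p_orth`.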
Article
Soil C balances were calculated in a field experiment started in 1956. Treatments include a fallow and soils receiving different N fertilizers or organic amendments. By assuming the absence of a priming effect, the degree of mineralization of crop residues and organic amendments was calculated. Crop residue mineralization was not affected by a more than 50% decrease in the size of the microbial biomass in soil fertilized with (NH4)2SO4, which had caused the pH of this soil to drop from 6.6 to 4.4. More C had accumulated per unit C input in peat- and sewage sludge-amended soils than in any of the other soils, suggesting that peat and sewage sludge were more resistant to microbial attack. Recalcitrance of substrate C was an adequate explanation for the low ratio of biomass C to soil C in the peat-amended soils, but not in the sewage sludge-amended soil. There was a close linear relationship (r=0.94) between the content of microbial biomass C in the soil measured in 1990 and cumulative C losses from the soil since 1956. Compared to the relationship between soil biomass C and soil organic C concentrations, the linear relationship between microbial C and cumulative C losses suggested that the significantly reduced biomass in the sewage sludge-amended soil was at least partially due to the presence of toxic substances (presumably elevated heavy metal concentrations) in this soil and was probably not affected by the somewhat low pH (5.3) in this soil.
Article
A comparison is presented between orthogonal signal correction (OSC) and net analyte signal (NAS) calculations. It was shown that the latter can be used as a preprocessing method comparable to the former, before the application of partial least-squares (PLS) to the filtered data. When the number of factors used in the net analyte preprocessing (NAP) procedure increases, the subsequent application of PLS requires progressively fewer factors, a behavior comparable to OSC. If enough factors are extracted by either NAP or OSC methods, the remaining calibration problem is amenable to a classical least-squares solution, giving rise to two multivariate calibration methods named NAP/CLS and OSC/CLS. All methods are illustrated from cross-validation and external validation results for two experimental examples: (1) the determination of the antibiotic tetracycline in human serum, and (2) the quantitation of the nasal decongestant naphazoline in multicomponent pharmaceutical solutions.
Article
Near-infrared (NIR) spectra are often pre-processed in order to remove systematic noise such as base-line variation and multiplicative scatter effects. This is done by differentiating the spectra to first or second derivatives, by multiplicative signal correction (MSC), or by similar mathematical filtering methods. This pre-processing may, however, also remove information from the spectra regarding Y (the measured response variable in multivariate calibration applications). We here show how a variant of PLS can be used to achieve a signal correction that is as close to orthogonal as possible to a given Y-vector or Y-matrix. Thus, one ensures that the signal correction removes as little information as possible regarding Y. In the case when the number of X-variables (K) exceeds the number of observations (N), strict orthogonality is obtained. The approach is called orthogonal signal correction (OSC) and is here applied to four different data sets of multivariate calibration. The results are compared with those of traditional signal correction as well as with those of no pre-processing, and OSC is shown to give substantial improvements. Prediction sets of new data, not used in the model development, are used for the comparisons.
Article
A new algorithm for orthogonal signal correction is presented, compared with existing algorithms, and illustrated on an example from near infrared spectroscopy. Given a matrix X of spectral or other high dimensional data and a vector or matrix Y of concentrations or other reference measurements on the same samples, orthogonal signal correction subtracts from X factors that account for as much as possible of the variance in X and are orthogonal to Y. The aim is to improve the performance of a subsequent partial least squares (PLS) regression of Y on X.
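One way to obtain factors that are exactly orthogonal to Y while capturing as much X-variance as possible is to restrict the weight vectors to the complement of the columns of XᵀY before extracting principal directions. The sketch below illustrates that projection idea under the assumption of mean-centred blocks; the function name `projection_osc` is hypothetical, and the code should not be read as the exact algorithm of the paper.

```python
import numpy as np

def projection_osc(X, Y, n_components=1):
    """Hypothetical helper: subtract factors of X that are orthogonal to Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    Xc = X - X.mean(axis=0)          # assumption: column mean-centring
    Yc = Y - Y.mean(axis=0)

    # M projects weight vectors onto the complement of span(X^T Y),
    # so any score X @ (M @ v) has zero inner product with Y.
    A = Xc.T @ Yc
    M = np.eye(Xc.shape[1]) - A @ np.linalg.pinv(A.T @ A) @ A.T

    # Directions of largest variance within the constrained space.
    Z = Xc @ M
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = M @ Vt[:n_components].T

    # Scores, least-squares loadings, and the corrected matrix.
    T = Xc @ W
    B, *_ = np.linalg.lstsq(T, Xc, rcond=None)
    P = B.T
    X_corr = Xc - T @ P.T
    return X_corr, T, P, W
```

Because the removed scores are orthogonal to Y, the cross-product between Y and the corrected X equals that between Y and the original centred X, so the subtraction leaves the X-Y covariance untouched.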
Article
A novel signal-processing method that performs orthogonal signal correction (OSC) in a piecewise manner, namely piecewise OSC (POSC), is developed and applied to two near-infrared (NIR) data sets of multivariate calibration. Partial least squares (PLS) regression models were constructed for the POSC-corrected spectra, and the results were compared with those obtained by the Wise and Fearn OSC algorithms. It is shown that performing POSC prior to calibration yields regression models that are more parsimonious (fewer latent variables) and with better predictive power than models obtained by the above methods. The removal of orthogonal components from the response matrix is greatly facilitated simply by considering localized spectral features.
Article
The original chemometrics partial least squares (PLS) model with two blocks of variables (X and Y), linearly related to each other, has had several enhancements/extensions since the beginning of 1980. We here discuss multi-block and hierarchical PLS modeling for installing a priori knowledge of the data structure and simplifying the model interpretation, variable selection schemes for PLS with often similar objectives, nonlinear PLS, and prefiltered PLS, orthogonal signal correction (OSC). A very recent development, orthogonalized-PLS (O-PLS) is included as a way to accomplish both OSC, and a simpler interpretation of the PLS model. In this context, we also briefly mention time series, batch, and wavelets variants of PLS. These PLS extensions are illustrated by examples from peptide quantitative structure–activity relationships (QSAR) and multivariate characterization of pulp using NIR.
Article
In this article we discuss our experience designing and implementing a statistical computing language. In developing this new language, we sought to combine what we felt were useful features from two existing computer languages. We feel that the new language provides advantages in the areas of portability, computational efficiency, memory management, and scoping.
Goicoechea HC, Olivieri AC. A comparison of orthogonal signal correction and net analyte preprocessing methods. Chemometrics Intell. Lab. Syst. 2001; 56: 73–81.
Svensson O, Kourti T, MacGregor JF. An investigation of orthogonal signal correction algorithms and their characteristics. J. Chemometrics 2002; 16: 176–188.
Kirchmann H, Persson J, Carlgren K. The Ultuna long-term soil organic matter experiment, 1956–1991. Department of Soil Science Reports and Dissertation 17, Swedish University of Agricultural Sciences, Uppsala, 1994.
Witter E. Soil C balance in a long-term field experiment in relation to the size of the microbial biomass. Biol. Fertil. Soils 1996; 23: 33–37.
T. Verron, R. Sabatier and R. Joffre. Copyright © 2004 John Wiley & Sons, Ltd. J. Chemometrics 2004; 18: 62–68.