A schema presenting the sequential orthogonalized partial least squares (SO-PLS) regression method [26].


Source publication
Article
In recent years, multi-modal measurements of process and product properties have become widely popular. Sometimes classical chemometric methods such as principal component analysis (PCA) and partial least squares regression (PLS) are not adequate to analyze this kind of data. In recent years, several multi-block methods have emerged for this purpos...

Context in source publication

Context 1
... (SO-PLS) and parallel and orthogonalized-PLS (PO-PLS), were also proposed as extensions of standard PLS [52]. The SO-PLS approach involves a series of standard PLS regression and matrix orthogonalization operations that sequentially extract the complementary information from different data blocks; a generic schema of the algorithm is presented in Fig. 5. Because the extraction is sequential, blocks of data are incorporated one at a time and their incremental contribution is assessed. A PLS regression model is calculated between the first block X_1 and Y, yielding scores T_1. Then, all the remaining blocks X_2, …, X_k and Y are ...
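The sequential idea in this excerpt can be illustrated with a minimal sketch. It is not the cited authors' implementation. It assumes only two blocks, a single response, and one latent component per block, and it assumes that the second block is orthogonalized with respect to the first block's scores before a second PLS fit on the response residual, which is how SO-PLS is generally described. All function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def center_columns(X):
    """Subtract the column mean from every column of X (list of rows)."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def project_out(t, v):
    """Return v minus its least-squares projection onto the score vector t."""
    coef = dot(t, v) / dot(t, t)
    return [vi - coef * ti for vi, ti in zip(v, t)]


def pls1_scores(X, y):
    """One-component PLS1: weights proportional to X'y, scores t = Xw."""
    w = [dot(col, y) for col in zip(*X)]
    norm = dot(w, w) ** 0.5
    w = [wj / norm for wj in w]
    return [dot(row, w) for row in X]


def sopls_two_blocks(X1, X2, y):
    """Simplified SO-PLS sketch (assumed: 2 blocks, 1 component each)."""
    X1c, X2c = center_columns(X1), center_columns(X2)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]

    # Step 1: PLS between the first block and y gives scores t1.
    t1 = pls1_scores(X1c, yc)

    # Step 2: orthogonalize every column of block 2, and y, against t1.
    cols = (project_out(t1, list(c)) for c in zip(*X2c))
    X2_orth = [list(r) for r in zip(*cols)]
    r1 = project_out(t1, yc)

    # Step 3: PLS between the orthogonalized block 2 and the y residual.
    t2 = pls1_scores(X2_orth, r1)
    r2 = project_out(t2, r1)
    return {"t1": t1, "t2": t2,
            "res_after_block1": r1, "res_after_block2": r2}
```

Because the second block is orthogonalized before fitting, the drop in residual sum of squares between `res_after_block1` and `res_after_block2` can be read as the incremental contribution of the second block.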

Similar publications

Article
The Scientific Committee on Cosmetic and Non-Food Products has identified 26 compounds that may cause contact allergy in consumers when present in concentrations above certain legal thresholds in a product. Twenty-four of these compounds are volatiles and can be analyzed by gas chromatography-mass spectrometry (GC-MS) or electronic nose (e-nose) te...
Article
Technological development, together with advances in instrumentation, has brought chemists a new challenge: extracting information from large databases. In this context chemometrics emerged, the area of chemistry dedicated to the treatment of analytical data of multivariate origin. Because it is a relatively new area, teaching materials on...
Article
The suitability of UV–Vis spectrometry was evaluated for the classification of undiluted and diluted Slovak Tokaj wine samples according to style (essences, Tokaj selection, varietal and other wines), grade (quantity of cibebas), and variety (Furmint, Lipovina, and Muškát žltý) using principal component analysis (PCA), variable selection (VS), line...
Preprint
As goat milk has a higher economic value than cow milk, adulteration of goat milk with cow milk occurs in the market. In this article, the potential of Raman spectroscopy along with chemometrics was investigated for the authentication and quantitation of liquid goat milk adulterated with cow milk. First, the Raman spectra of goat mi...
Article
Halitosis is a highly distressing, socially unaesthetic condition, with a very high incidence amongst the adult population. It predominantly arises from excessive oral cavity volatile sulphur compound (VSC) concentrations, which have either oral or extra-oral etiologies (90–95% and 5–10% of cases, respectively). However, reports concerning age- and...

Citations

... In some cases, complex data are generated, which poses challenges in terms of interpretation and analysis. Traditional chemometric models, such as PLSR, are typically designed for data collected in a single sensor mode, where the spectral data are obtained from a single source or instrument (Mishra et al. 2021;Kandpal et al. 2022b;Karami et al. 2024), and developing separate models for each sensor requires additional time and effort, making it impractical to create a universal model for all sensors. Consequently, the process is more time-consuming than typical modeling approaches. ...
... SOPLS, as a supervised "multiblock" data analysis, is an extension of the PLSR model and has been developed to take advantage of multiple blocks of predictors (Naes et al. 2011;Mishra et al. 2021;Smilde et al. 2022). The SOPLS equation with N blocks of independent variables is as follows (Eq. ...
Article
Purpose: The cation exchange capacity (CEC) is a pivotal soil attribute that influences soil chemistry, fertility, and productivity. Nevertheless, the conventional techniques employed for CEC measurements present challenges in terms of complexity, cost, and laboriousness. Hence, there is a demand for expedited, cost-effective, streamlined alternative methodologies that can yield accurate outcomes. The objective of this study was to employ and compare various techniques, including Pedotransfer Functions (PTFs) based on fundamental soil properties, support vector regression (SVR), sequential orthogonalized partial least squares (SOPLS) as a multiblock data analysis method, and Spectrotransfer Function (SPTF) utilizing visible-near infrared (VNIR) and mid infrared (MIR) diagnostic wavelengths to estimate the CEC of calcareous soils with diverse land uses in the semi-arid region of Fars province, Iran.
Materials and methods: A total of 130 samples were gathered from the soils of the study region, CEC was measured using sodium acetate, and the spectral reflectance in the VNIR and transmission in the MIR regions were measured, and prediction models were created using linear support vector regression (L-SVR), radial basis function support vector regression (RBF-SVR), partial least squares regression (PLSR), and multiblock data analysis algorithms, after different spectral preprocessing methods.
Results and discussion: The results generally indicated that spectroscopy models performed better than PTFs in predicting CEC with the multiblock SOPLS showing the best results (R² = 0.92, RMSE = 1.67 cmol(+) kg⁻¹, and RPIQ = 4.34). The performance of the models followed the order: SOPLS > SPTF > L-SVR > RBF-SVR.
Conclusion: Our findings indicate that spectroscopy coupled with SOPLS analysis can be a robust, viable, fast, cheap, and efficient alternative assessment method with acceptable accuracy for estimating soil CEC in calcareous soils, instead of the difficult, costly, and cumbersome conventional measurement approaches or other estimation methods.
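For readers unfamiliar with the figures of merit quoted in this abstract, the sketch below shows one common way to compute R², RMSE and RPIQ (the interquartile range of the observed values divided by the RMSE). The abstract does not state how its values were computed, so the definitions here are assumptions: R² as one minus the ratio of residual to total sum of squares, RMSE with a divisor of n, and quartiles from the "inclusive" method of Python's `statistics.quantiles`. The function name is hypothetical.

```python
import statistics


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and RPIQ under the definitions assumed above."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    rmse = (ss_res / n) ** 0.5
    # Quartiles of the observed values; other quantile rules give
    # slightly different interquartile ranges on small samples.
    q1, _, q3 = statistics.quantiles(y_true, n=4, method="inclusive")
    return {"r2": 1 - ss_res / ss_tot,
            "rmse": rmse,
            "rpiq": (q3 - q1) / rmse}
```

A higher RPIQ means the prediction error is small relative to the spread of the reference values, which is why it is often reported alongside RMSE for soil spectroscopy models.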
... ComDim can be considered a generalization of principal component analysis (PCA) [12-15] for cases where multiple data matrices are collected to describe the same set of samples. It was first applied in sensory analysis and has gained some popularity in both sensorimetry and chemometrics [10,11,16-18]. ...
... Briefly [10,11,16-18], we consider a set of M data matrices X_m, m = 1, …, M, sharing the same number of samples n but possibly with a different number of variables. Each matrix is assumed to be mean-centered and further scaled to unit Frobenius norm, so as to make the variance of the different blocks comparable. ...
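The block pre-treatment described in this excerpt (column mean-centering followed by scaling to unit Frobenius norm) can be sketched in a few lines. This is an illustrative sketch only; the function name `scale_block` and the list-of-rows data layout are assumptions, not part of the cited work.

```python
def scale_block(X):
    """Mean-center the columns of X, then scale to unit Frobenius norm.

    X is a list of rows (samples) holding numeric values (variables).
    """
    means = [sum(col) / len(X) for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    # Frobenius norm: square root of the sum of all squared entries.
    fro = sum(v * v for row in Xc for v in row) ** 0.5
    return [[v / fro for v in row] for row in Xc]
```

After this step every block has the same total sum of squares regardless of how many variables it contains or what units they are in, so no block dominates a joint analysis purely because of its size or scale.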
Article
Direct catalytic methanol fuel cells (DCMFCs) have been studied for several years for energy conversion. Less extensive is the investigation of their analytical properties. In this paper, we demonstrate that the behavior of both the discharge and charge curves of DCMFCs depends on the chemical composition of the solution injected in the fuel cell. Their discharge and charge curves, analyzed using a chemometric data fusion method named ComDim, enable the identification of various types of aliphatic alcohols diluted in water. The results also show that the identification of alcohols can be obtained from the first portion of the discharge and charge curves. To this end, the curves have been described by a set of features related to the slope and intercept of the initial portion of the curves. The ComDim analysis of this set of features shows that the identification of alcohols can be obtained in a time that is about thirty times shorter than the time taken to achieve steady-state voltage.
... Chemometric software packages enable the combination of two or more analytical techniques (so that more accurate knowledge about a sample is used), giving classifications with a lower error rate and predictions with less uncertainty than the application of a single technique (Tarapoulouzi & Theocharis, 2019, 2022). The multifunctional analysis of orthogonal elements (multiblock orthogonal component analysis, MOCA) has recently become available and is used to perform data fusion (Borràs et al., 2015; Maléchaux et al., 2020; Mishra et al., 2021; Tarapoulouzi & Theocharis, 2022). ...
Chapter
The chapter first discusses the fundamentals of designation of origin and geographical indication (GI) labels for dairy agricultural products, and specifically cheese. The rationale behind the development of these labels and their prospective benefits are reviewed. The benefits and challenges of implementing cheese-related GI for consumers and for the local communities where these cheeses are made are then elaborated. A comparison of GI cheese production by large and small dairies highlights the challenges for small and artisanal dairies. Artificial intelligence is also presented as an innovative way of validating the compliance of a cheese with certain authenticity standards and assessing the accuracy of the application of the labels. Lastly, the evolution of these labels in the case of the Cypriot traditional Halloumi cheese is discussed.
... The central goal of the BS preprocessing approach is to equalize the effects between different blocks, which may have different scales and number of variables, through block scaling and block variance scaling. This helps to avoid any one block having a dominant influence on the modeling results (Mishra et al., 2021). Analyzing the spectrograms, it is observed that the spectra preprocessed using the BS method exhibit a heightened number of absorption peaks in comparison to spectra treated with alternative preprocessing techniques. ...
Article
Currently the determination of cyanidin 3-rutinoside content in plant petals usually requires chemical assays or high performance liquid chromatography (HPLC), which are time-consuming and laborious. In this study, we aimed to develop a low-cost, high-throughput method to predict cyanidin 3-rutinoside content, and developed a cyanidin 3-rutinoside prediction model using near-infrared (NIR) spectroscopy combined with partial least squares regression (PLSR). We collected spectral data from Michelia crassipes (Magnoliaceae) tepals and used five different preprocessing methods and four variable selection algorithms to calibrate the PLSR model to determine the best prediction model. The results showed that (1) the PLSR model built by combining the blockScale (BS) preprocessing method and the Significance multivariate correlation (sMC) algorithm performed the best; (2) The model has a reliable prediction ability, with a coefficient of determination (R²) of 0.72, a root mean square error (RMSE) of 1.04%, and a residual prediction deviation (RPD) of 2.06. The model can be effectively used to predict the cyanidin 3-rutinoside content of the perianth slices of M. crassipes, providing an efficient method for the rapid determination of cyanidin 3-rutinoside content.
... LV models can deal with significant and correlated variables (Trygg and Wold, 2003). Multi-block hierarchical PCA (HPCA) and hierarchical partial least squares (HPLS) are frequently used in chemometrics to deal with spectral information from different sensors and a batch of data (Mishra et al., 2021;Martins et al., 2023). Therefore, multi-block analysis can create a bi-directional reconstruction of whole tomato fruit from the skin, pulp, and seed spectral data. ...
... The literature offers several methodologies (Mishra et al., 2021) that could be used for spectral reconstruction. For instance, O2-PLS (Trygg and Wold, 2003) and OnPLS (Lofstedt et al., 2013) utilise the local and global joint variation in spectral data; bioheat models (Alzahrani and Abbas, 2019; Marin et al., 2021) could be adapted to predict the internal tomato tissues; and the adaptive neuro-fuzzy inference system (Abdullahi et al., 2021; Abdullahi et al., 2022) is a hybrid computational model that combines the adaptive capabilities of neural networks with the interpretability of a mathematical framework that deals with uncertainty and imprecision in decision-making. ...
Article
Introduction: Precision monitoring maturity in climacteric fruits like tomato is crucial for minimising losses within the food supply chain and enhancing pre- and post-harvest production and utilisation.
Objectives: This paper introduces an approach to analyse the precision maturation of tomato using hyperspectral tomography-like.
Methods: A novel bi-directional spectral reconstruction method is presented, leveraging visible to near-infrared (Vis-NIR) information gathered from tomato spectra and their internal tissues (skin, pulp, and seeds). The study, encompassing 118 tomatoes at various maturation stages, employs a multi-block hierarchical principal component analysis combined with partial least squares for bi-directional reconstruction. The approach involves predicting internal tissue spectra by decomposing the overall tomato spectral information, creating a superset with eight latent variables for each tissue. The reverse process also utilises eight latent variables for reconstructing skin, pulp, and seed spectral data.
Results: The reconstruction of the tomato spectra presents a mean absolute percentage error of 30.44 % and 5.37 %, 5.25 % and 6.42 % and Pearson’s correlation coefficient of 0.85, 0.98, 0.99 and 0.99 for the skin, pulp and seed, respectively. Quality parameters, including soluble solid content (%), chlorophyll (a.u.), lycopene (a.u.), and puncture force (N), were assessed and modelled with PLS with the original and reconstructed datasets, presenting a range of R² higher than 0.84 in the reconstructed dataset. An empirical demonstration of the tomato maturation in the internal tissues revealed the dynamic of the chlorophyll and lycopene in the different tissues during the maturation process.
Conclusion: The proposed approach for inner tomato tissue spectral inference is highly reliable, provides early indications and is easy to operate. This study highlights the potential of Vis-NIR devices in precision fruit maturation assessment, surpassing conventional labour-intensive techniques in cost-effectiveness and efficiency. The implications of this advancement extend to various agronomic and food chain applications, promising substantial improvements in monitoring and enhancing fruit quality.
... The complementary information about the structural elements in the material can help guide interpretation of the compositional data. Ideally, these different sources would be fused using methods such as multi-block PCA to better assess key variables contributing to the model (Mishra et al., 2021). Such analysis is, however, outside the scope of this paper and a potential direction for future work. The metamorphic nature of quartzite presents a particular challenge for spectroscopic sampling. ...
... A more robust geological resource model could be developed by fusing data from multiple types of spectroscopic instrumentation (e.g. multiblock PCA; Mishra et al., 2021). ...
... The disadvantage of using this model is that it may overfit the data if the number of predictor variables dramatically exceeds the sample size. This method is less robust than the Ridge and Lasso models [71,72]. Comparing performance by the R² of each model, Lasso regression is the most appropriate choice, followed by Ridge and PLS regression. ...
Article
Hoisting is an essential aspect of Industrial Building System (IBS) construction. Although research on hoisting safety in China has made strides to focus on “worker,” “data,” “task,” “site,” and “accident,” there is still a need for more approaches based on multi-dimensional social system thinking. Therefore, the paper aims to fill this gap. We investigated 105 hoisting accidents in China and found that hoisting accidents occurred most frequently in China's southeast coastal region; truck-mounted cranes and tower cranes were the most common types of machinery involved in accidents; hoisting load off, capsizing of crane machinery, and workers falling from height are the three most common accident types; the average impact of a single hoisting accident is approximately RMB 2.43 million direct economic loss, 1.543 deaths and 0.829 injured. This study used three algorithms (Ridge regression, Lasso regression, and partial least squares regression) to explore the impact of deaths and injuries on direct economic losses. By combining Rasmussen's risk framework with the characteristics of hoisting construction, six risk domains and thirty-six safety risk factors were identified. Finally, we used AcciMap technology to construct a qualitative IBS hoisting management model, which exhaustively presents the systematic levels and propagation paths of the influencing factors by the PDCA method. The research helps academics explore strategies to improve the safety of hoisting construction in IBS. Moreover, the study outcomes can inform the policy-making process towards promoting healthy and sustainable construction development.
... Wold, 2003). Other methods, such as multi-block hierarchical PCA (MHPCA) and hierarchical partial least squares (HPLS), can handle spectral information from various sensors and datasets (Mishra et al., 2021). The application of MHPCA analysis is hypothesised to be fitting for the bi-directional spectral reconstruction of the total grape (TO) using spectral data from the SK, PU, and SE, creating a technique called "tomography-like". ...
... This provides insight into the importance of each data block for specific biochemical differentiation and characterization. CCSWA also allows for multiple model extension possibilities such as in the supervised/predicting context (P-ComDim [52]), variable selection and path modelling (Path-ComDim [53]) making it a desired multi-block exploratory model over alternative methods, due to the more refined mathematical nature of the algorithm [54]. ...
Article
In an era where phytoplankton-based technologies are expanding, optimization for rapid cellular growth detection and detailed biochemical characterization is key to enhancing productivity. Hence, the complex analysis and development of highly sensitive and rapid in-vivo methods for precise real-time molecular monitoring of phytoplankton cells is imperative. Herein, a multi-block spectroscopic based methodology for the identification and characterization of molecular changes occurring across different growth phases in the chlorophyte species, Tetraselmis suecica, is presented. Confocal Raman microscopy with near-infrared (NIR) excitation spectra and concurrent excitation-emission fluorescence matrix (EEM) measurements were taken at different intervals across a twenty-day cell growth cycle period encompassing three distinct growth phases (exponential, stationary and decline). Three different data fusion strategies were explored: low-level, mid-level and a mixed level with subsequent multi-block model building using the Common Components and Specific Weights Analysis (CCSWA) algorithm. Various pre-processing sequences in regard to the raw data and single-block exploratory methods were evaluated for all three strategies and selected based on the optimum computed salience contribution towards the multi-block global components within each model. A detailed characterization of the biochemical changes happening within cellular growth phases of the chlorophyte species was constructed. Additionally, the study establishes a novel paradigm for data manipulation within a multi-block framework for complex spectroscopic data of biological cells, such as phytoplankton.
... To avoid the dominance of one of the two blocks on the entire model, a block variance scaling was applied. This consisted of scaling each block by the square root of the pooled variance of its variables (Mishra et al., 2021). The outcome was truly interesting: the regression and score plot shown in Fig. 4 showed a good fit between the theoretical and experimental regression line, as well as a fine separation of samples with different percentages of olive leaf adulteration. ...