Figure - available from: Frontiers in Chemistry
Steps in the data analysis methodology.

Source publication
Article
Full-text available
Olive oil assessment relies on a standardized sensory analysis performed according to the “panel test” method. However, there is considerable interest in designing novel strategies based on gas chromatography (GC) coupled to mass spectrometry (MS) or ion mobility spectrometry (IMS), together with chemometric data treatment, for olive o...

Similar publications

Article
Full-text available
The number of representative samples to build a calibration model plays a major role in the success of chemometric models for class discrimination; therefore, knowing which samples should be used for the calibration of prediction models is essential. The aim of this work is to design a basic guideline for the training of partial least squares discr...

Citations

... ML is seen as a part of artificial intelligence and one of its purposes is to extract knowledge from data. ML is also one of the most widely used techniques for knowledge discovery, and its use can be found in many areas of knowledge such as chemistry [11,12], computer vision [13][14][15] or data streaming [16][17][18]. Furthermore, we can find numerous studies that use ML to predict electricity consumption [19,20] or prices [21,22]. ...
Article
Full-text available
CO2 emissions play a crucial role in international politics. Countries enter into agreements to reduce the amount of pollution emitted into the atmosphere. Energy generation is one of the main contributors to pollution and is generally considered the main cause of climate change. Despite the interest in reducing emissions, few studies have focused on investigating energy pricing technologies. This article analyzes the technologies used to meet the demand for electricity from 2016 to 2021. The analysis is based on data provided by the Spanish Electricity System regulator, using statistical and clustering techniques. The objective is to establish the relationship between the level of pollution of electricity generation technologies and the hourly price and demand. Overall, the results suggest that there are two distinct periods with respect to the technologies used in the studied years, with a trend toward the use of cleaner technologies and a decrease in power generation using fossil fuels. It is also surprising that in the years 2016 to 2018, the most polluting technologies offered the cheapest prices.
... Moreover, Multilayer Perceptron (MP) models exploiting olive oil spectroscopy measurements have successfully been used to distinguish blends consisting of refined and extra virgin olive oil (Aroca-Santos et al., 2016). Yet, a feed forward ANN-based model, exploiting GC-IMS spectroscopy data, discriminated successfully extra virgin olive oil samples from virgin and lampante olive oils (Vega-Márquez et al., 2019) while classification and prediction models have been conducted for distinguishing extra virgin olive oil from other edible vegetable oils (Jiménez-Carvelo et al., 2017). ...
Article
Extra virgin olive oil traceability and authenticity are important quality indicators and are currently the subject of extensive research aimed at developing methods to address issues related to olive oil origin. The aim of this study was to develop a classification model capable of identifying olive cultivars from olive oil chemical composition. To this end, 385 samples of two Greek and three Italian olive cultivars were collected during two successive crop years from different locations along the coast of western Greece and southern Italy and analyzed for their chemical characteristics. Principal Component Analysis showed trends of differentiation among olive cultivars within or between the crop years. An artificial intelligence model based on the XGBoost machine learning algorithm showed high performance in classifying the five olive cultivars from the pooled samples.
... Many authors agreed on the controversies associated with the panel test, conducted by a human panel, especially in terms of efficiency and robustness, pointing out the need to set up a supporting instrumental tool for sensory evaluation (Aparicio-Ruiz et al., 2019; Barbieri, Bubola, et al., 2020; Conte et al., 2020). Since Mf and Md are olfactory perceived attributes, the methods aimed at sensory quality grading have generally been based on the analysis of volatile organic compounds (VOC) by different techniques, providing satisfactory results (Contreras et al., 2019; Quintanilla-Casas et al., 2020; Sales et al., 2019; Valli et al., 2020; Vega-Márquez et al., 2020). ...
Article
Full-text available
Unlike other food products, virgin olive oil must undergo an organoleptic assessment that is currently based on a trained human panel, which has drawbacks that might affect its efficiency and robustness. Therefore, having instrumental methods that could serve as screening tools to support sensory panels is of paramount importance. The present work aimed to explore excitation-emission fluorescence spectroscopy (EEFS) to predict bitterness and pungency, since both attributes are related to fluorophore compounds, such as polar phenols. Bitterness and pungency intensities of 250 samples were provided by an official sensory panel and used to build and compare partial least squares regressions (PLSR) with the excitation-emission matrix. Both PARAFAC scores and two-way unfolded data led to successful PLSR. The most relevant PARAFAC scores agreed with virgin olive oil phenolic spectra, showing that EEFS could serve as a fit-for-purpose screening tool to support the sensory panel.
... A fully connected feed-forward DNN architecture shown in Figure 7 was used for the classification process. In order to avoid network underfitting and overfitting [35], the following rule of thumb methods [35], [51] were considered in the design of hidden layers of a deep neural network classifier shown in Figure 7: ...
Article
Full-text available
Electricity theft is a global problem that negatively affects both utility companies and electricity users. It destabilizes the economic development of utility companies, causes electric hazards and impacts the high cost of energy for users. The development of smart grids plays an important role in electricity theft detection since they generate massive data that includes customer consumption data which, through machine learning and deep learning techniques, can be utilized to detect electricity theft. This paper introduces the theft detection method which uses comprehensive features in time and frequency domains in a deep neural network-based classification approach. We address dataset weaknesses such as missing data and class imbalance problems through data interpolation and synthetic data generation processes. We analyze and compare the contribution of features from both time and frequency domains, run experiments in combined and reduced feature space using principal component analysis and finally incorporate minimum redundancy maximum relevance scheme for validating the most important features. We improve the electricity theft detection performance by optimizing hyperparameters using a Bayesian optimizer and we employ an adaptive moment estimation optimizer to carry out experiments using different values of key parameters to determine the optimal settings that achieve the best accuracy. Lastly, we show the competitiveness of our method in comparison with other methods evaluated on the same dataset. On validation, we obtained 97% area under the curve (AUC), which is 1% higher than the best AUC in existing works, and 91.8% accuracy, which is the second-best on the benchmark.
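The abstract above mentions handling missing data through interpolation but does not say which scheme was used. As a minimal sketch only, assuming plain linear interpolation over a single consumption series, the idea could look like the following. The function name `interpolate_missing`, the use of `None` to mark a missing reading, and the sample values are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest known values.

    Gaps at the start or end of the series are filled with the nearest
    known value, since there is no second neighbour to interpolate towards.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")

    filled = list(series)

    # Leading and trailing gaps: copy the nearest known reading.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]

    # Interior gaps: draw a straight line between the two known neighbours.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)

    return filled


# Invented readings with three gaps (None marks a missing value).
readings = [2.0, None, None, 5.0, None, 7.0, None]
print(interpolate_missing(readings))
```

Linear interpolation is only one option; whether it suits a given meter dataset depends on how long the gaps are and how regular the consumption pattern is.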
... Transport, storage, and physical handling procedures of load units such as containers, swap-bodies, pallets, boxes, and so on are included in the Physical Internet, as is any other resource required for a freight transport and logistics operation [5]. [6]. Deep learning has received a lot of attention recently, and for good reason. ...
... However, techniques based on machine learning and data mining are becoming increasingly important in time series forecasting nowadays. These techniques are widely used in various application fields such as energy [6], stock market [21], health [4], pollution [20], natural disasters [2], agriculture [27], or energy resources [26], being some examples that can contribute to the fashioning of the future of humanity. ...
Chapter
Forecasting electricity demand is crucial for the management of smart grids to ensure a secure, reliable and sustainable supply. Recently, a variant of convolutional neural networks, called temporal convolutional networks, has emerged for sequence data, competing directly with deep recurrent neural networks in terms of execution time and memory requirements. In this work, we propose a deep temporal convolutional network to predict time series, namely, the electricity consumption with a 4-h forecast horizon. Results using nine and a half years of Spanish electricity load, with a 10-min sampling rate, are reported and discussed. In addition, the performance of the proposed model is compared with linear regression, decision trees, gradient boosted trees, random forests, deep feed forward neural networks that use different techniques to find the optimal hyper-parameters and a deep Long Short-Term Memory network. The proposed model reaches competitive results in terms of accuracy, with the smallest error verging on 1%.
... Other supervised methods used in NTS with HS-GC-IMS are gradient boosting (e.g., XGBoost) [31], decision tree classification (Tree) [91], logistic regression (Regressor) [91], orthogonal partial least-squares discriminant analysis (OPLS-DA), quadratic discriminant analysis (QDA) [30], or soft independent modeling of class analogy (SIMCA) [82]. Furthermore, nonlinear classifications are often performed using support vector machines (SVMs). ...
... PLS-DA (58.7%), kNN (k = 5, 60.8%) and SVM (51.8%), and XGBoost (81.8%), for the classification of Sauvignon Blanc via SHS-GC-IMS [31]. Vega-Márquez and coworkers evaluated a deep learning network and five different benchmark methods for the classification of olive oil samples into EVOO, OO, and LVOO, based on HS-GC-IMS spectra obtained for 701 olive oil samples from two different harvests [91]. Among the five benchmark models used for comparison, XGBoost offered the best accuracy of 85.7%, compared to SVM (83.1%), kNN (84.5%), ...
Article
Full-text available
Due to its high sensitivity and resolving power, gas chromatography-ion mobility spectrometry (GC-IMS) is a powerful technique for the separation and sensitive detection of volatile organic compounds. It is a robust and easy-to-handle technique, which has recently gained attention for non-targeted screening (NTS) approaches. In this article, the general working principles of GC-IMS are presented. Next, the workflow for NTS using GC-IMS is described, including data acquisition, data processing and model building, model interpretation and complementary data analysis. A detailed overview of recent studies for NTS using GC-IMS follows, with several examples that have demonstrated GC-IMS to be an effective technique for various classification and quantification tasks. Lastly, a comparison of targeted and non-targeted strategies using GC-IMS is provided, highlighting the potential of GC-IMS in combination with NTS.
... In contrast, Vega-Márquez et al. (2020) used DL to classify olive oil with regard to other quality parameters. For classifying olive oil, Vega-Márquez et al. applied DL approaches to process the MS data [77]. They used a set of 701 samples to discriminate three classes (extra virgin olive oil vs. virgin olive oil vs. lampante olive oil). ...
... Researchers performed binary classifications (e.g., lampante oil vs. non-lampante oil) as well as a ternary classification of the oils and used different numbers of hidden neurons. The binary models showed better results than the ternary model [77]. A comparison of the ternary classification with an ML application like PCA and OPLS-DA with an accuracy of 74.3%, and the DL approach with an accuracy of 81.4% highlighted that the novel technique was a useful tool in view of the classification of olive oil [77]. ...
... The binary models showed better results than the ternary model [77]. A comparison of the ternary classification with an ML application like PCA and OPLS-DA with an accuracy of 74.3%, and the DL approach with an accuracy of 81.4% highlighted that the novel technique was a useful tool in view of the classification of olive oil [77]. The success of the model presented by Vega-Márquez et al. (2020) also revealed some limitations. ...
Article
Full-text available
Deep learning is a trending field in bioinformatics. So far it has been known mostly for image processing and speech recognition, but it also shows promise for data processing in food analysis, especially foodomics, and deep learning approaches are increasingly used. This review presents an introduction to deep learning in the context of metabolomics and proteomics, focusing on the prediction of shelf-life, food authenticity, and food quality. Apart from the direct food-related applications, this review summarizes deep learning for peptide sequencing and its relevance to food analysis. The review further focuses on MS (mass spectrometry)-based approaches. As a result of the constant development and improvement of analytical devices, as well as more complex holistic research questions, especially with the diverse and complex matrix that food represents, there is a need for more effective methods for data processing. Deep learning might meet this need and offers the prospect of dealing with the vast amount and complexity of data.
... Sensory analysis by targeted and untargeted chromatography (GC), associated to ion mobility spectrometry (IMS), with chemometrics handling was applied for olive oil category classification (i.e. extra virgin olive oil, virgin, and lampante) [17,18]. Phenol and polyphenol profiles combined with chemometric analysis allowed classifying red wines according to their geographic origin, grape variety and vintage [19]. ...
Article
The characterization of Argan oils to classify them into three categories (‘Extra Virgin’, ‘Virgin’ and ‘Lower quality’) was evaluated. A total of 120 Moroccan Argan oil samples from the Taroudant Argan forest were investigated. The free acidity, peroxide value, spectrophotometric indices (K232 and K270), fatty acids, sterols, and tocopherol contents were assessed. The samples were also scanned by FTIR spectroscopy. Principal Component Analysis (PCA) and four classification methods, Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA), K-nearest Neighbors (KNN), and Support Vector Machines (SVM), were applied to both the chemical and spectral data. Besides the conventional chemical profiling, FTIR spectra were evaluated for their feasibility as a rapid non-invasive approach for classifying and predicting the oil quality categories. The most important variables for differentiating the oil categories were identified as K232, peroxide value, γ-tocopherol, δ-tocopherol, acidity, stigma-8-22-dien-3β-ol, stearic acid (C18:0) and linoleic acid (C18:2), which could be used as quality indicators. Eight chemical descriptors or key features from the FTIR spectra (selected by interval-PLS) could also be established as indicators of quality and freshness of Argan oils.
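Of the classifiers named in the abstract above, K-nearest Neighbors is the simplest to illustrate. The following is a minimal sketch of the general technique, not the study's implementation. The helper `knn_predict`, the two unnamed features, and every numeric value are invented for illustration, and the features are assumed to be already scaled to comparable ranges.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

# One training sample: (feature vector, class label).
Sample = Tuple[Sequence[float], str]


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training samples closest to query."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort by Euclidean distance to the query and keep the k closest samples.
    nearest = sorted(train, key=lambda s: dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented, pre-scaled two-feature samples for the three quality categories.
train = [
    ((1.0, 1.0), "Extra Virgin"),
    ((1.2, 0.9), "Extra Virgin"),
    ((2.0, 2.1), "Virgin"),
    ((2.2, 1.9), "Virgin"),
    ((3.1, 3.0), "Lower quality"),
    ((2.9, 3.2), "Lower quality"),
]

print(knn_predict(train, (1.1, 1.0), k=3))
print(knn_predict(train, (3.0, 3.0), k=3))
```

In practice the choice of k and the feature scaling both affect the result, so they are usually tuned by cross-validation rather than fixed in advance.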