Figure 8 - uploaded by Tyler McCandless
Diagram of the random forest machine learning method, which is an ensemble of regression trees.

Source publication
Article
Full-text available
Wind power is a variable generation resource and therefore requires accurate forecasts to enable integration into the electric grid. Generally, the wind speed is forecast for a wind plant and the forecasted wind speed is converted to power to provide an estimate of the expected generating capacity of the plant. The average wind speed forecast for t...

Contexts in source publication

Context 1
... random forest represents an ensemble of regression trees in which the final prediction is an average of the predictions from the individual trees. Figure 8 illustrates the structure of the random forest: each tree is given a subset of the available predictors and training data, and the final prediction is the average of the predictions from every tree in the forest. Regression trees exploit the predictive power of recursively dividing a dataset into smaller subsets, based on the relationships between the predictors and the predictand, until the subsets minimize the cost function (Witten and Frank, 2005). ...
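The ensemble structure described in this context can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins for wind-plant predictors, and `max_features` controls the random predictor subset each split considers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # synthetic predictors (e.g. NWP variables)
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

# Each tree is fit on a bootstrap sample of the training data and considers
# a random subset of the predictors (max_features) at each split.
forest = RandomForestRegressor(n_estimators=100, max_features=2, random_state=0)
forest.fit(X, y)

# The forest prediction is the average of the individual tree predictions.
x_new = X[:1]
tree_preds = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
print(tree_preds.mean(), forest.predict(x_new)[0])  # the two values agree
```

The final line makes the averaging explicit: querying every tree in `forest.estimators_` and taking the mean reproduces `forest.predict` exactly.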

Similar publications

Article
Full-text available
Product dimensional variability is a crucial factor in the quality control of complex multistage manufacturing processes, where undetected defects can easily be propagated downstream. The recent advances in information technologies and consequently the increased volume of data that has become readily available provide an excellent opportunity for t...

Citations

... Several supervised machine learning classification algorithms were chosen to obtain the best accuracy value [9], [10]: logistic regression (LR) [11], [12], naive Bayes (NB) [13], [14], random forest (RF) [15], [16], k-nearest neighbor (KNN) [17], [18], and support vector machine (SVM) [19], [20]. The five selected supervised classifiers occupy different positions along the complexity dimensions: parametric-simple for LR and NB, parametric-complex for SVM, non-parametric-simple for KNN, and non-parametric-complex for RF. ...
Article
Full-text available
Machine breakdowns in the production line mostly take more than 18 minutes to resolve, since machines needing longer repairs are serviced on the production line, not in the machine warehouse. Historical machine breakdown data are digitally recorded through the Andon system, but they are still not used adequately to aid decision-making. This research analyzes historical machine breakdown data to predict repair time intervals, with a focus on finding the algorithm with the best accuracy. The research method uses machine learning techniques with a classification model. Five algorithms are used: logistic regression (LR), naive Bayes (NB), k-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF). The results of this study show that historical machine breakdown data can be used to predict machine repair time intervals in the production line. The accuracy of the LR algorithm is slightly better than that of the other algorithms. Based on the receiver operating characteristic–area under curve (ROC-AUC) evaluation metric, the LR model achieves a satisfactory accuracy of 69%, with a difference of 0.5% between the train and test data.
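The evaluation described in this abstract — five standard classifiers compared by ROC-AUC on a held-out test split — can be sketched as follows. The breakdown records themselves are not available here, so a synthetic dataset stands in; everything else follows the stock scikit-learn API.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the labeled breakdown data (repair interval class).
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),       # enable predict_proba for ROC-AUC
    "RF": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # ROC-AUC is computed from the predicted probability of the positive class.
    results[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC-AUC = {results[name]:.3f}")
```

Comparing train- and test-set ROC-AUC, as the abstract does, is a simple check that the winning model is not merely overfitting.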
... RF diagram (McCandless & Haupt, 2019) ...
Article
Full-text available
Conventional machine learning models have been widely used for reservoir inflow and rainfall prediction. Researchers now focus on a newer computing architecture in AI, namely deep learning, for forecasting hydrological parameters. This review highlights the growing interest in reservoir inflow prediction using deep learning and machine learning algorithms. This thorough study explores the AI models utilized in different hydrology sectors, as well as the most prevalent machine learning techniques, dividing AI techniques into two primary categories: deep learning and machine learning. We examine the long short-term memory deep learning method as well as three traditional machine learning algorithms: support vector machine, random forest, and boosted regression tree. A summary of the findings is provided under each part. For ease of reference, some of the benefits and drawbacks identified in the literature are listed. Finally, future recommendations and overall conclusions based on the research findings are given. This review focuses on papers from high-impact-factor periodicals published over a four-year period beginning in 2018.
... Each tree in the forest is a set of rules, or decisions, used to minimize the variance or impurity of the response variable, which in this case was the CBH [25]. More details of the random forest machine learning algorithm can be found in [26,27]. ...
Article
Full-text available
Although cloud base height is a relevant variable for many applications, including aviation, it is not routinely monitored by current geostationary satellites. This is probably a consequence of the difficulty of providing reliable estimations of the cloud base height from visible and infrared radiances from current imagers. We hypothesize that existing algorithms suffer from the accumulation of errors from upstream retrievals necessary to estimate the cloud base height, and that this hampers higher predictability in the retrievals to be achieved. To test this hypothesis, we trained a statistical model based on the random forest algorithm to retrieve the cloud base height, using as predictors the radiances from Geostationary Operational Environmental Satellites (GOES-16) and variables from a numerical weather prediction model. The predictand data consisted of cloud base height observations recorded at meteorological aerodrome report (METAR) stations over an extended region covering the contiguous USA. Our results indicate the potential of the proposed methodology. In particular, the performance of the cloud base height retrievals appears to be superior to the state-of-the-science algorithms, which suffer from the accumulation of errors from upstream retrievals. We also find a direct relationship between the errors and the mean cloud base height predicted over the region, which allowed us to obtain estimations of both the cloud base height and its error.
... The final prediction for a continuous variable, such as cloud fraction, is the mean of the instances in the final leaf node reached by an instance that follows the rules of the branches down to that leaf. This is illustrated by the green decision nodes in Figure 2, which depicts a random forest model, described in further detail in [23]. In this illustration, the darker boxes indicate how an RF model makes a prediction for a given instance by following the set of rules in each tree and computing the ensemble average of the predictions from the trees in the forest. ...
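The leaf-mean behaviour described in this excerpt can be verified directly on a single regression tree: the prediction for a new instance equals the mean of the training targets that fall in the same leaf. The data below are synthetic (not the satellite radiances from the paper); `tree.apply` returns the leaf index an instance ends up in.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 3))                        # synthetic predictors
y = 10.0 * X[:, 0] + rng.normal(scale=0.5, size=200)  # continuous predictand

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# An instance follows the branch rules down to exactly one leaf ...
x_new = X[:1]
leaf_id = tree.apply(x_new)[0]

# ... and the prediction is the mean of the training targets in that leaf.
in_leaf = tree.apply(X) == leaf_id
print(y[in_leaf].mean(), tree.predict(x_new)[0])  # the two values agree
```

A random forest then simply averages this per-tree leaf mean across all trees, as the excerpt describes.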
Article
Full-text available
In order for numerical weather prediction (NWP) models to correctly predict the solar irradiance reaching the Earth's surface, and thereby enable more accurate solar power forecasting, it is important to initialize the NWP model with accurate cloud information. Knowing where the clouds are located is the first step. Using data from geostationary satellites is an attractive possibility given the low latencies and high spatio-temporal resolution available nowadays. Here, we explore the potential of the random forest machine learning method to generate a cloud mask from GOES-16 radiances. We first perform a predictor selection process to determine the optimal predictor set for the random forest predictions of the horizontal cloud fraction, and then determine the appropriate threshold to generate the cloud mask prediction. The results show that the random forest method performs as well as the GOES-16 level 2 clear sky mask product, with the ability to customize the threshold toward under- or over-predicting cloud cover. Further developments to enhance the cloud mask estimations for improved short-term solar irradiance and power forecasting with the MAD-WRF NWP model are discussed.
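The threshold step this abstract describes — converting a predicted cloud fraction in [0, 1] into a binary cloud mask — reduces to a simple comparison. The values below are illustrative, not GOES-16 retrievals; the point is that the threshold is a tunable knob.

```python
import numpy as np

# Hypothetical random-forest outputs: horizontal cloud fraction per pixel.
predicted_cloud_fraction = np.array([0.05, 0.2, 0.45, 0.6, 0.9])

def cloud_mask(fraction, threshold=0.5):
    """Binary mask: 1 = cloudy, 0 = clear. Raising the threshold biases the
    mask toward under-predicting cloud; lowering it toward over-predicting."""
    return (fraction >= threshold).astype(int)

print(cloud_mask(predicted_cloud_fraction, threshold=0.5))  # [0 0 0 1 1]
print(cloud_mask(predicted_cloud_fraction, threshold=0.3))  # [0 0 1 1 1]
```

In practice the threshold would be chosen by comparing the resulting mask against a reference product, trading false alarms against misses.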
Article
Optimization of wind energy integration requires knowing the relationship between weather patterns and the winds they cause. For a region with less-studied weather such as the Middle East, climatology becomes more vital. The Shagaya Renewable Energy Park in development in Kuwait experiences regional wind regimes that affect wind power production. Weather Research and Forecasting (WRF) model output allowed investigation into the weather regimes most likely to impact Shagaya Park. The self-organizing map (SOM) machine-learning method clustered the WRF output into six primary weather regimes experienced by the Middle East. According to the wind regimes mapped by the SOM, two of the six regimes have average wind speeds of approximately 9.9 and 8.6 m s−1 at 80 m near Shagaya Park, as well as wind speed and estimated wind power distributions that are more favorable to wind power production in Kuwait. One regime depicts a strong northwesterly wind called the summer shamal, and the second is associated with strong westerlies. Regimes less favorable for Kuwaiti wind power production are represented by the remaining four SOM nodes: local weak southeasterlies, an African nocturnal low-level jet, a daytime planetary boundary layer, and local northwesterlies from autumn to spring. These four nodes have average wind speeds of 5.7–7.2 m s−1 and wind speed and estimated wind power distributions that indicate regimes less favorable for wind power production in Kuwait.
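The SOM clustering this abstract applies to WRF output can be sketched in a few lines of NumPy. This is a minimal textbook SOM, not the authors' configuration: the data are synthetic stand-ins for flattened model fields, and a 2x3 grid mirrors the six regimes found in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 5))   # synthetic stand-in for WRF output vectors

rows, cols, dim = 2, 3, data.shape[1]           # 2x3 grid -> six regimes
weights = rng.normal(size=(rows * cols, dim))   # one prototype per node
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)

n_iter, sigma0, lr0 = 2000, 1.5, 0.5
for t in range(n_iter):
    x = data[rng.integers(len(data))]
    # Best matching unit: node whose prototype is closest to the sample.
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
    frac = t / n_iter
    sigma = sigma0 * (1 - frac) + 0.1   # shrinking neighbourhood radius
    lr = lr0 * (1 - frac) + 0.01        # decaying learning rate
    # Nodes near the BMU on the grid are pulled toward the sample.
    dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
    h = np.exp(-dist2 / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)

# Each sample is assigned to the regime (node) of its best matching unit.
labels = np.argmin(((data[:, None, :] - weights[None]) ** 2).sum(-1), axis=1)
print(np.bincount(labels, minlength=rows * cols))  # samples per regime
```

The grid-based neighbourhood update is what distinguishes a SOM from plain k-means: adjacent nodes learn similar prototypes, so neighbouring regimes on the map are physically similar.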
Article
Full-text available
In 2022, wind generation accounted for ~10% of total electricity generation in the United States. As wind energy accounts for a greater portion of total energy, understanding geographic and temporal variation in wind generation is key to many planning, operational, and research questions. However, in-situ observations of wind speed are expensive to make and rarely shared publicly. Meteorological models are commonly used to estimate wind speeds, but they vary in quality and are often challenging to access and interpret. The Plant-Level US multi-model WIND and generation (PLUSWIND) data repository helps to address these challenges. PLUSWIND provides wind speeds and estimated generation on an hourly basis at almost all wind plants across the contiguous United States from 2018 to 2021. The repository contains wind speeds and generation based on three different meteorological models: ERA5, MERRA2, and HRRR. Data are publicly accessible in simple csv files. Modeled generation is compared to regional and plant records, which highlights model biases and errors and how they differ by model, across regions, and across time frames.
Thesis
Full-text available
A variety of civil infrastructure assets, such as bridges, pipes and railways, form an integral part of modern societies. However, these structures are vulnerable to changes in environmental conditions and to physical or direct damage. These vulnerabilities have given rise to Structural Health Monitoring (SHM) systems, which are installed in civil infrastructure assets to monitor the health of structures through installed sensors. SHM is achieved by implementing techniques that identify, localise and assess damage to infrastructure assets. Structural elements in metallic infrastructure assets have been connected using rivet joints and bolts since the 1900s. The integrity of these connections is a crucial factor in the overall stiffness and strength of a structure; hence, it is beneficial to install a damage identification system that monitors the dynamic response within connections and detects any differences that may arise due to changes in connection characteristics. Previous studies investigated the dynamic response through modal properties, where modal damping is one of the least researched topics due to the mathematical complexity of obtaining the damping matrix and the limitations of traditional methods for obtaining the damping ratio in both the time and frequency domains. The Probability Distribution Decay Rate (PDDR) algorithm has been proposed, and it appears able to overcome the time-domain limitations by detecting changes in overall damping through changes in statistical parameters. However, the PDDR method has the following limitations: (1) it was tested only on sensors placed close to structural connections with loosened bolts; (2) it achieves only levels 1 and 3 of Rytter's damage classification (detection and quantification).
Several techniques, such as data fusion, damage localisation, supervised and unsupervised learning, and dimensionality reduction, were applied to the PDDR algorithm to fuse the distribution data, detect deviations in the physical condition of the structure, and localise damage in structures with bolted connections. A comparison between Kalman and Bayesian fusion methodologies, using single-storey frame and 4-storey steel frame datasets, showed an improvement in detection, localisation and classification over individual sensors.
Chapter
Judicial systems will soon no longer be able to avoid the process of modernization in the form of day-to-day use of algorithms. In the United States of America (USA), computer programs are already used consistently by judges to assess the risk of a defendant (re)offending. A similar program is currently being developed in Slovenia under the project name Detention v1.0. In this article, we argue that the potential use of this computerized experimental proof of concept in Slovenian detention procedures would not violate the defendant’s rights. Quite the opposite, the detention procedure can be fair for the defendant and fully transparent. This can be achieved by using supervised machine learning algorithms and meaningful human intervention during the detention procedure. Judges should therefore not be scared of being replaced by a computer program. They would remain the dominus litis of the detention procedure while the computer program would only be considered as a tool to provide objectified risk assessment results.