Received: 14 June 2023 Revised: 30 October 2023 Accepted: 22 November 2023 IET Renewable Power Generation
DOI: 10.1049/rpg2.12921
ORIGINAL RESEARCH
Random forest machine learning algorithm based seasonal
multi-step ahead short-term solar photovoltaic power output
forecasting
Sravankumar Jogunuri1,2 | Josh F.T1 | Albert Alexander Stonier3 | Geno Peter4
Jayakumar Jayaraj1 | Jaganathan S5 | Jency Joseph J6 | Vivekananda Ganji7
1Division of Electrical and Electronics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
2Department of Renewable Energy Engineering, College of Agricultural Engineering and Technology, Anand Agricultural University, Godhra, Gujarat, India
3School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
4CRISD, School of Engineering and Technology, University of Technology Sarawak, Sarawak, Malaysia
5Division of Electrical and Electronics Engineering, NGP Institute of Technology, Coimbatore, Tamil Nadu, India
6Department of Electrical and Electronics Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
7Department of Electrical and Computer Engineering, Debre Tabor University, Debre Tabor, Ethiopia
Correspondence
Geno Peter, CRISD, School of Engineering and
Technology, University of Technology Sarawak,
Sarawak, Malaysia.
Email: drgeno.peter@uts.edu.my
Vivekananda Ganji, Department of Electrical and
Computer Engineering, Debre Tabor University,
Debre Tabor, Ethiopia.
Email: vivekganji@dtu.edu.et
Abstract
To maintain grid stability, the energy produced by sources within the network must equal the energy consumed by customers. At present, achieving energy balance mainly involves regulating the electrical energy sources, as consumption is typically beyond the control of grid operators. Improving grid stability therefore requires accurate forecasting of the photovoltaic power output of large grid-connected solar photovoltaic plants. In the present study, to improve forecasting accuracy, onsite measurements of the weather parameters and the photovoltaic power output of a 20 kW on-grid plant were collected for a typical year covering all four seasons, and the random forest technique was evaluated together with other techniques such as deep neural networks, artificial neural networks, and support vector regression (the reference in this study). The simulation results show that the proposed random forest technique performs well for the 15 and 30 min forecasting horizons, with accuracy improvements of 49% and 50%, respectively, over the reference model for the study location (22.78°N, 73.65°E, College of Agricultural Engineering and Technology, Anand Agricultural University, Godhra, India).
1 INTRODUCTION
The utilization of photovoltaic (PV) power has the potential to
meet the rising global need for clean energy as it is a renewable,
environmentally-friendly, and adaptable source of distributed
energy [1].
Power grids typically include power plants that produce
steady streams of energy, such as coal, gas, and nuclear power
plants, as well as plants that generate unpredictable and variable
energy, such as wind and photovoltaic power plants, whose out-
put depends heavily on weather conditions in a given location
and time. To maintain grid stability, the energy levels produced
by sources within the network must be equal to the energy con-
sumed by customers. In current times, achieving energy balance
mainly involves regulating the electrical energy sources, as con-
sumption is typically beyond the control of grid operators [2, 3].
Forecasting solar power will play a crucial role in determining
the future of renewable energy plants and their integration with
grids on a large scale. The accuracy of predicting photovoltaic
power generation is heavily reliant on the constantly changing
weather conditions [4, 5].
Accurately forecasting solar power is critical in reducing
energy expenses and ensuring high-quality power in electrical
power grids that rely on distributed solar photovoltaic gener-
ation. For residential and small commercial users who utilize
on-site photovoltaic generation, obtaining historical irradiance
data directly can be difficult due to the high cost of solar irradi-
ance meters. However, weather forecasting services offered by
local meteorological organizations have improved and provide
information such as temperature, dew point, humidity, visibil-
ity, wind speed, and descriptive weather summaries through the
internet. Unfortunately, forecasting data for solar power is often
unavailable [6].
Solar power is one of the crucial sources of electricity grid,
and accurate information about the amount of solar power to
be generated from different sources and at various intervals—
minutes, hours, and days—is essential for its optimal utilization.
Forecasting solar power relies on two primary methods based
on the time horizon: statistical time series forecasting for short
to midterm intervals and numerical weather prediction (NWP)
for medium to long-term intervals [7].
Forecasting methods can be broadly categorized into three
groups: physical, statistical, and machine learning methods. In
the physical method, NWP models are used for long-term fore-
casting horizons of one to two days. The statistical method is
based on historical time data series, which is less complex than
the physical method. However, its prediction accuracy is limited
as it relies on the persistence or stochastic time series concept,
while irradiance time series characteristics are non-stationary
[6].
In recent years, the use of machine learning-based (ML) algo-
rithms has become a reliable alternative/complement to NWP
in solar energy prediction problems, given the significant rise
in their ability and accuracy to obtain reliable forecasts [8].
Machine learning is a branch of artificial intelligence that uses
datasets to build a non-linear mapping between input and out-
put data without explicit programming. While the literature has utilized statistically based machine learning forecasting methods, the most commonly seen methods are the support vector machine (SVM), artificial neural network (ANN), and deep learning neural network (DNN); only a few studies utilize random forest (RF) algorithms for solar photovoltaic power forecasting, and the forecasting accuracy of RF algorithms for different site-specific and seasonal data still needs considerable exploration.
In the present study, we developed a multi-step-ahead short-term (15-60 min ahead) solar PV power forecasting model for the selected site based on ensemble RF techniques and domain knowledge. These results were compared with the widely used statistical machine-learning-based algorithms, support vector machine/regression (SVM/SVR) and ANN/DNN. Selecting an appropriate forecasting model is a challenging job, and hence we meticulously compared and evaluated the models using different performance metrics for different seasonal intervals of the data and selected the best model for forecasting solar photovoltaic power from 15 min ahead to 60 min ahead.
The rest of this paper is divided into several sections. In Sec-
tion 2, the literature on machine learning statistical models such
as RF, SVR, and ANN/DNN for predicting short-term solar
photovoltaic power is presented. Section 3 includes information
on the dataset used in this study, data analysis, and the prediction
of solar photovoltaic power values for different time periods
(ranging from 15 to 60 min ahead) using measured weather
parameters data. Section 4 provides a brief introduction to RF, SVR, and ANN/DNN. Section 5 presents the results obtained from the study, and Section 6 concludes the paper.
2 LITERATURE REVIEW
2.1 Works reported on short term solar
photovoltaic power forecasting
Ref. [9] introduced two stochastic models for predicting solar
photovoltaic (PV) system behaviour. They provide short-term,
high-resolution probabilistic forecasts using historical data.
The first model uses uncertain basis functions with three
possible distributions. The second model uses stochastic state-
space models with a filter-based expectation-maximization and
Kalman filtering mechanism. These models are suitable for real-
time use in tertiary dispatch controllers and optimal power
controllers.
Ref. [10] proposed a new hybrid model for short-term solar
PV power forecasting in India using data collected in Kolkata
in 2014 for a year. The model combines GA with ANFIS and
was tested using actual data from a solar power plant in India.
Weekly forecasts were created with the model and its accuracy
was compared to existing models, showing better performance
in forecasting with a mean absolute percentage error (MAPE)
accuracy metric.
In the current situation, most distribution companies bid for
power every 15 min. To address this, [11] developed a short-term solar energy forecasting method with an intelligent approach based
on wavelet transform and generalized neural network (GNN) to
overcome the fluctuating and non-linear nature of solar energy.
Data on global solar irradiance, ambient temperature, relative
humidity, and wind speed were collected at 15-min intervals and
used as input. The proposed GNN model outperformed tradi-
tional models, as shown by statistical indicators like root mean
square error (RMSE) and mean absolute error (MAE).
Reference [12] introduced three deep learning models,
including a convolutional neural network, a long short-term
memory network, and a hybrid model, for photovoltaic power
prediction using data from a 4-year period with 5-min inter-
vals. The models were evaluated using different metrics and
the hybrid model performed the best, followed by the con-
volutional neural network. The study found that longer input
sequences improved the accuracy of the models, but not always.
The deep learning models presented demonstrate the potential
for improving photovoltaic power prediction accuracy.
Reference [13] proposed a precise deep neural network model, called PVPNet, for forecasting PV system output power. It uses meteorological
information and historical data to produce 24-h probabilistic
and deterministic forecasts. PVPNet outperforms other mod-
els with an MAE of 109.4845 and an RMSE of 163.1513 and
is effective in predicting complex time series with high volatility
and irregularity.
Reference [14] proposed a new model combining long-
short-term-memory (LSTM) and wavelet transform (WT) for
short-term solar power prediction in Flanders, Belgium. The
model uses meteorological factors as inputs and is evalu-
ated using statistical measures. Results show it outperforms
other contemporary machine learning and deep-learning-based
models.
Reference [15] proposed methods for forecasting day-ahead
power output time-series for solar power plants, with separate
approaches for ideal and non-ideal weather conditions. The
ideal weather conditions method uses LSTM networks, while
the non-ideal method considers time-series relevance and spe-
cific non-ideal weather characteristics, incorporating adjacent
day time-series, and uses discrete grey model (DGM) to improve
power output prediction. The data was collected over a time
period spanning from 1 November 2016 to 28 October 2017
and was recorded in 15-min intervals. The proposed model
was evaluated using data from a solar power plant in Shan-
dong province, China, and outperformed traditional algorithms
in terms of forecasting accuracy.
Reference [1] proposed a hybrid deep learning model for
predicting PV power 1 h ahead, using data collected at 5-min
intervals from 1 June 2014 to 31 May 2015. The model parti-
tions the PV power series using Wavelet Packet Decomposition
and uses four distinct LSTM networks to handle the sub-series,
with predictions combined using a linear weighting method.
Evaluation against actual data from Alice Springs, Australia
shows better performance than other models such as LSTM,
RNN, GRU, and MLP, according to MBE, MAPE, and RMSE
criteria.
Reference [16] introduced a method for predicting PV power
generation using LSTM neural network. Historical weather data
from Desoto solar farm and Arcadia in Florida were collected
from NREL for 2012–2018, divided into 4 seasons. A synthetic
weather forecast was created using historical solar irradiance
data and publicly available sky forecast data by K-means algo-
rithm. The proposed synthetic weather forecast was found to
improve the accuracy of PV power generation forecasting.
Reference [17] developed a WT-LSTM model for predict-
ing short-term solar power using wavelet transform and long
short-term memory networks. The model decomposes solar
energy time-series data into frequency series and uses LSTM
with dropout to predict future values using meteorological fac-
tors as input. The model outperformed other contemporary
models according to statistical performance measures. Data was
collected from February 2016 to October 2017.
Reference [18] compared deep learning neural networks
(DLNN) including LSTM, BiLSTM, GRU, BiGRU, CNN1D,
CNN1D-LSTM, and CNN1D-GRU for short-term output PV
power forecasting. The models were trained and tested on a
database of PV power generated by a micro grid at the Uni-
versity of Trieste in Italy, evaluated across four different time
periods for both one-step and multi-step forecasting. The study
found high accuracy, especially for one-step forecasting with
a 1 min time horizon, and acceptable results for multi-step
forecasting up to 8 steps ahead.
Reference [19] introduced a model to predict short-term
power output of residential solar panels using genetic algo-
rithms and support vector machines (SVM). The GASVM
technique was trained and verified using real-world data and
outperformed the conventional SVM model with a significant
margin, according to RMSE and MAPE metrics.
Reference [20] presented a new algorithm called hybrid
improved multi-verse optimizer (HIMVO) to optimize sup-
port vector machine (SVM) for predicting photovoltaic output.
The data collection period is from 8:00 AM to 6:00 PM each
day, with power output readings recorded every 10 min. The
HIMVO algorithm incorporates chaotic sequences and was
tested on authentic operational data from photovoltaic arrays
in Alice Springs, Australia. Results show that the HIMVO algo-
rithm is more stable and effective than other optimization
algorithms tested, with higher prediction accuracy and stabil-
ity for different weather types. The proposed method has the
potential to improve photovoltaic output prediction.
Reference [21] investigated an ultra-short-term PV model for
data preprocessing to improve the performance of support vec-
tor machine (SVM) models for predicting PV power. The data
from January 2018 to November 2019 was processed at 5-min
intervals and optimized using ant colony optimization (ACO).
The results showed that appropriate data preprocessing can
increase the model’s regression coefficient (R2) by 6.8%.
Reference [22] introduced a technique for solar energy
forecasting using machine and deep learning methods. The
proposed solution utilizes a single tool and suitable predic-
tive models and was assessed for real-time and short-term
prediction of solar energy. The data used in the study span
from 2016 to 2018 and relate to Errachidia, Morocco, and the
Pearson correlation coefficient was employed to determine rel-
evant meteorological inputs. The RF and ANN models showed
high accuracy, while LR and SVR models reported significant
errors. ANN performed well for both real-time and short-term
predictions.
Reference [23] investigated the relationship between input
parameters and power generated by solar PV panels using SVM
and GPR machine learning models. The input parameters stud-
ied were solar PV panel temperature, ambient temperature, solar
flux, time of the day, and relative humidity. The Matern 5/2
GPR algorithm was found to be the most effective, while the
cubic SVM had the poorest performance. The predicted results
were consistent with experimental values, indicating the suit-
ability of the proposed ML models for predicting the power of
various solar PV panels. The accuracy and efficiency of SVM
and GPR models were compared using the RMSE and MAE
criteria.
Reference [24] proposed a new model based on RF algorithm
for forecasting daily power generation at Zhonghe PV station in
North China. The model was found to be effective in reducing
overfitting and achieved lower mean absolute percentage errors
of 2.83% and 3.89% for clear and cloudy days, respectively.
However, the model’s forecasting errors were relatively high on
unusual weather days, and methods such as increasing train-
ing samples, subdividing, and manual intervention were found
to improve accuracy. The proposed model outperformed the
other three methods in most error evaluation indicators across
all categories.
Reference [25] created a framework to assess different mod-
els and techniques for solar power forecasting. Machine learning
methods, such as random forest, artificial neural network, and
extreme gradient boosting, were tested with feature selection
techniques, including feature importance and principal compo-
nent analysis. The optimal combination was found to be the
XGBoost method with features selected by PCA, which outper-
formed other methods. The framework can be utilized to select
the most suitable machine learning approaches for short-term
solar power forecasting.
Reference [26] developed a forecasting method called
RF-CEEMD-DIFPSO-BPNN for PV power generation fore-
casting, which combines several techniques such as ran-
dom forest (RF), improved grey ideal value approximation
(IGIVA), complementary ensemble empirical mode decompo-
sition (CEEMD), particle swarm optimization algorithm based
on dynamic inertia factor (DIFPSO), and backpropagation neu-
ral network (BPNN). The RF method is utilized to identify the
most significant factors, and the weight values obtained from
RF are transferred to the IGIVA model. Then, the CEEMD
method is applied to reduce the sequence’s fluctuations. The
hybrid model’s effectiveness is confirmed by an empirical analy-
sis, indicating that the RF-CEEMD-DIFPSO-BPNN approach
is a promising method for PV power generation forecasting.
Reference [27] developed a predictive model for PV power
generation using machine learning algorithms. The study was
conducted using data collected at Alice Springs in Australia as
a case study, and a variety of environmental factors were con-
sidered for short-term and long-term energy output prediction.
The study compared several machine learning algorithms and
found that random forest regression was the most effective for
the given dataset.
Reference [28] introduced an improved version of the ran-
dom forest model for data analysis. The approach optimized
bias/variance in STPF application by employing attribute selec-
tion methods, resulting in better forecasting quality. Local
Interpretable Model-Agnostic Explanations, Extreme Boosting
Model, and Elastic Net were used to create a feature-weighting
vector for weather inputs. The proposed approach outper-
formed various data-driven machine learning models when used
in a typical distributed PV system, using a real database from
weather sensors.
Three studies focus on optimizing solar energy forecast-
ing and microgrid operations [29-31]. The first study [29]
addresses the challenges of inaccurate solar power forecasts
and evaluates various machine learning models. It finds that
the Bi-LSTM model performs best, enhancing forecasting accu-
racy. The second study [30] proposes an optimal scheduling
model for microgrids that employs automated reinforcement
learning for load and renewable energy forecasts. The model
reduces operating costs and improves prediction accuracy. The
third study [31] concentrates on predicting solar PV power
generation in Lubbock, Texas, using machine learning models.
Random forest regression and Long Short-Term Memory mod-
els outperform others, capturing complex relationships in solar
power data, aiding in efficient planning and energy production.
These studies collectively contribute to enhancing the accu-
racy and efficiency of solar energy generation and microgrid
operations.
The proposed random forest technique in this work effectively models the non-linear relationships between weather parameters and photovoltaic (PV) power output. Proper data preprocessing, feature selection, and hyperparameter tuning are essential. This approach is robust owing to its non-linearity handling, resistance to overfitting, and feature-importance insights. The direct computation method of kernel functions enhances non-linear learning by improving efficiency, managing dimensionality, ensuring flexibility, and enhancing feature interpretability.
3 STUDY AREA DESCRIPTION, DATA
COLLECTION, AND PREPARATION
We collected the dataset for 12 months (October 2021 to
September 2022) from 20 kW on-grid solar power plant and
a local weather station installed at 22.78°N, 73.65°E, College of
Agricultural Engineering and Technology, Anand Agricultural
University, Godhra, India. The study site has four classic sea-
sons, autumn season from October to November, winter season
from December to February, summer season from March to
May and rainy season from June to September.
The proposed technique can provide information about
the importance of different weather parameters (e.g., temper-
ature, humidity, wind speed) in making predictions. This feature
importance analysis using correlation coefficient analysis using
heat maps helps identify which variables have the most signif-
icant impact on the outcome, allowing the model to adapt to
varying weather conditions. Accordingly, data was collected at
a 15-min time resolution, but only data between 7:00 AM and
5:00 PM was considered due to solar radiation availability. How-
ever, missing values occurred in the dataset due to power failures
affecting the data loggers. This resulted in a dataset of 10,086
samples for the study period, which is summarized in Table 1.
The weather variables (time of the day, Time (Hrs), ambient temperature, Temp (°C), relative humidity, Hum (%), solar radiation, Rad (W/m2), wind speed, Ws (m/s), and wind direction, Wd (°)) were recorded by one data logger, while the other data logger recorded the target variable (solar photovoltaic power, PVoutput (kW)). Enough care was taken in matching the data collected from the two data loggers with respect to the time of data collection.
Figure 1 shows the distribution of the weather variables.
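As a minimal sketch of this preparation step (the file names weather_logger.csv and pv_logger.csv, the common Timestamp column, and the use of pandas are illustrative assumptions; the actual logger export format is not described in the paper), the two logger records can be aligned in time and restricted to the 7:00 AM to 5:00 PM window as follows:

```python
import pandas as pd

# Hypothetical file names and a shared "Timestamp" column are assumed here.
weather = pd.read_csv("weather_logger.csv", parse_dates=["Timestamp"])
pv = pd.read_csv("pv_logger.csv", parse_dates=["Timestamp"])

# Align both loggers on the shared 15-min timestamps; an inner join drops
# records missing from either logger (e.g. during power failures).
data = pd.merge(weather, pv, on="Timestamp", how="inner")

# Keep only the daylight window considered in the study (07:00-17:00).
data = data.set_index("Timestamp").between_time("07:00", "17:00").reset_index()

print(data.describe())   # summary statistics comparable to Table 1
```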
It is crucial to take into account data availability and corre-
lation when choosing variables for a prediction task. Hence, a
statistical examination was performed to assess the correlation
TABLE 1 Description of the data collected for the study period.
Time (Hrs) Temp (°C) Hum (%) Wd (°) Ws (m/s) Rad (W/m2) PVoutput (kW)
Count 10,086 10,086 10,086 10,086 10,086 10,086 10,086
Mean 11.85 30.33 54.19 121.12 2.00 337.63 7.13
Std 2.95 5.61 27.57 73.75 2.02 198.85 4.27
Min 7.00 8.60 0.00 0.00 0.00 0.00 0.00
25% 9.30 27.50 32.00 70.00 0.00 170.00 3.35
50% 12.00 31.20 51.00 91.00 1.80 336.00 6.86
75% 14.30 33.70 76.00 182.00 2.70 503.00 10.79
Max 17.00 45.40 100.00 360.00 12.60 1013.00 20.80
*Study period (October 2021 to September 2022) at 15-min time resolution.
FIGURE 1 Distribution of weather parameters. (a) Ambient temperature (°C), (b) relative humidity (%), (c) solar radiation (W/m2) and (d) wind speed (m/s).
between each available weather variable and solar photovoltaic
power, as demonstrated in Figure 2. It displays the correla-
tion coefficients between all five weather parameters and time
of day (a total of six input variables X1 to X6) in relation to PVoutput (the output or target variable Yt) using the entire dataset.
A negative correlation was detected between temperature and
humidity. The data indicates a strong correlation between solar
radiation and PVoutput. Although a direct and significant relation-
ship between time, temperature, and PVoutput was not evident,
time does appear to have a substantial impact on temperature,
which is in turn correlated with solar radiation. Therefore, time
of day was included as an input variable in the study in addition
to the weather parameters.
In the present study, the entire one-year dataset was divided into seasonal data: for the autumn season, data from 1 October 2021 to 30 November 2021 were considered; for the winter season, from 1 December 2021 to 28 February 2022; for the summer season, from 1 March 2022 to 31 May 2022; and for the rainy season, from 1 June 2022 to 30 September 2022.
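The seasonal split and the per-season correlation analysis behind Figures 2 and 6 could be sketched as below; this assumes the merged DataFrame `data` from the earlier sketch with the column names of Table 1, and uses seaborn only for plotting the heat maps.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Season boundaries as defined in the study (end dates exclusive).
seasons = {
    "autumn": ("2021-10-01", "2021-12-01"),
    "winter": ("2021-12-01", "2022-03-01"),
    "summer": ("2022-03-01", "2022-06-01"),
    "rainy":  ("2022-06-01", "2022-10-01"),
}
feature_cols = ["Time", "Temp", "Hum", "Wd", "Ws", "Rad", "PVoutput"]  # assumed names

for name, (start, end) in seasons.items():
    mask = (data["Timestamp"] >= start) & (data["Timestamp"] < end)
    corr = data.loc[mask, feature_cols].corr()      # Pearson correlation matrix
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title(f"Correlation heat map - {name}")
    plt.show()
```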
4 MATERIAL AND METHODS
Researchers have put great effort into the modelling and forecasting of PV power. Various forecasting techniques have been proposed for forecasting PV power at various time horizons; most importantly, short-term PV power forecasting is essential for controlling, dispatching, and scheduling power [32].
FIGURE 2 Heatmap of correlation coefficients between the input variables and the target variable, solar photovoltaic power output.
Machine learning (ML) involves training a computer system
to gain expertise by processing and analysing data collected over
time, aiming to improve its performance over time [33].
4.1 Random forest (RF)
RF is one of the most widely used machine learning algorithms
due to its simplicity. It can be used for both regression and classification. It belongs to the class of supervised learning algorithms, which also includes SVMs, the naive Bayes algorithm, and other tree-based algorithms such as AdaBoost [34]. It was first developed and proposed by Breiman et al. [38] at the University of California in 2001. Random forest regression is an
ensemble learning technique that integrates predictions from
various machine learning algorithms to produce more precise
predictions than a single model [27]. The proposed random for-
est technique does not require extensive data preprocessing or
imputation of missing values prior to training. This contrasts
with some other machine learning algorithms that may require
a complete dataset or explicit imputation strategies. Random
forest can work with missing data “out of the box”. When ran-
dom forest encounters missing data in the training dataset, it
can handle it by imputing or filling in missing values. Random
forest assesses the importance of features during the training
process. If a feature with missing data is not highly relevant
for the prediction task, the model may naturally give it less
weight, effectively downplaying its contribution to the model’s
predictions. The method constructs trees separately by utiliz-
ing bootstrap data samples, resulting in a forest that includes a
considerable number of decision trees. The accuracy of the fore-
cast increases as more trees are included, leading to improved
precision [35]. Figure 3 demonstrates the configuration of the
random forest model.
To execute random forest regression on the training data set,
the subsequent actions must be taken:
FIGURE 3 Random forest model.
1. Initially, a selection of k data points is made from the input (training) dataset, denoted by x.
2. A decision tree is created that corresponds to these k data points.
3. The first and second steps are reiterated until N decision trees are generated during the training phase.
4. When presented with a new data point, each of the generated trees produces a prediction value y. The data point is then attributed to the average of all predicted y values (a minimal code sketch of this procedure is given after this list).
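A minimal sketch of these steps with scikit-learn's RandomForestRegressor is given below; the column names, the chronological 80/20 split, and the hyperparameter values are illustrative assumptions rather than the exact configuration used in the study.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Assumed column names (Table 1); "data" is the merged DataFrame of Section 3.
feature_cols = ["Time", "Temp", "Hum", "Wd", "Ws", "Rad"]
X, y = data[feature_cols].to_numpy(), data["PVoutput"].to_numpy()

# Chronological 80/20 train/test split (an assumption made for illustration).
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# N bootstrapped decision trees; a test prediction is the average over the trees.
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print("RMSE (kW):", mean_squared_error(y_test, y_pred) ** 0.5)
print("Feature importances:", dict(zip(feature_cols, rf.feature_importances_)))
```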
Random forest regression performs well on diversified problems owing to its ability to handle non-linear relationships.
Recently, RF has gained greater attention among researchers
in the field of PV power forecasting due to its advantages in
ensemble learning and superior performance compared to other
statistical-based machine learning algorithms. Random forests
offer high accuracy, robustness, and versatility in handling
diverse data types. They mitigate overfitting through ensem-
ble learning and serve as a dependable baseline for forecasting.
In contrast, models like ARIMA, Exponential Smoothing, and
Neural Networks’ accuracy relies on data characteristics and
hyperparameter choices. Random forest can be computation-
ally intensive, especially with many trees and features, but it
benefits from parallel processing. Other models like ARIMA
and Exponential Smoothing are generally efficient for univari-
ate time series. Deep learning models like RNNs and LSTMs are
computationally intensive to train. In this study, RF techniques
were used to forecast PV power output from a grid-connected
PV plant. Random forest is robust to outliers due to its ensem-
ble nature, which combines predictions from multiple trees.
While outliers may affect individual tree decisions, their impact
is minimized when aggregated in the ensemble’s majority vote
or average.
To assess the effectiveness of RF, other commonly used mod-
els such as SVR and ANN/DNN were also evaluated and
compared with the RF technique. SVR technique is used as
a reference model for evaluating the skill score of the model
proposed for the selected site.
4.2 Support vector machines (SVM)
The support vector machine (SVM) is an advanced machine-
learning technique that was first introduced by Vapnik and is
known for its high performance [36]. An SVM is a supervised
learning algorithm used for classification and regression analy-
sis [37]. It works by finding the hyperplane that best separates
the data points into different classes. In simple terms, an SVM
tries to find the best line (in two dimensions) or hyperplane (in
multiple dimensions) that separates the data points belonging to
different classes.
The SVM algorithm works by transforming the input data
into a higher-dimensional space using a kernel function. Then,
it finds the hyperplane that maximally separates the transformed
data points into different classes. The points closest to the
hyperplane are called support vectors and are used to define the
hyperplane.
Support vector regression (SVR) and support vector classifi-
cation (SVC) are two types of supervised learning algorithms
based on SVMs, but they are used for different types of
problems.
The main difference between SVR and SVC is that SVR is
used for regression problems, where the goal is to predict a
continuous output variable, while SVC is used for classifica-
tion problems, where the goal is to predict a categorical output
variable.
In SVR, the goal is to find a hyperplane that best fits the
training data points while minimizing the error between the pre-
dicted and actual values. The hyperplane is defined by a set of
support vectors, and the distance between the hyperplane and
the closest data points from each class is called the margin. The
margin is used to control the trade-off between model complex-
ity and generalization performance. SVR uses a loss function
that penalizes errors more for points that are further away from
the hyperplane, resulting in a model that is less sensitive to
outliers.
In SVC, the goal is to find a hyperplane that best separates
the training data points into different classes. The hyperplane is
defined by a set of support vectors, and the distance between the
hyperplane and the closest data points from each class is called
the margin. The margin is used to control the trade-off between
model complexity and generalization performance. SVC uses a
loss function that penalizes errors more for misclassified points,
resulting in a model that is more sensitive to misclassification.
Both algorithms use support vectors and margins to control
the trade-off between model complexity and generalization per-
formance, but they use different loss functions that penalize
errors differently.
Unlike the commonly used empirical risk minimization
(ERM) approach in statistical learning methods, SVMs utilize
the structural risk minimization (SRM) concept to mitigate an
upper bound on the generalization error. This allows SVMs to
have a greater potential to generalize, as opposed to simply min-
imizing the error in the training data. In addition, SVMs are
more likely to find a global optimum solution rather than get-
ting stuck in a local optimal solution like classical neural network
models. SVMs can be used for both classification and regression
problems [38].
4.3 Feature space and kernel functions
The core operating principle of SVMs is to map data onto a
feature space using non-linear mapping and then apply a lin-
ear algorithm. Because the feature space requires dot product
evaluation, it is often high-dimensional and resource-intensive,
requiring significant computational power and time. However,
in some cases, a simpler kernel may be developed and eval-
uated for its effectiveness. In real-world scenarios, complex
problems require more advanced hypothesis spaces than those
provided by linear learning machines, which are limited by their
computational capabilities.
The given attributes cannot be expressed as a basic linear
combination in the target data. Linear learning machines have
a useful characteristic, which is the ability to be represented in a
dual form. This means that the hypothesis can be expressed as a
linear combination of the training points, allowing the decision
rule to be evaluated solely based on the inner products between
the test point and the training points. If it is possible to directly
calculate the inner product in feature space using the original
input points, it may be possible to create a non-linear learn-
ing machine called a direct computation method of the kernel
function, denoted by K[38].
The SVM models utilize input variables that have a con-
nection with the objective variable, which is the variable that
needs to be predicted. This involves representing the data in a
non-linear function f(x) and visualizing it.
$$f(x) = \omega \cdot \varphi(x) + b \tag{1}$$

where $\omega$ is the normal vector, $b$ is a constant (bias) term, and $\varphi(x)$ is a high-dimensional feature mapping of the input vector $x$.

To determine the coefficients $\omega$ and $b$, an optimization problem is solved through minimization:

$$R_{\mathrm{SVM}}(f) = C \frac{1}{N} \sum_{i=1}^{N} L_{\epsilon}\left(f(x_i), y_i\right) + \frac{1}{2}\lVert w \rVert^{2} \tag{2}$$

$$L_{\epsilon}\left(f(x_i), y_i\right) = \lvert f(x_i) - y_i \rvert - \epsilon \quad \text{for } \lvert f(x_i) - y_i \rvert \geq \epsilon \tag{3}$$

$$L_{\epsilon}\left(f(x_i), y_i\right) = 0 \quad \text{otherwise} \tag{4}$$

where $\epsilon$ is the parameter of the model. $L_{\epsilon}(f(x_i), d_i)$ describes the $\epsilon$-insensitive loss function: any errors that fall below the value of $\epsilon$ are not subject to penalty. $d_i$ represents the solar PV power in
FIGURE 4 Illustration of Support Vector Regression.
the period $i$, and $C \frac{1}{N} \sum_{i=1}^{N} L_{\epsilon}(f(x_i), d_i)$ defines the empirical error of the SVM model. $\frac{1}{2}\lVert w \rVert^{2}$ is the regularization term, and $C$ is the penalty factor assessed to balance the trade-off between the empirical risk and model complexity by utilizing the slack variables $\xi_i$ and $\xi_i^{*}$. These variables indicate the presence of excessive upper and lower deviations, respectively.
Equation (2) can be formulated as demonstrated below by utilizing the characteristics of the function that needs to be optimized (illustration shown in Figure 4):

$$\text{minimize} \quad \frac{1}{2}\lVert w \rVert^{2} + C\frac{1}{N}\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \tag{5}$$

subject to

$$y_i - (w \cdot x_i + b) \leq \epsilon + \xi_i, \qquad (w \cdot x_i + b) - y_i \leq \epsilon + \xi_i^{*}, \qquad \xi_i, \xi_i^{*} \geq 0 \tag{6}$$
By utilizing Lagrange multipliers and the optimality constraints, it is feasible to derive a non-linear regression function to solve Equation (1):

$$f(x) = \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b \tag{7}$$

where $\alpha_i, \alpha_i^{*}$ are Lagrange multipliers.

The term $K(x_i, x)$ is defined as a kernel function:

$$K(x_i, x) = \sum_{i=1}^{D} \varphi_i(x)\,\varphi_i(y) \tag{8}$$
There are four main kernel functions available for SVM, namely linear, polynomial, radial basis function, and sigmoid [33].

(i) Linear kernel function

$$K(x_i, x_j) = x_i \cdot x_j \tag{9}$$

where $x_i, x_j$ are the inputs to the $i$th and $j$th dimensions, respectively.

(ii) Polynomial kernel function

$$K(x_i, x_j) = \left(x_i \cdot x_j\right)^{q} \tag{10}$$

where $q$ is the degree of the polynomial.

(iii) Radial basis kernel function

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \tag{11}$$

where $\sigma$ is the kernel width.

(iv) Sigmoid kernel function

$$K(x_i, x_j) = \tanh\left(v\left(x_i \cdot x_j\right) + c\right) \tag{12}$$

where $v$ and $c$ are adjustable kernel parameters relying on the data.
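As a hedged illustration of how such a kernel-based regressor is configured in practice (the kernel choice and the values of C, epsilon, and gamma below are assumptions for illustration; the paper does not report its SVR hyperparameters), an ε-SVR with an RBF kernel can be set up with scikit-learn as follows:

```python
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature scaling matters for kernel methods; C, epsilon and gamma below are
# illustrative values, not the values tuned in this study.
svr_rbf = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale"),
)
svr_rbf.fit(X_train, y_train)        # same chronological split as the RF sketch
y_pred_svr = svr_rbf.predict(X_test)

# The kernels of Equations (9)-(12) correspond to kernel="linear",
# kernel="poly" (with degree=q), kernel="rbf" and kernel="sigmoid".
```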
4.4 Artificial neural networks (ANN) and
deep neural networks (DNN)
ANN emulates the form and function of the natural neural
network found in the human body. ANNs possess the capa-
bility to automatically identify data patterns from previously
included data in the network [39]. ANNs are widely acclaimed
for their ability to model complex and non-linear processes
between input and output variables, making them superior to
other forecasting techniques.
Figure 5 illustrates a simple architecture for an ANN, where
neurons process the input received and produce an output using
their individual activation functions. In ANNs, the learning rate
parameter, number of hidden layers, and maximum iteration
count are essential parameters that regulate the learning process.
The activation functions’ weights and parameters are adjusted
through a process called learning. The number of neurons in
the input, hidden, and output layers may vary, and various acti-
vation functions such as Sigmoid, Rectified Linear Unit, and
Softmax are utilized in ANNs for computation. ANNs have
various advantages such as fault tolerance, parallel processing
capability, and ability to store information across the network,
without losing performance. However, there are also some dis-
advantages to ANNs, such as hardware dependency, which
requires processors with parallel processing power. Additionally,
the lack of interpretability of the network and the unpre-
dictability of the duration of the network are also significant
drawbacks [40].
DNNs are essentially ANNs with multiple hidden layers,
rather than just one. The conventional layers of DNNs can
effectively capture and utilize the fundamental one- or two-
dimensional structure of the network. With the growth of
IoT and the increasing capacity of big data, DNN models
have gained significant attention for use in various research
FIGURE 5 Basic architecture of ANN and DNN. (a) Artificial neural networks (ANN) and (b) Deep neural networks (DNN).
fields. One advantage of DNNs is their ability to capture
non-linear relationships between input features and output tar-
gets. The process involves acquiring knowledge from data by
focusing on learning multiple layers of representations that
gradually become more meaningful representation of the data.
As it delves deeper, it becomes capable of recognizing more
advanced representations, enabling it to establish an accurate
correlation between input characteristics and their intended
target [41].
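A small multi-layer feed-forward network in this spirit can be sketched with scikit-learn's MLPRegressor; the layer sizes, activation function, and iteration limit below are illustrative assumptions, not the architecture used in this study.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two hidden layers give a small "deep" feed-forward network; ReLU activation
# and the Adam optimizer are scikit-learn defaults made explicit here.
dnn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu", solver="adam",
                 learning_rate_init=1e-3, max_iter=1000, random_state=42),
)
dnn.fit(X_train, y_train)            # same chronological split as the RF sketch
y_pred_dnn = dnn.predict(X_test)
```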
4.5 Advantages of RF over other ML
techniques
1. ANN, DNN, and SVM require a larger database, while random forest does not; it is powerful in giving more accurate predictions than the others with less data.
2. Random forest models are relatively easy to interpret. They provide feature importance scores, allowing you to understand the impact of different features on the predictions.
3. Random forest is less sensitive to noisy data and outliers. The
ensemble nature of the model helps mitigate the impact of
individual noisy data points.
4. Random forest models are generally faster to train and
require less computational resources than deep neural
networks, especially when dealing with large datasets.
5. Random forest can naturally handle categorical data without
the need for one-hot encoding or extensive preprocessing.
6. Random forest is generally more stable and less prone to
overfitting, making it a reliable choice when the dataset is
limited, or the data quality is inconsistent.
7. Random forest can perform well with smaller datasets, mak-
ing it suitable for applications where large amounts of data
are not readily available.
5 PERFORMANCE METRICS
The RMSE, the MAPE, and mean absolute arc-tangent percent-
age error (MAAPE) were calculated and are used as evaluation
criteria to validate the error and assess how well the proposed
model is performing. Additionally, the skill score has been cal-
culated, considering one of the statistical models as a reference.
Here in this study, SVR is the reference model.
5.1 Root mean square error
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_i - F_i\right)^{2}} \tag{13}$$

where $A_i$ and $F_i$ denote the actual and forecast values, respectively.
If you need to communicate model performance to non-data
professionals, MAPE would be a better choice than RMSE as
it is much easier to understand. MAPE is expressed as a per-
centage, making it more accessible and comprehensible for end
users who may not have a background in data.
5.2 Mean absolute percentage error
$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{A_i - F_i}{A_i}\right| \times 100\% \tag{14}$$
MAPE suffers from a significant drawback as it can pro-
duce undefined or infinite values when actual values are zero or
close to zero. To address this issue, a new measure of forecast
accuracy called the MAAPE has been developed by looking at
MAPE from a different perspective. While MAPE considers the
slope as a ratio, MAAPE views it as an angle, making it a more
reliable and robust measure. MAAPE overcomes the problem
of division by zero by using bounded influences for outliers in
a fundamental manner. It retains the philosophy of MAPE but
considers the ratio as an angle, rather than a slope, to address
the issue of undefined or infinite values [42].
5.3 Mean arc-tangent absolute percentage
error
$$\mathrm{MAAPE} = \frac{1}{N}\sum_{i=1}^{N}\arctan\left(\left|\frac{A_i - F_i}{A_i}\right|\right) \tag{15}$$

MAAPE is expressed as an angle $\theta$ varying from $0^{\circ}$ to $90^{\circ}$.
5.4 Skill score
There is a common argument that measures of forecast accuracy
ought to be presented as a skill score.
$$\mathrm{SS} = \frac{A_f - A_r}{A_p - A_r} \tag{16}$$

where $A_f$ and $A_r$ represent the 'accuracy', according to some given measure, of the forecasting system of interest and of some reference forecasting system, respectively. The quantity $A_p$ represents the perfect-model accuracy value of the measure; that is, the value of the metric if the outcome were known perfectly.
5.5 Relative skill score
When the perfect-model accuracy $A_p$ is equal to zero, a different statistical measure can be used to compare the performance of two forecasting systems. This measure is defined as the relative skill score, an alternative to the skill score [43].

$$\text{Relative skill score} = \frac{A_f - A_r}{A_r} \tag{17}$$
For calculating the skill score/relative skill score in this work, the accuracy parameter used is MAAPE, owing to its advantages mentioned above.
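The metrics of Equations (13)-(17) can be computed in a few lines of NumPy, as sketched below; the small epsilon guarding against division by zero and the sign convention used for the relative skill score are implementation choices made here for illustration, and y_test, y_pred, and y_pred_svr are assumed to come from the earlier model sketches.

```python
import numpy as np

def rmse(a, f):
    return np.sqrt(np.mean((a - f) ** 2))

def mape(a, f, eps=1e-9):
    return np.mean(np.abs((a - f) / (a + eps))) * 100.0          # per cent

def maape(a, f, eps=1e-9):
    # Mean arc-tangent absolute percentage error, reported in degrees (0-90).
    return np.degrees(np.mean(np.arctan(np.abs((a - f) / (a + eps)))))

def relative_skill_score(a_f, a_r):
    # Positive values mean the forecast of interest beats the reference;
    # Equation (17) is written as (A_f - A_r) / A_r in the text.
    return (a_r - a_f) / a_r

a = np.asarray(y_test)
maape_rf, maape_svr = maape(a, np.asarray(y_pred)), maape(a, np.asarray(y_pred_svr))
print("MAAPE RF / SVR:", maape_rf, maape_svr)
print("Relative skill score of RF over SVR:", relative_skill_score(maape_rf, maape_svr))
```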
6 RESULTS AND DISCUSSIONS
In this study, determining the optimal random forest parameters
involved experimentation and domain knowledge. The key con-
trol parameters identified were the number of estimators and
the random state. Correlation coefficients and heat maps high-
lighted the key data parameters. To capture seasonal variations,
time-based features were added to the dataset, capturing annual
patterns.
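A sketch of this experimentation is given below; it assumes the prepared DataFrame `data` from Section 3, shows how a multi-step-ahead target can be formed by shifting the PV output (one step = 15 min), and searches over a small, purely illustrative grid of estimator counts with a time-series-aware cross-validation.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

steps_ahead = 2                                   # 2 x 15 min = 30 min ahead
df = data.sort_values("Timestamp").copy()
# Shift the PV output backwards so each row is paired with its future value.
# (Simplification: rows spanning the overnight 17:00 -> 07:00 gap are not
# treated specially here.)
df["target"] = df["PVoutput"].shift(-steps_ahead)
df = df.dropna(subset=["target"])

feature_cols = ["Time", "Temp", "Hum", "Wd", "Ws", "Rad", "PVoutput"]  # assumed names
X_all, y_all = df[feature_cols].to_numpy(), df["target"].to_numpy()

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [100, 200, 400]},    # illustrative candidates only
    cv=TimeSeriesSplit(n_splits=5),                  # respects temporal ordering
    scoring="neg_root_mean_squared_error",
)
search.fit(X_all, y_all)
print("Best parameters:", search.best_params_)
```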
Results for correlation coefficients were presented in Figure 6
in the form of seasonal heat maps.
Strong correlation between PV power output, solar radiation,
and ambient temperatures were seen in summer followed by
winter and autumn.
Correlation between temperatures and time were seen strong
in winter season followed by autumn and summer.
It was very clear from the heat maps that the time-temperature and solar radiation-PV output pairs were strongly correlated.
Correlations in the monsoon season were not as strong as in the other seasons.
Although no significant correlation was seen between the other weather parameters and the PV output, to avoid any loss of information, all attributes were considered for the present study.
15 min ahead seasonal PV power forecasted values versus observed values using different ML techniques (RF, DNN, ANN, SVR) are presented in Figure 7, along with the mean arc-tangent absolute percentage errors (which vary between 0° and 90°), which should be low for a good model. In all four seasons, the MAAPE for random forest is observed to be low compared to the other models evaluated in the study. Season-wise analysis shows that the models perform better in the summer season, followed by winter, autumn, and monsoon, based on the MAAPE values.
Figure 8 presents the results of 30 min ahead seasonal PV power forecasted values versus actual values using different ML techniques (RF, DNN, ANN, SVR). In all four seasons, the MAAPE for random forest is observed to be low compared to the other models evaluated in the study and the reference SVR method. Season-wise analysis shows that the models perform better in the summer season, with the lowest MAAPE of 9.29°, followed by winter, autumn, and monsoon, based on the MAAPE values. The errors are lower compared to the 15 min ahead forecasting.
Figure 9 shows the results of 45 min ahead seasonal PV power forecasted values versus actual values. The proposed random forest technique for the study location is found to perform better based on the lower MAAPE values compared to the other techniques used for comparison. These values are on par with the results obtained in 30 min ahead forecasting. Season-wise analysis shows that the models perform better in the summer season, with the lowest MAAPE of 8.57°, followed by winter and autumn, based on the MAAPE values. The errors are lower compared to the 15 and 30 min ahead forecasting.
Figure 10 shows the results of 60 min ahead seasonal PV power forecasting versus actual values. Compared to the reference SVR and the other techniques evaluated in the study, the proposed random forest technique for the study location is found to perform better, as indicated by the lower MAAPE values. These values are on par with the results obtained in the 15, 30, and 45 min ahead forecasting. Season-wise analysis shows that the models perform better in the summer season, with the lowest MAAPE of 9.13°, followed by winter and autumn, based on the MAAPE values.
FIGURE 6 Seasonal variation of correlation coefficient.
With reference to Figures 7-10 and the MAAPE values, it is evident that the random forest technique is superior to DNN, ANN, and SVM (the reference technique), with the lowest MAAPE values ranging from 8.57° to 15.77°.
Various performance metrics for different seasons and forecasting horizons for the various ML techniques are presented in Table 2, but the basis for ranking the performance of the models considered in this study is MAAPE. Although RMSE, r2, and MAE have been used in the reported literature, ranking becomes difficult when the error values are very close for different techniques, as shown in Table 2. This motivates the calculation of MAPE, which gives the percentage error between the results of different models. However, if the data contain values that are zero or close to zero, this percentage, as per the equations shown in the methodology, becomes very large or infinite, so MAPE too sometimes does not provide useful information for judging the performance of the model. To avoid this, a new error metric was considered here, namely MAAPE, which expresses the error as an angle varying between 0° and 90°; any value closer to 0° indicates a good model. As per the MAAPE results shown in the table, except for the monsoon season, the values range between 8° and 15° for the random forest technique, and for all other techniques these values are higher than for the proposed random forest technique.
Further, although MAAPE shows the superiority of RF over the other models, the percentage improvements of all models over the reference model (SVM) were also evaluated using the relative skill score. The relative scores for the forecasting horizons of 45 and 60 min do not show any improvement in the proposed RF model, while showing an appreciable percentage of improvement ranging from 6% to 49% and 3% to 50% for the forecasting horizons of 15 and 30 min, respectively. The lowest and highest improvements are seen in the monsoon and winter seasons, respectively, for both the 15 and 30 min horizons.
The forecasting accuracy for the 15 and 30 min ahead horizons was high compared to the other forecasting horizons. This is because the frequency of input data collection is close to the forecasting horizon. Also, the ensemble nature of random forest combines the predictions of multiple decision trees, each capturing different aspects of the forecasting problem. This ensemble approach improves the model's ability to adapt to the variations and sudden changes in weather conditions at shorter time horizons.
FIGURE 7 15 min ahead seasonal PV forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial neural
networks; SVR, support vector regression.
FIGURE 8 30 min ahead seasonal PV forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial neural
networks; SVR, support vector regression.
FIGURE 9 45 min ahead seasonal PV forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial neural
networks; SVR, support vector regression.
FIGURE 10 60 min ahead seasonal PV power forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial
neural networks; SVR, support vector regression.
TABLE 2 Performance metrics for different seasons and forecasting horizons for various ML techniques.
15 min 30 min 45 min 60 min
Horizon RF DNN ANN SVM RF DNN ANN SVM RF DNN ANN SVM RF DNN ANN SVM
RMSE Autumn 1.8109 1.7226 1.7330 1.8003 1.7376 1.6748 1.6652 1.7117 1.7025 1.6470 1.8286 1.6998 1.9682 1.8342 1.8940 1.8428
Winter 1.1769 1.2291 1.2547 1.2450 1.0694 1.2982 1.3841 1.3376 1.0800 1.2531 1.3406 1.3556 1.0742 1.3328 1.4035 1.3643
Summer 1.7025 1.4059 0.9405 1.2903 1.2478 1.4140 1.4312 1.3490 1.2676 1.3935 1.6342 1.3171 1.2695 1.4298 1.6682 1.4520
Monsoon 3.7021 3.5697 3.5238 3.6100 3.6052 3.6960 2.9735 3.7791 3.6092 3.6217 3.6537 3.7817 3.5416 3.5920 3.6027 3.7034
r2 Autumn 0.8303 0.8544 0.8527 0.8410 0.8436 0.8619 0.8635 0.8558 0.8474 0.8633 0.8315 0.8544 0.7953 0.8268 0.8154 0.8252
Winter 0.9249 0.9235 0.9203 0.9215 0.9367 0.9135 0.9017 0.9080 0.9362 0.9202 0.9087 0.9066 0.9395 0.9087 0.8987 0.9043
Summer 0.8940 0.8460 0.8482 0.8703 0.8816 0.8483 0.8445 0.8619 0.8783 0.8551 0.8007 0.8705 0.8755 0.8489 0.7943 0.8441
Monsoon 0.3168 0.2989 0.3145 0.2827 0.3224 0.2880 0.2948 0.2558 0.3329 0.3229 0.3026 0.2618 0.3437 0.3083 0.3042 0.2647
MAPE Autumn 0.282 0.420 0.421 0.430 0.235 0.495 0.807 0.436 0.249 0.601 0.798 0.504 0.265 0.642 0.709 0.444
Winter 0.170 0.332 0.383 0.338 0.175 0.355 0.385 0.357 0.175 0.398 0.412 0.389 0.179 0.541 0.545 0.445
Summer 0.149 0.201 0.209 0.184 0.164 0.265 0.265 0.254 0.151 0.280 0.289 0.244 0.161 0.264 0.313 0.285
Monsoon 0.932 1.135 1.140 1.029 0.952 1.154 1.114 0.991 1.001 1.025 1.019 0.870 0.968 1.005 1.034 0.909
MAAPE Autumn 15.77 22.78 22.83 23.29 13.20 38.92 26.34 23.53 13.96 30.99 38.58 26.73 14.84 32.69 35.33 23.95
Winter 9.63 18.37 20.95 18.70 9.92 21.08 19.52 19.66 9.90 21.69 22.38 21.25 10.17 28.40 28.61 24.00
Summer 8.45 11.38 11.83 10.40 9.29 14.83 14.84 14.24 8.57 15.66 16.11 13.73 9.13 14.77 17.35 15.91
Monsoon 42.98 48.62 48.74 45.82 43.59 48.09 49.09 44.74 45.03 45.70 45.53 41.01 44.07 45.14 45.96 42.28
Relative skill score Autumn 0.323 0.022 0.020 0.000 0.439 0.119 0.654 0.000 0.478 0.160 0.443 0.000 0.380 0.365 0.475 0.000
Winter 0.485 0.018 0.121 0.000 0.495 0.007 0.072 0.000 0.534 0.021 0.053 0.000 0.576 0.183 0.192 0.000
Summer 0.188 0.094 0.137 0.000 0.347 0.042 0.041 0.000 0.376 0.141 0.173 0.000 0.426 0.072 0.091 0.000
Monsoon 0.062 0.061 0.064 0.000 0.026 0.097 0.075 0.000 0.098 0.114 0.110 0.000 0.042 0.068 0.087 0.000
7 CONCLUSION
To improve the stability of the grid, accurate forecasting of the PV power output of large grid-connected solar photovoltaic plants is required. In the present study, to improve forecasting accuracy, in situ measurements of the weather parameters and the PV power output of a 20 kW on-grid plant were collected for a typical year covering all four seasons, and the random forest technique was evaluated together with other techniques such as DNNs, ANNs, and SVR (the reference in this study). The simulation results show that the proposed random forest technique performs well for the 15 and 30 min forecasting horizons, with accuracy improvements of 49% and 50%, respectively, over the reference model for the study location (22.78°N, 73.65°E, College of Agricultural Engineering and Technology, Anand Agricultural University, Godhra, India). The proposed random forest technique can be
applied to different locations with varying weather patterns and
grid characteristics; it is essential to adapt the model to the spe-
cific context. This involves collecting relevant data, engineering
appropriate features, and tuning the model to achieve accurate
forecasts for the new location/study location. Additionally, data
privacy concerns are mitigated as the analysis uses locally stored
data without transmission.
Further studies to improve the forecasting accuracy for more
forecasting horizons are to be considered for the future scope
of work.
ABBREVIATIONS
PV Photovoltaic
ML Machine learning
NWP Numerical weather predictions
SVM Support vector machines
SVC Support vector classification
SVR Support vector regression
ANN Artificial neural networks
DNN Deep Neural Networks
RF Random Forest
GA Genetic algorithm
ANFIS Adaptive neuro-fuzzy inference system
GNN Generalized neural network
PVPNet Deep neural network model for PV output power forecasting
LSTM Long-Short-Term-Memory
WT Wavelet transform
DGM Discrete Grey Model
RNN Recurrent neural networks
GRU Gated recurrent unit
DLNN Deep learning neural network
BiLSTM Bidirectional Long Short-Term Memory
BiGRU Bidirectional Gated recurrent unit
HIMVO Hybrid improved multi-verse optimizer
algorithm
ACO Ant colony optimization
XGBoost Extreme Gradient Boosting
CEEMD Complementary Ensemble Empirical Mode
Decomposition
DIFPSO Particle swarm optimization algorithm based on
dynamic inertia factor
BPNN Back propagation neural network
IGIVA Improved grey ideal value approximation
RMSE Root mean squared error
MAE Mean absolute error
MAPE Mean absolute percentage error
MAAPE Mean absolute arc-tangent percentage error
SS Skill score
RSS Relative skill score
AUTHOR CONTRIBUTIONS
Sravankumar Jogunuri: Conceptualization; resources. Josh F.T: Data curation; software. Albert Alexander Stonier: Formal analysis; supervision. Geno Peter: Formal analysis; supervision. Jayakumar Jayaraj: Validation. Jaganathan S: Investigation;
visualization. Jency Joseph J: Methodology; writing—original
draft. Vivekananda Ganji: Writing—review and editing.
FUNDING INFORMATION
The authors received no specific funding for this work.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available
from the corresponding author upon reasonable request.
ORCID
Sravankumar Jogunuri https://orcid.org/0000-0003-4272-5176
Josh F.T https://orcid.org/0000-0002-1580-0567
Albert Alexander Stonier https://orcid.org/0000-0002-3572-2885
Geno Peter https://orcid.org/0000-0002-1825-8427
Jayakumar Jayaraj https://orcid.org/0000-0001-7099-0554
Jaganathan S https://orcid.org/0000-0002-2967-7301
Jency Joseph J https://orcid.org/0000-0001-9909-2256
Vivekananda Ganji https://orcid.org/0000-0002-5646-2138
REFERENCES
1. Li, P., Zhou, K., Lu, X., Yang, S.: A hybrid deep learning model for short-
term PV power forecasting. Appl. Energy 259, 114216 (2020). https://doi.
org/10.1016/j.apenergy.2019.114216
2. Mišák, S., Platoš, J., Krömer, P.: Supervised learning of photovoltaic power plant.
Neural Netw. World 4(13), 321–338 (2013)
3. Nam, K.J., Hwangbo, S., Yoo, C.K.: A deep learning-based forecasting
model for renewable energy scenarios to guide sustainable energy policy:
A case study of Korea. Renewable Sustainable Energy Rev. 122, 109725
(2020). https://doi.org/10.1016/j.rser.2020.109725
4. AlKandari, M., Ahmad, I.: Solar power generation forecasting using
ensemble approach based on deep learning and statistical methods. Appl.
Comput. Inf. (2019) ahead-of-print. https://doi.org/10.1016/j.aci.2019.11.
002
5. Brahma, B., Wadhvani, R.: Solar irradiance forecasting based on deep
learning methodologies and multi-site data. Symmetry 12(11), 1–20 (2020).
https://doi.org/10.3390/sym12111830
6. Qing, X., Niu, Y.: Hourly day-ahead solar irradiance prediction using
weather forecasts by LSTM. Energy 148, 461–468 (2018). https://doi.org/
10.1016/j.energy.2018.01.177
7. Sorkun, M.C., Incel, Ö.D., Paoli, C.: Time series forecasting on multivari-
ate solar radiation data using deep learning (LSTM). Turk. J. Electr. Eng.
Comput. Sci. 28(1), 211–223 (2020). https://doi.org/10.3906/elk-1907-
218
8. Guijo-Rubio, D., et al.: Evolutionary artificial neural networks for accurate
solar radiation prediction. Energy 210, 118374 (2020). https://doi.org/10.
1016/j.energy.2020.118374
9. Dong, J., et al.: Novel stochastic methods to predict short-term solar radi-
ation and photovoltaic power. Renewable Energy 145, 333–346 (2020).
https://doi.org/10.1016/j.renene.2019.05.073
10. Yadav, H.K., Pal, Y., Tripathi, M.M.: A novel GA-ANFIS hybrid model for
short-term solar PV power forecasting in Indian electricity market. J. Inf.
Optim. Sci. 40(2), 377–395 (2019). https://doi.org/10.1080/02522667.
2019.1580880
11. Chaudhary, P., Rizwan, M.: Short term solar energy forecasting
using GNN integrated wavelet-based approach. Int. J. Renewable
Energy Technol. 10(3), 229 (2019). https://doi.org/10.1504/ijret.2019.
101729
12. Wang, K., Qi, X., Liu, H.: A comparison of day-ahead photovoltaic power
forecasting models based on deep learning neural network. Appl. Energy
251, 113315 (2019). https://doi.org/10.1016/j.apenergy.2019.113315
13. Huang, C.J., Kuo, P.H.: Multiple-input deep convolutional neural network
model for short-term photovoltaic power forecasting. IEEE Access 7,
74822–74834 (2019). https://doi.org/10.1109/ACCESS.2019.2921238
14. Li, G., Wang, H., Zhang, S., Xin, J., Liu, H.: Recurrent neural networks
based photovoltaic power forecasting approach. Energies 12(13), 1–17
(2019). https://doi.org/10.3390/en12132538
15. Gao, M., Li, J., Hong, F., Long, D.: Day-ahead power forecasting in a
large-scale photovoltaic plant based on weather classification using LSTM.
Energy 187, 115838 (2019). https://doi.org/10.1016/j.energy.2019.07.
168
16. Hossain, M.S., Mahmood, H.: Short-term photovoltaic power forecast-
ing using an LSTM neural network and synthetic weather forecast. IEEE
Access 8, 172524–172533 (2020). https://doi.org/10.1109/ACCESS.
2020.3024901
17. Mishra, M., Byomakesha Dash, P., Nayak, J., Naik, B., Kumar Swain, S.:
Deep learning and wavelet transform integrated approach for short-term
solar PV power prediction. Measurement 166, 108250 (2020). https://doi.
org/10.1016/j.measurement.2020.108250
18. Mellit, A., Pavan, A.M., Lughi, V.: Deep learning neural networks for short-
term photovoltaic power forecasting. Renewable Energy 172, 276–288
(2021). https://doi.org/10.1016/j.renene.2021.02.166
19. VanDeventer, W., et al.: Short-term PV power forecasting using hybrid
GASVM technique. Renewable Energy 140, 367–379 (2019). https://doi.
org/10.1016/j.renene.2019.02.087
20. Li, L.L., Wen, S.Y., Tseng, M.L., Wang, C.S.: Renewable energy predic-
tion: A novel short-term prediction model of photovoltaic output power.
J. Cleaner Prod. 228, 359–375 (2019). https://doi.org/10.1016/j.jclepro.
2019.04.331
21. Pan, M., et al.: Photovoltaic power forecasting based on a support vec-
tor machine with improved ant colony optimization. J. Cleaner Prod. 277,
123948 (2020). https://doi.org/10.1016/j.jclepro.2020.123948
22. Jebli, I., Belouadha, F.Z., Kabbaj, M.I., Tilioua, A.: Prediction of solar
energy guided by pearson correlation using machine learning. Energy 224,
120109 (2021). https://doi.org/10.1016/j.energy.2021.120109
23. Zazoum, B.: Solar photovoltaic power prediction using different machine
learning methods. Energy Rep. 8, 19–25 (2022). https://doi.org/10.1016/
j.egyr.2021.11.183
24. Meng, M., Song, C.: Daily photovoltaic power generation forecasting
model based on random forest algorithm for north china in winter.
Sustainability 12(6), 2247 (2020). https://doi.org/10.3390/su12062247
25. Munawar, U., Wang, Z.: A framework of using machine learning
approaches for short-term solar power forecasting. J. Electr. Eng. Technol.
15(2), 561–569 (2020). https://doi.org/10.1007/s42835-020-00346-4
26. Niu, D., Wang, K., Sun, L., Wu, J., Xu, X.: Short-term photovoltaic
power generation forecasting based on random forest feature selection
and CEEMD: A case study. Appl. Soft Comput. J. 93, 106389 (2020).
https://doi.org/10.1016/j.asoc.2020.106389
27. Mahmud, K., Azam, S., Karim, A., Zobaed, S., Shanmugam, B., Mathur, D.:
Machine learning based PV power generation forecasting in alice springs.
IEEE Access 9, 46117–46128 (2021). https://doi.org/10.1109/ACCESS.
2021.3066494
28. Massaoudi, M., Chihi, I., Sidhom, L., Trabelsi, M., Refaat, S.S., Oueslati,
F.S.: Photovoltaic power forecasting using weather measurements. Ener-
gies 14(13), 1–20 (2021)
29. Zameer, A., Jaffar, F., Shahid, F., Muneeb, M., Khan, R., Nasir, R.: Short-
term solar energy forecasting: Integrated computational intelligence of
LSTMs and GRU. PLoS One 18(10), e0285410 (2023). https://doi.org/
10.1371/journal.pone.0285410
30. Li, Y., Wang, R., Yang, Z.: Optimal scheduling of isolated microgrids using
automated reinforcement learning-based multi-period forecasting. IEEE Trans.
Sustainable Energy 13(1), 159–169 (2021). https://doi.org/10.1109/TSTE.2021.3105529
31. Balal, A., Jafarabadi, Y.P., Demir, A., Igene, M., Giesselmann, M., Bayne,
S.: Forecasting solar power generation utilizing machine learning models
in Lubbock. Emerging Sci. J. 7(4), 1052–1062 (2023). https://doi.org/10.
28991/ESJ-2023-07-04-02
32. Behera, M.K., Nayak, N.: A comparative study on short-term PV
power forecasting using decomposition based optimized extreme learning
machine algorithm. Eng. Sci. Technol. 23(1), 156–167 (2020). https://doi.
org/10.1016/j.jestch.2019.03.006
33. Álvarez-Alvarado, J.M., Ríos-Moreno, J.G., Obregón-Biosca, S.A.,
Ronquillo-Lomelí, G., Ventura-Ramos, E., Trejo-Perea, M.: Hybrid
techniques to predict solar radiation using support vector machine and
search optimization algorithms: A review. Appl. Sci. 11(3), 1–17 (2021).
https://doi.org/10.3390/app11031044
34. Villegas-Mier, C.G., Rodriguez-Resendiz, J., Álvarez-Alvarado, J.M.,
Jiménez-Hernández, H., Odry, Á.: Optimized random forest for solar radi-
ation prediction using sunshine hours. Micromachines 13(9), 1406 (2022).
https://doi.org/10.3390/mi13091406
35. Chahboun, S., Maaroufi, M.: Novel comparison of machine learning
techniques for predicting photovoltaic output power. Int. J. Renewable
Energy Res. 11(3), 1205–1214 (2021). https://doi.org/10.20508/ijrer.
v11i3.12056.g8252
36. Jang, H.S., Bae, K.Y., Park, H.S., Sung, D.K.: Solar power prediction
based on satellite images and support vector machine. IEEE Trans. Sus-
tainable Energy 7(3), 1255–1263 (2016). https://doi.org/10.1109/TSTE.
2016.2535466
37. Varanasi, J., Tripathi, M.M.: K-means clustering based photo voltaic power
forecasting using artificial neural network, particle swarm optimization
and support vector regression. J. Inf. Optim. Sci. 40(2), 309–328 (2019).
https://doi.org/10.1080/02522667.2019.1578091
38. Ramedani, Z., Omid, M., Keyhani, A., Shamshirband, S., Khoshnevisan,
B.: Potential of radial basis function based support vector regression for
global solar radiation prediction. Renewable Sustainable Energy Rev. 39,
1005–1011 (2014). https://doi.org/10.1016/j.rser.2014.07.108
39. Aslam, S., Herodotou, H., Ayub, N., Mohsin, S.M.: Deep learning based
techniques to enhance the performance of microgrids: A review. In: Pro-
ceedings - 2019 International Conference on Frontiers of Information Technology, FIT
2019 at Islamabad, Pakistan. pp. 116–121 (2019). https://doi.org/10.1109/
FIT47737.2019.00031
40. Yadav, P.K., Bhasker, R., Stonier, A.A., Peter, G., Vijayakumar, A., Ganji,
V.: Machine learning based load prediction in smart-grid under different
contract scenario. IET Gener. Transm. Distrib. (2023). https://doi.
org/10.1049/gtd2.12828
41. Qasem, M., Filik, Ü.B.: Solar radiation forecasting by using deep neural
networks in Eskişehir. Sigma J. Eng. Nat. Sci. 39(2), 159–169 (2021).
https://doi.org/10.14744/sigma.2021.00005
42. Kim, S., Kim, H.: A new metric of absolute percentage error for intermit-
tent demand forecasts. Int. J. Forecasting 32(3), 669–679 (2016). https://
doi.org/10.1016/j.ijforecast.2015.12.003
43. Wheatcroft, E.: Interpreting the skill score form of forecast performance
metrics. Int. J. Forecasting 35(2), 573–579 (2019). https://doi.org/10.
1016/j.ijforecast.2018.11.010
How to cite this article: Jogunuri, S., F.T, J., Stonier,
A.A., Peter, G., Jayaraj, J., S, J., J, J.J., Ganji, V.: Random
forest machine learning algorithm based seasonal
multi-step ahead short-term solar photovoltaic power
output forecasting. IET Renew. Power Gener. 1–16
(2024). https://doi.org/10.1049/rpg2.12921