Received: 14 June 2023 Revised: 30 October 2023 Accepted: 22 November 2023 IET Renewable Power Generation
DOI: 10.1049/rpg2.12921
ORIGINAL RESEARCH
Random forest machine learning algorithm based seasonal
multi-step ahead short-term solar photovoltaic power output
forecasting
Sravankumar Jogunuri1,2 | Josh F.T1 | Albert Alexander Stonier3 | Geno Peter4
Jayakumar Jayaraj1 | Jaganathan S5 | Jency Joseph J6 | Vivekananda Ganji7
1Division of Electrical and Electronics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
2Department of Renewable Energy Engineering, College of Agricultural Engineering and Technology, Anand Agricultural University, Godhra, Gujarat, India
3School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
4CRISD, School of Engineering and Technology, University of Technology Sarawak, Sarawak, Malaysia
5Division of Electrical and Electronics Engineering, NGP Institute of Technology, Coimbatore, Tamil Nadu, India
6Department of Electrical and Electronics Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
7Department of Electrical and Computer Engineering, Debre Tabor University, Debre Tabor, Ethiopia
Correspondence
Geno Peter, CRISD, School of Engineering and
Technology, University of Technology Sarawak,
Sarawak, Malaysia.
Email: drgeno.peter@uts.edu.my
Vivekananda Ganji, Department of Electrical and
Computer Engineering, Debre Tabor University,
Debre Tabor, Ethiopia.
Email: vivekganji@dtu.edu.et
Abstract
To maintain grid stability, the energy produced by sources within the network must equal the energy consumed by customers. At present, achieving energy balance mainly involves regulating the electrical energy sources, as consumption is typically beyond the control of grid operators. Improving grid stability therefore requires accurate forecasting of the photovoltaic power output of large grid-connected solar photovoltaic plants. In the present study, to improve forecasting accuracy, onsite measurements of the weather parameters and the photovoltaic power output of a 20 kW on-grid plant were collected for a typical year covering all four seasons, and the random forest technique was evaluated together with other techniques such as deep neural networks, artificial neural networks, and support vector regression (the reference in this study). The simulation results show that the proposed random forest technique performs well for the 15 and 30 min forecasting horizons, with accuracy improvements of 49% and 50%, respectively, over the reference model for the study location (22.78°N, 73.65°E, College of Agricultural Engineering and Technology, Anand Agricultural University, Godhra, India).
1 INTRODUCTION
The utilization of photovoltaic (PV) power has the potential to
meet the rising global need for clean energy as it is a renewable,
environmentally-friendly, and adaptable source of distributed
energy [1].
Power grids typically include power plants that produce
steady streams of energy, such as coal, gas, and nuclear power
plants, as well as plants that generate unpredictable and variable
energy, such as wind and photovoltaic power plants, whose out-
put depends heavily on weather conditions in a given location
and time. To maintain grid stability, the energy levels produced
by sources within the network must be equal to the energy con-
sumed by customers. In current times, achieving energy balance
mainly involves regulating the electrical energy sources, as con-
sumption is typically beyond the control of grid operators [2, 3].
Forecasting solar power will play a crucial role in determining
the future of renewable energy plants and their integration with
grids on a large scale. The accuracy of predicting photovoltaic
power generation is heavily reliant on the constantly changing
weather conditions [4, 5].
Accurately forecasting solar power is critical in reducing
energy expenses and ensuring high-quality power in electrical
power grids that rely on distributed solar photovoltaic gener-
ation. For residential and small commercial users who utilize
on-site photovoltaic generation, obtaining historical irradiance
data directly can be difficult due to the high cost of solar irradi-
ance meters. However, weather forecasting services offered by
local meteorological organizations have improved and provide
information such as temperature, dew point, humidity, visibil-
ity, wind speed, and descriptive weather summaries through the
internet. Unfortunately, forecasting data for solar power is often
unavailable [6].
Solar power is one of the crucial sources of electricity grid,
and accurate information about the amount of solar power to
be generated from different sources and at various intervals—
minutes, hours, and days—is essential for its optimal utilization.
Forecasting solar power relies on two primary methods based
on the time horizon: statistical time series forecasting for short
to midterm intervals and numerical weather prediction (NWP)
for medium to long-term intervals [7].
Forecasting methods can be broadly categorized into three
groups: physical, statistical, and machine learning methods. In
the physical method, NWP models are used for long-term fore-
casting horizons of one to two days. The statistical method is
based on historical time data series, which is less complex than
the physical method. However, its prediction accuracy is limited
as it relies on the persistence or stochastic time series concept,
while irradiance time series characteristics are non-stationary
[6].
In recent years, the use of machine learning-based (ML) algo-
rithms has become a reliable alternative/complement to NWP
in solar energy prediction problems, given the significant rise
in their ability and accuracy to obtain reliable forecasts [8].
Machine learning is a branch of artificial intelligence that uses
datasets to build a non-linear mapping between input and out-
put data without explicit programming. While the literature has utilized statistically based machine learning forecasting methods, the most commonly seen methods are the support vector machine (SVM), artificial neural network (ANN), and deep learning neural network (DNN); only a few studies utilize random forest (RF) algorithms for solar photovoltaic power forecasting, and the forecasting accuracy of RF algorithms for different site-specific and seasonal data still needs considerable exploration.
In the present study, we developed a multi-step-ahead short-term (15-60 min ahead) solar PV power forecasting model for the selected site based on ensemble RF techniques and domain knowledge. These results were compared with the widely used statistical machine-learning-based algorithms, support vector machine/regression (SVM/SVR) and ANN/DNN. Selecting an appropriate forecasting model is a challenging job, and hence we meticulously compared and evaluated the models using different performance metrics for different seasonal intervals of the data and selected the best model for forecasting solar photovoltaic power from 15 min ahead to 60 min ahead.
The rest of this paper is divided into several sections. In Sec-
tion 2, the literature on machine learning statistical models such
as RF, SVR, and ANN/DNN for predicting short-term solar
photovoltaic power is presented. Section 3 includes information
on the dataset used in this study, data analysis, and the prediction
of solar photovoltaic power values for different time periods
(ranging from 15 to 60 min ahead) using measured weather
parameters data. Section 4 provides a brief introduction to RF, SVR, and ANN/DNN. Section 5 presents the results obtained from the study, and Section 6 concludes the paper.
2 LITERATURE REVIEW
2.1 Works reported on short term solar
photovoltaic power forecasting
Ref. [9] introduced two stochastic models for predicting solar
photovoltaic (PV) system behaviour. They provide short-term,
high-resolution probabilistic forecasts using historical data.
The first model uses uncertain basis functions with three
possible distributions. The second model uses stochastic state-
space models with a filter-based expectation-maximization and
Kalman filtering mechanism. These models are suitable for real-
time use in tertiary dispatch controllers and optimal power
controllers.
Ref. [10] proposed a new hybrid model for short-term solar
PV power forecasting in India using data collected in Kolkata
in 2014 for a year. The model combines GA with ANFIS and
was tested using actual data from a solar power plant in India.
Weekly forecasts were created with the model and its accuracy
was compared to existing models, showing better performance
in forecasting with a mean absolute percentage error (MAPE)
accuracy metric.
In the current situation, most distribution companies bid for
power every 15 min. To address this, [11] developed a short-term solar energy forecasting method with an intelligent approach based
on wavelet transform and generalized neural network (GNN) to
overcome the fluctuating and non-linear nature of solar energy.
Data on global solar irradiance, ambient temperature, relative
humidity, and wind speed were collected at 15-min intervals and
used as input. The proposed GNN model outperformed tradi-
tional models, as shown by statistical indicators like root mean
square error (RMSE) and mean absolute error (MAE).
Reference [12] introduced three deep learning models,
including a convolutional neural network, a long short-term
memory network, and a hybrid model, for photovoltaic power
prediction using data from a 4-year period with 5-min inter-
vals. The models were evaluated using different metrics and
the hybrid model performed the best, followed by the con-
volutional neural network. The study found that longer input
sequences improved the accuracy of the models, but not always.
The deep learning models presented demonstrate the potential
for improving photovoltaic power prediction accuracy.
Reference [13] proposed a precise deep neural network model, called PVPNet, for forecasting PV system output power. It uses meteorological
information and historical data to produce 24-h probabilistic
and deterministic forecasts. PVPNet outperforms other mod-
els with an MAE of 109.4845 and an RMSE of 163.1513 and
is effective in predicting complex time series with high volatility
and irregularity.
Reference [14] proposed a new model combining long-
short-term-memory (LSTM) and wavelet transform (WT) for
short-term solar power prediction in Flanders, Belgium. The
model uses meteorological factors as inputs and is evalu-
ated using statistical measures. Results show it outperforms
other contemporary machine learning and deep-learning-based
models.
Reference [15] proposed methods for forecasting day-ahead
power output time-series for solar power plants, with separate
approaches for ideal and non-ideal weather conditions. The
ideal weather conditions method uses LSTM networks, while
the non-ideal method considers time-series relevance and spe-
cific non-ideal weather characteristics, incorporating adjacent
day time-series, and uses discrete grey model (DGM) to improve
power output prediction. The data was collected over a time
period spanning from 1 November 2016 to 28 October 2017
and was recorded in 15-min intervals. The proposed model
was evaluated using data from a solar power plant in Shan-
dong province, China, and outperformed traditional algorithms
in terms of forecasting accuracy.
Reference [1] proposed a hybrid deep learning model for
predicting PV power 1 h ahead, using data collected at 5-min
intervals from 1 June 2014 to 31 May 2015. The model parti-
tions the PV power series using Wavelet Packet Decomposition
and uses four distinct LSTM networks to handle the sub-series,
with predictions combined using a linear weighting method.
Evaluation against actual data from Alice Springs, Australia
shows better performance than other models such as LSTM,
RNN, GRU, and MLP, according to MBE, MAPE, and RMSE
criteria.
Reference [16] introduced a method for predicting PV power
generation using LSTM neural network. Historical weather data
from Desoto solar farm and Arcadia in Florida were collected
from NREL for 2012–2018, divided into 4 seasons. A synthetic
weather forecast was created using historical solar irradiance
data and publicly available sky forecast data by K-means algo-
rithm. The proposed synthetic weather forecast was found to
improve the accuracy of PV power generation forecasting.
Reference [17] developed a WT-LSTM model for predict-
ing short-term solar power using wavelet transform and long
short-term memory networks. The model decomposes solar
energy time-series data into frequency series and uses LSTM
with dropout to predict future values using meteorological fac-
tors as input. The model outperformed other contemporary
models according to statistical performance measures. Data was
collected from February 2016 to October 2017.
Reference [18] compared deep learning neural networks
(DLNN) including LSTM, BiLSTM, GRU, BiGRU, CNN1D,
CNN1D-LSTM, and CNN1D-GRU for short-term output PV
power forecasting. The models were trained and tested on a
database of PV power generated by a micro grid at the Uni-
versity of Trieste in Italy, evaluated across four different time
periods for both one-step and multi-step forecasting. The study
found high accuracy, especially for one-step forecasting with
a 1 min time horizon, and acceptable results for multi-step
forecasting up to 8 steps ahead.
Reference [19] introduced a model to predict short-term
power output of residential solar panels using genetic algo-
rithms and support vector machines (SVM). The GASVM
technique was trained and verified using real-world data and
outperformed the conventional SVM model with a significant
margin, according to RMSE and MAPE metrics.
Reference [20] presented a new algorithm called hybrid
improved multi-verse optimizer (HIMVO) to optimize sup-
port vector machine (SVM) for predicting photovoltaic output.
The data collection period is from 8:00 AM to 6:00 PM each
day, with power output readings recorded every 10 min. The
HIMVO algorithm incorporates chaotic sequences and was
tested on authentic operational data from photovoltaic arrays
in Alice Springs, Australia. Results show that the HIMVO algo-
rithm is more stable and effective than other optimization
algorithms tested, with higher prediction accuracy and stabil-
ity for different weather types. The proposed method has the
potential to improve photovoltaic output prediction.
Reference [21] investigated an ultra-short-term PV model for
data preprocessing to improve the performance of support vec-
tor machine (SVM) models for predicting PV power. The data
from January 2018 to November 2019 was processed at 5-min
intervals and optimized using ant colony optimization (ACO).
The results showed that appropriate data preprocessing can
increase the model’s regression coefficient (R2) by 6.8%.
Reference [22] introduced a technique for solar energy
forecasting using machine and deep learning methods. The
proposed solution utilizes a single tool and suitable predic-
tive models and was assessed for real-time and short-term
prediction of solar energy. The data used in the study span
from 2016 to 2018 and relate to Errachidia, Morocco, and the
Pearson correlation coefficient was employed to determine rel-
evant meteorological inputs. The RF and ANN models showed
high accuracy, while LR and SVR models reported significant
errors. ANN performed well for both real-time and short-term
predictions.
Reference [23] investigated the relationship between input
parameters and power generated by solar PV panels using SVM
and GPR machine learning models. The input parameters stud-
ied were solar PV panel temperature, ambient temperature, solar
flux, time of the day, and relative humidity. The Matern 5/2
GPR algorithm was found to be the most effective, while the
cubic SVM had the poorest performance. The predicted results
were consistent with experimental values, indicating the suit-
ability of the proposed ML models for predicting the power of
various solar PV panels. The accuracy and efficiency of SVM
and GPR models were compared using the RMSE and MAE
criteria.
Reference [24] proposed a new model based on RF algorithm
for forecasting daily power generation at Zhonghe PV station in
North China. The model was found to be effective in reducing
overfitting and achieved lower mean absolute percentage errors
of 2.83% and 3.89% for clear and cloudy days, respectively.
However, the model’s forecasting errors were relatively high on
unusual weather days, and methods such as increasing train-
ing samples, subdividing, and manual intervention were found
to improve accuracy. The proposed model outperformed the
other three methods in most error evaluation indicators across
all categories.
Reference [25] created a framework to assess different mod-
els and techniques for solar power forecasting. Machine learning
methods, such as random forest, artificial neural network, and
extreme gradient boosting, were tested with feature selection
techniques, including feature importance and principal compo-
nent analysis. The optimal combination was found to be the
XGBoost method with features selected by PCA, which outper-
formed other methods. The framework can be utilized to select
the most suitable machine learning approaches for short-term
solar power forecasting.
Reference [26] developed a forecasting method called
RF-CEEMD-DIFPSO-BPNN for PV power generation fore-
casting, which combines several techniques such as ran-
dom forest (RF), improved grey ideal value approximation
(IGIVA), complementary ensemble empirical mode decompo-
sition (CEEMD), particle swarm optimization algorithm based
on dynamic inertia factor (DIFPSO), and backpropagation neu-
ral network (BPNN). The RF method is utilized to identify the
most significant factors, and the weight values obtained from
RF are transferred to the IGIVA model. Then, the CEEMD
method is applied to reduce the sequence’s fluctuations. The
hybrid model’s effectiveness is confirmed by an empirical analy-
sis, indicating that the RF-CEEMD-DIFPSO-BPNN approach
is a promising method for PV power generation forecasting.
Reference [27] developed a predictive model for PV power
generation using machine learning algorithms. The study was
conducted using data collected at Alice Springs in Australia as
a case study, and a variety of environmental factors were con-
sidered for short-term and long-term energy output prediction.
The study compared several machine learning algorithms and
found that random forest regression was the most effective for
the given dataset.
Reference [28] introduced an improved version of the ran-
dom forest model for data analysis. The approach optimized
bias/variance in STPF application by employing attribute selec-
tion methods, resulting in better forecasting quality. Local
Interpretable Model-Agnostic Explanations, Extreme Boosting
Model, and Elastic Net were used to create a feature-weighting
vector for weather inputs. The proposed approach outper-
formed various data-driven machine learning models when used
in a typical distributed PV system, using a real database from
weather sensors.
Three studies focus on optimizing solar energy forecast-
ing and microgrid operations [29-31]. The first study [29]
addresses the challenges of inaccurate solar power forecasts
and evaluates various machine learning models. It finds that
the Bi-LSTM model performs best, enhancing forecasting accu-
racy. The second study [30] proposes an optimal scheduling
model for microgrids that employs automated reinforcement
learning for load and renewable energy forecasts. The model
reduces operating costs and improves prediction accuracy. The
third study [31] concentrates on predicting solar PV power
generation in Lubbock, Texas, using machine learning models.
Random forest regression and Long Short-Term Memory mod-
els outperform others, capturing complex relationships in solar
power data, aiding in efficient planning and energy production.
These studies collectively contribute to enhancing the accu-
racy and efficiency of solar energy generation and microgrid
operations.
The proposed random forest technique in this work effectively models the non-linear relationships between weather parameters and photovoltaic (PV) power output. Proper data preprocessing, feature selection, and hyperparameter tuning are essential. This approach is robust owing to its non-linearity handling, resistance to overfitting, and feature-importance insights. The direct computation method of kernel functions enhances non-linear learning by improving efficiency, managing dimensionality, ensuring flexibility, and enhancing feature interpretability.
3 STUDY AREA DESCRIPTION, DATA
COLLECTION, AND PREPARATION
We collected the dataset for 12 months (October 2021 to
September 2022) from 20 kW on-grid solar power plant and
a local weather station installed at 22.78°N, 73.65°E, College of
Agricultural Engineering and Technology, Anand Agricultural
University, Godhra, India. The study site has four classic sea-
sons, autumn season from October to November, winter season
from December to February, summer season from March to
May and rainy season from June to September.
The proposed technique can provide information about
the importance of different weather parameters (e.g., temper-
ature, humidity, wind speed) in making predictions. This feature
importance analysis using correlation coefficient analysis using
heat maps helps identify which variables have the most signif-
icant impact on the outcome, allowing the model to adapt to
varying weather conditions. Accordingly, data was collected at
a 15-min time resolution, but only data between 7:00 AM and
5:00 PM was considered due to solar radiation availability. How-
ever, missing values occurred in the dataset due to power failures
affecting the data loggers. This resulted in a dataset of 10,086
samples for the study period, which is summarized in Table 1.
The weather variables (time of the day, Time (Hrs), ambient temperature, Temp (°C), relative humidity, Hum (%), solar radiation, Rad (W/m2), wind speed, Ws (m/s), and wind direction, Wd (°)) were recorded by one data logger, while the other data logger recorded the target variable (solar photovoltaic power, PVoutput (kW)). Enough care was taken in matching the data collected from the two data loggers with respect to the time of data collection.
Figure 1 shows the distribution of the weather variables.
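As a minimal sketch of this preparation step (the file names weather_logger.csv and pv_logger.csv, the common Timestamp column, and the use of pandas are illustrative assumptions; the actual logger export format is not described in the paper), the two logger records can be aligned in time and restricted to the 7:00 AM to 5:00 PM window as follows:

```python
import pandas as pd

# Hypothetical file names and a shared "Timestamp" column are assumed here.
weather = pd.read_csv("weather_logger.csv", parse_dates=["Timestamp"])
pv = pd.read_csv("pv_logger.csv", parse_dates=["Timestamp"])

# Align both loggers on the shared 15-min timestamps; an inner join drops
# records missing from either logger (e.g. during power failures).
data = pd.merge(weather, pv, on="Timestamp", how="inner")

# Keep only the daylight window considered in the study (07:00-17:00).
data = data.set_index("Timestamp").between_time("07:00", "17:00").reset_index()

print(data.describe())   # summary statistics comparable to Table 1
```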
It is crucial to take into account data availability and corre-
lation when choosing variables for a prediction task. Hence, a
statistical examination was performed to assess the correlation
TABLE 1 Description of the data collected for the study period.
Time (Hrs) Temp (°C) Hum (%) Wd (°) Ws (m/s) Rad (W/m2) PVoutput (kW)
Count 10,086 10,086 10,086 10,086 10,086 10,086 10,086
Mean 11.85 30.33 54.19 121.12 2.00 337.63 7.13
Std 2.95 5.61 27.57 73.75 2.02 198.85 4.27
Min 7.00 8.60 0.00 0.00 0.00 0.00 0.00
25% 9.30 27.50 32.00 70.00 0.00 170.00 3.35
50% 12.00 31.20 51.00 91.00 1.80 336.00 6.86
75% 14.30 33.70 76.00 182.00 2.70 503.00 10.79
Max 17.00 45.40 100.00 360.00 12.60 1013.00 20.80
*Study period (October 2021 to September 2022) at 15-min time resolution.
FIGURE 1 Distribution of weather parameters. (a) Ambient temperature (°C), (b) relative humidity (%), (c) solar radiation (W/m2) and (d) wind speed (m/s).
between each available weather variable and solar photovoltaic
power, as demonstrated in Figure 2. It displays the correla-
tion coefficients between all five weather parameters and time
of day (a total of six input variables X1 to X6) in relation to PVoutput (the output or target variable Yt) using the entire dataset.
A negative correlation was detected between temperature and
humidity. The data indicates a strong correlation between solar
radiation and PVoutput. Although a direct and significant relation-
ship between time, temperature, and PVoutput was not evident,
time does appear to have a substantial impact on temperature,
which is in turn correlated with solar radiation. Therefore, time
of day was included as an input variable in the study in addition
to the weather parameters.
In the present study, the entire one-year dataset was divided into seasonal data: for the autumn season, data from 1 October 2021 to 30 November 2021 were considered; for the winter season, from 1 December 2021 to 28 February 2022; for the summer season, from 1 March 2022 to 31 May 2022; and for the rainy season, from 1 June 2022 to 30 September 2022.
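The seasonal split and the per-season correlation analysis behind Figures 2 and 6 could be sketched as below; this assumes the merged DataFrame `data` from the earlier sketch with the column names of Table 1, and uses seaborn only for plotting the heat maps.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Season boundaries as defined in the study (end dates exclusive).
seasons = {
    "autumn": ("2021-10-01", "2021-12-01"),
    "winter": ("2021-12-01", "2022-03-01"),
    "summer": ("2022-03-01", "2022-06-01"),
    "rainy":  ("2022-06-01", "2022-10-01"),
}
feature_cols = ["Time", "Temp", "Hum", "Wd", "Ws", "Rad", "PVoutput"]  # assumed names

for name, (start, end) in seasons.items():
    mask = (data["Timestamp"] >= start) & (data["Timestamp"] < end)
    corr = data.loc[mask, feature_cols].corr()      # Pearson correlation matrix
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title(f"Correlation heat map - {name}")
    plt.show()
```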
4 MATERIAL AND METHODS
Researchers have put great effort into the modelling and forecasting of PV power. Various forecasting techniques have been proposed for forecasting PV power at various time horizons; most importantly, short-term PV power forecasting is essential for controlling, dispatching, and scheduling power [32].
FIGURE 2 Heatmap of correlation coefficients between the input variables and the target variable, solar photovoltaic power output.
Machine learning (ML) involves training a computer system
to gain expertise by processing and analysing data collected over
time, aiming to improve its performance over time [33].
4.1 Random forest (RF)
RF is one of the most widely used machine learning algorithms
due to its simplicity. It can be used for both regression and classification. It belongs to the class of supervised learning algorithms, which also includes SVMs, the naive Bayes algorithm, and other tree-based algorithms such as AdaBoost [34]. It was first developed and proposed by Breiman et al. [38] at the University of California in 2001. Random forest regression is an
ensemble learning technique that integrates predictions from
various machine learning algorithms to produce more precise
predictions than a single model [27]. The proposed random for-
est technique does not require extensive data preprocessing or
imputation of missing values prior to training. This contrasts
with some other machine learning algorithms that may require
a complete dataset or explicit imputation strategies. Random
forest can work with missing data “out of the box”. When ran-
dom forest encounters missing data in the training dataset, it
can handle it by imputing or filling in missing values. Random
forest assesses the importance of features during the training
process. If a feature with missing data is not highly relevant
for the prediction task, the model may naturally give it less
weight, effectively downplaying its contribution to the model’s
predictions. The method constructs trees separately by utiliz-
ing bootstrap data samples, resulting in a forest that includes a
considerable number of decision trees. The accuracy of the fore-
cast increases as more trees are included, leading to improved
precision [35]. Figure 3 demonstrates the configuration of the
random forest model.
To execute random forest regression on the training data set,
the subsequent actions must be taken:
FIGURE 3 Random forest model.
1. Initially, a selection of k data points is made from the input (training) dataset, denoted by x.
2. A decision tree is created that corresponds to these k data points.
3. The first and second steps are reiterated until N decision trees are generated during the training phase.
4. When presented with a new data point, each of the generated trees produces a prediction value y. The data point is then attributed to the average of all predicted y values (a minimal code sketch of this procedure is given after this list).
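A minimal sketch of these steps with scikit-learn's RandomForestRegressor is given below; the column names, the chronological 80/20 split, and the hyperparameter values are illustrative assumptions rather than the exact configuration used in the study.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Assumed column names (Table 1); "data" is the merged DataFrame of Section 3.
feature_cols = ["Time", "Temp", "Hum", "Wd", "Ws", "Rad"]
X, y = data[feature_cols].to_numpy(), data["PVoutput"].to_numpy()

# Chronological 80/20 train/test split (an assumption made for illustration).
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# N bootstrapped decision trees; a test prediction is the average over the trees.
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print("RMSE (kW):", mean_squared_error(y_test, y_pred) ** 0.5)
print("Feature importances:", dict(zip(feature_cols, rf.feature_importances_)))
```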
Random forest regression performs well on diversified problems owing to its ability to handle non-linear relationships.
Recently, RF has gained greater attention among researchers
in the field of PV power forecasting due to its advantages in
ensemble learning and superior performance compared to other
statistical-based machine learning algorithms. Random forests
offer high accuracy, robustness, and versatility in handling
diverse data types. They mitigate overfitting through ensem-
ble learning and serve as a dependable baseline for forecasting.
In contrast, models like ARIMA, Exponential Smoothing, and
Neural Networks’ accuracy relies on data characteristics and
hyperparameter choices. Random forest can be computation-
ally intensive, especially with many trees and features, but it
benefits from parallel processing. Other models like ARIMA
and Exponential Smoothing are generally efficient for univari-
ate time series. Deep learning models like RNNs and LSTMs are
computationally intensive to train. In this study, RF techniques
were used to forecast PV power output from a grid-connected
PV plant. Random forest is robust to outliers due to its ensem-
ble nature, which combines predictions from multiple trees.
While outliers may affect individual tree decisions, their impact
is minimized when aggregated in the ensemble’s majority vote
or average.
To assess the effectiveness of RF, other commonly used mod-
els such as SVR and ANN/DNN were also evaluated and
compared with the RF technique. SVR technique is used as
a reference model for evaluating the skill score of the model
proposed for the selected site.
4.2 Support vector machines (SVM)
The support vector machine (SVM) is an advanced machine-
learning technique that was first introduced by Vapnik and is
known for its high performance [36]. An SVM is a supervised
learning algorithm used for classification and regression analy-
sis [37]. It works by finding the hyperplane that best separates
the data points into different classes. In simple terms, an SVM
tries to find the best line (in two dimensions) or hyperplane (in
multiple dimensions) that separates the data points belonging to
different classes.
The SVM algorithm works by transforming the input data
into a higher-dimensional space using a kernel function. Then,
it finds the hyperplane that maximally separates the transformed
data points into different classes. The points closest to the
hyperplane are called support vectors and are used to define the
hyperplane.
Support vector regression (SVR) and support vector classifi-
cation (SVC) are two types of supervised learning algorithms
based on SVMs, but they are used for different types of
problems.
The main difference between SVR and SVC is that SVR is
used for regression problems, where the goal is to predict a
continuous output variable, while SVC is used for classifica-
tion problems, where the goal is to predict a categorical output
variable.
In SVR, the goal is to find a hyperplane that best fits the
training data points while minimizing the error between the pre-
dicted and actual values. The hyperplane is defined by a set of
support vectors, and the distance between the hyperplane and
the closest data points from each class is called the margin. The
margin is used to control the trade-off between model complex-
ity and generalization performance. SVR uses a loss function
that penalizes errors more for points that are further away from
the hyperplane, resulting in a model that is less sensitive to
outliers.
In SVC, the goal is to find a hyperplane that best separates
the training data points into different classes. The hyperplane is
defined by a set of support vectors, and the distance between the
hyperplane and the closest data points from each class is called
the margin. The margin is used to control the trade-off between
model complexity and generalization performance. SVC uses a
loss function that penalizes errors more for misclassified points,
resulting in a model that is more sensitive to misclassification.
Both algorithms use support vectors and margins to control
the trade-off between model complexity and generalization per-
formance, but they use different loss functions that penalize
errors differently.
Unlike the commonly used empirical risk minimization
(ERM) approach in statistical learning methods, SVMs utilize
the structural risk minimization (SRM) concept to mitigate an
upper bound on the generalization error. This allows SVMs to
have a greater potential to generalize, as opposed to simply min-
imizing the error in the training data. In addition, SVMs are
more likely to find a global optimum solution rather than get-
ting stuck in a local optimal solution like classical neural network
models. SVMs can be used for both classification and regression
problems [38].
4.3 Feature space and kernel functions
The core operating principle of SVMs is to map data onto a
feature space using non-linear mapping and then apply a lin-
ear algorithm. Because the feature space requires dot product
evaluation, it is often high-dimensional and resource-intensive,
requiring significant computational power and time. However,
in some cases, a simpler kernel may be developed and eval-
uated for its effectiveness. In real-world scenarios, complex
problems require more advanced hypothesis spaces than those
provided by linear learning machines, which are limited by their
computational capabilities.
The given attributes cannot be expressed as a basic linear
combination in the target data. Linear learning machines have
a useful characteristic, which is the ability to be represented in a
dual form. This means that the hypothesis can be expressed as a
linear combination of the training points, allowing the decision
rule to be evaluated solely based on the inner products between
the test point and the training points. If it is possible to directly
calculate the inner product in feature space using the original
input points, it may be possible to create a non-linear learn-
ing machine called a direct computation method of the kernel
function, denoted by K[38].
The SVM models utilize input variables that have a con-
nection with the objective variable, which is the variable that
needs to be predicted. This involves representing the data in a
non-linear function f(x) and visualizing it.
$$f(x) = \omega \cdot \varphi(x) + b \tag{1}$$

where $\omega$ is the normal vector, $b$ is a constant (bias) term, and $\varphi(x)$ is a high-dimensional feature mapping of the input vector $x$.

To determine the coefficients $\omega$ and $b$, an optimization problem is solved through minimization:

$$R_{\mathrm{SVM}}(f) = C \frac{1}{N} \sum_{i=1}^{N} L_{\epsilon}\left(f(x_i), y_i\right) + \frac{1}{2}\lVert w \rVert^{2} \tag{2}$$

$$L_{\epsilon}\left(f(x_i), y_i\right) = \lvert f(x_i) - y_i \rvert - \epsilon \quad \text{for } \lvert f(x_i) - y_i \rvert \geq \epsilon \tag{3}$$

$$L_{\epsilon}\left(f(x_i), y_i\right) = 0 \quad \text{otherwise} \tag{4}$$

where $\epsilon$ is the parameter of the model. $L_{\epsilon}(f(x_i), d_i)$ describes the $\epsilon$-insensitive loss function: any errors that fall below the value of $\epsilon$ are not subject to penalty. $d_i$ represents the solar PV power in
FIGURE 4 Illustration of Support Vector Regression.
the period $i$, and $C \frac{1}{N} \sum_{i=1}^{N} L_{\epsilon}(f(x_i), d_i)$ defines the empirical error of the SVM model. $\frac{1}{2}\lVert w \rVert^{2}$ is the regularization term, and $C$ is the penalty factor assessed to balance the trade-off between the empirical risk and model complexity by utilizing the slack variables $\xi_i$ and $\xi_i^{*}$. These variables indicate the presence of excessive upper and lower deviations, respectively.
Equation (2) can be formulated as demonstrated below by utilizing the characteristics of the function that needs to be optimized (illustration shown in Figure 4):

$$\text{minimize} \quad \frac{1}{2}\lVert w \rVert^{2} + C\frac{1}{N}\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \tag{5}$$

subject to

$$y_i - (w \cdot x_i + b) \leq \epsilon + \xi_i, \qquad (w \cdot x_i + b) - y_i \leq \epsilon + \xi_i^{*}, \qquad \xi_i, \xi_i^{*} \geq 0 \tag{6}$$
By utilizing Lagrange multipliers and the optimality constraints, it is feasible to derive a non-linear regression function to solve Equation (1):

$$f(x) = \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b \tag{7}$$

where $\alpha_i, \alpha_i^{*}$ are Lagrange multipliers.

The term $K(x_i, x)$ is defined as a kernel function:

$$K(x_i, x) = \sum_{i=1}^{D} \varphi_i(x)\,\varphi_i(y) \tag{8}$$
There are four main kernel functions available for SVM, namely linear, polynomial, radial basis function, and sigmoid [33].

(i) Linear kernel function

$$K(x_i, x_j) = x_i \cdot x_j \tag{9}$$

where $x_i, x_j$ are the inputs to the $i$th and $j$th dimensions, respectively.

(ii) Polynomial kernel function

$$K(x_i, x_j) = \left(x_i \cdot x_j\right)^{q} \tag{10}$$

where $q$ is the degree of the polynomial.

(iii) Radial basis kernel function

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \tag{11}$$

where $\sigma$ is the kernel width.

(iv) Sigmoid kernel function

$$K(x_i, x_j) = \tanh\left(v\left(x_i \cdot x_j\right) + c\right) \tag{12}$$

where $v$ and $c$ are adjustable kernel parameters relying on the data.
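As a hedged illustration of how such a kernel-based regressor is configured in practice (the kernel choice and the values of C, epsilon, and gamma below are assumptions for illustration; the paper does not report its SVR hyperparameters), an ε-SVR with an RBF kernel can be set up with scikit-learn as follows:

```python
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature scaling matters for kernel methods; C, epsilon and gamma below are
# illustrative values, not the values tuned in this study.
svr_rbf = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale"),
)
svr_rbf.fit(X_train, y_train)        # same chronological split as the RF sketch
y_pred_svr = svr_rbf.predict(X_test)

# The kernels of Equations (9)-(12) correspond to kernel="linear",
# kernel="poly" (with degree=q), kernel="rbf" and kernel="sigmoid".
```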
4.4 Artificial neural networks (ANN) and
deep neural networks (DNN)
ANN emulates the form and function of the natural neural
network found in the human body. ANNs possess the capa-
bility to automatically identify data patterns from previously
included data in the network [39]. ANNs are widely acclaimed
for their ability to model complex and non-linear processes
between input and output variables, making them superior to
other forecasting techniques.
Figure 5 illustrates a simple architecture for an ANN, where
neurons process the input received and produce an output using
their individual activation functions. In ANNs, the learning rate
parameter, number of hidden layers, and maximum iteration
count are essential parameters that regulate the learning process.
The activation functions’ weights and parameters are adjusted
through a process called learning. The number of neurons in
the input, hidden, and output layers may vary, and various acti-
vation functions such as Sigmoid, Rectified Linear Unit, and
Softmax are utilized in ANNs for computation. ANNs have
various advantages such as fault tolerance, parallel processing
capability, and ability to store information across the network,
without losing performance. However, there are also some dis-
advantages to ANNs, such as hardware dependency, which
requires processors with parallel processing power. Additionally,
the lack of interpretability of the network and the unpre-
dictability of the duration of the network are also significant
drawbacks [40].
DNNs are essentially ANNs with multiple hidden layers,
rather than just one. The conventional layers of DNNs can
effectively capture and utilize the fundamental one- or two-
dimensional structure of the network. With the growth of
IoT and the increasing capacity of big data, DNN models
have gained significant attention for use in various research
FIGURE 5 Basic architecture of ANN and DNN. (a) Artificial neural networks (ANN) and (b) Deep neural networks (DNN).
fields. One advantage of DNNs is their ability to capture
non-linear relationships between input features and output tar-
gets. The process involves acquiring knowledge from data by
focusing on learning multiple layers of representations that
gradually become more meaningful representation of the data.
As it delves deeper, it becomes capable of recognizing more
advanced representations, enabling it to establish an accurate
correlation between input characteristics and their intended
target [41].
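A small multi-layer feed-forward network in this spirit can be sketched with scikit-learn's MLPRegressor; the layer sizes, activation function, and iteration limit below are illustrative assumptions, not the architecture used in this study.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two hidden layers give a small "deep" feed-forward network; ReLU activation
# and the Adam optimizer are scikit-learn defaults made explicit here.
dnn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu", solver="adam",
                 learning_rate_init=1e-3, max_iter=1000, random_state=42),
)
dnn.fit(X_train, y_train)            # same chronological split as the RF sketch
y_pred_dnn = dnn.predict(X_test)
```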
4.5 Advantages of RF over other ML
techniques
1. ANN, DNN, and SVM require a larger database, while random forest does not; it is powerful in giving more accurate predictions than the others with less data.
2. Random forest models are relatively easy to interpret. They provide feature importance scores, allowing you to understand the impact of different features on the predictions.
3. Random forest is less sensitive to noisy data and outliers. The
ensemble nature of the model helps mitigate the impact of
individual noisy data points.
4. Random forest models are generally faster to train and
require less computational resources than deep neural
networks, especially when dealing with large datasets.
5. Random forest can naturally handle categorical data without
the need for one-hot encoding or extensive preprocessing.
6. Random forest is generally more stable and less prone to
overfitting, making it a reliable choice when the dataset is
limited, or the data quality is inconsistent.
7. Random forest can perform well with smaller datasets, mak-
ing it suitable for applications where large amounts of data
are not readily available.
5 PERFORMANCE METRICS
The RMSE, the MAPE, and mean absolute arc-tangent percent-
age error (MAAPE) were calculated and are used as evaluation
criteria to validate the error and assess how well the proposed
model is performing. Additionally, the skill score has been cal-
culated, considering one of the statistical models as a reference.
Here in this study, SVR is the reference model.
5.1 Root mean square error
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_i - F_i\right)^{2}} \tag{13}$$

where $A_i$ and $F_i$ denote the actual and forecast values, respectively.
If you need to communicate model performance to non-data
professionals, MAPE would be a better choice than RMSE as
it is much easier to understand. MAPE is expressed as a per-
centage, making it more accessible and comprehensible for end
users who may not have a background in data.
5.2 Mean absolute percentage error
$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{A_i - F_i}{A_i}\right| \times 100\% \tag{14}$$
MAPE suffers from a significant drawback as it can pro-
duce undefined or infinite values when actual values are zero or
close to zero. To address this issue, a new measure of forecast
accuracy called the MAAPE has been developed by looking at
MAPE from a different perspective. While MAPE considers the
slope as a ratio, MAAPE views it as an angle, making it a more
reliable and robust measure. MAAPE overcomes the problem
of division by zero by using bounded influences for outliers in
a fundamental manner. It retains the philosophy of MAPE but
considers the ratio as an angle, rather than a slope, to address
the issue of undefined or infinite values [42].
5.3 Mean arc-tangent absolute percentage
error
$$\mathrm{MAAPE} = \frac{1}{N}\sum_{i=1}^{N}\arctan\left(\left|\frac{A_i - F_i}{A_i}\right|\right) \tag{15}$$

MAAPE is expressed as an angle $\theta$ varying from $0^{\circ}$ to $90^{\circ}$.
5.4 Skill score
There is a common argument that measures of forecast accuracy
ought to be presented as a skill score.
$$\mathrm{SS} = \frac{A_f - A_r}{A_p - A_r} \tag{16}$$

where $A_f$ and $A_r$ represent the 'accuracy', according to some given measure, of the forecasting system of interest and of some reference forecasting system, respectively. The quantity $A_p$ represents the perfect-model accuracy value of the measure; that is, the value of the metric if the outcome were known perfectly.
5.5 Relative skill score
When the perfect-model accuracy $A_p$ is equal to zero, a different statistical measure can be used to compare the performance of two forecasting systems. This measure is defined as the relative skill score, an alternative to the skill score [43].

$$\text{Relative skill score} = \frac{A_f - A_r}{A_r} \tag{17}$$
For calculating the skill score/relative skill score in this work, the accuracy parameter used is MAAPE, owing to its advantages mentioned above.
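The metrics of Equations (13)-(17) can be computed in a few lines of NumPy, as sketched below; the small epsilon guarding against division by zero and the sign convention used for the relative skill score are implementation choices made here for illustration, and y_test, y_pred, and y_pred_svr are assumed to come from the earlier model sketches.

```python
import numpy as np

def rmse(a, f):
    return np.sqrt(np.mean((a - f) ** 2))

def mape(a, f, eps=1e-9):
    return np.mean(np.abs((a - f) / (a + eps))) * 100.0          # per cent

def maape(a, f, eps=1e-9):
    # Mean arc-tangent absolute percentage error, reported in degrees (0-90).
    return np.degrees(np.mean(np.arctan(np.abs((a - f) / (a + eps)))))

def relative_skill_score(a_f, a_r):
    # Positive values mean the forecast of interest beats the reference;
    # Equation (17) is written as (A_f - A_r) / A_r in the text.
    return (a_r - a_f) / a_r

a = np.asarray(y_test)
maape_rf, maape_svr = maape(a, np.asarray(y_pred)), maape(a, np.asarray(y_pred_svr))
print("MAAPE RF / SVR:", maape_rf, maape_svr)
print("Relative skill score of RF over SVR:", relative_skill_score(maape_rf, maape_svr))
```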
6 RESULTS AND DISCUSSIONS
In this study, determining the optimal random forest parameters
involved experimentation and domain knowledge. The key con-
trol parameters identified were the number of estimators and
the random state. Correlation coefficients and heat maps high-
lighted the key data parameters. To capture seasonal variations,
time-based features were added to the dataset, capturing annual
patterns.
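A sketch of this experimentation is given below; it assumes the prepared DataFrame `data` from Section 3, shows how a multi-step-ahead target can be formed by shifting the PV output (one step = 15 min), and searches over a small, purely illustrative grid of estimator counts with a time-series-aware cross-validation.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

steps_ahead = 2                                   # 2 x 15 min = 30 min ahead
df = data.sort_values("Timestamp").copy()
# Shift the PV output backwards so each row is paired with its future value.
# (Simplification: rows spanning the overnight 17:00 -> 07:00 gap are not
# treated specially here.)
df["target"] = df["PVoutput"].shift(-steps_ahead)
df = df.dropna(subset=["target"])

feature_cols = ["Time", "Temp", "Hum", "Wd", "Ws", "Rad", "PVoutput"]  # assumed names
X_all, y_all = df[feature_cols].to_numpy(), df["target"].to_numpy()

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [100, 200, 400]},    # illustrative candidates only
    cv=TimeSeriesSplit(n_splits=5),                  # respects temporal ordering
    scoring="neg_root_mean_squared_error",
)
search.fit(X_all, y_all)
print("Best parameters:", search.best_params_)
```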
Results for correlation coefficients were presented in Figure 6
in the form of seasonal heat maps.
Strong correlation between PV power output, solar radiation,
and ambient temperatures were seen in summer followed by
winter and autumn.
Correlation between temperatures and time were seen strong
in winter season followed by autumn and summer.
It was very clear from the heat maps that the time-temperature and solar radiation-PV output pairs were strongly correlated.
Correlations in the monsoon season were not as strong as in the other seasons.
Although no significant correlation was seen between the other weather parameters and the PV output, to avoid any loss of information, all attributes were considered for the present study.
15 min ahead seasonal PV power forecasted values versus observed values using different ML techniques (RF, DNN, ANN, SVR) are presented in Figure 7, along with the mean arc-tangent absolute percentage errors (which vary between 0° and 90°), which should be low for a good model. In all four seasons, the MAAPE for random forest is observed to be low compared to the other models evaluated in the study. Season-wise analysis shows that the models perform better in the summer season, followed by winter, autumn, and monsoon, based on the MAAPE values.
Figure 8 presents the results of 30 min ahead seasonal PV power forecasted values versus actual values using different ML techniques (RF, DNN, ANN, SVR). In all four seasons, the MAAPE for random forest is observed to be low compared to the other models evaluated in the study and the reference SVR method. Season-wise analysis shows that the models perform better in the summer season, with the lowest MAAPE of 9.29°, followed by winter, autumn, and monsoon, based on the MAAPE values. The errors are lower compared to the 15 min ahead forecasting.
Figure 9 shows the results of 45 min ahead seasonal PV power forecasted values versus actual values. The proposed random forest technique for the study location is found to perform better based on the lower MAAPE values compared to the other techniques used for comparison. These values are on par with the results obtained in 30 min ahead forecasting. Season-wise analysis shows that the models perform better in the summer season, with the lowest MAAPE of 8.57°, followed by winter and autumn, based on the MAAPE values. The errors are lower compared to the 15 and 30 min ahead forecasting.
Figure 10 shows the results of 60 min ahead seasonal PV power forecasting versus actual values. Compared to the reference SVR and the other techniques evaluated in the study, the proposed random forest technique for the study location is found to perform better, as indicated by the lower MAAPE values. These values are on par with the results obtained in the 15, 30, and 45 min ahead forecasting. Season-wise analysis shows that the models perform better in the summer season, with the lowest MAAPE of 9.13°, followed by winter and autumn, based on the MAAPE values.
FIGURE 6 Seasonal variation of correlation coefficient.
With reference to Figures 7-10 and the MAAPE values, it is evident that the random forest technique is superior to DNN, ANN, and SVM (the reference technique), with the lowest MAAPE values ranging from 8.57° to 15.77°.
Various performance metrics for different seasons and forecasting horizons for the various ML techniques are presented in Table 2, but the basis for ranking the performance of the models considered in this study is MAAPE. Although RMSE, r2, and MAE have been used in the reported literature, ranking becomes difficult when the error values are very close for different techniques, as shown in Table 2. This motivates the calculation of MAPE, which gives the percentage error between the results of different models. However, if the data contain values that are zero or close to zero, this percentage, as per the equations shown in the methodology, becomes very large or infinite, so MAPE too sometimes does not provide useful information for judging the performance of the model. To avoid this, a new error metric was considered here, namely MAAPE, which expresses the error as an angle varying between 0° and 90°; any value closer to 0° indicates a good model. As per the MAAPE results shown in the table, except for the monsoon season, the values range between 8° and 15° for the random forest technique, and for all other techniques these values are higher than for the proposed random forest technique.
Further, although MAAPE shows the superiority of RF over the other models, the percentage improvements of all models over the reference model (SVM) were also evaluated using the relative skill score. The relative scores for the forecasting horizons of 45 and 60 min do not show any improvement in the proposed RF model, while showing an appreciable percentage of improvement ranging from 6% to 49% and 3% to 50% for the forecasting horizons of 15 and 30 min, respectively. The lowest and highest improvements are seen in the monsoon and winter seasons, respectively, for both the 15 and 30 min horizons.
The forecasting accuracy for the 15 and 30 min ahead horizons was high compared to the other forecasting horizons. This is because the frequency of input data collection is close to the forecasting horizon. Also, the ensemble nature of random forest combines the predictions of multiple decision trees, each capturing different aspects of the forecasting problem. This ensemble approach improves the model's ability to adapt to the variations and sudden changes in weather conditions at shorter time horizons.
FIGURE 7 15 min ahead seasonal PV forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial neural
networks; SVR, support vector regression.
FIGURE 8 30 min ahead seasonal PV forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial neural
networks; SVR, support vector regression.
FIGURE 9 45 min ahead seasonal PV forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial neural
networks; SVR, support vector regression.
FIGURE 10 60 min ahead seasonal PV power forecasting graphs for different ML techniques. RF, random forest; DNN, deep neural networks; ANN, artificial
neural networks; SVR, support vector regression.
TABLE 2 Performance metrics for different seasons and forecasting horizons for various ML techniques.
15 min 30 min 45 min 60 min
Horizon RF DNN ANN SVM RF DNN ANN SVM RF DNN ANN SVM RF DNN ANN SVM
RMSE Autumn 1.8109 1.7226 1.7330 1.8003 1.7376 1.6748 1.6652 1.7117 1.7025 1.6470 1.8286 1.6998 1.9682 1.8342 1.8940 1.8428
Winter 1.1769 1.2291 1.2547 1.2450 1.0694 1.2982 1.3841 1.3376 1.0800 1.2531 1.3406 1.3556 1.0742 1.3328 1.4035 1.3643
Summer 1.7025 1.4059 0.9405 1.2903 1.2478 1.4140 1.4312 1.3490 1.2676 1.3935 1.6342 1.3171 1.2695 1.4298 1.6682 1.4520
Monsoon 3.7021 3.5697 3.5238 3.6100 3.6052 3.6960 2.9735 3.7791 3.6092 3.6217 3.6537 3.7817 3.5416 3.5920 3.6027 3.7034
r2 Autumn 0.8303 0.8544 0.8527 0.8410 0.8436 0.8619 0.8635 0.8558 0.8474 0.8633 0.8315 0.8544 0.7953 0.8268 0.8154 0.8252
Winter 0.9249 0.9235 0.9203 0.9215 0.9367 0.9135 0.9017 0.9080 0.9362 0.9202 0.9087 0.9066 0.9395 0.9087 0.8987 0.9043
Summer 0.8940 0.8460 0.8482 0.8703 0.8816 0.8483 0.8445 0.8619 0.8783 0.8551 0.8007 0.8705 0.8755 0.8489 0.7943 0.8441
Monsoon 0.3168 0.2989 0.3145 0.2827 0.3224 0.2880 0.2948 0.2558 0.3329 0.3229 0.3026 0.2618 0.3437 0.3083 0.3042 0.2647
MAPE Autumn 0.282 0.420 0.421 0.430 0.235 0.495 0.807 0.436 0.249 0.601 0.798 0.504 0.265 0.642 0.709 0.444
Winter 0.170 0.332 0.383 0.338 0.175 0.355 0.385 0.357 0.175 0.398 0.412 0.389 0.179 0.541 0.545 0.445
Summer 0.149 0.201 0.209 0.184 0.164 0.265 0.265 0.254 0.151 0.280 0.289 0.244 0.161 0.264 0.313 0.285
Monsoon 0.932 1.135 1.140 1.029 0.952 1.154 1.114 0.991 1.001 1.025 1.019 0.870 0.968 1.005 1.034 0.909
MAAPE Autumn 15.77 22.78 22.83 23.29 13.20 38.92 26.34 23.53 13.96 30.99 38.58 26.73 14.84 32.69 35.33 23.95
Winter 9.63 18.37 20.95 18.70 9.92 21.08 19.52 19.66 9.90 21.69 22.38 21.25 10.17 28.40 28.61 24.00
Summer 8.45 11.38 11.83 10.40 9.29 14.83 14.84 14.24 8.57 15.66 16.11 13.73 9.13 14.77 17.35 15.91
Monsoon 42.98 48.62 48.74 45.82 43.59 48.09 49.09 44.74 45.03 45.70 45.53 41.01 44.07 45.14 45.96 42.28
Relative skill score Autumn 0.323 0.022 0.020 0.000 0.439 0.119 0.654 0.000 0.478 0.160 0.443 0.000 0.380 0.365 0.475 0.000
Winter 0.485 0.018 0.121 0.000 0.495 0.007 0.072 0.000 0.534 0.021 0.053 0.000 0.576 0.183 0.192 0.000
Summer 0.188 0.094 0.137 0.000 0.347 0.042 0.041 0.000 0.376 0.141 0.173 0.000 0.426 0.072 0.091 0.000
Monsoon 0.062 0.061 0.064 0.000 0.026 0.097 0.075 0.000 0.098 0.114 0.110 0.000 0.042 0.068 0.087 0.000
7 CONCLUSION
To improve the stability of the grid, accurate forecasting of the PV power output of large grid-connected solar photovoltaic plants is required. In the present study, to improve forecasting accuracy, in situ measurements of the weather parameters and the PV power output of a 20 kW on-grid plant were collected for a typical year covering all four seasons, and the random forest technique was evaluated together with other techniques such as DNNs, ANNs, and SVR (the reference in this study). The simulation results show that the proposed random forest technique performs well for the 15 and 30 min forecasting horizons, with accuracy improvements of 49% and 50%, respectively, over the reference model for the study location (22.78°N, 73.65°E, College of Agricultural Engineering and Technology, Anand Agricultural University, Godhra, India). The proposed random forest technique can be
applied to different locations with varying weather patterns and
grid characteristics; it is essential to adapt the model to the spe-
cific context. This involves collecting relevant data, engineering
appropriate features, and tuning the model to achieve accurate
forecasts for the new location/study location. Additionally, data
privacy concerns are mitigated as the analysis uses locally stored
data without transmission.
Further studies to improve the forecasting accuracy for more
forecasting horizons are to be considered for the future scope
of work.
ABBREVIATIONS
PV Photovoltaic
ML Machine learning
NWP Numerical weather predictions
SVM Support vector machines
SVC Support vector classification
SVR Support vector regression
ANN Artificial neural networks
DNN Deep Neural Networks
RF Random Forest
GA Genetic algorithm
ANFIS Adaptive neuro-fuzzy inference system
GNN Generalized neural network
PVPNet Deep neural network model for PV output power forecasting
LSTM Long-Short-Term-Memory
WT Wavelet transform
DGM Discrete Grey Model
RNN Recurrent neural networks
GRU Gated recurrent unit
DLNN Deep learning neural network
BiLSTM Bidirectional Long Short-Term Memory
BiGRU Bidirectional Gated recurrent unit
HIMVO Hybrid improved multi-verse optimizer
algorithm
ACO Ant colony optimization
XGBoost Extreme Gradient Boosting
CEEMD Complementary Ensemble Empirical Mode
Decomposition
DIFPSO Particle swarm optimization algorithm based on
dynamic inertia factor
BPNN Back propagation neural network
IGIVA Improved grey ideal value approximation
RMSE Root mean squared error
MAE Mean absolute error
MAPE Mean absolute percentage error
MAAPE Mean absolute arc-tangent percentage error
SS Skill score
RSS Relative skill score
AUTHOR CONTRIBUTIONS
Sravankumar Jogunuri: Conceptualization; resources. Josh F.T: Data curation; software. Albert Alexander Stonier: Formal analysis; supervision. Geno Peter: Formal analysis; supervision. Jayakumar Jayaraj: Validation. Jaganathan S: Investigation;
visualization. Jency Joseph J: Methodology; writing—original
draft. Vivekananda Ganji: Writing—review and editing.
FUNDING INFORMATION
The authors received no specific funding for this work.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available
from the corresponding author upon reasonable request.
ORCID
Sravankumar Jogunuri https://orcid.org/0000-0003-4272-5176
Josh F.T https://orcid.org/0000-0002-1580-0567
Albert Alexander Stonier https://orcid.org/0000-0002-3572-2885
Geno Peter https://orcid.org/0000-0002-1825-8427
Jayakumar Jayaraj https://orcid.org/0000-0001-7099-0554
Jaganathan S https://orcid.org/0000-0002-2967-7301
Jency Joseph J https://orcid.org/0000-0001-9909-2256
Vivekananda Ganji https://orcid.org/0000-0002-5646-2138
REFERENCES
1. Li, P., Zhou, K., Lu, X., Yang, S.: A hybrid deep learning model for short-
term PV power forecasting. Appl. Energy 259, 114216 (2020). https://doi.
org/10.1016/j.apenergy.2019.114216
2. Mišák, S., Platoš, J., Krömer, P.: Supervised learning of photovoltaic power plant.
Neural Netw. World 4(13), 321–338 (2013)
3. Nam, K.J., Hwangbo, S., Yoo, C.K.: A deep learning-based forecasting
model for renewable energy scenarios to guide sustainable energy policy:
A case study of Korea. Renewable Sustainable Energy Rev. 122, 109725
(2020). https://doi.org/10.1016/j.rser.2020.109725
4. AlKandari, M., Ahmad, I.: Solar power generation forecasting using
ensemble approach based on deep learning and statistical methods. Appl.
Comput. Inf. (2019) ahead-of-print. https://doi.org/10.1016/j.aci.2019.11.
002
5. Brahma, B., Wadhvani, R.: Solar irradiance forecasting based on deep
learning methodologies and multi-site data. Symmetry 12(11), 1–20 (2020).
https://doi.org/10.3390/sym12111830
6. Qing, X., Niu, Y.: Hourly day-ahead solar irradiance prediction using
weather forecasts by LSTM. Energy 148, 461–468 (2018). https://doi.org/
10.1016/j.energy.2018.01.177
7. Sorkun, M.C., Incel, Ö.D., Paoli, C.: Time series forecasting on multivari-
ate solar radiation data using deep learning (LSTM). Turk. J. Electr. Eng.
Comput. Sci. 28(1), 211–223 (2020). https://doi.org/10.3906/elk-1907-
218
8. Guijo-Rubio, D., et al.: Evolutionary artificial neural networks for accurate
solar radiation prediction. Energy 210, 118374 (2020). https://doi.org/10.
1016/j.energy.2020.118374
9. Dong, J., et al.: Novel stochastic methods to predict short-term solar radi-
ation and photovoltaic power. Renewable Energy 145, 333–346 (2020).
https://doi.org/10.1016/j.renene.2019.05.073
10. Yadav, H.K., Pal, Y., Tripathi, M.M.: A novel GA-ANFIS hybrid model for
short-term solar PV power forecasting in Indian electricity market. J. Inf.
Optim. Sci. 40(2), 377–395 (2019). https://doi.org/10.1080/02522667.
2019.1580880
11. Chaudhary, P., Rizwan, M.: Short term solar energy forecasting
using GNN integrated wavelet-based approach. Int. J. Renewable
Energy Technol. 10(3), 229 (2019). https://doi.org/10.1504/ijret.2019.
101729
12. Wang, K., Qi, X., Liu, H.: A comparison of day-ahead photovoltaic power
forecasting models based on deep learning neural network. Appl. Energy
251, 113315 (2019). https://doi.org/10.1016/j.apenergy.2019.113315
13. Huang, C.J., Kuo, P.H.: Multiple-input deep convolutional neural network
model for short-term photovoltaic power forecasting. IEEE Access 7,
74822–74834 (2019). https://doi.org/10.1109/ACCESS.2019.2921238
14. Li, G., Wang, H., Zhang, S., Xin, J., Liu, H.: Recurrent neural networks
based photovoltaic power forecasting approach. Energies 12(13), 1–17
(2019). https://doi.org/10.3390/en12132538
15. Gao, M., Li, J., Hong, F., Long, D.: Day-ahead power forecasting in a
large-scale photovoltaic plant based on weather classification using LSTM.
Energy 187, 115838 (2019). https://doi.org/10.1016/j.energy.2019.07.
168
16. Hossain, M.S., Mahmood, H.: Short-term photovoltaic power forecast-
ing using an LSTM neural network and synthetic weather forecast. IEEE
Access 8, 172524–172533 (2020). https://doi.org/10.1109/ACCESS.
2020.3024901
17. Mishra, M., Byomakesha Dash, P., Nayak, J., Naik, B., Kumar Swain, S.:
Deep learning and wavelet transform integrated approach for short-term
solar PV power prediction. Measurement 166, 108250 (2020). https://doi.
org/10.1016/j.measurement.2020.108250
18. Mellit, A., Pavan, A.M., Lughi, V.: Deep learning neural networks for short-
term photovoltaic power forecasting. Renewable Energy 172, 276–288
(2021). https://doi.org/10.1016/j.renene.2021.02.166
19. VanDeventer, W., et al.: Short-term PV power forecasting using hybrid
GASVM technique. Renewable Energy 140, 367–379 (2019). https://doi.
org/10.1016/j.renene.2019.02.087
20. Li, L.L., Wen, S.Y., Tseng, M.L., Wang, C.S.: Renewable energy predic-
tion: A novel short-term prediction model of photovoltaic output power.
J. Cleaner Prod. 228, 359–375 (2019). https://doi.org/10.1016/j.jclepro.
2019.04.331
21. Pan, M., et al.: Photovoltaic power forecasting based on a support vec-
tor machine with improved ant colony optimization. J. Cleaner Prod. 277,
123948 (2020). https://doi.org/10.1016/j.jclepro.2020.123948
22. Jebli, I., Belouadha, F.Z., Kabbaj, M.I., Tilioua, A.: Prediction of solar
energy guided by pearson correlation using machine learning. Energy 224,
120109 (2021). https://doi.org/10.1016/j.energy.2021.120109
23. Zazoum, B.: Solar photovoltaic power prediction using different machine
learning methods. Energy Rep. 8, 19–25 (2022). https://doi.org/10.1016/
j.egyr.2021.11.183
24. Meng, M., Song, C.: Daily photovoltaic power generation forecasting
model based on random forest algorithm for north china in winter.
Sustainability 12(6), 2247 (2020). https://doi.org/10.3390/su12062247
25. Munawar, U., Wang, Z.: A framework of using machine learning
approaches for short-term solar power forecasting. J. Electr. Eng. Technol.
15(2), 561–569 (2020). https://doi.org/10.1007/s42835-020-00346-4
26. Niu, D., Wang, K., Sun, L., Wu, J., Xu, X.: Short-term photovoltaic
power generation forecasting based on random forest feature selection
and CEEMD: A case study. Appl. Soft Comput. J. 93, 106389 (2020).
https://doi.org/10.1016/j.asoc.2020.106389
27. Mahmud, K., Azam, S., Karim, A., Zobaed, S., Shanmugam, B., Mathur, D.:
Machine learning based PV power generation forecasting in alice springs.
IEEE Access 9, 46117–46128 (2021). https://doi.org/10.1109/ACCESS.
2021.3066494
28. Massaoudi, M., Chihi, I., Sidhom, L., Trabelsi, M., Refaat, S.S., Oueslati,
F.S.: Photovoltaic power forecasting using weather measurements. Ener-
gies 14(13), 1–20 (2021)
29. Zameer, A., Jaffar, F., Shahid, F., Muneeb, M., Khan, R., Nasir, R.: Short-
term solar energy forecasting: Integrated computational intelligence of
LSTMs and GRU. PLoS One 18(10), e0285410 (2023). https://doi.org/
10.1371/journal.pone.0285410
30. Li, Y., Wang, R., Yang, Z.: Optimal scheduling of isolated microgrids using
automated reinforcement learning-based multi-period forecasting. IEEE Trans.
Sustainable Energy 13(1), 159–169 (2021). https://doi.org/10.1109/TSTE.2021.3105529
31. Balal, A., Jafarabadi, Y.P., Demir, A., Igene, M., Giesselmann, M., Bayne,
S.: Forecasting solar power generation utilizing machine learning models
in Lubbock. Emerging Sci. J. 7(4), 1052–1062 (2023). https://doi.org/10.
28991/ESJ-2023-07-04-02
32. Behera, M.K., Nayak, N.: A comparative study on short-term PV
power forecasting using decomposition based optimized extreme learning
machine algorithm. Eng. Sci. Technol. 23(1), 156–167 (2020). https://doi.
org/10.1016/j.jestch.2019.03.006
33. Álvarez-Alvarado, J.M., Ríos-Moreno, J.G., Obregón-Biosca, S.A.,
Ronquillo-Lomelí, G., Ventura-Ramos, E., Trejo-Perea, M.: Hybrid
techniques to predict solar radiation using support vector machine and
search optimization algorithms: A review. Appl. Sci. 11(3), 1–17 (2021).
https://doi.org/10.3390/app11031044
34. Villegas-Mier, C.G., Rodriguez-Resendiz, J., Álvarez-Alvarado, J.M.,
Jiménez-Hernández, H., Odry, Á.: Optimized random forest for solar radi-
ation prediction using sunshine hours. Micromachines 13(9), 1406 (2022).
https://doi.org/10.3390/mi13091406
35. Chahboun, S., Maaroufi, M.: Novel comparison of machine learning
techniques for predicting photovoltaic output power. Int. J. Renewable
Energy Res. 11(3), 1205–1214 (2021). https://doi.org/10.20508/ijrer.
v11i3.12056.g8252
36. Jang, H.S., Bae, K.Y., Park, H.S., Sung, D.K.: Solar power prediction
based on satellite images and support vector machine. IEEE Trans. Sus-
tainable Energy 7(3), 1255–1263 (2016). https://doi.org/10.1109/TSTE.
2016.2535466
37. Varanasi, J., Tripathi, M.M.: K-means clustering based photo voltaic power
forecasting using artificial neural network, particle swarm optimization
and support vector regression. J. Inf. Optim. Sci. 40(2), 309–328 (2019).
https://doi.org/10.1080/02522667.2019.1578091
38. Ramedani, Z., Omid, M., Keyhani, A., Shamshirband, S., Khoshnevisan,
B.: Potential of radial basis function based support vector regression for
global solar radiation prediction. Renewable Sustainable Energy Rev. 39,
1005–1011 (2014). https://doi.org/10.1016/j.rser.2014.07.108
39. Aslam, S., Herodotou, H., Ayub, N., Mohsin, S.M.: Deep learning based
techniques to enhance the performance of microgrids: A review. In: Pro-
ceedings - 2019 International Conference on Frontiers of Information Technology, FIT
2019 at Islamabad, Pakistan. pp. 116–121 (2019). https://doi.org/10.1109/
FIT47737.2019.00031
40. Yadav, P.K., Bhasker, R., Stonier, A.A., Peter, G., Vijayakumar, A., Ganji,
V.: Machine learning based load prediction in smart-grid under different
contract scenario. IET Gener. Transm. Distrib. (2023). https://doi.
org/10.1049/gtd2.12828
41. Qasem, M., Filik, Ü.B.: Solar radiation forecasting by using deep neural
networks in Eskişehir. Sigma J. Eng. Nat. Sci. 39(2), 159–169 (2021).
https://doi.org/10.14744/sigma.2021.00005
42. Kim, S., Kim, H.: A new metric of absolute percentage error for intermit-
tent demand forecasts. Int. J. Forecasting 32(3), 669–679 (2016). https://
doi.org/10.1016/j.ijforecast.2015.12.003
43. Wheatcroft, E.: Interpreting the skill score form of forecast performance
metrics. Int. J. Forecasting 35(2), 573–579 (2019). https://doi.org/10.
1016/j.ijforecast.2018.11.010
How to cite this article: Jogunuri, S., F.T, J., Stonier,
A.A., Peter, G., Jayaraj, J., S, J., J, J.J., Ganji, V.: Random
forest machine learning algorithm based seasonal
multi-step ahead short-term solar photovoltaic power
output forecasting. IET Renew. Power Gener. 1–16
(2024). https://doi.org/10.1049/rpg2.12921