Developing a Risk Management Approach Based on Reinforcement Training in the Formation of an Investment Portfolio

Abstract

Investments play a significant role in the functioning and development of the economy. Risk management is an integral part of the formation of the investment portfolio. This means that an investor must be willing to take on a certain level of risk in order to receive a certain level of return. However, when forming an investment portfolio, an investor faces such problems as market unpredictability, asset correlation, incorrect asset allocation. Therefore, when forming an investment portfolio, an investor should carefully study all possible risks and try to minimize them. The object of research is an approach to risk management in the formation of an investment portfolio using the method of reinforcement training. The basic principles of formation of the investment portfolio and determination of risks are described. The application of the method of reinforcement training for building a model of risk management of investment portfolio is considered. The process of selecting optimal investment assets based on alternative data sources that minimize risks and maximize profits is also considered. A functional model of the process of risk optimization in the formation of an investment portfolio based on machine learning methods has been developed. The functional model constructed makes it possible to build a process of risk optimization, including asset selection, risk comparison and assessment, to form an investment portfolio and monitor its risks. The study results showed that the proposed approach to the formation of the investment portfolio increased the total growth of the investment portfolio by 0.4363 compared to the base model. Also, the volatility indicator improved compared to the market, as evidenced by the percentage difference between the initial and final cash amount, which increased from 128.98 to 295.57.
Eastern-European Journal of Enterprise Technologies ISSN 1729-3774 2/3 ( 122 ) 2023
Copyright © 2023, Authors. This is an open access article under the Creative Commons CC BY license
Vitalii Martovytskyi
Corresponding author
PhD, Associate Professor*
E-mail: vitalii.martovytskyi@nure.ua
Volodymyr Argunov
Postgraduate Student*
Igor Ruban
Doctor of Technical Sciences, First Vice-Rector**
Yuri Romanenkov
Doctor of Technical Sciences, Professor
Department of Management
National Aerospace University «Kharkiv Aviation Institute»
Chkalova str., 17, Kharkiv, Ukraine, 61070
*Department of Electronic Computers**
**Kharkiv National University of Radio Electronics
Nauky ave., 14, Kharkiv, Ukraine, 61166
Keywords: investment portfolio, risk management, machine learning, actor-critic, learning without a trainer
UDC 004.891.2
DOI: 10.15587/1729-4061.2023.277997
How to Cite: Martovytskyi, V., Argunov, V., Ruban, I., Romanenkov, Y. (2023). Developing a risk management approach based on reinforcement training in the formation of an investment portfolio. Eastern-European Journal of Enterprise Technologies, 2 (3 (122)), 106–116. doi: https://doi.org/10.15587/1729-4061.2023.277997
Received date 07.02.2023
Accepted date 18.04.2023
Published date 28.04.2023
1. Introduction
Investments play a significant role in the functioning
and development of the economy, and changes in physical
volumes and quantitative ratios of investments affect the vo-
lume of public production and employment, the development
of industries and sectors of the economy.
The active development of the world financial market shows that financial instruments have acquired particular importance in the functioning of the global economy. With the intensification of globalization processes, portfolio investment goes through different stages of evolution and acquires new characteristics [1, 2].
Global foreign direct investment (FDI) flows declined
by 35 percent in 2020, reaching USD 1 trillion, from
USD 1.5 trillion in 2019 (Fig. 1) [3]. This is the lowest level
since 2005 and almost 20 % below the 2009 minimum after
the global financial crisis. Lockdowns around the world in
response to the COVID-19 pandemic have slowed down
existing investment projects, and the prospect of recession
has forced multinational corporations (MNCs) to reevaluate
new projects. The fall in FDI was much sharper than the fall
in gross domestic product (GDP) and trade.
That is why industrial companies, concerns, and holdings
need to manage individual innovative projects and portfolios
and optimize the risks associated with them.
The portfolio theory of Harry Markowitz, published in
1952, is considered classic in this area. As part of his theo-
ry [4], Markowitz first expressed the idea of the need to mea-
sure, track, and control not only profitability but also risk. As
quantitative metrics describing interesting characteristics of
portfolio assets, Markowitz proposed using expected returns
and risk levels, which are estimated based on a history of
asset price fluctuations. A key aspect of Markowitz theory is
not only the idea of diversifying a portfolio in order to reduce
the overall level of risk but also the formulation of its quanti-
fication. One drawback is that Markowitz’s analysis did not
involve short positions (negative values of asset weights).
Control processes
Similar studies were conducted by Roy, whose approach, published in [5], is also based on data on expected return and portfolio variance. When forming his model, Roy was guided by the «Safety First» principle [5, 6], that is, the expected profitability should not be less than a predefined level. Unlike Markowitz’s theory, Roy’s model made it possible to take short positions into account in the formation of a portfolio.
A common problem in both approaches is the need to estimate the expected return and variance of assets, as well as their correlation structure. The simplest way to obtain these estimates is from historical data. However, in some cases, this approach can lead to significant estimation errors and, as a result, to poor performance of the resulting portfolio in the future.
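The historical estimation step described above can be illustrated with a minimal sketch. It assumes equally spaced historical returns and uses made-up numbers; the names `covariance` and `portfolio_stats` are hypothetical and do not come from the paper. A negative weight would correspond to a short position.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def portfolio_stats(returns, weights):
    """Expected return and variance of a portfolio from historical returns.

    returns: dict mapping asset name -> list of periodic returns
    weights: dict mapping asset name -> portfolio weight (negative = short)
    """
    assets = list(returns)
    expected = sum(weights[a] * mean(returns[a]) for a in assets)
    variance = sum(
        weights[a] * weights[b] * covariance(returns[a], returns[b])
        for a in assets
        for b in assets
    )
    return expected, variance


# Made-up monthly returns, for illustration only.
history = {
    "A": [0.02, -0.01, 0.03, 0.01],
    "B": [0.01, 0.02, -0.02, 0.00],
}
expected, variance = portfolio_stats(history, {"A": 0.5, "B": 0.5})
```

With only a handful of observations, as here, both estimates are very noisy, which is the estimation-error problem the paragraph above refers to.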
Risk management in the formation of project portfolios is
necessary to solve the following management tasks:
– formation of a balanced portfolio of projects, taking into
account its compliance with the company’s goals and ensuring
the necessary balance between risk and return on investment;
– risk management of portfolio projects, including build-
ing an effective risk management system in the company;
– ensuring the necessary transparency and attractiveness of
the company to investors, insurance companies in order to re-
duce the cost of attracted financing, reduce the cost of insurance
programs, improve credit ratings and the value of the company.
Machine learning methods that do not rely on labeled data, such as cluster analysis, reinforcement learning, and deep neural network training, can help investors identify basic market behavior patterns and build an optimal portfolio.
For example, cluster analysis can help investors divide
assets into clusters according to their interaction and the risks
associated with these assets. Investors can review this infor-
mation and build a portfolio that reflects different clusters,
providing a balance between risk and potential return.
Reinforcement learning and deep neural network training can help investors identify patterns of market behavior and develop risk management strategies based on these patterns.
Consequently, such methods can be useful to investors in practice when forming an investment portfolio. It is important that a study of this kind contains specific recommendations that are accessible and understandable to investors.
Thus, the development of methods and means of risk
management in the formation of investment portfolios is an
urgent task that requires continuous improvement.
2. Literature review and problem statement
The formation of an investment portfolio, on the one hand, is aimed at preserving capital at the expense of conditionally risk-free assets, and on the other hand, at increasing it by including risky assets. Unlike single-asset investment, portfolio investment makes it possible to improve investment conditions by giving a set of assets investment characteristics that are unattainable for a single asset. The main investment characteristic that interests any investor is the ratio of portfolio risk to portfolio return. Finding a balance between these indicators, depending on individual investment goals, is the main task of the theory of investment portfolio management [3, 7, 8]. There is no single unambiguous approach to the formation of an optimal portfolio in financial theory [9].
One of the most well-known methods of risk assessment is value at risk (VaR) [9]. It is a summary quantitative statistical measure of risk that expresses the influence of different risk factors in a single number and takes into account the correlation between them. VaR is the amount that losses will not exceed over a certain period with a predefined probability. The VaR indicator was first used by JP Morgan to improve the efficiency of risk management. Given the peculiarities of regulation and the impact of various factors on financial markets, the effectiveness of the VaR methodology is perceived ambiguously. Despite its widespread use, the parametric method for calculating VaR needs to be improved.
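To make the definition concrete, the sketch below computes VaR in two common ways: the parametric method, which assumes normally distributed returns, and historical simulation, which takes an empirical loss quantile. This is an illustration under stated assumptions, not the calculation used in [9]; the function names and the sample returns are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def parametric_var(returns, confidence=0.95, value=1.0):
    """Parametric VaR, assuming returns are normally distributed."""
    mu, sigma = mean(returns), stdev(returns)
    # Lower-tail quantile of the standard normal (negative for confidence > 0.5).
    z = NormalDist().inv_cdf(1.0 - confidence)
    return max(0.0, -(mu + z * sigma) * value)


def historical_var(returns, confidence=0.95, value=1.0):
    """Historical-simulation VaR: an empirical quantile of observed losses."""
    losses = sorted(-r for r in returns)
    index = min(len(losses) - 1, int(confidence * len(losses)))
    return max(0.0, losses[index] * value)


# Made-up daily returns, for illustration only.
sample = [-0.05, -0.02, 0.0, 0.01, 0.02, 0.03, -0.01, 0.015, 0.005, -0.03]
var_normal = parametric_var(sample, confidence=0.95, value=1_000_000)
var_hist = historical_var(sample, confidence=0.95, value=1_000_000)
```

The two estimates generally differ; the gap widens when returns have heavier tails than the normal distribution, which is one reason the parametric method is criticized.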
Fig. 1. The flow of foreign direct investment in the world and by groups of economies, 2007–2020 (USD billions and percent)
In [10], a practical example of risk management based on rebalancing is given. In addition to risk management methods based on diversification and hedging, the rebalancing method is used to reduce the risks of the investment portfolio. Unfortunately, this approach to risk management and reduction is not always possible. In [11], a new framework, State-Augmented RL (SARL), is presented. Its structure is aimed at solving two unique problems of financial portfolio management:
– heterogeneity of data: the information collected for each asset is usually diverse, noisy, and unbalanced;
– environmental uncertainty: the financial market is multifaceted and non-stationary.
To incorporate heterogeneous data and increase resilience to environmental uncertainty, SARL supplements asset information with a forecast of price movements in the form of additional states.
In [12], the authors presented a flexible approach to the formation of an investment portfolio at the industry level and used the Merton model of contingent claims to assess the impact of a carbon tax shock on the market value of equity and debt instruments. The model was calibrated using detailed company-level vulnerability data. As a result, a decrease in the market value of banks’ assets by 2–13 % of fixed capital was revealed with a tax shock of EUR 100, increasing to 6–29 % with a tax shock of EUR 200. But the results of this approach can only be used as an additional factor for building a general model for forming an investment portfolio.
In [13], the authors reported a study aimed at applying the ELECTRE-TRI and FlowSort methods to stock portfolio selection, one of the most popular and important decision-making problems. They also compared the results of each method to understand how these methods work in the tasks of forming investment portfolios. In this study, the best-worst method was used to determine the weights of the criteria. Four approaches for ELECTRE-TRI were considered. In ELECTRE-TRI, if the results of the pessimistic approach differ greatly from those of the optimistic approach, the category assignments used for portfolio formation are inconsistent. Depending on whether the optimistic or the pessimistic approach is used, nine alternatives belong to the best (first) or worst (third) class but not to the intermediate (second) class. Thus, for the correct formation of the portfolio, additional information may be required, for example, the inclusion of other decision-making criteria or the provision of accurate estimates. The result also shows that using the veto threshold for ELECTRE-TRI does not give a good result for this task.
Based on the analysis, we can conclude that such classical
methods as presented in [9] cannot be fully used to optimize
the risks of forming an investment portfolio. This is due to
the fact that under real conditions, in addition to classical
financial indicators and indicators on the attractiveness of
certain assets, many external factors play a role.
In place of classical approaches, the latest methods and
approaches [10–13] are gradually beginning to be intro-
duced, based on information technology. These methods and
approaches try to introduce additional indicators and param-
eters with the help of which it is possible to assess in a certain
way the influence of certain external factors in optimizing the
risks of forming an investment portfolio. But such methods
and approaches are rather highly specialized and/or aimed at
certain types of investments, or at certain factors that affect
the risk assessment of certain types of assets.
That is why research and development of risk manage-
ment methods in the formation of an investment portfolio
become relevant.
The transition from post-industrial to information so-
ciety, which takes place in the XXI century, cannot be
imagined without intensive information exchange and de-
velopment of information systems. The key communicative
role in it belongs to networks that freely form an association
of people and interest groups. Communication technologies
make it possible to create social communities (Internet com-
munity) with almost any given characteristics – educational,
professional, age. They are formed against the background of
acceleration of social time and strengthening the dynamics
of communication forms in the process of social reproduc-
tion. At the same time, stable relations give way to constant
changes, and society becomes similar to reflective and com-
munication communities [14, 15].
Therefore, our study proposes an approach to risk opti-
mization in the formation of an investment portfolio, which
will include, in addition to classical indicators, the impact of
social media on asset price volatility.
This choice of additional indicators is due to the fact that
by analyzing social media, you can get not only information
from reputable investors or public figures but various discus-
sions that are related to a particular asset.
Reinforcement learning is an effective approach for building an investment portfolio. In this approach, an agent (for example, an investor) interacts with the environment (for example, the stock market) and makes decisions based on the rewards received (profit or loss).
In the case of forming an investment portfolio, the agent can choose different assets (for example, stocks, bonds, funds, etc.) and distribute capital among them according to its goals and strategy. The agent receives rewards in the form of investment profits and penalties in case of losses.
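The interaction loop described above can be sketched as follows. The environment `ToyMarket`, the placeholder `random_policy`, and the data are hypothetical and only illustrate the state, action, and reward roles; they are not the environment used in this study.

```python
import random


class ToyMarket:
    """Hypothetical environment: each step reveals one period of asset returns."""

    def __init__(self, returns_by_period):
        self.returns = returns_by_period
        self.t = 0

    def reset(self):
        self.t = 0
        return self.returns[0]  # state: the most recently observed returns

    def step(self, weights):
        """Apply portfolio weights to the next period; reward is the portfolio return."""
        self.t += 1
        period = self.returns[self.t]
        reward = sum(w * r for w, r in zip(weights, period))
        done = self.t == len(self.returns) - 1
        return period, reward, done


def random_policy(state, rng):
    """Placeholder agent: random long-only weights that sum to 1."""
    raw = [rng.random() for _ in state]
    total = sum(raw)
    return [x / total for x in raw]


def run_episode(env, policy, seed=0):
    """Run one episode and return the sum of rewards (profits and losses)."""
    rng = random.Random(seed)
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = policy(state, rng)             # agent distributes capital
        state, reward, done = env.step(action)  # environment responds
        total += reward
    return total
```

A learning agent would replace `random_policy` with a policy whose parameters are adjusted from the observed rewards.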
The vast majority of reinforcement learning (RL) and neurodynamic programming (NDP) methods fall into one of the following two categories:
– actor-only methods work with a parameterized family of policies. The gradient of the performance with respect to the actor’s parameters is estimated directly by simulation, and the parameters are updated in the direction of improvement [16–20]. A possible disadvantage of such methods is that the gradient estimates may have a large variance. Moreover, as the policy changes, a new gradient is estimated independently of previous estimates. Consequently, there is no «learning» in the sense of accumulating and consolidating older information;
– critic-only methods rely solely on the approximation of the value function and aim to obtain an approximate solution to the Bellman equation, from which a near-optimal policy is then derived [21, 22]. Such methods are indirect because they do not try to optimize directly over the policy space. A method of this type can succeed in constructing a «good» approximation of the value function yet lack reliable guarantees in terms of the near-optimality of the resulting policy.
Actor-critic methods aim to combine the strengths of actor-only and critic-only methods. The critic uses an approximation architecture and simulation to learn the value function, which is then used to update the actor’s policy parameters in the direction of performance improvement. Such methods, if gradient-based, may have desirable convergence properties, unlike critic-only methods, for which convergence is guaranteed only under very limited conditions. They also promise faster convergence (by reducing variance) compared to actor-only methods. On the other hand, the theoretical understanding of actor-critic methods has been limited to the case of lookup-table representations of policies [23]. That is why this work adopts the actor-critic approach to learning.
3. The aim and objectives of the study
The aim of this work is to develop an approach to risk management in the formation of an investment portfolio based on machine learning methods. This will provide an opportunity to analyze the impact of social media on asset price volatility. In turn, this will make it possible to comprehensively assess the risks in the formation of the investment portfolio and, on this basis, to build a decision support system for the formation of the company’s investment portfolio.
To accomplish the aim, the following tasks have been set:
– to build a functional model of the process of risk opti-
mization in the formation of an investment portfolio based on
machine learning methods;
– to conduct an experimental study on the proposed ap-
proach.
4. The study materials and methods
The object of research is the process of risk management in the formation of an investment portfolio based on reinforcement learning.
The main hypothesis of the study assumes that the use of reinforcement learning methods will make it possible to manage risks effectively in the formation of an investment portfolio, reducing costs in portfolio formation and increasing portfolio profitability.
Actor-critic algorithms were considered as stochastic gradient algorithms in the actor’s parameter space. When the actor’s parameter vector is θ, the critic’s job is to compute an approximation of the projection Πθ. The actor uses this approximation to update its policy in an approximate gradient direction.
The actor-critic algorithm is similar to the policy gradient algorithm REINFORCE [24] with a baseline. REINFORCE is a Monte Carlo method, meaning that the total return is computed from the full trajectory, whereas the actor-critic algorithm uses bootstrapping. So, the main change is in the advantage function. In REINFORCE with a baseline, it takes the form:
$$A(s_t, a_t) = \sum_{t=0}^{T-1} r_t - b(s_t). \tag{1}$$
In the actor-critic algorithm, the baseline b(st) is replaced by the value function of the current state. The advantage can then be represented as follows:
$$A^{\pi_\theta}(s_t, a_t) = r(s_t, a_t) + V^{\pi_\theta}(s_{t+1}) - V^{\pi_\theta}(s_t). \tag{2}$$
The state value Vπ(s) is the expected total reward when starting in state s and acting in accordance with policy π. If the agent uses a predefined policy to select actions, the corresponding value function is defined as:
$$V^{\pi}(s) = \mathbb{E}\left[\sum_{i=1}^{T} \gamma^{i-1} r_i\right] \quad \forall s \in S. \tag{3}$$
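As an illustration of formula (3), the sketch below computes the discounted sum inside the expectation and averages it over sampled episodes, which is the simplest (Monte Carlo) way to estimate a state value. The function names and the default discount factor are assumptions made for this example.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**(i-1) * r_i for i = 1..T, the quantity inside formula (3)."""
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def monte_carlo_value(episodes, gamma=0.99):
    """Estimate V(s) by averaging discounted returns of episodes that start in s."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)
```

For example, `discounted_return([1, 1, 1], gamma=0.5)` evaluates 1 + 0.5 + 0.25.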
The optimal value function is the one that attains the highest possible value in every state compared to any other value function:
$$V^{*}(s) = \max_{\pi} V^{\pi}(s) \quad \forall s \in S. \tag{4}$$
In RL, if we know the optimal value function, then the policy corresponding to the optimal value function is the optimal policy π*.
Alternatively, the advantage function is the TD error, as shown in the actor-critic scheme in Fig. 2.
Fig. 2. Actor-critic algorithm scheme (diagram labels: Environment, Actor, Policy, Critic, Value function, TD error, State, Action, Reward)
The actor’s policy gradient can be expressed as follows:
$$\nabla_{\theta} J(\theta) \approx \sum_{t=0}^{T-1} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, A^{\pi_\theta}(s_t, a_t). \tag{5}$$
The actor-critic algorithm [25] can be represented as follows:
Step 1. Sample a pair (st, at) using the policy πθ from the actor network.
Step 2. Evaluate the advantage function At, which can be called the TD error. In the actor-critic algorithm, the advantage function is produced by the critic network using formula (2).
Step 3. Estimate the gradient by expression (5).
Step 4. Update the policy parameters θ using the formula:
$$\theta = \theta + \alpha \nabla_{\theta} J(\theta). \tag{6}$$
Step 5. Update the critic’s weights using value-based RL (Q-learning); δt corresponds to the advantage function:
(7)
Step 6. Repeat steps 1 to 5 until the optimal policy πθ is found.
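Steps 1 to 6 can be sketched in a few lines for a tabular problem. This is a minimal illustration, not the implementation used in the study: the actor is a table of softmax preferences, the critic is a table of state values updated by a plain TD(0) rule instead of a Q-learning network, a discount factor gamma is added as an assumption (formula (2) has none), and `step_fn` is a hypothetical environment callback.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(step_fn, n_states, n_actions, episodes=2000,
                       horizon=10, alpha=0.1, beta=0.1, gamma=0.9, seed=0):
    """One-step actor-critic with tabular parameters (illustrative sketch).

    step_fn(state, action, rng) -> (reward, next_state) is a hypothetical
    environment. theta[s][a] holds the actor's preferences, v[s] the critic's
    state-value estimates.
    """
    rng = random.Random(seed)
    theta = [[0.0] * n_actions for _ in range(n_states)]
    v = [0.0] * n_states
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            probs = softmax(theta[s])
            # Step 1: sample an action from the current policy.
            a = rng.choices(range(n_actions), weights=probs)[0]
            r, s_next = step_fn(s, a, rng)
            # Step 2: TD error used as the advantage estimate.
            delta = r + gamma * v[s_next] - v[s]
            # Steps 3-4: gradient of log softmax, then a policy-gradient step.
            for b in range(n_actions):
                grad_log = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += alpha * delta * grad_log
            # Step 5: critic update towards the TD target.
            v[s] += beta * delta
            s = s_next
    # Step 6 corresponds to the loops above repeating until training ends.
    return theta, v
```

On a toy problem where one action always pays more than the other, the learned policy should concentrate its probability on the better action.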
Taking into account the peculiarities of reinforcement
learning methods, the following simplifications were adopted
in the current study:
– lack of consideration of macroeconomic factors in the
formation of the portfolio;
– lack of consideration of individual needs and limita-
tions of investors when forming a portfolio.
5. Results of investigating the risk management approach
in the formation of the investment portfolio
5.1. Functional model of the risk management process in the formation of an investment portfolio
Investment risks are understood as the reasons for the
volatility of investment returns. All investments are subject to
different risks. The greater the volatility of prices, the higher
the level of risk. Understanding the risks involved in owning
different securities is crucial to building the right investment
portfolio. Probably, it is the high risk that discourages many
investors from investing in stocks and forces them to keep
money in so-called risk-free savings deposits, certificates of
deposit, and bonds.
After analyzing the methods of risk assessment in the
formation of the investment portfolio [3–14], it can be con-
cluded that approaches to risk assessment in the formation
of the investment portfolio should include three categories
of indicators:
1) classic financial data that can be obtained from public
reports of companies;
2) technical financial data (indicators), which make it
possible to obtain data on future prices using quotes data for
a certain period of time;
3) alternative non-financial data, which include the fol-
lowing indicators:
– economic and technological threats: inflation, econom-
ic sanctions, rising prices for resources;
– political and legal threats, for example,
imperfection of legislation in a certain region
where the company’s assets are concentrated;
– socio-demographic threats, such as re-
duced purchasing power;
– potential threats from social media: analysis of public opinion, analysis of messages of key opinion leaders in social networks.
Taking into account all these indicators,
a general structure of the approach to risk op-
timization in the formation of the investment
portfolio was formed.
To describe the approach to risk management in the formation of an investment portfolio, it is proposed to use the following tuple:
$$\mathrm{RM} = \{S, \mathrm{Actor}, \mathrm{Critic}, \mathrm{RL}\}, \tag{8}$$
where S is the set of environment states, formed on the basis of data collected in the process of monitoring asset states; Actor is an asset management agent for the formation of an investment portfolio, designed to assess the risk of the i-th asset and decide whether to include the asset in the portfolio; Critic is an agent whose task is to approximate the Q-function, the utility function of control; RL is a reinforcement learning algorithm that forms an optimal risk management policy when forming an investment portfolio. During training, the system (Actor, Critic) learns by interacting with the environment.
Each element of the environment state is described by the following tuples:
$$S = \{\mathrm{fin}, \mathrm{sem}\}, \tag{9}$$
$$\mathrm{fin} = \{\mathrm{opA}, \mathrm{opC}, \mathrm{am}, \mathrm{lq}, \mathrm{pr}, \mathrm{st}, \mathrm{dbt}, \mathrm{mt}\}, \tag{10}$$
where sem is the i-th set of indicators obtained from semantic analysis of social media resources; fin is the set of classical financial indicators, where:
opA – set of operational analysis indicators;
opC – set of operating cost indicators;
am – set of asset management indicators;
lq – set of liquidity indicators;
pr – set of profitability indicators;
st – set of capital structure indicators;
dbt – set of debt service indicators;
mt – set of market indicators.
In turn, Actor and Critic are two independent neural networks, where π(s, a, θ) is a policy function that controls the actions of our agent and q̂(s, a, w) is a value function that measures how good these actions are.
Since we have two models (Actor and Critic) that need to be trained, we have two sets of weights (θ for the actor and w for the critic) that must be optimized separately:
$$\Delta\theta = \alpha \nabla_{\theta}\left(\log \pi_{\theta}(s, a)\right)\hat{q}_{w}(s, a), \tag{11}$$
$$\Delta w = \beta\left(R(s, a) + \gamma \hat{q}_{w}(s_{t+1}, a_{t+1}) - \hat{q}_{w}(s_t, a_t)\right)\nabla_{w}\hat{q}_{w}(s_t, a_t), \tag{12}$$
where α and β are the learning rates of the actor and the critic, and γ is the discount factor.
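Update rules (11) and (12) can be written out for the simplest case of a linear critic, where the gradient of the value estimate with respect to w is the feature vector itself. The sketch below is an illustration under that assumption; the study itself uses neural networks, and the function names, feature vectors, and step sizes here are hypothetical.

```python
def q_value(w, features):
    """Linear critic: q_w(s, a) is the dot product of w and phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features))


def critic_update(w, phi, reward, phi_next, beta=0.01, gamma=0.99):
    """One application of rule (12) for a linear critic.

    phi and phi_next are feature vectors of (s_t, a_t) and (s_{t+1}, a_{t+1}).
    """
    td_error = reward + gamma * q_value(w, phi_next) - q_value(w, phi)
    return [wi + beta * td_error * fi for wi, fi in zip(w, phi)]


def actor_update(theta, grad_log_pi, q, alpha=0.01):
    """One application of rule (11): theta += alpha * grad(log pi) * q."""
    return [ti + alpha * gi * q for ti, gi in zip(theta, grad_log_pi)]
```

Keeping the two updates as separate functions mirrors the separate optimization of θ and w described above.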
At each step t, we take the current state (St) from the environment and pass it as input to our Actor and Critic. The policy takes the state, selects an action regarding an asset (At), and receives a new state (St+1) and a reward (Rt+1), Fig. 3.
As a result:
– the critic calculates the value of performing this action in this state;
– the actor updates its policy parameters (weights) using the q value.
With its updated parameters, the Actor selects the next action At+1 given the new state St+1. The critic then updates its parameters.
Since value-based methods have high variance, it is proposed to replace the value function with the advantage function (2).
This function reports the improvement compared to the average value of the actions available in this state.
In other words, this function calculates the additional reward we get if we perform this action. The additional reward is whatever goes beyond the expected value of that state.
If A(s, a) > 0 (our action is better than the average for this state), the gradient is pushed in that direction.
If A(s, a) < 0 (our action is worse than the average for this state), the gradient is pushed in the opposite direction.
5.2. Experimental research on the proposed approach
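The following set of features was used to conduct the experiment:
$$
\begin{aligned}
\{&\mathrm{date}, \mathrm{asset}, \mathrm{open}, \mathrm{high}, \mathrm{low}, \mathrm{close}, \mathrm{volume}, \mathrm{adjcp}, \mathrm{turbulence},\\
&\mathrm{MACD\_12\_26\_9}, \mathrm{MACDh\_12\_26\_9}, \mathrm{MACDs\_12\_26\_9},\\
&\mathrm{ADX\_14}, \mathrm{DMP\_14}, \mathrm{DMN\_14}, \mathrm{CCI\_14\_0.015}, \mathrm{RSI\_14},\\
&\mathrm{mentions}, \mathrm{tweets\_count}, \mathrm{tweets\_normalized},\\
&\mathrm{sentiments\_negative}, \mathrm{sentiments\_neutral}, \mathrm{sentiments\_positive}\},
\end{aligned} \tag{13}
$$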
Fig. 3. Diagram of the actor-critic training process (diagram labels: Environment, S; policy πθ and value estimate q̂w at states St, St+1, St+2; actions At, At+1; rewards Rt+1)
where date – date (period);
asset – ticker (exchange code) of the asset;
open ∈ fin – opening price of the asset for the period;
high ∈ fin – maximum asset price for the period;
low ∈ fin – minimum asset price for the period;
close ∈ fin – closing price of the asset for the period;
volume ∈ fin – total amount of the asset traded during the period;
adjcp ∈ fin – adjusted closing price of the asset, reflecting its value after taking into account corporate actions such as payment of dividends, divestitures, mergers, etc.;
turbulence – index of financial turbulence;
MACD_12_26_9 – MACD indicator, MACD line;
MACDh_12_26_9 – MACD indicator, histogram;
MACDs_12_26_9 – MACD indicator, signal line;
ADX_14 – ADX indicator, trend line;
DMP_14 – ADX indicator, positive directional movement;
DMN_14 – ADX indicator, negative directional movement;
CCI_14_0.015 – CCI indicator;
RSI_14 – RSI indicator;
mentions – total number of mentions in social text messages that were collected and analyzed for sentiment over a certain period;
tweets_count – the same mentions, restricted to «tweets», that is, original messages on the social network Twitter, without retweets, reposts, and other «social activity»;
tweets_normalized – daily average polarity, where polarity is a superposition of the negative, neutral, and positive sentiment scores in the range [–1, 1];
sentiments_negative ∈ sem – aggregated negative score based on the sentiment analysis of social text messages related to the asset for a certain period;
sentiments_neutral ∈ sem – aggregated neutral score based on the sentiment analysis of social text messages related to the asset for a certain period;
sentiments_positive ∈ sem – aggregated positive score based on the sentiment analysis of social text messages related to the asset for a certain period.
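For readers unfamiliar with the technical indicators in the feature set, the sketch below shows how the three MACD columns with parameters 12, 26, and 9 could be derived from closing prices. It is an illustration only: the exponential moving average here is seeded with the first value, which is an assumption and may differ from the seeding used by the indicator library in the study, so the first values of the series can differ.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Seeded with the first observation (an assumption of this sketch).
    """
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def macd(close, fast=12, slow=26, signal=9):
    """Return the MACD line, the signal line, and the histogram."""
    line = [f - s for f, s in zip(ema(close, fast), ema(close, slow))]
    signal_line = ema(line, signal)
    hist = [m - s for m, s in zip(line, signal_line)]
    return line, signal_line, hist
```

On a steadily rising price series the fast average stays above the slow one, so the MACD line is positive; on a flat series all three outputs are zero.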
The parameters when setting up the model were selected as follows:
lr = 0.0001 – learning rate;
vf_loss_coeff = 0.5 – coefficient of the value function loss;
entropy_coeff = 0.01 – coefficient of the entropy regularizer (controls the freedom of exploration);
model_fcnet_hiddens: [512, 512] – sizes of the hidden layers of the agent’s network.
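These settings might be collected into a configuration dictionary of the kind accepted by the dictionary-style trainer configuration in Ray RLlib. The sketch below is an assumption about how they fit together: nesting `fcnet_hiddens` under a `model` key follows RLlib's convention, while the algorithm class and the Ray version are not stated in the source, so passing the dictionary to a trainer is deliberately left out.

```python
# Hypothetical configuration dictionary; the values are those listed above.
config = {
    "lr": 0.0001,             # learning rate
    "vf_loss_coeff": 0.5,     # weight of the value-function loss term
    "entropy_coeff": 0.01,    # weight of the entropy regularizer
    "model": {
        "fcnet_hiddens": [512, 512],  # two hidden layers of 512 units each
    },
}
```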
When implementing the model, the Ray framework was used both for building the agent model and for distributed tuning and training, together with a customized environment built on OpenAI Gym that takes into account the financial features of the model. In addition, a small framework was written to compile datasets and calculate technical indicators from configuration files in order to speed up this process. Backtesting was also implemented with Quantopian’s Empyrical in mind.
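The kind of statistics a backtest typically reports can be sketched in plain Python. The functions below are an illustration written for this text and do not call Empyrical; the names, the 252-period annualization, and the choice of metrics are assumptions rather than the study's actual backtesting code.

```python
import math
from statistics import stdev


def cumulative_return(returns):
    """Total growth of one unit of capital over the whole series, minus 1."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def annual_volatility(returns, periods_per_year=252):
    """Standard deviation of periodic returns scaled to one year."""
    return stdev(returns) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve (a negative number)."""
    peak, growth, worst = 1.0, 1.0, 0.0
    for r in returns:
        growth *= 1.0 + r
        peak = max(peak, growth)
        worst = min(worst, growth / peak - 1.0)
    return worst
```

Comparing these figures for a model trained with and without alternative data is one simple way to quantify the effect the experiment is looking for.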
All experiments were conducted on 2–3 virtual machines (Google Cloud VM) with 8 cores and 16 GB of RAM. Tuning used 500 iterations for each grid search element (1.5–2 hours per element); training used 1000 iterations (3.5–4 hours).
To test the performance of the proposed approach and
conduct the experiment, several models were trained. The
architecture of the model, the environment, etc., are the same;
the models differ only in the composition of the dataset:
1) full dataset without alternative financial data;
2) full dataset with alternative data from Twitter;
3) full dataset with alternative data from the news.
In this case:
– a random seed was set in every module of the model where
this is possible and does not disrupt the learning process, to
ensure maximum similarity at all stages and reproducibility of
the whole process;
– a sufficiently large number of iterations was chosen so
that any remaining differences are insignificant, since, with
the other seeds fixed, the possible paths of the exploration
process converge to the same optimal solutions.
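The seeding step described above can be sketched as follows. Only the Python standard library is seeded here; the seed value, the helper names, and the stand-in sampling function are hypothetical, and numerical or deep-learning libraries keep their own generators that would have to be seeded separately in a real setup.

```python
import os
import random

SEED = 42  # hypothetical value; the source does not state the seed used


def seed_everything(seed):
    """Seed the standard-library sources of randomness."""
    # Affects only subprocesses started after this point; the hash seed
    # of the current interpreter is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)


def sample_actions(n):
    """Stand-in for a stochastic exploration step."""
    return [random.random() for _ in range(n)]


seed_everything(SEED)
first_run = sample_actions(5)
seed_everything(SEED)
second_run = sample_actions(5)
print(first_run == second_run)
```

With the same seed, both runs draw identical samples, which is the property the comparison between models relies on.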
The hyperparameters of the model were tuned on the full
dataset without alternative financial data to achieve good results
for the model as a whole. The experiment was then conducted
with the same architecture and environment hyperparameters for
all models, in order to prove or disprove the real effect of
alternative non-financial data on the quality of the model.
Importantly, training took place on so-called rollout
fragments, which are randomly selected fragments of the entire
time series. This approach excludes both «bias» and memorization
of the best choice, as well as overfitting, which is possible in
such systems, although only minimally. For the sake of the purity
and reliability of the comparative experiment, such effects
should be excluded.
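The idea of training on randomly selected fragments of a time series can be illustrated with a short sketch. It only mirrors the description in the text and is not a reproduction of Ray's internal sampling; the function name, the fragment length, and the toy price series are assumptions.

```python
import random


def sample_fragments(series, fragment_length, count, rng):
    """Draw `count` contiguous windows at random start positions."""
    if fragment_length > len(series):
        raise ValueError("fragment longer than the series")
    last_start = len(series) - fragment_length
    fragments = []
    for _ in range(count):
        start = rng.randint(0, last_start)  # inclusive on both ends
        fragments.append(series[start:start + fragment_length])
    return fragments


prices = [100 + i for i in range(50)]   # toy price series
rng = random.Random(0)                  # fixed seed for reproducibility
fragments = sample_fragments(prices, fragment_length=10, count=4, rng=rng)
for fragment in fragments:
    print(fragment[0], "...", fragment[-1])
```

Because every fragment starts at a random position, the agent does not see the series in one fixed order, which limits memorization of a single best trajectory.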
After training the models, we carried out:
– classical validation, in the form of monitoring loss, vf_loss,
and other standard metrics of the learning process. These
metrics are not informative for the purposes of the experiment,
so they are not included in its results; they were only checked
to confirm that the learning process was correct;
– for the main and additional experiments, a basic analysis
of model performance and several basic backtesting metrics;
– for the main experiment, full backtesting of the models and
analysis of the relevant results.
The paired plots below show how the comparison was made,
together with the key observations for all experiments. All
backtesting metrics and their specific plots are given for the
main experiment. In each pair, the first plot is for the model
without alternative data and the second is for the model with them.
Older «alt-data» either does not exist or cannot be obtained
in principle; given the results, this is for the better. The
training was therefore conducted on data obtained from sources
starting in 2022 and early 2023, that is, during a period of
severe global recession in markets, so some stability indicators,
such as MDD, Downside Deviation, Stability, and Beta, have low
and similar values.
Global backtest metrics (in basic model/alt-data format)
are given in Tables 1, 2.
Table 1
Backtesting results – global metrics for the base model
Metrics    Value     Metrics      Value
Gain       128.98    Omega        1.51
CAGR       1.34      Volatility   0.42
Sharpe     2.27      Return       1.34
Tail       1.07      Stability    0.84
Sortino    3.48      Downside     0.27
MDD        –0.23     Beta         1.18
Calmar     5.87      Alpha        1.68
Table 2
Backtesting results – global metrics for alternative data
Metrics    Value     Metrics      Value
Gain       295.57    Omega        1.96
CAGR       3.11      Volatility   0.43
Sharpe     3.5       Return       3.11
Tail       1.16      Stability    0.92
Sortino    5.93      Downside     0.26
MDD        –0.23     Beta         1.13
Calmar     –0.18     Alpha        3.73
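Several of the metrics in Tables 1, 2 are built from a few simple quantities. The sketch below shows commonly used definitions of maximum drawdown (MDD), the Sortino ratio, and the Calmar ratio on a toy series of daily returns. The 252-period annualization factor, the zero target return, and the sample data are assumptions; the backtesting library used in the study may differ in detail.

```python
import math


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def annual_return(returns, periods_per_year=252):
    """Compound annual growth rate implied by periodic returns."""
    total = math.prod(1.0 + r for r in returns)
    return total ** (periods_per_year / len(returns)) - 1.0


def sortino_ratio(returns, periods_per_year=252, target=0.0):
    """Annualized mean excess return over downside deviation."""
    excess = [r - target for r in returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        return math.inf
    mean_excess = sum(excess) / len(excess)
    return mean_excess / downside * math.sqrt(periods_per_year)


def calmar_ratio(returns, periods_per_year=252):
    """Annual return divided by the magnitude of the maximum drawdown."""
    mdd = max_drawdown(returns)
    if mdd == 0.0:
        return math.inf
    return annual_return(returns, periods_per_year) / abs(mdd)


daily = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]  # toy daily returns
print(round(max_drawdown(daily), 4))
print(round(sortino_ratio(daily), 2))
print(round(calmar_ratio(daily), 2))
```

Under these definitions the Sortino ratio penalizes only negative deviations, while the Calmar ratio relates annual growth to the worst observed decline.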
Eastern-European Journal of Enterprise Technologies ISSN 1729-3774 2/3 ( 122 ) 2023
The metrics for comparison with the market (Alpha and
Beta) are calculated relative to the S&P 500.
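To make the comparison with the market concrete, the following sketch estimates Beta as the slope of strategy returns on benchmark returns and Alpha as the annualized average return not explained by that exposure. The risk-free rate is taken as zero, the annualization factor of 252 is an assumption, and the toy series are invented for illustration only.

```python
def beta(strategy, benchmark):
    """Slope of strategy returns on benchmark returns: cov / var."""
    n = len(strategy)
    if n != len(benchmark) or n < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_s = sum(strategy) / n
    mean_b = sum(benchmark) / n
    cov = sum((s - mean_s) * (b - mean_b) for s, b in zip(strategy, benchmark))
    var = sum((b - mean_b) ** 2 for b in benchmark)
    return cov / var


def alpha(strategy, benchmark, periods_per_year=252):
    """Annualized mean return not explained by benchmark exposure."""
    b = beta(strategy, benchmark)
    residual = [s - b * m for s, m in zip(strategy, benchmark)]
    per_period = sum(residual) / len(residual)
    return (1.0 + per_period) ** periods_per_year - 1.0


benchmark = [0.01, -0.005, 0.007, -0.012, 0.004]   # toy index returns
strategy = [2 * m + 0.001 for m in benchmark]      # 2x exposure plus a constant edge
print(round(beta(strategy, benchmark), 6))
print(round(alpha(strategy, benchmark), 4))
```

A Beta above 1 means the strategy amplifies market moves, and a positive Alpha means it earns more than its market exposure alone would explain.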
Fig. 4, 5 show plots of the performance of the base model
and of the proposed approach with alternative data.
Fig. 6, 7 show plots of the sliding (rolling) Sharpe
coefficient.
The Sharpe coefficient of a strategy measures the average
excess return of the strategy relative to the volatility
«sustained» to achieve this return. This is a broad measure of
the reward-to-risk ratio of a strategy.
The annualized sliding Sharpe coefficient simply calculates
this value from the trading data for the previous year. It
provides a constantly updated, albeit retrospective, view of the
current reward-risk ratio.
A low Sharpe coefficient (below 1.0) means that significant
return volatility is endured for a minimal average return.
A negative Sharpe coefficient implies that it would be
better to hold an instrument representing the risk-free rate
used in the calculations. In this case, not only is the average
return of the strategy lower than that achieved by the risk-free
rate, but the volatility of these inefficient returns also
remains. The Sharpe coefficient is therefore a good indicator of
the effectiveness of a strategy.
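The sliding Sharpe coefficient described above can be sketched as a Sharpe ratio recomputed over each trailing window of returns. The window length, the zero risk-free rate, the annualization factor of 252, and the sample data are assumptions chosen for illustration.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return per unit of return volatility."""
    excess = [r - risk_free for r in returns]
    volatility = stdev(excess)
    if volatility == 0.0:
        return math.nan
    return mean(excess) / volatility * math.sqrt(periods_per_year)


def rolling_sharpe(returns, window, **kwargs):
    """Sharpe ratio over each trailing window of `window` observations."""
    return [
        sharpe_ratio(returns[end - window:end], **kwargs)
        for end in range(window, len(returns) + 1)
    ]


daily = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02, 0.0, 0.012]
values = rolling_sharpe(daily, window=5)
print([round(v, 2) for v in values])
```

In a backtest over daily data, a window of about one trading year would correspond to the annualized view mentioned in the text.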
Fig. 8, 9 show plots of the sliding Alpha coefficient.
The plots show a significant increase in peak values and
some redistribution of intervals: some intervals have a smaller
peak but a longer duration.
Fig. 4. Baseline model performance plot (portfolio value, $, versus date)
Fig. 5. Performance plot for the proposed approach with alternative data (portfolio value, $, versus date)
Fig. 6. Calculation plot of the Sharpe sliding coefficient for the base model (Sharpe coefficient value versus date)
Fig. 7. Calculation plot of the Sharpe sliding coefficient for the proposed approach with alternative data (Sharpe coefficient value versus date)
Control processes
Fig. 8. Alpha sliding coefficient calculation plot for the base model (Alpha coefficient value versus date)
Fig. 9. Calculation plot of the sliding coefficient Alpha for the proposed approach with alternative data (Alpha coefficient value versus date)
The plots in Fig. 10, 11 compare the sliding Beta
coefficient.
Although the highest peaks are similar, the number and
steepness of the peaks are generally lower with alternative data
than for the base model.
Fig. 12, 13 show the sliding-window calculation plots for
the Rolling Drawdown coefficient.
Fig. 10. Sliding coefficient Beta calculation plot for the base model (Beta coefficient value versus date)
Fig. 11. Calculation plot of the sliding coefficient Beta for the proposed approach with alternative data (Beta coefficient value versus date)
Fig. 12. Rolling Drawdown sliding factor calculation plot for the base model (Drawdown coefficient value versus date)
Fig. 13. Rolling Drawdown sliding coefficient calculation plot for the proposed approach with alternative data (Drawdown coefficient value versus date)
Although the global indicators are similar, the number of
intervals with high drawdown values has decreased.
Fig. 14, 15 present sliding-window calculation plots for
the Cumulative Returns coefficient.
The charts show a noticeable decrease in losses and, as a
result, improved stability and profitability. Growth relative
to the market has also increased.
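A cumulative returns curve and its lead over a benchmark can be computed as in the following sketch. The two return series are toy data, and the function name is illustrative rather than taken from the study.

```python
from itertools import accumulate
from operator import mul


def cumulative_returns(returns):
    """Compounded return up to and including each period."""
    growth = accumulate((1.0 + r for r in returns), mul)
    return [g - 1.0 for g in growth]


strategy = [0.02, -0.01, 0.03, 0.01]    # toy strategy returns
market = [0.01, -0.02, 0.015, 0.005]    # toy benchmark returns

strategy_curve = cumulative_returns(strategy)
market_curve = cumulative_returns(market)
# Positive values mean the strategy is ahead of the benchmark at that date.
lead = [s - m for s, m in zip(strategy_curve, market_curve)]
print([round(x, 4) for x in lead])
```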
Fig. 14. Cumulative Returns sliding factor calculation plot for the base model (coefficient values versus date)
Fig. 15. Calculation plot of the sliding coefficient Cumulative Returns for the proposed approach with alternative data (coefficient values versus date)
6. Discussion of results of investigating
the approach to risk management in the formation
of the investment portfolio
After analyzing the general metrics of the backtest, which
are presented in Tables 1, 2 and in the plots in Fig. 4–15, the
following improvements can be seen in the formation of the
investment portfolio:
– the Gain indicator shows that the total growth of the
investment portfolio increased by 0.43637717;
– the CAGR indicator shows that the average annual growth
also increased by about 0.43;
– the Sharpe indicator (Fig. 6, 7) is very high overall, since
with such fluctuations the losses are not so great, and the
proposed approach improves it significantly;
– the Sortino indicator also improved, from 3.48 for the base
model to 5.93;
– the MDD coefficient shows that the drawdown level improved
by 0.05. Although small in absolute terms, this is a significant
improvement for this metric;
– the Calmar coefficient, which measures the risk-adjusted
return of the portfolio (Tables 1, 2), shows an increase in
efficiency in relation to risk.
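The arithmetic behind the Gain and CAGR comparison can be checked directly from the values in Tables 1, 2. The sketch below computes, for each metric, the base-model value as a fraction of the alternative-data value and the change relative to the base value. The quoted figures 0.43637717 and about 0.43 appear to coincide numerically with the first of these quantities; this reading is an observation about the numbers, not a statement taken from the source.

```python
# Values copied from Tables 1 and 2: (base model, alternative data).
gain = (128.98, 295.57)
cagr = (1.34, 3.11)
sortino = (3.48, 5.93)


def ratio(base, alt):
    """Base-model value as a fraction of the alternative-data value."""
    return base / alt


def relative_change(base, alt):
    """Change from base to alternative data, relative to the base value."""
    return (alt - base) / base


print(f"Gain ratio:      {ratio(*gain):.8f}")
print(f"CAGR ratio:      {ratio(*cagr):.2f}")
print(f"Gain change:     {relative_change(*gain):+.1%}")
print(f"Sortino change:  {relative_change(*sortino):+.1%}")
```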
The plots of profit growth (portfolio value history) in
Fig. 4, 5 show a marked reduction in losses and, as a result,
improved stability and profitability.
Fig. 6, 7 show an increase in the number of intervals with
higher values and a decrease in the number of intervals with
negative values and, as a result, an improvement in the average
and global values of the metric.
The plots in Fig. 8, 9 show that some intervals have a
smaller peak, which can generally be considered an improvement,
since in this case the indicator remains high and stable for
a longer time.
The plots in Fig. 10, 11 show similar highest peaks, while
the number and steepness of the peaks are generally lower than
for the base model, indicating an improved indicator and less
instability.
The plots of cumulative profit growth in Fig. 14, 15 show
a marked reduction in losses and, as a result, increased
stability and profitability. Growth relative to the market has
also increased.
The results of the study showed that RL agents can sig-
nificantly improve asset allocation because they outperform
strong baselines in contrast to the methods demonstrated
in [11–13].
Although the experiment successfully demonstrated that
the proposed approach works, there are cases when the result
does not correspond to expectations. This is caused by shadow
factors, such as insider transactions, market speculation, and
other unfair actions of interested parties, which also
significantly affect the state of assets.
The proposed approach is of great practical importance
for investors and asset managers. Risk management is an
important aspect of investing because it helps reduce possible
losses and ensure sustainable portfolio returns. The use of
reinforcement learning methods makes it possible to identify
and analyze the risks that arise when forming a portfolio and
develop optimal strategies for their management.
In addition, the developed risk management approach
reduces portfolio management costs as it provides more
accurate and efficient asset allocation solutions. This is im-
portant because management costs can affect overall port-
folio returns.
Consequently, the risk management approach based on
reinforcement learning has great practical potential for in-
vestors and asset managers, allowing them to reduce risk and
increase the return on their portfolio.
As a shortcoming, it can be noted that only a small number
of content sources were used during the experiment, so the
result of the analysis may not fully reflect the real picture.
On the side of the approach itself, the problem is solved quite
simply by adding more content sources; on the technical side,
however, this will require additional labor and computing
resources.
For further development of this approach, attention
should also be paid to the formation of validated and extended
data sets that can be used in training neural networks.
7. Conclusions
1. A functional model of the process of risk optimiza-
tion in the formation of an investment portfolio based on
machine learning methods has been built. The developed
functional model makes it possible to build a process of risk
optimization, including asset selection, risk comparison and
assessment, building an investment portfolio and monitor-
ing its risks. In addition, various constraints can be taken
into account, such as asset limits, risk limits, and portfolio
value limits. The use of machine learning methods in the
development of a risk optimization model makes it possible
to use a large amount of data and take into account various
factors that affect the risks and returns of the investment
portfolio.
2. Our experimental study on the proposed approach
to the formation of the investment portfolio showed that
the total growth of the investment portfolio increased
by 0.43637717 compared to the base model. Also, the
volatility indicator improved compared to the market,
as evidenced by the percentage difference between the
initial and final amount of cash, which increased from
128.98 to 295.57.
Conflicts of interest
The authors declare that they have no conflicts of interest
in relation to the current study, including financial, personal,
authorship, or any other, that could affect the study and the
results reported in this paper.
Funding
The study was conducted without financial support.
Data availability
The data will be provided upon reasonable request.
References
1. Kaftia, M. A. (2019). The Formation of Modern Portfolio Theories: the Main Problems and Tendencies of Development. Business
Inform, 2 (493), 414–419. doi: https://doi.org/10.32983/2222-4459-2019-2-414-419
2. Romanenkov, Y., Vartanian, V. (2016). Formation of prognostic software support for strategic decision-making in an organization.
Eastern-European Journal of Enterprise Technologies, 2 (9 (80)), 25–34. doi: https://doi.org/10.15587/1729-4061.2016.66306
3. Zadoia, A. O. (2019). Portfolio investments in Ukraine: chance or challenges? Academic Review, 2, 81–92. doi: https://doi.org/
10.32342/2074-5354-2019-2-51-8
4. Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7 (1), 77–91. doi: https://doi.org/10.1111/j.1540-6261.1952.tb01525.x
5. Roy, A. D. (1952). Safety First and the Holding of Assets. Econometrica, 20 (3), 431. doi: https://doi.org/10.2307/1907413
6. Sullivan, E. J. (2011). A.D. Roy: The Forgotten Father of Portfolio Theory. Research in the History of Economic Thought and
Methodology, 73–82. doi: https://doi.org/10.1108/s0743-4154(2011)000029a008
7. Jagannathan, R., Ma, T. (2003). Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps. The Journal of
Finance, 58 (4), 1651–1683. doi: https://doi.org/10.1111/1540-6261.00580
8. Ratushna, Yu. S. (2019). Foreign financial investment development factors. Naukovyi visnyk Uzhhorodskoho natsionalnoho uni-
versytetu. Seriya: Mizhnarodni ekonomichni vidnosyny ta svitove hospodarstvo, 24 (3), 59–66. Available at: http://nbuv.gov.ua/
UJRN/Nvuumevcg_2019_24%283%29__13
9. Lopez, J. A. (1999). Methods for evaluating value-at-risk estimates. Economic Review, Federal Reserve Bank of San Francisco.
10. Sunchalin, A. M. et al. (2019). Methods of risk management in portfolio theory. Available at: https://www.revistaespacios.com/
a19v40n16/a19v40n16p25.pdf
11. Ye, Y., Pei, H., Wang, B., Chen, P.-Y., Zhu, Y., Xiao, J., Li, B. (2020). Reinforcement-Learning Based Portfolio Management with
Augmented Asset Movement Prediction States. Proceedings of the AAAI Conference on Artificial Intelligence, 34 (01), 1112–1119.
doi: https://doi.org/10.1609/aaai.v34i01.5462
12. Reinders, H. J., Schoenmaker, D., van Dijk, M. (2023). A finance approach to climate stress testing. Journal of International Money
and Finance, 131, 102797. doi: https://doi.org/10.1016/j.jimonfin.2022.102797
13. Emamat, M. S. M. M., Mota, C. M. de M., Mehregan, M. R., Sadeghi Moghadam, M. R., Nemery, P. (2022). Using ELECTRE-TRI and
FlowSort methods in a stock portfolio selection context. Financial Innovation, 8 (1). doi: https://doi.org/10.1186/s40854-021-00318-1
14. Lukina, N. P., Nurgaleeva, L. V. (2005). Valuegical and ideological status of a network community in information social space: state-
ment of a problem. Gumanitarnaya informatika.
15. Romanenkov, Y., Danova, M., Kashcheyeva, V., Bugaienko, O., Volk, M., Karminska-Bielobrova, M., Lobach, O. (2018). Complexi-
fication methods of interval forecast estimates in the problems on short-term prediction. Eastern-European Journal of Enterprise
Technologies, 3 (3 (93)), 50–58. doi: https://doi.org/10.15587/1729-4061.2018.131939
16. Model velykykh danykh ta mashynnoho navchannia. Available at: https://business.diia.gov.ua/en/handbook/impact-investment/
model-velikih-danih-ta-masinnogo-navcanna
17. Marbach, P., Tsitsiklis, J. N. (2001). Simulation-based optimization of Markov reward processes. IEEE Transactions on Automatic
Control, 46 (2), 191–209. doi: https://doi.org/10.1109/9.905687
18. Raskin, L., Sukhomlyn, L., Sagaidachny, D., Korsun, R. (2021). Analysis of multi-threaded markov systems. Advanced Information
Systems, 5 (4), 70–78. doi: https://doi.org/10.20998/2522-9052.2021.4.11
19. Raskin, L., Sira, O., Sukhomlyn, L., Parfeniuk, Y. (2021). Universal method for solving optimization problems under the conditions
of uncertainty in the initial data. Eastern-European Journal of Enterprise Technologies, 1 (4 (109)), 46–53. doi: https://doi.org/
10.15587/1729-4061.2021.225515
20. Raskin, L., Sira, O. (2016). Method of solving fuzzy problems of mathematical programming. Eastern-European Journal of Enter-
prise Technologies, 5 (4 (83)), 23–28. doi: https://doi.org/10.15587/1729-4061.2016.81292
21. Alibekov, E., Kubalík, J., Babuška, R. (2018). Policy derivation methods for critic-only reinforcement learning in continuous spaces.
Engineering Applications of Artificial Intelligence, 69, 178–187. doi: https://doi.org/10.1016/j.engappai.2017.12.004
22. Semenov, S., Weilin, C., Zhang, L., Bulba, S. (2021). Automated penetration testing method using deep machine learning technology.
Advanced Information Systems, 5 (3), 119–127. doi: https://doi.org/10.20998/2522-9052.2021.3.16
23. Zheng, L., Fiez, T., Alumbaugh, Z., Chasnov, B., Ratliff, L. J. (2022). Stackelberg Actor-Critic: Game-Theoretic Reinforcement
Learning Algorithms. Proceedings of the AAAI Conference on Artificial Intelligence, 36 (8), 9217–9224. doi: https://doi.org/
10.1609/aaai.v36i8.20908
24. Mnih, V. et al. (2016). Asynchronous methods for deep reinforcement learning. International conference on machine learning.
doi: https://doi.org/10.48550/arXiv.1602.01783
25. Grondman, I., Busoniu, L., Lopes, G. A. D., Babuska, R. (2012). A Survey of Actor-Critic Reinforcement Learning: Standard and
Natural Policy Gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42 (6),
1291–1307. doi: https://doi.org/10.1109/tsmcc.2012.2218595
26. A faster, simpler approach to parallel Python. Available at: https://www.ray.io/ray-core
... Управління інвестиційними портфелями є ключовим завданням у сфері фінансів та інвестицій, де інвестори прагнуть досягти оптимального балансу між ризиком і прибутковістю. Одним із підходів до вирішення цього завдання є диверсифікація інвестиційного портфеля, яка полягає у розподілі коштів між різними активами, щоб знизити ризик та підвищити очікуваний прибуток [1]. ...
Conference Paper
Full-text available
Управління інвестиційними портфелями є ключовим завданням у сфері фінансів та інвестицій, де інвестори прагнуть досягти оптимального балансу між ризиком і прибутковістю. Одним із підходів до вирішення цього завдання є диверсифікація інвестиційного портфеля, яка полягає у розподілі коштів між різними активами, щоб знизити ризик та підвищити очікуваний прибуток [1]. Метою дослідження є пошук оптимального розподілення капіталу між рі-зними активами з використанням алгоритмів навчання з підкріпленням, щоб максимізувати накопичений прибуток агенту, який управляє інвестиціями. Та-ким чином, пропонується підібрати таке поєднання різнорідних даних (фінан-сових активів), в яких обраний метод навчання з підкріпленням буде найефек-тивнішим в умовах безперервного потоку даних. Існуючі рішення пропонують використовувати методи логічної диверси-фікації, такі як розподіл активів за класами, регіонами, галузями та видами ак-тивів [2]. Недоліком є присутність людського фактору та неоптимізованість в умовах різнорідності даних. Рішення включає застосування методу випадкового пошуку, який показав свою ефективність при оптимізації гіперпараметрів нейронних мереж [3]. У цьому контексті випадковий пошук є методом оптимізації початкового набору фінансових активів. В результаті проведеного дослідження можна зробити висновок, що за-стосування випадкового пошуку є перспективним методом вирішення задачі оптимізації диверсифікованого інвестиційного портфеля в рамках навчання з підкріпленням в умовах різнорідних даних. Цей метод демонструє здатність агенту знаходити оптимальний набір різнорідних даних, максимізуючи нако-пичений прибуток, що становить інтерес для практичного застосування у сфері управління інвестиціями.
Article
Full-text available
There is increasing interest in assessing the impact of climate policies on the value of financial sector assets, and consequently on financial stability. Prior studies either take a “black box” macro-financial approach or focus solely on equity instruments – though banks’ exposures predominantly consist of debt. We develop a more tractable finance (valuation) approach at the industry-level and use a Merton contingent claims model to assess the impact of a carbon tax shock on the market value of equity and debt instruments. We calibrate our model using detailed firm level vulnerability data and apply the model to 2-digit sectoral exposures of Dutch banks. We find declines in the market value of banks’ assets of 2-13% of core capital for a €100 carbon tax shock, increasing to 6-29% for a €200 carbon tax shock.
Article
Full-text available
In recent years, multi-criteria sorting problems have become an interesting topic for researchers working on multi-criteria decision-making. ELimination and Choice Expressing REality (ELECTRE)-TRI and FlowSort are well-known approaches suggested for such a classification. The current study aimed to implement ELECTRE-TRI and FlowSort methods in the stock portfolio selection (SPS) as one of the most popular and important decision-making subjects and compare the outcomes of each method to understand how these methods perform in SPS problems. In this study, the best–worst method was applied to determine the weights of criteria. Four approaches for ELECTRE-TRI and 15 approaches for FlowSort were considered. Finally, 19 different approaches were considered to select stocks from a large pool of stocks. Results indicated that the model parameter should be properly defined to minimize inconsistencies and improve the power of the model.
Article
Full-text available
Known technologies for analyzing Markov systems use a well-operating mathematical apparatus based on the computational implementation of the fundamental Markov property. Herewith the resulting systems of linear algebraic equations are easily solved numerically. Moreover, when solving lots of practical problems, this numerical solution is insufficient. For instance, both in problems of structural and parametric synthesis of systems, as well as in control problems. These problems require to obtain analytical relations describing the dependences of probability values of states of the analyzed system with the numerical values of its parameters. The complexity of the analytical solution of the related systems of linear algebraic equations increases rapidly along with the increase in the system dimensionality. This very phenomenon manifests itself especially demonstratively when analyzing multi-threaded queuing systems. Accordingly, the objective of this paper is to develop an effective computational method for obtaining analytical relations that allow to analyze high-dimensional Markov systems. To analyze such systems this paper provides for a decomposition method based on the idea of phase enlargement of system states. The proposed and substantiated method allows to obtain analytical relations for calculating the distribution of Markov system states. The method can be effectively applied to solve problems of analysis and management in high-dimensional Markov systems. An example has been considered
Article
Full-text available
We solved the problem of improvement of methodological base for a decision support system in the process of short-term prediction of indicators of organizational-technical systems by developing new, and adapting existing, methods of complexification that are capable of taking into consideration the interval uncertainty of expert forecast estimates. The relevance of this problem stems from the need to take into consideration the uncertainty of primary information, predetermined by the manifestation of NON-factors. Analysis of the prerequisites and characteristics of formalization of uncertainty of primary data in the interval form was performed, the merits of interval analysis for solving the problems of complexification of interval forecast estimates were identified. Brief information about the basic mathematical apparatus was given: interval arithmetic and interval analysis. The methods of complexification of forecast estimates were improved through the synthesis of interval extensions, obtained in accordance with the paradigm of an interval analysis. We found in the course of the study that the introduction of the analytical preference function made it possible to synthesize the model of complexification in a general way, by aggregating the classes of hybrid and selective models in a single form for the generation of consolidated predictions based on interval forecast estimates. This allows obtaining complexification predictions based on the interval forecast estimates, thereby ensuring accuracy of the consolidated short-term prediction. Critical analysis of the proposed methods was performed and recommendations on their practical application were developed. Recommendations for parametric setting of the analytic function of preferences were stated. Using the example, the adaptive properties of the interval model of complexification were shown.
Article
Full-text available
This paper proposes a method to solve a mathematical programming problem under the conditions of uncertainty in the original data. The structural basis of the proposed method for solving optimization problems under the conditions of uncertainty is the function of criterion value distribution, which depends on the type of uncertainty and the values of the problem’s uncertain variables. In the case where independent variables are random values, this function then is the conventional theoretical-probabilistic density of the distribution of the random criterion value; if the variables are fuzzy numbers, it is then a membership function of the fuzzy criterion value. The proposed method, for the case where uncertainty is described in the terms of a fuzzy set theory, is implemented using the following two-step procedure. In the first stage, using the membership functions of the fuzzy values of criterion parameters, the values for these parameters are set to be equal to the modal, which are fitted in the analytical expression for the objective function. The resulting deterministic problem is solved. The second stage implies solving the problem by minimizing the comprehensive criterion, which is built as follows. By using an analytical expression for the objective function, as well as the membership function of the problem’s fuzzy parameters, applying the rules for operations over fuzzy numbers, one finds a membership function of the criterion’s fuzzy value. Next, one calculates a measure of the compactness of the resulting membership function of the fuzzy value of the problem’s objective function whose numerical value defines the first component of the integrated criterion. The second component is the rate of deviation of the desired solution to the problem from the previously received modal one. Absolutely similarly designed is the computational procedure for the case where uncertainty is described in the terms of a probability theory. 
Thus, the proposed method for solving optimization problems is universal in relation to the nature of the uncertainty in the original data. An important advantage of the proposed method is the ability to use it when solving any problem of mathematical programming under the conditions of fuzzily assigned original data, regardless of its nature, structure, and type
Article
Full-text available
Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity – the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty – the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) Bitcoin market and (ii) HighTech stock market with 7-year Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM method and other baselines.
Article
The article developed a method for automated penetration testing using deep machine learning technology. The main purpose of the development is to improve the security of computer systems. To achieve this goal, the analysis of existing penetration testing methods was carried out and their main disadvantages were identified. They are mainly related to the subjectivity of assessments in the case of manual testing. In cases of automated testing, most authors confirm the fact that there is no unified effective solution for the procedures used. This contradiction is resolved using intelligent methods of analysis. It is proposed that the developed method be based on deep reinforcement learning technology. To achieve the main goal, a study was carried out of the Shadov system's ability to collect factual data for designing attack trees, as well as the Mulval platform for generating attack trees. A method for forming a matrix of cyber intrusions using the Mulval tool has been developed. The Deep Q - Lerning Network method has been improved for analyzing the cyber intrusion matrix and finding the optimal attack trajectory. In the study, according to the deep reinforcement learning method, the reward scores assigned to each node, according to the CVSS rating, were used. This made it possible to shrink the attack trees and identify an attack with a greater likelihood of occurring. A comparative study of the automated penetration testing method was carried out. The practical possibility of using the developed method to improve the security of a computer system has been revealed.
Article
The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation. We adopt this viewpoint and model the actor and critic interaction as a two-player general-sum game with a leader-follower structure known as a Stackelberg game. Given this abstraction, we propose a meta-framework for Stackelberg actor-critic algorithms where the leader player follows the total derivative of its objective instead of the usual individual gradient. From a theoretical standpoint, we develop a policy gradient theorem for the refined update and provide a local convergence guarantee for the Stackelberg actor-critic algorithms to a local Stackelberg equilibrium. From an empirical standpoint, we demonstrate via simple examples that the learning dynamics we study mitigate cycling and accelerate convergence compared to the usual gradient dynamics given cost structures induced by actor-critic formulations. Finally, extensive experiments on OpenAI gym environments show that Stackelberg actor-critic algorithms always perform at least as well as, and often significantly outperform, their standard actor-critic counterparts.
Article
This paper addresses the problem of deriving a policy from the value function in the context of critic-only reinforcement learning (RL) in continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. Numerical approximation due to its nature virtually always exhibits artifacts which damage the overall performance of the controlled system. In addition, when continuous-valued action is used, the most common approach is to discretize the action space and exhaustively search for the action that maximizes the right-hand side of the Bellman equation. Such a policy derivation procedure is computationally involved and results in steady-state error due to the lack of continuity. In this work, we propose policy derivation methods which alleviate the above problems by means of action space refinement, continuous approximation, and post-processing of the V-function by using symbolic regression. The proposed methods are tested on nonlinear control problems: 1-DOF and 2-DOF pendulum swing-up problems, and on magnetic manipulation. The results show significantly improved performance in terms of cumulative return and computational complexity.