An Integrated Data-Driven Method Using Deep Learning for a Newsvendor Problem with Unobservable Features


Abstract

We consider a single-period inventory problem with random demand, where both directly observable and unobservable features affect the demand distribution. With recent advances in data collection and analysis technologies, data-driven approaches to classical inventory management problems have gained traction. In particular, machine learning methods are increasingly being integrated into optimization problems. Although data-driven approaches have been developed for the newsvendor problem, they often treat learning from the available data and optimizing the system as separate tasks performed in sequence. One drawback of this approach is that in the learning phase, costly and cheap mistakes receive equal attention, while in the optimization phase, the optimizer is blind to the learner's confidence in its estimates for different regions of the problem. To remedy this, we consider an integrated learning and optimization problem for optimizing a newsvendor's strategy when facing complex, correlated demand with additional information about the unobservable state of the system. We give an algorithm that integrates optimization, neural networks and hidden Markov models, and use numerical experiments to show the efficiency of our method. In an empirical experiment, the method outperforms the best competing benchmark by more than 27%, on average, in terms of system cost. We give further analyses of the method's performance using a set of numerical experiments.
Davood Pirayesh Neghab
Department of Mechanical and Industrial Engineering, Ryerson University, Toronto, Canada
Email: dneghab@ryerson.ca
Siamak Khayyati
College of Engineering, Koç University, Rumeli Feneri Yolu, Istanbul, Turkey, 34450
Email: skhayyati13@ku.edu.tr
Fikri Karaesmen*
College of Engineering, Koç University, Rumeli Feneri Yolu, Istanbul, Turkey, 34450
Email: fkaraesmen@ku.edu.tr
Key words : Inventory; Hidden Markov model; Deep neural network; Partially observed data; Integrated
Estimation and Optimization
* Corresponding author.
This paper is dedicated to the memory of Prof. Gabor Rudolf, who is unfortunately no longer with us. We thank him
for his valuable advice and comments enhancing the quality of this work.
1. Introduction
The single-period random demand inventory problem is one of the central problems in inventory control and capacity management. In the standard version of the problem, the inventory manager chooses an order quantity before observing the random demand. A mismatch between the order quantity and the realized demand leads to unsatisfied demand or unsold items, and the costs of a unit of lost demand and of an unsold item are generally not symmetric. A commonly used simplifying assumption is that the probability distribution of demand is known with certainty and that demand is independent and identically distributed across periods. In many cases, however, this assumption is not supported by the available data.
The recent interest in data-driven approaches to inventory management has renewed attention to more general versions of this problem, in which the random demand depends on multiple factors that are observed prior to ordering. In addition, past data on the observed factors and the corresponding realized demand are available and can guide a data-dependent ordering decision. This more general setup applies to production/inventory management and capacity planning problems in systems affected by supply chain disruptions and by the resulting increases or decreases in demand for various items as the state of the environment changes, such as the disruptions and changes caused by the recent COVID-19 pandemic. In addition to simple seasonal factors (such as day of the week or week of the month) and planning-related factors (promotions, competitors' actions, etc.), many data sources with a potential impact on demand are monitored daily (weather forecasts, stock market indices, currency exchange rates) and can be used to control inventory systems.
A recent stream of research (Ban and Rudin 2019, Oroojlooyjadid et al. 2020) investigates the above inventory problem with such observable features. However, other factors that affect demand may not be directly observable at the time of the decision. These may include supplier conditions affecting competitors, business cycles, or shifts in consumer preferences that become observable only after a long time lag. In economics and finance, market models with such hidden, randomly evolving features are used to model and analyze business cycles (Hamilton 1990) or financial market conditions (Bhar and Hamori 2004). The operations literature also includes analytical models that investigate the effects of hidden features on inventory decisions, but this literature does not study how to base these decisions on limited historical data.
We propose a framework that includes such unobservable features, governed by a random process, which affect demand in addition to the observable features that are known at the time of the decision. In this framework, modeling the structure of the process that governs the evolution of the unobservable features over time is as important as the short-term modeling of the effects of the observable features.
In this paper, we contribute to the recent stream of literature that incorporates observable features in data-driven inventory or capacity planning (Ban and Rudin 2019, Oroojlooyjadid et al. 2020) by investigating a single-period random demand inventory problem in which the demand in each period depends on an unobservable factor in addition to some observable factors. We assume that past data is available on the observable factors and the corresponding observed demand. Information about the unobservable factor, on the other hand, has to be inferred from past data. The traditional approach to such a problem would be to first estimate the model parameters that describe the effect of the observable factors on demand, and to infer the hidden state and its effects. Once estimation and inference are complete, the optimization problem would be solved in a second step. However, recent research on this problem (see Ban and Rudin (2019), for example) has demonstrated the advantages of integrating the estimation of the parameters for the observable factors with the optimization step using tools from machine learning. We pursue this integrated estimation and optimization approach in a model with the additional complexity of an unobservable factor. This brings the additional challenge of a multi-period dynamic optimization problem, because the unobservable state is estimated dynamically and depends on the entire demand sequence observed prior to the decision. We therefore combine estimation, inference and optimization using a multi-layered neural network. To assess the performance of this integrated approach, we compare its results against data-based methods that either ignore the hidden factor information or employ separate inference and optimization steps. Numerical examples on a synthetic data set and on real data, taken as a proxy for retail data that might have an unobservable state, demonstrate that our approach compares favorably against the other benchmarks.
The remainder of this paper is organized as follows. Section 2 presents a review of the related literature. Section 3 presents the setup and the solution method we have proposed. Section 4 introduces several benchmark methods and reports the results of the numerical experiments that assess the performance of the suggested method against the benchmarks. Section 5 provides the evaluation of the methods in an example with real data. Finally, Section 6 concludes the paper.
2. Literature Review
In the following we provide a review of the pertinent literature. This is presented in two parts.
First, we review the literature on inventory problems with an evolving demand environment. Then,
we review the related work on the data-driven newsvendor problem.
2.1. Inventory problems with an evolving demand environment
A long line of research investigates the impact of several factors such as macroeconomic shocks and
cycles on inventory systems (Blinder and Maccini 1991, Shang 2012, Kesavan and Kushwaha 2014),
especially through the effects of such factors on random demand. Several papers consider dynamic environmental factors that make the demand distribution non-stationary, which creates additional challenges. A common assumption in these papers is that the random environment evolves according to a Markov chain.
Earlier studies assume that the state of the Markov chain at each point in time is fully observed
and the true demand distribution associated with each state is known (Sethi and Cheng 1997, Beyer
and Sethi 1997, Huh et al. 2011, Gallego and Hu 2004). For instance, Feldman (1978) proposes
and analyzes a model where demand depends on the state of the environment modeled by a
continuous-time Markov process. Lovejoy (1992) investigates the optimality of a myopic policy with
non-stationary demand which is dependent on a Markovian process over time. Song and Zipkin
(1993) present an inventory model where the demand depends on the state of the world modeled
by a Markov chain and derive the optimal ordering policy.
In many practical situations, the environmental states are not perfectly observable. Instead, one
can observe information about the environment and can only infer the states in a probabilistic
manner. Treharne and Sox (2002) categorize the literature in terms of stationarity of the demand
and observability of the information into four classes. Decision systems with Markov-modulated demand and partially observed information can be modeled as partially observed Markov decision processes (POMDPs) (Monahan 1982). Treharne and Sox (2002) study several inventory policies
where only the historical demand is observable and the probability distribution of the demand is
determined by the non-observable state of the Markov chain. Bensoussan et al. (2005, 2007) consider the newsvendor problem with censored demand and inventory that depend on the Markov chain states. Arifoğlu and Özekici (2010) analyze a single-item periodic-review inventory system in a random environment. They extend the model of Gallego and Hu (2004) to the more general setting where the environment is only partially observable. In particular, using sufficient statistics on the environment process, they show that a state-dependent base-stock policy is optimal. In later work, Arifoğlu and Özekici (2011) investigate the optimality of a state-dependent inventory policy in a random environment where the production capacity is random. Avci et al. (2020) model and analyze the inventory problem where demand follows a probability distribution conditional on the Markovian state of the world.
In the above papers that use a POMDP model with an imperfectly observed environment process, the state of the environment is partially revealed via past demand data, and estimating it is an important subproblem. This subproblem is solved using Bayesian updating to incorporate the partial observations into the inventory models, and a general solution is given by the Baum-Welch algorithm. The outcome of this algorithm is an estimate of the demand distribution for each state of the observed sequence. This estimation is based on maximizing the likelihood
of the observed sequence. This maximization at the subproblem level does not take into account the objective function of the inventory problem, which leads to a separation of estimation and optimization. However, recent examples in the literature show that integrating the estimation and optimization problems may lead to better solutions. We refer to the separated estimation and optimization procedure used in the above papers as the Objective-blind Baum-Welch (ObBW) method. This work contributes to this stream of research by developing a method that integrates the estimation of the hidden states and the optimization of the system, allowing the parameters of the objective function to guide the estimation. Table 1 displays the characteristics of this method along with other approaches that are explained in the next subsection.
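To make the estimation step of ObBW concrete, the following is a minimal sketch of the Baum-Welch (EM) algorithm for a hidden Markov model, assuming Gaussian demand in each hidden state. The Gaussian emissions, the function names `forward_backward` and `baum_welch`, and the synthetic two-state parameters are illustrative assumptions, not the implementation used in the papers above. Note that the overage and underage costs appear nowhere in the updates: the parameters are chosen to maximize the likelihood only, which is exactly the objective-blindness described above.

```python
import numpy as np
from scipy.stats import norm


def forward_backward(y, pi, A, mu, sigma):
    """Scaled forward-backward pass for an HMM with Gaussian emissions (illustrative)."""
    T, K = len(y), len(pi)
    e = norm.pdf(y[:, None], mu[None, :], sigma[None, :])  # T x K emission densities
    alpha, beta, c = np.zeros((T, K)), np.ones((T, K)), np.zeros(T)
    alpha[0] = pi * e[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):  # forward recursion, normalized at every step
        alpha[t] = (alpha[t - 1] @ A) * e[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):  # backward recursion with the same scaling factors
        beta[t] = A @ (e[t + 1] * beta[t + 1]) / c[t + 1]
    return alpha, beta, e, c


def baum_welch(y, pi, A, mu, sigma, n_iter=50):
    """EM updates of (pi, A, mu, sigma); maximizes likelihood only, ignoring h and b."""
    loglik = []
    for _ in range(n_iter):
        alpha, beta, e, c = forward_backward(y, pi, A, mu, sigma)
        loglik.append(np.log(c).sum())  # log P(y_1, ..., y_T)
        gamma = alpha * beta  # P(S_t = i | y_1, ..., y_T)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (e[1:] * beta[1:])[:, None, :] / c[1:, None, None])  # P(S_t=i, S_t+1=j | y)
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        w = gamma / gamma.sum(axis=0)  # per-state weights over time
        mu = (w * y[:, None]).sum(axis=0)
        sigma = np.sqrt((w * (y[:, None] - mu) ** 2).sum(axis=0))
    return pi, A, mu, sigma, loglik


# Usage on synthetic two-state demand (hypothetical parameters)
rng = np.random.default_rng(0)
true_A = np.array([[0.95, 0.05], [0.10, 0.90]])
states = [0]
for _ in range(499):
    states.append(rng.choice(2, p=true_A[states[-1]]))
states = np.array(states)
y = rng.normal(np.array([20.0, 35.0])[states], np.array([3.0, 5.0])[states])
pi_hat, A_hat, mu_hat, sigma_hat, loglik = baum_welch(
    y, np.array([0.5, 0.5]), np.array([[0.8, 0.2], [0.2, 0.8]]),
    np.array([15.0, 40.0]), np.array([10.0, 10.0]))
```

In an ObBW pipeline, the fitted parameters would then be passed to a separate newsvendor optimization step.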
2.2. Data-driven approaches to inventory problems
Many papers address the concern that the demand distribution in an inventory problem may not be completely known. Many of these works approach the problem from the perspective of robust optimization (Scarf 1958, Gallego and Moon 1993, Perakis and Roels 2008) or Bayesian updating (Scarf 1959).
Some studies relax the assumption that the demand distribution is completely known by developing data-based methods. In this framework, the decision maker uses the empirical distribution obtained from past observations (Levi et al. 2007, Liyanage and Shanthikumar 2005, Huh et al. 2011, Besbes and Muharremoglu 2013). We refer to the approach in these papers, which is mostly based on sample observations of demand, as the Empirical Demand Distribution (EDD) method. For instance, Bertsimas and Thiele (2005) solve the problem without estimating the distribution, instead assigning an equal probability 1/T to each demand observation in the sample, where T denotes the number of demand observations. The optimal stock level or order quantity is then approximated using this empirical distribution. The advantage of this method is that, unlike the ObBW method, it does not assume any particular shape for the demand distribution. This is useful with real data, where demand may not follow a common distribution. On the other hand, one drawback of this method is that it assumes all future observations will follow the same empirical distribution, which may be questionable.
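Under the EDD approach, the critical fractile rule applied to the empirical distribution reduces to picking an order statistic of the sample. The sketch below assumes the standard newsvendor cost with unit overage cost h and unit underage cost b; the function name `edd_order_quantity` is hypothetical.

```python
import math
import numpy as np


def edd_order_quantity(demands, h, b):
    """Order quantity under the empirical demand distribution (weight 1/T per observation).

    Returns the smallest sample value whose empirical CDF reaches the critical
    fractile b / (h + b), i.e. the k-th order statistic with k = ceil(T * b / (h + b)).
    """
    d = np.sort(np.asarray(demands, dtype=float))
    k = math.ceil(len(d) * b / (h + b))
    return d[max(k, 1) - 1]


demands = [12, 7, 15, 9, 11, 14, 8, 10, 13, 6]
print(edd_order_quantity(demands, h=1.0, b=3.0))  # prints 13.0
```

Because every past observation carries the same weight 1/T, the features and any dependence between consecutive demands are ignored.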
In recent decades, data-driven optimization under uncertainty has gained increasing attention. For instance, He et al. (2012) model the problem of setting nurse staffing levels in hospital operating rooms under uncertain daily workload as a newsvendor problem. They present various models, including a linear decision model that uses two features. Sachs (2015) considers ordering with different types of exogenous data, such as price and temperature, that might explain demand. She formulates the optimal inventory level as a linear function of those variables. In a case study with real data, she shows that the non-parametric approaches outperform the parametric ones.
This is advantageous when the true demand distribution is not completely known and several exogenous variables are available. We refer to this approach as Parameter Fitting Linear Regression (PfLR). Here, the exogenous features explain part of the demand variability through a (typically) linear relationship. The approach therefore estimates the mean demand by a linear regression on the features, which results in a time-varying mean that depends on the features and a fixed standard deviation. In addition, demand is usually assumed to be Gaussian. Similar to the empirical method, this method cannot capture the dependency between demand observations.
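The two separate steps of PfLR can be sketched as follows, assuming an intercept term, ordinary least squares for the mean, and a single residual standard deviation shared by all periods. The function name `pflr_order_quantity` and its interface are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def pflr_order_quantity(F_train, d_train, F_new, h, b):
    """Plug-in PfLR sketch: OLS mean + constant Gaussian noise + critical fractile."""
    d_train = np.asarray(d_train, dtype=float)
    F_train = np.asarray(F_train, dtype=float).reshape(len(d_train), -1)
    X = np.column_stack([np.ones(len(F_train)), F_train])
    coef, *_ = np.linalg.lstsq(X, d_train, rcond=None)   # step 1: estimate the mean
    resid = d_train - X @ coef
    sigma = resid.std(ddof=X.shape[1])                     # fixed standard deviation
    F_new = np.asarray(F_new, dtype=float).reshape(-1, F_train.shape[1])
    X_new = np.column_stack([np.ones(len(F_new)), F_new])
    return X_new @ coef + norm.ppf(b / (h + b)) * sigma   # step 2: optimize separately
```

The regression in step 1 minimizes squared error regardless of h and b; the costs enter only in step 2 through the quantile z = Φ^{-1}(b/(h+b)).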
The integrated estimation and optimization approach is referred to as a prescriptor method (Van Parys et al. 2020). In a recent paper, Van der Laan et al. (2019) propose a new data-driven approach based on distributionally robust optimization to achieve on-target service levels. They show that the suggested approach, which bases the inventory decision directly on feature data, is more reliable than several classical approaches even with a limited number of historical observations. Several studies combine the estimation and optimization steps using tools from machine learning. Ban and Rudin (2019) consider the newsvendor problem with n observations and p features, in two cases: a low and a high number of features. They propose two machine-learning-based approaches, regularization and Kernel Optimization (KO), and establish some of their theoretical properties. They also show in a numerical study that using such features may lower the expected cost significantly. A recent paper by Khayyati and Tan (2020) shows that integrating the two steps of parameter estimation and optimization can improve the performance of make-to-stock queues.
The general use of ML-based methods in joint estimation and optimization, however, goes back to earlier studies (Efendigil et al. 2009, Goel et al. 2010, Gruhl et al. 2004, 2005). Bertsimas and Kallus (2020) combine ML methods with the conditional stochastic optimization problem. They include direct-effect data as well as other auxiliary information, and assume that the joint probability distributions are unknown and the observations are imperfect. They develop the framework with several ML methods and show that these techniques are computationally tractable and asymptotically optimal under some conditions. This tractability is shown in the presence of dependencies in the data and censored observations.
The main contributions of these recent papers are the use of informative data, the benefits of non-parametric models, reduced estimation errors, and more dynamic decisions. We categorize the approach proposed by Ban and Rudin (2019), which relates the optimal order quantity directly to the features through a functional form, as Linear Machine Learning (LML). This method combines the estimation and optimization steps by solving a nonlinear optimization problem whose objective is to minimize the newsvendor cost function directly instead of the regression error. In this method, similar to the
parameter fitting approach, the relationship between the features and the optimal order quantity is assumed to be linear.
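As a minimal sketch of the LML idea, the linear decision rule Q = w0 + w'F can be fitted by minimizing the empirical newsvendor cost, which becomes a linear program once auxiliary underage and overage variables are introduced. This uses SciPy's `linprog` with the HiGHS solver; the function name `lml_fit` is hypothetical, and no regularization term is included.

```python
import numpy as np
from scipy.optimize import linprog


def lml_fit(F, d, h, b):
    """Fit Q = w0 + w'F by minimizing the empirical newsvendor cost (an LP).

    Decision variables: weights w (free), underage u_t >= 0 and overage o_t >= 0.
    """
    d = np.asarray(d, dtype=float)
    T = len(d)
    X = np.column_stack([np.ones(T), np.asarray(F, dtype=float).reshape(T, -1)])
    p = X.shape[1]
    c = np.concatenate([np.zeros(p), np.full(T, b / T), np.full(T, h / T)])
    I, Z = np.eye(T), np.zeros((T, T))
    A_ub = np.block([[-X, -I, Z],    # u_t >= d_t - x_t'w
                     [X, Z, -I]])    # o_t >= x_t'w - d_t
    b_ub = np.concatenate([-d, d])
    bounds = [(None, None)] * p + [(0, None)] * (2 * T)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]
```

Unlike PfLR, the costs h and b shape the fitted weights directly, so the rule targets the b/(h+b) quantile of demand rather than its mean.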
Finally, some recent studies in data-based optimization take nonlinear relationships between feature data and the order quantity into consideration. Oroojlooyjadid et al. (2020) apply a deep learning approach to the newsvendor problem when such nonlinearities exist. They also consider multi-feature and multi-product extensions of the problem, and show that deep learning outperforms other benchmarks such as local regression, classification and regression trees, random forests, and kernel optimization, especially when demand is highly volatile. Seubert et al. (2020) develop a data-driven ordering system for a bakery chain based on artificial neural networks. They use two methods, sequential and joint estimation and optimization, and show that both considerably reduce costs compared to human planners. Qi et al. (2020) extend the approach of Oroojlooyjadid et al. (2020) to a multi-period inventory system with uncertain demand and vendor lead time. Zhang and Gao (2017) examine a supervised deep learning algorithm with two objectives and demonstrate that using the original newsvendor loss as the training function outperforms the quadratic loss function. The algorithm has been evaluated on synthetic and real data. In this class of methods, nonlinear relations between features and demand are considered. Oroojlooyjadid et al. (2020), as the first study of this class, use deep learning to construct complex functions that relate the features to the optimal order quantity in the classical newsvendor problem. This method is able to use the features and optimize the cost function while capturing a wide range of relationships beyond linear ones. However, their method does not identify the dependencies between consecutive states of the world in an evolving environment. This work aims to address this issue by modeling long-term dependencies using hidden Markov models.
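The core idea of the deep learning approaches above, training a network on the newsvendor cost instead of a squared-error loss, can be illustrated with a toy one-hidden-layer ReLU network in NumPy. This is a minimal sketch under stated assumptions (full-batch subgradient descent, a single hidden layer, the hypothetical function name `train_nv_network` and its default hyperparameters); it is not the architecture of any of the cited papers.

```python
import numpy as np


def train_nv_network(F, d, h, b, hidden=16, lr=0.01, epochs=2000, seed=0):
    """One-hidden-layer ReLU network trained directly on the empirical newsvendor cost.

    Full-batch subgradient descent; a toy stand-in for the deep networks cited above.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    F = np.asarray(F, dtype=float).reshape(len(d), -1)
    n = F.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n), (n, hidden))
    bias1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    bias2 = d.mean()
    for _ in range(epochs):
        z = F @ W1 + bias1
        a = np.maximum(z, 0.0)
        q = a @ W2 + bias2
        g = np.where(q > d, h, -b) / len(d)   # subgradient of the mean NV cost w.r.t. q
        ga = np.outer(g, W2) * (z > 0)        # back-propagate through the ReLU
        W2 -= lr * (a.T @ g)
        bias2 -= lr * g.sum()
        W1 -= lr * (F.T @ ga)
        bias1 -= lr * ga.sum(axis=0)

    def predict(F_new):
        X = np.asarray(F_new, dtype=float).reshape(-1, n)
        return np.maximum(X @ W1 + bias1, 0.0) @ W2 + bias2

    return predict
```

Such a network maps the current features to an order quantity but has no memory of past demand, which is the limitation with respect to an evolving hidden state that this work addresses.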
The contributions of this paper are as follows. First, we propose a novel approach to data-driven inventory management that considers both observable and unobservable sources of features affecting the randomness in demand. This new model extends existing data-driven methods to applications in which the factors that influence the state of the system, and their volatility, are difficult to identify. Second, we model the evolving environment with a hidden Markov model (HMM), the most commonly used model for this purpose. Using an HMM in a data-driven framework enables us to model long-term dependencies while taking advantage of other information sources available in the form of feature data. Third, by combining neural network modeling tools with stochastic inference and integrating them into the optimization method, we propose an integrated solution method for the suggested model that captures nonlinear dependencies between features and order quantity while alleviating the errors that occur in the estimation step. This is different from most of the literature, where the tasks of learning about the demand distribution based on the features and setting the
Table 1 Position of the suggested model in the existing literature.
No. | Method | Abbrev. | State-dependent | Data-driven | NV-cost function integration | Non-linear model
1 | Suggested model | HMMNV | ✓ ✓ ✓
2 | Empirical Demand Distribution | EDD |
3 | Objective-blind by Baum-Welch Alg. | ObBW |
4 | Parameter fitting (Linear Regression) | PfLR |
5 | Linear Machine Learning | LML | ✓ ✓
6 | Deep Learning | DNN | ✓ ✓ ✓
order quantity are handled separately. This may lead to a misalignment between the two different objectives of minimizing the estimation error and minimizing the system cost. Finally, we show the robustness of our proposed solution method through extensive numerical experiments with both synthetic and real data. We compare the results of the suggested model with data-driven methods that ignore the evolving environment as a hidden factor or that employ separate inference and optimization steps. Our numerical results reveal that the proposed approach performs better than other methods when unobservable features may be present, and suggest that taking unobservable features into account can bring significant benefits. In addition, we present evidence that integrating unobservable feature estimation and inventory optimization is feasible and may bring significant improvements.
3. Model
In this section, we describe the model setting and our integrated estimation and optimization approach. We consider a single-item newsvendor problem where the goal is to choose the order quantity at the beginning of every period to minimize the expected costs in that period. We assume that inventory is not carried over from one period to the next and that backordering of unsatisfied demand is not allowed (as in service capacity planning problems). In this setup, the decision maker solves the problem in each period independently of previous periods' inventory, and the order quantity does not affect future decisions. We suppose that the demand distribution is not known, but that demand depends on the available observable feature data in addition to having some long-term
Table 2 Description of the variables of the model.
Variable | Description
$D_t$ | demand at time $t$
$Q_t$ | order quantity at time $t$
$B_i$ | base demand in state $i$
$f_{nt}$ | $n$th feature observed at time $t$
$F_t$ | vector of the $N$ independent and identically distributed features at time $t$
$T$ | number of periods or sample observations
$\beta_n$ | linear coefficient of the $n$th feature with demand
$S_t$ | state of the Markov chain at time $t$
$q_t$ | vector of state probabilities
$\psi_E$ | emission network function, which partially maps the features $F_t$ to $D_t$
$W_E$ | set of parameters of the emission network function $\psi_E$
$\psi_{NV}$ | newsvendor network function, which partially maps $F_t$ to $Q_t$
$W_{NV}$ | set of parameters of the newsvendor network function $\psi_{NV}$
$\epsilon_t$ | error term representing the non-systematic part of the demand
$N(\mu, \sigma)$ | Gaussian distribution with mean $\mu$ and standard deviation $\sigma$
$\Lambda$ | likelihood function of the hidden Markov model
$a_{ij}$ | probability of transition from state $i$ to state $j$
$A$ | transition probability matrix of the Markov model
$E$ | vector of the probability density functions of the hidden states
$\pi$ | initial probability of the hidden Markov chain
$\alpha_t$ | forward parameter of the Baum-Welch algorithm
$\gamma_1$ | learning rate for updating $W_E$
$\gamma_2$ | learning rate for updating $W_{NV}$
$\gamma_3$ | learning rate for updating $A$
$\eta$ | coefficient of the trade-off between the newsvendor cost and the likelihood
dependencies on an unobservable feature modeled as a Markov chain. The optimal order quantity must therefore also depend on previous information about demand and feature data. We first present our notation and assumptions and then formulate the demand evolution model. A brief introduction to deep neural networks followed by our suggested approach completes the section.
3.1. Demand data, optimization and notation
The historical data of the problem can be represented using tuples of feature vectors and demand realizations as
$$\mathcal{D} = \{(F_1, D_1), (F_2, D_2), \ldots, (F_T, D_T)\}, \tag{1}$$
where $T$ denotes the number of periods. In Equation (1), the vector $F_t$ for each period $t = 1, 2, \ldots, T$ consists of $N$ different features $f_{1t}, f_{2t}, \ldots, f_{Nt}$.
Given the data $\mathcal{D}$, our focus is on the following newsvendor optimization problem, where the objective is to choose an order quantity $Q$ to minimize the expected (weighted) mismatch costs:
$$\min_{Q} \; NVC(Q) = \mathbb{E}_D\left[h(Q-D)^+ + b(D-Q)^+ \mid \mathcal{D}\right], \tag{2}$$
where $D$ denotes the random demand, $Q$ is the order quantity, $h$ is the unit overage cost and $b$ is the unit underage cost. In a retail setting, the overage and underage costs may refer to the cost of unsold items and lost demand, respectively. In a capacity (staffing) setting, they represent the cost of unused capacity and unfulfilled demand, respectively.
When the demand distribution is known, the optimal order quantity in (2) can be found by the well-known critical fractile rule, $Q^* = F_D^{-1}(b/(h+b))$. However, the data-driven environment presents an additional challenge in that the solution of the outer optimization problem depends on the inner estimation problem, where the expected cost has to be estimated using past observations. The estimation problem is itself solved as an optimization problem. Motivated by the recent success of methods that integrate estimation and optimization (Ban and Rudin 2019, Oroojlooyjadid et al. 2020), we propose an approach that uses an integrated solution.
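For a known normal demand distribution, the objective in (2) has a closed form through the standard normal loss function, which makes it easy to check numerically that the critical fractile rule minimizes the expected cost. The sketch below assumes $D \sim N(\mu, \sigma)$ and hypothetical parameter values; `expected_nv_cost` is an illustrative name.

```python
import numpy as np
from scipy.stats import norm


def expected_nv_cost(Q, mu, sigma, h, b):
    """Closed-form NVC(Q) = E[h(Q-D)^+ + b(D-Q)^+] for D ~ N(mu, sigma)."""
    z = (Q - mu) / sigma
    shortage = sigma * (norm.pdf(z) - z * norm.sf(z))  # E[(D-Q)^+], standard normal loss
    excess = Q - mu + shortage                          # E[(Q-D)^+] = Q - E[D] + E[(D-Q)^+]
    return h * excess + b * shortage


mu, sigma, h, b = 100.0, 20.0, 1.0, 4.0
q_star = mu + sigma * norm.ppf(b / (h + b))         # critical fractile rule
grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 80001)
q_grid = grid[np.argmin(expected_nv_cost(grid, mu, sigma, h, b))]
print(round(q_star, 2), round(q_grid, 2))
```

In the data-driven setting, neither $\mu$ nor $\sigma$ is known, and this is where the estimation problem enters.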
Finally, we note that the optimal quantity in (2) is found separately for each period, since inventory or backorders are not carried over. On the other hand, the quantity decision depends on the currently observed features and on all past demand observations, which carry information about the state of the unobservable feature. This makes the order quantity dependent on the entire demand sequence up to time $t$. Table 2 describes all the variables used in this study.
3.2. Special case: A newsvendor problem with Markov modulated demand and
observable states
Many papers in the literature assume that demand depends on an external state of the world that evolves according to a Markov chain $S$ (Treharne and Sox 2002, Arifoğlu and Özekici 2010). It is then natural to assume that demand $D$ depends on the external state, so that the demand in period $t$ is $D_t = D \mid S_t$ (or $D \mid S_{t-1}$, depending on the filtration). Unlike the general formulation in (2), the entire demand sequence is not required here because $S_t$ carries all the necessary information. We then look for the order quantity that minimizes the expected cost of the system,
$$\min_{Q} \; NVC(Q) = \mathbb{E}_D\left[h(Q-D)^+ + b(D-Q)^+ \mid S_t\right], \tag{3}$$
which is given by the critical fractile rule
$$Q_t^* = F^{-1}_{D \mid S_t}\left(\frac{b}{h+b}\right), \tag{4}$$
where $F_{D \mid S_t}(\cdot)$ is the cumulative distribution function of demand given $S_t$ and $F^{-1}(\cdot)$ denotes its inverse.
Let us now generalize the model further and define a base demand $B_i$ that depends on $S_t = i$ and an additional demand that depends on the values of certain observed features $f_{1t}, f_{2t}, \ldots, f_{Nt}$ that do not depend on $S_t$. Further, let us assume that $B_i$ is independent of the features. We then have
$$D_t = B_i + \psi(f_{1t}, f_{2t}, \ldots, f_{Nt}) + \epsilon_t, \tag{5}$$
where $\epsilon_t$ is a random error term with $\mathbb{E}[\epsilon_t] = 0$.
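3.2.1. Example. Assume that there are two states of the world, (1) Good and (2) Bad, and that $B_i$ is normally distributed with parameters $(\mu_i, \sigma_i)$, $i = 1, 2$. Further assume that
$$\psi(f_{1t}, f_{2t}, \ldots, f_{Nt}) = \beta_0 + \beta_1 f_{1t} + \beta_2 f_{2t} + \cdots + \beta_N f_{Nt}. \tag{6}$$
If, in addition, $\epsilon_t \sim N(0, \sigma_\epsilon)$ independently of $B_i$, then $D_t$ is normally distributed with mean $\mu_i + \psi(f_{1t}, f_{2t}, \ldots, f_{Nt})$ and variance $\sigma_i^2 + \sigma_\epsilon^2$. From the known results for normally distributed demand, we then have
$$Q_t^* = \mu_i + \psi(f_{1t}, f_{2t}, \ldots, f_{Nt}) + z\sqrt{\sigma_i^2 + \sigma_\epsilon^2}, \tag{7}$$
where $z = \Phi^{-1}(b/(h+b))$ and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable.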
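Equation (7) can be evaluated directly once the state is known. The sketch below uses hypothetical parameter values for the Good and Bad states and the linear $\psi$ of Equation (6); `state_dependent_order` is an illustrative name.

```python
import numpy as np
from scipy.stats import norm


def state_dependent_order(mu_i, sigma_i, sigma_eps, beta, f, h, b):
    """Equation (7): Q_t* = mu_i + psi(f) + z * sqrt(sigma_i^2 + sigma_eps^2)."""
    psi = beta[0] + np.dot(beta[1:], f)          # linear feature effect of Equation (6)
    z = norm.ppf(b / (h + b))
    return mu_i + psi + z * np.sqrt(sigma_i**2 + sigma_eps**2)


# Hypothetical parameters: state 1 = Good, state 2 = Bad
mu, sigma = {1: 50.0, 2: 30.0}, {1: 5.0, 2: 8.0}
beta, f = np.array([2.0, 1.5, -0.5]), np.array([4.0, 2.0])
for i in (1, 2):
    print(i, state_dependent_order(mu[i], sigma[i], 4.0, beta, f, h=1.0, b=3.0))
```

With the same features, the two states yield different order quantities; this is why, once the state becomes unobservable, the belief about it matters.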
Next, we present the main model of this paper, in which the states of the environment process are unobservable.
3.3. Our model: A data-driven newsvendor problem with unobservable
environment process
In an inventory system, there may be several sources of uncertainty, generated by observable sources (features) and non-observable sources. The features may correspond to observations such as the weather, seasonality, and local market conditions. In this model, the base demand is the non-observable source. The distribution of the base demand, which is one component of the total demand, depends on evolving states with two properties: first, they follow a Markov chain; second, they are not observable. The state of the Markov chain affects the system partially through the base demand. Therefore, the joint probability distribution of demand and state depends on the base demand distribution in each state, while the feature-dependent part completes the demand distribution independently of the states. Let us assume that $S_t$ is not observable but can only be inferred from past demands $D_1, D_2, \ldots, D_{t-1}$. This is similar to the setup in Treharne and Sox (2002) and Arifoğlu and Özekici (2010). One can then estimate the conditional distribution
$$\hat{S}_t = S_t \mid D_1, D_2, \ldots, D_{t-1}. \tag{8}$$
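When the HMM parameters are known, the conditional distribution in (8) can be computed recursively with the normalized forward (filtering) recursion: update the belief with each observed demand by Bayes' rule, then propagate it one step through the transition matrix. The sketch below assumes Gaussian demand in each state and ignores the features; `predicted_state_probs` and the parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def predicted_state_probs(demands, pi, A, mu, sigma):
    """Row t holds P(S_{t+1} = j | D_1, ..., D_t); row 0 is the prior pi.

    Normalized forward (filtering) recursion for an HMM with known parameters and
    Gaussian demand in each state (an illustrative assumption, features ignored).
    """
    pred = np.asarray(pi, dtype=float)
    out = [pred]
    for d in demands:
        filt = pred * norm.pdf(d, mu, sigma)  # Bayes update with the newly observed demand
        filt = filt / filt.sum()
        pred = filt @ A                       # one-step prediction through the Markov chain
        out.append(pred)
    return np.array(out)


A = np.array([[0.95, 0.05], [0.10, 0.90]])
probs = predicted_state_probs([34.0, 36.0, 33.0, 37.0], [0.5, 0.5], A,
                              np.array([20.0, 35.0]), np.array([3.0, 5.0]))
```

After a run of high demands, nearly all probability mass sits on the high-demand state, capped by the chance of switching back in the next period.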
In our model, we assume that the realizations of the features are independent of each other and of the time period. Any dependency in the sequence of observations of a feature, or between different features, does not add information to the decision at the beginning of a period, since the features are observed before the order quantity is set. Hence, the presence of such dependencies does not affect the solution. Moreover, a dependency between the
12
states of the (unobservable) Markov chain and the features may be beneficial to the order quantity
decision as the state of the Markov chain can be better inferred through the feature observations
ˆ
St=St|D1, D2,...,Dt1,Ft1.(9)
We consider the general form of the function in (6) and substitute it into Equation (5) to get
$$D_t = \sum_{i=1}^{S} I(S_t = i)\,B_i + \psi(f_{1t}, f_{2t}, \dots, f_{Nt}) + \epsilon_t, \tag{10}$$
where $I(x) = 1$ if $x$ is true and 0 otherwise. This implies a linear (additive) relation between the Markov chain-dependent and the feature-dependent parts of the demand in each state, and an unknown, possibly complex relation between $D_t$ and $F_t$. More specifically, we denote by $W$ the unknown set of parameters of the feature function that partially constitutes the demand, and assume the following form:
$$D_t = \sum_{i=1}^{S} I(S_t = i)\,B_i + \psi(F_t, W) + \epsilon_t. \tag{11}$$
Figure 1 The effects of the features and the evolution of the state on the demand.
Figure 1 shows how demand is generated by features and states: $D_3$ is a function of the observed features $F_3$ and the unobservable state $s_3$.
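To make the data-generating process of Equation (11) concrete, the sketch below simulates demand from a two-state Markov chain. It assumes NumPy, a symmetric switching probability `e` (the transition matrix form used later in Section 4.2), a uniformly drawn initial state, standard normal features and, for brevity, the linear $\psi$ of Equation (6) instead of a neural network. `simulate_demand` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def simulate_demand(T, e, mu, sigma, sigma_eps, beta, rng):
    """Draw D_t = sum_i I(S_t = i) B_i + psi(F_t) + eps_t for a two-state chain.

    mu, sigma: arrays with the base-demand mean and std of each state.
    beta: [beta_0, beta_1, ..., beta_N] of the (assumed linear) feature function.
    """
    A = np.array([[1 - e, e], [e, 1 - e]])        # symmetric transition matrix
    N = len(beta) - 1
    F = rng.standard_normal((T, N))                # features ~ N(0, 1)
    S = np.empty(T, dtype=int)
    S[0] = rng.integers(2)                          # uniform initial state (assumption)
    for t in range(1, T):
        S[t] = rng.choice(2, p=A[S[t - 1]])         # Markov transition
    B = rng.normal(mu[S], sigma[S])                 # state-dependent base demand B_i
    psi = beta[0] + F @ beta[1:]                    # feature-dependent part
    D = B + psi + rng.normal(0.0, sigma_eps, T)     # plus noise eps_t
    return D, S, F


rng = np.random.default_rng(0)
D, S, F = simulate_demand(T=400, e=0.01, mu=np.array([3.0, 1.0]),
                          sigma=np.array([1.0, 0.5]), sigma_eps=0.5,
                          beta=np.array([0.0, 1.0]), rng=rng)
```

With a small `e` the chain stays in one state for long stretches, which is the persistence that makes past demands informative about $S_t$.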
The inventory optimization problem can be written as
$$\min_{Q}\ NVC(Q) = \mathbb{E}_D\big[h(Q - D)^+ + b(D - Q)^+ \,\big|\, D_1, D_2, \dots, D_{t-1}, F_{t-1}\big]. \tag{12}$$
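In a data-driven setting the expectation in (12) is replaced by an average over observed demands. A minimal NumPy sketch of that sample-average cost follows (the function name is hypothetical); `Q` may be a single order or one order per period.

```python
import numpy as np


def newsvendor_cost(Q, D, h, b):
    """Average of h * (Q - D)^+ + b * (D - Q)^+ over the observed demands D."""
    D = np.asarray(D, dtype=float)
    Q = np.broadcast_to(np.asarray(Q, dtype=float), D.shape)   # scalar or per-period orders
    overage = np.maximum(Q - D, 0.0)                           # leftover units, cost h each
    underage = np.maximum(D - Q, 0.0)                          # unmet demand, cost b each
    return float(np.mean(h * overage + b * underage))


cost = newsvendor_cost(2.0, [1.0, 3.0], h=1.0, b=3.0)   # (1 * 1 + 3 * 1) / 2 = 2.0
```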
We propose to fit a joint estimation and optimization model to sample data and to test its performance out of sample, with the objective of minimizing the out-of-sample cost. Minimizing the cost function requires deriving an accurate belief about the state of the Markov chain at each point in time and solving the inventory optimization problem at that time. These two problems can be stated as follows.
(1) Hidden Markov model problem: First, we consider the problem of maximizing the likelihood of the observed sequence of demands generated by a hidden Markov model.
In our problem setting, we assume that demand observations are produced by a continuous stochastic process, and the problem of interest is characterizing the properties of these observations. From our data structure and several important applications, we know that the source of demand observations is nonstationary and its properties vary over time. We therefore use the mathematical structure of the HMM as the theoretical basis for characterizing the statistical properties of the demand observations (Rabiner 1989, Picone 1990).
Given a set of demand observations $D = D_{1:T}$, an HMM with a finite set of $S$ distinct hidden states changes the state of the system according to a set of probabilities associated with each state. To give a full probabilistic description of this system, the elements of an HMM other than the observations and the number of states are defined as follows and referred to as the triple of model parameters $\lambda = (\pi, A, E)$. Here, $\pi = \{\pi_i = P(S_1 = i)\}$ are the prior probabilities of each state $i$ being the initial state, $A = \{a_{ij}\}$ is the state transition probability matrix, and $E = \{p_1, p_2, \dots, p_S\}$ is the set of probability density functions of the observations in the hidden states, where $p_j(D_t) = P(D_t \mid S_t = j)$. Given this form of HMM, three basic problems must be solved for the model¹. We use the mathematical programming formulation of HMM (Qin et al. 2000) to represent these fundamental problems implicitly. The optimal solution of the following formulation is the parameter set $\lambda$ that is most likely to have generated the observed demand sequence:
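$$\begin{aligned} \max_{\pi, A, E}\ & \Lambda = \log P(D \mid \lambda) \\ \text{s.t.}\ & A \cdot \mathbf{1} = \mathbf{1}, \\ & \pi \cdot \mathbf{1} = 1, \end{aligned} \tag{P-HMM}$$
where the first constraint ensures that each row of the transition matrix sums to one, $\sum_j a_{ij} = 1$, with $a_{ij}$ the probability of moving from hidden state $S_t = i$ to $S_{t+1} = j$, and the second ensures that the initial-state probabilities sum to one, $\sum_i \pi_i = 1$.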
¹Rabiner (1989) explains that HMM design involves three problems: evaluating the probability (likelihood) of a sequence of observations under particular HMM parameters; identifying the best sequence of states; and adjusting the model parameters so that they explain the observations as well as possible.
(2) Newsvendor problem: The second problem is finding the network that sets the best order quantity given the state sequence $S = S_{1:T}$, estimated in the first problem, that has most probably generated the demand sequence:
$$\begin{aligned} \min_{Q, W_{NV}, B_i}\ & NVC(Q \mid S) \\ \text{s.t.}\ & Q_t = B_i + \psi_{NV}(F_t, W_{NV}), \quad t = 1, \dots, T,\ i \in S, \\ & W_{NV} \in \mathbb{R}. \end{aligned} \tag{P-NV}$$
However, these two objectives are not necessarily aligned. We use a linear scalarization method to combine the two problems into a single-objective optimization:
$$\begin{aligned} \min_{Q, W_{NV}, B_i, \lambda, q}\ & \text{Loss} = -\eta\Lambda + (1-\eta)\,NVC \\ \text{s.t.}\ & A \cdot \mathbf{1} = \mathbf{1}, \\ & \pi \cdot \mathbf{1} = 1, \\ & Q_t = B_i + \psi_{NV}(F_t, W_{NV}), \quad t = 1, \dots, T,\ i \in S, \end{aligned} \tag{P-HMMNV}$$
where $q$ is the probability of the states in effect at the time the order quantity $Q$ is decided. In Appendix B, Section B.1, we show how one can adjust the order quantity by changing the trade-off coefficient $\eta$ in (P-HMMNV) to counter the effect of the state probabilities².
To solve the above problem, we use deep learning, as it is a machine learning methodology that can model both highly non-linear functions and the Markov chain from historical data, and its training can be performed efficiently using gradient methods³. In the following, we briefly describe deep neural networks.
3.4. Deep neural networks
Deep neural networks are a subcategory of neural networks, which are machine learning models originally inspired by biological processes. Neural networks are highly capable of approximating non-linear functions; they are widely studied and used in machine learning and have various applications, especially in image and speech recognition (Gurney 2018).
A neural network consists of several connected nodes (neurons) that form a directed graph. Each neuron receives signals from its upstream neurons and passes the aggregate signal to an activation function, whose output is in turn passed to the downstream neurons. Activation functions are typically monotone increasing functions that map the set of real numbers to a finite interval, e.g. [0, 1]. Commonly used activation functions include the sigmoid and tanh functions. A deep neural network is a neural network with several hidden layers between the inputs and the outputs. Figure 2 depicts a symbolic neural network.

Figure 2 A deep neural network.

²We thank an anonymous referee for this suggestion.
³It is important to note that, in most data-driven approaches to the newsvendor problem, a regularization term is added to the objective function (Oroojlooyjadid et al. 2020). However, this issue is more critical in cases labeled as "fat data", where there are many input feature variables in the model (Ban and Rudin 2019). In the present study, the number of features is small, so we do not incorporate a regularization term explicitly in the objective function; rather, we handle overfitting by performing training and evaluation in the training step.
The data that we are interested in modeling in this work have a time dimension that is important in understanding the demand. The time dimension can be incorporated into a neural network by folding a network that takes the inputs from different times through different neurons into a network that takes in similar features from different time periods through the same neurons. The folded network is referred to as a recurrent network.
A neural network is fitted to a given set of data points by changing the weights of the arcs that connect the neurons; that is, the goal of training is to decrease the total error of the model in predicting the output variable by changing the network weights. In general, this problem can be very complicated; however, it has become computationally much less burdensome thanks to the backpropagation algorithm. Backpropagation is a gradient-based method that iterates between forward and backward passes through the network: the forward pass computes the outputs, and the backward pass calculates the gradients that are used to modify the arc weights.
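The forward and backward passes described above can be illustrated with a small NumPy sketch that maps features to an order quantity and is trained directly on the average newsvendor cost, in the spirit of the feature-based networks discussed in this paper. The single sigmoid hidden layer, initialisation scale, learning rate and epoch count are arbitrary choices for illustration, and `train_newsvendor_net` is a hypothetical name; this is not the HMMNV network.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_newsvendor_net(F, D, h, b, hidden=10, lr=0.1, epochs=3000, seed=0):
    """Fit q_t = w2 . sigmoid(F_t W1 + c1) + c2 by full-batch gradient descent
    on the average cost h * (q_t - D_t)^+ + b * (D_t - q_t)^+."""
    rng = np.random.default_rng(seed)
    T, N = F.shape
    W1 = 0.5 * rng.standard_normal((N, hidden))
    c1 = np.zeros(hidden)
    w2 = 0.5 * rng.standard_normal(hidden)
    c2 = 0.0
    for _ in range(epochs):
        # Forward pass: hidden activations, then one linear output neuron.
        H = sigmoid(F @ W1 + c1)                     # shape (T, hidden)
        q = H @ w2 + c2                              # shape (T,)
        # Backward pass: subgradient of the average cost w.r.t. each output q_t ...
        g = np.where(q > D, h, -b) / T
        # ... propagated to every weight by the chain rule.
        delta = np.outer(g, w2) * H * (1.0 - H)      # sigmoid'(z) = s(z) * (1 - s(z))
        w2 -= lr * (H.T @ g)
        c2 -= lr * g.sum()
        W1 -= lr * (F.T @ delta)
        c1 -= lr * delta.sum(axis=0)
    return lambda X: sigmoid(X @ W1 + c1) @ w2 + c2


rng = np.random.default_rng(42)
F = rng.standard_normal((500, 1))
D = 3.0 + 2.0 * F[:, 0] + 0.5 * rng.standard_normal(500)
policy = train_newsvendor_net(F, D, h=1.0, b=1.0)
orders = policy(F)
```

Because the loss is the newsvendor cost itself rather than a prediction error, the trained network outputs order quantities directly instead of demand forecasts.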
3.5. Deep neural network for solving the Markov-modulated and data-driven newsvendor
In this study, we propose a new algorithm, referred to as HMMNV, that uses Deep Neural Networks (DNN) to solve the proposed model. We unify the objective functions of problems (P-HMM) and (P-NV) by integrating the Markov chain and the newsvendor problem in a single network. To this end, we propose a two-head neural network in which these objective functions are combined into a single function, as in problem (P-HMMNV), and optimized simultaneously. The suggested network comprises two networks: the HMM network and the newsvendor network. The HMM network obtains the most likely sequence of states from the available data. The state information, $q$, which partially determines the order quantity, is fed together with the available features into the newsvendor network, which produces the order quantity in each period. Figure 3 shows this integration and represents the folded version of the recurrent network expanded over time.
In addition to these two networks, we estimate the base demand $B$ and the function $\psi$ in Equation (11) with a separate neural network in each state. We refer to this network as the emission network; its outputs are used as the likelihoods of an observation given the model parameters. The emission network is embedded in the HMM network.
In order to estimate the hidden Markov model and train its network, we can unfold the recurrent network and treat it as a feed-forward network. The feeding and backpropagation steps of the neural network estimation are equivalent to the forward and backward steps of the well-known Baum-Welch algorithm, which is used to estimate HMMs. The likelihood of each demand observation is estimated by a normal probability density function, i.e. $p_i(D_t)$ is the density of
$$\mathcal{N}\big(\mu_i + \psi_E(F_t, W_E),\ \sigma_i^2 + \sigma_\epsilon^2\big) \tag{13}$$
evaluated at $D_t$,
where $\mu_i$ is the mean of the base demand in state $i$. In the forward step, a forward term is defined at each time for each state of the model as
$$\alpha_t(s) = P(D_{1:t}, S_t = s) = \sum_{i=1}^{S} P(D_{1:t}, S_t = s, S_{t-1} = i), \tag{14}$$
where $D_{1:t} = \{D_1, D_2, \dots, D_t\}$. Applying the chain rule to $P(D_{1:t}, S_t = s, S_{t-1} = i)$, we then have
$$\alpha_t(s) = \sum_{i=1}^{S} P(D_t \mid S_t = s, S_{t-1} = i, D_{1:t-1})\, P(S_t = s \mid S_{t-1} = i, D_{1:t-1})\, P(S_{t-1} = i, D_{1:t-1}). \tag{15}$$
Since the last observation $D_t$ is conditionally independent of everything but $S_t$, and in the Markov model $S_t$ depends only on $S_{t-1}$, Equation (15) can be written as
$$\alpha_t(s) = P(D_t \mid S_t = s) \sum_{i=1}^{S} P(S_t = s \mid S_{t-1} = i)\, \alpha_{t-1}(i). \tag{16}$$
Finally, given the demands observed up to time $t$, the probability of being in state $j$ at time $t+1$ is
$$P(S_{t+1} = j \mid D_{1:t}) = \frac{\sum_{i=1}^{S} \alpha_t(i)\, a_{ij}}{\sum_{k=1}^{S} \alpha_t(k)}. \tag{17}$$
Figure 3 HMMNV network. This figure includes three neural networks: (1) the Hidden Markov Model (HMM) network (dashed rectangle), (2) the newsvendor network, and (3) the emission network, which is embedded in the HMM network. The network weights denoted by B and A are the base demand associated with each state and the transition probabilities between states, respectively. All other connections have weight 1. Filled units (circles) denote summation, crossed units multiply their inputs, and units divided by a line divide one input by the other. The network is repeated once per observation, so that it forms a network over time (a recurrent network).
In Equation (17), $a_{ij}$ is the most recent estimate of the probability of transition from state $i$ to state $j$. Let $q_{t+1}(j)$ denote the probability obtained by Equation (17). The vector $q_{t+1} = [q_{t+1}(1), \dots, q_{t+1}(S)]$ contains the probabilities of all states at time $t+1$, which sum to one. As the state information, $q_{t+1}$ is then multiplied by network weights to build the base demand, represented by the connection between the HMM and newsvendor networks in Figure 3. The sum of the state-dependent and feature-dependent values is the estimate of the order quantity.
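A compact NumPy sketch of Equations (13) to (17) follows. It builds the emission densities $p_i(D_t)$ from the normal model in (13), runs the unscaled forward recursion (16), and forms the next-period state probabilities $q_{t+1}$ of (17), normalised to sum to one. The feature term $\psi_E(F_t, W_E)$ is passed in as a precomputed array standing in for the emission network, the matrix `A` uses `A[i, j]` $= a_{ij}$, and all function names and numbers are hypothetical. Because the unscaled $\alpha_t$ underflows on long series, this version only suits short sequences; the scaling in Section 3.5.1 addresses that.

```python
import numpy as np


def emission_densities(D, mu, sigma, sigma_eps, psi_e):
    """P[t, i] = p_i(D_t): normal density with mean mu_i + psi_E(F_t, W_E)
    and variance sigma_i^2 + sigma_eps^2, as in Equation (13)."""
    var = sigma ** 2 + sigma_eps ** 2                    # shape (S,)
    resid = D[:, None] - mu[None, :] - psi_e[:, None]    # shape (T, S)
    return np.exp(-resid ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)


def forward_unscaled(pi, A, P):
    """alpha[t, s] = P(D_1:t, S_t = s), computed with Equation (16)."""
    T, S = P.shape
    alpha = np.zeros((T, S))
    alpha[0] = pi * P[0]
    for t in range(1, T):
        alpha[t] = P[t] * (alpha[t - 1] @ A)    # p_s(D_t) * sum_i alpha_{t-1}(i) a_is
    return alpha


def next_state_probs(alpha_t, A):
    """q_{t+1}(j) proportional to sum_i alpha_t(i) a_ij, normalised to sum to one."""
    q = alpha_t @ A
    return q / q.sum()


pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
P = np.array([[0.4, 0.1], [0.3, 0.2]])         # illustrative densities for T = 2
alpha = forward_unscaled(pi, A, P)
likelihood = alpha[-1].sum()                   # P(D_1:2) = 0.069
q_next = next_state_probs(alpha[-1], A)        # approximately [0.778, 0.222]
```

For this two-period example, enumerating the four state paths by hand also gives $P(D_{1:2}) = 0.054 + 0.004 + 0.003 + 0.008 = 0.069$, matching the recursion.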
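In the estimation process, scaling is required when implementing the forward-backward algorithm. Consider the forward term of Equation (14) in the re-estimation procedure, written as
$$\begin{aligned} \alpha_t(s) &= P_\lambda(D_1, \dots, D_t, S_t = s) \\ &= \sum_{s_1, \dots, s_{t-1}} P_\lambda(D_1, \dots, D_t \mid s_1, \dots, s_{t-1}, s_t = s)\, P_\lambda(s_1, \dots, s_{t-1}, s_t = s) \\ &= \sum_{s_1, \dots, s_{t-1}} \pi_{s_1} \prod_{i=1}^{t} p_{s_i}(D_i) \prod_{i=1}^{t-1} a_{s_i s_{i+1}}, \qquad s_t = s. \end{aligned} \tag{18}$$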
3.5.1. Scaling the terms: All the terms involved in Equation (18) are on a probability scale, meaning that they are typically less than one. Therefore, the products in (18), and hence their sum, decay to zero at an exponential rate as $t$ grows. The result soon falls below machine precision and rounds to zero in floating point. To solve this problem, a scaling proposed and used by Levinson et al. (1983) keeps all $\alpha_t(s)$'s bounded at each induction step. The scaling factor depends only on time $t$, not on the current state $s$. The corresponding computations include two parts:
Initialization
$$\ddot\alpha_1(i) = \alpha_1(i), \qquad c_1 = \frac{1}{\sum_{i=1}^{S} \ddot\alpha_1(i)}, \qquad \hat\alpha_1(i) = c_1\, \ddot\alpha_1(i). \tag{19}$$
Induction
$$\ddot\alpha_t(i) = \sum_{j=1}^{S} \hat\alpha_{t-1}(j)\, a_{ji}\, p_i(D_t), \qquad c_t = \frac{1}{\sum_{i=1}^{S} \ddot\alpha_t(i)}, \qquad \hat\alpha_t(i) = c_t\, \ddot\alpha_t(i). \tag{20}$$
The coefficient $c_t$ is recomputed at each step $t$. $\hat\alpha_t(i)$ is the modified forward variable, which sums to one: $\sum_{i=1}^{S} \hat\alpha_t(i) = 1$. It is easy to see that
$$\hat\alpha_t(i) = \Big(\prod_{\tau=1}^{t} c_\tau\Big)\alpha_t(i). \tag{21}$$
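Using this modified forward variable at the last step, $T$, we have
$$1 = \sum_{i=1}^{S} \hat\alpha_T(i) = \sum_{i=1}^{S} \Big(\prod_{t=1}^{T} c_t\Big)\alpha_T(i) = \Big(\prod_{t=1}^{T} c_t\Big)\sum_{i=1}^{S} \alpha_T(i) = \Big(\prod_{t=1}^{T} c_t\Big) P(D \mid \lambda). \tag{22}$$
Let $C_t = \prod_{\tau=1}^{t} c_\tau$; then $P(D \mid \lambda) = 1/C_T$. The logarithmic form of the likelihood function is then
$$\Lambda = \log P(D \mid \lambda) = -\sum_{t=1}^{T} \log c_t. \tag{23}$$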
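The scaled recursion (19) to (20) and the log-likelihood (23) can be sketched in NumPy as follows (hypothetical function name; `pi`, `A` and the emission matrix `P` follow the notation above, with `A[i, j]` $= a_{ij}$ and `P[t, i]` $= p_i(D_t)$). The usage lines compare it with the unscaled likelihood on a short sequence and show that it stays finite on a long one, where the unscaled product would underflow to zero.

```python
import numpy as np


def scaled_forward(pi, A, P):
    """Return alpha_hat (rows sum to one), the scaling factors c_t, and
    Lambda = log P(D | lambda) = -sum_t log c_t (Equation (23))."""
    T, S = P.shape
    alpha_hat = np.zeros((T, S))
    c = np.zeros(T)
    for t in range(T):
        if t == 0:
            alpha_dd = pi * P[0]                          # initialisation (19)
        else:
            alpha_dd = (alpha_hat[t - 1] @ A) * P[t]      # induction (20)
        c[t] = 1.0 / alpha_dd.sum()                       # depends on t only, not on the state
        alpha_hat[t] = c[t] * alpha_dd
    return alpha_hat, c, -np.sum(np.log(c))


pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
_, _, loglik_short = scaled_forward(pi, A, np.array([[0.4, 0.1], [0.3, 0.2]]))
# loglik_short equals log(0.069), the log of the unscaled P(D_1:2)

P_long = np.full((2000, 2), 1e-3)    # unscaled: (1e-3) ** 2000 underflows to 0.0
_, _, loglik_long = scaled_forward(pi, A, P_long)
```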
In the next step, we obtain the partial derivatives of the function Λ with respect to all network
parameters.
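3.5.2. Backpropagation step: The backpropagation step of the HMM network requires the partial derivatives of the likelihood function (23) with respect to the transition probabilities $a_{ij}$ and the emissions $p_i(D_t)$, which are calculated as
$$\frac{\partial \Lambda}{\partial a_{ij}} = \sum_{t=1}^{T} \frac{\partial \Lambda}{\partial c_t}\,\frac{\partial c_t}{\partial \ddot\alpha_t(j)}\,\frac{\partial \ddot\alpha_t(j)}{\partial a_{ij}}, \tag{24}$$
$$\frac{\partial \Lambda}{\partial a_{ij}} = \sum_{t=1}^{T} c_t\,\hat\alpha_{t-1}(i)\,p_j(D_t), \tag{25}$$
$$\frac{\partial \Lambda}{\partial p_i(D_t)} = \sum_{t=1}^{T} \frac{\partial \Lambda}{\partial c_t} \sum_{j=1}^{S} \frac{\partial c_t}{\partial \ddot\alpha_t(j)}\,\frac{\partial \ddot\alpha_t(j)}{\partial p_i(D_t)}, \tag{26}$$
$$\frac{\partial \Lambda}{\partial p_i(D_t)} = \sum_{t=1}^{T} \sum_{j=1}^{S} c_t\,\hat\alpha_{t-1}(j)\,a_{ji}. \tag{27}$$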
The derivatives obtained from Equation (27) are used to update two parameter sets: the base demand vector $B = [B_1, B_2, \dots, B_S]$ and the weights of the emission network $W_E$:
$$B_{n+1} = B_n + \gamma_1 \frac{\partial \Lambda}{\partial p_i(D_t)}\,\frac{\partial p_i(D_t)}{\partial B_n}. \tag{28}$$
The input associated with the base demand is the unit vector, as in Figure 3. Therefore, the second factor in (28) is $\partial p_i(D_t)/\partial B_n = 1$.
Regarding the update of the emission network weights, we have
$$W^E_{n+1} = W^E_n + \gamma_1 \frac{\partial \Lambda}{\partial p_i(D_t)}\,\frac{\partial p_i(D_t)}{\partial W^E_n}, \tag{29}$$
where the first partial derivative is obtained in (27) and the second, the partial derivative of the emission $p_i(D_t)$ with respect to the network weights, comes from the backpropagation step in the emission network $W^E_n$, which is explained in Appendix A.
The newsvendor network is a feed-forward network trained by the backpropagation algorithm; its weights are updated using the derivatives of the cost function with respect to $W^{NV}$:
$$W^{NV}_{n+1} = W^{NV}_n - \gamma_2 \frac{\partial NVC}{\partial W^{NV}_n}. \tag{30}$$
Further details of the partial derivative term are provided in Appendix A.
The transition matrix $A$ is the part shared by the HMM and newsvendor networks. Consequently, the partial derivative of the likelihood in Equation (25), together with the partial derivative of the newsvendor cost function with respect to $A$, is used to update the transition matrix:
$$A_{n+1} = A_n - \gamma_3\Big(-\eta\,\frac{\partial \Lambda}{\partial A_n} + (1-\eta)\,\frac{\partial NVC}{\partial A_n}\Big), \tag{31}$$
where the term in parentheses is the partial derivative of the single objective function of (P-HMMNV) with respect to $A$:
$$A_{n+1} = A_n - \gamma_3\,\frac{\partial\, \text{Loss}}{\partial A_n}. \tag{32}$$
3.5.3. Network specification: We consider two hidden layers in the newsvendor network, following the rule proposed by Huang (2003) for determining the number of hidden nodes in each layer. Further, we consider a single hidden layer for the emission network, in which the number of hidden nodes is specified by the formula suggested by Ke and Liu (2008). In the training procedure of the HMMNV model, we choose the candidates for the trade-off coefficient $\eta$ from the set {0.001, 0.01, 0.1, 0.9, 0.99, 0.999}. To find the best learning rate for each network, a grid search over the set {0.001, 0.01, 0.1, 2, 10} is used for all learning rates. We choose the best candidate among these parameters based on a cross-validation step on the training data set. We then train and test the model on the full data set using the best parameters chosen by cross-validation.
Figure 4 indicates how the sample observations are divided into different sets to implement the algorithms. First, we pick a subset of the data and divide it into training and test samples for inner cross-validation to find the best parameters. We then train the algorithm with the best selected parameters on the entire subset chosen initially and test the model on the remaining test set to evaluate its performance.
Figure 4 Different sets of the sample observations used for cross-validation, training, and test.
4. Analyses
In this section, we test the performance of the HMMNV model using simulated data. In Section 4.1, we introduce several alternative methods as benchmarks. In Section 4.2, we describe the numerical setup used to compare HMMNV with the benchmarks, and in Section 4.3 we present and highlight the results.
4.1. Benchmarks and true model
In the following, we list the benchmark methods and explain how they solve the newsvendor problem at hand. We also report the newsvendor cost obtained by the true model as a near-perfect benchmark; the true model serves as the baseline against which all methods are compared.
EDD: The empirical demand distribution approach uses only the demand observations and finds the optimal solution based on the empirical distribution, which assigns equal weight to each observation. The optimal quantity is obtained by the known formula, i.e. the $b/(b+h)$ quantile of the empirical distribution.
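The EDD critical-fractile rule fits in a few lines of standard-library Python: order the smallest observed demand at which the empirical CDF reaches $b/(b+h)$. The function name below is hypothetical.

```python
import math


def edd_order_quantity(demands, h, b):
    """Smallest observed demand whose empirical CDF is at least b / (b + h);
    each observation carries weight 1/n."""
    d = sorted(demands)
    k = max(1, math.ceil(len(d) * b / (b + h)))   # 1-based rank of the order statistic
    return d[k - 1]


q = edd_order_quantity([5, 1, 4, 2, 3], h=1, b=3)   # critical ratio 0.75 -> 4
```

For these five demands and $b = 3h$, ordering 4 gives an average cost of 1.8, against 2.4 for ordering 3 and 2.0 for ordering 5.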
ObBW: The objective-blind Baum-Welch algorithm considers the Markov chain that modulates the demand (Avci et al. 2020, Treharne and Sox 2002). It finds the most probable sequence of states under a predetermined distribution for demand; it is therefore a parametric approach that fits a distribution to demand and estimates its parameters for each state of the Markov model. The Baum-Welch algorithm is a well-known forward-backward method for estimating hidden Markov states from observable data (i.e. demand). Details of this algorithm are described by Rabiner (1989).
PfLR: The parameter-fitting linear regression is the first and simplest method that takes the feature data into account. It consists of two steps: estimating the parameters of the demand distribution, and then using these estimates in the optimization problem (Ban and Rudin 2019). This approach ignores the dependency between demand observations through the hidden Markov states. That said, it is a dynamic approach that establishes a linear relationship between features and demand through the linear regression
$$D_t = \beta_0 + \beta_1 f_{1t} + \beta_2 f_{2t} + \dots + \beta_N f_{Nt} + \epsilon_t, \tag{33}$$
where $t = 1, \dots, T$ and $N$ is the number of features. The time-varying mean and the standard deviation of the estimated normal demand distribution are calculated from the coefficients and residuals of regression (33); the standard deviation of the demand distribution equals the standard deviation of the residuals. The optimal order quantity is then
$$Q^*_t = \beta_0 + \beta_1 f_{1t} + \beta_2 f_{2t} + \dots + \beta_N f_{Nt} + z\sigma_\epsilon, \tag{34}$$
where $z = \Phi^{-1}(b/(h+b))$.
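The two PfLR steps can be sketched as follows, assuming NumPy for the ordinary least squares fit of (33) and the standard-library `NormalDist` for $z$. The function name and synthetic data are hypothetical, and the degrees-of-freedom correction in the residual standard deviation is an assumption of this sketch, not something the method description specifies.

```python
import numpy as np
from statistics import NormalDist


def pflr_orders(F_train, D_train, F_new, h, b):
    """Two-step PfLR: OLS fit of Equation (33), then Equation (34) for new feature rows."""
    X = np.column_stack([np.ones(len(F_train)), F_train])
    beta, *_ = np.linalg.lstsq(X, D_train, rcond=None)       # beta_0, ..., beta_N
    resid = D_train - X @ beta
    sigma_eps = resid.std(ddof=X.shape[1])                    # residual std (ddof is an assumption)
    z = NormalDist().inv_cdf(b / (h + b))
    X_new = np.column_stack([np.ones(len(F_new)), F_new])
    return X_new @ beta + z * sigma_eps, beta


rng = np.random.default_rng(0)
F = rng.standard_normal((1000, 2))
D = 2.0 + 3.0 * F[:, 0] - 1.0 * F[:, 1] + 0.5 * rng.standard_normal(1000)
orders, beta = pflr_orders(F, D, F[:5], h=1.0, b=4.0)
```

The estimation step never sees the costs $h$ and $b$; they enter only through the safety term $z\sigma_\epsilon$, which is the separation of learning and optimization that the integrated approach is meant to avoid.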
LML: Ban and Rudin (2019) suggest a linear decision rule for the order quantity,
$$\mathcal{Q} = \Big\{q: \mathcal{X} \to \mathbb{R} \;:\; q_t(\beta) = \beta_0 + \sum_{n=1}^{N} \beta_n f_{nt}\Big\}. \tag{35}$$
They substitute this linear formula into the newsvendor problem and solve a nonlinear program for the optimal order quantity.
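The sketch below fits the linear rule (35) by minimising the sample-average newsvendor cost. Ban and Rudin (2019) solve this with an optimisation solver; here, as a swapped-in and deliberately simple alternative, plain subgradient descent with a diminishing step size is used, so the result is only approximately optimal. The function name, step-size rule, iteration count and synthetic data are illustrative assumptions.

```python
import numpy as np


def lml_fit(F, D, h, b, steps=5000, lr=0.5):
    """Approximately minimise (1/T) sum_t [h (q_t - D_t)^+ + b (D_t - q_t)^+]
    over beta, where q_t = beta_0 + sum_n beta_n f_nt (Equation (35))."""
    X = np.column_stack([np.ones(len(F)), F])
    beta = np.zeros(X.shape[1])
    for k in range(1, steps + 1):
        q = X @ beta
        g = np.where(q > D, h, -b)                          # subgradient of the cost w.r.t. q_t
        beta -= (lr / np.sqrt(k)) * (X.T @ g) / len(D)      # diminishing step size
    return beta


rng = np.random.default_rng(0)
F = rng.standard_normal((1000, 1))
D = 2.0 + 3.0 * F[:, 0] + 0.5 * rng.standard_normal(1000)
beta = lml_fit(F, D, h=1.0, b=3.0)
```

Because the noise here is homoscedastic, the cost-minimising rule keeps the slope near 3 and shifts the intercept to roughly $2 + 0.5\,\Phi^{-1}(0.75) \approx 2.34$: the asymmetry between $b$ and $h$ is absorbed into $\beta_0$ without a separate distributional fit.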
DNN: Oroojlooyjadid et al. (2020) apply a deep neural network to the newsvendor problem, a nonlinear extension of the LML model. In this approach, a neural network maps the feature data to the optimal decision. The goal of the network is to minimize the average cost over all periods,
$$\min_{W}\ \frac{1}{T}\sum_{t=1}^{T} NVC\big(\theta(F_t, W), D_t\big), \tag{36}$$
where $W$ is the matrix of network weights, $F_t$ is the vector of input features at time $t$, and $\theta$ is the mapping function represented by the network. To give the DNN method a network structure similar to that of HMMNV, we use the same hyperparameter selection rule (network specification, including the number of hidden layers and the number of hidden nodes in each layer) described for the HMMNV method.
True model: The true model is used as a benchmark in which the prediction of the state and the effects of the features are perfect. Although this scenario is impossible in practice, it captures the gap between the performance of the methods discussed here and that of an ideal method. In the true model, we assume that one knows the exact parameters of the model. Two sets of parameters exist: first, the states of the system, which are hidden from and unobservable to all the other methods; second, the distribution of the demand at each time, which is the sum of two normally distributed parts, the base demand associated with the state and the feature part of the demand. More specifically, the current state $i$, the mean $\mu_i$ and variance $\sigma_i^2$ of the base demand, the exact function $\psi(F_t, W)$, and the variance $\sigma_\epsilon^2$ are known. Therefore, the only sources of cost in the true model are the two variance terms, $\sigma_\epsilon^2$ and $\sigma_i^2$, of the feature part and the base demand distributions. The true optimal order quantity is then obtained by the known formula
$$Q^*_t = \mu_i + \psi(F_t, W) + z\sqrt{\sigma_i^2 + \sigma_\epsilon^2},$$
where $z = \Phi^{-1}(b/(b+h))$ and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable.
In the next section, the results for each method are reported as the percentage deviation of its cost from the cost of the true model:
$$\text{Percentage deviation of a method} = \frac{\text{Cost of the method} - \text{Cost of the true model}}{\text{Cost of the true model}} \times 100.$$
4.2. Numerical Experiments
In this section, we first present the experimental setup that we have used to evaluate the per-
formance of our method compared to the benchmarks. In our experimental setup, the demand
in each period is a normally distributed random variable. The mean of this distribution in each
period is determined based on the state of a Markov chain and a number of randomly generated
23
Table 3 The set of parameters used in the experimental setup.
Parameter Description Domain
etransition probability 0.01 0.1 0.2
µ1mean of the base demand in state 1 1 2 3
µ2mean of the base demand in state 2 1
σ1standard deviation of the base demand in state 1 0.5 1
σ2standard deviation of the base demand in state 2 0.5
σϵstandard deviation of the demand by features 0.5
Nnumber of features 1 3 5
Wnetwork weights 1 2 3
Lnumber of hidden layers 1 2 3
b/h newsvendor cost parameters ratio 2 5 10
Tnumber of observations 200 400 800
features for that period. Table 3shows the domain for each parameter of the system. According
to Figure 1, we evolve the base demand by a two-state Markov chain model, and the additional
part of the demand is generated by using a neural network and feature data. The final demand is
the summation of state-dependent and feature-dependant demand values. Parameter erelates to
the transition probability between two states of the Markov chain. The transition matrix is then
determined as
s1s2
s11e e
s2e1e
,
where s1and s2indicate two states of the Markov chain. Regarding the base demand in each state,
we assume that they have normal distribution with parameters (µ1, σ1) and (µ2, σ2), respectively.
We change the parameters of the first state within the range specified in Table 3and fix those of
the second state on (µ2= 1, σ2= 0.5). Nis the number of features observed before the realization
of the demand. All the features are randomly drawn from the standard normal distribution. We
consider three different network weights denoted by W. Each network weight is a random variable
with the standard normal distribution. The structure of the network, including the number of
hidden layers makes the relation between the features and demand more complex as the number
of hidden layers increases. In our experiments, we consider up to three layers denoted by L. We
consider 10 hidden nodes in each hidden layer where a sigmoid function serves as the activation
function. We also consider three different values for the ratio between the cost rates in the problem
(b/h). Finally, the parameter Tis the number of observations. We evaluate the methods on all
5832 combinations that these parameter sets provide.
4.3. Results
In the following, we summarize the results of our experiments. In pairwise comparisons, HMMNV outperforms DNN in 64% of the cases and EDD in 60% of the cases. HMMNV outperforms all the other methods in 37% of the cases, while EDD outperforms all the other methods in 15% of the cases.
Figure 5 presents the results of these experiments. In this figure, each point corresponds to one parameter set; the x-axis lists the algorithms and the y-axis shows the percentage deviation from the cost of the true model for that parameter set. A box plot is overlaid on the scattered points of each algorithm. HMMNV obtains a lower mean and a smaller spread than the other methods. ObBW and EDD do not perform well. The other three methods, PfLR, LML, and DNN, perform similarly. This grouping in performance implies that both sources affecting demand, the observable features and the non-observable Markov states, have to be taken into account.
Figure 5 The result of all algorithms for different parameter sets. (Scatter and box plots for EDD, ObBW, PfLR, LML, DNN and HMMNV; vertical axis: percentage deviation from the cost of the true model, 0 to 200.)
To compare the methods pairwise, we also run a t-test on the cost values of each pair of algorithms. This test indicates whether there is a statistically significant difference between the results of any two algorithms. Table 4 shows the p-values of the test: if the algorithm in row $i$ has a lower cost than the algorithm in column $j$, cell $ij$ of the table contains the p-value obtained from the test; otherwise it is left empty.

Table 4 p-values of the t-test between the cost of algorithms.

| | EDD | ObBW | PfLR | LML | DNN | HMMNV |
|---|---|---|---|---|---|---|
| EDD | | 3.7894e-100 | | | | |
| ObBW | | | | | | |
| PfLR | 7.7860e-13 | 9.8089e-168 | | 0.8860 | 1.0743e-06 | |
| LML | 3.5083e-12 | 1.1297e-163 | | | 2.6823e-06 | |
| DNN | 0.0312 | 2.7515e-115 | | | | |
| HMMNV | 1.6786e-80 | 4.6684e-299 | 2.4323e-35 | 1.5093e-35 | 9.6652e-63 | |

A p-value lower than 1% indicates that the mean cost of an algorithm is lower than that of the other at the 1% significance level. HMMNV has a lower cost than all the others, and the p-values close to zero confirm this. EDD, the simplest method, outperforms only ObBW. ObBW has a higher cost than all other methods. PfLR and LML are very close, and both outperform DNN. DNN is better than EDD and ObBW at the 5% and 1% levels, respectively.
To investigate the effect of each parameter on the results and show how the algorithms differ over the range of the parameters, we plot the deviations of the methods for each parameter separately. Figure 6 presents the results for the parameter values in Table 3.

Figure 6 Deviations of algorithms' cost from the true model against the range of each parameter. (Eight panels, one per parameter: $e$, $\mu_1$, $\sigma_1$, $N$, $W$, $L$, $b/h$ and $T$; vertical axis: percentage deviation from the true model, 0 to 100; series: EDD, ObBW, PfLR, LML, DNN and HMMNV.)

The parameter $e$, the transition probability of the hidden Markov chain, represents the information about the non-observable states. A lower value of $e$ indicates a more stable Markov chain and, consequently, a greater effect of the states on demand. As a result, HMMNV has the lowest cost for the smallest value of $e$. As $e$ increases, all methods except ObBW converge, which means that the systematic effect of the Markov states on the randomness of demand decreases. The mean of the base demand affects the results more than the other parameters. As $\mu_1$ increases, the long-term dependencies influence demand more than the observable features; as a result, HMMNV performs better than the benchmarks in cases with a larger imbalance between the base demands. The other parameter of the base demand, its standard deviation, is not as influential as the mean, and all methods except HMMNV are insensitive to it. As with the mean of the base demand, the standard deviation imposes more cost on our method as its value in one state differs more from that in the other state. The fourth plot shows the effect of the number of observable features, $N$. As expected, methods that do not use features, such as EDD and ObBW, are not sensitive to $N$. The other methods converge as $N$ increases, which results from the noise that the features add to demand. If the features compose the underlying pattern, one can get better results by incorporating them into the model.

A specific value of the parameter $W$, which determines the relation between features and demand, is not meaningful in itself compared to its other values, but the lower deviations over all values of $W$ show the ability of HMMNV to approach the optimal policy under different random relations. The complexity of these relations stems from the structure of the networks and the associated number of hidden layers; HMMNV results in a lower cost regardless of this nonlinear complexity. The outperformance of HMMNV is also evident across the cost imbalances imposed by the $b/h$ ratio, and deviations increase with this ratio for all methods. Regarding the number of observations, our method works better as $T$ increases, because the Markov sequence is captured better with more observations. Overall, these analyses indicate the robustness of the proposed algorithm, which results in a lower average cost than the other methods.
In a separate experiment, we examine the performance of the proposed algorithm in some extreme situations, where historical demand observations contain low-frequency outliers. We generate new sample sets using different model parameters and show the robustness of our model in these extreme scenarios. The results of these experiments are given in Appendix B, Section B.2⁴.
5. Real Data Experiment: Crude Oil Demand as a Proxy
We evaluate the performance of our algorithm and the benchmark methods using real data on U.S. weekly crude oil demand. These data can be taken as a proxy for a product or service whose demand is closely correlated with U.S. crude oil. Food crops and agricultural commodities, including corn, wheat, rice, and sugar, are some newsvendor products that have been shown to be correlated with crude oil (Du et al. 2011, Mokni and Youssef 2020).

Figure 7 Time series of the weekly demand for crude oil from January 1986 to August 2020. The average demand is 14731 thousand barrels per week.

⁴We thank an anonymous referee for this suggestion.
The crude oil data consist of weekly demand observations covering the period from January 1986 to August 2020.⁵ The demand series is shown in Figure 7. The set has 1800 weekly observations. The feature set for the model includes 16 dummy variables representing the quarter of the year (4) and the month of the year (12). These features capture the observable variation in demand, which is driven by seasonality. The variations caused by environmental randomness, however, are not observable; we assume that they exist and follow a two-state Markov chain. Our algorithm uses the feature data to explain the additional demand and models the residuals that are not captured by the features as base demand. The base demand is modeled by the Markov chain and drives the evolution of demand over time together with the additional demand.
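The 16 seasonal features can be built as one-hot encodings of the quarter and the month of each observation. The sketch below is a minimal illustration, assuming each weekly observation is keyed by a single calendar date (for example, the end date of the week); the helper name `seasonal_dummies` and the column order (4 quarter dummies, then 12 month dummies) are our own choices, not taken from the paper.

```python
from datetime import date


def seasonal_dummies(day: date) -> list:
    """Return 16 binary features: 4 quarter dummies followed by 12 month dummies.

    Hypothetical helper: assumes each weekly observation is labelled by one date.
    """
    features = [0] * 16
    quarter_index = (day.month - 1) // 3   # 0..3 for Q1..Q4
    month_index = day.month - 1            # 0..11 for January..December
    features[quarter_index] = 1
    features[4 + month_index] = 1
    return features


# A week ending 14 August 2020 falls in the third quarter and in August.
print(seasonal_dummies(date(2020, 8, 14)))
```

Because the quarter is a function of the month, the two groups are redundant for a linear model; a neural network such as the one used here can absorb this redundancy, which is one reason to keep all 16 columns as described.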
We pick a time window of 500 weeks, train the algorithms on the first 400 weeks, and test the trained models on the remaining 100 weeks. We roll this window forward in 100-week steps, obtaining 14 test periods in total. Table 5 summarizes the results of the algorithms: it shows the average cost of each method over the 14 test periods (1400 weeks) for each newsvendor cost ratio b/h.
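The rolling-window evaluation can be sketched as follows, assuming the 1800 observations are indexed 0 to 1799 in time order; `rolling_windows` is a hypothetical helper, not code from the paper. With the weekly settings it yields 14 consecutive, non-overlapping 100-week test blocks (1400 weeks in total), and with the monthly settings of Appendix B.3 (408 observations, 120-month windows, 96 training months, 24-month steps) it yields the 13 test periods reported there.

```python
def rolling_windows(n_obs: int, window: int, n_train: int, step: int):
    """Yield (train, test) index ranges for a fixed-length window rolled forward by `step`.

    Within each window the first `n_train` indices are used for training and
    the rest for testing; a window is kept only if it fits entirely in the data.
    """
    start = 0
    while start + window <= n_obs:
        yield range(start, start + n_train), range(start + n_train, start + window)
        start += step


splits = list(rolling_windows(n_obs=1800, window=500, n_train=400, step=100))
print(len(splits), sum(len(test) for _, test in splits))  # 14 1400
```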
Our proposed method outperforms the other approaches for all values of the cost ratio. The second-best approach is ObBW, which uses the demand observations and fits a Markov model. Its strong performance suggests that the states of the environment have long-range dependence and influence the demand in a state-dependent manner. The benchmark methods that use features have similar results with minor differences. Overall, ObBW and EDD remain the strongest benchmarks.
⁵ Data were collected from the U.S. Energy Information Administration at https://www.eia.gov.
Table 5 Average cost of the crude oil ordering policy by each algorithm (values are divided by 10³).
b/h   EDD     ObBW    PfLR    LML     DNN     HMMNV
2     1.161   0.840   1.150   1.123   1.117   0.477
5     1.660   1.262   1.751   1.701   1.703   0.650
10    2.158   1.625   2.282   2.274   2.244   1.557
20    2.871   2.134   2.847   3.104   3.178   1.832
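The size of the advantage in Table 5 can be checked directly from the reported averages. The short sketch below, an illustration rather than part of the method, measures HMMNV's cost reduction relative to the cheapest benchmark in each row (ObBW in every row) and averages over the four cost ratios; the names `TABLE_5` and `relative_saving` are ours.

```python
# Average costs from Table 5 (values divided by 10^3), keyed by the ratio b/h.
TABLE_5 = {
    2: {"EDD": 1.161, "ObBW": 0.840, "PfLR": 1.150, "LML": 1.123, "DNN": 1.117, "HMMNV": 0.477},
    5: {"EDD": 1.660, "ObBW": 1.262, "PfLR": 1.751, "LML": 1.701, "DNN": 1.703, "HMMNV": 0.650},
    10: {"EDD": 2.158, "ObBW": 1.625, "PfLR": 2.282, "LML": 2.274, "DNN": 2.244, "HMMNV": 1.557},
    20: {"EDD": 2.871, "ObBW": 2.134, "PfLR": 2.847, "LML": 3.104, "DNN": 3.178, "HMMNV": 1.832},
}


def relative_saving(row: dict, method: str = "HMMNV") -> float:
    """Cost reduction of `method` relative to the cheapest other method in the row."""
    best_benchmark = min(cost for name, cost in row.items() if name != method)
    return (best_benchmark - row[method]) / best_benchmark


savings = {ratio: relative_saving(row) for ratio, row in TABLE_5.items()}
average_saving = sum(savings.values()) / len(savings)
for ratio, saving in savings.items():
    print(f"b/h = {ratio:>2}: {saving:.1%}")
print(f"average: {average_saving:.1%}")
```

The per-ratio savings are roughly 43%, 48%, 4%, and 14%, averaging about 27.5%, in line with the "more than 27%" figure stated in the abstract. The gap is smallest at b/h = 10.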
Figure 8 The frequency of the cost values obtained by each algorithm for the real data over all test periods (one panel per method: EDD, ObBW, PfLR, LML, DNN, and HMMNV; b/h = 5).
We conclude that when the randomness of the features and that of the environment are combined in one model, the cost decreases substantially.⁶
⁶ In addition to these five benchmarks, we also consider the well-known Holt-Winters method, which explicitly accounts for seasonality and trend in the time-series forecasting literature. To apply this method in the newsvendor setting, we use it to forecast the demand and then use the forecast as the order quantity. This approach is referred to as Estimate-As-Solution (EAS) in the literature (Oroojlooyjadid et al. 2020). For the Holt-Winters method, we use an additive trend component and multiplicative seasonality components, since the model performs better with these components than with other types. The mean newsvendor costs of the Holt-Winters method for b/h = 2, 5, 10, and 20 are 0.907, 1.826, 3.370, and 6.419 (×10³), respectively. Comparing these results with those in Table 5 shows that HMMNV yields a lower mean cost than the Holt-Winters approach for every cost ratio. We thank an anonymous referee for this valuable suggestion.
Figure 8 shows the frequency of costs over the 14 test periods for all methods. The suggested method offers an ordering policy that results in lower costs than the other methods. To show the states obtained by the HMMNV method and the optimal policy, we plot the demand and the order quantity for the 200 most recent out-of-sample weeks in Figure 9. The order quantity tracks the demand closely, following its variations. The bottom panel of this figure shows the estimated state sequence during this period. Comparing the order and demand series with the states, we observe that the HMMNV algorithm assigns demands with higher variation to state 1, while the data labeled as state 2 have lower variation. Consequently, the order quantity produced by HMMNV shows variations similar to those in the demand.
Figure 9 Optimal order quantity obtained by HMMNV for b/h = 5 during 200 weeks of test periods from October 2016 to August 2020 (top panel). The bottom panel shows the corresponding state sequence of the Markov model estimated by this algorithm.
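This section does not state how the state sequence in the bottom panel of Figure 9 is decoded. As an illustration only, the sketch below assumes standard Viterbi decoding of a two-state hidden Markov model with Gaussian emissions; the residual series, means, standard deviations, and transition probabilities are all hypothetical. It shows the mechanism behind the observed labelling: large deviations are far more likely under the high-variance state, so the most likely path assigns them to it, while a sticky transition matrix keeps calm stretches in the low-variance state.

```python
import math


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def viterbi(obs, log_init, log_trans, means, sigmas):
    """Most likely hidden-state path (0-based indices) for Gaussian emissions."""
    n_states = len(means)
    score = [log_init[s] + gaussian_logpdf(obs[0], means[s], sigmas[s])
             for s in range(n_states)]
    backpointers = []
    for x in obs[1:]:
        # Best predecessor of each state, using the scores of the previous step.
        best_prev = [max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
                     for s in range(n_states)]
        score = [score[best_prev[s]] + log_trans[best_prev[s]][s]
                 + gaussian_logpdf(x, means[s], sigmas[s]) for s in range(n_states)]
        backpointers.append(best_prev)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    return path[::-1]


# Index 0 plays the role of the high-variation "state 1", index 1 the
# low-variation "state 2" (hypothetical parameters).
residuals = [0.1, -0.2, 0.05, 3.0, -2.8, 3.2, -3.1, 0.0, 0.1, -0.1]
stay, switch = math.log(0.9), math.log(0.1)
path = viterbi(residuals, [math.log(0.5)] * 2, [[stay, switch], [switch, stay]],
               means=[0.0, 0.0], sigmas=[3.0, 0.3])
print([s + 1 for s in path])  # [2, 2, 2, 1, 1, 1, 1, 2, 2, 2]
```

Working in log space avoids underflow on long series; a smoothed (forward-backward) labelling would be an equally reasonable choice.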
We further examine the robustness of the suggested method in the real-data experiment by running the algorithms on monthly crude oil demand observations. Additionally, we take into account features that are well studied in the crude oil forecasting literature. These features, such as unemployment, the S&P 500 stock index, and personal disposable income, capture macroeconomic effects on the dynamics of the oil market. Details of the features and the results are reported in Appendix B, Section B.3.⁷
6. Conclusions
We presented an integrated learning and optimization method based on deep learning for the data-driven newsvendor problem with observable and unobservable features. Feature data about the demand for a product and the underlying dynamics of the market, which often have dependencies across multiple decision periods, are both common factors in the ordering decision, and this work provides an approach for learning effectively from both categories of information.
⁷ We thank an anonymous referee for this suggestion.
Through extensive numerical experiments based on synthetic and real data, we assess the performance of a variety of methods for the data-driven control of a newsvendor system and show that our method outperforms the others in a variety of settings.
Although we consider only the ordering problem for a single-item newsvendor in this study,
our approach can be extended to multiple items whose demands could be correlated. Another
interesting extension is that of a joint price and inventory optimization problem where the demand
is dependent on the selling price. In this case, one can investigate how to switch between pricing
strategies during various hidden states and use environmental and local features as price drivers.
Acknowledgements: Research leading to these results has received funding from the EU ECSEL
Joint Undertaking under grant agreement no. 737459 (project Productive4.0) and from TUBITAK
(217M145).
References
Arifoğlu, K. and S. Özekici (2010). Optimal policies for inventory systems with finite capacity and partially observed Markov-modulated demand and supply processes. European Journal of Operational Research 204 (3), 421–438.
Arifoğlu, K. and S. Özekici (2011). Inventory management with random supply and imperfect information: A hidden Markov model. International Journal of Production Economics 134 (1), 123–137.
Avci, H., K. Gokbayrak, and E. Nadar (2020). Structural results for average-cost inventory models with
Markov-modulated demand and partial information. Production and Operations Management 29 (1),
156–173.
Ban, G.-Y. and C. Rudin (2019). The big data newsvendor: Practical insights from machine learning.
Operations Research 67 (1), 90–108.
Bensoussan, A., M. Çakanyıldırım, and S. P. Sethi (2005). On the optimal control of partially observed inventory systems. Comptes Rendus Mathématique 341 (7), 419–426.
Bensoussan, A., M. Çakanyıldırım, and S. P. Sethi (2007). A multiperiod newsvendor problem with partially observed demand. Mathematics of Operations Research 32 (2), 322–344.
Bertsimas, D. and N. Kallus (2020). From predictive to prescriptive analytics. Management Science 66 (3),
1025–1044.
Bertsimas, D. and A. Thiele (2005). A data-driven approach to newsvendor problems. Technical report,
Massachusetts Institute of Technology, Cambridge, MA.
Besbes, O. and A. Muharremoglu (2013). On implications of demand censoring in the newsvendor problem.
Management Science 59 (6), 1407–1424.
Beyer, D. and S. P. Sethi (1997). Average cost optimality in inventory models with Markovian demands.
Journal of Optimization Theory and Applications 92 (3), 497–526.
Bhar, R. and S. Hamori (2004). Hidden Markov models: applications to financial economics, Volume 40.
Springer Science & Business Media.
Blinder, A. S. and L. J. Maccini (1991). The resurgence of inventory research: what have we learned? Journal
of Economic Surveys 5 (4), 291–328.
Du, X., L. Y. Cindy, and D. J. Hayes (2011). Speculation and volatility spillover in the crude oil and
agricultural commodity markets: A Bayesian analysis. Energy Economics 33 (3), 497–503.
Efendigil, T., S. Önüt, and C. Kahraman (2009). A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis. Expert Systems with Applications 36 (3), 6697–6707.
Feldman, R. M. (1978). A continuous review (s, S) inventory system in a random environment. Journal of Applied Probability 15 (3), 654–659.
Gallego, G. and H. Hu (2004). Optimal policies for production/inventory systems with finite capacity and
Markov-modulated demand and supply processes. Annals of Operations Research 126 (1-4), 21–41.
Gallego, G. and I. Moon (1993). The distribution free newsboy problem: review and extensions. Journal of
the Operational Research Society 44 (8), 825–834.
Goel, S., J. M. Hofman, S. Lahaie, D. M. Pennock, and D. J. Watts (2010). Predicting consumer behavior
with web search. Proceedings of the National Academy of Sciences.
Gruhl, D., L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Zien (2004). How to build a
webfountain: An architecture for very large-scale text analytics. IBM Systems Journal 43 (1), 64–77.
Gruhl, D., R. Guha, R. Kumar, J. Novak, and A. Tomkins (2005). The predictive power of online chatter. In
Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data
Mining, pp. 78–87. ACM.
Gurney, K. (2018). An introduction to neural networks. CRC press.
Hamilton, J. (1990). Analysis of time series subject to changes in regime. Journal of Econometrics 45,
39–70.
He, B., F. Dexter, A. Macario, and S. Zenios (2012). The timing of staffing decisions in hospital operating
rooms: incorporating workload heterogeneity into the newsvendor problem. Manufacturing & Service
Operations Management 14 (1), 99–114.
Huang, G.-B. (2003). Learning capability and storage capacity of two-hidden-layer feedforward networks.
IEEE Transactions on Neural Networks 14 (2), 274–281.
Huh, W. T., R. Levi, P. Rusmevichientong, and J. B. Orlin (2011). Adaptive data-driven inventory control
with censored demand based on Kaplan-Meier estimator. Operations Research 59 (4), 929–941.
Ke, J. and X. Liu (2008). Empirical analysis of optimal hidden neurons in neural network modeling for stock
prediction. In Computational Intelligence and Industrial Application, 2008. PACIIA’08. Pacific-Asia
Workshop on, Volume 2, pp. 828–832. IEEE.
Kesavan, S. and T. Kushwaha (2014). Differences in retail inventory investment behavior during macroeconomic shocks: Role of service level. Production and Operations Management 23 (12), 2118–2136.
Khayyati, S. and B. Tan (2020). Data-driven control of a production system by using marking-dependent
threshold policy. International Journal of Production Economics 226, 107607.
Levi, R., R. O. Roundy, and D. B. Shmoys (2007). Provably near-optimal sampling-based policies for
stochastic inventory control models. Mathematics of Operations Research 32 (4), 821–839.
Levinson, S. E., L. R. Rabiner, and M. M. Sondhi (1983). An introduction to the application of the theory
of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical
Journal 62 (4), 1035–1074.
Liyanage, L. H. and J. G. Shanthikumar (2005). A practical inventory control policy using operational
statistics. Operations Research Letters 33 (4), 341–348.
Lovejoy, W. S. (1992). Stopped myopic policies in some inventory models with generalized demand processes.
Management Science 38 (5), 688–707.
Mokni, K. and M. Youssef (2020). Empirical analysis of the cross-interdependence between crude oil and
agricultural commodity markets. Review of Financial Economics 38 (4), 635–654.
Monahan, G. E. (1982). State of the art—a survey of partially observable Markov decision processes: theory,
models, and algorithms. Management Science 28 (1), 1–16.
Oroojlooyjadid, A., L. V. Snyder, and M. Takáč (2020). Applying deep learning to the newsvendor problem. IISE Transactions 52 (4), 444–463.
Perakis, G. and G. Roels (2008). Regret in the newsvendor model with partial information. Operations
Research 56 (1), 188–203.
Picone, J. (1990). Continuous speech recognition using hidden Markov models. IEEE ASSP Magazine 7 (3),
26–41.
Qi, M., Y. Shi, Y. Qi, C. Ma, R. Yuan, D. Wu, and Z.-J. M. Shen (2020). A practical end-to-end inventory
management model with deep learning. Available at SSRN 3737780 .
Qin, F., A. Auerbach, and F. Sachs (2000). A direct optimization approach to hidden Markov modeling for
single channel kinetics. Biophysical Journal 79 (4), 1915–1927.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition.
Proceedings of the IEEE 77 (2), 257–286.
Sachs, A.-L. (2015). The data-driven newsvendor with censored demand observations. In Retail Analytics,
pp. 35–56. Springer.
Scarf, H. (1958). A min-max solution of an inventory problem. Studies in the Mathematical Theory of
Inventory and Production.
Scarf, H. (1959). Bayes solutions of the statistical inventory problem. The Annals of Mathematical Statistics 30 (2), 490–508.
Sethi, S. P. and F. Cheng (1997). Optimality of (s, S) policies in inventory models with Markovian demand. Operations Research 45 (6), 931–939.
Seubert, F., N. Stein, F. Taigel, and A. Winkelmann (2020). Making the newsvendor smart – order quantity optimization with ANNs for a bakery chain. Working Paper.
Shang, K. H. (2012). Single-stage approximations for optimal policies in serial inventory systems with
nonstationary demand. Manufacturing & Service Operations Management 14 (3), 414–422.
Song, J.-S. and P. Zipkin (1993). Inventory control in a fluctuating demand environment. Operations
Research 41 (2), 351–370.
Treharne, J. T. and C. R. Sox (2002). Adaptive inventory control for nonstationary demand and partial
information. Management Science 48 (5), 607–624.
Van der Laan, N., R. H. Teunter, W. Romeijnders, and O. Kilic (2019). The data-driven newsvendor problem:
Achieving on-target service levels. Technical report, Working paper, University of Groningen, SOM
research school.
Van Parys, B. P., P. M. Esfahani, and D. Kuhn (2020). From data to decisions: Distributionally robust
optimization is optimal. Management Science.
Zhang, Y. and J. Gao (2017). Assessing the performance of deep learning algorithms for newsvendor problem.
In International Conference on Neural Information Processing, pp. 912–921. Springer.
Appendices
Appendix A Backpropagation Algorithm
To calculate the derivatives with the backpropagation algorithm for the newsvendor network, we consider a network with $L$ layers, including the input and output layers, so it has $L-2$ hidden layers, each containing $M^{(l)}$ hidden nodes for $l = 2, \dots, L-1$. The weight between node $i$ in layer $l$ and node $j$ in layer $l+1$ is denoted by $w^{(l,l+1)}_{ij}$. Let $net^{(l)}_i$ and $o^{(l)}_i$ denote the input and output of node $i$ in layer $l$. In the input layer, $o^{(1)}_i = net^{(1)}_i$. In the output unit, the activation function is a simple summation, so the output of this node is identical to the value transmitted to it, and both represent the order quantity $Q = o^{(L)} = net^{(L)}$.
If we consider a single sample being fed into the network, we can derive the calculations for it and then extend them to all samples in matrix form. At the end of the feed-forward step, once the network produces its output at the output node, the corresponding newsvendor cost is computed with the newsvendor cost function.
The backpropagation step begins right after the feed-forward step. In this step we seek the partial derivatives $\partial NVC / \partial w^{(l,l+1)}_{ij}$ for all $i$, $j$, and $l = 1, \dots, L-1$. To do so, we compute the derivatives of the activation functions in each node and follow the path to a given edge, which yields the intended partial derivative. The derivative of the newsvendor cost function $NVC$ with respect to the output $Q$, or equivalently $o^{(L)}$, is
$$\frac{\partial NVC}{\partial o^{(L)}} = \begin{cases} h & \text{if } D < Q, \\ -b & \text{if } Q < D. \end{cases} \tag{A.37}$$
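The sketch below evaluates the cost and the derivative in (A.37) for a single sample, assuming the standard newsvendor cost $NVC = h\,(Q - D)^+ + b\,(D - Q)^+$, which is consistent with (A.37). At $Q = D$ the cost has a kink that (A.37) does not cover; returning 0 there is our own choice of subgradient. The function names are illustrative.

```python
def newsvendor_cost(q: float, d: float, h: float, b: float) -> float:
    """h per unit left over (Q > D) plus b per unit short (D > Q)."""
    return h * max(q - d, 0.0) + b * max(d - q, 0.0)


def newsvendor_cost_derivative(q: float, d: float, h: float, b: float) -> float:
    """dNVC/dQ as in (A.37); 0.0 at the kink Q == D (one valid subgradient)."""
    if d < q:
        return h
    if q < d:
        return -b
    return 0.0


# A central finite difference away from the kink agrees with (A.37).
eps = 1e-6
for q, d in [(12.0, 10.0), (8.0, 10.0)]:
    numeric = (newsvendor_cost(q + eps, d, 1.0, 5.0)
               - newsvendor_cost(q - eps, d, 1.0, 5.0)) / (2 * eps)
    print(q, d, newsvendor_cost_derivative(q, d, 1.0, 5.0), round(numeric, 6))
```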
Since the activation function of the output unit is a simple summation, its derivative with respect to its input, $\partial o^{(L)} / \partial net^{(L)}$, is equal to 1. In the hidden-layer units, the sigmoid function has the property that its derivative can be expressed in terms of the function itself: if the output of hidden node $j$ in layer $l$ is $o^{(l)}_j$, its derivative is $o^{(l)}_j \left(1 - o^{(l)}_j\right)$. To simplify the chain rule, we define a term referred to as the backpropagation error for node $j = 1, \dots, M^{(l)}$ in layer $l = 2, \dots, L$. Since we move from the right of the network to the left, we first compute the partial derivatives $\partial NVC / \partial w^{(L-1,L)}_{j1}$. To this end, we define the backpropagation error of the output unit as
$$\delta^{(L)} = \frac{\partial NVC}{\partial o^{(L)}}\, \frac{\partial o^{(L)}}{\partial net^{(L)}}. \tag{A.38}$$
In this problem, $\delta^{(L)}$ is
$$\delta^{(L)} = \begin{cases} \delta^{(L)}(h) = h \cdot 1 & \text{if } D < Q, \\ \delta^{(L)}(b) = -b \cdot 1 & \text{if } Q < D. \end{cases} \tag{A.39}$$
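One more multiplication is needed to obtain the partial derivative $\partial NVC / \partial w^{(L-1,L)}_{j1}$, namely $\partial net^{(L)} / \partial w^{(L-1,L)}_{j1} = o^{(L-1)}_j$. Finally, we have
$$\frac{\partial NVC}{\partial w^{(L-1,L)}_{j1}} = \delta^{(L)}\, o^{(L-1)}_j, \tag{A.40}$$
which is
$$\frac{\partial NVC}{\partial w^{(L-1,L)}_{j1}} = \begin{cases} \delta^{(L)}(h)\, o^{(L-1)}_j & \text{if } D < Q, \\ \delta^{(L)}(b)\, o^{(L-1)}_j & \text{if } Q < D. \end{cases} \tag{A.41}$$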
The remaining partial derivatives $\partial NVC / \partial w^{(l,l+1)}_{ij}$ for $l = 1, \dots, L-2$ are obtained similarly by computing backpropagation errors. To compute the backpropagation error $\delta^{(l)}_j$ of node $j$ in hidden layer $l$, all backward paths that reach hidden node $j$ must be considered, so that a weighted combination of the backpropagation errors of layer $l+1$, $\delta^{(l+1)}$, is transmitted to hidden node $j$:
$$\bar{\delta}^{(l)}_j = \sum_{k=1}^{M^{(l+1)}} \delta^{(l+1)}_k\, w^{(l,l+1)}_{jk}. \tag{A.42}$$
As for the previous set of derivatives, this term must be multiplied by $\partial o^{(l)}_j / \partial net^{(l)}_j = o^{(l)}_j \left(1 - o^{(l)}_j\right)$. The backpropagation error of node $j$ in hidden layer $l$ is then
$$\delta^{(l)}_j = o^{(l)}_j \left(1 - o^{(l)}_j\right) \bar{\delta}^{(l)}_j. \tag{A.43}$$
Again, in this problem there are two types of backpropagation error:
$$\delta^{(l)}_j = \begin{cases} \delta^{(l)}_j(h) & \text{if } D < Q, \\ \delta^{(l)}_j(b) & \text{if } Q < D. \end{cases} \tag{A.44}$$
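By the chain rule, to obtain the partial derivative $\partial NVC / \partial w^{(l,l+1)}_{ij}$, the last factor, $\partial net^{(l+1)}_j / \partial w^{(l,l+1)}_{ij} = o^{(l)}_i$, is multiplied by $\delta^{(l+1)}_j$; hence
$$\frac{\partial NVC}{\partial w^{(l,l+1)}_{ij}} = \begin{cases} \delta^{(l+1)}_j(h)\, o^{(l)}_i & \text{if } D < Q, \\ \delta^{(l+1)}_j(b)\, o^{(l)}_i & \text{if } Q < D. \end{cases} \tag{A.45}$$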
Once all partial derivatives are computed, the network weights are updated in the negative gradient direction. With a constant learning rate $\gamma$, the weight corrections are
$$\Delta w^{(l,l+1)}_{ij} = \begin{cases} -\gamma\, \delta^{(l+1)}_j(h)\, o^{(l)}_i & \text{if } D < Q, \\ -\gamma\, \delta^{(l+1)}_j(b)\, o^{(l)}_i & \text{if } Q < D. \end{cases} \tag{A.46}$$
Figure A.10 Input and backpropagated error on an edge
More than one sample is fed into the network. If we have $T$ sample pairs $\{(F_1, D_1), (F_2, D_2), \dots, (F_T, D_T)\}$, the weight corrections are computed for each sample; for weight $w^{(l,l+1)}_{ij}$ this gives the corrections $\Delta_1 w^{(l,l+1)}_{ij}, \Delta_2 w^{(l,l+1)}_{ij}, \dots, \Delta_T w^{(l,l+1)}_{ij}$. The total correction in the gradient direction is therefore
$$\Delta w^{(l,l+1)}_{ij} = \sum_{t=1}^{T} \Delta_t w^{(l,l+1)}_{ij}. \tag{A.47}$$
In the emission network, the partial derivatives and the chain rule are similar to those of the newsvendor network. The quantities whose partial derivatives with respect to the network weights are calculated are the emissions, $\partial p(D) / \partial w_E$, where $p(D)$ is the probability density function of the normal distribution at each time:
$$p(D) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(D-\mu)^2}{2\sigma^2}}. \tag{A.48}$$
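First, we compute the derivative of the above function with respect to $D$ as the output of the emission network, $o^{(L)}_i$, where $i \in S$:
$$\frac{\partial p(D)}{\partial o^{(L)}_i} = -\frac{o^{(L)}_i - \mu}{\sigma^2} \left( \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{\left(o^{(L)}_i - \mu\right)^2}{2\sigma^2}} \right). \tag{A.49}$$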
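A quick numerical check of (A.48) and (A.49), written in the equivalent form $-(x - \mu)/\sigma^2 \cdot p(x)$. The helper names are ours; `statistics.NormalDist` from the Python standard library (3.8+) is used only as an independent reference for the density.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density p(D) of (A.48), evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def normal_pdf_derivative(x: float, mu: float, sigma: float) -> float:
    """Derivative of p with respect to its argument, as in (A.49)."""
    return -(x - mu) / sigma ** 2 * normal_pdf(x, mu, sigma)


# Cross-check against the standard-library density and a central finite difference.
x, mu, sigma, eps = 1.3, 0.5, 2.0, 1e-6
print(normal_pdf(x, mu, sigma) - NormalDist(mu, sigma).pdf(x))
print(normal_pdf_derivative(x, mu, sigma),
      (normal_pdf(x + eps, mu, sigma) - normal_pdf(x - eps, mu, sigma)) / (2 * eps))
```

The derivative is negative above the mean and positive below it, so the gradient pushes the emission output toward $\mu$ when the likelihood is maximized.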
This value is multiplied by the preceding derivatives of the HMM network that reach the emission network, which completes the initial derivatives used for updating the emission network weights. First, we have
$$\frac{\partial \Lambda}{\partial p(D)}\, \frac{\partial p(D)}{\partial o^{(L)}_i}. \tag{A.50}$$
Since the network weights are identical in each state, we sum the above derivatives over all states to obtain a single value for updating them:
$$\frac{\partial \Lambda}{\partial o^{(L)}} = \sum_{i \in S} \frac{\partial \Lambda}{\partial p(D)}\, \frac{\partial p(D)}{\partial o^{(L)}_i}. \tag{A.51}$$
The backpropagation error term for the last layer is
$$\delta^{(L)} = \frac{\partial \Lambda}{\partial o^{(L)}}\, \frac{\partial o^{(L)}}{\partial net^{(L)}}. \tag{A.52}$$
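One more multiplication is needed to obtain the partial derivative $\partial \Lambda / \partial w^{(L-1,L)}_{j1}$, namely $\partial net^{(L)} / \partial w^{(L-1,L)}_{j1} = o^{(L-1)}_j$. Finally, we have
$$\frac{\partial \Lambda}{\partial w^{(L-1,L)}_{j1}} = \delta^{(L)}\, o^{(L-1)}_j. \tag{A.53}$$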
The derivatives of $\Lambda$ with respect to the other network weights $w^{(l,l+1)}_{ij}$, between node $i$ in layer $l$ and node $j$ in layer $l+1$ for $l = 1, \dots, L-2$, are obtained as for the newsvendor network, by the chain rule and by computing the backpropagation errors for each node and layer.
Appendix B Further Analyses
B.1 Trade-off coefficient η analysis
Figure B.1 depicts the effect of the trade-off coefficient η on the final cost of the model for the test sample. The curve is obtained by optimizing the function in (P-HMMNV) for different values of η: for each value, the network is trained and the resulting out-of-sample cost is shown on the y-axis. A grid search over the range [0.001, 0.999] gives an optimal η of 0.64 (red vertical line). However, this value is not available before the order quantity is set. For this reason, the HMMNV method selects η = 0.70 through cross-validation (black vertical line). The proposed algorithm with cross-validation therefore finds a near-optimal value of η. The plot is also unimodal, suggesting that ignoring the best trade-off between the hidden states and the observable features, as two sources of randomness, increases the final objective value.
Figure B.1 Effect of the trade-off coefficient η on the out-of-sample newsvendor cost (x-axis: η from 0 to 1; y-axis: newsvendor cost of the test set; best η = 0.70 by cross-validation, optimal η = 0.64 by grid search).
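The selection of η described above can be sketched as a grid search scored by cross-validation. This is a minimal sketch under loud assumptions: `cost_fn` is a hypothetical callable standing in for training HMMNV with a given η and returning the held-out newsvendor cost, the fold structure is assumed, and the 0.002 grid step is our own choice within the [0.001, 0.999] range mentioned above. The toy cost is purely illustrative and is not the paper's data.

```python
def select_eta(cost_fn, etas, n_folds):
    """Return the eta in `etas` with the lowest mean validation cost over the folds.

    `cost_fn(eta, fold)` is a hypothetical callable that trains the model with
    trade-off coefficient eta on all folds except `fold` and returns the
    newsvendor cost on the held-out fold.
    """
    def mean_cost(eta):
        return sum(cost_fn(eta, fold) for fold in range(n_folds)) / n_folds

    return min(etas, key=mean_cost)


# Grid over [0.001, 0.999] in steps of 0.002 (the step size is an assumption).
ETA_GRID = [round(0.001 + 0.002 * k, 3) for k in range(500)]


# Toy stand-in for the real training loop (purely illustrative): a bowl-shaped cost.
def toy_cost(eta, fold):
    return (eta - 0.645) ** 2 + 0.01 * fold


print(select_eta(toy_cost, ETA_GRID, n_folds=5))
```

Because the validation cost only approximates the test cost, the selected value can differ slightly from the test-set optimum, which is the gap between 0.70 and 0.64 seen in Figure B.1.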
B.2 Extreme scenarios
We generate samples to simulate the suggested extreme scenarios. In this data set, we assume that demand outliers with low-frequency historical records belong to a state that is rarely visited by the Markov chain, and that demand values in this scarce state differ significantly from the demand observations in the other state(s). We consider a two-state Markov chain with parameter $e$, the probability of a transition from state 1 to state 2. The transition matrix (rows: current state; columns: next state, both ordered $s_1, s_2$) is
$$P = \begin{pmatrix} 1-e & e \\ 1-e & e \end{pmatrix},$$
where $e$ is chosen so that $1-e$ is large enough to ensure that the system is in state 1 most of the time. The other parameters of the model are given in Table B.1. The small values of $e$ ensure that most demand observations come from state 1. The large difference between $\mu_1$ and $\mu_2$ ($\mu_2 - \mu_1 = 4$) causes the demand observations from state 2 to take extreme values and act as low-frequency outliers in the demand sequence.
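The base-demand component of this scenario can be simulated directly from the transition matrix. The sketch below is illustrative only: it uses the state means and standard deviations listed in Table B.1, starts the chain in state 1 (an assumption, since the initial state is not given), and omits the feature-driven part of the demand. The name `simulate_base_demand` is hypothetical.

```python
import random


def simulate_base_demand(T: int, e: float, mu=(1.0, 5.0), sigma=(0.5, 0.5), seed: int = 0):
    """Simulate states (1 or 2) and base demand for the two-state chain above.

    Both rows of the transition matrix equal (1 - e, e): from either state the
    next state is 2 with probability e and 1 otherwise. Starting in state 1 is
    an assumption; the feature-driven demand component is not simulated here.
    """
    rng = random.Random(seed)
    states, demand = [], []
    state = 1
    for _ in range(T):
        states.append(state)
        demand.append(rng.gauss(mu[state - 1], sigma[state - 1]))
        state = 2 if rng.random() < e else 1
    return states, demand


states, demand = simulate_base_demand(T=200, e=0.05, seed=42)
print(states.count(2), max(demand))
```

Since both rows of the matrix are identical, the states are effectively independent draws, and state 2 occupies a fraction of about $e$ of the periods, which is what makes its observations rare outliers.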
Table B.1 The set of parameters used in the experimental setup for extreme scenarios with demand outliers.
Parameter   Description                                        Domain
e           transition probability                             0.01, 0.05
µ1          mean of the base demand in state 1                 1
µ2          mean of the base demand in state 2                 5
σ1          standard deviation of the base demand in state 1   0.5
σ2          standard deviation of the base demand in state 2   0.5
σϵ          standard deviation of the demand by features       0.5
N           number of features                                 1, 3
W           network weights                                    1
L           number of hidden layers                            1
b/h         newsvendor cost parameters ratio                   2, 5, 10
T           number of observations                             100, 200
Figure B.2 and Table B.2 give the results of these experiments. The results indicate that both the HMMNV and DNN methods perform well even in the presence of rarely visited states.
Table B.2 p-values of the t-test between the costs of the algorithms in extreme scenarios (lower triangle).
        EDD     ObBW        PfLR    LML     DNN
ObBW    0.2685
PfLR    0.1940  0.0150
LML     0.0265  6.4659e-04  0.3785
DNN     0.0192  4.2083e-04  0.3115  0.8878
HMMNV   0.0091  9.9872e-05  0.2308  0.7778  0.8952
Figure B.2 The result of all algorithms for extreme scenarios.
B.3 Real experiment with macroeconomic features
We use monthly data on the suggested extra features, namely the lagged price of crude oil, lagged unemployment, the lagged S&P 500 stock index, and lagged personal disposable income, to assess the effect of using extra features. The data cover the period from September 1986 to August 2020 and consist of 408 monthly observations. Accordingly, we resample the crude oil demand on a monthly basis. We pick a time window of 120 months, feed the first 96 months (80% of the periods in a window) to the algorithms, and then test the trained models on the following 24 months. We roll this window forward in 24-month steps, obtaining 13 test periods in total. Table B.3 gives the average cost of the monthly ordering policy for crude oil. The proposed HMMNV method performs well compared to the other benchmark methods; as expected, however, this advantage is not present in all cases (e.g., when b/h = 10, the DNN method has a lower average cost than HMMNV). We attribute this to the informative input features, which capture almost all the uncertainty in the demand observations and have more out-of-sample forecasting power. Moreover, both the HMMNV and DNN methods perform well compared to the other methods, since they also model nonlinear relations.
Table B.3 Average cost of the crude oil ordering for the newsvendor problem for monthly data by each algorithm using macroeconomic features (values are divided by 10⁵).
b/h   EDD     ObBW     PfLR     LML      DNN     HMMNV
2     4.791   4.589    5.321    7.875    5.518   4.350
5     6.479   7.511    7.244    10.534   6.580   5.892
10    7.594   11.23    8.604    9.057    6.916   8.009
20    8.668   17.054   10.430   7.329    8.232   7.372
... The existing body of research on data-driven E2E methods for the NVP is still in its infancy, with the prominent studies focusing on linear decision mapping for the ordering decisions, as in [5] with a linear programming approach and [6] with the development of a framework based on the Empirical Risk Minimization (ERM) principle. Additionally, machine and deep learning methods have proven to be powerful and flexible approaches to modeling single-step inventory problems, as shown in [7][8][9][10][11][12][13]. However, the studies that adopted neural networks as a data-driven solution for the NVP predominantly relied on Multi-Layer Perceptron (MLP) architectures. ...
... Recently, ref. [13] proposed an algorithm integrating neural networks and hidden Markov models to solve the data-driven NVP. Lastly, ref. [31] extended the [6] method to the non-linear case by maximizing the profit instead of minimizing the expected cost and conducted experiments with ARIMA models. ...
... Position of the proposed methods in the data-driven NVP inventory literature[4][5][6][7][8][9][10][11][12][13][22][23][24][25][26][27][29][30][31]. ...
Article
Full-text available
Background: Over the past decade, the potential advantages of employing deep learning models and leveraging auxiliary data in data-driven end-to-end (E2E) frameworks to enhance inventory decision-making have gained recognition. However, current approaches predominantly rely on feed-forward networks, which may have difficulty capturing temporal correlations in time series data and identifying relevant features, resulting in less accurate predictions. Methods: Addressing this gap, we introduce novel E2E deep learning frameworks that combine Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) for resolving single-period inventory ordering decisions, also termed the Newsvendor Problem (NVP). This study investigates the performance drivers of hybrid CNN-LSTM architectures, coupled with an evolving algorithm for optimizing network configuration. Results: Empirical evaluation of real-world retail data demonstrates that our proposed models proficiently extract pertinent features and interpret sequential data characteristics, leading to more accurate and informed ordering decisions. Notably, results showcase substantial benefits, yielding up to an 85% reduction in costs compared to a univariate benchmark and up to 40% savings compared to a feed-forward E2E deep learning architecture. Conclusions: This confirms that, in practical scenarios, understanding the impact of features on demand empowers decision-makers to derive tailored, cost-effective ordering decisions for each store or product category.
... Liu et al. (2022) considered the nonlinear cost of the newsvendor problem, accounting for different costs associated with shortages. Pirayesh Neghab et al. (2022) considered not only observable feature data but also unobservable features and integrated the three steps of parameter estimation, inference, and optimization into a multi-layer neural network with a hidden Markov model. Furthermore, researchers have explored data-driven newsvendor problems with constraints and multi-period inventory problems. ...
... Firstly, the existing literature lacks exploration of multi-source data. In the previous data-driven inventory management studies, researchers primarily focused on using single-source data, such as historical demand data (Liyanage and Shanthikumar 2005) or prespecified numerical feature data (Ban and Rudin 2019;Pirayesh Neghab et al. 2022), with limited attention given to the selection of demand-relevant features. However, the choice of data is critical for effective inventory decisionmaking. ...
... Our method utilizes sentiment word lexicons and degree adverb lexicons, offering a transparent, interpretable, and easily maintainable approach compared to methods like recursive neural networks (RNNs). Thirdly, the previous studies have overlooked the investigation of the impact of sample size on order/production decision outcomes (Liu et al. 2022;Pirayesh Neghab et al. 2022). In contrast, our research explicitly examines the performance of different models under varying sample sizes, highlighting the robustness of our proposed method in the face of limited data availability. ...
Article
Full-text available
The production decision of a large commodity or equipment manufacturing enterprise can be modeled as a newsvendor problem. Managers must determine the optimal production volume in advance to minimize the underage cost and the overage cost. However, the traditional newsvendor problem assumes the known demand distribution, which is not the case in practice. Data-driven approaches have become the hot research topic and opened up new avenues for such issues. Recent studies have considered demand-related features but have failed to address how to optimize production and inventory using informative textual reviews, not just numerical feature data. To address this issue, we propose a data-driven newsvendor model that leverages sentiment analysis on textual reviews using a deep learning model to solve the data-driven newsvendor problem by integrating estimation and optimization. Experiments on real data show that our proposed method reduces the average cost by approximately 14.18% compared to the most advanced deep neural network method, making it the best-performing method. Furthermore, our method is more suitable for situations where unit shortage costs are greater than unit overage costs. Finally, our method is robust in terms of sample size and can still obtain good results even with insufficient historical data.
... Among these attributes, elements such as local weather conditions, day of the week, month of the year, interest rates, and discounts play a role, in addition to various other local factors [2]. Furthermore, some researchers also account for broader attributes at the regional and global levels, including inflation, factors that affect competition, and the consumer price index [5]. In particular, neural networks exhibit particular proficiency in processing and analyzing this type of data [2]. ...
Article
Background: Efficient inventory management is critical for sustainability in supply chains. However, maintaining adequate inventory levels is challenging when demand patterns are unpredictable. Furthermore, disseminating demand-related information throughout a company often relies on cloud services, which can suffer from limited bandwidth and increased latency. Methods: To address these challenges, our study introduces a system that incorporates a machine learning algorithm to address inventory-related uncertainties arising from demand fluctuations. Our approach uses an attention mechanism for accurate demand prediction and combines it with the Newsvendor model to determine optimal inventory levels. The system is integrated with fog computing to facilitate the rapid dissemination of information throughout the company. Results: In experiments, we compare the proposed system with the conventional demand estimation approach based on historical data and observe that the proposed system consistently outperforms the conventional approach. Conclusions: This research introduces an inventory management system based on a novel deep learning architecture that integrates the attention mechanism with cloud computing to address the Newsvendor problem. Experiments demonstrate the higher accuracy of this system compared with existing methods. Further studies should explore its applicability to other demand modeling scenarios.
... Another popular data-driven approach is the distributionally robust optimization that uses data-calibrated uncertainty or ambiguity sets to help derive decisions (Delage and Ye, 2010;Goh and Sim, 2010;Van Parys et al., 2021). Besides using the uncertain parameter's historical data, there is a growing body of literature on utilizing auxiliary data to predict the uncertain parameter via machine learning tools (Liu et al., 2022a;Neghab et al., 2022;Liu et al., 2022b;Han et al., 2023;Zhong et al., 2023). Several advanced frameworks have been developed, including the smart predict-then-optimize framework (Elmachtoub and Grigas, 2022;Tian et al., 2023), the weighted SAA framework (Bertsimas and Kallus, 2020;Notz and Pibernik, 2022), the empirical risk minimization framework (Ban and Rudin, 2019;Notz and Pibernik, 2022), and the kernel optimization framework (Ban and Rudin, 2019;Bertsimas and Koduri, 2021;Notz and Pibernik, 2022). ...
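The weighted SAA framework cited above replaces the equal sample weights of plain SAA with weights that depend on how similar each historical covariate vector is to the current one. For the newsvendor cost, the weighted problem is solved by a weighted b/(b+h) quantile of past demand. The minimal sketch below assumes k-nearest-neighbour weights, one of several weighting schemes in that literature; `knn_weights`, `weighted_saa_order` and the toy data are hypothetical.

```python
import numpy as np

def knn_weights(X_train, x_new, k):
    """Weight 1/k on the k training points closest to x_new (Euclidean), 0 elsewhere."""
    dist = np.linalg.norm(np.asarray(X_train, float) - np.asarray(x_new, float), axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    w = np.zeros(len(dist))
    w[nearest] = 1.0 / k
    return w

def weighted_saa_order(demand, weights, h, b):
    """Minimise sum_i w_i * cost(q, d_i) for the newsvendor: the weighted b/(b+h) quantile."""
    order = np.argsort(demand, kind="stable")
    d_sorted = np.asarray(demand, float)[order]
    cum_w = np.cumsum(np.asarray(weights, float)[order])
    # first sorted demand whose cumulative weight reaches the critical fractile
    idx = int(np.searchsorted(cum_w, b / (b + h) - 1e-12))
    return float(d_sorted[min(idx, len(d_sorted) - 1)])

# hypothetical data: one covariate, two clearly separated demand regimes
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
d = np.array([5.0, 6.0, 7.0, 50.0, 60.0, 70.0])
w = knn_weights(X, [11.0], k=3)
print(weighted_saa_order(d, w, h=1.0, b=1.0))  # 60.0, the median of the 3 neighbours
```

Plain SAA would pool both regimes; the covariate-dependent weights let the order follow the regime the current observation belongs to.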
Article
This paper explores binary decision making, a critical domain in areas such as finance and supply chain management, where decision makers must often choose between a deterministic-cost option and an uncertain-cost option. Given the limited historical data on the uncertain cost and its unknown probability distribution, this research aims to ascertain how decision makers can optimize their decisions. To this end, we evaluate the worst-case expected performance of all possible data-driven policies, including the sample average approximation policy, across four scenarios differentiated by the extent of knowledge regarding the lower and upper bounds of the first moment of the uncertain cost distribution. Our analysis, using worst-case expected absolute regret and worst-case expected relative regret metrics, consistently shows that no data-driven policy outperforms the straightforward strategy of choosing either a deterministic-cost or uncertain-cost option in these scenarios. Notably, the optimal choice between these two options depends on the specific lower and upper bounds of the first moment. Our research contributes to the literature by revealing the minimal worst-case expected performance of all possible data-driven policies for binary decision-making problems.
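To make the objects in this abstract concrete, the sketch below implements the SAA policy for the binary choice (take the uncertain option if its sample-mean cost is below the deterministic cost) and estimates its expected absolute regret by simulation. It evaluates a single, assumed distribution (exponential costs with mean 1.0 against a deterministic cost of 1.1), so it illustrates the regret metric only; it does not reproduce the paper's worst-case analysis over all distributions with given moment bounds. All names and numbers are hypothetical.

```python
import numpy as np

def saa_binary_choice(samples, deterministic_cost):
    """SAA policy: take the uncertain option iff its sample-mean cost is lower."""
    return "uncertain" if np.mean(samples) < deterministic_cost else "deterministic"

def absolute_regret(choice, true_mean, deterministic_cost):
    """Expected cost of the chosen option minus that of the better option."""
    chosen = true_mean if choice == "uncertain" else deterministic_cost
    return chosen - min(true_mean, deterministic_cost)

def expected_regret(policy, sampler, true_mean, deterministic_cost, n, reps=10_000, seed=0):
    """Monte Carlo estimate of a policy's expected absolute regret for ONE distribution."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        choice = policy(sampler(rng, n), deterministic_cost)
        total += absolute_regret(choice, true_mean, deterministic_cost)
    return total / reps

# hypothetical uncertain cost: exponential with mean 1.0; deterministic cost 1.1
sampler = lambda rng, n: rng.exponential(1.0, size=n)
saa = expected_regret(saa_binary_choice, sampler, 1.0, 1.1, n=5)
always_uncertain = expected_regret(lambda s, c: "uncertain", sampler, 1.0, 1.1, n=5)
print(saa, always_uncertain)  # the fixed choice has zero regret for this distribution
```

With only five samples, SAA sometimes picks the deterministic option by chance, whereas the fixed choice happens to be right for this distribution; the paper's result concerns which fixed choice is best in the worst case.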
... Another interesting research direction is to challenge the assumption that the joint distribution of the covariate and uncertain parameters is stationary. Neghab et al. (2022) study a newsvendor model with a hidden Markov model underlying the distribution of the covariates and demand. ...
... Lin et al. [24] proposed a data-driven approximation to the theoretical risk-averse newsboy model under a value-at-risk constraint. Neghab et al. [25] presented an integrated learning and optimization method based on deep learning for the data-driven newsboy problem with observable and unobservable features. Yang et al. [26] studied a data-driven newsvendor problem, in which the replenishment decisions are obtained by mapping high-dimensional and mixed-frequency features of historical data. ...
Article
Effective inventory management depends on accurate estimates of product profitability to formulate ordering and manufacturing strategies. The achievable capacity index (ACI) is a simple yet efficient approach to measuring the profitability of newsboy-type products with normally distributed demand, wherein profitability is presented as the probability of achieving the target profit under the optimal ordering quantity. Unfortunately, the ACI is applicable only to retail stores with a single demand. In the current study, we addressed the issue of measuring the integrated profitability of newsboy-type products sold in multiple locations with independent demand levels, such as own-branding-and-manufacture (OBM) companies with multiple owned channels. We began by formulating profitability in accordance with multiple independent normal demands, and then developed an integrated ACI (IACI) to simplify expression. We also derived the statistical properties of the unbiased estimator to determine the true IACI in situations where demand patterns are unknown. Finally, we conducted hypothesis testing to determine whether the integrated profitability meets a stipulated minimum level. For convenience, we tabulated the critical values as a function of sample size, confidence level, the number of channels, and the stipulated minimum level. One can make decisions simply by estimating the IACI based on historical demand data from all channels and then looking up the critical value in the corresponding tables. Consequently, the proposed methods make it possible for OBM managers to address integrated profitability evaluation, which is effective in deciding the optimal timing to pull unprofitable items from the shelves by looking up generic tables. Furthermore, we also performed numerical and sensitivity analyses for a real-world case to illustrate the applicability and some managerial implications of the proposed scheme.
... Other data-driven approaches related to the newsvendor problem include quantile regression (Harsha et al. 2021), deep neural networks (Pirayesh Neghab et al. 2022, Zhang 2019), and robust optimization (RO) (Xu et al. 2022). Compared with these works, we study a distribution-free pricing problem with a decision-dependent effect, and extend the model to a convex price-only case and a nonlinear price adjustment cost case. ...
Preprint
This paper investigates the data-driven pricing newsvendor problem, which focuses on maximizing expected profit by deciding on inventory and pricing levels based on historical demand and feature data. We first build an approximate model by assigning weights to historical samples. However, because of decision-dependent effects, the resulting approximate model is complicated and cannot be solved directly. To address this issue, we introduce the concept of approximate gradients and design an Approximate Gradient Descent (AGD) algorithm. We analyze the convergence of the proposed algorithm in both convex and non-convex settings, which correspond to the newsvendor pricing model and its variants, respectively. Finally, we perform numerical experiments on both simulated and real-world datasets to demonstrate the efficiency and effectiveness of the AGD algorithm. We find that the AGD algorithm can converge to a local maximum provided that the approximation is effective. We also illustrate the significance of two characteristics of our model: it is distribution-free and decision-dependent. Considering the decision-dependent effect is necessary for the approximation, and the distribution-free model is preferred when there is little information on the demand distribution and on how demand reacts to the pricing decision. Moreover, the proposed model and algorithm are not limited to the newsvendor problem but can also be used for a wide range of decision-dependent problems.
Article
This work provides performance guarantees for solving data-driven contextual newsvendor problems when the contextual data contain intertemporal dependence and non-stationarities. While machine learning tools are increasingly used in data-driven inventory management problems, most existing work assumes that the contextual data are independent and identically distributed (i.i.d.). However, such assumptions are often violated in real operational environments, where the contextual data are generated sequentially with intertemporal correlations and possible non-stationarities. By accommodating these naturally arising operational environments, our work adopts more realistic assumptions and develops out-of-sample performance bounds for learning data-driven contextual newsvendor problems.
Article
We investigate a data-driven multiperiod inventory replenishment problem with uncertain demand and vendor lead time (VLT) with access to a large quantity of historical data. Unlike the traditional two-step predict-then-optimize (PTO) solution framework, we propose a one-step end-to-end (E2E) framework that uses deep learning models to output the suggested replenishment amount directly from input features without any intermediate step. The E2E model is trained to capture the behavior of the optimal dynamic programming solution under historical observations without any prior assumptions on the distributions of the demand and the VLT. By conducting a series of thorough numerical experiments using real data from one of the leading e-commerce companies, we demonstrate the advantages of the proposed E2E model over conventional PTO frameworks. We also conduct a field experiment with JD.com, and the results show that our new algorithm reduces holding cost, stockout cost, total inventory cost, and turnover rate substantially compared with JD’s current practice. For the supply chain management industry, our E2E model shortens the decision process and provides an automatic inventory management solution with the possibility to generalize and scale. The concept of E2E, which uses the input information directly for the ultimate goal, can also be useful in practice for other supply chain management circumstances. This paper was accepted by Hamid Nazerzadeh, big data analytics—fast track. Funding: This research was supported by the National Key Research and Development Program of China [Grant 2018YFB1700600] and National Natural Science Foundation of China [Grants 71991462 and 91746210]. Supplemental Material: The online appendix and data are available at https://doi.org/10.1287/mnsc.2022.4564 .
Article
This paper investigates the cross-interdependence between crude oil and agricultural commodity prices. We apply a test of persistence to verify whether the effect of crude oil prices on agricultural commodity markets is immediate or delayed. Using daily data covering the period 2003–2017, results show that the delayed effect of crude oil prices on agricultural commodity prices is weaker than the immediate effect. Furthermore, the dependence is strongly persistent and more affected by the food crisis than by the oil crisis. Additionally, a contagion effect is detected during the food crisis for almost all agricultural commodity markets, while during the oil crisis it is verified only for the soybean and wheat markets. The study is designed to provide a reliable framework for forecasting returns and volatility in commodity markets based on oil market changes.
Article
As increasingly more shop-floor data become available, the performance of a production system can be improved by developing effective data-driven control methods that use this information. We focus on three research questions: How can the decision whether to produce at any time be made based on real-time information about the production system? How can the collected data be used directly to optimize the policy parameters? What is the effect of using different information sources on system performance? To answer these questions, we consider a production/inventory system in which a single production stage produces to stock to meet random demand. The system is not fully observable, but partial production and demand information, referred to as markings, is available. We propose using a marking-dependent threshold policy that decides whether to produce based on the observed markings in addition to the inventory and production status at any given time. An analytical method based on a matrix-geometric approach is developed to analyze a production system controlled by the marking-dependent threshold policy when the production, demand, and information arrivals are modeled as Marked Markovian Arrival Processes. A mixed-integer programming formulation is presented to determine the optimal thresholds. Then a mathematical programming formulation that uses real-time shop-floor data for joint simulation and optimization (JSO) of the system is presented. Using numerical experiments, we compare the performance of the JSO approach with the analytical solutions. We show that the marking-dependent control policy, with policy parameters determined from data, works effectively as a data-driven control method for manufacturing.
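The marking-dependent threshold rule itself is simple: produce whenever the inventory is below a threshold that depends on the currently observed marking. The sketch below illustrates it in a deliberately simplified discrete-time setting with two binary markings, at most one unit produced per period, backordering, and assumed holding and backorder costs (h = 1, b = 5). The brute-force search over thresholds is only a stand-in for the paper's mixed-integer and JSO formulations, and the event log is hypothetical.

```python
def produce(inventory, marking, thresholds):
    """Marking-dependent threshold rule: produce iff inventory is below the marking's threshold."""
    return inventory < thresholds[marking]

def average_cost(thresholds, markings, demands, h=1.0, b=5.0):
    """Discrete-time sketch: at most one unit produced per period, unmet demand backordered."""
    inv, cost = 0, 0.0
    for m, d in zip(markings, demands):
        if produce(inv, m, thresholds):
            inv += 1
        inv -= d  # negative inventory means backorders
        cost += h * max(inv, 0) + b * max(-inv, 0)
    return cost / len(demands)

def best_thresholds(markings, demands, max_level=5):
    """Brute-force search over thresholds for markings 0 and 1 (stand-in for the MIP/JSO)."""
    best = None
    for t0 in range(max_level + 1):
        for t1 in range(max_level + 1):
            thr = {0: t0, 1: t1}
            c = average_cost(thr, markings, demands)
            if best is None or c < best[0]:
                best = (c, thr)
    return best

# hypothetical log: marking 1 signals that demand is about to arrive
markings = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
demands = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
print(best_thresholds(markings, demands))
```

Fitting the thresholds directly to the recorded trajectory, rather than to a fitted model of it, mirrors in spirit the idea of using shop-floor data directly in optimizing the policy parameters.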
Article
In retailer management, the Newsvendor problem has attracted wide attention as one of the basic inventory models. The traditional approach to solving this problem relies on the probability distribution of demand. In theory, if the probability distribution is known, the problem can be considered fully solved. However, in real-world scenarios it is almost impossible even to approximate the demand distribution accurately. In recent years, researchers have started adopting machine learning approaches that learn a demand prediction model from other feature information. In this paper, we propose a supervised learning method that optimizes product quantities based on feature information. We demonstrate that using the original Newsvendor loss function as the training objective outperforms the recently suggested quadratic loss function. The new algorithm has been assessed on both synthetic and real-world data, demonstrating better performance.
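The contrast between training on the newsvendor loss and training on a quadratic loss can be seen even in the smallest model. The sketch below assumes a linear model q = Xw fitted by plain subgradient descent with a fixed step size; the abstract does not specify the model this way, so the model, the step size and the function names are all assumptions. Least squares targets the mean demand, while the newsvendor loss drives the prediction toward the b/(b+h) quantile.

```python
import numpy as np

def newsvendor_loss(q, d, h, b):
    """Average newsvendor cost of predictions q against realised demands d."""
    return float(np.mean(h * np.maximum(q - d, 0) + b * np.maximum(d - q, 0)))

def fit_newsvendor_linear(X, d, h, b, lr=0.05, epochs=2000):
    """Fit q = X @ w by subgradient descent on the average newsvendor loss."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        q = X @ w
        g = np.where(q > d, h, -b)  # subgradient of each sample's loss w.r.t. its prediction
        w -= lr * X.T @ g / len(d)
    return w

def fit_quadratic_linear(X, d):
    """Least squares: the quadratic-loss alternative."""
    w, *_ = np.linalg.lstsq(X, d, rcond=None)
    return w

X = np.ones((10, 1))        # hypothetical: intercept-only model
d = np.arange(1.0, 11.0)    # hypothetical demand history 1, ..., 10
w_nv = fit_newsvendor_linear(X, d, h=1.0, b=3.0)
w_ls = fit_quadratic_linear(X, d)
print(w_nv, w_ls)
print(newsvendor_loss(X @ w_nv, d, 1.0, 3.0), newsvendor_loss(X @ w_ls, d, 1.0, 3.0))
```

With a constant step size the subgradient iterates oscillate in a small band around the optimal quantile (here about 8) instead of converging exactly; a decaying step size would remove that band.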
Article
We combine ideas from machine learning (ML) and operations research and management science (OR/MS) in developing a framework, along with specific methods, for using data to prescribe optimal decisions in OR/MS problems. In a departure from other work on data-driven optimization, we consider data consisting, not only of observations of quantities with direct effect on costs/revenues, such as demand or returns, but also predominantly of observations of associated auxiliary quantities. The main problem of interest is a conditional stochastic optimization problem, given imperfect observations, where the joint probability distributions that specify the problem are unknown. We demonstrate how our proposed methods are generally applicable to a wide range of decision problems and prove that they are computationally tractable and asymptotically optimal under mild conditions, even when data are not independent and identically distributed and for censored observations. We extend these to the case in which some decision variables, such as price, may affect uncertainty and their causal effects are unknown. We develop the coefficient of prescriptiveness P to measure the prescriptive content of data and the efficacy of a policy from an operations perspective. We demonstrate our approach in an inventory management problem faced by the distribution arm of a large media company, shipping 1 billion units yearly. We leverage both internal data and public data harvested from IMDb, Rotten Tomatoes, and Google to prescribe operational decisions that outperform baseline measures. Specifically, the data we collect, leveraged by our methods, account for an 88% improvement as measured by our coefficient of prescriptiveness. This paper was accepted by Noah Gans, optimization.
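The coefficient of prescriptiveness P mentioned above compares a policy's average out-of-sample cost with two reference points: the perfect-foresight cost R* and the cost of the plain SAA decision that ignores auxiliary data. As we read the definition, P = 1 - (R(policy) - R*) / (R(SAA) - R*), so P = 1 means perfect decisions and P = 0 means no gain over SAA. The minimal sketch below computes it from per-sample cost arrays; the function name and the numbers are hypothetical.

```python
import numpy as np

def coefficient_of_prescriptiveness(policy_costs, saa_costs, oracle_costs):
    """P = 1 - (R(policy) - R*) / (R(SAA) - R*), using average out-of-sample costs.

    R* is the perfect-foresight cost; P = 1 means perfect decisions,
    P = 0 means no improvement over the data-poor SAA decision.
    """
    r_policy, r_saa, r_star = (float(np.mean(c)) for c in (policy_costs, saa_costs, oracle_costs))
    return 1.0 - (r_policy - r_star) / (r_saa - r_star)

# hypothetical newsvendor test costs: with perfect foresight the order equals demand, so R* = 0
print(coefficient_of_prescriptiveness([2.0, 2.0], [5.0, 5.0], [0.0, 0.0]))  # 0.6
```

For the newsvendor, perfect foresight orders exactly the realised demand, so R* is zero and P reduces to one minus the ratio of the policy's cost to the SAA cost.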
Article
We consider a discrete-time infinite-horizon inventory system with non-stationary demand, full backlogging, and deterministic replenishment lead time. Demand arrives according to a probability distribution conditional on the state of the world that undergoes Markovian transitions over time. But the actual state of the world can only be imperfectly estimated based on past demand data. We model the inventory replenishment problem for this system as a Markov decision process (MDP) with an uncountable state space consisting of both the inventory position and the most recent belief, a conditional probability mass function, about the actual state of the world. Assuming that the state of the world evolves as an ergodic Markov chain, using the vanishing discount method along with a coupling argument, we prove the existence of an optimal average cost that is independent of the initial system state. For our linear cost structure, we also establish the average-cost optimality of a belief-dependent base-stock policy. We then discretize the uncountable belief space into a regular grid and observe that the average cost under our discretization converges to the optimal average cost as the number of grid points grows large. Finally, we conduct numerical experiments to evaluate the use of a myopic belief-dependent base-stock policy as a heuristic for our MDP with the uncountable state space. On a test bed of 108 instances, the average cost obtained from the myopic policy deviates by no more than a few percent from the best lower bound on the optimal average cost obtained from our discretization.
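The two ingredients of a belief-dependent policy are a belief update (Bayes' rule on the observed demand, followed by one Markov transition of the hidden state) and a base-stock level computed from the predicted belief. The sketch below assumes Poisson demand in each hidden state, zero lead time, and a myopic one-period critical-fractile rule; the paper uses a positive deterministic lead time and proves optimality of a belief-dependent base-stock policy, which this simplified rule does not reproduce. The transition matrix, rates and costs are hypothetical.

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    """Poisson pmf computed in log space to avoid overflow for large k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def update_belief(belief, demand, P, rates):
    """Bayes update on the observed demand, then one step of the Markov chain."""
    likelihood = np.array([poisson_pmf(demand, lam) for lam in rates])
    posterior = np.asarray(belief, float) * likelihood
    posterior /= posterior.sum()
    return posterior @ np.asarray(P, float)

def myopic_base_stock(belief, rates, h, b, max_level=500):
    """Smallest S with mixture CDF(S) >= b/(b+h): a one-period critical fractile."""
    tau = b / (b + h)
    cdf = 0.0
    for s in range(max_level + 1):
        cdf += sum(p * poisson_pmf(s, lam) for p, lam in zip(belief, rates))
        if cdf >= tau:
            return s
    return max_level

P = np.array([[0.9, 0.1], [0.2, 0.8]])  # hypothetical transition matrix (low, high)
rates = [2.0, 10.0]                      # hypothetical Poisson demand rate per state
belief = np.array([0.5, 0.5])
for d in [1, 3, 11, 9]:                  # hypothetical observed demands
    belief = update_belief(belief, d, P, rates)
    print(d, belief.round(3), myopic_base_stock(belief, rates, h=1.0, b=4.0))
```

The same forward-filtering step is what lets a hidden Markov model summarise an unobservable demand state as a probability vector that a downstream ordering rule can use.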
Article
We investigate the data-driven newsvendor problem when one has n observations of p features related to the demand as well as historical demand data. Rather than a two-step process of first estimating a demand distribution then optimizing for the optimal order quantity, we propose solving the “big data” newsvendor problem via single-step machine-learning algorithms. Specifically, we propose algorithms based on the empirical risk minimization (ERM) principle, with and without regularization, and an algorithm based on kernel-weights optimization (KO). The ERM approaches, equivalent to high-dimensional quantile regression, can be solved by convex optimization problems and the KO approach by a sorting algorithm. We analytically justify the use of features by showing that their omission yields inconsistent decisions. We then derive finite-sample performance bounds on the out-of-sample costs of the feature-based algorithms, which quantify the effects of dimensionality and cost parameters. Our bounds, based on algorithmic stability theory, generalize known analyses for the newsvendor problem without feature information. Finally, we apply the feature-based algorithms for nurse staffing in a hospital emergency room using a data set from a large UK teaching hospital and find that (1) the best ERM and KO algorithms beat the best practice benchmark by 23% and 24%, respectively, in the out-of-sample cost, and (2) the best KO algorithm is faster than the best ERM algorithm by three orders of magnitude and the best practice benchmark by two orders of magnitude.
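The abstract notes that the kernel-weights optimization (KO) approach reduces to a sorting algorithm. The reason is that, once each historical observation is weighted by a kernel of its feature distance to the new feature vector, the weighted newsvendor problem is solved by a weighted b/(b+h) quantile: sort the demands and accumulate weights until the critical fractile is reached. The minimal sketch below assumes a Gaussian kernel with a hand-picked bandwidth; `ko_order`, the bandwidth and the data are hypothetical.

```python
import numpy as np

def ko_order(X_train, d_train, x_new, h, b, bandwidth):
    """Kernel-weights optimization sketch: Gaussian-kernel weighted b/(b+h) quantile."""
    X = np.asarray(X_train, float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as a single feature
    sq_dist = np.sum((X - np.asarray(x_new, float)) ** 2, axis=1)
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    if k.sum() == 0.0:  # every kernel value underflowed: fall back to equal (SAA) weights
        k = np.ones_like(k)
    w = k / k.sum()
    order = np.argsort(d_train, kind="stable")  # the sorting step
    d_sorted = np.asarray(d_train, float)[order]
    cum_w = np.cumsum(w[order])
    idx = int(np.searchsorted(cum_w, b / (b + h) - 1e-12))
    return float(d_sorted[min(idx, len(d_sorted) - 1)])

# hypothetical data: two feature clusters with very different demand
X = np.array([0.0, 0.0, 10.0, 10.0])
d = np.array([1.0, 2.0, 100.0, 200.0])
print(ko_order(X, d, 0.0, h=1.0, b=3.0, bandwidth=1.0))   # 2.0
print(ko_order(X, d, 10.0, h=1.0, b=3.0, bandwidth=1.0))  # 200.0
```

The cost per decision is one sort of the demand history, which is consistent with the abstract's observation that KO runs far faster than the ERM (quantile regression) alternative.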