
An Integrated Data-Driven Method Using Deep Learning for a Newsvendor Problem with Unobservable Features

January 2022
European Journal of Operational Research 302(3)


DOI:10.1016/j.ejor.2021.12.047

Authors:

Davood Pirayesh Neghab

Ryerson University

Siamak Khayyati

University of Liège

Fikri Karaesmen

Koc University

We consider a single-period inventory problem with random demand whose distribution is affected by both directly observable and unobservable features. With recent advances in data collection and analysis technologies, data-driven approaches to classical inventory management problems have gained traction. In particular, machine learning methods are increasingly being integrated into optimization problems. Although data-driven approaches have been developed for the newsvendor problem, they often treat learning from the available data and optimizing the system as separate tasks performed in sequence. One drawback of this approach is that, in the learning phase, costly and cheap mistakes receive equal attention and, in the optimization phase, the optimizer is blind to the learner's confidence in its estimates for different regions of the problem. To remedy this, we consider an integrated learning and optimization problem for optimizing a newsvendor's strategy facing complex, correlated demand, with additional information about the unobservable state of the system. We give an algorithm that integrates optimization, neural networks and hidden Markov models, and use numerical experiments to show the efficiency of our method. In an empirical experiment, the method outperforms the best competitor benchmark by more than 27%, on average, in terms of system cost. We further analyze the performance of the method using a set of numerical experiments.


Davood Pirayesh Neghab

Department of Mechanical and Industrial Engineering, Ryerson University, Toronto, Canada

Email: dneghab@ryerson.ca

Siamak Khayyati

College of Engineering, Koç University, Rumeli Feneri Yolu, Istanbul, Turkey, 34450

Email: skhayyati13@ku.edu.tr

Fikri Karaesmen*

College of Engineering, Koç University, Rumeli Feneri Yolu, Istanbul, Turkey, 34450

Email: fkaraesmen@ku.edu.tr


Key words: Inventory; Hidden Markov model; Deep neural network; Partially observed data; Integrated Estimation and Optimization

* Corresponding author.

This paper is dedicated to the memory of Prof. Gabor Rudolf, who is unfortunately no longer with us. We thank him for his valuable advice and comments enhancing the quality of this work.

1. Introduction

The single-period random demand inventory problem is one of the central problems in inventory control and capacity management. In the standard version of the problem, the inventory manager chooses an order quantity before observing the random demand. A mismatch between the order quantity and the realized demand may lead to unsatisfied demand or unsold items, and the costs of a unit of lost demand and of an unsold item are not symmetric. A commonly used simplifying assumption is that the probability distribution of demand is known with certainty and that demand is independent and identically distributed across periods. In many cases, however, this assumption may not be consistent with the available data.

Recent interest in data-driven approaches to inventory management has renewed attention to general versions of this problem, in which the random demand depends on multiple factors that are observed prior to ordering. In addition, past data on the observed factors and the corresponding realized demand are available to guide a data-dependent ordering decision. This more general setup can be used for production/inventory management and capacity planning problems in systems affected by supply chain disruptions and the consequent increase or decrease in demand for various items due to changes in the state of the environment, such as the disruptions and changes caused by the recent COVID-19 pandemic. In addition to simple seasonal factors (such as day of the week or week of the month) and planning-related factors (promotions, competitors' actions, etc.), many data sources that might affect demand are monitored daily (weather forecasts, stock market indices, currency exchange rates) and can be used in controlling inventory systems.

A recent stream of research (Ban and Rudin 2019, Oroojlooyjadid et al. 2020) investigates the above inventory problem under such observable features. On the other hand, other factors that affect demand may not be directly observable at the time of the decision. These may include supplier conditions affecting competitors, business cycles, or consumer preference shifts that become observable only after a long time lag. In economics and finance, market models with such hidden or unobservable features that evolve randomly are used to model and analyze business cycles (Hamilton 1990) or financial market conditions (Bhar and Hamori 2004). The operations literature also includes analytical models that investigate the effects of hidden features on inventory decisions, but this literature does not study how to base these decisions on limited historical data. We propose a framework that includes such unobservable features, governed by a random process, which affect demand in addition to observable features that are known at the time of the decision. In this framework, modeling the structure of the process that governs the evolution of the unobservable features over time is as important as modeling the short-term effects of the observable features.

In this paper, we contribute to the recent stream of literature that incorporates observable features in data-driven inventory or capacity planning (Ban and Rudin 2019, Oroojlooyjadid et al. 2020) by investigating a single-period random demand inventory problem in which the random demand in each period depends on an unobservable factor in addition to some observable factors. We assume that past data is available on the observable factors and the corresponding observed demand. Information about the unobservable factor, on the other hand, has to be inferred from past data. The traditional approach to such a problem would be to first estimate the underlying model parameters for the effect of the observable factors on demand and infer the hidden state information and its effects; once estimation and inference have taken place, the optimization problem would be solved in a second step. However, some recent research on this problem (see Ban and Rudin (2019), for example) has demonstrated the advantages of integrating the estimation of the parameters for the observable factors with the optimization step using tools from machine learning. We pursue this integrated estimation and optimization approach in a model with the additional complexity of an unobservable factor. This brings the additional challenge of a multi-period dynamic optimization problem, because the unobservable state is estimated dynamically and depends on the entire demand sequence observed prior to the decision. We therefore combine estimation, inference and optimization using a multi-layered neural network. To assess the performance of this integrated approach, we compare it against data-based methods that ignore the hidden factor information or that employ separate inference and optimization steps. Numerical examples on both a synthetic data set and representative real data, taken as a proxy for retail data that might have an unobservable state, demonstrate that our approach compares favorably against the other benchmarks.

The remainder of this paper is organized as follows. Section 2 presents a review of the related literature. Section 3 presents the setup and the solution method we have proposed. Section 4 introduces several benchmark methods and reports the results of the numerical experiments that assess the performance of the suggested method against the benchmarks. Section 5 provides the evaluation of the methods in an example with real data. Finally, Section 6 concludes the paper.

2. Literature Review

In the following we provide a review of the pertinent literature. This is presented in two parts.

First, we review the literature on inventory problems with an evolving demand environment. Then,

we review the related work on the data-driven newsvendor problem.

2.1. Inventory problems with an evolving demand environment

A long line of research investigates the impact of factors such as macroeconomic shocks and cycles on inventory systems (Blinder and Maccini 1991, Shang 2012, Kesavan and Kushwaha 2014), especially through the effects of such factors on random demand. Several papers consider dynamic environmental factors that make the demand distribution non-stationary, which creates additional challenges. A common assumption in these papers is that the random environment evolves according to a Markov chain.

Earlier studies assume that the state of the Markov chain at each point in time is fully observed and that the true demand distribution associated with each state is known (Sethi and Cheng 1997, Beyer and Sethi 1997, Huh et al. 2011, Gallego and Hu 2004). For instance, Feldman (1978) proposes and analyzes a model where demand depends on the state of the environment, modeled by a continuous-time Markov process. Lovejoy (1992) investigates the optimality of a myopic policy under non-stationary demand that depends on a Markovian process over time. Song and Zipkin (1993) present an inventory model where demand depends on the state of the world, modeled by a Markov chain, and derive the optimal ordering policy.

In many practical situations, the environmental states are not perfectly observable. Instead, one can observe information about the environment and can only infer the states probabilistically. Treharne and Sox (2002) categorize the literature into four classes in terms of the stationarity of demand and the observability of information. The class of decision systems with Markov-modulated demand and partially observed information is known as partially observed Markov decision processes (POMDP) (Monahan 1982). Treharne and Sox (2002) study several inventory policies where only historical demand is observable and the probability distribution of demand is determined by the unobservable state of the Markov chain. Bensoussan et al. (2005, 2007) consider the newsvendor problem with censored demand and inventory that depend on the states of a Markov chain. Arifoğlu and Özekici (2010) analyze a single-item periodic-review inventory system in a random environment. They extend the model of Gallego and Hu (2004) to the more general setting where the environment is only partially observable. In particular, they show that a state-dependent base-stock policy is optimal, using sufficient statistics on the environment process. In later work, Arifoğlu and Özekici (2011) investigate the optimality of a state-dependent inventory policy in a random environment where production capacity is random. Avci et al. (2020) model and analyze an inventory problem where demand follows a probability distribution conditional on the Markovian state of the world.

In the above papers that use a POMDP model with an imperfectly observed environment process, the demand state is partially revealed via past demand data, and estimating the state of the environment is an important subproblem. This subproblem is solved using Bayesian updating to incorporate the partial observations into the inventory model, and a general solution is given by the Baum-Welch algorithm. The outcome of this algorithm is an estimate of the demand distribution in each state of the observed sequence, obtained by maximizing the likelihood of the observed sequence. This maximization at the subproblem level does not take into account the objective function of the inventory problem, which leads to a separation of estimation and optimization. However, recent examples in the literature show that integrating the estimation and optimization problems may lead to better solutions. We refer to the separated estimation and optimization procedure used in the above papers as the Objective-blind Baum-Welch (ObBW) method. This work contributes to this stream of research by developing a method that integrates the estimation of the hidden states and the optimization of the system, allowing the parameters of the objective function to guide the estimation. Table 1 displays the characteristics of this method along with other approaches that are explained in the next subsection.
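To make the objective-blind step concrete, the following Python sketch fits a two-state hidden Markov model with Gaussian emissions to a demand series by Baum-Welch (expectation-maximization) iterations. The Gaussian emissions, the function names `forward_backward` and `baum_welch_step`, and the demand values are illustrative assumptions rather than the formulation of any of the cited papers. Note that the cost parameters h and b never enter the fit; they would only be used afterwards, in a critical fractile rule applied to the estimated state distributions, which is exactly the separation described above.

```python
import math
from statistics import NormalDist


def forward_backward(obs, pi, A, mu, sigma):
    """Scaled forward-backward pass for an HMM with Gaussian emissions.

    Returns state posteriors gamma[t][i], pairwise posteriors xi[t][i][j]
    (t = 0..T-2) and the log-likelihood of the whole observation sequence.
    """
    K, T = len(pi), len(obs)
    dists = [NormalDist(mu[i], sigma[i]) for i in range(K)]
    e = [[dists[i].pdf(x) for i in range(K)] for x in obs]  # emission densities
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[i] * e[0][i] for i in range(K)]
        else:
            a = [sum(alpha[t - 1][j] * A[j][i] for j in range(K)) * e[t][i]
                 for i in range(K)]
        s = sum(a)  # s = p(obs[t] | obs[0..t-1])
        scale.append(s)
        alpha.append([x / s for x in a])
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * e[t + 1][j] * beta[t + 1][j] for j in range(K))
                   / scale[t + 1] for i in range(K)]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * e[t + 1][j] * beta[t + 1][j] / scale[t + 1]
            for j in range(K)] for i in range(K)] for t in range(T - 1)]
    return gamma, xi, sum(math.log(s) for s in scale)


def baum_welch_step(obs, pi, A, mu, sigma):
    """One EM iteration; also returns the log-likelihood of the input parameters."""
    gamma, xi, loglik = forward_backward(obs, pi, A, mu, sigma)
    K, T = len(pi), len(obs)
    new_pi = gamma[0][:]
    new_A = []
    for i in range(K):
        occupancy = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([sum(xi[t][i][j] for t in range(T - 1)) / occupancy
                      for j in range(K)])
    new_mu, new_sigma = [], []
    for i in range(K):
        w = sum(gamma[t][i] for t in range(T))
        m = sum(gamma[t][i] * obs[t] for t in range(T)) / w
        var = sum(gamma[t][i] * (obs[t] - m) ** 2 for t in range(T)) / w
        new_mu.append(m)
        new_sigma.append(math.sqrt(var))
    return new_pi, new_A, new_mu, new_sigma, loglik


# Hypothetical demand series alternating between a low and a high regime.
demands = [10, 11, 9, 10, 12, 30, 31, 29, 32, 30, 11, 10, 9, 12, 10, 31, 30, 28, 29, 33]
pi, A = [0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]]
mu, sigma = [15.0, 25.0], [5.0, 5.0]
logliks = []
for _ in range(30):
    pi, A, mu, sigma, ll = baum_welch_step(demands, pi, A, mu, sigma)
    logliks.append(ll)
print([round(m, 2) for m in mu])
```

Each EM iteration cannot decrease the likelihood, which makes the recorded log-likelihoods a convenient sanity check when experimenting with such an implementation.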

2.2. Data-driven approaches to inventory problems

Many papers address the concern that the demand distribution in an inventory problem may not be completely known. Many of these works approach the problem from the perspective of robust optimization (Scarf 1958, Gallego and Moon 1993, Perakis and Roels 2008) and Bayesian updating (Scarf 1959).

Some studies relax the assumption that the demand distribution is completely known by developing data-based methods. In this framework, the decision maker uses the empirical distribution obtained from past observations (Levi et al. 2007, Liyanage and Shanthikumar 2005, Huh et al. 2011, Besbes and Muharremoglu 2013). We refer to the approach in these papers, which is mostly based on sample observations of demand, as the Empirical Demand Distribution (EDD) method. For instance, Bertsimas and Thiele (2005) solve the problem without estimating the distribution, instead assigning an equal probability 1/T to each demand observation in the sample, where T denotes the number of demand observations. The optimal stock level or order quantity is then approximated using this empirical distribution. The advantage of this method is that, unlike the ObBW method, it does not assume any particular shape for the demand distribution. This is useful with real data, where demand may not follow a common distribution. One drawback of this method, however, is the assumption that all future observations will follow the same empirical distribution, which may be questionable.
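A minimal sketch of the EDD idea, assuming each of the T observations receives probability 1/T: the critical fractile rule then selects the smallest observation whose empirical cumulative probability reaches b/(b+h), with h the unit overage cost and b the unit underage cost. The function names and the demand history are hypothetical; the brute-force comparison is included only as a sanity check that this order statistic minimizes the sample-average cost.

```python
import math


def empirical_nv_cost(q, demands, h, b):
    """Sample-average newsvendor cost: each observation has weight 1/T."""
    return sum(h * max(q - d, 0) + b * max(d - q, 0) for d in demands) / len(demands)


def edd_order_quantity(demands, h, b):
    """Critical fractile rule applied to the empirical demand distribution.

    The empirical CDF at the k-th smallest of T observations is k/T, so the
    rule picks the smallest k with k/T >= b/(b+h).
    """
    if not demands:
        raise ValueError("need at least one demand observation")
    ordered = sorted(demands)
    k = max(1, math.ceil(len(ordered) * b / (b + h)))
    return ordered[k - 1]


# Hypothetical demand history and unit overage/underage costs.
history = [12, 7, 15, 9, 11, 14, 8, 10, 13, 6]
h, b = 1.0, 3.0
q_edd = edd_order_quantity(history, h, b)
# Sanity check: no observed value gives a lower empirical cost.
q_brute = min(history, key=lambda q: empirical_nv_cost(q, history, h, b))
print(q_edd, q_brute, empirical_nv_cost(q_edd, history, h, b))
```

Restricting the brute-force search to observed values is enough because the empirical cost is piecewise linear with breakpoints at the observations.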

In recent decades, data-driven optimization under uncertainty has gained increasing attention. For instance, He et al. (2012) model the problem of setting nurse staffing levels in hospital operating rooms under uncertain daily workload as a newsvendor problem. They present various models, including a linear decision model that uses two features. Sachs (2015) considers ordering with different types of exogenous data, such as price and temperature, that might explain demand. She formulates the optimal inventory level as a linear function of those variables and, in a case study with real data, shows that the non-parametric approaches outperform the parametric ones. This is advantageous when the true demand distribution is not completely known and several exogenous variables are available. We refer to this approach as Parameter Fitting Linear Regression (PfLR). Here, the exogenous features explain part of the demand variability through a (typically) linear relationship. The approach therefore estimates the mean demand by a linear regression on the features, which results in a time-varying mean that depends on the features and a fixed standard deviation. In addition, demand is usually assumed to be Gaussian. Similar to the empirical method, this method is not able to capture the dependency between demand observations.
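The PfLR benchmark can be sketched under the assumptions just described: an ordinary least-squares fit of mean demand on the features, a constant residual standard deviation, and a Gaussian critical fractile. For brevity the sketch uses a single feature; the function names and data are hypothetical.

```python
from statistics import NormalDist, fmean


def fit_pflr(features, demands):
    """OLS fit of D = beta0 + beta1 * f + eps and the residual standard deviation."""
    f_bar, d_bar = fmean(features), fmean(demands)
    sxx = sum((f - f_bar) ** 2 for f in features)
    sxy = sum((f - f_bar) * (d - d_bar) for f, d in zip(features, demands))
    beta1 = sxy / sxx
    beta0 = d_bar - beta1 * f_bar
    residuals = [d - (beta0 + beta1 * f) for f, d in zip(features, demands)]
    # Unbiased residual variance (two fitted parameters); assumed constant over time.
    sigma = (sum(r * r for r in residuals) / (len(demands) - 2)) ** 0.5
    return beta0, beta1, sigma


def pflr_order_quantity(f_new, beta0, beta1, sigma, h, b):
    """Gaussian critical fractile around the feature-dependent mean."""
    z = NormalDist().inv_cdf(b / (b + h))
    return beta0 + beta1 * f_new + z * sigma


# Hypothetical history: demand against one exogenous feature (e.g. temperature).
features = [1.0, 2.0, 3.0, 4.0, 5.0]
demands = [11.5, 12.5, 15.5, 16.5, 19.5]
beta0, beta1, sigma = fit_pflr(features, demands)
print(round(pflr_order_quantity(6.0, beta0, beta1, sigma, h=1.0, b=3.0), 3))
```

Because each period is treated as an independent draw around the regression line, nothing in this fit can pick up the serial dependence created by a hidden state.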

The integrated estimation and optimization approach is referred to as a prescriptor method (Van Parys et al. 2020). In a recent paper, Van der Laan et al. (2019) propose a new data-driven approach based on distributionally robust optimization to achieve on-target service levels. They show that the suggested approach, which bases the inventory decision directly on feature data, is more reliable than several classical approaches, even with a limited number of historical observations. Several studies combine the estimation and optimization steps using tools from machine learning. Ban and Rudin (2019) consider the newsvendor problem with n observations and p features, for both a low and a high number of features. They propose two machine-learning-based approaches, regularization and Kernel Optimization (KO), and derive some of their theoretical properties. They also show, in a numerical study, that using such features may lower the expected cost significantly. A recent paper by Khayyati and Tan (2020) shows that integrating the two steps of parameter estimation and optimization can improve the performance of make-to-stock queues.
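As an illustration in the spirit of the KO approach (see Ban and Rudin (2019) for its exact formulation), the sketch below weights each past demand by a Gaussian kernel of the distance between its feature value and the current one, and then applies the critical fractile rule to the weighted empirical distribution. The Gaussian kernel, the bandwidth, the scalar feature and the function name are assumptions made for illustration.

```python
import math


def kernel_weighted_order_quantity(x_new, features, demands, h, b, bandwidth=1.0):
    """Critical fractile of a kernel-weighted empirical demand distribution.

    Hypothetical helper: past demand d_i gets weight proportional to
    exp(-(x_i - x_new)^2 / (2 * bandwidth^2)); the weights are normalized and
    the smallest demand whose cumulative weight reaches b/(b+h) is returned.
    """
    raw = [math.exp(-((x - x_new) ** 2) / (2 * bandwidth ** 2)) for x in features]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    ratio = b / (b + h)
    cumulative = 0.0
    for d, w in sorted(zip(demands, raw)):
        cumulative += w / total
        if cumulative >= ratio - 1e-12:  # small tolerance for round-off
            return d
    return max(demands)  # reached only through floating-point round-off


# Hypothetical history: low demand at feature value 0, high demand at 10.
feats = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
dems = [5, 6, 7, 50, 51, 52]
print(kernel_weighted_order_quantity(0.0, feats, dems, h=1.0, b=1.0))
print(kernel_weighted_order_quantity(10.0, feats, dems, h=1.0, b=1.0))
```

The bandwidth controls how local the decision is: a very small bandwidth uses only the closest past observations, while a very large one recovers the unweighted empirical quantile of the EDD method.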

The general use of ML-based methods in joint estimation and optimization, however, goes back to several earlier studies (Efendigil et al. 2009, Goel et al. 2010, Gruhl et al. 2004, 2005). Bertsimas and Kallus (2020) combine ML methods with conditional stochastic optimization. They include direct-effect data as well as other auxiliary information, and assume that the joint probability distributions are unknown and that the observations are imperfect. They develop the framework with several ML methods and show that these techniques are computationally tractable and asymptotically optimal under some conditions. This tractability is shown to hold in the presence of dependencies in the data and censored observations.

The main contributions of these important recent papers are using informative data, benefiting from non-parametric models, decreasing estimation errors, and making decisions more dynamic. We categorize the approach proposed by Ban and Rudin (2019), which relates the optimal order quantity directly to the features through a functional form, as Linear Machine Learning (LML). This method combines the estimation and optimization steps by solving a nonlinear optimization problem whose objective is the newsvendor cost function itself rather than the regression error. In this method, similar to the parameter fitting approach, a linear relation between the features and the optimal order quantity is assumed.
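A minimal sketch of the LML idea with one feature: the order quantity is the linear rule Q = β0 + β1 f, and the coefficients minimize the empirical newsvendor cost directly instead of a squared regression error. Because the objective is piecewise linear, the problem can be written as a linear program; purely for illustration, the sketch instead uses plain subgradient descent with a diminishing step size. The data, step sizes and function names are hypothetical.

```python
def nv_cost(q, d, h, b):
    """Newsvendor mismatch cost of ordering q when demand is d."""
    return h * max(q - d, 0.0) + b * max(d - q, 0.0)


def mean_cost(beta0, beta1, features, demands, h, b):
    """Empirical newsvendor cost of the linear rule Q = beta0 + beta1 * f."""
    return sum(nv_cost(beta0 + beta1 * f, d, h, b)
               for f, d in zip(features, demands)) / len(demands)


def fit_lml(features, demands, h, b, iters=2000, step0=0.5):
    """Subgradient descent on the empirical newsvendor cost of a linear rule."""
    beta0 = beta1 = 0.0
    T = len(demands)
    for k in range(1, iters + 1):
        g0 = g1 = 0.0
        for f, d in zip(features, demands):
            q = beta0 + beta1 * f
            g = h if q > d else -b  # a subgradient of nv_cost with respect to q
            g0 += g / T
            g1 += g * f / T
        step = step0 / k ** 0.5  # diminishing step size
        beta0 -= step * g0
        beta1 -= step * g1
    return beta0, beta1


# Hypothetical single-feature data: demand grows linearly in the feature.
xs = [i / 10 for i in range(20)]
ds = [10.0 + 5.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
b0, b1 = fit_lml(xs, ds, h=1.0, b=2.0)
print(round(b0, 2), round(b1, 2), round(mean_cost(b0, b1, xs, ds, 1.0, 2.0), 3))
```

The asymmetric subgradient (h when over-ordering, -b when under-ordering) is what steers the fitted line toward the b/(b+h) conditional quantile rather than the conditional mean.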

Finally, some recent studies in data-based optimization contribute to the literature by taking into consideration nonlinear relationships between feature data and the order quantity. Oroojlooyjadid et al. (2020) apply a deep learning approach to the newsvendor problem when such nonlinearities exist. They also consider multi-feature and multi-product extensions of the problem, and show that deep learning outperforms other benchmarks such as local regression, classification and regression trees, random forests, and kernel optimization, especially when demand is highly volatile. Seubert et al. (2020) develop a data-driven ordering system for a bakery chain based on artificial neural networks. They use two different methods, sequential and joint estimation and optimization, and show that both methods considerably reduce costs compared to human planners. Qi et al. (2020) extend the approach of Oroojlooyjadid et al. (2020) to a multi-period inventory system with uncertain demand and vendor lead time. Zhang and Gao (2017) examine a supervised deep learning algorithm with two objectives. They demonstrate that using the original newsvendor cost as the training loss outperforms the quadratic loss function. The algorithm has been evaluated on synthetic and real data. In this class of methods, nonlinear relations between features and demand are considered. Oroojlooyjadid et al. (2020), the first study in this class, use deep learning to represent complex functions that relate the features to the optimal order quantity in the classical newsvendor problem. This method can use the features and optimize the cost function while considering a wide range of relationships beyond linear ones. However, it does not identify the dependencies between consecutive states of the world in an evolving environment. This work aims to address this issue by modeling long-term dependencies using hidden Markov models.
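The common ingredient of this class of methods, training a neural network directly on the newsvendor cost rather than on a squared error, can be sketched with a tiny one-hidden-layer network in plain Python. The architecture, learning rate, synthetic data and names are illustrative assumptions; the sketch reproduces neither the network of Oroojlooyjadid et al. (2020) nor the integrated method proposed in this paper.

```python
import math
import random


def nv_cost(q, d, h, b):
    """Newsvendor mismatch cost of ordering q when demand is d."""
    return h * max(q - d, 0.0) + b * max(d - q, 0.0)


class NVNetwork:
    """One-hidden-layer tanh network mapping a scalar feature to an order quantity."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w = [rng.gauss(0.0, 1.0) for _ in range(hidden)]  # input -> hidden
        self.c = [0.0] * hidden                                 # hidden biases
        self.v = [rng.gauss(0.0, 0.5) for _ in range(hidden)]  # hidden -> output
        self.v0 = 0.0                                           # output bias

    def forward(self, f):
        a = [math.tanh(w * f + c) for w, c in zip(self.w, self.c)]
        return self.v0 + sum(v * x for v, x in zip(self.v, a)), a

    def mean_cost(self, features, demands, h, b):
        return sum(nv_cost(self.forward(f)[0], d, h, b)
                   for f, d in zip(features, demands)) / len(demands)

    def train(self, features, demands, h, b, epochs=2000, lr=0.02):
        """Full-batch subgradient descent on the average newsvendor cost."""
        T, H = len(demands), len(self.w)
        for _ in range(epochs):
            gw, gc, gv, gv0 = [0.0] * H, [0.0] * H, [0.0] * H, 0.0
            for f, d in zip(features, demands):
                q, a = self.forward(f)
                g = (h if q > d else -b) / T  # subgradient of the cost w.r.t. q
                gv0 += g
                for j in range(H):
                    gv[j] += g * a[j]
                    back = g * self.v[j] * (1.0 - a[j] ** 2)  # chain rule through tanh
                    gw[j] += back * f
                    gc[j] += back
            self.v0 -= lr * gv0
            for j in range(H):
                self.v[j] -= lr * gv[j]
                self.w[j] -= lr * gw[j]
                self.c[j] -= lr * gc[j]


# Hypothetical data with a nonlinear feature-demand relation.
xs = [i / 10 for i in range(20)]
ds = [10.0 + 4.0 * math.sin(2.0 * x) + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
net = NVNetwork(seed=1)
before = net.mean_cost(xs, ds, h=1.0, b=2.0)
net.train(xs, ds, h=1.0, b=2.0)
print(round(before, 2), round(net.mean_cost(xs, ds, h=1.0, b=2.0), 2))
```

The only change with respect to ordinary regression training is the loss: replacing the newsvendor subgradient by the residual q - d would turn the same network into a mean-demand estimator.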

The contribution of this paper is as follows. First, we suggest a novel approach to data-driven inventory systems that considers both observable and unobservable sources of features affecting the randomness in demand. This new model extends existing data-driven methods to other applications where there are limitations in identifying the factors, and their volatility, that influence the state of the system. Second, we use the hidden Markov model (HMM), the most widely used model of an evolving environment. Using an HMM in a data-driven framework enables us to model long-term dependencies and to take advantage of other information sources available in the form of feature data. Third, by combining neural network modeling tools with stochastic inference and integrating them into the optimization method, we propose an integrated solution method for the suggested model that captures nonlinear dependencies between features and order quantity while alleviating the errors that occur in the estimation step. This differs from most of the literature, where the tasks of learning about the demand distribution based on the features and setting the order quantity are handled separately, which may lead to a misalignment between the two different objectives of minimizing the estimation error and minimizing the system cost. Finally, we show the robustness of our proposed solution method using extensive numerical experiments with both synthetic and real data. We compare the results of the suggested model with data-driven methods that ignore the evolving environment as a hidden factor or that employ separate inference and optimization steps. Our numerical results reveal that the proposed approach performs better than other methods when unobservable features may be present, suggesting that taking these features into account can bring significant benefits. In addition, we present evidence that integrating unobservable feature estimation and inventory optimization is feasible and may bring significant improvements.

Table 1 Position of the suggested model in the existing literature.

No.  Method                                 Abbrev.  State-dependent  Data-driven  NV-cost function integration  Non-linear model
1    Suggested model                        HMMNV    ✓                ✓            ✓                             ✓
2    Empirical Demand Distribution          EDD                       ✓
3    Objective-blind Baum-Welch Alg.        ObBW     ✓
4    Parameter fitting (Linear Regression)  PfLR                      ✓
5    Linear Machine Learning                LML                       ✓            ✓
6    Deep Learning                          DNN                       ✓            ✓                             ✓

3. Model

In this section, we describe the model setting and our integrated estimation and optimization approach. We consider a single-item newsvendor problem where the goal is to choose the order quantity at the beginning of every period to minimize the expected cost in that period. We assume that inventory is not carried over from one period to the next and that backordering of unsatisfied demand is not allowed (as in service capacity planning problems). In this setup, the decision maker solves the problem in each period independently of previous periods' inventory, and the order quantity in one period does not affect future decisions. We suppose that the demand distribution is not known, but that demand depends on the available observable feature data and also has long-term dependencies on an unobservable feature modeled as a Markov chain. The optimal order quantity must therefore also depend on past demand and feature data. We first present our notation and assumptions and then formulate the demand evolution model. A brief introduction to deep neural networks followed by our suggested approach completes the section.

Table 2 Description of the variables of the model.

Variable          Description
$D_t$             demand at time $t$
$Q_t$             order quantity at time $t$
$B_i$             base demand in state $i$
$f_{nt}$          $n$th feature observed at time $t$
$F_t$             vector of $N$ independent and identically distributed features at time $t$
$T$               number of periods or sample observations
$\beta_n$         linear coefficient of the $n$th feature with demand
$S_t$             state of the Markov chain at time $t$
$q_t$             vector of state probabilities
$\psi_E$          emission network function, which partially maps the features $F_t$ to $D_t$
$W_E$             set of parameters of the emission network function $\psi_E$
$\psi_{NV}$       newsvendor network function, which partially maps $F_t$ to $Q_t$
$W_{NV}$          set of parameters of the newsvendor network function $\psi_{NV}$
$\epsilon_t$      error term, the nonsystematic part of the demand
$N(\mu, \sigma)$  Gaussian distribution with mean $\mu$ and standard deviation $\sigma$
$\Lambda$         likelihood function of the hidden Markov model
$a_{ij}$          probability of transition from state $i$ to state $j$
$A$               transition probability matrix of the Markov model
$E$               vector of the probability density functions of the hidden states
$\pi$             initial probability of the hidden Markov chain
$\alpha_t$        forward parameter of the Baum-Welch algorithm
$\gamma_1$        learning rate for updating $W_E$
$\gamma_2$        learning rate for updating $W_{NV}$
$\gamma_3$        learning rate for updating $A$
$\eta$            coefficient of the trade-off between the newsvendor cost and the likelihood

3.1. Demand data, optimization and notation

The historical data of the problem can be represented using tuples of feature vectors and demand realizations as

$$\mathcal{D} = \{(F_1, D_1), (F_2, D_2), \ldots, (F_T, D_T)\}, \tag{1}$$

where $T$ denotes the number of periods. In Equation (1), the vector $F_t$ for each period $t = 1, 2, \ldots, T$ consists of $N$ different features $f_{1t}, f_{2t}, \ldots, f_{Nt}$.

Given the data $\mathcal{D}$, our focus is on the following newsvendor optimization problem, where the objective is to choose an order quantity $Q$ that minimizes the expected (weighted) mismatch cost:

$$\min_{Q} \; NVC(Q) = \mathbb{E}_D\left[h(Q-D)^+ + b(D-Q)^+ \mid \mathcal{D}\right], \tag{2}$$

where $D$ denotes the random demand, $Q$ is the order quantity, $h$ is the unit overage cost and $b$ is the unit underage cost. In a retail setting, the overage and underage costs may refer to the cost of unsold items and of lost demand, respectively. In a capacity (staffing) setting, they represent the cost of unused capacity and of unfulfilled demand, respectively.

When the demand distribution is known, the optimal order quantity in (2) can be found by the well-known critical fractile rule, $Q^* = F_D^{-1}\big(b/(h+b)\big)$, where $F_D$ is the cumulative distribution function of demand. However, the data-driven environment presents an additional challenge: the solution of the outer optimization problem depends on an inner estimation problem in which the expected cost has to be estimated from past observations. The estimation problem is itself solved as an optimization problem. Motivated by the recent success of methods that integrate estimation and optimization (Ban and Rudin 2019, Oroojlooyjadid et al. 2020), we propose an approach that uses an integrated solution.

Finally, we note that the optimal quantity in (2) is found separately for each period, since inventory and backorders are not carried over. On the other hand, the quantity decision depends on the currently observed features and on all past demand observations, which carry information about the state of the unobservable feature. This makes the order quantity dependent on the entire demand sequence up to time $t$. Table 2 describes all the variables used in this study.

3.2. Special case: A newsvendor problem with Markov modulated demand and observable states

Many papers in the literature assume that demand depends on an external state of the world that evolves according to a Markov chain $S$ (Treharne and Sox 2002, Arifoğlu and Özekici 2010). It is then natural to assume that demand $D$ depends on the external state, so that the demand in period $t$ is $D_t = D \mid S_t$ (or $D \mid S_{t-1}$, depending on the filtration). Unlike the general formulation in (2), the entire demand sequence is not required here, because $S_t$ carries all the necessary information. We then look for the order quantity that minimizes the expected cost of the system,

$$\min_{Q} \; NVC(Q) = \mathbb{E}_D\left[h(Q-D)^+ + b(D-Q)^+ \mid S_t\right], \tag{3}$$

whose solution is given by the critical fractile

$$Q_t^* = F^{-1}_{D \mid S_t}\left(\frac{b}{h+b}\right), \tag{4}$$

where $F_{D \mid S_t}(\cdot)$ is the cumulative distribution function of demand given $S_t$ and $F^{-1}(\cdot)$ denotes its inverse.

Let us now generalize the model further and define a base demand $B_i$ that depends on $S_t = i$, and an additional demand that depends on the values of certain observed features $f_{1t}, f_{2t}, \ldots, f_{Nt}$ that do not depend on $S_t$. Further, let us assume that $B_i$ is independent of the features. We then have

$$D_t = B_i + \psi(f_{1t}, f_{2t}, \ldots, f_{Nt}) + \epsilon_t, \tag{5}$$

where $\epsilon_t$ is a random error term with $\mathbb{E}[\epsilon_t] = 0$.

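3.2.1. Example

As an example, assume that there are two states of the world, (1) Good and (2) Bad, and that $B_i$ is normally distributed with parameters $(\mu_i, \sigma_i)$, $i = 1, 2$. Further assume that

$$\psi(f_{1t}, f_{2t}, \ldots, f_{Nt}) = \beta_0 + \beta_1 f_{1t} + \beta_2 f_{2t} + \cdots + \beta_N f_{Nt}. \tag{6}$$

If, in addition, $\epsilon_t$ is normally distributed with standard deviation $\sigma_\epsilon$ and independent of $B_i$, then, given $S_t = i$, $D_t$ is normally distributed with mean $\mu_i + \psi(f_{1t}, f_{2t}, \ldots, f_{Nt})$ and variance $\sigma_i^2 + \sigma_\epsilon^2$. From the standard result for normally distributed demand, we then have

$$Q_t^* = \mu_i + \psi(f_{1t}, f_{2t}, \ldots, f_{Nt}) + z^* \sqrt{\sigma_i^2 + \sigma_\epsilon^2}, \tag{7}$$

where $z^* = \Phi^{-1}\big(b/(h+b)\big)$ and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable.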
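Equation (7) can be evaluated directly; in the sketch below, `statistics.NormalDist().inv_cdf` supplies $\Phi^{-1}$. The function name and the numerical values (a state with base demand $N(20, 3)$, $\sigma_\epsilon = 4$ and two features) are hypothetical, and the normality and independence assumptions of the example are taken as given.

```python
from statistics import NormalDist


def normal_state_order_quantity(mu_i, sigma_i, sigma_eps, betas, features, h, b):
    """Equation (7): Q* = mu_i + psi(f) + z* sqrt(sigma_i^2 + sigma_eps^2).

    betas = [beta_0, beta_1, ..., beta_N] and features = [f_1t, ..., f_Nt],
    so psi is the linear function of Equation (6).
    """
    if len(betas) != len(features) + 1:
        raise ValueError("expected an intercept plus one coefficient per feature")
    psi = betas[0] + sum(beta * f for beta, f in zip(betas[1:], features))
    z_star = NormalDist().inv_cdf(b / (h + b))
    return mu_i + psi + z_star * (sigma_i ** 2 + sigma_eps ** 2) ** 0.5


# Hypothetical state with base demand N(20, 3), sigma_eps = 4 and two features.
q = normal_state_order_quantity(20.0, 3.0, 4.0, [2.0, 1.5, -0.5], [4.0, 2.0], h=1.0, b=3.0)
print(round(q, 3))
```

Here $\psi = 2 + 1.5 \cdot 4 - 0.5 \cdot 2 = 7$ and $\sqrt{3^2 + 4^2} = 5$, so the order quantity is $27 + 5 z^*$; with $h = b$ the safety term vanishes and the rule orders exactly the conditional mean.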

Next, we present the main model of this paper, which includes an unobservable environment process.

3.3. Our model: A data-driven newsvendor problem with unobservable environment process

In an inventory system, there may be several sources of uncertainty: observable sources, captured by the features, and non-observable sources. The features may correspond to observations such as the weather, seasonality, and local market conditions. In this model, the base demand plays the role of the non-observable source. The base demand distribution, which is part of the overall demand, depends on evolving states that have two properties: first, they follow a Markov model; second, they are not observable. The state of the Markov chain affects the system only partially, through the base demand. Therefore, the joint probability distribution of demand and the state varies with the base demand distribution in each state, while the feature-dependent part completes the demand distribution independently of the states. Let us assume that $S_t$ is not observable and can only be inferred from past demands $D_1, D_2, \ldots, D_{t-1}$. This is similar to the setup in Treharne and Sox (2002) and Arifoğlu and Özekici (2010). One can then estimate the conditional distribution

$$\mathcal{S}_t = S_t \mid D_1, D_2, \ldots, D_{t-1}. \tag{8}$$

In our model, we assume that the realizations of the features are independent of each other and across time periods. Because the features are observed before the order quantity is set, any dependence in the sequence of observations of a feature, or between different features, adds no information to the decision at the beginning of a period; hence, such dependencies do not affect the solution. By contrast, a dependence between the states of the (unobservable) Markov chain and the features may benefit the order quantity decision, since the state of the Markov chain can then be better inferred through the feature observations:

$$\mathcal{S}_t = S_t \mid D_1, D_2, \ldots, D_{t-1}, F_{t-1}. \qquad (9)$$

We consider the general form of the function $\psi$ in (6) and substitute it into Equation (5) to get

$$D_t = \sum_{i=1}^{S} \mathbb{I}(S_t = i)\, B_i + \psi(f_{1t}, f_{2t}, \ldots, f_{Nt}) + \epsilon_t, \qquad (10)$$

where $\mathbb{I}(x) = 1$ if $x$ is true and 0 otherwise. This implies an additive (linear) relation between the Markov chain-dependent and the feature-dependent parts of demand in each state, and a possibly complex, unknown relation between $D_t$ and $F_t$. More specifically, let $W$ denote the unknown set of parameters of the feature-dependent part of demand. We then assume the following form:

$$D_t = \sum_{i=1}^{S} \mathbb{I}(S_t = i)\, B_i + \psi(F_t, W) + \epsilon_t. \qquad (11)$$

Figure 1 The eﬀects of the features and the evolution of the state on the demand.

Figure 1 shows how demand is generated by features and states: $D_3$ is a function of the observed features $F_3$ and the unobservable state $s_3$.

The inventory optimization problem can be written as

$$\min_{Q}\ NVC(Q) = E_D\left[h(Q-D)^+ + b(D-Q)^+ \mid D_1, D_2, \ldots, D_{t-1}, F_{t-1}\right]. \qquad (12)$$

We propose to fit a joint estimation and optimization model to sample data and to test its performance out of sample, with the objective of minimizing the out-of-sample cost. Minimizing the cost function requires deriving an accurate belief about the state of the Markov chain at each point in time and solving the inventory optimization problem at that time. These two problems can be stated as follows.

(1) Hidden Markov model problem: First, we consider the problem of maximizing the likelihood of the observed sequence of demands, which is generated by a hidden Markov model.

In our problem setting, we assume that demand observations are produced by a continuous stochastic process, and the problem of interest is characterizing the properties of these observations. Given our data structure and several important applications, the source of demand observations is nonstationary and its properties vary over time. Here we use the mathematical structure of an HMM to characterize the statistical properties of the demand observations (Rabiner 1989, Picone 1990).

Given a set of demand observations $D = D_{1:T}$, an HMM with a finite set of $S$ distinct hidden states changes the state of the system according to a set of transition probabilities associated with each state. To give a full probabilistic description of this system, the elements of an HMM other than the observations and the number of states are collected in the parameter triple $\lambda = (\pi, A, E)$. Here, $\pi = \{\pi_i = P(S_1 = i)\}$ is the vector of prior probabilities that state $i$ is the initial state, $A = \{a_{ij}\}$ is the state transition probability matrix, and $E = \{p_1, p_2, \ldots, p_S\}$ is the set of probability density functions of the observations in the hidden states, where $p_j(D_t) = P(D_t \mid S_t = j)$. Given this form of HMM, three basic problems must be solved for the model¹. We use the mathematical programming formulation of the HMM to represent these fundamental problems implicitly (Qin et al. 2000). The optimal solution of the following formulation is the parameter set $\lambda$ most likely to have generated the observed demand sequence:

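$$\begin{aligned} \max_{\pi, A, E}\quad & \Lambda = \log P(D \mid \lambda) \\ \text{s.t.}\quad & A\mathbf{1} = \mathbf{1}, \\ & \pi^{\top}\mathbf{1} = 1, \end{aligned} \qquad \text{(P-HMM)}$$

where $\mathbf{1}$ denotes a vector of ones. The first constraint requires each row of the transition matrix to be a probability distribution, $\sum_j a_{ij} = 1$ for every state $i$, where $a_{ij} = P(S_{t+1} = j \mid S_t = i)$; the second requires the initial state probabilities to sum to one, $\sum_i \pi_i = 1$.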

¹ Rabiner (1989) explains that HMM design involves three problems: evaluating the probability (likelihood) of a sequence of observations under particular HMM parameters; identifying the best sequence of states; and adjusting the model parameters so that they explain the observations as well as possible.

(2) Newsvendor problem: The second problem is finding the network that sets the best order quantity given the state sequence $S = S_{1:T}$, estimated in the first problem, that most probably generated the demand sequence:

$$\begin{aligned} \min_{Q,\, W^{NV},\, B_i}\quad & NVC(Q \mid S) \\ \text{s.t.}\quad & Q_t = B_i + \psi^{NV}(F_t, W^{NV}), \quad \forall t = 1, \ldots, T,\ i \in S, \\ & W^{NV} \in \mathbb{R}. \end{aligned} \qquad \text{(P-NV)}$$

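However, these two objectives are not necessarily aligned. We use a linear scalarization method to formulate the two problems as a single-objective optimization problem:

$$\begin{aligned} \min_{Q,\, W^{NV},\, B_i,\, \lambda,\, q}\quad & \text{Loss} = -\eta\Lambda + (1-\eta)\, NVC \\ \text{s.t.}\quad & A\mathbf{1} = \mathbf{1}, \\ & \pi^{\top}\mathbf{1} = 1, \\ & Q_t = B_i + \psi^{NV}(F_t, W^{NV}), \quad \forall t = 1, \ldots, T,\ i \in S, \end{aligned} \qquad \text{(P-HMMNV)}$$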

where $q$ is the vector of state probabilities in effect when the order quantity $Q$ is decided. In Appendix B, Section B.1, we show how one can adjust the order quantity by changing the trade-off coefficient $\eta$ in (P-HMMNV) to counter the effect of the state probabilities².
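To make the scalarization concrete, the sketch below evaluates the (P-HMMNV) objective for a given log-likelihood $\Lambda$ and given order and demand vectors. It assumes that $NVC$ is the average per-period cost, as in the DNN benchmark's Equation (36); the function names are hypothetical, and the log-likelihood is passed in as a number rather than computed by the HMM network.

```python
import numpy as np


def newsvendor_cost(q, d, h, b):
    """Average newsvendor cost h*(Q-D)^+ + b*(D-Q)^+ over the periods."""
    q, d = np.asarray(q, dtype=float), np.asarray(d, dtype=float)
    overage = np.maximum(q - d, 0.0)   # leftover units, each charged h
    underage = np.maximum(d - q, 0.0)  # unmet demand, each charged b
    return float(np.mean(h * overage + b * underage))


def scalarized_loss(log_likelihood, q, d, h, b, eta):
    """Loss = -eta * Lambda + (1 - eta) * NVC, as in (P-HMMNV)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return -eta * log_likelihood + (1.0 - eta) * newsvendor_cost(q, d, h, b)


# The same inputs weighted by the candidate trade-off coefficients of Section 3.5.3.
losses = {eta: scalarized_loss(-50.0, [10, 10], [8, 12], h=1.0, b=2.0, eta=eta)
          for eta in (0.001, 0.01, 0.1, 0.9, 0.99, 0.999)}
```

Because $\Lambda$ and $NVC$ live on different scales, the value of $\eta$ that balances them is data dependent, which is one reason to select it by cross-validation.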

To solve the above problem, we use deep learning, since it can model both highly non-linear functions and the Markov chain from historical data, and it can be trained efficiently with gradient methods³. In the following, we briefly describe deep neural networks.

3.4. Deep neural networks

Deep neural networks are a subcategory of neural networks, which are machine learning models originally inspired by biological processes. Neural networks are highly capable of approximating non-linear functions; they are widely studied and used in machine learning and have various applications, especially in image and speech recognition (Gurney 2018).

A neural network consists of connected nodes (neurons) that form a directed graph. Each neuron receives signals from its upstream neurons and passes the aggregate signal to an activation function, whose output is in turn passed to the downstream neurons. Activation functions are typically monotone increasing functions that map the real numbers to a bounded interval, e.g., (0, 1) for the sigmoid function and (-1, 1) for the tanh function, two commonly used choices. A deep neural network is a neural network with multiple hidden layers between the inputs and the outputs. Figure 2 depicts a schematic neural network.

Figure 2 A deep neural network.

² We thank an anonymous referee for this suggestion.

³ It is important to note that, in most data-driven approaches to the newsvendor problem, a regularization term is added to the objective function (Oroojlooyjadid et al. 2020). Regularization is most critical in settings labeled as “fat data”, where the model has many input features (Ban and Rudin 2019). In the present study, the number of features is small, so we do not include a regularization term explicitly in the objective function; instead, we handle overfitting by performing both training and evaluation within the training step.

The data we model in this work have a time dimension that is important for understanding demand. The time dimension can be incorporated into a neural network by folding a network that takes inputs from different times through different neurons into a network that takes the same features from different time periods through the same neurons. The folded network is referred to as a recurrent network.
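The two ideas above can be sketched in a few lines of NumPy: a feed-forward pass with sigmoid hidden layers (as in the 10-node sigmoid layers of Section 4.2), and a folded recurrent pass that reuses one set of weights at every time step. This is a generic illustration rather than the HMMNV network of Figure 3; the tanh recurrence, the layer sizes, and the function names are our assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feed_forward(x, layers):
    """Sigmoid hidden layers followed by a linear output layer.

    layers is a list of (weight_matrix, bias_vector) pairs.
    """
    h = x
    for W, b in layers[:-1]:
        h = sigmoid(h @ W + b)  # aggregate the upstream signals, then activate
    W_out, b_out = layers[-1]
    return h @ W_out + b_out


def recurrent_pass(xs, W_in, W_rec, b):
    """Folded (recurrent) network: the same weights process every time step."""
    h = np.zeros(W_rec.shape[0])
    for x_t in xs:  # xs has shape (T, n_features)
        h = np.tanh(x_t @ W_in + h @ W_rec + b)
    return h  # summary of the whole sequence


rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 10)), np.zeros(10)),
          (rng.standard_normal((10, 10)), np.zeros(10)),
          (rng.standard_normal((10, 1)), np.zeros(1))]
y = feed_forward(rng.standard_normal((5, 3)), layers)  # 5 samples, 3 features
```

Unrolling `recurrent_pass` over $T$ steps gives a feed-forward network with $T$ copies of the same weights, which is the unfolding used later to train the HMM network.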

A neural network is fitted to a given set of data points by changing the weights of the arcs that connect the neurons; that is, the goal of training is to decrease the total error of the model in predicting the output variable by adjusting the network weights. This problem can in general be very complicated, but the backpropagation algorithm makes it computationally much less burdensome. Backpropagation is a gradient-based method that alternates between forward passes, which compute the network outputs, and backward passes, which compute the gradients of the error with respect to the weights; the weights of the arcs are then updated using these gradients.

3.5. Deep neural network for solving Markov modulated and data-driven newsvendor

In this study, we propose a new algorithm, referred to as HMMNV, that uses deep neural networks (DNN) to solve the proposed model. We unify the objective functions of problems (P-HMM) and (P-NV) by integrating the Markov chain and the newsvendor problem in a single network. To this end, we propose a two-head neural network in which these objective functions are combined into a single function, as in problem (P-HMMNV), and optimized simultaneously. The proposed network comprises two networks: the HMM network and the newsvendor network. The HMM network obtains the most likely sequence of states from the available data. The state information $q$, which partially determines the order quantity, is fed together with the available features into the newsvendor network, which completes the order quantity in each period. Figure 3 shows this integration and represents the folded version of the recurrent network unrolled over time.

In addition to these two networks, we estimate the base demand $B$ and the function $\psi$ in Equation (11) with a separate neural network in each state. We refer to this network as the emission network; its outputs are used as the likelihoods of an observation given the model parameters. The emission network is embedded into the HMM network.

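To estimate the hidden Markov model and train its network, we can unfold the recurrent network and treat it as a feed-forward network. The feeding and backpropagation steps of the network estimation are equivalent to the forward and backward steps of the well-known Baum-Welch algorithm, which is used to estimate HMMs. The likelihood of each demand observation is estimated by a normal probability density function:

$$p_i(D_t) = \frac{1}{\sqrt{2\pi(\sigma_i^2 + \sigma_\epsilon^2)}} \exp\left(-\frac{\left(D_t - \mu_i - \psi^E(F_t, W^E)\right)^2}{2(\sigma_i^2 + \sigma_\epsilon^2)}\right), \qquad (13)$$

that is, the density of a normal distribution with mean $\mu_i + \psi^E(F_t, W^E)$ and standard deviation $\sqrt{\sigma_i^2 + \sigma_\epsilon^2}$, evaluated at $D_t$.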

where $\mu_i$ is the mean of the base demand in state $i$. In the forward step, a forward term is defined at each time for each state of the model as

$$\alpha_t(s) = P(D_{1:t}, S_t = s) = \sum_{i=1}^{S} P(D_{1:t}, S_t = s, S_{t-1} = i), \qquad (14)$$

where $D_{1:t} = \{D_1, D_2, \ldots, D_t\}$. Applying the chain rule to $P(D_{1:t}, S_t = s, S_{t-1} = i)$, we then have

$$\alpha_t(s) = \sum_{i=1}^{S} P(D_t \mid S_t = s, S_{t-1} = i, D_{1:t-1})\, P(S_t = s \mid S_{t-1} = i, D_{1:t-1})\, P(S_{t-1} = i, D_{1:t-1}). \qquad (15)$$

Since the last observation $D_t$ is conditionally independent of everything else given $S_t$, and in the Markov model $S_t$ depends only on $S_{t-1}$, Equation (15) can be written as

$$\alpha_t(s) = P(D_t \mid S_t = s) \sum_{i=1}^{S} P(S_t = s \mid S_{t-1} = i)\, \alpha_{t-1}(i). \qquad (16)$$

Finally, the probability of being in state $j$ at the next time $t+1$, given the demands observed up to time $t$, is

$$P(S_{t+1} = j \mid D_{1:t}) = \frac{\sum_{i=1}^{S} \alpha_t(i)\, a_{ij}}{\sum_{i=1}^{S} \alpha_t(i)}, \qquad (17)$$

where $a_{ij}$ is the most recent estimate of the probability of transition from state $i$ to state $j$; the denominator equals one when the forward terms are normalized as in Section 3.5.1. Let $q_{t+1}(j)$ denote the probability obtained by Equation (17). The vector $q_{t+1} = [q_{t+1}(1), \ldots, q_{t+1}(S)]$ contains the probabilities of all states at time $t+1$, which sum to one. As the state information, $q_{t+1}$ is then multiplied by network weights to build the base demand, represented by the connection between the HMM and newsvendor networks in Figure 3. The sum of the state-dependent and feature-dependent values is the estimate of the order quantity.

Figure 3 HMMNV network. This figure includes three neural networks: (1) the Hidden Markov Model (HMM) network (dashed rectangle), (2) the newsvendor network, and (3) the emission network. The emission network is embedded into the HMM network. The network weights denoted by B and A are the base demand associated with each state and the transition probabilities between states, respectively. All other connections have weight 1. Filled units (circles) denote summation, crossed units multiply their inputs, and units divided by a line divide one input by the other. This network is repeated once for each observation, forming a network over time (a recurrent network).
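The chain from an observed demand to the next order can be sketched as follows: the emission likelihoods of Equation (13), a normalized forward update as in (16), the one-step state prediction $q_{t+1}$ of (17), and the order quantity as $q_{t+1} \cdot B$ plus a feature-dependent value. In this sketch, `B` stands for the base-demand weights and `psi_nv` is a plain number standing in for the newsvendor network's feature output; both, and all function names, are hypothetical placeholders rather than the trained HMMNV components.

```python
import numpy as np


def emission_likelihoods(d_t, mu, sigma, psi_e, sigma_eps):
    """p_i(D_t) from Equation (13): a normal density, one entry per state i."""
    sd = np.sqrt(np.asarray(sigma) ** 2 + sigma_eps ** 2)
    mean = np.asarray(mu) + psi_e
    return np.exp(-0.5 * ((d_t - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def filter_update(alpha_prev, A, lik):
    """Forward step of Equation (16), normalized so the entries sum to one."""
    alpha = (alpha_prev @ A) * lik  # sum_i alpha_{t-1}(i) a_is, times p_s(D_t)
    return alpha / alpha.sum()


def predict_states(alpha_t, A):
    """Equation (17) with a normalized alpha_t: q_{t+1}(j) = sum_i alpha_t(i) a_ij."""
    return alpha_t @ A


def order_quantity(q_next, B, psi_nv):
    """State-dependent part q_{t+1} . B plus the feature-dependent part."""
    return float(q_next @ B + psi_nv)


A = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha = np.array([0.5, 0.5])
lik = emission_likelihoods(d_t=3.2, mu=[3.0, 1.0], sigma=[1.0, 0.5], psi_e=0.4, sigma_eps=0.5)
alpha = filter_update(alpha, A, lik)
q_next = predict_states(alpha, A)
Q = order_quantity(q_next, B=np.array([3.0, 1.0]), psi_nv=0.4)
```

Here the observed demand 3.2 lies close to the first state's mean of 3.4, so the filtered belief shifts toward state 1 and the order leans on that state's base demand.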

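In the estimation process, scaling is required when implementing the forward-backward algorithm. Consider the forward term of Equation (14) in the re-estimation procedure, written as

$$\alpha_t(s) = P_\lambda(D_1, \ldots, D_t, S_t = s) = \sum_{s_1, \ldots, s_{t-1}} P_\lambda(D_1, \ldots, D_t, S_t = s \mid s_1 \ldots s_{t-1})\, P_\lambda(s_1 \ldots s_{t-1}) = \sum_{s_1, \ldots, s_{t-1}} \left[\pi_{s_1} \prod_{i=1}^{t} p_{s_i}(D_i) \prod_{i=1}^{t-1} a_{s_i s_{i+1}}\right], \qquad (18)$$

with $s_t = s$.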

3.5.1. Scaling the terms: The terms in Equation (18) are transition probabilities and emission likelihoods, which are typically less than one. Therefore, $\alpha_t(s)$ decays to zero exponentially as $t$ grows, quickly falling below machine precision and underflowing to zero in floating-point arithmetic. To solve this problem, Levinson et al. (1983) propose a scaling that keeps all $\alpha_t(s)$ bounded at each induction step. The scaling factor depends only on time $t$ and not on the current state $s$. The computations consist of two parts:

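• Initialization:

$$\ddot{\alpha}_1(i) = \alpha_1(i), \qquad c_1 = \frac{1}{\sum_{i=1}^{S} \ddot{\alpha}_1(i)}, \qquad \hat{\alpha}_1(i) = c_1\, \ddot{\alpha}_1(i). \qquad (19)$$

• Induction:

$$\ddot{\alpha}_t(i) = \sum_{j=1}^{S} \hat{\alpha}_{t-1}(j)\, a_{ji}\, p_i(D_t), \qquad c_t = \frac{1}{\sum_{i=1}^{S} \ddot{\alpha}_t(i)}, \qquad \hat{\alpha}_t(i) = c_t\, \ddot{\alpha}_t(i). \qquad (20)$$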

The coefficient $c_t$ depends only on $t$, and $\hat{\alpha}_t(i)$ is the modified forward variable, which sums to one: $\sum_{i=1}^{S} \hat{\alpha}_t(i) = 1$. It is easy to see that

$$\hat{\alpha}_t(i) = \left(\prod_{\tau=1}^{t} c_\tau\right) \alpha_t(i). \qquad (21)$$

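Using this scaled forward algorithm at the last step, $T$, we have

$$1 = \sum_{i=1}^{S} \hat{\alpha}_T(i) = \sum_{i=1}^{S} \left(\prod_{t=1}^{T} c_t\right) \alpha_T(i) = \left(\prod_{t=1}^{T} c_t\right) \sum_{i=1}^{S} \alpha_T(i) = \left(\prod_{t=1}^{T} c_t\right) P(D \mid \lambda). \qquad (22)$$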

Let $C_t = \prod_{\tau=1}^{t} c_\tau$; then $P(D \mid \lambda) = 1/C_T$. The log-likelihood function is then

$$\Lambda = \log P(D \mid \lambda) = -\sum_{t=1}^{T} \log c_t. \qquad (23)$$
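A minimal sketch of the scaled recursion (19)-(20) and the log-likelihood (23), assuming the emission values $p_i(D_t)$ are already available as a $T \times S$ array. For checking, a brute-force version sums Equation (18) over all $S^T$ state paths, which is only feasible for tiny $T$. Both function names are hypothetical.

```python
import itertools

import numpy as np


def scaled_forward_loglik(pi, A, lik):
    """Lambda = -sum_t log c_t via the scaled recursion (19)-(20).

    lik[t, i] holds the emission value p_i(D_t) for period t and state i.
    """
    T = lik.shape[0]
    c = np.empty(T)
    alpha_dd = pi * lik[0]                   # alpha_1(i) = pi_i p_i(D_1)
    c[0] = 1.0 / alpha_dd.sum()
    alpha_hat = c[0] * alpha_dd
    for t in range(1, T):
        alpha_dd = (alpha_hat @ A) * lik[t]  # sum_j alpha_hat_{t-1}(j) a_ji p_i(D_t)
        c[t] = 1.0 / alpha_dd.sum()
        alpha_hat = c[t] * alpha_dd          # rescaled so that it sums to one
    return -np.log(c).sum()                  # Equation (23)


def brute_force_loglik(pi, A, lik):
    """log P(D | lambda) by summing Equation (18) over every state path."""
    T, S = lik.shape
    total = 0.0
    for path in itertools.product(range(S), repeat=T):
        p = pi[path[0]] * lik[0, path[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * lik[t, path[t]]
        total += p
    return np.log(total)
```

On a long sequence the unscaled product would underflow (for example, 5000 periods with likelihoods around $10^{-3}$ give $P(D \mid \lambda) \approx 10^{-15000}$), while the scaled version returns a finite $\Lambda$ because each $c_t$ stays moderate.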

In the next step, we obtain the partial derivatives of the function Λ with respect to all network

parameters.

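3.5.2. Backpropagation step: The backpropagation step of the HMM network includes the partial derivatives of the log-likelihood function (23) with respect to the transition probabilities $a_{ij}$ and the emissions $p_i(D_t)$. Using $\partial \Lambda/\partial c_t = -1/c_t$ and $\partial c_t/\partial \ddot{\alpha}_t(j) = -c_t^2$, and considering the direct dependence of each $c_t$ on the parameters (that is, treating $\hat{\alpha}_{t-1}$ as given), these are calculated as

$$\frac{\partial \Lambda}{\partial a_{ij}} = \sum_{t=2}^{T} \frac{\partial \Lambda}{\partial c_t}\, \frac{\partial c_t}{\partial \ddot{\alpha}_t(j)}\, \frac{\partial \ddot{\alpha}_t(j)}{\partial a_{ij}}, \qquad (24)$$

$$\frac{\partial \Lambda}{\partial a_{ij}} = \sum_{t=2}^{T} c_t\, \hat{\alpha}_{t-1}(i)\, p_j(D_t), \qquad (25)$$

$$\frac{\partial \Lambda}{\partial p_i(D_t)} = \frac{\partial \Lambda}{\partial c_t} \sum_{j=1}^{S} \frac{\partial c_t}{\partial \ddot{\alpha}_t(j)}\, \frac{\partial \ddot{\alpha}_t(j)}{\partial p_i(D_t)}, \qquad (26)$$

$$\frac{\partial \Lambda}{\partial p_i(D_t)} = c_t \sum_{j=1}^{S} \hat{\alpha}_{t-1}(j)\, a_{ji}. \qquad (27)$$

For $t = 1$, the sum in (27) is replaced by $\pi_i$, since $\ddot{\alpha}_1(i) = \pi_i\, p_i(D_1)$.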

The derivatives obtained from Equation (27) are used to update two parameter sets: the base demand vector $B = [B_1, B_2, \ldots, B_S]$ and the weights of the emission network $W^E$:

$$B_{n+1} = B_n + \gamma_1 \frac{\partial \Lambda}{\partial p_i(D_t)}\, \frac{\partial p_i(D_t)}{\partial B_n}. \qquad (28)$$

The input associated with the base demand is the unit vector, as in Figure 3. Therefore, the second partial derivative is $\partial p_i(D_t)/\partial B_n = 1$.

For the update of the emission network weights, we have

$$W^E_{n+1} = W^E_n + \gamma_1 \frac{\partial \Lambda}{\partial p_i(D_t)}\, \frac{\partial p_i(D_t)}{\partial W^E_n}, \qquad (29)$$

where the first partial derivative is obtained in (27) and the second one, the partial derivative of the emissions $p_i(D_t)$ with respect to the network weights $W^E_n$, comes from the backpropagation step in the emission network, which is explained in Appendix A.

The newsvendor network is a feed-forward network trained by the backpropagation algorithm, and its weights are updated using the derivatives of the cost function with respect to $W^{NV}$:

$$W^{NV}_{n+1} = W^{NV}_n - \gamma_2 \frac{\partial NVC}{\partial W^{NV}_n}. \qquad (30)$$

Further details of the partial derivative term are provided in Appendix A.

The transition matrix $A$ is shared by the HMM and newsvendor networks. Consequently, both the partial derivative of the log-likelihood in Equation (25) and the partial derivative of the newsvendor cost function with respect to $A$ are used to update the transition matrix:

$$A_{n+1} = A_n - \gamma_3\left(-\eta\frac{\partial \Lambda}{\partial A_n} + (1-\eta)\frac{\partial NVC}{\partial A_n}\right), \qquad (31)$$

where the term in parentheses is the partial derivative of the single objective function of (P-HMMNV) with respect to $A$:

$$A_{n+1} = A_n - \gamma_3 \frac{\partial \text{Loss}}{\partial A_n}. \qquad (32)$$

3.5.3. Network specification: We use two hidden layers in the newsvendor network, with the number of hidden nodes in each layer determined by the rule proposed by Huang (2003). We use a single hidden layer in the emission network, with the number of hidden nodes specified by the formula suggested by Ke and Liu (2008). In the training procedure of the HMMNV model, the candidates for the trade-off coefficient $\eta$ are taken from the set $\{0.001, 0.01, 0.1, 0.9, 0.99, 0.999\}$. To find the best learning rate for each network, we run a grid search over the set $\{0.001, 0.01, 0.1, 2, 10\}$ for all learning rates. We choose the best candidates based on a cross-validation step on the training data set, and then train and test the model on the full data set using the parameters chosen by cross-validation.

Figure 4 shows how the sample observations are divided into different sets to implement the algorithms. First, we pick a smaller subset of the data and divide it into training and test samples for an inner cross-validation that finds the best parameters. We then train the algorithm with the selected parameters on the entire subset chosen initially and evaluate the model on the remaining test set.
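The hyperparameter selection described above can be sketched as an exhaustive grid search. We assume here that each of the three learning rates $\gamma_1, \gamma_2, \gamma_3$ is searched independently over the same grid, which gives $6 \times 5^3 = 750$ candidates; the text does not say whether the rates were searched jointly. `validation_cost` is a hypothetical user-supplied callable that trains HMMNV on the inner training fold and returns the cost on the inner test fold.

```python
import itertools

ETA_GRID = [0.001, 0.01, 0.1, 0.9, 0.99, 0.999]
LR_GRID = [0.001, 0.01, 0.1, 2, 10]


def grid_search(validation_cost):
    """Return the (eta, gamma1, gamma2, gamma3) with the lowest validation cost."""
    best_params, best_cost = None, float("inf")
    for eta, g1, g2, g3 in itertools.product(ETA_GRID, LR_GRID, LR_GRID, LR_GRID):
        cost = validation_cost(eta=eta, gamma1=g1, gamma2=g2, gamma3=g3)
        if cost < best_cost:  # keep the first candidate with the lowest cost
            best_params, best_cost = (eta, g1, g2, g3), cost
    return best_params, best_cost
```

The selected tuple would then be used to retrain on the whole inner subset before evaluating on the held-out test set of Figure 4.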

Figure 4 Diﬀerent sets of the sample observations used for cross-validation, training, and test.

4. Analyses

In this section, we test the performance of the HMMNV model using simulated data. In Section 4.1, we introduce several alternative methods as benchmarks. In Section 4.2, we describe the numerical setup used to compare the HMMNV model with the benchmarks, and in Section 4.3 we present and highlight the results.

4.1. Benchmarks and true model

In the following, we list the benchmark methods and explain how each of them solves the newsvendor problem considered here. We also report the newsvendor cost obtained by the true model as a near-perfect benchmark; the true model serves as the baseline against which all methods are compared.

EDD: The empirical demand distribution approach uses only the demand observations and finds the optimal solution based on the empirical distribution, which assigns equal weight to each observation. The optimal quantity is obtained by the known critical-fractile formula, i.e., as the $b/(b+h)$ quantile of the empirical distribution.
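A minimal sketch of the EDD rule: the order is the smallest observed demand at which the empirical cumulative distribution reaches $b/(b+h)$. The helper `empirical_cost` is included only to check that this point minimizes the in-sample newsvendor cost; both names are hypothetical.

```python
import math


def edd_order_quantity(demands, h, b):
    """Smallest observed demand Q whose empirical CDF reaches b / (b + h)."""
    if not demands:
        raise ValueError("need at least one demand observation")
    tau = b / (b + h)
    ordered = sorted(demands)
    k = max(1, math.ceil(tau * len(ordered)))  # 1-based rank of the tau-quantile
    return ordered[k - 1]


def empirical_cost(q, demands, h, b):
    """Average in-sample newsvendor cost of ordering q."""
    return sum(h * max(q - d, 0) + b * max(d - q, 0) for d in demands) / len(demands)
```

Because the empirical cost is piecewise linear and convex in $Q$, its slope changes sign exactly where the empirical CDF crosses $b/(b+h)$, which is why this order statistic is optimal.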

ObBW: The Objective-blind Baum-Welch algorithm considers the Markov chain that modulates demand (Avci et al. 2020, Treharne and Sox 2002). It finds the most probable sequence of states under a predetermined demand distribution. This is therefore a parametric approach that fits a distribution to demand and estimates the appropriate parameters for each state of the Markov chain. The Baum-Welch algorithm is a well-known forward-backward method for estimating hidden Markov models from observable data (i.e., demand); details are described by Rabiner (1989).
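For reference, the sketch below implements the standard Baum-Welch (EM) recursions for an HMM with normal emissions, using the same per-period scaling as Section 3.5.1. The initialization (a persistent transition matrix, quantile-spread means, pooled standard deviation) and the standard-deviation floor are our own assumptions. The sketch only estimates $(\pi, A, \mu, \sigma)$; how ObBW turns these estimates into an order quantity is not shown.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def baum_welch_gaussian(d, n_states=2, n_iter=50, min_sd=1e-3):
    """EM for an HMM with normal emissions; returns (pi, A, mu, sd, logliks)."""
    d = np.asarray(d, dtype=float)
    T, S = len(d), n_states
    pi = np.full(S, 1.0 / S)
    A = np.full((S, S), 0.1 / max(S - 1, 1))
    np.fill_diagonal(A, 0.9)                          # start from a persistent chain
    mu = np.quantile(d, np.linspace(0.25, 0.75, S))   # spread initial means over the data
    sd = np.full(S, d.std())
    logliks = []
    for _ in range(n_iter):
        lik = normal_pdf(d[:, None], mu[None, :], sd[None, :])  # shape (T, S)
        # E-step, forward pass with per-period normalizers s_t = 1 / c_t.
        alpha, s = np.empty((T, S)), np.empty(T)
        alpha[0] = pi * lik[0]
        s[0] = alpha[0].sum()
        alpha[0] /= s[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * lik[t]
            s[t] = alpha[t].sum()
            alpha[t] /= s[t]
        # Backward pass scaled with the same normalizers.
        beta = np.ones((T, S))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (lik[t + 1] * beta[t + 1]) / s[t + 1]
        logliks.append(np.log(s).sum())               # log P(D | current parameters)
        gamma = alpha * beta                          # P(S_t = i | D_1:T)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (lik[1:] * beta[1:])[:, None, :] / s[1:, None, None])
        # M-step: re-estimate the parameters from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0)
        A /= A.sum(axis=1, keepdims=True)
        weights = gamma.sum(axis=0)
        mu = gamma.T @ d / weights
        sd = np.sqrt((gamma * (d[:, None] - mu) ** 2).sum(axis=0) / weights)
        sd = np.maximum(sd, min_sd)                   # guard against collapsing states
    return pi, A, mu, sd, logliks
```

EM guarantees that the recorded log-likelihoods do not decrease from one iteration to the next (while the floor is inactive), which is a convenient sanity check in practice.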

PfLR: The parameter fitting linear regression is the first and simplest method that takes the feature data into account. It consists of two steps: estimating the parameters of the demand distribution and then using the estimates in the optimization problem (Ban and Rudin 2019). However, this approach ignores the dependence between demand observations through the hidden Markov states. That said, it is a dynamic approach that posits a linear relationship between features and demand through the following linear regression:

$$D_t = \beta_0 + \beta_1 f_{1t} + \beta_2 f_{2t} + \cdots + \beta_N f_{Nt} + \epsilon_t, \qquad (33)$$

where $t = 1, \ldots, T$ and $N$ is the number of features. The time-varying mean and the standard deviation of the estimated normal demand distribution are calculated from the coefficients and residuals of the regression in (33); the standard deviation of the demand distribution equals the standard deviation of the residuals. The optimal order quantity is then

$$Q_t^* = \beta_0 + \beta_1 f_{1t} + \beta_2 f_{2t} + \cdots + \beta_N f_{Nt} + z^*\sigma_\epsilon, \qquad (34)$$

where $z^* = \Phi^{-1}(b/(h+b))$.
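A minimal sketch of PfLR: ordinary least squares for (33), the residual standard deviation as $\sigma_\epsilon$, and the order rule (34). Using `ddof` equal to the number of fitted coefficients for the residual standard deviation is our assumption; the text only says the residual standard deviation is used. The function name is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def fit_pflr(F, d, h, b):
    """Fit Equation (33) by least squares; return (beta, sigma_eps, order rule (34))."""
    d = np.asarray(d, dtype=float)
    F = np.asarray(F, dtype=float).reshape(len(d), -1)
    X = np.column_stack([np.ones(len(d)), F])          # intercept + N features
    beta, *_ = np.linalg.lstsq(X, d, rcond=None)
    residuals = d - X @ beta
    sigma_eps = np.std(residuals, ddof=X.shape[1])      # unbiased residual sd (assumption)
    z_star = NormalDist().inv_cdf(b / (h + b))

    def order_quantity(f_new):
        x = np.concatenate([[1.0], np.atleast_1d(np.asarray(f_new, dtype=float))])
        return float(x @ beta + z_star * sigma_eps)

    return beta, sigma_eps, order_quantity
```

Note that the safety stock $z^*\sigma_\epsilon$ is the same in every period, since this model has a single residual variance and no state dependence.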

LML: Ban and Rudin (2019) suggest a linear decision rule for the order quantity,

$$\mathcal{Q} = \left\{ q: \mathcal{X} \to \mathbb{R} \ :\ q_t(\beta) = \beta_0 + \sum_{n=1}^{N} \beta_n f_{nt} \right\}. \qquad (35)$$

They substitute this linear rule into the newsvendor problem and solve a nonlinear program for the optimal order quantity.
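As an illustration of the LML idea, the sketch below minimizes the average in-sample newsvendor cost over the linear rule (35) with plain subgradient descent and keeps the best iterate. Subgradient descent is a simple stand-in chosen for self-containedness, not the solver used by Ban and Rudin (2019); the step-size schedule, iteration count, and function name are assumptions.

```python
import numpy as np


def fit_lml(F, d, h, b, n_iter=3000, lr=1.0):
    """Linear decision rule q_t = beta_0 + sum_n beta_n f_nt fitted on the newsvendor cost."""
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones(len(d)), np.asarray(F, dtype=float).reshape(len(d), -1)])
    beta = np.zeros(X.shape[1])

    def cost(beta_):
        q = X @ beta_
        return float(np.mean(h * np.maximum(q - d, 0) + b * np.maximum(d - q, 0)))

    best_beta, best_cost = beta.copy(), cost(beta)
    for k in range(n_iter):
        q = X @ beta
        g = np.where(q > d, h, -b)  # subgradient of each period's cost w.r.t. q_t
        beta = beta - lr / np.sqrt(k + 1) * (X.T @ g) / len(d)
        c = cost(beta)
        if c < best_cost:           # subgradient steps are not monotone: keep the best
            best_beta, best_cost = beta.copy(), c
    return best_beta, best_cost
```

When demand depends strongly on the features, this rule achieves a much lower cost than any constant order, which is the gap between LML and EDD when features carry information.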

DNN: Oroojlooyjadid et al. (2020) apply a deep neural network to the newsvendor problem, a nonlinear extension of the LML model. In this approach, a neural network maps the feature data to the optimal decision. The network is trained to minimize the average cost over all periods,

$$\min_{W}\ \frac{1}{T}\sum_{t=1}^{T} NVC\big(\theta(F_t, W), D_t\big), \qquad (36)$$

where $W$ is the matrix of network weights, $F_t$ is the vector of input features at time $t$, and $\theta$ is the mapping function of the network. To give the DNN method a network structure similar to that of HMMNV, we use the same hyperparameter selection rule (number of hidden layers and number of hidden nodes in each layer) as for the HMMNV method.
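A minimal sketch of the DNN approach: a network trained by backpropagation directly on the average newsvendor cost (36). For brevity it uses a single sigmoid hidden layer and full-batch gradient descent; the layer count, initialization scale, learning rate, and function name are assumptions rather than the configuration used in the experiments.

```python
import numpy as np


def train_dnn_newsvendor(F, d, h, b, n_hidden=10, lr=0.05, epochs=2000, seed=0):
    """Fit q = theta(F, W) by gradient descent on the average newsvendor cost (36)."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(F, dtype=float).reshape(len(d), -1)
    n, n_in = X.shape
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        # Forward pass: sigmoid hidden layer, linear output = order quantity.
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
        q = H @ w2 + b2
        history.append(float(np.mean(h * np.maximum(q - d, 0) + b * np.maximum(d - q, 0))))
        # Backward pass: subgradient of the mean cost w.r.t. each output q_t.
        g = np.where(q > d, h, -b) / n
        grad_w2, grad_b2 = H.T @ g, g.sum()
        dZ = np.outer(g, w2) * H * (1.0 - H)  # chain rule through the sigmoid
        grad_W1, grad_b1 = X.T @ dZ, dZ.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2

    def predict(F_new):
        Z = np.asarray(F_new, dtype=float).reshape(-1, n_in) @ W1 + b1
        return (1.0 / (1.0 + np.exp(-Z))) @ w2 + b2

    return predict, history
```

Unlike PfLR, no demand distribution is estimated: the cost asymmetry $b/h$ enters only through the subgradient, which pushes the output toward the conditional $b/(b+h)$ quantile.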

True model: The true model is a benchmark in which the state prediction and the feature effects are perfect. Although this scenario is impossible in practice, it serves as a benchmark for measuring the gap between the methods discussed here and an ideal method. In the true model, we assume that one knows the exact parameters of the model. Two sets of parameters exist: first, the states of the system, which are hidden and unobservable to all the other methods; second, the distribution of demand at each time, which is the sum of two normal distributions, the state-related base demand and the feature part of demand. More specifically, the current state $i$, the associated mean and variance of the base demand, $\mu_i$ and $\sigma_i^2$, the exact function $\psi(F_t, W)$, and the variance $\sigma_\epsilon^2$ are known. Therefore, the only sources of cost in the true model are the two variance terms, $\sigma_\epsilon^2$ and $\sigma_i^2$, of the feature part and the base demand distributions. The true optimal order quantity is then obtained by the known formula as $Q_t^* = \mu_i + \psi(F_t, W) + z^*\sqrt{\sigma_i^2 + \sigma_\epsilon^2}$, where $z^* = \Phi^{-1}(b/(b+h))$ and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable.

In the next section, the results for each method are reported as the percentage deviation of the method's cost from the cost of the true model:

$$\text{Percentage deviation of a method} = \frac{\text{Cost of a method} - \text{Cost of the true model}}{\text{Cost of the true model}} \times 100.$$
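The metric is a one-line computation; the sketch below adds a guard for a non-positive baseline. The function name and the example costs are hypothetical.

```python
def percentage_deviation(method_cost, true_model_cost):
    """(Cost of a method - Cost of the true model) / Cost of the true model x 100."""
    if true_model_cost <= 0:
        raise ValueError("the true-model cost must be positive")
    return (method_cost - true_model_cost) / true_model_cost * 100.0


# Example: a method costing 135.2 against a true-model cost of 104.0 deviates by 30%.
dev = percentage_deviation(135.2, 104.0)
```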

4.2. Numerical Experiments

In this section, we first present the experimental setup used to evaluate the performance of our method against the benchmarks. In our experimental setup, demand in each period is a normally distributed random variable whose mean is determined by the state of a Markov chain and by a number of randomly generated features for that period. Table 3 shows the domain of each parameter of the system.

Table 3 The set of parameters used in the experimental setup.

Parameter  Description                                          Domain
e          transition probability                               0.01, 0.1, 0.2
µ1         mean of the base demand in state 1                   1, 2, 3
µ2         mean of the base demand in state 2                   1
σ1         standard deviation of the base demand in state 1     0.5, 1
σ2         standard deviation of the base demand in state 2     0.5
σϵ         standard deviation of the demand due to features     0.5
N          number of features                                   1, 3, 5
W          network weights                                      1, 2, 3
L          number of hidden layers                              1, 2, 3
b/h        newsvendor cost parameters ratio                     2, 5, 10
T          number of observations                               200, 400, 800

Following Figure 1, the base demand evolves according to a two-state Markov chain, and the additional part of the demand is generated by a neural network applied to the feature data. The final demand is the sum of the state-dependent and feature-dependent demand values. Parameter $e$ is the transition probability between the two states of the Markov chain, so the transition matrix is

$$A = \begin{pmatrix} 1-e & e \\ e & 1-e \end{pmatrix},$$

where rows and columns correspond to the two states $s_1$ and $s_2$ of the Markov chain. The base demand in each state is normally distributed with parameters $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$, respectively. We vary the parameters of the first state within the ranges specified in Table 3 and fix those of the second state at $(\mu_2 = 1, \sigma_2 = 0.5)$. $N$ is the number of features observed before demand is realized; all features are drawn from the standard normal distribution. We consider three different sets of network weights, denoted by $W$, where each weight is a standard normal random variable. The structure of the network, in particular the number of hidden layers, makes the relation between the features and demand more complex as the number of hidden layers increases; we consider up to three layers, denoted by $L$. Each hidden layer has 10 hidden nodes with a sigmoid activation function. We also consider three different values for the ratio between the cost rates ($b/h$). Finally, $T$ is the number of observations. We evaluate the methods on all 5832 combinations that these parameter sets provide.
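A minimal sketch of how one simulated instance could be generated under this setup: a two-state chain with switching probability $e$, normal base demands, standard normal features, a random network with $L$ sigmoid hidden layers of 10 nodes and standard normal weights, and normal noise with $\sigma_\epsilon = 0.5$. Several details are our assumptions because the text does not specify them: the levels $W \in \{1, 2, 3\}$ are treated as three seeds for the weight draw, the network has no bias terms and a linear output, the initial state is drawn uniformly, and demand is not truncated at zero.

```python
import numpy as np


def simulate_instance(e, mu1, sigma1, n_features, weight_seed, n_layers, T,
                      mu2=1.0, sigma2=0.5, sigma_eps=0.5, n_hidden=10, seed=0):
    """Simulate D_t = B_{S_t} + psi(F_t, W) + eps_t as in Equation (11)."""
    rng = np.random.default_rng(seed)
    # Two-state Markov chain with switching probability e.
    A = np.array([[1 - e, e], [e, 1 - e]])
    states = np.empty(T, dtype=int)
    states[0] = rng.integers(2)  # uniform initial state (assumption)
    for t in range(1, T):
        states[t] = rng.choice(2, p=A[states[t - 1]])
    base = np.where(states == 0,
                    rng.normal(mu1, sigma1, size=T),
                    rng.normal(mu2, sigma2, size=T))
    # Features: i.i.d. standard normal.
    F = rng.standard_normal((T, n_features))
    # Random network psi: standard normal weights, sigmoid hidden layers, linear output.
    w_rng = np.random.default_rng(weight_seed)
    sizes = [n_features] + [n_hidden] * n_layers + [1]
    h = F
    for k, (m, n) in enumerate(zip(sizes[:-1], sizes[1:])):
        h = h @ w_rng.standard_normal((m, n))
        if k < n_layers:
            h = 1.0 / (1.0 + np.exp(-h))
    demand = base + h[:, 0] + rng.normal(0.0, sigma_eps, size=T)
    return demand, states, F
```

With $e = 0.01$ and $T = 800$, the chain switches only about eight times in expectation, which is the highly persistent regime in which the hidden state matters most.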

4.3. Results

In the following, we summarize the results of our experiments. In pairwise comparisons, HMMNV outperforms DNN in 64% of the cases and EDD in 60% of the cases. HMMNV outperforms all the other methods in 37% of the cases, while EDD outperforms all the other methods in 15% of the cases.

Figure 5 presents the results of these experiments. Each point in the figure corresponds to one parameter set; the x-axis lists the algorithms, and the y-axis shows the percentage deviation from the cost of the true model for that parameter set. A box plot is overlaid on the scattered points of each algorithm. HMMNV obtains a lower mean and a smaller spread around the mean than the other methods. ObBW and EDD do not seem to perform well, while the other three methods, PfLR, LML, and DNN, perform similarly. This grouping in performance implies that both sources that affect demand, the observable features and the non-observable Markov states, have to be taken into account.

Figure 5 The results of all algorithms for different parameter sets (x-axis: EDD, ObBW, PfLR, LML, DNN, HMMNV; y-axis ticks from 100 to 200).

To test whether each method outperforms the others, we also run pairwise t-tests on the cost values. This test indicates whether there is a statistically significant difference between the results of any two algorithms. Table 4 shows the p-values of the tests: if the algorithm in row i has a lower cost than the algorithm in column j, cell ij contains the p-value obtained from the test; otherwise, it is left empty (shown as -). A p-value lower than 1% indicates that the mean cost of one algorithm is lower than that of the other at the 1% significance level.

Table 4 p-values of the t-test between the cost of algorithms.

         EDD         ObBW         PfLR        LML         DNN         HMMNV
EDD      -           3.7894e-100  -           -           -           -
ObBW     -           -            -           -           -           -
PfLR     7.7860e-13  9.8089e-168  -           0.8860      1.0743e-06  -
LML      3.5083e-12  1.1297e-163  -           -           2.6823e-06  -
DNN      0.0312      2.7515e-115  -           -           -           -
HMMNV    1.6786e-80  4.6684e-299  2.4323e-35  1.5093e-35  9.6652e-63  -

HMMNV has a lower cost than all the other methods, and the p-values close to zero confirm this. EDD, the simplest method, outperforms only ObBW, and ObBW has a higher cost than all the other methods. PfLR and LML are very close, and both outperform DNN. DNN is better than EDD and ObBW at the 5% and 1% levels, respectively.

To investigate the effect of each parameter on the results and show how the algorithms differ over the range of the parameters, we plot the deviations of the methods for each parameter separately. Figure 6 shows the results for the parameter values in Table 3. The parameter $e$, the transition probability of the hidden Markov chain, represents the information carried by the non-observable states: a lower value of $e$ indicates a more stable Markov chain and, consequently, a greater effect of the states on demand. As a result, the HMMNV method has a lower cost for the smallest value of $e$. As $e$ increases, the methods other than ObBW converge, which means that the systematic effect of the Markov states on the randomness of demand decreases. The mean of the base demand in the two states affects the results more than the other parameters. As $\mu_1$ increases, the long-term dependencies influence demand more than the observable features; as a result, HMMNV performs better than the benchmarks in cases with a higher imbalance between the base demands. The other base-demand parameter, the standard deviation, is not as influential as the mean, and all methods except HMMNV are insensitive to it. As with the mean of the base demand, the standard deviation imposes more cost on our method as its value in one state differs more from that in the other state. The fourth plot shows the effect of the number of observable features $N$. As expected, methods that do not use features, such as EDD and ObBW, are not sensitive to $N$. The other methods converge as $N$ increases, which results from the noise that features add to demand; when features capture the underlying pattern, incorporating them into the model yields better results. No specific value of the parameter $W$, which determines the relation between features and demand, is meaningful relative to its other values, but the lower deviations across all values of $W$ show the ability of the HMMNV algorithm to approach the optimal policy under different random relations. The complexity of these relations stems from the structure of the network and its number of hidden layers, and HMMNV yields a lower cost regardless of this nonlinear complexity. The outperformance of HMMNV is also clear across the different cost imbalances imposed by the $b/h$ ratio, and the deviations of all methods increase with this ratio. Regarding the number of observations, our method performs better as $T$ increases, because the Markov sequence is captured better with more observations. Overall, these analyses indicate the robustness of the proposed algorithm, which achieves a lower average cost than the other methods.

Figure 6 Deviations of algorithms’ cost from the true model against the range of each parameter. The eight panels, with x-axis ranges 0 to 0.2, 1 to 3, 0.5 to 1, 1 to 5, 1 to 3, 1 to 3, 2 to 10, and 200 to 800, correspond to $e$, $\mu_1$, $\sigma_1$, $N$, $W$, $L$, $b/h$, and $T$; each panel compares EDD, ObBW, PfLR, LML, DNN, and HMMNV.

In a separate experiment, we examine the performance of the proposed algorithm in extreme situations where historical demand observations contain infrequent outliers. We generate new sample sets using different model parameters and examine the robustness of our model in these extreme scenarios. The results of these experiments are given in Appendix B, Section B.2⁴.

5. Real Data Experiment: Crude Oil Demand as a Proxy

We evaluate the performance of our algorithm and the benchmark methods using real data on U.S. weekly crude oil demand. These data can be taken as a proxy for a product or service whose demand is closely correlated with U.S. crude oil demand. Food crops and agricultural commodities, including corn, wheat, rice, and sugar, are newsvendor products that have been shown to be correlated with crude oil (Du et al. 2011, Mokni and Youssef 2020).

Figure 7 Time series of the weekly demand for crude oil from January 1986 to August 2020. The average demand is 14731 thousand barrels per week.

⁴ We thank an anonymous referee for this suggestion.

The crude oil demand data consist of weekly demand and cover the period from January 1986 to August 2020.⁵ The demand series is shown in Figure 7; the set has 1800 weekly observations. The feature set for the model includes 16 dummy variables representing the quarter of the year and the month of the year. These features capture the observable, seasonality-driven variation in demand. The variation caused by environmental randomness, however, is not observable; we assume that it exists and follows a two-state Markov chain. Our algorithm uses the feature data to explain the additional demand and models the residuals, which the features do not capture, as base demand. The base demand is modeled by the Markov chain and, together with the additional demand, drives the evolution of demand over time.
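One way to build the 16 seasonal dummy variables described above is sketched below. It assumes one indicator for each of the 4 quarters and each of the 12 months, keyed on the reporting date of each week; the function name `seasonal_dummies` and the feature ordering are hypothetical, not taken from the paper's implementation.

```python
from datetime import date


def seasonal_dummies(week_date: date) -> list[int]:
    """Return 16 binary features for one weekly observation.

    Assumed (hypothetical) layout: positions 0-3 flag the quarter and
    positions 4-15 flag the month; no category is dropped as a baseline.
    """
    quarter = (week_date.month - 1) // 3  # 0 for Jan-Mar, ..., 3 for Oct-Dec
    month = week_date.month - 1           # 0 for January, ..., 11 for December
    quarter_flags = [1 if k == quarter else 0 for k in range(4)]
    month_flags = [1 if k == month else 0 for k in range(12)]
    return quarter_flags + month_flags


features = seasonal_dummies(date(2020, 8, 7))
print(len(features), sum(features))  # 16 2
```

Because the quarter is implied by the month, the two groups of indicators are collinear. This is harmless for a neural network but would call for dropping a baseline category in an ordinary linear regression with an intercept.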

We pick a time window of 500 weeks, train the algorithms on the first 400 weeks, and test the trained models on the remaining 100 weeks. We roll this window forward in 100-week steps and obtain 14 test periods in total. Table 5 summarizes the results: it reports the average cost of each method over the 14 test periods (1400 weeks) for each newsvendor cost-parameter ratio (b/h).
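The rolling-window evaluation reduces to simple index arithmetic. The sketch below assumes that the first window starts at the first observation and that any final partial window is dropped; the helper name `rolling_windows` is hypothetical.

```python
def rolling_windows(n_obs: int, window: int, n_train: int, step: int):
    """Yield (train, test) index ranges for a rolling-window evaluation.

    Each window of `window` consecutive observations is split into its first
    `n_train` observations (training) and the rest (testing); the window then
    moves forward by `step`. Windows that would run past the data are dropped.
    """
    start = 0
    while start + window <= n_obs:
        yield range(start, start + n_train), range(start + n_train, start + window)
        start += step


weekly = list(rolling_windows(n_obs=1800, window=500, n_train=400, step=100))
print(len(weekly), sum(len(test) for _, test in weekly))  # 14 1400
```

With the monthly setting of Appendix B.3 (408 observations, a 120-month window, 96 training months, and a 24-month step), the same helper yields 13 windows, matching the 13 test periods reported there.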

Our proposed method outperforms the other approaches for all values of the cost-parameter ratio. The second-best approach is ObBW, which uses only the demand and fits a Markov model. Its strong performance suggests that the states of the environment have long-range dependence and influence the demand in a state-dependent manner. The benchmark methods that use features give similar results with minor differences. Overall, ObBW remains the strongest benchmark, while EDD performs comparably to the feature-based benchmarks.

⁵ Data are collected from the U.S. Energy Information Administration at https://www.eia.gov.

Table 5 Average cost of the crude oil ordering policy by each algorithm (values are divided by 10³).

b/h   EDD    ObBW   PfLR   LML    DNN    HMMNV
2     1.161  0.840  1.150  1.123  1.117  0.477
5     1.660  1.262  1.751  1.701  1.703  0.650
10    2.158  1.625  2.282  2.274  2.244  1.557
20    2.871  2.134  2.847  3.104  3.178  1.832
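The relative advantage of HMMNV can be read off Table 5 directly. The sketch below re-uses only the table values: for each b/h it finds the cheapest benchmark in the row and computes the percentage by which HMMNV's average cost is lower. The average over the four ratios appears consistent with the improvement of more than 27% stated in the abstract. The helper name is illustrative.

```python
# Average costs from Table 5 (values divided by 10^3).
TABLE_5 = {
    2:  {"EDD": 1.161, "ObBW": 0.840, "PfLR": 1.150, "LML": 1.123, "DNN": 1.117, "HMMNV": 0.477},
    5:  {"EDD": 1.660, "ObBW": 1.262, "PfLR": 1.751, "LML": 1.701, "DNN": 1.703, "HMMNV": 0.650},
    10: {"EDD": 2.158, "ObBW": 1.625, "PfLR": 2.282, "LML": 2.274, "DNN": 2.244, "HMMNV": 1.557},
    20: {"EDD": 2.871, "ObBW": 2.134, "PfLR": 2.847, "LML": 3.104, "DNN": 3.178, "HMMNV": 1.832},
}


def reduction_vs_best_benchmark(row: dict[str, float]) -> tuple[str, float]:
    """Return the cheapest benchmark in a row and HMMNV's relative cost reduction against it."""
    benchmarks = {name: cost for name, cost in row.items() if name != "HMMNV"}
    best = min(benchmarks, key=benchmarks.get)
    return best, 1.0 - row["HMMNV"] / benchmarks[best]


reductions = {}
for ratio, row in TABLE_5.items():
    best, reduction = reduction_vs_best_benchmark(row)
    reductions[ratio] = reduction
    print(f"b/h = {ratio:>2}: best benchmark {best}, HMMNV lower by {reduction:.1%}")

average_reduction = sum(reductions.values()) / len(reductions)
print(f"average over b/h: {average_reduction:.1%}")  # average over b/h: 27.5%
```

The reduction varies widely, from about 4% at b/h = 10 to about 48% at b/h = 5, so the single average hides considerable spread across cost ratios.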

[Figure 8 panels: EDD, ObBW, PfLR, LML, DNN, and HMMNV, each for b/h = 5; cost on the horizontal axis (0 to 4), frequency on the vertical axis (0.1 to 0.4).]

Figure 8 The frequency of the cost values obtained by each algorithm for the real data over all test periods.

We conclude that when the randomness of both the features and the environment is captured in a single model, the cost decreases substantially.⁶

Figure 8 shows the frequency of costs over the 14 test periods for all methods. The suggested method offers an ordering policy that results in lower costs than the other methods. To show the states obtained by the HMMNV method and the resulting ordering policy, we plot the demand and the order quantity for the 200 most recent out-of-sample weeks in Figure 9. The order quantity tracks the demand closely, mirroring its variations. The bottom panel of this figure shows the estimated sequence of states during this period. Comparing the order and demand series with the states, we observe that the HMMNV algorithm assigns demand with higher variation to state 1, while the data labeled as state 2 have lower variation. Consequently, the order quantity produced by HMMNV has variations similar to those in the demand.

⁶ In addition to these five benchmarks, we also consider the well-known Holt-Winters method from the time-series forecasting literature, which explicitly accounts for seasonality and trend. To apply it to the newsvendor problem, we use it to forecast the demand and then use the forecast as the order quantity. This approach is referred to as Estimate-As-Solution (EAS) in the literature (Oroojlooyjadid et al. 2020). We use an additive trend component and multiplicative seasonality components, since the Holt-Winters model performs better with this combination than with other component types. For b/h = 2, 5, 10, and 20, the mean newsvendor costs of the Holt-Winters method are 0.907, 1.826, 3.370, and 6.419 (in units of 10³, as in Table 5), respectively. Comparing these results with those in Table 5 shows that HMMNV yields a lower mean cost than the Holt-Winters approach for every cost ratio. We thank an anonymous referee for this valuable suggestion.

Figure 9 Optimal order quantity obtained by HMMNV for b/h = 5 during 200 weeks of test periods from October 2016 to August 2020 (top panel). The bottom panel shows the corresponding state sequence of the Markov model estimated by this algorithm.
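The text does not state how the state sequence in the bottom panel of Figure 9 is decoded. A standard choice for a hidden Markov model with Gaussian emissions is the Viterbi algorithm, sketched below in log space. It assumes that the initial distribution, the transition matrix, and constant per-state emission means and standard deviations have already been estimated (ignoring the feature-driven part of demand); it illustrates that textbook technique and is not necessarily the procedure used by HMMNV. States are indexed from 0, so index 0 corresponds to the paper's state 1.

```python
import numpy as np


def gaussian_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x (broadcasts over arrays)."""
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def viterbi(x, pi, A, mu, sigma):
    """Most likely hidden-state path of a Gaussian HMM, computed in log space.

    x: (T,) observations; pi: (S,) initial state probabilities;
    A: (S, S) transition matrix with A[i, j] = P(next = j | current = i);
    mu, sigma: (S,) per-state emission means and standard deviations.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    log_A = np.log(np.asarray(A, dtype=float))
    T, S = len(x), len(mu)
    log_emit = gaussian_logpdf(x[:, None], mu[None, :], sigma[None, :])  # (T, S)

    score = np.log(np.asarray(pi, dtype=float)) + log_emit[0]  # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)                          # argmax predecessors
    for t in range(1, T):
        cand = score[:, None] + log_A          # cand[i, j]: best path ending in i, then i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]

    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):              # follow the back-pointers
        path[t - 1] = back[t, path[t]]
    return path


x = [0.1, -0.2, 0.0, 5.1, 4.8, 5.2, 0.3]
print(viterbi(x, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.0, 5.0], [1.0, 1.0]))  # [0 0 0 1 1 1 0]
```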

We further demonstrate the robustness of the suggested method in the real-data experiment by running the algorithms on monthly crude oil demand observations. In addition, we include features that are well studied in the crude oil forecasting literature. These features, such as unemployment, the S&P 500 stock index, and personal disposable income, capture macroeconomic effects on the dynamics of the oil market. Details of the features and the results are reported in Appendix B, Section B.3.⁷

6. Conclusions

We presented an integrated learning and optimization method based on deep learning for the data-driven newsvendor problem with observable and unobservable features. Feature data about the demand for a product and the underlying dynamics of the market, which often have dependencies across multiple decision periods, are both common factors in ordering decisions, and this work provides an approach for learning effectively from both categories of information.

⁷ We thank an anonymous referee for this suggestion.

Through extensive numerical experiments based on synthetic and real data, we assess the performance of a variety of methods for the data-driven control of a newsvendor system and show that our method outperforms the others in a variety of settings.

Although we consider only the ordering problem for a single-item newsvendor in this study,

our approach can be extended to multiple items whose demands could be correlated. Another

interesting extension is that of a joint price and inventory optimization problem where the demand

depends on the selling price. In this case, one can investigate how to switch between pricing strategies across hidden states and use environmental and local features as price drivers.

Acknowledgements: Research leading to these results has received funding from the EU ECSEL

Joint Undertaking under grant agreement no. 737459 (project Productive4.0) and from TUBITAK

(217M145).

References

Arifoğlu, K. and S. Özekici (2010). Optimal policies for inventory systems with finite capacity and partially observed Markov-modulated demand and supply processes. European Journal of Operational Research 204 (3), 421–438.

Arifoğlu, K. and S. Özekici (2011). Inventory management with random supply and imperfect information: A hidden Markov model. International Journal of Production Economics 134 (1), 123–137.

Avci, H., K. Gokbayrak, and E. Nadar (2020). Structural results for average-cost inventory models with

Markov-modulated demand and partial information. Production and Operations Management 29 (1),

156–173.

Ban, G.-Y. and C. Rudin (2019). The big data newsvendor: Practical insights from machine learning.

Operations Research 67 (1), 90–108.

Bensoussan, A., M. Çakanyıldırım, and S. P. Sethi (2005). On the optimal control of partially observed

inventory systems. Comptes Rendus Mathematique 341 (7), 419–426.

Bensoussan, A., M. Çakanyıldırım, and S. P. Sethi (2007). A multiperiod newsvendor problem with partially

observed demand. Mathematics of Operations Research 32 (2), 322–344.

Bertsimas, D. and N. Kallus (2020). From predictive to prescriptive analytics. Management Science 66 (3),

1025–1044.

Bertsimas, D. and A. Thiele (2005). A data-driven approach to newsvendor problems. Technical report,

Massachusetts Institute of Technology, Cambridge, MA.

Besbes, O. and A. Muharremoglu (2013). On implications of demand censoring in the newsvendor problem.

Management Science 59 (6), 1407–1424.

Beyer, D. and S. P. Sethi (1997). Average cost optimality in inventory models with Markovian demands.

Journal of Optimization Theory and Applications 92 (3), 497–526.

Bhar, R. and S. Hamori (2004). Hidden Markov models: applications to financial economics, Volume 40.

Springer Science & Business Media.

Blinder, A. S. and L. J. Maccini (1991). The resurgence of inventory research: what have we learned? Journal

of Economic Surveys 5 (4), 291–328.

Du, X., L. Y. Cindy, and D. J. Hayes (2011). Speculation and volatility spillover in the crude oil and

agricultural commodity markets: A Bayesian analysis. Energy Economics 33 (3), 497–503.

Efendigil, T., S. Önüt, and C. Kahraman (2009). A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis. Expert Systems with Applications 36 (3), 6697–6707.

Feldman, R. M. (1978). A continuous review (s, S) inventory system in a random environment. Journal of

Applied Probability 15 (3), 654–659.

Gallego, G. and H. Hu (2004). Optimal policies for production/inventory systems with finite capacity and

Markov-modulated demand and supply processes. Annals of Operations Research 126 (1-4), 21–41.

Gallego, G. and I. Moon (1993). The distribution free newsboy problem: review and extensions. Journal of

the Operational Research Society 44 (8), 825–834.

Goel, S., J. M. Hofman, S. Lahaie, D. M. Pennock, and D. J. Watts (2010). Predicting consumer behavior

with web search. Proceedings of the National Academy of Sciences.

Gruhl, D., L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Zien (2004). How to build a

webfountain: An architecture for very large-scale text analytics. IBM Systems Journal 43 (1), 64–77.

Gruhl, D., R. Guha, R. Kumar, J. Novak, and A. Tomkins (2005). The predictive power of online chatter. In

Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data

Mining, pp. 78–87. ACM.

Gurney, K. (2018). An introduction to neural networks. CRC press.

Hamilton, J. (1990). Analysis of time series subject to changes in regime. Journal of Econometrics 45,

39–70.

He, B., F. Dexter, A. Macario, and S. Zenios (2012). The timing of staffing decisions in hospital operating

rooms: incorporating workload heterogeneity into the newsvendor problem. Manufacturing & Service

Operations Management 14 (1), 99–114.

Huang, G.-B. (2003). Learning capability and storage capacity of two-hidden-layer feedforward networks.

IEEE Transactions on Neural Networks 14 (2), 274–281.

Huh, W. T., R. Levi, P. Rusmevichientong, and J. B. Orlin (2011). Adaptive data-driven inventory control

with censored demand based on Kaplan-Meier estimator. Operations Research 59 (4), 929–941.

Ke, J. and X. Liu (2008). Empirical analysis of optimal hidden neurons in neural network modeling for stock

prediction. In Computational Intelligence and Industrial Application, 2008. PACIIA'08. Pacific-Asia

Workshop on, Volume 2, pp. 828–832. IEEE.

Kesavan, S. and T. Kushwaha (2014). Differences in retail inventory investment behavior during macroeconomic shocks: Role of service level. Production and Operations Management 23 (12), 2118–2136.

Khayyati, S. and B. Tan (2020). Data-driven control of a production system by using marking-dependent

threshold policy. International Journal of Production Economics 226, 107607.

Levi, R., R. O. Roundy, and D. B. Shmoys (2007). Provably near-optimal sampling-based policies for

stochastic inventory control models. Mathematics of Operations Research 32 (4), 821–839.

Levinson, S. E., L. R. Rabiner, and M. M. Sondhi (1983). An introduction to the application of the theory

of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical

Journal 62 (4), 1035–1074.

Liyanage, L. H. and J. G. Shanthikumar (2005). A practical inventory control policy using operational

statistics. Operations Research Letters 33 (4), 341–348.

Lovejoy, W. S. (1992). Stopped myopic policies in some inventory models with generalized demand processes.

Management Science 38 (5), 688–707.

Mokni, K. and M. Youssef (2020). Empirical analysis of the cross-interdependence between crude oil and

agricultural commodity markets. Review of Financial Economics 38 (4), 635–654.

Monahan, G. E. (1982). State of the art—a survey of partially observable Markov decision processes: theory,

models, and algorithms. Management Science 28 (1), 1–16.

Oroojlooyjadid, A., L. V. Snyder, and M. Takáč (2020). Applying deep learning to the newsvendor problem.

IISE Transactions 52 (4), 444–463.

Perakis, G. and G. Roels (2008). Regret in the newsvendor model with partial information. Operations

Research 56 (1), 188–203.

Picone, J. (1990). Continuous speech recognition using hidden Markov models. IEEE ASSP Magazine 7 (3),

26–41.

Qi, M., Y. Shi, Y. Qi, C. Ma, R. Yuan, D. Wu, and Z.-J. M. Shen (2020). A practical end-to-end inventory

management model with deep learning. Available at SSRN 3737780 .

Qin, F., A. Auerbach, and F. Sachs (2000). A direct optimization approach to hidden Markov modeling for

single channel kinetics. Biophysical Journal 79 (4), 1915–1927.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition.

Proceedings of the IEEE 77 (2), 257–286.

Sachs, A.-L. (2015). The data-driven newsvendor with censored demand observations. In Retail Analytics,

pp. 35–56. Springer.

Scarf, H. (1958). A min-max solution of an inventory problem. Studies in the Mathematical Theory of

Inventory and Production.

Scarf, H. (1959). Bayes solutions of the statistical inventory problem. The Annals of Mathematical Statistics 30 (2), 490–508.

Sethi, S. P. and F. Cheng (1997). Optimality of (s, S) policies in inventory models with Markovian demand.

Operations Research 45 (6), 931–939.

Seubert, F., N. Stein, F. Taigel, and A. Winkelmann (2020). Making the newsvendor smart – order quantity optimization with ANNs for a bakery chain. Working Paper.

Shang, K. H. (2012). Single-stage approximations for optimal policies in serial inventory systems with

nonstationary demand. Manufacturing & Service Operations Management 14 (3), 414–422.

Song, J.-S. and P. Zipkin (1993). Inventory control in a fluctuating demand environment. Operations

Research 41 (2), 351–370.

Treharne, J. T. and C. R. Sox (2002). Adaptive inventory control for nonstationary demand and partial

information. Management Science 48 (5), 607–624.

Van der Laan, N., R. H. Teunter, W. Romeijnders, and O. Kilic (2019). The data-driven newsvendor problem:

Achieving on-target service levels. Technical report, Working paper, University of Groningen, SOM

research school.

Van Parys, B. P., P. M. Esfahani, and D. Kuhn (2020). From data to decisions: Distributionally robust

optimization is optimal. Management Science.

Zhang, Y. and J. Gao (2017). Assessing the performance of deep learning algorithms for newsvendor problem.

In International Conference on Neural Information Processing, pp. 912–921. Springer.

Appendices

Appendix A Backpropagation Algorithm

To calculate the derivatives of the newsvendor network with the backpropagation algorithm, we consider a network with $L$ layers, including the input and output layers, so it has $L-2$ hidden layers, each with $M^{(l)}$ hidden nodes for $l = 2,\dots,L-1$. $w^{(l,l+1)}_{ij}$ denotes the weight between node $i$ in layer $l$ and node $j$ in layer $l+1$. Let $net^{(l)}_i$ and $o^{(l)}_i$ denote the input and output of node $i$ in layer $l$. In the input layer, $o^{(1)}_i = net^{(1)}_i$, and in the output unit the activation function is a simple addition (identity) function, so the output of this node is identical to the value transmitted to it; both represent the order quantity $Q = o^{(L)} = net^{(L)}$.

Considering a single sample fed into the network, we derive the calculations for it and then extend them to all samples in matrix form. At the end of the feed-forward step, once the network produces its output in the output node, the corresponding cost is computed with the newsvendor cost function.

The backpropagation step begins right after the feed-forward step. In this step we seek the partial derivatives $\partial NVC / \partial w^{(l,l+1)}_{ij}$ for all $i$, $j$, and $l = 1,\dots,L-1$. To obtain them, we compute the derivatives of the activation functions in each node and follow the path to the edge of interest. The derivative of the newsvendor cost function $NVC$ with respect to the output $Q$, or equivalently $o^{(L)}$, is

$$\frac{\partial NVC}{\partial o^{(L)}} = \begin{cases} h & \text{if } D < Q,\\ -b & \text{if } Q < D, \end{cases} \tag{A.37}$$

where the second branch is negative because increasing $Q$ reduces the shortage $D - Q$.

Since the activation function of the output unit is a simple addition function, its derivative with respect to its input, $\partial o^{(L)} / \partial net^{(L)}$, equals 1. In the hidden-layer units, the sigmoid function has the property that its derivative can be expressed in terms of the function itself: if the output of hidden node $j$ in layer $l$ is $o^{(l)}_j$, the derivative is $o^{(l)}_j (1 - o^{(l)}_j)$. To simplify the chain rule, we define a term referred to as the backpropagation error for node $j = 1,\dots,M^{(l)}$ in layer $l = 2,\dots,L$. We move from the right of the network to the left, so we first compute the partial derivatives $\partial NVC / \partial w^{(L-1,L)}_j$. To this end, we define the backpropagation error of the output unit as

$$\delta^{(L)} = \frac{\partial NVC}{\partial o^{(L)}}\,\frac{\partial o^{(L)}}{\partial net^{(L)}}. \tag{A.38}$$

In this problem, $\delta^{(L)}$ is

$$\delta^{(L)} = \begin{cases} \delta^{(L)}(h) = h \cdot 1 & \text{if } D < Q,\\ \delta^{(L)}(b) = -b \cdot 1 & \text{if } Q < D. \end{cases} \tag{A.39}$$

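One more multiplication remains to obtain the partial derivative $\partial NVC / \partial w^{(L-1,L)}_j$, namely $\partial net^{(L)} / \partial w^{(L-1,L)}_j = o^{(L-1)}_j$. Finally, we have

$$\frac{\partial NVC}{\partial w^{(L-1,L)}_j} = \delta^{(L)} o^{(L-1)}_j, \tag{A.40}$$

that is,

$$\frac{\partial NVC}{\partial w^{(L-1,L)}_j} = \begin{cases} \delta^{(L)}(h)\, o^{(L-1)}_j & \text{if } D < Q,\\ \delta^{(L)}(b)\, o^{(L-1)}_j & \text{if } Q < D. \end{cases} \tag{A.41}$$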

The remaining partial derivatives $\partial NVC / \partial w^{(l,l+1)}_{ij}$ for $l = 1,\dots,L-2$ are obtained similarly by computing backpropagation errors. To compute the backpropagation error $\delta^{(l)}_j$ of node $j$ in hidden layer $l$, all backward paths that reach hidden node $j$ must be considered, so that a weighted combination of the backpropagation errors of layer $l+1$, $\delta^{(l+1)}$, is transmitted to hidden node $j$:

$$\bar{\delta}^{(l)}_j = \sum_{k=1}^{M^{(l+1)}} \delta^{(l+1)}_k\, w^{(l,l+1)}_{jk}. \tag{A.42}$$

As for the previous set of derivatives, the term $\partial o^{(l)}_j / \partial net^{(l)}_j = o^{(l)}_j (1 - o^{(l)}_j)$ must be multiplied by $\bar{\delta}^{(l)}_j$. The backpropagation error for node $j$ in hidden layer $l$ is then

$$\delta^{(l)}_j = o^{(l)}_j \left(1 - o^{(l)}_j\right) \bar{\delta}^{(l)}_j. \tag{A.43}$$

Again, in this problem there are two types of backpropagation error:

$$\delta^{(l)}_j = \begin{cases} \delta^{(l)}_j(h) & \text{if } D < Q,\\ \delta^{(l)}_j(b) & \text{if } Q < D. \end{cases} \tag{A.44}$$

By the chain rule, to obtain the partial derivative $\partial NVC / \partial w^{(l,l+1)}_{ij}$, the last term, $\partial net^{(l+1)}_j / \partial w^{(l,l+1)}_{ij} = o^{(l)}_i$, is multiplied by $\delta^{(l+1)}_j$; hence

$$\frac{\partial NVC}{\partial w^{(l,l+1)}_{ij}} = \begin{cases} \delta^{(l+1)}_j(h)\, o^{(l)}_i & \text{if } D < Q,\\ \delta^{(l+1)}_j(b)\, o^{(l)}_i & \text{if } Q < D. \end{cases} \tag{A.45}$$

Once all partial derivatives are computed, the network weights are updated in the negative gradient direction. With a constant learning rate $\gamma$, the weight corrections are

$$\Delta w^{(l,l+1)}_{ij} = \begin{cases} -\gamma\, \delta^{(l+1)}_j(h)\, o^{(l)}_i & \text{if } D < Q,\\ -\gamma\, \delta^{(l+1)}_j(b)\, o^{(l)}_i & \text{if } Q < D. \end{cases} \tag{A.46}$$

Figure A.10 Input and backpropagated error on an edge

More than one sample is fed into the network. If we have $T$ sample pairs $\{(F_1, D_1), (F_2, D_2), \dots, (F_T, D_T)\}$, the weight corrections are computed for each sample, giving, for weight $w^{(l,l+1)}_{ij}$, the corrections $\Delta_1 w^{(l,l+1)}_{ij}, \Delta_2 w^{(l,l+1)}_{ij}, \dots, \Delta_T w^{(l,l+1)}_{ij}$. Therefore, the total correction in the gradient direction is

$$\Delta w^{(l,l+1)}_{ij} = \sum_{t=1}^{T} \Delta_t w^{(l,l+1)}_{ij}. \tag{A.47}$$
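A minimal NumPy sketch of the backpropagation steps above for the simplest case: one sigmoid hidden layer ($L = 3$), an identity input layer and output unit, and no bias terms. It assumes the per-sample cost $h(Q-D)^+ + b(D-Q)^+$, whose derivative with respect to $Q$ is $h$ when $D < Q$ and $-b$ when $Q < D$. The hidden-layer width, weight initialization, learning rate, epoch count, and function names are illustrative choices, not values from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(F, W1, w2):
    """Feed-forward pass. F: (T, n) features; W1: (n, M) weights; w2: (M,) weights."""
    hidden = sigmoid(F @ W1)   # hidden-layer outputs o^{(2)}, shape (T, M)
    q = hidden @ w2            # Q = o^{(3)} = net^{(3)} (identity output), shape (T,)
    return hidden, q


def newsvendor_cost(F, D, W1, w2, h, b):
    """Total cost sum_t h (Q_t - D_t)^+ + b (D_t - Q_t)^+ over the T samples."""
    _, q = forward(F, W1, w2)
    return float(np.sum(h * np.maximum(q - D, 0.0) + b * np.maximum(D - q, 0.0)))


def gradients(F, D, W1, w2, h, b):
    """Gradients of the total cost, summed over samples as in (A.47)."""
    hidden, q = forward(F, W1, w2)
    # Output error: h if D < Q, -b if Q < D (0 on ties); the identity output adds a factor 1.
    delta_out = np.where(q > D, h, np.where(q < D, -b, 0.0))          # (T,)
    grad_w2 = hidden.T @ delta_out                                      # sum_t delta_t * o_j,t
    # Hidden error (A.42)-(A.43): o (1 - o) times the error passed back through w2.
    delta_hidden = hidden * (1.0 - hidden) * np.outer(delta_out, w2)    # (T, M)
    grad_W1 = F.T @ delta_hidden                                        # sum_t o_i,t * delta_j,t
    return grad_W1, grad_w2


def train(F, D, h, b, hidden_units=8, gamma=1e-4, epochs=2000, seed=0):
    """Plain gradient descent with learning rate gamma on the summed gradient."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(F.shape[1], hidden_units))
    w2 = rng.normal(scale=0.5, size=hidden_units)
    for _ in range(epochs):
        g_W1, g_w2 = gradients(F, D, W1, w2, h, b)
        W1 -= gamma * g_W1     # correction -gamma * gradient, as in (A.46)
        w2 -= gamma * g_w2
    return W1, w2
```

A central finite difference on `newsvendor_cost` is a convenient way to check `gradients`, as long as no sample sits exactly at $Q = D$, where the cost has a kink.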

In the emission network, the partial derivatives and the chain rule are similar to those of the newsvendor network. The emission terms differentiated with respect to the network weights are $\partial p(D) / \partial w_E$, where $p(D)$ is the normal probability density function evaluated at each time:

$$p(D) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(D-\mu)^2}{2\sigma^2}}. \tag{A.48}$$

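First, we compute the derivative of the above function with respect to $D$, taken as the output $o^{(L)}_i$ of the emission network for state $i \in S$ (the state index is dropped below):

$$\frac{\partial p(D)}{\partial o^{(L)}} = -\frac{o^{(L)} - \mu}{\sigma^2} \left( \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(o^{(L)}-\mu)^2}{2\sigma^2}} \right). \tag{A.49}$$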
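The derivative in (A.49) is that of the normal density (A.48), which in general form reads $\partial p/\partial x = -\frac{x-\mu}{\sigma^2}\,p(x)$. The short check below compares this formula with a central finite difference at an arbitrary point; the function names and the test values of $x$, $\mu$, and $\sigma$ are illustrative.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density (A.48) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def normal_pdf_derivative(x: float, mu: float, sigma: float) -> float:
    """Analytic derivative d p(x) / d x = -(x - mu) / sigma^2 * p(x)."""
    return -(x - mu) / sigma ** 2 * normal_pdf(x, mu, sigma)


def central_difference(x: float, mu: float, sigma: float, eps: float = 1e-6) -> float:
    """Numerical derivative of the density, used to check the analytic formula."""
    return (normal_pdf(x + eps, mu, sigma) - normal_pdf(x - eps, mu, sigma)) / (2.0 * eps)


print(normal_pdf_derivative(1.3, 1.0, 0.5), central_difference(1.3, 1.0, 0.5))
```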

This value is multiplied by the preceding derivatives of the HMM network that reach the emission network, which completes the initial derivatives used to update the emission network weights. First, we have

$$\frac{\partial \Lambda}{\partial p(D)}\,\frac{\partial p(D)}{\partial o^{(L)}}. \tag{A.50}$$

Since the network weights are identical in each state, we sum the above derivatives over all states to obtain a single value for updating them, denoted

$$\frac{\partial \Lambda}{\partial o^{(L)}}. \tag{A.51}$$

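The backpropagation error term for the last layer is

$$\delta^{(L)} = \frac{\partial \Lambda}{\partial o^{(L)}}\,\frac{\partial o^{(L)}}{\partial net^{(L)}}. \tag{A.52}$$

One more multiplication remains to obtain the partial derivative $\partial \Lambda / \partial w^{(L-1,L)}_j$, namely $\partial net^{(L)} / \partial w^{(L-1,L)}_j = o^{(L-1)}_j$. Finally, we have

$$\frac{\partial \Lambda}{\partial w^{(L-1,L)}_j} = \delta^{(L)} o^{(L-1)}_j. \tag{A.53}$$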

The derivatives of $\Lambda$ with respect to the other network weights $w^{(l,l+1)}_{ij}$, between node $i$ in layer $l$ and node $j$ in layer $l+1$ for $l = 1,\dots,L-2$, are obtained as for the newsvendor network, by the chain rule and by computing the backpropagation errors of each node and layer.

Appendix B Further Analyses

B.1 Trade-off coefficient η analysis

Figure B.1 depicts the effect of the trade-off coefficient η on the final cost of the model for the test sample. The curve is obtained by optimizing the function in (P-HMMNV) for different values of η: for each value, the network is trained and the resulting out-of-sample cost is shown on the Y-axis. A grid search over the range [0.001, 0.999] gives the optimal value η = 0.64 (red vertical line). However, this value is not available before the order quantity is set. For this reason, the HMMNV method selects η = 0.70 through cross-validation (black vertical line), which is close to the optimal value. The plot is also unimodal, indicating that departing from the best trade-off between the hidden states and the observable features, as two sources of randomness, increases the final objective value.
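The selection of η by cross-validation can be sketched as a search over a grid of candidate values, scoring each by its average validation cost across folds. Below, `validation_cost(eta, fold)` is a hypothetical callable standing in for training HMMNV with trade-off coefficient `eta` on the remaining folds and returning the newsvendor cost on `fold`; the fold construction itself is not described in this section, and `toy_cost` is a stand-in used only to exercise the function.

```python
from statistics import mean
from typing import Callable, Sequence


def select_eta(
    validation_cost: Callable[[float, int], float],
    grid: Sequence[float],
    n_folds: int,
) -> float:
    """Return the grid value of eta with the lowest mean validation cost."""
    scores = {eta: mean(validation_cost(eta, k) for k in range(n_folds)) for eta in grid}
    return min(scores, key=scores.get)


# 100 candidate values spanning [0.001, 0.999], the range of the grid search in Figure B.1.
ETA_GRID = [0.001 + i * (0.999 - 0.001) / 99 for i in range(100)]


def toy_cost(eta: float, fold: int) -> float:
    """Stand-in validation cost with its minimum at eta = 0.7 (illustration only)."""
    return (eta - 0.7) ** 2 + 0.01 * fold


print(round(select_eta(toy_cost, ETA_GRID, n_folds=5), 3))
```

Because every η is scored on the same folds, a fold-specific offset (the `0.01 * fold` term above) shifts all candidates equally and does not change the selected value.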

[Figure B.1 plot: newsvendor cost of the test set (vertical axis, Cost, 0.9 to 1.9) versus η (horizontal axis, 0 to 1); markers: Best = 0.70 by cross-validation, Optimal = 0.64 by grid search.]

Figure B.1 Effect of the trade-off coefficient η on the out-of-sample newsvendor cost.

B.2 Extreme scenarios

We generate samples to simulate the extreme scenarios described above. In this data set, we assume that demand outliers with low-frequency historical records belong to a state that is rarely visited by the Markov chain, and that the demand in that scarce state differs significantly from the demand observations in the other state(s). We consider a two-state Markov chain with parameter $e$, the probability of transition from state 1 to state 2. The transition matrix is set as

$$P = \begin{array}{c|cc} & s_1 & s_2 \\ \hline s_1 & 1-e & e \\ s_2 & 1-e & e \end{array},$$

where $e$ is chosen so that $1-e$ is large enough to keep the system in state 1 most of the time. The other parameters of the model are given in Table B.1. The small values of $e$ ensure that most demand observations come from state 1, and the large difference between $\mu_1$ and $\mu_2$ ($\mu_2 - \mu_1 = 4$) makes the demand observations from state 2 extreme values that act as low-frequency outliers in the demand sequence.
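A minimal simulation of the base-demand process in these extreme scenarios, using the transition matrix above and the state-dependent means and standard deviations of Table B.1. It omits the feature-driven part of demand; the assumption that the chain starts in state 1, the function name, and the seed are illustrative. States are indexed from 0, so index 1 is the paper's state 2.

```python
import numpy as np


def simulate_base_demand(T: int, e: float, mu=(1.0, 5.0), sigma=(0.5, 0.5), seed: int = 0):
    """Simulate T periods of base demand from the two-state chain of Appendix B.2.

    Every row of the transition matrix is [1 - e, e], so state 2 (index 1) is
    entered with probability e in each period, whatever the current state.
    """
    rng = np.random.default_rng(seed)
    P = np.array([[1.0 - e, e],
                  [1.0 - e, e]])
    states = np.zeros(T, dtype=int)  # assumption: the chain starts in state 1 (index 0)
    for t in range(1, T):
        states[t] = rng.choice(2, p=P[states[t - 1]])
    # Draw each period's demand from the normal distribution of its state.
    demand = rng.normal(np.take(mu, states), np.take(sigma, states))
    return states, demand


states, demand = simulate_base_demand(T=200, e=0.05)
print(states.sum(), "of 200 periods in state 2")
```

Since both rows are identical, the state sequence here is effectively i.i.d. with P(state 2) = e; the loop is kept general so that other transition matrices can be plugged in.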

Table B.1 The set of parameters used in the experimental setup for extreme scenarios with demand outliers.

Parameter  Description                                          Domain
e          transition probability                               0.01, 0.05
µ1         mean of the base demand in state 1                   1
µ2         mean of the base demand in state 2                   5
σ1         standard deviation of the base demand in state 1     0.5
σ2         standard deviation of the base demand in state 2     0.5
σϵ         standard deviation of the demand by features         0.5
N          number of features                                   1, 3
W          network weights                                      1
L          number of hidden layers                              1
b/h        newsvendor cost parameters ratio                     2, 5, 10
T          number of observations                               100, 200

Figure B.2 and Table B.2 give the results of these experiments. Both the HMMNV and DNN methods perform well in the presence of rarely visited states, and the difference between their costs is not statistically significant (p = 0.8952 in Table B.2).

Table B.2 p-values of the t-test between the costs of the algorithms in extreme scenarios.

        EDD     ObBW        PfLR    LML     DNN
ObBW    0.2685
PfLR    0.1940  0.0150
LML     0.0265  6.4659e-04  0.3785
DNN     0.0192  4.2083e-04  0.3115  0.8878
HMMNV   0.0091  9.9872e-05  0.2308  0.7778  0.8952

Figure B.2 Results of all algorithms for the extreme scenarios.

B.3 Real experiment with macroeconomic features

We use monthly data on the suggested extra features, namely the lagged price of crude oil, lagged unemployment, the lagged S&P 500 stock index, and lagged personal disposable income, to assess the effect of using extra features. The data cover the period from September 1986 to August 2020 and consist of 408 monthly observations; accordingly, we resample the crude oil demand on a monthly basis. We pick a time window of 120 months, feed the first 96 months (80% of the periods in a window) to the algorithms, and test the trained models on the following 24 months. We roll this window forward in 24-month steps and obtain 13 test periods in total. Table B.3 gives the average cost of the monthly ordering policy for crude oil. The proposed HMMNV method performs well compared to the benchmark methods; however, as expected, this advantage is not present in all cases (e.g., when b/h = 10, the DNN method has a lower average cost than HMMNV). We attribute this to the informative input features, which capture almost all of the uncertainty in the demand observations and have more out-of-sample forecasting power. Moreover, both HMMNV and DNN perform well compared to the other methods, since they also model nonlinear relations.

Table B.3 Average cost of the crude oil ordering for the newsvendor problem with monthly data by each algorithm using macroeconomic features (values are divided by 10⁵).

b/h   EDD    ObBW    PfLR    LML     DNN    HMMNV
2     4.791  4.589   5.321   7.875   5.518  4.350
5     6.479  7.511   7.244   10.534  6.580  5.892
10    7.594  11.23   8.604   9.057   6.916  8.009
20    8.668  17.054  10.430  7.329   8.232  7.372

Evolving Hybrid Deep Neural Network Models for End-to-End Inventory Ordering Decisions

Article

Full-text available

Nov 2023

Background: Over the past decade, the potential advantages of employing deep learning models and leveraging auxiliary data in data-driven end-to-end (E2E) frameworks to enhance inventory decision-making have gained recognition. However, current approaches predominantly rely on feed-forward networks, which may have difficulty capturing temporal correlations in time series data and identifying relevant features, resulting in less accurate predictions. Methods: Addressing this gap, we introduce novel E2E deep learning frameworks that combine Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) for resolving single-period inventory ordering decisions, also termed the Newsvendor Problem (NVP). This study investigates the performance drivers of hybrid CNN-LSTM architectures, coupled with an evolving algorithm for optimizing network configuration. Results: Empirical evaluation of real-world retail data demonstrates that our proposed models proficiently extract pertinent features and interpret sequential data characteristics, leading to more accurate and informed ordering decisions. Notably, results showcase substantial benefits, yielding up to an 85% reduction in costs compared to a univariate benchmark and up to 40% savings compared to a feed-forward E2E deep learning architecture. Conclusions: This confirms that, in practical scenarios, understanding the impact of features on demand empowers decision-makers to derive tailored, cost-effective ordering decisions for each store or product category.

Solving data-driven newsvendor problem with textual reviews through deep learning

Article

Full-text available

Sep 2023
SOFT COMPUT

The production decision of a large commodity or equipment manufacturing enterprise can be modeled as a newsvendor problem. Managers must determine the optimal production volume in advance to minimize the underage cost and the overage cost. However, the traditional newsvendor problem assumes the known demand distribution, which is not the case in practice. Data-driven approaches have become the hot research topic and opened up new avenues for such issues. Recent studies have considered demand-related features but have failed to address how to optimize production and inventory using informative textual reviews, not just numerical feature data. To address this issue, we propose a data-driven newsvendor model that leverages sentiment analysis on textual reviews using a deep learning model to solve the data-driven newsvendor problem by integrating estimation and optimization. Experiments on real data show that our proposed method reduces the average cost by approximately 14.18% compared to the most advanced deep neural network method, making it the best-performing method. Furthermore, our method is more suitable for situations where unit shortage costs are greater than unit overage costs. Finally, our method is robust in terms of sample size and can still obtain good results even with insufficient historical data.

Fog Computing and Industry 4.0 for Newsvendor Inventory Model Using Attention Mechanism and Gated Recurrent Unit

Article

Full-text available

Jun 2024

Background: Efficient inventory management is critical for sustainability in supply chains. However, maintaining adequate inventory levels becomes challenging in the face of unpredictable demand patterns. Furthermore, the need to disseminate demand-related information throughout a company often relies on cloud services. However, this method sometimes encounters issues such as limited bandwidth and increased latency. Methods: To address these challenges, our study introduces a system that incorporates a machine learning algorithm to address inventory-related uncertainties arising from demand fluctuations. Our approach involves the use of an attention mechanism for accurate demand prediction. We combine it with the Newsvendor model to determine optimal inventory levels. The system is integrated with fog computing to facilitate the rapid dissemination of information throughout the company. Results: In experiments, we compare the proposed system with the conventional demand estimation approach based on historical data and observe that the proposed system consistently outperformed the conventional approach. Conclusions: This research introduces an inventory management system based on a novel deep learning architecture that integrates the attention mechanism with cloud computing to address the Newsvendor problem. Experiments demonstrate the better accuracy of this system in comparison to existing methods. More studies should be conducted to explore its applicability to other demand modeling scenarios.

Determinism versus uncertainty: Examining the worst-case expected performance of data-driven policies

Article

Apr 2024
EUR J OPER RES

This paper explores binary decision making, a critical domain in areas such as finance and supply chain management, where decision makers must often choose between a deterministic-cost option and an uncertain-cost option. Given the limited historical data on the uncertain cost and its unknown probability distribution, this research aims to ascertain how decision makers can optimize their decisions. To this end, we evaluate the worst-case expected performance of all possible data-driven policies, including the sample average approximation policy, across four scenarios differentiated by the extent of knowledge regarding the lower and upper bounds of the first moment of the uncertain cost distribution. Our analysis, using worst-case expected absolute regret and worst-case expected relative regret metrics, consistently shows that no data-driven policy outperforms the straightforward strategy of choosing either a deterministic-cost or uncertain-cost option in these scenarios. Notably, the optimal choice between these two options depends on the specific lower and upper bounds of the first moment. Our research contributes to the literature by revealing the minimal worst-case expected performance of all possible data-driven policies for binary decision-making problems.
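As a concrete illustration of the sample average approximation policy named in the abstract, the sketch below (Python, hypothetical names) picks the uncertain-cost option only when the sample mean of past uncertain costs is strictly below the deterministic cost, and reports the regret of a choice against a known true mean. It does not reproduce the paper's worst-case analysis over distributions; it only shows the policy being evaluated.

```python
from statistics import fmean


def saa_binary_choice(deterministic_cost: float, uncertain_cost_samples: list[float]) -> str:
    """SAA policy: take the uncertain option only if its sample mean is strictly cheaper."""
    if not uncertain_cost_samples:
        raise ValueError("need at least one historical observation of the uncertain cost")
    if fmean(uncertain_cost_samples) < deterministic_cost:
        return "uncertain"
    return "deterministic"  # ties go to the deterministic option (an assumption)


def absolute_regret(choice: str, deterministic_cost: float, true_mean: float) -> float:
    """Expected cost of the chosen option minus the expected cost of the better option."""
    chosen = deterministic_cost if choice == "deterministic" else true_mean
    return chosen - min(deterministic_cost, true_mean)
```

Averaging this regret over random samples and then taking the worst case over admissible distributions would give the kind of metric the abstract analyzes.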

A survey of contextual optimization methods for decision-making under uncertainty

Article

Mar 2024
EUR J OPER RES

Integrated Profitability Evaluation for a Newsboy-Type Product in Own Brand Manufacturers

Article

Feb 2024

Effective inventory management depends on accurate estimates of product profitability to formulate ordering and manufacturing strategies. The achievable capacity index (ACI) is a simple yet efficient approach to measuring the profitability of newsboy-type products with normally distributed demand, wherein profitability is presented as the probability of achieving the target profit under the optimal ordering quantity. Unfortunately, the ACI is applicable only to retail stores with a single demand. In the current study, we addressed the issue of measuring the integrated profitability of newsboy-type products sold in multiple locations with independent demand levels, such as own-branding-and-manufacture (OBM) companies with multiple owned channels. We began by formulating profitability in accordance with multiple independent normal demands, and then developed an integrated ACI (IACI) to simplify the expression. We also derived the statistical properties of the unbiased estimator to determine the true IACI in situations where demand patterns are unknown. Finally, we conducted hypothesis testing to determine whether the integrated profitability meets a stipulated minimum level. For convenience, we tabulated the critical values as a function of sample size, confidence level, the number of channels, and the stipulated minimum level. One can make decisions simply by estimating the IACI based on historical demand data from all channels and then looking up the critical value in the corresponding tables. Consequently, the proposed methods make it possible for OBM managers to evaluate integrated profitability, which helps them decide the optimal timing for pulling unprofitable items from the shelves by looking up generic tables. We also performed numerical and sensitivity analyses for a real-world case to illustrate the applicability and some managerial implications of the proposed scheme.

Solving Data-Driven Newsvendor Pricing Problems with Decision-Dependent Effect

Preprint

May 2023

This paper investigates the data-driven pricing newsvendor problem, which focuses on maximizing expected profit by deciding on inventory and pricing levels based on historical demand and feature data. We first build an approximate model by assigning weights to historical samples. However, due to decision-dependent effects, the resulting approximate model is complicated and cannot be solved directly. To address this issue, we introduce the concept of approximate gradients and design an Approximate Gradient Descent (AGD) algorithm. We analyze the convergence of the proposed algorithm in both convex and non-convex settings, which correspond to the newsvendor pricing model and its variants, respectively. Finally, we perform numerical experiments on both simulated and real-world datasets to demonstrate the efficiency and effectiveness of the AGD algorithm. We find that the AGD algorithm can converge to a local maximum provided that the approximation is effective. We also illustrate the significance of two characteristics of our model: it is distribution-free, and it accounts for decision-dependent effects. Accounting for the decision-dependent effect is necessary for the approximation, and the distribution-free model is preferred when there is little information on the demand distribution and on how demand reacts to the pricing decision. Moreover, the proposed model and algorithm are not limited to the newsvendor problem and can also be used for a wide range of decision-dependent problems.

Intelligent decision-making framework for agriculture supply chain in emerging economies: Research opportunities and challenges

Article

Apr 2024
COMPUT ELECTRON AGR

EXPRESS: Learning Newsvendor Problems with Intertemporal Dependence and Moderate Non-stationarities

Article

Mar 2024

This work provides performance guarantees for solving data-driven contextual newsvendor problems when the contextual data contain intertemporal dependence and non-stationarities. While machine learning tools are increasingly used in data-driven inventory management problems, most existing work assumes that the contextual data are independent and identically distributed (i.i.d.). However, such assumptions are often violated in real operational environments, where the contextual data are generated sequentially with intertemporal correlations and possible non-stationarities. By accommodating these naturally arising operational environments, our work adopts comparatively more realistic assumptions and develops out-of-sample performance bounds for learning data-driven contextual newsvendor problems.

Data-driven dynamic pricing and inventory management of an omni-channel retailer in an uncertain demand environment

Article

Dec 2023
EXPERT SYST APPL

A Practical End-to-End Inventory Management Model with Deep Learning

Article

Dec 2022

We investigate a data-driven multiperiod inventory replenishment problem with uncertain demand and vendor lead time (VLT) with accessibility to a large quantity of historical data. Different from the traditional two-step predict-then-optimize (PTO) solution framework, we propose a one-step end-to-end (E2E) framework that uses deep learning models to output the suggested replenishment amount directly from input features without any intermediate step. The E2E model is trained to capture the behavior of the optimal dynamic programming solution under historical observations without any prior assumptions on the distributions of the demand and the VLT. By conducting a series of thorough numerical experiments using real data from one of the leading e-commerce companies, we demonstrate the advantages of the proposed E2E model over conventional PTO frameworks. We also conduct a field experiment with JD.com, and the results show that our new algorithm reduces holding cost, stockout cost, total inventory cost, and turnover rate substantially compared with JD’s current practice. For the supply chain management industry, our E2E model shortens the decision process and provides an automatic inventory management solution with the possibility to generalize and scale. The concept of E2E, which uses the input information directly for the ultimate goal, can also be useful in practice for other supply chain management circumstances. This paper was accepted by Hamid Nazerzadeh, big data analytics—fast track. Funding: This research was supported by the National Key Research and Development Program of China [Grant 2018YFB1700600] and National Natural Science Foundation of China [Grants 71991462 and 91746210]. Supplemental Material: The online appendix and data are available at https://doi.org/10.1287/mnsc.2022.4564 .

Empirical analysis of the cross-interdependence between crude oil and agricultural commodity markets

Article

Jan 2020

This paper aims to investigate the cross‐interdependence between crude oil and agricultural commodity prices. We apply a test of persistence to verify whether the effect of crude oil prices on agricultural commodity markets is immediate or delayed. Using daily data covering the period 2003–2017, results show that the delayed effect of crude oil prices on agricultural commodity prices is lower than the immediate effect. Furthermore, the dependence is strongly persistent and more affected by the food crisis than by the oil crisis. Additionally, a contagion effect is detected during the food crisis for almost all agricultural commodity markets, while during the oil crisis it is verified only for the soybean and wheat markets. The study is designed to provide a reliable framework for returns and volatility forecasting in commodity markets based on oil market changes.

Data-Driven Control of a Production System by Using Marking-Dependent Threshold Policy

Article

Dec 2019
INT J PROD ECON

As increasingly more shop-floor data becomes available, the performance of a production system can be improved by developing effective data-driven control methods that utilize this information. We focus on the following research questions: how can the decision to produce or not to produce at any time be made based on real-time information about a production system?; how can the collected data be used directly in optimizing the policy parameters?; and what is the effect of using different information sources on the performance of the system? To answer these questions, we consider a production/inventory system that consists of a production stage that produces to stock to meet random demand. The system is not fully observable, but partial production and demand information, referred to as markings, is available. We propose using the marking-dependent threshold policy to decide whether to produce or not based on the observed markings in addition to the inventory and production status at any given time. An analytical method that uses a matrix geometric approach is developed to analyze a production system controlled with the marking-dependent threshold policy when the production, demand, and information arrivals are modeled as Marked Markovian Arrival Processes. A mixed integer programming formulation is presented to determine the optimal thresholds. Then a mathematical programming formulation that uses the real-time shop floor data for joint simulation and optimization (JSO) of the system is presented. Using numerical experiments, we compare the performance of the JSO approach to the analytical solutions. We show that using the marking-dependent control policy where the policy parameters are determined from the data works effectively as a data-driven control method for manufacturing.
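The decision rule described above can be sketched as follows, under simplifying assumptions that are ours: time is discrete, at most one unit is produced per period, negative inventory means backlog, and the marking labels and threshold values are hypothetical. The paper itself works in continuous time with Marked Markovian Arrival Processes and chooses thresholds by mixed integer programming, neither of which is shown here.

```python
from typing import Mapping, Sequence, Tuple


def should_produce(inventory: int, marking: str, thresholds: Mapping[str, int]) -> bool:
    """Marking-dependent threshold rule: produce while inventory is below the marking's threshold."""
    return inventory < thresholds[marking]


def simulate(initial_inventory: int,
             events: Sequence[Tuple[str, int]],
             thresholds: Mapping[str, int]) -> list[int]:
    """Toy discrete-time run: observe a marking, maybe produce one unit, then meet demand."""
    inventory = initial_inventory
    trajectory = []
    for marking, demand in events:
        if should_produce(inventory, marking, thresholds):
            inventory += 1
        inventory -= demand  # negative values represent backlogged demand
        trajectory.append(inventory)
    return trajectory
```

With thresholds {"low": 2, "high": 5}, a "high" marking keeps production running up to 5 units, while a "low" marking stops it at 2, so the same inventory level can trigger different actions.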

Assessing the Performance of Deep Learning Algorithms for Newsvendor Problem

Article

Jun 2017

In retail management, the Newsvendor problem has attracted wide attention as one of the basic inventory models. The traditional approach to solving this problem relies on the probability distribution of the demand. In theory, if the probability distribution is known, the problem can be considered fully solved. However, in real-world scenarios it is almost impossible even to approximate or estimate a good probability distribution for the demand. In recent years, researchers have started adopting machine learning approaches to learn a demand prediction model from other feature information. In this paper, we propose a supervised learning approach that determines order quantities for products from feature information. We demonstrate that using the original Newsvendor loss function as the training objective outperforms the recently suggested quadratic loss function. The new algorithm has been assessed on both synthetic and real-world data, demonstrating better performance.
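To illustrate the difference between the two training objectives, the sketch below fits a linear order rule q = w * x + b by full-batch subgradient descent on the newsvendor loss instead of the quadratic loss. The one-feature linear model stands in for the deep network in the paper purely to keep the example short; the function names, learning rate, and epoch count are arbitrary assumptions.

```python
def newsvendor_loss(q: float, d: float, cu: float, co: float) -> float:
    """Cost of ordering q when demand is d (cu per unit short, co per unit left over)."""
    return cu * max(d - q, 0.0) + co * max(q - d, 0.0)


def newsvendor_subgradient(q: float, d: float, cu: float, co: float) -> float:
    """A subgradient w.r.t. q: -cu when short, +co otherwise (q == d picks +co)."""
    return -cu if q < d else co


def fit_linear_order_rule(xs, ds, cu, co, lr=0.01, epochs=2000):
    """Full-batch subgradient descent for q = w * x + b on the average newsvendor loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, d in zip(xs, ds):
            g = newsvendor_subgradient(w * x + b, d, cu, co)
            gw += g * x  # chain rule: dq/dw = x
            gb += g      # dq/db = 1
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def average_loss(w, b, xs, ds, cu, co):
    return sum(newsvendor_loss(w * x + b, d, cu, co) for x, d in zip(xs, ds)) / len(xs)
```

Because the subgradient is -c_u when short and +c_o when over, training pushes the rule toward the c_u / (c_u + c_o) conditional quantile, whereas a quadratic loss would target the conditional mean and ignore the cost asymmetry.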

A Practical End-to-End Inventory Management Model with Deep Learning

Article

Jan 2020

From Predictive to Prescriptive Analytics

Article

Aug 2019

We combine ideas from machine learning (ML) and operations research and management science (OR/MS) in developing a framework, along with specific methods, for using data to prescribe optimal decisions in OR/MS problems. In a departure from other work on data-driven optimization, we consider data consisting, not only of observations of quantities with direct effect on costs/revenues, such as demand or returns, but also predominantly of observations of associated auxiliary quantities. The main problem of interest is a conditional stochastic optimization problem, given imperfect observations, where the joint probability distributions that specify the problem are unknown. We demonstrate how our proposed methods are generally applicable to a wide range of decision problems and prove that they are computationally tractable and asymptotically optimal under mild conditions, even when data are not independent and identically distributed and for censored observations. We extend these to the case in which some decision variables, such as price, may affect uncertainty and their causal effects are unknown. We develop the coefficient of prescriptiveness P to measure the prescriptive content of data and the efficacy of a policy from an operations perspective. We demonstrate our approach in an inventory management problem faced by the distribution arm of a large media company, shipping 1 billion units yearly. We leverage both internal data and public data harvested from IMDb, Rotten Tomatoes, and Google to prescribe operational decisions that outperform baseline measures. Specifically, the data we collect, leveraged by our methods, account for an 88% improvement as measured by our coefficient of prescriptiveness. This paper was accepted by Noah Gans, optimization.
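The coefficient of prescriptiveness compares a policy's out-of-sample cost with two reference points: the SAA policy, which ignores auxiliary data, and a perfect-foresight oracle. The function below encodes our reading of the definition, P = 1 - (R(policy) - R*) / (R(SAA) - R*), where R* is the perfect-foresight cost; treat the exact form and the function name as assumptions to check against the paper.

```python
def coefficient_of_prescriptiveness(policy_cost: float,
                                    saa_cost: float,
                                    perfect_foresight_cost: float) -> float:
    """Assumed form: P = 1 - (policy - oracle) / (SAA - oracle).

    P = 1 means the policy matches perfect foresight; P = 0 means it does no better than SAA.
    """
    gap = saa_cost - perfect_foresight_cost
    if gap <= 0:
        raise ValueError("SAA cost must exceed the perfect-foresight cost")
    return 1.0 - (policy_cost - perfect_foresight_cost) / gap
```

For instance, average costs of 120 for the policy, 200 for SAA, and 100 under perfect foresight give P = 0.8.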

Structural Results for Average‐Cost Inventory Models with Markov‐Modulated Demand and Partial Information

Article

Aug 2019

We consider a discrete‐time infinite‐horizon inventory system with non‐stationary demand, full backlogging, and deterministic replenishment lead time. Demand arrives according to a probability distribution conditional on the state of the world that undergoes Markovian transitions over time. However, the actual state of the world can only be imperfectly estimated based on past demand data. We model the inventory replenishment problem for this system as a Markov decision process (MDP) with an uncountable state space consisting of both the inventory position and the most recent belief, a conditional probability mass function, about the actual state of the world. Assuming that the state of the world evolves as an ergodic Markov chain, using the vanishing discount method along with a coupling argument, we prove the existence of an optimal average cost that is independent of the initial system state. For our linear cost structure, we also establish the average‐cost optimality of a belief‐dependent base‐stock policy. We then discretize the uncountable belief space into a regular grid and observe that the average cost under our discretization converges to the optimal average cost as the number of grid points grows large. Finally, we conduct numerical experiments to evaluate the use of a myopic belief‐dependent base‐stock policy as a heuristic for our MDP with the uncountable state space. On a test bed of 108 instances, the average cost obtained from the myopic policy deviates by no more than a few percent from the best lower bound on the optimal average cost obtained from our discretization.
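The belief-dependent base-stock idea can be sketched for a small discrete model. The code below assumes zero lead time, a finite set of world states with a known transition matrix and per-state demand probability mass functions, and a convention (our assumption) that demand is observed in the current state before the state transitions. It performs the Bayesian belief update and then sets a myopic order-up-to level at the backlog / (backlog + holding) quantile of the predicted demand mixture; the paper's average-cost optimal policy and its lead-time handling are not reproduced, and all names are hypothetical.

```python
from typing import Dict, List, Sequence


def update_belief(belief: Sequence[float], demand: int,
                  transition: Sequence[Sequence[float]],
                  demand_pmf: Sequence[Dict[int, float]]) -> List[float]:
    """Bayes correction on the observed demand, then one Markov prediction step."""
    posterior = [b * demand_pmf[s].get(demand, 0.0) for s, b in enumerate(belief)]
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("observed demand has zero probability in every state")
    posterior = [p / total for p in posterior]
    n = len(belief)
    return [sum(posterior[i] * transition[i][j] for i in range(n)) for j in range(n)]


def myopic_base_stock(belief: Sequence[float], demand_pmf: Sequence[Dict[int, float]],
                      backlog_cost: float, holding_cost: float) -> int:
    """Order-up-to level: smallest demand value whose mixture CDF reaches p / (p + h)."""
    ratio = backlog_cost / (backlog_cost + holding_cost)
    mixture: Dict[int, float] = {}
    for weight, pmf in zip(belief, demand_pmf):
        for d, p in pmf.items():
            mixture[d] = mixture.get(d, 0.0) + weight * p
    cumulative = 0.0
    for d in sorted(mixture):
        cumulative += mixture[d]
        if cumulative >= ratio - 1e-12:  # tolerance guards against rounding in the running sum
            return d
    return max(mixture)
```

In a toy two-state model where state 0 draws demand from {0, 1} and state 1 from {2, 3}, observing a demand of 3 reveals state 1, and the predicted belief simply becomes that state's row of the transition matrix.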

The Big Data Newsvendor: Practical Insights from Machine Learning

Article

Nov 2018

We investigate the data-driven newsvendor problem when one has n observations of p features related to the demand as well as historical demand data. Rather than a two-step process of first estimating a demand distribution then optimizing for the optimal order quantity, we propose solving the “big data” newsvendor problem via single-step machine-learning algorithms. Specifically, we propose algorithms based on the empirical risk minimization (ERM) principle, with and without regularization, and an algorithm based on kernel-weights optimization (KO). The ERM approaches, equivalent to high-dimensional quantile regression, can be solved by convex optimization problems and the KO approach by a sorting algorithm. We analytically justify the use of features by showing that their omission yields inconsistent decisions. We then derive finite-sample performance bounds on the out-of-sample costs of the feature-based algorithms, which quantify the effects of dimensionality and cost parameters. Our bounds, based on algorithmic stability theory, generalize known analyses for the newsvendor problem without feature information. Finally, we apply the feature-based algorithms for nurse staffing in a hospital emergency room using a data set from a large UK teaching hospital and find that (1) the best ERM and KO algorithms beat the best practice benchmark by 23% and 24%, respectively, in the out-of-sample cost, and (2) the best KO algorithm is faster than the best ERM algorithm by three orders of magnitude and the best practice benchmark by two orders of magnitude.
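The kernel-weights optimization (KO) approach can be read as a weighted SAA: historical demands are weighted by how close their features are to the new feature vector, and the order is the weighted c_u / (c_u + c_o) quantile, found by sorting. The sketch below uses a Gaussian kernel with a user-chosen bandwidth; the kernel choice, bandwidth, and function names are assumptions rather than the paper's exact specification.

```python
import math
from typing import Sequence


def kernel_weights(x_new: Sequence[float], features: Sequence[Sequence[float]],
                   bandwidth: float) -> list[float]:
    """Normalized Gaussian-kernel weights based on squared Euclidean distance to x_new."""
    raw = [math.exp(-sum((a - b) ** 2 for a, b in zip(x_new, x)) / (2.0 * bandwidth ** 2))
           for x in features]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all weights underflowed; try a larger bandwidth")
    return [r / total for r in raw]


def kernel_weighted_order(x_new: Sequence[float], features: Sequence[Sequence[float]],
                          demands: Sequence[float], underage_cost: float,
                          overage_cost: float, bandwidth: float = 1.0) -> float:
    """Weighted critical-ratio quantile of historical demand, computed by sorting."""
    ratio = underage_cost / (underage_cost + overage_cost)
    weights = kernel_weights(x_new, features, bandwidth)
    cumulative = 0.0
    for demand, weight in sorted(zip(demands, weights)):
        cumulative += weight
        if cumulative >= ratio - 1e-12:
            return demand
    return max(demands)
```

The only nontrivial work is one sort of n weighted samples per decision, which is consistent with the abstract's point that KO is far cheaper than solving a convex ERM problem.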

Structural Results for Average-Cost Inventory Models with Markov-Modulated Demand and Partial Information

Research

Jul 2019

We consider a discrete-time infinite-horizon inventory system with non-stationary demand, full backlogging, and deterministic replenishment lead time. Demand arrives according to a probability distribution conditional on the state of the world that undergoes Markovian transitions over time. But the actual state of the world can only be imperfectly estimated based on past demand data. We model the inventory replenishment problem for this system as a Markov decision process (MDP) with an uncountable state space consisting of both the inventory position and the most recent belief, a conditional probability mass function, about the actual state of the world. Assuming that the state of the world evolves as an ergodic Markov chain, using the vanishing discount method along with a coupling argument, we prove the existence of an optimal average cost that is independent of the initial system state. For our linear cost structure, we also establish the average-cost optimality of a belief-dependent base-stock policy. We then discretize the uncountable belief space into a regular grid and observe that the average cost under our discretization converges to the optimal average cost as the number of grid points grows large. Finally, we conduct numerical experiments to evaluate the use of a myopic belief-dependent base-stock policy as a heuristic for our MDP with the uncountable state space. On a test bed of 108 instances, the average cost obtained from the myopic policy deviates by no more than a few percent from the best lower bound on the optimal average cost obtained from our discretization.

Assessing the Performance of Deep Learning Algorithms for Newsvendor Problem

Conference Paper

Oct 2017

Recommended publications

Integrating chaos to deep learning

Technical Note—Data-Driven Profit Estimation Error in the Newsvendor Model

Data-Driven Solutions for the Newsvendor Problem: A Systematic Literature Review

Data-Driven E-Commerce End-to-End Inventory Optimization Algorithm

Profit Estimation Error in the Newsvendor Model Under a Parametric Demand Distribution