Nonparametric Vector Autoregressions: Specification, Estimation, and Inference

IVAN JELIAZKOV
University of California, Irvine
July, 2013
Abstract
For over three decades, vector autoregressions have played a central role in empirical macroeconomics.
These models are general, can capture sophisticated dynamic behavior, and can be extended to include
features such as structural instability, time-varying parameters, dynamic factors, threshold-crossing
behavior, and discrete outcomes. Building upon growing evidence that the assumption of linearity may
be undesirable in modeling certain macroeconomic relationships, this paper seeks to add to recent
advances in VAR modeling by proposing a nonparametric dynamic model for multivariate time series. In
this model, the problems of modeling and estimation are approached from a hierarchical Bayesian
perspective. The article considers the issues of identification, estimation, and model comparison, enabling
nonparametric VAR models to be fit efficiently by Markov chain Monte Carlo algorithms and compared
to parametric and semiparametric alternatives by marginal likelihoods and Bayes factors. Among other
benefits, the methodology allows for a more careful study of structural instability while guarding against
the possibility of unaccounted nonlinearity in otherwise stable economic relationships. Extensions of the
proposed nonparametric model to settings with heteroskedasticity and other important modeling features
are also considered. The techniques are employed to study the post-war US economy, confirming the
presence of distinct volatility regimes and supporting the contention that certain nonlinear relationships
in the data can remain undetected by standard models.
Keywords: Additive model; Vector autoregressive (VAR) model; Bayesian model comparison; Markov
chain Monte Carlo.
JEL Codes: C11, C14, C15, C32, C52, E31, E32, E37, E43, E47.
1 Introduction and Motivation
Following the seminal work of Sims (1980), the vector autoregressive (VAR) model has played a central
role in empirical macroeconomics. The basic model postulates that a $q$-dimensional vector of time-series
variables $y_t = (y_{1t}, \ldots, y_{qt})'$ depends on its past realizations through the specification
$$y_t = c + \sum_{j=1}^{p} B_j y_{t-j} + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{1}$$
where $c$ is a $q$-vector of intercepts, $\{B_j\}_{j=1}^{p}$ are $q \times q$ matrices of parameters,
$\{y_{t-j}\}_{j=1}^{p}$ are lags of $y_t$, and $\varepsilon_t$ is an error term with mean zero and
$q \times q$ covariance matrix $\Omega$. VAR models are general and can capture
sophisticated dynamic behavior, even when the lag length $p$ is relatively small. Moreover, VAR models are
very versatile: in the past few decades they have been adapted to incorporate structural instability, regime
switching, time-varying parameters, dynamic factors, threshold-crossing behavior, and discrete data, among
others. Consequently, VAR methodology has been an important instrument in policy analysis, forecasting,
and academic discourse (for a recent review, see Koop and Korobilis, 2009).
Department of Economics, University of California, Irvine, 3151 Social Science Plaza, Irvine, CA 92697-5100. E-mail:
ivan@uci.edu. I am grateful to Tom Fomby, Lutz Killian, Anthony Murphy, two anonymous referees, and my colleagues Dale
Poirier, David Brownstone, Fabio Milani, and especially Angela Vossmeyer, for their careful comments on earlier drafts.
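As a concrete illustration of the linear specification in (1), the following minimal sketch simulates a bivariate VAR with one lag using only the Python standard library. The parameter values, the choice $\Omega = I$, and the function name `simulate_var1` are assumptions made for illustration and do not come from the paper.

```python
import random

def simulate_var1(c, B, T, seed=0):
    """Simulate y_t = c + B y_{t-1} + eps_t with eps_t ~ N(0, I) (assumed Omega = I)."""
    rng = random.Random(seed)
    q = len(c)
    y_prev = [0.0] * q
    path = []
    for _ in range(T):
        y_t = [c[i] + sum(B[i][j] * y_prev[j] for j in range(q)) + rng.gauss(0.0, 1.0)
               for i in range(q)]
        path.append(y_t)
        y_prev = y_t
    return path

# Illustrative (hypothetical) parameters: q = 2, p = 1, stable dynamics
c = [1.0, 0.5]
B = [[0.5, 0.1],
     [0.0, 0.3]]
path = simulate_var1(c, B, T=20000)
sample_mean = [sum(row[i] for row in path) / len(path) for i in range(2)]

# Unconditional mean (I - B)^{-1} c, solved by hand using the fact that B[1][0] == 0
mu2 = c[1] / (1.0 - B[1][1])
mu1 = (c[0] + B[0][1] * mu2) / (1.0 - B[0][0])
print("sample mean:", sample_mean, "implied mean:", [mu1, mu2])
```

With stable dynamics the sample mean of a long simulated path should settle near the implied unconditional mean, which gives a quick sanity check on the recursion.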
While many extensions of the model in (1) are possible, it has been common practice to maintain a
parametric, typically linear, functional form for the conditional mean of $y_t$ given its lags. Important
extensions of the basic setup are afforded by models in which the parameters are allowed to change over
time as in regime switching, changepoint, time-varying parameter, and threshold models.$^1$ For instance,
following Hamilton
(1989), much work has been done on estimating models subject to regime shifts in the mean, variance, or
dynamics (e.g., Hansen, 1992; Chib, 1996; Chauvet, 1998; Kim and Nelson, 1999; Kim et al., 2005; Sims
and Zha, 2006). Threshold regressions have been considered in Beaudry and Koop (1993), Potter (1995),
and Pesaran and Potter (1997), while time-varying parameter applications have been examined in Canova
(1993), Stock and Watson (1996), Cogley and Sargent (2001), Primiceri (2005), and Chan and Jeliazkov
(2009), among others. Much more rare in time series analysis has been the application of nonparametric
methods in the modeling and estimation of the conditional mean of a time series process, and this represents
the basic econometric problem motivating this work.
$^1$ In these cases, the modeling involves an additional state variable that can be latent or observed; conditionally on the state, the
models are linear, whereas marginalization over the state yields piecewise linear regressions or mixtures of linear regressions.
The discussion in this paper is primarily concerned with methods for allowing considerable flexibility
in estimating the dependence on lags in VAR models, while maintaining simplicity, computational
tractability, and accommodating other modeling features that may be present in the model. The VAR paradigm
came to prominence as a methodological framework involving only minimal restrictions. Consequently,
continuing in this tradition, this paper discusses ways of estimating multivariate dynamic systems without
assuming a priori knowledge of the functional form. Even though the nonparametric literature is vast and
diverse, applications to time series have been limited despite their potential appeal and importance. In an
application to exchange rates, Härdle et al. (1998) use local polynomial methods based on kernel-weighted
least squares to estimate a nonparametric bivariate dynamic system in which both the conditional mean and
variance are unknown functions of the past. The methodology in that paper relates to single-equation
techniques and exchange rate applications considered in Härdle and Tsybakov (1997) and Yang et al. (1999).
Univariate nonparametric regressions have been used in Dahl and Gonzalez-Rivera (2003a) to study the
evolution of U.S. GNP growth using the method of Hamilton (2001), who employed the techniques to
address nonlinearity in the inflation-unemployment trade-off in an example involving the Phillips Curve. Dahl
and Gonzalez-Rivera (2003b) apply the methodology to study the evolution of industrial production for a
subsample of OECD countries. Their results support the contention that much nonlinearity is neglected if
standard linear models are applied in these settings.
These and other papers have provided a growing body of evidence that allowing for various kinds of
nonlinearity in empirical macroeconomics can be very valuable in uncovering important features of time-
series relationships. Building upon these advances, this paper seeks to add to the literature by examining
a nonparametric dynamic model for multivariate time series. Specifically, this paper considers a dynamic
system of $q$ regression equations for data $\{y_t\}_{t=1}^{T}$, where $y_t = (y_{1t}, \ldots, y_{qt})'$,
in which the $i$th equation ($i = 1, \ldots, q$) is modeled through the additive form (Hastie and Tibshirani, 1990)
$$y_{it} = \sum_{j=1}^{q} \sum_{k=1}^{p} g_{ijk}(y_{j,t-k}) + \varepsilon_{it}, \qquad t = 1, \ldots, T, \tag{2}$$
where $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{qt})' \sim N(0, \Omega)$ and the unknown
functions $\{g_{ijk}(y_{j,t-k})\}$ will be modeled and
estimated nonparametrically. For this reason, in the remainder of this paper, the specification in (2) will be
referred to as a nonparametric VAR (or NPVAR) model. The model in (2) provides a natural extension of
the traditional linear VAR model in (1): relative to its parametric counterpart, the NPVAR model maintains
additivity but does not require that the estimated regression relationships lie in a particular class of functions.
Functional flexibility is desirable because nonlinearity is common in both economic theory and practice.
Moreover, nonparametric additive modeling has desirable practical and theoretical properties and can serve
as a useful exploratory tool that is easily inserted in more complex models. Even though the estimation
of unknown functions is a complex high-dimensional problem, the additive framework is well suited for
dealing with the “curse of dimensionality” because the argument of each function is a single variable.
The specification of the NPVAR model will be approached from a hierarchical Bayesian perspective
with special emphasis on the issues of identification, estimation, and model comparison, enabling NPVAR
models to be fit efficiently by Markov chain Monte Carlo (MCMC) algorithms and compared to nested
and non-nested parametric and semiparametric alternatives by marginal likelihoods and Bayes factors. The
methodology is useful in its own right as an exploratory and modeling tool, but is also appealing because
it enables a more careful study of other structural features while guarding against the possibility of
unaccounted nonlinearity. Doing so is important for theoretical and practical reasons, and because the
consequences of ignored nonlinearity can be severe.
The types of misspecification that arise from assuming an inappropriate functional form can be
illustrated by considering two simple motivating examples. Imagine that data are generated from the model
$y_t = g(x_t) + \varepsilon_t$, $\varepsilon_t \sim N(0, \sigma^2)$, and $g(\cdot)$ is the nonlinear function
in panel (a) of Figure 1. If estimation
is by linear regression (the resulting regression line is also shown in panel (a) of the figure), it is easy
to see that the regression residuals will be heteroskedastic, owing to the neglected nonlinearity in $g(\cdot)$. If
the covariate $x_t$ is a lag of $y_t$, the misspecification can also lead to erroneous findings of serial correlation
in the errors. Furthermore, even though the original errors used to generate the data were Gaussian, in a
linear regression they will appear non-Gaussian (see panel (b) of Figure 1). Due to the omitted nonlinearity,
the error distribution will be a location mixture of normals, and consequently one would conclude that the
Gaussian assumption is inadequate when the real culprit is neglected nonlinearity. Note that these problems
will not be resolved by using estimators that are robust to distributional misspecification.
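The first motivating example can be reproduced numerically. The sketch below is only an illustration under stated assumptions: the function $g(x) = (x - 1)^2$, the error standard deviation 0.1, the sample size, and the three-bin summary are all hypothetical choices, not the design behind Figure 1. It fits a straight line by OLS and summarizes the residuals by region of $x$.

```python
import random

rng = random.Random(0)
n = 3000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
# Assumed nonlinear mean g(x) = (x - 1)^2 with Gaussian errors, sigma = 0.1
y = [(xi - 1.0) ** 2 + rng.gauss(0.0, 0.1) for xi in x]

# Misspecified fit: OLS of y on a constant and x (closed form)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

def bin_stats(lo, hi):
    """Mean and variance of the OLS residuals for observations with lo <= x < hi."""
    r = [ri for xi, ri in zip(x, resid) if lo <= xi < hi]
    mean = sum(r) / len(r)
    var = sum((ri - mean) ** 2 for ri in r) / len(r)
    return mean, var

left = bin_stats(0.0, 2.0 / 3.0)
middle = bin_stats(2.0 / 3.0, 4.0 / 3.0)
right = bin_stats(4.0 / 3.0, 2.0 + 1e-9)
for name, (mean, var) in [("left", left), ("middle", middle), ("right", right)]:
    print(f"{name:>6}: residual mean {mean:+.3f}, residual variance {var:.3f}")
```

Although the generated errors are homoskedastic and Gaussian, the residual mean changes sign across regions and the residual spread is much larger near the edges than in the middle, which is the pattern that would be misread as heteroskedasticity or non-Gaussianity.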
Figure 1: Ignored nonlinearity can lead to erroneous conclusions about the presence of heteroskedasticity or
autocorrelation, the adequacy of the distributional assumptions, the structural stability of the regression, or
lead to conclusions that more profligate models are required. [Panels (a), (b), and (c).]
For our second example, consider panel (c) of Figure 1. In this case, imagine that the researcher is
aware that the data generating process involves a nonlinear mean, but chooses to restrict attention to the
class of piecewise linear polynomials. Although heteroskedasticity, autocorrelation, and non-Gaussianity
may not be significant (or even discernable) problems if a bilinear model is fit to the particular data in panel
(c) of the Figure, one can erroneously conclude that there is evidence of structural instability. For instance,
regime switching, changepoint, threshold, and time-varying parameter models may appreciably improve the
fit relative to a linear model, although one should bear in mind that these would be spurious findings of
instability or structural breaks since the underlying data generating process is a stable, although nonlinear,
function of the covariates that is not properly accommodated in the regressions. Such spurious instability,
unfortunately, is not the only pitfall that can be induced by this type of misspecification. The poor fit of low
order linear dynamic systems may also lead researchers to explore more profligate models involving more
lags. This would lead to loss of parsimony as additional lag components are incorporated but simply act as
atheoretical fitting parameters in the model.
These examples provide strong motivation for studying the NPVAR model because the problems they
identify cannot be addressed satisfactorily without directly addressing the flexibility of the functional form.
This, of course, is not to say that features such as heteroskedasticity, autocorrelation, non-Gaussianity,
or structural instability cannot be present in nonlinear models. On the contrary, they can be important
integral parts of the NPVAR model, and many such extensions will be considered in Section 5 and the
application in Section 6. However, the examples do suggest that before jumping to conclusions about the
presence of any of the aforementioned features, one must ensure that they are not spuriously induced by
functional form misspecification. To enable this task to be carried out, this paper provides methodology for
the specification, estimation, and comparison of nonparametric models, which can be useful in this pursuit,
as demonstrated in a study of U.S. macroeconomic data. The application reveals that the NPVAR model
supports the existence of distinct volatility regimes in the data, and provides evidence that means remain
stable but exhibit interesting nonlinearities.
The remainder of this article deals with the hierarchical structure of NPVAR models and their
implementation in practice. Specifically, Section 2 presents the specification of the NPVAR model together with a
computationally convenient identification restriction on the unknown additive functions. Section 3 presents
an efficient fitting algorithm based on MCMC simulation techniques, which subsumes frequentist
estimation by backfitting as a special case. Section 4 addresses the problem of model comparison and model
averaging by discussing the computation of marginal likelihoods and Bayes factors. Section 5 outlines
extensions to settings with Student-t errors, structural instability, heteroskedasticity or
stochastic volatility, dynamic factors, and discrete outcomes, and provides references to the relevant
literature. Section 6 considers the application of the NPVAR model to data for the post-war US economy, whereas
Section 7 offers concluding remarks.
2 Hierarchical Model Specification
The NPVAR model will be specified through the following distributional hierarchy. The likelihood function
$f(y \mid \{g_{ijk}(\cdot)\}, \Omega)$ obtained from the additive model in (2) will be augmented with a prior
distribution (or model) for each unknown function, $\pi(g_{ijk}(\cdot) \mid \tau^2_{ijk})$, that will, in turn,
depend on a hyperparameter $\tau^2_{ijk}$.
The hierarchy will be completed by the priors on $\tau^2_{ijk}$ and $\Omega$, denoted by $\pi(\tau^2_{ijk})$
and $\pi(\Omega)$, respectively.
The prior $\pi(g_{ijk}(\cdot) \mid \tau^2_{ijk})$ is often referred to as a “smoothness prior” because it aims at penalizing rough
functions but does not absolutely rule out any values that the function can take. The prior is very similar to
the roughness penalty in frequentist penalized likelihood estimation. The parameters $\tau^2_{ijk}$ are often called
smoothness parameters because they control the degree of smoothness of $g_{ijk}(\cdot)$ in
$\pi(g_{ijk}(\cdot) \mid \tau^2_{ijk})$. Details
will be provided in the remainder of this section, but it is important to note that the methodology leans on
a vast literature in Bayesian nonparametric estimation in a variety of areas with continuous, discrete, and
censored responses, including cross-sectional settings (Besag et al., 1995; Wood and Kohn, 1998; Hastie and
Tibshirani, 2000; Fahrmeir and Lang, 2001; Wood et al., 2002; Koop and Poirier, 2004), multiple equation
systems (Smith and Kohn, 2000; Holmes et al., 2002; Koop et al., 2005), panel data (Chib and Jeliazkov,
2006), sample selection models (Chib and Greenberg, 2007; Chib et al., 2009), and time series applications
(Hamilton, 2001). Extensions to Bayesian models with free-knot splines have been pursued in Denison et al.
(1998) and DiMatteo et al. (2001), while Bayesian estimation techniques for multivariate functions have
been provided in Shively et al. (1999) and Wood et al. (2002). Useful reviews and introductions to many
aspects of nonparametric modeling can be found in Hastie and Tibshirani (1990), Denison et al. (2002),
Koop (2003), Ruppert et al. (2003), Wasserman (2006), and Ahamada and Flachaire (2010).$^2$ Nonparametric
functional modeling has appealing frequentist and Bayesian properties, and many of its advantages have
been illustrated in the aforementioned works.
$^2$ Interested readers are referred to these books for further details on a rich variety of nonparametric modeling approaches such
as truncated polynomials, radial basis functions, neural networks, regression trees, wavelets, kernel smoothing, locally weighted
polynomials, B-splines, etc., many of which are beyond the scope of this paper.
Some simplification in the notation can be obtained by denoting the $r = qp$ lagged variables on the right
hand side of the $i$th equation ($i = 1, \ldots, q$) in (2) by $\{s_{ijt}\}_{j=1}^{r}$ and writing that equation as
$$y_{it} = g_{i1}(s_{i1t}) + \ldots + g_{ir}(s_{irt}) + \varepsilon_{it}, \qquad (t = 1, \ldots, T). \tag{3}$$
Then, to motivate the hierarchical model for the functions, it is useful to stack the observations and write
the model in matrix notation. Let $y_i = (y_{i1}, \ldots, y_{iT})'$,
$\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$, and for each of the $j = 1, \ldots, r$
functions in (3), let the $T$ observations in the covariate vectors $s_{ij} = (s_{ij1}, \ldots, s_{ijT})'$ determine the
corresponding $m_j \times 1$ design point vectors $v_{ij} = (v_{ij1}, \ldots, v_{ijm_j})'$ with entries equal to the
unique ordered values of $s_{ij}$, that is $v_{ij1} < \ldots < v_{ijm_j}$. Let the corresponding function
evaluation vectors be denoted by $g_{ij} = (g_{ij}(v_{ij1}), \ldots, g_{ij}(v_{ijm_j}))'$. Then, stacking over
time, the $i$th equation of the system can be written in matrix notation as
$$y_i = Q_{i1} g_{i1} + Q_{i2} g_{i2} + \ldots + Q_{ir} g_{ir} + \varepsilon_i, \tag{4}$$
where $Q_{ij}$ are $T \times m_j$ incidence matrices with entries $Q_{ij}(h, k) = 1$ if $s_{ijh} = v_{ijk}$ and 0 otherwise, which
establishes the correspondence between $s_{ij}$ and $v_{ij}$. Note that because there may be repeating values in $s_{ij}$,
we have that $m_j \leq T$ for $j = 1, \ldots, r$. Since all rows of $Q_{ij}$ contain a single 1, row $t$ of the
product $Q_{ij} g_{ij}$ is given by $g_{ij}(s_{ijt})$.
The idea behind nonparametric modeling is to view the function evaluations in each $g_{ij}$ as the realization
of a stochastic process which controls the degree of local variation between neighboring elements. Despite
differing theoretical foundations and assumptions, the vectors of function evaluations $g_{ij}$ can eventually be
written, in a wide range of nonparametric modeling approaches, as random fields of the form
$$g_{ij} \mid \tau^2_{ij} \sim N\left(g_{ij0}, \tau^2_{ij} K_{ij}^{-1}\right), \qquad j = 1, \ldots, r, \tag{5}$$
where $\tau^2_{ij}$ is a smoothness parameter and $K_{ij}$ is a matrix whose structure will be discussed shortly. From a
Bayesian perspective, equation (5) can be viewed as a smoothness prior for $g_{ij}$, whereas from a frequentist
perspective, it is often viewed as a roughness penalty term in penalized likelihood estimation (Wahba, 1978).
In either case, the goal of the modeling is to introduce a penalty to local variation between successive
elements of $g_{ij}$, without absolutely ruling out any possible value that the elements of $g_{ij}$ can take.
The focus in this paper is on models involving banded precision matrices $K_{ij}$, i.e., matrices which
have non-zero elements only in small bands around the main diagonal. Matrix bandedness is a feature
that significantly reduces the computational costs and makes the analysis of high-dimensional problems
feasible and inexpensive. This paper will examine the construction of $g_{ij0}$ and $K_{ij}$ in (5) for a class of
smoothness priors, which are conceptually simple and easily adaptable, can approximate unknown functions
arbitrarily well, and have been widely used (see, for example, Poirier, 1973; Shiller, 1984; Besag et al.,
1995; Fahrmeir and Lang, 2001; Koop and Poirier, 2004; Koop et al., 2005; Chib and Jeliazkov, 2006; Chib
et al., 2009). The roots of this method can be traced back to Whittaker (1923), and its relationship with
state space models has been discussed in Chan and Jeliazkov (2009). It should be noted that despite the
focus on a specific smoothness prior, the estimation methodology described in this paper is generic and can
be applied with various modeling approaches for the unknown functions, such as splines (Poirier, 1973;
Shiller, 1984), B-splines (Silverman, 1985), wavelets (Denison et al., 2002), or the approach of Hamilton
(2001). Although bandedness of the precision matrix is a useful characteristic of many of the preceding
nonparametric approaches, it is not a feature of other popular modeling methods, e.g. regression splines or
integrated Wiener process priors, which may be more computationally intensive in high-dimensional cases.
Because the modeling follows identical steps for each of the functions, for the time being we can
simplify notation by suppressing the $ij$ subscripts that denote the equation and function numbers. With this
convention, a Markov process prior views the elements of
$g = (g(v_1), \ldots, g(v_m))' \equiv (g_1, \ldots, g_m)'$ as a
stochastic process observed at the unique and ordered values in $v$. Specifically, letting
$h_t \equiv v_t - v_{t-1}$, a first-order Markov process prior can be defined as
$$g_t = g_{t-1} + u_t, \tag{6}$$
while a second-order Markov process prior is given by
$$g_t = \left(1 + \frac{h_t}{h_{t-1}}\right) g_{t-1} - \frac{h_t}{h_{t-1}} g_{t-2} + u_t, \tag{7}$$
where $u_t \sim N(0, \tau^2 h_t)$ and $\tau^2$ is a smoothness parameter, such that small values of $\tau^2$ produce smoother
functions, while larger values allow the function to be more flexible and interpolate the data more closely.
The weights $h_t$ adjust the variance to account for possibly irregular spacing between consecutive points in
each design vector; the one given here implies that the variance grows linearly with the distance $h_t$, although
other weights are also possible. A distribution for the initial states of the stochastic process is necessary in
order to complete the specification of the smoothness prior. For example, for the first-order prior, the initial
state can be modeled as
$$g_1 \sim N\left(g_{10}, \tau^2 G_{10}\right), \tag{8}$$
whereas in the second-order case, we have
$$\begin{pmatrix} g_1 \\ g_2 \end{pmatrix} \Big| \, \tau^2 \sim N\left(\begin{pmatrix} g_{10} \\ g_{20} \end{pmatrix}, \tau^2 G_0\right), \tag{9}$$
where $G_0$ is a $2 \times 2$ symmetric positive definite matrix. The prior on the initial conditions in (8) and (9)
is very important because it induces a proper prior on the remaining observations (see Chib and Jeliazkov,
2006). Specifically, equation (6), starting with the initial condition in (8), implies a penalty on abrupt
jumps between successive function evaluations, whereas (7), starting with (9), induces a more general prior
on linear functions of $v_j$ that is conceptually similar to the usual priors placed on the intercept and slope
parameters in linear regression. This can be seen more precisely by iterating (7) in expectation (to eliminate
$u_t$ which is the source of the nonlinearity), starting with initial states in (9).
The interpretability of the directed Markovian structure of the priors specified by (6)–(9) is a convenient
aspect of this approach; it also leads to an equivalent undirected representation that is used in
deriving the random field version of the smoothness prior in (5). This can be shown by recognizing that
upon defining
$$H = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} G_{10} & & & \\ & h_2 & & \\ & & \ddots & \\ & & & h_m \end{pmatrix},$$
for the first-order case in equations (6) and (8), and similarly letting
$$H = \begin{pmatrix} 1 & & & & \\ & 1 & & & \\ \frac{h_3}{h_2} & -\left(1 + \frac{h_3}{h_2}\right) & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{h_m}{h_{m-1}} & -\left(1 + \frac{h_m}{h_{m-1}}\right) & 1 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} G_0 & & & \\ & h_3 & & \\ & & \ddots & \\ & & & h_m \end{pmatrix},$$
for the second-order Markov process in (7) and (9), one can write $Hg = u$, where $u \sim N(u_0, \tau^2 \Sigma)$ is used
to denote the errors in the Markov process with $u_0 = (g_{10}, 0, \ldots, 0)'$ and
$u_0 = (g_{10}, g_{20}, 0, \ldots, 0)'$ in the
first- and second-order cases, respectively. A simple change of variables technique leads to the distribution
$g \mid \tau^2 \sim N\left(g_0, \tau^2 K^{-1}\right)$, where the penalty matrix $K$ is given by
$K = H' \Sigma^{-1} H$ and $g_0 = H^{-1} u_0$. This
derivation leads to the distributions presented in (5), where the indices $i$ and $j$ are explicitly present. Note
that $g_0$ can alternatively be derived by taking recursive expectations of either (6) or (7) starting with the
mean in (8) or (9), respectively.
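The construction of $K = H'\Sigma^{-1}H$ and $g_0 = H^{-1}u_0$ can be checked on a small case. The sketch below covers only the first-order prior in (6) and (8); the design points, the values of $g_{10}$ and $G_{10}$, and the function name `first_order_prior` are hypothetical. Dense lists are used for readability, whereas a practical implementation would store only the bands.

```python
def first_order_prior(v, g10, G10):
    """Build H, the diagonal of Sigma, K = H' Sigma^{-1} H, and g0 = H^{-1} u0 (first-order case)."""
    m = len(v)
    h = [v[t] - v[t - 1] for t in range(1, m)]      # spacings h_2, ..., h_m
    H = [[0.0] * m for _ in range(m)]
    for t in range(m):
        H[t][t] = 1.0
        if t > 0:
            H[t][t - 1] = -1.0
    sigma = [G10] + h                               # diagonal of Sigma
    K = [[sum(H[k][i] * H[k][j] / sigma[k] for k in range(m)) for j in range(m)]
         for i in range(m)]
    # Solve H g0 = u0 by forward substitution, with u0 = (g10, 0, ..., 0)'
    u0 = [g10] + [0.0] * (m - 1)
    g0 = []
    for t in range(m):
        g0.append(u0[t] + (g0[t - 1] if t > 0 else 0.0))
    return H, sigma, K, g0

v = [0.0, 0.5, 1.5, 2.0]                            # irregularly spaced design points
H, sigma, K, g0 = first_order_prior(v, g10=1.0, G10=4.0)
for row in K:
    print(row)                                      # non-zero entries only on three diagonals
print("g0 =", g0)                                   # constant prior mean equal to g10
```

In this example the penalty matrix is tridiagonal and the prior mean is flat at $g_{10}$, which matches what recursive expectations of (6) starting from (8) would give.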
Two key features of the class of priors are that (i) they are proper, which allows for formal Bayesian
model selection, and (ii) the $m \times m$ penalty matrices $K$ are banded, which is of considerable convenience,
as manipulations involving such matrices take $O(m)$ operations, rather than the usual $O(m^3)$ operations
for inversions and determinant computations, or $O(m^2)$ operations for multiplication by a vector. Given
that m may potentially be as large as the total number of observations T in the sample, this has important
ramifications for the numerical efficiency.
Since the priors on $\{g_{ij}\}$ are defined conditionally on the hyperparameters $\{\tau^2_{ij}\}$, the hierarchical
structure of the model is completed by specifying the prior distributions
$\tau^2_{ij} \sim IG(\nu_{ij0}/2, \delta_{ij0}/2)$. Similarly,
the prior distribution on the covariance matrix $\Omega$ is taken as $\Omega^{-1} \sim W(r_0, R_0)$. In setting these priors
it is generally very helpful to consider their mapping to the mean and variance of the inverse gamma and
Wishart distributions (see Gelman et al., 2003, App. A), as the choice of these parameters plays a role in
determining the trade-off between smoothness and goodness of fit. An example of how different settings of
the prior parameters can lead to over- or under-smoothing is presented in Chib and Jeliazkov (2006).
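For the inverse gamma part of this mapping, a small helper can translate between the hyperparameters and the implied prior moments. This is a sketch under the standard shape and scale parameterization, in which $IG(a, b)$ has mean $b/(a-1)$ for $a > 1$ and variance $b^2/((a-1)^2(a-2))$ for $a > 2$, with $a = \nu/2$ and $b = \delta/2$; the function names are hypothetical.

```python
def ig_moments(nu, delta):
    """Mean and variance of IG(nu/2, delta/2); the variance requires nu > 4."""
    mean = delta / (nu - 2.0)
    var = 2.0 * delta ** 2 / ((nu - 2.0) ** 2 * (nu - 4.0))
    return mean, var

def ig_hyperparameters(mean, var):
    """Invert the moment formulas: return (nu, delta) giving the requested prior mean and variance."""
    a = mean ** 2 / var + 2.0       # shape a = nu / 2
    b = mean * (a - 1.0)            # scale b = delta / 2
    return 2.0 * a, 2.0 * b

nu0, delta0 = ig_hyperparameters(mean=1.0, var=1.0)   # gives nu0 = 6, delta0 = 4
print(nu0, delta0, ig_moments(nu0, delta0))
```

A smaller prior mean for $\tau^2$ pushes toward smoother functions, while a larger prior variance lets the data play a bigger role in determining the degree of smoothing.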
Before we can focus on estimating the model, we must address the likelihood identification problem
that emerges due to the additive structure in (3). Because the likelihood will remain unchanged if we
simultaneously replace $g_j(\cdot)$ with $g_j(\cdot) + \alpha$ and $g_k(\cdot)$ with $g_k(\cdot) - \alpha$
for some $k \neq j$, it is obvious that neither an intercept,
nor the level of the individual functions is likelihood identified. This can also be seen by recognizing that
all rows of every incidence matrix $Q_j$ in (4) sum to 1, leading to perfect multicollinearity because model (4)
can be thought of as a saturated dummy variable model. Therefore, the functions must be appropriately
“anchored” in order to achieve likelihood identification.
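The lack of likelihood identification is easy to verify numerically. In the following sketch (hypothetical incidence matrices and function values, two additive components), shifting one function up by $\alpha$ and the other down by $\alpha$ leaves every fitted mean unchanged, so the likelihood cannot distinguish the two configurations.

```python
def fitted(Q1, g1, Q2, g2):
    """Conditional mean Q1 g1 + Q2 g2 of an additive model with two components."""
    return [sum(a * b for a, b in zip(r1, g1)) + sum(a * b for a, b in zip(r2, g2))
            for r1, r2 in zip(Q1, Q2)]

# Hypothetical incidence matrices for T = 4 observations
Q1 = [[1, 0], [0, 1], [1, 0], [0, 1]]
Q2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
g1 = [0.5, -0.5]
g2 = [1.0, 2.0, 3.0]

alpha = 7.0
base = fitted(Q1, g1, Q2, g2)
shifted = fitted(Q1, [x + alpha for x in g1], Q2, [x - alpha for x in g2])
print(base, shifted)                       # identical fitted means

# The columns of each incidence matrix add up to a vector of ones: perfect collinearity
print([sum(row) for row in Q1], [sum(row) for row in Q2])
```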
It is well known that Bayesian models with proper priors do not suffer from identification problems
even when the likelihood is not identified (Lindley, 1971; Poirier, 1998). However, because the
nonparametric components of additive models are correlated by construction (since they enter the mean function
additively), likelihood identification is essential for providing a model with well-behaved conditional
posterior distributions that will produce quickly-mixing MCMC algorithms for efficient posterior sampling. To
achieve likelihood identification, I remove free constants in the likelihood by employing the identification
restrictions proposed in Jeliazkov (2011). The approach formally identifies the model by centering the
functions in the likelihood and integrating out, instead of holding fixed, any unidentified quantities that enter
the specification.$^3$ Such quantities are marginalized out with respect to a proper prior that is of no relevance
in the likelihood and does not affect marginal likelihood estimation, thus relating this approach to the idea
of marginal data augmentation discussed in Meng and van Dyk (1999), van Dyk and Meng (2001), and Imai
and van Dyk (2005).
One approach to identification is to remove the free constants in the likelihood by restricting $r - 1$ of
the functions $\{g_{ij}\}$ in each equation to start at zero (e.g., Shively et al., 1999; Koop et al., 2005; Chib et al.,
2009). While this approach is quite natural as it corresponds to creating a baseline category in dummy
variable models, in the context of nonparametric regression it tends to produce funnel-shaped error bands
for the function estimates due to the identification restriction. Consequently, the information content in the
data can be confounded with the repercussions of the identification restriction, so that narrower bands need
not correspond to regions with more data or better identification of the function.
Another possibility for anchoring the functions is to consider the following version of (4)
$$y_i = Q_{i1} g_{i1} + Q_{i2} M_{02} g_{i2} + \ldots + Q_{ir} M_{0r} g_{ir} + \varepsilon_i, \tag{10}$$
where
$$M_{0j} = I_{m_j} - \frac{1_{m_j} 1_{m_j}'}{m_j}, \qquad j = 1, \ldots, r,$$
are $m_j \times m_j$ symmetric and idempotent mean-differencing matrices (Hastie and Tibshirani, 1990; Lin and
Zhang, 1999). Unfortunately, this identification scheme does not lend itself to computationally efficient
posterior simulation and, as pointed out by Gelfand (2000), it has been applied in ways that do not correspond
to well-defined Bayesian models, with centering typically introduced “on the fly” merely as a step in the
fitting algorithm.
$^3$ It will be sufficient to apply this centering to $r - 1$ of the unknown functions in each equation, allowing the overall intercept to
be absorbed in the remaining function.
For these reasons, this paper employs the closely related, yet computationally very distinct, identification
scheme presented in Jeliazkov (2011), where the additive functions are identified through
$$y_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \ldots + M_0 Q_{ir} g_{ir} + \varepsilon_i, \tag{11}$$
where the $T \times T$ symmetric and idempotent mean-differencing matrix
$$M_0 = I_T - \frac{1_T 1_T'}{T}$$
now pre-multiplies the incidence matrices $\{Q_{ij}\}$ and centers the expanded vector of functional evaluations.
The benefits from this identification method are discussed next.
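The difference between the centering in (10) and the centering in (11) shows up as soon as the covariate has repeated values. The sketch below uses hypothetical data and helper names, and applies the mean-differencing matrices implicitly by subtracting a mean rather than forming them.

```python
def center(x):
    """Apply a mean-differencing matrix I - 11'/len(x) to x without forming the matrix."""
    mean = sum(x) / len(x)
    return [xi - mean for xi in x]

def expand(g, v, s):
    """Compute Q g: map the m function evaluations to the T observations."""
    position = {val: k for k, val in enumerate(v)}
    return [g[position[st]] for st in s]

s = [0.3, 1.2, 0.3, 0.7, 0.3]      # hypothetical covariate with repeated values
v = sorted(set(s))                  # design points [0.3, 0.7, 1.2]
g = [1.0, 2.0, 6.0]                 # hypothetical function evaluations at v

scheme_10 = expand(center(g), v, s)         # Q M_0j g, as in (10)
scheme_11 = center(expand(g, v, s))         # M_0 Q g, as in (11)
shifted_11 = center(expand([gk + 5.0 for gk in g], v, s))

print(scheme_10, sum(scheme_10))            # centered over design points only
print(scheme_11, sum(scheme_11))            # sums to zero over the T observations
```

Under (11) the expanded component sums to zero over the sample and is unaffected by adding a constant to the function, which is how the free level is removed from the likelihood.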
3 Estimation
To motivate the general approach, this section begins by considering the important special case of a single
equation univariate regression model. Given data $\{y_t, s_t\}_{t=1}^{T}$, the scalar responses $y_t$ are assumed to depend
on the (scalar) covariate $s_t$ according to
$$y_t = g(s_t) + \varepsilon_t, \qquad (t = 1, \ldots, T), \tag{12}$$
where $\varepsilon_t \sim N(0, \sigma^2)$, and $g(\cdot)$ is an unknown smooth function. The model in (12) can be written in stacked
form as
$$y = Qg + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I), \tag{13}$$
where $Q$ is the incidence matrix defined after equation (4). The Gaussian likelihood implied by (13),
combined with the Gaussian smoothness prior in (5) for either a first- or second-order process and the
inverse gamma priors $\tau^2 \sim IG(\nu_0/2, \delta_0/2)$ and $\sigma^2 \sim IG(s_0/2, d_0/2)$, yields full-conditional distributions
which are conjugate, i.e. they are in the same family as the priors (see, e.g., Koop, 2003; Greenberg,
2008). Sequential sampling from those full-conditional distributions lays the foundations for the following
algorithm.
Algorithm 1 Univariate Gaussian Nonparametric Model: MCMC Implementation
1. Sample $[g \mid y, \tau^2, \sigma^2] \sim N(\hat{g}, G)$, where $G$ and $\hat{g}$ are the usual Bayes updates for linear regression, namely $G = \left( K/\tau^2 + Q'Q/\sigma^2 \right)^{-1}$ and $\hat{g} = G \left( K g_0/\tau^2 + Q'y/\sigma^2 \right)$. Remark 1 presents important notes on the sampling in this step.
2. Sample $[\tau^2 \mid g] \sim IG\left( \frac{\nu_0 + m}{2}, \frac{\delta_0 + (g - g_0)' K (g - g_0)}{2} \right)$, where conditionally on $g$, $\tau^2$ is independent of the remaining parameters and the data.
3. Sample $[\sigma^2 \mid y, g] \sim IG\left( \frac{s_0 + n}{2}, \frac{d_0 + (y - Qg)'(y - Qg)}{2} \right)$.
While steps 2 and 3 of Algorithm 1 are fairly straightforward, step 1 requires careful consideration
because the quantities involved there can be of dimension as high as the sample size n. For this reason,
estimation is performed as follows (see Fahrmeir and Lang, 2001).
Remark 1 Sampling of g. To sample $g$, note that $Q'Q$ is a diagonal matrix whose $t$-th diagonal entry equals the number of values in $s$ corresponding to the design point $v_t$. Since $K$ and $Q'Q$ are banded, $G^{-1}$ is banded as well. Thus sampling of $g$ need not include an inversion to obtain $G$ and $\hat{g}$. The mean $\hat{g}$ is found instead by solving $G^{-1} \hat{g} = \left( K g_0/\tau^2 + Q'y/\sigma^2 \right)$, which is done in $O(T)$ operations by back substitution. Also, let $P'P = G^{-1}$, where $P$ is the Cholesky decomposition of $G^{-1}$ and is also banded. To obtain a random draw from $N(\hat{g}, G)$ efficiently, sample $u \sim N(0, I)$, and solve $Pw = u$ for $w$ by back substitution. It follows that $w \sim N(0, G)$. Adding the mean $\hat{g}$ to $w$, one obtains a draw $g \sim N(\hat{g}, G)$.
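The three steps of Algorithm 1, together with the Cholesky-based draw in Remark 1, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the paper's implementation: the penalty matrix built in `second_order_precision` is a hypothetical stand-in for the $K$ implied by the prior in (5), the prior hyperparameters and simulated data are made up, and dense solvers are used for clarity where a banded solver would be needed to obtain the $O(T)$ cost discussed in the text.

```python
import numpy as np

def second_order_precision(m, c=100.0):
    """Hypothetical K = H' S^{-1} H for a second-order random-walk prior."""
    H = np.eye(m)
    for t in range(2, m):
        H[t, t - 1] = -2.0
        H[t, t - 2] = 1.0
    s_inv = np.ones(m)
    s_inv[:2] = 1.0 / c                     # diffuse prior on the two initial states
    return H.T @ (s_inv[:, None] * H)

def draw_g(K, QtQ, Qty, g0, tau2, sigma2, rng):
    """Step 1: draw g ~ N(g_hat, G) without forming G explicitly."""
    G_inv = K / tau2 + QtQ / sigma2
    g_hat = np.linalg.solve(G_inv, K @ g0 / tau2 + Qty / sigma2)
    P = np.linalg.cholesky(G_inv).T         # upper triangular, P'P = G^{-1}
    w = np.linalg.solve(P, rng.standard_normal(len(g0)))  # w ~ N(0, G)
    return g_hat + w

def gibbs_univariate(y, Q, K, g0, n_iter=1000, burn=200, seed=0,
                     nu0=5.0, delta0=1e-3, s0=5.0, d0=0.5):
    rng = np.random.default_rng(seed)
    n, m = Q.shape
    QtQ, Qty = Q.T @ Q, Q.T @ y
    tau2, sigma2 = 0.01, 1.0                # arbitrary starting values
    g_draws, sig_draws = [], []
    for it in range(n_iter):
        g = draw_g(K, QtQ, Qty, g0, tau2, sigma2, rng)
        dev = g - g0
        # Steps 2 and 3: an IG(a, b) draw is 1 / Gamma(shape=a, scale=1/b)
        tau2 = 1.0 / rng.gamma((nu0 + m) / 2, 2.0 / (delta0 + dev @ K @ dev))
        resid = y - Q @ g
        sigma2 = 1.0 / rng.gamma((s0 + n) / 2, 2.0 / (d0 + resid @ resid))
        if it >= burn:
            g_draws.append(g)
            sig_draws.append(sigma2)
    return np.array(g_draws), np.array(sig_draws)

# Simulated example: 50 design points, 4 observations at each (assumed design)
m, reps = 50, 4
v = np.linspace(0.0, 1.0, m)
idx = np.repeat(np.arange(m), reps)
Q = np.zeros((m * reps, m))
Q[np.arange(m * reps), idx] = 1.0
g_true = np.sin(2 * np.pi * v)
y = Q @ g_true + 0.3 * np.random.default_rng(1).standard_normal(m * reps)

g_draws, sig_draws = gibbs_univariate(y, Q, second_order_precision(m), np.zeros(m))
g_mean = g_draws.mean(axis=0)               # posterior mean estimate of g
```

In this sketch the inverse gamma draws use the fact that if $1/x$ is gamma distributed with shape $a$ and rate $b$, then $x \sim IG(a, b)$.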
Turning attention to the additive case, let $\theta$ denote the vector of all model parameters, i.e. the elements of $\{g_{ij}\}$, $\{\tau_{ij}^2\}$, and the unique entries of the error covariance matrix $\Omega$. Then, based on the identifying restrictions in (11) and the priors discussed in Section 2, MCMC estimation can proceed through iterative sampling of the following steps.
Algorithm 2 NPVAR Model: MCMC Implementation
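1. Sample $[g_{i1} \mid y, \theta \backslash g_{i1}] \sim N(\hat{g}_{i1}, \hat{G}_{i1})$, where
$$\hat{G}_{i1} = \left( \frac{1}{\tau_{i1}^2} K_{i1} + \frac{1}{\sigma_{i|\backslash i}^2} Q_{i1}' Q_{i1} \right)^{-1},$$
$$\hat{g}_{i1} = \hat{G}_{i1} \left( \frac{1}{\tau_{i1}^2} K_{i1} g_{i10} + \frac{1}{\sigma_{i|\backslash i}^2} Q_{i1}' \Big( y_i - \mu_{i|\backslash i} - \sum_{j=2}^{r} M_0 Q_{ij} g_{ij} \Big) \right),$$
with $\mu_{i|\backslash i} = E(\varepsilon_i \mid \varepsilon_{\backslash i})$ and $\sigma_{i|\backslash i}^2 = Var(\varepsilon_i \mid \varepsilon_{\backslash i})$. The sampling in this step is carried out efficiently in $O(T)$ operations as discussed in Remark 1.
2. Sample $[g_{ij} \mid y, \theta \backslash g_{ij}] \sim N(\hat{g}_{ij}, \hat{G}_{ij})$ for $j = 2, \ldots, r$ and $i = 1, \ldots, q$, where
$$\hat{G}_{ij} = \left( \frac{1}{\tau_{ij}^2} K_{ij} + \frac{1}{\sigma_{i|\backslash i}^2} Q_{ij}' M_0 Q_{ij} \right)^{-1}, \quad \text{and}$$
$$\hat{g}_{ij} = \hat{G}_{ij} \left( \frac{1}{\tau_{ij}^2} K_{ij} g_{ij0} + \frac{1}{\sigma_{i|\backslash i}^2} Q_{ij}' M_0 \Big( y_i - \mu_{i|\backslash i} - Q_{i1} g_{i1} - \sum_{k \geq 2, k \neq j} M_0 Q_{ik} g_{ik} \Big) \right).$$
Remark 2 below shows how the sampling in this step can be carried out efficiently in $O(T)$ operations, even though $\hat{G}_{ij}$ is not banded.
3. Sample $[\tau_{ij}^2 \mid g_{ij}] \sim IG\left( [\nu_{ij0} + m_j]/2, \; [\delta_{ij0} + (g_{ij} - g_{ij0})' K_{ij} (g_{ij} - g_{ij0})]/2 \right)$ for $i = 1, \ldots, q$, and $j = 1, \ldots, r$, where, given $g_{ij}$, $\tau_{ij}^2$ is independent of the other elements in $\theta$ and the data $y$.
4. Sample $[\Omega^{-1} \mid y, \theta \backslash \Omega] \sim W\left( r_0 + T, \; \big[ R_0^{-1} + \sum_{t=1}^{T} e_t e_t' \big]^{-1} \right)$, where $e_t$ denotes the $q \times 1$ vector of residuals in time period $t$.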
Algorithm 2 generalizes Algorithm 1 in a straightforward fashion by sampling each unknown function conditionally on the remaining ones, making simulation manageable. Importantly, however, Step 2 of Algorithm 2 involves $r - 1$ non-banded matrices in each equation, and at first glance it would appear that simulation will be very demanding. Fortunately, as shown in Jeliazkov (2011), an application of the Sherman-Morrison formula makes it possible to sample these functions efficiently. The approach is presented in greater detail in Remark 2 further below. The modularity and computational advantages of this estimation strategy can provide important benefits in a variety of settings because simulating the functions by brute force methods is not always practical owing to the algorithmic complexity of working with high-dimensional matrices. Moreover, because the frequentist backfitting approach to estimating the unknown functions can be viewed as a (non-stochastic) simplification of Gibbs sampling (see Hastie and Tibshirani, 2000), Algorithm 2 can also be useful in frequentist estimation.
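Steps 1 and 2 of Algorithm 2 condition on the errors of the other equations through the conditional mean and variance of $\varepsilon_i$ given the remaining errors. A minimal sketch of that computation, using the standard partitioned-normal formulas, is given below. The function name `conditional_moments`, the residual matrix `E`, and the example covariance matrix are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def conditional_moments(E, Omega, i):
    """Return (mu, sigma2) for equation i.

    E     : T x q array of current residuals, one column per equation
    Omega : q x q error covariance matrix
    """
    q = Omega.shape[0]
    rest = [k for k in range(q) if k != i]
    # w = Omega_{rest,rest}^{-1} Omega_{rest,i}
    w = np.linalg.solve(Omega[np.ix_(rest, rest)], Omega[rest, i])
    mu = E[:, rest] @ w                        # E(eps_i | eps_rest), length T
    sigma2 = Omega[i, i] - Omega[i, rest] @ w  # Var(eps_i | eps_rest), scalar
    return mu, sigma2

# Made-up inputs for illustration
rng = np.random.default_rng(0)
R = rng.standard_normal((4, 4))
Omega = R @ R.T + 4 * np.eye(4)                # positive definite by construction
E = rng.standard_normal((10, 4))
mu, sigma2 = conditional_moments(E, Omega, 0)
```

A useful check on such code is that the conditional variance equals the reciprocal of the corresponding diagonal element of $\Omega^{-1}$.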
Remark 2 Sampling of Centered Functions. To draw $g_{ij} \sim N(\hat{g}_{ij}, \hat{G}_{ij})$ in Step 2 of Algorithm 2, use the definition of $M_0$ to write
$$\hat{G}_{ij} = \left( \frac{1}{\tau_{ij}^2} K_{ij} + \frac{1}{\sigma_{i|\backslash i}^2} Q_{ij}' M_0 Q_{ij} \right)^{-1} = \left( \frac{1}{\tau_{ij}^2} K_{ij} + \frac{1}{\sigma_{i|\backslash i}^2} Q_{ij}' Q_{ij} - \frac{c_{ij} c_{ij}'}{\sigma_{i|\backslash i}^2 T} \right)^{-1},$$
where $c_{ij} = Q_{ij}' 1$. Letting $A_{ij} = \frac{1}{\tau_{ij}^2} K_{ij} + \frac{1}{\sigma_{i|\backslash i}^2} Q_{ij}' Q_{ij}$, $u_{ij} = \frac{1}{\sqrt{\sigma_{i|\backslash i}^2 T}} c_{ij}$, and $\lambda_{ij} = u_{ij}' A_{ij}^{-1} u_{ij}$, one can write, by the Sherman-Morrison formula,
$$\hat{G}_{ij} = \left( A_{ij} - u_{ij} u_{ij}' \right)^{-1} = A_{ij}^{-1} + \frac{A_{ij}^{-1} u_{ij} u_{ij}' A_{ij}^{-1}}{1 - \lambda_{ij}}. \tag{14}$$
Significant efficiency benefits can be derived from (14) because $\hat{g}_{ij}$ in Step 2 of Algorithm 2 can be obtained by working with $A_{ij}$ without inverting to $A_{ij}^{-1}$, as outlined in Remark 1. Furthermore, let
$$B_{ij} = A_{ij} + \frac{u_{ij} u_{ij}'}{1 - \lambda_{ij}},$$
which implies that $\hat{G}_{ij} = A_{ij}^{-1} B_{ij} A_{ij}^{-1}$. Thus, if $x \sim N(0, B_{ij})$, then $z = A_{ij}^{-1} x$ is distributed $z \sim N(0, \hat{G}_{ij})$, and a draw $g_{ij} \sim N(\hat{g}_{ij}, \hat{G}_{ij})$ is obtained as $g_{ij} = \hat{g}_{ij} + z$. To generate $x \sim N(0, B_{ij})$, draw $w_1 \sim N(0, A_{ij})$ and $w_2 \sim N(0, 1)$ and let $x = w_1 + w_2 u_{ij} / \sqrt{1 - \lambda_{ij}}$.
As a consequence of the shortcuts afforded by Remark 2, all operations are $O(T)$ rather than $O(T^3)$.
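The identities in Remark 2 are easy to check numerically. The sketch below does so on a small made-up example and then uses them to draw a centered function; the matrices `K` and `Q`, the variance values, and the helper name `draw_centered` are assumptions for illustration, and dense inverses appear only so that the direct and Sherman-Morrison expressions can be compared.

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 6, 40
tau2, sigma2 = 0.5, 2.0                      # made-up variance parameters

# Hypothetical ingredients: a positive definite K and an incidence matrix Q
R = rng.standard_normal((m, m))
K = R @ R.T + m * np.eye(m)
Q = np.zeros((T, m))
Q[np.arange(T), rng.integers(0, m, T)] = 1.0
M0 = np.eye(T) - np.ones((T, T)) / T

A = K / tau2 + Q.T @ Q / sigma2
c = Q.T @ np.ones(T)
u = c / np.sqrt(sigma2 * T)
A_inv = np.linalg.inv(A)                     # dense inverse, for checking only
lam = u @ A_inv @ u

# Three routes to the same covariance matrix
G_direct = np.linalg.inv(K / tau2 + Q.T @ M0 @ Q / sigma2)
G_sm = A_inv + np.outer(A_inv @ u, A_inv @ u) / (1.0 - lam)   # equation (14)
B = A + np.outer(u, u) / (1.0 - lam)
G_via_B = A_inv @ B @ A_inv

def draw_centered(g_hat, A, u, lam, rng):
    """Draw g ~ N(g_hat, G) using only factorizations and solves involving A."""
    L = np.linalg.cholesky(A)                # A = L L'
    w1 = L @ rng.standard_normal(len(u))     # w1 ~ N(0, A)
    w2 = rng.standard_normal()               # w2 ~ N(0, 1)
    x = w1 + w2 * u / np.sqrt(1.0 - lam)     # x ~ N(0, B)
    return g_hat + np.linalg.solve(A, x)     # z = A^{-1} x ~ N(0, G)

draw = draw_centered(np.zeros(m), A, u, lam, rng)
```

Because $A_{ij}$ is banded in the setting of the paper, the factorization and solve in `draw_centered` are the only expensive operations, which is the source of the efficiency gain described above.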
4 Model Comparison
Empirical studies must inevitably address uncertainty not only about the parameters of a given model, but
also about the model specification itself. This makes model comparison a central issue in statistical analysis.
Given a collection of models $\{M_1, \ldots, M_L\}$, the formal Bayesian approach to model comparison (or testing the validity of the alternative hypotheses captured by each model) is based on the posterior model probabilities and their ratios, the posterior odds. Specifically, for any two models $M_i$ and $M_j$, a simple application of Bayes’ theorem suggests that the posterior odds can be represented as the product of the prior odds and the ratio of the marginal likelihoods (the Bayes factor) as follows
$$\frac{\Pr(M_i \mid y)}{\Pr(M_j \mid y)} = \frac{\Pr(M_i)}{\Pr(M_j)} \times \frac{m(y \mid M_i)}{m(y \mid M_j)}.$$
In turn, for any model $M_l$, $l = 1, \ldots, L$, the marginal likelihood is given by
$$m(y \mid M_l) = \int f(y \mid \theta_l, M_l) \, \pi_l(\theta_l \mid M_l) \, d\theta_l, \tag{15}$$
which is the integral of the likelihood function $f(y \mid \theta_l, M_l)$ with respect to the prior distribution on the model parameters $\pi(\theta_l \mid M_l)$. Because in the case of nonparametric additive models the dimension of $\theta$ can be very large, it should be clear that direct analytical integration will generally be infeasible. However, this difficulty can be addressed by using the approach of Chib (1995), where after rearranging Bayes’ theorem $m(y \mid M_l)$ can alternatively be expressed as
$$m(y \mid M_l) = \frac{f(y \mid \theta_l, M_l) \, \pi(\theta_l \mid M_l)}{\pi(\theta_l \mid y, M_l)}, \tag{16}$$
so that the integral in (15) is reduced to the more tractable problem of evaluating the likelihood, prior, and posterior ordinates at a single point $\theta_l$ (e.g., the posterior mean). Because the numerator terms in (16) are available by direct calculation, the marginal likelihood can be computed by finding an estimate of the posterior ordinate $\pi(\theta_l \mid y)$.
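The identity in (16) can be verified exactly in a toy conjugate model where every term is available in closed form. The sketch below uses a normal mean model with known variance, which is an assumption made purely for illustration; in the NPVAR setting the posterior ordinate is not known analytically and must be estimated from the MCMC output.

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    """Log density of N(mean, var), evaluated elementwise."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Toy model (assumed): y_i ~ N(theta, sigma2), theta ~ N(0, prior_var)
rng = np.random.default_rng(0)
n, sigma2, prior_var = 20, 1.0, 4.0
y = 0.7 + rng.standard_normal(n)

# Conjugate posterior: theta | y ~ N(post_mean, post_var)
post_var = 1.0 / (1.0 / prior_var + n / sigma2)
post_mean = post_var * y.sum() / sigma2
theta_star = post_mean                       # evaluation point

# Equation (16) on the log scale: likelihood + prior - posterior ordinate
log_lik = log_norm_pdf(y, theta_star, sigma2).sum()
log_prior = log_norm_pdf(theta_star, 0.0, prior_var)
log_post = log_norm_pdf(theta_star, post_mean, post_var)
log_ml_chib = log_lik + log_prior - log_post

# Direct evaluation: marginally, y ~ N(0, sigma2 * I + prior_var * 1 1')
V = sigma2 * np.eye(n) + prior_var * np.ones((n, n))
_, logdet = np.linalg.slogdet(V)
log_ml_direct = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(V, y))
```

The two log marginal likelihoods agree for any choice of `theta_star`; a high-density point such as the posterior mean matters only once the posterior ordinate has to be estimated by simulation.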
In the current context, the hierarchical structure of NPVAR models allows application of (16) in two different ways. One approach relies on
$$m(y) = \frac{f\left( y \mid \{\tau_{ij}^2\}, \Omega, \{g_{ij}\} \right) \pi\left( \{\tau_{ij}^2\}, \Omega, \{g_{ij}\} \right)}{\pi\left( \{\tau_{ij}^2\}, \Omega, \{g_{ij}\} \mid y \right)},$$
where the nonparametric functions are explicitly included in the identity (see Chib and Jeliazkov, 2006; Chib et al., 2009). However, owing to the Gaussian structure of the model, the marginal likelihood can also be computed using
$$m(y) = \frac{f\left( y \mid \{\tau_{ij}^2\}, \Omega \right) \pi\left( \{\tau_{ij}^2\}, \Omega \right)}{\pi\left( \{\tau_{ij}^2\}, \Omega \mid y \right)},$$
where all quantities are marginalized over the high-dimensional blocks $\{g_{ij}\}$. This marginalization is possible because conditionally on $\{\tau_{ij}^2\}$ and $\Omega$, the density $f\left( y \mid \{\tau_{ij}^2\}, \Omega \right)$, marginalized over $\{g_{ij}\}$ with respect to the prior distributions in (5), is also normal (Koop and Poirier, 2004) and can be evaluated directly for the typical sample sizes $T$ encountered in macroeconomic applications. Because of this analytical tractability, $m(y)$ can then be found after the main run where, using the conditional independence of the densities in Steps 3 and 4 of Algorithm 2, one computes
$$\pi\left( \{\tau_{ij}^2\}, \Omega \mid y \right) \approx T^{-1} \sum_{t=1}^{T} f_{IW}\left( \Omega \mid y, \{g_{ij}^{(t)}\} \right) \prod_{i,j} f_{IG}\left( \tau_{ij}^2 \mid g_{ij}^{(t)} \right)$$
using draws $\{g_{ij}^{(t)}\}$ from the main MCMC run. In instances where $T$ is large or there are other complications (e.g., discrete outcomes), the decomposition involving $\{g_{ij}\}$ is more appropriate and readers are referred to Chib and Jeliazkov (2006) and Chib et al. (2009) for methods that use reduced runs or to Jeliazkov and Lee (2010) for a method that employs the Gibbs kernel and invariance of the Markov chain to estimate the posterior ordinate $\pi\left( \{\tau_{ij}^2\}, \Omega, \{g_{ij}\} \mid y \right)$.
An important point mentioned earlier is that the marginal likelihood for the additive NPVAR model does not depend on the (likelihood unidentified) levels of the functions that are centered for identification. This can be seen by recognizing that if $f(y \mid \theta_1, \theta_2) = f(y \mid \theta_1)$, i.e. the likelihood depends only on $\theta_1$ whereas $\theta_2$ is unidentified, but we have a proper prior $\pi(\theta_1, \theta_2)$, then the marginal likelihood
$$m(y) = \int f(y \mid \theta_1, \theta_2) \, \pi(\theta_1, \theta_2) \, d\theta_1 \, d\theta_2 = \int f(y \mid \theta_1) \, \pi(\theta_1) \left[ \int \pi(\theta_2 \mid \theta_1) \, d\theta_2 \right] d\theta_1 = \int f(y \mid \theta_1) \, \pi(\theta_1) \, d\theta_1$$
is not influenced by the prior on $\theta_2$. In practice this is important for modeling, because it implies that researchers with different beliefs about unidentified parameters will nevertheless reach identical conclusions about the relative ranking of alternative models.
Finally, note that because estimation of the marginal likelihood does not require maximization, it is less computationally intensive in nonparametric additive models than evaluation of information criteria such as AIC and BIC. This point has been overlooked and not fully appreciated in the literature despite its importance for model selection and model averaging on the basis of $\{\Pr(M_l \mid y)\}$.
5 Model Extensions
The estimation techniques presented in this paper are fully modular and readily applicable in various other settings since estimation of the unknown functions $\{g_{ij}\}$ can be done conditionally on modifications in other parts of the model. The goal of this discussion is to briefly review the relevant literature and provide references that could guide researchers interested in pursuing such extensions.
One should note that the methods in Section 2 and Section 3 trivially generalize to cases where one or
more exogenous covariates enter the regression as in seemingly unrelated regression models (e.g., Smith
and Kohn, 2000; Holmes et al., 2002; Koop et al., 2005). Further extensions of the framework to Bayesian
models with nonparametric endogeneity or sample selection can be pursued following Chib and Greenberg
(2007) or Chib et al. (2009), respectively; the modeling would also be useful in guiding future research on
structural NPVAR models and impulse response analysis. Many of the aforementioned papers also trivially
subsume semiparametric and partially linear cases where some of the covariates enter the model linearly.
Estimation of such models is a straightforward extension of Algorithms 1 and 2, and proceeds by using the partial residuals $y_i - X_i \beta$ when simulating $\{g_{ij}\}$, followed by simulating $\beta$ conditionally upon the functions $\{g_{ij}\}$.
The methods in this paper, including the above extensions, can also be applied, using data augmentation
techniques (Tanner and Wong, 1987; Albert and Chib, 1993), to the analysis of dynamic systems involving
binary, polychotomous, censored, and other discrete outcomes such as the qualitative VAR model of Dueker
(2005). The main advantage of this approach is that conditionally on the latent data, estimation of the
parameters and the unknown functions closely mirrors the methods for continuous data. The methodology
presented here is also applicable to the class of additive mixed models for continuous and discrete data (e.g.,
Lin and Zhang, 1999). For example, Chib and Jeliazkov (2006) discuss the specification and estimation of
a semiparametric partially linear model for dynamic binary panel data with multivariate heterogeneity. The
estimation algorithm in that paper can be easily modified to include an additive structure whose estimation
can be carried out by the methods presented in Section 3.
While the specification and estimation of NPVAR models was discussed in detail for homoskedastic Gaussian models, extensions to other distributions (e.g., Student’s t, mixtures of normals, or Dirichlet process priors for nonparametric distributional modeling) and heteroskedasticity (e.g., regime switching models with different variance regimes, or models with stochastic volatility) may be very desirable in certain applications. One such example, relating to different variance regimes, will be studied in Section 6. Fortunately, such extensions can be estimated using data augmentation techniques that could build upon the homoskedastic Gaussian specification discussed earlier. In particular, consider a heteroskedastic model in which the $i$th equation can be written as
$$y_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \ldots + M_0 Q_{ir} g_{ir} + \varepsilon_i$$
with $\Sigma_i \equiv Var(\varepsilon_i) = \mathrm{diag}(\sigma_{i1}^2, \ldots, \sigma_{iT}^2)$. Due to the heteroskedasticity, the covariance matrix of $[g_{ij} \mid y, \theta]$ is not of the form presented in Remark 2, and estimation cannot be performed efficiently by relying on the Sherman-Morrison formula. This poses an important computational difficulty because estimation in large dimensional models would be very difficult and potentially infeasible. In this paper, I propose a solution to this problem that employs data augmentation to reduce the heteroskedastic model to a homoskedastic one, enabling application of the methods discussed in Algorithm 2 and Remark 2. In particular, following Chib and Jeliazkov (2006), we can write
$$y_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \ldots + M_0 Q_{ir} g_{ir} + \eta_i + \nu_i,$$
where $\eta_i \overset{iid}{\sim} N(0, \Sigma_i - \kappa_i I)$ and $\nu_i \overset{iid}{\sim} N(0, \kappa_i I)$ with $0 < \kappa_i \leq \min\{\sigma_{it}^2\}$. Consequently, given a draw of $\eta_i$, which is simple and inexpensive to obtain, the model
$$y_i - \eta_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \ldots + M_0 Q_{ir} g_{ir} + \nu_i$$
is homoskedastic because $Var(\nu_i) = \kappa_i I$. In our context, it would actually be optimal to set $\kappa_i = \min\{\sigma_{it}^2\}$ because this would imply that the corresponding elements of $\eta_i$ would be identically 0 and will not need to be sampled. This leads to the following extension of Algorithm 2:
Algorithm 3 NPVAR Model: MCMC Estimation of Heteroskedastic Model
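1. For $i = 1, \ldots, q$:
(a) Sample $[\eta_i \mid y, \theta]$ by drawing, for $t = 1, \ldots, T$, $\eta_{it} \sim N(\hat{\eta}_{it}, \hat{H}_{it})$, where $\hat{H}_{it} = \kappa_i (\sigma_{it}^2 - \kappa_i)/\sigma_{it}^2$ and $\hat{\eta}_{it} = (\sigma_{it}^2 - \kappa_i)(y_{it} - \mu_{it|\backslash i,t} - m_{it})/\sigma_{it}^2$, where $m_{it}$ is the $t$-th row of $Q_{i1} g_{i1} + \sum_{k \geq 2} M_0 Q_{ik} g_{ik}$, $\mu_{it|\backslash i,t}$ is the $t$-th row of $\mu_{i|\backslash i} = E(\varepsilon_i \mid \varepsilon_{\backslash i})$, and $\kappa_i = \min\{\sigma_{it}^2\}$, where $\sigma_{it}^2 = Var(\varepsilon_{it} \mid \varepsilon_{\backslash i,t})$. Note that for cases where $\kappa_i = \sigma_{it}^2$, the corresponding entry in $\eta_i$ is identically zero and need not be sampled.
(b) Sample $[g_{i1} \mid y, \eta_i, \theta \backslash g_{i1}] \sim N(\hat{g}_{i1}, \hat{G}_{i1})$, where
$$\hat{G}_{i1} = \left( \frac{1}{\tau_{i1}^2} K_{i1} + \frac{1}{\kappa_i} Q_{i1}' Q_{i1} \right)^{-1},$$
$$\hat{g}_{i1} = \hat{G}_{i1} \left( \frac{1}{\tau_{i1}^2} K_{i1} g_{i10} + \frac{1}{\kappa_i} Q_{i1}' \Big( y_i - \eta_i - \mu_{i|\backslash i} - \sum_{j=2}^{r} M_0 Q_{ij} g_{ij} \Big) \right),$$
with $\mu_{i|\backslash i} = E(\varepsilon_i \mid \varepsilon_{\backslash i})$ and $\kappa_i = \min\{\sigma_{it}^2\}$, where $\sigma_{it}^2 = Var(\varepsilon_{it} \mid \varepsilon_{\backslash i,t})$. The sampling in this step is carried out efficiently as in Remark 1.
(c) Sample $[g_{ij} \mid y, \eta_i, \theta \backslash g_{ij}] \sim N(\hat{g}_{ij}, \hat{G}_{ij})$ for $j = 2, \ldots, r$ and $i = 1, \ldots, q$, where
$$\hat{G}_{ij} = \left( \frac{1}{\tau_{ij}^2} K_{ij} + \frac{1}{\kappa_i} Q_{ij}' M_0 Q_{ij} \right)^{-1}, \quad \text{and}$$
$$\hat{g}_{ij} = \hat{G}_{ij} \left( \frac{1}{\tau_{ij}^2} K_{ij} g_{ij0} + \frac{1}{\kappa_i} Q_{ij}' M_0 \Big( y_i - \eta_i - \mu_{i|\backslash i} - Q_{i1} g_{i1} - \sum_{k \geq 2, k \neq j} M_0 Q_{ik} g_{ik} \Big) \right).$$
This is done as in Remark 2.
2. Sample $[\tau_{ij}^2 \mid g_{ij}]$ for $i = 1, \ldots, q$, and $j = 1, \ldots, r$, as in Algorithm 2.
3. Sample $[\Omega_{it}^{-1} \mid y, \theta \backslash \Omega]$ according to the volatility process under consideration.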
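Step 1(a) of Algorithm 3 is the only genuinely new ingredient relative to Algorithm 2, and it reduces to elementwise operations. The sketch below writes it out for a single equation; the function names, the example variances, and the residual values are hypothetical, and `resid` stands for $y_{it} - \mu_{it|\backslash i,t} - m_{it}$ as defined in that step.

```python
import numpy as np

def eta_moments(resid, sigma2):
    """Conditional mean and variance of eta_t given the current residuals.

    resid  : length-T array, y_it - mu_it - m_it
    sigma2 : length-T array of conditional error variances sigma2_it
    """
    kappa = sigma2.min()                       # kappa_i = min_t sigma2_it
    shrink = (sigma2 - kappa) / sigma2
    eta_hat = shrink * resid
    H = kappa * shrink
    return eta_hat, H, kappa

def draw_eta(resid, sigma2, rng):
    eta_hat, H, kappa = eta_moments(resid, sigma2)
    eta = eta_hat + np.sqrt(H) * rng.standard_normal(resid.shape)
    return eta, kappa

# Made-up example with three variance regimes
rng = np.random.default_rng(0)
sigma2 = np.array([1.0, 1.0, 4.0, 4.0, 0.25, 0.25])
resid = np.array([0.5, -1.2, 3.0, -2.5, 0.1, -0.3])
eta, kappa = draw_eta(resid, sigma2, rng)
eta_hat, H, _ = eta_moments(resid, sigma2)
homoskedastic_resid = resid - eta              # has variance kappa in every period
```

In the lowest-variance regime the draw of $\eta_{it}$ is exactly zero, so only the periods with $\sigma_{it}^2 > \kappa_i$ require simulation.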
The above machinery also applies to mixture-of-normals and scale mixture-of-normals models. In particular, a model with t errors with $\nu$ degrees of freedom can be represented as a conditionally Gaussian model, whose variance, given a set of a priori gamma latent variables $\lambda_t \sim G(\nu/2, \nu/2)$, $t = 1, \ldots, T$, is given by $Var(\varepsilon_t \mid \lambda_t) = \sigma_t^2$ (Andrews and Mallows, 1974; Albert and Chib, 1993). Estimation of these models is straightforward because given $\{\lambda_i\}$, one can decompose $\varepsilon_i$ into $\eta_i$ and $\nu_i$ and proceed as above. In this way, NPVAR models can be adapted to a variety of specifications for the error variance, including changepoint and regime switching models (Chib, 1996, 1998; Sims and Zha, 2006), time-varying parameter models (Primiceri, 2005; Chan and Jeliazkov, 2009), factor models (Kose et al., 2003; Belviso and Milani, 2006; Kose et al., 2008; Chan and Jeliazkov, 2009), and others.
6 Application to U.S. Macroeconomic Data
The data sample for this application contains post-war quarterly macroeconomic data for the U.S. from 1948:Q1 to 2005:Q1. The set of variables includes output growth $g_t$ measured by log differences of real GDP between two consecutive quarters, average quarterly unemployment rate $u_t$, inflation $\pi_t$ measured by the percentage change in the Consumer Price Index between consecutive quarters, and interest rates $i_t$ measured by the average quarterly secondary market yield on the 3-month Treasury bill. The first three of these variables are seasonally adjusted. These variables, summarized in Table 1, reflect the general state of the economy, and have been widely used in empirical macroeconomics.⁴ From the Table, we see that the average quarterly GDP growth over the sample period is 0.85 percent, which amounts to annual GDP growth of 3.4 percent. A similar computation shows an average annual inflation rate of approximately 3.7 percent. Unemployment and interest rates average 5.63 and 4.81 percent, respectively.

⁴ The sample period excludes the past recession for a number of reasons. Over the last few years, interest rates have approached and stayed very close to their lower bound of zero. This could lead to findings of nonlinearity due to the effects of the lower bound, thereby favoring the methods of the paper over a linear model. Moreover, traditional modeling may be inadequate near the bound, where the distribution of the interest rate process is truncated and exhibits point mass. Appropriate modeling in this case is still an open research problem. Finally, if the “Great Recession” marked a possible structural break, at present there would be insufficient observations to estimate the model after the break.
Table 1: Descriptive statistics for the data sample (in percentage points).

Variable                        Mean    SD      Min     Max
Quarterly growth in real GDP    0.85    1.00   -2.76    4.02
Unemployment rate               5.63    1.52    2.60   10.70
Nominal interest rate           4.81    2.92    0.79   15.05
Quarterly inflation             0.92    0.85   -1.24    4.08
These data are analyzed using the econometric techniques discussed earlier. The empirical strategy for studying the behavior of the dynamic system in (2) is to address both model and functional form uncertainty. The first area of model uncertainty in the macroeconomic system has to do with the determination of its dynamics, i.e., the number of lags needed in equation (2). A one-lag model was compared to several more richly parameterized models in order to gauge whether restricting attention to an NPVAR(1) specification is a sensible empirical strategy. The baseline NPVAR(1) model, which contains a single lag of $y_t$ with 16 unknown functions, was compared with an NPVAR(2) model (with 32 such functions). The baseline model overwhelmingly outperformed the longer lag specification: its log-marginal likelihood exceeded that of the larger model by over 40, implying a Bayes factor of over $e^{40}$ in favor of the NPVAR(1) specification.⁵

⁵ A specification including a fourth (year-ago) lag was also considered but it also did not perform competitively with the NPVAR(1) specification.

Guided by earlier research findings suggesting the possibility of a structural break (at least in error volatility, as in Stock and Watson (2003) and Sims and Zha (2006)), I also used split-sample estimation to capture the possibility of structural breaks in the series. Specifically, an NPVAR(1) model was fit on data in the pre-Volcker era (prior to 1979:Q2), and a separate model was fit on the data thereafter (following 1979:Q3). The log-marginal likelihood for the pre-Volcker model was -539.5, and that for the second part of the data sample was -438.7. Compared to the log-marginal likelihood of -908.8 for the baseline NPVAR(1) model on the entire data sample, the split sample measure of fit, as captured by the marginal likelihood, was far worse (the sum of the log-marginal likelihoods for the two subsamples is -978.2, which is far below the log-marginal likelihood of -908.8 for the overall NPVAR(1) model). These results are interesting because (i) they suggest that the simpler and parsimonious NPVAR(1) model fit on the entire sample appears preferable to the (twice as big) split sample model and (ii) they demonstrate the ability of the Bayesian model comparison framework to penalize overparameterized specifications.
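The comparison between the full-sample and split-sample models can be reproduced as a short calculation. The sketch below reads the reported values as log-marginal likelihoods on the negative scale, which is the reading under which the split-sample sum lies below the full-sample value; the equal prior odds used for the posterior probability are an assumption made for illustration and are not stated in the text.

```python
import math

# Reported log-marginal likelihoods (read as negative log-scale values)
log_ml_full = -908.8                       # NPVAR(1), entire sample
log_ml_pre, log_ml_post = -539.5, -438.7   # pre- and post-Volcker subsamples

# Independent subsample models: the log-marginal likelihoods add
log_ml_split = log_ml_pre + log_ml_post    # -978.2

# Log Bayes factor of the full-sample model against the split-sample model
log_bf = log_ml_full - log_ml_split        # about 69.4

# Posterior probability of the full-sample model under equal prior odds
post_prob_full = 1.0 / (1.0 + math.exp(-log_bf))
```

A log Bayes factor of this size puts essentially all posterior mass on the full-sample specification, in line with the conclusion drawn above.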
Figure 2: Full sample estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations.
[Figure: 4 × 4 grid of estimated functions. Rows (equations): growth, unemployment, interest, inflation. Columns (lagged covariates): growth_{t-1}, unemployment_{t-1}, interest_{t-1}, inflation_{t-1}.]
Figure 2 presents the estimated functions for the full-sample NPVAR(1) model. The figure shows that a linear model would be reasonable for many of the economic relationships, particularly in modeling the effects of lagged unemployment. To a lesser extent, the same is true in other instances (e.g. the dependence of interest on its past value), where the function estimates do not reveal drastic departures from linearity. On the other hand, in many equations, the effects of lagged financial variables (interest and inflation), as well as the effects of lagged growth, appear to be quite nonlinear. This finding concurs with earlier studies that have found nonlinearity in growth behavior (Dahl and Gonzalez-Rivera, 2003a,b) and financial markets (Härdle and Tsybakov, 1997; Härdle et al., 1998). The Figure shows, for instance, that lagged inflation exhibits significant nonlinearities in every equation, whereas the function estimates for lagged interest exhibit nonlinearities in three of the four equations. For this reason, future analysis of financial variables might benefit considerably from employing nonparametric methods. Regarding the effects of lagged GDP growth, a review of the estimated functions reveals that although there is much nonlinearity, there are also large regions where the function estimates are approximately linear. This suggests that an interesting future research question would be to examine whether those types of nonlinearities can be adequately captured through threshold models.
For comparison purposes, Figures 3 and 4 show function estimates for the pre- and post-Volcker periods, respectively. It is interesting to note that, although the function estimates differ in some respects, most point to the same types of nonlinearities as the estimates from the overall sample.⁶ This is quite instructive, as it provides evidence that the econometric relationships may be stable but nonlinear, and therefore omitted nonlinearity may be a significant driver in findings of structural instability (cf. Hamilton, 2001). Resolving this issue should be an important item on the research agenda of studies focusing on structural (in)stability.

⁶ Since the range and level of each function may differ across samples, readers are cautioned to compare those functions over the relevant ranges, keeping in mind that the level of the functions will shift to satisfy the identification constraints.

Figure 3: Pre-Volcker estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations.
[Figure: 4 × 4 grid of estimated functions. Rows (equations): growth, unemployment, interest, inflation. Columns (lagged covariates): growth_{t-1}, unemployment_{t-1}, interest_{t-1}, inflation_{t-1}.]

Figure 4: Post-Volcker estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations.
[Figure: 4 × 4 grid of estimated functions. Rows (equations): growth, unemployment, interest, inflation. Columns (lagged covariates): growth_{t-1}, unemployment_{t-1}, interest_{t-1}, inflation_{t-1}.]

The apparent stability of the nonparametric function estimates across subsamples naturally leads to another important research question that has attracted much attention recently. Specifically, it would be of interest to consider whether an NPVAR model would exhibit evidence of a structural change in variances, which has been widely documented in contexts utilizing linear models. Such findings (e.g., Stock and Watson (1996, 2003), Sims and Zha (2006)) have led to the conclusion that a reduction in error volatility has been a driving force in the “Great Moderation” of the 1980s and 1990s. To formally test the stability of the mean relationships while allowing for structural breaks in variances, I have estimated three additional NPVAR models. The first allows for a single structural break between 1979:Q2 and 1979:Q3 with the Volcker appointment. The second model employs a single structural break between 1982:Q4 and 1983:Q1 following the Fed’s disinflation of the early 1980s. The third model allows for both of these break points. The models were estimated using Algorithm 3 of Section 5, and the marginal likelihoods were estimated as discussed in Section 4. The log-marginal likelihood for the first model was estimated to be -891.4, whereas that of the second was estimated to be -880.1, showing that, conditionally on a single break, the data favor the 1982/83 breakpoint. However, a much more dramatic improvement is offered by the third model, which allows for both the 1979 and the 1982/83 breakpoints. The log-marginal likelihood for that model is -838.4, leading to the conclusion that these three periods in the U.S. sample are indeed dramatically different. This is further confirmed by examining the estimated covariance matrices for the three sub-periods:
$$\Omega_{48:79} = \begin{pmatrix} 1.135 & 0.254 & 0.085 & 0.060 \\ 0.254 & 0.144 & 0.033 & 0.002 \\ 0.085 & 0.033 & 0.255 & 0.044 \\ 0.060 & 0.002 & 0.044 & 0.463 \end{pmatrix}, \qquad \Omega_{79:82} = \begin{pmatrix} 1.327 & 0.285 & 1.095 & 0.636 \\ 0.285 & 0.367 & 0.575 & 0.261 \\ 1.095 & 0.575 & 4.348 & 1.445 \\ 0.636 & 0.261 & 1.445 & 1.048 \end{pmatrix},$$
and
$$\Omega_{83:05} = \begin{pmatrix} 0.234 & 0.032 & 0.051 & 0.008 \\ 0.032 & 0.066 & 0.031 & 0.010 \\ 0.051 & 0.031 & 0.202 & 0.044 \\ 0.008 & 0.010 & 0.044 & 0.216 \end{pmatrix}.$$
These covariance matrices clearly demonstrate the dramatic peak in the error variances of all variables except growth (i.e. unemployment, interest rates, and inflation) during the disinflation period and the subsequent “moderation” of all 4 variables in 1983. A notable feature is the large jump, and subsequent decrease, in the estimated error variance in the interest rate equation during the period 1979-1982, which can be accounted for by the Fed’s change of policy instrument from the federal funds rate to reserve targeting, as well as the unprecedented increase in interest rates during the disinflation period.
Figure 5 presents the function estimates from the model with three variance regimes. The figure demonstrates that the same types of nonlinearities that were present in the homoskedastic models are still present here. Therefore, even though the heteroskedastic NPVAR model has confirmed earlier conclusions that changes were large due to breaks in variances, it also shows that there is much nonlinearity that would remain unexplored by linear models and that future research should study such features of the economic relationships more closely.
Figure 5: Full sample estimates from model with 3 volatility regimes: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations.
[Figure: 4 × 4 grid of estimated functions. Rows (equations): growth, unemployment, interest, inflation. Columns (lagged covariates): growth_{t-1}, unemployment_{t-1}, interest_{t-1}, inflation_{t-1}.]
7 Concluding Remarks
This article has examined the specification, estimation, and comparison of nonparametric VAR models. Efficient MCMC sampling and model comparison techniques are discussed in the context of a new scheme for identifying the unknown covariate functions, and extensions to heteroskedastic and other settings have been examined. An application of the NPVAR model to U.S. post-war data on GDP growth, unemployment, interest rates, and inflation has confirmed the presence of distinct volatility regimes in the post-war U.S. macroeconomic series, but has also revealed that important nonlinearities in certain economic relationships may remain undetected by standard regressions. Implementation of these techniques in related settings, such as those considered in Section 5, is an interesting area for future research.
References
Ahamada, I. and Flachaire, E. (2010), Non-Parametric Econometrics, Oxford: Oxford University Press.
Albert, J. and Chib, S. (1993), “Bayesian Analysis of Binary and Polychotomous Response Data, Journal
of the American Statistical Association, 88, 669–679.
Andrews, D. F. and Mallows, C. L. (1974), “Scale Mixtures of Normal Distributions, Journal of the Royal
Statistical Society Series B, 36, 99–102.
Beaudry, P. and Koop, G. (1993), “Do Recessions Permanently Change Output?” Journal of Monetary
Economics, 31, 149–163.
Belviso, F. and Milani, F. (2006), “Structural Factor-Augmented VARs (SFAVARs) and the Effects of Mon-
etary Policy, Topics in Macroeconomics, 6, Iss. 3, Article 2.
Besag, J., Green, P., Higdon, D., and Mengersen, K. (1995), “Bayesian Computation and Stochastic Sys-
tems, Statistical Science, 10, 3–66.
Canova, F. (1993), “Modelling and Forecasting Exchange Rates with a Bayesian Time-Varying Coefficient
Model, Journal of Economic Dynamics and Control, 17, 233–261.
Chan, J. C. and Jeliazkov, I. (2009), “Efficient Simulation and Integrated Likelihood Estimation in State
Space Models, International Journal of Mathematical Modelling and Numerical Optimisation, 1, 101–
120.
Chauvet, M. (1998), An Econometric Characterization of Business Cycle Dynamics with Factor Structure
and Regime Switching, International Economic Review, 39, 969–996.
Chib, S. (1995), “Marginal Likelihood from the Gibbs Output,” Journal of the American Statistical
Association, 90, 1313–1321.
Chib, S. (1996), “Calculating Posterior Distributions and Modal Estimates in Markov Mixture Models,”
Journal of Econometrics, 75, 79–97.
Chib, S. (1998), “Estimation and Comparison of Multiple Change-Point Models,” Journal of Econometrics,
86, 221–241.
Chib, S. and Greenberg, E. (2007), “Analysis of Additive Instrumental Variable Models,” Journal of
Computational and Graphical Statistics, 16, 86–114.
Chib, S. and Jeliazkov, I. (2006), “Inference in Semiparametric Dynamic Models for Binary Longitudinal
Data,” Journal of the American Statistical Association, 101, 685–700.
Chib, S., Greenberg, E., and Jeliazkov, I. (2009), “Estimation of Semiparametric Models in the Presence of
Endogeneity and Sample Selection,” Journal of Computational and Graphical Statistics, 18, 321–348.
Cogley, T. and Sargent, T. J. (2001), “Evolving Post-World War II U.S. Inflation Dynamics,” NBER
Macroeconomics Annual, 16, 331–338.
Dahl, C. M. and Gonzalez-Rivera, G. (2003a), “Identifying Nonlinear Components by Random Fields in the
US GNP Growth. Implications for the Shape of the Business Cycle,” Studies in Nonlinear Dynamics &
Econometrics, 7, Article 2.
Dahl, C. M. and Gonzalez-Rivera, G. (2003b), “Testing for Neglected Nonlinearity in Regression Models
Based on the Theory of Random Fields,” Journal of Econometrics, 114, 141–164.
Denison, D. G. T., Mallick, B. K., and Smith, A. F. M. (1998), “Automatic Bayesian Curve Fitting,” Journal
of the Royal Statistical Society Series B, 60, 333–350.
Denison, D. G. T., Holmes, C. C., Mallick, B. K., and Smith, A. F. M. (2002), Bayesian Methods for
Nonlinear Classification and Regression, John Wiley & Sons, New York.
DiMatteo, I., Genovese, C. R., and Kass, R. E. (2001), “Bayesian Curve-Fitting with Free-Knot Splines,”
Biometrika, 88, 1055–1071.
Dueker, M. (2005), “Dynamic Forecasts of Qualitative Variables: A Qual VAR Model of U.S. Recessions,”
Journal of Business & Economic Statistics, 23, 96–104.
Fahrmeir, L. and Lang, S. (2001), “Bayesian Inference for Generalized Additive Mixed Models Based on
Markov Random Field Priors,” Journal of the Royal Statistical Society Series C, 50, 201–220.
Gelfand, A. E. (2000), “Discussion to ‘Bayesian Backfitting’,” Statistical Science, 15, 217–218.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003), Bayesian Data Analysis, Chapman & Hall,
New York, 2nd edn.
Greenberg, E. (2008), Introduction to Bayesian Econometrics, Cambridge University Press, New York.
Hamilton, J. D. (1989), “A New Approach to the Economic Analysis of Nonstationary Time Series and the
Business Cycle,” Econometrica, 57, 357–384.
Hamilton, J. D. (2001), “A Parametric Approach to Flexible Nonlinear Inference,” Econometrica, 69,
537–573.
Hansen, B. E. (1992), “The Likelihood Ratio Test Under Nonstandard Conditions: Testing the Markov
Switching Model of GNP,” Journal of Applied Econometrics, 7, S61–S82.
Härdle, W. and Tsybakov, A. (1997), “Local Polynomial Estimators of the Volatility Function in
Nonparametric Autoregression,” Journal of Econometrics, 81, 223–243.
Härdle, W., Tsybakov, A., and Yang, L. (1998), “Nonparametric Vector Autoregression,” Journal of
Statistical Planning and Inference, 68, 221–245.
Hastie, T. and Tibshirani, R. (1990), Generalized Additive Models, Chapman & Hall, New York.
Hastie, T. and Tibshirani, R. (2000), “Bayesian Backfitting,” Statistical Science, 15, 196–223.
Holmes, C. C., Denison, D. G. T., and Mallick, B. K. (2002), “Accounting for Model Uncertainty in
Seemingly Unrelated Regressions,” Journal of Computational and Graphical Statistics, 11, 533–551.
Imai, K. and van Dyk, D. (2005), “A Bayesian Analysis of the Multinomial Probit Model Using Marginal
Data Augmentation,” Journal of Econometrics, 124, 311–334.
Jeliazkov, I. (2011), “Specification and Inference in Nonparametric Additive Regression,” working paper,
Department of Economics, University of California, Irvine.
Jeliazkov, I. and Lee, E. H. (2010), “MCMC Perspectives on Simulated Likelihood Estimation,” Advances
in Econometrics: Maximum Simulated Likelihood, 26, 3–39.
Kim, C.-J. and Nelson, C. R. (1999), “Has the U.S. Economy Become More Stable? A Bayesian Approach
Based on a Markov-Switching Model of the Business Cycle,” Review of Economics and Statistics, 81,
608–616.
Kim, C.-J., Morley, J., and Piger, J. (2005), “Nonlinearity and the Permanent Effects of Recessions,” Journal
of Applied Econometrics, 20, 291–309.
Koop, G. (2003), Bayesian Econometrics, John Wiley & Sons, New York.
Koop, G. and Korobilis, D. (2009), “Bayesian Multivariate Time Series Methods for Empirical
Macroeconomics,” Foundations and Trends in Econometrics, 3, 267–358.
Koop, G. and Poirier, D. J. (2004), “Bayesian Variants of Some Classical Semiparametric Regression
Techniques,” Journal of Econometrics, 123, 259–282.
Koop, G., Poirier, D. J., and Tobias, J. (2005), “Bayesian Semiparametric Inference in Multiple Equation
Models,” Journal of Applied Econometrics, 20, 723–747.
Kose, M. A., Otrok, C., and Whiteman, C. H. (2003), “International Business Cycles: World, Region and
Country Specific Factors,” American Economic Review, 93, 1216–1239.
Kose, M. A., Otrok, C., and Whiteman, C. H. (2008), “Understanding the Evolution of World Business
Cycles,” Journal of International Economics, 75, 110–130.
Lin, X. and Zhang, D. (1999), “Inference in Generalized Additive Mixed Models by Using Smoothing
Splines,” Journal of the Royal Statistical Society Series B, 61, 381–400.
Lindley, D. V. (1971), Bayesian Statistics: A Review, SIAM, Philadelphia.
Meng, X.-L. and van Dyk, D. (1999), “Seeking Efficient Data Augmentation Schemes via Conditional and
Marginal Augmentation,” Biometrika, 86, 301–320.
Pesaran, M. H. and Potter, S. M. (1997), “A Floor and Ceiling Model of U.S. Output,” Journal of Economic
Dynamics and Control, 21, 661–695.
Poirier, D. J. (1973), “Piecewise Regression Using Cubic Splines,” Journal of the American Statistical
Association, 68, 515–524.
Poirier, D. J. (1998), “Revising Beliefs in Non-Identified Models,” Econometric Theory, 14, 483–509.
Potter, S. M. (1995), “A Nonlinear Approach to US GNP,” Journal of Applied Econometrics, 10, 109–125.
Primiceri, G. (2005), “Time Varying Structural Vector Autoregressions and Monetary Policy,” Review of
Economic Studies, 72, 821–852.
Ruppert, D., Wand, M. P., and Carroll, R. J. (2003), Semiparametric Regression, Cambridge University
Press, Cambridge, UK.
Shiller, R. (1984), “Smoothness Priors and Nonlinear Regression,” Journal of the American Statistical
Association, 79, 609–615.
Shively, T. S., Kohn, R., and Wood, S. (1999), “Variable Selection and Function Estimation in Additive
Nonparametric Regression Using a Data-Based Prior,” Journal of the American Statistical Association,
94, 777–806.
Silverman, B. (1985), “Some Aspects of the Spline Smoothing Approach to Non-parametric Regression
Curve Fitting,” Journal of the Royal Statistical Society Series B, 47, 1–52.
Sims, C. A. (1980), “Macroeconomics and Reality,” Econometrica, 48, 1–48.
Sims, C. A. and Zha, T. (2006), “Were There Regime Switches in U.S. Monetary Policy?” American
Economic Review, 96, 54–81.
Smith, M. and Kohn, R. (2000), “Nonparametric Seemingly Unrelated Regression,” Journal of
Econometrics, 98, 257–281.
Stock, J. H. and Watson, M. W. (1996), “Evidence on Structural Instability in Macroeconomic Time Series
Relations,” Journal of Business and Economic Statistics, 14, 11–30.
Stock, J. H. and Watson, M. W. (2003), “Has the Business Cycle Changed? Evidence and Explanations,”
in Monetary Policy and Uncertainty: Adapting to a Changing Economy, pp. 9–56, Federal Reserve Bank
of Kansas City.
Tanner, M. A. and Wong, W. H. (1987), “The Calculation of Posterior Distributions by Data Augmentation,”
Journal of the American Statistical Association, 82, 528–549.
van Dyk, D. and Meng, X.-L. (2001), “The Art of Data Augmentation,” Journal of Computational and
Graphical Statistics, 10, 1–50.
Wahba, G. (1978), “Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors
in Regression,” Journal of the Royal Statistical Society Series B, 40, 364–372.
Wasserman, L. (2006), All of Nonparametric Statistics, Springer, New York.
Whittaker, E. (1923), “On a New Method of Graduation,” Proceedings of the Edinburgh Mathematical
Society, 41, 63–75.
Wood, S. and Kohn, R. (1998), “A Bayesian Approach to Robust Binary Nonparametric Regression,”
Journal of the American Statistical Association, 93, 203–213.
Wood, S., Kohn, R., Shively, T., and Jiang, W. (2002), “Model Selection in Spline Nonparametric
Regression,” Journal of the Royal Statistical Society Series B, 64, 119–139.
Yang, L., Härdle, W., and Nielsen, J. (1999), “Nonparametric Autoregression with Multiplicative Volatility
and Additive Mean,” Journal of Time Series Analysis, 20, 579–604.