
Nonparametric Vector Autoregressions: Specification, Estimation, and Inference

December 2013
Advances in Econometrics 33


DOI:10.1108/S0731-9053(2013)0000031009

Authors:

University of California, Irvine


Full sample estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations.


Pre-Volcker estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations.


Full sample estimates from model with 3 volatility regimes: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations.


NONPARAMETRIC VECTOR AUTOREGRESSIONS: SPECIFICATION, ESTIMATION, AND INFERENCE

IVAN JELIAZKOV∗

University of California, Irvine

July, 2013

Abstract

For over three decades, vector autoregressions have played a central role in empirical macroeconomics. These models are general, can capture sophisticated dynamic behavior, and can be extended to include features such as structural instability, time-varying parameters, dynamic factors, threshold-crossing behavior, and discrete outcomes. Building upon growing evidence that the assumption of linearity may be undesirable in modeling certain macroeconomic relationships, this paper seeks to add to recent advances in VAR modeling by proposing a nonparametric dynamic model for multivariate time series. In this model, the problems of modeling and estimation are approached from a hierarchical Bayesian perspective. The article considers the issues of identification, estimation, and model comparison, enabling nonparametric VAR models to be fit efficiently by Markov chain Monte Carlo algorithms and compared to parametric and semiparametric alternatives by marginal likelihoods and Bayes factors. Among other benefits, the methodology allows for a more careful study of structural instability while guarding against the possibility of unaccounted nonlinearity in otherwise stable economic relationships. Extensions of the proposed nonparametric model to settings with heteroskedasticity and other important modeling features are also considered. The techniques are employed to study the post-war US economy, confirming the presence of distinct volatility regimes and supporting the contention that certain nonlinear relationships in the data can remain undetected by standard models.

Keywords: Additive model; Vector autoregressive (VAR) model; Bayesian model comparison; Markov chain Monte Carlo.

JEL Codes: C11, C14, C15, C32, C52, E31, E32, E37, E43, E47.

1 Introduction and Motivation

Following the seminal work of Sims (1980), the vector autoregressive (VAR) model has played a central role in empirical macroeconomics. The basic model postulates that a $q$-dimensional vector of time-series variables $y_t = (y_{1t}, \ldots, y_{qt})'$ depends on its past realizations through the specification

$$ y_t = c + \sum_{j=1}^{p} B_j y_{t-j} + \varepsilon_t, \quad t = 1, \ldots, T, \tag{1} $$

where $c$ is a $q$-vector of intercepts, $\{B_j\}_{j=1}^{p}$ are $q \times q$ matrices of parameters, $\{y_{t-j}\}_{j=1}^{p}$ are lags of $y_t$, and $\varepsilon_t$ is an error term with mean zero and $q \times q$ covariance matrix $\Omega$. VAR models are general and can capture sophisticated dynamic behavior, even when the lag length $p$ is relatively small. Moreover, VAR models are very versatile: in the past few decades they have been adapted to incorporate structural instability, regime switching, time-varying parameters, dynamic factors, threshold-crossing behavior, and discrete data, among others. Consequently, VAR methodology has been an important instrument in policy analysis, forecasting, and academic discourse (for a recent review, see Koop and Korobilis, 2009).

∗ Department of Economics, University of California, Irvine, 3151 Social Science Plaza, Irvine, CA 92697-5100. E-mail: ivan@uci.edu. I am grateful to Tom Fomby, Lutz Kilian, Anthony Murphy, two anonymous referees, and my colleagues Dale Poirier, David Brownstone, Fabio Milani, and especially Angela Vossmeyer, for their careful comments on earlier drafts.
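As a point of reference for the linear specification in (1), the following is a minimal sketch that simulates a bivariate VAR(1) and recovers its parameters by equation-by-equation least squares. All numerical values (`c`, `B1`, `Omega`, the sample size) are hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bivariate VAR(1): y_t = c + B1 y_{t-1} + eps_t, eps_t ~ N(0, Omega).
q, T = 2, 5000
c = np.array([0.5, -0.2])
B1 = np.array([[0.5, 0.1],
               [-0.2, 0.3]])          # eigenvalues lie inside the unit circle
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

# Simulate the system forward from y_0 = 0.
L = np.linalg.cholesky(Omega)
y = np.zeros((T + 1, q))
for t in range(1, T + 1):
    y[t] = c + B1 @ y[t - 1] + L @ rng.standard_normal(q)

# Least squares, one column per equation: regress y_t on (1, y_{t-1}).
X = np.column_stack([np.ones(T), y[:-1]])   # T x (1 + q)
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
c_hat, B1_hat = coef[0], coef[1:].T

# The residual covariance estimates Omega.
resid = y[1:] - X @ coef
Omega_hat = resid.T @ resid / (T - X.shape[1])
```

With a sample this long, `c_hat`, `B1_hat`, and `Omega_hat` should lie close to the values used in the simulation.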

While many extensions of the model in (1) are possible, it has been common practice to maintain a parametric, typically linear, functional form for the conditional mean of $y_t$ given its lags. Important extensions of the basic setup are afforded by models in which the parameters are allowed to change over time as in regime switching, changepoint, time-varying parameter, and threshold models.¹ For instance, following Hamilton (1989), much work has been done on estimating models subject to regime shifts in the mean, variance, or dynamics (e.g., Hansen, 1992; Chib, 1996; Chauvet, 1998; Kim and Nelson, 1999; Kim et al., 2005; Sims and Zha, 2006). Threshold regressions have been considered in Beaudry and Koop (1993), Potter (1995), and Pesaran and Potter (1997), while time-varying parameter applications have been examined in Canova (1993), Stock and Watson (1996), Cogley and Sargent (2001), Primiceri (2005), and Chan and Jeliazkov (2009), among others. Much more rare in time series analysis has been the application of nonparametric methods in the modeling and estimation of the conditional mean of a time series process, and this represents the basic econometric problem motivating this work.

¹ In these cases, the modeling involves an additional state variable that can be latent or observed; conditionally on the state, the models are linear, whereas marginalization over the state yields piecewise linear regressions or mixtures of linear regressions.

The discussion in this paper is primarily concerned with methods for allowing considerable flexibility in estimating the dependence on lags in VAR models, while maintaining simplicity, computational tractability, and accommodating other modeling features that may be present in the model. The VAR paradigm came to prominence as a methodological framework involving only minimal restrictions. Consequently, continuing in this tradition, this paper discusses ways of estimating multivariate dynamic systems without assuming a priori knowledge of the functional form. Even though the nonparametric literature is vast and diverse, applications to time series have been limited despite their potential appeal and importance. In an application to exchange rates, Härdle et al. (1998) use local polynomial methods based on kernel-weighted least squares to estimate a nonparametric bivariate dynamic system in which both the conditional mean and variance are unknown functions of the past. The methodology in that paper relates to single-equation techniques and exchange rate applications considered in Härdle and Tsybakov (1997) and Yang et al. (1999). Univariate nonparametric regressions have been used in Dahl and Gonzalez-Rivera (2003a) to study the evolution of U.S. GNP growth using the method of Hamilton (2001), who employed the techniques to address nonlinearity in the inflation-unemployment trade-off in an example involving the Phillips Curve. Dahl and Gonzalez-Rivera (2003b) apply the methodology to study the evolution of industrial production for a subsample of OECD countries. Their results support the contention that much nonlinearity is neglected if standard linear models are applied in these settings.

These and other papers have provided a growing body of evidence that allowing for various kinds of nonlinearity in empirical macroeconomics can be very valuable in uncovering important features of time-series relationships. Building upon these advances, this paper seeks to add to the literature by examining a nonparametric dynamic model for multivariate time series. Specifically, this paper considers a dynamic system of $q$ regression equations for data $\{y_t\}_{t=1}^{T}$, where $y_t = (y_{1t}, \ldots, y_{qt})'$, in which the $i$th equation ($i = 1, \ldots, q$) is modeled through the additive form (Hastie and Tibshirani, 1990)

$$ y_{it} = \sum_{j=1}^{q} \sum_{k=1}^{p} g_{ijk}(y_{j,t-k}) + \varepsilon_{it}, \quad t = 1, \ldots, T, \tag{2} $$

where $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{qt})' \sim N(0, \Omega)$ and the unknown functions $\{g_{ijk}(y_{j,t-k})\}$ will be modeled and estimated nonparametrically. For this reason, in the remainder of this paper, the specification in (2) will be referred to as a nonparametric VAR (or NPVAR) model. The model in (2) provides a natural extension of the traditional linear VAR model in (1): relative to its parametric counterpart, the NPVAR model maintains additivity but does not require that the estimated regression relationships lie in a particular class of functions.

Functional flexibility is desirable because nonlinearity is common in both economic theory and practice. Moreover, nonparametric additive modeling has desirable practical and theoretical properties and can serve as a useful exploratory tool that is easily inserted in more complex models. Even though the estimation of unknown functions is a complex high-dimensional problem, the additive framework is well suited for dealing with the “curse of dimensionality” because the argument of each function is a single variable.

The specification of the NPVAR model will be approached from a hierarchical Bayesian perspective with special emphasis on the issues of identification, estimation, and model comparison, enabling NPVAR models to be fit efficiently by Markov chain Monte Carlo (MCMC) algorithms and compared to nested and non-nested parametric and semiparametric alternatives by marginal likelihoods and Bayes factors. The methodology is useful in its own right as an exploratory and modeling tool, but is also appealing because it enables a more careful study of other structural features while guarding against the possibility of unaccounted nonlinearity. Doing so is important for theoretical and practical reasons, and because the consequences of ignored nonlinearity can be severe.

The types of misspecification that arise from assuming an inappropriate functional form can be illustrated by considering two simple motivating examples. Imagine that data are generated from the model $y_t = g(x_t) + \varepsilon_t$, $\varepsilon_t \sim N(0, \sigma^2)$, and $g(\cdot)$ is the nonlinear function in panel (a) of Figure 1. If estimation is by linear regression (the resulting regression line is also shown in panel (a) of the figure), it is easy to see that the regression residuals will be heteroskedastic, owing to the neglected nonlinearity in $g(\cdot)$. If the covariate $x_t$ is a lag of $y_t$, the misspecification can also lead to erroneous findings of serial correlation in the errors. Furthermore, even though the original errors used to generate the data were Gaussian, in a linear regression they will appear non-Gaussian (see panel (b) of Figure 1). Due to the omitted nonlinearity, the error distribution will be a location mixture of normals, and consequently one would conclude that the Gaussian assumption is inadequate when the real culprit is neglected nonlinearity. Note that these problems will not be resolved by using estimators that are robust to distributional misspecification.
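The first example can be mimicked with a small simulation. The sketch below assumes a particular nonlinear mean, $g(x) = \sin(\pi x)$ on $[0, 2]$, which is a hypothetical stand-in for the function in panel (a) of Figure 1; the sample size and error scale are likewise illustrative. It fits a straight line and inspects the residuals.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data generating process: y = g(x) + eps with Gaussian errors.
n, sigma = 2000, 0.2
x = rng.uniform(0.0, 2.0, n)
g = np.sin(np.pi * x)                  # assumed nonlinear mean, for illustration only
y = g + sigma * rng.standard_normal(n)

# Misspecified fit: a straight line estimated by least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# The residual variance is inflated well beyond the true sigma^2 ...
var_ratio = resid.var() / sigma**2

# ... and the residual mean varies systematically with x (ten equal-width bins).
bins = np.minimum((x / 0.2).astype(int), 9)
bin_means = np.array([resid[bins == b].mean() for b in range(10)])
```

Because the residual mean shifts with `x`, the pooled residuals behave like a location mixture of normals, which is the mechanism the example describes.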

For our second example, consider panel (c) of Figure 1. In this case, imagine that the researcher is aware that the data generating process involves a nonlinear mean, but chooses to restrict attention to the class of piecewise linear polynomials. Although heteroskedasticity, autocorrelation, and non-Gaussianity may not be significant (or even discernible) problems if a bilinear model is fit to the particular data in panel (c), regime switching, changepoint, threshold, and time-varying parameter models may appreciably improve the fit relative to a linear model. One should bear in mind, however, that these would be spurious findings of instability or structural breaks since the underlying data generating process is a stable, although nonlinear, function of the covariates that is not properly accommodated in the regressions. Such spurious instability, unfortunately, is not the only pitfall that can be induced by this type of misspecification. The poor fit of low order linear dynamic systems may also lead researchers to explore more profligate models involving more lags. This would lead to loss of parsimony as additional lag components are incorporated but simply act as atheoretical fitting parameters in the model.

Figure 1 (panels (a), (b), and (c)): Ignored nonlinearity can lead to erroneous conclusions about the presence of heteroskedasticity or autocorrelation, the adequacy of the distributional assumptions, the structural stability of the regression, or lead to conclusions that more profligate models are required.

These examples provide strong motivation for studying the NPVAR model because the problems they identify cannot be addressed satisfactorily without directly addressing the flexibility of the functional form. This, of course, is not to say that features such as heteroskedasticity, autocorrelation, non-Gaussianity, or structural instability cannot be present in nonlinear models. On the contrary, they can be important integral parts of the NPVAR model, and many such extensions will be considered in Section 5 and the application in Section 6. However, the examples do suggest that before jumping to conclusions about the presence of any of the aforementioned features, one must ensure that they are not spuriously induced by functional form misspecification. To enable this task to be carried out, this paper provides methodology for the specification, estimation, and comparison of nonparametric models, which can be useful in this pursuit, as demonstrated in a study of U.S. macroeconomic data. The application reveals that the NPVAR model supports the existence of distinct volatility regimes in the data, and provides evidence that means remain stable but exhibit interesting nonlinearities.

The remainder of this article deals with the hierarchical structure of NPVAR models and their implementation in practice. Specifically, Section 2 presents the specification of the NPVAR model together with a computationally convenient identification restriction on the unknown additive functions. Section 3 presents an efficient fitting algorithm based on MCMC simulation techniques, which subsumes frequentist estimation by backfitting as a special case. Section 4 addresses the problem of model comparison and model averaging by discussing the computation of marginal likelihoods and Bayes factors. Section 5 outlines extensions to settings with Student-t errors, structural instability, heteroskedasticity or stochastic volatility, dynamic factors, and discrete outcomes, and provides references to the relevant literature. Section 6 considers the application of the NPVAR model to data for the post-war US economy, whereas Section 7 offers concluding remarks.

2 Hierarchical Model Specification

The NPVAR model will be specified through the following distributional hierarchy. The likelihood function $f(y \mid \{g_{ijk}(\cdot)\}, \Omega)$ obtained from the additive model in (2) will be augmented with a prior distribution (or model) for each unknown function, $\pi(g_{ijk}(\cdot) \mid \tau_{ijk}^2)$, that will, in turn, depend on a hyperparameter $\tau_{ijk}^2$. The hierarchy will be completed by the priors on $\tau_{ijk}^2$ and $\Omega$, denoted by $\pi(\tau_{ijk}^2)$ and $\pi(\Omega)$, respectively. The prior $\pi(g_{ijk}(\cdot) \mid \tau_{ijk}^2)$ is often referred to as a “smoothness prior” because it aims at penalizing rough functions but does not absolutely rule out any values that the function can take. The prior is very similar to the roughness penalty in frequentist penalized likelihood estimation. The parameters $\tau_{ijk}^2$ are often called smoothness parameters because they control the degree of smoothness of $g_{ijk}(\cdot)$ in $\pi(g_{ijk}(\cdot) \mid \tau_{ijk}^2)$. Details will be provided in the remainder of this section, but it is important to note that the methodology leans on a vast literature in Bayesian nonparametric estimation in a variety of areas with continuous, discrete, and censored responses, including cross-sectional settings (Besag et al., 1995; Wood and Kohn, 1998; Hastie and Tibshirani, 2000; Fahrmeir and Lang, 2001; Wood et al., 2002; Koop and Poirier, 2004), multiple equation systems (Smith and Kohn, 2000; Holmes et al., 2002; Koop et al., 2005), panel data (Chib and Jeliazkov, 2006), sample selection models (Chib and Greenberg, 2007; Chib et al., 2009), and time series applications (Hamilton, 2001). Extensions to Bayesian models with free-knot splines have been pursued in Denison et al. (1998) and DiMatteo et al. (2001), while Bayesian estimation techniques for multivariate functions have been provided in Shively et al. (1999) and Wood et al. (2002). Useful reviews and introductions to many aspects of nonparametric modeling can be found in Hastie and Tibshirani (1990), Denison et al. (2002), Koop (2003), Ruppert et al. (2003), Wasserman (2006), and Ahamada and Flachaire (2010).² Nonparametric functional modeling has appealing frequentist and Bayesian properties, and many of its advantages have been illustrated in the aforementioned works.

² Interested readers are referred to these books for further details on a rich variety of nonparametric modeling approaches such as truncated polynomials, radial basis functions, neural networks, regression trees, wavelets, kernel smoothing, locally weighted polynomials, B-splines, etc., many of which are beyond the scope of this paper.

Some simplification in the notation can be obtained by denoting the $r = qp$ lagged variables on the right hand side of the $i$th equation ($i = 1, \ldots, q$) in (2) by $\{s_{ijt}\}_{j=1}^{r}$ and writing that equation as

$$ y_{it} = g_{i1}(s_{i1t}) + \ldots + g_{ir}(s_{irt}) + \varepsilon_{it}, \quad (t = 1, \ldots, T). \tag{3} $$

Then, to motivate the hierarchical model for the functions, it is useful to stack the observations and write the model in matrix notation. Let $y_i = (y_{i1}, \ldots, y_{iT})'$, $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$, and for each of the $j = 1, \ldots, r$ functions in (3), let the $T$ observations in the covariate vectors $s_{ij} = (s_{ij1}, \ldots, s_{ijT})'$ determine the corresponding $m_{ij} \times 1$ design point vectors $v_{ij} = (v_{ij1}, \ldots, v_{ijm_{ij}})'$ with entries equal to the unique ordered values of $s_{ij}$, that is $v_{ij1} < \ldots < v_{ijm_{ij}}$. Let the corresponding function evaluation vectors be denoted by $g_{ij} = (g_{ij}(v_{ij1}), \ldots, g_{ij}(v_{ijm_{ij}}))'$. Then, stacking over time, the $i$th equation of the system can be written in matrix notation as

$$ y_i = Q_{i1} g_{i1} + Q_{i2} g_{i2} + \ldots + Q_{ir} g_{ir} + \varepsilon_i, \tag{4} $$

where $Q_{ij}$ are $T \times m_{ij}$ incidence matrices with entries $Q_{ij}(h, k) = 1$ if $s_{ijh} = v_{ijk}$ and 0 otherwise, which establishes the correspondence between $s_{ij}$ and $v_{ij}$. Note that because there may be repeating values in $s_{ij}$ we have that $m_{ij} \leq T$ for $j = 1, \ldots, r$. Since all rows of $Q_{ij}$ contain a single 1, row $t$ of the product $Q_{ij} g_{ij}$ is given by $g_{ij}(s_{ijt})$.
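The mapping from a covariate vector to its design points and incidence matrix can be sketched in a few lines. The covariate values below are hypothetical, and the function `g` is an arbitrary stand-in used only to show that multiplying by the incidence matrix expands function evaluations at the design points back to the $T$ observations.

```python
import numpy as np

# Hypothetical covariate vector s (T = 6) with repeated values.
s = np.array([0.3, 1.2, 0.3, 2.5, 1.2, 0.7])

# v holds the unique ordered values; idx[t] is the position of s[t] in v.
v, idx = np.unique(s, return_inverse=True)
T, m = s.size, v.size

# Incidence matrix: Q[t, k] = 1 if s[t] == v[k], and 0 otherwise.
Q = np.zeros((T, m))
Q[np.arange(T), idx] = 1.0

# Row t of Q @ g picks out g(s_t); here g is an arbitrary stand-in function.
g = v**2
fitted = Q @ g
```

Each row of `Q` contains a single 1, and `Q.T @ Q` is diagonal with the number of observations at each design point on its diagonal.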

The idea behind nonparametric modeling is to view the function evaluations in each $g_{ij}$ as the realization of a stochastic process which controls the degree of local variation between neighboring elements. Despite differing theoretical foundations and assumptions, the vectors of function evaluations $g_{ij}$ can eventually be written, in a wide range of nonparametric modeling approaches, as random fields of the form

$$ g_{ij} \mid \tau_{ij}^2 \sim N\left(g_{ij0}, \tau_{ij}^2 K_{ij}^{-1}\right), \quad j = 1, \ldots, r, \tag{5} $$

where $\tau_{ij}^2$ is a smoothness parameter and $K_{ij}$ is a matrix whose structure will be discussed shortly. From a Bayesian perspective, equation (5) can be viewed as a smoothness prior for $g_{ij}$, whereas from a frequentist perspective, it is often viewed as a roughness penalty term in penalized likelihood estimation (Wahba, 1978). In either case, the goal of the modeling is to introduce a penalty to local variation between successive elements of $g_{ij}$, without absolutely ruling out any possible value that the elements of $g_{ij}$ can take.

The focus in this paper is on models involving banded precision matrices $K_{ij}$, i.e., matrices which have non-zero elements only in small bands around the main diagonal. Matrix bandedness is a feature that significantly reduces the computational costs and makes the analysis of high-dimensional problems feasible and inexpensive. This paper will examine the construction of $g_{ij0}$ and $K_{ij}$ in (5) for a class of smoothness priors, which are conceptually simple and easily adaptable, can approximate unknown functions arbitrarily well, and have been widely used (see, for example, Poirier, 1973; Shiller, 1984; Besag et al., 1995; Fahrmeir and Lang, 2001; Koop and Poirier, 2004; Koop et al., 2005; Chib and Jeliazkov, 2006; Chib et al., 2009). The roots of this method can be traced back to Whittaker (1923), and its relationship with state space models has been discussed in Chan and Jeliazkov (2009). It should be noted that despite the focus on a specific smoothness prior, the estimation methodology described in this paper is generic and can be applied with various modeling approaches for the unknown functions, such as splines (Poirier, 1973; Shiller, 1984), B-splines (Silverman, 1985), wavelets (Denison et al., 2002), or the approach of Hamilton (2001). Although bandedness of the precision matrix is a useful characteristic of many of the preceding nonparametric approaches, it is not a feature of other popular modeling methods, e.g. regression splines or integrated Wiener process priors, which may be more computationally intensive in high-dimensional cases.

Because the modeling follows identical steps for each of the functions, for the time being we can simplify notation by suppressing the $ij$ subscripts that denote the equation and function numbers. With this convention, a Markov process prior views the elements of $g = (g(v_1), \ldots, g(v_m))' \equiv (g_1, \ldots, g_m)'$ as a stochastic process observed at the unique and ordered values in $v$. Specifically, letting $h_\ell \equiv v_\ell - v_{\ell-1}$, a first-order Markov process prior can be defined as

$$ g_\ell = g_{\ell-1} + u_\ell, \tag{6} $$

while a second-order Markov process prior is given by

$$ g_\ell = \left(1 + \frac{h_\ell}{h_{\ell-1}}\right) g_{\ell-1} - \frac{h_\ell}{h_{\ell-1}}\, g_{\ell-2} + u_\ell, \tag{7} $$

where $u_\ell \sim N(0, \tau^2 h_\ell)$ and $\tau^2$ is a smoothness parameter, such that small values of $\tau^2$ produce smoother functions, while larger values allow the function to be more flexible and interpolate the data more closely. The weights $h_\ell$ adjust the variance to account for possibly irregular spacing between consecutive points in each design vector; the one given here implies that the variance grows linearly with the distance $h_\ell$, although other weights are also possible. A distribution for the initial states of the stochastic process is necessary in order to complete the specification of the smoothness prior. For example, for the first-order prior, the initial state can be modeled as

$$ g_1 \mid \tau^2 \sim N\left(g_{10}, \tau^2 G_0\right), \tag{8} $$

whereas in the second-order case, we have

$$ \begin{pmatrix} g_1 \\ g_2 \end{pmatrix} \Bigm| \tau^2 \sim N\left( \begin{pmatrix} g_{10} \\ g_{20} \end{pmatrix}, \tau^2 G_0 \right), \tag{9} $$

where $G_0$ is a $2 \times 2$ symmetric positive definite matrix. The prior on the initial conditions in (8) and (9) is very important because it induces a proper prior on the remaining observations (see Chib and Jeliazkov, 2006). Specifically, equation (6), starting with the initial condition in (8), implies a penalty on abrupt jumps between successive function evaluations, whereas (7), starting with (9), induces a more general prior on linear functions of $v_\ell$ that is conceptually similar to the usual priors placed on the intercept and slope parameters in linear regression. This can be seen more precisely by iterating (7) in expectation (to eliminate $u_\ell$ which is the source of the nonlinearity), starting with initial states in (9).
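The iteration in expectation can be written out explicitly. The short derivation below is a sketch under the notation of (7) and (9); the symbols $\bar g_\ell$ and $\beta$ are introduced here only for this illustration.

```latex
% Take expectations of (7), writing \bar g_\ell \equiv E(g_\ell), so that E(u_\ell) = 0 drops out:
\bar g_\ell - \bar g_{\ell-1}
  = \frac{h_\ell}{h_{\ell-1}} \left( \bar g_{\ell-1} - \bar g_{\ell-2} \right)
\quad\Longrightarrow\quad
\frac{\bar g_\ell - \bar g_{\ell-1}}{h_\ell}
  = \frac{\bar g_{\ell-1} - \bar g_{\ell-2}}{h_{\ell-1}}
  = \cdots
  = \frac{g_{20} - g_{10}}{h_2} \equiv \beta .
% Summing the increments and using v_\ell - v_1 = \sum_{k=2}^{\ell} h_k:
\bar g_\ell = g_{10} + \beta \sum_{k=2}^{\ell} h_k = g_{10} + \beta \,(v_\ell - v_1).
```

The prior mean is therefore a straight line in $v_\ell$ whose intercept and slope are determined by the initial-state means in (9).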

The interpretability of the directed Markovian structure of the priors specified by (6)–(9) is a convenient aspect of this approach. However, it also leads to an equivalent undirected representation that is used in deriving the random field version of the smoothness prior in (5). This can be shown by recognizing that upon defining

$$ H = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} G_0 & & & \\ & h_2 & & \\ & & \ddots & \\ & & & h_m \end{pmatrix} $$

for the first-order case in equations (6) and (8), and similarly letting

$$ H = \begin{pmatrix} 1 & & & & \\ 0 & 1 & & & \\ \frac{h_3}{h_2} & -\left(1 + \frac{h_3}{h_2}\right) & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{h_m}{h_{m-1}} & -\left(1 + \frac{h_m}{h_{m-1}}\right) & 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} G_0 & & & \\ & h_3 & & \\ & & \ddots & \\ & & & h_m \end{pmatrix} $$

for the second-order Markov process in (7) and (9), one can write $Hg = u$, where $u \sim N(u_0, \tau^2 \Sigma)$ is used to denote the errors in the Markov process with $u_0 = (g_{10}, 0, \ldots, 0)'$ and $u_0 = (g_{10}, g_{20}, 0, \ldots, 0)'$ in the first- and second-order cases, respectively. A simple change of variables technique leads to the distribution $g \mid \tau^2 \sim N\left(g_0, \tau^2 K^{-1}\right)$, where the penalty matrix $K$ is given by $K = H' \Sigma^{-1} H$ and $g_0 = H^{-1} u_0$. This derivation leads to the distributions presented in (5), where the indices $i$ and $j$ are explicitly present. Note that $g_0$ can alternatively be derived by taking recursive expectations of either (6) or (7) starting with the mean in (8) or (9), respectively.
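The construction of the penalty matrix and prior mean can be sketched numerically. The code below builds $H$, $\Sigma$, and $u_0$ for the first- and second-order priors and forms $K = H'\Sigma^{-1}H$ and $g_0 = H^{-1}u_0$ with dense linear algebra. The helper names (`first_order`, `second_order`, `penalty_and_mean`, `bandwidth`), the design points, and the prior settings are hypothetical; a practical implementation would presumably exploit the banded structure rather than dense matrices.

```python
import numpy as np

def first_order(v, g10, G0):
    """H, Sigma, u0 for the first-order prior in (6) and (8); G0 is a scalar."""
    m = v.size
    h = np.diff(v)                          # h_2, ..., h_m
    H = np.eye(m) - np.eye(m, k=-1)         # rows (-1, 1) below the first row
    Sigma = np.diag(np.concatenate([[G0], h]))
    u0 = np.zeros(m)
    u0[0] = g10
    return H, Sigma, u0

def second_order(v, g10, g20, G0):
    """H, Sigma, u0 for the second-order prior in (7) and (9); G0 is 2 x 2."""
    m = v.size
    h = np.diff(v)                          # h[l - 2] stores h_l for l = 2, ..., m
    H = np.eye(m)
    for row in range(2, m):                 # 0-based row for l = row + 1
        ratio = h[row - 1] / h[row - 2]     # h_l / h_{l-1}
        H[row, row - 1] = -(1.0 + ratio)
        H[row, row - 2] = ratio
    Sigma = np.zeros((m, m))
    Sigma[:2, :2] = G0
    Sigma[np.arange(2, m), np.arange(2, m)] = h[1:]   # h_3, ..., h_m
    u0 = np.zeros(m)
    u0[:2] = [g10, g20]
    return H, Sigma, u0

def penalty_and_mean(H, Sigma, u0):
    """K = H' Sigma^{-1} H and g0 = H^{-1} u0."""
    K = H.T @ np.linalg.solve(Sigma, H)
    g0 = np.linalg.solve(H, u0)
    return K, g0

def bandwidth(A, tol=1e-12):
    """Largest |i - j| with a non-negligible entry A[i, j]."""
    i, j = np.nonzero(np.abs(A) > tol)
    return int(np.abs(i - j).max())

# Hypothetical, irregularly spaced design points and prior settings.
v = np.array([0.0, 0.4, 0.5, 1.1, 1.9, 2.0, 3.2])
K1, g0_1 = penalty_and_mean(*first_order(v, g10=1.0, G0=10.0))
K2, g0_2 = penalty_and_mean(*second_order(v, g10=1.0, g20=1.2, G0=10.0 * np.eye(2)))
```

In this sketch `K1` is tridiagonal and `K2` has two bands on either side of the diagonal, while `g0_1` is constant and `g0_2` is linear in `v`, matching the recursive-expectation argument.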

Two key features of the class of priors are that (i) they are proper, which allows for formal Bayesian model selection, and (ii) the $m \times m$ penalty matrices $K$ are banded, which is of considerable convenience, as manipulations involving such matrices take $O(m)$ operations, rather than the usual $O(m^3)$ operations for inversions and determinant computations, or $O(m^2)$ operations for multiplication by a vector. Given that $m$ may potentially be as large as the total number of observations $T$ in the sample, this has important ramifications for the numerical efficiency.

Since the priors on $\{g_{ij}\}$ are defined conditionally on the hyperparameters $\{\tau_{ij}^2\}$, the hierarchical structure of the model is completed by specifying the prior distributions $\tau_{ij}^2 \sim IG(\nu_{ij0}/2, \delta_{ij0}/2)$. Similarly, the prior distribution on the covariance matrix $\Omega$ is taken as $\Omega^{-1} \sim W(r_0, R_0)$. In setting these priors it is generally very helpful to consider their mapping to the mean and variance of the inverse gamma and Wishart distributions (see Gelman et al., 2003, App. A), as the choice of these parameters plays a role in determining the trade-off between smoothness and goodness of fit. An example of how different settings of the prior parameters can lead to over- or under-smoothing is presented in Chib and Jeliazkov (2006).

Before we can focus on estimating the model, we must address the likelihood identification problem that emerges due to the additive structure in (3). Because the likelihood will remain unchanged if we simultaneously let $g_{ij}^{*}(\cdot) = g_{ij}(\cdot) + \alpha$ and $g_{ik}^{*}(\cdot) = g_{ik}(\cdot) - \alpha$ for $k \neq j$, it is obvious that neither an intercept, nor the level of the individual functions is likelihood identified. This can also be seen by recognizing that all rows of every incidence matrix $Q_{ij}$ in (4) sum to 1, leading to perfect multicollinearity because model (4) can be thought of as a saturated dummy variable model. Therefore, the functions must be appropriately “anchored” in order to achieve likelihood identification.
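The identification problem can be checked numerically. The sketch below uses two hypothetical covariates on coarse grids, builds their incidence matrices with an illustrative helper `incidence`, and confirms both the rank deficiency of the stacked design and the invariance of the mean to offsetting shifts in the two functions. The functions `g1` and `g2` and the shift `alpha` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def incidence(s):
    """Return design points v and the T x m incidence matrix Q for covariate s."""
    v, idx = np.unique(s, return_inverse=True)
    Q = np.zeros((s.size, v.size))
    Q[np.arange(s.size), idx] = 1.0
    return v, Q

# Two hypothetical covariates on coarse grids so that values repeat.
T = 50
s1 = rng.integers(0, 5, T).astype(float)
s2 = rng.integers(0, 4, T).astype(float)
v1, Q1 = incidence(s1)
v2, Q2 = incidence(s2)

# Every row of each Q sums to one, so Q1 @ 1 == Q2 @ 1 and [Q1, Q2] is rank deficient.
X = np.hstack([Q1, Q2])
rank = np.linalg.matrix_rank(X)

# Shifting the two functions by +alpha and -alpha leaves the mean unchanged.
g1, g2, alpha = np.sin(v1), v2**2, 3.7
mean_a = Q1 @ g1 + Q2 @ g2
mean_b = Q1 @ (g1 + alpha) + Q2 @ (g2 - alpha)
```

Since `mean_a` and `mean_b` coincide, no amount of data can distinguish the two sets of functions through the likelihood alone.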

It is well known that Bayesian models with proper priors do not suffer from identification problems even when the likelihood is not identified (Lindley, 1971; Poirier, 1998). However, because the nonparametric components of additive models are correlated by construction (since they enter the mean function additively), likelihood identification is essential for providing a model with well-behaved conditional posterior distributions that will produce quickly-mixing MCMC algorithms for efficient posterior sampling. To achieve likelihood identification, I remove free constants in the likelihood by employing the identification restrictions proposed in Jeliazkov (2011). The approach formally identifies the model by centering the functions in the likelihood and integrating out, instead of holding fixed, any unidentified quantities that enter the specification.³ Such quantities are marginalized out with respect to a proper prior that is of no relevance in the likelihood and does not affect marginal likelihood estimation, thus relating this approach to the idea of marginal data augmentation discussed in Meng and van Dyk (1999), van Dyk and Meng (2001), and Imai and van Dyk (2005).

³ It will be sufficient to apply this centering to $r - 1$ of the unknown functions in each equation, allowing the overall intercept to be absorbed in the remaining function.

One approach to identification is to remove the free constants in the likelihood by restricting $r - 1$ of the functions $\{g_{ij}\}$ in each equation to start at zero (e.g., Shively et al., 1999; Koop et al., 2005; Chib et al., 2009). While this approach is quite natural as it corresponds to creating a baseline category in dummy variable models, in the context of nonparametric regression it tends to produce funnel-shaped error bands for the function estimates due to the identification restriction. Consequently, the information content in the data can be confounded with the repercussions of the identification restriction, so that narrower bands need not correspond to regions with more data or better identification of the function.

Another possibility for anchoring the functions is to consider the following version of (4)

$$ y_i = Q_{i1} g_{i1} + Q_{i2} M_{i2} g_{i2} + \ldots + Q_{ir} M_{ir} g_{ir} + \varepsilon_i, \tag{10} $$

where

$$ M_{ij} = I_{m_{ij}} - \frac{\mathbf{1}_{m_{ij}} \mathbf{1}_{m_{ij}}'}{m_{ij}}, \quad j = 1, \ldots, r, $$

are $m_{ij} \times m_{ij}$ symmetric and idempotent mean-differencing matrices (Hastie and Tibshirani, 1990; Lin and Zhang, 1999). Unfortunately, this identification scheme does not lend itself to computationally efficient posterior simulation and, as pointed out by Gelfand (2000), it has been applied in ways that do not correspond to well-defined Bayesian models, with centering typically introduced “on the fly” merely as a step in the fitting algorithm.

For these reasons, this paper employs the closely related, yet computationally very distinct, identification scheme presented in Jeliazkov (2011), where the additive functions are identified through

$$ y_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \ldots + M_0 Q_{ir} g_{ir} + \varepsilon_i, \tag{11} $$

where the $T \times T$ symmetric and idempotent mean-differencing matrix

$$ M_0 = I_T - \frac{\mathbf{1}_T \mathbf{1}_T'}{T} $$

now pre-multiplies the incidence matrices $\{Q_{ij}\}$ and centers the expanded vector of functional evaluations. The benefits from this identification method are discussed next.
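The effect of pre-multiplying an incidence matrix by the $T \times T$ mean-differencing matrix can be illustrated with a short sketch. The matrix is called `M0` in the code; the covariate, the stand-in function `g`, and the added constant are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

T = 40
s = rng.integers(0, 6, T).astype(float)      # hypothetical covariate with repeats
v, idx = np.unique(s, return_inverse=True)
Q = np.zeros((T, v.size))
Q[np.arange(T), idx] = 1.0

# T x T mean-differencing matrix M0 = I - 1 1' / T.
ones = np.ones((T, 1))
M0 = np.eye(T) - ones @ ones.T / T

g = np.cos(v)                                # stand-in function evaluations
centered = M0 @ Q @ g                        # expanded evaluations with their mean removed
shifted = M0 @ Q @ (g + 2.5)                 # adding a constant to g changes nothing
```

Because `centered` and `shifted` are identical, the level of a centered function no longer enters the mean of the data, which is the sense in which the free constant is removed from the likelihood.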

3 Estimation

To motivate the general approach, this section begins by considering the important special case of a single equation univariate regression model. Given data $\{y_t, s_t\}_{t=1}^{T}$, the scalar responses $y_t$ are assumed to depend on the (scalar) covariate $s_t$ according to

$$ y_t = g(s_t) + \varepsilon_t, \quad (t = 1, \ldots, T), \tag{12} $$

where $\varepsilon_t \sim N(0, \sigma^2)$, and $g(\cdot)$ is an unknown smooth function. The model in (12) can be written in stacked form as

$$ y = Qg + \varepsilon, \quad \varepsilon \sim N\left(0, \sigma^2 I_T\right), \tag{13} $$

where $Q$ is the incidence matrix defined after equation (4). Combining the Gaussian likelihood implied by (13) with the Gaussian smoothness prior in (5), for either a first- or second-order process, and the inverse gamma priors $\tau^2 \sim IG(\nu_0/2, \delta_0/2)$ and $\sigma^2 \sim IG(s_0/2, d_0/2)$ yields full-conditional distributions which are conjugate, i.e. they are in the same family as the priors (see, e.g., Koop, 2003; Greenberg, 2008). Sequential sampling from those full-conditional distributions lays the foundations for the following algorithm.

Algorithm 1 Univariate Gaussian Nonparametric Model: MCMC Implementation

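1. Sample $[g \mid y, \tau^2, \sigma^2] \sim N(\hat{g}, G)$, where $G$ and $\hat{g}$ are the usual Bayes updates for linear regression, namely $G = \left(K/\tau^2 + Q'Q/\sigma^2\right)^{-1}$ and $\hat{g} = G\left(K g_0/\tau^2 + Q'y/\sigma^2\right)$. Remark 1 presents important notes on the sampling in this step.

2. Sample $[\tau^2 \mid g] \sim IG\left(\frac{\nu_0 + m}{2}, \frac{\delta_0 + (g - g_0)'K(g - g_0)}{2}\right)$, where conditionally on $g$, $\tau^2$ is independent of the remaining parameters and the data.

3. Sample $[\sigma^2 \mid y, g] \sim IG\left(\frac{s_0 + T}{2}, \frac{d_0 + (y - Qg)'(y - Qg)}{2}\right)$.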
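Steps 2 and 3 of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names are hypothetical, and the inverse Gamma draw is obtained by inverting a Gamma variate, under the parameterization in which $IG(a, b)$ has density proportional to $x^{-(a+1)} e^{-b/x}$.

```python
import numpy as np

def draw_inverse_gamma(shape, scale, rng):
    """If Y ~ Gamma(shape, 1), then scale / Y ~ IG(shape, scale)."""
    return scale / rng.gamma(shape)

def draw_tau2(g, g0, K, nu0, delta0, rng):
    """Step 2: smoothness parameter given the function values g."""
    dev = g - g0
    return draw_inverse_gamma((nu0 + len(g)) / 2.0,
                              (delta0 + dev @ K @ dev) / 2.0, rng)

def draw_sigma2(y, Q, g, s0, d0, rng):
    """Step 3: error variance given the residuals y - Qg."""
    resid = y - Q @ g
    return draw_inverse_gamma((s0 + len(y)) / 2.0,
                              (d0 + resid @ resid) / 2.0, rng)
```

Both draws cost no more than forming the quadratic forms, which is why the computational burden of Algorithm 1 is concentrated in Step 1.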



While steps 2 and 3 of Algorithm 1 are fairly straightforward, step 1 requires careful consideration

because the quantities involved there can be of dimension as high as the sample size T. For this reason,

estimation is performed as follows (see Fahrmeir and Lang, 2001).

Remark 1 Sampling of g. To sample $g$, note that $Q'Q$ is a diagonal matrix whose $t$-th diagonal entry equals the number of values in $s$ corresponding to the design point $v_t$. Since $K$ and $Q'Q$ are banded, $G^{-1}$ is banded as well. Thus sampling of $g$ need not include an inversion to obtain $G$ and $\hat{g}$. The mean $\hat{g}$ is found instead by solving $G^{-1}\hat{g} = \left(K g_0/\tau^2 + Q'y/\sigma^2\right)$, which is done in $O(T)$ operations by back substitution. Also, let $P'P = G^{-1}$, where $P$ is the Cholesky factor of $G^{-1}$ and is also banded. To obtain a random draw from $N(\hat{g}, G)$ efficiently, sample $u \sim N(0, I)$, and solve $Pw = u$ for $w$ by back substitution. It follows that $w \sim N(0, G)$. Adding the mean $\hat{g}$ to $w$, one obtains a draw $g \sim N(\hat{g}, G)$.
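The precision-based draw described in Remark 1 can be sketched as below. For readability the sketch uses dense NumPy routines; an $O(T)$ implementation would replace them with banded solvers (for instance `scipy.linalg.cholesky_banded` and `scipy.linalg.solve_banded`). The function names and the first-order penalty matrix constructed here are assumptions for illustration.

```python
import numpy as np

def first_order_penalty(m):
    """Tridiagonal K = H'H from a first-difference matrix H (illustrative prior)."""
    H = np.eye(m) - np.eye(m, k=-1)
    return H.T @ H

def draw_g(K, counts, Qty, g0, tau2, sigma2, rng):
    """One draw from N(g_hat, G) that works only with the precision G^{-1}."""
    prec = K / tau2 + np.diag(counts) / sigma2   # G^{-1}; Q'Q is diagonal
    rhs = K @ g0 / tau2 + Qty / sigma2
    g_hat = np.linalg.solve(prec, rhs)           # solve G^{-1} g_hat = rhs, no inversion
    P = np.linalg.cholesky(prec).T               # upper triangular, P'P = G^{-1}
    u = rng.standard_normal(len(g0))
    w = np.linalg.solve(P, u)                    # P w = u, hence w ~ N(0, G)
    return g_hat + w, g_hat, P
```

Here `counts` holds the diagonal of $Q'Q$ and `Qty` holds $Q'y$, so the $T \times m$ incidence matrix never has to be formed explicitly.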

Turning attention to the additive case, let θ denote the vector of all model parameters, i.e. the elements

of $\{g_{ij}\}$, $\{\tau_{ij}^2\}$, and the unique entries of $\Omega$. Then, based on the identifying restrictions in (11) and the priors

discussed in Section 2, MCMC estimation can proceed through iterative sampling of the following steps.

Algorithm 2 NPVAR Model: MCMC Implementation

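1. Sample $[g_{i1} \mid y, \theta \setminus g_{i1}] \sim N(\hat{g}_{i1}, G_{i1})$ for $i = 1, \dots, q$, where

$$G_{i1} = \left(K_{i1}/\tau_{i1}^2 + Q_{i1}'Q_{i1}/\sigma_{i|\setminus i}\right)^{-1},$$

$$\hat{g}_{i1} = G_{i1}\left(K_{i1} g_{i10}/\tau_{i1}^2 + Q_{i1}'\Big(y_i - \mu_{i|\setminus i} - \sum_{j=2}^{r} M_0 Q_{ij} g_{ij}\Big)\Big/\sigma_{i|\setminus i}\right),$$

with $\mu_{i|\setminus i} = E(\varepsilon_i \mid \varepsilon_{\setminus i})$ and $\sigma_{i|\setminus i} = Var(\varepsilon_i \mid \varepsilon_{\setminus i})$. The sampling in this step is carried out efficiently in $O(T)$ operations as discussed in Remark 1.

2. Sample $[g_{ij} \mid y, \theta \setminus g_{ij}] \sim N(\hat{g}_{ij}, G_{ij})$ for $j = 2, \dots, r$ and $i = 1, \dots, q$, where

$$G_{ij} = \left(K_{ij}/\tau_{ij}^2 + Q_{ij}' M_0 Q_{ij}/\sigma_{i|\setminus i}\right)^{-1},$$

and

$$\hat{g}_{ij} = G_{ij}\left(K_{ij} g_{ij0}/\tau_{ij}^2 + Q_{ij}' M_0 \Big(y_i - \mu_{i|\setminus i} - Q_{i1} g_{i1} - \sum_{k \geq 2, k \neq j} M_0 Q_{ik} g_{ik}\Big)\Big/\sigma_{i|\setminus i}\right).$$

Remark 2 below shows how the sampling in this step can be carried out efficiently in $O(T)$ operations, even though $G_{ij}^{-1}$ is not banded.

3. Sample $[\tau_{ij}^2 \mid g_{ij}] \sim IG\left([\nu_{ij0} + m_{ij}]/2, \; [\delta_{ij0} + (g_{ij} - g_{ij0})' K_{ij} (g_{ij} - g_{ij0})]/2\right)$ for $i = 1, \dots, q$, and $j = 1, \dots, r$, where, given $g_{ij}$, $\tau_{ij}^2$ is independent of the other elements in $\theta$ and the data $y$.

4. Sample $[\Omega^{-1} \mid y, \theta \setminus \Omega] \sim W\left(r_0 + T, \; \Big[R_0^{-1} + \sum_{t=1}^{T} e_t e_t'\Big]^{-1}\right)$, where $r_0$ and $R_0$ are the prior degrees of freedom and scale matrix, and $e_t$ denotes the $q \times 1$ vector of residuals in time period t.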
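The conditional moments $\mu_{i|\setminus i}$ and $\sigma_{i|\setminus i}$ used in Steps 1 and 2 follow from the standard formulas for a partitioned multivariate normal. The sketch below is one way to compute them; the function name and array layout (a $T \times q$ matrix of current residuals) are assumptions for illustration.

```python
import numpy as np

def conditional_moments(Omega, resid, i):
    """Mean and variance of the error in equation i given the errors in the other equations.

    Omega : q x q error covariance matrix
    resid : T x q array of current residuals (one column per equation)
    """
    q = Omega.shape[0]
    rest = [k for k in range(q) if k != i]
    # weights = Omega_{rest,rest}^{-1} Omega_{rest,i}, obtained by a linear solve
    weights = np.linalg.solve(Omega[np.ix_(rest, rest)], Omega[rest, i])
    mu = resid[:, rest] @ weights                 # T-vector of conditional means
    sigma = Omega[i, i] - Omega[i, rest] @ weights  # scalar conditional variance
    return mu, sigma
```

A useful check is that the conditional variance equals the reciprocal of the $i$-th diagonal element of $\Omega^{-1}$.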

Algorithm 2 generalizes Algorithm 1 in a straightforward fashion by sampling each unknown function

conditionally on the remaining ones, making simulation manageable. Step 2 of Algorithm 2, however, involves r − 1 non-banded matrices in each equation, and at first glance it would appear that simulation will be very demanding. Fortunately, as shown in Jeliazkov (2011), an application

of the Sherman-Morrison formula makes it possible to sample these functions efﬁciently. The approach is

presented in greater detail in Remark 2 further below. The modularity and computational advantages of this

estimation strategy can provide important beneﬁts in a variety of settings because simulating the functions

by brute force methods is not always practical owing to the algorithmic complexity of working with high-

dimensional matrices. Moreover, because the frequentist backﬁtting approach to estimating the unknown

functions can be viewed as a (non-stochastic) simpliﬁcation of Gibbs sampling (see Hastie and Tibshirani,

2000), Algorithm 2 can also be useful in frequentist estimation.

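Remark 2 Sampling of Centered Functions. To draw $g_{ij} \sim N(\hat{g}_{ij}, G_{ij})$ in Step 2 of Algorithm 2, use the definition of $M_0$ to write

$$G_{ij} = \left(K_{ij}/\tau_{ij}^2 + Q_{ij}' M_0 Q_{ij}/\sigma_{i|\setminus i}\right)^{-1} = \left(K_{ij}/\tau_{ij}^2 + Q_{ij}'Q_{ij}/\sigma_{i|\setminus i} - c_{ij} c_{ij}'/(T\sigma_{i|\setminus i})\right)^{-1},$$

where $c_{ij} = Q_{ij}' 1$. Letting $A_{ij} = K_{ij}/\tau_{ij}^2 + Q_{ij}'Q_{ij}/\sigma_{i|\setminus i}$, $u_{ij} = c_{ij}/\sqrt{T\sigma_{i|\setminus i}}$, and $\lambda_{ij} = u_{ij}' A_{ij}^{-1} u_{ij}$, one can write, by the Sherman-Morrison formula,

$$\left(A_{ij} - u_{ij} u_{ij}'\right)^{-1} = A_{ij}^{-1} + \frac{A_{ij}^{-1} u_{ij} u_{ij}' A_{ij}^{-1}}{1 - \lambda_{ij}}. \tag{14}$$

Significant efficiency benefits can be derived from (14) because $\hat{g}_{ij}$ in Step 2 of Algorithm 2 can be obtained by working with $A_{ij}$ without inverting to $A_{ij}^{-1}$, as outlined in Remark 1. Furthermore, let

$$B_{ij} = A_{ij} + \frac{u_{ij} u_{ij}'}{1 - \lambda_{ij}},$$

which implies that $G_{ij} = A_{ij}^{-1} B_{ij} A_{ij}^{-1}$. Thus, if $x \sim N(0, B_{ij})$, then $z = A_{ij}^{-1} x$ is distributed $z \sim N(0, G_{ij})$, and a draw $g_{ij} \sim N(\hat{g}_{ij}, G_{ij})$ is obtained as $g_{ij} = \hat{g}_{ij} + z$. To generate $x \sim N(0, B_{ij})$, draw $w_1 \sim N(0, A_{ij})$ and $w_2 \sim N(0, 1)$ and let $x = w_1 + w_2 u_{ij}/\sqrt{1 - \lambda_{ij}}$.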
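The steps of Remark 2 can be sketched as follows. As before, dense NumPy solves stand in for the banded routines that would deliver $O(T)$ cost, and the function name `centered_draw` and its argument layout are assumptions for illustration. The argument `rhs` denotes the vector multiplying $G_{ij}$ in the expression for $\hat{g}_{ij}$.

```python
import numpy as np

def centered_draw(K, Q, tau2, sigma, rhs, rng):
    """Draw from N(G rhs, G) with G = (K/tau2 + Q'M0 Q/sigma)^{-1}, using only solves with A."""
    T = Q.shape[0]
    A = K / tau2 + Q.T @ Q / sigma                 # banded part of the precision
    u = Q.T @ np.ones(T) / np.sqrt(T * sigma)      # rank-one correction, c / sqrt(T sigma)
    Ainv_u = np.linalg.solve(A, u)
    lam = u @ Ainv_u
    # Mean via the Sherman-Morrison identity (14)
    Ainv_rhs = np.linalg.solve(A, rhs)
    g_hat = Ainv_rhs + Ainv_u * (u @ Ainv_rhs) / (1.0 - lam)
    # x ~ N(0, B) with B = A + u u'/(1 - lam)
    P = np.linalg.cholesky(A).T                    # A = P'P
    w1 = P.T @ rng.standard_normal(A.shape[0])     # w1 ~ N(0, A)
    w2 = rng.standard_normal()
    x = w1 + w2 * u / np.sqrt(1.0 - lam)
    z = np.linalg.solve(A, x)                      # z ~ N(0, G)
    return g_hat + z, g_hat, lam
```

The draw never forms $M_0$ or $G_{ij}$ explicitly; the only matrix that is factored or solved against is $A_{ij}$, which retains the banded structure exploited in Remark 1.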

As a consequence of the shortcuts afforded by Remark 2, all operations are $O(T)$ rather than $O(T^3)$.

4 Model Comparison

Empirical studies must inevitably address uncertainty not only about the parameters of a given model, but

also about the model speciﬁcation itself. This makes model comparison a central issue in statistical analysis.

Given a collection of models $\{\mathcal{M}_1, \dots, \mathcal{M}_L\}$, the formal Bayesian approach to model comparison (or testing the validity of the alternative hypotheses captured by each model) is based on the posterior model probabilities and their ratios, the posterior odds. Specifically, for any two models $\mathcal{M}_i$ and $\mathcal{M}_j$, a simple application of Bayes' theorem suggests that the posterior odds can be represented as the product of the prior odds and the ratio of the marginal likelihoods (the Bayes factor) as follows

$$\frac{\Pr(\mathcal{M}_i \mid y)}{\Pr(\mathcal{M}_j \mid y)} = \frac{\Pr(\mathcal{M}_i)}{\Pr(\mathcal{M}_j)} \times \frac{m(y \mid \mathcal{M}_i)}{m(y \mid \mathcal{M}_j)}.$$

In turn, for any model $\mathcal{M}_l$, $l = 1, \dots, L$, the marginal likelihood is given by

$$m(y \mid \mathcal{M}_l) = \int f(y \mid \theta_l, \mathcal{M}_l)\, \pi(\theta_l)\, d\theta_l, \tag{15}$$

which is the integral of the likelihood function $f(y \mid \theta_l, \mathcal{M}_l)$ with respect to the prior distribution on the model parameters $\pi(\theta_l)$. Because in the case of nonparametric additive models the dimension of $\theta$ can be very large, it should be clear that direct analytical integration will generally be infeasible. However, this difficulty can be addressed by using the approach of Chib (1995), where after rearranging Bayes' theorem $m(y \mid \mathcal{M}_l)$ can alternatively be expressed as

$$m(y \mid \mathcal{M}_l) = \frac{f(y \mid \theta_l^*, \mathcal{M}_l)\, \pi(\theta_l^*)}{\pi(\theta_l^* \mid y, \mathcal{M}_l)}, \tag{16}$$

so that the integral in (15) is reduced to the more tractable problem of evaluating the likelihood, prior, and posterior ordinates at a single point $\theta_l^*$ (e.g., the posterior mean). Because the numerator terms in (16) are available by direct calculation, the marginal likelihood can be computed by finding an estimate of the posterior ordinate $\pi(\theta_l^* \mid y)$.
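The identity in (16) can be checked in a toy setting where every ordinate is available in closed form. The sketch below assumes a model that is not part of the paper, $y_t \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$, for which the posterior is $N\left(\sum_t y_t/(n+1), 1/(n+1)\right)$ and the marginal distribution of $y$ is $N(0, I + 11')$. All function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal, evaluated elementwise."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def chib_log_marginal(y, theta_star):
    """log m(y) = log f(y|theta*) + log pi(theta*) - log pi(theta*|y), as in (16)."""
    n = len(y)
    log_lik = np.sum(log_normal_pdf(y, theta_star, 1.0))
    log_prior = log_normal_pdf(theta_star, 0.0, 1.0)
    post_var = 1.0 / (n + 1.0)
    post_mean = post_var * np.sum(y)
    log_post = log_normal_pdf(theta_star, post_mean, post_var)
    return log_lik + log_prior - log_post

def direct_log_marginal(y):
    """Log density of y ~ N(0, I + 11'), using det = n + 1 and (I + 11')^{-1} = I - 11'/(n + 1)."""
    n = len(y)
    quad = np.sum(y ** 2) - np.sum(y) ** 2 / (n + 1.0)
    return -0.5 * (n * np.log(2.0 * np.pi) + np.log(n + 1.0) + quad)
```

In this conjugate example the identity holds exactly at any $\theta^*$; in the NPVAR setting the posterior ordinate must instead be estimated from MCMC output, and a high-density point such as the posterior mean is chosen for estimation efficiency.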

In the current context, the hierarchical structure of NPVAR models allows application of (16) in two different ways. One approach relies on

$$m(y) = \frac{f\left(y \mid \{\tau_{ij}^{2*}\}, \Omega^*, \{g_{ij}^*\}\right) \pi\left(\{\tau_{ij}^{2*}\}, \Omega^*, \{g_{ij}^*\}\right)}{\pi\left(\{\tau_{ij}^{2*}\}, \Omega^*, \{g_{ij}^*\} \mid y\right)},$$

where the nonparametric functions are explicitly included in the identity (see Chib and Jeliazkov, 2006; Chib et al., 2009). However, owing to the Gaussian structure of the model, the marginal likelihood can also be computed using

$$m(y) = \frac{f\left(y \mid \{\tau_{ij}^{2*}\}, \Omega^*\right) \pi\left(\{\tau_{ij}^{2*}\}, \Omega^*\right)}{\pi\left(\{\tau_{ij}^{2*}\}, \Omega^* \mid y\right)},$$

where all quantities are marginalized over the high-dimensional blocks $\{g_{ij}\}$. This marginalization is possible because conditionally on $\left(\{\tau_{ij}^{2*}\}, \Omega^*\right)$, the density $f\left(y \mid \{\tau_{ij}^{2*}\}, \Omega^*\right)$, marginalized over $\{g_{ij}\}$ with respect to the prior distributions in (5), is also normal (Koop and Poirier, 2004) and can be evaluated directly for the typical sample sizes $T$ encountered in macroeconomic applications. Because of this analytical tractability, $m(y)$ can then be found after the main run where, using the conditional independence of the densities in Steps 3 and 4 of Algorithm 2, one computes

$$\pi\left(\{\tau_{ij}^{2*}\}, \Omega^* \mid y\right) \approx T^{-1} \sum_{t=1}^{T} \left\{ \pi\left(\Omega^* \mid y, \{g_{ij}^{(t)}\}\right) \prod_{i=1}^{q} \prod_{j=1}^{r} \pi\left(\tau_{ij}^{2*} \mid g_{ij}^{(t)}\right) \right\}$$

using draws $\{g_{ij}^{(t)}\}$ from the main MCMC run (in this average, $t$ indexes the retained MCMC draws rather than time periods). In instances where the sample size $T$ is large or there are other complications (e.g., discrete outcomes), the decomposition involving $\{g_{ij}\}$ is more appropriate and readers are referred to Chib and Jeliazkov (2006) and Chib et al. (2009) for methods that use reduced runs or to Jeliazkov and Lee (2010) for a method that employs the Gibbs kernel and invariance of the Markov chain to estimate the posterior ordinate $\pi\left(\{\tau_{ij}^{2*}\}, \Omega^*, \{g_{ij}^*\} \mid y\right)$.

An important point mentioned earlier is that the marginal likelihood for the additive NPVAR model does

not depend on the (likelihood unidentiﬁed) levels of the functions that are centered for identiﬁcation. This

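can be seen by recognizing that if $f(y \mid \theta_1, \theta_2) = f(y \mid \theta_1)$, i.e. the likelihood depends only on $\theta_1$ whereas $\theta_2$ is unidentified, but we have a proper prior $\pi(\theta_1, \theta_2)$, then the marginal likelihood

$$m(y) = \int\!\!\int f(y \mid \theta_1, \theta_2)\, \pi(\theta_1, \theta_2)\, d\theta_1\, d\theta_2 = \int f(y \mid \theta_1)\, \pi(\theta_1) \left[ \int \pi(\theta_2 \mid \theta_1)\, d\theta_2 \right] d\theta_1 = \int f(y \mid \theta_1)\, \pi(\theta_1)\, d\theta_1$$

is not influenced by the prior on $\theta_2$. In practice this is important for modeling, because it implies that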

researchers with different beliefs about unidentiﬁed parameters will nevertheless reach identical conclusions

about the relative ranking of alternative models.

Finally, note that because estimation of the marginal likelihood does not require maximization, it is less

computationally intensive in nonparametric additive models than evaluation of information criteria such as

AIC and BIC. This point has not been fully appreciated in the literature despite its importance for model selection and model averaging on the basis of $\{\Pr(\mathcal{M}_l \mid y)\}$.

5 Model Extensions

The estimation techniques presented in this paper are fully modular and readily applicable in various other

settings since estimation of the unknown functions $\{g_{ij}\}$ can be done conditionally on modifications in

other parts of the model. The goal of this discussion is to brieﬂy review the relevant literature and provide

references that could guide researchers interested in pursuing such extensions.

One should note that the methods in Section 2 and Section 3 trivially generalize to cases where one or

more exogenous covariates enter the regression as in seemingly unrelated regression models (e.g., Smith

and Kohn, 2000; Holmes et al., 2002; Koop et al., 2005). Further extensions of the framework to Bayesian

models with nonparametric endogeneity or sample selection can be pursued following Chib and Greenberg

(2007) or Chib et al. (2009), respectively; the modeling would also be useful in guiding future research on

structural NPVAR models and impulse response analysis. Many of the aforementioned papers also trivially

subsume semiparametric and partially linear cases where some of the covariates enter the model linearly.

Estimation of such models is a straightforward extension of Algorithms 1 and 2, and proceeds by using

the partial residuals $y_i - X_i\beta$ when simulating $\{g_{ij}\}$, followed by simulating $\beta$ conditionally upon the functions $\{g_{ij}\}$.

The methods in this paper, including the above extensions, can also be applied, using data augmentation

techniques (Tanner and Wong, 1987; Albert and Chib, 1993), to the analysis of dynamic systems involving

binary, polychotomous, censored, and other discrete outcomes such as the qualitative VAR model of Dueker

(2005). The main advantage of this approach is that conditionally on the latent data, estimation of the

parameters and the unknown functions closely mirrors the methods for continuous data. The methodology

presented here is also applicable to the class of additive mixed models for continuous and discrete data (e.g.,

Lin and Zhang, 1999). For example, Chib and Jeliazkov (2006) discuss the speciﬁcation and estimation of

a semiparametric partially linear model for dynamic binary panel data with multivariate heterogeneity. The

estimation algorithm in that paper can be easily modiﬁed to include an additive structure whose estimation

can be carried out by the methods presented in Section 3.

While the speciﬁcation and estimation of NPVAR models was discussed in detail for homoskedastic

Gaussian models, extensions to other distributions (e.g., Student's t, mixtures of normals, or Dirichlet process priors for nonparametric distributional modeling) and heteroskedasticity (e.g., regime switching models with different variance regimes, or models with stochastic volatility) may be very desirable in certain applications. One such example, relating to different variance regimes, will be studied in Section 6. Fortunately, such extensions can be estimated using data augmentation techniques that could build upon the homoskedastic Gaussian specification discussed earlier. In particular, consider a heteroskedastic model in which the $i$th equation can be written as

$$y_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \dots + M_0 Q_{ir} g_{ir} + \varepsilon_i$$

with $\Sigma_i \equiv Var(\varepsilon_i) = diag(\sigma_{i1}^2, \dots, \sigma_{iT}^2)$. Due to the heteroskedasticity, the covariance matrix of $[g_{ij} \mid y, \theta]$ is not of the form presented in Remark 2, and estimation cannot be performed efficiently by relying on the Sherman-Morrison formula. This poses an important computational difficulty because estimation in large dimensional models would be very difficult and potentially infeasible. In this paper, I propose a solution to this problem that employs data augmentation to reduce the heteroskedastic model to a homoskedastic one, enabling application of the methods discussed in Algorithm 2 and Remark 2. In particular, following Chib and Jeliazkov (2006), we can write

$$y_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \dots + M_0 Q_{ir} g_{ir} + \eta_i + \nu_i$$

where $\eta_i \overset{iid}{\sim} N(0, \Sigma_i - \kappa_i I)$ and $\nu_i \overset{iid}{\sim} N(0, \kappa_i I)$ with $0 < \kappa_i \leq \min_t\{\sigma_{it}^2\}$. Consequently, given a draw of $\eta_i$, which is simple and inexpensive to obtain, the model

$$y_i - \eta_i = Q_{i1} g_{i1} + M_0 Q_{i2} g_{i2} + \dots + M_0 Q_{ir} g_{ir} + \nu_i$$

is homoskedastic because $Var(\nu_i) = \kappa_i I$. In our context, it would actually be optimal to set $\kappa_i = \min_t\{\sigma_{it}^2\}$ because this would imply that the corresponding elements of $\eta_i$ would be identically 0 and will not need to be sampled. This leads to the following extension of Algorithm 2:

Algorithm 3 NPVAR Model: MCMC Estimation of Heteroskedastic Model

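1. For $i = 1, \dots, q$:

(a) Sample $[\eta_i \mid y, \theta]$ by drawing, for $t = 1, \dots, T$, $\eta_{it}$ from a normal distribution with mean $\hat{\eta}_{it}$ and variance $\kappa_i(\sigma_{it}^2 - \kappa_i)/\sigma_{it}^2$, where $\hat{\eta}_{it} = (\sigma_{it}^2 - \kappa_i)(y_{it} - \mu_{it|\setminus i,t} - m_{it})/\sigma_{it}^2$, $m_{it}$ is the $t$-th row of $Q_{i1} g_{i1} + M_0 \sum_{k \geq 2} Q_{ik} g_{ik}$, $\mu_{it|\setminus i,t}$ is the $t$-th row of $\mu_{i|\setminus i} = E(\varepsilon_i \mid \varepsilon_{\setminus i})$, and $\kappa_i = \min_t\{\sigma_{it}^2\}$, where $\sigma_{it}^2 = Var(\varepsilon_{it} \mid \varepsilon_{\setminus i,t})$. Note that for cases where $\kappa_i = \sigma_{it}^2$, the corresponding entry in $\eta_i$ is identically zero and need not be sampled.

(b) Sample $[g_{i1} \mid y, \eta_i, \theta \setminus g_{i1}] \sim N(\hat{g}_{i1}, G_{i1})$, where

$$G_{i1} = \left(K_{i1}/\tau_{i1}^2 + Q_{i1}'Q_{i1}/\kappa_i\right)^{-1},$$

$$\hat{g}_{i1} = G_{i1}\left(K_{i1} g_{i10}/\tau_{i1}^2 + Q_{i1}'\Big(y_i - \eta_i - \mu_{i|\setminus i} - \sum_{j=2}^{r} M_0 Q_{ij} g_{ij}\Big)\Big/\kappa_i\right),$$

with $\mu_{i|\setminus i} = E(\varepsilon_i \mid \varepsilon_{\setminus i})$ and $\kappa_i = \min_t\{\sigma_{it}^2\}$, where $\sigma_{it}^2 = Var(\varepsilon_{it} \mid \varepsilon_{\setminus i,t})$. The sampling in this step is carried out efficiently as in Remark 1.

(c) Sample $[g_{ij} \mid y, \eta_i, \theta \setminus g_{ij}] \sim N(\hat{g}_{ij}, G_{ij})$ for $j = 2, \dots, r$, where

$$G_{ij} = \left(K_{ij}/\tau_{ij}^2 + Q_{ij}' M_0 Q_{ij}/\kappa_i\right)^{-1},$$

and

$$\hat{g}_{ij} = G_{ij}\left(K_{ij} g_{ij0}/\tau_{ij}^2 + Q_{ij}' M_0 \Big(y_i - \eta_i - \mu_{i|\setminus i} - Q_{i1} g_{i1} - \sum_{k \geq 2, k \neq j} M_0 Q_{ik} g_{ik}\Big)\Big/\kappa_i\right).$$

This is done as in Remark 2.

2. Sample $[\tau_{ij}^2 \mid g_{ij}]$ for $i = 1, \dots, q$, and $j = 1, \dots, r$, as in Algorithm 2.

3. Sample $[\Omega^{-1} \mid y, \theta \setminus \Omega]$ according to the volatility process under consideration.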
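Step 1(a) of Algorithm 3 amounts to a vectorized normal draw. The sketch below assumes that the residual vector $y_{it} - \mu_{it|\setminus i,t} - m_{it}$ and the conditional variances $\sigma_{it}^2$ have already been computed; the function name `draw_eta` and its return values are hypothetical.

```python
import numpy as np

def draw_eta(resid, sigma2, rng):
    """Draw the heteroskedastic error component eta_i given the current residuals.

    resid  : length-T vector y_it - mu_it - m_it
    sigma2 : length-T vector of conditional variances
    """
    kappa = sigma2.min()                          # largest admissible homoskedastic variance
    mean = (sigma2 - kappa) * resid / sigma2      # conditional mean of eta_it
    var = kappa * (sigma2 - kappa) / sigma2       # conditional variance of eta_it
    eta = mean + np.sqrt(var) * rng.standard_normal(len(resid))
    return eta, kappa, mean, var
```

In periods where the variance equals $\kappa_i$, both the conditional mean and variance are zero, so the corresponding entries of the returned draw are exactly zero, as noted in Step 1(a).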

The above machinery also applies to mixture-of-normals and scale mixture-of-normals models. In particular, a model with $t$ errors with $\nu$ degrees of freedom can be represented as a conditionally Gaussian model, whose variance, given a set of a priori gamma latent variables $\lambda_t \sim G(\nu/2, \nu/2)$, $t = 1, \dots, T$, is given by $Var(\varepsilon_t \mid \lambda_t) = \sigma^2/\lambda_t$ (Andrews and Mallows, 1974; Albert and Chib, 1993). Estimation of these models is straightforward because given $\{\lambda_t\}$, one can decompose $\varepsilon_t$ into $\eta_t$ and $\nu_t$ and proceed as above.

In this way, NPVAR models can be adapted to a variety of speciﬁcations for the error variance, including

changepoint and regime switching models (Chib, 1996, 1998; Sims and Zha, 2006), time-varying parameter

models (Primiceri, 2005; Chan and Jeliazkov, 2009), factor models (Kose et al., 2003; Belviso and Milani,

2006; Kose et al., 2008; Chan and Jeliazkov, 2009), and others.

6 Application to U.S. Macroeconomic Data

The data sample for this application contains post-war quarterly macroeconomic data for the U.S. from

1948:Q1 to 2005:Q1. The set of variables includes output growth $g_t$ measured by log differences of real GDP between two consecutive quarters, average quarterly unemployment rate $u_t$, inflation $\pi_t$ measured by the percentage change in the Consumer Price Index between consecutive quarters, and interest rates $i_t$ measured by the average quarterly secondary market yield on the 3-month Treasury bill. The first three of

these variables are seasonally adjusted. These variables, summarized in Table 1, reflect the general state of the economy, and have been widely used in empirical macroeconomics. From the table, we see that the average quarterly GDP growth over the sample period is 0.85 percent, which amounts to annual GDP growth of 3.4 percent. A similar computation shows an average annual inflation rate of approximately 3.7 percent. Unemployment and interest rates average 5.63 and 4.81 percent, respectively.

Footnote: The sample period excludes the past recession for a number of reasons. Over the last few years, interest rates have approached and stayed very close to their lower bound of zero. This could lead to findings of nonlinearity due to the effects of the lower bound, thereby favoring the methods of the paper over a linear model. Moreover, traditional modeling may be inadequate near the bound, where the distribution of the interest rate process is truncated and exhibits point mass. Appropriate modeling in this case is still an open research problem. Finally, if the "Great Recession" marked a possible structural break, at present there would be insufficient observations to estimate the model after the break.

Table 1: Descriptive statistics for the data sample (in percentage points).

| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Quarterly growth in real GDP | 0.85 | 1.00 | -2.76 | 4.02 |
| Unemployment rate | 5.63 | 1.52 | 2.60 | 10.70 |
| Nominal interest rate | 4.81 | 2.92 | 0.79 | 15.05 |
| Quarterly inflation | 0.92 | 0.85 | -1.24 | 4.08 |

These data are analyzed using the econometric techniques discussed earlier. The empirical strategy for

studying the behavior of the dynamic system in (2) is to address both model and functional form uncertainty.

The first area of model uncertainty in the macroeconomic system has to do with determining its dynamics, i.e., the number of lags needed in equation (2). A one-lag model was compared to several more richly parameterized models in order to gauge whether restricting attention to an NPVAR(1) specification is a sensible empirical strategy. The baseline NPVAR(1) model, which contains a single lag of $y_t$ with 16 unknown functions, was compared with an NPVAR(2) model (with 32 such functions). The baseline model overwhelmingly outperformed the longer lag specification: its log-marginal likelihood exceeded that of the larger model by over 40, implying a Bayes factor of over $e^{40}$ in favor of the NPVAR(1) specification.

Guided by earlier research findings suggesting the possibility of a structural break (at least in error volatility as in Stock and Watson (2003) and Sims and Zha (2006)), I also used split-sample estimation to capture the possibility of structural breaks in the series. Specifically, an NPVAR(1) model was fit on data in the pre-Volcker era (prior to 1979:Q2), and a separate model was fit on the data thereafter (following 1979:Q3). The log-marginal likelihood for the pre-Volcker model was −539.5, and that for the second part of the data sample was −438.7. Compared to the log-marginal likelihood of −908.8 for the baseline NPVAR(1) model on the entire data sample, the split sample measure of fit was far worse (the sum of the log-marginal likelihoods for the two subsamples is −978.2, which is far below the log-marginal likelihood for the overall NPVAR(1) model of −908.8). These results are interesting because (i) they suggest that the simpler and more parsimonious NPVAR(1) fit on the entire sample appears preferable to the (twice as big) split sample model and (ii) they demonstrate the ability of the Bayesian model comparison framework to penalize overparameterized specifications.

Footnote: A specification including a fourth (year-ago) lag was also considered, but it also did not perform competitively with the NPVAR(1) specification.
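The comparison above can be reproduced from the reported log-marginal likelihoods. The sketch below converts log-marginal likelihoods into a log Bayes factor and posterior model probabilities; the function name and the equal prior model probabilities are assumptions made for illustration, while the numerical inputs are those reported in the text.

```python
import numpy as np

def posterior_model_probs(log_ml, prior=None):
    """Posterior model probabilities from log-marginal likelihoods (equal priors by default)."""
    log_ml = np.asarray(log_ml, dtype=float)
    if prior is None:
        prior = np.full(log_ml.shape, 1.0 / log_ml.size)
    log_post = log_ml + np.log(prior)
    log_post -= log_post.max()          # subtract the maximum to avoid underflow
    weights = np.exp(log_post)
    return weights / weights.sum()

log_ml_full = -908.8                    # NPVAR(1) on the full sample
log_ml_split = -539.5 + -438.7          # sum over the two subsamples
log_bayes_factor = log_ml_full - log_ml_split
probs = posterior_model_probs([log_ml_full, log_ml_split])
```

Working on the log scale matters here: marginal likelihoods of this magnitude underflow to zero if exponentiated directly.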

Figure 2: Full sample estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations. [Figure not reproduced: a 4 × 4 grid of estimated functions; rows correspond to the growth, unemployment, interest, and inflation equations, and columns to lagged growth, unemployment, interest, and inflation.]

Figure 2 presents the estimated functions for the full-sample NPVAR(1) model. The ﬁgure shows that

a linear model would be reasonable for many of the economic relationships – particularly in modeling the

effects of lagged unemployment. To a lesser extent, the same is true in other instances (e.g. the dependence

of interest on its past value), where the function estimates do not reveal drastic departures from linearity.

On the other hand, in many equations, the effects of lagged financial variables (interest and inflation), as well as the effects of lagged growth, appear to be quite nonlinear. This finding concurs with earlier studies that have found nonlinearity in growth behavior (Dahl and Gonzalez-Rivera, 2003a,b) and financial markets (Härdle and Tsybakov, 1997; Härdle et al., 1998). The figure shows, for instance, that lagged inflation exhibits significant nonlinearities in every equation, whereas the function estimates for lagged interest

exhibit nonlinearities in three of the four equations. For this reason, future analysis of ﬁnancial variables

might beneﬁt considerably from employing nonparametric methods. Regarding the effects of lagged GDP

growth, a review of the estimated functions reveals that although there is much nonlinearity, there are also

large regions where the function estimates are approximately linear. This suggests that an interesting future

research question would be to examine whether those types of nonlinearities can be adequately captured

through threshold models.

For comparison purposes, Figures 3 and 4 show function estimates for the pre- and post-Volcker periods,

respectively. It is interesting to note that, although the function estimates differ in some respects, most point

to the same types of nonlinearities as the estimates from the overall sample.

This is quite instructive, as

it provides evidence that the econometric relationships may be stable but nonlinear, and therefore omitted

nonlinearity may be a signiﬁcant driver in ﬁndings of structural instability (cf. Hamilton, 2001). Resolving

this issue should be an important item on the research agenda of studies focusing on structural (in)stability.

The apparent stability of the nonparametric function estimates across subsamples naturally leads to

another important research question that has attracted much attention recently. Speciﬁcally, it would be of

interest to consider whether an NPVAR model would exhibit evidence of a structural change in variances,

which has been widely documented in contexts utilizing linear models. Such ﬁndings (e.g., Stock and

Watson (1996, 2003), Sims and Zha (2006)) have led to the conclusion that a reduction in error volatility

has been a driving force in the “Great Moderation” of the 1980s and 1990s. To formally test the stability of

the mean relationships while allowing for structural breaks in variances, I have estimated three additional

NPVAR models. The ﬁrst allows for a single structural break between 1979:Q2 and 1979:Q3 with the

Volcker appointment. The second model employs a single structural break between 1982:Q4 and 1983:Q1, following the Fed's disinflation of the early 1980s. The third model allows for both of these

break points. The models were estimated using Algorithm 3 of Section 5, and the marginal likelihoods were estimated as discussed in Section 4. The log-marginal likelihood for the first model was estimated to be −891.4, whereas that of the second was estimated to be −880.1, showing that, conditionally on a single break, the data favor the 1982/83 breakpoint. However, a much more dramatic improvement is offered by the third model, the one which allows for both a 1979 and a 1982/83 breakpoint. The log-marginal likelihood for that model is −838.4, leading to the conclusion that these three periods in the U.S. sample are indeed dramatically different. This is further confirmed by examining the estimated covariance matrices for the three sub-periods:

$$\Omega_{48:79} = \begin{bmatrix} 1.135 & -0.254 & 0.085 & -0.060 \\ -0.254 & 0.144 & -0.033 & 0.002 \\ 0.085 & -0.033 & 0.255 & 0.044 \\ -0.060 & 0.002 & 0.044 & 0.463 \end{bmatrix},$$

$$\Omega_{79:82} = \begin{bmatrix} 1.327 & -0.285 & 1.095 & 0.636 \\ -0.285 & 0.367 & -0.575 & -0.261 \\ 1.095 & -0.575 & 4.348 & 1.445 \\ 0.636 & -0.261 & 1.445 & 1.048 \end{bmatrix},$$

and

$$\Omega_{83:05} = \begin{bmatrix} 0.234 & -0.032 & 0.051 & 0.008 \\ -0.032 & 0.066 & -0.031 & -0.010 \\ 0.051 & -0.031 & 0.202 & 0.044 \\ 0.008 & -0.010 & 0.044 & 0.216 \end{bmatrix}.$$

Footnote: Since the range and level of each function may differ across samples, readers are cautioned to compare those functions over the relevant ranges, keeping in mind that the level of the functions will shift to satisfy the identification constraints.

Figure 3: Pre-Volcker estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations. [Figure not reproduced: a 4 × 4 grid of estimated functions; rows correspond to the growth, unemployment, interest, and inflation equations, and columns to lagged growth, unemployment, interest, and inflation.]

Figure 4: Post-Volcker estimates: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations. [Figure not reproduced: a 4 × 4 grid of estimated functions; rows correspond to the growth, unemployment, interest, and inflation equations, and columns to lagged growth, unemployment, interest, and inflation.]

These covariance matrices clearly demonstrate the dramatic peak in the error variances of all variables except

growth (i.e. unemployment, interest rates, and inﬂation) during the disinﬂation period and the subsequent

“moderation” of all 4 variables in 1983. A notable feature is the large jump, and subsequent decrease, in the

estimated error variance in the interest rate equation during the period 1979-1982, which can be accounted

for by the Fed’s change of policy instrument from the federal funds rate to reserve targeting, as well as the

unprecedented increase in interest rates during the disinﬂation period.

Figure 5 presents the function estimates from the model with three variance regimes. The figure demonstrates that the same type of nonlinearities that were present in the homoskedastic models are still present here. Therefore, even though the heteroskedastic NPVAR model has confirmed earlier conclusions that changes were largely due to breaks in variances, it also shows that there is much nonlinearity that would remain unexplored by linear models and that future research should study such features of the economic relationships more closely.

Figure 5: Full sample estimates from model with 3 volatility regimes: the rows represent the functions in each equation, columns contain the functions of a given lagged variable across equations. [Figure not reproduced: a 4 × 4 grid of estimated functions; rows correspond to the growth, unemployment, interest, and inflation equations, and columns to lagged growth, unemployment, interest, and inflation.]

7 Concluding Remarks

This article has examined the speciﬁcation, estimation, and comparison of nonparametric VAR models.

Efﬁcient MCMC sampling and model comparison techniques are discussed in the context of a new scheme

for identifying the unknown covariate functions, and extensions to heteroskedastic and other settings have

been examined. An application of the NPVAR model to U.S. post-war data on GDP growth, unemployment,

interest rates, and inﬂation, has conﬁrmed the presence of distinct volatility regimes in the post-war U.S.

macroeconomic series, but has also revealed that important nonlinearities in certain economic relationships

may remain undetected by standard regressions. Implementation of these techniques in related settings, such

as those considered in Section 5, is an interesting area for future research.

References

Ahamada, I. and Flachaire, E. (2010), Non-Parametric Econometrics, Oxford: Oxford University Press.

Albert, J. and Chib, S. (1993), “Bayesian Analysis of Binary and Polychotomous Response Data,” Journal

of the American Statistical Association, 88, 669–679.

Andrews, D. F. and Mallows, C. L. (1974), “Scale Mixtures of Normal Distributions,” Journal of the Royal

Statistical Society – Series B, 36, 99–102.

Beaudry, P. and Koop, G. (1993), “Do Recessions Permanently Change Output?” Journal of Monetary

Economics, 31, 149–163.

Belviso, F. and Milani, F. (2006), “Structural Factor-Augmented VARs (SFAVARs) and the Effects of Mon-

etary Policy,” Topics in Macroeconomics, 6, Iss. 3, Article 2.

Besag, J., Green, P., Higdon, D., and Mengersen, K. (1995), “Bayesian Computation and Stochastic Sys-

tems,” Statistical Science, 10, 3–66.

Canova, F. (1993), “Modelling and Forecasting Exchange Rates with a Bayesian Time-Varying Coefﬁcient

Model,” Journal of Economic Dynamics and Control, 17, 233–261.

Chan, J. C. and Jeliazkov, I. (2009), “Efﬁcient Simulation and Integrated Likelihood Estimation in State

Space Models,” International Journal of Mathematical Modelling and Numerical Optimisation, 1, 101–

120.

Chauvet, M. (1998), “An Econometric Characterization of Business Cycle Dynamics with Factor Structure

and Regime Switching,” International Economic Review, 39, 969–996.

Chib, S. (1995), “Marginal Likelihood from the Gibbs Output,” Journal of the American Statistical Associ-

ation, 90, 1313–1321.

Chib, S. (1996), “Calculating Posterior Distributions and Modal Estimates in Markov Mixture Models,”

Journal of Econometrics, 75, 79–97.

Chib, S. (1998), “Estimation and Comparison of Multiple Change-Point Models,” Journal of Econometrics,

86, 221–241.

Chib, S. and Greenberg, E. (2007), “Analysis of Additive Instrumental Variable Models,” Journal of

Computational and Graphical Statistics, 16, 86–114.

Chib, S. and Jeliazkov, I. (2006), “Inference in Semiparametric Dynamic Models for Binary Longitudinal

Data,” Journal of the American Statistical Association, 101, 685–700.

Chib, S., Greenberg, E., and Jeliazkov, I. (2009), “Estimation of Semiparametric Models in the Presence of

Endogeneity and Sample Selection,” Journal of Computational and Graphical Statistics, 18, 321–348.

Cogley, T. and Sargent, T. J. (2001), “Evolving Post-World War II U.S. Inflation Dynamics,” NBER

Macroeconomics Annual, 16, 331–338.

Dahl, C. M. and Gonzalez-Rivera, G. (2003a), “Identifying Nonlinear Components by Random Fields in the

US GNP Growth. Implications for the Shape of the Business Cycle,” Studies in Nonlinear Dynamics &

Econometrics, 7, Article 2.

Dahl, C. M. and Gonzalez-Rivera, G. (2003b), “Testing for Neglected Nonlinearity in Regression Models

Based on the Theory of Random Fields,” Journal of Econometrics, 114, 141–164.

Denison, D. G. T., Mallick, B. K., and Smith, A. F. M. (1998), “Automatic Bayesian Curve Fitting,” Journal

of the Royal Statistical Society – Series B, 60, 333–350.

Denison, D. G. T., Holmes, C. C., Mallick, B. K., and Smith, A. F. M. (2002), Bayesian Methods for

Nonlinear Classification and Regression, John Wiley & Sons, New York.

DiMatteo, I., Genovese, C. R., and Kass, R. E. (2001), “Bayesian Curve-Fitting with Free-Knot Splines,”

Biometrika, 88, 1055–1071.

Dueker, M. (2005), “Dynamic Forecasts of Qualitative Variables: A Qual VAR Model of U.S. Recessions,”

Journal of Business & Economic Statistics, 23, 96–104.

Fahrmeir, L. and Lang, S. (2001), “Bayesian Inference for Generalized Additive Mixed Models Based on

Markov Random Field Priors,” Journal of the Royal Statistical Society – Series C, 50, 201–220.

Gelfand, A. E. (2000), “Discussion to “Bayesian Backfitting”,” Statistical Science, 15, 217–218.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003), Bayesian Data Analysis, Chapman & Hall,

New York, 2nd edn.

Greenberg, E. (2008), Introduction to Bayesian Econometrics, Cambridge University Press, New York.

Hamilton, J. D. (1989), “A New Approach to the Economic Analysis of Nonstationary Time Series and the

Business Cycle,” Econometrica, 57, 357–384.

Hamilton, J. D. (2001), “A Parametric Approach to Flexible Nonlinear Inference,” Econometrica, 69,

537–573.

Hansen, B. E. (1992), “The Likelihood Ratio Test Under Nonstandard Conditions: Testing the Markov

Switching Model of GNP,” Journal of Applied Econometrics, 7, S61–S82.

Härdle, W. and Tsybakov, A. (1997), “Local Polynomial Estimators of the Volatility Function in

Nonparametric Autoregression,” Journal of Econometrics, 81, 223–243.

Härdle, W., Tsybakov, A., and Yang, L. (1998), “Nonparametric Vector Autoregression,” Journal of

Statistical Planning and Inference, 68, 221–245.

Hastie, T. and Tibshirani, R. (1990), Generalized Additive Models, Chapman & Hall, New York.

Hastie, T. and Tibshirani, R. (2000), “Bayesian Backfitting,” Statistical Science, 15, 196–223.

Holmes, C. C., Denison, D. G. T., and Mallick, B. K. (2002), “Accounting for Model Uncertainty in

Seemingly Unrelated Regressions,” Journal of Computational and Graphical Statistics, 11, 533–551.

Imai, K. and van Dyk, D. (2005), “A Bayesian Analysis of the Multinomial Probit Model Using Marginal

Data Augmentation,” Journal of Econometrics, 124, 311–334.

Jeliazkov, I. (2011), “Specification and Inference in Nonparametric Additive Regression,” working paper,

Department of Economics, University of California, Irvine.

Jeliazkov, I. and Lee, E. H. (2010), “MCMC Perspectives on Simulated Likelihood Estimation,” Advances

in Econometrics: Maximum Simulated Likelihood, 26, 3–39.

Kim, C.-J. and Nelson, C. R. (1999), “Has the U.S. Economy Become More Stable? A Bayesian Approach

Based on a Markov-Switching Model of the Business Cycle,” Review of Economics and Statistics, 81,

608–616.

Kim, C.-J., Morley, J., and Piger, J. (2005), “Nonlinearity and the Permanent Effects of Recessions,” Journal

of Applied Econometrics, 20, 291–309.

Koop, G. (2003), Bayesian Econometrics, John Wiley & Sons, New York.

Koop, G. and Korobilis, D. (2009), “Bayesian Multivariate Time Series Methods for Empirical

Macroeconomics,” Foundations and Trends in Econometrics, 3, 267–358.

Koop, G. and Poirier, D. J. (2004), “Bayesian Variants of Some Classical Semiparametric Regression

Techniques,” Journal of Econometrics, 123, 259–282.

Koop, G., Poirier, D. J., and Tobias, J. (2005), “Bayesian Semiparametric Inference in Multiple Equation

Models,” Journal of Applied Econometrics, 20, 723–747.

Kose, M. A., Otrok, C., and Whiteman, C. H. (2003), “International Business Cycles: World, Region and

Country Specific Factors,” American Economic Review, 93, 1216–1239.

Kose, M. A., Otrok, C., and Whiteman, C. H. (2008), “Understanding the Evolution of World Business

Cycles,” Journal of International Economics, 75, 110–130.

Lin, X. and Zhang, D. (1999), “Inference in Generalized Additive Mixed Models by Using Smoothing

Splines,” Journal of the Royal Statistical Society – Series B, 61, 381–400.

Lindley, D. V. (1971), Bayesian Statistics: A Review, SIAM, Philadelphia.

Meng, X.-L. and van Dyk, D. (1999), “Seeking Efficient Data Augmentation Schemes via Conditional and

Marginal Augmentation,” Biometrika, 86, 301–320.

Pesaran, M. H. and Potter, S. M. (1997), “A Floor and Ceiling Model of U.S. Output,” Journal of Economic

Dynamics and Control, 21, 661–695.

Poirier, D. J. (1973), “Piecewise Regression Using Cubic Spline,” Journal of the American Statistical

Association, 68, 515–524.

Poirier, D. J. (1998), “Revising Beliefs in Non-Identified Models,” Econometric Theory, 14, 483–509.

Potter, S. M. (1995), “A Nonlinear Approach to US GNP,” Journal of Applied Econometrics, 10, 109–125.

Primiceri, G. (2005), “Time Varying Structural Vector Autoregressions and Monetary Policy,” Review of

Economic Studies, 72, 821–852.

Ruppert, D., Wand, M. P., and Carroll, R. J. (2003), Semiparametric Regression, Cambridge University

Press, Cambridge, UK.

Shiller, R. (1984), “Smoothness Priors and Nonlinear Regression,” Journal of the American Statistical

Association, 79, 609–615.

Shively, T. S., Kohn, R., and Wood, S. (1999), “Variable Selection and Function Estimation in Additive

Nonparametric Regression Using a Data-Based Prior,” Journal of the American Statistical Association,

94, 777–806.

Silverman, B. (1985), “Some Aspects of the Spline Smoothing Approach to Non-parametric Regression

Curve Fitting,” Journal of the Royal Statistical Society – Series B, 47, 1–52.

Sims, C. A. (1980), “Macroeconomics and Reality,” Econometrica, 48, 1–48.

Sims, C. A. and Zha, T. (2006), “Were There Regime Switches in U.S. Monetary Policy?” American

Economic Review, 96, 54–81.

Smith, M. and Kohn, R. (2000), “Nonparametric Seemingly Unrelated Regression,” Journal of

Econometrics, 98, 257–281.

Stock, J. H. and Watson, M. W. (1996), “Evidence on Structural Instability in Macroeconomic Time Series

Relations,” Journal of Business and Economic Statistics, 14, 11–30.

Stock, J. H. and Watson, M. W. (2003), “Has the Business Cycle Changed? Evidence and Explanations,”

in Monetary Policy and Uncertainty: Adapting to a Changing Economy, pp. 9–56, Federal Reserve Bank

of Kansas City.

Tanner, M. A. and Wong, W. H. (1987), “The Calculation of Posterior Distributions by Data Augmentation,”

Journal of the American Statistical Association, 82, 528–549.

van Dyk, D. and Meng, X.-L. (2001), “The Art of Data Augmentation,” Journal of Computational and

Graphical Statistics, 10, 1–50.

Wahba, G. (1978), “Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors

in Regression,” Journal of the Royal Statistical Society – Series B, 40, 364–372.

Wasserman, L. (2006), All of Nonparametric Statistics, Springer, New York.

Whittaker, E. (1923), “On a New Method of Graduation,” Proceedings of the Edinburgh Mathematical

Society, 41, 63–75.

Wood, S. and Kohn, R. (1998), “A Bayesian Approach to Robust Binary Nonparametric Regression,”

Journal of the American Statistical Association, 93, 203–213.

Wood, S., Kohn, R., Shively, T., and Jiang, W. (2002), “Model Selection in Spline Nonparametric

Regression,” Journal of the Royal Statistical Society – Series B, 64, 119–139.

Yang, L., Härdle, W., and Nielsen, J. (1999), “Nonparametric Autoregression with Multiplicative Volatility

and Additive Mean,” Journal of Time Series Analysis, 20, 579–604.
