Gretl User’s Guide
Gnu Regression, Econometrics and Time-series Library
Allin Cottrell
Department of Economics
Wake Forest University
Riccardo “Jack” Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
June, 2024
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU
Free Documentation License, Version 1.1 or any later version published by the Free Software Foun-
dation (see http://www.gnu.org/licenses/fdl.html).
Contents
1 Introduction 1
1.1 Features at a glance ..................................... 1
1.2 Acknowledgements ..................................... 1
1.3 Installing the programs ................................... 2
I Running the program 3
2 Getting started 4
2.1 Let’s run a regression .................................... 4
2.2 Estimation output ...................................... 6
2.3 The main window menus .................................. 6
2.4 Keyboard shortcuts ..................................... 10
2.5 The gretl toolbar ...................................... 10
3 Modes of working 11
3.1 Command scripts ...................................... 11
3.2 Saving script objects .................................... 12
3.3 The gretl console ...................................... 13
3.4 The Session concept ..................................... 13
4 Data files 17
4.1 Data file formats ...................................... 17
4.2 Databases .......................................... 17
4.3 Creating a dataset from scratch .............................. 18
4.4 Structuring a dataset .................................... 20
4.5 Panel data specifics ..................................... 21
4.6 Missing data values ..................................... 25
4.7 Maximum size of data sets ................................. 26
4.8 Data file collections ..................................... 26
4.9 Assembling data from multiple sources .......................... 28
5 Sub-sampling a dataset 29
5.1 Introduction ......................................... 29
5.2 Setting the sample ..................................... 29
5.3 Restricting the sample ................................... 30
5.4 Panel data .......................................... 31
5.5 Resampling and bootstrapping ............................... 32
6 Graphics 34
6.1 Gnuplot graphs ....................................... 34
6.2 Plotting graphs from scripts ................................ 37
6.3 Boxplots ........................................... 43
7 Joining data sources 45
7.1 Introduction ......................................... 45
7.2 Basic syntax ......................................... 45
7.3 Filtering ........................................... 46
7.4 Matching with keys ..................................... 47
7.5 Aggregation ......................................... 49
7.6 String-valued key variables ................................. 50
7.7 Importing multiple series .................................. 51
7.8 A real-world case ...................................... 51
7.9 The representation of dates ................................. 54
7.10 Time-series data ....................................... 54
7.11 Special handling of time columns ............................. 57
7.12 Panel data .......................................... 57
7.13 Memo: join options .................................... 59
8 Realtime data 62
8.1 Introduction ......................................... 62
8.2 Atomic format for realtime data .............................. 62
8.3 More on time-related options ................................ 64
8.4 Getting a certain data vintage ............................... 64
8.5 Getting the n-th release for each observation period ................... 65
8.6 Getting the values at a fixed lag after the observation period .............. 66
8.7 Getting the revision history for an observation ...................... 67
9 Temporal disaggregation 70
9.1 Introduction ......................................... 70
9.2 Notation and design ..................................... 71
9.3 Overview of data handling ................................. 71
9.4 Extrapolation ........................................ 72
9.5 Function signature ..................................... 72
9.6 Handling of deterministic terms .............................. 74
9.7 Some technical details ................................... 74
9.8 The plot option ....................................... 75
9.9 Multiple low-frequency series ................................ 76
9.10 Examples ........................................... 76
10 Special functions in genr 78
10.1 Introduction ......................................... 78
10.2 Cumulative densities and p-values ............................. 79
10.3 Retrieving internal variables (dollar accessors) ...................... 79
11 Gretl data types 81
11.1 Introduction ......................................... 81
11.2 Series ............................................. 81
11.3 Scalars ............................................ 82
11.4 Matrices ........................................... 82
11.5 Lists ............................................. 82
11.6 Strings ............................................ 82
11.7 Bundles ........................................... 82
11.8 Arrays ............................................ 88
11.9 The life cycle of gretl objects ................................ 91
12 Discrete variables 94
12.1 Declaring variables as discrete ............................... 94
12.2 Commands for discrete variables .............................. 95
13 Loop constructs 99
13.1 Introduction ......................................... 99
13.2 Loop control variants .................................... 99
13.3 Special controls ....................................... 102
13.4 Progressive mode ...................................... 102
13.5 Loop examples ........................................ 102
14 User-defined functions 106
14.1 Defining a function ..................................... 106
14.2 Calling a function ...................................... 108
14.3 Deleting a function ..................................... 109
14.4 Function programming details ............................... 109
14.5 Function packages ...................................... 117
15 Named lists and strings 118
15.1 Named lists ......................................... 118
15.2 Named strings ........................................ 123
16 String-valued series 127
16.1 Introduction ......................................... 127
16.2 Creating a string-valued series ............................... 127
16.3 Permitted operations .................................... 130
16.4 String-valued series and functions ............................. 132
16.5 Other import formats .................................... 133
17 Matrix manipulation 134
17.1 Creating matrices ...................................... 134
17.2 Empty matrices ....................................... 135
17.3 Selecting submatrices .................................... 136
17.4 Deleting rows or columns .................................. 137
17.5 Matrix operators ...................................... 138
17.6 Matrix–scalar operators ................................... 139
17.7 Matrix functions ....................................... 139
17.8 Matrix accessors ....................................... 144
17.9 Namespace issues ...................................... 145
17.10 Creating a data series from a matrix ........................... 146
17.11 Matrices and lists ...................................... 146
17.12 Deleting a matrix ...................................... 146
17.13 Printing a matrix ...................................... 147
17.14 Example: OLS using matrices ............................... 148
18 Complex matrices 149
18.1 Introduction ......................................... 149
18.2 Creating a complex matrix ................................. 149
18.3 Indexation .......................................... 150
18.4 Operators .......................................... 150
18.5 Functions ........................................... 151
18.6 File input/output ...................................... 153
18.7 Backward (in)compatibility ................................. 153
19 Calendar dates 155
19.1 Introduction ......................................... 155
19.2 Date and time representations ............................... 155
19.3 Converting between representations ............................ 157
19.4 Epoch day arithmetic .................................... 160
19.5 Other accessors and functions ............................... 162
19.6 Working with pre-Gregorian dates ............................. 163
20 Mixed-frequency data 166
20.1 Basics ............................................ 166
20.2 The notion of a “MIDAS list” ............................... 167
20.3 High-frequency lag lists ................................... 168
20.4 High-frequency first differences ............................... 170
20.5 MIDAS-related plots .................................... 171
20.6 Alternative MIDAS data methods ............................. 171
21 Cheat sheet 176
21.1 Dataset handling ...................................... 176
21.2 Creating/modifying variables ................................ 180
21.3 Neat tricks .......................................... 186
II Econometric methods 191
22 Robust covariance matrix estimation 192
22.1 Introduction ......................................... 192
22.2 Cross-sectional data and the HCCME ........................... 193
22.3 Time series data and HAC covariance matrices ...................... 194
22.4 Special issues with panel data ............................... 199
22.5 The cluster-robust estimator ................................ 200
23 Panel data 202
23.1 Estimation of panel models ................................. 202
23.2 Autoregressive panel models ................................ 209
24 Dynamic panel models 211
24.1 Introduction ......................................... 211
24.2 Usage ............................................. 214
24.3 Replication of DPD results ................................. 216
24.4 Cross-country growth example ............................... 219
24.5 Auxiliary test statistics ................................... 221
24.6 Post-estimation available statistics ............................. 222
24.7 Memo: dpanel options ................................... 223
25 Nonlinear least squares 225
25.1 Introduction and examples ................................. 225
25.2 Initializing the parameters ................................. 225
25.3 NLS dialog window ..................................... 226
25.4 Analytical and numerical derivatives ........................... 226
25.5 Advanced use ........................................ 227
25.6 Controlling termination ................................... 227
25.7 Details on the code ..................................... 228
25.8 Numerical accuracy ..................................... 228
26 Maximum likelihood estimation 230
26.1 Generic ML estimation with gretl ............................. 230
26.2 Syntax ............................................ 231
26.3 Covariance matrix and standard errors .......................... 232
26.4 Gamma estimation ..................................... 233
26.5 Stochastic frontier cost function .............................. 235
26.6 GARCH models ....................................... 237
26.7 Analytical derivatives .................................... 238
26.8 Debugging ML scripts ................................... 240
26.9 Using functions ....................................... 240
26.10 Advanced use of mle: functions, analytical derivatives, algorithm choice . . . . . . . 243
26.11 Estimating constrained models ............................... 246
26.12 Handling non-convergence gracefully ........................... 249
27 GMM estimation 250
27.1 Introduction and terminology ............................... 250
27.2 GMM as Method of Moments ............................... 251
27.3 OLS as GMM ........................................ 254
27.4 TSLS as GMM ....................................... 255
27.5 Covariance matrix options ................................. 255
27.6 A real example: the Consumption Based Asset Pricing Model ............. 257
27.7 Caveats ............................................ 260
28 Model selection criteria 261
28.1 Introduction ......................................... 261
28.2 Information criteria ..................................... 261
29 Degrees of freedom correction 263
29.1 Introduction ......................................... 263
29.2 Back to basics ........................................ 263
29.3 Application to OLS regression ............................... 263
29.4 Beyond OLS ......................................... 264
29.5 Consistency and awkward cases .............................. 265
29.6 What gretl does ....................................... 266
30 Time series filters 269
30.1 Fractional differencing ................................... 269
30.2 The Hodrick–Prescott filter ................................. 269
30.3 The Baxter and King filter ................................. 270
30.4 The Butterworth filter ................................... 271
30.5 The discrete Fourier transform ............................... 272
31 Univariate time series models 275
31.1 Introduction ......................................... 275
31.2 ARIMA models ....................................... 275
31.3 Unit root tests ........................................ 280
31.4 Cointegration test ...................................... 285
31.5 ARCH and GARCH .................................... 286
32 Vector Autoregressions 289
32.1 Notation ........................................... 289
32.2 Estimation .......................................... 290
32.3 Structural VARs ....................................... 293
32.4 Residual-based diagnostic tests .............................. 295
33 Cointegration and Vector Error Correction Models 297
33.1 Introduction ......................................... 297
33.2 Vector Error Correction Models as representation of a cointegrated system . . . . . . 298
33.3 Interpretation of the deterministic components ...................... 299
33.4 The Johansen cointegration tests ............................. 301
33.5 Identification of the cointegration vectors ......................... 301
33.6 Over-identifying restrictions ................................ 304
33.7 Numerical solution methods ................................ 309
34 Multivariate models 312
34.1 The system command .................................... 312
34.2 Equation systems within functions ............................. 313
34.3 Restriction and estimation ................................. 314
34.4 System accessors ...................................... 315
35 Forecasting 319
35.1 Introduction ......................................... 319
35.2 Saving and inspecting fitted values ............................ 319
35.3 The fcast command .................................... 319
35.4 Univariate forecast evaluation statistics .......................... 321
35.5 Forecasts based on VAR models .............................. 323
35.6 Forecasting from simultaneous systems .......................... 324
36 State Space Modeling 326
36.1 Introduction ......................................... 326
36.2 Notation ........................................... 326
36.3 Defining the model as a bundle .............................. 326
36.4 Special features of state-space bundles .......................... 327
36.5 The kfilter function .................................... 328
36.6 The ksmooth function .................................... 329
36.7 The kdsmooth function ................................... 329
36.8 Diffuse initialization of the state vector .......................... 330
36.9 Extensions and refinements ................................. 332
36.10 The ksimul function .................................... 334
36.11 Numerical optimization ................................... 335
36.12 Example scripts ....................................... 336
36.13 Graphical interface ..................................... 341
37 Numerical methods 348
37.1 Derivative-based optimization methods .......................... 348
37.2 Derivative-free optimization methods ........................... 352
37.3 Numerical differentiation .................................. 354
37.4 Numerical integration .................................... 358
38 Discrete and censored dependent variables 360
38.1 Logit and probit models .................................. 360
38.2 Ordered response models .................................. 363
38.3 Multinomial logit ...................................... 364
38.4 Bivariate probit ....................................... 367
38.5 Panel estimators ....................................... 367
38.6 The Tobit model ...................................... 368
38.7 Interval regression ...................................... 369
38.8 Sample selection model ................................... 371
38.9 Count data .......................................... 372
38.10 Duration models ....................................... 373
39 Quantile regression 381
39.1 Introduction ......................................... 381
39.2 Basic syntax ......................................... 381
39.3 Confidence intervals ..................................... 382
39.4 Multiple quantiles ...................................... 382
39.5 Large datasets ........................................ 383
40 Nonparametric methods 386
40.1 Locally weighted regression (loess) ............................. 386
40.2 The Nadaraya–Watson estimator ............................. 387
41 MIDAS models 391
41.1 Parsimonious parameterizations .............................. 391
41.2 Estimating MIDAS models ................................. 392
41.3 Parameterization functions ................................. 399
III Technical details 401
42 Gretl and ODBC 402
42.1 ODBC support ....................................... 402
42.2 ODBC base concepts .................................... 402
42.3 Syntax ............................................ 403
42.4 Examples ........................................... 405
42.5 Connectivity details ..................................... 406
43 Gretl and TeX 408
43.1 Introduction ......................................... 408
43.2 TeX-related menu items .................................. 408
43.3 Fine-tuning typeset output ................................. 411
43.4 Installing and learning TeX ................................ 412
44 Gretl and R 413
44.1 Introduction ......................................... 413
44.2 Starting an interactive R session .............................. 413
44.3 Running an R script .................................... 416
44.4 Sending data back and forth ................................ 417
44.5 Interacting with R from the command line ........................ 420
44.6 Performance issues with R ................................. 420
44.7 Further use of the R library ................................ 421
45 Gretl and Ox 423
45.1 Introduction ......................................... 423
45.2 Ox support in gretl ..................................... 423
45.3 Illustration: replication of DPD model .......................... 425
46 Gretl and Octave 427
46.1 Introduction ......................................... 427
46.2 Octave support in gretl ................................... 427
46.3 Illustration: spectral methods ............................... 428
47 Gretl and Stata 431
48 Gretl and Python 432
48.1 Introduction ......................................... 432
48.2 Python support in gretl ................................... 432
48.3 Illustration: linear regression with multicollinearity ................... 432
49 Gretl and Julia 434
49.1 Introduction ......................................... 434
49.2 Julia support in gretl .................................... 434
49.3 Illustration .......................................... 434
50 Troubleshooting gretl 436
50.1 Bug reports ......................................... 436
50.2 Auxiliary programs ..................................... 437
51 The command line interface 438
IV Appendices 439
A Data file details 440
A.1 Basic native format ..................................... 440
A.2 Binary data file format ................................... 440
A.3 Native database format ................................... 440
B Building gretl 442
B.1 Installing the prerequisites ................................. 442
B.2 Getting the source: release or git ............................. 443
B.3 Configure the source .................................... 443
B.4 Build and install ....................................... 444
C Numerical accuracy 446
D Related free software 447
E Listing of URLs 448
Bibliography 449
Chapter 1
Introduction
1.1 Features at a glance
Gretl is an econometrics package, including a shared library, a command-line client program and a
graphical user interface.
User-friendly Gretl offers an intuitive user interface; it is very easy to get up and running with
econometric analysis. Thanks to its association with the econometrics textbooks by Ramu
Ramanathan, Jeffrey Wooldridge, and James Stock and Mark Watson, the package offers many
practice data files and command scripts. These are well annotated and accessible. Two other
useful resources for gretl users are the available documentation and the gretl-users mailing list.
Flexible You can choose your preferred point on the spectrum from interactive point-and-click to
complex scripting, and can easily combine these approaches.
Cross-platform Gretl’s “home” platform is Linux but it is also available for MS Windows and Mac
OS X, and should work on any unix-like system that has the appropriate basic libraries (see
Appendix B).
Open source The full source code for gretl is available to anyone who wants to critique it, patch it,
or extend it. See Appendix B.
Sophisticated Gretl offers a full range of least-squares based estimators, both for single equations and
for systems, including vector autoregressions and vector error correction models. Several specific
maximum likelihood estimators (e.g. probit, ARIMA, GARCH) are also provided natively; more
advanced estimation methods can be implemented by the user via generic maximum likelihood
or nonlinear GMM.
Extensible Users can enhance gretl by writing their own functions and procedures in gretl’s scripting
language, which includes a wide range of matrix functions.
Accurate Gretl has been thoroughly tested on several benchmarks, among which the NIST reference
datasets. See Appendix C.
Internet ready Gretl can fetch materials such as databases, collections of textbook datafiles and add-on
packages over the internet.
International Gretl will produce its output in English, French, Italian, Spanish, Polish, Portuguese,
German, Basque, Turkish, Russian, Albanian or Greek depending on your computer’s native
language setting.
1.2 Acknowledgements
The gretl code base originally derived from the program ESL (“Econometrics Software Library”),
written by Professor Ramu Ramanathan of the University of California, San Diego. We are much in
debt to Professor Ramanathan for making this code available under the GNU General Public Licence
and for helping to steer gretl’s early development.
We are also grateful to the authors of several econometrics textbooks for permission to package for gretl
various datasets associated with their texts. This list currently includes William Greene, author of
Econometric Analysis; Jeffrey Wooldridge (Introductory Econometrics: A Modern Approach); James
Stock and Mark Watson (Introduction to Econometrics); Damodar Gujarati (Basic Econometrics);
Russell Davidson and James MacKinnon (Econometric Theory and Methods); and Marno Verbeek
(A Guide to Modern Econometrics).
GARCH estimation in gretl is based on code deposited in the archive of the Journal of Applied
Econometrics by Professors Fiorentini, Calzolari and Panattoni, and the code to generate p-values
for Dickey–Fuller tests is due to James MacKinnon. In each case we are grateful to the authors for
permission to use their work.
With regard to the internationalization of gretl, thanks go to Ignacio Díaz-Emparanza (Spanish),
Michel Robitaille and Florent Bresson (French), Cristian Rigamonti (Italian), Tadeusz Kufel and
Pawel Kufel (Polish), Markus Hahn and Sven Schreiber (German), Hélio Guilherme and Henrique
Andrade (Portuguese), Susan Orbe (Basque), Talha Yalta (Turkish) and Alexander Gedranovich
(Russian).
Gretl has benefitted greatly from the work of numerous developers of free, open-source software:
for specifics please see Appendix B. Our thanks are due to Richard Stallman of the Free Software
Foundation, for his support of free software in general and for agreeing to “adopt” gretl as a GNU
program in particular.
Many users of gretl have submitted useful suggestions and bug reports. In this connection particular
thanks are due to Ignacio Díaz-Emparanza, Tadeusz Kufel, Pawel Kufel, Alan Isaac, Cri Rigamonti,
Sven Schreiber, Talha Yalta, Andreas Rosenblad, and Dirk Eddelbuettel, who maintains the gretl
package for Debian GNU/Linux.
1.3 Installing the programs
Linux
On the Linux¹ platform you have the choice of compiling the gretl code yourself or making use of a
pre-built package. Building gretl from the source is necessary if you want to access the development
version or customize gretl to your needs, but it requires a certain amount of technical skill; most users
will want to go for a pre-built package.
Some Linux distributions feature gretl as part of their standard offering: Debian, Ubuntu and Fedora,
for example. If this is the case, all you need to do is install gretl through your package manager of
choice. In addition the gretl webpage at http://gretl.sourceforge.net offers a “generic” package
in rpm format for modern Linux systems.
If you prefer to compile your own (or are using a unix system for which pre-built packages are not
available), instructions on building gretl can be found in Appendix B.
MS Windows
The MS Windows version comes as a self-extracting executable. Installation is just a matter of
downloading gretl_install.exe and running this program. You will be prompted for a location to
install the package.
Mac OS X
The Mac version comes as a gzipped disk image. Installation is a matter of downloading the image
file, opening it in the Finder, and dragging Gretl.app to the Applications folder. However, when
installing for the first time two prerequisite packages must be put in place first; details are given on
the gretl website.
¹ In this manual we use “Linux” as shorthand to refer to the GNU/Linux operating system. What is said herein
about Linux mostly applies to other unix-type systems too, though some local modifications may be needed.
Part I
Running the program
Chapter 2
Getting started
2.1 Let’s run a regression
This introduction is mostly angled towards the graphical client program; please see Chapter 51 below
and the Gretl Command Reference for details on the command-line program, gretlcli.
You can supply the name of a data file to open as an argument to gretl, but for the moment let’s not
do that: just fire up the program.¹ You should see a main window (which will hold information on
the data set but which is at first blank) and various menus, some of them disabled at first.
What can you do at this point? You can browse the supplied data files (or databases), open a data
file, create a new data file, read the help items, or open a command script. For now let’s browse the
supplied data files. Under the File menu choose “Open data, Sample file”. A second notebook-type
window will open, presenting the sets of data files supplied with the package (see Figure 2.1). Select
the first tab, “Ramanathan”. The numbering of the files in this section corresponds to the chapter
organization of Ramanathan (2002), which contains discussion of the analysis of these data. The data
will be useful for practice purposes even without the text.
Figure 2.1: Practice data files window
If you select a row in this window and click on “Info” this opens a window showing information on
the data set in question (for example, on the sources and definitions of the variables). If you find a
file that is of interest, you may open it by clicking on “Open”, or just double-clicking on the file name.
For the moment let’s open data3-6.
In gretl windows containing lists, double-clicking on a line launches a default action for the associated list entry:
e.g. displaying the values of a data series, opening a file.
This file contains data pertaining to a classic econometric “chestnut”, the consumption function. The
data window should now display the name of the current data file, the overall data range and sample
range, and the names of the variables along with brief descriptive tags—see Figure 2.2.
OK, what can we do now? Hopefully the various menu options should be fairly self explanatory. For
¹ For convenience we refer to the graphical client program simply as gretl in this manual. Note, however, that the
specific name of the program differs according to the computer platform. On Linux it is called gretl_x11 while on
MS Windows it is gretl.exe. On Linux systems a wrapper script named gretl is also installed; see also the Gretl
Command Reference.
Figure 2.2: Main window, with a practice data file open
now we’ll dip into the Model menu; a brief tour of all the main window menus is given in Section 2.3
below.
Gretl’s Model menu offers a wide range of econometric estimation routines. The simplest and most
standard is Ordinary Least Squares (OLS). Selecting OLS pops up a dialog box calling for a model
specification—see Figure 2.3.
Figure 2.3: Model specification dialog
To select the dependent variable, highlight the variable you want in the list on the left and click the
arrow that points to the Dependent variable slot. If you check the “Set as default” box this variable
will be pre-selected as dependent when you next open the model dialog box. Shortcut: double-clicking
on a variable on the left selects it as dependent and also sets it as the default. To select independent
variables, highlight them on the left and click the green arrow (or right-click the highlighted variable);
to remove variables from the selected list, use the red arrow. To select several variables in the list box,
drag the mouse over them; to select several non-contiguous variables, hold down the Ctrl key and
click on the variables you want. To run a regression with consumption as the dependent variable and
income as independent, put Ct into the Dependent slot and add Yt to the Independent variables list.
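For readers who prefer scripting, the same estimation can be reproduced at the gretl console or in a short script (a minimal sketch, using the Ct and Yt series supplied in data3-6; scripting is discussed in Chapter 3):

open data3-6    # open the practice data file
ols Ct 0 Yt     # regress consumption on a constant (0) and income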
2.2 Estimation output
Once you’ve specified a model, a window displaying the regression output will appear. The output is
reasonably comprehensive and in a standard format (Figure 2.4).
Figure 2.4: Model output window
The output window contains menus that allow you to inspect or graph the residuals and fitted values,
and to run various diagnostic tests on the model.
For most models there is also an option to print the regression output in LaTeX format. See Chapter 43
for details.
To import gretl output into a word processor, you may copy and paste from an output window, using
its Edit menu (or Copy button, in some contexts) to the target program. Many (not all) gretl windows
offer the option of copying in RTF (Microsoft’s “Rich Text Format”) or as LaTeX. If you are pasting
into a word processor, RTF may be a good option because the tabular formatting of the output is
preserved.² Alternatively, you can save the output to a (plain text) file then import the file into the
target program. When you finish a gretl session you are given the option of saving all the output
from the session to a single file.
Note that on the gnome desktop and under MS Windows, the File menu includes a command to send
the output directly to a printer.
When pasting or importing plain text gretl output into a word processor, select a monospaced or typewriter-style
font (e.g. Courier) to preserve the output’s tabular formatting. Select a small font (10-point Courier should do) to
prevent the output lines from being broken in the wrong place.
2.3 The main window menus
Reading left to right along the main window’s menu bar, we find the File, Tools, Data, View, Add,
Sample, Variable, Model and Help menus.
File menu
Open data: Open a native gretl data file or import from other formats. See Chapter 4.
Append data: Add data to the current working data set, from a gretl data file, a comma-
separated values file or a spreadsheet file.
² Note that when you copy as RTF under MS Windows, Windows will only allow you to paste the material into
applications that “understand” RTF. Thus you will be able to paste into MS Word, but not into notepad. Note also
that there appears to be a bug in some versions of Windows, whereby the paste will not work properly unless the
“target” application (e.g. MS Word) is already running prior to copying the material in question.
Save data: Save the currently open native gretl data file.
Save data as: Write out the current data set in native format, with the option of using gzip
data compression. See Chapter 4.
Export data: Write out the current data set in Comma Separated Values (CSV) format, or
the formats of GNU R or GNU Octave. See Chapter 4 and also Appendix D.
Send to: Send the current data set as an e-mail attachment.
New data set: Allows you to create a blank data set, ready for typing in values or for
importing series from a database. See below for more on databases.
Clear data set: Clear the current data set out of memory. Generally you don’t have to do
this (since opening a new data file automatically clears the old one) but sometimes it’s
useful.
Working directory: Change the current working directory (or “workdir”) and specify related
options. For an explanation of the role of the workdir click the Help button in the dialog
window which is presented, or refer to the documentation of the set command with the
workdir option in the command reference.
Script files: A “script” is a file containing a sequence of gretl commands. This item contains
entries that let you open a script you have created previously (“User file”), open a sample
script, or open an editor window in which you can create a new script.
Session files: A “session” file contains a snapshot of a previous gretl session, including the
data set used and any models or graphs that you saved. Under this item you can open a
saved session or save the current session.
Databases: Allows you to browse various large databases, either on your own computer
or, if you are connected to the internet, on the gretl database server. See Section 4.2 for
details.
Function packages: Manage user-contributed function packages that extend gretl’s capabil-
ities. To learn more about such packages written in gretl’s built-in matrix and scripting
language “hansl”, please refer to the “Packages” entry in Help menu.
Resource from addon: Access example scripts and datafiles that are shipped as part of
gretl’s official “addons”. (Addons are function packages that are more tightly integrated
with the gretl program than standard user-contributed packages.)
Exit: Quit the program. You’ll be prompted to save any unsaved work.
Tools menu
Statistical tables: Look up critical values for commonly used distributions (normal or Gaus-
sian, t, chi-square, F and Durbin–Watson).
P-value finder: Look up p-values from the Gaussian, t, chi-square, F, gamma, binomial or
Poisson distributions. See also the pvalue command in the Gretl Command Reference.
Distribution graphs: Produce graphs of various probability distributions. In the resulting
graph window, the pop-up menu includes an item “Add another curve”, which enables you
to superimpose a further plot (for example, you can draw the t distribution with various
different degrees of freedom).
Test statistic calculator: Calculate test statistics and p-values for a range of common hy-
pothesis tests (population mean, variance and proportion; difference of means, variances
and proportions).
Nonparametric tests: Calculate test statistics for various nonparametric tests (Sign test,
Wilcoxon rank sum test, Wilcoxon signed rank test, Runs test).
Seed for random numbers: Set the seed for the random number generator (by default this
is set based on the system time when the program is started).
Command log: Open a window containing a record of the commands executed so far.
Gretl console: Open a “console” window into which you can type commands as you would
using the command-line program, gretlcli (as opposed to using point-and-click).
Start Gnu R: Start R (if it is installed on your system), and load a copy of the data set
currently open in gretl. See Appendix D.
Sort variables: Rearrange the listing of variables in the main window, either by ID number
or alphabetically by name.
Function packages: Handles “function packages” (see Section 14.5), which allow you to
access functions written by other users and share the ones written by you.
NIST test suite: Check the numerical accuracy of gretl against the reference results for linear
regression made available by the (US) National Institute of Standards and Technology.
Preferences: Set the paths to various files gretl needs to access. Choose the font in which
gretl displays text output. Activate or suppress gretl’s messaging about the availability of
program updates, and so on. See the Gretl Command Reference for further details.
Data menu
Select all: Several menu items act upon those variables that are currently selected in the
main window. This item lets you select all the variables.
Display values: Pops up a window with a simple (not editable) printout of the values of the
selected variable or variables.
Edit values: Opens a spreadsheet window where you can edit the values of the selected
variables.
Add observations: Gives a dialog box in which you can choose a number of observations to
add at the end of the current dataset; for use with forecasting.
Remove extra observations: Active only if extra observations have been added automatically
in the process of forecasting; deletes these extra observations.
Read info,Edit info: “Read info” just displays the summary information for the current
data file; “Edit info” allows you to make changes to it (if you have permission to do so).
Print description: Opens a window containing a full account of the current dataset, including
the summary information and any specific information on each of the variables.
Add case markers: Prompts for the name of a text file containing “case markers” (short
strings identifying the individual observations) and adds this information to the data set.
See Chapter 4.
Remove case markers: Active only if the dataset has case markers identifying the observa-
tions; removes these case markers.
Dataset structure: invokes a series of dialog boxes which allow you to change the structural
interpretation of the current dataset. For example, if data were read in as a cross section
you can get the program to interpret them as time series or as a panel. See also section 4.4.
Compact data: For time-series data of higher than annual frequency, gives you the option of
compacting the data to a lower frequency, using one of four compaction methods (average,
sum, start of period or end of period).
Expand data: For time-series data, gives you the option of expanding the data to a higher
frequency.
Transpose data: Turn each observation into a variable and vice versa (or in other words,
each row of the data matrix becomes a column in the modified data matrix); can be useful
with imported data that have been read in “sideways”.
View menu
Icon view: Opens a window showing the content of the current session as a set of icons; see
section 3.4.
Graph specified vars: Gives a choice between a time series plot, a regular X–Y scatter plot,
an X–Y plot using impulses (vertical bars), an X–Y plot “with factor separation” (i.e. with
the points colored differently depending on the value of a given dummy variable), boxplots,
and a 3-D graph. Serves up a dialog box where you specify the variables to graph. See
Chapter 6 for details.
Multiple graphs: Allows you to compose a set of up to six small graphs, either pairwise
scatter-plots or time-series graphs. These are displayed together in a single window.
Summary statistics: Shows a full set of descriptive statistics for the variables selected in the
main window.
Correlation matrix: Shows the pairwise correlation coefficients for the selected variables.
Cross Tabulation: Shows a cross-tabulation of the selected variables. This works only if at
least two variables in the data set have been marked as discrete (see Chapter 12).
Principal components: Produces a Principal Components Analysis for the selected variables.
Mahalanobis distances: Computes the Mahalanobis distance of each observation from the
centroid of the selected set of variables.
Cross-correlogram: Computes and graphs the cross-correlogram for two selected variables.
Add menu Offers various standard transformations of variables (logs, lags, squares, etc.) that
you may wish to add to the data set. Also gives the option of adding random variables, and
(for time-series data) adding seasonal dummy variables (e.g. quarterly dummy variables for
quarterly data).
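Several of these additions have one-line script counterparts; a minimal sketch (the commands are standard, but the series name Yt is just the income variable from the earlier example):

logs Yt       # natural log of Yt
lags 1 ; Yt   # first lag of Yt
square Yt     # square of Yt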
Sample menu
Set range: Select a different starting and/or ending point for the current sample, within
the range of data available.
Restore full range: self-explanatory.
Define, based on dummy: Given a dummy (indicator) variable with values 0 or 1, this drops
from the current sample all observations for which the dummy variable has value 0.
Restrict, based on criterion: Similar to the item above, except that you don’t need a pre-
defined variable: you supply a Boolean expression (e.g. sqft > 1400) and the sample is
restricted to observations satisfying that condition; a script equivalent is sketched just after
this menu listing. See the entry for genr in the Gretl Command Reference for details on the
Boolean operators that can be used.
Random sub-sample: Draw a random sample from the full dataset.
Drop all obs with missing values: Drop from the current sample all observations for which
at least one variable has a missing value (see Section 4.6).
Count missing values: Give a report on observations where data values are missing. May be
useful in examining a panel data set, where it’s quite common to encounter missing values.
Set missing value code: Set a numerical value that will be interpreted as “missing” or “not
available”. This is intended for use with imported data, when gretl has not recognized the
missing-value code used.
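The “Restrict, based on criterion” item above corresponds to the smpl command with the --restrict flag; a minimal sketch, assuming the dataset contains the sqft series used in the example:

smpl sqft > 1400 --restrict   # keep only observations satisfying the condition
smpl full                     # restore the full sample when done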
Variable menu Most items under here operate on a single variable at a time. The “active” variable
is set by highlighting it (clicking on its row) in the main data window. Most options will be self-
explanatory. Note that you can rename a variable and can edit its descriptive label under “Edit
attributes”. You can also “Define a new variable” via a formula (e.g. involving some function
of one or more existing variables). For the syntax of such formulae, look at the online help for
“Generate variable syntax” or see the genr command in the Gretl Command Reference. One
simple example:
foo = x1 * x2
will create a new variable foo as the product of the existing variables x1 and x2. In these
formulae, variables must be referenced by name, not number.
Model menu For details on the various estimators offered under this menu please consult the
Gretl Command Reference. Also see Chapter 25 regarding the estimation of nonlinear models.
Help menu Please use this as needed! It gives details on the syntax required in various dialog
entries.
2.4 Keyboard shortcuts
When working in the main gretl window, some common operations may be performed using the
keyboard, as shown in the table below.
Return   Opens a window displaying the values of the currently selected variables: it is the
         same as selecting “Data, Display Values”.
Delete   Pressing this key has the effect of deleting the selected variables. A confirmation is
         required, to prevent accidental deletions.
e        Has the same effect as selecting “Edit attributes” from the “Variable” menu.
F2       Same as “e”. Included for compatibility with other programs.
g        Has the same effect as selecting “Define new variable” from the “Variable” menu
         (which maps onto the genr command).
h        Opens a help window for gretl commands.
F1       Same as “h”. Included for compatibility with other programs.
r        Refreshes the variable list in the main window.
t        Graphs the selected variable; a line graph is used for time-series datasets, whereas
         a distribution plot is used for cross-sectional data.
2.5 The gretl toolbar
At the bottom left of the main window sits the toolbar.
The icons have the following functions, reading from left to right:
1. Launch a calculator program. A convenience function in case you want quick access to a calcu-
lator when you’re working in gretl. The default program is calc.exe under MS Windows, or
xcalc under the X window system. You can change the program under the “Tools, Preferences,
General” menu, “Programs” tab.
2. Start a new script. Opens an editor window in which you can type a series of commands to be
sent to the program as a batch.
3. Open the gretl console. A shortcut to the “Gretl console” menu item (Section 2.3 above).
4. Open the session icon window.
5. Open a window displaying available gretl function packages.
6. Open this manual in PDF format.
7. Open the help item for script commands syntax (i.e. a listing with details of all available
commands).
8. Open the dialog box for defining a graph.
9. Open the dialog box for estimating a model using ordinary least squares.
10. Open a window listing the sample datasets supplied with gretl, and any other data file collections
that have been installed.
Chapter 3
Modes of working
3.1 Command scripts
As you execute commands in gretl, using the GUI and filling in dialog entries, those commands are
recorded in the form of a “script” or batch file. Such scripts can be edited and re-run, using either
gretl or the command-line client, gretlcli.
To view the current state of the script at any point in a gretl session, choose “Command log” under
the Tools menu. This log file is called session.inp and it is overwritten whenever you start a new
session. To preserve it, save the script under a different name. Script files will be found most easily,
using the GUI file selector, if you name them with the extension .inp.
To open a script you have written independently, use the “File, Script files” menu item; to create a
script from scratch use the “File, Script files, New script” item or the “new script” toolbar button. In
either case a script window will open (see Figure 3.1).
Figure 3.1: Script window, editing a command file
The toolbar at the top of the script window offers the following functions (left to right): (1) Save
the file; (2) Save the file under a specified name; (3) Print the file (this option is not available on all
platforms); (4) Execute the commands in the file; (5) Copy selected text; (6) Paste the selected text;
(7) Find and replace text; (8) Undo the last Paste or Replace action; (9) Help (if you place the cursor
in a command word and press the question mark you will get help on that command); (10) Close the
window.
When you execute the script, by clicking on the Execute icon or by pressing Ctrl-r, all output is
directed to a single window, where it can be edited, saved or copied to the clipboard. To learn more
about the possibilities of scripting, take a look at the gretl Help item “Command reference,” or start
up the command-line program gretlcli and consult its help, or consult the Gretl Command Reference.
If you run the script when part of it is highlighted, gretl will only run that portion. Moreover, if you
want to run just the current line, you can do so by pressing Ctrl-Enter.¹
¹ This feature is not unique to gretl; other econometric packages offer the same facility. However, experience shows
that while this can be remarkably useful, it can also lead to writing dinosaur scripts that are never meant to be executed
all at once, but rather used as a chaotic repository to cherry-pick snippets from. Since gretl allows you to have several
script windows open at the same time, you may want to keep your scripts tidy and reasonably small.
Clicking the right mouse button in the script editor window produces a pop-up menu. This gives you
the option of executing either the line on which the cursor is located, or the selected region of the
script if there’s a selection in place. If the script is editable, this menu also gives the option of adding
or removing comment markers from the start of the line or lines.
The gretl package includes over 70 example scripts. Many of these relate to Ramanathan (2002),
but they may also be used as a free-standing introduction to scripting in gretl and to various points
of econometric theory. You can explore the example files under “File, Script files, Example scripts”.
There you will find a listing of the files along with a brief description of the points they illustrate and
the data they employ. Open any file and run it to see the output. Note that long commands in a
script can be broken over two or more lines, using backslash as a continuation character.
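For instance, the following fragment (a trivial sketch; --robust is the option requesting robust standard errors) is read as a single ols command:

# the trailing backslash tells gretl to join the next line to this one
ols Ct 0 Yt \
  --robust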
You can, if you wish, use the GUI controls and the scripting approach in tandem, exploiting each
method where it offers greater convenience. Here are two suggestions.
Open a data file in the GUI. Explore the data—generate graphs, run regressions, perform tests.
Then open the Command log, edit out any redundant commands, and save it under a specific
name. Run the script to generate a single file containing a concise record of your work.
Start by establishing a new script file. Type in any commands that may be required to set
up transformations of the data (see the genr command in the Gretl Command Reference).
Typically this sort of thing can be accomplished more efficiently via commands assembled with
forethought rather than point-and-click. Then save and run the script: the GUI data window
will be updated accordingly. Now you can carry out further exploration of the data via the
GUI. To revisit the data at a later point, open and rerun the “preparatory” script first.
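A “preparatory” script of this kind might look as follows (a minimal sketch, again using the data3-6 series; the generated series will then appear in the GUI data window):

open data3-6
logs Ct Yt   # create the logs l_Ct and l_Yt
diff Ct Yt   # create the first differences d_Ct and d_Yt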
Scripts and data files
One common way of doing econometric research with gretl is as follows: compose a script; execute
the script; inspect the output; modify the script; run it again—with the last three steps repeated as
many times as necessary. In this context, note that when you open a data file this clears out most
of gretl’s internal state. It’s therefore probably a good idea to have your script start with an open
command: the data file will be re-opened each time, and you can be confident you’re getting “fresh”
results.
One further point should be noted. When you go to open a new data file via the graphical interface,
you are always prompted: opening a new data file will lose any unsaved work, do you really want
to do this? When you execute a script that opens a data file, however, you are not prompted. The
assumption is that in this case you’re not going to lose any work, because the work is embodied in the
script itself (and it would be annoying to be prompted at each iteration of the work cycle described
above).
This means you should be careful if you’ve done work using the graphical interface and then decide
to run a script: the current data file will be replaced without any questions asked, and it’s your
responsibility to save any changes to your data first.
3.2 Saving script objects
When you estimate a model using point-and-click, the model results are displayed in a separate
window, offering menus which let you perform tests, draw graphs, save data from the model, and
so on. Ordinarily, when you estimate a model using a script you just get a non-interactive printout
of the results. You can, however, arrange for models estimated in a script to be “captured”, so that
you can examine them interactively when the script is finished. Here is an example of the syntax for
achieving this effect:
Model1 <- ols Ct 0 Yt
That is, you type a name for the model to be saved under, then a back-pointing “assignment arrow”,
then the model command. The assignment arrow is composed of the less-than sign followed by a
dash; it must be separated by spaces from both the preceding name and the following command. The
name for a saved object may include spaces, but in that case it must be wrapped in double quotes:
"Model 1" <- ols Ct 0 Yt
Models saved in this way will appear as icons in the gretl icon view window (see Section 3.4) after
the script is executed. In addition, you can arrange to have a named model displayed (in its own
window) automatically as follows:
Model1.show
Again, if the name contains spaces it must be quoted:
"Model 1".show
The same facility can be used for graphs. For example the following will create a plot of Ct against
Yt, save it under the name “CrossPlot” (it will appear under this name in the icon view window), and
have it displayed:
CrossPlot <- gnuplot Ct Yt
CrossPlot.show
You can also save the output from selected commands as named pieces of text (again, these will
appear in the session icon window, from where you can open them later). For example this command
sends the output from an augmented Dickey–Fuller test to a “text object” named ADF1 and displays
it in a window:
ADF1 <- adf 2 x1
ADF1.show
Objects saved in this way (whether models, graphs or pieces of text output) can be destroyed using
the command .free appended to the name of the object, as in ADF1.free.
3.3 The gretl console
A further option is available for your computing convenience. Under gretl’s “Tools” menu you will
find the item “Gretl console” (there is also an “open gretl console” button on the toolbar in the main
window). This opens up a window in which you can type commands and execute them one by one
(by pressing the Enter key) interactively. This is essentially the same as gretlcli’s mode of operation,
except that the GUI is updated based on commands executed from the console, enabling you to work
back and forth as you wish.
In the console, you have “command history”; that is, you can use the up and down arrow keys to
navigate the list of commands you have entered to date. You can retrieve, edit and then re-enter a
previous command. In console mode, you can create, display and free objects (models, graphs or
text) as described above for script mode.
Optionally, the console can be shown not as a separate window, but as a side pane to the main
window, as in Figure 3.2. Some people prefer this arrangement, which is vaguely reminiscent of other
programs such as RStudio or the Octave GUI. In order to activate the “split panes” interface,
go to the Tools menu, select “Preferences” and tick the “Main window includes console” box.
3.4 The Session concept
Gretl offers the idea of a “session” as a way of keeping track of your work and revisiting it later. The
basic idea is to provide an iconic space containing various objects pertaining to your current working
session (see Figure 3.3). You can add objects (represented by icons) to this space as you go along. If
you save the session, these added objects should be available again if you re-open the session later.
If you start gretl and open a data set, then select “Icon view” from the View menu, you should see
the basic default set of icons: these give you quick access to information on the data set (if any),
Figure 3.2: Main window including console
Figure 3.3: Icon view: one model and one graph have been added to the default icons
correlation matrix (“Correlations”) and descriptive summary statistics (“Summary”). All of these are
activated by double-clicking the relevant icon. The “Data set” icon is a little more complex: double-
clicking opens up the data in the built-in spreadsheet, but you can also right-click on the icon for a
menu of other actions.
To add a model to the Icon view, first estimate it using the Model menu. Then pull down the File
menu in the model window and select “Save to session as icon” or “Save as icon and close”. Simply
hitting the S key over the model window is a shortcut to the latter action.
To add a graph, first create it (under the View menu, “Graph specified vars”, or via one of gretl’s
other graph-generating commands). Click on the graph window to bring up the graph menu, and
select “Save to session as icon”.
Once a model or graph is added its icon will appear in the Icon view window. Double-clicking on the
icon redisplays the object, while right-clicking brings up a menu which lets you display or delete the
object. This popup menu also gives you the option of editing graphs.
The model table
In econometric research it is common to estimate several models with a common dependent variable—
the models differing in respect of which independent variables are included, or perhaps in respect of the
estimator used. In this situation it is convenient to present the regression results in the form of a table,
where each column contains the results (coefficient estimates and standard errors) for a given model,
and each row contains the estimates for a given variable across the models. Note that some estimation
methods are not compatible with the straightforward model table format, therefore gretl will not let
those models be added to the model table. These methods include non-linear least squares (nls),
generic maximum-likelihood estimators (mle), generic GMM (gmm), dynamic panel models (dpanel),
interval regressions (intreg), bivariate probit models (biprobit), AR(I)MA models (arima or arma),
and (G)ARCH models (garch and arch).
In the Icon view window gretl provides a means of constructing such a table (and copying it in plain
text, LaTeX or Rich Text Format). The procedure is outlined below. (The model table can also be
built non-interactively, in script mode—see the entry for modeltab in the Gretl Command Reference.)
1. Estimate a model which you wish to include in the table, and in the model display window,
under the File menu, select “Save to session as icon” or “Save as icon and close”.
2. Repeat step 1 for the other models to be included in the table (up to a total of six models).
3. When you are done estimating the models, open the icon view of your gretl session, by selecting
“Icon view” under the View menu in the main gretl window, or by clicking the “session icon
view” icon on the gretl toolbar.
4. In the Icon view, there is an icon labeled “Model table”. Decide which model you wish to appear
in the left-most column of the model table and add it to the table, either by dragging its icon
onto the Model table icon, or by right-clicking on the model icon and selecting “Add to model
table” from the pop-up menu.
5. Repeat step 4 for the other models you wish to include in the table. The second model selected
will appear in the second column from the left, and so on.
6. When you are finished composing the model table, display it by double-clicking on its icon.
Under the Edit menu in the window which appears, you have the option of copying the table
to the clipboard in various formats.
7. If the ordering of the models in the table is not what you wanted, right-click on the model table
icon and select “Clear table”. Then go back to step 4 above and try again.
A simple instance of gretl’s model table is shown in Figure 3.4.
Figure 3.4: Example of model table
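For reference, a script-mode sketch using the modeltab command might run along the following lines (the series y, x1 and x2 are hypothetical; see the Gretl Command Reference for the full syntax):
ols y 0 x1
modeltab add            # add the model just estimated to the table
ols y 0 x1 x2
modeltab add
modeltab show           # display the model table
modeltab free           # clear the table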
The graph page
The “graph page” icon in the session window offers a means of putting together several graphs for
printing on a single page. This facility will work only if you have the LaTeX typesetting system
installed, and are able to generate and view either PDF or PostScript output. The output format is
controlled by your choice of program for compiling TeX files, which can be found under the “Programs”
tab in the Preferences dialog box (under the “Tools” menu in the main window). Usually this should
be pdflatex for PDF output or latex for PostScript. In the latter case you must have a working set-up
for handling PostScript, which will usually include dvips, ghostscript and a viewer such as gv, ggv or
kghostview.
In the Icon view window, you can drag up to eight graphs onto the graph page icon. When you
double-click on the icon (or right-click and select “Display”), a page containing the selected graphs
(in PDF or EPS format) will be composed and opened in your viewer. From there you should be able
to print the page.
To clear the graph page, right-click on its icon and select “Clear”.
As with the model table, it is also possible to manipulate the graph page via commands in script or
console mode—see the entry for the graphpg command in the Gretl Command Reference.
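By way of a sketch (with hypothetical series y and x), the script-mode equivalent adds the most recently created graph to the page and then displays the composed page:
gnuplot y x             # create a graph
graphpg add             # add the last graph to the graph page
graphpg show            # compose the page and open it in the viewer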
Saving and re-opening sessions
If you create models or graphs that you think you may wish to re-examine later, then before quitting
gretl select “Session files, Save session” from the File menu and give a name under which to save the
session. To re-open the session later, either
Start gretl, then re-open the session file by going to “File, Session files, Open session”, or
From the command line, type gretl -r sessionfile, where sessionfile is the name under which
the session was saved, or
Drag the icon representing a session file onto gretl.
Chapter 4
Data files
4.1 Data file formats
Gretl has its own native format for data files. Most users will probably not want to read or write
such files outside of gretl itself, but occasionally this may be useful and details on the file formats
are given in Appendix A. The program can also import data from a variety of other formats. In the
GUI program this can be done via the “File, Open Data, User file” menu—note the drop-down list of
acceptable file types. In script mode, simply use the open command. The supported import formats
are as follows.
Plain text files (comma-separated or “CSV” being the most common type). For details on what
gretl expects of such files, see Section 4.3.
Spreadsheets: MS Excel, Gnumeric and Open Document (ODS). The requirements for such files
are given in Section 4.3.
Stata data files (.dta).
SPSS data files (.sav).
SAS “xport” files (.xpt).
Eviews workfiles (.wf1).1
JMulTi data files.
When you import data from a plain text format, gretl opens a “diagnostic” window, reporting on its
progress in reading the data. If you encounter a problem with ill-formatted data, the messages in this
window should give you a handle on fixing the problem.
Note that gretl has a facility for writing out data in the native formats of GNU R, Octave, JMulTi
and PcGive (see Appendix D). In the GUI client this option is found under the “File, Export data”
menu; in the command-line client use the store command with the appropriate option flag.
4.2 Databases
For working with large amounts of data gretl is supplied with a database-handling routine. A database,
as opposed to a data file, is not read directly into the program’s workspace. A database can contain
series of mixed frequencies and sample ranges. You open the database and select series to import
into the working dataset. You can then save those series in a native format data file if you wish.
Databases can be accessed via the menu item “File, Databases”.
For details on the format of gretl databases, see Appendix A.
Online access to databases
Several gretl databases are available from Wake Forest University. Your computer must be connected
to the internet for this option to work. Please see the description of the “data” command under the
Help menu.
Visit the gretl data page for details and updates on available data.
1See http://users.wfu.edu/cottrell/eviews_format/.
Foreign database formats
Thanks to Thomas Doan of Estima, who made available the specification of the database format
used by RATS 4 (Regression Analysis of Time Series), gretl can handle such databases—or at least,
a subset of same, namely time-series databases containing monthly and quarterly series.
Gretl can also import data from PcGive databases. These take the form of a pair of files, one containing
the actual data (with suffix .bn7) and one containing supplementary information (.in7).
In addition, gretl offers ODBC connectivity. Be warned: this feature is meant for somewhat advanced
users; there is currently no graphical interface. Interested readers will find more information in chapter 42.
4.3 Creating a dataset from scratch
There are several ways of doing this:
1. Find, or create using a text editor, a plain text data file and open it via “Import”.
2. Use your favorite spreadsheet to establish the data file, save it in comma-separated format if
necessary (this may not be necessary if the spreadsheet format is MS Excel, Gnumeric or Open
Document), then use one of the “Import” options.
3. Use gretl’s built-in spreadsheet.
4. Select data series from a suitable database.
5. Use your favorite text editor or other software tools to create a data file in gretl format inde-
pendently.
Here are a few comments and details on these methods.
Common points on imported data
Options (1) and (2) involve using gretl’s “import” mechanism. For the program to read such data
successfully, certain general conditions must be satisfied:
The first row must contain valid variable names. A valid variable name is of 31 characters
maximum; starts with a letter; and contains nothing but letters, numbers and the underscore
character, _. (Longer variable names will be truncated to 31 characters.) Qualifications to the
above: First, in the case of a plain text import, if the file contains no row with variable names
the program will automatically add names, v1, v2 and so on. Second, by “the first row” is meant
the first relevant row. In the case of plain text imports, blank rows and rows beginning with a
hash mark, #, are ignored. In the case of Excel, Gnumeric and ODS imports, you are presented
with a dialog box where you can select an offset into the spreadsheet, so that gretl will ignore
a specified number of rows and/or columns.
Data values: these should constitute a rectangular block, with one variable per column (and
one observation per row). The number of variables (data columns) must match the number
of variable names given. See also section 4.6. Numeric data are expected, but in the case of
importing from plain text, the program offers limited handling of character (string) data: if a
given column contains character data only, consecutive numeric codes are substituted for the
strings, and once the import is complete a table is printed showing the correspondence between
the strings and the codes.
Dates (or observation labels): Optionally, the first column may contain strings such as dates, or
labels for cross-sectional observations. Such strings have a maximum of 15 characters (as with
variable names, longer strings will be truncated). A column of this sort should be headed with
the string obs or date, or the first row entry may be left blank.
For dates to be recognized as such, the date strings should adhere to one or other of a set of
specific formats, as follows. For annual data: 4-digit years. For quarterly data: a 4-digit year,
followed by a separator (either a period, a colon, or the letter Q), followed by a 1-digit quarter.
Examples: 1997.1, 2002:3, 1947Q1. For monthly data: a 4-digit year, followed by a period or
a colon, followed by a two-digit month. Examples: 1997.01, 2002:10.
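Putting these requirements together, a minimal quarterly plain-text data file might look like the following (variable names and values are purely illustrative):
obs,income,consump
1990.1,100.2,95.1
1990.2,103.4,97.8
1990.3,105.0,99.2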
Plain text (“CSV”) files can use comma, space, tab or semicolon as the column separator. When you
open such a file via the GUI you are given the option of specifying the separator, though in most
cases it should be detected automatically.
If you use a spreadsheet to prepare your data you are able to carry out various transformations of
the “raw” data with ease (adding things up, taking percentages or whatever): note, however, that you
can also do this sort of thing easily—perhaps more easily—within gretl, by using the tools under the
“Add” menu.
Appending imported data
You may wish to establish a dataset piece by piece, by incremental importation of data from other
sources. This is supported via the “File, Append data” menu items: gretl will check the new data for
conformability with the existing dataset and, if everything seems OK, will merge the data. You can
add new variables in this way, provided the data frequency matches that of the existing dataset. Or
you can append new observations for data series that are already present; in this case the variable
names must match up correctly. Note that by default (that is, if you choose “Open data” rather than
“Append data”), opening a new data file closes the current one.
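In script mode the corresponding operation uses the append command; a minimal sketch with hypothetical file names:
open base.gdt
append extra.csv        # merge the new series or observations into the open dataset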
Using the built-in spreadsheet
Under the “File, New data set” menu you can choose the sort of dataset you want to establish (e.g.
quarterly time series, cross-sectional). You will then be prompted for starting and ending dates (or
observation numbers) and the name of the first variable to add to the dataset. After supplying this
information you will be faced with a simple spreadsheet into which you can type data values. In the
spreadsheet window, clicking the right mouse button will invoke a popup menu which enables you to
add a new variable (column), to add an observation (append a row at the foot of the sheet), or to
insert an observation at the selected point (move the data down and insert a blank row).
Once you have entered data into the spreadsheet you import these into gretl’s workspace using the
spreadsheet’s “Apply changes” button.
Please note that gretl’s spreadsheet is quite basic and has no support for functions or formulas. Data
transformations are done via the “Add” or “Variable” menus in the main window.
Selecting from a database
Another alternative is to establish your dataset by selecting variables from a database.
Begin with the “File, Databases” menu item. This has four forks: “Gretl native”, “RATS 4”, “PcGive”
and “On database server”. You should be able to find the file fedstl.bin in the file selector that
opens if you choose the “Gretl native” option since this file, which contains a large collection of US
macroeconomic time series, is supplied with the distribution.
You won’t find anything under “RATS 4” unless you have purchased RATS data.2 If you do possess
RATS data you should go into the “Tools, Preferences, General” dialog, select the Databases tab, and
fill in the correct path to your RATS files.
If your computer is connected to the internet you should find several databases (at Wake Forest
University) under “On database server”. You can browse these remotely; you also have the option of
installing them onto your own computer. The initial remote databases window has an item showing,
for each file, whether it is already installed locally (and if so, if the local version is up to date with
the version at Wake Forest).
Assuming you have managed to open a database you can import selected series into gretl’s workspace
by using the “Series, Import” menu item in the database window, or via the popup menu that appears
if you click the right mouse button, or by dragging the series into the program’s main window.
Creating a gretl data file independently
It is possible to create a data file in one or other of gretl’s own formats using a text editor or software
tools such as awk, sed or perl. This may be a good choice if you have large amounts of data already
2See www.estima.com
in machine readable form. You will, of course, need to study these data formats (XML-based or
“traditional”) as described in Appendix A.
4.4 Structuring a dataset
Once your data are read by gretl, it may be necessary to supply some information on the nature of
the data. We distinguish between three kinds of datasets:
1. Cross section
2. Time series
3. Panel data
The primary tool for doing this is the “Data, Dataset structure” menu entry in the graphical interface,
or the setobs command for scripts and the command-line interface.
Cross sectional data
By a cross section we mean observations on a set of “units” (which may be firms, countries, individ-
uals, or whatever) at a common point in time. This is the default interpretation for a data file: if
there is insufficient information to interpret data as time-series or panel data, they are automatically
interpreted as a cross section. In the unlikely event that cross-sectional data are wrongly interpreted
as time series, you can correct this by selecting the “Data, Dataset structure” menu item. Click the
“cross-sectional” radio button in the dialog box that appears, then click “Forward”. Click “OK” to
confirm your selection.
Time series data
When you import data from a spreadsheet or plain text file, gretl will make fairly strenuous efforts
to glean time-series information from the first column of the data, if it looks at all plausible that
such information may be present. If time-series structure is present but not recognized, again you
can use the “Data, Dataset structure” menu item. Select “Time series” and click “Forward”; select the
appropriate data frequency and click “Forward” again; then select or enter the starting observation
and click “Forward” once more. Finally, click “OK” to confirm the time-series interpretation if it is
correct (or click “Back” to make adjustments if need be).
Besides the basic business of getting a data set interpreted as time series, further issues may arise
relating to the frequency of time-series data. In a gretl time-series data set, all the series must have
the same frequency. Suppose you wish to make a combined dataset using series that, in their original
state, are not all of the same frequency. For example, some series are monthly and some are quarterly.
Your first step is to formulate a strategy: Do you want to end up with a quarterly or a monthly data
set? A basic point to note here is that “compacting” data from a higher frequency (e.g. monthly) to
a lower frequency (e.g. quarterly) is usually unproblematic. You lose information in doing so, but in
general it is perfectly legitimate to take (say) the average of three monthly observations to create a
quarterly observation. On the other hand, “expanding” data from a lower to a higher frequency is
not, in general, a valid operation.
In most cases, then, the best strategy is to start by creating a data set of the lower frequency, and
then to compact the higher frequency data to match. When you import higher-frequency data from
a database into the current data set, you are given a choice of compaction method (average, sum,
start of period, or end of period). In most instances “average” is likely to be appropriate.
You can also import lower-frequency data into a high-frequency data set, but this is generally not
recommended. What gretl does in this case is simply replicate the values of the lower-frequency series
as many times as required. For example, suppose we have a quarterly series with the value 35.5 in
1990:1, the first quarter of 1990. On expansion to monthly, the value 35.5 will be assigned to the
observations for January, February and March of 1990. The expanded variable is therefore useless
for fine-grained time-series analysis, outside of the special case where you know that the variable in
question does in fact remain constant over the sub-periods.
When the current data frequency is appropriate, gretl offers both “Compact data” and“Expand data”
options under the “Data” menu. These options operate on the whole data set, compacting or expanding
all series. They should be considered “expert” options and should be used with caution.
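In script mode the corresponding operation is performed via the dataset command. For example, assuming a monthly dataset is open, something like the following compacts all series to quarterly frequency:
dataset compact 4       # compact all series to quarterly frequency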
Panel data
Panel data are inherently three dimensional—the dimensions being variable, cross-sectional unit,
and time-period. For example, a particular number in a panel data set might be identified as the
observation on capital stock for General Motors in 1980. (A note on terminology: we use the terms
“cross-sectional unit”, “unit” and “group” interchangeably below to refer to the entities that compose
the cross-sectional dimension of the panel. These might, for instance, be firms, countries or persons.)
For representation in a textual computer file (and also for gretl’s internal calculations) the three
dimensions must somehow be flattened into two. This “flattening” involves taking layers of the data
that would naturally stack in a third dimension, and stacking them in the vertical dimension.
gretl always expects data to be arranged “by observation”, that is, such that each row represents an
observation (and each variable occupies one and only one column). In this context the flattening of
a panel data set can be done in either of two ways:
Stacked time series: the successive vertical blocks each comprise a time series for a given unit.
Stacked cross sections: the successive vertical blocks each comprise a cross-section for a given
period.
You may input data in whichever arrangement is more convenient. Internally, however, gretl always
stores panel data in the form of stacked time series.
4.5 Panel data specifics
When you import panel data into gretl from a spreadsheet or comma separated format, the panel
nature of the data will not be recognized automatically (most likely the data will be treated as
“undated”). A panel interpretation can be imposed on the data using the graphical interface or via
the setobs command.
In the graphical interface, use the menu item “Data, Dataset structure”. In the first dialog box
that appears, select “Panel”. In the next dialog you have a three-way choice. The first two options,
“Stacked time series”and “Stacked cross sections” are applicable if the data set is already organized in
one of these two ways. If you select either of these options, the next step is to specify the number of
cross-sectional units in the data set. The third option, “Use index variables”, is applicable if the data
set contains two variables that index the units and the time periods respectively; the next step is then
to select those variables. For example, a data file might contain a country code variable and a variable
representing the year of the observation. In that case gretl can reconstruct the panel structure of the
data regardless of how the observation rows are organized.
The setobs command has options that parallel those in the graphical interface. If suitable index
variables are available you can do, for example
setobs unitvar timevar --panel-vars
where unitvar is a variable that indexes the units and timevar is a variable indexing the periods.
Alternatively you can use the form setobs freq 1:1 structure, where freq is replaced by the “block
size” of the data (that is, the number of periods in the case of stacked time series, or the number
of units in the case of stacked cross-sections) and structure is either --stacked-time-series or
--stacked-cross-section. Two examples are given below: the first is suitable for a panel in the
form of stacked time series with observations from 20 periods; the second for stacked cross sections
with 5 units.
setobs 20 1:1 --stacked-time-series
setobs 5 1:1 --stacked-cross-section
Panel data arranged by variable
Publicly available panel data sometimes come arranged “by variable.” Suppose we have data on two
variables, x1 and x2, for each of 50 states in each of 5 years (giving a total of 250 observations per
variable). One textual representation of such a data set would start with a block for x1, with 50
rows corresponding to the states and 5 columns corresponding to the years. This would be followed,
vertically, by a block with the same structure for variable x2. A fragment of such a data file is shown
below, with quinquennial observations 1965–1985. Imagine the table continued for 48 more states,
followed by another 50 rows for variable x2.
x1
1965 1970 1975 1980 1985
AR 100.0 110.5 118.7 131.2 160.4
AZ 100.0 104.3 113.8 120.9 140.6
If a datafile with this sort of structure is read into gretl,3 the program will interpret the columns as
distinct variables, so the data will not be usable “as is.” But there is a mechanism for correcting the
situation, namely the stack function.
Consider the first data column in the fragment above: the first 50 rows of this column constitute a
cross-section for the variable x1 in the year 1965. If we could create a new series by stacking the
first 50 entries in the second column underneath the first 50 entries in the first, we would be on the
way to making a data set “by observation” (in the first of the two forms mentioned above, stacked
cross-sections). That is, we’d have a column comprising a cross-section for x1 in 1965, followed by a
cross-section for the same variable in 1970.
The following gretl script illustrates how we can accomplish the stacking, for both x1 and x2. We
assume that the original data file is called panel.txt, and that in this file the columns are headed
with “variable names” v1, v2, . . . , v5. (The columns are not really variables, but in the first instance
we “pretend” that they are.)
open panel.txt
series x1 = stack(v1..v5, 50)
series x2 = stack(v1..v5, 50, 50)
setobs 50 1:1 --stacked-cross-section
store panel.gdt x1 x2
The second and third lines illustrate the syntax of the stack function, which has this signature:
series stack(list L, scalar length, scalar offset)
L: a list of series on which to operate.
length: an integer giving the number of observations to take from each series.
offset: an integer giving the offset from the top of the dataset at which to start taking values
(optional, defaults to 0).
The .. syntax in the example above constructs a list of the 5 contiguous series to be stacked. More
generally, you can define a named list of series and pass that as the first argument to stack (see
chapter 15). In this example we’re supposing that the full data set contains 100 rows, and that in the
stacking of variable x1 we wish to read only the first 50 rows from each column, so we give 50 as the
second argument.
On line 3 we do the stacking for variable x2. Again we want a length of 50 for the components of
the stacked series, but this time we want to start reading from the 50th row of the original data, and
so we add a third offset argument of 50. Line 4 then imposes a panel interpretation on the data.
Finally, we save the stacked data to file, with the panel interpretation.
3Note that you will have to modify such a datafile slightly before it can be read at all. The line containing the
variable name (in this example x1) will have to be removed, and so will the initial row containing the years, otherwise
they will be taken as numerical data.
The illustrative script above is appropriate when the number of variables to be processed is small.
When there are many variables in the dataset it will be more convenient to use a loop to accomplish
the stacking, as shown in the following script. The setup is presumed to be the same as in the previous
case (50 units, 5 periods), but with 20 variables rather than 2.
open panel.txt
list L = v1..v5 # predefine a list of series
scalar length = 50
loop i=1..20
scalar offset = (i - 1) * length
series x$i = stack(L, length, offset)
endloop
setobs 50 1:1 --stacked-cross-section
store panel.gdt x1..x20
Side-by-side time series
There’s a second sort of data that you may wish to convert to gretl’s panel format, namely side-by-
side time series for a number of cross-sectional units. For example, a data file might contain separate
GDP series of common length Tfor each of Ncountries. To turn these into a single stacked time
series the stack function can again be used. An example follows, where we suppose the original data
source is a comma-separated file named GDP.csv, containing GDP data for countries from Austria
(GDP_AT) to Zimbabwe (GDP_ZW) in consecutive columns.
open GDP.csv
scalar T = $nobs # the number of periods
list L = GDP_AT..GDP_ZW
series GDP = stack(L, T)
setobs T 1:1 --stacked-time-series
store panel.gdt GDP
The resulting data file, panel.gdt, will contain a single series of length NT, where N is the number of
countries and T is the length of the original dataset. One could insert revised variants of lines 3 and
4 of the script if the original file contained additional side-by-side per-country series for investment,
consumption or whatever.
Relatively simple cases of this transformation can be handled via gretl’s graphical interface. The
“simplicity” requirements are:
The dataset contains exactly M·N time series, arranged in M ≥ 1 blocks each having N ≥ 2
contiguous members.
In each block, the N series represent measures of a single variable (e.g. GDP) for a set of N
cross-sectional units, and in each block these units appear in the same order.
The relevant GUI apparatus can be accessed via the item Dataset structure under the Data menu in
the main window. On selecting this item one of the options is Panel; choose this option and the next
step offers “Convert from side-by-side time series”. This leads to steps where you specify M or N;
give a name for the panel series; then confirm that the specified transformation is what you want.
The end result is the same as executing a series of commands on the pattern shown above, using the
stack function, then executing the additional command open panel.gdt.
Panel data marker strings
It can be helpful with panel data to have the observations identified by mnemonic markers. A special
function in the genr command is available for this purpose.
In the example under the heading “Panel data arranged by variable” above, suppose all the states
are identified by two-letter codes in the left-most column of the original datafile. When the stack
function is invoked as shown, these codes will be stacked along with the data values. If the first row
is marked AR for Arkansas, then the marker AR will end up being shown on each row containing an
observation for Arkansas. That’s all very well, but these markers don’t tell us anything about the
date of the observation. To rectify this we could do:
genr time
series year = 1960 + (5 * time)
genr markers = "%s:%d", marker, year
The first line generates a 1-based index representing the period of each observation, and the second
line uses the time variable to generate a variable representing the year of the observation. The
third line contains this special feature: if (and only if) the name of the new “variable” to generate is
markers, the portion of the command following the equals sign is taken as a C-style format string
(which must be wrapped in double quotes), followed by a comma-separated list of arguments. The
arguments will be printed according to the given format to create a new set of observation markers.
Valid arguments are either the names of variables in the dataset, or the string marker which denotes
the pre-existing observation marker. The format specifiers which are likely to be useful in this context
are %s for a string and %d for an integer. Strings can be truncated: for example %.3s will use just
the first three characters of the string. To chop initial characters off an existing observation marker
when constructing a new one, you can use the syntax marker + n, where n is a positive integer: in
that case the first n characters will be skipped.
After the commands above are processed, then, the observation markers will look like, for example,
AR:1965, where the two-letter state code and the year of the observation are spliced together with a
colon.
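As a further illustration of the formatting options just described, the following variant would keep only the first two characters of the existing marker and splice in the year with a hyphen:
genr markers = "%.2s-%d", marker, year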
Panel dummy variables
In a panel study you may wish to construct dummy variables of one or both of the following sorts:
(a) dummies as unique identifiers for the units or groups, and (b) dummies as unique identifiers for
the time periods. The former may be used to allow the intercept of the regression to differ across the
units, the latter to allow the intercept to differ across periods.
Two special functions are available to create such dummies. These are found under the “Add” menu
in the GUI, or under the genr command in script mode or gretlcli.
1. “unit dummies” (script command genr unitdum). This command creates a set of dummy vari-
ables identifying the cross-sectional units. The variable du_1 will have value 1 in each row
corresponding to a unit 1 observation, 0 otherwise; du_2 will have value 1 in each row corre-
sponding to a unit 2 observation, 0 otherwise; and so on.
2. “time dummies” (script command genr timedum). This command creates a set of dummy
variables identifying the periods. The variable dt_1 will have value 1 in each row corresponding
to a period 1 observation, 0 otherwise; dt_2 will have value 1 in each row corresponding to a
period 2 observation, 0 otherwise; and so on.
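As a brief sketch (assuming a panel with at least three units, and hypothetical series y and x), one might generate the dummies and use a few of them in a regression:
genr unitdum
genr timedum
ols y 0 x du_2 du_3 dt_2   # let units 2 and 3, and period 2, have distinct intercepts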
If a panel data set has the YEAR of the observation entered as one of the variables you can create a
periodic dummy to pick out a particular year, e.g. genr dum = (YEAR==1960). You can also create
periodic dummy variables using the modulus operator, %. For instance, to create a dummy with value
1 for the first observation and every thirtieth observation thereafter, 0 otherwise, do
genr index
series dum = ((index-1) % 30) == 0
Lags, differences, trends
If the time periods are evenly spaced you may want to use lagged values of variables in a panel
regression (but see also chapter 24); you may also wish to construct first differences of variables of
interest.
Once a dataset is identified as a panel, gretl will handle the generation of such variables correctly.
For example the command genr x1_1 = x1(-1) will create a variable that contains the first lag of
x1 where available, and the missing value code where the lag is not available (e.g. at the start of
the time series for each group). When you run a regression using such variables, the program will
automatically skip the missing observations.
When a panel data set has a fairly substantial time dimension, you may wish to include a trend in
the analysis. The command genr time creates a variable named time which runs from 1 to T for
each unit, where T is the length of the time-series dimension of the panel. If you want to create an
index that runs consecutively from 1 to m×T, where m is the number of units in the panel, use genr
index.
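As a small sketch (with hypothetical series y and x), a linear trend can then be included in a pooled regression:
genr time
ols y 0 x time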
Basic statistics by unit
gretl contains functions which can be used to generate basic descriptive statistics for a given variable,
on a per-unit basis; these are pnobs() (number of valid cases), pmin() and pmax() (minimum and
maximum) and pmean() and psd() (mean and standard deviation).
As a brief illustration, suppose we have a panel data set comprising 8 time-series observations on each
of N units or groups. Then the command
series pmx = pmean(x)
creates a series of this form: the first 8 values (corresponding to unit 1) contain the mean of x for unit
1, the next 8 values contain the mean for unit 2, and so on. The psd() function works in a similar
manner. The sample standard deviation for group i is computed as
s_i = \sqrt{ \sum (x - \bar{x}_i)^2 / (T_i - 1) }
where T_i denotes the number of valid observations on x for the given unit, \bar{x}_i denotes the group
mean, and the summation is across valid observations for the group. If T_i < 2, however, the standard
deviation is recorded as 0.
One particular use of psd() may be worth noting. If you want to form a sub-sample of a panel that
contains only those units for which the variable x is time-varying, you can either use
smpl pmin(x) < pmax(x) --restrict
or
smpl psd(x) > 0 --restrict
4.6 Missing data values
Representation and handling
Missing values are represented internally as NaN (“not a number”), as defined in the IEEE 754 floating-
point standard. In a native-format data file they should be represented as NA. When importing CSV
data gretl accepts several common representations of missing values including −999, the string NA (in
upper or lower case), a single dot, or simply a blank cell. Blank cells should, of course, be properly
delimited, e.g. 120.6,,5.38, in which the middle value is presumed missing.
As for handling of missing values in the course of statistical analysis, gretl does the following:
In calculating descriptive statistics (mean, standard deviation, etc.) under the summary com-
mand, missing values are simply skipped and the sample size adjusted appropriately.
In running regressions gretl first adjusts the beginning and end of the sample range, truncating
the sample if need be. Missing values at the beginning of the sample are common in time series
work due to the inclusion of lags, first differences and so on; missing values at the end of the
range are not uncommon due to differential updating of series and possibly the inclusion of
leads.
If gretl detects any missing values “inside” the (possibly truncated) sample range for a regression,
the result depends on the character of the dataset and the estimator chosen. In many cases, the
program will automatically skip the missing observations when calculating the regression results.
In this situation a message is printed stating how many observations were dropped. On the other
hand, the skipping of missing observations is not supported for all procedures: exceptions include all
autoregressive estimators, system estimators such as SUR, and nonlinear least squares. In the case of
panel data, the skipping of missing observations is supported only if their omission leaves a balanced
panel. If missing observations are found in cases where they are not supported, gretl gives an error
message and refuses to produce estimates.
Manipulating missing values
Some special functions are available for the handling of missing values. The Boolean function
missing() takes the name of a variable as its single argument; it returns a series with value 1
for each observation at which the given variable has a missing value, and value 0 otherwise (that is,
if the given variable has a valid value at that observation). The function ok() is complementary to
missing; it is just a shorthand for !missing (where ! is the Boolean NOT operator). For example,
one can count the missing values for variable x using
scalar nmiss_x = sum(missing(x))
The function zeromiss(), which again takes a single series as its argument, returns a series where
all zero values are set to the missing code. This should be used with caution—one does not want
to confuse missing values and zeros—but it can be useful in some contexts. For example, one can
determine the first valid observation for a variable x using
genr time
scalar x0 = min(zeromiss(time * ok(x)))
The function misszero() does the opposite of zeromiss, that is, it converts all missing values to
zero.
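For example (subject to the same caution about conflating zeros and missing values), one might do:
series xz = misszero(x)   # copy of x with NAs replaced by zeros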
If missing values get involved in calculations, they propagate according to the IEEE rules: notably,
if one of the operands to an arithmetical operation is a NaN, the result will also be NaN.
4.7 Maximum size of data sets
Basically, the size of data sets (both the number of variables and the number of observations per
variable) is limited only by the characteristics of your computer. Gretl allocates memory dynamically,
and will ask the operating system for as much memory as your data require. Obviously, then, you
are ultimately limited by the size of RAM.
Aside from the multiple-precision OLS option, gretl uses double-precision floating-point numbers
throughout. The size of such numbers in bytes depends on the computer platform, but is typically
eight. To give a rough notion of magnitudes, suppose we have a data set with 10,000 observations on
500 variables. That’s 5 million floating-point numbers or 40 million bytes. If we define the megabyte
(MB) as 1024 × 1024 bytes, as is standard in talking about RAM, it’s slightly over 38 MB. The
program needs additional memory for workspace, but even so, handling a data set of this size should
be quite feasible on a current PC, which at the time of writing is likely to have at least 256 MB of
RAM.
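The arithmetic in this example can be reproduced with a trivial script (the figures are just those from the text):
scalar n_obs = 10000
scalar n_vars = 500
printf "approximate data size: %.1f MB\n", n_obs * n_vars * 8 / (1024 * 1024)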
If RAM is not an issue, there is one further limitation on data size (though it’s very unlikely to
be a binding constraint). That is, variables and observations are indexed by signed integers, and
on a typical PC these will be 32-bit values, capable of representing a maximum positive value of
2^31 − 1 = 2,147,483,647.
The limits mentioned above apply to gretl’s “native” functionality. There are tighter limits with regard
to two third-party programs that are available as add-ons to gretl for certain sorts of time-series
analysis including seasonal adjustment, namely TRAMO/SEATS and X-12-ARIMA. These programs
employ a fixed-size memory allocation, and can’t handle series of more than 600 observations.
4.8 Data file collections
If you’re using gretl in a teaching context you may be interested in adding a collection of data files
and/or scripts that relate specifically to your course, in such a way that students can browse and
access them easily.
There are three ways to access such collections of files:
For data files: select the menu item “File, Open data, Sample file”, or click on the folder icon
on the gretl toolbar.
For script files: select the menu item “File, Script files, Example scripts”.
When a user selects one of the items:
The data or script files included in the gretl distribution are automatically shown (this includes
files relating to Ramanathan’s Introductory Econometrics and Greene’s Econometric Analysis).
The program looks for certain known collections of data files available as optional extras, for
instance the datafiles from various econometrics textbooks (Davidson and MacKinnon, Gujarati,
Stock and Watson, Verbeek, Wooldridge) and the Penn World Table (PWT 5.6). (See the data
page at the gretl website for information on these collections.) If the additional files are found,
they are added to the selection windows.
The program then searches for valid file collections (not necessarily known in advance) in these
places: the “system” data directory, the system script directory, the user directory, and all first-
level subdirectories of these. For reference, typical values for these directories are shown in
Table 4.1. (Note that PERSONAL is a placeholder that is expanded by Windows, corresponding
to “My Documents” on English-language systems.)
                       Linux                       MS Windows
system data dir        /usr/share/gretl/data       c:\Program Files\gretl\data
system script dir      /usr/share/gretl/scripts    c:\Program Files\gretl\scripts
user dir               $HOME/gretl                 PERSONAL\gretl
Table 4.1: Typical locations for file collections
Any valid collections will be added to the selection windows. So what constitutes a valid file collection?
This comprises either a set of data files in gretl XML format (with the .gdt suffix) or a set of script files
containing gretl commands (with .inp suffix), in each case accompanied by a “master file” or catalog.
The gretl distribution contains several example catalog files, for instance the file descriptions in
the misc sub-directory of the gretl data directory and ps_descriptions in the misc sub-directory of
the scripts directory.
If you are adding your own collection, data catalogs should be named descriptions and script
catalogs should be named ps_descriptions. In each case the catalog should be placed (along
with the associated data or script files) in its own specific sub-directory (e.g. /usr/share/gretl/
data/mydata or c:\userdata\gretl\data\mydata).
The catalog files are plain text; if they contain non-ASCII characters they must be encoded as UTF-8.
The syntax of such files is straightforward. Here, for example, are the first few lines of gretl’s “misc”
data catalog:
# Gretl: various illustrative datafiles
"arma","artificial data for ARMA script example"
"ects_nls","Nonlinear least squares example"
"hamilton","Prices and exchange rate, U.S. and Italy"
The first line, which must start with a hash mark, contains a short name, here “Gretl”, which will
appear as the label for this collection’s tab in the data browser window, followed by a colon, followed
by an optional short description of the collection.
Subsequent lines contain two elements, separated by a comma and wrapped in double quotation marks.
The first is a datafile name (leave off the .gdt suffix here) and the second is a short description of
the content of that datafile. There should be one such line for each datafile in the collection.
A script catalog file looks very similar, except that there are three fields in the file lines: a filename
(without its .inp suffix), a brief description of the econometric point illustrated in the script, and
a brief indication of the nature of the data used. Again, here are the first few lines of the supplied
“misc” script catalog:
# Gretl: various sample scripts
"arma","ARMA modeling","artificial data"
"ects_nls","Nonlinear least squares (Davidson)","artificial data"
"leverage","Influential observations","artificial data"
"longley","Multicollinearity","US employment"
If you want to make your own data collection available to users, these are the steps:
1. Assemble the data, in whatever format is convenient.
2. Convert the data to gretl format and save as gdt files. It is probably easiest to convert the data
by importing them into the program from plain text, CSV, or a spreadsheet format (MS Excel
or Gnumeric) then saving them. You may wish to add descriptions of the individual variables
(the “Variable, Edit attributes” menu item), and add information on the source of the data (the
“Data, Edit info” menu item).
3. Write a descriptions file for the collection using a text editor.
4. Put the datafiles plus the descriptions file in a subdirectory of the gretl data directory (or user
directory).
5. If the collection is to be distributed to other people, package the data files and catalog in some
suitable manner, e.g. as a zipfile.
If you assemble such a collection, and the data are not proprietary, we would encourage you to submit
the collection for packaging as a gretl optional extra.
4.9 Assembling data from multiple sources
In many contexts researchers need to bring together data from multiple source files, and in some
cases these sources are not organized such that the data can simply be “stuck together” by appending
rows or columns to a base dataset. In gretl, the join command can be used for this purpose; this
command is discussed in detail in chapter 7.
Chapter 5
Sub-sampling a dataset
5.1 Introduction
Some subtle issues can arise when sub-sampling a dataset; this chapter attempts to explain them.
A sub-sample may be defined in relation to a full dataset in two different ways: we will refer to
these as “setting”the sample and “restricting” the sample; these methods are discussed in sections 5.2
and 5.3 respectively. In addition section 5.4 discusses some special issues relating to panel data, and
section 5.5 covers resampling with replacement, which is useful in the context of bootstrapping test
statistics.
The following discussion focuses on the command-line approach. But you can also invoke the methods
outlined here via the items under the Sample menu in the GUI program.
5.2 Setting the sample
By “setting” the sample we mean defining a sub-sample simply by means of adjusting the starting
and/or ending point of the current sample range. This is likely to be most relevant for time-series
data. For example, one has quarterly data from 1960:1 to 2003:4, and one wants to run a regression
using only data from the 1970s. A suitable command is then
smpl 1970:1 1979:4
Or one wishes to set aside a block of observations at the end of the data period for out-of-sample
forecasting. In that case one might do
smpl ; 2000:4
where the semicolon is shorthand for “leave the starting observation unchanged”. (The semicolon
may also be used in place of the second parameter, to mean that the ending observation should be
unchanged.) By “unchanged” here, we mean unchanged relative to the last smpl setting, or relative
to the full dataset if no sub-sample has been defined up to this point. For example, after
smpl 1970:1 2003:4
smpl ; 2000:4
the sample range will be 1970:1 to 2000:4.
An incremental or relative form of setting the sample range is also supported. In this case a relative
offset should be given, in the form of a signed integer (or a semicolon to indicate no change), for both
the starting and ending point. For example
smpl +1 ;
will advance the starting observation by one while preserving the ending observation, and
smpl +2 -1
will both advance the starting observation by two and retard the ending observation by one.
An important feature of “setting” the sample as described above is that it necessarily results in the
selection of a subset of observations that are contiguous in the full dataset. The structure of the
dataset is therefore unaffected (for example, if it is a quarterly time series before setting the sample,
it remains a quarterly time series afterwards).
5.3 Restricting the sample
By “restricting” the sample we mean selecting observations on the basis of some Boolean (logical)
criterion, or by means of a random number generator. This is likely to be most relevant for cross-
sectional or panel data.
Suppose we have data on a cross-section of individuals, recording their gender, income and other
characteristics. We wish to select for analysis only the women. If we have a male dummy variable
with value 1 for men and 0 for women we could do
smpl male==0 --restrict
to this effect. Or suppose we want to restrict the sample to respondents with incomes over $50,000.
Then we could use
smpl income>50000 --restrict
A question arises: if we issue the two commands above in sequence, what do we end up with in our
sub-sample: all cases with income over 50000, or just women with income over 50000? By default,
the answer is the latter: women with income over 50000. The second restriction augments the first,
or in other words the final restriction is the logical product of the new restriction and any restriction
that is already in place. If you want a new restriction to replace any existing restrictions you can first
recreate the full dataset using
smpl --full
Alternatively, you can add the replace option to the smpl command:
smpl income>50000 --restrict --replace
This option has the effect of automatically re-establishing the full dataset before applying the new
restriction.
Unlike a simple “setting” of the sample, “restricting” the sample may result in selection of non-
contiguous observations from the full data set. It may therefore change the structure of the data
set.
This can be seen in the case of panel data. Say we have a panel of five firms (indexed by the variable
firm) observed in each of several years (identified by the variable year). Then the restriction
smpl year==1995 --restrict
produces a dataset that is not a panel, but a cross-section for the year 1995. Similarly
smpl firm==3 --restrict
produces a time-series dataset for firm number 3.
For these reasons (possible non-contiguity in the observations, possible change in the structure of
the data), gretl acts differently when you “restrict” the sample as opposed to simply “setting” it. In
the case of setting, the program merely records the starting and ending observations and uses these
as parameters to the various commands calling for the estimation of models, the computation of
statistics, and so on. In the case of restriction, the program makes a reduced copy of the dataset and
by default treats this reduced copy as a simple, undated cross-section—but see the further discussion
of panel data in section 5.4.
If you wish to re-impose a time-series interpretation of the reduced dataset you can do so using the
setobs command, or the GUI menu item “Data, Dataset structure”.
The fact that “restricting” the sample results in the creation of a reduced copy of the original dataset
may raise an issue when the dataset is very large. With such a dataset in memory, the creation of
a copy may lead to a situation where the computer runs low on memory for calculating regression
results. You can work around this as follows:
1. Open the full data set, and impose the sample restriction.
2. Save a copy of the reduced data set to disk.
3. Close the full dataset and open the reduced one.
4. Proceed with your analysis.
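In script terms the workaround might look like the following (file names are hypothetical):
open big_dataset.gdt
smpl income > 50000 --restrict
store reduced.gdt       # save the reduced copy to disk
open reduced.gdt        # re-open it as the working dataset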
Random sub-sampling
Besides restricting the sample on some deterministic criterion, it may sometimes be useful (when
working with very large datasets, or perhaps to study the properties of an estimator) to draw a
random sub-sample from the full dataset. This can be done using, for example,
smpl 100 --random
to select 100 cases. If you want the sample to be reproducible, you should set the seed for the random
number generator first, using the set command. This sort of sampling falls under the “restriction”
category: a reduced copy of the dataset is made.
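For example, to make the random selection reproducible:
set seed 20240601
smpl 100 --random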
5.4 Panel data
Consider for concreteness the Arellano–Bond dataset supplied with gretl (abdata.gdt). This com-
prises data on 140 firms (n = 140) observed over the years 1976–1984 (T = 9). The dataset is
“nominally balanced” in the sense that the time-series length is the same for all firms (this
being a requirement for a dataset to count as a panel in gretl), but in fact there are many missing
values (NAs).
You may want to sub-sample such a dataset in either the cross-sectional dimension (limit the sample
to a subset of firms) or the time dimension (e.g. use data from the 1980s only). One way to sub-sample
on firms keys off the notation used by gretl for panel observations. The full data range is printed as
1:1 (firm 1, period 1) to 140:9 (firm 140, period 9). The effect of
smpl 1:1 80:9
is to limit the sample to the first 80 firms. Note that if you instead tried smpl 1:1 80:4 this would
provoke an error: you cannot use this syntax to sub-sample in the time dimension of the panel.
Alternatively, and perhaps more naturally, you can use the --unit option with the smpl command
to limit the sample in the cross-sectional dimension, as in
smpl 1 80 --unit
The firms in the Arellano–Bond dataset are anonymous, but suppose you had a panel with five
named countries. With such a panel you can inform gretl of the names of the groups using the
setobs command. For example, given
string cstr = "Portugal Italy Ireland Greece Spain"
setobs country cstr --panel-groups
gretl creates a string-valued series named country with group names taken from the variable cstr.
Then, to include only Italy and Spain you could do
smpl country=="Italy" || country=="Spain" --restrict
or to exclude one country,
smpl country!="Ireland" --restrict
Sub-sampling a panel in the time dimension can be done via --restrict. For example, the Arellano–
Bond dataset contains a variable named YEAR that records the year of the observations and if one
wanted to omit the first two years of data one could do
smpl YEAR >= 1978 --restrict
If a dataset does not already include a suitable variable for this purpose one can use the command
genr time to create a simple 1-based time index.
Another way to sub-sample in the time dimension of a panel starts with a specification of time via
the setobs command, as in
setobs 1 1976 --panel-time
This tells gretl that panel-time is annual (frequency 1), starting in 1976. (In fact this is already done
for abdata.gdt.) Then to restrict the sample range to 1979–1982 you can do
smpl 1979 1982 --time
Note that if you apply a sample restriction that just selects certain units (firms, countries or whatever),
or selects certain contiguous time-periods—such that n > 1, T > 1 and the time-series length is still
the same across all included units—your sub-sample will still be interpreted by gretl as a panel.
Unbalancing restrictions
In some cases one wants to sub-sample according to a criterion that“cuts across the grain” of a panel
dataset. For instance, suppose you have a micro dataset with thousands of individuals observed over
several years and you want to restrict the sample to observations on employed women.
If we simply extracted from the total nT rows of the dataset those that pertain to women who were
employed at time t (t = 1, . . . , T) we would likely end up with a dataset that doesn’t count as a
panel in gretl (because the specific time-series length, Ti, would differ across individuals). In some
contexts it might be OK that gretl doesn’t take your sub-sample to be a panel, but if you want to
apply panel-specific methods this is a problem. You can solve it by giving the --preserve-panel
option with smpl. For example, supposing your dataset contained dummy variables gender (with the
value 1 coding for women) and employed, you could do
smpl gender==1 && employed==1 --restrict --preserve-panel
What exactly does this do? Well, let’s say the years of your data are 2000, 2005 and 2010, and that
some women were employed in all of those years, giving a maximum Ti value of 3. But individual 526
is a woman who was employed only in the year 2000 (Ti = 1). The effect of the --preserve-panel
option is then to insert “padding rows” of NAs for the years 2005 and 2010 for individual 526, and
similarly for all individuals with 0 < Ti < 3. Your sub-sample then qualifies as a panel.
5.5 Resampling and bootstrapping
Given an original data series x, the command
series xr = resample(x)
creates a new series each of whose elements is drawn at random from the elements of x. If the original
series has 100 observations, each element of x is selected with probability 1/100 at each drawing. Thus the effect is to “shuffle” the elements of x, with the twist that each element of x may appear
more than once, or not at all, in xr.
The primary use of this function is in the construction of bootstrap confidence intervals or p-values.
Here is a simple example. Suppose we estimate a simple regression of y on x via OLS and find that the slope coefficient has a reported t-ratio of t0 with ν degrees of freedom. A two-tailed p-value for
the null hypothesis that the slope parameter equals zero can then be found using the t(ν) distribution.
Depending on the context, however, we may doubt whether the ratio of coefficient to standard error
truly follows the t(ν) distribution. In that case we could derive a bootstrap p-value as shown in
Listing 5.1.
Under the null hypothesis that the slope with respect to x is zero, y is simply equal to its mean plus
an error term. We simulate yby resampling the residuals from the initial OLS and re-estimate the
model. We repeat this procedure a large number of times, and record the number of cases where the
absolute value of the t-ratio is greater than t0: the proportion of such cases is our bootstrap p-value.
For a good discussion of simulation-based tests and bootstrapping, see Davidson and MacKinnon
(2004, chapter 4); Davidson and Flachaire (2001) is also instructive.
Listing 5.1: Calculation of bootstrap p-value [Download ]
nulldata 50
set seed 54321
series x = normal()
series y = 10 + x + 2*normal()
ols y 0 x
# the reported t-stat
t0 = abs($coeff[2] / $stderr[2])
# save the residuals
series u = $uhat
scalar ybar = mean(y)
# number of replications for bootstrap
scalar B = 1000
scalar tcount = 0
series ysim
loop B
# generate simulated y by resampling
ysim = ybar + resample(u)
ols ysim 0 x --quiet
scalar tsim = abs($coeff[2] / $stderr[2])
tcount += (tsim > t0)
endloop
printf "proportion of cases with |t| > %.3f = %g\n", t0, tcount / B
Chapter 6
Graphics
6.1 Gnuplot graphs
A separate program, gnuplot, is called to generate graphs. Gnuplot is a very full-featured graphing
program with myriad options. It is available from www.gnuplot.info (but note that a suitable copy of
gnuplot is bundled with the packaged versions of gretl for MS Windows and Mac OS X). Gretl gives
you direct access, via a graphical interface, to a subset of gnuplot’s options and it tries to choose
sensible values for you; it also allows you to take complete control over graph details if you wish.
With a graph displayed, you can right-click on the graph window (or use the “hamburger” toolbar
button) for a pop-up menu with the following options.
Save as PNG: Save the graph in Portable Network Graphics format (the same format that you
see on screen).
Save as postscript (EPS): Save in encapsulated postscript format.
Save as PDF: Save in PDF format.
Save as Windows metafile: Save in Enhanced Metafile (EMF) format (with color and monochrome
options).
Copy to clipboard: with color and monochrome options.
Save to session as icon: The graph will appear in iconic form when you select “Icon view” from
the View menu.
Zoom: Lets you select an area within the graph for closer inspection (not available for all
graphs).
Display PDF: view a PDF version of the graph.
Edit: Opens a controller for the plot which lets you adjust many aspects of its appearance.
Close: Closes the graph window.
If you select Save as postscript or Save as PDF you get a dialog box that lets you adjust several aspects
of the graph, and also preview the result.
Displaying data labels
For simple X-Y scatter plots, some further options are available if the dataset includes “case markers”
(that is, labels identifying each observation).1 With a scatter plot displayed, when you move the
mouse pointer over a data point its label is shown on the graph. By default these labels are transient:
they do not appear in the printed or copied version of the graph. They can be removed by selecting
“Clear data labels” from the graph pop-up menu. If you want the labels to be affixed permanently
(so they will show up when the graph is printed or copied), select the option “Freeze data labels”
from the pop-up menu; “Clear data labels” cancels this operation. The other label-related option,
“All data labels”, requests that case markers be shown for all observations. At present the display of
case markers is disabled for graphs containing more than 250 data points.
1For an example of such a dataset, see the Ramanathan file data4-10: this contains data on private school enrollment
for the 50 states of the USA plus Washington, DC; the case markers are the two-letter codes for the states.
GUI plot editor
Selecting the Edit option in the graph popup menu opens an editing dialog box, shown in Figure 6.1.
Notice that there are several tabs, allowing you to adjust many aspects of a graph’s appearance: font,
title, axis scaling, line colors and types, and so on. You can also add lines or descriptive labels to a
graph (under the Lines and Labels tabs). The “Apply” button applies your changes without closing
the editor; “OK” applies the changes and closes the dialog.
Figure 6.1: gretl’s plot controller
Publication-quality graphics: advanced options
The GUI plot editor has two limitations. First, it cannot represent all the myriad options that gnuplot
offers. Users who are sufficiently familiar with gnuplot to know what they’re missing in the plot editor
presumably don’t need much help from gretl, so long as they can get hold of the gnuplot command file
that gretl has put together. Second, even if the plot editor meets your needs, in terms of fine-tuning
the graph you see on screen, a few details may need further work in order to get optimal results for
publication.
Either way, the first step in advanced tweaking of a graph is to get access to the graph command file.
In the graph display window, right-click and choose “Save to session as icon”.
If it’s not already open, open the icon view window—either via the menu item View/Icon view,
or by clicking the “session icon view” button on the main-window toolbar.
Right-click on the icon representing the newly added graph and select “Edit plot commands”
from the pop-up menu.
You get a window displaying the plot file (Figure 6.2).
Here are the basic things you can do in this window. Obviously, you can edit the file you just opened.
You can also send it for processing by gnuplot, by clicking the “Execute” (cogwheel) icon in the
Figure 6.2: Plot commands editor
toolbar. Or you can use the “Save as” button to save a copy for editing and processing as you wish.
And please note that the Help button on the toolbar (a lifebelt in Figure 6.2) gives you access to the
gnuplot manual.
One relatively simple editorial job would be to set a chosen driver (or “terminal” in gnuplot parlance)
and output filename. For example, to get PDF output you could insert lines like the following at the
top:
# PDF, slightly amended (the default size is 5in x 3in)
set term pdfcairo font "Sans,6" size 5in,3.5in
set output ’mygraph1.pdf’
# or small size
set term pdfcairo font "Sans,5" size 3in,2in
set output ’mygraph2.pdf’
# or with size given in centimeters
set term pdfcairo font "Sans,6" size 6cm,4.2cm
set output ’mygraph3.pdf’
Or substitute epscairo for pdfcairo (and change the filenames) if you want EPS output. However,
such changes may be more easily made via the Save as PDF and Save as postscript options in the plot
menu.2
The real payoff to editing the plot code can be obtained if you dive into the details and employ
gnuplot features that are not accessible via gretl, and/or use one of the terminal types not directly
supported by gretl, such as context (ConTeXt), mp (MetaPost), lua (Lua) or pslatex (LaTeX picture environment with PostScript specials). The lua terminal with the tikz option is especially useful for LaTeX users, because it produces a tikzpicture environment, which offers almost unlimited customization possibilities (note that in order to use plots produced in this way you’ll also need the gnuplot-lua-tikz LaTeX package).
2A “traditional” postscript terminal may also be available in gnuplot, with an eps option. The defaults in this
case are quite different from epscairo, and to make use of the alternative you’ll have to consult the gnuplot manual.
To find out more about gnuplot visit gnuplot.sourceforge.net. This site has documentation for the
current version of the program in various formats along with a large collection of demonstration plots.
Additional tips
To be written. Line widths, enhanced text. Show a “before and after” example.
6.2 Plotting graphs from scripts
When working with scripts, you may want to have a graph shown on your display or saved to a file. In fact, if in your usual workflow you find yourself creating similar graphs over and over again, you might want to consider writing a script which automates this process for you. Gretl gives you two main tools for doing this: one is a command called gnuplot, whose main use is to create standard plots quickly. The other is the plot command block, which has a more elaborate syntax but offers you more control over the output.
The gnuplot command
The gnuplot command is described at length in the Gretl Command Reference and the online help
system. Here, we just summarize its main features: basically, it consists of the gnuplot keyword,
followed by a list of items, telling the command what you want plotted and a list of options, telling
it how you want it plotted.
For example, the line
gnuplot y1 y2 x
will give you a basic XY plot of the two series y1 and y2 on the vertical axis versus the series x on the horizontal axis. In general, the argument to the gnuplot command is a list of series, the last of which goes on the x-axis, while all the others go on the y-axis. By default, the gnuplot command gives you a scatterplot. If you have just one variable on the y-axis, gretl will also draw the OLS fitted line, if the fit is good enough.3
Several aspects of the behavior described above can be modified. You do this by appending options
to the command. Most options can be broadly grouped in three categories:
1. Plot styles: we support points (the default choice), lines, lines and points together, and impulses
(vertical lines).
2. Algorithm for the fitted line: here you can choose between linear, quadratic and cubic interpolation, but also more exotic choices, such as semi-log, inverse or loess (non-parametric). Of
course, you can also turn this feature off.
3. Input and output: you can choose whether you want your graph on your computer screen (and possibly use the in-built graphical widget to further customize it; see the section on the GUI plot editor above), or
rather save it to a file. We support several graphical formats, among which PNG and PDF, to
make it easy to incorporate your plots into text documents.
Listing 6.1 shows examples of some traditional plots in macroeconomics, using time series from the
“area-wide model” dataset produced by the European Central Bank, which is shipped with gretl in
the file AWM.gdt. PCR is aggregate private real consumption and YER is real GDP.
The first command line in the listing plots consumption against income as a kind of Keynesian
consumption function. More precisely, it produces a simple scatter plot with an automatically added linear
fitted line. If this is executed in the gretl console the plot will be directly shown in a new window,
but if this line is contained in a script then instead a file with the plot commands will be saved for
later execution. The second example line changes this behavior for a script command and forces the
plot to be shown directly.
The third line instead asks for a plot of the two variables as two separate curves against time on
the x-axis. Each observation point is drawn separately with a certain symbol determined by gnuplot
3The technical condition for this is that the two-tailed p-value for the slope coefficient should be under 10%.
Listing 6.1: Plotting macroeconomic data
open AWM.gdt --quiet
# --- consumption and income, different styles ------------
gnuplot PCR YER
gnuplot PCR YER --output=display
gnuplot PCR YER --output=display --time-series
gnuplot PCR YER --output=display --time-series --with-lines
# --- Phillips’ curve, different fitted lines -------------
gnuplot INFQ URX --output=display
gnuplot INFQ URX --fit=none --output=display
gnuplot INFQ URX --fit=inverse --output=display
gnuplot INFQ URX --fit=loess --output=display
defaults. If you add the option --with-lines the points will be connected with a continuous line and the symbols omitted.
The second batch of examples demonstrates how the fitted line in the scatter plot can be controlled from gretl’s side. The option --fit=none overrides the default of drawing a line if the fit is deemed “good enough”. The effect of --fit=inverse is to treat the variable on the y-axis as a function of 1/X instead of X and draw the corresponding hyperbolic branch. For the workings of
a Loess fit (locally-weighted polynomial regression) please refer to the documentation of the loess
function.
For more detail, consult the Gretl Command Reference.
The plot command block
The plot environment is a way to pass information to Gnuplot in a more structured way, so that
customization of basic plots becomes easier. It has the following characteristics:
The block starts with the plot keyword, followed by a required parameter: the name of a list, a single
series or a matrix. This parameter specifies the data to be plotted. The starting line may be prefixed
with the savename <- apparatus to save a plot as an icon in the GUI program. The block ends with
end plot.
Inside the block you have zero or more lines of these types, identified by an initial keyword:
option: specify a single option (details below)
options: specify multiple options on a single line; if more than one option is given on a line, the
options should be separated by spaces.
literal: a command to be passed to gnuplot literally
printf: a printf statement whose result will be passed to gnuplot literally; this allows the use of
string variables without having to resort to @-style string substitution.
The options available are basically those of the current gnuplot command, but with a few differences. For one thing you don’t need the leading double-dash in an “option” (or “options”) line. Besides that,
You can’t use the option --matrix=whatever with plot: that possibility is handled by providing
the name of a matrix on the initial plot line.
The --input=filename option is not supported: use gnuplot for the case where you’re supplying the entire plot specification yourself.
The several options pertaining to the presence and type of a fitted line are replaced in plot by
a single option fit which requires a parameter. Supported values for the parameter are: none,
linear, quadratic, cubic, inverse, semilog and loess. Example:
option fit=quadratic
As with gnuplot, the default is to show a linear fit in an X-Y scatter if it’s significant at the 10
percent level.
Here’s a simple example, the plot specification from the “bandplot” package, which shows how to
achieve the same result via the gnuplot command and a plot block, respectively—the latter occupies
a few more lines but is clearer:
gnuplot 1 2 3 4 --with-lines --matrix=plotmat \
--fit=none --output=display \
{ set linetype 3 lc rgb "#0000ff"; set title "@title"; \
set nokey; set xlabel "@xname"; }
plot plotmat
options with-lines fit=none
literal set linetype 3 lc rgb "#0000ff"
literal set nokey
printf "set title \"%s\"", title
printf "set xlabel \"%s\"", xname
end plot --output=display
Note that --output=display is appended to end plot; also note that if you give a matrix to plot
it’s assumed you want to plot all the columns. In addition, if you give a single series and the dataset
is time series, it’s assumed you want a time-series plot.
Example: Plotting a histogram together with a density
Listing 6.2 contains a slightly more elaborate example: here we load the Mroz example dataset and
calculate the log of the individual’s wage. Then we plot the histogram of a discretized version of that variable (obtained via the aggregate() function) against the theoretical density the data would follow if they were Gaussian.
There are a few points to note:
The data for the plot are passed through a matrix in which we set column names via the
cnameset function; those names are then automatically used by the plot environment.
In this example, we make extensive use of the literal construct for refining the plot by passing set instructions to gnuplot; the power of gnuplot is impossible to overstate. We encourage
you to visit the “demos” version of gnuplot’s website (http://gnuplot.sourceforge.net/) and
revel in amazement.
In the plot environment you can use all the quantities you have in your script. This is the way we calibrate the histogram width (try setting the scalar k in the script to different values). Note
that the printf command has a special meaning inside a plot environment.
The script displays the plot on your screen. If you want to save it to a file instead, replace
--output=display at the end with --output=filename.
It’s OK to insert comments in the plot environment; actually, it’s a rather good idea to comment
as much as possible (as always)!
The output from the script is shown in Figure 6.3.
Listing 6.2: Plotting the log wage from the Mroz example dataset [Download ]
set verbose off
open mroz87.gdt --quiet
series lWW = log(WW)
scalar m = mean(lWW)
scalar s = sd(lWW)
###
### prepare matrix with data for plot
###
# number of valid observations
scalar n = nobs(lWW)
# discretize log wage
scalar k = 4
series disc_lWW = round(lWW*k)/k
# get frequencies
matrix f = aggregate(null, disc_lWW)
# add density
phi = dnorm((f[,1] - m)/s) / (s*k)
# put columns together and add labels
plotmat = f[,2]./n ~ phi ~ f[,1]
strings cnames = defarray("frequency", "density", "log wage")
cnameset(plotmat, cnames)
###
### create plot
###
plot plotmat
# move legend
literal set key outside rmargin
# set line style
literal set linetype 2 dashtype 2 linewidth 2
# set histogram color
literal set linetype 1 lc rgb "#777777"
# set histogram style
literal set style fill solid 0.25 border
# set histogram width
printf "set boxwidth %4.2f\n", 0.5/k
options with-lines=2 with-boxes=1
end plot --output=display
Figure 6.3: Output from listing 6.2
Listing 6.3: Plotting t densities for varying degrees of freedom [Download ]
set verbose off
function string tplot(scalar m)
return sprintf("stud(x,%d) title \"t(%d)\"", m, m)
end function
matrix dfs = {2, 4, 16}
plot
literal set xrange [-4.5:4.5]
literal set yrange [0:0.45]
literal Binv(p,q) = exp(lgamma(p+q)-lgamma(p)-lgamma(q))
literal stud(x,m) = Binv(0.5*m,0.5)/sqrt(m)*(1.0+(x*x)/m)**(-0.5*(m+1.0))
printf "plot %s, %s, %s", tplot(dfs[1]), tplot(dfs[2]), tplot(dfs[3])
end plot --output=display
Example: Plotting Student’s t densities
The power of the printf statement in a plot block becomes apparent when used jointly with user-
defined functions, as exemplified in Listing 6.3, in which we create a plot showing the density functions
of Student’s t distribution for three different settings of the “degrees of freedom” parameter (note that plotting a t density is very easy to do from the GUI: just go to the Tools > Distribution graphs menu).
First we define a user function called tplot, which returns a string with the ingredients to pass to
the gnuplot plot statement, as a function of a scalar parameter (the degrees of freedom in our case).
Next, this function is used within the plot block to plot the appropriate density. Note that most
of the statements to mathematically define the function to plot are outsourced to gnuplot via the
literal command.
The output from the script is shown in Figure 6.4.
Figure 6.4: Output from listing 6.3
6.3 Boxplots
These plots (after Tukey and Chambers) display the distribution of a variable. The shape of a boxplot depends on a few quantities, defined as follows:
xmin           sample minimum
Q1             first quartile
m              median
x̄              mean
Q3             third quartile
xmax           sample maximum
R = Q3 − Q1    interquartile range
The central box encloses the middle 50 percent of the data, i.e. goes from Q1 to Q3; therefore, its height equals R. A line is drawn across the box at the median m and a + sign identifies the mean x̄.
The length of the “whiskers” depends on the presence of outliers. The top whisker extends from the
top of the box up to a maximum of 1.5 times the interquartile range, but can be shorter if the sample
maximum is lower than that value; that is, it reaches min[xmax, Q3 + 1.5R]. Observations larger than Q3 + 1.5R, if any, are considered outliers and represented individually via dots.4 The bottom whisker
obeys the same logic, with obvious adjustments. Figure 6.5 provides an example of all this by using
the variable FAMINC from the sample dataset mroz87.
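A plot along these lines can be produced from a script with just a couple of commands, for example:
open mroz87.gdt
boxplot FAMINC --output=display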
Figure 6.5: Sample boxplot
In the case of boxplots with confidence intervals, dotted lines show the limits of an approximate 90
percent confidence interval for the median. This is obtained by the bootstrap method, which can take
a while if the data series is very long. For details on constructing boxplots, see the entry for boxplot
in the Gretl Command Reference or use the Help button that appears when you select one of the
boxplot items under the menu item “View, Graph specified vars” in the main gretl window.
4To give you an intuitive idea, if a variable is normally distributed, the chances of picking an outlier by this definition
are slightly below 0.7%.
Factorized boxplots
A nice feature which is quite useful for data visualization is the conditional, or factorized boxplot.
This type of plot allows you to examine the distribution of a variable conditional on the value of some
discrete factor.
As an example, we’ll use one of the datasets supplied with gretl, that is rac3d, which contains an
example taken from Cameron and Trivedi (2013) on the health conditions of 5190 people. The script
below compares the unconditional (marginal) distribution of the number of illnesses in the past 2
weeks with the distribution of the same variable, conditional on age classes.
open rac3d.gdt
# unconditional boxplot
boxplot ILLNESS --output=display
# create a discrete variable for age class:
# 0 = below 20, 1 = between 20 and 39, etc
series age_class = floor(AGE/0.2)
# conditional boxplot
boxplot ILLNESS age_class --factorized --output=display
After running the code above, you should see two graphs similar to Figure 6.6. By comparing the
marginal plot to the factorized one, the effect of age on the mean number of illnesses is quite evident:
by joining the green crosses you get what is technically known as the conditional mean function, or
regression function if you prefer.
Figure 6.6: Conditional and unconditional distribution of illnesses
Chapter 7
Joining data sources
7.1 Introduction
Gretl provides two commands for adding data from file to an existing dataset in the program’s
workspace, namely append and join. The append command, which has been available for a long
time, is relatively simple and is described in the Gretl Command Reference. Here we focus on the
join command, which is much more flexible and sophisticated. This chapter gives an overview of the
functionality of join along with a detailed account of its syntax and options. We provide several toy
examples and discuss one real-world case at length.
First, a note on terminology: in the following we use the terms “left-hand” and “inner” to refer to the
dataset that is already in memory, and the terms “right-hand” and “outer” to refer to the dataset in
the file from which additional data are to be drawn.
Two main features of join are worth emphasizing at the outset:
“Key”variables can be used to match specific observations (rows) in the inner and outer datasets,
and this match need not be 1 to 1.
A row filter may be applied to screen out unwanted observations in the outer dataset.
As will be explained below, these features support rather complex concatenation and manipulation
of data from different sources.
A further aspect of join should be noted—one that makes this command particularly useful when
dealing with very large data files. That is, when gretl executes a join operation it does not, in
general, read into memory the entire content of the right-hand side dataset. Only those columns that
are actually needed for the operation are read in full. This makes join faster and less demanding
of computer memory than the methods available in most other software. On the other hand, gretl’s
asymmetrical treatment of the “inner” and “outer” datasets in join may require some getting used
to, for users of other packages.
7.2 Basic syntax
The minimal invocation of join is
join filename varname
where filename is the name of a data file and varname is the name of a series to be imported. Only two
sorts of data file are supported at present: delimited text files (where the delimiter may be comma,
space, tab or semicolon) and “native” gretl data files (gdt or gdtb). A series named varname may
already be present in the left-hand dataset, but that is not required. The series to be imported may
be numerical or string-valued. For most of the discussion below we assume that just a single series is
imported by each join command, but see section 7.7 for an account of multiple imports.
The effect of the minimal version of join is this: gretl looks for a data column labeled varname in
the specified file; if such a column is found and the number of observations on the right matches the
number of observations in the current sample range on the left, then the values from the right are
copied into the relevant range of observations on the left. If varname does not already exist on the
left, any observations outside of the current sample are set to NA; if it exists already then observations
outside of the current sample are left unchanged.
The case where you want to rename a series on import is handled by the --data option. This option
has one required argument, the name by which the series is known on the right. At this point we
need to explain something about right-hand variable names (column headings).
Right-hand names
We accept on input arbitrary column heading strings, but if these strings do not qualify as valid gretl
identifiers they are automatically converted, and in the context of join you must use the converted
names. A gretl identifier must start with a letter, contain nothing but (ASCII) letters, digits and the
underscore character, and must not exceed 31 characters. The rules used in name conversion are:
1. Skip any leading non-letters.
2. Until the 31-character limit is reached or the input is exhausted: transcribe “legal” characters; skip
“illegal” characters apart from spaces; and replace one or more consecutive spaces with an
underscore, unless the last character transcribed is an underscore in which case space is skipped.
In the unlikely event that this policy yields an empty string, we replace the original with coln, where n is the 1-based index of the column in question among those used in the join operation.
If you are in doubt regarding the converted name of a given column, the function fixname() can
be used as a check: it takes the original string as an argument and returns the converted name.
Examples:
? eval fixname("valid_identifier")
valid_identifier
? eval fixname("12. Some name")
Some_name
Returning to the use of the --data option, suppose we have a column headed "12. Some name" on
the right and wish to import it as x. After figuring out how the right-hand name converts, we can do
join foo.csv x --data="Some_name"
No right-hand names?
Some data files have no column headings; they jump straight into the data (and you need to determine
from accompanying documentation what the columns represent). Since gretl expects column headings,
you have to take steps to get the importation right. It is generally a good idea to insert a suitable
header row into the data file. However, if for some reason that’s not practical, you should give the
--no-header option, in which case gretl will name the columns on the right as col1, col2 and so
on. If you do not do either of these things you will likely lose the first row of data, since gretl will
attempt to make variable names out of it, as described above.
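For instance, suppose a headerless file raw.csv (a hypothetical name) holds a household identifier in its first column and the series of interest in its second, and that the inner dataset contains the key hid; a sketch of the import would then be
join raw.csv x --no-header --ikey=hid --okey=col1 --data=col2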
7.3 Filtering
Rows from the outer dataset can be filtered using the --filter option. The required parameter for
this option is a Boolean condition, that is, an expression which evaluates to non-zero (true, include
the row) or zero (false, skip the row) for each of the outer rows. The filter expression may include
any of the following terms: up to three “right-hand” series (under their converted names as explained
above); scalar or string variables defined “on the left”; any of the operators and functions available in
gretl (including user-defined functions); and numeric or string constants.
Here are a few simple examples of potentially valid filter options (assuming that the specified right-
hand side columns are found):
# 1. relationship between two right-hand variables
--filter="x15<=x17"
# 2. comparison of right-hand variable with constant
--filter="nkids>2"
# 3. comparison of string-valued right-hand variable with string constant
--filter="SEX==\"F\""
# 4. filter on valid values of a right-hand variable
--filter=!missing(income)
# 5. compound condition
--filter="x < 100 && (x > 0 || y > 0)"
Note that if you are comparing against a string constant (as in example 3 above) it is necessary to put
the string in “escaped” double-quotes (each double-quote preceded by a backslash) so the interpreter
knows that F is not supposed to be the name of a variable.
It is safest to enclose the whole filter expression in double quotes; however, this is not strictly required
unless the expression contains spaces or the equals sign.
In general, an error is flagged if a missing value is encountered in a series referenced in a filter
expression. This is because the condition then becomes indeterminate; taking example 2 above, if
the nkids value is NA on any given row we are not in a position to evaluate the condition nkids>2.
However, you can use the missing() function—or ok(), which is a shorthand for !missing()—if
you need a filter that keys off the missing or non-missing status of a variable.
7.4 Matching with keys
Things get interesting when we come to key-matching. The purpose of this facility is perhaps best
introduced by example. Suppose that (as with many survey and census-based datasets) we have a
dataset that is composed of two or more related files, each having a different unit of observation; for
example we have a “persons” data file and a “households”data file. Table 7.1 shows a simple, artificial
case. The file people.csv contains a unique identifier for the individuals, pid. The households file,
hholds.csv, contains the unique household identifier hid, which is also present in the persons file.
As a first example of join with keys, let’s add the household-level variable xh to the persons dataset:
open people.csv --quiet
join hholds.csv xh --ikey=hid
print --byobs
The basic key option is named ikey; this indicates “inner key”, that is, the key variable found in the
left-hand or inner dataset. By default it is assumed that the right-hand dataset contains a column of
the same name, though as we’ll see below that assumption can be overridden. The join command
above says, find a series named xh in the right-hand dataset and add it to the left-hand one, using
the values of hid to match rows. Looking at the data in Table 7.1 we can see how this should work.
Persons 1 and 2 are both members of household 1, so they should both get values of 1 for xh; persons
3 and 4 are members of household 2, so that xh = 4; and so on. Note that the order in which the
key values occur on the right-hand side does not matter. The gretl output from the print command
is shown in the lower panel of Table 7.1.
Note that key variables are treated conceptually as integers. If a specified key contains fractional
values these are truncated.
Two extensions of the basic key mechanism are available.
If the outer dataset contains a relevant key variable but it goes under a different name from
the inner key, you can use the --okey option to specify the outer key. (As with other right-
hand names, this does not have to be a valid gretl identifier.) So, for example, if hholds.csv
contained the hid information, but under the name HHOLD, the join command above could be
modified as
join hholds.csv xh --ikey=hid --okey=HHOLD
If a single key is not sufficient to generate the matches you want, you can specify a double key
in the form of two series names separated by a comma; in this case the importation of data is
restricted to those rows on which both keys match. The syntax here is, for example
join foo.csv x --ikey=key1,key2
Again, the --okey option may be used if the corresponding right-hand columns are named
differently. The same number of keys must be given on the left and the right, but when a
double key is used and only one of the key names differs on the right, the name that is in
common may be omitted (although the comma separator must be retained). For example, the
second of the following lines is acceptable shorthand for the first:
join foo.csv x --ikey=key1,Lkey2 --okey=key1,Rkey2
join foo.csv x --ikey=key1,Lkey2 --okey=,Rkey2

people.csv:
pid,hid,gender,age,xp
1,1,M,50,1
2,1,F,40,2
3,2,M,30,3
4,2,F,25,2
5,3,M,40,3
6,4,F,35,4
7,4,M,70,3
8,4,F,60,3
9,5,F,20,4
10,6,M,40,4

hholds.csv:
hid,country,xh
1,US,1
6,IT,12
3,UK,6
4,IT,8
2,US,4
5,IT,10

Result of the join (print --byobs):
pid  hid  xh
 1    1    1
 2    1    1
 3    2    4
 4    2    4
 5    3    6
 6    4    8
 7    4    8
 8    4    8
 9    5   10
10    6   12

Table 7.1: Two linked CSV data files, and the effect of a join
The number of key-matches
The example shown in Table 7.1 is an instance of a 1 to 1 match: applying the matching criterion
produces exactly one value of the variable xh corresponding to each row of the inner dataset. Three
other possibilities arise:
Some rows on the left have multiple matches on the right (“1 to n matching”).
Some rows on the right have multiple matches on the left (“n to 1 matching”).
Some rows in the inner dataset have no match on the right.
The first case is addressed in detail in the next section; here we discuss the others.
The n to 1 case is straightforward. If a particular key value (or combination of key values) occurs at
each of n > 1 observations on the left but at a single observation on the right, then the right-hand
value is entered at each of the matching slots on the left.
The handling of the case where there’s no match on the right depends on whether the join operation
is adding a new series to the inner dataset or modifying an existing one. If it’s a new series, then
unmatched rows automatically get NA for the imported data. However, if join is pulling in values for a series already present on the left, only matched rows will be updated. In other words we do not overwrite an existing value on the left with NA when there’s no match on the right.
These defaults may not produce the desired results in every case but gretl provides the means to
modify the effect if need be. We will illustrate with two scenarios.
First consider adding a new series recording “number of hours worked” when the inner dataset contains
individuals and the outer file contains data on jobs. If an individual does not appear in the jobs file, we
may want to take her hours worked as implicitly zero rather than NA. In this case gretl’s misszero()
function can be used to turn NA into 0 in the imported series.
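A sketch of this scenario, using hypothetical names (a file jobs.csv with key pid and hours column HRS), might run
join jobs.csv HRS --ikey=pid --aggr=sum
series HRS = misszero(HRS)
where --aggr=sum allows for individuals who hold more than one job (see section 7.5 on aggregation), and the second line recodes the NAs of unmatched individuals as zero hours.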
Second, consider updating a series via join when the outer file is presumed to contain all available
updated values, such that “no match” should be taken as an implicit NA. In that case we want the
(presumably out-of-date) values on any unmatched rows to be overwritten with NA. Let the series in
question be called x (both on the left and the right) and let the common key be called pid. The
solution is then
join update.csv tmpvar --data=x --ikey=pid
x = tmpvar
As a new variable, tmpvar will get NA for all unmatched rows; we then transcribe its values into x. In
a more complicated case one might use the smpl command to limit the sample range before assigning
tmpvar to x, or use the conditional assignment operator ?:.
One further point: given some missing values in an imported series you may want to know whether
(a) the NAs were explicitly represented in the outer data file or (b) they arose due to “no match”. You
can find this out by using a method described in the following section, namely the count variant of
the aggregation option: this will give you a series with 0 values for all and only unmatched rows.
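Continuing the hypothetical jobs example above (before any misszero() recoding), such a check might be sketched as
join jobs.csv nmatch --ikey=pid --aggr=count
series explicit_NA = missing(HRS) && nmatch > 0
Since nmatch is zero on unmatched rows only, an NA in HRS combined with a positive nmatch must have been recorded as missing in the outer file itself.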
7.5 Aggregation
In the case of 1 to n matching of rows (n > 1) the user must specify an “aggregation method”; that is, a method for mapping from n rows down to one. This is handled by the --aggr option which
requires a single argument from the following list:
Code Value returned
count count of matches
avg mean of matching values
sum sum of matching values
min minimum of matching values
max maximum of matching values
seq:i       the ith matching value (e.g. seq:2)
min(aux)    minimum of matching values of auxiliary variable
max(aux)    maximum of matching values of auxiliary variable
Note that the count aggregation method is special, in that there is no need for a “data series” on the
right; the imported series is simply a function of the specified key(s). All the other methods require
that “actual data” are found on the right. Also note that when count is used, the value returned
when no match is found is (as one might expect) zero rather than NA.
The basic use of the seq method is shown above: following the colon you give a positive integer
representing the (1-based) position of the observation in the sequence of matched rows. Alternatively,
a negative integer can be used to count down from the last match (seq:-1 selects the last match,
seq:-2 the second-last match, and so on). If the specified sequence number is out of bounds for a
given observation this method returns NA.
Referring again to the data in Table 7.1, suppose we want to import data from the persons file into a
dataset established at household level. Here’s an example where we use the individual age data from
people.csv to add the average and minimum age of household members.
open hholds.csv --quiet
join people.csv avgage --ikey=hid --data=age --aggr=avg
join people.csv minage --ikey=hid --data=age --aggr=min
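The seq method can be used in the same setting; for example, to pick out the age of the last-listed member of each household one might do
join people.csv lastage --ikey=hid --data=age --aggr="seq:-1"
(the name lastage being an arbitrary choice for the new series).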
Here’s a further example where we add to the household data the sum of the personal data xp, with
the twist that we apply filters to get the sum specifically for household members under the age of 40,
and for women.
open hholds.csv --quiet
join people.csv young_xp --ikey=hid --filter="age<40" --data=xp --aggr=sum
join people.csv female_xp --ikey=hid --filter="gender==\"F\"" --data=xp --aggr=sum
The possibility of using an auxiliary variable with the min and max modes of aggregation gives extra
flexibility. For example, suppose we want for each household the income of its oldest member:
open hholds.csv --quiet
join people.csv oldest_xp --ikey=hid --data=xp --aggr=max(age)
7.6 String-valued key variables
The examples above use numerical variables (household and individual ID numbers) in the matching
process. It is also possible to use string-valued variables, in which case a match means that the string
values of the key variables compare equal (with case sensitivity). When using double keys, you can
mix numerical and string keys, but naturally you cannot mix a string variable on the left (via ikey)
with a numerical one on the right (via okey), or vice versa.
Here’s a simple example. Suppose that alongside hholds.csv we have a file countries.csv with the
following content:
country,GDP
UK,100
US,500
IT,150
FR,180
The variable country, which is also found in hholds.csv, is string-valued. We can pull the GDP of
the country in which the household resides into our households dataset with
open hholds.csv -q
join countries.csv GDP --ikey=country
which gives
hid country GDP
1 1 1 500
2 6 2 150
3 3 3 100
4 4 2 150
5 2 1 500
6 5 2 150
7.7 Importing multiple series
The examples given so far have been limited in one respect. While several columns in the outer data
file may be referenced (as keys, or in filtering or aggregation) only one column has actually provided
data—and correspondingly only one series in the inner dataset has been created or modified—per
invocation of join. However, join can handle the importation of several series at once. This section
gives an account of the required syntax along with certain restrictions that apply to the multiple-
import case.
There are two ways to specify more than one series for importation:
1. The varname field in the command can take the form of a space-separated list of names rather
than a single name.
2. Alternatively, you can give the name of an array of strings in place of varname: the elements
of this array should be the names of the series to import.
Here are the limitations:
1. The --data option, which permits the renaming of a series on import, is not available. When
importing multiple series you are obliged to accept their “outer” names, fixed up as described
in section 7.2.
2. While the other join options are available, they necessarily apply uniformly to all the series
imported via a given command. This means that if you want to import several series but using
different keys, filters or aggregation methods you must use a sequence of commands.
Here are a couple of examples of multiple imports.
# open base datafile containing keys
open PUMSdata.gdt
# join using a list of import names
join ss13pnc.csv SCHL WAGP WKHP --ikey=SERIALNO,SPORDER
# using a strings array: may be worthwhile if the array
# will be used for more than one purpose
strings S = defarray("SCHL", "WAGP", "WKHP")
join ss13pnc.csv S --ikey=SERIALNO,SPORDER
7.8 A real-world case
For a real use-case for join with cross-sectional data, we turn to the Bank of Italy’s Survey on
Household Income and Wealth (SHIW).1 In ASCII form the 2010 survey results comprise 47 MB of
1Details of the survey can be found at http://www.bancaditalia.it/statistiche/indcamp/bilfait/dismicro.
The ASCII (CSV) data files for the 2010 survey are available at http://www.bancaditalia.it/statistiche/indcamp/
bilfait/dismicro/annuale/ascii/ind10_ascii.zip.
data in 29 files. In this exercise we will draw on five of the SHIW files to construct a replica of
the dataset used in Thomas Mroz’s famous paper (Mroz, 1987) on women’s labor force participation,
which contains data on married women between the age of 30 and 60 along with certain characteristics
of their households and husbands.
Our general strategy is as follows: we create a “core” dataset by opening the file carcom10.csv, which
contains basic data on the individuals. After dropping unwanted individuals (all but married women),
we use the resulting dataset as a base for pulling in further data via the join command.
The complete script to do the job is given in the Appendix to this chapter; here we walk through
the script with comments interspersed. We assume that all the relevant files from the Bank of Italy
survey are contained in a subdirectory called SHIW.
Starting with carcom10.csv, we use the --cols option to the open command to import specific
series, namely NQUEST (household ID number), NORD (sequence number for individuals within each
household), SEX (male = 1, female = 2), PARENT (status in household: 1 = head of household, 2 =
spouse of head, etc.), STACIV (marital status: married = 1), STUDIO (educational level, coded from 1
to 8), ETA (age in years) and ACOM4C (size of town).
open SHIW/carcom10.csv --cols=1,2,3,4,9,10,29,41
We then restrict the sample to married women from 30 to 60 years of age, and additionally restrict
the sample of women to those who are either heads of households or spouses of the head.
smpl SEX==2 && ETA>=30 && ETA<=60 && STACIV==1 --restrict
smpl PARENT<3 --restrict
For compatibility with the Mroz dataset as presented in the gretl data file mroz87.gdt, we rename
the age and education variables as WA and WE respectively, we compute the CIT dummy and finally
we store the reduced base dataset in gretl format.
rename ETA WA
rename STUDIO WE
series CIT = (ACOM4C > 2)
store mroz_rep.gdt
The next step will be to get data on working hours from the jobs file allb1.csv. There’s a complication here. We need the total hours worked over the course of the year (for both the women and their
husbands). This is not available as such, but the variables ORETOT and MESILAV give, respectively,
average hours worked per week and the number of months worked in 2010, each on a per-job basis.
If each person held at most one job over the year we could compute his or her annual hours as
HRS = ORETOT * 52 * MESILAV/12
However, some people had more than one job, and in this case what we want is the sum of annual
hours across their jobs. We could use join with the seq aggregation method to construct this sum,
but it is probably more straightforward to read the allb1 data, compute the HRS values per job as
shown above, and save the results to a temporary CSV file.
open SHIW/allb1.csv --cols=1,2,8,11 --quiet
series HRS = misszero(ORETOT) * 52 * misszero(MESILAV)/12
store HRS.csv NQUEST NORD HRS
Now we can reopen the base dataset and join the hours variable from HRS.csv. Note that we need
a double key here: the women are uniquely identified by the combination of NQUEST and NORD. We
don’t need an okey specification since these keys go under the same names in the right-hand file. We
define labor force participation, LFP, based on hours.
open mroz_rep.gdt
join HRS.csv WHRS --ikey=NQUEST,NORD --data=HRS --aggr=sum
WHRS = misszero(WHRS)
LFP = WHRS > 0
For reference, here’s how we could have used seq to avoid writing a temporary file:
join SHIW/allb1.csv njobs --ikey=NQUEST,NORD --data=ORETOT --aggr=count
series WHRS = 0
loop i=1..max(njobs)
join SHIW/allb1.csv htmp --ikey=NQUEST,NORD --data=ORETOT --aggr="seq:$i"
join SHIW/allb1.csv mtmp --ikey=NQUEST,NORD --data=MESILAV --aggr="seq:$i"
WHRS += misszero(htmp) * 52 * misszero(mtmp)/12
endloop
To generate the work experience variable, AX, we use the file lavoro.csv: this contains a variable
named ETALAV which records the age at which the person first started work.
join SHIW/lavoro.csv ETALAV --ikey=NQUEST,NORD
series AX = misszero(WA - ETALAV)
We compute the woman’s hourly wage, WW, as the ratio of total employment income to annual working
hours. This requires drawing the series YL (payroll income) and YM (net self-employment income) from
the persons file rper10.csv.
join SHIW/rper10.csv YL YM --ikey=NQUEST,NORD --aggr=sum
series WW = LFP ? (YL + YM)/WHRS : 0
The family’s net disposable income is available as Y in the file rfam10.csv; we import this as FAMINC.
join SHIW/rfam10.csv FAMINC --ikey=NQUEST --data=Y
Data on number of children are now obtained by applying the count method. For the Mroz replication
we want the number of children under the age of 6, and also the number aged 6 to 18.
join SHIW/carcom10.csv KIDS --ikey=NQUEST --aggr=count --filter="ETA<=18"
join SHIW/carcom10.csv KL6 --ikey=NQUEST --aggr=count --filter=ETA<6
series K618 = KIDS - KL6
We want to add data on the women’s husbands, but how do we find them? To do this we create an
additional inner key which we’ll call H_ID (husband ID), by sub-sampling in turn on the observations
falling into each of two classes: (a) those where the woman is recorded as head of household and
(b) those where the husband has that status. In each case we want the individual ID (NORD) of the
household member whose status is complementary to that of the woman in question. So for case (a)
we subsample using PARENT==1 (head of household) and filter the join using PARENT==2 (spouse of
head); in case (b) we do the converse. We thus construct H_ID piece-wise.
# for women who are household heads
smpl PARENT==1 --restrict --replace
join SHIW/carcom10.csv H_ID --ikey=NQUEST --data=NORD --filter="PARENT==2"
# for women who are not household heads
smpl PARENT==2 --restrict --replace
join SHIW/carcom10.csv H_ID --ikey=NQUEST --data=NORD --filter="PARENT==1"
smpl full
Now we can use our new inner key to retrieve the husbands’ data, matching H_ID on the left with
NORD on the right within each household.
join SHIW/carcom10.csv HA --ikey=NQUEST,H_ID --okey=NQUEST,NORD --data=ETA
join SHIW/carcom10.csv HE --ikey=NQUEST,H_ID --okey=NQUEST,NORD --data=STUDIO
join HRS.csv HHRS --ikey=NQUEST,H_ID --okey=NQUEST,NORD --data=HRS --aggr=sum
HHRS = misszero(HHRS)
The remainder of the script is straightforward and does not require discussion here: we recode the
education variables for compatibility; delete some intermediate series that are not needed any more;
add informative labels; and save the final product. See the Appendix for details.
To compare the results from this dataset with those from the earlier US data used by Mroz, one can
copy the input file heckit.inp (supplied with the gretl package) and substitute mroz_rep.gdt for
mroz87.gdt. It turns out that the results are qualitatively very similar.
7.9 The representation of dates
Up to this point all the data we have considered have been cross-sectional. In the following sections
we discuss data that have a time dimension, and before proceeding it may be useful to say something
about the representation of dates. Gretl takes the ISO 8601 standard as its reference point but provides means of converting dates given in other formats; it also offers a set of calendrical functions for manipulating dates (isodate, isoconv, epochday and others).
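For a flavor of these functions, here is a small sketch; the calls shown reflect our understanding of epochday and isodate, but consult the Gretl Function Reference for the exact signatures and return values.
scalar ed = epochday(2013, 10, 21)  # epoch-day number for 21 October 2013
eval isodate(ed)                    # the same date in basic format, 20131021
eval isodate(ed + 7)                # ordinary arithmetic: one week later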
ISO 8601 recognizes two formats for daily dates, “extended” and “basic”. In both formats dates are
given as 4-digit year, 2-digit month and 2-digit day, in that order. In extended format a dash is
inserted between the fields—as in 2013-10-21 or more generally YYYY-MM-DD—while in basic format
the fields are run together (YYYYMMDD). Extended format is more easily parsed by human readers
while basic format is more suitable for computer processing, since one can apply ordinary arithmetic
to compare dates as equal, earlier or later. The standard also recognizes YYYY-MM as representing year
and month, e.g. 2010-11 for November 2010,2 as well as a plain four-digit number for year alone.
One problem for economists is that the “quarter” is not a period covered by ISO 8601. This could
be represented by YYYY-Q (with only one digit following the dash) but in gretl output we in fact use
a colon, as in 2013:2 for the second quarter of 2013. (For printed output of months gretl also uses
a colon, as in 2013:06. A difficulty with following ISO here is that in a statistical context a string
such as 1980-10 may look more like a subtraction than a date.) Anyway, at present we are more
interested in the parsing of dates on input rather than in what gretl prints. And in that context note
that “excess precision” is acceptable: a month may be represented by its first day (e.g. 2005-05-01
for May, 2005), and a quarter may be represented by its first month and day (2005-07-01 for the
third quarter of 2005).
Some additional points regarding dates will be taken up as they become relevant in practical cases of
joining data.
7.10 Time-series data
Suppose our left-hand dataset is recognized by gretl as time series with a supported frequency (annual,
quarterly, monthly, weekly, daily or hourly). This will be the case if the original data were read from
a file that contained suitable time or date information, or if a time-series interpretation has been
imposed using either the setobs command or its GUI equivalent. Then—apart, perhaps, from some
very special cases—joining additional data is bound to involve matching observations by time-period.
In this case, contrary to the cross-sectional case, the inner dataset has a natural ordering of which
gretl is aware; hence, no “inner key” is required.
If, in addition, the file from which data are to be joined is in native gretl format and contains time-
series information, keys are not needed at all. Three cases can arise: the frequency of the outer dataset
may be the same, lower or higher than that of the inner dataset. In the first two cases join should
work without any special apparatus; lower-frequency values will be repeated for each high-frequency
period. In the third case, however, an aggregation method must be specified: gretl needs to know
how to map higher-frequency data into the existing dataset (by averaging, summing, or whatever).
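For example, if the current dataset is quarterly and monthly.gdt (a hypothetical native gretl file with monthly observations) contains a series x, something along these lines would import quarterly averages of x:
join monthly.gdt x --aggr=avg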
If the outer data file is not in native gretl format we need a means of identifying the period of each
observation on the right, an outer key which we’ll call a “time key”. The join command provides a
simple (but limited) default for extracting period information from the outer data file, plus an option
that can be used if the default is not applicable, as follows.
The default assumptions are: (1) the time key appears in the first column; (2) the heading
of this column is either left blank or is one of obs, date, year, period, observation, or observation_date (on a case-insensitive comparison); and (3) the time format conforms to ISO
8601 where applicable (“extended” daily date format YYYY-MM-DD, monthly format YYYY-MM, or
annual format YYYY).
If dates do not appear in the first column of the outer file, or if the column heading or format is
not as just described, the --tkey option can be used to indicate which column should be used
and/or what format should be assumed.
2The form YYYYMM is not recognized for year and month.
Setting the time-key column and/or format
The --tkey option requires a parameter holding the name of the column in which the time key
is located and/or a string specifying the format in which dates/times are written in the time-key
column. This parameter should be enclosed in double-quotes. If both elements are present they
should be separated by a comma; if only a format is given it should be preceded by a comma. Some
examples:
--tkey="Period,%m/%d/%Y"
--tkey="Period"
--tkey="obsperiod"
--tkey=",%Ym%m"
The first of these applies if Period is not the first column on the right, and dates are given in the
US format of month, day, year, separated by slashes. The second implies that although Period is
not the first column, the date format is ISO 8601. The third again implies that the date format is
OK; here the name is required even if obsperiod is the first column since this heading is not one
recognized by gretl’s heuristic. The last example implies that dates are in the first column (with one
of the recognized headings), but are given in the non-standard format: year, “m”, month.
The date format string should be composed using the codes employed by the POSIX function
strptime; Table 7.2 contains a list of the most relevant codes.3
Code Meaning
%% The % character.
%b The month name according to the current locale, either abbreviated or
in full.
%C The century number (0–99).
%d The day of month (1–31).
%D Equivalent to %m/%d/%y. (This is the American style date, very confusing to non-Americans, especially since %d/%m/%y is widely used in Europe. The ISO 8601 standard format is %Y-%m-%d.)
%H The hour (0–23).
%j The day number in the year (1–366).
%m The month number (1–12).
%n Arbitrary whitespace.
%q The quarter (1–4).
%w The weekday number (0–6) with Sunday = 0.
%y The year within century (0–99). When a century is not otherwise specified, values in the range 69–99 refer to years in the twentieth century (1969–1999); values in the range 00–68 refer to years in the twenty-first century (2000–2068).
%Y The year, including century (for example, 1991).
Table 7.2: Date format codes
Example: daily stock prices
We show below the first few lines of a file named IBM.csv containing stock-price data for IBM
corporation.
Date,Open,High,Low,Close,Volume,Adj Close
2013-08-02,195.50,195.50,193.22,195.16,3861000,195.16
2013-08-01,196.65,197.17,195.41,195.81,2856900,195.81
2013-07-31,194.49,196.91,194.49,195.04,3810000,195.04
3The %q code for quarter is not present in strptime; it is added for use with join since quarterly data are common
in macroeconomics.
Note that the data are in reverse time-series order—that won’t matter to join, the data can appear
in any order. Also note that the first column is headed Date and holds daily dates as ISO 8601
extended. That means we can pull the data into gretl very easily. In the following fragment we create
a suitably dimensioned empty daily dataset then rely on the default behavior of join with time-series
data to import the closing stock price.
nulldata 500
setobs 5 2012-01-01
join IBM.csv Close
To make explicit what we’re doing, we could accomplish exactly the same using the --tkey option:
join IBM.csv Close --tkey="Date,%Y-%m-%d"
Example: OECD quarterly data
Table 7.3 shows an excerpt from a CSV file provided by the OECD statistical site (stat.oecd.org)
in response to a request for GDP at constant prices for several countries.4
Frequency,Period,Country,Value,Flags
"Quarterly","Q1-1960","France",463876.148126845,E
"Quarterly","Q1-1960","Germany",768802.119278467,E
"Quarterly","Q1-1960","Italy",414629.791450547,E
"Quarterly","Q1-1960","United Kingdom",578437.090291889,E
"Quarterly","Q2-1960","France",465618.977328614,E
"Quarterly","Q2-1960","Germany",782484.138122549,E
"Quarterly","Q2-1960","Italy",420714.910290157,E
"Quarterly","Q2-1960","United Kingdom",572853.474696578,E
"Quarterly","Q3-1960","France",469104.41925852,E
"Quarterly","Q3-1960","Germany",809532.161494483,E
"Quarterly","Q3-1960","Italy",426893.675840156,E
"Quarterly","Q3-1960","United Kingdom",581252.066618986,E
"Quarterly","Q4-1960","France",474664.327992619,E
"Quarterly","Q4-1960","Germany",817806.132384948,E
"Quarterly","Q4-1960","Italy",427221.338414114,E
...
Table 7.3: Example of CSV file as provided by the OECD statistical website
This is an instance of data in what we call atomic format, that is, a format in which each line of the
outer file contains a single data-point and extracting data mainly requires filtering the appropriate
lines. The outer time key is under the Period heading, and has the format Q<quarter>-<year> .
Assuming that the file in Table 7.3 has the name oecd.csv, the following script reconstructs the time
series of Gross Domestic Product for several countries:
nulldata 220
setobs 4 1960:1
join oecd.csv FRA --tkey="Period,Q%q-%Y" --data=Value --filter="Country==\"France\""
join oecd.csv GER --tkey="Period,Q%q-%Y" --data=Value --filter="Country==\"Germany\""
join oecd.csv ITA --tkey="Period,Q%q-%Y" --data=Value --filter="Country==\"Italy\""
join oecd.csv UK --tkey="Period,Q%q-%Y" --data=Value --filter="Country==\"United Kingdom\""
Note the use of the format codes %q for the quarter and %Y for the 4-digit year. A touch of elegance
could have been added by storing the invariant options to join using the setopt command, as in
setopt join persist --tkey="Period,Q%q-%Y" --data=Value
join oecd.csv FRA --filter="Country==\"France\""
join oecd.csv GER --filter="Country==\"Germany\""
4Retrieved 2013-08-05. The OECD files in fact contain two leading columns with very long labels; these are
irrelevant to the present example and can be omitted without altering the sample script.
join oecd.csv ITA --filter="Country==\"Italy\""
join oecd.csv UK --filter="Country==\"United Kingdom\""
setopt join clear
If one were importing a large number of such series it might be worth rewriting the sequence of joins
as a loop, as in
strings countries = defarray("France", "Germany", "Italy", "United Kingdom")
strings vnames = defarray("FRA", "GER", "ITA", "UK")
setopt join persist --tkey="Period,Q%q-%Y" --data=Value
loop foreach i countries
vname = vnames[i]
join oecd.csv @vname --filter="Country==\"$i\""
endloop
setopt join clear
7.11 Special handling of time columns
When dealing with straight time series data the tkey mechanism described above should suffice in
almost all cases. In some contexts, however, time enters the picture in a more complex way; examples
include panel data (see section 7.12) and so-called realtime data (see chapter 8). To handle such
cases join provides the --tconvert option. This can be used to select certain columns in the right-
hand data file for special treatment: strings representing dates in these columns will be converted to
numerical values: 8-digit numbers on the pattern YYYYMMDD (ISO basic daily format). Once dates are
in this form it is easy to use them in key-matching or filtering.
By default it is assumed that the strings in the selected columns are in ISO extended format,
YYYY-MM-DD. If that is not the case you can supply a time-format string using the --tconv-fmt
option. The format string should be written using the codes shown in Table 7.2.
Here are some examples:
# select one column for treatment
--tconvert=start_date
# select two columns for treatment
--tconvert="start_date,end_date"
# specify US-style daily date format
--tconv-fmt="%m/%d/%Y"
# specify quarterly date-strings (as in 2004q1)
--tconv-fmt="%Yq%q"
Some points to note:
If a specified column is not selected for a substantive role in the join operation (as data to be
imported, as a key, or as an auxiliary variable for use in aggregation) the column in question is
not read and so no conversion is carried out.
If a specified column contains numerical rather than string values, no conversion is carried out.
If a string value in a selected column fails parsing using the relevant time format (user-specified
or default), the converted value is NA.
On successful conversion, the output is always in daily-date form as stated above. If you specify
a monthly or quarterly time format, the converted date is the first day of the month or quarter.
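As a simple illustration, here is a hypothetical fragment (the file events.csv, the inner key id and the column start_date are invented for the example): the selected column is converted to 8-digit numbers and can then be used in a numerical filter.
# convert start_date to YYYYMMDD numbers, then filter on it
join events.csv x --ikey=id --tconvert=start_date \
  --filter="start_date>=20100101"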
7.12 Panel data
In section 7.10 we gave an example of reading quarterly GDP data for several countries from an
OECD file. In that context we imported each country’s data as a distinct time-series variable. Now
suppose we want the GDP data in panel format instead (stacked time series). How can we do this
with join?
As a reminder, here’s what the OECD data look like:
Frequency,Period,Country,Value,Flags
"Quarterly","Q1-1960","France",463876.148126845,E
"Quarterly","Q1-1960","Germany",768802.119278467,E
"Quarterly","Q1-1960","Italy",414629.791450547,E
"Quarterly","Q1-1960","United Kingdom",578437.090291889,E
"Quarterly","Q2-1960","France",465618.977328614,E
and so on. If we have four countries and quarterly observations running from 1960:1 to 2013:2 (T =
214 quarters) we might set up our panel workspace like this:
scalar N = 4
scalar T = 214
scalar NT = N*T
nulldata NT --preserve
setobs T 1.1 --stacked-time-series
The relevant outer keys are obvious: Country for the country and Period for the time period. Our
task is now to construct matching keys in the inner dataset. This can be done via two panel-specific
options to the setobs command. Let’s work on the time dimension first:
setobs 4 1960:1 --panel-time
series quarter = $obsdate
This variant of setobs allows us to tell gretl that time in our panel is quarterly, starting in the
first quarter of 1960. Having set that, the accessor $obsdate will give us a series of 8-digit dates
representing the first day of each quarter—19600101, 19600401, 19600701, and so on, repeating for
each country. As we explained in section 7.11, we can use the --tconvert option on the outer series
Period to get exactly matching values (in this case using a format of Q%q-%Y for parsing the Period
values).
Now for the country names:
string cstrs = sprintf("France Germany Italy \"United Kingdom\"")
setobs country cstrs --panel-groups
Here we write into the string cstrs the names of the countries, using escaped double-quotes to handle
the space in “United Kingdom”, then pass this string to setobs with the --panel-groups option,
preceded by the identifier country. This asks gretl to construct a string-valued series named country,
in which each name will repeat T times.
We’re now ready to join. Assuming the OECD file is named oecd.csv we do
join oecd.csv GDP --data=Value \
--ikey=country,quarter --okey=Country,Period \
--tconvert=Period --tconv-fmt="Q%q-%Y"
Other input formats
The OECD file discussed above is in the most convenient format for join, with one data-point per
line. But sometimes we may want to make a panel from a data file structured like this:
# Real GDP
Period,France,Germany,Italy,"United Kingdom"
"Q1-1960",463863,768757,414630,578437
"Q2-1960",465605,782438,420715,572853
"Q3-1960",469091,809484,426894,581252
"Q4-1960",474651,817758,427221,584779
"Q1-1961",482285,826031,442528,594684
...
Call this file side_by_side.csv. Assuming the same initial set-up as above, we can panelize the data
by setting the sample to each country’s time series in turn and importing the relevant column. The
only point to watch here is that the string “United Kingdom”, being a column heading, will become
United_Kingdom on importing (see section 7.2) so we’ll need a slightly different set of country strings.
strings cstrs = defarray("France", "Germany", "Italy", "United_Kingdom")
setobs country cstrs --panel-groups
loop foreach i cstrs
smpl country=="$i" --restrict --replace
join side_by_side.csv GDP --data=$i \
--ikey=quarter --okey=Period \
--tconvert=Period --tconv-fmt="Q%q-%Y"
endloop
smpl full
If our working dataset and the outer data file are dimensioned such that there are just as many
time-series observations on the right as there are time slots on the left—and the observations on the
right are contiguous, in chronological order, and start on the same date as the working dataset—we
could dispense with the key apparatus and just use the first line of the join command shown above.
However, in general it is safer to use keys to ensure that the data end up in correct registration.
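For the record, the keyless variant would be simply
loop foreach i cstrs
  smpl country=="$i" --restrict --replace
  join side_by_side.csv GDP --data=$i
endloop
smpl full
but again, this is safe only under the alignment conditions just described.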
7.13 Memo: join options
Basic syntax: join filename varname(s) [options]

flag          effect
--data        Give the name of the data column on the right, in case it differs from varname (7.2); single import only
--filter      Specify a condition for filtering data rows (7.3)
--ikey        Specify up to two keys for matching data rows (7.4)
--okey        Specify outer key name(s) in case they differ from the inner ones (7.4)
--aggr        Select an aggregation method for 1 to n joins (7.5)
--tkey        Specify right-hand time key (7.10)
--tconvert    Select outer date columns for conversion to numeric form (7.11)
--tconv-fmt   Specify a format for use with tconvert (7.11)
--no-header   Treat the first row on the right as data (7.2)
--verbose     Report on progress in reading the outer data
Appendix: the full Mroz data script
# start with everybody; get gender, age and a few other variables
# directly while we’re at it
open SHIW/carcom10.csv --cols=1,2,3,4,9,10,29,41
# subsample on married women between the ages of 30 and 60
smpl SEX==2 && ETA>=30 && ETA<=60 && STACIV==1 --restrict
# for simplicity, restrict to heads of households and their spouses
smpl PARENT<3 --restrict
# rename the age and education variables for compatibility; compute
# the "city" dummy and finally save the reduced base dataset
rename ETA WA
rename STUDIO WE
series CIT = (ACOM4C>2)
store mroz_rep.gdt
# make a temp file holding annual hours worked per job
open SHIW/allb1.csv --cols=1,2,8,11 --quiet
series HRS = misszero(ORETOT) * 52 * misszero(MESILAV)/12
store HRS.csv NQUEST NORD HRS
# reopen the base dataset and begin drawing assorted data in
open mroz_rep.gdt
# women’s annual hours (summed across jobs)
join HRS.csv WHRS --ikey=NQUEST,NORD --data=HRS --aggr=sum
WHRS = misszero(WHRS)
# labor force participation
LFP = WHRS > 0
# work experience: ETALAV = age when started first job
join SHIW/lavoro.csv ETALAV --ikey=NQUEST,NORD
series AX = misszero(WA - ETALAV)
# women’s hourly wages
join SHIW/rper10.csv YL YM --ikey=NQUEST,NORD --aggr=sum
series WW = LFP ? (YL + YM)/WHRS : 0
# family income (Y = net disposable income)
join SHIW/rfam10.csv FAMINC --ikey=NQUEST --data=Y
# get data on children using the "count" method
join SHIW/carcom10.csv KIDS --ikey=NQUEST --aggr=count --filter="ETA<=18"
join SHIW/carcom10.csv KL6 --ikey=NQUEST --aggr=count --filter=ETA<6
series K618 = KIDS - KL6
# data on husbands: we first construct an auxiliary inner key for
# husbands, using the little trick of subsampling the inner dataset
#
# for women who are household heads
smpl PARENT==1 --restrict --replace
join SHIW/carcom10.csv H_ID --ikey=NQUEST --data=NORD --filter="PARENT==2"
# for women who are not household heads
smpl PARENT==2 --restrict --replace
join SHIW/carcom10.csv H_ID --ikey=NQUEST --data=NORD --filter="PARENT==1"
smpl full
# add husbands’ data via the newly-added secondary inner key
join SHIW/carcom10.csv HA --ikey=NQUEST,H_ID --okey=NQUEST,NORD --data=ETA
join SHIW/carcom10.csv HE --ikey=NQUEST,H_ID --okey=NQUEST,NORD --data=STUDIO
join HRS.csv HHRS --ikey=NQUEST,H_ID --okey=NQUEST,NORD --data=HRS --aggr=sum
HHRS = misszero(HHRS)
# final cleanup begins
# recode educational attainment as years of education
matrix eduyrs = {0, 5, 8, 11, 13, 16, 18, 21}
series WE = replace(WE, seq(1,8), eduyrs)
series HE = replace(HE, seq(1,8), eduyrs)
# cut some cruft
delete SEX STACIV KIDS YL YM PARENT H_ID ETALAV
# add some labels for the series
setinfo LFP -d "1 if woman worked in 2010"
setinfo WHRS -d "Wife’s hours of work in 2010"
setinfo KL6 -d "Number of children less than 6 years old in household"
setinfo K618 -d "Number of children between ages 6 and 18 in household"
setinfo WA -d "Wife’s age"
setinfo WE -d "Wife’s educational attainment, in years"
setinfo WW -d "Wife’s average hourly earnings, in 2010 euros"
setinfo HHRS -d "Husband’s hours worked in 2010"
setinfo HA -d "Husband’s age"
setinfo HE -d "Husband’s educational attainment, in years"
setinfo FAMINC -d "Family income, in 2010 euros"
setinfo AX -d "Actual years of wife’s previous labor market experience"
setinfo CIT -d "1 if live in large city"
# save the final product
store mroz_rep.gdt
Chapter 8
Realtime data
8.1 Introduction
The join command in gretl (see chapter 7) deals in a fairly straightforward manner with so-called
realtime datasets. Such datasets contain information on when the observations in a time series were
actually published by the relevant statistical agency and how they have been revised over time.
Probably the most popular sources of such data are the “Alfred” online database at the St. Louis Fed
(http://alfred.stlouisfed.org/) and the OECD’s StatExtracts site, http://stats.oecd.org/.
The examples in this chapter deal with files downloaded from these sources, but should be easy to
adapt to files with a slightly different format.
As already stated, join requires a column-oriented plain text file, where the columns may be separated
by commas, tabs, spaces or semicolons. Alfred and the OECD provide the option to download realtime
data in this format (tab-delimited files from Alfred, comma-delimited from the OECD). If you have
a realtime dataset in a spreadsheet file you must export it to a delimited text file before using it with
join.
Representing revision histories is more complex than just storing a standard time series, because for
each observation period you have in general more than one published value over time, along with
the information on when each of these values was valid or current. Sometimes this is represented in
spreadsheets with two time axes, one for the observation period and another one for the publication
date or “vintage”. The filled cells then form an upper triangle (or a “guillotine blade” shape, if the
publication dates do not reach back far enough to complete the triangle). This format can be useful
for giving a human reader an overview of realtime data, but it is not optimal for automatic processing;
for that purpose “atomic” format is best.
8.2 Atomic format for realtime data
What we are calling atomic format is exactly the format used by Alfred if you choose the option
“Observations by Real-Time Period”, and by the OECD if you select all editions of a series for
download as plain text (CSV).1 A file in this format contains one actual data-point per line, together
with associated metadata. This is illustrated in Table 8.1, where we show the first three lines from
an Alfred file and an OECD file (slightly modified).2
Alfred: monthly US industrial production
observation_date,INDPRO,realtime_start_date,realtime_end_date
1960-01-01,112.0000,1960-02-16,1960-03-15
1960-01-01,111.0000,1960-03-16,1961-10-15
OECD: monthly UK industrial production
Country,Variable,Frequency,Time,Edition,Value,Flags
"United Kingdom","INDPRO","Monthly","Jan-1990","February 1999",100,
"United Kingdom","INDPRO","Monthly","Feb-1990","February 1999",99.3,
Table 8.1: Variant atomic formats for realtime data
Consider the first data line in the Alfred file: in the observation_date column we find 1960-01-01,
1If you choose to download in Excel format from OECD you get a file in the triangular or guillotine format mentioned
above.
2In the Alfred file we have used commas rather than tabs as the column delimiter; in the OECD example we have
shortened the name in the Variable column.
indicating that the data-point on this line, namely 112.0, is an observation or measurement (in this
case, of the US index of industrial production) that refers to the period starting on January 1st 1960.
The realtime_start_date value of 1960-02-16 tells us that this value was published on February
16th 1960, and the realtime_end_date value says that this vintage remained current through March
15th 1960. On the next day (as we can see from the following line) this data-point was revised slightly
downward to 111.0.
Daily dates in Alfred files are given in ISO extended format, YYYY-MM-DD, but below we describe
how to deal with differently formatted dates. Note that daily dates are appropriate for the last two
columns, which jointly record the interval over which a given data vintage was current. Daily dates
might, however, be considered overly precise for the first column, since the data period may well be
the year, quarter or month. However, following Alfred’s practice it is acceptable to specify a daily
date, indicating the first day of the period, even for non-daily data.3
Compare the first data line of the OECD example. There’s a greater amount of leading metadata,
which is left implicit in the Alfred file. Here Time is the equivalent of Alfred’s observation_date,
and Edition the equivalent of Alfred’s realtime_start_date. So we read that in February 1999 a
value of 100 was current for the UK index of industrial production for January 1990, and from the
next line we see that in the same vintage month a value of 99.3 was current for industrial production
in February 1990.
Besides the different names and ordering of the columns, there are a few more substantive differences
between Alfred and OECD files, most of which are irrelevant for join but some of which are (possibly)
relevant.
The first (irrelevant) difference is the ordering of the lines. It appears (though we’re not sure how
consistent this is) that in Alfred files the lines are sorted by observation date first and then by
publication date—so that all revisions of a given observation are grouped together—while OECD files
are sorted first by revision date (Edition) and then by observation date (Time). If we want the next
revision of UK industrial production for January 1990 in the OECD file we have to scan down several
lines until we find
"United Kingdom","INDPRO","Monthly","Jan-1990","March 1999",100,
This difference in format is basically irrelevant because join can handle the case where the lines
appear in random order, although some operations can be coded more conveniently if we’re able to
assume chronological ordering (either on the Alfred or the OECD pattern, it doesn’t matter).
The second (also irrelevant) difference is that the OECD seems to include periodic “Edition” lines
even when there is no change from the previous value (as illustrated above, where the UK industrial
production index for January 1990 is reported as 100 as of March 1999, the same value that we saw
to be current in February 1999), while Alfred reports a new value only when it differs from what was
previously current.
A third difference lies in the dating of the revisions or editions. As we have seen, Alfred gives a
specific daily date while (in the UK industrial production file at any rate), the OECD just dates each
edition to a month. This is not necessarily relevant for join, but it does raise the question of whether
the OECD might date revisions to a finer granularity in some of their files, in which case one would
have to be on the lookout for a different date format.
The final difference is that Alfred supplies an “end date” for each data vintage while the OECD
supplies only a starting date. But there is less to this difference than meets the eye: according to
the Alfred webmaster, “by design, a new vintage must start immediately following (the day after) the
lapse of the old vintage”—so the end date conveys no independent information.4
3Notice that this implies that in the Alfred example it is not clear without further information whether the obser-
vation period is the first quarter of 1960, the month January 1960, or the day January 1st 1960. However, we assume
that this information is always available in context.
4Email received from Travis May of the Federal Reserve Bank of St. Louis, 2013-10-17. This closes off the possibility
that a given vintage could lapse or expire some time before the next vintage becomes available, hence giving rise to a
“hole” in an Alfred realtime file.
8.3 More on time-related options
Before we get properly started it is worth saying a little more about the --tkey and --tconvert
options to join (introduced in section 7.11), as they apply in the case of realtime data.
When you’re working with regular time series data tkey is likely to be useful while tconvert is
unlikely to be applicable (see section 7.10). On the other hand, when you’re working with panel data
tkey is definitely not applicable but tconvert may well be helpful (section 7.12). When working with
realtime data, however, depending on the task in hand both options may be useful. You will likely
need tkey; you may well wish to select at least one column for tconvert treatment; and in fact you
may want to name a given column in both contexts—that is, include the tkey variable among the
tconvert columns.
Why might this make sense? Well, think of the --tconvert option as a “preprocessing” directive: it
asks gretl to convert date strings to numerical values (8-digit ISO basic dates) “at source”, as they
are read from the outer datafile. The --tkey option, on the other hand, singles out a column as the
one to use for matching rows with the inner dataset. So you would want to name a column in both
roles if (a) it should be used for matching periods and also (b) it is desirable to have the values from
this column in numerical form, most likely for use in filtering.
As we have seen, you can supply specific formats in connection with both tkey and tconvert (in the
latter case via the companion option --tconv-fmt) to handle the case where the date strings on the
right are not ISO-friendly at source. This raises the question of how the format specifications work if
a given column is named under both options. Here are the rules that gretl applies:
1. If a format is given with the --tkey option it always applies to the tkey column alone; and for
that column it overrides any format given via the --tconv-fmt option.
2. If a format is given via tconv-fmt it is assumed to apply to all the tconvert columns, unless
this assumption is preempted by rule 1.
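As an illustration, consider the following hypothetical command (the file vintages.csv and its column names are invented for the example). The Period column, named under both options, is parsed using the %Y-%m format given via --tkey, while release_date is parsed using the US-style format given via --tconv-fmt:
join vintages.csv x --data=VALUE --tkey="Period,%Y-%m" \
  --tconvert="Period,release_date" --tconv-fmt="%m/%d/%Y" \
  --filter="release_date<=20101231" --aggr=max(release_date)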
8.4 Getting a certain data vintage
The most common application of realtime data is to “travel back in time” and retrieve the data that
were current as of a certain date in the past. This would enable you to replicate a forecast or other
statistical result that could have been produced at that date.
For example, suppose we are interested in a variable of monthly frequency named INDPRO, realtime
data on which is stored in an Alfred file named INDPRO.txt, and we want to check the status quo as
of June 15th 2011.
If we don’t already have a suitable dataset into which to import the INDPRO data, our first steps will
be to create an appropriately dimensioned empty dataset using the nulldata command and then
specify its time-series character via setobs, as in
nulldata 132
setobs 12 2004:01
For convenience we can put the name of our realtime file into a string variable. On Windows this
might look like
string fname = "C:/Users/yourname/Downloads/INDPRO.txt"
We can then import the data vintage 2011-06-15 using join, arbitrarily choosing the (hopefully)
self-explanatory identifier ip_asof_20110615.
join @fname ip_asof_20110615 --tkey=observation_date --data=INDPRO \
--tconvert="realtime_start_date" \
--filter="realtime_start_date<=20110615" --aggr=max(realtime_start_date)
Here some detailed explanations of the various options are warranted:
The --tkey option specifies the column which should be treated as holding the observation
period identifiers to be matched against the periods in the current gretl dataset.5 The more
general form of this option is --tkey="colname,format" (note the double quotes here), so if
the dates do not come in standard format, we can tell gretl how to parse them by using the
appropriate conversion specifiers as shown in Table 7.2. For example, here we could have written
--tkey="observation_date,%Y-%m-%d".
Next, --data=INDPRO tells gretl that we want to retrieve the entries stored in the column named
INDPRO.
As explained in section 7.11 the --tconvert option selects certain columns in the right-hand
data file for conversion from date strings to 8-digit numbers on the pattern YYYYMMDD. We’ll need
this for the next step, filtering, since the transformation to numerical values makes it possible
to perform basic arithmetic on dates. Note that since date strings in Alfred files conform to
gretl’s default assumption it is not necessary to use the --tconv-fmt option here.
The --filter option specification in combination with the subsequent --aggr aggregation
treatment is the central piece of our data retrieval; notice how we use the date constant 20110615
in ISO basic form to do numerical comparisons, and how we perform the numerical max operation
on the converted column realtime_start_date. It would also have been possible to predefine
a scalar variable, as in
vintage = 20110615
and then use vintage in the join command instead. Here we tell join that we only want to
extract those publications that (1) already appeared before (and including) June 15th 2011,
and (2) were not yet obsoleted by a newer release.6
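In that case the command might read
scalar vintage = 20110615
join @fname ip_asof_20110615 --tkey=observation_date --data=INDPRO \
  --tconvert="realtime_start_date" \
  --filter="realtime_start_date<=vintage" --aggr=max(realtime_start_date)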
As a result, your dataset will now contain a time series named ip_asof_20110615 with the values that
a researcher would have had available on June 15th 2011. Of course, all values for the observations
after June 2011 will be missing (and probably a few before that, too), because they only became
available later on.
8.5 Getting the n-th release for each observation period
For some purposes it may be useful to retrieve the n-th published value of each observation, where n
is a fixed positive integer, irrespective of when each of these n-th releases was published. Suppose we
are interested in the third release, then the relevant join command becomes:
join @fname ip_3rdpub --tkey=observation_date --data=INDPRO --aggr="seq:3"
Since we do not need the realtime_start_date information for this retrieval, we have dropped
the --tconvert option here. Note that this formulation assumes that the source file is ordered
chronologically, otherwise using the option --aggr="seq:3", which retrieves the third value from
each sequence of matches, could have yielded a result different from the one intended. However, this
assumption holds for Alfred files and is probably rather safe in general.
The values of the variable imported as ip_3rdpub in this way were published at different dates, so
the variable is effectively a mix of different vintages. Depending on the type of variable, this may
also imply drastic jumps in the values; for example, index numbers are regularly re-based to different
base periods. This problem also carries over to inflation-adjusted economic variables, where the base
period of the price index changes over time. Mixing vintages in general also means mixing different
scales in the output, with which you would have to deal appropriately.7
5Strictly speaking, using --tkey is unnecessary in this example because we could just have relied on the default,
which is to use the first column in the source file for the periods. However, being explicit is often a good idea.
6By implementing the second condition through the max aggregation on the realtime_start_date column alone,
without using the realtime_end_date column, we make use of the fact that Alfred files cannot have “holes” as explained
before.
7Some user-contributed functions may be available that address this issue, but it is beyond our scope here. Another
even more complicated issue in the realtime context is that of “benchmark revisions” applied by statistical agencies,
where the underlying definition or composition of a variable changes on some date, which goes beyond a mere rescaling.
However, this type of structural change is not, in principle, a feature of realtime data alone, but applies to any time-series
data.
8.6 Getting the values at a fixed lag after the observation period
New data releases may take place on any day of the month, and as we have seen the specific day
of each release is recorded in realtime files from Alfred. However, if you are working with, say,
monthly or quarterly data you may sometimes want to adjust the granularity of your realtime axis
to a monthly or quarterly frequency. For example, in order to analyse the data revision process for
monthly industrial production you might be interested in the extent of revisions between the data
available two and three months after each observation period.
This is a relatively complicated task and there is more than one way of accomplishing it. Either you
have to make several passes through the outer dataset or you need a sophisticated filter, written as a
hansl function. Either way you will want to make use of some of gretl’s built-in calendrical functions.
We’ll assume that a suitably dimensioned workspace has been set up as described above. Given that,
the key ingredients of the join are a filtering function which we’ll call rel_ok (for “release is OK”)
and the join command which calls it. Here’s the function:
function series rel_ok (const series obsdate, const series reldate, int p)
series y_obs, m_obs, y_rel, m_rel
# get year and month from observation date
isoconv(obsdate, &y_obs, &m_obs)
# get year and month from release date
isoconv(reldate, &y_rel, &m_rel)
# find the delta in months
series dm = (12*y_rel + m_rel) - (12*y_obs + m_obs)
# and implement the filter
return dm <= p
end function
Note that the series arguments to rel_ok are marked as const so that they’re simply shared with
the function rather than being copied (since they’re not being modified; see chapter 14). And here’s
the command:
scalar lag = 3 # choose your fixed lag here
join @fname ip_plus3 --data=INDPRO --tkey=observation_date \
--tconvert="observation_date,realtime_start_date" \
--filter="rel_ok(observation_date, realtime_start_date, lag)" \
--aggr=max(realtime_start_date)
Note that we use --tconvert to convert both the observation date and the realtime start date (or
release date) to 8-digit numerical values. Both of these series are passed to the filter, which uses the
built-in function isoconv to extract year and month. We can then calculate dm, the “delta months”
since the observation date, for each release. The filter condition is that this delta should be no greater
than the specified lag, p.8
This filter condition may be satisfied by more than one release, but only the latest of those will
actually be the vintage that was current at the end of the n-th month after the observation period,
so we add the option --aggr=max(realtime_start_date). If instead you want to target the release
at the beginning of the n-th month you would have to use a slightly more complicated filter function.
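By way of illustration, here is one possible sketch of such a function, on the understanding that a release counts as available “at the beginning” of the n-th month if it appeared in any earlier month; it would be used together with --aggr=max(realtime_start_date) as before.
function series rel_ok_start (const series obsdate, const series reldate, int p)
    series y_obs, m_obs, y_rel, m_rel
    isoconv(obsdate, &y_obs, &m_obs)
    isoconv(reldate, &y_rel, &m_rel)
    # months elapsed between observation period and release
    series dm = (12*y_rel + m_rel) - (12*y_obs + m_obs)
    # keep only releases already published before month p begins
    return dm < p
end function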
An illustration
Figure 8.1 shows four time series for the monthly index of US industrial production from October
2005 to June 2009: the value as of first publication plus the values current 3, 6 and 12 months out
from the observation date.9 From visual inspection it would seem that over much of this period the
Federal Reserve was fairly consistently overestimating industrial production at first release and shortly
thereafter, relative to the figure they arrived at with a lag of a year.
The script that produced this Figure is shown in full in Listing 8.1.
8The filter is written on the assumption that the lag is expressed in months; on that understanding it could be used
with annual or quarterly data as well as monthly. The idea could be generalized to cover weekly or daily data without
much difficulty.
9Why not a longer series? Because if we try to extend it in either direction we immediately run into the index
re-basing problem mentioned in section 8.5, with big (staggered) leaps downward in all the series.
To replicate the examples in Listings 8.1 and 8.2 below you’ll need the Alfred file INDPRO.txt, which is available
as https://gretl.sf.net/gretldata/INDPRO.txt.
[Figure: time-series plot of the four series First publication, Plus 3 months, Plus 6 months and Plus 12 months, 2006–2009]
Figure 8.1: Successive revisions to US industrial production
8.7 Getting the revision history for an observation
For our final example we show how to retrieve the revision history for a given observation (again
using Alfred data on US industrial production). In this exercise we are switching the time axis: the
observation period is a fixed point and time is “vintage time”.
A suitable script is shown in Listing 8.2. We first select an observation to track (January 1970). We
start the clock in the following month, when a data-point for this period was first published, and let
it run to the end of the vintage history (in this file, March 2013). Our outer time key is the realtime
start date and we filter on observation date; we name the imported INDPRO values as ip_jan70. Since
it sometimes happens that more than one revision occurs in a given month we need to select an
aggregation method: here we choose to take the last revision in the month.
Recall from section 8.2 that Alfred records a new revision only when the data-point in question
actually changes. This means that our imported series will contain missing values for all months
when no real revision took place. However, we can apply a simple autoregressive rule to fill in the
blanks: each missing value equals the prior non-missing value.
Figure 8.2 displays the revision history. Over this sample period the periodic re-basing of the index
overshadows amendments due to accrual of new information.
Listing 8.1: Retrieving successive realtime lags of US industrial production [Download ]
function series rel_ok (const series obsdate, const series reldate, int p)
series y_obs, m_obs, d_obs, y_rel, m_rel, d_rel
isoconv(obsdate, &y_obs, &m_obs, &d_obs)
isoconv(reldate, &y_rel, &m_rel, &d_rel)
series dm = (12*y_rel + m_rel) - (12*y_obs + m_obs)
return dm < p || (dm == p && d_rel <= d_obs)
end function
nulldata 45
setobs 12 2005:10
string fname = "INDPRO.txt"
# initial published values
join @fname firstpub --data=INDPRO --tkey=observation_date \
--tconvert=realtime_start_date --aggr=min(realtime_start_date)
# plus 3 months
join @fname plus3 --data=INDPRO --tkey=observation_date \
--tconvert="observation_date,realtime_start_date" \
--filter="rel_ok(observation_date, realtime_start_date, 3)" \
--aggr=max(realtime_start_date)
# plus 6 months
join @fname plus6 --data=INDPRO --tkey=observation_date \
--tconvert="observation_date,realtime_start_date" \
--filter="rel_ok(observation_date, realtime_start_date, 6)" \
--aggr=max(realtime_start_date)
# plus 12 months
join @fname plus12 --data=INDPRO --tkey=observation_date \
--tconvert="observation_date,realtime_start_date" \
--filter="rel_ok(observation_date, realtime_start_date, 12)" \
--aggr=max(realtime_start_date)
setinfo firstpub --graph-name="First publication"
setinfo plus3 --graph-name="Plus 3 months"
setinfo plus6 --graph-name="Plus 6 months"
setinfo plus12 --graph-name="Plus 12 months"
# set --output=realtime.pdf for PDF
gnuplot firstpub plus3 plus6 plus12 --time --with-lines \
--output=display { set key left bottom; }
Listing 8.2: Retrieving a revision history [Download ]
# choose the observation to track here (YYYYMMDD)
scalar target = 19700101
nulldata 518 --preserve
setobs 12 1970:02
join INDPRO.txt ip_jan70 --data=INDPRO --tkey=realtime_start_date \
--tconvert=observation_date \
--filter="observation_date==target" --aggr=seq:-1
ip_jan70 = ok(ip_jan70) ? ip_jan70 : ip_jan70(-1)
gnuplot ip_jan70 --time --with-lines --output=display
[Figure: time-series plot of ip_jan70, 1970–2013]
Figure 8.2: Vintages of the index of US industrial production for January 1970
Chapter 9
Temporal disaggregation
9.1 Introduction
This chapter describes and explains the facility for temporal disaggregation in gretl.1 This is
implemented by the tdisagg function, which supports three variants of the method of Chow and Lin
(1971); the method of Fernández (1981); and two variants of the method of Denton (1971) as modified
by Cholette (1984). Given the analytical similarities between them, the three Chow–Lin variants and
the Fernández method will be grouped in the discussion below as “Chow–Lin methods”.
The balance of this section provides a gentle introduction to the idea of temporal disaggregation;
experts may wish to skip to the next section.
Basically, temporal disaggregation is the business of taking time-series data observed at some given
frequency (say, annually) and producing a counterpart series at a higher frequency (say, quarterly).
The term “disaggregation” indicates the inverse operation of aggregation, and to understand temporal
disaggregation it’s helpful first to understand temporal aggregation. In aggregating a high frequency
series to a lower frequency there are three basic methods, the appropriate method depending on the
nature of the data. Here are some illustrative examples.
GDP: say we have quarterly GDP data and wish to produce an annual series. This is a flow
variable and the annual flow will be the sum of the quarterly values (unless the quarterly values
are annualized, in which case we would aggregate by taking their mean).
Industrial Production: this takes the form of an index reporting the level of production over
some period relative to that in a base period in which the index is by construction 100. To
aggregate from (for example) monthly to quarterly we should take the average of the monthly
values. (The sum would give a nonsense result.) The same goes for price indices, and also for
ratios of stocks to flows or vice versa (inventory to sales, debt to GDP, capacity utilization).
Money stock: this is typically reported as an end-of-period value, so in aggregating from monthly
to quarterly we’d take the value from the final month of each quarter. In case a stock variable
is reported as a start-of-period value, the aggregated version would be that of the first month
of the quarter.
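In gretl these three choices correspond, roughly, to the methods accepted by the dataset compact command. The following minimal sketch assumes a quarterly dataset is currently open; only one of the lines would actually be run, since each changes the dataset’s frequency.
dataset compact 1 sum    # flow such as GDP: annual value = sum of the quarters
dataset compact 1        # index or ratio: the default method is averaging
dataset compact 1 last   # end-of-period stock: annual value = final quarter
The same three modes reappear below as the aggtype values recognized by tdisagg.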
A central idea in temporal disaggregation is that the high frequency series must respect both the
given low frequency data and the aggregation method. So for example, whatever numbers we come
up with for quarterly GDP, given an annual series as starting point, our numbers must sum to the
annual total. If money stock is measured at the end of the period then whatever numbers we come
up with for monthly money stock, given quarterly data, the figure for the last month of the quarter
must match that for the quarter as a whole. This is why temporal disaggregation is sometimes called
“benchmarking”: the given low frequency data constitute a benchmark which the constructed high
frequency data must match, in a well defined sense that depends on the nature of the data.
Colloquially, we might describe temporal disaggregation as “interpolation,” but strictly speaking
interpolation applies only to stock variables. We have a known end-of-quarter value (say), which is
also the value at the end of the last month of the quarter, and we’re trying to figure out what the
value might have been at the end of months 1 and 2. We’re filling in the blanks, or interpolating.
In the GDP case, however, the procedure is distribution rather than interpolation. We have a given
annual total and we’re trying to figure out how it should be distributed over the quarters. We’re also
doing distribution for variables taking the form of indices or ratios, except in this case we’re seeking
plausible values whose mean equals the given low-frequency value.
1We are grateful to Tommaso Di Fonzo, Professor of Statistical Science at the University of Padua, for detailed and
precise comments on earlier drafts. Any remaining errors are, of course, our responsibility.
While matching the low frequency benchmark is an important constraint, it obviously does not tie
down the high frequency values. That is a job for either regression-based methods such as Chow–Lin
or non-regression methods such as Denton. Details are provided in section 9.7.
9.2 Notation and design
Some notation first: the two main ingredients in temporal disaggregation are
a T × g matrix Y (holding the series to be disaggregated) and
a matrix X with k columns and (s·T + m) rows (to aid in the disaggregation).
The idea is that Y contains time series data sampled at some frequency f, while each column of X
contains time series data at a higher frequency, sf. So for each observation Y_t we have s corresponding
rows in X. The object is to produce a transformation of Y to frequency sf, with the help of X (whose
columns are typically called “related series” or “indicators” in the temporal disaggregation literature),
via either distribution or interpolation depending on the nature of the data. For most of this document
we will assume that g = 1, or in other words we are performing temporal disaggregation on a single
low-frequency series, but tdisagg supports “batch processing” of several series and we return to this
point in section 9.9.
If the m in (s·T + m) is greater than zero, that implies that there are some “extra” high-frequency
observations available for extrapolation—see section 9.4 for details.
We need to say something more about what goes into X. Under the Denton methods this must be a
single series, generally known as the “preliminary series”.2 For the Chow–Lin methods, X can hold
a combination of deterministic terms (e.g. constant, trend) and stochastic series. Naturally, suitable
candidates for the role of preliminary series or indicator will be variables that are correlated with Y
(and in particular, might be expected to share short-run dynamics with Y). However, it is possible
to carry out disaggregation using deterministic terms only—in the simplest case, with X containing
nothing but a constant. Experts in the field tend to frown on this, with reason: in the absence of any
genuine high-frequency information disaggregation just amounts to a “mechanical” smoothing. But
some people may have a use for such smoothing, and it’s permitted by tdisagg.
We should draw attention to a design decision in tdisagg: we have separated the specification of
indicators in X from certain standard deterministic terms that might be wanted, namely, a constant,
linear trend or quadratic trend. If you want a disaggregation without stochastic indicators, you can
omit (or set to null) the argument corresponding to X. In that case a constant (only) will be
employed automatically, but for the Chow–Lin methods one can adjust the deterministic terms used
via an option named det, described below. In other words the content of X becomes implicit. See
section 9.6 for more detail.
Here’s an important point to note when X is given explicitly: although this matrix may contain extra
observations “at the end” we assume that Y and X are correctly aligned at the start. Take for example
the annual to quarterly case: if the first observation in annual Y is for 1980 then the first observation
in quarterly X must be for the first quarter of 1980. Ensuring this is the user’s responsibility. We
will have some more to say about this in the following section.
9.3 Overview of data handling
The tdisagg function has three basic arguments, representing Y, X and s respectively (plus several
options; see below). The first two arguments can be given either in matrix form as such, or as “dataset
objects”—that is, a series for Y and a series or list of series for X. Or, as mentioned above, X can
be omitted (left implicit). This gives rise to five cases; which is most convenient will depend on the
user’s workflow.
1. Both Y and X are matrices. In this case, the size and periodicity of the currently open dataset
(if any) are irrelevant. If Y has T rows X must, of course, have at least s·T rows; if that
condition is not satisfied an “Invalid argument” error will be flagged.
2There’s nothing to stop a user from constructing such a series using several primary series as input—by taking
the first principal component or some other means—but that possibility is beyond our scope here.
2. Y is a series (or list) and X a matrix. In this case we assume that the periodicity of the
currently open dataset is the lower one, and T will be taken as equal to $nobs (the number of
observations in the current sample range). Again, X must have at least s·T rows.
3. Y is a matrix and X a series or list. We then assume that the periodicity of the currently open
dataset is the higher one, so that $nobs defines (s·T + m). And Y is supposed to be at the
lower frequency, so its number of rows gives T. We should then be able to find m as $nobs
minus s·T; if m < 0 an error is flagged.
4. Both Y and X are “dataset objects”. We have two sub-cases here.
(a) If X is a series, or an ordinary list of series, the periodicity of the currently open dataset is
taken to be the higher one. The series (or list) containing Y should hold the appropriate
entries every s elements. For example, if s = 4, Y_1 will be taken from the first observation,
Y_2 from the fifth, Y_3 from the ninth, and so on. In practical terms, series of this sort are
likely to be composed by repeating each element of a low-frequency variable s times.
(b) Alternatively, X could be a “MIDAS list”. The concept of a MIDAS list is fully explained
in chapter 20 but for example, in a quarterly dataset a MIDAS list would be a list of three
series, for the third, second and first month (note the ordering). In this case, the current
periodicity is taken to be the lower one and X will contain one column, corresponding to
the high-frequency representation of the MIDAS list.
5. X is omitted. If Y is given as a matrix it is taken to have T rows. Otherwise the interpretation
is determined heuristically: if the Y series is recognized by gretl as composed of repeated
low-frequency observations, or if a series result is requested, it is taken as having length sT,
otherwise its length is taken to be T.
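To fix ideas, here is a minimal sketch of case 4(a), with hypothetical series names: the open dataset is quarterly, y_ann holds each annual value repeated four times, and x is a quarterly indicator series.
# y_ann: annual values repeated 4 times; x: quarterly indicator
series gdp_q = tdisagg(y_ann, x, 4)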
In the previous section we flagged the importance of correct alignment of X and Y at the start of the
data; we’re now in a position to say a little more about this. If either X or Y are given in matrix form
alignment is truly the user’s responsibility. But if they are dataset objects gretl can be more helpful.
We automatically advance the start of the sample range to exclude any leading missing values, and
retard the end of the sample ranges for X and Y to exclude trailing missing values (allowing for the
possibility that X may extend beyond Y). In addition we further advance the sample start if this is
required to ensure that the X data begin in the first high-frequency sub-period (e.g. the first quarter
of a year or the first month of a quarter). But please note: when gretl automatically excludes leading
or trailing missing values, intra-sample missing values will still provoke an error.
9.4 Extrapolation
As mentioned above, if X holds covariate data which extend beyond the range of the original series
to be disaggregated then extrapolation is supported. But this is inherently risky, and becomes riskier
the longer the horizon over which it is attempted. In tdisagg extrapolation is by default limited to
one low-frequency period (= s high-frequency periods) beyond the end of the original data. The user
can adjust this behavior via the extmax member of the opts bundle passed to tdisagg, described in
the next section.
9.5 Function signature
The signature of tdisagg is:
matrix tdisagg(Y0, [X], int s, [bundle opts], [bundle results])
where square brackets indicate optional arguments. Note that while the return value is a matrix, if
Y0 contains a single column or series it can be assigned to a series as in
series ys = tdisagg(Y0, ...)
provided it’s of the right length to match the current dataset, or the current sample range. Details
on the arguments follow.
Y0: Y, as a matrix, series or list.
X (optional): X as a matrix, series or list. This should not contain standard deterministic terms,
since they are handled separately (see det under opts below). If this matrix is omitted, then
disaggregation will be performed using deterministic terms only.
s (int): The temporal expansion factor, for example 3 for quarterly to monthly, 4 for annual to
quarterly or 12 for annual to monthly. We do not support cases such as monthly to weekly or
monthly to daily, where s is not a fixed integer value common to all observations; otherwise,
anything goes.
opts (bundle, optional): a bundle holding additional options. The recognized keys are (in alphabetical
order):
aggtype (string): Specifies the type of temporal aggregation appropriate to the series in question.
The value must be one of sum (each low-frequency value is a sum of s high-frequency values,
the default); avg (each low-frequency value is the average of s high-frequency values); or
last or first, indicating respectively that each low-frequency value is the last or first of s
high-frequency values.
det (int): Relevant only when one of the Chow–Lin methods is selected. This is a numeric code
for the deterministic terms to be included in the regressions: 0 means none; 1, constant
only; 2, constant and linear trend; 3, constant and quadratic trend. The default is 1.
extmax (int): the maximum number of high-frequency periods over which extrapolation should
be carried out, conditional on the availability of covariate data. A zero value means no
extrapolation; a value of −1 means as many periods as possible; and a positive value limits
extrapolation to the specified number of periods. See section 9.4 for a statement of the
default value.
method (string): Selects the method of disaggregation (see the listing below). Note that the
Chow–Lin methods employ an autoregression coefficient, ρ, which captures the persistence
of the target series at the higher frequency and is used in GLS estimation of the parameters
linking X to Y.
chow-lin (the default) is modeled on the original method proposed by Chow and Lin.
It uses a value of ρ computed as the transformation of a maximum-likelihood estimate
of the low-frequency autocorrelation coefficient.
chow-lin-mle is equivalent to the method called chow-lin-maxlog in the tempdisagg
package for R; ρ is estimated by iterated GLS using the loglikelihood as criterion,
as recommended by Bournay and Laroque (1979). (The BFGS algorithm is used
internally).
chow-lin-ssr is equivalent to the method called chow-lin-minrss-quilis in tempdisagg;
ρ is estimated by iterated GLS using the sum of squared GLS residuals as
criterion (L-BFGS is used internally).
fernandez is basically “Chow–Lin with ρ = 1”. It is suitable if the target series has a
unit root, and is not cointegrated with the indicator series.
denton-pfd is the proportional first differences variant of Denton, as modified by
Cholette. See Di Fonzo and Marini (2012) for details.
denton-afd is the additive first differences variant of Denton (again, as modified by
Cholette). In contrast to the Chow–Lin methods, neither Denton procedure involves
regression.
plot (int): If a non-zero value is given, a simple plot is displayed by way of a “sanity check” on
the final series. See section 9.8 for details.
rho (scalar): Relevant only when one of the Chow–Lin methods is selected. If the method is
chow-lin, then rho is treated as a fixed value for ρ, thus enabling the user to bypass the
default estimation procedure altogether. If the method is chow-lin-mle or chow-lin-ssr,
on the other hand, the supplied ρ value is used to initialize the numerical optimization
algorithm.
verbose (int): Controls the verbosity of Chow–Lin or Fernández output. If 0 (the default)
nothing is printed unless an error occurs; if 1, summary output from the relevant regression
is shown; if 2, in addition output from the optimizer for the iterated GLS procedure is
shown, if applicable.
results (bundle, optional): If present, this argument must be a previously defined bundle. Upon
successful completion of any of the methods other than denton it contains details of the
disaggregation under the following keys:
method : the method employed
rho : the value of ρ used
lnl : loglikelihood (maximized by the chow-lin-mle method)
SSR : sum of squared residuals (minimized by the chow-lin-ssr method)
coeff : the GLS (or OLS) coefficients
stderr : standard errors for the coefficients
If ρ is set to zero—either by specification of the user or because the estimate ρ̂ turned out
to be non-positive—then estimation of the coefficients is via OLS. In that case the lnl and
SSR values are calculated using the OLS residuals (which will be on a different scale from the
weighted residuals in GLS).
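Putting the pieces together, a minimal sketch of a call using both bundles might look like this (series names are hypothetical, and the dataset set-up follows case 4(a) of section 9.3):
bundle opts = null
opts.method = "chow-lin-mle"
opts.aggtype = "sum"
opts.verbose = 1
bundle res = null
series gdp_q = tdisagg(gdp_a, x, 4, opts, res)
printf "rho = %g, loglikelihood = %g\n", res.rho, res.lnl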
9.6 Handling of deterministic terms
It may be helpful to set out clearly, in one place, how deterministic terms are handled by tdisagg.
If X is given explicitly: No deterministic term is added when the Denton method is used (since a
single preliminary series is wanted) but a constant is added when one of the Chow–Lin methods
is selected. The latter default can be overridden (i.e. the constant removed, or a trend added)
by means of the det entry in the options bundle.
If X is omitted: By default a constant is used for all methods. Again, for Chow–Lin this can be
overridden by specifying a det value. If for some reason you wanted Denton with just a trend
you would have to supply X containing a trend.
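For example, a sketch of the “Denton with just a trend” case mentioned above might be (assuming an annual dataset is open and y is the series to be disaggregated to quarterly frequency):
matrix X = seq(1, 4*$nobs)'   # a linear trend at the quarterly frequency
bundle opts = null
opts.method = "denton-pfd"
matrix yq = tdisagg(y, X, 4, opts)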
9.7 Some technical details
In this section we provide some technical details on the methods used by tdisagg. We will refer to
the version of Y converted to the high frequency sf as the “final series”.
As regards the Cholette-modified Denton methods, for the proportional first difference variant we
calculate the final series using the solution described by Di Fonzo and Marini (2012), specifically
equation (4) on page 5, and for the additive variant we draw on Di Fonzo (2003), pages 3 and 5 in
particular. Note that these procedures require the construction and inversion of a matrix of order
(s + 1)T. If both s and T are large it can therefore take some time, and be quite demanding of RAM.
As regards Chow–Lin, let ρ0 indicate the rho value passed via the options bundle (if applicable). We
then take these steps:
1. If ρ0 > 0 set ρ = ρ0 and go to step 6 if the method is chow-lin or step 7 otherwise. But if
ρ0 < 0 set ρ0 = 0.
2. Estimate via OLS a regression of Y on CX,3 where C is the appropriate aggregation matrix.
Let β̂OLS equal the coefficients from this regression. If ρ0 = 0 and the method is chow-lin go
to step 8.
3. Calculate the (low frequency) first order autocorrelation of the OLS residuals, ρ̂L. If ρ̂L ≥ 10⁻⁶
go to step 4. Otherwise, if the method is chow-lin set ρ = 0 and go to step 8, else set ρ = 0.5
and go to step 7.
³Strictly speaking, CX uses only the first sT rows of X if m > 0.
4. Refine the positive estimate of ρ̂L via Maximum Likelihood estimation of the AR(1) specification
as described in Davidson and MacKinnon (2004).
5. If ρ̂L < 0.999, set ρ to the high-frequency counterpart of ρ̂L using the approach given in Chow
and Lin (1971). Otherwise set ρ = 0.999. If the method is chow-lin, go to step 6, otherwise go
to step 7.
6. Perform GLS with the given value of ρ, store the coefficients as β̂GLS and go to step 9.
7. Perform iterated GLS starting from the prior value of ρ, adjusting ρ with the goal of either
maximizing the loglikelihood (method chow-lin-mle) or minimizing the sum of squared GLS
residuals (chow-lin-ssr); set β̂GLS to the final coefficient estimates; and go to step 9.
8. Calculate the final series as Xβ̂OLS + C′(CC′)⁻¹ûOLS, where ûOLS indicates the OLS residuals,
and stop.
9. Calculate the final series as Xβ̂GLS + VC′(CVC′)⁻¹ûGLS, where ûGLS indicates the GLS residuals
and V is the estimated high-frequency covariance matrix.
A few notes on our Chow–Lin algorithm follow.
One might question the value of performing steps 2 to 5 when the method is one that calls
for GLS iteration (chow-lin-mle or chow-lin-ssr), but our testing indicates that it can be
helpful to have a reasonably good estimate of ρ in hand before embarking on these iterations.
Conversely, one might wonder why we bother with GLS iterations if we find ρ̂L < 10⁻⁶. But
this allows for the possibility (most likely associated with small sample size) that iteration will
lead to ρ > 0 even when the estimate based on the initial OLS residuals is zero or negative.
Note that in all cases we are discarding an estimate of ρ < 0 (truncating to 0), which we take
to be standard in this field. In our iterated GLS we achieve this by having the optimizer pick
values x in [−∞, +∞] which are translated to [0, 1] via the logistic CDF, ρ = 1/(1 + exp(−x)).
To be precise, that’s the case with chow-lin-mle. But we find that the chow-lin-ssr method
is liable to overestimate ρ and proceed to values arbitrarily close to 1, resulting in numerical
problems. We therefore bound this method to x in [−20, +6.9], corresponding to ρ values
between near-zero and approximately 0.999.⁴ (The arithmetic of this mapping is illustrated in
the brief sketch below.)
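The following lines simply reproduce the logistic mapping in hansl, by way of a sanity check; they are illustrative only and do not represent gretl’s internal optimizer code.
scalar x_hi = 6.9
scalar x_lo = -20
printf "rho at x = %g: %g\n", x_hi, 1/(1 + exp(-x_hi))   # approximately 0.999
printf "rho at x = %g: %g\n", x_lo, 1/(1 + exp(-x_lo))   # effectively zero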
As for the Fernández method, this is quite straightforward. The place of the high-frequency covariance
matrix V in Chow–Lin is taken by (D′D)⁻¹, where D is the approximate first-differencing matrix,
with 1 on the diagonal and −1 on the first sub-diagonal. For efficient computation, however, we
store neither D nor D′D as such, and do not perform any explicit inversion. The special structure
of (D′D)⁻¹ makes it possible to produce the effect of pre-multiplication by this matrix with O(T²)
floating-point operations. Estimation of ρ is not an issue since it equals 1 by assumption.
9.8 The plot option
The semantics of this option may be enriched in future but for now it’s a simple boolean switch. The
effect is to produce a time series plot of the final series along with the original low-frequency series,
shown in “step” form. If aggregation is by sum the final series is multiplied by s for comparability with
the original. If the disaggregation has been successful these two series should track closely together,
with the final series showing plausible short-run dynamics. An example is shown in Figure 9.1.
If there are many observations, the two lines may appear virtually coincident. In that case one can see
what’s going on in more detail by exploiting the “Zoom” functionality of the plot, which is accessed
via the right-click menu in the plot window.
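A minimal usage sketch, with hypothetical series names ylow (the low-frequency series) and x1 (the indicator), would be:
series ym = tdisagg(ylow, x1, 4, _(aggtype="sum", plot=1))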
⁴It may be worth noting that the tempdisagg package for R limits both methods to a maximum ρ of 0.999. We
find, however, that the ML method can “look after itself”, and does not require the fixed upper bound short of 1.0.
Figure 9.1: Example output from plot option, showing annual GNP (red) and quarterly final series (blue)
using quarterly industrial production as indicator.
9.9 Multiple low-frequency series
We now return to a point mentioned in section 9.2, namely, that Y may be given as a T × g matrix
with g > 1, or a list of g series. This means that a single call to tdisagg can be used to process
several input series (“batch processing”), in which case the return value is a matrix with (s·T + m)
rows and g columns.
There are some restrictions. First and most obviously, a single call to tdisagg implies a single
selection of “indicators” or “related series” (X) and a single selection of options (aggregation type of
the data, deterministic terms, disaggregation method, and so on). So this possibility will be relevant
only if you have several series that “want the same treatment.” In addition, if g > 1 the plot and
verbose options are ignored and the results bundle is not filled; if you need those features you
should supply a single series or vector in Y.
The advantage of batch processing lies in the spreading of fixed computational cost, leading to shorter
execution time. However, the relative importance of the fixed cost differs substantially according to
the disaggregation method. For the Chow–Lin methods the fixed cost is relatively small and so little
speed-up can be expected, but for the Denton methods it dominates, and (in our testing) you can
process g > 1 series in little more time than it takes to process a single series.
As they say, “Your mileage may vary,” but if you have a large number of series to be disaggregated
via one of the Denton methods you may well find it much faster to use the batch facility of tdisagg.
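A sketch of batch usage follows; the names y1, y2, y3 (low-frequency series) and x0 (a single preliminary series) are hypothetical, and we assume quarterly-to-monthly disaggregation via the additive Denton method.
list Ys = y1 y2 y3
matrix M = tdisagg(Ys, x0, 3, _(method="denton-afd", aggtype="sum"))
# M has (s*T + m) rows and three columns, one per member of Ys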
9.10 Examples
Listing 9.1 shows an example of usage and its output. The data are drawn from the St Louis
Fed; we disaggregate quarterly GDP to monthly with the help of industrial production and payroll
employment, using the default Chow–Lin method.
Several other example scripts are available from http://gretl.sourceforge.net/tdisagg/.
Listing 9.1: Example of tdisagg usage [Download ]
### Traditional Chow-Lin: y is a series with repetition
### and X is a list of series. This corresponds to case 4(a)
### as described in section 9.3 of the documentation above.
###
# ensure that no data are in place
clear
# open gretl’s St Louis Fed database
open fedstl.bin
# import two monthly series
data indpro payems
# import quarterly GDP (values are repeated)
data gdpc1
# restrict sample to complete data
smpl --no-missing
# disaggregate GDP from quarterly to monthly, using
# industrial production and payroll employment as indicators
scalar s = 3
list X = indpro payems
series gdpm = tdisagg(gdpc1, X, s, _(verbose=1, aggtype="sum"))
Output:
Aggregation type sum
GLS estimates (chow-lin) T = 294
Dependent variable: gdpc1
coefficient std. error t-ratio p-value
----------------------------------------------------------
const 312.394 263.372 1.186 0.2365
indpro 10.9158 1.75785 6.210 1.83e-09 ***
payems 0.0242860 0.00171935 14.13 7.39e-35 ***
rho = 0.999, SSR = 51543.9, lnl = -1604.98
Generated series gdpm (ID 4)
Chapter 10
Special functions in genr
10.1 Introduction
The genr command provides a flexible means of defining new variables. At the same time, somewhat
paradoxically, the “genr” keyword is almost never visible in gretl scripts. For example, it is not really
recommended to write a line such as genr b = 2.5, because the following alternatives exist:
scalar b = 2.5, which also invokes the genr apparatus in gretl, but provides explicit type
information about the variable b, which is usually preferable. (gretl’s language hansl is statically
typed, so b cannot switch from scalar to string or matrix, for example.)
b = 2.5, leaving it to gretl to infer the admissible or most “natural” type for the new object,
which would again be a scalar in this case.
matrix b = {2.5}: This formulation is required if b is going to be expanded with additional
rows or columns later on. Otherwise, gretl’s static typing would not allow b to be promoted
from scalar to matrix, so it must be a matrix right from the start, even if it is of dimension
1 × 1 initially. (This definition could also be written as matrix b = 2.5, but the more explicit
form is recommended.)
In addition to scalar or matrix, other type keywords that can be used in place of the generic
genr term are those enumerated in chapter 11 below. In the case of an array the concrete
specification should be used, that is one of matrices, strings, lists, bundles.¹
Therefore, there is only a handful of special cases where it is really necessary to use the “genr” keyword:
genr time Creates a time trend variable (1, 2, 3, . . . ) under the name time. Note that within
an appropriately defined panel dataset this variable honors the panel structure and is a true
time index. (In a cross-sectional dataset, the command will still work and produce the same
result as genr index below, but of course no temporal meaning exists.)
genr index Creates an observation variable named index, running from 1 to the sample
size.
genr unitdum In the context of panel data, creates a set of dummies for the panel groups
or “units”. These are named du_1, du_2, and so forth. Actually, this particular genr usage is
not strictly necessary, because a list of group dummies can also be obtained as:
series gr = $unit
list groupdums = dummify(gr, NA)
(The NA argument to the dummify function has the effect of not skipping any unit as the
reference group, thus producing the full set of dummies.)
genr timedum Again for panel data, creates a set of dummies for the time periods, named
dt_1, dt_2, . . . . And again, a list-producing variant without genr exists, using the special
accessor $obsminor which indexes time in the panel context and can be used as a substitute for
time from above:
series tindex = $obsminor
list timedums = dummify(tindex, NA)
¹A recently added advanced datatype is an array of arrays, with the associated type specifier arrays.
genr markers See section 4.5 for an explanation and example of this panel-related feature.
Finally, there also exists genr dummy, which produces a set of seasonal dummies. However, it is
recommended to use the seasonals() function instead, which can also return centered dummies.
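For example, the following sketch assumes a dataset with seasonal periodicity is already in place; the optional arguments to seasonals(), documented in the Function Reference, select a baseline period and/or the centered variant.
list sdums = seasonals()   # full set of seasonal dummies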
The rest of this chapter discusses other special function aspects.
10.2 Cumulative densities and p-values
The two functions cdf and pvalue provide complementary means of examining values from 17 probability
distributions (as of July 2021), among them the most important ones: standard normal,
Student’s t, χ², F, gamma, and binomial. The syntax of these functions is set out in the Gretl
Command Reference; here we expand on some subtleties.
The cumulative density function or CDF for a random variable is the integral of the variable’s density
from its lower limit (typically either −∞ or 0) to any specified value x. The p-value (at least the
one-tailed, right-hand p-value as returned by the pvalue function) is the complementary probability,
the integral from x to the upper limit of the distribution, typically +∞.
In principle, therefore, there is no need for two distinct functions: given a CDF value p₀ you could
easily find the corresponding p-value as 1 − p₀ (or vice versa). In practice, with finite-precision
computer arithmetic, the two functions are not redundant. This requires a little explanation. In
gretl, as in most statistical programs, floating point numbers are represented as “doubles”: double-precision
values that typically have a storage size of eight bytes or 64 bits. Since there are only so
many bits available, only so many floating-point numbers can be represented: doubles do not model
the real line. Typically doubles can represent numbers over the range (roughly) ±1.7977 × 10³⁰⁸, but
only to about 15 digits of precision.
Suppose you’re interested in the left tail of the χ² distribution with 50 degrees of freedom: you’d like
to know the CDF value for x = 0.9. Take a look at the following interactive session:
? scalar p1 = cdf(X, 50, 0.9)
Generated scalar p1 = 8.94977e-35
? scalar p2 = pvalue(X, 50, 0.9)
Generated scalar p2 = 1
? scalar test = 1 - p2
Generated scalar test = 0
The cdf function has produced an accurate value, but the pvalue function gives an answer of 1, from
which it is not possible to retrieve the answer to the CDF question. This may seem surprising at first,
but consider: if the value of p1 above is correct, then the correct value for p2 is 1 − 8.94977 × 10⁻³⁵.
But there’s no way that value can be represented as a double: that would require over 30 digits of
precision.
Of course this is an extreme example. If the x in question is not too far off into one or other tail of
the distribution, the cdf and pvalue functions will in fact produce complementary answers, as shown
below:
? scalar p1 = cdf(X, 50, 30)
Generated scalar p1 = 0.0111648
? scalar p2 = pvalue(X, 50, 30)
Generated scalar p2 = 0.988835
? scalar test = 1 - p2
Generated scalar test = 0.0111648
But the moral is that if you want to examine extreme values you should be careful in selecting the
function you need, in the knowledge that values very close to zero can be represented as doubles while
values very close to 1 cannot.
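In practical terms, then, the rule is simply to call the function that targets the tail you care about, as in this sketch:
scalar p_left  = cdf(X, 50, 0.9)    # tiny left-tail probability, represented accurately
scalar p_right = pvalue(X, 50, 90)  # right-tail probability, computed directly rather than as 1 - cdf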
10.3 Retrieving internal variables (dollar accessors)
A very useful feature is the ability to retrieve, in a script, various values calculated by gretl in the
course of estimating models or testing hypotheses. Since they all start with a literal $ character, they are
called “dollar accessors”. The variables that can be retrieved in this way are listed in the Gretl
Command Reference or in the built-in function help under the Help menu. The dollar accessors can
be used like other gretl objects in script assignments or statements. Some of those accessors are
actually independent of any estimation or test and describe, for example, the context of the running
gretl program. But here we say a bit more about the special variables $test and $pvalue.
These variables hold, respectively, the value of the last test statistic calculated using an explicit test-
ing command and the p-value for that test statistic. If no such test has been performed at the time
when these variables are referenced, they will produce the missing value code. Some “explicit testing
commands” that work in this way are as follows (among others): add (joint test for the significance of
variables added to a model); adf (Augmented Dickey–Fuller test, see below); arch (test for ARCH);
chow (Chow test for a structural break); coeffsum (test for the sum of specified coefficients); coint
(Engle-Granger cointegration test); cusum (the Harvey–Collier t-statistic); difftest (test for a dif-
ference of two groups); kpss (KPSS stationarity test, no p-value available); modtest (see below);
meantest (test for difference of means); omit (joint test for the significance of variables omitted from
a model); reset (Ramsey’s RESET); restrict (general linear restriction); runs (runs test for
randomness); and vartest (test for difference of variances). In most cases both a $test and a $pvalue
are stored; the exception is the KPSS test, for which a p-value is not currently available.
The modtest command (which must follow an estimation command) offers several diagnostic tests;
the particular test performed depends on the option flag provided. Please see the Gretl Command
Reference and, for example, chapters 32 and 31 of this Guide for details.
An important point to notice about this mechanism is that the internal variables $test and $pvalue
are over-written each time one of the tests listed above is performed. If you want to reference these
values, you must do so at the correct point in the sequence of gretl commands.
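Here is a minimal sketch; the series names y, x1 and x2 are hypothetical.
ols y 0 x1 x2 --quiet
reset
scalar stat = $test     # the RESET test statistic
scalar pval = $pvalue   # its p-value
printf "RESET: test = %g, p-value = %g\n", stat, pval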
Chapter 11
Gretl data types
11.1 Introduction
Gretl offers the following data types:
scalar holds a single numerical value
series holds n numerical values, where n is the number of observations in the current
dataset
matrix holds a rectangular array of numerical values, of any (two) dimensions
list holds the ID numbers of a set of series
string holds an array of characters
bundle holds zero or more objects of various types
array holds zero or more objects of a given type
The “numerical values” mentioned above are all double-precision floating point numbers.
In this chapter we give a run-down of the basic characteristics of each of these types and also explain
their “life cycle” (creation, modification and destruction). The list and matrix types, whose uses are
relatively complex, are discussed at greater length in chapters 15 and 17 respectively.
11.2 Series
We begin with the series type, which is the oldest and in a sense the most basic type in gretl. When
you open a data file in the gretl GUI, what you see in the main window are the ID numbers, names
(and descriptions, if available) of the series read from the file. All the series existing at any point in a
gretl session are of the same length, although some may have missing values. The variables that can
be added via the items under the Add menu in the main window (logs, squares and so on) are also
series.
For a gretl session to contain any series, a common series length must be established. This is usually
achieved by opening a data file, or importing a series from a database, in which case the length is set
by the first import. But one can also use the nulldata command, which takes as its single argument
the desired length, a positive integer.
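For example, the following fragment establishes a series length of 50 without opening any data file:
nulldata 50
series u = normal()   # a new series of length 50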
Each series has these basic attributes: an ID number, a name, and of course n numerical values. A
series may also have a description (which is shown in the main window and is also accessible via the
labels command), a “display name” for use in graphs, a record of the compaction method used in
reducing the variable’s frequency (for time-series data only) and flags marking the variable as discrete
and/or as a numeric encoding of a qualitative characteristic. These attributes can be edited in the
GUI by choosing Edit Attributes (either under the Variable menu or via right-click), or by means of
the setinfo command.
In the context of most commands you are able to reference series by name or by ID number as you
wish. The main exception is the definition or modification of variables via a formula; here you must
use names since ID numbers would get confused with numerical constants.
Note that series ID numbers are always consecutive, and the ID number for a given series will change
if you delete a lower-numbered series. In some contexts, where gretl is liable to get confused by such
changes, deletion of low-numbered series is disallowed.
Discrete series
It is possible to mark variables of the series type as discrete. The meaning and uses of this facility
are explained in chapter 12.
String-valued series
It is generally expected that series in gretl will be “properly numeric” (on a ratio or at least an
ordinal scale), or the sort of numerical indicator variables (0/1 “dummies”) that are traditional in
econometrics. However, “string-valued” series are also supported—see chapter 16 for details.
11.3 Scalars
The scalar type is relatively simple: just a convenient named holder for a single numerical value.
Scalars have none of the additional attributes pertaining to series, do not have ID numbers, and must
be referenced by name. A common use of scalar variables is to record information made available
by gretl commands for further processing, as in scalar s2 = $sigma^2 to record the square of the
standard error of the regression following an estimation command such as ols.
You can define and work with scalars in gretl without having any dataset in place.
In the gretl GUI, scalar variables can be inspected and their values edited via the “Icon view” (see
the View menu in the main window).
11.4 Matrices
Matrices in gretl work much as in other mathematical software (e.g. MATLAB, Octave). Like scalars
they have no ID numbers and must be referenced by name, and they can be used without any dataset
in place. Matrix indexing is 1-based: the top-left element of matrix A is A[1,1]. Matrices are
discussed at length in chapter 17; advanced users of gretl will want to study this chapter in detail.
Matrices have two optional attributes beyond their numerical content: they may have column and/or
row names attached; these are displayed when the matrix is printed. See the cnameset and rnameset
functions for details.
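A small sketch of attaching column names:
matrix A = {1, 2; 3, 4}
cnameset(A, defarray("alpha", "beta"))
print A   # the two columns are now labeled alpha and beta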
In the gretl GUI, matrices can be inspected, analysed and edited via the Icon view item under the
View menu in the main window: each currently defined matrix is represented by an icon.
11.5 Lists
As with matrices, lists merit an explication of their own (see chapter 15). Briefly, named lists can (and
should!) be used to make command scripts less verbose and repetitious, and more easily modifiable.
Since lists are in fact lists of series ID numbers they can be used only when a dataset is in place.
In the gretl GUI, named lists can be inspected and edited under the Data menu in the main window,
via the item Define or edit list.
11.6 Strings
String variables may be used for labeling, or for constructing commands. They are discussed in
chapter 15. They must be referenced by name; they can be defined in the absence of a dataset.
Such variables can be created and modified via the command-line in the gretl console or via script;
there is no means of editing them via the gretl GUI.
11.7 Bundles
A bundle is a container or wrapper for various sorts of objects—primarily scalars, matrices, strings,
arrays and bundles. (Yes, a bundle can contain other bundles). Secondarily, series and lists can be
placed in bundles but this is subject to important qualifications noted below.
A bundle takes the form of a hash table or associative array: each item placed in the bundle is
associated with a key which can be used to retrieve it subsequently. We begin by explaining the mechanics
of bundles then offer some thoughts on what they are good for.
There are several ways of creating a bundle. Here are the first two:
Just “declare” it, as in
bundle foo
or define an empty bundle using the defbundle function without any arguments:
bundle foo = defbundle()
These formulations are basically equivalent, in that they both create an empty bundle. The difference
is that the second variant may be reused—if a bundle named foo already exists the effect is to empty
it—while the first may only be used once in a given gretl session; it is an error to attempt to declare
a variable that already exists.
To create a bundle and add contents in one go, you can use the defbundle function with some
arguments. For example:
bundle foo = defbundle("x", 13, "mat", I(3), "str", "some string")
The arguments must be given in pairs—a key followed by the object to be associated with the key—
with all terms comma-separated. However, you may prefer to use one or other of the alternative
idioms introduced in gretl 2021a. The first of these looks like this:
bundle foo = _(x = 13, mat = I(3), str = "some string")
It’s more streamlined than defbundle but not quite so flexible. You don’t have to quote the keys,
but that also means that you can’t give the name of a key as a string variable; it’s always taken as a
string literal. Yet more streamlined but also less flexible is this variant:
bundle foo = _(x, mat, str)
which works if and only if there are existing objects x,mat and str in scope and you want to add
them to the bundle under keys equal to their own names.
For more on the defbundle function, see the Gretl Command Reference or the Function Reference
under Help in the GUI program.
To add an object to a bundle you assign to a compound left-hand value: the name of the bundle
followed by the key. Two forms of syntax are acceptable in this context. The recommended syntax
(for most uses) is bundlename.key; that is, the name of the bundle followed by a dot, then the key.
Both the bundle name and the key must be valid gretl identifiers.¹ For example, the statement
foo.matrix1 = m
adds an object called m (presumably a matrix) to bundle foo under the key matrix1. If you wish to
make it explicit that m is supposed to be a matrix you can use the form
matrix foo.matrix1 = m
Alternatively, a bundle key may be given as a string enclosed in square brackets, as in
foo["matrix1"] = m
This syntax offers greater flexibility in that the key string does not have to be a valid identifier (for
example it can include spaces). In addition, when using the square bracket syntax it is possible to
use a string variable to define or access the key in question. For example:
¹As a reminder: 31 characters maximum, starting with a letter and composed of just letters, numbers or underscore.
string s = "matrix 1"
foo[s] = m # matrix is added under key "matrix 1"
To get an item out of a bundle, again use the name of the bundle followed by the key, as in
matrix bm = foo.matrix1
# or using the alternative notation
matrix bm = foo["matrix1"]
# or using a string variable
matrix bm = foo[s]
Note that the key identifying an object within a given bundle is necessarily unique. If you reuse an
existing key in a new assignment, the effect is to replace the object which was previously stored under
the given key. In this context it is not required that the type of the replacement object is the same
as that of the original.
Also note that when you add an object to a bundle, what in fact happens is that the bundle acquires
a copy of the object. The external object retains its own identity and is unaffected if the bundled
object is replaced by another. Consider the following script fragment:
bundle foo
matrix m = I(3)
foo.mykey = m
scalar x = 20
foo.mykey = x
After the above commands are completed bundle foo does not contain a matrix under mykey, but
the original matrix m is still in good health.
To delete an object from a bundle use the delete command, with the bundle/key combination, as in
delete foo.mykey
This destroys the object associated with mykey and removes the key from the hash table.
To determine whether a bundle contains an object associated with a given key, use the inbundle()
function. This takes two arguments: the name of the bundle and the key string. The value returned
by this function is an integer which codes for the type of the object (0 for no match, 1 for scalar, 2
for series, 3 for matrix, 4 for string, 5 for bundle and 6 for array). The function typestr() may be
used to get the string corresponding to this code. For example:
scalar type = inbundle(foo, x)
if type == 0
print "x: no such object"
else
printf "x is of type %s\n", typestr(type)
endif
Besides adding, accessing, replacing and deleting individual items, the other operations that are
supported for bundles are union, printing and deletion. As regards union, if bundles b1 and b2 are
defined you can say
bundle b3 = b1 + b2
to create a new bundle that is the union of the two others. The algorithm is: create a new bundle that
is a copy of b1, then add any items from b2 whose keys are not already present in the new bundle.
(This means that bundle union is not commutative if the bundles have one or more key strings in
common.)
If b is a bundle and you say print b, you get a listing of the bundle’s keys along with the types of
the corresponding objects, as in
? print b
bundle b:
x (scalar)
mat (matrix)
inside (bundle)
Note that in the example above the bundle b nests a bundle named inside. If you want to see
what’s inside nested bundles (with a single command) you can append the --tree option to the print
command.
Series and lists as bundle members
It is possible to add both series and lists to a bundle, as in
open data4-10
list X = const CATHOL INCOME
bundle b
b.y = ENROLL
b.X = X
eval b.y
eval b.X
However, it is important to bear in mind the following limitations.
A series, as such, is inherently a member of a dataset, and a bundle can “survive” the replacement
or destruction of the dataset from which a series was added. It may then be impossible (or, even
if possible, meaningless) to extract a bundled series as a series. However it’s always possible to
retrieve the values of the series in the form of a matrix (column vector).
In gretl commands that call for series arguments you cannot give a bundled series without first
extracting it. In the little example above the series ENROLL was added to bundle b under the key
y, but b.y is not itself a series (member of a dataset), it’s just an anonymous array of values.
It therefore cannot be given as, say, the dependent variable in a call to gretl’s ols command.
A gretl list is just an array of ID numbers of series in a given dataset, a “macro” if you like. So
as with series, there’s no guarantee that a bundled list can be extracted as a list (though it can
always be extracted as a row vector).
The points made above are illustrated in Listing 11.1. In “Case 1” we open a little dataset with just 14
cross-sectional observations and put a series into a bundle. We then open a time-series dataset with
64 observations, while preserving the bundle, and extract the bundled series. This instance is legal,
since the stored series does not overflow the length of the new dataset (it gets written into the first
14 observations), but it’s probably not meaningful. It’s up to the user to decide if such operations
make sense.
In “Case 2” a similar sequence of statements leads to an error (trapped by catch) because this time
the stored series will not fit into the new dataset. We can nonetheless grab the data as a vector.
In “Case 3” we put a list of three series into a bundle. This does not put any actual data values into
the bundle, just the ID numbers of the specified series, which happen to be 4, 5 and 6. We then
switch to a dataset that contains just 4 series, so the list cannot be extracted as such (IDs 5 and 6
are out of bounds). Once again, however, we can retrieve the ID numbers in matrix form if we want.
In some cases putting a gretl list as such into a bundle may be appropriate, but in others you are
better off adding the content of the list, in matrix form, as in
open data4-10
list X = const CATHOL INCOME
bundle b
matrix b.X = {X}
In this case we’re adding a matrix with three columns and as many rows as there are in the dataset;
we have the actual data, not just a reference to the data that might “go bad”. See chapter 17 for
more on this.
Listing 11.1: Series and lists in bundles [Download ]
# Case 1: store and retrieve series, OK?
open data4-1
bundle b
series b.x = sqft
open data9-7 --preserve
series x = b.x
print x --byobs
# Case 2: store and retrieve series: gives an error,
# but the data can be retrieved in matrix form
open data9-7
bundle b
series b.x = QNC
open data4-1 --preserve
catch series x = b.x # wrong, won’t fit!
if $error
matrix mx = b.x
print mx
else
print x
endif
# Case 3: store and retrieve list: gives an error,
# but the ID numbers in the list can be retrieved
# as a row vector
open data9-7
list L = PRIME UNEMP STOCK
bundle b
list b.L = L
open data4-1 --preserve
catch list L = b.L
if $error
matrix mL = b.L
print mL # prints "4 5 6"
endif
What are bundles good for?
Bundles are unlikely to be of interest in the context of standalone gretl scripts, but they can be
very useful in the context of complex function packages where a good deal of information has to be
passed around between the component functions (see Cottrell and Lucchetti, 2016). Instead of using
a lengthy list of individual arguments, function A can bundle up the required data and pass it to
functions B and C, where relevant information can be extracted via a mnemonic key.
In this context bundles should be passed in “pointer” form (see chapter 14) as illustrated in the
following trivial example, where a bundle is created at one level then filled out by a separate function.
# modification of bundle (pointer) by user function
function void fill_out_bundle (bundle *b)
b.mat = I(3)
b.str = "foo"
b.x = 32
end function
bundle my_bundle
fill_out_bundle(&my_bundle)
The bundle type can also be used to advantage as the return value from a packaged function, in cases
where a package writer wants to give the user the option of accessing various results. In the gretl
GUI, function packages that return a bundle are treated specially: the output window that displays
the printed results acquires a menu showing the bundled items (their names and types), from which
the user can save items of interest. For example, a function package that estimates a model might
return a bundle containing a vector of parameter estimates, a residual series and a covariance matrix
for the parameter estimates, among other possibilities.
When offering a bundle as the return value from a function it’s important to keep in mind the
limitation noted above, concerning the inclusion of lists in bundles. Once again: a list is just an array
of series ID numbers. Consider the following would-be function code.
function bundle bunlist (const series y)
series y2 = y^2
series y3 = y^3
bundle ret
# don’t do this!
list ret.L = y2 y3
return ret
end function
If the intent is to give the caller a list containing two new series this will not work. Suppose that
when this function is called, the caller has in place a dataset with two series (besides the built-in
const), with ID numbers 1 and 2. Then the series y2 and y3 will be given IDs 3 and 4, which will
be duly recorded in ret.L. But these series are local to the function; at caller level there will be no
series answering to IDs 3 and 4, and the “list” therefore turns into the useless vector {3,4}. If you
want to return multiple series your options are (a) return a list directly—in which case gretl adds the
specified series to the caller’s dataset—or (b) turn the list into a matrix, as in
list L = y2 y3
matrix Y = {L}
This matrix can be returned in its own right, or can be packed into a bundle if the function is to return
something else besides; either way the caller can reconstitute a list using the mat2list function.
You might be wondering, if the direct return of a list works as mentioned above, why doesn’t a
bundled list work in the same way? That’s basically because of the “encapsulation” policy: functions
should not have any unexpected side effects. If a caller assigns the return value of a function that
advertises a list return, it should be clear that the caller’s dataset may be modified—it’s all “above
board”. If a function that advertises a bundle return value has the side effect of modifying the caller’s
dataset, because the bundle happens to contain a list, that would seem to break encapsulation.
Listing 11.2: Returning a list in a bundle [Download ]
set verbose off
function bundle discrete_check (const list L)
list dlist
matrix nvals = {}
bundle b
loop i=1..nelem(L)
b = getinfo(L[i])
if b.discrete
dlist += L[i]
nvals ~= nelem(values(L[i]))
endif
endloop
return _(dlist, nvals)
end function
open mroz87.gdt --quiet
list L = dataset
bundle b = discrete_check(L)
list print b.dlist
print b.nvals
For completeness, it’s worth noting that there are cases where a function can successfully return a
list in a bundle, without breaking encapsulation: the requirement is just that the series IDs in the
list are defined at caller level. Listing 11.2 gives an example. A list argument L is examined and the
return value is a bundle containing a list and a vector. The bundled list includes the IDs of just those
series in L that are discrete, and the vector gives the number of distinct values for each of these series.
Adding annotations
As a refinement to support the use of bundles as a function return type, the setnote function can be
used to add a brief explanatory note to a bundled item—such notes will then be shown in the GUI
menu. This function takes three arguments: the name of a bundle, a key string, and the note. For
example
setnote(b, "vcv", "covariance matrix")
After this, the object under the key vcv in bundle b will be shown as “covariance matrix” in a GUI
menu.
11.8 Arrays
The gretl array type is intended for scripting use. Arrays have no GUI representation and they’re
unlikely to acquire one.2
A gretl array is, as you might expect, a container which can hold zero or more objects of a certain
type, indexed by consecutive integers starting at 1. It is one-dimensional. This type is implemented
by a quite “generic” back-end. The types of object that can be put into arrays are strings, matrices,
lists, bundles and arrays.3
Of gretl’s “primary” types, then, neither scalars nor series are supported by the array mechanism.
There would be little point in supporting arrays of scalars as such since the matrix type already plays
2However, it’s possible to save arrays “invisibly” in the context of a GUI session, by virtue of the fact that they can
be packed into bundles (see below), and bundles can be saved as part of a “session”.
3It was not possible to nest arrays prior to version 2019d of gretl.
that role, and more flexibly. As for series, they have a special status as elements of a dataset (which
is in a sense an “array of series” already) and in addition we have the list type which already functions
as a sort of array for subsets of the series in a dataset.
Creating an array
An array can be brought into existence in any of three ways: bare declaration or using one of the
functions array() or defarray(). In each case one of the specific type-words strings, matrices,
lists, bundles or arrays must be used. Here are some examples:
# declare an empty array of strings
strings S
# make an empty array of matrices
matrices M = array(0)
# make an array with space for four bundles
bundles B = array(4)
# make an array with three specified strings
strings P = defarray("foo", "bar", "baz")
The “bare declaration” form and the function form with array(0) have the same effect of creating
an empty array, but the second can be used in contexts where bare declaration is not allowed (and it
can also be used to destroy the content of an existing array and reduce it to size zero). The array()
function expects a non-negative integer argument and can be used to create an array of pre-given
size; in this case the elements are initialized appropriately as empty strings, empty matrices, empty
lists, empty bundles or empty arrays. The defarray() function takes a variable number of arguments
(one or more), each of which may be the name of a variable of the appropriate type or an expression
which evaluates to an object of the appropriate type.
Setting and getting elements
There are two ways to set the value of an array element: you can set a particular element using the
array index, or you can append an element using the += operator:
# first case
strings S = array(3)
S[2] = "string the second"
# alternative
matrices M = array(0)
M += mnormal(T,k)
In the first method the index must (of course) be within bounds; that is, greater than zero and no
greater than the current length of the array. When the second method is used it automatically extends
the length of the array by 1.
To get hold of an element, the array index must be used:
# for S an array of strings
string s = S[5]
# for M an array of matrices
printf "\n%#12.5g\n", M[1]
Removing elements
There’s a counterpart to the += operator mentioned above: -= can be used to remove one or more
elements specified by content from an array of strings. Note that -= works on all matching elements,
so after the following statements
strings S = defarray("a", "a", "b", "a")
S -= "a"
S becomes a one-element array holding only the original third element.
More generally, a negative index can be used to remove a specified element from an array of any type,
as in
strings S = defarray("a", "a", "b", "a")
S = S[-1]
where only the first element is removed. See chapter 17 for more on the semantics of negative indices.
Operations on whole arrays
Three operators are applicable to whole arrays, but only one to arrays of arbitrary type (the other
two being restricted to arrays of strings). The generally available operation is appending. You can
do, for example
# for M1 and M2 both arrays of matrices
matrices BigM = M1 + M2
# or if you wish to augment M1
M1 += M2
In each case the result is an array of matrices whose length is the sum of the lengths of M1 and
M2—and similarly for the other supported types.
The operators specific to strings are union, via ||, and intersection, via &&. Given the following code,
for S1 and S2 both arrays of strings,
strings Su = S1 || S2
strings Si = S1 && S2
the array Su will contain all the strings in S1 plus any in S2 that are not in S1, while Si will contain
all and only the strings that appear in both S1 and S2.
Arrays as function arguments
One can write hansl functions that take as arguments any of the array types, and it is possible to
pass arrays as function arguments in “pointerized” form. In addition hansl functions may return any
of the array types. Here is a trivial example for strings:
function void printstrings (strings *S)
loop i=1..nelem(S)
printf "element %d: ’%s’\n", i, S[i]
endloop
end function
function strings mkstrs (int n)
strings S = array(n)
loop i=1..n
S[i] = sprintf("member %d", i)
endloop
return S
end function
strings Foo = mkstrs(5)
print Foo
printstrings(&Foo)
A couple of points are worth noting here. First, the nelem() function works to give the number of
elements in any of the “container” types (lists, arrays, bundles, matrices). Second, if you do print
Foo for Foo an array, you’ll see something like:
? print Foo
Array of strings, length 5
Nesting arrays
While gretl’s array structure is in itself one-dimensional you can add extra dimensions by nesting.
For example, the code below creates an array holding n arrays of m bundles.
arrays BB = array(n)
loop i=1..n
bundles BB[i] = array(m)
endloop
The syntax for setting or accessing any of the n × m bundles (or their members) is then on the
following pattern:
BB[i][j].m = I(3)
eval BB[i][j]
eval BB[i][j].m # or eval BB[i][j]["m"]
where the respective array subscripts are each put into square brackets.
The elements of an array of arrays must (obviously) all be arrays, but it’s not required that they
have a common content-type. For example, the following code creates an array holding an array of
matrices plus an array of strings.
arrays AA = array(2)
matrices AA[1] = array(3)
strings AA[2] = array(3)
Arrays and bundles
As mentioned, the bundle type is supported by the array mechanism. In addition, arrays (of whatever
type) can be put into bundles:
matrices M = array(8)
# set values of M[i] here...
bundle b
b.M = M
The mutual “packability” of bundles and arrays means that it’s possible to go quite far down the
rabbit-hole . . . users are advised not to get carried away.
11.9 The life cycle of gretl objects
Creation
The most basic way to create a new variable of any type is by declaration, where one states the type
followed by the name of the variable to create, as in
scalar x
series y
matrix A
and so forth. In that case the object in question is given a default initialization, as follows: a new
scalar has value NA (missing); a new series is filled with NAs; a new matrix is empty (zero rows and
columns); a new string is empty; a new list has no members; new bundles and new arrays are empty.
Declaration can be supplemented by a definite initialization, as in
scalar x = pi
series y = log(x)
matrix A = zeros(10,4)
The type of a new variable can be left implicit, as in
x = y/100
z = 3.5
Here the type of x will be determined automatically depending on the context. If y is a scalar, a series
or a matrix, x will inherit y’s type (otherwise an error will be generated, since division is applicable
to these types only). The new variable z will “naturally” be of scalar type.
In general, however, we recommend that you state the type of a new variable explicitly. This makes
the intent clearer to a reader of the script and also guards against errors that might otherwise be
difficult to understand (i.e. a certain variable turns out to be of the wrong type for some subsequent
calculation, but you don’t notice at first because you didn’t say what type you wanted). Exceptions
to this rule might reasonably be granted for clear and simple cases where there’s little possibility of
confusion.
Modification
Typically, the values of variables of all types are modified by assignment, using the =operator with
the name of the variable on the left and a suitable value or formula on the right:
z = normal()
x = 100 * log(y) - log(y(-1))
M = qform(a, X)
By a “suitable” value we mean one that is conformable for the type in question. A gretl variable
acquires its type when it is first created and this cannot be changed via assignment; for example, if
you have a matrix A and later want a string A, you will have to delete the matrix first.
One point to watch out for in gretl scripting is type conflicts having to do with the names of series brought in
from a data file. For example, in setting up a command loop (see chapter 13) it is very common to call the loop
index i. Now a loop index is a scalar (typically incremented each time round the loop). If you open a data file
that happens to contain a series named i you will get a type error (“Types not conformable for operation”) when
you try to use i as a loop index.
Although the type of an existing variable cannot be changed on the fly, gretl nonetheless tries to be
as “understanding” as possible. For example if x is an existing series and you say
x = 100
gretl will give the series a constant value of 100 rather than complaining that you are trying to assign
a scalar to a series. This issue is particularly relevant for the matrix type—see chapter 17 for details.
Besides using the regular assignment operator you also have the option of using an “inflected” equals
sign, as in the C programming language. This is shorthand for the case where the new value of the
variable is a function of the old value. For example,
x += 100 # in longhand: x = x + 100
x *= 100 # in longhand: x = x * 100
For scalar variables you can use a more condensed shorthand for simple increment or decrement by
1, namely trailing ++ or -- respectively:
x = 100
x-- # x now equals 99
x++ # x now equals 100
In the case of objects holding more than one value—series, matrices and bundles—you can modify
particular values within the object using an expression within square brackets to identify the elements
to access. We have discussed this above for the bundle type and chapter 17 goes into details for
matrices. As for series, there are two ways to specify particular values for modification: you can use
a simple 1-based index, or if the dataset is a time series or panel (or if it has marker strings that
identify the observations) you can use an appropriate observation string. Such strings are displayed
by gretl when you print data with the --byobs flag. Examples:
x[13] = 100 # simple index: the 13th observation
x[1995:4] = 100 # date: quarterly time series
x[2003:08] = 100 # date: monthly time series
x["AZ"] = 100 # the observation with marker string "AZ"
x[3:15] = 100 # panel: the 15th observation for the 3rd unit
Note that with quarterly or monthly time series there is no ambiguity between a simple index number
and a date, since dates always contain a colon. With annual time-series data, however, such ambiguity
exists and it is resolved by the rule that a number in brackets is always read as a simple index: x[1905]
means the nineteen-hundred and fifth observation, not the observation for the year 1905. You can
specify a year by quotation, as in x["1905"].
Destruction
Objects of the types discussed above, with the important exception of named lists, are all destroyed
using the delete command: delete objectname.
Lists are an exception for this reason: in the context of gretl commands, a named list expands to the
ID numbers of the member series, so if you say
delete L
for L a list, the effect is to delete all the series in L; the list itself is not destroyed, but ends up empty.
To delete the list itself (without deleting the member series) you must invert the command and use
the list keyword:
list L delete
Note that the delete command cannot be used within a loop construct (see chapter 13).
Chapter 12
Discrete variables
When a variable can take only a finite, typically small, number of values, then it is said to be discrete.
In gretl, variables of the series type (only) can be marked as discrete. (When we speak of “variables”
below this should be understood as referring to series.) Some gretl commands act in a slightly different
way when applied to discrete variables; moreover, gretl provides a few commands that only apply
to discrete variables. Specifically, the dummify and xtab commands (see below) are available only
for discrete variables, while the freq (frequency distribution) command produces different output for
discrete variables.
12.1 Declaring variables as discrete
Gretl uses a simple heuristic to judge whether a given variable should be treated as discrete, but you
also have the option of explicitly marking a variable as discrete, in which case the heuristic check is
bypassed.
The heuristic is as follows: First, are all the values of the variable “reasonably round”, where this is
taken to mean that they are all integer multiples of 0.25? If this criterion is met, we then ask whether
the variable takes on a “fairly small” set of distinct values, where “fairly small” is defined as less than
or equal to 8. If both conditions are satisfied, the variable is automatically considered discrete.
To mark a variable as discrete you have two options.
1. From the graphical interface, select “Variable, Edit Attributes” from the menu. A dialog box
will appear and, if the variable seems suitable, you will see a tick box labeled “Treat this variable
as discrete”. This dialog box can also be invoked via the context menu (right-click on a variable)
or by pressing the F2 key.
2. From the command-line interface, via the discrete command. The command takes one or
more arguments, which can be either variables or lists of variables. For example:
list xlist = x1 x2 x3
discrete z1 xlist z2
This syntax makes it possible to declare as discrete many variables at once, which cannot
presently be done via the graphical interface. The switch --reverse reverses the declaration of
a variable as discrete, or in other words marks it as continuous. For example:
discrete foo
# now foo is discrete
discrete foo --reverse
# now foo is continuous
The command-line variant is more powerful, in that you can mark a variable as discrete even if it
does not seem to be suitable for this treatment.
Note that marking a variable as discrete does not affect its content. It is the user’s responsibility
to make sure that marking a variable as discrete is a sensible thing to do. Note that if you want
to recode a continuous variable into classes, you can use gretl’s arithmetical functionality, as in the
following example:
nulldata 100
# generate a series with mean 2 and variance 1
series x = normal() + 2
# split into 4 classes
series z = (x>0) + (x>2) + (x>4)
# now declare z as discrete
discrete z
Once a variable is marked as discrete, this setting is remembered when you save the data file.
12.2 Commands for discrete variables
The dummify command
The dummify command takes as argument a series xand creates dummy variables for each distinct
value present in x, which must have already been declared as discrete. Example:
open greene22_2
discrete Z5 # mark Z5 as discrete
dummify Z5
The effect of the above command is to generate 5 new dummy variables, labeled DZ5_1 through DZ5_5,
which correspond to the different values in Z5. Hence, the variable DZ5_4 is 1 if Z5 equals 4 and 0
otherwise. This functionality is also available through the graphical interface by selecting the menu
item “Add, Dummies for selected discrete variables”.
The dummify command can also be used with the following syntax:
list dlist = dummify(x)
This not only creates the dummy variables, but also a named list (see section 15.1) that can be used
afterwards. The following example computes summary statistics for the variable Y for each value of
Z5:
open greene22_2
discrete Z5 # mark Z5 as discrete
list foo = dummify(Z5)
loop foreach i foo
smpl $i --restrict --replace
summary Y
endloop
smpl --full
Since dummify generates a list, it can be used directly in commands that call for a list as input, such
as ols. For example:
open greene22_2
discrete Z5 # mark Z5 as discrete
ols Y 0 dummify(Z5)
The freq command
The freq command displays absolute and relative frequencies for a given variable. The way frequencies
are counted depends on whether the variable is continuous or discrete. This command is also available
via the graphical interface by selecting the “Variable, Frequency distribution” menu entry.
For discrete variables, frequencies are counted for each distinct value that the variable takes. For
continuous variables, values are grouped into “bins” and then the frequencies are counted for each
bin. The number of bins, by default, is computed as a function of the number of valid observations
in the currently selected sample via the rule shown in Table 12.1. However, when the command is
invoked through the menu item “Variable, Frequency Plot”, this default can be overridden by the
user.
For example, the following code
open greene19_1
freq TUCE
discrete TUCE # mark TUCE as discrete
freq TUCE
Observations          Bins
8 ≤ n < 16            5
16 ≤ n < 50           7
50 ≤ n ≤ 850          ⌈√n⌉
n > 850               29
Table 12.1: Number of bins for various sample sizes
yields
Read datafile /usr/local/share/gretl/data/greene/greene19_1.gdt
periodicity: 1, maxobs: 32,
observations range: 1-32
Listing 5 variables:
0) const 1) GPA 2) TUCE 3) PSI 4) GRADE
? freq TUCE
Frequency distribution for TUCE, obs 1-32
number of bins = 7, mean = 21.9375, sd = 3.90151
interval midpt frequency rel. cum.
< 13.417 12.000 1 3.12% 3.12% *
13.417 - 16.250 14.833 1 3.12% 6.25% *
16.250 - 19.083 17.667 6 18.75% 25.00% ******
19.083 - 21.917 20.500 6 18.75% 43.75% ******
21.917 - 24.750 23.333 9 28.12% 71.88% **********
24.750 - 27.583 26.167 7 21.88% 93.75% *******
>= 27.583 29.000 2 6.25% 100.00% **
Test for null hypothesis of normal distribution:
Chi-square(2) = 1.872 with p-value 0.39211
? discrete TUCE # mark TUCE as discrete
? freq TUCE
Frequency distribution for TUCE, obs 1-32
frequency rel. cum.
12 1 3.12% 3.12% *
14 1 3.12% 6.25% *
17 3 9.38% 15.62% ***
19 3 9.38% 25.00% ***
20 2 6.25% 31.25% **
21 4 12.50% 43.75% ****
22 2 6.25% 50.00% **
23 4 12.50% 62.50% ****
24 3 9.38% 71.88% ***
25 4 12.50% 84.38% ****
26 2 6.25% 90.62% **
27 1 3.12% 93.75% *
28 1 3.12% 96.88% *
29 1 3.12% 100.00% *
Test for null hypothesis of normal distribution:
Chi-square(2) = 1.872 with p-value 0.39211
As can be seen from the sample output, a Doornik–Hansen test for normality is computed
automatically. This test is suppressed for discrete variables where the number of distinct values is less than
10.
This command accepts two options: --quiet, to avoid generation of the histogram when invoked
from the command line, and --gamma, for replacing the normality test with Locke’s nonparametric
test, whose null hypothesis is that the data follow a Gamma distribution.
If the distinct values of a discrete variable need to be saved, the values() matrix construct can be
used (see chapter 17).
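For example, continuing the TUCE example above:
matrix tv = values(TUCE)   # column vector holding the distinct values of TUCE
print tv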
The xtab command
The xtab command can be invoked in either of the following ways. First,
xtab ylist ; xlist
where ylist and xlist are lists of discrete variables. This produces cross-tabulations (two-way
frequencies) of each of the variables in ylist (by row) against each of the variables in xlist (by
column). Or second,
xtab xlist
In the second case a full set of cross-tabulations is generated; that is, each variable in xlist is
tabulated against each other variable in the list. In the graphical interface, this command is represented
by the “Cross Tabulation” item under the View menu, which is active if at least two variables are
selected.
Here is an example of use:
open greene22_2
discrete Z* # mark Z1-Z8 as discrete
xtab Z1 Z4 ; Z5 Z6
which produces
Cross-tabulation of Z1 (rows) against Z5 (columns)
[ 1][ 2][ 3][ 4][ 5] TOT.
[ 0] 20 91 75 93 36 315
[ 1] 28 73 54 97 34 286
TOTAL 48 164 129 190 70 601
Pearson chi-square test = 5.48233 (4 df, p-value = 0.241287)
Cross-tabulation of Z1 (rows) against Z6 (columns)
[ 9][ 12][ 14][ 16][ 17][ 18][ 20] TOT.
[ 0] 4 36 106 70 52 45 2 315
[ 1] 3 8 48 45 37 67 78 286
TOTAL 7 44 154 115 89 112 80 601
Pearson chi-square test = 123.177 (6 df, p-value = 3.50375e-24)
Cross-tabulation of Z4 (rows) against Z5 (columns)
[ 1][ 2][ 3][ 4][ 5] TOT.
[ 0] 17 60 35 45 14 171
[ 1] 31 104 94 145 56 430
TOTAL 48 164 129 190 70 601
Pearson chi-square test = 11.1615 (4 df, p-value = 0.0248074)
Cross-tabulation of Z4 (rows) against Z6 (columns)
[ 9][ 12][ 14][ 16][ 17][ 18][ 20] TOT.
[ 0] 1 8 39 47 30 32 14 171
[ 1] 6 36 115 68 59 80 66 430
TOTAL 7 44 154 115 89 112 80 601
Pearson chi-square test = 18.3426 (6 df, p-value = 0.0054306)
Pearson’s χ² test for independence is automatically displayed, provided that all cells have expected
frequencies under independence greater than 10⁻⁷. However, a common rule of thumb states that
this statistic is valid only if the expected frequency is 5 or greater for at least 80 percent of the cells.
If this condition is not met a warning is printed.
Additionally, the --row or --column options can be given: in this case, the output displays row or
column percentages, respectively.
If you want to cut and paste the output of xtab to some other program, e.g. a spreadsheet, you may
want to use the --zeros option; this option causes cells with zero frequency to display the number 0
instead of being empty.
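For example, using the same dataset as above, a brief sketch of these options:

open greene22_2
discrete Z*              # mark Z1-Z8 as discrete
xtab Z1 ; Z5 --row       # show row percentages instead of counts
xtab Z1 ; Z6 --zeros     # show zero-frequency cells as 0, e.g. for pasting into a spreadsheet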
Chapter 13
Loop constructs
13.1 Introduction
The command loop opens a special mode in which gretl accepts a block of commands to be repeated
zero or more times. This feature may be useful for, among other things, Monte Carlo simulations,
bootstrapping of test statistics and iterative estimation procedures. The general form of a loop is:
loop control-expression [ --progressive | --verbose ]
loop body
endloop
Five forms of control-expression are available, as explained in section 13.2.
Not all gretl commands are available within loops; the commands that are not presently accepted in
this context are shown in Table 13.1.
Table 13.1: Commands not usable in loops
function include nulldata quit run setmiss
By default, the genr command operates quietly in the context of a loop (without printing information
on the variable generated). To force the printing of feedback you may specify the --verbose option
to loop.
The --progressive option to loop modifies the behavior of the commands print and store, and
certain estimation commands, in a manner that may be useful with Monte Carlo analyses (see Section
13.4).
The following sections explain the various forms of the loop control expression and provide some
examples of use of loops.
If you are carrying out a substantial Monte Carlo analysis with many thousands of repetitions, memory capacity
and processing time may be an issue. To minimize the use of computer resources, run your script using the
command-line program, gretlcli, with output redirected to a file.
13.2 Loop control variants
Count loop
The simplest form of loop control is a direct specification of the number of times the loop should be
repeated. We refer to this as a “count loop”. The number of repetitions may be a numerical constant,
as in loop 1000, or may be read from a scalar variable, as in loop replics.
In the case where the loop count is given by a variable, say replics, in concept replics is an integer;
if the value is not integral, it is converted to an integer by truncation. Note that replics is evaluated
only once, when the loop is initially compiled.
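A minimal illustration of both points:

scalar replics = 3.7   # non-integral: truncated to 3 when the loop is compiled
loop replics
    printf "this line prints three times\n"
endloop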
While loop
A second sort of control expression takes the form of the keyword while followed by a Boolean
expression. For example,
loop while essdiff > .00001
Execution of the commands within the loop will continue so long as (a) the specified condition
evaluates as true and (b) the number of iterations does not exceed the value of the internal variable
loop_maxiter. By default this equals 100000, but you can specify a different value (or remove the
limit) via the set command (see the Gretl Command Reference).
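Here is a self-contained sketch; the halving sequence is purely illustrative:

set loop_maxiter 500   # optional: cap the number of iterations at 500
scalar x = 1
loop while x > 1.0e-6
    x = x / 2
endloop
printf "loop stopped with x = %g\n", x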
Index loop
A third form of loop control uses an index variable, for example i.¹ In this case you specify starting
and ending values for the index, as in loop i=1..20.
The index variable may be a pre-existing scalar; if this is not the case, the variable is created auto-
matically and is destroyed on exit from the loop.
The index may be used within the loop body in either of two ways: you can access the integer value
of i or you can use its string representation, $i.
The starting and ending values for the index can be given in numerical form, by reference to predefined
scalar variables, or as expressions that evaluate to scalars. In the latter two cases the variables are
evaluated once, at the start of the loop. In addition, with time series data you can give the starting
and ending values in the form of dates, as in loop i=1950:1..1999:4 for quarterly data.
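For example, the following shows both the numerical and the string use of the index:

loop i=1..3
    # i is available as a scalar, $i as its string representation
    printf "iteration %d (string form: $i)\n", i
endloop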
This form of loop control is intended to be quick and easy, and as such it is subject to certain limi-
tations. In particular, standard behavior is to increment the index variable by one at each iteration.
So, for example, if you have
loop i=m..n
where m and n are scalar variables with values m > n at the time of execution, the index will not be
decremented; rather, the loop will simply be bypassed.
One modification of this behavior is supported, via the option flag --decr (or -d for short). This
causes the index to be decremented by one at each iteration. For example,
loop i=m..n --decr
In this case the loop will be bypassed if m < n. If you need more flexible control, see the for form
below.
The index loop is particularly useful in conjunction with the values() matrix function when some
operation must be carried out for each value of some discrete variable (see chapter 12). Consider the
following example:
open greene22_2
discrete Z8
v8 = values(Z8)
loop i=1..rows(v8)
scalar xi = v8[i]
smpl Z8==xi --restrict --replace
printf "mean(Y | Z8 = %g) = %8.5f, sd(Y | Z8 = %g) = %g\n", \
xi, mean(Y), xi, sd(Y)
endloop
In this case, we evaluate the conditional mean and standard deviation of the variable Y for each value
of Z8.
Foreach loop
The fourth form of loop control also uses an index variable, in this case to index a specified set of
strings. The loop is executed once for each string in the list. This can be useful for performing
repetitive operations on a list of variables. Here is an example of the syntax:
¹ It is common programming practice to use simple, one-character names for such variables.
loop foreach i peach pear plum
print "$i"
endloop
This loop will execute three times, printing out “peach”, “pear” and “plum” on the respective iterations.
The numerical value of the index starts at 1 and is incremented by 1 at each iteration.
If you wish to loop across a list of variables that are contiguous in the dataset, you can give the names
of the first and last variables in the list, separated by .., rather than having to type all the names.
For example, say we have 50 variables AK, AL, ..., WY, containing income levels for the states of the
US. To run a regression of income on time for each of the states we could do:
genr time
loop foreach i AK..WY
ols $i const time
endloop
This loop variant can also be used for looping across the elements in a named list (see chapter 15).
For example:
list ylist = y1 y2 y3
loop foreach i ylist
ols $i const x1 x2
endloop
Note that if you use this idiom inside a function (see chapter 14), looping across a list that has been
supplied to the function as an argument, it is necessary to use the syntax listname.$i to reference
the list-member variables. In the context of the example above, this would mean replacing the third
line with
ols ylist.$i const x1 x2
Two other cases are supported: the target of foreach can be a named array of strings or a bundle
(see chapter 11). In the array case, $i gets (naturally) the string at position i in the array, from 1
to the number of elements; in the bundle case it gets the key-strings of all bundle members (in no
particular order). For a bundle b, the command print b gives a fairly terse account of the bundle’s
membership; for a full account you can do:
loop foreach i b
print "$i:"
eval b["$i"]
endloop
For loop
The final form of loop control emulates the for statement in the C programming language. The syntax
is loop for, followed by three component expressions, separated by semicolons and surrounded by
parentheses. The three components are as follows:
1. Initialization: This is evaluated only once, at the start of the loop. Common example: setting
a scalar control variable to some starting value.
2. Continuation condition: this is evaluated at the top of each iteration (including the first). If
the expression evaluates as true (non-zero), iteration continues, otherwise it stops. Common
example: an inequality expressing a bound on a control variable.
3. Modifier: an expression which modifies the value of some variable. This is evaluated prior to
checking the continuation condition, on each iteration after the first. Common example: a
control variable is incremented or decremented.
Here’s a simple example:
loop for (r=0.01; r<.991; r+=.01)
In this example the variable r will take on the values 0.01, 0.02, ..., 0.99 across the 99 iterations.
Note that due to the finite precision of floating point arithmetic on computers it may be necessary
to use a continuation condition such as the above, r<.991, rather than the more “natural” r<=.99.
(Using double-precision numbers on an x86 processor, at the point where you would expect r to equal
0.99 it may in fact have value 0.990000000000001.)
Any or all of the three expressions governing a for loop may be omitted—the minimal form is (;;).
If the continuation test is omitted it is implicitly true, so you have an infinite loop unless you arrange
for some other way out, such as a break statement (see section 13.3 below).
If the initialization expression in a for loop takes the common form of setting a scalar variable to a
given value, the string representation of that scalar’s value is made available within the loop via the
accessor $varname.
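For example, the following sketch prints the string form of the control variable at each pass:

loop for (r=0.25; r<=1; r+=0.25)
    # the string form of r is available via the accessor $r
    printf "current value of r: $r\n"
endloop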
13.3 Special controls
Besides the control afforded by the governing expression at the top of a loop, the flow of execution
can be adjusted via the keywords break and continue.
The break keyword terminates execution of the current loop immediately, while continue has the
effect of skipping any subsequent statements within the loop on the current iteration; execution will
proceed to the next iteration if the condition for continuation is still satisfied.
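The following sketch illustrates both keywords:

loop i=1..10
    if i % 2 == 0
        continue   # skip the rest of the body for even i
    endif
    if i > 7
        break      # leave the loop entirely
    endif
    printf "%d is odd and no greater than 7\n", i
endloop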
13.4 Progressive mode
If the --progressive option is given for a command loop, special behavior is invoked for certain
commands, namely print, store and simple estimation commands. By “simple” here we mean
commands which (a) estimate a single equation (as opposed to a system of equations) and (b) do so
by means of a single command statement (as opposed to a block of statements, as with nls and mle).
The paradigm is ols; other possibilities include tsls, wls, logit and so on.
The special behavior is as follows.
Estimators: The results from each individual iteration of the estimator are not printed. Instead, after
the loop is completed you get a printout of (a) the mean value of each estimated coefficient across
all the repetitions, (b) the standard deviation of those coefficient estimates, (c) the mean value of
the estimated standard error for each coefficient, and (d) the standard deviation of the estimated
standard errors. Note that this is useful only if there is some random input at each step.
print: When this command is used to print the value of a variable, its value is not printed each time
round the loop. Rather, when the loop is terminated you get a printout of the mean and standard
deviation of the variable, across the repetitions of the loop. This mode is intended for use with
variables that have a scalar value at each iteration, for example the sum of squared residuals from a
regression. Series cannot be printed in this way, and neither can matrices.
store: This command writes out the values of the specified scalars, from each time round the loop,
to a specified file. Thus it keeps a complete record of their values across the iterations. For example,
coefficient estimates could be saved in this way so as to permit subsequent examination of their
frequency distribution. Only one such store can be used in a given loop.
13.5 Loop examples
Monte Carlo example
A simple example of a Monte Carlo loop in “progressive” mode is shown in Listing 13.1.
This loop will print out summary statistics for the a and b estimates and R² across the 100 repetitions.
After running the loop, coeffs.gdt, which contains the individual coefficient estimates from all the
runs, can be opened in gretl to examine the frequency distribution of the estimates in detail.
The nulldata command is useful for Monte Carlo work. Instead of opening a “real” data set, nulldata
50 (for instance) creates an artificial dataset, containing just a constant and an index variable, with
Listing 13.1: Simple Monte Carlo loop [Download ]
nulldata 50
set seed 547
series x = 100 * uniform()
# open a "progressive" loop, to be repeated 100 times
loop 100 --progressive
series u = 10 * normal()
# construct the dependent variable
series y = 10*x + u
# run OLS regression
ols y const x
# grab the coefficient estimates and R-squared
scalar a = $coeff(const)
scalar b = $coeff(x)
scalar r2 = $rsq
# arrange for printing of stats on these
print a b r2
# and save the coefficients to file
store coeffs.gdt a b
endloop
50 observations. Constructed variables can then be added. See the set command for information on
generating repeatable pseudo-random series.
Iterated least squares
Listing 13.2 uses a “while” loop to replicate the estimation of a nonlinear consumption function of the
form

    C = α + βY^γ + ε
as presented in Greene (2000), Example 11.3. This script is included in the gretl distribution under
the name greene11_3.inp; you can find it in gretl under the menu item “File, Script files, Example
scripts, Greene...”.
The option --print-final for the ols command arranges matters so that the regression results will
not be printed each time round the loop, but the results from the regression on the last iteration will
be printed when the loop terminates.
Listing 13.3 shows how a loop can be used to estimate an ARMA model, exploiting the “outer product
of the gradient” (OPG) regression discussed by Davidson and MacKinnon (1993).
Further examples of loop usage that may be of interest can be found in chapter 21.
Listing 13.2: Nonlinear consumption function [Download ]
open greene11_3.gdt
# run initial OLS
ols C 0 Y
scalar essbak = $ess
scalar essdiff = 1
scalar beta = $coeff(Y)
scalar gamma = 1
# iterate OLS till the error sum of squares converges
loop while essdiff > .00001
# form the linearized variables
series C0 = C + gamma * beta * Y^gamma * log(Y)
series x1 = Y^gamma
series x2 = beta * Y^gamma * log(Y)
# run OLS
ols C0 0 x1 x2 --print-final --no-df-corr --vcv
beta = $coeff[2]
gamma = $coeff[3]
ess = $ess
essdiff = abs(ess - essbak)/essbak
essbak = ess
endloop
# print parameter estimates using their "proper names"
printf "alpha = %g\n", $coeff[1]
printf "beta = %g\n", beta
printf "gamma = %g\n", gamma
Listing 13.3: ARMA(1,1) [Download ]
# Estimation of an ARMA(1,1) model "manually", using a loop
open arma.gdt
scalar c = 0
scalar a = 0.1
scalar m = 0.1
series e = 0.0
series de_c = e
series de_a = e
series de_m = e
scalar crit = 1
loop while crit > 1.0e-9
# one-step forecast errors
e = y - c - a*y(-1) - m*e(-1)
# log-likelihood
scalar loglik = -0.5 * sum(e^2)
print loglik
# partials of e with respect to c, a, and m
de_c = -1 - m * de_c(-1)
de_a = -y(-1) -m * de_a(-1)
de_m = -e(-1) -m * de_m(-1)
# partials of l with respect to c, a and m
series sc_c = -de_c * e
series sc_a = -de_a * e
series sc_m = -de_m * e
# OPG regression
ols const sc_c sc_a sc_m --print-final --no-df-corr --vcv
# Update the parameters
c += $coeff[1]
a += $coeff[2]
m += $coeff[3]
# show progress
printf " constant = %.8g (gradient %#.6g)\n", c, $coeff[1]
printf " ar1 coefficient = %.8g (gradient %#.6g)\n", a, $coeff[2]
printf " ma1 coefficient = %.8g (gradient %#.6g)\n", m, $coeff[3]
crit = $T - $ess
print crit
endloop
scalar se_c = $stderr[1]
scalar se_a = $stderr[2]
scalar se_m = $stderr[3]
printf "\n"
printf "constant = %.8g (se = %#.6g, t = %.4f)\n", c, se_c, c/se_c
printf "ar1 coefficient = %.8g (se = %#.6g, t = %.4f)\n", a, se_a, a/se_a
printf "ma1 coefficient = %.8g (se = %#.6g, t = %.4f)\n", m, se_m, m/se_m
Chapter 14
User-defined functions
14.1 Defining a function
Gretl offers a mechanism for defining functions, which may be called via the command line, in the
context of a script, or (if packaged appropriately, see section 14.5) via the program’s graphical inter-
face.
The syntax for defining a function looks like this:
function type funcname (parameters)
function body
end function
The opening line of a function definition contains these elements, in strict order:
1. The keyword function.
2. type, which states the type of value returned by the function, if any. This must be one of void
(if the function does not return anything), scalar, series, matrix, list, string, bundle or
one of gretl’s array types, matrices, bundles, strings (see section 11.8).
3. funcname, the unique identifier for the function. Function names have a maximum length of
31 characters; they must start with a letter and can contain only letters, numerals and the
underscore character. You will get an error if you try to define a function having the same name
as an existing gretl command.
4. The function’s parameters, in the form of a comma-separated list enclosed in parentheses. This
may be run into the function name, or separated by white space as shown. In case the function
takes no arguments (unusual, but acceptable) this should be indicated by placing the keyword
void between the parameter-list parentheses.
Function parameters can be of any of the types shown below.¹
Type Description
bool scalar variable acting as a Boolean switch
int scalar variable acting as an integer
scalar scalar variable
series data series
list named list of series
matrix matrix or vector
string string variable or string literal
bundle all-purpose container (see section 11.7)
matrices array of matrices (see section 11.8)
bundles array of bundles
strings array of strings
Each element in the listing of parameters must include two terms: a type specifier, and the name by
which the parameter shall be known within the function. An example follows:
¹ An additional parameter type is available for GUI use, namely obs; this is equivalent to int except for the way it
is represented in the graphical interface for calling a function.
function scalar myfunc (series y, list xvars, bool verbose)
Each of the type-specifiers, with the exception of list and string, may be modified by prepending
an asterisk to the associated parameter name, as in
function scalar myfunc (series *y, scalar *b)
The meaning of this modification is explained below (see section 14.4); it is related to the use of
pointer arguments in the C programming language.
Function parameters: optional refinements
Besides the required elements mentioned above, the specification of a function parameter may include
some additional fields, as follows:
The const modifier.
For scalar or int parameters: minimum, maximum and/or default values; or for bool param-
eters, just a default value. And in addition for scalar parameters, a step size.
For optional arguments other than scalar, int and bool, the special default value null.
For all parameters, a descriptive string.
For int parameters with minimum and maximum values specified, a set of strings to associate
with the allowed numerical values (that is, value labels).
The first three of these options may be useful in many contexts; the last two may be helpful if a
function is to be packaged for use in the gretl GUI (but probably not otherwise). We now expand on
each of the options.
The const modifier: must be given as a prefix to the basic parameter specification, as in
const matrix M
This constitutes a promise that the corresponding argument will not be modified within the
function; gretl will flag an error if the function attempts to modify the argument.
Minimum, maximum and default values for scalar or int types: These values should directly
follow the name of the parameter, enclosed in square brackets and with the individual elements
separated by colons. For example, suppose we have an integer parameter order for which we
wish to specify a minimum of 1, a maximum of 12, and a default of 4. We can write
int order[1:12:4]
If you wish to omit any of the three specifiers, leave the corresponding field empty. For example
[1::4] would specify a minimum of 1 and a default of 4 while leaving the maximum unlimited.
However, as a special case, it is acceptable to give just one value, with no colons, in which case
the value is interpreted as a default. So for example
int k[0]
designates a default value of 0 for the parameter k, with no minimum or maximum specified. If
you wished to specify a minimum of zero with no maximum or default you would have to write
int k[0::]
For a parameter of type bool (whose values are just zero or non-zero), you can specify a default
of 1 (true) or 0 (false), as in
bool verbose[0]
Descriptive string: This will show up as an aid to the user if the function is packaged (see
section 14.5 below) and called via gretl’s graphical interface. The string should be enclosed in
double quotes and separated from the preceding elements of the parameter specification with a
space, as in
series y "dependent variable"
Value labels: These may be used only with int parameters for which minimum and maximum
values have been specified (so that there is a fixed number of admissible values) and the number
of labels must match the number of values. They will show up in the graphical interface in the
form of a drop-down list, making the function writer’s intent clearer when an integer argument
represents a categorical selection. A set of value labels must be enclosed in braces, and the
individual labels must be enclosed in double quotes and separated by commas or spaces. For
example:
int case[1:3:1] {"Fixed effects", "Between model", "Random effects"}
Step size: This is supported only for real-valued scalar parameters. It is relevant only in
the graphical interface, and must be combined with specification of minimum, maximum and
default values. The effect is that the argument selector takes the form of a“spin button”: clicking
on the up- or down-buttons in the selector changes the value by the specified step. Otherwise
the selector for such a parameter simply appears as an entry box into which a number should
be typed. For example, the following could be used to stipulate that a scalar argument should
have a minimum of 1, a maximum of 10, a default of 4 and a step size of 0.1:
scalar x[1:10:4:0.1]
If two or more of the trailing optional fields are given in a parameter specification, they must be given
in the order shown above: min/max/default, description, value labels. Note that there is no facility
for “escaping”characters within descriptive strings or value labels; these may contain spaces but they
cannot contain the double-quote character.
Here is an example of a well-formed function specification using all the elements mentioned above:
function matrix myfunc (series y "dependent variable",
list X "regressors",
int p[0::1] "lag order",
int c[1:2:1] "criterion" {"AIC", "BIC"},
bool quiet[0])
One advantage of specifying default values for parameters, where applicable, is that in script or
command-line mode users may omit trailing arguments that have defaults. For example, myfunc
above could be invoked with just two arguments, corresponding to y and X; implicitly p = 1, c = 1
and quiet is false.
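To illustrate with a smaller, self-contained case (the function powsum below is purely illustrative):

function scalar powsum (series y, int p[1:4:2] "power")
    return sum(y^p)
end function

open data4-1
eval powsum(price, 3)   # p given explicitly
eval powsum(price)      # p falls back to its default of 2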
Functions taking no parameters
You may define a function that has no parameters (these are called “routines” in some programming
languages). In this case, use the keyword void in place of the listing of parameters:
function matrix myfunc2 (void)
The function body
The function body is composed of gretl commands, or calls to user-defined functions (that is, function
calls may be nested). A function may call itself (that is, functions may be recursive). While the
function body may contain function calls, it may not contain function definitions. That is, you
cannot define a function inside another function. For further details, see section 14.4.
14.2 Calling a function
A user function is called by typing its name followed by zero or more arguments enclosed in paren-
theses. If there are two or more arguments they must be separated by commas.
There are automatic checks in place to ensure that the number of arguments given in a function
call matches the number of parameters, and that the types of the given arguments match the types
specified in the definition of the function. An error is flagged if either of these conditions is violated.
One qualification: allowance is made for omitting arguments at the end of the list, provided that
default values are specified in the function definition. To be precise, the check is that the number of
arguments is at least equal to the number of required parameters, and is no greater than the total
number of parameters.
In general, an argument to a function may be given either as the name of a pre-existing variable or
as an expression which evaluates to a variable of the appropriate type.
The following trivial example illustrates a function call that correctly matches the corresponding
function definition.
# function definition
function scalar ols_ess (series y, list xvars)
ols y 0 xvars --quiet
printf "ESS = %g\n", $ess
return $ess
end function
# main script
open data4-1
list xlist = 2 3 4
# function call (the return value is ignored here)
ols_ess(price, xlist)
The function call gives two arguments: the first is a data series specified by name and the second is
a named list of regressors. Note that while the function offers the Error Sum of Squares as a return
value, it is ignored by the caller in this instance. (As a side note here, if you want a function to
calculate some value having to do with a regression, but are not interested in the full results of the
regression, you may wish to use the --quiet flag with the estimation command as shown above.)
A second example shows how to write a function call that assigns a return value to a variable in the
caller:
# function definition
function series get_uhat (series y, list xvars)
ols y 0 xvars --quiet
return $uhat
end function
# main script
open data4-1
list xlist = 2 3 4
# function call
series resid = get_uhat(price, xlist)
14.3 Deleting a function
If you have defined a function and subsequently wish to clear it out of memory, you can do so using
the keywords delete or clear, as in
function myfunc delete
function get_uhat clear
Note, however, that if myfunc is already a defined function, providing a new definition automatically
overwrites the previous one, so it should rarely be necessary to delete functions explicitly.
14.4 Function programming details
Variables versus pointers
Most arguments to functions can be passed in two ways: “as they are”, or via pointers (the exception
is the list type, which cannot be passed as a pointer). First consider the following rather artificial
example:
function series triple1 (series x)
return 3*x
end function
function void triple2 (series *x)
x *= 3
end function
nulldata 10
series y = normal()
series y3 = triple1(y)
print y3
triple2(&y)
print y
These two functions produce essentially the same result—the two print statements in the caller will
show the same values—but in quite different ways. The first explicitly returns a modified version of
its input (which must be a plain series): after the call to triple1, y is unaltered; it would have been
altered only if the return value were assigned back to y rather than y3. The second function modifies
its input (given as a pointer to a series) in place without actually returning anything.
It’s worth noting that triple2 as it stands would not be considered idiomatic as a gretl function
(although it’s formally OK). The point here is just to illustrate the distinction between passing an
argument in the default way and in pointer form.
Why make this distinction? There are two main reasons for doing so: modularity and performance.
By modularity we mean the insulation of a function from the rest of the script which calls it. One
of the many benefits of this approach is that your functions are easily reusable in other contexts. To
achieve modularity, variables created within a function are local to that function, and are destroyed
when the function exits, unless they are made available as return values and these values are “picked
up” or assigned by the caller. In addition, functions do not have access to variables in “outer scope”
(that is, variables that exist in the script from which the function is called) except insofar as these
are explicitly passed to the function as arguments.
By default, when a variable is passed to a function as an argument, what the function actually “gets”
is a copy of the outer variable, which means that the value of the outer variable is not modified by
anything that goes on inside the function. This means that you can pass arguments to a function
without worrying about possible side effects; at the same time the function writer can use argument
variables as workspace without fear of disruptive effects at the level of the caller.
The use of pointers, however, allows a function and its caller to cooperate such that an outer variable
can be modified by the function. In effect, this allows a function to “return” more than one value
(although only one variable can be returned directly—see below). To indicate that a particular object
is to be passed as a pointer, the parameter in question is marked with a prefix of * in the function
definition, and the corresponding argument is marked with the complementary prefix & in the caller.
For example,
function series get_uhat_and_ess(series y, list xvars, scalar *ess)
ols y 0 xvars --quiet
ess = $ess
series uh = $uhat
return uh
end function
open data4-1
list xlist = 2 3 4
scalar SSR
series resid = get_uhat_and_ess(price, xlist, &SSR)
In the above, we may say that the function is given the address of the scalar variable SSR, and it
assigns a value to that variable (under the local name ess). (For anyone used to programming in
C: note that it is not necessary, or even possible, to “dereference” the variable in question within the
function using the * operator. Unadorned use of the name of the variable is sufficient to access the
variable in outer scope.)
An “address” parameter of this sort can be used as a means of offering optional information to the
caller. (That is, the corresponding argument is not strictly needed, but will be used if present). In
that case the parameter should be given a default value of null and the function should test to
see if the caller supplied a corresponding argument or not, using the built-in function exists(). For
example, here is the simple function shown above, modified to make the filling out of the ess value
optional.
function series get_uhat_and_ess(series y, list xvars, scalar *ess[null])
ols y 0 xvars --quiet
if exists(ess)
ess = $ess
endif
return $uhat
end function
If the caller does not care to get the ess value, it can use null in place of a real argument:
series resid = get_uhat_and_ess(price, xlist, null)
Alternatively, trailing function arguments that have default values may be omitted, so the following
would also be a valid call:
series resid = get_uhat_and_ess(price, xlist)
One limitation on the use of pointer-type arguments should be noted: you cannot supply a given
variable as a pointer argument more than once in any given function call. For example, suppose we
have a function that takes two matrix-pointer arguments,
function scalar pointfunc (matrix *a, matrix *b)
And suppose we have two matrices, x and y, at the caller level. The call
pointfunc(&x, &y)
is OK, but the call
pointfunc(&x, &x) # will not work
will generate an error. That’s because the situation inside the function would become too confusing,
with the same object present under two different names.
Const parameters
Pointer-type arguments may also be useful for optimizing performance. Even if a variable is not
modified inside the function, it may be a good idea to pass it as a pointer if it occupies a lot of
memory. Otherwise, the time gretl spends transcribing the value of the variable to the local copy
may be non-negligible compared to the time the function spends doing the job it was written for.
Listing 14.1 takes this to the extreme. We define two functions which return the number of rows of a
matrix (a pretty fast operation). The first gets a matrix as argument while the second gets a pointer
to a matrix. The functions are evaluated 500 times on a matrix with 2000 rows and 2000 columns;
on a typical system floating-point numbers take 8 bytes of memory, so the total size of the matrix is
roughly 32 megabytes.
Running the code in example 14.1 will produce output similar to the following (the actual numbers
of course depend on the machine you’re using):
Elapsed time:
without pointers (copy) = 2.47197 seconds,
with pointers (no copy) = 0.00378627 seconds
Listing 14.1: Performance comparison: values versus pointer [Download ]
function scalar rowcount1 (matrix X)
return rows(X)
end function
function scalar rowcount2 (const matrix *X)
return rows(X)
end function
set verbose off
X = zeros(2000,2000)
scalar r
set stopwatch
loop 500
r = rowcount1(X)
endloop
e1 = $stopwatch
set stopwatch
loop 500
r = rowcount2(&X)
endloop
e2 = $stopwatch
printf "Elapsed time:\n\
without pointers (copy) = %g seconds,\n \
with pointers (no copy) = %g seconds\n", e1, e2
If a pointer argument is used for this sort of purpose—and the object to which the pointer points
is not modified (is treated as read-only) by the function—one can signal this to the user by adding
the const qualifier, as shown for function rowcount2 in Listing 14.1. When a pointer argument is
qualified in this way, any attempt to modify the object within the function will generate an error.
However, combining the const flag with the pointer mechanism is technically redundant for the
following reason: if you mark a matrix argument as const then gretl will in fact pass it in pointer
mode internally (since it can’t be modified within the function there’s no downside to simply making
it available to the function rather than copying it). So in the example above we could revise the
signature of the second function as
function scalar rowcount2a (const matrix X)
and call it with r = rowcount2a(X), for the same speed-up relative to rowcount1.
From the caller’s point of view the second option—using the const modifier without pointer notation—
is preferable, as it allows the caller to pass an object created “on the fly”. Suppose the caller has
two matrices, A and B, in scope, and wishes to pass their vertical concatenation as an argument. The
following call would work fine:
r = rowcount2a(A|B)
To use rowcount2, on the other hand, the caller would have to create a named variable first (since
you cannot give the “address” of an anonymous object such as A|B):
matrix AB = A|B
r = rowcount2(&AB)
This requires an extra line of code, and leaves AB occupying memory after the call.
We have illustrated using a matrix parameter, but the const modifier may be used with the same
effect—namely, the argument is passed directly, without being copied, but is protected against mod-
ification within the function—for all the types that support the pointer apparatus.
List arguments
The use of a named list as an argument to a function gives a means of supplying a function with a set
of variables whose number is unknown when the function is written—for example, sets of regressors
or instruments. Within the function, the list can be passed on to commands such as ols.
A list argument can also be “unpacked” using a foreach loop construct, but this requires some care.
For example, suppose you have a list X and want to calculate the standard deviation of each variable
in the list. You can do:
loop foreach i X
scalar sd_$i = sd(X.$i)
endloop
Please note: a special piece of syntax is needed in this context. If we wanted to perform the above
task on a list in a regular script (not inside a function), we could do
loop foreach i X
scalar sd_$i = sd($i)
endloop
where $i gets the name of the variable at position i in the list, and sd($i) gets its standard deviation.
But inside a function, working on a list supplied as an argument, if we want to reference an individual
variable in the list we must use the syntax listname.varname. Hence in the example above we write
sd(X.$i).
This is necessary to avoid possible collisions between the name-space of the function and the name-
space of the caller script. For example, suppose we have a function that takes a list argument, and
that defines a local variable called y. Now suppose that this function is passed a list containing a
variable named y. If the two name-spaces were not separated either we’d get an error, or the external
variable y would be silently over-written by the local one. It is important, therefore, that list-argument
variables should not be “visible” by name within functions. To “get hold of” such variables you need
to use the form of identification just mentioned: the name of the list, followed by a dot, followed by
the name of the variable.
Constancy of list arguments When a named list of variables is passed to a function, the function is
actually provided with a copy of the list. The function may modify this copy (for instance, adding or
removing members), but the original list at the level of the caller is not modified.
Optional list arguments If a list argument to a function is optional, this should be indicated by
appending a default value of null, as in
function scalar myfunc (scalar y, list X[null])
In that case, if the caller gives null as the list argument (or simply omits the last argument) the
named list X inside the function will be empty. This possibility can be detected using the nelem()
function, which returns 0 for an empty list.
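A minimal sketch of this pattern (the function name is arbitrary):

function void report_regressors (list X[null])
    if nelem(X) == 0
        printf "no regressors supplied\n"
    else
        printf "%d regressor(s) supplied\n", nelem(X)
    endif
end function

open data4-1
list xl = sqft bedrms
report_regressors(xl)     # prints "2 regressor(s) supplied"
report_regressors(null)   # prints "no regressors supplied"
report_regressors()       # same: the trailing argument is simply omitted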
String arguments
String arguments can be used, for example, to provide flexibility in the naming of variables created
within a function. In the following example the function mavg returns a list containing two moving
averages constructed from an input series, with the names of the newly created variables governed by
the string argument.
function list mavg (series y, string vname)
list retlist = deflist()
string newname = sprintf("%s_2", vname)
retlist += genseries(newname, (y+y(-1)) / 2)
newname = sprintf("%s_4", vname)
retlist += genseries(newname, (y+y(-1)+y(-2)+y(-3)) / 4)
return retlist
end function
open data9-9
list malist = mavg(nocars, "nocars")
print malist --byobs
The last line of the script will print two variables named nocars_2 and nocars_4. For details on the
handling of named strings, see chapter 15.
If a string argument is considered optional, it may be given a null default value, as in
function scalar foo (series y, string vname[null])
Retrieving the names of arguments
The variables given as arguments to a function are known inside the function by the names of the
corresponding parameters. For example, within the function whose signature is
function void somefun (series y)
we have the series known as y. It may be useful, however, to be able to determine the names of the
variables provided as arguments. This can be done using the function argname, which takes the name
of a function parameter as its single argument and returns a string. Here is a simple illustration:
function void namefun (series y)
printf "the series given as ’y’ was named %s\n", argname(y)
end function
open data9-7
namefun(QNC)
This produces the output
the series given as ’y’ was named QNC
Please note that this will not always work: the arguments given to functions may be anonymous
variables, created on the fly, as in somefun(log(QNC)) or somefun(CPI/100). In that case the
argname function returns an empty string. Function writers who wish to make use of this facility
should check the return from argname using the strlen() function: if this returns 0, no name was
found.
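For example, a sketch of such a check:

function void namecheck (series y)
    string s = argname(y)
    if strlen(s) == 0
        printf "the argument was an anonymous expression\n"
    else
        printf "the argument was named %s\n", s
    endif
end function

open data9-7
namecheck(QNC)        # a named series
namecheck(log(QNC))   # an anonymous, on-the-fly series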
Return values
Functions can return nothing (just printing a result, perhaps), or they can return a single variable.
The return value, if any, is specified via a statement within the function body beginning with the
keyword return, followed by either the name of a variable (which must be of the type announced on
the first line of the function definition) or an expression which produces a value of the correct type.
Having a function return a list or bundle is a way of permitting the “return”of more than one variable.
For example, you can define several series inside a function and package them as a list; in this case
they are not destroyed when the function exits. Here is a simple example, which also illustrates the
possibility of setting the descriptive labels for variables generated in a function.
function list make_cubes (list xlist)
list cubes = deflist()
loop foreach i xlist
series $i3 = (xlist.$i)^3
setinfo $i3 -d "cube of $i"
list cubes += $i3
endloop
return cubes
end function
open data4-1
list xlist = price sqft
list cubelist = make_cubes(xlist)
print xlist cubelist --byobs
labels
For details about the case of returning a bundle, see Section 11.7, in particular the sub-section titled
“What are bundles good for?”.
A return statement causes the function to return (exit) at the point where it appears within the
body of the function. A function may also exit when (a) the end of the function code is reached (in
the case of a function with no return value), (b) a gretl error occurs, or (c) a funcerr statement is
reached.
The funcerr keyword—which may be followed by a string enclosed in double quotes, or the name of
a string variable, or nothing—causes a function to exit with an error flagged. If a string is provided
(either literally or via a variable), this is printed on exit, otherwise a generic error message is printed.
This mechanism enables the author of a function to pre-empt an ordinary execution error and/or
offer a more specific and helpful error message. For example,
if nelem(xlist) == 0
funcerr "xlist must not be empty"
endif
A function may contain more than one return statement, as in
function scalar multi (bool s)
if s
return 1000
else
return 10
endif
end function
However, it is recommended programming practice to have a single return point from a function
unless this is very inconvenient. The simple example above would be better written as
function scalar multi (bool s)
return s ? 1000 : 10
end function
Overloading
You may have noticed that several built-in functions in gretl are “overloaded”—that is, a given argu-
ment slot may accept more than one type of argument, and the return value may depend on the type
of the argument in question. For instance, the argument x for the pdf() function may be a scalar,
series or matrix and the return type will match that choice on the caller’s part.
Since gretl-2021b this possibility also exists for user-defined functions. The meta-type numeric can
be used in place of a specific type to accept a scalar, series or matrix argument, and similarly the
return-type of a function can be marked as numeric.
As a function writer you can choose to be more restrictive than the default (which allows scalar, series
or matrix for any numeric argument). For instance, if you write a function in which two arguments,
x and y, are specified as numeric you might decide to disallow the case where x is a matrix and y
a series, or vice versa, as too complicated. You can use the typeof() function to determine what
types of arguments were supplied, and the funcerr command or errorif() function to reject an
unsupported combination.
If your function is going to return a certain specific type (say, matrix) regardless of the type of the
input, then the return value should be labeled accordingly: use numeric for the return only if it’s
truly unknown in advance.
Listing 14.2 offers an (admittedly artificial) example: its numeric inputs can be scalars, series or
column vectors but they must be of a single type.
Naturally, if your overloaded function is intended for public use you should state clearly in its docu-
mentation what is supported and what is not.
Error checking
When gretl first reads and “compiles” a function definition there is minimal error-checking: the only
checks are that the function name is acceptable, and, so far as the body is concerned, that you
are not trying to define a function inside a function (see Section 14.1). Otherwise, if the function
body contains invalid commands this will become apparent only when the function is called and its
commands are executed.
Debugging
The usual mechanism whereby gretl echoes commands and reports on the creation of new variables
is by default suppressed when a function is being executed. If you want more verbose output from a
particular function you can use either or both of the following commands within the function:
set echo on
set messages on
Alternatively, you can achieve this effect for all functions via the command set debug 1. Usually
when you set the value of a state variable using the set command, the effect applies only to the
current level of function execution. For instance, if you do set messages on within function f1,
Listing 14.2: Example of overloaded function [Download ]
function numeric x_plus_b_y (numeric x, scalar b, numeric y)
errorif(typeof(x) != typeof(y), "x and y must be of the same type")
if typeof(x) <= 2 # scalar or series
return x + b*y
elif rows(x) == rows(y) && cols(x) == 1 && cols(y) == 1
return x + b*y
else
funcerr "x and y should be column vectors"
endif
end function
set seed 12345
# call 1: x and y are scalars
eval x_plus_b_y(10, 3, 2)
# call 2: x and y are vectors
matrix x = mnormal(10, 1)
matrix y = mnormal(10, 1)
eval x_plus_b_y(x, 2, y)
open data4-1
# call 3: x and y are series
series bb = x_plus_b_y(bedrms, 0.5, baths)
print bb --byobs
which in turn calls function f2, then messages will be printed for f1 but not f2. The debug variable,
however, acts globally; all functions become verbose regardless of their level.
Further, you can do set debug 2: in addition to command echo and the printing of messages, this
is equivalent to setting max_verbose (which produces verbose output from the BFGS maximizer) at
all levels of function execution.
14.5 Function packages
At various points above we have alluded to function packages, and the use of these via the gretl GUI.
This topic is covered in depth by the Gretl Function Package Guide. If you’re running gretl you can
find this under the Help menu. Alternatively you may download it from
https://sourceforge.net/projects/gretl/files/manual/
Chapter 15
Named lists and strings
15.1 Named lists
Many gretl commands take one or more lists of series as arguments. To make this easier to handle
in the context of command scripts, and in particular within user-defined functions, gretl offers the
possibility of named lists.
Creating and modifying named lists
A named list is created using the keyword list, followed by the name of the list, an equals sign,
and an expression that forms a list. The most basic sort of expression that works in this context is a
space-separated list of variables, given either by name or by ID number. For example,
list xlist = 1 2 3 4
list reglist = income price
Note that the variables in question must be of the series type.
Two abbreviations are available in defining lists:
You can use the wildcard character, “*”, to create a list of variables by name. For example,
dum* can be used to indicate all variables whose names begin with dum.
You can use two dots to indicate a range of variables. For example income..price indicates
the set of variables whose ID numbers are greater than or equal to that of income and less than
or equal to that of price.
In addition there are two special forms:
If you use the keyword null on the right-hand side, you get an empty list.
If you use the keyword dataset on the right, you get a list containing all the series in the
current dataset (except the pre-defined const).
The name of the list must start with a letter, and must be composed entirely of letters, numbers
or the underscore character. The maximum length of the name is 31 characters; list names cannot
contain spaces.
Once a named list has been created, it will be “remembered” for the duration of the gretl session
(unless you delete it), and can be used in the context of any gretl command where a list of variables
is expected. One simple example is the specification of a list of regressors:
list xlist = x1 x2 x3 x4
ols y 0 xlist
To get rid of a list, you use the following syntax:
list xlist delete
Be careful: delete xlist will delete the series contained in the list, so it implies data loss (which
may not be what you want). On the other hand, list xlist delete will simply “undefine” the
xlist identifier; the series themselves will not be affected.
Similarly, to print the names of the members of a list you have to invert the usual print command,
as in
list xlist print
If you just say print xlist the list will be expanded and the values of all the member series will be
printed.
Lists can be modified in various ways. To redefine an existing list altogether, use the same syntax as
for creating a list. For example
list xlist = 1 2 3
xlist = 4 5 6
After the second assignment, xlist contains just variables 4, 5 and 6.
To append or prepend variables to an existing list, we can make use of the fact that a named list
stands in for a “longhand” list. For example, we can do
list xlist = xlist 5 6 7
xlist = 9 10 xlist 11 12
Another option for appending a term (or a list) to an existing list is to use +=, as in
xlist += cpi
To drop a variable from a list, use -=:
xlist -= cpi
In most contexts where lists are used in gretl, it is expected that they do not contain any duplicated
elements. If you form a new list by simple concatenation, as in list L3 = L1 L2 (where L1 and L2
are existing lists), it’s possible that the result may contain duplicates. To guard against this you can
form a new list as the union of two existing ones:
list L3 = L1 || L2
The result is a list that contains all the members of L1, plus any members of L2 that are not already
in L1.
In the same vein, you can construct a new list as the intersection of two existing ones:
list L3 = L1 && L2
Here L3 contains all the elements that are present in both L1 and L2.
You can also subtract one list from another:
list L3 = L1 - L2
The result contains all the elements of L1 that are not present in L2.
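The following sketch pulls these operations together, using series from data4-1:

open data4-1
list L1 = price sqft
list L2 = sqft bedrms
list LU = L1 || L2   # union: price, sqft, bedrms
list LI = L1 && L2   # intersection: sqft
list LD = L1 - L2    # difference: price
list LU print
list LI print
list LD print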
Indexing into a defined list is also possible, as if it were a vector:
list L2 = L1[1:4]
This leaves L2 with the first four members of L1. Notice that the ordering of list members is path-
dependent.
Lists and matrices
There are two ways one can think of lists and matrices being interchangeable: either you think of a
list as a collection of references to series, or you may consider the rectangle of data given by the series
that the list contains.
In the former case, a list may be translated into (or created from) a one-dimensional matrix, that is a
vector. Therefore, the matrix in question must be interpretable as a vector containing ID numbers of
data series. It may be either a row or a column vector, and each of its elements must have an integer
part that is no greater than the number of variables in the data set. For example:
matrix m = {1,2,3,4}
list L = m
The above is OK provided the data set contains at least 4 variables. Conversely, the command
matrix m = L
will create a row vector with the ID numbers of the series referenced by L.
The latter case occurs when the matrix is assumed to contain valid data. To create a matrix from
the list, simply assign to a matrix the list name surrounded by curly brackets, as in
matrix m = { L }
Note the difference with the above: without the curly brackets, matrix m would have been just a
vector. Also note that any row corresponding to one or more missing entries will be dropped, unless
the skip_missing set variable is set to on.
For the reverse operation, gretl provides the mat2list function, which takes a matrix (say, X) as
argument and creates new series as well as a list containing them. The row dimension of X must equal
either the length of the current dataset or the number of observations in the current sample range.
The naming of the series in the returned list proceeds as follows. First, if the optional prefix argument
is supplied, the series created from column i of X is named by appending i to the given string.
Otherwise, if X has column names set, these names are used. Finally, if neither of the above conditions
is satisfied, the names are column1, column2 and so on.
For example,
matrix X = mnormal($nobs, 8)
list L = mat2list(X, "xnorm")
will add to the dataset eight full-length series named xnorm1, xnorm2 and so on.
Querying a list
You can determine the number of variables or elements in a list using the function nelem().
list xlist = 1 2 3
nl = nelem(xlist)
The (scalar) variable nl will be assigned a value of 3 since xlist contains 3 members.
You can determine whether a given series is a member of a specified list using the function inlist(),
as in
scalar k = inlist(L, y)
where L is a list and y a series. The series may be specified by name or ID number. The return value
is the (1-based) position of the series in the list, or zero if the series is not present in the list.
Generating lists of transformed variables
Given a named list of series, you are able to generate lists of transformations of these series using the
functions log, lags, diff, ldiff, sdiff or dummify. For example
list xlist = x1 x2 x3
list lxlist = log(xlist)
list difflist = diff(xlist)
When generating a list of lags in this way, you specify the maximum lag order inside the parentheses,
before the list name and separated by a comma. For example
list xlist = x1 x2 x3
list laglist = lags(2, xlist)
or
scalar order = 4
list laglist = lags(order, xlist)
These commands will populate laglist with the specified number of lags of the variables in xlist.
You can give the name of a single series in place of a list as the second argument to lags: this is
equivalent to giving a list with just one member.
The dummify function creates a set of dummy variables coding for all but one of the distinct values
taken on by the original variable, which should be discrete. (The smallest value is taken as the omitted
category.) Like lags, this function returns a list even if the input is a single series.
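For example, reusing the dataset from chapter 12, where Z5 takes the values 1 to 5 (so four dummies
are created, omitting the smallest value):

open greene22_2
discrete Z5
list dlist = dummify(Z5)
list dlist print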
Another useful operation you can perform with lists is creating interaction variables. Suppose you
have a discrete variable x_i, taking values from 1 to n, and a variable z_i, which could be continuous or
discrete. In many cases, you want to “split” z_i into a set of n variables via the rule

    z_i^(j) = z_i   when x_i = j
            = 0     otherwise;

in practice, you create dummies for the x_i variable first and then you multiply them all by z_i; these
are commonly called the interactions between x_i and z_i. In gretl you can do

    list H = D ^ Z

where D is a list of discrete series (or a single discrete series), Z is a list (or a single series)¹; all the
interactions will be created and listed together under the name H.
An example is provided in Listing 15.1.
Generating series from lists
There are various ways of retrieving or generating individual series from a named list. The most basic
method is indexing into the list. For example,
series x3 = Xlist[3]
will retrieve the third element of the list Xlist under the name x3 (or will generate an error if Xlist has fewer than three members).
In addition gretl offers several functions that apply to a list and return a series. In most cases, these
functions also apply to single series and behave as natural extensions when applied to lists, but this
is not always the case.
For recognizing and handling missing values, gretl offers several functions (see the Gretl Command
Reference for details). In this context, it is worth remarking that the ok() function can be used with
a list argument. For example,
list xlist = x1 x2 x3
series xok = ok(xlist)
After these commands, the series xok will have value 1 for observations where none of x1, x2, or x3
has a missing value, and value 0 for any observations where this condition is not met.
The functions max, min, mean, sd, sum and var behave “horizontally” rather than “vertically” when their argument is a list. For instance, the following commands
list Xlist = x1 x2 x3
series m = mean(Xlist)
produce a series m whose i-th element is the average of $x_{1,i}$, $x_{2,i}$ and $x_{3,i}$; missing values, if any, are implicitly discarded.

Listing 15.1: Usage of interaction lists [Download ]

Input:

open mroz87.gdt --quiet
# the coding below makes it so that
# KIDS = 0 -> no kids
# KIDS = 1 -> young kids only
# KIDS = 2 -> young or older kids
series KIDS = (KL6 > 0) + ((KL6 > 0) || (K618 > 0))
list D = CIT KIDS # interaction discrete variables
list X = WE WA # variables to "split"
list INTER = D ^ X
smpl 1 6
print D X INTER -o

Output (selected portions):

   CIT KIDS  WE  WA  WE_CIT_0
1    0    2  12  32        12
2    1    1  12  30         0
3    0    2  12  35        12
4    0    1  12  34        12
5    1    2  14  31         0
6    1    0  12  54         0

   WE_CIT_1  WA_CIT_0  WA_CIT_1  WE_KIDS_0  WE_KIDS_1
1         0        32         0          0          0
2        12         0        30          0         12
3         0        35         0          0          0
4         0        34         0          0         12
5        14         0        31          0          0
6        12         0        54         12          0

   WE_KIDS_2  WA_KIDS_0  WA_KIDS_1  WA_KIDS_2
1         12          0          0         32
2          0          0         30          0
3         12          0          0         35
4          0          0         34          0
5         14          0          0         31
6          0         54          0          0
In addition, gretl provides three functions for weighted operations: wmean, wsd and wvar. Consider as an illustration Table 15.1: the first three columns are GDP per capita for France, Germany and Italy; columns 4 to 6 contain the population for each country.

      YpcFR  YpcGE  YpcIT        NFR        NGE        NIT
1997  114.9  124.6  119.3  59830.635  82034.771  56890.372
1998  115.3  122.7  120.0  60046.709  82047.195  56906.744
1999  115.0  122.4  117.8  60348.255  82100.243  56916.317
2000  115.6  118.8  117.2  60750.876  82211.508  56942.108
2001  116.0  116.9  118.1  61181.560  82349.925  56977.217
2002  116.3  115.5  112.2  61615.562  82488.495  57157.406
2003  112.1  116.9  111.0  62041.798  82534.176  57604.658
2004  110.3  116.6  106.9  62444.707  82516.260  58175.310
2005  112.4  115.1  105.1  62818.185  82469.422  58607.043
2006  111.9  114.2  103.3  63195.457  82376.451  58941.499

Table 15.1: GDP per capita and population in 3 European countries (Source: Eurostat)

If we want to compute an aggregate indicator of per capita GDP, all we have to do is
list Ypc = YpcFR YpcGE YpcIT
list N = NFR NGE NIT
y = wmean(Ypc, N)
so for example
$$y_{1996} = \frac{114.9 \times 59830.635 + 124.6 \times 82034.771 + 119.3 \times 56890.372}{59830.635 + 82034.771 + 56890.372} = 120.163$$
See the Gretl Command Reference for more details.
15.2 Named strings
For some purposes it may be useful to save a string (that is, a sequence of characters) as a named
variable that can be reused.
Some examples of the definition of a string variable are shown below.
string s1 = "some stuff I want to save"
string s2 = getenv("HOME")
string s3 = s1 + 11
The first field after the type-name string is the name under which the string should be saved, then
comes an equals sign, then comes a specification of the string to be saved. This may take any of the
following forms:
• a string literal (enclosed in double quotes); or
• the name of an existing string variable; or
• a function that returns a string (see below); or
• any of the above followed by + and an integer offset.
The role of the integer offset is to use a substring of the preceding element, starting at the given
character offset. An empty string is returned if the offset is greater than the length of the string in
question.
To add to the end of an existing string you can use the operator ~=, as in
string s1 = "some stuff I want to "
string s1 ~= "save"
or you can use the ~ operator to join two or more strings, as in
string s1 = "sweet"
string s2 = "Home, " ~ s1 ~ " home."
Note that when you define a string variable using a string literal, only a single “escape” sequence is
recognized: if a backslash is immediately followed by a double-quote character this is interpreted as an
embedded quote. Otherwise all characters are treated literally. If you wish to use backslash-escapes
to denote newlines, tabs and so on, use the sprintf function instead (see the printf command for
an account of the escape-characters). This function can also be used to produce a string variable
whose definition involves the values of other variables, as in
scalar x = 8
foo = sprintf("var%d", x) # produces "var8"
String variables and string substitution
String variables can be used in two ways in scripting: the name of the variable can be typed “as is”, or
it may be preceded by the “at” sign, @. In the first variant the named string is treated as a variable in
its own right, while the second calls for “string substitution”. The context determines which of these
variants is appropriate.
In the following contexts the names of string variables should be given in plain form (without the “at”
sign):
• When such a variable appears among the arguments to the printf command or sprintf function.
• When such a variable is given as the argument to a function.
• On the right-hand side of a string assignment.
Here is an illustration of the use of a named string argument with printf:
? string vstr = "variance"
Generated string vstr
? printf "vstr: %12s\n", vstr
vstr: variance
String substitution can be used in contexts where a string variable is not acceptable as such. If gretl encounters the symbol @ followed directly by the name of a string variable, this notation is treated as a “macro”: the value of the variable is substituted literally into the command line before the regular parsing of the command is carried out.
One common use of string substitution is when you want to construct and use the name of a series programmatically. For example, suppose you want to create 10 random normal series named norm1 to
norm10. This can be accomplished as follows.
string sname
loop i=1..10
sname = sprintf("norm%d", i)
series @sname = normal()
endloop
Note that plain sname could not be used in the second line within the loop: the effect would be to
attempt to overwrite the string variable named sname with a series of the same name. What we want
is for the current value of sname to be dumped directly into the command that defines a series, and
the @ notation achieves that.
Another typical use of string substitution is when you want the options used with a particular command to vary depending on some condition. For example,
function void use_optstr (series y, list xlist, int verbose)
string optstr = verbose ? "" : "--simple-print"
ols y xlist @optstr
end function
open data4-1
list X = const sqft
use_optstr(price, X, 1)
use_optstr(price, X, 0)
When printing the value of a string variable using the print command, the plain variable name should
generally be used, as in
string s = "Just testing"
print s
The following variant is equivalent, though clumsy.
string s = "Just testing"
print "@s"
But note that this next variant does something quite different.
string s = "Just testing"
print @s
After string substitution, the print command reads
print Just testing
which attempts to print the values of two variables, Just and testing.
Built-in strings
Apart from any strings that the user may define, some string variables are defined by gretl itself.
These may be useful for people writing functions that include shell commands. The built-in strings
are as shown in Table 15.2.
gretldir the gretl installation directory
workdir user’s current gretl working directory
dotdir the directory gretl uses for temporary files
gnuplot path to, or name of, the gnuplot executable
tramo path to, or name of, the tramo executable
x12a path to, or name of, the x-12-arima executable
tramodir tramo data directory
x12adir x-12-arima data directory
Table 15.2: Built-in string variables
To access these as ordinary string variables, prepend a dollar sign (as in $dotdir); to use them in
string-substitution mode, prepend the at-sign (@dotdir).
Reading strings from the environment
It is possible to read into gretl’s named strings, values that are defined in the external environment.
To do this you use the function getenv, which takes the name of an environment variable as its
argument. For example:
? string user = getenv("USER")
Generated string user
? string home = getenv("HOME")
Generated string home
? printf "%s’s home directory is %s\n", user, home
cottrell’s home directory is /home/cottrell
To check whether you got a non-empty value from a given call to getenv, you can use the function
strlen, which retrieves the length of the string, as in
? string temp = getenv("TEMP")
Generated string temp
? scalar x = strlen(temp)
Generated scalar x = 0
Capturing strings via the shell
If shell commands are enabled in gretl, you can capture the output from such commands using the
syntax
string stringname = $(shellcommand)
That is, you enclose a shell command in parentheses, preceded by a dollar sign.
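For example, assuming shell commands have been enabled in gretl’s preferences, something like the following should capture the output of the standard Unix pwd command:
string wd = $(pwd)
printf "the shell reports the working directory as %s\n", wd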
Reading from a file into a string
You can read the content of a file into a string variable using the syntax
string stringname = readfile(filename)
The filename field may be given as a string variable. For example
? fname = sprintf("%s/QNC.rts", $x12adir)
Generated string fname
? string foo = readfile(fname)
Generated string foo
More string functions
Gretl offers several functions for creating or manipulating strings. You can find these listed and
explained in the Function Reference under the category Strings.
Chapter 16
String-valued series
16.1 Introduction
By a string-valued series we mean a series whose primary values are strings (though internally such
series comprise an integer coding plus a “dictionary” mapping from the integer values to strings). This
chapter explains how to create such series and describes the operations that are supported for them.
16.2 Creating a string-valued series
This can be done in three ways:
by reading such a series from a suitable source file;
by taking a suitable numerical series within gretl and adding string values using the stringify()
function; and
by direct assignment to a series from an array of strings.
In each case string values will be preserved when such a series is saved in a gretl-native data file.
Reading string-valued series
The primary “suitable source” for string-valued series is a delimited text data file (but see section 16.5
below). Here’s a little example. The following is the content of a file named gc.csv:
city,year
"Bilbao",2009
"Toru´n",2011
"Oklahoma City",2013
"Berlin",2015
"Athens",2017
"Naples",2019
A script to read this file and its output are shown in Listing 16.1, from which we can see a few things.
• By default the print command shows us the string values of the series city, and it handles non-ASCII characters provided they’re in UTF-8 (but it doesn’t handle longer strings very elegantly).
• The --numeric option to print exposes the integer codes for a string-valued series.
• The syntax seriesname[obs] yields a string when a series is string-valued.
If you want to access the numeric code for a particular string-valued observation you can get it by
“casting” the series in question to a vector (by wrapping the identifier in curly brackets). So, for
example,
printf "The code for ’%s’ is %d.\n", city[3], {city}[3]
gives
The code for ’Oklahoma City’ is 3.
The numeric codes for string-valued series are always assigned thus: reading the data file row by row,
the first string value is assigned 1, the next distinct string value is assigned 2, and so on.
Listing 16.1: Working with a string-valued series
Input:
open gc.csv --quiet
print --byobs
print city --byobs --numeric
printf "The third gretl conference took place in %s.\n", city[3]
Output:
? print --byobs
city year
1 Bilbao 2009
2 Toruń 2011
3 Oklahoma C.. 2013
4 Berlin 2015
5 Athens 2017
6 Naples 2019
? print city --byobs --numeric
city
1 1
2 2
3 3
4 4
5 5
6 6
The third gretl conference took place in Oklahoma City.
Assigning string values to a numeric series
This is done via the stringify() function, which takes two arguments, the name of a series and an
array of strings. For this to work two conditions must be met:
1. The series must have only integer values and the smallest value must be 1 or greater.
2. The array of strings must have at least n distinct members, where n is the largest value found
in the series.
The logic of these conditions is that we’re looking to create a mapping as described above, from a
1-based sequence of integers to a set of strings. However, we’re allowing for the possibility that the
series in question is an incomplete sample from an associated population. Suppose we have a series
that goes 2, 3, 5, 9, 10. This is taken to be a sample from a population that has at least 10 discrete
values, 1, 2, . . . , 10, and so requires at least 10 value-strings.
Here’s (a simplified version of) an example that one of the authors has had cause to use: deriving US-style “letter grades” from a series containing percentage scores for students. Call the percentage series x, and say we want to create a series with values A for x ≥ 90, B for 80 ≤ x < 90, and so on down to F for x < 60. Then we can do:
series grade = 1 # F, the least value
grade += x >= 60 # D
grade += x >= 70 # C
grade += x >= 80 # B
grade += x >= 90 # A
stringify(grade, strsplit("F D C B A"))
The way the grade series is constructed is not the most compact, but it’s nice and explicit, and
easy to amend if one wants to adjust the threshold values. Note the use of strsplit() to create an
on-the-fly array of strings from a string literal; this is convenient when the array contains a moderate
number of elements with no embedded spaces. An alternative way to get the same result is to define
the array of strings via the defarray() function, as in
stringify(grade, defarray("F","D","C","B","A"))
The inverse operation of stringify() is performed by the strvals() function: this retrieves the
array of distinct string values from a series (or returns an empty array if the series is not string-valued).
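For instance, applied to the grade series built above, something like this should retrieve and count its value-strings:
strings gvals = strvals(grade)
printf "grade has %d distinct values\n", nelem(gvals)   # 5
print gvals   # the value-strings F, D, C, B, A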
Assigning from an array of strings
Given an array of strings whose length matches the full length of the current dataset you can assign
directly to a series result, provided these conditions are satisfied: the dataset is not sub-sampled, and
if the assignment is to a pre-existing series it is not already string-valued.
Here’s a trivial example:
nulldata 6
strings S = defarray("a", "b", "c", "b", "a", "d")
series sx = S
print sx --byobs
Here’s a second example where we create a string-valued series using the “observation markers” from
the current dataset, after grabbing them as an array via the markers command:
open data4-10
markers --to-array=S
series state = S
print state --byobs
And here’s a third example where we construct the array of strings by reading from a text file:
nulldata 8
series sv = strsplit(readfile("ABCD.txt"))
print sv --byobs
This will work fine if the content of ABCD.txt is something like
A B C D D C B A
(containing 8 space-separated values, with or without line breaks). If the strings in question contain
embedded spaces you would have to make use of the optional second argument to strsplit.
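As a sketch, suppose a hypothetical file cities.txt holds the comma-separated names New York,Los Angeles,Chicago; then one could pass the comma as the separator:
nulldata 3
# cities.txt is assumed to contain: New York,Los Angeles,Chicago
series cv = strsplit(readfile("cities.txt"), ",")
print cv --byobs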
16.3 Permitted operations
One question that arises with string-valued series is, what exactly are you allowed to do with them?
The optimal policy may be debatable, but here we set out the current state of things.
Setting values per observation
You can set particular values in a string-valued series either by string or numeric code. For example,
suppose (in relation to the example in section 16.2) that for some reason student number 31 with a
percentage score of 88 nonetheless merits an A grade. We could do
grade[31] = "A"
or, if we’re confident about the mapping,
grade[31] = 5
Or to raise the student’s grade by one letter:
grade[31] += 1
What you’re not allowed to do here is make a numerical adjustment that would put the value out of
bounds in relation to the set of string values. For example, if we tried grade[31] = 6 we’d get an
error.
On the other hand, you can implicitly extend the set of string values. This wouldn’t make sense for the letter grades example but it might for, say, city names. Returning to the example in section 16.2, suppose we try
dataset addobs 1
year[7] = 2023
city[7] = "Gdańsk"
This will work: we’re implicitly adding another member to the string table for city; the associated numeric code will be the next available integer. (So please be careful: one may inadvertently add a new string value by mistyping a string that’s already present.)
Logical product of two string-valued series
The operator ^ can be used to produce what we might call the logical product of two string-valued series, as in
series sv3 = sv1 ^ sv2
The result is another string-valued series with value $s_i.s_j$ at observations where sv1 has value $s_i$ and sv2 has value $s_j$. For example, if at a given observation sv1 has value “A” and sv2 has value “X”, then sv3 will have value “A.X”. The set of strings attached to the resulting series will include all such string combinations even if they are not all represented in the given sample.
Assignment to a string-valued series
In an assignment statement where the left-hand side (LHS) term is an existing string-valued series
two general conditions must be met. First, the right-hand side (RHS) term must be a series (either
numeric or string-valued) and second, the assignment operator must be plain =; inflected operators
such as += and *= are not supported.
When the RHS series is numeric, all its values must be either integers between 1 and the number of
strings attached to the LHS series, or NA. This is required to preserve the integrity of the LHS. When
the RHS series is itself string-valued there are two cases to consider: there’s no sample restriction in
place, or there is such a restriction. In the unrestricted case the LHS series is in effect destroyed and
replaced by a clone of the RHS. Otherwise string values on the RHS are written into the LHS only
within the current sample range. If an RHS string is already present on the left its numerical code is
adjusted if necessary to match the LHS string table; if it is not present on the left it is appended to
the LHS string table.
Missing values
We support one exception to the general rule, never break the mapping between strings and numeric
codes for string-valued series: you can mark particular observations as missing. This is done in the
usual way, e.g.,
grade[31] = NA
Note, however, that on importing a string series from a delimited text file any non-blank strings
(including “NA”) will be interpreted as valid values; any missing values in such a file should therefore
be represented by blank cells.
Copying a string-valued series
If you make a copy of a string-valued series, as in
series foo = city
the string values are not copied over: you get a purely numerical series holding the codes of the
original series. But if you want a full copy with the string values that can easily be arranged:
series citycopy = city
stringify(citycopy, strvals(city))
String-valued series in other contexts
String-valued series can be used on the right-hand side of assignment statements at will, and in that
context their numerical values are taken. For example,
series y = sqrt(city)
will elicit no complaint and generate a numerical series 1, 1.41421, . . . It’s up to the user to judge
whether this sort of thing makes any sense.
Similarly, it’s up to the user to decide if it makes sense to use a string-valued series “as is” in a
regression model, whether as regressand or regressor—again, the numerical values of the series are
taken. Often this will not make sense, but sometimes it may: the numerical values may by design
form an ordinal, or even a cardinal, scale (as in the “grade” example in section 16.2).
More likely, one would want to use dummify on a string-valued series before using it in statistical
modeling. In that context gretl’s series labels are suitably informative. For example, suppose we have
a series race with numerical values 1, 2 and 3 and associated strings “White”, “Black” and “Other”.
Then the hansl code
list D = dummify(race)
labels
will show these labels:
Drace_2: dummy for race = ’Black’
Drace_3: dummy for race = ’Other’
Given a series such as race you can use its string values in a sample restriction, as in
smpl race == "Black" --restrict
(although race == 2 would also be acceptable).
Accessing string values
We have mentioned above two ways of accessing string values from a given series: via the syntax
seriesname[obs]
to obtain a single such value; and via the strvals() function to obtain an array holding all its
distinct values. Here we note a third option: direct assignment from a string-valued series to an array
of strings, as in
strings S = sv
where sv is a suitable series. In this case you get an array holding all the sv strings for observations
in the current sample range, not just the distinct values as with strvals.
16.4 String-valued series and functions
We first offer a few words on built-in functions that can be applied to string-valued series. The five functions substr, strsub, regsub, tolower and toupper all perform transformations on strings—respectively, extraction of a substring, replacement of a substring, replacement via regular expression, conversion to all lower-case and to all upper-case (see the Gretl Command Reference for details). These functions work on single strings, arrays of strings and also string-valued series. Note that when applied to a string-valued series these functions may reduce the number of distinct strings attached to the series. For example, some string values that are originally distinct may “collapse” into identity when converted to all lower-case. This possibility is handled by adjustment of the integer codes as needed.
A special case is presented by the built-in strvsort function: this does not return a modified string-
valued series but rather modifies such a series in place. It puts the string values into alphabetical
order and recalculates the integer codes so as to preserve the original association between observation
number and string. If, for example, the first observation had a string value of “X”, coded as 1, it will still have value “X” but its code will reflect the position of “X” in the alphabetized ordering. This can be
particularly useful if a dataset comprises several series having the same string values, but occurring in
various orders. The effect of running strvsort on such series will be to impose a common numerical
encoding.
User-defined hansl functions can also deal with string-valued series. If you supply such a series as an
argument to a hansl function its string values will be accessible within the function. One can test
whether a given series arg is string-valued as follows:
if nelem(strvals(arg)) > 0
# yes
else
# no
endif
It’s also possible, since gretl version 2023c, to put something like the code that generated the grade
series in section 16.2 into a function, and return the stringified series, as in the following (where we
assume that x contains percentage scores):
function series letter_grade (series x)
series grade
# define grade based on x and stringify it, as shown above
return grade
end function
An alternative means of achieving the same effect—and the only means available prior to gretl
2023c—is to define grade as a series at the level of the caller and pass it in “pointer” form to
letter_grade(), as in
function void letter_grade (series x, series *grade)
# define grade based on x and stringify it
end function
# caller
...
series grade
letter_grade(x, &grade)
As you’ll see from the account above, we don’t offer any very fancy facilities for string-valued series.
We’ll read them from suitable sources and we’ll create them natively via stringify—and we’ll try
to ensure that they retain their integrity—but we don’t, for example, take the specification of a
string-valued series as a regressor as an implicit request to include the dummification of its distinct
values.
16.5 Other import formats
In section 16.2 we illustrated the reading of string-valued series with reference to a delimited text
data file. Gretl can also handle several other sources of string-valued data, including the spreadsheet formats xls, xlsx, gnumeric and ods and (to a degree) the formats of Stata, SAS and SPSS.
Stata files
Stata supports two relevant sorts of variables: (1) those that are of “string type” and (2) variables of
one or other numeric type that have “value labels” defined. Neither of these is exactly equivalent to
what we call a “string-valued series” in gretl.
Stata variables of string type have no numeric representation; their values are literally strings, and
that’s all. Stata’s numeric variables with value labels do not have to be integer-valued and their least
value does not have to be 1; however, you can’t define a label for a value that is not an integer. Thus
in Stata you can have a series that comprises both integer and non-integer values, but only the integer
values can be labeled.2
This means that on import to gretl we can readily handle variables of string type from Stata’s dta
files. We give them a 1-based numeric encoding; this is arbitrary but does not conflict with any
information in the dta file. On the other hand, in general we’re not able to handle Stata’s numeric
variables with value labels; currently we report the value labels to the user but do not attempt to
store them in the gretl dataset. We could check such variables and import them as string-valued
series if they satisfy the criteria stated in section 16.2 but we don’t at present.
SAS and SPSS files
Gretl is able to read and preserve string values associated with variables from SAS “export” (xpt)
files, and also from SPSS sav files. Such variables seem to be on the same pattern as Stata variables
of string type.
Chapter 17
Matrix manipulation
Together with the other two basic types of data (series and scalars), gretl offers a quite comprehensive
array of matrix methods. This chapter illustrates the peculiarities of matrix syntax and discusses
briefly some of the more advanced matrix functions. For a full listing of matrix functions and a
comprehensive account of their syntax, please refer to the Gretl Command Reference.
In this chapter we’re concerned with real matrices; most of the points made here also apply to complex
matrices but see the following chapter for additional specifics on the complex case.
17.1 Creating matrices
Matrices can be created using any of these methods:
1. By direct specification of the scalar values that compose the matrix—either in numerical form,
or by reference to pre-existing scalar variables, or using computed values.
2. By providing a list of data series.
3. By providing a named list of series.
4. Via a suitable expression that references existing matrices and/or scalars, or via some special
functions.
To specify a matrix directly in terms of scalars, the syntax is, for example:
matrix A = {1, 2, 3 ; 4, 5, 6}
The matrix is defined by rows; the elements on each row are separated by commas and the rows are
separated by semi-colons. The whole expression must be wrapped in braces. Spaces within the braces
are not significant. The above expression defines a 2×3 matrix. Each element should be a numerical
value, the name of a scalar variable, or an expression that evaluates to a scalar. Directly after the closing brace you can append a single quote (') to obtain the transpose.
To specify a matrix in terms of data series the syntax is, for example,
matrix A = {x1, x2, x3}
where the names of the variables are separated by commas. Besides names of existing variables, you
can use expressions that evaluate to a series. For example, given a series x you could do
matrix A = {x, x^2}
Each variable occupies a column (and there can only be one variable per column). You cannot use
the semicolon as a row separator in this case: if you want the series arranged in rows, append the
transpose symbol. The range of data values included in the matrix depends on the current setting of
the sample range.
Instead of giving an explicit list of variables, you may instead provide the name of a saved list (see
Chapter 15), as in
list xlist = x1 x2 x3
matrix A = {xlist}
When you provide a named list, the data series are by default placed in columns, as is natural in an
econometric context: if you want them in rows, append the transpose symbol.
As a special case of constructing a matrix from a list of variables, you can say
matrix A = {dataset}
This builds a matrix using all the series in the current dataset, apart from the constant (variable 0).
When this dummy list is used, it must be the sole element in the matrix definition {...}. You can,
however, create a matrix that includes the constant along with all other variables using horizontal
concatenation (see below), as in
matrix A = {const}~{dataset}
By default, when you build a matrix from series that include missing values the data rows that contain
NAs are skipped. But you can modify this behavior via the command set skip_missing off. In that
case NAs are converted to NaN (“Not a Number”). In the IEEE floating-point standard, arithmetic
operations involving NaN always produce NaN. Alternatively, you can take greater control over the
observations (data rows) that are included in the matrix using the “set” variable matrix_mask, as in
set matrix_mask msk
where msk is the name of a series. Subsequent commands that form matrices from series or lists will
include only observations for which msk has non-zero (and non-missing) values. You can remove this
mask via the command set matrix_mask null.
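A minimal sketch, assuming the dataset contains series x1 and x2:
series msk = x1 > 0          # keep only rows where x1 is positive
set matrix_mask msk
matrix M = {x1, x2}          # rows with msk == 0 (or NA) are excluded
set matrix_mask null         # remove the mask again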
Names of matrices must satisfy the same requirements as names of gretl variables in general: the name can be
no longer than 31 characters, must start with a letter, and must be composed of nothing but letters, numbers and
the underscore character.
17.2 Empty matrices
The syntax
matrix A = {}
creates an empty matrix—a matrix with zero rows and zero columns.
The main purpose of the concept of an empty matrix is to enable the user to define a starting point
for subsequent concatenation operations. For instance, if X is an already defined matrix of any size,
the commands
matrix A = {}
matrix B = A ~ X
result in a matrix B identical to X.
Function         Return value      Function        Return value
A', transp(A)    A                 rows(A)         0
cols(A)          0                 rank(A)         0
det(A)           NA                ldet(A)         NA
tr(A)            NA                onenorm(A)      NA
infnorm(A)       NA                rcond(A)        NA

Table 17.1: Valid functions on an empty matrix, A
From an algebraic point of view, one can make sense of the idea of an empty matrix in terms of
vector spaces: if a matrix is an ordered set of vectors, then A={} is the empty set. As a consequence,
operations involving addition and multiplications don’t have any clear meaning (arguably, they have
none at all), but operations involving the cardinality of this set (that is, the dimension of the space
spanned by A) are meaningful.
Legal operations on empty matrices are listed in Table 17.1. (All other matrix operations generate
an error when an empty matrix is given as an argument.) In line with the above interpretation,
some matrix functions return an empty matrix under certain conditions: the functions diag, vec, vech, unvech when the argument is an empty matrix; the functions I, ones, zeros, mnormal, muniform when one or more of the arguments is 0; and the function nullspace when its argument has full column rank.
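For instance, since the identity matrix has full column rank, nullspace should return an empty matrix here, and rows() applied to that result gives 0:
matrix A = I(3)
matrix N = nullspace(A)   # empty: A has full column rank
scalar r = rows(N)        # r = 0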
17.3 Selecting submatrices
You can select submatrices of a given matrix using the syntax
A[rows,cols]
where rows can take any of these forms:
1. empty selects all rows
2. a single integer selects the single specified row
3. two integers separated by a colon selects a range of rows
4. the name of a matrix selects the specified rows
With regard to option 2, the integer value can be given numerically, as the name of an existing scalar
variable, or as an expression that evaluates to a scalar. With option 4, the index matrix given in the
rows field must be either p×1 or 1×p, and should contain integer values in the range 1 to n, where n is the number of rows in the matrix from which the selection is to be made.
The cols specification works in the same way, mutatis mutandis. Here are some examples.
matrix B = A[1,]
matrix B = A[2:3,3:5]
matrix B = A[2,2]
matrix idx = {1, 2, 6}
matrix B = A[idx,]
The first example selects row 1 from matrix A; the second selects a 2×3 submatrix; the third selects
a scalar; and the fourth selects rows 1, 2, and 6 from matrix A.
If the matrix in question is n×1 or 1×m, it is OK to give just one index specifier and omit the comma. For example, A[2] selects the second element of A if A is a vector. Otherwise the comma is mandatory.
In addition there are some predefined index specifications, represented by the keywords diag, lower, upper, real, imag and end. With the exception of end, these keywords imply specific row and column selections, and therefore cannot be combined with any additional, comma-separated term.
• The diag specification selects the principal diagonal of a matrix.
• lower and upper select, respectively, the elements of a matrix below and those above the principal diagonal.
• real and imag are specific to complex matrices and are described in chapter 18.
• end selects the last element in a given row or column. It can be employed in arithmetical expressions, so for example end-1 accesses the second-last element in a row or column; a brief illustration follows this list.
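Here is a small illustration of end (the comments show the values one should expect):
matrix A = {1, 2, 3; 4, 5, 6; 7, 8, 9}
scalar a13 = A[1,end]      # last element of row 1, i.e. 3
scalar a21 = A[end-1,1]    # element in the second-last row of column 1, i.e. 4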
You can use submatrix selections on either the right-hand side of a matrix-generating formula or the
left. Here is an example of use of a selection on the right, to extract a 2×2 submatrix B from a 3×3 matrix A, then the lower triangle of A:
matrix A = {1, 2, 3; 4, 5, 6; 7, 8, 9}
matrix B = A[1:2,2:3]
matrix C = A[lower]
And here are examples of selection on the left. The second line below writes a 2×2 identity matrix into the bottom right corner of the 3×3 matrix A. The fourth line replaces the diagonal of A with 1s.
matrix A = {1, 2, 3; 4, 5, 6; 7, 8, 9}
matrix A[2:3,2:3] = I(2)
matrix d = {1, 1, 1}
matrix A[diag] = d
When the lower and upper selections are used on the right, they yield a vector holding the elements in their scope. The ordering of the elements is column-major in both cases, as illustrated below for the 4×4 case (the diagonal is marked d, and the numbers give the position each infra- or supra-diagonal element takes in the resulting vector):
$$\begin{pmatrix} d & 1 & 2 & 4 \\ 1 & d & 3 & 5 \\ 2 & 4 & d & 6 \\ 3 & 5 & 6 & d \end{pmatrix}$$
This means that lower and upper do not produce the same result for symmetric matrices bigger than 3×3, which may seem unfortunate, but it gives the user a degree of flexibility in respect of the ordering of the elements. Suppose you have a non-symmetric matrix M and you’d like to extract the infradiagonal elements in row-major order: (M')[upper] will do the job.
When lower and upper are used on the left, the replacement must be either (a) a vector of length
equal to the number of elements in the selection or (b) a scalar value. In case (a) the elements of the
target matrix are filled in column-major order; in case (b) they are all set using the scalar.
One possible use of these tools is taking (say) a lower triangular matrix and rendering it symmetric
by copying the elements from beneath the diagonal to above. The way to get this right (assuming
you have a lower triangular matrix L) is
L[upper] = (L’)[upper] # note: not L[upper] = L[lower]
17.4 Deleting rows or columns
A variant of submatrix notation is available for convenience in dropping specified rows and/or columns
from a matrix, namely giving negative values for the indices. Here is a simple example,
matrix A = {1, 2, 3; 4, 5, 6; 7, 8, 9}
matrix B = A[-2,-3]
which creates B as a 2×2 matrix which drops row 2 and column 3 from A. Negative indices can also
be given in the form of an index vector:
matrix rdrop = {-1,-3,-5}
matrix B = A[rdrop,]
In this case B is formed by dropping rows 1, 3 and 5 from A (which must have at least 5 rows), but retaining the column dimension of A.
There are two limitations on the use of negative indices. First, the from:to range syntax described
in the previous section is not available, but you can use the seq function to achieve an equivalent
effect, as in
matrix A = muniform(1, 10)
matrix B = A[,-seq(3,7)]
where B drops columns 3 to 7 from A. Second, use of negative indices is valid only on the right-hand
side of a matrix calculation; there is no “negative index” equivalent of assignment to a submatrix, as
in
A[1:3,] = ones(3, cols(A))
17.5 Matrix operators
The following binary operators are available for matrices:
+     addition
-     subtraction
*     ordinary matrix multiplication
'     pre-multiplication by transpose
\     matrix “left division” (see below)
/     matrix “right division” (see below)
~     column-wise concatenation
|     row-wise concatenation
**    Kronecker product
==    test for equality
!=    test for inequality
In addition, the following operators (“dot” operators) apply on an element-by-element basis:
.+ .- .* ./ .^ .= .> .< .>= .<= .!=
Here are explanations of the less obvious cases.
For matrix addition and subtraction, in general the two matrices have to be of the same dimensions but an exception to this rule is granted if one of the operands is a 1×1 matrix or scalar. The scalar is implicitly promoted to the status of a matrix of the correct dimensions, all of whose elements are equal to the given scalar value. For example, if A is an m×n matrix and k a scalar, then the commands
matrix C = A + k
matrix D = A - k
both produce m×n matrices, with elements $c_{ij} = a_{ij} + k$ and $d_{ij} = a_{ij} - k$ respectively.
By “pre-multiplication by transpose” we mean, for example, that
matrix C = X'Y
produces the product of X-transpose and Y. In effect, the expression X'Y is shorthand for X'*Y, which is also valid syntax. In the special case X = Y, however, the two are not exactly equivalent. The former expression uses a specialized algorithm with two advantages: it is more efficient computationally, and ensures that the result is free of machine precision artifacts that may render it numerically non-symmetric. This, however, is unlikely to be an issue unless your X matrix is rather large (at least several hundred rows/columns).
In matrix “left division”, the statement
matrix X = A \ B
is interpreted as a request to find the matrix X that solves $AX = B$, so A and B must have the same number of rows. If A is a square matrix, this is in principle equivalent to $A^{-1}B$, which fails if A is singular; the numerical method employed here is the LU decomposition. If A is a T×k matrix with T > k, then X is the least-squares solution, $X = (A'A)^{-1}A'B$, which fails if $A'A$ is singular; the numerical method is the QR decomposition. Otherwise, the operation fails.
For matrix “right division”, as in X = A / B, X is the matrix that solves $XB = A$, so A and B must have the same number of columns. If B is non-singular this is in principle equivalent to $AB^{-1}$, otherwise X is the least-squares solution.
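As a small worked example of both operators (the comments show the solutions, which are easy to verify by hand):
matrix A = {2, 1; 1, 3}
matrix b = {5; 10}
matrix x = A \ b      # solves A*x = b, giving x = {1; 3}
matrix y = b' / A     # solves y*A = b', giving y = {1, 3}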
In “dot” operations a binary operation is applied element by element; the result of this operation
is obvious if the matrices are of the same size. However, there are several other cases where such
operators may be applied. For example, if we write
matrix C = A .- B
then the result C depends on the dimensions of A and B. Let A be an m×n matrix and let B be p×q; the result is as follows:

Case                                                     Result
Dimensions match (m = p and n = q)                       $c_{ij} = a_{ij} - b_{ij}$
A is a column vector; rows match (m = p; n = 1)          $c_{ij} = a_i - b_{ij}$
B is a column vector; rows match (m = p; q = 1)          $c_{ij} = a_{ij} - b_i$
A is a row vector; columns match (m = 1; n = q)          $c_{ij} = a_j - b_{ij}$
B is a row vector; columns match (p = 1; n = q)          $c_{ij} = a_{ij} - b_j$
A is a column vector; B is a row vector (n = 1; p = 1)   $c_{ij} = a_i - b_j$
A is a row vector; B is a column vector (m = 1; q = 1)   $c_{ij} = a_j - b_i$
A is a scalar (m = 1 and n = 1)                          $c_{ij} = a - b_{ij}$
B is a scalar (p = 1 and q = 1)                          $c_{ij} = a_{ij} - b$
If none of the above conditions are satisfied the result is undefined and an error is flagged.
Note that this convention makes it unnecessary, in most cases, to use diagonal matrices to perform transformations by means of ordinary matrix multiplication: if $Y = XV$, where V is diagonal, it is computationally much more convenient to obtain Y via the instruction
matrix Y = X .* v
where v is a row vector containing the diagonal of V.
In column-wise concatenation of an m×n matrix A and an m×p matrix B, the result is an m×(n+p) matrix. That is,
matrix C = A ~ B
produces $C = \begin{bmatrix} A & B \end{bmatrix}$.
Row-wise concatenation of an m×n matrix A and a p×n matrix B produces an (m+p)×n matrix. That is,
matrix C = A | B
produces $C = \begin{bmatrix} A \\ B \end{bmatrix}$.
17.6 Matrix–scalar operators
For matrix A and scalar k, the operators shown in Table 17.2 are available. (Addition and subtraction were discussed in section 17.5 but we include them in the table for completeness.) In addition, for square A and scalar x, B = A^x produces a matrix B which is A raised to the power x, but only if either of two conditions is satisfied. First, if x is a non-negative integer then Golub and Van Loan’s “Binary Powering” Algorithm 11.2.2 is used—see Golub and Van Loan (1996)—and A can then be a generic square matrix. Second, if A is positive semidefinite the power is computed via its eigen-decomposition and x can be a real number, subject to the constraint that x can be negative only if A is invertible.
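To illustrate the two cases, the following sketch uses small matrices whose powers are easy to verify by hand:
matrix A = {2, 1; 0, 2}
matrix B = A^3        # integer power via binary powering: B = {8, 12; 0, 8}
matrix C = {4, 0; 0, 9}
matrix D = C^0.5      # real power of a positive semidefinite matrix: D = {2, 0; 0, 3}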
17.7 Matrix functions
Most of the functions available for scalars and series also apply to matrices on an element-by-element
basis. This is the case for log, exp, sqrt, sin and many others. For example, if a matrix A is already
defined, then
Expression          Effect
matrix B = A * k    $b_{ij} = k a_{ij}$
matrix B = A / k    $b_{ij} = a_{ij}/k$
matrix B = k / A    $b_{ij} = k/a_{ij}$
matrix B = A + k    $b_{ij} = a_{ij} + k$
matrix B = A - k    $b_{ij} = a_{ij} - k$
matrix B = k - A    $b_{ij} = k - a_{ij}$
matrix B = A % k    $b_{ij} = a_{ij}$ modulo k

Table 17.2: Matrix–scalar operators
matrix B = sqrt(A)
generates a matrix such that $b_{ij} = \sqrt{a_{ij}}$. All such functions require a single matrix as argument, or an expression which evaluates to a single matrix. (Note that to find the “matrix square root” you need the cholesky function (see below). And since the exp function computes the exponential element by element, it does not return the matrix exponential unless the matrix is diagonal. To get the matrix exponential, use mexp.)
In this section, we review some aspects of functions that apply specifically to matrices. A full account
of each function is available in the Gretl Command Reference.
Matrix manipulation
bin2dec cnameset cols dec2bin diag diagcat
halton I lower mlag mnormal mrandgen
mreverse mshape msortby msplitby muniform ones
rnameset rows selifc selifr seq trimr
unvech upper vec vech zeros
Matrix algebra
cholesky cnumber commute conv2d det eigen
eigengen eigensym eigsolve fft ffti ginv
hdprod infnorm inv invpd ldet Lsolve
mexp mlog nullspace onenorm psdroot qform
qrdecomp rank rcond svd toepsolv tr
transp
Statistics/transformations
aggregate bkw corr corresp cov ecdf
fcstats ghk gini imaxc imaxr iminc
iminr kpsscrit maxc maxr mcorr mcov
mcovg meanc meanr minc minr mols
mpols mrls mxtab normtest npcorr princomp
prodc prodr quadtable quantile ranking resample
sdc sphericorr sst sumc sumr uniq
values
Numerical methods
BFGSmax BFGScmax fdjac fzero GSSmax NMmax
NRmax numhess simann
Table 17.3: Matrix functions by category
Matrix reshaping
In addition to the methods discussed in sections 17.1 and 17.3, a matrix can also be created by rearranging the elements of a pre-existing matrix. This is accomplished via the mshape function. It takes three arguments: the input matrix, A, and the rows and columns of the target matrix, r and c respectively. Elements are read from A and written to the target in column-major order. If A contains fewer elements than n = r × c, they are repeated cyclically; if A has more elements, only the first n are used.
For example:
matrix a = mnormal(2,3)
a
matrix b = mshape(a,3,1)
b
matrix b = mshape(a,5,2)
b
produces
? a
a
1.2323 0.99714 -0.39078
0.54363 0.43928 -0.48467
? matrix b = mshape(a,3,1)
Generated matrix b
? b
b
1.2323
0.54363
0.99714
? matrix b = mshape(a,5,2)
Replaced matrix b
? b
b
1.2323 -0.48467
0.54363 1.2323
0.99714 0.54363
0.43928 0.99714
-0.39078 0.43928
Multiple returns and the null keyword
Some functions take one or more matrices as arguments and compute one or more matrices; these
are:
eigensym Eigen-analysis of symmetric matrix
eigen Eigen-analysis of general matrix
mols Matrix OLS
qrdecomp QR decomposition
svd Singular value decomposition (SVD)
The general rule is: the “main” result of the function is always returned as the result proper. Auxiliary
returns, if needed, are retrieved using pre-existing matrices, which are passed to the function as
pointers (see 14.4). If such values are not needed, the pointer may be substituted with the keyword
null.
The syntax for qrdecomp and eigensym is of the form
matrix B = func(A, &C)
The first argument, A, represents the input data, that is, the matrix whose decomposition or analysis
is required. The second argument must be either the name of an existing matrix preceded by & (to
indicate the “address” of the matrix in question), in which case an auxiliary result is written to that
matrix, or the keyword null, in which case the auxiliary result is not produced.
In case a non-null second argument is given, the specified matrix will be over-written with the auxiliary
result. (It is not required that the existing matrix be of the right dimensions to receive the result.)
The function eigensym computes the eigenvalues, and optionally the right eigenvectors, of a symmetric n×n matrix. The eigenvalues are returned directly in a column vector of length n; if the eigenvectors are required, they are returned in an n×n matrix. For example:
matrix V = {}
matrix E = eigensym(M, &V)
matrix E = eigensym(M, null)
In the first case E holds the eigenvalues of M and V holds the eigenvectors. In the second, E holds the eigenvalues but the eigenvectors are not computed.
The function eigen computes the eigenvalues, and optionally the right and/or left eigenvectors, of a general n×n matrix. (The “legacy” function eigengen used to be the way to do this prior to gretl 2019d.) Following the input matrix argument there are two slots for matrix-addresses, the first to retrieve the right eigenvectors and the second for the left. Calls to this function should therefore conform to one of the following patterns.
# get the eigenvalues only
matrix E = eigen(M)
# get the right eigenvectors as well
matrix V = {}
matrix E = eigen(M, &V)
# get both sets of eigenvectors
matrix V = {}
matrix W = {}
matrix E = eigen(M, &V, &W)
# get the left eigenvectors but not the right
matrix W = {}
matrix E = eigen(M, null, &W)
The eigenvalues are returned directly in a complex n-vector. If the eigenvectors are wanted they are returned in an n×n complex matrix.
The qrdecomp function computes the QR decomposition of an m×n matrix A: $A = QR$, where Q is an m×n orthogonal matrix and R is an n×n upper triangular matrix. The matrix Q is returned directly, while R can be retrieved via the second argument. Here are two examples:
matrix R
matrix Q = qrdecomp(M, &R)
matrix Q = qrdecomp(M, null)
In the first example, the triangular R is saved as R; in the second, R is discarded. The first line above shows an example of a “simple declaration” of a matrix: R is declared to be a matrix variable but is not given any explicit value. In this case the variable is initialized as a 1×1 matrix whose single element equals zero.
The syntax for svd is
matrix B = func(A, &C, &D)
The function svd computes all or part of the singular value decomposition of the real m×n matrix A. Let k = min(m, n). The decomposition is
$$A = U \Sigma V'$$
where U is an m×k orthogonal matrix, Σ is a k×k diagonal matrix, and V' is a k×n orthogonal matrix. (This is not the only definition of the SVD: some writers define U as m×m, Σ as m×n with k non-zero diagonal elements, and V as n×n.) The diagonal elements of Σ are the singular values of A; they are real and non-negative, and are returned in descending order. The first k columns of U and V are the left and right singular vectors of A.
The svd function returns the singular values, in a vector of length k. The left and/or right singular vectors may be obtained by supplying non-null values for the second and/or third arguments respectively. For example:
matrix s = svd(A, &U, &V)
matrix s = svd(A, null, null)
matrix s = svd(A, null, &V)
In the first case both sets of singular vectors are obtained, in the second case only the singular values are obtained; and in the third, the right singular vectors are obtained but U is not computed. Please note: when the third argument is non-null, it is actually V' that is provided. To reconstitute the original matrix from its SVD, one can do:
matrix s = svd(A, &U, &V)
matrix B = (U.*s)*V
Finally, the syntax for mols is
matrix B = mols(Y, X, &U)
This function returns the OLS estimates obtained by regressing the T×n matrix Y on the T×k matrix X, that is, a k×n matrix holding $(X'X)^{-1}X'Y$. The Cholesky decomposition is used. The matrix U, if not null, is used to store the residuals.
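A small self-contained sketch of its use, with simulated data chosen purely for illustration:
matrix X = ones(50,1) ~ mnormal(50,2)        # 50x3 regressor matrix including a constant
matrix y = X * {1; 0.5; -0.5} + 0.1 * mnormal(50,1)
matrix U                                     # will receive the residuals
matrix b = mols(y, X, &U)                    # 3x1 vector of OLS coefficients
print b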
Reading and writing matrices from/to text files
The two functions mread and mwrite can be used for basic matrix input/output. This can be useful
to enable gretl to exchange data with other programs.
The mread function accepts one string parameter: the name of the (plain text) file from which the
matrix is to be read. The file in question may start with any number of comment lines, defined as lines
that start with the hash mark, #; such lines are ignored. Beyond that, the content must conform
to the following rules:
1. The first non-comment line must contain two integers, separated by a space or a tab, indicating
the number of rows and columns, respectively.
2. The columns must be separated by spaces or tab characters.
3. The decimal separator must be the dot . character.
Should an error occur (such as the file being badly formatted or inaccessible), an empty matrix (see
section 17.2) is returned.
The complementary function mwrite produces text files formatted as described above. The column separator is the tab character, so import into spreadsheets should be straightforward. Usage is illustrated in Listing 17.1. Matrices stored via the mwrite command can be easily read by other programs; the following table summarizes the appropriate commands for reading a matrix A from a file called a.mat in some widely-used programs. (Matlab users may find the Octave example helpful, since the two programs are mostly compatible with one another.) Note that the Python example requires that the numpy module is loaded.
Program   Sample code
GAUSS     tmp[] = load a.mat;
          A = reshape(tmp[3:rows(tmp)],tmp[1],tmp[2]);
Octave    fd = fopen("a.mat");
          [r,c] = fscanf(fd, "%d %d", "C");
          A = reshape(fscanf(fd, "%g", r*c),c,r)';
          fclose(fd);
Ox        decl A = loadmat("a.mat");
R         A <- as.matrix(read.table("a.mat", skip=1))
Python    A = numpy.loadtxt('a.mat', skiprows=1)
Julia     A = readdlm("a.mat", skipstart=1)
Listing 17.1: Matrix input/output via text files [Download ]
nulldata 64
scalar n = 3
string f1 = "a.csv"
string f2 = "b.csv"
matrix a = mnormal(n,n)
matrix b = inv(a)
err = mwrite(a, f1)
if err != 0
fprintf "Failed to write %s\n", f1
else
err = mwrite(b, f2)
endif
if err != 0
fprintf "Failed to write %s\n", f2
else
c = mread(f1)
d = mread(f2)
a = c*d
printf "The following matrix should be an identity matrix\n"
print a
endif
Optionally, the mwrite and mread functions can use gzip compression: this is invoked if the name of the matrix file has the suffix “.gz”. In this case the elements of the matrix are written in a single column. Note, however, that compression should not be applied when writing matrices for reading by third-party software unless you are sure that the software can handle compressed data.
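For example (assuming the working directory is writable), a round-trip along these lines should work:
matrix a = mnormal(4,4)
err = mwrite(a, "a.mat.gz")     # compressed, on account of the .gz suffix
matrix b = mread("a.mat.gz")    # read back, decompressed automatically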
17.8 Matrix accessors
In addition to the matrix functions discussed above, various “accessor” strings allow you to create copies of internal matrices associated with models previously estimated. These are set out in Table 17.4.

$coeff     matrix of estimated coefficients
$compan    companion matrix (after VAR or VECM estimation)
$jalpha    matrix α (loadings) from Johansen’s procedure
$jbeta     matrix β (cointegration vectors) from Johansen’s procedure
$jvbeta    covariance matrix for the unrestricted elements of β from Johansen’s procedure
$rho       autoregressive coefficients for error process
$sigma     residual covariance matrix
$stderr    matrix of estimated standard errors
$uhat      matrix of residuals
$vcv       covariance matrix of parameter estimates
$vma       VMA matrices in stacked form (see section 32.2)
$yhat      matrix of fitted values

Table 17.4: Matrix accessors for model data

Many of the accessors in Table 17.4 behave somewhat differently depending on the sort of model that is referenced, as follows:
• Single-equation models: $sigma gets a scalar (the standard error of the regression); $coeff and $stderr get column vectors; $uhat and $yhat get series.
• System estimators: $sigma gets the cross-equation residual covariance matrix; $uhat and $yhat get matrices with one column per equation. The format of $coeff and $stderr depends on the nature of the system: for VARs and VECMs (where the matrix of regressors is the same for all equations) these return matrices with one column per equation, but for other system estimators they return a big column vector.
• VARs and VECMs: $vcv is not available, but $(X'X)^{-1}$ (where X is the common matrix of regressors) is available as $xtxinv, such that for VARs and VECMs (without restrictions on α) a vcv equivalent can be easily and efficiently constructed as $sigma ** $xtxinv.
If the accessors are given without any prefix, they retrieve results from the last model estimated, if
any. Alternatively, they may be prefixed with the name of a saved model plus a period (.), in which
case they retrieve results from the specified model. Here are some examples:
matrix u = $uhat
matrix b = m1.$coeff
matrix v2 = m1.$vcv[1:2,1:2]
The first command grabs the residuals from the last model; the second grabs the coefficient vector
from model m1; and the third (which uses the mechanism of submatrix selection described above)
grabs a portion of the covariance matrix from model m1.
If the model in question is a VAR or VECM (only), $compan and $vma return the companion matrix and the VMA matrices in stacked form, respectively (see section 32.2 for details). After a vector error correction model is estimated via Johansen’s procedure, the matrices $jalpha and $jbeta are also available. These have a number of columns equal to the chosen cointegration rank; therefore, the product
product
matrix Pi = $jalpha * $jbeta’
returns the reduced-rank estimate of A(1). Since β is automatically identified via the Phillips normalization (see section 33.5), its unrestricted elements do have a proper covariance matrix, which can be retrieved through the $jvbeta accessor.
17.9 Namespace issues
Matrices share a common namespace with data series and scalar variables. In other words, no two
objects of any of these types can have the same name. It is an error to attempt to change the type
of an existing variable, for example:
scalar x = 3
matrix x = ones(2,2) # wrong!
It is possible, however, to delete or rename an existing variable then reuse the name for a variable of
a different type:
scalar x = 3
delete x
matrix x = ones(2,2) # OK
17.10 Creating a data series from a matrix
Section 17.1 above describes how to create a matrix from a data series or set of series. You may
sometimes wish to go in the opposite direction, that is, to copy values from a matrix into a regular
data series. The syntax for this operation is
series sname = mspec
where sname is the name of the series to create and mspec is the name of the matrix to copy from,
possibly followed by a matrix selection expression. Here are two examples.
series s = x
series u1 = U[,1]
It is assumed that x and U are pre-existing matrices. In the second example the series u1 is formed
from the first column of the matrix U.
For this operation to work, the matrix (or matrix selection) must be a vector with length equal to
either the full length of the current dataset, n, or the length of the current sample range, n'. If n' < n
then only n' elements are drawn from the matrix; if the matrix or selection comprises n elements, the
n' values starting at element t1 are used, where t1 represents the starting observation of the sample
range. Any values in the series that are not assigned from the matrix are set to the missing code.
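Here is a small sketch of the sample-range behavior just described, using an artificial dataset:
nulldata 10                 # full dataset length n = 10
smpl 4 8                    # current sample range of 5 observations
matrix v = seq(1,5)'        # column vector of length 5
series s = v                # values go into observations 4 to 8
smpl full
print s --byobs             # observations outside 4..8 show NA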
17.11 Matrices and lists
To facilitate the manipulation of named lists of variables (see Chapter 15), it is possible to convert
between matrices and lists. In section 17.1 above we mentioned the facility for creating a matrix from
a list of variables, as in
matrix M = { listname }
That formulation, with the name of the list enclosed in braces, builds a matrix whose columns hold
the variables referenced in the list. What we are now describing is a different matter: if we say
matrix M = listname
(without the braces), we get a row vector whose elements are the ID numbers of the variables in
the list. This special case of matrix generation cannot be embedded in a compound expression. The
syntax must be as shown above, namely simple assignment of a list to a matrix.
To go in the other direction, you can include a matrix on the right-hand side of an expression that
defines a list, as in
list Xl = M
where M is a matrix. The matrix must be suitable for conversion; that is, it must be a row or column
vector containing non-negative integer values, none of which exceeds the highest ID number of a series
in the current dataset.
Listing 17.2 illustrates the use of this sort of conversion to “normalize” a list, moving the constant
(variable 0) to first position.
Listing 17.2: Manipulating a list [Download ]
function void normalize_list (matrix *x)
# If the matrix (representing a list) contains var 0,
# but not in first position, move it to first position
if (x[1] != 0)
scalar k = cols(x)
loop for (i=2; i<=k; i++)
if (x[i] == 0)
x[i] = x[1]
x[1] = 0
break
endif
endloop
endif
end function
open data9-7
list Xl = 2 3 0 4
matrix x = Xl
normalize_list(&x)
list Xl = x
list Xl print
17.12 Deleting a matrix
To delete a matrix, just write
delete M
where M is the name of the matrix to be deleted.
17.13 Printing a matrix
To print a matrix, the easiest way is to give the name of the matrix in question on a line by itself,
which is equivalent to using the print command:
matrix M = mnormal(100,2)
M
print M
You can get finer control on the formatting of output by using the printf command, as illustrated
in the interactive session below:
? matrix Id = I(2)
matrix Id = I(2)
Generated matrix Id
? print Id
print Id
Id (2 x 2)
1 0
0 1
? printf "%10.3f", Id
1.000 0.000
0.000 1.000
For presentation purposes you may wish to give titles to the columns of a matrix. For this you can
use the cnameset function: the first argument is a matrix and the second is either a named list of
variables, whose names will be used as headings, or a string that contains as many space-separated
substrings as the matrix has columns. For example,
? matrix M = mnormal(3,3)
? cnameset(M, "foo bar baz")
? print M
M (3 x 3)
foo bar baz
1.7102 -0.76072 0.089406
-0.99780 -1.9003 -0.25123
-0.91762 -0.39237 -1.6114
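For the list variant of the second argument, a minimal sketch (re-using the data4-1 sample file that also appears in Listing 17.3 below) might be:
open data4-1
list XL = const sqft
matrix X = { XL }     # matrix whose columns hold const and sqft
cnameset(X, XL)       # column headings taken from the series names
print X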
17.14 Example: OLS using matrices
Listing 17.3 shows how matrix methods can be used to replicate gretl’s built-in OLS functionality.
Listing 17.3: OLS via matrix methods [Download ]
open data4-1
matrix X = { const, sqft }
matrix y = { price }
matrix b = invpd(X’X) * X’y
print "estimated coefficient vector"
b
matrix u = y - X*b
scalar SSR = u’u
scalar s2 = SSR / (rows(X) - rows(b))
matrix V = s2 * inv(X’X)
V
matrix se = sqrt(diag(V))
print "estimated standard errors"
se
# compare with built-in function
ols price const sqft --vcv
Chapter 18
Complex matrices
18.1 Introduction
Native support for complex matrices was added to gretl in version 2019d. (Prior to that release gretl
offered improvised support for some complex functionality; see section 18.7 for details.) Not all of
hansl’s matrix functions accept complex input, but we have enabled a sizable subset of these functions,
which should suffice for most econometric purposes.
Complex numbers are not used in most areas of econometrics, but there are a few notable exceptions:
among these, complex numbers allow for an elegant treatment of univariate spectral analysis of
time series, and become indispensable if you consider multivariate spectral analysis—see for example
Shumway and Stoffer (2017). A more recent example is the numerical solution of linear models with
rational expectations, which are widely used in modern macroeconomics, for which the complex Schur
factorization has become the tool of choice (Klein,2000).
A first point to note is that complex values are treated as a special case of the hansl matrix type;
there’s no complex type as such. Complex scalars fall under the matrix type as 1 × 1 matrices; the
hansl scalar type is only for real values (as is the series type). A 1 × 1 complex matrix should do
any work you might require of a complex scalar.
Before we proceed to the details of complex matrices in gretl, here’s a brief reminder of the relevant
concepts and notation. Complex numbers are pairs of the form a + bi, where a and b are real numbers
and i is defined as the square root of −1: a is the real part and b the imaginary part. One can
specify a complex number either via a and b or in “polar” form. The latter pertains to the complex
plane, which has the real component on the horizontal axis and the imaginary component on the
vertical. The polar representation of a complex number is composed of the length r of the ray from
the origin to the point in question and the angle θ subtended between the positive real axis and this
ray, measured counter-clockwise in radians. In polar form the complex number z = a + bi can be
written as
z = |z| (cos θ + i sin θ) = |z| e^{iθ}
where |z| = r = √(a^2 + b^2) and θ = tan^{-1}(b/a). The quantity |z| is known as the modulus of z, and θ
as its complex “argument” (or sometimes “phase”). The notation z̄ is used for the complex conjugate
of z: if z = a + bi, then z̄ = a − bi.
18.2 Creating a complex matrix
The standard constructor for complex matrices is the complex() function. This takes two arguments,
giving the real and imaginary parts respectively, and sticks them together, as in
C = complex(A, B)
Four cases are supported, as follows.
A and B are both m×n real matrices: C is an m×n complex matrix such that c_kj = a_kj + b_kj i.
A and B are both scalars: C is a 1×1 complex matrix such that c = a + b i.
A is an m×n real matrix and B is a scalar: C is an m×n matrix such that c_kj = a_kj + b i.
A is a scalar and B is an m×n real matrix: C is an m×n matrix such that c_kj = a + b_kj i.
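Here is a minimal sketch exercising the four cases (the numerical values are arbitrary):
matrix A = {1, 2; 3, 4}
matrix B = {5, 6; 7, 8}
matrix C1 = complex(A, B)   # element by element: c_kj = a_kj + b_kj i
matrix C2 = complex(3, 4)   # 1 x 1 complex scalar, 3 + 4i
matrix C3 = complex(A, 1)   # common imaginary part of 1
matrix C4 = complex(0, B)   # common real part of 0
print C1 C2 C3 C4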
In addition, complex matrices may naturally arise as the result of certain computations.
With both real and complex matrices in circulation, one may wish to determine whether a particular
matrix is complex. The function iscomplex() can tell you. Passed an identifier, it returns non-zero
if it names a complex matrix, 0 if it names a real matrix, or NA otherwise. The non-zero return value
is either 1 or 2, with the following interpretation:
1 indicates that the matrix is “nominally complex” (each element is represented as having a real
part and an imaginary part) but all imaginary parts are zero.
2 indicates that at least one element has a non-zero imaginary part.
The following code snippet illustrates the point.
matrix z1 = complex(1,0)
scalar a = iscomplex(z1)
matrix z2 = complex(1,1)
scalar b = iscomplex(z2)
printf "a = %d, b = %d\n", a, b
The code above gives
a = 1, b = 2
18.3 Indexation
Indexation of complex matrices works as with real matrices, on the understanding that each element
of a complex matrix is a complex pair. So for example C[i,j] gets you the complex pair at row i,
column j of C, in the form of a 1 × 1 complex matrix.
If you wish to access just the real or imaginary part of a given element, or range of elements, you can
use the functions Re() or Im(), as in
scalar rij = Re(C[i,j])
which gets you the real part of c_ij.
In addition the dummy selectors real and imag can be used to assign to just the real or imaginary
component of a complex matrix. Here are two examples:
# replace the real part of C with random normals
C[real] = mnormal(rows(C), cols(C))
# set the imaginary part of C to all zeros
C[imag] = 0
The replacement must be either a real matrix of the same dimensions as the target, or a scalar.
Further, the real and imag selectors may be combined with regular selectors to access specific portions
of a complex matrix for either reading or writing. Examples:
# retrieve the real part of a submatrix of C
matrix R = C[1:2,1:2][real]
# set the imaginary part of C[3,3] to y
C[3,3][imag] = y
18.4 Operators
Most of the operators available for working with real matrices are also available for complex ones;
this includes the “dot-operators” which work element-wise or by “broadcasting” vectors. Moreover,
mixed operands are accepted, as in D = C + A where C is complex and A real; the result, D, will be
complex. In such cases the real operand is treated as a complex matrix with an all-zero imaginary
part.
The operators not defined for complex values are:
Those that include the inequality tests > or <, since complex values as such cannot be
compared as greater or lesser (though they can be compared as equal or not equal).
The (real) modulus operator (percent sign), as in x % y, which gives the remainder on division
of x by y.
As for real matrices, the transposition operator is available in both unary form, as in B = A’,
and binary form, as in C = A'B (transpose-multiply). But note that for complex A this means the
conjugate transpose, A^H. If you need the non-conjugated transpose you can use transp().
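A small sketch of the distinction, with arbitrary values:
matrix A = complex({1, 2; 3, 4}, {1, -1; 0, 2})
matrix Ah = A'          # conjugate transpose
matrix At = transp(A)   # plain transpose, imaginary parts not conjugated
print Ah At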
You may wish to note: although none of gretl’s explicit regression functions (or commands) accept
complex input you can calculate parameter estimates for a least-squares regression of complex Y
(T×1) on complex X (T×k) via B = X\Y.
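For instance, a sketch with simulated data (the dimensions and coefficient values are arbitrary):
matrix X = complex(mnormal(20,3), mnormal(20,3))   # T = 20, k = 3
matrix b = complex({1; 2; 3}, {0.5; 0; 1})
matrix Y = X*b + 0.1 * complex(mnormal(20,1), mnormal(20,1))
matrix B = X\Y   # least-squares estimate of b
print B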
18.5 Functions
To give an idea of what works, and what doesn’t, for complex matrices, we’ll walk through the hansl
function-space using the categories employed in gretl’s online “Function reference” (under the Help
menu in the GUI program).
Linear algebra
The functions that accept complex arguments are: cholesky, det, ldet, eigen, eigensym (for Hermitian
matrices), fft, ffti, inv, ginv, hdprod, mexp, mlog, qrdecomp, rank, svd, tr, and transp.
Note, however, that mexp and mlog require that the input matrix be diagonalizable, and cholesky
requires a positive definite Hermitian matrix.
In addition there are the complex-only functions ctrans, which gives the conjugate transpose (by
contrast, transp gives the straight, non-conjugated transpose of a complex matrix), and schur for
the Schur factorization.
Matrix building
Given what was said in section 18.2 above, several of the functions in this category should be thought
of as applying to the real or imaginary part of a complex matrix (for example, ones and mnormal),
and are of course usable in that way. However, some of these functions can be applied to complex
matrices as such, namely diag, diagcat, lower, upper, vec, vech and unvech.
Please note: when unvech is applied to a suitable real vector it produces a symmetric matrix, but
when applied to a complex vector it produces a Hermitian matrix.
The only functions not available for complex matrices are cnameset and rnameset. That is, you
cannot name the columns or rows of such matrices (although this restriction could probably be lifted
without great difficulty).
Matrix shaping
The functions that accept complex input are: cols, rows, mreverse, mshape, selifc, selifr and
trimr.
The functions msortby, sort and dsort are excluded for the reason mentioned in section 18.4.
Statistical
Supported for complex input: meanc, meanr, sumc, sumr, prodc and prodr. And that’s all.
Mathematical
In the matrix context, these are functions that are applied element by element. For complex input the
following are supported: log, exp and sqrt, plus all of the trigonometric functions with the exception
of atan2.
In addition there are the complex-only functions cmod (complex modulus, also accessible via abs),
carg (complex argument), conj (complex conjugate), Re (real part) and Im (imaginary part). Note
that carg(z) = atan2(y, x) for z = x + y i. Listing 18.1 illustrates usage of cmod and carg.
Listing 18.1: Variant representations of complex numbers. We picked 8 points on the unit circle in the complex
plane, so their modulus is constant and equal to 1. The Polar matrix below shows that the complex argument
is expressed in radians; multiplying by 180/π gives degrees. The chk matrix verifies that we can retrieve the
original representation of the complex values from the polar form in either of the two ways mentioned at the
start of the chapter: z = |z| (cos θ + i sin θ) or z = |z| e^{iθ}. [Download ]
# complex values in a + b*i form
scalar rp5 = sqrt(0.5)
matrix A = {1, rp5, 0, -rp5, -1, -rp5, 0, rp5}’
matrix B = {0, rp5, 1, rp5, 0, -rp5, -1, -rp5}’
matrix Z = complex(A, B)
# calculate modulus and argument
matrix zmod = cmod(Z)
matrix theta = carg(Z)
matrix Polar = zmod ~ theta ~ (theta * 180/$pi)
cnameset(Polar, "modulus radians degrees")
printf "%12.4f\n", Polar
# reconstitute the original Z matrix in two ways
matrix Z1 = zmod .* complex(cos(theta), sin(theta))
matrix Z2 = zmod .* exp(complex(0, theta))
matrix chk = Z ~ Z1 ~ Z2
print chk
Printing of Polar and chk
modulus radians degrees
1.0000 0.0000 0.0000
1.0000 0.7854 45.0000
1.0000 1.5708 90.0000
1.0000 2.3562 135.0000
1.0000 3.1416 180.0000
1.0000 -2.3562 -135.0000
1.0000 -1.5708 -90.0000
1.0000 -0.7854 -45.0000
1.00000 + 0.00000i 1.00000 + 0.00000i 1.00000 + 0.00000i
0.70711 + 0.70711i 0.70711 + 0.70711i 0.70711 + 0.70711i
0.00000 + 1.00000i 0.00000 + 1.00000i 0.00000 + 1.00000i
-0.70711 + 0.70711i -0.70711 + 0.70711i -0.70711 + 0.70711i
-1.00000 + 0.00000i -1.00000 + 0.00000i -1.00000 + 0.00000i
-0.70711 - 0.70711i -0.70711 - 0.70711i -0.70711 - 0.70711i
0.00000 - 1.00000i 0.00000 - 1.00000i 0.00000 - 1.00000i
0.70711 - 0.70711i 0.70711 - 0.70711i 0.70711 - 0.70711i
Transformations
In this category only two functions can be applied to complex matrices, namely cum and diff.
18.6 File input/output
Complex matrices are stored and retrieved correctly in the XML serialization used for gretl session
files (*.gretl).
The functions mwrite and mread work in two modes: binary mode if the filename ends with .bin
and text mode otherwise. Both modes handle complex matrices correctly if both the writing and the
reading are to be done by gretl, but for exchange of data with “foreign” programs text mode will not
work for complex matrices as a whole. The options are:
In text mode, use mwrite and mread on the two parts of a complex matrix separately, and
reassemble the matrix in the target program.
Use binary mode (on the whole matrix), if this is supported for the given foreign program.
At present binary mode transfer of complex matrices is supported for octave, python and julia.
Listing 18.2 shows some examples: we export a complex matrix to each of these programs in turn;
calculate its inverse in the foreign program; then verify that the result as imported back into gretl is
the same as that calculated in gretl.
Listing 18.2: Exporting and importing complex matrices [Download ]
set seed 34756
matrix C = complex(mnormal(3,3), mnormal(3,3))
D = inv(C)
mwrite(C, "C.bin", 1)
foreign language=octave
C = gretl_loadmat(’C.bin’);
gretl_export(inv(C), ’oct_D.bin’);
end foreign
oct_D = mread("oct_D.bin", 1)
eval D - oct_D
foreign language=python
import numpy as np
C = gretl_loadmat(’C.bin’)
gretl_export(np.linalg.inv(C), ’py_D.bin’)
end foreign
py_D = mread("py_D.bin", 1)
eval D - py_D
foreign language=julia
C = gretl_loadmat("C.bin")
gretl_export(inv(C), "jl_D.bin")
end foreign
jl_D = mread("jl_D.bin", 1)
eval D - jl_D
18.7 Backward (in)compatibility
Prior to version 2019d gretl did not provide native support for complex matrices. It did, however,
offer an improvised representation of such matrices for certain restricted purposes, taking the form of
an expanded regular gretl matrix with real values and imaginary parts in odd- and even-numbered
columns, respectively. The functions fft, eigengen and polroots returned matrices in this special
form, and the functions cmult and cdiv operated on such matrices.
As of version 2022b, fft and polroots have been redefined to work with “proper” complex matrices
as described above. The other affected functions are deprecated and will be removed or redefined in
a subsequent release. If you have any hansl code using the legacy representation the following brief
porting guide may be helpful.
Porting old complex code
cmult and cdiv: These functions performed element-wise multiplication and division of complex
column vectors in the old two-column form. The statements
# old element-wise operations
c1 = cmult(a1, b1)
d1 = cdiv(a1, b1)
can be updated as
# new element-wise operations
c2 = a2 .* b2
d2 = a2 ./ b2
(where a2 and b2 are new-style complex vectors or matrices). The following statements
c3 = a2 * b2
d3 = a2 / b2
are also valid but have different effects, the first performing standard (rather than element-wise)
multiplication of matrices (complex or real) and the second performing “right division”, equivalent to
a2 * inv(b2). Note that while the return value from cmult and cdiv could be either a real vector
or a (two column) complex vector, the new-style operations yield a nominally complex result if at
least one of the operands is complex, even if the result has an all-zero imaginary part.
A piece of code that appears in some contexts (such as calculation of a periodogram) is as follows:
given a complex vector, v, compute a vector w holding the squared moduli of the elements of v. The
old-style code to accomplish this was
# legacy: v has two columns
w = sumr(v.^2)
and the new replacement is
# current: v has a single complex column
w = abs(v).^2
where abs gives the complex modulus.
eigengen: Most uses of this legacy function simply retrieve the eigenvalues of a general (that is, not
symmetric) matrix, and do not exploit the option of retrieving eigenvectors. In that context it is
straightforward to substitute a call to the new function eigen. The only point to note is that eigen
returns a new-style complex vector; if you have need to convert this to the legacy representation you
can use the cswitch function, which is documented in the Gretl Command Reference. In brief, the
following code gives you the legacy equivalent of a new-style complex vector v:
if v[imag] == 0
oldv = v[real]
else
oldv = cswitch(v, 2)
endif
polroots: This function now returns a new-style complex vector. As with eigengen, you can use
cswitch to convert the vector if necessary.
Chapter 19
Calendar dates
19.1 Introduction
Any software that aims to handle dates and times must have a good built-in calendar. Gretl offers
several functions to handle date and time information, which are documented in the Gretl Command
Reference. To facilitate their effective use this chapter lists the various possibilities for storing dates
and times and discusses ways of converting between variant representations. Our main focus in this
chapter is dates as such (year, month and day) but we add some discussion of time-of-day where
relevant. A final section addresses the somewhat arcane issue of handling historical dates on the
Julian calendar.
First of all it may be useful to distinguish two contexts:
You have a time-series dataset in place, or a panel dataset with a well-defined time dimension.
You have no such dataset in place, or perhaps no dataset at all.
While you can work with dates in the second case, in the first case you have extra resources.
You probably know that if you open a dataset that is in fact time series but gretl has not immediately
recognized that fact, you can rectify matters by use of the setobs command, or via the menu item
/Data/Dataset structure in the gretl GUI. You may also know that with a panel dataset you can
impose a definite dating and frequency in its time dimension (if appropriate)—again, via the setobs
command but with the --panel-time option.
In what follows we state if a relevant function or accessor requires a time-series dataset or well-defined
panel-data time; otherwise you can assume it does not carry such a requirement.
19.2 Date and time representations
In gretl there is more than one way to encode a date such as “May 26th, 1993”. Some are more
intuitive, some less obvious from a human viewpoint but easier to handle for an algorithm. The basic
representations we discuss here are:
1. the three-numbers approach
2. date as string
3. the ISO 8601 standard
4. the epoch day
5. Unix time (seconds)
We first explain what these representations are, then explain how to convert between them.
The three-numbers approach
Since a date (without regard to intra-day detail) basically consists of three numbers, it can obviously
be encoded in precisely that way. For example the date “May 26th, 1993” can be stored as
scalar y = 1993
scalar m = 5
scalar d = 26
Gretl’s multiple-element objects can be used to extend this approach, for example by using a 3-element
vector for year, month and day, or a 3-column matrix for storing as many dates as desired. If you
wish to store dates as series in your dataset this approach would lead you to use three series, possibly
grouping them into a list, as in
nulldata 60
setobs 7 2020-01-01
series y = $obsmajor
series m = $obsminor
series d = $obsmicro
list DATE = y m d
The example above will generate daily dates for January and February 2020. Note that use of the
$obsm* accessors requires a time-series dataset, and $obsmicro in particular requires daily data. See
Section 19.5 for details.
Some CSV files represent dates in this sort of broken-down format, with various conventions on the
ordering of the three components.
Date as string
To a human being, this may seem the most natural choice. The string “26/6/1953” is pretty much
unambiguous. But using such a format for machine processing can be problematic due to differing
conventions regarding the separators between day, month and year, as well as the order in which the
three pieces of information are arranged. For example, “2/6/1953” is not unambiguous: it will “natu-
rally” be read differently by Europeans and Americans. This can be a problem with CSV files found
“in the wild”, containing arbitrarily formatted dates. Therefore gretl provides fairly comprehensive
functionality for converting dates of this sort into more manageable formats.
The ISO 8601 standard
Among other things, the ISO 8601 standard provides two representations for a daily date: the “basic”
representation, which uses an 8-digit integer, and the “extended” representation, which uses a 10-
character string.
In the basic version the first four digits represent the year, the middle two the month and the rightmost
two the day, so that for example 20170219 indicates February 19th, 2017. The extended representation
is similar except that the date is a string in which the items are separated by hyphens, so the same
date would be represented as 2017-02-19.
In several contexts ISO 8601 dates are privileged by gretl: the ISO format is taken as the default
and you need to supply an additional function argument or take extra steps if the representation is
non-standard.
Using series and/or matrices to store ISO 8601 basic dates is perfectly straightforward.
Epoch days
In gretl an “epoch day” is an unsigned 32-bit integer which represents a date as the number of days
since January 1, 1 AD (that is, the first day of the Common Era), on the proleptic Gregorian calendar.
(The term “proleptic,” as applied to a calendar, indicates that it is extrapolated backwards or forwards
relative to its period of actual historical use.) For example, 1993-05-26 corresponds to epoch day 727709.
(This representation derives from the astronomers’ “Julian day”, which is also a count of days since a
benchmark, namely January 1, 4713 BC, at which time certain astronomical cycles were aligned.)
This is the convention used by the GLib library, on which gretl depends for much of its calendrical
calculation. Since an epoch day is an unsigned integer, neither GLib nor gretl supports dates “BC”,
or prior to the Common Era.
This representation has several advantages. Like ISO 8601 basic, it lends itself naturally to storing
dates as series. Compared to ISO 8601, it has the disadvantage of not being readily understandable
by humans, but to compensate for that it makes it very easy to determine the length of a range of
dates. ISO basic dates can be used for comparison (which of two dates, on a given calendar, refers
to a later day?) but with epoch days one can carry out fully-fledged “dates arithmetic.” Epoch days
are always consecutive by construction, but 8-digit basic dates are consecutive only within a given
month. (In fact, they advance by 101 minus the number of days in the previous month at the start of
each month other than January, and by 8870 at the start of each year.) For more on arithmetic with
epoch days see Section 19.4.
Unix seconds
In this representation—the cornerstone of date and time handling on Unix-like systems—time is the
number of seconds since midnight at the start of 1970 according to Coordinated Universal Time
(UTC). (UTC is, to a first approximation, the time such that the Sun is at its highest point at noon
over the prime meridian, the line of 0° longitude, which as a matter of historical contingency runs
through Greenwich, England.) This format is therefore ideal for storing fine-grained information,
including time of day as well as date.
This representation is not transparent to humans (for example, the number 123456789 corresponds
to 21:33:09 UTC on Thursday, 29 Nov 1973) but again it lends itself naturally to calculation. Since Unix
seconds are hard-wired to UTC a given value will correspond to different times, and possibly different
dates, if evaluated in different time zones; we expand on this point below.
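As a quick check of the value just quoted (a sketch: the optional third argument to strftime, illustrated later in this chapter, supplies a zero offset so the output is rendered relative to UTC):
eval strftime(123456789, "%Y-%m-%d %H:%M:%S", 0)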
19.3 Converting between representations
Figure 19.1: Conversions between different date formats. The conversions shown are between ISO basic
integers, ISO extended strings, year/month/day components, generic date strings, epoch days and Unix
seconds, via the functions epochday, isodate, isoconv, strpday/strfday and strptime/strftime, plus the
general-purpose printf, sscanf and substr + atof.
To support conversion between different representations, gretl provides several dedicated functions,
although in some cases conversion can be carried out by using general-purpose functions. Figure 19.1
displays a summary: solid lines represent dedicated functions, while dashed lines indicate that no
special function is needed. Numerical formats are depicted as boxes and string formats as ovals. For
a full description of the functions referenced in the figure, see the Gretl Command Reference. In the
rest of this section we discuss several cases of conversion with the help of examples.
Strings and three-number dates
As indicated in Figure 19.1, converting between date strings and the three-number representation does
not require date-specific functions. The two “generic” functions that can be used for this purpose are
printf and sscanf. Here’s how: suppose you encode a date via the three scalars d=30, m=10 and
y=1983. You can use printf to turn it into a date string rather easily, as in
eu_s = printf("%d/%d/%d", d, m, y)
us_s = printf("%d/%d/%d", m, d, y)
where the two strings follow the European and US conventions, respectively.
The reverse operation, using the sscanf function, is a little trickier (see the Gretl Command Reference
for a full illustration). The string s=“1983-10-30” can be broken down into three scalars as
scalar d m y
n = sscanf(s, "%d-%d-%d", y, m, d)
Note that in this case %d in the format specification does not mean “day”, but rather “decimal
integer”, which is why there are three instances. Alternatively, one could have used a 3-element
vector, as in
matrix date = zeros(1,3)
n = sscanf(s, "%d-%d-%d", &date[1], &date[2], &date[3])
Decomposing a series of “basic” dates
To generate from a series of dates in ISO 8601 basic format distinct series holding year, month and
day, the function isoconv can be used. This function should be passed the original series followed by
“pointers to” the series to be filled out. For example, if we have a series named dates in the prescribed
format we might do
series y, m, d
isoconv(dates, &y, &m, &d)
This is mostly just a convenience function: provided the dates input is valid on the (possibly proleptic)
Gregorian calendar it is equivalent to:
series y = floor(dates/10000)
series m = floor((dates-10000*y)/100)
series d = dates - 10000*y - 100*m
However, there is some “value added”: isoconv checks the validity of the dates input. If the implied
year, month and day for any dates observation do not correspond to a valid date, then all the derived
series will have value NA at that observation.
The inverse operation is trivial:
series dates = 10000 * y + 100 * m + d
The use of series here means that such operations require that a dataset is in place, but although they
would most naturally occur in the context of a time-series dataset they could equally well occur with
an undated dataset, since an ISO 8601 basic value is just a numeric value (with some restrictions)
and such values do not have to appear in chronological order.
String/numeric conversions: dedicated functions
The primary means of converting between string and scalar numeric representations of dates and times
is provided by two pairs of functions, strptime/strftime and strpday/strfday. The first of each
pair takes string input and outputs a numeric value, and the second performs the inverse operation,
as shown in Table 19.1.
function    input                       output
strptime    date/time string + format   Unix seconds
strftime    Unix seconds + format       date/time string
strpday     date string + format        epoch day
strfday     epoch day + format          date string
Table 19.1: String/numeric date/time conversions
With the first pair, the numeric value is Unix seconds; with the second it’s
an epoch day. Numeric values are always relative to UTC, and string values are (by default, at least)
always relative to local time.
Before moving on, let’s be clear on what we mean by “local time”. Generically, this is time according
to the local time zone (with or without a “Daylight saving”or “Summer”adjustment depending on the
time of year). In a computing context we have to be more specific: the “local” time zone is whatever
is set as such via the operating system (and possibly adjusted via an environment variable) on the
host computer. It will usually be the same as the geographically local zone but there’s nothing to
stop a user making a different setting.
Dates as string-valued series
It often happens that CSV files contain date information stored as strings. Take for example a file
containing earthquake data like the following (an extract from the dataset at
https://www.kaggle.com/datasets/usgs/earthquake-database):
Date Time Latitude Longitude Magnitude
"01/02/1965" "13:44:18" 19.246 145.616 6.0
"01/04/1965" "11:29:49" 1.863 127.352 5.8
"01/05/1965" "18:05:58" -20.579 -173.972 6.2
"01/08/1965" "18:49:43" -59.076 -23.557 5.8
"01/09/1965" "13:32:50" 11.938 126.427 5.8
"01/10/1965" "13:36:32" -13.405 166.629 6.7
"01/12/1965" "13:32:25" 27.357 87.867 5.9
"01/15/1965" "23:17:42" -13.309 166.212 6.0
Suppose we want to convert the Date column to epoch days. Note that the date format follows the
American convention month/day/year. The simplest way to accomplish the task is shown in Listing
19.1, where we assume that the data file is named earthquakes.csv. Note that the --all-cols
option is wanted here, so that gretl treats Date as a string-valued series rather than just a source of
time-series information. For good measure we show how to add an ISO 8601 date series.
Listing 19.1: Converting a string-valued date series to epoch day
open earthquakes.csv --all-cols
series eday = strpday(Date, "%m/%d/%Y")
series isodates = strfday(eday, "%Y-%m-%d")
print Date eday isodates -o
Output:
Date eday isodates
1 01/02/1965 717338 1965-01-02
2 01/04/1965 717340 1965-01-04
3 01/05/1965 717341 1965-01-05
4 01/08/1965 717344 1965-01-08
5 01/09/1965 717345 1965-01-09
6 01/10/1965 717346 1965-01-10
7 01/12/1965 717348 1965-01-12
8 01/15/1965 717351 1965-01-15
Alternatively, one might like to convert the Date and Time columns jointly to Unix seconds. This can
be done by sticking the two strings together at each observation and calling strptime with a suitable
format, as follows:
series usecs # Unix seconds
loop i=1..$nobs
usecs[i] = strptime(Date[i] ~ Time[i], "%m/%d/%Y%H:%M:%S")
endloop
Unix seconds and time zones
At 8:46 in the morning of September 11, 2001 an airliner crashed into the North Tower of the World
Trade Center in New York. Relative to what time zone is that statement correct? Eastern Daylight
Time (EDT), of course. Unless we have special reason to do otherwise we report the time of an event
relative to the time zone in which it occurred; and if we do otherwise we need to state the metric
we’re using (for example, one might say that this event occurred at 2001-09-11 12:46 UTC).
Now consider the following script:
date = "2001-09-11 08:46"
format = "%Y-%m-%d %H:%M"
usecs = strptime(date, format)
check = strftime(usecs, format)
printf "Unix time %d\n", usecs
printf "original: %s\n", date
printf "recovered: %s\n", check
Run this script in any time zone you like and the last line of output will read
recovered: 2001-09-11 08:46
The usecs value will differ by time zone—for example it’ll be 1000212360 under Eastern Daylight
Time but 1000194360 under Central European Time—but this difference “cancels out” in recovering
the original time via strftime.
So far, so good. But suppose I write a script in which I store the date as Unix seconds, with my
laptop’s clock set to EDT:
usecs = 1000212360
date = strftime(usecs, "%Y-%m-%d %H:%M")
print date
Running this script under EDT will again print out 2001-09-11 08:46, but if I take my laptop to
Italy in June, set its clock to the local time, and rerun the script, I’ll get
2001-09-11 14:46
Is that a problem? Well, 14:46 is indeed the time in Italy when it’s 08:46 in New York (with both
zones in their Summer variants); it’s a problem only if you want to preserve the locality of the original
time. To do that you need to give time-zone information to both strptime and strftime. This is
illustrated in Listing 19.2.
Listing 19.2: Date/time invariance with respect to current time zone
string date = "2001-09-11 08:46 -0400"
string format = "%Y-%m-%d %H:%M %z"
usecs = strptime(date, format)
printf "Unix time %d\n", usecs
In the code above we specify the time zone in date using -0400, meaning 4 hours behind UTC, which
is correct when Daylight Saving time is in force in the Eastern US. And we match this with the %z
specifier in format. As a result, regardless of the time zone in which the code is run the Unix time
value will be 1000212360. Then we come to unpacking that value:
date = strftime(1000212360, "%Y-%m-%d %H:%M %z", -4*3600)
print date
Here we use the third, optional argument to strftime to supply the offset in seconds of EDT relative
to UTC. Having told strptime the time zone, why do we need this? Well, remember that Unix
time is just a scalar value, always relative to UTC: it cannot store time-zone information. Anyway,
the result is that this code will print 2001-09-11 08:46 -0400 regardless of where and when it is
executed.
Some additional comments are in order. First, spaces matter in parsing the strptime arguments:
they must match between the date and format strings. In the example above we inserted spaces before
-0400 and %z. We could have omitted both spaces, but not just one of them. Second, the C standard
does not require that strptime and strftime know anything about time zones; the extensions used
in this example are supported by GLib functionality.
19.4 Epoch day arithmetic
Given the way epoch days are defined, they provide a useful tool for checking whether daily data are
complete. Suppose we have what purport to be 7-day daily data with a starting date of 2015-01-01
and an ending date of 2016-12-31. How many observations should there be?
ed1 = epochday(2015,1,1)
ed2 = epochday(2016,12,31)
n = ed2 - ed1 + 1
We find that there should be n = 731 observations; if there are fewer there’s something missing. If the
data are supposed to be on a 5-day week (skipping Saturday and Sunday) or 6-day week (skipping
Sunday alone) the calculation is more complicated; in this case we can use the dayspan function,
providing as arguments the epoch-day values for the first and last dates and the number of days per
week:
ed1 = epochday(2015,1,1)
ed2 = epochday(2016,12,30)
n = dayspan(ed1, ed2, 5)
We discover that there were n = 522 weekdays in this period.
The dayspan function can also be helpful if you wish to construct a suitably sized “empty” daily
dataset prior to importing data from a third-party database (for example, stock prices from Yahoo).
Say the data to be imported are on a 5-day week and you want the range to be from 2000-01-03 (the
first weekday in 2000) to 2020-12-30 (a Wednesday). Here’s how one could initialize a suitable “host”
dataset:
ed1 = epochday(2000,1,3)
ed2 = epochday(2020,12,30)
n = dayspan(ed1, ed2, 5)
nulldata n
setobs 5 2000-01-03
Another use of arithmetic using epoch days is constructing a sequence of dates of non-standard
frequency. Suppose you want a biweekly series including alternate Saturdays in 2023. Here’s a
solution:
nulldata 26
setobs 1 1 --special-time-series
series eday
eday[1] = epochday(20230107) # the first Saturday
loop i=2..$nobs
eday[i] = eday[i-1] + 14
endloop
series dates = strfday(eday, "%Y-%m-%d")
19.5 Other accessors and functions
Accessors
Gretl offers various accessors for generating dates. One is $now, which returns the current date/time
as a 2-element vector. The first element is Unix seconds and the second an epoch day (see Section
19.2). This is always available regardless of the presence or absence of a dataset.
When a time-series dataset is open, up to four accessors are available to retrieve observation dates as
numeric series. First there is $obsdate, which returns ISO 8601 basic dates. If the frequency is annual,
quarterly or monthly these dates represent the first day of the period in question; if the frequency is
hourly this accessor is not available. Then there’s a set of up to three accessors, $obsmajor, $obsminor
and $obsmicro. The availability and interpretation of these values depends on the character of the
dataset, as shown in Table 19.2. For reference, the “constructor” column shows the argument that
should be supplied to the setobs command to impose each frequency on a dataset, assuming it starts
on January 1, 1990.
frequency   description   constructor     $obsmajor   $obsminor   $obsmicro
1           annual        1 1990          year
4           quarterly     4 1990:1        year        quarter
12          monthly       12 1990:01      year        month
5, 6, 7     daily         n 1990-01-01    year        month       day
52          weekly        52 1990-01-01   year        month       day
24          hourly        24 726468:01    day         hour
Table 19.2: Calendrical frequencies and accessors
The hourly frequency is not fully supported by gretl’s calendrical apparatus. But an epoch day value
can be used to set the starting day for an hourly time series, as exemplified in Table 19.2 (726468 for
1990-01-01). One could then construct a string-valued hourly date/time series in this way:
series day = strptime(isodate($obsmajor))
series usecs = day + 3600 * ($obsminor - 1) # Unix seconds
series tstrs = strftime(usecs, "%Y-%m-%d %H:%M")
When a panel dataset is open and its time dimension is specified (see Section 19.1 and the
documentation for the setobs command), $obsdate works as described for time-series datasets. But
$obsmajor and $obsminor do not refer to the time dimension; rather they give the 1-based indices
of the individuals and time periods, respectively. And $obsmicro is not available.
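By way of a sketch (the panel dimensions and the quarterly start date below are arbitrary assumptions):
nulldata 20
setobs 5 1:1 --stacked-time-series   # 4 units, 5 periods each
setobs 4 2010:1 --panel-time         # quarterly time dimension
series unit = $obsmajor              # 1-based index of the individual
series period = $obsminor            # 1-based index of the time period
series dates = $obsdate              # ISO 8601 basic dates
print unit period dates --byobs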
Miscellaneous functions
Besides conversion, several other calendrical functions are available:
monthlen given month and year, returns the length of the month in days (optionally ignoring week-
ends).
weekday given a date as year, month and day (or ISO 8601 basic), returns a number from 0 (Sunday)
to 6 (Saturday) corresponding to day of the week.
juldate given an epoch day, returns the corresponding date on the Julian calendar (see Section 19.6
below).
dayspan given two epoch days, calculates their distance, optionally taking weekends into account.
easterday given the year, returns the date of Easter on the Gregorian calendar.
isoweek given a date as year, month and day, returns the progressive number of the week within that
year as per the ISO 8601 specification.
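A small sketch of typical calls (the dates chosen are arbitrary):
eval monthlen(2, 2024, 7)    # length of February 2024, 7-day weeks
eval weekday(2017, 2, 19)    # 0 = Sunday, ..., 6 = Saturday
eval dayspan(epochday(2015,1,1), epochday(2016,12,31), 7)
eval easterday(2024)         # Easter on the Gregorian calendar
eval isoweek(2017, 1, 2)     # ISO 8601 week number
eval juldate(epochday(2017,2,19))   # Julian-calendar date for that epoch day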
19.6 Working with pre-Gregorian dates
Working with dates is fairly straightforward in the current era, with the Gregorian calendar used
universally for the dating of socioeconomic observations. It is not so straightforward, however, when
dealing with historical data recorded prior to the adoption of the Gregorian calendar in place of the
Julian, an event which first occurred in the principal Catholic countries in 1582 but which took place
at different dates in different countries over a span of several centuries.
Gretl, like most data-oriented software, uses the Gregorian calendar by default for all dates, thereby
ensuring that dates are all consecutive (the latter being a requirement of the ISO 8601 standard for
dates and times).
As readers probably know, the Julian calendar adds a leap day (February 29) on each year that
is divisible by 4 with no remainder. But this over-compensates for the fact that a 365-day year is
too short to keep the calendar synchronized with the seasons. The Gregorian calendar introduced a
more complex rule which maintains better synchronization, namely, each year divisible by 4 with no
remainder is a leap year unless it’s a centurial year (e.g. 1900) in which case it’s a leap year only if it
is divisible by 400 with no remainder. So the years 1600 and 2000 were leap years on both calendars,
but 1700, 1800, and 1900 were leap years only on the Julian calendar. While the average length of a
Julian year is 365.25 days, the Gregorian average is 365.2425 days.
The fact that the Julian calendar inserts leap days more frequently means that the Julian date
progressively (although very slowly) falls behind the Gregorian date. For example, February 18 2017
(Gregorian) is February 5 2017 on the Julian calendar. On adoption of the Gregorian calendar it
was therefore necessary to skip several days. In England, where the transition occurred in 1752,
Wednesday September 2 was directly followed by Thursday September 14.
In comparing calendars one wants to refer to a given day in terms that are not specific to either
calendar—but how to define a “given day”? This is accomplished by a count of days following some
definite temporal benchmark. As described in Section 19.2, gretl uses days since the start of 1 AD,
which we call epoch days.
In this section we address the problem of constructing within gretl a calendar which agrees with the
actual historical calendar prior to the switch to Gregorian dating. Most people will have no use for
this, but researchers working with archival data may find it helpful: it would be tricky and error-prone
to enter on the Gregorian calendar data whose dates are given on the Julian at source.
In order to represent Julian dates, gretl uses two basic tools: one is the juldate function, which converts
a Gregorian epoch day into an ISO 8601-like integer giving the corresponding date on the Julian calendar;
the other is the convention that, for some functions, a negative value where a year is expected acts as a
“Julian calendar flag”.
So, for example, the following code fragment,
edg = epochday(1700,1,1)
edj = epochday(-1700,1,1)
produces edg = 620548 and edj = 620558, indicating that the two calendars differed by 10 days at
the point in time known as January 1, 1700, on the proleptic Gregorian calendar.
Taken together with the isodate and juldate functions (which each take an epoch day argument
and return an ISO 8601 basic date on, respectively, the Gregorian and Julian calendars), epochday
can be used to convert between the two calendars. For example, what was the date in England (still
on the Julian calendar) on the day known to Italians as June 26, 1740 (Italy having been on the
Gregorian calendar since October 1582)?
ed = epochday(1740,6,26)
english_date = juldate(ed)
printf "%d\n", english_date
We find that the English date was 17400615, the 15th of June. Working in the other direction, what
Italian date corresponded to the 5th of November, 1740, in England?
ed = epochday(-1740,11,5)
italian_date = isodate(ed)
printf "%d\n", italian_date
Answer: 17401116; Guy Fawkes night in 1740 occurred on November 16 from the Italian point of
view.
We’ll now consider the trickiest case, namely a calendar which includes the day on which the Julian
to Gregorian switch occurred. If we can handle this, it should be relatively simple to handle a purely
Julian calendar. Our illustration will be England in 1752 (a similar analysis could be done for Spain
in 1582 or Greece in 1923). A solution is presented in Listing 19.3.
The first step is to find the epoch day corresponding to the Julian date 1752-01-01 (which turns out to
be 639551). Then we can create a series of epoch days, from which we get both Julian and Gregorian
dates for 355 days starting on epoch day 639551. Note, 355 days because this was a short year: it
was a leap year, but 11 days were skipped in September in making the transition to the Gregorian
calendar. We can then construct a series, hcal, which switches calendar at the right historical point.
Listing 19.3: Historical calendar for Britain in 1752 [Download ]
# 1752 was a short year on the British calendar!
nulldata 355
# give a negative year to indicate Julian date
ed0 = epochday(-1752,1,1)
# consistent series of epoch day values
series ed = ed0 + index - 1
# Julian dates as YYYYMMDD
series jdate = juldate(ed)
# Gregorian dates as YYYYMMDD
series gdate = isodate(ed)
# Historical: cut-over in September
series hcal = ed > epochday(-1752,9,2) ? gdate : jdate
# And let’s take a look
print ed jdate gdate hcal -o
Partial output:
ed jdate gdate hcal
1 639551 17520101 17520112 17520101
2 639552 17520102 17520113 17520102
...
245 639795 17520901 17520912 17520901
246 639796 17520902 17520913 17520902
247 639797 17520903 17520914 17520914
248 639798 17520904 17520915 17520915
...
355 639905 17521220 17521231 17521231
Notice that although the series hcal contains the correct historical calendar (in “basic” form), the
observation labels (in the first column of the output) are still just index numbers. It may be preferable
to have historical dates in that role. To achieve this we can decompose the hcal series into year,
month and day, then use the genr markers apparatus (see chapter 4). Here’s additional code to do
this job and show the result.
series y, m, d
isoconv(hcal, &y, &m, &d)
genr markers = "%04d-%02d-%02d", y, m, d
print ed jdate gdate hcal -o
Year numbering
A further complication in dealing with archival data is that the year number has not always been
advanced on January 1; for example in Britain prior to 1752, March 25 was taken as the start of
the new year. On gretl’s calendar (whether Julian or Gregorian) the year number always advances
on January 1, but it’s possible to construct observation markers following the old scheme. This is
illustrated for the year 1751 (as we would now call it) in Listing 19.4.
Listing 19.4: Historical calendar for England in 1751 [Download ]
nulldata 365 # a common year
ed0 = epochday(-1751,1,1)
ed1 = epochday(-1751,3,25)
series ed = ed0 + index - 1
series jdate = juldate(ed)
series y, m, d
isoconv(jdate, &y, &m, &d)
y = ed < ed1 ? y - 1 : y
genr markers = "%04d-%02d-%02d", y, m, d
print index -o
Partial output:
1750-01-01 1
1750-01-02 2
1750-01-03 3
...
1750-03-23 82
1750-03-24 83
1751-03-25 84
1751-03-26 85
...
1751-12-31 365
Day of week and length of month
Two of the functions described in Section 19.5, that by default operate on the Gregorian calendar,
can be induced to work on the Julian by the trick mentioned above, namely giving the negative of the
year. These are weekday (which takes arguments year, month and day) and monthlen (which takes
arguments month, year and days per week). Thus for example
eval weekday(-1700,2,29)
gives 4, indicating that Julian February 29, 1700 was a Thursday. And
eval monthlen(2,-1900,5)
gives 21, indicating that there were 21 weekdays in Julian February 1900.
Chapter 20
Mixed-frequency data
20.1 Basics
In some cases one may want to handle data that are observed at different frequencies, a facility known
as “MIDAS” (Mixed Data Sampling). A common pairing includes GDP, usually available quarterly,
and industrial production, often available monthly. The most common context when this feature
is required is specification and estimation of MIDAS models (see Chapter 41), but other cases are
possible.
A gretl dataset formally handles only a single data frequency, but we have adopted a straightforward
means of representing nested frequencies: a higher frequency series x_H is represented by a set of m
series, each holding the value of x_H in a sub-period of the “base” (lower-frequency) period (where m
is the ratio of the higher frequency to the lower).
This is most easily understood by means of an example. Suppose our base frequency is quarterly and
we wish to include a monthly series in the analysis. Then a relevant fragment of the gretl dataset
might look as shown in Table 20.1. Here, gdpc96 is a quarterly series while indpro is monthly, so
m = 12/4 = 3 and the per-month values of indpro are identified by the suffix _mn, n = 3, 2, 1.
gdpc96 indpro_m3 indpro_m2 indpro_m1
1947:1 1934.47 14.3650 14.2811 14.1973
1947:2 1932.28 14.3091 14.3091 14.2532
1947:3 1930.31 14.4209 14.3091 14.2253
1947:4 1960.70 14.8121 14.7562 14.5606
1948:1 1989.54 14.7563 14.9240 14.8960
1948:2 2021.85 15.2313 15.0357 14.7842
Table 20.1: A slice of MIDAS data
To recover the actual monthly time series for indpro one must read the three relevant series right-
to-left by row. At first glance this may seem perverse, but in fact it is the most convenient setup for
MIDAS analysis. In such models, the high-frequency variables are represented by lists of lags, and of
course in econometrics it is standard to give the most recent lag first (x_{t-1}, x_{t-2}, . . .).
One can construct such a dataset manually from“raw”sources using hansl’s matrix-handling methods
or the join command (see Section 20.6 for illustrations), but we have added native support for the
common cases shown below.
base frequency higher frequency
annual quarterly or monthly
quarterly monthly or daily
monthly daily
The examples below mostly pertain to the case of quarterly plus monthly data. Section 20.6 has
details on handling of daily data.
A mixed-frequency dataset can be created in either of two ways: by selective importation of series
from a database, or by creating two datasets of different frequencies then merging them.
Importation from a database
Here’s a simple example, in which we draw from the fedstl (St Louis Fed) database which is supplied
in the gretl distribution:
clear
open fedstl.bin
data gdpc96
data indpro --compact=spread
store gdp_indpro.gdt
Since gdpc96 is a quarterly series, its importation via the data command establishes a quarterly
dataset. Then the MIDAS work is done by the option --compact=spread for the second invocation
of data. This “spreads” the series indpro—which is monthly at source—into three quarterly series,
exactly as shown in Table 20.1.
Merging two datasets
In this case we consider an Excel file provided by Eric Ghysels in his MIDAS Matlab Toolbox (see
http://eghysels.web.unc.edu/ for links), namely mydata.xlsx. This contains quarterly real GDP in
Sheet1 and monthly non-farm payroll employment
in Sheet2. A hansl script to build a MIDAS-style file named gdp_payroll_midas.gdt is shown in
Listing 20.1.
Listing 20.1: Building a gretl MIDAS dataset via merger
# sheet 2 contains monthly employment data
open MIDASv2.2/mydata.xlsx --sheet=2
rename VALUE payems
dataset compact 4 spread
# limit to the sample range of the GDP data
smpl 1947:1 2011:2
setinfo payems_m3 --description="Non-farm payroll employment, month 3 of quarter"
setinfo payems_m2 --description="Non-farm payroll employment, month 2 of quarter"
setinfo payems_m1 --description="Non-farm payroll employment, month 1 of quarter"
store payroll_midas.gdt
# sheet 1 contains quarterly GDP data
open MIDASv2.2/mydata.xlsx --sheet=1
rename VALUE qgdp
setinfo qgdp --description="Real quarterly US GDP"
append payroll_midas.gdt
store gdp_payroll_midas.gdt
Note that both series are simply named VALUE in the source file, so we use gretl’s rename command
to set distinct and meaningful names. The heavy lifting is done here by the line
dataset compact 4 spread
which tells gretl to compact an entire dataset (in this case, as it happens, just containing one series)
to quarterly frequency using the “spread” method. Once this is done, it is straightforward to append
the compacted data to the quarterly GDP dataset.
We will put an extended version of this dataset (supplied with gretl, and named gdp_midas.gdt) to
use in subsequent sections.
20.2 The notion of a“MIDAS list”
In the following two sections we’ll describe functions that (rather easily) do the right thing if you wish
to create lists of lags or first differences of high-frequency series. However, we should first be clear
about the correct domain for such functions, since they could produce the most diabolical mash-up of
your data if applied to the wrong sort of list argument—for instance, a regular list containing distinct
series, all observed at the “base frequency” of the dataset.
So let us define a MIDAS list: this is a list of m series holding per-period values of a single high-
frequency series, arranged in the order of most recent first, as illustrated above. Given the dataset
shown in Table 20.1, an example of a correctly formulated MIDAS list would be
list INDPRO = indpro_m3 indpro_m2 indpro_m1
Or, since the monthly observations are already in the required order, we could define the list by means
of a “wildcard”:
list INDPRO = indpro_m*
Having created such a list, one can use the setinfo command to tell gretl that it’s a bona fide MIDAS
list:
setinfo INDPRO --midas
This will spare you some warnings that gretl would otherwise emit when you call some of the functions
described below. This step should not be necessary, however, if the series in question are the product
of a compact operation with the spread parameter.
Inspecting high-frequency data
The layout of high-frequency data shown in Table 20.1 is convenient for running regressions, but
not very convenient for inspecting and checking such data. We therefore provide some methods
for displaying MIDAS data at their “natural” frequency. Figure 20.1 shows the gretl main window
with the gdp_midas dataset loaded, along with the menu that pops up if you right-click with the
payems series highlighted. The items “Display values” and “Time series plot” show the data on their
original monthly calendar, while the “Display components” item shows the three component series on
a quarterly calendar, as in Table 20.1.
These methods are also available via the command line. For example, the commands
list PAYEMS = payems_*
print PAYEMS --byobs --midas
hfplot PAYEMS --with-lines --output=display
produce a monthly printout of the payroll employment data, followed by a monthly time-series plot.
(See section 20.5 for more on hfplot.)
20.3 High-frequency lag lists
A basic requirement of MIDAS is the creation of lists of high-frequency lags for use on the right-hand
side of a regression specification. This is possible, but not very convenient, using the gretl’s lags
function; it is made easier by a dedicated variant of that function described below.
For illustration we’ll consider an example presented in Ghysels’ Matlab implementation of MIDAS:
this uses 9 monthly lags of payroll employment, starting at lag 3, in a model for quarterly GDP. The
estimation period for this model starts in 1985Q1. At this observation, the stipulation that we start
at lag 3 means that the first (most recent) lag is employment for October 1984,² and the 9-lag window
means that we need to include monthly lags back to February 1984. Let the per-month employment
series be called x_m3, x_m2 and x_m1, and let (quarterly) lags be represented by (-1), (-2) and so
on. Then the terms we want are (reading left-to-right by row):
. . x_m1(-1)
x_m3(-2) x_m2(-2) x_m1(-2)
x_m3(-3) x_m2(-3) x_m1(-3)
x_m3(-4) x_m2(-4) .
2That is what Ghysels means, but see the sub-section on “Leads and nowcasting” below for a possible ambiguity in
this regard.
Figure 20.1: MIDAS data menu
We could construct such a list in gretl using the following standard syntax. (Note that the third argument to lags below, the value 1, tells gretl that we want the terms ordered "by lag" rather than "by variable"; this is required to respect the order of the terms shown above.)
list X = x_m*
# create lags for 4 quarters, "by lag"
list XL = lags(4,X,1)
# convert the list to a matrix
matrix tmp = XL
# trim off the first two elements, and the last
tmp = tmp[3:11]
# and convert back to a list
XL = tmp
However, the following specialized syntax is more convenient:
list X = x_m*
setinfo X --midas
# create high-frequency lags 3 to 11
list XL = hflags(3, 11, X)
In the case of hflags the length of the list given as the third argument defines the “compaction ratio”
(m = 3 in this example); we can (in fact, must) specify the lags we want in high-frequency terms;
and ordering of the generated series by lag is automatic.
Word to the wise: do not use hflags on anything other than a MIDAS list as defined in section 20.2,
unless perhaps you have some special project in mind and really know what you are doing.
Leads and nowcasting
Before leaving the topic of lags, it is worth commenting on the question of leads and so-called
“nowcasting”—that is, prediction of the current value of a lower-frequency variable before its mea-
surement becomes available.
In a regular dataset where all series are of the same frequency, lag 1 means the observation from the previous period, lag 0 is equivalent to the current observation, and lag −1 (or lead 1) is the observation for the next period, in the relative future.
When considering high-frequency lags in the MIDAS context, however, there is no uniquely determined
high-frequency sub-period which is temporally coincident with a given low-frequency period. The
placement of high-frequency lag 0 therefore has to be a matter of convention. Unfortunately, there
are two incompatible conventions in currently available MIDAS software, as follows.
• High-frequency lag 0 corresponds to the first sub-period within the current low-frequency period. This is what we find in Eric Ghysels' MIDAS Matlab Toolbox; it's also clearly stated and explained in Armesto et al. (2010).
• High-frequency lag 0 corresponds to the last sub-period in the current low-frequency period. This convention is employed in the midasr package for R.³
Consider, for example, the quarterly/monthly case. In Matlab, high-frequency (HF) lag 0 is the first
month of the current quarter, HF lag 1 is the last month of the prior quarter, and so on. In midasr,
however, HF lag 0 is the last month of the current quarter, HF lag 1 the middle month of the quarter,
and HF lag 3 is the first one to take you “back in time” relative to the start of the current quarter,
namely to the last month of the prior quarter.
In gretl we have chosen to employ the first of these conventions. So lag 1 points to the most recent sub-period in the previous base-frequency period, lag 0 points to the first sub-period in the current period, and lag −1 to the second sub-period within the current period. Continuing with the quarterly/monthly case, monthly observations for lags 0 and −1 are likely to become available before a measurement for the quarterly variable is published (possibly also a monthly value for lag −2). The first "truly future" lead does not occur until lag −3.
The hflags function supports negative lags. Suppose one wanted to use 9 lags of a high-frequency
variable, −1, 0, 1, . . . , 7, for nowcasting. Given a suitable MIDAS list, X, the following would do the
job:
list XLnow = hflags(-1, 7, X)
This means that one could generate a forecast for the current low-frequency period (which is not
yet completed and for which no observation is available) using data from two sub-periods into the
low-frequency period (e.g. the first two months of the quarter).
20.4 High-frequency first differences
When working with non-stationary data one may wish to take first differences, and in the MIDAS
context that probably means high-frequency differences of the high-frequency data. Note that the
ordinary gretl functions diff and ldiff will not do what is wanted for series such as indpro, as
shown in Table 20.1: these functions will give per-month quarterly differences of the data (month 3
of the current quarter minus month 3 of the previous quarter, and so on).
To get the desired result one could create the differences before compacting the high-frequency data
but this may not be convenient, and it’s not compatible with the method of constructing a MIDAS
dataset shown in section 20.1. The alternative is to employ the specialized differencing function
hfdiff. This takes one required argument, a MIDAS list as defined in section 20.2. A second,
optional argument is a scalar multiplier (with default value 1.0); this permits scaling the output
series by a constant. There’s also an hfldiff function for creating high-frequency log differences;
this has the same syntax as hfdiff.
So for example, the following creates a list of high-frequency percentage changes (100 times log-
difference) then a list of high-frequency lags of the changes.
list X = indpro_*
setinfo X --midas
list dX = hfldiff(X, 100)
list dXL = hflags(3, 11, dX)
If you only need the series in the list dXL, however, you can nest these two function calls:
list dXL = hflags(3, 11, hfldiff(X, 100))
3See http://cran.r-project.org/web/packages/midasr/, and for documentation https://github.com/mpiktas/midasr-user-guide/raw/master/midasr-user-guide.pdf.
20.5 MIDAS-related plots
In the context of MIDAS analysis one may wish to produce time-series plots which show high- and
low-frequency data in correct registration (as in Figures 1 and 2 in Armesto et al., 2010). This can
be done using the hfplot command, which has the following syntax:
hfplot midas-list [ ; lflist ] options
The required argument is a MIDAS list, as defined above. Optionally, one or more lower-frequency
series (lflist) can be added to the plot following a semicolon. Supported options are --with-lines,
--time-series and --output. These have the same effects as with gretl's gnuplot command.
An example based on Figure 1 in Armesto et al. (2010) is shown in Listing 20.2 and Figure 20.2.
20.6 Alternative MIDAS data methods
Importation via a column vector
Listing 20.3 illustrates how one can construct via hansl a MIDAS list from a matrix (column vector)
holding data of a higher frequency than the given dataset. In practice one would probably read high
frequency data from file using the mread function, but here we just construct an artificial sequential
vector.
Note the check in the high_freq_list function: we determine the current sample size, T, and insist
that the input matrix is suitably dimensioned, with a single column of length equal to T times the
compaction factor (here 3, for monthly to quarterly).
The final command in the script should produce
test_m3 test_m2 test_m1
1980:1 3 2 1
1980:2 6 5 4
1980:3 9 8 7
...
This functionality is available in the built-in function hflist, which has the same signature as the
hansl prototype above.
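For instance, the artificial data of Listing 20.3 could be generated more compactly as follows (a minimal sketch; as just noted, hflist takes the column vector, the compaction factor and the naming prefix):

nulldata 12
setobs 4 1980:1
matrix x = seq(1, 3*$nobs)'
list H = hflist(x, 3, "test_m")
print H --byobs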
Importation via join
The join command provides a general and flexible framework for importing data from external files
(see chapter 7).
In order to handle multiple-frequency data, it supports the “spreading” of high-frequency series to a
MIDAS list in a single operation. This requires use of the --aggr option with parameter spread.
There are two acceptable forms of usage, illustrated below. Note that AWM is a quarterly dataset while
hamilton is monthly. First case:
open AWM.gdt
join hamilton.gdt PC6IT --aggr=spread
and second case:
open AWM.gdt
join hamilton.gdt PCI --data=PC6IT --aggr=spread
In the first case MIDAS series PC6IT_m3, PC6IT_m2 and PC6IT_m1 are added to the working dataset. In the second case PCI is used as the base name for the imports, giving PCI_m3, PCI_m2 and PCI_m1
as the names of the per-month series.
Note that only one high-frequency series can be imported in a given join invocation with the option
--aggr=spread, which already implies the writing of multiple series in the lower frequency dataset.
Listing 20.2: Replication of a plot from Armesto et al. (2010)
open gdp_midas.gdt
# form and label the dependent variable
series dy = log(qgdp/qgdp(-1))*400
setinfo dy --graph-name="GDP"
# form list of annualized HF differences
list X = payems*
list dX = hfldiff(X, 1200)
setinfo dX --graph-name="Payroll Employment"
smpl 1980:1 2009:1
hfplot dX ; dy --with-lines --time-series --output=display
Figure 20.2: Quarterly GDP and monthly Payroll Employment, annualized percentage changes
Listing 20.3: Create a MIDAS list from a matrix
function list high_freq_list (const matrix x, int compfac, string vname)
list ret = deflist()
scalar T = $nobs
if rows(x) != compfac*T || cols(x) != 1
funcerr "Invalid x matrix"
endif
matrix m = mreverse(mshape(x, compfac, T))'
loop i=1..compfac
scalar k = compfac + 1 - i
ret += genseries(sprintf("%s%d", vname, k), m[,i])
endloop
setinfo ret --midas
return ret
end function
# construct a little "quarterly" dataset
nulldata 12
setobs 4 1980:1
# generate "monthly" data, 1,2,...,36
matrix x = seq(1,3*$nobs)'
print x
# turn into midas list
list H = high_freq_list(x, 3, "test_m")
print H --byobs
An important point to note is that the --aggr=spread mechanism (where we map from one higher-
frequency series to a set of lower-frequency ones) relies on finding a known, reliable time-series struc-
ture in the “outer” data file. Native gretl time-series data files will have such a structure, and also
well-formed gretl-friendly CSV files, but not arbitrary comma-separated files. So if you have difficulty
importing data MIDAS-style from a given CSV file using --aggr=spread you might want to drop
back to a more agnostic, piece-wise approach (agnostic in the sense of assuming less about gretl’s
ability to detect any time-series structure that might be present). Here’s an example:
open hamilton.gdt
# create month-of-quarter series for filtering
series mofq = ($obsminor - 1) % 3 + 1
# write example CSV file: the first column holds, e.g. "1973M01"
store test.csv PC6IT mofq
open AWM.gdt -q
# import monthly components one at a time, using a filter
join test.csv PCI_m3 --data=PC6IT --tkey=",%YM%m" --filter="mofq==3"
join test.csv PCI_m2 --data=PC6IT --tkey=",%YM%m" --filter="mofq==2"
join test.csv PCI_m1 --data=PC6IT --tkey=",%YM%m" --filter="mofq==1"
list PCI = PCI_*
setinfo PCI --midas
print PCI_m* --byobs
The example is artificial in that a time-series CSV file of suitable frequency written by gretl itself
should work without special treatment. But you may have to add “helper” columns (such as the mofq
series above) to a third-party CSV file to enable a piece-wise MIDAS join via filtering.
Daily data
Daily data (commonly financial-market data) are often used in practical applications of the MIDAS
methodology. It’s therefore important that gretl support use of such data, but there are special issues
arising from the fact that the number of days in a month, quarter or year is not a constant.
It seems to us that it’s necessary to stipulate a fixed, conventional number of days per lower-frequency
period (that is, in practice, per month or quarter, since for the moment we’re ignoring the week as
a basic temporal unit and we’re not yet attempting to support the combination of annual and daily
data). But matters are further complicated by the fact that daily data come in (at least) three sorts:
5 days per week (as in financial-market data), 6-day (some commercial data which skip Sunday) and
7-day.
That said, we currently support—via compact=spread, as described in section 20.1—the following
conversions:
• Daily to monthly: If the daily data are 5 days per week, we impose 22 days per month. This is the median, and also the mode, of weekdays per month, although some months have as few as 20 weekdays and some have 23. If the daily data are 6-day we impose 26 days per month, and in the 7-day case, 30 days per month.
• Daily to quarterly: In this case the stipulated days per quarter are simply 3 times the days-per-month values specified above.
So, given a daily dataset, you can say
dataset compact 12 spread
to convert MIDAS-wise to monthly (or substitute 4 for 12 for a quarterly target). And this is supposed
to work whether the number of days per week is 5, 6 or 7.
That leaves the question of how we handle cases where the actual number of days in the calendar
month or quarter falls short of, or exceeds, the stipulated number. We’ll talk this through with
reference to the conversion of 5-day daily data to monthly; all other cases are essentially the same,
mutatis mutandis.4
We start at “day 1,” namely the first relevant daily date within the calendar period (so the first
weekday, with 5-day data). From that point on we fill up to 22 slots with relevant daily observations
(including, not skipping, NAs due to holidays or whatever). If at the end we have daily observations
left over, we ignore them. If we’re short we fill the empty slots with the arithmetic mean of the valid,
used observations;⁵ and we fill in any missing values in the same way.
This means that lags 1 to 22 of 5-day daily data in a monthly dataset are always observations from
days within the prior month (or in some cases “padding” that substitutes for such observations); lag
23 takes you back to the most recent day in the month before that.
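To illustrate (a minimal sketch, reusing the daily Dow Jones data of Listing 20.4): once 5-day daily data have been spread to a monthly dataset, the previous month's (possibly padded) daily values can be collected as high-frequency lags 1 to 22.

open djclose.gdt
dataset compact 12 spread
list DJ = djc*
# one full "month" of daily lags, per the convention described above
list DJL = hflags(1, 22, DJ)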
Clearly, we could get a good deal fancier in our handling of daily data: for example, letting the user
determine the number of days per month or quarter, and/or offering more elaborate means of filling
in missing and non-existent daily values. It’s not clear that this would be worthwhile, but it’s open
to discussion.
A little daily-to-monthly example is shown in Listing 20.4 and Figure 20.3. The example exercises
the hfplot command (see section 20.5).
4Or should be! We’re not ready to guarantee that just yet.
5This is the procedure followed in some example programs in the MIDAS Matlab Toolbox.
Listing 20.4: Monthly plus daily data
# open a daily dataset
open djclose.gdt
# spread the data to monthly
dataset compact 12 spread
list DJ = djc*
# import an actual monthly series
open fedstl.bin
data indpro
# high-frequency plot (set --output=daily.pdf for PDF)
hfplot DJ ; indpro --with-lines --output=display \
{set key top left;}
Figure 20.3: Monthly industrial production and daily Dow Jones close
Chapter 21
Cheat sheet
This chapter explains how to perform some common—and some not so common—tasks in gretl’s
scripting language, hansl. Some but not all of the techniques listed here are also available through
the graphical interface. Although the graphical interface may be more intuitive and less intimidating
at first, we encourage users to take advantage of the power of gretl’s scripting language as soon as
they feel comfortable with the program.
21.1 Dataset handling
“Weird” periodicities
Problem: You have data sampled every 3 minutes from 9am onwards; you'll probably want to specify
the hour as 20 periods.
Solution:
setobs 20 9:1 --special
Comment: Now functions like sdiff() (“seasonal” difference) or estimation methods like seasonal
ARIMA will work as expected.
Generating a panel dataset of given dimensions
Problem: You want to generate via nulldata a panel dataset and specify in advance the number of
units and the time length of your series via two scalar variables.
Solution:
scalar n_units = 100
scalar T = 12
scalar NT = T * n_units
nulldata NT --preserve
setobs T 1:1 --stacked-time-series
Comment: The essential ingredient that we use here is the --preserve option: it protects existing
scalars (and matrices, for that matter) from being trashed by nulldata, thus making it possible to
use the scalar T in the setobs command.
Help, my data are backwards!
Problem: Gretl expects time series data to be in chronological order (most recent observation last),
but you have imported third-party data that are in reverse order (most recent first).
Solution:
setobs 1 1 --cross-section
series sortkey = -obs
dataset sortby sortkey
setobs 1 1950 --time-series
Comment: The first line is required only if the data currently have a time series interpretation: it
removes that interpretation, because (for fairly obvious reasons) the dataset sortby operation is
not allowed for time series data. The following two lines reverse the data, using the negative of the
built-in index variable obs. The last line is just illustrative: it establishes the data as annual time
series, starting in 1950.
If you have a dataset that is mostly the right way round, but a particular variable is wrong, you can
reverse that variable as follows:
x = sortby(-obs, x)
Dropping missing observations selectively
Problem: You have a dataset with many variables and want to restrict the sample to those observations
for which there are no missing observations for the variables x1,x2 and x3.
Solution:
list X = x1 x2 x3
smpl --no-missing X
Comment: You can save the file via a store command to preserve a subsampled version of the
dataset. Alternative solutions based on the ok function, such as
list X = x1 x2 x3
series sel = ok(X)
smpl sel --restrict
are perhaps less obvious, but more flexible. Pick your poison.
“By” operations
Problem: You have a discrete variable d and you want to run some commands (for example, estimate
a model) by splitting the sample according to the values of d.
Solution:
matrix vd = values(d)
m = rows(vd)
loop i=1..m
scalar sel = vd[i]
smpl d==sel --restrict --replace
ols y const x
endloop
smpl --full
Comment: The main ingredient here is a loop. You can have gretl perform as many instructions as
you want for each value of d, as long as they are allowed inside a loop. Note, however, that if all you
want is descriptive statistics, the summary command does have a --by option.
Adding a time series to a panel
Problem: You have a panel dataset (comprising observations of n individuals in each of T periods)
and you want to add a variable which is available in straight time-series form. For example, you want
to add annual CPI data to a panel in order to deflate nominal income figures.
In gretl a panel is represented in stacked time-series format, so in effect the task is to create a new
variable which holds n stacked copies of the original time series. Let's say the panel comprises 500 individuals observed in the years 1990, 1995 and 2000 (n = 500, T = 3), and we have these CPI data
in the ASCII file cpi.txt:
date cpi
1990 130.658
1995 152.383
2000 172.192
What we need is for the CPI variable in the panel to repeat these three values 500 times.
Solution: Simple! With the panel dataset open in gretl,
append cpi.txt
Comment: If the length of the time series is the same as the length of the time dimension in the panel
(3 in this example), gretl will perform the stacking automatically. Rather than using the append
command you could use the “Append data” item under the File menu in the GUI program.
If the length of your time series does not exactly match the T dimension of your panel dataset, append
will not work but you can use the join command, which is able to pick just the observations with
matching time periods. On selecting “Append data” in the GUI you are given a choice between plain
“append” and “join” modes, and if you choose the latter you get a dialog window allowing you to
specify the key(s) for the join operation. For native gretl data files you can use built-in series that
identify the time periods, such as $obsmajor, for your outer key to match the dates. In the example
above, if the CPI data were in gretl format $obsmajor would give you the year of the observations.
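For example, something along the following lines might work (a hypothetical sketch: it assumes the panel contains a series named year recording the year of each observation, and that the CPI data are stored in gretl format as cpi.gdt):

# with the panel dataset open
join cpi.gdt cpi --ikey=year --okey=$obsmajor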
Time averaging of panel datasets
Problem: You have a panel dataset (comprising observations of n individuals in each of T periods)
and you want to lower the time frequency by averaging. This is commonly done in empirical growth
economics, where annual data are turned into 3- or 4- or 5-year averages (see for example Islam,
1995).
Solution: In a panel dataset, gretl functions that deal with time are aware of the panel structure,
so they will automatically “do the right thing”. Therefore, all you have to do is use the movavg()
function for computing moving averages and then just drop the years you don’t need. An example
with artificial data follows:
nulldata 36
set seed 61218
setobs 12 1:1 --stacked-time-series
###
### generate simulated yearly data
###
series year = 2000 + time
series y = round(normal())
series x = round(3*uniform())
list X = y x
print year X -o
###
### now re-cast as 4-year averages
###
# a dummy for endpoints
series endpoint = (year % 4 == 0)
# id variable
series id = $unit
# compute averages
loop foreach i X
series $i = movavg($i, 4)
endloop
# drop extra observations
smpl endpoint --dummy --permanent
# restore panel structure
setobs id year --panel-vars
print id year X -o
Running the above script produces (among other output):
? print year X -o
year y x
1:01 2001 1 1
1:02 2002 -1 1
1:03 2003 -1 0
1:04 2004 0 1
1:05 2005 1 2
1:06 2006 1 2
1:07 2007 -1 0
1:08 2008 1 1
1:09 2009 0 3
1:10 2010 -1 1
1:11 2011 1 1
1:12 2012 0 1
...
3:09 2009 0 1
3:10 2010 1 1
3:11 2011 0 2
3:12 2012 1 2
? print id year X -o
id year y x
1:1 1 2004 -0.25 0.75
1:2 1 2008 0.50 1.25
1:3 1 2012 0.00 1.50
...
3:3 3 2012 0.50 1.50
Turning observation-marker strings into a series
Problem: Here’s one that might turn up in the context of the join command (see chapter 7). The
current dataset contains a string-valued series that you’d like to use as a key for matching observations,
perhaps the two-letter codes for the names of US states. The file from which you wish to add data
contains that same information, but not in the form of a string-valued series, rather it exists in the
form of “observation markers”. Such markers cannot be used as a key directly, but is there a way to
parlay them into a string-valued series? Why, of course there is!
Solution: We’ll illustrate with the Ramanathan data file data4-10.gdt, which contains private school
enrollment data and covariates for the 50 US states plus Washington, D.C. (n = 51).
open data4-10.gdt
markers --to-array="state_codes"
genr index
stringify(index, state_codes)
store joindata.gdt
Comment: The markers command saves the observation markers to an array of strings. The command
genr index creates a series that goes 1, 2, 3, . . ., and we attach the state codes to this series via
stringify(). After saving the result we have a datafile that contains a series, index, that can be
matched with whatever series holds the state code strings in the target dataset.
Suppose the relevant string-valued key series in the target dataset is called state. We might prefer
to avoid the need to specify a distinct “outer” key (again see chapter 7). In that case, in place of
genr index
stringify(index, state_codes)
we could do
genr index
series state = index
stringify(state, state_codes)
and the two datafiles will contain a comparable string-valued state series.
21.2 Creating/modifying variables
Generating a dummy variable for a specific observation
Problem: Generate $d_t = 0$ for all observations but one, for which $d_t = 1$.
Solution:
series d = (t=="1984:2")
Comment: The internal variable t is used to refer to observations in string form, so if you have a
cross-section sample you may just use d = (t=="123"). If the dataset has observation labels you
can use the corresponding label. For example, if you open the dataset mrw.gdt, supplied with gretl
among the examples, a dummy variable for Italy could be generated via
series DIta = (t=="Italy")
Note that this method does not require scripting at all. In fact, you might as well use the GUI Menu
“Add/Define new variable” for the same purpose, with the same syntax.
Generating a discrete variable out of a set of dummies
Problem: The dummify function (also available as a command) generates a set of mutually exclusive
dummies from a discrete variable. The reverse functionality, however, seems to be absent.
Solution:
series x = lincomb(D, seq(1, nelem(D)))
Comment: Suppose you have a list D of mutually exclusive dummies, that is a full set of 0/1 variables coding for the value of some characteristic, such that the sum of the values of the elements of D is 1 at each observation. This is, by the way, exactly what the dummify command produces. The reverse job of dummify can be performed neatly by using the lincomb function.
The code above multiplies the first dummy variable in the list D by 1, the second one by 2, and so on. Hence, the return value is a series whose value is i if and only if the i-th member of D has value 1.
If you want your coding to start from 0 instead of 1, you’ll have to modify the code snippet above
into
series x = lincomb(D, seq(0, nelem(D)-1))
Easter
Problem: I have a 7-day daily dataset. How do I create an “Easter” dummy?
Solution: We have the easterday() function, which returns month and day of Easter given the year.
The following is an example script which uses this function and a few string magic tricks:
series Easter = 0
loop y=2011..2016
a = easterday(y)
m = floor(a)
d = round(100*(a-m))
ed_str = sprintf("%04d-%02d-%02d", y, m, d)
Easter["@ed_str"] = 1
endloop
Comment: The round() function is necessary for the “day” component because otherwise floating-
point problems may ensue. Try the year 2015, for example.
Recoding a variable
Problem: You want to perform a 1-to-1 recode on a variable. For example, consider tennis points:
you may have a variable x holding values 1 to 3 and you want to recode it to 15, 30, 40.
Solution 1:
series x = replace(x, 1, 15)
series x = replace(x, 2, 30)
series x = replace(x, 3, 40)
Solution 2:
matrix tennis = {15, 30, 40}
series x = replace(x, seq(1,3), tennis)
Comment: There are many equivalent ways to achieve the same effect, but for simple cases such as
this, the replace function is simple and transparent. If you don’t mind using matrices, scripts using
replace can also be remarkably compact. Note that replace also performs n-to-1 (“surjective”)
replacements, such as
series x = replace(z, {2, 3, 5, 11, 22, 33}, 1)
which would turn all entries equal to 2, 3, 5, 11, 22 or 33 to 1 and leave the other ones unchanged.
Generating a “subset of values” dummy
Problem: You have a dataset which contains a fine-grained coding for some qualitative variable and
you want to “collapse” this to a relatively small set of dummy variables. Examples: you have place of
work by US state and you want a small set of regional dummies; or you have detailed occupational
codes from a census dataset and you want a manageable number of occupational category dummies.
Let’s call the source series src and one of the target dummies D1. And let’s say that the values of
src to be grouped under D1 are 2, 13, 14 and 25. We’ll consider three possible solutions: “Longhand,”
“Clever,” and “Proper.”
“Longhand” solution:
series D1 = src==2 || src==13 || src==14 || src==25
Comment: The above works fine if the number of distinct values in the source to be condensed into
each dummy variable is fairly small, but it becomes cumbersome if a single dummy must comprise
dozens of source values.
Clever solution:
matrix sel = {2,13,14,25}
series D1 = maxr({src} .= vec(sel)') .> 0
Comment: The subset of values to be grouped together can be written out as a matrix relatively
compactly (first line). The magic that turns this into the desired series (second line) relies on the
versatility of the "dot" (element-wise) matrix operators. The expression {src} gets a column-vector version of the input series—call this x—and vec(sel)' gets the input matrix as a row vector, in case it's a column vector or a matrix with both dimensions greater than 1—call this s. If x is $n \times 1$ and s is $1 \times m$, the .= operator produces an $n \times m$ result, each element $(i, j)$ of which equals 1 if $x_i = s_j$, otherwise 0. The maxr() function along with the .> operator (see chapter 17 for both)
then produces the result we want.
Of course, whichever procedure you use, you have to repeat for each of the dummy series you want
to create (but keep reading—the “proper” solution is probably what you want if you plan to create
several dummies).
Further comment: Note that the clever solution depends on converting what is “naturally” a vector
result into a series. This will fail if there are missing values in src, since (by default) missing values
will be skipped when converting src to x, and so the number of rows in the result will fall short
of the number of observations in the dataset. One fix is then to subsample the dataset to exclude
missing values before employing this method; another is to adjust the skip_missing setting via the
set command (see the Gretl Command Reference).
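For instance, a sketch of the second fix, re-using the "clever" one-liner from above and restoring the default setting afterwards:

# keep NAs (as NaNs) when converting series to matrices,
# so the vector result has one row per observation
set skip_missing off
series D1 = maxr({src} .= vec(sel)') .> 0
set skip_missing on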
Proper solution:
The best solution, in terms of both computational efficiency and code clarity, would be using a
"conversion table" and the replace function, to produce a series on which the dummify command can be used. For example, suppose we want to convert from a series called fips holding FIPS codes¹ for the 50 US states plus the District of Columbia to a series holding codes for the four standard US regions. We could create a 2 × 51 matrix—call it srmap—with the 51 FIPS codes on the first row and
the corresponding region codes on the second, and then do
series region = replace(fips, srmap[1,], srmap[2,])
Generating an ARMA(1,1)
Problem: Generate $y_t = 0.9\,y_{t-1} + \varepsilon_t - 0.5\,\varepsilon_{t-1}$, with $\varepsilon_t \sim NIID(0, 1)$.
Recommended solution:
alpha = 0.9
theta = -0.5
series y = filter(normal(), {1, theta}, alpha)
“Bread and butter” solution:
alpha = 0.9
theta = -0.5
series e = normal()
series y = 0
series y = alpha * y(-1) + e + theta * e(-1)
Comment: The filter function is specifically designed for this purpose so in most cases you’ll want
to take advantage of its speed and flexibility. That said, in some cases you may want to generate the
series in a manner which is more transparent (maybe for teaching purposes).
In the second solution, the statement series y = 0 is necessary because the next statement evaluates
y recursively, so y[1] must be set. Note that you must use the keyword series here instead of writing genr y = 0 or simply y=0, to ensure that y is a series and not a scalar.
Recoding a variable by classes
Problem: You want to recode a variable by classes. For example, you have the age of a sample of
individuals ($x_i$) and you need to compute age classes ($y_i$) as
$y_i = 1$ for $x_i < 18$
$y_i = 2$ for $18 \le x_i < 65$
$y_i = 3$ for $x_i \ge 65$
Solution:
series y = 1 + (x >= 18) + (x >= 65)
Comment: True and false expressions are evaluated as 1 and 0 respectively, so they can be ma-
nipulated algebraically as any other number. The same result could also be achieved by using the
conditional assignment operator (see below), but in most cases it would probably lead to more con-
voluted constructs.
1FIPS is the Federal Information Processing Standard: it assigns numeric codes from 1 to 56 to the US states and
outlying areas.
Conditional assignment
Problem: Generate $y_t$ via the following rule:
$$y_t = \begin{cases} x_t & \text{for } d_t > a \\ z_t & \text{for } d_t \le a \end{cases}$$
Solution:
series y = (d > a) ? x : z
Comment: There are several alternatives to the one presented above. One is a brute force solution
using loops. Another one, more efficient but still suboptimal, would be
series y = (d>a)*x + (d<=a)*z
However, the ternary conditional assignment operator is not only the most efficient way to accomplish
what we want, it is also remarkably transparent to read when one gets used to it. Some readers may
find it helpful to note that the conditional assignment operator works exactly the same way as the
=IF() function in spreadsheets.
Generating a time index for panel datasets
Problem: gretl has a $unit accessor, but not the equivalent for time. What should I use?
Solution:
series x = time
Comment: The special construct genr time and its variants are aware of whether a dataset is a
panel.
Sanitizing a list of regressors
Problem: I noticed that built-in commands like ols automatically drop collinear variables and put
the constant first. How can I achieve the same result for an estimator I’m writing?
Solution: No worry. The function below does just that
function list sanitize(list X)
list R = X - const
if nelem(R) < nelem(X)
R = const R
endif
return dropcoll(R)
end function
so for example the code below
nulldata 20
x = normal()
y = normal()
z = x + y # collinear
list A = x y const z
list B = sanitize(A)
list print A
list print B
returns
? list print A
x y const z
? list print B
const x y
Besides: it has been brought to our attention that some mischievous programs out there put the
constant last, instead of first, like God intended. We are not amused by this utter disrespect of
econometric tradition, but if you want to pursue the way of evil, it is rather simple to adapt the script
above to that effect.
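Should you nevertheless wish to go down that road, a variant along these lines ought to do it (a sketch, differing from the function above only in where the constant is re-attached; the function name is our own):

function list sanitize_evil(list X)
    list R = X - const
    if nelem(R) < nelem(X)
        R = R const
    endif
    return dropcoll(R)
end function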
Generating the “hat” values after an OLS regression
Problem: I've just run an OLS regression, and now I need the so-called leverage values (also known
as the “hat” values). I know you can access residuals and fitted values through “dollar” accessors, but
nothing like that seems to be available for “hat” values.
Solution: "Hat" values can be thought of as the diagonal of the projection matrix $P_X$, or more explicitly as
$$h_i = x_i'(X'X)^{-1}x_i$$
where $X$ is the matrix of regressors and $x_i'$ is its i-th row.
The reader is invited to study the code below, which offers four different solutions to the problem:
open data4-1.gdt --quiet
list X = const sqft bedrms baths
ols price X
# method 1
leverage --save --quiet
series h1 = lever
# these are necessary for what comes next
matrix mX = {X}
matrix iXX = invpd(mX'mX)
# method 2
series h2 = diag(qform(mX, iXX))
# method 3
series h3 = sumr(mX .* (mX*iXX))
# method 4
series h4 = NA
loop i=1..$nobs
matrix x = mX[i,]'
h4[i] = x'iXX*x
endloop
# verify
print h1 h2 h3 h4 --byobs
Comment: Solution 1 is the preferable one: it relies on the built-in leverage command, which
computes the requested series quite efficiently, taking care of missing values, possible restrictions to
the sample, etc.
However, three more are shown for didactical purposes, mainly to show the user how to manipulate
matrices. Solution 2 first constructs the PXmatrix explicitly, via the qform function, and then takes
its diagonal; this is definitely not recommended (despite its compactness), since you generate a much
bigger matrix than you actually need and waste a lot of memory and CPU cycles in the process. It
doesn’t matter very much in the present case, since the sample size is very small, but with a big
dataset this could be a very bad idea.
Solution 3 is more clever, and relies on the fact that, if you define $Z = X \cdot (X'X)^{-1}$, then $h_i$ could also be written as
$$h_i = x_i' z_i = \sum_{j=1}^{k} x_{ij} z_{ij}$$
which is in turn equivalent to the sum of the elements of the i-th row of $X \odot Z$, where $\odot$ is the element-by-element product. In this case, your clever usage of matrix algebra would produce a
solution computationally much superior to solution 2.
Solution 4 is the most old-fashioned one, and employs an indexed loop. While this wastes practically
no memory and employs no more CPU cycles in algebraic operations than strictly necessary, it imposes
a much greater burden on the hansl interpreter, since handling a loop is conceptually more complex
than a single operation. In practice, you’ll find that for any realistically-sized problem, solution 4 is
much slower than solution 3.
Moving functions for time series
Problem: gretl provides native functions for moving averages, but I need to compute a different
statistic on a sliding data window. Is there a way to do this without using loops?
Solution: One of the nice things about the list data type is that, if you define a list, then several
functions that would normally apply “vertically” to elements of a series apply “horizontally” across
the list. So for example, the following piece of code
open bjg.gdt
order = 12
list L = lg || lags(order-1, lg)
smpl +order ;
series movmin = min(L)
series movmax = max(L)
series movmed = median(L)
smpl full
computes the moving minimum, maximum and median of the lg series. Plotting the four series would
produce something similar to figure 21.1.
Figure 21.1: “Moving” functions
Generating data with a prescribed correlation structure
Problem: I’d like to generate a bunch of normal random variates whose covariance matrix is exactly
equal to a given matrix Σ. How can I do this in gretl?
Solution: The Cholesky decomposition is your friend. If you want to generate data with a given
population covariance matrix, then all you have to do is post-multiply your pseudo-random data by
the Cholesky factor (transposed) of the matrix you want. For example:
set seed 123
S = {2,1;1,1}
T = 1000
X = mnormal(T, rows(S))
X = X * cholesky(S)'
eval mcov(X)
should give you
? eval mcov(X)
2.0016 1.0157
1.0157 1.0306
If, instead, you want your simulated data to have a given sample covariance matrix, you have to apply
the same technique twice: one for standardizing the data, another one for giving it the covariance
structure you want. Example:
S = {2,1;1,1}
T = 1000
X = mnormal(T, rows(S))
X = X * (cholesky(S)/cholesky(mcov(X)))'
eval mcov(X)
gives you
? eval mcov(X)
2 1
1 1
as required.
21.3 Neat tricks
Interaction dummies
Problem: You want to estimate the model $y_i = x_i\beta_1 + z_i\beta_2 + d_i\beta_3 + (d_i \cdot z_i)\beta_4 + \varepsilon_i$, where $d_i$ is a dummy variable while $x_i$ and $z_i$ are vectors of explanatory variables.
Solution: As of version 1.9.12, gretl provides the ^ operator to make this operation easy. See section
15.1 for details (especially Listing 15.1). But back in my day, we used loops to do that! Here’s how:
list X = x1 x2 x3
list Z = z1 z2
list dZ = deflist()
loop foreach i Z
series d$i = d * $i
list dZ = dZ d$i
endloop
ols y X Z d dZ
Comment: It’s amazing what string substitution can do for you, isn’t it?
Realized volatility
Problem: Given data by the minute, you want to compute the "realized volatility" for the hour as $RV_t = \frac{1}{60}\sum_{\tau=1}^{60} y^2_{t:\tau}$. Imagine your sample starts at time 1:1.
Solution:
smpl --full
genr time
series minute = int(time/60) + 1
series second = time % 60
setobs minute second --panel
series rv = psd(y)^2
setobs 1 1
smpl second==1 --restrict
store foo rv
Comment: Here we trick gretl into thinking that our dataset is a panel dataset, where the minutes are the "units" and the seconds are the "time"; this way, we can take advantage of the special function psd(), panel standard deviation. Then we simply drop all observations but one per minute and save the resulting data (store foo rv translates as "store in the gretl datafile foo.gdt the series rv").
Looping over two paired lists
Problem: Suppose you have two lists with the same number of elements, and you want to apply some
command to corresponding elements over a loop.
Solution:
list L1 = a b c
list L2 = x y z
k1 = 1
loop foreach i L1
k2 = 1
loop foreach j L2
if k1 == k2
ols $i 0 $j
endif
k2++
endloop
k1++
endloop
Comment: The simplest way to achieve the result is to loop over all possible combinations and filter
out the unneeded ones via an if condition, as above. That said, in some cases variable names can
help. For example, if
list Lx = x1 x2 x3
list Ly = y1 y2 y3
then we could just loop over the integers—quite intuitive and certainly more elegant:
loop i=1..3
ols y$i const x$i
endloop
Convolution / polynomial multiplication
Problem: How do I multiply polynomials? There’s no dedicated function to do that, and yet it’s a
fairly basic mathematical task.
Solution: Never fear! We have the conv2d function, which is a tool for a more general problem, but
includes polynomial multiplication as a special case.
Suppose you want to multiply two finite-order polynomials $P(x) = \sum_{i=0}^{m} p_i x^i$ and $Q(x) = \sum_{i=0}^{n} q_i x^i$. What you want is the sequence of coefficients of the polynomial
$$R(x) = P(x) \cdot Q(x) = \sum_{k=0}^{m+n} r_k x^k$$
where
$$r_k = \sum_{i=0}^{k} p_i q_{k-i}$$
is the convolution of the $p_i$ and $q_i$ coefficients. The same operation can be performed via the FFT,
but in most cases using conv2d is quicker and more natural.
As an example, we’ll use the same one we used in Section 30.5: consider the multiplication of two
polynomials:
$$P(x) = 1 + 0.5x$$
$$Q(x) = 1 + 0.3x - 0.8x^2$$
$$R(x) = P(x) \cdot Q(x) = 1 + 0.8x - 0.65x^2 - 0.4x^3$$
The following code snippet performs all the necessary calculations:
p = {1; 0.5}
q = {1; 0.3; -0.8}
r = conv2d(p, q)
print r
Running the above produces
r (4 x 1)
1
0.8
-0.65
-0.4
which is indeed the desired result. Note that the same computation could also be performed via the
filter function, at the price of slightly more elaborate syntax.
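For the record, here is what a filter-based computation might look like (a sketch: we pad p with zeros so that the full tail of the product is returned):

p = {1; 0.5}
q = {1; 0.3; -0.8}
# append rows(q)-1 zeros so that all m+n+1 coefficients come through
r = filter(p | zeros(rows(q)-1, 1), q)
print r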
Comparing two lists
Problem: How can I tell if two lists contain the same variables (not necessarily in the same order)?
Solution: In many respects, lists are like sets, so it makes sense to use the so-called “symmetric
difference” operator, which is defined as
$$A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$$
where in this context backslash represents the relative complement operator, such that
$$A \setminus B = \{ x \in A \mid x \notin B \}$$
In practice we first check if there are any series in A but not in B, then we perform the reverse check.
If the union of the two results is an empty set, then the lists must contain the same variables. The
hansl syntax for this would be something like
scalar NotTheSame = nelem((A-B) || (B-A)) > 0
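A tiny illustration (with made-up series names):

nulldata 10
series x1 = normal()
series x2 = normal()
series x3 = normal()
list A = x1 x2 x3
list B = x3 x1 x2
# same members in a different order: this evaluates to 0
scalar NotTheSame = nelem((A-B) || (B-A)) > 0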
Reordering list elements
Problem: Is there a way to reorder list elements?
Solution: You can use the fact that a list can be cast into a vector of integers and then manipulated
via ordinary matrix syntax. So, for example, if you wanted to “flip” a list you may just use the
mreverse function. For example:
open AWM.gdt --quiet
list X = 3 6 9 12
matrix tmp = X
list revX = mreverse(tmp')
list X print
list revX print
will produce
? list X print
D1 D872 EEN_DIS GCD
? list revX print
GCD EEN_DIS D872 D1
Plotting an asymmetric confidence interval
Problem: “I like the look of the --band option to the gnuplot and plot commands, but it’s set up
for plotting a symmetric interval and I want to show an asymmetric one.”
Solution: Any interval is by construction symmetrical about its mean at each observation. So you
just need to perform a little tweak. Say you want to plot a series xalong with a band defined by the
two series top and bot. Here we go:
# create series for mid-point and deviation
series mid = (top + bot)/2
series dev = top - mid
gnuplot x --band=mid,dev --time-series --with-lines --output=display
Cross-validation
Problem: “I’d like to compute the so-called leave-one-out cross-validation criterion for my regression.
Is there a command in gretl?”
If you have a sample with n observations, the "leave-one-out" cross-validation criterion can be mechanically computed by running n regressions in which one observation at a time is omitted and all the other ones are used to forecast its value. The sum of the n squared forecast errors is the statistic we want. Fortunately, there is no need to do so. It is possible to prove that the same statistic can be computed as
$$CV = \sum_{i=1}^{n} \left[ \hat{u}_i / (1 - h_i) \right]^2,$$
where $h_i$ is the i-th element of the "hat" matrix (see section 21.2) from a regression on the whole sample.
This method is natively provided by gretl as a side benefit of the leverage command, which stores the CV criterion in the $test accessor. The following script shows the equivalence of the two
approaches:
set verbose off
open data4-1.gdt
list X = const sqft bedrms baths
# compute the CV criterion the silly way
scalar CV = 0
matrix mX = {X}
loop i = 1 .. $nobs
xi = mX[i,]
yi = price[i]
smpl obs != i --restrict
ols price X --quiet
smpl full
scalar fe = yi - xi * $coeff
CV += fe^2
endloop
printf "CV = %g\n", CV
# the smart way
ols price X --quiet
leverage --quiet
printf "CV = %g\n", $test
Is my matrix result broken?
Problem: “Most of the matrix manipulation functions available in gretl flag an error if something goes
wrong, but there’s no guarantee that every matrix computation will return an entirely finite matrix,
containing no infinities or NaNs. So how do I tell if I’ve got a fully valid matrix?”
Solution: Given a matrix m, the call ok(m) returns a matrix with the same dimensions as m, with
elements 1 for finite values and 0 for infinities or NaNs. A matrix as a whole is OK if it has no elements
which fail this test, so here’s a suitable check for a “broken” matrix, using the logical NOT operator,
!:
sumc(sumr(!ok(m))) > 0
If this gives a non-zero return value you know that m contains at least one non-finite element.
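If you need this check repeatedly it can be wrapped in a tiny function (a sketch; the function name is our own):

function scalar is_broken (const matrix m)
    # returns 1 if m contains any non-finite element, else 0
    scalar ret = sumc(sumr(!ok(m))) > 0
    return ret
end function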
Part II
Econometric methods
Chapter 22
Robust covariance matrix estimation
22.1 Introduction
Consider (once again) the linear regression model
$$y = X\beta + u \qquad (22.1)$$
where $y$ and $u$ are $T$-vectors, $X$ is a $T \times k$ matrix of regressors, and $\beta$ is a $k$-vector of parameters. As is well known, the estimator of $\beta$ given by Ordinary Least Squares (OLS) is
$$\hat{\beta} = (X'X)^{-1}X'y \qquad (22.2)$$
If the condition $E(u|X) = 0$ is satisfied, this is an unbiased estimator; under somewhat weaker conditions the estimator is biased but consistent. It is straightforward to show that when the OLS estimator is unbiased (that is, when $E(\hat{\beta} - \beta) = 0$), its variance is
$$\mathrm{Var}(\hat{\beta}) = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right] = (X'X)^{-1}X'\Omega X(X'X)^{-1} \qquad (22.3)$$
where $\Omega = E(uu')$ is the covariance matrix of the error terms.
Under the assumption that the error terms are independently and identically distributed (iid) we can write $\Omega = \sigma^2 I$, where $\sigma^2$ is the (common) variance of the errors (and the covariances are zero). In that case (22.3) simplifies to the "classical" formula,
$$\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1} \qquad (22.4)$$
If the iid assumption is not satisfied, two things follow. First, it is possible in principle to construct
a more efficient estimator than OLS—for instance some sort of Feasible Generalized Least Squares
(FGLS). Second, the simple “classical” formula for the variance of the least squares estimator is no
longer correct, and hence the conventional OLS standard errors—which are just the square roots
of the diagonal elements of the matrix defined by (22.4)—do not provide valid means of statistical
inference.
In the recent history of econometrics there are broadly two approaches to the problem of non-iid
errors. The “traditional” approach is to use an FGLS estimator. For example, if the departure from
the iid condition takes the form of time-series dependence, and if one believes that this could be
modeled as a case of first-order autocorrelation, one might employ an AR(1) estimation method such
as Cochrane–Orcutt, Hildreth–Lu, or Prais–Winsten. If the problem is that the error variance is
non-constant across observations, one might estimate the variance as a function of the independent
variables and then perform weighted least squares, using as weights the reciprocals of the estimated
variances.
While these methods are still in use, an alternative approach has found increasing favor: that is, use
OLS but compute standard errors (or more generally, covariance matrices) that are robust with respect
to deviations from the iid assumption. This is typically combined with an emphasis on using large
datasets—large enough that the researcher can place some reliance on the (asymptotic) consistency
property of OLS. This approach has been enabled by the availability of cheap computing power. The
computation of robust standard errors and the handling of very large datasets were daunting tasks at
one time, but now they are unproblematic. The other point favoring the newer methodology is that
while FGLS offers an efficiency advantage in principle, it often involves making additional statistical
assumptions which may or may not be justified, which may not be easy to test rigorously, and which
may threaten the consistency of the estimator—for example, the “common factor restriction” that is
implied by traditional FGLS “corrections” for autocorrelated errors.
James Stock and Mark Watson’s Introduction to Econometrics illustrates this approach at the level
of undergraduate instruction: many of the datasets they use comprise thousands or tens of thousands
of observations; FGLS is downplayed; and robust standard errors are reported as a matter of course.
In fact, the discussion of the classical standard errors (labeled “homoskedasticity-only”) is confined to
an Appendix.
Against this background it may be useful to set out and discuss all the various options offered by
gretl in respect of robust covariance matrix estimation. The first point to notice is that gretl produces
“classical” standard errors by default (in all cases apart from GMM estimation). In script mode you
can get robust standard errors by appending the --robust flag to estimation commands. In the GUI
program the model specification dialog usually contains a “Robust standard errors” check box, along
with a “configure” button that is activated when the box is checked. The configure button takes you
to a configuration dialog (which can also be reached from the main menu bar: Tools → Preferences → General → HCCME). There you can select from a set of possible robust estimation variants, and
can also choose to make robust estimation the default.
The specifics of the available options depend on the nature of the data under consideration—cross-
sectional, time series or panel—and also to some extent the choice of estimator. (Although we
introduced robust standard errors in the context of OLS above, they may be used in conjunction with
other estimators too.) The following three sections of this chapter deal with matters that are specific
to the three sorts of data just mentioned. Note that additional details regarding covariance matrix
estimation in the context of GMM are given in chapter 27.
We close this introduction with a brief statement of what “robust standard errors” can and cannot
achieve. They can provide for asymptotically valid statistical inference in models that are basically
correctly specified, but in which the errors are not iid. The “asymptotic” part means that they may
be of little use in small samples. The “correct specification” part means that they are not a magic
bullet: if the error term is correlated with the regressors, so that the parameter estimates themselves
are biased and inconsistent, robust standard errors will not save the day.
22.2 Cross-sectional data and the HCCME
With cross-sectional data, the most likely departure from iid errors is heteroskedasticity (non-constant
variance).¹ In some cases one may be able to arrive at a judgment regarding the likely form of the
heteroskedasticity, and hence to apply a specific correction. The more common case, however, is
where the heteroskedasticity is of unknown form. We seek an estimator of the covariance matrix
of the parameter estimates that retains its validity, at least asymptotically, in face of unspecified
heteroskedasticity. It is not obvious a priori that this should be possible, but White (1980) showed
that
$$\widehat{\mathrm{Var}}_h(\hat{\beta}) = (X'X)^{-1}X'\hat{\Omega}X(X'X)^{-1} \qquad (22.5)$$
does the trick. (As usual in statistics we need to say "under certain conditions", but the conditions are not very restrictive.) $\hat{\Omega}$ is in this context a diagonal matrix, whose non-zero elements may be estimated using squared OLS residuals. White referred to (22.5) as a heteroskedasticity-consistent covariance matrix estimator (HCCME).
Davidson and MacKinnon (2004, chapter 5) offer a useful discussion of several variants on White's HCCME theme. They refer to the original variant of (22.5)—in which the diagonal elements of $\hat{\Omega}$ are estimated directly by the squared OLS residuals, $\hat{u}^2_t$—as HC0. (The associated standard errors are often called "White's standard errors".) The various refinements of White's proposal share a common point of departure, namely the idea that the squared OLS residuals are likely to be "too small" on average. This point is quite intuitive. The OLS parameter estimates, $\hat{\beta}$, satisfy by design the criterion that the sum of squared residuals,
$$\sum \hat{u}^2_t = \sum \left( y_t - X_t\hat{\beta} \right)^2$$
is minimized for given $X$ and $y$. Suppose that $\hat{\beta} \neq \beta$. This is almost certain to be the case: even if OLS is not biased it would be a miracle if the $\hat{\beta}$ calculated from any finite sample were exactly equal
1In some specialized contexts spatial autocorrelation may be an issue. Gretl does not have any built-in methods to
handle this and we will not discuss it here.
to $\beta$. But in that case the sum of squares of the true, unobserved errors, $\sum u^2_t = \sum (y_t - X_t\beta)^2$, is bound to be greater than $\sum \hat{u}^2_t$. The elaborated variants on HC0 take this point on board as follows:
• HC1: Applies a degrees-of-freedom correction, multiplying the HC0 matrix by $T/(T-k)$.
• HC2: Instead of using $\hat{u}^2_t$ for the diagonal elements of $\hat{\Omega}$, uses $\hat{u}^2_t/(1-h_t)$, where $h_t = X_t(X'X)^{-1}X_t'$, the $t$th diagonal element of the projection matrix $P_X$, which has the property that $P_X \cdot y = \hat{y}$. The relevance of $h_t$ is that if the variance of all the $u_t$ is $\sigma^2$, the expectation of $\hat{u}^2_t$ is $\sigma^2(1-h_t)$, or in other words, the ratio $\hat{u}^2_t/(1-h_t)$ has expectation $\sigma^2$. As Davidson and MacKinnon show, $0 \le h_t < 1$ for all $t$, so this adjustment cannot reduce the diagonal elements of $\hat{\Omega}$ and in general revises them upward.
• HC3: Uses $\hat{u}^2_t/(1-h_t)^2$. The additional factor of $(1-h_t)$ in the denominator, relative to HC2, may be justified on the grounds that observations with large variances tend to exert a lot of influence on the OLS estimates, so that the corresponding residuals tend to be under-estimated. See Davidson and MacKinnon for a fuller explanation.
• HC3a: Implements the jackknife approach from MacKinnon and White (1985). (HC3 is a close approximation of this.)
The relative merits of these variants have been explored by means of both simulations and theoretical
analysis. Unfortunately there is not a clear consensus on which is “best”. Davidson and MacKinnon
argue that the original HC0 is likely to perform worse than the others and in gretl the default is HC1.
If you want comparability with other software that reports “White’s standard errors” you can choose
HC0.
If you wish to use a version other than HC1 you can arrange for this in either of two ways. In script
or console mode you can do, for example,
set hc_version 2
and the version you specify will be applied in the current gretl session. In the GUI program you can
go to the HCCME configuration dialog, as noted above, and choose any of these variants to be the
default: a choice made in this way persists across gretl sessions.
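For example, the following script fragment requests HC2 standard errors (a minimal sketch; the series names are hypothetical):

set hc_version 2
ols y const x1 x2 --robust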
22.3 Time series data and HAC covariance matrices
Heteroskedasticity may be an issue with time series data too but it is unlikely to be the only, or even
the primary, concern.
One form of heteroskedasticity is common in macroeconomic time series but is fairly easily dealt
with. That is, in the case of strongly trending series such as Gross Domestic Product, aggregate
consumption, aggregate investment, and so on, higher levels of the variable in question are likely
to be associated with higher variability in absolute terms. The obvious “fix”, employed in many
macroeconometric studies, is to use the logs of such series rather than the raw levels. Provided the
proportional variability of such series remains roughly constant over time the log transformation is
effective.
Other forms of heteroskedasticity may resist the log transformation, but may demand a special treat-
ment distinct from the calculation of robust standard errors. We have in mind here autoregressive
conditional heteroskedasticity, for example in the behavior of asset prices, where large disturbances to
the market may usher in periods of increased volatility. Such phenomena call for specific estimation
strategies, such as GARCH (see chapter 31).
Despite the points made above, some residual degree of heteroskedasticity may be present in time
series data: the key point is that in most cases it is likely to be combined with serial correlation
(autocorrelation), hence demanding a special treatment. In White’s approach, $\hat\Omega$, the estimated covariance matrix of the $u_t$, remains conveniently diagonal: the variances, $E(u_t^2)$, may differ by $t$ but the covariances, $E(u_t u_s)$ for $s \neq t$, are all zero. Autocorrelation in time series data means that at least some of the off-diagonal elements of $\hat\Omega$ should be non-zero. This introduces a substantial complication and requires another piece of terminology: estimates of the covariance matrix that are asymptotically valid in the face of both heteroskedasticity and autocorrelation of the error process are termed HAC (heteroskedasticity- and autocorrelation-consistent).
The issue of HAC estimation is treated in more technical terms in chapter 27. Here we try to convey
some of the intuition at a more basic level. We begin with a general comment: residual autocorrelation
is not so much a property of the data as a symptom of an inadequate model. Data may be persistent through time, and if we fit a model that does not take this aspect into account properly we end up
with a model with autocorrelated disturbances. Conversely, it is often possible to mitigate or even
eliminate the problem of autocorrelation by including relevant lagged variables in a time series model,
or in other words, by specifying the dynamics of the model more fully. HAC estimation should not
be seen as the first resort in dealing with an autocorrelated error process.
That said, the “obvious” extension of White’s HCCME to the case of autocorrelated errors would seem to be this: estimate the off-diagonal elements of $\hat\Omega$ (that is, the autocovariances, $E(u_t u_s)$) using, once again, the appropriate OLS residuals: $\hat\omega_{ts} = \hat{u}_t \hat{u}_s$. This is basically right, but demands an important amendment. We seek a consistent estimator, one that converges towards the true $\Omega$ as the sample size tends towards infinity. This can’t work if we allow unbounded serial dependence. A larger sample will enable us to estimate more of the true $\omega_{ts}$ elements (that is, for $t$ and $s$ more widely separated in time) but will not contribute ever-increasing information regarding the maximally separated $\omega_{ts}$ pairs, since the maximal separation itself grows with the sample size. To ensure consistency we have to confine our attention to processes exhibiting temporally limited dependence. In other words, we cut off the computation of the $\hat\omega_{ts}$ values at some maximum value of $p = t - s$, where $p$ is treated as an increasing function of the sample size, $T$, although it cannot increase in proportion to $T$.

The simplest variant of this idea is to truncate the computation at some finite lag order $p$, where $p$ grows as, say, $T^{1/4}$. The trouble with this is that the resulting $\hat\Omega$ may not be a positive definite matrix. In practical terms, we may end up with negative estimated variances. One solution to this problem is offered by the Newey–West estimator (Newey and West, 1987), which assigns declining weights to the sample autocovariances as the temporal separation increases.
To understand this point it is helpful to look more closely at the covariance matrix given in (22.5), namely,
$$(X'X)^{-1}\,(X'\hat\Omega X)\,(X'X)^{-1}$$
This is known as a “sandwich” estimator. The bread, which appears on both sides, is $(X'X)^{-1}$. This $k \times k$ matrix is also the key ingredient in the computation of the classical covariance matrix. The filling in the sandwich is
$$\hat\Sigma = X'\hat\Omega X$$
where the dimensions of $\hat\Sigma$, $X'$, $\hat\Omega$ and $X$ are, respectively, $k \times k$, $k \times T$, $T \times T$ and $T \times k$. It can be proven that under mild regularity conditions $T^{-1}\hat\Sigma$ is a consistent estimator of the long-run covariance matrix of the random $k$-vector $x_t \cdot u_t$.

From a computational point of view it is neither necessary nor desirable to store the (potentially very large) $T \times T$ matrix $\hat\Omega$ as such. Rather, one computes the sandwich filling by summation as a weighted sum:
$$\hat\Sigma = \hat\Gamma(0) + \sum_{j=1}^{p} w_j \left(\hat\Gamma(j) + \hat\Gamma(j)'\right)$$
where $w_j$ is the weight given to lag $j > 0$ and the $k \times k$ matrix $\hat\Gamma(j)$, for $j \ge 0$, is given by
$$\hat\Gamma(j) = \sum_{t=j+1}^{T} \hat{u}_t \hat{u}_{t-j}\, X_t' X_{t-j};$$
that is, the sample autocovariance matrix of $x_t \cdot u_t$ at lag $j$, apart from a scaling factor $T$.
This leaves two questions. How exactly do we determine the maximum lag length or “bandwidth”, p,
of the HAC estimator? And how exactly are the weights $w_j$ to be determined? We will return to the
(difficult) question of the bandwidth shortly. As regards the weights, gretl offers three variants. The
default is the Bartlett kernel, as used by Newey and West. This sets
$$w_j = \begin{cases} 1 - \dfrac{j}{p+1} & j \le p \\ 0 & j > p \end{cases}$$
so the weights decline linearly as $j$ increases. The other two options are the Parzen kernel and the Quadratic Spectral (QS) kernel. For the Parzen kernel,
$$w_j = \begin{cases} 1 - 6a_j^2 + 6a_j^3 & 0 \le a_j \le 0.5 \\ 2(1 - a_j)^3 & 0.5 < a_j \le 1 \\ 0 & a_j > 1 \end{cases}$$
where $a_j = j/(p+1)$, and for the QS kernel,
$$w_j = \frac{25}{12\pi^2 d_j^2}\left(\frac{\sin m_j}{m_j} - \cos m_j\right)$$
where $d_j = j/p$ and $m_j = 6\pi d_j/5$.
Figure 22.1 shows the weights generated by these kernels, for $p = 4$ and $j = 1$ to 9.

[Figure 22.1: Three HAC kernels. Panels: Bartlett, Parzen, QS]
In gretl you select the kernel using the set command with the hac_kernel parameter:
set hac_kernel parzen
set hac_kernel qs
set hac_kernel bartlett
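For concreteness, the Bartlett weights for a given bandwidth can also be computed directly in a script; the following little sketch (purely illustrative, not part of gretl's internals) simply evaluates the formula above for p = 4:

scalar p = 4
matrix w = zeros(p, 1)
loop j=1..p
    w[j] = 1 - j/(p+1)   # Bartlett weight at lag j
endloop
print w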
Selecting the HAC bandwidth
The asymptotic theory developed by Newey, West and others tells us in general terms how the HAC bandwidth, $p$, should grow with the sample size, $T$; that is, $p$ should grow in proportion to some fractional power of $T$. Unfortunately this is of little help to the applied econometrician, working with a given dataset of fixed size. Various rules of thumb have been suggested and gretl implements two such. The default is $p = 0.75\,T^{1/3}$, as recommended by Stock and Watson (2003). An alternative is $p = 4(T/100)^{2/9}$, as in Wooldridge (2002b). In each case one takes the integer part of the result.
These variants are labeled nw1 and nw2 respectively, in the context of the set command with the
hac_lag parameter. That is, you can switch to the version given by Wooldridge with
set hac_lag nw2
As shown in Table 22.1 the choice between nw1 and nw2 does not make a great deal of difference.
       T    p (nw1)    p (nw2)
      50       2          3
     100       3          4
     150       3          4
     200       4          4
     300       5          5
     400       5          5

Table 22.1: HAC bandwidth: two rules of thumb
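The table entries are easy to verify by hand; a minimal check for one sample size:

scalar T = 150
scalar p_nw1 = int(0.75 * T^(1/3))     # Stock-Watson rule
scalar p_nw2 = int(4 * (T/100)^(2/9))  # Wooldridge rule
printf "T = %d: nw1 gives p = %d, nw2 gives p = %d\n", T, p_nw1, p_nw2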
You also have the option of specifying a fixed numerical value for p, as in
set hac_lag 6
In addition you can set a distinct bandwidth for use with the Quadratic Spectral kernel (since this
need not be an integer). For example,
set qs_bandwidth 3.5
Prewhitening and data-based bandwidth selection
An alternative approach is to deal with residual autocorrelation by attacking the problem from two
sides. The intuition behind the technique known as VAR prewhitening (Andrews and Monahan, 1992) can be illustrated by a simple example. Let $x_t$ be a sequence of first-order autocorrelated random variables,
$$x_t = \rho x_{t-1} + u_t$$
The long-run variance of $x_t$ can be shown to be
$$V_{LR}(x_t) = \frac{V_{LR}(u_t)}{(1-\rho)^2}$$
In most cases $u_t$ is likely to be less autocorrelated than $x_t$, so a smaller bandwidth should suffice. Estimation of $V_{LR}(x_t)$ can therefore proceed in three steps: (1) estimate $\rho$; (2) obtain a HAC estimate of $\hat{u}_t = x_t - \hat\rho x_{t-1}$; and (3) divide the result by $(1 - \hat\rho)^2$.

The application of the above concept to our problem implies estimating a finite-order Vector Autoregression (VAR) on the vector variables $\xi_t = X_t\hat{u}_t$. In general the VAR can be of any order, but in most cases 1 is sufficient; the aim is not to build a watertight model for $\xi_t$, but just to “mop up” a substantial part of the autocorrelation. Hence, the following VAR is estimated:
$$\xi_t = A\,\xi_{t-1} + \varepsilon_t$$
Then an estimate of the matrix $X'\Omega X$ can be recovered via
$$(I - \hat{A})^{-1}\,\hat\Sigma_\varepsilon\,(I - \hat{A}')^{-1}$$
where $\hat\Sigma_\varepsilon$ is any HAC estimator, applied to the VAR residuals.
You can ask for prewhitening in gretl using
set hac_prewhiten on
There is at present no mechanism for specifying an order other than 1 for the initial VAR.
A further refinement is available in this context, namely data-based bandwidth selection. It makes
intuitive sense that the HAC bandwidth should not simply be based on the size of the sample, but
should somehow take into account the time-series properties of the data (and also the kernel chosen).
A nonparametric method for doing this was proposed by Newey and West (1994); a good concise
account of the method is given in Hall (2005). This option can be invoked in gretl via
set hac_lag nw3
This option is the default when prewhitening is selected, but you can override it by giving a specific
numerical value for hac_lag.
Even the Newey–West data-based method does not fully pin down the bandwidth for any particular
sample. The first step involves calculating a series of residual covariances. The length of this series
is given as a function of the sample size, but only up to a scalar multiple; for example, it is given as $O(T^{2/9})$ for the Bartlett kernel. Gretl uses an implied multiple of 1.
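Putting the pieces of this section together, a typical HAC setup in a script might look like this (the series names are hypothetical; with a time-series dataset the --robust flag produces HAC standard errors):

set hac_kernel bartlett
set hac_prewhiten on    # data-based bandwidth (nw3) becomes the default
ols y 0 x1 x2 --robust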
Newey–West with missing values
If the estimation sample for a time-series model includes incomplete observations (where the value of the dependent variable or one or more regressors is missing), the Newey–West procedure must be either modified or abandoned, since some ingredients of the $\hat\Sigma$ matrix defined above will be absent. Two modified methods have been discussed in the literature. Parzen (1963) proposed what he called Amplitude Modulation (AM), which involves setting the values of the residual and each of the regressors to zero for the incomplete observations (and then proceeding as usual). Datta and Du (2012) propose the so-called Equal Spacing (ES) method: calculate as if the incomplete observations did not exist, and the complete observations therefore form an equally-spaced series. Somewhat surprisingly, it can be shown that both of these methods have appropriate asymptotic properties; see Rho and Vogelsang (2018) for further elaboration.
In gretl you can select a preferred method via one or other of these commands:
set hac_missvals es # ES, Datta and Du
set hac_missvals am # AM, Parzen
set hac_missvals off
The ES method is the default. The off option means that gretl will refuse to produce HAC standard
errors when the sample includes incomplete observations: use this if you have qualms about the
modified methods.
VARs: a special case
A well-specified vector autoregression (VAR) will generally include enough lags of the dependent vari-
ables to obviate the problem of residual autocorrelation, in which case HAC estimation is redundant—
although there may still be a need to correct for heteroskedasticity. For that reason plain HCCME,
and not HAC, is the default when the --robust flag is given in the context of the var command.
However, if for some reason you need HAC you can force the issue by giving the option --robust-hac.
Long-run variance
Let us expand a little on the subject of the long-run variance that was mentioned above and the
associated tools offered by gretl for scripting. (You may also want to check out the reference for
the lrcovar function for the multivariate case.) As is well known, the variance of the average of $T$ random variables $x_1, x_2, \dots, x_T$ with equal variance $\sigma^2$ equals $\sigma^2/T$ if the data are uncorrelated. In this case, the sample variance of $x_t$, divided by the sample size, provides a consistent estimator.

If, however, there is serial correlation among the $x_t$s, the variance of $\bar{X} = T^{-1}\sum_{t=1}^{T} x_t$ must be estimated differently. One of the most widely used statistics for this purpose is a nonparametric kernel estimator with the Bartlett kernel, defined as
$$\hat\omega^2(k) = T^{-1}\sum_{t=k}^{T-k}\left[\sum_{i=-k}^{k} w_i\,(x_t - \bar{X})(x_{t-i} - \bar{X})\right] \qquad (22.6)$$
where the integer $k$ is known as the window size and the $w_i$ terms are the so-called Bartlett weights, defined as $w_i = 1 - \frac{|i|}{k+1}$. It can be shown that, for $k$ large enough, $\hat\omega^2(k)/T$ yields a consistent estimator of the variance of $\bar{X}$.
gretl implements this estimator by means of the function lrvar(). This function takes one required
argument, namely the series whose long-run variance is to be estimated, followed by two optional
arguments. The first of these can be used to supply a value for $k$; if it is omitted or negative, the popular choice $T^{1/3}$ is used. The second allows specification of an assumed value for the population mean of $X$, which then replaces $\bar{X}$ in the variance calculation. Usage is illustrated below.
# automatic window size; use xbar for mean
lrs2 = lrvar(x)
# set a window size of 12
lrs2 = lrvar(x, 12)
# set window size and impose assumed mean of zero
lrs2 = lrvar(x, 12, 0)
# impose mean zero, automatic window size
lrs2 = lrvar(x, -1, 0)
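For readers who would like to see the calculation spelled out, here is a rough hand-rolled version written as a hansl function (a sketch only: it computes the same Bartlett-weighted quantity in the equivalent "weighted sum of autocovariances" form, and its treatment of the sample endpoints may differ slightly from both (22.6) and the built-in lrvar()):

function scalar my_lrvar (series x, int k)
    matrix v = {x} - mean(x)      # demeaned data as a column vector
    scalar T = rows(v)
    scalar s = v'v                # lag-0 term, weight 1
    loop i=1..k
        scalar w = 1 - i/(k+1)    # Bartlett weight at lag i
        matrix a = v[i+1:T]
        matrix b = v[1:T-i]
        s += 2 * w * a'b          # add weighted autocovariance term
    endloop
    return s/T
end function

# compare with the built-in function (x is any series in the dataset)
eval my_lrvar(x, 12)
eval lrvar(x, 12)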
22.4 Special issues with panel data
Since panel data have both a time-series and a cross-sectional dimension one might expect that, in
general, robust estimation of the covariance matrix would require handling both heteroskedasticity
and autocorrelation (the HAC approach). In addition, some special features of panel data require
attention.
• The variance of the error term may differ across the cross-sectional units.

• The covariance of the errors across the units may be non-zero in each time period.

• If the “between” variation is not swept out, the errors may exhibit autocorrelation, not in the usual time-series sense but in the sense that the mean value of the error term may differ across units. This is relevant when estimation is by pooled OLS.
Gretl currently offers three panel-specific covariance matrix estimators in response to the --robust
option. These are available for models estimated via fixed effects, random effects, pooled OLS, and
pooled two-stage least squares. The default robust estimator is that suggested by Arellano (2003),
which is HAC provided the panel is of the “large $n$, small $T$” variety (that is, many units are observed in relatively few periods). The Arellano estimator involves clustering by the cross-sectional unit:
$$\hat\Sigma_A = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i'\hat{u}_i\hat{u}_i'X_i\right)(X'X)^{-1}$$
where $X$ is the matrix of regressors (with the group means subtracted in the case of fixed effects, or quasi-demeaned in the case of random effects), $\hat{u}_i$ denotes the vector of residuals for unit $i$, and $n$ is the number of such units. Cameron and Trivedi (2005) make a strong case for using this estimator; they note that the ordinary White HCCME can produce misleadingly small standard errors in the panel context because it fails to take autocorrelation into account.2 In addition Stock and Watson (2008) show that the White HCCME is inconsistent in the fixed-effects panel context for fixed $T > 2$.
In cases where autocorrelation is not an issue the estimator proposed by Beck and Katz (1995) and
discussed by Greene (2003, chapter 13) may be appropriate. This estimator, which takes into account
contemporaneous correlation across the units and heteroskedasticity by unit, is
$$\hat\Sigma_{BK} = (X'X)^{-1}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\hat\sigma_{ij}\,X_i'X_j\right)(X'X)^{-1}$$
The covariances $\hat\sigma_{ij}$ are estimated via
$$\hat\sigma_{ij} = \frac{\hat{u}_i'\hat{u}_j}{T_i}$$
where $T_i$ is the length of the time series for unit $i$. Beck and Katz call the associated standard errors “Panel-Corrected Standard Errors” (PCSE). This estimator can be invoked in gretl via the command
set panel_robust pcse
The Arellano default can be re-established via
set panel_robust arellano
The third panel-specific option is the spatial correlation consistent (SCC) estimator developed by
Driscoll and Kraay (1998). This addresses cross-sectional dependence of the disturbances as well as
heteroskedasticity and autocorrelation. Serial correlation is handled in the manner of Newey–West
and, unlike the Arellano estimator, consistency is not limited to the “small T case. The command
to select SCC is
set panel_robust scc
2 See also Cameron and Miller (2015) for a discussion of the Arellano-type estimator in the context of the random
effects model.
An additional set variable is relevant when using this estimator, namely hac_lag, which governs the bandwidth of the kernel employed in the Newey–West component as described in section 22.3 above. But note that in the SCC context only the Bartlett kernel is supported, and neither prewhitening nor data-based bandwidth selection is available. So the applicable hac_lag variants are just nw1, nw2 or a user-specified maximum (integer) lag. To replicate results from the xtscc command for Stata (Hoechle, 2007) the nw2 variant should be selected.

Note that regardless of the panel_robust setting, the robust estimator is not used unless the --robust flag is given with the estimation command (or the “Robust” box is checked in the graphical interface). For some further remarks on the panel case, see the following section.
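For example (with hypothetical series names), the following selects the Driscoll–Kraay estimator with a fixed Newey–West lag and then estimates a fixed-effects model with robust standard errors:

set panel_robust scc
set hac_lag 4
panel y 0 x1 x2 --fixed-effects --robust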
22.5 The cluster-robust estimator
We begin by describing the cluster-robust variance estimator in general terms. Specific points per-
taining to panel data follow in the final subsection.
This estimator is appropriate when the observations naturally fall into groups or clusters, and the
error term exhibits dependency within the clusters and/or heteroskedasticity across clusters. Such
clusters may be binary (e.g. employed versus unemployed workers), categorical with several values (e.g.
products grouped by manufacturer) or ordinal (e.g. individuals with low, middle or high education
levels).
For linear regression models estimated via least squares the cluster estimator is defined as
$$\hat\Sigma_C = (X'X)^{-1}\left(\sum_{j=1}^{m} X_j'\hat{u}_j\hat{u}_j'X_j\right)(X'X)^{-1}$$
where $m$ denotes the number of clusters, and $X_j$ and $\hat{u}_j$ denote, respectively, the matrix of regressors and the vector of residuals that fall within cluster $j$. As noted above, the Arellano variance estimator for panel data models is a special case of this, where the clustering is by panel unit.
For models estimated by the method of Maximum Likelihood (in which case the standard variance
estimator is the inverse of the negative Hessian, H), the cluster estimator is
$$\hat\Sigma_C = H^{-1}\left(\sum_{j=1}^{m} G_j'G_j\right)H^{-1}$$
where $G_j$ is the sum of the “score” (that is, the derivative of the loglikelihood with respect to the parameter estimates) across the observations falling within cluster $j$.
It is common to apply a degrees of freedom adjustment to these estimators (otherwise the variance may appear misleadingly small in comparison with other estimators if the number of clusters is small). In the least squares case the factor is $(m/(m-1)) \times (n-1)/(n-k)$, where $n$ is the total number of observations and $k$ is the number of parameters estimated; in the case of ML estimation the factor is just $m/(m-1)$.
Availability and syntax
Cluster-robust estimation is invoked via the --cluster option, which is available for models estimated
via OLS and TSLS, and also for most ML estimators other than those specialized for time-series data;
so binary logit and probit, ordered logit and probit, multinomial logit, Tobit, interval regression,
biprobit, count models and duration models are all supported. In addition this option is available for
generic maximum likelihood estimation as provided by the mle command (see chapter 26 for more
details).
The --cluster option has a required parameter, the name of a series that defines the clusters, as in
ols y 0 x1 x2 --cluster=cvar
The specified series must (a) be defined (not missing) at all observations used in estimating the model
and (b) take on at least two distinct values over the estimation range. The clusters are defined as
sets of observations having a common value for the series in question. It is generally expected that
the number of clusters is substantially less than the total number of observations.
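The same option works with the supported ML estimators; for instance, a binary probit with clustered standard errors (series names hypothetical) would be

probit y 0 x1 x2 --cluster=cvar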
Panel data specifics
In the case of panel data, two additional features are supported. First, two comma-separated cluster
variables can be given with the --cluster option, for two-way clustering. Second, the special terms
$unit and $time can be used in place of names of existing series to specify, respectively, clustering
by cross-sectional unit and by time period. So for example, the command
panel ... --cluster=$unit,$time
invokes two-way clustering that allows for dependence of the error term both within unit and within
period. The special terms can be used individually, in combination as shown above, or in combination
with other suitable series.
You can use named series that identify unit and time rather than the $-terms, but the latter are a good deal more efficient since they key directly into the panel structure of the dataset. Given an arbitrary named series, more work is needed to determine which observations fall within which clusters.
A plausible case for clustering by a named series arises if the dataset contains a series that identifies
a set of groups into which the panel units fall—for example, the panel units are individuals but you
know which individuals are members of which household, or the panel units are counties but you
know in which state each county is located. Then you may wish to cluster at the more aggregated
level, as in
panel ... --cluster=household
# OR
panel ... --cluster=state
In these cases, of course, the parameter to --cluster must be the name of a series that does the job,
providing a unique identifier for each of the groups.
The method for two-way clustering is that described by Cameron et al. (2011); that is, the variance estimator is
$$\hat\Sigma_{C2} = \Sigma_1 + \Sigma_2 - \Sigma_{1,2}$$
where $\Sigma_1$ and $\Sigma_2$ are the variance estimators produced using the two cluster variables taken individually and $\Sigma_{1,2}$ is the estimator obtained from their combination. In case $\hat\Sigma_{C2}$ is not positive semi-definite we calculate its eigen decomposition, set any negative eigenvalues to zero, and recreate the matrix.
Note that if a cluster variable C1 is nested within a more highly aggregated one, C2, then two-way
clustering is not called for since it amounts to clustering on C2 alone. To continue the example from
above, if the panel units are counties and the assignment of counties to states is time-invariant, the
command
panel ... --cluster=$unit,state # not recommended!
is just a round-about way of asking for clustering by state.
The precise magnitude of standard errors produced in case of cluster-robust estimation depends
on whether a degrees of freedom adjustment is applied, and if so on how exactly the adjustment is
calculated—a matter which is somewhat debatable. The default procedure in gretl is that of Cameron
et al. (2011). That is, we apply an adjustment factor equal to
$$\frac{m}{m-1}\times\frac{n-1}{n-k}$$
where $m$ is the number of clusters, $n$ is the total number of observations, and $k$ is the number of parameters estimated. Results then agree with Stata’s xtreg command and also the cgmreg command made available by Colin Cameron.3 To produce clustered standard errors in agreement with the popular contributed Stata command xtivreg2 it is necessary to suppress this adjustment: append the --no-df-corr option to the panel command.
3 As of 2023-10-12 this is available via https://cameron.econ.ucdavis.edu/research/cgmreg.ado.
Chapter 23
Panel data
A panel dataset is one in which each of N > 1 units (sometimes called “individuals” or “groups”) is
observed over time. In a balanced panel there are T > 1 observations on each unit; more generally
the number of observations may differ by unit. In the following we index units by $i$ and time by $t$. To allow for imbalance in a panel we use the notation $T_i$ to refer to the number of observations for unit or individual $i$.
23.1 Estimation of panel models
Pooled Ordinary Least Squares
The simplest estimator for panel data is pooled OLS. In most cases this is unlikely to be adequate,
but it provides a baseline for comparison with more complex estimators.
If you estimate a model on panel data using OLS an additional test item becomes available. In the GUI model window this is the item “panel specification” under the Tests menu; the script counterpart is the panspec command.
To take advantage of this test, you should specify a model without any dummy variables representing
cross-sectional units. The test compares pooled OLS against the principal alternatives, the fixed
effects and random effects models. These alternatives are explained in the following section.
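In script terms the sequence is simply (series names hypothetical):

ols y 0 x1 x2    # pooled OLS on a panel dataset
panspec          # compare against the fixed and random effects alternatives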
The fixed and random effects models
In the graphical interface these options are found under the menu item “Model/Panel/Fixed and
random effects”. In the command-line interface one uses the panel command, with or without the
--random-effects option. (The --fixed-effects option is also allowed but not strictly necessary,
being the default.)
This section explains the nature of these models and comments on their estimation via gretl.
The pooled OLS specification may be written as
$$y_{it} = X_{it}\beta + u_{it} \qquad (23.1)$$
where $y_{it}$ is the observation on the dependent variable for cross-sectional unit $i$ in period $t$, $X_{it}$ is a $1\times k$ vector of independent variables observed for unit $i$ in period $t$, $\beta$ is a $k\times 1$ vector of parameters, and $u_{it}$ is an error or disturbance term specific to unit $i$ in period $t$.
The fixed and random effects models have in common that they decompose the unitary pooled error term, $u_{it}$. For the fixed effects model we write $u_{it} = \alpha_i + \varepsilon_{it}$, yielding
$$y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it} \qquad (23.2)$$
That is, we decompose $u_{it}$ into a unit-specific and time-invariant component, $\alpha_i$, and an observation-specific error, $\varepsilon_{it}$.1 The $\alpha_i$s are then treated as fixed parameters (in effect, unit-specific $y$-intercepts), which are to be estimated. This can be done by including a dummy variable for each cross-sectional unit (and suppressing the global constant). This is sometimes called the Least Squares Dummy Variables (LSDV) method. Alternatively, one can subtract the group mean from each of the variables and estimate a model without a constant. In the latter case the dependent variable may be written as
$$\tilde{y}_{it} = y_{it} - \bar{y}_i$$

1 It is possible to break a third component out of $u_{it}$, namely $w_t$, a shock that is time-specific but common to all the units in a given period. In the interest of simplicity we do not pursue that option here.
The “group mean”, $\bar{y}_i$, is defined as
$$\bar{y}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}$$
where $T_i$ is the number of observations for unit $i$. An exactly analogous formulation applies to the independent variables. Given parameter estimates, $\hat\beta$, obtained using such de-meaned data we can recover estimates of the $\alpha_i$s using
$$\hat\alpha_i = \frac{1}{T_i}\sum_{t=1}^{T_i}\left(y_{it} - X_{it}\hat\beta\right)$$
These two methods (LSDV, and using de-meaned data) are numerically equivalent. gretl takes the
approach of de-meaning the data. If you have a small number of cross-sectional units, a large number
of time-series observations per unit, and a large number of regressors, it is more economical in terms
of computer memory to use LSDV. If need be you can easily implement this manually. For example,
genr unitdum
ols y x du_*
(See Chapter 10 for details on unitdum).
The $\hat\alpha_i$ estimates are not printed as part of the standard model output in gretl (there may be a large number of these, and typically they are not of much inherent interest). However you can retrieve them after estimation of the fixed effects model if you wish. In the graphical interface, go to the “Save” menu in the model window and select “per-unit constants”. In command-line mode, you can do series newname = $ahat, where newname is the name you want to give the series.
For the random effects model we write $u_{it} = v_i + \varepsilon_{it}$, so the model becomes
$$y_{it} = X_{it}\beta + v_i + \varepsilon_{it} \qquad (23.3)$$
In contrast to the fixed effects model, the $v_i$s are not treated as fixed parameters, but as random drawings from a given probability distribution.
The celebrated Gauss–Markov theorem, according to which OLS is the best linear unbiased estimator
(BLUE), depends on the assumption that the error term is independently and identically distributed
(IID). In the panel context, the IID assumption means that $E(u_{it}^2)$, in relation to equation 23.1, equals a constant, $\sigma^2_u$, for all $i$ and $t$, while the covariance $E(u_{is}u_{it})$ equals zero for all $s \neq t$ and the covariance $E(u_{jt}u_{it})$ equals zero for all $j \neq i$.
If these assumptions are not met—and they are unlikely to be met in the context of panel data—OLS
is not the most efficient estimator. Greater efficiency may be gained using generalized least squares
(GLS), taking into account the covariance structure of the error term.
Consider observations on a given unit $i$ at two different times $s$ and $t$. From the hypotheses above it can be worked out that $\mathrm{Var}(u_{is}) = \mathrm{Var}(u_{it}) = \sigma^2_v + \sigma^2_\varepsilon$, while the covariance between $u_{is}$ and $u_{it}$ is given by $E(u_{is}u_{it}) = \sigma^2_v$.
In matrix notation, we may group all the $T_i$ observations for unit $i$ into the vector $y_i$ and write it as
$$y_i = X_i\beta + u_i \qquad (23.4)$$
The vector $u_i$, which includes all the disturbances for individual $i$, has a variance–covariance matrix given by
$$\mathrm{Var}(u_i) = \Sigma_i = \sigma^2_\varepsilon I + \sigma^2_v J \qquad (23.5)$$
where $J$ is a square matrix with all elements equal to 1. It can be shown that the matrix
$$K_i = I - \frac{\theta_i}{T_i}J,$$
where $\theta_i = 1 - \sqrt{\sigma^2_\varepsilon/(\sigma^2_\varepsilon + T_i\sigma^2_v)}$, has the property
$$K_i\Sigma K_i' = \sigma^2_\varepsilon I$$
It follows that the transformed system
$$K_i y_i = K_i X_i\beta + K_i u_i \qquad (23.6)$$
satisfies the Gauss–Markov conditions, and OLS estimation of (23.6) provides efficient inference. But since
$$K_i y_i = y_i - \theta_i\bar{y}_i$$
GLS estimation is equivalent to OLS using “quasi-demeaned” variables; that is, variables from which we subtract a fraction $\theta$ of their average.2 Notice that for $\sigma^2_\varepsilon \to 0$, $\theta \to 1$, while for $\sigma^2_v \to 0$, $\theta \to 0$.
This means that if all the variance is attributable to the individual effects, then the fixed effects
estimator is optimal; if, on the other hand, individual effects are negligible, then pooled OLS turns
out, unsurprisingly, to be the optimal estimator.
To implement the GLS approach we need to calculate $\theta$, which in turn requires estimates of the two variances $\sigma^2_\varepsilon$ and $\sigma^2_v$. (These are often referred to as the “within” and “between” variances respectively, since the former refers to variation within each cross-sectional unit and the latter to variation between the units.) Several means of estimating these magnitudes have been suggested in the literature (see Baltagi, 1995); by default gretl uses the method of Swamy and Arora (1972): $\sigma^2_\varepsilon$ is estimated by the residual variance from the fixed effects model, and $\sigma^2_v$ is estimated indirectly with the help of the “between” regression, which uses the group means of all the relevant variables: that is,
$$\bar{y}_i = \bar{X}_i\beta + e_i$$
The residual variance from this regression, $s^2_e$, can be shown to estimate the sum $\sigma^2_v + \sigma^2_\varepsilon/T$. An estimate of $\sigma^2_v$ can therefore be obtained by subtracting $1/T$ times the estimate of $\sigma^2_\varepsilon$ from $s^2_e$:
$$\hat\sigma^2_v = s^2_e - \hat\sigma^2_\varepsilon/T \qquad (23.7)$$
Alternatively, if the --nerlove option is given, gretl uses the method suggested by Nerlove (1971). In this case $\sigma^2_v$ is estimated as the sample variance of the fixed effects, $\hat\alpha_i$,
$$\hat\sigma^2_v = \frac{1}{N-1}\sum_{i=1}^{N}\left(\hat\alpha_i - \bar{\hat\alpha}\right)^2 \qquad (23.8)$$
where $N$ is the number of individuals and $\bar{\hat\alpha}$ is the mean of the estimated fixed effects.
Swamy and Arora’s equation (23.7) involves T, hence assuming a balanced panel. When the number
of time series observations, Ti, differs across individuals some sort of adjustment is needed. By
default gretl follows Stata by using the harmonic mean of the $T_i$s in place of $T$. It may be argued, however, that a more substantial adjustment is called for in the unbalanced case. Baltagi and Chang (1994) recommend a variant of Swamy–Arora which involves $T_i$-weighted estimation of the between
regression, on the basis that units with more observations will be more informative about the variance
of interest. In gretl one can switch to the Baltagi–Chang variant by giving the --unbalanced option
with the panel command. But the gain in efficiency from doing so may well be slim; for a discussion
of this point and related matters see Cottrell (2017). Unbalancedness also affects the Nerlove (1971)
estimator, but the econometric literature offers no guidance on the details. Gretl uses the weighted
average of the fixed effects as a natural extension of the original method. Again, see Cottrell (2017)
for further details.
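Whichever method is used, the estimated variance components (and, in the balanced case, $\theta$) can be inspected after random-effects estimation; a minimal sketch, with hypothetical series names:

panel y 0 x1 x2 --random-effects
scalar s2e = $["s2e"]      # estimate of the "within" variance
scalar s2v = $["s2v"]      # estimate of the "between" variance
scalar theta = $["theta"]  # quasi-demeaning parameter (balanced case)
printf "s2e = %g, s2v = %g, theta = %g\n", s2e, s2v, theta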
Choice of estimator
Which panel method should one use, fixed effects or random effects?
One way of answering this question is in relation to the nature of the data set. If the panel comprises
observations on a fixed and relatively small set of units of interest (say, the member states of the
European Union), there is a presumption in favor of fixed effects. If it comprises observations on
a large number of randomly selected individuals (as in many epidemiological and other longitudinal
studies), there is a presumption in favor of random effects.
Besides this general heuristic, however, various statistical issues must be taken into account.
2 In a balanced panel, the value of $\theta$ is common to all individuals, otherwise it differs depending on the value of $T_i$.
1. Some panel data sets contain variables whose values are specific to the cross-sectional unit but
which do not vary over time. If you want to include such variables in the model, the fixed
effects option is simply not available. When the fixed effects approach is implemented using
dummy variables, the problem is that the time-invariant variables are perfectly collinear with
the per-unit dummies. When using the approach of subtracting the group means, the issue is
that after de-meaning these variables are nothing but zeros.
2. A somewhat analogous issue arises with the random effects estimator. As mentioned above,
the default Swamy–Arora method relies on the group-means regression to obtain a measure of
the between variance. Suppose we have observations on $n$ units or individuals and there are $k$ independent variables of interest. If $k > n$, this regression cannot be run, since we have only $n$ effective observations, and hence Swamy–Arora estimates cannot be obtained. In this case,
however, it is possible to use Nerlove’s method instead.
If both fixed effects and random effects are feasible for a given specification and dataset, the choice
between these estimators may be expressed in terms of the two econometric desiderata, efficiency and
consistency.
From a purely statistical viewpoint, we could say that there is a tradeoff between robustness and
efficiency. In the fixed effects approach, we do not make any hypotheses on the “group effects” (that
is, the time-invariant differences in mean between the groups) beyond the fact that they exist—and
that can be tested; see below. As a consequence, once these effects are swept out by taking deviations
from the group means, the remaining parameters can be estimated.
On the other hand, the random effects approach attempts to model the group effects as drawings
from a probability distribution instead of removing them. This requires that individual effects are
representable as a legitimate part of the disturbance term, that is, zero-mean random variables,
uncorrelated with the regressors.
As a consequence, the fixed-effects estimator “always works”, but at the cost of not being able to
estimate the effect of time-invariant regressors. The richer hypothesis set of the random-effects esti-
mator ensures that parameters for time-invariant regressors can be estimated, and that estimation of
the parameters for time-varying regressors is carried out more efficiently. These advantages, though,
are tied to the validity of the additional hypotheses. If, for example, there is reason to think that
individual effects may be correlated with some of the explanatory variables, then the random-effects
estimator would be inconsistent, while fixed-effects estimates would still be valid. The Hausman test
is built on this principle (see below): if the fixed- and random-effects estimates agree, to within the
usual statistical margin of error, there is no reason to think the additional hypotheses invalid, and as
a consequence, no reason not to use the more efficient RE estimator.
Testing panel models
If you estimate a fixed effects or random effects model in the graphical interface, you may notice that
the number of items available under the“Tests”menu in the model window is relatively limited. Panel
models carry certain complications that make it difficult to implement all of the tests one expects to
see for models estimated on straight time-series or cross-sectional data.
Nonetheless, various panel-specific tests are printed along with the parameter estimates as a matter
of course, as follows.
When you estimate a model using fixed effects, you automatically get an F-test for the null hypothesis
that the cross-sectional units all have a common intercept. That is to say that all the $\alpha_i$s are equal, in which case the pooled model (23.1), with a column of 1s included in the $X$ matrix, is adequate.
When you estimate using random effects (RE), the Breusch–Pagan and Hausman tests are presented
automatically. To save their results in the context of a script one would copy the $model.bp_test
or $model.hausman_test bundles, which are nested inside the $model bundle. Both of these inner bundles contain the elements test, dfn (degrees of freedom), and pvalue.
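For example (series names hypothetical):

panel y 0 x1 x2 --random-effects
bundle ht = $model["hausman_test"]
printf "Hausman: chi-square(%d) = %g [p = %.4f]\n", ht["dfn"], ht["test"], ht["pvalue"]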
The Breusch–Pagan test is the counterpart to the F-test mentioned above. The null hypothesis is
that the variance of $v_i$ in equation (23.3) equals zero; if this hypothesis is not rejected, then again
we conclude that the simple pooled model is adequate. If the panel is unbalanced the method from
Baltagi and Li (1990) is used to perform the Breusch–Pagan test for individual effects.
The Hausman test probes the consistency of the GLS estimates. The null hypothesis is that these
estimates are consistent, that is, that the requirement of orthogonality of the $v_i$ and the $X_i$ is satisfied. The test is based on a measure, $H$, of the “distance” between the fixed-effects and random-effects estimates, constructed such that under the null it follows the $\chi^2$ distribution with degrees of freedom equal to the number of time-varying regressors in the matrix $X$. If the value of $H$ is “large” this suggests that the random effects estimator is not consistent and the fixed-effects model is
preferable.
There are two ways of calculating H, the matrix-difference method and the regression method. The
procedure for the matrix-difference method is this:
• Collect the fixed-effects estimates in a vector $\tilde\beta$ and the corresponding random-effects estimates in $\hat\beta$, then form the difference vector $(\tilde\beta - \hat\beta)$.

• Form the covariance matrix of the difference vector as $\mathrm{Var}(\tilde\beta - \hat\beta) = \mathrm{Var}(\tilde\beta) - \mathrm{Var}(\hat\beta) = \Psi$, where $\mathrm{Var}(\tilde\beta)$ and $\mathrm{Var}(\hat\beta)$ are estimated by the sample variance matrices of the fixed- and random-effects models respectively.3

• Compute $H = (\tilde\beta - \hat\beta)'\Psi^{-1}(\tilde\beta - \hat\beta)$.
Given the relative efficiencies of $\tilde\beta$ and $\hat\beta$, the matrix $\Psi$ “should be” positive definite, in which case $H$ is positive, but in finite samples this is not guaranteed and of course a negative $\chi^2$ value is not admissible.
The regression method avoids this potential problem. The procedure is to estimate, via OLS, an
augmented regression in which the dependent variable is quasi-demeaned $y$ and the regressors include both quasi-demeaned $X$ (as in the RE specification) and the de-meaned variants of all the time-varying
variables (i.e. the fixed-effects regressors). The Hausman null then implies that the coefficients on the
latter subset of regressors should be statistically indistinguishable from zero.
If the RE specification employs the default covariance-matrix estimator (assuming IID errors), $H$ can be obtained as follows:

• Treat the random-effects model as the restricted model, and record its sum of squared residuals as $\mathrm{SSR}_r$.

• Estimate the augmented (unrestricted) regression and record its sum of squared residuals as $\mathrm{SSR}_u$.

• Compute $H = n(\mathrm{SSR}_r - \mathrm{SSR}_u)/\mathrm{SSR}_u$, where $n$ is the total number of observations used.
Alternatively, if the --robust option is selected for RE estimation, His calculated as a Wald test
based on a robust estimate of the covariance matrix of the augmented regression. Either way, H
cannot be negative.
By default gretl computes the Hausman test via the regression method, but it uses the matrix-
difference method if you pass the option --matrix-diff to the panel command.
Serial correlation
A simple test for first-order autocorrelation of the error term, namely the Durbin–Watson (DW)
statistic, is printed as part of the output for pooled OLS as well as fixed-effects and random-effects
estimation. Let us define “serial correlation proper” as serial correlation strictly in the time dimension
of a panel dataset. When based on the residuals from fixed-effects estimation, the DW statistic is a
test for serial correlation proper.4 The DW value shown in the case of random-effects estimation is
based on the fixed-effects residuals. When DW is based on pooled OLS residuals it tests for serial
correlation proper only on the assumption of a common intercept. Put differently, in this case it tests
a joint null hypothesis: absence of fixed effects plus absence of (first order) serial correlation proper.
3 Hausman (1978) showed that the covariance of the difference takes this simple form when $\hat\beta$ is an efficient estimator and $\tilde\beta$ is inefficient.

4 The generalization of the Durbin–Watson statistic from the straight time-series context to panel data is due to Bhargava et al. (1982).
In the presence of missing observations the DW statistic is calculated as described in Baltagi and Wu (1999) (their expression for $d_1$ under equation (16) on page 819).

When it is computed, the DW statistic can be retrieved via the accessor $dw after estimation. In addition, an approximate P-value for the null of no serial correlation ($\rho = 0$) against the alternative of $\rho > 0$ may be available via the accessor $dwpval. This is based on the analysis in Bhargava et al. (1982); strictly speaking it is the marginal significance level of DW considered as a $d_L$ value (the value below which the test rejects, as opposed to $d_U$, the value above which the test fails to reject). In the panel case, however, $d_L$ and $d_U$ are quite close, particularly when $N$ (the number of individual units) is large. At present gretl does not attempt to compute such P-values when the number of
observations differs across individuals.
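In a script these quantities can be grabbed right after estimation (series names hypothetical):

panel y 0 x1 x2 --fixed-effects
scalar d = $dw
scalar p = $dwpval   # may be unavailable if the panel is unbalanced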
Robust standard errors
For most estimators, gretl offers the option of computing an estimate of the covariance matrix that
is robust with respect to heteroskedasticity and/or autocorrelation (and hence also robust standard
errors). In the case of panel data, robust covariance matrix estimators are available for the pooled,
fixed effects and random effects models. See section 22.4 for details.
The constant in the fixed effects model
Users are sometimes puzzled by the constant or intercept reported by gretl on estimation of the fixed
effects model: how can a constant remain when the group means have been subtracted from the data?
The method of calculation of this term is a matter of convention, but the gretl authors decided to
follow the convention employed by Stata; this involves adding the global mean back into the variables
from which the group means have been removed.5 If you prefer to interpret the fixed effects model as “OLS plus unit dummies throughout”, it can be proven that this approach is equivalent to using centered unit dummies instead of plain 0/1 dummies.
The method that gretl uses internally is exemplified in Listing 23.1. The coefficients in the second
OLS estimation, including the intercept, agree with those in the initial fixed effects model, though the
standard errors differ due to a degrees of freedom correction in the fixed-effects covariance matrix.
(Note that the pmean function returns the group mean of a series.) The third estimator—which
produces quite a lot of output—instead uses the stdize function to create the centered dummies. It
thereby shows the equivalence of the internally-used method to “OLS plus centered dummies”. (Note
that in this case the standard errors agree with the initial estimates.)
R-squared in the fixed effects model
There is no uniquely “correct” way of calculating $R^2$ in the context of the fixed-effects model. It
may be argued that a measure of the squared correlation between the dependent variable and the
prediction yielded by the model is a desirable descriptive statistic to have, but which model and which
(variant of the) dependent variable are we talking about?
Fixed-effects models can be thought of in two equally defensible ways. From one perspective they
provide a nice, clean way of sweeping out individual effects by using the fact that in the linear
model a sufficient statistic is easy to compute. Alternatively, they provide a clever way to estimate
the “important” parameters of a model in which you want to include (for whatever reason) a full
set of individual dummies. If you take the second of these perspectives, your dependent variable is unmodified $y$ and your model includes the unit dummies; the appropriate $R^2$ measure is then the squared correlation between $y$ and the $\hat{y}$ computed using both the measured individual effects and the effects of the explicitly named regressors. This is reported by gretl as the “LSDV R-squared”. If you take the first point of view, on the other hand, your dependent variable is really $y_{it} - \bar{y}_i$ and your model just includes the $\beta$ terms, the coefficients of deviations of the $x$ variables from their per-unit means. In this case, the relevant measure of $R^2$ is the so-called “within” $R^2$; this variant is printed by gretl for fixed-effects models in place of the adjusted $R^2$ (it being unclear in this case what exactly the “adjustment” should amount to anyway).
5 See Gould (2013) for an extended explanation.
Listing 23.1: Calculating the intercept in the fixed effects model

open abdata.gdt
list X = w k ys   # list of explanatory variables

### built-in method
panel n const X --fixed-effects

### recentering "by hand"
depvar = n - pmean(n) + mean(n)   # redefine the dependent variable
list indepvars = const
loop foreach i X
    # redefine the explanatory variables
    x_$i = $i - pmean($i) + mean($i)
    indepvars += x_$i
endloop
ols depvar indepvars   # perform estimation

### using centered dummies
list C = dummify(unit)    # create the unit dummies
smpl n X --no-missing     # adjust to perform centering correctly
list D = stdize(C, -1)    # center the unit dummies
ols n const X D           # perform estimation
Residuals in the fixed and random effect models
After estimation of most kinds of models in gretl, you can retrieve a series containing the residuals
using the $uhat accessor. This is true of the fixed and random effects models, but the exact meaning
of gretl’s $uhat in these cases requires a little explanation.
Consider first the fixed effects model:
$$y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it}$$
In this model gretl takes the “fitted value” ($yhat) to be $\hat\alpha_i + X_{it}\hat\beta$, and the residual ($uhat) to be $y_{it}$ minus this fitted value. This makes sense because the fixed effects (the $\alpha_i$ terms) are taken as parameters to be estimated. However, it can be argued that the fixed effects are not really “explanatory” and if one defines the residual as the observed $y_{it}$ value minus its “explained” component one might prefer to see just $y_{it} - X_{it}\hat\beta$. You can get this after fixed-effects estimation as follows:

series ue_fe = $uhat + $ahat - $coeff[1]

where $ahat gives the unit-specific intercept (as it would be calculated if one included all $N$ unit dummies and omitted a common $y$-intercept), and $coeff[1] gives the “global” $y$-intercept.6
Now consider the random-effects model:
$$y_{it} = X_{it}\beta + v_i + \varepsilon_{it}$$
In this case gretl considers the error term to be $v_i + \varepsilon_{it}$ (since $v_i$ is conceived as a random drawing) and the $uhat series is an estimate of this, namely
$$y_{it} - X_{it}\hat\beta$$
What if you want an estimate of just $v_i$ (or just $\varepsilon_{it}$) in this case? This poses a signal-extraction problem: given the composite residual, how to recover an estimate of its components? The solution
6 For anyone used to Stata, gretl’s fixed-effects $uhat corresponds to what you get from Stata’s predict, e after
xtreg, while the second variant corresponds to Stata’s predict, ue.
is to ascribe to the individual effect, $\hat{v}_i$, a suitable fraction of the mean residual per individual, $\bar{\hat{u}}_i = T_i^{-1}\sum_{t=1}^{T_i}\hat{u}_{it}$. The “suitable fraction” is the proportion of the variance of $\bar{u}_i$ that is due to $v_i$, namely
$$\frac{\sigma^2_v}{\sigma^2_v + \sigma^2_\varepsilon/T_i} = 1 - (1-\theta_i)^2$$
After random effects estimation in gretl you can access a series containing the $\hat{v}_i$s under the name $ahat. This series can be calculated by hand as follows:
# case 1: balanced panel
scalar theta = $["theta"]
series vhat = (1 - (1 - theta)^2) * pmean($uhat)
# case 2: unbalanced, Ti varies by individual
scalar s2v = $["s2v"]
scalar s2e = $["s2e"]
series frac = s2v / (s2v + s2e/pnobs($uhat))
series ahat = frac * pmean($uhat)
If an estimate of εit is wanted, it can then be obtained by subtraction from $uhat.
23.2 Autoregressive panel models
Special problems arise when a lag of the dependent variable is included among the regressors in a
panel model. Consider a dynamic variant of the pooled model (eq. 23.1):
$$y_{it} = X_{it}\beta + \rho y_{i,t-1} + u_{it} \qquad (23.9)$$
First, if the error $u_{it}$ includes a group effect, $v_i$, then $y_{i,t-1}$ is bound to be correlated with the error, since the value of $v_i$ affects $y_i$ at all $t$. That means that OLS applied to (23.9) will be inconsistent as well as inefficient. The fixed-effects model sweeps out the group effects and so overcomes this particular problem, but a subtler issue remains, which applies to both fixed and random effects estimation. Consider the de-meaned representation of fixed effects, as applied to the dynamic model,
$$\tilde{y}_{it} = \tilde{X}_{it}\beta + \rho\tilde{y}_{i,t-1} + \varepsilon_{it}$$
where $\tilde{y}_{it} = y_{it} - \bar{y}_i$ and $\varepsilon_{it} = u_{it} - \bar{u}_i$ (or $u_{it} - \alpha_i$, using the notation of equation 23.2). The trouble is that $\tilde{y}_{i,t-1}$ will be correlated with $\varepsilon_{it}$ via the group mean, $\bar{y}_i$. The disturbance $\varepsilon_{it}$ influences $y_{it}$ directly, which influences $\bar{y}_i$, which, by construction, affects the value of $\tilde{y}_{it}$ for all $t$. The same issue arises in relation to the quasi-demeaning used for random effects. Estimators which ignore this correlation will be consistent only as $T \to \infty$ (in which case the marginal effect of $\varepsilon_{it}$ on the group mean of $y$ tends to vanish).
One strategy for handling this problem, and producing consistent estimates of $\beta$ and $\rho$, was proposed by Anderson and Hsiao (1981). Instead of de-meaning the data, they suggest taking the first difference of (23.9), an alternative tactic for sweeping out the group effects:
$$\Delta y_{it} = \Delta X_{it}\beta + \rho\,\Delta y_{i,t-1} + \eta_{it} \qquad (23.10)$$
where $\eta_{it} = \Delta u_{it} = \Delta(v_i + \varepsilon_{it}) = \varepsilon_{it} - \varepsilon_{i,t-1}$. We’re not in the clear yet, given the structure of the error $\eta_{it}$: the disturbance $\varepsilon_{i,t-1}$ is an influence on both $\eta_{it}$ and $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$. The next step is then to find an instrument for the “contaminated” $\Delta y_{i,t-1}$. Anderson and Hsiao suggest using either $y_{i,t-2}$ or $\Delta y_{i,t-2}$, both of which will be uncorrelated with $\eta_{it}$ provided that the underlying errors, $\varepsilon_{it}$, are not themselves serially correlated.
The Anderson–Hsiao estimator is not provided as a built-in function in gretl, since gretl’s sensible
handling of lags and differences for panel data makes it a simple application of regression with in-
strumental variables—see Listing 23.2, which is based on a study of country growth rates by Nerlove
(1999).7
Although the Anderson–Hsiao estimator is consistent, it is not the most efficient one available: it does not make the fullest use of the available instruments for $\Delta y_{i,t-1}$, nor does it take into account the differenced structure of the error $\eta_{it}$. It is improved upon by the methods of Arellano and Bond (1991) and
Blundell and Bond (1998). These methods are taken up in the next chapter.
7 Also see Clint Cummins’ benchmarks page, http://www.stanford.edu/~clint/bench/.
Listing 23.2: The Anderson–Hsiao estimator for a dynamic panel model
# Penn World Table data as used by Nerlove
open penngrow.gdt
# Fixed effects (for comparison)
panel Y 0 Y(-1) X
# Random effects (for comparison)
panel Y 0 Y(-1) X --random-effects
# take differences of all variables
diff Y X
# Anderson-Hsiao, using Y(-2) as instrument
tsls d_Y d_Y(-1) d_X ; 0 d_X Y(-2)
# Anderson-Hsiao, using d_Y(-2) as instrument
tsls d_Y d_Y(-1) d_X ; 0 d_X d_Y(-2)
Chapter 24
Dynamic panel models
The command for estimating dynamic panel models in gretl is dpanel. This command supports both
the “difference” estimator (Arellano and Bond, 1991) and the “system” estimator (Blundell and Bond,
1998), which has become the method of choice in the applied literature.
24.1 Introduction
Notation
A dynamic linear panel data model can be represented as follows (in notation based on Arellano (2003)):
$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + \eta_i + v_{it} \qquad (24.1)$$
where $i = 1, 2, \dots, N$ indexes the cross-section units and $t$ indexes time.

The main idea behind the difference estimator is to sweep out the individual effect via differencing. First-differencing eq. (24.1) yields
$$\Delta y_{it} = \alpha\Delta y_{i,t-1} + \beta\Delta x_{it} + \Delta v_{it} = \gamma\Delta W_{it} + \Delta v_{it}, \qquad (24.2)$$
in obvious notation. The error term of (24.2) is, by construction, autocorrelated and also correlated with the lagged dependent variable, so an estimator that takes both issues into account is needed. The endogeneity issue is solved by noting that all values of $y_{i,t-k}$ with $k > 1$ can be used as instruments for $\Delta y_{i,t-1}$: unobserved values of $y_{i,t-k}$ (whether missing or pre-sample) can safely be substituted with 0. In the language of GMM, this amounts to using the relation
$$E(\Delta v_{it}\cdot y_{i,t-k}) = 0, \qquad k > 1 \qquad (24.3)$$
as an orthogonality condition.
Autocorrelation is dealt with by noting that if $v_{it}$ is white noise, the covariance matrix of the vector whose typical element is $\Delta v_{it}$ is proportional to a matrix $H$ that has 2 on the main diagonal, $-1$ on the first subdiagonals and 0 elsewhere. One-step GMM estimation of equation (24.2) amounts to computing
$$\hat\gamma = \left[\left(\sum_i W_i'Z_i\right)A_N\left(\sum_i Z_i'W_i\right)\right]^{-1}\left(\sum_i W_i'Z_i\right)A_N\left(\sum_i Z_i'\Delta y_i\right) \qquad (24.4)$$
where
$$\Delta y_i = \left[\,\Delta y_{i3}\ \cdots\ \Delta y_{iT}\,\right]'$$
$$W_i = \left[\begin{array}{ccc}\Delta y_{i2} & \cdots & \Delta y_{i,T-1}\\ \Delta x_{i3} & \cdots & \Delta x_{iT}\end{array}\right]'$$
$$Z_i = \left[\begin{array}{cccccc} y_{i1} & 0 & 0 & \cdots & 0 & \Delta x_{i3}\\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \Delta x_{i4}\\ & & & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & y_{i,T-2} & \Delta x_{iT}\end{array}\right]$$
and
$$A_N = \left(\sum_i Z_i'HZ_i\right)^{-1}$$
Once the 1-step estimator is computed, the sample covariance matrix of the estimated residuals can
be used instead of $H$ to obtain 2-step estimates, which are not only consistent but asymptotically
efficient. (In principle the process may be iterated, but nobody seems to be interested.) Standard
GMM theory applies, except for one point: Windmeijer (2005) has computed finite-sample corrections
to the asymptotic covariance matrix of the parameters, which are nowadays almost universally used.
The difference estimator is consistent, but has been shown to have poor properties in finite samples when $\alpha$ is near one. People these days prefer the so-called “system” estimator, which complements the differenced data (with lagged levels used as instruments) with data in levels (using lagged differences as instruments). The system estimator relies on an extra orthogonality condition which has to do with the earliest value of the dependent variable $y_{i1}$. The interested reader is referred to Blundell and Bond (1998, pp. 124–125) for details, but here it suffices to say that this condition is satisfied
in mean-stationary models and brings an improvement in efficiency that may be substantial in many
cases.
The set of orthogonality conditions exploited in the system approach is not very much larger than
with the difference estimator since most of the possible orthogonality conditions associated with the
equations in levels are redundant, given those already used for the equations in differences.
The key equations of the system estimator can be written as
$$\tilde\gamma = \left[\left(\sum_i \tilde{W}_i'\tilde{Z}_i\right)A_N\left(\sum_i \tilde{Z}_i'\tilde{W}_i\right)\right]^{-1}\left(\sum_i \tilde{W}_i'\tilde{Z}_i\right)A_N\left(\sum_i \tilde{Z}_i'\Delta\tilde{y}_i\right) \qquad (24.5)$$
where
$$\Delta\tilde{y}_i = \left[\,\Delta y_{i3}\ \cdots\ \Delta y_{iT}\ \ y_{i3}\ \cdots\ y_{iT}\,\right]'$$
$$\tilde{W}_i = \left[\begin{array}{cccccc}\Delta y_{i2} & \cdots & \Delta y_{i,T-1} & y_{i2} & \cdots & y_{i,T-1}\\ \Delta x_{i3} & \cdots & \Delta x_{iT} & x_{i3} & \cdots & x_{iT}\end{array}\right]'$$
$$\tilde{Z}_i = \left[\begin{array}{ccccccccc} y_{i1} & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & \Delta x_{i3}\\ 0 & y_{i1} & y_{i2} & \cdots & 0 & 0 & \cdots & 0 & \Delta x_{i4}\\ & & & \ddots & & & & & \vdots\\ 0 & 0 & 0 & \cdots & y_{i,T-2} & 0 & \cdots & 0 & \Delta x_{iT}\\ 0 & 0 & 0 & \cdots & 0 & \Delta y_{i2} & \cdots & 0 & x_{i3}\\ & & & & & & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & \Delta y_{i,T-1} & x_{iT}\end{array}\right]$$
and
$$A_N = \left(\sum_i \tilde{Z}_i'H\tilde{Z}_i\right)^{-1}$$
In this case choosing a precise form for the matrix $H$ for the first step is no trivial matter. Its north-west block should be as similar as possible to the covariance matrix of the vector $\Delta v_{it}$, so the same choice as the “difference” estimator is appropriate. Ideally, the south-east block should be proportional to the covariance matrix of the vector $\iota\eta_i + v_i$, that is $\sigma^2_v I + \sigma^2_\eta\iota\iota'$; but since $\sigma^2_\eta$ is unknown and any positive definite matrix renders the estimator consistent, people just use $I$. The off-diagonal blocks should, in principle, contain the covariances between $\Delta v_{is}$ and $v_{it}$, which would be an identity matrix if $v_{it}$ is white noise. However, since the south-east block is typically given a conventional value anyway, the benefit in making this choice is not obvious. Some packages use $I$; others use a zero matrix. Asymptotically, it should not matter, but on real datasets the difference between the resulting estimates can be noticeable.
Rank deficiency
Both the difference estimator (24.4) and the system estimator (24.5) depend for their existence on the
invertibility of AN. This matrix may turn out to be singular for several reasons. However, this does
not mean that the estimator is not computable. In some cases, adjustments are possible such that
the estimator does exist, but the user should be aware that in such cases not all software packages
use the same strategy and replication of results may prove difficult or even impossible.
A first reason why A_N may be singular is unavailability of instruments, chiefly because of missing observations. This case is easy to handle. If a particular row of Z̃_i is zero for all units, the corresponding orthogonality condition (or the corresponding instrument if you prefer) is automatically dropped; the overidentification rank is then adjusted for testing purposes.
Even if no instruments are zero, however, A_N could be rank deficient. A trivial case occurs if there are collinear instruments, but a less trivial case may arise when T (the total number of time periods available) is not much smaller than N (the number of units), as, for example, in some macro datasets where the units are countries. The total number of potentially usable orthogonality conditions is O(T²), which may well exceed N in some cases. Since A_N is the sum of N matrices which have, at most, rank 2T − 3, it could well happen that the sum is singular.
In all these cases, dpanel substitutes the pseudo-inverse of AN(Moore–Penrose) for its regular inverse.
Our choice is shared by some software packages, but not all, so replication may be hard.
Covariance matrix and standard errors
By default the standard errors shown for 1-step estimation are robust, based on the heteroskedasticity-consistent variance estimator
\[
\widehat{\mathrm{Var}}(\hat{\gamma}) = M^{-1} \left( \sum_i W_i' Z_i \right) A_N \hat{V}_N A_N \left( \sum_i Z_i' W_i \right) M^{-1}
\]
where $M = \left( \sum_i W_i' Z_i \right) A_N \left( \sum_i Z_i' W_i \right)$ and $\hat{V}_N = N^{-1} \sum_i Z_i' \hat{u}_i \hat{u}_i' Z_i$, with $\hat{u}_i$ the vector of residuals in differences for individual i. In addition, as noted above, the variance estimator for 2-step estimation
employs the finite-sample correction of Windmeijer (2005).
When the --asymptotic option is passed to dpanel, however, the 1-step variance estimator is simply $\hat{\sigma}^2_u M^{-1}$, which is not heteroskedasticity-consistent, and the Windmeijer correction is not applied for
2-step estimation. Use of this option is not recommended unless you wish to replicate prior results that
did not report robust standard errors. In particular, tests based on the asymptotic 2-step variance
estimator are known to over-reject quite substantially (standard errors too small).
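As a small illustration of the difference (a sketch only, reusing the Arellano–Bond dataset abdata.gdt discussed in section 24.3, where n is the log of employment), one might compare

open abdata.gdt
dpanel 1 ; n                # robust 1-step standard errors (the default)
dpanel 1 ; n --asymptotic   # non-robust standard errors, for replication only

The point estimates should be unaffected by the option; only the reported standard errors, and hence the associated test statistics, change.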
Treatment of missing values
Textbooks seldom bother with missing values, but in some cases their treatment may be far from
obvious. This is especially true if missing values are interspersed between valid observations. For
example, consider the plain difference estimator with one lag, so
\[
y_t = \alpha y_{t-1} + \eta + \epsilon_t
\]
where the i index is omitted for clarity. Suppose you have an individual with t = 1, ..., 5, for which y_3 is missing. It may seem that the data for this individual are unusable, because differencing y_t would produce something like

  t       1   2   3   4   5
  y_t     ∗   ∗   ◦   ∗   ∗
  ∆y_t    ◦   ∗   ◦   ◦   ∗

where ∗ = nonmissing and ◦ = missing. Estimation seems to be unfeasible, since there are no periods in which ∆y_t and ∆y_{t−1} are both observable.
However, we can use a k-difference operator and get
\[
\Delta_k y_t = \alpha\, \Delta_k y_{t-1} + \Delta_k \epsilon_t
\]
where ∆_k = 1 − L^k and past levels of y_t are valid instruments. In this example, we can choose k = 3 and use y_1 as an instrument, so this unit is in fact usable.
Not all software packages seem to be aware of this possibility, so replicating published results may
prove tricky if your dataset contains individuals with gaps between valid observations.
24.2 Usage
One feature of dpanel’s syntax is that you get default values for several choices you may wish to
make, so that in a “standard” situation the command is very concise. The simplest case of the model
(24.1) is a plain AR(1) process:
\[
y_{i,t} = \alpha y_{i,t-1} + \eta_i + v_{it}. \qquad (24.6)
\]
If you give the command
dpanel 1 ; y
gretl assumes that you want to estimate (24.6) via the difference estimator (24.4), using as many orthogonality conditions as possible. The scalar 1 between dpanel and the semicolon indicates that only one lag of y is included as an explanatory variable; using 2 would give an AR(2) model. The
syntax that gretl uses for the non-seasonal AR and MA lags in an ARMA model is also supported in
this context. For example, if you want the first and third lags of y (but not the second) included as
explanatory variables you can say
dpanel {1 3} ; y
or you can use a pre-defined matrix for this purpose:
matrix ylags = {1, 3}
dpanel ylags ; y
To use a single lag of y other than the first you need to employ this mechanism:
dpanel {3} ; y # only lag 3 is included
dpanel 3 ; y # compare: lags 1, 2 and 3 are used
To use the system estimator instead, you add the --system option, as in
dpanel 1 ; y --system
The level orthogonality conditions and the corresponding instrument are appended automatically (see
eq. 24.5).
Regressors
If additional regressors are to be included, they should be listed after the dependent variable in the
same way as other gretl estimation commands, such as ols. For the difference orthogonality relations,
dpanel takes care of transforming the regressors in parallel with the dependent variable.
One case of potential ambiguity is when an intercept is specified but the difference-only estimator is
selected, as in
dpanel 1 ; y const
In this case the default dpanel behavior, which agrees with David Roodman’s xtabond2 for Stata
(Roodman, 2009a), is to drop the constant (since differencing reduces it to nothing but zeros). How-
ever, for compatibility with the DPD package for Ox, you can give the option --dpdstyle, in which
case the constant is retained (equivalent to including a linear trend in equation 24.1). A similar point
applies to the period-specific dummy variables which can be added in dpanel via the --time-dummies
option: in the differences-only case these dummies are entered in differenced form by default, but
when the --dpdstyle switch is applied they are entered in levels.
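To make the contrast concrete, the following pair of commands (a sketch with a generic dependent variable y) differ only in the handling of the deterministic terms:

dpanel 1 ; y const --time-dummies
dpanel 1 ; y const --time-dummies --dpdstyle

In the first case the constant is dropped and the time dummies enter in differenced form; in the second the constant is retained and the dummies enter in levels.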
The standard gretl syntax applies if you want to use lagged explanatory variables, so for example the
command
dpanel 1 ; y const x(0 to -1) --system
would result in estimation of the model
\[
y_{it} = \alpha y_{i,t-1} + \beta_0 + \beta_1 x_{it} + \beta_2 x_{i,t-1} + \eta_i + v_{it}.
\]
Instruments
The default rules for instruments are:
• lags of the dependent variable are instrumented using all available orthogonality conditions; and
• additional regressors are considered exogenous, so they are used as their own instruments.
If a different policy is wanted, the instruments should be specified in an additional list, separated
from the regressors list by a semicolon. The syntax closely mirrors that of the tsls command, but
in this context it is necessary to distinguish between “regular” instruments and what are often called
“GMM-style” instruments (that is, instruments that are handled in the same block-diagonal manner
as lags of the dependent variable, as described above).
“Regular” instruments are transformed in the same way as regressors, and the contemporaneous value
of the transformed variable is used to form an orthogonality condition. Since regressors are treated
as exogenous by default, it follows that these two commands estimate the same model:
dpanel 1 ; y z
dpanel 1 ; y z ; z
The instrument specification in the second case simply confirms what is implicit in the first: that z
is exogenous. Note, though, that if you have some additional variable z2 which you want to add as a
regular instrument, it then becomes necessary to include z in the instrument list if it is to be treated
as exogenous:
dpanel 1 ; y z ; z2 # z is now implicitly endogenous
dpanel 1 ; y z ; z z2 # z is treated as exogenous
The specification of “GMM-style” instruments is handled by the special constructs GMM() and GMMlevel().
The first of these relates to instruments for the equations in differences, and the second to the equa-
tions in levels. The syntax for GMM() is
GMM(name, minlag, maxlag[, collapse])
where name is replaced by the name of a series (or the name of a list of series), and minlag and
maxlag are replaced by the minimum and maximum lags to be used as instruments. The same goes
for GMMlevel().
One common use of GMM() is to limit the number of lagged levels of the dependent variable used
as instruments for the equations in differences. It’s well known that although exploiting all possible
orthogonality conditions yields maximal asymptotic efficiency, in finite samples it may be preferable
to use a smaller subset—see Roodman (2009b), Okui (2009). For example, the specification
dpanel 1 ; y ; GMM(y, 2, 4)
ensures that no lags of y_t earlier than t − 4 will be used as instruments.
A second means of limiting the number of instruments is to “collapse” the sets of block-diagonal
instruments shown following equations 24.4 and 24.5. Instead of having a distinct instrument per
observation per lag, this is reduced to a distinct instrument per lag, as shown in Figure 24.1.
This treatment of instruments can be selected per GMM or GMMlevel case—by appending the collapse
flag following the maxlag value—or it can be set “globally” by use of the --collapse option to the
dpanel command. To our knowledge Roodman’s xtabond2 was the first software to offer this useful
facility.
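For example, the following two commands (shown only as a sketch) are alternative ways of collapsing the instruments deriving from lags of the dependent variable, the first per GMM() term and the second globally:

dpanel 1 ; y ; GMM(y,2,99,collapse)
dpanel 1 ; y ; GMM(y,2,99) --collapse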
GMM():
\[
\begin{bmatrix}
y_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & y_{i1} & y_{i2} & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & y_{i1} & y_{i2} & y_{i3} & \cdots \\
\vdots & & & & & & \ddots
\end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix}
y_{i1} & 0 & 0 & \cdots \\
y_{i1} & y_{i2} & 0 & \cdots \\
y_{i1} & y_{i2} & y_{i3} & \cdots \\
\vdots & & & \ddots
\end{bmatrix}
\]
GMMlevel():
\[
\begin{bmatrix}
\Delta y_{i2} & 0 & 0 & \cdots \\
0 & \Delta y_{i3} & 0 & \cdots \\
0 & 0 & \Delta y_{i4} & \cdots \\
\vdots & & & \ddots
\end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix}
\Delta y_{i2} \\
\Delta y_{i3} \\
\Delta y_{i4} \\
\vdots
\end{bmatrix}
\]
Figure 24.1: Collapsing block-diagonal instruments

A further use of GMM() is to exploit more fully the potential orthogonality conditions afforded by an exogenous regressor, or a related variable that does not appear as a regressor. For example, in
dpanel 1 ; y x ; GMM(z, 2, 6)
the variable x is considered an endogenous regressor, and up to 5 lags of z are used as instruments.
Note that in the following script fragment
dpanel 1 ; y z
dpanel 1 ; y z ; GMM(z,0,0)
the two estimation commands should not be expected to give the same result, as the sets of orthog-
onality relationships are subtly different. In the latter case, you have T − 2 separate orthogonality relationships pertaining to z_it, none of which has any implication for the other ones; in the former case, you only have one. In terms of the Z_i matrix, the first form adds a single row to the bottom of the instruments matrix, while the second form adds a diagonal block with T − 2 columns; that is,
\[
\begin{bmatrix} z_{i3} & z_{i4} & \cdots & z_{iT} \end{bmatrix}
\]
versus
\[
\begin{bmatrix}
z_{i3} & 0 & \cdots & 0 \\
0 & z_{i4} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & z_{iT}
\end{bmatrix}
\]
24.3 Replication of DPD results
In this section we show how to replicate the results of some of the pioneering work with dynamic
panel-data estimators by Arellano, Bond and Blundell. As the DPD manual (Doornik, Arellano and
Bond, 2006) explains, it is difficult to replicate the original published results exactly, for two main
reasons: not all of the data used in those studies are publicly available; and some of the choices made
in the original software implementation of the estimators have been superseded. Here, therefore, our
focus is on replicating the results obtained using the current DPD package and reported in the DPD
manual.
The examples are based on the program files abest1.ox, abest3.ox and bbest1.ox. These are in-
cluded in the DPD package, along with the Arellano–Bond database files abdata.bn7 and abdata.in7.1
The Arellano–Bond data are also provided with gretl, in the file abdata.gdt. In the following we do
not show the output from DPD or gretl; it is somewhat voluminous, and is easily generated by the
user. As of this writing the results from Ox/DPD and gretl are identical in all relevant respects for
all of the examples shown.2
1See http://www.doornik.com/download.html.
2To be specific, this was last tested using Ox Console version 9.10, version 1.28 of the DPD package, and gretl built
from git as of 2024-02-12, all on Linux.
Listing 24.1: Template for Ox/DPD program
#include <oxstd.h>
#import <packages/DPD/dpd>
main()
{
decl dpd = new DPD();
dpd.Load("abdata.in7");
dpd.SetYear("YEAR");
// model-specific code goes here
delete dpd;
}
An Ox/DPD program to generate the results of interest takes the general form shown in Listing 24.1.
In the examples below we take this template for granted and show just the model-specific code.
Example 1
The following Ox/DPD code—drawn from abest1.ox—replicates column (b) of Table 4 in Arellano
and Bond (1991), an instance of the differences-only or GMM-DIF estimator. The dependent variable
is the log of employment, n; the regressors include two lags of the dependent variable, current and
lagged values of the log real-product wage, w, the current value of the log of gross capital, k, and
current and lagged values of the log of industry output, ys. In addition the specification includes a
constant and five year dummies; unlike the stochastic regressors, these deterministic terms are not
differenced. In this specification the regressors w,kand ys are treated as exogenous and serve as their
own instruments. In DPD syntax this requires entering these variables twice, on the X_VAR and I_VAR
lines. The GMM-type (block-diagonal) instruments in this example are the second and subsequent
lags of the level of n. Both 1-step and 2-step estimates are computed.
dpd.SetOptions(FALSE); // don’t use robust standard errors
dpd.Select(DPD::Y_VAR, {"n", 0, 2});
dpd.Select(DPD::X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(DPD::I_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Gmm("n", 2, 99);
dpd.SetDummies(DPD::D_CONSTANT + DPD::D_TIME);
print("\n\n***** Arellano & Bond (1991), Table 4 (b)");
dpd.SetMethod(DPD::M_1STEP);
dpd.Estimate();
dpd.SetMethod(DPD::M_2STEP);
dpd.Estimate();
Here is gretl code to do the same job:
open abdata.gdt
list X = w w(-1) k ys ys(-1)
dpanel 2 ; n X const --time-dummies --asy --dpdstyle
dpanel 2 ; n X const --time-dummies --asy --two-step --dpdstyle
Note that in gretl the switch to suppress robust standard errors is --asymptotic, here abbreviated
to --asy.3 The --dpdstyle flag specifies that the constant and dummies should not be differenced,
3Option flags in gretl can always be truncated, down to the minimal unique abbreviation.
in the context of a GMM-DIF model. With gretl’s dpanel command it is not necessary to specify
the exogenous regressors as their own instruments since this is the default; similarly, the use of the
second and all longer lags of the dependent variable as GMM-type instruments is the default and
need not be stated explicitly.
Example 2
The DPD file abest3.ox contains a variant of the above that differs with regard to the choice of
instruments: the variables w and k are now treated as predetermined, and are instrumented GMM-style using the second and third lags of their levels. This approximates column (c) of Table 4 in
Arellano and Bond (1991). We have modified the code in abest3.ox slightly to allow the use of
robust (Windmeijer-corrected) standard errors, which are the default in both DPD and gretl with
2-step estimation:
dpd.Select(DPD::Y_VAR, {"n", 0, 2});
dpd.Select(DPD::X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(DPD::I_VAR, {"ys", 0, 1});
dpd.SetDummies(DPD::D_CONSTANT + DPD::D_TIME);
dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 3);
dpd.Gmm("k", 2, 3);
print("\n***** Arellano & Bond (1991), Table 4 (c)\n");
print(" (but using different instruments!!)\n");
dpd.SetMethod(DPD::M_2STEP);
dpd.Estimate();
The gretl code is as follows:
open abdata.gdt
list X = w w(-1) k ys ys(-1)
list Ivars = ys ys(-1)
dpanel 2 ; n X const ; GMM(w,2,3) GMM(k,2,3) Ivars --time --two-step --dpd
Note that since we are now calling for an instrument set other than the default (following the second
semicolon), it is necessary to include the Ivars specification for the variable ys. However, it is not
necessary to specify GMM(n,2,99) since this remains the default treatment of the dependent variable.
Example 3
Our third example replicates the DPD output from bbest1.ox: this uses the same dataset as the
previous examples but the model specifications are based on Blundell and Bond (1998), and involve
comparison of the GMM-DIF and GMM-SYS (“system”) estimators. The basic specification is slightly
simplified in that the variable ys is not used and only one lag of the dependent variable appears as a
regressor. The Ox/DPD code is:
dpd.Select(DPD::Y_VAR, {"n", 0, 1});
dpd.Select(DPD::X_VAR, {"w", 0, 1, "k", 0, 1});
dpd.SetDummies(DPD::D_CONSTANT + DPD::D_TIME);
print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF");
dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 99);
dpd.Gmm("k", 2, 99);
dpd.SetMethod(DPD::M_2STEP);
dpd.Estimate();
print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS");
dpd.GmmLevel("n", 1, 1);
dpd.GmmLevel("w", 1, 1);
dpd.GmmLevel("k", 1, 1);
dpd.SetMethod(DPD::M_2STEP);
dpd.Estimate();
Here is the corresponding gretl code:
open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k
# Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF
dpanel 1 ; n X const ; GMM(Z,2,99) --time --two-step --dpd
# Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS
dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) \
--time --two-step --dpd --system
Note the use of the --system option flag to specify GMM-SYS, including the default treatment of
the dependent variable, which corresponds to GMMlevel(n,1,1). In this case we also want to use
lagged differences of the regressors w and k as instruments for the levels equations so we need explicit
GMMlevel entries for those variables. If you want something other than the default treatment for the
dependent variable as an instrument for the levels equations, you should give an explicit GMMlevel
specification for that variable—and in that case the --system flag is redundant (but harmless).
For the sake of completeness, note that if you specify at least one GMMlevel term, dpanel will then
include equations in levels, but it will not automatically add a default GMMlevel specification for the
dependent variable unless the --system option is given.
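As a sketch of these two points (the lag range chosen for GMMlevel here is arbitrary):

dpanel 1 ; y ; GMM(y,2,99) GMMlevel(y,1,2)

Since an explicit GMMlevel specification is given for the dependent variable, equations in levels are included and adding the --system flag would be redundant.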
24.4 Cross-country growth example
The previous examples all used the Arellano–Bond dataset; for this example we use the dataset
CEL.gdt, which is also included in the gretl distribution. As with the Arellano–Bond data, there
are numerous missing values. Details of the provenance of the data can be found by opening the
dataset information window in the gretl GUI (Data menu, Dataset info item). This is a subset of
the Barro–Lee 138-country panel dataset, an approximation to which is used in Caselli, Esquivel and
Lefort (1996) and Bond, Hoeffler and Temple (2001).4 Both of these papers explore the dynamic
panel-data approach in relation to the issues of growth and convergence of per capita income across
countries.
The dependent variable is growth in real GDP per capita over successive five-year periods; the regres-
sors are the log of the initial (five years prior) value of GDP per capita, the log-ratio of investment to
GDP, s, in the prior five years, and the log of annual average population growth, n, over the prior five
years plus 0.05 as stand-in for the rate of technical progress, g, plus the rate of depreciation, δ (with the last two terms assumed to be constant across both countries and periods). The original model is
\[
\Delta_5 y_{it} = \beta y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} + g + \delta) + \nu_t + \eta_i + \epsilon_{it} \qquad (24.7)
\]
which allows for a time-specific disturbance ν_t. The Solow model with Cobb–Douglas production function implies that γ = −α, but this assumption is not imposed in estimation. The time-specific
disturbance is eliminated by subtracting the period mean from each of the series.
Equation (24.7) can be transformed to an AR(1) dynamic panel-data model by adding y_{i,t−5} to both sides, which gives
\[
y_{it} = (1 + \beta) y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} + g + \delta) + \eta_i + \epsilon_{it} \qquad (24.8)
\]
where all variables are now assumed to be time-demeaned.
In (rough) replication of Bond et al. (2001) we now proceed to estimate the following two models:
(a) equation (24.8) via GMM-DIF, using as instruments the second and all longer lags of y_it, s_it and n_it + g + δ; and (b) equation (24.8) via GMM-SYS, using ∆y_{i,t−1}, ∆s_{i,t−1} and ∆(n_{i,t−1} + g + δ) as additional instruments in the levels equations. We report robust standard errors throughout. (As a purely notational matter, we now use “t − 1” to refer to values five years prior to t, as in Bond et al. (2001)).
4We say an “approximation” because we have not been able to replicate exactly the OLS results reported in the papers cited, though it seems from the description of the data in Caselli et al. (1996) that we ought to be able to do so. We note that Bond et al. (2001) used data provided by Professor Caselli yet did not manage to reproduce the latter’s results.
The gretl script to do this job is shown in Listing 24.2. Note that the final transformed versions of
the variables (logs, with time-means subtracted) are named ly (yit), linv (sit) and lngd (nit +g+δ).
Listing 24.2: GDP growth example [Download ]
open CEL.gdt
ngd = n + 0.05
ly = log(y)
linv = log(s)
lngd = log(ngd)
# take out time means
loop i=1..8
smpl (time == i) --restrict --replace
ly -= mean(ly)
linv -= mean(linv)
lngd -= mean(lngd)
endloop
smpl --full
list X = linv lngd
# 1-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99)
# 2-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99) --two-step
# GMM-SYS
dpanel 1 ; ly X ; GMM(X,2,99) GMMlevel(X,1,1) --two-step --sys
For comparison we estimated the same two models using Ox/DPD and xtabond2. (In each case
we constructed a comma-separated values dataset containing the data as transformed in the gretl
script shown above, using a missing-value code appropriate to the target program.) For reference,
the commands used with Stata are reproduced in Listing 24.3.
Listing 24.3: Stata commands for GDP growth example
#delimit ;
insheet using CEL.csv;
tsset unit time;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev twostep;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nocons twostep;
For the GMM-DIF model all three programs find 382 usable observations and 30 instruments, and
yield identical parameter estimates and robust standard errors (up to the number of digits printed,
or more); see Table 24.1.5
5The coefficient shown for ly(-1) in the Tables is that reported directly by the software; for comparability with the original model (eq. 24.7) it is necessary to subtract 1, which produces the expected negative value indicating conditional convergence in per capita income.
                 1-step                    2-step
           coeff       std. error    coeff       std. error
  ly(-1)   0.577564    0.1292        0.610056    0.1562
  linv     0.0565469   0.07082       0.100952    0.07772
  lngd     0.143950    0.2753        0.310041    0.2980

Table 24.1: GMM-DIF: Barro–Lee data
Results for GMM-SYS estimation are shown in Table 24.2. In this case we show two sets of gretl
results: those labeled “gretl(1)” were obtained using gretl’s --dpdstyle option, while those labeled “gretl(2)” did not use that option—the intent being to reproduce the H matrices used by Ox/DPD
and xtabond2 respectively.
            gretl(1)          Ox/DPD            gretl(2)          xtabond2
  ly(-1)    0.9237 (0.0385)   0.9167 (0.0373)   0.9073 (0.0370)   0.9073 (0.0370)
  linv      0.1592 (0.0449)   0.1636 (0.0441)   0.1856 (0.0411)   0.1856 (0.0411)
  lngd      0.2370 (0.1485)   0.2178 (0.1433)   0.2355 (0.1501)   0.2355 (0.1501)

Table 24.2: 2-step GMM-SYS: Barro–Lee data (standard errors in parentheses)
In this case all three programs use 479 observations; gretl and xtabond2 use 41 instruments and produce the same estimates (when using the same H matrix) while Ox/DPD nominally uses 66 instruments.6
It is noteworthy that with GMM-SYS plus “messy” missing observations, the results depend on the
precise array of instruments used, which in turn depends on the details of the implementation of the
estimator.
24.5 Auxiliary test statistics
We have concentrated above on parameter estimates and standard errors. Here we add some discussion
of the additional test statistics that typically accompany both GMM-DIF and GMM-SYS estimation—
tests of overidentification, for first- and second-order autocorrelation, and for the joint significance of
regressors.
Overidentification
If a model estimated with the use of instrumental variables is just-identified, the condition of or-
thogonality of the residuals and the instruments can be satisfied exactly. But if the specification
is overidentified (more instruments than endogenous regressors) this condition can only be approxi-
mated, and the degree to which orthogonality “fails” serves as a test for the validity of the instruments
(and/or the specification). Since dynamic panel models are almost always overidentified such a test
is of particular importance.
There are two such tests in the econometric literature, devised respectively by Sargan (1958) and
Hansen (1982). They share a common principle: a suitably scaled measure of deviation from perfect
orthogonality can be shown to be distributed as χ²(k), with k the degree of overidentification, under
the null hypothesis of valid instruments and correct specification. Both test statistics can be written
as
\[
S = \left( \sum_{i=1}^{N} \hat{v}_i^{*\prime} Z_i \right) A_N \left( \sum_{i=1}^{N} Z_i' \hat{v}_i^{*} \right)
\]
where the $\hat{v}_i^{*}$ are the residuals in first differences for unit i, and for that reason they are often rolled together—for example, as “Hansen–Sargan” tests by Davidson and MacKinnon (2004).
6This is a case of the issue described in section 24.1: the full A_N matrix turns out to be singular and special measures must be taken to produce estimates.
The Sargan vs Hansen difference is buried in AN: Sargan’s original test is the minimized orthogonality
score divided by a scalar estimate of the error variance (which is presumed to be homoskedastic), while
Hansen’s is the minimized criterion from efficient GMM estimation, in which the scalar variance
estimate is replaced by a heteroskedasticity- and autocorrelation-consistent (HAC) estimator of the
variance matrix of the error term. These variants correspond to 1-step and 2-step estimates of the
given specification.
Up till version 2021d, gretl followed Ox/DPD in presenting a single overidentification statistic under
the name “Sargan”—in effect, a Sargan test proper for the 1-step estimator and a Hansen test for 2-
step. Subsequently, however, gretl follows xtabond2 in distinguishing between the tests and presenting
both statistics, under their original names, when 2-step estimation is selected (and therefore the HAC
variance estimator is available). This choice responds to an argument made by Roodman (2009b):
the Sargan test is questionable owing to its assumption of homoskedasticity but the Hansen test is
seriously weakened by an excessive number of instruments (it may under-reject substantially), so
there may be a benefit to taking both tests into consideration.
There are cases where the degrees of freedom for the overidentification test differ between DPD and gretl; this occurs when the A_N matrix is singular (section 24.1). In concept the df equals the number
of instruments minus the number of parameters estimated; for the first of these terms gretl uses the
rank of AN, while DPD appears to use the full dimension of this matrix.
Autocorrelation
Negative first-order autocorrelation of the residuals in differences is expected by construction of the
dynamic panel estimator, so a significant value for the AR(1) test does not indicate a problem. If
the AR(2) test rejects, however, this indicates violation of the maintained assumptions. Note that
valid AR tests cannot be produced when the --asymptotic option is specified in conjunction with
one-step GMM-SYS estimation; if you need the tests, either add the two-step option or drop the
asymptotic flag (which is recommended in any case).
Wald tests on regressors
Wald tests on the regressors (and separately on the time dummy variables, if included), are based
on the estimated variance matrix of the parameter estimates and are generally in agreement across
software packages provided the parameter variance is estimated in the same way. One small exception
pertains to comparison between Ox/DPD and gretl when the difference estimator is used, a constant
term is included, and the --dpdstyle option is given with dpanel (so the constant is not automatically
omitted). In this case DPD includes the constant in the time-dummies Wald test but gretl does not.
24.6 Post-estimation available statistics
After estimation, the $model accessor will return a bundle containing several items that may be of
interest: most should be self-explanatory, but here’s a partial list:
  Key                  Content
  AR1, AR2             1st and 2nd order autocorrelation test statistics
  sargan, sargan_df    Sargan test for overidentifying restrictions and corresponding degrees of freedom
  hansen, hansen_df    Hansen test for overidentifying restrictions and corresponding degrees of freedom
  wald, wald_df        Wald test for overall significance and corresponding degrees of freedom
  GMMinst              The matrix Z of instruments (see equations (24.2) and (24.5))
  wgtmat               The matrix A of GMM weights (see equations (24.2) and (24.5))
Note that hansen and hansen_df are not included when 1-step estimation is selected. Note also
that GMMinst and wgtmat (which may be quite large matrices) are not saved in the $model bundle
by default; that requires use of the --keep-extra option with the dpanel command. Listing 24.4
illustrates use of these matrices to replicate via hansl commands the calculation of the GMM estimator.
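As a brief illustration (a sketch reusing the Arellano–Bond data; the key names are those listed above), one might retrieve some of these quantities as follows:

open abdata.gdt
dpanel 1 ; n ; GMM(n,2,99) --two-step
bundle b = $model
printf "AR(2) test: %g\n", b.AR2
printf "Hansen test: %g (df = %d)\n", b.hansen, b.hansen_df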
24.7 Memo: dpanel options
  flag             effect
  --asymptotic     Suppresses the use of robust standard errors
  --two-step       Calls for 2-step estimation (the default being 1-step)
  --system         Calls for GMM-SYS, with default treatment of the dependent variable, as in GMMlevel(y,1,1)
  --collapse       Collapse block-diagonal sets of GMM instruments as per Roodman (2009a)
  --time-dummies   Includes period-specific dummy variables
  --dpdstyle       Compute the H matrix as in DPD; also suppresses differencing of automatic time dummies and omission of intercept in the GMM-DIF case
  --verbose        Prints confirmation of the GMM-style instruments used; and when --two-step is selected, prints the 1-step estimates first
  --vcv            Calls for printing of the covariance matrix
  --quiet          Suppresses the printing of results
  --keep-extra     Save additional matrices in $model bundle (see above)
The time dummies option supports the qualifier noprint, as in
--time-dummies=noprint
This means that although the dummies are included in the specification their coefficients, standard
errors and so on are not printed.
Listing 24.4: Manual replication of built-in command [Download ]
set verbose off
open abdata.gdt
# compose list of regressors
list X = w w(-1) k k(-1)
list Z = w k
dpanel 1 ; n X const ; GMM(Z,2,99) --two-step --dpd --keep-extra
### --- re-do by hand ----------------------------
# fetch Z and A from model
A = $model.wgtmat
mZt = $model.GMMinst # note: transposed
# create data matrices
series valid = ok($uhat)
series ddep = diff(n)
series dldep = ddep(-1)
list dreg = diff(X)
smpl valid --dummy
matrix m_reg = {dldep} ~ {dreg} ~ 1
matrix m_dep = {ddep}
matrix uno = mZt * m_reg
matrix due = qform(uno', A)
matrix tre = (uno'A) * (mZt * m_dep)
matrix coef = due\tre
print coef
Chapter 25
Nonlinear least squares
25.1 Introduction and examples
Gretl supports nonlinear least squares (NLS) using a variant of the Levenberg–Marquardt algorithm.
The user must supply a specification of the regression function; prior to giving this specification
the parameters to be estimated must be “declared” and given initial values. Optionally, the user
may supply analytical derivatives of the regression function with respect to each of the parameters. If
derivatives are not given, the user must instead give a list of the parameters to be estimated (separated
by spaces or commas), preceded by the keyword params. The tolerance (criterion for terminating the
iterative estimation procedure) can be adjusted using the set command.
The syntax for specifying the function to be estimated consists of the name of the dependent variable,
followed by an expression to generate it. This is illustrated in the following two examples, with
accompanying derivatives.
# Consumption function from Greene
nls C = alpha + beta * Y^gamma
deriv alpha = 1
deriv beta = Y^gamma
deriv gamma = beta * Y^gamma * log(Y)
end nls
# Nonlinear function from Russell Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
deriv alpha = 1
deriv beta = x1 - x2/(beta*beta)
end nls --vcv
Note the command words nls (which introduces the regression function), deriv (which introduces the
specification of a derivative), and end nls, which terminates the specification and calls for estimation.
If the --vcv flag is appended to the last line the covariance matrix of the parameter estimates is printed.
25.2 Initializing the parameters
The parameters of the regression function must be given initial values prior to the nls command. (In
the GUI program this may be done via the menu item “Variable, Define new variable”).
In some cases, where the nonlinear function is a generalization of (or a restricted form of) a linear
model, it may be convenient to run an ols and initialize the parameters from the OLS coefficient
estimates. In relation to the first example above, one might do:
ols C 0 Y
alpha = $coeff(0)
beta = $coeff(Y)
gamma = 1
And in relation to the second example one might do:
ols y 0 x1 x2
alpha = $coeff(0)
beta = $coeff(x1)
25.3 NLS dialog window
It is probably most convenient to compose the commands for NLS estimation in the form of a gretl
script but you can also do so interactively, by selecting the item “Nonlinear Least Squares” under
the “Model, Nonlinear models” menu. This opens a dialog box where you can type the function
specification (possibly prefaced by statements to set the initial parameter values) and the derivatives,
if available. An example of this is shown in Figure 25.1. Note that in this context you do not have
to supply the nls and end nls tags.
Figure 25.1: NLS dialog box
25.4 Analytical and numerical derivatives
If you are able to figure out the derivatives of the regression function with respect to the parameters,
it is advisable to supply those derivatives as shown in the examples above. If that is not possible,
gretl will compute approximate numerical derivatives. However, the properties of the NLS algorithm
may not be so good in this case (see section 25.8).
To request numerical derivatives you use the params statement, which should be followed by a list of identifiers naming the parameters to be estimated. In this case, the examples above would read as follows:
# Greene
nls C = alpha + beta * Y^gamma
params alpha beta gamma
end nls
# Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
params alpha beta
end nls
If analytical derivatives are supplied, they are checked for consistency with the given nonlinear func-
tion. If the derivatives are clearly incorrect estimation is aborted with an error message. If the
derivatives are “suspicious” a warning message is issued but estimation proceeds. This warning may
sometimes be triggered by incorrect derivatives, but it may also be triggered by a high degree of
collinearity among the derivatives.
Note that you cannot mix analytical and numerical derivatives: you should supply expressions for all
of the derivatives or none.
25.5 Advanced use
The nls block can also contain more sophisticated constructs. First, it can handle intermediate
expressions; this makes it possible to construct the conditional mean expression as a multi-step job,
thus enhancing modularity and readability of the code. Second, more complex objects, such as lists
and matrices, can be used for this purpose.
For example, suppose that we want to estimate a Probit Binary Response model via NLS. The
specification is
\[
y_i = \Phi\left[ g(x_i) \right] + u_i, \qquad g(x_i) = b_0 + b_1 x_{1,i} + b_2 x_{2,i} = b' x_i \qquad (25.1)
\]
Note: this is not the recommended way to estimate a probit model: the u_i term is heteroskedastic
by construction and ML estimation is much preferable here. Still, NLS is a consistent estimator of
the parameter vector b, although its covariance matrix will have to be adjusted to compensate for
heteroskedasticity: this is accomplished via the --robust switch.
Listing 25.1: NLS estimation of a Probit model [Download ]
open greene25_1.gdt
list X = const age income ownrent selfempl
# initialisation
ols cardhldr X --quiet
matrix b = $coeff / $sigma
# proceed with NLS estimation
nls cardhldr = cnorm(ndx)
series ndx = lincomb(X, b)
params b
end nls --robust
# compare with ML probit
probit cardhldr X --p-values
The example in script 25.1 can be enhanced by using analytical derivatives: since
\[
\frac{\partial g(x_i)}{\partial b_j} = \phi(b' x_i) \cdot x_{ij}
\]
one could substitute the params line in the script with the two-liner
series f = dnorm(ndx)
deriv b = {f} .* {X}
and have nls use analytically-computed derivatives, which are quicker and usually more reliable.
25.6 Controlling termination
The NLS estimation procedure is an iterative process. Iteration is terminated when the criterion for
convergence is met or when the maximum number of iterations is reached, whichever comes first.
Let k denote the number of parameters being estimated. The maximum number of iterations is 100 × (k + 1) when analytical derivatives are given, and 200 × (k + 1) when numerical derivatives are used.
Let ϵ denote a small number. The iteration is deemed to have converged if at least one of the following conditions is satisfied:
• Both the actual and predicted relative reductions in the error sum of squares are at most ϵ.
• The relative error between two consecutive iterates is at most ϵ.
This default value of ϵ is the machine precision to the power 3/4,1 but it can be adjusted using the
set command with the parameter nls_toler. For example
set nls_toler .0001
will relax the value of ϵto 0.0001.
25.7 Details on the code
The underlying engine for NLS estimation is based on the minpack suite of functions, available from
netlib.org. Specifically, the following minpack functions are called:
lmder Levenberg–Marquardt algorithm with analytical derivatives
chkder Check the supplied analytical derivatives
lmdif Levenberg–Marquardt algorithm with numerical derivatives
fdjac2 Compute final approximate Jacobian when using numerical derivatives
dpmpar Determine the machine precision
On successful completion of the Levenberg–Marquardt iteration, a Gauss–Newton regression is used
to calculate the covariance matrix for the parameter estimates. If the --robust flag is given a robust
variant is computed. The documentation for the set command explains the specific options available
in this regard.
Since NLS results are asymptotic, there is room for debate over whether or not a correction for degrees
of freedom should be applied when calculating the standard error of the regression (and the standard
errors of the parameter estimates). For comparability with OLS, and in light of the reasoning given in
Davidson and MacKinnon (1993), the estimates shown in gretl do use a degrees of freedom correction.
25.8 Numerical accuracy
Table 25.1 shows the results of running the gretl NLS procedure on the 27 Statistical Reference
Datasets made available by the U.S. National Institute of Standards and Technology (NIST) for testing
nonlinear regression software.2 For each dataset, two sets of starting values for the parameters are
given in the test files, so the full test comprises 54 runs. Two full tests were performed, one using all
analytical derivatives and one using all numerical approximations. In each case the default tolerance
was used.3
Out of the 54 runs, gretl failed to produce a solution in 4 cases when using analytical derivatives, and
in 5 cases when using numeric approximation. Of the four failures in analytical derivatives mode, two
were due to non-convergence of the Levenberg–Marquardt algorithm after the maximum number of
iterations (on MGH09 and Bennett5, both described by NIST as of “Higher difficulty”) and two were
due to generation of range errors (out-of-bounds floating point values) when computing the Jacobian
(on BoxBOD and MGH17, described as of “Higher difficulty” and “Average difficulty” respectively). The
additional failure in numerical approximation mode was on MGH10 (“Higher difficulty”, maximum
number of iterations reached).
The table gives information on several aspects of the tests: the number of outright failures, the
average number of iterations taken to produce a solution and two sorts of measure of the accuracy of
the estimates for both the parameters and the standard errors of the parameters.
For each of the 54 runs in each mode, if the run produced a solution the parameter estimates obtained
by gretl were compared with the NIST certified values. We define the “minimum correct figures” for
a given run as the number of significant figures to which the least accurate gretl estimate agreed
with the certified value, for that run. The table shows both the average and the worst case value
1On a 32-bit Intel Pentium machine a likely value for this parameter is 1.82 × 10⁻¹².
2For a discussion of gretl’s accuracy in the estimation of linear models, see Appendix C.
3The data shown in the table were gathered from a pre-release build of gretl version 1.0.9, compiled with gcc 3.3,
linked against glibc 2.3.2, and run under Linux on an i686 PC (IBM ThinkPad A21m).
of this variable across all the runs that produced a solution. The same information is shown for the
estimated standard errors.4
The second measure of accuracy shown is the percentage of cases, taking into account all parameters
from all successful runs, in which the gretl estimate agreed with the certified value to at least the 6
significant figures which are printed by default in the gretl regression output.
Table 25.1: Nonlinear regression: the NIST tests

                                                          Analytical derivatives   Numerical derivatives
  Failures in 54 tests                                    4                        5
  Average iterations                                      32                       127
  Mean of min. correct figures, parameters                8.120                    6.980
  Worst of min. correct figures, parameters               4                        3
  Mean of min. correct figures, standard errors           8.000                    5.673
  Worst of min. correct figures, standard errors          5                        2
  Percent correct to at least 6 figures, parameters       96.5                     91.9
  Percent correct to at least 6 figures, standard errors  97.7                     77.3
Using analytical derivatives, the worst case values for both parameters and standard errors were
improved to 6 correct figures on the test machine when the tolerance was tightened to 1.0e−14.
Using numerical derivatives, the same tightening of the tolerance raised the worst values to 5 correct
figures for the parameters and 3 figures for standard errors, at a cost of one additional failure of
convergence.
Note the overall superiority of analytical derivatives: on average solutions to the test problems were
obtained with substantially fewer iterations and the results were more accurate (most notably for the
estimated standard errors). Note also that the six-digit results printed by gretl are not 100 percent
reliable for difficult nonlinear problems (in particular when using numerical derivatives). Having
registered this caveat, the percentage of cases where the results were good to six digits or better
seems high enough to justify their printing in this form.
4For the standard errors, I excluded one outlier from the statistics shown in the table, namely Lanczos1. This is an
odd case, using generated data with an almost-exact fit: the standard errors are 9 or 10 orders of magnitude smaller
than the coefficients. In this instance gretl could reproduce the certified standard errors to only 3 figures (analytical
derivatives) and 2 figures (numerical derivatives).
Chapter 26
Maximum likelihood estimation
26.1 Generic ML estimation with gretl
Maximum likelihood estimation is a cornerstone of modern inferential procedures. Gretl provides a
way to implement this method for a wide range of estimation problems, by use of the mle command.
We give here a few examples.
To give a foundation for the examples that follow, we start from a brief reminder on the basics of
ML estimation. Given a sample of size T, it is possible to define the density function1 for the whole
sample, namely the joint distribution of all the observations f(Y;θ), where Y={y1, . . . , yT}. Its
shape is determined by a k-vector of unknown parameters θ, which we assume is contained in a set Θ,
and which can be used to evaluate the probability of observing a sample with any given characteristics.
After observing the data, the values Yare given, and this function can be evaluated for any legitimate
value of θ. In this case, we prefer to call it the likelihood function; the need for another name stems
from the fact that this function works as a density when we use the y_t's as arguments and θ as parameters, whereas in this context θ is taken as the function’s argument, and the data Y only have the role of determining its shape.
1We are supposing here that our data are a realization of continuous random variables. For discrete random variables, everything continues to apply by referring to the probability function instead of the density. In both cases, the distribution may be conditional on some exogenous variables.
In standard cases, this function has a unique maximum. The location of the maximum is unaffected if
we consider the logarithm of the likelihood (or log-likelihood for short): this function will be denoted as
\[
\ell(\theta) = \log f(Y; \theta)
\]
The log-likelihood functions that gretl can handle are those where ℓ(θ) can be written as
\[
\ell(\theta) = \sum_{t=1}^{T} \ell_t(\theta)
\]
which is true in most cases of interest. The functions ℓ_t(θ) are called the log-likelihood contributions.
Moreover, the location of the maximum is obviously determined by the data Y. This means that the
value
\[
\hat{\theta}(Y) = \underset{\theta \in \Theta}{\mathrm{Argmax}}\; \ell(\theta) \qquad (26.1)
\]
is some function of the observed data (a statistic), which has the property, under mild conditions, of
being a consistent, asymptotically normal and asymptotically efficient estimator of θ.
Sometimes it is possible to write down explicitly the function θ̂(Y); in general, it need not be so.
In these circumstances, the maximum can be found by means of numerical techniques. These often
rely on the fact that the log-likelihood is a smooth function of θ, and therefore on the maximum its
partial derivatives should all be 0. The gradient vector, or score vector, is a function that enjoys
many interesting statistical properties in its own right; it will be denoted here as g(θ). It is a k-vector
with typical element
\[
g_i(\theta) = \frac{\partial \ell(\theta)}{\partial \theta_i} = \sum_{t=1}^{T} \frac{\partial \ell_t(\theta)}{\partial \theta_i}
\]
Gradient-based methods can be briefly illustrated as follows:
1. pick a point θ0 ∈ Θ;
2. evaluate g(θ0);
3. if g(θ0) is “small”, stop. Otherwise, compute a direction vector d(g(θ0));
4. evaluate θ1 = θ0 + d(g(θ0));
5. substitute θ0 with θ1;
6. restart from 2.
Many algorithms of this kind exist; they basically differ from one another in the way they compute the direction vector d(g(θ0)), to ensure that ℓ(θ1) > ℓ(θ0) (so that we eventually end up on the maximum).
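To fix ideas, here is a toy hansl sketch of this scheme for a one-parameter problem, using a crude fixed step size in place of a properly chosen direction vector d(g(θ0)); the quadratic objective and all names are purely illustrative and have nothing to do with gretl's own optimization code:

# maximize l(theta) = -(theta - 2)^2, with gradient g(theta) = -2*(theta - 2)
scalar theta = 0              # 1: pick a starting point theta_0
scalar step = 0.1             # crude stand-in for the direction vector d(g)
loop 1000
    scalar g = -2*(theta - 2)   # 2: evaluate the gradient
    if abs(g) < 1.0e-8          # 3: stop if the gradient is "small"
        break
    endif
    theta += step*g             # 4-5: move to the new point and repeat
endloop
printf "maximum found at theta = %g\n", theta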
The default method gretl uses to maximize the log-likelihood is a gradient-based algorithm known
as the BFGS (Broyden, Fletcher, Goldfarb and Shanno) method. This technique is used in most
econometric and statistical packages, as it is well-established and remarkably powerful. Clearly, in
order to make this technique operational, it must be possible to compute the vector g(θ) for any value
of θ. In some cases this vector can be written explicitly as a function of Y. If this is not possible or
too difficult the gradient may be evaluated numerically. The alternative Newton-Raphson algorithm
is also available. This method is more effective under some circumstances but is also more fragile; see
section 26.10 and chapter 37 for details.2
The choice of the starting value, θ0, is crucial in some contexts and inconsequential in others. In
general, however, it is advisable to start the algorithm from “sensible” values whenever possible. If a
consistent estimator is available, this is usually a safe and efficient choice: this ensures that in large
samples the starting point will likely be close to θ̂ and convergence can be achieved in few iterations.
The maximum number of iterations allowed for the BFGS procedure, and the relative tolerance for as-
sessing convergence, can be adjusted using the set command: the relevant variables are bfgs_maxiter
(default value 500) and bfgs_toler (default value, the machine precision to the power 3/4).
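For example, one might allow more iterations and tighten the convergence criterion (the values shown are merely illustrative) with

set bfgs_maxiter 1000
set bfgs_toler 1.0e-10

before the mle block.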
26.2 Syntax
ML estimation in gretl is supported by the mle command block. This consists of an initial line
holding the keyword mle plus an equation for the loglikelihood; one or more statements within the
block (details below); and a trailer line to close the block: end mle. Option flags may be appended
to the trailer line.
Listing 26.1 gives a simple but complete example which serves to illustrate the equivalence of MLE
and OLS in the context of the normal linear model.
Listing 26.1: OLS and MLE [Download ]
open data9-7
list X = const INCOME PRICE
ols QNC X
matrix b = $coeff
scalar s2 = $sigma^2
scalar l2pi = log(2*$pi)
scalar n = $nobs
mle lt = -0.5*l2pi -0.5*log(s2) - 1/(2*s2) * uhat^2
series uhat = QNC - lincomb(X, b)
s2 = sum(uhat^2)/n
params b
end mle
2Note that some of the statements made below (for example, regarding estimation of the covariance matrix) have
to be modified when Newton’s method is used.
Initial line of block
If possible the given expression should evaluate to a series or vector (contribution to the loglikelihood
per observation). Failing that, it must evaluate to a scalar (the total loglikelihood). The identifier on
the left-hand side (lt in Listing 26.1) is up to the user. If the variable in question is defined prior to
the mle block it can be referenced after ML estimation; otherwise it is treated as a temporary variable
and is destroyed after estimation.
Lines within the block
These may take three forms:
1. “Helper” statements that calculate auxiliary quantities (in the example, uhat and s2). Such
statements will be evaluated before the loglikelihood and then re-evaluated on each iteration.
2. Keyword plus parameter, as in params b, which tells mle that the parameter to be adjusted
to maximize the loglikelihood is the vector b. This sort of statement can also be used to specify
analytical derivatives of the loglikelihood with respect to the parameters; see section 26.7 for
discussion and examples.
3. Statements employing print or printf to track the progress of calculation, which can be useful
for debugging.
Final line
In the example above this merely terminates the block, but if one wanted standard errors calculated
via a numerical approximation to the Hessian (for instance) one could substitute
end mle --hessian
For a full listing of applicable options see the mle entry in the Gretl Command Reference.
26.3 Covariance matrix and standard errors
By default the covariance matrix of the parameter estimates is based on the Outer Product of the Gradient (OPG). That is,
\[
\widehat{\mathrm{Var}}_{\mathrm{OPG}}(\hat{\theta}) = \left( G(\hat{\theta})'\, G(\hat{\theta}) \right)^{-1} \qquad (26.2)
\]
where G(θ̂) is the T × k matrix of contributions to the gradient. Other options are available. If the --hessian flag is given, the covariance matrix is computed from a numerical approximation to the Hessian at convergence. If the --robust option is given the quasi-ML “sandwich” estimator is used:
\[
\widehat{\mathrm{Var}}_{\mathrm{QML}}(\hat{\theta}) = H(\hat{\theta})^{-1}\, G(\hat{\theta})'\, G(\hat{\theta})\, H(\hat{\theta})^{-1}
\]
where H denotes the numerical approximation to the Hessian. A refinement here is that if the hac
parameter is appended to the --robust option, as in
end mle --robust=hac
the sandwich estimator is augmented in the manner of Newey and West (1987) to allow for serial
correlation in the gradient. (Note that this only makes sense for time-series data.) In that case the
details of the HAC estimator can be controlled via the set command, as described in chapter 22.
Cluster-robust estimation is also available: in order to activate it, use the --cluster=clustvar option, where clustvar should be a discrete series. See section 22.5 for more details.
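For example, assuming the dataset contains a discrete series named firm_id that identifies the clusters (the name is purely illustrative), the trailer line would read

end mle --cluster=firm_id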
Note, however, that if the log-likelihood function supplied by the user just returns a scalar value—as
opposed to a series or vector holding per-observation contributions—then the OPG method is not
applicable and so the covariance matrix must be estimated via a numerical approximation to the
Hessian.
26.4 Gamma estimation
Suppose we have a sample of T independent and identically distributed observations from a Gamma distribution with shape parameter k and scale parameter θ. The density function for each observation x_t is
\[
f(x_t) = \frac{1}{\Gamma(k)\,\theta^k}\; x_t^{k-1} e^{-x_t/\theta} \qquad (26.3)
\]
The log-likelihood for the entire sample can be written as the logarithm of the joint density of all
the observations. Since these are independent and identical, the joint density is the product of the
individual densities, and hence its log is
\[
\ell(k, \theta) = \sum_{t=1}^{T} \log\!\left[ \frac{x_t^{k-1} e^{-x_t/\theta}}{\Gamma(k)\,\theta^k} \right] = \sum_{t=1}^{T} \ell_t \qquad (26.4)
\]
where
\[
\ell_t = k \log(x_t/\theta) - \gamma(k) - \log x_t - x_t/\theta
\]
and γ(·) is the log of the gamma function. In order to estimate the parameters k and θ via ML, we
need to maximize (26.4) with respect to them. Here’s a simple snippet of gretl code to do the job.
scalar k = 1
scalar theta = 1
mle logl = k*ln(x/theta) - lngamma(k) - ln(x) - x/theta
params k theta
end mle
The first two statements above are necessary to ensure that the variables k and theta exist before
the computation of logl is attempted. Inside the mle block these variables (which could be either
scalars, vectors or a combination of the two—see below for examples) are identified via the params
keyword as the parameters that should be adjusted to maximize the likelihood. Their values will be
changed by the execution of the mle command; upon successful completion, they will be replaced by
the ML estimates. We set the starting value to 1 for both; this is arbitrary, but does not matter much
in this example (more on this later).
The code above can be made more readable, and marginally more efficient, by defining a series to hold x_t/θ, which we’ll call y. This command can be embedded in the mle block as follows:
mle logl = k*ln(y) - lngamma(k) - ln(x) - y
series y = x/theta
params k theta
end mle
You can insert as many such auxiliary lines as you require before the params line, with the restriction
that they must contain either (a) commands to generate series, scalars or matrices or (b) print
commands (which may be used to aid in debugging).
In a simple example like this, the choice of the starting values is almost inconsequential; the algo-
rithm is likely to converge regardless of the initialization. However, consistent method-of-moments estimators of k and θ can easily be recovered from the sample mean m and variance V. It can be shown that
\[
E(x_t) = k\theta, \qquad V(x_t) = k\theta^2
\]
hence the following estimators
\[
\bar{\theta} = V/m, \qquad \bar{k} = m/\bar{\theta}
\]
are consistent, and therefore suitable to be used as a starting point for the algorithm. The original initializers for k and θ could then be replaced by
scalar m = mean(x)
scalar theta = var(x)/m
scalar k = m/theta
Another thing to note is that sometimes parameters are constrained within certain boundaries. In
this case, for example, both Gamma parameters must be positive numbers. Gretl can’t check for
this automatically; it’s the user’s responsibility to ensure that the function is always evaluated at an
admissible point in the parameter space during the iterative search for the maximum. An effective
technique is to define a scalar variable that checks the validity of the parameters, and set the log-
likelihood as undefined (NA) if the check fails. To implement this, the mle block above could be
modified as follows:
mle logl = check ? k*ln(y) - lngamma(k) - ln(x) - y : NA
series y = x/theta
scalar check = k > 0 && theta > 0
params k theta
end mle
For reference, Listing 26.2 presents a complete script that generates artificial Gamma data and obtains
ML estimates of the parameters, employing the various refinements described above.
Listing 26.2: ML estimation of Gamma parameters [Download ]
# create an empty data set with 200 observations
nulldata 200
# fix a random seed for replicability
set seed 1707138404
# generate a Gamma random variable x with shape k = 3 and scale theta = 2
series x = randgen(G, 3, 2)
# initialize estimates via sample moments
m = mean(x)
theta = var(x) / m
k = m / theta
mle logl = check ? k*ln(y) - lngamma(k) - ln(x) - y : NA
series y = x/theta
check = k > 0 && theta > 0
params k theta
end mle
The mle output is:
Model 1: ML, using observations 1-200
logl = check ? k*ln(y) - lngamma(k) - ln(x) - y : NA
Standard errors based on Outer Products matrix
estimate std. error z p-value
---------------------------------------------------
k 3.28159 0.309764 10.59 3.18e-26 ***
theta 1.86066 0.181566 10.25 1.21e-24 ***
Log-likelihood -504.8970 Akaike criterion 1013.794
Schwarz criterion 1020.391 Hannan-Quinn 1016.463
26.5 Stochastic frontier cost function
Note: this section has the sole purpose of illustrating the mle command. For the estimation of
stochastic frontier cost or production functions, you may want to use the frontier function package.
When modeling a cost function, it is sometimes worthwhile to incorporate explicitly into the statistical
model the notion that firms may be inefficient, so that the observed cost deviates from the theoretical
figure not only because of unobserved heterogeneity between firms, but also because two firms could
be operating at a different efficiency level, despite being identical in all other respects. In this case
we may write

    C_i = C_i^* + u_i + v_i

where C_i is some variable cost indicator, C_i^* is its "theoretical" value, u_i is a zero-mean disturbance term and v_i is the inefficiency term, which is supposed to be nonnegative by its very nature. A linear specification for C_i^* is often chosen. For example, the Cobb–Douglas cost function arises when C_i^* is a linear function of the logarithms of the input prices and the output quantities.
The stochastic frontier model is a linear model of the form y_i = x_i β + ε_i in which the error term ε_i is the sum of u_i and v_i.

A common postulate is that u_i ∼ N(0, σ_u²) and v_i ∼ N(0, σ_v²). If independence between u_i and v_i is also assumed, then it is possible to show that the density function of ε_i has the form:

    f(\varepsilon_i) = \sqrt{\frac{2}{\pi}}\, \Phi\!\left(\frac{\lambda\varepsilon_i}{\sigma}\right) \frac{1}{\sigma}\,\phi\!\left(\frac{\varepsilon_i}{\sigma}\right)    (26.5)

where Φ(·) and ϕ(·) are, respectively, the distribution and density function of the standard normal, σ = \sqrt{\sigma_u^2 + \sigma_v^2} and λ = σ_u/σ_v.

As a consequence, the log-likelihood for one observation takes the form (apart from an irrelevant constant)

    \ell_t = \log\Phi\!\left(\frac{\lambda\varepsilon_i}{\sigma}\right) - \left[ \log(\sigma) + \frac{\varepsilon_i^2}{2\sigma^2} \right]
Therefore, a Cobb–Douglas cost function with stochastic frontier is the model described by the following equations:

    \log C_i = \log C_i^* + \varepsilon_i
    \log C_i^* = c + \sum_{j=1}^{m} \beta_j \log y_{ij} + \sum_{j=1}^{n} \alpha_j \log p_{ij}
    \varepsilon_i = u_i + v_i
    u_i \sim N(0, \sigma_u^2)
    v_i \sim N(0, \sigma_v^2)

In most cases, one wants to ensure that the homogeneity of the cost function with respect to the prices holds by construction. Since this requirement is equivalent to \sum_{j=1}^{n} \alpha_j = 1, the above equation for C_i^* can be rewritten as

    \log C_i - \log p_{in} = c + \sum_{j=1}^{m} \beta_j \log y_{ij} + \sum_{j=2}^{n} \alpha_j (\log p_{ij} - \log p_{in}) + \varepsilon_i    (26.6)
The above equation could be estimated by OLS, but it would suffer from two drawbacks: first, the
OLS estimator for the intercept c is inconsistent because the disturbance term has a non-zero expected
value; second, the OLS estimators for the other parameters are consistent, but inefficient in view of
the non-normality of εi. Both issues can be addressed by estimating (26.6) by maximum likelihood.
Nevertheless, OLS estimation is a quick and convenient way to provide starting values for the MLE
algorithm.
Listing 26.3 shows how to implement the model described so far. The banks91 file contains part of
the data used in Lucchetti, Papi and Zazzaro (2001).
The script in example 26.3 is relatively easy to modify to show how one can use vectors (that is,
1-dimensional matrices) for storing the parameters to optimize: example 26.4 holds essentially the
Listing 26.3: Estimation of stochastic frontier cost function (with scalar parameters) [Download ]
open banks91.gdt
# transformations
series cost = ln(VC)
series q1 = ln(Q1)
series q2 = ln(Q2)
series p1 = ln(P1)
series p2 = ln(P2)
series p3 = ln(P3)
# Cobb-Douglas cost function with homogeneity restrictions
# (for initialization)
series rcost = cost - p1
series rp2 = p2 - p1
series rp3 = p3 - p1
ols rcost const q1 q2 rp2 rp3
# Cobb-Douglas cost function with homogeneity restrictions
# and inefficiency
scalar b0 = $coeff(const)
scalar b1 = $coeff(q1)
scalar b2 = $coeff(q2)
scalar b3 = $coeff(rp2)
scalar b4 = $coeff(rp3)
scalar su = 0.1
scalar sv = 0.1
mle logl = ln(cnorm(e*lambda/ss)) - (ln(ss) + 0.5*(e/ss)^2)
scalar ss = sqrt(su^2 + sv^2)
scalar lambda = su/sv
series e = rcost - b0*const - b1*q1 - b2*q2 - b3*rp2 - b4*rp3
params b0 b1 b2 b3 b4 su sv
end mle
same script in which the parameters of the cost function are stored together in a vector. Of course,
this also makes it possible to use variable lists and other refinements which make the code more compact
and readable.
Listing 26.4: Estimation of stochastic frontier cost function (with matrix parameters) [Download ]
open banks91.gdt
# transformations
series cost = ln(VC)
series q1 = ln(Q1)
series q2 = ln(Q2)
series p1 = ln(P1)
series p2 = ln(P2)
series p3 = ln(P3)
# Cobb-Douglas cost function with homogeneity restrictions
# (for initialization)
series rcost = cost - p1
series rp2 = p2 - p1
series rp3 = p3 - p1
list X = const q1 q2 rp2 rp3
ols rcost X
X = const q1 q2 rp2 rp3
# Cobb-Douglas cost function with homogeneity restrictions
# and inefficiency
matrix b = $coeff
scalar su = 0.1
scalar sv = 0.1
mle logl = ln(cnorm(e*lambda/ss)) - (ln(ss) + 0.5*(e/ss)^2)
scalar ss = sqrt(su^2 + sv^2)
scalar lambda = su/sv
series e = rcost - lincomb(X, b)
params b su sv
end mle
26.6 GARCH models
GARCH models are handled by gretl via a native function. However, it is instructive to see how they
can be estimated through the mle command.3
The following equations provide the simplest example of a GARCH(1,1) model:
yt=µ+εt
εt=ut·σt
utN(0,1)
ht=ω+αε2
t1+βht1.
Since the variance of y_t depends on past values, writing down the log-likelihood function is not simply
a matter of summing the log densities for individual observations. As is common in time series models,
y_t cannot be considered independent of the other observations in our sample, and consequently the
density function for the whole sample (the joint density for all observations) is not just the product
of the marginal densities.
3The gig addon, which handles other variants of conditionally heteroskedastic models, uses mle as its internal
engine.
Maximum likelihood estimation, in these cases, is achieved by considering conditional densities, so
what we maximize is a conditional likelihood function. If we define the information set at time t as

    F_t = \{ y_t, y_{t-1}, \ldots \},

then the density of y_t conditional on F_{t-1} is normal:

    y_t \mid F_{t-1} \sim N[\mu, h_t].

By means of the properties of conditional distributions, the joint density can be factorized as follows

    f(y_t, y_{t-1}, \ldots) = \left[ \prod_{t=1}^{T} f(y_t \mid F_{t-1}) \right] \cdot f(y_0)

If we treat y_0 as fixed, then the term f(y_0) does not depend on the unknown parameters, and therefore the conditional log-likelihood can be written as the sum of the individual contributions as

    \ell(\mu, \omega, \alpha, \beta) = \sum_{t=1}^{T} \ell_t    (26.7)

where

    \ell_t = \log\left[ \frac{1}{\sqrt{h_t}}\, \phi\!\left( \frac{y_t - \mu}{\sqrt{h_t}} \right) \right] = -\frac{1}{2}\left[ \log(h_t) + \frac{(y_t - \mu)^2}{h_t} \right]
The following script shows a simple application of this technique, which uses the data file djclose; it
is one of the example datasets supplied with gretl and contains daily data from the Dow Jones stock
index.
open djclose
series y = 100*ldiff(djclose)
scalar mu = 0.0
scalar omega = 1
scalar alpha = 0.4
scalar beta = 0.0
mle ll = -0.5*(log(h) + (e^2)/h)
series e = y - mu
series h = var(y)
series h = omega + alpha*(e(-1))^2 + beta*h(-1)
params mu omega alpha beta
end mle
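
Upon successful completion one might, for instance, display the estimates together with their standard errors via modprint (a small illustration, following the same pattern used elsewhere in this chapter, assuming the block above has converged):

matrix cs = $coeff ~ $stderr
string pn = "mu omega alpha beta"
modprint cs pn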
26.7 Analytical derivatives
Computation of the score vector is essential for the working of the BFGS method. In all the previous
examples, no explicit formula for the computation of the score was given, so the algorithm was
fed numerically evaluated gradients. Numerical computation of the score for the i-th parameter is
performed via a finite approximation of the derivative, namely
∂ℓ(θ1, . . . , θn)
∂θi(θ1, . . . , θi+h, . . . , θn)(θ1, . . . , θih, . . . , θn)
2h
where his a small number.
In many situations, this is rather efficient and accurate. A better approximation to the true derivative
may be obtained by forcing mle to use a technique known as Richardson Extrapolation, which gives
extremely precise results, but is considerably more CPU-intensive. This feature may be turned on by
using the set command as in
set bfgs_richardson on
However, one might want to avoid the approximation and specify an exact function for the derivatives.
As an example, consider the following script:
nulldata 1000
series x1 = normal()
series x2 = normal()
series x3 = normal()
series ystar = x1 + x2 + x3 + normal()
series y = (ystar > 0)
scalar b0 = 0
scalar b1 = 0
scalar b2 = 0
scalar b3 = 0
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = b0 + b1*x1 + b2*x2 + b3*x3
series P = cnorm(ndx)
params b0 b1 b2 b3
end mle --verbose
Here, 1000 data points are artificially generated for an ordinary probit model:4 y_t is a binary variable, which takes the value 1 if y_t^* = β_1 x_{1t} + β_2 x_{2t} + β_3 x_{3t} + ε_t > 0 and 0 otherwise. Therefore, y_t = 1 with probability Φ(β_1 x_{1t} + β_2 x_{2t} + β_3 x_{3t}) = π_t. The probability function for one observation can be written as

    P(y_t) = \pi_t^{y_t} (1 - \pi_t)^{1 - y_t}

Since the observations are independent and identically distributed, the log-likelihood is simply the sum of the individual contributions. Hence

    \ell = \sum_{t=1}^{T} y_t \log(\pi_t) + (1 - y_t) \log(1 - \pi_t)
The --verbose switch at the end of the end mle statement produces a detailed account of the
iterations done by the BFGS algorithm.
In this case, numerical differentiation works rather well; nevertheless, computation of the analytical score is straightforward, since the derivative ∂ℓ/∂β_i can be written as

    \frac{\partial \ell}{\partial \beta_i} = \frac{\partial \ell}{\partial \pi_t} \cdot \frac{\partial \pi_t}{\partial \beta_i}

via the chain rule, and it is easy to see that

    \frac{\partial \ell}{\partial \pi_t} = \frac{y_t}{\pi_t} - \frac{1 - y_t}{1 - \pi_t}

    \frac{\partial \pi_t}{\partial \beta_i} = \phi(\beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}) \cdot x_{it}
The mle block in the above script can therefore be modified as follows:
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = b0 + b1*x1 + b2*x2 + b3*x3
series P = cnorm(ndx)
series m = dnorm(ndx)*(y/P - (1-y)/(1-P))
deriv b0 = m
deriv b1 = m*x1
deriv b2 = m*x2
deriv b3 = m*x3
end mle --verbose
4Again, gretl does provide a native probit command (see section 38.1), but a probit model makes for a nice example
here.
Note that the params statement has been replaced by a series of deriv statements; these have the
double function of identifying the parameters over which to optimize and providing an analytical
expression for their respective score elements.
26.8 Debugging ML scripts
We have discussed above the main sorts of statements that are permitted within an mle block, namely

• auxiliary commands to generate helper variables;
• deriv statements to specify the gradient with respect to each of the parameters; and
• a params statement to identify the parameters in case analytical derivatives are not given.
For the purpose of debugging ML estimators one additional sort of statement is allowed: you can
print the value of a relevant variable at each step of the iteration. This facility is more restricted than
the regular print command. The command word print should be followed by the name of just one
variable (a scalar, series or matrix).
In the last example above a key variable named m was generated, forming the basis for the analytical
derivatives. To track the progress of this variable one could add a print statement within the ML
block, as in
series m = dnorm(ndx)*(y/P - (1-y)/(1-P))
print m
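
In context, the modified mle block shown at the end of section 26.7 might then read as follows (reproduced in full for clarity; the print line is the only addition):

mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = b0 + b1*x1 + b2*x2 + b3*x3
series P = cnorm(ndx)
series m = dnorm(ndx)*(y/P - (1-y)/(1-P))
print m
deriv b0 = m
deriv b1 = m*x1
deriv b2 = m*x2
deriv b3 = m*x3
end mle --verbose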
26.9 Using functions
The mle command allows you to estimate models that gretl does not provide natively: in some cases,
it may be a good idea to wrap up the mle block in a user-defined function (see Chapter 14), so as to
extend gretl’s capabilities in a modular and flexible way.
As an example, we will take a simple case of a model that gretl does not yet provide natively: the
zero-inflated Poisson model, or ZIP for short.5 In this model, we assume that we observe a mixed population: for some individuals, the variable y_t is (conditionally on a vector of exogenous covariates x_t) distributed as a Poisson random variate; for some others, y_t is identically 0. The trouble is, we don't know which category a given individual belongs to.

For instance, suppose we have a sample of women, and the variable y_t represents the number of children that woman t has. There may be a certain proportion, α, of women for whom y_t = 0 with certainty (maybe out of a personal choice, or due to physical impossibility). But there may be other women for whom y_t = 0 just as a matter of chance: they haven't happened to have any children at the time of observation.
In formulae:

    P(y_t = k \mid x_t) = \alpha d_t + (1 - \alpha)\, \frac{e^{-\mu_t} \mu_t^{y_t}}{y_t!}
    \mu_t = \exp(x_t \beta)
    d_t = \begin{cases} 1 & \text{for } y_t = 0 \\ 0 & \text{for } y_t > 0 \end{cases}
Writing a mle block for this model is not difficult:
mle ll = logprob
series xb = exp(b0 + b1 * x)
series d = (y == 0)
series poiprob = exp(-xb) * xb^y / gamma(y+1)
series logprob = (alpha>0) && (alpha<1) ? \
  log(alpha*d + (1-alpha)*poiprob) : NA
params alpha b0 b1
end mle -v

5. The actual ZIP model is in fact a bit more general than the one presented here. The specialized version discussed in this section was chosen for the sake of simplicity. For further details, see Greene (2003).
However, the code above has to be modified each time we change our specification by, say, adding an
explanatory variable. Using functions, we can simplify this task considerably and eventually be able
to write something as simple as
list X = const x
zip(y, X)
Listing 26.5: Zero-inflated Poisson Model user-level function [Download ]
/*
user-level function: estimate the model and print out
the results
*/
function void zip(series y, list X)
matrix coef_stde = zip_estimate(y, X)
printf "\nZero-inflated Poisson model:\n"
string parnames = "alpha,"
string parnames += varname(X)
modprint coef_stde parnames
end function
Let's see how this can be done. First we need to define a function called zip() that will take two arguments: a dependent variable y and a list of explanatory variables X. An example of such a function can be seen in script 26.5. By inspecting the function code, you can see that the actual estimation
does not happen here: rather, the zip() function merely uses the built-in modprint command to
print out the results coming from another user-written function, namely zip_estimate().
The function zip_estimate() is not meant to be executed directly; it just contains the number-crunching part of the job, whose results are then picked up by the user-level function zip(). In turn, zip_estimate() calls other user-written functions to perform other tasks. The whole set of "internal" functions is shown in Listing 26.6.
All the functions shown in 26.5 and 26.6 can be stored in a separate inp file and executed once, at
the beginning of our job, by means of the include command. Assuming the name of this script file
is zip_est.inp, the following is an example script which (a) includes the script file, (b) generates a
simulated dataset, and (c) performs the estimation of a ZIP model on the artificial data.
set verbose off
# include the user-written functions
include zip_est.inp
# generate the artificial data
nulldata 1000
set seed 732237
scalar truep = 0.2
scalar b0 = 0.2
scalar b1 = 0.5
series x = normal()
series y = (uniform()<truep) ? 0 : randgen(p, exp(b0 + b1*x))
list X = const x
# estimate the zero-inflated Poisson model
zip(y, X)
Listing 26.6: Zero-inflated Poisson Model internal functions [Download ]
/* compute log probabilities for the plain Poisson model */
function series ln_poi_prob(series y, list X, matrix beta)
series xb = lincomb(X, beta)
return -exp(xb) + y*xb - lngamma(y+1)
end function
/* compute log probabilities for the zero-inflated Poisson model */
function series ln_zip_prob(series y, list X, matrix beta, scalar p0)
# check if the probability is in [0,1]; otherwise, return NA
if p0 > 1 || p0 < 0
series ret = NA
else
series ret = ln_poi_prob(y, X, beta) + ln(1-p0)
series ret = y==0 ? ln(p0 + exp(ret)) : ret
endif
return ret
end function
/* do the actual estimation (silently) */
function matrix zip_estimate(series y, list X)
# initialize alpha to a "sensible" value: half the frequency
# of zeros in the sample
scalar alpha = mean(y==0)/2
# initialize the coeffs (we assume the first explanatory
# variable is the constant here)
matrix coef = zeros(nelem(X), 1)
coef[1] = mean(y) / (1-alpha)
# do the actual ML estimation
mle ll = ln_zip_prob(y, X, coef, alpha)
params alpha coef
end mle --hessian --quiet
return $coeff ~ $stderr
end function
The results are as follows:
Zero-inflated Poisson model:
coefficient std. error z-stat p-value
-------------------------------------------------------
alpha 0.209738 0.0261746 8.013 1.12e-15 ***
const 0.167847 0.0449693 3.732 0.0002 ***
x 0.452390 0.0340836 13.27 3.32e-40 ***
A further step may then be creating a function package for accessing your new zip() function via
gretl’s graphical interface. For details on how to do this, see section 14.5.
26.10 Advanced use of mle: functions, analytical derivatives, algorithm choice
All the techniques described in the previous sections may be combined, and mle can be used for solving
non-standard estimation problems (provided, of course, that one chooses maximum likelihood as the
preferred inference method).
The strategy that, as of this writing, has proven most successful in designing scripts for this purpose
is:
• Modularize your code as much as possible.
• Use analytical derivatives whenever possible.
• Choose your optimization method wisely.
In the rest of this section, we will expand on the probit example of section 26.7 to give the reader an idea of what a "heavy-duty" application of mle looks like. Most of the code fragments come from mle-advanced.inp, which is one of the sample scripts supplied with the standard installation of gretl (see under File > Script files > Practice File).
BFGS with and without analytical derivatives
The example in section 26.7 can be made more general by using matrices and user-written functions.
Consider the following code fragment:
list X = const x1 x2 x3
matrix b = zeros(nelem(X),1)
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = lincomb(X, b)
series P = cnorm(ndx)
params b
end mle
In this context, the fact that the model we are estimating has four explanatory variables is totally
incidental: the code is written in such a way that we could change the content of the list X without
having to make any other modification. This was made possible by:
1. gathering the parameters to estimate into a single vector b rather than using separate scalars;
2. using the nelem() function to initialize b, so that its dimension is kept track of automatically;
3. using the lincomb() function to compute the index function.
A parallel enhancement could be achieved in the case of analytically computed derivatives: since b is now a vector, mle expects the argument to the deriv keyword to be a matrix, in which each column is the partial derivative with respect to the corresponding element of b. It is useful to re-write the score for the i-th observation as

    \frac{\partial \ell_i}{\partial \beta} = m_i x_i'    (26.8)

where m_i is the "signed Mills' ratio", that is

    m_i = y_i \frac{\phi(x_i'\beta)}{\Phi(x_i'\beta)} - (1 - y_i) \frac{\phi(x_i'\beta)}{1 - \Phi(x_i'\beta)},
which was computed in section 26.7 via
series P = cnorm(ndx)
series m = dnorm(ndx)*(y/P - (1-y)/(1-P))
Here, we will code it in a somewhat terser way as
series m = y ? invmills(-ndx) : -invmills(ndx)
and make use of the conditional assignment operator and of the specialized function invmills() for
efficiency. Building the score matrix is now easily achieved via
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = lincomb(X, b)
series P = cnorm(ndx)
series m = y ? invmills(-ndx) : -invmills(ndx)
matrix mX = {X}
deriv b = mX .* {m}
end mle
in which the {} operator was used to turn series and lists into matrices (see chapter 17). However,
proceeding in this way for more complex models than probit may imply inserting into the mle block
a long series of instructions; the example above merely happens to be short because the score matrix
for the probit model is very easy to write in matrix form.
A better solution is writing a user-level function to compute the score and using that inside the mle
block, as in
function matrix score(matrix b, series y, list X)
series ndx = lincomb(X, b)
series m = y ? invmills(-ndx) : -invmills(ndx)
return {m} .* {X}
end function
[...]
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = lincomb(X, b)
series P = cnorm(ndx)
deriv b = score(b, y, X)
end mle
In this way, no matter how complex the computation of the score is, the mle block remains nicely
compact.
Newton’s method and the analytical Hessian
As mentioned above, gretl offers the user the option of using Newton’s method for maximizing the
log-likelihood. In terms of the notation used in section 26.1, the direction for updating the initial parameter vector θ_0 is given by

    d[g(\theta_0)] = -\lambda H(\theta_0)^{-1} g(\theta_0),    (26.9)

where H(θ) is the Hessian of the total loglikelihood computed at θ and 0 < λ < 1 is a scalar called the step length.
The above expression makes a few points clear:
1. At each step, it must be possible to compute not only the score g(θ), but also its derivative
H(θ);
2. the matrix H(θ) should be nonsingular;
3. it is assumed that for some positive value of λ, ℓ(θ_1) > ℓ(θ_0); in other words, that going in the direction d[g(θ_0)] leads upwards for some step length.
The strength of Newton’s method lies in the fact that, if the loglikelihood is globally concave, then
(26.9) enjoys certain optimality properties and the number of iterations required to reach the max-
imum is often much smaller than it would be with other methods, such as BFGS. However, it may
have some disadvantages: for a start, the Hessian H(θ) may be difficult or very expensive to compute;
moreover, the loglikelihood may not be globally concave, so for some values of θ, the matrix H(θ) is
not negative definite or perhaps even singular. Those cases are handled by gretl’s implementation of
Newton’s algorithm by means of several heuristic techniques6, but a number of adverse consequences
may occur, which range from longer computation time for optimization to non-convergence of the
algorithm.
As a consequence, using Newton’s method is advisable only when the computation of the Hessian is
not too CPU-intensive and the nature of the estimator is such that it is known in advance that the
loglikelihood is globally concave. The probit model satisfies both requisites, so we will expand the
preceding example to illustrate how to use Newton’s method in gretl.
A first example may be given simply by issuing the command
set optimizer newton
before the mle block.7 This will instruct gretl to use Newton's method instead of BFGS. If the deriv
keyword is used, gretl will differentiate the score function numerically; otherwise, if the score has to be
computed itself numerically, gretl will calculate H(θ) by differentiating the loglikelihood numerically
twice. The latter solution, though, is generally to be avoided, as it may be extremely time-consuming
and may yield imprecise results.
A much better option is to calculate the Hessian analytically and have gretl use its true value rather
than a numerical approximation. In most cases, this is both much faster and numerically stable,
but of course comes at the price of having to differentiate the loglikelihood twice with respect to the
parameters and translate the resulting expressions into efficient hansl code.
Luckily, both tasks are relatively easy in the probit case: the matrix of second derivatives of ℓ_i may be written as

    \frac{\partial^2 \ell_i}{\partial\beta\,\partial\beta'} = -m_i (m_i + x_i'\beta)\, x_i x_i'

so the total Hessian is

    \sum_{i=1}^{n} \frac{\partial^2 \ell_i}{\partial\beta\,\partial\beta'} = -X' \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix} X    (26.10)

where w_i = m_i (m_i + x_i'\beta). It can be shown that w_i > 0, so the Hessian is guaranteed to be negative definite in all sensible cases and the conditions are ideal for applying Newton's method.
A hansl translation of equation (26.10) may look like
function void Hess(matrix *H, matrix b, series y, list X)
/* computes the negative Hessian for a Probit model */
series ndx = lincomb(X, b)
series m = y ? invmills(-ndx) : -invmills(ndx)
series w = m*(m+ndx)
matrix mX = {X}
H = (mX .* {w})'mX
end function
6The gist of it is that, if H is not negative definite, it is substituted by k·dg(H) + (1 − k)·H, where k is a suitable scalar; however, if you're interested in the precise details, you'll be much better off looking at the source code: the file you'll want to look at is lib/src/gretl_bfgs.c.
7To go back to BFGS, you use set optimizer bfgs.
There are two characteristics worth noting of the function above. For a start, it doesn’t return
anything: the result of the computation is simply stored in the matrix pointed at by the first argument
of the function. Second, the result is not the Hessian proper, but rather its negative. This function
becomes usable from within an mle block by the keyword hessian. The syntax is
mle ...
...
hessian funcname(&mat_addr, ...)
end mle
In other words, the hessian keyword must be followed by the call to a function whose first argument
is a matrix pointer which is supposed to be filled with the negative of the Hessian at θ.
We said above (section 26.1) that the covariance matrix of the parameter estimates is by default
estimated using the Outer Product of the Gradient (so long as the log-likelihood function returns the
per-observation contributions). However, if you supply a function that computes the Hessian then
by default it is used in estimating the covariance matrix. If you wish to impose use of OPG instead,
append the --opg option to the end of the mle block.
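
For instance, putting the pieces together for the probit case might look like the following sketch (it assumes the score() and Hess() functions shown above, along with the vector b and the list X defined earlier in this section):

set optimizer newton
matrix H = {}    # filled by Hess() during the iterations
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = lincomb(X, b)
series P = cnorm(ndx)
deriv b = score(b, y, X)
hessian Hess(&H, b, y, X)
end mle --opg    # optional: report OPG standard errors despite the analytical Hessian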
Note that gretl does not perform any numerical check on whether a user-supplied function computes
the Hessian correctly. On the one hand, this means that you can trick mle into using alternatives
to the Hessian and thereby implement other optimization methods. For example, if you substitute
in equation (26.9) the Hessian H with the negative of the OPG matrix G'G, as defined in (26.2),
you get the so-called BHHH optimization method (see Berndt et al. (1974)). Again, the sample file
mle-advanced.inp provides an example. On the other hand, you may want to perform a check of
your analytically-computed Hmatrix versus a numerical approximation.
If you have a function that computes the score, this is relatively simple to do by using the fdjac
function, briefly described in section 37.3, which computes a numerical approximation to a derivative.
In practice, you need a function computing g(θ) as a row vector and then use fdjac to differentiate
it numerically with respect to θ. The result can then be compared to your analytically-computed
Hessian. The code fragment below shows an example of how this can be done in the probit case:
function matrix totalscore(matrix *b, series y, list X)
/* computes the total score */
return sumc(score(b, y, X))
end function
function void check(matrix b, series y, list X)
/* compares the analytical Hessian to its numerical
approximation obtained via fdjac */
matrix aH
Hess(&aH, b, y, X) # stores the analytical Hessian into aH
matrix nH = fdjac(b, "totalscore(&b, y, X)")
nH = 0.5*(nH + nH') # force symmetry
printf "Numerical Hessian\n%16.6f\n", nH
printf "Analytical Hessian (negative)\n%16.6f\n", aH
printf "Check (should be zero)\n%16.6f\n", nH + aH
end function
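
A possible invocation, assuming the mle block above has just been estimated so that $coeff holds the parameter vector, would be

check($coeff, y, X)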
26.11 Estimating constrained models
In many cases, you may want to perform ML estimation of a model under some kind of constraint.
Mathematically, this amounts to maximizing the log-likelihood ℓ(θ) under some restriction. Assume that the restriction can be represented as g(θ) = 0, where the function g(·) is differentiable. On paper, the most straightforward way to accomplish this task is to set up a Lagrangean

    L(\theta) = \ell(\theta) + \lambda g(\theta)
and solve the first-order conditions that arise from differentiating the Lagrangean with respect to θ
and λ.
If an explicit solution can be found, then all is well; but in many cases the resulting system of equations
cannot be solved explicitly, so that numerical optimisation is necessary. In such cases the approach
above is not particularly useful; a different strategy is much more convenient.
The idea is to find an alternative parametrization—a means of expressing the vector θ as a (differentiable) function of a smaller set of parameters ψ. In other words, find a function h(·) such that any admissible value of θ can be written as θ = h(ψ) and g[h(ψ)] = 0 for any value of ψ. Then maximization of the log-likelihood is simply a question of operating on ℓ(ψ) = ℓ[h(ψ)] using an ordinary unconstrained numerical optimization routine.
Once the ML estimate \hat\psi is available, it is easy to recover the corresponding constrained vector \hat\theta = h(\hat\psi). Computing the covariance matrix involves an extra step, known as the delta method: the asymptotic covariance matrix of \hat\theta can be computed as

    V_{\hat\theta} = J(\hat\psi)\, V_{\hat\psi}\, J(\hat\psi)'    (26.11)

where J is the Jacobian matrix, holding the partial derivatives of h(ψ). It is recommended that the Jacobian matrix should be computed analytically whenever possible, but as a fallback strategy, numerical differentiation (available via the function fdjac—see section 37.3) is a viable alternative. Note that the matrix produced by this method will be singular by construction.
The example reported in script 26.7 is perhaps a little contrived, but useful to elucidate the technique. Suppose we wish to estimate mean and variance of an iid sample of Gaussian random variables, under the constraint that V(x_i) = σ² = exp[E(x_i)] = exp(µ). Of course the unconstrained ML estimators \hat\mu = \bar{X} and \hat\sigma^2 = n^{-1}\sum_i (x_i - \bar{X})^2 are not guaranteed to satisfy the constraints (in fact, the probability that they do is 0).

The Lagrangean in this case would be

    L(\theta) = K - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2 + \lambda(e^{\mu} - \sigma^2)

and finding an explicit solution by solving the first-order conditions is not at all easy. Fortunately, numerical optimization becomes straightforward by expressing the constrained parameters as

    \theta = [\mu, \sigma^2]' = [\psi, \exp(\psi)]' = h(\psi);

after maximizing the log-likelihood, the covariance matrix for θ can be recovered by computing the Jacobian as

    J(\psi) = \left[ \frac{\partial\mu}{\partial\psi} \;\; \frac{\partial\sigma^2}{\partial\psi} \right]' = \left[ 1 \;\; \exp(\psi) \right]'

and applying formula (26.11).
Running the example script should produce the following output:
unconstrained estimates: mean = 1.00314, variance = 2.8903
check: vhat - exp(muhat) = 0.163481
Model 1: ML, using observations 1-1000
loglik = -0.5*log(2*$pi) - 0.5*log(s2) - 0.5*(x-m)^2/s2
Standard errors based on Outer Products matrix
estimate std. error z p-value
----------------------------------------------------
psi[1] 1.03763 0.0357311 29.04 2.07e-185 ***
Log-likelihood -1949.972 Akaike criterion 3901.943
Schwarz criterion 3906.851 Hannan-Quinn 3903.808
check: vhat - exp(muhat) = 0
coefficient std. error z p-value
-------------------------------------------------------
mean 1.03763 0.0357311 29.04 2.07e-185 ***
variance 2.82251 0.100851 27.99 2.35e-172 ***
Listing 26.7: Example of ML estimation of a model under constraints [Download ]
set verbose off
set seed 7120
function matrix h(matrix psi)
ret = psi[1] | exp(psi[1])
return ret
end function
function matrix anJacob(matrix psi)
# the derivative of h
return 1 ~ exp(psi[1])
end function
nulldata 1000
# generate artificial data from a N(1, e) distribution
series x = 1 + normal() * exp(0.5)
# show that the unconstrained estimates don’t satisfy the restriction
scalar muhat = mean(x)
scalar s2hat = sst(x)/$nobs
printf "unconstrained estimates: mean = %g, variance = %g\n", muhat, s2hat
printf "check: vhat - exp(muhat) = %g\n\n", s2hat - exp(muhat)
# now estimate under the constraint exp(mean) = variance
psi = {1}
mle loglik = -0.5*log(2*$pi) - 0.5*log(s2) - 0.5*(x-m)^2/s2
matrix par = h(psi)
scalar m = par[1]
scalar s2 = par[2]
params psi
end mle
# now map psi to the constrained parametrisation
matrix par = h(psi)
# show that now the constraint holds
printf "check: vhat - exp(muhat) = %g\n\n", par[2] - exp(par[1])
# take care of the covariance matrix
matrix vpar = qform(anJacob(psi)', $vcv)
# alternatively, one could use the numerical Jacobian, as in
# matrix vpar = qform(fdjac(psi, "h(psi)"), $vcv)
# finally, print out the constrained parameters via "modprint"
matrix cs = par ~ sqrt(diag(vpar))
modprint cs "mean variance"
26.12 Handling non-convergence gracefully
If the numerical aspects of the estimation procedure are complex, it is possible that mle fails to find
the maximum within the number of iterations stipulated via the bfgs_maxiter state variable (which
defaults to 500).
In these cases, mle will exit with an error and it's up to the user to handle the situation appropriately.
For example, it is possible that mle is used inside a loop and you don’t want the loop to stop in case
convergence is not achieved. The catch command modifier (see also the Gretl Command Reference)
is an excellent tool for this purpose.
The example provided in listing 26.8 illustrates the usage of catch in an artificially simple context:
we use the mle command for estimating mean and variance of a Gaussian rv (of course you don’t
need the mle apparatus for this, but it makes for a nice example). The gist of the example is using
the set bfgs_maxiter command to force mle to abort after a very small number of iterations, so
that you can have an idea on how to use the catch modifier and the associated $error accessor to
handle the situation.
You may want to increase the maximum number of BFGS iterations in the example to check what
happens if the algorithm is allowed to converge. Note that, upon successful completion of mle, a
bundle named $model is available, containing several quantities that may be of interest, including
the total number of function evaluations.
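
Its contents can be inspected after a successful run, for example (a minimal illustration):

bundle mod = $model
print mod    # lists the quantities stored in the bundle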
Listing 26.8: Handling non-convergence via catch [Download ]
set verbose off
nulldata 200
set seed 8118
# generate simulated data from a N(3,4) variate
series x = normal(3,2)
# set starting values
scalar m = 0
scalar s2 = 1
# set iteration limit to a ridiculously low value
set bfgs_maxiter 10
# perform ML estimation; note the "catch" modifier
catch mle loglik = -0.5* (log(2*$pi) + log(s2) + e2/s2)
series e2 = (x - m)^2
params m s2
end mle --quiet
# grab the error and proceed as needed
err = $error
if err
printf "Not converged! (m = %g, s2 = %g)\n", m, s2
else
printf "Converged after %d iterations\n", $model.grcount
cs = $coeff ~ sqrt(diag($vcv))
pn = "m s2"
modprint cs pn
endif
Chapter 27
GMM estimation
27.1 Introduction and terminology
The Generalized Method of Moments (GMM) is a very powerful and general estimation method,
which encompasses practically all the parametric estimation techniques used in econometrics. It was
introduced in Hansen (1982) and Hansen and Singleton (1982); an excellent and thorough treatment
is given in chapter 17 of Davidson and MacKinnon (1993).
The basic principle on which GMM is built is rather straightforward. Suppose we wish to estimate a scalar parameter θ based on a sample x_1, x_2, \ldots, x_T. Let θ_0 indicate the "true" value of θ. Theoretical considerations (either of statistical or economic nature) may suggest that a relationship like the following holds:

    E[x_t - g(\theta)] = 0 \iff \theta = \theta_0,    (27.1)
with g(·) a continuous and invertible function. That is to say, there exists a function of the data and
the parameter, with the property that it has expectation zero if and only if it is evaluated at the true
parameter value. For example, economic models with rational expectations lead to expressions like
(27.1) quite naturally.
If the sampling model for the x_t's is such that some version of the Law of Large Numbers holds, then

    \bar{X} = \frac{1}{T}\sum_{t=1}^{T} x_t \xrightarrow{p} g(\theta_0);

hence, since g(·) is invertible, the statistic

    \hat\theta = g^{-1}(\bar{X}) \xrightarrow{p} \theta_0,

so \hat\theta is a consistent estimator of θ. A different way to obtain the same outcome is to choose, as an estimator of θ, the value that minimizes the objective function

    F(\theta) = \left[ \frac{1}{T}\sum_{t=1}^{T} \big(x_t - g(\theta)\big) \right]^2 = \left[ \bar{X} - g(\theta) \right]^2;    (27.2)

the minimum is trivially reached at \hat\theta = g^{-1}(\bar{X}), since the expression in square brackets equals 0.
The above reasoning can be generalized as follows: suppose θ is an n-vector and we have m relations like

    E[f_i(x_t, \theta)] = 0 \quad \text{for } i = 1 \ldots m,    (27.3)

where E[·] is a conditional expectation on a set of p variables z_t, called the instruments. In the above simple example, m = 1 and f(x_t, θ) = x_t − g(θ), and the only instrument used is z_t = 1. Then, it must also be true that

    E[f_i(x_t, \theta) \cdot z_{j,t}] = E[f_{i,j,t}(\theta)] = 0 \quad \text{for } i = 1 \ldots m \text{ and } j = 1 \ldots p;    (27.4)

equation (27.4) is known as an orthogonality condition, or moment condition. The GMM estimator is defined as the minimum of the quadratic form

    F(\theta, W) = \bar{f} W \bar{f}',    (27.5)

where \bar{f} is a (1 × m·p) vector holding the average of the orthogonality conditions and W is some symmetric, positive definite matrix, known as the weights matrix. A necessary condition for the minimum to exist is the order condition n ≤ m·p.
The statistic

    \hat\theta = \underset{\theta}{\mathrm{Argmin}}\; F(\theta, W)    (27.6)

is a consistent estimator of θ whatever the choice of W. However, to achieve maximum asymptotic efficiency W must be proportional to the inverse of the long-run covariance matrix of the orthogonality conditions; if W is not known, a consistent estimator will suffice.
These considerations lead to the following empirical strategy:

1. Choose a positive definite W and compute the one-step GMM estimator \hat\theta_1. Customary choices for W are I_{m\cdot p} or I_m \otimes (Z'Z)^{-1}.

2. Use \hat\theta_1 to estimate V(f_{i,j,t}(\theta)) and use its inverse as the weights matrix. The resulting estimator \hat\theta_2 is called the two-step estimator.

3. Re-estimate V(f_{i,j,t}(\theta)) by means of \hat\theta_2 and obtain \hat\theta_3; iterate until convergence. Asymptotically, these extra steps are unnecessary, since the two-step estimator is consistent and efficient; however, the iterated estimator often has better small-sample properties and should be independent of the choice of W made at step 1.
In the special case when the number of parameters n is equal to the total number of orthogonality conditions m·p, the GMM estimator \hat\theta is the same for any choice of the weights matrix W, so the first step is sufficient; in this case, the objective function is 0 at the minimum.

If, on the contrary, n < m·p, the second step (or successive iterations) is needed to achieve efficiency, and the estimator so obtained can be very different, in finite samples, from the one-step estimator. Moreover, the value of the objective function at the minimum, suitably scaled by the number of observations, yields Hansen's J statistic; this statistic can be interpreted as a test statistic that has a χ² distribution with m·p − n degrees of freedom under the null hypothesis of correct specification. See Davidson and MacKinnon (1993, section 17.6) for details.
In the following sections we will show how these ideas are implemented in gretl through some examples.
27.2 GMM as Method of Moments
We thank Alecos Papadopoulos, who kindly contributed a document on which this section is based.
A very simple illustration of GMM can be given by dropping the “G”, via an example of the time-
honored statistical technique known as the method of moments. Let’s see how to estimate the pa-
rameters of a gamma distribution, a task which we used to exemplify ML estimation in section 26.4.
Suppose that we have an i.i.d. sample of size T from a gamma distribution. The gamma density can
be parameterized in terms of the two parameters k (shape) and θ (scale), both real and positive. In
order to estimate them by the method of moments, we need two moment conditions so that we have
two equations in the two unknowns (in GMM terms, this amounts to exact identification). The two
relations we need are
    E(x_i) = k\cdot\theta \qquad V(x_i) = k\cdot\theta^2

Substituting the finite-sample counterparts of the theoretical moments we have

    \bar{X} = \hat{k}\cdot\hat\theta    (27.7)
    \hat{V} = \hat{k}\cdot\hat\theta^2    (27.8)

These two equations are easy to solve analytically, giving \hat\theta = \hat{V}/\bar{X} and \hat{k} = \bar{X}/\hat\theta (\hat{V} being the sample variance of x_i), but it's instructive to see how the gmm command will solve this system of equations numerically.
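
For reference, the analytical solution takes only a few lines of hansl (a small sketch, assuming a series x as generated in Listing 27.1; the choice of divisor for the sample variance is discussed at the end of this section):

scalar m = mean(x)
scalar V = sum((x - m)^2) / $nobs   # biased sample variance, the counterpart of (27.8)
scalar theta_mm = V / m
scalar k_mm = m / theta_mm
printf "MM estimates: k = %g, theta = %g\n", k_mm, theta_mm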
We feed gretl the necessary ingredients for GMM estimation in a command block that starts with
gmm and ends with end gmm. The following elements are compulsory within a gmm block:
1. one or more statements to calculate the left-hand side of the orthogonality conditions.
2. one or more orthog statements
3. one weights statement
4. one params statement
These elements should be given in the stated order.
The orthog statements are used to specify the orthogonality conditions. They must follow the syntax
orthog x ; Z
where x may be a series, matrix or list of series (given either by name or in extenso); Z may also be a series, matrix or list. Note the structure of the statement: it is assumed that the term to the left
of the semicolon represents a quantity that depends on the estimated parameters (and so must be
updated in the process of iterative estimation), while the term on the right is a constant function of
the data.
The weights statement is used to specify the initial weighting matrix and its syntax is straightforward.
The params statement specifies the parameters with respect to which the GMM criterion should be
minimized; it follows the same logic and rules as in the mle and nls commands.
The minimum is found numerically using BFGS (see chapters 37 and 26). The progress of the
optimization procedure can be observed by appending the --verbose switch to the end gmm line.
Equations 27.7 and 27.8 are not yet in the "moment condition" form required by the gmm command. We need to transform them and arrive at something looking like E(e_{j,i} z_{j,i}) = 0 for j = 1, 2. We therefore need two variables e_1 and e_2 with associated instruments z_1 and z_2; we can then tell gretl that \hat{E}(e_j z_j) = 0 must be satisfied (where the \hat{E}(\cdot) notation indicates sample moments).

If we define the instrument as a series of ones, and define a series e1 such that e_{1,i} = x_i - k\theta, we can rewrite the first moment condition as

    \hat{E}[e_{1,i} \cdot 1] = 0

In a similar manner we can define a series e2 such that e_{2,i} = (x_i - \bar{X})^2 - k\theta^2 (and set z_2 = z_1), so that the second moment condition is

    \hat{E}[e_{2,i} \cdot 1] = 0
Since the built-in const is just a series of 1s we could express these moment conditions in the notation
that gretl expects as
orthog e1 ; const
orthog e2 ; const
But given that the right-hand side is the same for each the conditions can be combined, using a list
on the left:
orthog e1 e2 ; const
The required weighting matrix can be set to any positive definite 2 ×2 matrix, since under exact
identification the choice doesn’t matter and its dimension is determined by the number of orthogo-
nality conditions. So we'll use the I_2 identity matrix. Example code is shown in Listing 27.1, along
with the output it produces.
In order to use the unbiased estimator of the sample variance, we have to modify the second moment
condition by substituting
series e2 = (x - m)^2 - k*theta^2
with
scalar adj = $nobs / ($nobs - 1)
series e2 = adj * (x - m)^2 - k*theta^2
The output then becomes:
Listing 27.1: MM estimation of Gamma parameters [Download ]
# create an empty data set with 200 observations
nulldata 200
# fix a random seed for replicability
set seed 1707138404
# generate a Gamma random variable x with shape k = 3 and scale theta = 2
series x = randgen(G, 3, 2)
# declare and initialize the parameter estimates
scalar k = 1
scalar theta = 1
# create the weight matrix as the identity matrix
matrix W = I(2)
# declare two series for use in the orthogonality conditions
series e1 = 0
series e2 = 0
# obtain the sample mean of x
scalar m = mean(x)
gmm
series e1 = x - k*theta
series e2 = (x - m)^2 - k*theta^2
orthog e1 e2 ; const
weights W
params k theta
end gmm
The gmm output is:
Model 1: 1-step GMM, using observations 1-200
estimate std. error z p-value
---------------------------------------------------
k 3.08539 0.412149 7.486 7.09e-14 ***
theta 1.97898 0.286796 6.900 5.19e-12 ***
GMM criterion: Q = 1.19302e-28 (TQ = 2.38604e-26)
Model 1: 1-step GMM, using observations 1-200
estimate std. error z p-value
---------------------------------------------------
k 3.06997 0.410088 7.486 7.09e-14 ***
theta 1.98892 0.288237 6.900 5.19e-12 ***
GMM criterion: Q = 1.66926e-28 (TQ = 3.33852e-26)
In this case both of the point estimates are marginally closer to the true values. But this is a small-
sample effect, not something to be expected in large samples.
27.3 OLS as GMM
We now move to an example closer to econometrics proper: the linear model y_t = x_t\beta + u_t. Most of us are used to reading it informally as the sum of a "systematic part" and a "disturbance", but a more rigorous interpretation of this familiar expression rests on the hypothesis that the conditional mean E(y_t|x_t) is linear plus the definition of u_t as y_t - E(y_t|x_t).

From the definition of u_t, it follows that E(u_t|x_t) = 0. The following orthogonality condition is therefore available:

    E[f(\beta)] = 0,    (27.9)

where f(\beta) = (y_t - x_t\beta) x_t. The definitions given in section 27.1 therefore specialize here to:

• θ is β;
• the instrument is x_t;
• f_{i,j,t}(\theta) is (y_t - x_t\beta) x_t = u_t x_t; the orthogonality condition is interpretable as the requirement that the regressors should be uncorrelated with the disturbances;
• W can be any symmetric positive definite matrix, since the number of parameters equals the number of orthogonality conditions. Let's say we choose I.
The function F(θ, W) is in this case

    F(\theta, W) = \left[ \frac{1}{T}\sum_{t=1}^{T} \hat{u}_t x_t \right]^2
and it is easy to see why OLS and GMM coincide here: the GMM objective function has the
same minimizer as the objective function of OLS, the residual sum of squares. Note, however,
that the two functions are not equal to one another: at the minimum, F(θ, W ) = 0 while the
minimized sum of squared residuals is zero only in the special case of a perfect linear fit.
The code snippet below uses gretl's gmm command to make the above operational. The series e holds the "residuals" and the series x holds the regressor. If x had been a list (or a matrix), the orthog statement would have generated one orthogonality condition for each element (or column) of x.
/* initialize stuff */
series e = 0
scalar beta = 0
matrix W = I(1)
/* proceed with estimation */
gmm
series e = y - x*beta
orthog e ; x
weights W
params beta
end gmm
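
As a quick sanity check (illustrative only, assuming series y and x exist in the current dataset), the same coefficient should be obtained by running OLS without a constant:

ols y x
printf "OLS estimate: %g, GMM estimate: %g\n", $coeff[1], beta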
27.4 TSLS as GMM
Moving closer to the proper domain of GMM, we now consider two-stage least squares (TSLS) as a
case of GMM.
TSLS is employed in the case where one wishes to estimate a linear model of the form y_t = X_t\beta + u_t, but where one or more of the variables in the matrix X are potentially endogenous—correlated with the error term, u. We proceed by identifying a set of instruments, Z_t, which are explanatory for the endogenous variables in X but which are plausibly uncorrelated with u. The classic two-stage procedure is (1) regress the endogenous elements of X on Z; then (2) estimate the equation of interest, with the endogenous elements of X replaced by their fitted values from (1).

An alternative perspective is given by GMM. We define the residual \hat{u}_t as y_t - X_t\hat\beta, as usual. But instead of relying on E(u|X) = 0 as in OLS, we base estimation on the condition E(u|Z) = 0. In this case it is natural to base the initial weighting matrix on the covariance matrix of the instruments.
Listing 27.2 presents a model from Stock and Watson’s Introduction to Econometrics. The demand
for cigarettes is modeled as a linear function of the logs of price and income; income is treated as
exogenous while price is taken to be endogenous and two measures of tax are used as instruments.
Since we have two instruments and one endogenous variable the model is over-identified.
In the GMM context, this happens when you have more orthogonality conditions than parameters to
estimate. If so, asymptotic efficiency gains can be expected by iterating the procedure once or more.
This is accomplished by specifying, after the end gmm statement, two mutually exclusive options:
--two-step or --iterate, whose meaning should be obvious. Note that when the problem is over-
identified, the weights matrix will influence the solution you get from the 1- and 2-step procedures.
In cases other than one-step estimation the specified weights matrix will be overwritten with the final weights on
completion of the gmm command. If you wish to execute more than one GMM block with a common starting-point
it is therefore necessary to reinitialize the weights matrix between runs.
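
For instance, the pattern might look like this (a sketch reusing the setup of Listing 27.2; W0 is an extra matrix introduced here just to keep a pristine copy of the starting weights):

matrix W0 = inv(Z'Z)
matrix W = W0
gmm e = lpackpc - b0 - b1*lravgprs - b2*lperinc
orthog e ; Z
weights W
params b0 b1 b2
end gmm --two-step    # on completion W holds the final weights
matrix W = W0         # restore the starting weights before the next run
gmm e = lpackpc - b0 - b1*lravgprs - b2*lperinc
orthog e ; Z
weights W
params b0 b1 b2
end gmm --iterate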
Partial output from this script is shown in 27.3. The estimated standard errors from GMM are robust
by default; if we supply the --robust option to the tsls command we get identical results.1
27.5 Covariance matrix options
The covariance matrix of the estimated parameters depends on the choice of W through

    \hat\Sigma = (J'WJ)^{-1} J'W\Omega WJ\, (J'WJ)^{-1}    (27.10)

where J is a Jacobian term

    J_{ij} = \frac{\partial \bar{f}_i}{\partial \theta_j}

and Ω is the long-run covariance matrix of the orthogonality conditions.

Gretl computes J by numeric differentiation (there is no provision for specifying a user-supplied analytical expression for J at the moment). As for Ω, a consistent estimate is needed. The simplest choice is the sample covariance matrix of the f_t's:

    \hat\Omega_0(\theta) = \frac{1}{T} \sum_{t=1}^{T} f_t(\theta) f_t(\theta)'    (27.11)
This estimator is robust with respect to heteroskedasticity, but not with respect to autocorrelation. A
heteroskedasticity- and autocorrelation-consistent (HAC) variant can be obtained using the Bartlett
kernel or similar. A univariate version of this is used in the context of the lrvar() function—see
equation (22.6). The multivariate version is set out in equation (27.12).
    \hat\Omega_k(\theta) = \frac{1}{T} \sum_{t=k}^{T-k} \left[ \sum_{i=-k}^{k} w_i f_t(\theta) f_{t-i}(\theta)' \right],    (27.12)
1The data file used in this example is available in the Stock and Watson package for gretl. See http://gretl.sourceforge.net/gretl_data.html.
Listing 27.2: TSLS via GMM [Download ]
open cig_ch10.gdt
# real avg price including sales tax
ravgprs = avgprs / cpi
# real avg cig-specific tax
rtax = tax / cpi
# real average total tax
rtaxs = taxs / cpi
# real average sales tax
rtaxso = rtaxs - rtax
# logs of consumption, price, income
lpackpc = log(packpc)
lravgprs = log(ravgprs)
perinc = income / (pop*cpi)
lperinc = log(perinc)
# restrict sample to 1995 observations
smpl --restrict year==1995
# Equation (10.16) by tsls
list xlist = const lravgprs lperinc
list zlist = const rtaxso rtax lperinc
tsls lpackpc xlist ; zlist --robust
# setup for gmm
matrix Z = { zlist }
matrix W = inv(Z'Z)
series e = 0
scalar b0 = 1
scalar b1 = 1
scalar b2 = 1
gmm e = lpackpc - b0 - b1*lravgprs - b2*lperinc
orthog e ; Z
weights W
params b0 b1 b2
end gmm
Listing 27.3: TSLS via GMM: partial output
Model 1: TSLS estimates using the 48 observations 1-48
Dependent variable: lpackpc
Instruments: rtaxso rtax
Heteroskedasticity-robust standard errors, variant HC0
VARIABLE COEFFICIENT STDERROR T STAT P-VALUE
const 9.89496 0.928758 10.654 <0.00001 ***
lravgprs -1.27742 0.241684 -5.286 <0.00001 ***
lperinc 0.280405 0.245828 1.141 0.25401
Model 2: 1-step GMM estimates using the 48 observations 1-48
e = lpackpc - b0 - b1*lravgprs - b2*lperinc
PARAMETER ESTIMATE STDERROR T STAT P-VALUE
b0 9.89496 0.928758 10.654 <0.00001 ***
b1 -1.27742 0.241684 -5.286 <0.00001 ***
b2 0.280405 0.245828 1.141 0.25401
GMM criterion = 0.0110046
Gretl computes the HAC covariance matrix by default when a GMM model is estimated on time
series data. You can control the kernel and the bandwidth (that is, the value of kin 27.12) using the
set command. See chapter 22 for further discussion of HAC estimation. You can also ask gretl not
to use the HAC version by saying
set force_hc on
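
Conversely, the kernel and bandwidth for the HAC estimator can be chosen explicitly before estimation, for example (an illustrative choice; see chapter 22 for the available settings):

set hac_kernel bartlett
set hac_lag 4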
27.6 A real example: the Consumption Based Asset Pricing Model
To illustrate gretl’s implementation of GMM, we will replicate the example given in chapter 3 of Hall
(2005). The model to estimate is a classic application of GMM, and provides an example of a case
when orthogonality conditions do not stem from statistical considerations, but rather from economic
theory.
A rational individual who must allocate his income between consumption and investment in a financial
asset must in fact choose the consumption path of his whole lifetime, since investment translates into
future consumption. It can be shown that an optimal consumption path should satisfy the following
condition:
    p\,U'(c_t) = \delta^k E\big[ r_{t+k}\, U'(c_{t+k}) \mid F_t \big],    (27.13)

where p is the asset price, U(·) is the individual's utility function, δ is the individual's subjective discount rate and r_{t+k} is the asset's rate of return between time t and time t+k. F_t is the information set at time t; equation (27.13) says that the utility "lost" at time t by purchasing the asset instead of consumption goods must be matched by a corresponding increase in the (discounted) future utility of the consumption financed by the asset's return. Since the future is uncertain, the individual considers his expectation, conditional on what is known at the time when the choice is made.
We have said nothing about the nature of the asset, so equation (27.13) should hold whatever asset
we consider; hence, it is possible to build a system of equations like (27.13) for each asset whose price
we observe.
If we are willing to believe that

• the economy as a whole can be represented as a single gigantic and immortal representative individual, and
• the function U(x) = \frac{x^{\alpha} - 1}{\alpha} is a faithful representation of the individual's preferences,

then, setting k = 1, equation (27.13) implies the following for any asset j:

    E\left[ \delta \frac{r_{j,t+1}}{p_{j,t}} \left( \frac{C_{t+1}}{C_t} \right)^{\alpha - 1} \,\Big|\, F_t \right] = 1,    (27.14)

where C_t is aggregate consumption and α and δ are the risk aversion and discount rate of the representative individual. In this case, it is easy to see that the "deep" parameters α and δ can be estimated via GMM by using

    e_t = \delta \frac{r_{j,t+1}}{p_{j,t}} \left( \frac{C_{t+1}}{C_t} \right)^{\alpha - 1} - 1

as the moment condition, while any variable known at time t may serve as an instrument.
Listing 27.4: Estimation of the Consumption Based Asset Pricing Model [Download ]
open hall.gdt
set force_hc on
scalar alpha = 0.5
scalar delta = 0.5
series e = 0
list inst = const consrat(-1) consrat(-2) ewr(-1) ewr(-2)
matrix V0 = 100000*I(nelem(inst))
matrix Z = { inst }
matrix V1 = $nobs*inv(Z'Z)
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V0
params alpha delta
end gmm
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V1
params alpha delta
end gmm
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V0
params alpha delta
end gmm --iterate
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V1
params alpha delta
end gmm --iterate
In the example code given in 27.4, we replicate selected portions of table 3.7 in Hall (2005). The
variable consrat is defined as the ratio of monthly consecutive real per capita consumption (services
and nondurables) for the US, and ewr is the return–price ratio of a fictitious asset constructed by
averaging all the stocks in the NYSE. The instrument set contains the constant and two lags of each
variable.
The command set force_hc on on the second line of the script has the sole purpose of replicating
Listing 27.5: Estimation of the Consumption Based Asset Pricing Model output
Model 1: 1-step GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1
PARAMETER ESTIMATE STDERROR T STAT P-VALUE
alpha -3.14475 6.84439 -0.459 0.64590
d 0.999215 0.0121044 82.549 <0.00001 ***
GMM criterion = 2778.08
Model 2: 1-step GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1
PARAMETER ESTIMATE STDERROR T STAT P-VALUE
alpha 0.398194 2.26359 0.176 0.86036
d 0.993180 0.00439367 226.048 <0.00001 ***
GMM criterion = 14.247
Model 3: Iterated GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1
PARAMETER ESTIMATE STDERROR T STAT P-VALUE
alpha -0.344325 2.21458 -0.155 0.87644
d 0.991566 0.00423620 234.070 <0.00001 ***
GMM criterion = 5491.78
J test: Chi-square(3) = 11.8103 (p-value 0.0081)
Model 4: Iterated GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1
PARAMETER ESTIMATE STDERROR T STAT P-VALUE
alpha -0.344315 2.21359 -0.156 0.87639
d 0.991566 0.00423469 234.153 <0.00001 ***
GMM criterion = 5491.78
J test: Chi-square(3) = 11.8103 (p-value 0.0081)
the given example: as mentioned above, it forces gretl to compute the long-run variance of the
orthogonality conditions according to equation (27.11) rather than (27.12).
We run gmm four times: one-step estimation for each of two initial weights matrices, then iterative
estimation starting from each set of initial weights. Since the number of orthogonality conditions (5)
is greater than the number of estimated parameters (2), the choice of initial weights should make a
difference, and indeed we see fairly substantial differences between the one-step estimates (Models 1
and 2). On the other hand, iteration reduces these differences almost to the vanishing point (Models
3 and 4).
Part of the output is given in Listing 27.5. It should be noted that the J test leads to a rejection of
the hypothesis of correct specification. This is perhaps not surprising given the heroic assumptions
required to move from the microeconomic principle in equation (27.13) to the aggregate system that
is actually estimated.
27.7 Caveats
A few words of warning are in order: despite its ingenuity, GMM is possibly the most fragile estimation
method in econometrics. The number of non-obvious choices one has to make when using GMM is
large, and in finite samples each of these can have dramatic consequences for the eventual output.
Some of the factors that may affect the results are:
1. Orthogonality conditions can be written in more than one way: for example, if E(x_t − µ) = 0, then E(x_t/µ − 1) = 0 holds too. It is possible that a different specification of the moment conditions leads to different results (a minimal illustration is sketched just after this list).
2. As with all other numerical optimization algorithms, weird things may happen when the objective function is nearly flat in some directions or has multiple minima. BFGS is usually quite good, but there is no guarantee that it always delivers a sensible solution, if one at all.
3. The 1-step and, to a lesser extent, the 2-step estimators may be sensitive to apparently trivial
details, like the re-scaling of the instruments. Different choices for the initial weights matrix
can also have noticeable consequences.
4. With time-series data, there is no hard rule on the appropriate number of lags to use when
computing the long-run covariance matrix (see section 27.5). Our advice is to go by trial and
error, since results may be greatly influenced by a poor choice.
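By way of illustration of point 1, here is a minimal sketch using artificially generated data and hypothetical names (x, d, mu, W): the same mean parameter is estimated from two algebraically equivalent moment conditions. With an over-identified instrument set the two specifications need not produce numerically identical one-step estimates, although they should be close.

nulldata 200
set seed 20240601
series x = 5 + normal()
series d = uniform()   # an extra instrument, independent of x by construction
list inst = const d
matrix W = I(nelem(inst))
scalar mu = 1
series e = 0
# condition (a): E(x - mu) = 0
gmm e = x - mu
  orthog e ; inst
  weights W
  params mu
end gmm
# condition (b): E(x/mu - 1) = 0
mu = 1
gmm e = x/mu - 1
  orthog e ; inst
  weights W
  params mu
end gmm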
One of the consequences of this state of things is that replicating well-known published studies may
be extremely difficult. Any non-trivial result is virtually impossible to reproduce unless all details of
the estimation procedure are carefully recorded.
Chapter 28
Model selection criteria
28.1 Introduction
In some contexts the econometrician chooses between alternative models based on a formal hypothesis
test. For example, one might choose a more general model over a more restricted one if the restriction
in question can be formulated as a testable null hypothesis, and the null is rejected on an appropriate
test.
In other contexts one sometimes seeks a criterion for model selection that somehow measures the
balance between goodness of fit or likelihood, on the one hand, and parsimony on the other. The
balancing is necessary because the addition of extra variables to a model cannot reduce the degree
of fit or likelihood, and is very likely to increase it somewhat even if the additional variables are not
truly relevant to the data-generating process.
The best known such criterion, for linear models estimated via least squares, is the adjusted R²,

  R̄² = 1 − [SSR/(n − k)] / [TSS/(n − 1)]

where n is the number of observations in the sample, k denotes the number of parameters estimated, and SSR and TSS denote the sum of squared residuals and the total sum of squares for the dependent variable, respectively. Compared to the ordinary coefficient of determination or unadjusted R²,

  R² = 1 − SSR/TSS

the “adjusted” calculation penalizes the inclusion of additional parameters, other things equal.
28.2 Information criteria
A more general criterion in a similar spirit is Akaike’s (1974) “Information Criterion” (AIC). The original formulation of this measure is

  AIC = −2ℓ(θ̂) + 2k        (28.1)

where ℓ(θ̂) represents the maximum loglikelihood as a function of the vector of parameter estimates, θ̂, and k (as above) denotes the number of “independently adjusted parameters within the model.” In this formulation, with AIC negatively related to the likelihood and positively related to the number of parameters, the researcher seeks the minimum AIC.
The AIC can be confusing, in that several variants of the calculation are “in circulation.” For example, Davidson and MacKinnon (2004) present a simplified version,

  AIC = ℓ(θ̂) − k

which is just the original multiplied by −1/2: in this case, obviously, one wants to maximize AIC.
In the case of models estimated by least squares, the loglikelihood can be written as

  ℓ(θ̂) = −(n/2)(1 + log 2π − log n) − (n/2) log SSR        (28.2)

Substituting (28.2) into (28.1) we get

  AIC = n(1 + log 2π − log n) + n log SSR + 2k
which can also be written as

  AIC = n log(SSR/n) + 2k + n(1 + log 2π)        (28.3)

Some authors simplify the formula for the case of models estimated via least squares. For instance, William Greene writes

  AIC = log(SSR/n) + 2k/n        (28.4)

This variant can be derived from (28.3) by dividing through by n and subtracting the constant 1 + log 2π. That is, writing AIC_G for the version given by Greene, we have

  AIC_G = (1/n) AIC − (1 + log 2π)

Finally, Ramanathan gives a further variant:

  AIC_R = (SSR/n) e^{2k/n}

which is the exponential of the one given by Greene.
Gretl began by using the Ramanathan variant, but since version 1.3.1 the program has used the
original Akaike formula (28.1), and more specifically (28.3) for models estimated via least squares.
Although the Akaike criterion is designed to favor parsimony, arguably it does not go far enough in that direction. For instance, if we have two nested models with k − 1 and k parameters respectively, and if the null hypothesis that parameter k equals 0 is true, in large samples the AIC will nonetheless tend to select the less parsimonious model about 16 percent of the time (see Davidson and MacKinnon, 2004, chapter 15).
An alternative to the AIC which avoids this problem is the Schwarz (1978) “Bayesian information criterion” (BIC). The BIC can be written (in line with Akaike’s formulation of the AIC) as

  BIC = −2ℓ(θ̂) + k log n

The multiplication of k by log n in the BIC means that the penalty for adding extra parameters grows with the sample size. This ensures that, asymptotically, one will not select a larger model over a correctly specified parsimonious model.
A further alternative to AIC, which again tends to select more parsimonious models than AIC, is the Hannan–Quinn criterion or HQC (Hannan and Quinn, 1979). Written consistently with the formulations above, this is

  HQC = −2ℓ(θ̂) + 2k log log n

The Hannan–Quinn calculation is based on the law of the iterated logarithm (note that the last term is the log of the log of the sample size). The authors argue that their procedure provides a “strongly consistent estimation procedure for the order of an autoregression”, and that “compared to other strongly consistent procedures this procedure will underestimate the order to a lesser degree.”
Gretl reports the AIC, BIC and HQC (calculated as explained above) for most sorts of models. The
key point in interpreting these values is to know whether they are calculated such that smaller values
are better, or such that larger values are better. In gretl, smaller values are better: one wants to
minimize the chosen criterion.
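By way of illustration, the following sketch compares two nested OLS specifications on this basis, using the denmark.gdt sample file referenced elsewhere in this Guide and the $aic, $bic and $hqc accessors; the particular choice of variables is purely illustrative.

open denmark.gdt
ols LRM const LRY --quiet
printf "Model A: AIC = %g  BIC = %g  HQC = %g\n", $aic, $bic, $hqc
ols LRM const LRY IBO IDE --quiet
printf "Model B: AIC = %g  BIC = %g  HQC = %g\n", $aic, $bic, $hqc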
Chapter 29
Degrees of freedom correction
29.1 Introduction
This chapter gives a brief account of the issue of correction for degrees of freedom in the context of
econometric modeling, leading up to a discussion of the policies adopted in gretl in this regard. We
also explain how to supplement the results produced automatically by gretl if you want to apply such
a correction where gretl does not, or vice versa.
The first few sections are quite basic; experts are invited to skip to section 29.5.
29.2 Back to basics
It’s well known that given a sample, {x_i}, of size n from a normally distributed population, the Maximum Likelihood (ML) estimator of the population variance, σ², is

  σ̂² = (1/n) Σ_{i=1}^{n} (x_i − x̄)²        (29.1)

where x̄ is the sample mean, n^{−1} Σ_{i=1}^{n} x_i. It’s also well known that σ̂², while it is a consistent estimator, is biased, and it is commonly replaced by the “sample variance”, namely,

  s² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²        (29.2)

The intuition behind the bias in (29.1) is straightforward. First, the quantity we seek to estimate is defined as

  σ² = E[(x_i − µ)²]

where µ = E(x). It is clear that if µ were observable, a perfectly good estimator would be

  σ̃² = (1/n) Σ_{i=1}^{n} (x_i − µ)².
But this is not a practical option: µ is generally unobservable. We therefore substitute x̄ for the unknown µ. It is easily shown that x̄ is the least-squares estimator of µ, and also (assuming normality) the ML estimator. It is unbiased, but is of course subject to sampling error; in any given sample it is highly unlikely that x̄ = µ. Given that x̄ is the least-squares estimator, the sum of squared deviations of the x_i from any value other than x̄ must be greater than the summation in (29.1). But since µ is almost certainly not equal to x̄, the sum of squared deviations of the x_i from µ will surely be greater than the sum of squared deviations in (29.1). It follows that the expected value of σ̂² falls short of the population variance.

The proof that s² is indeed the unbiased estimator can be found in any good statistics textbook, where we also learn that the magnitude n − 1 in (29.2) can be brought under a general description as the “degrees of freedom” of the calculation at hand. (Given x̄, the n sample values provide only n − 1 items of information since the nth value can always be deduced via the formula for x̄.)
29.3 Application to OLS regression
The argument above carries over into the usual calculation of standard errors in the context of OLS regression as applied to the linear model y = Xβ + u. If the disturbances, u, are assumed to be independently and identically distributed (IID), then the variance of the OLS estimator, β̂, is given by

  Var(β̂) = σ² (X′X)^{−1}

where σ² is the variance of the error term and X is an n × k matrix of regressors. But how should the unknown σ² be estimated? The ML estimator is

  σ̂² = (1/n) Σ_{i=1}^{n} û_i²        (29.3)

where the û_i² are squared residuals, û_i = y_i − X_i β̂. But this estimator is biased and we typically use the unbiased counterpart

  s² = (1/(n − k)) Σ_{i=1}^{n} û_i²        (29.4)

in which n − k is the number of degrees of freedom given n residuals from a regression where k parameters are estimated.

The standard estimator of the variance of β̂ in the context of OLS is then V = s² (X′X)^{−1}. And the standard errors of the individual parameter estimates, s_{β̂_i}, being the square roots of the diagonal elements of V, inherit a degrees of freedom correction from the estimator s².
Going one step further, consider hypothesis testing in the context of OLS. Since the variance of β̂ is unknown and must itself be estimated, the sampling distribution of the OLS coefficients is not, strictly speaking, normal. But if the disturbances are normally distributed (besides being IID) then, even in small samples, the parameter estimates will follow a distribution that can be specified exactly, namely the Student t distribution with degrees of freedom equal to the value given above, ν = n − k. That is, besides using a df correction in computing the standard errors of the OLS coefficients, one uses the same ν in selecting the particular distribution to which the “t-ratio”, (β̂_i − β_0)/s_{β̂_i}, should be referred in order to determine the marginal significance level or p-value for the null hypothesis that β_i = β_0. This is the payoff to df correction: we get test statistics that follow a known distribution in small samples. (In big enough samples the point is moot, since the quantitative distinction between σ̂² and s² vanishes.)
So far, so good. Everyone expects df correction in plain OLS standard errors just as we expect division by n − 1 in the sample variance. And users of econometric software expect that the p-values reported for OLS coefficients will be based on the t(ν) distribution—although they are not always sufficiently aware that the validity of such statistics in small samples depends on the assumption of normally distributed errors.
29.4 Beyond OLS
The situation is different when we move beyond estimation of the classical linear model via OLS. We
may wish to estimate nonlinear models (sometimes by least squares), and many models of interest
to econometricians are commonly estimated via maximization of a likelihood function, or via the
generalized method of moments (GMM).
In such cases we do not, in general, have exact small-sample results to rely upon; in particular, we cannot assume that coefficient estimates follow the t distribution. Rather, we typically appeal to asymptotic results in statistical theory. We seek consistent estimators which, although they may be biased, nonetheless converge in probability to the corresponding parameter values as the sample size goes to infinity. Under the right conditions, laws of large numbers and central limit theorems entitle us to expect that test statistics will converge to the normal distribution, or the χ² distribution for multivariate tests, given big enough samples.
To “correct” or not?
The question arises, should we or should we not apply a df “correction” in reporting variance estimates and standard errors for models that depart from the classical linear specification?
The argument against applying df adjustment is that it lacks a theoretical basis: it does not produce
test statistics that follow any known distribution in small samples. In addition, if parameter estimates
are obtained via ML, it makes sense to report ML estimates of variances even if these are biased, since
it is the ML quantities that are used in computing the criterion function and in forming likelihood-ratio
tests.
On the other hand, pragmatic arguments for doing df adjustment are (a) that it makes for closer
comparability between regular OLS estimates and nonlinear ones, and (b) that it provides a “pinch of
salt” in relation to small-sample results—that is, it inflates standard errors, confidence intervals and
p-values somewhat—even if it lacks rigorous justification.
Note that even for fairly small samples, the difference between the biased and unbiased estimators in equations (29.1) and (29.2) above will be small. For example, if n = 30 then s² = (30/29) σ̂². In econometric modelling proper, however, the difference can be quite substantial. If n = 50 and k = 10, the s² defined in (29.4) will be 50/40 = 1.25 times as large as the σ̂² in (29.3), and standard errors will be about 12 percent larger (since √1.25 ≈ 1.118).¹ One can make a case for inflating the standard errors obtained via nonlinear estimators as a precaution against taking results to be “more precise than they really are”.
In rejoinder to the last point, one might equally say that savvy econometricians should know to apply a discount factor (albeit an imprecise one) to small-sample estimates outside of the classical, normal linear model—or even that they should distrust such results and insist on large samples before making inferences. This line of thinking suggests that test statistics such as z = β̂_i/σ̂_{β̂_i} should be referred to the distribution to which they conform asymptotically—in this case N(0,1) for H₀: β_i = 0—if and only if the conditions for appealing to asymptotic results can be considered as met. From this point of view df adjustment may be seen as providing a false sense of security.
29.5 Consistency and awkward cases
Consistency (in the ordinary sense of uniformity of treatment) is a bugbear when dealing with this
issue. To give a simple example, suppose an econometrics program follows the policy of applying
df correction for OLS estimation but not for ML estimation. One is, of course, free to estimate the
classical, normal linear model via ML, in which case β̂ should be numerically identical to that obtained
via OLS. But the user of the software will obtain two different sets of standard errors depending on
the estimation method. Admittedly, this example is not very troublesome; presumably one would
apply ML to the classical linear model only to make a pedagogical point.
Here is a more awkward case. An unrestricted vector autoregression (VAR) is a system of equations,
but the ML estimate of this system, given normal errors, is equivalent to equation-by-equation OLS.
Should df correction be applied to VARs? Consistency with OLS argues Yes. However, a popular
extension of the VAR methodology is the vector error-correction model (VECM). VECMs are closely
related to VARs and one might well be interested in making comparisons across the two, but a VECM
is a nonlinear system and the cointegrating vectors that lie at the heart of this model must be estimated
via Maximum Likelihood. So perhaps VAR results should not be df adjusted, for comparability with
VECMs.
Another “grey area” is the class of Feasible Generalized Least Squares (FGLS) estimators—for exam-
ple, weighted least squares following the estimation of a skedastic function, or estimators designed
to handle first-order autocorrelation, such as Cochrane–Orcutt. These depart from the classical lin-
ear model, and the theoretical basis for inference in such models is asymptotic, yet according to
econometric tradition standard errors are generally df adjusted.
Yet another awkward case: “robust” (heteroskedasticity- and/or autocorrelation-consistent) standard
errors in the context of OLS. Such estimators are justified by asymptotic arguments and in general
we cannot determine their small-sample distributions. That would argue for referring the associated
test statistics to the normal distribution. But comparability with classical standard errors pulls in
the other direction. Suppose in a particular case a robust estimator produces a standard error that is
numerically indistinguishable from the classical one: if the former is referred to the normal distribution
and the latter to the tdistribution, switching to robust standard errors will give a smaller p-value for
the coefficient in question, making it appear “more significant,” and arguably this is misleading.
1 A fairly typical situation in time-series macroeconometrics would be to have between 100 and 200 quarterly observations, and to be estimating up to maybe 30 parameters including lags. In this case df correction would make a difference to standard errors on the order of 10 percent.
29.6 What gretl does
First of all, the third column in gretl model output—following “coefficient” and “std. error”—is labeled either “t-ratio” or “z.” This is your signal: “t-ratio” indicates that the estimated standard error employs a degrees of freedom adjustment and the reported p-value is obtained from the Student t distribution, while “z” indicates that no such adjustment is applied and the p-value comes from the standard normal distribution.
If you see that gretl is applying a df adjustment but you don’t want this, the first point to check is
whether you can switch to the asymptotic variant by using an option flag or other command.
The ols and tsls commands support a --no-df-corr option to suppress degrees of freedom
adjustment. In the case of Two-Stage Least Squares it’s certainly arguable that df correction
should not be performed by default, however gretl does this, largely for comparability with other
software (for example Stata’s ivreg command). But you can override the default if you wish.
The estimate command, for systems of equations, also supports the --no-df-corr option
when the specified estimation method is OLS or TSLS. (For other estimators supported by
gretl’s system command no df adjustment is applied by default.)
By default gretl uses the tdistribution for statistics based on robust standard errors under OLS.
However, users can specify that p-values be calculated using the standard normal distribution
whenever the --robust option is passed to an estimation command, by means of the following
“set” command
set robust_z on
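For instance, a minimal sketch (with hypothetical series names y and x):

set robust_z on
ols y const x --robust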
If these possibilities do not apply, it is fairly straightforward to “purge” regression results of df correc-
tion, as illustrated in the following script fragment. We assume that a model has just been estimated,
so that the model-related accessors ($stderr, $coeff and so on) are available.
matrix se = $stderr * sqrt($df/$T)
matrix zscore = $coeff ./ se
matrix pv = 2 * pvalue(z, abs(zscore))
matrix M = $coeff ~ se ~ zscore ~ pv
cnameset(M, "coeff stderr z p-value")
print M
This will print the original coefficient estimates along with asymptotic standard errors and the as-
sociated z-scores and (two-sided) normal p-values. The converse case is left as an exercise for the
reader.
VARs
As mentioned above, Vector Autoregressions constitute a particularly awkward case, with considera-
tions of consistency of treatment pulling in two opposite directions. For that reason gretl has adopted
an “agnostic” policy in relation to such systems. We do not offer a $vcv accessor, but instead accessors named $xtxinv (the matrix (X′X)^{−1} for the system as a whole) and $sigma (an estimate of the cross-equation variance–covariance matrix, Σ). It’s then up to the user to build an estimate of the variance matrix of the parameter estimates—call it V—should that be required.
Note that $sigma gives the Maximum Likelihood Estimator (without a degrees of freedom adjustment)
so if you do
matrix Vml = $sigma ** $xtxinv
(where ** represents Kronecker product) you obtain the MLE of the variance matrix of the param-
eter estimates. But if you want the unbiased estimator you can do
matrix S = $sigma * $T/($T-$ncoeff)
matrix Vu = S ** $xtxinv
to employ a suitably inflated variant of the Σ estimate. (For VARs, and also VECMs, $ncoeff gives
the number of coefficients per equation.)
The second variant above is such that the vector of standard errors produced by
matrix SE = sqrt(diag(Vu))
agrees with the standard errors printed as part of the per-equation VAR output.
A fuller example of usage of the $xtxinv accessor is given in Listing 29.1: this shows how one can
replicate the F-tests for Granger causality that are displayed by default by the var command, with
the refinement that, depending on the setting of the USE_F flag, these tests can be done using a small
sample correction as in gretl’s output or in asymptotic (χ²) form.
Listing 29.1: Computing statistics to test for Granger causality [Download ]
open denmark.gdt
list LST = LRM LRY IBO IDE
scalar p = 2       # lags in VAR
scalar USE_F = 1   # small sample correction?
var p LST --quiet
k = nelem(LST)
matrix theta = vec($coeff)
matrix V = $sigma ** $xtxinv
if USE_F
    scalar df = $T - $ncoeff
    V *= $T/df
endif
matrix GC = zeros(k, k)
cnameset(GC, LST)
rnameset(GC, LST)
matrix idx = seq(1,p) + 1
loop i = 1..k
    loop j = 1..k
        GC[i,j] = qform(theta[idx]', invpd(V[idx,idx]))
        idx += (j==k)? p+1 : p
    endloop
endloop
if USE_F
    GC /= p
    matrix pvals = pvalue(F, p, df, GC)
else
    matrix pvals = pvalue(X, p, GC)
endif
cnameset(pvals, LST)
rnameset(pvals, LST)
print GC pvals
Vector Error Correction Models are more complex than VARs in this respect, since we employ Jo-
hansen’s variance estimator for the β terms. This means for example that the $xtxinv accessor
treats each estimated error correction (EC) term as one regressor on its own, such that the sampling
uncertainty of the loading coefficients is thereby addressed (after Kronecker-multiplying with $sigma
as before) . The “internals”of the EC terms are of course made up of the integrated (levels) variables,
and the special $jvbeta accessor is responsible for the variance of the cointegration coefficients, where
degrees-of-freedom corrections are not available.
But as soon as the loading coefficients attached to the EC terms are restricted, there is no common
set of regressors with freely varying coefficients in the VECM system anymore, and therefore in these
cases the formulas above are misleading. The $xtxinv accessor can still be retrieved (because it does
not involve the coefficients), but in the restricted α case it should no longer be used as shown above.
The notion of system degrees of freedom then also becomes fuzzier since the number of regressors can
vary across equations.
Chapter 30
Time series filters
In addition to the usual application of lags and differences, gretl provides fractional differencing and
various filters commonly used in macroeconomics for trend-cycle decomposition: notably the Hodrick–
Prescott filter (Hodrick and Prescott, 1997), the Baxter–King bandpass filter (Baxter and King, 1999)
and the Butterworth filter (Butterworth,1930).
30.1 Fractional differencing
The concept of differencing a time series d times is pretty obvious when d is an integer; it may seem odd when d is fractional. However, this idea has a well-defined mathematical content: consider the function

  f(z) = (1 − z)^{−d},

where z and d are real numbers. By taking a Taylor series expansion around z = 0, we see that

  f(z) = 1 + dz + [d(d + 1)/2] z² + ···

or, more compactly,

  f(z) = 1 + Σ_{i=1}^{∞} ψ_i z^i

with

  ψ_k = [∏_{i=1}^{k} (d + i − 1)] / k! = ψ_{k−1} (d + k − 1)/k

The same expansion can be used with the lag operator, so that if we defined

  Y_t = (1 − L)^{0.5} X_t

this could be considered shorthand for

  Y_t = X_t − 0.5 X_{t−1} − 0.125 X_{t−2} − 0.0625 X_{t−3} − ···
In gretl this transformation can be accomplished by the syntax
Y = fracdiff(X, 0.5)
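As a quick check (with a hypothetical series X), the first few terms of the expansion above can be compared with the function’s output; the agreement is only approximate, since fracdiff uses all available lags rather than truncating after three terms:

series Yapprox = X - 0.5*X(-1) - 0.125*X(-2) - 0.0625*X(-3)
series Yexact = fracdiff(X, 0.5)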
30.2 The Hodrick–Prescott filter
This filter is accessed using the hpfilt() function, which takes as its first argument the name of the
variable to be processed. (Further optional arguments are explained below.)
A time series y_t may be decomposed into a trend or growth component g_t and a cyclical component c_t:

  y_t = g_t + c_t,   t = 1, 2, . . . , T
The Hodrick–Prescott filter effects such a decomposition by minimizing the following:

  Σ_{t=1}^{T} (y_t − g_t)² + λ Σ_{t=2}^{T−1} [(g_{t+1} − g_t) − (g_t − g_{t−1})]².

The first term above is the sum of squared cyclical components c_t = y_t − g_t. The second term is a multiple λ of the sum of squares of the trend component’s second differences. This second term
penalizes variations in the growth rate of the trend component: the larger the value of λ, the higher
is the penalty and hence the smoother the trend series.
Note that the hpfilt function in gretl produces the cyclical component, c_t, of the original series. If
you want the smoothed trend you can subtract the cycle from the original:
ct = hpfilt(yt)
gt = yt - ct
Hodrick and Prescott (1997) suggest that a value of λ = 1600 is reasonable for quarterly data. The
default value in gretl is 100 times the square of the data frequency (which, of course, yields 1600 for
quarterly data). The value can be adjusted using an optional second argument to hpfilt(), as in
ct = hpfilt(yt, 1300)
As of version 2018a, the hpfilt() function accepts a third, optional Boolean argument. If set to
non-zero, what is performed is the so-called one-sided version of the filter. See Section 36.12 for
further details.
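For instance, with quarterly data one might compute both variants as follows (the series name y being hypothetical):

# two-sided (default) and one-sided versions of the HP filter
series c2s = hpfilt(y, 1600)
series c1s = hpfilt(y, 1600, 1)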
30.3 The Baxter and King filter
This filter is accessed using the bkfilt() function, which again takes the name of the variable to be
processed as its first argument. The operation of the filter can be controlled via three further optional
arguments.
Consider the spectral representation of a time series y_t:

  y_t = ∫_{−π}^{π} e^{iωt} dZ(ω)

To extract the component of y_t that lies between the frequencies ω̲ and ω̄ one could apply a bandpass filter:

  c*_t = ∫_{−π}^{π} F*(ω) e^{iωt} dZ(ω)

where F*(ω) = 1 for ω̲ < |ω| < ω̄ and 0 elsewhere. This would imply, in the time domain, applying to the series a filter with an infinite number of coefficients, which is undesirable. The Baxter and King bandpass filter applies to y_t a finite polynomial in the lag operator A(L):

  c_t = A(L) y_t

where A(L) is defined as

  A(L) = Σ_{i=−k}^{k} a_i L^i

The coefficients a_i are chosen such that F(ω) = A(e^{iω}) A(e^{−iω}) is the best approximation to F*(ω) for a given k. Clearly, the higher k the better the approximation is, but since 2k observations have to be discarded, a compromise is usually sought. Moreover, the filter also has other appealing theoretical properties, among which the property that A(1) = 0, so a series with a single unit root is made stationary by application of the filter.
In practice, the filter is normally used with monthly or quarterly data to extract the “business cycle” component, namely the component between 6 and 36 quarters. Usual choices for k are 8 or 12 (maybe higher for monthly series). The default values for the frequency bounds are 8 and 32, and the default value for the approximation order, k, is 8. You can adjust these values using the full form of bkfilt(), which is

  bkfilt(seriesname, f1, f2, k)

where f1 and f2 represent the lower and upper frequency bounds respectively.
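For example, the following call (on a hypothetical quarterly series y) simply makes the default settings explicit:

# business-cycle component: periods between 8 and 32 quarters, with k = 8
series cycle = bkfilt(y, 8, 32, 8)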
30.4 The Butterworth filter
The Butterworth filter (Butterworth,1930) is an approximation to an “ideal” square-wave filter. The
ideal filter divides the spectrum of a time series into a pass-band (frequencies less than some chosen
ωfor a low-pass filter, or frequencies greater than ωfor high-pass) and a stop-band; the gain is
1 for the pass-band and 0 for the stop-band. The ideal filter is unattainable in practice since it
would require an infinite number of coefficients, but the Butterworth filter offers a remarkably good
approximation. This filter is derived and persuasively advocated by Pollock (2000).
For data y, the filtered sequence x is given by

  x = y − λ Σ Q (M + λ Q′ΣQ)^{−1} Q′ y        (30.1)

where

  Σ = {2I_T − (L_T + L_T^{−1})}^{T−2}  and  M = {2I_T + (L_T + L_T^{−1})}^{T}

I_T denotes the identity matrix of order T; L_T = [e_1, e_2, . . . , e_{T−1}, 0] is the finite-sample matrix version of the lag operator; and Q is defined such that pre-multiplication of a T-vector of data by Q′ of order (T − 2) × T produces the second differences of the data. The matrix product

  Q′ΣQ = {2I_T − (L_T + L_T^{−1})}^{T}

is a Toeplitz matrix.
The behavior of the Butterworth filter is governed by two parameters: the frequency cutoff ω and an integer order, n, which determines the number of coefficients used. The λ that appears in (30.1) is tan(ω/2)^{−2n}. Higher values of n produce a better approximation to the ideal filter in principle (i.e. a sharper cut between the pass-band and the stop-band) but there is a downside: with a greater number of coefficients numerical instability may be an issue, and the influence of the initial values in the sample may be exaggerated.
In gretl the Butterworth filter is implemented by the bwfilt() function,¹ which takes three arguments: the series to filter, the order n and the frequency cutoff, ω, expressed in degrees. The cutoff value must be greater than 0 and less than 180. This function operates as a low-pass filter; for the high-pass
variant, subtract the filtered series from the original, as in
series bwcycle = y - bwfilt(y, 8, 67)
Pollock recommends that the parameters of the Butterworth filter be tuned to the data: one should examine the periodogram of the series in question (possibly after removal of a polynomial trend) in search of a “dead spot” of low power between the frequencies one wishes to exclude and the frequencies one wishes to retain. If ω is placed in such a dead spot then the job of separation can be done with a relatively small n, hence avoiding numerical problems. By way of illustration, consider the periodogram for quarterly observations on new car sales in the US,² 1975:1 to 1990:4 (the upper panel in Figure 30.1).

A seasonal pattern is clearly visible in the periodogram, centered at an angle of 90° or 4 periods. If we set ω = 68° (or thereabouts) we should be able to excise the seasonality quite cleanly using n = 8. The result is shown in the lower panel of the Figure, along with the frequency response or gain plot for the chosen filter. Note the smooth and reasonably steep drop-off in gain centered on the nominal cutoff of 68° ≈ 3π/8.
The apparatus that supports this sort of analysis in the gretl GUI can be found under the Variable
menu in the main window: the items Periodogram and Filter. In the periodogram dialog box you have
the option of expressing the frequency axis in degrees, which is helpful when selecting a Butterworth
filter; and in the Butterworth filter dialog you have the option of plotting the frequency response as
well as the smoothed series and/or the residual or cycle.
1The code for this filter is based on D. S. G. Pollock’s programs IDEOLOG and DETREND. The Pascal source code
for the former is available from http://www.le.ac.uk/users/dsgp1 and the C sources for the latter were kindly made
available to us by the author.
2This is the variable QNC from the Ramanathan data file data9-7.
Figure 30.1: The Butterworth filter applied. Upper panel: periodogram of QNC, with the frequency axis marked in degrees and the corresponding periods. Lower panel: QNC (original data) and QNC (smoothed), 1976–1990, together with the filter’s gain plotted over the range 0 to π.
30.5 The discrete Fourier transform
The Fourier transform is not itself a time-series filter, but by providing the bridge between the time
and the frequency domain it is a fundamental building block of many filter internals and deserves
some detailed comments.
The discrete Fourier transform can be best thought of as a linear, invertible transform of a complex vector. Hence, if x is an n-dimensional vector whose k-th element is x_k = a_k + i b_k, then the output of the discrete Fourier transform is a vector f = F(x) whose k-th element is

  f_k = Σ_{j=0}^{n−1} e^{−ω(j,k)} x_j

where ω(j, k) = 2πi · jk/n. Since the transformation is invertible, the vector x can be recovered from f via the so-called inverse transform

  x_k = (1/n) Σ_{j=0}^{n−1} e^{ω(j,k)} f_j.
The Fourier transform is used in many diverse situations on account of this key property: the convolution of two vectors can be performed efficiently by multiplying the elements of their Fourier transforms and inverting the result. If

  z_k = Σ_{j=1}^{n} x_j y_{k−j},

then

  F(z) = F(x) ⊙ F(y).

That is, F(z)_k = F(x)_k F(y)_k.
For computing the Fourier transform, gretl uses the external library fftw3: see Frigo and Johnson
(2005). This guarantees extreme speed and accuracy. In fact, the CPU time needed to perform the
transform is O(n log n) for any n. This is why the array of numerical techniques employed in fftw3
is commonly known as the Fast Fourier Transform.
Gretl provides two matrix functions for performing the Fourier transform and its inverse: fft and
ffti. For example:
matrix x1 = {1 ; 2 ; 3}
# perform the transform
matrix f = fft(x1)
# perform the inverse transform
matrix x2 = ffti(f)
yields

  x1 =  1        f =   6       0        x2 =  1
        2             -1.5     0.866          2
        3             -1.5    -0.866          3
Should it be necessary to compute the Fourier transform on several vectors with the same number of
elements, it is numerically more efficient to group them into a matrix rather than invoking fft for
each vector separately.
As an example, consider the multiplication of two polynomials:

  a(x) = 1 + 0.5x
  b(x) = 1 + 0.3x − 0.8x²
  c(x) = a(x)·b(x) = 1 + 0.8x − 0.65x² − 0.4x³

The coefficients of the polynomial c(x) are the convolution of the coefficients of a(x) and b(x); the following gretl code fragment illustrates how to compute the coefficients of c(x):
# define the two polynomials
a = { 1, 0.5, 0, 0 }'
b = { 1, 0.3, -0.8, 0 }'
# perform the transforms
fa = fft(a)
fb = fft(b)
# multiply the two transforms element by element
fc = fa .* fb
# compute the coefficients of c via the inverse transform
c = ffti(fc)
Maximum efficiency would have been achieved by grouping a and b into a matrix. The computational advantage is so small in this case that the exercise is a bit silly, but the following alternative may be preferable for a large number of rows/columns:
# define the two polynomials
a = {1; 0.5; 0; 0}
b = {1; 0.3; -0.8; 0}
# perform the transforms jointly
f = fft(a ~ b)
# complex-multiply the two transforms
fc = f[,1] .* f[,2]
# compute the coefficients of c via the inverse transform
c = ffti(fc)
Traditionally, the Fourier transform in econometrics has been mostly used in time-series analysis, the
periodogram being the best known example. Listing 30.1 shows how to compute the periodogram of
a time series via the fft function.
Listing 30.1: Periodogram via the Fourier transform [Download ]
set verbose off
nulldata 50
set seed 76543
# generate an AR(1) process
series e = normal()
series x = 0
x = 0.9*x(-1) + e
# compute the periodogram
F = fft({x}) # note that the series is turned into a matrix on the fly
S = abs(F).^2
S = S[2:($nobs/2)+1] / (2*$pi*$nobs)
sfreq = seq(1,($nobs/2))’
omega = sfreq .* (2*$pi/$nobs)
period = $nobs ./ sfreq
omega = omega ~ sfreq ~ period ~ S
# compare the built-in command
pergm x
print omega
Chapter 31
Univariate time series models
31.1 Introduction
Time series models are discussed in this chapter and the next two. Here we concentrate on ARIMA
models, unit root tests, and GARCH. The following chapter deals with VARs, and chapter 33 with
cointegration and error correction.
31.2 ARIMA models
Representation and syntax
The arma command performs estimation of AutoRegressive, Integrated, Moving Average (ARIMA) models. These are models that can be written in the form

  ϕ(L) y_t = θ(L) ϵ_t        (31.1)

where ϕ(L) and θ(L) are polynomials in the lag operator, L, defined such that L^n x_t = x_{t−n}, and ϵ_t is a white noise process. The exact content of y_t, of the AR polynomial ϕ(), and of the MA polynomial θ(), will be explained in the following.
Mean terms
The process y_t as written in equation (31.1) has, without further qualifications, mean zero. If the model is to be applied to real data, it is necessary to include some term to handle the possibility that y_t has non-zero mean. There are two possible ways to represent processes with nonzero mean: one is to define µ_t as the unconditional mean of y_t, namely the central value of its marginal distribution. Therefore, the series ỹ_t = y_t − µ_t has mean 0, and the model (31.1) applies to ỹ_t. In practice, assuming that µ_t is a linear function of some observable variables x_t, the model becomes

  ϕ(L)(y_t − x_t β) = θ(L) ϵ_t        (31.2)

This is sometimes known as a “regression model with ARMA errors”; its structure may be more apparent if we represent it using two equations:

  y_t = x_t β + u_t
  ϕ(L) u_t = θ(L) ϵ_t
The model just presented is also sometimes known as “ARMAX” (ARMA + eXogenous variables). It seems to us, however, that this label is more appropriately applied to a different model: another way to include a mean term in (31.1) is to base the representation on the conditional mean of y_t, that is the central value of the distribution of y_t given its own past. Assuming, again, that this can be represented as a linear combination of some observable variables z_t, the model would expand to

  ϕ(L) y_t = z_t γ + θ(L) ϵ_t        (31.3)

The formulation (31.3) has the advantage that γ can be immediately interpreted as the vector of marginal effects of the z_t variables on the conditional mean of y_t. And by adding lags of z_t to this specification one can estimate Transfer Function models (which generalize ARMA by adding the effects of exogenous variables distributed across time).
Gretl provides a way to estimate both forms. Models written as in (31.2) are estimated by maximum
likelihood; models written as in (31.3) are estimated by conditional maximum likelihood. (For more
on these options see the section on “Estimation” below.)
In the special case when x_t = z_t = 1 (that is, the models include a constant but no exogenous variables) the two specifications discussed above reduce to

  ϕ(L)(y_t − µ) = θ(L) ϵ_t        (31.4)

and

  ϕ(L) y_t = α + θ(L) ϵ_t        (31.5)

respectively. These formulations are essentially equivalent, but if they represent one and the same process µ and α are, fairly obviously, not numerically identical; rather

  α = (1 − ϕ_1 − . . . − ϕ_p) µ

The gretl syntax for estimating (31.4) is simply

  arma p q ; y

The AR and MA lag orders, p and q, can be given either as numbers or as pre-defined scalars. The parameter µ can be dropped if necessary by appending the option --nc (“no constant”) to the command. If estimation of (31.5) is needed, the switch --conditional must be appended to the
command, as in
arma p q ; y --conditional
Generalizing this principle to the estimation of (31.2) or (31.3), you get that

  arma p q ; y const x1 x2

would estimate the following model:

  y_t − x_t β = ϕ_1 (y_{t−1} − x_{t−1} β) + . . . + ϕ_p (y_{t−p} − x_{t−p} β) + ϵ_t + θ_1 ϵ_{t−1} + . . . + θ_q ϵ_{t−q}

where in this instance x_t β = β_0 + x_{t,1} β_1 + x_{t,2} β_2. Appending the --conditional switch, as in

  arma p q ; y const x1 x2 --conditional

would estimate the following model:

  y_t = x_t γ + ϕ_1 y_{t−1} + . . . + ϕ_p y_{t−p} + ϵ_t + θ_1 ϵ_{t−1} + . . . + θ_q ϵ_{t−q}
Ideally, the issue broached above could be made moot by writing a more general specification that nests the alternatives; that is

  ϕ(L)(y_t − x_t β) = z_t γ + θ(L) ϵ_t;        (31.6)

we would like to generalize the arma command so that the user could specify, for any estimation method, whether certain exogenous variables should be treated as x_t’s or z_t’s, but we’re not yet at that point (and neither are most other software packages).
Seasonal models
A more flexible lag structure is desirable when analyzing time series that display strong seasonal
patterns. Model (31.1) can be expanded to
  ϕ(L) Φ(L^s) y_t = θ(L) Θ(L^s) ϵ_t.        (31.7)

For such cases, a fuller form of the syntax is available, namely,

  arma p q ; P Q ; y

where p and q represent the non-seasonal AR and MA orders, and P and Q the seasonal orders. For example,
  arma 1 1 ; 1 1 ; y

would be used to estimate the following model:

  (1 − ϕL)(1 − ΦL^s)(y_t − µ) = (1 + θL)(1 + ΘL^s) ϵ_t

If y_t is a quarterly series (and therefore s = 4), the above equation can be written more explicitly as

  y_t − µ = ϕ(y_{t−1} − µ) + Φ(y_{t−4} − µ) − (ϕ·Φ)(y_{t−5} − µ) + ϵ_t + θ ϵ_{t−1} + Θ ϵ_{t−4} + (θ·Θ) ϵ_{t−5}
Such a model is known as a “multiplicative seasonal ARMA model”.
Gaps in the lag structure
The standard way to specify an ARMA model in gretl is via the AR and MA orders, p and q respectively. In this case all lags from 1 to the given order are included. In some cases one may wish to include only certain specific AR and/or MA lags. This can be done in either of two ways.

One can construct a matrix containing the desired lags (positive integer values) and supply the name of this matrix in place of p or q.

One can give a comma-separated list of lags, enclosed in braces, in place of p or q.
The following code illustrates these options:
matrix pvec = {1,4}
arma pvec 1 ; y
arma {1,4} 1 ; y
Both forms above specify an ARMA model in which AR lags 1 and 4 are used (but not 2 and 3).
This facility is available only for the non-seasonal component of the ARMA specification.
Differencing and ARIMA
The above discussion presupposes that the time series y_t has already been subjected to all the trans-
formations deemed necessary for ensuring stationarity (see also section 31.3). Differencing is the most
common of these transformations, and gretl provides a mechanism to include this step into the arma
command: the syntax
  arma p d q ; y

would estimate an ARMA(p, q) model on Δ^d y_t. It is functionally equivalent to

  series tmp = y
  loop i=1..d
      tmp = diff(tmp)
  endloop
  arma p q ; tmp
except with regard to forecasting after estimation (see below).
When the series y_t is differenced before performing the analysis the model is known as ARIMA (“I”
for Integrated); for this reason, gretl provides the arima command as an alias for arma.
Seasonal differencing is handled similarly, with the syntax
  arma p d q ; P D Q ; y

where D is the order for seasonal differencing. Thus, the command

  arma 1 0 0 ; 1 1 1 ; y

would produce the same parameter estimates as
  series dsy = sdiff(y)
  arma 1 0 ; 1 1 ; dsy

where we use the sdiff function to create a seasonal difference (e.g. for quarterly data, y_t − y_{t−4}).
In specifying an ARIMA model with exogenous regressors we face a choice which relates back to the
discussion of the variant models (31.2) and (31.3) above. If we choose model (31.2), the “regression
model with ARMA errors”, how should this be extended to the case of ARIMA? The issue is whether or
not the differencing that is applied to the dependent variable should also be applied to the regressors.
Consider the simplest case, ARIMA with non-seasonal differencing of order 1. We may estimate either

  ϕ(L)(1 − L)(y_t − X_t β) = θ(L) ϵ_t        (31.8)

or

  ϕ(L)((1 − L) y_t − X_t β) = θ(L) ϵ_t        (31.9)

The first of these formulations can be described as a regression model with ARIMA errors, while the second preserves the levels of the X variables. As of gretl version 1.8.6, the default model is (31.8), in which differencing is applied to both y_t and X_t. However, when using the default estimation method (native exact ML, see below), the option --y-diff-only may be given, in which case gretl estimates (31.9).¹
Estimation
The default estimation method for ARMA models is exact maximum likelihood estimation (under
the assumption that the error term is normally distributed), using a variety of techniques: the main
algorithm for evaluating the log-likelihood is AS197 by Melard (1984). Maximization is performed
via BFGS and the score is approximated numerically. This method produces results that are directly
comparable with many other software packages. The constant, and any exogenous variables, are
treated as in equation (31.2). The covariance matrix for the parameters is computed using a numerical
approximation to the Hessian at convergence.
The alternative method, invoked with the --conditional switch, is conditional maximum likelihood
(CML), also known as “conditional sum of squares” (see Hamilton, 1994, p. 132). This method was exemplified in Listing 13.3, and only a brief description will be given here. Given a sample of size T, the CML method minimizes the sum of squared one-step-ahead prediction errors generated by the model for the observations t_0, . . . , T. The starting point t_0 depends on the orders of the AR polynomials in the model. The numerical maximization method used is BHHH, and the covariance matrix is computed using a Gauss–Newton regression.

The CML method is nearly equivalent to maximum likelihood under the hypothesis of normality; the difference is that the first (t_0 − 1) observations are considered fixed and only enter the likelihood function as conditioning variables. As a consequence, the two methods are asymptotically equivalent under standard conditions—except for the fact, discussed above, that our CML implementation treats the constant and exogenous variables as per equation (31.3).
The two methods can be compared as in the following example
open data10-1
arma 1 1 ; r
arma 1 1 ; r --conditional
which produces the estimates shown in Table 31.1. As you can see, the estimates of ϕ and θ are quite similar. The reported constants differ widely, as expected—see the discussion following equations (31.4) and (31.5). However, dividing the CML constant by 1 − ϕ we get about 7.3, which is not far from the ML estimate of 6.93.
Convergence and initialization
The numerical methods used to maximize the likelihood for ARMA models are not guaranteed to
converge. Whether or not convergence is achieved, and whether or not the true maximum of the
1Prior to gretl 1.8.6, the default model was (31.9). We changed this for the sake of consistency with other software.
Table 31.1: ML and CML estimates

  Parameter    ML                      CML
  µ            6.93042 (0.923882)      1.07322 (0.488661)
  ϕ            0.855360 (0.0511842)    0.852772 (0.0450252)
  θ            0.588056 (0.0986096)    0.591838 (0.0456662)
likelihood function is attained, may depend on the starting values for the parameters. Gretl employs
one of the following two initialization mechanisms, depending on the specification of the model and
the estimation method chosen.
1. Estimate a pure AR model by Least Squares (nonlinear least squares if the model requires
it, otherwise OLS). Set the AR parameter values based on this regression and set the MA
parameters to a small positive value (0.0001).
2. The Hannan–Rissanen method: First estimate an autoregressive model by OLS and save the
residuals. Then in a second OLS pass add appropriate lags of the first-round residuals to the
model, to obtain estimates of the MA parameters.
To see the details of the ARMA estimation procedure, add the --verbose option to the command.
This prints a notice of the initialization method used, as well as the parameter values and log-likelihood
at each iteration.
Besides the built-in initialization mechanisms, the user has the option of specifying a set of starting
values manually. This is done via the set command: the first argument should be the keyword
initvals and the second should be the name of a pre-specified matrix containing starting values.
For example
matrix start = { 0, 0.85, 0.34 }
set initvals start
arma 1 1 ; y
The specified matrix should have just as many parameters as the model: in the example above there
are three parameters, since the model implicitly includes a constant. The constant, if present, is
always given first; otherwise the order in which the parameters are expected is the same as the order
of specification in the arma or arima command. In the example the constant is set to zero, ϕ_1 to 0.85, and θ_1 to 0.34.
You can get gretl to revert to automatic initialization via the command set initvals auto.
Two variants of the BFGS algorithm are available in gretl. In general we recommend the default
variant, which is based on an implementation by Nash (1990), but for some problems the alternative,
limited-memory version (L-BFGS-B, see Byrd et al.,1995) may increase the chances of convergence
on the ML solution. This can be selected via the --lbfgs option to the arma command.
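For example (with a hypothetical series y), one might combine this with verbose output to monitor the iterations:

arma 2 1 ; y --lbfgs --verbose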
Estimation via X-12-ARIMA
As an alternative to estimating ARMA models using “native” code, gretl offers the option of using the
external program X-12-ARIMA. This is the seasonal adjustment software produced and maintained by
the U.S. Census Bureau; it is used for all official seasonal adjustments at the Bureau. (The current
version X13 can also be used, working as a drop-in replacement.)
Gretl includes a module which interfaces with X-12-ARIMA: it translates arma commands using the
syntax outlined above into a form recognized by X-12-ARIMA, executes the program, and retrieves
the results for viewing and further analysis within gretl. To use this facility you have to install X-
12-ARIMA separately. Packages for both MS Windows and GNU/Linux are available from the gretl
website, http://gretl.sourceforge.net/.
To invoke X-12-ARIMA as the estimation engine, append the flag --x-12-arima, as in
arma p q ; y --x-12-arima
As with native estimation, the default is to use exact ML but there is the option of using conditional
ML with the --conditional flag. However, please note that when X-12-ARIMA is used in conditional
ML mode, the comments above regarding the variant treatments of the mean of the process y_t do not apply. That is, when you use X-12-ARIMA the model that is estimated is (31.2), regardless of whether
estimation is by exact ML or conditional ML. In addition, the treatment of exogenous regressors in
the context of ARIMA differencing is always that shown in equation (31.8).
Forecasting
ARMA models are often used for forecasting purposes. The autoregressive component, in particular,
offers the possibility of forecasting a process “out of sample” over a substantial time horizon.
Gretl supports forecasting on the basis of ARMA models using the method set out by Box and
Jenkins (1976).² The Box and Jenkins algorithm produces a set of integrated AR coefficients which
take into account any differencing of the dependent variable (seasonal and/or non-seasonal) in the
ARIMA context, thus making it possible to generate a forecast for the level of the original variable.
By contrast, if you first difference a series manually and then apply ARMA to the differenced series,
forecasts will be for the differenced series, not the level. This point is illustrated in Listing 31.1. The
parameter estimates are identical for the two models. The forecasts differ but are mutually consistent:
the variable fcdiff emulates the ARMA forecast (static, one step ahead within the sample range,
and dynamic out of sample).
Lag selection
A variant of the arma and arima commands is available as an aid to specification. If you give the
--lagselect option the lag orders p and q—as well as P and Q, if applicable—are taken as maxima,
and the usual output is replaced by a table showing information criteria and log-likelihood for a range
of specifications from zero lags to the maxima. If no seasonal component is given this table has
six columns: p and q; the criteria AIC, BIC and HQC (see Chapter 28); and log-likelihood. In the seasonal case there are eight columns: P and Q are inserted following p and q. Asterisks identify the
rows (specifications) on which each information criterion is minimized.
If the input specification includes differencing (non-seasonal and/or seasonal) this is respected but d and D are treated as fixed values rather than maxima. You have the usual choice between exact and
conditional ML estimation but the option of using X-12-ARIMA (or X13) is not supported. You also
have the usual option of including exogenous regressors (ARMAX).
On successful completion the table of results is available in the form of a matrix via the $test accessor.
The printed version can be suppressed via the --quiet option.
A simple example of usage is shown in Listing 31.2, using annual sunspot data from 1700 to 2021. The
table (part of which is elided for brevity) has the three information criteria agreeing on ARMA(4,2)
as the optimum among the specifications estimated. The script illustrates how the $test matrix can
be used to extract the “best” specification.
31.3 Unit root tests
The ADF test
The Augmented Dickey–Fuller (ADF) test is, as implemented in gretl, the t-statistic on φ in the following regression:

  Δy_t = µ_t + φ y_{t−1} + Σ_{i=1}^{p} γ_i Δy_{t−i} + ϵ_t.        (31.10)

This test statistic is probably the best-known and most widely used unit root test. It is a one-sided test whose null hypothesis is φ = 0 versus the alternative φ < 0 (and hence large negative values of the test statistic lead to the rejection of the null). Under the null, y_t must be differenced at least once to achieve stationarity; under the alternative, y_t is already stationary and no differencing is required.
2See in particular their “Program 4” on p. 505ff.
Listing 31.1: ARIMA forecasting [Download ]
open greene18_2.gdt
# log of quarterly U.S. nominal GNP, 1950:1 to 1983:4
series y = log(Y)
# and its first difference
series dy = diff(y)
# reserve 2 years for out-of-sample forecast
smpl ; 1981:4
# Estimate using ARIMA
arima 1 1 1 ; y
# forecast over full period
smpl --full
fcast fc1
# Return to sub-sample and run ARMA on the first difference of y
smpl ; 1981:4
arma 1 1 ; dy
smpl --full
fcast fc2
series fcdiff = (t<=1982:1)? (fc1 - y(-1)) : (fc1 - fc1(-1))
# compare the forecasts over the later period
smpl 1981:1 1983:4
print y fc1 fc2 fcdiff --byobs
The output from the last command is:
y fc1 fc2 fcdiff
1981:1 7.964086 7.940930 0.02668 0.02668
1981:2 7.978654 7.997576 0.03349 0.03349
1981:3 8.009463 7.997503 0.01885 0.01885
1981:4 8.015625 8.033695 0.02423 0.02423
1982:1 8.014997 8.029698 0.01407 0.01407
1982:2 8.026562 8.046037 0.01634 0.01634
1982:3 8.032717 8.063636 0.01760 0.01760
1982:4 8.042249 8.081935 0.01830 0.01830
1983:1 8.062685 8.100623 0.01869 0.01869
1983:2 8.091627 8.119528 0.01891 0.01891
1983:3 8.115700 8.138554 0.01903 0.01903
1983:4 8.140811 8.157646 0.01909 0.01909
Listing 31.2: ARMA lag selection [Download ]
open sunspots.gdt
# ARMA lag selection with maxima of 4 for p and q
arma 4 4 ; sunspots --lagselect
# determine the "best" row per BIC (column 4)
best_row = iminc($test)[4]
# extract this row
spec = $test[best_row,][1:2]
# extract p and q as scalars
scalar p = spec[1]
scalar q = spec[2]
# and estimate the "best" specification
arma p q ; sunspots
Part of the lag-selection table:
Estimated using AS 197 (exact ML)
Dependent variable sunspots, T = 322
Criteria for ARMA(p, q) specifications
------------------------------------------------------------
p, q AIC BIC HQC lnL
------------------------------------------------------------
0, 0 3575.2367 3582.7858 3578.2505 -1785.6183
0, 1 3283.7333 3295.0569 3288.2540 -1638.8666
0, 2 3123.6726 3138.7708 3129.7002 -1557.8363
0, 3 3071.8351 3090.7078 3079.3697 -1530.9175
0, 4 3047.0500 3069.6973 3056.0916 -1517.5250
1, 0 3220.3385 3231.6621 3224.8593 -1607.1692
1, 1 3108.4048 3123.5030 3114.4325 -1550.2024
1, 2 3060.3363 3079.2090 3067.8709 -1525.1681
1, 3 3051.2713 3073.9187 3060.3129 -1519.6357
1, 4 3045.1230 3071.5449 3055.6715 -1515.5615
...
3, 0 3008.6022 3027.4750 3016.1368 -1499.3011
3, 1 3010.5262 3033.1735 3019.5677 -1499.2631
3, 2 2976.3054 3002.7273 2986.8539 -1481.1527
3, 3 2969.6493 2999.8457 2981.7046 -1476.8246
3, 4 2970.5017 3004.4727 2984.0640 -1476.2509
4, 0 3010.5497 3033.1970 3019.5912 -1499.2748
4, 1 3012.3267 3038.7485 3022.8751 -1499.1633
4, 2 2969.5073* 2999.7037* 2981.5626* -1476.7536
4, 3 2971.2552 3005.2262 2984.8175 -1476.6276
4, 4 2971.1378 3008.8833 2986.2070 -1475.5689
One peculiar aspect of this test is that its limit distribution is non-standard under the null hypothesis: moreover, the shape of the distribution, and consequently the critical values for the test, depend on the form of the µ_t term. A full analysis of the various cases is beyond the scope of this chapter: Hamilton (1994) contains an excellent discussion, but any recent time series textbook covers this topic. Suffice it to say that gretl allows the user to choose the specification for µ_t among four different alternatives:
µ_t                          command option
0                            --nc
µ_0                          --c
µ_0 + µ_1 t                  --ct
µ_0 + µ_1 t + µ_2 t²         --ctt
These option flags are not mutually exclusive; when they are used together the statistic will be
reported separately for each selected case. By default, gretl uses the combination --c --ct. For each
case, approximate p-values are calculated by means of the algorithm developed in MacKinnon (1996).
The gretl command used to perform the test is adf; for example
adf 4 x1
would compute the test statistic as the t-statistic for φ in equation (31.10) with p = 4 in the two cases µ_t = µ_0 and µ_t = µ_0 + µ_1 t.
The number of lags (p in equation 31.10) should be chosen so as to ensure that (31.10) is a parametrization flexible enough to represent adequately the short-run persistence of y_t. Setting p too low results in size distortions in the test, whereas setting p too high leads to low power. As a convenience to the user, the parameter p can be determined automatically: setting p to a negative number triggers a sequential procedure that starts with p lags and decrements p until the t-statistic for the parameter γ_p exceeds 1.645 in absolute value.
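For example, following the description above, one might let gretl test down from 12 lags in the constant-only case (the series name and the starting order are purely illustrative):

# start from 12 lags and test down automatically
adf -12 x1 --c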
The ADF-GLS test
Elliott, Rothenberg and Stock (1996) proposed a variant of the ADF test which involves an alternative
method of handling the parameters pertaining to the deterministic term µt: these are estimated first
via Generalized Least Squares, and in a second stage an ADF regression is performed using the GLS
residuals. This variant offers greater power than the regular ADF test for the cases µ_t = µ_0 and µ_t = µ_0 + µ_1 t.
The ADF-GLS test is available in gretl via the --gls option to the adf command. When this option
is selected the --nc and --ctt options become unavailable, and only one case can be selected at a
time; by default the constant-only model is used but a trend can be added using the --ct flag. When
a trend is present in this test MacKinnon-type p-values are not available; instead we show critical
values from Table 1 in Elliott et al. (1996).
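For example, the GLS variant with a trend can be requested as follows (x1 again being a hypothetical series):

adf 4 x1 --gls --ct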
The KPSS test
The KPSS test (Kwiatkowski, Phillips, Schmidt and Shin, 1992) is a unit root test in which the null hypothesis is opposite to that in the ADF test: under the null, the series in question is stationary; the alternative is that the series is I(1).
The basic intuition behind this test statistic is very simple: if y_t can be written as y_t = µ + u_t, where u_t is some zero-mean stationary process, then not only does the sample average of the y_t s provide a consistent estimator of µ, but the long-run variance of u_t is a well-defined, finite number. Neither of these properties holds under the alternative.
The test itself is based on the following statistic:
\[
\eta = \frac{\sum_{t=1}^{T} S_t^2}{T^2 \bar{\sigma}^2} \tag{31.11}
\]
where S_t = \sum_{s=1}^{t} e_s and \bar{\sigma}^2 is an estimate of the long-run variance of e_t = (y_t - \bar{y}). Under the null, this statistic has a well-defined (nonstandard) asymptotic distribution, which is free of nuisance parameters and has been tabulated by simulation. Under the alternative, the statistic diverges.
As a consequence, it is possible to construct a one-sided test based on η, where H_0 is rejected if η is bigger than the appropriate critical value; gretl provides the 90, 95 and 99 percent quantiles.
The critical values are computed via the method presented by Sephton (1995), which offers greater
accuracy than the values tabulated in Kwiatkowski et al. (1992).
Usage example:
kpss m y
where m is an integer representing the bandwidth or window size used in the formula for estimating the long-run variance:
\[
\bar{\sigma}^2 = \sum_{i=-m}^{m} \left( 1 - \frac{|i|}{m+1} \right) \hat{\gamma}_i
\]
The \hat{\gamma}_i terms denote the empirical autocovariances of e_t from order −m through m. For this estimator to be consistent, m must be large enough to accommodate the short-run persistence of e_t, but not too large compared to the sample size T. If the supplied m is non-positive a default value is computed, namely the integer part of 4(T/100)^{1/4}.
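Following this rule, passing a non-positive bandwidth lets gretl compute the default value; for example (with a hypothetical series y):

# bandwidth set automatically to the integer part of 4(T/100)^(1/4)
kpss 0 y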
The above concept can be generalized to the case where y_t is thought to be stationary around a deterministic trend. In this case, formula (31.11) remains unchanged, but the series e_t is defined as the residuals from an OLS regression of y_t on a constant and a linear trend. This second form of the
test is obtained by appending the --trend option to the kpss command:
kpss n y --trend
Note that in this case the asymptotic distribution of the test is different and the critical values reported
by gretl differ accordingly.
Panel unit root tests
The most commonly used unit root tests for panel data involve a generalization of the ADF procedure,
in which the joint null hypothesis is that a given time series is non-stationary for all individuals in the panel.
In this context the ADF regression (31.10) can be rewritten as
\[
\Delta y_{it} = \mu_{it} + \phi_i y_{i,t-1} + \sum_{j=1}^{p_i} \gamma_{ij} \Delta y_{i,t-j} + \epsilon_{it} \tag{31.12}
\]
The model (31.12) allows for maximal heterogeneity across the individuals in the panel: the parameters of the deterministic term, the autoregressive coefficient φ, and the lag order p are all specific to the individual, indexed by i.
One possible modification of this model is to impose the assumption that φ_i = φ for all i; that is, the individual time series share a common autoregressive root (although they may differ in respect of other statistical properties). The choice of whether or not to impose this assumption has an important bearing on the hypotheses under test. Under model (31.12) the joint null is φ_i = 0 for all i, meaning that all the individual time series are non-stationary, and the alternative (simply the negation of the null) is that at least one individual time series is stationary. When a common φ is assumed, the null is that φ = 0 and the alternative is that φ < 0. The null still says that all the individual series are non-stationary, but the alternative now says that they are all stationary. The choice of model should take this point into account, as well as the gain in power from forming a pooled estimate of φ and, of course, the plausibility of assuming a common AR(1) coefficient.3
In gretl, the formulation (31.12) is used automatically when the adf command is used on panel data.
The joint test statistic is formed using the method of Im, Pesaran and Shin (2003). In this context
the behavior of adf differs from the regular time-series case: only one case of the deterministic term is handled per invocation of the command. The default is that µ_it includes just a constant, but the --nc and --ct flags can be used to suppress the constant or to include a trend, respectively; the quadratic trend option --ctt is not available.
3If the assumption of a common φ seems excessively restrictive, bear in mind that we routinely assume common slope coefficients when estimating panel models, even if this is unlikely to be literally true.
The alternative that imposes a common value of φ is implemented via the levinlin command. The test statistic is computed as per Levin, Lin and Chu (2002). As with the adf command, the first argument is the lag order and the second is the name of the series to test; and the default case for the deterministic component is a constant only. The options --nc and --ct have the same effect as with adf. One refinement is that the lag order may be given in either of two forms: if a scalar is given, this is taken to represent a common value of p for all individuals, but you may instead provide a vector holding a set of p_i values, hence allowing the order of autocorrelation of the series to differ by individual. So, for example, given
levinlin 2 y
levinlin {2,2,3,3,4,4} y
the first command runs a joint ADF test with a common lag order of 2, while the second (which
assumes a panel with six individuals) allows for differing short-run dynamics. The first argument to
levinlin can be given as a set of comma-separated integers enclosed in braces, as shown above, or
as the name of an appropriately dimensioned pre-defined matrix (see chapter 17).
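For instance, the second command above could equally be written using a pre-defined matrix (the matrix name is arbitrary):

matrix plags = {2, 2, 3, 3, 4, 4}
levinlin plags y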
Besides variants of the ADF test, the KPSS test also can be used with panel data via the kpss
command. In this case the test (of the null hypothesis that the given time series is stationary for all
individuals) is implemented using the method of Choi (2001). This is an application of meta-analysis,
the statistical technique whereby an overall or composite p-value for the test of a given null hypothesis
can be computed from the p-values of a set of separate tests. Unfortunately, in the case of the KPSS
test we are limited by the unavailability of precise p-values, although if an individual test statistic
falls between the 10 percent and 1 percent critical values we are able to interpolate with a fair degree
of confidence. This gives rise to four cases.
1. All the individual KPSS test statistics fall between the 10 percent and 1 percent critical values:
the Choi method gives us a plausible composite p-value.
2. Some of the KPSS test statistics exceed the 1 percent value and none fall short of the 10 percent
value: we can give an upper bound for the composite p-value by setting the unknown p-values
to 0.01.
3. Some of the KPSS test statistics fall short of the 10 percent critical value but none exceed the
1 percent value: we can give a lower bound to the composite p-value by setting the unknown
p-values to 0.10.
4. None of the above conditions are satisfied: the Choi method fails to produce any result for the
composite KPSS test.
31.4 Cointegration test
The generally recommended test for cointegration is the Johansen test, which is discussed in detail in
chapter 33. In this context we just offer a few remarks on the cointegration test of Engle and Granger
(1987), because it builds on the univariate ADF test discussed above (section 31.3).
For the Engle–Granger test, the procedure is:
1. Test each series for a unit root using an ADF test.
2. Run a “cointegrating regression” via OLS. For this we select one of the potentially cointegrated
variables as dependent, and include the other potentially cointegrated variables as regressors.
3. Perform an ADF test on the residuals from the cointegrating regression.
The idea is that cointegration is supported if (a) the null of non-stationarity is not rejected for each
of the series individually, in step 1, while (b) the null is rejected for the residuals at step 3. That is,
each of the individual series is I(1) but some linear combination of the series is I(0).
This test is implemented in gretl by the coint command, which requires an integer lag order (for the
ADF tests) followed by a list of variables to be tested, the first of which will be taken as dependent in
the cointegrating regression. Please see the online help for coint, or the Gretl Command Reference,
for further details.
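For example, with three hypothetical series y, x1 and x2, taking y as dependent in the cointegrating regression:

# Engle-Granger test, using ADF lag order 4
coint 4 y x1 x2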
31.5 ARCH and GARCH
Heteroskedasticity means a non-constant variance of the error term in a regression model. Autoregres-
sive Conditional Heteroskedasticity (ARCH) is a phenomenon specific to time series models, whereby
the variance of the error displays autoregressive behavior; for instance, the time series exhibits succes-
sive periods where the error variance is relatively large, and successive periods where it is relatively
small. This sort of behavior is reckoned to be common in asset markets: an unsettling piece of news
can lead to a period of increased volatility in the market.
An ARCH error process of order q can be represented as
\[
u_t = \sigma_t \varepsilon_t; \qquad \sigma_t^2 \equiv E(u_t^2 \mid \Omega_{t-1}) = \alpha_0 + \sum_{i=1}^{q} \alpha_i u_{t-i}^2
\]
where the ε_t s are independently and identically distributed (iid) with mean zero and variance 1, and where σ_t is taken to be the positive square root of σ_t². Ω_{t−1} denotes the information set as of time t−1 and σ_t² is the conditional variance: that is, the variance conditional on information dated t−1 and earlier.
It is important to notice the difference between ARCH and an ordinary autoregressive error process. The simplest (first-order) case of the latter can be written as
\[
u_t = \rho u_{t-1} + \varepsilon_t; \qquad -1 < \rho < 1
\]
where the ε_t s are independently and identically distributed with mean zero and variance σ². With an AR(1) error, if ρ is positive then a positive value of u_t will tend to be followed by a positive u_{t+1}. With an ARCH error process, a disturbance u_t of large absolute value will tend to be followed by further large absolute values, but with no presumption that the successive values will be of the same sign. ARCH in asset prices is a “stylized fact” and is consistent with market efficiency; on the other hand autoregressive behavior of asset prices would violate market efficiency.
One can test for ARCH of order q in the following way:
1. Estimate the model of interest via OLS and save the squared residuals, û_t².
2. Perform an auxiliary regression in which the current squared residual is regressed on a constant and q lags of itself.
3. Find the TR² value (sample size times unadjusted R²) for the auxiliary regression.
4. Refer the TR² value to the χ² distribution with q degrees of freedom; if the p-value is “small enough”, reject the null hypothesis of homoskedasticity in favor of the alternative of ARCH(q).
This test is implemented in gretl via the modtest command with the --arch option, which must follow
estimation of a time-series model by OLS (either a single-equation model or a VAR). For example,
ols y 0 x
modtest 4 --arch
This example specifies an ARCH order of q = 4; if the order argument is omitted, q is set equal to the periodicity of the data. In the graphical interface, the ARCH test is accessible from the “Tests”
menu in the model window (again, for single-equation OLS or VARs).
GARCH
The simple ARCH(q) process is useful for introducing the general concept of conditional heteroskedasticity in time series, but it has been found to be insufficient in empirical work. The dynamics of the error variance permitted by ARCH(q) are not rich enough to represent the patterns found in financial data. The generalized ARCH or GARCH model is now more widely used.
The representation of the variance of a process in the GARCH model is somewhat (but not exactly)
analogous to the ARMA representation of the level of a time series. The variance at time t is allowed to depend on both past values of the variance and past values of the realized squared disturbance, as shown in the following system of equations:
\[
y_t = X_t \beta + u_t \tag{31.13}
\]
\[
u_t = \sigma_t \varepsilon_t \tag{31.14}
\]
\[
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i u_{t-i}^2 + \sum_{j=1}^{p} \delta_j \sigma_{t-j}^2 \tag{31.15}
\]
As above, ε_t is an iid sequence with unit variance. X_t is a matrix of regressors (or in the simplest case, just a vector of 1s allowing for a non-zero mean of y_t). Note that if p = 0, GARCH collapses to ARCH(q): the generalization is embodied in the δ_j terms that multiply previous values of the error variance.
In principle the underlying innovation, ε_t, could follow any suitable probability distribution, and besides the obvious candidate of the normal or Gaussian distribution the Student’s t distribution has been used in this context. Currently gretl only handles the case where ε_t is assumed to be Gaussian.
However, when the --robust option to the garch command is given, the estimator gretl uses for the
covariance matrix can be considered Quasi-Maximum Likelihood even with non-normal disturbances.
See below for more on the options regarding the GARCH covariance matrix.
Example:
garch p q ; y const x
where p ≥ 0 and q > 0 denote the respective lag orders as shown in equation (31.15). These values can be supplied in numerical form or as the names of pre-defined scalar variables.
GARCH estimation
Estimation of the parameters of a GARCH model is by no means a straightforward task. (Consider equation 31.15: the conditional variance at any point in time, σ_t², depends on the conditional variance in earlier periods, but σ_t² is not observed, and must be inferred by some sort of Maximum Likelihood procedure.) By default gretl uses native code that employs the BFGS maximizer; you also have the option (activated by the --fcp command-line switch) of using the method proposed by Fiorentini et al. (1996),4 which was adopted as a benchmark in the study of GARCH results by McCullough
and Renfro (1998). It employs analytical first and second derivatives of the log-likelihood, and uses a
mixed-gradient algorithm, exploiting the information matrix in the early iterations and then switching
to the Hessian in the neighborhood of the maximum likelihood. (This progress can be observed if you
append the --verbose option to gretl’s garch command.)
Several options are available for computing the covariance matrix of the parameter estimates in
connection with the garch command. At a first level, one can choose between a “standard” and a
“robust” estimator. By default, the Hessian is used unless the --robust option is given, in which case
the QML estimator is used. A finer choice is available via the set command, as shown in Table 31.2.
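For instance, one might combine these settings as a minimal sketch, with a hypothetical series r, selecting the Bollerslev–Wooldridge estimator before estimating a GARCH(1,1) model:

set garch_vcv bw
garch 1 1 ; r const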
It is not uncommon, when one estimates a GARCH model for an arbitrary time series, to find that the
iterative calculation of the estimates fails to converge. For the GARCH model to make sense, there
are strong restrictions on the admissible parameter values, and it is not always the case that there
exists a set of values inside the admissible parameter space for which the likelihood is maximized.
The restrictions in question can be explained by reference to the simplest (and by far the most common) instance of the GARCH model, where p = q = 1. In the GARCH(1,1) model the conditional variance is
\[
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \delta_1 \sigma_{t-1}^2 \tag{31.16}
\]
4The algorithm is based on Fortran code deposited in the archive of the Journal of Applied Econometrics by the
authors, and is used by kind permission of Professor Fiorentini.
Table 31.2: Options for the GARCH covariance matrix
command effect
set garch_vcv hessian Use the Hessian
set garch_vcv im Use the Information Matrix
set garch_vcv op Use the Outer Product of the Gradient
set garch_vcv qml QML estimator
set garch_vcv bw Bollerslev–Wooldridge “sandwich” estimator
Taking the unconditional expectation of (31.16) we get
\[
\sigma^2 = \alpha_0 + \alpha_1 \sigma^2 + \delta_1 \sigma^2
\]
so that
\[
\sigma^2 = \frac{\alpha_0}{1 - \alpha_1 - \delta_1}
\]
For this unconditional variance to exist, we require that α_1 + δ_1 < 1, and for it to be positive we require that α_0 > 0.
A common reason for non-convergence of GARCH estimates (that is, a common reason for the non-existence of α_i and δ_i values that satisfy the above requirements and at the same time maximize the likelihood of the data) is misspecification of the model. It is important to realize that GARCH, in
itself, allows only for time-varying volatility in the data. If the mean of the series in question is not
constant, or if the error process is not only heteroskedastic but also autoregressive, it is necessary to
take this into account when formulating an appropriate model. For example, it may be necessary to
take the first difference of the variable in question and/or to add suitable regressors, Xt, as in (31.13).
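As a hedged illustration of this point (the series r and regressor x are hypothetical), one might difference the variable and add a regressor to the conditional mean:

series dr = diff(r)
garch 1 1 ; dr const x --robust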
Chapter 32
Vector Autoregressions
Gretl provides a standard set of procedures for dealing with the multivariate time-series models known
as VARs (Vector AutoRegression). More general models—such as VARMAs, nonlinear models or
multivariate GARCH models—are not provided as of now, although it is entirely possible to estimate
them by writing custom procedures in the gretl scripting language. In this chapter, we will briefly
review gretl’s VAR toolbox.
32.1 Notation
A VAR is a structure whose aim is to model the time persistence of a vector of n time series, y_t, via a multivariate autoregression, as in
\[
y_t = A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + B x_t + \epsilon_t \tag{32.1}
\]
The number of lags p is called the order of the VAR. The vector x_t, if present, contains a set of exogenous variables, often including a constant, possibly with a time trend and seasonal dummies. The vector ϵ_t is typically assumed to be a vector white noise, with covariance matrix Σ.
Equation (32.1) can be written more compactly as
\[
A(L) y_t = B x_t + \epsilon_t \tag{32.2}
\]
where A(L) is a matrix polynomial in the lag operator, or as
\[
\begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix}
= A
\begin{bmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix}
+
\begin{bmatrix} B \\ 0 \\ \vdots \\ 0 \end{bmatrix} x_t
+
\begin{bmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\tag{32.3}
\]
The matrix A is known as the “companion matrix” and equals
\[
A = \begin{bmatrix}
A_1 & A_2 & \cdots & A_p \\
I & 0 & \cdots & 0 \\
0 & I & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots
\end{bmatrix}
\]
Equation (32.3) is known as the “companion form” of the VAR.
Another representation of interest is the so-called “VMA representation”, which is written in terms of an infinite series of matrices Θ_i defined as
\[
\Theta_i = \frac{\partial y_t}{\partial \epsilon_{t-i}} \tag{32.4}
\]
The Θ_i matrices may be derived by recursive substitution in equation (32.1): for example, assuming for simplicity that B = 0 and p = 1, equation (32.1) would become
\[
y_t = A y_{t-1} + \epsilon_t
\]
which could be rewritten as
\[
y_t = A^{n+1} y_{t-n-1} + \epsilon_t + A \epsilon_{t-1} + A^2 \epsilon_{t-2} + \cdots + A^n \epsilon_{t-n}
\]
In this case Θ_i = A^i. In general, it is possible to compute Θ_i as the n × n north-west block of the i-th power of the companion matrix A (so Θ_0 is always an identity matrix).
The VAR is said to be stable if all the eigenvalues of the companion matrix A are smaller than 1 in absolute value, or equivalently, if the matrix polynomial A(L) in equation (32.2) is such that |A(z)| = 0 implies |z| > 1. If this is the case, lim_{n→∞} Θ_n = 0 and the vector y_t is stationary; as a consequence, the equation
\[
y_t - E(y_t) = \sum_{i=0}^{\infty} \Theta_i \epsilon_{t-i} \tag{32.5}
\]
is a legitimate Wold representation.
If the VAR is not stable, then the inferential procedures that are called for become somewhat more specialized, except for some simple cases. In particular, if the number of eigenvalues of A with modulus 1 is between 1 and n − 1, the canonical tool to deal with these models is the cointegrated VAR model, discussed in chapter 33.
32.2 Estimation
The gretl command for estimating a VAR is var which, in the command line interface, is invoked in
the following manner:
[modelname <- ] var p Ylist [; Xlist ]
where p is a scalar (the VAR order) and Ylist is a list of variables specifying the content of y_t. The optional Xlist argument can be used to specify a set of exogenous variables. If this argument is omitted, the vector x_t is taken to contain a constant (only); if present, it must be separated from
Ylist by a semicolon. Note, however, that a few common choices can be obtained in a simpler
way: the options --trend and --seasonals call for inclusion of a linear trend and a set of seasonal
dummies respectively. In addition the --nc option (no constant) can be used to suppress the standard
inclusion of a constant.
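For example (with hypothetical series y1 and y2), the following estimates a VAR of order 4 including a constant, a linear trend and seasonal dummies:

var 4 y1 y2 --trend --seasonals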
The <- construct can be used to store the model under a name (see section 3.2), if so desired. To
estimate a VAR using the graphical interface, choose“Time Series, Vector Autoregression”, under the
Model menu.
The parameters in eq. (32.1) are typically free from restrictions, which implies that multivariate
OLS provides a consistent and asymptotically efficient estimator of all the parameters.1 Given the
simplicity of OLS, this is what every software package, including gretl, uses; example script 32.1
exemplifies the fact that the var command gives you exactly the output you would have from a
battery of OLS regressions. The advantage of using the dedicated command is that, after estimation
is done, it makes it much easier to access certain quantities and manage certain tasks. For example,
the $coeff accessor returns the estimated coefficients as a matrix with n columns and $sigma returns an estimate of the matrix Σ, the covariance matrix of ϵ_t.
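A minimal sketch (the series names are hypothetical):

var 2 y1 y2
matrix A = $coeff     # coefficient estimates, one column per equation
matrix Sig = $sigma   # estimated covariance matrix of the errors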
Moreover, for each variable in the system an F test is automatically performed, in which the null hypothesis is that no lags of variable j are significant in the equation for variable i. This is commonly
known as a Granger causality test.
Periodicity horizon
Quarterly 20 (5 years)
Monthly 24 (2 years)
Daily 3 weeks
All other cases 10
Table 32.1: VMA horizon as a function of the dataset periodicity
In addition, two accessors become available for the companion matrix ($compan) and the VMA
representation ($vma). The latter deserves a detailed description: since the VMA representation (32.5)
1In fact, under normality of ϵ_t, OLS is indeed the conditional ML estimator. You may want to use other methods
if you need to estimate a VAR in which some parameters are constrained.
Listing 32.1: Estimation of a VAR via OLS [Download ]
open sw_ch14.gdt
series infl = 400*sdiff(log(PUNEW))
scalar p = 2
list X = LHUR infl
list Xlag = lags(p,X)
loop foreach i X
ols $i const Xlag
endloop
var p X
Output (selected portions):
Model 1: OLS, using observations 1960:3-1999:4 (T = 158)
Dependent variable: LHUR
coefficient std. error t-ratio p-value
--------------------------------------------------------
const 0.113673 0.0875210 1.299 0.1960
LHUR_1 1.54297 0.0680518 22.67 8.78e-51 ***
LHUR_2 -0.583104 0.0645879 -9.028 7.00e-16 ***
infl_1 0.0219040 0.00874581 2.505 0.0133 **
infl_2 -0.0148408 0.00920536 -1.612 0.1090
Mean dependent var 6.019198 S.D. dependent var 1.502549
Sum squared resid 8.654176 S.E. of regression 0.237830
...
VAR system, lag order 2
OLS estimates, observations 1960:3-1999:4 (T = 158)
Log-likelihood = -322.73663
Determinant of covariance matrix = 0.20382769
AIC = 4.2119
BIC = 4.4057
HQC = 4.2906
Portmanteau test: LB(39) = 226.984, df = 148 [0.0000]
Equation 1: LHUR
coefficient std. error t-ratio p-value
--------------------------------------------------------
const 0.113673 0.0875210 1.299 0.1960
LHUR_1 1.54297 0.0680518 22.67 8.78e-51 ***
LHUR_2 -0.583104 0.0645879 -9.028 7.00e-16 ***
infl_1 0.0219040 0.00874581 2.505 0.0133 **
infl_2 -0.0148408 0.00920536 -1.612 0.1090
Mean dependent var 6.019198 S.D. dependent var 1.502549
Sum squared resid 8.654176 S.E. of regression 0.237830
is of infinite order, gretl defines a horizon up to which the Θ_i matrices are computed automatically.
By default, this is a function of the periodicity of the data (see table 32.1), but it can be set by the
user to any desired value via the set command with the horizon parameter, as in
set horizon 30
Calling the horizon h, the $vma accessor returns an (h + 1) × n² matrix, in which the (i + 1)-th row is the vectorized form of Θ_i.
VAR lag-order selection
In order to help the user choose the most appropriate VAR order, gretl provides a special variant of
the var command:
var p Ylist [; Xlist ] --lagselect
When the --lagselect option is given, estimation is performed for all lags up to p and a table is printed: it displays, for each order, a Likelihood Ratio test for the order p versus p − 1, plus an array of
information criteria (see chapter 28). For each information criterion in the table, a star indicates what
appears to be the “best” choice. The same output can be obtained through the graphical interface
via the “Time Series, VAR lag selection” entry under the Model menu.
Listing 32.2: VAR lag selection via Information Criteria
open denmark
list Y = 1 2 3 4
var 4 Y --lagselect
var 6 Y --lagselect
Output (selected portions):
VAR system, maximum lag order 4
The asterisks below indicate the best (that is, minimized) values
of the respective information criteria, AIC = Akaike criterion,
BIC = Schwarz Bayesian criterion and HQC = Hannan-Quinn criterion.
lags loglik p(LR) AIC BIC HQC
1 609.15315 -23.104045 -22.346466* -22.814552
2 631.70153 0.00013 -23.360844* -21.997203 -22.839757*
3 642.38574 0.16478 -23.152382 -21.182677 -22.399699
4 653.22564 0.15383 -22.950025 -20.374257 -21.965748
VAR system, maximum lag order 6
The asterisks below indicate the best (that is, minimized) values
of the respective information criteria, AIC = Akaike criterion,
BIC = Schwarz Bayesian criterion and HQC = Hannan-Quinn criterion.
lags loglik p(LR) AIC BIC HQC
1 594.38410 -23.444249 -22.672078* -23.151288*
2 615.43480 0.00038 -23.650400* -22.260491 -23.123070
3 624.97613 0.26440 -23.386781 -21.379135 -22.625083
4 636.03766 0.13926 -23.185210 -20.559827 -22.189144
5 658.36014 0.00016 -23.443271 -20.200150 -22.212836
6 669.88472 0.11243 -23.260601 -19.399743 -21.795797
Warning: in finite samples the choice of the maximum lag, p, may affect the outcome of the procedure. This is not a bug, but rather an unavoidable side effect of the way these comparisons should be made. If your sample contains T observations and you invoke the lag selection procedure with maximum order p, gretl examines all VARs of order ranging from 1 to p, estimated on a uniform sample of T − p observations. In other words, the comparison procedure does not use all the available data when estimating VARs of order less than p, so as to ensure that all the models in the comparison are estimated on the same data range. Choosing a different value of p may therefore alter the results, although this is unlikely to happen if your sample size is reasonably large.
An example of this unpleasant phenomenon is given in example script 32.2. As can be seen, according
to the Hannan-Quinn criterion, order 2 seems preferable to order 1 if the maximum tested order is 4,
but the situation is reversed if the maximum tested order is 6.
32.3 Structural VARs
Gretl’s built-in var command does not support the general class of models known as “Structural
VARs”—though it does support the Cholesky decomposition-based approach, the classic and most
popular structural VAR variant. If you wish to go beyond that there is a gretl “addon” named SVAR
which will likely meet your needs. SVAR is supplied as part of the gretl package; you can find its documentation (which is quite detailed) as follows: under the Tools menu in the gretl main window,
go to “Function packages/On local machine.” (Or use the fx button on the toolbar at the foot of
the main window.) In the function packages window either scroll down or use the search box to find
SVAR. Then right-click and select “Info.” This opens a window which gives basic information on the
package, including a link to SVAR.pdf, the full documentation.
The remainder of this section will thus only deal with the Cholesky-based recursive shock identification
used by the native var command.
IRF and FEVD
Assume that the disturbance in equation (32.1) can be thought of as a linear function of a vector of structural shocks u_t, which are assumed to have unit variance and to be mutually uncorrelated, so V(u_t) = I. If ϵ_t = K u_t, it follows that Σ = V(ϵ_t) = KK′.
The main object of interest in this setting is the sequence of matrices
\[
C_k = \frac{\partial y_t}{\partial u_{t-k}} = \Theta_k K, \tag{32.6}
\]
known as the structural VMA representation. From the C_k matrices defined in equation (32.6) two
quantities of interest may be derived: the Impulse Response Function (IRF) and the Forecast Error
Variance Decomposition (FEVD).
The IRF of variable i to shock j is simply the sequence of the elements in row i and column j of the C_k matrices. In symbols:
\[
\mathcal{I}_{i,j,k} = \frac{\partial y_{i,t}}{\partial u_{j,t-k}}
\]
As a rule, Impulse Response Functions are plotted as a function of k, and are interpreted as the effect that a shock has on an observable variable through time. Of course, what we observe are the estimated IRFs, so it is natural to endow them with confidence intervals: following common practice, gretl computes the confidence intervals by using the bootstrap;2 details are given later in this section.
Another quantity of interest that may be computed from the structural VMA representation is the Forecast Error Variance Decomposition (FEVD). The forecast error variance after h steps is given by
\[
\Omega_h = \sum_{k=0}^{h} C_k C_k'
\]
2It is possible, in principle, to compute analytical confidence intervals via an asymptotic approximation, but this is
not a very popular choice: asymptotic formulae are known to often give a very poor approximation of the finite-sample
properties.
hence the variance for variable i is
\[
\omega_i^2 = [\Omega_h]_{i,i} = \sum_{k=0}^{h} \mathrm{diag}(C_k C_k')_i = \sum_{k=0}^{h} \sum_{l=1}^{n} ({}_k c_{i.l})^2
\]
where {}_k c_{i.l} is, trivially, the i, l element of C_k. As a consequence, the share of uncertainty on variable i that can be attributed to the j-th shock after h periods equals
\[
\mathrm{VD}_{i,j,h} = \frac{\sum_{k=0}^{h} ({}_k c_{i.j})^2}{\sum_{k=0}^{h} \sum_{l=1}^{n} ({}_k c_{i.l})^2} .
\]
This makes it possible to quantify which shocks are most important in determining a certain variable in the short and/or in the long run.
Triangularization
Formula (32.6) takes K as known, while of course it has to be estimated. The estimation problem has been the subject of an enormous body of literature, which we will not even attempt to summarize here: see for example Lütkepohl (2005, chapter 9).
Suffice it to say that the most popular choice dates back to Sims (1980), and consists in assuming that K is lower triangular, so its estimate is simply the Cholesky decomposition of the estimate of Σ. The main consequence of this choice is that the ordering of variables within the vector y_t becomes meaningful: since K is also the matrix of Impulse Response Functions at lag 0, the triangularity assumption means that the first variable in the ordering responds instantaneously only to shock number 1, the second one only to shocks 1 and 2, and so forth. For this reason, each variable is thought to “own” one shock: variable 1 owns shock number 1, and so on.
In this sort of exercise, therefore, the ordering of the y variables is important. To put it differently, if variable foo comes before variable bar in the Ylist, it follows that the shock owned by foo affects bar instantaneously, but not vice versa.
Impulse Response Functions and the FEVD can be printed out via the command line interface by
using the --impulse-responses and --variance-decomp options, respectively. If you need to store
them into matrices, you could compute the structural VMA and proceed from there. For example,
the following code snippet shows you how to manually compute a matrix containing the IRFs:
open denmark
list Y = 1 2 3 4
scalar n = nelem(Y)
var 2 Y --quiet --impulse-responses
matrix K = cholesky($sigma)
matrix V = $vma
matrix IRF = V * (K ** I(n))
print IRF
in which the equality
\[
\mathrm{vec}(C_k) = \mathrm{vec}(\Theta_k K) = (K' \otimes I)\, \mathrm{vec}(\Theta_k)
\]
was used.
A more convenient way of obtaining the desired quantities is to use the irf and fevd functions
which can be used in scripts after a VAR (or VECM, see the next chapter) has been estimated. In
these functions you must specify the number of the responding (target) variable and the number of
the analyzed shock to get the corresponding results as a column vector. The choice of how many
periods should be calculated (and thus how long the result vector will be) is determined by previously
invoking set horizon x, where x is a non-negative integer and the first response concerns the impact
effect. As always, it is recommended to consult the function reference under the help menu, where in
the case of the irf function it is also explained that the implicit shock size is such that the impact
response in the same equation is one standard deviation (of the corresponding error term).
IRF bootstrap
The IRFs obtained above are estimates and as such they are uncertain. Mostly because they are nonlinear functions of the VAR parameters, the standard way of assessing this estimation uncertainty and deriving confidence intervals or bands is to use a bootstrap approach. Again, more
advanced options are available with the SVAR addon, but the irf function used after the built-in
var (or vecm) command also provides the option to run a bootstrap based on resampling from the
residuals. (The number of bootstrap iterations can be adjusted through set boot_iters x, where
x must be larger than 499.) The desired nominal confidence level must be specified after the target
and shock numbers as the third argument, and in that case the return vector becomes a three-column
matrix where the lower and upper bounds of the confidence intervals are given in the extra two
columns.
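A minimal sketch, assuming a suitable list Y of endogenous series has been defined (the settings shown are purely illustrative):

set horizon 20
set boot_iters 999    # any value larger than 499
var 2 Y
# 95 percent bootstrap interval for the response of variable 1 to shock 2
matrix ci = irf(1, 2, 0.95)
print ci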
Menu-driven usage
Almost all the functionality related to the described (recursively identified) structural VARs is also
available under the menus in the model window that appears after a VAR is estimated in the GUI.3
In the “Plots” menu there are a number of menu entries relating to the impulse responses as well
as one entry for the forecast error variance decomposition. Selecting any of these will bring up a
little specification window where the ordering for the Cholesky decomposition must be chosen,
and in case of IRFs the intended bootstrap coverage can be set.
In the “Analysis” menu there are also entries for IRF and FEVD, which may sometimes be a
little confusing. The point is that here the numbers (of the point estimates) will be printed out
in a tabular format instead of being plotted.
32.4 Residual-based diagnostic tests
Three diagnostic tests based on residuals are available after estimating a VAR—for normality, auto-
correlation and ARCH (Autoregressive Conditional Heteroskedasticity). These are implemented by
the modtest command, using the options --normality,--autocorr and --arch, respectively.
The (multivariate) normality test is that of Doornik and Hansen (1994); it is based on the skewness
and kurtosis of the VAR residuals.
The autocorrelation and ARCH tests are also by default multivariate; they are described in detail by Lütkepohl (2005) (see sections 4.4.4 and 16.5.1). Both tests are of the LM type, although the autocorrelation test statistic is referred to a Rao F distribution (Rao, 1973). These tests may involve
estimation of a large number of parameters, depending on the lag horizon chosen, and can fail for
lack of degrees of freedom in small samples. As a fallback, the --univariate option can be used to
specify that the tests be run per-equation rather than in multivariate mode.
Listing 32.3 illustrates the VAR autocorrelation tests, replicating an example given by Lütkepohl (2005, p. 174). Note the difference in the interpretation of the order argument to modtest with the
(2005, p. 174). Note the difference in the interpretation of the order argument to modtest with the
--autocorr option (this also applies to the ARCH test): in the multivariate version order is taken
as the maximum lag order and tests are run from lag 1 up to the maximum; but in the univariate
version a single test is run for each equation using just the specified lag order. The example also
exposes what exactly is returned by the $test and $pvalue accessors in the two variants.
3Note that you cannot directly invoke the SVAR addon from the model window of an estimated VAR; that menu
entry is only present in gretl’s main window under the Model menu and multivariate time series sub-menu.
Listing 32.3: VAR autocorrelation test from Lütkepohl [Download]
open wgmacro.gdt --quiet
list Y = investment income consumption
list dlnY = ldiff(Y)
smpl 1960:4 1978:4
var 2 dlnY
modtest 4 --autocorr
eval $test ~ $pvalue
modtest 4 --autocorr --univariate
eval $test ~ $pvalue
Output from tests:
? modtest 4 --autocorr
Test for autocorrelation of order up to 4
Rao F Approx dist. p-value
lag 1 0.615 F(9, 148) 0.7827
lag 2 0.754 F(18, 164) 0.7507
lag 3 1.143 F(27, 161) 0.2982
lag 4 1.254 F(36, 154) 0.1743
? eval $test ~ $pvalue
0.61524 0.78269
0.75397 0.75067
1.1429 0.29820
1.2544 0.17431
? modtest 4 --autocorr --univariate
Test for autocorrelation of order 4
Equation 1:
Ljung-Box Q’ = 6.11506 with p-value = P(Chi-square(4) > 6.11506) = 0.191
Equation 2:
Ljung-Box Q’ = 1.67136 with p-value = P(Chi-square(4) > 1.67136) = 0.796
Equation 3:
Ljung-Box Q’ = 1.59931 with p-value = P(Chi-square(4) > 1.59931) = 0.809
? eval $test ~ $pvalue
6.1151 0.19072
1.6714 0.79591
1.5993 0.80892
Chapter 33
Cointegration and Vector Error Correction Models
33.1 Introduction
The twin concepts of cointegration and error correction have drawn a good deal of attention in
macroeconometrics over recent years. The attraction of the Vector Error Correction Model (VECM)
is that it allows the researcher to embed a representation of economic equilibrium relationships within
a relatively rich time-series specification. This approach overcomes the old dichotomy between (a)
structural models that faithfully represented macroeconomic theory but failed to fit the data, and (b)
time-series models that were accurately tailored to the data but difficult if not impossible to interpret
in economic terms.
The basic idea of cointegration relates closely to the concept of unit roots (see section 31.3). Suppose
we have a set of macroeconomic variables of interest, and we find we cannot reject the hypothesis
that some of these variables, considered individually, are non-stationary. Specifically, suppose we
judge that a subset of the variables are individually integrated of order 1, or I(1). That is, while they
are non-stationary in their levels, their first differences are stationary. Given the statistical problems
associated with the analysis of non-stationary data (for example, the threat of spurious regression),
the traditional approach in this case was to take first differences of all the variables before proceeding
with the analysis.
But this can result in the loss of important information. It may be that while the variables in question
are I(1) when taken individually, there exists a linear combination of the variables that is stationary
without differencing, or I(0). (There could be more than one such linear combination.) That is,
while the ensemble of variables may be “free to wander” over time, nonetheless the variables are “tied
together” in certain ways. And it may be possible to interpret these ties, or cointegrating vectors, as
representing equilibrium conditions.
For example, suppose we find some or all of the following variables are I(1): money stock, M, the
price level, P, the nominal interest rate, R, and output, Y. According to standard theories of the
demand for money, we would nonetheless expect there to be an equilibrium relationship between real
balances, interest rate and output; for example
\[
m - p = \gamma_0 + \gamma_1 y + \gamma_2 r \qquad \gamma_1 > 0,\ \gamma_2 < 0
\]
where lower-case variable names denote logs. In equilibrium, then,
\[
m - p - \gamma_1 y - \gamma_2 r = \gamma_0
\]
Realistically, we should not expect this condition to be satisfied each period. We need to allow for the possibility of short-run disequilibrium. But if the system moves back towards equilibrium following a disturbance, it follows that the vector x = (m, p, y, r)′ is bound by a cointegrating vector β = (β_1, β_2, β_3, β_4)′, such that β′x is stationary (with a mean of γ_0). Furthermore, if equilibrium is correctly characterized by the simple model above, we have β_2 = −β_1, β_3 < 0 and β_4 > 0. These
things are testable within the context of cointegration analysis.
There are typically three steps in this sort of analysis:
1. Test to determine the number of cointegrating vectors, the cointegrating rank of the system.
2. Estimate a VECM with the appropriate rank, but subject to no further restrictions.
3. Probe the interpretation of the cointegrating vectors as equilibrium conditions by means of
restrictions on the elements of these vectors.
The following sections expand on each of these points, giving further econometric details and explain-
ing how to implement the analysis using gretl.
33.2 Vector Error Correction Models as representation of a cointegrated system
Consider a VAR of order p with a deterministic part given by µ_t (typically, a polynomial in time). One can write the n-variate process y_t as
\[
y_t = \mu_t + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \epsilon_t \tag{33.1}
\]
But since y_{t-i} = y_{t-1} - (\Delta y_{t-1} + \Delta y_{t-2} + \cdots + \Delta y_{t-i+1}), we can re-write the above as
\[
\Delta y_t = \mu_t + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t , \tag{33.2}
\]
where \Pi = \sum_{i=1}^{p} A_i - I and \Gamma_i = -\sum_{j=i+1}^{p} A_j. This is the VECM representation of (33.1).
The interpretation of (33.2) depends crucially on r, the rank of the matrix Π.
If r = 0, the processes are all I(1) and not cointegrated.
If r = n, then Π is invertible and the processes are all I(0).
Cointegration occurs in between, when 0 < r < n and Π can be written as αβ′. In this case, y_t is I(1), but the combination z_t = β′y_t is I(0). If, for example, r = 1 and the first element of β was −1, then one could write z_t = −y_{1,t} + β_2 y_{2,t} + ··· + β_n y_{n,t}, which is equivalent to saying that
\[
y_{1,t} = \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t} - z_t
\]
is a long-run equilibrium relationship: the deviations z_t may not be 0 but they are stationary. In this case, (33.2) can be written as
\[
\Delta y_t = \mu_t + \alpha \beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t . \tag{33.3}
\]
If β were known, then z_t would be observable and all the remaining parameters could be estimated via OLS. In practice, the procedure estimates β first and then the rest.
The rank of Π is investigated by computing the eigenvalues of a closely related matrix whose rank
is the same as Π: however, this matrix is by construction symmetric and positive semidefinite. As a
consequence, all its eigenvalues are real and non-negative, and tests on the rank of Π can therefore
be carried out by testing how many eigenvalues are 0.
If all the eigenvalues are significantly different from 0, then all the processes are stationary. If, on
the contrary, there is at least one zero eigenvalue, then the y_t process is integrated, although some linear combination β′y_t might be stationary. At the other extreme, if no eigenvalues are significantly different from 0, then not only is the process y_t non-stationary, but the same holds for any linear combination β′y_t; in other words, no cointegration occurs.
Estimation typically proceeds in two stages: first, a sequence of tests is run to determine r, the
cointegration rank. Then, for a given rank the parameters in equation (33.3) are estimated. The two
commands that gretl offers for estimating these systems are johansen and vecm, respectively.
The syntax for johansen is
johansen p ylist [ ; xlist [ ; zlist ] ]
where p is the number of lags in (33.1); ylist is a list containing the y_t variables; xlist is an optional
list of exogenous variables; and zlist is another optional list of exogenous variables whose effects are
assumed to be confined to the cointegrating relationships.
The syntax for vecm is
vecm p r ylist [ ; xlist [ ; zlist ] ]
where p is the number of lags in (33.1); r is the cointegration rank; and the lists ylist, xlist and
zlist have the same interpretation as in johansen.
Both commands can be given specific options to handle the treatment of the deterministic component
µt. These are discussed in the following section.
33.3 Interpretation of the deterministic components
Statistical inference in the context of a cointegrated system depends on the hypotheses one is willing
to make on the deterministic terms, which leads to the famous “five cases.”
In equation (33.2), the term µ_t is usually understood to take the form
\[
\mu_t = \mu_0 + \mu_1 \cdot t .
\]
In order to have the model mimic as closely as possible the features of the observed data, there is a preliminary question to settle. Do the data appear to follow a deterministic trend? If so, is it linear or quadratic?
Once this is established, one should impose restrictions on µ_0 and µ_1 that are consistent with this judgement. For example, suppose that the data do not exhibit a discernible trend. This means that ∆y_t is on average zero, so it is reasonable to assume that its expected value is also zero. Write equation (33.2) as
\[
\Gamma(L) \Delta y_t = \mu_0 + \mu_1 \cdot t + \alpha z_{t-1} + \epsilon_t , \tag{33.4}
\]
where z_t = β′y_t is assumed to be stationary and therefore to possess finite moments. Taking unconditional expectations, we get
\[
0 = \mu_0 + \mu_1 \cdot t + \alpha m_z .
\]
Since the left-hand side does not depend on t, the restriction µ_1 = 0 is a safe bet. As for µ_0, there are just two ways to make the above expression true: either µ_0 = 0 with m_z = 0, or µ_0 equals −α m_z. The latter possibility is less restrictive in that the vector µ_0 may be non-zero, but is constrained to be a linear combination of the columns of α. In that case, µ_0 can be written as α · c, and one may write (33.4) as
\[
\Gamma(L) \Delta y_t = \alpha \begin{bmatrix} \beta' & c \end{bmatrix} \begin{bmatrix} y_{t-1} \\ 1 \end{bmatrix} + \epsilon_t .
\]
The long-run relationship therefore contains an intercept. This type of restriction is usually written
\[
\alpha_{\perp}' \mu_0 = 0 ,
\]
where α_⊥ is the left null space of the matrix α.
An intuitive understanding of the issue can be gained by means of a simple example. Consider a series x_t which behaves as follows
\[
x_t = m + x_{t-1} + \varepsilon_t
\]
where m is a real number and ε_t is a white noise process: x_t is then a random walk with drift m. In the special case m = 0, the drift disappears and x_t is a pure random walk.
Consider now another process y_t, defined by
\[
y_t = k + x_t + u_t
\]
where, again, k is a real number and u_t is a white noise process. Since u_t is stationary by definition, x_t and y_t cointegrate: that is, their difference
\[
z_t = y_t - x_t = k + u_t
\]
is a stationary process. For k = 0, z_t is simple zero-mean white noise, whereas for k ≠ 0 the process z_t is white noise with a non-zero mean.
After some simple substitutions, the two equations above can be represented jointly as a VAR(1) system
\[
\begin{bmatrix} y_t \\ x_t \end{bmatrix}
= \begin{bmatrix} k+m \\ m \end{bmatrix}
+ \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix}
+ \begin{bmatrix} u_t + \varepsilon_t \\ \varepsilon_t \end{bmatrix}
\]
or in VECM form
\[
\begin{bmatrix} \Delta y_t \\ \Delta x_t \end{bmatrix}
= \begin{bmatrix} k+m \\ m \end{bmatrix}
+ \begin{bmatrix} -1 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix}
+ \begin{bmatrix} u_t + \varepsilon_t \\ \varepsilon_t \end{bmatrix}
= \begin{bmatrix} k+m \\ m \end{bmatrix}
+ \begin{bmatrix} -1 \\ 0 \end{bmatrix}
\begin{bmatrix} 1 & -1 \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix}
+ \begin{bmatrix} u_t + \varepsilon_t \\ \varepsilon_t \end{bmatrix}
= \mu_0 + \alpha \beta' \begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix} + \eta_t
= \mu_0 + \alpha z_{t-1} + \eta_t ,
\]
where β is the cointegration vector and α is the “loadings” or “adjustments” vector.
We are now ready to consider three possible cases:
1. m ≠ 0: In this case x_t is trended, as we just saw; it follows that y_t also follows a linear trend because on average it keeps at a fixed distance k from x_t. The vector µ_0 is unrestricted.
2. m = 0 and k ≠ 0: In this case, x_t is not trended and as a consequence neither is y_t. However, the mean distance between y_t and x_t is non-zero. The vector µ_0 is given by
\[
\mu_0 = \begin{bmatrix} k \\ 0 \end{bmatrix}
\]
which is not null and therefore the VECM shown above does have a constant term. The constant, however, is subject to the restriction that its second element must be 0. More generally, µ_0 is a multiple of the vector α. Note that the VECM could also be written as
\[
\begin{bmatrix} \Delta y_t \\ \Delta x_t \end{bmatrix}
= \begin{bmatrix} -1 \\ 0 \end{bmatrix}
\begin{bmatrix} 1 & -1 & -k \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ x_{t-1} \\ 1 \end{bmatrix}
+ \begin{bmatrix} u_t + \varepsilon_t \\ \varepsilon_t \end{bmatrix}
\]
which incorporates the intercept into the cointegration vector. This is known as the “restricted constant” case.
3. m = 0 and k = 0: This case is the most restrictive: clearly, neither x_t nor y_t are trended, and the mean distance between them is zero. The vector µ_0 is also 0, which explains why this case is referred to as “no constant.”
In most cases, the choice between these three possibilities is based on a mix of empirical observation
and economic reasoning. If the variables under consideration seem to follow a linear trend then we
should not place any restriction on the intercept. Otherwise, the question arises of whether it makes
sense to specify a cointegration relationship which includes a non-zero intercept. One example where
this is appropriate is the relationship between two interest rates: generally these are not trended,
but the VAR might still have an intercept because the difference between the two (the “interest rate
spread”) might be stationary around a non-zero mean (for example, because of a risk or liquidity
premium).
The previous example can be generalized in three directions:
1. If a VAR of order greater than 1 is considered, the algebra gets more convoluted but the
conclusions are identical.
2. If the VAR includes more than two endogenous variables the cointegration rank r can be greater than 1. In this case, α is a matrix with r columns, and the case with restricted constant entails the restriction that µ_0 should be some linear combination of the columns of α.
3. If a linear trend is included in the model, the deterministic part of the VAR becomes µ_0 + µ_1 t. The reasoning is practically the same as above except that the focus now centers on µ_1 rather than µ_0. The counterpart to the “restricted constant” case discussed above is a “restricted trend”
case, such that the cointegration relationships include a trend but the first differences of the
variables in question do not. In the case of an unrestricted trend, the trend appears in both
the cointegration relationships and the first differences, which corresponds to the presence of a
quadratic trend in the variables themselves (in levels).
In order to accommodate the five cases, gretl provides the following options to the johansen and
vecm commands:
µ_t                              option flag   description
0                                --nc          no constant
µ_0, α_⊥′ µ_0 = 0                --rc          restricted constant
µ_0                              --uc          unrestricted constant
µ_0 + µ_1 t, α_⊥′ µ_1 = 0        --crt         constant + restricted trend
µ_0 + µ_1 t                      --ct          constant + unrestricted trend
Note that for this command the above options are mutually exclusive. In addition, you have the option of using the --seasonals option, for augmenting µ_t with centered seasonal dummies. In
each case, p-values are computed via the approximations devised by Doornik (1998).
33.4 The Johansen cointegration tests
The two Johansen tests for cointegration are used to establish the rank of β, or in other words the number of cointegrating vectors. These are the “λ-max” test, for hypotheses on individual eigenvalues, and the “trace” test, for joint hypotheses. Suppose that the eigenvalues λ_i are sorted from largest to smallest. The null hypothesis for the λ-max test on the i-th eigenvalue is that λ_i = 0. The corresponding trace test, instead, considers the hypothesis λ_j = 0 for all j ≥ i.
The gretl command johansen performs these two tests. The corresponding menu entry in the GUI
is “Model, Time Series, Cointegration Test, Johansen”.
As in the ADF test, the asymptotic distribution of the tests varies with the deterministic component µ_t one includes in the VAR (see section 33.3 above). The following code uses the denmark data file, supplied with gretl, to replicate Johansen’s example found in his 1995 book.
supplied with gretl, to replicate Johansen’s example found in his 1995 book.
open denmark
johansen 2 LRM LRY IBO IDE --rc --seasonals
In this case, the vector y_t in equation (33.2) comprises the four variables LRM, LRY, IBO, IDE. The number of lags equals p in (33.2) (that is, the number of lags of the model written in VAR form).
Part of the output is reported below:
Johansen test:
Number of equations = 4
Lag order = 2
Estimation period: 1974:3 - 1987:3 (T = 53)
Case 2: Restricted constant
Rank Eigenvalue Trace test p-value Lmax test p-value
0 0.43317 49.144 [0.1284] 30.087 [0.0286]
1 0.17758 19.057 [0.7833] 10.362 [0.8017]
2 0.11279 8.6950 [0.7645] 6.3427 [0.7483]
3 0.043411 2.3522 [0.7088] 2.3522 [0.7076]
Both the trace and λ-max tests accept the null hypothesis that the smallest eigenvalue is 0 (see the
last row of the table), so we may conclude that the series are in fact non-stationary. However, some
linear combination may be I(0), since the λ-max test rejects the hypothesis that the rank of Π is 0
(though the trace test gives less clear-cut evidence for this, with a p-value of 0.1284).
33.5 Identification of the cointegration vectors
The core problem in the estimation of equation (33.2) is to find an estimate of Π that has by construction rank r, so it can be written as Π = αβ′, where β is the matrix containing the cointegration vectors and α contains the “adjustment” or “loading” coefficients whereby the endogenous variables respond to deviation from equilibrium in the previous period.
Without further specification, the problem has multiple solutions (in fact, infinitely many). The parameters α and β are under-identified: if all columns of β are cointegration vectors, then any arbitrary linear combination of those columns is a cointegration vector too. To put it differently, if Π = α_0 β_0′ for specific matrices α_0 and β_0, then Π also equals (α_0 Q)(Q^{-1} β_0′) for any conformable non-singular matrix Q. In order to find a unique solution, it is therefore necessary to impose some restrictions on α and/or β. It can be shown that the minimum number of restrictions that is necessary to guarantee identification is r². Normalizing one coefficient per column to 1 (or −1, according to taste) is a trivial first step, which also helps in that the remaining coefficients can be interpreted as the parameters in the equilibrium relations, but this only suffices when r = 1.
The method that gretl uses by default is known as the “Phillips normalization”, or “triangular representation”.1 The starting point is writing β in partitioned form as in
\[
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
\]
where β_1 is an r × r matrix and β_2 is (n − r) × r. Assuming that β_1 has full rank, β can be post-multiplied by β_1^{-1}, giving
\[
\hat{\beta} = \begin{bmatrix} I \\ \beta_2 \beta_1^{-1} \end{bmatrix} = \begin{bmatrix} I \\ B \end{bmatrix} .
\]
The coefficients that gretl produces are β̂, with B known as the matrix of unrestricted coefficients. In terms of the underlying equilibrium relationship, the Phillips normalization expresses the system of r equilibrium relations as
\[
\begin{aligned}
y_{1,t} &= b_{1,r+1} y_{r+1,t} + \ldots + b_{1,n} y_{n,t} \\
y_{2,t} &= b_{2,r+1} y_{r+1,t} + \ldots + b_{2,n} y_{n,t} \\
&\;\;\vdots \\
y_{r,t} &= b_{r,r+1} y_{r+1,t} + \ldots + b_{r,n} y_{n,t}
\end{aligned}
\]
where the first r variables are expressed as functions of the remaining n − r.
Although the triangular representation ensures that the statistical problem of estimating β is solved, the resulting equilibrium relationships may be difficult to interpret. In this case, the user may want to achieve identification by manually specifying the system of r² constraints that gretl will use to produce an estimate of β.

As an example, consider the money demand system presented in section 9.6 of Verbeek (2004). The variables used are m (the log of the real money stock M1), infl (inflation), cpr (the commercial paper rate), y (the log of real GDP) and tbr (the Treasury bill rate).2

Estimation of β can be performed via the commands
open money.gdt
smpl 1954:1 1994:4
vecm 6 2 m infl cpr y tbr --rc
and the relevant portion of the output reads
Maximum likelihood estimates, observations 1954:1-1994:4 (T = 164)
Cointegration rank = 2
Case 2: Restricted constant
beta (cointegrating vectors, standard errors in parentheses)
1 For comparison with other studies, you may wish to normalize β differently. Using the set command you can do set vecm_norm diag to select a normalization that simply scales the columns of the original β such that β_ij = 1 for i = j and i ≤ r, as used in the empirical section of Boswijk and Doornik (2004). Another alternative is set vecm_norm first, which scales β such that the elements on the first row equal 1. To suppress normalization altogether, use set vecm_norm none. (To return to the default: set vecm_norm phillips.)
2 This data set is available in the verbeek data package; see http://gretl.sourceforge.net/gretl_data.html.
m 1.0000 0.0000
(0.0000) (0.0000)
infl 0.0000 1.0000
(0.0000) (0.0000)
cpr 0.56108 -24.367
(0.10638) (4.2113)
y -0.40446 -0.91166
(0.10277) (4.0683)
tbr -0.54293 24.786
(0.10962) (4.3394)
const -3.7483 16.751
(0.78082) (30.909)
Interpretation of the coefficients of the cointegration matrix β would be easier if a meaning could be attached to each of its columns. This is possible by hypothesizing the existence of two long-run relationships: a money demand equation

    m = c1 + β1 infl + β2 y + β3 tbr

and a risk premium equation

    cpr = c2 + β4 infl + β5 y + β6 tbr

which imply that the cointegration matrix can be normalized as

    β = [ −1    0 ]
        [ β1   β4 ]
        [  0   −1 ]
        [ β2   β5 ]
        [ β3   β6 ]
        [ c1   c2 ]
This renormalization can be accomplished by means of the restrict command, to be given after the vecm command or, in the graphical interface, by selecting the “Test, Linear Restrictions” menu entry. The syntax for entering the restrictions should be fairly obvious:3
restrict
b[1,1] = -1
b[1,3] = 0
b[2,1] = 0
b[2,3] = -1
end restrict
which produces
Cointegrating vectors (standard errors in parentheses)
m -1.0000 0.0000
(0.0000) (0.0000)
infl -0.023026 0.041039
(0.0054666) (0.027790)
cpr 0.0000 -1.0000
(0.0000) (0.0000)
y 0.42545 -0.037414
(0.033718) (0.17140)
tbr -0.027790 1.0172
(0.0045445) (0.023102)
const 3.3625 0.68744
(0.25318) (1.2870)
3 Note that in this context we are bending the usual matrix indexation convention, using the leading index to refer to the column of β (the particular cointegrating vector). This is standard practice in the literature, and defensible insofar as it is the columns of β (the cointegrating relations or equilibrium errors) that are of primary interest.
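If the (restricted) cointegration vectors and loadings are needed for further computations they can be retrieved as matrices; a minimal sketch, assuming the $jbeta and $jalpha accessors that gretl makes available after vecm estimation:

matrix B = $jbeta    # cointegrating vectors, one per column
matrix A = $jalpha   # adjustment ("loading") coefficients
print B
print A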
33.6 Over-identifying restrictions
One purpose of imposing restrictions on a VECM system is simply to achieve identification. If
these restrictions are simply normalizations, they are not testable and should have no effect on the
maximized likelihood. In addition, however, one may wish to formulate constraints on β and/or α that
derive from the economic theory underlying the equilibrium relationships; substantive restrictions of
this sort are then testable via a likelihood-ratio statistic.
Gretl is capable of testing general linear restrictions of the form

    Rb vec(β) = q                                   (33.5)

and/or

    Ra vec(α) = 0                                   (33.6)

Note that the β restriction may be non-homogeneous (q ≠ 0) but the α restriction must be homogeneous. Nonlinear restrictions are not supported, and neither are restrictions that cross between β and α. When r > 1, such restrictions may be in common across all the columns of β (or α) or may be specific to certain columns of these matrices. For useful discussions of this point see Boswijk (1995) and Boswijk and Doornik (2004), section 4.4.

The restrictions (33.5) and (33.6) may be written in explicit form as

    vec(β) = Hϕ + h0                                (33.7)

and

    vec(α) = Gψ                                     (33.8)

respectively, where ϕ and ψ are the free parameter vectors associated with β and α respectively. We may refer to the free parameters collectively as θ (the column vector formed by concatenating ϕ and ψ). Gretl uses this representation internally when testing the restrictions.
If the list of restrictions that is passed to the restrict command contains more constraints than
necessary to achieve identification, then an LR test is performed. In addition, the restrict command
can be given the --full switch, in which case full estimates for the restricted system are printed
(including the Γi terms) and the system thus restricted becomes the “current model” for the purposes
of further tests. Thus you are able to carry out cumulative tests, as in Chapter 7 of Johansen (1995).
Syntax
The full syntax for specifying the restriction is an extension of that exemplified in the previous section.
Inside a restrict...end restrict block, valid statements are of the form
    parameter linear combination = scalar

where a parameter linear combination involves a weighted sum of individual elements of β or α (but not both in the same combination); the scalar on the right-hand side must be 0 for combinations involving α, but can be any real number for combinations involving β. Below, we give a few examples
of valid restrictions:
b[1,1] = 1.618
b[1,4] + 2*b[2,5] = 0
a[1,3] = 0
a[1,1] - a[1,2] = 0
Special syntax is used when a certain constraint should be applied to all columns of β: in this case, one index is given for each b term, and the square brackets are dropped. Hence, the following syntax
restrict
b1 + b2 = 0
end restrict
corresponds to

    β = [  β11    β21 ]
        [ −β11   −β21 ]
        [  β13    β23 ]
        [  β14    β24 ]
The same convention is used for α: when only one index is given for an a term the restriction is presumed to apply to all r columns of α, or in other words the variable associated with the given row of α is weakly exogenous. For instance, the formulation
restrict
a3 = 0
a4 = 0
end restrict
specifies that variables 3 and 4 do not respond to the deviation from equilibrium in the previous
period.4
A variant on the single-index syntax for common restrictions on α and β is available: you can replace the index number with the name of the corresponding variable, in square brackets. For example, instead of a3 = 0 one could write a[cpr] = 0, if the third variable in the system is named cpr.
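For instance, returning to the Verbeek money demand system estimated above, one could test whether the two interest rates are weakly exogenous using the variable-name form; a sketch, to be run after the vecm command of section 33.5:

restrict
a[cpr] = 0
a[tbr] = 0
end restrict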
Finally, a shortcut (or at any rate an alternative) is available for setting up complex restrictions (but currently only in relation to β): you can specify Rb and q, as in Rb vec(β) = q, by giving the names of previously defined matrices. For example,
matrix I4 = I(4)
matrix vR = I4**(I4~zeros(4,1))
matrix vq = mshape(I4,16,1)
restrict
R = vR
q = vq
end restrict
This manually imposes the Phillips normalization on the β estimates for a system with cointegrating rank 4.
There are two points to note in relation to this option. First, vec(β) is taken to include the coefficients
on all terms within the cointegration space, including the restricted constant or trend, if any, as well
as any restricted exogenous variables. Second, it is acceptable to give an R matrix with a number of columns equal to the number of rows of β; this variant is taken to specify a restriction that is in common across all the columns of β.
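As an illustration, the four restrictions imposed on the Verbeek money demand system in section 33.5 could also be written in this R/q form. The sketch below assumes that vec(β) stacks the two columns of the 6 × 2 β matrix (including the restricted constant), so the restricted elements occupy positions 1, 3, 7 and 9 of vec(β):

matrix vR = zeros(4, 12)
vR[1,1] = 1   # beta[1,1]: coefficient on m in the first vector
vR[2,3] = 1   # beta[1,3]: coefficient on cpr in the first vector
vR[3,7] = 1   # beta[2,1]: coefficient on m in the second vector
vR[4,9] = 1   # beta[2,3]: coefficient on cpr in the second vector
matrix vq = {-1; 0; 0; -1}
restrict
R = vR
q = vq
end restrict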
An example
Brand and Cassola (2004) propose a money demand system for the Euro area, in which they postulate
three long-run equilibrium relationships:
    money demand                   m = β_l l + β_y y
    Fisher equation                π = ϕ l
    expectation theory of
    interest rates                 l = s

where m is real money demand, l and s are long- and short-term interest rates, y is output and π is inflation.5 (The names for these variables in the gretl data file are m_p, rl, rs, y and infl, respectively.)
4 Note that when two indices are given in a restriction on α the indexation is consistent with that for β restrictions: the leading index denotes the cointegrating vector and the trailing index the equation number.
5 A traditional formulation of the Fisher equation would reverse the roles of the variables in the second equation, but this detail is immaterial in the present context; moreover, the expectation theory of interest rates implies that the third equilibrium relationship should include a constant for the liquidity premium. However, since in this example the system is estimated with the constant term unrestricted, the liquidity premium gets absorbed into the system intercept and disappears from z_t.
The cointegration rank assumed by the authors is 3 and there are 5 variables, giving 15 elements in the β matrix. 3 × 3 = 9 restrictions are required for identification, and a just-identified system would have 15 − 9 = 6 free parameters. However, the postulated long-run relationships feature only three free parameters, so the over-identification rank is 3.
Listing 33.1: Estimation of a money demand system with constraints on β[Download ]
open brand_cassola.gdt
# perform a few transformations
m_p = m_p*100
y = y*100
infl = infl/4
rs = rs/4
rl = rl/4
# replicate table 4, page 824
vecm 2 3 m_p infl rl rs y -q
ll0 = $lnl
restrict --full
b[1,1] = 1
b[1,2] = 0
b[1,4] = 0
b[2,1] = 0
b[2,2] = 1
b[2,4] = 0
b[2,5] = 0
b[3,1] = 0
b[3,2] = 0
b[3,3] = 1
b[3,4] = -1
b[3,5] = 0
end restrict
ll1 = $rlnl
Partial output:
Unrestricted loglikelihood (lu) = 116.60268
Restricted loglikelihood (lr) = 115.86451
2 * (lu - lr) = 1.47635
P(Chi-Square(3) > 1.47635) = 0.68774
beta (cointegrating vectors, standard errors in parentheses)
m_p 1.0000 0.0000 0.0000
(0.0000) (0.0000) (0.0000)
infl 0.0000 1.0000 0.0000
(0.0000) (0.0000) (0.0000)
rl 1.6108 -0.67100 1.0000
(0.62752) (0.049482) (0.0000)
rs 0.0000 0.0000 -1.0000
(0.0000) (0.0000) (0.0000)
y -1.3304 0.0000 0.0000
(0.030533) (0.0000) (0.0000)
Listing 33.1 replicates Table 4 on page 824 of the Brand and Cassola article.6 Note that we use the $lnl accessor after the vecm command to store the unrestricted log-likelihood and the $rlnl accessor after restrict for its restricted counterpart.

6 Modulo what appear to be a few typos in the article.
The example continues in Listing 33.2, where we perform further testing to check whether (a) the income elasticity in the money demand equation is 1 (β_y = 1) and (b) the Fisher relation is homogeneous (ϕ = 1). Since the --full switch was given to the initial restrict command, additional restrictions
can be applied without having to repeat the previous ones. (The second script contains a few printf
commands, which are not strictly necessary, to format the output nicely.) It turns out that both of
the additional hypotheses are rejected by the data, with p-values of 0.002 and 0.004.
Listing 33.2: Further testing of money demand system
restrict
b[1,5] = -1
end restrict
ll_uie = $rlnl
restrict
b[2,3] = -1
end restrict
ll_hfh = $rlnl
# replicate table 5, page 824
printf "Testing zero restrictions in cointegration space:\n"
printf " LR-test, rank = 3: chi^2(3) = %6.4f [%6.4f]\n", 2*(ll0-ll1), \
pvalue(X, 3, 2*(ll0-ll1))
printf "Unit income elasticity: LR-test, rank = 3:\n"
printf " chi^2(4) = %g [%6.4f]\n", 2*(ll0-ll_uie), \
pvalue(X, 4, 2*(ll0-ll_uie))
printf "Homogeneity in the Fisher hypothesis:\n"
printf " LR-test, rank = 3: chi^2(4) = %6.3f [%6.4f]\n", 2*(ll0-ll_hfh), \
pvalue(X, 4, 2*(ll0-ll_hfh))
Output:
Testing zero restrictions in cointegration space:
LR-test, rank = 3: chi^2(3) = 1.4763 [0.6877]
Unit income elasticity: LR-test, rank = 3:
chi^2(4) = 17.2071 [0.0018]
Homogeneity in the Fisher hypothesis:
LR-test, rank = 3: chi^2(4) = 15.547 [0.0037]
Another type of test that is commonly performed is the “weak exogeneity” test. In this context, a
variable is said to be weakly exogenous if all coefficients on the corresponding row in the αmatrix are
zero. If this is the case, that variable does not adjust to deviations from any of the long-run equilibria
and can be considered an autonomous driving force of the whole system.
The code in Listing 33.3 performs this test for each variable in turn, thus replicating the first column
of Table 6 on page 825 of Brand and Cassola (2004). The results show that weak exogeneity might
perhaps be accepted for the long-term interest rate and real GDP (p-values 0.07 and 0.08 respectively).
Identification and testability
One point regarding VECM restrictions that can be confusing at first is that identification (does
the restriction identify the system?) and testability (is the restriction testable?) are quite separate
matters. Restrictions can be identifying but not testable; less obviously, they can be testable but not
identifying.
This can be seen quite easily in relation to a rank-1 system. The restriction β1 = 1 is identifying (it pins down the scale of β) but, being a pure scaling, it is not testable. On the other hand, the
Listing 33.3: Testing for weak exogeneity
restrict
a1 = 0
end restrict
ts_m = 2*(ll0 - $rlnl)
restrict
a2 = 0
end restrict
ts_p = 2*(ll0 - $rlnl)
restrict
a3 = 0
end restrict
ts_l = 2*(ll0 - $rlnl)
restrict
a4 = 0
end restrict
ts_s = 2*(ll0 - $rlnl)
restrict
a5 = 0
end restrict
ts_y = 2*(ll0 - $rlnl)
loop foreach i m p l s y
printf "Delta $i\t%6.3f [%6.4f]\n", ts_$i, pvalue(X, 6, ts_$i)
endloop
Output (variable, LR test, p-value):
Delta m 18.111 [0.0060]
Delta p 21.067 [0.0018]
Delta l 11.819 [0.0661]
Delta s 16.000 [0.0138]
Delta y 11.335 [0.0786]
restriction β1 + β2 = 0 is testable—the system with this requirement imposed will almost certainly have a lower maximized likelihood—but it is not identifying; it still leaves open the scale of β.
We said above that the number of restrictions must be at least r², where r is the cointegrating rank, for identification. This is a necessary but not a sufficient condition. In fact, when r > 1 it can be quite tricky to assess whether a given set of restrictions is identifying. Gretl uses the method suggested by Doornik (1995), where identification is assessed via the rank of the information matrix. It can be shown that for restrictions of the sort (33.7) and (33.8) the information matrix has the same rank as the Jacobian matrix

    J(θ) = [ (I_p ⊗ β)G : (α ⊗ I_p1)H ]
A sufficient condition for identification is that the rank of J(θ) equals the number of free parameters.
The rank of this matrix is evaluated by examination of its singular values at a randomly selected point
in the parameter space. For practical purposes we treat this condition as if it were both necessary
and sufficient; that is, we disregard the special cases where identification could be achieved without
this condition being met.7
33.7 Numerical solution methods
In general, the ML estimator for the restricted VECM problem has no closed-form solution, hence
the maximum must be found via numerical methods.8In some cases convergence may be difficult,
and gretl provides several choices to solve the problem.
Switching and LBFGS
Two maximization methods are available in gretl. The default is the switching algorithm set out in
Boswijk and Doornik (2004). The alternative is a limited-memory variant of the BFGS algorithm
(LBFGS), using analytical derivatives. This is invoked using the --lbfgs flag with the restrict
command.
The switching algorithm works by explicitly maximizing the likelihood at each iteration, with respect to ϕ̂, ψ̂ and Ω̂ (the covariance matrix of the residuals) in turn. This method shares a feature with the basic Johansen eigenvalues procedure, namely, it can handle a set of restrictions that does not fully identify the parameters.
LBFGS, on the other hand, requires that the model be fully identified. When using LBFGS, therefore,
you may have to supplement the restrictions of interest with normalizations that serve to identify the
parameters. For example, one might use all or part of the Phillips normalization (see section 33.5).
Neither the switching algorithm nor LBFGS is guaranteed to find the global ML solution.9The
optimizer may end up at a local maximum (or, in the case of the switching algorithm, at a saddle
point).
The solution (or lack thereof) may be sensitive to the initial value selected for θ. By default, gretl
selects a starting point using a deterministic method based on Boswijk (1995), but two further options
are available: the initialization may be adjusted using simulated annealing, or the user may supply
an explicit initial value for θ.
The default initialization method is:

1. Calculate the unrestricted ML β̂ using the Johansen procedure.

2. If the restriction on β is non-homogeneous, use the method proposed by Boswijk:

       ϕ0 = −[(I_r ⊗ β̂⊥′)H]⁺ (I_r ⊗ β̂⊥′) h0          (33.9)

   where β̂⊥′β̂ = 0 and A⁺ denotes the Moore–Penrose inverse of A. Otherwise

       ϕ0 = (H′H)⁻¹ H′ vec(β̂)                          (33.10)

3. vec(β0) = Hϕ0 + h0.

4. Calculate the unrestricted ML α̂ conditional on β0, as per Johansen:

       α̂ = S01 β0 (β0′ S11 β0)⁻¹                       (33.11)

5. If α is restricted by vec(α) = Gψ, then ψ0 = (G′G)⁻¹ G′ vec(α̂) and vec(α0) = Gψ0.

7 See Boswijk and Doornik (2004), pp. 447–8 for discussion of this point.
8 The exception is restrictions that are homogeneous, common to all β or all α (in case r > 1), and involve either β only or α only. Such restrictions are handled via the modified eigenvalues method set out by Johansen (1995). We solve directly for the ML estimator, without any need for iterative methods.
9 In developing gretl’s VECM-testing facilities we have considered a fair number of “tricky cases” from various sources. We’d like to thank Luca Fanelli of the University of Bologna and Sven Schreiber of Goethe University Frankfurt for their help in devising torture-tests for gretl’s VECM code.
Alternative initialization methods
As mentioned above, gretl offers the option of adjusting the initialization using simulated annealing.
This is invoked by adding the --jitter option to the restrict command.
The basic idea is this: we start at a certain point in the parameter space, and for each of n iterations (currently n = 4096) we randomly select a new point within a certain radius of the previous one, and determine the likelihood at the new point. If the likelihood is higher, we jump to the new point; otherwise, we jump with probability P (and remain at the previous point with probability 1 − P). As the iterations proceed, the system gradually “cools”—that is, the radius of the random perturbation is reduced, as is the probability of making a jump when the likelihood fails to increase.

In the course of this procedure many points in the parameter space are evaluated, starting with the point arrived at by the deterministic method, which we’ll call θ0. One of these points will be “best” in the sense of yielding the highest likelihood: call it θ*. This point may or may not have a greater likelihood than θ0. And the procedure has an end point, θn, which may or may not be “best”.

The rule followed by gretl in selecting an initial value for θ based on simulated annealing is this: use θ* if it yields a higher likelihood than θ0, otherwise use θn. That is, if we get an improvement in the likelihood via annealing, we make full use of it; on the other hand, if we fail to get an improvement we nonetheless allow the annealing to randomize the starting point. Experiments indicate that the latter effect can be helpful.
Besides annealing, a further alternative is manual initialization. This is done by passing a predefined
vector to the set command with parameter initvals, as in
set initvals myvec
The details depend on whether the switching algorithm or LBFGS is used. For the switching algo-
rithm, there are two options for specifying the initial values. The more user-friendly one (for most
people, we suppose) is to specify a matrix that contains vec(β) followed by vec(α). For example:
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals
matrix BA = {1, -1, 6, -6, -6, -0.2, 0.1, 0.02, 0.03}
set initvals BA
restrict
b[1] = 1
b[1] + b[2] = 0
b[3] + b[4] = 0
end restrict
In this example—from Johansen (1995)—the cointegration rank is 1 and there are 4 variables. How-
ever, the model includes a restricted constant (the --rc flag) so that β has 5 elements. The α matrix has 4 elements, one per equation. So the matrix BA may be read as

    (β1, β2, β3, β4, β5, α1, α2, α3, α4)
The other option, which is compulsory when using LBFGS, is to specify the initial values in terms of the free parameters, ϕ and ψ. Getting this right is somewhat less obvious. As mentioned above, the implicit-form restriction R vec(β) = q has explicit form vec(β) = Hϕ + h0, where H = R⊥, the right nullspace of R. The vector ϕ is shorter, by the number of restrictions, than vec(β). The savvy user will then see what needs to be done. The other point to take into account is that if α is unrestricted, the effective length of ψ is 0, since it is then optimal to compute α using Johansen’s formula, conditional on β (equation 33.11 above). The example above could be rewritten as:
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals
matrix phi = {-8, -6}
set initvals phi
restrict --lbfgs
b[1] = 1
b[1] + b[2] = 0
b[3] + b[4] = 0
end restrict
In this more economical formulation the initializer specifies only the two free parameters in ϕ (5 elements in β minus 3 restrictions). There is no call to give values for ψ since α is unrestricted.
Scale removal
Consider a simpler version of the restriction discussed in the previous section, namely,
restrict
b[1] = 1
b[1] + b[2] = 0
end restrict
This restriction comprises a substantive, testable requirement—that β1 and β2 sum to zero—and a normalization or scaling, β1 = 1. The question arises, might it be easier and more reliable to maximize the likelihood without imposing β1 = 1?10 If so, we could record this normalization, remove it for the purpose of maximizing the likelihood, then reimpose it by scaling the result.
Unfortunately it is not possible to say in advance whether “scale removal” of this sort will give better results for any particular estimation problem. However, this does seem to be the case more often than not. Gretl therefore performs scale removal where feasible, unless you

• explicitly forbid this, by giving the --no-scaling option flag to the restrict command; or
• provide a specific vector of initial values; or
• select the LBFGS algorithm for maximization.
Scale removal is deemed infeasible if there are any cross-column restrictions on β, or any non-
homogeneous restrictions involving more than one element of β.
In addition, experimentation has suggested to us that scale removal is inadvisable if the system is
just identified with the normalization(s) included, so we do not do it in that case. By “just identified”
we mean that the system would not be identified if any of the restrictions were removed. On that
criterion the above example is not just identified, since the removal of the second restriction would
not affect identification; and gretl would in fact perform scale removal in this case unless the user
specified otherwise.
10As a numerical matter, that is. In principle this should make no difference.
Chapter 34
Multivariate models
By a multivariate model we mean one that includes more than one dependent variable. Certain
specific types of multivariate model for time-series data are discussed elsewhere: chapter 32 deals
with VARs and chapter 33 with VECMs. Here we discuss two general sorts of multivariate model,
implemented in gretl via the system command: SUR systems (Seemingly Unrelated Regressions),
in which all the regressors are taken to be exogenous and interest centers on the covariance of the
error term across equations; and simultaneous systems, in which some regressors are assumed to be
endogenous.
In this chapter we give an account of the syntax and use of the system command and its companions,
restrict and estimate; we also explain the options and accessors available in connection with
multivariate models.
34.1 The system command
The specification of a multivariate system takes the form of a block of statements, starting with
system and ending with end system. Once a system is specified it can be estimated via various methods,
using the estimate command, with or without restrictions, which may be imposed via the restrict
command.
Starting a system block
The first line of a system block may be augmented in either (or both) of two ways:
• An estimation method is specified for the system. This is done by following system with an expression of the form method=estimator, where estimator must be one of ols (Ordinary Least Squares), tsls (Two-Stage Least Squares), sur (Seemingly Unrelated Regressions), 3sls (Three-Stage Least Squares), liml (Limited Information Maximum Likelihood) or fiml (Full Information Maximum Likelihood). Two examples:
system method=sur
system method=fiml
OLS, TSLS and LIML are, of course, single-equation methods rather than true system estima-
tors; they are included to facilitate comparisons.
• The system is assigned a name. This is done by giving the name first, followed by a back-arrow, <-, followed by system. If the name contains spaces it must be enclosed in double-quotes. Here are two examples:
sys1 <- system
"System 1" <- system
Note, however, that this naming method is not available within a user-defined function, only in
the main body of a gretl script.
If the initial system line is augmented in the first way, the effect is that the system is estimated as
soon as its definition is completed, using the specified method. The effect of the second option is
that the system can then be referenced by the assigned name for the purposes of the restrict and
estimate commands; in the gretl GUI an additional effect is that an icon for the system is added to
the “Session view”.
These two possibilities can be combined, as in
mysys <- system method=3sls
In this example the system is estimated immediately via Three-Stage Least Squares, and is also
available for subsequent use under the name mysys.
If the system is not named via the back-arrow mechanism, it is still available for subsequent use
via restrict and estimate; in this case you should use the generic name $system to refer to the
last-defined multivariate system.
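For example, the following sketch (with hypothetical series y1, x1, y2 and x2) defines an unnamed system and then estimates it via the generic reference:

system
equation y1 const x1
equation y2 const x2
end system
estimate $system method=sur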
The body of a system block
The most basic element in the body of a system block is the equation statement, which is used to
specify each equation within the system. This takes the same form as the regression specification for
single-equation estimators, namely a list of series with the dependent variable given first, followed by
the regressors, with the series given either by name or by ID number (order in the dataset). A system
block must contain at least two equation statements, and for systems without endogenous regressors
these statements are all that is required. So, for example, a minimal SUR specification might look
like this:
system method=sur
equation y1 const x1
equation y2 const x2
end system
For simultaneous systems it is necessary to determine which regressors are endogenous and which
exogenous. By default all regressors are treated as exogenous, except that any variable that appears
as the dependent variable in one equation is automatically treated as endogenous if it appears as a
regressor elsewhere. However, an explicit list of endogenous regressors may be supplied following the
equations lines: this takes the form of the keyword endog followed by the names or ID numbers of
the relevant regressors.
When estimation is via TSLS or 3SLS it is possible to specify a particular set of instruments for each
equation. This is done by giving the equation lists in the format used with the tsls command: first
the dependent variable, then the regressors, then a semicolon followed by the instruments, as in
system method=3sls
equation y1 const x11 x12 ; const x11 z1
equation y2 const x21 x22 ; const x21 z2
end system
An alternative way of specifying instruments is to insert an extra line starting with instr, followed
by the list of variables acting as instruments. This is especially useful for specifying the system with
the equations keyword; see the following subsection. As in tsls, any regressors that are not also
listed as instruments are treated as endogenous, so in the example above x11 and x21 are treated as
exogenous while x12 and x22 are endogenous, and instrumented by z1 and z2 respectively.
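A sketch of the instr variant (hypothetical series names as above): a single list of instruments is given for the system as a whole, and x12 and x22 are again treated as endogenous since they are not in that list.

system method=3sls
equation y1 const x11 x12
equation y2 const x21 x22
instr const x11 x21 z1 z2
end system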
One more sort of statement is allowed in a system block: that is, the keyword identity followed by an equation that defines an accounting relationship, rather than a stochastic one, between variables. For example,

identity Y = C + I + G + X
There can be more than one identity in a system block. But note that these statements are specific
to estimation via FIML; they are ignored for other estimators.
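As a hypothetical sketch combining these elements (the series C, I, G and Y are assumed to exist in the dataset), a small FIML system with one identity and one contemporaneous endogenous regressor might look like this:

system method=fiml
equation C const Y
equation I const Y(-1)
identity Y = C + I + G
endog Y
end system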
34.2 Equation systems within functions
It is also possible to define a multivariate system in a programmatic way. This is useful if the precise
specification of the system depends on some input parameters that are not known in advance, but
are given when the script is actually run.
The relevant syntax is given by the equations keyword (note the plural), which replaces the block
of equation lines in the standard form. This keyword must be followed by two arguments. The first
is a named list containing all series on the left-hand side of the system, which determines the number
of equations in the system. The nature of the second argument depends on whether or not the list of
regressors is in common for all equations (as in SUR):
• Common regressors: a second named list.
• Differing regressors: an array of lists, one per equation.
The first case is straightforward; the second requires a little more explanation. Suppose we have a
two-equation system with regressors given by the lists xlist1 and xlist2. We can then define a
suitable array as follows:
lists Xlists = defarray(xlist1, xlist2)
(See section 11.8 for alternative ways of building an array.)
Therefore, specifying a system generically in this way just involves building the necessary list argu-
ments, as shown in the following example:
open denmark
list LHS = LRM LRY
list RHS1 = const LRM(-1) IBO(-1) IDE(-1)
list RHS2 = const LRY(-1) IBO(-1)
lists RHS = defarray(RHS1, RHS2)
system method=ols
equations LHS RHS
end system
As mentioned above, the option of assigning a specific name to a system is not available within
functions, but the generic identifier $system can be used to similar effect. The following example
illustrates how one can define a system, estimate it via two methods, apply a restriction, then re-
estimate it subject to the restriction.
function void anonsys(series x, series y)
system
equation x const
equation y const
end system
estimate $system method=ols
estimate $system method=sur
restrict $system
b[1,1] - b[2,1] = 0
end restrict
estimate $system method=ols
end function
34.3 Restriction and estimation
The behavior of the restrict command is a little different for multivariate systems as compared with
single-equation models.
In the single-equation case, restrict refers to the last-estimated model, and once the command is
completed the restriction is tested. In the multivariate case, you must give the name of the system to
which the restriction is to be applied (or $system to refer to the last-defined system), and the effect
of the command is just to attach the restriction to the system; testing is not done until the next
estimate command is given. In addition, in the system case the default is to produce full estimates
of the restricted model; if you are not interested in the full estimates and just want the test statistic
you can append the --quiet option to estimate.
A given system restriction remains in force until it is replaced or removed. To return a system to its
unrestricted state you can give an empty restrict block, as in
restrict sysname
end restrict
As illustrated above, you can use the method tag to specify an estimation method with the estimate
command. If the system has already been estimated you can omit this tag and the previous method
is used again.
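A compact sketch of the sequence (hypothetical series names): a cross-equation restriction is attached to the system and tested at the next estimate call.

system method=sur
equation y1 const x1
equation y2 const x2
end system
restrict $system
b[1,2] - b[2,2] = 0
end restrict
estimate $system --quiet   # reuses SUR; prints just the test of the restriction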
The estimate command is the main locus for options regarding the details of estimation. The
available options are as follows:
• If the estimation method is SUR or 3SLS and the --iterate flag is given, the estimator will be iterated. In the case of SUR, if the procedure converges the results are maximum likelihood estimates. Iteration of three-stage least squares, however, does not in general converge on the full-information maximum likelihood results. This flag is ignored for other estimators.

• If the equation-by-equation estimators OLS or TSLS are chosen, the default is to apply a degrees of freedom correction when calculating standard errors. This can be suppressed using the --no-df-corr flag. This flag has no effect with the other estimators; no degrees of freedom correction is applied in any case.

• By default, the formula used in calculating the elements of the cross-equation covariance matrix is

      σ̂_ij = û_i′ û_j / T

  where T is the sample size and û_i is the vector of residuals from equation i. But if the --geomean flag is given, a degrees of freedom correction is applied: the formula is

      σ̂_ij = û_i′ û_j / √((T − k_i)(T − k_j))

  where k_i denotes the number of independent parameters in equation i (a hands-on illustration follows this list).

• If an iterative method is specified, the --verbose option calls for printing of the details of the iterations.

• When the system estimator is SUR or 3SLS the cross-equation covariance matrix is initially estimated via OLS or TSLS, respectively. In the case of a system subject to restrictions the question arises: should the initial single-equation estimator be restricted or unrestricted? The default is the former, but the --unrestrict-init flag can be used to select unrestricted initialization. (Note that this is unlikely to make much difference if the --iterate option is given.)
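The two formulas can be reproduced by hand from the $uhat accessor described in the next section; a minimal sketch for a two-equation system (the parameter counts in k are hypothetical):

matrix U = $uhat                      # T x g matrix of system residuals
scalar T = rows(U)
matrix Sigma_ml = (U' * U) / T        # default formula
matrix k = {2; 2}                     # hypothetical parameter counts k_i
matrix Sigma_dfc = (U' * U) ./ sqrt((T - k) * (T - k)')   # with --geomean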
34.4 System accessors
After system estimation various matrices may be retrieved for further analysis. Let g denote the number of equations in the system and let K denote the total number of estimated parameters (K = Σ_i k_i). The accessors $uhat and $yhat get T × g matrices holding the residuals and fitted values respectively. The accessor $coeff gets the stacked K-vector of parameter estimates; $vcv gets the K × K variance matrix of the parameter estimates; and $sigma gets the g × g cross-equation covariance matrix, Σ̂.
A test statistic for the hypothesis that Σ is diagonal can be retrieved as $diagtest and its p-value as $diagpval. This is the Breusch–Pagan test except when the estimator is (unrestricted) iterated SUR, in which case it’s a Likelihood Ratio test. The Breusch–Pagan test is computed as

    LM = T Σ_{i=2}^{g} Σ_{j=1}^{i−1} r_ij²

where r_ij = σ̂_ij / √(σ̂_ii σ̂_jj); the LR test is

    LR = T ( Σ_{i=1}^{g} log σ̂_i² − log |Σ̂| )

where σ̂_i² is û_i′û_i/T from the individual OLS regressions. In both cases the test statistic is distributed asymptotically as χ² with g(g − 1)/2 degrees of freedom.
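As a cross-check, the Breusch–Pagan statistic can be reproduced from the accessors; a minimal sketch (run after estimating a SUR system):

matrix S = $sigma
matrix U = $uhat
scalar g = rows(S)
scalar T = rows(U)
matrix R = S ./ sqrt(diag(S) * diag(S)')      # residual correlation matrix
scalar LM = T * (sumc(sumr(R .* R)) - g) / 2  # sum of squared r_ij for i > j
printf "Breusch-Pagan LM = %g, p-value = %g\n", LM, pvalue(X, g*(g-1)/2, LM)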
All these quantities can also be retrieved if necessary via the $system accessor: after successful completion of the estimation procedure, it will contain a bundle holding various quantities that describe the estimated system.
Structural and reduced forms for forecasting and simulation
Systems of simultaneous equations can be represented in structural form as

    Γ y_t = A1 y_{t−1} + A2 y_{t−2} + · · · + Ap y_{t−p} + B x_t + ε_t

where y_t represents the vector of endogenous variables in period t, x_t denotes the vector of exogenous variables, and p is the maximum lag of the endogenous regressors. The structural-form matrices can be retrieved as $sysGamma, $sysA and $sysB respectively, or as elements of the returned $system bundle. If y_t is m × 1 and x_t is n × 1, then Γ is m × m and B is m × n. If the system contains no lags of the endogenous variables then the A matrix is not defined, otherwise A is the horizontal concatenation of A1, . . . , Ap, and is therefore m × mp.
From the structural form it is straightforward to obtain the reduced form, namely,

    y_t = Γ⁻¹ ( Σ_{i=1}^{p} A_i y_{t−i} ) + Γ⁻¹ B x_t + v_t        (34.1)

where v_t ≡ Γ⁻¹ ε_t.
As is well known, the reduced form can be used any time one has to calculate the values of the
endogenous variables given the exogenous ones; this is typically necessary in two cases: forecasting
or simulation.
Forecasts for multi-equation systems are generated natively by gretl in response to the fcast com-
mand. This means that—in contrast to single-equation estimation—the values produced via fcast
for a static, within-sample forecast will in general differ from the fitted values retrieved via $yhat.
The fitted values for equation irepresent the expectation of yti conditional on the contemporaneous
values of all the regressors, while the fcast values are conditional on the exogenous and predetermined
variables only.
The above account has to be qualified for the case where a system is set up for estimation via TSLS
or 3SLS using a specific list of instruments per equation, as described in section 34.1. In that case it is
possible to include more endogenous regressors than explicit equations (although, of course, there must
be sufficient instruments to achieve identification). In such systems endogenous regressors that have
no associated explicit equation are treated “as if” exogenous when constructing the structural-form matrices. This means that forecasts are conditional on the observed values of the “extra” endogenous regressors rather than solely on the values of the exogenous and predetermined variables.
By contrast, gretl does not provide a native command for generating simulated data from a multi-equation system, but this is relatively easily accomplished by means of scripting: Listing 34.1 gives an example for a 3-variable system.1 All equations contain lagged endogenous variables, but the equation for consumption at time t also contains income at time t as an explanatory variable. This makes the system simultaneous, so we use FIML as the estimation method.
Once the system is estimated, we store its results to a bundle named sys, so as to make it easier to
retrieve certain quantities used in the remainder of the script.
First, we compute the reduced form matrices by using the Gamma, A and B bundle elements. Of course,
simulation needs values for the exogenous variable, which are easy to create in a system such as this
where all the exogenous variables are deterministic. The simulation horizon is set for this example at
12 periods.
Subsequently, structural-form disturbances are drawn randomly from a multivariate normal distribution with mean 0 and variance equal to the estimated covariance matrix Σ̂, available as the sigma
1 Note: the system of equations that is being estimated here is not meant to stand for a realistic model of the European economy. It is just set up in such a way as to provide a simple example.
Listing 34.1: Simulation from a simultaneous equation system [Download ]
set verbose off
set seed 131020
### --------------------------------------------------
### load the data and generate the variables
### --------------------------------------------------
open AWM18.gdt --quiet
Con = log(PCR)
Inv = log(GCR)
Inc = log(YER)
list EXO = const time
### --------------------------------------------------
### estimate the system via FIML
### --------------------------------------------------
system method=fiml
equation Con EXO Con(-1) Inc(0 to -1)
equation Inv EXO Inv(-1) Inc(-1)
equation Inc EXO Inc(-1 to -2) Inv(-1)
end system
bundle sys = $system # save the estimated system to a bundle
### --------------------------------------------------
### compute the reduced form VAR representation
### --------------------------------------------------
matrix iG = inv(sys.Gamma)
matrix rfA = iG * sys.A
matrix rfB = iG * sys.B
### --------------------------------------------------
### produce the simulation
### --------------------------------------------------
scalar horizon = 12
### retrieve a few magnitudes from the estimated system
scalar g = sys.neqns # number of equations
scalar p = cols(sys.A) / g # maximum lag
### future values of the exogenous variable
matrix SimExo = 1 ~ seq($nobs + 1, $nobs + horizon)'
matrix X = SimExo * rfB'
### simulated disturbances
E = mnormal(horizon, g) * cholesky(sys.sigma)'  # structural form disturbances
V = E * iG'                                     # reduced form disturbances
### initial values
list ENDO = Con Inv Inc
matrix init = {ENDO}[$nobs-p+1:,]
### perform simulation
Sim = varsimul(rfA, X + V, init)
print Sim
element of the sys bundle. These are then mapped to reduced-form innovations via the relationship v_t = Γ⁻¹ ε_t.
Finally, all these ingredients are combined to produce the simulated values with the varsimul function.
Note that initial values for the VAR recursion are taken from the latest available data. Running the
script should produce the following set of simulated values:
Sim (14 x 3)
13.887 12.874 14.508
13.889 12.877 14.515
13.893 12.880 14.520
13.895 12.885 14.518
13.895 12.894 14.517
13.900 12.902 14.520
13.907 12.908 14.525
13.917 12.910 14.534
13.920 12.911 14.539
13.919 12.906 14.547
13.934 12.910 14.567
13.935 12.908 14.575
13.942 12.913 14.581
13.944 12.916 14.583
Chapter 35
Forecasting
35.1 Introduction
In some econometric contexts forecasting is the prime objective: one wants estimates of the future
values of certain variables to reduce the uncertainty attaching to current decision making. In other
contexts, where real-time forecasting is not the focus, prediction may nonetheless be an important moment in the analysis. For example, out-of-sample prediction can provide a useful check on the validity of an econometric model. In other cases we are interested in questions of “what if”: for
example, how might macroeconomic outcomes have differed over a certain period if a different policy
had been pursued? In the latter cases “prediction” need not be a matter of actually projecting into the
future but in any case it involves generating fitted values from a given model. The term “postdiction”
might be more accurate but it is not commonly used; we tend to talk of prediction even when there
is no true forecast in view.
This chapter offers an overview of the methods available within gretl for forecasting or prediction
(whether forward in time or not) and explicates some of the finer points of the relevant commands.
35.2 Saving and inspecting fitted values
In the simplest case, the “predictions” of interest are just the (within sample) fitted values from an econometric model. For the single-equation linear model, y_t = X_t β + u_t, these are ŷ_t = X_t β̂.

In command-line mode, the ŷ series can be retrieved, after estimating a model, using the accessor $yhat, as in
series yh = $yhat
If the model in question takes the form of a system of equations, $yhat returns a matrix, each column
of which contains the fitted values for a particular dependent variable. To extract the fitted series
for, e.g., the dependent variable in the second equation, do
matrix Yh = $yhat
series yh2 = Yh[,2]
Having obtained a series of fitted values, you can use the fcstats function to produce a vector of
statistics that characterize the accuracy of the predictions (see section 35.4 below).
The gretl GUI offers several ways of accessing and examining within-sample predictions. In the
model display window the Save menu contains an item for saving fitted values, the Graphs menu
allows plotting of fitted versus actual values, and the Analysis menu offers a display of actual, fitted
and residual values.
35.3 The fcast command
The fcast command (and its equivalent GUI invocation, see below) generates predictions based
on the last estimated model. Several questions arise here: How to control the range over which
predictions are generated? How to control the forecasting method (where a choice is available)?
How to control the printing and/or saving of the results? Basic answers can be found in the Gretl
Command Reference; we add some more details here.
The forecast range
The range defaults to the currently defined sample range. If this remains unchanged following esti-
mation of the model in question, the forecast will be “within sample” and (with some qualifications
noted below) it will essentially duplicate the information available via the retrieval of fitted values
(see section 35.2 above).
A common situation is that a model is estimated over a given sample and then forecasts are wanted for
a subsequent out-of-sample range. The simplest way to accomplish this is via the --out-of-sample
option to fcast. For example, assuming we have a quarterly time-series dataset containing observa-
tions from 1980:1 to 2008:4, four of which are to be reserved for forecasting:
# reserve the last 4 observations
smpl 1980:1 2007:4
ols y 0 xlist
fcast --out-of-sample
This will generate a forecast from 2008:1 to 2008:4.
There are two other ways of adjusting the forecast range, offering finer control:

• Use the smpl command to adjust the sample range prior to invoking fcast.

• Use the optional startobs and endobs arguments to fcast (which should come right after the command word). These values set the forecast range independently of the sample range, as in the sketch below.
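For instance, under the quarterly setup above (and re-using the hypothetical y and xlist) one might do:

smpl 1980:1 2007:4
ols y 0 xlist
# forecast a specific range, regardless of the current sample setting
fcast 2005:1 2008:4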
What if one wants to generate a true forecast that goes beyond the available data? In that case
one can use the dataset command with the addobs parameter to add extra observations before
forecasting. For example:
# use the entire dataset, which ends in 2008:4
ols y 0 xlist
dataset addobs 4
fcast 2009:1 2009:4
But this will work as stated only if the set of regressors in xlist does not contain any stochastic
regressors other than lags of y. The dataset addobs command attempts to detect and extrapolate
certain common deterministic variables (e.g., time trend, periodic dummy variables). In addition,
lagged values of the dependent variable can be supported via a dynamic forecast (see below for
discussion of the static/dynamic distinction). But “future” values of any other included regressors
must be supplied before such a forecast is possible. Note that specific values in a series can be set
directly by date, for example: x1[2009:1] = 120.5. Or, if the assumption of no change in the
regressors is warranted, one can do something like this:
loop t=2009:1..2009:4
loop foreach i xlist
$i[t] = $i[2008:4]
endloop
endloop
In single-equation OLS models a --recursive forecast option is also available, expanding the estima-
tion sample one-by-one and re-calculating the forecasts again and again for the constantly updated
information set. In this case a number must be given of how many periods ahead should be forecast
for each of the estimation samples. Note that only this k-steps-ahead forecast will be printed (or
accessible in $fcast), not the interim values from step 1 through k1 (if k > 1). If those interim
values are also needed, then several fcast ... --recursive rounds would have to be done with
different steps-ahead numbers.
Static and dynamic forecasts
The distinction between static and dynamic forecasts applies only to dynamic models, i.e., those that
feature one or more lags of the dependent variable. The simplest case is the AR(1) model,
    y_t = α0 + α1 y_{t−1} + ε_t                    (35.1)
In some cases the presence of a lagged dependent variable is implicit in the dynamics of the error term, for example

    y_t = β + u_t
    u_t = ρ u_{t−1} + ε_t

which implies that

    y_t = (1 − ρ)β + ρ y_{t−1} + ε_t

Suppose we want to forecast y for period s using a dynamic model, say (35.1) for example. If we have data on y available for period s − 1 we could form a fitted value in the usual way: ŷ_s = α̂0 + α̂1 y_{s−1}. But suppose that data are available only up to s − 2. In that case we can apply the chain rule of forecasting:

    ŷ_{s−1} = α̂0 + α̂1 y_{s−2}
    ŷ_s = α̂0 + α̂1 ŷ_{s−1}

This is what is called a dynamic forecast. A static forecast, on the other hand, is simply a fitted value (even if it happens to be computed out-of-sample).
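A minimal sketch of the distinction, assuming an AR(1) model like (35.1) for a series y in an open time-series dataset:

smpl ; -8                          # hold back the last 8 observations
ols y const y(-1)
fcast --static --out-of-sample     # one-step-ahead: uses actual lagged y
fcast --dynamic --out-of-sample    # chain rule: forecasts feed back in as lags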
Printing, displaying, and saving forecasts
When working from the GUI, the way to perform and access forecasts is to first estimate a model
with some inherently dynamic features, and then in the model window navigate to the Forecasts entry
in the Analysis menu. If some out-of-sample observations are already available (see above) a dialog
window is presented where the discussed forecasting options can be chosen by pointing and clicking.
Executing the forecasts then automatically yields two result windows: one with a time-series plot of
the forecasts along with their confidence bands (if those were chosen), and another one with tabular
output.
The produced plot can be saved to the current session or exported like any other plot in gretl by
right-clicking. Notice that in the textual result window there is a “+” button at the top which offers
to save the point forecasts and their standard errors as new series to the active dataset.
In a command line context the fcast command automatically prints out the tables with the produced
forecasts, their standard errors, and associated confidence intervals—unless you wish to suppress this
verbose output with the options --stats-only or --quiet. The former option restricts output to
the forecast evaluation statistics as explained in the next section; the latter option silences output
altogether. Another accepted syntax variant is to supply the name of a new series for the point
forecasts after the fcast command, as for example in fcast Yfc --out-of-sample. At the same
time this also suppresses printout.
Accessing and saving the produced forecast time series along with the estimated standard errors also
works through the $fcast and $fcse accessors available after fcast execution. These return vectors
as gretl matrix objects, not series, so if you want to add the results to the dataset in this way you
would have to set the active sample to the forecast range first. (You can of course first access and
store the matrices and then later after resetting the sample assign them to series.) Note that the
estimated standard errors do not incorporate parameter uncertainty in the case of dynamic models.
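A sketch of that workflow, again using the hypothetical quarterly y and xlist from above:

smpl 1980:1 2007:4
ols y 0 xlist
fcast --out-of-sample --quiet
matrix fc = $fcast
matrix se = $fcse
smpl 2008:1 2008:4       # move to the forecast range before assigning
series y_fc = fc
series y_fc_se = se
smpl full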
But if you want to create forecast plots within a script the relevant option already has to be appended
to the fcast command. As explained in the command reference, specify --plot=<filename> (without
the < > symbols) to save the plot file directly to disc, namely by default to the active working directory
if no full path is specified.1
1 Being a single plot, this is currently not available for forecasts based on multiple equation systems. If the path contains spaces it must be enclosed in quotes.

35.4 Univariate forecast evaluation statistics

Let y_t be the value of a variable of interest at time t and let f_t be a forecast of y_t. We define the forecast error as e_t = y_t − f_t. Given a series of T observations and associated forecasts we can construct several measures of the overall accuracy of the forecasts. Some commonly used measures are the Mean Error (ME), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE). These are defined as follows.
    ME   = (1/T) Σ_{t=1}^{T} e_t
    RMSE = √[ (1/T) Σ_{t=1}^{T} e_t² ]
    MAE  = (1/T) Σ_{t=1}^{T} |e_t|
    MPE  = (1/T) Σ_{t=1}^{T} 100 · e_t / y_t
    MAPE = (1/T) Σ_{t=1}^{T} 100 · |e_t / y_t|
A further relevant statistic is Theil’s U, of which there are two variants: U1 (Theil, 1961) and U2 (Theil, 1966). The first is defined thus

    U1 = [ (1/T) Σ_{t=1}^{T} (y_t − f_t)² ]^{1/2} · { [ (1/T) Σ_{t=1}^{T} y_t² ]^{1/2} + [ (1/T) Σ_{t=1}^{T} f_t² ]^{1/2} }⁻¹

and is bounded by 0 and 1. Values close to zero indicate high forecast accuracy; U1 approaches 1 as the forecast errors grow arbitrarily large. The second is defined as the positive square root of

    U2² = (1/T) Σ_{t=1}^{T−1} [ (f_{t+1} − y_{t+1}) / y_t ]² · { (1/T) Σ_{t=1}^{T−1} [ (y_{t+1} − y_t) / y_t ]² }⁻¹

U2 depends on the data having a natural ordering and is applicable only for time series data. It can be interpreted as the ratio of the RMSE of the proposed forecasting model to the RMSE of a naïve model which simply predicts y_{t+1} = y_t for all t. The naïve model yields U2 = 1; values less than 1 indicate an improvement relative to this benchmark and values greater than 1 a deterioration.
In addition, Theil (1966, pp. 33–36) proposed a decomposition of the MSE which can be useful in evaluating a set of forecasts. He showed that the MSE could be broken down into three non-negative components as follows

    MSE = (f̄ − ȳ)² + (s_f − r s_y)² + (1 − r²) s_y²

where f̄ and ȳ are the sample means of the forecasts and the observations, s_f and s_y are the respective standard deviations (using T in the denominator), and r is the sample correlation between y and f. Dividing through by MSE we get

    (f̄ − ȳ)² / MSE + (s_f − r s_y)² / MSE + (1 − r²) s_y² / MSE = 1        (35.2)

Theil labeled the three terms on the left-hand side of (35.2) the bias proportion (U^M), regression proportion (U^R) and disturbance proportion (U^D), respectively. If y and f represent the in-sample observations of the dependent variable and the fitted values from a linear regression then the first two components, U^M and U^R, will be zero (apart from rounding error), and the entire MSE will be accounted for by the unsystematic part, U^D. In the case of out-of-sample prediction, however (or “prediction” over a sub-sample of the data used in the regression), U^M and U^R are not necessarily close to zero. U^M differs from zero if and only if the mean of the forecasts differs from the mean of the realizations, and U^R is non-zero if and only if the slope of a simple regression of the realizations on the forecasts differs from 1.
The above-mentioned statistics are printed as part of the output of the fcast command. They can also be retrieved in the form of a column vector using the function fcstats, which takes two series arguments corresponding to y and f. The vector returned is

    ( ME  RMSE  MAE  MPE  MAPE  U  U^M  U^R  U^D )′

where U is U2 for time series data, U1 otherwise. (Note that the MSE is not included since it can easily be obtained given the RMSE.) The series given as arguments to fcstats must not contain any missing values in the current sample range; use the smpl command to adjust the range if needed. See the Gretl Command Reference for more detail on fcstats.
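For example, given a series y and a forecast series f (hypothetical names):

matrix st = fcstats(y, f)
printf "ME = %g, RMSE = %g, MAE = %g\n", st[1], st[2], st[3]
printf "Theil's U = %g\n", st[6]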
35.5 Forecasts based on VAR models
The interface for forecasting from a VAR is similar to that for a single equation. Here’s an example
via scripting. The code:
# open sample data file
open sw_ch14.gdt --quiet
# generate the "inflation" series
series INFL = 100 * sdiff(log(PUNEW))
# put last year aside for out-of-sample forecast
smpl ; -4
# estimate a 5-lag VAR
var 5 LHUR INFL --silent
# store fitted values (note: result is a 2-column matrix)
YH = $yhat
# perform out-of-sample forecast (both versions)
fcast LHUR --static --out-of-sample
# note that omission of the variable specification means "all"
fcast --dynamic --out-of-sample
yields
For 95% confidence intervals, t(140, 0.025) = 1.977
LHUR prediction std. error 95% interval
1999:1 4.300000 4.335004 0.222784 3.894549 - 4.775460
1999:2 4.300000 4.243244 0.222784 3.802788 - 4.683699
1999:3 4.233333 4.290981 0.222784 3.850525 - 4.731436
1999:4 4.100000 4.178030 0.222784 3.737575 - 4.618486
Forecast evaluation statistics
Mean Error -0.028481
Root Mean Squared Error 0.058861
Mean Absolute Error 0.05686
Mean Percentage Error -0.68977
Mean Absolute Percentage Error 1.3497
Theil’s U2 0.75027
Bias proportion, UM 0.23414
Regression proportion, UR 0.0081804
Disturbance proportion, UD 0.75768
For 95% confidence intervals, t(140, 0.025) = 1.977
LHUR prediction std. error 95% interval
1999:1 4.300000 4.335004 0.222784 3.894549 - 4.775460
1999:2 4.300000 4.312724 0.401960 3.518028 - 5.107421
1999:3 4.233333 4.272764 0.539582 3.205982 - 5.339547
1999:4 4.100000 4.223213 0.642001 2.953943 - 5.492482
Forecast evaluation statistics
Mean Error -0.052593
Root Mean Squared Error 0.067311
Mean Absolute Error 0.052593
Mean Percentage Error -1.2616
Mean Absolute Percentage Error 1.2616
Theil’s U2 0.87334
Bias proportion, UM 0.61049
Regression proportion, UR 0.29203
Disturbance proportion, UD 0.097478
INFL prediction std. error 95% interval
1999:1 1.651245 1.812250 0.431335 0.959479 - 2.665022
1999:2 2.048545 2.088185 0.777834 0.550366 - 3.626004
1999:3 2.298952 2.266445 1.075855 0.139423 - 4.393467
1999:4 2.604836 2.610037 1.409676 -0.176969 - 5.397043
Forecast evaluation statistics
Mean Error -0.043335
Root Mean Squared Error 0.084525
Mean Absolute Error 0.059588
Mean Percentage Error -2.6178
Mean Absolute Percentage Error 3.3248
Theil’s U2 0.095932
Bias proportion, UM 0.26285
Regression proportion, UR 0.45311
Disturbance proportion, UD 0.28404
One of the main differences is that specifying a variable name after the fcast command does not
mean that something is saved under that name; rather, it serves to pick one of the N variables of the
VAR for printing of the forecasts. To obtain and save the forecasts themselves one uses the $fcast
and $fcse accessors; in the system case the matrices they return have as many columns as there are
equations.
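For example, continuing the script above, the forecasts and standard errors produced by the last fcast call can be retrieved as matrices via the accessors just mentioned (a minimal sketch):

matrix FC = $fcast   # forecasts: one column per VAR equation
matrix SE = $fcse    # corresponding forecast standard errors
print FC SE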
In the GUI the relevant menu entry is again Forecasts under the Analysis menu in the window of the
estimated VAR model. Here the user must pick the variable of interest, after which a dialog window
with the relevant options is presented. As in the single-equation context a plot and a textual output
window are created. Again, forecast series can be added to the dataset through the “+” button, and
the plot can be saved or exported.
Special VAR cases: exogenous variables, cointegration
It may be worth noting that when a VAR is specified with additional (non-deterministic) exogenous
regressors, an issue similar to the single-equation case arises: the forecast is conditional and requires
assumptions about the development of those regressors out of sample. As before, these values can
easily be filled in after the dataset has been extended to cover the forecasting sample, but it is the
user, not gretl, who must decide what those values should be. This applies also to hand-crafted
deterministic variables such as shift dummies; standard deterministic terms like trends and seasonals,
on the other hand, are extrapolated by gretl automatically.
Using a cointegrated VAR model with gretl’s vecm command does not change the way a forecast is ob-
tained afterwards. The VECM can be internally represented as a VAR (in levels) that automatically
contains the reduced-rank restrictions of cointegration, and this VAR form is then used to calculate
the forecasts. Providing forecast standard errors and the associated confidence bands is also straight-
forward since only the innovation uncertainty is captured in those. This ease of use also carries over
to the situation when a VECM with additional exogenous terms is used for forecasting—provided
that future values of the exogenous variables are specified, of course.
35.6 Forecasting from simultaneous systems
To be interesting for a forecasting application, a simultaneous-equation system must be dynamic,
including some lags of endogenous variables as regressors. Otherwise we would be conducting a
scenario analysis purely conditional on assumed exogenous developments. For the following discussion
we therefore presuppose that we are dealing with such a dynamic system. Then the difference between
such a model set up with gretl’s system block and a VAR system concerns mainly two aspects: First,
a VAR model is already given as a so-called reduced form and as such is ready to be used for forward
simulation a.k.a. forecasting. In contrast, a simultaneous system can come in a structural form with
some contemporaneous endogenous variables as regressors in the equations; the future values of those
regressors are unknown, however. Second, a plain VAR is estimated by OLS, whereas a simultaneous
system can be estimated with different methods for reasons of efficiency.
Neither of these differences presents any deep challenge for forecasting, however.
As explained at the end of the previous chapter on multivariate models (see the subsection titled
“Structural and reduced forms”), it is easy to obtain the reduced form of any such simultaneous
equation system. This reduced form is used by gretl to simulate the system forward in time, just
as with a VAR model. The slight complication for computing the forecast variances is merely
that the estimated error term ϵ_t from the structural form must be mapped to the reduced-form
innovations v_t using the (inverse of the) estimated structural relations matrix Γ. This is
automatically taken into account.

The estimation method through which the coefficient values of the system are determined does
not matter for forecasting. The prediction algorithm can simply take these point estimates as
given, use these for calculating the associated reduced form, and use that representation to
iterate the model forward over the desired forecasting horizon. It should nonetheless be obvious
that different estimators entail different forecast values.
As a consequence of these considerations, the way to handle forecasts from simultaneous systems in
gretl is exactly as discussed before in the context of VARs (possibly with exogenous regressors). This
applies to the command-line interface as well as the GUI.
Chapter 36
State Space Modeling
36.1 Introduction
This chapter describes the handling of linear state space models in gretl 2022b and higher.1 Here is
a brief high-level overview of gretl’s Kalman apparatus.
To obtain a Kalman structure—in the form of a bundle—you use the ksetup function.
Having obtained such a bundle you can then adjust its contents, as described in detail below.
You then “do things” with your state space model via the functions kfilter (forecasting),
ksmooth (state smoothing), kdsmooth (disturbance smoothing), and/or ksimul (simulation).
36.2 Notation
In this document our basic representation of a state space model is given by the following pair of
equations:
y_t = Z_t α_t + ε_t        (36.1)
α_{t+1} = T_t α_t + η_t        (36.2)

where (36.1) is the observation or measurement equation and (36.2) is the state transition equation.
The state vector, α_t, is (r × 1) and the vector of observables, y_t, is (n × 1). The (n × 1) vector ε_t and
the (r × 1) vector η_t are assumed to be vector Gaussian white noise:

E(ε_t ε_s′) = Σ_t for t = s, otherwise 0
E(η_t η_s′) = Ω_t for t = s, otherwise 0

The number of time-series observations is denoted by N. In the case where Z_t = Z, T_t = T, Σ_t = Σ
and Ω_t = Ω for all t the model is said to be time-invariant. We assume time-invariance in much of
what follows but discuss the time-varying case—along with other extensions of the basic model—in
section 36.9.
36.3 Defining the model as a bundle
The ksetup function is used to initialize a state space model by specifying only its indispensable
elements: the observables and their link to the unobserved state vector, plus the law of motion for
the latter and the covariance matrix of its innovations. Therefore, the function takes a minimum of
four arguments. The corresponding bundle keys are as follows:
Symbol   Dimensions   Reserved key
y        N × n        obsy
Z        n × r        obsymat
T        r × r        statemat
Ω        r × r        statevar
1 The user interface was substantially different prior to version 2017a. For example, be aware that Lucchetti
(2011) is based on the old syntax. If anyone needs documentation for the original interface it can be found at
http://gretl.sourceforge.net/papers/kalman_old.pdf. Additional functionality relating to “exact diffuse” initialization
of the Kalman filter was added in version 2022b.
Please note that the matrix Z in the observation equation must be given in transposed form. This is required
to preserve compatibility with gretl versions prior to 2022a. Correspondingly, if you retrieve this matrix using its
key, obsymat, it’s the transpose you actually obtain.
The names of these input matrices don’t matter; in fact they may be anonymous matrices constructed
on the fly. But if and when you wish to copy them out of the bundle you must use the specified keys,
as in
matrix Z = SSmod.obsymat'
matrix T = SSmod.statemat
Although all the arguments are in principle matrices, as a convenience you may give obsy as a series
or list of series, and the other arguments can be given as scalars if in context they are 1 × 1.
If applicable you may specify any of the following optional input matrices:2
Symbol Dimensions Key If omitted. . .
Σn×nobsvar no disturbance term in observation equation
α0r×1inistate α0is a zero vector
P0r×rinivar P0is set automatically
These matrices are not passed to ksetup, rather you add them to the bundle returned by ksetup
(under their reserved keys) as you usually add elements to a bundle, for example:
SSmod.obsvar = Veps
Naturally, the arguments you pass to ksetup must have mutually compatible dimensions, otherwise
an error is returned. Once setup is complete the dimensions of the model—r, n and N—become
available as scalar members of the bundle (under their own names).
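For instance, assuming a bundle SSmod created via ksetup, these dimensions can be inspected directly:

printf "r = %d, n = %d, N = %d\n", SSmod.r, SSmod.n, SSmod.N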
In case inivar is not specified the matrix P_{1|0} will be automatically initialized by gretl only if all the
eigenvalues of T lie inside the unit circle and the model is stationary. In this case the variance for the
marginal distribution of α_t is well defined and the initializer is computed using

vec(P_{1|0}) = [I − T ⊗ T]^{−1} vec(Ω)

If the above condition is not satisfied you will have to make a choice on which technique to use for
“diffuse” initialization.
In Section 36.8 we provide a fuller discussion of the various options, but here’s what is probably the
bottom line for many users. In earlier versions of gretl a rather crude solution was adopted, initializing
P_{1|0} to a “numerically large” matrix. This was accomplished by setting a value of 1 on the bundle
under the (reserved) key diffuse. From gretl version 2022b on, if you have scripts where you set
diffuse=1 on your Kalman bundle you can now try diffuse=2 instead. This invokes the new “exact
initial” method for state space models with a diffuse initializer. Don’t expect identical results from
the new code, but to the extent results differ the new ones should be somewhat more accurate. (If
results differ wildly you’ve probably found a bug; please report it!) You may also find that the new
code is faster; it should be less likely to get hung up on numerical problems that delay or prevent
convergence of ML estimation.
36.4 Special features of state-space bundles
A bundle created by ksetup works in most ways like any other gretl bundle but some differences should
be noted. With an ordinary bundle you can replace or delete members at will; with a state-space
bundle there are certain constraints.
You can replace the coefficient matrices obsymat, statemat, statevar and (if applicable)
obsvar in a given bundle if you wish—but only on condition that the replacement matrix
has the same dimensions as the original. In other words, the dimensions r and n are set once
and for all by the ksetup call (section 36.3).
2Additional optional matrices are described in section 36.9 below.
You can replace the data matrix obsy subject to the condition that the number of columns, n,
is unchanged; the time-series length, N, is mutable.
None of the input matrices just mentioned can be deleted from the bundle.
Output matrices that are automatically added to the bundle by the functions described in the
following sections can be deleted (if you don’t need them and want to save on storage). But
they cannot be replaced by arbitrary user content under the same key.
The only other “special” member that can be deleted is the function call (string) that is discussed
in section 36.9.
Nonetheless, in the “user area” of the bundle (that is, under keys other than the reserved ones noted
in this chapter) the usual rules apply.
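As a brief sketch (assuming a bundle SSmod obtained via ksetup, and a user key chosen so as not to clash with the reserved ones):

SSmod.statemat = 0.9 * I(SSmod.r)   # allowed: same r x r dimensions as before
SSmod.note = "my own annotation"    # user-area key: ordinary bundle rules apply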
For all the k functions described below the first argument (and in most cases the only argument)
must be a pointer to a bundle obtained via ksetup. Any old bundle will not do. A “pointer to
bundle” is specified by prefixing the name of the bundle with an ampersand, as in &SSmod. Passing
the argument in this form allows these functions to modify the content of the bundle.
36.5 The kfilter function
Once a model is established as described in the previous section, kfilter can be used to run a
forward, forecasting pass. This function takes a single argument, namely a bundle-pointer, and it
returns a scalar code: 0 for successful completion or non-zero if numerical problems were encountered.
The forward iteration is as follows.

v_t = y_t − Z_t α_t
F_t = Z_t P_t Z_t′ + Σ_t
M_t = T_t P_t Z_t′
α_{t+1} = T_t α_t + K_t v_t
P_{t+1} = T_t P_t T_t′ + Ω_t − C_t

where K_t = M_t F_t^{−1} is the Kalman gain, and C_t = M_t F_t^{−1} M_t′.
On successful completion several elements are added to the input bundle (or updated if they’re
already present). A scalar under the key lnl gives the overall loglikelihood under the joint normality
assumption,

ℓ = −(1/2) [ nN log(2π) + Σ_{t=1}^{N} log|F_t| + Σ_{t=1}^{N} v_t′ F_t^{−1} v_t ]

while the key llt gives access to an N-vector, element t of which is

ℓ_t = −(1/2) [ n log(2π) + log|F_t| + v_t′ F_t^{−1} v_t ]

In addition the scalar s2 holds the scale factor,

σ̂² = 1/(nN − d) · Σ_{t=1}^{N} v_t′ F_t^{−1} v_t

where d denotes the number of elements in the state vector subject to a diffuse initialization. This is
as in SsfPack 2.2 (Koopman et al., 1999).
Five additional matrices also become available. Each of these has N rows, one for each time-step; the
contents of the rows are as follows.
1. Forecast errors for the observable variables, v_t′, n columns: key prederr.
2. Variance matrix for the forecast errors, vech(F_t), n(n + 1)/2 columns: key pevar.
3. Estimate of the state vector, α̂_{t|t−1}′, r columns: key state.
4. MSE of estimate of the state vector, vech(P_{t|t−1}), r(r + 1)/2 columns: key stvar.
5. Kalman gain, vec(K_t), rn columns: key gain.
The Kalman gain is rarely required by the user as such. However, since it is a key quantity in the
filtering algorithm we make it available under a dedicated key for diagnostic purposes in case numerical
problems should arise. For example, the following retrieves the gain after a filtering operation:
kfilter(&SSmod)
matrix G = SSmod.gain
Then if you want to retrieve, for example, the matrix K at time 10, you need to reshape the tenth
row of G into the appropriate dimensions:
matrix K10 = mshape(G[10,], SSmod.r, SSmod.n)
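The other output matrices can be unpacked in the same way. For instance, to recover the forecast-error variance matrix F_t and the forecast error v_t at, say, t = 10 (a sketch, purely for illustration):

matrix F10 = unvech(SSmod.pevar[10,]')   # n x n matrix F_10
matrix v10 = SSmod.prederr[10,]'         # n x 1 forecast error v_10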
36.6 The ksmooth function
Like kfilter this function takes a single bundle-pointer argument and returns an integer error code
(0 indicating success). It runs a forward, filtering pass followed by a backward pass which computes
a smoothed estimate of the state and its MSE using the method of Anderson and Moore.
Note that since ksmooth starts with a forward pass, it can be run without a prior call to kfilter.
This may appear to be useless duplication, but in fact it enables an efficient scripting option. The
main utility of the forward pass lies in the calculation of the log-likelihood in the context of estimation,
but if a state space model contains no parameters that have to be estimated, the model setup can be
followed directly by a call to ksmooth. (And the same goes for kdsmooth below.)
The backward-pass algorithm is as follows: for t = N, . . . , 1

L_t = T_t − K_t Z_t
u_{t−1} = Z_t′ F_t^{−1} v_t + L_t′ u_t
U_{t−1} = Z_t′ F_t^{−1} Z_t + L_t′ U_t L_t
α̂_{t|T} = α̂_{t|t−1} + P_{t|t−1} u_{t−1}
P_{t|T} = P_{t|t−1} − P_{t|t−1} U_{t−1} P_{t|t−1}

with initial values u_N = 0 and U_N = 0.
On successful completion all the quantities computed by kfilter are available as bundle members
(see section 36.5), but the keys state and stvar now give the smoothed estimates. That is, row t of
the state matrix holds α̂_{t|T}′ and row t of stvar holds P_{t|T}, in transposed vech form with r(r + 1)/2
elements.
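For example (a sketch, assuming a bundle SSmod with at least 25 time-series observations):

ksmooth(&SSmod)
matrix Ahat = SSmod.state                # row t holds the smoothed state at t
matrix P25 = unvech(SSmod.stvar[25,]')   # smoothed MSE matrix at t = 25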
36.7 The kdsmooth function
As with ksmooth, this function requires a bundle-pointer argument and returns an integer error code
(0 indicating success). It runs a forward, filtering pass followed by a backward pass which computes a
smoothed estimate of the disturbances along with a dispersion measure, using the methods described
in Koopman (1993) and Koopman et al. (1999).
Upon successful execution of the function the bundle will contain under the key smdist an N × (r + n)
matrix holding smoothed estimates of η_t and ε_t. That is, a matrix whose t-th row contains

(η̂_t′, ε̂_t′) = E[(η_t′, ε_t′) | y_1, . . . , y_T]
(This assumes the observation equation has a stochastic component; if it does not, then smdist is just
N×r.) Once the smoothed disturbances are obtained the smoothed state can be calculated quickly
and easily, so a call to kdsmooth updates the state member of the bundle passed as argument.
However, the variance of the state (stvar) is not updated by kdsmooth; only ksmooth does that.
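A minimal sketch of splitting smdist into its two blocks, assuming the observation equation has a disturbance term:

kdsmooth(&SSmod)
scalar r = SSmod.r
scalar n = SSmod.n
matrix eta_hat = SSmod.smdist[,1:r]       # smoothed state disturbances
matrix eps_hat = SSmod.smdist[,r+1:r+n]   # smoothed observation disturbances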
An associated dispersion measure is provided under the key smdisterr. The precise definition of
this matrix depends on a second, optional Boolean parameter. Before describing the action of this
parameter we need to give a brief account of the two variance measures that are found in the literature
on disturbance smoothing. Our account runs in terms of the state disturbance, η_t, but it applies
equally to the observation disturbance, ε_t, if present.
Two measures of variance
One measure of variance is the mean square distance of the inferred disturbances from zero (that is,
from their unconditional expectation). Let us call this V_t^{(1)}:

V_t^{(1)} = E(η̂_t η̂_t′)

This measure is used in computing the so-called auxiliary residuals, which are advocated in Durbin
and Koopman (2012) as useful diagnostic tools. Auxiliary residuals for the state equation are obtained
by dividing η̂_t by the square roots of the associated diagonal elements of V_t^{(1)}. In computing this
matrix we use the formulae given in Koopman et al. (1999, section 4.4).

A second measure of variance is the mean squared distance of the inferred disturbances from their
true values, or in other words the mean squared error, which we’ll write as V_t^{(2)}:

V_t^{(2)} = E[(η̂_t − η_t)(η̂_t − η_t)′ | y_1, . . . , y_T]

We calculate this matrix using the formulae given in Durbin and Koopman (2012, section 4.5.2). Its
diagonal elements can be used to form confidence intervals for the true disturbances.
We are now ready to state what gretl provides under the key smdisterr. If the optional second
argument to kdsmooth is present and non-zero the results are based on V_t^{(2)}, otherwise (that is, by
default) they are based on V_t^{(1)}. In either case row t of smdisterr contains the square roots of the
diagonal elements of the matrix in question: the first r elements pertain to the state disturbances
and the following n elements to the observation equation (if applicable). Like smdist, smdisterr has
N rows and either r + n or just r columns depending on whether or not there’s a disturbance term
in the observation equation. We return standard deviations rather than variances since most of the
time it’s the former that users will actually want.

Section 36.12 presents a script which exercises the disturbance smoother and illustrates the difference
between V_t^{(1)} and V_t^{(2)}.
36.8 Diffuse initialization of the state vector
We describe a state space model as “diffuse” if it is impossible to pin down the variance of α_t, which
is usually denoted by P_t. This may happen either because the model is non-stationary (in which case
P_t is not even defined) or simply from lack of information.

In that case there are two possible approaches. The “traditional” one, used by gretl up to version
2022a, is to ascribe a very large variance to the initial P_t, as in P_0 = κ × I_r where κ is, say, 10^7. You
can impose this diffuse prior by setting
SSmod.diffuse = 1
In some cases this strategy may lead to numerical problems. It may then be helpful to specify a
diffuse initializer via inivar using a somewhat smaller value of κ, as in
SSmod.inivar = 1.0e5 * I(stdim)
where stdim is the dimension of the state.
While the κ × I approach works fairly well in many cases it is nowadays generally deprecated in
favor of one or other “exact initial” method. Such methods depend on derivation of the properties of
the Kalman filter (and smoother) in the limit, as the aforementioned “very large” variance tends to
infinity. In libgretl we have implemented two such methods: the “univariate approach to multivariate
observable” advocated by Durbin and Koopman (2012) and the augmented Kalman method set out by
de Jong (1991) and de Jong and Chu-Chun-Lin (2003).3 We’ll refer to them via the labels univariate
and dejong, respectively.
Exact diffuse methods
The univariate approach handles a vector observable by “unpacking” it and substituting scalar
calculations for matrix ones so far as possible. Durbin and Koopman claim it is faster than the
alternatives. It is also able to deal in a straightforward way with incomplete observations (where
some but not all elements of y_t are missing at time t): it can utilize any non-missing elements while
ignoring the missing ones. However, it runs into complications if (a) the variance matrix of the
observation disturbances is not diagonal, and/or (b) the disturbances are correlated between the
state and observation equations. Case (a) can be handled at the cost of some extra preliminary
computation—transforming y and Z to induce a diagonal variance matrix—and this is automatically
carried out by gretl if needed. Handling case (b) is more bothersome, requiring augmentation of the
state; at present this is not supported in gretl.
The dejong approach has no problem with the variance cases (a) and (b) mentioned above. However,
it’s not clear how incomplete observations can be handled and at present observations with any
missing elements are ignored.
In short, there are cases where univariate may work best, and other cases that are not handled by
univariate but where dejong works fine. Hence our decision to implement both methods.
Table 36.1 sets out the various cases that arise via combination of “code” (where legacy indicates the
Kalman code as of gretl 2022a) and “diffuse status” (i.e. whether the model is diffuse, and if so how it
is handled). (Note that although the primary virtue of univariate and dejong is their handling of
the exact diffuse case, these methods can handle the non-diffuse case and the traditional “κ-diffuse”
case.)
             non-diffuse   κ-diffuse   exact diffuse
code         diffuse=0     diffuse=1   diffuse=2
legacy       1             2
univariate   4             5           6
dejong       7             8           9

Table 36.1: Cross-tabulation of code-path and diffuse status. Numbers in cells are used for reference in the
text; “legacy” indicates gretl 2022a or earlier.
The case used depends on various points, the primary one being the diffuse integer member of the
state space bundle, which defaults to 0 but can be set to 1 or 2.
diffuse=0: case 1 is the default (for backward compatibility) but case 4 or 7 can be selected,
by adding univariate=1 or dejong=1 to the bundle.
diffuse=1: case 2 is the default but case 5 or 8 can be selected as above.
diffuse=2: the default is 6 but can be switched to 9 via dejong=1, as illustrated below.
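For example, to request case 9 (exact diffuse initialization handled via the de Jong method) on a bundle SSmod:

SSmod.diffuse = 2
SSmod.dejong = 1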
For cases in the same column—namely {1, 4, 7}, {2, 5, 8} and {6, 9}—results from kfilter(), ksmooth()
and kdsmooth() should in principle be the same across the code-paths but in practice there are bound
to be slight differences due to the different algorithms employed. And note that slight differences at
that level may be somewhat amplified by iterated filtering as in ML estimation.
3 The first of these is used in the KFAS package for R (Helske, 2017) and the second by the sspace command in
Stata. See https://www.stata.com/manuals/tssspace.pdf.
36.9 Extensions and refinements
Regressors in the observation equation
The observation equation (36.1) can be augmented to allow for the effect of a k-vector of observable
exogenous variables, x_t, in addition to that of the unobserved state, as in

y_t = B_t x_t + Z_t α_t + ε_t

This specification can be added to a bundle previously obtained via ksetup by use of the keys obsx
(for x) and obsxmat (for B). In that case obsx must be an N × k matrix and B must be n × k. (But
please note: as with the case of Z described above, backward compatibility dictates that obsxmat be
given in transposed form.)

An exception to this dimensionality rule is granted for convenience. If the observation equation
includes a constant but no additional exogenous variables, you can give B as n × 1 without having
to specify obsx. More generally, if the column dimension of B is 1 greater than k it is assumed that
the first element of B is associated with an implicit column of ones.
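A sketch of the corresponding setup, where X (N × k) and B (n × k) are placeholder names for user matrices:

SSmod.obsx = X
SSmod.obsxmat = B'   # supplied in transposed form, like obsymat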
Intercept in the state equation
In some applications it may be useful to have an “intercept” in the state transition equation, thus
generalizing equation (36.2) to

α_{t+1} = µ_t + T_t α_t + η_t

The term µ is never strictly necessary: the system (36.1) and (36.2) can absorb it as an extra (non
time-varying) element in the state vector. However, this comes at the cost of expanding all the
matrices that touch the state (α, T, η, Ω, Z), making the model relatively awkward to formulate and
forecasts more expensive to compute. We therefore adopt the convention above on practical grounds.

The (r × 1) vector µ can be added to a bundle under the key stconst. Despite its name this matrix
can be specified as time-varying as explained in the next section.
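For example, supposing mu is an r × 1 vector of the user's choosing:

SSmod.stconst = mu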
Time-varying matrices
Any or all of the matrices obsymat, obsxmat, obsvar, statemat, statevar and stconst may be
time-varying. In that case you must supply the name of a function to be called to update the matrix
or matrices in question: you add this to the bundle as a string, under the key timevar_call.4 For
example, if just obsymat (Z_t) should be updated, by a function named TV_Z, you would write
SSmod.timevar_call = "TV_Z"
The function that plays this role will be called at each time-step of the filtering or simulation operation,
prior to performing any calculations. It should have a single bundle-pointer parameter, by means of
which it will be passed a pointer to the Kalman bundle to which the call is attached. Its return value
(if any) will not be used, so generally it returns nothing (is of type void). However, you can use
gretl’s funcerr keyword to raise an error if things seem to be going wrong; see chapter 14 for details.
Besides the bundle members noted above, a time variation function has access to the current (1-based)
time step, under the reserved key t, and the n-vector containing the forecast error from the previous
time step, v_{t−1}, under the key uhat; when t = 1 the latter will be a zero vector.
If any additional information is needed for performing the update it can be placed in the bundle under
a user-specified key. So, for example, a simple updater for a (1 × 1) Z matrix might look like this:
function void TV_Z (bundle *b)
b.obsymat = b.Zvals[b.t]
end function
where b.Zvals is a bundled N-vector. An updater that operates on both Z (n × r) and T (r × r)
might be
4The choice of the name for the function itself is of course totally up to the user.
function void update_2 (bundle *b)
b.obsymat = mshape(b.Zvals[b.t,], b.r, b.n)
b.statemat = unvech(b.Tvals[b.t,])
end function
where in this case we assume that b.Zvals is N × rn, with row t holding the (transposed) vec of Z_t,
and b.Tvals is N × r(r + 1)/2, with row t holding the vech of T_t. Simpler variants (e.g. just one
element of the relevant matrix is changed) and more complex variants—say, involving some
sort of conditionality—are also possible in this framework.
It is worth noting that this setup lends itself to a much wider scope than time-varying system matrices.
In fact, this syntax allows for the possibility of executing user-defined operations at each step. The
function that goes under timevar_call can read all the elements of the model bundle and can modify
several of them: the system matrices (which can therefore be made time-varying) as well as the user-
defined elements.
An extended example of use of the time-variation facility is presented in section 36.12.
Cross-correlated disturbances
The formulation given in equations (36.1) and (36.2) assumes mutual independence of the disturbances
in the state and observation equations, ε_t and η_t. This assumption holds good in many practical
applications, but in some cases one may wish to allow for cross-correlation.
More generally, we note three common representations of the variance of the disturbances in (36.1)
and (36.2).
1. The “basic” representation: ε_t and η_t are assumed to be mutually uncorrelated, and we write
their respective (possibly time-varying) variance matrices as V(ε_t) (n × n) and V(η_t) (r × r).
2. The “de Jong” representation: write ε_t = G_t ν_t and η_t = H_t ν_t, where G_t is n × p, H_t is r × p and
p is the length of the underlying disturbance vector ν_t. This formulation allows for correlation
of the disturbances across the equations, if H_t G_t′ is non-zero.
3. The “Durbin–Koopman” representation: as in the first case assume that the disturbances are
uncorrelated across the equations, but write η_t = R_t ξ_t and V(η_t) = R_t Q_t R_t′, where R_t is a
selection matrix and Q_t = V(ξ_t). Let m ≤ r denote the dimension of ξ_t. Then Q_t is m × m and
R_t is r × m. This allows for the possibility that there are fewer disturbances to the state than
elements of the state vector.
With the de Jong representation, in place of (36.1)–(36.2) we may write

y_t = Z_t α_t + G_t ν_t
α_{t+1} = T_t α_t + H_t ν_t.

In that case we may re-express the variance matrices from section 36.2 above as

Σ_t = G_t G_t′
Ω_t = H_t H_t′

with the addition of

Cov(η_t, ε_t) = H_t G_t′
You can select the de Jong or Durbin–Koopman representation by supplying extra arguments to the
ksetup function. For the de Jong version, in place of giving Ω you should give the two matrices
identified above as H and G, as in

bundle SSxmod = ksetup(y, Z, T, H, G)

and in case you wish to retrieve or update information on the variance of the disturbances, note that
in the cross-correlated case the bundle keys statevar and obsvar are taken to designate the factors
H and G respectively.
To select the Durbin–Koopman representation a sixth, boolean argument must be used. If that has
a non-zero value statevar is taken to be Q and the fifth argument is taken to be R. Note that in
this case obsvar should be added separately as in the basic case.
The following statements illustrate the three cases.
# basic
bundle kb1 = ksetup(y, Z', T, Veta)
kb1.obsvar = Veps # if wanted
# de Jong
bundle kb2 = ksetup(y, Z', T, H, G)
# Durbin-Koopman
bundle kb3 = ksetup(y, Z', T, Q, R, 1)
kb3.obsvar = Veps # if wanted
36.10 The ksimul function
This simulation function has as its required arguments a pointer to a Kalman bundle and a matrix
containing artificial disturbances, and it returns a matrix of simulation results. An optional trailing
Boolean argument is supported, the purpose of which is explained below.
If you are calling ksimul as a follow-up to other state-space computations you will presumably have
a Kalman bundle to hand to serve as the first argument. If you’re doing simulation from scratch you
need to create such a bundle, but note that in this case the first argument to ksetup (the matrix of
observables) is really just a place-holder: a 2 × n zero matrix will suffice (two rows being the minimum
required for this matrix). The number of simulated observations is set by the row dimension of the
second argument to ksimul.
If the disturbances are not cross-correlated, the matrix argument must be either N × r, if there is
no disturbance in the observation equation, or N × (r + n) if the Σ (obsvar) matrix is specified.
Row t holds either η_t′ or (η_t′, ε_t′). Note that if Λ (statevar) is not simply an identity matrix you will
have to scale the artificial state disturbances appropriately; the same goes for Σ and the observation
disturbances, if present. Given a matrix U containing standard normal variates, in the general case
this requires finding a matrix A such that

AA′ = [ Λ, 0_{r×n} ; 0_{n×r}, Σ ]

and post-multiplying U by A′ (although it’s not necessary to form A explicitly if the disturbance
variance matrices are diagonal). This is not particularly onerous if Λ is time-invariant; gretl’s psdroot
function can be used to form A if it’s needed. If Λ and/or Σ are time-varying, however, this scaling
can become quite burdensome. As a convenience, the ancillary function ksimdata can be used to
pre-process the U matrix automatically. Here’s an example (we assume that a bundle named b has
been obtained via ksetup):
matrix N = mnormal(500, b.r + b.n)
matrix E = ksimdata(&b, N)
matrix sim = ksimul(&b, E)
or if you have no need to store the disturbances:
matrix sim = ksimul(&b, ksimdata(&b, mnormal(500, b.r + b.n)))
Time-variation or no, ksimdata will ensure that the disturbances are scaled to the variances specified
by Λ_t and Σ_t.
If the disturbances are cross-correlated (see section 36.9) then the matrix argument to ksimul should
be N × p, each row holding ν_t′. In this case no prior scaling is required since it is assumed that ν_t ∼
N(0, I) and the disturbances for the respective equations, H_t ν_t and G_t ν_t, are computed internally,
with regard to time-variation if applicable.
As mentioned above, for the purposes of ksimul the time-series length, N, is defined by the number
of rows of the supplied disturbance matrix. This need not equal the original N value (set from obsy
in the initial call to ksetup), since obsy is ignored under simulation. However, if the model includes
exogenous variables in the observation equation (obsx) and the simulation length is greater than the
original N, the simulation will run out of data and fail unless you supply a larger x matrix. This
can be done in either of two ways. You can add a suitably sized matrix to the Kalman bundle under
the key simx; if present, this will be used in preference to obsx for simulation. Or if you don’t mind
over-writing the original obsx you can substitute another matrix with the required number of rows
under that key.
By default, the value returned by ksimul is an (N × n) matrix holding simulated values for the vector
of observables at each time step; that is, row t holds ỹ_t′, where the tilde indicates a simulated quantity.
To obtain a record of the simulated state, supply a non-zero value for the final, optional argument. In
that case the returned matrix is N × (r + n) and contains both the simulated state and the simulated
observable; row t holds (α̃_t′, ỹ_t′).
Note that the original obsy member of the bundle is not overwritten by ksimul, nor is state or any
other user-accessible output matrix. On exit from ksimul the prior value of N is restored.
The initial state under simulation
The recursion that yields ỹ and α̃ is as follows: for t = 1, . . . , N

ỹ_t = Z_t α̃_t + ε_t
α̃_{t+1} = T_t α̃_t + η_t

This implies that a value for α̃_1 is required to get things started. You can add such a value to a
Kalman bundle under the reserved key simstart. If this member is not present in the bundle, α̃_1
defaults to the value given under the key inistate, or if that in turn is not present, to a zero vector.
Alternatively, the starting point can be made stochastic. To do this you can emulate the procedure
followed by SsfPack, namely setting

α_1 = a + A v_0

where a is a non-stochastic r-vector, v_0 is an r-vector of standard normal random numbers, and A is
a matrix such that AA′ = P_0.

Let’s say we have a state-space bundle b, on which we have already set suitable values of inistate
(corresponding to a above) and inivar (P_0). To perform a simulation with a stochastic starting point
we can set α_1 thus:
matrix A = psdroot(b.inivar)
b.simstart = b.inistate + A * mnormal(b.r, 1)
36.11 Numerical optimization
If the object of using a state space model is to produce maximum likelihood estimates of some
parameters of interest, note that the loglikelihood surface may be quite awkward (far from globally
concave), posing a challenge for numerical methods such as BFGS, the default maximizer under
gretl’s mle command. Symptoms may include failure of convergence—typically due to an excessive
computed gradient even as the maximizer cannot find an improvement in the objective function—or
an excessive number of iterations. In such cases it is worth considering the following points.
In some cases, scaling the observables may help: if the order of magnitude of y_t is too small or
too large, floating-point precision may become an issue for estimating variances.
If you can obtain plausible initial values for the parameters, things are likely to go better than
starting with arbitrary values.
The limited-memory version of BFGS (L-BFGS) may work better than the standard version in
some cases. To engage this, issue the command
set lbfgs on
prior to ML estimation.
It may be helpful to employ a more accurate (but computationally more expensive) method for
computing the gradient, namely Richardson extrapolation. Here the command is
set bfgs_richardson on
36.12 Example scripts
This section presents a selection of short sample scripts to illustrate the most important points covered
in this chapter.
ARMA estimation
Functions illustrated in this example: ksetup,kfilter.
Listing 36.1: ARMA estimation [Download ]
function void arma11_via_kalman (series y)
/* parameter initialization */
scalar phi = 0
scalar theta = 0
scalar sigma = 1
/* state-space model setup */
matrix Z = {1, theta}
matrix T = {phi, 0; 1, 0}
matrix Q = {sigma^2, 0; 0, 0}
bundle kb = ksetup(y, Z’, T, Q)
/* maximum likelihood estimation */
mle logl = ERR ? NA : kb.llt
kb.obsymat[2] = theta
kb.statemat[1,1] = phi
kb.statevar[1,1] = sigma^2
ERR = kfilter(&kb)
params phi theta sigma
end mle --hessian
end function
# ------------------------ main ---------------------------
open arma.gdt # open the "arma" example dataset
arma11_via_kalman(y) # estimate an arma(1,1) model
arma 1 1 ; y --nc # check via native command
As is well known, the Kalman filter provides a very efficient way to compute the likelihood of ARMA
models; as an example, take an ARMA(1,1) model
y_t = ϕ y_{t−1} + ε_t + θ ε_{t−1}

One of the ways the above equation can be cast in state-space form is by defining a latent process
α_t = (1 − ϕL)^{−1} ε_t. The observation equation corresponding to (36.1) is then

y_t = α_t + θ α_{t−1}        (36.3)

and the state transition equation corresponding to (36.2) is

(α_t, α_{t−1})′ = [ϕ, 0; 1, 0] (α_{t−1}, α_{t−2})′ + (ε_t, 0)′
The hansl syntax for a corresponding filter would be
matrix Z = {1, theta}
matrix T = {phi, 0; 1, 0}
matrix Q = {s^2, 0; 0, 0}
bundle kb = ksetup(y, Z', T, Q)
or, if you prefer, just a one-liner:
bundle kb = ksetup(y, {1; theta}, {phi, 0; 1, 0}, {s^2, 0; 0, 0})
Note that the observation equation (36.3) does not include an “error term”; this is equivalent to saying
that var(ε_t) = 0 and there is therefore no need to add an obsvar element to the bundle.

Once the filter is set up, all it takes to compute the log-likelihood for given values of ϕ, θ and σ²_ε
is to execute the kfilter function and read the Kalman bundle’s lnl member (which holds the
total loglikelihood) or—more appropriately if the likelihood has to be maximized through mle—llt,
which gives the series of contributions to the log-likelihood per observation. An example is shown in
script 36.1.
The Hodrick–Prescott filter
Functions illustrated in this example: ksetup,kfilter,ksmooth.
The Hodrick–Prescott (HP) filter can be obtained as the solution of a prediction problem, that is
obtaining estimates of the latent time series µ_t in

x_t = µ_t + ε_t
∆²µ_t = η_t,

where x_t is an observable time series and ε_t and η_t are mutually uncorrelated white noise processes;
the ratio between the two variances is the “smoothing” parameter: λ = V(ε_t)/V(η_t). This is normally
calibrated rather than estimated, and a common choice for macro data is to make it a function of the
data periodicity: λ = 100 · p², where p is the number of subperiods in a year. See Harvey and Jaeger
(1993) and King and Rebelo (1993) for details.
The most common use of the HP filter is to obtain an estimate of the trend µ_t using all the available
data, so that the estimate for each period t uses future as well as prior values of x_t. But in some
uses (such as forecasting) this is clearly undesirable, and we may wish to construct µ̂_t as a function
of x_t, x_{t−1}, . . . only. This point was famously made by Stock and Watson (1999), and the resulting
estimator is known as the one-sided HP filter.
A possible state-space representation of the model above is

x_t = [1, 0] (µ_t, µ_{t−1})′ + ε_t
(µ_t, µ_{t−1})′ = [2, −1; 1, 0] (µ_{t−1}, µ_{t−2})′ + η_t.
Estimates for µ_t can be obtained by running a forward filter for the one-sided version, plus a smoothing
pass for the two-sided one. Code implementing the filter is shown in script 36.2, along with an example
using the “housing starts” series from the St. Louis Fed database. The example also compares the
result of the function to that from gretl’s native hpfilt() function.
Note that in the case of the one-sided filter a little trick is required in order to get the desired result:
the state matrix stored by the kfilter() function is the estimate of α̂_{t|t−1}, whereas what we require
is in fact α̂_{t|t}. To work around this we add an extra observation to the end of the series and retrieve
the one-step-ahead estimate of the lagged state.
Local level model
Functions illustrated in this example: ksetup,kfilter,ksmooth.
Listing 36.2: HP filter [Download ]
function series hp_via_kalman(series y, scalar lambda[0], bool oneside[0])
if lambda == 0
lambda = 100 * $pd^2
endif
# State transition matrix
matrix T = {2, -1; 1, 0}
# Observation matrix
matrix Z = {1, 0}
# Covariance matrix in the state equation
matrix Q = {1/sqrt(lambda), 0; 0, 0}
matrix my = {y}
string desc = ""
if oneside
matrix my = my | 0
desc = "1-sided "
endif
ssm = ksetup(my, Z', T, Q)
ssm.obsvar = sqrt(lambda)
ssm.inistate = {2*y[1]-y[2] ; 3*y[1]-2*y[2]}
ssm.diffuse = 1
err = oneside ? kfilter(&ssm) : ksmooth(&ssm)
if err
series ret = NA
else
mu = oneside ? ssm.state[2:,2] : ssm.state[,1]
series ret = y - mu
endif
string d = sprintf("%sHP-filtered %s (lambda = %g)", desc, argname(y), lambda)
setinfo ret --description="@d"
return ret
end function
# --- example ------------------------------------------------------------
clear
open fedstl.bin
data houst
y = log(houst)
# one-sided, built-in then hansl
n1c = hpfilt(y, 1600, 1)
series h1c = hp_via_kalman(y, 1600, 1)
ols n1c const h1c --simple-print
# two-sided, built-in then hansl
n2c = hpfilt(y, 1600)
series h2c = hp_via_kalman(y, 1600)
ols n2c const h2c --simple-print
Suppose we have a series y_t = µ_t + ε_t, where µ_t is a random walk with normal increments of variance
σ²_1 and ε_t is normal white noise with variance σ²_2, independent of µ_t. This is known as the “local level”
model, and it can be cast in state-space form as equations (36.1)–(36.2) with T = 1, η_t ∼ N(0, σ²_1),
Z = 1 and ε_t ∼ N(0, σ²_2).5 The translation to hansl is
bundle llmod = ksetup(y, 1, 1, s1)
llmod.obsvar = s2
llmod.diffuse = 1
The two unknown parameters σ²_1 and σ²_2 can be estimated via maximum likelihood. Listing 36.3
provides an example of simulation and estimation of such a model. Since simulating the local level
model is trivial using ordinary gretl commands, we don’t use ksimul in this context.6
Listing 36.3: Local level model [Download ]
nulldata 200
set seed 101010
setobs 1 1 --special-time-series
/* set the true variance parameters */
true_s1 = 0.5
true_s2 = 0.25
/* and simulate some data */
v = normal() * sqrt(true_s1)
w = normal() * sqrt(true_s2)
mu = 2 + cum(v)
y = mu + w
/* starting values for variance estimates */
s1 = 1
s2 = 1
/* state-space model set-up */
bundle kb = ksetup(y, 1, 1, s1)
kb.obsvar = s2
kb.diffuse = 1
/* ML estimation of variances */
mle ll = ERR ? NA : kb.llt
ERR = kfilter(&kb)
params kb.statevar kb.obsvar
end mle
/* compute the smoothed state */
ksmooth(&kb)
series muhat = kb.state
5Note that the local level model, plus other common “Structural Time Series” models are implemented in the
StrucTiSM function package.
6 Warning: as the script stands, there is an “off-by-one” misalignment between the state vector and the observable
series. For convenience, the script is written as if equation (36.2) was modified into the equivalent formulation

α_t = T α_{t−1} + η_t

We kept the script as simple as possible so that the reader can focus on the interesting aspects.
Time-varying models
To illustrate state space models with time-varying system matrices we will use time-varying OLS.
Suppose the DGP for an observable time series y_t is given by

y_t = β_0 + β_{1,t} x_t + ε_t        (36.4)

where the slope coefficient β_1 evolves through time according to

β_{1,t+1} = β_{1,t} + η_t        (36.5)
Listing 36.4: Phillips curve on Euro data with time-varying slope [Download ]
function void at_each_step(bundle *b)
b.obsymat = transp(b.mX[b.t,])
end function
open AWM.gdt --quiet
smpl 1974:1 1994:1
/* parameter initialization */
scalar b0 = mean(INFQ)
scalar s_obs = 0.1
scalar s_state = 0.1
/* bundle setup */
bundle B = ksetup(INFQ, 1, 1, 1)
matrix B.mX = {URX}
matrix B.depvar = {INFQ}
B.timevar_call = "at_each_step"
B.diffuse = 1
/* ML estimation of intercept and the two variances */
mle LL = err ? NA : B.llt
B.obsy = B.depvar - b0
B.obsvar = s_obs^2
B.statevar = s_state^2
err = kfilter(&B)
params b0 s_obs s_state
end mle
/* display the smoothed time-varying slope */
ksmooth(&B)
series tvar_b1hat = B.state[,1]
series tvar_b1se = sqrt(B.stvar[,1])
gnuplot tvar_b1hat --time-series --with-lines --output=display \
--band=tvar_b1hat,tvar_b1se,1.96 --band-style=fill
It is easy to see that the pair of equations above define a state space model, with equation (36.4) as
the measurement equation and (36.5) as the state transition equation. The unobservable state is β_{1,t},
T = 1 and Ω = σ²_η. As for the measurement equation, Σ = σ²_ε, while the matrix multiplying β_{1,t}, and
hence playing the role of Z_t, is the time-varying x_t.

Once the system is framed as a state-space model, estimation of the three unknown parameters β_0,
σ²_ε and σ²_η can proceed by maximum likelihood in a manner similar to examples 36.1 and 36.3. The
sequence of slope coefficients β_{1,t} can then be estimated by running the smoother, which also yields
a consistent estimate of the dispersion of the estimated state.
Listing 36.4 presents an example in which data from the AWM database are used to estimate a
Phillips Curve with time-varying slope:
INFQ_t = β_0 + β_{1,t} URX_t + ε_t
Figure 36.1: Phillips Curve on Euro data: time-varying slope and 95% confidence interval
where INFQ is a measure of quarterly inflation and URX a measure of unemployment. At the end
of the script the evolution of the slope coefficient over time is plotted along with a 95% confidence
band—see Figure 36.1.
Disturbance smoothing
Functions illustrated in this example: ksetup,kdsmooth.
In section 36.7 we noted that the kdsmooth function can produce two different measures of the
dispersion of the smoothed disturbances, depending on the value of the (optional) trailing Boolean
parameter. Here we show what these two measures are good for, using the famous Nile flow data that
have been much analysed in the state-space literature. We focus on the state equation; that is, the
random-walk component of the observed series.
Our script is shown in Listing 36.5. This is an instance of the Local Level model and the ML variance
estimates are obtained as in Listing 36.3. In the first call to kdsmooth we omit the optional switch
and therefore compute E(η̂_t η̂_t′) for each t. This quantity is suitable for constructing the auxiliary
residuals shown in the top panel of Figure 36.2 (for similar plots see Koopman et al., 1999; Pelagatti,
2011). This plot suggests the presence of a structural break shortly prior to 1900, as various authors
have observed.
In the second kdsmooth call we ask gretl to compute instead E[(η̂_t − η_t)(η̂_t − η_t)′ | y_1, . . . , y_T], the
MSE of η̂_t considered as an estimator of η_t. And in the lower panel of the Figure we plot η̂_t along
with a 90% confidence band (roughly, ±1.64 times the RMSE). This reveals that, given the sampling
variance of η̂_t, we’re not really sure that any of the η_t values were truly different from zero. The
resolution of the seeming conflict here is commonly reckoned to be that there was in fact a change in
mean around 1900, but besides that event there’s little evidence for a non-zero σ²_η. Or in other words
the standard local level model is not really applicable to the data.
36.13 Graphical interface
By this point, the reader will have gathered that setting up a state space model can be quite a complex
undertaking, and the only general way to accomplish it is by writing a script. However, some cases are
simple enough to lend themselves to a standardized treatment, and so can be handled via a relatively
streamlined graphical interface. As of version 2022a, gretl provides just this: a GUI for estimating a
subset of state space models that, while limited, may still be useful for pedagogical purposes, sparing
the user from the intricacies of scripting. In this section, we describe the GUI and the class of models
it supports.
Listing 36.5: Working with smoothed disturbances (Nile data) [Download ]
open nile.gdt
# ML variance estimates
scalar s2_eta = 1468.49
scalar s2_eps = 15099.7
bundle LLM = ksetup(nile, 1, 1, s2_eta)
LLM.obsvar = s2_eps
LLM.diffuse = 1
kdsmooth(&LLM)
series eta_aux = LLM.smdist[,1] ./ LLM.smdisterr[,1]
series zero = 0
plot eta_aux
options time-series with-lines band=zero,const,2
literal unset ylabel
literal set title 'Auxiliary residual, state equation'
end plot --output=display
kdsmooth(&LLM, 1)
series etahat = LLM.smdist[,1]
series sdeta = LLM.smdisterr[,1]
plot etahat
options time-series with-lines band=etahat,sdeta,1.64485
literal unset ylabel
literal set title 'State disturbance with 90% confidence band'
end plot --output=display
Figure 36.2: Nile data: (a) auxiliary (standardized) residuals, state equation; (b) estimated state disturbance
η̂_t with 90% confidence band
The GUI can be used for performing ML estimation of models of the kind

y_t = Z α_t + ε_t        (36.6)
α_{t+1} = T α_t + R η_t        (36.7)

where y_t is a vector of observables, V(ε_t) is a diagonal matrix—or possibly 0, in which case the last
term of equation (36.6) is dropped. As for the covariance matrix of the shocks to equation (36.7),
it is assumed that η_t is an IID sequence of normal random variates with diagonal covariance matrix
Σ_η. Therefore, the covariance matrix denoted by Ω in the previous sections of this chapter (whose
corresponding key in the Kalman bundle is statevar) is assumed to be Ω = R Σ_η R′. Note that R
can have fewer columns than r, thereby making Ω singular. In the graphical interface, this is called
the “state variance factor”.
The system matrices Z, T and R are assumed to be time-invariant and known, so estimation only
concerns the variances of ε_t and η_t. Clearly, this is a limited subset of the range of models that gretl
can handle, but it may be of some value to users.
Figure 36.3: GUI hook for state space models
ML estimation is carried out internally using the mle command with the limited-memory version
of the BFGS optimizer, and the user is given the option of tracking the optimization process via
a “verbosity” option. For reasons of numerical performance, it is convenient to have the choice of
representing variances as transformations of the BFGS parameters in one of the three following ways:
Absolute value: maximization is performed on the variances: σ² = |θ|
Square: maximization is performed on the standard deviations: σ² = θ²
Exponential: maximization is performed on the log standard deviations: σ² = exp(2·θ)
Normally, this choice should make no difference for well-behaved data, although numerical problems
may occur sometimes. In these cases, it may be helpful to rescale the data by multiplying y_t by some
scalar (such as 100 or 0.0001) so as to make the order of magnitude of the parameters less prone to
finite-precision issues. In any case, the function reports the estimates of the standard errors whatever
the parametrization type. Once the parameters are estimated the user has the choice of performing
smoothing of the states.
The GUI is shown in figure 36.3. The “observables” box is used for specifying a list of series (or a
single series) for y_t. The next two boxes handle the Z and T matrices, respectively. These can be
pre-existing matrices or may be created “on the fly”. The same applies for the next box, dedicated to
the R matrix. However, the R matrix can be omitted, in which case it is implicitly assumed R = I.
The remaining GUI elements should hopefully be self-explanatory.
The function returns a bundle which includes a sub-bundle called kmod with all the state-space
internals; a matrix called state holding the estimated states; and matrices coeff and vcv holding,
respectively, the coefficients and standard errors obtained via ML estimation.
Example: Random walk plus noise
The model here is
y_t = α_t + ε_t        α_{t+1} = α_t + η_t

so that Z = T = R = 1.

The following script simulates the DGP above with V(ε_t) = 1 and V(η_t) = 1/16, and sets up the two
matrices Z and T, ready to be entered into the second and third boxes of the GUI helper, respectively;
obviously, the first box should contain the string y. Note that the first box expects as argument a
named list, thereby allowing for multivariate models.
clear
set verbose off
set seed 280921
nulldata 256
setobs 1 1 --special
# example 1: random walk plus noise
series m = cum(normal() * 0.25)
series y = m + normal()
Z = {1}
T = {1}
Figure 36.4: Estimated state
Once this is done, clicking on the “OK” button should result in the output shown below; by clicking
on the “Graph” icon the picture shown in Figure 36.4 should be produced.
Observation equation
coefficient std. error z p-value
------------------------------------------------------
stdev[1] 0.939311 0.0496287 18.93 6.86e-80 ***
State transition equation
coefficient std. error z p-value
------------------------------------------------------
stdev[1] 0.274650 0.0488920 5.617 1.94e-08 ***
Log-likelihood = -388.946
Example: Random walk plus noise plus seasonal
The model here is

y_t = µ_t + s_t + ε_t,

where µ_t = µ_{t−1} + η_{1,t} is a random walk, as in the previous example, and s_t is a seasonal component,
implicitly defined by the property

s_t = − Σ_{j=1}^{S−1} s_{t−j} + η_{2,t},

S being the number of subperiods. This model is amenable to the representation used in this section
by defining the state vector as

α_t = [µ_t, s_t, s_{t−1}, s_{t−2}].
For example, with quarterly data the system matrices would be equal to

Z = [1, 1, 0, 0]

T = [1, 0, 0, 0; 0, −1, −1, −1; 0, 1, 0, 0; 0, 0, 1, 0]

R = [1, 0; 0, 1; 0, 0; 0, 0]
The following script applies this model to one of the series in the gretl example dataset data9-3.gdt:
open data9-3
y = log(reskwh)
Z = ones(2,1) | zeros(2,1)
SeasMat = -ones(1,3) | I(2,3)
T = diagcat(1, SeasMat)
R = I(2) | zeros(2,2)
Again, filling the GUI boxes in the obvious way and clicking “OK” will produce the output below:
Observation equation
coefficient std. error z p-value
-----------------------------------------------------
stdev[1] 0.0173633 0.00448117 3.875 0.0001 ***
State transition equation
coefficient std. error z p-value
------------------------------------------------------
stdev[1] 0.0269790 0.00409457 6.589 4.43e-11 ***
stdev[2] 0.00648082 0.00202576 3.199 0.0014 ***
Log-likelihood = 121.33
Note that the output window will contain a few icons on the top bar. By clicking on the second one
from the left, it is possible to save to the gretl workspace one or more elements from the returned
bundle. For example, the kmod key corresponds to the estimated Kalman bundle. Saving it under the
name kb and running the code below will produce the plot shown in Figure 36.5.
series trend = kb.state[,1]
series seas = kb.state[,2]
scatters y trend seas
Figure 36.5: Estimated trend and seasonal component
Chapter 37
Numerical methods
Several functions are available to aid in the construction of special-purpose estimators: their purpose
is to find numerically approximate solutions to problems that in principle could be solved analytically,
but in practice cannot be, for one reason or another. In this chapter, we illustrate the tools that gretl
offers for optimization of functions, differentiation and integration.
37.1 Derivative-based optimization methods
In some cases, the function we want to optimize is differentiable and has a maximum in the interior
of the search space. In these cases, you will want to use algorithms that exploit this feature, such
as BFGS or Newton–Raphson. If this is not the case, you may want to use derivative-free methods,
which are illustrated in section 37.2.
BFGS
The BFGSmax function has two required arguments: a vector holding the initial values of a set of
parameters, and a call to a function that calculates the (scalar) criterion to be maximized, given
the current parameter values and any other relevant data. If the object is in fact minimization, this
function should return the negative of the criterion. On successful completion, BFGSmax returns the
maximized value of the criterion and the vector given via the first argument holds the parameter
values which produce the maximum. It is assumed here that the objective function is a user-defined
function (see Chapter 14) with the following general set-up:
function scalar ObjFunc (const matrix theta, matrix *X)
scalar val = ... # do some computation
return val
end function
The first argument contains the parameter vector (which should not be modified within the function)
and the second may be used to hold “extra” values that are necessary to compute the objective
function, but are not the variables of the optimization problem. Here the pointer form is chosen for
the argument, but depending on the problem it could also be passed as a plain argument, with or
without the const modifier. For example, if the objective function were a log-likelihood, the first
argument would contain the parameters and the second one the data. Or, for more economic-theory
inclined readers, if the objective function were the utility of a consumer, the first argument might
contain the quantities of goods and the second one their prices and disposable income.
The operation of BFGS can be adjusted using the set variables bfgs_maxiter and bfgs_toler (see
Chapter 26). In addition you can provoke verbose output from the maximizer by setting max_verbose
to on, again via the set command (alternatively, set it to full for even richer output).
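For instance, a preamble along the following lines can be used to tune the optimizer (the specific values are merely illustrative):
set bfgs_maxiter 500     # maximum number of BFGS iterations
set bfgs_toler 1.0e-10   # convergence tolerance
set max_verbose full     # print detailed progress information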
The Rosenbrock function is often used as a test problem for optimization algorithms. It is also known
as “Rosenbrock’s Valley” or “Rosenbrock’s Banana Function”, on account of the fact that its contour
lines are banana-shaped. It is defined by:
f(x, y) = (1 - x)^2 + 100(y - x^2)^2
The function has a global minimum at (x, y) = (1,1) where f(x, y) = 0. Listing 37.1 shows a gretl
script that discovers the minimum using BFGSmax (giving a verbose account of progress). Note that,
in this particular case, the function to be maximized only depends on the parameters, so the second
parameter is omitted from the definition of the function Rosenbrock.
Listing 37.1: Finding the minimum of the Rosenbrock function [Download ]
function scalar Rosenbrock(const matrix param)
scalar x = param[1]
scalar y = param[2]
return -(1-x)^2 - 100 * (y - x^2)^2
end function
matrix theta = {0, 0}
set max_verbose on
M = BFGSmax(&theta, Rosenbrock(theta))
print theta
Supplying analytical derivatives for BFGS
An optional third argument to the BFGSmax function enables the user to supply analytical derivatives
of the criterion function with respect to the parameters (without which a numerical approximation to
the gradient is computed). This argument is similar to the second one in that it specifies a function
call. In this case the function that is called must have the following signature.
Its first argument should be a pre-defined matrix correctly dimensioned to hold the gradient; that
is, if the parameter vector contains k elements, the gradient matrix must also be a k-vector. This
matrix argument must be given in "pointer" form so that its content can be modified by the function.
(Note that unlike the parameter vector, where the choice of initial values can be important, the initial
values given to the gradient are immaterial and do not affect the results.)
In addition the gradient function must have the parameter vector as one of its arguments. This may
be given in pointer form (which enhances efficiency) but that is not required. Additional arguments
may be specified if necessary.
Given the current parameter values, the function call must fill out the gradient vector appropriately.
It is not required that the gradient function returns any value directly; if it does, that value is ignored.
Listing 37.2 illustrates, showing how the Rosenbrock script can be modified to use analytical deriva-
tives. (Note that since this is a minimization problem the values written into g[1] and g[2] in the
function Rosen_grad are in fact the derivatives of the negative of the Rosenbrock function.)
Limited-memory variant and constrained optimization
As an alternative to “standard” BFGS gretl offers the limited-memory variant, L-BFGS-B. This is
described by Byrd et al. (1995) and Zhu et al. (1997). Gretl uses version 3.0 of this code, which
features improvements described by Morales and Nocedal (2011). Some problems that defeat standard
BFGS may be amenable to solution by L-BFGS-B. To see if this is the case, gretl code that uses BFGS
can be pushed into using the alternative algorithm via the set command, as follows:
set lbfgs on
The primary case for using L-BFGS-B, however, is constrained optimization: this algorithm supports
constraints on the parameters in the form of minima and/or maxima. In gretl this is implemented
by the function BFGScmax (‘c’ for constrained). The syntax is basically similar to that of BFGSmax,
except that the first argument must be followed by specification of a "bounds" matrix. This matrix
should have three columns and as many rows as there are constrained elements of the parameter
vector. Each row should hold the (1-based) index of the constrained parameter, followed by lower
and upper bounds. The values -$huge and $huge should be used to indicate that the parameter
is unconstrained downward or upward, respectively. For example, the following code constructs a
matrix to specify that the second element of the parameter vector must be non-negative, and the
fourth must lie between 0 and 1:
matrix bounds = {2, 0, $huge; 4, 0, 1}
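For illustration, re-using the Rosenbrock function from Listing 37.1, a call along the following lines should maximize the criterion subject to the constraint that the second parameter lies between 0 and 0.5:
matrix theta = {0, 0}
matrix bounds = {2, 0, 0.5}   # element 2 constrained to [0, 0.5]
M = BFGScmax(&theta, bounds, Rosenbrock(theta))
print theta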
Listing 37.2: Rosenbrock function with analytical gradient [Download ]
function scalar Rosenbrock (const matrix param)
scalar x = param[1]
scalar y = param[2]
return -(1-x)^2 - 100 * (y - x^2)^2
end function
function void Rosen_grad (matrix *g, const matrix param)
scalar x = param[1]
scalar y = param[2]
g[1] = 2*(1-x) + 2*x*(200*(y-x^2))
g[2] = -200*(y - x^2)
end function
matrix theta = {0, 0}
matrix grad = {0, 0}
set max_verbose 1
M = BFGSmax(&theta, Rosenbrock(theta), Rosen_grad(&grad, theta))
print theta
print grad
Newton–Raphson
BFGS, discussed above, is an excellent all-purpose maximizer, and about as robust as possible given
the limitations of digital computer arithmetic. The Newton–Raphson maximizer is not as robust, but
may converge much faster than BFGS for problems where the maximand is reasonably well behaved—
in particular, where it is anything like quadratic (see below). The case for using Newton–Raphson is
enhanced if it is possible to supply a function to calculate the Hessian analytically.
The gretl function NRmax, which implements the Newton–Raphson method, has a maximum of four
arguments. The first two (required) arguments are exactly as for BFGS: an initial parameter vector,
and a function call which returns the maximand given the parameters. The (optional) third argument
is again as in BFGS: a function call that calculates the gradient. Specific to NRmax is an optional
fourth argument, namely a function call to calculate the (negative) Hessian. The first argument of
this function must be a pre-defined matrix of the right dimension to hold the Hessian—that is, a
k × k matrix, where k is the length of the parameter vector—given in "pointer" form. The second
argument should be the parameter vector (optionally in pointer form). Other data may be passed as
additional arguments as needed. Similarly to the case with the gradient, if the fourth argument to
NRmax is omitted then a numerical approximation to the Hessian is constructed.
What is ultimately required in Newton–Raphson is the negative inverse of the Hessian. Note that
if you give the optional fourth argument, your function should compute the negative Hessian, but
should not invert it; NRmax takes care of inversion, with special handling for the case where the matrix
is not negative definite, which can happen far from the maximum.
Listing 37.3 extends the Rosenbrock example, using NRmax with a function Rosen_hess to compute
the Hessian. The functions Rosenbrock and Rosen_grad are just the same as in Listing 37.2 and are
omitted for brevity.
The idea behind Newton–Raphson is to exploit a quadratic approximation to the maximand, under
the assumption that it is concave. If this is true, the method is very effective. However, if the
algorithm happens to evaluate the function at a point where the Hessian is not negative definite,
things may go wrong. Script 37.4 exemplifies this by using a normal density, which is concave in the
interval (-1, 1) and convex elsewhere. If the algorithm is started from within the interval everything
goes well and NR is (slightly) more effective than BFGS. If, however, the Hessian is positive at the
starting point, BFGS converges with only a little more difficulty, while Newton–Raphson fails.
Listing 37.3: Rosenbrock function via Newton–Raphson
function void Rosen_hess (matrix *H, const matrix param)
scalar x = param[1]
scalar y = param[2]
H[1,1] = 2 - 400*y + 1200*x^2
H[1,2] = -400*x
H[2,1] = -400*x
H[2,2] = 200
end function
matrix theta = {0, 0}
matrix grad = {0, 0}
matrix H = zeros(2, 2)
set max_verbose 1
M = NRmax(&theta, Rosenbrock(theta), Rosen_grad(&grad, theta),
Rosen_hess(&H, theta))
print theta
print grad
Listing 37.4: Maximization of a Gaussian density [Download ]
function scalar ND(matrix x)
scalar z = x[1]
return exp(-0.5*z*z)
end function
set max_verbose 1
x = {0.75}
A = BFGSmax(&x, ND(x))
x = {0.75}
A = NRmax(&x, ND(x))
x = {1.5}
A = BFGSmax(&x, ND(x))
x = {1.5}
A = NRmax(&x, ND(x))
37.2 Derivative-free optimization methods
Golden section search method
Suppose you have a function f(x) of a scalar argument, that is known to have a unique maximum.
The golden section method is rather effective at finding it quickly without making use of derivatives
(see Press et al. (2007), section 10.2 for a thorough description). The gretl function implementing
this method is called GSSmax.
The idea is, roughly, to take an interval [x0, x1] (also known as the “bracket”) that should contain the
maximizing value. Once y_0 = f(x_0) and y_1 = f(x_1) are computed, the algorithm sets a new point
x_2 that replaces the end of the previous interval for which the function takes the worse value. So for
example, if y_0 < y_1, then x_0 is replaced and the interval becomes [x_1, x_2]. The width of the interval
shrinks progressively, so after a few iterations you should end close to the maximum.
As an illustration, consider the function f(x) = 50 \cdot x^{3/2} e^{-x}, which is maximized at x = 1.5. In
Listing 37.5 we set the initial interval to [0, 10] and observe the progress of the algorithm.
Listing 37.5: Golden section example
function scalar g(scalar x)
return 50 * x^(1.5) * exp(-x)
end function
set max_verbose on
m = {5, 0, 10}
y = GSSmax(&m, g(m[1]))
printf "\nf(%g) = %g\n", m[1], y
The output is
1: bracket={3.81966,6.18034}, values={8.18747,1.59001}
2: bracket={2.36068,3.81966}, values={17.1118,8.18747}
3: bracket={1.45898,2.36068}, values={20.4841,17.1118}
4: bracket={0.901699,1.45898}, values={17.3764,20.4841}
...
20: bracket={1.50017,1.50042}, values={20.4958,20.4958}
21: bracket={1.50001,1.50017}, values={20.4958,20.4958}
f(1.49996) = 20.4958
As you can see from the output, the bracket shrinks progressively; the center of the interval when the
algorithm stops is x = 1.49996. Figure 37.1 depicts the process.
Simulated Annealing
Simulated annealing—as implemented by the gretl function simann—is not a full-blown maximization
method in its own right, but can be a useful auxiliary tool in problems where convergence depends
sensitively on the initial values of the parameters. The idea is that you supply initial values and the
simulated annealing mechanism tries to improve on them via controlled randomization.
The simann function takes up to three arguments. The first two (required) are the same as for
BFGSmax and NRmax: an initial parameter vector and a function that computes the maximand. The
optional third argument is a positive integer giving the maximum number of iterations, n, which
defaults to 1024.
Starting from the specified point in the parameter space, for each of n iterations we select at random
a new point within a certain radius of the previous one and determine the value of the criterion at the
new point. If the criterion is higher we jump to the new point; otherwise, we jump with probability
P (and remain at the previous point with probability 1 - P). As the iterations proceed, the system
gradually “cools”—that is, the radius of the random perturbation is reduced, as is the probability of
making a jump when the criterion fails to increase.

Figure 37.1: Golden section search: the blue line shows the function to be maximized and the red segments show successive choices for the bracket.
In the course of this procedure n + 1 points in the parameter space are evaluated: call them θ_i, i =
0, . . . , n, where θ_0 is the initial value given by the user. Let θ* denote the "best" point among θ_1, . . . , θ_n
(highest criterion value). The value written into the parameter vector on completion is then θ* if θ* is
better than θ_0, otherwise θ_n. In other words, failing an actual improvement in the criterion, simann
randomizes the starting point, which may be helpful in tricky optimization problems.
Listing 37.6 shows simann at work as a helper for BFGSmax in finding the maximum of a bimodal
function. Unaided, BFGSmax requires 60 function evaluations and 55 evaluations of the gradient, while
after simulated annealing the maximum is found with 7 function evaluations and 6 evaluations of the
gradient.1
Listing 37.6: BFGS with initialization via Simulated Annealing [Download ]
function scalar bimodal (matrix x, matrix A)
scalar ret = exp(-qform((x-1)’, A))
ret += 2*exp(-qform((x+4)’, A))
return ret
end function
set seed 12334
set max_verbose on
scalar k = 2
matrix A = 0.1 * I(k)
matrix x0 = {3; -5}
x = x0
u = BFGSmax(&x, bimodal(x, A))
print x
x = x0
u = simann(&x, bimodal(x, A), 1000)
print x
u = BFGSmax(&x, bimodal(x, A))
print x
1Your mileage may vary: these figures are somewhat compiler- and machine-dependent.
Nelder–Mead
The Nelder–Mead derivative-free simplex maximizer (also known as the “amoeba” algorithm) is im-
plemented by the function NMmax. The argument list of this function is essentially the same as for
simann: the required arguments are an initial parameter vector and a function-call to compute the
maximand, while an optional third argument can be used to set the maximum number of function
evaluations (default value: 2000).
This method is unlikely to produce as close an approximation to the “true” optimum as derivative-
based methods such as BFGS and Newton–Raphson, but it is more robust than the latter. It may
succeed in some cases where derivative-based methods fail, and it may be useful, like simann, for
improving the starting point for an optimization problem so that a derivative-based method can then
take over successfully.
NMmax includes an internal “convergence” check—namely, verification that the best value achieved for
the objective function at termination of the algorithm is at least a local optimum—but by default it
doesn’t flag an error if this condition is not satisfied. This permits a mode of usage where you set
a fairly tight budget of function evaluations (for example, 200) and just take any improvement in
the objective function that is available, without worrying about whether an optimum has truly been
reached. However, if you want the convergence check to be enforced you can flag this by setting a
negative value for the maximum function evaluations argument; in that case the absolute value of the
argument is taken and an error is provoked on non-convergence.
If the task for this function is actually minimization, you can either have the function-call return the
negative of the actual criterion or, if you prefer, call NMmax under the alias NMmin.
Here is an example of use: minimization of the Powell quartic function, which is problematic for
BFGS. (The true minimum is zero, obtained when x is a 4-vector of zeros.)
function scalar powell (const matrix x)
fx1 = x[1] + 10 * x[2]
fx2 = x[3] - x[4]
fx3 = x[2] - 2 * x[3]
fx4 = x[1] - x[4]
return fx1^2 + 5.0 * fx2^2 + fx3^4 + 10.0 * fx4^4
end function
matrix x = {3, -1, 0, 1}’
printf "Initial f(X) = %g\n", powell(x)
fmin = NMmin(x, powell(x))
printf "Estimate of optimal X*:\n%14f\n", x
printf "f(X*) = %g\n", fmin
37.3 Numerical differentiation
Computing a Jacobian
Gretl offers the possibility of differentiating numerically a user-defined function via the fdjac function.
This function takes two arguments: an n×1 matrix holding initial parameter values and a function
call that calculates and returns an m×1 matrix, given the current parameter values and any other
relevant data. On successful completion it returns an m×n matrix holding the Jacobian. For example,
matrix Jac = fdjac(theta, SumOC(&theta, &X))
where we assume that SumOC is a user-defined function with the following structure:
function matrix SumOC (matrix *theta, matrix *X)
matrix V = ... # do some computation
return V
end function
This may come in handy in several cases: for example, if you use BFGSmax to estimate a model, you
may wish to calculate a numerical approximation to the relevant Jacobian to construct a covariance
matrix for your estimates.
Another example is the delta method: if you have a consistent estimator of a vector of parameters
\hat{\theta}, and a consistent estimate of its covariance matrix \Sigma, you may need to compute estimates for a
nonlinear continuous transformation \psi = g(\theta). In this case, a standard result in asymptotic theory
is that
\left\{
\begin{array}{l}
\hat{\theta} \stackrel{p}{\longrightarrow} \theta \\
\sqrt{T}\,(\hat{\theta} - \theta) \stackrel{d}{\longrightarrow} N(0, \Sigma)
\end{array}
\right\}
\Longrightarrow
\left\{
\begin{array}{l}
\hat{\psi} = g(\hat{\theta}) \stackrel{p}{\longrightarrow} \psi = g(\theta) \\
\sqrt{T}\,(\hat{\psi} - \psi) \stackrel{d}{\longrightarrow} N(0, J \Sigma J')
\end{array}
\right\}
where T is the sample size and J is the Jacobian \left.\dfrac{\partial g(x)}{\partial x}\right|_{x=\theta}.
Listing 37.7: Delta Method [Download ]
function matrix MPC(matrix *param, matrix *Y)
beta = param[2]
gamma = param[3]
y = Y[1]
return beta*gamma*y^(gamma-1)
end function
# William Greene, Econometric Analysis, 5e, Chapter 9
set echo off
set messages off
open greene5_1.gdt
# Use OLS to initialize the parameters
ols realcons 0 realdpi --quiet
a = $coeff(0)
b = $coeff(realdpi)
g = 1.0
# Run NLS with analytical derivatives
nls realcons = a + b * (realdpi^g)
deriv a = 1
deriv b = realdpi^g
deriv g = b * realdpi^g * log(realdpi)
end nls
matrix Y = realdpi[2000:4]
matrix theta = $coeff
matrix V = $vcv
mpc = MPC(&theta, &Y)
matrix Jac = fdjac(theta, MPC(&theta, &Y))
Sigma = qform(Jac, V)
printf "\nmpc = %g, std.err = %g\n", mpc, sqrt(Sigma)
scalar teststat = (mpc-1)/sqrt(Sigma)
printf "\nTest for MPC = 1: %g (p-value = %g)\n", \
teststat, pvalue(n,abs(teststat))
Script 37.7 exemplifies such a case: the example is taken from Greene (2003), section 9.3.1. The slight
differences between the results reported in the original source and what gretl returns are due to the
fact that the Jacobian is computed numerically, rather than analytically as in the book.
On the subject of numerical versus analytical derivatives, one may wonder what difference it makes
to use one method or another. Simply put, the answer is: analytical derivatives may be painful
to derive and to translate into code, but in most cases they are much faster than using fdjac; as
a consequence, if you need to use derivatives as part of an algorithm that requires iteration (such
as numerical optimization, or a Monte Carlo experiment), you’ll definitely want to use analytical
derivatives.
Analytical derivatives are also, in most cases, more precise than numerical ones, but whether this
advantage matters in practice depends on the details of the problem: the two fundamental aspects to
take into consideration are nonlinearity and machine precision.
As an example, consider the derivative of a highly nonlinear function such as the matrix inverse. In
order to keep the example simple, let's focus on 2 × 2 matrices and define the function
function matrix vecinv(matrix x)
A = mshape(x,2,2)
return vec(inv(A))’
end function
which, given vec(A), returns vec(A^{-1}). As is well known (see for example Magnus and Neudecker
(1988)),
\frac{\partial\, \mathrm{vec}(A^{-1})}{\partial\, \mathrm{vec}(A)} = -\left(A^{-1}\right)' \otimes A^{-1},
which is rather easy to code in hansl as
function matrix grad(matrix x)
iA = inv(mshape(x,2,2))
return -iA’ ** iA
end function
Using the fdjac function to obtain the same result is even easier: you just invoke it like
fdjac(a, "vecinv(a)")
In order to see what the difference is, in terms of precision, between analytical and numerical
Jacobians, let's start from
A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.
The following code
a = {2; 1; 1; 1}
ia = vecinv(a)
ag = grad(a)
ng = fdjac(a, "vecinv(a)")
dg = ag - ng
print ag ng dg
gives
ag (4 x 4)
-1 1 1 -1
1 -2 -1 2
1 -1 -2 2
-1 2 2 -4
ng (4 x 4)
-1 1 1 -1
1 -2 -1 2
1 -1 -2 2
-1 2 2 -4
dg (4 x 4)
-3.3530e-08 -3.7251e-08 -3.7251e-08 -3.7255e-08
2.6079e-08 5.2150e-08 3.7251e-08 6.7060e-08
2.6079e-08 3.7251e-08 5.2150e-08 6.7060e-08
-2.2354e-08 -5.9600e-08 -5.9600e-08 -1.4902e-07
in which the analytically-computed derivative and its numerical approximation are essentially the
same. If, however, you set
A = \begin{bmatrix} 1.0001 & 1 \\ 1 & 1 \end{bmatrix}
you end up evaluating the function at a point at which the function itself is considerably more
nonlinear, since the matrix is much closer to being singular. As a consequence, the numerical
approximation becomes much less satisfactory:
ag (4 x 4)
-1.0000e+08 1.0000e+08 1.0000e+08 -1.0000e+08
1.0000e+08 -1.0001e+08 -1.0000e+08 1.0001e+08
1.0000e+08 -1.0000e+08 -1.0001e+08 1.0001e+08
-1.0000e+08 1.0001e+08 1.0001e+08 -1.0002e+08
ng (4 x 4)
-9.9985e+07 1.0001e+08 1.0001e+08 -9.9985e+07
9.9985e+07 -1.0002e+08 -1.0001e+08 9.9995e+07
9.9985e+07 -1.0001e+08 -1.0002e+08 9.9995e+07
-9.9985e+07 1.0002e+08 1.0002e+08 -1.0001e+08
dg (4 x 4)
-14899. -14901. -14901. -14900.
14899. 14903. 14901. 14902.
14899. 14901. 14903. 14902.
-14899. -14903. -14903. -14903.
Moreover, machine precision may have its impact: if you take
A = 0.00001 \times \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix},
the matrix itself is not singular at all, but the order of magnitude of its elements is close enough to
machine precision to provoke problems:
ag (4 x 4)
-1.0000e+10 1.0000e+10 1.0000e+10 -1.0000e+10
1.0000e+10 -2.0000e+10 -1.0000e+10 2.0000e+10
1.0000e+10 -1.0000e+10 -2.0000e+10 2.0000e+10
-1.0000e+10 2.0000e+10 2.0000e+10 -4.0000e+10
ng (4 x 4)
-1.0000e+10 1.0000e+10 1.0000e+10 -1.0000e+10
1.0000e+10 -2.0000e+10 -1.0000e+10 2.0000e+10
1.0000e+10 -1.0000e+10 -2.0000e+10 2.0000e+10
-1.0000e+10 2.0000e+10 2.0000e+10 -4.0000e+10
dg (4 x 4)
-488.30 -390.60 -390.60 -195.33
634.79 781.21 390.60 585.98
634.79 488.26 683.55 585.98
-781.27 -976.52 -781.21 -1367.3
Computing a Hessian
In principle, you can use the fdjac function repeatedly to evaluate higher-order derivatives. However,
you should be aware that you may encounter rather serious numerical issues. For this reason, gretl
provides a function, similar to fdjac, for computing a Hessian directly, called numhess.
An example follows:
function scalar quad(matrix x, matrix A )
return 0.5*qform(x’, A)
end function
function matrix coljac(matrix x, matrix A)
return fdjac(x, quad(x, A))’
end function
A = {23, -11; -11, 118}
x = {1;1}
set fdjac_quality 2
H0 = fdjac(x, "coljac(x, A)")
H1 = numhess(x, "quad(x, A)")
printf "\n%14.8f", H0
printf "\n%14.8f", H1
In this example, we use the function quad for evaluating a simple quadratic form f(x) = \frac{1}{2} x'Ax, for
which the Hessian is trivially the matrix A. The matrix H0 is obtained by using fdjac twice (at its
best quality, that is with Richardson extrapolation on); the matrix H1, instead, is obtained directly
via the numhess function. The result of the above script is:
22.99788411 -10.99983724
-10.99902343 117.99161784
23.00000000 -11.00000000
-11.00000000 118.00000000
which makes it apparent how much better the second result is (note that H0 is not even symmetric,
as it should be).
37.4 Numerical integration
The main tool that gretl offers in this area is integration via Gaussian quadrature. In practice, the
evaluation of an integral of the kind
C=ZA
f(x)ω(x)dx(37.1)
is approximated via a suitable linear combination of the form
C
q
X
i=1
f(xi)w(xi),(37.2)
where the q“quadrature points” x1, x2, . . . , xqare suitably chosen and the corresponding weights
w(xi) ensure that the approximation is optimal.
The quadtable function provides quadrature points and weights for the following list of problems:
Integral                                     Technique        Note
\int_{-\infty}^{\infty} f(x) e^{-x^2} dx     Gauss–Hermite    see below for the relationship with the Gaussian distribution
\int_{0}^{\infty} f(x) e^{-x} dx             Gauss–Laguerre   can be used for evaluating Laplace transforms numerically
\int_{a}^{b} f(x) dx                         Gauss–Legendre   by default, a = -1 and b = 1
The quadtable function returns a q×2 matrix with the x_i points in the first column and the
corresponding weights w_i in the second, so in practice you compute the desired integral of f(x) by
applying that function to the first column and then taking the inner product of the result with the
second column.
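As a minimal illustration of this recipe, the following fragment approximates the integral of x² exp(−x²) over the real line, whose exact value is √π/2:
Q = quadtable(10)
scalar approx = (Q[,1].^2)'Q[,2]   # f at the nodes, inner product with the weights
printf "approx = %g, exact = %g\n", approx, sqrt($pi)/2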
In econometrics, Gaussian quadrature is most commonly used for evaluating expectations of functions
of Gaussian random variables, that is, integrals of the form
E[f(X)] = \int_{-\infty}^{\infty} f(x)\, \varphi(x)\, dx
In this case, Gauss–Hermite quadrature can be readily applied, provided one modifies the integral
above via a suitable change of variable, namely by defining z = x/\sqrt{2}, so that
\int_{-\infty}^{\infty} f(x)\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} f(\sqrt{2}\, z)\, e^{-z^2}\, dz
For example, consider the following script, in which we evaluate E(e^X) and E[cos(X)] via Gaussian
quadrature.
q = 10
Q = quadtable(q)
x = exp(sqrt(2) * Q[,1])
w = Q[,2] ./ sqrt($pi)
printf "E(exp(x)) = %16.14f (exact = %16.14f)\n", x’w, exp(0.5)
x = cos(sqrt(2) * Q[,1])
w = Q[,2] ./ sqrt($pi)
printf "E(cos(x)) = %16.14f (exact = %16.14f)\n", x’w, exp(-0.5)
The result is
E(exp(x)) = 1.64872127069823 (exact = 1.64872127070013)
E(cos(x)) = 0.60653065971146 (exact = 0.60653065971263)
Higher values of q will give a more precise approximation.
In fact, the full form of quadtable—which supports three optional arguments besides the required
q—provides a shortcut which saves you the hassle of having to recalculate the integral with the change
of variable. Suppose you want something like
C = E[f(X)], \qquad X \sim N(\mu, \sigma^2)
Then you can use the syntax quadtable(q, 1, m, s), where the 1 indicates Gauss–Hermite
integration and the two final arguments give the mean and standard deviation. So for example if you
want to compute E(X^2), where X \sim N(1, 0.25), you could use
# Note: no rescaling needed
Q = quadtable(q, 1, 1, 0.5)
printf "E(X^2) = %6.4f (exact = %6.4f)\n", (Q[,1].^2)’Q[,2], 1.25
which yields
E(X^2) = 1.2500 (exact = 1.2500)
See the Gretl Command Reference for more details.
Chapter 38
Discrete and censored dependent variables
This chapter deals with models for dependent variables that are discrete or censored or otherwise
limited (as in event counts or durations, which must be positive) and that therefore call for estimation
methods other than the classical linear model. We discuss several estimators (mostly based on the
Maximum Likelihood principle), adding some details and examples to complement the material on
these methods in the Gretl Command Reference.
38.1 Logit and probit models
It often happens that one wants to specify and estimate a model in which the dependent variable
is not continuous, but discrete. A typical example is a model in which the dependent variable is
the occupational status of an individual (1 = employed, 0 = unemployed). A convenient way of
formalizing this situation is to consider the variable y_i as a Bernoulli random variable and analyze its
distribution conditional on the explanatory variables x_i. That is,
y_i = \begin{cases} 1 & \text{with probability } P_i \\ 0 & \text{with probability } 1 - P_i \end{cases} \qquad (38.1)
where P_i = P(y_i = 1 \,|\, x_i) is a given function of the explanatory variables x_i.
In most cases, the function P_i is a cumulative distribution function F, applied to a linear combination
of the x_i's. In the probit model, the normal cdf is used, while the logit model employs the logistic
function \Lambda(\cdot). Therefore, we have
\text{probit} \qquad P_i = F(z_i) = \Phi(z_i) \qquad (38.2)
\text{logit} \qquad P_i = F(z_i) = \Lambda(z_i) = \frac{1}{1 + e^{-z_i}} \qquad (38.3)
z_i = \sum_{j=1}^{k} x_{ij} \beta_j \qquad (38.4)
where z_i is commonly known as the index function. Note that in this case the coefficients \beta_j cannot
be interpreted as the partial derivatives of E(y_i \,|\, x_i) with respect to x_{ij}. However, for a given value of
x_i it is possible to compute the vector of "slopes", that is
\mathrm{slope}_j(\bar{x}) = \left.\frac{\partial F(z)}{\partial x_j}\right|_{z = \bar{z}}
Gretl automatically computes the slopes, setting each explanatory variable at its sample mean.
Another, equivalent way of thinking about this model is in terms of an unobserved variable y^*_i which
can be described thus:
y^*_i = \sum_{j=1}^{k} x_{ij} \beta_j + \varepsilon_i = z_i + \varepsilon_i \qquad (38.5)
We observe y_i = 1 whenever y^*_i > 0 and y_i = 0 otherwise. If \varepsilon_i is assumed to be normal, then we
have the probit model. The logit model arises if we assume that the density function of \varepsilon_i is
\lambda(\varepsilon_i) = \frac{\partial \Lambda(\varepsilon_i)}{\partial \varepsilon_i} = \frac{e^{-\varepsilon_i}}{(1 + e^{-\varepsilon_i})^2}
Both the probit and logit model are estimated in gretl via maximum likelihood, where the log-
likelihood can be written as
L(\beta) = \sum_{y_i = 0} \ln[1 - F(z_i)] + \sum_{y_i = 1} \ln F(z_i), \qquad (38.6)
which is always negative, since 0 < F(·) < 1. Since the score equations do not have a closed-form
solution, numerical optimization is used. However, in most cases this is totally transparent to the
user, since usually only a few iterations are needed to ensure convergence. The --verbose switch can
be used to track the maximization algorithm.
By way of example, the commands below reproduce results given in chapter 21 of Greene (2000),
regarding the effectiveness of a program for teaching economics.
open greene19_1
logit GRADE const GPA TUCE PSI
probit GRADE const GPA TUCE PSI
The binary dependent variable, GRADE, equals 1 if a student’s grade improved over a certain period,
0 otherwise. The independent variables are PSI (= 1 if the student participated in the program
in question, otherwise 0) plus two controls, the student’s initial Grade Point Average (GPA) and
score on a prior economics test (TUCE). Output is shown in Table 38.1. Note that for the probit
model a conditional moment test on skewness and kurtosis (Bera, Jarque and Lee, 1984) is printed
automatically as a test for normality.
In this context, the $uhat accessor function takes a special meaning: it returns generalized residuals
as defined in Gourieroux, Monfort, Renault and Trognon (1987), which can be interpreted as unbiased
estimators of the latent disturbances \varepsilon_i. These are defined as
u_i = \begin{cases} y_i - \hat{P}_i & \text{for the logit model} \\ y_i \cdot \dfrac{\phi(\hat{z}_i)}{\Phi(\hat{z}_i)} - (1 - y_i) \cdot \dfrac{\phi(\hat{z}_i)}{1 - \Phi(\hat{z}_i)} & \text{for the probit model} \end{cases} \qquad (38.7)
Among other uses, generalized residuals are often used for diagnostic purposes. For example, it is
very easy to set up an omitted variables test equivalent to the familiar LM test in the context of a
linear regression; Listing 38.1 shows how to perform a variable addition test.
Listing 38.1: Variable addition test in a probit model [Download ]
open greene19_1
probit GRADE const GPA PSI
series u = $uhat
ols u const GPA PSI TUCE -q
printf "Variable addition test for TUCE:\n"
printf "Rsq * T = %g (p. val. = %g)\n", $trsq, pvalue(X,1,$trsq)
Odds ratios
A noteworthy feature of the binary logit model is that the regression coefficients have an interpretation
as log odds ratios, where the odds ratio is 0 < P(y = 1)/P(y = 0) < \infty. In the logit example above
the coefficient on TUCE has a value of 0.095. The corresponding odds ratio is then e^{0.095} = 1.10,
meaning that the estimated effect of a unit increase in TUCE is to move the odds ratio by 10 percent
in favor of GRADE = 1.
When a binary logit model is estimated via the gretl GUI, the Analysis menu in the model output
window includes an "Odds ratios" item. This opens a window showing the odds ratios along with
standard errors (obtained via the delta method) plus a 95 percent confidence interval, as illustrated
below.
95% confidence intervals
z(0.025) = 1.9600

             odds ratio    std. error        low         high
  ---------------------------------------------------------
  GPA           16.8797       21.3181     1.42019      200.624
  TUCE          1.09983      0.155686    0.833365      1.45150
  PSI           10.7907       11.4874     1.33934      86.9380
Model 1: Logit estimates using the 32 observations 1-32
Dependent variable: GRADE
VARIABLE COEFFICIENT STDERROR T STAT SLOPE
(at mean)
const -13.0213 4.93132 -2.641
GPA 2.82611 1.26294 2.238 0.533859
TUCE 0.0951577 0.141554 0.672 0.0179755
PSI 2.37869 1.06456 2.234 0.449339
Mean of GRADE = 0.344
Number of cases ’correctly predicted’ = 26 (81.2%)
f(beta’x) at mean of independent vars = 0.189
McFadden’s pseudo-R-squared = 0.374038
Log-likelihood = -12.8896
Likelihood ratio test: Chi-square(3) = 15.4042 (p-value 0.001502)
Akaike information criterion (AIC) = 33.7793
Schwarz Bayesian criterion (BIC) = 39.6422
Hannan-Quinn criterion (HQC) = 35.7227
Predicted
0 1
Actual 0 18 3
1 3 8
Model 2: Probit estimates using the 32 observations 1-32
Dependent variable: GRADE
VARIABLE COEFFICIENT STDERROR T STAT SLOPE
(at mean)
const -7.45232 2.54247 -2.931
GPA 1.62581 0.693883 2.343 0.533347
TUCE 0.0517288 0.0838903 0.617 0.0169697
PSI 1.42633 0.595038 2.397 0.467908
Mean of GRADE = 0.344
Number of cases ’correctly predicted’ = 26 (81.2%)
f(beta’x) at mean of independent vars = 0.328
McFadden’s pseudo-R-squared = 0.377478
Log-likelihood = -12.8188
Likelihood ratio test: Chi-square(3) = 15.5459 (p-value 0.001405)
Akaike information criterion (AIC) = 33.6376
Schwarz Bayesian criterion (BIC) = 39.5006
Hannan-Quinn criterion (HQC) = 35.581
Predicted
0 1
Actual 0 18 3
1 3 8
Test for normality of residual -
Null hypothesis: error is normally distributed
Test statistic: Chi-square(2) = 3.61059
with p-value = 0.164426
Table 38.1: Example logit and probit output
Note, however, that the confidence intervals shown are not calculated using the delta-method standard
errors; rather, the bounds are obtained by exponentiating the bounds of regular confidence intervals
for the coefficients. This makes sense on the assumption that the coefficients themselves are more
likely to be normally distributed than their exponentials.
Odds ratio information can also be retrieved following binary logit estimation via scripting. In this
case it takes the form of a matrix, provided by the $oddsratios accessor or as $model.oddsratios.
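For example, with the greene19_1 data from the example above still open, a minimal sketch would be:
logit GRADE const GPA TUCE PSI --quiet
matrix OR = $oddsratios   # odds-ratio information as a matrix
print OR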
The perfect prediction problem
One curious characteristic of logit and probit models is that (quite paradoxically) estimation is not
feasible if a model fits the data perfectly; this is called the perfect prediction problem. The reason
why this problem arises is easy to see by considering equation (38.6): if for some vector β and scalar
k it's the case that z_i < k whenever y_i = 0 and z_i > k whenever y_i = 1, the same thing is true for any
multiple of β. Hence, L(β) can be made arbitrarily close to 0 simply by choosing enormous values
for β. As a consequence, the log-likelihood has no maximum, despite being bounded.
Gretl has a mechanism for preventing the algorithm from iterating endlessly in search of a non-
existent maximum. One sub-case of interest is when the perfect prediction problem arises because of
a single binary explanatory variable. In this case, the offending variable is dropped from the model
and estimation proceeds with the reduced specification. Nevertheless, it may happen that no single
“perfect classifier” exists among the regressors, in which case estimation is simply impossible and the
algorithm stops with an error. This behavior is triggered during the iteration process if
\max_{i:\, y_i = 0} z_i \; < \; \min_{i:\, y_i = 1} z_i
If this happens, unless your model is trivially mis-specified (like predicting if a country is an oil
exporter on the basis of oil revenues), it is normally a small-sample problem: you probably just don’t
have enough data to estimate your model. You may want to drop some of your explanatory variables.
This problem is well analyzed in Stokes (2004); the results therein are replicated in the example script
murder_rates.inp.
38.2 Ordered response models
These models constitute a simple variation on ordinary logit/probit models, and are usually applied
when the dependent variable is a discrete and ordered measurement—not simply binary, but on an
ordinal rather than an interval scale. For example, this sort of model may be applied when the
dependent variable is a qualitative assessment such as “Good”, “Average” and “Bad”.
In the general case, consider an ordered response variable, y, that can take on any of the J + 1 values
0, 1, 2, \dots, J. We suppose, as before, that underlying the observed response is a latent variable,
y^* = X\beta + \varepsilon = z + \varepsilon
Now define "cut points", \alpha_1 < \alpha_2 < \cdots < \alpha_J, such that
y = 0 \text{ if } y^* \le \alpha_1
y = 1 \text{ if } \alpha_1 < y^* \le \alpha_2
\vdots
y = J \text{ if } y^* > \alpha_J
For example, if the response takes on three values there will be two such cut points, \alpha_1 and \alpha_2.
The probability that individual i exhibits response j, conditional on the characteristics x_i, is then
given by
P(y_i = j \,|\, x_i) = \begin{cases} P(y^* \le \alpha_1 \,|\, x_i) = F(\alpha_1 - z_i) & \text{for } j = 0 \\ P(\alpha_j < y^* \le \alpha_{j+1} \,|\, x_i) = F(\alpha_{j+1} - z_i) - F(\alpha_j - z_i) & \text{for } 0 < j < J \\ P(y^* > \alpha_J \,|\, x_i) = 1 - F(\alpha_J - z_i) & \text{for } j = J \end{cases} \qquad (38.8)
The unknown parameters \alpha_j are estimated jointly with the \beta's via maximum likelihood. The \hat{\alpha}_j
estimates are reported by gretl as cut1, cut2 and so on. For the probit variant, a conditional
moment test for normality constructed in the spirit of Chesher and Irish (1987) is also included.
Note that the \alpha_j parameters can be shifted arbitrarily by adding a constant to z_i, so the model is
under-identified if there is some linear combination of the explanatory variables which is constant.
The most obvious case in which this occurs is when the model contains a constant term; for this
reason, gretl automatically drops the intercept if present. However, it may happen that the user
inadvertently specifies a list of regressors that may be combined in such a way as to produce a constant
(for example, by using a full set of dummy variables for a discrete factor). If this happens, gretl will
also drop any offending regressors.
In order to apply these models in gretl, the dependent variable must either take on only non-negative
integer values, or be explicitly marked as discrete. (In case the variable has non-integer values, it will
be recoded internally.) Note that gretl does not provide a separate command for ordered models: the
logit and probit commands automatically estimate the ordered version if the dependent variable is
acceptable, but not binary.
Listing 38.2 reproduces the results presented in section 15.10 of Wooldridge (2002a). The question
of interest in this analysis is what difference it makes, to the allocation of assets in pension funds,
whether individual plan participants have a choice in the matter. The response variable is an ordinal
measure of the weight of stocks in the pension portfolio. Having reported the results of estimation
of the ordered model, Wooldridge illustrates the effect of the choice variable by reference to an
“average” participant. The example script shows how one can compute this effect in gretl.
After estimating ordered models, the $uhat accessor yields generalized residuals as in binary models;
additionally, the $yhat accessor function returns \hat{z}_i, so it is possible to compute an unbiased estimator
of the latent variable y^*_i simply by adding the two together.
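As a minimal sketch, assuming the pension data and the regressor lists are set up as in Listing 38.2, this can be done as follows:
probit pctstck choice DEMOG INCOME wealth89 prftshr --quiet
series ystar_hat = $yhat + $uhat   # estimate of the latent variable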
38.3 Multinomial logit
When the dependent variable is not binary and does not have a natural ordering, multinomial mod-
els are used. Multinomial logit is supported in gretl via the --multinomial option to the logit
command. Simple models can also be handled via the mle command (see chapter 26). We give here
an example of such a model. Let the dependent variable, y_i, take on integer values 0, 1, \dots, p. The
probability that y_i = k is given by
P(y_i = k \,|\, x_i) = \frac{\exp(x_i \beta_k)}{\sum_{j=0}^{p} \exp(x_i \beta_j)}
For the purpose of identification one of the outcomes must be taken as the "baseline"; it is usually
assumed that \beta_0 = 0, in which case
P(y_i = k \,|\, x_i) = \frac{\exp(x_i \beta_k)}{1 + \sum_{j=1}^{p} \exp(x_i \beta_j)}
and
P(y_i = 0 \,|\, x_i) = \frac{1}{1 + \sum_{j=1}^{p} \exp(x_i \beta_j)}.
Listing 38.3 reproduces Table 15.2 in Wooldridge (2002a), based on data on career choice from Keane
and Wolpin (1997). The dependent variable is the occupational status of an individual (0 = in school;
1 = not in school and not working; 2 = working), and the explanatory variables are education and
work experience (linear and square) plus a “black” binary variable. The full data set is a panel; here
the analysis is confined to a cross-section for 1987.
Listing 38.2: Ordered probit model [Download ]
/*
Replicate the results in Wooldridge, Econometric Analysis of Cross
Section and Panel Data, section 15.10, using pension-plan data from
Papke (AER, 1998).
The dependent variable, pctstck (percent stocks), codes the asset
allocation responses of "mostly bonds", "mixed" and "mostly stocks"
as {0, 50, 100}.
The independent variable of interest is "choice", a dummy indicating
whether individuals are able to choose their own asset allocations.
*/
open pension.gdt
# demographic characteristics of participant
list DEMOG = age educ female black married
# dummies coding for income level
list INCOME = finc25 finc35 finc50 finc75 finc100 finc101
# Papke’s OLS approach
ols pctstck const choice DEMOG INCOME wealth89 prftshr
# save the OLS choice coefficient
choice_ols = $coeff(choice)
# estimate ordered probit
probit pctstck choice DEMOG INCOME wealth89 prftshr
k = $ncoeff
matrix b = $coeff[1:k-2]
a1 = $coeff[k-1]
a2 = $coeff[k]
/*
Wooldridge illustrates the ’choice’ effect in the ordered probit
by reference to a single, non-black male aged 60, with 13.5 years
of education, income in the range $50K - $75K and wealth of $200K,
participating in a plan with profit sharing.
*/
matrix X = {60, 13.5, 0, 0, 0, 0, 0, 0, 1, 0, 0, 200, 1}
# with ’choice’ = 0
scalar Xb = (0 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E0 = 50 * P50 + 100 * P100
# with ’choice’ = 1
Xb = (1 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E1 = 50 * P50 + 100 * P100
printf "\nWith choice, E(y) = %.2f, without E(y) = %.2f\n", E1, E0
printf "Estimated choice effect via ML = %.2f (OLS = %.2f)\n", E1 - E0,
choice_ols
Listing 38.3: Multinomial logit
open keane.gdt
smpl year==87 --restrict
logit status 0 educ exper expersq black --multinomial
Output (selected portions):
Model 1: Multinomial Logit, using observations 1-1738 (n = 1717)
Missing or incomplete observations dropped: 21
Dependent variable: status
Standard errors based on Hessian
coefficient std. error z p-value
--------------------------------------------------------
status = 2
const 10.2779 1.13334 9.069 1.20e-19 ***
educ -0.673631 0.0698999 -9.637 5.57e-22 ***
exper -0.106215 0.173282 -0.6130 0.5399
expersq -0.0125152 0.0252291 -0.4961 0.6199
black 0.813017 0.302723 2.686 0.0072 ***
status = 3
const 5.54380 1.08641 5.103 3.35e-07 ***
educ -0.314657 0.0651096 -4.833 1.35e-06 ***
exper 0.848737 0.156986 5.406 6.43e-08 ***
expersq -0.0773003 0.0229217 -3.372 0.0007 ***
black 0.311361 0.281534 1.106 0.2687
Mean dependent var 2.691322 S.D. dependent var 0.573502
Log-likelihood -907.8572 Akaike criterion 1835.714
Schwarz criterion 1890.198 Hannan-Quinn 1855.874
Number of cases ’correctly predicted’ = 1366 (79.6%)
Likelihood ratio test: Chi-square(8) = 583.722 [0.0000]
38.4 Bivariate probit
The bivariate probit model is a two-equation system in which each equation is a probit model and
the two disturbance terms may not be independent. In formulae,
y^*_{1,i} = \sum_{j=1}^{k_1} x_{ij} \beta_j + \varepsilon_{1,i}, \qquad y_{1,i} = 1 \iff y^*_{1,i} > 0 \qquad (38.9)
y^*_{2,i} = \sum_{j=1}^{k_2} z_{ij} \gamma_j + \varepsilon_{2,i}, \qquad y_{2,i} = 1 \iff y^*_{2,i} > 0 \qquad (38.10)
\begin{bmatrix} \varepsilon_{1,i} \\ \varepsilon_{2,i} \end{bmatrix} \sim N\!\left( 0, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right) \qquad (38.11)
If \rho were 0, ML estimation of the parameters \beta_j and \gamma_j could be accomplished by estimating the two
equations separately. In the general case, however, joint estimation is required for maximal efficiency.
The gretl command for this model is biprobit, which performs ML estimation via numerical op-
timization using the Newton–Raphson method with analytical derivatives. An example of usage is
provided in the biprobit.inp sample script. The command takes either three or four arguments,
the first three being series names for y_1 and y_2 and a list of explanatory variables. In the common
case when the regressors are the same for the two equations this is sufficient, but if z differs from x
a second list should be appended following a semicolon, as in:
biprobit y1 y2 X ; Z
Output from estimation includes a Likelihood Ratio test for the hypothesis \rho = 0.¹ This can be
retrieved in the form of a bundle named independence_test under the $model accessor, as in
? eval $model.independence_test
bundle:
dfn = 1
test = 204.066
pvalue = 2.70739e-46
Since biprobit estimates a two-equation system, the $uhat and $yhat accessors provide matrices
rather than series as usual. Specifically, $uhat gives a two-column matrix containing the generalized
residuals, while $yhat contains four columns holding the estimated probabilities of the possible joint
outcomes: (y_{1,i}, y_{2,i}) = (1, 1) in column 1, (y_{1,i}, y_{2,i}) = (1, 0) in column 2, (y_{1,i}, y_{2,i}) = (0, 1) in
column 3 and (y_{1,i}, y_{2,i}) = (0, 0) in column 4.
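A minimal sketch of retrieving these matrices, using hypothetical series y1 and y2 and a regressor list X:
biprobit y1 y2 X
matrix P = $yhat   # n x 4: joint outcome probabilities
matrix U = $uhat   # n x 2: generalized residuals for the two equations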
38.5 Panel estimators
When your dataset is a panel, the traditional choice for binary dependent variable models was, for
many years, to use logit with fixed effects and probit with random effects (see 23.1 for a brief discussion
of this dichotomy in the context of linear models). Nowadays the choice is somewhat wider but the
two traditional models are by and large what practitioners use as routine tools.
Gretl provides FE logit via the function package felogit,² and RE probit natively. Provided your dataset
has a panel structure, the latter option can be obtained by adding the --random option to the probit
command:
probit depvar const indvar1 indvar2 --random
as exemplified in the reprobit.inp sample script. The numerical technique used for this particular
estimator is Gauss-Hermite quadrature, which we’ll now briefly describe. Generalizing equation (38.5)
to a panel context, we get
y^*_{i,t} = \sum_{j=1}^{k} x_{ijt} \beta_j + \alpha_i + \varepsilon_{i,t} = z_{i,t} + \omega_{i,t} \qquad (38.12)
1Note that if the --robust option is given to biprobit—and therefore the estimator is meant to be QMLE—this
test may not be valid, even asymptotically.
2See http://gretl.sourceforge.net/current_fnfiles/felogit.gfn.
in which we assume that the individual effect, \alpha_i, and the disturbance term, \varepsilon_{i,t}, are mutually
independent zero-mean Gaussian random variables. The composite error term, \omega_{i,t} = \alpha_i + \varepsilon_{i,t}, is
therefore a normal r.v. with mean zero and variance 1 + \sigma^2_\alpha. Because of the individual effect, \alpha_i,
observations for the same unit are not independent; the likelihood therefore has to be evaluated on a
per-unit basis, as
\ell_i = \log P[y_{i,1}, y_{i,2}, \dots, y_{i,T}],
and there's no way to write the above as a product of individual terms.
However, the above probability could be written as a product if we were to treat \alpha_i as a constant; in
that case we would have
\ell_i \,|\, \alpha_i = \sum_{t=1}^{T} \Phi\!\left[ (2 y_{i,t} - 1)\, \frac{x_{ijt} \beta_j + \alpha_i}{\sqrt{1 + \sigma^2_\alpha}} \right]
so that we can compute \ell_i by integrating \alpha_i out as
\ell_i = E(\ell_i \,|\, \alpha_i) = \int_{-\infty}^{\infty} (\ell_i \,|\, \alpha_i)\, \frac{\phi(\alpha_i)}{\sqrt{1 + \sigma^2_\alpha}}\, d\alpha_i
The technique known as Gauss–Hermite quadrature is simply a way of approximating the above
integral via a sum of carefully chosen terms:³
\ell_i \simeq \sum_{k=1}^{m} (\ell_i \,|\, \alpha_i = n_k)\, w_k
where the numbers n_k and w_k are known as quadrature points and weights, respectively. Of course,
accuracy improves with higher values of m, but so does CPU usage. Note that this technique can
also be used in more general cases by using the quadtable() function and the mle command via the
apparatus described in chapter 26. Here, however, the calculations were hard-coded in C for maximal
speed and efficiency.
Experience shows that a reasonable compromise can be achieved in most cases by choosing m in the
order of 20 or so; gretl uses 32 as a default value, but this can be changed via the --quadpoints
option, as in
probit y const x1 x2 x3 --random --quadpoints=48
38.6 The Tobit model
The Tobit model is used when the dependent variable of a model is censored. Assume a latent variable
y^*_i can be described as
y^*_i = \sum_{j=1}^{k} x_{ij} \beta_j + \varepsilon_i,
where \varepsilon_i \sim N(0, \sigma^2). If y^*_i were observable, the model's parameters could be estimated via ordinary
least squares. On the contrary, suppose that we observe y_i, defined as
y_i = \begin{cases} a & \text{for } y^*_i \le a \\ y^*_i & \text{for } a < y^*_i < b \\ b & \text{for } y^*_i \ge b \end{cases} \qquad (38.13)
In most cases found in the applied literature, a = 0 and b = \infty, so in practice negative values of y^*_i
are not observed and are replaced by zeros.
In this case, regressing y_i on the x_i's does not yield consistent estimates of the parameters \beta, because
the conditional mean E(y_i \,|\, x_i) is not equal to \sum_{j=1}^{k} x_{ij} \beta_j. It can be shown that restricting the sample
to non-zero observations would not yield consistent estimates either. The solution is to estimate the
parameters via maximum likelihood. The syntax is simply
3Some have suggested using a more refined method called adaptive Gauss-Hermite quadrature; this is not imple-
mented in gretl.
tobit depvar indvars
As usual, progress of the maximization algorithm can be tracked via the --verbose switch, while
$uhat returns the generalized residuals. Note that in this case the generalized residual is defined
as \hat{u}_i = E(\varepsilon_i \,|\, y_i = 0) for censored observations, so the familiar equality \hat{u}_i = y_i - \hat{y}_i only holds for
uncensored observations, that is, when y_i > 0.
An important difference between the Tobit estimator and OLS is that the consequences of non-
normality of the disturbance term are much more severe: non-normality implies inconsistency for
the Tobit estimator. For this reason, the output for the Tobit model includes the Chesher and Irish
(1987) normality test by default.
The general case in which a is nonzero and/or b is finite can be handled by using the options --llimit
and --rlimit. So, for example,
tobit depvar indvars --llimit=10
would tell gretl that the left bound a is set to 10.
38.7 Interval regression
The interval regression model arises when the dependent variable is unobserved for some (possibly
all) observations; what we observe instead is an interval in which the dependent variable lies. In other
words, the data generating process is assumed to be
y^*_i = x_i \beta + \epsilon_i
but we only know that m_i \le y^*_i \le M_i, where the interval may be left- or right-unbounded (but
not both). If m_i = M_i, we effectively observe y^*_i and no information loss occurs. In practice, each
observation belongs to one of four categories:
1. left-unbounded, when m_i = -\infty,
2. right-unbounded, when M_i = \infty,
3. bounded, when -\infty < m_i < M_i < \infty, and
4. point observations, when m_i = M_i.
It is interesting to note that this model bears similarities to other models in several special cases:
• When all observations are point observations the model trivially reduces to the ordinary linear
regression model.
• The interval model could be thought of as an ordered probit model (see 38.2) in which the cut
points (the \alpha_j coefficients in eq. 38.8) are observed and don't need to be estimated.
• The Tobit model (see 38.6) is a special case of the interval model in which m_i and M_i do not
depend on i, that is, the censoring limits are the same for all observations. As a matter of fact,
gretl's tobit command is handled internally as a special case of the interval model.
The gretl command intreg estimates interval models by maximum likelihood, assuming normality
of the disturbance term ϵ_i. Its syntax is
intreg minvar maxvar X
where minvar contains the m_i series, with NAs for left-unbounded observations, and maxvar contains
M_i, with NAs for right-unbounded observations. By default, standard errors are computed using the
negative inverse of the Hessian. If the --robust flag is given, then QML or Huber–White standard
errors are calculated instead. In this case the estimated covariance matrix is a “sandwich” of the
inverse of the estimated Hessian and the outer product of the gradient.
If the model specification contains regressors other than just a constant, the output includes a chi-
square statistic for testing the joint null hypothesis that none of these regressors has any effect on
the outcome. This is a Wald statistic based on the estimated covariance matrix. If you wish to
construct a likelihood ratio test, this is easily done by estimating both the full model and the null
model (containing only the constant), saving the log-likelihood in both cases via the $lnl accessor,
and then referring twice the difference between the two log-likelihoods to the chi-square distribution
with k degrees of freedom, where k is the number of additional regressors (see the pvalue command
in the Gretl Command Reference). Also included is a conditional moment normality test, similar to
those provided for the probit, ordered probit and Tobit models (see above). An example is contained
in the sample script wtp.inp, provided with the gretl distribution.
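A minimal sketch of such a likelihood-ratio test, using hypothetical series lo and hi and two regressors x1 and x2, might look as follows:
intreg lo hi const x1 x2
scalar ll1 = $lnl
intreg lo hi const
scalar ll0 = $lnl
scalar LR = 2 * (ll1 - ll0)
# two additional regressors in the full model
printf "LR = %g, p-value = %g\n", LR, pvalue(X, 2, LR)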
Listing 38.4: Interval model on artificial data [Download ]
nulldata 100
# generate artificial data
set seed 201449
x = normal()
epsilon = 0.2*normal()
ystar = 1 + x + epsilon
lo_bound = floor(ystar)
hi_bound = ceil(ystar)
# run the interval model
intreg lo_bound hi_bound const x
# estimate ystar
gen_resid = $uhat
yhat = $yhat + gen_resid
corr ystar yhat
Output (selected portions):
Model 1: Interval estimates using the 100 observations 1-100
Lower limit: lo_bound, Upper limit: hi_bound
coefficient std. error t-ratio p-value
---------------------------------------------------------
const 0.993762 0.0338325 29.37 1.22e-189 ***
x 0.986662 0.0319959 30.84 8.34e-209 ***
Chi-square(1) 950.9270 p-value 8.3e-209
Log-likelihood -44.21258 Akaike criterion 94.42517
Schwarz criterion 102.2407 Hannan-Quinn 97.58824
sigma = 0.223273
Left-unbounded observations: 0
Right-unbounded observations: 0
Bounded observations: 100
Point observations: 0
...
corr(ystar, yhat) = 0.98960092
Under the null hypothesis of no correlation:
t(98) = 68.1071, with two-tailed p-value 0.0000
As with the probit and Tobit models, after a model has been estimated the $uhat accessor returns the
generalized residual, which is an estimate of \epsilon_i: more precisely, it equals y_i - x_i\hat{\beta} for point observations
and E(\epsilon_i \,|\, m_i, M_i, x_i) otherwise. Note that it is possible to compute an unbiased predictor of y^*_i by
summing this estimate to x_i\hat{\beta}. Listing 38.4 shows an example. As a further similarity with Tobit,
the interval regression model may deliver inconsistent estimates if the disturbances are non-normal;
hence, the Chesher and Irish (1987) test for normality is included by default here too.
38.8 Sample selection model
In the sample selection model (also known as “Tobit II” model), there are two latent variables:
y^*_i = \sum_{j=1}^{k} x_{ij} \beta_j + \varepsilon_i \qquad (38.14)
s^*_i = \sum_{j=1}^{p} z_{ij} \gamma_j + \eta_i \qquad (38.15)
and the observation rule is given by
y_i = \begin{cases} y^*_i & \text{for } s^*_i > 0 \\ \diamond & \text{for } s^*_i \le 0 \end{cases} \qquad (38.16)
In this context, the symbol \diamond indicates that for some observations we simply do not have data on
y: y_i may be 0, or missing, or anything else. A dummy variable d_i is normally used to set censored
observations apart.
One of the most popular applications of this model in econometrics is a wage equation coupled with
a labor force participation equation: we only observe the wage for the employed. If y^*_i and s^*_i were
(conditionally) independent, there would be no reason not to use OLS for estimating equation (38.14);
otherwise, OLS does not yield consistent estimates of the parameters \beta_j.
Since conditional independence between y^*_i and s^*_i is equivalent to conditional independence between
\varepsilon_i and \eta_i, one may model the co-dependence between \varepsilon_i and \eta_i as
\varepsilon_i = \lambda \eta_i + v_i;
substituting the above expression in (38.14), you obtain the model that is actually estimated:
y_i = \sum_{j=1}^{k} x_{ij} \beta_j + \lambda \hat{\eta}_i + v_i,
so the hypothesis that censoring does not matter is equivalent to the hypothesis H_0: \lambda = 0, which
can be easily tested.
The parameters can be estimated via maximum likelihood under the assumption of joint normality
of εiand ηi; however, a widely used alternative method yields the so-called Heckit estimator, named
after Heckman (1979). The procedure can be briefly outlined as follows: first, a probit model is fit
on equation (38.15); next, the generalized residuals are inserted in equation (38.14) to correct for the
effect of sample selection.
Gretl provides the heckit command to carry out estimation; its syntax is
heckit y X ; d Z
where y is the dependent variable, X is a list of regressors, d is a dummy variable holding 1 for
uncensored observations and Z is a list of explanatory variables for the censoring equation.
Since in most cases maximum likelihood is the method of choice, by default gretl computes ML
estimates. The 2-step Heckit estimates can be obtained by using the --two-step option. After
estimation, the $uhat accessor contains the generalized residuals. As in the ordinary Tobit model,
the residuals equal the difference between actual and fitted $y_i$ only for uncensored observations (those
for which $d_i = 1$).
Listing 38.5 shows two estimates from the dataset used in Mroz (1987): the first one replicates Table
22.7 in Greene (2003),4 while the second one replicates Table 17.1 in Wooldridge (2002a).
4Note that the estimates given by gretl do not coincide with those found in the printed volume. They do, however,
match those found on the errata web page for Greene’s book: http://pages.stern.nyu.edu/~wgreene/Text/Errata/
ERRATA5.htm.
Listing 38.5: Heckit model [Download ]
open mroz87.gdt
series EXP2 = AX^2
series WA2 = WA^2
series KIDS = (KL6+K618)>0
# Greene’s specification
list X = const AX EXP2 WE CIT
list Z = const WA WA2 FAMINC KIDS WE
heckit WW X ; LFP Z --two-step
heckit WW X ; LFP Z
# Wooldridge’s specification
series NWINC = FAMINC - WW*WHRS
series lww = log(WW)
list X = const WE AX EXP2
list Z = X NWINC WA KL6 K618
heckit lww X ; LFP Z --two-step
38.9 Count data
Here the dependent variable is assumed to be a non-negative integer—for example, the number of
Nobel Prize winners in a given country per year, the number of vehicles crossing a certain intersection
per hour, the number of bank failures per year. A probabilistic description of such a variable must
hinge on some discrete distribution and the one most commonly employed is the Poisson, according
to which, for a random variable $Y$ and a specific realization $y$,
$$P(Y = y) = \frac{e^{-\lambda}\lambda^y}{y!}, \qquad y = 0, 1, 2, \ldots$$
where the single parameter $\lambda$ is both the mean and the variance of $Y$. In an econometric context we
generally want to treat $\lambda$ as specific to the observation, $i$, and driven by covariates $X_i$ via a parameter
vector $\beta$. The standard way of allowing for this is the exponential mean function,
$$\lambda_i \equiv \exp(X_i\beta)$$
hence leading to
$$P(Y_i = y) = \frac{\exp\left(-\exp(X_i\beta)\right)\left(\exp(X_i\beta)\right)^y}{y!}$$
Given this model the log-likelihood for $n$ observations can be written as
$$\ell = \sum_{i=1}^{n}\left(-\exp(X_i\beta) + y_i X_i\beta - \log y_i!\right)$$
Maximization of this quantity is quite straightforward, and is carried out in gretl using the syntax
poisson depvar indep
In some cases, an “offset” variable is needed: the count of occurrences of the outcome of interest
in a given time is assumed to be strictly proportional to the offset variable $t_i$. In the epidemiology
literature, the offset is known as the “population at risk”. In this case $\lambda$ is modeled as
$$\lambda_i = t_i \exp(X_i\beta)$$
The log-likelihood is not greatly complicated thereby. Here’s another way of thinking about the offset
variable: its natural log is just another explanatory variable whose coefficient is constrained to equal
1.
If an offset variable is needed, it should be specified at the end of the command, separated from the
list of explanatory variables by a semicolon, as in
poisson depvar indep ; offset
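By way of illustration, here is a minimal sketch using simulated data; all series names are made up for the example, and it assumes that randgen() with the P code yields Poisson draws with the given observation-specific mean.

nulldata 200
set seed 13579
series x = normal()
series risk = 1 + 3*uniform()        # hypothetical "population at risk" (offset)
series m = risk * exp(0.5 + 0.8*x)   # conditional mean, proportional to the offset
series y = randgen(P, m)             # Poisson draws (assumed randgen usage)
# the offset goes after the semicolon, following the regressor list
poisson y const x ; risk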
Overdispersion
As mentioned above, in the Poisson model $E(Y_i|X_i) = V(Y_i|X_i) = \lambda_i$, that is, the conditional mean
equals the conditional variance by construction. In many cases this feature is at odds with the data;
the conditional variance is often larger than the mean, a phenomenon known as overdispersion. The
output from the poisson command includes a conditional moment test for overdispersion (as per
Davidson and MacKinnon (2004), section 11.5), which is printed automatically after estimation.
Overdispersion can be attributed to unmodeled heterogeneity between individuals. Two data points
with the same observable characteristics $X_i = X_j$ may differ because of some unobserved scale factor
$s_i \ne s_j$ so that
$$E(Y_i|X_i, s_i) = \lambda_i s_i \ne \lambda_j s_j = E(Y_j|X_j, s_j)$$
even though $\lambda_i = \lambda_j$. In other words, $Y_i$ is a Poisson random variable conditional on both $X_i$ and
$s_i$, but since $s_i$ is unobservable, the only thing we can use, $P(Y_i|X_i)$, will not conform to the
Poisson distribution.
It is often assumed that $s_i$ can be represented as a gamma random variable with mean 1 and variance $\alpha$.
The parameter $\alpha$, which measures the degree of heterogeneity between individuals, is then estimated
jointly with the vector $\beta$.
In this case, the conditional probability that $Y_i = y$ given $X_i$ can be shown to be
$$P(Y_i = y \mid X_i) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y+1)}
\left(\frac{\lambda_i}{\lambda_i + \alpha^{-1}}\right)^{y}
\left(\frac{\alpha^{-1}}{\lambda_i + \alpha^{-1}}\right)^{\alpha^{-1}} \qquad (38.17)$$
which is known as the Negative Binomial Model. The conditional mean is still $E(Y_i|X_i) = \lambda_i$, but
the variance is $V(Y_i|X_i) = \lambda_i(1 + \lambda_i\alpha)$.
To estimate the Negative Binomial model in gretl, just substitute the keyword negbin for poisson
in the commands shown above.
To be precise, the model 38.17 is that labeled NEGBIN2 by Cameron and Trivedi (1986). There’s
also a lesser-used NEGBIN1 variant, in which the conditional variance is a scalar multiple of the
conditional mean; that is, $V(Y_i|X_i) = \lambda_i(1 + \gamma)$. This can be invoked in gretl by appending the
option --model1 to the negbin command.5
The two accessors $yhat and $uhat return the predicted values and generalized residuals, respectively.
Note that $uhat is not equal to the difference between the dependent variable and $yhat.
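As a minimal sketch (assuming a count series y and a regressor list X are already in place), the estimators and accessors just described can be exercised as follows:

# Poisson versus the two Negative Binomial variants
poisson y const X
negbin y const X             # NEGBIN2
negbin y const X --model1    # NEGBIN1
series mu = $yhat            # predicted conditional means
series gr = $uhat            # generalized residuals (not y - $yhat)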
Examples
Among the sample scripts supplied with gretl you can find camtriv.inp. This exemplifies the
count-data estimators described above, based on a dataset analysed by Cameron and Trivedi (1998).
The gretl package also contains a relevant dataset used by McCullagh and Nelder (1983), namely
mccullagh.gdt, on which the Poisson and Negative Binomial estimators may be tried.
38.10 Duration models
In some contexts we wish to apply econometric methods to measurements of the duration of certain
states. Classic examples include the following:
• From engineering, the “time to failure” of electronic or mechanical components: how long do,
say, computer hard drives last until they malfunction?
5The “1” and “2” in these labels indicate the power to which λiis raised in the conditional variance expression.
• From the medical realm: how does a new treatment affect the time from diagnosis of a certain
condition to exit from that condition (where “exit” might mean death or full recovery)?
• From economics: the duration of strikes, or of spells of unemployment.
In each case we may be interested in how the durations are distributed, and how they are affected by
relevant covariates. There are several approaches to this problem; the one we discuss here—which is
currently the only one supported by gretl—is estimation of a parametric model by means of Maximum
Likelihood. In this approach we hypothesize that the durations follow some definite probability law
and we seek to estimate the parameters of that law, factoring in the influence of covariates.
We may express the density of the durations as $f(t, X, \theta)$, where $t$ is the length of time in the state
in question, $X$ is a matrix of covariates, and $\theta$ is a vector of parameters. The likelihood for a sample
of $n$ observations indexed by $i$ is then
$$L = \prod_{i=1}^{n} f(t_i, x_i, \theta)$$
Rather than working with the density directly, however, it is standard practice to factor $f(\cdot)$ into two
components, namely a hazard function, $\lambda$, and a survivor function, $S$. The survivor function gives
the probability that a state lasts at least as long as $t$; it is therefore $1 - F(t, X, \theta)$ where $F$ is the
CDF corresponding to the density $f(\cdot)$. The hazard function addresses this question: given that a
state has persisted as long as $t$, what is the likelihood that it ends within a short increment of time
beyond $t$—that is, it ends between $t$ and $t + \Delta$? Taking the limit as $\Delta$ goes to zero, we end up with
the ratio of the density to the survivor function:6
$$\lambda(t, X, \theta) = \frac{f(t, X, \theta)}{S(t, X, \theta)} \qquad (38.18)$$
so the log-likelihood can be written as
$$\ell = \sum_{i=1}^{n} \log f(t_i, x_i, \theta) = \sum_{i=1}^{n} \left[\log \lambda(t_i, x_i, \theta) + \log S(t_i, x_i, \theta)\right] \qquad (38.19)$$
One point of interest is the shape of the hazard function, in particular its dependence (or not)
on time since the state began. If $\lambda$ does not depend on $t$ we say the process in question exhibits
duration independence: the probability of exiting the state at any given moment neither increases
nor decreases based simply on how long the state has persisted to date. The alternatives are positive
duration dependence (the likelihood of exiting the state rises, the longer the state has persisted)
or negative duration dependence (exit becomes less likely, the longer it has persisted). Finally, the
behavior of the hazard with respect to time need not be monotonic; some parameterizations allow for
this possibility and some do not.
Since durations are inherently positive the probability distribution used in modeling must respect this
requirement, giving a density of zero for $t \le 0$. Four common candidates are the exponential, Weibull,
log-logistic and log-normal, the Weibull being the most common choice. The table below displays the
density and the hazard function for each of these distributions as they are commonly parameterized,
written as functions of $t$ alone. ($\phi$ and $\Phi$ denote, respectively, the Gaussian PDF and CDF.)

                density, $f(t)$                                                      hazard, $\lambda(t)$
Exponential     $\gamma \exp(-\gamma t)$                                              $\gamma$
Weibull         $\alpha\gamma^{\alpha} t^{\alpha-1} \exp[-(\gamma t)^{\alpha}]$       $\alpha\gamma^{\alpha} t^{\alpha-1}$
Log-logistic    $\gamma\alpha(\gamma t)^{\alpha-1} / [1 + (\gamma t)^{\alpha}]^2$     $\gamma\alpha(\gamma t)^{\alpha-1} / [1 + (\gamma t)^{\alpha}]$
Log-normal      $\frac{1}{\sigma t}\,\phi[(\log t - \mu)/\sigma]$                     $\frac{1}{\sigma t}\,\phi[(\log t - \mu)/\sigma] \,/\, \Phi[-(\log t - \mu)/\sigma]$
6For a fuller discussion see, for example, Davidson and MacKinnon (2004).
The hazard is constant for the exponential distribution. For the Weibull, it is monotone increasing in
$t$ if $\alpha > 1$, or monotone decreasing for $\alpha < 1$. (If $\alpha = 1$ the Weibull collapses to the exponential.) The
log-logistic and log-normal distributions allow the hazard to vary with $t$ in a non-monotonic fashion.
Covariates are brought into the picture by allowing them to govern one of the parameters of the
density, so that durations are not identically distributed across cases. For example, when using the
log-normal distribution it is natural to make $\mu$, the expected value of $\log t$, depend on the covariates,
$X$. This is typically done via a linear index function: $\mu = X\beta$.
Note that the expressions for the log-normal density and hazard contain the term $(\log t - \mu)/\sigma$.
Replacing $\mu$ with $X\beta$ this becomes $(\log t - X\beta)/\sigma$. As in Kalbfleisch and Prentice (2002), we define
a shorthand label for this term:
$$w_i \equiv (\log t_i - x_i\beta)/\sigma \qquad (38.20)$$
It turns out that this constitutes a useful simplifying change of variables for all of the distributions
discussed here. The interpretation of the scale factor, $\sigma$, in the expression above depends on the
distribution. For the log-normal, $\sigma$ represents the standard deviation of $\log t$; for the Weibull and the
log-logistic it corresponds to $1/\alpha$; and for the exponential it is fixed at unity. For distributions other
than the log-normal, $-x_i\beta$ corresponds to $\log\gamma$, or in other words $\gamma = \exp(-x_i\beta)$.
With this change of variables, the density and survivor functions may be written compactly as follows
(the exponential is the same as the Weibull).

                density, $f(w_i)$                     survivor, $S(w_i)$
Weibull         $\exp(w_i - e^{w_i})$                 $\exp(-e^{w_i})$
Log-logistic    $e^{w_i}(1 + e^{w_i})^{-2}$           $(1 + e^{w_i})^{-1}$
Log-normal      $\phi(w_i)$                           $\Phi(-w_i)$
In light of the above we may think of the generic parameter vector θ, as in f(t, X, θ), as composed of
the coefficients on the covariates, β, plus (in all cases but the exponential) the additional parameter
σ.
A complication in estimation of $\theta$ is posed by “incomplete spells”. That is, in some cases the state
in question may not have ended at the time the observation is made (e.g. some workers remain
unemployed, some components have not yet failed). If we use $t_i$ to denote the time from entering the
state to either (a) exiting the state or (b) the observation window closing, whichever comes first, then
all we know of the “right-censored” cases (b) is that the duration was at least as long as $t_i$. This can
be handled by rewriting the log-likelihood (compare 38.19) as
$$\ell = \sum_{i=1}^{n} \delta_i \log S(w_i) + (1 - \delta_i)\left[-\log\sigma + \log f(w_i)\right] \qquad (38.21)$$
where $\delta_i$ equals 1 for censored cases (incomplete spells), and 0 for complete observations. The rationale
for this is that the log-density equals the sum of the log hazard and the log survivor function, but for
the incomplete spells only the survivor function contributes to the likelihood. So in (38.21) we are
adding up the log survivor function alone for the incomplete cases, plus the full log density for the
completed cases.
Implementation in gretl and illustration
The duration command accepts a list of series on the usual pattern: dependent variable followed by
covariates. If right-censoring is present in the data this should be represented by a dummy variable
corresponding to $\delta_i$ above, separated from the covariates by a semicolon. For example,
duration durat 0 X ; cens
where durat measures durations, 0 represents the constant (which is required for such models), X is
a named list of regressors, and cens is the censoring dummy.
By default the Weibull distribution is used; you can substitute any of the other three distributions
discussed here by appending one of the option flags --exponential,--loglogistic or --lognormal.
Interpreting the coefficients in a duration model requires some care, and we will work through an
illustrative case. The example comes from section 20.3 of Wooldridge (2002a) and it concerns criminal
recidivism.7 The data (filename recid.gdt) pertain to a sample of 1,445 convicts released from prison
between July 1, 1977 and June 30, 1978. The dependent variable is the time in months until they are
again arrested. The information was gathered retrospectively by examining records in April 1984; the
maximum possible length of observation is 81 months. Right-censoring is important: when the data
were compiled about 62 percent had not been rearrested. The dataset contains several covariates,
which are described in the data file; we will focus below on interpretation of the married variable, a
dummy which equals 1 if the respondent was married when imprisoned.
Listing 38.6 shows the gretl commands for Weibull and log-normal models along with most of the
output. Consider first the Weibull scale factor, $\sigma$. The estimate is 1.241 with a standard error of
0.048. (We don’t print a $z$-score and p-value for this term since $H_0: \sigma = 0$ is not of interest.) Recall
that $\sigma$ corresponds to $1/\alpha$; we can be confident that $\alpha$ is less than 1, so recidivism displays negative
duration dependence. This makes sense: it is plausible that if a past offender manages to stay out
of trouble for an extended period his risk of engaging in crime again diminishes. (The exponential
model would therefore not be appropriate in this case.)
On a priori grounds, however, we may doubt the monotonic decline in hazard that is implied by the
Weibull specification. Even if a person is liable to return to crime, it seems relatively unlikely that he
would do so straight out of prison. In the data, we find that only 2.6 percent of those followed were
rearrested within 3 months. The log-normal specification, which allows the hazard to rise and then
fall, may be more appropriate. Using the duration command again with the same covariates but the
--lognormal flag, we get a log-likelihood of −1597 as against −1633 for the Weibull, confirming that
the log-normal gives a better fit.
Let us now focus on the married coefficient, which is positive in both specifications but larger and
more sharply estimated in the log-normal variant. The first thing is to get the interpretation of the
sign right. Recall that $X\beta$ enters negatively into the intermediate variable $w$ (equation 38.20). The
Weibull hazard is $\lambda(w_i) = e^{w_i}$, so being married reduces the hazard of re-offending, or in other words
lengthens the expected duration out of prison. The same qualitative interpretation applies for the
log-normal.
To get a better sense of the married effect, it is useful to show its impact on the hazard across time.
We can do this by plotting the hazard for two values of the index function $x_i\beta$: in each case the values
of all the covariates other than married are set to their means (or some chosen values) while married
is set first to 0 then to 1. Listing 38.7 provides a script that does this, and the resulting plots are
shown in Figure 38.1. Note that when computing the hazards we need to multiply by the Jacobian
of the transformation from $t_i$ to $w_i = (\log t_i - x_i\beta)/\sigma$, namely $1/t$. Note also that the estimate of $\sigma$
is available via the accessor $sigma, but it is also present as the last element in the coefficient vector
obtained via $coeff.
A further difference between the Weibull and log-normal specifications is illustrated in the plots. The
Weibull is an instance of a proportional hazard model. This means that for any sets of values of the
covariates, $x_i$ and $x_j$, the ratio of the associated hazards is invariant with respect to duration. In
this example the Weibull hazard for unmarried individuals is always 1.1637 times that for married.
In the log-normal variant, on the other hand, this ratio gradually declines from 1.6703 at one month
to 1.1766 at 100 months.
Alternative representations of the Weibull model
One point to watch out for with the Weibull duration model is that the estimates may be represented
in different ways. The representation given by gretl is sometimes called the accelerated failure-time
(AFT) metric. An alternative that one sometimes sees is the log relative-hazard metric; in fact this is
the metric used in Wooldridge’s presentation of the recidivism example. To get from AFT estimates
to log relative-hazard form it is necessary to multiply the coefficients by $-1/\hat\sigma$. For example, the
married coefficient in the Weibull specification as shown here is 0.188104 and $\hat\sigma$ is 1.24090, so the
alternative value is $-0.152$, which is what Wooldridge shows (2002a, Table 20.1).
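The conversion is easily scripted. The sketch below (not part of the original listings) re-estimates the Weibull model and rescales the AFT coefficients by −1/σ̂, using the fact noted above that σ̂ occupies the last slot of $coeff; the married entry should then come out close to −0.152.

open recid.gdt
list X = workprg priors tserved felon alcohol drugs \
  black married educ age
duration durat 0 X ; cens
# drop the trailing sigma element, then rescale
matrix b_aft = $coeff[1:$ncoeff-1]
matrix b_lrh = -b_aft / $sigma
print b_lrh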
7Germán Rodríguez of Princeton University has a page discussing this example and displaying estimates from Stata
at http://data.princeton.edu/pop509/recid1.html.
Listing 38.6: Models for recidivism data [Download ]
open recid.gdt
list X = workprg priors tserved felon alcohol drugs \
black married educ age
duration durat 0 X ; cens
duration durat 0 X ; cens --lognormal
Partial output:
Model 1: Duration (Weibull), using observations 1-1445
Dependent variable: durat
coefficient std. error z p-value
--------------------------------------------------------
const 4.22167 0.341311 12.37 3.85e-35 ***
workprg -0.112785 0.112535 -1.002 0.3162
priors -0.110176 0.0170675 -6.455 1.08e-10 ***
tserved -0.0168297 0.00213029 -7.900 2.78e-15 ***
felon 0.371623 0.131995 2.815 0.0049 ***
alcohol -0.555132 0.132243 -4.198 2.69e-05 ***
drugs -0.349265 0.121880 -2.866 0.0042 ***
black -0.563016 0.110817 -5.081 3.76e-07 ***
married 0.188104 0.135752 1.386 0.1659
educ 0.0289111 0.0241153 1.199 0.2306
age 0.00462188 0.000664820 6.952 3.60e-12 ***
sigma 1.24090 0.0482896
Chi-square(10) 165.4772 p-value 2.39e-30
Log-likelihood -1633.032 Akaike criterion 3290.065
Model 2: Duration (log-normal), using observations 1-1445
Dependent variable: durat
coefficient std. error z p-value
---------------------------------------------------------
const 4.09939 0.347535 11.80 4.11e-32 ***
workprg -0.0625693 0.120037 -0.5213 0.6022
priors -0.137253 0.0214587 -6.396 1.59e-10 ***
tserved -0.0193306 0.00297792 -6.491 8.51e-11 ***
felon 0.443995 0.145087 3.060 0.0022 ***
alcohol -0.634909 0.144217 -4.402 1.07e-05 ***
drugs -0.298159 0.132736 -2.246 0.0247 **
black -0.542719 0.117443 -4.621 3.82e-06 ***
married 0.340682 0.139843 2.436 0.0148 **
educ 0.0229194 0.0253974 0.9024 0.3668
age 0.00391028 0.000606205 6.450 1.12e-10 ***
sigma 1.81047 0.0623022
Chi-square(10) 166.7361 p-value 1.31e-30
Log-likelihood -1597.059 Akaike criterion 3218.118
Listing 38.7: Create plots showing conditional hazards [Download ]
open recid.gdt -q
# leave ’married’ separate for analysis
list X = workprg priors tserved felon alcohol drugs \
black educ age
# Weibull variant
duration durat 0 X married ; cens
# coefficients on all Xs apart from married
matrix beta_w = $coeff[1:$ncoeff-2]
# married coefficient
scalar mc_w = $coeff[$ncoeff-1]
scalar s_w = $sigma
# Log-normal variant
duration durat 0 X married ; cens --lognormal
matrix beta_n = $coeff[1:$ncoeff-2]
scalar mc_n = $coeff[$ncoeff-1]
scalar s_n = $sigma
list allX = 0 X
# evaluate X\beta at means of all variables except marriage
scalar Xb_w = meanc({allX}) * beta_w
scalar Xb_n = meanc({allX}) * beta_n
# construct two plot matrices
matrix mat_w = zeros(100, 3)
matrix mat_n = zeros(100, 3)
loop t=1..100
# first column, duration
mat_w[t, 1] = t
mat_n[t, 1] = t
wi_w = (log(t) - Xb_w)/s_w
wi_n = (log(t) - Xb_n)/s_n
# second col: hazard with married = 0
mat_w[t, 2] = (1/t) * exp(wi_w)
mat_n[t, 2] = (1/t) * pdf(z, wi_n) / cdf(z, -wi_n)
wi_w = (log(t) - (Xb_w + mc_w))/s_w
wi_n = (log(t) - (Xb_n + mc_n))/s_n
# third col: hazard with married = 1
mat_w[t, 3] = (1/t) * exp(wi_w)
mat_n[t, 3] = (1/t) * pdf(z, wi_n) / cdf(z, -wi_n)
endloop
cnameset(mat_w, "months unmarried married")
cnameset(mat_n, "months unmarried married")
gnuplot 2 3 1 --with-lines --supp --matrix=mat_w --output=weibull.plt
gnuplot 2 3 1 --with-lines --supp --matrix=mat_n --output=lognorm.plt
[Figure: two panels, “Weibull” and “Log-normal”, plotting the estimated hazard against months (0–100) for unmarried and married ex-convicts.]
Figure 38.1: Recidivism hazard estimates for married and unmarried ex-convicts
Fitted values and residuals
By default, gretl computes fitted values (accessible via $yhat) as the conditional mean of duration.
The formulae are shown below (where $\Gamma$ denotes the gamma function, and the exponential variant is
just Weibull with $\sigma = 1$).

          Weibull                              Log-logistic                                Log-normal
          $\exp(X\beta)\,\Gamma(1 + \sigma)$   $\exp(X\beta)\,\pi\sigma/\sin(\pi\sigma)$   $\exp(X\beta + \sigma^2/2)$

The expression given for the log-logistic mean, however, is valid only for $\sigma < 1$; otherwise the
expectation is undefined, a point that is not noted in all software.8
Alternatively, if the --medians option is given, gretl’s duration command will produce conditional
medians as the content of $yhat. For the Weibull the median is $\exp(X\beta)(\log 2)^{\sigma}$; for the log-logistic
and log-normal it is just $\exp(X\beta)$.
The values we give for the accessor $uhat are generalized (Cox–Snell) residuals, computed as the
integrated hazard function, which equals the negative log of the survivor function:
$$\epsilon_i = \Lambda(t_i, x_i, \theta) = -\log S(t_i, x_i, \theta)$$
Under the null of correct specification of the model these generalized residuals should follow the unit
exponential distribution, which has mean and variance both equal to 1 and density $\exp(-\epsilon)$. See
chapter 18 of Cameron and Trivedi (2005) for further discussion.
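As a quick check, the generalized residuals can be retrieved and summarized after estimation. The sketch below (for the log-normal recidivism model) simply inspects them, bearing in mind that the residuals for right-censored spells are themselves censored.

open recid.gdt
list X = workprg priors tserved felon alcohol drugs \
  black married educ age
duration durat 0 X ; cens --lognormal
series cs = $uhat    # Cox-Snell (generalized) residuals
summary cs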
8The predict adjunct to the streg command in Stata 10, for example, gaily produces large negative values for the
log-logistic mean in duration models with σ > 1.
Chapter 39
Quantile regression
39.1 Introduction
In Ordinary Least Squares (OLS) regression, the fitted values, $\hat y_i = X_i\hat\beta$, represent the conditional
mean of the dependent variable—conditional, that is, on the regression function and the values
of the independent variables. In median regression, by contrast and as the name implies, fitted
values represent the conditional median of the dependent variable. It turns out that the principle
of estimation for median regression is easily stated (though not so easily computed), namely, choose
$\hat\beta$ so as to minimize the sum of absolute residuals. Hence the method is known as Least Absolute
Deviations or LAD. While the OLS problem has a straightforward analytical solution, LAD is a linear
programming problem.
Quantile regression is a generalization of median regression: the regression function predicts the
conditional τ-quantile of the dependent variable—for example the first quartile (τ = 0.25) or the ninth
decile (τ = 0.90).
If the classical conditions for the validity of OLS are satisfied—that is, if the error term is indepen-
dently and identically distributed, conditional on X—then quantile regression is redundant: all the
conditional quantiles of the dependent variable will march in lockstep with the conditional mean.
Conversely, if quantile regression reveals that the conditional quantiles behave in a manner quite
distinct from the conditional mean, this suggests that OLS estimation is problematic.
Gretl has offered quantile regression functionality since version 1.7.5 (in addition to basic LAD re-
gression, which has been available since early in gretl’s history via the lad command).1
39.2 Basic syntax
The basic invocation of quantile regression is
quantreg tau reglist
where
• reglist is a standard gretl regression list (dependent variable followed by regressors, including
the constant if an intercept is wanted); and
• tau is the desired conditional quantile, in the range 0.01 to 0.99, given either as a numerical
value or the name of a pre-defined scalar variable (but see below for a further option).
Estimation is via the Frisch–Newton interior point solver (Portnoy and Koenker, 1997), which is
substantially faster than the “traditional” Barrodale–Roberts (1974) simplex approach for large problems.
By default, standard errors are computed according to the asymptotic formula given by Koenker
and Bassett (1978). Alternatively, if the --robust option is given, we use the sandwich estimator
developed in Koenker and Zhao (1994).2
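For example, using the Engel data that appear later in this chapter, the first conditional quartile of food expenditure can be estimated with either choice of standard errors (a minimal sketch):

open engel.gdt
# conditional first quartile of food expenditure
quantreg 0.25 foodexp 0 income
# the same, with the Koenker-Zhao sandwich standard errors
quantreg 0.25 foodexp 0 income --robust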
1We gratefully acknowledge our borrowing from the quantreg package for GNU R (version 4.17). The core of the
package is composed of Fortran code written by Roger Koenker; this is accompanied by various driver and auxiliary
functions written in the R language by Koenker and Martin Mächler. The latter functions have been re-worked in C
for gretl. We have added some guards against potential numerical problems in small samples.
2These correspond to the iid and nid options in R’s quantreg package, respectively.
39.3 Confidence intervals
An option --intervals is available. When this is given we print confidence intervals for the parameter
estimates instead of standard errors. These intervals are computed using the rank inversion method
and in general they are asymmetrical about the point estimates—that is, they are not simply “plus
or minus so many standard errors”. The specifics of the calculation are inflected by the --robust
option: without this, the intervals are computed on the assumption of IID errors (Koenker, 1994);
with it, they use the heteroskedasticity-robust estimator developed by Koenker and Machado (1999).
By default, 90 percent intervals are produced. You can change this by appending a confidence value
(expressed as a decimal fraction) to the intervals option, as in
quantreg tau reglist --intervals=.95
When the confidence intervals option is selected, the parameter estimates are calculated using the
Barrodale–Roberts method. This is simply because the Frisch–Newton code does not currently sup-
port the calculation of confidence intervals.
Two further details. First, the mechanisms for generating confidence intervals for quantile estimates
require that the model has at least two regressors (including the constant). If the --intervals
option is given for a model containing only one regressor, an error is flagged. Second, when a model
is estimated in this mode, you can retrieve the confidence intervals using the accessor $coeff_ci.
This produces a k × 2 matrix, where k is the number of regressors. The lower bounds are in the first
column, the upper bounds in the second. See also section 39.5 below.
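A minimal sketch of retrieving the intervals (again using the Engel data introduced below):

open engel.gdt
quantreg 0.5 foodexp 0 income --intervals=.95
matrix ci = $coeff_ci
print ci    # 2 x 2 here: lower bounds in column 1, upper bounds in column 2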
39.4 Multiple quantiles
As a further option, you can give tau as a matrix—either the name of a predefined matrix or in
numerical form, as in {.05, .25, .5, .75, .95}. The given model is estimated for all the τ values
and the results are printed in a special form, as shown below (in this case the --intervals option
was also given).
Model 1: Quantile estimates using the 235 observations 1-235
Dependent variable: foodexp
With 90 percent confidence intervals
VARIABLE TAU COEFFICIENT LOWER UPPER
const 0.05 124.880 98.3021 130.517
0.25 95.4835 73.7861 120.098
0.50 81.4822 53.2592 114.012
0.75 62.3966 32.7449 107.314
0.95 64.1040 46.2649 83.5790
income 0.05 0.343361 0.343327 0.389750
0.25 0.474103 0.420330 0.494329
0.50 0.560181 0.487022 0.601989
0.75 0.644014 0.580155 0.690413
0.95 0.709069 0.673900 0.734441
The gretl GUI has an entry for Quantile Regression (under /Model/Robust estimation), and you can
select multiple quantiles there too. In that context, just give space-separated numerical values (as
per the predefined options, shown in a drop-down list).
When you estimate a model in this way most of the standard menu items in the model window are
disabled, but one extra item is available—graphs showing the τ sequence for a given coefficient in
comparison with the OLS coefficient. An example is shown in Figure 39.1. This sort of graph provides
a simple means of judging whether quantile regression is redundant (OLS is fine) or informative.
In the example shown—based on data on household income and food expenditure gathered by Ernst
Engel (1821–1896)—it seems clear that simple OLS regression is potentially misleading. The “crossing”
of the OLS estimate by the quantile estimates is very marked.
[Figure: coefficient on income plotted against tau; quantile estimates with 90% band compared with the OLS estimate with 90% band.]
Figure 39.1: Regression of food expenditure on income; Engel’s data
However, it is not always clear what implications should be drawn from this sort of conflict. With the
Engel data there are two issues to consider. First, Engel’s famous “law” claims an income-elasticity
of food consumption that is less than one, and talk of elasticities suggests a logarithmic formulation
of the model. Second, there are two apparently anomalous observations in the data set: household
105 has the third-highest income but unexpectedly low expenditure on food (as judged from a simple
scatter plot), while household 138 (which also has unexpectedly low food consumption) has much the
highest income, almost twice that of the next highest.
With n = 235 it seems reasonable to consider dropping these observations. If we do so, and adopt a
log–log formulation, we get the plot shown in Figure 39.2. The quantile estimates still cross the OLS
estimate, but the “evidence against OLS” is much less compelling: the 90 percent confidence bands
of the respective estimates overlap at all the quantiles considered.
A script to produce the results discussed above is presented in Listing 39.1.
39.5 Large datasets
As noted above, when you give the --intervals option with the quantreg command, which calls for
estimation of confidence intervals via rank inversion, gretl switches from the default Frisch–Newton
algorithm to the Barrodale–Roberts simplex method.
This is OK for moderately large datasets (up to, say, a few thousand observations) but on very large
problems the simplex algorithm may become seriously bogged down. For example, Koenker and
Hallock (2001) present an analysis of the determinants of birth weights, using 198,377 observations
and 15 regressors. Generating confidence intervals via Barrodale–Roberts for a single value of τ
took about half an hour on a Lenovo Thinkpad T60p with 1.83GHz Intel Core 2 processor.
If you want confidence intervals in such cases, you are advised not to use the --intervals option, but
to compute them using the method of “plus or minus so many standard errors”. (One Frisch–Newton
run took about 8 seconds on the same machine, showing the superiority of the interior point method.)
The script below illustrates:
quantreg .10 y 0 xlist
scalar crit = qnorm(.95)
matrix ci = $coeff - crit * $stderr
ci = ci~($coeff + crit * $stderr)
print ci
[Figure: coefficient on log(income) plotted against tau; quantile estimates with 90% band compared with the OLS estimate with 90% band.]
Figure 39.2: Log–log regression; 2 observations dropped from full Engel data set.
Listing 39.1: Food expenditure and income, Engel data [Download ]
# this data file is supplied with gretl
open engel.gdt
# specify some quantiles
matrix tau = {.05, .25, .5, .75, .95}
# use levels of variables
QM1 <- quantreg tau foodexp 0 income --intervals
# use log-log specification, with two outliers removed
logs foodexp income
smpl obs!=105 && obs!=138 --restrict
QM2 <- quantreg tau l_foodexp 0 l_income --intervals
The script saves the two models “as icons”. Double-clicking on a model’s icon opens a window to
display the results, and the Graph menu in this window gives access to a tau-sequence plot.
The matrix ci will contain the lower and upper bounds of the (symmetrical) 90 percent confidence
intervals.
To avoid a situation where gretl becomes unresponsive for a very long time we have set the maximum
number of iterations for the Barrodale–Roberts algorithm to the (somewhat arbitrary) value of 1000.
We will experiment further with this, but in the meantime, if you really want to use this method
on a large dataset, and don’t mind waiting for the results, you can increase the limit using the set
command with parameter rq_maxiter, as in
set rq_maxiter 5000
Chapter 40
Nonparametric methods
The main focus of gretl is on parametric estimation, but we offer a selection of nonparametric methods.
The most basic of these are:
• various tests for difference in distribution (Sign test, Wilcoxon rank-sum test, Wilcoxon signed-rank test);
• the Runs test for randomness; and
• nonparametric measures of association: Spearman’s rho and Kendall’s tau.
Details on the above can be found by consulting the help for the commands difftest, runs, corr
and spearman. In the GUI program these items are found under the Tools menu and the Robust
estimation item under the Model menu.
In this chapter we concentrate on two relatively complex methods for nonparametric curve-fitting and
prediction, namely William Cleveland’s “loess” (also known as “lowess”) and the Nadaraya–Watson
estimator.
40.1 Locally weighted regression (loess)
Loess (Cleveland, 1979) is a nonparametric smoother employing locally weighted polynomial regression.
It is intended to yield an approximation to $g(\cdot)$ when the dependent variable, $y$, can be expressed as
$$y_i = g(x_i) + \epsilon_i$$
for some smooth function $g(\cdot)$.
Given a sample of $n$ observations on the variables $y$ and $x$, the procedure is to run a weighted least
squares regression (a polynomial of order $d$ = 0, 1 or 2 in $x$) localized to each data point, $i$. In each
such regression the sample consists of the $r$ nearest neighbors (in the $x$ dimension) to the point $i$,
with weights that are inversely related to the distance $|x_i - x_k|$, $k = 1, \ldots, r$. The predicted value
$\hat y_i$ is then obtained by evaluating the estimated polynomial at $x_i$. The most commonly used order is
$d = 1$.
A bandwidth parameter $0 < q \le 1$ controls the proportion of the total number of data points used in
each regression; thus $r = qn$ (rounded up to an integer). Larger values of $q$ lead to a smoother fitted
series, smaller values to a series that tracks the actual data more closely; $0.25 \le q \le 0.5$ is often a
suitable range.
In gretl’s implementation of loess the weighting scheme is that given by Cleveland, namely,
$$w_k(x_i) = W\left(h_i^{-1}(x_k - x_i)\right)$$
where $h_i$ is the distance between $x_i$ and its $r$th nearest neighbor, and $W(\cdot)$ is the tricube function,
$$W(x) = \begin{cases} (1 - |x|^3)^3 & \text{for } |x| < 1 \\ 0 & \text{for } |x| \ge 1 \end{cases}$$
The local regression can be made robust via an adjustment based on the residuals, $e_i = y_i - \hat y_i$.
Robustness weights, $\delta_k$, are defined by
$$\delta_k = B(e_k / 6s)$$
where $s$ is the median of the $|e_i|$ and $B(\cdot)$ is the bisquare function,
$$B(x) = \begin{cases} (1 - x^2)^2 & \text{for } |x| < 1 \\ 0 & \text{for } |x| \ge 1 \end{cases}$$
The polynomial regression is then re-run using weight $\delta_k w_k(x_i)$ at $(x_k, y_k)$.
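For concreteness, the two weight functions are easily written as hansl functions; this is purely an illustration of the formulas above, not gretl’s internal code.

function matrix tricube (matrix x)
    matrix ax = abs(x)
    return (ax .< 1) .* (1 - ax.^3).^3
end function

function matrix bisquare (matrix x)
    matrix ax = abs(x)
    return (ax .< 1) .* (1 - ax.^2).^2
end function

# scaled distances u = (x_k - x_i)/h_i
matrix u = {-1.2, -0.5, 0, 0.5, 1.2}
matrix wt = tricube(u)
matrix wb = bisquare(u)
print wt wb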
The loess() function in gretl takes up to five arguments as follows: the y series, the x series, the
order d, the bandwidth q, and a Boolean switch to turn on the robust adjustment. The last three
arguments are optional: if they are omitted the default values are d = 1, q = 0.5 and no robust
adjustment. An example of a full call to loess() is shown below; in this case a quadratic in x is
specified, three quarters of the data points will be used in each local regression, and robustness is
turned on:
series yh = loess(y, x, 2, 0.75, 1)
An illustration of loess is provided in Listing 40.1: we generate a series that has a deterministic sine
wave component overlaid with noise uniformly distributed on (−1, 1). Loess is then used to retrieve
a good approximation to the sine function. The resulting graph is shown in Figure 40.1.
Listing 40.1: Loess script [Download ]
nulldata 120
series x = index
scalar n = $nobs
series y = sin(2*pi*x/n) + uniform(-1, 1)
series yh = loess(y, x, 2, 0.75, 0)
gnuplot y yh x --output=display --with-lines=yh
[Figure: the noisy series y plotted against x (0–120) with the loess fit overlaid.]
Figure 40.1: Loess: retrieving a sine wave
40.2 The Nadaraya–Watson estimator
The Nadaraya–Watson nonparametric estimator (Nadaraya, 1964; Watson, 1964) is an estimator for
the conditional mean of a variable $Y$, available in a sample of size $n$, for a given value of a conditioning
variable $X$, and is defined as
$$m(X) = \frac{\sum_{j=1}^{n} y_j \cdot K_h(X - x_j)}{\sum_{j=1}^{n} K_h(X - x_j)}$$
where $K_h(\cdot)$ is the so-called kernel function, which is usually some simple transform of a density
function that depends on a scalar, $h$, known as the bandwidth. The one used by gretl is
$$K_h(x) = \exp\left(-\frac{x^2}{2h}\right)$$
for $|x| < \tau$ and zero otherwise. Larger values of $h$ produce a smoother function. The scalar $\tau$, known
as the trim parameter, is used to prevent numerical problems when the kernel function is evaluated
too far away from zero.
A common variant of Nadaraya–Watson is the so-called “leave-one-out” estimator, which omits the
$i$-th observation when evaluating $m(x_i)$. The formula therefore becomes
$$m(x_i) = \frac{\sum_{j \ne i} y_j \cdot K_h(x_i - x_j)}{\sum_{j \ne i} K_h(x_i - x_j)}$$
This makes the estimator more robust numerically and its usage is often advised for inference purposes.
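A minimal sketch of the leave-one-out switch in use (anticipating the dataset of Listing 40.2 below):

open mroz87.gdt
scalar h = $nobs^(-0.2)
series m_plain = nadarwat(HA, WA, h*5)
series m_loo   = nadarwat(HA, WA, h*5, 1)   # leave-one-out variant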
The nadarwat() function in gretl takes up to five arguments as follows: the dependent series y, the
independent series x, the bandwidth h, a Boolean switch to turn on “leave-one-out”, and a value for
the trim parameter τ, expressed as a multiple of h. The last three arguments are optional; if they are
omitted the default values are, respectively, an automatic data-determined value for h (see below),
leave-one-out not activated, and τ = 4. The default value of τ offers a relatively safe guard against
numerical problems; in some cases a larger τ may produce more sensible values in regions of X with
sparse support.
Choice of bandwidth
As mentioned above, larger values of $h$ lead to a smoother $m(\cdot)$ function; smaller values make the
$m(\cdot)$ function follow the $y_i$ values more closely, so that the function appears more “jagged”. In fact,
as $h \to \infty$, $m(x_i) \to \bar Y$; on the contrary, if $h \to 0$, observations for which $x_i \ne X$ are not taken into
account at all when computing $m(X)$. Also, the statistical properties of $m(\cdot)$ vary with $h$: its variance
can be shown to be decreasing in $h$, while its squared bias is increasing in $h$. It can be shown that
choosing $h \propto n^{-1/5}$ minimizes the RMSE, so that value is customarily taken as a reference point.
If the argument $h$ is omitted or set to 0, gretl uses the following data-determined value:
$$h = 0.9 \cdot \min\left[s,\; \frac{r}{1.349}\right] \cdot n^{-1/5}$$
where $s$ is the sample standard deviation of $x$ and $r$ is its interquartile range.
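This default is easy to reproduce by hand. The sketch below (using the dataset of Listing 40.2) computes the same quantity from sd() and quantile(); the two fitted series should be very close, with any small difference coming down to details of the quantile computation.

open mroz87.gdt
scalar s = sd(WA)
scalar r = quantile(WA, 0.75) - quantile(WA, 0.25)
scalar h = 0.9 * xmin(s, r/1.349) * $nobs^(-0.2)
series m_auto = nadarwat(HA, WA, 0)    # 0 requests the automatic bandwidth
series m_hand = nadarwat(HA, WA, h)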
Example and prediction
By way of example, Listing 40.2 produces the graph shown in Figure 40.2 (after some slight editing).
Although X could be, in principle, any value, in the typical usage of this estimator you want to
compute m(X) for X equal to one or more values actually observed in your sample, that is $m(x_i)$.
If you need a point estimate of m(X) for some value of X which is not present among the valid
observations of your dependent variable, you may want to add some “fake” observations to your
dataset in which y is missing and x contains the values you want m(x) evaluated at. For example,
the following script evaluates m(x) at regular intervals between −2.0 and 2.0:
nulldata 120
set seed 120496
# first part of the sample: actual data
smpl 1 100
x = normal()
y = x^2 + sin(x) + normal()
# second part of the sample: fake x data
smpl 101 120
x = (obs-110) / 5
Listing 40.2: Nadaraya–Watson example [Download ]
# Nonparametric regression example: husband’s age on wife’s age
open mroz87.gdt
# initial value for the bandwidth
scalar h = $nobs^(-0.2)
# three increasingly smooth estimates
series m0 = nadarwat(HA, WA, h)
series m1 = nadarwat(HA, WA, h * 5)
series m2 = nadarwat(HA, WA, h * 10)
# produce the graph
dataset sortby WA
gnuplot HA m0 m1 m2 WA --output=display --with-lines=m0,m1,m2
[Figure: HA plotted against WA (ages 30–60) with the three Nadaraya–Watson estimates m0, m1 and m2 overlaid.]
Figure 40.2: Nadaraya–Watson example for several choices of the bandwidth parameter
# compute the Nadaraya-Watson estimate
# with bandwidth equal to 0.4 (note that
# 100^(-0.2) = 0.398)
smpl full
m = nadarwat(y, x, 0.4)
# show m(x) for the fake x values only
smpl 101 120
print x m -o
and running it produces
x m
101 -1.8 1.165934
102 -1.6 0.730221
103 -1.4 0.314705
104 -1.2 0.026057
105 -1.0 -0.131999
106 -0.8 -0.215445
107 -0.6 -0.269257
108 -0.4 -0.304451
109 -0.2 -0.306448
110 0.0 -0.238766
111 0.2 -0.038837
112 0.4 0.354660
113 0.6 0.908178
114 0.8 1.485178
115 1.0 2.000003
116 1.2 2.460100
117 1.4 2.905176
118 1.6 3.380874
119 1.8 3.927682
120 2.0 4.538364
Chapter 41
MIDAS models
The acronym MIDAS stands for“Mixed Data Sampling”. MIDAS models can essentially be described
as models where one or more independent variables are observed at a higher frequency than the
dependent variable, and possibly an ad-hoc parsimonious parameterization is adopted. See Ghysels
et al., 2004; Ghysels, 2015; Armesto et al., 2010 for a fuller introduction. Naturally, these models
require easy handling of multiple-frequency data. The way this is done in gretl is explained in Chapter
20; in this chapter, we concentrate on the numerical aspects of estimation.
41.1 Parsimonious parameterizations
The simplest MIDAS regression specification—known as “unrestricted MIDAS” or U-MIDAS—simply
includes $p$ lags of a high-frequency regressor, each with its own parameter to be estimated. A typical
case can be written as
$$y_t = \beta_0 + \alpha y_{t-1} + \sum_{i=1}^{p} \delta_i x_{\tau-i} + \varepsilon_t \qquad (41.1)$$
where $\tau$ represents the reference point of the sequence of high-frequency lags in “high-frequency
time”.1 Obvious generalizations of this specification include a higher AR order for $y$ and inclusion of
additional low- and/or high-frequency regressors.
Estimation of (41.1) can be accomplished via OLS. However, it is more common to enforce parsimony
by making the individual coefficients on lagged high-frequency terms a function of a relatively small
number of hyperparameters, as in
$$y_t = \beta_0 + \alpha y_{t-1} + \gamma\, W(x_{\tau-1}, x_{\tau-2}, \ldots, x_{\tau-p}; \theta) + \varepsilon_t \qquad (41.2)$$
where $W(\cdot)$ is the weighting function associated with a given parameterization and $\theta$ is a $k$-vector of
hyperparameters, $k < p$.
This presents a couple of computational questions: how to calculate the per-lag coefficients given the
values of the hyperparameters, and how best to estimate the value of the hyperparameters? Gretl can
handle natively four commonly used parameterizations: normalized exponential Almon, normalized
beta (with or without a zero last coefficient), and plain (non-normalized) Almon polynomial. The
Almon variants take one or more parameters (two being a common choice). The beta variants take
either two or three parameters. Full details on the forms taken by the W(·) function are provided in
section 41.3.
All variants are handled by the functions mweights and mgradient, which work as follows.
• mweights takes three arguments: the number of lags required (p), the k-vector of hyperparameters
(θ), and an integer code or string indicating the method (see Table 41.1). It returns a
p-vector containing the coefficients.
• mgradient takes three arguments, just like mweights. However, this function returns a p × k
matrix holding the (analytical) gradient of the p coefficients or weights with respect to the k
elements of θ; a short example follows this list.
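A small sketch of these two functions in action (the hyperparameter values are arbitrary):

matrix theta = {1, 5}
matrix w = mweights(10, theta, "beta0")
eval sumc(w)    # the normalized weights sum to 1
matrix G = mgradient(10, theta, "beta0")
print G         # 10 x 2 gradient of the weights with respect to theta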
In the case of the non-normalized Almon polynomial the γ coefficient in (41.2) is identically 1.0
and is omitted. The "beta1" case is the same as the two-parameter "beta0" except that θ1 is
constrained to equal 1, leaving θ2 as the only free parameter. Ghysels and Qian (2016) make a case
for use of this particularly parsimonious version.2
1For discussion of the placement of this reference point relative to low-frequency time, see section 20.3 above.
2Note, however, that at present "beta1" cannot be mixed with other parameterizations in a single model.
Parameterization code string
Normalized exponential Almon 1 "nealmon"
Normalized beta, zero last lag 2 "beta0"
Normalized beta, non-zero last lag 3 "betan"
Almon polynomial 4 "almonp"
One-parameter beta 5 "beta1"
Table 41.1: MIDAS parameterizations
An additional function is provided for convenience: it is named mlincomb and it combines mweights
with the lincomb function, which takes a list (of series) argument followed by a vector of coefficients
and produces a series result, namely a linear combination of the elements of the list. If we have a
suitable list X available, we can do, for example,
series foo = mlincomb(X, theta, "beta0")
This is equivalent to
series foo = lincomb(X, mweights(nelem(X), theta, "beta0"))
but saves a little typing and some CPU cycles.
41.2 Estimating MIDAS models
Gretl offers a dedicated command, midasreg, for estimation of MIDAS models. (There’s a corre-
sponding item, MIDAS, under the Time series section of the Model menu in the gretl GUI.) We begin
by discussing that, then move on to possibilities for defining your own estimator.
The syntax of midasreg looks like this:
midasreg depvar xlist ; midas-terms [options]
The depvar slot takes the name (or series ID number) of the dependent variable, and xlist is the
list of regressors that are observed at the same frequency as the dependent variable; this list may
contain lags of the dependent variable. The midas-terms slot accepts one or more specification(s)
for high-frequency terms. Each of these specifications must conform to one or other of the following
patterns:
1. mds(mlist, minlag, maxlag, type, theta)
2. mdsl(llist, type, theta)
In case 1 mlist must be a MIDAS list, as defined in section 20.2, which contains a full set of per-period
series but no lags. Lags will be generated automatically, governed by the minlag and maxlag (integer)
arguments, which may be given as numerical values or the names of predefined scalar variables. The
integer (or string) type argument represents the type of parameterization; in addition to the values
1 to 4 defined in Table 41.1 a value of 0 (or the string "umidas") indicates unrestricted MIDAS.
In case 2 llist is assumed to be a list that already contains the required set of high-frequency lags—
as may be obtained via the hflags function described in section 20.3—hence minlag and maxlag are
not wanted.
The final theta argument is optional in most cases (implying an automatic initialization of the
hyperparameters). If this argument is given it must take one of the following forms:
1. The name of a matrix (vector) holding initial values for the hyperparameters, or a simple
expression which defines a matrix using scalars, such as {1, 5}.
2. The keyword null, indicating that an automatic initialization should be used (as happens when
this argument is omitted).
3. An integer value (in numerical form), indicating how many hyperparameters should be used
(which again calls for automatic initialization).
The third of these forms is required if you want automatic initialization in the Almon polynomial
case, since we need to know how many terms you wish to include. (In the normalized exponential
Almon case we default to the usual two hyperparameters if theta is omitted or given as null.)
The midasreg syntax allows the user to specify multiple high-frequency predictors, if wanted: these
can have different lag specifications, different parameterizations and/or different frequencies.
The options accepted by midasreg include --quiet (suppress printed output), --verbose (show
detail of iterations, if applicable) and --robust (use a HAC estimator of the Newey–West type in
computing standard errors). Two additional specialized options are described below.
Examples of usage
Suppose we have a dependent variable named dy and a MIDAS list named dX, and we wish to run
a MIDAS regression using one lag of the dependent variable and high-frequency lags 1 to 10 of the
series in dX. The following will produce U-MIDAS estimates:
midasreg dy const dy(-1) ; mds(dX, 1, 10, 0)
The next lines will produce estimates for the normalized exponential Almon parameterization with
two coefficients, both initialized to zero:
midasreg dy const dy(-1) ; mds(dX, 1, 10, "nealmon", {0,0})
In the examples above, the required lags will be added to the dataset automatically then deleted
after use. If you are estimating several models using a single set of MIDAS lags it is more efficient
to create the lags once and use the mdsl specifier. For example, the following estimates three variant
parameterizations (exponential Almon, beta with zero last lag, and beta with non-zero last lag) on
the same data:
list dXL = hflags(1, 10, dX)
midasreg dy 0 dy(-1) ; mdsl(dXL, "nealmon", {0,0})
midasreg dy 0 dy(-1) ; mdsl(dXL, "beta0", {1,5})
midasreg dy 0 dy(-1) ; mdsl(dXL, "betan", {1,1,0})
Any additional MIDAS terms should be separated by spaces, as in
midasreg dy const dy(-1) ; mds(dX,1,9,1,theta1) mds(Z,1,6,3,theta2)
Replication exercise
We give a substantive illustration of midasreg in Listing 41.1. This replicates the first practical
example discussed by Ghysels in the user’s guide titled MIDAS Matlab Toolbox.3 The dependent
variable is the quarterly log-difference of real GDP, named dy in our script. The independent variables
are the first lag of dy and monthly lags 3 to 11 of the monthly log-difference of non-farm payroll
employment (named dXL in our script). Therefore, in this case equation (41.2) becomes
$$y_t = \alpha + \beta y_{t-1} + \gamma\, W(x_{\tau-3}, x_{\tau-4}, \ldots, x_{\tau-11}; \theta) + \varepsilon_t$$
and in the U-MIDAS case the model comes down to
$$y_t = \alpha + \beta y_{t-1} + \sum_{i=1}^{9} \delta_i x_{\tau-i-2} + \varepsilon_t$$
The script exercises all five of the parameterizations mentioned above,4 and in each case the results of 9
pseudo-out-of-sample forecasts are recorded so that their Root Mean Square Errors can be compared.
3See Ghysels (2015). This document announces itself as Version 2.0 of the guide and is dated November 1, 2015.
The example we’re looking at appears on pages 24–26; the associated Matlab code can be found in the program
appADLMIDAS1.m.
4The Matlab program includes an additional parameterization not supported by gretl, namely a step-function.
Listing 41.1: Script to replicate results given by Ghysels [Download ]
set verbose off
open gdp_midas.gdt --quiet
# form the dependent variable
series dy = 100 * ldiff(qgdp)
# form list of high-frequency lagged log differences
list X = payems*
list dXL = hflags(3, 11, hfldiff(X, 100))
# initialize matrix to collect forecasts
matrix FC = {}
# estimation sample
smpl 1985:1 2009:1
print "=== unrestricted MIDAS (umidas) ==="
midasreg dy 0 dy(-1) ; mdsl(dXL, 0)
fcast --out-of-sample --static --quiet
FC ~= $fcast
print "=== normalized beta with zero last lag (beta0) ==="
midasreg dy 0 dy(-1) ; mdsl(dXL, 2, {1,5})
fcast --out-of-sample --static --quiet
FC ~= $fcast
print "=== normalized beta, non-zero last lag (betan) ==="
midasreg dy 0 dy(-1) ; mdsl(dXL, 3, {1,1,0})
fcast --out-of-sample --static --quiet
FC ~= $fcast
print "=== normalized exponential Almon (nealmon) ==="
midasreg dy 0 dy(-1) ; mdsl(dXL, 1, {0,0})
fcast --out-of-sample --static --quiet
FC ~= $fcast
print "=== Almon polynomial (almonp) ==="
midasreg dy 0 dy(-1) ; mdsl(dXL, 4, 4)
fcast --out-of-sample --static --quiet
FC ~= $fcast
smpl 2009:2 2011:2
matrix my = {dy}
print "Forecast RMSEs:"
printf " umidas %.4f\n", fcstats(my, FC[,1])[2]
printf " beta0 %.4f\n", fcstats(my, FC[,2])[2]
printf " betan %.4f\n", fcstats(my, FC[,3])[2]
printf " nealmon %.4f\n", fcstats(my, FC[,4])[2]
printf " almonp %.4f\n", fcstats(my, FC[,5])[2]
Listing 41.2: Replication of Ghysels’ results, partial output
=== normalized beta, non-zero last lag (betan) ===
Model 3: MIDAS (NLS), using observations 1985:1-2009:1 (T = 97)
Using L-BFGS-B with conditional OLS
Dependent variable: dy
estimate std. error t-ratio p-value
-------------------------------------------------------
const 0.748578 0.146404 5.113 1.74e-06 ***
dy_1 0.248055 0.118903 2.086 0.0398 **
MIDAS list dXL, high-frequency lags 3 to 11
HF_slope 1.72167 0.582076 2.958 0.0039 ***
Beta1 0.998501 0.0269479 37.05 1.10e-56 ***
Beta2 2.95148 2.93404 1.006 0.3171
Beta3 -0.0743143 0.0271273 -2.739 0.0074 ***
Sum squared resid 28.78262 S.E. of regression 0.562399
R-squared 0.356376 Adjusted R-squared 0.321012
Log-likelihood -78.71248 Akaike criterion 169.4250
Schwarz criterion 184.8732 Hannan-Quinn 175.6715
=== Almon polynomial (almonp) ===
Model 5: MIDAS (NLS), using observations 1985:1-2009:1 (T = 97)
Using Levenberg-Marquardt algorithm
Dependent variable: dy
estimate std. error t-ratio p-value
-------------------------------------------------------
const 0.741403 0.146433 5.063 2.14e-06 ***
dy_1 0.255099 0.119139 2.141 0.0349 **
MIDAS list dXL, high-frequency lags 3 to 11
Almon0 1.06035 1.53491 0.6908 0.4914
Almon1 0.193615 1.30812 0.1480 0.8827
Almon2 -0.140466 0.299446 -0.4691 0.6401
Almon3 0.0116034 0.0198686 0.5840 0.5607
Sum squared resid 28.66623 S.E. of regression 0.561261
R-squared 0.358979 Adjusted R-squared 0.323758
Log-likelihood -78.51596 Akaike criterion 169.0319
Schwarz criterion 184.4802 Hannan-Quinn 175.2784
Forecast RMSEs:
umidas 0.5424
beta0 0.5650
betan 0.5210
nealmon 0.5642
almonp 0.5329
The data file used in the replication, gdp_midas.gdt, was constructed as described in section 20.1 (and
as noted there, it is included in the current gretl package). Part of the output from the replication
script is shown in Listing 41.2. The γcoefficient is labeled HF_slope in the gretl output.
For reference, output from Matlab (version R2016a for Linux) is available at http://gretl.sourceforge.
net/midas/matlab_output.txt. For the most part (in respect of regression coefficients and auxiliary
statistics such as R² and forecast RMSEs), gretl’s output agrees with that of Matlab to the extent
that one can reasonably expect on nonlinear problems—that is, to at least 4 significant digits in all
but a few instances.5 Standard errors are not quite so close across the two programs, particularly for
the hyperparameters of the beta and exponential Almon functions. We show these in Table 41.2.
            2-param beta        3-param beta        Exp Almon
            Matlab   gretl      Matlab   gretl      Matlab   gretl
const        0.135   0.140       0.143   0.146       0.135   0.140
dy(-1)       0.116   0.118       0.116   0.119       0.116   0.119
HF slope     0.559   0.575       0.566   0.582       0.562   0.575
θ1           0.067   0.106       0.022   0.027       2.695   6.263
θ2           9.662  17.140       1.884   2.934       0.586   1.655
θ3                               0.022   0.027
Table 41.2: Comparison of standard errors from MIDAS regressions
Differences of this order are not unexpected, however, when different methods are used to calculate
the covariance matrix for a nonlinear regression. The Matlab standard errors are based on a numerical
approximation to the Hessian at convergence, while those produced by gretl are based on a Gauss–
Newton Regression, as discussed and recommended in Davidson and MacKinnon (2004, chapter 6).
Underlying methods
The midasreg command calls one of several possible estimation methods in the background, de-
pending on the MIDAS specification(s). As shown in Listing 41.2, this is flagged in a line of output
immediately preceding the Dependent variable line. If the only specification type is U-MIDAS,
the method is OLS. Otherwise it is one of three variants of Nonlinear Least Squares.
• Levenberg–Marquardt. This is the back-end for gretl’s nls command.
• L-BFGS-B with conditional OLS. L-BFGS is a “limited memory” version of the BFGS optimizer
and the trailing “-B” means that it supports bounds on the parameters, which is useful for reasons
given below.
• Golden Section search with conditional OLS. This is a line search method, used only when there
is just a single hyperparameter to estimate.
Levenberg–Marquardt is the default NLS method, but if the MIDAS specifications include any of the
beta variants or normalized exponential Almon we switch to L-BFGS-B, unless the user gives the
--levenberg option. The ability to set bounds on the hyperparameters via L-BFGS-B is helpful,
first because the beta parameters (other than the third one, if applicable) must be non-negative but
also because one is liable to run into numerical problems (in calculating the weights and/or gradient)
if their values become too extreme. For example, we have found it useful to place bounds of −2 and
+2 on the exponential Almon parameters.
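By way of illustration, something like the following would insist on Levenberg–Marquardt for a beta
specification. This is only a sketch: the mdsl() term and the type code 2 for the zero-last-lag beta are
assumptions here, and dy, dy(-1) and dXL are as in the replication script; check the midasreg
documentation for the exact specifier syntax.

# sketch: override the default switch to L-BFGS-B for a beta specification
midasreg dy 0 dy(-1) ; mdsl(dXL, 2) --levenberg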
Here’s what we mean by “conditional OLS” in the context of L-BFGS-B and line search: the search
algorithm itself is only responsible for optimizing the MIDAS hyperparameters, and when the algo-
rithm calls for calculation of the sum of squared residuals given a certain hyperparameter vector we
optimize the remaining parameters (coefficients on base-frequency regressors, slopes with respect to
MIDAS terms) via OLS.
⁵ Nonlinear results, even for a given software package, are subject to slight variation depending on the compiler used
and the exact versions of supporting numerical libraries.
Testing for a structural break
The --breaktest option can be used to carry out the Quandt Likelihood Ratio (QLR) test for a
structural break at the stage of running the final Gauss–Newton regression (to check for convergence
and calculate the covariance matrix of the parameter estimates). This can be a useful aid to diagnosis,
since non-homogeneity of the data over the estimation period can lead to numerical problems in
nonlinear estimation, besides compromising the forecasting capacity of the resulting equation. For
example, when this option is given with the command to estimate the betan model shown in
Listing 41.2, the following result is appended to the standard output:
QLR test for structural break -
Null hypothesis: no structural break
Test statistic: chi-square(6) = 35.1745 at observation 2005:2
with asymptotic p-value = 0.000127727
Despite the strong evidence for a structural break, in this case the nonlinear estimator appears to
converge successfully. But one might wonder if a shorter estimation period could provide better
out-of-sample forecasts.
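For reference, a call along the following lines would request the QLR test alongside the betan estimates.
Again this is a sketch: the mdsl() specifier and the type code 3 for the non-zero-last-lag beta are
assumptions, while dy, dy(-1) and dXL are as in the replication script.

# sketch: append the QLR break test to the betan estimation
midasreg dy 0 dy(-1) ; mdsl(dXL, 3) --breaktest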
Defining your own MIDAS estimator
As explained above, the midasreg command is in effect a “wrapper” for various underlying methods.
Some users may wish to undo the wrapping. (This would be required if you wish to introduce any
nonlinearity other than that associated with the stock MIDAS parameterizations, or to define your
own MIDAS parameterization).
Anyone with ambitions in this direction will presumably be quite familiar with the commands and
functions available in hansl, gretl’s scripting language, so we will not say much here beyond presenting
a couple of examples. First we show how the nls command can be used, along with the MIDAS-related
functions described in section 41.1, to estimate a model with the exponential Almon specification.
open gdp_midas.gdt --quiet
series dy = 100 * ldiff(qgdp)
series dy1 = dy(-1)
list X = payems*
list dXL = hflags(3, 11, hfldiff(X, 100))
smpl 1985:1 2009:1
# initialization via OLS
series mdX = mean(dXL)
ols dy 0 dy1 mdX --quiet
matrix b = $coeff | {0,0}’
scalar p = nelem(dXL)
# convenience matrix for computing gradient
matrix mdXL = {dXL}
# normalized exponential Almon via nls
nls dy = b[1] + b[2]*dy1 + b[3]*mdx
series mdx = mlincomb(dXL, b[4:], 1)
matrix grad = mgradient(p, b[4:], 1)
deriv b = {const, dy1, mdx} ~ (b[3] * mdXL * grad)
param_names "const dy(-1) HF_slope Almon1 Almon2"
end nls
Listing 41.3 presents a more ambitious example: we use GSSmin (Golden Section minimizer) to esti-
mate a MIDAS model with the “one-parameter beta” specification (that is, the two-parameter beta
with θ1 clamped at 1). Note that while the function named beta1_SSR is specialized to the given
parameterization, midas_GNR is a fairly general means of calculating the Gauss–Newton regression
for an ADL(1) MIDAS model, and it could be generalized further without much difficulty.
Listing 41.3: Manual MIDAS: one-parameter beta specification [Download ]
set verbose off
function scalar beta1_SSR (scalar th2, const series y,
const series x, list L)
matrix theta = {1, th2}
series mdx = mlincomb(L, theta, 2)
# run OLS conditional on theta
ols y 0 x mdx --quiet
return $ess
end function
function matrix midas_GNR (const matrix theta, const series y,
const series x, list L, int type)
# Gauss-Newton regression
series mdx = mlincomb(L, theta, type)
ols y 0 x mdx --quiet
matrix b = $coeff
matrix u = {$uhat}
matrix mgrad = mgradient(nelem(L), theta, type)
matrix M = {const, x, mdx} ~ (b[3] * {L} * mgrad)
matrix V
set svd on # in case of strong collinearity
mols(u, M, null, &V)
return (b | theta) ~ sqrt(diag(V))
end function
/* main */
open gdp_midas.gdt --quiet
series dy = 100 * ldiff(qgdp)
series dy1 = dy(-1)
list dX = ld_payem*
list dXL = hflags(3, 11, dX)
# estimation sample
smpl 1985:1 2009:1
matrix b = {0, 1.01, 100}
# use Golden Section minimizer
SSR = GSSmin(b, beta1_SSR(b[1], dy, dy1, dXL), 1.0e-6)
printf "SSR (GSS) = %.15g\n", SSR
matrix theta = {1, b[1]}’ # column vector needed
matrix bse = midas_GNR(theta, dy, dy1, dXL, 2)
bse[4,2] = $nan # mask std error of clamped coefficient
modprint bse "const dy(-1) HF_slope Beta1 Beta2"
Plot of coefficients
At times, it may be useful to plot the “gross” coefficients on the lags of the high-frequency series
in a MIDAS regression—that is, the normalized weights multiplied by the HF_slope coefficient (the
γ in 41.2). After estimation of a MIDAS model in the gretl GUI this is available via the item
MIDAS coefficients under the Graphs menu in the model window. It is also easily generated via script,
since the $model bundle that becomes available following the midasreg command contains a matrix,
midas_coeffs, holding these coefficients. So the following is sufficient to display the plot:
matrix m = $model.midas_coeffs
plot m
options with-lp fit=none
literal set title "MIDAS coefficients"
literal set ylabel ’’
end plot --output=display
Caveat: this feature is at present available only for models with a single MIDAS specification.
41.3 Parameterization functions
Here we give some more detail of the MIDAS parameterizations supported by gretl.
In general the normalized coefficient or weight i (i = 1, . . . , p) is given by

$$ w_i = \frac{f(i,\theta)}{\sum_{k=1}^{p} f(k,\theta)} \qquad (41.3) $$

such that the coefficients sum to unity.
In the normalized exponential Almon case with m parameters the function f(·) is

$$ f(i,\theta) = \exp\!\left(\sum_{j=1}^{m} \theta_j i^j\right) \qquad (41.4) $$
So in the usual two-parameter case we have

$$ w_i = \frac{\exp(\theta_1 i + \theta_2 i^2)}{\sum_{k=1}^{p} \exp(\theta_1 k + \theta_2 k^2)} $$

and equal weighting is obtained when θ1 = θ2 = 0.
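This property is easy to check numerically via the mweights function; here is a minimal sketch, in
which the type code 1 for the normalized exponential Almon is an assumption.

# sketch: with theta1 = theta2 = 0 each of the p weights should equal 1/p
matrix w = mweights(8, {0, 0}, 1)
print w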
In the standard, two-parameter normalized beta case we have

$$ f(i,\theta) = (i^-/p^-)^{\theta_1 - 1} \cdot (1 - i^-/p^-)^{\theta_2 - 1} \qquad (41.5) $$

where p⁻ = p − 1, and i⁻ = i − 1 except at the end-points, i = 1 and i = p, where we add and subtract,
respectively, machine epsilon to avoid numerical problems. This formulation constrains the coefficient
on the last lag to be zero—provided that the weights are declining at higher lags, a condition that is
ensured if θ2 is greater than θ1 by a sufficient margin. The special case of θ1 = θ2 = 1 yields equal
weights at all lags. A third parameter can be used to allow a non-zero final weight, even in the case
of declining weights. Let w_i denote the normalized weight obtained by using (41.5) in (41.3). Then
the modified variant with additional parameter θ3 can be written as

$$ w_i^{(3)} = \frac{w_i + \theta_3}{1 + p\theta_3} $$

That is, we add θ3 to each weight then renormalize so that the $w_i^{(3)}$ values again sum to unity.
In Eric Ghysels’ Matlab code the two beta variants are labeled “normalized beta density with a zero
last lag” and “normalized beta density with a non-zero last lag” respectively. Note that while the two
basic beta parameters must be positive, the third additive parameter may be positive, negative or
zero.
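The following sketch contrasts the two variants numerically; the type codes 2 and 3 for the zero and
non-zero last-lag cases are assumptions here.

# sketch: two-parameter beta (last weight zero) versus the three-parameter
# variant with a small additive third parameter (last weight non-zero)
matrix w2 = mweights(10, {1, 5}, 2)
matrix w3 = mweights(10, {1, 5, 0.05}, 3)
print w2 w3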
In the case of the plain Almon polynomial of order m, coefficient i is given by

$$ w_i = \sum_{j=1}^{m} \theta_j i^{j-1} $$

Note that no normalization is applied in this case, so no additional coefficient should be placed before
the MIDAS lags term in the context of a regression.
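A quick numerical illustration (the type code 4 for the plain Almon polynomial is an assumption here):

# sketch: plain Almon weights are not normalized, so they need not sum to 1
matrix w = mweights(9, {1, 0.5, -0.05}, 4)
eval sumc(vec(w))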
Analytical gradients
Here we set out the expressions for the analytical gradients produced by the mgradient function, and
also used internally by the midasreg command. In these expressions f(i, θ) should be understood as
referring back to the specific forms noted above for the exponential Almon and beta distributions.
The summation $\sum_k$ should be understood as running from 1 to p.

For the normalized exponential Almon case, the gradient is

$$ \frac{dw_i}{d\theta_j}
 = \frac{f(i,\theta)\, i^j}{\sum_k f(k,\theta)}
 - \frac{f(i,\theta)}{\left[\sum_k f(k,\theta)\right]^2} \sum_k f(k,\theta)\, k^j
 = w_i \left( i^j - \frac{\sum_k f(k,\theta)\, k^j}{\sum_k f(k,\theta)} \right) $$

For the two-parameter normalized beta case it is

$$ \frac{dw_i}{d\theta_1}
 = \frac{f(i,\theta)\log(i^-/p^-)}{\sum_k f(k,\theta)}
 - \frac{f(i,\theta)}{\left[\sum_k f(k,\theta)\right]^2} \sum_k f(k,\theta)\log(k^-/p^-)
 = w_i \left( \log(i^-/p^-) - \frac{\sum_k \left[f(k,\theta)\log(k^-/p^-)\right]}{\sum_k f(k,\theta)} \right) $$

$$ \frac{dw_i}{d\theta_2}
 = \frac{f(i,\theta)\log(1 - i^-/p^-)}{\sum_k f(k,\theta)}
 - \frac{f(i,\theta)}{\left[\sum_k f(k,\theta)\right]^2} \sum_k f(k,\theta)\log(1 - k^-/p^-)
 = w_i \left( \log(1 - i^-/p^-) - \frac{\sum_k \left[f(k,\theta)\log(1 - k^-/p^-)\right]}{\sum_k f(k,\theta)} \right) $$

And for the three-parameter beta, we have

$$ \frac{dw_i^{(3)}}{d\theta_1} = \frac{1}{1 + p\theta_3}\,\frac{dw_i}{d\theta_1}, \qquad
   \frac{dw_i^{(3)}}{d\theta_2} = \frac{1}{1 + p\theta_3}\,\frac{dw_i}{d\theta_2}, \qquad
   \frac{dw_i^{(3)}}{d\theta_3} = \frac{1}{1 + p\theta_3} - \frac{p\,(w_i + \theta_3)}{(1 + p\theta_3)^2} $$

For the (non-normalized) Almon polynomial the gradient is simply

$$ \frac{dw_i}{d\theta_j} = i^{j-1} $$
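Since these are the derivatives returned by mgradient, they can be verified against a finite-difference
approximation based on mweights, as in the following sketch (the type code 1 for the normalized
exponential Almon is an assumption):

# sketch: compare the first column of the analytical gradient with a
# forward-difference approximation with respect to theta_1
scalar p = 9
matrix theta = {0.5, -0.05}
matrix G = mgradient(p, theta, 1)
scalar h = 1.0e-6
matrix Gnum = (mweights(p, theta + {h, 0}, 1) - mweights(p, theta, 1)) / h
print G Gnum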
Part III
Technical details
Chapter 42
Gretl and ODBC
Gretl provides a method for retrieving data from databases which support the Open Database Con-
nectivity (ODBC) standard. Most users won’t be interested in this, but there may be some for whom
this feature matters a lot—typically, those who work in an environment where huge data collections
are accessible via a Data Base Management System (DBMS).
In the following section we explain what is needed for ODBC support in gretl. We provide some
background information on how ODBC works in section 42.2, and explain the details of getting gretl
to retrieve data from a database in section 42.3. Section 42.4 provides some examples of usage, and
section 42.5 gives some details on the management of ODBC connections.
42.1 ODBC support
The piece of software that bridges between gretl and the ODBC system is a dynamically loaded
“plugin”. This is included in the gretl packages for MS Windows and Mac OS X. On other unix-type
platforms (notably Linux) you may have to build gretl from source to get ODBC support. This is
because the plugin depends on having unixODBC installed, which we cannot assume to be the case
on typical Linux systems. To enable the ODBC plugin when building gretl, you must pass the option
--with-odbc to gretl’s configure script. In addition, if unixODBC is installed in a non-standard
location you will have to specify its installation prefix using --with-ODBC-prefix, as in (for example)
./configure --with-odbc --with-ODBC-prefix=/opt/ODBC
42.2 ODBC base concepts
ODBC is short for Open DataBase Connectivity, a group of software methods that enable a client to
interact with a database server. The most common operation is when the client fetches some data
from the server. ODBC acts as an intermediate layer between client and server, so the client “talks”
to ODBC rather than accessing the server directly (see Figure 42.1).
Figure 42.1: Retrieving data via ODBC
For the above mechanism to work, it is necessary that the relevant ODBC software is installed and
working on the client machine (contact your DB administrator for details). At this point, the database
(or databases) that the server provides will be accessible to the client as a data source with a specific
identifier (a Data Source Name or DSN); in most cases, a username and a password are required to
connect to the data source.
Once the connection is established, the user sends a query to ODBC, which contacts the database
manager, collects the results and sends them back to the user. The query is almost invariably
formulated in a special language used for the purpose, namely SQL.¹ We will not provide here an
SQL tutorial: there are many such tutorials on the Net; besides, each database manager tends to
¹ See http://en.wikipedia.org/wiki/SQL.
support its own SQL dialect so the precise form of an SQL query may vary slightly if the DBMS on
the other end is Oracle, MySQL, PostgreSQL or something else.
Suffice it to say that the main statement for retrieving data is the SELECT statement. Within a DBMS,
data are organized in tables, which are roughly equivalent to spreadsheets. The SELECT statement
returns a subset of a table, which is itself a table. For example, imagine that the database holds a
table called “NatAccounts”, containing the data shown in Table 42.1.
year qtr gdp consump tradebal
1970 1 584763 344746.9 5891.01
1970 2 597746 350176.9 7068.71
1970 3 604270 355249.7 8379.27
1970 4 609706 361794.7 7917.61
1971 1 609597 362490 6274.3
1971 2 617002 368313.6 6658.76
1971 3 625536 372605 4795.89
1971 4 630047 377033.9 6498.13
Table 42.1: The “NatAccounts” table
The SQL statement
SELECT qtr, tradebal, gdp FROM NatAccounts WHERE year=1970;
produces the subset of the original data shown in Table 42.2.
qtr  tradebal  gdp
1    5891.01   584763
2    7068.71   597746
3    8379.27   604270
4    7917.61   609706
Table 42.2: Result of a SELECT statement
Gretl provides a mechanism for forwarding your query to the DBMS via ODBC and including the
results in your currently open dataset.
42.3 Syntax
At present we do not offer a graphical interface for ODBC import; this must be done via the command
line interface. The two commands used for fetching data via an ODBC connection are open and data.
The open command is used for connecting to a DBMS: its syntax is
open dsn=database [user=username ] [password=password ] --odbc
The user and password items are optional; the effect of this command is to initiate an ODBC
connection. It is assumed that the machine gretl runs on has a working ODBC client installed.
In order to actually retrieve the data, the data command is used. Its syntax is:
data series [obs-format=format-string ] query=query-string --odbc
where:
series is a list of names of gretl series to contain the incoming data, separated by spaces. Note that
these series need not exist prior to the ODBC import.
format-string is an optional parameter, used to handle cases when a “rectangular” organisation of
the database cannot be assumed (more on this later);
query-string is a string containing the SQL statement used to extract the data.
There should be no spaces around the equals signs in the obs-format and query fields in the data
command.
The query-string can, in principle, contain any valid SQL statement which results in a table. This
string may be specified directly within the command, as in
data x query="SELECT foo FROM bar" --odbc
which will store into the gretl variable x the content of the column foo from the table bar. However,
since in a real-life situation the string containing the SQL statement may be rather long, it may be
best to store it in a string variable. For example:
string SqlQry = "SELECT foo1, foo2 FROM bar"
data x y query=SqlQry --odbc
The observation format specifier
If the optional parameter obs-format is absent, as in the above example, the SQL query should
return k columns of data, where k is the number of series names listed in the data command. It may
be necessary to include a smpl command before the data command to set up the right “window” for
the incoming data. In addition, if one cannot assume that the data will be delivered in the correct
order (typically, chronological order), the SQL query should contain an appropriate ORDER BY clause.
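For instance, using the fictitious “NatAccounts” table shown above, and assuming that a quarterly
dataset is in place and an ODBC connection has already been opened, a minimal sketch of this pattern
might be:

# sketch: set the window first, then fetch the series in chronological order
smpl 1970:1 1971:4
string Qry = "SELECT gdp FROM NatAccounts ORDER BY year, qtr"
data gdp query=Qry --odbc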
The optional format string is used for those cases when there is no certainty that the data from the
query will arrive in the same order as the gretl dataset. This may happen when missing values are
interspersed within a column, or with data that do not have a natural ordering, e.g. cross-sectional
data. In this case, the SQL statement should return a table with m + k columns, where the first m
columns are used to identify the observation or row in the gretl dataset into which the actual data
values in the final k columns should be placed. The obs-format string is used to translate the first m
fields into a string which matches the string gretl uses to identify observations in the currently open
dataset. Up to three columns can be used for this purpose (m ≤ 3).
Note that the strings gretl uses to identify observations can be seen by printing any variable “by
observation”, as in
print index --byobs
(The series named index is automatically added to a dataset created via the nulldata command.)
The format specifiers available for use with obs-format are as follows:
%d print an integer value
%s print a string value
%g print a floating-point value
In addition the format can include literal characters to be passed through, such as slashes or colons,
to make the resulting string compatible with gretl’s observation identifiers.
For example, consider the following fictitious case: we have a 5-days-per-week dataset, to which we
want to add the stock index for the Verdurian market;² it so happens that in Verduria Saturdays are
working days but Wednesdays are not. We want a column which does not contain data on Saturdays,
because we wouldn’t know where to put them, but at the same time we want to place missing values
on all the Wednesdays.
In this case, the following syntax could be used
string QRY="SELECT year,month,day,VerdSE FROM AlmeaIndexes"
data y obs-format="%d-%d-%d" query=QRY --odbc
² See http://www.almeopedia.com/index.php/Verduria.
The column VerdSE holds the data to be fetched, which will go into the gretl series y. The first three
columns are used to construct a string which identifies the day. Daily dates take the form YYYY-MM-DD
in gretl. If a row from the DBMS produces the observation string 2008-04-01 this will match OK
(it’s a Tuesday), but 2008-04-05 will not match since it is a Saturday; the corresponding row will
therefore be discarded. On the other hand, since no string 2008-04-23 will be found in the data
coming from the DBMS (it’s a Wednesday), that entry is left blank in our series y.
42.4 Examples
Table Consump:
    Field      Type
    time       decimal(7,2)
    income     decimal(16,6)
    consump    decimal(16,6)

Table DATA:
    Field      Type
    year       decimal(4,0)
    qtr        decimal(1,0)
    varname    varchar(16)
    xval       decimal(20,10)

Table 42.3: Example AWM database structure
Table Consump:
    1970.00   424278.975500   344746.944000
    1970.25   433218.709400   350176.890400
    1970.50   440954.219100   355249.672300
    1970.75   446278.664700   361794.719900
    1971.00   447752.681800   362489.970500
    1971.25   453553.860100   368313.558500
    1971.50   460115.133100   372605.015300
    ...

Table DATA:
    1970  1  CAN      517.9085000000
    1970  2  CAN      662.5996000000
    1970  3  CAN     1130.4155000000
    1970  4  CAN      467.2508000000
    1970  1  COMPR     18.4000000000
    1970  2  COMPR     18.6341000000
    1970  3  COMPR     18.3000000000
    1970  4  COMPR     18.2663000000
    1970  1  D1         1.0000000000
    1970  2  D1         0.0000000000
    ...

Table 42.4: Example AWM database data
In the following examples, we will assume that access is available to a database known to ODBC with
the data source name “AWM”, with username “Otto” and password “Bingo”. The database “AWM”
contains quarterly data in two tables (see Tables 42.3 and 42.4):
The table Consump is the classic “rectangular” dataset; that is, its internal organization is the same
as in a spreadsheet or econometrics package: each row is a data point and each column is a variable.
The structure of the DATA table is different: each record is one figure, stored in the column xval, and
the other fields keep track of which variable it belongs to, for which date.
Listing 42.1: Simple query from a rectangular table
nulldata 160
setobs 4 1970:1 --time
open dsn=AWM user=Otto password=Bingo --odbc
string Qry = "SELECT consump, income FROM Consump"
data cons inc query=Qry --odbc
Listing 42.1 shows a query for two series: first we set up an empty quarterly dataset. Then we connect
to the database using the open statement. Once the connection is established we retrieve two columns
from the Consump table. No observation string is required because the data already have a suitable
structure; we need only import the relevant columns.
Listing 42.2: Simple query from a non-rectangular table
string S = "select year, qtr, xval from DATA \
where varname='WLN' ORDER BY year, qtr"
data wln obs-format="%d:%d" query=S --odbc
In example 42.2, by contrast, we make use of the observation string since we are drawing from the
DATA table, which is not rectangular. The SQL statement stored in the string S produces a table with
three columns. The ORDER BY clause ensures that the rows will be in chronological order, although
this is not strictly necessary in this case.
42.5 Connectivity details
It may be helpful to supply some details on gretl’s management of ODBC connections. First, when
the open command is invoked with the --odbc option, gretl checks to see if a connection to the
specified DSN (Data Source Name) can be established via the ODBC function SQLConnect. If not,
an error is flagged; if so, the connection is dropped (SQLDisconnect) but the DSN details are stored.
The stored DSN then remains the implicit source for subsequent invocation of the data command,
with the --odbc option, until a countermanding open command is issued.
Each time an ODBC-related data command is issued, gretl attempts to re-establish a connection to
the given DSN; the connection is dropped once the data transfer is complete.
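So, for example, a single open suffices for any number of subsequent data commands; a sketch using
the AWM example above:

open dsn=AWM user=Otto password=Bingo --odbc
# both data commands below re-use the stored DSN details
data cons query="SELECT consump FROM Consump" --odbc
data inc query="SELECT income FROM Consump" --odbc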
Listing 42.3: Handling of missing values for a non-rectangular table
string foo = "select year, qtr, xval from DATA \
where varname='STN' AND qtr>1"
data bar obs-format="%d:%d" query=foo --odbc
print bar --byobs
Listing 42.3 shows what happens if the rows in the outcome from the SELECT statement do not match
the observations in the currently open gretl dataset. The query includes a condition which filters out
all the data from the first quarter. The query result (invisible to the user) would be something like
+------+------+---------------+
| year | qtr | xval |
+------+------+---------------+
| 1970 | 2 | 7.8705000000 |
| 1970 | 3 | 7.5600000000 |
| 1970 | 4 | 7.1892000000 |
| 1971 | 2 | 5.8679000000 |
| 1971 | 3 | 6.2442000000 |
| 1971 | 4 | 5.9811000000 |
| 1972 | 2 | 4.6883000000 |
| 1972 | 3 | 4.6302000000 |
...
Internally, gretl fills the variable bar with the corresponding value if it finds a match; otherwise, NA
is used. Printing out the variable bar thus produces
Obs bar
1970:1
1970:2 7.8705
1970:3 7.5600
1970:4 7.1892
1971:1
1971:2 5.8679
1971:3 6.2442
1971:4 5.9811
1972:1
1972:2 4.6883
1972:3 4.6302
...
Chapter 43
Gretl and TeX
43.1 Introduction
TeX, initially developed by Donald Knuth of Stanford University and since enhanced by hundreds
of contributors around the world, is the gold standard of scientific typesetting. Gretl provides
various hooks that enable you to preview and print econometric results using the TeX engine, and to
save output in a form suitable for further processing with TeX.
This chapter explains the finer points of gretl’s TeX-related functionality. The next section describes
the relevant menu items; section 43.3 discusses ways of fine-tuning TeX output; and section 43.4 gives
some pointers on installing (and learning) TeX if you do not already have it on your computer. (Just
to be clear: TeX is not included with the gretl distribution; it is a separate package, including several
programs and a large number of supporting files.)
Before proceeding, however, it may be useful to set out briefly the stages of production of a final
document using TeX. For the most part you don’t have to worry about these details, since, in regard
to previewing at any rate, gretl handles them for you. But having some grasp of what is going on
behind the scenes will enable you to understand your options better.
The first step is the creation of a plain text “source” file, containing the text or mathematics to be
typeset, interspersed with mark-up that defines how it should be formatted. The second step is to run
the source through a processing engine that does the actual formatting. Typically this is a program
called pdflatex that generates PDF output.¹ (In times gone by it was a program called latex that
generated so-called DVI (device-independent) output.)
So gretl calls pdflatex to process the source file. On MS Windows and Mac OS X, gretl expects the
operating system to find the default viewer for PDF output. On GNU/Linux you can specify your
preferred PDF viewer via the menu item “Tools, Preferences, General,” under the “Programs” tab.
43.2 TeX-related menu items
The model window
The fullest TeX support in gretl is found in the GUI model window. This has a menu item titled
“LaTeX” with sub-items “View”, “Copy”, “Save” and “Equation options” (see Figure 43.1).
Figure 43.1: LaTeX menu in model window
¹ Experts will be aware of something called “plain TeX”, which is processed using the program tex. The great
majority of TeX users, however, use the LaTeX macros, initially developed by Leslie Lamport. gretl does not support
plain TeX.
The first three sub-items have branches titled “Tabular” and “Equation”. By “Tabular” we mean that
the model is represented in the form of a table; this is the fullest and most explicit presentation of
the results. See Table 43.1 for an example; this was pasted into the manual after using the “Copy,
Tabular” item in gretl (a few lines were edited out for brevity).
Table 43.1: Example of LaTeX tabular output
Model 1: OLS estimates using the 51 observations 1–51
Dependent variable: ENROLL
Variable Coefficient Std. Error t-statistic p-value
const 0.241105 0.0660225 3.6519 0.0007
CATHOL 0.223530 0.0459701 4.8625 0.0000
PUPIL -0.00338200 0.00271962 -1.2436 0.2198
WHITE -0.152643 0.0407064 -3.7499 0.0005
Mean of dependent variable 0.0955686
S.D. of dependent variable 0.0522150
Sum of squared residuals 0.0709594
Standard error of residuals (σ̂) 0.0388558
Unadjusted R² 0.479466
Adjusted R̄² 0.446241
F(3, 47) 14.4306
The “Equation” option is fairly self-explanatory—the results are written across the page in equation
format, as below:

$$ \widehat{\mathrm{ENROLL}} = \underset{(0.066022)}{0.241105}
 + \underset{(0.04597)}{0.223530}\,\mathrm{CATHOL}
 - \underset{(0.0027196)}{0.00338200}\,\mathrm{PUPIL}
 - \underset{(0.040706)}{0.152643}\,\mathrm{WHITE} $$

$$ T = 51 \quad \bar{R}^2 = 0.4462 \quad F(3,47) = 14.431 \quad \hat{\sigma} = 0.038856 $$

(standard errors in parentheses)
The distinction between the “Copy” and “Save” options (for both tabular and equation) is twofold.
First, “Copy” puts the TeX source on the clipboard while with “Save” you are prompted for the name
of a file into which the source should be saved. Second, with “Copy” the material is copied as a
“fragment” while with “Save” it is written as a complete file. The point is that a well-formed TeX
source file must have a header that defines the documentclass (article, report, book or whatever)
and tags that say \begin{document} and \end{document}. This material is included when you do
“Save” but not when you do “Copy”, since in the latter case the expectation is that you will paste the
data into an existing TeX source file that already has the relevant apparatus in place.
The items under “Equation options” should be self-explanatory: when printing the model in equation
form, do you want standard errors or t-ratios displayed in parentheses under the parameter estimates?
The default is to show standard errors; if you want t-ratios, select that item.
Other windows
Several other sorts of output windows also have TeX preview, copy and save enabled. In the case of
windows having a graphical toolbar, look for the TeX button. Figure 43.2 shows this icon (second
from the right on the toolbar) along with the dialog that appears when you press the button.
One aspect of gretl’s TeX support that is likely to be particularly useful for publication purposes is
the ability to produce a typeset version of the “model table” (see section 3.4). An example of this is
shown in Table 43.2.
Figure 43.2: TeX icon and dialog
Table 43.2: Example of model table output

OLS estimates
Dependent variable: ENROLL

            Model 1       Model 2       Model 3
const        0.2907∗∗      0.2411∗∗      0.08557
            (0.07853)     (0.06602)     (0.05794)
CATHOL       0.2216∗∗      0.2235∗∗      0.2065∗∗
            (0.04584)     (0.04597)     (0.05160)
PUPIL       -0.003035     -0.003382     -0.001697
            (0.002727)    (0.002720)    (0.003025)
WHITE       -0.1482∗∗     -0.1526∗∗
            (0.04074)     (0.04071)
ADMEXP                                   0.1551
                                        (0.1342)
n              51            51            51
R̄²          0.4502        0.4462        0.2956
ℓ            96.09         95.36         88.69

Standard errors in parentheses
* indicates significance at the 10 percent level
** indicates significance at the 5 percent level
43.3 Fine-tuning typeset output
There are three aspects to this: adjusting the appearance of the output produced by gretl in LaTeX
preview mode; adjusting the formatting of gretl’s tabular output for models when using the tabprint
command; and incorporating gretl’s output into your own TeX files.
Previewing in the GUI
As regards preview mode, you can control the appearance of gretl’s output using a file named
gretlpre.tex, which should be placed in your gretl user directory (see the Gretl Command Reference).
If such a file is found, its contents will be used as the “preamble” to the TeX source. The
default value of the preamble is as follows:
\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{dcolumn,longtable}
\begin{document}
\thispagestyle{empty}
Note that the amsmath and dcolumn packages are required. (For some sorts of output the longtable
package is also needed.) Beyond that you can, for instance, change the type size or the font by altering
the documentclass declaration or including an alternative font package.
In addition, if you wish to typeset gretl output in more than one language, you can set up per-language
preamble files. A “localized” preamble file is identified by a name of the form gretlpre_xx.tex, where
xx is replaced by the first two letters of the current setting of the LANG environment variable. For
example, if you are running the program in Polish, using LANG=pl_PL, then gretl will do the following
when writing the preamble for a TeX source file.
1. Look for a file named gretlpre_pl.tex in the gretl user directory. If this is not found, then
2. look for a file named gretlpre.tex in the gretl user directory. If this is not found, then
3. use the default preamble.
Conversely, suppose you usually run gretl in a language other than English, and have a suitable
gretlpre.tex file in place for your native language. If on some occasions you want to produce TeX
output in English, then you could create an additional file gretlpre_en.tex: this file will be used
for the preamble when gretl is run with a language setting of, say, en_US.
Command-line options
After estimating a model via a script—or interactively via the gretl console or using the command-
line program gretlcli—you can use the commands tabprint or eqnprint to print the model to file in
tabular format or equation format respectively. These options are explained in the Gretl Command
Reference.
If you wish to alter the appearance of gretl’s tabular output for models in the context of the tabprint
command, you can specify a custom row format using the --format flag. The format string must be
enclosed in double quotes and must be tied to the flag with an equals sign. The pattern for the format
string is as follows. There are four fields, representing the coefficient, standard error, t-ratio and p-
value respectively. These fields should be separated by vertical bars; they may contain a printf-type
specification for the formatting of the numeric value in question, or may be left blank to suppress the
printing of that column (subject to the constraint that you can’t leave all the columns blank). Here
are a few examples:
--format="%.4f|%.4f|%.4f|%.4f"
--format="%.4f|%.4f|%.3f|"
--format="%.5f|%.4f||%.4f"
--format="%.8g|%.8g||%.4f"
The first of these specifications prints the values in all columns using 4 decimal places. The second
suppresses the p-value and prints the t-ratio to 3 places. The third omits the t-ratio. The last one
again omits the t, and prints both coefficient and standard error to 8 significant figures.
Once you set a custom format in this way, it is remembered and used for the duration of the gretl
session. To revert to the default formatting you can use the special variant --format=default.
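For instance, after estimating a model in a script one might do something like the following. This is
only a sketch: the --output option for directing the result to a named file is an assumption here (see
the Gretl Command Reference for the full set of options), and the ENROLL regression is the one used
in the examples above.

# sketch: print the model with a custom row format, suppressing the p-values
ols ENROLL 0 CATHOL PUPIL WHITE
tabprint --format="%.4f|%.4f|%.3f|" --output=mymodel.tex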
Further editing
Once you have pasted gretl’s TeX output into your own document, or saved it to file and opened it in an
editor, you can of course modify the material in any way you wish. In some cases, machine-generated
TeX is hard to understand, but gretl’s output is intended to be human-readable and -editable. In
addition, it does not use any non-standard style packages. Besides the standard LaTeX document
classes, the only files needed are, as noted above, the amsmath, dcolumn and longtable packages.
These should be included in any reasonably full TeX implementation.
43.4 Installing and learning TeX
This is not the place for a detailed exposition of these matters, but here are a few pointers.
So far as we know, every GNU/Linux distribution has a package or set of packages for TeX, and in
fact these are likely to be installed by default. Check the documentation for your distribution. For
MS Windows, several packaged versions of TeX are available: one of the most popular is MiKTeX at
http://www.miktex.org/. For Mac OS X a nice implementation is iTeXMac, at http://itexmac.sourceforge.net/.
An essential starting point for online TeX resources is the Comprehensive TeX
Archive Network (CTAN) at http://www.ctan.org/.
As for learning TeX, many useful resources are available both online and in print. Among online
guides, Tony Roberts’ “LaTeX: from quick and dirty to style and finesse” is very helpful, at
http://www.sci.usq.edu.au/staff/robertsa/LaTeX/latexintro.html
An excellent source for advanced material is The LaTeX Companion (Goossens et al., 2004).
Chapter 44
Gretl and R
44.1 Introduction
R is, by far, the largest free statistical project.¹ Like gretl, it is a GNU project and the two have a lot
in common; however, gretl’s approach focuses on ease of use much more than R, which instead aims
to encompass the widest possible range of statistical procedures.
As is natural in the free software ecosystem, we don’t view ourselves as competitors to R,² but rather
as projects sharing a common goal who should support each other whenever possible. For this reason,
gretl provides a way to interact with R and thus enable users to pool the capabilities of the two
packages.
In this chapter, we will explain how to exploit R’s power from within gretl. We assume that the
reader has a working installation of R available and a basic grasp of R’s syntax.³
Despite several valiant attempts, no graphical shell has gained wide acceptance in the R community:
by and large, the standard method of working with R is by writing scripts, or by typing commands at
the R prompt, much in the same way as one would write gretl scripts or work with the gretl console.
In this chapter, the focus will be on the methods available to execute R commands without leaving
gretl.
44.2 Starting an interactive R session
The easiest way to use R from gretl is in interactive mode. Once you have your data loaded in gretl,
you can select the menu item “Tools, Start GNU R” and an interactive R session will be started, with
your dataset automatically pre-loaded.
A simple example: OLS on cross-section data
For this example we use Ramanathan’s dataset data4-1, one of the sample files supplied with gretl.
We first run, in gretl, an OLS regression of price on sqft, bedrms and baths. The basic results are
shown in Table 44.1.
Table 44.1: OLS house price regression via gretl
Variable Coefficient Std. Error t-statistic p-value
const 129.062 88.3033 1.4616 0.1746
sqft 0.154800 0.0319404 4.8465 0.0007
bedrms 21.587 27.0293 0.7987 0.4430
baths 12.192 43.2500 0.2819 0.7838
We will now replicate the above results using R. Select the menu item “Tools, Start GNU R”. A
window similar to the one shown in Figure 44.1 should appear.
The actual look of the R window may be somewhat different from what you see in Figure 44.1
(especially for Windows users), but this is immaterial. The important point is that you have a
¹ R’s homepage is at http://www.r-project.org/.
² OK, who are we kidding? But it’s friendly competition!
³ The main reference for R documentation is http://cran.r-project.org/manuals.html. In addition, R tutorials
abound on the Net; as always, Google is your friend.
Figure 44.1: R window
window where you can type commands to R. If the above procedure doesn’t work and no R window
opens, it means that gretl was unable to launch R. You should ensure that R is installed and working
on your system and that gretl knows where it is. The relevant settings can be found by selecting the
“Tools, Preferences, General” menu entry, under the “Programs” tab.
Assuming R was launched successfully, you will see notification that the data from gretl are available.
In the background, gretl has arranged for two R commands to be executed, one to load the gretl
dataset in the form of a data frame (one of several forms in which R can store data) and one to attach
the data so that the variable names defined in the gretl workspace are available as valid identifiers
within R.
In order to replicate gretl’s OLS estimation, go into the R window and type at the prompt
model <- lm(price ~ sqft + bedrms + baths)
summary(model)
You should see something similar to Figure 44.2. Surprise—the estimates coincide! To get out, just
close the R window or type q() at the R prompt.
Time series data
We now turn to an example which uses time series data: we will compare gretl’s and R’s estimates of
Box and Jenkins’ immortal “airline” model. The data are contained in the bjg sample dataset. The
following gretl code

open bjg
arima 0 1 1 ; 0 1 1 ; lg --nc

produces the estimates shown in Table 44.2.
If we now open an R session as described in the previous subsection, the data-passing mechanism is
slightly different. Since our data were defined in gretl as time series, we use an R time-series object
(ts for short) for the transfer. In this way we can retain in R useful information such as the periodicity
of the data and the sample limits. The downside is that the names of individual series, as defined
in gretl, are not valid identifiers. In order to extract the variable lg, one needs to use the syntax
lg <- gretldata[, "lg"].
ARIMA estimation can be carried out by issuing the following two R commands:
Figure 44.2: OLS regression on house prices via R
Table 44.2: Airline model from Box and Jenkins (1976): selected portion of gretl’s estimates

Variable    Coefficient    Std. Error    t-statistic    p-value
θ1          -0.401824      0.0896421     -4.4825        0.0000
Θ1          -0.556936      0.0731044     -7.6184        0.0000

Variance of innovations         0.00134810
Log-likelihood                244.696
Akaike information criterion -483.39
lg <- gretldata[, "lg"]
arima(lg, c(0,1,1), seasonal=c(0,1,1))
which yield
Coefficients:
ma1 sma1
-0.4018 -0.5569
s.e. 0.0896 0.0731
sigma^2 estimated as 0.001348: log likelihood = 244.7, aic = -483.4
Happily, the estimates again coincide.
44.3 Running an R script
Opening an R window and keying in commands is a convenient method when the job is small. In
some cases, however, it would be preferable to have R execute a script prepared in advance. One way
to do this is via the source() command in R. Alternatively, gretl offers the facility to edit an R script
and run it, having the current dataset pre-loaded automatically. This feature can be accessed via the
“File, Script Files” menu entry. By selecting “User file”, one can load a pre-existing R script; if you
want to create a new script instead, select the “New script, R script” menu entry.
Figure 44.3: Editing window for R scripts
In either case, you are presented with a window very similar to the editor window used for ordinary
gretl scripts, as in Figure 44.3.
There are two main differences. First, you get syntax highlighting for R’s syntax instead of gretl’s.
Second, clicking on the Execute button (the gears icon) launches an instance of R in which your
commands are executed. Before R is actually run, you are asked if you want to run R interactively or
not (see Figure 44.4).
Figure 44.4: Editing window for R scripts
An interactive run opens an R instance similar to the one seen in the previous section: your data will
be pre-loaded (if the “pre-load data” box is checked) and your commands will be executed. Once this
is done, you will find yourself at the R prompt, where you can enter more commands.
A non-interactive run, on the other hand, will execute your script, collect the output from R and
present it to you in an output window; R will be run in the background. If, for example, the script in
Figure 44.3 is run non-interactively, a window similar to Figure 44.5 will appear.
Figure 44.5: Output from a non-interactive R run
44.4 Sending data back and forth
As regards the passing of data between the two programs, so far we have only considered passing
series from gretl to R. In order to achieve a satisfactory degree of interoperability, more is needed. In
the following sub-sections we see how matrices can be exchanged, and how data can be passed from
R back to gretl.
Passing matrices from gretl to R
For passing matrices from gretl to R, you can use the mwrite matrix function described in section
17.7. For example, the following gretl code fragment generates the matrix
A=
3 7 11
4 8 12
5 9 13
6 10 14
and stores it into the file mymatfile.mat in the user’s “dotdir” (see section 15.2). Note that writing to
this special directory, which is sure to exist and be writable by the user, is mandated by the non-zero
value for the third, optional argument to mwrite.
matrix A = mshape(seq(3,14),4,3)
err = mwrite(A, "mymatfile.mat", 1)
The recommended R code to import such a matrix is
A <- gretl.loadmat("mymatfile.mat")
The function gretl.loadmat, which is predefined when R is called from gretl, retrieves the matrix
from dotdir. (The .mat extension for gretl matrix files is not compulsory; you can name these files
as you wish.)
It’s also possible to take more control over the details of the transfer if you wish. You have the built-in
string variable $dotdir in gretl, while in R you have the same variable under the name gretl.dotdir.
To use a location other than $dotdir you may (a) omit the third argument to mwrite and supply a
full path to the matrix file, and (b) use a more generic approach to reading the file in R. Here’s an
example:
Gretl side:
mwrite(A, "/path/to/mymatfile.mat")
R side:
A <- as.matrix(read.table("/path/to/mymatfile.mat", skip=1))
Passing data from R to gretl
For passing data in the opposite direction, gretl defines a special function that can be used in the R
environment. An R object will be written as a temporary file in $dotdir, from where it can be easily
retrieved from within gretl.
The name of this function is gretl.export(); it takes one required argument, the object to be
exported. At present, the objects that can be exported with this method are matrices, data frames
and time-series objects. The function creates a text file, by default with the same name as the
exported object (plus an appropriate suffix), in gretl’s temporary directory. Data frames and time-
series objects are stored as CSV files, and can be retrieved by using gretl’s append command. Matrices
are stored in a special text format that is understood by gretl (see section 17.7); the file suffix is in
this case .mat, and to read the matrix in gretl you must use the mread() function.
This function also has an optional second argument, namely a string which specifies a basename for
the export file, in case you want to use a name other than that attached to the object within R. As
in the default case an appropriate suffix, .csv or .mat, will be added to the basename.
As an example, we take the airline data and use them to estimate a structural time series model à la
Harvey (1989).⁴ The model we will use is the Basic Structural Model (BSM), in which a time series
is decomposed into three terms:

$$ y_t = \mu_t + \gamma_t + \varepsilon_t $$

where μ_t is a trend component, γ_t is a seasonal component and ε_t is a noise term. In turn, the
following is assumed to hold:

$$ \Delta \mu_t = \beta_{t-1} + \eta_t $$
$$ \Delta \beta_t = \zeta_t $$
$$ \Delta_s \gamma_t = \omega_t $$

where Δ_s is the seasonal differencing operator, (1 − L^s), and η_t, ζ_t and ω_t are mutually uncorrelated
white noise processes. The object of the analysis is to estimate the variances of the noise components
(which may be zero) and to recover estimates of the latent processes μ_t (the “level”), β_t (the “slope”)
and γ_t.
We will use R’s StructTS command and import the results back into gretl. Once the bjg dataset is
loaded in gretl, we pass the data to R and execute the following script:
# extract the log series
y <- gretldata[, "lg"]
# estimate the model
strmod <- StructTS(y)
# save the fitted components (smoothed)
compon <- as.ts(tsSmooth(strmod))
# save the estimated variances
vars <- as.matrix(strmod$coef)
# export into gretl’s temp dir
gretl.export(compon)
gretl.export(vars)
⁴ The function package StrucTiSM is available to handle this class of models natively in gretl.
Running this script via gretl produces minimal output:
current data loaded as ts object "gretldata"
wrote /home/cottrell/.gretl/compon.csv
wrote /home/cottrell/.gretl/vars.mat
However, we are now able to pull the results back into gretl by executing the following commands,
either from the console or by creating a small script:
string fname = sprintf("%s/compon.csv", $dotdir)
append @fname
vars = mread("vars.mat", 1)
The first command reads the estimated time-series components from a CSV file, which is the format
that the passing mechanism employs for series. The matrix vars is read from the file vars.mat.
Figure 44.6: Estimated components from BSM (panels: lg, level, slope, sea)
After the above commands have been executed, three new series will have appeared in the gretl
workspace, namely the estimates of the three components; by plotting them together with the original
data, you should get a graph similar to Figure 44.6. The estimates of the variances can be seen by
printing the vars matrix, as in
? print vars
vars (4 x 1)
0.00077185
0.0000
0.0013969
0.0000
That is,

$$ \hat{\sigma}^2_\eta = 0.00077185, \quad \hat{\sigma}^2_\zeta = 0, \quad \hat{\sigma}^2_\omega = 0.0013969, \quad \hat{\sigma}^2_\varepsilon = 0 $$

Notice that, since $\hat{\sigma}^2_\zeta = 0$, the estimate for β_t is constant and the level component is simply a random
walk with a drift.
44.5 Interacting with R from the command line
Up to this point we have spoken only of interaction with R via the GUI program. In order to do the
same from the command line interface, gretl provides the foreign command. This enables you to
embed non-native commands within a gretl script.
A “foreign” block takes the form

foreign language=R [--send-data[=list]] [--quiet]
    ... R commands ...
end foreign

and achieves the same effect as submitting the enclosed R commands via the GUI in the non-interactive
mode (see section 44.3 above). The --send-data option arranges for auto-loading of the data present
in the gretl session, or a subset thereof specified via a named list. The --quiet option prevents the
output from R from being echoed in the gretl output.
Using this method, replicating the example in the previous subsection is rather easy: basically, all it
takes is encapsulating the content of the R script in a foreign...end foreign block; see Listing 44.1.
Listing 44.1: Estimation of the Basic Structural Model, simple version [Download ]
open bjg.gdt
foreign language=R --send-data
y <- gretldata[, "lg"]
strmod <- StructTS(y)
compon <- as.ts(tsSmooth(strmod))
vars <- as.matrix(strmod$coef)
gretl.export(compon)
gretl.export(vars)
end foreign
append @dotdir/compon.csv
rename level lg_level
rename slope lg_slope
rename sea lg_seas
vars = mread("vars.mat", 1)
The above syntax, despite being already quite useful by itself, shows its full power when it is used in
conjunction with user-written functions. Listing 44.2 shows how to define a gretl function that calls
R internally.
44.6 Performance issues with R
R is a large and complex program, which takes an appreciable time to initialize itself. In interactive use
this is not a significant problem, but if you have a gretl script that calls R repeatedly the cumulated start-
up costs can become bothersome. To get around this, gretl calls the R shared library by preference;
in this case the start-up cost is borne only once, on the first invocation of R code from within gretl.
Support for the R shared library is built into the gretl packages for MS Windows and OS X—but
the advantage is realized only if the library is in fact available at run time. If you are building gretl
yourself on Linux and wish to make use of the R library, you should ensure (a) that R has been built
with the shared library enabled (specify --enable-R-shlib when configuring your build of R), and
(b) that the pkg-config program is able to detect your R installation. We do not link to the R
library at build time, rather we open it dynamically on demand. The gretl GUI has an item under
the Tools/Preferences menu which enables you to select the path to the library, if it is not detected
automatically.
Listing 44.2: Estimation of the Basic Structural Model via a function [Download ]
function list RStructTS(series myseries)
smpl ok(myseries) --restrict
sx = argname(myseries)
foreign language=R --send-data --quiet
@sx <- gretldata[, "myseries"]
strmod <- StructTS(@sx)
compon <- as.ts(tsSmooth(strmod))
gretl.export(compon)
end foreign
append @dotdir/compon.csv
rename level @sx_level
rename slope @sx_slope
rename sea @sx_seas
list ret = @sx_level @sx_slope @sx_seas
return ret
end function
# ------------ main -------------------------
open bjg.gdt
list X = RStructTS(lg)
If you have the R shared library installed but want to force gretl to call the R executable instead, you
can do
set R_lib off
44.7 Further use of the R library
Besides improving performance, as noted above, use of the R shared library makes possible a further
refinement. That is, you can define functions in R, within a foreign block, then call those functions
later in your script much as if they were gretl functions. This is illustrated below.
set R_functions on
foreign language=R
plus_one <- function(q) {
z = q+1
invisible(z)
}
end foreign
scalar b=R.plus_one(2)
The R function plus_one is obviously trivial in itself, but the example shows a couple of points. First,
for this mechanism to work you need to enable R_functions via the set command. Second, to avoid
collision with the gretl function namespace, calls to functions defined in this way must be prefixed
with R., as in R.plus_one. (But please note, this mechanism will not work if you have defined a
gretl bundle named R: in that case identifiers beginning with R. will be understood as referring to
members of the bundle in question.)
Built-in R functions may also be called in this way, once R_functions is set on. For example one can
invoke R’s choose function, which computes binomial coefficients:
set R_functions on
scalar b = R.choose(10,4)
The use of R functions from within gretl is limited by the need for an unambiguous and lossless
mapping between R and gretl data-types (both for arguments passed by gretl and for return values
generated by R). So far, the following possibilities are supported (see chapter 11 for details on the
definition of types on the gretl side):
The most basic types—real scalars, real matrices and (single) strings—can be pushed in either
direction no problem. Since gretl 2023b, row and column names will be preserved when
transferring matrices.
A series in gretl can be pushed to R as a vector. If the gretl series is string-valued (see
chapter 16), R will receive the string values.
Gretl’s arrays of strings can be pushed to R as vectors of strings, and vice versa.
Gretl’s bundles can be pushed to R as “lists”, with tags naming the elements, and R’s lists can
be retrieved as gretl bundles provided that their elements have a corresponding gretl type and
are identified by tags. But this is subject to the restriction that a gretl bundle passed to R
cannot contain instances of the gretl list type (or arrays of anything other than strings).
Chapter 45
Gretl and Ox
45.1 Introduction
Ox, written by Jurgen A. Doornik (see Doornik, 2007), is described by its author as “an object-
oriented statistical system. At its core is a powerful matrix language, which is complemented by a
comprehensive statistical library. Among the special features of Ox are its speed [and] well-designed
syntax. . . . Ox comes in two versions: Ox Professional and Ox Console. Ox is available for Windows,
Linux, Mac (OS X), and several Unix platforms.” (www.doornik.com)
Ox is proprietary, closed-source software. The command-line version of the program is, however,
available free of charge for academic users. Quoting again from Doornik’s website: “The Console
(command line) versions may be used freely for academic research and teaching purposes only. . . .
The Ox syntax is public, and, of course, you may do with your own Ox code whatever you wish.”
If you wish to use Ox in conjunction with gretl please refer to doornik.com for further details on
licensing.
As the reader will no doubt have noticed, most other software that we discuss in this Guide is open-
source and freely available for all users. We make an exception for Ox on the grounds that it is
indeed fast and well designed, and that its statistical library—along with various add-on packages
that are also available—has exceptional coverage of cutting-edge techniques in econometrics. The
gretl authors have used Ox for benchmarking some of gretl’s more advanced features such as dynamic
panel models and state space models.1
45.2 Ox support in gretl
The support offered for Ox in gretl is similar to that offered for R, as discussed in chapter 44.
To enable support for Ox, go to the Tools/Preferences/General menu item and look under the Programs tab.
Find the entry for the path to the oxl executable, that is, the program that runs Ox files (on MS Windows it is
called oxl.exe). Adjust the path if it’s not already right for your system and you should be ready to go.
With support enabled, you can open and edit Ox programs in the gretl GUI. Clicking the “execute”
icon in the editor window will send your code to Ox for execution. Figures 45.1 and 45.2 show
an Ox program and part of its output.
In addition you can embed Ox code within a gretl script using a foreign block, as described in
connection with R. A trivial example, which simply prints the gretl data matrix within Ox, is shown
in Listing 45.1.
The listing illustrates how a matrix can be passed from gretl to Ox. We use the mwrite function to
write a matrix into the user’s “dotdir” (see section 15.2), then in Ox we use the function gretl_loadmat
to retrieve the matrix.
How does gretl_loadmat come to be defined? When gretl writes out the Ox program corresponding
to your foreign block it does two things in addition. First, it writes a small utility file named
gretl_io.ox into your dotdir. This contains a definition for gretl_loadmat and also for the function
gretl_export (see below). Second, gretl interpolates into your Ox code a line which includes this
utility file (it is inserted right after the inclusion of oxstd.h, which is needed in all Ox programs).
Note that gretl_loadmat expects to find the named file in the user’s dotdir.
1For a review of Ox, see Cribari-Neto and Zarkos (2003) and for a (somewhat dated) comparison of Ox with other
matrix-oriented packages such as GAUSS, see Steinhaus (1999).
Figure 45.1: Ox editing window
Figure 45.2: Output from Ox
Listing 45.1: Simple example of Ox usage
open data4-1
matrix m = {dataset}
mwrite(m, "gretl.mat", 1)
foreign language=Ox
#include <oxstd.h>
main()
{
decl gmat = gretl_loadmat("gretl.mat");
print(gmat);
}
end foreign
45.3 Illustration: replication of DPD model
Listing 45.2 shows a more ambitious case. This script replicates one of the dynamic panel data models
in Arellano and Bond (1991), first using gretl and then using Ox; we then check the relative differences
between the parameter estimates produced by the two programs (which turn out to be reassuringly
small). This was last tested using version 9.10 of Ox console for Linux.
Unlike the previous example, in this case we pass the dataset from gretl to Ox as a CSV file in
order to preserve the variable names. Note the use of the internal variable csv_na to get the right
representation of missing values for use with Ox—and also note that the --send-data option for the
foreign command is not available in connection with Ox.
We get the parameter estimates back from Ox using gretl_export on the Ox side and mread on the
gretl side. The gretl_export function takes two arguments, a matrix and a file name. The file is
written into the user’s dotdir, from where it can be picked up using mread. The final portion of the
output from Listing 45.2 is shown below:
? matrix oxparm = mread("oxparm.mat", 1)
Generated matrix oxparm
? eval abs((parm - oxparm) ./ oxparm)
1.1094e-12
1.6314e-12
1.7885e-13
1.1949e-12
1.8999e-13
1.1359e-13
6.5339e-13
3.1943e-12
1.2108e-12
3.7335e-13
1.7468e-12
3.9536e-12
2.0980e-12
Listing 45.2: Estimation of dynamic panel data model via gretl and Ox
open abdata.gdt
# 1-step GMM estimation
dpanel 2 ; n w w(-1) k ys ys(-1) 0 --time-dummies --dpdstyle
matrix parm = $coeff
# Write CSV file for Ox
set csv_na .NaN
store @dotdir/abdata.csv
# Replicate using the Ox DPD package
foreign language=Ox
#include <oxstd.h>
#import <packages/DPD/dpd>
main ()
{
decl dpd = new DPD();
dpd.Load("@dotdir/abdata.csv");
dpd.SetYear("YEAR");
dpd.Select(DPD::Y_VAR, {"n", 0, 2});
dpd.Select(DPD::X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(DPD::I_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Gmm("n", 2, 99); // GMM-type instrument
dpd.SetDummies(DPD::D_CONSTANT + DPD::D_TIME);
dpd.SetTest(2, 2); // Sargan, AR 1-2 tests
dpd.Estimate(); // 1-step estimation
decl parm = dpd.GetPar();
gretl_export(parm, "oxparm.mat");
delete dpd;
}
end foreign
# Compare the results
matrix oxparm = mread("oxparm.mat", 1)
eval abs((parm - oxparm) ./ oxparm)
Chapter 46
Gretl and Octave
46.1 Introduction
GNU Octave, written by John W. Eaton and others, is described as “a high-level language, primarily
intended for numerical computations.” The program is oriented towards “solving linear and nonlinear
problems numerically” and “performing other numerical experiments using a language that is mostly
compatible with Matlab.” (www.gnu.org/software/octave) Octave is available in source-code form
(naturally, for GNU software) and also in the form of binary packages for MS Windows and Mac OS
X. Numerous contributed packages that extend Octave’s functionality in various ways can be found
at octave.sf.net.
46.2 Octave support in gretl
The support offered for Octave in gretl is similar to that offered for R (chapter 44). For example, you
can open and edit Octave scripts in the gretl GUI. Clicking the “execute” icon in the editor window
will send your code to Octave for execution. Figures 46.1 and 46.2 show an Octave script and
its output; in this example we use the function logistic_regression to replicate some results from
Greene (2000).
Figure 46.1: Octave editing window
Figure 46.2: Output from Octave
In addition you can embed Octave code within a gretl script using a foreign block, as described in
connection with R. A trivial example is shown below: it simply loads and prints the gretl data matrix
within Octave, then takes it back to gretl and checks for any difference (there should be none). (Note
that in Octave, appending ; to a line suppresses verbose output; leaving off the semicolon results in
printing of the object that is produced, if any.)
open data4-1
matrix m = {dataset}
mwrite(m, "gretl.mat", 1)
foreign language=Octave
gmat = gretl_loadmat("gretl.mat")
gretl_export(gmat, "octave.mat")
end foreign
matrix chk = mread("octave.mat", 1)
eval maxr(maxc(abs(m - chk)))
The functions gretl_loadmat and gretl_export, which are predefined when you run Octave from
within gretl, have the following signatures:
function A = gretl_loadmat(fname, autodot=1)
function gretl_export(X, fname, autodot=1)
By default traffic in matrices goes via the user’s “dotdir” (see section 15.2) on the Octave side; that
is, the name of this directory is prepended to filename for both reading and writing. (This is
complementary to use of the export and import parameters with gretl’s mwrite and mread func-
tions, respectively.) However, if you wish to take control over the reading and writing locations you
can supply a zero value for autodot (or give an absolute path) when calling gretl_loadmat and
gretl_export: in that case the filename argument is used as is.
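For instance, here is a minimal sketch of the second approach, bypassing the dotdir by giving explicit paths and setting autodot to zero (the directory /tmp/mats is purely illustrative and must already exist):
matrix m = mnormal(3,3)
mwrite(m, "/tmp/mats/m.mat")                 # full path on the gretl side
foreign language=Octave
m = gretl_loadmat("/tmp/mats/m.mat", 0);     # autodot=0: use the path as given
gretl_export(2*m, "/tmp/mats/m2.mat", 0);    # likewise for writing
end foreign
matrix m2 = mread("/tmp/mats/m2.mat")        # again an explicit path
print m2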
46.3 Illustration: spectral methods
We now present a more ambitious example which exploits Octave’s handling of the frequency domain
(and also its ability to use code written for MATLAB), namely estimation of the spectral coherence
of two time series. For this illustration we require two extra Octave packages from octave.sf.net,
namely those supporting spectral functions (specfun) and signal processing (signal). After
downloading the packages you can install them from within Octave as follows (using version numbers
as of March 2010):
pkg install specfun-1.0.8.tar.gz
pkg install signal-1.0.10.tar.gz
In addition we need some specialized MATLAB files made available by Mario Forni of the Univer-
sity of Modena, at http://morgana.unimore.it/forni_mario/matlab.htm. The files needed are
coheren2.m, coheren.m, coher.m, cospec.m, crosscov.m, crosspec.m, crosspe.m and spec.m.
These are in a form appropriate for MS Windows. On Linux you could run the following shell script
to get the files and remove the Windows end-of-file character (which prevents the files from running
under Octave):
SITE=http://morgana.unimore.it/forni_mario/MYPROG
# download files and delete trailing Ctrl-Z
for f in \
coheren2.m \
coheren.m \
coher.m \
cospec.m \
crosscov.m \
crosspec.m \
crosspe.m \
spec.m ; do
wget $SITE/$f && \
cat $f | tr -d \\032 > tmp.m && mv tmp.m $f
done
The Forni files should be placed in some appropriate directory, and you should tell Octave where to
find them by adding that directory to Octave’s loadpath. On Linux this can be done via an entry in
one’s ~/.octaverc file. For example
addpath("~/stats/octave/forni");
Alternatively, an addpath directive can be written into the Octave script that calls on these files.
With everything set up on the Octave side we now write a gretl script (see Listing 46.1) which opens a
time-series dataset, constructs and writes a matrix containing two series, and defines a foreign block
containing the Octave statements needed to produce the spectral coherence matrix. This matrix is
exported via gretl_export and picked up using mread. Finally, we produce a graph from the matrix
in gretl. In the script this is sent to the screen; Figure 46.3 shows the same graph in PDF format.
Listing 46.1: Estimation of spectral coherence via Octave
open data9-7
matrix xy = {PRIME, UNEMP}
mwrite(xy, "xy.mat", 1)
foreign language=Octave
pkg load signal
# uncomment and modify the following if necessary
# addpath("~/stats/octave/forni");
xy = gretl_loadmat("xy.mat");
x = xy(:,1);
y = xy(:,2);
# note: the last parameter is the Bartlett window size
h = coher(x, y, 8);
gretl_export(h, "h.mat");
end foreign
h = mread("h.mat", 1)
cnameset(h, "coherence")
gnuplot 1 --time-series --with-lines --matrix=h --output=display
Figure 46.3: Spectral coherence estimated via Octave
Chapter 47
Gretl and Stata
Stata (www.stata.com) is closed-source, proprietary (and expensive) software and as such is not a
natural companion to gretl. Nonetheless, given Stata’s popularity it is desirable to have a convenient
way of comparing results across the two programs, and to that end we provide some support for Stata
code under the foreign command.
To enable support for Stata, go to the Tools/Preferences/General menu item and look under the Programs tab.
Find the entry for the path to the Stata executable. Adjust the path if it’s not already right for your system and
you should be ready to go.
The following example illustrates what’s available. You can send the current gretl dataset to Stata
using the --send-data flag. And having defined a matrix within Stata you can export it for use with
gretl via the gretl_export command: this takes two arguments, the name of the matrix to export
and the filename to use; the file is written to the user’s “dotdir”, from where it can be retrieved using
the mread() function.1 To suppress printed output from Stata you can add the --quiet flag to the
foreign block.
Listing 47.1: Comparison of clustered standard errors with Stata
function matrix stata_reorder (matrix se)
# stata puts the intercept last, but gretl puts it first
scalar n = rows(se)
return se[n] | se[1:n-1]
end function
open data4-1
ols 1 0 2 3 --cluster=bedrms
matrix se = $stderr
foreign language=stata --send-data
regress price sqft bedrms, vce(cluster bedrms)
matrix vcv = e(V)
gretl_export vcv "vcv.mat"
end foreign
matrix stata_vcv = mread("vcv.mat", 1)
stata_se = stata_reorder(sqrt(diag(stata_vcv)))
matrix check = se - stata_se
print check
In addition you can edit “pure” Stata scripts in the gretl GUI and send them for execution as with
native gretl scripts.
Note that Stata coerces all variable names to lower-case on data input, so even if series names in gretl
are upper-case, or of mixed case, it’s necessary to use all lower-case in Stata. Also note that when
opening a data file within Stata via the use command it will be necessary to provide the full path to
the file.
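The following minimal sketch illustrates both points (the Stata data file name and path shown in the comment are purely hypothetical):
open data4-1
rename price PRICE                 # upper-case on the gretl side
foreign language=stata --send-data
summarize price   // but lower-case on the Stata side
* to open a data file within Stata, give its full path, e.g.
* use "C:/mydata/wages.dta", clear
end foreign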
1We do not currently offer the complementary functionality of gretl_loadmat, which enables reading of matrices
written by gretl’s mwrite() function in Ox and Octave. This is not at all easy to implement in Stata code.
Chapter 48
Gretl and Python
48.1 Introduction
According to www.python.org, Python is “an easy to learn, powerful programming language. It has
efficient high-level data structures and a simple but effective approach to object-oriented program-
ming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an
ideal language for scripting and rapid application development in many areas on most platforms.”
Indeed, Python is widely used in a great variety of contexts. Numerous add-on modules are available;
the ones likely to be of greatest interest to econometricians include NumPy (“the fundamental package
for scientific computing with Python”—see www.numpy.org); SciPy (which builds on NumPy—see
www.scipy.org); and Statsmodels (http://statsmodels.sourceforge.net/).
48.2 Python support in gretl
The support offered for Python in gretl is similar to that offered for Octave (chapter 46). You can
open and edit Python scripts in the gretl GUI. Clicking the “execute” icon in the editor window will
send your code to Python for execution. In addition you can embed Python code within a gretl script
using a foreign block, as described in connection with R.
When you launch Python from within gretl one variable and two convenience functions are pre-defined,
as follows.
gretl_dotdir
gretl_loadmat(filename, autodot=1)
gretl_export(M, filename, autodot=1)
The variable gretl_dotdir holds the path to the user’s “dot directory.” The first function loads a
matrix of the given filename as written by gretl’s mwrite function, and the second writes matrix M,
under the given filename, in the format wanted by gretl.
By default the traffic in matrices goes via the dot directory on the Python side; that is, the name
of this directory is prepended to filename for both reading and writing. (This is complementary
to use of the export and import parameters with gretl’s mwrite and mread functions, respectively.)
However, if you wish to take control over the reading and writing locations you can supply a zero
value for autodot (or give an absolute path) when calling gretl_loadmat and gretl_export: in that
case the filename argument is used as is.
Note that gretl_loadmat and gretl_export depend on NumPy; they make use of the functions
loadtxt and savetxt respectively. Nonetheless, the presence of NumPy is not an absolute requirement
if you don’t need to use these two functions.
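The following minimal sketch exercises all three of these pre-defined items (it assumes NumPy is installed; the matrix file names are arbitrary):
matrix m = mnormal(2,2)
mwrite(m, "m.mat", 1)                   # write to the dot directory
foreign language=python
import numpy as np
print("dot directory:", gretl_dotdir)   # the pre-defined path variable
M = gretl_loadmat("m.mat")              # read from the dot directory
gretl_export(np.dot(M, M), "msq.mat")   # write the matrix product back
end foreign
matrix msq = mread("msq.mat", 1)
print msq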
48.3 Illustration: linear regression with multicollinearity
Listing 48.1 compares the numerical accuracy of gretl’s ols command with that of the function
linalg.lstsq in NumPy, using the notorious Longley test data which exhibit extreme multicollinear-
ity. Unlike some econometrics packages, NumPy does a good job on these data. The script computes
and prints the log-relative error in estimation of the regression coefficients, using the NIST-certified
values as a benchmark;1 the error values correspond to the number of correct digits (with a maximum
of 15). The results will likely differ somewhat by computer architecture and compiler.
1See http://www.itl.nist.gov/div898/strd/lls/data/Longley.shtml.
Listing 48.1: Comparing regression results with Python
set verbose off
function matrix logrel_err (const matrix est, const matrix true)
return -log10(abs(est - true) ./ abs(true))
end function
open longley.gdt -q
list LX = prdefl .. year
ols employ 0 LX -q
matrix b_gretl = $coeff
mwrite({employ} ~ {const} ~ {LX}, "alldata.mat", 1)
foreign language=python
import numpy as np
X = gretl_loadmat('alldata.mat', 1)
# NumPy’s OLS
b = np.linalg.lstsq(X[:,1:], X[:,0])[0]
gretl_export(np.transpose(np.matrix(b)), 'py_b.mat', 1)
end foreign
# NIST’s certified coefficient values
matrix b_nist = {-3482258.63459582, 15.0618722713733,
-0.358191792925910E-01, -2.02022980381683,
-1.03322686717359, -0.511041056535807E-01,
1829.15146461355}’
matrix b_numpy = mread("py_b.mat", 1)
matrix E = logrel_err(b_gretl, b_nist) ~ logrel_err(b_numpy, b_nist)
cnameset(E, "gretl python")
printf "Log-relative errors, Longley coefficients:\n\n%#12.5g\n", E
printf "Column means\n%#12.5g\n", meanc(E)
Output:
Log-relative errors, Longley coefficients:
gretl python
12.844 12.850
11.528 11.414
12.393 12.401
13.135 13.121
13.738 13.318
12.587 12.363
12.848 12.852
Column means
12.725 12.617
Chapter 49
Gretl and Julia
49.1 Introduction
According to julialang.org, Julia is “a high-level, high-performance dynamic programming language
for technical computing, with syntax that is familiar to users of other technical computing environ-
ments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and
an extensive mathematical function library.” Julia is well known for being very fast; however, you
should be aware that by default starting Julia takes some time due to Just-in-Time compilation of
the input. This fixed cost is well worth bearing if you are asking Julia to perform a big computation,
but small jobs are likely to run faster if you use the (Julia-specific) --no-compile option with the
foreign command.1
49.2 Julia support in gretl
The support offered for Julia in gretl is similar to that offered for Octave (chapter 46). You can open
and edit Julia scripts in the gretl GUI. Clicking the “execute” icon in the editor window will send
your code to Julia for execution. In addition you can embed Julia code within a gretl script using a
foreign block, as described in connection with R.
When you launch Julia from within gretl one variable and two convenience functions are pre-defined,
as follows.
gretl_dotdir
gretl_loadmat(filename, autodot=true)
gretl_export(M, filename, autodot=true)
The variable gretl_dotdir holds the path to the user’s “dot directory.” The first function loads a
matrix of the given filename as written by gretl’s mwrite function, and the second writes matrix M,
under the given filename, in the format wanted by gretl.
By default the traffic in matrices goes via the dot directory on the Julia side; that is, the name of this
directory is prepended to filename for both reading and writing. (This is complementary to use of
the export and import parameters with gretl’s mwrite and mread functions, respectively.) However,
if you wish to take control over the reading and writing locations you can supply a zero value for
autodot (or give an absolute path) when calling gretl_loadmat and gretl_export: in that case the
filename argument is used as is.
49.3 Illustration
Listing 49.1 shows a minimal example of how to interact with Julia from a gretl script.
Since this is a very small job JIT compilation is not worthwhile; in our testing the script runs almost
4 times faster if the Julia block is opened with
foreign language=julia --no-compile
This has the effect of passing the option --compile=no to the Julia executable.
1Caveat: it seems that this option is not supported by all builds of Julia.
Listing 49.1: Simple Julia I/O example
set verbose off
matrix A = mnormal(4,4) # generate a random matrix
mwrite(A, "A", 1) # and save it to a file
foreign language=julia # call Julia
print("Hi from Julia!\n"); # output a string
A = gretl_loadmat("A"); # grab the matrix from gretl
gretl_export(inv(A), "iA.mat"); # and save its inverse
end foreign # go back to gretl
matrix iA = mread("iA.mat", 1) # read the inverse from Julia
matrix check = A * iA # compute the product
print check # print out the check (should be I)
Output (good approximation to identity matrix):
Hi from Julia!
check (4 x 4)
1.0000 6.9389e-18 1.6653e-16 1.6653e-16
0.0000 1.0000 0.0000 0.0000
-4.4409e-16 -8.3267e-17 1.0000 -6.6613e-16
-4.4409e-16 -1.3878e-17 -1.1102e-16 1.0000
Chapter 50
Troubleshooting gretl
50.1 Bug reports
Bug reports are welcome—well, if not exactly welcome then useful and appreciated. Hopefully, you
are unlikely to find bugs in the actual calculations done by gretl (although this statement does not
constitute any sort of warranty). You may, however, come across bugs or oddities in the behavior
of the graphical interface. Please remember that the usefulness of bug reports is greatly enhanced
if you can be as specific as possible: what exactly went wrong, under what conditions, and on what
operating system? If you saw an error message, what precisely did it say?
One way of making a bug report more useful is to run the program in such a way that you can see
(and copy) any additional information that gets printed to the stderr output stream. On Linux and
Mac OS X that’s just a matter of launching gretl from the command prompt in a terminal window.
On MS Windows it’s a bit more complicated since stderr is by default “invisible.” However, you can
quite easily set up a special gretl shortcut that does the job. On the Windows desktop, right-click
and select “New shortcut.” In the dialog box that appears, browse to find gretl.exe and append the
--debug flag, as shown in Figure 50.1. Note that there are two dashes before “debug”.
Figure 50.1: Creating a debugging shortcut
When you start gretl in this mode, a “console window” appears as well as the gretl window, and
stderr output goes to the console. To copy this output, click at the top left of the console window
for a menu (Figure 50.2): first do “Select all”, then “Copy.” You can paste the results into Notepad
or similar.
Figure 50.2: The program with console window
50.2 Auxiliary programs
As mentioned above, gretl calls some other programs to accomplish certain tasks (gnuplot for graphing,
LaTeX for high-quality typesetting of regression output, GNU R). If something goes wrong with such
external links, it is not always easy for gretl to produce an informative error message. If such a link
fails when accessed from the gretl graphical interface, you may be able to get more information by
starting gretl from the command prompt rather than via a desktop menu entry or icon. On the X
window system, start gretl from the shell prompt in an xterm; on MS Windows, start the program
gretl.exe from a console window or “DOS box” using the -g or --debug option flag. Additional
error messages may be displayed on the terminal window.
Also please note that for most external calls, gretl assumes that the programs in question are available
in your “path”—that is, that they can be invoked simply via the name of the program, without
supplying the program’s full location.1 Thus if a given program fails, try the experiment of typing
the program name at the command prompt, as shown below.
                    Graphing        Typesetting    GNU R
X window system     gnuplot         pdflatex       R
MS Windows          wgnuplot.exe    pdflatex       RGui.exe
If the program fails to start from the prompt, it’s not a gretl issue but rather that the program’s home
directory is not in your path, or the program is not installed (properly). For details on modifying
your path please see the documentation or online help for your operating system or shell.
1The exception to this rule is the invocation of gnuplot under MS Windows, where a full path to the program is
given.
Chapter 51
The command line interface
The gretl package includes the command-line program gretlcli. On Linux it can be run from a terminal
window (xterm, rxvt, or similar), or at the text console. Under MS Windows it can be run in a console
window (sometimes inaccurately called a “DOS box”). gretlcli has its own help file, which may be
accessed by typing “help” at the prompt. It can be run in batch mode, sending output directly to a
file (see also the Gretl Command Reference).
If gretlcli is linked to the readline library (this is automatically the case in the MS Windows version;
also see Appendix B), the command line is recallable and editable, and offers command completion.
You can use the Up and Down arrow keys to cycle through previously typed commands. On a
given command line, you can use the arrow keys to move around, in conjunction with Emacs editing
keystrokes.1 The most common of these are:
Keystroke   Effect
Ctrl-a      go to start of line
Ctrl-e      go to end of line
Ctrl-d      delete character to right
where Ctrl-a means press the a key while the Ctrl key is also depressed. Thus if you want to
change something at the beginning of a command, you don’t have to backspace over the whole line,
erasing as you go. Just hop to the start and add or delete characters. If you type the first letters of
a command name then press the Tab key, readline will attempt to complete the command name for
you. If there’s a unique completion it will be put in place automatically. If there’s more than one
completion, pressing Tab a second time brings up a list.
Probably the most useful mode for heavy-duty work with gretlcli is batch (non-interactive) mode, in
which the program reads and processes a script, and sends the output to file. For example
gretlcli -b scriptfile > outputfile
Note that scriptfile is treated as a program argument; only the output file requires redirection (>).
Don’t forget the -b (batch) switch, otherwise the program will wait for user input after executing the
script (and if output is redirected, the program will appear to “hang”).
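As a minimal sketch, a script file (here called, hypothetically, myreg.inp) containing just
open data4-1
ols price 0 sqft bedrms
could then be run non-interactively with gretlcli -b myreg.inp > myreg.out, leaving the estimation
output in myreg.out.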
1Actually, the key bindings shown below are only the defaults; they can be customized. See the readline manual.
Part IV
Appendices
Appendix A
Data file details
A.1 Basic native format
In gretl’s basic native data format—for which we use the suffix gdt—a dataset is stored in XML
(extensible mark-up language). Data files correspond to the simple DTD (document type definition)
given in gretldata.dtd, which is supplied with the gretl distribution and is installed in the system
data directory (e.g. /usr/share/gretl/data on Linux.) Such files may be plain text (uncompressed)
or gzipped. They contain the actual data values plus additional information such as the names and
descriptions of variables, the frequency of the data, and so on.
In a gdt file the actual data values are written to 17 significant figures (for generated data such as
logs or pseudo-random numbers) or to a maximum of 15 figures for primary data. The C printf
format %.*g is used (for * = 17 or 15) so that trailing zeros are not printed.
Most users will probably not have need to read or write such files other than via gretl itself, but if
you want to manipulate them using other software tools you should examine the DTD and also take a
look at a few of the supplied practice data files: data4-1.gdt gives a simple example; data4-10.gdt
is an example where observation labels are included.
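For example, the following minimal sketch (the output filename is arbitrary) creates a small artificial
time-series dataset and stores it in gdt format; the resulting file can then be inspected in any text
editor (after decompressing it, if it has been gzipped):
nulldata 20
setobs 4 2000:1 --time-series     # quarterly data starting in 2000:1
series x = normal()               # generated data: written to 17 significant figures
series y = 3*x + normal()
store mydata.gdt                  # write the dataset as XML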
A.2 Binary data file format
A native binary format is also available for dataset storage. This format—with suffix gdtb—offers
much faster writing and reading for very large datasets. For small to moderately sized datasets (say,
up to a few megabytes) there is little advantage in the binary format and we recommend use of plain
gdt. Note that gdtb files are saved in the endianness of the machine on which they’re written and
are not portable across platforms of differing endianness, but since almost all machines on which gretl
is likely to be run are little-endian this is unlikely to be a serious limitation. The implementation of
gdtb format can be found in purebin.c, in the plugin subdirectory of the gretl source tree.
Prior to version 2021b of gretl, gdtb files had a different structure, namely a PKZIP file containing an
XML component for the metadata plus a binary component for the actual data values. It turned out
that this hybrid format did not scale well for datasets with a great deal of metadata. For backward
compatibility gretl can still read such old-style files but it doesn’t write them any more.
A.3 Native database format
A gretl database has two primary parts: a plain text index file (with filename suffix .idx) containing
information on the included series, and a binary file (suffix .bin) containing the actual data. Two
examples of the format for an entry in the idx file are shown below:
G0M910 Composite index of 11 leading indicators (1987=100)
M 1948.01 - 1995.11 n = 575
currbal Balance of Payments: Balance on Current Account; SA
Q 1960.1 - 1999.4 n = 160
The first field is the series name. The second is a description of the series (maximum 128 characters).
On the second line the first field is a frequency code: M for monthly, Q for quarterly, A for annual, B
for business-daily (daily with five days per week), D for 7-day daily, S for 6-day daily, U for undated.
No other frequencies are accepted at present.
Then comes the starting date (with two digits following the point for monthly data, one for quarterly
data, none for annual), a space, a hyphen, another space, the ending date, the string n = and the
integer number of observations. In the case of daily data the starting and ending dates should be
given in the ISO 8601 form, YYYY-MM-DD. This format must be respected exactly.
Optionally, the first line of the index file may contain a short comment (up to 64 characters) on the
source and nature of the data, following a hash mark. For example:
# Federal Reserve Board (interest rates)
The corresponding binary database file holds the data values, represented as “floats”, that is, single-
precision floating-point numbers taking four bytes apiece. The values are packed “by variable”, so
that the first n numbers are the observations of variable 1, the next m the observations on variable
2, and so on.
A third file may accompany the idx and bin files, namely a “codebook” containing a description of
the data. If present, this must be plain text, with filename suffix .cb, or PDF with suffix .pdf.
The components of a gretl database are generally combined in a single file, with zlib compression and
.gz suffix, for distribution. A small program named gretlzip can be used to create or unpack such
files. See the utils/dbzip subdirectory of the gretl source tree.
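From the user’s point of view, accessing such a database within gretl is straightforward. Here is a
minimal sketch, using the fedstl database supplied with the gretl distribution (the series name unrate
is assumed to be present in that database):
open fedstl.bin        # open the native database (idx/bin pair)
data unrate            # import a named series into the current dataset
print unrate --byobs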
Appendix B
Building gretl
Here we give instructions detailed enough to allow a user with only a basic knowledge of a Unix-
type system to build gretl. These steps were tested on a fresh installation of Debian Etch. For
other Linux distributions (especially Debian-based ones, like Ubuntu and its derivatives) little should
change. Other Unix-like operating systems such as Mac OS X and BSD would probably require more
substantial adjustments.
In this guided example, we will build gretl complete with documentation. This introduces a few more
requirements, but gives you the ability to modify the documentation files as well, like the help files
or the manuals.
B.1 Installing the prerequisites
We assume that the basic GNU utilities are already installed on the system, together with these other
programs:
some TeX/LaTeX system (texlive will do beautifully)
Gnuplot
ImageMagick
We also assume that the user has administrative privileges and knows how to install packages. The
examples below are carried out using the apt-get shell command, but they can be performed with
menu-based utilities like aptitude,dselect or the GUI-based program synaptic. Users of Linux
distributions which employ rpm packages (e.g. Red Hat/Fedora, Mandriva, SuSE) may want to refer
to the dependencies page on the gretl website.
The first step is installing the C compiler and related basic utilities, if these are not already in place.
On a Debian (or derivative) system, these are contained in a bunch of packages that can be installed
via the command
apt-get install gcc autoconf automake1.9 libtool flex bison gcc-doc \
libc6-dev libc-dev gfortran gettext pkgconfig
Then it is necessary to install the “development” (dev) packages for the libraries that gretl uses:
Library command
GLIB apt-get install libglib2.0-dev
GTK 3.0 apt-get install libgtk3.0-dev
PNG apt-get install libpng12-dev
XSLT apt-get install libxslt1-dev
LAPACK apt-get install liblapack-dev
FFTW apt-get install libfftw3-dev
READLINE apt-get install libreadline-dev
ZLIB apt-get install zlib1g-dev
XML apt-get install libxml2-dev
GMP apt-get install libgmp-dev
CURL apt-get install libcurl4-gnutls-dev
MPFR apt-get install libmpfr-dev
It is possible to substitute GTK 2.0 for GTK 3.0. The dev packages for these libraries are necessary
to compile gretl—you’ll also need the plain, non-dev library packages to run gretl, but most of these
should already be part of a standard installation. In order to enable other optional features, like
audio support, you may need to install more libraries.
The above steps can be much simplified on Linux systems that provide deb-based package managers, such as
Debian and its derivatives (Ubuntu, Knoppix and other distributions). The command
apt-get build-dep gretl
will download and install all the necessary packages for building the version of gretl that is currently present in
your APT sources. Technically, this does not guarantee that all the software necessary to build the git version is
included, because the version of gretl on your repository may be quite old and build requirements may have changed
in the meantime. However, the chances of a mismatch are rather remote for a reasonably up-to-date system, so in
most cases the above command should take care of everything correctly.
B.2 Getting the source: release or git
At this point, it is possible to build from the source. You have two options here: obtain the latest
released source package, or retrieve the current git version of gretl (git = the version control software
currently in use for gretl). The usual caveat applies to the git version, namely, that it may not build
correctly and may contain“experimental”code; on the other hand, git often contains bug-fixes relative
to the released version. If you want to help with testing and to contribute bug reports, we recommend
using git gretl.
To work with the released source:
1. Download the gretl source package from gretl.sourceforge.net.
2. Unzip and untar the package. On a system with the GNU utilities available, the command
would be tar xvfJ gretl-N.tar.xz (replace Nwith the specific version number of the file you
downloaded at step 1).
3. Change directory to the gretl source directory created at step 2 (e.g. gretl-2020a).
4. Proceed to the next section, “Configure and make”.
To work with git you’ll first need to install the git client program if it’s not already on your sys-
tem. Relevant resources you may wish to consult include the main git website at git-scm.com and
instructions specific to gretl: gretl git basics.
When grabbing the git sources for the first time, you should first decide where you want to store
the code. For example, you might create a directory called git under your home directory. Open a
terminal window, cd into this directory, and type the following commands:
git clone git://git.code.sf.net/p/gretl/git gretl-git
At this point git should create a subdirectory named gretl-git and fill it with the current sources.
When you want to update the source, this is very simple: just move into the gretl-git directory and
type
git pull
Assuming you’re now in the gretl-git directory, you can proceed in the same manner as with the
released source package.
B.3 Configure the source
The next command you need is ./configure; this is a complex script that detects which tools you
have on your system and sets things up. The configure command accepts many options; you may
want to run
./configure --help
first to see what options are available. One option you may wish to tweak is --prefix. By default the
installation goes under /usr/local but you can change this. For example
./configure --prefix=/usr
will put everything under the /usr tree.
Note that the recommended location to build gretl is not in the source directory. The way to achieve
that is quite simple: the invocation of the configure script has to take into account the relative path
to the source tree. So if your build directory is inside (underneath) the source tree it is
../configure
while if it is in a parallel tree it would be (something like):
../gretl-git/configure
If you have a multi-core machine you may want to activate support for OpenMP, which permits the
parallelization of matrix multiplication and some other tasks. This requires adding the configure
flag
--enable-openmp
By default the gretl GUI is built using version 3.0 of the GTK library, if available, otherwise version
2.0. If you have both versions installed and prefer to use GTK 2.0, use the flag
--enable-gtk2
In order to have the documentation built, we need to pass the relevant option to configure, as in
--enable-build-doc
But please note that this option will work only if you are using the git source.
In order to build the documentation, there is the possibility that you will have to install some extra software
on top of the packages mentioned in the previous section. For example, you may need some extra LaTeX packages
to compile the manuals. Two of the required packages that not every standard LaTeX installation includes are
typically pifont.sty and appendix.sty. You could install the corresponding packages from your distribution or you
could simply download them from CTAN and install them by hand.
Thus, for example, if you want to install under /usr, with OpenMP support, and also build the
documentation, you would do
./configure --prefix=/usr \
--enable-openmp \
--enable-build-doc
You will see a number of checks being run, and if everything goes according to plan, you should see
a summary similar to that displayed in Listing B.1.
If you’re using git, it’s a good idea to re-run the configure script after doing an update. This is not always
necessary, but sometimes it is, and it never does any harm. For this purpose, you may want to write a little shell
script that calls configure with any options you want to use.
B.4 Build and install
We are now ready to undertake the compilation proper: this is done by running the make command,
which takes care of compiling all the necessary source files in the correct order. All you need to do is
type
make
Listing B.1: Sample output from ./configure
Configuration:
Installation path: /usr
Use readline library: yes
Use gnuplot for graphs: yes
Use LaTeX for typesetting output: yes
Use libgsf for zip/unzip: no
sse2 support for RNG: yes
OpenMP support: yes
MPI support: no
AVX support for arithmetic: no
Build with GTK version: 2.0
Build gretl documentation: yes
Use Lucida fonts: no
Build message catalogs: yes
X-12-ARIMA support: yes
TRAMO/SEATS support: yes
libR support: yes
ODBC support: no
Experimental audio support: no
Use xdg-utils in installation: if DESTDIR not set
LAPACK libraries:
-llapack -lblas -lgfortran
Now type ’make’ to build gretl.
You can also do ’make pdfdocs’ to build the PDF documentation.
This step will likely take several minutes to complete; a lot of output will be produced on screen.
Once this is done, you can install your freshly baked copy of gretl on your system via
make install
On most systems, the make install command requires you to have administrative privileges. Hence,
either you log in as root before launching make install or you may want to use the sudo utility, as
in:
sudo make install
Now try if everything works: go back to your home directory and run gretl
cd ~
gretl &
If all is well, you ought to see gretl start, at which point just exit the program in the usual way. On
the other hand, there is the possibility that gretl doesn’t start and instead you see a message like
/usr/local/bin/gretl_x11: error while loading shared libraries:
libgretl-1.0.so.0: cannot open shared object file: No such file or directory
In this case, just run
sudo ldconfig
The problem should be fixed once and for all.
Appendix C
Numerical accuracy
Gretl uses double-precision arithmetic throughout—except for the multiple-precision plugin invoked
by the menu item “Model, Other linear models, High precision OLS” which represents floating-point
values using a number of bits given by the environment variable GRETL_MP_BITS (default value 256).
The normal equations of Least Squares are by default solved via Cholesky decomposition, which is
highly accurate provided the matrix of cross-products of the regressors, X'X, is not very ill
conditioned. If this problem is detected, gretl automatically switches to use QR decomposition.
The program has been tested rather thoroughly on the statistical reference datasets provided by NIST
(the U.S. National Institute of Standards and Technology) and a full account of the results may be
found on the gretl website (follow the link “Numerical accuracy”).
To date, two published reviews have discussed gretl’s accuracy: Giovanni Baiocchi and Walter Distaso
(2003), and Talha Yalta and Yasemin Yalta (2007). We are grateful to these authors for their careful
examination of the program. Their comments have prompted several modifications including the use
of Stephen Moshier’s cephes code for computing p-values and other quantities relating to probability
distributions (see netlib.org), changes to the formatting of regression output to ensure that the pro-
gram displays a consistent number of significant digits, and attention to compiler issues in producing
the MS Windows version of gretl (which at one time was slightly less accurate than the Linux version).
Gretl now includes a “plugin” that runs the NIST linear regression test suite. You can find this under
the “Tools” menu in the main window. When you run this test, the introductory text explains the
expected result. If you run this test and see anything other than the expected result, please send a
bug report to cottrell@wfu.edu.
All regression statistics are printed to 6 significant figures in the current version of gretl (except when
the multiple-precision plugin is used, in which case results are given to 12 figures). If you want to
examine a particular value more closely, first save it (for example, using the genr command) then
print it using printf, to as many digits as you like (see the Gretl Command Reference).
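For example (a minimal sketch using the error sum of squares from a simple regression):
open data4-1
ols price 0 sqft --quiet
scalar ssr = $ess                  # save the statistic of interest
printf "SSR = %.12g\n", ssr        # display 12 significant digits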
Appendix D
Related free software
Gretl’s capabilities are substantial, and are expanding. Nonetheless you may find there are some
things you can’t do in gretl, or you may wish to compare results with other programs. If you are
looking for complementary functionality in the realm of free, open-source software we recommend the
following programs. The self-description of each program is taken from its website.
GNU R r-project.org: “R is a system for statistical computation and graphics. It consists of
a language plus a run-time environment with graphics, a debugger, access to certain system
functions, and the ability to run programs stored in script files. . . It compiles and runs on a
wide variety of UNIX platforms, Windows and MacOS.” Comment: There are numerous add-on
packages for R covering most areas of statistical work.
GNU Octave www.octave.org: “GNU Octave is a high-level language, primarily intended for
numerical computations. It provides a convenient command line interface for solving linear
and nonlinear problems numerically, and for performing other numerical experiments using a
language that is mostly compatible with Matlab. It may also be used as a batch-oriented
language.”
Julia julialang.org: “Julia is a high-level, high-performance dynamic programming language
for technical computing, with syntax that is familiar to users of other technical computing
environments. It provides a sophisticated compiler, distributed parallel execution, numerical
accuracy, and an extensive mathematical function library.”
JMulTi www.jmulti.de: “JMulTi was originally designed as a tool for certain econometric pro-
cedures in time series analysis that are especially difficult to use and that are not available
in other packages, like Impulse Response Analysis with bootstrapped confidence intervals for
VAR/VEC modelling. Now many other features have been integrated as well to make it possible
to convey a comprehensive analysis.” Comment: JMulTi is a java GUI program: you need a
java run-time environment to make use of it.
As mentioned above, gretl offers the facility of exporting data in the formats of both Octave and R.
In the case of Octave, the gretl data set is saved as a single matrix, X. You can pull the X matrix
apart if you wish, once the data are loaded in Octave; see the Octave manual for details. As for R,
the exported data file preserves any time series structure that is apparent to gretl. The series are
saved as individual structures. The data should be brought into R using the source() command.
In addition, gretl has a convenience function for moving data quickly into R. Under gretl’s “Tools”
menu, you will find the entry “Start GNU R”. This writes out an R version of the current gretl data
set (in the user’s gretl directory), and sources it into a new R session. The particular way R is
invoked depends on the internal gretl variable Rcommand, whose value may be set under the “Tools,
Preferences” menu. The default command is RGui.exe under MS Windows. Under X it is xterm -e
R. Please note that at most three space-separated elements in this command string will be processed;
any extra elements are ignored.
Appendix E
Listing of URLs
Below is a listing of the full URLs of websites mentioned in the text.
Estima (RATS) http://www.estima.com/
FFTW3 http://www.fftw.org/
Gnome desktop homepage http://www.gnome.org/
GNU Multiple Precision (GMP) library http://gmplib.org/
CURL library http://curl.haxx.se/libcurl/
GNU Octave homepage http://www.octave.org/
GNU R homepage http://www.r-project.org/
GNU R manual http://cran.r-project.org/doc/manuals/R-intro.pdf
Gnuplot homepage http://www.gnuplot.info/
Gretl data page http://gretl.sourceforge.net/gretl_data.html
Gretl homepage http://gretl.sourceforge.net/
GTK+ homepage http://www.gtk.org/
GTK+ port for win32 https://wiki.gnome.org/Projects/GTK/Win32/
InfoZip homepage http://www.info-zip.org/pub/infozip/zlib/
JMulTi homepage http://www.jmulti.de/
JRSoftware http://www.jrsoftware.org/
Julia homepage http://julialang.org/
Mingw (gcc for win32) homepage http://www.mingw.org/
Minpack http://www.netlib.org/minpack/
Penn World Table http://pwt.econ.upenn.edu/
Readline homepage http://cnswww.cns.cwru.edu/~chet/readline/rltop.html
Readline manual http://cnswww.cns.cwru.edu/~chet/readline/readline.html
Xmlsoft homepage http://xmlsoft.org/
Bibliography
Akaike, H. (1974) ‘A new look at the statistical model identification’, IEEE Transactions on Auto-
matic Control AC-19: 716–723.
Anderson, T. W. and C. Hsiao (1981) ‘Estimation of dynamic models with error components’,
Journal of the American Statistical Association 76: 598–606.
Andrews, D. W. K. and J. C. Monahan (1992) ‘An improved heteroskedasticity and autocorrelation
consistent covariance matrix estimator’, Econometrica 60: 953–966.
Arellano, M. (2003) Panel Data Econometrics, Oxford: Oxford University Press.
Arellano, M. and S. Bond (1991) ‘Some tests of specification for panel data: Monte carlo evidence
and an application to employment equations’, The Review of Economic Studies 58: 277–297.
Armesto, M. T., K. Engemann and M. Owyang (2010) ‘Forecasting with mixed frequencies’, Fed-
eral Reserve Bank of St. Louis Review 92(6): 521–536. URL http://research.stlouisfed.org/
publications/review/10/11/Armesto.pdf.
Baiocchi, G. and W. Distaso (2003) ‘GRETL: Econometric software for the GNU generation’, Journal
of Applied Econometrics 18: 105–110.
Baltagi, B. H. (1995) Econometric Analysis of Panel Data, New York: Wiley.
Baltagi, B. H. and Y.-J. Chang (1994) ‘Incomplete panels: A comparative study of alternative
estimators for the unbalanced one-way error component regression model’, Journal of Econometrics
62: 67–89.
Baltagi, B. H. and Q. Li (1990) ‘A lagrange multiplier test for the error components model with
incomplete panels’, Econometric Reviews 9: 103–107.
Baltagi, B. H. and P. X. Wu (1999) ‘Unequally spaced panel data regressions with AR(1) distur-
bances’, Econometric Theory 15: 814–823.
Barrodale, I. and F. D. K. Roberts (1974) ‘Solution of an overdetermined system of equations in the
ℓ1 norm’, Communications of the ACM 17: 319–320.
Baxter, M. and R. G. King (1999) ‘Measuring business cycles: Approximate band-pass filters for
economic time series’, The Review of Economics and Statistics 81(4): 575–593.
Beck, N. and J. N. Katz (1995) ‘What to do (and not to do) with time-series cross-section data’,
The American Political Science Review 89: 634–647.
Bera, A. K., C. M. Jarque and L. F. Lee (1984) ‘Testing the normality assumption in limited
dependent variable models’, International Economic Review 25: 563–578.
Berndt, E., B. Hall, R. Hall and J. Hausman (1974) ‘Estimation and inference in nonlinear structural
models’, Annals of Economic and Social Measurement 3(4): 653–665.
Bhargava, A., L. Franzini and W. Narendranathan (1982) ‘Serial correlation and the fixed effects
model’, Review of Economic Studies 49: 533–549.
Blundell, R. and S. Bond (1998) ‘Initial conditions and moment restrictions in dynamic panel data
models’, Journal of Econometrics 87: 115–143.
Bond, S., A. Hoeffler and J. Temple (2001) ‘GMM estimation of empirical growth models’. Economics
Papers from Economics Group, Nuffield College, University of Oxford, No 2001-W21.
Boswijk, H. P. (1995) ‘Identifiability of cointegrated systems’. Tinbergen Institute Discussion Paper
95-78. URL http://www.ase.uva.nl/pp/bin/258fulltext.pdf.
Boswijk, H. P. and J. A. Doornik (2004) ‘Identifying, estimating and testing restricted cointegrated
systems: An overview’, Statistica Neerlandica 58(4): 440–465.
Bournay, J. and G. Laroque (1979) ‘Réflexions sur la méthode d’élaboration des comptes trimestriels’,
Annales de l’INSÉE (36): 3–30. URL http://www.jstor.com/stable/20075332.
Box, G. E. P. and G. Jenkins (1976) Time Series Analysis: Forecasting and Control, San Franciso:
Holden-Day.
Brand, C. and N. Cassola (2004) ‘A money demand system for euro area M3’, Applied Economics
36(8): 817–838.
Butterworth, S. (1930) ‘On the theory of filter amplifiers’, Experimental Wireless & The Wireless
Engineer 7: 536–541.
Byrd, R. H., P. Lu, J. Nocedal and C. Zhu (1995) ‘A limited memory algorithm for bound constrained
optimization’, SIAM Journal on Scientific Computing 16(5): 1190–1208.
Cameron, A. C., J. B. Gelbach and D. L. Miller (2011) ‘Robust inference with multiway clustering’,
Journal of Business & Economic Statistics 29(2): 238–249.
Cameron, A. C. and D. L. Miller (2015) ‘A practitioner’s guide to cluster-robust inference’, Journal
of Human Resources 50(2): 317–373.
Cameron, A. C. and P. K. Trivedi (1986) ‘Econometric models based on count data: comparisons
and applications of some estimators and tests’, Journal of Applied Econometrics 1: 29–54.
(1998) Regression Analysis of Count Data, Cambridge: Cambridge University Press.
(2005) Microeconometrics, Methods and Applications, Cambridge: Cambridge University
Press.
(2013) Regression Analysis of Count Data, Cambridge University Press.
Caselli, F., G. Esquivel and F. Lefort (1996) ‘Reopening the convergence debate: A new look at
cross-country growth empirics’, Journal of Economic Growth 1(3): 363–389.
Chesher, A. and M. Irish (1987) ‘Residual analysis in the grouped and censored normal linear model’,
Journal of Econometrics 34: 33–61.
Choi, I. (2001) ‘Unit root tests for panel data’, Journal of International Money and Finance 20(2):
249–272.
Cholette, P. A. (1984) ‘Adjusting sub-annual series to yearly benchmarks’, Survey Methodology 10(1):
35–49. URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1984001/article/14348-eng.
pdf.
Chow, G. C. and A.-l. Lin (1971) ‘Best linear unbiased interpolation, distribution, and extrapolation
of time series by related series’, The Review of Economics and Statistics 53(4): 372–375. URL
https://www.jstor.org/stable/1928739.
Cleveland, W. S. (1979) ‘Robust locally weighted regression and smoothing scatterplots’, Journal of
the American Statistical Association 74(368): 829–836.
Cottrell, A. (2017) ‘Random effects estimators for unbalanced panel data: a Monte Carlo analysis’.
gretl working papers, number 4. URL https://ideas.repec.org/p/anc/wgretl/4.html.
Cottrell, A. and R. Lucchetti (2016) Gretl Function Package Guide, gretl documentation. URL
http://sourceforge.net/projects/gretl/files/manual/.
Cribari-Neto, F. and S. G. Zarkos (2003) ‘Econometric and statistical computing using Ox’, Com-
putational Economics 21: 277–295.
Datta, D. D. and W. Du (2012) ‘Nonparametric HAC estimation for time series data with miss-
ing observations’. Board of Governors of the Federal Reserve System, International Finance Dis-
cussion Papers, Number 1060. URL https://www.federalreserve.gov/pubs/ifdp/2012/1060/
ifdp1060.pdf.
Davidson, R. and E. Flachaire (2001) ‘The wild bootstrap, tamed at last’. GREQAM Document de
Travail 99A32. URL http://russell.vcharite.univ-mrs.fr/GMMboot/wild5-euro.pdf.
Davidson, R. and J. G. MacKinnon (1993) Estimation and Inference in Econometrics, New York:
Oxford University Press.
(2004) Econometric Theory and Methods, New York: Oxford University Press.
Denton, F. T. (1971) ‘Adjustment of monthly or quarterly series to annual totals: An approach
based on quadratic minimization’, Journal of the American Statistical Association 66(333): 99–102.
URL http://www.jstor.com/stable/2284856.
Di Fonzo, T. (2003) ‘Benchmarking di serie storiche economiche. Nota tecnica ed estensioni’. Working
paper, Università degli Studi di Padova. URL http://paduaresearch.cab.unipd.it/7302/1/WP_
2003_10.pdf.
Di Fonzo, T. and M. Marini (2012) ‘On the extrapolation with the Denton proportional benchmark-
ing method’. IMF Working Paper WP/12/169. URL https://www.imf.org/external/pubs/ft/
wp/2012/wp12169.pdf.
Doornik, J. A. (1995) ‘Testing general restrictions on the cointegrating space’. Discussion Paper,
Nuffield College. URL http://www.doornik.com/research/coigen.pdf.
(1998) ‘Approximations to the asymptotic distribution of cointegration tests’, Journal of
Economic Surveys 12: 573–593. Reprinted with corrections in McAleer and Oxley (1999).
(2007) Object-Oriented Matrix Programming Using Ox, London: Timberlake Consultants
Press, third edn. URL http://www.doornik.com.
Doornik, J. A., M. Arellano and S. Bond (2006) Panel Data estimation using DPD for Ox.
Doornik, J. A. and H. Hansen (1994) ‘An omnibus test for univariate and multivariate normality’.
Working paper, Nuffield College, Oxford.
Driscoll, J. C. and A. C. Kraay (1998) ‘Consistent covariance matrix estimation with spatially
dependent panel data’, Review of Economics and Statistics 80(4): 549–560. URL https://www.
jstor.org/stable/2646837.
Durbin, J. and S. J. Koopman (2012) Time Series Analysis by State Space Methods, Oxford: Oxford
University Press, second edn.
Elliott, G., T. J. Rothenberg and J. H. Stock (1996) ‘Efficient tests for an autoregressive unit root’,
Econometrica 64: 813–836.
Engle, R. F. and C. W. J. Granger (1987) ‘Co-integration and error correction: Representation,
estimation, and testing’, Econometrica 55: 251–276.
Fernández, R. B. (1981) ‘A methodological note on the estimation of time series’, The Review of
Economics and Statistics 63(3): 471–476. URL https://www.jstor.org/stable/1924371.
Fiorentini, G., G. Calzolari and L. Panattoni (1996) ‘Analytic derivatives and the computation of
GARCH estimates’, Journal of Applied Econometrics 11: 399–417.
Frigo, M. and S. G. Johnson (2005) ‘The design and implementation of FFTW3’, Proceedings of the
IEEE 93 2: 216–231.
Ghysels, E. (2015) ‘MIDAS Matlab Toolbox’. University of North Carolina, Chapel Hill. URL
http://www.unc.edu/~eghysels/papers/MIDAS_Usersguide_V1.0.pdf.
Ghysels, E. and H. Qian (2016) ‘Estimating MIDAS regressions via OLS with polynomial parameter
profiling’. University of North Carolina, Chapel Hill, and MathWorks. URL http://dx.doi.org/
10.2139/ssrn.2837798.
Ghysels, E., P. Santa-Clara and R. Valkanov (2004) ‘The MIDAS touch: Mixed data sampling
regression models’. Série Scientifique, CIRANO, Montréal. URL http://www.cirano.qc.ca/files/
publications/2004s-20.pdf.
Golub, G. H. and C. F. Van Loan (1996) Matrix Computations, Baltimore and London: The John
Hopkins University Press, third edn.
Goossens, M., F. Mittelbach and A. Samarin (2004) The LaTeX Companion, Boston: Addison-Wesley,
second edn.
Gould, W. (2013) ‘Interpreting the intercept in the fixed-effects model’. URL http://www.stata.
com/support/faqs/statistics/intercept-in-fixed-effects-model/.
Gourieroux, C., A. Monfort, E. Renault and A. Trognon (1987) ‘Generalized residuals’, Journal of
Econometrics 34: 5–32.
Greene, W. H. (2000) Econometric Analysis, Upper Saddle River, NJ: Prentice-Hall, fourth edn.
(2003) Econometric Analysis, Upper Saddle River, NJ: Prentice-Hall, fifth edn.
Hall, A. D. (2005) Generalized Method of Moments, Oxford: Oxford University Press.
Hamilton, J. D. (1994) Time Series Analysis, Princeton, NJ: Princeton University Press.
Hannan, E. J. and B. G. Quinn (1979) ‘The determination of the order of an autoregression’, Journal
of the Royal Statistical Society, B 41: 190–195.
Hansen, L. P. (1982) ‘Large sample properties of generalized method of moments estimation’, Econo-
metrica 50(4): 1029–1054.
Hansen, L. P. and K. J. Singleton (1982) ‘Generalized instrumental variables estimation of nonlinear
rational expectations models’, Econometrica 50: 1269–1286.
Harvey, A. C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge:
Cambridge University Press.
Harvey, A. C. and A. Jaeger (1993) ‘Detrending, stylized facts and the business cycle’, Journal of
Applied Econometrics 8(3): 231–247.
Hausman, J. A. (1978) ‘Specification tests in econometrics’, Econometrica 46: 1251–1271.
Heckman, J. (1979) ‘Sample selection bias as a specification error’, Econometrica 47: 153–161.
Helske, J. (2017) ‘KFAS: Exponential family state space models in R’, Journal of Statistical Software
78(10): 1–39. URL https://doi.org/10.18637/jss.v078.i10.
Hodrick, R. and E. C. Prescott (1997) ‘Postwar U.S. business cycles: An empirical investigation’,
Journal of Money, Credit and Banking 29: 1–16.
Hoechle, D. (2007) ‘Robust standard errors for panel regressions with cross-sectional dependence’,
The Stata Journal 7(3): 281–312. URL https://doi.org/10.1177/1536867X0700700301.
Im, K. S., M. H. Pesaran and Y. Shin (2003) ‘Testing for unit roots in heterogeneous panels’, Journal
of Econometrics 115: 53–74.
Islam, N. (1995) ‘Growth Empirics: A Panel Data Approach’, The Quarterly Journal of Economics
110(4): 1127–1170.
Johansen, S. (1995) Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, Ox-
ford: Oxford University Press.
de Jong, P. (1991) ‘The diffuse Kalman filter’, The Annals of Statistics 19: 1073–1083.
de Jong, P. and S. Chu-Chun-Lin (2003) ‘Smoothing with an unknown initial condition’, Journal of
Time Series Analysis 24(2): 141–148.
Kalbfleisch, J. D. and R. L. Prentice (2002) The Statistical Analysis of Failure Time Data, New
York: Wiley, second edn.
Keane, M. P. and K. I. Wolpin (1997) ‘The career decisions of young men’, Journal of Political
Economy 105: 473–522.
King, R. G. and S. T. Rebelo (1993) ‘Low frequency filtering and real business cycles’, Journal of
Economic Dynamics and Control 17(1-2): 207–231.
Klein, P. (2000) ‘Using the generalized Schur form to solve a multivariate linear rational expectations
model’, Journal of Economic Dynamics and Control 24(10): 1405–1423.
Koenker, R. (1994) ‘Confidence intervals for regression quantiles’. In P. Mandl and M. Huskova
(eds.), Asymptotic Statistics, pp. 349–359. New York: Springer-Verlag.
Koenker, R. and G. Bassett (1978) ‘Regression quantiles’, Econometrica 46: 33–50.
Koenker, R. and K. Hallock (2001) ‘Quantile regression’, Journal of Economic Perspectives 15(4):
143–156.
Koenker, R. and J. Machado (1999) ‘Goodness of fit and related inference processes for quantile
regression’, Journal of the American Statistical Association 94: 1296–1310.
Koenker, R. and Q. Zhao (1994) ‘L-estimation for linear heteroscedastic models’, Journal of Non-
parametric Statistics 3: 223–235.
Koopman, S. J. (1993) ‘Disturbance smoother for state space models’, Biometrika 80: 117–126.
Koopman, S. J., N. Shephard and J. A. Doornik (1999) ‘Statistical algorithms for models in state
space using SsfPack 2.2’, Econometrics Journal 2: 107–160.
Kwiatkowski, D., P. C. B. Phillips, P. Schmidt and Y. Shin (1992) ‘Testing the null of stationarity
against the alternative of a unit root: How sure are we that economic time series have a unit root?’,
Journal of Econometrics 54: 159–178.
Levin, A., C.-F. Lin and J. Chu (2002) ‘Unit root tests in panel data: asymptotic and finite-sample
properties’, Journal of Econometrics 108: 1–24.
Lucchetti, R. (2011) ‘State space methods in gretl’, Journal of Statistical Software 41(11): 1–22.
Lucchetti, R., L. Papi and A. Zazzaro (2001) ‘Banks’ inefficiency and economic growth: A micro
macro approach’, Scottish Journal of Political Economy 48: 400–424.
Lütkepohl, H. (2005) New Introduction to Multiple Time Series Analysis, Berlin: Springer.
MacKinnon, J. G. (1996) ‘Numerical distribution functions for unit root and cointegration tests’,
Journal of Applied Econometrics 11: 601–618.
MacKinnon, J. G. and H. White (1985) ‘Some heteroskedasticity-consistent covariance matrix esti-
mators with improved finite sample properties’, Journal of Econometrics 29: 305–325.
Magnus, J. R. and H. Neudecker (1988) Matrix Differential Calculus with Applications in Statistics
and Econometrics, John Wiley & Sons.
McAleer, M. and L. Oxley (1999) Practical Issues in Cointegration Analysis, Oxford: Blackwell.
McCullagh, P. and J. A. Nelder (1983) Generalized Linear Models, London and New York: Chapman
and Hall.
McCullough, B. D. and C. G. Renfro (1998) ‘Benchmarks and software standards: A case study of
GARCH procedures’, Journal of Economic and Social Measurement 25: 59–71.
Mélard, G. (1984) ‘Algorithm AS 197: A Fast Algorithm for the Exact Maximum Likelihood of
Autoregressive-Moving Average Models’, Journal of the Royal Statistical Society, Series C (Applied
Statistics) 33(1): 104–114.
Morales, J. L. and J. Nocedal (2011) ‘Remark on Algorithm 778: L-BFGS-B: Fortran routines for
large-scale bound constrained optimization’, ACM Transactions on Mathematical Software 38(1):
1–4.
Mroz, T. (1987) ‘The sensitivity of an empirical model of married women’s hours of work to economic
and statistical assumptions’, Econometrica 55: 765–799. URL https://doi.org/10.2307/1911029.
Nadaraya, E. A. (1964) ‘On estimating regression’, Theory of Probability and its Applications 9:
141–142.
Nash, J. C. (1990) Compact Numerical Methods for Computers: Linear Algebra and Function Min-
imisation, Bristol: Adam Hilger, second edn.
Nerlove, M. (1971) ‘Further evidence on the estimation of dynamic economic relations from a time
series of cross sections’, Econometrica 39: 359–382.
(1999) ‘Properties of alternative estimators of dynamic panel models: An empirical analysis
of cross-country data for the study of economic growth’. In C. Hsiao, K. Lahiri, L.-F. Lee and M. H.
Pesaran (eds.), Analysis of Panels and Limited Dependent Variable Models. Cambridge: Cambridge
University Press.
Newey, W. K. and K. D. West (1987) ‘A simple, positive semi-definite, heteroskedasticity and auto-
correlation consistent covariance matrix’, Econometrica 55: 703–708.
(1994) ‘Automatic lag selection in covariance matrix estimation’, Review of Economic Stud-
ies 61: 631–653.
Okui, R. (2009) ‘The optimal choice of moments in dynamic panel data models’, Journal of Econo-
metrics 151(1): 1–16.
Parzen, E. (1963) ‘On spectral analysis with missing observations and amplitude modulation’,
Sankhyā: The Indian Journal of Statistics, Series A 25(4): 383–392.
Pelagatti, M. (2011) ‘State space methods in Ox/SsfPack’, Journal of Statistical Software 41(3):
1–25.
Pollock, D. S. G. (2000) ‘Trend estimation and de-trending via rational square-wave filters’, Journal
of Econometrics 99(2): 317–334.
Portnoy, S. and R. Koenker (1997) ‘The Gaussian hare and the Laplacian tortoise: computability of
squared-error versus absolute-error estimators’, Statistical Science 12(4): 279–300.
Press, W., S. Teukolsky, W. Vetterling and B. Flannery (2007) Numerical Recipes: The Art of
Scientific Computing, Cambridge University Press, third edn.
Ramanathan, R. (2002) Introductory Econometrics with Applications, Fort Worth: Harcourt, fifth
edn.
Rao, C. R. (1973) Linear Statistical Inference and its Applications, New York: Wiley, second edn.
Rho, S.-H. and T. J. Vogelsang (2018) ‘Heteroskedasticity autocorrelation robust inference in time
series regressions with missing data’, Econometric Theory 35(3): 601–629. URL https://doi.org/
10.1017/S0266466618000117.
Roodman, D. (2009a) ‘How to do xtabond2: An introduction to difference and system GMM in
Stata’, The Stata Journal 9: 86–136. URL https://doi.org/10.1177/1536867X0900900106.
(2009b) ‘A note on the theme of too many instruments’, Oxford Bulletin of Economics and
Statistics 71: 135–158. URL https://doi.org/10.1111/j.1468-0084.2008.00542.x.
Sargan, J. D. (1958) ‘The estimation of economic relationships using instrumental variables’, Econo-
metrica 26(3): 393–415. URL https://doi.org/10.2307/1907619.
Schwarz, G. (1978) ‘Estimating the dimension of a model’, Annals of Statistics 6: 461–464.
Sephton, P. S. (1995) ‘Response surface estimates of the KPSS stationarity test’, Economics Letters
47: 255–261.
Shumway, R. H. and D. S. Stoffer (2017) Time Series Analysis and Its Applications: With R Examples,
Springer, fourth edn.
Sims, C. A. (1980) ‘Macroeconomics and reality’, Econometrica 48: 1–48.
Steinhaus, S. (1999) ‘Comparison of mathematical programs for data analysis (edition 3)’. University
of Frankfurt. URL http://www.informatik.uni-frankfurt.de/~stst/ncrunch/.
Stock, J. H. and M. W. Watson (1999) ‘Forecasting inflation’, Journal of Monetary Economics 44(2):
293–335.
(2003) Introduction to Econometrics, Boston: Addison-Wesley.
(2008) ‘Heteroskedasticity-robust standard errors for fixed effects panel data regression’,
Econometrica 76(1): 155–174.
Stokes, H. H. (2004) ‘On the advantage of using two or more econometric software systems to solve
the same problem’, Journal of Economic and Social Measurement 29: 307–320.
Swamy, P. A. V. B. and S. S. Arora (1972) ‘The exact finite sample properties of the estimators of
coefficients in the error components regression models’, Econometrica 40: 261–275.
Theil, H. (1961) Economic Forecasting and Policy, Amsterdam: North-Holland.
(1966) Applied Economic Forecasting, Amsterdam: North-Holland.
Verbeek, M. (2004) A Guide to Modern Econometrics, New York: Wiley, second edn.
Watson, G. S. (1964) ‘Smooth regression analysis’, Sankhyā: The Indian Journal of Statistics, Series A 26: 359–372.
White, H. (1980) ‘A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity’, Econometrica 48: 817–838.
Windmeijer, F. (2005) ‘A finite sample correction for the variance of linear efficient two-step GMM
estimators’, Journal of Econometrics 126: 25–51.
Wooldridge, J. M. (2002a) Econometric Analysis of Cross Section and Panel Data, Cambridge, MA:
MIT Press.
(2002b) Introductory Econometrics: A Modern Approach, Mason, OH: South-Western, sec-
ond edn.
Yalta, A. T. and A. Y. Yalta (2007) ‘GRETL 1.6.0 and its numerical accuracy’, Journal of Applied
Econometrics 22: 849–854.
Zhu, C., R. H. Byrd and J. Nocedal (1997) ‘Algorithm 778: L-BFGS-B: Fortran routines for large-
scale bound-constrained optimization’, ACM Transactions on Mathematical Software 23(4): 550–
560.