Book · PDF Available

Semiparametric Regression

Authors: David Ruppert, M. P. Wand, R. J. Carroll

Abstract and Figures

Semiparametric regression is concerned with the flexible incorporation of non-linear functional relationships in regression analyses. Any application area that benefits from regression analysis can also benefit from semiparametric regression. Assuming only a basic familiarity with ordinary parametric regression, this user-friendly book explains the techniques and benefits of semiparametric regression in a concise and modular fashion. The authors make liberal use of graphics and examples, plus case studies taken from environmental, financial, and other applications. They include practical advice on implementation and pointers to relevant software. The 2003 book is suitable as a textbook for students with little background in regression, as well as a reference for statistically oriented scientists, such as biostatisticians, econometricians, quantitative social scientists, and epidemiologists, who have a good working knowledge of regression and the desire to begin using more flexible semiparametric models. Even experts on semiparametric regression should find something new here.
... For complicated variational regression models like additive models, performing both the modeling and inference in the same framework would be preferable to a more computationally burdensome approach of bootstrapping or running a separate, fully Bayesian estimation procedure. One such model is the additive model, which typically refers to a regression model with a scalar outcome and an additive smoothed or semiparametric effect, or potentially a set of such effects, in addition to scalar covariates. Additive models with smoothed effects have been broadly studied; see texts by Ruppert et al. [2003] or Wood [2017], for example. Variational Bayesian approaches that incorporate smoothing include work by Luts and Wand [2015], Lee and Wand [2016], Wand [2017], Hui et al. [2019], Yang and Yang [2024] and references therein. ...
... In this manuscript, we explore MFVB approximations to variational additive models for both Gaussian and binary outcomes, developing Coordinate Ascent Variational Inference (CAVI) algorithms to perform estimation (Section 2). These models can incorporate an arbitrary number of both smoothed and functional effects and thus include semiparametric regression models (in the Ruppert et al. [2003] sense), scalar-on-function regression models (of the form described by Ramsay and Dalzell [1991] and Ramsay and Silverman [2005]), and the combination of the two. The target of inference we consider is either a smoothed effect or a functional effect, both of which can be represented as smoothed effects. ...
... The lidar data set is a classic example used by Ruppert et al. [2003] to illustrate semiparametric regression. It comprises 221 observations from a light detection and ranging experiment measuring the log of the ratio of light received from two laser sources. ...
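As a minimal sketch of the kind of fit this snippet refers to, the following Python/NumPy code fits a penalized spline to synthetic lidar-like data. The data-generating curve, knot count, and smoothing parameter are illustrative assumptions, not the book's analysis.

```python
# Illustrative only: penalized spline fit on synthetic lidar-like data, using a
# linear truncated power basis with knots at sample quantiles and a fixed
# smoothing parameter (data, f_true, and lam are made up for this sketch).
import numpy as np

rng = np.random.default_rng(0)
n = 221
x = np.linspace(390, 720, n)                       # "range" covariate
f_true = -0.7 / (1 + np.exp(-(x - 600) / 25))      # smooth decreasing trend
y = f_true + rng.normal(scale=0.08, size=n)        # noisy "log ratio"

K = 20                                             # number of knots
knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
C = np.column_stack([np.ones(n), x] +
                    [np.maximum(x - k, 0) for k in knots])   # [1, x, (x-k)_+]
D = np.diag([0.0, 0.0] + [1.0] * K)                # penalize only spline coefs
lam = 100.0                                        # fixed here; GCV/REML in practice

beta = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
fhat = C @ beta                                    # fitted smooth curve
print(np.round(np.mean((fhat - f_true) ** 2), 5))
```

In practice the smoothing parameter would be selected by GCV or REML rather than fixed by hand.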
Preprint
Full-text available
Variational regression methods are an increasingly popular tool because of their efficient estimation of complex models. Given the mixed model representation of penalized effects, additive regression models with smoothed effects and scalar-on-function regression models can be fit relatively efficiently in a variational framework. However, inferential procedures for smoothed and functional effects in such a context are limited. We demonstrate that by using the Mean Field Variational Bayesian (MFVB) approximation to the additive model and the subsequent Coordinate Ascent Variational Inference (CAVI) algorithm, we can obtain a form of the estimated effects required by a Frequentist test for semiparametric curves. We establish MFVB approximations and CAVI algorithms for both Gaussian and binary additive models with an arbitrary number of smoothed and functional effects. We then derive a global testing framework for smoothed and functional effects. Our empirical study demonstrates that the test maintains good Frequentist properties in the variational framework and can be used to directly test results from a converged MFVB approximation and CAVI algorithm. We illustrate the applicability of this approach in a wide range of data examples.
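For readers unfamiliar with CAVI, here is a toy sketch of the coordinate-ascent idea on a simple conjugate Gaussian model. It is not the paper's additive-model algorithm; the model, priors, and data are made up for illustration.

```python
# Toy CAVI: mean-field VB for y_i ~ N(mu, 1/tau), mu ~ N(0, 1/kappa0),
# tau ~ Gamma(a0, b0). q(mu) = N(m, s2) and q(tau) = Gamma(a, b) are updated
# in turn until convergence (coordinate ascent on the ELBO).
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=0.5, size=100)
n, ybar = y.size, y.mean()
kappa0, a0, b0 = 1e-2, 1e-2, 1e-2            # vague priors (made-up values)

m, s2, a, b = 0.0, 1.0, a0, b0
for _ in range(50):
    e_tau = a / b                            # E_q[tau]
    prec = kappa0 + n * e_tau                # update q(mu)
    m = n * e_tau * ybar / prec
    s2 = 1.0 / prec
    a = a0 + n / 2.0                         # update q(tau)
    b = b0 + 0.5 * (np.sum((y - m) ** 2) + n * s2)

print(m, np.sqrt(s2))                        # approximate posterior mean/sd of mu
print(a / b)                                 # approximate posterior mean of tau
```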
... But to reduce the computational burden of the smoothing spline approach, the number of knots is typically much smaller than the number of distinct time points. The coefficients are again estimated using a penalty to avoid overfitting [9,10]. Many types of spline bases exist, such as a truncated power basis and a B-spline basis [11]. ...
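As a hedged illustration of the B-spline alternative mentioned in this excerpt, the sketch below builds a cubic B-spline basis on equally spaced knots with a second-order difference penalty (an Eilers-Marx style P-spline). The knot layout, data, and smoothing parameter are assumptions for the example, not taken from the cited work.

```python
# Cubic B-spline basis plus second-order difference penalty (illustrative).
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

k, nseg = 3, 20                                  # cubic splines, 20 segments
h = 1.0 / nseg
knots = np.arange(-k, nseg + k + 1) * h          # equally spaced, extended knots
nbasis = len(knots) - k - 1
B = np.column_stack([BSpline(knots, np.eye(nbasis)[j], k)(x)
                     for j in range(nbasis)])    # basis evaluated at x

D = np.diff(np.eye(nbasis), n=2, axis=0)         # 2nd-order difference penalty
lam = 1.0                                        # smoothing parameter (assumed)
alpha = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fhat = B @ alpha
print(fhat[:5])
```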
... Furthermore, serial correlation was taken into account by assuming first-order autoregressive errors for each country [10,19]. We modeled heteroscedasticity by allowing the residual variances and autocorrelation parameters to be country-specific [20,21], which at the same time accommodates differing country sizes. ...
... First, a model for each of the two outcomes was built. Since the longitudinal trajectories of both responses are nonlinear, we made use of P-splines [10,12,24]. A general nonparametric model using splines can be formulated as ...
Article
Full-text available
One of the key tools to understand and reduce the spread of the SARS-CoV-2 virus is testing. The total number of tests, the number of positive tests, the number of negative tests, and the positivity rate are interconnected indicators and vary with time. To better understand the relationship between these indicators, against the background of an evolving pandemic, the association between the number of positive tests and the number of negative tests is studied using a joint modeling approach. All countries in the European Union, Switzerland, the United Kingdom, and Norway are included in the analysis. We propose a joint penalized spline model in which the penalized spline is reparameterized as a linear mixed model. The model allows for flexible trajectories by smoothing the country-specific deviations from the overall penalized spline and accounts for heteroscedasticity by allowing the autocorrelation parameters and residual variances to vary among countries. The association between the number of positive tests and the number of negative tests is derived from the joint distribution for the random intercepts and slopes. The correlation between the random intercepts and the correlation between the random slopes were both positive. This suggests that, when countries increase their testing capacity, both the number of positive tests and negative tests will increase. A significant correlation was found between the random intercepts, but the correlation between the random slopes was not significant due to a wide credible interval.
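The reparameterization of a penalized spline as a linear mixed model, which this abstract relies on, can be checked numerically. The sketch below uses illustrative data and variance components (not the paper's model) to verify that penalized least squares with lam = sigma_e^2 / sigma_u^2 reproduces the GLS estimate of the fixed effects and the BLUP of the spline coefficients.

```python
# Numerical check: penalized spline == linear mixed model BLUP (illustrative).
import numpy as np

rng = np.random.default_rng(3)
n, K = 120, 15
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(3 * x) + rng.normal(scale=0.1, size=n)

knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
X = np.column_stack([np.ones(n), x])                        # fixed effects
Z = np.column_stack([np.maximum(x - k, 0) for k in knots])  # random-effect basis
sigma_u2, sigma_e2 = 0.5, 0.1 ** 2
lam = sigma_e2 / sigma_u2

# 1) Penalized least squares (Henderson's mixed model equations, rescaled)
C = np.column_stack([X, Z])
P = np.diag([0.0, 0.0] + [1.0] * K)
coef_pls = np.linalg.solve(C.T @ C + lam * P, C.T @ y)

# 2) GLS estimate of beta and BLUP of u from the marginal covariance
V = sigma_u2 * Z @ Z.T + sigma_e2 * np.eye(n)
Vi = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)
u = sigma_u2 * Z.T @ Vi @ (y - X @ beta)

print(np.allclose(coef_pls, np.concatenate([beta, u]), atol=1e-6))  # True
```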
... However, functional data are often observed with measurement error and need to be smoothed before calculating derivatives and performing analysis. For this, there is a vast literature on non-parametric smoothing techniques (see, e.g., Fan, 2017; Ramsay & Silverman, 2005; Ruppert et al., 2003) which should suffice provided the functions are measured on a sufficiently dense grid. In real data analysis settings, the order of derivative to model will also have to be chosen. ...
Preprint
Full-text available
In functional data analysis, replicate observations of a smooth functional process and its derivatives offer a unique opportunity to flexibly estimate continuous-time ordinary differential equation models. Ramsay (1996) first proposed to estimate a linear ordinary differential equation from functional data in a technique called Principal Differential Analysis, by formulating a functional regression in which the highest-order derivative of a function is modelled as a time-varying linear combination of its lower-order derivatives. Principal Differential Analysis was introduced as a technique for data reduction and representation, using solutions of the estimated differential equation as a basis to represent the functional data. In this work, we re-formulate PDA as a generative statistical model in which functional observations arise as solutions of a deterministic ODE that is forced by a smooth random error process. This viewpoint defines a flexible class of functional models based on differential equations and leads to an improved understanding and characterisation of the sources of variability in Principal Differential Analysis. It does, however, result in parameter estimates that can be heavily biased under the standard estimation approach of PDA. Therefore, we introduce an iterative bias-reduction algorithm that can be applied to improve parameter estimates. We also examine the utility of our approach when the form of the deterministic part of the differential equation is unknown and possibly non-linear, where Principal Differential Analysis is treated as an approximate model based on time-varying linearisation. We demonstrate our approach on simulated data from linear and non-linear differential equations and on real data from human movement biomechanics. Supplementary R code for this manuscript is available at \url{https://github.com/edwardgunning/UnderstandingOfPDAManuscript}.
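A stripped-down sketch of the basic PDA regression step may help: with constant coefficients and derivatives taken from a simulated damped oscillator, the highest derivative is regressed on the lower ones (with real data the curves and derivatives would first be estimated by smoothing, as the excerpts above note). The model, constants, and setup are illustrative assumptions, not this preprint's method.

```python
# Minimal PDA-style regression: recover the constants in x'' = -omega2*x - c*x'
# from replicate trajectories (illustrative simulation, noise-free derivatives).
import numpy as np
from scipy.integrate import solve_ivp

omega2, c = 4.0, 0.5
tgrid = np.linspace(0, 10, 200)
rng = np.random.default_rng(0)

X, dX = [], []
for _ in range(20):                     # 20 replicate curves, random initial states
    x0 = rng.normal(size=2)
    sol = solve_ivp(lambda t, s: [s[1], -omega2 * s[0] - c * s[1]],
                    (0, 10), x0, t_eval=tgrid, rtol=1e-8)
    X.append(sol.y[0]); dX.append(sol.y[1])
X, dX = np.concatenate(X), np.concatenate(dX)
d2X = -omega2 * X - c * dX              # stand-in for derivatives from smoothed data

A = np.column_stack([X, dX])            # regress highest derivative on lower ones
coef, *_ = np.linalg.lstsq(A, d2X, rcond=None)
print(coef)                             # approximately [-4.0, -0.5]
```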
... Energy metabolism during sleep was analyzed using a semi-parametric regression model, i.e., a parametric analysis for the effect of sleep stage (N1, N2, SWS, REM, and WASO) and a nonparametric analysis for the effect of time after sleep onset were simultaneously applied using the SemiPar package of the statistical software R (ver 4.2.3) [29]. The results are expressed as mean ± standard error of the mean (SEM). Paired t tests were used to compare the mean values of energy expenditure, substrate oxidation, and sleep parameters between trials. ...
... For a visual representation of this model, refer to Figure 3. Model (13) is recognized as an additive model in the transformed coordinate t*. This structure has received extensive treatment in the statistical literature (Buja et al., 1989; Ruppert et al., 2006) and a variety of algorithms have been proposed to estimate the unknown marginal functions, i.e. the h_G^{(i)} := h_G(t_i* | b, ξ_i). As ξ_i affects the observed signals only through the i'th kernel decay function, it is practical to approximate our inference of the former using only our estimates of the latter, i.e. assuming conditional independence with the remaining marginal functions j ≠ i. ...
Preprint
Full-text available
Diffusion MRI (dMRI) is the primary imaging modality used to study brain microstructure in vivo. Reliable and computationally efficient parameter inference for common dMRI biophysical models is a challenging inverse problem, due to factors such as variable dimensionalities (reflecting the unknown number of distinct white matter fiber populations in a voxel), low signal-to-noise ratios, and non-linear forward models. These challenges have led many existing methods to use biologically implausible simplified models to stabilize estimation, for instance, assuming shared microstructure across all fiber populations within a voxel. In this work, we introduce a novel sequential method for multi-fiber parameter inference that decomposes the task into a series of manageable subproblems. These subproblems are solved using deep neural networks tailored to problem-specific structure and symmetry, and trained via simulation. The resulting inference procedure is largely amortized, enabling scalable parameter estimation and uncertainty quantification across all model parameters. Simulation studies and real imaging data analysis using the Human Connectome Project (HCP) demonstrate the advantages of our method over standard alternatives. In the case of the standard model of diffusion, our results show that under HCP-like acquisition schemes, estimates for extra-cellular parallel diffusivity are highly uncertain, while those for the intra-cellular volume fraction can be estimated with relatively high precision.
Article
Full-text available
Introduction With the introduction of the new psychiatric diagnostic manuals, personality functioning has gained new prominence. Several studies have consistently found that individuals showing high levels of antisocial features exhibit alterations in interpersonal functioning domains such as empathy and mentalisation. The focus of the current study (N = 198) is to examine antisocial cognitions, as measured by the Scrambled Sentences Task (SST), and to what extent this approach can help to better understand the relationship between antisocial traits and personality functioning/empathy. Method We implemented a hypothesis-driven approach using logistic regression and a data-driven approach using machine learning to examine distinct but related measures of personality functioning as predictors of antisocial cognitions. Results Antisocial cognitions were associated with low interpersonal functioning as expected, but only when not adjusting for antisocial traits, which accounted for almost all of the association. The data-driven analysis revealed that individual items assessing empathic concern in personality functioning scales (as opposed to the whole scores) explained low antisocial cognitions even when adjusting for antisocial traits. Discussion Antisocial cognitions appear to be associated with two distinct traits: the antisocial trait and a specific type of personality functioning. This finding is discussed in terms of the possible distinction between two motivational forces: to harm others/prioritize one's advantage, and to help suffering others.
Article
We propose novel quantile regression methods for when the response is discrete and the data come from a longitudinal design. The approach is based on conditional mid-quantiles, which have good theoretical properties even in the presence of ties. Optimization of a ridge-type penalized objective function accommodates the data dependence. We investigate the performance and pertinence of our methods in a simulation study and in an original application to the use of macroprudential policies in more than one hundred countries over a period of seventeen years.
Article
Full-text available
We used mark-recapture methods to estimate the number of Parnassius smintheus (Papilionidae) butterflies moving among 20 alpine meadows separated by varying amounts of forest along the east slope of the Rocky Mountains in Alberta, Canada. We combined generalized additive models and generalized linear models to estimate the effects of intervening habitat type and of population size on butterfly movement. By incorporating habitat-specific distances between patches, we were better able to estimate movement compared to a strictly isolation-by-distance model. Our analysis estimated that butterflies move readily through open meadow but that forests are twice as resistant to butterfly movement. Butterflies also tended to stay at sites with high numbers of butterflies, but readily emigrated from sites with small populations. We showed that P. smintheus are highly restricted in their movement even at a fine spatial scale, a pattern reflected in concurrent studies of population genetic structure. As an example of the utility of our approach, we used these statistical models, in combination with aerial photographs of the same area taken in 1952, to estimate the degree to which landscape change over a 43-year interval has reduced movement of butterflies among subpopulations. At these sites, alpine meadow habitat has declined in area by 78%, whereas the estimated effect of fragmentation has been to reduce butterfly movement by 41%.
Article
Full-text available
Penalized splines, or P-splines, are regression splines fit by least squares with a roughness penalty. P-splines have much in common with smoothing splines, but the type of penalty used with a P-spline is somewhat more general than for a smoothing spline. Also, the number and location of the knots of a P-spline are not fixed as with a smoothing spline. Generally, the knots of a P-spline are at fixed quantiles of the independent variable and the only tuning parameter to choose is the number of knots. In this article, the effects of the number of knots on the performance of P-splines are studied. Two algorithms are proposed for the automatic selection of the number of knots. The myopic algorithm stops when no improvement in the generalized cross-validation statistic (GCV) is noticed with the last increase in the number of knots. The full search examines all candidates in a fixed sequence of possible numbers of knots and chooses the candidate that minimizes GCV. The myopic algorithm works well in many cases but can stop prematurely. The full-search algorithm worked well in all examples examined. A Demmler-Reinsch type diagonalization for computing univariate and additive P-splines is described.
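A rough sketch of the full-search strategy described in this abstract: for each candidate number of knots, a penalized linear spline is fit, the smoothing parameter is chosen by GCV over a grid, and the knot count with the smallest GCV is kept. The basis, grids, and data below are illustrative assumptions, not the article's exact setup.

```python
# Full search over the number of knots by GCV (illustrative sketch).
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(8 * x) + rng.normal(scale=0.3, size=n)

def gcv_for(K):
    knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
    C = np.column_stack([np.ones(n), x] + [np.maximum(x - k, 0) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * K)
    best = np.inf
    for lam in 10.0 ** np.arange(-4, 5):
        A = np.linalg.solve(C.T @ C + lam * D, C.T)      # (C'C + lam D)^{-1} C'
        fhat = C @ (A @ y)
        df = np.trace(C @ A)                             # trace of the hat matrix
        gcv = n * np.sum((y - fhat) ** 2) / (n - df) ** 2
        best = min(best, gcv)
    return best

candidates = [5, 10, 20, 40, 80]
scores = {K: gcv_for(K) for K in candidates}
print(min(scores, key=scores.get), scores)
```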
Article
Full-text available
Regression spline smoothing involves modelling a regression function as a piecewise polynomial with a high number of pieces relative to the sample size. Because the number of possible models is so large, efficient strategies for choosing among them are required. In this paper we review approaches to this problem and compare them through a simulation study. For simplicity and conciseness we restrict attention to the univariate smoothing setting with Gaussian noise and the truncated polynomial regression spline basis.
Article
Nested random effects models are often used to represent similar processes occurring in each of many clusters. Suppose that, given cluster-specific random effects b, the data y are distributed according to f(y|b, Θ), while b follows a density p(b|Θ). Likelihood inference requires maximization of ∫ f(y|b, Θ) p(b|Θ) db with respect to Θ. Evaluation of this integral often proves difficult, making likelihood inference difficult to obtain. We propose a multivariate Taylor series approximation of the log of the integrand that can be made as accurate as desired if the integrand and all its partial derivatives with respect to b are continuous in the neighborhood of the posterior mode of b|Θ, y. We then apply a Laplace approximation to the integral and maximize the approximate integrated likelihood via Fisher scoring. We develop computational formulas that implement this approach for two-level generalized linear models with canonical link and multivariate normal random effects. A comparison with approximations based on penalized quasi-likelihood, Gauss-Hermite quadrature, and adaptive Gauss-Hermite quadrature reveals that, for the hierarchical logistic regression model under the simulated conditions, the sixth-order Laplace approach is remarkably accurate and computationally fast.
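The core Laplace step can be illustrated on a one-dimensional cluster integral. The sketch below uses made-up data for a single cluster of a logistic-normal model and a numerical second derivative; it approximates ∫ f(y|b) p(b) db at the mode and compares the result with adaptive quadrature. It is a first-order Laplace illustration, not the article's sixth-order method.

```python
# Laplace approximation of a one-cluster logistic-normal integral (illustrative).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
beta0, sigma = 0.3, 1.2
y = rng.binomial(1, 1 / (1 + np.exp(-(beta0 + sigma * rng.normal()))), size=8)

def h(b):   # log integrand: Bernoulli log-likelihood plus N(0, sigma^2) log-density
    eta = beta0 + b
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik - 0.5 * b ** 2 / sigma ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2)

bhat = minimize_scalar(lambda b: -h(b)).x          # posterior mode of b
eps = 1e-4
h2 = (h(bhat + eps) - 2 * h(bhat) + h(bhat - eps)) / eps ** 2   # curvature at mode
laplace = np.exp(h(bhat)) * np.sqrt(2 * np.pi / (-h2))
exact, _ = quad(lambda b: np.exp(h(b)), -np.inf, np.inf)
print(laplace, exact)                              # should be close
```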
Article
We describe a Bayesian method, for fitting curves to data drawn from an exponential family, that uses splines for which the number and locations of knots are free parameters. The method uses reversible-jump Markov chain Monte Carlo to change the knot configurations and a locality heuristic to speed up mixing. For nonnormal models, we approximate the integrated likelihood ratios needed to compute acceptance probabilities by using the Bayesian information criterion, BIC, under priors that make this approximation accurate. Our technique is based on a marginalised chain on the knot number and locations, but we provide methods for inference about the regression coefficients, and functions of them, in both normal and nonnormal models. Simulation results suggest that the method performs well, and we illustrate the method in two neuroscience applications.
Article
This article proposes an automatic smoothing method for recovering discontinuous regression functions. The method models the target regression function with a series of disconnected cubic regression splines which partition the function's domain. In this way discontinuity points can be incorporated in a fitted curve simply as the boundary points between adjacent splines. Three objective criteria are constructed and compared for choosing the number and placement of these discontinuity points as well as the amount of smoothing. These criteria are derived from three fundamentally different model selection methods: AIC, GCV and the MDL principle. Practical optimization of these criteria is done by genetic algorithms. Simulation results show that the proposed method is superior to many existing smoothing methods when the target function is non-smooth. The method is further made robust by using a Gaussian mixture approach to model outliers.
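A much simplified sketch of the idea: a single discontinuity, unpenalized cubic fits on each side of a candidate breakpoint, and AIC used for selection (the article's criteria also include GCV and MDL, and it searches with genetic algorithms). All values here are illustrative assumptions.

```python
# Breakpoint selection by AIC with disconnected cubic fits (simplified sketch).
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = np.sort(rng.uniform(0, 1, n))
f = np.where(x < 0.6, np.sin(4 * x), np.sin(4 * x) + 1.5)   # jump at 0.6
y = f + rng.normal(scale=0.2, size=n)

def aic_for(bp):
    rss, p = 0.0, 8                                # two cubics = 8 coefficients
    for mask in (x < bp, x >= bp):
        if mask.sum() < 5:
            return np.inf
        coef = np.polyfit(x[mask], y[mask], deg=3)
        rss += np.sum((y[mask] - np.polyval(coef, x[mask])) ** 2)
    return n * np.log(rss / n) + 2 * p

grid = np.linspace(0.1, 0.9, 81)
best = grid[np.argmin([aic_for(b) for b in grid])]
print(best)                                        # close to the true jump at 0.6
```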
Article
Computing posterior modes (e.g., maximum likelihood estimates) for models involving latent variables or missing data often involves complicated optimization procedures. By splitting this task into two simpler parts, however, EM-type algorithms often offer a simple solution. Although this approach has proven useful, in some settings even these simpler tasks are challenging. In particular, computations involving latent variables are typically difficult to simplify. Thus, in models such as hierarchical models with complicated latent variable structures, computationally intensive methods may be required for the expectation step of EM. This paper describes how nesting two or more EM algorithms can take advantage of closed form conditional expectations and lead to algorithms which converge faster, are straightforward to implement, and enjoy stable convergence properties. Methodology to monitor convergence of nested EM algorithms is developed using importance and bridge sampling. The strategy is applied to hierarchical probit and t regression models to derive algorithms which incorporate aspects of Monte-Carlo EM, PX-EM, and nesting in order to combine computational efficiency with easy implementation.
Article
Most regression problems in practice require flexible semiparametric forms of the predictor for modelling the dependence of responses on covariates. Moreover, it is often necessary to add random effects accounting for overdispersion caused by unobserved heterogeneity or for correlation in longitudinal or spatial data. We present a unified approach for Bayesian inference via Markov chain Monte Carlo simulation in generalized additive and semiparametric mixed models. Different types of covariates, such as the usual covariates with fixed effects, metrical covariates with non-linear effects, unstructured random effects, trend and seasonal components in longitudinal data and spatial covariates, are all treated within the same general framework by assigning appropriate Markov random field priors with different forms and degrees of smoothness. We applied the approach in several case-studies and consulting cases, showing that the methods are also computationally feasible in problems with many covariates and large data sets. In this paper, we choose two typical applications.
Article
This paper develops a likelihood-based method for fitting additive models in the presence of measurement error. It formulates the additive model using the linear mixed model representation of penalized splines. In the presence of a structural measurement error model, the resulting likelihood involves intractable integrals, and a Monte Carlo expectation maximization strategy is developed for obtaining estimates. The method's performance is illustrated with a simulation study.
Article
This article provides a new methodology for estimating the term structure of corporate debt using a semiparametric penalized spline model. The method is applied to a case study of AT&T bonds. Typically, very few data are available on individual corporate bond prices, too little to find a nonparametric estimate of term structure from these bonds alone. This problem is solved by "borrowing strength" from Treasury bond data. More specifically, we combine a nonparametric model for the term structure of Treasury bonds with a parametric component for the credit spread. Our methodology generalizes the work of Fisher, Nychka, and Zervos in several ways. First, their model was developed for Treasury bonds only and cannot be applied directly to corporate bonds. Second, we more fully investigate the problem of choosing the smoothing parameter, a problem that is complicated because the forward rate is the derivative -(d/dt) log{D(t)}, where the discount function D is the function fit to the data. In our case study, estimation of the derivative requires substantially more smoothing than selected by generalized cross-validation (GCV). Another problem for smoothing parameter selection is possible correlation of the errors. We compare three methods of choosing the penalty parameter: generalized cross-validation (GCV), the residual spatial autocorrelation (RSA) method of Ellner and Seifu, and an extension of Ruppert's empirical bias bandwidth selection (EBBS) to splines. Third, we provide approximate sampling distributions based on asymptotics for the Treasury forward rate and the bootstrap for corporate bonds. Confidence bands and tests of interesting hypotheses, for example, about the functional form of the credit spreads, are also discussed.
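A small numerical check of the forward-rate relationship used in this article, with a made-up forward-rate curve rather than the AT&T or Treasury data: given D(t) = exp(-∫ f(s) ds), the forward rate is recovered as -(d/dt) log D(t).

```python
# Recovering a forward rate from a discount function (illustrative check).
import numpy as np

t = np.linspace(0.01, 10, 500)
f_true = 0.03 + 0.01 * np.sin(t / 2)                 # made-up forward-rate curve
increments = 0.5 * (f_true[1:] + f_true[:-1]) * np.diff(t)    # trapezoidal rule
cumint = np.concatenate([[0.0], np.cumsum(increments)])
D = np.exp(-cumint)                                   # discount function
f_back = -np.gradient(np.log(D), t)                   # forward rate = -(d/dt) log D
print(np.max(np.abs(f_back - f_true)))                # small discretization error
```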