Book

The Analysis of Longitudinal Data

... Longitudinal studies are therefore valuable and widely employed across domains such as economics, medical research, and environmental science. In the past 20-30 years, several excellent books on longitudinal data analysis have been published in statistics, biostatistics, econometrics, and the social sciences (Davidian and Giltinan, 1995; Diggle et al., 2002; Hardin and Hilbe, 2012; Manuel, 2003; Molenberghs and Verbeke, 2005; Molenberghs et al., 2004; Pinheiro and Bates, 2006; Searle et al., 2009; Song and Song, 2007), among others. Statistical methodologies devised for longitudinal data analysis are often also relevant to hierarchical, spatial, and clustered (such as familial or litter) structures (Diggle et al., 2002; Hand and Crowder, 1996; Hedeker and Gibbons, 2006). ...
... In particular, the recent book by Wang et al. (2022) provides a comprehensive discussion of modern approaches to inference in longitudinal data analysis. ...
... If repeated measures are taken, time-series models (such as the first-order autoregressive model, AR(1)) can be considered. In spatial statistics, variogram models (Diggle et al., 2002) can be considered. Other models, such as compound symmetry (also referred to as uniform in the literature) or more complicated structures, capture particular shared or independent sources of variation. ...
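A minimal sketch of how two such covariance structures can be compared in R with the nlme package, assuming a hypothetical long-format data frame d with columns y, time (equally spaced and coded as consecutive integers, as corAR1 requires), and subject identifier id:

    ## Fit the same mean model under two residual correlation structures.
    library(nlme)

    fit_ar1 <- gls(y ~ time, data = d,
                   correlation = corAR1(form = ~ time | id))       # AR(1) decay
    fit_cs  <- gls(y ~ time, data = d,
                   correlation = corCompSymm(form = ~ time | id))  # compound symmetry

    ## With identical fixed effects, the structures can be compared by AIC.
    AIC(fit_ar1, fit_cs)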
Chapter
Full-text available
This work aims to provide a review of methodology for the analysis of longitudinal data, focusing on (i) how to select different model components: the covariance (correlation and variance) functions or structures and the predictive variables; (ii) robust approaches, including rank and quantile regression; and (iii) machine learning algorithms that incorporate temporal or clustering effects. Specifically, among longitudinal machine learning algorithms, tree-based methods are widely used for modeling random effects, while support vector machine-based techniques are adapted to include temporal structure and random effects. More recently, there has been emerging interest in using (deep) neural networks trained with derived optimization objectives to capture complex patterns in longitudinal data.
... Different time metrics, how much data was collected through the study (how many observations and at which times), missing values across time points, including drop-out, and longitudinal trends of variables should be considered. Model building and inference for longitudinal studies have received much attention [14], and many textbooks on longitudinal studies discuss data exploration and the specific challenges due to missing values [14][15][16]; however, a systematic process for data screening is missing. ...
... It is also useful to estimate the probability of drop-out after inclusion in the study, stratifying by structural variables. The display of the mean outcome as a function of time, stratified by different drop-out times, can suggest a relationship between the outcome and the drop-out process [15]. ...
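One way to produce such a display in R, sketched here for a hypothetical long-format data frame d with columns id, wave, and outcome y (NA once a participant has dropped out):

    ## Derive each participant's last observed wave as their drop-out time.
    observed <- !is.na(d$y)
    last_wave <- tapply(d$wave[observed], d$id[observed], max)
    d$dropout_wave <- factor(last_wave[as.character(d$id)])

    ## Mean outcome per wave, one trace per drop-out stratum; diverging
    ## traces suggest the outcome and drop-out processes are related.
    interaction.plot(d$wave, d$dropout_wave, d$y,
                     fun = function(v) mean(v, na.rm = TRUE),
                     type = "b", xlab = "Wave", ylab = "Mean outcome",
                     trace.label = "Drop-out wave")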
Article
Full-text available
Initial data analysis (IDA) is the part of the data pipeline that takes place between the end of data retrieval and the beginning of data analysis that addresses the research question. Systematic IDA and clear reporting of the IDA findings are an important step towards reproducible research. A general framework of IDA for observational studies includes data cleaning, data screening, and possible updates of pre-planned statistical analyses. Longitudinal studies, where participants are observed repeatedly over time, pose additional challenges, as they have special features that should be taken into account in the IDA steps before addressing the research question. We propose a systematic approach in longitudinal studies to examine data properties prior to conducting planned statistical analyses. In this paper we focus on the data screening element of IDA, assuming that the research aims are accompanied by an analysis plan, meta-data are well documented, and data cleaning has already been performed. IDA data screening comprises five types of explorations, covering the analysis of participation profiles over time, evaluation of missing data, presentation of univariate and multivariate descriptions, and the depiction of longitudinal aspects. Executing the IDA plan will result in an IDA report to inform data analysts about data properties and possible implications for the analysis plan—another element of the IDA framework. Our framework is illustrated focusing on hand grip strength outcome data from a data collection across several waves in a complex survey. We provide reproducible R code on a public repository, presenting a detailed data screening plan for the investigation of the average rate of age-associated decline of grip strength. With our checklist and reproducible R code we provide data analysts with a framework to work with longitudinal data in an informed way, enhancing the reproducibility and validity of their work.
... The sample size calculation for this project was based on the method of Diggle et al. [37]. Considering a minimal clinically important difference (MCID) of eight points on the KOOS pain subscale, a within-group standard deviation of 16 points after TKA [24,38], three measurement points, a significance level of 0.05, and power of 0.80, at least 25 subjects per group were necessary [37]. Anticipating disturbed somatosensory functioning in 30% of KOA patients [10,39], we hypothesized that 15% would have disturbed somatosensory functioning at baseline and 1 year post-TKA. ...
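The quoted calculation follows the standard Diggle et al. formula for comparing time-averaged means between two groups with repeated measures and exchangeable within-subject correlation rho. A sketch in R; the excerpt does not report the correlation used, so rho is a free input here:

    ## Subjects per group to detect a time-averaged group difference d,
    ## with within-group SD sigma, n_times repeated measures, correlation rho.
    n_per_group <- function(d, sigma, n_times, rho,
                            alpha = 0.05, power = 0.80) {
      z <- qnorm(1 - alpha / 2) + qnorm(power)
      ceiling(z^2 * 2 * sigma^2 * (1 + (n_times - 1) * rho) / (n_times * d^2))
    }

    ## KOOS pain example from the excerpt: MCID = 8, SD = 16, 3 time points.
    n_per_group(d = 8, sigma = 16, n_times = 3, rho = 0.09)  # gives 25 per group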
Article
Full-text available
The objective of this study is to determine whether the change in pain intensity over time differs between somatosensory functioning evolution profiles in knee osteoarthritis (KOA) patients undergoing total knee arthroplasty (TKA). This longitudinal prospective cohort study, conducted between March 2018 and July 2023, included KOA patients undergoing TKA in four hospitals in Belgium and the Netherlands. The evolution of the Knee Injury and Osteoarthritis Outcome Score (KOOS) subscale pain over time (baseline, 3 months, and 1 year post-TKA scores) was the outcome variable. The evolution scores of quantitative sensory testing (QST) and Central Sensitization Inventory (CSI) over time (baseline and 1 year post-TKA scores) were used to make subgroups. Participants were divided into separate normal, recovered, and persistent disturbed somatosensory subgroups based on the CSI, local and widespread pressure pain threshold [PPT] and heat allodynia, temporal summation [TS], and conditioned pain modulation [CPM]. Linear mixed model analyses were performed. Two hundred twenty-three participants were included. The persistent disturbed somatosensory functioning group had less pronounced pain improvement (based on CSI and local heat allodynia) and worse pain scores 1 year post-TKA (based on CSI, local PPT and heat allodynia, and TS) compared to the normal somatosensory functioning group. This persistent group also had worse pain scores 1 year post-TKA compared to the recovered group (based on CSI). The study suggests the presence of a “centrally driven central sensitization” subgroup among KOA patients awaiting TKA in four of seven grouping variables, reflected in less pain improvement or worse pain scores after TKA. Future research should validate these findings further. The protocol is registered at clinicaltrials.gov (NCT05380648).
... Dominant height has been modeled with several fitting approaches, such as ordinary least squares, which assumes normality, homogeneity of variances, and independence of the errors (García-Cuevas et al. 2007, Murillo-Brito et al. 2017); these assumptions do not hold when using correlated data from cross-sectional or longitudinal data, remeasurements, or sites grouped by similar conditions (Carrero et al. 2008, Seoane 2014, Corral et al. 2019). Another fitting approach is maximum likelihood with mixed-effects models, in which random effects are associated with the model parameters, which improves the error term (Jerez-Rico et al. 2011) by correcting the variance-covariance structure associated with correlated data (Verbeke and Molenberghs 2000, Diggle et al. 2002, Littell et al. 2006). ...
... The mixed-effects models made it possible to correct the variance-covariance structure associated with measurements of trees grouped in sites with similar characteristics, since no correlation problems between the parameters were observed in any case (Diggle et al. 2002, Littell et al. 2006). Thus, better results were obtained for modeling dominant height growth patterns than with the ordinary least squares (OLS) technique. ...
Article
Full-text available
The dominant height of the trees on a site is an indirect indicator of forest soil productivity, referred to as site quality, which is measured through the site index. The objective was to fit dominant height models as a function of age using a mixed-effects modeling approach to generate site index curves for Pinus oocarpa Schiede forest plantations in Ario de Rosales, Michoacán, Mexico. Using 1,773 dominant height-age data pairs from forest plantations between two and 20 years old and 56-year-old relict stands, five growth models were fitted to choose a base model. The Weibull model was fitted with mixed-effects models, with the plot as an additive covariate in the site-related parameter. The guide curve for dominant height and the polymorphic curves adequately cover the sampling variability of the data and reliably represent the productivity of the sites where these plantations grow. The bias and validation of the results showed the estimates to be reliable. The technical rotation age in dominant height is between five and 15 years depending on site quality. With these results, Pinus oocarpa forest plantations can be classified reliably and in accordance with the productivity level of each establishment site.
... Ignoring individual heterogeneity among countries could lead to inconsistent and inefficient estimates of the parameters of interest (Pinheiro and Bates 2000). As a result, our longitudinal models will include not only common population information and measurement error terms, as described by Diggle et al. (2002), but also random effects to account for specific country characteristics. Statistical estimation is carried out within the Bayesian inferential framework. ...
... (w_i(t_{i2})^(.), ..., w_i(t_{in_i})^(.))' in the conditional mean, where each w_i(t_{ij})^(.), j = 2, ..., n_i, is a realisation at time t_{ij} from a Gaussian process with mean rho * w_i(t_{i,j-1}) and a variance parameter (Diggle et al. 2002). That is, w_i^(.) is a vector of time-correlated noises. ...
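A sketch of this kind of model using the brms interface in R, assuming a hypothetical data frame catches with columns landings, year, and country; this illustrates the structure described (population trend, country-specific random effects, AR(1) time-correlated noise) rather than the authors' exact implementation:

    library(brms)

    fit <- brm(
      landings ~ year + (1 + year | country) +   # country-specific intercepts/slopes
        ar(time = year, gr = country, p = 1),    # AR(1) noise within each country
      data = catches, family = gaussian(),
      chains = 4, cores = 4
    )
    summary(fit)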
Article
Full-text available
The European sardine is a pelagic species of great ecological importance for the conservation of the Mediterranean Sea, as well as economic importance for the Mediterranean countries. Its fishery has suffered a significant decline in recent years due to various economic, cultural and ecological reasons. This paper focuses on the evolution of sardine catches in the Mediterranean Sea from 1985 to 2018 according to the fishing Mediterranean country and the type of fishing practised, artisanal and industrial. We propose three Bayesian longitudinal linear mixed models to assess differences in the temporal evolution of artisanal and industrial fisheries between and within countries. Overall results confirm that Mediterranean fishery time series are highly diverse in their dynamics and that this heterogeneity persists over time. Furthermore, our results highlight a positive correlation between artisanal and industrial fishing. Finally, the study observes a consistent decreasing time trend in the quantity of fish landings. Although the causes of this feature could also be linked to economic motivations (such as a reduction in demand or the reorientation of fleets towards more commercially beneficial species), it may indicate a potential risk to the stock of this species in the Mediterranean Sea.
... Longitudinal studies enable direct study of within-person cognitive changes. By investigating the effects of neighborhood on rates of cognitive change over time, longitudinal studies can help to identify factors that may alter the course of cognitive decline and inform the development of intervention strategies [24][25][26]. In the current study, we compared the relative strength of objective and subjective neighborhood measures in relation to levels (i.e., between-person cognitive differences at cross-section) and rates of cognitive decline (i.e., average trajectories of within-person cognitive change). ...
Article
Full-text available
Background Although a growing body of literature documents the importance of neighborhood effects on late-life cognition, little is known about the relative strength of objective and subjective neighborhood measures on late-life cognitive changes. This study examined effects of objective and subjective neighborhood measures in three neighborhood domains (neighborhood safety, physical disorder, food environments) on longitudinal changes in processing speed, an early marker of cognitive aging and impairment. Methods The analysis sample included 306 community-dwelling older adults enrolled in the Einstein Aging Study (mean age = 77, age range = 70 to 91; female = 67.7%; non-Hispanic White: 45.1%, non-Hispanic Black: 40.9%). Objective and subjective measures of neighborhood covered three domains (i.e., neighborhood safety, physical disorder, food environments). Processing speed was assessed using a brief Symbol Match task (unit: second), administered on a smartphone device six times a day for 16 days and repeated annually for up to five years. Years from baseline was used as the within-person time index. Results Results from mixed effects models showed that subjective neighborhood safety (β= -0.028) and subjective availability of healthy foods (β= -0.028) were significantly associated with less cognitive slowing over time. When objective and subjective neighborhood measures were simultaneously examined, subjective availability of healthy foods remained significant (β= -0.028) after controlling for objective availability of healthy foods. Associations of objective neighborhood crime and physical disorder with processing speed appeared to be confounded by individual-level race and socioeconomic status; after controlling for these confounders, none of the objective neighborhood measures showed significant associations with processing speed. Conclusion Subjective neighborhood safety and subjective availability of healthy foods, rather than objective measures, were associated with less cognitive slowing over a five-year period. Perception of one’s neighborhood may be a more proximal predictor of cognitive health outcomes as it may reflect one’s experiences in the environment. It would be important to improve our understanding of both objective and subjective neighborhood factors to improve cognitive health among older adults.
... All analyses included two fixed effects for session type (two levels: baseline measure vs QCT practice) and practice location (two levels: day-to-day life vs flight operation). The random effect structure for each analysis was determined by comparing Akaike's information criterion (AIC) values of candidate maximal models containing varying complexities of random by-participant and by-session slopes (for each fixed effect) and intercepts (Diggle et al., 2002). The model whose random effect structure had a delta-AIC of 0 (i.e., the lowest AIC) was chosen as the basis for the subsequent fixed effect analyses. ...
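A sketch of this selection procedure with lme4 in R, assuming a hypothetical data frame hrv with outcome rmssd, the two fixed factors, and a participant identifier (only by-participant structures are shown for brevity):

    library(lme4)

    m1 <- lmer(rmssd ~ session_type * location + (1 | participant),
               data = hrv, REML = FALSE)
    m2 <- lmer(rmssd ~ session_type * location +
                 (1 + session_type | participant), data = hrv, REML = FALSE)
    m3 <- lmer(rmssd ~ session_type * location +
                 (1 + session_type + location | participant),
               data = hrv, REML = FALSE)

    ## The structure whose delta-AIC equals 0 (the minimum AIC) is retained.
    tab <- AIC(m1, m2, m3)
    tab$dAIC <- tab$AIC - min(tab$AIC)
    tab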
Article
Full-text available
Commercial pilots endure multiple stressors in their daily and occupational lives which are detrimental to psychological well-being and cognitive functioning. The Quick Coherence Technique (QCT) is an effective intervention tool to improve stress resilience and psychophysiological balance based on a five-minute paced breathing exercise with heart rate variability (HRV) biofeedback. The current research reports on the application of QCT training within an international airline to improve commercial pilots’ psychological health and support cognitive functions. Forty-four commercial pilots volunteered in a one-month training programme to practise self-regulated QCT in day-to-day life and flight operations. Pilots’ stress index and HRV time-domain and frequency-domain parameters were collected to examine the influence of QCT practice on the stress resilience process. The results demonstrated that the QCT improved psychophysiological indicators associated with stress resilience and cognitive functions, in both day-to-day life and flight operation settings. HRV fluctuations, as measured through changes in RMSSD and LF/HF, revealed that the resilience processes were primarily controlled by sympathetic nervous system activities that are important in promoting pilots’ energy mobilization and cognitive functions; QCT therefore has considerable potential to facilitate flight performance and aviation safety. These findings provide scientific evidence for implementing QCT as an effective mental support programme and controlled rest strategy to improve pilots’ psychological health, stress management, and operational performance.
... Binary responses with repeated measurements are often modeled by generalized linear mixed models (GLMMs). The hierarchical structure of the GLMM is a natural extension of the generalized linear model (GLM), in which the response variable belongs to the exponential family of distributions (Diggle et al., 2013; Hardin and Hilbe, 2016). Random effects are included in the systematic component to induce intraclass correlation and to draw inferences on it (Breslow and Clayton, 1993). ...
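A minimal GLMM sketch in R for repeated binary responses, using lme4 with a complementary log-log link as in the abstract below; note that glmer assumes a Gaussian random intercept, whereas the proposed MBerGLG model replaces it with a generalized log-gamma random effect (hypothetical data frame dat with binary y, covariate x, time, and subject id):

    library(lme4)

    fit <- glmer(y ~ x + time + (1 | id), data = dat,
                 family = binomial(link = "cloglog"))
    summary(fit)  # the id variance component induces intraclass correlation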
Preprint
In this paper, we derive a new multivariate regression model designed to fit correlated binary data. The multivariate distribution is derived from a Bernoulli mixed model with a nonnormal random intercept under the marginal approach. The random effect distribution is assumed to be the generalized log-gamma (GLG) distribution under a particular parameter setting. The complementary log-log link function is specified to yield strong conjugacy between the response variable and the random effect. The new discrete multivariate distribution, named the MBerGLG distribution, has location and dispersion parameters. The MBerGLG distribution leads to the MBerGLG regression (MBerGLGR) model, providing an alternative approach to fitting both unbalanced and balanced correlated binary response data. Monte Carlo simulation studies show that its maximum likelihood estimators are asymptotically unbiased, efficient, and consistent. Randomized quantile residuals are used to identify possible departures of the data from the proposed model and to detect atypical subjects. Finally, two applications are presented in the data analysis section.
... This was a longitudinal, quantitative study that went through the following stages. First, we collected data over three periods -October to December 2018, July to September 2019, and April to June 2020 -with a minimum interval of six months between them, according to the literature (Diggle, 2002). Then, we delimited and organized the sample -civil servants from federal education institutions in the Northeast of Brazil. ...
Article
Full-text available
Little more than a decade separates this research from the first academic proposition of organizational entrenchment. To date, no longitudinal research has been carried out despite the recognized importance of this method for enriching studies in the behavioral field. This research aimed to identify characteristics of organizational entrenchment among civil servants at federal educational institutions in Brazil based on a longitudinal assessment of latent profiles. A quantitative and longitudinal survey was carried out with 1060 participants in the first collection. Descriptive analysis, comparison of means, and latent transition analysis were carried out. The civil servants in the sample showed low levels of entrenchment. Among the main findings are the stability of the profiles formed by the civil servants and the higher averages found in the dimensions of adjustments to social position and impersonal bureaucratic arrangements, reinforcing some cross-sectional theoretical findings. The results help to guide managers on the importance of internal factors for the entrenchment of civil servants and how attention to the items in the dimension of adjustments to social position can favor working with this bond. This research found that, over time, entrenchment is a stable bond.
... First, it is important to understand the difference between the fully conditional model, the fully marginal model, and the partly conditional model for Y(t) in a nonpredictive setting (i.e., using covariate information up to time t). The fully conditional model refers to P{Y(t) | H(t)}, where the distribution of Y(t) is conditional on the entire disease history up to t [11]. This approach accounts for the evolving nature of Y(t) and X*(t), but it is technically challenging to implement and difficult to draw statistical inference from. ...
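Schematically, using the notation of the excerpt and the abstract below (H(t) the full history up to t, X*(s) covariates observed at time s; the exact conditioning sets are assumptions here):

    fully conditional:   P{ Y(t) | H(t) }
    fully marginal:      P{ Y(t) }
    partly conditional:  P{ Y(t) | X*(s) },  s < t,  with target period u = t - s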
Article
Full-text available
Assessing time-dependent risk factors in relation to the risk of disease progression is challenging, yet important, especially for chronic diseases with slow progression. In this paper, the partly conditional model is extended for characterizing disease progression at time t with longitudinal ordinal outcomes in the presence of time-dependent covariates at time s (s < t) and time-dependent effects. Advantages of the method include direct modeling of disease progression, use of longitudinal risk factors, and flexibility in the target period of progression u = t - s. A generalized estimating equation approach is adopted for parameter estimation, and a new regularity condition requiring a fixed prediction time window u_0 is established to consistently estimate the covariance of the estimated parameters. Extensive simulation studies are conducted to assess the properties of the proposed model, alongside an explicit demonstration of the implementation using existing statistical software. The proposed method is applied to a longitudinal Alzheimer’s disease dataset from the National Alzheimer’s Coordinating Center to assess the effect of time-dependent cognitive complaints on Alzheimer’s disease progression.
... Stepped Wedge Trials (SWTs) require statistical methods that account for the correlation between units in the same cluster and, with cohort designs, between repeated observations on the same units over time. Analyzing longitudinal, or panel, data requires careful attention to its temporal nature, particularly when covariates can vary over time (Diggle, 2002; Hsiao, 2022). The two primary approaches to modeling SWTs are frequentist random effects models (mixed models) and frequentist marginal approaches estimated using GEE (Liang & Zeger, 1986). ...
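A sketch of the standard mixed-model formulation for a cross-sectional SWT (a Hussey and Hughes-type specification, shown here as the common default rather than the authors' exact model), assuming a hypothetical data frame swt with outcome y, treatment indicator trt, calendar period, and cluster:

    library(lme4)

    ## Fixed period effects adjust for secular trends; the random intercept
    ## captures within-cluster correlation.
    fit <- lmer(y ~ trt + factor(period) + (1 | cluster), data = swt)
    summary(fit)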
Article
Full-text available
Multilevel interventions (MLIs) hold promise for reducing health inequities by intervening at multiple types of social determinants of health consistent with the socioecological model of health. In spite of their potential, methodological challenges related to study design compounded by a lack of tools for sample size calculation inhibit their development. We help address this gap by proposing the Multilevel Intervention Stepped Wedge Design (MLI-SWD), a hybrid experimental design which combines cluster-level (CL) randomization using a Stepped Wedge design (SWD) with independent individual-level (IL) randomization. The MLI-SWD is suitable for MLIs where the IL intervention has a low risk of interference between individuals in the same cluster, and it enables estimation of the component IL and CL treatment effects, their interaction, and the combined intervention effect. The MLI-SWD accommodates cross-sectional and cohort designs as well as both incomplete (clusters are not observed in every study period) and complete observation patterns. We adapt recent work using generalized estimating equations for SWD sample size calculation to the multilevel setting and provide an R package for power and sample size calculation. Furthermore, motivated by our experiences with the ongoing NC Works 4 Health study, we consider how to apply the MLI-SWD when individuals join clusters over the course of the study. This situation arises when unemployment MLIs include IL interventions that are delivered while the individual is unemployed. This extension requires carefully considering whether the study interventions will satisfy additional causal assumptions but could permit randomization in new settings.
... The GEE model controls for intraindividual correlation between repeated measures and assumes that measurements are missing completely at random [39,40]. Mean levels of each cognitive domain by sleep duration and sleep difficulty status before retirement (wave -1) were examined with analysis of variance. The associations between changes in sleep duration and difficulties and changes in cognitive function during the retirement transition (waves -1 to 1) were examined with GEE models, which included a "sleep group × time (i.e. ...
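A sketch of such a GEE analysis in R with geepack, assuming a hypothetical long-format data frame dat, sorted by subject id, with a cognition score, sleep_group, and wave:

    library(geepack)

    ## The sleep_group x wave interaction tests whether cognitive change
    ## differs across sleep groups; robust (sandwich) standard errors are
    ## reported by default.
    fit <- geeglm(cognition ~ sleep_group * wave, id = id, data = dat,
                  family = gaussian, corstr = "exchangeable")
    summary(fit)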
Preprint
Full-text available
Background: The transition to retirement has been shown to be accompanied by increased sleep duration and improved sleep quality. In addition, some studies suggest accelerated decline in cognitive function in post-retirement years. However, less is known about their interconnectedness. The aim of this study was to examine the concurrent changes in sleep and cognitive function during the retirement transition. Methods: The study population consisted of 250 public sector workers (mean age before retirement 63.1 years, standard deviation 1.4) from the Finnish Retirement and Aging study. The participants used a wrist-worn ActiGraph accelerometer, responded to the Jenkins Sleep Problem Scale and underwent cognitive testing annually before and after retirement. The computerized Cambridge Neuropsychological Test Automated Battery (CANTAB) was used to evaluate learning and memory, working memory, sustained attention and information processing, executive function and cognitive flexibility, and reaction time. Results: Cognitive function improved in all cognitive domains, except for reaction time, during the 1-year retirement transition period. The improvement was temporary in learning and memory, working memory, and executive function and cognitive flexibility, which plateaued in post-retirement years. The participants were categorized into constantly short (49%), increasing (20%), decreasing (6%), and constantly mid-range (25%) sleep duration groups, and into groups constantly without (36%), with increasing (10%), with decreasing (16%), and constantly with (38%) sleep difficulties. There were no associations between changes in sleep duration or sleep difficulties and cognitive function during the retirement transition. Conclusions: Cognitive function improves temporarily during the transition to retirement, but the improvement is independent of changes in sleep characteristics.
... The non-constant covariance structure within each level may raise serious concerns, especially if separate trajectories for subjects and clusters are of interest (Grund et al., 2018). Different numbers of subjects in each cluster may cause misspecification of the mean and covariance structures for each level required by the model estimation (e.g., Laird, 1988; Little and Rubin, 2019) and result in low statistical power for overall MLGM estimation (Diggle, 2002). Therefore, this study aims to fill these gaps in the literature. ...
Article
Full-text available
This study informed researchers about the performance of different level-specific and target-specific model fit indices in the Multilevel Latent Growth Model (MLGM) with unbalanced design. As the use of MLGMs is relatively new in the applied research domain, this study helped researchers use specific model fit indices to evaluate MLGMs. Our simulation design factors included three levels of number of groups (50, 100, and 200) and three levels of unbalanced group sizes (5/15, 10/20, and 25/75), based on simulated datasets derived from a correctly specified MLGM. We evaluated the descriptive information of the model fit indices under various simulation conditions. We also conducted ANOVA to calculate the extent to which these fit indices could be influenced by different design factors. Based on the results, we made recommendations for practical and theoretical research about the fit indices. CFI- and TLI-related fit indices performed well in the MLGM and can be trusted for evaluating model fit under conditions similar to those found in applied settings. However, RMSEA-related fit indices, SRMR-related fit indices, and chi-square-related fit indices varied by the factors included in this study and should be used with caution for evaluating model fit in the MLGM.
... The areas of interest were part of the fixed effects of the model, since the aim was to compare reading times within the areas across the different conditions. Random effects were included (Diggle, Heagerty, Liang & Zeger, 2002), among them each participant's own reading speed and the different topics represented by each stimulus. Nonlinear effects (Wood, 2017) made it possible to estimate flexible patterns, such as the word length of each AOI, since the assumption that reading time grows progressively with mean word length is too restrictive. ...
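A sketch of such a model in R with mgcv (Wood, 2017), assuming a hypothetical data frame rt with reading_time, fixed factors aoi and condition, a numeric word_length, and factors participant and topic:

    library(mgcv)

    fit <- gam(reading_time ~ aoi * condition +
                 s(word_length) +             # flexible effect of word length
                 s(participant, bs = "re") +  # participant-specific reading speed
                 s(topic, bs = "re"),         # stimulus topic
               data = rt, method = "REML")
    summary(fit)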
Article
Full-text available
Anaphoric encapsulators are referential expressions which, as they compress textual segments of a predicative nature, are expected to present different and more complex processing patterns than coreferential expressions, since their interpretation requires recovering an explicature. However, references to encapsulation processing are still scarce and there are hardly any experimental studies that offer data on this subject. In this paper, the results of an eye-tracking experiment comparing the processing efforts of two types of non-(re)categorizing encapsulators (neutral demonstrative pronoun and deverbal nominalizations) with analogous coreferential mechanisms (coreferential demonstrative pronoun and lexical repetition) are analyzed. The results demonstrate that the processing efforts for utterances containing encapsulators are not higher than for those containing coreferential expressions, but the processing profile of encapsulation is qualitatively different from coreference. In addition, in the case of encapsulators, it is shown that neutral demonstrative pronouns do not reduce processing efforts in comparison with deverbal nominalizations.
... An advantage of GEE is that it estimates population-averaged effects, emphasizing the generalized impact of predictors on the entire population. This enhances the interpretability and generalizability of the findings (Diggle 2002). Moreover, GEE allows flexibility in modeling different outcome distributions tailored to the variable type. ...
Article
Full-text available
We investigate how a firm’s positioning relative to category exemplars shapes security analysts’ evaluations. Using a two-stage model of evaluation (initial screening and subsequent assessment), we propose that exemplar similarity enhances a firm’s recognizability and legitimacy, increasing the likelihood that it passes the initial screening stage and attracts analyst coverage. However, exemplar similarity may also prompt unfavorable comparisons with exemplar firms, leading to lower analyst recommendations in the assessment stage. We further argue that category coherence, distinctiveness, and exemplar typicality influence the impact of exemplar similarity on firm evaluation. Leveraging natural language processing (NLP) techniques to analyze a sample of 7,603 U.S. public firms from 1997 to 2022, we find robust support for our predictions. By highlighting the intricate role of strategic positioning vis-à-vis category exemplars in shaping audience evaluations, our findings have important implications for research on positioning relative to category exemplars, category viability, optimal distinctiveness, and security analysts. Supplemental Material: The online appendices are available at https://doi.org/10.1287/orsc.2022.16855 .
... These models can handle a wide range of outcome types, including continuous, binary, ordinal, and count, and can account for the correlations among observations within the same cluster. GLMMs have been widely used for conducting intention-to-treat (ITT) analysis in randomized controlled trials with missing outcome data and can account for data missing at random (MAR) without the need to model why data are missing or to perform explicit imputation of the missing values [31]. Specifically, we analyzed each primary end point using a linear mixed-effects model, a special case of the GLMM. ...
Article
Full-text available
Background University attendance represents a transition period for students that often coincides with the emergence of mental health and substance use challenges. Digital interventions have been identified as a promising means of supporting students due to their scalability, adaptability, and acceptability. Minder is a mental health and substance use mobile app that was codeveloped with university students. Objective This study aims to examine the effectiveness of the Minder mobile app in improving mental health and substance use outcomes in a general population of university students. Methods A 2-arm, parallel-assignment, single-blinded, 30-day randomized controlled trial was used to evaluate Minder using intention-to-treat analysis. In total, 1489 participants were recruited and randomly assigned to the intervention (n=743, 49.9%) or waitlist control (n=746, 50.1%) condition. The Minder app delivers evidence-based content through an automated chatbot and connects participants with services and university social groups. Participants are also assigned a trained peer coach to support them. The primary outcomes were measured through in-app self-assessments and included changes in general anxiety symptomology, depressive symptomology, and alcohol consumption risk measured using the 7-item General Anxiety Disorder scale, 9-item Patient Health Questionnaire, and US Alcohol Use Disorders Identification Test–Consumption Scale, respectively, from baseline to 30-day follow-up. Secondary outcomes included measures related to changes in the frequency of substance use (cannabis, alcohol, opioids, and nonmedical stimulants) and mental well-being. Generalized linear mixed-effects models were used to examine each outcome. Results In total, 79.3% (589/743) of participants in the intervention group and 83% (619/746) of participants in the control group completed the follow-up survey. The intervention group had significantly greater average reductions in anxiety symptoms measured using the 7-item General Anxiety Disorder scale (adjusted group mean difference=−0.85, 95% CI −1.27 to −0.42; P<.001; Cohen d=−0.17) and depressive symptoms measured using the 9-item Patient Health Questionnaire (adjusted group mean difference=−0.63, 95% CI −1.08 to −0.17; P=.007; Cohen d=−0.11). A reduction in the US Alcohol Use Disorders Identification Test–Consumption Scale score among intervention participants was also observed, but it was not significant (P=.23). Statistically significant differences in favor of the intervention group were found for mental well-being and reductions in the frequency of cannabis use and typical number of drinks consumed. A total of 77.1% (573/743) of participants in the intervention group accessed at least 1 app component during the study period. Conclusions In a general population sample of university students, the Minder app was effective in reducing symptoms of anxiety and depression, with provisional support for increasing mental well-being and reducing the frequency of cannabis and alcohol use. These findings highlight the potential ability of e-tools focused on prevention and early intervention to be integrated into existing university systems to support students’ needs. Trial Registration ClinicalTrials.gov NCT05606601; https://clinicaltrials.gov/ct2/show/NCT05606601 International Registered Report Identifier (IRRID) RR2-10.2196/49364
... We studied the association between changes in sleep problems and changes in total and domain-specific life satisfaction during the retirement transition by using multiple linear regression analyses with generalized estimating equations (GEEs). The GEE model controls for the intra-individual correlation between measurement waves, as one participant may contribute to multiple waves of life satisfaction measurements within the sleep problem group (Zeger et al. 1988; Diggle 2013). We defined the retirement transition period as covering wave -1 to wave +1 (on average 0.5 years before and 0.5 years after retirement), and for the analyses measuring changes during the retirement transition (wave -1 to wave +1), the interaction term 'sleep problem group × time (wave -1 to wave +1)' was added to the GEE models. ...
Article
Full-text available
Retirement reduces sleep problems, but changes in life satisfaction during the retirement transition are multifactorial and partly unknown. The aim of this prospective cohort study was to examine whether changes in sleep problems are associated with changes in total and domain-specific life satisfaction during the retirement transition (on average 0.5 years before and 0.5 years after retirement). The study population consisted of Finnish public sector employees (n = 3518) from the Finnish Retirement and Aging (FIREA) study who responded to annual surveys before and after transition to statutory retirement. Sleep problems were measured with the Jenkins Sleep Problem Scale questionnaire, and participants were grouped into four sleep problem groups depending on the state of their sleep problems during the retirement transition: ‘Never,’ ‘Decreasing,’ ‘Increasing,’ and ‘Constant’ sleep problems. Life satisfaction was measured with the Life Satisfaction Scale questionnaire, which includes four domains (interestingness, happiness, easiness, togetherness). We found that the improvement in total life satisfaction was greatest for participants in the ‘Decreasing’ (0.17, 95% CI 0.11–0.23, SMD 0.27) and ‘Constant’ (0.12, 95% CI 0.07–0.18, SMD 0.19) sleep problem groups. Of the specific life satisfaction domains, similar findings were observed only for the easiness domain. It seems that decreasing or constant sleep problems are associated with improved life satisfaction during the retirement transition, especially in the feeling of easiness of life. This may be because, as the demands of working life are removed, sleep problems are alleviated or it becomes easier to live with them, which improves life satisfaction.
... Hierarchical data are commonly analyzed using linear mixed-effects models (Pinheiro & Bates, 2006), which incorporate in the linear predictor both fixed effects, associated with the entire population, and random effects, associated with the groups in which observations are nested and randomly drawn from a Gaussian-distributed population (Goldstein, 2011). Generalized linear mixed-effects models (GLMMs) deal with responses that follow distributions in the exponential family other than the Gaussian (Diggle et al., 2002; Agresti, 2018). However, extending GLMMs to handle unordered categorical responses presents more challenges (Daniels & Gatsonis, 1997; Hartzel & Agresti, 2001; Hedeker, 2003; Kuss et al., 2007; Wang & Tsodikov, 2010), due to the increased complexity associated with their modelling. ...
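In generic form, a GLMM links the conditional mean of an exponential-family response to fixed and random effects (standard notation, not specific to the paper below):

    g( E[ Y_ij | b_i ] ) = x_ij' beta + z_ij' b_i,   b_i ~ N(0, D)

The model proposed in the abstract below replaces the Gaussian assumption on b_i with a discrete distribution whose number of support points is left unspecified a priori.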
Article
Full-text available
We propose a discrete random effects multinomial regression model to deal with estimation and inference issues in the case of categorical and hierarchical data. Random effects are assumed to follow a discrete distribution with an a priori unknown number of support points. For a K-category response, the modelling identifies a latent structure at the highest level of grouping, where groups are clustered into subpopulations. This model does not assume independence across random effects relative to different response categories, which provides an improvement over the multinomial semi-parametric multilevel model previously proposed in the literature. Since the category-specific random effects arise from the same subjects, the independence assumption is seldom verified in real data. To evaluate the improvements provided by the proposed model, we reproduce simulation and case studies from the literature, highlighting the strength of the method in properly modelling the real data structure and the advantages of taking the data dependence structure into account.
... When assumptions of multilevel linear regression models were not met, the authors used Huber and White sandwich estimators for the fixed effects [62]. For each outcome, the authors estimated a main effect model. Because obesity prevalence is often higher among women than among men and among Black than among White Americans, the authors examined whether relationships between HOLC grade and obesity outcomes were modified by individual-level factors. ...
Article
Full-text available
Introduction Historical maps of racialized evaluation of mortgage lending risk (i.e., redlined neighborhoods) have been linked to adverse health outcomes. Little research has examined whether living in historically redlined neighborhoods is associated with obesity, differentially by race or gender. Methods This is a cross-sectional study to examine whether living in historically redlined neighborhoods is associated with BMI and waist circumference among Black and White adults in 1985–1986. Participants’ addresses were linked to the 1930s Home Owners’ Loan Corporation maps that evaluated mortgage lending risk across neighborhoods. The authors used multilevel linear regression models clustered on Census tract and adjusted for confounders to estimate main effects, along with stratified and interaction models by (1) race, (2) gender, and (3) race by gender, allowing the association with redlining to differ for Black versus White adults and for men versus women. To better understand strata differences, they compared Census tract-level median household income across race and gender groups within Home Owners’ Loan Corporation grade. Results Black adults (n=2,103) were more likely than White adults (n=1,767) to live in historically rated hazardous areas and to have higher BMI and waist circumference. Redlining-by-race and redlining-by-gender interactions for BMI and waist circumference were statistically significant (p<0.10). However, in stratified analyses, the only statistically significant associations were among White participants. White participants living in historically rated hazardous areas had lower BMI (β=−0.63 [95% CI= −1.11, −0.15]) and lower waist circumference (β=−1.50 [95% CI= −2.62, −0.38]) than those living in declining areas. Within each Home Owners’ Loan Corporation grade, residents in White participants’ neighborhoods had higher incomes than those living in Black participants’ neighborhoods (p<0.0001). The difference was largest within historically redlined areas. Covariate associations differed for men, women, Black, and White adults, explaining the difference between the interaction and the stratified models. The race by redlining interaction did not vary by gender. Conclusions White adults may have benefitted from historical redlining, which may have reinforced neighborhood processes that generated racial inequality in BMI and waist circumference 50 years later.
... A first-order autoregressive structure, AR(1), was considered. This choice was made because it best represented the variation of the equally spaced data and because, on average, the covariance between two observations decreased as the time interval between them increased, as suggested by Diggle et al. (2002). ...
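For equally spaced measurements, the AR(1) structure encodes exactly this geometric decay of association with increasing lag (standard form, with rho the lag-one correlation):

    Cov(Y_ij, Y_ik) = sigma^2 * rho^|j - k|,   0 < rho < 1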
Article
Full-text available
This study aimed to evaluate the in vitro antibacterial activity of aqueous extracts of Baccharis dracunculifolia and Tamarindus indica L., of natural liquid extracts of cashew nut shell (cashew nut shell liquid, CNSL), and of clove essential oil (EO) against five species of Gram-negative ruminal bacteria. Cultures were grown in anaerobic medium containing 0.1, 0.2, 0.5, and 1.0 mg mL-1 of the extracts or oils. Growth was evaluated by monitoring optical density (OD 600 nm) at 0, 8, 12, and 24 hours of incubation at 39 °C. The aqueous baccharis and tamarind extracts and the natural CNSL extract inhibited the growth of Prevotella albensis, Prevotella bryantii, Treponema saccharophilum, and Succinivibrio dextrinosolvens. For Prevotella ruminicola and Succinivibrio dextrinosolvens, the addition of 1.0 mg mL-1 of clove leaf EO had the greatest impact on growth dynamics, reducing optical density at all observation intervals. The results of this research establish the efficacy of the natural additives (aqueous baccharis and tamarind extracts, CNSL, and clove essential oil) as in vitro antimicrobials against the Gram-negative ruminal bacteria analyzed.
... Unadjusted statistical comparisons between groups will be made using two-sample t-tests and Pearson's chi-squared tests. Generalized estimating equations (i.e., the GEE method) [34] will be used to analyze the effects of the intervention on the primary and secondary outcome variables while accounting for the nesting of participants in pharmacies. The GEE method is an extension of the generalized linear model. ...
Article
Full-text available
Background Suicide prevention gatekeeping is a skill that may support community (retail) pharmacists in managing patients who present with suicide warning signs. A brief, virtual, case-based training intervention was tailored to the retail setting (Pharm-SAVES). To test training effectiveness, a randomized controlled trial (RCT) protocol was developed for use in pharmacies across four states. Objective To introduce the trial protocol for assessing the effectiveness of the training in increasing the proportion of staff who recognize patients displaying warning signs and self-report engaging in gatekeeping, including asking if the patient is considering suicide. Methods This study uses a parallel cluster-randomized controlled trial to recruit 150 pharmacy staff in community pharmacies in four states, with two groups (intervention and control). The control group completes Pharm-SAVES online suicide prevention gatekeeper training and all assessment surveys at baseline, after training, and at 1-month follow-up. The experimental group completes all control group training and assessments plus interactive video role-play patient cases. Conclusion We hypothesize that, compared to those in the control group, experimental group trainees exposed to the interactive video role-play patient cases will be more likely to recognize warning signs in patient cases and self-report engaging in gatekeeping.
... To consider potential interactions between variables, relevant pairs were integrated into the model stepwise using interaction terms. Model selection followed the top-down strategy (Diggle et al., 2002) as described in Zuur et al. (2009). The Akaike Information Criterion (AIC) based on restricted maximum likelihood (REML) was chosen as the model selection tool for each of the five response variables. ...
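A sketch of this top-down comparison in R with nlme, assuming a hypothetical data frame env with one of the response variables (resp), candidate predictors shade and depth, and a grouping factor site: starting from the full fixed structure, alternative random structures are compared by REML-based AIC.

    library(nlme)

    full_gls <- gls(resp ~ shade * depth, data = env, method = "REML")
    full_lme <- lme(resp ~ shade * depth, random = ~ 1 | site,
                    data = env, method = "REML")

    AIC(full_gls, full_lme)  # retain the random structure with the lowest AIC
    ## Fixed effects are subsequently compared on ML fits, with a final REML refit.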
Article
Full-text available
Longer durations of warmer weather, altered precipitation, and modified streamflow patterns driven by climate change are expected to impair ecosystem resilience, exposing freshwater ecosystems and their biota to a severe threat worldwide. Understanding the spatio-temporal temperature variations and the processes governing thermal heterogeneity within the riverscape are essential to inform water management and climate adaptation strategies. We combined UAS-based imagery data of aquatic habitats with meteorological, hydraulic, river morphology and water quality data to investigate how key factors influence spatio-temporal stream heterogeneity on a diurnal basis within different thermal regions of a large recently restored Danube floodplain. Diurnal temperature ranges of aquatic habitats were larger than expected and ranged between 14.2 and 28.0 °C (mean = 20.7 °C), with peak median temperatures (26.1 °C) around 16:00 h. The observed temperature differences in timing and amplitude among thermal regions were unexpectedly high and created a mosaic pattern of temperature heterogeneity. For example, cooler groundwater-influenced thermal regions provided several cold water patches (CWP, below 19.0 °C) and potential cold water refuges (CWRs) around 12:00 h, at the time when other habitats were warmer than 21.0 °C, exceeding the ecological threshold (20.0 °C) for key aquatic species. Within the morphological complexity of the restored floodplain, we identified groundwater influence, shading and river morphology as the key processes driving thermal riverscape heterogeneity. Promoting stream thermal refuges will become increasingly relevant under climate change scenarios, and river restoration should consider both measures to physically prevent habitat from excessive warming and measures to improve connectivity that meet the temperature requirements of target species for conservation. This requires restoring mosaics of complex and dynamic temperature riverscapes.
... Longitudinal data imply repeated assessments of the same units over time. This type of data is common in the health area, and its major advantage is the capacity to separate cohort and temporal effects in the analyses (Diggle et al. 2002). For example, longitudinal data arise in clinical studies that follow a group of patients with diabetes over five years to track changes in their blood sugar levels and complications. ...
Article
Full-text available
Transformers are state-of-the-art technology for supporting diverse Natural Language Processing (NLP) tasks, such as language translation and word/sentence prediction. The main advantage of transformers is their ability to obtain high accuracies when processing long sequences, since they avoid the vanishing gradient problem and use the attention mechanism to maintain focus on the information that matters. These features are fostering the use of transformers in other domains beyond NLP. This paper employs a systematic protocol to identify and analyze studies that propose new transformer architectures for processing longitudinal health datasets, which are often dense and specifically focused on physiological, symptom, functioning, and other daily life data. Our analysis considered 21 of 456 initial papers, collecting evidence to characterize how recent studies modified or extended these architectures to handle longitudinal multifeatured health representations or provide better ways to generate outcomes. Our findings suggest, for example, that the main efforts are focused on methods to integrate multiple vocabularies, encode input data, and represent temporal notions among longitudinal dependencies. We comprehensively discuss these and other findings, addressing major issues that remain open for efficiently deploying transformer architectures for longitudinal multifeatured healthcare data analysis.
... Environmental exposure and biomonitoring data with repeated measurements in longitudinal and cluster studies are generally subject to left censoring. Marginal models are appropriate when the focus is on inferences about the population average [42], but these models have not been applied in the literature on exposure and biomonitoring data with repeated measures and non-detects. Therefore, we proposed incorporating available fill-in or substitution methods for values below the LOD into three estimating approaches, i.e., GEE, QIF, and GMM, in which consistent regression parameter estimates can be obtained even when the working correlation structure is incorrectly specified [21]. ...
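A sketch of one such fill-in-then-estimate pipeline in R, combining a common LOD/sqrt(2) substitution with a GEE fit via geepack (hypothetical data frame bio with exposure, a detect indicator, the lod, covariate x, and subject id; the paper also considers other substitutions and the QIF and GMM estimators):

    library(geepack)

    ## Single-value substitution for non-detects, then log-transform the
    ## right-skewed exposure.
    bio$exposure_filled <- ifelse(bio$detect, bio$exposure, bio$lod / sqrt(2))
    bio$log_exposure <- log(bio$exposure_filled)

    fit <- geeglm(log_exposure ~ x, id = id, data = bio,
                  family = gaussian, corstr = "exchangeable")
    summary(fit)  # consistent even if the working correlation is misspecified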
Article
Full-text available
Background Environmental exposure and biomonitoring data with repeated measurements from environmental and occupational studies are commonly right-skewed and subject to limits of detection (LOD). However, existing models have not been examined for small-sample properties or for highly skewed data with non-detects and repeated measurements. Objective Marginal modeling provides an alternative for analyzing longitudinal and cluster data, in which the parameter interpretations are with respect to marginal or population-averaged means. Methods We outlined the theories of three marginal models, i.e., generalized estimating equations (GEE), quadratic inference functions (QIF), and generalized method of moments (GMM). With these approaches, we proposed incorporating fill-in methods, including single and multiple value imputation techniques, such that any measurements below the limit of detection are assigned values. Results We demonstrated that the GEE method works well in terms of estimating the regression parameters in small sample sizes, while the QIF and GMM outperform it in large-sample settings, as their parameter estimates are consistent and have relatively smaller mean squared error. No specific fill-in method can be deemed superior, as each has its own merits. Impact Marginal modeling is employed for the first time to analyze repeated measures data with non-detects, in which only the mean structure needs to be correctly specified to obtain consistent parameter estimates. After replacing non-detects through substitution methods and utilizing small-sample bias corrections, in a simulation study we found that the estimating approaches used in the marginal models have corresponding advantages under a wide range of sample sizes. We also applied the models to longitudinal and cluster working examples.
... Generalized Estimating Equations, also known as GEE [6,12,13], is a statistical method initially proposed by Liang and Zeger [11]. It extends the framework of Generalized Linear Models (GLMs) and overcomes the assumption of independence among observations, making it particularly useful for handling correlated data. ...
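As a concrete illustration of the GEE framework sketched in this passage, a minimal R example using the geepack package; the data frame and variable names are hypothetical, and the exchangeable working correlation is only one of the usual choices.

library(geepack)
# Long-format data: one row per subject-visit, outcome y, covariates x and time,
# and a subject identifier id defining the correlated clusters.
fit <- geeglm(y ~ x + time, id = id, data = dat,
              family = gaussian, corstr = "exchangeable")
summary(fit)  # reports robust (sandwich) standard errors

Because only the mean model must be correctly specified for consistency, the choice of working correlation mainly affects efficiency rather than validity.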
Article
Full-text available
Rapid development in data science has made machine learning and artificial intelligence the most popular research tools across various disciplines. While numerous articles have shown decent predictive ability, little research has examined the impact of complex correlated data. We aim to develop a more accurate model for repeated measures or hierarchical data structures. This study therefore proposes a novel algorithm, the Generalized Estimating Equations Boosting (GEEB) machine, which integrates the gradient boosting technique into the benchmark statistical approach for correlated data, Generalized Estimating Equations (GEE). Unlike previous gradient boosting methods that utilize all input features, we randomly select a subset of input features when building the model to reduce predictive error. A simulation study evaluates the predictive performance of the GEEB, GEE, eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) across several hierarchical structures with different sample sizes. Results suggest that the new GEEB strategy outperforms the GEE and demonstrates superior predictive accuracy to the SVM and XGBoost in most situations. An application to a real-world dataset, the Forest Fire Data, also revealed that the GEEB reduced mean squared errors by 4.5% to 25% compared to GEE, XGBoost, and SVM. This research also provides a freely available R function that implements the GEEB machine effortlessly for longitudinal or hierarchical data.
... The most common approach for analysing this type of repeated-measures data is via Mixed Linear Models, which assume normally distributed responses (Weiss, 2005). Alternatively, Generalized Linear Mixed Models can be employed when the response variables fall within the framework of the exponential family of distributions, for example when the response variable is binary and logistic regression is employed for the analysis (Diggle et al., 2002). ...
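For reference, minimal R sketches of both model classes using the widely used lme4 package; the formulas and variable names are hypothetical placeholders, not taken from the cited work.

library(lme4)
# Gaussian response: linear mixed model with a random intercept per subject.
lmm <- lmer(y ~ time + treatment + (1 | subject), data = dat)
# Binary response: mixed-effects logistic regression (a GLMM).
glmm <- glmer(event ~ time + treatment + (1 | subject),
              data = dat, family = binomial)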
Article
Full-text available
Wheelchair basketball, regulated by the International Wheelchair Basketball Federation, is a sport designed for individuals with physical disabilities. This paper presents a data-driven tool that effectively determines optimal team line-ups based on past performance data and metrics for player effectiveness. Our proposed methodology involves combining a Bayesian longitudinal model with an integer linear programming problem to optimise the line-up of a wheelchair basketball team. To illustrate our approach, we use real data from a team competing in the Rollstuhlbasketball Bundesliga, namely the Doneck Dolphins Trier. We consider three distinct performance metrics for each player and incorporate uncertainty from the posterior predictive distribution of the longitudinal model into the optimisation process. The results demonstrate the tool's ability to select the most suitable team compositions and calculate posterior probabilities of compatibility or incompatibility among players on the court.
... The sample size was set considering a type 1 error of 0.05, a type 2 error of 0.20, and success rates of p1 = 0.25 and p2 = 0.40; the minimum required sample size was 101 in each group. Allowing for a sample loss rate of about 50%, the minimum final sample size will be about 151 people in each group (29). ...
Article
Full-text available
Background Healthy dietary intake and physical activity affect the immune system. The present study aimed to investigate the effects of a web-based lifestyle intervention on nutritional status, physical activity, and prevention of COVID-19. Methods Three hundred and three women (30–60 years old) from the City of Ardabil, who did not have COVID-19, participated in this study. Participants were randomized into an intervention (n = 152) or control group (n = 151). The intervention group received eight online educational sessions focusing on a healthy diet and physical activity via the website. There were no educational sessions for the control group during the intervention; they were placed on a waiting list to receive the intervention and given access to the website and educational content after the follow-up. Outcomes were nutritional status, physical activity, and immunoglobulin G (IgG) and immunoglobulin M (IgM) antibody titers against the virus. They were evaluated at baseline and after 4 and 12 weeks. Results Significant improvements in weight (P < 0.001), BMI (P < 0.001), total energy (P = 0.006), carbohydrate (P = 0.001), protein (P = 0.001), and fat (P < 0.001) were found for the intervention group compared to the control group during the study. MET-min/week for moderate physical activity increased over time in the intervention and control groups (P < 0.001 and P = 0.007, respectively). MET-min/week for walking activity rose post-intervention and at follow-up compared to baseline in both groups (P < 0.001 for both). Total physical activity increased during the study in both groups (P < 0.001). Mean serum IgG and IgM titers against the virus increased over time in both groups (P < 0.001 for the time effect). There was a significant time x group interaction for carbohydrate and fat intakes (P = 0.005 and P = 0.004, respectively). Conclusion The web-based lifestyle intervention may improve nutritional status and physical activity and has the potential to reduce the risk of contracting a COVID-19 infection.
... Linear mixed models can accommodate participants with missing observations [31]. For the third research question, participants' responses will be examined qualitatively and described narratively. All participants will be asked the questions relating to psychosocial risks. ...
Article
Full-text available
Background A positive child-caregiver relationship is one of the strongest determinants of child health and development, yet many caregivers report challenges in establishing a positive relationship with their child. For over 20 years, Make the Connection® (MTC), an evidence-based parenting program, has been delivered in-person by child-caring professionals to over 120,000 parents to improve positive parenting behaviours and attitudes. Recently, MTC has been adapted into a ‘direct to caregiver’ online platform to increase scalability and accessibility. The purpose of this study is to evaluate the effectiveness of the online modality of MTC in increasing parenting knowledge, attitudes, and the perceived relationship with their child, and to understand barriers and facilitators to its access. Methods Two hundred caregivers with children aged 0-3 years old will be recruited through Public Health agencies in Ontario, Canada. Participants will be randomly placed in the intervention or waitlist control group. Both groups will complete a battery of questionnaires at study enrolment and 8 weeks later. The intervention group will receive the MTC online program during the 8-week period, while the waitlist group will receive the program after an 8-week wait. The study questionnaires will address demographic information, caregivers’ relational attitudes towards their infant, self-competence in their caregiver role, depression, and caregiver stress, as well as caregivers’ and infants’ emotion regulation. Discussion Results from this study will add critical knowledge to the development, scaling, and roll out of the MTC online program, thus increasing its capacity to reach a greater number of families. Trial registration The study was registered with ClinicalTrials.gov on 15 March 2023 (NCT05770414).
... Repeated-measures and clustered data, especially in longitudinal studies, are very common in several areas of knowledge, such as medicine, economics, and biology. In modeling such experiments, it is necessary to account for the structure of dependence within subjects to obtain greater precision, especially in estimating the standard errors of the estimators (Diggle et al., 2002). The first models to deal with this problem were based on the multivariate normal distribution (Singer, Rocha and Nobre, 2017), such as marginal multivariate models, linear mixed models (Henderson, 1953; Laird and Ware, 1982) and non-linear mixed models (Lindstrom and Bates, 1990). ...
Article
Experiments with repeated measures are those in which more than one observation per subject is available. To model such experiments, dependency within subjects needs to be taken into consideration. In cases where the variable of interest is bounded in (a, b), with a < b known reals, there are few proposals to model correlated bounded data, most of them based on the Beta, Simplex and Unit Gamma distributions. In particular, for marginal modeling of the mean and precision/dispersion, Simplex and Beta models based on Generalized Estimating Equations (GEE) are used. In this paper, to account for possible within-subject dependence using the GEE approach, we propose a Unit Gamma regression model for bounded data on a unit interval. We also develop residual and influence diagnostic tools for the Simplex and Unit Gamma models for correlated bounded data. Furthermore, to assess the finite-sample performance of the proposed estimators, we conducted a Monte Carlo simulation study. The methodology is illustrated with the analysis of a real data set. An R package was developed for all the new methodology described in this paper.
Article
Full-text available
The mixed-effects model for repeated measures (MMRM) approach has been widely applied in longitudinal clinical trials. Many of the standard inference methods for MMRM can lead to inflated type I error rates in tests of the treatment effect when the longitudinal dataset is small and involves missing measurements. We propose two improved inference methods for MMRM analyses: (1) the Bartlett correction with the adjustment term approximated by bootstrap, and (2) a Monte Carlo test using a null distribution estimated by bootstrap. These methods can be implemented regardless of model complexity and missing patterns via a unified computational framework. Through simulation studies, the proposed methods maintain the type I error rate properly, even in small and incomplete longitudinal clinical trial settings. An application to a postnatal depression clinical trial is also presented.
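The second proposal, a Monte Carlo test against a bootstrap-estimated null distribution, can be sketched generically. The R sketch below uses a simple random-intercept model from lme4 for brevity rather than the authors' MMRM implementation, and all data and variable names are hypothetical.

library(lme4)
# Fit the models with and without the treatment terms (ML, not REML, for LR tests).
fit1 <- lmer(y ~ visit * trt + (1 | id), data = dat, REML = FALSE)
fit0 <- lmer(y ~ visit + (1 | id), data = dat, REML = FALSE)
obs  <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))  # observed LR statistic

# Estimate the null distribution by parametric bootstrap from the null fit.
ystar <- simulate(fit0, nsim = 500)
null_stats <- sapply(ystar, function(yy) {
  f0 <- refit(fit0, yy)
  f1 <- refit(fit1, yy)
  as.numeric(2 * (logLik(f1) - logLik(f0)))
})
p_value <- mean(null_stats >= obs)  # Monte Carlo p-value

The same recipe applies with an MMRM fit in place of the random-intercept model; only the fitting and simulation steps change.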
Article
Full-text available
Introduction Type 2 diabetes impacts millions, and poor maintenance of diabetes can lead to preventable complications, which is why achieving and maintaining target A1C levels is critical. Thus, we aimed to examine inequities in A1C over time, place, and individual characteristics, given known inequities across these indicators and the need for continued surveillance. Methods Secondary de-identified data from medical claims from a single payer in Texas were merged with population health data. Generalized Estimating Equations were utilized to assess multiple years of data, examining the likelihood of having non-target (>7% and ≥7%, two slightly different cut points based on different sources) and, separately, uncontrolled (>9%) A1C. Adults in Texas with a Type 2 Diabetes (T2D) flag and an A1C value reported in the first quarter of the year, using data from 2016 and 2019, were included in the analyses. Results Approximately 50% had A1Cs within target ranges (<7% and ≤7%) and 50% had non-target (>7% and ≥7%) A1Cs; 83% were within the controlled range (≤9%), compared to approximately 17% with uncontrolled (>9%) A1Cs. The likelihood of non-target A1C was higher among individuals residing in rural (vs urban) areas (P < .0001); similarly, those in rural areas were more likely to report uncontrolled A1C (P < .0001). In adjusted analysis, ACA enrollees in 2016 were approximately 5% more likely (OR = 1.049, 95% CI = 1.002-1.099) to have non-target A1C (≥7%) compared to 2019; in contrast, non-ACA enrollees were approximately 4% more likely to have non-target A1C (≥7%) in 2019 compared to 2016 (OR = 1.039, 95% CI = 1.001-1.079). In adjusted analysis, ACA enrollees in 2016 were 9% more likely (OR = 1.093, 95% CI = 1.025-1.164) to have uncontrolled A1C compared to 2019, whereas there was no significant change among non-ACA enrollees. Conclusions This study can inform health care interactions in diabetes care settings and help health policy makers explore strategies to reduce health inequities among patients with diabetes. Key partners should consider interventions to aid those enrolled in ACA plans, those in rural and border areas, and those who may have coexisting health inequities.
Book
Preface
Early disease detection and prevention offer numerous benefits to both our health and society. Often, the earlier a disease is detected, the higher the likelihood of successful cure or management. Managing a disease in its early stages can significantly reduce its impact on a patient's quality of life and decrease healthcare costs. To detect a disease early, disease screening has become a popular tool. This method aims to determine the likelihood of a given patient having a particular disease by applying medical procedures or tests to check the major risk factors, even in patients without obvious symptoms of the disease. While disease screening primarily focuses on individual patients, disease surveillance is for detecting disease outbreaks early within a given population. For example, our society faces constant threats from bioterrorist attacks and pandemic influenza. It is thus important to monitor the incidence of infectious diseases continuously and detect their outbreaks promptly. This allows governments and individuals to implement timely disease control and prevention measures, minimizing the impact of these diseases. This book introduces some recent analytic methodologies and software packages developed for effective disease screening and disease surveillance. My exploration into disease screening was motivated by an experience around 2010 when I analyzed a dataset from the Framingham Heart Study (FHS). The FHS primarily aims to identify major risk factors for cardiovascular diseases (CVDs), and numerous CVD risk factors have been recognized since the study's inception in 1948, including smoking, high blood pressure, obesity, high cholesterol levels, physical inactivity, and more. During my data analysis, a pivotal question emerged: could the identified CVD risk factors be utilized to predict the likelihood of a severe CVD, such as stroke, for individual patients? Statistically, this translates into a sequential decision-making problem, for which the relevant statistical tools are statistical process control (SPC) charts. However, traditional SPC charts, developed primarily for monitoring production lines in manufacturing, assume independence and identical distribution of process observations when the process is in-control (IC), and are designed for monitoring a single sequential process. In the context of disease screening, observed data on a patient's disease risk factors would rarely be independent and identically distributed over time, and treating each patient's observed data as a process introduces numerous processes from different patients, making traditional SPC charts unsuitable. Recognizing the importance of the disease screening problem, I dedicated much of the past decade to addressing this issue. This endeavor led to the development of a series of new concepts and methods by my research team. The central methodology, termed the Dynamic Screening System (DySS), operates as follows: first, the regular longitudinal pattern of disease risk factors is estimated from a pre-collected dataset representing the population without the target disease. Subsequently, a patient's observed pattern of disease risk factors is cross-sectionally compared with the estimated regular longitudinal pattern at each observation time. The cumulative difference between the two patterns up to the current time is then employed to determine the patient's disease status at that time.
DySS utilizes all historical data of the patient in its decision-making, and effectively accommodates the complex data structure, including time-varying data distribution. In the summer of 2013, upon joining the University of Florida (UF), I started to work on the pressing issue of disease surveillance due to its paramount importance in public health. Disease incidence data are typically collected sequentially over time and across multiple locations or regions, constituting spatio-temporal data. Similar to disease screening, disease surveillance is a sequential decision-making problem. However, its complexity arises from the intricate spatio-temporal data structure, encompassing seasonality, temporal/spatial variation, data correlation, and intricate data distribution. Common disease reporting and surveillance systems incorporate conventional SPC charts such as the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) charts. Additionally, retrospective methods like scan tests and generalized linear modeling approaches are employed for routine surveillance. Unfortunately, these methods often prove ineffective or unreliable due to their inability to handle the sequential nature of the problem or their restrictive model assumptions (cf., Section 2.7 and Chapters 7 and 8). Over the past decade, my research team has devoted significant effort to this domain, resulting in the development of several novel analytic methods for disease surveillance. Our initial method operates as follows: First, a nonparametric spatio-temporal modeling approach is employed to estimate the regular spatio-temporal pattern of disease incidence rates from observed data in a baseline time interval (e.g., a previous year without outbreaks). Second, the new spatial data collected at the current time are compared with the estimated regular pattern and decorrelated with all previous data. Third, an SPC chart is then applied to the decorrelated data to determine the occurrence of a disease outbreak by the current time. Modified versions of this method have been crafted to incorporate covariate information and accommodate specific spatial features of disease outbreaks. These methods adeptly handle the complex structure of observed data and have demonstrated effectiveness in disease surveillance. As discussed earlier, both disease screening and disease surveillance pose challenges as sequential decision-making problems, and traditional SPC charts prove unreliable in addressing them adequately. Consequently, disease screening and disease surveillance emerge as crucial applications of SPC, demanding the development of new methods tailored to their specific requirements. Fortuitously, my research journey in SPC began in 1998, allowing me to contribute significantly to several key areas within the field. Notable contributions include advancements in nonparametric process monitoring (e.g., Qiu and Hawkins 2001, Qiu 2018), monitoring correlated data (e.g., Qiu et al. 2020a, Xue and Qiu 2021), dynamic process monitoring (e.g., Qiu and Xiang 2014, Xie and Qiu 2023a), profile monitoring (e.g., Qiu et al. 2010, Zhou and Qiu 2022), and more. For a comprehensive description of SPC and some SPC charts developed by my research group, see the book Qiu (2014). This extensive experience has proven invaluable in my exploration of disease screening and disease surveillance, providing a robust foundation to innovate and tailor SPC methodologies to the distinctive challenges presented in these critical areas of public health. 
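Both pipelines described above end by applying an SPC chart to standardized (and, for surveillance, decorrelated) deviations from an estimated regular pattern. As a generic illustration of that final sequential-monitoring step only, not of the DySS or SpTe2M packages themselves, here is a one-sided CUSUM in base R applied to hypothetical standardized deviations; the reference value k and control limit h are design choices shown with conventional illustrative values.

# Generic upper-sided CUSUM on standardized deviations z[t] from the regular pattern.
cusum_signal <- function(z, k = 0.5, h = 4) {
  C <- 0
  for (t in seq_along(z)) {
    C <- max(0, C + z[t] - k)
    if (C > h) return(t)  # first time the chart signals
  }
  NA  # no signal within the observed sequence
}
set.seed(1)
z <- c(rnorm(30), rnorm(20, mean = 1.5))  # an upward shift begins at time 31
cusum_signal(z)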
The book comprises nine chapters. In Chapter 1, a concise introduction sets the stage for understanding the challenges posed by disease screening and surveillance problems. Chapter 2 delves into fundamental statistical concepts and methods commonly employed in data modeling and analysis. Given that disease screening and surveillance involve sequential decision-making, Chapter 3 is dedicated to introducing essential SPC concepts and methods -- a major statistical tool for such problems. Chapters 4-6 focus on recent developments in DySS methods tailored for effective disease screening. Chapter 4 covers univariate and multivariate DySS methods based on direct monitoring of observed disease risk factors, while Chapter 5 introduces methods based on disease risk quantification and sequential monitoring of quantified disease risks. The practical implementation of DySS methods by the R package DySS is detailed in Chapter 6. Chapters 7-9 shift the focus to disease surveillance. Chapter 7 explores traditional methods utilizing the Knox test, scan statistics, and generalized linear modeling. Chapter 8 presents recent methods developed by my research team based on nonparametric spatio-temporal data modeling and monitoring. The implementation of these methods is demonstrated using the R package SpTe2M in Chapter 9. This book serves as an ideal primary textbook for a one-semester course focused on disease screening and/or disease surveillance, tailored for graduate students in biostatistics, bioinformatics, health data science, and related disciplines. Additionally, the book can be utilized as a supplementary textbook for courses covering analytic methods and tools relevant to medical and public health studies. Its content is designed to be accessible and beneficial for medical and public health researchers and practitioners. By introducing recent analytic tools for disease screening and surveillance, the book equips readers with valuable insights that can be easily implemented using the accompanying R packages DySS and SpTe2M. I extend my sincere gratitude to my current and former students and collaborators, Drs. Jun Li, Dongdong Xiang, Kai Yang, Lu You, and Jingnan Zhang, whose dedicated efforts, stimulating discussions, and constructive comments have played an invaluable role in the completion of this book. Their patience and insights have been indispensable. I express my deep appreciation to Dr. Xiulin Xie and Mr. Zibo Tian, who generously dedicated their time to reading the entire book manuscript and diligently corrected numerous typos and mistakes. Completing this book has been a three-year journey, and I owe a debt of gratitude to my wife, Yan, for providing unwavering help and support. Her efforts in managing household responsibilities and caring for our two sons, Andrew and Alan, allowed me to focus on this project. I extend my heartfelt thanks to my family for their love and constant support throughout this endeavor. Peihua Qiu Gainesville, Florida November 2023
Article
Full-text available
Background Extreme heat and air pollution are associated with increased mortality. Recent evidence suggests the combined effects of both are greater than the effects of each individual exposure. Low neighborhood socioeconomic status (“socioeconomic burden”) has also been associated with increased exposure and vulnerability to both heat and air pollution. We investigated whether neighborhood socioeconomic burden, or the combination of socioeconomic and environmental exposures (“socioenvironmental burden”), modified the effect of combined exposure to extreme heat and particulate air pollution on mortality in California. Methods We used a time-stratified case-crossover design to assess the impact of daily exposure to extreme particulate matter <2.5 μm (PM2.5) and heat on cardiovascular, respiratory, and all-cause mortality in California, 2014–2019. Daily average PM2.5 and maximum temperatures based on decedents' residential census tracts were dichotomized as extreme or not. Census tract-level socioenvironmental and socioeconomic burden was assessed with the CalEnviroScreen (CES) score and a social deprivation index (SDI), and individual educational attainment was derived from death certificates. Conditional logistic regression was used to estimate associations of heat and PM2.5 with mortality, with a product term used to evaluate effect measure modification. Results During the study period, 1,514,292 all-cause deaths could be assigned residential exposures. Extreme heat and air pollution, alone and combined, were associated with increased mortality, matching prior reports. Decedents in census tracts with higher socioenvironmental and socioeconomic burden experienced more days with extreme PM2.5 exposure. However, we found no consistent effect measure modification by CES or SDI of the association between combined or separate extreme heat and PM2.5 exposure and odds of total, cardiovascular or respiratory mortality. No effect measure modification was observed for individual educational attainment. Conclusion We did not find evidence that neighborhood socioenvironmental or socioeconomic burden significantly influenced the individual or combined impact of extreme exposures to heat and PM2.5 on mortality in California. Impact We investigated effect measure modification by socioeconomic and socioenvironmental burden of the co-occurrence of heat and PM2.5, which adds to the limited previous literature on effect measure modification by socioeconomic and socioenvironmental burden of heat alone and PM2.5 alone. We found no consistent effect measure modification by neighborhood socioenvironmental and socioeconomic burden, or by individual-level SES, of the mortality association with extreme heat and PM2.5 co-exposure. However, we did find an increased number of days with extreme PM2.5 exposure in neighborhoods with high socioenvironmental and socioeconomic burden. We evaluated multiple area-level and one individual-level SES and socioenvironmental burden metrics, each estimating socioenvironmental factors differently, making our conclusions more robust.
Article
Mucopolysaccharidosis type I Hurler (MPSIH) is characterized by severe and progressive skeletal dysplasia that is not fully addressed by allogeneic hematopoietic stem cell transplantation (HSCT). Autologous hematopoietic stem progenitor cell–gene therapy (HSPC-GT) provides superior metabolic correction in patients with MPSIH compared with HSCT; however, its ability to affect skeletal manifestations is unknown. Eight patients with MPSIH (mean age at treatment: 1.9 years) received lentiviral-based HSPC-GT in a phase 1/2 clinical trial (NCT03488394). Clinical (growth, measures of kyphosis and genu valgum), functional (motor function, joint range of motion), and radiological [acetabular index (AI) and migration percentage (MP) on hip x-rays and MRIs, and spine MRI score] parameters of skeletal dysplasia were evaluated at baseline and multiple time points up to 4 years after treatment. Specific skeletal measures were retrospectively compared with an external cohort of HSCT-treated patients. At a median follow-up of 3.78 years after HSPC-GT, all patients treated with HSPC-GT exhibited longitudinal growth within WHO reference ranges and a median height gain greater than that observed in patients treated with HSCT after 3-year follow-up. Patients receiving HSPC-GT experienced complete and earlier normalization of joint mobility compared with patients treated with HSCT. Mean AI and MP showed progressive decreases after HSPC-GT, suggesting a reduction in acetabular dysplasia. Typical spine alterations measured through a spine MRI score stabilized after HSPC-GT. Clinical, functional, and radiological measures suggested an early beneficial effect of HSPC-GT on MPSIH-typical skeletal features. Longer follow-up is needed to draw definitive conclusions on HSPC-GT's impact on MPSIH skeletal dysplasia.
Article
The correlation matrix might be of scientific interest for longitudinal data. However, few studies have focused on both robust estimation of the correlation matrix against model misspecification and robustness to outliers in the data, when the precision matrix possesses a typical structure. In this paper, we propose an alternative modified Cholesky decomposition (AMCD) for the precision matrix of longitudinal data, which results in robust estimation of the correlation matrix against model misspecification of the innovation variances. A joint mean-covariance model with multivariate normal distribution and AMCD is established, the quasi-Fisher scoring algorithm is developed, and the maximum likelihood estimators are proven to be consistent and asymptotically normally distributed. Furthermore, a double-robust joint modeling approach with multivariate Laplace distribution and AMCD is established, and the quasi-Newton algorithm for maximum likelihood estimation is developed. The simulation studies and real data analysis demonstrate the effectiveness of the proposed AMCD method.
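For orientation, the classical modified Cholesky decomposition that the AMCD modifies (a standard construction, stated here for context rather than taken from the paper) factors the precision matrix of the response vector as

\[ \Sigma^{-1} = T^{\top} D^{-1} T, \]

where T is a unit lower-triangular matrix whose below-diagonal entries are the negatives of the coefficients from regressing each measurement on its predecessors, and D is a diagonal matrix of innovation variances. The AMCD re-parameterizes this factorization so that misspecifying the innovation variances does not contaminate estimation of the correlation matrix; the precise construction is given in the paper.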
Article
In this paper, we focus on identifying influential observations using the Liu corrected likelihood estimator (LCLE) in linear mixed measurement error models when multicollinearity is present. Based on the LCLE, residuals were analyzed to evaluate the validity of the model assumptions. Diagnostic measures were also developed to identify influential and high-leverage observations. We considered an extension of Cook's distance to detect influential observations based on the case-deletion model. Finally, a real example and simulation studies are provided to illustrate the performance of the influence measures.
Article
Data collected over time are common in applications and may contain censored or missing observations, making it difficult to use standard statistical procedures. This article proposes an algorithm to estimate the parameters of a censored linear regression model with serially correlated errors and innovations following a Student-t distribution. This distribution is widely used in the statistical modelling of data containing outliers because its longer-than-normal tails provide a robust approach to handling such data. The maximum likelihood estimates of the proposed model are obtained through a stochastic approximation of the EM algorithm. The methods are applied to an environmental dataset on ammonia-nitrogen concentration, which is subject to a limit of detection (left censoring) and contains missing observations. Additionally, two simulation studies are conducted to examine the asymptotic properties of the estimates and the robustness of the model. The proposed algorithm and methods are implemented in the R package ARCensReg.
Chapter
We propose a method of constructing representations of multiple one-dimensional longitudinal measurements as two-dimensional grey-scale images. This can be used to turn classification problems based on longitudinal data into simpler image classification problems, allowing image-based deep learning methods to be applied to longitudinal measurements. Our approach is applicable to situations with balanced or imbalanced longitudinal data sets and where there are missing data at some time points. To evaluate our approach, we apply it to an important and challenging task: the prediction of dementia from brain volume trajectories derived from longitudinal MRI. We construct an ensemble of convolutional neural network models to classify two groups of subjects: those diagnosed with mild cognitive impairment at all examinations (stable MCI) versus those starting out as MCI but later converting to Alzheimer's disease (converted AD). Models were trained on image representations derived from N = 736 subjects sourced from the ADNI database (471/265 sMCI/cAD). The resulting ensemble model achieved an accuracy of 76%, measured on an independent test set. Our approach is simple and easy to apply yet competitive, in terms of accuracy, with results reported for other machine learning approaches on comparable classification tasks. This indicates that the approach can lead to useful representations of longitudinal data.
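One simple way such an image representation could be built, sketched in base R under assumptions that may differ from the chapter's actual construction: rows are the longitudinal variables, columns the time points, each row rescaled to [0, 1] so it maps to grey levels, with missing time points crudely set to zero.

# x: hypothetical matrix for one subject, variables in rows, time points in columns.
to_grey_image <- function(x) {
  rescale <- function(r) {
    rng <- range(r, na.rm = TRUE)
    (r - rng[1]) / (rng[2] - rng[1])
  }
  img <- t(apply(x, 1, rescale))  # each variable rescaled to [0, 1]
  img[is.na(img)] <- 0            # crude handling of missing time points
  img
}
x <- matrix(rnorm(5 * 12), nrow = 5)  # 5 variables, 12 visits
x[3, 7] <- NA                         # a missing measurement
image(t(to_grey_image(x)), col = grey.colors(256), axes = FALSE)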
Article
To analyze tree growth statistically through annual ring widths measured in 2-D horizontal trunk sections, we propose two tests of significance defined under a linear-circular regression model with fixed trigonometric effects and normal random errors with a variance-covariance structure from the symmetric circulant family. The associated von Mises distribution has a preferred direction parameter. Accordingly, the first test aims to assess the presence of a preferred direction in the radial growth of a tree from the center of its trunk in a given year. Assuming there is a preferred direction of radial growth for the tree in two years, the second test extends the first one by assessing the equality of tree radial growth in the two preferred directions. Both tests of significance are modified F-tests with the denominator df adjusted for the presence of autocorrelation. Their validity is analyzed for two autoregressive symmetric circulant correlation structures, as a function of the number (n) of angular data and the autocorrelation parameter value. Effects of the inter-year correlation coefficient value are also studied in the two-year case. The performance of restricted maximum likelihood (REML) as the estimation method is scrutinized in an extensive Monte Carlo study, and the power of the tests is analyzed when valid. The new testing procedures are applied with n = 32 and 64 ring widths per year for a white spruce tree during 18 years of growth until its harvest. R codes are available. Conclusions and perspectives for future research are given. Supplementary materials accompanying this paper appear on-line.
Article
Objective Determine whether continuous glucose monitor (CGM) metrics can provide actionable advance warning of an emergency department (ED) visit or hospitalization for hypoglycemic or hyperglycemic (dysglycemic) events. Research Design and Methods Two nested case-control studies were conducted among insulin-treated diabetes patients at Kaiser Permanente who shared their CGM data with their providers. Cases included dysglycemic events identified from ED and hospital records (2016-2021). Controls were selected using incidence density sampling. Multiple CGM metrics were calculated for patients using their CGM more than 70% of the time, based on CGM data from two lookback periods (0-7 days and 8-14 days) prior to each event. Generalized estimating equations were specified to estimate odds ratios and C-statistics. Results Among 3,626 CGM users, 108 patients had 154 hypoglycemic events and 165 patients had 335 hyperglycemic events. Approximately 25% of patients had no CGM data during either lookback; these patients had more than twice the odds of a hypoglycemic event and 3-4 times the odds of a hyperglycemic event. While several metrics were strongly associated with a dysglycemic event, none had good discrimination. Conclusion Several CGM metrics were strongly associated with the risk of dysglycemic events, and these can be used to identify higher-risk patients. Patients who are not using their CGM device may also be at elevated risk of adverse outcomes. However, no CGM metric or absence of CGM data had adequate discrimination to reliably provide actionable advance warning of an event and thus justify a rapid intervention.
Preprint
Full-text available
The original protocol was approved on December 19, 2018 during the funding application process (NIH/NIDDK 1K23DK116935). Two protocol amendments were made prior to the start of recruitment. First, the study medication was changed from liraglutide (the medication proposed in the original K23 funding application) to phentermine, and medication-specific sections of the protocol, including medication-related inclusion/exclusion criteria, were modified accordingly (IRB-approved May 13, 2019). The second amendment included the following minor changes (IRB-approved July 18, 2019): replacing cholecystokinin (CCK) with insulin in a neuropeptide hormone panel; adding a prespecified secondary analysis of changes in past-week VAS ratings of appetite; adding exploratory questionnaires (Philadelphia Mindfulness Questionnaire, Yale Food Addiction questionnaire, sleep hours/week); specifying that the electrocardiogram (ECG) would occur at screening and that the first BT session would start immediately after the baseline assessment visit; updating the portion size of the liquid test meal for males vs. females; and specifying that only randomized participants would complete a cardiometabolic and lipid panel at the randomization visit. Three minor protocol amendments were made during the trial: 1) in order to reduce in-person contact in response to COVID-19, we allowed the ECG to take place any time prior to randomization (rather than only at screening) and specified that all lifestyle counseling sessions (not just make-up sessions) could be conducted remotely via videoconferencing or phone (IRB-approved July 27, 2020); 2) we replaced the term “nurse practitioner” with “research nurse” due to a change in the Center’s staff (IRB-approved April 13, 2021); and 3) we amended the protocol so that HbA1c could be substituted for fasting blood glucose for the second, confirmatory assessment of diabetes at screening (IRB-approved May 14, 2021).
Preprint
Background: The overall goal of the North Carolina Works for Health (NCW4H) study is to adapt and test the effectiveness of a multilevel intervention to reduce chronic disease risk in socioeconomically disadvantaged, unemployed (SEDU) populations who rely on publicly funded job placement programs to secure employment. Studies have shown that an unemployment episode increases psychological distress, health-compromising coping behavior, blood pressure, and weight gain, all of which increase chronic disease risk. Methods: A randomized, 2 x 2 factorial design will test an individual-level (IL) and an employer-level (EL) intervention, and their joint effects, in SEDU adults receiving job placement services through publicly funded programs. Interventions consist of a chronic disease prevention program adapted from the Diabetes Prevention Program at the IL, and an implicit-bias-based supervisor support program for newly hired SEDU adults at the EL. We will enroll 600 SEDU adults 18 to 64 years of age who have either received public assistance benefits in the prior two years or have less than a 4-year college degree, used publicly funded job placement services during the most recent unemployment episode, and are not receiving disability income, not pregnant, and fluent in English; and 80-200 supervisors of our enrolled SEDU participants once hired by an employer. Primary outcomes include psychological distress, blood pressure, and weight gain, and will be collected at baseline and 3, 6, and 12 months post-enrollment. Secondary outcomes related to coping, health behavior, workplace support, and employment will also be collected. Main effects of the IL and EL interventions, and IL x EL interactions, will be analyzed using generalized multivariate models accounting for clustering effects. Discussion: This multilevel intervention is novel in that it is designed to mitigate chronic disease risk during an unemployment episode and includes an intervention at the employer level, where social determinants of health operate. Despite the design challenges that multilevel intervention trials such as the NCW4H study present, they are needed to meaningfully address health inequities in the U.S. Findings from this study are expected to inform how approaches that incorporate public health and employment sectors could reduce chronic disease among socioeconomically disadvantaged populations.
Preprint
Full-text available
Objective: The use of blood-based biomarkers of Alzheimer disease (AD) may facilitate access to biomarker testing of groups that have been historically under-represented in research. We evaluated whether plasma Aβ42/40 has similar or different baseline levels and longitudinal rates of change in participants racialized as Black or White. Methods: The Study of Race to Understand Alzheimer Biomarkers (SORTOUT-AB) is a multi-center longitudinal study to evaluate for potential differences in AD biomarkers between individuals racialized as Black or White. Plasma samples collected at three AD Research Centers (Washington University, University of Pennsylvania, and University of Alabama-Birmingham) underwent analysis with C2N Diagnostics’ PrecivityAD™ blood test for Aβ42 and Aβ40. General linear mixed effects models were used to estimate the baseline levels and rates of longitudinal change for plasma Aβ measures in both racial groups. Analyses also examined whether dementia status, age, sex, education, APOE ε4 carrier status, medical comorbidities, or fasting status modified potential racial differences. Results: Of the 324 Black and 1,547 White participants, there were 158 Black and 759 White participants with plasma Aβ measures from at least two longitudinal samples over a mean interval of 6.62 years. At baseline, the group of Black participants had lower levels of plasma Aβ40 but similar levels of plasma Aβ42 as compared to the group of White participants. As a result, baseline plasma Aβ42/40 levels were higher in the Black group than the White group, consistent with the Black group having lower levels of amyloid pathology. Racial differences in plasma Aβ42/40 were not modified by age, sex, education, APOE ε4 carrier status, medical conditions (hypertension and diabetes), or fasting status. Despite differences in baseline levels, the Black and White groups had a similar longitudinal rate of change in plasma Aβ42/40. Interpretation: Black individuals participating in AD research studies had a higher mean level of plasma Aβ42/40, consistent with a lower level of amyloid pathology, which, if confirmed, may imply a lower proportion of Black individuals being eligible for AD clinical trials in which the presence of amyloid is a prerequisite. However, there was no significant racial difference in the rate of change in plasma Aβ42/40, suggesting that amyloid pathology accumulates similarly across racialized groups.
Article
Full-text available
Introduction: To reduce obesity-related disparities, reaching economically disadvantaged and/or minority status adolescents to assist them in meeting physical activity (PA) and nutrition recommendations is important. To address the problem, a 16-week intervention called Guys/Girls Opt for Activities for Life (GOAL) was designed. The purpose of this randomised controlled trial is to evaluate any effect of the intervention, compared with a control condition, on improving: (1) adolescents' % body fat (primary outcome), moderate-to-vigorous PA (MVPA), diet quality and cardiorespiratory fitness from 0 to 4 months; (2) body mass index (BMI), overweight/obesity percentage and quality of life from 0 to 4 months and to 13 months; and (3) perceived social support, self-efficacy and motivation from 0 to 4 months with evaluation of any mediating effect on adolescent PA and diet quality. An exploratory aim is to evaluate any effect of the intervention, compared with the control, on improving parents'/guardians' home environment, MVPA and diet quality from 0 to 4 months; and BMI from 0 to 4 months and to 13 months. Methods and analysis: Adolescents (fifth to eighth grade) in 14 schools located in underserved urban communities are randomly assigned to the intervention or usual school offerings. One parent per adolescent is enrolled (882 dyads total). Cohort 1 includes four schools (2022-2023). Cohorts 2 and 3 include 5 schools in 2023-2024 and 2024-2025, respectively. The 16-week intervention has three components: (1) after-school GOAL club for adolescents to engage in PA and healthy eating/cooking activities; (2) three parent-adolescent meetings to empower parents to assist adolescents; and (3) GOAL social networking website for parents to share how they helped their adolescent. Ethics and dissemination: The Michigan State University Biomedical Institutional Review Board provided ethical approval for the study. Findings will be shared via the trial registration database, peer-reviewed publications, conferences and community-oriented strategies. Trial registration number: NCT04213014.
Chapter
Practical experience and theoretical study are the basis for solid technical knowledge. Electrical engineering courses require a series of laboratory exercises, often held in university facilities where students can practice only a few hours a week, if at all. This practice changed dramatically with the COVID-19 pandemic and social distancing. New, freely available digital tools have made it easier to acquire and improve computational thinking and IoT development skills (MQTT), making it possible to go much deeper into the subject (“Smart Cities”) in a didactic way. As a result, we propose systematically creating e-Learning distance workshops in a virtualized environment, including IoT simulators (sensors and actuators) interacting with cloud servers. This implementation transforms the instructor into a facilitator or guide who helps students acquire information and verifies learning outcomes through checklists.
Article
Full-text available
Background High-quality systematic data on antimicrobial use in UK inpatient paediatric haematology-oncology services are lacking, despite this population being at high risk from antimicrobial exposure and resistance. Objectives We conducted a retrospective study to demonstrate how routinely collected electronic prescribing data can address this issue. Patients and methods This retrospective study describes and compares IV antibiotic consumption between two UK paediatric haematology-oncology inpatient units between 2018 and 2022. Both sites provide similar services and receive proactive antimicrobial stewardship input. Data were extracted from each site's antimicrobial surveillance system, which reports monthly days of therapy (DOT) per 100 patient-days (PD). Consumption was reported for specific and total antibiotics. Trends were modelled using linear regression and autoregressive moving average models. Results Total IV antibiotic consumption at each site was similar. Median monthly DOT per 100 PD were 25.9 (IQR: 22.1–34.0) and 29.4 (24.2–34.9). Total antibiotic use declined at both sites, with estimated annual reductions of 3.52 DOT per 100 PD (95% CI: 0.46–6.59) and 2.57 (1.30–3.85). Absolute consumption was similar for carbapenems, piperacillin/tazobactam and aminoglycosides, whilst ceftriaxone and teicoplanin demonstrated approximately 3-fold relative differences in median monthly consumption. Meropenem, piperacillin/tazobactam, teicoplanin, vancomycin and gentamicin all demonstrated statistically significant reductions in use over time at either one or both sites, although this was most marked for piperacillin/tazobactam and vancomycin. Conclusions Routinely collected electronic prescribing data can aid benchmarking of antibiotic use in paediatric haematology-oncology inpatients, highlighting areas in which to target stewardship strategies and evaluating their impact. This approach should be rolled out nationally, and to other high-risk groups.
Article
Full-text available
Quantitative social research relies heavily on data that originate either from surveys with sampling designs that depart from simple random sampling, or from observational studies with no formal sampling design. Simple random sampling is often not feasible, or its use would yield data with less information about certain features of interest, and it is often economically prohibitive. For example, in studies of school effectiveness it may be difficult to secure the cooperation of a school or a classroom. Therefore it would be rather wasteful to collect data from a small number of students in such a classroom. Data from a larger proportion, or from all the students, could be collected at a small additional expense, thus reducing the number of classrooms required for a sample to contain sufficient information for the intended purposes. Similarly, in household surveys, having contacted a selected individual, it would make sense to collect data from the rest of the members of the household at the same time. When this is done, we usually end up with data for which the standard assumptions of independence (such as in ordinary regression) are inappropriate.
Article
Clustered or correlated samples of binary responses arise frequently in practice due to repeated measurements or to subsampling the primary sampling units. Several recent approaches address intracluster correlation in binary regression problems including cluster-specific methods such as those based on mixed-effects logistic models and population-averaged methods such as those based on beta-binomial models. This paper considers the interpretations of the regression parameters in these two general approaches. We show that, unlike models for correlated Gaussian outcomes, the parameters of the cluster-specific and population-averaged models for correlated binary data describe different types of effects of the covariates on the response probabilities. In the case of random intercepts, we show that the covariate effects measured by the population-averaged approach are closer to zero than those of the cluster-specific approach when the cluster-specific model holds and that the difference in the magnitude of the covariate effects is increasing with intra-cluster correlation. The case of random slopes is also examined. These results are valid for arbitrary random effects distributions and are demonstrated using data on the ability to obtain samples of breast fluid from women.
Article
This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations, yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood.
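In standard notation (a textbook rendering of the estimating equations introduced here, with Y_i the response vector of subject i, mu_i(beta) its marginal mean, D_i the matrix of derivatives of mu_i with respect to beta, and V_i a working covariance built from a working correlation R_i(alpha)), the equations take the form

\[ U(\beta) = \sum_{i=1}^{K} D_i^{\top} V_i^{-1}\{Y_i - \mu_i(\beta)\} = 0, \qquad V_i = \phi\, A_i^{1/2} R_i(\alpha) A_i^{1/2}, \]

where A_i is the diagonal matrix of marginal variance functions and phi a dispersion parameter. Solving U(beta) = 0 yields estimates that remain consistent even when R_i(alpha) is misspecified, which is the key property this abstract describes.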
Article
Dependent binary response data arise frequently in practice due to repeated measurements in longitudinal studies or to subsampling primary sampling units, as in fields such as teratology and ophthalmology. Several classes of approaches have recently been proposed to analyse such repeated binary outcome data. The different classes of approaches measure different effects of covariates on binary responses and address different statistical questions. This article compares the different classes of approaches in terms of parameter interpretation and magnitude, standard errors of model parameters, and Wald tests for covariate effects. The results help to clarify the substantive questions which data analysts can address with each approach, as well as why the covariate effects measured by different approaches may be different. Finally, I provide guidelines on the advantages and disadvantages of alternative approaches for analysing dependent binary responses. Simulations and example data illustrate these findings.
Article
This paper describes methods for simultaneous cross-sectional and longitudinal analysis of repeated measurements obtained in cohort studies with regular examination schedules, then uses these methods to describe age-related changes in pulmonary function level among nonsmoking participants in the Six Cities Study, a longitudinal study of air pollution and respiratory health conducted between 1974 and 1983 in Watertown, Massachusetts; Kingston and Harriman, Tennessee; St. Louis, Missouri; Steubenville, Ohio; Portage, Wisconsin; and Topeka, Kansas. The subjects, initially aged 25-74, were examined on three occasions at 3-year intervals. Individual rates of loss increased more rapidly with age than predicted from the cross-sectional model. For example, for a male of height 1.75 m, the cross-sectional model predicted an increase in the annual rate of loss of FEV1 from 23.7 ml/yr at age 25 to 39.0 ml/yr at age 75, while the longitudinal model gave rates of loss increasing from 12.9 ml/yr at age 25 to 58.2 ml/yr at age 75. These results contrast with those of other studies comparing longitudinal and cross-sectional estimates of pulmonary function loss.
Article
This article discusses extensions of generalized linear models for the analysis of longitudinal data. Two approaches are considered: subject-specific (SS) models in which heterogeneity in regression parameters is explicitly modelled; and population-averaged (PA) models in which the aggregate response for the population is the focus. We use a generalized estimating equation approach to fit both classes of models for discrete and continuous outcomes. When the subject-specific parameters are assumed to follow a Gaussian distribution, simple relationships between the PA and SS parameters are available. The methods are illustrated with an analysis of data on mother's smoking and children's respiratory disease.
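The "simple relationships" mentioned in this abstract are most often quoted for the logistic model with a Gaussian random intercept of variance tau^2, where the population-averaged (PA) coefficients are approximately an attenuated version of the subject-specific (SS) ones:

\[ \beta_{PA} \approx \left(c^{2}\tau^{2} + 1\right)^{-1/2} \beta_{SS}, \qquad c = \frac{16\sqrt{3}}{15\pi}, \]

so PA effects shrink toward zero as between-subject heterogeneity grows. This display is the standard approximation associated with this line of work; the article itself should be consulted for the exact statements and conditions.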
Article
Dependent data, such as arise with cluster sampling, typically yield variances of parameter estimates which are larger than would be provided by a simple random sample of the same size. This variance inflation factor is called the design effect of the estimator. Design effects have been derived for cluster sampling designs using simple estimators such as means and proportions, and also for linear regression coefficient estimators. In this paper, we show that a method to derive design effects for linear regression estimators extends to generalized linear models for binary responses. In particular, some simple expressions for design effects in the linear regression model provide accurate approximations for binary regression models such as those based on the logistic, probit and complementary log-log links. We corroborate our findings with two examples and some simulation studies.
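The linear-regression design effects that this paper builds on reduce, in the simplest case of means and proportions from clusters of common size m with intracluster correlation rho, to the classical expression

\[ \mathrm{deff} = 1 + (m - 1)\,\rho, \]

so the effective sample size is the nominal n divided by deff. The paper's finding is that expressions of this kind, derived for linear regression coefficient estimators, also approximate design effects well for binary regression under the logistic, probit, and complementary log-log links; the exact regression-coefficient formulas depend additionally on the cluster structure of the covariates.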