Table 1 - uploaded by Russ Wolfinger
The Three Most Common Covariance Structures

Source publication
Article
This article provides a unified discussion of a useful collection of heterogeneous covariance structures for repeated-measures data. The collection includes heterogeneous versions of the compound symmetry and first-order autoregressive structures, the Huynh-Feldt structure, the independent-increments structure, correlated random coefficients models...

Contexts in source publication

Context 1
... basic assumption here is that data from different subjects are uncorrelated, but some kind of covariance structure holds for data arising from the same subject. Table 1 describes the three most commonly assumed covariance structures of this approach: compound symmetry (CS), first-order autoregressive [AR(1)], and unstructured (UN). CS and AR(1) are homogeneous structures; that is, the variance along the main diagonal is constant. ...
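As a rough illustration of the three structures named in this context, the sketch below builds each one as a nested list for k repeated measures. The function names and the variance/correlation parameterization (`sigma2`, `rho`) are assumptions made for this example; Table 1 itself may parameterize CS differently (for instance with a covariance rather than a correlation).

```python
def compound_symmetry(k, sigma2, rho):
    """CS: constant variance sigma2 on the diagonal, constant covariance sigma2*rho elsewhere."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(k)] for i in range(k)]


def ar1(k, sigma2, rho):
    """AR(1): constant variance sigma2; correlation rho**|i-j| decays with the lag."""
    return [[sigma2 * rho ** abs(i - j) for j in range(k)] for i in range(k)]


def unstructured(k, lower):
    """UN: every variance and covariance is free.

    `lower` lists the lower triangle row by row and must hold k(k+1)/2 values.
    """
    if len(lower) != k * (k + 1) // 2:
        raise ValueError("expected k(k+1)/2 values for the lower triangle")
    matrix = [[0.0] * k for _ in range(k)]
    values = iter(lower)
    for i in range(k):
        for j in range(i + 1):
            v = next(values)
            matrix[i][j] = v  # fill the lower triangle
            matrix[j][i] = v  # mirror to keep the matrix symmetric
    return matrix
```

Both CS and AR(1) have a constant diagonal, which is what makes them homogeneous; only the unstructured form lets each time point carry its own variance.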
Context 2
... 37 40 31 25 33 29 27 36 32 30 34 28 63 45 74 63 54 46 60 52 47 51 52 43 91 79 100 75 56 68 47 50 79 65 70 61 81 80 68 58 78 55 67 60 46 38 37 38 39 35 34 37 32 30 30 32 60 50 67 54 35 37 48 39 39 36 39 31 50 34 37 40 43 37 39 50 48 54 57 43
run;

/*---unstructured covariance matrix---*/
proc mixed data=cork;
   class dir tree;
   model y = dir;
   repeated / type=un sub=tree r rcorr;
run; ...

Citations

... For valid inferences, the comparison and selection of variance-covariance structures should be done before examining and fitting the fixed effects (mean structure) model that provides a good fit to estimate changes in the response at both the group/population and the subject/individual levels [2,4,6,25,27,29-31]. ...
... For repeated measures observed at 4 time points for each subject, the variance-covariance structure of the errors for a given subject can be written in matrix notation as follows (in general, with measures at k time points there are k variances on the diagonal and k(k-1)/2 covariances off the diagonal, for a total of k(k+1)/2 variance-covariance parameters to be estimated) [3,4,26,30]: As we can see from the matrix, there are a total of 10 different parameters to be estimated, and no specific structure is imposed on the variances (equal or unequal) or on the covariances (no correlation, common correlation, or varying correlation). This type of covariance structure is called unstructured (UN); it is the most complex structure (it requires the most distinct parameters) and the most flexible (it imposes no pattern on the covariances) [3,4,31]. ...
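The parameter count quoted in this context can be checked with a few lines of Python. This is only a sketch: the function name and the structure labels are chosen here for illustration, and the counts for CS, AR(1), and CSH (one or k variances plus a single correlation or covariance parameter) are the standard ones rather than figures taken from the quoted paper.

```python
def n_cov_params(k, structure):
    """Number of covariance parameters for k repeated measures per subject."""
    counts = {
        "UN": k * (k + 1) // 2,  # k variances + k(k-1)/2 covariances
        "CS": 2,                 # one variance, one common covariance
        "AR(1)": 2,              # one variance, one autocorrelation
        "CSH": k + 1,            # k variances, one common correlation
    }
    if structure not in counts:
        raise ValueError(f"unknown structure: {structure}")
    return counts[structure]


if __name__ == "__main__":
    for name in ("UN", "CS", "AR(1)", "CSH"):
        print(name, n_cov_params(4, name))
```

With k = 4 the unstructured form needs 4 + 6 = 10 parameters, matching the count stated above, while the count for CS and AR(1) stays at 2 regardless of k.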
... Table 1 given below shows the model fit statistics for each fitted variance-covariance structure, and we can see the information criteria values are relatively the smallest (largest negative values) in all heterogeneous variance-covariance structures, which agree with the patterns indicated in the data exploration parts. Based on the patterns we observed in data exploration and using BIC information criteria (to avoid loss of statistical power), we select the simpler/parsimonious heterogeneous compound symmetry (CSH) using chi-square difference test as a final variance-covariance structure to fit the model and test the mean structures [2,30,50,51]. ...
Article
Accounting for zonal-level variations and identifying factors that have linear effects on crop production help to make better decisions and plan new policies for effective crop production and food security. The main objective of this study is to identify potential subsets of covariates and estimate their linear effects on crop production. A linear mixed effects model (random--intercept) is used on agricultural sample survey data for Meher seasons from 2012/13 to 2019/20 to explore and identify the best variance-covariance structure for the longitudinal data on 90 zones with eight repeated observations and different sampling weights. The minimum, mean, and maximum crop production by farmers across the country are 1.616, 8.693, and 147.843 quintals, respectively, and about 98 % of farmers produced less than 25 quintals. There is a small rate of increase in mean and median crop production by farmers across the years, and the variability between zones is highest in the year 2019/20 and in the Somali region. The histogram, kernel density, and P–P plots suggested a common logarithm transformation on the crop production variable. Results from the data exploration and variance-covariance structure selection methods suggested a heterogeneous compound symmetry (CSH) structure. Covariates region, year, proportion of farmers who practice pure-agriculture and other-agriculture types, proportion of farmers who use any type of fertilizer, farmer's age, area used, farmer association crop production, indigenous seed used, improved seed used, UREA fertilizer used, other fertilizers used, and percentage of crop damaged are significant in linearly explaining/affecting log crop production, and among these area used, farmers association crop production, UREA fertilizer used, and indigenous seed used have relatively highest effect on log crop production. 
Zones Wolayita, North-Shewa (Am), West-Arsi, West-Welega, Dawro, and Guji are top/good performers while zones Southwest-Shewa, Waghimra, Guraghe, South-Omo, Keffa, North-Wello, South-Wello, and Eastern Tigray are bottom/poor performers in crop production. Model assumptions and influence diagnostics results suggested the linearity of the model and normality of random effects and residuals are not violated, even though some zones have influences on either model parameters, precisions of estimates of these parameters, and predicted values.
... A common problem that occurs during model selection is the risk of over parameterization of the model due to more parameters which are not part of the model used for design generation. The Akaike information criterion (AIC) deals with this problem via a penalty for model complexity and is calculated as minus twice the residual log-likelihood plus twice the number of variance parameters (Wolfinger, 1996). Thus, through AIC, we can assess whether fitting of different spatial models can improve the baseline model or not. ...
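The AIC definition given in this context translates directly into code. The sketch below is a minimal illustration under that definition (smaller is better); the function name and the numbers in the usage example are invented for demonstration and do not come from any of the cited studies.

```python
def aic_reml(residual_loglik, n_variance_params):
    """AIC as described above: -2 * residual log-likelihood + 2 * number of variance parameters."""
    return -2.0 * residual_loglik + 2.0 * n_variance_params


if __name__ == "__main__":
    # Hypothetical fits: a baseline model and a spatial model with two extra variance parameters.
    baseline = aic_reml(residual_loglik=-512.4, n_variance_params=1)
    spatial = aic_reml(residual_loglik=-498.9, n_variance_params=3)
    print("baseline:", baseline, "spatial:", spatial)
    print("spatial model preferred" if spatial < baseline else "baseline preferred")
```

Because the penalty counts only variance parameters and the likelihood is the residual (REML) one, comparisons of this kind are meaningful only between models that share the same fixed effects.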
Article
In plant breeding field experiments, proper statistical design and analysis improve precision of genotype comparisons. The focus of this study was to compare the precision of different spatial techniques in estimating genotypic effects using sorghum [Sorghum bicolor (L.) Moench] breeding data from Ethiopia and to investigate alternative design strategies maintaining overall field layout of the current trials while modifying the blocking (replicate, rows, and columns) structures compared to the current practice. The current trials comprise both partially replicated and fully replicated row–column designs where the field layout has short rows and long columns. For model comparison, six partially replicated row–column trials and 10 fully replicated row–column trials of sorghum were used. Relative efficiency calculations for the designs indicate that in most of the trials, alpha designs with block sizes of five, six, 10 and 15, and the alternative row–column designs were more efficient than the current design practice. Moreover, overall model comparison showed that augmenting the baseline model by a two‐dimensional nonlinear spatial model plus nugget improves the precision, while the randomization‐based plus two‐dimensional linear variance model and the randomization based plus a two‐dimensional nonlinear spatial model are also good candidate models. If row and column coordinates are available for all plots, the post‐blocking approach used here can be used in any breeding program and crop to explore alternative design options.
... Our baseline model in Table 1 has independent (ID) residual effects $e_{ij}$ with constant variance. Replacing this by an AR(1) model for serial correlation of observations on the same genotype across the 18 times (environments) leads to a substantial drop in the Akaike information criterion (AIC) (Wolfinger, 1996), indicating that serial correlation is important for this repeated measures data. Next, we add random effects $x_j^T c_i$ as in model (16) and fit an FA0(1) structure for the ... [Table 1. Model selection of covariance structure for lettuce data using fixed effects $\alpha_i + \varepsilon_j$.] ...
... The variance explained is also comparable between the different approaches (Table 5). Table 6 gives an overview of the fits obtained by the different methods and models when using the full likelihood, thus permitting comparison of models with different fixed effects (Wolfinger, 1996). ...
Article
Finlay–Wilkinson regression is a popular method for analysing genotype–environment interaction in series of plant breeding and variety trials. It involves a regression on the environmental mean, indexing the productivity of an environment, which is driven by a wide array of environmental factors. Increasingly, it is becoming feasible to characterize environments explicitly using observable environmental covariates. Hence, there is mounting interest to replace the environmental index with an explicit regression on such observable environmental covariates. This paper reviews the development of such methods. The focus is on parsimonious models that allow replacing the environmental index by regression on synthetic environmental covariates formed as linear combinations of a larger number of observable environmental covariates. Two new methods are proposed for obtaining such synthetic covariates, which may be integrated into genotype‐specific regression models, that is, criss‐cross regression and a factor‐analytic approach. The main advantage of such explicit modelling is that predictions can be made also for new environments where trials have not been conducted. A published dataset is employed to illustrate the proposed methods.
... Moreover, slopes for the different covariates have the same variance and there is no correlation between them. This represents a key difference from the RRMs we used, given that we allowed (co)variance between and within intercept and slope to be a free parameter, which is essential to ensure invariance with respect to translation and scale transformation of the covariates (Wolfinger, 1996). Heslot et al. (2014) reported not being able to implement a RRM with such a covariance structure in a similarly large wheat dataset due to extensive RAM usage with ASReml-R. ...
Article
Genotype by environment interaction (GEI) is one of the main challenges in plant breeding. A complete characterization of it is necessary to decide on proper breeding strategies. Random regression models (RRMs) allow a genotype‐specific response to each regressor factor. RRMs that include selected environmental variables represent a promising approach to deal with GEI in genomic prediction. They enable to predict for both tested and untested environments, but their utility in a plant breeding scenario remains to be shown. We used phenotypic, climatic, pedigree, and genomic data from two public subtropical rice (Oryza sativa L.) breeding programs; one manages the indica population and the other manages the japonica population. First, we characterized GEI for grain yield (GY) with a set of tools: variance component estimation, mega‐environment (ME) definition, and correlation between locations, sowing periods, and MEs. Then, we identified the most influential climatic variables related to GY and its GEI and used them in RRMs for single‐step genomic prediction. Finally, we evaluated the predictive ability of these models for GY prediction in tested and untested years and environments using the complete dataset and within each ME. Our results suggest large GEI in both populations while larger in indica than in japonica. In indica, early sowing periods showed crossover (i.e., rank‐change) GEI with other sowing periods. Climatic variables related to temperature, radiation, wind, and precipitation affecting GY were identified and differed in each population. RRMs with selected climatic covariates improved the predictive ability in both tested and untested years and environments. Prediction using the complete dataset performed better than predicting within each ME.
... In fact, T. grandiflorum has a production window of 6 months, so data can be collected repeatedly within a year. Such a pattern characterizes repeated measures data, in which the most important peculiarity is the presence of genetic and environmental covariance between measures (Faveri et al., 2017;Piepho & Eckl, 2014;Wolfinger, 1996). This cannot be ignored if breeders intend to follow the basic principles of plant breeding, namely (i) appropriate experimental design, (ii) accurate data collection and (iii) adequate statistical models for genetic evaluation (Stringer et al., 2017). ...
... The AR1H covariance structure is often used in spatial analyses, considering that as the distance between two plots increases, the correlation between them decreases (Chavarría-Perez et al., 2020; Gilmour et al., 1997). This concept can be applied in a repeated measures context, i.e., the correlation between two measures decreases as the time between them increases (Piepho & Eckl, 2014; Verbyla et al., 2021; Wolfinger, 1996). For T. grandiflorum and other fruit-bearing species, this occurs not only because of environmental factors but also because of the plants' maturity stage. ...
Article
Theobroma grandiflorum is a perennial fruit tree native to the Amazon region. As a perennial species with continuous production throughout the years, breeders should seek well‐conducted trials, accurate phenotyping and adequate statistical models for genetic evaluation and selection that can leverage the information provided by the repeated measures. We evaluated 13 models with different covariance structures for genetic and residual effects for T. grandiflorum evaluation, using an unbalanced dataset with 34 hybrids from the triple‐crossing of nine parents, planted in a randomized complete block design. For nine consecutive years, the fruit yield of these hybrids was evaluated. Each model had its goodness‐of‐fit tested by the Akaike information criterion. The most adequate model for estimating the variance components and the breeding values were modelled with the first‐order heterogeneous autoregressive for residual effects and third‐order factor analytic for genetic effects. From this model, we used the factor analytic selection tools for selecting the top 10 families, providing a genetic gain of 10.42%. These results are important not only for T. grandiflorum breeding but also to show that in any repeated measures' data from fruit‐bearing perennial species the modelling of genetic and residual effects should not be neglected.
... We assume that both $\ln(X_{i,t})$, $\ln(Y_t)$, $X_{i,t}$ and $Y_t$ have finite moments. As we are modelling multiple groups of the same species, we further assume a Heterogeneous Compound Symmetry (HCS) conditional variance-covariance structure (Thall and Vail, 1990; Wolfinger, 1996), similar to when one is analysing repeated measurements of clustered data. That is, for $\sigma^2_{it} := \mathrm{Var}(X_{i,t} \mid w, H_{t-1})$, we are assuming that ...
Preprint
The internal behaviour of a population is an important feature to take account of when modelling their dynamics. In line with kin selection theory, many social species tend to cluster into distinct groups in order to enhance their overall population fitness. Temporal interactions between populations are often modelled using classical mathematical models, but these sometimes fail to delve deeper into the, often uncertain, relationships within populations. Here, we introduce a stochastic framework that aims to capture the interactions of animal groups and an auxiliary population over time. We demonstrate the model's capabilities, from a Bayesian perspective, through simulation studies and by fitting it to predator-prey count time series data. We then derive an approximation to the group correlation structure within such a population, while also taking account of the effect of the auxiliary population. We finally discuss how this approximation can lead to ecologically realistic interpretations in a predator-prey context. This approximation can also serve as verification to whether the population in question satisfies our various simplifying assumptions. Our modelling approach will be useful for empiricists for monitoring groups within a conservation framework and also theoreticians wanting to quantify interactions, to study cooperation and other phenomena within social populations.
... For single-trial analysis, autoregressive (AR1) and symmetric autoregressive (SAR) models provided the best performance in most datasets tested based on AIC values. Those models are known as spatial correlation models, and they are specifically designed to model spatially autocorrelated data based on neighborhood relationships (Wolfinger 1996;Gilmour et al. 1997). Spatial models are especially useful in unbalanced or nonreplicated experiments such as ARCBD. ...
Article
Key Message: The use of multi-environment trials to test yield-related traits in a diverse alfalfa panel made it possible to find multiple molecular markers associated with complex agronomic traits. Abstract: Yield is one of the most important target traits in alfalfa breeding; however, yield is a complex trait affected by genetic and environmental factors. In this study, we used multi-environment trials to test yield-related traits in a diverse panel composed of 200 alfalfa accessions and varieties. Phenotypic data of maturity stage measured as mean stage by count (MSC), dry matter content, plant height (PH), biomass yield (Yi), and fall dormancy (FD) were collected in three locations in Idaho, Oregon, and Washington from 2018 to 2020. Single-trial and stagewise analyses were used to obtain estimated trait means of entries by environment. The plants were genotyped using a genotyping-by-sequencing approach, yielding a genotypic matrix with 97,345 single nucleotide polymorphisms. Genome-wide association studies identified a total of 84 markers associated with the traits analyzed. Of those, 29 markers were in noncoding regions and 55 markers were in coding regions. Ten significant SNPs at the same locus were associated with FD and they were linked to a gene annotated as a nuclear fusion defective 4-like (NFD4). Additional SNPs associated with MSC, PH, and Yi were annotated as transcription factors such as Cysteine3Histidine (C3H), Hap3/NF-YB family, and serine/threonine-protein phosphatase 7 proteins, respectively. Our results provide insight into the genetic factors that influence alfalfa maturity, yield, and dormancy, which is helpful to speed up the genetic gain toward alfalfa yield improvement.
... The estimation method used was Maximum Likelihood as we did not include random effects in the model (p 835, Field, 2013). Based on the lowest BIC value we selected either the compound symmetry covariance structure or its heterogeneous version CSH in order to account for the heterogeneity of variance across different organs (Wolfinger, 1996). Olfactory bulb and PFC data were also analyzed via LMM using Group (control or treated) and Region (Olf. ...
Article
Background: Chronic alcohol consumption and alcohol use disorder have a tremendous impact on the patient's psychological and physiological health. There is evidence that chronic alcohol consumption influences SARS‐CoV2 infection risk, but so far, the molecular mechanism underlying such an effect is unknown. Methods: We generated the expression data of SARS‐CoV2 infection‐relevant genes (Ace2, Tmprss2, and Mas) in different organs in rat models of chronic alcohol exposure and alcohol dependence. Ace2 and Tmprss2 represent the virus entry point, whereas Mas activates the anti‐inflammatory response once the cells are infected. Results: Across three different chronic alcohol test conditions, we found a consistent upregulation of Ace2 gene expression in the lung, which has been shown to be the most affected organ in COVID‐19 patients. Other organs such as liver, ileum, kidney, heart, and brain also showed upregulation of Ace2 and Mas gene expression but less consistently across the different animal models, while Tmprss2 expression was unaffected in all conditions. Conclusions: We conclude that alcohol‐induced upregulation of Ace2 gene expression can lead to an elevated stochastic probability of virus entry into cells and may thus confer a molecular risk for SARS‐CoV2 infection.
... Our baseline model in Table 1 has independent (ID) residual effects $e_{ij}$ with constant variance. Replacing this by an AR(1) model for serial correlation of observations on the same genotype across the 18 times (environments) leads to a substantial drop in the Akaike Information Criterion (AIC) (Wolfinger, 1996), indicating that serial correlation is important for this repeated measures data. Next, we add random effects $x_j^T c_i$ as in model (16) and fit an FA0(1) structure for the covariance among random slopes $c_i$. ...
... The variance explained is also comparable between the different approaches (Table 5). Table 6 gives an overview of the fits obtained by the different methods and models when using full maximum likelihood (ML), thus permitting comparison of models with different fixed effects (Wolfinger, 1996). This also comprises model estimates based on SVD using the fitted factorial regression model (option iii). ...
Preprint
Finlay-Wilkinson regression is a popular method for analysing genotype-environment interaction in series of plant breeding and variety trials. The method involves a regression on the environmental mean, computed as the average of all genotype means. The environmental mean indexes the productivity of an environment, which is driven by a wide array of environmental factors. Increasingly, it is becoming feasible to characterize environments explicitly using observable environmental covariates. Hence, there is mounting interest to replace the environmental index with an explicit regression on such observable environmental covariates. This paper reviews the development of such methods. The focus is on parsimonious models that allow replacing the environmental index by regression on synthetic environmental covariates formed as linear combinations of a larger number of observable environmental covariates. Two new methods are proposed for obtaining such synthetic covariates, which may be integrated into genotype-specific regression models. The main advantage of such explicit modelling is that predictions can be made also for new environments where trials have not been conducted. A published dataset is employed to illustrate the proposed methods.