Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

December 2012
G3 Genes Genomes Genetics 2(12):1595-605

DOI:10.1534/g3.112.003665

Source
PubMed

License
CC BY 3.0

Authors:

Daniel Gianola

University of Wisconsin–Madison

Juan Manuel Gonzalez-Camacho

Colegio de Postgraduados

Jose Crossa

Consultative Group on International Agricultural Research

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

Structure of a single-layer feed-forward neural network (SLNN), adapted from González-Camacho et al. (2012). In the hidden layer, the input variables $x_i = (x_{i1}, \ldots, x_{ip})$ (j = 1,…,p markers) are combined for each neuron (k = 1,…,S neurons) using a linear function, $u_i^{[k]} = b_k + \sum_{j=1}^{p} \cdots$, and subsequently transformed using a non-linear activation function, yielding a set of inferred scores. These scores are used in the output layer as basis functions to regress the response, using a linear activation function, on the data-derived predictors.
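The two-stage structure described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tanh activation, the parameter names (`beta`, `b`, `w`, `mu`), the output-layer intercept, and the toy numbers are hypothetical and are not taken from the article.

```python
import math

def slnn_predict(x, beta, b, w, mu, activation=math.tanh):
    """Forward pass of a single-hidden-layer network for one input vector.

    x    : list of p marker codes for one line
    beta : S lists of p hidden-layer weights, one list per neuron (assumed name)
    b    : S hidden-layer intercepts
    w    : S output-layer weights (assumed name)
    mu   : output-layer intercept (assumed)
    """
    scores = []
    for beta_k, b_k in zip(beta, b):
        # Hidden layer, step 1: linear combination of the markers for neuron k.
        u_k = b_k + sum(x_j * beta_kj for x_j, beta_kj in zip(x, beta_k))
        # Hidden layer, step 2: non-linear activation (tanh is an assumed choice).
        scores.append(activation(u_k))
    # Output layer: linear regression on the inferred scores.
    return mu + sum(w_k * z_k for w_k, z_k in zip(w, scores))

# Toy example with p = 3 markers and S = 2 neurons (all numbers are made up).
x = [1.0, 0.0, 1.0]
beta = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
b = [0.0, 0.1]
w = [2.0, -1.0]
y_hat = slnn_predict(x, beta, b, w, mu=5.0)
```

Passing the identity function as `activation` reduces the sketch to a model that is linear in the markers, which is one way to see where the non-linearity enters.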

…

Structure of a radial basis function neural network (RBFNN), adapted from González-Camacho et al. (2012). In the hidden layer, information from the input variables (j = 1,…,p markers) is first summarized by the Euclidean distance between each input vector and each of S data-inferred centers (k = 1,…,S neurons). These distances are then transformed using a Gaussian function, yielding a set of scores that are used in the output layer as basis functions for the linear regression.
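A minimal Python sketch of this forward pass is given below, assuming a Gaussian basis of the form exp(-h_k d^2), where d is the Euclidean distance to center k. The per-neuron bandwidths `h`, the output weights `w`, the intercept `mu`, and the toy numbers are hypothetical names and values chosen for illustration. In practice the centers would be inferred from the data, whereas here they are simply supplied.

```python
import math

def rbfnn_predict(x, centers, h, w, mu):
    """Forward pass of a radial basis function network for one input vector.

    x       : list of p marker codes for one line
    centers : S lists of length p (supplied here, data-inferred in practice)
    h       : S positive bandwidth parameters (assumed parameterization)
    w       : S output-layer weights (assumed name)
    mu      : output-layer intercept (assumed)
    """
    scores = []
    for c_k, h_k in zip(centers, h):
        # Euclidean distance between the input vector and center k.
        dist = math.sqrt(sum((x_j - c_kj) ** 2 for x_j, c_kj in zip(x, c_k)))
        # Gaussian transformation of the distance.
        scores.append(math.exp(-h_k * dist ** 2))
    # Output layer: linear regression on the Gaussian scores.
    return mu + sum(w_k * z_k for w_k, z_k in zip(w, scores))

# Toy example with p = 3 markers and S = 2 centers (all numbers are made up).
x = [1.0, 0.0, 1.0]
centers = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
h = [1.0, 0.5]
w = [2.0, -1.0]
y_hat = rbfnn_predict(x, centers, h, w, mu=5.0)
```

An input that coincides with a center gets a score of 1 for that neuron, and the score decays toward 0 as the distance grows, so each neuron responds mainly to lines that are genetically close to its center.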

…

Plots of the predictive correlation for each of 50 cross-validation partitions and 10 environments for days to heading (DTH) in different combinations of models. (A) When the best non-parametric model is RKHS, this is represented by an open circle; when the best linear model is BL, this is represented by a filled circle. (B) When the best non-parametric model is RBFNN, this is represented by an open circle; when the best linear model is BL, this is represented by a filled circle. (C) When the best non-parametric model is BRNN, this is represented by an open circle; when the best linear model is BL, this is represented by a filled circle. The histograms depict the distribution of the correlations in the testing set obtained from the 50 partitions for different models. The horizontal (vertical) dashed line represents the average of the correlations for the testing set in the 50 partitions for the model shown on the Y (X) axis. The solid line represents Y = X; i.e. both models have the same prediction ability.
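The quantities plotted in these panels can be computed along the following lines. This is a hedged sketch rather than the authors' code: the function names, the dictionary keys, and the toy correlations are hypothetical. It assumes that, for each partition, the predictive correlation is the Pearson correlation between observed and predicted values in the testing set.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def summarize_comparison(cor_x, cor_y):
    """Summarize per-partition correlations of two models.

    cor_x, cor_y : correlations for the models on the X and Y axes,
                   one value per partition, in the same partition order.
    """
    return {
        # Averages over partitions (the dashed lines in the plots).
        "mean_x": sum(cor_x) / len(cor_x),
        "mean_y": sum(cor_y) / len(cor_y),
        # Partitions falling above the Y = X line, where model Y did better.
        "n_above_diagonal": sum(1 for cx, cy in zip(cor_x, cor_y) if cy > cx),
    }

# Toy values for one testing set and for three partitions (made-up numbers).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r = pearson(observed, predicted)

cor_model_x = [0.50, 0.55, 0.60]
cor_model_y = [0.52, 0.54, 0.65]
summary = summarize_comparison(cor_model_x, cor_model_y)
```

Counting points above the diagonal complements the two averages: it shows how often one model beat the other on the same partition, which the means alone do not convey.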

…