In Silico Biology 2 (2002) 151–164
IOS Press
Electronic publication can be found in In Silico Biol. 2, 0013 <http://www.bioinfo.de/isb/2002/02/0013/>, 24 January 2002.
1386-6338/02/$8.00 © 2002 – IOS Press and Bioinformation Systems e.V. All rights reserved
The GP Problem: Quantifying Gene-to-Phenotype Relationships
Mark Cooper (1, 2, *), Scott C. Chapman (3), Dean W. Podlich (1, 2) and Graeme L. Hammer (1, 4)
(1) School of Land and Food Sciences, The University of Queensland, Brisbane, Queensland 4072, Australia
(2) Current Address: Pioneer Hi-Bred International Inc., 7300 N.W. 62nd Avenue, P.O. Box 1004, Johnston, Iowa 50131, USA
(3) CSIRO Plant Industry, 120 Meiers Road, Indooroopilly, Queensland 4068, Australia
(4) Agricultural and Production Systems Research Unit (APSRU), Queensland Department of Primary Industries, Tor Street, Toowoomba, Queensland, Australia
Edited by E. Wingender; received 26 September 2001; revised and accepted 21 December 2001; published 24 January 2002
ABSTRACT: In this paper we refer to the gene-to-phenotype modeling challenge as the GP problem. Integrating information across levels of organization within a genotype-environment system is a major challenge in computational biology. However, resolving the GP problem is a fundamental requirement if we are to understand and predict phenotypes given knowledge of the genome and model dynamic properties of biological systems. Organisms are consequences of this integration, and it is a major property of biological systems that underlies the responses we observe. We discuss the E(NK) model as a framework for investigation of the GP problem and the prediction of system properties at different levels of organization. We apply this quantitative framework to an investigation of the processes involved in genetic improvement of plants for agriculture. In our analysis, N genes determine the genetic variation for a set of traits that are responsible for plant adaptation to E environment-types within a target population of environments. The N genes can interact in epistatic NK gene-networks through the way that they influence plant growth and development processes within a dynamic crop growth model. We use a sorghum crop growth model, available within the APSIM agricultural production systems simulation model, to integrate the gene-environment interactions that occur during growth and development and to predict genotype-to-phenotype relationships for a given E(NK) model. Directional selection is then applied to the population of genotypes, based on their predicted phenotypes, to simulate the dynamic aspects of genetic improvement by a plant-breeding program. The outcomes of the simulated breeding are evaluated across cycles of selection in terms of the changes in allele frequencies for the N genes and the genotypic and phenotypic values of the populations of genotypes.
Links:
http://pig.ag.uq.edu.au/qu-gene/
http://www.apsru.gov.au/Products/apsim.htm
____________________________
*Corresponding author: Email: mark.cooper@pioneer.com.
M. Cooper et al. / The GP Problem
KEYWORDS: E(NK) model, epistasis, genotype-by-environment interactions, plant, crop, target population of environments, genetic space
INTRODUCTION
Today, a major research focus in the fields of genetics and computational biology is developing methods to predict properties of organisms and populations of organisms at the phenotypic level from knowledge of the structure, function and diversity of genomes. We refer to the problem of determining gene-to-phenotype relationships as the GP problem. Formulating a solution to this problem for a defined natural system requires integration of information across many levels of organization within biophysical systems. An iterative modeling approach combined with strategic experimentation provides a powerful framework for tackling the GP problem. The objective of this paper is to define what we mean by an iterative modeling approach. We commence by describing a general approach to modeling natural systems and then illustrate its application in the modeling of plant breeding programs in an agricultural context.
It has often been stated that a model is a simplification of the natural system under investigation and that the level of simplification must be balanced against the complexity of the properties of the system to be studied. Therefore, what is an appropriate modeling approach to tackle the GP problem? In describing modeling strategies in general, Rosen [1985] and Casti [1989, 1997] distinguished between the "Natural System" that we are attempting to understand and the "Formal System" that is a mathematical construction of how we understand the properties of the natural system (Fig. 1). This is a useful starting point for thinking about how we might model a genotype-environment system.
Fig. 1. Concept map for modeling biophysical systems. The "Natural System" is the biophysical structure that is under investigation and the "Formal System" is the model-based investigative strategy that is in use to construct "Knowledge Structures" that represent properties of the "Natural System".
Approaches to constructing a formal system that captures the important properties of a natural system can take many forms. Here we are interested in a mathematical framework that allows us to represent the key components of the natural system and define the key relationships between these components. We intend that the mathematical relationships that we construct within the formal system will ultimately be representative of the causal relationships that are properties of the natural system, so that we can investigate the implications of these causal relationships within the formal system. More detailed formalizations might be appropriate if the objective is to investigate relationships at lower levels within the system, rather than to understand the relationships among its components. An example of this is in the modeling of processes related to the productivity of agricultural plants (Fig. 2). The genetics of different species and varieties within species determine how plants interact with the soil and aerial (radiation, temperature, rainfall) environments to develop structures to 'capture' radiation, CO2 and water. While the enzymes and biochemistry of the primary processes involved in photosynthesis are extensively studied, it is difficult to integrate the results of these processes (assimilation of CO2 into biomass) over various time scales, and to model the effect of this biomass as it is diverted to new tissues and organs to either store biomass or capture more resources. It has been considered that models can only simulate one or two levels of scale away from the level of their primary function. Further, at molecular and atomic scales, the requirement for information input to the system rapidly increases. By studying and experimenting within the natural system we attempt to gain knowledge about the biophysical structures and their causal relationships at appropriate scales. Some of these causal relationships may be functional, others pre-functional and in many cases non-functional but a consequence of the ways in which the biophysical structures interact [Kauffman, 2000].
Fig. 2. Hierarchy and scale in modeling processes within plant systems.
In many cases the results of experiments in biology are summarized descriptively. Alternatively we can attempt to encode, within a formal mathematical framework, our understanding of the results of the experiments (Fig. 1). The collection of these formal mathematical structures that we create is a model of the system. In many cases we may commence an experiment with a prior model or hypothesis and use the results of our experimental program to update and improve our model of the natural system [e.g. Ideker et al., 2001]. As we refine and improve our model through iterative cycles of experimentation and modeling we will be able to study properties of the natural system within the properties of the formal system. This will give us a basis for determining the level of confidence we have in decoding the structures observed within the model and making predictions about the properties that we expect to see within the natural system. Additionally, model building through iteration will enable us to acquire and interpret data structures from experimental programs as a foundation for constructing knowledge structures and queries that apply to the properties of the natural system. As we improve the quality of our model we will increasingly improve our power to predict properties of the natural system across its levels of organization.
INTEGRATION BY ASSUMPTION OR BY FOLDING OUT THE DETAIL?
If we attempt to construct an integrated model of a natural system without adequate attention to the ways that the components of the system interact across levels of organization (Fig. 2) then we are either confining ourselves to working within a level of organization or we will construct a model that has limited power to provide insight into many of the properties of the natural system. In the absence of experimental evidence, attempting to integrate across levels of organization by assuming that interactions are unlikely to be important will leave the resulting model liable to deviate from the natural system whenever these interactions become important.
In classical quantitative genetics many of the complicating interactions that can impact on gene-to-phenotype relationships have been assumed to be unimportant, based on the expectation that their effects are small, and/or that their estimation is impractical. Two properties of genotype-environment systems that are often ignored are those of gene-to-gene interactions (epistasis) and gene-by-environment interactions [e.g. Clark, 2000]. For example, in defining the value of a genotype for a quantitative trait that is determined by multiple genes, the assumption that epistasis is zero implies that the effects of the alleles for the segregating genes are independent of the effects of the alleles at the other genes. In this case, for each gene, additive and dominance intra-gene effects can be defined in terms of contrasts between the homozygous and heterozygous genotypes. The value of the multi-gene genotype for an individual is then simply the cumulative effect of the genes, obtained by summing the allele effects across the segregating genes. Similarly, gene-by-environment (GE) interactions have been assumed to be unimportant or a source of error that can be summed to zero by evaluating genotypes in adequately large samples of experimental environments representing the target population of environments.
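The zero-epistasis assumption described above can be written down in a few lines. The following is a minimal sketch rather than anything taken from the paper: the function name, the coding of a diploid genotype as the number of + alleles at each gene (0, 1 or 2), and the effect values are all illustrative assumptions. Each gene contributes its additive effect a when homozygous for the + allele, -a when homozygous for the - allele, and a dominance deviation d when heterozygous; the genotype value is the sum over genes.

```python
def additive_genotype_value(plus_counts, additive, dominance, midpoint=0.0):
    """Value of a multi-gene genotype when epistasis is assumed to be zero.

    plus_counts[i] is the number of '+' alleles at gene i (0, 1 or 2).
    additive[i] and dominance[i] are the effects a_i and d_i for gene i.
    All names and numbers here are illustrative assumptions.
    """
    value = midpoint
    for count, a, d in zip(plus_counts, additive, dominance):
        if count == 2:      # homozygous for the '+' allele
            value += a
        elif count == 1:    # heterozygote
            value += d
        elif count == 0:    # homozygous for the '-' allele
            value -= a
        else:
            raise ValueError("allele count must be 0, 1 or 2")
    return value


# Two genes with purely additive action (d = 0): the gene effects simply sum.
example = additive_genotype_value([2, 1], additive=[1.0, 0.5], dominance=[0.0, 0.0])
```

Because each gene's contribution is computed without reference to any other gene, a model of this form cannot represent epistasis by construction.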
Where experimental evidence demonstrates that the interactions are important it is necessary to directly evaluate their implications within the formal system. Analyses of the genetic architecture of quantitative traits in model systems indicate important sources of genetic variation attributed to epistasis and GE interactions [Mackay, 2001]. The same can be expected of economically important traits in agricultural plant species. Therefore, in tackling the GP problem for quantitative traits we seek a modeling framework that enables investigation of the impact of gene-to-gene and gene-by-environment interactions.
MODELING A GENOTYPE-ENVIRONMENT SYSTEM
To progress from a general discussion of strategies for modeling natural systems to the specifics required to model genotype-environment systems it is necessary to define both the key properties and relationships that are important in the target natural system and the methods that are to be used in constructing the formal system. Figure 3 is a concept map, based on the modeling framework described in Figure 1, which focuses on the GP problem for a genotype-environment system. Our objective is to establish a formal representation of a genotype-environment system to enable modeling gene-to-phenotype relationships as a basis for evaluating the efficiency of plant breeding strategies [Cooper et al., 1999]. Therefore, here we emphasize the quantification of allelic variation at N genes and their potential interactions within NK gene networks [Kauffman, 1993] and with E environmental conditions [Podlich and Cooper, 1998] in determining the gene-to-phenotype relationships for the traits to be improved by plant breeding.
The scope for modeling plant and animal breeding strategies has been a long-term focus of applied quantitative genetics [e.g. Falconer and Mackay, 1996; Comstock, 1996]. The use of computer simulation approaches has increased as hardware and software capability and flexibility have improved. Adopting a simulation approach to study gene-to-phenotype relationships provides greater flexibility for investigating the influences of epistasis and GE interactions than is possible within the classical statistical modeling approach [Kempthorne, 1988; Podlich and Cooper, 1998]. Kauffman [1993] gave a comprehensive discussion of the NK model and its suitability for investigating the impact of epistasis in evolutionary processes. Podlich and Cooper [1998] defined the E(NK) model as an extension of Kauffman's NK model in order to accommodate the effects of gene-by-environment interactions. In the E(NK) model gene-by-environment interactions are possible where different forms of NK gene network models can be expressed in the different environmental conditions that are possible within a target population of environments.
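To make the structure of an E(NK) model concrete, the following is a minimal sketch under simplifying assumptions that are ours and not the paper's: genotypes are haploid with two alleles per gene (coded 0/1), each gene's contribution depends on its own allele and on the next K genes along the genome, and contributions are drawn from a uniform distribution, in the spirit of Kauffman's random landscapes. Giving each of the E environment-types its own independently drawn NK landscape is one simple way to allow gene-by-environment interaction. It is not a description of how QU-GENE implements the model, and all function names are hypothetical.

```python
import itertools
import random


def make_nk_tables(n, k, rng):
    """Build one random NK landscape.

    Gene i's contribution depends on its own allele and on the alleles of
    k other genes (here simply the next k genes, wrapping around).
    """
    tables = []
    for i in range(n):
        inputs = [(i + j) % n for j in range(k + 1)]
        lookup = {
            states: rng.random()
            for states in itertools.product((0, 1), repeat=k + 1)
        }
        tables.append((inputs, lookup))
    return tables


def nk_value(genotype, tables):
    """Mean contribution across the N genes for one haploid 0/1 genotype."""
    total = 0.0
    for inputs, lookup in tables:
        total += lookup[tuple(genotype[g] for g in inputs)]
    return total / len(tables)


def make_enk_model(e, n, k, seed=0):
    """E(NK): an independent NK landscape for each of E environment-types."""
    rng = random.Random(seed)
    return [make_nk_tables(n, k, rng) for _ in range(e)]


def enk_values(genotype, model):
    """Genotype value in each environment-type."""
    return [nk_value(genotype, tables) for tables in model]


model = make_enk_model(e=3, n=5, k=2)
values = enk_values((1, 0, 1, 1, 0), model)
```

With K = 0 each gene acts independently and the landscape is purely additive; raising K increases the number of genes whose alleles modify each gene's contribution.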
Fig. 3. Concept map for modeling the key components of a genotype-environment system and the relationships to
the components of the E(NK) model and the investigative strategies applied to quantify the value of alleles of genes
within the genotype-environment system [Adapted from Cooper et al., 1999].
The relationships between the components of the E(NK) model and the biophysical components of a genotype-environment system are indicated in Figure 3. Some of the investigation strategies that can be used to provide the information necessary to build formal models of gene-to-phenotype relationships and quantify the value of allelic variation in terms of the components of the E(NK) framework are indicated. Key activities that are emphasized include: (i) environmental characterization as a basis for defining the target population of environments and causes of GE interactions, (ii) genetic analysis to study genetic variation for biochemical pathways, physiological processes and adaptive traits, (iii) genetic (recombination) and physical mapping of genes, (iv) functional genomics to study the regulation and expression of genes, and (v) crop growth models that define the relationships between genetic variation for traits, plant growth and development processes and variation in environmental resources within a target population of environments [e.g. Bidinger et al., 1996].
SORGHUM BREEDING EXAMPLE: PROBLEM AND MODEL DEFINITION
To examine the effectiveness of a breeding strategy we need to define two properties of a genotype-environment system: (1) the target population of environments, and (2) the target genotype for the gene-to-phenotype model. Within the target geographical area in which a breeding program operates, new genotypes are developed over sequences of cycles of intermating parents, evaluation and selection of progeny to identify new genotypes that have high and stable yield performance across a wide range of environmental conditions. The occurrence of environmental conditions within the geographical area has both spatial and temporal dimensions and the different conditions can occur with different frequencies in both dimensions. This results in a complex mixture of different environmental conditions that is referred to here as the target population of environments. In the presence of GE interactions, understanding the environmental factors that influence genotype performance and cause these interactions is an important step in designing an effective testing strategy for measurement of trait phenotypes as part of a breeding program. The target genotype is then defined as the genotype that results in the best trait performance across the target population of environments for the specified gene-to-phenotype model. For complex genotype-environment systems there can be multiple genotype targets. As E(NK) models become more complex, with increasing levels of E, N and K, it becomes increasingly difficult to compute and identify a single target genotype. In these situations, where it is not possible to create and evaluate all potential genotypes for a gene-to-phenotype model, alternative evaluation strategies are used. In the example we consider here the genotype-environment system is of a size that definition of a single target genotype is possible.
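When every genotype can be evaluated, the definition of the target genotype given above reduces to a search for the highest mean performance, with each environment-type weighted by its frequency in the target population of environments. The following is a minimal sketch of that idea; the function names, the genotype labels, the yield values and the environment-type frequencies are all hypothetical and are not taken from the study.

```python
def weighted_mean_yield(yields_by_env, env_frequencies):
    """Mean yield across the target population of environments (TPE)."""
    if abs(sum(env_frequencies) - 1.0) > 1e-9:
        raise ValueError("environment-type frequencies must sum to 1")
    return sum(y * f for y, f in zip(yields_by_env, env_frequencies))


def target_genotype(yield_table, env_frequencies):
    """Return the genotype with the highest frequency-weighted mean yield.

    yield_table maps a genotype label to its yield in each environment-type.
    """
    return max(
        yield_table,
        key=lambda g: weighted_mean_yield(yield_table[g], env_frequencies),
    )


# Hypothetical yields (t/ha) in three environment-types; not data from the study.
yield_table = {
    "G1": [6.0, 4.0, 1.0],
    "G2": [5.0, 4.5, 2.5],
    "G3": [4.0, 4.0, 3.0],
}
best_wet = target_genotype(yield_table, [0.8, 0.1, 0.1])
best_dry = target_genotype(yield_table, [0.1, 0.1, 0.8])
```

In this toy table the target genotype changes from G1 to G3 when the frequencies shift towards the third environment-type, which illustrates why the target population of environments has to be characterized before a target genotype can be defined.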
In this example we discuss some key results from a larger long-term study. This larger study is investigating the requirements (Fig. 3) for model development and simulation of sorghum (Sorghum bicolor (L.) Moench) adaptation and grain yield for the heterogeneous dryland agricultural system in northeastern Australia [Chapman et al., 2000a,b,c, 2002a,b].
First we provide some background and context to the complexity of this genotype-environment system. Sorghum is the major summer crop grown in the northeastern cropping region of Australia. Grain yield is the major economic product and is used mainly as animal feed. Sorghum grain yield is a complex quantitative trait and is the result of interactions and integration of many component traits that can themselves interact with variation in environmental conditions (rainfall, temperature and solar radiation) during a crop growth and developmental cycle of around 100 days. The major environmental variable that has a dominant influence on grain yield variation is water availability to the crop. Variation in water availability is a consequence of complex spatial and temporal variation in rainfall prior to and during the growth of the crop and also the spatial variation in the water holding capacity of the soil types across the geographical area. We have found that the environmental variation in incidence of drought can explain a significant component of the GE interactions for grain yield [Chapman et al., 2000a,b,c]. Research into the genetic and physiological bases of drought tolerance of sorghum has identified and examined the importance of the following four traits: (1) phenology, in particular the timing of flowering (PH) [Hammer et al., 1989], (2) stay-green (SG) [Borrell and Hammer, 2000], (3) transpiration efficiency (TE) [Hammer et al., 1997; Mortlock and Hammer, 1999], and (4) osmotic adjustment (OA) [Hammer et al., 1999]. In parallel research, genetic analysis and the construction of a molecular marker map for grain sorghum [Tao et al., 1998, 2000] has enabled trait dissection. This body of work provides working hypotheses of the number of genes or Quantitative Trait Loci (QTL) that may contribute to the genetic variation for these four traits [Chapman et al., 2002a,b].
With access to this experimental database we have used a simulation approach to investigate the efficiencies of plant breeding strategies used for genetic improvement of grain yield of sorghum under the dryland conditions in Australia. This required us to develop an interface between a genetic modeling platform (QU-GENE) [Podlich and Cooper, 1998; http://pig.ag.uq.edu.au/qu-gene/] and a cropping system model (APSIM) [McCown et al., 1996; http://www.apsru.gov.au/products/apsim.htm], which has a module for sorghum [Hammer and Muchow, 1994; Hammer et al., 2001]. This interface was constructed in a way that used information generated from our ability to characterize environments for their occurrence of drought, our understanding of the spatial and temporal distributions of drought in the target population of environments, and the data available from genetic and physiological analyses of traits considered to contribute to drought tolerance (Fig. 3). This provides a model architecture that links the alleles of genes and the plant growth and development processes that respond to variation in the environmental conditions to determine grain yield (Fig. 4). Thus, by developing an interface between the QU-GENE genetic model and the APSIM-Sorg model for sorghum there is a relationship between genes and phenotypes that enables investigation of the GP problem within a genotype-environment system context. These gene-to-phenotype relationships can be used to assess the value of genes in terms of an E(NK) model for grain yield in a target population of environments. Further, as additional experimental information becomes available it is possible to continually update the genetic and physiological models for the genotype-environment system, our assessment of the allelic variation we have identified, and any impact that this may have on the efficiency of the breeding strategies we are using for genetic improvement of sorghum.
Fig. 4. Schematic of the modular structures and linkages between QU-GENE and APSIM. In this example S1 recurrent selection was used as the breeding strategy to improve grain yield of the sorghum population of genotypes. Other plant breeding strategies are indicated (e.g. pedigree selection). Genotypes are categorized into expression-states in QU-GENE and these expression-states map to trait values modeled in APSIM-Sorg for different combinations of soil and weather data. Output from APSIM is processed to define both the yield of all possible genotypes (expression-state combinations) and the frequency of drought environment types (ETs) encountered in the target population of environments (TPE).
The E(NK) model can be parameterized in a number of ways, including: (1) Constructing Boolean gene networks and sampling genotype values for the components of the networks from underlying distributions of gene effects; a procedure pioneered by Kauffman [1993]; (2) Defining inheritance models using empirical estimates for classical quantitative genetic parameters [Podlich and Cooper, 1998]; and (3) Specifying gene networks to represent the properties of biochemical pathways. For the sorghum genotype-environment system in our example the resulting E(NK) model is a consequence of the number of genes specified to control variation for traits, the number of environment-types identified for the target population of environments and the physiological relationships that determine crop growth and development with the APSIM-Sorg sorghum model. This is a novel approach for determining the parameters for an E(NK) model and it is made feasible by developing the interface between QU-GENE and APSIM (Fig. 4).
Here we consider an E(NK) model where the number of environment-types E = 3 and the total number of genes N = 15. Each of the 15 genes has two alleles segregating within a base population of genotypes. The level of epistasis for grain yield, as defined by the K parameter, is not explicitly defined here and is an emergent property of the extent of trait interconnectedness within the APSIM-Sorg crop growth model.
The three environment-types represent different levels of severity of drought: (1) mild terminal stress, (2) moderate terminal stress, and (3) severe terminal stress. These drought environment-types, together with their frequencies of occurrence in the target population of environments, were determined from an analysis of the timing and severity of water deficits during crop growth and development by running the APSIM-Sorg model for a standard genotype with approximately 100 years of weather data across a number of locations in northeastern Australia. The locations represented different soil types from the target geographical area. The APSIM-Sorg simulations were then summarized by cluster analysis to identify the three key drought environment-types (Fig. 4) [Chapman et al., 2000b,c]. While there are three environment-types in the target population of environments, to be concise we will mostly concentrate on only two of these in this paper: (1) the mild terminal stress environment-type, and (2) the severe terminal stress environment-type. The 15 genes determine the genetic variation for grain yield in the environment-types by specifying the extent of genetic variation for the four traits PH (3 genes), SG (5 genes), TE (5 genes) and OA (2 genes). Thus, the genetic variation for grain yield is an emergent property of the variation for the physiologically defined growth and development processes in the APSIM-Sorg model impacted by the four traits. The process we have used here to specify the genetic variation for grain yield differs from the classical quantitative genetics approach where effects of "yield-genes" are specified in ways that are unrelated to or unconstrained by the biophysical properties of plant growth and development processes. The resultant genetic variation for grain yield in the base population of genotypes is then subjected to a series of recurrent cycles of directional selection for increased levels of grain yield. The breeding strategy we evaluate in this example is S1 recurrent selection [Hallauer and Miranda, 1988] and selection is based on the yield phenotypes of genotypes when they are evaluated in samples of environments taken from the target population of environments.
The genetic changes in the population of genotypes in response to the selection imposed by the breeding strategy are examined in terms of: (1) the changes in frequencies of the alternative alleles for the 15 genes (referred to as changes in gene frequencies) on a trait basis, and (2) the changes in grain yield performance of the genotypes created and selected during the course of the simulation experiment. We examine these changes due to selection at both genetic and phenotypic levels by constructing response surfaces that relate genetic distances between genotypes to the phenotypic values for the four traits PH, SG, TE, OA and also grain yield. Genetic distances are calculated as Hamming Distances, which give a measure of the number of alleles that differ between any pair of genotypes.
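The distance measure described above can be sketched as follows. The encoding is our assumption, not something specified in the text: each diploid genotype is represented by the number of + alleles it carries at each gene (0, 1 or 2), so that the absolute difference in that count at a gene equals the number of alleles that differ there. The function name is hypothetical.

```python
def allele_hamming_distance(genotype_a, genotype_b):
    """Number of alleles that differ between two diploid genotypes.

    Each genotype is a sequence giving the count of '+' alleles (0, 1 or 2)
    at each gene, so the per-gene difference in allele count is the number
    of alleles that differ at that gene.
    """
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotypes must cover the same genes")
    return sum(abs(a - b) for a, b in zip(genotype_a, genotype_b))


# Two genes A and B: AABB = (2, 2), AaBb = (1, 1), aabb = (0, 0).
d_het = allele_hamming_distance((2, 2), (1, 1))
d_max = allele_hamming_distance((2, 2), (0, 0))
```

Under this encoding the largest possible distance for 15 two-allele genes is 30, reached when two genotypes are homozygous for opposite alleles at every gene.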
For 15 genes, each segregating for two alleles, there are 3^15 = 14,348,907 possible genotypes from all combinations of alleles. The frequency of occurrence of these genotypes in the reference population is dependent on the gene frequencies for the 15 genes. Running the APSIM-Sorg crop growth model 14,348,907 times for each environmental condition was not feasible. Therefore, in this example we reduced the number of simulations necessary by allocating genotypes to classes based on defining "expression states" for each trait. An expression state was defined for a trait by the total number of + or - alleles summed across the genes influencing the trait, where the + allele increased trait value and the - allele decreased trait value. Adopting this approach, for N genes determining genetic variation for a trait, with two alleles per gene, there are 2N+1 expression states for the trait. For example, for the trait OA with N = 2, individuals can have 0, 1, 2, 3 or 4 + alleles, representing the 5 states of expression for OA. There are differing numbers of genotypes in each of the expression state classes. If we label the two genes A (A,a) and B (B,b) such that the alternative alleles are A(+), a(-) and B(+), b(-) then the genotype membership of the expression state classes is: 0 = aabb; 1 = Aabb, aaBb; 2 = AAbb, AaBb, aaBB; 3 = AABb, AaBB; 4 = AABB. We then divided the range of phenotypic values for the traits into equal increments on a linear scale, with genotype aabb defined as the lowest expression state and AABB the highest expression state for OA. The same process was applied to the other three traits. Following this procedure, we have 5 expression states for OA, 7 expression states for PH, and 11 expression states for both SG and TE. With the four traits we have 5×7×11×11 = 4,235 combinations of expression states. Thus, the 14,348,907-dimension genotype space is condensed and mapped onto a 4,235-dimension expression state space. Running 4,235 APSIM-Sorg simulations for the 600 environments used to represent the target population of environments was manageable with our computer cluster resources [Micallef et al., 2001; http://pig.ag.uq.edu.au/qu-gene]. The deterministic relationship between genotypes and trait expression states used in this example is only one of many ways in which a gene-to-phenotype relationship can be constructed within our modeling framework (Figs. 3 and 4).
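The counting argument above can be checked with a short script. This is a sketch for illustration only: the gene numbers per trait come from the text, while the variable and function names are ours. It reproduces the total number of genotypes, the 2N+1 expression states per trait, the number of expression-state combinations, and the number of two-gene OA genotypes that fall into each expression state.

```python
from collections import Counter
from itertools import product
from math import prod

genes_per_trait = {"PH": 3, "SG": 5, "TE": 5, "OA": 2}

# Three genotypes per gene (--, +-, ++) give 3**N genotypes in total.
total_genes = sum(genes_per_trait.values())
n_genotypes = 3 ** total_genes

# A trait with N two-allele genes has 2N + 1 expression states.
states_per_trait = {t: 2 * n + 1 for t, n in genes_per_trait.items()}
n_state_combinations = prod(states_per_trait.values())


def genotypes_per_state(n_genes):
    """Count genotypes in each expression state (total number of '+' alleles)."""
    counts = Counter(
        sum(plus_counts) for plus_counts in product((0, 1, 2), repeat=n_genes)
    )
    return [counts[s] for s in range(2 * n_genes + 1)]


oa_classes = genotypes_per_state(genes_per_trait["OA"])
```

For OA the class sizes are 1, 2, 3, 2 and 1, matching the genotype lists given in the text for expression states 0 to 4.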
SORGHUM BREEDING EXAMPLE: RESULTS
For each of the three environment-types the APSIM-Sorg model was used to estimate a grain yield value for
each of the 4,235 trait expression states, referred to hereafter as genotype classes. These estimates were
averages from ca. 200 runs of the model, using as inputs daily weather data and soils data from
location-year combinations chosen to represent the target population of environments. Some appreciation of
the genetic variation for yield that exists among the genotype classes for each of the four traits in the
mild terminal stress and severe terminal stress environment-types is given in Figure 5. For both
environment-types a series of grain yield frequency distributions is shown for each trait. The genotype
classes are ordered on their genetic distance (measured as a Hamming distance) from the allele combination
of the target genotype in the target population of environments. As expected, lower grain yields are
achieved under severe terminal stress (colored red) than in the mild terminal stress (colored blue)
environment-type. Within any genotype class of a given trait there is considerable genetic variation for
grain yield, which results from genotypic variation for the other three traits.
Fig. 5. Grain yield distribution of the genotype classes for the Mild Terminal Stress (colored blue) and
Severe Terminal Stress (colored red) environment-types, for representations where the genotype classes are
distributed according to their genetic distance from the target genotype (based on grain yield) for each of
the four traits; (a) Transpiration Efficiency, (b) Osmotic Adjustment, (c) Phenology and (d) Stay-green.
The vertical axis indicates the percentage of the 4,235 genotype classes present at each yield/Hamming
distance combination. The horizontal left axis indicates the level of grain yield (t/ha). The horizontal
right axis indicates the number of alleles different from the target genotype in the target population of
environments (referred to as Hamming distance).
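The Hamming distance used to order the genotype classes can be sketched as follows. This is an
illustration only: it assumes a genotype is written as a string of allele pairs (upper case for the +
allele, as in aabb and AABB), and the helper names `encode` and `allele_hamming_distance` are hypothetical
rather than taken from the study's software.

```python
def encode(genotype: str) -> tuple:
    """Convert e.g. 'AaBB' into '+' allele counts per locus, here (1, 2)."""
    if len(genotype) % 2 != 0:
        raise ValueError("expected two allele characters per locus")
    return tuple(
        sum(ch.isupper() for ch in genotype[i:i + 2])
        for i in range(0, len(genotype), 2)
    )


def allele_hamming_distance(genotype: str, target: str) -> int:
    """Number of alleles that differ between a genotype and the target genotype."""
    g, t = encode(genotype), encode(target)
    if len(g) != len(t):
        raise ValueError("genotypes must cover the same loci")
    return sum(abs(a - b) for a, b in zip(g, t))


# Hypothetical two-locus example with AABB as the target genotype.
print(allele_hamming_distance("aabb", "AABB"))  # 4
print(allele_hamming_distance("AaBB", "AABB"))  # 1
print(allele_hamming_distance("AABB", "AABB"))  # 0
```

A distance of zero corresponds to a genotype with no alleles different from the target genotype, and the
maximum distance to one with all alleles different.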
To evaluate the consequences of GE interactions between the mild terminal stress and severe terminal
stress environment-types at the level of grain yield, we need to examine the relationship between grain
yield performance in the two environment-types. To do this we construct a scatter plot of the yield values
in both environment-types (Fig. 6). If there were no GE interactions there would be a perfect correlation
of the grain yield values between the two environment-types. The shape of the distribution of the yield
values shows that there are GE interactions and that the genotypes with highest grain yield differ between
the two environment-types.
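The reasoning above can be made concrete with a small sketch that computes the correlation of yields
between two environment-types and checks whether the top-ranked genotype changes. The yield values and
genotype labels below are invented purely for illustration and are not results from the APSIM-Sorg
simulations; the `pearson` helper is written out by hand to keep the example self-contained.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical grain yields (t/ha) for four made-up genotype classes.
mild = {"G1": 6.0, "G2": 5.5, "G3": 5.0, "G4": 4.0}
severe = {"G1": 1.5, "G2": 2.5, "G3": 2.0, "G4": 1.0}

names = sorted(mild)
r = pearson([mild[g] for g in names], [severe[g] for g in names])
best_mild = max(mild, key=mild.get)
best_severe = max(severe, key=severe.get)

print(f"correlation between environment-types: {r:.2f}")
print(f"best in mild: {best_mild}, best in severe: {best_severe}")
```

A correlation well below 1 together with a change in the best genotype is the pattern that indicates GE
interaction of the kind described for Figure 6.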
Fig. 6. Grain yield values (t/ha) for the 4,235 genotype classes in the Mild Terminal Stress and Severe
Terminal Stress environment-types for color coded representations of each of the four traits; (a)
Transpiration Efficiency, (b) Osmotic Adjustment, (c) Phenology and (d) Stay-green. Genotype classes are
color coded according to their genetic distance from the target genotype in the target population of
environments (Hamming distance), extending from yellow (all alleles different from the target genotype) to
blue (no alleles different from the target genotype).
In Figure 6, each of the 4,235 genotype classes is color coded by trait, extending from light (yellow) to
dark (blue), to depict for each trait the genetic distance between the genotype class and the target
genotype. As the colors get darker the genotypes in the classes have more alleles in common (giving a lower
Hamming distance) with the target genotype. For both TE (Fig. 6a) and OA (Fig. 6b), genotypes with high
yield in the severe terminal stress and mild terminal stress environment-types generally have a large
proportion of alleles in common with genotypes that yield well in the target population of environments.
The situation is different for PH (Fig. 6c). For the PH trait, genotypes that have a high yield in the mild
terminal stress environment-type have many alleles in common with the target genotype, whereas genotypes
that have high yield in the severe terminal stress environment-type are genetically distant from the target
genotype. Thus, we have strong GE interactions for grain yield that can affect selection outcomes for the
PH trait and for yield in the different environment-types and in the target population of environments. For
SG (Fig. 6d) there is a strong association between high yield in the mild terminal stress environment-type
and having alleles in common with the target genotype. However, this relationship is much weaker in the
severe terminal stress environment-type, in part because the other traits have a stronger influence on
yield in this environment-type.
Since there are strong epistatic and GE interactions for the four traits in determining grain yield in the
genotype-environment system represented in this example, it is important to consider the influence of
selection environment on the expected changes in the genetic structure of the population. Here we examine
genetic responses over recurrent cycles of selection on yield phenotypes in either the severe terminal
stress or mild terminal stress environment-types. These responses to selection are examined in terms of
changes in the gene frequencies of alleles for increasing levels of trait expression for each trait (Fig. 7)
and finally in terms of trajectories through genetic space for yield (Fig. 8).
Fig. 7. Change in gene frequency of the + alleles for increasing level of the four traits (TE=Transpiration
Efficiency, OA=Osmotic Adjustment, PH=Phenology, SG=Stay-green) over cycles of selection, when selection is
conducted in the Severe Terminal Stress (a) and Mild Terminal Stress (b) environment-types.
Selection for increased grain yield within the severe terminal stress environment-type (Fig. 7a) had the
effect of rapidly increasing the frequencies of alleles that enhanced expression of the two traits OA and
TE, gradually increasing the frequencies of alleles for enhanced SG, and decreasing the frequencies of
alleles for later flowering, thus selecting early flowering genotypes that could developmentally escape
from the severe terminal stress conditions. After selection cycles 5 and 6, once the alleles for greater
expression of OA and TE were fixed, the rate of increase in frequency of alleles for enhanced levels of SG
was greater than in the previous selection cycles. Selection for higher grain yield under the mild terminal
stress environment-type (Fig. 7b) resulted in a different pattern of changes in allele frequencies from
that observed for the severe terminal stress environment-type (Fig. 7a). Under the mild terminal stress
environment-type, selection for greater yield favored an increase in the frequencies of alleles for higher
expression levels of all four traits (Fig. 7b). Thus, in contrast to the severe terminal stress
environment-type, where early flowering genotypes were favored, selection in the mild terminal stress
environment-type favored late flowering genotypes. Therefore, as we expect in the presence of these
interactions, the trajectories through genetic space followed by the populations over cycles of selection
for yield differ depending on whether we select under a severe terminal stress environment-type (Fig. 8a)
or a mild terminal stress environment-type (Fig. 8b).
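The contrasting responses described above can be illustrated with a toy recurrent-selection sketch. It is
not the simulation platform used in this study: it assumes one unlinked biallelic locus per trait, purely
additive effects on yield, no environmental noise, and truncation selection with random mating among the
selected parents. The function name `simulate_selection` and all effect sizes are hypothetical, chosen only
so that the + allele for later flowering (PH) lowers yield under severe terminal stress and raises it under
mild terminal stress.

```python
import random

TRAITS = ["TE", "OA", "PH", "SG"]


def simulate_selection(effects, n_cycles=10, pop_size=200, keep=0.2, seed=1):
    """Return '+' allele frequencies per trait locus for cycles 0..n_cycles.

    effects[i] is a hypothetical yield effect of one '+' allele at locus i
    in the selection environment (negative if '+' lowers yield there).
    """
    rng = random.Random(seed)
    n_loci = len(effects)
    # An individual is a list of loci; each locus is a pair of alleles (1 = '+').
    pop = [[(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_loci)]
           for _ in range(pop_size)]

    def plus_frequencies(population):
        return [sum(sum(ind[i]) for ind in population) / (2 * len(population))
                for i in range(n_loci)]

    def predicted_yield(ind):
        # Additive toy phenotype: effect times number of '+' alleles per locus.
        return sum(e * sum(locus) for e, locus in zip(effects, ind))

    history = [plus_frequencies(pop)]
    n_parents = max(2, int(keep * pop_size))
    for _ in range(n_cycles):
        # Truncation selection: keep the highest-yielding fraction as parents.
        parents = sorted(pop, key=predicted_yield, reverse=True)[:n_parents]
        pop = []
        for _ in range(pop_size):
            mother, father = rng.sample(parents, 2)
            # Unlinked loci: each parent transmits one random allele per locus.
            pop.append([(rng.choice(m), rng.choice(f))
                        for m, f in zip(mother, father)])
        history.append(plus_frequencies(pop))
    return history


# Hypothetical per-allele yield effects, ordered as in TRAITS.
severe = simulate_selection([1.0, 1.0, -0.5, 0.3])
mild = simulate_selection([0.5, 0.5, 0.5, 0.5])

for name, start, end_s, end_m in zip(TRAITS, severe[0], severe[-1], mild[-1]):
    print(f"{name}: start {start:.2f}, severe {end_s:.2f}, mild {end_m:.2f}")
```

With these assumed effects the + allele for PH declines under selection in the severe stress setting and
increases in the mild stress setting, which is the qualitative contrast reported for Figure 7. The sketch
makes no attempt to reproduce the rates of change observed in the study.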
Fig. 8. Grain yield values (t/ha) for the 4,235 genotype classes and the average trajectory of a population
of genotypes (red line) over cycles of selection, when selection is conducted in the Severe Terminal Stress
(a) and Mild Terminal Stress (b) environment-types. Genotype classes are color coded according to their
genetic distance from the target genotype in either the Severe Terminal Stress (a) or the Mild Terminal
Stress (b) environment-types, extending from yellow (all alleles different from the target genotype) to
blue (no alleles different from the target genotype).
SORGHUM BREEDING EXAMPLE: DISCUSSION
The purposes for considering the sorghum breeding example we have described in this paper were threefold:
(1) to demonstrate some aspects of the approaches we are developing and using to investigate and deal with
the GP problem for complex traits in plant breeding applications (Fig. 3), (2) to emphasize the importance
that both epistatic and GE interactions can have in gene-to-phenotype relationships, and (3) to show how
the E(NK) model can be used as a framework for many approaches to investigating the GP problem. Given a
suitable experimental information base, an equally valid case study could be a human health issue such as
heart disease, which is influenced by both the genetics of individuals and the lifestyle environment they
choose.
To date our investigation of sorghum genetic improvement in Australia has synthesized a large body of
information that previously existed as a series of loosely connected studies. The modeling framework we
now have has highlighted many previously unappreciated implications of interactions between breeding
strategies, the genetic architecture of traits and the environments in which we select for higher grain
yield. Also, and perhaps most importantly, the results of these studies have provided testable hypotheses
and focal points for further experimentation to test our current understanding of the ways in which these
traits interact with each other and with environmental conditions to determine grain yield. Thus, we are
entering another cycle of the iterative modeling approach described in Figure 3.
The GP problem has always been and will continue to be a major challenge in biology. With the increasing
availability of the complete genome sequences of a number of prokaryotic and eukaryotic organisms, our
improving ability to define the locations of genes in these sequences, and our growing knowledge of the
functional relationships between these genes and the biochemical and metabolic pathways they influence
[Karp, 2001], we are beginning to understand the dynamical nature of the GP problem. We see that an
iterative modeling approach, as described in this paper, is a logical quantitative framework for exploring
the growing experimental databases and creating knowledge structures for genotype-environment systems
(Fig. 1). This provides a foundation for defining priorities in the model development process and for
deciding when development of practical applications is feasible. In our case the practical applications we
seek are efficient plant breeding strategies that contribute to sustainable agricultural systems.
ACKNOWLEDGMENTS
We thank Professor John Casti for his permission to create a modification of his original modeling concept
map in Figure 1 and also Research Trends, Trivandrum, India, for permission to reproduce components of
Figure 3.
REFERENCES
[1] Bidinger, F. R., Hammer, G. L. and Muchow, R. C. (1996). The physiological basis of genotype by environment interaction in crop adaptation. In: Plant Adaptation and Crop Improvement, Cooper, M. and Hammer, G. L. (eds). CAB International, Wallingford, pp. 329-347.
[2] Borrell, A. K. and Hammer, G. L. (2000). Nitrogen dynamics and the physiological basis of stay-green in sorghum. Crop Sci. 40, 1295-1307.
[3] Casti, J. L. (1989). Paradigms Lost: Images of Man in the Mirror of Science. Cardinal, London.
[4] Casti, J. L. (1997). Would-be-Worlds: How Simulation is Changing the Frontiers of Science. John Wiley & Sons, Inc., New York.
[5] Clark, A. G. (2000). Limits to prediction of phenotypes from knowledge of genotypes. Evol. Biol. 32, 205-224.
[6] Chapman, S. C., Cooper, M., Butler, D. G. and Henzell, R. G. (2000a). Genotype by environment interactions affecting grain sorghum. I. Characteristics that confound interpretation of hybrid yield. Aust. J. Agric. Res. 51, 197-207.
[7] Chapman, S. C., Cooper, M., Hammer, G. L. and Butler, D. G. (2000b). Genotype by environment interactions affecting grain sorghum. II. Frequencies of different seasonal patterns of drought stress are related to location effects on hybrid yields. Aust. J. Agric. Res. 51, 209-221.
[8] Chapman, S. C., Hammer, G. L., Butler, D. G. and Cooper, M. (2000c). Genotype by environment interactions affecting grain sorghum. III. Temporal sequences and spatial patterns in the target population of environments. Aust. J. Agric. Res. 51, 223-233.
[9] Chapman, S. C., Cooper, M. and Hammer, G. L. (2002a). Using crop simulation to generate genotype by environment interaction effects for sorghum in water-limited environments. Aust. J. Agric. Res., in press.
[10] Chapman, S. C., Cooper, M., Podlich, D. W. and Hammer, G. L. (2002b). Evaluating plant breeding strategies by simulating gene action and environmental effects to predict phenotypes for dryland adaptation. Agron. J., submitted.
[11] Comstock, R. E. (1996). Quantitative Genetics with Special Reference to Plant and Animal Breeding. Iowa State University Press, Ames.
[12] Cooper, M., Podlich, D. W., Jensen, N. M., Chapman, S. C. and Hammer, G. L. (1999). Modelling plant breeding programs. Trends Agron. 2, 33-64.
[13] Falconer, D. S. and Mackay, T. F. C. (1996). Introduction to Quantitative Genetics. 4th edn. Longman, Essex.
[14] Hallauer, A. R. and Miranda, J. B. F. (1988). Quantitative Genetics in Maize Breeding. 2nd edn. Iowa State University Press, Ames.
[15] Hammer, G. L., Chapman, S. C. and Snell, P. (1999). Crop simulation modelling to improve selection efficiency in plant breeding programs. Proc. Ninth Assembly Wheat Breeding Society of Australia, Toowoomba, pp. 79-85.
[16] Hammer, G. L., Farquhar, G. D. and Broad, I. J. (1997). On the extent of genetic variation for transpiration efficiency in sorghum. Aust. J. Agric. Res. 48, 649-655.
[17] Hammer, G. L. and Muchow, R. C. (1994). Assessing climatic risk to sorghum production in water-limited subtropical environments. I. Development and testing of a simulation model. Field Crops Res. 36, 221-234.
[18] Hammer, G. L., Vanderlip, R. L., Gibson, G., Wade, L. J., Henzell, R. G., Younger, D. R., Warren, J. and Dale, A. B. (1989). Genotype by environment interaction in grain sorghum. II. Effects of temperature and photoperiod on ontogeny. Crop Sci. 29, 376-384.
[19] Hammer, G. L., van Oosterom, E. J., Chapman, S. C. and McLean, G. (2001). The economic theory of water and nitrogen dynamics in field crops. In: Proceedings of the Fourth Australian Sorghum Conference, Kooralbyn, Queensland, 5-8 Feb 2001, A. K. Borrell and R. G. Henzell (eds). CD-Rom Format. Range Media Pty Ltd. (ISBN: 0-7242-2163-8).
[20] Ideker, T., Thorsson, V., Ranish, J. A., Christmas, R., Buhler, J., Eng, J. K., Bumgarner, R., Goodlett, D. R., Aebersold, R. and Hood, L. (2001). Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929-934.
[21] Karp, P. D. (2001). Pathway databases: A case study in computational symbolic theories. Science 293, 2040-2044.
[22] Kauffman, S. A. (1993). The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, New York.
[23] Kauffman, S. A. (2000). Investigations. Oxford University Press, Oxford.
[24] Kempthorne, O. (1988). An overview of the field of quantitative genetics. In: Proceedings of the Second International Conference on Quantitative Genetics, Weir, B. S., Eisen, E. J., Goodman, M. M. and Namkoong, G. (eds). Sinauer Associates, Inc., Sunderland, pp. 47-56.
[25] Mackay, T. F. C. (2001). Quantitative trait loci in Drosophila. Nat. Rev. Genet. 2, 11-20.
[26] McCown, R. L., Hammer, G. L., Hargreaves, J. N. G., Holzworth, D. P. and Freebairn, D. M. (1996). APSIM: A novel software system for model development, model testing, and simulation in agricultural systems research. Agric. Syst. 50, 255-271.
[27] Micallef, K. P., Cooper, M. and Podlich, D. W. (2001). Using clusters of computers for large QU-GENE simulation experiments. Bioinformatics 17, 194-195.
[28] Mortlock, M. Y. and Hammer, G. L. (1999). Genotype and water limitation effects on transpiration efficiency in sorghum. J. Crop Prod. 2, 265-286.
[29] Podlich, D. W. and Cooper, M. (1998). QU-GENE: a platform for quantitative analysis of genetic models. Bioinformatics 14, 632-653.
[30] Rosen, R. (1985). Anticipatory Systems: Philosophical, Mathematical and Methodological Foundations. Pergamon Press, Oxford.
[31] Tao, Y. Z., Jordan, D. R., Henzell, R. G. and McIntyre, C. L. (1998). Construction of a genetic map in a sorghum RIL population using probes from different sources and its alignment with other sorghum maps. Aust. J. Agric. Res. 49, 729-736.
[32] Tao, Y. Z., Henzell, R. G., Jordan, D. R., Butler, D. G., Kelly, A. M. and McIntyre, C. L. (2000). Identification of genomic regions associated with stay green in sorghum by testing RILs in multiple environments. Theor. Appl. Genet. 100, 1225-1232.
... Furthermore, testing the effect of these traits under water stress scenarios is limited since drought events vary over time and geographies (Tang and Piechota, 2009;Pournasiri-Poshtiri et al., 2018). Thus, plant breeding programs require complementary methods to test the effect of any hypothetical drought adaptation trait to design a breeding pipeline (Cooper et al., 2002;Bernardo, 2020;Crossa et al., 2022). ...
... Crop model applications to support breeding exist for different crops and diverse geographies (Technow et al., 2015;Chenu et al., 2017;Hammer et al., 2019b;Jighly et al., 2023). For instance, a study deployed a rice crop model to hypothesize the lack of effectiveness in breeding drought-tolerant upland rice in Brazil (Heinemann et al., 2019), and other studies improved phenomic prediction by integrating crop models and genomic prediction (Cooper et al., 2002;Heslot et al., 2014;Crossa et al., 2022). ...
... Higher yields in early spring resulted from the synchronization of planting dates with the onset of precipitation, which increased the frequency of WW environments (Supplementary Figure S3). Likewise, simulation and field studies demonstrated yield gains of up to 11% in seasons with higher water availability (Francis et al., 1986;Carcedo et al., 2021;Zander et al., 2021) Genetic pyramiding for drought adapted phenotypes can be explored via crop modeling (Cooper et al., 2002). A simulation study in sorghum revealed that LT is more effective than stay-green in water scarcity scenarios (Kholováet al., 2014). ...
Article
Full-text available
Breeding sorghum to withstand droughts is pivotal to secure crop production in regions vulnerable to water scarcity. Limited transpiration (LT) restricts water demand at high vapor pressure deficit, saving water for use in critical periods later in the growing season. Here we evaluated the hypothesis that LT would increase sorghum grain yield in the United States. We used a process-based crop model, APSIM, which simulates interactions of genotype, environment, and management (G × E × M). In this study, the G component includes the LT trait (GT) and maturity group (GM), the EW component entails water deficit patterns, and the MP component represents different planting dates. Simulations were conducted over 33 years (1986-2018) for representative locations across the US sorghum belt (Kansas, Texas, and Colorado) for three planting dates and maturity groups. The interaction of GT x EW indicated a higher impact of LT sorghum on grain for late drought (LD), mid-season drought (MD), and early drought (ED, 8%), than on well-watered (WW) environments (4%). Thus, significant impacts of LT can be achieved in western regions of the sorghum belt. The lack of interaction of GT × GM × MP suggested that an LT sorghum would increase yield by around 8% across maturity groups and planting dates. Otherwise, the interaction GM × MP revealed that specific combinations are better suited across geographical regions. Overall, the findings suggest that breeding for LT would increase sorghum yield in the drought-prone areas of the US without tradeoffs.
... Within this context we argue that the crop sciences together with advances in plant and crop models have the potential for important new roles in improving the design and effective operation of breeding programs within the context of the future needs for crop improvement . To realise these opportunities plant and crop models will have to be designed to take advantage of advances in understanding of trait genetic architecture and the principles of quantitative genetics (Lynch and Walsh 1998, Mackay 2001, Cooper et al. 2002a,b, 2005, Hammer et al. 2006, Walsh and Lynch 2018. Here we introduce a quantitative genetics perspective of approaches for linking trait genetic models with mechanistic crop models to enhance our understanding of the genetic architecture of complex traits, such as grain yield of crops, and to explore new prediction applications for breeding. ...
... However, in contrast, for plant breeding there has been much less consideration of the potential applications of mechanistic crop models to study traits. Nevertheless, given the long-term nature of breeding programs, there has been parallel interest in applications of simulation methods for modelling plant breeding programs (Podlich and Cooper 1998, Cooper et al. 2002a,b, Li et al. 2012, Bernardo 2020. Here, the statistical gene-to-phenotype (G2P) models of quantitative genetics are used to study and represent the genetic architecture of traits and the effects of genes on trait variation (Falconer and Mackay 1996, Lynch and Walsh 1998, Walsh and Lynch 2018. ...
... These differences in trait modelling approaches can create barriers to their integration for accelerated crop improvement. However, understanding their potential connections creates new opportunities (e.g., Cooper et al. 2002a,b, 2005, Chapman et al. 2002, Hammer et al. 2006, Chenu et al. 2009, Messina et al. 2011, Technow et al. 2015, Onogi et al. 2016, Bustos-Korts et al. 2019a,b, Peng et al. 2020, Toda et al. 2020. Here we provide an overview of the progression from the trait G2P models of quantitative genetics to applications using mechanistic crop models (CGM-G2P). ...
Article
Full-text available
Plant breeding programs are designed and operated over multiple cycles to systematically change the genetic makeup of plants to achieve improved trait performance for a Target Population of Environments (TPE). Within each cycle, selection applied to the standing genetic variation within a structured reference population of genotypes (RPG) is the primary mechanism by which breeding programs make the desired genetic changes. Selection operates to change the frequencies of the alleles of the genes controlling trait variation within the RPG. The structure of the RPG and the TPE has important implications for the design of optimal breeding strategies. The breeder’s equation, together with the quantitative genetic theory behind the equation, informs many of the principles for design of breeding programs. The breeder’s equation can take many forms depending on the details of the breeding strategy. Through the genetic changes achieved by selection, the cultivated varieties of crops (cultivars) are improved for use in agriculture. From a breeding perspective, selection for specific trait combinations requires a quantitative link between the effects of the alleles of the genes impacted by selection and the trait phenotypes of plants and their breeding value. This gene-to-phenotype link function provides the G2P map for one to many traits. For complex traits controlled by many genes, the infinitesimal model for trait genetic variation is the dominant G2P model of quantitative genetics. Here we consider motivations and potential benefits of using the hierarchical structure of crop models as CGM-G2P trait link functions in combination with the infinitesimal model for the design and optimisation of selection in breeding programs.
... WGP estimates the effect of all genotyped variants simultaneously by fitting them in a single analysis (Meuwissen et al., 2001) leading to a more comprehensive and accurate dissection of the genetic variance especially with the presence of linkage disequilibrium among variants (Wray et al., 2014). Early attempts to integrate CGMs and genetic traits (Cooper et al., 2002) have resulted in a plethora of attempts globally to bring together suitable CGM and genotypic information (Hammer et al., 2005). Heslot et al. (2014) attempted to integrate CGMs with WGP by fitting stress environmental factors extracted from CGMs as covariates in the WGP equation. ...
Article
Running crop growth models (CGM) coupled with whole genome prediction (WGP), as a CGM-WGP model, introduces environmental information to WGP and genomic relatedness information to the genotype-specific parameters (GSPs) modelled through CGMs. Previous studies have primarily used CGM-WGP to infer prediction accuracy without exploring its potential to enhance CGM and WGP. Here, we implemented a heading date and a heading and maturity date wheat phenology model within a CGM-WGP framework and compared it to CGM and WGP. The CGM-WGP resulted in more heritable GSPs with more biologically realistic correlation structures between GSPs and phenology traits compared to CGM-modelled GSPs that reflected the correlation of measured phenotypes. Another advantage of CGM-WGP is the ability to infer accurate prediction with much smaller and less diverse reference data compared to that required for CGM. A genome-wide association analysis linked the GSPs from the CGM-WGP model to nine significant phenology loci including Vrn-A1 and the three PPD1 genes, which were not detected for CGM-modelled GSPs. Selection on GSPs could be simpler than on observed phenotypes. For example, thermal time traits are theoretically more independent candidates, compared to the highly correlated heading and maturity dates, which could be used to achieve an environment-specific optimal flowering period. CGM-WGP combines the advantages of CGM and WGP to predict more accurate phenotypes for new genotypes under alternative or future environmental conditions.
... Understanding the mechanism of host-plant resistance will aid in the identification of genes that confers resistance and in the development of resistant varieties through classical breeding or biotechnology, and plant development can be explained ultimately at the level of molecular due to decision of gene to phenotype [51] [52]. Regulation of genes related to metabolism of pigments to resist insect attack has been studied for some years. ...
... However, this approach has limitations: (i) Resources to deploy such trials are often limited, (ii) it is difficult to reproduce and/or interpret findings from year to year due to spatiotemporal variability of environmental conditions, and (iii) in most of the cases, the novel trait is not yet available in commercial varieties, so cannot be studied in relevant genetic backgrounds (Lenaerts et al., 2019). In this context, crop growth models complement field experiments (Challinor et al., 2018;van Ittersum et al., 2003) to help breeding programs evaluate the potential value of novel traits within an existing or novel cropping system (Aggarwal et al., 1997;Chenu et al., 2017;Cooper et al., 2002). Models can be used to evaluate the potential benefits and tradeoffs of combining new genotypes and new management approaches in the target production environments by accounting for interactions of genotype, environment, and management (G Â E Â M) (Chenu et al., 2013;Kholová et al., 2014;Singh et al., 2014). ...
Article
Full-text available
Many crop species, particularly those of tropical origin, are chilling sensitive, so improved chilling tolerance can enhance production of these crops in temperate regions. For the cereal crop sorghum (Sorghum bicolor L.), early planting and chilling tolerance have been investigated for >50 years, but the potential value or tradeoffs of this genotype × management change have not been formally evaluated with modeling. To assess the potential of early planted chilling-tolerant grain sorghum in the central US sorghum belt, we conducted CERES-Sorghum simulations and characterized scenarios under which this change would be expected to enhance (or diminish) drought escape, water capture, and yield. We conducted crop growth modeling for full- and short-season hybrids under rainfed systems that were simulated to be planted in very early (April), early (May 15), and normal (June 15) planting dates over 1986–2015 in four locations in Kansas representative of the central US sorghum belt. Simulations indicated that very early planting will generally lead to lower initial soil moisture, longer growing periods, and higher evapotranspiration. Very early planting is expected to extend the growing period by 20% for short- or full-season hybrids, reduce evaporation during fallow periods, and increase plant transpiration in the two-thirds of years with the highest precipitation (mean > 428 mm), leading to 11% and 7% increase grain yield for short- and full-season hybrids, respectively. Thus, in this major sorghum growing region, very early and early planting could reduce risks of terminal droughts, extend seasons, and increase rotation options, suggesting that further development of chilling-tolerant hybrids is warranted.
Chapter
The growing food demand in the world due to the increasing population and decreasing availability of agricultural land requires new crops that are more productive and resistant to harsher environmental conditions. Thus, rapid and effective exploration, identification, and validation of an important trait, gene, molecular mediator, and protein interaction are important for improving crop yield and quality in the near future. Integrating genomics, transcriptomics, proteomics, metabolomics and phenomics enables a deeper understanding of the mechanisms underlying the complex architecture of many phenotypic traits of agricultural relevance. Here, we cite several relevant examples that can appraise our understanding of the recent developments in omics technologies and how they drive our quest to breed climate-resilient crops. Large-scale genome resequencing, pangenomes, and genome-wide association studies aid in identifying and analysing species-level genome variations. RNA-sequencing-driven transcriptomics approach has provided unprecedented opportunities for performing crop abiotic and biotic stress response studies. Additionally, the high-resolution proteomics technologies necessitated a gradual shift from the general descriptive studies of plant protein abundances to large-scale analysis of protein-metabolite interactions. Especially, advent in metabolomics is currently receiving special attention, owing to the role metabolites play as metabolic intermediates and close links to the phenotypic expression. Further, the high-throughput phenomics approach opened new research domains such as root system architecture analysis and plant root-associated microbes for improved crop health and climate resilience. Overall, integrating the PANOMICS approach to modern plant breeding and genetic engineering methods ensures the development of climate-smart crops with higher nutrition quality that can sustainably meet the current and future global food demands.
Article
Full-text available
Functional genomics is the systematic study of genome‐wide effects of gene expression on organism growth and development with the ultimate aim of understanding how networks of genes influence traits. Here, we use a dynamic biophysical cropping systems model (APSIM‐Sorg) to generate a state space of genotype performance based on 15 genes controlling four adaptive traits and then search this space using a quantitative genetics model of a plant breeding program (QU‐GENE) to simulate recurrent selection. Complex epistatic and gene × environment effects were generated for yield even though gene action at the trait level had been defined as simple additive effects. Given alternative breeding strategies that restricted either the cultivar maturity type or the drought environment type, the positive (+) alleles for 15 genes associated with the four adaptive traits were accumulated at different rates over cycles of selection. While early maturing genotypes were favored in the Severe‐Terminal drought environment type, late genotypes were favored in the Mild‐Terminal and Midseason drought environment types. In the Severe‐Terminal environment, there was an interaction of the stay‐green (SG) trait with other traits: Selection for + alleles of the SG genes was delayed until + alleles for genes associated with the transpiration efficiency and osmotic adjustment traits had been fixed. Given limitations in our current understanding of trait interaction and genetic control, the results are not conclusive. However, they demonstrate how the per se complexity of gene × gene × environment interactions will challenge the application of genomics and marker‐assisted selection in crop improvement for dryland adaptation.
Chapter
High-throughput phenotyping (HTP) is poised to fundamentally transform plant breeding through increased accuracy, spatial, and temporal resolution in measuring breeding trials. In this chapter, we examine different types of phenotyping platforms, data management, and data utilization for decision making using HTP in plant breeding, with case studies from wheat breeding programs. Development of HTP platforms, both ground-based and aerial vehicles requires evaluating the traits to be measured as well as the resources available. Data management is a critical part of the overall research process, and an example data management program is provided. Finally, examples of HTP use within crop breeding and plant science are presented. This chapter provides an overview of the entire HTP process from system conception to decision making within research programs based on HTP data.
Article
Key message: Climate change and Genotype-by-Environment-by-Management interactions together challenge our strategies for crop improvement. Research to advance prediction methods for breeding and agronomy is opening new opportunities to tackle these challenges and overcome on-farm crop productivity yield-gaps through design of responsive crop improvement strategies. Abstract: Genotype-by-Environment-by-Management (G × E × M) interactions underpin many aspects of crop productivity. An important question for crop improvement is “How can breeders and agronomists effectively explore the diverse opportunities within the high dimensionality of the complex G × E × M factorial to achieve sustainable improvements in crop productivity?” Whenever G × E × M interactions make important contributions to attainment of crop productivity, we should consider how to design crop improvement strategies that can explore the potential space of G × E × M possibilities, reveal the interesting Genotype–Management (G–M) technology opportunities for the Target Population of Environments (TPE), and enable the practical exploitation of the associated improved levels of crop productivity under on-farm conditions. Climate change adds additional layers of complexity and uncertainty to this challenge, by introducing directional changes in the environmental dimension of the G × E × M factorial. These directional changes have the potential to create further conditional changes in the contributions of the genetic and management dimensions to future crop productivity. Therefore, in the presence of G × E × M interactions and climate change, the challenge for both breeders and agronomists is to co-design new G–M technologies for a non-stationary TPE.
Understanding these conditional changes in crop productivity through the relevant sciences for each dimension, Genotype, Environment, and Management, creates opportunities to predict novel G–M technology combinations suitable to achieve sustainable crop productivity and global food security targets for the likely climate change scenarios. Here we consider critical foundations required for any prediction framework that aims to move us from the current unprepared state of describing G × E × M outcomes to a future responsive state equipped to predict the crop productivity consequences of G–M technology combinations for the range of environmental conditions expected for a complex, non-stationary TPE under the influences of climate change.
Preprint
Many crop species, particularly those of tropical origin, are chilling sensitive, so improved chilling tolerance can enhance production of these crops in temperate regions. For the cereal crop sorghum (Sorghum bicolor L.), early planting and chilling tolerance have been investigated for >50 years, but the potential value or tradeoffs of this genotype × management change have not been formally evaluated with modeling. To assess the potential of early-planted chilling-tolerant grain sorghum in the central US sorghum belt, we conducted CERES-Sorghum simulations and characterized scenarios under which this change would be expected to enhance (or diminish) drought escape, water capture, or yield. We conducted crop growth modeling for full- and short-season hybrids under rainfed systems that were simulated to be planted on early (mid-April), normal (mid-May), and late (mid-June) planting dates from 1986 to 2015 in four locations in Kansas representative of the central US sorghum belt. Simulations indicated that early planting will generally lead to lower initial soil moisture, longer growing periods, and higher evapotranspiration. Early planting is expected to extend the growing period by 20% for short- or full-season hybrids, reduce evaporation during fallow periods, and increase plant transpiration in the two-thirds of years with the highest precipitation (mean > 428 mm), leading to 11% and 7% increases in grain yield for short- and full-season hybrids, respectively. Thus, in this major sorghum growing region early planting could reduce risks of terminal droughts, extend seasons, and increase rotation options, suggesting that further development of chilling-tolerant hybrids is warranted.
Article
We demonstrate an integrated approach to build, test, and refine a model of a cellular pathway, in which perturbations to critical pathway components are analyzed using DNA microarrays, quantitative proteomics, and databases of known physical interactions. Using this approach, we identify 997 messenger RNAs responding to 20 systematic perturbations of the yeast galactose-utilization pathway, provide evidence that approximately 15 of 289 detected proteins are regulated posttranscriptionally, and identify explicit physical interactions governing the cellular response to each perturbation. We refine the model through further iterations of perturbation and global measurements, suggesting hypotheses about the regulation of galactose utilization and physical interactions between this and a variety of other metabolic pathways.
Article
Classical quantitative genetics theory makes a number of simplifying assumptions in order to develop mathematical expressions that describe the mean and variation (genetic and phenotypic) within and among populations, and to predict how these are expected to change under the influence of external forces. These assumptions are often necessary to render the development of many aspects of the theory mathematically tractable. The availability of high-speed computers today provides opportunity for the use of computer simulation methodology to investigate the implications of relaxing many of the assumptions that are commonly made. QU-GENE (QUantitative-GENEtics) was developed as a flexible computer simulation platform for the quantitative analysis of genetic models. Three features of the QU-GENE software that contribute to its flexibility are (i) the core E(N:K) genetic model, where E is the number of types of environment, N is the number of genes, K indicates the level of epistasis and the parentheses indicate that different N:K genetic models can be nested within types of environments, (ii) the use of a two-stage architecture that separates the definition of the genetic model and genotype-environment system from the detail of the individual simulation experiments and (iii) the use of a series of interactive graphical windows that monitor the progress of the simulation experiments. The E(N:K) framework enables the generation of families of genetic models that incorporate the effects of genotype-by-environment (G x E) interactions and epistasis. By the design of appropriate application modules, many different simulation experiments can be conducted for any genotype-environment system. The structure of the QU-GENE simulation software is explained and demonstrated by way of two examples. 
The first concentrates on some aspects of the influence of G x E interactions on response to selection in plant breeding, and the second considers the influence of multiple-peak epistasis on the evolution of a four-gene epistatic network. QU-GENE is available over the Internet at http://pig.ag.uq.edu.au/qu-gene/. Contact: m.cooper@mailbox.uq.edu.au
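The E(N:K) structure defined in this abstract (E environment types, N genes, K epistatic partners per gene, with a separate N:K model nested in each environment type) can be sketched in a few lines. This is an illustrative toy in the spirit of a random NK lookup-table model, not the QU-GENE implementation: alleles are binary and haploid, contributions are drawn uniformly at random, and the names `make_enk_model` and `genotype_value` are hypothetical.

```python
import itertools
import random


def make_enk_model(E, N, K, seed=0):
    """Build random lookup tables for a toy E(N:K) model.

    For each environment type and each gene i, the contribution of
    gene i depends on its own allele and on the alleles of K other
    genes (its epistatic partners). Requires 0 <= K <= N - 1.
    """
    rng = random.Random(seed)
    model = []
    for _ in range(E):
        env = []
        for i in range(N):
            others = [j for j in range(N) if j != i]
            partners = rng.sample(others, K)
            # One random contribution per combination of K + 1 alleles.
            table = {alleles: rng.random()
                     for alleles in itertools.product((0, 1), repeat=K + 1)}
            env.append((partners, table))
        model.append(env)
    return model


def genotype_value(model, env_type, genotype):
    """Mean gene contribution of a genotype in one environment type."""
    env = model[env_type]
    total = 0.0
    for i, (partners, table) in enumerate(env):
        key = (genotype[i],) + tuple(genotype[j] for j in partners)
        total += table[key]
    return total / len(env)


if __name__ == "__main__":
    model = make_enk_model(E=3, N=5, K=2)
    g = (0, 1, 0, 1, 1)
    for e in range(3):
        print(f"environment type {e}: value = {genotype_value(model, e, g):.3f}")
```

With K = 0 each gene's contribution depends only on its own allele, so the model is purely additive within an environment type. Because the tables differ between environment types, the same genotype can rank differently across them, which is one simple way G x E interaction arises in such a model.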
Article
The fact that natural selection acts on phenotypes but the transmission of traits to the next generation is indirectly accomplished through genes gives rise to a challenging set of problems in evolutionary biology. In order to understand adaptive evolution, it appears to be essential to first understand how genotypes give rise to observed phenotypes, or more precisely, how variation in phenotypes is mediated by underlying variation in genotypes. As the tools of molecular genetics give an increasingly detailed view of the underlying genetic variation, one would hope that this problem would be solved by the sheer volume of genetic data. Human molecular genetics has produced many significant successes recently, particularly in identifying genes that cause Mendelian disorders. In stark contrast, chronic diseases that exhibit familial clustering but do not segregate like a Mendelian gene have been remarkably difficult to analyze genetically. The focus of this chapter is on the question, “What are the barriers to our understanding of the genetic basis for familial clustering of chronic diseases?” We will focus on medical genetics rather than the more general problem of genotype-phenotype associations in evolutionary biology, because knowledge of phenotypic variation is so extensive for humans and the quantity of data on genetic variation is soon going to eclipse that of all other species, if it has not already.
Article
A genetic map was established using 120 F-5 sorghum recombinant inbred lines (RILs) developed from a cross between 2 Australian elite sorghum inbred lines, QL39 and QL41. A variety of DNA probes, including sorghum genomic DNA, maize genomic DNA and cDNA, sugarcane genomic DNA and cDNA, and cereal anchor probes, were screened to identify DNA polymorphism between the parental lines. Using 5 restriction enzymes, probe polymorphism levels were low (26.5%). A total of 155 restriction fragment length polymorphism (RFLP) loci and 8 simple sequence repeat (SSR) loci were mapped onto 21 linkage groups, covering a map distance of approximately 1400 cM. Genes for 3 simply inherited traits, awns (AW), mesocarp thickness (Z), and organophosphate insecticide (OPR) reaction, were also mapped. The relationships between this map and other published sorghum maps were reviewed and a comparison of major sorghum RFLP maps attempted. This comparison is expected to enhance the effectiveness of existing mapping information and will facilitate efforts to map agronomically important traits in sorghum.
Article
(…) Development rate of all hybrids exhibited a curvilinear response to temperature in both phases. Old and new hybrids differed in their temperature responses in GS1 but were similar in GS2. New hybrids had slower rates of development at all temperatures, but the difference was greater at higher temperatures (>25 °C). All hybrids had a similar short-day photoperiodic response in GS1, with a critical photoperiod of 13.2 h. The models were tested on a separate data set covering a similarly broad range of environments and performed well.
Article
Stay green is an important drought resistance trait for sorghum production. QTLs for this trait with consistent effects across a set of environments would increase the efficiency of selection because of its relatively low heritability. One hundred and sixty recombinant inbreds, derived from a cross between QL39 and QL41, were used as a segregating population for genome mapping and stay green evaluation. Phenotypic data were collected in replicated field trials from five sites and in three growing seasons, and analysed by fitting appropriate models to account for spatial variability and to describe the genotype by environment interaction. Interval mapping and non-parametric mapping identified three regions, each in a separate linkage group, associated with stay green in more than one trial, and two regions in a single trial. The regions on linkage groups B and I were both consistently identified from three trials. The multiple environment testing was very helpful for correctly identifying QTLs associated with the trait. The utilisation of molecular markers for stay green in sorghum breeding is also discussed.
Article
APSIM (Agricultural Production Systems Simulator) is a software system which allows (a) models of crop and pasture production, residue decomposition, soil water and nutrient flow, and erosion to be readily re-configured to simulate various production systems and (b) soil and crop management to be dynamically simulated using conditional rules. A key innovation is the change from a core concept of a crop responding to resource supplies to that of a soil responding to weather, management and crops. While this achieves a sound logical structure for improved simulation of soil management and long-term change in the soil resource, it does so without loss of sensitivity in simulating crop yields. This concept is implemented using a program structure in which all modules (e.g. growth of specific crops, soil water, soil N, erosion) communicate with each other only by messages passed via a central ‘engine’. Using a standard interface design, this structure enables easy removal, replacement, or exchange of modules without disruption to the operation of the system. Simulation of crop sequences and multiple crops is achieved by managing connection of crop growth modules to the engine. A shell of software tools has been developed within a WINDOWS environment which includes user-installed editor, linker, compiler, testbed generator, graphics, database and version control software. While the engine and modules are coded in FORTRAN, the Shell is in C++. The resulting product is one in which the functions are coded in the language most familiar to the developers of scientific modules but provides many of the features of object oriented programming. The Shell is written to be aware of UNIX operating systems and be capable of using the processor on UNIX workstations.
Article
Sorghum (Sorghum bicolor (L.) Moench.) is one of the major summer crops grown in the subtropics. The high rainfall variability and limited planting opportunities in these regions make crop production risky. A robust crop simulation model can assist farmer decision-making via simulation analyses to quantify production risks. Accordingly, we developed a simple, yet mechanistic crop simulation model for sorghum for use in assessing climatic risk to production in water-limited environments. The model simulates grain yield, biomass accumulation, crop leaf area, phenology and soil water balance. The model uses a daily time-step and readily available weather and soil information and assumes no nutrient limitation. The model was tested on numerous data sets (n=38) from experiments spanning a broad range of environments in the semi-arid tropics and subtropics. Potential limitations in the model were identified and examined in a novel testing procedure by using combinations of predicted and observed data in various modules of the model. The model performed satisfactorily, accounting for 94% and 64% of the variation in total biomass and grain yield, respectively. The difference in outcome for biomass and yield was caused by limitations in predicting harvest index. The concepts involved, and the limitations encountered, in developing a crop model to be simple but consistent with the biophysical rigour required for application to such a diverse range of environments are discussed.