Diversity in Nucleotide Binding Site–Leucine-Rich Repeat Genes in Cereals

Jianfa Bai, Lourdes A. Pennill, Jianchang Ning, Se Weon Lee,
Jegadeesan Ramalingam, Craig A. Webb, Bingyu Zhao, Qing Sun,
James C. Nelson, Jan E. Leach, and Scot H. Hulbert¹
Department of Plant Pathology, Kansas State University, Manhattan, Kansas 66506-5502, USA
The diversity of the largest group of plant disease resistance genes, the nucleotide binding site–leucine-rich
repeat (NBS–LRR) genes, was examined in cereals following polymerase chain reaction (PCR) cloning and
database mining. NBS–LRR genes in rice are a large and diverse class with more than 600 genes, at least three to
four times the complement of Arabidopsis. Most occur in small families containing one or a few cross-hybridizing
members. Unlike in Arabidopsis and other dicots, the class of NBS–LRR genes coding for a Toll and mammalian
interleukin-1 receptor (TIR) domain was not amplified during the evolution of the cereals. Genes coding for
TIR domains are present in the rice genome, but have diverged from the NBS–LRR genes. Most cereal genes are
similar in structure to the members of the non-TIR class of dicots, although many do not code for a coiled-coil
domain in their amino termini. One unique class of cereal genes, with ∼50 members, codes for proteins similar
to the N-termini and NBS domains of resistance genes but does not code for LRR domains. The resistance gene
repertoire of grasses has changed from that of dicots in their independent evolution since the two groups
diverged. It is not clear whether this reflects a difference in downstream defense signaling pathways.
[Supplemental material is available online at www.genome.org. The sequence data from this study have been
submitted to GenBank under accession nos. AF516886–AF516895.]
Plants use a variety of different types of disease-resistance genes to detect the presence of pathogens and induce defense responses. The largest class of these genes codes for proteins with nucleotide binding site (NBS) and leucine-rich repeat (LRR) domains (Bent 1996; Hammond-Kosack and Jones 1997; Hulbert et al. 2001). The Col-0 ecotype of Arabidopsis has been estimated to carry 150 genes coding for NBS–LRR proteins, or more if genes coding for truncated versions of the protein are considered, and the rice genome was estimated to carry even more (Meyers et al. 2002). No function other than disease resistance has yet been assigned to this large class of genes.
NBS–LRR genes in plants are typically divided into two classes depending on whether they code for an N-terminal TIR domain, which has homology to the intracellular domains of the Drosophila Toll and mammalian interleukin-1 receptors. The TIR group genes are composed of an N-terminal TIR domain, a central NBS domain, and a C-terminal LRR region. This group of genes has been observed only in dicot plant species (Meyers et al. 1999; Pan et al. 2000a; Goff et al. 2002). The non-TIR group is sometimes referred to as the coiled-coil (CC) group because its members typically have CC domains at their N termini.
The sequences of the central portion of NBS–LRR genes, including the NBS domain, have been used extensively to identify and to classify these genes. This domain is popular for several reasons: the NBS domain has some conserved amino acid motifs that assist in cloning these genes via PCR amplification and in recognizing them in databases; the conserved motifs assist in aligning the sequences for phylogenetic analyses; and classification of NBS–LRR genes by their NBS region sequences accurately predicts whether they belong to the TIR or non-TIR class (Meyers et al. 1999; Pan et al. 2000a).
While LRR regions typically appear to be under strong diversifying selection pressure, NBS domains do not, or at least not to the same extent (Parniske et al. 1997; McDowell et al. 1998; Meyers et al. 1998; Sun et al. 2001). This likely reflects the role of the LRR region in recognition of constantly evolving pathogen ligands and a role for the NBS domain in recognition signaling (Ellis et al. 1999; Ellis et al. 2000; Dodds et al. 2001). Analysis of the L locus of flax has demonstrated that the TIR domain can also play a role in determining pathogen recognition and that it may be under diversifying selection like the LRR (Luck et al. 2000). LRR regions can also be difficult to use for comparative sequence analysis because even closely related genes often show size polymorphisms, making alignment difficult.
There are now sufficient sequences available from the cereals, especially rice, to reveal the diversity and general nature of NBS–LRR genes and related genes in cereal genomes. The present study was conducted to characterize the numbers and structures of NBS–LRR genes in rice for comparison with those of dicot species. Comparisons of these genes with those of other cereal species can be used to identify possible orthologs. We also describe a collection of probes that can be used for mapping and isolating these genes in different cereal crops.
¹Corresponding author. E-MAIL shulbrt@plantpath.ksu.edu; FAX (785) 532-5692.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.454902.
Letter. Genome Research 12:1871–1884 ©2002 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/02 $5.00; www.genome.org
RESULTS
Isolation of NBS Clones for a Probe Collection
Over 150 sequences were isolated by PCR amplification and cloning of the NBS region of rice and maize NBS–LRR genes using primers designed from conserved regions of known cereal NBS–LRR genes, or from sequences that were mined from the databases (see below). Most primers were designed to match the conserved P-loop motif marking the N-terminal end of the NBS domain and a conserved MHD motif that occurs near the beginning of the LRR. This allowed isolation of the whole NBS region of the gene, corresponding to approximately 900 nucleotides. The majority of the clones were amplified from the rice cultivar Nipponbare. Nondegenerate primers designed from rice sometimes amplified maize DNA but usually worked poorly and provided only six unique clones from maize. Many primer pairs amplified several closely related genes. When these related gene fragments were used as probes in gel-blot hybridization experiments, they typically detected identical restriction fragments (data not shown). To make a collection of gene probes that would each identify different families of R genes (Table 1), only sequences that were sufficiently different from previously collected sequences were retained. An arbitrary cutoff of 75% amino acid identity was used to distinguish sequences from different families, because sequences with less identity than this typically identified different fragments in genomic hybridization experiments (data not shown). Using this criterion, the cloned rice probes were derived from 96 different NBS–LRR families.
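The retention rule described above can be illustrated with a minimal Python sketch. It assumes the NBS-region protein sequences have already been aligned to equal length and treats 75% identity as the family boundary. The function names and the toy sequences are hypothetical and are not taken from the study, which does not specify the comparison software used for this step.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Alignment columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def select_family_representatives(seqs: dict[str, str], cutoff: float = 75.0) -> list[str]:
    """Greedy filter: keep a sequence only if it is below `cutoff` percent
    identity to every sequence that has already been kept."""
    kept: list[str] = []
    for name, seq in seqs.items():
        if all(percent_identity(seq, seqs[k]) < cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy alignment (not data from the study).
toy = {
    "cloneA": "GMGGVGKTTLAQ",
    "cloneB": "GMGGVGKTTLAR",  # 11 of 12 positions identical to cloneA
    "cloneC": "GLPGSGKSALVN",  # 5 of 12 positions identical to cloneA
}
print(select_family_representatives(toy))
```

In this toy case cloneB falls in the same family as cloneA and is dropped, while cloneC is retained as the representative of a second family. The result of a greedy filter depends on the order in which sequences are examined, which mirrors the "previously collected" wording of the text.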
Collection of NBS Sequences by Database Mining
The initial stages of NBS–LRR gene sequence collection consisted of mining sequences (mostly gene fragments) from GenBank and the Monsanto Rice Genome Sequence Database. Searches for rice sequences in the GenBank nonredundant and the high throughput genomic sequence databases were performed in March 2002 using a variety of predicted protein sequences from monocot and dicot NBS–LRR genes as queries. Examination of all sequences with TBLASTN E-values better than 1e-4 yielded 216 rice bacterial artificial chromosomes (BACs) with one or more genes predicted to code for NBS–LRR proteins. An additional 61 gene fragments were identified, mostly from PCR-amplified genomic fragments including parts of NBS-coding sequences. The Monsanto database contained 259 Mb of assembled sequence from the Japonica rice cultivar Nipponbare. We were able to collect 144 sequences that code for the complete NBS area, from the P-loop to the MHD motifs, from this database. A very large Indica (line 93-11) rice database (Yu et al. 2002), consisting of assembled whole-genome shotgun sequences of >360 Mb of the rice genome, was also searched for NBS–LRR sequences. Approximately 560 different sequence contigs containing NBS–LRR sequences were identified in this database. Most of these contigs coded for at least one sequence of the complete NBS-coding area of the gene (P-loop to MHD) and 253 predicted full-length genes were identified. Altogether, including sequences from the NBS-coding probe collection and the GenBank, Monsanto, and Indica databases, over 1080 sequences coding for complete NBS areas were examined. Pairwise comparisons of these sequences allowed the identification of identical sequences from different databases and their classification into different families. Genes were grouped into 354 different families, where members of a family share >75% amino acid sequence identity with other members. Efforts were made to isolate at least one full-length coding region for each family of NBS–LRR genes. These predicted coding regions can be searched or examined at http://coding.plantpath.ksu.edu/blast/blastNBS.html.

Table 1. Predicted NBS-Coding Sequences Isolated From Rice by Cloning PCR Amplified Fragments
Clone Name | Identical or Related Sequences* | Copy No.**
rNBS1 | AB017914 | 1.1
rNBS2 | AP003275, IN35396 | 2.1
rNBS3 | IN7596 | 4.1
rNBS4 | IN2882 | 2.0
rNBS5 | IN15065, rNBS5b (88%) | 0.8
rNBS6 | OSM148075 (97%), IN1837 (97%) |
rNBS7 | OSM144390, IN19918 |
rNBS8 | OSM151991 | 1.8
rNBS9 | OSM12017, IN2806 |
rNBS10 | OSM1922 | 3.1
rNBS11 | IN4580 | 4.8
rNBS12 | OSM148064 | 2.0
rNBS13 | OSM120301, IN7733 | 1.0
rNBS14 | OSM133731 | 1.3
rNBS15 | OSM148062, AP003275a | 4.9
rNBS16 | OSM148567, IN15839 | 0.8
rNBS17 | OSM116395 | 1.0
rNBS18 | OSM1922, IN12788 | 1.5
rNBS19 | OSM122428, IN5476 | 3.9
rNBS20 | OSM1970 | 1.9
rNBS21 | OSM124828 | 1.0
rNBS22 | IN19, OSM129598, rNBS22b (87%) | 2.0
rNBS23 | OSM133645, AP003839b, IN1044 | 1.0
rNBS24 | OSM138053, IN1277 | 1.4
rNBS25 | OSM144727, AP003995, IN16936 | 1.9
rNBS26 | OSM14020, IN34799 | 6.7
rNBS27 | OSM14023, AF032697 | 7.3
rNBS28 | IN20419 | 2.6
rNBS29 | OSM1916, AP3575b | 3.1
rNBS30 | OSM128453, IN637 | 3.6
rNBS31 | OSM11432, AP003914a, IN10063 | 3.3
rNBS32 | OSM11430, IN1053 | 4.3
rNBS33 | OSM117679 | 2.0
rNBS34 | OSM11429, AP003914d, IN7220 | 5.3
rNBS35 | OSM11430B, AP003914b | 2.4
rNBS36 | OSM13352 | 1.9
rNBS37 | OSM12017B, AF074894, IN12951 | 1.0
rNBS38 | OSM14018, AF032689 | 4.3
rNBS39 | OSM141732, IN14579 | 3.6
rNBS40 | OSM14676, IN23744 | 3.1
rNBS41 | OSM15374 | 3.1
rNBS42 | OSM19134 | 4.8
rNBS43 | OSM19134, IN459 | 5.3
rNBS44 | OSM135760 | 1.6
rNBS45 | IN1487 | 0.8
rNBS46 | OSM15354 | 3.9
rNBS48 | OSM11901, AC003621a, IN3280 | 2.1
rNBS49 | OSM113184, IN1169, rNBS49b (82%) | 1.3
rNBS50 | OSM13480, AP003633, IN14668 | 2.4
rNBS52 | OSM15936, IN14071 | 1.5
rNBS53 | OSM15929, AF220740, IN17130 | 1.3
rNBS54 | OSM146, AP003840, IN8045 | 1.3
rNBS55 | OSM113712, IN13161 | 1.3
rNBS56 | OSM14162 | 2.9
rNBS57 | IN97289 | 1.1
rNBS58 | OSM142192, IN4 | 1.0
rNBS59 | AC078890 | 2.3
rNBS60 | OSM150826, IN41105 | 8.2
rNBS62 | OSM14552, IN7190 | 0.6
rNBS63 | OSM146376, IN42153 |
rNBS67 | OSM15999, AP003539d |
rNBS68 | OSM11089, IN15589 |
rNBS69 | OSM12066, IN130 | 1.0
rNBS70 | OSM140512 | 0.8
rNBS71 | AC083751, IN15208 | 3.0
rNBS72 | AC074354, IN3585 | 1.0
rNBS73 | OSM133123, IN41851 | 1.2

Estimation of NBS–LRR Gene Number and Copy Number of Different Families
Approximately 560 sequences predicted to code for NBS domains of NBS–LRR genes were identified in the Indica database. To project a total number of NBS–LRR genes in the rice genome, an estimate of the number of NBS–LRR genes missing from the Indica database was made. The predicted amino acid sequences of the 96 cloned NBS fragment probes were used in TBLASTN searches of the Indica sequences. The 96 sequences represent a diverse collection of genes (see below) and should provide an unbiased estimate of the coverage of the whole genome. Most of the sequences used in the search were from the Japonica cultivar Nipponbare, so that identification of Indica alleles for specific sequences was sometimes ambiguous. In most cases, near-perfect matches were identified; 74 of the 96 clones matched Indica sequences with 98% or better identity for at least 150 amino acids (Table 1). In other cases, presumed alleles were identified but sequence identity was lower. For example, two sequences, rNBS21 and rNBS70, which were estimated to exist in single copies in the genomes (below), were matched by single sequences in the Indica database that were 95% identical. If sequences with 95% or better amino acid identity are considered probable alleles, then the Indica database carries alleles for all but 14 of the 96 sequences tested (85%). If the 560 NBS-coding sequences in this database represent 85% of the NBS–LRR genes in the genome, the estimated number of genes would be 660, which is very close to the estimate of Goff et al. (2002).
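The projection in the preceding paragraph is a simple coverage correction, reproduced here as a short Python sketch. The variable names are illustrative only; the inputs (96 probes, 14 without a probable allele, 560 NBS-coding sequences) are the values given in the text.

```python
# Values reported in the text above.
probes_tested = 96
probes_without_allele = 14
nbs_sequences_in_indica = 560

# Fraction of probe sequences with a probable allele in the Indica database.
coverage = (probes_tested - probes_without_allele) / probes_tested

# Projected genome-wide gene number, using the exact and the rounded coverage.
estimate_exact = nbs_sequences_in_indica / coverage
estimate_rounded = nbs_sequences_in_indica / round(coverage, 2)

print(f"coverage: {coverage:.1%}")
print(f"estimate (exact coverage): {estimate_exact:.0f}")
print(f"estimate (85% coverage): {estimate_rounded:.0f}")
```

Both estimates fall within a few genes of the figure of 660 quoted in the text, so the rounding of the coverage to 85% has little effect on the conclusion.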
The average copy number of 73 of the rice NBS-region clones was estimated by hybridizing to DNAs of four rice varieties, each cut with four different enzymes (Fig. 1). The rice varieties examined included one Indica line (IR64), two Japonica types (Azucena and Gihobyeo), and cultivar Milyang23, derived from an Indica × Japonica cross. All of the probes hybridized to all four cultivars, demonstrating that both rice subspecies generally carry the same families of NBS–LRR genes. The 30 probes that appeared to detect single-copy genes typically revealed one or two bands in most enzyme digests of all four cultivars, although lanes in some digests were sometimes lacking bands, probably because small fragments migrated off the gel. Multiple-copy probes usually detected similar numbers of fragments in the different cultivars, with one exception. The rNBS41 probe identified an estimated five genes in the Azucena and Gihobyeo cultivars, but single genes in the other two. This copy number difference was also apparent when the sequence databases were examined. The Indica database carried a single gene that was highly similar to rNBS41, while the Monsanto database carried six sequences with >70% amino acid identity. The probe therefore detects a family that has become amplified in some lines, possibly most Japonica types. The number of hybridizing restriction fragments is probably not an accurate reflection of genomic copy number for the genes with several copies. In our experience with the maize Rp1 and Rp3 gene families, lines with 10 to 20 family members generally show fewer distinct fragments with most enzymes (Webb et al. 2002).

Table 1. (Continued)
Clone Name | Identical or Related Sequences* | Copy No.**
rNBS74 | AL442107, IN48 | 1.3
rNBS75 | OSM15011, IN37552 | 1.9
rNBS76 | OSM12654, AP003799, IN3311 | 1.0
rNBS77 | OSM15610, AP003753, IN28943 | 1.8
rNBS78 | OSM13655 | 2.1
rNBS79 | OSM147688, IN13548 | 3.8
rNBS80 | OSM11170, IN26131 | 1.3
rNBS81 | OSM128851, IN10755 | 0.8
rNBS82 | OSM19131, AF220749, IN459 | 2.3
rNBS83 | AC079128b |
rNBS84 | AP003269-2, IN-1507 | 3.0
rNBS85 | APO3219, IN112357 | 1.8
rNBS86 | AP003368, IN39058 | 1.5
rNBS87 | AL513003.1, IN541 |
rNBS88 | IN41159 |
rNBS89 | AP001168, IN8174 |
rNBS90 | AP003073 (97%), IN9666 (96%) |
rNBS92 | AP004223, AP003262 |
rNBS95 | AP003616, IN16441 |
rNBS96 | IN8880 |
rNBS97 | OSM140690 |
rNBS98 | AP003208, IN8376 |
rNBS102 | AC090870, IN149 |
rNBS104 | AP003563, IN1434 |
rNBS106 | OSM14542, IN214 |
rNBS108 | OSM150553 |
rNBS109 | AC090870, IN22788 |
rNBS111 | AP003609, IN5558 |
rNBS112 | AC099402 |
*The most closely related sequences existing in GenBank (as of Jan. 2002), Monsanto, or Indica databases. Sequences from the Indica databases are designated by the prefix ‘IN’ and the sequence contig number. Sequences from the Monsanto database are designated ‘OSM’. Values in parentheses give the approximate % sequence identity to the cloned sequence if <98%.
**Copy No. is the average number of hybridizing bands when hybridized to gel blots of four rice cultivars digested with four different restriction enzymes (Fig. 1). Those where no number is shown were not tested.

Figure 1 Estimation of genomic copy number of different nucleotide binding site–leucine-rich repeat (NBS–LRR) families in rice. Different cloned NBS-coding sequences were used as probes on gel-blot hybridizations to DNAs of four rice lines digested with four different restriction enzymes. The top blot is probed with rNBS79 and the bottom blot is probed with rNBS72, a predicted single-copy gene.

Figure 2 Cladistic analysis of the nucleotide binding site (NBS) region in rice NBS–leucine-rich repeat (LRR) genes. A total of 354 rice sequences and four other cereal R gene sequences were used for generating a neighbor-joining tree. The 354 rice sequences were selected so that none exhibited >75% sequence identity to any others in the NBS region. The tree is a representative tree in which groups with >75% bootstrapping value (% of trees of 1000 generated) are represented by a single branch with an average branch length for that group. The complete tree can be viewed from the online supplementary material section. The members of the individual groups are shown in the tree for the smaller groups or listed as follows: Group 1 = rNBS57, In4386-3, In3256-1, In4208-2, In4208-2B, rNBS24, In1277-2, In10933-1, In8374-4, In32618-1, rNBS58, AP003345, In6313-2; group 2 = rNBS15, rNBS12, rNBS17, rNBS6, rNBS3, rNBS97, In37394-1, In9848-1, AB019186, In6575-2; group 3 = rNBS77, rNBS7, rNBS13, In7573-1, In18748-1, In37595-1; group 4 = rNBS112, rNBS8, OSM13267, In2847-2, In7323-1, AC74283; group 5 = Pita, In20736-1, In1586-1, In10315-1, In20846-1, In3014-1, In785-4B, In10722-1; group 6 = rNBS96, AP001073-c, AP001073-a, AP003848, In22736-1, In5389-2; group 7 = AC092548, AP003859, In4930-1, In3292-1, In16749-1, In3436-1, In10436-2; group 8 = rNBS25, rNBS92, AP003918, In31789-1, In40443-1; group 9 = rNBS11, rNBS26, rNBS27, rNBS38, rNBS39, In10362-1; group 10 = AC092388, AC097277-2, AC097277-1, In7217-2, In20247-1, In15637-1, In5597-3; group 11 = rNBS84, AP003568, AP004092, In19349-1, In13218-2, In2456-1, In47000-1, In31232-1, 5790-1; group 12 = rNBS32, rNBS2, rNBS16, rNBS59, rNBS49, rNBS31, rNBS34, rNBS35, rNBS78, AB022164, In7980-1, In10136-1, In2646-2, In10063-1, In340-3; group 13 = rNBS9, rNBS37, In16096-1, In18642-1, In36703-1, In14463-1; group 14 = Rp1, rNBS33, rNBS86, rNBS108, AP004061, AP003368-c, AL606992, AL606616-2, OSM140512, In7903-1, In9457-1, In20413-1, In20680-1; group 15 = Xa1, rNBS10, rNBS19, rNBS42, rNBS52, rNBS53, AL606660-2, AL606660-5, OSM15716, OSM12815, In12431-1; group 16 = rNBS48, rNBS83, rNBS85, rNBS41, rNBS62, AC098834, AP004010, AP003621-b, OSM15362, OSM14953, OSM11901, In46824-1, In30175-1, In3589-3, In27999-1, In4754-2; group 17 = rNBS68, rNBS44, AL513004-1, AL513004-2, OSM129685, In271-4, In47803-1, In7388-1, In11951-1, In17986-1, In3169-2, In271-2, In3639-1, In9038-1; group 18 = rNBS87, rNBS36, AP003930-e, AP003930-b, AP003827-d, In1964-1. Gene designations are as in Table 1.

Phylogenetic Analysis of the NBS Region of Rice NBS–LRR Genes
As mentioned above, the rice sequences were grouped into 354 different families by sequence similarity. The predicted amino acid sequences of the NBS regions of one member of each of these families were aligned for phylogenetic analyses (Fig. 2). Among the rice genes aligned were three conferring known resistance phenotypes: the Xa1 (Yoshimura et al. 1998), Pib (Wang et al. 1999), and Pi-ta (Bryan et al. 2000) genes. Four other cereal genes with demonstrated or suspected resistance phenotypes were also included for comparison; these were barley Mla1 (Zhou et al. 2000), and single members of the maize Rp1, putative Rp3 (Webb et al. 2002), and PIC19 (Collins et al. 1998) gene families. The different rice sequences formed many distinct clades in the phylogenetic analysis, forming 117 groups when bootstrap values of >75% were used to define the groups. Some of the clades were composed of single families or even single genes. For example, the rNBS1 and rNBS69 sequences each form a distinct branch on the tree (Fig. 2) and detect a single gene in gel-blot hybridization experiments (Table 1). Other families formed distinct groups with long branch lengths. These have apparently arisen from ancient duplication events where the different families have diverged considerably in sequence, but still show good homology in conserved regions. Other rice genes are grouped into nondistinct subgroups with different branch lengths indicating a range of different times for the duplication and divergence of their family members. The maize and barley sequences were dispersed on the tree into different clades of rice sequences. Similar results were found when other cereal sequences were included in the analysis (not shown). This would be expected if these groups of resistance genes had already differentiated when the different cereal lineages separated.
Architectural Diversity in Cereal NBS–LRR Genes
Full-length sequences of cereal NBS–LRR genes were compared to examine their structural diversity. Sequences compared included several cereal genes for which full-length transcripts had been characterized, including the known resistance genes Rp1, Mla1, Xa1, Pib, and Pi-ta, and a full-length rice cDNA AB017914. Full-length transcripts for two additional maize gene families, the putative Rp3 family and the PIC19 family, were also isolated. Other sequences included coding regions predicted from genomic sequences. These include annotated sequences obtained from GenBank and gene predictions from genomic sequences by GENSCAN and FGENESH. To represent the full range of diversity of NBS–LRR sequences in rice, we examined a full-length coding sequence for most rice NBS–LRR gene families included in the phylogenetic analysis. Predicted full-length members were identified for >250 genes.
N-terminal Domain Structures
Most of the N-terminal regions in the cereal genes ranged from 200 to 250 amino acids from the start of the coding region to the beginning of the NBS domain (P-loop), similar to most non-TIR genes in dicots. Because the non-TIR genes from dicots typically have CC motifs, the cereal sequences were examined for this domain structure. Using the COILS program (threshold set to 0.9), CC motifs were apparent in only 47 of 100 randomly selected sequences. The Paircoil program predicted even fewer CC domains with a threshold of 0.5. The predicted CC domains were poorly conserved in sequence and in their position, occurring in the beginning, middle, or ends of the N-terminal regions. Most, but not all, of the N-terminal sequences could be aligned reasonably well using alignment programs like ClustalX because of their sequence similarity. To look for conserved aspects of the sequences that were common to all the genes, we examined them with the MEME and Block Maker programs. One conserved sequence motif, designated nT (for non-TIR), was identified, which occurred in nearly all of the rice NBS–LRR genes examined. The nT motif (WVxxIRELAYDIEDIVDxY) was usually located approximately 130 amino acids before the P-loop.
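As an illustration of how a consensus such as the nT motif might be located in a predicted protein, the following Python sketch scans a sequence with a sliding window and tolerates a chosen number of mismatches. This is not the MEME or Block Maker procedure used in the study; the function name, the mismatch parameter, and the test sequences are hypothetical, and the only input taken from the text is the motif string, in which 'x' is read as "any residue".

```python
NT_MOTIF = "WVxxIRELAYDIEDIVDxY"  # consensus quoted in the text; 'x' = any residue


def find_motif(protein: str, motif: str = NT_MOTIF,
               max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (0-based start, mismatch count) for every window of `protein`
    that matches `motif` with at most `max_mismatches` mismatches.
    Positions marked 'x' in the motif match any residue."""
    hits = []
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        mismatches = sum(1 for m, r in zip(motif, window) if m != "x" and m != r)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical test sequences (not taken from the study).
exact = "MAEA" + "WVKQIRELAYDIEDIVDEY" + "LLR"
variant = "WVKQIRDLAYDAEDIVDEY"  # two substitutions relative to the consensus

print(find_motif(exact))
print(find_motif(variant))
print(find_motif(variant, max_mismatches=2))
```

Because the motif is a consensus, real sequences are unlikely to match it exactly, which is why the sketch exposes a mismatch tolerance rather than relying on an exact pattern. The sketch does not check the position of a hit relative to the P-loop.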
The N-terminal region of the Xa1 gene was unusual. Initial analysis indicated the region was relatively long compared to the other genes, coding for a predicted 327 amino acids before the P-loop. A CD (Conserved Domain) search of GenBank (which uses the Pfam and Smart databases and NCBI collections) predicted that it codes for a zinc-finger, DNA-binding domain (gnl|Smart|smart00614, score = 69.7 bits [169], expect = 4e-13). The zinc-finger domain corresponds to residues 140–188 of the predicted amino acid sequence, within a relatively typical N-terminal domain and before the NBS domain. Database searches found two other genes with similar amino termini. The genes clustered together with Xa1 in the phylogenetic analyses based on NBS-coding sequences (Fig. 2; sequences rNBS19 and AL606660-5). The two genes flank a gene that is highly similar (presumably allelic) to Xa1 within a 63-kb interval of a BAC clone in the GenBank HTGS database (AL606660). The amino acid sequences of both genes align well with Xa1 but both have diverged, showing only 54% (rNBS19) and 47% (AL606660-5) amino acid identity in the NBS region. Both genes were predicted to code for zinc-finger, DNA-binding motifs (expect values = 9e-07 and 3e-06).
The LRR and C-Terminal Regions
The leucine-rich repeat regions of the rice NBS–LRR genes were quite variable in size and sequence. The repeats in most of the genes were imperfect, with few repeats conforming to a consensus sequence. In some, like Pib (Wang et al. 1999) and Pi-ta (Bryan et al. 2000), the region is leucine-rich but has no clearly distinguishable repeat structure. Roughly one third of the predicted proteins examined were not predicted to have an LRR using the Pfam database. The LRR regions of some of these genes carried >15% leucine, but they did not match consensus sequences. When these highly degenerate LRR regions are used to search databases for similar sequences, few, if any, matches are found and similarity is typically weak. At the other extreme, one rice NBS–LRR gene isolated from GenBank (accession no. AP003269) has an LRR region with 16 complete LRRs flanked by two incomplete repeats. Each of the repeats is 24 amino acids long and the positions of the leucine residues in each repeat are highly conserved. The size of the predicted LRR domains was variable, but was between 350 and 700 amino acids for most of the genes.
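Two of the quantities mentioned above, the leucine content of a region and the positions of leucines within successive repeats, are easy to compute, as in the following Python sketch. The function names and the 24-residue toy repeat are hypothetical and are not sequences from the study. Splitting a region into fixed-length units assumes perfectly regular, in-register repeats, which by the description above would apply at best to a gene like the one in AP003269 and not to the imperfect repeats of most rice genes.

```python
def leucine_percent(region: str) -> float:
    """Percentage of residues in `region` that are leucine (L)."""
    region = region.upper()
    if not region:
        raise ValueError("empty sequence")
    return 100.0 * region.count("L") / len(region)


def leucine_positions_by_repeat(region: str, repeat_len: int = 24) -> list[list[int]]:
    """Split `region` into consecutive complete repeats of `repeat_len`
    residues and list the 0-based leucine positions within each repeat.
    A trailing incomplete repeat is ignored."""
    region = region.upper()
    n_complete = len(region) // repeat_len
    return [
        [i for i, aa in enumerate(region[k * repeat_len:(k + 1) * repeat_len]) if aa == "L"]
        for k in range(n_complete)
    ]


# Hypothetical 24-residue repeat unit, used twice (not a sequence from the study).
toy_repeat = "LSNLTSLRHLDLSGCSNLKELPES"
toy_region = toy_repeat * 2

print(leucine_percent(toy_region) > 15.0)
positions = leucine_positions_by_repeat(toy_region)
print(positions[0] == positions[1])
```

Identical position lists across repeats correspond to the "highly conserved" leucine positions described for the regular LRR region; a leucine content above 15% without such regularity corresponds to the degenerate regions that Pfam does not recognize.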
Several NBS–LRR genes have been identified in Arabidopsis that code for additional domains after the LRR in their C terminus (Dodds et al. 2001; Deslandes et al. 2002). The LRR domains of all the characterized cereal resistance genes extend to the end of the predicted gene products. The coding regions predicted from rice genomic sequences were all similar in this regard. No evidence for any conserved domains following the LRR regions was observed in the rice sequences.
Intron Positions in the NBS Regions
Introns in the NBS region of cereal NBS–LRR genes have important practical implications for identifying resistance gene sequences by PCR amplification with degenerate primers, for identifying them in genomic sequence databases, and for distinguishing potentially functional genes from pseudogenes. Intron positions can also be used to support phylogenetic interpretation of the relationships between the genes. In a survey of 20 characterized dicot NBS–LRR resistance genes, only the Arabidopsis Rpp8/Hrt gene family had introns in the NBS domain. Three of the characterized cereal resistance genes have introns in their NBS region, that is, Mla1 (Zhou et al. 2000), Pi-ta (Bryan et al. 2000), and Pib (Wang et al. 1999). This suggests that NBS region introns could be more common in cereals. Predicted intron positions in the NBS regions were accordingly examined to estimate their frequency. The NBS regions of several families of genes were verified by amplification and cloning of cDNAs corresponding to genomic sequences that appeared to have introns (Table 2).
The most common intron position in cereals was at the immediate N-terminal side of the kinase-2 motif (Table 2). The Pi-ta, rNBS30, rNBS71, rNBS96, and rNBS98 genes all had introns in this position. These genes all cluster together in the same large group on neighbor-joining trees that were generated based on their NBS sequences, although the bootstrap support for this group is very weak (9%). This large group is composed of 40 different small clades (rNBS96–AC099734-2; Fig. 2) that contain a total of 87 different gene families. Examination of other representative genes in this group found that the majority of the gene families (31 out of 33 examined) have predicted introns in this general position. The 87 gene families in this group represent nearly one fourth of the total rice families (354) that were analyzed on the phylogenetic tree (Fig. 2), consistent with a very ancient origin for this intron position. Further refinement of the relationships between these gene families is needed to determine whether the presence of the intron accurately reflects the evolutionary relationships of these genes.
The rice Pib resistance gene has a single intron between the RNBS-B and GLPL motifs. The position differs from that of either intron in the rNBS102 clade (see below). Two other related genes that form a clade with Pib (Table 2) also have introns in this position. The rNBS84 gene has an intron near the end of the NBS domain, before the conserved MHD motif. This gene occurs in a clade with eight other NBS–LRR genes. Six of the eight genes have sufficient genomic sequence available to determine if introns are present, and all six appear to have introns at this position. Overall, there is relatively good agreement between NBS-area intron positions and the predicted evolutionary relationships among these genes, indicating that intron position can be used to support their classification.
The gene corresponding to rice clone rNBS102 has two introns between the RNBS-B and GLPL motifs. The first intron is located 13 amino acids after the conserved arginine residue at the end of the RNBS-B motif, and the second one is 11 amino acids before the glycine residue in the GLPL motif (GVPF; Table 2). Genes at the Arabidopsis Rpp8/Hrt locus (McDowell et al. 1998) also have two introns in the NBS region, but the first intron is located before the RNBS-B motif and the second one is 21 amino acids upstream of the glycine residue in the GLPL motif. On this basis, the introns in the rice rNBS102 gene and the unrelated Arabidopsis Rpp8 gene are not in conserved positions. Surprisingly, no rice genes were identified with intron positions similar to those of the Rpp8 gene. The most similar sequences to Rpp8 in the rice databases were AC099402 and OSM144390, neither of which had introns predicted in their NBS domains.
Table 2. Intron Positions in the NBS Domains of Some Cereal NBS-LRR Genes and the Arabidopsis Rpp8 Gene
Gene | General Position | Splicing Position*
rNBS30 | Before kinase-2 | KRYIIVIDDIW
rNBS71 | Beginning of kinase-2 | ERYLVVIDDIW
rNBS96 | Beginning of kinase-2 | KRYLIVIDDLW
rNBS98 | Beginning of kinase-2 | RYIVILDDIW
Pi-ta | Beginning of kinase-2 | KRYFIIIEDLW
rNBS84 | Before MHD | SMVSPVHAKAPRKLTMHDLVYD
rNBS102 | Two, between RNBS-B and GLPL | GSKVIVTTRSGAVAKLLGMDLTKPLSSEDCWSLFRRCALGVEVKEYNSGDFLDRLKMEVLQKCNGVPFIA
RPP8 | Two, 1st between kinase-2 and RNBS-B; 2nd between RNBS-B and GLPL | LVVLDDVWKKEDWDVIKAVFPRKRGWKMLLTSRNEGVGIHADPTCLTFRASILNPEESWKLCERIVFPRRDETEVRLDEEMEAMGKEMVTHCGGLPLAV
Pib-2** | Between RNBS-B and GLPL | TSRIIVTTRKENIANHCSGKNGNVHNLKVLKHNDALCLLSEKVFEEATYLDDQNNPELVKEAKQILKKCDGLPL
Pib-1 | After RNBS-B | GSRIIVSSTQVEVASLCAGQESQASELKQLSADQTLYAFYDKGSQ
*Conserved sequences from NBS domains are underlined. Positions of introns are indicated by boxing the amino acids corresponding to the
codon that was created or the two codons that were joined by splicing.
**The Pib gene has a duplicated structure with two tandem, partial NBS regions with introns at essentially the same position in each. Pib-1
indicates the first N-terminal NBS region, which ends before the GLPL domain.
NBS Regions With Unusual Structures
As described by Wang et al. (1999), the NBS domain of the rice
Pib gene has a partially duplicated NBS structure. The gene
carries one complete NBS region with an intron between the
kinase-2 and GLPL domains. Directly upstream of this, the
sequence codes for most of another NBS domain, from the
P-loop through the kinase-3a domain, until similarity is interrupted
by another intron. The structures of the duplications
are very similar from the P-loop through the kinase-3a domain,
where each is flanked on its 3′ side by an intron (Table 2),
suggesting that the intron may have played a role in creating
the duplication. One gene with a similar structure was found
in the GenBank HTGS database on a BAC clone (accession no.
AP004048). The BAC maps to rice chromosome 2, like Pib,
and the predicted protein shows >90% sequence identity to
that predicted for Pib for much of the coding region. This
gene is probably a member of the gene family that the Pib
gene detects in gel-blot hybridizations (Wang et al. 1999).
Three other genes predicted to code for similar proteins were
also identified from a BAC clone (accession no. AP003862) on
chromosome 8. These genes were less related to Pib, only
46%–53% identical in the NBS coding region. Two of the
genes had tandem duplications with intron positions like Pib,
while the third (AP003862-3) was predicted to code for three
tandem duplications separated by introns. Only fragments of
this gene were present in the Monsanto and Indica databases,
so the structure of this region of the gene could not be verified
with a second sequence. The C-terminal-most NBS-coding re-
gion, adjacent to the LRR, was typically the most conserved in
sequence in all of the genes coding for tandem NBS se-
quences. The upstream duplicated sequences had typically di-
verged considerably and were in some cases difficult to rec-
ognize by searching for conserved motifs.
A Novel Class of Rice Genes Has No LRR-Coding Region
While mining NBS–LRR sequences from GenBank, we identi-
fied a gene family in rice with a different structure than the
known NBS–LRR resistance genes. The most striking differ-
ence is that none of the family members possesses an LRR
domain. We found a total of 32 genes on five BAC clones from
the GenBank database. Eleven of the genes reside on a 202-kb
interval spanned by two BAC clones (AC079843 and
AC074283) on chromosome 10. The distances between the
genes in these two overlapping BACs range from 5.5 to 52 kb.
The first three genes are in the opposite orientation compared
to the other eight genes. Another nine genes were closely
spaced, in the same orientation, on a 43-kb interval of rice
chromosome 1 (AP003292). A single member was found on
another BAC clone (AP000570) on chromosome 1. Eleven ad-
ditional genes were found in the same orientation on a 50-kb
interval of a chromosome 7 BAC clone (AP003810). Only one
of the genes was interrupted by a gap in the sequence of this
clone. Searches of the Monsanto database with these genes
identified four additional genes with sequence and structural
similarity. A search of the Indica database found 24 genes
coding for proteins with 97–100% sequence identity to those
found in the Japonica databases, and these were therefore
considered possible alleles. Fourteen additional genes from
the Indica database were <90% identical to any of the Ja-
ponica genes. It is difficult to determine the degree of ge-
nomic clustering or the map positions of genes from the Mon-
santo and Indica databases, as they were identified on smaller,
unmapped sequence contigs. In total, 50 genes in this class
were identified (Fig. 3). Five partial gene sequences were also
observed on some of the smaller sequence contigs that were
not identical to any of these 50, indicating that there are more
than 50 genes in this class. The predicted coding regions were
composed of single exons and ranged from 385 to 556 amino
acids. One sequence (from Indica contig 32812) was predicted
to code for only 258 amino acids, but may be a pseudogene
because it appears truncated at both the N and C termini. Two
other genes from GenBank (AP003292-4 and AC079843-5) and
four genes in the Indica database were predicted to have in-
trons, but on closer inspection were found to be likely pseu-
dogenes. At least some of the genes are transcribed, as rice
ESTs (BE229855, AU096505, AU166590, AU063352, and
AA754293) matching five of the genomic sequences were
found in GenBank.
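The gene counts given in this paragraph can be tallied as a quick consistency check. This is an illustrative sketch only; it assumes, as the text implies, that the total of 50 is the sum of the GenBank BAC genes, the additional Monsanto genes, and the Indica genes that were <90% identical to any Japonica gene, with the 24 presumed Indica alleles left out. The variable names are hypothetical.

```python
# Non-LRR genes per GenBank BAC interval, as reported in the text.
genbank_bacs = {
    "chr10 (AC079843 + AC074283)": 11,
    "chr1 (AP003292)": 9,
    "chr1 (AP000570)": 1,
    "chr7 (AP003810)": 11,
}
monsanto_additional = 4   # extra genes found only in the Monsanto database
indica_distinct = 14      # Indica genes <90% identical to any Japonica gene
indica_alleles = 24       # 97-100% identical; treated as alleles, not counted

genbank_total = sum(genbank_bacs.values())
class_total = genbank_total + monsanto_additional + indica_distinct

print(genbank_total)  # genes on the five GenBank BAC clones
print(class_total)    # genes in the class overall
```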
The N-terminus of the non-LRR genes, before the P-loop,
consists of roughly 175–210 amino acids in most of the genes.
Alignment of the N termini of these genes showed that they code for
three conserved motifs separated by less conserved regions. To
visualize a typical arrangement of conserved motifs, a consensus
sequence (Fig. 4) of the gene family was generated from
aligned sequences using the program HMMER 2.2g
(http://hmmer.wustl.edu/). The N-terminal sequences show similarity
to the N termini of genes in the non-TIR–NBS–LRR class,
but this similarity is very weak; even the nT motif, which is
otherwise conserved, is poorly conserved here. As with the regular
NBS–LRR genes, the COILS program predicts a CC domain (>90%
probability) in some, but not all, of these regions, and the regions
coding for predicted CC motifs do not correspond to the conserved
motifs.
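To illustrate how a consensus with upper-case letters at the more conserved positions (as in Fig. 4) can be read, the sketch below derives a simple majority-rule consensus from pre-aligned sequences. It is not the profile hidden Markov model procedure used by HMMER; the function name `simple_consensus`, the 0.8 conservation cut-off, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter

def simple_consensus(aligned, strong=0.8):
    """Majority-rule consensus of equal-length aligned sequences.

    A position is written in upper case when the most common residue
    occurs in at least `strong` of the sequences, otherwise in lower case.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        frac = count / len(column)
        consensus.append(residue.upper() if frac >= strong else residue.lower())
    return "".join(consensus)

# Toy alignment (made up, not taken from the rice genes).
toy = ["GSKIL", "GSKVL", "GSRIL", "GAKIL"]
print(simple_consensus(toy))
```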
The NBS coding domain of the rice non-LRR group is also
diverged from other NBS–LRR genes. The conserved domain
(CD) search tool at NCBI (http://www.ncbi.nlm.nih.gov/)
predicts an NBS domain, but sequence similarities are typi-
cally weak when compared with other R gene homologs.
When consensus sequences of conserved motifs of NBS–LRR
genes are compared to those of the non-LRR gene families, the
consensus sequences differ slightly (Table 3). Similarity
to the NBS–LRR genes falls off shortly after the GLPL motif,
and the C-terminal 100 amino acids of the rice non-LRR
genes show little similarity to R genes. One interesting aspect
of the NBS domains of the non-LRR rice genes is that some of
the motifs that are the most highly conserved among the R
genes are not among the most highly conserved sequences in
this gene family. For example, the kinase-2 and GLPL motifs
are poorly conserved compared to many other regions of the
gene family.
Figure 3 Structures and approximate numbers of nucleotide binding
site–leucine-rich repeat and related genes in rice. Open boxes
represent domains with no significant database similarity except for
similar genes from other plant species.
Database similarity (BLASTP) searches of other plant spe-
cies identified EST sequences from other cereals, such as sor-
ghum (e.g., BM325897) and barley (e.g., AV835233), that
code for very similar proteins, but none were apparent among
dicot genomic or EST sequences. The most similar Arabidopsis
sequences in searches with representative non-LRR members
were typically the NBS–LRR genes from the non-TIR class, and
these showed weak similarity (<25% identity). For example, a
BLASTP search of Arabidopsis proteins using the consensus
nT-NBS sequence (Fig. 4) as a query found the best match to
be the predicted protein BAB01338, a typical nT-NBS–LRR
coding gene. Thus, no evidence of a dicot version of these
non-LRR genes was found.
Rice Genes Similar to TIR Coding Sequences
To determine if the rice genome carries any genes with TIR
domains, we searched databases with a consensus TIR se-
quence designed from the tobacco N, flax L and Arabidopsis
Rpp5 genes. One sequence was identified in both the Mon-
santo (OSM12752) and GenBank HTGS (AP003932 and
AP003866) databases. A GenScan analysis predicted a coding
region composed of three exons coding for 196, 21, and 29
amino acids separated by introns of 121 and 644 nucleotides.
A presumably allelic sequence was identified in the Indica
database that coded for a protein with only three amino acid
differences. An alignment with the TIR domains of N, L, RPP5,
TOLL and a human Toll-like receptor gene showed similarity
throughout the whole predicted protein (not shown). The
sequence apparently represents an expressed gene in cereals,
as a barley EST (accession no. BI948029) was identified that
was very similar to the rice gene (72% amino acid identity).
The gene is also similar (50% identity for 144 amino acids) to
a predicted Arabidopsis protein (accession no. AAG52286).
The Arabidopsis protein is also small (199 amino acids) and
composed mainly of a TIR domain.
A second class of genes was identified that codes for divergent
TIR and NBS domains. In total, three genes from
this family and one pseudogene were identified. The first
was identified in the GenBank (AP000364; protein_id
= BAB61209.1) and Monsanto databases (OSM1850). The In-
dica rice sequence Contig4057 also codes for most of a protein
(except the first 106 amino acids) that is identical in sequence
to this predicted protein. A second coding region was carried
on overlapping GenBank sequences (AP003256 and
AP003274) and on contig OSM15552 and was >99% iden-
tical to a sequence coded by a presumed allele on the Indica
sequence Contig2492. An additional gene was coded by the
Indica sequence Contig17995, and Contig5477 appears to
code for a pseudogene with a stop codon and at least one
small deletion. A sequence similar to this latter locus was
present in the Japonica rice sequences of the Monsanto
database, but it could not be determined if it was a potentially
functional allele because the coding region was incomplete.
The NBS regions of these genes code for motifs similar to
most of the conserved motifs in NBS-coding domains of
R genes (Table 3), but their sequences diverge from the
R genes after the GLPL motif. The three genes were predicted
to encode functional proteins ranging from 986 to 1002
amino acids. The genes code for 165–168 amino acids be-
fore their TIR-like domain and this N-terminal region is the
least conserved region of the gene (Fig. 5). One feature
common to the N-terminal regions of these genes is that
they are serine-rich, with 14.5%–23% serine residues. Most of
the remainder of the coding regions is more conserved, including
the C-terminal 250–300-amino-acid domain after
the predicted NBS domain. No known protein-coding
domains were detected in the C-terminal region, but several
highly conserved sequences were apparent among the
genes. The C-terminal 275 amino acids of the three genes
are 12%–14.5% leucine. This presents the possibility
that these sequences evolved from a degenerate LRR, but the
pattern of leucines poorly matches that of a leucine-rich
repeat.
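Residue composition figures of the kind quoted above (serine in the N-terminal region, leucine in the C-terminal 275 amino acids) are simple per-residue percentages. A minimal sketch follows; the function name `residue_percent` and the toy sequence are hypothetical and are not drawn from the rice TIR–NBS proteins.

```python
def residue_percent(seq, residue):
    """Percentage of positions in `seq` occupied by `residue`."""
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.upper().count(residue.upper()) / len(seq)

# Toy 20-residue sequence (made up for illustration).
toy = "MSSLSASKSLPSSTSLLSAS"
print(round(residue_percent(toy, "S"), 1))  # serine content
print(round(residue_percent(toy, "L"), 1))  # leucine content

# For a real protein, the C-terminal window would be taken as seq[-275:].
```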
Database searches identified two Arabidopsis genes that
were very similar to the three rice TIR–NBS genes. Sequence
homology extends throughout the genes, including the 5′
and 3′ regions. The sequence affinities of the five genes indicate
that the ancestral gene had duplicated and the two paralogs
had diverged before monocots and dicots split. The predicted
proteins of the two rice genes from AP000364 and Contig17995
are more similar to the Arabidopsis At5g56220
protein (59% and 55% identical, respectively) than they are to
the rice AP003256 protein (both 36% identical). Similarly, the
other rice protein (from AP003256) is more similar (53% identical)
to the other Arabidopsis protein, At4g23440.
Table 3. Consensus Sequences of Conserved Motifs in NBS Domains in Different Classes of Genes
Gene Class | The P-Loop | Kinase-2 | RNBS-B | GLPL | MHD Domain
TIR-NBS-LRR | GIGKTT(il)A | KVL(ivl)(vi)LDDVD | GSRII(iv)TT(qre)D | G(gn)LPLGL | MH(ndkr)LLQQ
nT-NBS-LRR | GMGG(vli)GKTTL | (lv)(lvi)(vl)DD(vl)W | GS(kr)(vi)(lva)(fl)T(ts)R | GLRLAL | MHD(vm)(vlim)R(ed)
nT-NBS | VGK(skr)TLV | L(vi)(vi)(vi)E(lf)xxD | GSK(iv)l(iv)xSR | GSF(ilm)xAN | Not conserved
TIR-NBS | SGIGKTEL | L(lv)(iv)IDNL | HV(il)(iv)TTR | GLW(iv)V | Not conserved
*Letters in parentheses designate residues that occur in similar frequencies at a single position. Consensus sequences of TIR and Non-TIR genes
were calculated from Arabidopsis sequences (http://niblrrs.ucdavis.edu/At_RGenes). The non-LRR consensus sequences were calculated from 47
rice sequences and the TIR-NBS sequences were calculated from three rice genes and two related Arabidopsis genes.
Figure 4 Consensus sequence of the rice nT-nucleotide binding site
(NBS) class genes. The consensus was generated from an alignment
of 47 nT-NBS genes by HMMER 2.2g. Conserved regions in the
N-terminal domain are underlined. Sequences in bold correspond to
sequences conserved in NBS-coding domains: the P-Loop, kinase-2,
RNBS-B, and GLPL motifs (Table 3). More conserved amino acid
positions are represented by upper-case letters.
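The reasoning behind the inferred ancient duplication of the TIR–NBS genes is that each rice protein is closer to an Arabidopsis protein than to its rice paralog. The sketch below restates that comparison using only the identity values quoted in the text; pairs for which no value is reported are simply absent, and the helper name `best_partner` is hypothetical.

```python
# Percent amino acid identities quoted in the text (unordered pairs).
identity = {
    frozenset(("AP000364", "At5g56220")): 59,
    frozenset(("Contig17995", "At5g56220")): 55,
    frozenset(("AP000364", "AP003256")): 36,
    frozenset(("Contig17995", "AP003256")): 36,
    frozenset(("AP003256", "At4g23440")): 53,
}

def best_partner(gene):
    """Return the most similar protein to `gene` among the reported pairs."""
    hits = {
        next(iter(pair - {gene})): pct
        for pair, pct in identity.items()
        if gene in pair
    }
    if not hits:
        raise KeyError(gene)
    return max(hits, key=hits.get)

for rice_gene in ("AP000364", "Contig17995", "AP003256"):
    print(rice_gene, "->", best_partner(rice_gene))
```

Under these values each rice protein pairs with an Arabidopsis protein rather than with another rice protein, which is the pattern expected if the two paralogs were already distinct before monocots and dicots diverged.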
A Phylogeny of Rice Genes Based on NBS Region Sequences
To examine the evolutionary relationships of the different
types of NBS-coding sequences in rice, the sequences of the
rice nT-NBS genes and TIR–NBS genes were compared to rep-
resentative NBS–LRR genes from rice and other species (Fig.
6). The rice NBS–LRR genes were selected to represent a diver-
sity of different clades from the previous analysis of these
genes (Fig. 2). Amino acid sequences utilized from the nT-NBS
and TIR–NBS genes were limited to the region from the
P-loop to the GLPL motif because of the limited identity between
classes outside this region. For comparative purposes,
several NBS–LRR resistance genes from dicots also were used,
including five representative genes from the TIR class (Arabi-
dopsis Rpp1, flax L6 and M, and tobacco N) and four genes
from the non-TIR subclass (Arabidopsis Rps5 and Rpm1, to-
mato I2 and potato Rx). Two Arabidopsis genes that were re-
lated to the rice TIR–NBS genes also were included. Trees
based on distance and parsimony had similar topologies.
The rice and Arabidopsis TIR–NBS genes formed a distinct
clade that was well separated from NBS–LRR genes. It was
separated into two distinct groups, each with an Arabidopsis
gene and one or two rice genes. This supports the hypothesis
that there were at least two genes of this class in the nearest
common ancestor of modern monocots and dicots. The rice
nT-NBS genes also formed a single large clade that was distinct
from the NBS–LRR genes. The variation within this group and
the deep divisions between the different subgroups indicated
that the nT-NBS genes comprise a very old gene family, like
the NBS–LRR genes. The different subgroups are, at least par-
tially, separated into different chromosomal regions in the
rice genome. One group is composed mainly of sequences
from the two overlapping chromosome 10 BACs (AC079843
and AC074283), while the clones from a chromosome 1
BAC (AP003292) all belonged to a separate group. Similarly,
the sequences from the chromosome 7 BAC clone (AP003810)
also cluster together in 69% of the trees (Fig. 6).
The cereal and dicot NBS–LRR genes formed two distinct
groups, as expected from previous analyses that found the TIR
subclass to be a distinct group (Meyers et al. 1999; Pan et al.
2000b; Young 2000). The four non-TIR class dicot R genes
included in the analysis did not cluster together, but rather
grouped with different cereal genes with bootstrap values
ranging from 71% (for potato Rx) to 100% (for Arabidopsis
Rps5). This agrees with the proposal by Cannon et al. (2002)
that there are several ancient sequence clades of nT-NBS–LRR
genes and that some of these predate the split between mono-
cots and dicots.
Similarity Between Rice NBS–LRR Genes and Those From
Other Cereals
To estimate the extent to which the genomic sequences in-
clude most NBS–LRR gene families, we selected 61 rice gene
fragments that potentially code for partial NBS-coding domains
from GenBank and used them to search the available genomic
sequences. Forty-nine (80%) of the NBS gene fragments
matched highly similar genomic sequences (≥95%
amino acid identity) and all but one matched sequences with
>85% sequence identity. The remaining sequence identified a
less closely related family member, with 77% amino acid
identity. This indicates that members of nearly all of the NBS–
LRR families are represented in the available genomic se-
quences, although the gene fragments used in the similarity
searches are clearly not a random sample of the NBS–LRR
genes. The rice sequences in our clone collection and in the
GenBank, Monsanto, and Indica databases should therefore
represent most of the NBS–LRR genes in rice, or at least carry
members of most NBS–LRR gene families.
A similar approach was used to examine the extent of
similarity of rice NBS–LRR genes to those of other cereal spe-
cies. Forty-seven fragments that potentially code NBS genes
from different cereal species were used to identify the most
similar rice sequences in the genomic sequence databases
(Table 4). Comparisons of the 47 sequences to one another
revealed 28 groups or families with >75% amino acid identity
within the families. The different members of these groups
generally showed similar levels of identity to the same rice
sequence. The different cereal genes exhibited a range of sequence
similarities to the rice genes. Two of the maize sequences
(mNBS2 and AF056161) showed 84% and 85%
amino acid identity with rice sequences, and one wheat
sequence (AF087521) showed 85% identity. In contrast,
many of the 28 families did not identify likely orthologs. Surprisingly,
10 of the 28 families showed amino acid identities
of <60%.
Figure 5 Consensus sequence of three predicted rice TIR-
nucleotide binding site (NBS) genes with two related Arabidopsis proteins.
The region showing similarity to the TIR domain of TIR–NBS–
leucine-rich repeat genes is underlined. Sequences in bold correspond
to sequences conserved in NBS-coding domains: the P-Loop,
kinase-2, RNBS-B, and GLPL motifs (Table 3). Serine residues in the
serine-rich N terminus and leucine residues in the C terminus are
highlighted. More conserved amino acid positions are represented by
upper-case letters.
Figure 6 Phylogenetic analysis of nucleotide binding site-leucine-rich repeat (NBS–LRR) and related genes from rice and dicot species. Rice
NBS–LRR sequences were selected from different clades identified in Figure 1. Other designations correspond to characterized resistance genes or
GenBank, Monsanto, or Indica database accession numbers and the species designations are indicated after the clone designation: O.s., rice; Z.m.,
maize; S.t., potato; A.t., Arabidopsis; L.e., tomato; L.u., flax; N.g., tobacco; and H.s., human. The 48 nT-NBS sequences are all from rice, while two of
the TIR–NBS sequences are from Arabidopsis and the other three are from rice.
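The grouping of the cereal fragments into families sharing >75% amino acid identity can be pictured as a clustering step over pairwise identities. The sketch below uses single-linkage grouping, which is an assumption: the text gives the threshold but not the linkage rule. The function name `group_families` and the toy identity values are hypothetical.

```python
def group_families(names, identity, threshold=75.0):
    """Single-linkage grouping: join two sequences when identity > threshold.

    `identity` maps (name_a, name_b) tuples to percent amino acid identity.
    Returns a sorted list of families, each a list of member names.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the family representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), pct in identity.items():
        if pct > threshold:
            parent[find(a)] = find(b)

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(families.values())

# Toy data (made up): seqA-seqB and seqB-seqC exceed 75%, seqD does not.
names = ["seqA", "seqB", "seqC", "seqD"]
toy_identity = {
    ("seqA", "seqB"): 81.0,
    ("seqB", "seqC"): 77.0,
    ("seqA", "seqC"): 70.0,
    ("seqC", "seqD"): 58.0,
}
print(group_families(names, toy_identity))
```

With single linkage, seqA and seqC end up in one family through seqB even though their direct identity is below the threshold; a stricter rule (for example, identity to a single top member, as in the footnote to Table 4) would split such cases differently.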
DISCUSSION
As the largest class of disease resistance genes, the NBS–LRR
genes play a critical role in defending plants from a multitude
of pathogens and pests. The availability of nearly complete
genomic sequences of two distantly related plant species, rice
and Arabidopsis, allows comparative evolutionary analyses of
these genes. Previous analyses of available cereal sequences
have implied that the cereal NBS–LRR genes may be more
homogeneous in their domain architecture than similar genes
in dicots. The dicot genes can be divided into two distinct
groups, those coding for a TIR domain at their N-termini and
those without the TIR domain. The TIR class NBS–LRR genes
account for the largest proportion of the NBS–LRR genes in
the Arabidopsis genome (http://niblrrs.ucdavis.edu/
At_RGenes/), but this class has not yet been found in cereals.
We failed to detect these sequences after examining roughly
820 Mb of assembled rice genomic sequence from two differ-
ent genotypes of the estimated 430-Mb rice genome. They
also are absent in cereal EST databases. It therefore seems the
TIR class is not only rare in cereals, but probably absent. The
presence of these sequences in gymnosperm databases, like
the Pinus taeda EST sequence database (e.g., GenBank acces-
sion no. BI077056) indicates that this class of gene was pres-
ent in the progenitors of grass species, but lost in the grass
family (Meyers et al. 2002). It is likely that there were a very
small number of TIR class genes in early angiosperms, and
that their numbers amplified in the progenitors of modern
dicots as they became more dependent on them for defense
against pathogens. On the other hand, our results and those
of others (Cannon et al. 2002) have shown that specific
monocot and dicot nT-NBS–LRR genes cluster together in
phylogenetic analysis as expected if several members of this
class had already diverged in early angiosperms. The number
of genes in this class has amplified to 600 or more in rice,
compared to 50 in Arabidopsis (http://niblrrs.ucdavis.edu/
At_RGenes/).
One class of NBS–LRR-related gene was identified in cereals
that may be specific to cereal, or monocot, genomes. The
nT-NBS class shows similarity to the N-terminal half of nT-
NBS–LRR genes but has no LRR domain. Other genes related
to nT-NBS–LRR genes without LRR domains were observed in
cereal genomes and are also present in Arabidopsis (Meyers et
al. 2002). For example, the rp1-pd5 gene is a transcribed mem-
ber of the Rp1 family of maize and is 99% identical to the
Rp1-D gene, but is truncated before the LRR coding region
(Sun et al. 2001). The nT-NBS family is different from these
sequences in several respects. As a group, their sequences have
diverged considerably from the other NBS–LRR genes, evi-
dence that they have evolved independently from these genes
for most of their evolution. The family is also different in that
it is monophyletic and contains no known members that
code for LRR domains. Although the family appears very rare,
or missing, in dicot genomes, it is a very old gene family as
evidenced by the extensive sequence divergence among mem-
bers. Over 50 members exist in rice, making the family
roughly the size of the nT-NBS–LRR family in Arabidopsis.
Table 4. Sequence Similarity Between Rice Sequences and Genes From Other Cereal Species
Cereal NBS Family* Most Similar Rice Sequence**
mNBS1 Zea AF032688 84%
mNBS2 Zea rNBS6, In37394 71%
mNBS6 Zea In7573 64%
AF056150 Zea In7573 59%
AF056152 Zea In55234 75%
AF056155 Zea AP003914, In20630 58%
AF056161 Zea AP003280, In930 85%
AF107293 Zea (Rp1) AP003368 72%
AF056153 AP003449, In17424 74%
AF056159 In1613 56%
AF325196 Triticum In1613 59%
AF320290 (85%) Secale
AF108008 (81%) Hordeum
AF108011 (81%) Hordeum
AF108012 (83%) Hordeum
AF108013 (81%) Hordeum
AF108015 (80%) Hordeum
AF108010 Hordeum In7323 65%
AF078873 Avena In4386 78%
AF032679 (84%) Hordeum
AF032682 Hordeum AF032688 81%
AF087519 (91%) Triticum
AJ296001 (87%) Avena
AF032684 Hordeum In8824 (62%)
AF032686 Hordeum In10315 60%
AF032687 Hordeum AP003948, AP003914 67%
AJ302298 Hordeum In23808 55%
AY009939 (84%) Hordeum (Mla1)
AJ302296 (77%) Hordeum
AF032685 Hordeum AC099734 61%
AF149114 Triticum In5429 60%
AF052641 Triticum In9457 53%
AF158635 (82%) Triticum
AF158634 Triticum In7095 59%
AF087521 Triticum In20889 85%
AF325198 Triticum In1613 59%
AJ295999 Avena In24620 77% A220741 77%
TVE249945 73% Aegilops
TVE249944 74% Aegilops
AF087518 74% Triticum
AF087520 81% Triticum
AF032683 78% Hordeum
AF032680 75% Hordeum
AJ249948 Aegilops In6820 63%
AJ249949 Aegilops In6820 66%
AJ249943 Aegilops In10362 70%
AJ296002 79% Avena
AJ295998 Avena AF032692 64%
*Cereal sequence, or family of related sequences, with a predicted
NBS coding region. When families of related sequences are listed,
the top sequence shows the highest identity to a rice sequence.
Numbers in parentheses next to other sequences give their identity
(% amino acid) to the top family member.
**Accession no. of the rice sequence that exhibits the highest
sequence identity. Accession nos. starting with ‘A’ are GenBank
accessions, and an ‘In’ prefix indicates the Indica sequence database.
Percentages indicate the % amino acid identity for at least
155 amino acids, typically from the P-loop to the GLPL domain.
Cereal sequences that did not include this entire region were not
examined.
Some limited structural heterogeneity was observed in
the cereal NBS–LRR genes as several genes with duplicated or
novel domains were observed. For example, the Xa1 gene car-
ries sequences near its N terminus that are predicted to code
for a zinc-finger, DNA-binding domain. The N-terminal domains
of the vast majority of the genes were quite homogeneous;
that is, they were typically small with at least one
highly conserved region. Many code for predicted CC domains,
although this is not a distinguishing feature of this
class of genes. While the structures of these genes are fairly
conserved, they are extremely diverse in sequence. The C-
terminal regions of many of the genes are barely recognizable
as LRR domains. Even the NBS regions of the genes have di-
verged extensively and classification of the genes based on the
sequence of this region reveals over 100 distinct clades. Some
of these clades consist of one or a few genes in the rice ge-
nome, while others have amplified into large groups with
varying degrees of similarity. Much of the divergence among
these genes apparently occurred before the different cereal
species separated, as NBS coding sequences of other cereals
typically cluster within the rice clades.
While typical TIR class NBS–LRR genes do not appear to
be present in cereal species, genes related to this class are
present. One rice gene was identified, with strong homology
to a barley EST, which coded for a TIR domain but no other
recognizable domains. A second class, with at least three
genes in rice, coded for divergent TIR and NBS domains. The
C-terminal domain of these genes may have been derived
from an LRR-like domain but was unique. Homologs of both
of these classes of genes were observed in the Arabidopsis genome.
In fact, sequence affinities between the rice TIR–NBS
genes and two similar Arabidopsis genes provide evidence
that two members of this gene class were present when mono-
cots and dicots diverged.
The structural differences between NBS–LRR genes in
Arabidopsis are partially correlated with their dependence on
certain other disease-signaling components (Glazebrook
2001; Austin et al. 2002). For example, a functional Eds1 gene
is required for resistance mediated by the TIR–NBS–LRR genes
Rps4 and Rpp5 but is not required for several non-TIR class
genes, such as Rps2 and Rpm1 (Aarts et al. 1998). The predomi-
nance of the non-TIR class in cereals might indicate that the
cereal R genes signal through fewer or simpler pathways. On
the other hand, the Mla1 and Mla6 genes are highly similar in
sequence and structure, but differ in their requirements for
the Rar1 and Sgt1 gene products (Zhou et al. 2000; Azevedo et
al. 2002; Halterman et al. 2002). It seems likely, therefore, that
the different defense signaling pathways that cereal R genes
utilize depend on factors other than obvious differences in
structural domains.
NBS-coding sequences from other cereals exhibit a sur-
prising range of similarity to the rice sequences. Some maize
and wheat sequences exhibit 85% amino acid sequence iden-
tity to rice genes, while 10 of 28 families showed <60% se-
quence identity. It is possible that the rice orthologs of some
of these families are missing from the available rice databases,
but most of the rice genes are recently duplicated, and it is
unlikely that all the sequences would be missing for very
many families. This would also explain why we and others
(Leister et al. 1998) have found that many of the rice sequences
cross-hybridize weakly to other cereal species. This
may be an indication that the resistance genes are evolving at
very different rates. Alternatively, it could result from loss of
resistance genes, or gene families, in different species lineages
(Michelmore and Meyers 1998; Cannon et al. 2002). If resistance
genes are commonly lost from species lineages, comparative
mapping experiments might frequently mistake
similar sequences for orthologs when they are actually more
distantly related paralogs. This may explain why the initial
comparative mapping experiments with resistance genes in
cereals have implied that their relative map positions may be
less conserved than other types of genes (Leister et al. 1998).
The present collection of NBS coding clones should provide
sufficient probes for more detailed comparative mapping experiments,
allowing a more extensive test of relative levels of
synteny. Examination of the presence or absence, and estimation
of copy numbers, of the different NBS–LRR gene families
in the different grass species will shed light on the evolutionary
dynamics of resistance-gene gain and loss in cereal genomes.
The sequences identified in the present study provide
a framework for classification of additional cereal genes.
Table 5. List of Bioinformatics Programs Used in This Study
Program | Functions Used In This Study | Web Site URL | References
Clustal X | Multiple sequence alignment; bootstrapping analysis | ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/ | Thompson et al. 1997, Jeanmougin et al. 1998
GeneDoc | Editing aligned sequences | http://www.psc.edu/biomed/genedoc/ | Nicholas et al. 1997
GENSCAN | Predicting open-reading-frames | http://genes.mit.edu/GENSCAN.html | Burge and Karlin 1997
FGENESH | Predicting open-reading-frames | http://genomic.sanger.ac.uk/ | Solovyev V.V. 2001
PAUP | Parsimony tree generation | http://paup.csit.fsu.edu | Swofford, D.L. 2002
Pfam | Conserved protein domain search | http://pfam.wustl.edu/ | Bateman et al. 2002
Smart | Conserved protein domain search | http://smart.embl-heidelberg.de/ | Letunic et al. 2002
MEME | Conserved protein domain search | http://meme.sdsc.edu/meme/website/meme.html | Bailey et al. 1999
MAST | Conserved protein domain search | http://meme.sdsc.edu/meme/website/mast-intro.html | Bailey et al. 1999
Block Maker | Conserved protein domain search | http://www.blocks.fhcrc.org/blockmkr/ | Henikoff et al. 1995
TreeView | Phylogenetic tree image generation | http://taxonomy.zoology.gla.ac.uk/rod/treeview.html | Page 1996
NJPlot | Phylogenetic tree image generation | http://pbil.univ-lyonl.fr/software/njplot.html | Perrière and Gouy 1996
COILS | Predicting coiled-coil domain | http://www.ch.embnet.org/software/COILS_form.html | Lupas et al. 1991
Paircoil | Predicting coiled-coil domain | http://nightingale.lcs.mit.edu/cgi-bin/score | Berger et al. 1995
METHODS
Sequence Acquisition
Cereal resistance gene sequences were obtained either from
cloned PCR products (below) or by searching various data-
bases with amino acid sequences of specific resistance genes
as queries. Initial queries were performed with known resis-
tance genes. Additional queries were done with more unique
sequences after initial cladistic analyses. Two things that were
considered do determine whether sequences were retained for
phylogentic and structural analysis: (1) whether they were
NBS–LRR sequences as indicated by conserved domains (do-
mains in Table 3); legitimate NBS–LRR genes often had <30%
amino acid identity to the sequences used in the initial que-
ries and sometimes had TBLASTN scores <1e-4, but sequences
with lower scores were often apparent pseudogenes with in-
terrupted coding regions; and (2) whether they were <75%
identical in amino acid sequence to already collected se-
quences. To determine this, a local database of the predicted
proteins of these sequences was sequentially searched by
BLASTP with each of the new sequences. This allows the iden-
tification of identical sequences and their classification into
families. The local database was updated with the new se-
quences periodically. Databases searched include the Gen-
Bank nonredundant and high-throughput genomic sequence
databases (http://www.ncbi.nlm.nih.gov/blast/index.html),
the Monsanto rice genomic database (www.rice-research.org),
and the Indica rice genomic database posted by the Beijing
Genomics Institute (http://btn.genomics.org.cn/rice). The final
search of the above-mentioned databases was performed in May 2002.
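The sequential redundancy check described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the study used BLASTP against a local database, whereas this sketch substitutes a simple identity count over sequences assumed to be pre-aligned to equal length. The function names `percent_identity` and `collect` are hypothetical; only the 75% cutoff comes from the text.

```python
from typing import Dict, List, Tuple

IDENTITY_CUTOFF = 75.0  # percent; the threshold stated in the text


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes `a` and `b` are already aligned to the same length ('-' = gap).
    This stands in for the identity value a BLASTP hit would report.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def collect(
    candidates: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Process candidates in order, mimicking a periodically updated database.

    A candidate below the cutoff against every retained sequence is retained
    as a new family representative; otherwise it joins the family of its
    best-matching retained sequence.
    """
    retained: Dict[str, str] = {}
    families: Dict[str, List[str]] = {}
    for name, seq in candidates:
        best_name, best_id = None, 0.0
        for rname, rseq in retained.items():
            ident = percent_identity(seq, rseq)
            if ident > best_id:
                best_name, best_id = rname, ident
        if best_name is not None and best_id >= IDENTITY_CUTOFF:
            families[best_name].append(name)
        else:
            retained[name] = seq
            families[name] = [name]
    return retained, families
```

With real data, the identity value would presumably come from the BLASTP report rather than from a column-by-column comparison, and the order in which sequences are added affects which member becomes the family representative.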
Cloning and Characterization of Maize and Rice NBS
Region Sequences
Primers were designed from the NBS-coding sequences obtained
from the database searches. Forward primers were designed to end
at the beginning of the P-loop of the NBS region of each gene.
Reverse primers were designed about 900 bp downstream of the
P-loop, at the end of the NBS region where the amino acid
sequence ‘MHD’ is moderately conserved. PCR products of 0.9–1 kb
were cloned into the pCR2.1-TOPO cloning vector (Invitrogen) and
sequenced at the Kansas State University DNA sequencing
facility. Maize inbred line B73 and the japonica rice variety
Nipponbare were used to generate RNA or DNA templates. Genomic
DNA was usually used as the template, but RNA was used to
amplify several NBS sequences to verify intron positions.
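The primer placement and size selection described above can be illustrated with a short sketch that locates the two landmarks in a predicted protein and estimates the intron-free product size. The P-loop pattern used here (the generic Walker A form GxxxxGK[ST]) is an assumption, since the text does not give the exact motif; the function names are hypothetical, and primer length and introns are ignored.

```python
import re
from typing import Optional

# Assumption: the P-loop is matched with the generic Walker A pattern
# GxxxxGK[ST]; the source does not state the exact motif used.
P_LOOP = re.compile(r"G.{4}GK[ST]")
# 'MHD' is the moderately conserved motif named in the text.
MHD = re.compile(r"MHD")


def expected_amplicon_bp(protein: str) -> Optional[int]:
    """Approximate product size (bp) from P-loop start to the end of MHD.

    Returns None if either motif is missing. Introns and the primer
    sequence upstream of the P-loop are not accounted for.
    """
    p = P_LOOP.search(protein)
    if p is None:
        return None
    m = MHD.search(protein, p.end())  # only look downstream of the P-loop
    if m is None:
        return None
    return 3 * (m.end() - p.start())  # three nucleotides per codon


def is_expected_size(bp: int, low: int = 900, high: int = 1000) -> bool:
    """True if a product falls in the 0.9-1 kb window used for cloning."""
    return low <= bp <= high
```

A genomic template with an intron between the two motifs would give a larger product than this estimate, which is one reason a size window alone cannot confirm that a band is the intended NBS fragment.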
Full-length coding sequences were isolated from two pre-
viously isolated maize NBS clones, PIC13 and PIC19. Ge-
nomic clones homologous to the probes were isolated from
libraries made from the maize lines B73 (for PIC19) and Rp3-
A-R168 (PIC13). The PIC13 probe is thought to represent a
member of the Rp3 gene family (Collins et al. 1998; Webb et
al. 2002). The NBS–LRR genes were sequenced following subcloning
into a pUC19 vector. Transcripts corresponding to the genes were
isolated by RACE PCR using the 5′ and 3′ RACE System for Rapid
Amplification of cDNA Ends from Gibco Invitrogen Corporation.
Genomic Hybridizations
Cloned fragments of rice NBS–LRR genes were used as probes
on genomic blots of rice to estimate the genomic copy num-
bers of each of the different families. Five micrograms of ge-
nomic DNA from four rice cultivars (Azucena, IR64, Gi-
hobyeo, and Milyang23) were digested with four restriction
endonucleases (EcoRI, EcoRV, HindIII, and XbaI), separated on
0.8% TBE agarose gels and blotted prior to hybridization.
Probe labeling, hybridization, and signal detection were per-
formed using ECL Direct Nucleic Acid Labeling And Detection
System from Amersham Pharmacia Biotech. Following hybridization,
blots were washed at a moderate stringency of 0.5× SSC (75 mM
NaCl and 7.5 mM sodium citrate) at 65°C.
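The wash-buffer concentrations quoted above follow from the usual SSC recipe. The sketch below assumes the standard definition of 1× SSC (150 mM NaCl, 15 mM sodium citrate, i.e., a 20× stock of 3 M NaCl and 0.3 M sodium citrate), which is consistent with the 0.5× values in the text; the function names are illustrative only.

```python
# Standard 1x SSC composition in mM (assumed; consistent with the
# 0.5x values given in the text).
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_concentrations(strength: float) -> dict:
    """Return component concentrations (mM) for SSC at the given strength."""
    return {name: strength * conc for name, conc in SSC_1X_MM.items()}


def stock_volume_ml(final_strength: float, final_volume_ml: float,
                    stock_strength: float = 20.0) -> float:
    """Volume of stock (mL) needed for the dilution, from C1*V1 = C2*V2."""
    return final_strength * final_volume_ml / stock_strength
```

For example, `ssc_concentrations(0.5)` gives 75 mM NaCl and 7.5 mM sodium citrate, and `stock_volume_ml(0.5, 1000)` gives 25 mL of 20× stock per liter of wash solution.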
Bioinformatic Programs Used for
Phylogenetic Studies
The bioinformatic programs used in this study are listed in
Table 5. All parameters in these programs were set to default
except that ‘Arabidopsis’ was specified as the organism in
GenScan, and ‘Monocots’ was specified in FGENESH.
ACKNOWLEDGMENTS
The authors wish to thank Blake Meyers and Richard
Michelmore for valuable discussions. This work was sup-
ported by NSF Plant Genome grant 9975971.
The publication costs of this article were defrayed in part
by payment of page charges. This article must therefore be
hereby marked “advertisement” in accordance with 18 USC
section 1734 solely to indicate this fact.
REFERENCES
Aarts, N., Metz, M., Holub, E., Staskawicz, B.J., Daniels, M.J., and
Parker, J.E. 1998. Different requirements for EDS1 and NDR1 by
disease resistance genes define at least two R gene-mediated
signaling pathways in Arabidopsis. Proc. Natl. Acad. Sci.
95: 10306–10311.
Austin, M.J., Muskett, P., Kahn, K., Feys, B.J., Jones, J.D., and Parker,
J.E. 2002. Regulatory role of SGT1 in early R gene-mediated plant
defenses. Science 295: 2077–2080.
Azevedo, C., Sadanandom, A., Kitagawa, K., Freialdenhoven, A.,
Shirasu, K., and Schulze-Lefert, P. 2002. The RAR1 interactor
SGT1, an essential component of R gene-triggered disease
resistance. Science 295: 2073–2076.
Bailey, T.L., Baker, M.E., Elkan, C.P., and Grundy, W.N. 1999.
MEME, MAST, and Meta-MEME: New tools for motif discovery in
protein sequences. In Pattern discovery in biomolecular data: Tools,
techniques, and applications. (eds. J. Wang, B. Shapiro, and D.
Shasha), pp. 30–54. Oxford University Press, Oxford, UK.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy,
S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., and
Sonnhammer, E.L.L. 2002. The Pfam protein families database.
Nucleic Acids Res. 30: 276–280.
Bent, A.F. 1996. Plant disease resistance genes: Function meets
structure. Plant Cell 8: 1757–1771.
Berger, B., Wilson, D.B., Wolf, E.T.T., Milla, M., and Kim, P.S. 1995.
Predicting coiled coils by use of pairwise residue correlations.
Proc. Natl. Acad. Sci. 92: 8259–8263.
Bryan, G.T., Wu, K.-S., Farrall, L., Jia, Y., Hershey, H.P., McAdams,
S.A., Donaldson, G.K., Tarchini, R., and Valent, B. 2000. A single
amino acid difference distinguishes resistant and susceptible
alleles of the rice blast resistance gene Pi-ta. Plant Cell
12: 2033–2046.
Burge, C. and Karlin, S. 1997. Prediction of complete gene structures
in human genomic DNA. J. Mol. Biol. 268: 78–94.
Cannon, S.B., Zhu, H., Baumgarten, A.M., Spangler, R., May, G.,
Cook, D.R., and Young, N.D. 2002. Diversity, distribution, and
ancient taxonomic relationships within the TIR and Non-TIR
NBS–LRR resistance gene subfamilies. J. Mol. Evol. 54: 548–562.
Collins, N.C., Webb, C.A., Seah, S., Ellis, J.G., Hulbert, S.H., and
Pryor, A. 1998. The isolation and mapping of disease resistance
gene analogs in maize. Mol. Plant Microbe Interact. 11: 968–978.
Deslandes, L., Olivier, J., Theulieres, F., Hirsch, J., Feng, D.X.,
Bittner-Eddy, P.D., Beynon, J., and Marco, Y. 2002. Resistance to
Ralstonia solanacearum in Arabidopsis thaliana is conferred by
the recessive RRS1-R gene, a member of a novel family of
resistance genes. Proc. Natl. Acad. Sci. 99: 2404–2409.
Dodds, P.N., Lawrence, G.J., and Ellis, J.G. 2001. Six amino acid
changes confined to the LRR B-strand/B-turn motif determine
the difference between the P and Ps rust resistance specificities in
flax. Plant Cell 13: 163–178.
Ellis, J., Dodds, P., and Pryor, T. 2000. The generation of plant
disease resistance gene specificities. Trends Plant Sci. 5: 373–379.
Ellis, J.G., Lawrence, G.J., Luck, J.E., and Dodds, P.N. 1999.
Identification of regions in alleles of the flax rust resistance gene
L that determine differences in gene-for-gene specificity. Plant
Cell 11: 495–506.
Glazebrook, J. 2001. Genes controlling expression of defense
responses in Arabidopsis—2001 status. Current Opinion in Plant
Biology 4: 301–308.
Goff, S.A., Ricke, D., Lan, T.H., Presting, G., Wang, R., Dunn, M.,
Glazebrook, J., Sessions, A., Oeller, P., Varma, H., et al. 2002. A
draft sequence of the rice genome (Oryza sativa L. ssp. japonica).
Science 296: 92–100.
Halterman, D.A., Zhou, F., Wei, F., Wise, R., and Schulze-Lefert, P.
2002. The MLA6 coiled-coil, NBS–LRR protein confers
AvrMla6-dependent resistance specificity to Blumeria graminis f.
sp. hordei in barley and wheat. Plant J. 25: 335–348.
Hammond-Kosack, K.E. and Jones, J.D.G. 1997. Plant disease
resistance genes. Annual Review of Plant Physiology and Plant
Molecular Biology 48: 575–607.
Henikoff, S., Henikoff, J.G., Alford, W.J., and Pietrokovski, S. 1995.
Automated construction and graphical presentation of protein
blocks from unaligned sequences. Gene 163: GC17–GC26.
Hulbert, S.H., Webb, C.A., Smith, S.M., and Sun, Q. 2001. Resistance
gene complexes: Evolution and utilization. Annu. Rev.
Phytopathol. 39: 285–312.
Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G., and
Gibson, T.J. 1998. Multiple sequence alignment with Clustal X.
Trends Biochem. Sci. 23: 403–405.
Leister, D., Kurth, J., Laurie, D.A., Yano, M., Sasaki, T., Devos, K.,
Graner, A., and Schulze-Lefert, P. 1998. Rapid reorganization of
resistance gene homologues in cereal genomes. Proc. Natl. Acad.
Sci. 95: 370–375.
Letunic, I., Goodstadt, L., Dickens, N.J., Doerks, T., Schultz, J., Mott,
R., Ciccarelli, F., Copley, R.R., Ponting, C.P., and Bork, P. 2002.
Recent improvements to the SMART domain-based sequence
annotation resource. Nucleic Acids Res. 30: 242–244.
Luck, J.E., Lawrence, G.J., Dodds, P.N., Shepherd, K.W., and Ellis,
J.G. 2000. Regions outside of the leucine-rich repeats of flax rust
resistance proteins play a role in specificity determination. Plant
Cell 12: 1367–1377.
Lupas, A., Van Dyke, M., and Stock, J. 1991. Predicting coiled coils
from protein sequences. Science 252: 1162–1164.
McDowell, J.M., Dhandaydham, M., Long, T.A., Aarts, M.G., Goff, S.,
Holub, E.B., and Dangl, J.L. 1998. Intragenic recombination and
diversifying selection contribute to the evolution of downy
mildew resistance at the RPP8 locus of Arabidopsis. Plant Cell
10: 1861–1874.
Meyers, B.C., Shen, K.A., Rohani, P., Gaut, B.S., and Michelmore,
R.W. 1998. Receptor-like genes in the major resistance locus of
lettuce are subject to divergent selection. Plant Cell
10: 1833–1846.
Meyers, B.C., Dickerman, A.W., Michelmore, R.W., Pecherer, R.M.,
Sivaramakrishnan, S., Sobral, B.W., and Young, N.D. 1999. Plant
disease resistance genes encode members of an ancient and
diverse protein family within the nucleotide-binding
superfamily. Plant J. 20: 317–332.
Meyers, B.C., Morgante, M., and Michelmore, R.W. 2002. TIR-X and
TIR–NBS proteins: Two new families related to disease resistance
TIR–NBS–LRR proteins encoded in Arabidopsis and other plant
genomes. Plant J. 32: 77–92.
Michelmore, R.W. and Meyers, B.C. 1998. Clusters of resistance
genes in plants evolve by divergent selection and a
birth-and-death process. Genome Res. 8: 1113–1130.
Nicholas, K.B., Nicholas Jr., H.B., and Deerfield II, D.W. 1997.
GeneDoc: Analysis and Visualization of Genetic Variation.
EMBNEW.NEWS 4: 14.
Page, R.D.M. 1996. TREEVIEW: An application to display
phylogenetic trees on personal computers. Comput. Appl. Biosci.
12: 357–358.
Pan, Q., Liu, Y.-S., Budai-Hadrian, O., Sela, M., Carmel-Goren, L.,
Zamir, D., and Fluhr, R. 2000a. Comparative genetics of
nucleotide binding site-leucine rich repeat resistance gene
homologues in the genomes of two dicotyledons: Tomato and
Arabidopsis. Genetics 155: 309–322.
Pan, Q., Wendel, J., and Fluhr, R. 2000b. Divergent evolution of
plant NBS–LRR resistance gene homologues in dicot and cereal
genomes. J. Mol. Evol. 50: 203–213.
Parniske, M., Hammond-Kosack, K.E., Golstein, C., Thomas, C.M.,
Jones, D.A., Harrison, K., Wulff, B.B.H., and Jones, J.D.G. 1997.
Novel disease resistance specificities result from sequence
exchange between tandemly repeated genes at the Cf-4/9 locus
of tomato. Cell 91: 821–832.
Perrière, G. and Gouy, M. 1996. WWW-Query: An on-line retrieval
system for biological sequence banks. Biochimie 78: 364–369.
Solovyev, V.V. 2001. Statistical approaches in Eukaryotic gene
prediction. In Handbook of statistical genetics (eds. D. Balding, et
al.), pp. 83–127. John Wiley & Sons, Ltd., New York, NY.
Sun, Q., Collins, N.C., Ayliffe, M., Smith, S.M., Drake, J., Pryor, A.,
and Hulbert, S.H. 2001. Recombination between paralogues at
the rp1 rust resistance locus in maize. Genetics 158: 423–438.
Swofford, D.L. 2002. PAUP. Phylogenetic Analysis Using Parsimony.
Version 4. Sinauer Associates, Sunderland, Massachusetts.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and
Higgins, D.G. 1997. The ClustalX windows interface: Flexible
strategies for multiple sequence alignment aided by quality
analysis tools. Nucleic Acids Res. 25: 4876–4882.
Wang, Z.X., Yano, M., Yamanouchi, U., Iwamoto, M., Monna, L.,
Hayasaka, H., Katayose, Y., and Sasaki, T. 1999. The Pib gene for
rice blast resistance belongs to the nucleotide binding and
leucine-rich repeat class of plant disease resistance genes. Plant J.
19: 55–64.
Webb, C.A., Richter, T.E., Collins, N.C., Nicolas, M., Trick, H.N.,
Pryor, T., and Hulbert, S.H. 2002. Genetic and molecular
characterization of the maize rp3 rust resistance locus. Genetics
162: 381–394.
Yoshimura, S., Yamanouchi, U., Katayose, Y., Toki, S., Wang, Z.-X.,
Kono, I., Yano, M., Iwata, N., and Sasaki, T. 1998. Expression of
Xa1, a bacterial blight-resistance gene in rice, is induced by
bacterial inoculation. Proc. Natl. Acad. Sci. 95: 1663–1668.
Young, N.D. 2000. The genetic architecture of resistance. Curr. Opin.
Plant Biol. 3: 285–290.
Yu, J., Hu, S., Wang, J., Wong, K.-S.-G., Li, S., Liu, B., Deng, Y., Dai,
L., Zhou, Y., Zhang, X., et al. 2002. A draft sequence of the rice
genome (Oryza sativa ssp. indica). Science 296: 79–92.
Zhou, F., Kurth, J., Wei, F., Elliott, C., Vale, G., Yahiaoui, N., Keller,
B., Somerville, S., Wise, R., and Schulze-Lefert, P. 2000.
Cell-autonomous expression of barley Mla1 confers race-specific
resistance to the powdery mildew fungus via a Rar1 independent
signaling pathway. Plant Cell 13: 337–350.
WEB SITE REFERENCES
http://www.ncbi.nlm.nih.gov/; National Center for Biotechnology
Information.
http://niblrrs.ucdavis.edu/At_RGenes/; database of Arabidopsis
NBS–LRR encoding disease resistance gene homologs.
http://www.rice-research.org/; Monsanto Rice Genome Sequence
Database.
http://btn.genomics.org.cn/rice/; Indica rice database from Beijing
Institute of Genomics.
Received May 22, 2002; accepted in revised form September 27, 2002.
Bai et al.
1884 Genome Research
www.genome.org

Supplementary resources (10)

... In a comprehensive analysis of NB-ARC domain-containing proteins in wheat genome, chromosome arms 4AL and 7BL were found containing the most NB-ARC domain encoding genes, which are potentially involved in leaf rust disease resistance (Chandra et al. 2017). In the rice genome, there are more than 600 putative NBS-LRR genes (Bai et al. 2002). Some of them have the potential to confer broad resistance of rice blast disease, bacterial blight disease, and sheath blight disease (Du et al. 2021;Guo et al. 2016;Wang et al. 2021). ...
Article
Full-text available
Key message Key message Three major QTLs for resistance to downy mildew were located within an 0.78 Mb interval on chromosome 8 in foxtail millet. Abstract Downy mildew, a disease caused by Sclerospora graminicola, is a serious problem that jeopardizes the yield and quality of foxtail millet. Breeding resistant varieties represents one of the most economical and effective solutions, yet there is a lack of molecular markers related to the resistance. Here, a mapping population comprising of 158 F6:7 recombinant inbred lines (RILs) was constructed from the crossing of G1 and JG21. Based on the specific locus amplified fragment sequencing results, a high-density linkage map of foxtail millet with 1031 bin markers, spanning 1041.66 cM was constructed. Based on the high-density linkage map and the phenotype data in four environments, a total of nine quantitative trait loci (QTL) associated with resistance to downy mildew were identified. Further BSR-seq confirmed the genomic regions containing the potential candidate genes related to downy mildew resistance. Interestingly, a 0.78-Mb interval between C8M257 and C8M268 on chromosome 8 was highlighted because of its presence in three major QTL, qDM8_1, qDM8_2, and qDM8_4, which contains 10 NBS-LRR genes. Haplotype analysis in RILs and natural population suggest that 9 SNP loci on Seita8G.199800, Seita8G.195900, Seita8G.198300, and Seita.8G199300 genes were significantly correlated with disease resistance. Furthermore, we found that those genes were taxon-specific by collinearity analysis of pearl millet and foxtail millet genomes. The identification of these new resistance QTL and the prediction of resistance genes against downy mildew will be useful in breeding for resistant varieties and the study of genetic mechanisms of downy mildew disease resistance in foxtail millet.
... Plants encode multiple disease-resistance (R) genes that confer resistance to insects and pathogens [3][4][5]. ...
Article
Full-text available
Background Most disease resistance (R) genes in plants encode proteins that contain leucine-rich-repeat (LRR) and nucleotide-binding site (NBS) domains, which belong to the NBS-LRR family. The sequenced genomes of Fusarium wilt-susceptible Vernicia fordii and its resistant counterpart, Vernicia montana, offer significant resources for the functional characterization and discovery of novel NBS-LRR genes in tung tree. Results Here, we identified 239 NBS-LRR genes across two tung tree genomes: 90 in V. fordii and 149 in V. montana. Five VmNBS-LRR paralogous were predicted in V. montana, and 43 orthologous were detected between V. fordii and V. montana. The orthologous gene pair Vf11G0978-Vm019719 exhibited distinct expression patterns in V. fordii and V. montana: Vf11G0978 showed downregulated expression in V. fordii, while its orthologous gene Vm019719 demonstrated upregulated expression in V. montana, indicating that this pair may be responsible for the resistance to Fusarium wilt in V. montana. Vm019719 from V. montana, activated by VmWRKY64, was shown to confer resistance to Fusarium wilt in V. montana by a virus-induced gene silencing (VIGS) experiment. However, in the susceptible V. fordii, its allelic counterpart, Vf11G0978, exhibited an ineffective defense response, attributed to a deletion in the promoter’s W-box element. Conclusions This study provides the first systematic analysis of NBS-LRR genes in the tung tree and identifies a candidate gene that can be utilized for marker-assisted breeding to control Fusarium wilt in V. fordii.
... In the current investigation, the EcNBLRRs not showed any TIR domains, on other hand ̴ 60% of EcN-BLRRs showed N-terminal located CC domain. The previous studies showed absence of TIR domain in the NBLRRs of cereal species indicating the loss of TNLs in cereal linages from the early angiosperm ancestors [36,37]. The CC-NBLRRs play a major role in assigning resistance to various fungal pathogen in crops. ...
Article
Full-text available
Background The nucleotide binding site leucine rich repeat (NBLRR) genes significantly regulate defences against phytopathogens in plants. The genome-wide identification and analysis of NBLRR genes have been performed in several species. However, the detailed evolution, structure, expression of NBLRRs and functional response to Magnaporthe grisea are unknown in finger millet (Eleusine coracana (L.) Gaertn.). Results The genome-wide scanning of the finger millet genome resulted in 116 NBLRR (EcNBLRRs1-116) encompassing 64 CC-NB-LRR, 47 NB-LRR and 5 CCR-NB-LRR types. The evolutionary studies among the NBLRRs of five Gramineae species, viz., purple false brome (Brachypodium distachyon (L.) P.Beauv.), finger millet (E. coracana), rice (Oryza sativa L.), sorghum (Sorghum bicolor L. (Moench)) and foxtail millet (Setaria italica (L.) P.Beauv.) showed the evolution of NBLRRs in the ancestral lineage of the target species and subsequent divergence through gene-loss events. The purifying selection (Ka/Ks < 1) shaped the expansions of NBLRRs paralogs in finger millet and orthologs among the target Gramineae species. The promoter sequence analysis showed various stress- and phytohormone-responsive cis-acting elements besides growth and development, indicating their potential role in disease defence and regulatory mechanisms. The expression analysis of 22 EcNBLRRs in the genotypes showing contrasting responses to Magnaporthe grisea infection revealed four and five EcNBLRRs in early and late infection stages, respectively. The six of these nine candidate EcNBLRRs proteins, viz., EcNBLRR21, EcNBLRR26, EcNBLRR30, EcNBLRR45, EcNBLRR55 and EcNBLRR76 showed CC, NB and LRR domains, whereas the EcNBLRR23, EcNBLRR32 and EcNBLRR83 showed NB and LRR somains. Conclusion The identification and expression analysis of EcNBLRRs showed the role of EcNBLRR genes in assigning blast resistance in finger millet. 
These results pave the foundation for in-depth and targeted functional analysis of EcNBLRRs through genome editing and transgenic approaches.
... Studies hav e demonstr ated an important role for this domain for pathogen recognition and signaling [ 22 , 23 ]. Plant NLRs also contain leucine-rich repeats (LRRs), whic h ar e subject to str ong div ersifying selection and show high sequence diversity even within closely related genes [ 24 ]. Studies suggest the high diversity of this region is the result of coev olution betw een host and pathogen, with se v er al studies showing specific pathogen ligand interaction at this site. ...
Article
Full-text available
Background Melaleuca quinquenervia (broad-leaved paperbark) is a coastal wetland tree species that serves as a foundation species in eastern Australia, Indonesia, Papua New Guinea, and New Caledonia. While extensively cultivated for its ornamental value, it has also become invasive in regions like Florida, USA. Long-lived trees face diverse pest and pathogen pressures, and plant stress responses rely on immune receptors encoded by the nucleotide-binding leucine-rich repeat (NLR) gene family. However, the comprehensive annotation of NLR encoding genes has been challenging due to their clustering arrangement on chromosomes and highly repetitive domain structure; expansion of the NLR gene family is driven largely by tandem duplication. Additionally, the allelic diversity of the NLR gene family remains largely unexplored in outcrossing tree species, as many genomes are presented in their haploid, collapsed state. Results We assembled a chromosome-level pseudo-phased genome for M. quinquenervia and described the allelic diversity of plant NLRs using the novel FindPlantNLRs pipeline. Analysis reveals variation in the number of NLR genes on each haplotype, distinct clustering patterns, and differences in the types and numbers of novel integrated domains. Conclusions The high-quality M. quinquenervia genome assembly establishes a new framework for functional and evolutionary studies of this significant tree species. Our findings suggest that maintaining allelic diversity within the NLR gene family is crucial for enabling responses to environmental stress, particularly in long-lived plants.
... Typical coiled-coil domains incorporate the motif "EDVID", while the RPW8-like coiled-coil CC R domains do not contain this motif [81]. Since in monocotyledonous plants, including cereals, NLR proteins incorporate only CC domains, one may assume that toll-like/interleukin-1 receptor-type domains were lost in the process of divergence from the common angiosperm ancestor [82]. In addition to ETI, there is another basic mechanism of plant resistance to pathogens, called pattern triggered immunity (PTI), that limits pathogen spreading but does not trigger cell death. ...
Article
Full-text available
Virus-specific proteins, including coat proteins, movement proteins, replication proteins, and suppressors of RNA interference are capable of triggering the hypersensitive response (HR), which is a type of cell death in plants. The main cell death signaling pathway involves direct interaction of HR-inducing proteins with nucleotide-binding leucine-rich repeats (NLR) proteins encoded by plant resistance genes. Singleton NLR proteins act as both sensor and helper. In other cases, NLR proteins form an activation network leading to their oligomerization and formation of membrane-associated resistosomes, similar to metazoan inflammasomes and apoptosomes. In resistosomes, coiled-coil domains of NLR proteins form Ca2+ channels, while toll-like/interleukin-1 receptor-type (TIR) domains form oligomers that display NAD+ glycohydrolase (NADase) activity. This review is intended to highlight the current knowledge on plant innate antiviral defense signaling pathways in an attempt to define common features of antiviral resistance across the kingdoms of life.
... To better investigate the potential function, several methods had been taken to ideally classify the NLR super-family. Based on the presence of a Toll/IL-1 Receptor-like (TIR) domain, the NLRs have been traditionally divided into TIR-NBS-LRR (TNL) and the non-TIR-NBS-LRR (nTNL) [59][60][61]. Combining exon-intron structures and DNA motif sequences, three NLR classes were determined, named TNLs, CC-NBS-LRR (CNLs), and RPW8-NBS-LRR (RNLs) [62]. By comparative analysis, the distribution of NLRs in our study was almost identical to that previously reported [63]. ...
Article
Full-text available
Rosaceae is one of the major families in the plant kingdom with important economic value. However, many of them are attacked by Valsa canker, resulting in serious loss of production and profits. Nucleotide-binding leucine-rich repeats (NLRs) play a key role in the plant immune response as the largest class of resistance genes. Currently, we performed a genome-wide identification of NLR genes in Rosaceae and revealed some NLR genes in response to Valsa canker using multispecies bioinformatics including co-expression network analysis and RNASeq data. A total of 3718 NLR genes were identified from genomes of 19 plant species (include 9 Rosaceae plants) and classified them into 15 clades. The NLRs display species- and group-specific expansions that are derived from both the whole genome duplication and the tandem duplication. Additionally, the expression of some NLR members was low under normal growth conditions in various plant tissues, while significantly enhanced after the infection of Valsa canker. Furthermore, co-expression network analysis shows that the 13 NLR members were distributed in key nodes of differentially expressed genes which could be considered as promosing key regulators for the resistance of Valsa canker. Therefore, our findings provide a reference for the evolution of NLR genes in Rosaceae and the key regulators of Valsa canker resistance.
... A total of 902 RGAs were identified from 15 flax chromosome-scale pseudomolecules (CDC Bethune v2) after the removal of redundancy. RGAugury categorized these RGAs into four major groups: NLR, RLK, RLP, and TM-CC This is different from most monocot species in which TNL genes have lost in the common ancestor of monocots during their genome evolution (Meyers et al. 1999;Pan et al. 2000;Akita and Valkonen 2002;Bai et al. 2002;Cannon et al. 2002;. A total of eight CNL and eleven TNL genes with complete structures were predicted in this assembly. ...
Chapter
Genomic selection (GS) or genomic prediction (GP) is a type of marker-assisted selection that relies on genome-wide markers to predict genomic-estimated breeding values (GEBVs) of phenotypes. GS is quickly becoming a conventional approach in both plant and animal breeding to increase selection accuracy, reduce breeding cost and shorten breeding cycles. The concept of GS models was first developed using genome-wide random markers, with marker density being a key element in estimating the predictive ability in breeding populations. It is currently straightforward to generate high-density marker datasets thanks to the remarkable advances in genotyping technologies. Recent studies showed that high-density genome-wide random markers do not necessarily generate high genomic predictive ability in GS because the vast majority of markers are unrelated to the traits of interest, thus generating background noises and lowering the predictive ability. Alternatively, the use of quantitative trait loci (QTLs), identified through genome-wide association study (GWAS) methods, in GS models can significantly improve genomic predictive ability and reduce the genotyping cost of the test populations. Here, we present recent findings, discuss a few case studies, a QTL-based GS strategy and a genomic cross-predictions for flax breeding improvement.
Article
Full-text available
Fusarium wilt in Cymbidium ensifolium, caused by Fusarium oxysporum, is highly contagious and poses a severe hazard. It significantly reduces the ornamental value of C. ensifolium and causes substantial economic losses in agricultural production. Nucleotide-binding site–leucine-rich repeat (NBS-LRR) genes are key regulatory factors in plant disease resistance responses, playing vital roles in defending against pathogen invasions. In our study, we conducted a comprehensive analysis of the NBS-LRR gene family in the genome of Cymbidium ensifolium. Phylogenetic analysis identified a total of 31 NBS-LRR genes encoding NB-ARC proteins, which were categorized into five classes (CNL, CN, NL, N, RNL) based on their protein structural domains. These genes were found to be unevenly distributed across eight chromosomes. Physicochemical analysis revealed significant variances in molecular weight and sequence length among the family members. Subcellular localization results indicated that most genes primarily reside in the cytoplasm and cell membrane, suggesting that the primary sites of disease resistance responses may be the cell membrane and cyto-plasm. Furthermore, noticeable disparities were observed in gene structures and conserved motifs among different categories of family genes. Promoter analysis indicated that cis-regulatory elements are mainly associated with plant stress, jasmonic acid, gibberellin, and other development-related factors, suggesting that CeNBS-LRR genes mainly resist external stress through hormones such as abscisic acid and jasmonic acid. We characterized twenty-seven CeNBS-LRR gene expression patterns of healthy C. ensifolium at different periods after Fusarium wilt infection, and found that those genes exhibit a temporospatial expression pattern, and that their expression is also responsive to Fusarium wilt infection. 
By analyzing the expression pattern via transcriptome and qRT-PCR, we speculated that JL006442 and JL014305 may play key roles in resisting Fusarium wilt. This study lays the groundwork and holds considerable significance as a reference for identifying disease-resistant genes and facilitating genetic breeding in C. ensifolium.
Article
Full-text available
Plants possess an arsenal of immune receptors to allow for numerous tiers of defense against pathogen attack. These immune receptors can be located either in the nucleocytoplasm or on the plant cell surface. NLR gene clusters have recently gained momentum owing to their robustness and malleability in adapting to recognize pathogens. The modular domain architecture of an NLR provides valuable clues about its arms race with pathogens. Additionally, plant NLRs have undergone functional specialization to have either one of the following roles: to sense pathogen effectors (sensor NLRs) or co-ordinate immune signaling (helper or executer NLRs). Sensor NLRs directly recognize effectors whilst helper NLRs act as signaling hubs for more than one sensor NLR to transduce the effector recognition into a successful plant immune response. Furthermore, sensor NLRs can use guard, decoy, or integrated decoy models to recognize effectors directly or indirectly. Thus, by studying a plant host’s NLR repertoire, inferences can be made about a host’s evolutionary history and defense potential which allows scientists to understand and exploit the molecular basis of resistance in a plant host. This review provides a snapshot of the structural and biochemical properties of the different classes of NLRs which allow them to perceive pathogen effectors and contextualize these findings by discussing the activation mechanisms of these NLR resistosomes during plant defense. We also summarize future directives on applications of this NLR structural biology. To our knowledge, this review is the first to collate all vast defense properties of NLRs which make them valuable candidates for study in applied plant biotechnology.
Article
Powdery mildew (PMD), caused by the pathogen Microsphaera diffusa, leads to substantial yield decreases in susceptible soybean under favorable environmental conditions. Effective prevention of soybean PMD damage can be achieved by identifying resistance genes and developing resistant cultivars. In this study, we genotyped 331 soybean germplasm accessions, primarily from Northeast China, using the SoySNP50K BeadChip, and evaluated their resistance to PMD in a greenhouse setting. To identify marker-trait associations while effectively controlling for population structure, we conducted genome-wide association studies utilizing factored spectrally transformed linear mixed models, mixed linear models, efficient mixed-model association eXpedited, and compressed mixed linear models. The results revealed seven single nucleotide polymorphism (SNP) loci strongly associated with PMD resistance in soybean. Among these, one SNP was localized on chromosome (Chr) 14, and six SNPs with low linkage disequilibrium were localized near or in the region of previously mapped genes on Chr 16. In the reference genome of Williams82, we discovered 96 genes within the candidate region, including 17 resistance (R)-like genes, which were identified as potential candidate genes for PMD resistance. In addition, we performed quantitative real-time reverse transcription polymerase chain reaction analysis to evaluate the gene expression levels in highly resistant and susceptible genotypes, focusing on leaf tissues collected at different times after M. diffusa inoculation. Among the examined genes, three R-like genes, including Glyma.16G210800, Glyma.16G212300, and Glyma.16G213900, were identified as strong candidates associated with PMD resistance. This discovery can significantly enhance our understanding of soybean resistance to PMD. Furthermore, the significant SNPs strongly associated with resistance can serve as valuable markers for genetic improvement in breeding M. diffusa-resistant soybean cultivars.
Article
At least six rust resistance specificities (P and P1 to P5) map to the complex P locus in flax. The P2 resistance gene was identified by transposon tagging and transgenic expression. P2 is a member of a small multigene family and encodes a protein with nucleotide binding site (NBS) and leucine-rich repeat (LRR) domains and an N-terminal Toll/interleukin-1 receptor (TIR) homology domain, as well as a C-terminal non-LRR (CNL) domain of 150 amino acids. A related CNL domain was detected in almost half of the predicted Arabidopsis TIR-NBS-LRR sequences, including the RPS4 and RPP1 resistance proteins, and in the tobacco N protein, but not in the flax L and M proteins. Presence or absence of this domain defines two subclasses of TIR-NBS-LRR resistance genes. Truncations of the P2 CNL domain cause loss of function, and evidence for diversifying selection was detected in this domain, suggesting a possible role in specificity determination. A spontaneous rust-susceptible mutant of P2 contained a G → E amino acid substitution in the GLPL motif, which is conserved in the NBS domains of plant resistance proteins and the animal cell death control proteins APAF-1 and CED4, providing direct evidence for the importance of this motif in resistance gene function. A P2 homologous gene isolated from a flax line expressing the P resistance specificity encodes a protein with only 10 amino acid differences from the P2 protein. Chimeric gene constructs indicate that just six of these amino acid changes, all located within the predicted β-strand/β-turn motif of four LRR units, are sufficient to alter P2 to the P specificity.
Chapter
We are in the midst of an explosive increase in the number of DNA and protein sequences available for study, as various genome projects come on line. This wealth of information offers important opportunities for understanding many biological processes and developing new plant and animal models, and ultimately drugs, for human diseases, in addition to other applications of modern biotechnology. Unfortunately, sequences are accumulating at a pace that strains present methods for extracting significant biological information from them. A consequence of this explosion in the sequence databases is that there is much interest and effort in developing tools that can efficiently and automatically extract the relevant biological information in sequence data and make it available for use in biology and medicine. In this chapter, we describe one such method that we have developed based on algorithms from artificial intelligence research. We call this software tool MEME (Multiple Expectation-maximization for Motif Elicitation). It has the attractive property that it is an “unsupervised” discovery tool: it can identify motifs, such as regulatory sites in DNA and functional domains in proteins, from large or small groups of unaligned sequences. As we show below, motifs are a rich source of information about a dataset; they can be used to discover other homologs in a database, to identify protein subsets that contain one or more motifs, and to provide information for mutagenesis studies to elucidate structure and function in the protein family as well as its evolution. Learning tools are used to extract higher level biological patterns from lower level DNA and protein sequence data. In contrast, search tools such as BLAST (Basic Local Alignment Search Tool) take a given higher level pattern and find all items in a database that possess the pattern. 
Searching for items that have a certain pattern is a problem intrinsically easier than discovering what the pattern is from items that possess it. The patterns considered here are motifs, which for DNA data can be subsequences that interact with transcription factors, polymerases, and other proteins.
Article
We have developed a World Wide Web (WWW) version of the sequence retrieval system Query: WWW-Query. This server allows users to query nucleotide sequence banks in the EMBL/GenBank/DDBJ formats and protein sequence banks in the NBRF/PIR format. WWW-Query includes all the features of the on-line sequence browsers already available: the ability to build complex queries, integration of cross-references with different data banks, and access to the functional zones of biological interest. It also provides original services not available elsewhere: the notion of re-usable sequence lists, integration of dedicated helper applications for visualizing alignments and phylogenetic trees, and links with multivariate methods for studying codon usage or for complementing phylogenies.
Article
Disease resistance genes in plants are often found in complex multigene families. The largest known cluster of disease resistance specificities in lettuce contains the RGC2 family of genes. We compared the sequences of nine full-length genomic copies of RGC2 representing the diversity in the cluster to determine the structure of genes within this family and to examine the evolution of its members. The transcribed regions range from at least 7.0 to 13.1 kb, and the cDNAs contain deduced open reading frames of ~5.5 kb. The predicted RGC2 proteins contain a nucleotide binding site and irregular leucine-rich repeats (LRRs) that are characteristic of resistance genes cloned from other species. Unique features of the RGC2 gene products include a bipartite LRR region with >40 repeats. At least eight members of this family are transcribed. The level of sequence diversity between family members varied in different regions of the gene. The ratio of nonsynonymous (Ka) to synonymous (Ks) nucleotide substitutions was lowest in the region encoding the nucleotide binding site, which is the presumed effector domain of the protein. The LRR-encoding region showed an alternating pattern of conservation and hypervariability. This alternating pattern of variation was also found in all comparisons within families of resistance genes cloned from other species. The Ka/Ks ratios indicate that diversifying selection has resulted in increased variation at these codons. The patterns of variation support the predicted structure of LRR regions with solvent-exposed hypervariable residues that are potentially involved in binding pathogen-derived ligands.
Article
Pathogen resistance (R) genes of the NBS-LRR class (for nucleotide binding site and leucine-rich repeat) are found in many plant species and confer resistance to a diverse spectrum of pathogens. Little is known about the mechanisms that drive NBS-LRR gene evolution in the host-pathogen arms race. We cloned the RPP8 gene (for resistance to Peronospora parasitica) and compared the structure of alleles at this locus in resistant Landsberg erecta (Ler-0) and susceptible Columbia (Col-0) accessions. RPP8-Ler encodes an NBS-LRR protein with a putative N-terminal leucine zipper and is more closely related to previously cloned R genes that confer resistance to bacterial pathogens than it is to other known RPP genes. The RPP8 haplotype in Ler-0 contains the functional RPP8-Ler gene and a nonfunctional homolog, RPH8A. In contrast, the rpp8 locus in Col-0 contains a single chimeric gene, which was likely derived from unequal crossing over between RPP8-Ler and RPH8A ancestors within a Ler-like haplotype. Sequence divergence among RPP8 family members has been accelerated by positive selection on the putative ligand binding region in the LRRs. These observations indicate that NBS-LRR molecular evolution is driven by the same mechanisms that promote rapid sequence diversification among other genes involved in non-self-recognition.
Article
Thirteen alleles (L, L1 to L11, and LH) from the flax L locus, which encode Toll/interleukin-1 receptor homology–nucleotide binding site–leucine-rich repeat (TIR-NBS-LRR) rust resistance proteins, were sequenced and compared to provide insight into their evolution and into the determinants of gene-for-gene resistance specificity. The predicted L6 and L11 proteins differ solely in the LRR region, whereas L6 and L7 differ solely in the TIR region. Thus, specificity differences between alleles can be determined by both the LRR and TIR regions. Functional analysis in transgenic plants of recombinant alleles constructed in vitro provided further information: L10–L2 and L6–L2 recombinants, encoding the LRR of L2, conferred L2 resistance specificity, and an L2–L10 recombinant, encoding the LRR of L10, conferred a novel specificity. The sequence comparisons also indicate that the evolution of L alleles has probably involved reassortment of variation, resulting from accumulated point mutations, by intragenic recombination. In addition, large deletion events have occurred in the LRR-encoding regions of L1 and L8, and duplication events have occurred in the LRR-encoding region of L2.
Article
The rice blast resistance (R) gene Pi-ta mediates gene-for-gene resistance against strains of the fungus Magnaporthe grisea that express avirulent alleles of AVR-Pita. Using a map-based cloning strategy, we cloned Pi-ta, which is linked to the centromere of chromosome 12. Pi-ta encodes a predicted 928-amino acid cytoplasmic receptor with a centrally localized nucleotide binding site. A single-copy gene, Pi-ta shows low constitutive expression in both resistant and susceptible rice. Susceptible rice varieties contain pi-ta⁻ alleles encoding predicted proteins that share a single amino acid difference relative to the Pi-ta resistance protein: serine instead of alanine at position 918. Transient expression in rice cells of a Pi-ta⁺ R gene together with AVR-Pita⁺ induces a resistance response. No resistance response is induced in transient assays that use a naturally occurring pi-ta⁻ allele differing only by the serine at position 918. Rice varieties reported to have the linked Pi-ta2 gene contain Pi-ta plus at least one other R gene, potentially explaining the broadened resistance spectrum of Pi-ta2 relative to Pi-ta. Molecular cloning of the AVR-Pita and Pi-ta genes will aid in deployment of R genes for effective genetic control of rice blast disease.