Article

The Legume Information System (LIS): An integrated information resource for comparative legume biology


Abstract

The Legume Information System (LIS) (http://www.comparative-legumes.org), developed by the National Center for Genome Resources in cooperation with the USDA Agricultural Research Service (ARS), is a comparative legume resource that integrates genetic and molecular data from multiple legume species, enabling cross-species genomic and transcript comparisons. The LIS virtual plant interface allows simplified and intuitive navigation of transcript data from Medicago truncatula, Lotus japonicus, Glycine max and Arabidopsis thaliana. Transcript libraries are represented as images of plant organs in different developmental stages, which are selected to query the analyzed and annotated data. Complex queries can be accomplished by adding modifiers, keywords and sequence names. The LIS also contains annotated genomic data featuring transcript alignments to validate gene predictions as well as motif and similarity analyses. The genomic browser supports comparative analysis via novel dynamic functional annotation comparisons. CMap, developed as part of the GMOD project (http://www.gmod.org/cmap/index.shtml), has been incorporated to support comparative analyses of community linkage and physical map data. LIS is being expanded to incorporate gene expression and biochemical pathways, which will be seamlessly integrated to form a knowledge discovery framework.


... None were found in the variable gene set. The identified genes were all confirmed to group with the M. truncatula genes in the gene family trees on the Legume Information System genomic data portal (Dash et al., 2016; Gonzales et al., 2004; https://legumeinfo.org/). These included all genes currently known to be involved in rhizobium symbiosis. ...
... The legume gene family trees on the Legume Information System portal (https://legumeinfo.org/; Gonzales et al., 2004; Dash et al., 2016) for the M. truncatula and L. angustifolius genes were used to validate that these were true homologues. ...
Article
Full-text available
Narrow‐leafed lupin (NLL; Lupinus angustifolius) is a key rotational crop for sustainable farming systems, whose grain is high in protein content. It is a gluten‐free, non‐GM, alternative protein source to soybean and as such has gained interest as a human food ingredient. Here, we present a chromosome‐length reference genome for the species and a pan‐genome assembly comprising 55 NLL lines, including Australian and European cultivars, breeding lines and wild accessions. We present the core and variable genes for the species and report on the absence of essential mycorrhizal‐associated genes. The genome and pan‐genomes of NLL and its close relative white lupin (L. albus) are compared. Furthermore, we provide additional evidence supporting LaRAP2‐7 as the key alkaloid regulatory gene for NLL and demonstrate that the NLL genome is underrepresented in classical NLR disease resistance genes compared to other sequenced legume species. The NLL genomic resources generated here, coupled with previously generated RNA‐seq datasets, provide new opportunities to fast‐track lupin crop improvement.
... the annotation of putative targets was studied based on the annotation pipeline available at Legume Information Resources (https://legumeinfo.org/annot) (Gonzales et al., 2005). Further, GO term classification was done using the CateGOrizer tool (Na et al., 2014). ...
... The annotation pipeline included several computational tools based on three reference genomes, namely Arabidopsis thaliana, Medicago truncatula, and Glycine max. Lentil microRNA targets were classified into functional GO terms using the LIS legume resource annotation tool server (Gonzales et al., 2005). The classification of retrieved GO terms was carried out with CateGOrizer using the Plant_GOslim classification with the consolidated single occurrences count method (Na et al., 2014). ...
Article
MicroRNAs (miRNAs) are a class of endogenous, non-coding small RNAs that are associated with the regulation of gene expression in eukaryotes. In plants, a few miRNAs are highly conserved and may share a common ancestor from the early stages of evolution. This fact allows the detection of conserved miRNAs in various plant species, especially in those that lack genome sequence information. Though the draft genome of the orphan crop Lens culinaris Medik. (Lentil) is published, its complete genome assembly is still underway. In this computational study, an EST- and GSS-based comparative genomics approach was used to identify miRNAs in Lentil. The approach was based on a search for sequence similarity followed by a series of filtering steps to provide reliable and precise results and eliminate false-positive predictions. This study reports 24 miRNAs from 10,190 ESTs and 715 GSSs in Lentil. Further, it sought to pinpoint the 83 likely target genes of Lentil miRNAs and their most probable functions using the Legume Information Resource (LIS) annotation pipeline. The newly identified miRNAs were mainly found to regulate genes that encode transcription factors and key enzymes involved in metabolic processes as well as oxidation-reduction processes. Many of the target genes were found to be associated with plant growth and development, stress response, defense and hormone signaling pathways. The miRNAs reported here are expected to advance the Lentil miRNAome, as this computational study offers a first view of miRNA research in Lentil.
... To anchor the map to V. faba, Medicago truncatula, and chickpea, marker sequences were searched against the complete coding sequences (CDS) of the V. faba assembly for cultivar Hedin GCA_948472305.1 (Jayakodi et al., 2023), using BLASTn v.2.2.29 (Altschul et al., 1990; Camacho et al., 2009) and CDS of M. truncatula A17 assembly v. 4.0 (Young et al., 2011), with parameter -task blastn-short, e-value set to 1e−5, and parameter -num-alignments set to 2. QTLs from the linkage map of Gela et al. (2022) were anchored according to their marker coordinates within the V. faba genome. In addition, all mapped markers were searched on the Legume Information System's blastn-server (Gonzales et al., 2005) running BLASTn with default settings against the CDS databases of M. truncatula A17v5 (Pecrix et al., 2018; medtr.A17.gnm5.ann1_6.L2RX) and Cicer arietinum Frontier v1 (Varshney et al., 2013; cicar.CDCFrontier.gnm1.GkHc). ...
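The search settings quoted in this excerpt can be illustrated with a short sketch. It is not the cited authors' script: the file names `markers.fasta`, `vfaba_cds` and `markers_vs_cds.txt` are hypothetical placeholders, and the function name is invented for illustration. The sketch assumes a local NCBI BLAST+ installation, where the alignment limit is spelled `-num_alignments` (with an underscore) and the e-value cutoff is `-evalue`. It only assembles the command line; it does not run BLAST.

```python
import shlex


def build_blastn_short_command(query_fasta, subject_db, out_path,
                               evalue="1e-5", num_alignments=2):
    """Assemble a blastn argument list for short marker sequences.

    Mirrors the settings quoted in the excerpt: task blastn-short,
    e-value cutoff 1e-5, at most two alignments reported per query.
    All paths are hypothetical placeholders supplied by the caller.
    """
    return [
        "blastn",
        "-task", "blastn-short",                  # tuned for short queries
        "-query", query_fasta,                    # marker sequences (FASTA)
        "-db", subject_db,                        # BLAST database of CDS
        "-evalue", str(evalue),                   # significance cutoff
        "-num_alignments", str(num_alignments),   # BLAST+ spelling of the flag
        "-out", out_path,
    ]


cmd = build_blastn_short_command("markers.fasta", "vfaba_cds",
                                 "markers_vs_cds.txt")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute the search, provided `blastn` is on the PATH and the database has been built with `makeblastdb`.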
Article
Full-text available
Introduction: Chocolate spot, caused by the ascomycete fungus Botrytis fabae, is a devastating foliar disease and a major constraint on the quality and yield of faba beans (Vicia faba). The use of fungicides is the primary strategy for controlling the disease. However, high levels of partial genetic resistance have been identified and can be exploited to mitigate the disease. Methods: The partially resistant V. faba cultivar Maris Bead and susceptible Egyptian accession ig70726 were crossed, and a genetic mapping population of 184 individuals was genotyped in the F2 generation and screened for resistance to B. fabae infection in the F3, F5, and F6 generations in a series of field experiments. A high-density linkage map of V. faba containing 3897 DArT markers spanning 1713.7 cM was constructed. Results: Multiple candidate quantitative trait loci (QTLs) in 11 separate regions of the V. faba genome were identified; some on chromosomes 2, 3, and 6 overlapped with loci previously linked to resistance to Ascochyta leaf and pod blight caused by the necrotrophic fungus Ascochyta fabae. A transcriptomics experiment was conducted at 18 h post-inoculation in seedlings of both parents of the mapping population, identifying several differentially expressed transcripts potentially involved in early stage defence against B. fabae, including cell-wall associated protein kinases, NLR genes, and genes involved in metabolism and response to reactive oxygen species. Discussion: This study identified several novel candidate QTLs in the V. faba genome that contribute to partial resistance to chocolate spot, but differences between growing seasons highlighted the importance of multi-year phenotyping experiments when searching for candidate QTLs for partial resistance.
... Whole-genome, transcriptome and genotype data can also be submitted to most of the GGB databases such as Genome Database for Rosaceae (72,73), CottonGen (74,75), SoyBase (76,77), Legume Information System (78,79), Sol Genomics Network (80,81), MaizeGDB (82,83), TreeGenes (84,85), the Arabidopsis Information Resource (TAIR) (86,87), KnowPulse (88) and InterMine (89,90) (Table 2). Some of these databases, such as Gramene (91–93), SorghumBase (94) and InterMine (89,95), do not accept data from authors but obtain from the primary databases. ...
Article
Full-text available
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021–22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we identify a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
... The current version of Phytozome (v13.0) has ~103 plant genomics sequences, and the evolutionary history of each gene at the gene structure, enzyme, expression, nucleotide, and amino acid sequence level (https://phytozome-next.jgi.doe.gov/). The Phytozome database has a large data set from different plant genomes associated with various databases such as LIS for legumes, Gramene for grasses, TAIR for Arabidopsis, SGN for Solanaceae, and GDR for Rosaceae, GreenPhylDB, Plaza and PlantGDB (Bombarely et al. 2011; Proost et al. 2009; Conte et al. 2008; Swarbreck et al. 2008; Jung et al. 2008; Liang et al. 2008; Gonzales et al. 2005). Furthermore, all gene sets in the Phytozome database are annotated using different databases (e.g. ...
Chapter
Genomics, transcriptomics, and proteomics data mining are some of the interdisciplinary and emerging fields of bioinformatics that are evolving rapidly, making it difficult to predict the magnitude and pace of change. Bioinformatics tools have become an important means for researchers to conduct research. Such tools provide easy access to large datasets spanning genomes, transcriptomes, proteomes, epigenomes, and other ‘omics’ generated over the last decade. This chapter describes the use of bioinformatics tools found in numerous databases, such as Bio-Analytic Resource for Plant Biology, Protein Sequence Analysis and Classification Database (InterPro), Comparative Toxicogenomics Database, Plant Comparative Genomics Portal (Phytozome), Kyoto Encyclopedia of Genes and Genomes, The Arabidopsis Information Resource, and ExplorEnz-Enzyme database (IUBMB). The chapter defines data mining as an efficient method of extracting new information or new knowledge from large data sets or databases. New information that reflects the underlying biological processes can potentially be useful in biotechnology and biochemical research. Finally, this chapter briefly discusses some of these tools and scenarios that may be useful to scientific researchers.
... Whole genome, transcriptome, and genotype data can also be submitted to most of the GGB databases such as GDR (30–32), CottonGen (33,34), SoyBase (35,36), LIS (37,38), SGN (39,40), MaizeGDB (41,42), TreeGenes (43,44), TAIR (45,46), KnowPulse (47), and InterMine (48,49) databases (Table 2). Some of these databases, such as Gramene (50–52), SorghumBase (53) and InterMine (48,54), do not accept data from authors but obtain from the primary databases. ...
Preprint
Full-text available
The Genotype-Phenotype Working Group was established in November 2021 as part of the AgBioData Consortium (https://www.agbiodata.org) with the goal of identifying current challenges in annotating and integrating large-scale genotype and phenotype data. Over the course of the year, the members of this working group identified different types of data sets, explored experimental platforms and methods for data generation, and examined how these data are annotated including the metadata requirements. We conducted a thorough review of publicly funded repositories for raw and processed data for each data type. We also examined several secondary databases and knowledgebases that enable the integration of heterogeneous data types in the context of the Genome Browser, Pathway Networks and tissue-specific gene expression. The review revealed a need for additional infrastructural support, standards, and tools to connect Genotype to Phenotype data and enhance data interoperability for knowledge synthesis and to foster translational research.
... (Lamesch et al., 2012) and soybean OMTs from Legume Information System (https://legumeinfo.org/) (Gonzales et al., 2005). A. ipaensis and A. duranensis OMT sequences were obtained from the PeanutBase database (https://www.peanutbase.org/home) ...
Article
Full-text available
Cultivated peanut (Arachis hypogaea) is a leading protein and oil-providing crop and food source in many countries. At the same time, it is affected by a number of biotic and abiotic stresses. O-methyltransferases (OMTs) play important roles in secondary metabolism, biotic and abiotic stress tolerance. However, the OMT genes have not been comprehensively analyzed in peanut. In this study, we performed a genome-wide investigation of A. hypogaea OMT genes (AhOMTs). Gene structure, motifs distribution, phylogenetic history, genome collinearity and duplication of AhOMTs were studied in detail. Promoter cis-elements, protein-protein interactions, and micro-RNAs targeting AhOMTs were also predicted. We also comprehensively studied their expression in different tissues and under different stresses. We identified 116 OMT genes in the genome of cultivated peanut. Phylogenetically, AhOMTs were divided into three groups. Tandem and segmental duplication events played a role in the evolution of AhOMTs, and purifying selection pressure drove the duplication process. AhOMT promoters were enriched in several key cis-elements involved in growth and development, hormones, light, and defense-related activities. Micro-RNAs from 12 different families targeted 35 AhOMTs. GO enrichment analysis indicated that AhOMTs are highly enriched in transferase and catalytic activities, cellular metabolic and biosynthesis processes. Transcriptome datasets revealed that AhOMTs possessed varying expression levels in different tissues and under hormones, water, and temperature stress. Expression profiling based on qRT-PCR results also supported the transcriptome results. This study provides the theoretical basis for further work on the biological roles of AhOMT genes for developmental and stress responses.
... (Lamesch et al., 2012), and Legume Information System LIS (https://legacy.legumeinfo.org/) (Gonzales et al., 2005). The protein sequence was also submitted to ScanProsite and NCBI databases to analyze the protein binding sites and functional domains. ...
Article
Full-text available
Peanut is an important oil and food legume crop grown in more than one hundred countries, but its yield and quality are often impaired by different pathogens and diseases, especially aflatoxins, which jeopardize human health and cause global concern. For better management of aflatoxin contamination, we report the cloning and characterization of a novel A. flavus-inducible promoter of the O-methyltransferase gene (AhOMT1) from peanut. The AhOMT1 gene was identified as the most highly induced gene upon A. flavus infection through genome-wide microarray analysis and verified by qRT-PCR analysis. The AhOMT1 gene was studied in detail, and its promoter, fused with the GUS gene, was introduced into Arabidopsis to generate homozygous transgenic lines. Expression of the GUS gene was studied in transgenic plants under infection by A. flavus. Analysis of the AhOMT1 gene by in silico assay, RNAseq, and qRT-PCR revealed minute expression in different organs and tissues, with trace or no response to low temperature, drought, hormones, Ca2+, and bacterial stresses, but high induction by A. flavus infection. It contains four exons encoding 297 aa predicted to transfer the methyl group of S-adenosyl-L-methionine (SAM). The promoter contains different cis-elements responsible for its expression characteristics. Functional characterization of AhOMT1P in transgenic Arabidopsis plants demonstrated highly inducible behavior only under A. flavus infection. The transgenic plants did not show GUS expression in any tissue(s) without inoculation of A. flavus spores. However, GUS activity increased significantly after inoculation of A. flavus and maintained a high level of expression after 48 hours of infection. These results provide a novel way for future management of peanut aflatoxin contamination by driving resistance genes in an A. flavus-inducible manner.
... Legume Information System (LIS), the genomic data portal for all legume plants sequenced so far, has been used in this study for information on the legume JAZs. LIS provides support to GenScan (Burge and Karlin 1997) software to initiate gene prediction, based on genomic sequences, delineate their gene structures, exon-intron organizations, and comparative analysis (Gonzales et al. 2005). Therefore, the Gene Models of LIS present legume gene accession, gene location, conserved domains, exon-intron structures, alternatively spliced variants, and isoforms of the genes. ...
Article
Full-text available
The plant-specific TIFY transcription factor family is characterized by a highly conserved TIFY domain. Four sub-families, ZML, PPD, TIFY, and JAZ (Jasmonate ZIM domain), participate in a wide diversity of developmental and stress-responsive processes. Here, 29 Arachis hypogaea (peanut) TIFY family genes are identified and characterized by conserved domain distribution, exon–intron pattern, and phylogenetic clustering. 6 ZMLs, 2 TIFYs, 1 PPD, and 20 JAZ sub-family genes are confirmed in the AhTIFY family. Since allotetraploid peanut contains two progenitor sub-genomes, 14 homeologous gene pairs are obtained, also showing positional or functional homeology, based on comparative chromosomal localization and comparative tissue expression, respectively. Transcriptomic studies of Peanutbase show tissue-specific expression, drought, and nematode-responsiveness of AhJAZs. This reveals the contribution of the JA-signaling pathway to the development and stress management of the globally important oil crop peanut, and ultimately to enhancing its agricultural production. In connection with AhJAZs, this study has scrutinized legume JAZs as a whole to identify a few interesting features of the legume JAZs, viz., (a) the absence of JAZ5, 10, 11 and (b) the presence of a Jas intron in the JAZ1s, subjected to alternative splicing, in contrast with Arabidopsis. Most intriguingly, in 13 legume JAZ12s, predicted peptides from the in-frame coding sequence containing three-nucleotide periodicity and a termination codon within the Jas intron/3′-intron show sequence similarity to rhizobial ABC transporter-type proteins. Predicted peptides appear to be encoded by these intronic sORFs to modulate rhizobial transporter proteins and thereby to influence the symbiotic relation of the legume crops through a unique way of JAZ utilization in the legumes.
... The Legume Information System (LIS) was developed in the early 2000s, as a collaborative project between the USDA-Agricultural Research Service and the National Center for Genome Resources [1]. It initially focused on transcript resources, which were a predominant data type at the time; but as data resources have matured and expanded, LIS has evolved to accommodate new research needs and available types of data [2]. ...
Chapter
Full-text available
In this chapter, we introduce the main components of the Legume Information System (https://legumeinfo.org) and several associated resources. Additionally, we provide an example of their use by exploring a biological question: is there a common molecular basis, across legume species, that underlies the photoperiod-mediated transition from vegetative to reproductive development, that is, days to flowering? The Legume Information System (LIS) holds genetic and genomic data for a large number of crop and model legumes and provides a set of online bioinformatic tools designed to help biologists address questions and tasks related to legume biology. Such tasks include identifying the molecular basis of agronomic traits; identifying orthologs/syntelogs for known genes; determining gene expression patterns; accessing genomic datasets; identifying markers for breeding work; and identifying genetic similarities and differences among selected accessions. LIS integrates with other legume-focused informatics resources such as SoyBase (https://soybase.org), PeanutBase (https://peanutbase.org), and projects of the Legume Federation (https://legumefederation.org).
... LIS was developed as a collaboration between the National Center for Genome Resources and the USDA Agricultural Research Service (Gonzales et al. 2005, Berendzen et al. 2021). The platform offers a variety of genetic and genomic tools for more than 20 legume species. ...
Technical Report
Full-text available
The number of genomic resources for Phaseolus and the legume family has undergone unprecedented growth in recent years. Many of these resources, including marker-trait associations, have been assigned to a range of genome assemblies and genetic maps that may not be readily comparable among experiments. This obstructs the accessibility of many promising results, particularly for those working with different genetic data types or distinct species. The Legume Information System (LIS) offers a continuously updated, highly integrated platform for comparing genetic and phenotypic data among distinct genome assemblies and species. Recent updates in the data deposition system (available at https://legumeinfo.org/submit_data) now facilitate the process of adding QTL mapping or GWAS data to the repository. These data can then be quickly and easily compared using a variety of LIS tools. We propose here a community-led curation of genotypic and phenotypic data that will greatly increase the impact of deposited data among Phaseolus research labs and across the legume community.
... (Young and Udvardi, 2009). Several studies have shown conserved synteny among the cool season legumes, particularly between M. truncatula and lucerne (Kaló et al., 2004; Chandran et al., 2008) and pea (Macas et al., 2007), as well as between the major papilionoid clades (Gonzales et al., 2005; Bertioli et al., 2009). Thus the conservation of genome structure and function between legume species has facilitated cross-species transfer of genetic markers (Fredslund et al., 2006; Gupta and Prasad, 2009) and microarray chips (Frickey et al., 2008). ...
Chapter
This book offers an extensive reference on the recent developments made in major food legumes. It offers exhaustive information on various aspects related to history, origin and evolution, botany, breeding objectives and methods, hybrid technology, doubled-haploid breeding and in vitro techniques; and on recent developments made through biotechnology, genetic engineering and molecular approaches.
... doe.gov/pz/portal.html#), Vigna Genome Server (https://viggs.dna.affrc.go.jp/), and LIS - Legume Information System (https://legumeinfo.org) in the genomes of the main representatives of the tribe Phaseoleae: pigeonpea Cajanus cajan (L.) Millsp., soybean Glycine max (L.) Merr., common bean Phaseolus vulgaris L., adzuki bean V. angularis, mung bean V. radiata, and cowpea V. unguiculata (accessed Jan. 20, 2020) (Gonzales et al., 2005; Goodstein et al., 2012; Kersey et al., 2014; Sakai et al., 2015, 2016). The multiple alignment of nucleotide and amino acid sequences was made using MULTALIN v5.4.1 (http://multalin.toulouse.inra.fr/multalin/) ...
Article
Full-text available
The type of stem growth is one of the key features in determining plant architectonics. Stem growth type is an economically important trait. It is interconnected with stem length, flowering duration, yield, resistance to lodging, and suitability for mechanized cultivation. Mutations in the TFL1 gene and its homologs have been demonstrated to change meristem indeterminacy across genera. The aim of this work was to characterize and compare the structural organization of TFL1-like genes in representatives of the tribe Phaseoleae (pigeonpea Cajanus cajan, soybean Glycine max, common bean Phaseolus vulgaris, adzuki bean Vigna angularis, mung bean V. radiata, and cowpea V. unguiculata) based on in silico analysis, including analysis of nucleotide sequences, predicted elements in promoter regions, predicted amino acid sequences, putative functional domains and 3D protein structures. We investigated TFL1 (one gene for adzuki bean, four copies for soybean, two copies for other studied species), ATC (two copies for soybean, one gene for other investigated species), and BFT (two copies for soybean, one gene for other studied species) gene family members found in whole-genome sequence databases available for representatives of the tribe Phaseoleae. The presence of duplicated copies for all genes in soybean may be a result of the last genome duplication event during the evolution of this species. Duplication of the TFL1 gene to two copies in most of the studied species of the tribe Phaseoleae is probably accompanied by the maintenance of the functional state of these genes. The exception is VrTFL1.2 of V. radiata, which has likely lost its functionality. This work broadens the existing data about the number of gene copies, their structural divergence and evolution, and the expected functional differences. This information will be important for understanding the molecular genetic mechanisms underlying the maintenance of indeterminacy in the growth of the shoot apical meristem, as well as in the control of the transition to the reproductive phase of plant development.
... Data warehouse systems, such as InterMine [338], provide more powerful query options but are relatively difficult to configure and generally do not come with data visualization options. Previously, data warehouse systems like InterMine and genome browsers like JBrowse have been combined into custom one-off data portals for model organisms, such as Araport for Arabidopsis thaliana [339], the Legume Information System for legumes [340] or Wormbase for Caenorhabditis elegans and related nematodes [341]. However, setting up a custom data portal for each new genome is inefficient and time consuming. ...
... The mission of the Legume Information System (LIS; https://legumeinfo.org) is to facilitate research and crop improvement for the many legume species that are important in global agriculture. LIS is a collaborative project between the USDA-Agricultural Research Service and the National Center for Genome Resources (Dash et al., 2016; Gonzales et al., 2005; Gonzales, Gajendran, Farmer, Archuleta, & Beavis, 2007). LIS houses genetic and genomic data for 30 legume species, with extensive support for 18 of these, as of late 2020. ...
Article
Full-text available
The Legume Information System (LIS; https://legumeinfo.org) houses genetic and genomic data, integrated in various online tools to allow comparative genomic analyses. The website and database maintain data for more than two dozen species, particularly focusing on crop and model species and holding data for other diverse species of taxonomic interest. Major analysis features include genome browsers, sequence‐search tools, legume‐focused gene families and a phylogenetic tree viewer, a gene annotation service (which places a submitted gene into a gene family and phylogenetic tree), an interactive microsynteny and pan‐genome viewer, a novel viewer of genetic variant data, genetic maps and viewers, a Data Store for data sets such as assemblies and annotations, InterMine instances for querying genetic and genomic data, and a tool for viewing geographic distributions of germplasm accessions. LIS also integrates with several other legume data resources and tools, including PeanutBase (https://peanutbase.org), SoyBase (https://soybase.org), Medicago Hapmap (https://medicagohapmap2.org), Alfalfa Breeder's Toolbox (https://alfalfatoolbox.org), and the Legume Federation (https://legumefederation.org).
... noble.org/LegumeIP) [109] and Legume Information System (LIS) (https://legumeinfo.org/) [110]. ...
Article
Full-text available
Background: Seed weight is a complex yield-related trait for which many quantitative trait loci (QTL) have been reported through linkage mapping studies. Integrating QTL from linkage mapping into breeding programs is challenging due to numerous limitations; genome-wide association studies (GWAS) locate QTL more precisely owing to higher resolution and the greater genetic diversity of unrelated individuals. Results: The present study used a population of 573 breeding lines with 61,166 single nucleotide polymorphisms (SNPs) to identify quantitative trait nucleotides (QTNs) and candidate genes for seed weight in Chinese summer-sowing soybean. GWAS was conducted with two single-locus models (SLMs) and six multi-locus models (MLMs). Thirty-nine SNPs were detected by the two SLMs while 209 SNPs were detected by the six MLMs. In all, 231 QTNs were found to be associated with seed weight in YHSBLP with various effects. Of these, 70 SNPs were concurrently detected by both SLMs and MLMs on 8 chromosomes. Ninety-four QTNs co-localized with QTL/QTN previously reported by linkage/association mapping studies. A total of 36 candidate genes were predicted. Of these candidate genes, four hub genes (Glyma06g44510, Glyma08g06420, Glyma12g33280 and Glyma19g28070) were identified by integrating the co-expression network. Three of them, all except Glyma12g33280, were expressed at relatively higher levels in the high HSW genotypes than in the low HSW genotypes at the R5 stage. Our results show that using more models, especially MLMs, is effective for finding important QTNs, and the identified HSW QTNs/genes could be utilized in molecular breeding work for soybean seed weight and yield. Conclusion: Applying two single-locus and six multi-locus GWAS models identified 231 QTNs. Four hub genes (Glyma06g44510, Glyma08g06420, Glyma12g33280 and Glyma19g28070) were detected among the predicted candidate genes by integrating the co-expression network.
... This can be used to detect associations among linkage groups in a map and to display marker connections among homoeologous chromosomes, as well as among the maps reported in several publications. A legume genome database, the Legume Information System, was announced to incorporate species-specific data and allow cross-legume assessments (Gonzales et al. 2005). The Legume Information System will allow evaluation of synteny among several species and also includes sequence data, making it possible to explore genes expressed in diverse tissues and under various physiological conditions. ...
Chapter
Cultivated peanut (Arachis hypogaea L.), a vital source of proteins and nutrient-rich fodder for livestock, is considered globally as a major oilseed crop. Being a segmental allopolyploid with AABB genome conformation, the cultivated peanut is considered to have evolved through single interspecific hybridization amid two diploid species. A number of biotic and abiotic forces restrict the production and productivity of peanut. Intensive attempts to develop superior peanut varieties with inherent tolerance/resistance and enriched nutritional components were executed to combat stress factors in fulfilling the requirements of farmers and consumers. Breeding objectives in the past were achieved mainly through mass and pure-line selections. Subsequently to accomplish breeding objectives, peanut breeders employed backcross and pedigree approaches followed by inter- and intra-specific hybridization in a considerable way. Simultaneously, peanut breeding through the mutagenic approach played a noteworthy part during the development of multiple propitious high-yielding varieties. Traditional breeding approaches helped in identification and advancement of cultivars with inherent resistant traits, but such resistance traits are tightly linked with inferior pod and kernel characteristics that are extremely challenging to break. Under non-conventional approaches, several molecular breeding techniques were successfully attempted to break this barrier. Marker-assisted selection (MAS) and transformation of genes coding the traits of interest, overlaying the way of gene insertion, assisted significantly in establishing superior varieties of peanut with inherent resistance and enhanced pod and kernel features. Among all efficient markers, microsatellite markers were extensively employed in constructing linkage maps, genotyping as well as MAS, owing to the distinguishable and co-dominance nature of these markers. 
A number of reproducible molecular markers were developed that are associated with salinity and drought tolerance, as well as resistance to biotic stresses like rust, and leaf spots, and to a certain extent Sclerotinia blight etc. Agrobacterium-mediated genetic transformations, via in planta or particle-bombardment approaches, have resulted in development of transgenic peanuts with enhanced yield attributes and inherent resistance against a few biotic and abiotic stresses. Such genetically transformed peanut populations could also be employed as donor parents in traditional breeding system to develop fungal and a few virus disease tolerant varieties. Nevertheless, it could be suggested that a combination of breeding and biotechnological tools and approaches, might deliver an inherent, cost-effective, as well as eco-friendly solutions in developing better peanut varieties globally.
... For example, the International Legume Database of Nodulation (ILDON; Appendix 1) builds on one of the original ILDIS Phase 2 modules for root-nodulation data and on the legume species checklist in the World Checklist of Selected Plant Families (see below). A second example is the Legume Information System (LIS), focused on legume crops, which integrates genetic, genomic and trait data across legume species, enabling cross-species genomic and transcript comparisons and facilitating crop improvement (Gonzales et al. 2005;Dash et al. 2016). Other legume portals, focused on particular clades (e.g. ...
Article
The need for scientists to exchange, share and organise data has resulted in a proliferation of biodiversity research-data portals over recent decades. These cyber-infrastructures have had a major impact on taxonomy and helped the discipline by allowing faster access to bibliographic information, biological and nomenclatural data, and specimen information. Several specialised portals aggregate particular data types for a large number of species, including legumes. Here, we argue that, despite access to such data-aggregation portals, a taxon-focused portal, curated by a community of researchers specialising on a particular taxonomic group and who have the interest, commitment, existing collaborative links, and knowledge necessary to ensure data quality, would be a useful resource in itself and make important contributions to more general data providers. Such an online species-information system focused on Leguminosae (Fabaceae) would serve useful functions in parallel to and different from international data-aggregation portals. We explore best practices for developing a legume-focused portal that would support data sharing, provide a better understanding of what data are available, missing, or erroneous, and, ultimately, facilitate cross-analyses and direct development of novel research. We present a history of legume-focused portals, survey existing data portals to evaluate what is available and which features are of most interest, and discuss how a legume-focused portal might be developed to respond to the needs of the legume-systematics research community and beyond. We propose taking full advantage of existing data sources, informatics tools and protocols to develop a scalable and interactive portal that will be used, contributed to, and fully supported by the legume-systematics community in the easiest manner possible.
... Data warehouse systems, such as InterMine (Smith et al., 2012), provide more powerful query options but are relatively difficult to configure and generally do not come with data visualization options. Previously, data warehouse systems like InterMine and genome browsers like JBrowse have been combined into custom one-off data portals for model organisms, such as Araport for Arabidopsis thaliana (Krishnakumar et al., 2015), the Legume Information System for legumes (Gonzales et al., 2005) or Wormbase for Caenorhabditis elegans and related nematodes (Stein et al., 2001). However, setting up a custom data portal for each new genome is inefficient and time consuming. ...
Article
Analysis and comparison of genomic and transcriptomic data sets have become standard procedures in biological research. However, for non-model organisms no efficient tools exist to visually work with multiple genomes and their metadata, and to annotate such data in a collaborative way. Here we present GeneNoteBook: a web based collaborative notebook for comparative genomics. GeneNoteBook allows experimental and computational researchers to query, browse, visualize and curate bioinformatic analysis results for multiple genomes. GeneNoteBook is particularly suitable for the analysis of non-model organisms, as it allows for comparing newly sequenced genomes to those of model organisms. Availability and implementation: GeneNoteBook is implemented as a node.js web application and depends on MongoDB and NCBI BLAST. Source code is available at https://github.com/genenotebook/genenotebook. Additionally, GeneNoteBook can be installed through Bioconda and as a Docker image. Supplementary information: Full installation instructions and online documentation are available at https://genenotebook.github.io. Supplementary text is available at Bioinformatics online.
... The available number of plant sequences and genome databases have grown up around different clades, like TAIR for Arabidopsis [177], Gramene for grasses [178], LIS for legumes [179], Phytozome [180], Plaza [181], and PlantGDB [182], for instance. These databases and web portals provide a set of tools and automated analyses across plant genomes, providing the identification of putative gene families, transferring functional information from model plants to plants of agricultural, industrial and environmental importance [183,184]. ...
Article
The plasma membrane forms a permeable barrier that separates the cytoplasm from the external environment, defining the physical and chemical limits in each cell in all organisms. The movement of molecules and ions into and out of cells is controlled by the plasma membrane as a critical process for cell stability and survival, maintaining essential differences between the composition of the extracellular fluid and the cytosol. In this process aquaporins (AQPs) figure as important actors, comprising highly conserved membrane proteins that carry water, glycerol and other hydrophilic molecules through biomembranes, including the cell wall and membranes of cytoplasmic organelles. While mammals have 15 types of AQPs described so far (displaying 18 paralogs), a single plant species can present more than 120 isoforms, providing transport of different types of solutes. Such aquaporins may be present in the whole plant or can be associated with different tissues or situations, including biotic and especially abiotic stresses, such as drought, salinity or tolerance to soils rich in heavy metals, for instance. The present review addresses several aspects of plant aquaporins, from their structure, classification, and function, to in silico methodologies for their analysis and identification in transcriptomes and genomes. Aspects of evolution and diversification of AQPs (with a focus on plants) are approached for the first time with the aid of the LCA (Last Common Ancestor) analysis. Finally, the main practical applications involving the use of AQPs are discussed, including patents and future perspectives involving this important protein family.
... The call to integrate genetic and genomic resources for all legume species has been met by legumeinfo.org (LIS) (Dash et al. 2016;Gonzales et al. 2005;Waugh et al. 2001). Twelve sequenced legume species are available and have been annotated and compared with a multitude of analyses. ...
Chapter
Comparative genomics is the leveraging of genomic data between species to understand the evolution of genomes and species. With the increasing availability of genomics resources (genomes, transcriptomes, epigenomes, proteomes, etc.), opportunities exist to explore species relationships using comparative genomics. Comparative genomics is most commonly used to determine structural and functional variation between genomes. Traditional approaches that study genomes in isolation are limiting in both the kind of questions that can be answered, as well as the transferability of knowledge between species. Herein, we will address the recent advances in comparative genomics research, specifically in legumes, and how this wealth of knowledge can further expand our understanding of biological diversity. Comparative genomics can be performed at the genic or at genomic level, for which there are numerous workflows to exploit, including gene prediction and annotation, orthologous gene relationships, building gene and species phylogenetic trees, synteny, finding lineage specific genes, and pan-genomic analyses.
... (19), 33 plant genomes were downloaded from Phytozome 12.0 (http://www.phytozome.net) (20), four legume genomes were obtained from Legume Information System (https://legumeinfo.org) (21), two plant genomes were acquired from PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza) (22), and the 107 remaining genomes were retrieved from Ensembl (http://www.ensembl.org/) ...
Article
Real-time quantitative polymerase chain reaction (qPCR) is one of the most important methods for analyzing the expression patterns of target genes. However, successful qPCR experiments rely heavily on the use of high-quality primers. Various qPCR primer databases have been developed to address this issue, but these databases target only a few important organisms. Here, we developed the qPrimerDB database, founded on an automatic gene-specific qPCR primer design and thermodynamics-based validation workflow. The qPrimerDB database is the most comprehensive qPCR primer database available to date, with a web front-end providing gene-specific and pre-computed primer pairs across 147 important organisms, including human, mouse, zebrafish, yeast, thale cress, rice and maize. In this database, we provide 3331426 of the best primer pairs for each gene, based on primer pair coverage, as well as 47760359 alternative gene-specific primer pairs, which can be conveniently batch downloaded. The specificity and efficiency of the qPCR primer pairs were validated for 66 randomly selected genes, in six different organisms, through qPCR assays and gel electrophoresis. The qPrimerDB database represents a valuable, timesaving resource for gene expression analysis. This resource, which will be routinely updated, is publicly accessible at http://biodb.swu.edu.cn/qprimerdb.
... There are many community-specific databases, which typically contain high-quality information and address the needs of a particular research community. Prominent examples include databases that cater to researchers studying model organisms (Lawrence et al., 2005) and clade-oriented comparative databases (Gonzales et al., 2005). Databases focused on specific types of data, such as metabolism (Zhang et al., 2005) or protein modification (Tchieu et al., 2003), are also examples of community-specific databases. ...
Article
Bioinformatics plays an important role in agricultural science. As the amount of data grows exponentially, there is a parallel growth in demand for tools and methods for the visualization, integration, analysis, prediction and management of data. At the same time, many researchers in the plant sciences are unfamiliar with the available bioinformatics methods, databases, and tools, which could lead to missed opportunities or misinterpretation. This review describes some key concepts of the software packages, methods, and databases used in bioinformatics. It discusses some problems related to biological databases and biological sequence analyses, and covers gene finding, genome annotation, types of biological databases, and how data are represented and stored. Future perspectives for bioinformatics tools are also discussed.
... Other comparative genomics databases are GreenPhylDB, Plaza and PlantGDB. Plant genome databases for specific plant groups have also been developed, such as TAIR, Gramene, SGN, GDR and LIS, for Arabidopsis, grasses, Solanaceae, Rosaceae and legumes, respectively (Swarbreck et al. 2008; Liang et al. 2008; Bombarely et al. 2011; Jung et al. 2008; Gonzales et al. 2005). ...
Chapter
Sugarcane (Saccharum spp.) is a major crop grown for sugar and biofuel in tropical and subtropical regions around the world. Sugarcane has a high level of polyploidy and a large, complex genome. Demand for sugarcane is increasing steadily worldwide; meeting it requires improving sugarcane yield, sucrose content, growth rate, and abiotic and biotic stress tolerance, among other traits. Researchers have used conventional breeding efficiently to improve sugarcane for many years. The present situation demands improvement of sugarcane varieties at a faster rate than conventional breeding can provide. Faster improvement is possible only when researchers understand the genome of the plant. Genetic and genome studies have opened a better path to developing improved varieties. Understanding the sugarcane genome can help breeders support conventional breeding by selecting the parents and traits needed. In spite of its complexity, the sugarcane genome has been studied successfully, and good progress has been made in the recent past through a genome sequencing strategy based on bacterial artificial chromosome (BAC) libraries. Genetic diversity among sugarcane species has been studied using RFLP, AFLP, RAPD, SRAP, TRAP and other marker systems. In the late 1990s, the fluorescent in situ hybridization (FISH) technique was used to physically map two S. officinarum and three S. robustum clones. Later, many other clones were studied using the molecular cytogenetic technique of FISH. Quantitative trait loci (QTLs) have been used to screen varieties for sugar content, sugar yield, disease resistance, etc. Researchers in Brazil have developed the SUCEST database, which consists of over 230,000 Expressed Sequence Tags (ESTs) that can be used for detecting molecular polymorphisms, gene expression profiling and gene discovery. In this chapter we discuss the progress made so far and the challenges faced in studying the sugarcane genome.
... Legumes are excellent vegetable sources of proteins and oils. Moreover they are considered as organic fertilizers because of their ability to fix atmospheric nitrogen (Gonzales et al., 2005). Legumes contribute 33% of the dietary protein nitrogen (N) needed for humans and they are also rich in fiber and energy (Vance et al., 2000; Graham and Vance, 2003). ...
... Figure 2 describes the process of translational genomics and breeding in legume crops using genic information from soybean. This process is now feasible owing to the availability of several legume-specific databases such as the Legume Information System (LIS) (Gonzales et al., 2005; Dash et al., 2016), LegumeIP 2.0, SoyBase (Grant et al., 2010) and the Soybean Knowledge Base (SoyKB). The first two databases, LIS and LegumeIP 2.0, facilitate translational genomics and breeding in major legumes by integrating genetic, genomic and transcriptomic data and comparative genomics across important legume crops, while SoyBase and SoyKB facilitate translational genomics and breeding mainly in soybean. ...
Article
Food legumes play an important role in attaining both food and nutritional security along with sustainable agricultural production for the well-being of humans globally. The various traits of economic importance in legume crops are complex and quantitative in nature, which are governed by quantitative trait loci (QTLs). Mapping of quantitative traits is a tedious and costly process, however, a large number of QTLs has been mapped in soybean for various traits albeit their utilization in breeding programmes is poorly reported. For their effective use in breeding programme it is imperative to narrow down the confidence interval of QTLs, to identify the underlying genes, and most importantly allelic characterization of these genes for identifying superior variants. In the field of functional genomics, especially in the identification and characterization of gene responsible for quantitative traits, soybean is far ahead from other legume crops. The availability of genic information about quantitative traits is more significant because it is easy and effective to identify homologs than identifying shared syntenic regions in other crop species. In soybean, genes underlying QTLs have been identified and functionally characterized for phosphorous efficiency, flowering and maturity, pod dehiscence, hard-seededness, α-Tocopherol content, soybean cyst nematode, sudden death syndrome, and salt tolerance. Candidate genes have also been identified for many other quantitative traits for which functional validation is required. Using the sequence information of identified genes from soybean, comparative genomic analysis of homologs in other legume crops could discover novel structural variants and useful alleles for functional marker development. The functional markers may be very useful for molecular breeding in soybean and harnessing benefit of translational research from soybean to other leguminous crops. 
Thus, soybean crop can act as a model crop for translational genomics and breeding of quantitative traits in legume crops. In this review, we summarize current status of identification and characterization of genes underlying QTLs for various quantitative traits in soybean and their significance in translational genomics and breeding of other legume crops.
... While no legume genome sequence has been completely determined, projects are underway for several including bean, pea, soybean, alfalfa, and the two model legumes, M. truncatula and L. japonicus. Most such projects involve sequencing cDNAs, usually referred to as Expressed Sequence Tags (ESTs), while the gene space of both model legumes is being sequenced using genomic clones (Alkharouf and Matthews, 2004;Cannon et al., 2005;Cronk et al., 2006;Gonzales et al., 2005;Town, 2006;. ESTs have been used to construct arrays (see below) to jump start analysis of the plant transcriptome. ...
Article
With the sequencing of entire genomes it has become technically feasible to study transcription on a global scale. Accessing an organism's transcriptional profile provides a glimpse into its inner workings. Transcriptional studies help determine how an organism adapts to diverse environments and how it interacts with other organisms. In the symbiosis between rhizobial bacteria and legume plants, the two organisms must be able to adapt to various environmental stresses and communicate to form a mutually beneficial relationship. The study of global gene expression during this nitrogen-fixing symbiosis has confirmed results of earlier studies and has shed new light on the molecular players involved in this complex, highly choreographed interaction.
... The website was originally available at http://comparative-legumes.org (3,6), but was moved in 2013 to http://legumeinfo.org. Along with the change in domain name, the new genomic data portal (GDP) is implemented with Tripal (7,8), which consists of a set of Drupal modules for developing GDPs, and Chado (9), a database schema for biological data. ...
Poster
LIS (legumeinfo.org) is a resource for trait genetics and comparative genomics for legumes. The site hosts annotated genomes for nine species: common bean, chickpea, pigeonpea, Medicago truncatula, Lotus japonicus, mungbean, soybean (SoyBase.org) and two Arachis species (PeanutBase.org). A major effort at LIS is to leverage data from information-rich species, such as soybean and Medicago, to aid the interpretation of data from other species, using phylogenetic and synteny-based approaches. Genes from all hosted genomes have been placed into ~18,500 gene families - searchable and viewable as gene trees and multiple alignments. These families enable traversal among orthologous and paralogous sequences across the legumes. This complements functional annotations based on protein domains and multi-species microsynteny views using a genome “context viewer” showing genomic regions with similar local gene content and ordering. Chromosome-scale synteny blocks are presented in per-species genome browsers. The other emphasis at LIS is integration of genetic and genomic data. QTLs from many studies (so far focused on common bean and peanut) have been collected and integrated into a common database, and projected onto composite genetic maps (in CMap) when possible. Molecular markers are being mapped on to the genome to make possible traversal from traits to genome, and vice versa. Germplasm data is also being incorporated in readiness for future sequence-based genotyping and phenotype data. LIS, funded by the USDA-ARS and jointly developed with NCGR, is a major component of the NSF-funded Legume Federation project that promotes sharing common resources and standards among its member databases.
Article
With the increasing availability of large-scale biology data in crop plants, there is an urgent demand for a versatile platform that fully mines and utilizes the data for modern molecular breeding. We present Crop-GPA ( https://crop-gpa.aielab.net ), a comprehensive and functional open-source platform for crop gene-phenotype association data. The current Crop-GPA provides well-curated information on genes, phenotypes, and their associations (GPAs) to researchers through an intuitive interface, dynamic graphical visualizations, and efficient online tools. Two computational tools, GPA-BERT and GPA-GCN, are specifically developed and integrated into Crop-GPA, facilitating the automatic extraction of gene-phenotype associations from bio-crop literature and predicting unknown relations based on known associations. Through usage examples, we demonstrate how our platform enables the exploration of complex correlations between genes and phenotypes in crop plants. In summary, Crop-GPA serves as a valuable multi-functional resource, empowering the crop research community to gain deeper insights into the biological mechanisms of interest.
Chapter
The advancement of sequencing technologies and molecular techniques has generated a huge amount of biological data, which needs to be analyzed and interpreted with the help of machine learning and artificial intelligence-based methods. Bioinformatics tools and databases such as NCBI-BLAST, Ensembl Plants, the Galaxy platform and RAP-DB play a major role in understanding the functional genomics and molecular systems of many crops. Bioinformatics tools and databases have emerged as a crucial platform for handling the huge volume of data generated by omics technologies such as genomics, transcriptomics, proteomics and metabolomics, and are used to draw logical conclusions about the problem at hand. Bioinformatics in agriculture, also known as agro-informatics, plays an important and increasing role in deciphering genomic and related information for crop improvement. Some of the important genomic tools and databases are also discussed in this chapter.
Chapter
Global agricultural productivity is regulated by soil salinity, one of the major abiotic constraints faced by the farmers, growers and breeders. The genomic, transcriptomics, proteomics and metabolomic salt profile of coastal plants could provide an insight into the mechanisms by which the differential performance is regulated in contrasting varieties of a single crop. This study proposes the construction of an ionome suitable for the coastal saline region for sustainable food security. This study focuses on functional genomic studies of saline belt crops and meta-analysis of the information for proposed ionome (Salt-omic) development. In the salt-genome segment, the appropriate genes were identified and categorized covering ion-transport-genes, senescence-associated genes, molecular-chaperones, dehydration-genes. Proteome provides additional information on protein coding sequences, endogenous small molecules. The identified genes, proteins and signalling pathways could form an ionome repository for molecular crop breeding programmes. The primary bioinformatics web source along with a customized database for several crops were found useful for identifying essential biomolecules. The study was able to assist in the formation of agri-ionome for the improvement of coastal crops. The sequential integration of agri-engineering model along with omic details could be utilized for the construction of an explicit repository for molecular plant breeders in a way similar to AMBAB, LIS, Pulsechip or RiceMetaSys.
Chapter
Dry beans, a nutrient-dense dietary staple in Africa, Latin America, and the Caribbean, deliver nutrients such as protein, minerals, and folate, which are often in short supply in other staples. Beans are relatively rich in iron and zinc, two micronutrients for which dietary deficiencies impact billions of people globally. Wide genetic variability in beans seeds, from ~34 to 96 mg/kg for iron and 21 to 59 mg/kg for zinc, led to the recognition that biofortification of beans for maximum levels of these micronutrients is possible through plant breeding. Biofortification efforts to develop bean varieties with seed iron concentrations approaching 90 mg/kg have been underway since the early 2000s. Iron and zinc levels in seeds are positively correlated with each other, and although iron has been the major focus of biofortification efforts, zinc is often evaluated alongside iron. Germplasm diversity screenings have revealed multiple high iron sources in cultivated Andean and Middle American beans as well as wild P. vulgaris and genotypes from closely related species P. dumosus, P. acutifolius, and P. parvifolius. Both seed iron and zinc are moderately heritable traits, and breeding with high iron donor parents based on phenotypic selection has been successfully utilized to achieve genetic gains. To date, at least 60 high iron bean varieties have been released over 12 countries in Eastern and Southern Africa and Latin America. Bean breeders have combined the high iron trait with other traits important to farmers, including seed yield, disease resistance, and abiotic stress tolerance. The application of genomic approaches in breeding high iron beans has been limited. While numerous seed iron and zinc Quantitative Trait Loci (QTL) studies have been undertaken and a meta-analysis identified 12 meta-QTL, 8 of which are for both increased iron and zinc, there has not been much traction in incorporation of these QTL in breeding strategies. 
Since iron and zinc are quantitative traits controlled by many small-effect QTL, breeders have not found marker-assisted breeding with single or multiple QTL worthwhile. A genomic prediction approach, which, in contrast, utilizes thousands of random markers throughout the genome, may be a promising strategy to apply to breeding high iron and zinc beans, and is currently being explored. The prospect of using a transgenic approach to develop high iron and zinc beans is limited at this time by challenges with plant regeneration and public acceptance of genetically modified (GMO) beans, although this may change in the future, and there are many potential candidate genes. The future of biofortification of beans with iron must also look beyond a pure focus on increasing concentration as this approach relies on the assumption that higher iron yields deliver more absorbable iron. To date, one human efficacy study has demonstrated a positive, although slight, effect of biofortification on human iron status. Regardless of concentration, iron from beans can have very low bioavailability due to seed coat polyphenols and phytic acid present in the cotyledons. Evidence from in vitro and animal studies suggests that beans without inhibitory polyphenols and with promoter polyphenols would have higher iron bioavailability and thus deliver more iron. Therefore, redefining biofortification to focus on both iron bioavailability and iron concentration simultaneously in breeding programs has the potential to deliver substantially more nutritional benefits to consumers. The introduction of varieties labeled as high iron beans in Africa and Latin America has largely been met with interest and adoption by farmers and consumers due to strong promotion and the development of varieties with superior yield and disease resistance.
Going forward, in addition to focusing on iron bioavailability, a greater focus should also be placed on zinc.
Keywords: Fe fortification; Zn fortification; Dry beans; QTL; Biofortified varieties
Article
Components of the plant immune signaling network need mechanisms that confer resilience against fast‐evolving pathogen effectors that target them. Among eight Arabidopsis CaM‐Binding Protein (CBP) 60 family members, AtCBP60g and AtSARD1 are partially functionally redundant, major positive immune regulators, and AtCBP60a is a negative immune regulator. We investigated possible resilience‐conferring evolutionary mechanisms among the CBP60a, CBP60g and SARD1 immune regulatory subfamilies. Phylogenetic analysis was used to investigate the times of CBP60 subfamily neofunctionalization. Then, using the pairwise distance rank based on the newly developed analytical platform Protein Evolution Analysis in a Euclidean Space (PEAES), hypotheses of specific coevolutionary mechanisms that could confer resilience on the regulator module were tested. The immune regulator subfamilies diversified around the time of angiosperm divergence and have been evolving very quickly. We detected significant coevolutionary interactions across the immune regulator subfamilies in all of 12 diverse core eudicot species lineages tested. The coevolutionary interactions were consistent with the hypothesized coevolution mechanisms. Despite their unusually fast evolution, members across the CBP60 immune regulator subfamilies have influenced the evolution of each other long after their diversification in a way that could confer resilience on the immune regulator module against fast‐evolving pathogen effectors.
Article
The 2020 World Population Data Sheet estimates that 30 years from now, the population will reach approximately 9.9 billion. The exponential rise in population has led to the agitation of land and the environment. Abrupt changes in climatic conditions and increasing global population emphasize the propagation of novel crops and acclimatization of the available plants and crops to obtain sufficient food in the extended future. In addition to 'Omics', bioinformatics plays an essential role in understanding the underlying mechanism of molecular functional systems in various plants. Bioinformatics contributes towards multidisciplinary interactions and helps to reshape agricultural tradition and production, providing knowledge for enhanced plant quality, and it also provides a plan of action for protection against adverse environmental conditions. This review examines recent approaches in systems biology for predicting the functionality of genes and their networks (GRNs), along with their influence on current research involving various disciplines like genomics, proteomics, transcriptomics, phenomics, interactomics, ionomics, epigenomics and metabolomics, which will be very useful to researchers in plant sciences.
Chapter
Phytohormones play a crucial role in regulating plant developmental processes. Among them, ethylene and jasmonate are known to be involved in plant defense responses to a wide range of biotic stresses as their levels increase with pathogen infection. In addition, these two phytohormones have been shown to inhibit plant nodulation in legumes. Here, exogenous salicylic acid (SA), jasmonic acid (JA), and ethephon (ET) were applied to the root system of Casuarina glauca plants before Frankia inoculation, in order to analyze their effects on the establishment of actinorhizal symbiosis. This protocol further describes how to identify putative ortholog genes involved in ethylene and jasmonate biosynthesis and/or signaling pathways in plants, using the Arabidopsis Information Resource (TAIR), Legume Information System (LIS), and Genevestigator databases. The expression of these genes in response to the bacterium Frankia was analyzed using the gene atlas for Casuarina–Frankia symbiosis (SESAM web site).
Chapter
Medicago truncatula emerged in 1990 as a model for legumes, which comprise the third largest land plant family. Most legumes form symbiotic nitrogen-fixing root nodules with compatible soil bacteria and thus are important contributors to the global nitrogen cycle and sustainable agriculture. Legumes and legume products are important sources for human and animal protein as well as for edible and industrial oils. In the years since M. truncatula was chosen as a legume model, many genetic, genomic, and molecular resources have become available, including reference quality genome sequences for two widely used genotypes. Accessibility of genomic data is important for many different types of studies with M. truncatula as well as for research involving crop and forage legumes. In this chapter, we discuss strategies to obtain archived M. truncatula genomic data originally deposited into custom databases that are no longer maintained but are now accessible in general databases. We also review key current genomic databases that are specific to M. truncatula as well as those that contain M. truncatula data in addition to data from other plants.
Article
Next-generation sequencing and traditional Sanger sequencing methods are of great significance in unraveling the complexity of plant genomes. These are constantly generating heaps of sequence data to be analyzed, annotated and stored. This has created a revolutionary demand for bioinformatics tools and software that can perform these functions. A large number of potentially useful bioinformatics tools and plant genome databases have been created that have greatly simplified the analysis and storage of vast amounts of sequence data. The information garnered using the available bioinformatics methods has greatly helped in understanding the plant genome structure. Despite the availability of a good number of such tools, the information pouring from single gene-sequencing and various whole-genome sequencing projects is overwhelming; thus, further innovations and improved methods are needed to sift through this sequence data and assemble genomes. The current review focuses on diverse bioinformatics approaches and methods developed to systematically analyze and store plant sequence data. Finally, it outlines the bottlenecks in plant genome analysis, and some possible solutions that could be utilized to overcome the problems associated with plant genome analysis.
Article
White lupin (Lupinus albus L.) is a valuable source of seed protein, carbohydrates and oil, but requires genetic improvement to attain its agronomic potential. This study aimed to (i) develop a new high-density consensus linkage map based on new, transcriptome-anchored markers; (ii) map four important agronomic traits, namely, vernalization requirement, seed alkaloid content, and resistance to anthracnose and Phomopsis stem blight; and, (iii) define regions of synteny between the L. albus and narrow-leafed lupin (L. angustifolius L.) genomes. Mapping of white lupin quantitative trait loci (QTLs) revealed polygenic control of vernalization responsiveness and anthracnose resistance, as well as a single locus regulating seed alkaloid content. We found high sequence collinearity between white and narrow-leafed lupin genomes. Interestingly, the white lupin QTLs did not correspond to previously mapped narrow-leafed lupin loci conferring vernalization independence, anthracnose resistance, low alkaloids and Phomopsis stem blight resistance, highlighting different genetic control of these traits. Our suite of allele-sequenced and PCR validated markers tagging these QTLs is immediately applicable for marker-assisted selection in white lupin breeding. The consensus map constitutes a platform for synteny-based gene cloning approaches and can support the forthcoming white lupin genome sequencing efforts.
Article
The combinatorial interaction of a receptor kinase and a modified CLE peptide is involved in several developmental processes in plants, including Autoregulation of Nodulation (AON), which allows legumes to limit the number of root nodules formed based on available nitrogen and previous rhizobial colonization. Evidence supports modification of CLE peptides by enzymes of the hydroxyproline O-arabinosyltransferase (HPAT/RDN) family. Here we show by grafting and genetic analysis that in the AON pathway, RDN1, functioning in the root, acts upstream of the receptor kinase SUNN, functioning in the shoot. As expected for a glycosyltransferase, we found that RDN1 and RDN2 proteins are localized to the Golgi, as was shown previously for AtHPAT1. Using composite plants with transgenic hairy roots, we show that RDN1 and RDN2 orthologs from dicots as well as a related RDN gene from rice can rescue the phenotype of rdn1-2 when expressed constitutively, but the less related MtRDN3 cannot. The timing of the induction of MtCLE12 and MtCLE13 peptide genes (negative regulators of AON) in nodulating roots is not altered by mutation of RDN1 or SUNN, although expression levels are higher. Plants with transgenic roots constitutively expressing MtCLE12 require both RDN1 and SUNN to prevent nodule formation, while plants constitutively expressing MtCLE13 require only SUNN, suggesting the two CLEs have different requirements for function. Combined with previous work, the data support a model in which RDN1 arabinosylates MtCLE12, and this modification is necessary for transport and/or reception of the AON signal by the SUNN kinase.
Chapter
In this chapter, we introduce the latest development of LegumeIP: a platform of comparative genomics and transcriptomics, and then describe some practical usages of the LegumeIP for studying gene functions, molecular mechanisms underpinning the plant-rhizobia interactions, and genome evolution with respect to nitrogen fixing in several agriculturally important model legume species. LegumeIP currently hosts large-scale genomics and transcriptomics data that include (i) genomic sequences of three model legumes, Medicago truncatula, Glycine max (soybean), Lotus japonicus, and two reference plant species, Arabidopsis thaliana and Populus trichocarpa, with the annotation based on UniProt, InterProScan, Gene Ontology, and KEGG (Kyoto Encyclopedia of Genes and Genomes) databases, comprising a total of 222,217 protein-coding gene sequences; (ii) large-scale compendium gene expression data sets compiled from various tissues of multiple species. These include 104 microarray data sets from L. japonicus, 156 microarray data sets from M. truncatula gene atlas database, and 14 RNA-seq data sets from G. max. These data are further compiled centering on four tissues: nodules, flowers, roots, and leaves being shared by all species; (iii) systematic synteny analysis among M. truncatula, G. max, L. japonicus, and A. thaliana; (iv) reconstruction of gene family and gene family-wide phylogenetic analysis across the five hosted species; and (v) genome-wide reconstruction of gene coexpression networks. The usefulness of this platform in facilitating molecular research of legume species is demonstrated by two case studies, in which SymRK (symbiosis receptor-like kinase) genes for symbiosis analysis and nitrogen-fixation-related genes in M. truncatula were identified through integrative analysis of gene expression and constructed coexpression networks provided by the LegumeIP platform. The LegumeIP is freely available at http://plantgrn.noble.org/LegumeIP/.
Chapter
Genome duplication, widespread in flowering plants, is a driving force in evolution. Genome alignments between/within genomes facilitate identification of homologous regions and individual genes to investigate evolutionary consequences of genome duplication. PGDD (the Plant Genome Duplication Database), a public web service database, provides intra- or interplant genome alignment information. At present, PGDD contains information for 47 plants whose genome sequences have been released. Here, we describe methods for identification and estimation of dates of genome duplication and speciation by functions of PGDD. The database is freely available at http://chibba.agtec.uga.edu/duplication/
Chapter
This chapter presents a use case illustrating the search for homologues of a known protein in species-specific genome sequence databases. The results from different species-specific resources are compared to each other and to results obtained from a more general genome sequence database (Phytozome). Various options and settings relevant when searching these databases are discussed. For example, it is shown how the choice of reference sequence set in a given database influences the results one obtains. The provided examples illustrate some problems and pitfalls related to interpreting results obtained from species-specific genome sequence databases.
Article
Lotus japonicus is a well-characterized model legume widely used in the study of plant-microbe interactions. However, datasets from various Lotus studies are poorly integrated and lack interoperability. We recognize the need for a comprehensive repository that allows comprehensive and dynamic exploration of Lotus genomic and transcriptomic data. Equally important are user-friendly in-browser tools designed for data visualization and interpretation. Here, we present Lotus Base, which opens to the research community a large, established LORE1 insertion mutant population containing an excess of 120,000 lines, and serves the end-user tightly integrated data from Lotus, such as the reference genome, annotated proteins, and expression profiling data. We report the integration of expression data from the L. japonicus gene expression atlas project, and the development of tools to cluster and export such data, allowing users to construct, visualize, and annotate co-expression gene networks. Lotus Base takes advantage of modern advances in browser technology to deliver powerful data interpretation for biologists. Its modular construction and publicly available application programming interface enable developers to tap into the wealth of integrated Lotus data. Lotus Base is freely accessible at: https://lotus.au.dk.
Chapter
Drought stress induces a vast array of responses in plants that require the use of integrative and multidisciplinary approaches to understand the different levels of regulation. Holistic systems biology approaches still remain unexploited, which is especially important for plant and agricultural sciences. Given the increasing development of high-throughput genomic tools and concomitant progress on plant genome sequencing, it is now possible to gain quantitative information at a comprehensive scale and a quantitative overview on the gene-to-metabolite networks that are associated with a particular plant phenotype. Systems biology aims to find regulatory mechanisms controlling gene expression, to identify candidate genes and molecular markers to support promissory strategies to engineer and/or breed plants with desired traits such as enhanced quality. Most of the systems biology approaches rely upon three main axes representing the multiple layers of the regulation of gene expression: transcriptomics, proteomics, and metabolomics. Coupled with the study of the noncoding genome, new insights into the regulation of gene expression are being provided as well as their effects on the phenotypic changes in a specific biological context. Bioinformatics tools have been crucial in omics-based research to manage genome-wide datasets, extract valuable information, and facilitate knowledge exchange between model and crop species. The present chapter reviews the use of system biology approaches undertaken to understand drought stress response in plants, providing a critical discussion on the constraints and future prospects of using these approaches to address the current needs of agriculture in a context of climate change.
Article
Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and enable the association of heritable traits with underlying genomic variation. Molecular marker technology has developed rapidly over the last decade and two forms of sequence based marker, Simple Sequence Repeats (SSRs), also known as microsatellites, and Single Nucleotide Polymorphisms (SNPs) now predominate applications in modern genetic analysis. The reducing cost of DNA sequencing has led to the availability of large sequence data sets derived from whole genome sequencing and large scale Expressed Sequence Tag (EST) discovery that enable the mining of SSRs and SNPs, which may then be applied to diversity analysis, genetic trait mapping, association studies, and marker assisted selection. These markers are inexpensive, require minimal labour to produce and can frequently be associated with annotated genes. Here we review automated methods for the discovery of SSRs and SNPs and provide an overview of the diverse applications of these markers.
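As a rough illustration of the kind of automated SSR discovery this review refers to, the sketch below scans a DNA sequence for perfect tandem repeats with a regular expression. It is not taken from the reviewed work: the function name `find_ssrs`, the default thresholds, and the restriction to perfect (uninterrupted) repeats are all assumptions made for this example.

```python
import re


def find_ssrs(seq, min_repeats=4, min_unit=2, max_unit=6):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    Hypothetical helper: thresholds are illustrative, not a standard.
    Note that a homopolymer run such as "AAAAAAAA" is reported with
    unit "AA", because the shortest unit tried here is min_unit.
    """
    seq = seq.upper()
    # A lazily matched unit of min_unit..max_unit bases, followed by at
    # least (min_repeats - 1) further copies of that same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    # An (AC)5 dinucleotide repeat embedded in flanking sequence.
    print(find_ssrs("TTG" + "AC" * 5 + "AGGT"))
```

Real SSR mining tools typically also handle imperfect and compound repeats and design flanking primers, which this sketch does not attempt.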
Article
Ore mineral and host lithologies have been sampled with 89 oriented samples from 14 sites in the Naica District, northern Mexico. Magnetic parameters permit characterisation of the samples: saturation magnetization, density, low- and high-temperature magnetic susceptibility, remanence intensity, Koenigsberger ratio, Curie temperature and hysteresis parameters. Rock magnetic properties are controlled by variations in titanomagnetite content and hydrothermal alteration. Post-mineralization hydrothermal alteration seems the major event that affected the minerals and magnetic properties. Curie temperatures are characteristic of titanomagnetites or titanomaghemites. Hysteresis parameters indicate that most samples have pseudo-single domain (PSD) magnetic grains. Alternating field (AF) demagnetization and isothermal remanence (IRM) acquisition both indicate that natural and laboratory remanences are carried by MD-PSD spinels in the host rocks. The trend of NRM intensity vs susceptibility suggests that the carrier of remanent and induced magnetization is the same in all cases (spinels). The Koenigsberger ratio ranges from 0.05 to 34.04, indicating the presence of MD and PSD magnetic grains. Constraints on the geometry of the intrusive source body developed in the model of the magnetic anomaly are obtained by quantifying the relative contributions of induced and remanent magnetization components.
Article
InterProScan is a tool that scans given protein sequences against the protein signatures of the InterPro member databases, currently PROSITE, PRINTS, Pfam, ProDom and SMART. The number of signature databases and their associated scanning tools as well as the further refinement procedures make the problem complex. InterProScan is designed to be a scalable and extensible system with a robust internal architecture. Availability: The Perl-based InterProScan implementation is available from the EBI ftp server (ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/) and the SRS-based InterProScan is available upon request. We provide the public web interface (http://www.ebi.ac.uk/interpro/scan.html) as well as email submission server (interproscan@ebi.ac.uk).
Article
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
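To make the maximal segment pair (MSP) score mentioned in this abstract concrete, the sketch below computes it exhaustively for two short sequences: the highest-scoring pair of equal-length, ungapped segments. This is a brute-force reference calculation, not BLAST's heuristic, which approximates the same quantity far faster. The function name `msp_score` and the +5/-4 match/mismatch scores are assumptions chosen for illustration.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Score of the best ungapped local alignment between a and b.

    Brute force: walk every diagonal of the comparison matrix and keep
    the best-scoring contiguous run on it (a maximum-subarray scan).
    Hypothetical helper; the scoring values are illustrative.
    """
    best = 0
    # offset = j - i identifies one diagonal of the comparison matrix.
    for offset in range(-(len(a) - 1), len(b)):
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < len(a) and j < len(b):
            run += match if a[i] == b[j] else mismatch
            if run < 0:
                run = 0  # a segment never benefits from a negative prefix
            if run > best:
                best = run
            i += 1
            j += 1
    return best


if __name__ == "__main__":
    # Two 4-base matching stretches joined across a single mismatch.
    print(msp_score("ACGTTACGT", "ACGTAACGT"))
```

The exhaustive scan takes time proportional to the product of the two sequence lengths, which is the cost that a heuristic search is designed to avoid on database-scale inputs.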
Article
We introduce a general probabilistic model of the gene structure of human genomic sequences which incorporates descriptions of the basic transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived to account for the many substantial differences in gene density and structure observed in distinct C + G compositional regions of the human genome. In addition, new models of the donor and acceptor splice signals are described which capture potentially important dependencies between signal positions. The model is applied to the problem of gene identification in a computer program, GENSCAN, which identifies complete exon/intron structures of genes in genomic DNA. Novel features of the program include the capacity to predict multiple genes in a sequence, to deal with partial as well as complete genes, and to predict consistent sets of genes occurring on either or both DNA strands. GENSCAN is shown to have substantially higher accuracy than existing methods when tested on standardized sets of human and vertebrate genes, with 75 to 80% of exons identified exactly. The program is also capable of indicating fairly accurately the reliability of each predicted exon. Consistently high levels of accuracy are observed for sequences of differing C + G content and for distinct groups of vertebrates.
Article
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
Article
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
Article
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Article
GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 140 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large‐scale sequencing projects. Most submissions are made using the BankIt (web) or Sequin program and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps ensure worldwide coverage. GenBank is accessible through NCBI’s retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI home page at: http://www.ncbi.nlm.nih.gov.
Article
Motivation: As databanks grow, sequence classification and prediction of function by searching protein family databases becomes increasingly valuable. The original Blocks Database, which contains ungapped multiple alignments for families documented in PROSITE, can be searched to classify new sequences. However, PROSITE is incomplete, and families from other databases are now available to expand coverage of the Blocks Database. Results: To take advantage of protein family information present in several existing compilations, we have used five databases to construct Blocks+, a unified database that is built on the PROTOMAT/BLOSUM scoring model and that can be searched using a single algorithm for consistent sequence classification. The LAMA blocks-versus-blocks searching program identifies overlapping protein families, making possible a non-redundant hierarchical compilation. Blocks+ consists of all blocks derived from PROSITE, blocks from Prints not present in PROSITE, blocks from Pfam-A not present in PROSITE or Prints, and so on for ProDom and Domo, for a total of 1995 protein families represented by 8909 blocks, doubling the coverage of the original Blocks Database. A challenge for any procedure aimed at non-redundancy is to retain related but distinct families while discarding those that are duplicates. We illustrate how using multiple compilations can minimize this potential problem by examining the SNF2 family of ATPases, which is detectably similar to distinct families of helicases and ATPases.
Article
The most highly conserved regions of proteins can be represented as "blocks" of locally aligned sequence segments. Previously, an automated system was introduced to generate a database of blocks that is searched for local similarities using a sequence query. Here, we describe a method for searching this database that can also reveal significant global similarities. Local and global alignments are scored independently, so they can be used in concert to infer homology. A set of 7082 diverse sequences not represented in the database provided queries for testing this approach. The resulting distributions of scores led to guidelines for interpretation of search data and to the classification of 289 uncatalogued sequences into known groups. Thirty-eight of these relationships appear to be new discoveries. We also show how searching a database of blocks can be used to detect repeated domains and to find distinct cross-family relationships that were missed in searches of sequence databases.
Article
As databanks grow, sequence classification and prediction of function by searching protein family databases becomes increasingly valuable. The original Blocks Database, which contains ungapped multiple alignments for families documented in Prosite, can be searched to classify new sequences. However, Prosite is incomplete, and families from other databases are now available to expand coverage of the Blocks Database. To take advantage of protein family information present in several existing compilations, we have used five databases to construct Blocks+, a unified database that is built on the PROTOMAT/BLOSUM scoring model and that can be searched using a single algorithm for consistent sequence classification. The LAMA blocks-versus-blocks searching program identifies overlapping protein families, making possible a non-redundant hierarchical compilation. Blocks+ consists of all blocks derived from PROSITE, blocks from Prints not present in PROSITE, blocks from Pfam-A not present in PROSITE or Prints, and so on for ProDom and Domo, for a total of 1995 protein families represented by 8909 blocks, doubling the coverage of the original Blocks Database. A challenge for any procedure aimed at non-redundancy is to retain related but distinct families while discarding those that are duplicates. We illustrate how using multiple compilations can minimize this potential problem by examining the SNF2 family of ATPases, which is detectably similar to distinct families of helicases and ATPases. http://blocks.fhcrc.org/
Article
The Blocks Database WWW (http://blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments, which represent conserved protein regions. Blocks+ nearly doubles the number of protein families included in the database by adding families from the Pfam-A, ProDom and Domo databases to those from PROSITE and PRINTS. Other new features include improved Block Searcher statistics, searching with NCBI's IMPALA program and 3D display of blocks on PDB structures.
Article
At certain junctures in development, gene transcription is coupled to the completion of landmark morphological events. We refer to this dependence on morphogenesis for gene expression as "morphological coupling." Three examples of morphological coupling in prokaryotes are reviewed in which the activation of a transcription factor is tied to the assembly of a critically important structure in development.
Article
We describe improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, new features have been included as input to the neural network. This addition, combined with a thorough error-correction of a new data set, have improved the performance of the predictor significantly over SignalP version 2. In version 3, correctness of the cleavage site predictions has increased notably for all three organism groups, eukaryotes, Gram-negative and Gram-positive bacteria. The accuracy of cleavage site prediction has increased in the range 6-17% over the previous version, whereas the signal peptide discrimination improvement is mainly due to the elimination of false-positive predictions, as well as the introduction of a new discrimination score for the neural network. The new method has been benchmarked against other available methods. Predictions can be made at the publicly available web server