Table 1 - uploaded by Rachael P Huntley
Examples of taxonomic groups whose only source of annotation is from automatic prediction methods

Source publication
Article
The Gene Ontology Consortium (GOC) is a major bioinformatics project that provides structured controlled vocabularies to classify gene product function and location. GOC members create annotations to gene products using the Gene Ontology (GO) vocabularies, thus providing an extensive, publicly available resource. The GO and its annotations to gene...

Context in source publication

Context 1
... type of annotation is critical for supplying functional information to a wide range of species that have no experimental data or dedicated manual annotation focus. There are approximately 31 million proteins spanning 434,561 taxa (October 2013) for which automatic methods are the only source of GO annotation; some examples are shown in Table 1. Compared with the approximately 264,000 proteins over 2,800 taxa that additionally have manual annotation, it is clear that automatic annotation is a very powerful way of annotating large numbers of proteins in a short amount of time. ...
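The scale comparison above can be made concrete with a little arithmetic. The sketch below uses only the approximate October 2013 figures quoted in the passage and computes how many automatic-only proteins there are per manually annotated protein, plus the average number of proteins per taxon in each group. Rounding to one decimal place is an arbitrary reporting choice.

```python
# Approximate October 2013 figures quoted in the passage.
AUTO_ONLY_PROTEINS = 31_000_000  # proteins whose only GO annotation is automatic
AUTO_ONLY_TAXA = 434_561
MANUAL_PROTEINS = 264_000        # proteins that additionally have manual annotation
MANUAL_TAXA = 2_800


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio, rounded to one decimal place for reporting."""
    return round(numerator / denominator, 1)


proteins_ratio = ratio(AUTO_ONLY_PROTEINS, MANUAL_PROTEINS)
taxa_ratio = ratio(AUTO_ONLY_TAXA, MANUAL_TAXA)
auto_per_taxon = ratio(AUTO_ONLY_PROTEINS, AUTO_ONLY_TAXA)
manual_per_taxon = ratio(MANUAL_PROTEINS, MANUAL_TAXA)

print(f"Automatic-only proteins per manually annotated protein: {proteins_ratio}")
print(f"Automatic-only taxa per manually annotated taxon: {taxa_ratio}")
print(f"Mean proteins per taxon (automatic-only): {auto_per_taxon}")
print(f"Mean proteins per taxon (with manual annotation): {manual_per_taxon}")
```

With these figures there are roughly 117 automatic-only proteins for every protein that also carries manual annotation, about two orders of magnitude.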

Citations

... Bioinformatic analysis was performed using the Gene Ontology [51,52], Reactome [53,54], UniProtKB Keywords [55], and Kyoto Encyclopedia of Genes and Genomes (KEGG) [56,57] databases, screened using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [58,59]. Bioinformatic filtration was conducted in October 2023 to account for the constant updates to these databases, including their organellar and sub-organellar protein localisation annotations [60,61]. To filter pathways, we applied the Expression Analysis Systematic Explorer (EASE) score, a conservative adjustment to the Fisher exact probability that is calculated by removing one gene within the given category from the list and computing the resulting Fisher exact probability for that category [58,59,62]. ...
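The EASE adjustment described above can be sketched as a one-tailed Fisher exact test computed twice: once on the observed 2x2 table and once after removing one category gene from the gene list. The table layout (gene list vs. background rows) and the example counts below are illustrative assumptions, not DAVID's internal code.

```python
from math import comb


def fisher_upper_tail(a: int, b: int, c: int, d: int) -> float:
    """One-tailed (over-representation) Fisher exact p-value.

    2x2 table with fixed margins:
                      in category   not in category
        gene list          a               b
        background         c               d

    Returns P(X >= a), where X is the top-left count under the
    hypergeometric null distribution.
    """
    row1 = a + b            # size of the gene list
    col1 = a + c            # genes annotated to the category
    total = a + b + c + d
    numerator = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(total, row1)


def ease_score(a: int, b: int, c: int, d: int) -> float:
    """EASE score: Fisher p-value after removing one category gene from the list."""
    if a < 1:
        raise ValueError("EASE needs at least one list gene in the category")
    return fisher_upper_tail(a - 1, b, c, d)


# Hypothetical counts: 3 of 300 list genes vs 40 of 30,000 background genes.
fisher_p = fisher_upper_tail(3, 297, 40, 29_960)
ease_p = ease_score(3, 297, 40, 29_960)
print(f"Fisher p = {fisher_p:.4f}, EASE = {ease_p:.4f}")
```

Because one hit is discarded, the EASE score is always at least as large as the plain Fisher p-value, which penalises categories supported by only one or two genes.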
Article
Calciprotein particles (CPPs) are indispensable scavengers of excess Ca²⁺ and PO₄³⁻ ions in blood; they are internalised and recycled by liver and spleen macrophages, monocytes, and endothelial cells (ECs). Here, we performed a pathway enrichment analysis of cellular compartment-specific proteomes in primary human coronary artery ECs (HCAEC) and human internal thoracic artery ECs (HITAEC) treated with primary (amorphous) or secondary (crystalline) CPPs (CPP-P and CPP-S, respectively). Exposure to CPP-P and CPP-S induced notable upregulation of: (1) cytokine- and chemokine-mediated signaling, Ca²⁺-dependent events, and apoptosis in cytosolic and nuclear proteomes; (2) H⁺ and Ca²⁺ transmembrane transport, generation of reactive oxygen species, mitochondrial outer membrane permeabilisation, and intrinsic apoptosis in the mitochondrial proteome; (3) oxidative, calcium, and endoplasmic reticulum (ER) stress, unfolded protein binding, and apoptosis in the ER proteome. In contrast, transcription, post-transcriptional regulation, translation, cell cycle, and cell–cell adhesion pathways were underrepresented in the cytosolic and nuclear compartments, whilst biosynthesis of amino acids, mitochondrial translation, fatty acid oxidation, pyruvate dehydrogenase activity, and energy generation were downregulated in the mitochondrial proteome of CPP-treated ECs. Differentially expressed organelle-specific pathways were consistent between HCAEC and HITAEC and between ECs treated with CPP-P or CPP-S. Proteomic analysis of mitochondrial and nuclear lysates from CPP-treated ECs confirmed the bioinformatic filtration findings.
... Several publications address the issues raised by temporal changes to these ontologies. In [15], the authors identified several scenarios in which the Gene Ontology (GO) may change. Predefined post-processing procedures and cleanup rules ensure the consistency of the final version of the evolving ontology. ...
Article
Nowadays, no one can expect knowledge about any part of reality to remain unchanged. Consequently, representations of such evolving knowledge (for example, ontologies) also change. Such changes may compromise applications that incorporate this knowledge and cause them to yield wrong results. One such application is ontology alignment, which can be informally described as a set of connections between two ontologies. Those connections mark elements from the two ontologies that relate to the same parts of reality. When one of the ontologies changes, such connections may become invalid. One could recompute the alignment from scratch for the altered ontologies; however, this approach is time- and resource-consuming. The paper comprehensively presents our ontology evolution and alignment maintenance framework, which can preserve the validity of an ontology alignment using only an analysis of the changes introduced to the maintained ontologies. A precise definition of ontologies is provided, along with a definition of the ontology change log. Based on these elements, a set of algorithms that allow ontology alignments to be revalidated has been built.
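The core idea, revalidating an alignment from the change log instead of recomputing it, can be illustrated with a drastically simplified sketch. It assumes a hypothetical change-log format with only `("remove", entity)` and `("rename", old, new)` entries applied to one ontology (O2); the framework described in the paper defines ontologies and change logs far more precisely, and its algorithms are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    source: str    # entity in the unchanged ontology O1
    target: str    # entity in the evolving ontology O2
    relation: str  # e.g. "=" for equivalence


def follow_changes(entity, change_log):
    """Trace one O2 entity through the log; return its final name, or None if removed."""
    for change in change_log:
        if entity is None:
            break
        kind, subject = change[0], change[1]
        if subject != entity:
            continue
        if kind == "remove":
            entity = None
        elif kind == "rename":
            entity = change[2]
        # Other change types are ignored in this simplified sketch.
    return entity


def revalidate(alignment, change_log):
    """Keep, rewrite, or drop correspondences using only O2's change log."""
    updated = set()
    for corr in alignment:
        new_target = follow_changes(corr.target, change_log)
        if new_target is not None:
            updated.add(Correspondence(corr.source, new_target, corr.relation))
    return updated


# Hypothetical entities and change log.
alignment = {
    Correspondence("O1:Car", "O2:Automobile", "="),
    Correspondence("O1:Wheel", "O2:Tyre", "="),
    Correspondence("O1:Engine", "O2:Motor", "="),
}
change_log = [("rename", "O2:Automobile", "O2:Vehicle"), ("remove", "O2:Tyre")]
result = revalidate(alignment, change_log)
print(sorted((c.source, c.target) for c in result))
```

Only correspondences touched by the log are rewritten or dropped; untouched ones are carried over unchanged, which is what makes this cheaper than re-aligning from scratch.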
... Three types of Gene Ontology (GO) annotations were used: molecular function, cellular component, and biological process. GO terms were obtained through UniProt GOA, which provides granular GO annotations and excludes terms higher up in the GO hierarchy when they are identified by the same technique [29,30]. All ChEBI entries except those that mapped to ChEMBL entries were used as function assignments. ...
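The exclusion of less specific terms can be sketched as a redundancy filter over the GO graph. This assumes that "the same technique" corresponds to the same evidence code and uses a toy graph with placeholder terms A to D; GOA's actual filtering rules may differ in detail.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {child: [parents]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def most_specific(annotations, parents):
    """Drop (term, evidence) pairs implied by a more specific term with the same evidence."""
    redundant = set()
    for term, evidence in annotations:
        for anc in ancestors(term, parents):
            redundant.add((anc, evidence))
    return [a for a in annotations if a not in redundant]


# Toy DAG with placeholder terms: C is_a B is_a A, and D is_a A.
parents = {"B": ["A"], "C": ["B"], "D": ["A"]}
annotations = [("A", "TAS"), ("B", "IEA"), ("C", "IEA"), ("D", "IDA"), ("A", "IDA")]
kept = most_specific(annotations, parents)
print(kept)
```

Note that `("A", "TAS")` survives: no more specific term carries TAS evidence, so the general annotation is not implied by anything else.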
Article
We present the Pharmacorank search tool as an objective means to obtain prioritized protein drug targets and their associated medications according to user-selected diseases. This tool could be used to obtain prioritized protein targets for the creation of novel medications or to predict novel indications for existing medications. To prioritize the proteins associated with each disease, a gene similarity profiling method based on protein functions is implemented. The priority scores of the proteins correlate well with the likelihood that the associated medications are clinically relevant in treating the disease. When the protein priority scores were plotted against the percentage of protein targets known to bind medications currently indicated to treat the disease, which we termed the pertinency score, a strong correlation was observed: the correlation coefficient was 0.9978 using a weighted second-order polynomial fit. Because this highly predictive fit was made using a broad range of diseases, we were able to identify a general pertinency-score threshold as a starting point for considering drug repositioning candidates. Several repositioning candidates are described for proteins with high predicted pertinency scores, and these provide illustrative examples of the tool’s applications. We also describe focused reviews of repositioning candidates for Alzheimer’s disease. The tool provides an open online interface for interactive use and a site for programmatic access.
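A weighted second-order polynomial fit of pertinency score against priority score can be sketched with NumPy. The data, the weights, and the way the correlation coefficient is derived (square root of a weighted R²) are assumptions for illustration; the abstract does not say how Pharmacorank weights points or computes its coefficient.

```python
import numpy as np


def weighted_quadratic_fit(x, y, w):
    """Fit y ~ c2*x**2 + c1*x + c0 by weighted least squares.

    numpy.polyfit multiplies each residual by w before squaring, so the
    effective weight on each squared residual is w**2.
    """
    coeffs = np.polyfit(x, y, deg=2, w=w)
    y_hat = np.polyval(coeffs, x)
    sq_w = np.asarray(w) ** 2
    y_bar = np.average(y, weights=sq_w)
    ss_res = np.sum(sq_w * (y - y_hat) ** 2)
    ss_tot = np.sum(sq_w * (y - y_bar) ** 2)
    r = np.sqrt(1.0 - ss_res / ss_tot)  # correlation coefficient of the fit
    return coeffs, r


# Synthetic data from a known quadratic, so the fit can be checked.
priority = np.linspace(0.0, 1.0, 11)              # hypothetical priority scores
pertinency = 0.2 + 0.5 * priority + 0.3 * priority**2
weights = np.linspace(1.0, 2.0, 11)               # hypothetical per-point weights
coeffs, r = weighted_quadratic_fit(priority, pertinency, weights)
print(coeffs, r)
```

On real, noisy data `r` falls below 1, and the fitted curve can be inverted to pick a threshold priority score for a desired pertinency level.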
... Functional annotation has been widely applied to analyse the biological processes of gene sets based on molecular function, biological role, subcellular location, and regulatory pathways [52,53]. The functional annotation results for the eight-gene signature indicated that the eight identified genes are jointly involved in regulating steroid hormone biosynthesis and processing, modulating the cellular response to steroid hormones, and affecting signal transduction, biological oxidation, and metabolic pathways. ...
Article
Simple Summary Prostate cancer (PC) is the second most common cancer worldwide, and steroid hormones play an important role in prostate carcinogenesis. Most patients with PC are initially sensitive to androgen deprivation therapy (ADT) but eventually become hormone-refractory, reflecting disease progression. The aim of the study was to investigate genes that regulate steroid hormone functional pathways and are associated with PC progression. Using integrative bioinformatics analysis of multiple datasets, validated in external cohorts, we identified an eight-gene signature that modulates steroid hormone pathways and predicts PC prognosis. This panel could be used to predict the prognosis of patients with PC and might be associated with the response to hormonal therapies. Moreover, the genes in the signature could be potential targets for developing novel treatments for castration-resistant PC. Abstract The importance of anti-androgen therapy for prostate cancer (PC) has been well recognized. However, the mechanisms underlying prostate cancer resistance to anti-androgens are not completely understood. Therefore, it is necessary to identify pharmacological targets involved in driving the development of castration-resistant PC. In the present study, we sought to identify core genes that regulate steroid hormone pathways and to associate them with PC progression. Steroid hormone-associated genes were selected from functional databases, including Gene Ontology, KEGG, and Reactome. Gene expression profiles and relevant clinical information of patients with PC were obtained from TCGA and used to examine the genes associated with steroid hormones. A machine-learning algorithm was used for key feature selection and signature construction. Through integrative bioinformatics analysis, an eight-gene signature including CA2, CYP2E1, HSD17B, SSTR3, SULT1E1, TUBB3, UCN, and UGT2B7 was established.
Patients with higher expression of this gene signature had a worse progression-free interval in both univariate and multivariate Cox models adjusted for clinical variables. Higher signature expression was also consistently associated with aggressive disease in two external cohorts, PCS and PAM50. Our findings demonstrate that a validated eight-gene signature could successfully predict PC prognosis and is involved in regulating the steroid hormone pathway.
... To analyse the modular structure of the bladder cancer gene subnetwork, we ran the MCODE clustering algorithm [57] with default parameters on the whole network. We used ClusterProfiler [58] and a hypergeometric test to determine which Gene Ontology (GO) [59] terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) [60] pathways, and cancer hallmark signatures from the Molecular Signatures Database (MSigDB) [61] were associated with the modules more strongly than expected by chance. ...
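The enrichment step can be sketched as a hypergeometric upper-tail test per term followed by Benjamini-Hochberg correction across terms. This is the textbook formulation, not ClusterProfiler's source; the module size, background size, term names and overlaps below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k), X ~ Hypergeometric(N background genes, K in the term, n in the module)."""
    hits = range(k, min(n, K) + 1)
    return sum(comb(K, x) * comb(N - K, n - x) for x in hits) / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical module of 20 genes tested against a 10,000-gene background.
N, n = 10_000, 20
terms = {"term_A": (8, 150), "term_B": (2, 400), "term_C": (1, 50)}  # (overlap, term size)
names = list(terms)
raw = [hypergeom_upper_tail(k, n, K, N) for k, K in (terms[t] for t in names)]
adjusted = dict(zip(names, benjamini_hochberg(raw)))
print(adjusted)
```

The multiple-testing step matters because each module is tested against thousands of GO terms, pathways and hallmark signatures at once.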
Article
Bladder cancer remains one of the most common forms of cancer, yet there are limited small-molecule targeted therapies. Here, we present a computational platform to identify new potential targets for bladder cancer therapy. Our method initially exploited a set of known driver genes for bladder cancer combined with predicted bladder cancer genes from mutationally enriched protein domain families. We enriched this initial set of genes using protein network data to identify a comprehensive set of 323 putative bladder cancer targets. Pathway and cancer hallmark analyses highlighted putative mechanisms in agreement with those previously reported for this cancer and revealed protein network modules highly enriched in potential drivers likely to be good targets for targeted therapies. Twenty-one of our potential drug targets are targeted by FDA-approved drugs for other diseases; some of them are known drivers or are already being targeted in bladder cancer (FGFR3, ERBB3, HDAC3, EGFR). A further four potential drug targets were identified by inheriting drug mappings across our in-house CATH domain functional families (FunFams). Our FunFam data also allowed us to identify drug targets in families that are less prone to side effects, i.e., where structurally similar protein domain relatives are less dispersed across the human protein network. We provide information on our novel potential cancer driver genes, together with information on pathways, network modules and hallmarks associated with the predicted and known bladder cancer drivers, and we highlight those drivers we predict to be likely drug targets.
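Inheriting drug mappings across functional families can be sketched as: any drug known to target one member of a family becomes a candidate for the other members. The identifiers are hypothetical, and real pipelines (including the one described above) presumably apply further filters, such as conservation of binding residues, that this sketch does not model.

```python
from collections import defaultdict


def inherit_drugs(protein_family, drug_targets):
    """Propose drugs for proteins via shared functional families.

    protein_family: {protein: family_id}
    drug_targets:   {drug: set of known target proteins}
    Returns {protein: drugs inherited from other members of its family},
    excluding drugs that already target that protein directly.
    """
    family_drugs = defaultdict(set)
    direct = defaultdict(set)
    for drug, targets in drug_targets.items():
        for protein in targets:
            direct[protein].add(drug)
            family = protein_family.get(protein)
            if family is not None:
                family_drugs[family].add(drug)

    inherited = {}
    for protein, family in protein_family.items():
        candidates = family_drugs[family] - direct[protein]
        if candidates:
            inherited[protein] = candidates
    return inherited


# Hypothetical proteins, families and drugs.
protein_family = {"P1": "FF1", "P2": "FF1", "P3": "FF2"}
drug_targets = {"drugX": {"P1"}, "drugY": {"P3"}}
inherited = inherit_drugs(protein_family, drug_targets)
print(inherited)
```

Grouping by functional family rather than by whole-protein similarity is what lets a mapping transfer only between domains expected to share function.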
... Molecular Function and Cellular Component terms, but not Biological Process terms, were transferred from the human reference protein dataset. Inferred annotations were added using inter-ontology links (64) in the go.obo file downloaded from the Gene Ontology Consortium (release date 2020-08-11, doi:10.5281/zenodo.2529950; ...
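Adding annotations from inter-ontology links can be sketched in two steps: read `[Term]` stanzas from an OBO file, then follow `relationship:` links whose target lies in a different namespace (for example, a molecular function that is `part_of` a biological process). The toy OBO fragment uses made-up placeholder IDs, the sketch assumes only `part_of` links are used, and it performs a single inference step without transitive closure.

```python
TOY_OBO = """\
format-version: 1.2

[Term]
id: GO:9000001
name: example activity
namespace: molecular_function
relationship: part_of GO:9000002 ! example process

[Term]
id: GO:9000002
name: example process
namespace: biological_process
"""


def parse_obo_terms(text):
    """Minimal [Term] reader: {id: {"namespace": str, "links": [(rel, target)]}}."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # New stanza; only [Term] stanzas are kept ([Typedef] etc. are skipped).
            current = {"namespace": None, "links": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" !", 1)[0].strip()  # drop trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "namespace":
            current["namespace"] = value
        elif tag == "relationship":
            rel, target = value.split()[:2]
            current["links"].append((rel, target))
    return terms


def infer_cross_ontology(annotations, terms, relations=("part_of",)):
    """One inference step along links whose target is in a different namespace."""
    inferred = set()
    for gene, term_id in annotations:
        info = terms.get(term_id)
        if info is None:
            continue
        for rel, target in info["links"]:
            target_info = terms.get(target)
            if (rel in relations and target_info is not None
                    and target_info["namespace"] != info["namespace"]):
                inferred.add((gene, target))
    return inferred - set(annotations)


terms = parse_obo_terms(TOY_OBO)
inferred = infer_cross_ontology([("geneA", "GO:9000001")], terms)
print(inferred)
```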
Article
We report an update of the Hymenoptera Genome Database (HGD; http://HymenopteraGenome.org), a genomic database of hymenopteran insect species. The number of species represented in HGD has nearly tripled, with fifty-eight hymenopteran species, including twenty bees, twenty-three ants, eleven wasps and four sawflies. With a reorganized website, HGD continues to provide the HymenopteraMine genomic data mining warehouse and JBrowse/Apollo genome browsers integrated with BLAST. We have computed Gene Ontology (GO) annotations for all species, greatly enhancing the GO annotation data gathered from UniProt with more than a ten-fold increase in the number of GO-annotated genes. We have also generated orthology datasets that encompass all HGD species and provide orthologue clusters for fourteen taxonomic groups. The new GO annotation and orthology data are available for searching in HymenopteraMine, and as bulk file downloads.
... This detailed information can be used by automated pipelines to infer function based on sequence similarity or orthology. These automated functional inferences provide functional characterization for poorly studied species, or even for poorly studied genes of well-studied species (Huntley et al., 2014). However, it is important to be aware that most GO annotations are automatically generated and that this does not affect all species equally. ...
... For example, most terms are taxon-neutral, but some terms have taxon restrictions because they can be annotated only to specific taxa; there is also an annotation blacklist specifying protein/GO term combinations that should not exist. These restrictions are verified when new terms or relationships are incorporated into the ontology (Huntley et al., 2014). ...
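The taxon restrictions and blacklist described above can be sketched as a simple validity check on a proposed annotation. GO expresses such restrictions as "only in taxon" and "never in taxon" constraints; the dictionaries, taxonomy fragment, term names and protein names below are simplified, hypothetical stand-ins.

```python
def lineage(taxon, parent):
    """The taxon and all its ancestors, given {taxon: parent} with the root mapped to None."""
    out = []
    while taxon is not None:
        out.append(taxon)
        taxon = parent.get(taxon)
    return out


def annotation_allowed(protein, term, taxon, parent, only_in, never_in, blacklist):
    """Check a proposed protein/GO term annotation against simple rules.

    only_in:   {term: clade}  term may be used only within this clade
    never_in:  {term: clade}  term must not be used within this clade
    blacklist: set of (protein, term) pairs that must never be annotated
    """
    if (protein, term) in blacklist:
        return False
    clade = set(lineage(taxon, parent))
    if term in only_in and only_in[term] not in clade:
        return False
    if term in never_in and never_in[term] in clade:
        return False
    return True


# Hypothetical taxonomy fragment and rules.
parent = {
    "Eukaryota": None,
    "Metazoa": "Eukaryota",
    "Viridiplantae": "Eukaryota",
    "Homo sapiens": "Metazoa",
    "Arabidopsis thaliana": "Viridiplantae",
}
only_in = {"T_plant_only": "Viridiplantae"}
never_in = {"T_no_metazoa": "Metazoa"}
blacklist = {("protZ", "T_generic")}
rules = (parent, only_in, never_in, blacklist)

print(annotation_allowed("protX", "T_plant_only", "Homo sapiens", *rules))
```

Checking the whole lineage, not just the exact taxon, is what allows a single rule at a high-level clade to cover every species beneath it.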
Chapter
With the advent of novel computational methods, it has become possible to improve the identification of new therapeutic alternatives as well as novel biomarkers for early cancer diagnosis. Given the abundance of genomics data for cancer patients, it is possible to extract genomic profiles and associate them with rigorous clinical data. Nowadays, computational methods and state-of-the-art statistical models make it possible to predict genomic marks of cancer subtypes, genomic progression based on genomic features, associations of genomic variants with clinical data, and even the response to different therapies. Furthermore, clinical data can be integrated with genomic profiles through novel and robust quantitative methods such as Bayesian statistics, dynamic modeling and machine learning. In this sense, the integration of clinical and phenotypic information with tumor genomics data is essential to improve the treatment, classification and diagnosis of cancer patients. In this chapter we present an introduction to the computational approaches used to associate genomic data with patient clinical information. We present a brief overview of the databases of cancer genomic data, the methods used to integrate clinical data, and perspectives on the application of new quantitative clinico-genomic models in cancer. With this in mind, we highlight the importance of associating genomic variants, mutational patterns and molecular classification with clinical data such as pathology, histology, life history and genealogy as an unprecedented tool for understanding cancer progression, diagnosis and prevention through personalized precision medicine.
... Gene Ontology (GO) is a controlled vocabulary of terms for describing the biological roles of genes and their products [11], and it has been extensively used as a gold standard [12]. GO annotations of proteins are originally collected by GO curators from published (or unpublished) experimental data. ...
... In protein function prediction, effectively mining the GO hierarchy and known annotations is important [12,13,22,23]. The semantic and structural information of GO can greatly assist computational models in determining the functions of proteins. ...
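One common way prediction pipelines use the GO hierarchy is to make scores consistent with it: a protein annotated to a term is implicitly annotated to all of that term's ancestors, so no ancestor should score lower than its descendants. The sketch below applies max-propagation over a toy DAG with placeholder term names; it is a generic post-processing choice, not a method attributed to the cited works.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {child: [parents]}."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate_scores(scores, parents):
    """Raise each ancestor's score to at least the maximum of its descendants' scores."""
    out = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            out[anc] = max(out.get(anc, 0.0), score)
    return out


# Toy DAG with placeholder terms: leaf -> mid -> root, other -> root.
parents = {"leaf": ["mid"], "mid": ["root"], "other": ["root"]}
scores = {"leaf": 0.9, "mid": 0.2, "other": 0.4}
consistent = propagate_scores(scores, parents)
print(consistent)
```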
Article
Background Maize (Zea mays ssp. mays L.) is the most widely grown and highest-yielding crop in the world, as well as an important model organism for fundamental research on gene function. The functions of maize proteins are annotated using the Gene Ontology (GO), which has more than 40,000 terms organized in a directed acyclic graph (DAG). Accurately annotating a maize protein with the relevant GO terms from such a large number of candidates is a huge challenge. Some deep learning models have been proposed to predict protein function, but their effectiveness is unsatisfactory, one major reason being that they inadequately utilize the GO hierarchy. Results To use the knowledge encoded in the GO hierarchy, we propose a deep Graph Convolutional Network (GCN)-based model (DeepGOA) to predict GO annotations of proteins. DeepGOA first quantifies the correlations (or edges) between GO terms and updates the edge weights of the DAG by leveraging GO annotations and the hierarchy, then learns the semantic representations and latent inter-relations of GO terms by applying a GCN to the updated DAG. Meanwhile, a Convolutional Neural Network (CNN) is used to learn the feature representation of amino acid sequences with respect to the semantic representations. DeepGOA then computes the dot product of the two representations, which enables the whole network to be trained end-to-end. Extensive experiments show that DeepGOA can effectively integrate GO structural information and amino acid information and annotate proteins accurately. Conclusions Experiments on the maize PH207 inbred line and a human protein sequence dataset show that DeepGOA outperforms state-of-the-art deep learning-based methods. The ablation study shows that the GCN can exploit the knowledge encoded in GO and boost performance. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=DeepGOA.
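The general pattern described above, graph convolution over GO terms followed by a dot product with a protein representation, can be sketched in NumPy with a single Kipf-and-Welling-style GCN layer. Everything here is a stand-in: the DAG is a toy with unit edge weights (DeepGOA derives edge weights from annotations), the term features and weights are random, and a random vector replaces the CNN sequence encoder. No training is performed.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric GCN normalisation D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(a_norm, h, w):
    """One graph-convolution layer with ReLU: relu(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
n_terms, term_dim, hidden = 4, 8, 16

# Toy GO DAG (child, parent) edges, treated as undirected for message passing.
adj = np.zeros((n_terms, n_terms))
for child, par in [(1, 0), (2, 0), (3, 1)]:
    adj[child, par] = adj[par, child] = 1.0  # unit weights; DeepGOA learns/derives these

a_norm = normalized_adjacency(adj)
term_features = rng.normal(size=(n_terms, term_dim))          # initial term embeddings
w1 = rng.normal(size=(term_dim, hidden)) * 0.1
term_repr = gcn_layer(a_norm, term_features, w1)              # (n_terms, hidden)

protein_repr = rng.normal(size=(2, hidden))                   # stand-in for CNN output
scores = sigmoid(protein_repr @ term_repr.T)                  # (proteins, terms)
print(scores.round(3))
```

Because the term representations and the protein representation share a width, a single matrix product scores every protein against every GO term, and gradients from the multi-label loss would flow into both branches during end-to-end training.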
... This is effective for a single point in time, but as our understanding of cancer and biological interactions changes rapidly, the structure and annotations of the intermediate resources also change, and therefore so do the associations [11] [12] [13]. The use of biological ontologies and knowledge bases to help structure, cluster and compare research results is well established in bioinformatics [14] [15]. However, interrelating continuously changing knowledge from multiple sources is a larger challenge. ...
Preprint
Motivation: The hallmarks of cancer provide a highly cited and widely used conceptual framework for describing the processes involved in cancer cell development and tumourigenesis. However, methods for translating these high-level hallmark concepts into data-level links between individual genes and individual cancer hallmarks vary widely between studies. When we examine different strategies for linking and mapping cancer hallmarks in detail, we see significant differences but also consensus. Results: Here we compare hallmark mapping schemes from multiple studies and explore the consensus knowledge from these different approaches, in order to better understand the core biological processes and pathways associated with the hallmarks of cancer. We also explore the differences between mapping schemes and identify which differences represent changes in our understanding of cancer, changes in our understanding of biological processes in the non-disease state, or the accumulation of more experimental evidence over time. Conclusions: Mapping strategies rely on intermediate knowledge resources, such as biological pathway databases like KEGG, or the Gene Ontology. The structure and annotations of these intermediate resources also change over time. The results of this study therefore highlight the challenges of integrating accumulated, distributed and changing biological knowledge in bioinformatics.
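Comparing mapping schemes and extracting consensus can be illustrated with two simple set operations: Jaccard similarity between the gene sets that two schemes assign to the same hallmark, and a consensus set of genes supported by at least a minimum number of schemes. The scheme names, hallmark label and gene identifiers are hypothetical, and the study's own comparison method is not specified in the abstract.

```python
def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consensus(schemes, hallmark, min_support):
    """Genes linked to `hallmark` by at least `min_support` of the mapping schemes."""
    counts = {}
    for scheme in schemes.values():
        for gene in scheme.get(hallmark, set()):
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, c in counts.items() if c >= min_support}


# Hypothetical mapping schemes: {scheme: {hallmark: set of genes}}.
schemes = {
    "scheme_1": {"hallmark_X": {"G1", "G2", "G3"}},
    "scheme_2": {"hallmark_X": {"G2", "G3", "G4"}},
    "scheme_3": {"hallmark_X": {"G3", "G5"}},
}
overlap_12 = jaccard(schemes["scheme_1"]["hallmark_X"], schemes["scheme_2"]["hallmark_X"])
core = consensus(schemes, "hallmark_X", min_support=2)
print(overlap_12, sorted(core))
```

Raising `min_support` trades coverage for agreement: genes kept at a high threshold form the consensus core, while those dropped highlight where schemes diverge.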
... The goal of incorporating this information is to overcome the assumption, common in other predictors, that the collected annotations of each gene are complete; in practice, the GO annotations of proteins are often incomplete [55]. While DeepIsoFun also uses the GO hierarchy to train the model, it uses only expression information. ...
Article
Multiple mRNA isoforms of the same gene are produced via alternative splicing, a biological mechanism that expands protein diversity while maintaining genome size. Alternatively spliced mRNA isoforms of the same gene may have very similar sequences, yet they can have dramatically different effects on cellular function and regulation. The products of alternative splicing have important and diverse functional roles, such as in the response to environmental stress, the regulation of gene expression, and human heritable and plant diseases. Despite the functional importance of mRNA isoforms, very little has been done to annotate their functions. Recent years, however, have seen the development of several computational methods aimed at predicting biological functions at the mRNA isoform level. These methods use a wide array of proteo-genomic data to develop machine learning-based tools for mRNA isoform function prediction. In this review, we discuss the computational methods developed for predicting biological function at the level of individual mRNA isoforms.