Figure - available from: BMC Bioinformatics
Relational database schema of the data warehouse built by TAGOPSIN. Primary key (PK) attributes, which allow unique identification, are in bold and underlined. An arbitrary PK was chosen where no satisfactory biological attribute could serve as one (e.g. OID, CDS ID, NTS ID). Foreign key attributes are indicated by (FK). Arrows indicate how one relation is linked to another. Start, Stop: start and end positions on the corresponding sequence; SP Start, SP Stop: start and end positions on the Swiss-Prot sequence; PDB Start, PDB Stop: start and end positions on the PDB structure.
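The caption's conventions (arbitrary primary keys, foreign keys linking relations, Start/Stop positions) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs without a server. The table names `entity` and `cds` and the columns `name`, `start_pos` and `stop_pos` are hypothetical and are not taken from the actual TAGOPSIN schema; only the key names OID and CDS ID come from the caption.

```python
import sqlite3


def build_demo_schema() -> sqlite3.Connection:
    """Create a two-table in-memory schema with surrogate PKs and one FK."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE entity (
            oid  INTEGER PRIMARY KEY,   -- arbitrary (surrogate) PK
            name TEXT NOT NULL          -- hypothetical descriptive column
        );
        CREATE TABLE cds (
            cds_id    INTEGER PRIMARY KEY,                   -- arbitrary PK
            oid       INTEGER NOT NULL REFERENCES entity(oid), -- FK
            start_pos INTEGER NOT NULL,  -- start on the corresponding sequence
            stop_pos  INTEGER NOT NULL   -- end on the corresponding sequence
        );
        """
    )
    return conn


conn = build_demo_schema()
conn.execute("INSERT INTO entity (oid, name) VALUES (1, 'example entity')")
conn.execute(
    "INSERT INTO cds (cds_id, oid, start_pos, stop_pos) VALUES (1, 1, 100, 400)"
)

# A row pointing at a non-existent parent should be rejected by the FK.
try:
    conn.execute(
        "INSERT INTO cds (cds_id, oid, start_pos, stop_pos) VALUES (2, 99, 1, 10)"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

The surrogate integer keys stand in for records that have no naturally unique biological attribute, and the foreign key plays the role of an arrow in the figure.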


Source publication
Background: The wealth of biological information available nowadays in public databases has triggered an unprecedented rise in multi-database search and data retrieval for obtaining detailed information about key functional and structural entities. This concerns investigations ranging from gene or genome analysis to protein structural analysis. Howe...

Citations

... Data retrieval and collation were performed using TAGOPSIN [19]. In brief, TAGOPSIN is a Java command line programme for rapid and systematic retrieval of select data from seven public biological databases relevant to comparative genomics and protein structure studies. The programme allows a user to retrieve organism-centred data and assemble them in a single local database in PostgreSQL (https://www.postgresql. ...
Mycobacterium tuberculosis (Mtb) is the causative agent of tuberculosis (TB), an infectious disease that is a major killer worldwide. Due to selection pressure caused by the use of antibacterial drugs, Mtb is characterised by mutational events that have given rise to multidrug-resistant (MDR) and extensively drug-resistant (XDR) phenotypes. The rate at which mutations occur is an important factor in the study of molecular evolution, and it helps in understanding gene evolution. Within the same species, different protein-coding genes evolve at different rates. To estimate the rates of molecular evolution of protein-coding genes, a commonly used parameter is the ratio dN/dS, where dN is the rate of non-synonymous substitutions and dS is the rate of synonymous substitutions. Here, we determined the estimated rates of molecular evolution of select biological processes and molecular functions across 264 strains of Mtb. We also investigated the molecular evolutionary rates of core genes of Mtb by computing the dN/dS values, and estimated the pan genome of the 264 strains of Mtb. Our results show that the cellular amino acid metabolic process and the kinase activity function evolve at a significantly higher rate, while the carbohydrate metabolic process evolves at a significantly lower rate in Mtb. These high rates of evolution correlate well with Mtb physiology and pathogenicity. We further propose that the core genome of Mtb likely experiences varying rates of molecular evolution, which may drive an interplay between the core genome and the accessory genome during Mtb evolution.
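For reference, the ratio defined in the abstract can be written out as below. The symbol \(\omega\) and the three-way interpretation are the conventional reading of dN/dS in molecular evolution; they are not stated in the abstract itself and are given here only as background.

```latex
\[
\omega = \frac{d_N}{d_S},
\qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
\]
```

Here \(d_N\) is the rate of non-synonymous substitutions and \(d_S\) the rate of synonymous substitutions, as defined in the abstract.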