Comparative Genomic Analysis of the DUF34 Protein Family Suggests Role as a Metal Ion Chaperone or Insertase

DUF34 fusions and select gene neighborhoods. (a) Domain architectures of DUF34 fusions. The domain rendering dimensions and positions are approximate. DUF34 domains are rendered in white with black outlines. Domain colors correspond to the key shown in panel b. COGs of fusion domains are listed below each. Fusions deemed "invalid" or "inconclusive" were excluded for panels a and b. (b) Pie chart of DUF34 fusions (126 sequences, total). The outer halo surrounding chart indicates the superkingdoms in which respective fusions were observed (Eukaryota: black; Archaea: dark gray; Bacteria: light gray). (c) Neighborhoods of select bacterial and archaeal fusions are shown (12 kb, each), all of at least "conditional" validation confidence (Data Table S11). DUF34 is depicted in bright yellow and fusion domains are indicated by hashing or alternative coloring. For DUF34 sequence labels, "YqfO" denotes a sequence also containing inserted domain, COG3323, while "YbgI" denotes a sequence without the inserted COG3323 domain. Rendered fusion domains do not reflect exact sizes or locations. The color key is divided into two sets of identities (gray boxes): (top) general metabolic theme or specific annotation with bioinformatic precedent; and (bottom) COGs observed in physical clustering analysis (PCA). COGs also observed in PCA (Table 3) are shown in bold. Six minor exceptions to the top-20 rank cut-off are shown in bold with an asterisk (*): COG1196 (top 31st); COG0564 (top 23rd); COG0648 (top 25th); COG0406 (top 48th) in a fusion with COG0328; and COG0041 (top 36th). Others observed in rep. operons but were ranked beyond the "minor exception" threshold (exceeded top-50) in PCA are shown without additional symbols, not bolded: COG0245 (116th) and COG0761 (61st). Finally, one was not observed in PCA (not bolded) but was in at least one rep. operon (double asterisk, **): COG0642 (SAMN05192534_10671 of A. persepolensis; rep. operon, Desulfurispirillum indicum S5) (Data Table S7). 
Note: COG4111 (NUDIX hydrolase), present in panel c (neighborhood of M. rubeus), was absent from PCA (any rank) and rep. operons, despite the fusion with COG3323 in F. nucleatum having been resolved in preceding homolog capture and literature review.
biomolecules
Article
Colbie J. Reed 1, Geoffrey Hutinet 1 and Valérie de Crécy-Lagard 1,2,*


Citation: Reed, C.J.; Hutinet, G.; de Crécy-Lagard, V. Comparative Genomic Analysis of the DUF34
Protein Family Suggests Role as a Metal Ion Chaperone or Insertase. Biomolecules 2021, 11, 1282.
https://doi.org/10.3390/biom11091282
Academic Editor: Lukasz Kurgan
Received: 12 July 2021
Accepted: 24 August 2021
Published: 27 August 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1 Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA;
creed212@ufl.edu (C.J.R.); ghutinet@ufl.edu (G.H.)
2 Genetics Institute, University of Florida, Gainesville, FL 32611, USA
* Correspondence: vcrecy@ufl.edu; Tel.: +1-352-392-9416
Abstract: Members of the DUF34 (domain of unknown function 34) family, also known as the NIF3
protein superfamily, are ubiquitous across superkingdoms. Proteins of this family have been widely
annotated as “GTP cyclohydrolase I type 2” through electronic propagation based on one study.
Here, the annotation status of this protein family was examined through a comprehensive literature
review and integrative bioinformatic analyses that revealed varied pleiotropic associations and
phenotypes. This analysis combined with functional complementation studies strongly challenges
the current annotation and suggests that DUF34 family members may serve as metal ion insertases,
chaperones, or metallocofactor maturases. This general molecular function could explain how
DUF34 subgroups participate in highly diversified pathways such as cell differentiation, metal ion
homeostasis, pathogen virulence, redox, and universal stress responses.
Keywords: comparative genomics; metabolic reconstruction; bioinformatics; conserved unknowns;
function prediction; functional annotation; orthology
1. Introduction
Protein families that are both highly conserved across domains of life and poorly
characterized are referred to as conserved unknowns [1,2]. Though recent studies that use
comparative genomics [3,4], classical genetics [5] and/or biochemistry [6,7] approaches
have solved a few of these “orphan” family puzzles, their number remains high [1,8–12].
One of the issues is that, because these conserved proteins often harbor core functional
roles, genetic approaches lead to pleiotropic phenotypes, making the elucidation of a precise
molecular function quite difficult. For example, the COG0533 and COG0009 proteins
involved in the synthesis of the universal tRNA modification threonylcarbamoyladenosine
(t6A) [13–15] were first thought to be involved in protein degradation [16,17], transcriptional
regulation [18], or cell division [14]. Similarly, RidA (reactive intermediate deaminase
A), a subgroup within the Rid family of proteins (members also have been referred to as
YjgF/YER057c/UK114), was a notable challenge for functional characterization due to the
multiple and complex phenotypes associated with mutations in genes of this family in
different organisms [19–23].
The DUF34/NIF3 protein family is reportedly ubiquitous, with members found in
model organisms such as Homo sapiens (NIF3L1), Mus musculus (Nif3l1), Saccharomyces
cerevisiae (Ngg1-interacting Factor 3/NIF3) [24,25], Escherichia coli (YbgI) [26] and Bacillus
cereus (YqfO) [27]. Despite its conservation, the precise function(s) of members of
this family remain undetermined. More than a decade has passed since the family
was first formally identified as a target for characterization [24] and even longer since
the gene encoding a homolog of NIF3 in S. cerevisiae was first described in Drosophila
melanogaster [28,29]. Since then, it has been linked to a variety of functions across
superkingdoms and several diseases in humans (e.g., juvenile amyotrophic lateral sclerosis,
Williams–Beuren Syndrome [30,31], among many others). The role of this protein family
remains mysterious, even with recent studies trying to more proximately decipher its
function in E. coli [32]. Automated annotation databases indicate that the human DUF34
family member, NIF3L1, is highly connected, for example listing 4178 functional associations
for its entry in the Harmonizome database (i.e., 65 datasets, electronically extracted;
https://amp.pharm.mssm.edu/Harmonizome/gene/NIF3L1 [33]; accessed on 22 June
2021). In addition, an annotation based on a single set of in vitro results examining the NIF3
homolog of Helicobacter pylori (HP0959) [34] led to the swift percolation of the annotation,
“GTP cyclohydrolase I type 2 homolog”, throughout many databases, including UniProtKB.
This annotation as the first enzyme of tetrahydrofolate biosynthesis is certainly incorrect
for the whole protein family, as DUF34 members are found in folate auxotrophs such as
Mycoplasma [35–37].
A comprehensive analysis of the literature was conducted to catalog all published
knowledge for DUF34 family members, an endeavor that cannot be easily conducted
using only simple PubMed searches, as many studies do not mention general family
names of genes/proteins for which data has been generated, often only citing species- or
system-specific gene names. In parallel, an extensive comparative genomic analysis was
performed to investigate the validity of “GTP cyclohydrolase I type 2”, a dubious anno-
tation widespread among DUF34 family members, and to ultimately propose a unifying
functional role for the family as a metal insertase. With this, it was possible to divide
the DUF34 protein family into subgroups by distinctions in structure, complete domain
architecture, regulation, occurrence, localization, and functional associations.
2. Materials and Methods
2.1. Capture of Literature, Structural, and Essentiality Data
The strategy used to compile published literature for members of the DUF34 family is
detailed in the Supplemental Methods and all websites used, both here and in subsequent
analyses, are listed in Supplemental Table S1. Most of the public search engines/web
crawlers and searchable libraries/depositories used required text as input, while more
specialized tools leveraged the use of protein sequences (e.g., PaperBLAST [38]). Protein
Data Bank (PDB; RCSB PDB, Research Collaboratory for Structural Bioinformatics PDB)
was used to evaluate and acquire protein crystal structures and respective sequences,
related literature, and relevant data files for subsequent search and analysis [38–40].
Structures were edited and aligned using PyMol (Edu PyMol, Schrödinger, Inc., New York, New
York, USA, Educational edition). MetalPDB was used to survey ions present, indicated, or
predicted to complex with published protein crystal structures [41,42].
Essentiality data was acquired using multiple different sources listed in Table S1.
The BLAST search tool of DEG (Database of Essential Genes) [43] was used, with H. sapiens
(NIF3L_HUMAN, Q9GZT8), Methanocaldococcus jannaschii (GCH1L_METJA, Q58337),
B. cereus (Q818H0_BACCR, Q818H0), and E. coli (GCH1L_ECOLI, P0AFP6) as inputs.
Ogee [44] was used to collect additional essentiality data through the browse function.
Predicted essentiality data for Mycoplasma species were acquired using pDEG (Database of
Predicted Essential Genes) [45].
2.2. Domain Analysis
The first set of sequences of DUF34 family members from model organisms was
extracted using OrthoInspector 3.0 (accessed on 30 January 2020; iCube Laboratory,
Illkirch-Graffenstaden, France) [46] using the following input sequences for retrieving sets of
sequences per superkingdom: NIF3L_HUMAN (Q9GZT8), GCH1L_METJA (Q58337), and
GCH1L_ECOLI (P0AFP6). An additional set of sequences from organisms with published
data was extracted from UniProtKB [47] to generate a non-redundant list of 219 sequences
to be used in subsequent analyses. The sequences of the corresponding DUF34 proteins
were not available for a few organisms with which publications were associated. For
Desulfovibrio desulfuricans, sequences of the closely related Desulfovibrio alaskensis G20 were
used, and those of Schistosoma mansoni were used in the place of Schistosoma mekongi.
Although described in their respective publications, sequences for DUF34 family members
could not be retrieved for three organisms: Idiosepius paradoxus, Streptomyces sp. SN-1061M,
and Verrucomicrobium (Termite Associated, TAV) sp. strain 2. Sequences were aligned
using MAFFT (E-INS-i, default settings) [48–50]. Motif and domain logos were generated
through the use of the WebLogo web server [51]. Sequence logos were manually aligned
using Inkscape [52].
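
As a rough illustration of the alignment step, the sketch below shows how an E-INS-i run could be scripted. It assumes a local MAFFT installation on the PATH; the function names and file names are hypothetical, and the flags follow the MAFFT manual's description of the E-INS-i strategy rather than a command line reported by the authors.

```python
import shutil
import subprocess


def einsi_command(fasta_in: str) -> list[str]:
    """Build a MAFFT command line for the E-INS-i strategy.

    Per the MAFFT manual, E-INS-i corresponds to generalized affine gap
    pairwise alignment (--genafpair) with iterative refinement.
    """
    return ["mafft", "--genafpair", "--maxiterate", "1000", fasta_in]


def align(fasta_in: str, fasta_out: str) -> None:
    """Run MAFFT and save the alignment, which MAFFT writes to stdout."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    with open(fasta_out, "w") as handle:
        subprocess.run(einsi_command(fasta_in), stdout=handle, check=True)


if __name__ == "__main__":
    # Hypothetical file name; nothing is executed here beyond building the command.
    print(" ".join(einsi_command("duf34_homologs.fasta")))
```

Separating command construction from execution keeps the chosen parameters easy to record alongside the resulting alignment.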
2.3. Absence-Presence, Phyletic Patterns & Homolog/Paralog Co-Occurrence
Species trees were generated with PhyloT (database version 2020.2) and iToL [53].
Absence-presence data was acquired both through manual curation using advanced
searches of common databases (i.e., UniProt, NCBI [54]) with subsequent BLAST validation,
and through the phyletic patterning tools available through MicrobesOnline (accessed on
7 June 2019) [55] and STRING (v11, released 19 January 2019) [56]. Paralogs were identified
using EggNOG (EggNOG 5.0, EMBL, Heidelberg, Germany) [57] and KEGG Paralog Search
(KEGG release 94.1, Kyoto University, Kyoto, Japan) [58].
2.4. Physical Clustering Analysis
Physical clustering data was acquired from Gene Context Tool NG (GeConT 3) of
the Computational Genomic Group, IBT–UNAM, using the central orthologous group ID
known for the DUF34 family, COG0327 (accessed on 3 May 2020) [59] and analyzed using
a text-mining strategy we developed and termed Physical Clustering Keyword Frequency
Analysis (PCKFA). This approach as well as the further annotation of a subset of families
are described in detail in the Supplemental Methods (1.2).
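
The PCKFA procedure itself is specified only in the Supplemental Methods. The following is a minimal sketch of the general idea of keyword frequency counting over gene-neighborhood annotations, assuming each neighborhood is available as a list of free-text gene descriptions. The stop-word list, tokenization rule, function name, and toy input are all illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter

# Assumed stop words: generic annotation terms that carry little functional signal.
STOPWORDS = {"protein", "family", "putative", "hypothetical", "the", "and"}


def keyword_frequencies(neighborhoods):
    """Count, for each keyword, the number of neighborhoods in which it occurs.

    `neighborhoods` is an iterable of lists of annotation strings, one list
    per genomic neighborhood. A keyword is counted once per neighborhood.
    """
    counts = Counter()
    for annotations in neighborhoods:
        seen = set()
        for text in annotations:
            for token in re.findall(r"[a-z0-9]+", text.lower()):
                if len(token) > 2 and token not in STOPWORDS:
                    seen.add(token)
        counts.update(seen)
    return counts


# Invented toy annotations, used only to show the call pattern.
toy = [
    ["metal transporter", "hypothetical protein"],
    ["metal chaperone", "sigma factor"],
]
freqs = keyword_frequencies(toy)
print(freqs.most_common(3))
```

Counting each keyword once per neighborhood, rather than once per gene, keeps a single large operon from dominating the ranking.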
2.5. Coexpression, Covariation Data Acquisition & Enrichment Analysis
Lists of 300 genes co-expressed with DUF34 family members were retrieved for all 10
eukaryotic model organisms available using CoXPresDb (gene sets excluded respective
DUF34 homologs) [60], except for Caenorhabditis elegans, which does not encode a
DUF34 family member. Protein covariation data for Homo sapiens was acquired using the
ProteomeHD webserver (unsupervised query format) [61]; a threshold of 0.98 was used
for data retrieval for NIF3L1 (specific protein reference ID within the database: Q9GZT8-2,
resulting in 114 total covarying proteins). Gene set enrichment analysis (GSEA) was
performed using two tools: g:GOSt (via g:Profiler web server, Bioinformatics, Algorithmics
and Data Mining Group, University of Tartu, Tartu, Estonia) [62], and the functional
annotation clustering tool (via DAVID bioinformatic suite, Frederick National Laboratory,
Frederick, Maryland, USA) [63–65]. UniProtKB was used to map UniProt IDs to the Entrez
Gene IDs of eukaryotic datasets prior to GSEA. If electronic mapping failed for a human
identifier, the HGNC database was used in manual retrieval (HUGO Gene Nomenclature
Committee at the European Bioinformatics Institute [66]). If mapping failed, the “reviewed”
entries in UniProtKB were selected over the “unreviewed” duplicates and/or isoforms
listed.
2.6. Fusion Analysis
To analyze fusions present in the DUF34 family, the protein family members as defined
by UniProt (e.g., “GTP cyclohydrolase I type 2/NIF3 family”) were exported and filtered
for all sequences containing InterPro HMM profile signature annotations distinct from
those already recognized in Results Section 3.5. To optimize coverage of all documented
fusions, second and third approaches for curating such homologs were implemented
in parallel to the UniProt-dependent approach. For these two complementary methods,
sequences of various domain architectures were directly exported from Pfam (PF01784) and
InterPro (IPR036069), independently. The three lists of homologs generated by these methods
were concatenated and duplicate sequences removed. Fusions identified via the preceding
literature review were added, defining the final collection of “noncanonical” homologs.
All fusion/arrangement types were further evaluated for legitimacy through manual curation
(i.e., comparative annotation review of the genome and sequence features) and
the assignment of confidence scores: “valid” (highest confidence); “valid, conditional”;
“conditional”/“conditional, singleton”; “inconclusive”; “invalid” (lowest confidence, no
validity). To ensure results of fusion analyses were comparable to those of the other
bioinformatic analyses presented, singularly representative COGs and COG descriptions were
assigned to the final list of exceptional homologs using CDD Search, subsequently
cross-referencing results with EggNOG records for optimal domain descriptions. For more
information on data transformation, amendment, and clean-up, see Supplemental Methods (1.3).
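
The list handling described above (concatenating the three exports, removing duplicates, and later setting aside low-confidence fusions) can be sketched as follows. This is an illustrative outline only: the function names and placeholder accessions are hypothetical, the confidence labels are taken from the text, and the exclusion of “inconclusive” and “invalid” entries mirrors the figure legend rather than a documented script.

```python
# Confidence labels as listed in the text, from highest to lowest confidence.
CONFIDENCE_LEVELS = (
    "valid",
    "valid, conditional",
    "conditional",
    "conditional, singleton",
    "inconclusive",
    "invalid",
)
# Labels excluded from the fusion summary panels, per the figure legend.
EXCLUDED = {"inconclusive", "invalid"}


def merge_unique(*accession_lists):
    """Concatenate accession lists, dropping duplicates but keeping first-seen order."""
    seen = set()
    merged = []
    for accessions in accession_lists:
        for accession in accessions:
            if accession not in seen:
                seen.add(accession)
                merged.append(accession)
    return merged


def retain_scored(scores):
    """Keep only fusions whose manually assigned confidence is not excluded."""
    for level in scores.values():
        if level not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence label: {level!r}")
    return {acc: level for acc, level in scores.items() if level not in EXCLUDED}


# Placeholder accessions standing in for the UniProt, Pfam, and InterPro exports.
uniprot_hits = ["ACC1", "ACC2"]
pfam_hits = ["ACC2", "ACC3"]
interpro_hits = ["ACC3", "ACC1", "ACC4"]
candidates = merge_unique(uniprot_hits, pfam_hits, interpro_hits)
print(candidates)
```

Preserving first-seen order makes it straightforward to trace which source first contributed a given homolog.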
2.7. Strain Construction & List
All strains and oligonucleotides used in this study are listed in Table S2. Two genes
of E. coli, ybgI (encoding for DUF34) and folE (encoding for GTP cyclohydrolase I type 1),
were cloned independently in pBAD24 between NcoI and SbfI following PCR amplification
by Phusion® High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, MA, USA,
NEB) using GO285 and GO286 oligonucleotides for ybgI, while GO434 and GO435 were
used for folE. After verification by sequencing, the plasmids generated were renamed
“pGH50” and “pGH101”, respectively.
The ybgI::KanR E. coli mutant came from the Keio Collection [67], while the folE::KanR
mutant had been previously constructed [68]. These mutations were transduced by P1vir into
E. coli K-12 MG1655. The ybgI and folE double mutant was obtained by first flipping out
the kanamycin cassette from the ybgI mutant using pCP20 [69], subsequently transducing
the folE::KanR mutation using P1vir. Mutation verifications were performed by OneTaq
PCR (NEB) using a set of primers internal and external to the gene (GO563 to GO570).
Each plasmid, including empty pBAD24, was individually transformed into the control
strain and each mutant. Strains were grown at 37 °C using LB supplemented with glucose
0.2%, kanamycin sulfate 50 µg/mL, or ampicillin 100 µg/mL when necessary for selection.
2′-deoxythymidine (dT) 0.3 mM was used for folE mutants.
2.8. dT Sensitivity Assay
Strains (WT, single mutants, and double mutants) were grown overnight at 37 °C in
LB supplemented with glucose 0.2%, kanamycin sulfate 50 µg/mL (except for WT), and
dT 0.3 mM. Each strain was inoculated in LB with or without dT 0.3 mM at an
OD600nm of 0.1 and grown at 37 °C in a Bioscreen (Oy Growth Curves Ab Ltd., Turku,
Varsinais-Suomi, Finland) for 40 h. This experiment was completed in quintuplicate.
2.9. dT Essentiality Complementation Assay
Strains containing pBAD24 variations were grown overnight at 37 °C in LB supplemented
with glucose 0.2%, ampicillin 100 µg/mL and dT 0.3 mM. They were then
normalized to an OD600nm of 1.0 in LB, and a 5 µL drop was streaked on LB agar containing
ampicillin 100 µg/mL, either glucose or arabinose at 0.2%, and either with or without dT
0.3 mM. These plates were left to grow for 10 h at 37 °C. This experiment was performed in
triplicate.
3. Results and Discussion
3.1. Extensive Literature Capture and Analysis Confirms Pleiotropic Role of DUF34 Family Members
While the earliest mention of the family dates back to 1996 when the binding of a yeast
homolog to NGG1/ADA3 via a GAL4 fusion domain was noted [70], the first dedicated
description of a DUF34 family member was published in 2000 with the isolation and
characterization of the human NIF3L1 and its mouse homolog [30]. Only seven papers in
PubMed cite the latter study (as of 6 June 2021) and 20 mostly unrelated publications cite
the former (as of 6 June 2021; studies focused mostly on NGG1/ADA3 or SAGA complex,
only 6 demonstrating relevance to DUF34). PaperBLAST, a sequence-based literature
search tool, searches titles, abstracts, and full publication texts available through Europe
PMC [71]. As PaperBLAST searches only open-source texts, we expanded our search using
a cyclic approach described in Supplemental Methods Section (1.1). A final collection of
sequences and keywords used for sequence-/text-based searches can be found in Data
Table S1. The resulting list of curated publications was divided into two groups: “focal”
(i.e., homolog mentioned in title or abstract; Table 1) and “non-focal” (i.e., mention occurs
in other publication sections or supplemental/attached files). The complete collection of
focal/non-focal publications is reported in Data Table S2. All individual DUF34 family
members with publications are listed in Table S3. Using this integrative search approach,
the ultimate total of reference terms reached upwards of 857 and provided DUF34
member-relevant data for ~100 unique organisms. This process increased the total number of
DUF34 protein family-relevant papers from <30 when using a simple PubMed search with
the following query, [“DUF34” OR “NIF3” OR “NIF3L1” OR “YbgI” OR “YqfO”], to 333
distinct publications using the iterative approach.
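
For readers who want to reproduce the baseline search programmatically, the sketch below assembles the simple Boolean query quoted above and a corresponding NCBI E-utilities ESearch URL. It is only an assumption that such a script would approximate the original search, which may have been run through the PubMed web interface; the helper names and the `retmax` default are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Family names taken from the query quoted in the text.
FAMILY_TERMS = ["DUF34", "NIF3", "NIF3L1", "YbgI", "YqfO"]


def build_query(terms):
    """Join quoted terms with OR, matching the simple PubMed query in the text."""
    return " OR ".join(f'"{term}"' for term in terms)


def esearch_url(terms, retmax=100):
    """Return an ESearch URL for PubMed; retmax caps the number of returned IDs."""
    params = {"db": "pubmed", "term": build_query(terms), "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


query = build_query(FAMILY_TERMS)
print(query)
print(esearch_url(FAMILY_TERMS))
```

Extending `FAMILY_TERMS` with species- or system-specific gene names found in each round is one simple way to mimic the iterative expansion described above.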
Although the captured data covered all superkingdoms, the distribution of publication
counts skewed largely toward bacteria, this domain having the greatest number of “non-
focal” publications and, thereby, total publications overall. In contrast, work examining
eukaryotic systems contributed the greatest proportion of “focal” publications. Only one
“non-focal” publication featured a viral homolog. No publications were found to describe
DUF34 family members for any species of plant (Viridiplantae), consistent with the absence
of DUF34 homologs among annotated plant genomes discussed below.
To discern whether any common functional associations could be extracted from the
final DUF34 corpus, word clouds were generated using publication titles of both focal
and non-focal publications (Data Table S2, Figure S1). The resulting diagrams predominantly
emphasized the systems of study (e.g., “Mycobacterium”, “Escherichia”, “Bacillus”,
“yeast”) and terms relating to the characterization process (e.g., “reveal”, “novel”,
“analysis”, “functional”, “identifies”, “associated”), neither of which provided much
insight into a specific function. However, other less pronounced keywords were indicative
of more specific biological contexts, such as “mitochondrial”, “DNA repair”, “DNA
methylation”, “[Fe]-hydrogenase cofactor biosynthesis”, “stress”, “virulence”, “heat”,
“resistance”, and “secreted”. Together, these diagrams illustrated that the
surveyed literature emphasized themes of bacterial pathogen virulence, gene regulation,
cell signaling pathways, stress response, as well as metal ion metabolism and related
membrane homeostasis.
Across published data, differences in the localization of DUF34 proteins are reported
with no clear consensus. In fungi, for example, family members have been linked to
mitochondria (e.g., P53081, Saccharomyces cerevisiae), while also, in the same organism
(S. cerevisiae [72]), being observed to translocate between the nucleus and cytosol. This
translocation is also observed in higher eukaryotes (e.g., Q9GZT8, Homo sapiens; Q9EQ80,
Mus musculus), and, in some cases, appears to be regulated by retinoic acid (Q09GP9,
Bombyx mori [73]). Although understood as being predominantly cytoplasmic in bacteria,
truncated DUF34 homologs are secreted in Pseudomonas species as a proposed nematocidal
agent [74]. In another case, homologs have been observed to occur at the cellular poles
of E. coli, co-localizing with PstB (phosphate transporter subunit, ATP-binding) and TktA
(transketolase) [32].
Historically, associations of NIF3L1 with human disease have driven much of the impetus
for research into this DUF34 homolog [30,31,75,76]. Such links to human disease have been
particularly reinforced by many non-focal publications (Table S3; Data Table S2). Indeed,
expression of DUF34 in eukaryotes has been associated with several human pathologies,
including cancers [77–93], chemotherapeutic drug response [94,95], psychiatric disorders [96,97],
cardiovascular disease [98–100], insulin resistance [101], osteoporosis [75,102],
inflammation [103], Amyotrophic Lateral Sclerosis (ALS) [30,104], Williams–Beuren Syndrome [31], as
well as several other degenerative and developmental neurological diseases [76,105,106]. The
regulation of DUF34 homologs by retinoic acid or biochemical relatives (e.g., all-trans retinoic
acid, ATRA; testosterone [Comparative Toxicogenomics Database]) appears to be conserved
between humans, mice, and select life stages of some insects [73,107–109]. Associations to
cell differentiation through gene regulation were also numerous [73,106–108,110–113].
Links to virulence and environmental stress responses dominated the studies of
bacterial and fungal DUF34 homologs [32,74,114–127]. In addition, links to the regulation
of central carbon metabolism were made in Geobacillus stearothermophilus [128] and Bacillus
subtilis [129]. Although ssDNA- and dsDNA-binding properties in vitro were observed
for at least one archaeal homolog [130], only ssDNA-binding activity has been reported in
bacteria [131], observations of which later came under scrutiny in the context of UV-induced
DNA damage responses in E. coli [32].
In this comprehensive review of the literature for members of the DUF34 family, obser-
vations and functional associations were highly pleiotropic and could be the result of many
indirect effects. The only precise molecular function proposed with compelling biochemical
evidence is the role as a metal ion insertase in metallocofactor biogenesis described for the
homologs of Methanocaldococcus jannaschii [132] and Methanococcus maripaludis [133].
3.2. Conservation of Metal Binding Site but Variability of Metal Identity across DUF34 Structures
To complement the literature search, PDB was queried using select DUF34 sequences
(YqfO, B. subtilis, P54472; NIF3L1, H. sapiens, Q9GZT8; YbgI, E. coli, P0AFP6; MJ0927, M.
jannaschii, Q58337) as input. These initial queries returned 15 unique structure entries of
DUF34 proteins from six different organisms (5 bacteria, 1 archaeon) (Table 2). Text-based
queries of PDB were also performed using “NIF3”, yielding a total of 27 structures, of which
only 16 were discernible members of the DUF34 family. These were found to represent two
superkingdoms and, within these, seven distinct organisms (eight structures each from
bacteria and archaea).
Table 1. Focal publications featuring members of the DUF34 protein family.
Name | Organisms | Phenotype, Biological Relevance | Reference
YqfO/BC_4286 | Bacillus cereus | Inserted domain similar to PII-like/CutA1 family proteins; present in select bacterial clades; domain may regulate catalytic activity | [134]
YqfO/BSU_25170 | Bacillus subtilis subsp. subtilis str. 168 | With YlxR, coregulates tsaEBD (t6A synthesis [62]); disruption impairs tsaEDB regulation, loss of glucose-induction of sigX via PDHc expression dysregulation | [129]
BmNIF3l | Bombyx mori | Translocates to nucleus from cytoplasm upon ATRA tx; higher transcript levels in differentiating tissues; no expression detected in the egg stage | [73]
YbgI/b0710 | Escherichia coli | Structure, homohexameric toroid; monomers possess dinuclear metal ion-binding site; putatively involved in DNA repair | [26]
 |  | No survival impairment upon mutant UV tx; polar localization during cell division (co-localized with PstB, TktA); GlmS putative interaction partner; mutant sensitive to antibiotics affecting cell wall synthesis | [32]
XynX | Geobacillus stearothermophilus | Negatively regulates expression of xynA (encodes a secreted xylanase); may be negatively regulated by xylR | [128]
NIF3L1/ALS2CR1/CALS-7/MDS015/My018 | Homo sapiens | Ubiquitously expressed during embryonic development; strong over-expression in spermatogonia-derived, teratocarcinoma cell lines; Isolated, characterized; cytosolic subcellular localization; highly conserved N-, C-terminal regions; shares inserted region of its murine homolog (CutA1-like) | [24]
 |  | NIF3L1 interacts with splice variant, NIF3L1 BP1 (THOC7), cytosolic colocalization; C-terminal leucine zipper-like domain of variant mediates interaction; not indicated in repression in NIH3T3 cells; binding partner, NIF3L1 BP1, demonstrates additional passive presence in the nucleus | [25]
 |  | Retinoic acid-induced binding, cooperative translocation with Trip15/CSN2 from the cytosol to the nucleus (early neuronal development, silences differentiation suppressor Oct-3/4); ubiquitous expression, important in neuronal development | [107]
 |  | Detected in brain, spinal cord, and lymphocytes; observed as two distinct transcripts with similar patterns of expression; highest levels of both transcripts in heart, skeletal muscle, testis; smaller transcript was expressed at a higher level than the other; no deletions, polymorphisms linked to ALS patients relative to controls; 1 of 6 candidates eliminated for a causative link to ALS2 | [30]
 |  | 1 of 4 hypermethylated, significant differential expression shared between two cancellous bone specimen groups: osteoarthritis, osteoporosis | [75]
 |  | With 14-3-3, co-regulates transcriptional of Wbscr14 by preventing its nuclear localization via complex formation (Wbscr14 participates in the complex-mediated transcription of lipogenic enzymes, promoting fat accumulation) | [31]
 |  | Included in a 7.5-Mb interstitial deletion on 2q32.3–33.1 (28 genes) in patient diagnosed with SATB2-Associated 2q32-q33 microdeletion syndrome | [76]
 |  | Significantly associated with triptolide chemosensitivity in lymphoblast cell lines | [135]
 |  | COPS2 point mutations consistent with previously defined NIF3L1-COPS2 co-repression interaction model (limited; pathogenesis associated COPS2 mutations: S120C, N144S, Y159H, R173C) | [136]
HP0959 | Helicobacter pylori | GTP-binding, hydrolysis in vitro, biologically irrelevant pH, temperature | [34]
HcgD/MJ0927 | Methanocaldococcus jannaschii | Proposed iron chaperone required for FeGP cofactor biosynthesis | [132]
 |  | Homohexameric via 2 interfaced homotrimeric units; binds to ssDNA/dsDNA | [130,137]
Nif3l1/1110030G24Rik | Mus musculus | Isolated, characterized; ubiquitous expression across tissues; cytosolic localization; highly conserved N-, C-terminal regions; shares inserted region of the human homolog | [24]
 |  | Retinoic acid-induced binding, cooperative translocation with Trip15/CSN2 from the cytosol to the nucleus (early neuronal development, results in the silence of the differentiation suppressor Oct-3/4); ubiquitous tissue expression, important in neuronal development | [107]
WP_046236688, WP_032702676, PP_1038, VT47_06255, WP_017124074, WP_054077596 | Pseudomonas sp. | (“YqfO03”) small, secreted protein; demonstrated high potency as nematicide against C. elegans, M. incognita; free-standing YqfO domain-containing protein (no NIF3/DUF34 domains) is a member of the NIF3 protein family | [74]
Nif3/YGL221C | Saccharomyces cerevisiae | Determined to have dual/multiple localizations (cytosolic, mitochondrial) | [72]
SA1388 | Staphylococcus aureus | The central domain of NIF3 homolog has high structural similarity to CutA1 (family linked to cation tolerance, homeostasis) | [138]
SP1609 | Streptococcus pneumoniae | Described as a member of the same orthologous group (COG2384) as TrmK, RpoD protein families via structural alignment (incorrect*) | [139]
TTHA1606 | Thermus thermophilus HB8 | Binds to ssDNA (very weakly, in vitro) | [131]
NIF3-like protein superfamily | NA | (electronic translation) describes family members of model organisms (Eukaryota, Bacteria), structures published prior to 2007 | [140]
DUF34 monomers form a homohexameric quaternary structure assembled through the
trimerization of homodimers in a “head-to-tail”, tessellating fashion. This homohexameric
toroid is conserved across published structures, with the central opening averaging 31 Å
in diameter (range: 24–38 Å). In some cases, the toroid is modified by the addition of
trimeric “lids” on each side of the central opening, creating a cage-like structure; the
monomeric structural features that constitute these “lids” are the inserted PII-like domains
observed in DUF34 family members belonging to select bacterial clades, fungi, and
vertebrates [134]. The inserted domains forming these trimeric “lids” have been described
as highly flexible, which affects the resolution of the corresponding architecture [134,138].
A dinuclear metal-binding active site, predicted to be catalytic rather than structural [26],
is highly conserved across available structures of DUF34 family proteins (Table 2). This
active site is defined by a central cleft in each monomer within which two divalent metal
ions bind [26]. The nature of these metal ions varies: from iron, found in both bacterial and
archaeal homologs [26,132], to zinc, found in bacterial homologs containing the additional
PII-like domain (i.e., SA1388 of Staphylococcus aureus; YqfO of Bacillus cereus) [134,138].
This difference in metal ion binding does not appear to be attributable to the additional
domain, as the topology of the active site has been described as remaining entirely
undisturbed, or “identical”, between homologs with and without the distinct domain
architecture [134,138].
Table 2. Published structures of DUF34 protein family members.
Name | Organisms | Ligands | PII Domain | PDB | Phenotype | Reference
YbgI | Escherichia coli | (2)Fe3+ | No | 1NMO | NA | [26]
 | | (2)Mg2+ | No | 1NMP | |
HcgD/MJ0927 | Methanocaldococcus jannaschii | (1)Cl, (2)Fe3+ | No | 3WSD | Weaker Fe1 site under oxidized conditions in vitro | [132]
 | | (2)Fe2+, (1)PO43− | No | 3WSE | |
 | | (1)Fe3+, (1)citrate | No | 3WSF | |
 | | (1)Fe2+, (1)citrate | No | 3WSG | |
 | | (1)Fe3+, (1)SO42− | No | 3WSH | |
 | | (1)Fe2+, (1)PO43− | No | 3WSI | |
 | | NA | No | 4IWG | Binds to ssDNA, dsDNA in vitro | [130,137]
 | | NA | No | 4IWM | |
SA1388 | Staphylococcus aureus | (2)Zn2+, (1)B3P | Yes | 3LNL | Cavity diameter = 38 Å; opening edge length = 20 Å (triangular opening) | [138]
 | | (2)Zn2+ | Yes | 2NYD | |
SP1609 | Streptococcus pneumoniae | NA | No | 2FYW | NA | PDB only
TTHA1606 | Thermus thermophilus | NA | No | 2YYB | Binds ssDNA not dsDNA in vitro | [131]
Sthe_0840 | Sphaerobacter thermophilus | (7)Cl*, (14)FMT*, (1)ACT* | No | 3RXY | NA | PDB only
YqfO | Bacillus cereus | (2)Zn2+, (1)HEPES, (1)TRS | Yes | 2GX8 | NA | [134]
* Asterisk indicates that ion count is per the respective asymmetrical unit as opposed to per monomer.
The metal ion-binding sites found in bacterial DUF34 structures contain seven highly
conserved residues: five histidines, one glutamate, and one aspartate [26,138] (Figure 1).
These seven residues are conserved in both the YbgI and YqfO forms, the latter possessing
the additional, central “YqfO-like” domain [134]. The active sites are invariably located
inside the toroid’s central channel; however, the solvent accessibility of this space differs
between the two types of quaternary structure, with the “cage-like” prolate spheroid with
trimeric “lids” restricting access to the active sites more strongly [131,134]. It should be
noted that one outlier description of the archaeal DUF34 family member MJ0927 of
M. jannaschii (4IWG, 4IWM) appears to differ greatly from all other descriptions of
quaternary structure for this family [130,137], even contradicting several structures
published for the same homolog (3WSD, 3WSE, 3WSF, 3WSG, 3WSH, 3WSI), which go as far
as resolving the active site in different oxidation states [132]. This anomalous structure is
described as a homohexameric spheroid with three openings (~33 Å in diameter), instead of
the single, central opening of the toroid conserved in all other published structures of the
DUF34 family.
Figure 1. Dinuclear metal-binding site of the E. coli DUF34 homolog, YbgI. The crystal structure
of YbgI illustrates the residues conserved across the protein family within the monomeric cleft of
the active site and its dinuclear metal center. Highly conserved residues noted by Ladner et al. [26]
as being involved in the structure of the binding pocket are colored orange and labeled with
residue identity and position.
3.3. Family Wide and Superkingdom-Specific Signature Motifs
The NIF3/DUF34 family is large, containing 6804 member sequences in Pfam (release 32.0),
and its members span all kingdoms of life. Previous studies have shown that proteins of this
family can have different domain architectures [26,130,131,134,138], but no systematic,
comparative analysis of these architectural distinctions had been performed across all
superkingdoms. We therefore set out to classify the proteins of the DUF34 family into
subtypes based on their domain arrangements and the presence or absence of specific
sequence motifs. Because several DUF34 protein structures were available (Table 2), these
were used to guide alignment choices and, ultimately, to map conserved residues.
To resolve subtypes within the DUF34 family, multiple sequence alignments were initially
performed with members from all superkingdoms. Ortholog sequences were extracted from
OrthoInspector for each superkingdom (Data Table S3), and structure-based alignments were
generated for each group using the MultAlin and ESPript webservers (Figure S2) [141,142].
The motifs were divided into three groups, or “tiers”, based on their degree of
cross-superkingdom conservation. Four motifs were found to be conserved across all three
superkingdoms (logos with distinct tiers for all three superkingdoms are shown in Figure S3).
These conserved tier 1 residues were all integral to the metal-binding pocket and are the
residues described in Figure 2.
The most notable difference among the more highly conserved motifs was within the
dual-histidine motif of the N-terminal region (Figure 2). In eukaryotes, the first histidine
residue is replaced by a tyrosine, which may alter the dimensions of the binding pocket
(Figure 1). Another notable distinction in eukaryotes is a second histidine pair ((M/L)xHH)
located after the C-terminal “Dxxx(T/S)G(E/D)” motif (Figure 2). As no published structures
of eukaryotic homologs were available, a model of a representative tertiary structure was
generated using the Phyre2 fold prediction webserver (Figure S4). This model suggested that
the additional histidine pair did not contribute to the binding pocket (Figure S4d) and was
instead exposed on the protein surface, implying a possible role in protein–protein
interactions; however, characterizations of this and similar structures have indicated a
putative involvement in the architecture of the active-site cleft formed upon dimerization
[138]. A final distinguishing feature of the eukaryotic tier 1 sequence is an additional
arginine residue following the C-terminal “HxxxE” motif, a motif indicated as a likely
contributor to the binding pocket [26,134].
Figure 2. Key motifs of Bacteria and Archaea compared to those of Eukaryota. Eukaryotic sequences were aligned
separately, whereas bacterial and archaeal sequences were aligned together. A multiple-motif method was used to
determine and compare family signatures. The distinct levels of conservation per superkingdom are illustrated in
full in Figure S3.
3.4. A Variable Central Insertion Occurs in Some DUF34 Family Members
Alignments performed per superkingdom revealed a large diversity in the lengths of aligned
sequences (Data Table S4). The spacing between the tier 1 motifs varied greatly with the
superkingdom. To better understand the occurrence and length distribution of this inserted
domain, the regions between the “YxxHxxxxD” and “Dxxx(T/S)G(E/D)” motifs were manually
extracted, their lengths were measured, and the values were superimposed onto a species tree
(Figure 3). This revealed that the inserted domains were relatively well conserved in select
clades of bacteria, a finding reminiscent of an earlier observation made by Godsey et al.
[134]. Unexpectedly, an inserted region was frequent in proteins from higher-order
eukaryotes but was absent from archaeal homologs. Among eukaryotic DUF34 proteins, the
insertion sizes followed a pattern of diminishing length from vertebrate to invertebrate
homologs (from higher-order to lower-order eukaryotes) (Figure 3). In contrast, the length of
this domain, when present, was relatively stable among bacterial homologs, with 28.3%
harboring a large form of the insertion (~100 aa) and the remaining sequences lacking the
domain entirely. Apart from the insertions observed in vertebrates, the sizes of this domain
varied greatly, especially in invertebrate Bilateria and fungi, the latter taxon showing the
shortest domains. Only one viral DUF34 member, MIMI_R836 (Q5UQI9) of Acanthamoeba
polyphaga mimivirus, was retrieved from published data, and its length was notably
dominated by the inserted domain.
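The measurement of inter-motif regions described above can be approximated computationally. The following Python sketch is illustrative only and is not the procedure used in this study, where the regions were extracted manually. It assumes that the two motifs can be written as regular expressions in which “x” matches any residue; the function name `insertion_length` and the toy sequence are hypothetical.

```python
import re

# Motif patterns transcribed from the text; "x" is treated as any residue.
# Assumption: a literal regex scan is an adequate stand-in for manual extraction.
N_MOTIF = re.compile(r"Y..H....D")        # "YxxHxxxxD"
C_MOTIF = re.compile(r"D...[TS]G[ED]")    # "Dxxx(T/S)G(E/D)"


def insertion_length(seq):
    """Return the number of residues between the end of the first N-terminal
    motif match and the start of the next C-terminal motif match.

    Returns None if either motif cannot be found in that order.
    """
    n_match = N_MOTIF.search(seq)
    if n_match is None:
        return None
    # Search for the C-terminal motif only downstream of the N-terminal motif.
    c_match = C_MOTIF.search(seq, n_match.end())
    if c_match is None:
        return None
    return c_match.start() - n_match.end()


# Hypothetical toy sequence: flank + N-motif + 5-residue spacer + C-motif + flank.
toy = "AAA" + "YAAHAAAAD" + "GGGGG" + "DAAATGE" + "KK"
print(insertion_length(toy))
```

Sequences that deviate from either pattern return `None` and would still need to be inspected in the alignment.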
Figure 3. Inserted domain lengths across model taxa. The lengths of inserted domains were measured for each homolog.
The sequences (organisms listed in Data Table S4) were aligned per superkingdom to delimit domains, which allowed
each inserted region (if present) to be measured. An evolutionary tree was generated using PhyloT and iTOL and was
annotated with the length of the inserted domain in each respective homolog. All measured inserted-domain lengths
were also used to generate Figure S5, a histogram of counts by domain length range per superkingdom.
3.5. The DUF34 Family Can Be Split into Eight Interconnected Subgroups
To further characterize domain architectures and examine possible functional subclasses,
we collected the annotated domains linked to DUF34 family members, specifically leveraging
InterPro HMM profile signature identifiers and EggNOG group IDs (Clusters of Orthologous
Groups, or COGs) (Figure 4; Data Table S5). Various overlapping combinations of COGs and
HMM profile signatures were observed, generating a set of specific architectural patterns
that were used to delineate alphabetically named subgroups (i.e., A–G). Most DUF34 members
fell within one of two keystone COGs. The first, COG0327 (subgroup A; Figure 4a), is
predominantly defined by the presence of two specific HMM profile signatures, IPR036069 and
IPR002678, and largely defines the basis shared across subgroups. COG0327 is further divided
by HMM profile signatures into two subgroups, subgroup B and subgroup C (Figure 4a), the
former containing an animal-specific signature (IPR017222) and the latter a bacteria-specific
signature (IPR017221). Although subgroup C was described by its InterPro HMM profile
signature annotation as being limited to bacteria, nearly all proteins observed within this
subgroup belonged to eukaryotes. All members of subgroup B occurred in eukaryotes. The
second keystone COG of the DUF34 family, COG3323, is defined by the presence of IPR015867
and IPR036069 (subgroup D; Figure 4a), with IPR036069 being shared between COG3323 and
COG0327. The addition of a third HMM profile signature, IPR004323, to the pairing of
IPR015867 and IPR036069 defined the fifth subgroup, subgroup E. The presence of all three
keystone COG-definitive signatures (i.e., IPR002678, IPR015867, and IPR036069) was
diagnostic of fusions of COG0327 and COG3323. These fusions were observed to occur in two
forms, subgroup F and subgroup G, the latter of which was defined by the additional
bacteria-specific signature, IPR017221 (Figure 4a), previously noted in the definition of
subgroup C.
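The subgroup definitions given above can be restated as a small decision procedure. The Python sketch below is one reading of those rules and is not the authors' pipeline: the function name `classify_subgroup` is hypothetical, and it assumes that IPR036069 is required for every subgroup and that the most specific matching rule takes precedence.

```python
def classify_subgroup(signatures):
    """Assign a DUF34 subgroup letter (A-G) from a collection of InterPro IDs.

    Assumptions (not stated as an algorithm in the text): IPR036069 must be
    present, and the most specific rule that matches is the one applied.
    Returns None when no rule applies.
    """
    sigs = set(signatures)
    if "IPR036069" not in sigs:
        return None

    has_nif3 = "IPR002678" in sigs   # COG0327-associated signature
    has_yqfo = "IPR015867" in sigs   # COG3323-associated signature

    if has_nif3 and has_yqfo:        # COG0327-COG3323 fusions
        return "G" if "IPR017221" in sigs else "F"
    if has_yqfo:                     # stand-alone YqfO-like forms
        return "E" if "IPR004323" in sigs else "D"
    if has_nif3:                     # COG0327-only forms
        if "IPR017222" in sigs:      # animal-specific signature
            return "B"
        if "IPR017221" in sigs:      # bacteria-specific signature
            return "C"
        return "A"
    return None


print(classify_subgroup(["IPR036069", "IPR002678", "IPR015867", "IPR017221"]))
```

Checking for the fusion first is a design choice: subgroups F and G share signatures with both keystone COGs, so testing the single-COG rules first would misassign them.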
The D–G subgroups can be differentiated from the A–C subgroups by the presence of an
“HPYE” motif attributable to the HMM profile signature IPR015867 (Figure S6a,b). Subgroups
D and E can also be viewed as stand-alone forms of the inserted domain found in subgroups
F and G. For example, of the DUF34 paralogs of B. cereus, BC_2685 (Q81CR2) and BC_4286
(Q818H0), the latter was found to contain an inserted domain bearing high similarity to the
former (31.0% identity, 48.0% similarity; EMBOSS Matcher; Figure S7d) (Figure 4b). The
former paralog, BC_2685, was also identified as a member of the CutA1 protein family
(PF03091). Interestingly, this YqfO-like paralog was found to have greater identity to the
CutA1 homolog of H. sapiens (O60888; 29.4% identity, 47.1% similarity) than to that of other
bacteria (i.e., E. coli; P69488; 25.6% identity, 55.8% similarity). Moreover, the final
glutamate residue of the key motif that also distinguishes the inserted domains of DUF34
family members, “HPYE” of the IPR015867 HMM signature profile (Figure S7g), was replaced
by a glutamine in the CutA1 of E. coli, a replacement also observed in the inserted domain of
NIF3L1, the DUF34 homolog of H. sapiens. The CutA1 protein family (formerly known as
DUF190) has historically been linked to divalent cation tolerance, copper sensitivity, and
cytotoxicity (PF03091; IPR004323; COG1324) [143–149]; however, due to characteristics of
the quaternary structure (trimers form ferredoxin-like folds [150]), roles in signal
transduction and regulation have also been suggested [151–153]. More recently, refutation of
the protein’s involvement in metal ion tolerance has led to predictions that CutA1 proteins
act in a small-molecule carrier or signaling capacity [154,155]. Still, the functions of all
three “CutA” proteins remain under-defined, with only limited attributions put forward for
the two besides CutA1: CutA2 (DsbD) is thought to have disulfide oxidoreductase activity
[156], and CutA3 (YjdC) has been annotated as an HTH-type transcriptional regulator
(TetR/AcrR family), more specifically a negative regulator of the nitroreductase NfnB [157].
Figure 4. COG-InterPro HMM signature profile relationships and defined subgroups across DUF34 family members. The
sequences of organisms across the DUF34 protein family, including all fusions and paralogs, were analyzed for co-occurrence
relationships of COGs and HMM-determined InterPro family/superfamily/domain annotations. All homologs, paralogs,
and fusions were validated using eggNOG and KEGG Paralog Search. Sequences missing InterPro annotation
were analyzed by NCBI CDD Search and InterProScan Search. See Data Table S5 for categories and respective COG
designations/InterPro signature profiles in tabular format. The sequence source organisms considered were those also
listed in Data Table S4. Groups were designated by the differential keystone signatures shown in (a), and select
representative sequences of subgroups (A–G) are shown in (b).
3.6. Taxonomic Distribution Suggests That the NIF3 (COG0327) and YqfO-like (COG3323)
Domains Have Different Functions
Contrary to the expectation of universal conservation established by past publications,
particularly in Eukaryota, DUF34 appeared absent from the eukaryotic clade Viridiplantae,
with the closest incidence of homologs occurring in select Haptophyta. Although some
sequence-based queries of NCBI’s databases indicated the existence of a partial homolog
belonging to a specific eudicot (i.e., histidinol dehydrogenase chloroplastic isoform X1,
GEY60218.1; GFD1148.1; KYP77406.1), these few observations appear largely uncorroborated
and were suspected to be products of bacterial contamination. Caenorhabditis elegans, a
common model organism, was also observed to lack a DUF34 homolog. Among the organisms
analyzed, Archaea exclusively harbored DUF34 members of subgroup A (Figure 5). The
animal-specific subgroup B was restricted to Metazoa, occurring ubiquitously across
Euteleostomi. Subgroup A often replaced the animal-specific subgroup B in other lower-order
clades of Metazoa including, but not limited to, Arthropoda, Annelida, and Mollusca
(Figure 5). Subgroup A also demonstrated the greatest overall prevalence and broadest
taxonomic range, being observed in the majority of organisms across the three major
superkingdoms. Almost all bacteria lacking a subgroup A homolog harbored a subgroup G
homolog, the bacterial COG0327-COG3323 fusion, in its place. Of all YqfO-like (COG3323)
variants of the DUF34 family (subgroups D–G), only subgroup G was ever observed to occur
without a subgroup A, B, or C form also present. The only exception to this pattern of
subgroup absence–presence was Acanthamoeba polyphaga mimivirus (tax ID: 212035), which
was found to encode only a subgroup D homolog. Interestingly, the DUF34 form annotated as
being specific to bacteria, subgroup C, was almost exclusively observed among select species
of non-metazoan bilateria, occurring in only a single bacterial organism (i.e.,
Desulfovibrio alaskensis).
Figure 5. Absence–presence of DUF34 architectural domain subgroups. Absence–presence data for COGs and
HMM-determined InterPro family/superfamily/domain signature profiles were added to a species tree generated using
organisms harboring published homologs and those used in alignments acquired via OrthoInspector (Data Table S4).
Proteins are designated as categories A–G, as detailed in Figure 4 and Data Table S5. These homologous domains are
classified in the map according to their HMM-defined DUF34 domain identities (see Figure 4a).
Approximately three-quarters of the genomes analyzed encoded only one subgroup of the
DUF34 family. In organisms with two or more subgroups, the most frequent combination was
the co-occurrence of a subgroup A, B, or C member with any member of subgroups D–G.
Although rarely, subgroups A, B, and/or C were observed to co-occur, most often in pairs, in
eukaryotic organisms, but never in bacteria, archaea, or viruses. Only members of subgroup G
occurred more than once without any member of subgroups A–C. This suggests that subgroup G
is the only form that can functionally replace any one of the A–C forms and that the
stand-alone versions of the inserted domain that define subgroups D and E most likely
perform a function different from that of subgroups A–C.
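The co-occurrence summary above can be derived from a simple presence table. The Python sketch below is a hypothetical illustration, not the analysis performed here: the function name `summarize_presence`, the genome labels, and the toy table are invented, and it assumes each genome is represented by the set of subgroup letters it encodes.

```python
NIF3_LIKE = {"A", "B", "C"}        # COG0327-only subgroups
YQFO_LIKE = {"D", "E", "F", "G"}   # subgroups carrying the COG3323 signature


def summarize_presence(presence):
    """Summarize a mapping of genome name -> iterable of subgroup letters.

    Returns (fraction of genomes encoding exactly one subgroup,
             genomes with a D-G member but no A-C member).
    """
    if not presence:
        raise ValueError("presence table is empty")
    table = {genome: set(groups) for genome, groups in presence.items()}
    single = sum(1 for groups in table.values() if len(groups) == 1)
    lone_yqfo = {
        genome: sorted(groups)
        for genome, groups in table.items()
        if groups & YQFO_LIKE and not groups & NIF3_LIKE
    }
    return single / len(table), lone_yqfo


# Hypothetical toy table, not data from this study.
toy_presence = {
    "genome_1": {"A"},
    "genome_2": {"A", "D"},
    "genome_3": {"G"},
    "genome_4": {"B"},
}
fraction_single, lone = summarize_presence(toy_presence)
print(fraction_single, lone)
```

Listing the genomes that carry a D–G member without any A–C member is what would expose a pattern such as subgroup G being the only recurrent stand-alone form.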
In a larger survey of available complete bacterial genomes (JGI-IMG/M; accessed on
30 January 2020), DUF34 homologs annotated as belonging to both COG3323 and COG0327
(subgroups D–G) occurred in 18% of complete bacterial genomes, while a much larger fraction
of the bacterial family members (66%) carried only the COG0327 designation (subgroups A–C)
(Data Table S6) [158–160].
3.7. Physical Clustering and Co-Expression Further Link the DUF34 Family to Metal Ion
Homeostasis and Iron–Sulfur Cluster Metabolism
To determine associations based on physical clustering, gene neighborhoods for
members of the DUF34 family were examined using the IBT–UNAM Computational
Genomic Group’s Gene Context Tool (GCT). The GCT webserver was used to retrieve
collections of commonly clustered COGs of DUF34-encoding operons for taxonomic subsets
of bacterial and archaeal DUF34 family members (Data Table S7, a). These data were
then used to develop a method of text analysis-enabled assessment of COG and COG
description keyword/phrase frequencies, the methods of which are described further in
the Supplemental Methods Section (1.2). This approach will be referred to, henceforth,
as Physical Clustering Keyword Frequency Analysis (PCKFA). Using PCKFA, COGs and
their descriptions were examined for common annotations and trends that could inform
on potential functional associations. PCKFA of COG identifiers was used to generate a
ranked list of co-occurring COGs. These data were sorted by frequency to generate a final list of the top 20 highest-ranking COGs occurring across all taxonomic ranges (Table 3). Upon closer review of the associated functional annotation, it was determined that 65% (13) of the top 20 most frequently co-occurring COGs of DUF34-containing operons were either predicted or confirmed to be “metal ion-binding/-dependent”, an incidence notably greater than the one-third of proteins within PDB predicted to require metal ions [161]. Three of these 13 metal ion-binding/-dependent COGs were found to bind Fe-S clusters (Table 3). Despite the diversity of operon compositions that
were observed within and between the data’s selected taxonomic ranges (Data Table S7),
keywords linked to metal ion homeostasis and Fe-S cluster-dependent processes recurred
with notable frequency (Figure S7a).
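The frequency-ranking step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the actual procedure is described in the Supplemental Methods); the toy operon contents, the `METAL_BINDING` set, and the function names `rank_cogs` and `metal_fraction` are hypothetical placeholders chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy input: each operon is the set of COGs clustered with COG0327.
operons = [
    {"COG0327", "COG1579", "COG0568"},
    {"COG0327", "COG1579", "COG0358"},
    {"COG0327", "COG0568", "COG2384"},
    {"COG0327", "COG1579"},
]

# Hypothetical annotation: COGs assumed to be metal ion-binding/-dependent.
METAL_BINDING = {"COG0327", "COG1579", "COG0568", "COG0358"}


def rank_cogs(operons, top_n=20):
    """Count the operons in which each COG occurs; return the top_n (COG, count) pairs."""
    counts = Counter(cog for operon in operons for cog in operon)
    # Sort by descending count, then by identifier so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def metal_fraction(ranked, metal_cogs):
    """Fraction of the ranked COGs that are annotated as metal-binding."""
    if not ranked:
        return 0.0
    hits = sum(1 for cog, _ in ranked if cog in metal_cogs)
    return hits / len(ranked)


ranked = rank_cogs(operons)
fraction = metal_fraction(ranked, METAL_BINDING)
```

With real data, `operons` would be populated from the gene-neighborhood collections and `METAL_BINDING` from curated annotation; the 65% figure reported in the text corresponds to `metal_fraction` evaluated on the top 20 COGs.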
Representative operons were curated to facilitate more granular, context-driven analy-
ses investigating the observed trends (Data Table S7, d–e). With an initial survey of metal
bias based only on COG descriptions, whether or how many of the encoded COGs might
be linked to pathways involving metal ions and/or Fe-S clusters remained unclear. This
was largely due to the generally poor functional annotation statuses for many of the COGs
retrieved. Therefore, the individual sequences constituting these operons were examined thoroughly using functional annotation and key background literature (as described in Methods) to identify any catalytic dependencies on, or interactions with, metal ions. In 13 of the 51 selected bacteria (25.5%), COG0327 was observed to occur alone, and, of those not encoded alone (38 of 51), 31 were found to encode at least one protein with supported annotations of metal-binding/-dependence (81.6% of operons; count inclusive of Fe-S cluster-containing proteins) (Data Tables S7 and S8). A similar incidence was observed across archaeal representative operons, with 3 of 9 archaeal COG0327 proteins (33.3%) being encoded alone and, of the remaining six, five encoding at least one metal-binding/-dependent protein (5 of 6 operons; ~83%).
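As a quick arithmetic check, the proportions quoted above can be recomputed from the stated counts. The sketch below uses only the numbers given in the text; the helper `percent` and the variable names are illustrative assumptions.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as stated in the text for the representative operons.
bacterial_total, bacterial_alone, bacterial_with_metal = 51, 13, 31
archaeal_total, archaeal_alone, archaeal_with_metal = 9, 3, 5

# Operons in which COG0327 is encoded together with at least one other gene.
bacterial_clustered = bacterial_total - bacterial_alone
archaeal_clustered = archaeal_total - archaeal_alone

summary = {
    "bacterial_alone_pct": percent(bacterial_alone, bacterial_total),
    "bacterial_metal_pct": percent(bacterial_with_metal, bacterial_clustered),
    "archaeal_alone_pct": percent(archaeal_alone, archaeal_total),
    "archaeal_metal_pct": percent(archaeal_with_metal, archaeal_clustered),
}
```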
Biomolecules 2021,11, 1282 17 of 32
Table 3. Top 20 COGs found to occur in operons containing COG0327.

| Rank | COG | Name/Description | Metal(s) | References (PMID, EC Number) |
|---|---|---|---|---|
| 1 | COG0327 | Putative GTP cyclohydrolase 1 type 2, NIF3 family | Fe2+/Fe3+, Zn2+, Mg2+ | [26], [132], [138], [26.88.147.156], [26.89.148.157] |
| 2 | COG1579 | Predicted nucleic acid-binding protein DR0291, contains C4-type Zn-ribbon domain | Zn2+ | [125] |
| 3 | COG0568 | DNA-directed RNA polymerase, sigma subunit (sigma70/sigma32) | Zn2+, Mg2+ | [162], [2.7.7.6] |
| 4 | COG0358 | DNA primase (bacterial type) | Zn2+, Mg2+, Mn2+ | [163], [2.7.7.101] |
| 5 | COG0457 | ^a Tetratricopeptide (TPR) repeat | NA | None listed |
| 6 | COG2384 | tRNA A22 N1-methylase | NA | [2.1.1.217] |
| 7 | COG0079 | Histidinol-phosphate/aromatic aminotransferase or cobyric acid decarboxylase | NA; Co (cobalamin) | [164], [2.6.1.9] |
| 8 | COG0240 | Glycerol-3-phosphate dehydrogenase | NA | [1.1.1.94] |
| 9 | COG0328 | Ribonuclease HI (RnhA) | Mg2+, Mn2+, Co2+, Ni2+ | [165], [3.1.26.4] |
| 10 | COG0500 | ^b SAM-dependent methyltransferase | NA | [2.1.1.242] |
| 11 | COG0513 | ^c Superfamily II DNA and RNA helicase (SrmB/RhlB) | Mg2+, Mn2+ | [3.6.4.13] |
| 12 | COG0596 | 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase MenH and related esterases, alpha/beta hydrolase fold (MhpC) | NA | [3.7.1.14] |
| 13 | COG0655 | Multimeric flavodoxin WrbA, includes NAD(P)H:quinone oxidoreductase | Most req. Fe-S cluster; subtypes without Fe-S clusters | [1.6.5.2], [1.6.5.6] |
| 14 | COG0752 | Glycyl-tRNA synthetase, alpha subunit | Mg2+, Mn2+, Co2+ | [166], [6.1.1.14] |
| 15 | COG0826 | 23S rRNA C2501 and tRNA U34 5'-hydroxylation protein RlhA/YrrN/YrrO, U32 peptidase family; ubiquinone biosynthesis protein, UbiU/YhbU | Fe-S cluster/Fe, Ca2+ | [167,168] |
| 16 | COG1028 | NAD(P)-dependent dehydrogenase, short-chain alcohol dehydrogenase family | Co2+, Fe/Fe2+, Mg2+, Mn2+, Zn/Zn2+ | [1.1.1.2] |
| 17 | COG1897 | Homoserine O-succinyltransferase | NA | [2.3.1.31], [2.3.1.46] |
| 18 | COG0177 | ^d Endonuclease III (Nth) | Fe-S cluster, Ca2+, Co2+, Fe/Fe2+, Mg2+, Mn2+, Ni2+, Zn2+ | [169], [4.2.99.18] |
| 19 | COG0477 | ^d MFS family permease (includes anhydromuropeptide permease AmpG, ProP) | NA | None listed |
| 20 | COG0494 | ^e 8-oxo-dGTP pyrophosphatase MutT and related house-cleaning NTP pyrophosphohydrolases, NUDIX family | Co2+, Mg2+, Mn2+, Zn2+ | [3.6.1.13] |

Exceptions to representative operons relative to table contents: ^a Proteins containing TPR repeat domains present in archaeal operons. ^b SAM-dependent methyltransferase domains present (not designated COG0500). ^c Though not assigned COG0513, helicase domain-containing proteins are present (e.g., Era/COG1159, YhaM/COG3481). ^d MutY is present (COG1194), another endonuclease family member. ^e MutM/NUDIX domain containing proteins are present (COG0266).
Of all COGs encoded by COG0327-containing representative operons, COG1579 co-occurred most frequently. This COG was also determined through PCKFA to be the top-ranked in both singular occurrence and paired occurrence with COG0327 across taxonomic ranges (Figure S8b,c). COG1579 is a family of unknown function (DUF164) that is conserved primarily among bacterial clades, although homologs are also found in archaea. Members of this group have been linked to functional roles in chemotaxis, flagellin synthesis, type III secretion systems (i.e., Helicobacter pylori and Chlamydia trachomatis [125,170–172]), and bacteria-induced host cell maturation (i.e., Mycobacterium avium [173,174]), but the molecular mechanisms involved remain unknown. The homolog of Mycobacterium tuberculosis has been noted as an essential gene under some circumstances [175]. COG1579 members have an obvious link to metals because of the presence of a domain belonging to the zf-RING_7 Pfam family (PF02591 [176]). A characteristic feature of the zf-RING_7 family is the presence of a C4-type zinc-ribbon domain with two pairs of cysteines in a CxxC-x(18–26)-CxxC (zinc-finger) motif capable of binding zinc ions. Published structures (5Y06/5Y05 of M. smegmatis [171]; 4ILO of Chlamydia trachomatis [172]) demonstrate an unusual coiled-coil structure that is book-ended by the aforementioned distinctive zinc-finger domain.
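The CxxC-x(18–26)-CxxC motif described above translates directly into a regular expression. The following Python sketch is only an illustration of how such a motif could be scanned for in a protein sequence; it is not part of the published analysis, and the function name `find_zinc_ribbons` and the synthetic test sequence are assumptions made for the example.

```python
import re

# C4-type zinc-ribbon motif from the text: CxxC, 18-26 residues of any kind, CxxC.
ZINC_RIBBON = re.compile(r"C..C.{18,26}C..C")


def find_zinc_ribbons(sequence):
    """Return (start, matched_text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in ZINC_RIBBON.finditer(sequence.upper())]


# Synthetic sequence (not a real protein): two CxxC pairs separated by 20 residues.
toy = "MKA" + "CAAC" + "G" * 20 + "CSSC" + "LE"
hits = find_zinc_ribbons(toy)

# A spacer of only 10 residues falls outside the 18-26 window and should not match.
too_short = find_zinc_ribbons("CAAC" + "G" * 10 + "CSSC")
```

A pattern match of this kind only flags candidate sites; confirming a genuine zinc-ribbon would still require domain-level evidence such as a Pfam assignment or a structure.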
Despite the high clustering frequencies discernible for several co-occurring COGs, a
single link between DUF34 homologs and a distinct metabolic area remained unclear. The
diversity of metals associated with proteins encoded by DUF34-containing operons failed
to support a preference for a single metal or metal ion-complex, although zinc and iron
were found to be common interactors, second to magnesium and manganese. In addition,
many of the families listed in Table S4 were found to interact with several metal ions (up
to eight) with averages, across the table, of ~2.5 different metals for bacterial proteins
and ~1.9 for archaeal proteins (Figure 6). Several metal-dependent/-binding COGs found
to frequently cluster within DUF34-containing operons across taxa (Table 3) were also
common among representative operons (Data Table S7). When compared to all available
PDB structures (PDB 2020), the relative abundance of metal-binding proteins across both
archaeal and bacterial representative operons was observed to be significant (Data Table S8;
Figures S9–S11). A strong association with Fe-S cluster associated proteins was observed (7
of the 40 bacterial and 2 of the 14 archaeal metal-binding proteins analyzed) (Figure 6 and
Table S4). Examples include HcgA/BioB and HmdC/HcgG (FlpA homolog) in archaea,
and MutY, SplB, NfuA, PhrB, and BolA in bacteria.
Because DUF34 is conserved across bacteria, archaea, and most eukaryotes, and as physical clustering was appropriate for only two of three superkingdoms [177], co-expression (top 300 co-expressed, CoXPresDb; Data Table S9, sheets d.1–d.10) and coregulation databases (ProteomeHD; Data Table S10, a) were consulted to identify trends in putative functional associations of eukaryotic DUF34 family members shared with those observed through preceding analyses with bacterial and archaeal family members. Interestingly, a number of genes directly involved in iron homeostasis and Fe-S cluster biogenesis were observed to occur in most eukaryotic organisms surveyed (Data Table S9; Figure S12). BolA or BolA-like family members occurred in H. sapiens, M. mulatta, and S. cerevisiae. However, in the absence of a BolA-like homolog, S. pombe showed co-expression of an Fe-S cluster biogenesis factor, caf17 (IBA57-like; SPAC21E11.07), a member of the GcvT and CAF17 families [178]. Upon further review of the top 100 genes co-expressed in H. sapiens, YAE1D1 (57002, Yet Another Essential domain-containing 1), a highly conserved protein essential to the cytosolic Fe-S cluster protein assembly (CIA) complex [179], was also observed. Although a Yae1 homolog was not observed in the acquired datasets for either yeast, another essential component of the CIA complex, the Fe-S cluster-binding ATPase Nbp35 (2543416, S. pombe; 852789, S. cerevisiae), was found within the top 130 co-expressed genes of each. Genes encoding this protein were found co-expressed with NIF3L1 homologs in three eukaryotes of the 10 for which data were retrieved. Similar trends associating Fe-S cluster proteins and pathways were observed upon gene functional classification analyses of the same sets of co-expressed genes using the DAVID bioinformatics suite (Data Table S9, e.1–e.10).
Figure 6. Metal ion-binding of proteins encoded in representative Bacterial and Archaeal operons.
(a) A radar chart illustrating the proportions of DUF34-operon encoded proteins documented to
interact with certain metals or metal-containing moieties. Accounting for the over-representation of
magnesium and zinc among available protein structures, a second radar chart (b) was generated to
show the same data without proteins found to exclusively bind either or both ions. Bacterial data
are shown in blue while Archaeal data are shown in red. Data used to generate these figures can be
found in Table S4.
3.8. DUF34 Fusions Fortify Links to Metals and Metallocofactors, Most Notably Fe-S Clusters
Fusions can provide substantial insight into putative functional relationships between
their constituent protein families. To better understand the full diversity of fusions across
the DUF34 family, three different methods were used, as described in the methods section,
to generate a curated set of 226 sequences of varying validity (Data Table S11, b), covering
47 distinct fusion classes and 65 different fusion subclasses (see Supplemental Methods, 1.3).
After further curation focusing on fusions of highest confidence, nine fusion classes were
observed in eukaryotes and seven in bacteria. Eukaryotic fusions of note included those
with the following domains: WD40 repeat; BolA (BolA-like); FAD-binding flavoprotein;
RING- or THAP-type zinc finger; EF-Hand pair; or histone acetyltransferase (Figure 7a).
The most common fusions among eukaryotes were those containing the WD40 repeat domain, CIAO1/Cia1 (COG2319), which is thought to play a role in Fe-S cluster biogenesis. Consistent with this finding, a fusion with BolA was also observed (COG0271, PF01722; Fusarium oxysporum Fo47). Notably, the neighboring of BolA family members with DUF34, a phenomenon shared by at least one bacterial representative operon (Data Table S7, d.1–d.2), is not uncommon in fungal genomes; Bol2, for example, is divergently encoded immediately upstream of DUF34 in S. cerevisiae.
Figure 7. DUF34 fusions and select gene neighborhoods. (a) Domain architectures of DUF34 fusions. The domain render-
ing dimensions and positions are approximate. DUF34 domains are rendered in white with black outlines. Domain colors
correspond to the key shown in panel b. COGs of fusion domains are listed below each. Fusions deemed “invalid” or
“inconclusive” were excluded for panels a and b. (b) Pie chart of DUF34 fusions (126 sequences, total). The outer halo
surrounding chart indicates the superkingdoms in which respective fusions were observed (Eukaryota: black; Archaea:
dark gray; Bacteria: light gray). (c) Neighborhoods of select bacterial and archaeal fusions are shown (12 kb, each), all of
at least “conditional” validation confidence (Data Table S11). DUF34 is depicted in bright yellow and fusion domains are
indicated by hashing or alternative coloring. For DUF34 sequence labels, “YqfO” denotes a sequence also containing in-
serted domain, COG3323, while “YbgI” denotes a sequence without the inserted COG3323 domain. Rendered fusion do-
mains do not reflect exact sizes or locations. The color key is divided into two sets of identities (gray boxes): (top) general
metabolic theme or specific annotation with bioinformatic precedent; and (bottom) COGs observed in physical clustering
analysis (PCA). COGs also observed in PCA (Table 3) are shown in bold. Six minor exceptions to the top-20 rank cut-off
are shown in bold with an asterisk (*): COG1196 (top 31st); COG0564 (top 23rd); COG0648 (top 25th); COG0406 (top 48th)
in a fusion with COG0328; and COG0041 (top 36th). Others observed in rep. operons but were ranked beyond the “minor
exception” threshold (exceeded top-50) in PCA are shown without additional symbols, not bolded: COG0245 (116th) and
COG0761 (61st). Finally, one was not observed in PCA (not bolded) but was in at least one rep. operon (double asterisk,
**): COG0642 (SAMN05192534_10671 of A. persepolensis; rep. operon, Desulfurispirillum indicum S5) (Data Table S7). Note:
COG4111 (NUDIX hydrolase), present in panel c (neighborhood of M. rubeus), was absent from PCA (any rank) and rep.
operons, despite the fusion with COG3323 in F. nucleatum having been resolved in preceding homolog capture and liter-
ature review.
Notable bacterial fusions included domains belonging to COG1579, COG2384, and COG0328, all three COGs having occurred independently in the top-20 ranked COGs determined through PCKFA that were also metal-binding, in addition to being observed among bacterial representative operons (COG1579, Wolinella succinogenes ATCC 29543; COG2384, Ruminococcus flavefaciens Sab67; COG0328, Clostridia bacterium 1MN72D_59_214 (taxid: 2044939)). Although they lack recognizable COGs, TAT signals were the most common fusion among bacteria, a sequence feature neglected at the protein annotation level. While the neighborhoods of many bacterial fusions appeared very diverse (Figure 7b), 55% (11) of the top-20 co-occurring COGs of the DUF34 family (Table 3) were represented at least once across all observed neighborhoods. Additionally, genes encoding proteins involved in cofactor biosynthesis, corrinoid/siderophore/metal ion transport, metal- and metal ion stress-dependent processes, as well as DNA/RNA metabolism (e.g., de novo purine biosynthesis), were prominent among these selected neighborhoods.
3.9. A Role of the DUF34 Family Protein in Folate Synthesis Is Precluded by Bioinformatic and
Experimental Evidence
GTP cyclohydrolase I activity was reported using an in vitro assay with the H. pylori DUF34 family member, HP0959, expressed in E. coli [34]. With the roll-out of UniRule, an automated curation and annotation transfer program, by UniProtKB, the annotation of “GTP cyclohydrolase I type 2” was subsequently electronically propagated across thousands of proteins without further substantiation or review outside of this singular publication.
The canonical GTP cyclohydrolase I (GCYHI) enzymes catalyze a complex reaction, the formation of H2-neopterin-triphosphate (H2NTP) from GTP, required for the first step of tetrahydrofolate (THF) synthesis in most bacteria [180–182]. H2NTP is also a precursor to the cofactor BH4 and to 7-cyano-7-deazaguanine (preQ0), an intermediate in the synthesis of modified RNA and DNA bases [183,184]. Two non-orthologous protein families have been shown to harbor GCYHI activity [185]. The first, COG0302 (PF01227), was first characterized as FolE in E. coli K12 and is called GTP cyclohydrolase I type 1 [35]. The second, named FolE2 and part of the COG1469 (PF02649) family, was discovered much more recently and is called GTP cyclohydrolase I type 2 [186]. The distribution of the two families in Bacteria and Archaea varies greatly: some organisms have FolE1, some FolE2, and some have both [4,187]. Humans encode FolE as the first step of BH4 synthesis but no other folate enzyme [183]. A minority of bacteria are auxotrophic for THF, requiring the uptake of a folate source; hence, they do not encode any de novo folate biosynthesis enzymes [188]. However, as folate transporters are not present in most bacteria that are folate prototrophs, it follows that the de novo THF synthesis genes are often found to be essential in these organisms [35,36]. Folate prototrophy is common in most plants (Viridiplantae), although minor differences are observed among specific pathway contributors between select clades [189].
Despite the proposed role of the H. pylori DUF34 protein (HP0959) in folate synthesis [34], this hypothesis is not supported by the patterns of occurrence of DUF34 family members across folate auxotrophs or prototrophs. Indeed, some organisms prototrophic for folate (e.g., plants) do not encode DUF34 proteins, whereas folate auxotrophs, such as M. genitalium, do. In general, genes encoding DUF34 proteins are not essential, with a few exceptions (Table S5). The gene encoding GTP cyclohydrolase I, folE, is essential in E. coli, as is expected in most folate prototrophic bacteria [37]. The same essentiality, however, is not observed in mutants of ybgI in E. coli (Table S5). This implies that YbgI lacks the GTP cyclohydrolase I activity necessary to effectively compensate for the absence of folE; an alternative explanation for this compensatory failure is that the gene had not been sufficiently expressed in previously tested conditions to do so. An additional observation of note, however, is that even the YbgI-encoding operon, as a whole, has been reported as being non-essential in E. coli [190]. Although DUF34/NIF3 homologs are considered non-essential in an overwhelming majority of bacteria for which data are available (Table S5), one published case of bacterial DUF34 homolog mutant inviability was found, but it occurred in the context of a specialized method of mutagenesis in H. pylori (i.e., in vitro mutagenesis using the Tn7 transposon) [191]. This case stands out in that the homolog is essential for H. pylori, a rare observation among DUF34 family members (Table S5).
With differences in essentiality considered, a series of complementation assays was performed to better illustrate the relationship of ybgI to folE and the folate biosynthetic pathway. The essentiality of folate in E. coli is partially linked to the de novo synthesis of thymidine, as thymidylate synthase (ThyA, [192]), which catalyzes the formation of dTMP from dUMP, uses THF as a cofactor. It was previously reported that supplementing the growth media with dT allowed a folE mutant of E. coli to grow at a low rate [184]. The ybgI mutant of E. coli grew similarly to the WT in the presence and absence of dT, while the folE mutant could only grow in the presence of dT (Figure 8). Interestingly, the double mutant also required dT to grow but grew at a slower rate than the folE single mutant, eventually reaching the same final OD as the folE single mutant (Figure 8a,b). Expression of E. coli folE in trans complemented the essentiality of dT upon plating for both the single and double mutants (Figure 8c), whereas the expression of E. coli ybgI in trans did not complement this phenotype. It can be noted that the overexpression of folE in the single mutant did not fully complement the growth phenotype, while successfully doing so in the double mutant (Figure 8c, + arabinose). The WT was not impacted by the overexpression of folE, which eliminates the hypothesis of toxicity of high FolE levels and reveals a genetic interaction between ybgI and folE that is also observed with the better growth of the double mutant on dT compared to the single folE mutant. Further studies will have to be performed to dissect this interaction, but it can be noted that FolE is a zinc-dependent metalloenzyme [193].
Figure 8. DUF34 of E. coli, ybgI, fails complementation in the absence of folE. Plates were imaged after 20 h of growth at 37 °C. (a,b) dT essentiality assay. WT, single mutants, and double mutant (folE, ybgI) strains were grown at 37 °C in LB in the absence (a) or presence (b) of dT 0.3 mM. Each curve shown is averaged across 5 replicates. (c) dT essentiality complementation assay. WT, single mutants, and double mutant (folE, ybgI) strains, containing various derivatives of pBAD24 encoding either E. coli YbgI or FolE, were streaked on LB plates supplemented with ampicillin (100 µg/mL) in the presence of either 0.2% glucose for repression of gene expression or 0.2% arabinose for overexpression of the gene of interest, and in the presence or absence of dT 0.3 mM.
4. Conclusions
In this comprehensive comparative genomic analysis of the DUF34 family, we pre-
sented a collection of arguments refuting a role in folate synthesis as a GTP cyclohydrolase
I type 2 in most organisms, including the gram-negative model, E. coli. While we concede that it is possible the in vitro GTP cyclohydrolase I activity described for the DUF34 member of H. pylori, HP0959, may still accurately reflect the enzyme’s ability, further controls, such as site-directed mutagenesis of essential residues or in vivo complementation data, would be necessary to ensure that the observed activity was not related to a
contaminating endogenous enzyme or non-biological assay conditions such as low pH. In
light of our analyses, the propagation of this annotation should therefore be limited until
further experimental work is conducted.
The published literature emphasizes a pleiotropic role of the DUF34 family that is typical of a core molecular function. We propose that members of this family have a general metal ion insertase function that may vary in substrate and target among individual members and clades. Diiron proteins have long been implicated in metal shuttling [194], but the only member of the DUF34 family with notable biochemical and structural characterization is the archaeal HcgD, which has been proposed to act as an iron chaperone in the maturation of the iron-guanylylpyridinol (FeGP) cofactor required by [Fe]-hydrogenase [132]. The structural data presented here strongly link the DUF34 family to metal homeostasis, while the physical clustering, fusion, and co-expression data also suggest a metal link, most notably to Fe-S clusters. Proving metal insertion activity in vivo can be a very difficult task. For example, our group predicted that members of the COG0523 family were involved in metal insertion over 15 years ago, and the experimental validation of this prediction has only been published within recent years [195–197]. We believe that the thorough analysis presented here should guide future experimental efforts to solve this long-standing functional enigma for one of the most conserved unknowns remaining to be confidently characterized.
Supplementary Materials: The following are available online at
https://www.mdpi.com/article/10.3390/biom11091282/s1. Figure S1: Word clouds generated from titles of focal and non-focal
publications listed in Data Table S2; Figure S2: Secondary structural annotation by superkingdom
using MultAlign-based ESPRIPT analyses; Figure S3: Complete DUF34/NIF3 homolog sequence
logos across and for each superkingdom (Eukaryota, Archaea, Bacteria) with three tiers of relative
conservation; Figure S4: Phyre2 generated model of NIF3L1 (H. sapiens) structurally aligned with
YqfO to illustrate binding pockets and residue differences within and adjacent to the active site; Figure
S5: Count per domain length range as a function of superkingdom (histogram); Figure S6: Motif
differences in sequences of the D-G subgroups with and without the IPR015867 HMM profile
signature annotation; Figure S7: Pairwise alignments of B. cereus DUF34 paralogs; Figure S8: PCKFA
of COGs and COG descriptions; Figure S9: Abundances of metal ion ligand annotations across
published protein structures; Figure S10: Relative abundances of metal-binding proteins per distinct
ion across representative operons comparing those of bacteria and archaea to those observed in PDB;
Figure S11: Relative abundances of metal-binding proteins per distinct ion as fractions of all encoded
proteins across representative operons; Figure S12: Distributions of GO terms retrieved for each set
of top 300 co-expressed genes of eukaryotic DUF34 family members; Figure S13: STRING network of
GSEA output of DUF34 co-regulated genes of H. sapiens; Table S1: All resources used in systematic
literature review and subsequent analyses; Table S2: Lists of strains and oligos used in growth
assays; Table S3: Formatted table of all organisms, genes/proteins with published data (both focal
and non-focal publications); Table S4: Metal ion interactions of proteins encoded by representative
operons; Table S5: Essentiality data of DUF34 homologs; Data Table S1: Table of search terms used
and generated in the literature review/data capture process; Data Table S2: Catalog of all focal and
non-focal publications collected through comprehensive literature review and data capture process
of the DUF34 protein family; Data Table S3: Model organism sequences used in initial sequence
alignments across and for each superkingdom exported from OrthoInspector (FASTA format); Data
Table S4: Collating lists of sequences from model organisms (exported from OrthoInspector) and
those acquired from comprehensive data capture and literature review (Table S3); Data Table S5: All
COGs and InterPro signature profiles of the DUF34 family including paralogs and some fusions;
Data Table S6: “IMG-occurrence” data sheet; Data Table S7: Physical clustering keyword frequency
analysis (PCKFA) and representative operons; Data Table S8: Representative operon metal-binding
protein abundance; Data Table S9: CoXPresDb (Eukaryota) exports of the top 300 co-expressed genes
of DUF34; Data Table S10: Co-regulated genes of Homo sapiens DUF34 homolog; Data Table S11:
Concatenated list of sequences indicated to be possible non-canonical fusions of the DUF34 family;
Data Table S12: STRING network export generated following the results of Data Table S10.
Author Contributions: Conceptualization, V.d.C.-L. and C.J.R.; Data curation, C.J.R.; Formal analysis,
C.J.R.; Investigation, G.H. and V.d.C.-L.; Methodology, G.H.; Project administration, V.d.C.-L.;
Visualization, C.J.R.; Writing – original draft, G.H. and C.J.R.; Writing – review & editing, G.H.,
V.d.C.-L. and C.J.R. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Institutes of Health, grant number GM70641 to
V.d.C.-L., and by funds from the University of Florida Department of Microbiology and Cell Science.
Acknowledgments: Early preliminary bioinformatics and initial complementation assays (not discussed,
not shown) were performed by an undergraduate student, Rouyi Zhang. Institutional
support was provided by the Department of Microbiology and Cell Science of the University of
Florida. Additional appreciation is noted for the developers of UniProt for their helpful feedback
and correspondence relating to the current annotation statuses of proteins relevant to this work.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Danchin, A.; Fang, G. Unknown unknowns: Essential genes in quest for function. Microb. Biotechnol. 2016, 9, 530–540. [CrossRef] [PubMed]
2. Niehaus, T.D.; Thamm, A.M.; de Crécy-Lagard, V.; Hanson, A.D. Proteins of unknown biochemical function: A persistent problem and a roadmap to help overcome it. Plant Physiol. 2015, 169, 1436–1442. [CrossRef] [PubMed]
3. de Crécy-Lagard, V.; Haas, D.; Hanson, A.D. Newly-discovered enzymes that function in metabolite damage-control. Curr. Opin. Chem. Biol. 2018, 47, 101–108. [CrossRef] [PubMed]
4. De Crécy-Lagard, V.; Phillips, G.; Grochowski, L.L.; Yacoubi, B.E.; Jenney, F.; Adams, M.W.W.; Murzin, A.G.; White, R.H. Comparative genomics guided discovery of two missing archaeal enzyme families involved in the biosynthesis of the pterin moiety of tetrahydromethanopterin and tetrahydrofolate. ACS Chem. Biol. 2012, 7, 1807–1816. [CrossRef] [PubMed]
5. Price, M.N.; Wetmore, K.M.; Waters, R.J.; Callaghan, M.; Ray, J.; Liu, H.; Kuehl, J.V.; Melnyk, R.A.; Lamson, J.S.; Suh, Y.; et al. Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 2018, 557, 503–509. [CrossRef] [PubMed]
6. Kolker, E. Identification and functional analysis of “hypothetical” genes expressed in Haemophilus influenzae. Nucleic Acids Res. 2004, 32, 2353–2361. [CrossRef]
7. Ghodge, S.V. Mechanistic Characterization and Function Discovery of Phosphohydrolase Enzymes from the Amidohydrolase Superfamily; Texas A&M University: College Station, TX, USA, 2015.
8. Tan, C.L. The absence of universally-conserved protein-coding genes. bioRxiv 2019, 842633. [CrossRef]
9. Rödelsperger, C.; Prabh, N.; Sommer, R.J. New Gene Origin and Deep Taxon Phylogenomics: Opportunities and Challenges. Trends Genet. 2019, 35, 914–922. [CrossRef]
10. Alam, M.T.; Takano, E.; Breitling, R. Prioritizing orphan proteins for further study using phylogenomics and gene expression profiles in Streptomyces coelicolor. BMC Res. Notes 2011, 4, 325. [CrossRef] [PubMed]
11. Wood, V.; Lock, A.; Harris, M.A.; Rutherford, K.; Bähler, J.; Oliver, S.G. Hidden in plain sight: What remains to be discovered in the eukaryotic proteome? Open Biol. 2019, 9, 180241. [CrossRef]
12. Nagy, L.G.; Merényi, Z.; Hegedüs, B.; Bálint, B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res. 2020, 48, 2209–2219. [CrossRef]
13. Thiaville, P.C.; Iwata-Reuyl, D.; DeCrécy-Lagard, V. Diversity of the biosynthesis pathway for threonylcarbamoyladenosine (t6A), a universal modification of tRNA. RNA Biol. 2014, 11, 1529–1539. [CrossRef] [PubMed]
14. El Yacoubi, B.; Hatin, I.; Deutsch, C.; Kahveci, T.; Rousset, J.-P.; Iwata-Reuyl, D.; G Murzin, A.; de Crécy-Lagard, V. A role for the universal Kae1/Qri7/YgjD (COG0533) family in tRNA modification. EMBO J. 2011, 30, 882–893. [CrossRef] [PubMed]
15. El Yacoubi, B.; Lyons, B.; Cruz, Y.; Reddy, R.; Nordin, B.; Agnelli, F.; Williamson, J.R.; Schimmel, P.; Swairjo, M.A.; De Crécy-Lagard, V. The universal YrdC/Sua5 family is required for the formation of threonylcarbamoyladenosine in tRNA. Nucleic Acids Res. 2009, 37, 2894–2909. [CrossRef] [PubMed]
16. Sutherland, D.R.; Abdullah, K.M.; Cyopick, P.; Mellors, A. Cleavage of the cell-surface O-sialoglycoproteins CD34, CD43, CD44, and CD45 by a novel glycoprotease from Pasteurella haemolytica. J. Immunol. 1992, 148, 1458–1464.
17. Nichols, C.E.; Lamb, H.K.; Thompson, P.; El Omari, K.; Lockyer, M.; Charles, I.; Hawkins, A.R.; Stammers, D.K. Crystal structure of the dimer of two essential Salmonella typhimurium proteins, YgjD & YeaZ and calorimetric evidence for the formation of a ternary YgjD-YeaZ-YjeE complex. Protein Sci. 2013, 22, 628–640. [CrossRef]
18. Edvardson, S.; Prunetti, L.; Arraf, A.; Haas, D.; Bacusmo, J.M.; Hu, J.F.; Ta-Shma, A.; Dedon, P.C.; de Crécy-Lagard, V.; Elpeleg, O. tRNA N6-adenosine threonylcarbamoyltransferase defect due to KAE1/TCS3 (OSGEP) mutation manifest by neurodegeneration and renal tubulopathy. Eur. J. Hum. Genet. 2017, 25, 545–551. [CrossRef]
19. Niehaus, T.D.; Gerdes, S.; Hodge-Hanson, K.; Zhukov, A.; Cooper, A.J.L.; ElBadawi-Sidhu, M.; Fiehn, O.; Downs, D.M.; Hanson, A.D. Genomic and experimental evidence for multiple metabolic functions in the RidA/YjgF/YER057c/UK114 (Rid) protein family. BMC Genom. 2015, 16, 382. [CrossRef]
20. Downs, D.M.; Ernst, D.C. From microbiology to cancer biology: The Rid protein family prevents cellular damage caused by endogenously generated reactive nitrogen species. Mol. Microbiol. 2015, 96, 211–219. [CrossRef]
21. Irons, J.L.; Hodge-Hanson, K.; Downs, D.M. RidA Proteins Protect against Metabolic Damage by Reactive Intermediates. Microbiol. Mol. Biol. Rev. 2020, 84, 1–28. [CrossRef]
22. Lambrecht, J.A.; Schmitz, G.E.; Downs, D.M. RidA proteins prevent metabolic damage inflicted by PLP-dependent dehydratases in all domains of life. mBio 2013, 4, e00033-13. [CrossRef]
23. Borchert, A.J.; Ernst, D.C.; Downs, D.M. Reactive enamines and imines in vivo: Lessons from the RidA paradigm. Trends Biochem. Sci. 2019, 44, 849–860. [CrossRef]
24. Tascou, S.; Uedelhoven, J.; Dixkens, C.; Nayernia, K.; Engel, W.; Burfeind, P. Isolation and characterization of a novel human gene, NIF3L1, and its mouse ortholog, Nif3l1, highly conserved from bacteria to mammals. Cytogenet. Genome Res. 2000, 90, 330–336. [CrossRef]
25. Tascou, S.; Kang, T.W.; Trappe, R.; Engel, W.; Burfeind, P. Identification and characterization of NIF3L1 BP1, a novel cytoplasmic interaction partner of the NIF3L1 protein. Biochem. Biophys. Res. Commun. 2003, 309, 440–448. [CrossRef] [PubMed]
26. Ladner, J.E.; Obmolova, G.; Teplyakov, A.; Howard, A.J.; Khil, P.P.; Camerini-Otero, R.D.; Gilliland, G.L. Crystal structure of Escherichia coli protein YbgI, a toroidal structure with a dinuclear metal site. BMC Struct. Biol. 2003, 3, 7. [CrossRef] [PubMed]
27. Baysal, Ö.; Lai, D.; Xu, H.-H.; Siragusa, M.; Çalışkan, M.; Carimi, F.; da Silva, J.A.T.; Tör, M. A Proteomic Approach Provides New Insights into the Control of Soil-Borne Plant Pathogens by Bacillus Species. PLoS ONE 2013, 8, e53182. [CrossRef] [PubMed]
28. Ashburner, M.; Misra, S.; Roote, J.; Lewis, S.E.; Blazej, R.; Davis, T.; Doyle, C.; Galle, R.; George, R.; Harris, N.; et al. An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: The Adh region. Genetics 1999, 153, 179–219. [CrossRef]
29. Geisler, R.; Bergmann, A.; Hiromi, Y.; Nüsslein-Volhard, C. cactus, a gene involved in dorsoventral pattern formation of Drosophila, is related to the IκB gene family of vertebrates. Cell 1992, 71, 613–621. [CrossRef]
30. Hadano, S.; Yanagisawa, Y.; Skaug, J.; Fichter, K.; Nasir, J.; Martindale, D.; Koop, B.F.; Scherer, S.W.; Nicholson, D.W.; Rouleau, G.A.; et al. Cloning and characterization of three novel genes, ALS2CR1, ALS2CR2, and ALS2CR3, in the juvenile amyotrophic lateral sclerosis (ALS2) critical region at chromosome 2q33-q34: Candidate genes for ALS2. Genomics 2001, 71, 200–213. [CrossRef]
31. Merla, G.; Howald, C.; Antonarakis, S.E.; Reymond, A. The subcellular localization of the ChoRE-binding protein, encoded by the Williams–Beuren syndrome critical region gene 14, is regulated by 14-3-3. Hum. Mol. Genet. 2004, 13, 1505–1514. [CrossRef]
32. Sergeeva, O.V.; Bredikhin, D.O.; Nesterchuk, M.V.; Serebryakova, M.V.; Sergiev, P.V.; Dontsova, O.A. Possible Role of Escherichia coli Protein YbgI. Biochemistry 2018, 83, 270–280. [CrossRef]
33. Rouillard, A.D.; Gundersen, G.W.; Fernandez, N.F.; Wang, Z.; Monteiro, C.D.; McDermott, M.G.; Ma’ayan, A. The harmonizome: A collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016, 2016, baw100. [CrossRef]
34. Choi, H.-P.; Juarez, S.; Ciordia, S.; Fernandez, M.; Bargiela, R.; Albar, J.P.; Mazumdar, V.; Anton, B.P.; Kasif, S.; Ferrer, M.; et al. Biochemical Characterization of Hypothetical Proteins from Helicobacter pylori. PLoS ONE 2013, 8, e66605. [CrossRef]
35. Adams, N.E.; Thiaville, J.J.; Proestos, J.; Juárez-Vázquez, A.L.; McCoy, A.J.; Barona-Gómez, F.; Iwata-Reuyl, D.; de Crécy-Lagard, V.; Maurelli, A.T. Promiscuous and adaptable enzymes fill “holes” in the tetrahydrofolate pathway in Chlamydia species. mBio 2014, 5, e01378-14. [CrossRef]
36. De Crécy-Lagard, V. Variations in metabolic pathways create challenges for automated metabolic reconstructions: Examples from the tetrahydrofolate synthesis pathway. Comput. Struct. Biotechnol. J. 2014, 10, 41–50. [CrossRef] [PubMed]
37. Hutchison, C.A.; Peterson, S.N.; Gill, S.R.; Cline, R.T.; White, O.; Fraser, C.M.; Smith, H.O.; Venter, J.C. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 1999, 286, 2165–2169. [CrossRef]
38. Berman, H.M. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242. [CrossRef]
39. Burley, S.K.; Berman, H.M.; Bhikadiya, C.; Bi, C.; Chen, L.; Di Costanzo, L.; Christie, C.; Dalenberg, K.; Duarte, J.M.; Dutta, S.; et al. RCSB Protein Data Bank: Biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019, 47, D464–D474. [CrossRef]
40. Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein Data Bank. A Computer-Based Archival File for Macromolecular Structures. Eur. J. Biochem. 1977, 80, 319–324. [CrossRef]
41. Andreini, C.; Cavallaro, G.; Lorenzini, S.; Rosato, A. MetalPDB: A database of metal sites in biological macromolecular structures. Nucleic Acids Res. 2013, 41, 312–319. [CrossRef]
42. Putignano, V.; Rosato, A.; Banci, L.; Andreini, C. MetalPDB in 2018: A database of metal sites in biological macromolecular structures. Nucleic Acids Res. 2018, 46, D459–D464. [CrossRef]
43. Luo, H.; Lin, Y.; Gao, F.; Zhang, C.-T.; Zhang, R. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements: Table 1. Nucleic Acids Res. 2014, 42, D574–D580. [CrossRef] [PubMed]
44. Chen, W.-H.; Lu, G.; Chen, X.; Zhao, X.-M.; Bork, P. OGEE v2: An update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines. Nucleic Acids Res. 2017, 45, D940–D944. [CrossRef] [PubMed]
45. Lin, Y.; Zhang, R.R. Putative essential and core-essential genes in Mycoplasma genomes. Sci. Rep. 2011, 1, 53. [CrossRef]
46. Nevers, Y.; Kress, A.; Defosset, A.; Ripp, R.; Linard, B.; Thompson, J.D.; Poch, O.; Lecompte, O. OrthoInspector 3.0: Open portal for comparative genomics. Nucleic Acids Res. 2019, 47, D411–D418. [CrossRef]
47. Bateman, A. UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 2019, 47, D506–D515. [CrossRef]
48. Landan, G.; Graur, D. Local reliability measures from sets of co-optimal multiple sequence alignments. Pacific Symp. Biocomput. 2008, 24, 15–24. [CrossRef]
49. Penn, O.; Privman, E.; Ashkenazy, H.; Landan, G.; Graur, D.; Pupko, T. GUIDANCE: A web server for assessing alignment confidence scores. Nucleic Acids Res. 2010, 38, 23–28. [CrossRef]
50. Sela, I.; Ashkenazy, H.; Katoh, K.; Pupko, T. GUIDANCE2: Accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 2015, 43, W7–W14. [CrossRef]
51. Crooks, G.; Hon, G.; Chandonia, J.; Brenner, S. WebLogo: A sequence logo generator. Genome Res. 2004, 14, 1188–1190. [CrossRef]
52. Minatani, K. Proposal for SVG2DOT: An Interoperable Tactile Graphics Creation System Using SVG outputs from Inkscape. Stud. Health Technol. Inform. 2015, 217, 506–511.
53. Letunic, I.; Bork, P. Interactive Tree of Life (iTOL) v4: Recent updates and new developments. Nucleic Acids Res. 2019, 47, 256–259. [CrossRef] [PubMed]
54. Bethesda (MD): National Library of Medicine (US), N.C. for B.I. National Center for Biotechnology Information (NCBI) [Internet]. Available online: https://www.ncbi.nlm.nih.gov/ (accessed on 26 August 2021).
55. Dehal, P.S.; Joachimiak, M.P.; Price, M.N.; Bates, J.T.; Baumohl, J.K.; Chivian, D.; Friedland, G.D.; Huang, K.H.; Keller, K.; Novichkov, P.S.; et al. MicrobesOnline: An integrated portal for comparative and functional genomics. Nucleic Acids Res. 2009, 38, 396–400. [CrossRef] [PubMed]
56. Szklarczyk, D.; Gable, A.L.; Lyon, D.; Junge, A.; Wyder, S.; Huerta-Cepas, J.; Simonovic, M.; Doncheva, N.T.; Morris, J.H.; Bork, P.; et al. STRING v11: Protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019, 47, D607–D613. [CrossRef] [PubMed]
57. Huerta-Cepas, J.; Szklarczyk, D.; Heller, D.; Hernández-Plaza, A.; Forslund, S.K.; Cook, H.; Mende, D.R.; Letunic, I.; Rattei, T.; Jensen, L.J.; et al. EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019, 47, D309–D314. [CrossRef] [PubMed]
58. Kanehisa, M.; Sato, Y.; Furumichi, M.; Morishima, K.; Tanabe, M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 2019, 47, D590–D595. [CrossRef]
59. Martinez-Guerrero, C.E.; Ciria, R.; Abreu-Goodger, C.; Moreno-Hagelsieb, G.; Merino, E. GeConT 2: Gene context analysis for orthologous proteins, conserved domains and metabolic pathways. Nucleic Acids Res. 2008, 36, 176–180. [CrossRef] [PubMed]
60. Obayashi, T.; Kagaya, Y.; Aoki, Y.; Tadaka, S.; Kinoshita, K. COXPRESdb v7: A gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Res. 2019, 47, D55–D62. [CrossRef]
61. Kustatscher, G.; Grabowski, P.; Schrader, T.A.; Passmore, J.B.; Schrader, M.; Rappsilber, J. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol. 2019, 37, 1361–1371. [CrossRef] [PubMed]
62. Raudvere, U.; Kolberg, L.; Kuzmin, I.; Arak, T.; Adler, P.; Peterson, H.; Vilo, J. g:Profiler: A web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019, 47, W191–W198. [CrossRef]
63. Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37, 1–13. [CrossRef]
64. Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4, 44–57. [CrossRef] [PubMed]
65. Jiao, X.; Sherman, B.T.; Huang, D.W.; Stephens, R.; Baseler, M.W.; Lane, H.C.; Lempicki, R.A. DAVID-WS: A stateful web service to facilitate gene/protein list analysis. Bioinformatics 2012, 28, 1805–1806. [CrossRef] [PubMed]
66. Bruford, E.A.; Braschi, B.; Denny, P.; Jones, T.E.M.; Seal, R.L.; Tweedie, S. Guidelines for human gene nomenclature. Nat. Genet. 2020, 52, 754–758. [CrossRef] [PubMed]
67. Baba, T.; Ara, T.; Hasegawa, M.; Takai, Y.; Okumura, Y.; Baba, M.; Datsenko, K.A.; Tomita, M.; Wanner, B.L.; Mori, H. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: The Keio collection. Mol. Syst. Biol. 2006, 2, 2006.0008. [CrossRef] [PubMed]
68. Hutinet, G.; Kot, W.; Cui, L.; Hillebrand, R.; Balamkundu, S.; Gnanakalai, S.; Neelakandan, R.; Carstens, A.B.; Fa Lui, C.; Tremblay, D.; et al. 7-Deazaguanine modifications protect phage DNA from host restriction systems. Nat. Commun. 2019, 10, 5442. [CrossRef]
69. Datsenko, K.A.; Wanner, B.L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. USA 2000, 97, 6640–6645. [CrossRef]
70. Martens, J.A.; Genereaux, J.; Saleh, A.; Brandl, C.J. Transcriptional Activation by Yeast PDR1p Is Inhibited by Its Association with NGG1p/ADA3p. J. Biol. Chem. 1996, 271, 15884–15890. [CrossRef]
71. Gou, Y.; Graff, F.; Kilian, O.; Kafkas, S.; Katuri, J.; Kim, J.H.; Marinos, N.; McEntyre, J.; Morrison, A.; Pi, X.; et al. Europe PMC: A full-text literature database for the life sciences and platform for innovation. Nucleic Acids Res. 2015, 43, D1042–D1048. [CrossRef]
72. Karniely, S.; Rayzner, A.; Sass, E.; Pines, O. α-Complementation as a probe for dual localization of mitochondrial proteins. Exp. Cell Res. 2006, 312, 3835–3846. [CrossRef]
73. Chen, J.; Gai, Q.; Lv, Z.; Chen, J.; Nie, Z.; Wu, X.; Zhang, Y. All-trans retinoic acid affects subcellular localization of a novel BmNIF3l protein: Functional deduce and tissue distribution of NIF3l gene from silkworm (Bombyx mori). Arch. Insect Biochem. Physiol. 2010, 74, 217–231. [CrossRef]
74. Manan, A.; Bazai, Z.; Fan, J.; Yu, H.; Li, L. The Nif3-family protein YqfO03 from Pseudomonas syringae MB03 has multiple nematicidal activities against Caenorhabditis elegans and Meloidogyne incognita. Int. J. Mol. Sci. 2018, 19, 3915. [CrossRef]
75. Li, Y.; Xie, B.; Jiang, Z.; Yuan, B. Relationship between osteoporosis and osteoarthritis based on DNA methylation. Int. J. Clin. Exp. Pathol. 2019, 12, 3399–3407. [PubMed]
76. Yu, N.; Shin, S.; Lee, K.-A. First Korean Case of SATB2-Associated 2q32-q33 Microdeletion Syndrome. Ann. Lab. Med. 2015, 35, 275. [CrossRef]
77. Huang, S.; Li, Y.; Chen, Y.; Podsypanina, K.; Chamorro, M.; Olshen, A.B.; Desai, K.V.; Tann, A.; Petersen, D.; Green, J.E.; et al. Changes in gene expression during the development of mammary tumors in MMTV-Wnt-1 transgenic mice. Genome Biol. 2005, 6, R84. [CrossRef] [PubMed]
78. Jostes, S.V. The bromodomain Inhibitor JQ1 as Novel Therapeutic Option for Type II Testicular Germ Cell Tumours: The Role of SOX2 and SOX17 in Regulating Germ Cell Tumour Pluripotency; Rheinischen Friedrich-Wilhelms-Universität: Bonn, Germany, 2019.
79. Lin, C.-Y.; Ström, A.; Vega, V.B.; Kong, S.L.; Yeo, A.L.; Thomsen, J.S.; Chan, W.C.; Doray, B.; Bangarusamy, D.K.; Ramasamy, A.; et al. Discovery of estrogen receptor alpha target genes and response elements in breast tumor cells. Genome Biol. 2004, 5, R66. [CrossRef] [PubMed]
80. Xi, Y.; Riker, A.; Shevde-Samant, L.; Samant, R.; Morris, C.; Gavin, E.; Fodstad, O.; Ju, J. Global comparative gene expression analysis of melanoma patient samples, derived cell lines and corresponding tumor xenografts. Cancer Genom. Proteom. 2011, 5, 1–35. [CrossRef]
81. Schrader, A.; Meyer, K.; Walther, N.; Stolz, A.; Feist, M.; Hand, E.; von Bonin, F.; Evers, M.; Kohler, C.; Shirneshan, K.; et al. Identification of a new gene regulatory circuit involving B cell receptor activated signaling using a combined analysis of experimental, clinical and global gene expression data. Oncotarget 2016, 7, 47061–47081. [CrossRef]
82. Uxa, S.; Bernhart, S.H.; Mages, C.F.S.; Fischer, M.; Kohler, R.; Hoffmann, S.; Stadler, P.F.; Engeland, K.; Müller, G.A. DREAM and RB cooperate to induce gene repression and cell-cycle arrest in response to p53 activation. Nucleic Acids Res. 2019, 47, 9087–9103. [CrossRef]
83. Xiang, Y.; Zhang, C.-Q.; Huang, K. Predicting glioblastoma prognosis networks using weighted gene co-expression network analysis on TCGA data. BMC Bioinformatics 2012, 13, S12. [CrossRef] [PubMed]
84. Cury, S.S.; Lapa, R.M.L.; de Mello, J.B.H.; Marchi, F.A.; Domingues, M.A.C.; Pinto, C.A.L.; Carvalho, R.F.; de Carvalho, G.B.; Kowalski, L.P.; Rogatto, S.R. Increased DSG2 plasmatic levels identified by transcriptomic-based secretome analysis is a potential prognostic biomarker in laryngeal carcinoma. Oral Oncol. 2020, 103, 104592. [CrossRef] [PubMed]
85. Qu, S.; Shi, Q.; Xu, J.; Yi, W.; Fan, H. Weighted Gene Coexpression Network Analysis Reveals the Dynamic Transcriptome Regulation and Prognostic Biomarkers of Hepatocellular Carcinoma. Evol. Bioinform. 2020, 16, 117693432092056. [CrossRef] [PubMed]
86. Wu, J.; Liu, S.; Xiang, Y.; Qu, X.; Xie, Y.; Zhang, X. Bioinformatic Analysis of Circular RNA-Associated ceRNA Network Associated with Hepatocellular Carcinoma. BioMed Res. Int. 2019, 2019, 8308694. [CrossRef] [PubMed]
87. Quigley, D.A.; Fiorito, E.; Nord, S.; Van Loo, P.; Alnaes, G.G.; Fleischer, T.; Tost, J.; Moen Vollan, H.K.; Tramm, T.; Overgaard, J.; et al. The 5p12 breast cancer susceptibility locus affects MRPS30 expression in estrogen-receptor positive tumors. Mol. Oncol. 2014, 8, 273–284. [CrossRef] [PubMed]
88. Kusonmano, K.; Halle, M.K.; Wik, E.; Hoivik, E.A.; Krakstad, C.; Mauland, K.K.; Tangen, I.L.; Berg, A.; Werner, H.M.J.; Trovik, J.; et al. Identification of highly connected and differentially expressed gene subnetworks in metastasizing endometrial cancer. PLoS ONE 2018, 13, e0206665. [CrossRef] [PubMed]
89. Wang, M.; Li, L.; Liu, J.; Wang, J. A gene interaction network-based method to measure the common and heterogeneous mechanisms of gynecological cancer. Mol. Med. Rep. 2018, 18, 230–242. [CrossRef]
90. Antoniali, G.; Serra, F.; Lirussi, L.; Tanaka, M.; D’Ambrosio, C.; Zhang, S.; Radovic, S.; Dalla, E.; Ciani, Y.; Scaloni, A.; et al. Mammalian APE1 controls miRNA processing and its interactome is linked to cancer RNA metabolism. Nat. Commun. 2017, 8, 797. [CrossRef] [PubMed]
91. Schneeweiss, A.; Hartkopf, A.D.; Müller, V.; Wöckel, A.; Lux, M.P.; Janni, W.; Ettl, J.; Belleville, E.; Huober, J.; Thill, M.; et al. Update Breast Cancer 2020 Part 1 – Early Breast Cancer: Consolidation of Knowledge About Known Therapies. Geburtshilfe Frauenheilkd. 2020, 80, 277–287. [CrossRef] [PubMed]
92. Codrich, M.; Comelli, M.; Malfatti, M.C.; Mio, C.; Ayyildiz, D.; Zhang, C.; Kelley, M.R.; Terrosu, G.; Pucillo, C.E.M.; Tell, G. Inhibition of APE1-endonuclease activity affects cell metabolism in colon cancer cells via a p53-dependent pathway. DNA Repair 2019, 82, 102675. [CrossRef]
93.
Wang, L.-J.; Hsu, C.-W.; Chen, C.-C.; Liang, Y.; Chen, L.-C.; Ojcius, D.M.; Tsang, N.-M.; Hsueh, C.; Wu, C.-C.; Chang, Y.-S.
Interactome-wide Analysis Identifies End-binding Protein 1 as a Crucial Component for the Speck-like Particle Formation of
Activated Absence in Melanoma 2 (AIM2) Inflammasomes. Mol. Cell. Proteom. 2012,11, 1230–1244. [CrossRef] [PubMed]
94.
Chauhan, L.; Jenkins, G.D.; Bhise, N.; Feldberg, T.; Mitra-Ghosh, T.; Fridley, B.L.; Lamba, J.K. Genome-wide association analysis
identified splicing single nucleotide polymorphism in CFLAR predictive of triptolide chemo-sensitivity. BMC Genom.
2015
,16,
483. [CrossRef]
95.
Kalari, K.R.; Necela, B.M.; Tang, X.; Thompson, K.J.; Lau, M.; Eckel-Passow, J.E.; Kachergus, J.M.; Anderson, S.K.; Sun, Z.; Baheti,
S.; et al. An Integrated Model of the Transcriptome of HER2-Positive Breast Cancer. PLoS ONE 2013,8, e79298. [CrossRef]
96.
Ahmed, S.S.S.J.; Ahameethunisa, A.R.; Santosh, W.; Chakravarthy, S.; Kumar, S. Systems biological approach on neurological
disorders: A novel molecular connectivity to aging and psychiatric diseases. BMC Syst. Biol. 2011,5, 6. [CrossRef] [PubMed]
97.
Malan-Müller, S.; de Souza, V.B.C.; Daniels, W.M.U.; Seedat, S.; Robinson, M.D.; Hemmings, S.M.J. Shedding Light on the
Transcriptomic Dark Matter in Biological Psychiatry: Role of Long Noncoding RNAs in D-cycloserine-Induced Fear Extinction in
Posttraumatic Stress Disorder. OMICS J. Integr. Biol. 2020,24, 352–369. [CrossRef]
98. Qiu, L.; Liu, X. Identification of key genes involved in myocardial infarction. Eur. J. Med. Res. 2019,24, 22. [CrossRef]
99.
Lin, H. Identification of Potential coregenes in Sevoflurance induced Myocardial Energy Metabolismin Patients Undergoing
Off-pump Coronary Artery Bypass Graft Surgery using Bioinformatics analysis. Res. Sq. 2019, 1–16. [CrossRef]
100.
Chekouo, T.; Safo, S.E. Bayesian Integrative Analysis and Prediction with Application to Atherosclerosis Cardiovascular Disease.
arXiv 2020, arXiv:2005.11586.
101.
Winer, D.A.; Winer, S.; Shen, L.; Wadia, P.P.; Yantha, J.; Paltser, G.; Tsui, H.; Wu, P.; Davidson, M.G.; Alonso, M.N.; et al. B cells
promote insulin resistance through modulation of T cells and production of pathogenic IgG antibodies. Nat. Med.
2011
,17,
610–617. [CrossRef]
102.
Xia, B.; Li, Y.; Zhou, J.; Tian, B.; Feng, L. Identification of potential pathogenic genes associated with osteoporosis. Bone Jt. Res.
2017,6, 640–648. [CrossRef]
103.
Thankam, F.G.; Boosani, C.S.; Dilisio, M.F.; Agrawal, D.K. MicroRNAs associated with inflammation in shoulder tendinopathy
and glenohumeral arthritis. Mol. Cell. Biochem. 2018,437, 81–97. [CrossRef]
104.
Wang, J.C.; Ramaswami, G.; Geschwind, D.H. Gene co-expression network analysis in human spinal cord highlights mechanisms
underlying amyotrophic lateral sclerosis susceptibility. bioRxiv 2020,11, 1–14.
105.
Lv, L.; Zhang, D.; Hua, P.; Yang, S. The glial-specific hypermethylated 3
0
untranslated region of histone deacetylase 1 may
modulates several signal pathways in Alzheimer’s disease. Life Sci. 2021,265, 118760. [CrossRef] [PubMed]
106.
Tian, Y.; Voineagu, I.; Pa¸sca, S.P.; Won, H.; Chandran, V.; Horvath, S.; Dolmetsch, R.E.; Geschwind, D.H. Alteration in basal and
depolarization induced transcriptional network in iPSC derived neurons from Timothy syndrome. Genome Med.
2014
,6, 75.
[CrossRef] [PubMed]
107.
Akiyama, H.; Fujisawa, N.; Tashiro, Y.; Takanabe, N.; Sugiyama, A.; Tashiro, F. The Role of Transcriptional Corepressor Nif3l1
in Early Stage of Neural Differentiation via Cooperation with Trip15/CSN2. J. Biol. Chem.
2003
,278, 10752–10762. [CrossRef]
[PubMed]
108.
Duzyj, C.M.; Paidas, M.J.; Jebailey, L.; Huang, J.; Barnea, E.R. PreImplantation factor (PIF*) promotes embryotrophic and
neuroprotective decidual genes: Effect negated by epidermal growth factor. J. Neurodev. Disord.
2014
,6, 36. [CrossRef] [PubMed]
109.
Akiyama, H. Implication of Trip15/CSN2 in early stage of neuronal differentiation of P19 embryonal carcinoma cells. Dev. Brain
Res. 2003,140, 45–56. [CrossRef]
110.
Boswell, W.T.; Boswell, M.; Walter, D.J.; Navarro, K.L.; Chang, J.; Lu, Y.; Savage, M.G.; Shen, J.; Walter, R.B. Exposure to 4100 K
fluorescent light elicits sex specific transcriptional responses in Xiphophorus maculatus skin. Comp. Biochem. Physiol. Part C Toxicol.
Pharmacol. 2018,208, 96–104. [CrossRef] [PubMed]
111.
Zuccotti, M.; Merico, V.; Sacchi, L.; Bellone, M.; Brink, T.C.; Bellazzi, R.; Stefanelli, M.; Redi, C.; Garagna, S.; Adjaye, J. Maternal
Oct-4 is a potential key regulator of the developmental competence of mouse oocytes. BMC Dev. Biol. 2008, 8, 97. [CrossRef]
[PubMed]
112.
Skottman, H.; Mikkola, M.; Lundin, K.; Olsson, C.; Strömberg, A.-M.; Tuuri, T.; Otonkoski, T.; Hovatta, O.; Lahesmaa, R. Gene
Expression Signatures of Seven Individual Human Embryonic Stem Cell Lines. Stem Cells 2005, 23, 1343–1356. [CrossRef]
[PubMed]
113.
Yan, L.; Yao, X.; Bachvarov, D.; Saifudeen, Z.; El-Dahr, S.S. Genome-wide analysis of gestational gene-environment interactions in
the developing kidney. Physiol. Genom. 2014,46, 655–670. [CrossRef]
114.
Liang, W.; Bi, Y.; Wang, H.; Dong, S.; Li, K.; Li, J. Gene Expression Profiling of Clostridium botulinum under Heat Shock Stress.
BioMed Res. Int. 2013,2013, 760904. [CrossRef]
115.
Selby, K.; Mascher, G.; Somervuo, P.; Lindström, M.; Korkeala, H. Heat shock and prolonged heat stress attenuate neurotoxin
and sporulation gene expression in group I Clostridium botulinum strain ATCC 3502. PLoS ONE 2017, 12, e0176944. [CrossRef]
[PubMed]
116.
Anderson, K.L.; Roux, C.M.; Olson, M.W.; Luong, T.T.; Lee, C.Y.; Olson, R.; Dunman, P.M. Characterizing the effects of inorganic
acid and alkaline shock on the Staphylococcus aureus transcriptome and messenger RNA turnover. FEMS Immunol. Med. Microbiol.
2010,60, 208–250. [CrossRef] [PubMed]
117.
Belvin, B.R.; Gui, Q.; Hutcherson, J.A.; Lewis, J.P. The Porphyromonas gingivalis hybrid cluster protein Hcp is required for growth
with nitrite and survival with host cells. Infect. Immun. 2019,87. [CrossRef] [PubMed]
118.
Aurass, P.; Pless, B.; Rydzewski, K.; Holland, G.; Bannert, N.; Flieger, A. bdhA-patD Operon as a Virulence Determinant, Revealed
by a Novel Large-Scale Approach for Identification of Legionella pneumophila Mutants Defective for Amoeba Infection. Appl.
Environ. Microbiol. 2009,75, 4506–4515. [CrossRef]
119.
Zhao, W.; Caro, F.; Robins, W.; Mekalanos, J.J. Antagonism toward the intestinal microbiota and its effect on Vibrio cholerae
virulence. Science 2018,359, 210–213. [CrossRef] [PubMed]
120.
Gangaiah, D.; Labandeira-Rey, M.; Zhang, X.; Fortney, K.R.; Ellinger, S.; Zwickl, B.; Baker, B.; Liu, Y.; Janowicz, D.M.; Katz, B.P.;
et al. Haemophilus ducreyi Hfq Contributes to Virulence Gene Regulation as Cells Enter Stationary Phase. mBio 2014, 5, e01081-13.
[CrossRef] [PubMed]
121.
Labandeira-Rey, M.; Mock, J.R.; Hansen, E.J. Regulation of Expression of the Haemophilus ducreyi LspB and LspA2 Proteins by
CpxR. Infect. Immun. 2009,77, 3402–3411. [CrossRef]
122.
Spinola, S.M.; Fortney, K.R.; Baker, B.; Janowicz, D.M.; Zwickl, B.; Katz, B.P.; Blick, R.J.; Munson, R.S. Activation of the CpxRA
System by Deletion of cpxA Impairs the Ability of Haemophilus ducreyi To Infect Humans. Infect. Immun. 2010, 78, 3898–3904.
[CrossRef]
123.
Rahmani-Badi, A.; Sepehr, S.; Fallahi, H.; Heidari-Keshel, S. Erratum: Exposure of E. coli to DNA-Methylating Agents Impairs
Biofilm Formation and Invasion of Eukaryotic Cells via Down Regulation of the N-Acetylneuraminate Lyase NanA. Front.
Microbiol. 2016,7, 1–13. [CrossRef]
124.
Dunman, P.M.; Murphy, E.; Haney, S.; Palacios, D.; Tucker-Kellogg, G.; Wu, S.; Brown, E.L.; Zagursky, R.J.; Shlaes, D.; Projan, S.J.
Transcription Profiling-Based Identification of Staphylococcus aureus Genes Regulated by the agr and/or sarA Loci. J. Bacteriol.
2001,183, 7341–7353. [CrossRef] [PubMed]
125.
Pereira, L.E.; Tsang, J.; Mrázek, J.; Hoover, T.R. The zinc-ribbon domain of Helicobacter pylori HP0958: Requirement for RpoN
accumulation and possible roles of homologs in other bacteria. Microb. Inform. Exp. 2011,1, 8. [CrossRef] [PubMed]
126.
Pomposiello, P.J.; Bennik, M.H.J.; Demple, B. Genome-Wide Transcriptional Profiling of the Escherichia coli Responses to Superoxide
Stress and Sodium Salicylate. J. Bacteriol. 2001,183, 3890–3902. [CrossRef] [PubMed]
127.
Peng, C.; Andersen, B.; Arshid, S.; Larsen, M.R.; Albergaria, H.; Lametsch, R.; Arneborg, N. Proteomics insights into the responses
of Saccharomyces cerevisiae during mixed-culture alcoholic fermentation with Lachancea thermotolerans. FEMS Microbiol. Ecol.
2019, 95, 1–16. [CrossRef]
128.
Shulami, S.; Shenker, O.; Langut, Y.; Lavid, N.; Gat, O.; Zaide, G.; Zehavi, A.; Sonenshein, A.L.; Shoham, Y. Multiple Regulatory
Mechanisms Control the Expression of the Geobacillus stearothermophilus Gene for Extracellular Xylanase. J. Biol. Chem.
2014, 289, 25957–25975. [CrossRef]
129.
Ogura, M.; Sato, T.; Abe, K. Bacillus subtilis YlxR, Which Is Involved in Glucose-Responsive Metabolic Changes, Regulates
Expression of tsaD for Protein Quality Control of Pyruvate Dehydrogenase. Front. Microbiol. 2019,10, 1–15. [CrossRef]
130.
Chen, S.-C.; Huang, C.-H.; Yang, C.S.; Kuan, S.-M.; Lin, C.-T.; Chou, S.-H.; Chen, Y. Crystal Structure of a Conserved Hypothetical
Protein MJ0927 from Methanocaldococcus jannaschii Reveals a Novel Quaternary Assembly in the Nif3 Family. BioMed Res. Int.
2014,2014, 171263. [CrossRef]
131.
Tomoike, F.; Wakamatsu, T.; Nakagawa, N.; Kuramitsu, S.; Masui, R. Crystal structure of the conserved hypothetical protein
TTHA1606 from Thermus thermophilus HB8. Proteins Struct. Funct. Bioinforma. 2009, 76, 244–248. [CrossRef] [PubMed]
132.
Fujishiro, T.; Ermler, U.; Shima, S. A possible iron delivery function of the dinuclear iron center of HcgD in [Fe]-hydrogenase
cofactor biosynthesis. FEBS Lett. 2014,588, 2789–2793. [CrossRef] [PubMed]
133.
Lie, T.J.; Costa, K.C.; Pak, D.; Sakesan, V.; Leigh, J.A. Phenotypic evidence that the function of the [Fe]-hydrogenase Hmd in
Methanococcus maripaludis requires seven hcg (hmd co-occurring genes) but not hmdII. FEMS Microbiol. Lett. 2013, 343, 156–160.
[CrossRef]
134.
Godsey, M.H.; Minasov, G.; Shuvalova, L.; Brunzelle, J.S.; Vorontsov, I.I.; Collart, F.R.; Anderson, W.F. The 2.2 Å resolution crystal
structure of Bacillus cereus Nif3-family protein YqfO reveals a conserved dimetal-binding motif and a regulatory domain. Protein
Sci. 2007,16, 1285–1293. [CrossRef] [PubMed]
135.
Lamba, J.K.; Feldberg, T.; Ghosh, T.M.; Bhise, N.; Fridley, B. Abstract 2214: Genome-wide association analysis identified genetic
markers associated with triptolide cellular sensitivity using HapMap LCLs as model system. In Proceedings of the Experimental and
Molecular Therapeutics; American Association for Cancer Research: Philadelphia, PA, USA, 2013; Volume 73, p. 2214.
136.
Malik, A.; Pande, K.; Kumar, A.; Vemula, A.; Chandramohan, M.R.V. Finding Pathogenic nsSNP’s and their structural effect on
COPS2 using Molecular Dynamic Approach. bioRxiv 2020. [CrossRef]
137.
Kuan, S.-M.; Chen, H.-C.; Huang, C.-H.; Chang, C.-H.; Chen, S.-C.; Yang, C.S.; Chen, Y. Crystallization and preliminary X-ray
diffraction analysis of the Nif3-family protein MJ0927 from Methanocaldococcus jannaschii. Acta Crystallogr. Sect. F Struct. Biol.
Cryst. Commun. 2013, 69, 80–82. [CrossRef]
138.
Saikatendu, K.S.; Zhang, X.; Kinch, L.; Leybourne, M.; Grishin, N.V.; Zhang, H. Structure of a conserved hypothetical protein
SA1388 from S. aureus reveals a capped hexameric toroid with two PII domain lids and a dinuclear metal center. BMC Struct. Biol.
2006,6, 27. [CrossRef]
139.
Constantine, K.L.; Krystek, S.R.; Healy, M.D.; Doyle, M.L.; Siemers, N.O.; Thanassi, J.; Yan, N.; Xie, D.; Goldfarb, V.; Yanchunas,
J.; et al. Structural and functional characterization of CFE88: Evidence that a conserved and essential bacterial protein is a
methyltransferase. Protein Sci. 2005, 14, 1472–1484. [CrossRef]
140. Qijing, G.; Zhang, Y. NIF3 Superfamily protein. Chin. J. Cell Biol. 2007,29, 816–820.
141. Corpet, F. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 1988,16, 10881–10890. [CrossRef]
142.
Robert, X.; Gouet, P. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res.
2014, 42, 320–324. [CrossRef]
143.
Yang, J.; Li, Q.; Yang, H.; Yan, L.; Yang, L.; Yu, L. Overexpression of human CUTA isoform 2 enhances the cytotoxicity of copper
to HeLa cells. Acta Biochim. Pol. 2008,55, 411–415. [CrossRef]
144.
Gupta, S.D.; Lee, B.T.O.; Camakaris, J.; Wu, H.C. Identification of cutC and cutF (nlpE) genes involved in copper tolerance in
Escherichia coli. J. Bacteriol. 1995, 177, 4207–4215. [CrossRef]
145.
Fong, S.T.; Camakaris, J.; Lee, B.T. Molecular genetics of a chromosomal locus involved in copper tolerance in Escherichia coli K-12.
Mol. Microbiol. 1995,15, 1127–1137. [CrossRef]
146.
Tanaka, Y.; Tsumoto, K.; Nakanishi, T.; Yasutake, Y.; Sakai, N.; Yao, M.; Tanaka, I.; Kumagai, I. Structural implications for heavy
metal-induced reversible assembly and aggregation of a protein: The case of Pyrococcus horikoshii CutA. FEBS Lett.
2004, 556, 167–174. [CrossRef]
147.
Odermatt, A.; Solioz, M. Two trans-acting metalloregulatory proteins controlling expression of the copper-ATPases of Enterococcus
hirae. J. Biol. Chem. 1995, 270, 4349–4354. [CrossRef] [PubMed]
148.
Rensing, C.; Franke, S. Copper Homeostasis in Escherichia coli and Other Enterobacteriaceae. EcoSal Plus 2007, 2, ecosalplus.5.4.4.1.
[CrossRef]
149.
Bagautdinov, B. The structures of the CutA1 proteins from Thermus thermophilus and Pyrococcus horikoshii: Characterization of
metal-binding sites and metal-induced assembly. Acta Crystallogr. Sect. F Struct. Biol. Commun. 2014,70, 404–413. [CrossRef]
150.
Krissinel, E.; Henrick, K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions.
Acta Crystallogr. Sect. D Biol. Crystallogr. 2004,60, 2256–2268. [CrossRef]
151.
Siltberg-Liberles, J.; Martinez, A. Searching distant homologs of the regulatory ACT domain in phenylalanine hydroxylase. Amino
Acids 2009,36, 235–249. [CrossRef]
152.
Arnesano, F.; Banci, L.; Benvenuti, M.; Bertini, I.; Calderone, V.; Mangani, S.; Viezzoli, M.S. The Evolutionarily Conserved Trimeric
Structure of CutA1 Proteins Suggests a Role in Signal Transduction. J. Biol. Chem. 2003,278, 45999–46006. [CrossRef]
153. Forchhammer, K.; Lüddecke, J. Sensory properties of the PII signalling protein family. FEBS J. 2016,283, 425–437. [CrossRef]
154.
Selim, K.A.; Tremiño, L.; Marco-Marín, C.; Alva, V.; Espinosa, J.; Contreras, A.; Hartmann, M.D.; Forchhammer, K.; Rubio, V.
Functional and structural characterization of PII-like protein CutA does not support involvement in heavy metal tolerance and
hints at a small-molecule carrying/signaling role. FEBS J. 2021,288, 1142–1162. [CrossRef]
155.
Selim, K.A.; Haffner, M. Heavy Metal Stress Alters the Response of the Unicellular Cyanobacterium Synechococcus elongatus PCC
7942 to Nitrogen Starvation. Life 2020,10, 275. [CrossRef]
156.
Koga, R.; Matsumoto, A.; Kouzuma, A.; Watanabe, K. Identification of an extracytoplasmic function sigma factor that facilitates
c-type cytochrome maturation and current generation under electrolyte-flow conditions in Shewanella oneidensis MR-1. Environ.
Microbiol. 2020, 22, 3671–3684. [CrossRef] [PubMed]
157.
Manina, G.; Bellinzoni, M.; Pasca, M.R.; Neres, J.; Milano, A.; De Jesus Lopes Ribeiro, A.L.; Buroni, S.; Škovierová, H.; Dianišková,
P.; Mikušová, K.; et al. Biological and structural characterization of the Mycobacterium smegmatis nitroreductase NfnB, and its role
in benzothiazinone resistance. Mol. Microbiol. 2010,77, 1172–1185. [CrossRef] [PubMed]
158.
Markowitz, V.M.; Chen, I.M.A.; Palaniappan, K.; Chu, K.; Szeto, E.; Grechkin, Y.; Ratner, A.; Anderson, I.; Lykidis, A.; Mavromatis,
K.; et al. The integrated microbial genomes system: An expanding comparative analysis resource. Nucleic Acids Res.
2009, 38, 382–390. [CrossRef]
159.
Grigoriev, I.V.; Nordberg, H.; Shabalov, I.; Aerts, A.; Cantor, M.; Goodstein, D.; Kuo, A.; Minovitsky, S.; Nikitin, R.; Ohm, R.A.;
et al. The genome portal of the Department of Energy Joint Genome Institute. Nucleic Acids Res. 2012, 40, D26–D32. [CrossRef]
[PubMed]
160.
Nordberg, H.; Cantor, M.; Dusheyko, S.; Hua, S.; Poliakov, A.; Shabalov, I.; Smirnova, T.; Grigoriev, I.V.; Dubchak, I. The genome
portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 2014,42, 26–31. [CrossRef]
161.
Waldron, K.J.; Rutherford, J.C.; Ford, D.; Robinson, N.J. Metalloproteins and metal sensing. Nature 2009, 460, 823–830. [CrossRef]
[PubMed]
162.
Wu, X.; Haakonsen, D.L.; Sanderlin, A.G.; Liu, Y.J.; Shen, L.; Zhuang, N.; Laub, M.T.; Zhang, Y. Structural insights into the unique
mechanism of transcription activation by Caulobacter crescentus GcrA. Nucleic Acids Res. 2018,46, 3245–3256. [CrossRef]
163.
Stamford, N.P.; Lilley, P.E.; Dixon, N.E. Enriched sources of Escherichia coli replication proteins. The dnaG primase is a zinc
metalloprotein. Biochim. Biophys. Acta 1992,1132, 17–25. [CrossRef]
164.
Czubat, B.; Minias, A.; Brzostek, A.; Żaczek, A.; Struś, K.; Zakrzewska-Czerwińska, J.; Dziadek, J. Functional Disassociation
Between the Protein Domains of MSMEG_4305 of Mycolicibacterium smegmatis (Mycobacterium smegmatis) in vivo. Front.
Microbiol. 2020, 11, 1–15. [CrossRef]
165.
Nowotny, M.; Yang, W. Stepwise analyses of metal ions in RNase H catalysis from substrate destabilization to product release.
EMBO J. 2006,25, 1924–1933. [CrossRef]
166.
Niyomporn, B.; Dahl, J.L.; Strominger, J.L. Biosynthesis of the peptidoglycan of bacterial cell walls. IX. Purification and properties
of glycyl transfer ribonucleic acid synthetase from Staphylococcus aureus. J. Biol. Chem. 1968,243, 773–778. [CrossRef]
167.
Pelosi, L.; Vo, C.-D.-T.; Abby, S.S.; Loiseau, L.; Rascalou, B.; Hajj Chehade, M.; Faivre, B.; Goussé, M.; Chenal, C.; Touati, N.; et al.
Ubiquinone Biosynthesis over the Entire O2 Range: Characterization of a Conserved O2-Independent Pathway. mBio
2019, 10, e01319-19. [CrossRef]
168.
Kato, T.; Takahashi, N.; Kuramitsu, H.K. Sequence analysis and characterization of the Porphyromonas gingivalis prtC gene,
which expresses a novel collagenase activity. J. Bacteriol. 1992,174, 3889–3895. [CrossRef]
169.
Cunningham, R.P.; Ahern, H.; Xing, D.; Thayer, M.M.; Tainer, J.A. Structure and function of Escherichia coli endonuclease III.
Ann. N. Y. Acad. Sci. 1994,726, 215–222. [CrossRef] [PubMed]
170.
Ryan, K.A.; Karim, N.; Worku, M.; Moore, S.A.; Penn, C.W.; O’Toole, P.W. HP0958 is an essential motility gene in Helicobacter
pylori. FEMS Microbiol. Lett. 2005, 248, 47–55. [CrossRef]
171.
Kumar, A.; Karthikeyan, S. Crystal structure of the MSMEG_4306 gene product from Mycobacterium smegmatis. Acta Crystallogr.
Sect. F Struct. Biol. Commun. 2018, 74, 166–173. [CrossRef] [PubMed]
172.
Barta, M.L.; Battaile, K.P.; Lovell, S.; Hefty, P.S. Hypothetical protein CT398 (CdsZ) interacts with σ54 (RpoN)-holoenzyme and
the type III secretion export apparatus in Chlamydia trachomatis. Protein Sci. 2015, 24, 1617–1632. [CrossRef] [PubMed]
173.
Rees, W.D.; Lorenzo-Leal, A.C.; Steiner, T.S.; Bach, H. Mycobacterium avium Subspecies paratuberculosis Infects and Replicates within
Human Monocyte-Derived Dendritic Cells. Microorganisms 2020,8, 994. [CrossRef]
174.
Kim, W.S.; Shin, M.-K.; Shin, S.J. MAP1981c, a Putative Nucleic Acid-Binding Protein, Produced by Mycobacterium avium subsp.
paratuberculosis, Induces Maturation of Dendritic Cells and Th1-Polarization. Front. Cell. Infect. Microbiol. 2018,8. [CrossRef]
175.
Sassetti, C.M.; Boyd, D.H.; Rubin, E.J. Genes required for mycobacterial growth defined by high density mutagenesis. Mol.
Microbiol. 2003,48, 77–84. [CrossRef]
176.
Lu, S.; Wang, J.; Chitsaz, F.; Derbyshire, M.K.; Geer, R.C.; Gonzales, N.R.; Gwadz, M.; Hurwitz, D.I.; Marchler, G.H.; Song, J.S.;
et al. CDD/SPARCLE: The conserved domain database in 2020. Nucleic Acids Res. 2020,48, D265–D268. [CrossRef]
177.
Yanai, I.; Hunter, C.P. Comparison of diverse developmental transcriptomes reveals that coexpression of gene neighbors is not
evolutionarily conserved. Genome Res. 2009,19, 2214–2220. [CrossRef] [PubMed]
178.
Sheftel, A.D.; Wilbrecht, C.; Stehling, O.; Niggemeyer, B.; Elsässer, H.P.; Mühlenhoff, U.; Lill, R. The human mitochondrial ISCA1,
ISCA2, and IBA57 proteins are required for [4Fe-4S] protein maturation. Mol. Biol. Cell 2012,23, 1157–1166. [CrossRef]
179.
Cai, K.; Markley, J. NMR as a Tool to Investigate the Processes of Mitochondrial and Cytosolic Iron-Sulfur Cluster Biosynthesis.
Molecules 2018,23, 2213. [CrossRef] [PubMed]
180.
Katzenmeier, G.; Schmid, C.; Kellermann, J.; Lottspeich, F.; Bacher, A. Biosynthesis of Tetrahydrofolate. Sequence of GTP
Cyclohydrolase I from Escherichia coli. Biol. Chem. Hoppe. Seyler. 1991, 372, 991–998. [CrossRef] [PubMed]
181. Cossins, E.A.; Chen, L. Folates and one-carbon metabolism in plants and fungi. Phytochemistry 1997,45, 437–452. [CrossRef]
182.
Burg, A.W.; Brown, G.M. The biosynthesis of folic acid. 8. Purification and properties of the enzyme that catalyzes the production
of formate from carbon atom 8 of guanosine triphosphate. J. Biol. Chem. 1968,243, 2349–2358. [CrossRef]
183.
Thöny, B.; Auerbach, G.; Blau, N. Tetrahydrobiopterin biosynthesis, regeneration and functions. Biochem. J. 2000, 347, 1–16.
[CrossRef]
184.
Phillips, G.; El Yacoubi, B.; Lyons, B.; Alvarez, S.; Iwata-Reuyl, D.; De Crécy-Lagard, V. Biosynthesis of 7-deazaguanosine-modified
tRNA nucleosides: A new role for GTP cyclohydrolase I. J. Bacteriol. 2008,190, 7876–7884. [CrossRef]
185.
El Yacoubi, B.; Bonnett, S.; Anderson, J.N.; Swairjo, M.A.; Iwata-Reuyl, D.; De Crécy-Lagard, V. Discovery of a new prokaryotic
type I GTP cyclohydrolase family. J. Biol. Chem. 2006,281, 37586–37593. [CrossRef]
186.
Paranagama, N.; Bonnett, S.A.; Alvarez, J.; Luthra, A.; Stec, B.; Gustafson, A.; Iwata-Reuyl, D.; Swairjo, M.A. Mechanism and
catalytic strategy of the prokaryotic-specific GTP cyclohydrolase-IB. Biochem. J. 2017,474, 1017–1039. [CrossRef]
187.
Sankaran, B.; Bonnett, S.A.; Shah, K.; Gabriel, S.; Reddy, R.; Schimmel, P.; Rodionov, D.A.; De Crécy-Lagard, V.; Helmann, J.D.;
Iwata-Reuyl, D.; et al. Zinc-independent folate biosynthesis: Genetic, biochemical, and structural investigations reveal new metal
dependence for GTP cyclohydrolase IB. J. Bacteriol. 2009,191, 6936–6949. [CrossRef]
188.
de Crécy-Lagard, V.; El Yacoubi, B.; de la Garza, R.D.; Noiriel, A.; Hanson, A.D. Comparative genomics of bacterial and plant
folate synthesis and salvage: Predictions and validations. BMC Genom. 2007,8, 1–15. [CrossRef]
189.
Gorelova, V.; Bastien, O.; De Clerck, O.; Lespinats, S.; Rébeillé, F.; Van Der Straeten, D. Evolution of folate biosynthesis and
metabolism across algae and land plant lineages. Sci. Rep. 2019,9, 5731. [CrossRef] [PubMed]
190.
Gerdes, S.Y.; Scholle, M.D.; Campbell, J.W.; Balázsi, G.; Ravasz, E.; Daugherty, M.D.; Somera, A.L.; Kyrpides, N.C.; Anderson,
I.; Gelfand, M.S.; et al. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J.
Bacteriol. 2003, 185, 5673–5684. [CrossRef]
191.
Salama, N.R.; Shepherd, B.; Falkow, S. Global transposon mutagenesis and essential gene analysis of Helicobacter pylori. J. Bacteriol.
2004, 186, 7926–7935. [CrossRef] [PubMed]
192. Wahba, A.J.; Friedkin, M. The Enzymatic Synthesis of Thymidylate. J. Biol. Chem. 1962,237, 3794–3801. [CrossRef]
193.
Rebelo, J.; Auerbach, G.; Bader, G.; Bracher, A.; Nar, H.; Hösl, C.; Schramek, N.; Kaiser, J.; Bacher, A.; Huber, R.; et al. Biosynthesis
of Pteridines. Reaction Mechanism of GTP Cyclohydrolase I. J. Mol. Biol. 2003,326, 503–516. [CrossRef]
194.
Philpott, C.C.; Jadhav, S. The ins and outs of iron: Escorting iron through the mammalian cytosol. Free Radic. Biol. Med.
2019, 133, 112–117. [CrossRef] [PubMed]
195.
Jordan, M.R.; Wang, J.; Weiss, A.; Skaar, E.P.; Capdevila, D.A.; Giedroc, D.P. Mechanistic Insights into the Metal-Dependent
Activation of ZnII-Dependent Metallochaperones. Inorg. Chem. 2019, 58, 13661–13672. [CrossRef] [PubMed]
196.
Edmonds, K.A.; Jordan, M.R.; Giedroc, D.P. COG0523 proteins: A functionally diverse family of transition metal-regulated G3E
P-loop GTP hydrolases from bacteria to man. Metallomics 2021,13. [CrossRef] [PubMed]
197.
Chandrangsu, P.; Huang, X.; Gaballa, A.; Helmann, J.D. Bacillus subtilis FolE is sustained by the ZagA zinc metallochaperone and
the alarmone ZTP under conditions of zinc deficiency. Mol. Microbiol. 2019,112, 751–765. [CrossRef] [PubMed]
... Members of this cluster are present in both archaea and bacteria and feature an N-terminal transmembrane domain. The fourth cluster contains the multidomain NIF3 proteins (NGG1p-interacting factor 3, PDB: 2GX8) of DUF34 protein family [51], which are ubiquitously conserved across all kingdoms of life, yet their cellular functions are obscure. In eukaryotes, NIF3 proteins appear to have a role in transcriptional regulation, specifically preventing Ngg1p translocation to the nucleus by forming a complex with it in the cytoplasm [52][53][54]. ...
... For example, in E. coli, the gene encoding NIF3 is highly upregulated during genotoxic stress caused by DNA damage [52], whereas in Methanocaldococcus jannaschii, nif3 was found in a gene cluster involved in the biosynthesis of the iron-guanylylpyridinol cofactor of [Fe]-hydrogenase [54]. Thus, NIF3 proteins were proposed to act as metallochaperones, insertases, or metallocofactor maturases [51][52][53][54], thereby explaining their divergent functions in various cellular pathways, including virulence, cell differentiation, universal stress responses, redox signaling, and metal ion homeostasis [51]. Notably, classical NIF3 proteins consist of a central PII-like domain, flanked by two NIF3-like domains at both the N- and C-terminal ends [52][53][54]. ...
Article
Members of the PII superfamily are versatile, multitasking signaling proteins ubiquitously found in all domains of life. They adeptly monitor and synchronize the cell's carbon, nitrogen, energy, redox, and diurnal states, primarily by binding interdependently to adenyl-nucleotides, including charged nucleotides (ATP, ADP, and AMP) and second messengers such as cyclic adenosine monophosphate (cAMP), cyclic di-adenosine monophosphate (c-di-AMP), and S-adenosylmethionine–AMP (SAM-AMP). These proteins also undergo a variety of posttranslational modifications, such as phosphorylation, adenylation, uridylation, carboxylation, and disulfide bond formation, which further provide cues on the metabolic state of the cell. Serving as precise metabolic sensors, PII superfamily proteins transmit this information to diverse cellular targets, establishing dynamic regulatory assemblies that fine-tune cellular homeostasis. Recently discovered, PII-like proteins are emerging families of signaling proteins that, while related to canonical PII proteins, have evolved to fulfill a diverse range of cellular functions, many of which remain elusive. In this review, we focus on the evolution of PII-like proteins and summarize the molecular mechanisms governing the assembly dynamics of PII complexes, with a special emphasis on the PII-like protein SbtB.
... To both explore the challenges of finding all relevant literature of a protein family and propose potential solutions, a stepwise demonstration of the capture process was recapitulated using the conserved unknown protein family, DUF34, recently examined in Reed et al. [25]. In this case study, publications were classified as being either 'focal' (i.e., any family homolog being mentioned in the title or abstract) or 'non-focal' (i.e., any family homolog being mentioned anywhere outside of the abstract or title, including supplementary materials). ...
... Examples of this can be seen when navigating the clustered groups and hierarchical relations of the EggNOG (v6) Database. The DUF34 family COG root cluster, LCOG0327, functional annotations include K22391, K07164 and K24730, all of which are incorrectly attributed to this group, the former due to premature EC number assignment in Helicobacter pylori [25] and the latter two due to DUF34 fusion sequences in bacteria and eukaryotes, respectively (Fig. S4). The aggregation mechanism and presentation manner of annotations by EggNOG implicitly Nodes of the directed network diagram are distinguished by the shape and color. ...
... One set of results, those of the Bacillus cereus DUF34 homolog (UniProt: Q818H0), produced publications that were entirely unique among the seven queries, unshared with the results of other single-sequence queries. Coincidentally, the homolog of B. cereus contains an inserted domain distinguishing it and others like it as members of a putative functional subgroup of the DUF34 family [25]. Therefore, these unique retrieved publications reinforce that an understanding of the taxonomic distribution of protein family domain architecture diversities is important to develop prior to selection of representatives for single-sequence-based literature retrieval via PaperBLAST. ...
Article
Capturing the published corpus of information on all members of a given protein family should be an essential step in any study focusing on specific members of that family. Using a previously gathered dataset of more than 280 references mentioning a member of the DUF34 (NIF3/Ngg1-interacting Factor 3) family, we evaluated the efficiency of different databases and search tools, and devised a workflow that experimentalists can use to capture the most information published on members of a protein family in the least amount of time. To complement this workflow, web-based platforms allowing for the exploration of protein family members across sequenced genomes or for the analysis of gene neighbourhood information were reviewed for their versatility and ease of use. Recommendations that can be used for experimentalist users, as well as educators, are provided and integrated within a customized, publicly accessible Wiki.
... The structure of Bacillus cereus YqfO has been resolved, revealing the presence of a dimetal-binding motif [10]. Recently, bioinformatics analysis using data from the determined genome sequences and published reports revealed that YqfO may function as a metal chaperone or metal insertase [11]. We observed that YqfO is under positive control of YlxR [8]. ...
... As a result, we first observed that disruption of the yqfO gene has a broad impact on genome gene expression. YqfO belongs to a large protein superfamily with unknown functions (DUF34), which is conserved in all three domains of life [11]. Although an exact mechanistic analysis was lacking, pleiotropic effects on physiological aspects, including transcription regulation, were observed in the disruptants of the genes encoding DUF34 proteins in many organisms [11]. ...
... YqfO belongs to a large protein superfamily with unknown functions (DUF34), which is conserved in all three domains of life [11]. Although an exact mechanistic analysis was lacking, pleiotropic effects on physiological aspects, including transcription regulation, were observed in the disruptants of the genes encoding DUF34 proteins in many organisms [11]. The Thermus thermophilus DUF34 protein YbgI binds to single-stranded DNA [18], and the Geobacillus stearothermophilus DUF34 protein XynX regulates the xynA gene encoding xylanase through its binding to the xynA promoter [19]. ...
Article
We investigated the regulators of the glucose induction (GI) of the ECF-sigma genes sigX/M. During further screening of transposon-inserted mutants, we identified several regulators including an RNA component of RNase P (rnpB), which is required for tRNA maturation. A depletion of rnpB is known to trigger the stringent response. We showed evidence that the stringent response inhibited GI of sigX/M.
Chapter
Metalloproteins represent more than one third of the human proteome, with huge variation in physiological functions and pathological implications, depending on the metal or metals involved and the tissue context. Their functions range from catalysis, bioenergetics, and redox to DNA repair, cell proliferation, signaling, transport of vital elements, and immunity. Human metalloproteomic studies have revealed that many families of metalloproteins, along with individual metalloproteins, are dysregulated under several clinical conditions. Several sorts of interaction between redox-active and redox-inert metalloproteins are also observed in health and disease. Metalloprotein profiling shows distinct alterations in neurodegenerative diseases, cancer, inflammation, infection, and diabetes mellitus, among other diseases. This makes metalloproteins, either individually or as families, a promising target for several therapeutic approaches. Inhibitors and activators of metalloenzymes, metal chelators, and artificial metalloproteins could be versatile in the diagnosis and treatment of several diseases, in addition to other biomedical and industrial applications.
Preprint
Capturing the published corpus of information on all members of a given protein family should be an essential step in any study focusing on any specific member of that said family. This step is often performed only superficially or partially by experimentalists as the most common approaches and tools to pursue this objective are far from optimal. Using a previously gathered dataset of 284 references mentioning a member of the DUF34 (NIF3/Ngg1-interacting Factor 3), we evaluated the productivity of different databases and search tools, and devised a workflow that can be used by experimentalists to capture the most information in less time. To complement this workflow, web-based platforms allowing for the exploration of member distributions for several protein families across sequenced genomes or for the capture of gene neighborhood information were reviewed for their versatility, completeness and ease of use. Recommendations that can be used for experimentalist users, as well as educators, are provided and integrated within a customized, publicly accessible Wiki. Data summary The authors confirm all supporting data, code, and protocols have been provided within the article or through supplementary data files. The complete set of supplementary data sheets may be accessed via FigShare.
Article
Objective: We observed that the addition of glucose enhanced the expression of sigX and sigM, encoding extra-cytoplasmic function sigma factors in Bacillus subtilis. Several regulatory factors were identified for this phenomenon, including YqfO, CshA (RNA helicase), and YlxR (nucleoid-associated protein). Subsequently, the relationships among these regulators were analyzed. Among them, YqfO is conserved in many bacterial genomes and may function as a metal ion insertase or metal chaperone, but has been poorly characterized. Thus, to further characterize YqfO, we performed RNA sequencing (RNA-seq) analysis of YqfO in addition to CshA and YlxR. Results: We first performed comparative RNA-seq to detect the glucose-responsive genes. Next, to determine the regulatory effects of YqfO in addition to CshA and YlxR, three pairs of comparative RNA-seq analyses were performed (yqfO/wt, cshA/wt, and ylxR/wt). We observed relatively large regulons (approximately 420, 780, and 180 for YqfO, CshA, and YlxR, respectively) and significant overlaps, indicating close relationships among the three regulators. This study is the first to reveal that YqfO functions as a global regulator in B. subtilis.
Article
Intimate embryo-maternal interaction is paramount for pregnancy success post-implantation. The embryo follows a specific developmental timeline starting with the neural system, dependent on endogenous and decidual factors. Beyond altered genetics/epigenetics, post-natal diseases may initiate at prenatal/neonatal, post-natal period, or through a continuum. Preimplantation factor (PIF) secreted by viable embryos promotes implantation and trophoblast invasion. Synthetic PIF reverses neuroinflammation in non-pregnant models. PIF targets embryo proteins that protect against oxidative stress and protein misfolding. We report PIF's embryotrophic role and potential to prevent developmental disorders by regulating uterine milieu at implantation and first trimester. PIF's effect on human implantation (human endometrial stromal cells (HESC)) and first-trimester decidua cultures (FTDC) was examined by global gene expression (Affymetrix), disease-biomarkers ranking (GeneGo), neuro-specific genes (Ingenuity) and proteins (mass-spectrometry). PIF co-cultured with epidermal growth factor (EGF) in both HESC and FTDC (Affymetrix) was evaluated. In HESC, PIF promotes neural differentiation and transmission genes (TLX2, EPHA10) while inhibiting retinoic acid receptor gene, which arrests growth. PIF promotes axon guidance and downregulates EGF-dependent neuroregulin signaling. In FTDC, PIF promotes bone morphogenetic protein pathway (SMAD1, 53-fold) and axonal guidance genes (EPH5) while inhibiting PPP2R2C, a negative cell-growth regulator involved in Alzheimer's and amyotrophic lateral sclerosis. In HESC, PIF affects angiotensin via beta-arrestin, transforming growth factor-beta (TGF-β), notch, BMP, and wingless-int (WNT) signaling pathways that promote neurogenesis involved in childhood neurodevelopmental diseases (autism) and also affected epithelial-mesenchymal transition involved in neuromuscular disorders.
In FTDC, PIF upregulates neural development and hormone signaling, while downregulating genes protecting against xenobiotic response leading to connective tissue disorders. In both HESC and FTDC, PIF affects neural development and transmission pathways. In the HESC interactome, PIF promotes the FUS gene, which controls genome integrity, while in FTDC, PIF upregulates STAT3, a critical transcription signal. EGF abolished PIF's effect on HESC, decreasing metalloproteinase and prolactin receptor genes, thereby interfering with decidualization, while in FTDC, EGF co-cultured with PIF reduced ZHX2, a gene that regulates neural AFP secretion. PIF promotes decidual trophic genes and proteins to regulate neural development. By regulating the uterine milieu, PIF may decrease embryo vulnerability to post-natal neurodevelopmental disorders. Examination of PIF-based intervention strategies used during embryogenesis to improve pregnancy prognosis and reduce post-natal vulnerability is clearly in order.
Article
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease defined by motor neuron (MN) loss. Multiple genetic risk factors have been identified, implicating RNA and protein metabolism and intracellular transport, among other biological mechanisms. To achieve a systems-level understanding of the mechanisms governing ALS pathophysiology, we built gene co-expression networks using RNA-sequencing data from control human spinal cord samples, identifying 13 gene co-expression modules, each of which represents a distinct biological process or cell type. Analysis of four RNA-seq datasets from a range of ALS disease-associated contexts reveals dysregulation in numerous modules related to ribosomal function, wound response, and leukocyte activation, implicating astrocytes, oligodendrocytes, endothelia, and microglia in ALS pathophysiology. To identify potentially causal processes, we partitioned heritability across the genome, finding that ALS common genetic risk is enriched within two specific modules: SC.M4, representing genes related to RNA processing and gene regulation, and SC.M2, representing genes related to intracellular transport and autophagy and enriched in oligodendrocyte markers. Top hub genes of this latter module include ALS-implicated risk genes such as KPNA3, TMED2, and NCOA4, the latter of which regulates ferritin autophagy, implicating this process in ALS pathophysiology. These unbiased, genome-wide analyses confirm the utility of a systems approach to understanding the causes and drivers of ALS.
Article
Since glycyl transfer RNA (glycyl-tRNA) is utilized for both peptidoglycan and protein synthesis in Staphylococcus aureus, the possible existence of more than one glycyl-tRNA synthetase in this microorganism was investigated. The glycyl-tRNA synthetase has been purified 200-fold from sonic extracts of S. aureus H, and no evidence of a second enzyme has been found. This enzyme can also charge the tRNAs of certain other bacteria and yeast. Some of its properties are reported.
Article
A procedure is described for the purification from extracts of Escherichia coli of the enzyme that catalyzes the release of carbon 8 of guanosine triphosphate as formic acid. This reaction is the first in a series of steps involved in the conversion of GTP to the pteridine component of folic acid. The enzyme, which has been purified some 700-fold, has been named GTP cyclohydrolase. No purine, nucleoside, or nucleotide other than GTP can be used as substrate. No coenzyme or metal activator is required in the reaction. The Km for GTP has been determined to be 2.2 × 10⁻⁵M. The molecular weight of the enzyme has been estimated to be larger than 300,000. The purified enzyme has been shown to catalyze the formation from GTP of formic acid and the triphospho ester of 2-amino-4-hydroxy-6-(erythro-1′,2′,3′-trihydroxypropyl)dihydropteridine (dihydroneopterin triphosphate) as products. No evidence was obtained to indicate that more than a single protein is involved in this transformation, although theoretical considerations suggest that this phenomenon is the sum of several individual reactions. Evidence is presented which shows that an arsenate-sensitive phosphatase is present in extracts of E. coli and that this enzyme is involved in the dephosphorylation of dihydroneopterin triphosphate, an enzymatic step which is necessary in order for this compound to be converted to dihydropteroic acid.
Article
Non-diazotrophic cyanobacteria are unable to fix atmospheric nitrogen and rely on combined nitrogen for growth and development. In the absence of combined nitrogen sources, most non-diazotrophic cyanobacteria, e.g., Synechocystis sp. PCC 6803 or Synechococcus elongatus PCC 7942, enter a dormant stage called chlorosis. The chlorosis process involves switching off photosynthetic activities and downregulating protein biosynthesis. Addition of a combined nitrogen source induces the regeneration of chlorotic cells in a process called resuscitation. As heavy metals are ubiquitous in the cyanobacterial biosphere, their influence on the vegetative growth of cyanobacterial cells has been extensively studied. However, the effect of heavy metal stress on chlorotic cyanobacterial cells remains elusive. To simulate the natural conditions, we investigated the effects of long-term exposure of S. elongatus PCC 7942 cells to both heavy metal stress and nitrogen starvation. We were able to show that elevated heavy metal concentrations, especially for Ni²⁺, Cd²⁺, Cu²⁺ and Zn²⁺, are highly toxic to nitrogen-starved cells. In particular, cells exposed to elevated concentrations of Cd²⁺ or Ni²⁺ were not able to properly enter chlorosis as they failed to degrade phycobiliproteins and chlorophyll a and remained greenish. In resuscitation assays, these cells were unable to recover from the simultaneous nitrogen starvation and Cd²⁺ or Ni²⁺ stress. The elevated toxicity of Cd²⁺ or Ni²⁺ presumably occurs due to their interference with the onset of chlorosis in nitrogen-starved cells, eventually leading to cell death.
Article
Transition metal homeostasis ensures that cells and organisms obtain sufficient metal to meet cellular demand while dispensing with any excess so as to avoid toxicity. In bacteria, zinc restriction induces the expression of one or more Zur (zinc-uptake repressor)-regulated Cluster of Orthologous Groups (COG) COG0523 proteins. COG0523 proteins encompass a poorly understood sub-family of G3E P-loop small GTPases, others of which are known to function as metallochaperones in the maturation of cobalamin (CoII) and NiII cofactor-containing metalloenzymes. Here, we use genomic enzymology tools to functionally analyze over 80,000 sequences that are evolutionarily related to Acinetobacter baumannii ZigA (Zur-inducible GTPase), a COG0523 protein and candidate zinc metallochaperone. These sequences segregate into distinct sequence similarity network (SSN) clusters, exemplified by the ZnII-Zur-regulated and FeIII-nitrile hydratase activator CxCC (C, Cys; X, any amino acid)-containing COG0523 proteins (SSN cluster 1), NiII-UreG (clusters 2, 8), CoII-CobW (cluster 4), and NiII-HypB (cluster 5). Five large clusters comprise ≈ 25% of all sequences, including cluster 3, which harbors the only structurally characterized COG0523 protein, Escherichia coli YjiA, and many uncharacterized eukaryotic COG0523 proteins. We also establish that mycobacterial-specific protein Y (Mpy) recruitment factor (Mrf), which promotes ribosome hibernation in actinomycetes under conditions of ZnII starvation, segregates into a fifth SSN cluster (cluster 17). Mrf is a COG0523 paralog that lacks all GTP-binding determinants as well as the ZnII-coordinating Cys found in CxCC-containing COG0523 proteins. On the basis of this analysis, we discuss new perspectives on the COG0523 proteins as cellular reporters of widespread nutrient stress induced by ZnII limitation.
Article
The problem of associating data from multiple sources and predicting an outcome simultaneously is an important one in modern biomedical research. It has the potential to identify a multidimensional array of variables predictive of a clinical outcome and to enhance our understanding of the pathobiology of complex diseases. Incorporating functional knowledge in association and prediction models can reveal pathways contributing to disease risk. We propose Bayesian hierarchical integrative analysis models that associate multiple omics data, predict a clinical outcome, allow for prior functional information, and can accommodate clinical covariates. The models, motivated by available data and the need for exploring other risk factors of atherosclerotic cardiovascular disease (ASCVD), are used for integrative analysis of clinical, demographic, and genomics data to identify genetic variants, genes, and gene pathways likely contributing to 10-year ASCVD risk in healthy adults. Our findings revealed several genetic variants, genes, and gene pathways that are highly associated with ASCVD risk, with some already implicated in cardiovascular disease (CVD) risk. Extensive simulations demonstrate the merit of joint association and prediction models over two-stage methods: association followed by prediction.
Article
Aims: Epigenetic regulation plays an important role in the progression of Alzheimer's disease (AD). Here, we identified differential methylation probes (DMPs) and investigated their potential mechanistic roles in AD. Main methods: DMPs were identified via bioinformatic analysis of GSE66351, which comprised 106 AD samples and 84 control samples derived from three separate brain regions. Differentially expressed genes (DEGs) were analyzed based on GSE5281, comprising 45 control samples and 58 AD samples. Gene ontology (GO), gene set enrichment analysis (GSEA), and protein-protein interaction (PPI) were used to identify pathways and hub genes. Key findings: We found 9007 DMPs in Occipital Cortex (OC) glia, 1527 in OC neurons, 100 in Temporal Cortex, and 194 in Frontal Cortex. 74 DEGs were identified in Primary Visual Cortex, 67 of which were downregulated and seven upregulated. 482 were upregulated and 697 downregulated in medial temporal gyrus. In superior frontal gyrus, 687 were upregulated and 85 downregulated. GO and PPI revealed that pathways involving epithelial-cell differentiation, cellular responses to lipids, transcription corepressor activities, apoptotic and organ growth were modulated by histone deacetylase 1 (HDAC1) and associated with AD. Additionally, GSEA illustrated that the transforming growth factor beta signaling pathway was significantly enriched in some brain regions and HDAC1 played an important role in this pathway. Significance: We found the glial-specific 3′UTR of HDAC1 was hypermethylated and HDAC1 was overexpressed in AD patients. Moreover, we also speculate that HDAC1 triggered signaling pathways linked to many different biological processes and functions via the regulation of histone deacetylation.
Preprint
COP9 Signalosome Subunit 2 is a highly conserved multiprotein complex which is involved in cellular and developmental processes. It is one of the essential components in the COP9 Signalosome Complex (CSN). It is also involved in neuronal differentiation, interacting with NIF3L1. The gene involved in neuronal differentiation is negatively regulated due to the transcription co-repressor interaction of NIF3L1 with COPS2. In the present study, we have evaluated the outcome for 90 non-synonymous single nucleotide polymorphisms (nsSNPs) in the COPS2 gene through computational tools. After the analysis, 4 SNPs (S120C, N144S, Y159H, R173C) were found to be deleterious. The native and mutated structures were prepared using Discovery Studio and docked to check the interactions with NIF3L1. On the basis of ZDOCK score, the top 3 mutations (N144S, Y159H, R173C) were screened out. Further, to analyze the effect of amino acid substitution on the molecular structure of the protein, Molecular Dynamics simulation was carried out. Analysis based on RMSD, RMSF, RG, and H-bond showed a significant deviation in the graph, which demonstrated conformation change and instability compared to the wild structure. As is known, mutations in the COPS2 gene can disrupt the normal activity of the CSN2 protein, which may cause neuronal differentiation. Our results showed the N144S, Y159H and R173C mutations to be more pathogenic, and they may cause disease.