Collin A. O’Leary's research while affiliated with Iowa State University and other places


Publications (19)


Structure of the SARS-CoV-2 Frameshift Stimulatory Element with an Upstream Multibranch Loop
  • Article

May 2024

·

10 Reads

Biochemistry

Jake M Peterson

·

Scott T Becker

·

Collin A O'Leary

·

[...]


The RNA secondary structure of androgen receptor-FL and V7 transcripts reveals novel regulatory regions

March 2024

·

5 Reads

Nucleic Acids Research

The androgen receptor (AR) is a ligand-dependent nuclear transcription factor belonging to the steroid hormone nuclear receptor family. Due to its roles in regulating cell proliferation and differentiation, AR is tightly regulated to maintain proper levels of itself and the many genes it controls. AR dysregulation is a driver of many human diseases including prostate cancer. Though this dysregulation often occurs at the RNA level, there are many unknowns surrounding post-transcriptional regulation of AR mRNA, particularly the role that RNA secondary structure plays. Thus, a comprehensive analysis of AR transcript secondary structure is needed. We address this through the computational and experimental analyses of two key isoforms, full length (AR-FL) and truncated (AR-V7). Here, a combination of in-cell RNA secondary structure probing experiments (targeted DMS-MaPseq) and computational predictions were used to characterize the static structural landscape and conformational dynamics of both isoforms. Additionally, in-cell assays were used to identify functionally relevant structures in the 5′ and 3′ UTRs of AR-FL. A notable example is a conserved stem loop structure in the 5′UTR of AR-FL that can bind to Poly(RC) Binding Protein 2 (PCBP2). Taken together, our results reveal novel features that regulate AR expression.


Fig 3. Poly(U) mutations in MYC intron 2 reduce HNRNPC binding and inhibit splicing. A) MYC Intron 2 is represented with attention to three poly(U) regions (orange boxes and sequences below) that were mutated. Biotinylated, in vitro transcribed RNA (region between purple arrows) was incubated with HeLa lysates followed by immunoblotting (B) for HNRNPC. Coomassie stain represents a type of loading control. The blot is representative of two independent experiments; both the uncropped original images and the result from the other independent experiment can be found in S1 Raw images. Mutation (mut) numbering indicates which regions shown in (A) were mutated for each. C) Schematic of the intron insertion reporter used for experiments. D) Results of dual luciferase assays, plotted as the ratio of Firefly (FL) to Renilla (RL) relative light units with mutation numbering as before. Data are from at least 12 independent transfections in HeLa cells performed on at least two separate days. Black bar indicates a significant difference from wild-type (wt; p < 0.05, two-tailed Student's t-test).
Fig 4. Poly(U) regions identified in other MYC introns. A) Comparison of introns for paralogues of human MYC showing poly(U) regions (purple) with at least five consecutive U residues. B) Cladogram after MAFFT alignment of MYC intron 2 sequences (with the exception of human MYCL, which has only one intron) from various species. C) Plots of the length of the intron versus either (top) the distance of the first poly(U) to the 3′ splice site (3′ss) or (bottom) the distance of the last poly(U) to the 5′ splice site (5′ss) demonstrate strong positive linear relationships. D) Box plot demonstrating the consistently high ratios between intron length and distances of indicated poly(U) regions to the 5′ss and the 3′ss of MYC across the species examined. Values represented in graphs are found in S3 File.
Identification of MYC intron 2 regions that modulate expression
  • Article

January 2024

·

22 Reads

PLOS ONE


MYC pre-mRNA is spliced with high fidelity to produce the transcription factor known to regulate cellular differentiation, proliferation, apoptosis, and alternative splicing. The mechanisms underpinning the pre-mRNA splicing of MYC, however, remain mostly unexplored. In this study, we examined the interaction of heterogeneous nuclear ribonucleoprotein C (HNRNPC) with MYC intron 2. Building off published eCLIP studies, we confirmed this interaction with poly(U) regions in intron 2 of MYC and found that full binding is correlated with optimal protein production. The interaction appears to be compensatory, as mutational disruption of all three poly(U) regions was required to reduce both HNRNPC binding capacity and fidelity of either splicing or translation. Poly(U) sequences in MYC intron 2 were relatively conserved across sequences from several different species. Lastly, we identified a short sequence just upstream of an HNRNPC binding region that when removed enhances MYC translation.


SARS-CoV-2 Orphan Gene ORF10 Contributes to More Severe COVID-19 Disease

November 2023

·

44 Reads

·

1 Citation

The orphan gene of SARS-CoV-2, ORF10, is the least studied gene in the virus responsible for the COVID-19 pandemic. Recent experimentation indicated ORF10 expression moderates innate immunity in vitro. However, whether ORF10 affects COVID-19 in humans remained unknown. We determine that the ORF10 sequence is identical to the Wuhan-Hu-1 ancestral haplotype in 95% of genomes across five variants of concern (VOC). Four ORF10 variants are associated with less virulent clinical outcomes in the human host: three of these affect ORF10 protein structure, one affects ORF10 RNA structural dynamics. RNA-Seq data from 2070 samples from diverse human cells and tissues reveals ORF10 accumulation is conditionally discordant from that of other SARS-CoV-2 transcripts. Expression of ORF10 in A549 and HEK293 cells perturbs immune-related gene expression networks, alters expression of the majority of mitochondrially-encoded genes of oxidative respiration, and leads to large shifts in levels of 14 newly-identified transcripts. We conclude ORF10 contributes to more severe COVID-19 clinical outcomes in the human host.


Discovery of RNA secondary structural motifs using sequence-ordered thermodynamic stability and comparative sequence analysis

June 2023

·

19 Reads

·

1 Citation

MethodsX

Major advances in RNA secondary structural motif prediction have been achieved in the last few years; however, few methods harness the predictive power of multiple approaches to deliver in-depth characterizations of local RNA motifs and their potential functionality. Additionally, most available methods do not predict RNA pseudoknots. This work combines complementary bioinformatic systems into one robust discovery pipeline where:
  • RNA sequences are folded to search for thermodynamically favorable motifs utilizing ScanFold.
  • Motifs are expanded and refolded into alternate pseudoknot conformations by Knotty/Iterative HFold.
  • All conformations are evaluated for covariance via the cm-builder pipeline (Infernal and R-scape).


Figure 1 Schematic of ScanFold 2.0 training procedure. Representative sequences were generated for a range of lengths (between 60 and 200 nt) and dinucleotide frequencies. These sequences were shuffled and analyzed using RNAfold to determine their MFEs, mean MFEs and respective standard deviations. Mean MFEs and standard deviations were then combined with 18 sequence composition features to comprise all 20 training features. These 20 features were used to generate mean MFE and standard deviation models. DOI: 10.7717/peerj.14361/fig-1
ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes

November 2022

·

81 Reads

·

11 Citations

A major limiting factor in target discovery for both basic research and therapeutic intervention is the identification of structural and/or functional RNA elements in genomes and transcriptomes. This was the impetus for the original ScanFold algorithm, which provides maps of local RNA structural stability, evidence of sequence-ordered (potentially evolved) structure, and unique model structures comprised of recurring base pairs with the greatest structural bias. A key step in quantifying this propensity for ordered structure is the prediction of secondary structural stability for randomized sequences which, in the original implementation of ScanFold, is explicitly evaluated. This slow process has limited the rapid identification of ordered structures in large genomes/transcriptomes, which we seek to overcome in this current work introducing ScanFold 2.0. In this revised version of ScanFold, we no longer explicitly evaluate randomized sequence folding energy, but rather estimate it using a machine learning approach. For high randomization numbers, this can increase prediction speeds over 100-fold compared to ScanFold 1.0, allowing for the analysis of large sequences, as well as the use of additional folding algorithms that may be computationally expensive. In the testing of ScanFold 2.0, we re-evaluate the Zika, HIV, and SARS-CoV-2 genomes and compare both the consistency of results and the time of each run to ScanFold 1.0. We also re-evaluate the SARS-CoV-2 genome to assess the quality of ScanFold 2.0 predictions vs several biochemical structure probing datasets and compare the results to those of the original ScanFold program.
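The quantity at the center of this abstract, the stability of a native sequence relative to randomized versions of itself, is commonly expressed as a z-score. The sketch below is illustrative only and is not the ScanFold code: the function names, the simple mononucleotide shuffle, and the use of the sample standard deviation are all assumptions, and the MFE values are supplied by the caller rather than computed by a folding program. ScanFold 1.0 would obtain the randomized MFEs by folding each shuffled sequence, whereas ScanFold 2.0, per the abstract, estimates their mean and standard deviation with a machine learning model.

```python
import random
import statistics


def shuffle_sequence(seq, rng):
    """Return a shuffled copy of seq (same base composition, random order).

    Hypothetical helper: a plain mononucleotide shuffle, used here only to
    show where randomized sequences would come from.
    """
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def stability_z_score(native_mfe, randomized_mfes):
    """z-score of the native MFE against the MFEs of randomized sequences.

    Negative values mean the native sequence is predicted to be more stable
    than its shuffled counterparts.
    """
    mean_mfe = statistics.mean(randomized_mfes)
    sd_mfe = statistics.stdev(randomized_mfes)  # sample SD is an assumption
    if sd_mfe == 0:
        raise ValueError("randomized MFEs have zero spread")
    return (native_mfe - mean_mfe) / sd_mfe


# Illustrative numbers only (kcal/mol), not taken from any dataset.
z = stability_z_score(-30.0, [-20.0, -22.0, -18.0])
print(round(z, 2))  # -5.0
```

In this toy case the randomized MFEs average -20 kcal/mol with a standard deviation of 2, so a native MFE of -30 kcal/mol sits five standard deviations below the randomized mean.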


Figure 3. Secondary structure models of EBER2. DMS reactivity data is overlaid on the models as red shaded nucleotides with the DMS reactivity scale ranging from 0.0 (white) to 1.0 (dark red). The nucleotides on each model are numbered at every 20-nucleotide interval. Covarying base pairs, as identified by R-scape, are highlighted with blue boxes. (A) Our proposed EBER2 model generated from RNAfold using DMS reactivity data as pseudo-energies. (B) The previously established reference model of EBER2.
Figure 4. Secondary structural models of mono and di-terminal repeat RNA units of the EBV type II genome. (A) A DMS informed RNAfold 2D structure model of the EBV TR RNA mono-segment. The DMS reactivity data used is overlaid on the model as red shaded nucleotides with the DMS reactivity scale ranging from 0.0 (white) to 1.0 (dark red). The model has nucleotide positions labelled at every 20-nucleotide interval. The site which binds EBER2 is highlighted in purple. Sites of R-scape identified covariation are highlighted in green and blue. (B) The same TR RNA mono-segment as modeled in panel A. Here, structure probabilities (as determined by RNAfold) are overlaid on the model and ScanFold predicted base pairs with G z-scores < -1 are highlighted with green base pair lines. (C) A DMS informed, RNAfold 2D structural model of the EBV TR RNA di-segment. Here, the first segment is highlighted in green, and the second segment is highlighted in red to help differentiate their positions. The EBER2 binding sites are highlighted in purple. This model has nucleotide positions labelled at every 40-nucleotide interval.
Reactivity and sequencing statistics for multiple classes of transcripts
ScanFold metrics for EBV transcripts
Thermodynamic and structural characterization of an EBV infected B-cell lymphoma transcriptome

October 2022

·

46 Reads

·

3 Citations

NAR Genomics and Bioinformatics

Epstein–Barr virus (EBV) is a widely prevalent human herpes virus infecting over 95% of all adults and is associated with a variety of B-cell cancers and induction of multiple sclerosis. EBV accomplishes this in part by expression of coding and noncoding RNAs and alteration of the host cell transcriptome. To better understand the structures which are forming in the viral and host transcriptomes of infected cells, the RNA structure probing technique Structure-seq2 was applied to the BJAB-B1 cell line (an EBV infected B-cell lymphoma). This resulted in reactivity profiles and secondary structural analyses for over 10000 human mRNAs and lncRNAs, along with 19 lytic and latent EBV transcripts. We report in-depth structural analyses for the human MYC mRNA and the human lncRNA CYTOR. Additionally, we provide a new model for the EBV noncoding RNA EBER2 and provide the first reported model for the EBV tandem terminal repeat RNA. In-depth thermodynamic and structural analyses were carried out with the motif discovery tool ScanFold and RNAfold prediction tool; subsequent covariation analyses were performed on resulting models finding various levels of support. ScanFold results for all analyzed transcripts are made available for viewing and download on the user-friendly RNAStructuromeDB.


Expansion of the RNAStructuromeDB to include secondary structural data spanning the human protein-coding transcriptome

August 2022

·

40 Reads

·

2 Citations

Scientific Reports

RNA plays vital functional roles in almost every component of biology, and these functional roles are often influenced by its folding into secondary and tertiary structures. An important role of RNA secondary structure is in maintaining proper gene regulation; therefore, making accurate predictions of the structures involved in these processes is important. In this study, we have expanded on our previous work that led to the creation of the RNAStructuromeDB. Unlike this previous study that analyzed the human genome at low resolution, we have now scanned the protein-coding human transcriptome at high (single nt) resolution. This provides more robust structure predictions for over 100,000 isoforms of known protein-coding genes. Notably, we also utilize the motif identification tool, ScanFold, to model structures with high propensity for ordered/evolved stability. All data have been uploaded to the RNAStructuromeDB, allowing for easy searching of transcripts, visualization of data tracks (via the Integrative Genomics Viewer or IGV), and download of ScanFold data—including unique highly-ordered motifs. Herein, we provide an example analysis of MAT2A to demonstrate the utility of ScanFold at finding known and novel secondary structures, highlighting regions of potential functionality, and guiding generation of functional hypotheses through use of the data.


FIGURE 4 | Predicted inter-intronic structures. (A). Secondary structure formed between elements 1 and 2 located within introns 6 and 7, respectively. The structure sequesters both TIA1 binding sites. Numbering of nucleotides, negative, neutral and positive, starts from the last position of intron 6, first position of exon 7 and the first position of intron 7, respectively. Element 1 is highlighted in purple; branch point sequence, in yellow, with "A" indicated in red. Other markings and abbreviations are same as shown in Figure 3. (B). An alternative secondary structure of element 2. The structural context changes the positioning of the TIA1 binding sites. Markings and abbreviation are the same as in panel A.
FIGURE 5 | In silico ScanFold results for the pre-mRNA of SMN2. (A). At the top, an IGV representation of the whole SMN2 pre-mRNA transcript with 6 data tracks: a base pair (arc diagram) track, a track showing ScanFold extracted structures where base pairs with z-score < -2, -1 and 0 are indicated in blue, green and yellow, respectively; a track of transcripts (with introns as lines and exons as boxes), an ensemble diversity (ED) track, a minimum free energy (MFE) track, and a ΔG z-score track. Below the whole transcript are two zoomed in regions. Highlighted by the blue box is a region of SMN2 with the lowest z-score and both the lowest and highest MFE regions in the transcript. Highlighted by the red box is Exon 7, which contains one of the two extracted structures present in an exonic region. (B). The ScanFold informed 2D model of the blue highlighted region from panel A is shown with the per nucleotide (NT) z-score overlaid on the model. (C). The ScanFold informed 2D model of the red highlighted region from panel A is shown with the per nucleotide (NT) z-score overlaid on the model. Here, the boundaries of intron splice sites and the start and stop site of exon 7 are labelled.
Structural Context of a Critical Exon of Spinal Muscular Atrophy Gene

July 2022

·

83 Reads

·

10 Citations

Frontiers in Molecular Biosciences

Humans contain two nearly identical copies of Survival Motor Neuron genes, SMN1 and SMN2. Deletion or mutation of SMN1 causes spinal muscular atrophy (SMA), one of the leading genetic diseases associated with infant mortality. SMN2 is unable to compensate for the loss of SMN1 due to predominant exon 7 skipping, leading to the production of a truncated protein. Antisense oligonucleotide and small molecule-based strategies aimed at the restoration of SMN2 exon 7 inclusion are approved therapies of SMA. Many cis-elements and transacting factors have been implicated in regulation of SMN exon 7 splicing. Also, several structural elements, including those formed by a long-distance interaction, have been implicated in the modulation of SMN exon 7 splicing. Several of these structures have been confirmed by enzymatic and chemical structure-probing methods. Additional structures formed by inter-intronic interactions have been predicted by computational algorithms. SMN genes generate a vast repertoire of circular RNAs through inter-intronic secondary structures formed by inverted Alu repeats present in large number in SMN genes. Here, we review the structural context of the exonic and intronic cis-elements that promote or prevent exon 7 recognition. We discuss how structural rearrangements triggered by single nucleotide substitutions could bring drastic changes in SMN2 exon 7 splicing. We also propose potential mechanisms by which inter-intronic structures might impact the splicing outcomes.


Expansion of the RNAStructuromeDB to include secondary structural data spanning the human protein-coding transcriptome

May 2022

·

17 Reads

RNA plays vital functional roles in almost every component of biology, and these functional roles are often influenced by its folding into secondary and tertiary structures. An important role of RNA secondary structure is in maintaining proper gene regulation; therefore, making accurate predictions of the structures involved in these processes is important. In this study, we have expanded on our previous work that led to the creation of the RNAStructuromeDB. Unlike this previous study that analyzed the human genome at low resolution, we have now scanned the protein-coding human transcriptome at high (single nt) resolution. This provides more robust structure predictions for over 100,000 isoforms of known protein-coding genes. Notably, we also utilize the motif identification tool, ScanFold, to model structures with high propensity for ordered/evolved stability. All data have been uploaded to the RNAStructuromeDB, allowing for easy searching of transcripts, visualization of data tracks (via the Integrative Genomics Viewer or IGV), and download of ScanFold data—including unique highly-ordered motifs. Herein, we provide an example analysis of MAT2A to demonstrate the utility of ScanFold at finding known and novel secondary structures, highlighting regions of potential functionality, and guiding generation of functional hypotheses through use of the data.


Citations (12)


... Intron sequences were analyzed using ScanFold 2.0 [16] with default settings (no probing data, 120 nucleotide windows, 1 nucleotide step-size, global refold on, extract -2 z-score structures). Data were downloaded and z-average -1 dot-bracket structures were visually compared using VARNA [26]. ...

Reference:

Identification of MYC intron 2 regions that modulate expression
ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes
PeerJ


... Since the tertiary structure of RNA is highly limited by the bi-dimensional structure [67], we analyzed the secondary structure of S03 generated by RNAFold Delta ( Figure 3B). An RNA molecule contains rigid and ecstatic structures and free highly dynamic ones [68], which can be organized into regions. ...

Thermodynamic and structural characterization of an EBV infected B-cell lymphoma transcriptome

NAR Genomics and Bioinformatics

... Existing probing data can be used as a constraint to inform the final model but are not required; purely in silico ScanFold results (especially -2 z-score motifs) correlate well with SHAPE and DMS reactivities [ 12 , 19 ]. A 120 nucleotide window size is sufficient for most structured RNAs (e.g., most known human cis-regulatory RNA structures are < ∼150 nucleotides [21] ), but the size can be reduced or increased depending on user needs. Window size can be increased to 150 nucleotides with little decline in accuracy, although it is not recommended to exceed 200 nucleotides [ 1 , 12 ]. ...

Expansion of the RNAStructuromeDB to include secondary structural data spanning the human protein-coding transcriptome

Scientific Reports

... SMN protein plays a crucial role in the assembly of snRNPs, which are essential for mRNA splicing (37). Impaired snRNP assembly due to SMN deficiency leads to widespread splicing defects in various genes, further exacerbating motor neuron dysfunction (38). While SMA is primarily a disorder of motor neurons, the resulting muscle atrophy and weakness are critical clinical features. ...

Structural Context of a Critical Exon of Spinal Muscular Atrophy Gene

Frontiers in Molecular Biosciences

... We also asked whether the binding of HNRNPC to intron 2 of MYC could alter splicing of MYC pre-mRNA. To this end, a dual luciferase plasmid was generated containing MYC intron 2 inserted into the Firefly coding region (Fig 3C) at a location previously demonstrated not to adversely affect the resultant protein [22]. This approach also allowed introduction of mutations to each of the binding regions. ...

Analyses of human cancer driver genes uncovers evolutionarily conserved RNA structural elements involved in posttranscriptional control
PLOS ONE


... Subsequent bioinformatics analyses showed that the regions creating the highest amount of structure within the SARS-CoV-2 genome are in the 5′ end and the regions corresponding to glycoproteins S and M [21]. Recently, RNA structure mapping of the complete SARS-CoV-2 genome and subgenomic RNA in vitro, in vivo, and in cellulo were published [22][23][24][25][26][27][28]. Moreover, the 3D folding of selected domains and motifs of genomic RNA was also proposed [29]. ...

Secondary Structure of Subgenomic RNA M of SARS-CoV-2

... Nucleotide positions 87 -130 of segment 5 are well conserved in IAV strains and predicted to form a pseudoknot structure in vRNA [10,47]. Interestingly, this region was not identified as a structural region in the recent study, where segments were scanned for local secondary structure and sequence covariance analysis [51]. Multiple structure modes in this region might trigger the weaker signals, but beyond the scope of this paper. ...

In silico analysis of local RNA secondary structure in influenza virus A, B and C finds evidence of widespread ordered stability but little evidence of significant covariation

Scientific Reports

... Similarly, a number of computational biology studies have been undertaken for this problem. The Moss laboratory published their findings in early 2021, describing eight highly likely structures predicted with their ScanFold-based computational pipeline [35,36]. The Pyle laboratory, at nearly the same time, described their findings using the SuperFold RNA secondary structure prediction utility, identifying 61% of the genome as being base paired [37]. ...

A map of the SARS-CoV-2 RNA structurome

NAR Genomics and Bioinformatics

... One attractive way to inhibit the viral frameshifting and interfere the viral replication is to develop the structure-specific binder to target the FES element. There have been drug-like small molecules identified to selectively bind to the AH structure and impair the frameshifting of virus 12,15 . These studies demonstrated the promise that the RNA genome of SARS-CoV-2 could be an ideal drug targets to disrupt the viral cellular functions. ...

Targeting the SARS-COV-2 RNA genome with small molecule binders and ribonuclease targeting chimera (RiboTAC) degraders

ACS Central Science

... Based on these findings, ordered thermodynamic stability can be measured to determine if a given RNA structure has undergone selection [1] . This is the basis for ScanFold, previously used to uncover unusually stable local motifs in human immunodeficiency virus (HIV-1) [1] , Zika virus (ZIKV) [1] , Epstein-Barr virus (EBV) [11] , severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [12] , and influenza virus [13] . ...

A survey of RNA secondary structural propensity encoded within human herpesvirus genomes: global comparisons and local motifs
PeerJ
