Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals

February 2004
Journal of Computational Biology: a Journal of Computational Molecular Cell Biology 11(2-3):377-94

DOI:10.1089/1066527041410418

Source
PubMed

Authors:

Gene W Yeo

University of California, San Diego

We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many maximum entropy models (MEMs) are specified by simply changing the set of constraints. Such models can be used to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models outperform previous probabilistic models in the discrimination of human 5' (donor) and 3' (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.
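The idea in the abstract, that a different maximum entropy model is obtained simply by changing the set of marginal constraints, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names (`marginal`, `empirical_constraints`, `fit_med`, `entropy_bits`) and the toy 3-mer data are assumptions made for illustration, and the fitting loop is plain iterative proportional fitting, one standard form of iterative scaling, run over all 4^3 possible sequences.

```python
from itertools import product
from math import log2

ALPHABET = "ACGT"

def marginal(p, pos):
    """Marginal of a distribution p (dict: sequence -> probability) over positions pos."""
    out = {}
    for seq, prob in p.items():
        key = tuple(seq[i] for i in pos)
        out[key] = out.get(key, 0.0) + prob
    return out

def empirical_constraints(seqs, position_sets):
    """Estimate one marginal constraint per tuple of positions from aligned sequences."""
    weight = 1.0 / len(seqs)
    data = {}
    for s in seqs:
        data[s] = data.get(s, 0.0) + weight
    return {pos: marginal(data, pos) for pos in position_sets}

def fit_med(length, constraints, sweeps=50):
    """Iterative proportional fitting, starting from the uniform distribution."""
    p = {"".join(t): 4.0 ** -length for t in product(ALPHABET, repeat=length)}
    for _ in range(sweeps):
        for pos, target in constraints.items():
            current = marginal(p, pos)
            for seq in p:
                key = tuple(seq[i] for i in pos)
                if current[key] > 0.0:
                    # Rescale so the model marginal on pos matches the target.
                    p[seq] *= target.get(key, 0.0) / current[key]
    return p

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability sequences."""
    return -sum(q * log2(q) for q in p.values() if q > 0.0)

# Toy aligned 3-mers (hypothetical, not data from the paper).
signals = ["GTA", "GTA", "GTG", "GTA", "CTA", "GAG", "GTG", "GTA"]

# Model 1: single-position constraints only (positions treated independently).
single_constraints = empirical_constraints(signals, [(0,), (1,), (2,)])
first_order = fit_med(3, single_constraints)

# Model 2: constraints on adjacent pairs of positions.
pair_constraints = empirical_constraints(signals, [(0, 1), (1, 2)])
adjacent_pairs = fit_med(3, pair_constraints)

print("entropy, single-position constraints:", entropy_bits(first_order))
print("entropy, adjacent-pair constraints:  ", entropy_bits(adjacent_pairs))
```

Adding a nonadjacent constraint such as `(0, 2)` to the list of position tuples would define a further model in the same way. Because the pair constraints imply the single-position ones, the second model can never have higher entropy than the first. With sparse toy data like this, many sequences receive probability zero, so a practical scorer would need smoothing or far more data before taking log-odds against a decoy model.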

Gene Yeo

Massachusetts Institute of Technology

Cambridge, Massachusetts 02139

geneyeo@mit.edu

Christopher Burge

Massachusetts Institute of Technology

Cambridge, Massachusetts 02139

cburge@mit.edu

ABSTRACT

Keywords

1. INTRODUCTION

RECOMB ’02 Berlin, Germany

2. METHODS

2.1 Maximum Entropy Method

2.2 Marginal Constraints

2.2.1 “Complete” Constraints

2.2.2 “Specific” Constraints

2.3 Maximum Entropy Models

2.4 Iterative Scaling to Calculate MED

2.5 Ranking Position Dependencies

3. SPLICE SITE RECOGNITION

3.1 Construction of Transcript Data

4. RESULTS AND DISCUSSION

4.1 Models of the 5’ splice site

[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me1s0, me2s0, me2s1, me2x5, mdd.]

4.1.1 Ranked Constraints

[Figure: Information Plot (me2s0 model). x-axis: Increasing Constraints; y-axis: Information, I = 18−H. Curves: ranked, random.]

[Figure: Maximum Correlation Coefficient (me2s0 model). x-axis: Increasing Constraints; y-axis: Max Correlation Coefficient. Curves: ranked, random.]

4.2 Models of the 3’ splice site

4.3 Clustering of Splice Sites

[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me2s0 modified, 1mm, me2s0, wmm/0mm, me1s0.]

[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me2x2, me2x3, me2x4, me2x5, me2x1, me3s0/2mm, me2s0/1mm, me4s0/3mm, me1s0/wmm/0mm.]

5. APPLICATIONS OF SPLICE SITE MODELS

5.1 Proximal 5’ss decoys in introns

[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me2x5, me2x5 (combined), me2s0, 1mm (combined), wmm, wmm (combined).]

[Figure: Number of introns in the categories No hsd, Fhsd > 250, and Fhsd < 250, for the models me2x5, MDD, and WMM.]

5.2 Ranking and Competing 5’ss

[Figure: six panels, each pairing two models: me2s0/1mm and WMM; MDD and WMM; me2x5 and WMM; MDD and me2s0/1mm; me2x5 and me2s0/1mm; me2x5 and MDD.]

6. CONCLUSIONS

7. FUTURE WORK

8. ACKNOWLEDGEMENTS

9. REFERENCES

APPENDIX

A. INHOMOGENEOUS MARKOV MODELS

B. PERFORMANCE MEASURES

C. ROC ANALYSIS

D. TABLES
