
Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals

Abstract

We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many maximum entropy models (MEMs) are specified by simply changing the set of constraints. Such models can be utilized to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models out-perform previous probabilistic models in the discrimination of human 5' (donor) and 3' (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.
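As a rough illustration of the idea in the abstract, the sketch below fits a maximum entropy distribution over all sequences of a fixed short length, subject to marginal constraints estimated from training sequences, using iterative proportional fitting (a form of iterative scaling). The function names, the toy data, and the fixed iteration count are assumptions made for illustration; this is not the authors' implementation.

```python
import itertools

ALPHABET = "ACGT"

def empirical_marginal(seqs, positions):
    """Empirical joint frequency of the nucleotides at the given positions."""
    counts = {}
    for s in seqs:
        key = tuple(s[i] for i in positions)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / len(seqs) for k, v in counts.items()}

def max_entropy_distribution(seqs, constraints, n_iter=50):
    """Iterative proportional fitting, started from the uniform distribution.

    `constraints` is a list of position tuples, e.g. [(0,), (1,)] for
    first-order marginals or [(0, 1), (1, 2)] for adjacent pairs.
    Hypothetical helper: enumerates all 4**length sequences, so it is only
    practical for short motifs.
    """
    length = len(seqs[0])
    space = ["".join(t) for t in itertools.product(ALPHABET, repeat=length)]
    p = [1.0 / len(space)] * len(space)

    # For each constraint, precompute the marginal key of every sequence
    # and the target (empirical) marginal it must reproduce.
    fitted = []
    for positions in constraints:
        keys = [tuple(s[i] for i in positions) for s in space]
        fitted.append((keys, empirical_marginal(seqs, positions)))

    for _ in range(n_iter):
        for keys, target in fitted:
            # Current model marginal for this constraint.
            current = {}
            for k, pk in zip(keys, p):
                current[k] = current.get(k, 0.0) + pk
            # Rescale so the model marginal matches the target marginal.
            p = [pk * target.get(k, 0.0) / current[k] if current[k] > 0 else 0.0
                 for k, pk in zip(keys, p)]
    return dict(zip(space, p))
```

With only single-position constraints the result is the product of the per-position frequencies, which corresponds to a weight matrix model; adding pairwise constraints changes the model without changing the fitting procedure.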
Gene Yeo
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
geneyeo@mit.edu
Christopher Burge
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
cburge@mit.edu
ABSTRACT
Keywords
1. INTRODUCTION
RECOMB ’02 Berlin, Germany
Copyright 2002 ACM X-XXXXX-XX-X/XX/XX ...$5.00.
2. METHODS
2.1 Maximum Entropy Method
2.2 Marginal Constraints
2.2.1 “Complete” Constraints
2.2.2 “Specific” Constraints
2.3 Maximum Entropy Models
2.4 Iterative Scaling to Calculate MED
2.5 Ranking Position dependencies
3. SPLICE SITE RECOGNITION
3.1 Construction of Transcript Data
4. RESULTS AND DISCUSSION
4.1 Models of the 5’ splice site
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity); curves: me1s0, me2s0, me2s1, me2x5, mdd]
4.1.1 Ranked Constraints
[Figure: Information Plot (me2s0 model). x-axis: Increasing Constraints; y-axis: Information, I = 18−H; curves: ranked, random]
[Figure: Maximum Correlation Coefficient (me2s0 model). x-axis: Increasing Constraints; y-axis: Max Correlation Coefficient; curves: ranked, random]
4.2 Models of the 3’ splice site
4.3 Clustering of Splice Sites
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity); curves: me2s0 modified, 1mm, me2s0, wmm/0mm, me1s0]
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity); curves: me2x2, me2x3, me2x4, me2x5, me2x1, me3s0/2mm, me2s0/1mm, me4s0/3mm, me1s0/wmm/0mm]
5. APPLICATIONS OF SPLICE SITE MODELS
5.1 Proximal 5’ss decoys in introns
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity); curves: me2x5, me2x5 (combined), me2s0, 1mm (combined), wmm, wmm (combined)]
[Figure: Number of introns (axis range 0 to 8000) in the categories "No hsd", "Fhsd > 250", and "Fhsd < 250"; series: me2x5, MDD, WMM]
5.2 Ranking and Competing 5’ss
[Figure: six panels with paired axis labels: me2s0/1mm and WMM; MDD and WMM; me2x5 and WMM; MDD and me2s0/1mm; me2x5 and me2s0/1mm; me2x5 and MDD]
6. CONCLUSIONS
7. FUTURE WORK
8. ACKNOWLEDGEMENTS
9. REFERENCES
APPENDIX
A. INHOMOGENEOUS MARKOV MODELS
B. PERFORMANCE MEASURES
C. ROC ANALYSIS
D. TABLES
... For AS, cis-acting factors (sequences or structures within the pre-mRNA) can regulate splicing diversity. The logic of cis regulation has been analyzed to identify features that give rise to AS [13,14,15,16,17]. But for differential AS, features within the nascent transcript are not sufficient, since the transcript is the same in all cell types. ...
... Previous work has found a role of sequence features in distinguishing AS events from constitutive splicing [13,14,15,16,17]. Do sequence features also affect differential AS? ...
Preprint
Alternative splicing is a key mechanism that shapes neuronal transcriptomes, helping to define neuronal identity and modulate function. Here, we present an atlas of alternative splicing across the nervous system of Caenorhabditis elegans. Our analysis identifies novel alternative splicing in key neuronal genes such as unc-40/DCC and sax-3/ROBO. Globally, we delineate patterns of differential alternative splicing in almost 2,000 genes, and estimate that a quarter of neuronal genes undergo differential splicing. We introduce a web interface for examination of splicing patterns across neuron types. We explore the relationship between neuron type and splicing patterns, and between splicing patterns and differential gene expression. We identify RNA features that correlate with differential alternative splicing, and describe the enrichment of microexons. Finally, we compute a splicing regulatory network that can be used to generate hypotheses on the regulation and targets of alternative splicing in neurons.
... (D) Nab2 loss causes retention of introns with weaker splice sites. Splice site scores were obtained using a Maximum Entropy (MaxEnt) model, MaxEntScan [99]. Lower scores signify "weaker" splice sites or a lower probability that a given sequence functions as a splice site. ...
... We next reasoned that if Nab2 loss altered splicing efficacy, splicing of non-consensus but well-annotated splice sites would be more often disrupted. We tested this hypothesis by comparing the relative strength of all 5'- and 3'-splice sites from retained introns to the strength of each splice site from all introns throughout the entire genome using MaxEntScan [99]. ... Likewise, comparison of the 12-13 nucleotides surrounding the 3' splice site (the -1 position is the last nucleotide of the intron before the annotated splice site) demonstrated that a polypyrimidine (poly(T)) tract is commonly found just upstream of the 3' splice site in the majority of Drosophila introns (Fig 5E). ...
Preprint
The regulation of cell-specific gene expression patterns during development requires the coordinated actions of hundreds of proteins, including transcription factors, processing enzymes, and many RNA binding proteins (RBPs). RBPs often become associated with a nascent transcript immediately after its production and are uniquely positioned to coordinate concurrent processing and quality control steps. Since RNA binding proteins can regulate multiple post-transcriptional processing steps for many mRNA transcripts, mutations within RBP-encoding genes often lead to pleiotropic effects that alter the physiology of multiple cell types. Thus, identifying the mRNA processing steps where an RBP functions and the effects of RBP loss on gene expression patterns can provide a better understanding of both tissue physiology and mechanisms of disease. In the current study, we have investigated the coordination of mRNA splicing and polyadenylation facilitated by the Drosophila RNA binding protein Nab2, an evolutionarily conserved ortholog of human ZC3H14. ZC3H14 loss in human patients has previously been linked to alterations in nervous system function and disease. Both fly Nab2 and vertebrate ZC3H14 bind to polyadenosine RNA and have been implicated in the control of poly(A) tail length. Interestingly, we show that fly Nab2 functionally interacts with components of the spliceosome, suggesting that this family of RNA binding proteins may also regulate alternative splicing of mRNA transcripts. Using RNA-sequencing approaches, we show that Nab2 loss causes widespread changes in alternative splicing and intron retention. These changes in splicing cause alterations in the abundance of protein isoforms encoded by the affected transcripts and may contribute to phenotypes, such as decreases in viability and alterations in brain morphology, observed in Nab2 null flies.
Overall, these studies highlight the importance of RNA binding proteins in the coordination of post-transcriptional gene expression regulation and potentially identify a class of proteins that can coordinate multiple processing events for specific mRNA transcripts.
Author Summary: Although most cells in a multicellular organism contain the same genetic material, each cell type produces a set of RNA molecules and proteins that allows it to perform specific functions. Protein production requires that a copy of the genetic information encoded in a cell’s DNA first be copied into RNA. Then the RNA is often processed to remove extra sequences and the finalized RNA can be used to create a particular type of protein. Our work is focused on how cells within the developing fruit fly brain control the types, processing steps, and final sequences of the RNA molecules produced. We present data showing that when fly brain tissue lacks a protein called Nab2, some RNA molecules are not produced correctly. Nab2 loss causes extra sequences to be retained within many RNA molecules when those sequences are normally removed. These extra sequences can alter protein production from the affected RNAs and appear to contribute to the brain development problems observed in flies lacking Nab2. Since Nab2 performs very similar functions to a human protein called ZC3H14, these findings could provide a better understanding of how ZC3H14 loss leads to human disease.
... U12-type and U2-type introns are two distinct classes of introns found in eukaryotic genomes [6]. They differ in terms of their spliceosomal machinery and splicing mechanisms [7]. U12-type introns are a less common class of introns, constituting a small fraction of introns in most eukaryotic genomes. ...
... Deep learning models, for instance making use of convolutional neural networks (CNNs) [11][12][13][14][15], have shown remarkable proficiency in extracting complex features from DNA and RNA sequences, enabling them to discern subtle patterns crucial for splice site identification. However, the presence of U2-type introns, the predominant class of introns in eukaryotic genomes, poses unique challenges to splice site prediction [16] due to factors such as variability in their sequences [17], the presence of multiple potential splice sites or the complex regulatory elements associated with them [18]. Incorporating these distinctive characteristics into deep learning models ... In this study, we use DNA sequences with donor splice sites and acceptor splice sites from Arabidopsis thaliana. ...
Preprint
In this study, we investigate the impact of introns on the effectiveness of splice site prediction using deep learning models, focusing on Arabidopsis thaliana. We specifically utilize U2-type introns due to their ubiquity in plant genomes and the rich datasets available. We formulate two hypotheses: first, that short introns would lead to a higher effectiveness of splice site prediction than long introns due to reduced spatial complexity; and second, that sequences containing multiple introns would improve prediction effectiveness by providing a richer context for splicing events. Our findings indicate that (1) models trained on datasets with shorter introns consistently outperform those trained on datasets with longer introns, highlighting the importance of intron length in splice site prediction, and (2) models trained with datasets containing multiple introns per sequence demonstrate superior effectiveness over those trained with datasets containing a single intron per sequence. Furthermore, our findings not only align with the two hypotheses we put forward but also confirm existing observations from wet lab experiments regarding the impact of length of an intron and the number of introns present in a sequence on splice site prediction effectiveness, suggesting that our computational insights come with biological relevance.
... was used for splice variants prediction in VUS, including cis-elements ESS, ISS, ESE, and ISE elements. It uses Splice Site Prediction by Neural Network (NNSplice; donor, acceptor; 0-1) [30], Splice Site Finder (SSF; donor, acceptor, branchpoint; 0-100), MaxEntScan (MES; scores from the Maximum Entropy Model; 0-12) [31], ESE Finder [32] and Relative Enhancer and Silencer Classification by Unanimous Enrichment (RESCUE-ESE) [33]. Splicing predictions at nearest natural junction: The variant annotation window with automatically computed splicing predictions at the nearest junction for MaxEntScan and SSF predictors. ...
Article
Phenylketonuria (PKU) is a genetic disorder caused by variations in the phenylalanine hydroxylase (PAH) gene. Among the 3369 reported PAH variants, 33.7% are missense alterations. Unfortunately, 30% of these missense variants are classified as variants of unknown significance (VUS), posing challenges for genetic risk assessment. In our study, we focused on analyzing 836 missense PAH variants following the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines specified by ClinGen PAH Variant Curation Expert Panel (VCEP) criteria. We utilized and compared variant annotator tools like Franklin and Varsome, conducted 3D structural analysis of PAH, and examined active and regulatory site hotspots. In addition, we assessed potential splicing effect of apparent missense variants. By evaluating phenotype data from 22962 PKU patients, our aim was to reassess the pathogenicity of missense variants. Our comprehensive approach successfully reclassified 309 VUSs out of 836 missense variants as likely pathogenic or pathogenic (37%), upgraded 370 likely pathogenic variants to pathogenic, and reclassified one previously considered likely benign variant as likely pathogenic. Phenotypic information was available for 636 missense variants, with 441 undergoing 3D structural analysis and active site hotspot identification for 180 variants. After our analysis, only 6% of missense variants were classified as VUSs, and three of them (c.23A>C/p.Asn8Thr, c.59_60delinsCC/p.Gln20Pro, and c.278A>T/p.Asn93Ile) may be influenced by abnormal splicing. Moreover, a pathogenic variant (c.168G>T/p.Glu56Asp) was identified to have a risk exceeding 98% for modifications of the consensus splice site, with high scores indicating a donor loss of 0.94.
The integration of ACMG/AMP guidelines with in silico structural analysis and phenotypic data significantly reduced the number of missense VUSs, providing a strong basis for genetic counseling and emphasizing the importance of metabolic phenotype information in variant curation. This study also sheds light on the current landscape of PAH variants.
... Reference alternative exons, which are not regulated by SmD2, were determined by excluding exons affected by SmD2 based on data from HEXEvent. Splice site strengths for both types of exons were determined using maximum entropy models 68 . Additionally, the interaction binding energy between U1 snRNA and 5'ss was computed using the RNA-fold algorithm 69 , which is part of the ViennaRNA package v2.5.0. ...
Article
Despite the importance of spliceosome core components in cellular processes, their roles in cancer development, including hepatocellular carcinoma (HCC), remain poorly understood. In this study, we uncover a critical role for SmD2, a core component of the spliceosome machinery, in modulating DNA damage in HCC through its impact on BRCA1/FANC cassette exons and expression. Our findings reveal that SmD2 depletion sensitizes HCC cells to PARP inhibitors, expanding the potential therapeutic targets. We also demonstrate that SmD2 acetylation by p300 leads to its degradation, while HDAC2-mediated deacetylation stabilizes SmD2. Importantly, we show that the combination of Romidepsin and Olaparib exhibits significant therapeutic potential in multiple HCC models, highlighting the promise of targeting SmD2 acetylation and HDAC2 inhibition alongside PARP inhibitors for HCC treatment.
... Therefore, we included this variant in our study. In silico analysis for these variants was performed using MaxEntScan (MES) [13] (http://hollywood.mit. ...
Article
Background: Wilms tumor 1 (WT1; NM_024426) causes Denys–Drash syndrome, Frasier syndrome, or isolated focal segmental glomerulosclerosis. Several WT1 intron variants are pathogenic; however, the pathogenicity of some variants remains undefined. Whether a candidate variant detected in a patient is pathogenic is very important for determining the therapeutic options for the patient. Methods: In this study, we evaluated the pathogenicity of WT1 gene intron variants with undetermined pathogenicity by comparing their splicing patterns with those of the wild-type using an in vitro splicing assay using minigenes. The three variants registered as likely disease-causing, Mut1 (c.1017-9T>C (IVS5)), Mut2 (c.1355-28C>T (IVS8)), and Mut3 (c.1447+1G>C (IVS9)), were included as subjects along with the 34 splicing variants registered in the Human Gene Mutation Database (HGMD®). Results: The results showed no significant differences in splicing patterns between Mut1 or Mut2 and the wild-type; however, significant differences were observed in Mut3. Conclusion: We concluded that Mut1 and Mut2 do not possess pathogenicity although they were registered as likely pathogenic, whereas Mut3 exhibits pathogenicity. Our results suggest that the pathogenicity of intronic variants detected in patients should be carefully evaluated.
... Deleterious effects of missense variants were predicted by PolyPhen2, SIFT, and AlignGVGD. Loss of a canonical splice site was predicted by MaxEntScan (MES) (Yeo & Burge, 2004) and Splice Site Finder-like (SSF-like) (Shapiro & Senapathy, 1987) (≥ 15% decrease in splice site score compared to reference). Gain of a splice site was predicted by the splicing module of Alamut (Interactive Biosoftware, Rouen, France) where two or more algorithms meet significance thresholds for the variant sequence but not the reference sequence: MES (score > 0), SSF-like (score > 70), and Splice Site Prediction by Neural Network (NNSPLICE) (Reese et al., 1997) (score > 0.4). ...
Article
A minority of patients with autism spectrum disorder (ASD) are offered genetic testing by their providers or referred for genetics evaluation despite published guidelines and consensus statements supporting genetics-informed care for this population. This study aimed to investigate the ordering habits of providers of different specialties and to additionally assess the diagnostic utility of genetic testing by test type, patient sex, and race and ethnicity. We retrospectively analyzed data associated with orders for the indication of ASD from a large clinical laboratory over 6 years (2017–2022). Geneticists and neurologists were more likely than other specialists to order exome sequencing and neurodevelopmental (NDD) panel testing while other providers were more likely to order chromosomal microarray (CMA) and Fragile X testing. Exome had the highest diagnostic yield (24.5%), followed by NDD panel (6.4%), CMA (6.2%), and Fragile X testing (0.4%). Females were 1.4x (95% CI: 1.2–1.7) more likely than males to receive a genetic diagnosis. However, for Fragile X, males had a higher diagnostic yield than females (0.4% vs 0.2%). Our findings highlight the need to enable non-genetics providers to order comprehensive genetic testing or promote referral to genetics following negative CMA and/or Fragile X testing. Our data supports that ASD testing should include exome, CMA, and other clinically indicated tests, as first-tier tests, with the consideration of panel testing, in cases where exome sequencing is not an option. Lastly, our study helps to inform expectations for genetic testing yield by test type and patient presentation.
Article
Trans-splicing is a post-transcriptional processing event that joins exons from separate RNAs to produce a chimeric RNA. However, the detailed mechanism of trans-splicing remains poorly understood. Here, we characterize trans-spliced genes and provide insights into the mechanism of trans-splicing in the tunicate Ciona. Tunicates are the closest invertebrates to humans, and their genes frequently undergo trans-splicing. Our analysis revealed that, in genes that give rise to both trans-spliced and non-trans-spliced messenger RNAs, trans-splice acceptor sites were preferentially located at the first functional acceptor site, and their paired donor sites were weak in both Ciona and humans. Additionally, we found that Ciona trans-spliced genes had GU- and AU-rich 5′ transcribed regions. Our data and findings are not only useful for the Ciona research community, but may also aid in a better understanding of the trans-splicing mechanism, potentially advancing the development of gene therapy based on trans-splicing.
Preprint
Mammals tightly regulate their core body temperature, yet how cells sense and respond to small temperature changes at the molecular level remains incompletely understood. Here, we discover a significant enrichment of RNA G-quadruplex (rG4) motifs around splice sites of cold-repressed exons. These thermosensing RNA structures, when stabilized, mask splice sites, reducing exon inclusion. Focusing on cold-induced neuroprotective RBM3, we demonstrate that rG4s near splice sites of a cold-repressed poison exon are stabilized at low temperatures, leading to exon exclusion. This enables evasion of nonsense-mediated decay, increasing RBM3 expression in the cold. Additionally, increasing intracellular potassium concentration stabilizes rG4s and enhances RBM3 expression, leading to RBM3-dependent neuroprotection in a mouse model of subarachnoid hemorrhage. Our findings unveil a mechanism by which mammalian RNAs directly sense temperature and potassium perturbations, integrating them into gene expression programs. This opens new avenues for treating diseases arising from splicing defects and disorders benefiting from therapeutic hypothermia.
Article
Coding region and intronic mutations in the tau gene cause frontotemporal dementia and parkinsonism linked to chromosome 17. Intronic mutations and some missense mutations increase splicing in of exon 10, leading to an increased ratio of four-repeat to three-repeat tau isoforms. Secondary structure predictions have led to the proposal that intronic mutations and one missense mutation destabilize a putative RNA stem-loop structure located close to the splice-donor site of the intron after exon 10. We have determined the three-dimensional structure of this tau exon 10 splicing regulatory element RNA by NMR spectroscopy. We show that it forms a stable, folded stem-loop structure whose thermodynamic stability is reduced by frontotemporal dementia and parkinsonism linked to chromosome 17 mutations and increased by compensatory mutations. By exon trapping, the reduction in thermodynamic stability is correlated with increased splicing in of exon 10.
Article
A systematic analysis of the RNA splice junction sequences of eukaryotic protein coding genes was carried out using the GENBANK databank. Nucleotide frequencies obtained for the highly conserved regions around the splice sites for different categories of organisms closely agree with each other. A striking similarity among the rare splice junctions which do not contain AG at the 3′ splice site or GT at the 5′ splice site indicates the existence of special mechanisms to recognize them, and that these unique signals may be involved in crucial gene-regulation events and in differentiation. A method was developed to predict potential exons in a bare sequence, using a scoring and ranking scheme based on nucleotide weight tables. This method was used to find a majority of the exons in selected known genes, and also predicted potential new exons which may be used in alternative splicing situations.
Article
This chapter discusses the compositional properties of pre-mRNA splicing signals and explains the construction and application of simple probabilistic models of biological signal sequences. Although all of the examples relate to splicing signals, many of the techniques can equally well be applied to other types of nucleic acid signals, such as those involved in transcription, translation, or other biochemical processes. The chapter emphasizes the use of simple statistical tests of dependence between sequence positions. In some cases, observed dependencies can give clues to important functional constraints on a signal, and incorporation of such dependencies into probabilistic models of sequence signals can lead to significant improvements in the accuracy of signal prediction/classification. The chapter discusses some caveats concerning the dangers of: (1) constructing models with too many parameters and (2) overinterpreting observed correlations in sequence data. It also describes pre-mRNA splicing, the probabilistic approach to signal classification, and several standard types of discrete probabilistic models.
Book
Probabilistic models are becoming increasingly important in analyzing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project. For example, hidden Markov models are used for analyzing biological sequences, linguistic-grammar-based probabilistic models for identifying RNA secondary structure, and probabilistic evolutionary models for inferring phylogenies of sequences from different organisms. This book gives a unified, up-to-date and self-contained account, with a Bayesian slant, of such methods, and more generally to probabilistic methods of sequence analysis. Written by an interdisciplinary team of authors, it is accessible to molecular biologists, computer scientists, and mathematicians with no formal knowledge of the other fields, and at the same time presents the state of the art in this new and important field.
Article
An iterative method is presented which gives an optimum approximation to the joint probability distribution of a set of binary variables given the joint probability distributions of any subsets of the variables (any set of component distributions). The most significant feature of this approximation procedure is that there is no limitation to the number or type of component distributions that can be employed. Each step of the iteration gives an improved approximation, and the procedure converges to give an approximation that is the minimum information (i.e. maximum entropy) extension of the component distributions employed.
Article
The measurement and/or storage of high order probability distributions implies exponential increases in equipment complexity. This paper considers the possibility of storing several of the lower order component distributions and using this partial information to form an approximation to the actual high order distribution. The approximation method is based on an information measure for the “closeness” of two distributions and on the criterion of maximum entropy. Approximations consisting of products of appropriate lower order distributions are proved to be optimum under suitably restricted conditions. Two such product approximations can be compared and the better one selected without any knowledge of the actual high order distribution other than that implied by the lower order distributions.
Article
Treatment of the predictive aspect of statistical mechanics as a form of statistical inference is extended to the density-matrix formalism and applied to a discussion of the relation between irreversibility and information loss. A principle of "statistical complementarity" is pointed out, according to which the empirically verifiable probabilities of statistical mechanics necessarily correspond to incomplete predictions. A preliminary discussion is given of the second law of thermodynamics and of a certain class of irreversible processes, in an approximation equivalent to that of the semiclassical theory of radiation. It is shown that a density matrix does not in general contain all the information about a system that is relevant for predicting its behavior. In the case of a system perturbed by random fluctuating fields, the density matrix cannot satisfy any differential equation because $\dot{\rho}(t)$ does not depend only on $\rho(t)$, but also on past conditions. The rigorous theory involves stochastic equations of the type $\rho(t) = G(t, 0)\,\rho(0)$, where the operator $G$ is a functional of conditions during the entire interval $(0 \to t)$. Therefore a general theory of irreversible processes cannot be based on differential rate equations corresponding to time-proportional transition probabilities. However, such equations often represent useful approximations.
Article
Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum-entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information. If one considers statistical mechanics as a form of statistical inference rather than as a physical theory, it is found that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle. In the resulting "subjective statistical mechanics," the usual rules are thus justified independently of any physical argument, and in particular independently of experimental verification; whether or not the results agree with experiment, they still represent the best estimates that could have been made on the basis of the information available.
Article
Diagnostic systems of several kinds are used to distinguish between two classes of events, essentially "signals" and "noise". For them, analysis in terms of the "relative operating characteristic" of signal detection theory provides a precise and valid measure of diagnostic accuracy. It is the only measure available that is uninfluenced by decision biases and prior probabilities, and it places the performances of diverse systems on a common, easily interpreted scale. Representative values of this measure are reported here for systems in medical imaging, materials testing, weather forecasting, information retrieval, polygraph lie detection, and aptitude testing. Though the measure itself is sound, the values obtained from tests of diagnostic systems often require qualification because the test data on which they are based are of unsure quality. A common set of problems in testing is faced in all fields. How well these problems are handled, or can be handled in a given field, determines the degree of confidence that can be placed in a measured value of accuracy. Some fields fare much better than others.
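To make the "relative operating characteristic" analysis concrete, the following minimal sketch computes ROC points and the area under the curve from two lists of scores, one for true signals and one for decoys. The function names and the trapezoidal area rule are illustrative assumptions, not taken from any of the works listed here.

```python
def roc_points(signal_scores, decoy_scores):
    """Return (false positive rate, true positive rate) pairs.

    Each distinct score is used as a threshold; a sequence is called
    positive when its score is at or above the threshold.
    """
    thresholds = sorted(set(signal_scores) | set(decoy_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in signal_scores) / len(signal_scores)
        fpr = sum(d >= t for d in decoy_scores) / len(decoy_scores)
        points.append((fpr, tpr))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (fpr, tpr) points sorted by fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An area of 1.0 means every signal outscores every decoy, while 0.5 corresponds to scores that carry no information about the class.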