Maximum Entropy Modeling of Short Sequence Motifs with
Applications to RNA Splicing Signals
Gene Yeo
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
geneyeo@mit.edu
Christopher Burge
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
cburge@mit.edu
ABSTRACT
Keywords
1. INTRODUCTION
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
RECOMB ’02 Berlin, Germany
Copyright 2002 ACM X-XXXXX-XX-X/XX/XX ...$5.00.
2. METHODS
2.1 Maximum Entropy Method
2.2 Marginal Constraints
2.2.1 “Complete” Constraints
2.2.2 “Specific” Constraints
2.3 Maximum Entropy Models
2.4 Iterative Scaling to Calculate MED
2.5 Ranking Position Dependencies
3. SPLICE SITE RECOGNITION
3.1 Construction of Transcript Data
4. RESULTS AND DISCUSSION
4.1 Models of the 5’ splice site
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me1s0, me2s0, me2s1, me2x5, mdd.]
4.1.1 Ranked Constraints
[Figure: Information Plot (me2s0 model). x-axis: Increasing Constraints; y-axis: Information, I = 18−H. Curves: ranked, random.]
[Figure: Maximum Correlation Coefficient (me2s0 model). x-axis: Increasing Constraints; y-axis: Max Correlation Coefficient. Curves: ranked, random.]
4.2 Models of the 3’ splice site
4.3 Clustering of Splice Sites
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me2s0 modified, 1mm, me2s0, wmm/0mm, me1s0.]
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me2x1, me2x2, me2x3, me2x4, me2x5, me1s0/wmm/0mm, me2s0/1mm, me3s0/2mm, me4s0/3mm.]
5. APPLICATIONS OF SPLICE SITE MODELS
5.1 Proximal 5’ss decoys in introns
[Figure: Receiver Operating Curve Analysis. x-axis: False Positive Rate; y-axis: True Positive Rate (Sensitivity). Curves: me2x5, me2x5 (combined), me2s0, 1mm (combined), wmm, wmm (combined).]
[Figure: Number of introns by category (No hsd, Fhsd > 250, Fhsd < 250) for the me2x5, MDD, and WMM models.]
5.2 Ranking and Competing 5’ss
[Figure: six panels, each labeled with a pair of models: me2s0/1mm and WMM; MDD and WMM; me2x5 and WMM; MDD and me2s0/1mm; me2x5 and me2s0/1mm; me2x5 and MDD.]
6. CONCLUSIONS
7. FUTURE WORK
8. ACKNOWLEDGEMENTS
9. REFERENCES
APPENDIX
A. INHOMOGENEOUS MARKOV MODELS
B. PERFORMANCE MEASURES
C. ROC ANALYSIS
D. TABLES