Schematic representation of spliceosome assembly and action. Pre-mRNA, containing two exons separated by an intron, assembles into splicing complexes together with snRNPs. The individual snRNPs are indicated by U1, U2, U4, U5 and U6. U2AF: U2 auxiliary protein factor.

Source publication
Article
The mature mRNA always carries nucleotide sequences that faithfully mirror the protein product according to the rules of the genetic code. However, in the chromosome, the nucleotide sequence that represents a certain protein is interrupted by additional sequences. Therefore, most eukaryotic genes are longer than their final mRNA products. The human...

Context in source publication

Context 1
... assembly of spliceosomes on the pre-mRNA template is a well-organized process, and the different snRNP components enter the complex in a coordinated manner (Fig. 2). As a first step, U1 snRNP base pairs via its RNA part with the 5' splice site [35], while the U2AF protein binds to a C/U-rich region (polypyrimidine tract) located between the branch site and the 3' splice site (E complex) [59]. U2AF is a heterodimer consisting of 65- and 35-kDa subunits. U2AF65 interacts with the polypyrimidine ...

Citations

... Transforming raw genomic sequences into knowledge-based data is crucial for improving the performance of splice site models (Mathe et al., 2002), which requires ensuring that the elements in the derived SSM are biologically meaningful (Rauch and Kiss, 2003). Knowledge-driven feature selection is a way to identify biologically meaningful biomarkers (Chen et al., 2009). ...
Article
Splice sites are essential for pre-mRNA maturation and crucial for Splice Site Modelling (SSM); however, there are gaps between the splicing signals and the computationally identified sequence features. In this paper, Locality Sensitive Features (LSFs) are proposed to reduce these gaps by homogenising their contexts. Using skewness-kurtosis based statistics and data analysis, SSM attributed with LSFs is carried out by double-boundary outlier filters. The LSF-based SSM was applied to six model organisms of diverse species; in terms of accuracy and Receiver Operating Characteristic (ROC) analysis, the promising results show that the proposed methodology is versatile and robust for splice-site classification. The LSF-based SSM may serve as a new infrastructure for developing effective splice-site prediction methods and has the potential to be applied to other sequence prediction problems.
... The 5' and 3' splice sites (5SS and 3SS) define the exon-intron junctions [15], the branch point (BP) initiates the lariat formation [5], and the poly-pyrimidine tract (PPT) facilitates the exon ligation [20]. The well-known 5SS, 3SS, BP, and PPT in the classical IDM are all weak identities for splice site prediction [19]. ...
Article
Current computational predictions of splice sites largely depend on the sequence patterns of known intronic sequence features (ISFs) described in the classical intron definition model (IDM). The computation-oriented IDM (CO-IDM) provides more specific and concrete information for describing intron flanks of splice sites (IFSSs). In this paper, we propose a novel approach based on fuzzy decision trees (FDTs), which utilize (1) weighted ISFs of twelve uni-frame patterns (UFPs) and forty-five multi-frame patterns (MFPs) and (2) gain ratios to improve performance in identifying an intron. First, we fuzzified features extracted from genomic sequences using membership functions with an unsupervised self-organizing map (SOM) technique. Then, we introduced global weighting and cross-referencing when generating fuzzy rules, which are interpretable and useful for biologists to verify whether a sequence is an intron or not. Finally, the experimental results revealed the effectiveness of the proposed method in improving identification accuracy. In addition, we implemented an on-line intronic identifier to infer an unknown genomic sequence.
... Current computational predictions of splice sites largely depend on the sequence patterns of known intronic sequence features (ISFs) described in the classical intron definition model (IDM) [5,6]. The well-known 5SS, 3SS, BP, and PPT in the classical IDM are all weak identities for splice site prediction [8]. By combining the advantages of consensus and PWM, three computational concerns (CCs) are recruited: expression (E-CC), location (L-CC), and range (R-CC) [3,4]. ...
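As an illustration of the PWM idea mentioned in this excerpt, the following is a minimal sketch of a log-odds position weight matrix built from aligned splice-site windows. It is not the cited authors' method: the function names `build_pwm` and `score`, the pseudocount, the uniform background of 0.25, and the toy sequences are all assumptions made for the example.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites (hypothetical helper)."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background.
        pwm.append({b: math.log2((c / total) / background) for b, c in counts.items()})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for a window of the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

# Made-up toy 5' splice-site windows (3 exonic + 6 intronic bases); not real data.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
pwm = build_pwm(toy_sites)
consensus_like = score(pwm, "CAGGTAAGT")
unrelated = score(pwm, "TTTCCCAAA")
```

In this toy setting a window resembling the training sites scores higher than an unrelated one, which is the property a PWM adds over a plain consensus string: each position contributes a graded weight rather than a match or mismatch.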
... where N|A denotes the particular child of node N created by using attribute A to split node N. Finally, the information gain can be obtained using Equation (8). ...
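Equation (8) is not reproduced in this excerpt, so the sketch below uses the standard textbook definitions of information gain and gain ratio for decision trees; it may differ in detail from the cited paper's formulation. The function names, the attribute name `ppt`, and the toy rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of node N minus the weighted entropy of its children N|A."""
    total = len(labels)
    children = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[attribute], []).append(label)
    remainder = sum(len(child) / total * entropy(child) for child in children.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attribute):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy([row[attribute] for row in rows])
    gain = information_gain(rows, labels, attribute)
    return gain / split_info if split_info > 0 else 0.0

# Toy data: a made-up attribute that separates the two classes perfectly.
rows = [{"ppt": "high"}, {"ppt": "high"}, {"ppt": "low"}, {"ppt": "low"}]
labels = ["intron", "intron", "other", "other"]
gain = information_gain(rows, labels, "ppt")
ratio = gain_ratio(rows, labels, "ppt")
```

The gain ratio divides by the split entropy so that attributes with many distinct values are not favoured merely for fragmenting the data.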
... Although splice sites have been extensively studied in higher eukaryotes [4], the exon-intron structures of most genes are still mosaics [1]. Mining ISFs to enrich the classical IDM is therefore in demand, and the context of the existing ISFs should be clarified with more specific content and more concrete descriptions; thus, resolving existing ISFs and discovering new ISFs are crucial for splice site identification in silico [13]. ...
... The well-known 5SS, 3SS, BP and PPT in classical IDM are all weak identities for splice site prediction [13], and such a situation hinders the existing prediction methods from being precise and effective. Thus, discovering any meaningful sequence features in IFSSs is valuable for splice site research. ...
Conference Paper
Intronic sequence features (ISFs) in the proximity of splice sites are the basis for predicting exon-intron junctions; however, the well-known ISFs are either short consensuses or ambiguous descriptions, which impedes the development of precise and effective methods for splice site recognition. In this work, a multidimensional-feature mining methodology, depth-breadth fused codon analyses (DBFCA), was proposed to mine concrete and specific ISFs; three computational concerns, expression, location and range, were used to represent each dimension of the ISFs. DBFCA was applied to analyze 22,448 human introns; the well-known splicing signals were all recovered with richer details, and some new ISFs were also discovered in pattern-desert regions. A computation-oriented intron definition model (CO-IDM) was then constructed accordingly to make the results easily reusable. The results show that the proposed DBFCA is effective, reliable and widely applicable.
... Determining gene structure is still a big challenge [2], caused mainly by pseudo EIJSs, which far outnumber the authentic ones [20]; information related to 5SSs and 3SSs is still the only resource for predicting EIJSs owing to the uncertainty and ambiguity of PPTs and BPs; yet the GT-AG major-class patterns [16] are weak identities because they are short and ambiguous; thus, any further understanding of the distinguishing properties between exon and intron flank regions will be valuable information for gene research [16]. Although some efforts have been devoted to mining sequence information in the flank regions of splice sites [10,23], a significant breakthrough is still awaited. ...
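To make concrete why short GT-AG patterns are weak identities, the following rough sketch enumerates every GT donor candidate and AG acceptor candidate in a short sequence. The helper names, the `min_len` threshold, and the toy sequence are assumptions for illustration only and do not come from the cited work.

```python
import re

def candidate_sites(seq):
    """Return start positions of every GT (donor) and AG (acceptor) dinucleotide."""
    seq = seq.upper()
    # Lookahead patterns report overlapping matches without consuming characters.
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() for m in re.finditer(r"(?=AG)", seq)]
    return donors, acceptors

def candidate_introns(seq, min_len=4):
    """Pair each GT with every downstream AG; spans are half-open [start, end)."""
    donors, acceptors = candidate_sites(seq)
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]

# Made-up 23-base toy sequence; not real genomic data.
toy = "AAGGTAAGTCTTTCCAGGTAGCA"
donors, acceptors = candidate_sites(toy)
introns = candidate_introns(toy)
```

Even this 23-base toy sequence yields several GT...AG pairings, at most one of which could be authentic, which illustrates how quickly pseudo junction sites outnumber real ones when only the dinucleotides are used.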
Conference Paper
The flank regions of exon-intron junction sites (EIJSs) are closely related to gene structure, and their sequence context is the determinant by which the spliceosome recognizes authentic EIJSs. Although some efforts have been devoted to mining sequence information in the flank regions of EIJSs, significant breakthroughs are still awaited. In this paper, a hypothesis on the flank regions of EIJSs is proposed: for a specific EIJS, the sequence context of the intron flank is more flexible than that of the conjoined exon flank; that is, exon flanks are more conserved than intron flanks. To investigate the hypothesis, a frameshift strategy was used to explore the sequence context, and a codon-based measurement, frameshift oscillation (FO), was devised to estimate the sensitivity to disturbance by frame shifting; according to the hypothesis, exon flanks will be more sensitive to frameshift than their conjoined intron flanks. After investigating all EIJSs in the complete human genome with various lengths of flank regions using FO, the results reveal that intrinsic differences do exist in the flank regions of EIJSs. Therefore, the proposed hypothesis has important implications for predicting gene structure; furthermore, the devised estimator FO appears to capture a key feature of authentic EIJSs and has considerable potential for identifying them.