Table 2 - uploaded by John Kececioglu
Recovery rates comparing the gap models alone 

Source publication
Article
Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make se...

Citations

... , . . . a p q r b . . . ) would be 3ρ_2 + ρ_3, where ρ_2 and ρ_3 are the costs for spaces and gaps respectively [20]. The alignment that minimizes the cost c may again be obtained using a dynamic program in O(mn) time. ...
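The cost model in the excerpt above can be illustrated with a minimal Python sketch. It assumes a standard three-table dynamic program (in the style of Gotoh's algorithm) in which a gap of k consecutive spaces costs k times the space cost plus one gap cost. The function name `affine_gap_cost`, the parameter names, and the per-mismatch cost are assumptions made for illustration and are not taken from the cited work.

```python
import math


def affine_gap_cost(s, t, mismatch=1.0, space=1.0, gap=1.0):
    """Minimum global alignment cost of s and t in O(len(s) * len(t)) time.

    Assumed cost model: each mismatched column costs `mismatch`, each space
    costs `space`, and each maximal run of spaces (a gap) costs an extra `gap`.
    """
    m, n = len(s), len(t)
    inf = math.inf
    # M[i][j]: best cost when s[i-1] is aligned to t[j-1]
    # X[i][j]: best cost when s[i-1] is aligned to a space
    # Y[i][j]: best cost when t[j-1] is aligned to a space
    M = [[inf] * (n + 1) for _ in range(m + 1)]
    X = [[inf] * (n + 1) for _ in range(m + 1)]
    Y = [[inf] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(1, m + 1):
        X[i][0] = gap + i * space  # one gap covering s[:i]
    for j in range(1, n + 1):
        Y[0][j] = gap + j * space  # one gap covering t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extend an existing gap, or pay `gap` to open a new one.
            X[i][j] = space + min(X[i - 1][j], gap + M[i - 1][j], gap + Y[i - 1][j])
            Y[i][j] = space + min(Y[i][j - 1], gap + M[i][j - 1], gap + X[i][j - 1])
    return min(M[m][n], X[m][n], Y[m][n])
```

For example, aligning "apqrb" against "ab" with a space cost of 1 and a gap cost of 2 places "pqr" opposite a single gap of three spaces, so the call `affine_gap_cost("apqrb", "ab", mismatch=5.0, space=1.0, gap=2.0)` returns 3 × 1 + 2 = 5.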
Preprint
Data-driven algorithm configuration is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of efficient data-driven algorithms for algorithm families with more than one parameter. In this work we provide algorithms for efficient (output-polynomial) multidimensional parameter tuning, i.e. for families with a small constant number of parameters, for three very different combinatorial problems -- linkage-based clustering, dynamic programming for sequence alignment, and auction design for two-part tariff schemes. We extend the single-parameter clustering algorithm of Balcan et al. 2020 arXiv:1907.00533 to multiple parameters and to the sequence alignment problem by proposing an execution graph which compactly represents all the states the algorithm could attain for all possible parameter values. A key problem-specific challenge is to efficiently compute how the partition of the parameter space (into regions with unique algorithmic states) changes with a single algorithmic step. We give algorithms which improve on the runtime of previously best known results for linkage-based clustering, sequence alignment and two-part tariff pricing.
... Phylogenetic analysis of type I IFN genes in N. parkeri Automated multiple sequence alignments for amino acids were generated using the ClustalX program (27), and nonconserved protein sequences of extralong chains at N- or C-terminals were removed manually. Structure-guided alignment (28,29) was performed manually with the information of available three-dimensional structures of the type I IFNs, including human IFN-α2 and IFN-β (30,31) and zebrafish IFN1 and IFN2 (28), and with the information of predicted secondary structures of other type I IFNs generated by Scratch Protein Predictor (http://scratch.proteomics.ics.uci.edu/). ProtTest software (32) was used to select the best-fit model, which was the JTT model for the construction of the phylogenetic tree of type I IFNs in vertebrates. ...
Article
Type I interferons are a subset of cytokines playing central roles in host antiviral defense, and their effects depend on the interaction with the heterodimeric receptor complex. Surprisingly, two pairs of the receptor subunits, CRFB1 and CRFB5, and CRFB2 and CRFB5, have been identified in fish, but the studies about preferential receptor usage of different fish IFN subtypes are rather limited. In this study, the three receptor chains of type I IFNs named as On-CRFB1, On-CRFB2 and On-CRFB5 were identified in Nile tilapia, Oreochromis niloticus. These three genes were constitutively expressed in all tissues examined, with the highest expression level observed in muscle and liver, and were rapidly induced in liver following the stimulation of poly(I:C). Interestingly, it is possible that all three subtypes of tilapia IFNs are able to signal through two pairs of the receptor subunits, On-CRFB1 and On-CRFB5, and On-CRFB2 and On-CRFB5. More importantly, tilapia group I IFNs (On-IFNd and On-IFNh) preferentially signal through a receptor complex composed of On-CRFB1 and On-CRFB5, and group II IFNs (On-IFNc) preferentially signal through a receptor complex comprised of On-CRFB2 and On-CRFB5. The present study thus provides new insights into the receptor usage of group I and group II IFNs in fish.
... Predicting the secondary structure of a protein from its amino acid sequence is a classic and fundamental problem in bioinformatics, that is a building block in many tasks such as protein tertiary structure prediction (Dill and MacCallum, 2012), protein multiple sequence alignment (Deng and Cheng, 2011; Kececioglu et al., 2010; Lu and Sze, 2008) and solvent accessibility prediction (Adamczak et al., 2004). The true secondary structure of a protein is usually obtained from its known tertiary structure using a tool such as DSSP (Kabsch and Sander, 1983), which labels the amino acid residues in the protein (based on the torsion angles of their backbone) with eight states, that are traditionally reduced to just three classes: G (3_10-helix), H (α-helix) and I (π-helix) are usually classified as alpha (a); B (isolated bridge) and E (extended sheet) are usually classified as beta (b); and everything else is usually classified as coil (c), representing 'other' (or the unstructured class). ...
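The eight-state to three-class reduction described in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the mapping as stated there; the names `DSSP_TO_THREE` and `reduce_dssp` are hypothetical, and the sketch assumes the DSSP labels arrive as a plain string with one character per residue.

```python
# Mapping taken from the excerpt: G, H, I -> alpha (a); B, E -> beta (b).
# Every other label (for example T, S, or a blank) falls through to coil (c).
DSSP_TO_THREE = {"G": "a", "H": "a", "I": "a", "B": "b", "E": "b"}


def reduce_dssp(states):
    """Reduce a string of eight-state DSSP labels to the three classes a, b, c."""
    return "".join(DSSP_TO_THREE.get(label, "c") for label in states)
```

For instance, `reduce_dssp("HHGEB-TSI")` yields `"aaabbccca"`.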
Article
Motivation: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy. Method: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods. Results: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2-10%, and Q3 accuracy by more than 1-3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction. Availability and implementation: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
... Phylogenetic analysis of type I IFN genes in N. parkeri Automated multiple sequence alignments for amino acids were generated using the ClustalX program (27), and nonconserved protein sequences of extralong chains at N- or C-terminals were removed manually. Structure-guided alignment (28,29) was performed manually with the information of available three-dimensional structures of the type I IFNs, including human IFN-α2 and IFN-β (30,31) and zebrafish IFN1 and IFN2 (28), and with the information of predicted secondary structures of other type I IFNs generated by Scratch Protein Predictor (http://scratch.proteomics.ics.uci.edu/). ProtTest software (32) was used to select the best-fit model, which was the JTT model for the construction of the phylogenetic tree of type I IFNs in vertebrates. ...
Article
In vertebrates, intron-containing and intronless type I IFN genes have recently been reported in amphibian model species Xenopus tropicalis and X. laevis. However, whether intronless type I IFNs in amphibians are the ancestral genes of type I IFNs in amniotes or just represent the independent divergence in amphibians is unknown or even uninvestigated. In this study, both intron-containing and intronless type I IFN genes, as well as their receptor genes, were identified in the Tibetan frog Nanorana parkeri. The evidence obtained from homology, synteny, phylogeny, and divergence time showed that intronless type I IFN genes in N. parkeri and in Xenopus might have arisen from two independent retroposition events that occurred in these two lineages, and the retrotransposition causing the generation of intronless type I IFN genes in amniotes is another independent event beyond the two in amphibians. It can then be proposed that intronless type I IFNs in N. parkeri and Xenopus may not be the ancestral genes of intronless type I IFNs in amniotes but may just represent two independent bifurcations in the amphibian lineage. Furthermore, both intronless and intron-containing type I IFNs in N. parkeri showed strong ability in inducing the expression of IFN-stimulated genes and strong antiviral activity against frog virus 3. The present study thus provides the evolutionary evidence to support the independent retroposition hypothesis for the occurrence of intronless type I IFN genes in amphibians and contributes to a functional understanding of type I IFNs in this group of vertebrates.
... Mutations in this region alter its structure, which lowers the drug binding affinity [14]. Specifically, S31N mutation increases the bulkiness and polarity of the channel lining residues, and thus reduces the space available for interaction with drugs [15]. Large-scale analysis of these influenza surface proteins may give better elucidation for understanding the evolution of the proteins toward the drug resistant mechanism. ...
Article
Background: M2 channel protein of influenza A virus is one of the specific targets for the anti-influenza drugs amantadine and rimantadine. These drugs have lost their efficacy because of the mutations in their drug interaction sites. Large-scale analysis of these influenza surface proteins may give better elucidation for understanding the evolution of the proteins toward the drug resistant mechanism. Objective: The current investigation aimed to understand the evolutionary lineage and to elucidate the mechanism of drug resistance in newly emerging strains. Method: Combined sequence, secondary structural, evolutionary conservation, and phylogenetic analyses were carried out with 2010 influenza A M2 channel protein sequences. Results: The structural information provides enough details for understanding the drug resistance in the target proteins. Herein, secondary structural analysis of M2 sequences predicted the variation only in the drug binding region. The rate of the S31N mutation is higher in swine/H3N2 than in human/H1N1, human/H3N2, swine/H1N1, and avian/H5N influenza A viruses. This confirms that antigenic drift does not affect the functional mechanism of the protein. Also, it reports that the avian influenza virus is the source of the M2 gene segment, which has transferred from avian to human and swine. Our findings show that the M2 gene segment has interchanged between swine and human. Conclusion: This study proves that rapid mutation and frequent reassortment play a major role in drug resistant strains. Phylogenetic and secondary structural analysis confirms the existence of a genetic lineage between avian, swine, and human influenza A viruses.
... The difficulty of assigning a domain structure to Rictor reflects the difficulty of accurately aligning distant protein sequences, especially for those proteins with less than 25% identity. Several approaches have been proposed to address the challenge (Kececioglu et al., 2010). Homology identification could also suffer from mis-assignments because of the similarity of homologous domains in otherwise unrelated sequences (Zinzalla et al., 2011). ...
Article
Mammalian target of rapamycin (mTOR) complexes play a pivotal role in the cell. Raptor and Rictor proteins interact with mTOR to form two distinct complexes, mTORC1 and mTORC2, respectively. While the domain structure of Raptor is known, current bioinformatics tools failed to classify the domains in Rictor. Here we focus on identifying specific domains in Rictor by searching for conserved regions. We scanned the pdb structural database and constructed three protein domain datasets. Next we carried out multiple pairwise sequence alignments of the proteins in the domain dataset. By analyzing the z-scores of Rictor sequence similarity to protein sequences in the dataset, we assigned the structural and functional domains of Rictor. We found that, like Raptor, Rictor also has HEAT and WD40 domains, which could be the common motif binding to mTORC. Rictor may also have pleckstrin homology domains, which mediate cellular localization and transmit signals to downstream targets, as well as a domain that is homologous to 50S protein L17 and human 39S protein L17. This putative ribosome binding domain could mediate mTORC2-ribosome interaction.
... Additionally, as structures are evolutionarily more conserved than sequences in proteins, structural information also provides more distant relationships between sequences (Kemena and Notredame, 2009). For instance, Kececioglu et al. (2010) provided a novel scoring scheme to evaluate MSAs from their predicted secondary structures. Other scores, such as contact accepted mutation (Lin et al., 2003) and STRIKE (Kemena et al., 2011) scores also estimated the molecular contacts from protein structures to calculate alignment accuracies. ...
Article
Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce quite different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Amongst them, 3D structures are increasingly being used to evaluate alignments. Since structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. Results: The proposed multiobjective algorithm, based on the Non-Dominated Sorting Genetic Algorithm (NSGA-II), aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. Its results on the BAliBASE benchmark were statistically significant according to the Kruskal-Wallis test (p<0.01). This algorithm also outperforms other aligners, such as ClustalW, MSA-GA, PRRP, DIALIGN, HMMT, PIMA, MULTIALIGN, PILEUP, RGT-GA and VDGA according to the Wilcoxon signed-rank test (p<0.05), whereas it shows statistically equivalent results to 3D-COFFEE (p>0.05) with the advantage of being able to use fewer structures. Structural information is included within the objective function in order to evaluate the obtained alignments more accurately. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip. fortuno@ugr.es.
Conference Paper
Protein multiple sequence alignment is significant in the field of bioinformatics as it may reveal important information about the protein sequences' functional, structural or evolutionary relationships. It involves the alignment of three or more biological protein sequences and represents a real challenge both from a biological and a computational point of view. Q-learning is a reinforcement learning technique in which an artificial agent learns to find an optimal sequence of actions to achieve a goal by receiving rewards for its chosen actions. This paper investigates a Q-learning based model for the multiple sequence alignment problem applied on protein sequences. The experimental evaluation of the model is performed on two artificial data sets and on benchmark problem sets selected from the BAliBASE database. The obtained results show the effectiveness of using reinforcement learning for determining the optimal alignment of multiple protein sequences.
Article
Interferons (IFNs) play a major role in orchestrating the innate immune response toward viruses in vertebrates, and their defining characteristic is their ability to induce an antiviral state in responsive cells. Interferons have been reported in a multitude of species, from bony fish to mammals. However, our current knowledge about the molecular function of fish IFNs as well as their evolutionary relationship to tetrapod IFNs is limited. Here we establish the three-dimensional (3D) structure of zebrafish IFNϕ1 and IFNϕ2 by crystallography. These high-resolution structures offer the first structural insight into fish cytokines. Tetrapods possess two types of IFNs that play an immediate antiviral role: type I IFNs (e.g., alpha interferon [IFN-α] and beta interferon [IFN-β]) and type III IFNs (lambda interferon [IFN-λ]), and each type is characterized by its specific receptor usage. Similarly, two groups of antiviral IFNs with distinct receptors exist in fish, including zebrafish. IFNϕ1 and IFNϕ2 represent group I and group II IFNs, respectively. Nevertheless, both structures reported here reveal a characteristic type I IFN architecture with a straight F helix, as opposed to the remaining class II cytokines, including IFN-λ, where helix F contains a characteristic bend. Phylogenetic trees derived from structure-guided multiple alignments confirmed that both groups of fish IFNs are evolutionarily closer to type I than to type III tetrapod IFNs. Thus, these fish IFNs belong to the type I IFN family. Our results also imply that a dual antiviral IFN system has arisen twice during vertebrate evolution.