Figure - available via license: Creative Commons Attribution 2.0 Generic
Correlated mutation analysis of MADS domain proteins: workflow diagram. Sequences were obtained via BLAST, InterPro and from a set of sequenced plant genomes. Orthologs were assigned using a best-hit criterion, followed by clustering to group sequences within each species with a very high sequence identity and filtering to remove sequences with low sequence identity to Arabidopsis MADS sequences. Intramolecular and intermolecular correlated mutations were obtained, and validated using various datasets. Subsequently, conservation of correlated mutations between proteins was analyzed, and correlated mutations were compared between interacting and non-interacting proteins. A. th. is Arabidopsis thaliana.
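The clustering and filtering steps named in the caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the greedy clustering strategy, the function names, the toy sequences and both thresholds (90% and 40%) are assumptions made only for illustration, and the sequences are assumed to be pre-aligned.

```python
def percent_identity(a, b):
    """Percent identity over aligned positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster_near_identical(seqs, threshold):
    """Greedy clustering within one species: keep a sequence only if it is
    below `threshold` percent identity to every representative kept so far."""
    representatives = []
    for name, seq in seqs:
        if all(percent_identity(seq, rep) < threshold for _, rep in representatives):
            representatives.append((name, seq))
    return representatives


def filter_by_reference(seqs, reference, min_identity):
    """Drop sequences with less than `min_identity` percent identity to the reference."""
    return [(n, s) for n, s in seqs if percent_identity(s, reference) >= min_identity]


# Hypothetical toy data: one reference and four sequences from one species.
reference = "MGRGKIEIKR"
species_seqs = [
    ("sp1_a", "MGRGKIEIKR"),  # identical to the reference
    ("sp1_b", "MGRGKIEIKK"),  # 90% identical to sp1_a, so it joins that cluster
    ("sp1_c", "MARAQVELKR"),  # 50% identical to the reference
    ("sp1_d", "AAAAAAAAAA"),  # 0% identical to the reference
]

clustered = cluster_near_identical(species_seqs, threshold=90.0)  # assumed cutoff
kept = filter_by_reference(clustered, reference, min_identity=40.0)  # assumed cutoff
kept_names = [name for name, _ in kept]
print(kept_names)
```

In this toy run the near-duplicate `sp1_b` is absorbed by `sp1_a` during clustering and the divergent `sp1_d` is removed by the reference filter.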


Source publication
Article
Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those pr...

Citations

... Although the rationale of using correlated mutations to inform on molecular interactions seems very plausible and straightforward, it has also been shown that covarying sites do not necessarily correspond to sites that are spatially interacting (4-10,15). Phylogenetic bias, transitivity effects, and passenger mutations are some of the reasons why the detected covariance can be misleading (4). ...
Article
The concept of exploiting correlated mutations has been introduced and applied successfully to identify interactions within and between biological macromolecules. Its rationale lies in the preservation of physical interactions via compensatory mutations. With the massive increase of available sequence information, approaches based on correlated mutations have regained considerable attention. We analyzed a set of 10 707 430 single nucleotide polymorphisms detected in 1135 accessions of the plant Arabidopsis thaliana. To measure their covariance and to reveal the global genome-wide sequence correlation structure of the Arabidopsis genome, the adjusted mutual information has been estimated for each possible pair of polymorphic sites. We developed a series of filtering steps to account for genetic linkage and lineage relations between Arabidopsis accessions, as well as transitive covariance as possible confounding factors. We show that upon appropriate filtering, correlated mutations prove indeed informative with regard to molecular interactions, and furthermore, appear to reflect on chromosomal interactions. Our study demonstrates that the concept of correlated mutations can also be applied successfully to within-species sequence variation and establishes a promising approach to help unravel the complex molecular interactions in A. thaliana and other species with broad sequence information.
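As a rough illustration of the covariance measure mentioned in this abstract, the sketch below computes plain mutual information between two polymorphic sites across accessions. It deliberately omits the chance correction that distinguishes adjusted mutual information, as well as the filtering steps for linkage, lineage and transitivity. The function name and the toy allele strings are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(site_a, site_b):
    """Mutual information (in bits) between two sites observed in the same accessions."""
    if len(site_a) != len(site_b):
        raise ValueError("both sites must cover the same accessions")
    n = len(site_a)
    count_a = Counter(site_a)
    count_b = Counter(site_b)
    count_ab = Counter(zip(site_a, site_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical alleles at three sites in eight accessions.
site_1 = "AAAATTTT"
site_2 = "CCCCGGGG"  # covaries perfectly with site_1
site_3 = "CCGGCCGG"  # statistically independent of site_1

mi_covarying = mutual_information(site_1, site_2)
mi_independent = mutual_information(site_1, site_3)
print(mi_covarying, mi_independent)
```

Perfectly covarying biallelic sites give one bit of mutual information, while independent sites give zero. Real data would fall in between and would need the corrections described in the abstract before interpretation.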
... The MADS domain protein dataset consists of 12 Arabidopsis MADS domain proteins with homologous sequences from various plant genomes which we previously analysed using CAPS, an algorithm which uses BLOSUM and calculates Pearson correlation coefficients between the transition probability scores (between pairs of sequences) observed in one column and each other column [7,36]. For these proteins, the RMRCM predictions in almost all cases had a significant overrepresentation of short distances compared to the crystal structure: using a χ²-test, all but two out of twelve MADS datasets had p-values below 0.05, and in most cases the p-value was much smaller; the average for the ten cases with p < 0.05 was 0.006 ± 0.01. ...
... We also used MI on those datasets, and found that the distance enrichment of MI-predicted links was much worse than what was obtained with CAPS or RMRCM (data not shown). Note that there is considerable variation in the performance for the various MADS domain proteins, which is mainly related to the differing numbers of sequences in the multiple sequence alignments for those proteins, as observed already when using CAPS (see [36]) and in line with results mentioned above. ...
Article
In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments. R-code of our implementation is available via http://www.ab.wur.nl/rmrcm.
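The pairwise (column-column) comparison that this abstract contrasts with RMRCM can be sketched as follows. The example follows the description of CAPS given in the excerpt above (Pearson correlation between per-sequence-pair scores in two columns), but it is neither the CAPS implementation nor RMRCM: a simple match/mismatch score is used as a stand-in for BLOSUM, and all names and the toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_scores(column):
    """Score every pair of sequences at one column.
    Toy stand-in for a BLOSUM score: 1.0 for identical residues, else 0.0."""
    return [1.0 if a == b else 0.0 for a, b in combinations(column, 2)]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    return cov / sqrt(var_x * var_y)


def column_correlation(alignment, i, j):
    """Correlate the sequence-pair scores of alignment columns i and j."""
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    return pearson(pair_scores(col_i), pair_scores(col_j))


# Hypothetical alignment: columns 0 and 1 change together, column 2 does not follow them.
alignment = ["AKL", "AKV", "GEL", "GEV"]

r_01 = column_correlation(alignment, 0, 1)
r_02 = column_correlation(alignment, 0, 2)
print(r_01, r_02)
```

Because each column pair is scored in isolation, a sketch like this cannot separate direct from indirect dependencies, which is the limitation the abstract says RMRCM is designed to address.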
Preprint
The concept of exploiting correlated mutations has been introduced and applied successfully to identify interactions within and between biological macromolecules. Its rationale lies in the preservation of physical interactions via compensatory mutations. With the massive increase of available sequence information, approaches based on correlated mutations have regained considerable attention. We analyzed a set of 10,707,430 single nucleotide polymorphisms detected in 1,135 accessions of the plant Arabidopsis thaliana. To measure their covariance and to reveal the global genome-wide sequence correlation structure of the Arabidopsis genome, the adjusted mutual information has been estimated for each possible pair of polymorphic sites. We developed a series of filtering steps to account for genetic linkage and lineage relations between Arabidopsis accessions, as well as transitive covariance as possible confounding factors. We show that upon appropriate filtering, correlated mutations prove indeed informative with regard to molecular interactions, and furthermore, appear to reflect on chromosomal interactions. Our study demonstrates that the concept of correlated mutations can also be applied successfully to within-species sequence variation and establishes a promising approach to help unravel the complex molecular interactions in A. thaliana and other species with broad sequence information.