Discussion
Started 8th Aug, 2021

Amino Acids sequence to form Protein

I come from a pure Computer Science background, and my research aims to use amino acids to encode network transactions. I'm thinking of using amino acids for the encoding and then applying a pattern-matching technique to the result.
My main question is whether there are any rules governing the order of amino acids when forming a protein sequence. For example, can glycine (G) follow any other amino acid, such as leucine (L)?
Also, if you could provide amino acid sequences of actual proteins, that would help.


All replies (7)

Any amino acid can follow any other amino acid.
Protein sequences can be found in various databases, including here:
Amal Senevirathne
Chungnam National University
Dear Thaer Hani,
There are just 20 standard amino acids found in human proteins. They can appear in any order, as directed by the genetic code. Sample protein sequences can be downloaded from the NCBI database in FASTA format.
Best wishes,
Amal.
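
For readers coming from the computing side, here is a minimal sketch of reading FASTA-formatted text into plain strings. The record shown is a made-up peptide used only for illustration, not a real database entry, and the names `parse_fasta` and `STANDARD_AA` are hypothetical helpers rather than part of any library.

```python
from typing import Dict, Iterable

# One-letter codes of the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


# Hypothetical record for illustration only (not taken from NCBI).
example = """>demo_peptide hypothetical record for illustration
MKTAYIAKQR
QISFVKSHFS
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    unknown = set(seq) - STANDARD_AA
    print(name, len(seq), "non-standard letters:", sorted(unknown))
```

Real database files can also contain letters outside the 20 standard codes (for example `X` for an unknown residue), so a check like the one above may be worth keeping before encoding or pattern matching.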
Samiur Rahman
MilliporeSigma
Hi Thaer Hani!
I agree with the others that amino acids can occur in any order to make a protein. For structural integrity, some amino acids may be favored over others. For example, Ala, Glu, Leu, and Met are mostly found in helices, whereas Gly, Tyr, Ser, and Pro are less likely to be found there. But in general, any amino acid sequence can be present in a protein.
Best of luck!
Engelbert Buxbaum
Private Person
Just to add to the comments given by others: look up the keywords Ramachandran plot, helix and sheet propensity, and hydropathy. These are concepts used in the computational analysis of protein structures.
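
As one illustration of the hydropathy keyword, the following is a minimal sketch of a sliding-window hydropathy profile using the Kyte-Doolittle scale. The window length of 5 and the function name `hydropathy_profile` are assumptions chosen for the example; published analyses use various window sizes.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 5) -> list:
    """Return the mean hydropathy of every window of the given length."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[aa] for aa in seq]  # KeyError on non-standard letters
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Arbitrary illustrative sequence, not a real protein.
profile = hydropathy_profile("MKTAYIAKQR", window=5)
print([round(x, 2) for x in profile])
```

Stretches with a high average are candidates for buried or membrane-spanning regions, which is one way sequence composition relates to structure even though no residue order is forbidden.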
Michael Sadovsky
Krasnoyarsk Scientific Center
Any tentative answer depends STRONGLY on the meaning of the word "rule". So, what did you mean?
Thaer Hani
De Montfort University
Hi @michael,
By "rule" I meant "constraints", such as a constraint on consecutive amino acids or on repeated occurrences of the same amino acid.
Michael Sadovsky
Krasnoyarsk Scientific Center
Formally, there are no constraints at all on combinations of amino acids (AAs) in a protein sequence. However, some combinations are rather frequent while others are very rare. Studying these frequencies is a practical way to approach the point you are asking about. The difficulty is deciding what frequency of a specific combination of two or more AAs should be taken as the reference. One possible answer to this question comes from the method of invariant manifolds.
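
To make the reference-frequency idea concrete, here is a minimal sketch that compares how often each pair of adjacent amino acids is observed with how often it would be expected if residues were independent. This independence baseline is a simplifying assumption chosen for illustration; it is not the method of invariant manifolds mentioned above, and the function name `dipeptide_ratios` is hypothetical.

```python
from collections import Counter


def dipeptide_ratios(seq: str) -> dict:
    """Observed/expected frequency ratio for each adjacent pair in seq.

    Expected frequency assumes independent residues: f(a) * f(b).
    Ratios above 1 mean the pair is over-represented, below 1 under-represented.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")
    single = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in pairs.items():
        observed = count / (n - 1)
        expected = (single[pair[0]] / n) * (single[pair[1]] / n)
        ratios[pair] = observed / expected
    return ratios


# Toy sequence for illustration only; real estimates need many proteins.
ratios = dipeptide_ratios("GLGLGL")
for pair, r in sorted(ratios.items()):
    print(pair, round(r, 2))
```

Pairs that never occur in the input, such as `GG` here, are simply absent from the result. On a single short sequence these ratios are dominated by noise, so a meaningful comparison would need counts pooled over a large set of proteins.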
