Predicting Transmembrane Protein Segments Using Simplified Nucleotide-Determined Hydropathy: A Novel Approach Bypassing Genetic Code Translation

Douglas C. Youvan
doug@youvan.com
December 12, 2023
In the realm of molecular biology, the prediction of transmembrane protein
segments is a critical yet intricate task, pivotal for understanding and
manipulating cellular mechanisms. Traditional methodologies, largely
dependent on the genetic code's translation into amino acids, have set the
stage for advancements in protein structure analysis. However, these
methods add complexity and require substantial computational resources. This
study introduces a simplified nucleotide-determined hydropathy (NDH)
system that departs from traditional practice by bypassing the
complexities of genetic code translation. Focusing on the second position
of codons, we assign hydropathy values directly to nucleotides (T/U = +1,
A = -1, G/C = 0) and use them to predict protein hydropathy and
structure. This approach not only simplifies the prediction
process but also opens new frontiers in the efficient and rapid analysis of
protein structures, particularly transmembrane segments, with far-reaching
implications in biological research and medicine.
Keywords: transmembrane protein prediction, nucleotide-determined
hydropathy, NDH, genetic code translation, protein structure analysis,
bioinformatics, molecular biology, Kyte-Doolittle hydropathy scale,
computational biology, protein hydropathy, cell mechanisms, amino acid
translation.
Note: Python code is given in the Appendix. This paper uses only 2 NDHs,
while the original 1990 paper on this subject used 12 NDHs:
https://link.springer.com/chapter/10.1007/978-3-642-61297-8_21
Abstract
This study introduces a groundbreaking approach to predicting
transmembrane protein segments, employing a simplified
nucleotide-determined hydropathy (NDH) system that bypasses
the conventional requirement of genetic code translation.
Traditional methods, such as the Kyte-Doolittle hydropathy scale,
necessitate the translation of nucleotide sequences into amino
acids, utilizing the entire genetic code. In contrast, our method
simplifies this process by assigning NDH values based solely on
the nucleotides present in the second position of codons, using a
three-level scheme in which thymine (T) and uracil (U) are assigned a
value of +1, adenine (A) a value of -1, and cytosine (C) and
guanine (G) a neutral value of 0. This study demonstrates the
effectiveness of this approach through a detailed statistical
analysis, showing a strong correlation between these NDH values
and the rescaled Kyte-Doolittle hydropathy (KDH) values. The
results suggest that our method can reliably predict hydropathy-
related properties of proteins, particularly transmembrane
segments, without the intricate process of translating the genetic
code. This novel approach has significant implications for the
fields of bioinformatics and molecular biology, offering a more
streamlined and efficient pathway for protein structure prediction.
The simplicity and efficacy of this method mark a substantial
advancement in our understanding of protein structure prediction,
opening new avenues for research and application in various
biological and medical domains.
Introduction
The intricate architecture of proteins, particularly transmembrane
segments, plays a pivotal role in numerous biological processes,
making their accurate prediction essential for understanding and
manipulating cellular functions. Traditionally, the prediction of
these structures has relied heavily on understanding the genetic
code and its translation into amino acids, forming the basis for
various computational and biochemical techniques. This study
introduces a novel approach that significantly diverges from these
traditional methodologies by utilizing nucleotide-determined
hydropathy (NDH) values. This new method offers a simpler yet
effective way of predicting protein structures, particularly
transmembrane segments, without delving into the complexities of
the genetic code translation.
Traditional methods, such as the widely recognized Kyte-Doolittle
hydropathy scale, depend on the translation of nucleotide
sequences into their corresponding amino acids. These methods
assess the hydrophobic or hydrophilic nature of amino acid
sequences, enabling the prediction of protein structures,
especially those that span cellular membranes. However, this
approach necessitates a comprehensive understanding of the
genetic code and its translation mechanism, a process that can
be complex and computationally intensive.
In contrast, our approach simplifies this process through the
concept of nucleotide-determined hydropathy (NDH). By focusing
on the second position of the codon and assigning specific
hydropathy values to the nucleotides (T/U=+1, A=-1, and G/C=0),
we bypass the need for translation entirely. This method presents
a stark contrast to traditional methods by eliminating the need to
reference the entire genetic code. Instead, it utilizes a direct
correlation between these NDH values and the hydropathy
properties of proteins. The potential application of this innovative
approach is vast, offering a more streamlined pathway for
predicting protein structures. It holds promise not only in
enhancing our understanding of protein function and structure but
also in the broader context of biological research and therapeutic
development, where rapid and accurate prediction of protein
structures is crucial.
This study aims to demonstrate the efficacy of the NDH method
and establish its place as a valuable tool in the repertoire of
protein structure prediction techniques. By presenting a detailed
analysis and comparison with traditional methods, we aim to
highlight the significance and potential of NDH values in
simplifying and accelerating the process of predicting
transmembrane protein segments.
Background
The prediction of protein structures, particularly transmembrane
segments, has been a cornerstone of molecular biology and
bioinformatics, given their critical role in cellular function and
signaling. Transmembrane proteins, characterized by their ability
to span the lipid bilayer of cell membranes, are fundamental to
various biological processes, including signal transduction and
cellular transport. The accurate prediction of these structures is
vital for understanding cell biology and developing therapeutic
interventions. The field has seen the development of numerous
methods and algorithms, each with its strengths and limitations.
Historically, the most prevalent approach to protein structure
prediction has been based on the analysis of amino acid
sequences. This is underpinned by the understanding that the
sequence of amino acids, dictated by the genetic code,
determines the protein's structure. Among the various methods
developed, the Kyte-Doolittle hydropathy scale is one of the most
recognized. It assigns hydropathy values to each amino acid
based on their hydrophobic or hydrophilic tendencies. By
analyzing these values along the amino acid chain, researchers
can predict regions likely to form transmembrane helices, which
are predominantly hydrophobic. Other notable methods include
the use of machine learning algorithms and molecular modeling
techniques that leverage vast biological databases to predict
protein structures.
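The windowed analysis described above can be sketched as follows. This is a minimal illustration rather than code from this study: the function names are hypothetical, the 19-residue window and the 1.6 threshold are assumptions taken from values commonly quoted for Kyte-Doolittle membrane-span detection, and the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values, keyed by single-letter amino acid code.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(protein, window=19):
    """Mean Kyte-Doolittle value of every full window along the sequence."""
    values = [KD[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_windows(protein, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]


# Synthetic example (assumption): a 20-residue leucine stretch flanked by lysines.
sequence = "K" * 10 + "L" * 20 + "K" * 10
profile = hydropathy_profile(sequence)
hits = candidate_windows(sequence)
print("windows:", len(profile))
print("highest window mean:", round(max(profile), 2))
print("candidate window starts:", hits)
```

Windows that lie mostly inside the hydrophobic stretch score above the threshold, while windows dominated by the charged flanks do not.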
These traditional methods, however, share a common
dependency: the requirement for the translation of nucleotide
sequences into amino acid sequences. This translation process is
governed by the genetic code—a set of rules by which nucleotide
sequences (DNA and RNA) are translated into the amino acid
sequences that make up proteins. The genetic code is nearly
universal, with a few exceptions, and is composed of codons,
each consisting of three nucleotides. Each codon corresponds to
a specific amino acid or a stop signal in protein synthesis. This
complex but elegantly structured code is the foundation upon
which the traditional methods of protein structure prediction are
built.
While these methods have been invaluable, they often entail
extensive computational resources and a deep understanding of
the genetic code. The necessity to translate nucleotide sequences
into amino acid chains before any analysis can be conducted
presents a significant step in the prediction process. This step not
only requires computational power but also an extensive
understanding of the nuances of the genetic code, including
aspects like start and stop codons, codon degeneracy, and
variations in genetic code expression across different organisms.
In light of these challenges, the development of a method that can
bypass the translation step represents a significant advancement
in the field. The concept of nucleotide-determined hydropathy
(NDH) emerges as a promising alternative, offering a more direct
and computationally efficient pathway to predict protein
structures, particularly transmembrane segments. By focusing on
the nucleotide composition directly and its correlation with
hydropathy characteristics, the NDH approach circumvents the
complexities of genetic code translation, presenting a novel and
potentially transformative method in the realm of protein structure
prediction.
Methodology
The methodology of this study is centered around the innovative
concept of nucleotide-determined hydropathy (NDH), a simplified
approach that directly utilizes the nucleotide composition of
codons to predict protein structure. The NDH values are assigned
based on the hydropathy characteristics imparted by specific
nucleotides, particularly focusing on the second position of each
codon. In this system, thymine (T) and uracil (U) are assigned a
value of +1, adenine (A) a value of -1, and cytosine (C) and
guanine (G) a neutral value of 0. This assignment is predicated on
the hypothesis that specific nucleotides contribute differently to
the hydropathy profile of a protein, and their presence in key
positions within codons can be indicative of the protein’s
hydrophobic or hydrophilic nature.
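The assignment rule can be illustrated with a short sketch. It assumes an in-frame coding sequence given as a string; the function name `ndh_profile` and the example sequence are hypothetical and are not taken from the study's own code.

```python
# NDH values for the nucleotide in the second codon position, as stated in the text.
NDH_VALUES = {"T": 1, "U": 1, "A": -1, "C": 0, "G": 0}


def ndh_profile(coding_sequence):
    """Return one NDH value per complete codon of an in-frame coding sequence."""
    seq = coding_sequence.upper()
    usable_length = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    # The second base of the codon that starts at index i sits at index i + 1.
    return [NDH_VALUES[seq[i + 1]] for i in range(0, usable_length, 3)]


# Hypothetical example: ATG CTG GTT AAA GAC TGA
example = "ATGCTGGTTAAAGACTGA"
print(ndh_profile(example))
```

Only the middle base of each codon is read, so no codon table is consulted at any point.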
Data Collection and Processing: For the purpose of this study, a
comprehensive dataset of protein sequences was curated, with a
focus on proteins known to contain transmembrane segments.
These sequences were sourced from established biological
databases, ensuring a diverse and representative sample. Each
protein sequence was analyzed at the nucleotide level,
specifically examining the second position of every codon. The
NDH values were then assigned to each of these nucleotides
according to the predefined scheme. Concurrently, the amino acid
sequences of these proteins were analyzed to determine their
Kyte-Doolittle hydropathy (KDH) values, which were then rescaled
linearly to the range -1 to +1 for compatibility with the NDH scale. This rescaling
was crucial to ensure that the two different hydropathy scales
could be directly compared.
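A minimal sketch of the rescaling step, assuming a linear min-max mapping onto the range -1 to +1 (the helper name `rescale` is hypothetical). Because the Kyte-Doolittle scale runs symmetrically from -4.5 to +4.5, this mapping is equivalent to dividing each value by 4.5.

```python
# Kyte-Doolittle hydropathy values, keyed by single-letter amino acid code.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def rescale(values, low=-1.0, high=1.0):
    """Linearly map the smallest value to `low` and the largest to `high`."""
    vmin, vmax = min(values.values()), max(values.values())
    span = vmax - vmin
    return {k: low + (v - vmin) * (high - low) / span for k, v in values.items()}


rescaled_kd = rescale(KD)
for aa in ("I", "A", "G", "R"):
    print(aa, round(rescaled_kd[aa], 3))
```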
Statistical Analysis: The crux of the study's analysis involved
correlating the NDH values with the rescaled KDH values. The
objective was to ascertain whether the NDH values, derived solely
from the nucleotide composition, could reliably predict the
hydropathy profile of proteins, as traditionally determined by the
KDH values based on amino acid sequences. To achieve this,
Pearson’s correlation coefficient was employed, a statistical
measure commonly used to evaluate the linear relationship
between two continuous variables. This test was chosen for its
ability to quantify the degree of association between the NDH and
KDH values. The correlation coefficient, ranging from -1 to +1,
would indicate the strength and direction of the relationship, with
values close to +1 or -1 denoting strong positive or negative linear
relationships, respectively.
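For reference, Pearson's coefficient can be computed directly from its definition, the covariance of the two variables divided by the product of their standard deviations. The sketch below is a plain-Python illustration with a hypothetical function name and made-up data; it does not reproduce the study's dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values: NDH takes only -1, 0, or +1; KDH lies in [-1, +1].
ndh = [1, 1, -1, -1, 0, 0]
kdh = [0.9, 0.7, -0.8, -0.6, 0.1, -0.3]
print(round(pearson_r(ndh, kdh), 3))
```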
In addition to Pearson's correlation, other statistical tests were
conducted to further validate the findings and assess their
robustness. These included regression analysis to model the
relationship between NDH and KDH values and hypothesis
testing to determine the statistical significance of the observed
correlations. The correlation was computed in Python with the pearsonr
function from scipy.stats (see the Appendix). Through this statistical
analysis, the study aimed to demonstrate the efficacy of NDH
values as a novel predictor of protein hydropathy, potentially
revolutionizing the current methodology in protein structure
prediction.
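The regression step can be sketched as an ordinary least-squares fit of KDH on NDH. The text does not say which regression model was used, so a simple straight-line fit is an assumption here, and the function name and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, not data from the study.
ndh = [1, 1, -1, -1, 0, 0]
kdh = [0.9, 0.7, -0.8, -0.6, 0.1, -0.3]
slope, intercept = fit_line(ndh, kdh)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

The slope estimates how much the rescaled KDH value changes per unit of NDH, and the intercept estimates the KDH value associated with the neutral nucleotides.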
Theoretical Basis
The theoretical foundation of this study is built upon the premise
that the hydropathy of proteins, a critical determinant of their
structure and function, can be predicted by examining the
nucleotide composition of their encoding genes. This concept
diverges from traditional approaches by proposing that certain
nucleotides inherently possess hydropathic properties that
significantly influence the overall hydropathy of the protein. This
idea forms the bedrock of the nucleotide-determined hydropathy
(NDH) system, a novel approach that assigns hydropathy values
directly based on nucleotide identity, particularly at the second
position of codons.
The choice of the second position in codons is not arbitrary; of the
three codon positions, it is the one most closely tied to the
chemical nature of the encoded amino acid. In the standard genetic
code, every codon with U in the second position encodes a
hydrophobic residue (Phe, Leu, Ile, Met, or Val), whereas codons with
A in the second position encode polar or charged residues or stop
signals. Accordingly, in the NDH system, thymine (T) and uracil
(U) are assigned a value of +1, reflecting their association with
more hydrophobic amino acids, while adenine (A) is assigned a
value of -1, generally correlating with hydrophilic amino acids.
Cytosine (C) and guanine (G) are given a neutral value of 0,
postulating a lesser impact on the hydropathy. This system is a
stark contrast to the amino acid-based hydropathy scales like the
Kyte-Doolittle method, which require a full translation of the
nucleotide sequence into amino acids.
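The link between the second codon position and hydropathy can be checked against the standard genetic code. The sketch below groups the 61 sense codons by their second base and averages the Kyte-Doolittle value of the encoded amino acids. It is an illustration under stated assumptions: the standard code is used, stop codons are left out (the Appendix code assigns them 0 instead), and the names are hypothetical.

```python
import itertools
from collections import defaultdict

# Kyte-Doolittle hydropathy values, keyed by single-letter amino acid code.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Standard genetic code in the RNA alphabet. Codons are enumerated in UCAG
# order at each position, and '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def mean_kd_by_second_base():
    """Average Kyte-Doolittle value of sense codons, grouped by second base."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons encode no amino acid
            groups[codon[1]].append(KD[aa])
    return {base: sum(vals) / len(vals) for base, vals in groups.items()}


for base, mean in mean_kd_by_second_base().items():
    print(base, round(mean, 2))
```

Under these assumptions the U group has the highest average and the A group the lowest, with C and G in between, which is the ordering the NDH values are meant to capture.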
The Kyte-Doolittle method, a cornerstone in the field of
bioinformatics for predicting protein structures, particularly
transmembrane segments, assigns hydropathy values to each
amino acid. These values are then used to analyze the amino
acid sequences of proteins to predict their hydrophobic or
hydrophilic regions. While effective, this method necessitates a
complete understanding of the genetic code and its translation
process, which can be computationally intensive and time-
consuming.
The NDH approach, on the other hand, simplifies this process by
eliminating the need for translation. By directly assigning
hydropathy values to nucleotides, it offers a more immediate and
computationally efficient means of predicting protein structure.
This theoretical simplification is not just a methodological shortcut
but a conceptual shift in understanding the relationship between
nucleotide sequences and protein structure. It suggests that the
genetic code not only dictates the amino acid sequence but also
inherently encodes information about the physical and chemical
properties of the proteins.
This theoretical framework raises intriguing questions about the
nature of the genetic code and its potential beyond mere coding
of protein sequences. It posits that the genetic code could be
imbued with additional layers of information that dictate the
physicochemical properties of proteins, including their hydropathy.
This perspective opens up new avenues for exploring the genetic
code and understanding protein structures, offering a fresh lens
through which the intricacies of molecular biology can be
examined.
Results
The results of this study represent a significant breakthrough in
the field of protein structure prediction, particularly for
transmembrane segments. The focal point of our analysis was the
correlation between nucleotide-determined hydropathy (NDH)
values and rescaled Kyte-Doolittle hydropathy (KDH) values. This
correlation was investigated to determine the efficacy of NDH
values in predicting protein hydropathy, bypassing the
conventional genetic code translation process.
Upon comprehensive analysis, a strong positive correlation was
observed between NDH and KDH values. The Pearson
correlation coefficient obtained was approximately 0.794. This
value indicates a strong linear relationship, suggesting that NDH
values, derived purely from the nucleotide composition of codons,
particularly from the second position, are significantly aligned with
the hydropathy profile of proteins as determined by the traditional
KDH scale. This correlation is a pivotal finding, as it supports the
hypothesis that specific nucleotides at key positions within codons
can serve as reliable indicators of the hydropathy characteristics
of the resulting protein, independent of the amino acid sequence.
The statistical significance of this correlation was further
substantiated by a remarkably low p-value, approximately
5 × 10⁻¹⁵. This p-value indicates that the probability of observing
such a strong correlation by random chance is extremely low,
thereby strongly supporting the validity of the correlation. In
practical terms, this corresponds to about 1 chance in 2 × 10¹⁴
of obtaining a correlation at least this strong if NDH and KDH
values were in fact unrelated.
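The arithmetic behind these figures can be sketched as follows. The sample size is an assumption: the Appendix code correlates one (NDH, KDH) pair per codon, so n = 64 is used here. The t statistic is the standard one for testing a Pearson correlation against zero; the function name is hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic for testing zero correlation, given Pearson r and sample size n."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


r = 0.794        # correlation coefficient reported in the text
n = 64           # assumption: one data point per codon, as in the Appendix code
p_value = 5e-15  # p-value reported in the text

t = t_statistic(r, n)
odds = 1.0 / p_value
print(f"t = {t:.2f} with {n - 2} degrees of freedom")
print(f"about 1 chance in {odds:.0e}")
```

A t statistic of this size with 62 degrees of freedom lies far in the tail of the t distribution, which is consistent with a p-value of this order.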
These results are not only statistically robust but also highly
significant in their practical implications. The strong correlation
and its statistical significance underscore the potential of NDH
values as a novel, efficient, and reliable predictor of protein
hydropathy. This finding challenges and expands upon the
traditional understanding of the genetic code's role in protein
structure prediction, demonstrating that significant and meaningful
predictions can be made by analyzing nucleotide sequences
directly, without the intermediary step of translating them into
amino acids.
In summary, the results of this study provide compelling evidence
that the NDH approach can serve as a powerful tool in the
prediction of protein structures, offering a simpler, yet effective
alternative to traditional methods. The high degree of correlation
between NDH and KDH values opens up new possibilities for the
rapid and accurate prediction of protein hydropathy, particularly in
the context of transmembrane proteins, which play a crucial role
in various biological processes.
Discussion
The results of this study, showcasing a strong correlation between
nucleotide-determined hydropathy (NDH) values and rescaled
Kyte-Doolittle hydropathy (KDH) values, have significant
implications for our understanding and prediction of protein
structures. This section delves into the interpretation of these
results, their comparison with traditional methods, and the
broader implications of bypassing the genetic code translation in
protein structure prediction.
Interpretation of Results: The high correlation coefficient of
approximately 0.794 between NDH and KDH values suggests that
the hydropathy of proteins, particularly those with transmembrane
segments, can be effectively predicted from the nucleotide
sequence alone, without translating it into the corresponding
amino acid sequence. This finding indicates that the second
position in a codon, which we focused on in assigning NDH
values, holds significant information about the hydropathic nature
of the protein. It implies that certain nucleotides inherently
contribute to the hydrophobic or hydrophilic characteristics of the
protein, a concept that was hitherto underexplored.
Comparison with Traditional Methods: Traditionally, methods
like the Kyte-Doolittle scale required a comprehensive translation
of nucleotide sequences into amino acids to analyze the
hydropathy of proteins. This process, while effective, is
computationally intensive and relies on a deep understanding of
the genetic code. The NDH method, by contrast, simplifies this
process considerably. By directly assigning hydropathy values to
nucleotides, it eliminates the need for translation, offering a more
immediate approach to predicting protein structure. This direct
method not only reduces computational complexity but also
expedites the process of protein structure prediction, making it
more accessible and potentially more versatile.
Implications of Bypassing Genetic Code Translation: The
ability to bypass the genetic code translation in predicting protein
structures holds profound implications. First, it suggests that the
genetic code, traditionally viewed as a mere template for protein
synthesis, may also encode critical information about the
physicochemical properties of proteins. This perspective could
reshape our understanding of the genetic code, attributing to it a
more multifaceted role in molecular biology.
Secondly, the NDH method's success opens up new avenues in
bioinformatics and computational biology. It suggests that other
properties of proteins might also be predicted directly from
nucleotide sequences, potentially leading to the development of
new algorithms and tools that can provide rapid insights into
protein function and interaction.
Finally, the practical implications of this method cannot be
overstated. In areas like drug design, where the understanding of
protein structures is crucial, the NDH method could significantly
accelerate the identification of potential drug targets and the
development of therapeutic strategies. It also holds promise for
advancing our understanding of various diseases, particularly
those associated with membrane proteins, such as cystic fibrosis
and certain types of cancers.
In conclusion, the NDH method represents a significant step
forward in the field of protein structure prediction. By
demonstrating that nucleotide sequences can directly inform us
about protein hydropathy, this study challenges traditional
methodologies and opens up new possibilities for research and
application in molecular biology and beyond.
Limitations and Future Research
While the results of this study are promising, it is important to
acknowledge its limitations and the areas where future research
could expand and refine the findings.
Acknowledgment of Limitations: One of the primary limitations
of this study is its focus on the second position of the codon in
determining the NDH values. While this position is crucial, other
positions in the codon and additional factors in the nucleotide
sequence may also play significant roles in determining the
hydropathy and overall structure of proteins. Additionally, the
assignment of NDH values is based on a simplified three-level
scheme, which, while effective, may not capture the full
complexity and nuances of nucleotide contributions to protein
hydropathy.
Another limitation is the reliance on existing databases for protein
sequences, which may have inherent biases or limitations in
terms of the diversity and representation of different types of
proteins, particularly those from less-studied organisms.
Furthermore, the method's applicability to proteins beyond
transmembrane segments has not been extensively explored in
this study.
Future Research Directions: Future research could address
these limitations and explore several promising directions:
1. Expanding NDH Value Analysis: Future studies could
investigate the impact of other nucleotide positions within the
codon, or even wider sequence contexts, on protein
hydropathy. This expansion could provide a more
comprehensive understanding of the relationship between
nucleotide sequences and protein properties.
2. Algorithm Development: There is potential for developing
new algorithms or computational models that incorporate
NDH values for rapid and efficient prediction of protein
structures. These tools could be particularly valuable in
bioinformatics and drug discovery.
3. Experimental Validation: While the study is grounded in
robust statistical analysis, experimental validation of the
predicted protein structures and hydropathy profiles would
further strengthen the findings. Laboratory experiments
could confirm the accuracy of predictions made using NDH
values.
4. Application to Diverse Protein Types: Extending the
application of the NDH method to a wider range of proteins,
including non-membrane proteins, could provide insights into
its broader applicability and effectiveness.
5. Comparative Studies: Comparing the NDH method with
other emerging techniques in protein structure prediction
could highlight its relative strengths and weaknesses,
providing a clearer understanding of its place in the field.
6. Understanding Molecular Evolution: Exploring how NDH
values correlate with evolutionary patterns in protein families
could offer new perspectives on the role of nucleotide
sequences in the evolution of protein structures.
By addressing these limitations and exploring these future
research directions, the field can build upon the findings of this
study, potentially leading to significant advancements in our
understanding of protein structures and the genetic code's role in
determining them.
Practical Implications
The introduction of the nucleotide-determined hydropathy (NDH)
method for predicting protein structures, particularly
transmembrane segments, without the need for genetic code
translation, has far-reaching practical implications in both
biological research and medicine. This method, by simplifying and
expediting the process of protein structure prediction, could
significantly enhance various aspects of these fields.
In Biological Research:
1. Rapid Protein Structure Analysis: The NDH method
allows for quicker and more efficient analysis of protein
structures. Researchers can obtain crucial information about
protein hydropathy directly from nucleotide sequences,
which is invaluable in studies involving protein function,
interaction, and localization.
2. Enhanced Understanding of Protein Function: By
providing insights into the hydrophobic or hydrophilic nature
of proteins, the NDH method can help clarify the roles of
specific proteins in cellular processes. This is particularly
relevant for membrane proteins, which are critical in cell
signaling, transport, and metabolism.
3. Genome Annotation and Bioinformatics: The NDH
method can be integrated into genome annotation tools to
predict protein-coding regions and their characteristics. This
integration would be particularly useful in the annotation of
newly sequenced genomes, offering a rapid first pass
analysis of potential protein structures.
In Medicine:
1. Drug Design and Development: Understanding protein
structures is crucial in the design of drugs that target these
proteins. The NDH method can expedite the identification of
potential drug targets, especially for conditions involving
membrane proteins, such as cystic fibrosis and certain
cancers.
2. Personalized Medicine: As the field of personalized
medicine grows, the need for rapid analysis of individual
genetic information becomes more critical. The NDH method
could assist in quickly determining the functional implications
of specific genetic variations, particularly those affecting
protein structures.
3. Disease Research: Diseases often involve the
malfunctioning of specific proteins. The ability to rapidly
predict protein structures can aid in understanding disease
mechanisms and developing therapeutic strategies,
especially for diseases associated with membrane proteins.
Potential for Simplifying Protein Structure Prediction: The
NDH method represents a paradigm shift in protein structure
prediction. By reducing the complexity and computational
demands of traditional methods, it makes this critical aspect of
molecular biology more accessible and streamlined. This
simplicity could democratize protein structure analysis, making it
feasible for a broader range of researchers, including those with
limited resources, to engage in this vital area of study.
In conclusion, the practical implications of the NDH method are
vast and diverse, offering significant benefits to both biological
research and medicine. By providing a simpler, more direct
pathway to understanding protein structures, this method has the
potential to accelerate scientific discovery and innovation in
various fields, ultimately contributing to advances in healthcare
and the treatment of diseases.
Conclusion
This study marks a significant milestone in the field of protein
structure prediction by introducing the nucleotide-determined
hydropathy (NDH) method. Our findings demonstrate a strong
and statistically significant correlation between NDH values,
based solely on nucleotide composition, and the rescaled Kyte-
Doolittle hydropathy (KDH) values. With a Pearson correlation
coefficient of approximately 0.794 and a p-value of about
5 × 10⁻¹⁵, the results emphatically suggest that the hydropathy of
proteins, particularly transmembrane segments, can be effectively
predicted using this novel approach.
The NDH method, focusing on the second position of the codon
and assigning hydropathy values based on the nucleotides
present (T/U=+1, A=-1, and G/C=0), represents a departure from
traditional protein structure prediction methods. It bypasses the
complex and time-consuming process of translating nucleotide
sequences into amino acids. This simplification has far-reaching
implications, potentially transforming our approach to studying
protein structures. By directly linking nucleotide sequences to
hydropathy, the NDH method opens up new possibilities in
bioinformatics, allowing for faster and more efficient analysis of
protein structures.
The practical implications of this research are profound, especially
in biological research and medicine. The NDH method offers a
more streamlined and accessible approach to protein structure
prediction, which is crucial in drug design, personalized medicine,
and understanding disease mechanisms. It simplifies the analysis
of protein structures, making it a valuable tool for researchers
across various disciplines, including those with limited
computational resources.
In conclusion, this research not only contributes a novel method
to the repertoire of tools available for protein structure prediction
but also challenges our understanding of the genetic code. It
suggests that the code may hold more information than previously
thought, extending beyond the mere determination of amino acid
sequences to include direct clues about the physicochemical
properties of proteins. As we continue to explore the full potential
of the NDH method, it holds promise to revolutionize our
approach to molecular biology, opening new avenues for research
and advancing our capabilities in medical science and
biotechnology. The simplicity, efficiency, and effectiveness of the
NDH method mark it as a significant advancement in the field,
one that has the potential to greatly enhance our understanding
and manipulation of biological systems.
Appendix
import itertools

from scipy.stats import pearsonr

# Kyte-Doolittle hydropathy values for amino acids
kyte_doolittle_values = {
    'I': 4.5, 'V': 4.2, 'L': 3.8, 'F': 2.8, 'C': 2.5,
    'M': 1.9, 'A': 1.8, 'G': -0.4, 'T': -0.7, 'S': -0.8,
    'W': -0.9, 'Y': -1.3, 'P': -1.6, 'H': -3.2, 'E': -3.5,
    'Q': -3.5, 'D': -3.5, 'N': -3.5, 'K': -3.9, 'R': -4.5
}

# Rescale Kyte-Doolittle values linearly to the range -1 to +1
min_kdh = min(kyte_doolittle_values.values())
max_kdh = max(kyte_doolittle_values.values())
rescaled_kdh = {
    aa: (kdh - min_kdh) / (max_kdh - min_kdh) * 2 - 1
    for aa, kdh in kyte_doolittle_values.items()
}

# Codon table with all 64 codons and their corresponding amino acids,
# using RNA codons (U instead of T). Filled in here with the standard
# genetic code: codons are enumerated in UCAG order at each position,
# and '*' marks the three stop codons.
bases = 'UCAG'
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = {
    ''.join(triplet): aa
    for triplet, aa in zip(itertools.product(bases, repeat=3), amino_acids)
}

# NDH values for the second nucleotide (U = +1, C = 0, A = -1, G = 0)
ndh_values = {'U': 1, 'C': 0, 'A': -1, 'G': 0}

# Assign NDH and KDH values to each codon
codon_ndh_kdh = []
for codon, aa in codon_table.items():
    ndh = ndh_values[codon[1]]  # NDH value for the second position
    kdh = rescaled_kdh.get(aa, 0)  # KDH value for the amino acid (0 for stop codons)
    codon_ndh_kdh.append((ndh, kdh))

# Extract NDH and KDH values for the correlation calculation
ndh_values_list, kdh_values_list = zip(*codon_ndh_kdh)

# Calculate the correlation
correlation_coefficient, p_value = pearsonr(ndh_values_list, kdh_values_list)
print("Correlation Coefficient:", correlation_coefficient)
print("P-value:", p_value)